<?xml version="1.0" standalone="yes"?> <Paper uid="W04-1006"> <Title>Legal Texts Summarization by Exploration of the Thematic Structures and Argumentative Roles</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> JASPER NATIONAL PARK Applicants and THE ATTORNEY GENERAL OF CANADA Respondent, </SectionTitle> <Paragraph position="0"> Docket: T-1557-98</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Judgment Professional abstract Role </SectionTitle> <Paragraph position="0"> [1] This application for judicial review arises out of a decision (the Decision) announced on or about the 30th of June 1998 by the Minister of Canadian Heritage (the Minister) to close the Maligne River (the River) in Jasper National Park to all boating activity, beginning in 1999.</Paragraph> <Paragraph position="1"> Judicial review of Minister of Canadian Heritage's decision to close Maligne River in Jasper National Park to all boating activity beginning in 1999 to protect habitat of harlequin ducks.</Paragraph> </Section> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> INTRODUCTION </SectionTitle> <Paragraph position="0"> [7] The applicants offer commercial rafting trips to Park visitors in this area each year from mid-June to sometime in September.</Paragraph> <Paragraph position="1"> Applicants offer commercial rafting trips on River.</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> CONTEXT </SectionTitle> <Paragraph position="0"> [10] Consequently, a further environmental assessment regarding commercial rafting on the Maligne River was prepared in 1991.
The assessment indicated that rafting activity had expanded since 1986, with an adverse impact on Harlequin ducks along the Maligne River.</Paragraph> <Paragraph position="1"> 1991 environmental assessment indicating rafting having adverse impact on harlequin ducks along river.</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> CONTEXT </SectionTitle> <Paragraph position="0"> the legal domain and legal interpretations of expressions produce many ambiguities. For example, the word "sentence" can have two very different meanings: one is a sequence of words, and the other, specific to law, is the decision as to what punishment is to be imposed. Similarly, "disposition" usually means nature, effort, mental attitude or property, but in legal terms it means the final part of a judgment indicating the nature of the decision: acceptance or dismissal of a request.</Paragraph> <Paragraph position="1"> Most previous automatic summarization systems are limited to newspaper and scientific articles (Saggion and Lapalme, 2002). News style and legal language differ in important ways: word statistics, the probability of selecting textual units, the position of paragraphs and sentences, title words and the lexical-chain relations between them and the key ideas of the text, the relations between sentences and paragraphs, and the structure of the text.</Paragraph> <Paragraph position="2"> For judgments, we show that discursive structures can be identified for the different parts of the decision and assigned argumentative roles.</Paragraph> <Paragraph position="3"> Newspaper articles often repeat the most important message, but in law, important information may appear only once.
The processing of a legal document requires detailed attention, and it is not straightforward to adapt techniques developed for other types of documents to the legal domain.</Paragraph> </Section> <Section position="8" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Observations from a Corpus </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.1 Composition </SectionTitle> <Paragraph position="0"> Our corpus contains 3500 judgments of the Federal Court of Canada, which are available in HTML at http://www.canlii.org/ca/cas/fct/.</Paragraph> <Paragraph position="1"> We manually analyzed 50 judgments in English and 15 judgments in French, as well as their summaries written by professional legal abstractors. The documents input to our system are judgments between 500 and 4000 words long (2 to 8 pages), which make up 80% of the 3500 judgments; 10% of the documents have fewer than 500 words (about one page) and therefore do not need a summary. Only 10% of the decisions have more than 4000 words. Unlike some existing systems (Moens et al., 1999) that focus only on limited types of judgments, such as criminal cases, our research deals with many categories of texts, including Access to information, Administrative law, Air law, Broadcasting, Competition, Constitutional law, Copyright, Customs and Excise - Customs Act, Environment, Evidence, Human rights, Maritime law, Official languages, Penitentiaries and Unemployment insurance.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 3.2 Structure of Legal Judgments </SectionTitle> <Paragraph position="0"> During our corpus analysis, we compared model summaries written by humans with the texts of the original judgments.
We have identified the organisational architecture of a typical judgment.</Paragraph> <Paragraph position="1"> [Table 2: thematic structures of a judgment, their content, and their length in words in the judgment and in its summary.]</Paragraph> </Section> </Section> <Section position="9" start_page="0" end_page="0" type="metho"> <SectionTitle> DECISION DATA </SectionTitle> <Paragraph position="0"> Paragraphs that address the same subject are grouped into a block. We annotated each block with a label describing its semantic role. We also manually annotated citations: textual units (sentences or paragraphs) quoted by the judge as a reference, for example an article of law or other jurisprudence. Citations account for a large part of the text of a judgment but are not considered relevant for the summary; these segments are therefore eliminated during the filtering stage.</Paragraph> <Paragraph position="1"> The textual units that professional abstractors considered important were manually aligned with one or more elements of the source text. Table 1 shows an example of an alignment between a human summary and the original judgment. We look for a match between the information considered important in the professional abstract and the information in the source document. Our observations show that, to produce a summary, a professional abstractor mainly extracts important units manually while conforming to general guidelines; the collection of these selected units forms the summary.</Paragraph> <Paragraph position="2"> During this analysis, we observed that texts of jurisprudence are organized according to a macrostructure and contain various levels of information, independently of the category of judgment.
The guidelines on legal judgments proposed by Judge Mailhot of the Court of Appeal of Quebec (Mailhot, 1998) and by Branting et al. (1997) support the idea that organisational structures can be defined for decisions. Jurisprudence is organized by the discourse itself, which makes it possible to segment the texts thematically.</Paragraph> <Paragraph position="3"> Textual units dealing with the same subject form a thematic segment. In this context, we distinguish layered thematic segments, which divide a legal decision into different discursive structures. Identifying these structures separates the key ideas of a judgment from its details and improves the readability and coherence of the summary. We present the argumentative role of each level of discourse and its importance in the judgment with respect to the key ideas. Table 2 shows the structure of a judgment and its different discourse levels. Therefore, in presenting the final summary, we propose to preserve this organization of the text and build a table-style summary with five themes: DECISION DATA contains the name of the jurisdiction, the place of the hearing, the date of the decision, the identity of the author, the names of the parties, the title of the proceeding, and the authority and doctrine. It groups all the basic preliminary information needed for planning the decision.</Paragraph> <Paragraph position="4"> INTRODUCTION describes the situation before the court and answers these questions: Who are the parties? What did they do to whom? CONTEXT explains the facts in chronological order or by description. It reconstructs the story from the facts and events between the parties and the findings of credibility on the disputed facts.</Paragraph> <Paragraph position="5"> JURIDICAL ANALYSIS describes the comments of the judge, the findings of fact, and the application of the law to the facts as found.
For the legal expert, this section of the judgment is the most important because it gives a solution to the parties' problem and leads the judgment to its conclusion.</Paragraph> <Paragraph position="6"> CONCLUSION expresses the disposition, the final part of a decision, which states what the court has decided. For example, it specifies whether or not the person is discharged, or the costs for a party.</Paragraph> <Paragraph position="7"> During our corpus analysis, we computed the distribution of information (number of words, shown in Table 2) across the levels of thematic structure of the judgment. On average, a judgment is 3500 words long and its summary 350 words, i.e. a compression rate of about 10%.</Paragraph> </Section> <Section position="10" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 Method for Producing Table Style Summary </SectionTitle> <Paragraph position="0"> Our approach first identifies the thematic structures and argumentative roles in the document. We then extract the relevant sentences and present them as a table-style summary. Showing the information considered important in this form could help the user read and navigate easily between the summary and the source judgment. For each sentence of the summary, the user can determine its theme by looking at its rhetorical role. If a sentence seems important to a user who needs more information on its topic, the complete thematic segment containing that sentence can be presented. The summary is built in four phases (Figure 1): thematic segmentation, filtering of less important units such as citations of law articles, selection of relevant textual units, and production of the summary within the size limit of the abstract.</Paragraph> <Paragraph position="1"> The implementation of our approach is a system called LetSum (Legal text Summarizer), developed in Java and Perl.
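The first two of these phases can be illustrated with a minimal Python sketch. It is not LetSum's implementation, which uses JAPE grammars executed in GATE: it assumes the judgment is already split into paragraphs, uses small subsets of the cue phrases and citation markers listed in Section 4.1, assigns roles only from cue phrases and order, and approximates the linear integration of citations with one hypothetical rule (a unit containing a citation marker and ending in a colon opens a citation, and the following units that begin with an enumeration such as (1) belong to it). The function names, marker lists and sample paragraphs are illustrative only.

```python
import re

# Hypothetical, abridged marker lists based on the examples in Section 4.1.
CONCLUSION_MARKERS = ("is dismissed", "must be granted",
                      "for all the above reasons", "in the result")
ANALYSIS_MARKERS = ("this court", "in the present case",
                    "as i have stated", "in reviewing the sections")
CITATION_MARKERS = ("reads as follows", "provides", "states", "defines",
                    "section", "subsection", "paragraph", "pursuant")

# A unit that starts with an enumeration such as "(1)" or "2." (assumed pattern).
ENUMERATED = re.compile(r"^\s*(\(\w+\)|\d+\.)\s")


def assign_role(paragraph, previous_role):
    """Thematic segmentation: guess a role from cue phrases and order."""
    text = paragraph.lower()
    if any(m in text for m in CONCLUSION_MARKERS):
        return "CONCLUSION"
    if any(m in text for m in ANALYSIS_MARKERS):
        return "JURIDICAL ANALYSIS"
    if previous_role is None:
        return "INTRODUCTION"  # first unit of the judgment body
    if previous_role == "INTRODUCTION":
        return "CONTEXT"       # narrative facts follow the introduction
    return previous_role       # otherwise the current segment continues


def is_citation_intro(paragraph):
    """Direct marker: a citation verb or concept in a unit ending with a colon."""
    text = paragraph.lower().rstrip()
    return text.endswith(":") and any(m in text for m in CITATION_MARKERS)


def citation_mask(paragraphs):
    """True for citation units: a unit with direct markers plus the
    enumerated units that immediately follow it (indirect citations)."""
    mask, in_citation = [], False
    for p in paragraphs:
        if is_citation_intro(p):
            in_citation = True
        elif not (in_citation and ENUMERATED.match(p)):
            in_citation = False
        mask.append(in_citation)
    return mask


def segment_and_filter(paragraphs):
    """Phase 1 labels every unit with a role; phase 2 drops the citations."""
    labelled, role = [], None
    for p in paragraphs:
        role = assign_role(p, role)
        labelled.append((role, p))
    mask = citation_mask(paragraphs)
    kept = [unit for unit, is_cit in zip(labelled, mask) if not is_cit]
    removed = [p for p, is_cit in zip(paragraphs, mask) if is_cit]
    return kept, removed


# Invented sample paragraphs, loosely modelled on Tables 1 and 4.
judgment = [
    "This application for judicial review arises out of a decision of the Minister.",
    "The applicants offer commercial rafting trips on the river.",
    "The relevant provision is paragraph 78(1), which reads as follows:",
    "(1) The Commissioner may apply to the Court for a remedy.",
    "In the present case, the Minister did not err.",
    "For all the above reasons, the application is dismissed.",
]
labelled, removed = segment_and_filter(judgment)
```

On this invented sample, the quoted provision and its enumerated subsection are removed, and the four remaining paragraphs receive the roles INTRODUCTION, CONTEXT, JURIDICAL ANALYSIS and CONCLUSION in order. Because plain substring tests are used, a realistic version would need tokenization and the section-title and position features of Table 3 to avoid false matches.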
The input to the system is a legal judgment in English. Part-of-speech tags are assigned with the tagger described by Hepple (2000). The semantic grammars and rules are written in JAPE (Java Annotation Patterns Engine) and executed by a GATE transducer (Cunningham et al., 2002).</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.1 Components of LetSum </SectionTitle> <Paragraph position="0"> Thematic segmentation: we experimented with two statistical segmenters, TextTiling, described by Hearst (1994), and the C99 segmenter described by Choi (2000), both of which divide a document into thematically coherent segments on the basis of lexical similarity. Because the results of these statistical segmenters were not satisfactory for finding the thematic structures of legal judgments, we developed a segmentation process based on specific knowledge of the legal field.</Paragraph> <Paragraph position="1"> [Table 3: categories of section titles, their linguistic markers, and examples of section titles; e.g. the category Beginning of the judgment is marked by words such as decision, judgment, reason.] Each thematic segment can be associated with an argumentative role in the judgment based on the following information: the presence of significant section titles (Table 3 shows the categories and features of section titles), the absolute and relative positions of the segment, the identification of direct or narrative style (which marks the border between the CONTEXT and JURIDICAL ANALYSIS segments), and certain linguistic markers. The linguistic markers used for each thematic segment are organized as follows: CONTEXT introduces the parties with the verb to be (e.g.
the applicant is company X), describes the request with verbs such as advise, indicate, request, and explains the situation in the past tense and in narrative form.</Paragraph> <Paragraph position="2"> In JURIDICAL ANALYSIS, the judge explains his reasoning on the subject, so the style of expression is direct, with markers such as I, we, this court, and cue phrases (Paice, 1981) such as: In reviewing the sections No. of the Act, Pursuant to section No., As I have stated, In the present case, The case at bar is.</Paragraph> <Paragraph position="3"> In CONCLUSION, the classes of verbs are: note, accept, summarise, scrutinize, think, say, satisfy, discuss, conclude, find, believe, reach, persuade, agree, indicate, review; the concepts include: opinion, conclusion, summary, because, cost, action; and the cue phrases are: in the case at bar, for all the above reasons, in my view, my review of, in view of the evidence, finally, thus, consequently, in the result. This segment contains the final result of the court decision, expressed with phrases such as: The motion is dismissed, the application must be granted. The important verbs are: allow, deny, dismiss, grant, refuse.</Paragraph> <Paragraph position="4"> Filtering identifies parts of the text that can be eliminated without losing information relevant to the summary. In a judgment, citation units (sentences or paragraphs) occupy a large volume of the text, up to 30% of the judgment, whereas their contents are less important for the summary. This is why we remove citations inside the blocks of thematic segments. We filter two categories of segments: submissions and arguments that report the points of view of the parties to the litigation, and citations of previous cases or references to applicable legislation. When a citation of legislation is eliminated (e.g.
an article of law), we save the reference of the citation in the authority and doctrine field of DECISION DATA.</Paragraph> <Paragraph position="5"> The identification of citations is based on two types of markers: direct and indirect. A direct marker is a linguistic indicator from one of three classes: verbs, concepts (nouns, adverbs, adjectives) and complementary indications.</Paragraph> <Paragraph position="6"> Examples of citation verbs are: conclude, define, indicate, provide, read, reference, refer, say, state, summarize. Examples of concepts are: following, section, subsection, page, paragraph, pursuant. Complementary indications include numbers, certain prepositions, relative clauses and typographic marks (colon, quotation marks).</Paragraph> <Paragraph position="7"> Indirect citations are the units neighboring a quoted phrase. Table 4 shows an example: units of the CITATION segment such as paragraph 78(1), which reads as follows: are identified using direct markers (shown in bold), but the surrounding textual units with numbers are also quotations. We therefore developed a linear integration mechanism that examines the sentences following a quoted sentence to identify a group of citations.</Paragraph> <Paragraph position="8"> Selection builds a list of the best candidate units for each structural level of the summary. LetSum computes a score for each sentence of the judgment using heuristic functions based on the following information: the position of the paragraph in the document, the position of the paragraph in the thematic segment, the position of the sentence in the paragraph, and the distribution of words in the document and the corpus (tf-idf). Depending on the information in each layered segment, we identified specific cue words and linguistic markers. The thematic segment can change the value of a linguistic indicator.
For example, the phrase "application is dismissed", which can be considered an important feature in the CONCLUSION, might not have the same value in the CONTEXT segment. At the end of this stage, the passages are sorted by their resulting scores to determine the most relevant ones.</Paragraph> <Paragraph position="9"> Production of the final summary: the selected sentences are normalized and displayed in tabular format. The final summary is about 10% of the source document. The elimination of unimportant sentences takes into account the length statistics presented in Table 2. In the INTRODUCTION segment, the highest-scoring units are kept within 10% of the summary size. In the CONTEXT segment, the selected units occupy 24% of the summary length. The JURIDICAL ANALYSIS segment contributes 60%, and the units with the role CONCLUSION occupy 6% of the summary.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 4.2 Current State of LetSum </SectionTitle> <Paragraph position="0"> Table 4 shows an example of the output of LetSum after the execution of the Selection module (the modules of Figure 1 up to the horizontal line), applied to a judgment of the Federal Court of Canada (2468 words). The thematic segmentation module has divided the text into structural blocks according to their rhetorical roles (given to the left of the braces in Table 4). The Filtering module removes citation blocks and their enumerated quoted paragraphs (e.g.</Paragraph> <Paragraph position="1"> paragraph (15) in the table). The Selection module chooses the relevant textual units (shown in bold in Table 4) in each thematic segment. The units are selected according to their argumentative role in the judgment.
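The Selection and Production phases can be sketched in the same hedged way. The paper does not give LetSum's scoring weights, so this sketch assumes a simple linear score: a bonus for early units in a segment, the mean tf-idf of the sentence words, and a bonus for cue phrases listed for the sentence's own segment only, so that "is dismissed" counts in CONCLUSION but not in CONTEXT. Each segment is then filled greedily within its share of a 10% summary (INTRODUCTION 10%, CONTEXT 24%, JURIDICAL ANALYSIS 60%, CONCLUSION 6%). The weights, cue lists and function names are hypothetical.

```python
import math
import re
from collections import Counter

# Share of the summary given to each thematic segment (Production phase).
SHARES = {"INTRODUCTION": 0.10, "CONTEXT": 0.24,
          "JURIDICAL ANALYSIS": 0.60, "CONCLUSION": 0.06}

# Hypothetical segment-specific cue phrases: a phrase earns a bonus only
# in the segment where it is listed.
CUES = {"CONCLUSION": ("is dismissed", "must be granted", "for all the above reasons"),
        "JURIDICAL ANALYSIS": ("in the present case", "as i have stated")}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def segment_budgets(source_words, rate=0.10):
    """Word budget of each segment for a summary of about rate * source length."""
    target = source_words * rate
    return {role: round(target * share) for role, share in SHARES.items()}


def idf_table(corpus):
    """Inverse document frequency, log(N / df), over a list of raw documents."""
    df = Counter(word for doc in corpus for word in set(tokenize(doc)))
    return {word: math.log(len(corpus) / count) for word, count in df.items()}


def score(sentence, role, position, idf, tf):
    """Hypothetical linear score: position bonus + mean tf-idf + cue bonus."""
    words = tokenize(sentence)
    tfidf = sum(tf[w] * idf.get(w, 0.0) for w in words) / max(len(words), 1)
    position_bonus = 1.0 / (1 + position)  # earlier units in a segment rank higher
    cue_bonus = 2.0 if any(c in sentence.lower() for c in CUES.get(role, ())) else 0.0
    return position_bonus + tfidf + cue_bonus


def select(segments, source_words, idf):
    """segments maps a role to its sentences (citations already filtered out).
    Returns, per role, the best-scoring sentences that fit the role's budget,
    in document order."""
    budgets = segment_budgets(source_words)
    tf = Counter(w for sents in segments.values() for s in sents for w in tokenize(s))
    summary = {}
    for role, sentences in segments.items():
        ranked = sorted(enumerate(sentences), reverse=True,
                        key=lambda item: score(item[1], role, item[0], idf, tf))
        chosen, used = [], 0
        for index, sentence in ranked:
            length = len(sentence.split())
            if used + length > budgets.get(role, 0):
                continue  # skip units that would overflow this segment's share
            chosen.append((index, sentence))
            used += length
        summary[role] = [s for _, s in sorted(chosen)]
    return summary


# Usage: budgets for the 2468-word judgment of Table 4, and a toy CONCLUSION.
budgets = segment_budgets(2468)
conclusion = select({"CONCLUSION": ["The costs are reserved.",
                                    "For all the above reasons, the application is dismissed."]},
                    source_words=1500, idf={})
```

For the 2468-word judgment of Table 4, a 10% target gives segment budgets of 25, 59, 148 and 15 words, about 247 words in total. In the toy CONCLUSION example (a 1500-word source, hence a 9-word CONCLUSION budget), the cue bonus makes the dispositive sentence outrank the earlier sentence about costs, which is then skipped because it would exceed the budget.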
Here, the extracted units total 313 words.</Paragraph> <Paragraph position="2"> Preliminary evaluations of the components of LetSum are very promising: we obtained an F-measure of 0.90 for thematic segmentation and of 0.97 for the filtering stage (57 of 60 quoted segments detected correctly).</Paragraph> <Paragraph position="3"> From this information, the Production module (currently being implemented) could concatenate textual units, with some grammatical modifications, to produce a short summary.</Paragraph> </Section> </Section> <Section position="11" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Related Research </SectionTitle> <Paragraph position="0"> LetSum is one of the few systems developed specifically for the summarization of legal documents. All of these approaches attest to the importance of exploring thematic structures in legal documents.</Paragraph> <Paragraph position="1"> The FLEXICON project (Smith and Deedman, 1987) generates summaries of legal cases using information retrieval based on location heuristics, the occurrence frequency of index terms, and indicator phrases. A term extraction module that recognizes concepts, case citations, statute citations and fact phrases builds a document profile. This project was developed for the decision reports of Canadian courts, which are similar to our corpus.</Paragraph> <Paragraph position="2"> SALOMON (Moens et al., 1999) automatically extracts informative paragraphs of text from Belgian legal cases. This project used a two-step methodology. First, the case category, the case structure and irrelevant text units are identified using a knowledge base represented as a text grammar. Then, general data and the legal foundations concerning the essence of the case are extracted.
Second, the system extracts informative text units about the alleged offences and the opinion of the court, based on the selection of representative objects.</Paragraph> <Paragraph position="3"> More recently, SUM (Grover et al., 2003) examined the use of rhetorical and discourse structure at the sentence level in legal cases in order to find the main verbs. The methodology is based on that of Teufel and Moens (2002), in which sentences are classified according to their argumentative role.</Paragraph> <Paragraph position="4"> These studies have shown the value of summarization in a specialized domain such as legal texts, but none of these systems was implemented in an environment such as CanLII, which has to deal with thousands of texts and produce a summary for each.</Paragraph> </Section> <Section position="12" start_page="0" end_page="0" type="metho"> <SectionTitle> 6 Conclusion </SectionTitle> <Paragraph position="0"> In this paper, we have presented our approach to the automatic summarization of legal judgments.</Paragraph> <Paragraph position="1"> This work addresses the problem of processing the huge volume of electronic documents in the legal field, which are becoming more and more difficult to access. Our method is based on extracting the relevant units of the source judgment by identifying the discourse structures and determining the semantic roles of the thematic segments in the document. The summary is presented in tabular form, divided into the following thematic structures: DECISION DATA, INTRODUCTION, CONTEXT, JURIDICAL ANALYSIS and CONCLUSION.</Paragraph> </Section> <Section position="13" start_page="0" end_page="0" type="metho"> <SectionTitle> </SectionTitle> <Paragraph position="0"> The summary is generated in four steps: thematic segmentation to detect the document structures, filtering to eliminate unimportant quotations and noise, selection of the candidate units, and production of the table-style summary.
The system is currently being finalized, and preliminary evaluation results are very promising.</Paragraph> </Section> <Section position="14" start_page="0" end_page="0" type="metho"> <SectionTitle> 7 Acknowledgements </SectionTitle> <Paragraph position="0"> We would like to thank the LexUM group of the legal information-processing laboratory of the Public</Paragraph> </Section> <Section position="15" start_page="0" end_page="0" type="metho"> <SectionTitle> DECISION DATA </SectionTitle> <Paragraph position="0"> raise preliminary objections to the notice of an originating motion filed by the applicant (the Commissioner). As a result, this motion filed by Air Canada on March 18, 1997 raises six alternative preliminary objections asking the Court to strike out in part the motion made by the Commissioner on September 6, 1996 under section 78 of the Official Languages Act.</Paragraph> <Paragraph position="1"> French language at the Halifax airport. The Commissioner asks this Court to declare that there is a significant demand for services in French in Air Canada's office at the Halifax airport and that Air Canada is failing to discharge its duties under Part IV of the Act. Part IV establishes language-related duties for communications with and services to the public, including the travelling public, where there is significant demand.</Paragraph> <Paragraph position="2"> edy limited to facts relating to a specific complaint, the investigation of that complaint and the resulting reports and recommendations. In my view, this interpretation is too narrow and is inconsistent with the general objectives of the Act and its remedial and quasi-constitutional nature.</Paragraph> </Section> </Paper>