<?xml version="1.0" standalone="yes"?> <Paper uid="W04-0908"> <Title>Input Text Linguistic Component Formal Description Visualizer Component OutputAnimation</Title> <Section position="4" start_page="0" end_page="0" type="metho"> <SectionTitle> 3 Carsim </SectionTitle> <Paragraph position="0"> Carsim (Egges et al., 2001; Dupuy et al., 2001) is a program that analyzes texts describing car accidents and visualizes them in a 3D environment. It has been developed using real-world texts.</Paragraph> <Paragraph position="1"> The Carsim architecture is divided into two parts that communicate using a formal representation of the accident. The first part is a linguistic module that extracts information from the report and fills the frame slots. The second part is a virtual scene generator that takes the structured representation as input, creates the visual entities, and animates them (Figure 1).</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="metho"> <SectionTitle> 4 A Corpus of Traffic Accident </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> Descriptions </SectionTitle> <Paragraph position="0"> As development and test sets, we have collected approximately 200 reports of road accidents from various Swedish newspapers. The task of analyzing the news reports is made more complex by their variability in style and length. The size of the texts ranges from a couple of sentences to more than a page. The amount of detail is overwhelming in some reports, while in others most of the information is implicit.
The complexity of the accidents described ranges from simple accidents with only one vehicle to multiple collisions with several participating vehicles and complex movements.</Paragraph> <Paragraph position="1"> Although our work has concentrated on the press clippings, we also have access to accident reports from the STRADA database (Swedish TRaffic Accident Data Acquisition) of Vägverket, the Swedish traffic authority. STRADA registers nearly all the accidents that occur in Sweden, namely all accidents in which there are casualties (Karlberg, 2003).</Paragraph> <Paragraph position="2"> After an accident, the victims describe its location and conditions in a standardized form collected in hospitals. The corresponding reports are transcribed in a computer-readable format in the STRADA database. This source contains two kinds of reports: the narratives written by the victims of the accident and their transcriptions by traffic experts. The original texts contain spelling mistakes, abbreviations, and grammatical errors. The transcriptions often simplify and interpret the original texts, and contain jargon.</Paragraph> <Paragraph position="3"> The next text is an excerpt from our development corpus. This report is an example of a press wire describing an accident.</Paragraph> <Paragraph position="4"> En dödsolycka inträffade inatt söder om Vissefjärda på riksväg 28. Det var en bil med två personer i som kom av vägen i en vänsterkurva och körde i hög hastighet in i en gran. Passageraren, som var född -84, dog. Föraren som var 21 år gammal vårdas på sjukhus med svåra skador.</Paragraph> <Paragraph position="5"> Polisen misstänker att bilen de färdades i, en ny Saab, var stulen i Emmaboda och det ska under dagen undersökas.</Paragraph> <Paragraph position="6"> Sveriges Radio, November 9, 2002 A fatal accident took place tonight south of Vissefjärda on Road 28.
A car carrying two persons departed from the road in a left-hand curve and crashed at high speed into a spruce. The passenger, who was born in 1984, died. The driver, who was 21 years old, is severely injured and is taken care of in a hospital. The police suspect that the car they were traveling in, a new Saab, was stolen in Emmaboda and will investigate it today.</Paragraph> <Paragraph position="7"> The text above, our translation.</Paragraph> </Section> </Section> <Section position="6" start_page="0" end_page="0" type="metho"> <SectionTitle> 5 Knowledge Representation </SectionTitle> <Paragraph position="0"> The Carsim language processing module reduces the text content to a formal representation that outlines what happened and enables a conversion to a symbolic scene. It uses information extraction techniques to map a text onto a structure that consists of three main elements: A scene object, which describes the static parameters of the environment, such as weather, light, and road configuration.</Paragraph> <Paragraph position="1"> A list of road objects, for example cars, trucks, and trees, and their associated sequences of movements.</Paragraph> <Paragraph position="2"> A list of collisions between road objects.</Paragraph> <Paragraph position="3"> The structure of the formalism, which sets the limit of what information can be expressed, was designed with the help of traffic safety experts at the Department of Traffic and Road at Lund University. It contains the information necessary to reproduce and animate the accident entities in our visualization model. We used an iterative process to design it.
We started from a first incomplete model (Dupuy et al., 2001) and manually constructed the representation of about 50 texts until we had reached a sufficient degree of expressivity.</Paragraph> <Paragraph position="4"> The representation we use is a typical example of frames à la Minsky, where the objects in the representation consist of a number of attribute/value slots, which are to be filled by the information extracted from the text. An ontology of the entities in the accident domain has been developed. The concepts are ordered in an inheritance hierarchy.</Paragraph> <Paragraph position="5"> Figure 2 shows how Carsim's graphical user interface presents the representation of the accident in the example above. The scene element contains the location of the accident and the configuration of roads, in this case a left-hand bend. The list of road objects contains one car and one tree. The event chain for the car describes the movements: the car leaves the road. Finally, the collision list describes one collision between the car and the tree.</Paragraph> </Section> <Section position="7" start_page="0" end_page="0" type="metho"> <SectionTitle> 6 The Information Extraction Module </SectionTitle> <Paragraph position="0"> The information extraction subsystem fills the frame slots. Its processing flow first analyzes the text linguistically and then applies a sequence of semantic modules to the word groups obtained from the linguistic modules. The subsystem either uses the literal content of certain phrases it finds in the text or infers the environment and the actions.</Paragraph> <Paragraph position="1"> We use a pipeline of modules in the first stages of the natural language processing chain.
The tasks consist of tokenizing, part-of-speech tagging, splitting the text into sentences, and detecting noun groups, clause boundaries, and domain-specific multiwords.</Paragraph> <Paragraph position="2"> We use the Granska part-of-speech tagger (Carlberger and Kann, 1999) and Ejerhed's algorithm (Ejerhed, 1996) to detect clause boundaries.</Paragraph> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 6.1 Named Entity Recognition </SectionTitle> <Paragraph position="0"> Carsim uses a domain-specific named entity recognition module, which detects names of persons, places, roads, and car makes (Persson and Danielsson, 2004).</Paragraph> <Paragraph position="1"> The recognition is based on a small database of 2,500 entries containing person names, city and region names, and car names. It applies a cascade of regular expressions that takes into account the morphology of Swedish proper noun formation and the road nomenclature. The recall/precision performance of the detector is 0.89/0.97.</Paragraph> </Section> <Section position="2" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 6.2 Finding the Participants </SectionTitle> <Paragraph position="0"> The system uses the detected noun groups to identify the physical objects that are involved in the accident. It extracts the headword of each group and associates it with an entity in the ontology. We used parts of the Swedish WordNet as a resource to develop this dictionary (Åke Viberg et al., 2002).</Paragraph> <Paragraph position="1"> We track the entities along the text with a simple coreference resolution algorithm. It assumes that each definite expression corefers with the last sortally consistent (according to the ontology) entity that was mentioned. Indefinite expressions are assumed to be references to previously unmentioned entities. This is similar to the algorithm mentioned in (Appelt and Israel, 1999).
Although this approach is relatively simple, we get reasonable results with it and could use it as a baseline when investigating other approaches.</Paragraph> <Paragraph position="2"> Figure 3 shows an excerpt from a text with the annotation of the participants as well as their coreferences. Olyckan inträffade när [bilen]1 som de fem färdades i körde om [en annan personbil]2. När [den]1 svängde tillbaka in framför [den omkörda bilen]2 fick [den]1 sladd och for med sidan rakt mot fronten på [den mötande lastbilen]3.</Paragraph> <Paragraph position="3"> The accident took place when [the car]1 where the five people were traveling overtook [another car]2.</Paragraph> <Paragraph position="4"> When [it]1 pulled in front of [the overtaken car]2, [it]1 skidded and slid sideways straight into the front of [the oncoming truck]3.</Paragraph> </Section> <Section position="3" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 6.3 Resolution of Metonymy </SectionTitle> <Paragraph position="0"> Use of metonymy, such as alternation between the driver and his vehicle, is frequent in the Swedish press clippings. An improper resolution of it introduces errors in the templates and in the visualization. It can create independently moving graphic entities, i.e. the vehicle and its driver, that should be represented as one single object, a moving vehicle, or should at least stay together.</Paragraph> <Paragraph position="1"> We detect the metonymic relations between drivers and their vehicles. We use either cue phrases like lastbilschauffören ('the truck driver') or the location or instrument semantic roles in phrases like Mannen som färdades i lastbilen ('The man who was traveling in the truck'). We then apply constraints on the detected events and directions to exclude wrong candidates.
For example, given the phrase Mannen krockade med en traktor ('The man collided with a tractor'), we know that the man cannot be the driver of the tractor.</Paragraph> <Paragraph position="2"> We do not yet handle the metonymic relations between parts of vehicles and the vehicles themselves.</Paragraph> <Paragraph position="3"> They are less frequent in the texts we have examined.</Paragraph> </Section> <Section position="4" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 6.4 Marking Up the Events </SectionTitle> <Paragraph position="0"> Events in car accident reports correspond to vehicle motions and collisions. We detect them to be able to visualize and animate the scene actions. To carry out the detection, we created a dictionary of words - nouns and verbs - depicting vehicle activity and maneuvers. We use these words to anchor the event identification, and the semantic roles of their dependents to determine the event arguments.</Paragraph> <Paragraph position="1"> Figure 4 shows a sentence that we translated from our corpus of news texts, where the groups have been marked up and labeled with semantic roles.</Paragraph> <Paragraph position="2"> Gildea and Jurafsky (2002) describe an algorithm to automatically label semantic roles in a general context. They use the semantic frames and associated roles defined in FrameNet (Baker et al., 1998) and train their classifier on the FrameNet corpus. They report a performance of 82 percent.</Paragraph> <Paragraph position="3"> Carsim uses a classification algorithm similar to theirs. However, as there is no lexical resource such as FrameNet for Swedish and no widely available parser, we adapted it. Our classifier uses a more local strategy as well as a different set of attributes.</Paragraph> <Paragraph position="4"> The analysis starts from the words in our dictionary, for which we designed a specific set of frames and associated roles.
The classifier limits the scope of each event to the clause where it appears. It identifies the verb's dependents: noun groups, prepositional groups, and adverbs, which it classifies according to semantic roles.</Paragraph> <Paragraph position="5"> The attributes of the classifier are: Target word: the keyword denoting the event.</Paragraph> <Paragraph position="6"> Head word: the head word of the group to be classified.</Paragraph> <Paragraph position="7"> Syntactic class of head word: noun group, prepositional group, or adverb.</Paragraph> <Paragraph position="8"> Voice of the target word: active or passive.</Paragraph> <Paragraph position="9"> Domain-specific semantic type: dynamic object, static object, human, place, time, cause, or speed.</Paragraph> <Paragraph position="10"> The classifier chooses the role that maximizes the estimated probability of a role given the values of the target, head, and semantic type attributes: role* = argmax_role P(role | target, head, semantic type).</Paragraph> <Paragraph position="12"> If a particular combination of target, head, and semantic type is not found in the training set, the classifier uses a back-off strategy, taking the other attributes into account.</Paragraph> <Paragraph position="13"> We manually annotated a set of 819 examples on which we trained and tested our classifier. We used a random subset of 100 texts as a test set and the rest as a learning set. On the test set, the classifier achieved an accuracy of 90 percent. A classifier based on decision trees built using the ID3 algorithm with the gain ratio measure yielded roughly the same performance.</Paragraph> <Paragraph position="14"> The value of the semantic type attribute is set using domain knowledge. Removing this attribute degraded the performance of the classifier to 80 percent.
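This back-off strategy can be illustrated with a small count-based sketch. The code below is hypothetical, not the actual Carsim classifier: the role maximizing the count-based estimate of P(role | attributes) is chosen when the full (target, head, semantic type) combination occurs in the training counts, and otherwise the classifier falls back to coarser attribute combinations. The particular back-off order is an assumption for illustration.

```python
from collections import Counter

# Back-off order, most specific first (an illustrative assumption).
BACKOFF = [("target", "head", "semtype"),
           ("target", "semtype"),
           ("semtype",)]

def train(examples):
    """examples: (target, head, semtype, role) tuples."""
    counts = {keys: Counter() for keys in BACKOFF}
    for target, head, semtype, role in examples:
        attrs = {"target": target, "head": head, "semtype": semtype}
        for keys in BACKOFF:
            counts[keys][tuple(attrs[k] for k in keys) + (role,)] += 1
    return counts

def classify(counts, target, head, semtype):
    attrs = {"target": target, "head": head, "semtype": semtype}
    for keys in BACKOFF:
        prefix = tuple(attrs[k] for k in keys)
        cands = Counter()
        for key, n in counts[keys].items():
            if key[:-1] == prefix:           # same attribute values
                cands[key[-1]] += n          # accumulate counts per role
        if cands:
            return cands.most_common(1)[0][0]  # argmax of P(role | attrs)
    return None

counts = train([("collided", "car", "dynamic", "ACTOR"),
                ("collided", "truck", "dynamic", "ACTOR"),
                ("collided", "tree", "static", "VICTIM")])
# The unseen head "bus" forces a back-off to (target, semtype).
print(classify(counts, "collided", "bus", "dynamic"))  # ACTOR
```

With relative counts standing in for probabilities, the argmax over the most specific observed attribute combination mirrors the maximization described above.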
When the events have been detected in the text, they can be represented and interpreted in the formal description of the accidents.</Paragraph> <Paragraph position="15"> We observed that event coreferences are very frequent in longer texts: the same action, such as a collision, is reported in several places in the text. As with metonymy, duplicated events in the template entail a wrong visualization. We solve this by unifying as many events as possible, taking metonymy relations into account, and removing the duplicates.</Paragraph> </Section> <Section position="5" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 6.5 Time Processing and Event Ordering </SectionTitle> <Paragraph position="0"> In some texts, the order in which events are mentioned does not correspond to their chronological order. To address this issue and order the events correctly, we developed a module based on the generic TimeML framework (Pustejovsky et al., 2002). We use a machine learning approach to annotate the whole set of events contained in a text and, from this set, we extract the events used specifically by the Carsim template - the Carsim events.</Paragraph> <Paragraph position="1"> TimeML has tags for time expressions (today); for &quot;signals&quot; indicating polarity (not), modality (could), and temporal prepositions and connectives such as for, during, before, and after; for events (crashed, accident); and tags that indicate relations between entities. Among the relations, the TLINKs are the most interesting for our purposes. They express temporal relations between time expressions and events as well as temporal relations between pairs of events.</Paragraph> <Paragraph position="2"> We developed a comprehensive phrase-structure grammar to detect the time expressions, signals, and TimeML events and to assign values to the entities' attributes. The string den tolfte maj ('May 12th') is detected as a time expression with the attribute value=&quot;YYYY-05-12&quot;.
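A rule mapping a phrase such as den tolfte maj to a TimeML-style value can be sketched as follows. This is a toy fragment under stated assumptions: only a handful of Swedish ordinals and month names are listed and a single pattern is used, whereas the grammar it illustrates is far more comprehensive.

```python
import re

# Tiny illustrative lexicons; the real grammar covers far more forms.
ORDINALS = {"första": 1, "andra": 2, "tolfte": 12, "tjugonde": 20}
MONTHS = {"januari": 1, "maj": 5, "november": 11, "december": 12}

PATTERN = re.compile(r"den (\w+) (\w+)")   # e.g. "den tolfte maj"

def timex_value(phrase):
    """Return a TimeML-style value for a date phrase, or None."""
    m = PATTERN.fullmatch(phrase)
    if not m or m.group(1) not in ORDINALS or m.group(2) not in MONTHS:
        return None
    day, month = ORDINALS[m.group(1)], MONTHS[m.group(2)]
    # the year is unspecified in the phrase, hence the YYYY placeholder
    return f"YYYY-{month:02d}-{day:02d}"

print(timex_value("den tolfte maj"))   # YYYY-05-12
```

Python's `re` matches `\w` against Unicode letters by default, so Swedish forms such as första are handled without extra flags.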
We extended the TimeML attributes to store the events' syntactic features. They include the part-of-speech annotation and the verb group structure, i.e. auxiliary + participle, etc.</Paragraph> <Paragraph position="3"> We first apply the phrase-structure rules to detect the time expressions, signals, and events. Let e_1, e_2, e_3, ..., e_n be the events in the order they are mentioned in a text. We then generate TLINKs to relate these events together using a set of decision trees.</Paragraph> <Paragraph position="4"> We apply three decision trees on sequences of two to four consecutive events (e_i, e_i+1[, e_i+2[, e_i+3]]), with the constraint that there is no time expression between them, as such expressions might change the temporal ordering substantially. The output of each tree is the temporal relation holding between the first and last event of the considered sequence, i.e. respectively: adjacent pairs (e_i, e_i+1), pairs separated by one event (e_i, e_i+2), and pairs separated by two events (e_i, e_i+3). The possible output values are simultaneous, after, before, is included, includes, and none. As a result, each event is linked by TLINKs to the three other events immediately after and before it.</Paragraph> <Paragraph position="5"> We built the decision trees automatically using the ID3 algorithm (Quinlan, 1986). We trained them on a set of hand-annotated examples, which consists of 476 events and 1,162 TLINKs.</Paragraph> <Paragraph position="6"> As a set of features, the decision trees use certain attributes of the events considered, temporal signals between them, and some other parameters such as the number of tokens separating the pair of events to be linked.
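The window-based candidate generation just described can be sketched as follows (hypothetical code, not the Carsim module): for events in text order, pairs at distance 1, 2, and 3 are proposed, and any pair with an intervening time expression is skipped, since such an expression could reset the temporal ordering.

```python
def tlink_candidates(events, timex_positions):
    """events: [(token_position, event_id), ...] in text order.
    timex_positions: set of token positions holding time expressions.
    Returns (first_event, second_event, distance) candidate pairs."""
    pairs = []
    for i, (p1, e1) in enumerate(events):
        for dist in (1, 2, 3):            # one decision tree per distance
            if i + dist >= len(events):
                break
            p2, e2 = events[i + dist]
            # constraint: no time expression strictly between the two events
            if any(p1 < t < p2 for t in timex_positions):
                continue
            pairs.append((e1, e2, dist))
    return pairs

events = [(0, "overtook"), (5, "pulled_in"), (9, "skidded"), (14, "crashed")]
# A time expression at token 7 blocks every pair that spans it.
print(tlink_candidates(events, timex_positions={7}))
```

Each returned pair would then be handed to the decision tree for its distance, which outputs one of the temporal relations listed above.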
The complete list of features, with x ranging from 0 to 1, 0 to 2, and 0 to 3 for each tree respectively, and their possible values is: Event_i+x Tense: none, past, present, future, NOT DETERMINED.</Paragraph> <Paragraph position="7"> Event_i+x Aspect: progressive, perfective, perfective progressive, none. temporalSignalInbetween: none, before, after, later, when, still, several.</Paragraph> <Paragraph position="8"> tokenDistance: 1, 2 to 3, 4 to 6, 7 to 10, greater than 10.</Paragraph> <Paragraph position="9"> sentenceDistance: 0, 1, 2, 3, 4, greater than 4.</Paragraph> <Paragraph position="10"> punctuationSignDistance: 0, 1, 2, 3, 4, 5, greater than 5.</Paragraph> <Paragraph position="11"> The process results in an overgeneration of links. This is deliberate: a large set of TLINKs ensures a fine-grained ordering of the events. As the generated TLINKs can be conflicting, we assign each of them a score, which is derived from the C4.5 metrics (Quinlan, 1993).</Paragraph> <Paragraph position="12"> We complement the decision trees with heuristics and with hints from the event interpreter indicating that events are identical. The heuristics represent common-sense knowledge and are encoded as nine production rules. One example is that an event in the present tense is after an event in the past tense. Event identity and the heuristics make it possible to connect events across the time expressions. The TLINKs generated by the rules also have a score that is rule dependent.</Paragraph> <Paragraph position="13"> When all TLINKs are generated, we resolve temporal loops by removing the TLINK with the lowest score within the loop.
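The loop-resolution step can be sketched as follows (hypothetical code, not the Carsim implementation): "before" TLINKs are treated as a scored directed graph, and while a cycle exists, the lowest-scoring link on it is deleted.

```python
def find_cycle(edges):
    """Return the edges of one directed cycle, or None.
    edges: (node_a, node_b, score) triples meaning a BEFORE b."""
    graph = {}
    for e in edges:
        graph.setdefault(e[0], []).append(e)

    def dfs(node, path, visited):
        for edge in graph.get(node, []):
            nxt = edge[1]
            if nxt in visited:               # closes a cycle on this path
                cyc = path + [edge]
                start = next(i for i, p in enumerate(cyc) if p[0] == nxt)
                return cyc[start:]
            found = dfs(nxt, path + [edge], visited | {nxt})
            if found:
                return found
        return None

    for start in graph:
        cycle = dfs(start, [], {start})
        if cycle:
            return cycle
    return None

def resolve_loops(tlinks):
    """Repeatedly drop the weakest TLINK on a cycle until none remain."""
    tlinks = list(tlinks)
    while True:
        cycle = find_cycle(tlinks)
        if cycle is None:
            return tlinks
        tlinks.remove(min(cycle, key=lambda t: t[2]))

links = [("overtake", "skid", 0.9), ("skid", "crash", 0.8),
         ("crash", "overtake", 0.3)]
print(resolve_loops(links))   # the 0.3 link closing the loop is removed
```

Removing only the weakest link per cycle keeps as much of the high-confidence ordering as possible, which matches the scoring rationale above.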
Finally, we extract the Carsim events from the whole set of TimeML events and order them using the relevant TLINKs.</Paragraph> </Section> <Section position="6" start_page="0" end_page="0" type="sub_section"> <SectionTitle> 6.6 Detecting the Roads </SectionTitle> <Paragraph position="0"> The configuration of roads is inferred from the information in the detected events. When one of the involved vehicles makes a turn, this indicates that the configuration is probably a crossroads.</Paragraph> <Paragraph position="1"> Additional information is provided by keyword spotting in the text. Examples of relevant keywords are korsning ('crossing'), rondell ('roundabout'), and kurva ('bend'), which are very likely indicators of the road configuration if seen in the text.</Paragraph> <Paragraph position="2"> These methods are very simple, but the cases where they fail are quite rare. During the evaluation described below, we found no text where the road configuration was misclassified.</Paragraph> </Section> </Section> <Section position="8" start_page="0" end_page="0" type="metho"> <SectionTitle> 7 Evaluation of the Information </SectionTitle> <Paragraph position="0"/> <Section position="1" start_page="0" end_page="0" type="sub_section"> <SectionTitle>
The performance figures are shown in Tables 1 and 2.</Paragraph> <Paragraph position="1"> Table 1: detection of road objects. Total number of objects in the texts: 105. Number of detected objects: 110. Number of correctly detected objects: 94. The system was able to extract or infer all relevant information correctly in 23 of the 50 texts. In order to find out the causes of the errors, we investigated what simplifications of the texts needed to be made for the system to interpret them correctly.</Paragraph> </Section> </Section> <Section position="9" start_page="0" end_page="0" type="metho"> <SectionTitle> 8 Scene Synthesis and Visualization </SectionTitle> <Paragraph position="0"> The visualizer reads its input from the formal description. It synthesizes a symbolic 3D scene and animates the vehicles. We designed the graphic elements in the scene with the help of traffic safety experts.</Paragraph> <Paragraph position="1"> The scene generation algorithm positions the static objects and plans the vehicle motions. It uses rule-based modules to check the consistency of the description and to estimate the 3D start and end coordinates of the vehicles.</Paragraph> <Paragraph position="2"> The visualizer uses a planner to generate the vehicle trajectories. A first module determines the start and end positions of the vehicles from the initial directions, the configuration of the other objects in the scene, and the chain of events, as if there were no accident. Then, a second module alters these trajectories to insert the collisions according to the accident slots in the accident representation (Figure 5).</Paragraph> <Paragraph position="3"> This two-step procedure is justified by the descriptions found in most reports. The car drivers generally start the description of their accident as if it were a normal movement, which is subsequently modified by the abnormal conditions of the accident.</Paragraph> <Paragraph position="3"> Finally, the temporal module of the planner assigns time intervals to all the segments of the trajectories.
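The two-step trajectory planning can be illustrated with a minimal sketch (hypothetical code, not the Carsim planner): a first pass produces the trajectory as if there were no accident, and a second pass alters it so that the vehicle reaches the collision point at the collision step.

```python
def normal_trajectory(start, end, steps):
    """Step 1: straight-line waypoints from start to end (no-accident plan)."""
    (x0, y0), (x1, y1) = start, end
    return [(x0 + (x1 - x0) * t / steps, y0 + (y1 - y0) * t / steps)
            for t in range(steps + 1)]

def insert_collision(traj, collision_point, at_step):
    """Step 2: force the trajectory through the collision point;
    the vehicle stops at the impact."""
    altered = list(traj)
    altered[at_step] = collision_point
    return altered[: at_step + 1]

path = normal_trajectory((0.0, 0.0), (10.0, 0.0), steps=5)
crash = insert_collision(path, collision_point=(6.0, 1.0), at_step=3)
print(crash)   # [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (6.0, 1.0)]
```

Separating the nominal plan from the collision edit mirrors the narrative structure noted above, where drivers describe a normal movement first and the accident as a deviation from it.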
Figure 6 shows two screenshots that the Carsim visualizer produces for the text above. It should be noted that the graphic representation is intended to be iconic in order not to convey any meaning that is not present in the text.</Paragraph> </Section> </Paper>