File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/metho/06/n06-1018_metho.xml
Size: 11,479 bytes
Last Modified: 2025-10-06 14:10:11
<?xml version="1.0" standalone="yes"?> <Paper uid="N06-1018"> <Title>Understanding Temporal Expressions in Emails</Title> <Section position="4" start_page="137" end_page="139" type="metho"> <SectionTitle> 3 Representing Times in Natural Language </SectionTitle> <Paragraph position="0"> This section provides a concise overview of TCNL; readers are referred to (Han and Kohlhase, 2003; Han et al., 2006) for more detail.</Paragraph> <Paragraph position="1"> TCNL has two major components: a constraint-based model for human calendars and a representational language built on top of the model. Unlike other representations such as Zeit-Gram (Stede and Haas, 1998), TOP (Androutsopoulos, 1999), and TimeML/Timex3 (Saurí et al., 2006), the language component of TCNL is essentially &quot;calendar-agnostic&quot;: any temporal unit can be plugged into a formula once it is defined in the calendar model, i.e., the calendar model serves as the lexicon for the TCNL language.</Paragraph> <Paragraph position="2"> Fig. 2 shows a partial model for the Gregorian calendar used in TEA. The entire calendar model is basically a constraint graph with partial ordering. The nodes labeled with &quot;year&quot; etc. represent temporal units (or variables when viewed as a constraint satisfaction problem (CSP) (Ruttkay, 1998)), and each unit can take on a set of possible values. The undirected edges represent constraints among the units, e.g., the constraint between month and day mandates that February cannot have more than 29 days.</Paragraph> <Paragraph position="3"> A temporal expression in NL is then viewed as assigning values to some of the units, e.g., &quot;Friday the 13th&quot; assigns values only to the units dow (day-of-week) and day. 
An interval-based AC-3 algorithm with a chronological backtracking mechanism is used to derive consistent assignments to the other units, which allows us to iterate to any one of the possible Fridays the 13th.</Paragraph> <Paragraph position="4"> The ordering among the units is designated by two relations: measurement and periodicity (arrows in Fig. 2). These relations are essential for supporting various operations provided by the TCNL language, such as determining the temporal ordering of two time points, performing arithmetic, and changing temporal granularity. For example, to interpret the expression &quot;early July&quot;, we identify that July is a value of unit month, and month is measured by day. We then obtain the size of July in terms of day (31) and designate the first 10 days (31/3) as the &quot;early&quot; part of July.</Paragraph> <Paragraph position="5"> Internally the calendar model is further partitioned into several components, and different components are aligned using non-binary constraints (e.g., in Fig. 2 the year component and the week component are aligned at the day and dow units).</Paragraph> <Paragraph position="6"> This is necessary because the top units in these components are not periodic within one another. All of the operations are then extended to deal with multiple calendar components.</Paragraph> <Paragraph position="7"> Built on top of the calendar model is the typed TCNL language. The three major types are coordinates (time points; e.g., {sep,6 day}), quantities (durations; e.g., |1 hour|) and enumerations (sets of points, including intervals; e.g., [{wed},{fri}] for Wednesday and Friday). 
More complex expressions can be represented by using various operators and relations, in such a way that syntactically different formulae can be evaluated to denote the same date;</Paragraph> <Paragraph position="9"> |} (&quot;next Tuesday&quot;) can denote the same date.</Paragraph> <Paragraph position="10"> Associated with the operators are type and granularity requirements. For example, when a focus is specified down to second granularity, the formula {now+|1 day|} will return a coordinate at the day granularity, essentially stripping away information finer than day. This is because the operator '+' (called fuzzy forward shifting) requires the left-hand side operand to have the same granularity as the right-hand side operand. Type coercion can also happen automatically if it is required by an operator. For example, the operator '@' (ordinal selection) requires the right-hand side operand to be of type enumeration. When the operator is presented with a coordinate such as {>= } (some point in the future), the coordinate is coerced into an enumeration so that the ordinal operator can select a requested element out of it. These designs make granularity change and re-interpretation part of a transparent process. Table 2 lists the operators in the TCNL language. (Footnote: the f denotes the relation &quot;finishes&quot; (Allen, 1984); the formula denotes a set of coordinates no later than a Saturday noon.)</Paragraph> <Paragraph position="11"> Most under-specified temporal expressions still lack the information needed to anchor them. For example, it is not clear what to make of &quot;on Wednesday&quot; with no context. In TCNL more information can be supplied by using one of the coordinate prefixes: the '+'/'-' prefix signifies the relation of a coordinate with the focus (after/before the focus), and the 'f'/'p' prefix indicates the relation of a coordinate with the speech time (future/past). 
For example, the Wednesday in &quot;the company will announce on Wednesday&quot; is represented as +f{wed}, while &quot;the company announced on Wednesday&quot; is represented as -p{wed}. When evaluating these formulae, TEA will rewrite the former into {|1</Paragraph> <Paragraph position="13"> essentially trying to find the nearest Wednesday either in the future or in the past. Since TCNL formulae can be embedded, prefixed coordinates can also appear inside a more complex formula; e.g., .</Paragraph> <Paragraph position="14"> Note that TCNL itself does not provide a mechanism to instantiate the temporal focus (' '). The responsibility for shifting the focus whenever necessary (focus tracking) lies with TEA, which is described in the next section.</Paragraph> <Paragraph position="15"> This denotes a possible range of dates, but it is still different from an enumeration.</Paragraph> </Section> <Section position="5" start_page="139" end_page="140" type="metho"> <SectionTitle> 4 TEA: Temporal Expression Anchorer </SectionTitle> <Paragraph position="0"> The input to our system TEA is English text with marked-up temporal expressions, and the output is a time string for each temporal expression. The format of a time string is similar to the ISO 8601 scheme: for a time point the format is YYYYMMDDTHHMMSS (T is a separator), and for an interval it is a pair of points separated by '/' (slash). Whenever a slot lacks information, we use '?' (question mark) in its place. If a point can reside anywhere between two bounds, we use (lower..upper) to represent it. Table 3 shows the TEA output over the example email given in Fig. 1 (min and max are the minimal and the maximal time points TEA can reason with).</Paragraph> <Paragraph position="1"> TEA uses the following procedure to anchor each temporal expression: 1. 
The speech time (variable now) and the focus (' ') are first set to a timestamp (e.g., the received date of an email).</Paragraph> <Paragraph position="2"> 2. For each temporal expression, its nearest verb chunk is identified using the part-of-speech tags of the sentence. An expression associated with a verb in the past tense or the present imperfective is given the prefix &quot;-p&quot; in its TCNL formula; otherwise it is given &quot;+f&quot;.</Paragraph> <Paragraph position="3"> 3. A finite-state parser is then used to transduce an expression into its TCNL formula. At the parsing stage the tense and granularity information is available to the parser.</Paragraph> <Paragraph position="4"> This is of course a simplification; future work needs to be done to explore other possibilities.</Paragraph> </Section> <Section position="6" start_page="140" end_page="141" type="metho"> <SectionTitle> 4. The produced TCNL formula (or formulae </SectionTitle> <Paragraph position="0"> when ambiguity arises) is then evaluated with the speech time and the current focus. In case of ambiguity, one formula will be chosen based on certain heuristics (below). The result of the evaluation is the final output for the expression.</Paragraph> <Paragraph position="1"> 5. Recency-based focus tracking: we use the following procedure to determine whether the result obtained above can replace the current focus (below). If the result is an ambiguous coordinate (i.e., it denotes a possible range of points) and one of the bounds is min or max, we use the other bound as the new focus; if that is not possible, we keep the focus unchanged. On the other hand, if the result is an enumeration, we go through a similar procedure to avoid using an enumeration with a min/max bound as the new focus. 
Finally, no quantity can become a focus.</Paragraph> <Paragraph position="2"> Note that in Step 3 the decision to make partial semantics of a temporal expression available to our parser is based on the following observation: consider the two expressions below. Both expressions share the same &quot;X before Y&quot; pattern, but their interpretations are different. The key to discriminating between the two is to compare the granularities of X and Y: if Y is at a coarser granularity then the first interpretation should be adopted. (Footnote: de denotes a relation &quot;during or equal&quot; (Allen, 1984).)</Paragraph> <Paragraph position="3"> In Step 4 we use the following procedure to disambiguate the result: 1. Remove any candidate that results in an inconsistency when solving the calendar CSP.</Paragraph> <Paragraph position="4"> 2. If the result is meant to be a coordinate, pick the one that is closest to the focus.</Paragraph> <Paragraph position="5"> 3. If the result is supposed to be an enumeration, pick the one whose starting point is closest to the focus and whose length is the shortest. 4. Otherwise pick the first one as the result.</Paragraph> <Paragraph position="6"> For example, if the current time is 2:00 pm, for the expression &quot;at 3&quot; with a present/future tense, the best answer is 15:00. For the expression &quot;from 3 to 5&quot;, the best answer is from 3 pm to 5 pm.</Paragraph> <Paragraph position="7"> When deciding whether a temporal expression can become the next focus, we use simple heuristics to rule out any expression that behaves like a noun modifier. 
This is motivated by the following example (timestamp: 19970919T103315): IT basically analyses the breakdown on labor costs and compares our 1998 labor costs with their demands for 1999-2000.</Paragraph> <Paragraph position="8"> ...</Paragraph> <Paragraph position="9"> I will check mail on Sunday and see any feedback.</Paragraph> <Paragraph position="10"> Without blocking the expression 1999-2000 from becoming the focus, the last expression would be incorrectly anchored in the year 2000. The key observation here is that a noun-modifying temporal expression usually serves as a temporal co-reference instead of representing a new temporal entity in the discourse. These references tend to have a more confined effect on anchoring the subsequent expressions.</Paragraph> </Section> </Paper>
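The disambiguation rules 2 and 3 of Step 4, together with the time string format of Section 4, can be illustrated with a minimal Python sketch. This is not the TEA implementation: the function names (time_string, pick_coordinate, pick_enumeration, at), the candidate lists, and the chosen date are assumptions made for illustration, and the sketch ignores the calendar CSP consistency check (rule 1) and the tense prefixes.

```python
from datetime import datetime, timedelta


def time_string(point):
    """Format a time point as YYYYMMDDTHHMMSS, the layout described in Section 4."""
    return point.strftime("%Y%m%dT%H%M%S")


def pick_coordinate(candidates, focus):
    """Rule 2: among candidate time points, pick the one closest to the focus."""
    return min(candidates, key=lambda point: abs(point - focus))


def pick_enumeration(candidates, focus):
    """Rule 3: pick the interval whose start is closest to the focus;
    ties are broken in favour of the shortest interval."""
    return min(candidates, key=lambda iv: (abs(iv[0] - focus), iv[1] - iv[0]))


# Hypothetical focus at 2:00 pm; the date is illustrative only.
focus = datetime(1997, 9, 19, 14, 0, 0)
midnight = focus.replace(hour=0)


def at(hour):
    """Hypothetical helper: a time point on the focus day at the given hour."""
    return midnight + timedelta(hours=hour)


# "at 3" is ambiguous between 3 am and 3 pm.
best_point = pick_coordinate([at(3), at(15)], focus)
print(time_string(best_point))  # 19970919T150000

# "from 3 to 5" is ambiguous between a morning and an afternoon interval.
best_interval = pick_enumeration([(at(3), at(5)), (at(15), at(17))], focus)
print("/".join(time_string(p) for p in best_interval))  # 19970919T150000/19970919T170000
```

With the focus at 2:00 pm, both rules select the afternoon reading, which matches the "at 3" and "from 3 to 5" examples given for Step 4.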