<?xml version="1.0" standalone="yes"?>
<Paper uid="W06-2712">
  <Title>Representing and Accessing Multi-Level Annotations in MMAX2</Title>
  <Section position="2" start_page="0" end_page="75" type="metho">
    <SectionTitle>
2 The MMAX2 Data Model
</SectionTitle>
    <Paragraph position="0"> Like most current annotation tools, MMAX2 makes use of stand-off annotation. The base data is represented as a sequence of &lt;word&gt; elements with exactly one PCDATA child each. Normally, these PCDATA children represent orthographical words, but larger or smaller units (e.g. characters) are also possible, depending on the required granularity of the representation. The order of the elements in the base data file is determined by the order of the segments that they belong to.</Paragraph>
    <Paragraph position="1"> Annotations are represented in the form of &lt;markable&gt; elements which reference one or more base data elements by means of a span attribute. Each markable is associated with exactly one markable level which has a unique, descriptive name and which groups markables that belong to the same category or annotation dimension. Each markable level is stored in a separate XML file which bears the level name as an XML namespace. The ID of a markable must be unique within the level that it belongs to. Markables can carry arbitrarily many features in the common attribute=value format. It is by means of these features that the actual annotation information is represented. For each markable level, admissible attributes and possible values are defined in an annotation scheme XML file (not shown).</Paragraph>
    <Paragraph position="2"> These annotation schemes are much more powerful for expressing attribute-related constraints than e.g. DTDs. The following first example shows the result of the segmentation of the sample base data. The participant attribute contains the identifier of the speaker that is associated with the respective segment.</Paragraph>
    <Paragraph position="3">  The next example contains markables representing the nominal and verbal chunks in the sample base data.</Paragraph>
    <Paragraph position="4"> 1. Every markable is defined with reference to the base data. Markables on the same or different levels are independent and ignorant of each other, and only related indirectly, i.e. by means of base data elements that they have in common (see footnote 3). Structural relations like embedding ([['s] just [a specification]]) can only be determined with recourse to the base data elements that each markable spans. This lazy representation makes it simple and straightforward to add markables and entire markable levels to existing annotations. It is also a natural way to represent non-hierarchical relations like overlap between markables. For example, a segment break runs through the nominal chunk represented by markable markable_7836 ([a specification]) in the example above. If the segment markables were defined in terms of the markables contained in them, this would be a problem because the nominal chunk crosses a segment boundary. The downside of this lazy representation is that more processing is required, e.g. for querying, when the structural relations between markables have to be determined.</Paragraph>
    <Paragraph position="5"> 2. Markables can be discontinuous. A markable normally spans a sequence of base data elements. Each connected subsequence of these is called a markable fragment. A discontinuous markable is one that contains more than one fragment, like markable markable_7838 ([the XML format]) above. Actually, this markable exemplifies what could be called discontinuous overlap because it not only crosses a segment boundary, but also has to omit elements from an intervening segment by another speaker. Footnote 3: Note that this merely means that markables are not defined in terms of other markables, while they can indeed reference each other: In the above example, markable markable_7837 (['s just a specification]) uses an associative relation (in this case namedsubject) to represent a reference to markable markable_7834 ([That]) on the same level. References to markables on other levels can be represented by prefixing the markable ID with the level name.</Paragraph>
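    <Paragraph><![CDATA[ The two properties above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MMAX2's own implementation: base data elements are modelled as integer positions, the positions in the example are made up, and the helper names fragments, embeds and crosses are hypothetical.

```python
from typing import List, Set


def fragments(span: Set[int]) -> List[List[int]]:
    """Split a markable's base data positions into connected fragments."""
    result: List[List[int]] = []
    for pos in sorted(span):
        if result and pos == result[-1][-1] + 1:
            result[-1].append(pos)   # position extends the current fragment
        else:
            result.append([pos])     # gap found: start a new fragment
    return result


def embeds(outer: Set[int], inner: Set[int]) -> bool:
    """True if every element spanned by `inner` is also spanned by `outer`."""
    return inner.issubset(outer)


def crosses(a: Set[int], b: Set[int]) -> bool:
    """True if the markables share elements but neither embeds the other."""
    return bool(a & b) and not a.issubset(b) and not b.issubset(a)


# Hypothetical positions, chosen only for illustration.
segment = {0, 1, 2, 3}
chunk_crossing = {3, 4}          # runs over the segment's right boundary
chunk_discontinuous = {5, 6, 9}  # omits positions 7 and 8

print(fragments(chunk_discontinuous))    # [[5, 6], [9]]
print(embeds(segment, {1, 2}))           # True
print(crosses(segment, chunk_crossing))  # True
```

 Because both checks look only at shared base data elements, a new markable level can be added without touching existing ones; the cost is that such relations have to be recomputed whenever they are needed.]]></Paragraph>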
    <Section position="1" start_page="73" end_page="74" type="sub_section">
      <SectionTitle>
3 Accessing Data From Within MMAX2
3.1 Visualization
</SectionTitle>
      <Paragraph position="0"> When an MMAX2 document is loaded, the main display contains the base data text plus annotation-related information. This information can comprise line breaks (e.g. one after each segment), markable feature values (e.g. the participant value at the beginning of each segment), literal text (e.g. a tab character after the participant value), markable customizations, and markable handles.</Paragraph>
      <Paragraph position="1"> The so-called markable customizations are in charge of displaying text in different colors, fonts, font styles, or font sizes depending on a markable's features. The order in which they are applied to the text is determined by the order of the currently available markable levels. Markable customizations are processed bottom-up, so markable levels should be ordered in such a way that levels containing smaller elements (e.g. POS tags) are on top of those levels containing larger elements (chunks, segments etc.). This way, smaller elements will not be hidden by larger ones.</Paragraph>
      <Paragraph position="2"> When it comes to visualizing several, potentially embedded or overlapping markables, the so-called markable handles are of particular importance. In their simplest form, markable handles are pairs of short strings (most often pairs of brackets) that are displayed directly before and after each fragment of a markable. When two or more markables from different levels start at the same base data element, the nesting order of the markables (and their handles) is determined on the basis of the order of the currently available markable levels. The color of markable handles can also be customized depending on a markable's features. Figure 1 gives an idea of what the [...] handles and text background for chunk markables with type=predication are rendered in light gray. Other handles are rendered in a darker color. Markable handles are sensitive to mouse events: resting the mouse pointer over a markable handle will highlight all handles of the pertaining markable. Reasonable use of markable customizations and handles allows for convenient visualization of even rather complex annotations.</Paragraph>
    </Section>
    <Section position="2" start_page="74" end_page="75" type="sub_section">
      <SectionTitle>
3.2 Querying
</SectionTitle>
      <Paragraph position="0"> MMAX2 includes a query console which can be used to formulate simple queries using a special multi-level query language called MMAXQL. A query in MMAXQL consists of a sequence of query tokens which describe elements (i.e. either base data elements or markables) to be matched, and relation operators which specify which relation should hold between the elements matched by two adjacent query tokens. A single markable query token has the form string/conditions where string is an optional regular expression and conditions specifies which feature(s) the markable should match. The simplest condition is just the name of a markable level, which will match all markables on that level. If a regular expression is also supplied, the query will return only the matching markables. The query [Aa]n?\s.+/chunks will return all markables from the chunks level that begin with the indefinite article (see footnote 4). Markables with particular features can be queried by specifying the desired attribute-value combinations. Footnote 4: The space character in the regular expression must be masked as \s because otherwise it will be interpreted as a query token separator.</Paragraph>
      <Paragraph position="1"> The following query, for example, will return all markables from the chunks level with a type value of either nn or demonstrative: /chunks.type={nn,demonstrative} If a particular value is defined for exactly one attribute on exactly one markable level only, both the level and the attribute name can be left out in a query, rendering queries very concise (cf. the access to the meta level below).</Paragraph>
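      <Paragraph><![CDATA[ What a single markable query token selects can be mimicked in Python. The sketch below is not MMAXQL's implementation. It assumes that markables are plain dictionaries with the hypothetical keys level, text and features, that the regular expression has to match the whole markable string, and that the type values assigned to the sample markables are made up for illustration.

```python
import re
from typing import Any, Dict, Iterable, List, Optional, Set

Markable = Dict[str, Any]


def select(markables: Iterable[Markable], level: str,
           pattern: Optional[str] = None,
           attribute: Optional[str] = None,
           values: Optional[Set[str]] = None) -> List[Markable]:
    """Return the markables on `level` that satisfy the optional conditions."""
    hits = []
    for m in markables:
        if m["level"] != level:
            continue
        # Assumption: the regular expression must cover the whole string.
        if pattern is not None and re.fullmatch(pattern, m["text"]) is None:
            continue
        # Keep only markables whose attribute has one of the listed values.
        if attribute is not None and m["features"].get(attribute) not in (values or set()):
            continue
        hits.append(m)
    return hits


# Made-up markables, for illustration only.
data = [
    {"level": "chunks", "text": "a specification", "features": {"type": "nn"}},
    {"level": "chunks", "text": "That", "features": {"type": "demonstrative"}},
    {"level": "chunks", "text": "'s just a specification", "features": {"type": "predication"}},
    {"level": "segment", "text": "an example", "features": {}},
]

# Counterpart of the query [Aa]n?\s.+/chunks
article_chunks = select(data, "chunks", pattern=r"[Aa]n?\s.+")
print([m["text"] for m in article_chunks])

# Counterpart of the query /chunks.type={nn,demonstrative}
typed_chunks = select(data, "chunks", attribute="type", values={"nn", "demonstrative"})
print([m["text"] for m in typed_chunks])
```

 With the sample data, the first call keeps only the chunk "a specification", and the second keeps the chunks "a specification" and "That".]]></Paragraph>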
      <Paragraph position="2"> Relation operators can be used to connect two query tokens to form a complex query.</Paragraph>
      <Paragraph position="3"> The set of supported sequential and hierarchical relation operators includes meets (default), starts, starts_with, in, dom, equals, ends, ends_with, overlaps_right, and overlaps_left. Whether two markables stand in a certain relation is determined with respect to the base data elements that they span. In the current early implementation, for all markables (including discontinuous ones), only the first and last base data elements are considered. The result of a query can directly be used as the input to another query. The following example gives an idea of what a more complex query can look like. The query combines the segment level, the meta level (which contains markables representing e.g. pauses, emphases, or sounds like breathing or mike noise), and the base data level to retrieve those instances of you know from the ICSI Meeting corpus that occur in segments spoken by female speakers which also contain a pause or an emphasis (represented on the meta level): '[Yy]ou know' in (/participant={f.*} dom /{pause,emphasis}) The next query shows how overlap can be handled. It retrieves all chunk markables along with their pertaining segments by getting two partial lists and merging them using the operator or.</Paragraph>
      <Paragraph position="4"> (/chunks in /segment) or (/chunks overlaps_right /segment)</Paragraph>
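      <Paragraph><![CDATA[ The idea of deciding relations from the first and last base data element alone can be sketched as follows. The exact semantics of the MMAXQL operators are not spelled out here, so the readings of in, dom, equals and overlaps_right below are assumptions, as are the integer positions and all function names.

```python
from typing import Set, Tuple


def extent(span: Set[int]) -> Tuple[int, int]:
    """Reduce a (possibly discontinuous) span to its first and last position."""
    return min(span), max(span)


def rel_equals(a: Set[int], b: Set[int]) -> bool:
    return extent(a) == extent(b)


def rel_in(a: Set[int], b: Set[int]) -> bool:
    # Assumed reading: a lies within the first/last boundaries of b.
    a_first, a_last = extent(a)
    b_first, b_last = extent(b)
    return b_first <= a_first and a_last <= b_last


def rel_dom(a: Set[int], b: Set[int]) -> bool:
    # Assumed reading: dom is the inverse of in.
    return rel_in(b, a)


def rel_overlaps_right(a: Set[int], b: Set[int]) -> bool:
    # Assumed reading: a begins inside b and ends after b's last element.
    a_first, a_last = extent(a)
    b_first, b_last = extent(b)
    return b_first < a_first <= b_last < a_last


# Hypothetical spans, for illustration only.
segment = {0, 1, 2, 3}
chunks = {"inside": {1, 2}, "crossing": {3, 4}, "outside": {6, 7}}

# Two partial lists merged, in the spirit of the `or` query above.
contained = [name for name, span in chunks.items() if rel_in(span, segment)]
crossing = [name for name, span in chunks.items() if rel_overlaps_right(span, segment)]
print(contained + crossing)     # ['inside', 'crossing']

# Only first and last positions count, so gaps are ignored.
print(rel_dom({1, 5}, {2, 3}))  # True
```

 The last call shows a consequence of considering only the first and last element: under these assumptions a discontinuous markable is treated as if it covered everything between its outer boundaries.]]></Paragraph>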
    </Section>
  </Section>
</Paper>