<?xml version="1.0" standalone="yes"?>
<Paper uid="C02-1071">
  <Title>Integrating Shallow Linguistic Processing into a Unification-based Spanish Grammar</Title>
  <Section position="4" start_page="3" end_page="4" type="metho">
    <SectionTitle>
3 Latch: The Linguistic Tagger and
Chunker
</SectionTitle>
    <Paragraph position="0"> Latch was rstly conceived as a lexical disambiguation tool based on analyses promotion/reduction by means of weighted symbolic contextrules(Porta,1996).</Paragraph>
    <Paragraph position="1"> It is a lean formalism where lexical information, including fullform, lemma and MorphoSyntactic Description (MSD), is expressed byregularexpressions. Thepivotsoftherules, which specify the tokens to be disambiguated, aresequences oflexical elementsthatreceivea voteontheir morphosyntacticanalyses. Votes may be positive or negative to promote or to eliminatethem,respectively. Inaddition,aprecondition may be expressed in the pivots to specify the typeofambiguity the rule is referredto. Linear generalizations areexpressed bymeansofcontextualoperatorsforimmediate,  unboundedandconstrainedunboundedcontextualconditions. null  Besides phrase structure rules, a set of word structure rules are applied at the parsing component performing morphosyntactic analysis.</Paragraph>
    <Paragraph position="2">  Inafurtherdevelopmentstate,theLatchformalismwasextendedsothatitcanalsobeused null to mark chunks (or intra{clausal partial constituents)(Abney,1996)andusethatinforma- null tion for PoS disambiguation. This interaction of PoS disambiguation and partial parsing reduces the eort needed for writing rules considerably and improves results (Marimon and Porta,2000).</Paragraph>
  </Section>
  <Section position="5" start_page="4" end_page="56" type="metho">
    <SectionTitle>
4 Integrating PoS Tags and Chunks
into the Grammar
</SectionTitle>
    <Paragraph position="0"> into the Grammar The integration of shallow processing techniques(PoStaggingandpartialparsing)isfully null supported by the open architecture of ALEP, which allowseasyintegrationofexternalmodules. null  Oursystemrequiressomechangestothedefault architecture of the ALEP system where boththeTHsystemandthemorphographemic analysis component are replaced by a unique external preprocessing module (Latch). It also requires the lifting componenttobeextendedinordertotransfertheinformationde- null livered by the external preprocessing module  intothehigh{levellinguisticprocessingcomponents. Thechangestobemadeinthehigh{level linguistic processing components, however, are very thin: word structurerules havetobeextended, but phrase structure rules and lexical entriescanbeleftuntouched.</Paragraph>
    <Section position="1" start_page="4" end_page="56" type="sub_section">
      <SectionTitle>
4.1 Text Structure to Linguistic
Structure Rules
</SectionTitle>
      <Paragraph position="0"> Latch is currently being used to annotate the 125 million word Corpus Diacronico del Espa~nol (CORDE) and 125 million word Corpus de Referencia del Espa~nol Actual (CREA)by the Departamento de Lingustica Computational de la Real Academia Espa~nola. Some results on the rst version of the tool can be found in (Sanchez et al., 1999).</Paragraph>
      <Paragraph position="1"> LDsrepresentingmorphemes,fullforms,andthe top node establishing the axiom of the grammar. null  Structure rules, then, are distributed according to the dierenttypes of structural units being involved in the parsing operation: `morphemestowords'(wordstructurerules)or `wordstosentences'(phrasestructurerules).</Paragraph>
      <Paragraph position="2">  Integrating PoS information in a system like ALEP means dening TS{LS rules propagatingthemorphosyntacticinformationassociated null tofullforms(i.e. PoStagandlemma)delivered by the tagger to the relevant morphosyntactic featuresatthelexicalentriesofthegrammar.</Paragraph>
      <Paragraph position="3"> The integration of PoS tags into ALEP is done at the level `M'.By using the lowest tag level to lift the lexical information associated tofullforms, wecanpropagatethe ambiguities which can not be reliably solved by the shallowprocessingtooltothegrammarcomponent, null thusensuringthattheaccuracyofthegrammar remainsthesame.</Paragraph>
      <Paragraph position="4"> (1)shows the rule we dened to lift the tag  Similar to the integration of PoS information, theintegrationofchunkmark{upsintheALEP system requires TS{LS rules to convert them into LD data structures used by the linguistic processingcomponentsofALEP.</Paragraph>
      <Paragraph position="5">  Normally, this will be the sentence node, though it can also be any phrasal node when partial input strings are to be processed.</Paragraph>
      <Paragraph position="6">  The output of the lifting process is a Partial Linguistic Structure (PLS) where the hierarchical relations between the dierent structural elements is expressed in terms of week dominance relations.</Paragraph>
      <Paragraph position="7"> The integration of chunk mark{ups into ALEP is done at the level 'W'. By integrating chunk mark{ups at the intermediate level, weavoidmodifyingphrasestructureruleswhich buildupaLDontopoftheconvertedLDs: (i) attaching post{head sisters (modiers and/or complementstotherightoftheheadelement), (ii) and/or attaching modiers and/or speciers to the left of the head element when the chunkhasonlybeenpartially recognized. Furthermore,weavoidinterference withthesetof phrasestructureruleswhichbuild upthesame type of LDs. These rules are maintained to build up nodes thathave not been marked up bythepreprocessingmodule.</Paragraph>
      <Paragraph position="8">  The system we propose, in addition, integratesintothehigh{levelcomponentsofALEP null LDswhichdonotneedtobere-builtbyphrase structure rules, since, even though they are quiteunderspecied w.r.t. theheadelementof thechunk(theyonlycontaininformationabout  itspart{of{speech),theyalreadyspecifysyntactic and semantic information about the non{ head elements that have been attached to the</Paragraph>
    </Section>
    <Section position="2" start_page="56" end_page="56" type="sub_section">
      <SectionTitle>
4.2 Word Structure Rules
</SectionTitle>
      <Paragraph position="0"> Besides the TS{LS rules wehave presented, the strategy we propose also requires unary word structure rules to consolidate the structuralnodes provided by the`lift' operationfor thenewtags`M'and`W'.</Paragraph>
      <Paragraph position="1">  Theserules,inaddition,areinchargeofpercolating the linguistic information of the head element of the chunk, which is encoded in the  lexicon,tothemothernode,whichalreadycontainsinformationaboutthenon{headelements null  These rules are applied when parsing words to sentences, whereas lifted chunk mark{ups are dealt with word structure rules (cf. section 4.2).</Paragraph>
      <Paragraph position="2">  This strategy,however, requires very specialized TS{ LS rules not only w.r.t. the category of the head element (noun, verb, adjective, adverb) but also the number, category (determiner, adjective, adverb, auxiliary, ...) and type (denite, indenite, ...) of non{head elements.  notnd aspecic lexical entrytoapply.Note that having default lexical entries in a system like ALEP increases ambiguity, and, thus, the parsing search space, unless a mechanism is used to restrict as much as possible the templatesthatareactivated. Theintegrationofthe tagger, which supplies the PoS information to thelinguisticprocessingmodulesofoursystem, allowsustoincrease robustnesswhile avoiding increaseinPoSambiguity.</Paragraph>
      <Paragraph position="3">  Therearetwobasicwaystodenedefaultlexicalentries. Oneistoimplementunderspecied lexical entry templates assigned to eachmajor wordclasssuchthat,whileparsing,thesystem llsinthemissinginformationofeachunknown  word(Horiguchietal.,1995;;MusicandNavarretta,1996;; Mitsuishi et al., 1998;; Groverand Lascarides,2001). In theotherapproach,very detailed default lexical entries for eachmajor wordclassaredened.</Paragraph>
      <Paragraph position="4"> Theapproachwehavefollowedfallsundera middle type. Wehave dened several default lexical entry templates for the dierent majorwordclasses|verbs,nouns,adjectivesand null  adverbs|whichcovertheirmostfrequentsubcategorization frames. These templates, however,areunspeciedw.r.t. thosefeatureswhich encode the subcategorization restrictions imposed on their subjects and complements, e.g.</Paragraph>
      <Paragraph position="5"> marking prepositions, lexical semantics, etc.</Paragraph>
      <Paragraph position="6"> This information is lled by the application of phrasestructurerules.</Paragraph>
      <Paragraph position="7"> First experiments testing the eect of our default lexical entries, however, showed that, bycovering the most frequent subcategorization frames, we ensured that the accuracy of the grammar |percentage of input sentences that received the correct analysis |remained the same. The precision of the grammar | percentage of input sentences thatreceived no superuous(orwrong)analysis|,however,was verylow,sincewecouldnotrestrictthelexical templatetobeactivatedforeachwordtype.</Paragraph>
      <Paragraph position="8"> To improve the precision of the system we extended the PoS tags of our external lexicon (i.e. thelexiconweuseformorphosyntacticannotation in Latch) so that they included syntacticinformationaboutthesubcategorizedfor null elements (category, marking prepositions, ...).</Paragraph>
      <Paragraph position="9"> Thisallowedustoreducethenumberofdefault lexicaltemplatestobeapplied.</Paragraph>
    </Section>
  </Section>
class="xml-element"></Paper>