<?xml version="1.0" standalone="yes"?>
<Paper uid="H01-1046">
  <Title>LaTaT: Language and Text Analysis Tools</Title>
  <Section position="3" start_page="0" end_page="0" type="metho">
    <SectionTitle>
DUTY RESPONSIBILITY
</SectionTitle>
    <Paragraph position="0">
duty, modified-by adjectives: fiduciary 319, active 251, other 82, official 76, additional 47, administrative 44, military 44, constitutional 41, reserve 24, high 23, moral 21, double 16, day-to-day 15, normal 15, specific 15, assigned 14, extra 13, operating 13, temporary 13, corporate 12, peacekeeping 12, possible 12, regular 12, retaliatory 12, heavy 11, routine 11, sacred 11, stiff 11, congressional 10, fundamental 10, hazardous 10, main 10, patriotic 10, punitive 10, special 10
responsibility, modified-by adjectives: more 107, full 92, fiduciary 89, primary 88, personal 79, great 69, financial 64, fiscal 59, social 59, moral 48, additional 46, ultimate 39, day-to-day 37, special 37, individual 36, legal 35, other 35, corporate 30, direct 30, constitutional 29, given 29, overall 29, added 28, sole 25, operating 23, broad 22, political 22, heavy 20, main 18, shared 18, professional 17, current 15, federal 14, joint 14, enormous 13, executive 13, operational 13, similar 13, administrative 10, fundamental 10, specific 10
duty, object-of verbs: have 253, assume 190, perform 153, do 131, impose 118, breach 112, carry out 79, violate 54, return to 50, fulfill 44, handle 42, resume 41, take over 35, pay 26, see 26, avoid 19, neglect 18, shirk 18, include 17, share 17, discharge 16, double 16, relinquish 16, slap 16, divide 14, split 13, take up 13, continue 11, levy 11, owe 10
responsibility, object-of verbs: have 747, claim 741, take 643, assume 390, accept 220, bear 187, share 103, deny 86, fulfill 53, meet 48, feel 47, retain 47, shift 47, carry out 45, take over 41, shoulder 29, escape 28, transfer 28, delegate 26, give 25, admit 23, do 21, acknowledge 20, exercise 20, shirk 20, divide 19, get 19, include 19, assign 18, avoid 17, put 17, recognize 17, hold 16, understand 16, evade 15, disclaim 12, handle 12, turn over 12, become 11, expand 11, relinquish 11, show 11
</Paragraph>
  </Section>
  <Section position="4" start_page="0" end_page="0" type="metho">
    <SectionTitle>
RELATION DESCRIPTION EXAMPLE
</SectionTitle>
    <Paragraph position="0">
appo     appositive of a noun            the CEO, John
det      determiner of a noun            the dog
gen      genitive modifier of a noun     John's dog
mod      adjunct modifier of any head    tiny hole
nn       prenominal modifier of a noun   station manager
pcomp    complement of a preposition     in the garden
subj     subject of a verb               John loves Mary.
</Paragraph>
    <Paragraph position="1"> John found a solution to the problem.</Paragraph>
    <Paragraph position="2">  than if they shared the feature modified-by-fiduciary. The similarity measure proposed in (Lin, 1998) takes this into account by computing the mutual information between two words involved in a dependency relationship.</Paragraph>
    <Paragraph position="3"> Using the collocation database, Lin (1998) presented an unsupervised method to construct a similarity matrix. Given a word w, the matrix returns a set of words similar to w, each with its similarity to w. For example, the 35 most similar words of duty, Beethoven, and eat are shown in Table 3. The similarity matrix covers about 20,000 nouns, 4,000 verbs, and 6,000 adjectives and adverbs.</Paragraph>
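    <Paragraph position="4"> The lookup described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation from (Lin, 1998): it scores each (word, feature) pair with plain pointwise mutual information over a toy count table, keeps only positive scores, and compares two words by the share of their total feature information that falls on shared features. The names COUNTS, pmi_table, similarity, and most_similar are hypothetical, the feature strings are an assumed encoding, and the "tax" rows are invented so that a dissimilar word exists.

```python
import math
from collections import defaultdict

# Toy (word, feature) counts. The duty and responsibility rows are copied
# from the collocation excerpts above; the "tax" rows are invented.
COUNTS = {
    ("duty", "modified-by:fiduciary"): 319,
    ("duty", "object-of:assume"): 190,
    ("duty", "object-of:perform"): 153,
    ("duty", "object-of:impose"): 118,
    ("duty", "object-of:pay"): 26,
    ("responsibility", "modified-by:fiduciary"): 89,
    ("responsibility", "object-of:assume"): 390,
    ("responsibility", "object-of:claim"): 741,
    ("tax", "object-of:pay"): 500,      # hypothetical
    ("tax", "object-of:impose"): 300,   # hypothetical
}


def pmi_table(counts):
    """Map each word to its features with positive pointwise mutual information."""
    total = sum(counts.values())
    word_total = defaultdict(int)
    feat_total = defaultdict(int)
    for (word, feat), c in counts.items():
        word_total[word] += c
        feat_total[feat] += c
    table = defaultdict(dict)
    for (word, feat), c in counts.items():
        value = math.log(c * total / (word_total[word] * feat_total[feat]))
        if value > 0:  # features seen less often than chance are dropped
            table[word][feat] = value
    return table


def similarity(table, w1, w2):
    """Information on shared features divided by the information on all features."""
    f1, f2 = table.get(w1, {}), table.get(w2, {})
    shared = set(f1).intersection(f2)
    denom = sum(f1.values()) + sum(f2.values())
    if denom == 0:
        return 0.0
    return sum(f1[f] + f2[f] for f in shared) / denom


def most_similar(table, word, k=35):
    """Return up to k (similarity, other_word) pairs, best first."""
    scores = [(similarity(table, word, other), other)
              for other in table if other != word]
    return sorted(scores, reverse=True)[:k]


if __name__ == "__main__":
    table = pmi_table(COUNTS)
    for score, other in most_similar(table, "duty", k=2):
        print(other, round(score, 3))
```

With this toy table, duty and responsibility share an informative feature (object-of:assume), while duty and the invented word tax share only features whose scores are not positive, so they receive no similarity. A real similarity matrix would be built from the full collocation database rather than a handful of counts.
</Paragraph>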
  </Section>
  <Section position="5" start_page="0" end_page="0" type="metho">
    <SectionTitle>
4. Unsupervised Induction of Semantic Classes
</SectionTitle>
    <Paragraph position="0"> Consider the similar words of Beethoven. The quality of similar words obviously decreases as the similarity value decreases.</Paragraph>
    <Paragraph position="1"> Some of the words have non-zero similarity simply because they share common features with Beethoven by accident. For example, tough guy is similar to Beethoven because both Beethoven and tough guy can be used as the object of the verb play.</Paragraph>
    <Paragraph position="2"> The similar words of duty exemplify another problem: The top similar words of a given word may be similar to different senses of the word. However, this is not made explicit by the similarity matrix.</Paragraph>
    <Paragraph position="3"> LaTaT includes an algorithm called UNICON (UNsupervised Induction of CONcepts) that clusters similar words to create semantic classes (Lin and Pantel, 2001a). UNICON uses a heuristic maximal-clique algorithm, called CLIMAX, to find clusters in the similar words of a given word. The purpose of CLIMAX is to find small, tight clusters. For example, two of the clusters returned by CLIMAX are:  clusters. The number after each word in the clusters is the similarity between the word and the centroid of that cluster. The UNICON algorithm computes the centroid of a cluster by averaging the collocational features of the words in the cluster. CLIMAX is then applied recursively to cluster the centroids, and clusters whose centroids fall into the same cluster are merged. This process continues until no more clusters are merged. The details of the UNICON and CLIMAX algorithms are presented in (Lin and Pantel, 2001a). Table 4 shows 10 sample semantic classes identified by the UNICON algorithm, using a 1GB newspaper text corpus.</Paragraph>
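    <Paragraph position="4"> The centroid-and-merge loop described above can be illustrated with a short Python sketch. It is not the UNICON or CLIMAX algorithm, whose details are in (Lin and Pantel, 2001a). The centroid step follows the description (averaging the feature vectors of the words in a cluster), but the grouping of centroids is replaced here by a plain cosine threshold as a stand-in for CLIMAX. The names FEATURES, centroid, cosine, and merge_until_stable, the threshold value, and all feature weights are hypothetical.

```python
import math

# Hypothetical feature weights for a few words (not taken from any corpus).
FEATURES = {
    "duty": {"object-of:assume": 1.0, "object-of:fulfill": 1.0},
    "responsibility": {"object-of:assume": 1.0, "object-of:bear": 1.0},
    "obligation": {"object-of:fulfill": 1.0, "object-of:bear": 1.0},
    "tax": {"object-of:pay": 1.0, "object-of:levy": 1.0},
    "tariff": {"object-of:levy": 1.0, "object-of:impose": 1.0},
}


def centroid(cluster, features):
    """Average the feature vectors of the words in a cluster."""
    total = {}
    for word in cluster:
        for feat, value in features[word].items():
            total[feat] = total.get(feat, 0.0) + value
    return {feat: value / len(cluster) for feat, value in total.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[f] * v[f] for f in u if f in v)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm > 0 else 0.0


def merge_until_stable(clusters, features, threshold=0.5):
    """Merge clusters with similar centroids until a pass merges nothing.

    The threshold test is a stand-in for the CLIMAX step, not CLIMAX itself.
    """
    clusters = [frozenset(c) for c in clusters]
    merged = True
    while merged:
        merged = False
        cents = [centroid(c, features) for c in clusters]
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if cosine(cents[i], cents[j]) >= threshold:
                    union = clusters[i] | clusters[j]
                    rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
                    clusters = rest + [union]
                    merged = True
                    break
            if merged:
                break  # centroids are stale after a merge, so recompute them
    return clusters


if __name__ == "__main__":
    seeds = [{"duty", "responsibility"}, {"obligation"}, {"tax", "tariff"}]
    for cluster in merge_until_stable(seeds, FEATURES):
        print(sorted(cluster))
```

Recomputing every centroid after each merge keeps the sketch simple at the cost of repeated work, which is acceptable for a toy example but would need a more careful design at the scale of a 1GB corpus.
</Paragraph>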
  </Section>
</Paper>