File Information
File: 05-lr/acl_arc_1_sum/cleansed_text/xml_by_section/concl/03/w03-2006_concl.xml
Size: 5,042 bytes
Last Modified: 2025-10-06 13:53:51
<?xml version="1.0" standalone="yes"?> <Paper uid="W03-2006"> <Title>Can Text Analysis Tell us Something about Technology Progress?</Title> <Section position="7" start_page="0" end_page="0" type="concl"> <SectionTitle> 4 Afterword </SectionTitle> <Paragraph position="0"> It appears that there is a local grammar, comprising the vocabulary of the specialist domain and a syntax that appears different from the general (universal?) syntax, used in framing the claims, background and summary of the invention in a US Patent document. A number of slots in the US PTO document are reserved for proper names (patentees, assignees, places of work), other slots hold dates, and all these slots show the extremes of the local grammar - essentially a grammar for a one-word language. The document comprises references to other patents and also citations of an extant patent by other, later patents. This information is encoded in another local grammar of one or more 4-tuples, each referring to a referenced patent; the 4-tuple has a clearly defined sequence and allows expressions only in terms of four noun phrases. The referenced patent number is an active hyperlink through which the details of the referenced patent can be accessed, and subsequently a chain of references can be established in a (semi-)automatic manner. The existence of a local grammar and the hyperlinks suggests to us that one can create a historic (diachronic) description of an invention together with the crucial account of the influence of other inventions.</Paragraph> <Paragraph position="1"> Restricted syntax is used, for example, in describing time (hours, minutes, seconds, days, years, months), in financial news wire as well as mission-critical communication. 
The specialist vocabulary, and more so the productive use of the vocabulary (see below for details), as well as the restricted syntax emerge initially to assure ambiguity-free communication in an inherently noisy medium of communication - natural language.</Paragraph> <Paragraph position="2"> Complementary to the emergence of the present US patent document, there has been an accumulation of terminological knowledge in the form of the repositories usually referred to as patent classification. The Patent Offices around the world classify all manner of 'articles' ranging from micro-electronics to kitchen utensils and from software systems to heavy excavation machinery, for example. Much like a number of other utilitarian classification systems, including the Dewey Decimal Classification on the one hand and the US National Library of Medicine's Disease Classification system on the other, the US PTO classification system is detailed, complex, full of cross references, and occasionally confusing. The fact remains, however, that like all utilitarian systems, the US PTO classification system is a rich repository that can be used, with some alterations, as the lexical/terminological resource for information extraction in particular and NLP in general. The repository states the ontological commitment of the US PTO and its advisers, and can be used for building knowledge representation schema or semantic processing systems.</Paragraph> <Paragraph position="3"> The appearance of a local grammar, or perhaps local grammars, used to frame a patent document together with an extensive terminology database of patent classification, is good news for the patent processing community. 
There is some hope that information extraction and NLP systems will be able to extract the terminology and identify the idiosyncratic syntax that governs the different parts of the patent document with the help of techniques pioneered in corpus linguistics.</Paragraph> <Paragraph position="4"> Terminology extraction can be facilitated by referring to the patent classification terminology base and by various statistical and linguistic techniques used to identify complex noun phrases in specialist texts. Once the local grammar is identified, it will be possible to process the documents meaningfully, to infer the import of a given invention in relation to other inventions and to assess the impact of journal publications of inventions.</Paragraph> <Paragraph position="5"> And, indeed, all manner of new ways of examining a patent document may open up once the investigator overcomes the burden of sifting through an ever-growing lexical mountain of new patents, revisions to existing patents and the scientific and technical publication juggernaut that adds more to the mountain on an almost daily basis. The automatic extraction of compounds from a corpus of patent documents appears to show the introduction of new artifacts through the use of morphological processes like word formation. Currently, our work in progress is to 'chart' the transfer of such terms from journal papers to patents, in addition to the exercise reported here, which charts the transfer of terms within a diachronically organised corpus of patent documents.</Paragraph> </Section> </Paper>