<?xml version="1.0" standalone="yes"?>
<Paper uid="W03-2006">
  <Title>Can Text Analysis Tell us Something about Technology Progress?</Title>
  <Section position="1" start_page="0" end_page="0" type="abstr">
    <SectionTitle>
Abstract
</SectionTitle>
    <Paragraph position="0"> A corpus-based diachronic analysis of patent documents, based mainly on the morphologically productive use of certain terms, can help in tracking the evolution of key developments in a rapidly evolving specialist field. The patent texts were obtained from the US Patent &amp; Trademark Office's on-line service and the terms were extracted automatically from the texts. The chosen specialist field was that of fast-switching devices and systems.</Paragraph>
    <Paragraph position="1"> The method presented draws on literature on biblio- and sciento-metrics, information extraction, corpus linguistics, and aspects of English morphology. This interdisciplinary framework shows that the evolution of word-formation closely shadows the developments in a field of technology.</Paragraph>
    <Paragraph position="2"> Introduction. A patent document is written to persuade a techno-legal authority that the patentee should be allowed to manufacture, sell, or deal in an article to the exclusion of other persons. The article is typically based on an invention that the patentee(s) claim as their own. The term article is important in that it refers to a tangible object, and its usage emphasises that ideas, essentially intangibles, cannot be patented. Patent documents are a repository of how technology advances and, more importantly, show how language supports the change.</Paragraph>
    <Paragraph position="3"> The techno-legal authority requires the patent document to follow a template. This template is divided broadly into two parts: first, legal templates comprising the patentee's details, jurisdictional scope, and related items; second, technical templates divided into a summary of the patentee's claims, the relation of the article to previously patented articles - the so-called prior art - and the scientific/technical basis of the claim. The scientific claim is written in a language that is similar to the language of journal papers.</Paragraph>
    <Paragraph position="4"> One important question that is slowly emerging is the extent to which the analysis of a patent document can be automated, particularly to assess the overlap between the claims in the document about the article to be patented and related, relevant and even counter-claims about the article. The related and relevant claims and counter-claims may be found in existing patent documents and may, more indirectly, exist in journal papers.</Paragraph>
    <Paragraph position="5"> A patent document has to make references to all other relevant/related articles that were patented prior to the invention of the article which is yet to be patented and is the object of the patent document. The references are made primarily by citing the names of the prior art patentees and the titles of their patent documents. A patent document also has other linguistic descriptions of prior art; such descriptions are reminiscent of citations of journal papers in a journal paper. The overlap of a new patent document with a set of existing patent documents may suggest the impact of extant knowledge in patent documents on emerging knowledge in the new patent document. Such an overlap has been studied in work on the impact of US semiconductor technology on the rest of the world (Appleyard and Kalsow 1999): that study relies largely on the frequency of citation of a US patent by the name of its author or the author's place of work. In computational linguistics (CL) terms, this exercise relies on proper noun extraction.</Paragraph>
    <Paragraph position="6"> The patent document relates to an explicit and exclusive right over an intellectual property. A journal article relates to an implicit and inclusive right over an intellectual property. The overlap between these two forms of claims is crucial not only in ascertaining the rights of the patentee, or the abuse of the rights of others by the patentee, but also for monitoring the effectiveness of research based on a specialism as a whole or that of its component groups.</Paragraph>
    <Paragraph position="7"> The effect of one author or a group of authors working in an institution is indirectly measured by the so-called impact factor. This factor relates to the frequency of citation of one or more journal papers written by an author or by a group. The calculation of the impact factor relies mainly on computing the frequency of the authors' name(s) within a corpus of journal articles. Such an impact-factor-type calculation is typically used in bibliometrics (Garfield 1995). Again, as in the intra-patent impact studies mentioned above, in CL terms this is an exercise in proper noun identification and extraction.</Paragraph>
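The name-frequency calculation described above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure used in bibliometric practice or in this paper: the function name `citation_counts`, the surnames and the toy corpus are all invented for the example, and matching on a bare surname is a simplifying assumption that ignores initials, name variants and homonyms.

```python
import re
from collections import Counter


def citation_counts(documents, author_names):
    """Count, for each name, the number of documents that mention it."""
    counts = Counter()
    for text in documents:
        for name in author_names:
            # Whole-word, case-sensitive match on the surname (an assumption).
            if re.search(r"\b" + re.escape(name) + r"\b", text):
                counts[name] += 1
    return counts


# Invented toy corpus and hypothetical surnames, for illustration only.
toy_corpus = [
    "The device follows the design proposed by Smith.",
    "Smith and Jones describe a related circuit.",
    "A similar process is reported by Jones.",
]
counts = citation_counts(toy_corpus, ["Smith", "Jones", "Lee"])
print(counts["Smith"], counts["Jones"], counts["Lee"])
```

A real study would first need proper noun identification to decide which strings are author names at all; the sketch assumes that list is already given.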
    <Paragraph position="8"> The analysis of a patent document, together with the analysis of the related corpora of other patent documents and intellectual property documents, should be based on a framework which provides methods and techniques for analysing the contents of the document and of the corpora. For us the source of a framework still lies in linguistic and language studies. Here we are particularly interested in word formation and terminology usage in highly specialised disciplines, particularly those disciplines that deal with intangible articles. Coupling the word formation and terminology usage with the citation patterns of proper nouns brings us closer to analysing the contents of a patent document and its siblings distributed over corpora. Information scientists usually use the referencing data of research documents to analyse knowledge evolution in scientific fields as well as to identify the key authors, institutes, and journals in specific domains, using tools such as publication counts, citation analysis, co-citation analysis, and co-term analysis to do so. In recent years, patent documents have gained considerable attention as a valuable resource that can be used to analyse technology advances using the same tools.</Paragraph>
    <Paragraph position="9"> Gupta and Pangannaya (2000) have applied bibliometric analysis to carbon nanotube patents to measure the growth of activity of carbon nanotube industries and their links with science. They have also used patent data to study the country-wise distribution of patenting activity for the USA, Japan, and other countries. Sector-wise performances of industry, academia and government, and the active players in carbon nanotubes were also studied. They describe the nature of inventions taking place in this particular field of technology, and claim to have identified the emerging research directions and the active companies and research groups involved.</Paragraph>
    <Paragraph position="10"> Meyer (2001) has used citation analysis and co-word analysis of patent documents and scientific literature to explore the interrelationship between nano-science and nano-technology. Meyer investigated patent citation relations at the organizational level along with geographical locations and affiliations of inventors and authors. Meyer uses term co-occurrence to find the relationship between the patent documents and the two scientific literature databases SCI and INSPEC. He has noticed that '...the terms that occur frequently in the document titles of all databases are related to [...] instrumentalities and/or are located in fields that are generally associated with substantial industrial research activity' (2001:177). Meyer has argued that 'Our data suggests that nano-technology and nano-science are essentially separate and heterogeneous, yet interrelated cumulative structures' (2001:164).</Paragraph>
    <Paragraph position="11"> The study of word formation through neologisms within the special language of science and technology has led some authors to argue that it is the scientists as technologists who attempt to rationalise our experience of the world around us in written language by using new words or forms or by relexicalising the existing stock (see Ahmad 2000 for relevant references). Some lexicographers (see for example Quirk et al. 1985) have suggested that neologisms can be formed by two processes. The first is the addition or combination of elements, such as compounding: Resonant Tunneling Diodes and Scanning Tunneling Microscopy are examples of this type of neologism (compounding as a means of neologism formation is used extensively in science and technology literature). The second is the reduction of elements into abbreviated forms: the abbreviations FET (Field Effect Transistor) and MOSFET (Metal Oxide Semiconductor FET) are examples of this type.</Paragraph>
    <Paragraph position="12"> Neologisms appear to signal the emergence of new concepts or artefacts, and the frequency of a new word might indicate the scientific community's acceptance of the new concept or artefact.</Paragraph>
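One way to make the idea of tracking a neologism's frequency concrete is to count a term's occurrences per year over dated texts. The sketch below is an assumption-laden illustration rather than the method of this paper: the function name `term_frequency_by_year`, the years and the sample sentences are invented, and plain case-insensitive substring counting stands in for proper term extraction.

```python
from collections import Counter


def term_frequency_by_year(dated_texts, term):
    """Sum case-insensitive occurrences of a term for each year."""
    freq = Counter()
    needle = term.lower()
    for year, text in dated_texts:
        # str.count counts non-overlapping substring matches.
        freq[year] += text.lower().count(needle)
    return dict(sorted(freq.items()))


# Invented (year, text) pairs, for illustration only.
toy_texts = [
    (1990, "A tunneling diode is described."),
    (1995, "The tunneling diode and a second tunneling diode are coupled."),
    (1995, "A transistor circuit is described."),
]
profile = term_frequency_by_year(toy_texts, "tunneling diode")
print(profile)
```

Raw counts of this kind would normally be normalised by the amount of text available for each year before any claim about acceptance is made.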
    <Paragraph position="13"> Effenberger (1995) has argued that '... the faster a subject field is developing, the more novelties are constructed, discovered or created. And these novelties are talked and written about. In order to make this technical communication as efficient as possible, provision should be made for avoiding misunderstanding. One crucial point in this process is the vocabulary that is being used' (1995:131, emphasis added).</Paragraph>
    <Paragraph position="14"> In this paper we discuss the idiosyncratic language used in patent documents. The language is replete with terms and there are instances within a patent document that suggest that the authors not only use the specialist terms but use a local syntax as well. We look specifically at the structure of US patents and suggest how, with existing techniques used in information extraction and NLP, including term extraction and proper noun identification, one can perform fairly complex tasks in patent analysis - some of which are currently performed by hand by patent experts (Section 2). This examination suggests to us a model of development in computer and semiconductor technology: an incremental model where each subsequent patent helps in the development of ever more complex artifacts - starting from devices, moving on to circuits and then to systems. We will look at one of the key inventions in the field of semiconductor physics - the electron tunneling device. These devices combine technical elegance, experimental complexity and manufacturing challenge. Due to their strategic importance, a number of patents have been obtained by the US government and also by a number of US and Japanese companies (Section 3). Section 4 concludes this paper.</Paragraph>
    <Paragraph position="15"> The Structure of USPTO Documents and a Local Grammar for the Documents. The USPTO database is a representative sample of patent documents. The USPTO has documents related to most branches of science and technology.</Paragraph>
    <Paragraph position="16"> It includes information about all US patent documents, from the first patent issued in 1970 to the most recent. The USPTO database allows the user to search the full text of the patent documents for a certain word or a combination of words. It also provides a field search for specific information such as inventor or assignee. The search can also be conducted for a specific year or range of years. US patents are written partly as a legal text and partly as a scientific document. Over the last 50 years or so, it appears that US patent documents have been structured in terms of layout and have a superficial resemblance to Marvin Minsky's frame-like knowledge representation schema.</Paragraph>
    <Paragraph position="17"> The patent document can be divided into three main parts for the present discussion. The first part comprises the biographical details of the inventors (and their employers) together with the title of the invention, a brief free-text abstract, the dates when the patent was applied for and when it was granted, and so on. The free text is essentially a summary of the claims of the patentee. The second part contains external references of three sorts: the first sort is the specialist domain of the invention - the subject class indicating the super-ordinate class and instances; the second sort is other cited patents organised as a 4-tuple: (i) patent number, (ii) date of approval, (iii) first inventor and (iv) classification number; and the third sort is a bibliographic reference to publications that may have contributed to the patent. The third part of a current US patent document comprises 'claims' related to the patent and the description of the 'invention' (diagrams of the invention are attached to the document and described in the text). Table 1 on the next page shows the template of current (c. 1980 and after) USPTO documents.</Paragraph>
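The 4-tuple used for cited patents maps naturally onto a small record type. The following Python sketch is illustrative only: the class name `CitedPatent`, the field names, the pipe-separated input format and the placeholder values are assumptions made for the example and do not reflect the actual layout of USPTO records.

```python
from typing import NamedTuple


class CitedPatent(NamedTuple):
    """One cited-patent reference, following the 4-tuple in the text."""
    patent_number: str    # (i) patent number
    approval_date: str    # (ii) date of approval
    first_inventor: str   # (iii) first inventor
    classification: str   # (iv) classification number


def parse_cited_patent(line, sep="|"):
    """Split one delimited line into a CitedPatent (separator is assumed)."""
    fields = [field.strip() for field in line.split(sep)]
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}")
    return CitedPatent(*fields)


# Placeholder values, not a real patent reference.
record = parse_cited_patent("0000000 | Jan 1990 | Doe | 000/000")
print(record.first_inventor)
```

Keeping the first inventor as a separate field is what makes the proper-noun-based citation counts discussed earlier straightforward to compute from such records.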
    <Paragraph position="18"> The 'claims' of the patentees are clearly itemised, each introduced by the number of the claim; the first claim is generally the basis of the patent abstract. The 'background to the invention' is written in an idiosyncratic fashion as well - the invention is first contextualised in a broader group of other inventions to date and then the specific nature of the invention is exemplified. The broader context is usually marked by phrases like 'The (present) invention relates to' and the specificity is phrased as '(more) specifically' or '(more) particularly'. These phrases are followed by one or more noun phrases connected with, for example, conjunctions or qualifiers. The first noun phrase names the article invented, for instance, the name of a new device, circuit or a fabricating or testing process.</Paragraph>
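The cue phrases just described lend themselves to a simple pattern-based sketch. The Python below is a rough illustration under stated assumptions, not the local grammar developed in this paper: the names `BROAD_CUE`, `SPECIFIC_CUE` and `split_background` are hypothetical, the example sentence is invented, and the text following each cue is returned as a raw string instead of being parsed into noun phrases.

```python
import re

# Cue for the broader context: 'The (present) invention relates to'.
BROAD_CUE = re.compile(r"\bthe (?:present )?invention relates to\b",
                       re.IGNORECASE)
# Cue for the specific nature: '(more) specifically' or '(more) particularly'.
SPECIFIC_CUE = re.compile(r"\b(?:more )?(?:specifically|particularly)\b,?",
                          re.IGNORECASE)


def split_background(sentence):
    """Return the text after the broad cue and after the specific cue."""
    broad = BROAD_CUE.search(sentence)
    if broad is None:
        return None
    rest = sentence[broad.end():]
    specific = SPECIFIC_CUE.search(rest)
    if specific is None:
        return {"broad": rest.strip(" ,.;"), "specific": None}
    return {
        "broad": rest[:specific.start()].strip(" ,.;"),
        "specific": rest[specific.end():].strip(" ,.;"),
    }


# Invented example sentence, for illustration only.
example = ("The present invention relates to semiconductor devices, "
           "more particularly to resonant tunneling diodes.")
result = split_background(example)
print(result)
```

A fuller treatment would chunk the returned strings into noun phrases so that the first one, which names the invented article, can be picked out directly.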
  </Section>
</Paper>