
Article information

  • Title: Text Representations for Patent Classification
  • Authors: Eva D'hondt; Suzan Verberne; Cornelis Koster
  • Journal: Computational Linguistics
  • Print ISSN: 0891-2017
  • Electronic ISSN: 1530-9312
  • Publication year: 2013
  • Volume: 39
  • Issue: 3
  • Pages: 755-775
  • DOI: 10.1162/COLI_a_00149
  • Language: English
  • Publisher: MIT Press
  • Abstract: With the increasing rate of patent application filings, automated patent classification is of rising economic importance. This article investigates how patent classification can be improved by using different representations of the patent documents. Using the Linguistic Classification System (LCS), we compare the impact of adding statistical phrases (in the form of bigrams) and linguistic phrases (in two different dependency formats) to the standard bag-of-words text representation on a subset of 532,264 English abstracts from the CLEF-IP 2010 corpus. In contrast to previous findings on classification with phrases in the Reuters-21578 data set, for patent classification the addition of phrases results in significant improvements over the unigram baseline. The best results were achieved by combining all four representations, and the second best by combining unigrams and lemmatized bigrams. This article includes extensive analyses of the class models (a.k.a. class profiles) created by the classifiers in the LCS framework, to examine which types of phrases are most informative for patent classification. It appears that bigrams contribute most to improvements in classification accuracy. Similar experiments were performed on subsets of French and German abstracts to investigate the generalizability of these findings.
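
The idea of adding bigrams to a bag-of-words representation, as described in the abstract, can be illustrated with a minimal sketch. This is not the LCS implementation or the authors' pipeline: the function name `unigram_bigram_features`, the whitespace tokenization, the lowercasing, and the `_` joiner are all assumptions made for illustration, and no lemmatization or dependency parsing is performed.

```python
from collections import Counter


def unigram_bigram_features(text):
    """Count unigrams and adjacent-word bigrams in one text.

    Hypothetical helper for illustration only: tokens are obtained by
    lowercasing and splitting on whitespace, with no lemmatization.
    """
    tokens = text.lower().split()
    # Standard bag-of-words part: one count per token.
    features = Counter(tokens)
    # Statistical phrases: each pair of adjacent tokens, joined by "_".
    features.update(f"{a}_{b}" for a, b in zip(tokens, tokens[1:]))
    return features


feats = unigram_bigram_features("optical fiber cable with optical fiber core")
print(feats["optical"], feats["optical_fiber"])
```

In this toy input the bigram `optical_fiber` is counted as a single feature alongside its component words, which is the kind of phrase-level evidence a unigram-only representation cannot express.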