Article Information

  • Title: A Method to Extract Sentences Containing Protein Function Information with Training Data Extension Based on User's Feedback
  • Authors: Kazunori Miyanishi; Tomonobu Ozaki; Takenao Ohkawa
  • Journal: Information and Media Technologies
  • Online ISSN: 1881-0896
  • Year: 2010
  • Volume: 5
  • Issue: 4
  • Pages: 1278-1286
  • DOI: 10.11185/imt.5.1278
  • Publisher: Information and Media Technologies Editorial Board
  • Abstract: A protein performs various functions by interacting with chemical compounds. Protein functions are clarified through protein structure analysis, and the resulting knowledge has been reported in a large number of documents. Extracting this function information and building a database from it is useful in many application fields, such as drug discovery and understanding life phenomena. However, manually extracting function information from so many documents to build such a database is impractical, which strongly motivates research on automatic extraction. Extracting protein function information can be treated as a classification problem: for each sentence in a target document, the task is to decide whether it contains function information. Such a problem is typically addressed by training a classifier on previously given training data, but accuracy is low when the training data is not large enough. In that case, we attempt to improve classification accuracy by extending the training data: sentences that are effective for achieving high accuracy are selected from reference data separate from the training data set and added to it. To select such sentences, we introduce the reliability of the temporary labels assigned to sentences in the reference data. Sentences whose temporary labels have low reliability are presented to users, who assign true labels as feedback, and these sentences are added to the training data. In addition, a classifier is trained on the training data together with sentences whose temporary labels have high reliability. By iterating this process, we aim to improve accuracy steadily. In experiments, the proposed method achieved higher accuracy than a related approach when both the number of feedback iterations and the number of sentences labeled through users' feedback were small.
Thus, it is confirmed that the proposed method appropriately extends the training data based on users' feedback. This result also helps reduce the burden on users.
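The feedback loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the classifier or define the reliability measure, so this sketch assumes a hypothetical multinomial Naive Bayes stand-in, takes reliability to be the posterior probability of the assigned temporary label, and uses an assumed fixed threshold for "high reliability". The names `NaiveBayes`, `extend_training_data`, `ask_user`, `n_feedback` and `threshold` are hypothetical, and the user is simulated by a simple keyword oracle.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

# (sentence, label): 1 = contains protein function information, 0 = does not
Labeled = List[Tuple[str, int]]


def tokenize(sentence: str) -> List[str]:
    return sentence.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (hypothetical stand-in classifier)."""

    def fit(self, data: Labeled) -> "NaiveBayes":
        self.doc_counts: Counter = Counter(label for _, label in data)
        self.tok_counts: Dict[int, Counter] = {0: Counter(), 1: Counter()}
        for sentence, label in data:
            self.tok_counts[label].update(tokenize(sentence))
        self.vocab_size = len(set(self.tok_counts[0]) | set(self.tok_counts[1]))
        return self

    def prob_positive(self, sentence: str) -> float:
        """Return P(label = 1 | sentence)."""
        n_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in (0, 1):
            # +1 reserves a slot for unseen tokens, so the denominator is never 0
            denom = sum(self.tok_counts[label].values()) + self.vocab_size + 1
            score = math.log((self.doc_counts[label] + 1) / (n_docs + 2))
            for tok in tokenize(sentence):
                score += math.log((self.tok_counts[label][tok] + 1) / denom)
            log_score[label] = score
        # Normalise in log space to avoid underflow
        top = max(log_score.values())
        e0, e1 = (math.exp(log_score[k] - top) for k in (0, 1))
        return e1 / (e0 + e1)


def extend_training_data(
    train: Labeled,
    reference: List[str],
    ask_user: Callable[[str], int],
    iterations: int = 3,
    n_feedback: int = 2,
    threshold: float = 0.9,
) -> Tuple[NaiveBayes, Labeled]:
    labeled = list(train)   # true labels only: initial data plus user feedback
    pool = list(reference)  # reference sentences not yet shown to the user
    pseudo: Labeled = []    # high-reliability temporary labels from the latest round
    for _ in range(iterations):
        if not pool:
            break
        clf = NaiveBayes().fit(labeled + pseudo)
        scored = []
        for s in pool:
            p = clf.prob_positive(s)
            # Assumed reliability measure: probability of the temporary label
            scored.append((max(p, 1 - p), s, int(p >= 0.5)))
        scored.sort(key=lambda item: item[0])  # least reliable first
        asked = scored[:n_feedback]
        # Low-reliability sentences get true labels from the user
        labeled += [(s, ask_user(s)) for _, s, _ in asked]
        # High-reliability sentences keep their temporary labels for training
        pseudo = [(s, lab) for r, s, lab in scored[n_feedback:] if r >= threshold]
        asked_set = {s for _, s, _ in asked}
        pool = [s for s in pool if s not in asked_set]
    return NaiveBayes().fit(labeled + pseudo), labeled


# Usage with toy data and a simulated user
train = [
    ("the enzyme binds atp at the active site", 1),
    ("samples were stored at room temperature", 0),
]
reference = [
    "the protein binds calcium ions",
    "data were collected over two weeks",
    "the kinase phosphorylates its substrate",
]


def oracle(sentence: str) -> int:
    # Simulated user feedback: positive if the sentence mentions binding or phosphorylation
    return int("bind" in sentence or "phosphorylat" in sentence)


clf, labeled = extend_training_data(train, reference, oracle, iterations=2, n_feedback=1)
```

Only sentences labeled by the (simulated) user are kept permanently here, while high-reliability sentences are re-labeled each round; this reading of "temporary labels" is an assumption, and keeping them permanently would be an equally plausible variant.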