Article information

  • Title: An Approach of XML-ifying the Crude Corpus in the Field of Opinion Mining
  • Authors: Debnath Bhattacharyya; Kheyali Mitra; Minkyu Choi
  • Journal: International Journal of Grid and Distributed Computing
  • Print ISSN: 2005-4262
  • Year of publication: 2009
  • Volume: 2
  • Issue: 3
  • Publisher: SERSC
  • Abstract: This paper presents a simple approach to XML-ifying a crude corpus in the field of Opinion Mining. The XMLification is based on regular expressions. A corpus (plural: corpora) is a collection of linguistic data. In the proposed work, the corpus consists of reviews posted on web sites, specifically product reviews. The reviews are contained in HTML files collected from sites such as Cnet.com, Epinions.com, Amazon.com, and ebay.com. The crude corpus of HTML files is first polished to keep only the required review details from each web page and discard the rest. The polished corpus is then processed again to yield the final output: XML files that contain only the important parts of the review details from the raw HTML pages. These XML files are ready for further Opinion Mining steps, such as part-of-speech (POS) tagging or other language processing for machine learning.
  • Keywords: Crude corpus; language processing; regular expression; XML; parts of speech tagging.
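
The abstract describes a regex-driven pipeline that keeps only the review details from a raw HTML page and writes them out as XML. The record does not give the authors' actual patterns or XML schema, so the following is only a minimal sketch of the general idea under stated assumptions: the HTML layout (`div class="review"`, `h3 class="title"`, `span class="rating"`, `p class="body"`), the output element names, and the helper names `clean` and `html_to_xml` are all hypothetical. It uses Python's standard `re`, `html`, and `xml.etree.ElementTree` modules.

```python
import re
import xml.etree.ElementTree as ET
from html import unescape

# Hypothetical page layout (an assumption, not taken from the paper):
# each review is a <div class="review"> holding a title, a rating and a body.
REVIEW_RE = re.compile(
    r'<div class="review">\s*'
    r'<h3 class="title">(?P<title>.*?)</h3>\s*'
    r'<span class="rating">(?P<rating>.*?)</span>\s*'
    r'<p class="body">(?P<body>.*?)</p>\s*'
    r'</div>',
    re.DOTALL | re.IGNORECASE,
)
TAG_RE = re.compile(r'<[^>]+>')   # any leftover inline tag such as <b>
WS_RE = re.compile(r'\s+')        # runs of whitespace


def clean(fragment):
    """Strip inline tags, decode HTML entities, and collapse whitespace."""
    text = TAG_RE.sub(' ', fragment)
    text = unescape(text)
    return WS_RE.sub(' ', text).strip()


def html_to_xml(html, product):
    """Keep only the review details of a raw page and return them as XML text."""
    root = ET.Element('reviews', {'product': product})
    for match in REVIEW_RE.finditer(html):
        review = ET.SubElement(root, 'review')
        for field in ('title', 'rating', 'body'):
            ET.SubElement(review, field).text = clean(match.group(field))
    # ElementTree escapes special characters (&, <, >) when serializing.
    return ET.tostring(root, encoding='unicode')


# Made-up sample page: navigation and ads stand in for the "rest" to discard.
SAMPLE_HTML = """
<html><body><div id="nav">Home | Login</div>
<div class="review">
  <h3 class="title">Great camera</h3>
  <span class="rating">5</span>
  <p class="body">Sharp pictures &amp; <b>long</b> battery life.</p>
</div>
<div class="ad">Buy now!</div>
<div class="review">
  <h3 class="title">Too heavy</h3>
  <span class="rating">2</span>
  <p class="body">Good lens, but the body is bulky.</p>
</div>
</body></html>
"""

if __name__ == "__main__":
    print(html_to_xml(SAMPLE_HTML, "camera"))
```

Regular expressions of this kind are tied to one site's markup, so in practice each source (Cnet.com, Epinions.com, Amazon.com, ebay.com) would presumably need its own pattern. Building the output with an XML library rather than string concatenation keeps the result well-formed for later steps such as POS tagging.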