Basic Article Information

Title: Data Mining: a Healthy Tool for Your Information Retrieval and Text Mining
Authors: Santosh Kumar Rath; Manobendu Kesari Jena; Tapaswini Nayak et al.
Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2011
Volume: 2
Issue: 5
Pages: 2042-2045
Publisher: TechScience Publications
Abstract: Data warehousing and data mining are widely used in many industries, such as banking, insurance, healthcare and security; however, very little work has been done on text mining. Text mining involves the application of techniques from areas such as information retrieval, natural language processing, information extraction and data mining. In this paper we describe text mining as a truly interdisciplinary method drawing on information retrieval, machine learning, statistics, computational linguistics and especially data mining. We first give a short sketch of these methods and then define text mining in relation to them. Later sections survey state-of-the-art approaches for the main analysis tasks: preprocessing, classification, clustering, information extraction and visualization. The last section exemplifies text mining in the context of a number of successful applications. Text mining offers a solution to the problem of the text explosion by replacing or supplementing the human reader with automatic systems that are undeterred by it. It involves analyzing a large collection of documents to discover previously unknown information. The information might be relationships or patterns that are buried in the document collection and that would otherwise be extremely difficult, if not impossible, to discover. Text mining can be used to analyze natural language documents about any subject, although much of the interest at present is coming from the biological sciences.