期刊名称:International Journal of Electronics Communication and Computer Engineering
印刷版ISSN:2249-071X
电子版ISSN:2278-4209
出版年度:2011
卷号:2
期号:2
页码:116-122
出版社:IJECCE
摘要:This paper is based on the preprocessing activities which is performed by the software or language translators before applying mining algorithms on the huge data. Text mining is an important area of Data mining and it plays a vital role for extracting useful information from the huge database or data ware house. But before applying the text mining or information extraction process, preprocessing is must because the given data or dataset have the noisy, incomplete, inconsistent, dirty and unformatted data. In this paper we try to collect the necessary requirements for preprocessing. When we complete the preprocess task then we can easily extract the knowledgful information using mining strategy. This paper also provides the information about the analysis of data like tokenization, stemming and semantic analysis like phrase recognition and parsing. This paper also collect the procedures for preprocessing data i.e. it describe that how the stemming, tokenization or parsing are applied