Journal: International Journal of Computer Science & Information Technology (IJCSIT)
Print ISSN: 0975-4660
Online ISSN: 0975-3826
Year: 2016
Volume: 8
Issue: 2
Page: 101
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: The amount of text data in the world, and in our lives, keeps increasing with no end in sight. Text data mining is defined as the process of deriving high-quality information from text. It has been applied in different fields, including pattern mining, opinion mining, and web mining. Here, text data mining is built around the global stemming of the different forms of Arabic words. Stemming is the process of reducing inflected (and sometimes derived) words to their word stem, base, or root form, generally a written word form. We use REP-Tree to improve text representation, and we also test new combinations of weighting schemes applied to Arabic text data for classification purposes. The WEKA workbench is used for processing. Results on a data set from the BBC-Arabic website show the efficiency and accuracy of REP-Tree in Arabic text classification.
Keywords: Data mining; Text classification; Text data mining; Arabic text classification; Pre-processing.
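The abstract names two preprocessing steps, stemming Arabic words and weighting the resulting terms, without giving details. The following is a minimal sketch of both ideas, assuming a simple light stemmer with small, hypothetical prefix and suffix lists and plain TF-IDF weighting (tf × log(N/df)). It is not the paper's stemmer or its weighting schemes, and it does not reproduce REP-Tree or the WEKA workflow; the names `light_stem`, `tfidf`, `PREFIXES`, and `SUFFIXES` are illustrative only.

```python
import math
from collections import Counter

# Hypothetical affix lists for a light Arabic stemmer (illustrative, not the paper's).
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "و"]
SUFFIXES = ["ات", "ون", "ين", "ها", "ة"]


def light_stem(word: str, min_len: int = 3) -> str:
    """Strip at most one prefix and one suffix, keeping at least min_len letters."""
    # Try longer affixes first so "وال" wins over "و".
    for p in sorted(PREFIXES, key=len, reverse=True):
        if word.startswith(p) and len(word) - len(p) >= min_len:
            word = word[len(p):]
            break
    for s in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(s) and len(word) - len(s) >= min_len:
            word = word[: -len(s)]
            break
    return word


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Return one {stem: weight} dict per document, weight = tf * log(N / df)."""
    stemmed = [[light_stem(w) for w in doc.split()] for doc in docs]
    n = len(stemmed)
    # Document frequency: number of documents containing each stem.
    df = Counter(term for doc in stemmed for term in set(doc))
    weights = []
    for doc in stemmed:
        tf = Counter(doc)
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights


if __name__ == "__main__":
    print(light_stem("والكتاب"))  # prefix "وال" removed
    print(tfidf(["الكتاب مفيد", "كتاب جديد"]))
```

Because "الكتاب" and "كتاب" reduce to the same stem, they are counted as one term, which is the point of stemming before weighting; a term that appears in every document then gets weight 0. In a WEKA-based pipeline such vectors would typically be exported (for example as ARFF) before training a classifier such as REPTree; that step is not shown.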