Basic Article Information

Title: Optimization of Word Sense Disambiguation using Clustering in WEKA
Authors: Neetu Sharma; Dr. S. Niranjan
Journal: International Journal of Computer Technology and Applications
Electronic ISSN: 2229-6093
Year of publication: 2012
Volume: 3
Issue: 4
Pages: 1598-1604
Publisher: Technopark Publications
Abstract: In the Natural Language Processing (NLP) community, Word Sense Disambiguation (WSD) has been described as the task of selecting the appropriate meaning (sense) for a given word in a text or discourse, where this meaning is distinguishable from other senses potentially attributable to that word. These senses can be seen as the target labels of a classification problem. Clustering and classification are two important data mining techniques. Classification is a supervised learning problem of assigning an object to one of several predefined categories based on the attributes of the object, whereas clustering is an unsupervised learning problem that groups objects based on distance or similarity. Each group is known as a cluster. In this paper we use the data file poach.arff, containing 7 attributes and 37 instances, to integrate the clustering and classification techniques of data mining. We compare the results of simple classification (using the Random Forest classifier) with the results of integrated clustering and classification, based on various parameters, using WEKA (Waikato Environment for Knowledge Analysis), a data mining tool. The results of the experiment show that integrating clustering and classification gives promising results, with the highest accuracy rate and robustness.
Keywords: machine learning software; data mining; data preprocessing; data visualization; WEKA; WORDNET; K-Means; Random Forest
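
Illustrative sketch: The abstract describes integrating clustering with classification. One common reading of that idea is to cluster the instances first, append each instance's cluster ID as an extra attribute, and then train a classifier on the augmented data. The Python sketch below illustrates that pipeline only. It is not the authors' implementation and does not use WEKA: the data points, the sense labels, and all function names are hypothetical, and a 1-nearest-neighbour classifier stands in for Random Forest to keep the example short and dependency-free.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, candidates):
    """Index of the candidate vector closest to `point`."""
    return min(range(len(candidates)), key=lambda i: sq_dist(point, candidates[i]))


def init_centroids(points, k):
    """Deterministic farthest-first initialisation (an assumption of this sketch)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, iters=20):
    """Plain k-means; returns the final centroids."""
    centroids = init_centroids(points, k)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        assign = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def add_cluster_attribute(points, centroids):
    """Append the cluster ID of each point as one extra (numeric) attribute."""
    return [list(p) + [float(nearest(p, centroids))] for p in points]


def predict_1nn(train_x, train_y, x):
    """1-nearest-neighbour prediction (stand-in for Random Forest)."""
    return train_y[nearest(x, train_x)]


# Hypothetical toy data: two well-separated groups with made-up sense labels.
points = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
labels = ["sense_a", "sense_a", "sense_a", "sense_b", "sense_b", "sense_b"]

centroids = kmeans(points, k=2)                    # step 1: cluster
train_x = add_cluster_attribute(points, centroids)  # step 2: augment
query = add_cluster_attribute([[4.9, 5.0]], centroids)[0]
predicted = predict_1nn(train_x, labels, query)     # step 3: classify
```

The cluster ID is treated here as a numeric attribute purely for simplicity; a tool that supports nominal attributes would more naturally store it as a category.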