Journal: International Journal of Software Engineering & Applications (IJSEA)
Print ISSN: 0976-2221
Online ISSN: 0975-9018
Year of publication: 2017
Volume: 8
Issue: 4
Page: 21
Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: In this paper we propose a novel method to cluster categorical data while retaining their context. Typically, clustering is performed on numerical data. However, it is often useful to cluster categorical data as well, especially when dealing with data in real-world contexts. Several methods exist which can cluster categorical data, but our approach is unique in that we use recent text-processing and machine learning advancements like GloVe and t-SNE to develop a context-aware clustering approach (using pre-trained word embeddings). We encode words or categorical data into numerical, context-aware vectors that we use to cluster the data points using common clustering algorithms like K-means.
Keywords: Natural language processing; context-aware clustering; k-means; word embeddings; GloVe; t-SNE
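
The pipeline outlined in the abstract (map each categorical value to a word vector, then cluster the vectors with K-means) can be illustrated with a minimal sketch. This is not the authors' implementation. The three-dimensional vectors in `TOY_EMBEDDINGS` are made-up stand-ins for pre-trained GloVe vectors, the helper names (`kmeans`, `cluster_categories`) are hypothetical, centroids are seeded from the first `k` points for determinism, and the t-SNE step is omitted.

```python
# Hypothetical 3-dimensional stand-ins for pre-trained word vectors.
# Real GloVe vectors are much longer and would be loaded from a file.
TOY_EMBEDDINGS = {
    "apple":  [0.90, 0.10, 0.00],
    "banana": [0.80, 0.20, 0.10],
    "cherry": [0.85, 0.15, 0.05],
    "car":    [0.10, 0.90, 0.20],
    "truck":  [0.00, 0.80, 0.30],
    "bus":    [0.05, 0.85, 0.25],
}


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean_vector(members)
    return labels


def cluster_categories(categories, embeddings, k):
    """Encode categorical values as vectors, then cluster the vectors."""
    vectors = [embeddings[c] for c in categories]
    return dict(zip(categories, kmeans(vectors, k)))


if __name__ == "__main__":
    values = ["apple", "car", "banana", "truck", "cherry", "bus"]
    print(cluster_categories(values, TOY_EMBEDDINGS, k=2))
```

With real data, the toy dictionary would be replaced by a lookup into pre-trained embeddings, and a library K-means with randomized initialization would normally be preferred over this hand-rolled loop.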