Basic Article Information

Title: A Framework for Building Applications Based on Hidden Topics with Short and Sparse Web Documents
Authors: Kanimozhiveena E; D. Ramya Dorai
Journal: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET)
Print ISSN: 2278-1323
Year of publication: 2013
Volume: 2
Issue: 3
Pages: 984-988
Publisher: Shri Pannalal Research Institute of Technology
Abstract: This paper presents an approach to two major problems in web data: (1) data sparseness and (2) synonymy. It proposes a model that reduces both by drawing on external data from users, which is considered together with the dataset. The motivation is that a document may be highly relevant to the keyword given in the query yet contain very few sentences and very few matching keywords, in which case it is unlikely to be classified correctly. The external data is used to classify such short, sparse documents more accurately. In the advertising application, ad messages and web pages are considered: the semantic similarity between them is measured and used for matching and ranking.
Keywords: classification; data sparseness; matching/ranking; text categorization; semantic similarity; web mining