Article Information

  • Title: Efficient Similarity Join Method Using Unsupervised Learning
  • Authors: Bilal Hawashin; Farshad Fotouhi; William Grosky
  • Journal: International Journal of Computer Science & Information Technology (IJCSIT)
  • Print ISSN: 0975-4660
  • Online ISSN: 0975-3826
  • Year: 2012
  • Volume: 4
  • Issue: 5
  • Page: 23
  • Publisher: Academy & Industry Research Collaboration Center (AIRCC)
  • Abstract: This paper proposes an efficient similarity join method using unsupervised learning, for cases where no labeled data is available. In our previous work, we showed that the performance of similarity join could improve when long string attributes, such as paper abstracts, movie summaries, product descriptions, and user feedback, are used under supervised learning, where a training set exists. In this work, we use long string attributes during the similarity join under unsupervised learning. Besides its importance when no labeled data exists, unsupervised learning also acts as a quick preprocessing method for huge datasets. Here, we show that using long attributes during unsupervised learning can further enhance the performance. Moreover, we provide an efficient, dynamically expandable algorithm for databases with frequent transactions.
  • Keywords: Similarity Join; Unsupervised Learning; Diffusion Maps; Databases; Machine Learning.