
Article Information

  • Title: E-mails Mining using Generalized Addressing Patterns (GAP)
  • Authors: Lakshmi Sravani Grande ; K. Mallikarjuna Mallu ; P. Pedda Sadhu Naik
  • Journal: International Journal of Computer Science & Technology
  • Print ISSN: 2229-4333
  • Online ISSN: 0976-8491
  • Publication year: 2012
  • Volume: 3
  • Issue: 2
  • Pages: 1103-1109
  • Language: English
  • Publisher: Ayushmaan Technologies
  • Abstract: Email has become an important medium of communication. A user may receive tens or even hundreds of emails every day, and handling them takes considerable time. It is therefore necessary to provide automatic approaches that relieve the burden of processing emails. A straightforward method is to group similar emails by supervised classification over fields such as mail-id, to-mail-id, subject, message, sending-time, and attachments. Email mining is the process of discovering useful patterns from emails, and clustering techniques can be applied to email data to create groups of similar emails. In our algorithm, natural language processing and frequent itemset mining techniques are used to automatically generate meaningful Generalized Addressing Patterns (GAPs) from the mail-id, to-mail-id, subject, message, sending-time, and attachments of emails. We then put forward a novel unsupervised approach that treats GAPs as pseudo class labels and conducts email clustering in a supervised manner, although no human labeling is involved. The proposed algorithm is expected not only to improve clustering performance but also to provide meaningful descriptions of the resulting clusters through the GAPs. Experimental results on an open dataset and on a personal email dataset that we collected demonstrate that the proposed algorithm outperforms the K-means algorithm in terms of the widely used F1 measure. Furthermore, cluster naming readability is improved by 68.5% on the personal email dataset.
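
The idea in the abstract of mining frequent itemsets and using them as pseudo class labels can be illustrated with a minimal Python sketch. This is not the authors' GAP algorithm: the function names `mine_patterns` and `pseudo_label`, the `min_support` and `max_size` parameters, and the toy emails are all assumptions made for illustration, and the natural language processing step and the later supervised clustering step are omitted.

```python
from collections import Counter
from itertools import combinations


def mine_patterns(emails, min_support=2, max_size=2):
    """Return {frozenset(words): support} for word sets that occur
    in at least `min_support` emails (hypothetical helper)."""
    counts = Counter()
    for words in emails:
        unique = sorted(set(words))  # count each word once per email
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[frozenset(combo)] += 1
    return {p: c for p, c in counts.items() if c >= min_support}


def pseudo_label(words, patterns):
    """Pick the largest, then most frequent, pattern contained in the
    email and return it as a sorted tuple; None if nothing matches."""
    present = set(words)
    matches = [p for p in patterns if p <= present]
    if not matches:
        return None
    best = max(matches, key=lambda p: (len(p), patterns[p], sorted(p)))
    return tuple(sorted(best))


# Toy, already-tokenized emails (illustrative data, not from the paper).
emails = [
    ["meeting", "schedule", "monday"],
    ["meeting", "schedule", "room"],
    ["invoice", "payment", "due"],
    ["invoice", "payment", "overdue"],
    ["hello"],
]

patterns = mine_patterns(emails)
labels = [pseudo_label(e, patterns) for e in emails]
for email, label in zip(emails, labels):
    print(label, "<-", email)
```

In this sketch the pattern itself doubles as a readable cluster name, which mirrors the abstract's point that the patterns can describe the resulting clusters. Emails that match no frequent pattern get `None` and would need separate handling, for example by a classifier trained on the labeled emails.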
All rights reserved by the National Center for Philosophy and Social Sciences Documentation