Journal: Bulletin of the Technical Committee on Data Engineering
Publication year: 2005
Volume: 28
Issue: 04
Publisher: IEEE Computer Society
Abstract: Publication searching based on keywords provided by users is traditional in digital libraries. While
useful in many circumstances, the success of locating related publications via the keyword-based searching
paradigm depends on how users choose their keywords. Example-based searching, where a user
provides an example publication to locate similar publications, is also becoming commonplace in digital
libraries.
Existing publication similarity measures, needed for example-based searching, fall into two classes:
text-based similarity measures from Information Retrieval, and citation-based similarity measures
based on bibliographic coupling and/or co-citation.
In this paper, we list a number of publication similarity measures, and extend and evaluate them
in terms of their accuracy, separability, and independence. For evaluation, we use the ACM SIGMOD
Anthology, a digital library of about 15,000 publications.
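
To make the two classes of measures named in the abstract concrete, the following is a minimal sketch in Python. It is an illustration only and is not taken from the paper: the function names, the `refs` mapping (publication id to the set of ids it cites), and the use of raw term-frequency vectors and unnormalized counts are all assumptions, and the measures the paper actually lists and extends may be defined differently.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(text_a, text_b):
    """Text-based measure: cosine of raw term-frequency vectors (assumed weighting)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def bibliographic_coupling(refs, p, q):
    """Citation-based measure: number of references that p and q have in common."""
    return len(refs.get(p, set()) & refs.get(q, set()))


def co_citation(refs, p, q):
    """Citation-based measure: number of publications that cite both p and q."""
    return sum(1 for cited in refs.values() if p in cited and q in cited)


# Hypothetical toy citation graph: publication id -> set of cited publication ids.
toy_refs = {
    "A": {"X", "Y"},
    "B": {"Y", "Z"},
    "C": {"A", "B"},
    "D": {"A", "B", "X"},
}
coupling_ab = bibliographic_coupling(toy_refs, "A", "B")
cocitation_ab = co_citation(toy_refs, "A", "B")
```

In this toy graph, A and B are coupled through their shared reference Y and are co-cited by C and D. In practice such counts are usually normalized (for example by reference-list size) before being compared across publications; that step is omitted here.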