Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2015
Volume: 6
Issue: 6
Pages: 4941-4951
Publisher: TechScience Publications
Abstract: A topic evolution graph gives a quick overview of a body of knowledge and its growth, so discovering how topics evolve over time in a scientific corpus is an important problem. We propose a technique that uses document content both to discover topics and to link them so as to represent their evolution. In addition to depicting the growth of the body of knowledge as a topic evolution chart, we mine influential documents and extract representative documents for each topic. Previous work includes citation-aware approaches that go beyond treating documents as bags of words but are computationally expensive. We discover topic labels (n-grams) using semantic relevance measures and identify influential nodes by applying the PageRank algorithm to the citation network. We evaluate our method on an arXiv corpus of approximately 29,000 research papers in physics. While producing meaningful results, our technique also improves on existing methods in computational complexity and scalability. Specifically, our method runs in time linear in the number of documents and words in the corpus.
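
Illustrative note: the abstract says that influential documents are found by running PageRank on the citation network, but it does not give the authors' implementation or parameter choices. The following is only a minimal sketch of standard PageRank by power iteration. The function name `pagerank`, the damping factor 0.85, the iteration count, and the four-paper toy graph are all assumptions made for illustration and are not taken from the paper.

```python
def pagerank(citations, damping=0.85, iterations=50):
    """Standard PageRank by power iteration (illustrative sketch).

    `citations` maps each paper id to the list of paper ids it cites.
    The damping factor and iteration count are assumed defaults.
    """
    # Collect every paper that appears as a citer or as a cited work.
    nodes = set(citations)
    for cited in citations.values():
        nodes.update(cited)
    n = len(nodes)

    # Start from a uniform distribution over papers.
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Papers that cite nothing spread their rank uniformly over all papers.
        dangling = sum(rank[v] for v in nodes if not citations.get(v))
        new_rank = {
            v: (1.0 - damping) / n + damping * dangling / n for v in nodes
        }
        # Each citing paper splits its rank equally among the works it cites.
        for citer, cited in citations.items():
            if not cited:
                continue
            share = damping * rank[citer] / len(cited)
            for target in cited:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy citation network: every later paper cites paper_a.
toy_citations = {
    "paper_a": [],
    "paper_b": ["paper_a"],
    "paper_c": ["paper_a", "paper_b"],
    "paper_d": ["paper_a", "paper_c"],
}
scores = pagerank(toy_citations)
most_influential = max(scores, key=scores.get)
```

In this toy graph `paper_a` receives the highest score because every other paper cites it. Each iteration touches every citation edge once, so the cost per iteration grows linearly with the size of the citation network.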