Journal: Faculty of Computer Science and Information Technology
Publication year: 2009
Volume: 0
Issue: 0
Language: English
Publisher: Faculty of Computer Science and Information Technology
Keywords: Document Categorization, Document Similarity, Text Mining, TF-IDF, Vector Space Model
Abstract: As Gunadarma University grows, its students produce a large volume of scientific writing. A person can easily categorize a scientific paper manually, but doing so by computer raises its own problems. The same holds for similarity: a person can easily judge whether one document resembles another, but automating that judgment is harder. This study therefore builds a tool that categorizes documents and computes the level of similarity between documents automatically. Text mining techniques are used to categorize the scientific writing. The similarity value between a document and other documents is computed from the keywords obtained during categorization, using the TF-IDF (Term Frequency - Inverse Document Frequency) algorithm and the Vector Space Model algorithm. It is hoped that the computerized categorization will produce results consistent with manual categorization, and that the similarity measurement will show how closely a document resembles other documents.
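The abstract names TF-IDF and the Vector Space Model but does not give formulas, so the following is only a minimal sketch of how the two are commonly combined. It assumes whitespace tokenization, raw term counts for TF, log(N / df) for IDF, and cosine similarity as the comparison measure; none of these choices is confirmed by the record, and the function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace; the study's
    # actual preprocessing (stemming, stop words) is not described.
    return text.lower().split()


def tf_idf_vectors(docs):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Assumed IDF variant: log(N / df). A term found in every
    # document gets weight 0.
    idf = {term: math.log(n_docs / count) for term, count in df.items()}

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts
        vectors.append({term: tf[term] * idf[term] for term in tf})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Under these assumptions, calling `tf_idf_vectors` on a list of document strings and passing any two of the resulting vectors to `cosine_similarity` yields a score between 0 and 1, where documents with no terms in common score 0.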