Basic Article Information

Title: Front Index Extraction from Research Documents Using Meta-Content Framework
Authors: Tripti Sharma; Sarang Pitale
Journal: Indian Journal of Education and Information Management
Print ISSN: 2277-5367
Electronic ISSN: 2277-5374
Publication year: 2012
Volume: 1
Issue: 7
Pages: 301-305
Language: English
Publisher: Indian Society for Education and Environment
Abstract: Text mining is opening new areas of research, and front index extraction is one of them. The front index of a book is a tabular arrangement of topics and subtopics with their page numbers. Ongoing research on front index extraction from e-books uses various techniques, such as image processing. The present scheme extracts the front index from research documents using a string matching algorithm. The paper also describes the working of the Meta-Content Framework for E-books (MCFE), which uses the front index extraction process and treats the extracted front index as meta-information. The framework takes an e-book in PDF form and extracts the front index by converting the PDF to text. The framework is developed in Java using the iText library.
Keywords: Text Mining; Front Index; e-book; Meta-information; PDF; Java; iText
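
Illustrative sketch: the abstract describes converting a PDF to text and then building the front index by string matching. The record does not give the paper's actual algorithm, so the following Java sketch is only one plausible reading. It assumes the PDF has already been converted to one string per page (for instance with iText's text extraction, which is left out here) and that headings look like "1. Introduction" or "2.1 String Matching". The class name `FrontIndexSketch`, the method `extract`, and the heading pattern are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not the MCFE implementation described in the paper.
final class FrontIndexSketch {

    // Assumed heading shape: a section number such as "1." or "2.1",
    // whitespace, then a title beginning with a capital letter.
    private static final Pattern HEADING = Pattern.compile(
            "^(\\d{1,2}(?:\\.\\d{1,2})*)\\.?\\s+([A-Z][^\\n]{0,80})$");

    /**
     * Scans page texts line by line and returns one entry per matched
     * heading, formatted as "number title page" (page is 1-based).
     */
    static List<String> extract(List<String> pageTexts) {
        List<String> index = new ArrayList<>();
        for (int p = 0; p < pageTexts.size(); p++) {
            // \R matches any line terminator (Java 8+).
            for (String line : pageTexts.get(p).split("\\R")) {
                Matcher m = HEADING.matcher(line.trim());
                if (m.matches()) {
                    index.add(m.group(1) + " " + m.group(2).trim() + " " + (p + 1));
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Stand-in page texts; a real run would use text extracted from a PDF.
        List<String> pages = Arrays.asList(
                "1. Introduction\nText mining is a growing field.",
                "2. Method\n2.1 String Matching\nHeadings are matched line by line.");
        for (String entry : extract(pages)) {
            System.out.println(entry);
        }
    }
}
```

A pattern this simple will also match ordinary lines that happen to start with a small number and a capitalized word, so a practical version would need extra checks (font size, line length, or numbering continuity) that this sketch omits.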