Journal: International Journal of Computer Science and Network Security
Print ISSN: 1738-7906
Year: 2020
Volume: 20
Issue: 11
Pages: 172-181
DOI: 10.22937/IJCSNS.2020.20.11.21
Publisher: International Journal of Computer Science and Network Security
Abstract: Everyday web use makes it evident that a huge number of PDF documents are uploaded daily. For example, scientific societies and publishers such as IEEE, ACM, Elsevier, and Springer publish volumes of articles and periodicals. Most of these resources are unstructured or semi-structured, which makes information difficult to search and retrieve. This paper proposes an effective model for digital library creation, originally motivated by an automated ontological information extraction framework (OFIE). The framework takes a published paper in PDF form and extracts its structural information, such as title, authors, abstract, funding information, table of contents, and references, with the help of a fuzzy rule-based system (FRBS) and a word sense disambiguation (WSD) approach. The extracted information is then converted to RDF triples. The proposed scheme takes this extracted information and converts it into a digital library stored in an MS-SQL database through an Extract, Transform and Load (ETL) process. The digital library can belong to an institute or to an individual scholar who wants to organize downloaded PDF files for better search and retrieval. Moreover, through a front end based on SQL queries, the information can be searched, retrieved, and exported as reports.
Keywords: Ontology; Digital Library; ETL; SQL; RDF; OFIE; FRBS
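
The ETL step described in the abstract (RDF triples in, relational tables out, SQL queries on top) can be illustrated with a minimal sketch. This is not the paper's implementation: the triples, the `dc:`-style predicate names, the two-table schema, and the helper functions `transform`, `load`, and `titles_by_author` are all hypothetical, and Python's built-in `sqlite3` module stands in for the MS-SQL database named in the abstract.

```python
import sqlite3
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples standing in for the
# extraction output; these are illustrative values, not data from the paper.
TRIPLES = [
    ("paper:1", "dc:title", "An Example Paper"),
    ("paper:1", "dc:creator", "A. Author"),
    ("paper:1", "dc:creator", "B. Author"),
    ("paper:1", "dc:date", "2020"),
    ("paper:2", "dc:title", "Another Example"),
    ("paper:2", "dc:creator", "A. Author"),
]


def transform(triples):
    """Transform: group triples by subject into one record per paper."""
    papers = defaultdict(lambda: {"title": None, "year": None, "authors": []})
    for subject, predicate, obj in triples:
        record = papers[subject]
        if predicate == "dc:title":
            record["title"] = obj
        elif predicate == "dc:date":
            record["year"] = int(obj)
        elif predicate == "dc:creator":
            record["authors"].append(obj)
    return dict(papers)


def load(conn, papers):
    """Load: write the records into an assumed two-table relational schema."""
    conn.execute(
        "CREATE TABLE paper (id TEXT PRIMARY KEY, title TEXT, year INTEGER)"
    )
    conn.execute(
        "CREATE TABLE author (paper_id TEXT REFERENCES paper(id), name TEXT)"
    )
    for paper_id, rec in papers.items():
        conn.execute(
            "INSERT INTO paper VALUES (?, ?, ?)",
            (paper_id, rec["title"], rec["year"]),
        )
        conn.executemany(
            "INSERT INTO author VALUES (?, ?)",
            [(paper_id, name) for name in rec["authors"]],
        )
    conn.commit()


def titles_by_author(conn, name):
    """Query: the kind of lookup a SQL-based front end might run."""
    rows = conn.execute(
        "SELECT p.title FROM paper p JOIN author a ON a.paper_id = p.id "
        "WHERE a.name = ? ORDER BY p.title",
        (name,),
    )
    return [row[0] for row in rows]


# SQLite in memory is used only so the sketch runs without a database server.
conn = sqlite3.connect(":memory:")
load(conn, transform(TRIPLES))
print(titles_by_author(conn, "A. Author"))
```

Keeping authors in a separate table is one plausible way to handle multi-valued predicates such as repeated creator triples; the schema actually used in the paper may differ.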