Article Information

  • Title: Big Data Processing with MapReduce for E-Book
  • Authors: Tae Ho Hong; Chang Ho Yun; Jong Won Park
  • Journal: International Journal of Multimedia and Ubiquitous Engineering
  • Print ISSN: 1975-0080
  • Year: 2013
  • Volume: 8
  • Issue: 1
  • Publisher: SERSC
  • Abstract: The evolution of IT and computing has made e-books increasingly popular. In this paper, we are interested in searching for a word in e-books. However, a word cannot be searched for in digitized e-books that consist of image files such as JPG and PDF. Our solution is to transform image-file-based e-books into text-file-based e-books so that words can be searched. We use EPUB, an XML-based text format defined by the IDPF (International Digital Publishing Forum); that is, we convert image-file-based e-books into EPUB e-books, after which word search works without difficulty. The conversion job usually involves very large volumes of data and requires substantial computing power. On an ordinary personal computer, it would take a long time or might not complete at all. We used the MapReduce model on a cluster system, which allowed us to perform the conversion successfully and reduce the processing time. This paper presents our Hadoop-based e-book Conversion System, a distributed computing framework that transforms image-based e-books into EPUB e-books. Our experimental system consists of up to 15 cluster nodes. We evaluate the performance of this system as it converts up to 2 TB (terabytes) of image files into EPUB files on a 15-node cluster. We analyzed how the processing time changes as the number of nodes in the cluster varies, and we also analyzed the improvement obtained when the dpi of the image files varies. The performance evaluation confirmed that the Hadoop-based e-book Conversion System successfully processed big data for e-books.
  • Keywords: E-book; Big Data; MapReduce; Hadoop; EPUB; Internet
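The abstract describes the conversion only at the level of "MapReduce on a cluster". The following is a minimal, in-process Python sketch of how such a job could be split into map and reduce steps; it is not the authors' Hadoop implementation and does not use the Hadoop API. Assumptions (all hypothetical): each input record is (book_id, page_number, page_image); `ocr_page` is a stub standing in for a real OCR engine; the reduce step emits one XHTML content document per book, the kind of file an EPUB package contains, rather than a complete EPUB archive.

```python
import re
from collections import defaultdict
from html import escape, unescape
from typing import Dict, Iterable, List, Tuple

# Hypothetical input record: (book_id, page_number, page_image).
# Real page images would be JPG bytes; here each "image" is a str so the
# sketch runs without an OCR engine.
Record = Tuple[str, int, str]


def ocr_page(page_image: str) -> str:
    """Hypothetical stand-in for an OCR engine (not from the paper)."""
    return page_image.strip()


def map_phase(record: Record) -> Iterable[Tuple[str, Tuple[int, str]]]:
    # Map: recognize one page and emit (book_id, (page_number, text)).
    book_id, page_no, image = record
    yield book_id, (page_no, ocr_page(image))


def shuffle(pairs: Iterable[Tuple[str, Tuple[int, str]]]) -> Dict[str, List[Tuple[int, str]]]:
    # Group intermediate values by key, as the MapReduce framework would.
    groups: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(book_id: str, pages: List[Tuple[int, str]]) -> str:
    # Reduce: restore page order and build one XHTML document per book.
    body = "\n".join(
        f'<p id="page-{n}">{escape(text)}</p>' for n, text in sorted(pages)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        f"<head><title>{escape(book_id)}</title></head>\n"
        f"<body>\n{body}\n</body>\n</html>"
    )


def run_job(records: Iterable[Record]) -> Dict[str, str]:
    # Chain map -> shuffle -> reduce for all records.
    intermediate = (pair for rec in records for pair in map_phase(rec))
    return {book: reduce_phase(book, pages) for book, pages in shuffle(intermediate).items()}


def search(books: Dict[str, str], word: str) -> List[str]:
    # Word search over the converted text: strip markup, then match case-insensitively.
    hits = []
    for book_id, xhtml in books.items():
        text = unescape(re.sub(r"<[^>]+>", " ", xhtml))
        if word.lower() in text.lower():
            hits.append(book_id)
    return sorted(hits)
```

Usage, with made-up page contents: `run_job([("book-a", 2, "MapReduce splits work"), ("book-a", 1, "Hadoop cluster"), ("book-b", 1, "EPUB is XML")])` yields one XHTML string per book, and `search(books, "hadoop")` returns `["book-a"]`. Because each page is recognized independently in the map step, adding nodes lets more pages be processed in parallel, which is the property the paper's node-count experiments measure.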