
Article Information

  • Title: Lossless text compression using GPT-2 language model and Huffman coding
  • Authors: Md. Atiqur Rahman; Mohamed Hamada
  • Journal: SHS Web of Conferences
  • Print ISSN: 2416-5182
  • Online ISSN: 2261-2424
  • Year: 2021
  • Volume: 102
  • Pages: 1-8
  • DOI: 10.1051/shsconf/202110204013
  • Language: English
  • Publisher: EDP Sciences
  • Abstract: Modern daily-life activities generate large amounts of information for telecommunication. Storing this data on digital devices and transmitting it over the Internet is challenging, which makes data compression necessary; research on the topic has therefore attracted great interest. Because compressed data is generally smaller than the original, compression saves storage and increases transmission speed. In this article, we propose a text compression technique that combines the GPT-2 language model with Huffman coding. In the proposed method, the Burrows-Wheeler transform and a list of keys are first used to reduce the original text file's length. We then apply the GPT-2 language model followed by Huffman coding for encoding. The proposed method is compared with state-of-the-art text compression techniques, and we show that it achieves a higher compression ratio than the other methods.
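The final encoding stage the abstract mentions is standard Huffman coding: frequent symbols receive short bit codes and rare symbols longer ones. The sketch below is a minimal, generic illustration of that step only, not the authors' implementation; the function names (`huffman_codes`, `encode`) and the heap-based construction are assumptions for illustration.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman prefix-code table {symbol: bit string} for `text`."""
    freq = Counter(text)
    # Heap entries: (frequency, unique tie-breaker, {symbol: code-so-far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate input with a single distinct symbol
        return {sym: "0" for sym in heap[0][2]}
    count = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prepending one bit
        # to every code in each subtree.
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

def encode(text, table):
    """Concatenate the code of each symbol to produce the compressed bit string."""
    return "".join(table[ch] for ch in text)
```

For example, in `"aaaabbc"` the most frequent symbol `a` receives a 1-bit code while `b` and `c` receive 2-bit codes, so the 7 symbols encode to 10 bits. In the paper's pipeline, this stage would operate on the output of the GPT-2 modeling step rather than on raw text.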