Journal: International Journal of Computer and Information Technology
Print ISSN: 2279-0764
Publication year: 2016
Volume: 5
Issue: 5
Pages: 438-442
Publisher: International Journal of Computer and Information Technology
Abstract: The aim of this study was to profile how influenza virus
genome data are used, and the patterns of that use, in scientific
publications indexed in online databases, using Natural Language
Processing and Text Mining techniques. A systematic search was
performed in the PubMed electronic database using the keywords
‘influenza’, ‘genome’, and ‘database’. The 45 articles with free
full text available were processed with the software tools
AntFileConverter and AntConc. Text mining was performed
with the Weka software. Association rules were expected
between genome and influenza. It was also predicted that
influenza genome would be associated with terms directly related
to the application of genome databases. However, the results
revealed an association between influenza virus protein and mutation
sequence/database. Finding associations different from those
expected showed the need to expand the research, both to
increase the size of the corpus and to improve attribute
selection for mining in Weka.
Keywords: Data Mining; Natural Language Processing;
Influenza A virus; Genome, Viral; Databases, Nucleic Acid
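
Note: The abstract refers to association rules between terms but does not state which Weka associator or which thresholds were used. As a rough illustration only, the sketch below computes the two standard measures behind such rules, support and confidence, for single-term rules over a set of documents. It is plain Python, not Weka and not the authors' procedure; the function names, the thresholds, and the four-document toy corpus are all hypothetical.

```python
from itertools import combinations


def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    Each transaction is the set of terms found in one document.
    """
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    # Documents containing the antecedent terms.
    n_ante = sum(1 for t in transactions if antecedent <= t)
    # Documents containing both the antecedent and the consequent terms.
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n if n else 0.0
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


def pairwise_rules(transactions, min_support=0.5, min_confidence=0.7):
    """List single-term rules x -> y that meet both (assumed) thresholds."""
    terms = sorted(set().union(*transactions)) if transactions else []
    rules = []
    for a, b in combinations(terms, 2):
        # Check both directions, since confidence is not symmetric.
        for x, y in ((a, b), (b, a)):
            s, c = rule_metrics(transactions, {x}, {y})
            if s >= min_support and c >= min_confidence:
                rules.append((x, y, s, c))
    return rules


# Hypothetical toy corpus: one set of extracted terms per article.
docs = [
    frozenset({"influenza", "protein", "mutation"}),
    frozenset({"influenza", "protein", "mutation", "database"}),
    frozenset({"influenza", "genome"}),
    frozenset({"influenza", "protein", "sequence"}),
]

if __name__ == "__main__":
    for x, y, s, c in pairwise_rules(docs):
        print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

With a corpus as small as the 45 articles described, rules of this kind are sensitive to the chosen thresholds and to which terms are kept as attributes, which is consistent with the abstract's call for a larger corpus and better attribute selection.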