Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
Print ISSN: 2158-107X
Online ISSN: 2156-5570
Year: 2019
Volume: 10
Issue: 5
Pages: 28-35
DOI: 10.14569/IJACSA.2019.0100505
Publisher: Science and Information Society (SAI)
Abstract: Millions of compounds stored in large datasets are represented using the Simplified Molecular-Input Line-Entry System (SMILES). Fragmenting SMILES strings into overlapping substrings of a fixed size, called LINGO profiles, avoids the otherwise time-consuming conversion into another molecular representation. One drawback of this approach is that it generates many identical LINGO profiles. The inverted indexing approach introduced by Kristensen et al. is a modification designed to handle the large number of molecules in the database. Implementing this technique halved the storage requirement of the dataset while also achieving a significant speedup and favourable accuracy in similarity searching. This report presents an in-depth analysis of the results and draws conclusions about the effectiveness of the working prototype developed for this study.
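To make the abstract's two ideas concrete, the following is a minimal sketch of LINGO fragmentation and an inverted index used for similarity search. It is an illustration under stated assumptions, not the authors' prototype or Kristensen et al.'s exact method. The substring length q = 4 is assumed (the abstract only says "a defined size"), SMILES preprocessing such as ring-digit normalisation is omitted, and similarity is assumed to be the multiset Tanimoto coefficient (sum of minimum counts over sum of maximum counts). The function names `lingo_profile`, `build_inverted_index` and `search` are hypothetical. Each distinct LINGO is stored once as an index key, with a posting list of the molecules that contain it, so a query only touches molecules that share at least one LINGO with it.

```python
from collections import Counter, defaultdict


def lingo_profile(smiles: str, q: int = 4) -> Counter:
    """Count the overlapping substrings of length q (q=4 is an assumption)."""
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def build_inverted_index(database: dict):
    """Map each distinct LINGO to a posting list of (molecule id, count)."""
    index = defaultdict(list)
    sizes = {}  # total LINGO count per molecule, i.e. |B| for the Tanimoto denominator
    for mol_id, smiles in database.items():
        profile = lingo_profile(smiles)
        sizes[mol_id] = sum(profile.values())
        for lingo, count in profile.items():
            index[lingo].append((mol_id, count))
    return index, sizes


def search(query: str, index, sizes):
    """Rank molecules by multiset Tanimoto similarity to the query SMILES."""
    q_profile = lingo_profile(query)
    q_size = sum(q_profile.values())
    # Accumulate sum(min(a_i, b_i)) only for molecules sharing a LINGO with the query.
    overlap = defaultdict(int)
    for lingo, q_count in q_profile.items():
        for mol_id, d_count in index.get(lingo, ()):
            overlap[mol_id] += min(q_count, d_count)
    # sum(max) = |A| + |B| - sum(min), so no full profile comparison is needed.
    scores = {m: inter / (q_size + sizes[m] - inter) for m, inter in overlap.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    db = {"m1": "CCOCCO", "m2": "CCOCCN", "m3": "c1ccccc1"}
    index, sizes = build_inverted_index(db)
    print(search("CCOCCO", index, sizes))
```

Run on the toy database, the query matches `m1` exactly and shares two of its three LINGOs with `m2`, while the benzene ring `m3` shares none and is never scored, which is where the inverted index saves work compared with scanning every profile.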