Journal: Indian Journal of Computer Science and Engineering
Print ISSN: 2231-3850
Electronic ISSN: 0976-5166
Year of publication: 2020
Volume: 11
Issue: 4
Pages: 383-393
DOI: 10.21817/indjcse/2020/v11i4/201104264
Publisher: Engg Journals Publications
Abstract: Ever-growing biological research generates large volumes of biological data and knowledge bases, ranging from clinical test results to genome analysis. The dynamic changes in genome sequences, together with the complexity of these databases and their relations, pose many challenges for data analysis. Many online databases are available for biological studies. It is essential that biological data can be analyzed in a multidimensional way, by creating a data warehouse and then applying online analytical processing (OLAP). The star schema, a common method of multidimensional modeling, is not sufficient for biological data because it cannot accommodate the additional relationships involved. The snowflake schema models relations among datasets better than the star schema, but it cannot model all data from all databases, especially the hidden states of long new biological sequences or complex medical data. Given this scenario, this paper combines dataset generation by a Hidden Markov Model (HMM) from all types of biological databases available online with the fact constellation schema of data warehouse modeling. The HMM is adopted in this study to find new datasets and to help analyze the relations between them. Once the datasets are generated, fact constellation multidimensional modeling is carried out to build the data warehouse. The new model proposed in this work is called the BioFactHMM schema; it is proposed specifically for biological data and is a mix of the star and snowflake schemas. The model aims to capture all the semantics of biological sequences from various data sources using the HMM. Data warehouse modeling then follows the design principles of the fact constellation schema. Finally, OLAP cube analysis is used to view the data and reports in a multidimensional way.
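
As an illustration of the fact constellation idea named in the abstract (several fact tables sharing the same dimension tables), the following is a minimal sketch using Python's built-in sqlite3 module. It is not the paper's BioFactHMM schema: the table names, column names, and sample rows are all hypothetical, and the `hidden_state` column merely stands in for labels that an HMM might assign to sequence segments.

```python
import sqlite3


def build_constellation():
    """Create an in-memory warehouse with two fact tables sharing two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Shared dimension tables (hypothetical names).
        CREATE TABLE dim_organism (organism_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_source   (source_id   INTEGER PRIMARY KEY, name TEXT);

        -- Fact table 1: sequence segments with an assumed HMM state label.
        CREATE TABLE fact_sequence (
            organism_id    INTEGER REFERENCES dim_organism(organism_id),
            source_id      INTEGER REFERENCES dim_source(source_id),
            hidden_state   TEXT,
            segment_length INTEGER
        );

        -- Fact table 2: clinical measurements over the same dimensions.
        CREATE TABLE fact_clinical (
            organism_id INTEGER REFERENCES dim_organism(organism_id),
            source_id   INTEGER REFERENCES dim_source(source_id),
            test_count  INTEGER
        );
        """
    )
    # Made-up sample rows, for illustration only.
    conn.executemany("INSERT INTO dim_organism VALUES (?, ?)",
                     [(1, "Homo sapiens"), (2, "Mus musculus")])
    conn.executemany("INSERT INTO dim_source VALUES (?, ?)",
                     [(1, "source_A"), (2, "source_B")])
    conn.executemany("INSERT INTO fact_sequence VALUES (?, ?, ?, ?)",
                     [(1, 1, "exon", 120), (1, 2, "intron", 300), (2, 1, "exon", 90)])
    conn.executemany("INSERT INTO fact_clinical VALUES (?, ?, ?)",
                     [(1, 1, 5), (1, 2, 7), (2, 2, 3)])
    return conn


def rollup_by_organism(conn, fact_table, measure):
    """Sum one measure of one fact table along the shared organism dimension."""
    # Identifiers are checked against a whitelist because they cannot be
    # passed as SQL parameters.
    allowed = {("fact_sequence", "segment_length"), ("fact_clinical", "test_count")}
    if (fact_table, measure) not in allowed:
        raise ValueError("unknown fact table or measure")
    rows = conn.execute(
        f"SELECT o.name, SUM(f.{measure}) "
        f"FROM {fact_table} AS f JOIN dim_organism AS o USING (organism_id) "
        f"GROUP BY o.name ORDER BY o.name"
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    conn = build_constellation()
    print(rollup_by_organism(conn, "fact_sequence", "segment_length"))
    print(rollup_by_organism(conn, "fact_clinical", "test_count"))
```

Because both fact tables reference the same `dim_organism` and `dim_source` tables, the same roll-up can be applied to either one, which is the kind of multidimensional view an OLAP cube would expose.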