Journal: International Journal of Computer Science and Information Technologies
Electronic ISSN: 0975-9646
Publication year: 2015
Volume: 6
Issue: 1
Pages: 141-147
Publisher: TechScience Publications
Abstract: Data quality is a major concern in a data warehouse environment. ETL tools focus on detecting and correcting data quality problems that affect the success of a data warehouse. Data imported from source systems into the data warehouse often differs in quality, format, coding, etc. Extraction–transformation–loading (ETL) tools are used to bring all the data together in a standard, homogeneous environment. Proprietary tools used for data cleaning have very limited functionality. Small and Medium Scale Enterprises (SME) and Small Scale Enterprises (SSE) cannot afford the licensing cost of these paid tools. The open source data quality tool MaSSEETL addresses these data quality problems by dealing with naming conflicts, structural conflicts, date conversions, missing values and changing dimensions. The tool solves the integrity issues faced by various available GPL tools. MaSSEETL handles each error with an appropriate level of warning. In this paper, we present the implementation of MaSSEETL. The tool provides increased ease of use in a data warehouse environment. General Terms: Data warehousing, data cleansing, quality data, dirty data, surrogate keys
Keywords: Data inconsistency; identification of errors; organization growth; ETL; data quality