Article information

Title: Data mining in production management and manufacturing.
Authors: Matsi, B.; Otto, T.; Loun, K. et al.
Journal: DAAAM International Scientific Book
Print ISSN: 1726-9687
Year of publication: 2009
Issue: January
Language: English
Publisher: DAAAM International Vienna
Abstract: Data mining has recently been the subject of many articles in business and software magazines and books. Only a few years ago, however, few people had even heard the term Data Mining (DM), which was introduced relatively recently, in the 1990s (Tan et al., 2006).
Keywords: Data mining; Manufacturing; Manufacturing processes; Production management


1. Introduction

Data mining has recently been the subject of many articles in business and software magazines and books. Only a few years ago, however, few people had even heard the term Data Mining (DM), which was introduced relatively recently, in the 1990s (Tan et al., 2006).

Traditionally, analysts have performed the task of extracting useful information from recorded data. The increasing volume of data in modern business and science, however, calls for computer-based approaches. As data sets have grown in size and complexity, there has been an unavoidable shift away from direct hands-on data analysis toward indirect, automatic data analysis using more complex and sophisticated tools. Modern technology has made data collection an almost effortless task, but the captured data must be converted into information and knowledge to become useful. DM is the entire process of applying computer-based methodology, including new techniques, to discover knowledge from data (Kantardzic, 2003).

DM is a general term that encompasses a number of techniques for picking out useful information from large data files and sorting it. It is a powerful new technology for analyzing data from different perspectives and summarizing it into useful information. DM has also been described as the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner (Tsai, Chen, Chan, 2008). It is also often referred to as Knowledge Discovery in Databases (KDD) or simply Knowledge Discovery (KD). Using sophisticated algorithms, it identifies trends within data that go beyond simple analysis, so that users can identify key attributes of business processes and target opportunities. According to the online technology magazine ZDNET News (February 8, 2001), DM is predicted to be "one of the most revolutionary developments of the next decade". The MIT Technology Review has likewise chosen DM as one of 10 emerging technologies that will change the world.

DM has been used widely by companies with a strong consumer focus, such as retail, financial, communication, and marketing organizations, where it helps to determine relationships between "internal" factors such as price and product positioning and "external" factors such as economic indicators, competition, and customer demographics (Hand et al., 2001).

DM can be used in many ways; the most frequent applications are the following:

* Rate customers by their propensity to respond to an offer.

* Identify cross-sell opportunities.

* Detect fraud and abuse in insurance and finance.

* Estimate probability of an illness re-occurrence or hospital re-admission.

* Isolate root causes of an outcome in clinical studies.

* Determine optimal sets of parameters for a production line operation.

* Predict peak load of a network.

DM technology is often used together with theories and technologies such as databases, artificial intelligence, machine learning, and statistics, and it is applied in the finance, insurance, telecommunications, and retail industries, which have accumulated large quantities of data (Siqing, Yin, Yan, 2003). The general Data Mining Process (DMP) includes six phases that address the main issues in DM (Lentzsch, 2007). These phases fit together in a cyclical process and cover the full DM process (see Fig. 1).

[FIGURE 1 OMITTED]

This process is also known as CRISP-DM (Cross Industry Standard Process for Data Mining), the leading data mining methodology (Chapman, Clinton, Kerber, Khabaza, Reinartz, Shearer, Wirth, 2000). CRISP-DM began as a European Union project under the ESPRIT (European Strategic Program on Research in Information Technology) funding initiative. It was led by four companies, ISL, NCR, Daimler-Benz and OHRA, and the first version of the methodology was released as CRISP-DM 1.0 in 1999 (wikipedia.com, 2008). It was developed as an industry- and tool-neutral data mining process model intended to make large data mining projects faster, cheaper, more reliable, and more manageable (Roiger et al., 2003).

2. DM implementation in production management and manufacturing

2.1 Data mining in manufacturing

As mentioned above, DM also has great potential in manufacturing. Products and components generate a data trail across lifecycle phases such as market analysis, design engineering, manufacturing, and service. DM algorithms extract knowledge from this large volume of data, which can lead to significant improvements in the next generation of products and services. Knowledge discovery could in fact become a key factor in innovation and business success (Kusiak & Smith, 2007). Integrating a DM framework within the manufacturing information system makes it possible to improve manufacturing decision making and enhance productivity. It allows analysts to examine enterprises' opportunities and employees' skills and competences, to find relations between enterprises, customers, and subcontractors, and to draw conclusions from different combinations of data.

To apply the DMP to the production management and manufacturing sector, it is important to understand the business problem, that is, what we would like to do with all the data, and also to understand what the data are about. All data should be in an easily accessible format and available from one central database. In practice, relevant data files are often stored in several locations and in different formats and need to be pulled together before analysis. The extracted information and knowledge can serve engineers as a reference and a basis for advanced investigation of the root causes of defects (Larose, 2005). The current case study uses data about enterprises, their technological capabilities, and employees' competences from three different databases: Metnet, Innomet and Innoclus. Data understanding and data preparation are unavoidable phases of the DMP (see Fig. 1) before any analysis. The data also need to be assessed for the DM project. The following aspects should be considered:

2.2 Relevant factors covered by data

For a DM project to be worthwhile, the data must contain all relevant factors/variables and must be mutually joinable. The data were therefore separated into different tables according to logical themes. For example, one table holds all information about the enterprises' contacts, another the enterprises' technological capabilities, and so on. Holding the data in separate tables simplifies data understanding and facilitates the later stages of the DMP. To join the data between these thematically separated tables, common IDs (enterprise_id, sector_id, etc.) were defined.
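The idea of thematic tables joined through a common ID can be illustrated with a minimal sketch. It uses Python's standard sqlite3 module with an in-memory database. Only the enterprise_id key comes from the text; the table names, the other columns, and the sample rows are hypothetical and do not reflect the actual Metnet, Innomet or Innoclus schemas.

```python
import sqlite3

# Hypothetical schema: only enterprise_id is taken from the text; table
# names, other columns, and sample rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enterprise_contacts (
        enterprise_id INTEGER PRIMARY KEY,
        name          TEXT,
        city          TEXT
    );
    CREATE TABLE enterprise_technologies (
        enterprise_id INTEGER,
        technology    TEXT
    );
    INSERT INTO enterprise_contacts VALUES
        (1, 'Alpha Metall', 'Tallinn'),
        (2, 'Beta Masinad', 'Tartu');
    INSERT INTO enterprise_technologies VALUES
        (1, 'milling'),
        (1, 'welding'),
        (2, 'turning');
""")

# The shared enterprise_id is what makes the thematic tables joinable.
rows = conn.execute("""
    SELECT c.name, c.city, t.technology
    FROM enterprise_contacts AS c
    JOIN enterprise_technologies AS t
      ON t.enterprise_id = c.enterprise_id
    ORDER BY c.enterprise_id, t.technology
""").fetchall()

for row in rows:
    print(row)
```

Each thematic table stays small and easy to understand on its own, and the join reassembles a combined view only when an analysis needs it.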

Figure 2 shows the tables into which the data were divided and the primary source from which each is available.

[FIGURE 2 OMITTED]

INNOMET is an acronym for the development of the innovative database model for adding innovation capacity of the labour force and entrepreneurs of the metal engineering, machinery and apparatus sector. The scope of the information system includes the following functionalities:

1. management of users and user assessment rights

2. management of classificators (skills, vocations, different definitions)

3. management of organisations according to the type (industrial enterprise, educational organisation, awarding body)

4. compilation and management of questionnaires for staff members according to the INNOMET methodology

5. management of staff competency queries

6. management of enterprise staff members

7. evaluation of competencies and evaluation results

8. generalisation of evaluation results over enterprise, sector, vocation, region or state

9. management of vocational exams

10. management of curricula

11. management of vocational courses

12. management of manpower requirements and further education data

Figure 3 presents the structure of the tables in more detail. This data model also illustrates which common IDs were defined in order to join the data from different tables.

[FIGURE 3 OMITTED]

2.3 Handling noisy data

In DM, the term "noisy" usually refers to errors in data and sometimes also to missing data (Hastie et al., 2001). This is a problem we have to handle in this study, and it stems from the way the data were collected. All data about the enterprises' capabilities were gathered without multiple-choice options. Because the answer options were not defined in advance, every enterprise answered the questions differently. To interpret the answers unambiguously, the data had to be synchronized. For employees' skills and competences and for enterprises' technological capabilities the task was simpler, because multiple-choice options had been worked out and enterprises answered the questions by selecting the suitable options.
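The synchronization of differently worded free-text answers can be sketched as a mapping onto a controlled vocabulary. This is only an illustration of the general idea, not the procedure used in the study. The canonical terms, their variants, and the sample answers are all hypothetical.

```python
# Hypothetical synonym table: the canonical terms and their variants are
# invented for illustration and do not come from the study's databases.
CANONICAL_TERMS = {
    "milling": {"milling", "cnc milling", "mill"},
    "welding": {"welding", "mig welding", "weld"},
}


def normalise_answer(raw):
    """Map one free-text answer to a canonical term, or None if it is
    missing or not covered by the synonym table."""
    if raw is None:
        return None
    cleaned = " ".join(raw.lower().split())  # lower-case, collapse spaces
    if not cleaned:
        return None
    for term, variants in CANONICAL_TERMS.items():
        if cleaned in variants:
            return term
    return None


answers = ["  CNC  Milling", "weld", "", None, "laser cutting"]
normalised = [normalise_answer(a) for a in answers]

# Non-empty answers that could not be mapped need manual review.
unresolved = [a for a, n in zip(answers, normalised) if a and n is None]

print(normalised)
print(unresolved)
```

Answers that stay unresolved are kept apart for manual review, so unknown wording is never silently merged into an existing category.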

2.4 Gathering enough data

The more complex the patterns and relationships we want to find with data mining, the more records are required to find them. It makes a clear difference whether we analyze ten, a hundred, or all Estonian machinery enterprises. In our case study all machinery enterprises have been included; this information has been gathered and will be used in different DM analyses.

3. Data mining analysis

Once the data have been gathered into one central database and structured so that they are easily understandable, DM can be used in many different applications. Models could be built to predict indicators that matter for better and more effective production management. For example, a predictive DM model could be created to investigate which competences are needed for the most effective product development. Enterprises could also be grouped into clusters based on their technological capabilities. DM implementation is therefore effective in the manufacturing sector as well, and it is needed to improve enterprises' productivity and innovation in product development and manufacturing.

As the central database is not yet finished and not all the data are structured, the following example is presented to illustrate data preparation and the essence of an analytical data mining tool. The aim is to find, among all Estonian machinery enterprises, those that meet the following criteria:

* Enterprises are located in North-Estonia.

* Enterprises are dealing with mechanical treatment of steel and aluminium products.

* Employees' technological skills are on the highest level.

Figure 4 shows the target solution.

The solution was set up in the data mining program Clementine. The centre of the figure shows the stream that makes the selections needed to find the suitable enterprises. The selection conditions are shown around the stream. The conditions are described below with comments.

[FIGURE 4 OMITTED]

Geographic_Location_Id=2

The enterprise locations are described in the database as follows:

* Geographic_Location_Id=1, when Geographic_Location_Name= West-Estonia

* Geographic_Location_Id=2, when Geographic_Location_Name= North-Estonia

* Geographic_Location_Id=3, when Geographic_Location_Name= Central-Estonia

* Geographic_Location_Id=4, when Geographic_Location_Name= South-Estonia

* Geographic_Location_Id=5, when Geographic_Location_Name= East-Estonia

The North-Estonian enterprises are therefore selected with the Geographic_Location_Id, which for North-Estonia is 2.

Technology_Id=2

Every technological capability is defined separately in the database and marked with a specific ID. The technological capability "mechanical treatment of steel and aluminium products" is defined in the database with Technology_Id 2, which gives this selection condition. Not all Technology_Id definitions are presented, because there are more than a hundred technologies.

Skill_Type_Id=4

As with the previous conditions, the selection follows the data definition in the database:

* Skill_Type_Id=1, when Skill_Type_Name= professionalisms

* Skill_Type_Id=2, when Skill_Type_Name= personal identities

* Skill_Type_Id=3, when Skill_Type_Name= base skills

* Skill_Type_Id=4, when Skill_Type_Name= general skills

This selection is also associated with employees' skill levels. The skills are divided into four main groups: general skills, base skills, professionalisms and personal identities. As technical skills are part of the general skills, the general skills were selected. This selection helps to speed up the query, because the technical skill is searched for only among the general skills and the other skill types are excluded.

Competence_Id=24

The selection condition again follows the data definition in the database. Every skill is defined separately and marked with its own specific ID. Technical skill is defined in the database with Competence_Id 24.

Excistent_Skill_Level_Id=1

Skill levels are defined in the database as follows:

* Excistent_Skill_Level_Id=1, when Skill_Level_Name= the highest

* Excistent_Skill_Level_Id=2, when Skill_Level_Name= high

* Excistent_Skill_Level_Id=3, when Skill_Level_Name= medium

* Excistent_Skill_Level_Id=4, when Skill_Level_Name= low

* Excistent_Skill_Level_Id=5, when Skill_Level_Name= the lowest

Once the stream has been completed and the necessary selection conditions have been set, the stream can be executed. The result can be shown in different ways; in this example it is presented as a table. In other words, the result is the list of enterprises that satisfy the stated criteria.
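The five selection conditions can also be read as a single relational query. The sketch below is not the Clementine stream itself. It expresses the same filter in SQL through Python's standard sqlite3 module. The column names and ID values used in the conditions are those quoted in the text. The table names, the way the columns are distributed over tables, and the sample rows are assumptions made for illustration.

```python
import sqlite3

# Hypothetical table layout and sample rows; only the condition columns
# and their ID values are taken from the selection conditions in the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enterprises (
        enterprise_id          INTEGER PRIMARY KEY,
        name                   TEXT,
        Geographic_Location_Id INTEGER
    );
    CREATE TABLE enterprise_technologies (
        enterprise_id INTEGER,
        Technology_Id INTEGER
    );
    CREATE TABLE employee_skills (
        enterprise_id            INTEGER,
        Skill_Type_Id            INTEGER,
        Competence_Id            INTEGER,
        Excistent_Skill_Level_Id INTEGER
    );
    INSERT INTO enterprises VALUES
        (1, 'Enterprise A', 2),
        (2, 'Enterprise B', 2),
        (3, 'Enterprise C', 4);
    INSERT INTO enterprise_technologies VALUES (1, 2), (2, 7), (3, 2);
    INSERT INTO employee_skills VALUES
        (1, 4, 24, 1),
        (2, 4, 24, 1),
        (3, 4, 24, 1),
        (1, 3, 10, 2);
""")

# North-Estonia (2), mechanical treatment of steel and aluminium (2),
# general skills (4), technical skill (24), highest skill level (1).
matches = conn.execute("""
    SELECT DISTINCT e.name
    FROM enterprises AS e
    JOIN enterprise_technologies AS t ON t.enterprise_id = e.enterprise_id
    JOIN employee_skills AS s ON s.enterprise_id = e.enterprise_id
    WHERE e.Geographic_Location_Id = 2
      AND t.Technology_Id = 2
      AND s.Skill_Type_Id = 4
      AND s.Competence_Id = 24
      AND s.Excistent_Skill_Level_Id = 1
    ORDER BY e.name
""").fetchall()

print(matches)
```

In this sample only Enterprise A passes all five conditions: Enterprise B lacks the required technology and Enterprise C is located outside North-Estonia.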

4. Conclusions

Data mining is a powerful tool that is needed when the amount of data increases rapidly. It could also be used for complex analysis at the country level in the machinery, metal and apparatus engineering sector. Implementing DM could be useful for analyzing and updating existing databases in the course of developing a collaborative e-Manufacturing information system. It could also have a significant effect on machinery enterprises' productivity and on innovation in product development and manufacturing. Future research is therefore targeted at increasing the proactivity of the system. If data feeds from embedded systems reporting technological capability are added, DM is one of the most promising methods for handling the information, thus increasing the productivity of management and innovation in the collaboration network.

Once the main database has been created (based on three existing databases: Metnet, Innomet and Innoclus) and the "noisy" data have been eliminated, the aim is to use all the gathered data about the enterprises, their technological capabilities, and their employees' skills to build and experiment with DM models in order to increase these enterprises' productivity and innovation in product development. DM could also be useful for product improvement and repair process improvement, making it possible to determine the most frequent repairs by product, the factors that contribute to a failure type, and the correlations between failures.

DOI: 10.2507/daaam.scibook.2009.11

5. Acknowledgement

The work has been supported by Estonian Science Foundation grant ETF7852.

6. References

Chapman, P.; Clinton, J.; Kerber, R.; Khabaza, T.; Reinartz, T.; Shearer, C.; Wirth, R. (2000). CRISP-DM 1.0: Step-by-step Data Mining Guide. SPSS

Hand, D.; Mannila, H.; Smyth, P. (2001). Principles of Data Mining, MIT Press, Cambridge, MA

Hastie, T.; Tibshirani, R. & Friedman, J. H. (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York

Kantardzic, M. (2003). Data Mining: Concepts, Models, Methods, and Algorithms. John Wiley & Sons

Kusiak, A.; Smith, M. (2007). Data mining in design of products and production systems, Annual Reviews in Control, 31, 147-156

Larose, D. T. (2006). Data Mining Methods and Models. John Wiley & Sons Inc., United States of America

Lentzsch, K. (2007). Introduction to Clementine and Data Mining. SPSS Inc.

Riives, J.; Otto, T.; Keerman, M. (2007). INNOMET system functionality and software description. Innovative development of human resources in enterprise and in society (38-46). Tallinn: TUT Press

Roiger, R.; Geatz, M. (2003). Data Mining: a tutorial based primer (CRSP-M), 408p. Lavoiser

Siqing, S.; Yin, C.; Yan, C. (2003). Data Mining--Concept, Model, Method and Algorithm. Tsinghua University Publishing Company, Beijing

Tan, P.-N.; Steinbach, M. & Kumar, V. (2006). Introduction to Data Mining, Addison Wesley, ISBN-13: 9780321321367

Tsai, F. S.; Chen, Y.; Chan, K. L. (2008). Probabilistic latent semantic analysis for search and mining of corporate blogs. In C. Soares, Y. Peng, J. Meng, Z.-H. Zhou, and T. Washio, editors, Applications of Data Mining in E-business and Finance. IOS Press

This publication should be cited as: Matsi, B[irthe]; Otto, T[auno]; Loun, K[aia] & Roosimolder, L[embit] (2009). Data Mining in Production Management and Manufacturing, Chapter 11 in DAAAM International Scientific Book 2009, pp. 097-106, B. Katalinic (Ed.), Published by DAAAM International, ISBN 978-3-901509-69-8, ISSN 1726-9687, Vienna, Austria

Authors' data: M.Sc. Matsi, B[irthe]; Ph.D. Otto, T[auno]; M.Sc. Loun, K[aia] & Ph.D. Prof Roosimolder, L[embit], Tallinn University of Technology, Ehitajate tee 5, 19086, Tallinn, Estonia, birthe21@hotmail.com, tauno.otto@ttu.ee, kaia.loun@ttu.ee, lembitr@staff.ttu.ee