Data mining in production management and manufacturing.
Matsi, B.; Otto, T.; Loun, K. et al.
1. Introduction
Data mining has recently been the subject of many articles in
business and software magazines and books. Yet only a few years ago few
people had even heard of the term Data Mining (DM), which was
introduced relatively recently, in the 1990s (Tan et al., 2006).
Traditionally analysts have performed the task of extracting useful
information from recorded data. But the increasing volume of data in
modern business and science calls for computer-based approaches. As data
sets have grown in size and complexity, there has been an unavoidable
shift away from direct hands-on data analysis toward indirect, automatic
data analysis using more complex and sophisticated tools. Modern
technology has made data collection an almost effortless task.
However, the captured data needs to be converted into information and
knowledge to become useful. DM is the entire process of applying
computer-based methodology, including new techniques for knowledge
discovery from data (Kantardzic, 2003).
DM is a general term that encompasses a number of techniques for
picking out and sorting useful information from large data files. It is
a powerful technology for analyzing data from different perspectives
and summarizing it into useful information. DM has also been described
as the analysis of (often large) observational data sets to find
unsuspected relationships and to summarize the data in novel ways that
are both understandable and useful to the data owner (Tsai, Chen, Chan,
2008). It is also often called Knowledge Discovery in Databases (KDD)
or simply Knowledge Discovery (KD). Through the use of sophisticated
algorithms it identifies trends within data that go beyond simple
analysis, so that users can identify key attributes of business
processes and target opportunities. DM was predicted to be "one of the
most revolutionary developments of the next decade" by the online
technology magazine ZDNET News (February 8, 2001), and the MIT
Technology Review chose DM as one of 10 emerging technologies that will
change the world.
DM has been used widely by companies with a strong consumer focus,
such as retail, financial, communication, and marketing organizations,
where it helps determine relationships among "internal" factors such as
price and product positioning and "external" factors such as economic
indicators, competition, and customer demographics (Hand et al., 2001).
There are many possible uses of DM; the most frequent applications
are:
* Rate customers by their propensity to respond to an offer.
* Identify cross-sell opportunities.
* Detect fraud and abuse in insurance and finance.
* Estimate probability of an illness re-occurrence or hospital
re-admission.
* Isolate root causes of an outcome in clinical studies.
* Determine optimal sets of parameters for a production line
operation.
* Predict peak load of a network.
DM technology is also often used together with theories and
technologies such as databases, artificial intelligence, machine
learning and statistics, and it is applied in the finance, insurance,
telecommunications and retail industries, which have accumulated a
great quantity of data (Siqing, Yin, Yan, 2003). The general Data
Mining Process (DMP) includes six phases that address the main issues
in DM (Lentzsch, 2007). These phases fit together in a cyclical process
and cover the full DM process (see Fig 1).
[FIGURE 1 OMITTED]
This process is also known as CRISP-DM (Cross-Industry Standard
Process for Data Mining), the leading data mining methodology (Chapman,
Clinton, Kerber, Khabaza, Reinartz, Shearer, Wirth, 2000). CRISP-DM
began as a European Union project under the ESPRIT (European Strategic
Program on Research in Information Technology) funding initiative. It
was led by four companies, ISL, NCR, Daimler-Benz and OHRA, and the
first version of the methodology was released as CRISP-DM 1.0 in 1999
(wikipedia.com, 2008). It was developed as an industry- and
tool-neutral data mining process model intended to make large data
mining projects faster, cheaper, more reliable and more manageable
(Roiger et al., 2003).
2. DM implementation in production management and manufacturing
2.1 Data mining in manufacturing
As mentioned above, DM also has great potential in manufacturing.
Products and components generate a data trail across lifecycle phases
such as market analysis, design engineering, manufacturing, and
service. DM algorithms extract knowledge from this large volume of
data, leading to significant improvements in the next generation of
products and services. In fact, the knowledge discovery activity could
become the key factor in innovation and business success (Kusiak &
Smith, 2007). Integrating a DM framework within the manufacturing
information system makes it possible to improve the manufacturing
decision-making process and enhance productivity. It also makes it
possible to analyze enterprises' opportunities and employees' skills
and competences, find relations between enterprises, customers and
subcontractors, and draw conclusions from different combinations of
data.
To implement the DMP in the production management and manufacturing
sector, it is important to understand the business problem, that is,
what we would like to do with the data, and also to understand what the
data is about. All data needs to be in an easily accessible format and
available from one central database. Relevant data files are often
stored in several locations and in different formats and need to be
pulled together before analysis. The extracted information and
knowledge can serve engineers as a reference and basis for advanced
investigation of the root causes of defects (Larose, 2005). The current
case study uses data about enterprises, their technological
capabilities, and employees' competences from three different
databases: Metnet, Innomet and Innoclus. Data understanding and data
preparation are unavoidable phases of the DMP before data analysis (see
Fig 1). The data also needs to be assessed for the DM project. The
aspects that should be considered are the following:
2.2 Relevant factors covered by data
For a DM project to be worthwhile, it is important that the data
contains all relevant factors/variables and is mutually joinable.
Therefore the data was separated into different tables according to
logical themes. For example, one table holds all information about the
enterprises' contacts, another holds the enterprises' technological
capabilities, etc. Holding the data in separate tables simplifies data
understanding and later facilitates the DMP. To join the data between
these thematically separated tables, common IDs (enterprise_id,
sector_id, etc.) were worked out.
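Joining thematically separated tables on a common ID can be sketched as
follows. This is a minimal illustration using Python's standard sqlite3
module, not the authors' actual database: the table names, the columns
other than enterprise_id, and the sample rows are assumptions made only
for the example.

```python
import sqlite3

# Hypothetical schema: table names, non-key columns and rows are
# illustrative assumptions, not the authors' actual database layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enterprise_contacts (
        enterprise_id INTEGER PRIMARY KEY,
        name TEXT
    );
    CREATE TABLE enterprise_technologies (
        enterprise_id INTEGER,
        technology_id INTEGER
    );
    INSERT INTO enterprise_contacts VALUES
        (1, 'Enterprise A'), (2, 'Enterprise B');
    INSERT INTO enterprise_technologies VALUES (1, 2), (1, 7), (2, 5);
""")


def technologies_by_enterprise(connection):
    """Join the two thematic tables on the shared enterprise_id key."""
    return connection.execute("""
        SELECT c.name, t.technology_id
        FROM enterprise_contacts AS c
        JOIN enterprise_technologies AS t
          ON t.enterprise_id = c.enterprise_id
        ORDER BY c.enterprise_id, t.technology_id
    """).fetchall()


joined = technologies_by_enterprise(conn)
print(joined)
```

Because each table carries the same enterprise_id, further thematic
tables can be attached with additional JOIN clauses without duplicating
the contact data.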
Figure 2 shows the tables into which the data were divided and the
primary source of each table's data.
[FIGURE 2 OMITTED]
INNOMET is an acronym for development of the innovative database
model for adding innovation capacity of labour force and entrepreneurs
of the metal engineering, machinery and apparatus sector. In terms of
development, the scope of the information system includes the following
functionalities:
1. management of users and user assessment rights
2. management of classificators (skills, vocations, different
definitions)
3. management of organisations according to the type (industrial
enterprise, educational organisation, awarding body)
4. compilation and management of questionnaires for staff members
according to the INNOMET methodology
5. management of staff competency queries
6. management of enterprise staff members
7. evaluation of competencies and evaluation results
8. generalisation of evaluation results over enterprise, sector,
vocation, region or state
9. management of vocational exams
10. management of curricula
11. management of vocational courses
12. management of manpower requirements and further education data
Figure 3 presents the structure of the tables in more detail. This
conceptual data model also illustrates which common IDs were worked out
to join the data from different tables.
[FIGURE 3 OMITTED]
2.3 Handling noisy data
The term "noisy" in DM usually refers to errors in data, and
sometimes also to missing data (Hastie et al., 2001). This is a problem
we have to handle in this study, and it results from the way the data
was collected. All data about the enterprises' capabilities was
gathered without multiple-choice options. Initially the answer options
were not defined, and therefore every enterprise answered the questions
differently. To interpret the answers unambiguously, the data had to be
synchronized. For employees' skills and competences and enterprise
technological capabilities the solution was simpler, as multiple-choice
options were worked out and enterprises answered the questions by
making the suitable selections.
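One simple way to synchronize free-text answers is to map each variant
onto a canonical label and to set aside anything that cannot be mapped
for manual review. The sketch below assumes such a mapping exists; the
answer variants and labels are invented for illustration and do not
come from the Metnet, Innomet or Innoclus data.

```python
# Hypothetical synonym table: the free-text variants and canonical
# labels below are invented for illustration only.
CANONICAL_ANSWERS = {
    "cnc milling": "milling",
    "milling": "milling",
    "mill": "milling",
    "turning": "turning",
    "lathe work": "turning",
    "welding": "welding",
    "mig welding": "welding",
}


def normalise_answer(raw):
    """Map one free-text answer to a canonical label, or None if unknown."""
    # Trim, lowercase and collapse repeated spaces before the lookup.
    key = " ".join(raw.strip().lower().split())
    return CANONICAL_ANSWERS.get(key)


def synchronise(answers):
    """Split answers into canonical labels and leftovers for manual review."""
    resolved, unresolved = [], []
    for raw in answers:
        label = normalise_answer(raw)
        if label is None:
            unresolved.append(raw)
        else:
            resolved.append(label)
    return resolved, unresolved


resolved, unresolved = synchronise(["  CNC  Milling", "Lathe work", "laser cutting"])
print(resolved, unresolved)
```

Answers left in the unresolved list show where the mapping needs to be
extended, which is the manual effort that predefined multiple-choice
options avoid.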
2.4 Gathering enough data
The more complex the patterns and relationships we would like to
find with data mining, the more records are required to find them. It
makes a clear difference whether we analyze ten, a hundred or all
Estonian machinery enterprises. It is important to point out that our
case study includes all machinery enterprises, and the gathered
information will be used in different DM analyses.
3. Data mining analysis
Once the data has been gathered into one central database and
structured so that it is easily understandable, DM can be used in many
different applications. We could build models that predict different
important indicators for better and more effective production
management. For example, a predictive DM model could be created to
investigate the competences needed for the most effective product
development. In addition, enterprises could be classified into clusters
based on their technological capabilities, and so on. DM implementation
is therefore also effective in the manufacturing sector and is
necessary for improving enterprises' productivity and innovation in
product development and manufacturing.
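Grouping enterprises by technological capability could, for instance,
be approached by comparing their sets of technology IDs. The sketch
below is one possible technique chosen for illustration, not the method
used in this study: it links enterprises whose Jaccard similarity
reaches a threshold (single-linkage grouping). The enterprise names,
technology IDs and the threshold of 0.5 are all assumptions.

```python
from itertools import combinations

# Hypothetical input: enterprise names and technology IDs are made up.
capabilities = {
    "Enterprise A": {1, 2, 3},
    "Enterprise B": {2, 3},
    "Enterprise C": {10, 11},
    "Enterprise D": {10, 11, 12},
}


def jaccard(a, b):
    """Share of technologies two enterprises have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_by_similarity(caps, threshold=0.5):
    """Single-linkage grouping: link pairs whose similarity >= threshold."""
    parent = {name: name for name in caps}

    def find(name):
        # Follow parent links up to the representative of the group.
        while parent[name] != name:
            name = parent[name]
        return name

    for left, right in combinations(caps, 2):
        if jaccard(caps[left], caps[right]) >= threshold:
            parent[find(left)] = find(right)

    groups = {}
    for name in caps:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


clusters = group_by_similarity(capabilities)
print(clusters)
```

Raising the threshold yields smaller, more homogeneous groups; a
dedicated clustering tool would be a more robust choice on the full data
set.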
As the central database is not yet finished and not all the data is
structured yet, the following example is presented to illustrate data
preparation and the essence of an analytical data mining tool. The aim
is to find, among all Estonian machinery enterprises, those that meet
the following criteria:
* Enterprises are located in North-Estonia.
* Enterprises are dealing with mechanical treatment of steel and
aluminium products.
* Employees' technological skills are on the highest level.
Figure 4 shows the target solution.
It was set up in the data mining program Clementine. The centre of the
figure shows the stream, which makes the necessary selections for
finding the suitable enterprises. The selection conditions are shown
around the stream. The conditions are described below, with comments.
[FIGURE 4 OMITTED]
Geographic_Location_Id=2
The enterprise locations are described in the database as follows:
Geographic_Location_Id=1, when Geographic_Location_Name= West-Estonia
Geographic_Location_Id=2, when Geographic_Location_Name= North-Estonia
Geographic_Location_Id=3, when Geographic_Location_Name= Central-Estonia
Geographic_Location_Id=4, when Geographic_Location_Name= South-Estonia
Geographic_Location_Id=5, when Geographic_Location_Name= East-Estonia
The selection for picking out the North-Estonian enterprises is defined
with the Geographic_Location_Id, which in the case of North-Estonia is
2.
Technology_Id=2
Every technological capability is defined separately in the
database and marked with a specific ID. The technological capability
"mechanical treatment of steel and aluminium products" is defined in
the database with technology ID 2, which gives this selection
condition. Not all Technology_Id definitions are presented, because
there are more than a hundred technologies.
Skill_Type_Id=4
As before, the selection follows the data definition in the
database.
Skill_Type_Id=1, when Skill_Type_Name= professionalisms
Skill_Type_Id=2, when Skill_Type_Name= personal identities
Skill_Type_Id=3, when Skill_Type_Name= base skills
Skill_Type_Id=4, when Skill_Type_Name= general skills
This selection is associated with employees' skill levels. The skills
are divided into four main groups: general skills, base skills,
professionalisms and personal identities. As technical skills are part
of the general skills, this group was selected. The selection helps to
speed up the query, because technical skill is looked up only among the
general skills and the other skill types are excluded.
Competence_Id=24
The selection condition again follows the data definition in the
database. Every skill is defined separately and marked with its own
specific ID. Technical skill is defined in the database with
Competence_Id 24.
Excistent_Skill_Level_Id=1
Skill levels are defined in the database as follows:
Excistent_Skill_Level_Id=1, when Skill_Level_Name= the highest
Excistent_Skill_Level_Id=2, when Skill_Level_Name= high
Excistent_Skill_Level_Id=3, when Skill_Level_Name= medium
Excistent_Skill_Level_Id=4, when Skill_Level_Name= low
Excistent_Skill_Level_Id=5, when Skill_Level_Name= the lowest
After the stream has been completed and the necessary selection
conditions have been set, the stream can be executed. The result can be
shown in different ways; in this example it is presented as a table. In
other words, the result is the list of enterprises that satisfy the
stated criteria.
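The five selection conditions can also be expressed outside Clementine
as a single query. The following is a minimal sketch in Python with the
standard sqlite3 module, not a reproduction of the Clementine stream:
the table names, the Enterprise_Id and Enterprise_Name columns and the
sample rows are assumptions made for illustration. Only the five filter
columns and their values are taken from the conditions described above.

```python
import sqlite3

# Hypothetical tables and sample rows; only the five filter columns
# and the filter values come from the selection conditions in the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE enterprise (
        Enterprise_Id INTEGER PRIMARY KEY,
        Enterprise_Name TEXT,
        Geographic_Location_Id INTEGER
    );
    CREATE TABLE enterprise_technology (
        Enterprise_Id INTEGER,
        Technology_Id INTEGER
    );
    CREATE TABLE employee_skill (
        Enterprise_Id INTEGER,
        Skill_Type_Id INTEGER,
        Competence_Id INTEGER,
        Excistent_Skill_Level_Id INTEGER
    );
    INSERT INTO enterprise VALUES
        (1, 'Enterprise A', 2), (2, 'Enterprise B', 2), (3, 'Enterprise C', 4);
    INSERT INTO enterprise_technology VALUES (1, 2), (2, 9), (3, 2);
    INSERT INTO employee_skill VALUES
        (1, 4, 24, 1), (2, 4, 24, 1), (3, 4, 24, 1), (1, 3, 7, 2);
""")

# One WHERE clause per selection condition, in the order discussed above.
SELECTION = """
    SELECT DISTINCT e.Enterprise_Name
    FROM enterprise AS e
    JOIN enterprise_technology AS t ON t.Enterprise_Id = e.Enterprise_Id
    JOIN employee_skill AS s ON s.Enterprise_Id = e.Enterprise_Id
    WHERE e.Geographic_Location_Id = ?      -- North-Estonia
      AND t.Technology_Id = ?               -- steel and aluminium treatment
      AND s.Skill_Type_Id = ?               -- general skills
      AND s.Competence_Id = ?               -- technical skill
      AND s.Excistent_Skill_Level_Id = ?    -- the highest level
    ORDER BY e.Enterprise_Name
"""

matches = [row[0] for row in conn.execute(SELECTION, (2, 2, 4, 24, 1))]
print(matches)
```

In this made-up data only one enterprise satisfies all five conditions:
the second lacks the required technology and the third is located
outside North-Estonia.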
4. Conclusions
Data mining is a powerful tool that is needed when amounts of data
increase rapidly. It could also be used for complex analysis at country
level in the machinery, metal and apparatus engineering sector. The
implementation of DM could be useful for analyzing and updating
existing databases in the process of developing a collaborative
e-Manufacturing information system. Its implementation could also have
a significant effect on machinery enterprises' productivity and
innovation in product development and manufacturing. Future research is
therefore targeted at increasing the proactivity of the system. If data
feeds from embedded systems reporting technological capability are
added, DM is one of the most promising methods for handling the
information, thus increasing the productivity of management and
innovation in the collaboration network.
After the main database has been created (based on three existing
databases: Metnet, Innomet and Innoclus) and "noisy" data has been
eliminated, the aim is to use the gathered data about the enterprises,
their technological capabilities and employees' skills to build and
experiment with DM models in order to increase these enterprises'
productivity and innovation in product development. DM could also be
useful for product improvement and repair process improvement, making
it possible to determine the most frequent repairs by product, the
factors that contribute to a failure type, and the correlations between
failures.
DOI: 10.2507/daaam.scibook.2009.11
5. Acknowledgement
The work has been supported by Estonian Science Foundation grant
ETF7852.
6. References
Chapman, P.; Clinton, J.; Kerber, R.; Khabaza, T.; Reinartz, T.;
Shearer, C.; Wirth, R. (2000). CRISP-DM 1.0: Step-by-step Data Mining
Guide. SPSS
Hand, D.; Mannila, H.; Smyth, P. (2001). Principles of Data Mining,
MIT Press, Cambridge, MA
Hastie, T.; Tibshirani, R. & Friedman, J. H. (2001). The
elements of statistical learning: Data mining, inference, and
prediction. New York: Springer
Kantardzic, M. (2003). Data Mining: Concepts, Models, Methods, and
Algorithms. John Wiley & Sons
Kusiak, A.; Smith, M. (2007). Data mining in design of products and
production systems, Annual Reviews in Control, 31, 147-156
Larose, D. T. (2006). Data Mining Methods and Models. John Wiley
& Sons Inc., United States of America
Lentzsch, K. (2007). Introduction to Clementine and Data Mining.
SPSS Inc.
Riives, J.; Otto, T.; Keerman, M. (2007). INNOMET system
functionality and software description. Innovative development of human
resources in enterprise and in society (38-46). Tallinn: TUT Press
Roiger, R.; Geatz, M. (2003). Data Mining: a tutorial based primer
(CRSP-M), 408p. Lavoiser
Siqing, S.; Yin, C.; Yan, C. (2003). Data Mining--Concept, Model,
Method and Algorithm. Tsinghua University Publishing Company, Beijing
Tan, P.-N.; Steinbach, M. & Kumar, V. (2006). Introduction to Data
Mining, Addison Wesley, ISBN-13:9780321321367
Tsai, F. S.; Chen, Y.; Chan, K. L. (2008). Probabilistic latent
semantic analysis for search and mining of corporate blogs. In C.
Soares, Y. Peng, J. Meng, Z.-H. Zhou, and T. Washio, editors,
Applications of Data Mining in E-business and Finance. IOS Press
This Publication has to be referred as: Matsi, B[irthe]; Otto,
T[auno]; Loun, K[aia] & Roosimolder, L[embit] (2009). Data Mining in
Production Management and Manufacturing, Chapter 11 in DAAAM
International Scientific Book 2009, pp. 097-106, B. Katalinic (Ed.),
Published by DAAAM International, ISBN 978-3-901509-69-8, ISSN 1726-9687,
Vienna, Austria
Authors' data: M.Sc. Matsi, B[irthe]; Ph.D. Otto, T[auno];
M.Sc. Loun, K[aia] & Ph.D. Prof Roosimolder, L[embit], Tallinn
University of Technology, Ehitajate tee 5, 19086, Tallinn, Estonia,
birthe21@hotmail.com, tauno.otto@ttu.ee, kaia.loun@ttu.ee,
lembitr@staff.ttu.ee