Data mining in production management and manufacturing.
Matsi, Birthe; Loun, Kaia; Otto, Tauno et al.
1. INTRODUCTION
Traditionally analysts have performed the task of extracting useful
information from recorded data. But the increasing volume of data in
modern business and science calls for computer-based approaches. As data
sets have grown in size and complexity, there has been an unavoidable
shift away from direct hands-on data analysis toward indirect, automatic
data analysis using more complex and sophisticated tools. Modern
technology has made data collection an almost effortless task.
However, the captured data needs to be converted into information and
knowledge to become useful. Data Mining (DM) is the entire process of
applying computer-based methodology, including new techniques for
knowledge discovery from data (Kantardzic, 2003).
DM is a general term that encompasses a number of techniques for
picking useful information out of large data files and sorting it. It
is a powerful technology for analyzing data from different perspectives
and summarizing it into useful information. DM has also been described
as the analysis of (often large) observational data sets to find
unsuspected relationships and to summarize the data in novel ways that
are both understandable and useful to the data owner. It is also often
referred to as Knowledge Discovery in Databases (KDD) or simply
Knowledge Discovery (KD). It makes it possible to identify trends within
data that go beyond simple analysis: through the use of sophisticated
algorithms, users can identify key attributes of business processes and
target opportunities. DM has been predicted to be "one of the most
revolutionary developments of the next decade", according to the online
technology magazine ZDNET News (February 8, 2001). In fact, the MIT
Technology Review has chosen DM as one of 10 emerging technologies that
will change the world.
DM has been used widely by companies with a strong consumer focus,
such as retail, financial, communication, and marketing organizations,
where it makes it possible to determine relationships among "internal"
factors such as price and product positioning, and "external"
factors such as economic indicators, competition, and customer
demographics (Hand et al., 2001).
DM can be used in many ways; the most frequent applications include
the following:
* Rate customers by their propensity to respond to an offer.
* Identify cross-sell opportunities.
* Detect fraud and abuse in insurance and finance.
* Estimate probability of an illness re-occurrence or hospital
re-admission.
* Isolate root causes of an outcome in clinical studies.
* Determine optimal sets of parameters for a production line
operation.
* Predict peak load of a network.
[FIGURE 1 OMITTED]
The general Data Mining Process (DMP) includes six phases that
address the main issues in DM (Lentzsch, 2007). These phases fit
together in a cyclical process and cover the full DM process (see Fig 1).
Knowledge discovery could become the key factor in innovation and
business success (Kusiak & Smith, 2007). DM could therefore have great
potential in manufacturing as well.
The implementation of data mining in production management and
manufacturing is the key question of this paper.
2. DM IMPLEMENTATION IN PRODUCTION MANAGEMENT AND MANUFACTURING
2.1 Data mining in manufacturing
Products and components generate a data trail across lifecycle
phases such as market analysis, design engineering, manufacturing, and
service. DM algorithms extract knowledge from this large volume of data,
leading to significant improvements in the next generation of products
and services. Integrating a DM framework within the manufacturing
information system makes it possible to improve the manufacturing
decision-making process and to enhance productivity. It also makes it
possible to analyze enterprises' opportunities and employees' skills and
competences, to find relations between enterprises, customers and
subcontractors, and to draw conclusions based on different combinations
of data.
In order to apply the DMP to the production management and
manufacturing sector, it is important to understand the business
problem (what we would like to do with all that data) and also to
understand what the data is about. All data need to be in an easily
accessible format and available from one central database. It is often
the case that relevant data files are stored in several locations and in
different formats and need to be pulled together before analysis. The
extracted information and knowledge can serve engineers as a reference
and basis for advanced investigation of the root causes of defects
(Larose, 2005). The current case study uses data about the enterprises,
their technological capabilities, and employees' competences from three
different databases: Metnet, Innomet and Innoclus (see Fig 2). Data
understanding and data preparation are unavoidable phases of the DMP.
The data also need to be assessed for the DM project. Several aspects
should be considered, as described in the following subsections.
2.2 Relevant factors covered by data
For a DM project to be worthwhile, it is important that the data
contain all relevant factors/variables and be mutually joinable.
Therefore the data were separated into different tables according to
logical themes. For example, one table includes all information about
the enterprises' contacts, another the enterprises' technological
capabilities, etc. Holding the data in separate tables is sensible
because it simplifies data understanding and later facilitates the DMP.
In order to join the data across these thematically separated tables,
common identifiers (IDs, for example enterprise_id, sector_id, etc.)
were worked out. These IDs are needed for connecting the data of one
enterprise across different tables.
Figure 2 shows the tables into which the data were divided and the
primary source from which each is available.
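The joining of thematically separated tables through a common ID can be illustrated with a minimal sketch. The table names (contacts, capabilities), the column names other than enterprise_id and sector_id, and all sample rows are hypothetical and serve only as an illustration; they are not the actual schema of Metnet, Innomet or Innoclus.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two thematic tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE contacts (
            enterprise_id INTEGER PRIMARY KEY,
            name TEXT,
            sector_id INTEGER
        );
        CREATE TABLE capabilities (
            enterprise_id INTEGER,
            capability TEXT
        );
        INSERT INTO contacts VALUES (1, 'Alpha Metal', 10);
        INSERT INTO contacts VALUES (2, 'Beta Machining', 10);
        INSERT INTO contacts VALUES (3, 'Gamma Tools', 20);
        INSERT INTO capabilities VALUES (1, 'turning');
        INSERT INTO capabilities VALUES (1, 'welding');
        INSERT INTO capabilities VALUES (2, 'milling');
        """
    )
    return conn


def capabilities_by_enterprise(conn):
    """Join the two tables on enterprise_id and group capabilities per enterprise."""
    rows = conn.execute(
        """
        SELECT c.name, cap.capability
        FROM contacts AS c
        LEFT JOIN capabilities AS cap
               ON cap.enterprise_id = c.enterprise_id
        ORDER BY c.enterprise_id, cap.capability
        """
    ).fetchall()
    result = {}
    for name, capability in rows:
        # The LEFT JOIN keeps enterprises with no capability rows (capability is NULL).
        result.setdefault(name, [])
        if capability is not None:
            result[name].append(capability)
    return result
```

A LEFT JOIN is used here so that an enterprise with no recorded capabilities still appears in the result with an empty list, which makes gaps in the data visible instead of silently dropping the enterprise.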
2.3 Handling noisy data
The term "noisy" in DM usually refers to errors in data
and sometimes also to missing data (Hastie et al., 2001). This is a
problem we have to handle in this study, and it results from the way the
data were collected. All data about the enterprises' capabilities were
gathered without multiple-choice options. Initially the answer options
were not defined, and therefore every enterprise answered the questions
differently. In order to interpret the answers unambiguously, the data
had to be synchronized. For employees' skills and competences and for
enterprises' technological capabilities the task was simpler, as
multiple-choice options had been worked out and enterprises answered the
questions by making suitable selections.
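One possible way to synchronize free-text answers is to map each answer to a canonical category and to flag anything that cannot be mapped for manual review. The sketch below assumes a hand-built synonym table; the CANONICAL mapping, the category labels and the function names are hypothetical and are not taken from the study.

```python
# Hypothetical synonym table: free-text variant -> canonical category.
CANONICAL = {
    "cnc milling": "milling",
    "milling": "milling",
    "mill": "milling",
    "welding": "welding",
    "mig welding": "welding",
    "turning": "turning",
    "lathe work": "turning",
}

UNMAPPED = "UNMAPPED"


def normalize_answer(raw):
    """Return the canonical category, None for a missing answer, or UNMAPPED."""
    if raw is None:
        return None
    # Lower-case and collapse runs of whitespace before the lookup.
    key = " ".join(raw.lower().split())
    if not key:
        return None
    return CANONICAL.get(key, UNMAPPED)


def normalize_all(answers):
    """Split answers into mapped categories and raw answers needing manual review."""
    cleaned = [normalize_answer(a) for a in answers]
    mapped = [c for c in cleaned if c is not None and c != UNMAPPED]
    needs_review = [a for a, c in zip(answers, cleaned) if c == UNMAPPED]
    return mapped, needs_review
```

Missing answers are kept apart from unmapped ones on purpose: the former are a data gap, while the latter signal that the synonym table needs to be extended.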
2.4 Gathering enough data
The more complex the patterns and relationships we would like to
find with data mining, the more records are required to find them. It
makes a clear difference whether we are analyzing ten, a hundred or all
Estonian machinery enterprises. It is important to point out that in our
case study all machinery enterprises have been included, and that the
gathered information will be used in different DM analyses.
3. DM ANALYSES
Once the data have been gathered into one central database and
structured so that they are easily understandable, DM can be used in
many different applications. We could build models that are able to
predict different indicators important for better and more effective
production management. For example, it could be possible to create a
predictive DM model for investigating the competences needed for the
most effective product development. In addition, it could be possible to
classify enterprises into different clusters based on their
technological capabilities, etc. DM implementation is therefore also
effective in the manufacturing sector and is necessary for improving
enterprises' productivity and innovation in product development and
manufacturing.
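As an illustration of grouping enterprises by technological capabilities, the sketch below links two enterprises whenever the Jaccard similarity of their capability sets reaches a threshold and reports the resulting connected groups (a simple single-link scheme). The similarity measure, the threshold of 0.5 and all names are assumptions chosen for the example; the paper does not specify a clustering method.

```python
def jaccard(a, b):
    """Jaccard similarity of two capability collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def cluster_by_capabilities(capabilities, threshold=0.5):
    """Group enterprises whose capability sets are linked by similarity >= threshold.

    `capabilities` maps an enterprise name to an iterable of capability labels.
    Returns a sorted list of clusters, each a sorted list of enterprise names.
    """
    names = sorted(capabilities)
    parent = {n: n for n in names}  # union-find forest

    def find(n):
        # Follow parent links to the root, shortening the path on the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard(capabilities[a], capabilities[b]) >= threshold:
                parent[find(b)] = find(a)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())
```

Single-link grouping is easy to follow but can chain loosely related enterprises together; a production analysis would probably compare it with other clustering methods.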
[FIGURE 2 OMITTED]
4. CONCLUSIONS
Data mining is a powerful tool that is needed when amounts of data
increase rapidly. It could also be used for complex analysis at a
country level in the sector of machinery, metal and apparatus
engineering. The implementation of DM could be useful for analyzing and
updating existing databases in the process of developing a collaborative
e-Manufacturing information system. In addition, its implementation
could have a powerful effect on machinery enterprises' productivity
and innovation in product development and manufacturing. Future research
is therefore targeted at increasing the proactivity of the system.
If data feeds from embedded systems reporting technological capability
are added, DM is one of the most promising methods for handling the
information, thus increasing the productivity of management and
innovation in the collaboration network.
Once the main database has been created (for example based on the
three existing databases: Metnet, Innomet and Innoclus) and "noisy"
data have been eliminated, the aim is to use all the gathered data about
the enterprises, their technological capabilities and employees'
skills for building and experimenting with DM models in order to increase
these enterprises' productivity and innovation in product development.
DM could also be useful from the perspectives of product improvement and
repair process improvement, making it possible to determine the most
frequent repairs by product, the factors that contribute to a failure
type, and the correlations between failures.
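The repair-related analyses mentioned above can be sketched as simple counting. The record layout (product, repair) and the use of pairwise co-occurrence counts as a rough stand-in for correlation between failures are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def most_frequent_repairs(records, top=1):
    """Return the `top` most frequent repairs for each product.

    `records` is an iterable of (product, repair) pairs (hypothetical layout).
    """
    by_product = {}
    for product, repair in records:
        by_product.setdefault(product, Counter())[repair] += 1
    return {p: counts.most_common(top) for p, counts in by_product.items()}


def failure_cooccurrence(failures_per_unit):
    """Count how often each pair of failure types occurs on the same unit.

    `failures_per_unit` is an iterable of failure-type lists, one per unit.
    Each pair is stored as an alphabetically sorted tuple.
    """
    pairs = Counter()
    for failures in failures_per_unit:
        # De-duplicate per unit so a repeated failure counts once per pair.
        for pair in combinations(sorted(set(failures)), 2):
            pairs[pair] += 1
    return pairs
```

Co-occurrence counts only indicate which failures tend to appear together; a statistical measure of association would be needed before speaking of correlation in the strict sense.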
5. ACKNOWLEDGEMENTS
The work has been supported by Estonian Science Foundation grant
G6795.
6. REFERENCES
Hand, D.; Mannila, H. & Smyth, P. (2001). Principles of Data Mining.
MIT Press, Cambridge, MA.
Hastie, T.; Tibshirani, R. & Friedman, J. H. (2001). The Elements of
Statistical Learning: Data Mining, Inference, and Prediction. Springer,
New York.
Kantardzic, M. (2003). Data Mining: Concepts, Models, Methods, and
Algorithms. John Wiley & Sons.
Kusiak, A. & Smith, M. (2007). Data mining in design of products and
production systems. Annual Reviews in Control, 31, 147-156.
Larose, D. T. (2006). Data Mining Methods and Models. John Wiley
& Sons Inc., United States of America.
Lentzsch, K. (2007). Introduction to Clementine and Data Mining.
SPSS Inc.