Data mining in telecommunications: case study of cluster analysis.
Bach, Mirjana Pejic; Simicevic, Vanja; Leskovic, Darko et al.
1. INTRODUCTION
Many industries would not have flourished without the support of
information and communication technology. The telecommunication industry
uses information and communication technology to support both the
provision of telecommunication services and its business processes. The
support of business processes is realized in the form of: (1) transaction
information systems, which follow regular business activities and
generate standardized reports, and (2) decision support systems, which
enable intelligent use of the data stored in databases with the aim of
making quality decisions.
Data mining is a part of the support system for the decision-making
process enabling many applications in the field of telecommunications.
The most frequent ones are the following: telecommunication market
analysis (Costea, 2006), preventing clients from shifting to other
companies (Lejeune, 2001; Hung et al., 2006), sale of additional
services to existing customers (Malabocchia et al., 1998), assessment of
the client's values (Daskalaki et al., 2003), as well as market
segmentation.
In telecommunication companies, for the purpose of segmentation of
the industrial market, the most frequently used variables include the
location and the size of the revenue realized from the sale of
telecommunication services. The aim of this paper is to present a case
study on the segmentation of the industrial market in a
telecommunication company by means of cluster analysis. The business
users' data were used as a sample and the approach of dynamic
market microsegmentation is suggested on the basis of the data for each
individual client.
2. METHODOLOGY
The objective of conducting a cluster analysis is to discover if
members of the dataset can be classified as pertaining to one of a small
number of types. This can be especially important for marketing managers
in order to discover what constitutes a market segment in a
telecommunication company.
Cluster analysis is conducted with the aim of assigning data
points (sequences) into reasonably homogeneous groups (clusters). The
main task in cluster analysis is to determine how many clusters are
to be used (Cattrell, 1998). If the number of clusters is too high,
dissimilarity within each cluster will be low, but the clusters might be
very specific, so the results of such an analysis cannot be easily
interpreted and generalized. If the number of clusters is too low, the
dissimilarity within each cluster will be high and such clusters cannot
produce new and useful information. There is no single correct number,
yet a decision needs to be made on how many clusters will be used. In order to
describe the discovery of market segments in databases well, a case
study involving a telecommunication operator is used. This research will
enable us to present segmentation modalities used so far as well as the
proposed modality, based on the discovery of market segments in
databases. We will analyse the industrial market segmentation. The
telecommunication operator from the case study uses the basic market
segmentation, whereby two demographic criteria are used: location and
the size of the user (the total annual revenue from the user). The
market of the Republic of Croatia is divided into four geographic
regions. The industrial market is divided into five important market
segments based on the users' size measured by the total annual
revenue gained. The market segmentation is implemented once a year. One
should note that a calendar year is too long a period for static
segments to remain valid. In the course of a year, a large number of
legal entities register with the company, which means a large number of
new users of telecommunication services in both the private and the
business sector. Additionally, the market for new services is very
dynamic. New services are offered and some existing ones lose their
importance. The users buy new services and new solutions, thus changing
their position towards the telecommunication operator. The presented
approach to industrial market segmentation, which changes only once
every calendar year, is not dynamic enough to encompass either all the
changes in the business activities of business entities or the changes
in the telecommunications market. Therefore, an analysis that uses
variables other than the location and the size of the user measured by
total revenue is presented. The analysis is based on the following
variables: (1) total telecommunications revenue from the users, (2)
coefficient of revenue size from users, (3) potential of the user's
branch of economic activity, (4) ICT potential, (5) compactness of the
relationship between a user and the telecommunication operator and (6)
loyalty coefficient. A database of 2000 business users was analysed.
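The trade-off between the number of clusters and the dissimilarity
within each cluster can be illustrated with a minimal sketch. The paper
does not state which clustering algorithm was applied, so the use of
k-means, the function names kmeans_wcss and sq_dist, the deterministic
seeding and the small two-dimensional data set POINTS are all
assumptions made purely for illustration; they are not the data or the
procedure of the case study.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_wcss(points, k, max_iter=100):
    """Run a basic k-means and return the within-cluster sum of squares."""
    # Assumption: the first k points seed the centroids, which keeps the
    # sketch deterministic; statistical packages usually seed differently.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for centroid, members in zip(centroids, groups):
            if members:
                updated.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                updated.append(centroid)  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    # Within-cluster dissimilarity: squared distance to the nearest centroid.
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Hypothetical data: three well-separated groups of three points each,
# interleaved so that the first three points come from different groups.
POINTS = [
    (1.0, 1.0), (5.0, 5.0), (9.0, 1.0),
    (1.2, 0.8), (5.2, 4.9), (9.1, 1.2),
    (0.9, 1.1), (4.8, 5.1), (8.9, 0.8),
]

for k in range(1, len(POINTS) + 1):
    print(k, round(kmeans_wcss(POINTS, k), 3))
```

With one cluster the within-cluster sum of squares equals the total
variation in the data, and with as many clusters as points it drops to
zero. Neither extreme is informative, which is why the number of
clusters has to be chosen by judgement between them.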
3. RESULTS
A cluster analysis with four clusters was performed, whereby the two
previously mentioned variables were omitted.
Cluster 1 contains the companies that have an average compactness
of the relationship, very low revenue and low ICT potential. Cluster 2
represents the companies with high compactness of the relationship,
high revenue and average ICT potential. Cluster 3 includes the
companies with low ICT potential as well as low compactness of the
relationship and low revenue. Cluster 4 contains the companies with the
highest revenue and with low ICT potential as well as low compactness of
the relationship (Table 1).
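The description above can be checked against Table 1 by ranking the
clusters on each variable. The following sketch simply re-enters the
averages from Table 1; the names TABLE_1, RANKING and rank_clusters are
hypothetical and introduced only for this illustration.

```python
# Average values from Table 1, in the order Cluster 1 .. Cluster 4.
TABLE_1 = {
    "Coefficient of the revenue size": [0.90, 3.93, 1.51, 4.10],
    "ICT potential": [2.04, 2.78, 1.27, 1.67],
    "Compactness of the relationship": [3.13, 3.86, 0.38, 2.36],
}


def rank_clusters(averages):
    """Return cluster labels ordered from the highest to the lowest average."""
    labels = [f"Cluster {i + 1}" for i in range(len(averages))]
    ordered = sorted(zip(averages, labels), reverse=True)
    return [label for _, label in ordered]


RANKING = {variable: rank_clusters(values) for variable, values in TABLE_1.items()}

for variable, order in RANKING.items():
    print(f"{variable}: {' > '.join(order)}")
```

Cluster 4 ranks first on the coefficient of the revenue size, while
Cluster 2 ranks first and Cluster 3 last on both ICT potential and
compactness of the relationship.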
In order to determine further how the identified clusters differ
from each other, descriptive statistics for the variables used are
presented: median values and standard deviations were calculated for the
Internet revenue and the revenue from fixed telephony of the companies
in individual clusters. The data showed that the clusters with higher
median values of the variables used for cluster analysis also have
higher average values of Internet revenue and revenue from fixed
telephony, and vice versa. Thus, the companies from Cluster 2, with the
highest average values of variables (the coefficient of the revenue
size, ICT potential, compactness of the relationship), have the highest
average values of Internet revenue and revenue from fixed telephony. The
analysis of variance (ANOVA) showed that the differences in average
values between individual clusters are statistically significant for
both Internet revenue (p-value = 0.000) and revenue from fixed telephony
(p-value = 0.000). The data revealed that the assumption of different
average values holds for both groups of revenue at the 0.1 probability
level. In order to determine between which clusters the statistically
significant difference exists, a post-hoc analysis by means of the
Scheffé test was performed. For Internet revenue, there is a
statistically significant difference between Cluster 1 and each of the
other clusters at the 0.1 probability level. For the revenue from fixed
telephony, a statistically significant difference exists for all pairs
at the 0.1 probability level except for Cluster 3 and Cluster 4. The
analysis of variance and the Scheffé post-hoc analysis showed that the
cluster analysis is acceptable and that it resulted in determining
market segments of the analysed telecommunication operator.
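The two tests used above can be sketched on a toy example. The code
below computes the one-way ANOVA F statistic and the Scheffé criterion
for every pair of groups. The three groups of values are hypothetical
and are not the revenue data of the case study, the function names are
introduced for illustration, and the critical value 5.14 is the
tabulated 5% point of the F distribution for 2 and 6 degrees of freedom,
chosen only to keep the example free of external libraries.

```python
from itertools import combinations


def one_way_anova(groups):
    """Return (F, df_between, df_within, ms_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    ms_within = ss_within / df_within
    f_stat = (ss_between / df_between) / ms_within
    return f_stat, df_between, df_within, ms_within


def scheffe_pairs(groups, f_crit):
    """Map each pair of group indices to True if the Scheffé test rejects equality."""
    _, df_between, _, ms_within = one_way_anova(groups)
    means = [sum(g) / len(g) for g in groups]
    significant = {}
    for i, j in combinations(range(len(groups)), 2):
        scale = ms_within * (1 / len(groups[i]) + 1 / len(groups[j]))
        pair_f = (means[i] - means[j]) ** 2 / scale / df_between
        significant[(i, j)] = pair_f > f_crit
    return significant


# Hypothetical revenue values for three clusters (illustration only).
GROUPS = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]

F_STAT = one_way_anova(GROUPS)[0]
PAIRS = scheffe_pairs(GROUPS, f_crit=5.14)
print(F_STAT)
print(PAIRS)
```

In this toy example the first two groups do not differ significantly,
while the third differs from both. The result has the same shape as the
one reported above for fixed telephony, where all pairs but one differ.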
The experts in the telecommunication company interpreted the
determined segments in the following way:
Cluster 1 represents the companies with a very low coefficient of the
revenue size. These companies annually spend less than KN 10,000.00 on
telecommunication services. The data related to their ICT potential
suggest that these companies have low ICT potential. ICT potential
points to the companies that might need additional telecommunication
solutions in the future. Cluster 1 represents the companies that also
have an average level of compactness of the relationship with the
telecommunication operator. These companies have been clients of this
operator for quite some time. Thus, this cluster might be named SOHO
(small office, home office).
Cluster 2 includes the companies with a high level of compactness
of the relationship and of ICT potential and a somewhat lower level of
revenue than Cluster 4. It is undoubtedly the most profitable market
segment, to which the most attention should be paid. These companies are
steady clients who will most probably need to expand their business, and
they can be named LA (large account).
Cluster 3 represents the companies with extremely low ICT potential
and compactness of the relationship, and with revenue slightly above the
lowest. It is the least rewarding market segment, with a tendency to
transfer to the competition. These companies have not been the company's
clients for long and they do not need to develop their own ICT. The best
name for this market segment could be SI (Silver).
Cluster 4 represents the companies with the highest revenue but, at
the same time, with low ICT potential and compactness of the
relationship. This group could be named SME (small and medium
enterprises).
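The named segments can support the dynamic microsegmentation suggested
in the introduction: whenever the data of a client change, the client
can be re-assigned to the closest segment. A minimal sketch follows,
assuming that the averages in Table 1 act as cluster centres and that
Euclidean distance is used. Neither assumption is stated in the paper,
and the function name assign_segment and the example profiles are
hypothetical.

```python
import math

# Cluster centres taken from Table 1: (coefficient of the revenue size,
# ICT potential, compactness of the relationship).
CENTROIDS = {
    "Cluster 1": (0.90, 2.04, 3.13),
    "Cluster 2": (3.93, 2.78, 3.86),
    "Cluster 3": (1.51, 1.27, 0.38),
    "Cluster 4": (4.10, 1.67, 2.36),
}

# Segment names proposed by the experts of the telecommunication company.
SEGMENT_NAMES = {
    "Cluster 1": "SOHO",
    "Cluster 2": "LA",
    "Cluster 3": "SI",
    "Cluster 4": "SME",
}


def assign_segment(profile):
    """Return (cluster, segment name) of the centre nearest to the profile."""
    # Assumption: plain Euclidean distance on the three Table 1 variables.
    cluster = min(CENTROIDS, key=lambda c: math.dist(profile, CENTROIDS[c]))
    return cluster, SEGMENT_NAMES[cluster]


# Hypothetical client profiles in the same variable order as CENTROIDS.
print(assign_segment((1.0, 2.0, 3.0)))
print(assign_segment((4.0, 1.7, 2.4)))
```

Re-running such an assignment whenever a client's values are updated
would let segment membership change during the year instead of once per
calendar year.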
4. CONCLUSION
Modern information and communication systems enable the storage
of a large amount of transaction data. By mining transaction data, it is
possible to gain new knowledge about the users of a company's products,
services and solutions. It is necessary to apply this knowledge in order
to determine the users' habits and to form effective market segments
characterized by similar consumer habits. A particular value of this
case study lies in the elaboration of a segmentation model based on
gaining knowledge from the databases of a Croatian telecommunication
operator. It is a leading regional information and communication company
which, at the moment, does not implement market segmentation using
information from its own and external databases but uses the common
approach to segmentation based on location and the size of the revenue
from telecommunication services invoiced to individual users. The study
has shown that market segmentation has to be based on thorough knowledge
of users and their habits and on recording all interactions with a user.
The stored data can be used for data mining, which will result in new
knowledge about users' habits and inclinations and enable the forming of
effective market segments. A targeted approach to individual market
segments results in a significant competitive advantage. By using
cluster analysis as the proposed market segmentation model,
exceptionally attractive market segments were created. It enables the
company to manage the profitability and loyalty of each user. However,
we have to be aware of a limitation of this research: there is no single
correct number of clusters, and a decision had to be made on how many
clusters to use. This model of market segmentation vividly presents the
importance of effective and interactive market segmentation, which will
result in increased competitiveness of companies in the conditions of
ever-growing globalization. Future studies should be aimed at the
implementation of other statistical methods and techniques as well as
methods of artificial intelligence in the field of market segmentation.
5. REFERENCES
Cattrell, R.B. (1998). The Scientific Use of Factor Analysis in the
Behavioural and Life Sciences, Plenum Press, ISBN: 0306309394, New York,
USA
Costea, A. (2006). The Analysis of The Telecommunication Sector by
The Means of Data Mining Techniques. Journal of Applied Quantitative
Methods, Vol. 1, No. 2, (December, 2006) pp. 144-150, ISSN: 1842-4562
Daskalaki, S.; Kopanas, I.; Goudara, M. & Avouris, N. (2003).
Data mining for decision support on customer insolvency in
telecommunications business. European Journal of Operational Research,
Vol. 145, No. 2, (March, 2003) pp. 239-255, ISSN: 0377-2217
Hung, S.; Yen, D.C. & Wang, H.Y. (2006). Applying data mining
to telecom churn management. Expert Systems with Applications, Vol. 31,
No. 3, (October, 2006) pp. 515-524, ISSN: 0957-4174
Lejeune, M.A.P.M. (2001). Measuring the impact of data mining on
churn management. Internet Research, Vol. 11, No. 5, (December, 2001)
pp. 375-387, ISSN: 1066-2243
Malabocchia, G.; Buriano, L.; Mollo, M.J.; Richeldi, M. &
Rossotto, M. (1998). Mining telecommunications data bases: an approach
to support the business management, Available from: Network Operations
and Management Symposium, 1998. NOMS 98., IEEE, Accessed: 1998-02-15
Tab. 1. Average values of the variables from individual clusters

                                  Cluster 1  Cluster 2  Cluster 3  Cluster 4
Coefficient of the revenue size        0.90       3.93       1.51       4.10
ICT potential                          2.04       2.78       1.27       1.67
Compactness of the relationship        3.13       3.86       0.38       2.36