Developing a hybrid data mining approach based on multi-objective particle swarm optimization for solving a traveling salesman problem.
Haeri, Abdorrahman ; Tavakkoli-Moghaddam, Reza
1. Introduction
A traveling salesman problem (TSP) is a traditional and well-known optimization problem in the field of operations research. There are n cities, and the distances between them are fixed and known. In this paper, a symmetric TSP is considered, in which the distance from city i to city j equals the distance from city j to city i. A salesman starts from an arbitrary city, visits every city exactly once, and at the end returns to the first city. In other words, the aim of a TSP is to find a tour between cities that minimizes the total distance travelled by the salesman. This problem can be described by a graph, in which cities are vertices and the route between two cities is an edge whose weight is the distance between the two cities it connects. A Hamiltonian tour is a tour that visits all vertices exactly once; therefore, in this case, the purpose is to find a Hamiltonian tour that minimizes the sum of the edge weights in the tour. The input information is the distance matrix, which gives the distance between any pair of cities. It can be obtained from the coordinates of the cities in two- or three-dimensional space. Each city is specified by horizontal and vertical coordinates in a two-dimensional plane, and the distance between each pair of cities is the Euclidean distance between the two corresponding points.
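To make this concrete, the following minimal Python sketch (illustrative, not part of the original paper; the function name is ours) builds a symmetric Euclidean distance matrix from 2D coordinates:

```python
import math

def distance_matrix(coords):
    """Symmetric Euclidean distance matrix from a list of (x, y) city coordinates."""
    n = len(coords)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            d[i][j] = d[j][i] = math.hypot(dx, dy)  # symmetric TSP: d[i][j] == d[j][i]
    return d
```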
Many studies in the literature use intelligent approaches, such as artificial neural networks (ANNs), for solving TSPs. Masutti and de Castro (2009) developed a modified version of an immune self-organizing neural network for solving a TSP; their results show that its performance is better than that of other neural network methods in the literature. Leung et al. (2004) applied an expanding self-organizing map (ESOM) to instances ranging from 50 to 2,400 cities; the results show the superiority of their approach over several other SOM approaches in the literature. Jin et al. (2003) proposed an integrated SOM (ISOM) with a new learning rule that combines three learning procedures available in the literature. Yan and Zhou (2006) applied a three-tier multi-agent approach to solve TSPs, the tiers being an ant colony optimization agent, a genetic algorithm agent, and a fast local search agent; their results indicate good performance in terms of both solution quality and computational time. Tan et al. (2006) developed an improved multi-agent approach to solve large-scale TSPs that uses three kinds of agents with different functions: generating a new solution, optimizing the current solution group, and refining the best solution. The experimental results show the good performance of their approach. Liu et al. (2006) developed a hybrid of particle swarm optimization (PSO) and a memetic algorithm for solving TSPs, which also includes a simulated annealing (SA) based local search.
In the real world, there is usually more than one objective function; for example, it may be necessary to minimize distance, cost, time and risk simultaneously. It is then necessary to consider more than one distance matrix between cities in order to minimize multiple objectives. In this paper, as in Cheng et al. (2012), Samanlioglu et al. (2008), Jozefowiez et al. (2008) and Zhong et al. (2010), a bi-objective TSP is considered. In multi-objective problems, there is no single best solution; instead, a collection of solutions is regarded as the best. This collection, called the set of non-dominated (efficient) solutions, is based on the dominance concept described below. Consider two solutions A and B in a minimization multi-objective problem, and suppose that the following two conditions hold:
a) every objective value of solution A is less than or equal to the corresponding objective value of solution B;
b) at least one objective value of solution A is strictly less than the corresponding objective value of solution B.
In this case, solution A is said to dominate solution B; indeed, solution B has no advantage over solution A. If no solution dominates solution A, then A is called a non-dominated solution. The aim of solving a multi-objective problem is to find the non-dominated solutions; a minimal sketch of this dominance check is given below.
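A minimal Python sketch of the check (illustrative; the paper gives no code):

```python
def dominates(a, b):
    """True if solution a dominates solution b in a minimization problem.

    a and b are tuples of objective values: a must be no worse in every
    objective (condition a) and strictly better in at least one (condition b).
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

# Example: (3, 5) dominates (4, 5); (3, 5) and (4, 2) do not dominate each other.
```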
Finding non-dominated solutions has been a prominent approach in recent years for solving problems with multiple objectives. Jaszkiewicz (2002) presented a genetic local search for multi-objective optimization problems that creates non-dominated solutions; in each iteration, a local search is applied to the generated offspring in order to increase solution quality, and the efficiency of the approach is examined on TSP instances. Yang et al. (2008) considered a dynamic multi-objective TSP (DMO-TSP) arising in a mobile communication network, whose attributes, namely the number of cities and the degree of conflict between objectives, change dynamically; they proposed a parallel multi-algorithm co-evolution strategy (MACS) for solving this complicated model.
It is apparent from the literature that the multi-objective particle swarm optimization (MOPSO) approach has not been used for solving multi-objective TSPs, and that the data mining (DM) approach has not been used effectively for solving TSPs. It is worth noting that DM is a collection of computational techniques for finding knowledge, hidden patterns and rules in data across different sciences. Ince and Aktan (2009) introduced and applied several data mining techniques in their research. In recent years, data mining has also been used for optimization purposes. In this paper, one data mining technique is used to extract rules from the non-dominated solutions of a multi-objective TSP (MOTSP). Indeed, this paper presents a hybrid approach consisting of the MOPSO procedure and a data mining process for solving MOTSPs. Since DM is an intelligent approach to problem solving, the proposed approach is called intelligent MOPSO (IMOPSO). The three major steps of the proposed IMOPSO for solving MOTSPs are as follows:
Step 1: Solving some MOTSPs with the MOPSO approach.
Step 2: Extracting rules from non-dominated solutions that are
obtained in Step 1 in order to establish a rule set for each problem.
Step 3: Using the rule base of each problem to improve the solutions obtained for larger problems.
In this paper, single- and multi-objective PSO are explained first. Afterwards, the proposed IMOPSO is described; for this purpose, the data mining process for extracting rules from the non-dominated solutions of the MOPSO is demonstrated. After that, the results of the MOPSO and the proposed IMOPSO are compared. Finally, conclusions and suggestions for future research are stated.
2. Single and multi-objective PSO
Particle swarm optimization (PSO) is a meta-heuristic method that has been used efficiently in previous studies for solving NP-hard problems, such as the TSP (e.g., Ouyang and Zhou (2011), Shen et al. (2006), Liu et al. (2006), Zhang and Si (2010) and Shi et al. (2004)). This method simulates a moving school of fish or flock of birds; in PSO, each individual is called a particle and the group is called a swarm. In comparison with genetic algorithms (GAs), a particle and a swarm correspond to a chromosome and a population, respectively. In PSO, the particles created in the first iteration are not discarded; they remain until the end. In each iteration, every particle has a position and a velocity, and the positions of the particles are updated in order to obtain better solutions. The best position found by each particle is stored as its personal best (pbest) position, and the best position found by all particles is stored as the global best (gbest) position, where the best position is the one with the minimum (or maximum) objective function value. The symbols of the PSO procedure are as follows:
x_{i,1}, x_{i,2}, ..., x_{i,n}: n continuous decision variables
x_i = [x_{i,1}, x_{i,2}, ..., x_{i,n}]: position vector in the i-th iteration
v_i = [v_{i,1}, v_{i,2}, ..., v_{i,n}]: velocity vector in the i-th iteration
pbest_i: vector storing the best position of the particle during the iterations
gbest_i: best position of all particles during the iterations
c_1, c_2: predefined coefficients
r_1, r_2: random numbers between 0 and 1, generated for each particle in each iteration
w: inertia factor, which can be set equal to one
The basic PSO approach for solving single-objective problems is
stated as follows:
1) Initial particles are generated randomly.
2) Initial velocities of particles are zero.
3) In each iteration, the velocity of each particle is computed by:

$v_{i+1} = w v_i + c_1 r_1 (pbest_i - x_i) + c_2 r_2 (gbest_i - x_i)$. (1)

4) The position of each particle is updated by:

$x_{i+1} = x_i + v_{i+1}$. (2)

5) The above process is repeated until a termination condition is met, usually a given number of iterations.
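The following Python sketch assembles steps 1-5 for a minimization problem (an illustration under our own naming, not the authors' code; the parameter defaults follow the settings reported later in this section):

```python
import random

def pso(objective, n_vars, n_particles=20, n_iters=80, w=1.0, c1=2.0, c2=2.0):
    """Minimal single-objective PSO following steps 1-5 above (minimization)."""
    x = [[random.random() for _ in range(n_vars)] for _ in range(n_particles)]  # step 1
    v = [[0.0] * n_vars for _ in range(n_particles)]                            # step 2
    pbest = [p[:] for p in x]
    gbest = min(pbest, key=objective)[:]
    for _ in range(n_iters):                                  # step 5: fixed iterations
        for k in range(n_particles):
            r1, r2 = random.random(), random.random()
            for d in range(n_vars):
                v[k][d] = (w * v[k][d]                        # step 3: Eq. (1)
                           + c1 * r1 * (pbest[k][d] - x[k][d])
                           + c2 * r2 * (gbest[d] - x[k][d]))
                x[k][d] += v[k][d]                            # step 4: Eq. (2)
            if objective(x[k]) < objective(pbest[k]):
                pbest[k] = x[k][:]
                if objective(pbest[k]) < objective(gbest):
                    gbest = pbest[k][:]
    return gbest
```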
PSO is suitable for continuous variables, whereas a TSP is a problem with integer variables, so PSO must be modified to be applicable to TSPs. For this purpose, the ranked-order value (ROV) method is used, as in Liu et al. (2006). For an n-city TSP, a string of n numbers, called the original string, is defined. The numbers of this string lie in the [0, 1] range, and each number corresponds to one city. A tour of n cities is obtained from each original string by the ROV method, which proceeds in three steps:
1) Sort the numbers of the original string in ascending order.
2) Assign each number its rank in this ascending order.
3) Create the tour by visiting the cities in ascending order of their ranks.
For example, consider a TSP with five cities and assume that the original string is A = [0.23, 0.11, 0.58, 0.49, 0.87]. The sorted order of A is [0.11, 0.23, 0.49, 0.58, 0.87], so the tour corresponding to A is [2, 1, 4, 3, 5].
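A one-function Python sketch of the ROV decoding (our illustration; it reproduces the example above):

```python
def rov_decode(key):
    """Decode an original string into a tour: visit the cities (1-based
    labels) in ascending order of their key values."""
    return [i + 1 for i in sorted(range(len(key)), key=lambda i: key[i])]

# rov_decode([0.23, 0.11, 0.58, 0.49, 0.87]) returns [2, 1, 4, 3, 5]
```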
In this paper, a multi-objective particle swarm optimization (MOPSO) algorithm is used for solving a multi-objective traveling salesman problem (MOTSP). For this purpose, a crowding distance (CD) factor is defined on the basis of the concept given in Deb et al. (2002). This factor specifies how crowded a solution is by other solutions; in other words, it is a density estimator used for non-dominated solutions. Consider a collection of m non-dominated solutions. The CD factor of each solution is calculated by the following steps:
1) For each objective, sort the solutions in ascending order.
2) The CD of the first and last solutions in the sorted order is set to infinity; in practice, a large number can be used.
3) For the other solutions, the CD is calculated by:

$CD_i = (f_{i+1} - f_{i-1}) / (f_{max} - f_{min})$. (3)
The symbols of Eq. (3) are defined as follows:
CD_i: crowding distance (CD) factor of the i-th solution in the sorted collection of non-dominated solutions
f_{i+1}: objective function value of the (i+1)-th solution in the sorted collection
f_{i-1}: objective function value of the (i-1)-th solution in the sorted collection
f_{max}: maximum objective function value in the sorted collection
f_{min}: minimum objective function value in the sorted collection
4) The overall CD factor of each solution is the sum of its CD factors over all objective functions.
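A Python sketch of this computation (illustrative naming; solutions are represented only by their tuples of objective values):

```python
def crowding_distance(objectives):
    """CD factor of each solution; objectives[i] is the tuple of objective
    values of solution i. Implements steps 1-4 and Eq. (3)."""
    m = len(objectives)
    cd = [0.0] * m
    for obj in range(len(objectives[0])):                  # step 4: sum over objectives
        order = sorted(range(m), key=lambda i: objectives[i][obj])  # step 1
        f_min = objectives[order[0]][obj]
        f_max = objectives[order[-1]][obj]
        cd[order[0]] = cd[order[-1]] = float("inf")        # step 2: boundary solutions
        if f_max == f_min:
            continue
        for k in range(1, m - 1):                          # step 3: Eq. (3)
            cd[order[k]] += ((objectives[order[k + 1]][obj]
                              - objectives[order[k - 1]][obj]) / (f_max - f_min))
    return cd
```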
Steps of MOPSO are explained as follows:
1) Initial particles are generated randomly.
2) Initial velocities of particles are zero.
3) Evaluate all particles and select the non-dominated solutions from the swarm. Non-dominated solutions are stored in a pool called the repository. In each iteration, new non-dominated solutions are added to the repository, and any current repository solution that is dominated by a new solution is deleted. The capacity of the repository is limited and is defined by the user. Suppose the number of non-dominated solutions exceeds the repository capacity, so that some solutions must be excluded. In this situation, the non-dominated solutions are sorted in ascending order of their CD factors and the solutions with the smallest CD factors are excluded; that is, the solutions that are most crowded by other solutions are deleted, while the less crowded ones remain. This increases the diversification of the search-space exploration during the run of the algorithm.
4) The pbest of each particle is updated. In the first iteration, pbest is equal to the initial position of the particle. In subsequent iterations, the pbest of each particle is updated by three simple rules:
a) if the current solution (i.e., position) dominates the pbest solution, the pbest solution is set to the current solution;
b) if the pbest solution dominates the current solution, the pbest solution remains unchanged;
c) if neither dominates the other, one of them is selected at random as the pbest solution.
5) In each iteration, the velocity of each particle is calculated by:

$v_{i+1} = w v_i + c_1 r_1 (pbest_i - x_i) + c_2 r_2 (rep_H - x_i)$. (4)
The main difference between the velocity equations of the single- and multi-objective cases is that in multi-objective problems there is no single global best solution; instead, there is a repository of non-dominated solutions. H refers to one solution selected from the repository. There are several ways of selecting a solution from the repository at random; in this paper, similar to Tsou et al. (2007), it is selected from the less crowded solutions. For this purpose, the solutions in the repository are sorted by their CD factors, the 10% of solutions that are least crowded (i.e., with the largest CD factors) are identified, and H is selected from them at random. Thus, rep_H is the position vector of the selected solution and is used in Eq. (4).
6) Update the position of each particle by:

$x_{i+1} = x_i + v_{i+1}$. (5)

7) The above process is repeated until a termination condition is met, usually a given number of iterations.
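The repository maintenance of step 3 and the leader selection of step 5 can be sketched in Python as follows (a sketch under our own naming, reusing the dominates() and crowding_distance() helpers above; each solution is a (position, objectives) pair):

```python
import random

def update_repository(repository, candidates, capacity=20):
    """Keep only non-dominated solutions; if the repository overflows,
    drop the most crowded members (smallest CD) first."""
    pool = repository + candidates
    rep = [s for s in pool
           if not any(dominates(t[1], s[1]) for t in pool if t is not s)]
    while len(rep) > capacity:
        cd = crowding_distance([s[1] for s in rep])
        rep.pop(cd.index(min(cd)))          # the most crowded solution is deleted
    return rep

def select_leader(repository):
    """rep_H for Eq. (4): pick at random among the 10% least crowded
    (largest CD) members of the repository."""
    cd = crowding_distance([s[1] for s in repository])
    order = sorted(range(len(repository)), key=lambda i: cd[i], reverse=True)
    top = max(1, len(repository) // 10)
    return repository[random.choice(order[:top])]
```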
It is necessary to tune some parameters before running the algorithm. It is recommended to set the number of particles between 20 and 80 and the number of iterations between 80 and 120 (Coello, Lechuga 2002). In this paper, we use a swarm size of 20 and 80 iterations, and the coefficients c_1 and c_2 are both equal to 2. The repository capacity should be defined by the user; in this paper, it is set to 20.
3. Data mining process
The data mining process follows a standard procedure called CRISP-DM, which is explained in previous studies, such as Olson and Delen (2008), Nisbet et al. (2009), Mladenic (2003), Han and Kamber (2006), Gupta (2006), Lin et al. (2008), Maimon and Rokach (2005), Riccia et al. (2000) and Larose (2006). Its six steps are as follows:
3.1. Business understanding
In this phase, the objective of the data mining process is defined. Usually a business objective is considered in data mining studies, which is why this phase is named business understanding. In this study, however, the objective is to find rules in the non-dominated solutions of some TSP examples; in other words, the goal of the data mining study in this paper is to find suitable rules and patterns in the non-dominated solutions of a TSP.
3.2. Data understanding
In this phase, an understanding of the data set is obtained. In each TSP problem here there are two cost matrices between cities; usually the cost matrix is a distance matrix, so there are two matrices giving the distances between cities. In an n-city TSP, the distance matrices are n x n matrices, denoted D_{n,1} and D_{n,2}. In a bi-objective TSP, each edge connecting two cities has two weights, one from each distance matrix. For example, consider the edge p-q connecting cities p and q, and let S_p be the set of edges connecting p to the other cities. Sum_{p,1} is the sum of the weights of the edges in S_p according to the D_{n,1} matrix; similarly, Sum_{p,2} is the sum of the weights of the edges in S_p according to the D_{n,2} matrix.
The considered data set is a table consisting of rows and columns, called records and fields, respectively. Each record represents one edge between two cities, and each field represents one edge attribute. In this paper, the following attributes are considered for the edges:
A field: a binary (0 or 1) field indicating whether an edge appears in a non-dominated solution; its value is 1 if the edge exists in a non-dominated solution and 0 otherwise. As stated earlier, the goal of the data mining process is to find rules in non-dominated solutions, so it is necessary to focus on the edges that appear in them. Therefore, the DM analysis is performed only on the edges with A = 1, i.e., those that exist in a non-dominated solution.
F1 field: the weight of the edge under objective function 1, normalized over the whole matrix.
F2 field: the weight of the edge under objective function 2, normalized over the whole matrix.
G1 field: the weight of the edge under objective function 1, normalized within the set S_p.
G2 field: the weight of the edge under objective function 2, normalized within the set S_p.
In this paper, the max-min method is used for normalization. For example, consider a set of n values x_1, x_2, ..., x_n, whose normalized values are denoted nx_1, nx_2, ..., nx_n, and let min_x and max_x be the minimum and maximum values of the set, respectively. Each value is normalized by Eq. (6):

$nx_i = (x_i - min_x) / (max_x - min_x)$. (6)
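One plausible reading of these field definitions, in Python (illustrative; the exact normalization scopes are our assumption based on the descriptions above):

```python
def edge_fields(p, q, d1, d2):
    """F1, F2, G1, G2 for edge p-q, given distance matrices d1 and d2
    (lists of lists). F-fields normalize over the whole matrix, G-fields
    within S_p, the set of edges incident to city p, using Eq. (6)."""
    def norm(value, values):
        lo, hi = min(values), max(values)            # Eq. (6): max-min scaling
        return (value - lo) / (hi - lo) if hi > lo else 0.0

    n = len(d1)
    all1 = [d1[i][j] for i in range(n) for j in range(i + 1, n)]
    all2 = [d2[i][j] for i in range(n) for j in range(i + 1, n)]
    sp1 = [d1[p][j] for j in range(n) if j != p]     # weights of S_p under D_{n,1}
    sp2 = [d2[p][j] for j in range(n) if j != p]     # weights of S_p under D_{n,2}
    return (norm(d1[p][q], all1), norm(d2[p][q], all2),
            norm(d1[p][q], sp1), norm(d2[p][q], sp2))
```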
3.3. Data preparation
A raw data set is usually not suitable for running data mining algorithms. Data preparation provides a standard framework for decision making and comparison: since the data of different problems take different values, they must be converted to standard, normalized values. Thus, in this phase, the data set is prepared. The value of field A is 0 or 1 and the values of the F1, F2, G1 and G2 fields already lie in the [0, 1] range, so no further transformation is needed.
3.4. Modeling
In this phase, a suitable data mining algorithm is applied to the normalized data set and the results are obtained. Association rule mining (ARM) algorithms operate on a data set of records and fields to efficiently extract rules that explain the relationships between fields. In general, for ARM algorithms, the fields are designated as input or output fields (factors), and the algorithms produce if-then rules explaining the relations between them. For example, consider the rule: if "B < x" then "A = 1". The antecedent of the rule is "B < x", where B is an input field; the consequent is "A = 1", where A is an output field. Rules have two major indices, called support and confidence. For this simple rule, the support is the percentage of records in which the condition "B < x" holds; denote this value by Y. Let X be the percentage of records in which both "B < x" and "A = 1" hold. The confidence is then X divided by Y. Indeed, confidence is the accuracy of a rule and a good measure of how reliable it is. Support and confidence are two important criteria for selecting suitable and efficient rules: rules with high support are frequent, and rules with high confidence are highly accurate.
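These two indices are straightforward to compute; a short Python sketch (our illustration), where a rule is given by two boolean predicates over a record:

```python
def support_confidence(records, antecedent, consequent):
    """Support and confidence of 'if antecedent then consequent' over a
    list of records, following the definitions above."""
    y = sum(1 for r in records if antecedent(r))                     # matches antecedent
    x = sum(1 for r in records if antecedent(r) and consequent(r))   # matches both
    support = y / len(records)
    confidence = x / y if y else 0.0
    return support, confidence

# Example, for the rule 'if B < 0.5 then A = 1' on records stored as dicts:
# support_confidence(data, lambda r: r["B"] < 0.5, lambda r: r["A"] == 1)
```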
The goal of this study is to find suitable rules about the edges in non-dominated solutions; in other words, the purpose is to specify which edges have a greater chance of being in a non-dominated solution. Thus, the output (goal) field is the existence of an edge in a non-dominated solution: field A is the output field, and the other fields, which are attributes of the edges (i.e., F1, F2, G1 and G2), are input fields. There are two major ARM algorithms, namely Apriori and GRI (generalized rule induction). The Apriori algorithm does not accept continuous fields; since the input fields contain continuous values, it is not applicable to the considered data set, and the GRI algorithm is used for extracting rules. The GRI method is introduced and applied in previous studies, such as Larose (2005), Abbass et al. (2002) and Bramer (2007, 1999). The GRI algorithm is run in the SPSS Clementine 11.1 software.
3.5. Evaluation
In this phase, the results of the previous phase are evaluated and analyzed. For each non-dominated solution, the rules whose consequent is "A = 1" are considered. Since the number of rules is large, some of them must be selected for further analysis. In this paper, because confidence is a good criterion for rule selection, a threshold on rule confidence is defined: rules whose confidence is below the threshold are excluded, and rules whose confidence is above it are stored for further analysis in the next step. The confidence threshold is user-defined and depends on the nature of the considered problem and the required accuracy. In this paper, 70% satisfies the required accuracy and is used as the minimum confidence threshold.
3.6. Deployment
In this phase, the results of the previous steps (i.e., the extracted rules) are used for solving MOTSP problems. Consider RS_n, the rule set extracted from the efficient solutions of an n-city bi-objective problem; this rule set contains m rules, as shown in Table 1. After establishing a rule set, it is used to solve a larger k-city (k > n) problem. The following steps, which explain the IMOPSO approach, are performed for this purpose:
1) First, a k-city bi-objective TSP is solved with the MOPSO method. The obtained efficient solutions constitute a set called ES_k; suppose that ES_k contains p efficient solutions.
2) For the i-th (1 <= i <= p) solution of the ES_k set, consider the j-th (1 <= j <= m) rule of RS_n. If the j-th rule evaluates to "False", the next rule is selected. If it evaluates to "True", the rule states the conditions under which two cities (e.g., x and y) can be adjacent with probability CR_j.
3) It is therefore probable that cities x and y are adjacent in efficient solutions of the k-city bi-objective TSP. If cities x and y are already adjacent in the i-th solution, it remains unchanged.
4) If cities x and y are not adjacent in the i-th solution, a random number RI is generated and compared with the confidence of the j-th rule. If RI <= CR_j, the i-th solution is changed so that cities x and y become adjacent: one of the cities adjacent to x in the tour (e.g., z) is selected, and the positions of y and z are exchanged to obtain a new tour in which x and y are adjacent (see the sketch after this list).
5) The previous steps (i.e., Steps 2, 3 and 4) are performed several times to obtain a diverse set of solutions.
6) Finally, the new set of solutions is explored and its efficient solutions are selected. The efficient solutions newly obtained by the IMOPSO approach are then compared with the efficient solutions previously obtained by the MOPSO approach in order to find the final set of efficient solutions. This comparison shows how the proposed hybrid approach improves the ability to reach efficient solutions.
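The swap of step 4 can be sketched in Python as follows (our illustration of the described move; city labels are assumed to appear exactly once in the tour):

```python
import random

def force_adjacent(tour, x, y, confidence):
    """Step 4 above: with probability 'confidence' (the rule's CR_j), modify
    the tour so that cities x and y become adjacent by swapping y with a
    neighbour z of x."""
    if random.random() > confidence:
        return tour                        # RI > CR_j: leave the tour unchanged
    tour = tour[:]
    n = len(tour)
    ix, iy = tour.index(x), tour.index(y)
    if abs(ix - iy) in (1, n - 1):         # already adjacent (the tour is cyclic)
        return tour
    iz = (ix + 1) % n                      # z: one of the cities adjacent to x
    tour[iz], tour[iy] = tour[iy], tour[iz]
    return tour
```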
4. Experimental results and discussion
The proposed approach is applied to five TSP examples. The test problems are taken from the TSPLIB (http://elib.zib.de/pub/mp-testdata/tsp/tsplib/tsplib.html), whose data sets are single-objective TSP instances. However, bi-objective TSP examples are needed in this paper. To prepare them, a trick similar to that of Jaszkiewicz (2002) is used: for each n-city problem, two n-city instances are considered, so the distance matrices of a bi-objective problem are taken from two problems, each with a single distance matrix. For example, to create a 29-city bi-objective TSP, "bayg29" and "bays29" are considered and each distance matrix is taken from one of them. The resulting TSP with two distance matrices is called EX29; that is, EX29 is a bi-objective TSP that includes both the "bayg29" and "bays29" distance matrices. The other bi-objective examples and their corresponding single-objective problems are stated in Table 2.
For each example, the MOPSO approach is used to reach efficient solutions (ESs). Then, the GRI algorithm is applied to extract rules from the efficient solutions. Each rule is an "if-then" statement that includes some terms as antecedents and a consequent. In this paper, the F1, F2, G1 and G2 fields form the antecedents and field A is the consequent. For example, one rule extracted from EX29 is:
If ((F1 < 0.17487) & (F2 > 0.146366) & (G1 < 0.222709) & (0.212743 < G2 < 0.292752)), then A = 1.
All rule sets related to EX29, EX42, EX48, and EX76 are given in the Appendix. The rules share some similarities:
* the consequent of all rules is "A = 1";
* the antecedents of all rules concern the F1, F2, G1 and G2 fields.
The rules differ as follows:
* Some rules do not include all fields. For example, the following rule from EX42 does not include the F2 and G1 fields:
If (F1 < 0.356874) and (0.247236 < G2 < 0.24897), then A = 1.
* The numerical ranges of the F1, F2, G1 and G2 fields differ from rule to rule.
The extracted rules are used to improve the efficient solutions obtained by the MOPSO approach for larger MOTSP problems. For example, the rule set from EX29, called the EX29 rule set, is used to improve the efficient solutions of EX42, EX48, EX76 and EX100. Table 3 shows the results of applying the EX29 rule set to these larger problems and compares the performance of the MOPSO and intelligent MOPSO (IMOPSO) approaches. The first column of this table shows the considered problem. The second and third columns give the numbers of efficient solutions (ES) obtained by the MOPSO and IMOPSO approaches, respectively. The fourth column shows the number of efficient solutions of the MOPSO approach (MOPSO-ES) that are dominated by the efficient solutions of the IMOPSO approach (IMOPSO-ES), and the fifth column shows the number of IMOPSO-ES that are dominated by MOPSO-ES. The sixth column states the total size of the collection that contains the MOPSO-ES and IMOPSO-ES together, and the seventh and eighth columns give the numbers of MOPSO-ES and IMOPSO-ES in this new collection, respectively.
The fourth column of Table 3 shows that the efficient solutions of the IMOPSO approach dominate all of the efficient solutions of the MOPSO approach for EX48, EX76 and EX100; only four MOPSO solutions for EX42 are not dominated by the IMOPSO efficient solutions. The fifth column indicates that no efficient solution of the IMOPSO approach is dominated by any solution of the MOPSO approach for any of the considered problems (EX42, EX48, EX76 and EX100). The seventh and eighth columns show that for EX48, EX76 and EX100 all solutions in the combined collection of MOPSO-ES and IMOPSO-ES come from the IMOPSO-ES; only four MOPSO-ES solutions of EX42 remain in the combined collection.
Similar to Table 3, Tables 4 to 6 show the results of applying the EX42, EX48 and EX76 rule sets, respectively. Table 4 shows that the efficient solutions of the IMOPSO approach dominate all efficient solutions of the MOPSO approach for EX76 and EX100; only three MOPSO solutions of EX48 are not dominated by the IMOPSO solutions. Moreover, no efficient solution of the IMOPSO approach is dominated by any solution of the MOPSO approach for any of these problems (EX48, EX76 and EX100).
Table 5 shows that the efficient solutions of the IMOPSO approach dominate all efficient solutions of the MOPSO approach for EX76, while two MOPSO solutions of EX100 are dominated by the IMOPSO solutions. Moreover, no solution of the IMOPSO approach is dominated by any solution of the MOPSO approach for EX76, and only three solutions of the IMOPSO approach are dominated by solutions of the MOPSO approach for EX100. Table 6 shows that the efficient solutions of the IMOPSO approach dominate three solutions of the MOPSO approach for EX100, while two solutions of the IMOPSO approach are dominated by solutions of the MOPSO approach for EX100.
Tables 3 to 6 show the clear superiority of the IMOPSO approach over the MOPSO approach in the following three respects:
a) most efficient solutions of the MOPSO approach are dominated by the efficient solutions of the IMOPSO approach;
b) most efficient solutions of the IMOPSO approach are not dominated by the efficient solutions of the MOPSO approach;
c) applying the IMOPSO approach yields new efficient solutions in comparison to the MOPSO approach.
Finally, Table 7 summarizes the results of applying the EX29, EX42, EX48 and EX76 rule sets. This table presents the average percentage of efficient solutions of the MOPSO approach that are dominated by the efficient solutions of the IMOPSO approach, and the average percentage of efficient solutions of the IMOPSO approach that are dominated by the efficient solutions of the MOPSO approach. Overall, it indicates that most solutions of the MOPSO approach are dominated by the efficient solutions of the IMOPSO approach, whereas few solutions of the IMOPSO approach are dominated by the efficient solutions of the MOPSO approach.
5. Conclusions
This paper has proposed an integrated intelligent approach for solving a multi-objective traveling salesman problem (MOTSP) that combines data mining and multi-objective particle swarm optimization (MOPSO). First, five problems were solved by the MOPSO approach. Then, data mining (DM) was used to extract knowledge from the efficient solutions of the MOTSPs; this DM built on MOPSO was called intelligent MOPSO (IMOPSO) and constitutes a novel hybrid approach. The GRI algorithm, an association rule mining algorithm, was run, and the extracted knowledge was expressed as if-then rules, which were then used for solving new problems. The process of extracting rules and applying them to improve the solutions of the MOPSO approach was described within a standard data mining framework, the CRISP-DM process. The comparison with the MOPSO approach showed that the IMOPSO approach has two major benefits. First, it produces new efficient solutions and therefore increases the number of non-dominated (efficient) solutions. Second, most solutions of the MOPSO approach are dominated by the efficient solutions of the IMOPSO approach, so the IMOPSO approach presents better solutions. In addition, only a few solutions of the IMOPSO approach were dominated by solutions of the MOPSO approach. In other words, the IMOPSO approach produced solutions that were better than those of the MOPSO approach in terms of both solution quality and quantity. Table 7 shows that 91%, 90%, 64.5% and 43% of the efficient solutions of the MOPSO approach are dominated by the efficient solutions of the IMOPSO approach when the EX29, EX42, EX48 and EX76 rule sets are applied, respectively, while only 0%, 0%, 11.5% and 15% of the efficient solutions of the IMOPSO approach are dominated by the efficient solutions of the MOPSO approach in the respective cases. Furthermore, in multi-objective problems, finding a large number of efficient solutions is a major benefit, and the IMOPSO approach provides more efficient solutions than the MOPSO approach.
Applying the proposed hybrid approach to other optimization problems is suggested for future research. Furthermore, it is suggested to develop a rule-based optimization approach that uses other rule-extraction techniques during the optimization process.
doi: 10.3846/16111699.2011.643445
APPENDIX
Rule sets of EX29, EX42, EX48, and EX76
Problem | Antecedent 1 (F1 field) | Antecedent 2 (F2 field) | Antecedent 3 (G1 field) | Antecedent 4 (G2 field) | Consequent (A field) | Confidence (%)
EX29 | F1 < 0.167098 | F2 > 0.146366 | G1 < 0.213662 | 0.212743 < G2 < 0.292752 | A = 1 | 100
EX29 | F1 < 0.185233 | F2 > 0.200393 | - | 0.215393 < G2 < 0.292752 | A = 1 | 100
EX29 | F1 < 0.090673 | 0.027505 < F2 < 0.085461 | G1 > 0.109936 | G2 > 0.095274 | A = 1 | 100
EX29 | F1 < 0.17487 | F2 > 0.146366 | G1 < 0.222709 | 0.212743 < G2 < 0.292752 | A = 1 | 85.71
EX29 | 0.156736 < F1 < 0.167098 | F2 > 0.081533 | G1 < 0.213662 | 0.176741 < G2 < 0.292752 | A = 1 | 83.33
EX29 | F1 < 0.123057 | 0.083497 < F2 < 0.167977 | G1 > 0.133524 | G2 > 0.12297 | A = 1 | 80
EX29 | 0.156736 < F1 < 0.167098 | F2 > 0.11886 | G1 < 0.213662 | 0.107629 < G2 < 0.292752 | A = 1 | 75
EX29 | F1 < 0.123057 | F2 < 0.085461 | G1 < 0.150334 | G2 > 0.095274 | A = 1 | 71.43
EX29 | F1 < 0.167098 | F2 > 0.146366 | G1 < 0.213662 | 0.176741 < G2 < 0.292752 | A = 1 | 70
EX42 | 0.194803 < F1 < 0.24207 | F2 > 0.119195 | 0.138495 < G1 < 0.355553 | G2 < 0.182414 | A = 1 | 100
EX42 | - | - | G1 < 0.494339 | 0.247236 < G2 < 0.24897 | A = 1 | 100
EX42 | 0.194803 < F1 < 0.195786 | 0.100619 < F2 < 0.168731 | 0.138495 < G1 < 0.494339 | G2 < 0.24897 | A = 1 | 100
EX42 | 0.300163 < F1 < 0.356874 | 0.140867 < F2 < 0.174923 | 0.41657 < G1 < 0.517352 | G2 > 0.237426 | A = 1 | 83.33
EX42 | 0.194803 < F1 < 0.24207 | - | 0.297633 < G1 < 0.355553 | G2 < 0.182414 | A = 1 | 80
EX42 | F1 < 0.356874 | - | - | 0.247236 < G2 < 0.24897 | A = 1 | 75
EX42 | 0.101025 < F1 < 0.233493 | 0.106811 < F2 < 0.147059 | G1 < 0.494339 | G2 < 0.181245 | A = 1 | 71.43
EX48 | 0.167129 < F1 < 0.199907 | - | 0.25849 < G1 < 0.288211 | G2 < 0.309073 | A = 1 | 100
EX48 | 0.1759 < F1 < 0.199907 | - | 0.261527 < G1 < 0.288211 | 0.217505 < G2 < 0.512937 | A = 1 | 100
EX48 | - | - | G1 < 0.303016 | 0.502909 < G2 < 0.506344 | A = 1 | 100
EX48 | F1 > 0.158357 | F2 < 0.071507 | 0.180723 < G1 < 0.313554 | G2 < 0.512937 | A = 1 | 100
EX48 | 0.050323 < F1 < 0.199907 | F2 < 0.162582 | G1 < 0.190416 | 0.192244 < G2 < 0.512937 | A = 1 | 90.91
EX48 | 0.070637 < F1 < 0.199907 | F2 < 0.162582 | G1 < 0.190416 | 0.192244 < G2 < 0.512937 | A = 1 | 90
EX48 | 0.174053 < F1 < 0.199907 | - | 0.261527 < G1 < 0.288211 | 0.217505 < G2 < 0.512937 | A = 1 | 88.89
EX48 | - | F2 < 0.071507 | 0.228761 < G1 < 0.345719 | G2 < 0.512937 | A = 1 | 87.5
EX48 | F1 > 0.158357 | F2 < 0.071507 | 0.180723 < G1 < 0.345719 | G2 < 0.512937 | A = 1 | 85.71
EX48 | 0.146352 < F1 < 0.165743 | F2 > 0.207023 | 0.194593 < G1 < 0.224968 | G2 > 0.373047 | A = 1 | 83.33
EX48 | 0.150046 < F1 < 0.19252 | - | 0.113653 < G1 < 0.277044 | 0.437615 < G2 < 0.512937 | A = 1 | 81.82
EX48 | F1 < 0.199907 | F2 < 0.360827 | 0.204753 < G1 < 0.345719 | 0.405038 < G2 < 0.512937 | A = 1 | 80
EX48 | 0.151893 < F1 < 0.165743 | 0.273043 < F2 < 0.580285 | 0.194593 < G1 < 0.242592 | - | A = 1 | 78.57
EX48 | - | - | 0.180723 < G1 < 0.345719 | 0.502909 < G2 < 0.512937 | A = 1 | 77.78
EX48 | 0.149122 < F1 < 0.182825 | F2 > 0.207023 | 0.180723 < G1 < 0.213478 | G2 < 0.69799 | A = 1 | 76.92
EX48 | 0.150046 < F1 < 0.191597 | - | 0.113653 < G1 < 0.345719 | 0.436966 < G2 < 0.512937 | A = 1 | 75
EX48 | 0.146352 < F1 < 0.165743 | F2 > 0.207023 | 0.194593 < G1 < 0.224968 | - | A = 1 | 73.33
EX48 | F1 > 0.167129 | - | 0.25849 < G1 < 0.277044 | G2 < 0.309073 | A = 1 | 72.73
EX48 | - | - | G1 < 0.345719 | 0.501403 < G2 < 0.512937 | A = 1 | 71.43
EX48 | 0.151893 < F1 < 0.19252 | 0.207023 < F2 < 0.580285 | G1 < 0.242132 | 0.423106 < G2 < 0.69799 | A = 1 | 70.59
EX76 | 0.227084 < F1 < 0.234679 | F2 < 0.190058 | 0.277293 < G1 < 0.332044 | G2 > 0.160957 | A = 1 | 100
References
Abbass, H. A.; Sarker, R. A.; Newton, C. S. 2002. Data Mining: a
Heuristic Approach. PA: Idea Group Publishing.
Bramer, M. A. 1999. Knowledge Discovery and Data Mining. London:
The Institution of Electrical Engineers.
Bramer, M. A. 2007. Principles of Data Mining. London: Springer.
Coello, C. A.; Lechuga, M. S. 2002. MOPSO: a proposal for multiple objective particle swarm optimization, in Proc. of the IEEE World Congress on Evolutionary Computation. Hawaii, 1051-1056.
Cheng, J.; Zhang, G.; Li, Z.; Li, Y. 2012. Multi-objective ant
colony optimization based on decomposition for bi-objective traveling
salesman problems, Soft Computing 16(4): 597-614.
Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. 2002. A fast and elitist multi-objective genetic algorithm: NSGA-II, IEEE Transactions on Evolutionary Computation 6(2): 181-197. http://dx.doi.org/10.1109/4235.996017
Gupta, G. K. 2006. Introduction to Data Mining with Case Studies.
New Delhi: Prentice-Hall of India.
Han, J.; Kamber, M. 2006. Data Mining: Concepts and Techniques. USA: Elsevier Inc.
Ince, H.; Aktan, B. 2009. A comparison of data mining techniques
for credit scoring in banking: a managerial perspective, Journal of
Business Economics and Management 10(3): 233-240.
http://dx.doi.org/10.3846/1611-1699.2009.10.233-240
Jaszkiewicz, A. 2002. Genetic local search for multi-objective
combinatorial optimization, European Journal of Operational Research
137(1): 50-71. http://dx.doi.org/10.1016/S0377-2217(01)00104-7
Jin, H.-D.; Leung, K.-S.; Wong, M.-L.; Xu, Z.-B. 2003. An efficient
self-organizing map designed by genetic algorithms for the traveling
salesman problem, IEEE Transactions on Systems, Man, and Cybernetics,
Part B: Cybernetics 33(6): 877-888.
http://dx.doi.org/10.1109/TSMCB.2002.804367
Jozefowiez, N.; Glover, F.; Laguna, M. 2008. Multi-objective
meta-heuristics for the traveling salesman problem with profits, Journal
of Mathematical Modelling and Algorithms 7(2): 177-195.
http://dx.doi.org/10.1007/s10852-008-9080-2
Larose, D. T. 2005. Discovering Knowledge in Data: an Introduction
to Data Mining. NJ: John Wiley & Sons, Inc.
Larose, D. T. 2006. Data Mining Methods and Models. NJ: John Wiley
& Sons, Inc.
Leung, K.-S.; Jin, H.-D.; Xu, Z.-B. 2004. An expanding
self-organizing neural network for the traveling salesman problem,
Neurocomputing 62(1-4): 267-292.
http://dx.doi.org/10.1016/j.neucom.2004.02.006
Lin, T. Y.; Xie, Y.; Wasilewska, A. 2008. Data Mining: Foundations
and Practice. Heidelberg: Springer.
http://dx.doi.org/10.1007/978-3-540-78488-3
Liu, B.; Wang, L.; Jin, Y.-H.; Huang, D.-X. 2006. An effective
PSO-based memetic algorithm for TSP, Lecture Notes in Control and
Information Sciences 345: 1151-1156.
http://dx.doi.org/10.1007/978-3-540-37258-5_149
Maimon, O. Z.; Rokach, L. 2005. Data Mining and Knowledge Discovery
Handbook. USA: Springer. http://dx.doi.org/10.1007/b107408
Masutti, T. A. S.; de Castro, L. N. 2009. A self-organizing neural
network using ideas from the immune system to solve the traveling
salesman problem, Information Sciences 179(10): 1454-1468.
http://dx.doi.org/10.1016/j.ins.2008.12.016
Mladenic, D. 2003. Data Mining and Decision Support: Integration
and Collaboration. USA: Kluwer Academic Publishers.
Nisbet, R.; Elder, J.; Elder, J. F.; Miner, G. 2009. Handbook of
Statistical Analysis and Data Mining Applications. Canada: Elsevier Inc.
Olson, D. L.; Delen, D. 2008. Advanced Data Mining Techniques.
Heidelberg: Springer.
Ouyang, A.; Zhou, Y. 2011. An improved PSO-ACO algorithm for
solving large-scale TSP, Advanced Materials Research 143-144: 1154-1158.
http://dx.doi.org/10.4028/www.scientific.net/AMR.143-144.1154
Riccia, G. D.; Kruse, R.; Lenz, H.-J. 2000. Computational
Intelligence in Data Mining. Italy: Springer.
Samanlioglu, F.; Ferrell, Jr. W. G.; Kurz, M. E. 2008. A memetic random-key genetic algorithm for a symmetric multi-objective traveling salesman problem, Computers and Industrial Engineering 55(2): 439-449. http://dx.doi.org/10.1016/j.cie.2008.01.005
Shen, B.; Yao, M.; Yi, W. 2006. Heuristic information based
improved fuzzy discrete PSO method for solving TSP, Lecture Notes in
Computer Science (including subseries Lecture Notes in Artificial
Intelligence and Lecture Notes in Bioinformatics) 4099: 859-863.
Shi, X. H.; Xing, X. L.; Wang, Q. X.; Zhang, L. H.; Yang, X. W.;
Zhou, C. G.; Liang, Y. C. 2004. A discrete PSO method for generalized
TSP problem, in Proceedings of 3rd International Conference on Machine
Learning and Cybernetics, August 26-29, 2004. Shanghai, China,
2378-2383.
Tan, Y.-A.; Zhang, X.-H.; Xing, L.-N.; Zhang, X.-L.; Wang, S.-W.
2006. An improved multi-agent approach for solving large traveling
salesman problem, Lecture Notes in Artificial Intelligence 4088:
351-361.
Tsou, C.-S.; Chang, S.-C.; Lai, P.-W. 2007. Swarm Intelligence:
Focus on Ant and Particle Swarm Optimization. Chapter five. I-Tech
Education and Publishing.
Yan, S.-L.; Zhou, K.-F. 2006. Three-tier multi-agent approach for
solving traveling salesman problem, Lecture Notes in Artificial
Intelligence 4099: 813-817.
Yang, M.; Kang, Z.; Kang, L. 2008. A parallel multi-algorithm
solver for dynamic multi-objective TSP (DMO-TSP), Lecture Notes in
Artificial Intelligence 5227: 164-173.
Zhang, J.-W.; Si, W.-J. 2010. Improved Enhanced Self-Tentative PSO
algorithm for TSP, in Proceedings of 6th International Conference on
Natural Computation, August 10-12, 2010. Yantai, China, 2638-2641.
http://dx.doi.org/10.1109/ICNC.2010.5583011
Zhong, H.; Zhuang, N.; Cai, W.; Zhang, G.; Wu, Z. 2010.
Multi-objective two-depot traveling salesman problem with uncertain
fleet size for hazardous materials, in Proceedings of 8th International
Conference on Supply Chain Management and Information Systems: Logistics
Systems and Engineering, October 6-8, 2010. Hong Kong, China, 1-8.
Abdorrahman Haeri (1), Reza Tavakkoli-Moghaddam (2)
Department of Industrial Engineering and Center of Excellence for
Intelligence Based Experimental Mechanic, College of Engineering,
University of Tehran, Tehran, Iran E-mail: (2) tavakoli@ut.ac.ir
(corresponding author)
Received 26 January 2011; accepted 15 November 2011
Abdorrahman HAERI is a PhD student in the Department of Industrial Engineering at the University of Tehran. His research interests include data mining and soft computing methods, such as genetic algorithms and particle swarm optimization. His previous papers concern applying data mining methods, such as association rule mining, to optimization and decision-making problems.
Reza TAVAKKOLI-MOGHADDAM is a professor of Industrial Engineering at the College of Engineering, University of Tehran in Iran. He obtained his PhD in Industrial Engineering from the Swinburne University of Technology in Melbourne (1998), his M.Sc. in Industrial Engineering from the University of Melbourne (1994) and his B.Sc. in Industrial Engineering from the Iran University of Science & Technology in Tehran (1989). He serves on the editorial boards of the International Journal of Engineering and the Iranian Journal of Operations Research. Additionally, he is a member of the executive board of the Iranian Operations Research Society. He is the recipient of the 2009 and 2011 Distinguished Researcher Awards and the 2010 Distinguished Applied Research Award at the University of Tehran, Iran, and was selected as a National Iranian Distinguished Researcher for 2008 and 2010. Professor Tavakkoli-Moghaddam has published 3 books, 12 book chapters, and more than 450 papers in reputable academic journals and conferences.
Table 1. Rules that constitute the RS_n rule set

Rule number | Antecedent | Consequent | Confidence
1 | A_1 | A = 1 | CR_1
2 | A_2 | A = 1 | CR_2
... | ... | ... | ...
m | A_m | A = 1 | CR_m
Table 2. Original examples from the TSPLIB and the corresponding bi-objective examples

Original examples | New bi-objective example
bayg29, bays29 | EX29
dantzig42, swiss42 | EX42
gr48, hk48 | EX48
eil76, pr76 | EX76
kroD100, kroE100 | EX100
Table 3. Results of applying the EX29 rule set

Problem | No. of ES for MOPSO | No. of ES for IMOPSO | No. of MOPSO-ES dominated by IMOPSO-ES | No. of IMOPSO-ES dominated by MOPSO-ES | Total size of combined collection of MOPSO-ES and IMOPSO-ES | No. of MOPSO-ES in collection | No. of IMOPSO-ES in collection
EX42 | 11 | 7 | 7 (64%) | 0 (0%) | (11 - 7) + (7 - 0) = 11 | 11 - 7 = 4 | 7 - 0 = 7
EX48 | 10 | 9 | 10 (100%) | 0 (0%) | (10 - 10) + (9 - 0) = 9 | 10 - 10 = 0 | 9 - 0 = 9
EX76 | 11 | 8 | 11 (100%) | 0 (0%) | (11 - 11) + (8 - 0) = 8 | 11 - 11 = 0 | 8 - 0 = 8
EX100 | 7 | 13 | 7 (100%) | 0 (0%) | (7 - 7) + (13 - 0) = 13 | 7 - 7 = 0 | 13 - 0 = 13
Average | - | - | 91% | 0% | - | - | -
Table 4. Results of applying the EX42 rule set

Problem | No. of ES for MOPSO | No. of ES for IMOPSO | No. of MOPSO-ES dominated by IMOPSO-ES | No. of IMOPSO-ES dominated by MOPSO-ES | Total size of combined collection of MOPSO-ES and IMOPSO-ES | No. of MOPSO-ES in collection | No. of IMOPSO-ES in collection
EX48 | 10 | 6 | 7 (70%) | 0 (0%) | (10 - 7) + (6 - 0) = 9 | 10 - 7 = 3 | 6 - 0 = 6
EX76 | 11 | 14 | 11 (100%) | 0 (0%) | (11 - 11) + (14 - 0) = 14 | 11 - 11 = 0 | 14 - 0 = 14
EX100 | 7 | 5 | 7 (100%) | 0 (0%) | (7 - 7) + (5 - 0) = 5 | 7 - 7 = 0 | 5 - 0 = 5
Average | - | - | 90% | 0% | - | - | -
Table 5. Results of applying the EX48 rule set

Problem | No. of ES for MOPSO | No. of ES for IMOPSO | No. of MOPSO-ES dominated by IMOPSO-ES | No. of IMOPSO-ES dominated by MOPSO-ES | Total size of combined collection of MOPSO-ES and IMOPSO-ES | No. of MOPSO-ES in collection | No. of IMOPSO-ES in collection
EX76 | 11 | 11 | 11 (100%) | 0 (0%) | (11 - 11) + (11 - 0) = 11 | 11 - 11 = 0 | 11 - 0 = 11
EX100 | 7 | 13 | 2 (29%) | 3 (23%) | (7 - 2) + (13 - 3) = 15 | 7 - 2 = 5 | 13 - 3 = 10
Average | - | - | 64.5% | 11.5% | - | - | -
Table 6. Results of applying the EX76 rule set

Problem | No. of ES for MOPSO | No. of ES for IMOPSO | No. of MOPSO-ES dominated by IMOPSO-ES | No. of IMOPSO-ES dominated by MOPSO-ES | Total size of combined collection of MOPSO-ES and IMOPSO-ES | No. of MOPSO-ES in collection | No. of IMOPSO-ES in collection
EX100 | 7 | 13 | 3 (43%) | 2 (15%) | (7 - 3) + (13 - 2) = 15 | 7 - 3 = 4 | 13 - 2 = 11
Average | - | - | 43% | 15% | - | - | -
Table 7. Summary of results for applying the EX29, EX42, EX48 and EX76 rule sets

Rule set | Average percentage of MOPSO-ES dominated by the IMOPSO-ES | Average percentage of IMOPSO-ES dominated by the MOPSO-ES
EX29 | 91% | 0%
EX42 | 90% | 0%
EX48 | 64.5% | 11.5%
EX76 | 43% | 15%