Recent advances on support vector machines research.
Tian, Yingjie; Shi, Yong; Liu, Xiaohui, et al.
1. Introduction
Support vector machines (SVMs), introduced by Vapnik and his coworkers
in the early 1990s (Cortes, Vapnik 1995; Vapnik 1996, 1998), have proven
to be effective and promising techniques for data mining (Peng et al.
2008; Yang, Wu 2006). Three essential elements make SVMs so successful:
the principle of maximal margin, dual theory, and the kernel trick.
Unlike traditional methods (e.g. neural networks), SVMs have their roots
in Statistical Learning Theory (SLT) and optimization, and have become
powerful tools for solving machine learning problems with finite training
points while overcoming traditional difficulties such as the "curse of
dimensionality" and over-fitting. The theoretical foundation and
implementation techniques of SVMs have been established, and SVMs have
gained rapid development and popularity due to
a number of their attractive features: nice mathematical
representations, geometrical explanations, good generalization abilities
and promising empirical performance (Cristianini, Shawe-Taylor 2000;
Deng, Tian 2004, 2009; Deng et al. 2012; Herbrich 2002; Scholkopf, Smola
2002). They have been successfully applied in many fields ranging from
text categorization (Joachims 1999a; Lodhi et al. 2000), face detection,
verification, and recognition (Jonsson et al. 2002; Lu et al. 2001;
Tefas et al. 2001), speech recognition (Ganapathiraju et al. 2004; Ma et
al. 2001), to bioinformatics (Guyon et al. 2001; Zhou, Tuck 2006),
bankruptcy prediction (Shin et al. 2005), remote sensing image analysis
(Melgani, Bruzzone 2004), time series forecasting (Kim 2003; Tay, Cao
2001), information and image retrieval (Druker et al. 2001; Liu et al.
2007; Tian et al. 2000), information security (Mukkamala et al. 2002)
and other areas (Adankon, Cheriet 2009; Ancona et al. 2001; Azimi-Sadjadi,
Zekavat 2000; Borgwardt 2011; Gutta et al. 2000; Peng et al. 2009;
Schweikert et al. 2009; Yao et al. 2002).
In recent years, the fields of machine learning and mathematical
programming have become increasingly intertwined (Bennett,
Parrado-Hernandez 2006), with SVMs as typical representatives. SVMs
reduce most machine learning problems to optimization problems;
optimization lies at the heart of SVMs, and convex optimization in
particular plays an important role. Since convex problems are much more
tractable both algorithmically and theoretically, many SVM algorithms
involve solving convex problems such as linear programming (Nash, Sofer
1996; Vanderbei 2001), convex quadratic programming (Nash, Sofer 1996),
second order cone programming (Alizadeh, Goldfarb 2003; Boyd,
Vandenberghe 2004; Goldfarb, Iyengar 2003), and semi-definite programming
(Klerk 2002). However, non-convex and more general optimization problems
also appear in SVMs: integer or discrete optimization, which considers
non-convex problems with integer constraints, semi-infinite programming
(Goberna, Lopez 1998), bi-level optimization (Bennett et al. 2006), and
so on. Especially in the process of model construction, these
optimization problems may have to be solved many times. The research area
of mathematical programming thus intersects closely with SVMs through
these core optimization problems.
Generally speaking, there are three major themes in the interplay
of SVMs and mathematical programming. The first theme concerns the
development of underlying models for standard classification or
regression problems. Novel methods are developed by modifying the
standard SVM models in ways that enable powerful new algorithms,
including v-SVM (Scholkopf, Smola 2002; Vapnik 1998), linear
programming SVM (Deng, Tian 2009; Deng et al. 2012; Weston et al. 1999),
least squares SVM (LSSVM) (Johan et al. 2002), proximal SVM (PSVM)
(Fung, Mangasarian 2001), twin SVM (TWSVM) (Khemchandani, Chandra 2007;
Shao et al. 2011), multi-kernel SVM (Sonnenburg et al. 2006; Wu et al.
2007), AUC maximizing SVM (Ataman, Street 2005; Brefeld, Scheffer 2005),
localized SVM (Segata, Blanzieri 2009), cost sensitive SVM (Akbani et
al. 2004), fuzzy SVM (Lin, Wang 2002), Crammer-Singer SVM (Crammer,
Singer 2001), K-support vector classification regression (K-SVCR)
(Angulo, Catala 2000), and others. The second theme concerns
the well-known optimization methods extended to new SVM models and
paradigms. A wide range of programming methods is used to create novel
optimization models in order to deal with different practical problems
such as ordinal regression (Herbrich et al. 1999), robust classification
(Goldfarb, Iyengar 2003; Yang 2007; Zhong, Fukushima 2007),
semi-supervised and unsupervised classification (Xu, Schuurmans 2005;
Zhao et al. 2006, 2007), transductive classification (Joachims 1999b),
knowledge based classification (Fung et al. 2001, 2003; Mangasarian,
Wild 2006), Universum classification (Vapnik 2006), privileged
classification (Vapnik, Vashist 2009), multi-instance classification
(Mangasarian, Wild 2008), multi-label classification (Tsoumakas, Katakis
2007; Tsoumakas et al. 2010), multi-view classification (Farquhar et al.
2005), structured output classification (Tsochantaridis et al. 2005),
and so on. The third theme considers important issues in constructing and
solving SVM optimization problems. On the one hand, several methods are
developed for constructing optimization problems in order to enforce
feature selection (Chen, Tian 2010; Tan et al. 2010), model selection
(Bennett et al. 2006; Kunapuli et al. 2008), probabilistic outputs
(Platt 2000), rule extraction from SVMs (Martens et al. 2008) and so on.
On the other hand, existing SVM optimization models need to be
solved more efficiently for large-scale data sets, where the key
point is to create algorithms that exploit the structure of the
optimization problem and pay careful attention to algorithmic and
numerical issues, such as SMO (Platt 1999), efficient methods for solving
large-scale linear SVM (Chang et al. 2008; Hsieh et al. 2008; Joachims
2006; Keerthi et al. 2008), parallel methods for solving large-scale SVM
(Zanghirati, Zanni 2003), and so on.
Considering the many variants of SVM core optimization problems, a
systematic survey is needed and helpful for understanding and using this
family of data mining techniques more easily. The goal of this paper is
to review SVMs closely from the optimization point of view. Section 2 of
the paper takes the standard C-SVM as an example to summarize and explain
the nature of SVMs. Section 3 describes SVM optimization models with
different variations according to the above three major themes. Several
applications of SVMs to financial forecasting, bankruptcy prediction,
and credit risk analysis are introduced in Section 4. Finally, Section 5
provides remarks and future research directions.
2. The nature of C-Support vector machines
In this section, standard C-SVM (Deng, Tian 2004, 2009; Deng et al.
2012; Vapnik 1998) for binary classification is briefly summarized and
understood from several points of view.
Definition 2.1. (Binary classification). For the given training set
T = {(x_1, y_1), ..., (x_l, y_l)},  (1)
where x_i ∈ R^n and y_i ∈ Y = {1, -1}, i = 1, ..., l, the goal is to find a
real function g(x) in R^n and derive the value of y for any x by the
decision function
f(x) = sgn(g(x)).  (2)
C-SVM formulates the problem as the convex quadratic programming
min_{w,b,ξ}  (1/2)||w||² + C Σ_{i=1}^{l} ξ_i,  (3)
s.t.  y_i((w · x_i) + b) ≥ 1 - ξ_i,  i = 1, ..., l,  (4)
      ξ_i ≥ 0,  i = 1, ..., l,  (5)
where ξ = (ξ_1, ..., ξ_l)^T, and C > 0 is a penalty parameter. For this
primal problem, C-SVM solves its Lagrangian dual problem
min_α  (1/2) Σ_{i=1}^{l} Σ_{j=1}^{l} y_i y_j α_i α_j K(x_i, x_j) - Σ_{j=1}^{l} α_j,  (6)
s.t.  Σ_{i=1}^{l} y_i α_i = 0,  (7)
      0 ≤ α_i ≤ C,  i = 1, ..., l,  (8)
where K(x, x') is the kernel function; this dual is also a convex
quadratic problem, and the decision function is then constructed from its
solution.
As is well known, the principle of Structural Risk Minimization (SRM)
is embodied in SVM: the confidence interval and the empirical risk
should be considered at the same time. The two terms in the objective
function (3) indicate that we not only minimize ||w||² (maximize the
margin), but also minimize Σ_{i=1}^{l} ξ_i, which is a measure of the
violation of the constraints y_i((w · x_i) + b) ≥ 1, i = 1, ..., l. Here
the parameter C determines the weighting between the two terms: the
larger the value of C, the larger the punishment on the empirical risk.
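To make the correspondence between the formulas and an implementation concrete, the dual problem (6)~(8) can be assembled and handed to a general-purpose convex solver. The following is a minimal sketch, assuming numpy and cvxpy are available and using an RBF kernel chosen purely for illustration; it is not the solver used in the cited implementations.

```python
import numpy as np
import cvxpy as cp

def rbf_kernel(X, gamma=1.0):
    # Gram matrix K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

def c_svm_dual(X, y, C=1.0, gamma=1.0):
    l = len(y)
    K = rbf_kernel(X, gamma)
    Q = (y[:, None] * y[None, :]) * K                  # Q_ij = y_i y_j K(x_i, x_j)
    alpha = cp.Variable(l)
    objective = cp.Minimize(0.5 * cp.quad_form(alpha, cp.psd_wrap(Q)) - cp.sum(alpha))
    constraints = [y @ alpha == 0, alpha >= 0, alpha <= C]   # constraints (7) and (8)
    cp.Problem(objective, constraints).solve()
    return alpha.value
```

From the solution, the decision function is g(x) = Σ_i y_i α_i K(x_i, x) + b, with b recovered from any training point whose α_i lies strictly between 0 and C.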
In fact, the parameter C has another meaningful interpretation
(Deng, Tian 2009; Deng et al. 2012). Consider the binary classification
problem and select a decision function candidate set F(t) depending on a
real parameter t:
F(t) = {g | g(x) = (w · x) + b, ||w|| ≤ t},  (9)
and suppose the loss function to be the soft margin loss function defined
by
c(x, y, g(x)) = max{0, 1 - y g(x)},  where g(x) = (w · x) + b.  (10)
Thus structural risk minimization is implemented by solving the
following convex programming problem for an appropriate parameter t:
min_{w,b,ξ}  Σ_{i=1}^{l} ξ_i,  (11)
s.t.  y_i((w · x_i) + b) ≥ 1 - ξ_i,  i = 1, ..., l,  (12)
      ξ_i ≥ 0,  i = 1, ..., l,  (13)
      ||w|| ≤ t.  (14)
An interesting result shows that when the parameters C and t are
chosen satisfying t = ψ(C), where ψ(·) is nondecreasing on the
interval (0, +∞), problems (3)~(5) and (11)~(14) yield the same
decision function (Zhang et al. 2010). Hence a very interesting and
important meaning of the parameter C emerges: C corresponds to the size
of the decision function candidate set in the principle of SRM; the
larger the value of C, the larger the decision function candidate set.
Now we can summarize and understand C-SVM from the following points
of view: (i) Construct a decision function by selecting a proper size of
the decision function candidate set via adjusting the parameter C; (ii)
Construct a decision function by selecting the weighting between the
margin of the decision function and the deviation of the decision
function measured by the soft-margin loss function via adjusting the
parameter C; (iii) Another understanding of C-SVM can also be found
in the literature (Deng et al. 2012): construct a decision function by
selecting the weighting between flatness of the decision function and
the deviation of the decision function measured by the soft-margin loss
function via adjusting the parameter C.
3. Optimization models of support vector machines
In this section, several representative and important SVM
optimization models with different variations are described and
analyzed. These models can be divided into three categories: models for
standard problems, models for nonstandard learning problems, and models
combining SVMs with other issues in machine learning.
3.1. Models for standard problems
For standard classification or regression problems, many methods have
been developed based on the standard SVM models to serve as powerful
new algorithms. Here we briefly introduce several basic and efficient
models; many further developments of these models are omitted here.
3.1.1. Least squares support vector machine
Just like standard C-SVM, the starting point of least squares
SVM (LSSVM) (Johan et al. 2002) is to find a separating hyperplane,
but with a different primal problem. In fact, introducing the
transformation x̃ = Φ(x) and the corresponding kernel K(x, x') =
(Φ(x) · Φ(x')), the primal problem becomes the convex quadratic
programming problem
min_{w,b,η}  (1/2)||w||² + (C/2) Σ_{i=1}^{l} η_i²,  (15)
s.t.  y_i((w · Φ(x_i)) + b) = 1 - η_i,  i = 1, ..., l.  (16)
The geometric interpretation of the above problem with x ∈ R² is shown
in Figure 1, where minimizing ||w||² realizes the maximal margin between
the straight lines
(w · x) + b = 1 and (w · x) + b = -1,  (17)
while minimizing the sum of squared deviations implies making the
straight lines (17) proximal to all inputs of the positive and negative
points, respectively.
[FIGURE 1 OMITTED]
The dual problem actually solved in LSSVM is also a convex quadratic
programming problem
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (18)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (19)
where
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (20)
In C-SVM, the error is measured by the soft margin loss function,
which means the decision function is determined only by the support
vectors. In LSSVM, by contrast, almost all training points contribute
to the decision function, so the solution loses sparseness. However,
LSSVM only needs to solve a quadratic program with equality
constraints, or equivalently a linear system of equations. Therefore,
it is simpler and faster than C-SVM.
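Because the LSSVM dual reduces to a linear system of equations, training amounts to a single dense solve. The sketch below is a minimal illustration assuming numpy and a linear kernel; it follows the commonly used Suykens-style system, so the exact arrangement may differ slightly from the system (18)~(20) as printed here.

```python
import numpy as np

def lssvm_train(X, y, C=1.0):
    """Solve the LSSVM training problem as one linear system (standard formulation)."""
    l = len(y)
    K = X @ X.T                                  # linear kernel K(x_i, x_j)
    Omega = (y[:, None] * y[None, :]) * K        # Omega_ij = y_i y_j K(x_i, x_j)
    A = np.zeros((l + 1, l + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(l) / C            # damping term comes from the squared slacks
    rhs = np.concatenate(([0.0], np.ones(l)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]                       # (alpha, b)

def lssvm_predict(X_train, y_train, alpha, b, X_test):
    # g(x) = sum_i alpha_i y_i K(x_i, x) + b
    return np.sign(X_test @ X_train.T @ (alpha * y_train) + b)
```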
3.1.2. Twin support vector machine
Twin support vector machine (TWSVM) is a binary classifier that
performs classification using two nonparallel hyperplanes instead of a
single hyperplane as in the case of conventional SVMs (Shao et al.
2011). Suppose the two non-parallel hyperplanes are the positive
hyperplane
(w_+ · x) + b_+ = 0,  (21)
and the negative hyperplane
(w_- · x) + b_- = 0.  (22)
The primal problems for finding these two hyperplanes are two
convex quadratic programming problems (Shao et al. 2011)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (23)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (24)
ξ_j ≥ 0,  j = p + 1, ..., p + q,  (25)
and
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (26)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (27)
ξ_j ≥ 0,  j = 1, ..., p,  (28)
where x_i, i = 1, ..., p are the positive inputs, x_i, i = p + 1, ..., p + q
are the negative inputs, and c_1 > 0, c_2 > 0, c_3 > 0, c_4 > 0 are
parameters, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]
Both of the above primal problems can be interpreted in the same way.
The geometric interpretation of problem (23)~(25) with x ∈ R² is shown
in Figure 2: minimizing the second term [MATHEMATICAL EXPRESSION NOT
REPRODUCIBLE IN ASCII.] makes the positive hyperplane (blue solid line
in Fig. 2) proximal to all positive inputs; minimizing the third term
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] with the
constraints (24) and (25) requires the positive hyperplane to keep a
distance from the negative inputs by pushing them to the other side of
the bounding hyperplane (blue dotted line in Fig. 2), where a set of
slack variables ξ measures the error whenever the positive hyperplane
is close to the negative inputs; and minimizing the first term
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] realizes the
maximal margin between the positive hyperplane (w_+ · x) + b_+ = 0 and
the bounding hyperplane (w_+ · x) + b_+ = -1 in R^{n+1} space.
[FIGURE 2 OMITTED]
TWSVM is established by solving the two dual problems of the above
primal problems separately. The generalization performance of TWSVM has
been shown to be significantly better than that of standard SVM for both
linear and nonlinear kernels. It has become one of the popular methods
in machine learning because of its low computational complexity, since
it solves the above two smaller-sized convex quadratic programming
problems; on average, it is about four times faster than the standard
SVM.
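As an illustration of one of the two smaller problems, the sketch below solves the positive-hyperplane primal of a linear twin SVM directly with a modeling tool. It assumes numpy and cvxpy and follows the classic twin-SVM primal with a single trade-off parameter, so it omits the extra regularization terms of the model in (23)~(25); the negative hyperplane is obtained symmetrically by exchanging the roles of the two classes.

```python
import numpy as np
import cvxpy as cp

def twsvm_positive_plane(A, B, c1=1.0):
    """Find (w_+, b_+): proximal to the positive inputs A, at distance from the negative inputs B."""
    n = A.shape[1]
    q = B.shape[0]
    w = cp.Variable(n)
    b = cp.Variable()
    xi = cp.Variable(q)
    # proximity of positive points to the plane + penalty for negative points inside the margin
    objective = cp.Minimize(0.5 * cp.sum_squares(A @ w + b) + c1 * cp.sum(xi))
    constraints = [-(B @ w + b) + xi >= 1, xi >= 0]
    cp.Problem(objective, constraints).solve()
    return w.value, b.value
```

A new point is then assigned to the class whose hyperplane it is closer to.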
3.1.3. AUC maximizing support vector machine
Nowadays the area under the receiver operating characteristics
(ROC) curve, which corresponds to the Wilcoxon-Mann-Whitney test
statistic, is increasingly used as a performance measure for
classification systems, especially when one often has to deal with
imbalanced class priors or misclassification costs. The area of that
curve is the probability that a randomly drawn positive example has a
higher decision function value than a random negative example; it is
called the AUC (area under ROC curve). When the goal of a learning
problem is to find a decision function with high AUC value, then it is
natural to use a learning algorithm that directly maximizes this
criterion. In recent years, AUC-maximizing SVMs (AUCSVM) have been
developed (Ataman, Street 2005; Brefeld, Scheffer 2005), in which one
kind of primal problem to be solved is the convex problem
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (29)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (30)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (31)
where [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] are the
positive and negative inputs, respectively. Its dual problem is also
a convex quadratic programming problem.
However, the existing algorithms all have the serious drawback that
the number of constraints is quadratic in the number of training points,
so the problems become very large even for small training sets. To cope
with this, different strategies can be constructed; in one of them a
Fast and Exact k-Means (FEKM) (Goswami et al. 2004) algorithm is applied
to approximate the problem by representing the l⁺l⁻ pairs of positive
and negative inputs by a much smaller number of cluster centers, thereby
reducing the number of constraints and parameters. The approximate
k-means AUCSVM is more effective at maximizing the AUC than the standard
SVM for linear kernels, and its execution time is quadratic in the
sample size.
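To make the link with the Wilcoxon-Mann-Whitney statistic concrete, the small sketch below (assuming numpy; the scores stand for outputs of any decision function g(x)) computes the AUC as the fraction of positive-negative pairs ranked correctly, which is exactly the quantity an AUCSVM tries to maximize.

```python
import numpy as np

def pairwise_auc(scores_pos, scores_neg):
    """AUC as the Wilcoxon-Mann-Whitney statistic over all l+ * l- pairs."""
    diffs = scores_pos[:, None] - scores_neg[None, :]        # g(x_i^+) - g(x_j^-)
    correct = (diffs > 0).sum() + 0.5 * (diffs == 0).sum()   # ties count one half
    return correct / diffs.size

g_pos = np.array([1.2, 0.3, 2.1])     # decision values of positive examples
g_neg = np.array([-0.5, 0.4])         # decision values of negative examples
print(pairwise_auc(g_pos, g_neg))     # probability a random positive outranks a random negative
```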
3.1.4. Fuzzy support vector machine
In standard SVMs, each sample is treated equally; i.e., each input
point is fully assigned to one of the two classes. However, in many
applications, some input points, such as the outliers, may not be
exactly assigned to one of these two classes, and each point does not
have the same meaning for the decision surface. To solve this problem,
each data point in the training set is assigned a membership; if a data
point is detected as an outlier, it is assigned a low membership, so its
contribution to the total error term decreases. Unlike the equal
treatment in standard SVMs, this kind of SVM fuzzifies the penalty term
in order to reduce the sensitivity to less important data points. Fuzzy
SVM (FSVM) constructs its primal problem as (Lin, Wang 2002)
min_{w,b,ξ}  (1/2)||w||² + C Σ_{i=1}^{l} s_i ξ_i,  (32)
s.t.  y_i((w · x_i) + b) ≥ 1 - ξ_i,  i = 1, ..., l,  (33)
      ξ_i ≥ 0,  i = 1, ..., l,  (34)
where s_i is the membership generated by some outlier-detecting
method. Its dual problem is deduced similarly to that of C-SVM and is
the convex quadratic programming problem
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (35)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (36)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.]. (37)
Model (32)~(34) is also the general formulation of the cost-sensitive
SVM (Akbani et al. 2004) for the imbalanced problem, in which different
error costs are used for the positive (C_+) and negative (C_-) classes
min_{w,b,ξ}  (1/2)||w||² + C_+ Σ_{i: y_i = 1} ξ_i + C_- Σ_{i: y_i = -1} ξ_i,  (38)
s.t.  y_i((w · x_i) + b) ≥ 1 - ξ_i,  i = 1, ..., l,  (39)
      ξ_i ≥ 0,  i = 1, ..., l.  (40)
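Both the membership-weighted model (32)~(34) and the cost-sensitive model (38)~(40) can be mimicked with an off-the-shelf solver that accepts per-sample and per-class weights. The sketch below is only an illustration of the idea, assuming scikit-learn is available; the membership values s_i are invented for the example rather than produced by an outlier detector.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [1.1, 0.9], [5.0, 5.0]])
y = np.array([-1, -1, 1, 1, -1])                     # the last point looks like an outlier

# Fuzzy-SVM-style training: memberships s_i rescale each point's error cost to C * s_i
memberships = np.array([1.0, 1.0, 1.0, 1.0, 0.1])    # low membership for the suspected outlier
fsvm = SVC(kernel="linear", C=10.0)
fsvm.fit(X, y, sample_weight=memberships)

# Cost-sensitive training: different error costs C_+ and C_- via per-class weights
csvm = SVC(kernel="linear", C=10.0, class_weight={1: 5.0, -1: 1.0})
csvm.fit(X, y)
```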
3.2. Models for nonstandard problems
For the nonstandard problems that appear in different practical
applications, a wide range of programming methods is used to build
novel optimization models. Here we present several important and
interesting models to show the interplay of SVMs and optimization.
3.2.1. Support vector ordinal regression
Support vector ordinal regression (SVOR) (Herbrich et al. 1999) is
a method to solve a specialization of the multi-class classification
problem: ordinal regression problem. The problem of ordinal regression
arises in many fields, e.g., information retrieval, econometric models,
and classical statistics. It is complementary to the classification
problem and metric regression problem due to its discrete and ordered
outcome space.
Definition 3.1. (Ordinal regression problem). Given a training set
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (41)
where x_i^j is an input of a training point, the superscript
j = 1, ..., M denotes the corresponding class number, i = 1, ..., l^j is
the index within each class, and l^j is the number of training points in
class j. Find M - 1 parallel hyperplanes in R^n
(w · x) - b_r = 0,  r = 1, ..., M - 1,  (42)
where [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] such
that the class number for any x can be predicted by
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (43)
SVOR constructs the primal problem as
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (44)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (45)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (46)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (47)
where [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] Its dual
problem is the following convex quadratic programming
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (48)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (49)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (50)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (51)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (52)
Though SVOR is a method for a specialization of the multi-class
classification problem and has many applications of its own (Herbrich
et al. 1999), it is also used in the context of solving the general
multi-class classification problem (Deng, Tian 2009; Deng et al. 2012;
Yang 2007; Yang et al. 2005), in which SVOR is used as a basic
classifier several times instead of only once, just like binary
classifiers are used for multi-class classification. There are many
choices, since any p-class SVOR with a different order can be a
candidate, where p = 2, 3, ..., M; when p = 2, this approach reduces to
the approach based on binary classifiers.
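Once the common direction w and the ordered thresholds b_1 ≤ ... ≤ b_{M-1} of the parallel hyperplanes (42) are available, prediction only requires counting how many thresholds the score (w · x) exceeds. A minimal sketch assuming numpy; the values of w and the thresholds here are illustrative, not the output of a trained SVOR model:

```python
import numpy as np

def svor_predict(X, w, b_thresholds):
    """Ordinal prediction: 1 + number of thresholds strictly below the score (w . x)."""
    scores = X @ w
    return 1 + (scores[:, None] > np.asarray(b_thresholds)[None, :]).sum(axis=1)

w = np.array([1.0, -0.5])               # common direction of the parallel hyperplanes
b_thresholds = [-1.0, 0.0, 2.0]         # M - 1 = 3 ordered thresholds, hence M = 4 classes
X = np.array([[0.0, 0.0], [3.0, 0.0], [-2.0, 0.0]])
print(svor_predict(X, w, b_thresholds)) # -> [2 4 1]
```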
3.2.2. Semi-supervised support vector machine
In practice, labeled instances are often difficult, expensive, or
time-consuming to obtain, while unlabeled instances may be relatively
easy to collect. Unlike standard SVMs, which use only labeled training
points, many semi-supervised SVMs (S3VM) use a large amount of unlabeled
data together with the labeled data to build better classifiers. The
transductive support vector machine (TSVM) (Joachims 1999b) is such a
method: it efficiently finds a labeling of the unlabeled data so that a
linear boundary has the maximum margin on both the original labeled
data and the (now labeled) unlabeled data; the resulting decision
function has the smallest generalization error bound on the unlabeled
data.
Consider a training set given by
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (53)
where [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] is a
collection of unlabeled inputs. The primal problem in TSVM is
constructed as the following (partly) combinatorial optimization problem
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (54)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (55)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (56)
ξ_i ≥ 0,  i = 1, ..., l,  (57)
ξ*_i ≥ 0,  i = l + 1, ..., l + q,  (58)
where [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] are
parameters. However, finding an exact solution to this problem is
NP-hard, so major effort has focused on efficient approximation
algorithms; SVM-light was the first widely used software implementation
(Joachims 1999b).
Among the approximation algorithms, several relax the above TSVM
training problem to semi-definite programming (SDP) (Xu, Schuurmans
2005; Zhao et al. 2006, 2007). The basic idea is to work with the
rank-one binary label matrix and relax it to a positive semi-definite
matrix without the rank constraint. However, the computational cost of
SDP is still expensive for large-scale problems.
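A common way to sidestep the combinatorial nature of problem (54)~(58) is a simple self-labeling loop: train on the labeled points, impute labels for the unlabeled points, and retrain with a smaller penalty on the imputed part. The sketch below, assuming numpy and scikit-learn, is a naive heuristic in the spirit of TSVM rather than Joachims' actual label-switching algorithm.

```python
import numpy as np
from sklearn.svm import SVC

def naive_tsvm(X_lab, y_lab, X_unl, C_lab=1.0, C_unl=0.1, n_rounds=5):
    """Self-labeling heuristic: iteratively impute labels for the unlabeled data and retrain."""
    clf = SVC(kernel="linear", C=C_lab).fit(X_lab, y_lab)
    for _ in range(n_rounds):
        y_unl = clf.predict(X_unl)                 # current guess for the unlabeled points
        X_all = np.vstack([X_lab, X_unl])
        y_all = np.concatenate([y_lab, y_unl])
        # imputed points get a smaller weight, mirroring a smaller penalty parameter
        w = np.concatenate([np.full(len(y_lab), C_lab), np.full(len(y_unl), C_unl)])
        clf = SVC(kernel="linear", C=1.0).fit(X_all, y_all, sample_weight=w)
    return clf
```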
3.2.3. Universum support vector machine
Unlike semi-supervised SVMs, which leverage unlabeled data from the
same distribution, the Universum support vector machine (USVM) uses
additional data that belong to neither class of interest. The Universum
contains data belonging to the same domain as the problem of interest
and is expected to represent meaningful information related to the
pattern recognition task at hand. The Universum classification problem
can be formulated as follows:
Definition 3.2. (Universum classification problem). Given a
training set
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (59)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (60)
is a collection of additional inputs known not to belong to either
class, find a real function g(x) in R^n such that the value of y
for any x can be predicted by the decision function
f(x) = sgn(g(x)).  (61)
Universum SVM constructs the following primal problem
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (62)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (63)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (64)
ψ_s, ψ*_s ≥ 0,  s = 1, ..., u,  (65)
where [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] and
C_l > 0, C_u > 0, ε > 0 are parameters. Its goal is to find a separating
hyperplane (w · x) + b = 0 such that, on the one hand, it separates the
inputs {x_1, ..., x_l} with maximal margin and, on the other hand, it
approximates the Universum inputs {x*_1, ..., x*_u}. We can also derive
its dual problem and introduce a kernel function to deal with nonlinear
classification.
It is natural to consider the relationship between USVM and 3-class
classification. In fact, it can be shown that, under some assumptions,
USVM is equivalent to K-SVCR (Angulo, Catala 2000) and, with slight
modification, to SVOR with M = 3 (Gao 2008). USVM's performance depends
on the quality of the Universum, and the methodology for choosing an
appropriate Universum is a subject of future research.
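A linear USVM primal of this kind can be written down directly in a modeling language. The sketch below assumes numpy and cvxpy; the parameter names C_l, C_u and eps mirror the text, and the formulation (hinge loss on the labeled points, an ε-insensitive band for the Universum points) is a plausible reading of (62)~(65) rather than a verbatim transcription.

```python
import numpy as np
import cvxpy as cp

def usvm_linear(X, y, X_univ, C_l=1.0, C_u=0.5, eps=0.1):
    l, n = X.shape
    u = X_univ.shape[0]
    w, b = cp.Variable(n), cp.Variable()
    xi = cp.Variable(l)                              # slacks for the labeled points
    psi, psi_star = cp.Variable(u), cp.Variable(u)   # slacks for the Universum points
    obj = 0.5 * cp.sum_squares(w) + C_l * cp.sum(xi) + C_u * cp.sum(psi + psi_star)
    cons = [
        cp.multiply(y, X @ w + b) >= 1 - xi,         # separate the labeled inputs with margin
        X_univ @ w + b <= eps + psi,                 # keep Universum points close to the hyperplane
        X_univ @ w + b >= -eps - psi_star,
        xi >= 0, psi >= 0, psi_star >= 0,
    ]
    cp.Problem(cp.Minimize(obj), cons).solve()
    return w.value, b.value
```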
3.2.4. Robust support vector machine
In standard SVMs, the parameters in the optimization problems are
implicitly assumed to be known exactly. However, in practice, some
uncertainty is often present in real-world problems: these parameters
carry perturbations because they are estimated from training data that
are usually corrupted by measurement noise, and the solutions to the
optimization problems are sensitive to such perturbations. It is
therefore useful to explore formulations that yield discriminants
robust to measurement errors. For example, when the inputs are subject
to measurement errors, it is better to describe them by uncertainty
sets X_i ⊂ R^n, i = 1, ..., l, since all we know is that the input
belongs to the set X_i. The standard problem then turns into the
following robust classification problem.
Definition 3.3. (Robust classification problem). Given a training
set
T = {(X_1, Y_1), ..., (X_l, Y_l)},  (66)
where X_i is a set in R^n and Y_i ∈ {-1, 1}. Find a real function g(x)
in R^n such that the value of y for any x can be predicted by the
decision function
f(x) = sgn(g(x)).  (67)
The geometric interpretation of the robust problem with circle
perturbations is shown in Figure 3, where the circles with "+"
and "o" are the positive and negative input sets, respectively, and the
optimal separating hyperplane (w* · x) + b* = 0 following the principle
of maximal margin is constructed by robust SVM (RSVM). The primal
problem of RSVM for such a case is the semi-infinite programming problem
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (68)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (69)
ξ_i ≥ 0,  i = 1, ..., l,  (70)
where the set X_i is a hypersphere obtained from the perturbation of a
point x_i
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (71)
[FIGURE 3 OMITTED]
This semi-infinite programming problem can be proved to be
equivalent to the following second order cone programming (Goldfarb,
Iyengar 2003; Yang 2007)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (72)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (73)
ξ_i ≥ 0,  i = 1, ..., l,  (74)
u + v = 1, (75)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (76)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (77)
Its dual problem is also a second order cone programming problem
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.](78)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (79)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (80)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (81)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (82)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (83)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (84)
which can be efficiently solved by Self-Dual-Minimization (SeDuMi).
SeDuMi is a tool for solving optimization problems; it can be used to
solve linear programming, second-order cone programming and
semi-definite programming, and is available at
http://sedumi.mcmaster.ca.
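For spherical uncertainty sets X_i = {x : ||x - x_i|| ≤ r_i}, the worst case over each ball can be taken in closed form, so every semi-infinite constraint collapses into a single second order cone constraint y_i((w · x_i) + b) ≥ 1 - ξ_i + r_i||w||. The sketch below, assuming numpy and cvxpy (which hands the resulting SOCP to a conic solver in the same spirit as SeDuMi), writes this robust counterpart down directly.

```python
import numpy as np
import cvxpy as cp

def robust_svm_ball(X, y, radii, C=1.0):
    """Linear SVM robust to ball perturbations of radius r_i around each input x_i."""
    l, n = X.shape
    w, b = cp.Variable(n), cp.Variable()
    xi = cp.Variable(l)
    cons = [xi >= 0]
    for i in range(l):
        # worst case of y_i((w . x) + b) over ||x - x_i|| <= r_i is y_i((w . x_i) + b) - r_i ||w||
        cons.append(y[i] * (X[i] @ w + b) - radii[i] * cp.norm(w, 2) >= 1 - xi[i])
    obj = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
    cp.Problem(obj, cons).solve()
    return w.value, b.value
```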
3.2.5. Knowledge based support vector machine
In many real-world problems, we are given not only the traditional
training set but also prior knowledge, such as advised classification
rules. If appropriately used, prior knowledge can significantly improve
the predictive accuracy of learning algorithms or reduce the amount of
training data needed. The problem can then be extended in the following
way: the single input points in the training set are extended to input
sets, called knowledge sets. If we restrict the input sets to polyhedra,
the problem is formulated mathematically as follows:
Definition 3.4. (Knowledge-based classification problem). Given a
training set
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (85)
where X_i is a polyhedron in R^n defined by
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] Find a real-valued
function g(x) in R^n such that the value of y for any x can be
predicted by the decision function
f(x) = sgn(g(x)).  (86)
Of course we can construct the primal problem to be the following
semi-infinite programming problem
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (87)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (88)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (89)
ξ_i ≥ 0,  i = 1, ..., p + q.  (90)
However, it was shown that the constraints (88)~(90) can be
converted into finitely many constraints, so that the problem becomes
a quadratic programming problem (Fung et al. 2001)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (91)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (92)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (93)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (94)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (95)
ξ, η, u ≥ 0.  (96)
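The key step behind (91)~(96) is that an implication "Bx ≤ d implies (w · x) + b ≥ 1" can be certified by a nonnegative multiplier vector via linear programming duality. The sketch below, assuming numpy and cvxpy, shows this conversion for a hard (slack-free) toy case with one positive and one negative knowledge set and no ordinary training points; it is meant to expose the construction, not to reproduce the full soft model.

```python
import numpy as np
import cvxpy as cp

def kbsvm_two_sets(B_pos, d_pos, B_neg, d_neg):
    """Separate two polyhedral knowledge sets: {x : B_pos x <= d_pos} on the +1 side,
    {x : B_neg x <= d_neg} on the -1 side."""
    n = B_pos.shape[1]
    w, b = cp.Variable(n), cp.Variable()
    u = cp.Variable(B_pos.shape[0], nonneg=True)     # multipliers certifying the +1 implication
    v = cp.Variable(B_neg.shape[0], nonneg=True)     # multipliers certifying the -1 implication
    cons = [
        B_pos.T @ u + w == 0,  d_pos @ u - b + 1 <= 0,   # B_pos x <= d_pos  =>  (w.x) + b >= 1
        B_neg.T @ v - w == 0,  d_neg @ v + b + 1 <= 0,   # B_neg x <= d_neg  =>  (w.x) + b <= -1
    ]
    cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)), cons).solve()
    return w.value, b.value
```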
This model incorporates linear knowledge into linear SVM, while
linear-knowledge-based nonlinear SVM and nonlinear-knowledge-based SVM
were also proposed by Mangasarian and his co-workers (Fung et al. 2003;
Mangasarian, Wild 2006). Handling prior knowledge is worthy of further
study, especially when training data may not be easily available
whereas expert knowledge is readily available in the form of knowledge
sets. Another kind of prior information, such as additional
descriptions of the training points, was also considered, and a method
called privileged SVM was proposed (Vapnik, Vashist 2009); it allows
one to introduce human elements of teaching, such as the teacher's
remarks, explanations and analogies, into the machine learning process.
3.2.6. Multi-instance support vector machine
The multi-instance problem was first proposed in the application
domain of drug activity prediction; similar to both the robust and
knowledge-based classification problems, it can be formulated as
follows.
Definition 3.5. (Multi-instance classification problem). Suppose
that there is a training set
T = {(X_1, Y_1), ..., (X_l, Y_l)},  (97)
where [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] Find a
real function g(x) in R^n such that the label y for any instance
x can be predicted by the decision function
f(x) = sgn(g(x)).  (98)
The set X_i is called a bag containing a number of instances. The
interesting point of this problem is that the label of a bag is related
to the labels of the instances in the bag and is decided in the
following way: a bag is positive if and only if at least one instance
in the bag is positive, and a bag is negative if and only if all
instances in the bag are negative. A geometric interpretation of the
multi-instance classification problem is shown in Figure 4, where every
enclosure stands for a bag; a bag with "+" is positive and a
bag with "o" is negative, and both "+" and
"o" stand for instances.
[FIGURE 4 OMITTED]
For a linear classifier, a positive bag is classified correctly if
and only if some convex combination of points in the bag lies on the
positive side of a separating plane. Thus the primal problem in the
multi-instance SVM (MISVM) is constructed as the following nonlinear
programming problem (Mangasarian, Wild 2008)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (99)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (100)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (101)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (102)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (103)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (104)
where r and s are, respectively, the numbers of instances in all
positive bags and in all negative bags, and p is the number of positive
bags.
Though the above problem is nonlinear, it is easy to see that among
its constraints only the first one is nonlinear, and it is in fact
bilinear. A local solution to this problem can therefore be obtained by
solving a succession of fast linear programs in a few iterations:
alternately hold one set of the variables constituting the bilinear
terms constant while varying the other set. For a nonlinear classifier,
a similar statement applies in the higher-dimensional space induced by
the kernel.
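The alternating idea can be illustrated with a simpler heuristic in the same spirit (closer to the MI-SVM style witness heuristic than to the exact bilinear program above), assuming scikit-learn: fix one "witness" instance per positive bag, train an ordinary SVM against all negative instances, re-pick each witness as the highest-scoring instance of its bag, and repeat.

```python
import numpy as np
from sklearn.svm import SVC

def misvm_heuristic(pos_bags, neg_instances, C=1.0, n_rounds=5):
    """Alternate between choosing a witness instance per positive bag and retraining."""
    witnesses = [bag[0] for bag in pos_bags]     # initial witness: first instance of each bag
    clf = None
    for _ in range(n_rounds):
        X = np.vstack(witnesses + [neg_instances])
        y = np.concatenate([np.ones(len(witnesses)), -np.ones(len(neg_instances))])
        clf = SVC(kernel="linear", C=C).fit(X, y)
        # re-pick each positive bag's witness as its highest-scoring instance
        witnesses = [bag[np.argmax(clf.decision_function(bag))] for bag in pos_bags]
    return clf
```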
3.3. Other SVM issues
This section concerns some important issues of SVMs: feature
selection, parameter (model) selection, probabilistic outputs, rule
extraction, implementation of algorithms and so on, in which
optimization models are also applied.
3.3.1. Feature selection via SVMs
Standard SVMs cannot identify the important features, yet identifying
a subset of features that contribute most to classification is an
important task in machine learning. The benefit of feature selection is
twofold: it leads to parsimonious models that are often preferred in
many scientific problems, and it is also crucial for achieving good
classification accuracy in the presence of redundant features. SVMs can
be combined with various feature selection strategies. Some of them are
"filters": general feature selection methods independent of
SVMs, which select important features first before SVMs are applied.
Others are wrapper-type methods: modifications of SVMs that choose
important features while conducting training/testing. In the machine
learning literature, there are several proposals for automatic feature
selection in the SVM (Bradley, Mangasarian 1998; Guyon et al.
2001; Li et al. 2007; Weston et al. 2001; Zhu et al. 2004; Zou, Yuan
2008) via optimization problems, some of which apply the l_0-norm,
l_1-norm or l_∞-norm SVM and obtain competitive performance.
Naturally, we expect that using the l_p-norm (0 < p < 1) in SVM can
find sparser solutions than using the l_1-norm, with additional
algorithmic advantages. By combining C-SVM with a feature selection
strategy through the l_p-norm (0 < p < 1), the primal problem of the
l_p-support vector machine (l_p-SVM) is (Chen, Tian 2010; Deng et al.
2012; Tian et al. 2010)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (105)
s.t.  y_i((w · x_i) + b) ≥ 1 - ξ_i,  i = 1, ..., l,  (106)
      ξ_i ≥ 0,  i = 1, ..., l,  (107)
where p is a nonnegative parameter, and
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (108)
For the case of p = 0, ||w||_0 represents the number of nonzero
components of w; for the case of p = 1, the problem becomes a linear
programming problem; for the case of p = 2, a convex quadratic
programming problem; and for the case of p = ∞, the problem is proved
to be equivalent to a linear programming problem (Zou, Yuan 2008).
However, solving this nonconvex, non-Lipschitz-continuous
minimization problem is very difficult. After equivalently transforming
the problem into
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (109)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (110)
ξ_i ≥ 0,  i = 1, ..., l,  (111)
-v ≤ w ≤ v,  (112)
and introducing the first-order Taylor expansion as an approximation of
the nonlinear objective function, the problem can be solved by a
successive linear approximation algorithm (Bradley et al. 1998; Deng et
al. 2012). Furthermore, a lower bound for the absolute value of the
nonzero entries in every local optimal solution of l_p-SVM has been
developed (Tian et al. 2010), which reflects the relationship between
the sparsity of the solution and the choice of the parameters C and p.
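As a readily available special case, the l_1-norm variant already produces sparse weight vectors and serves as a simple baseline for SVM-based feature selection. A short sketch assuming scikit-learn (the genuinely non-convex l_p case with 0 < p < 1 would instead require the successive linear approximation described above):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))              # 50 features, only the first three informative
y = np.sign(X[:, 0] + 0.5 * X[:, 1] - X[:, 2] + 0.1 * rng.normal(size=200))

clf = LinearSVC(penalty="l1", dual=False, C=0.1, max_iter=10000).fit(X, y)
selected = np.flatnonzero(clf.coef_[0])     # indices of features with nonzero weight
print(selected)
```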
3.3.2. LOO error bounds for SVMs
The success of SVMs depends on the tuning of several parameters
which affect the generalization error. An effective approach to
choosing parameters that generalize well is to estimate the
generalization error and then search for the parameters that minimize
this estimate. This requires estimators that are both effective and
computationally efficient. The leave-one-out (LOO) method (Vapnik,
Chapelle 2000) is the extreme case of cross-validation, and the LOO
error provides an almost unbiased estimate of the generalization error.
However, one shortcoming of the LOO method is that it is highly
time-consuming when the number of training points l is very large, so
methods are sought to speed up the process. An effective approach is to
approximate the LOO error by an upper bound that can be computed by
running a concrete classification algorithm only once on the original
training set T of size l. This approach has been developed successfully
for support vector classification (Gretton et al. 2001; Jaakkola,
Haussler 1998, 1999; Joachims 2000; Vapnik, Chapelle 2000), support
vector regression (Chang, Lin 2005; Tian 2005; Tian, Deng 2005), and
support vector ordinal regression (Yang et al. 2009). We can then
search for the parameters that minimize this upper bound.
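A classical example of such a bound is the support-vector count: for the standard SVM, the LOO error is bounded above by the fraction of support vectors, which requires only one training run. The sketch below, assuming numpy and scikit-learn with synthetic data, compares the exact (and expensive) LOO error with this cheap one-pass quantity.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1, size=(30, 2)), rng.normal(1, 1, size=(30, 2))])
y = np.concatenate([-np.ones(30), np.ones(30)])

clf = SVC(kernel="rbf", C=10.0, gamma=0.5)

# exact LOO error: train l times, each time leaving one point out
loo_error = 1 - cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()

# cheap one-pass estimate: fraction of support vectors after a single training run
clf.fit(X, y)
sv_fraction = len(clf.support_) / len(y)

print(f"LOO error = {loo_error:.3f}, support-vector fraction = {sv_fraction:.3f}")
```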
Furthermore, inspired by the LOO error bound, approaches have been
proposed that directly minimize the expression given by the bound in an
attempt to minimize the leave-one-out error (Tian 2005; Weston 1999);
these approaches are called LOO support vector machines (LOOSVM).
LOOSVMs also involve solving convex optimization problems, one of which
is the linear programming problem
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (113)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (114)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (115)
where
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (116)
and K(x, x') is the kernel function. LOOSVMs possess many of the same
properties as SVMs. The main novelty of these algorithms is that, apart
from the choice of kernel, they are parameterless: the selection of the
number of training errors is inherent in the algorithms and not chosen
through an extra free parameter as in SVMs.
3.3.3. Probabilistic outputs for support vector machines
For a binary classification problem with the training set (1),
standard C-SVM computes a decision function (2) that can be used to
predict the label of any test input x. However, we cannot guarantee
that the prediction is absolutely correct, so sometimes we wish to know
how much confidence we have, i.e. the probability of the input x
belonging to the positive class. To answer this question, we
investigate the information contained in g(x): it is not difficult to
imagine that the larger g(x) is, the larger this probability is, so the
value of g(x) can be used to estimate the probability P(y = 1 | g(x))
of the input x belonging to the positive class. In fact, we only need
to establish an appropriate monotonic function from (-∞, +∞), where
g(x) takes its values, to the probability interval [0, 1]; for example,
the sigmoid function is used (Platt 2000)
P(y = 1 | g(x)) = 1 / (1 + exp(c_1 g(x) + c_2)),  (117)
where c_1 < 0 and c_2 are two parameters to be determined. In order to
choose the optimal values c_1* and c_2*, an unconstrained optimization
problem is constructed following the idea of maximum likelihood
estimation
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (118)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (119)
This problem is a two-parameter optimization, so it can be solved with
any number of optimization algorithms; Figure 5 shows numerical results
of the probabilistic outputs of a linear SVM on some data (Platt 2000).
For better implementation of solving problem (118), an improved
algorithm that theoretically converges and avoids numerical difficulties
was also proposed (Lin et al. 2007).
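In practice, Platt scaling amounts to fitting the two sigmoid parameters by logistic regression on the decision values. The sketch below, assuming numpy and scikit-learn and synthetic data, illustrates the idea; scikit-learn's probability=True option performs an internally cross-validated version of the same procedure, and fitting on a held-out set rather than the training set is preferable to avoid bias.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 1, size=(50, 2)), rng.normal(1, 1, size=(50, 2))])
y = np.concatenate([-np.ones(50), np.ones(50)])

svm = SVC(kernel="linear", C=1.0).fit(X, y)
g = svm.decision_function(X).reshape(-1, 1)   # values g(x) of the decision function

# Platt-style sigmoid: fit P(y = 1 | g) = 1 / (1 + exp(c_1 g + c_2)) by maximum likelihood
platt = LogisticRegression().fit(g, y)
proba_pos = platt.predict_proba(g)[:, 1]      # estimated P(y = 1 | g(x))
print(proba_pos[:5])
```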
3.3.4. Rule extraction from support vector machines
Though SVMs are state-of-the-art tools in data mining, their strength
is also their main weakness, as the generated nonlinear models are
typically regarded as incomprehensible black-box models. Therefore,
opening the black box, or making SVMs explainable, i.e. extracting
rules from SVM models to mimic their behavior and give them
comprehensibility, has become important and necessary in areas such as
medical diagnosis and credit evaluation (Martens et al. 2008).
[FIGURE 5 OMITTED]
[FIGURE 6 OMITTED]
Several techniques for extracting rules from SVMs have been developed
so far, and one way of classifying these rule extraction techniques is
in terms of "translucency", i.e. the view the rule extraction
method takes of the underlying classifier. The two main categories of
rule extraction methods are known as pedagogical (Setiono et al. 2006)
and decompositional (Fung et al. 2005; Nunez et al. 2002). Pedagogical
algorithms treat the trained model as a black box and directly extract
rules relating the inputs and outputs of the SVM. Decompositional
approaches, on the other hand, are closely tied to the internal
workings of the SVM and its constructed hyperplane.
Fung et al. (2005) present an algorithm for extracting propositional
classification rules from linear SVMs. The method is considered
decompositional because it is only applicable when the underlying model
provides a linear decision boundary. The resulting rules are
axis-parallel and non-overlapping, but only (asymptotically)
exhaustive. The algorithm is iterative and extracts the rules by
solving a constrained optimization problem that is computationally
inexpensive. Figure 6 shows the execution of the algorithm for binary
classification, where only rules for the black squares are extracted
(Fung et al. 2005). Different optimal rules are extracted according to
different criteria; one of them, maximizing the logarithm of the volume
of the region that the rule encloses, leads to solving the following
optimization problem
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (120)
s.t.  (w · x) + b = 0,  (121)
      0 ≤ x ≤ 1.  (122)
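The volume-maximizing criterion (120)~(122) is a concave maximization over a box with one vertex constrained to lie on the separating hyperplane, so it can be handed to a convex solver directly. A small sketch assuming numpy and cvxpy, where w and b are illustrative values standing in for a trained linear SVM and the rule box is taken to have its opposite vertex at the origin so that its log-volume is Σ_i log x_i:

```python
import numpy as np
import cvxpy as cp

def extract_volume_rule(w, b):
    """Vertex x in [0, 1]^n on the hyperplane (w . x) + b = 0 maximizing the log-volume of [0, x]."""
    n = len(w)
    x = cp.Variable(n)
    prob = cp.Problem(cp.Maximize(cp.sum(cp.log(x))),
                      [w @ x + b == 0, x >= 0, x <= 1])
    prob.solve()
    return x.value

w = np.array([1.0, -2.0, 0.5])
b = -0.2
print(extract_volume_rule(w, b))   # corner of the extracted rule box
```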
However, existing rule extraction algorithms have limitations in
real applications, especially when the problems are large scale and
high dimensional. Incorporating feature selection into the rule
extraction problem is therefore another possibility to be explored, and
some papers have already considered this topic (Yang, Tian 2011).
4. Applications in economics
SVMs have been successfully applied in many fields including
economics, finance and management. Some applications of SVMs to
financial forecasting problems have been reported (Cao, Tay 2001, 2003;
Kim 2003; Tay, Cao 2001, 2002). Tay and Cao (2002) proposed C-ascending
SVMs, which increase the value of the parameter C over time; the idea
is based on the assumption that it is better to give more weight to
recent data than to distant data. Their results showed that C-ascending
SVMs give better performance than the standard SVM in financial time
series forecasting. Cao and Tay (2003) also compared SVMs with the
multilayer backpropagation (BP) neural network and the regularized
radial basis function (RBF) neural network; simulation results showed
that SVMs with adaptive parameters outperform the two other methods.
Bankruptcy prediction is an important and widely studied topic,
since it can have a significant impact on bank lending decisions and
profitability; SVMs have been successfully applied to this problem in
recent years (Fan, Palaniswami 2000; Huang et al. 2004; Min, Lee 2005;
Min et al. 2006; Shin et al. 2005). The results on different real-world
data sets demonstrated that SVMs outperform BP networks in accuracy and
generalization performance. The variability of performance with respect
to various parameter values in SVMs was also investigated.
Due to recent financial crises and regulatory concerns, credit risk
assessment is an area that has seen a resurgence of interest from both
the academic world and the business community. Since credit risk
analysis, or credit scoring, is in fact a classification problem, many
classification techniques have been applied to this field, and
naturally the competitive SVMs can be used (Stoenescu Cimpoeru 2011;
Shi et al. 2005; Thomas et al. 2005; Van Gestel et al. 2003; Yu et al.
2009; Zhou et al. 2009). Additionally, combining genetic algorithms
with SVMs, the hybrid GA-SVM can simultaneously perform feature
selection and model parameter optimization (Huang et al. 2007). Because
in credit scoring we usually cannot label a customer as absolutely good
or bad, a fuzzy support vector machine different from model (32)~(34)
was proposed that treats every input as belonging to both the positive
and the negative class, but with different memberships (Wang et al.
2005),
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (123)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (124)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (125)
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII.] (126)
where m_i is the membership of the i-th input to the class y_i.
Other applications in economics, including motor insurance fraud
management (Furlan et al. 2011), environmental risk assessment
(Kochanek, Tynan 2010), e-banking website quality assessment (Kaya,
Kahraman 2011) and others, can also be explored with SVMs.
5. Remarks and future directions
This paper has offered an extensive review of optimization models
of SVMs, including least squares SVM, twin SVM, AUC Maximizing SVM, and
fuzzy SVM for standard problems; support vector ordinal regression,
semi-supervised SVM, Universum SVM, robust SVM, knowledge based SVM, and
multi-instance SVM for nonstandard problems; as well as the l_p-norm
SVM for feature selection, LOOSVM based on minimizing an LOO error bound,
probabilistic outputs for SVM, and rule extraction from SVM. These
models have already been used in many real-life applications, such as
text categorization, bio-informatics, bankruptcy prediction, remote
sensing image analysis, network intrusion detection, information
security, and credit assessment management. Some applications to
financial forecasting, bankruptcy prediction, credit risk analysis are
also reviewed in this paper. Researchers and engineers in data mining,
especially in SVMs can benefit from this survey in better understanding
the essence of the relation between SVMs and optimization. In addition,
it can also serve as a reference repertory of such approaches.
Research in SVMs and research in optimization have become
increasingly coupled. In this paper we have seen that linear, nonlinear,
second order cone, semi-definite, integer or discrete, and semi-infinite
programming models are all used. Of course, there are still many
optimization models of SVMs not discussed here, and new practical
problems remaining to be explored present new challenges for
constructing new SVM optimization models. These models should have the
same desirable properties as the models in this paper (Bennett et al.
2006): good generalization, scalability, simple and easy
implementation, robustness, as well as theoretically known convergence
and complexity.
doi: 10.3846/20294913.2012.661205
Acknowledgments
This work has been partially supported by grants from National
Natural Science Foundation of China (No. 70921061, No. 10601064), the
CAS/SAFEA International Partnership Program for Creative Research Teams,
Major International (Regional) Joint Research Project (No. 71110107026),
the President Fund of GUCAS, and the National Technology Support Program
2009BAH42B02.
Received 05 September 2011; accepted 19 December 2011
References
Adankon, M. M.; Cheriet, M. 2009. Model selection for the LS-SVM
application to handwriting recognition, Pattern Recognition 42(12):
3264-3270. http://dx.doi.org/10.1016/j.patcog.2008.10.023
Akbani, R.; Kwek, S.; Japkowicz, N. 2004. Applying support vector
machines to imbalanced datasets, in Proceedings of European Conference
on Machine Learning, Lecture Notes in Computer Science 3201: 39-50.
Alizadeh, F.; Goldfarb, D. 2003. Second-order cone programming,
Mathematical Programming, Series B 95: 3-51.
http://dx.doi.org/10.1007/s10107-002-0339-5
Ancona, N.; Cicirelli, G.; Branca, A.; Distante, A. 2001. Goal
detection in football by using support vector machines for
classification, in Proceedings of International Joint Conference on
Neural Networks 1: 611-616.
Angulo, C.; Catala, A. 2000. K-SVCR, a multi-class support vector
machine, in Proceedings of European Conference on Machine Learning,
Lecture Notes in Computer Science 1810: 31-38.
http://dx.doi.org/10.1007/3-540-45164-1_4
Ataman, K.; Street, W. N. 2005. Optimizing area under the ROC curve
using ranking SVMs, in Proceedings of International Conference on
Knowledge Discovery in Data Mining. Available from Internet:
http://dollar.biz.uiowa.edu/street/research/kdd05kaan.pdf
Azimi-Sadjadi, M. R.; Zekavat, S. A. 2000. Cloud classification
using support vector machines, in Proceedings of IEEE Geoscience and
Remote Sensing Symposium 2: 669-671.
Bennett, K.; Ji, X.; Hu, J.; Kunapuli, G.; Pang, J. S. 2006. Model
selection via bilevel optimization, in Proceedings of IEEE World
Congress on Computational Intelligence, 1922-1929.
Bennett, K.; Parrado-Hernandez, E. 2006. The interplay of
optimization and machine learning research, Journal of Machine Learning
Research 7: 1265-1281.
Borgwardt, K. M. 2011. Kernel Methods in Bioinformatics. Handbook
of Statistical Bioinformatics. Part 3, 317-334.
Boyd, S.; Vandenberghe, L. 2004. Convex Optimization. Cambridge
University Press.
Bradley, P. S.; Mangasarian, O. L.; Street, W. N. 1998. Feature
selection via mathematical programming, INFORMS Journal on Computing
Spring 10(2): 209-217.
Bradley, P.; Mangasarian, O. 1998. Feature selection via concave
minimization and support vector machines, in Proceedings of
International Conference on Machine Learning, Morgan Kaufmann, 82-90.
Brefeld, U.; Scheffer, T. 2005. AUC maximizing support vector
learning, in Proceedings of the 22nd International Conference on Machine
Learning, Workshop on ROC Analysis in Machine Learning. Available from
Internet: http://users.dsic.upv.es/~flip/ROCML2005/papers/brefeldCRC.pdf
Cao, L. J.; Tay, F. 2001. Financial forecasting using support
vector machines, Neural Computing Applications 10: 184-192.
http://dx.doi.org/10.1007/s005210170010
Cao, L. J.; Tay, F. 2003. Support vector machine with adaptive
parameters in financial time series forecasting, IEEE Transactions on
Neural Networks 14(6): 1506-1518.
http://dx.doi.org/10.1109/TNN.2003.820556
Chang, K. W.; Hsieh, C. J.; Lin, C. J. 2008. Coordinate descent
method for large-scale L2-loss linear SVM, Journal of Machine Learning
Research 9: 1369-1398.
Chang, M. W.; Lin, C. J. 2005. Leave-one-out bounds for support
vector regression model selection, Neural Computation 17(5): 1188-1222.
http://dx.doi.org/10.1162/0899766053491869
Chen, W. J.; Tian, Y. J. 2010. [l.sub.p]-norm proximal support
vector machine and its applications, Procedia Computer Science 1(1):
2417-2423. http://dx.doi.org/10.1016/j.procs.2010.04.272
Stoenescu Cimpoeru, S. 2011. Neural networks and their application
in credit risk assessment. Evidence from the Romanian market,
Technological and Economic Development of Economy 17(3): 519-534.
http://dx.doi.org/10.3846/20294913.2011.606339
Cortes, C.; Vapnik, V. 1995. Support vector networks, in
Proceedings of Machine Learning 20: 273-297.
Crammer, K.; Singer, Y. 2001. On the algorithmic implementation of
multi-class kernel based vector machines, Journal of Machine Learning
Research 2: 265-292.
Cristianini, N.; Shawe-Taylor, J. 2000. An Introduction to Support
Vector Machines and Other Kernel-based Learning Methods. Cambridge
University Press.
Deng, N. Y.; Tian, Y. J. 2004. New Method in Data Mining: Support
Vector Machines. Science Press, Beijing, China.
Deng, N. Y.; Tian, Y. J. 2009. Support Vector Machines: Theory,
Algorithms and Extensions. Science Press, Beijing, China.
Deng, N. Y.; Tian, Y. J.; Zhang, C. H. 2012. Support Vector
Machines: Optimization Based Theory, Algorithms and Extensions. CRC
Press (in press).
Druker, H.; Shahrary, B.; Gibbon, D. C. 2001. Support vector
machines: relevance feedback and information retrieval, Information
Processing and Management 38(3): 305-323.
Fan, A.; Palaniswami, M. 2000. Selecting bankruptcy predictors
using a support vector machine approach, in Proceedings of International
Joint Conference on Neural Networks (IJCNN'00) 6: 354-359.
Farquhar, J. D. R.; Hardoon, D. R.; Meng, H. Y.; Taylor, J. S.;
Szedmak, S. 2005. Two view learning: SVM-2K, theory and practice,
Advances in Neural Information Processing Systems 18: 355-362.
Fung, G.; Mangasarian, O. L. 2001. Proximal support vector machine
classifiers, in Proceedings of International Conference of Knowledge
Discovery and Data Mining, 77-86.
Fung, G.; Mangasarian, O. L.; Shavlik, J. 2001. Knowledge-based
support vector machine classifiers, Advances in Neural Information
Processing Systems 15: 537-544.
Fung, G.; Mangasarian, O. L.; Shavlik, J. 2003. Knowledge-based
nonlinear
kernel classifiers, Learning Theory and Kernel Machines, Lecture
Notes in Computer Science 2777: 102-113.
http://dx.doi.org/10.1007/978-3-540-45167-9_9
Fung, G.; Sandilya, S.; Rao, R. B. 2005. Rule extraction from
linear support vector machines, in Proceedings of International
Conference on Knowledge Discovery in Data Mining, 32-40.
Furlan, S.; Vasilecas, O.; Bajec, M. 2011. Method for selection of
motor insurance fraud management system components based on business
performance, Technological and Economic Development of Economy 17(3):
535-561. http://dx.doi.org/10.3846/20294913.2011.602440
Ganapathiraju, A.; Hamaker, J.; Picone, J. 2004. Applications of
support vector machines to speech recognition, IEEE Transaction on
Signal Process 52(8): 2348-2355.
http://dx.doi.org/10.1109/TSP.2004.831018
Gao, T. T. 2008. U-support Vector Machine and Its Applications:
Master Thesis. China Agricultural University.
Goberna, M. A.; Lopez, M. A. 1998. Linear Semi-Infinite
Optimization. New York: John Wiley.
Goldfarb, D.; Iyengar, G. 2003. Robust convex quadratically
constrained programs, Mathematical Programming, Series B 97: 495-515.
http://dx.doi.org/10.1007/s10107-003-0425-3
Goswami, A.; Jin, R.; Agrawal, G. 2004. Fast and exact out-of-core
k-means clustering, in Proceedings of the IEEE International Conference
on Data Mining 10: 17-40.
Gretton, A.; Herbrich, R.; Chapelle, O. 2001. Estimating the
leave-one-out error for classification learning with SVMs. Available
from Internet: http://www.kyb.tuebingen.mpg.de/publications/pss/ps1854.ps
Gutta, S.; Huang, J. R. J.; Jonathon, P.; Wechsler, H. 2000.
Mixture of experts for classification of gender, ethnic origin, and pose of human faces, IEEE Transactions on Neural Networks 11(4): 948-960.
http://dx.doi.org/10.1109/72.857774
Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. 2001. Gene
selection for cancer classification using support vector machines,
Machine Learning 46: 389-422. http://dx.doi.org/10.1023/A:1012487302797
Herbrich, R. 2002. Learning Kernel Classifiers: Theory and
Algorithms. The MIT Press.
Herbrich, R.; Graepel, T.; Obermayer, K. 1999. Support vector
learning for ordinal regression, in Proceedings of the 9th International
Conference on Artificial Neural Networks, 97-102.
http://dx.doi.org/10.1049/cp:19991091
Hsieh, C. J.; Chang, K. W.; Lin, C. J.; Keerthi, S. S.;
Sundararajan, S. 2008. A dual coordinate descent method for large-scale
linear SVM, in Proceedings of the 25th International Conference on
Machine Learning (ICML08), 408-415.
Huang, C. L.; Chen, M. C.; Wang, C. J. 2007. Credit scoring with a
data mining approach based on support vector machines, Expert Systems
with Applications 33(4): 847-856.
http://dx.doi.org/10.1016/j.eswa.2006.07.007
Huang, W.; Lai, K. K.; Nakamori, Y.; Wang, S. Y. 2004. Forecasting
foreign exchange rates with artificial neural networks: a review,
International Journal of Information Technology and Decision Making
3(1): 145-165. http://dx.doi.org/10.1142/S0219622004000969
Jaakkola, T. S.; Haussler, D. 1998. Exploiting generative models in
discriminative classifiers, Advances in Neural Information Processing
Systems 11. MIT Press.
Jaakkola, T. S.; Haussler, D. 1999. Probabilistic Kernel regression
models, in Proceedings of the 1999 Conference on AI and Statistics.
Morgan Kaufmann.
Joachims, T. 1999a. Text categorization with support vector
machines: learning with many relevant features, in Proceedings of 10th
European Conference on Machine Learning, 137-142.
Joachims, T. 1999b. Transductive inference for text classification
using support vector machines, in Proceedings of 16th International
Conference on Machine Learning. Morgan Kaufmann, San Francisco, CA,
200-209.
Joachims, T. 2000. Estimating the generalization performance of an
SVM efficiently, in Proceedings of the 17th International Conference on
Machine Learning. Morgan Kaufmann, San Francisco, California, 431-438.
Joachims, T. 2006. Training linear SVMs in linear time, in
Proceedings of International Conference on Knowledge Discovery in Data
Mining, 217-226.
Johan, A. K. S.; Tony, V. G.; Jos, D. B.; Bart, D. M.; Joos, V.
2002. Least Squares Support Vector Machines. World Scientific.
Jonsson, K.; Kittler, J.; Matas, Y. P. 2002. Support vector
machines for face authentication, Journal of Image
and Vision Computing 20(5): 369-375.
http://dx.doi.org/10.1016/S0262-8856(02)00009-4
Kaya, T.; Kahraman, C. 2011. A fuzzy approach to e-banking website
quality assessment based on an integrated AHP-ELECTRE method,
Technological and Economic Development of Economy 17(2): 313-334.
http://dx.doi.org/10.3846/20294913.2011.583727
Keerthi, S. S.; Sundararajan, S.; Chang, K. W.; Hsieh, C. J.; Lin,
C. J. 2008. A sequential dual method for large scale multi-class linear
SVMs, in Proceedings of the International Conference on Knowledge
Discovery and Data Mining, 408-416.
Khemchandani, J. R.; Chandra, S. 2007. Twin support vector machines
for pattern classification, IEEE Transactions on Pattern Analysis and Machine Intelligence 29(5): 905-910.
http://dx.doi.org/10.1109/TPAMI.2007.1068
Kim, K. J. 2003. Financial time series forecasting using support
vector machines, Neurocomputing 55(1): 307-319.
http://dx.doi.org/10.1016/S0925-2312(03)00372-2
Klerk, E. 2002. Aspects of Semidefinite Programming. Kluwer
Academic Publishers, Dordrecht.
Kochanek, K.; Tynan, S. 2010. The environmental risk assessment for
decision support system for water management in the vicinity of open
cast mines (DS WMVOC), Technological and Economic Development of Economy
16(3): 414-431. http://dx.doi.org/10.3846/tede.2010.26
Kunapuli, G.; Bennett, K.; Hu, J.; Pang, J. S. 2008. Bilevel model
selection for support vector machines, CRM Proceedings and Lecture Notes
45: 129-158.
Li, J. P.; Chen, Z. Y.; Wei, L. W.; Xu, W. X.; Kou, G. 2007.
Feature selection via least squares support feature machine,
International Journal of Information Technology and Decision Making
6(4): 671-686. http://dx.doi.org/10.1142/S0219622007002733
Lin, C. F.; Wang, S. D. 2002. Fuzzy support vector machines, IEEE Transactions on Neural Networks 13(2): 464-471.
http://dx.doi.org/10.1109/72.991432
Lin, H. T.; Lin, C. J.; Weng, R. C. 2007. A note on Platt's probabilistic outputs for support vector machines, Machine Learning 68:
267-276. http://dx.doi.org/10.1007/s10994-007-5018-6
Liu, Y.; Zhang, D.; Lu, G.; Ma, W. Y. 2007. A survey of
content-based image retrieval with high-level semantics, Pattern
Recognition 40(1): 262-282.
http://dx.doi.org/10.1016/j.patcog.2006.04.045
Lodhi, H.; Cristianini, N.; Shawe-Taylor, J.; Watkins, C. 2000.
Text classification using string kernels, Advances in Neural Information
Processing Systems 13: 563-569.
Lu, J. W.; Plataniotis, K. N.; Venetsanopoulos, A. N. 2001. Face
recognition using feature optimization and u-support vector machine, in
Proceedings of the 2001 IEEE Signal Processing Society Workshop,
373-382.
Ma, C.; Randolph, M. A.; Drish, J. 2001. A support vector
machines-based rejection technique for speech recognition, in
Proceedings of IEEE International. Conference on Acoustics, Speech, and
Signal Processing 1: 381-384.
Mangasarian, O. L.; Wild, E. W. 2006. Nonlinear knowledge-based
classifiers, IEEE Transactions on Neural Networks 19(10): 1826-1832.
http://dx.doi.org/10.1109/TNN.2008.2005188
Mangasarian, O. L.; Wild, E. W. 2008. Multiple instance
classification via successive linear programming, Journal of
Optimization Theory and Applications 137(1): 555-568.
http://dx.doi.org/10.1007/s10957-007-9343-5
Martens, D.; Huysmans, J.; Setiono, R.; Vanthienen, J.; Baesens, B.
2008. Rule extraction from support vector machines: an overview of
issues and application in credit scoring, Studies in Computational
Intelligence (SCI) 80: 33-63.
http://dx.doi.org/10.1007/978-3-540-75390-2_2
Melgani, F.; Bruzzone, L. 2004. Classification of hyperspectral
remote sensing images with support vector machines, IEEE Transactions on
Geoscience and Remote Sensing 42(8): 1778-1790.
http://dx.doi.org/10.1109/TGRS.2004.831865
Min, J. H.; Lee, Y. C. 2005. Bankruptcy prediction using support
vector machine with optimal choice of Kernel function parameters, Expert
Systems with Applications 28(4): 603-614.
http://dx.doi.org/10.1016/j.eswa.2004.12.008
Min, S. H.; Lee, J.; Han, I. 2006. Hybrid genetic algorithms and
support vector machines for bankruptcy prediction, Expert Systems with
Applications 31(3): 652-660.
http://dx.doi.org/10.1016/j.eswa.2005.09.070
Mukkamala, S.; Janoski, G.; Sung, A. H. 2002. Intrusion detection
using neural networks and support vector machines, in Proceedings of
IEEE International Joint Conference on Neural Networks, 1702-1707.
Nash, S. G.; Sofer, A. 1996. Linear and Nonlinear Programming.
McGraw-Hill Companies, Inc. USA.
Nunez, H.; Angulo, C.; Catala, A. 2002. Rule extraction from
support vector machines, in European Symposium on Artificial Neural
Networks (ESANN), 107-112.
Peng, Y.; Kou, G.; Shi, Y.; Chen, Z. X. 2008. A descriptive
framework for the field of data mining and knowledge discovery,
International Journal of Information Technology and Decision Making
7(4): 639-682. http://dx.doi.org/10.1142/S0219622008003204
Peng, Y.; Kou, G.; Wang, G. X., et al. 2009. Empirical evaluation
of classifiers for software risk management, International Journal of
Information Technology and Decision Making 8(4): 749-767.
http://dx.doi.org/10.1142/S0219622009003715
Platt, J. 1999. Fast training of support vector machines using
sequential minimal optimization, in Scholkopf, B.; Burges, C. J. C.;
Smola, A. J. (Eds.). Advances in Kernel Methods - Support Vector Learning.
Cambridge, MA: MIT Press, 185-208.
Platt, J. 2000. Probabilistic outputs for support vector machines
and comparison to regularized likelihood methods, in Smola, A.;
Bartlett, P.; Scholkopf, B.; Schuurmans, D. (Eds.). Advances in Large
Margin Classifiers. MIT Press, Cambridge, MA.
Scholkopf, B.; Smola, A. J. 2002. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. The MIT
Press.
Schweikert, G.; Zien, A.; Zeller, G.; Behr, J.; Dieterich, C., et
al. 2009. mGene: accurate SVM-based gene finding with an application to
nematode genomes, Genome Research 19: 2133-2143.
http://dx.doi.org/10.1101/gr.090597.108
Segata, N.; Blanzieri, E. 2009. Fast local support vector machines
for large datasets, Machine Learning and Data Mining in Pattern
Recognition, Lecture Notes in Computer Science 5632: 295-310.
http://dx.doi.org/10.1007/978-3-642-03070-3_22
Setiono, R.; Baesens, B.; Mues, C. 2006. Risk management and
regulatory compliance: a data mining framework based on neural network
rule extraction, in Proceedings of the International Conference on
Information Systems (ICIS06). Available from Internet:
http://www.springerlink.com/content/v837r344822815hr/fulltext.pdf
Shao, Y.; Zhang, C. H.; Wang, X. B.; Deng, N. Y. 2011. Improvements
on twin support vector machines, IEEE Transactions on Neural Networks
22(6): 962-968. http://dx.doi.org/10.1109/TNN.2011.2130540
Shi, Y.; Peng, Y.; Kou, G.; Chen, Z. X. 2005. Classifying credit
card accounts for business intelligence and decision making: a
multiple-criteria quadratic programming approach, International Journal
of Information Technology and Decision Making 4(4): 581-599.
http://dx.doi.org/10.1142/S0219622005001775
Shin, K. S.; Lee, T. S.; Kim, H. J. 2005. An application of support
vector machines in bankruptcy prediction model, Expert Systems with
Applications 28(1): 127-135.
http://dx.doi.org/10.1016/j.eswa.2004.08.009
Sonnenburg, S.; Ratsch, G.; Schafer, C.; Scholkopf, B. 2006. Large
scale multiple kernel learning, Journal of Machine Learning Research 7:
1-18.
Tan, J. Y.; Zhang, C. H.; Deng, N. Y. 2010. Cancer related gene
identification via p-norm support vector machine, in Proceedings of
International Conference on Computational Systems Biology, 101-108.
Tay, F. E. H.; Cao, L. J. 2001. Application of support vector
machines in financial time series forecasting, Omega 29(4): 309-317.
http://dx.doi.org/10.1016/S0305-0483(01)00026-3
Tay, F. E. H.; Cao, L. J. 2001. Improved financial time series forecasting by combining support vector machines with self-organizing feature map, Intelligent Data Analysis 5(4): 339-354.
Tay, F. E. H.; Cao, L. J. 2002. Modified support vector machines in financial time series forecasting, Neurocomputing 48(1): 847-861. http://dx.doi.org/10.1016/S0925-2312(01)00676-2
Tefas, A.; Kotropoulos, C.; Pitas, I. 2001. Using support vector
machines to enhance the performance of elastic graph matching for
frontal face authentication, IEEE Transactions on Pattern Analysis and
Machine Intelligence 23(7): 735-746. http://dx.doi.org/10.1109/34.935847
Thomas, L. C.; Oliver, R. W.; Hand, D. J. 2005. A survey of the
issues in consumer credit modeling research, Journal of the Operational
Research Society 56: 1006-1015.
http://dx.doi.org/10.1057/palgrave.jors.2602018
Tian, Q.; Hong, P.; Huang, T. S. 2000. Update relevant image
weights for content based image retrieval using support vector machines,
in Proceedings of IEEE International Conference on Multimedia and Expo
2: 1199-1202.
Tian, Y. J. 2005. Support Vector Regression and Its Applications:
PhD Thesis. China Agricultural University.
Tian, Y. J.; Deng, N. Y. 2005. Leave-one-out bounds for support vector regression, in Proceedings of the 2005 International Conference on Computational Intelligence for Modelling, Control and Automation, and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, 1061-1066.
Tian, Y. J.; Yu, J.; Chen, W. J. 2010. lp-norm support vector machine with CCCP, in Proceedings of 2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery, 1560-1564. http://dx.doi.org/10.1109/FSKD.2010.5569345
Tsochantaridis, I.; Joachims, T.; Hofmann, T.; Altun, Y. 2005.
Large margin methods for structured and interdependent output variables,
Journal of Machine Learning Research 6: 1453-1484.
Tsoumakas, G.; Katakis, I. 2007. Multi-label classification: an
overview, International Journal of Data Warehousing and Mining 3(3):
1-13. http://dx.doi.org/10.4018/jdwm.2007070101
Tsoumakas, G.; Katakis, I.; Vlahavas, I. 2010. Mining multi-label
data, Data Mining and Knowledge Discovery Handbook 6: 667-685.
Van Gestel, T.; Baesens, B.; Garcia, J.; Van Dijcke, P. 2003. A
support vector machine approach to credit scoring, Bank en Financiewezen
2: 73-82.
Vanderbei, R. J. 2001. Linear Programming: Foundations and
Extensions. Second edition. Kluwer Academic Publishers.
Vapnik, V. N. 1996. The Nature of Statistical Learning Theory. New York: Springer.
Vapnik, V. N. 1998. Statistical Learning Theory. New York: John
Wiley and Sons.
Vapnik, V. N. 2006. Estimation of Dependences Based on Empirical Data. 2nd edition. Springer-Verlag, Berlin.
Vapnik, V. N.; Chapelle, O. 2000. Bounds on error expectation for
SVM, in Advances in Large-Margin Classifiers, Neural Information
Processing. MIT Press, 261-280.
Vapnik, V. N.; Vashist, A. 2009. A new learning paradigm: learning
using privileged information, Neural Networks 22(5): 544-577.
http://dx.doi.org/10.1016/j.neunet.2009.06.042
Wang, Y. Q.; Wang, S. Y.; Lai, K. K. 2005. A new fuzzy support
vector machine to evaluate credit risk, IEEE Transactions on Fuzzy
Systems 13(6): 820-831. http://dx.doi.org/10.1109/TFUZZ.2005.859320
Weston, J. 1999. Leave-one-out support vector machines, in
Proceedings of the International Joint Conference on Artificial
Intelligence, 727-731.
Weston, J.; Gammerman, A.; Stitson, M. O.; Vapnik, V. N.; Vovk, V.;
Watkins, C. 1999. Support vector density estimation, in Advances in
Kernel Methods - Support Vector Learning. Cambridge, MA: MIT Press,
293-305.
Weston, J.; Mukherjee, S.; Vapnik, V. 2001. Feature selection for
SVMs, Advances in Neural Information Processing Systems 13: 668-674.
Wu, Q.; Ying, Y.; Zhou, D. X. 2007. Multi-kernel regularized
classifiers, Journal of Complexity 23(1): 108-134.
http://dx.doi.org/10.1016/j.jco.2006.06.007
Xu, L.; Schuurmans, D. 2005. Unsupervised and semi-supervised
multi-class support vector machines, in Proceedings of the 20th National
Conference on Artificial Intelligence.
Yang, Q.; Wu, X. D. 2006. 10 Challenging problems in data mining
research, International Journal of Information Technology and Decision
Making 5(4): 567-604. http://dx.doi.org/10.1142/S0219622006002258
Yang, S. X.; Tian, Y. J. 2011. Rule extraction from support vector
machines and its applications, in Proceedings of IEEE/WIC/ACM
International Conference on Web Intelligence and Intelligent Agent
Technology, 221-224. http://dx.doi.org/10.1109/WI-IAT.2011.132
Yang, Z. X. 2007. Support Vector Ordinal Regression and Multi-class
Problems: PhD Thesis. China Agricultural University.
Yang, Z. X.; Deng, N. Y.; Tian, Y. J. 2005. A multi-class
classification algorithm based on ordinal regression machine, in
Proceedings of International Conference on CIMCA & IAWTIC 2: 810-815.
Yang, Z. X.; Tian, Y. J.; Deng, N. Y. 2009. Leave-one-out bounds
for support vector ordinal regression machine, Neural Computing and
Applications 18(7): 731-748. http://dx.doi.org/10.1007/s00521-008-0217-z
Yao, Y.; Marcialis, G. L.; Pontil, M.; Frasconi, P.; Roli, F. 2002.
Combining flat and structured representations for fingerprint
classification with recursive neural networks and support vector
machines, Pattern Recognition 36(2): 397-406.
http://dx.doi.org/10.1016/S0031-3203(02)00039-0
Yu, L.; Wang, S. Y.; Cao, J. 2009. A modified least squares support
vector machine classifier with application to credit risk analysis,
International Journal of Information Technology and Decision Making
8(4): 697-710. http://dx.doi.org/10.1142/S0219622009003600
Zanghirati, G.; Zanni, L. 2003. A parallel solver for large
quadratic programs in training support vector machines, Parallel
Computing 29(4): 535-551. http://dx.doi.org/10.1016/S0167-8191(03)00021-8
Zhang, C. H.; Tian, Y. J.; Deng, N. Y. 2010. The new interpretation of support vector machines on statistical learning theory, Science China Mathematics 53(1): 151-164.
Zhao, K.; Tian, Y. J.; Deng, N. Y. 2007. Unsupervised and
semi-supervised lagrangian support vector machines, in Proceedings of
the 7th International Conference on Computational Science Workshops,
Lecture Notes in Computer Science 4489: 882-889.
http://dx.doi.org/10.1007/978-3-540-72588-6_140
Zhao, K.; Tian, Y. J.; Deng, N. Y. 2006. Unsupervised and
semi-supervised two-class support vector machines, in Proceedings of the
6th IEEE International Conference on Data Mining Workshops, 813-817.
Zhong, P.; Fukushima, M. 2007. Second order cone programming
formulations for robust multi-class classification, Neural Computation
19(1): 258-282. http://dx.doi.org/10.1162/neco.2007.19.1.258
Zhou, L.; Lai, K. K.; Yen, J. 2009. Credit scoring models with AUC maximization based on weighted SVM, International Journal of Information Technology and Decision Making 8(4): 677-696. http://dx.doi.org/10.1142/S0219622009003582
Zhou, X.; Tuck, D. P. 2006. MSVM-RFE: extensions of SVM-RFE for multiclass gene selection on DNA microarray data, Bioinformatics 23(9): 1106-1114. http://dx.doi.org/10.1093/bioinformatics/btm036
Zhu, J.; Rosset, S.; Hastie, T.; Tibshirani, R. 2004. 1-norm
support vector machines, Advances in Neural Information Processing
Systems 16: 49-56.
Zou, H.; Yuan, M. 2008. The F∞-norm support vector machine, Statistica Sinica 18: 379-398.
Yingjie Tian (1), Yong Shi (2), Xiaohui Liu (3)
(1) Research Center on Fictitious Economy and Data Science, Chinese
Academy of Sciences, No. 80 Zhongguancun East Road, Haidian District,
Beijing 100190, China
(2) College of Information Science and Technology, University of
Nebraska at Omaha, Omaha, NE 68182, USA
(3) School of Information Systems, Computing and Mathematics, Brunel University, Uxbridge, Middlesex, UK
E-mails: (1) Hyj@gucas.ac.cn (corresponding author); (2) yshi@gucas.ac.cn; (3) xiaohui.liu@brunel.ac.uk
Yingjie TIAN. Doctor. Associate Professor at the Research Center on Fictitious Economy & Data Science, Chinese Academy of Sciences. He received his first degree in mathematics (1994), a master's degree in applied mathematics (1997), and a PhD in management science and engineering. He has published 4 books (one of which has been cited over 700 times) and over 50 papers in various journals and conference proceedings. Research interests: support vector machines, optimization theory and applications, data mining, intelligent knowledge management, risk management.
Yong SHI. Doctor. He currently serves as the Executive Deputy Director of the Research Center on Fictitious Economy & Data Science, Chinese Academy of Sciences. From 1999 to 2009 he was the Charles W. and Margre H. Durham Distinguished Professor of Information Science and Technology, College of Information Science and Technology, Peter Kiewit Institute, University of Nebraska, USA. Dr. Shi's research interests include business intelligence, data mining, and multiple criteria decision making. He has published more than 17 books and over 200 papers in various journals and conference proceedings. He is the Editor-in-Chief of the International Journal of Information Technology and Decision Making (SCI) and a member of the editorial boards of a number of academic journals. Dr. Shi has received many distinguished awards, including the Georg Cantor Award of the International Society on Multiple Criteria Decision Making (MCDM), 2009; the Outstanding Young Scientist Award, National Natural Science Foundation of China, 2001; and Speaker of the Distinguished Visitors Program (DVP) for 1997-2000, IEEE Computer Society. He has consulted or worked on business projects for a number of international companies in data mining and knowledge management.
Xiaohui LIU. Doctor. Professor of Computing at Brunel University in
the UK where he directs the Centre for Intelligent Data Analysis,
conducting interdisciplinary research concerned with the effective
analysis of data. He was Honorary Pascal Professor at Leiden University
(2004) and Visiting Researcher at Harvard Medical School (2005).
Professor Liu is a Chartered Engineer, Life Member of the Association for
the Advancement of Artificial Intelligence, Fellow of the Royal
Statistical Society and Fellow of the British Computer Society. He has
given numerous invited and keynote talks, chaired several international
conferences, and advised funding agencies on interdisciplinary research
programs. Collaborating with many talented physical, clinical and life
scientists, Professor Liu has over 250 publications in biomedical
informatics, data mining, dynamic and intelligent systems.