Other abstract: At the current front line of Big Data, prediction tasks over the nodes and edges of complex deep architectures need a careful representation of features, with hundreds of thousands or even millions of labels and samples assigned for information access systems, especially in hierarchical extreme multi-label classification. We introduce edge2vec, an edge representation framework for learning discrete and continuous features of edges in deep architectures. In edge2vec, we learn a mapping of edges associated with nodes, where random samples are augmented by statistical and semantic representations of words and documents. We argue that infusing semantic representations of features for edges by exploiting word2vec and para2vec is the key to learning richer representations for exploring target nodes or labels in the hierarchy. Moreover, we design and implement a balanced stochastic dual coordinate ascent (DCA)-based support vector machine to speed up training. We introduce global decision-based top-down walks instead of random walks to predict the most likely labels in the deep architecture. We evaluate the efficiency of edge2vec against existing state-of-the-art techniques on extreme multi-label hierarchical as well as flat classification tasks. The empirical results show that edge2vec is very promising and computationally efficient in fast learning and prediction tasks. In the deep learning workbench, edge2vec represents a new direction for statistical and semantic representations of features in task-independent networks.
Other keywords: Hierarchical text classification; multi-label learning; indexing; extreme classification; tree-structured.