Journal title: International Journal of Computer Science and Network Security
Print ISSN: 1738-7906
Publication year: 2014
Volume: 14
Issue: 5
Pages: 1-7
Publisher: International Journal of Computer Science and Network Security
Abstract: The use of streaming data to analyze and discern patterns and make better decisions is becoming a basis for creating significant value for companies. Continuous torrents of data force organizations to understand which information truly counts and to analyze what they can do with it. The aim of this paper is to explore the adaptation of a linear prediction model to discover trending topics in a news stream by uncovering the most influential variables (words). Each input consists of a news item classified into one of two dummy categories and transformed into a numerical vector containing around 10,000 different features. We continuously apply a linear model to perform shrinkage on the input vectors; as a result, variables with the strongest characteristics show up, while those with negligible characteristics are removed. Due to the dynamic and uninterrupted nature of the input, the output exposes the evolution of the most significant variables over time. First, we provide details of the linear prediction/regression model; second, we introduce our proposed algorithm; finally, simulation results and conclusions are shown.
Keywords: Lasso; big data; data mining; features detection
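
Illustrative note: the abstract describes shrinkage applied repeatedly to a stream of labelled news vectors, and the keywords name the Lasso. The abstract does not give the paper's algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a plain Lasso fitted by cyclic coordinate descent on fixed-size windows of the stream. The function names (`lasso_cd`, `top_words_over_time`), the window size, the penalty `lam`, and the synthetic data (20 features instead of roughly 10,000) are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    # Shrink z toward zero by t; values with |z| <= t become exactly zero.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=100):
    """Minimize (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # x_j^T x_j / n for each feature
    resid = y.astype(float).copy()         # residual y - Xw, with w = 0 initially
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                   # feature is constant in this window
            # Correlation of feature j with the residual that excludes feature j.
            rho = X[:, j] @ resid / n + col_sq[j] * w[j]
            w_new = soft_threshold(rho, lam) / col_sq[j]
            if w_new != w[j]:
                resid -= X[:, j] * (w_new - w[j])
                w[j] = w_new
    return w


def top_words_over_time(X, y, vocab, window, step, lam):
    """Fit a Lasso on each window of the stream; return surviving words per window."""
    history = []
    for start in range(0, X.shape[0] - window + 1, step):
        Xw = X[start:start + window]
        yw = y[start:start + window]
        # Center features and labels so no intercept term is needed.
        w = lasso_cd(Xw - Xw.mean(axis=0), yw - yw.mean(), lam)
        kept = [(vocab[j], float(w[j])) for j in np.flatnonzero(w)]
        kept.sort(key=lambda item: -abs(item[1]))   # strongest words first
        history.append(kept)
    return history


# Synthetic stream (an assumption for illustration): the label depends on
# word "w3" in the first 200 items and on word "w7" in the last 200 items.
rng = np.random.default_rng(0)
X = rng.random((400, 20))
y = np.concatenate([X[:200, 3] > 0.5, X[200:, 7] > 0.5]).astype(float)
vocab = [f"w{j}" for j in range(20)]

history = top_words_over_time(X, y, vocab, window=200, step=200, lam=0.05)
for i, kept in enumerate(history):
    print(f"window {i}: {[word for word, _ in kept]}")
```

In this sketch the L1 penalty sets the coefficients of uninformative words exactly to zero, so the list printed for each window is the set of words that survive shrinkage there. Comparing the lists across windows gives a rough picture of how the influential words change over time.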