Journal: International Journal of Advanced Computer Science and Applications (IJACSA)
Print ISSN: 2158-107X
Online ISSN: 2156-5570
Publication year: 2021
Volume: 12
Issue: 12
DOI: 10.14569/IJACSA.2021.0121271
Language: English
Publisher: Science and Information Society (SAI)
Abstract: Companies now share large amounts of structured and unstructured data on the web. This data holds many signals that can be analyzed to detect innovation using weak signal detection approaches. To gain a competitive advantage, the velocity and volume of data available online must be exploited and processed to extract and monitor any type of strategic challenge or surprise, whether in the form of opportunities or threats. To capture early signs of a change in the environment in a big data context, where data is voluminous and unstructured, this paper presents a framework for weak signal detection. The framework relies on crawling a variety of web sources and on a big-data-based implementation of text mining techniques, and it automatically detects and monitors weak signals by aggregating semantic clustering algorithms. The novelty of the paper lies in two points: the framework's ability to extend to arbitrarily large amounts of unstructured data, which require novel analysis approaches, and the aggregation of semantic clustering algorithms for better automation and higher accuracy of weak signal detection. A corpus of scientific articles and patents is collected to validate the framework and to provide a use case for researchers interested in identifying weak signals in a corpus from a specific technological domain.
Keywords: Competitive intelligence; Apache Spark; big data; weak signal detection; web mining; semantic clustering