Journal: Eastern-European Journal of Enterprise Technologies
Print ISSN: 1729-3774
Electronic ISSN: 1729-4061
Year of publication: 2017
Volume: 4
Issue: 2
Pages: 10-19
DOI: 10.15587/1729-4061.2017.107512
Language: English
Publisher: PC Technology Center
Abstract: We developed algorithmic software for content monitoring that recognizes the style of an author of a Ukrainian text using Web Mining and NLP technology. The method for recognizing an author's style, which is based on analysis of the stop words found in the text, was decomposed. A specific feature of the method is that morphological and syntactic analysis of lexical units is adapted to the structural peculiarities of Ukrainian words and texts. It is syntactic words (stop words, or anchor words) that are significant for an author's individual style, because they are not related to the theme or content of the publication. Recognition of the author's style is based on analysis of the coefficients of the author's lexical language: coherence of speech, lexical diversity, syntactic complexity, and the indices of concentration and exclusivity for the author's fragment. These coefficients are then compared to determine the degree to which the analyzed text belongs to a particular author. We studied the internal "dynamics" of texts by randomly selected authors by analyzing these coefficients for the first k, n and m words (excluding the title) of the author's fragment and of the analyzed fragment, and compared the results. We experimentally tested the proposed content-monitoring method for identifying and analyzing stop words in Ukrainian scientific texts in the technical domain, based on Web Mining technology. For the selected experimental base of 100 works, analyzing an article without its compulsory initial information and list of references gave the best results by the density criterion. This is achieved by training the system and by checking against specified blocked words and a specified thematic vocabulary.
Testing the proposed method for determining keywords in other categories of texts (scientific texts in the humanities, belles-lettres, journalism, etc.) requires further experimental research.
Keywords: style of the author; statistical linguistic analysis; quantitative linguistics; author's attribution
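
Illustrative sketch: the abstract names five coefficients (coherence of speech, lexical diversity, syntactic complexity, and the indices of concentration and exclusivity) but does not give their formulas. The Python sketch below shows one way such coefficients might be computed and compared. The formulas are definitions commonly used in quantitative linguistics and are assumed here, not taken from the paper. The stop-word list, the function names `style_coefficients` and `profile_distance`, the `concentration_min` threshold, and the distance measure are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only; the paper's
# actual Ukrainian stop-word dictionary is not given in the abstract.
STOP_WORDS = {"та", "це", "на", "що", "до", "не", "як", "але", "або", "для"}


def tokenize(text):
    """Return lowercase word tokens (Unicode letters, inner apostrophes kept)."""
    return re.findall(r"[^\W\d_]+(?:['’ʼ][^\W\d_]+)*", text.lower())


def split_sentences(text):
    """Naive sentence split on terminal punctuation; an assumption, not the paper's parser."""
    return [s for s in re.split(r"[.!?…]+", text) if s.strip()]


def style_coefficients(text, stop_words=STOP_WORDS, concentration_min=10):
    """Compute assumed style coefficients for one text fragment."""
    words = tokenize(text)
    n = len(words)                  # total number of words
    p = len(split_sentences(text))  # number of sentences
    if n == 0 or p == 0:
        raise ValueError("text must contain at least one word and one sentence")
    freq = Counter(words)
    w = len(freq)                   # number of distinct words
    stop_count = sum(1 for t in words if t in stop_words)
    hapax = sum(1 for c in freq.values() if c == 1)
    frequent = sum(1 for c in freq.values() if c >= concentration_min)
    return {
        "lexical_diversity": w / n,          # distinct words / all words
        "syntactic_complexity": 1 - p / n,   # 1 - sentences / words
        "coherence": stop_count / (3 * p),   # stop words / (3 * sentences)
        "exclusivity": hapax / w,            # words used once / distinct words
        "concentration": frequent / w,       # frequent words / distinct words
    }


def profile_distance(a, b):
    """Mean absolute difference between two coefficient profiles (assumed measure)."""
    return sum(abs(a[k] - b[k]) for k in a) / len(a)


if __name__ == "__main__":
    known = style_coefficients("Це сад. Це пес та сад.", concentration_min=2)
    candidate = style_coefficients("Це пес. Сад та пес не сад.", concentration_min=2)
    print(known)
    print(profile_distance(known, candidate))
```

A smaller `profile_distance` would suggest that the analyzed fragment is closer in style to the known author's fragment. A realistic system would also need the morphological analysis adapted to Ukrainian that the abstract mentions, which this sketch omits.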