Journal: Computational Methods in Science and Technology
Print ISSN: 1505-0602
Publication year: 2018
Volume: 24
Issue: 1
Pages: 43-58
DOI: 10.12921/cmst.2018.0000007
Publisher: Poznan Supercomputing and Networking Center
Abstract: The paper presents WebSty, an open, web-based system for stylometric analysis that is part of the CLARIN-PL research infrastructure. WebSty requires no local installation, can be used from any web browser, offers rich configuration options, and runs on a computing cluster. We discuss the ideas underlying the system, its architecture, its pipeline of language tools for processing Polish, and its integration with systems for clustering, visualizing the clustering results, and identifying the features with the strongest discriminative power. The techniques used for feature weighting and for measuring text similarity are also briefly reviewed. In conclusion, we present a preliminary evaluation of WebSty on a corpus of 1000 literary works and report the results of its first research applications. Although the system initially focused on processing Polish texts, we also briefly discuss its development towards a multilingual system, which already supports English, German and Hungarian.
Other keywords: stylometry, language technology infrastructure, web application, authorship attribution, style analysis, CLARIN