Other abstract: Even though suicide is one of the top three causes of death among young people, no reliable methods of identifying suicidal behavior have been developed. One promising direction of research is the quantitative analysis of speech. It is now common to process texts by suicidal individuals (mostly suicide notes or literary texts by famous people, e.g., poets and writers) and texts by individuals from a control group using software (mostly LIWC), and to design models that classify texts as written by suicidal individuals or not. This kind of analysis has mainly been performed on English texts, which generally have a number of restrictions due to their linguistic nature. The authors are the first to attempt to design a mathematical model that classifies texts as written by suicidal or nonsuicidal individuals using numerical values of linguistic parameters as features. Texts (blogs by young people who committed suicide, similar in both genre and topic to those by individuals of an age-matched control group) were processed using the Russian version of LIWC with user dictionaries. Unlike existing studies, in designing the model we mostly made use of features that do not depend significantly on content. This is because not all individuals who committed suicide are known to deal with the topic in their texts. The resulting model was shown to be 71.5% accurate, which is comparable with the state of the art for English texts.
Other keywords: Suicide language; internet texts; suicide predictors; text corpus; computational linguistics; Russian texts; RusPersonality.