Abstract: Stylometry is the analysis of authors' linguistic styles and writing characteristics for identification, characterization, or verification purposes. In this paper, we investigate authorship verification for user authentication. In this setting, authentication consists of comparing a writing sample from an individual against the model or profile associated with the identity claimed by that individual at login time (i.e., 1-to-1 identity matching). In addition, the authentication process must complete in a short period of time, which means analyzing short messages. Although a significant body of literature reports high accuracy rates for long documents, it remains challenging to accurately identify the authors of short unstructured documents, in particular when dealing with large author populations. In this paper, we take some steps toward that goal by proposing a supervised learning technique combined with n-gram analysis for authorship verification of short texts. We introduce a new n-gram metric and study several n-gram sizes using a relatively large dataset. The experimental evaluation shows that our approach is more effective than existing approaches published in the literature.
Keywords: Authentication and access control; biometric systems; authorship verification; stylometry; n-gram features; short message verification.
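To make the 1-to-1 matching setting concrete, the following is a minimal sketch of n-gram based verification, assuming character n-grams and a simple overlap score. It is not the n-gram metric or the supervised learning technique proposed in this paper; the function names, the choice of n = 3, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams in text (lowercased)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def build_profile(samples, n):
    """Merge the n-gram counts of a user's enrolled samples into one profile."""
    profile = Counter()
    for sample in samples:
        profile.update(char_ngrams(sample, n))
    return profile


def overlap_score(profile, message, n):
    """Fraction of the message's n-gram occurrences that also occur in the profile.

    Illustrative stand-in score, not the metric introduced in the paper.
    """
    grams = char_ngrams(message, n)
    total = sum(grams.values())
    if total == 0:
        # Message shorter than n characters: nothing to compare.
        return 0.0
    shared = sum(count for gram, count in grams.items() if gram in profile)
    return shared / total


def verify(profile, message, n=3, threshold=0.5):
    """Accept the claimed identity if the score reaches the (assumed) threshold."""
    return overlap_score(profile, message, n) >= threshold
```

In this sketch, the profile is built once from the enrolled samples of the claimed identity, and each login-time message is scored against that single profile rather than against every known author. In practice the threshold would be tuned per user or learned from data.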