Abstract: This paper presents a supervised approach to the recognition of Cross-document
Structure Theory (CST) relations in Polish texts. Its core is a graph-based
representation constructed for sentences. Graphs are built on the basis of
lexicalised syntactic-semantic relations extracted from text. Similarity between
sentences is calculated as similarity between their graphs, and the values are used
as features to train the classifiers. Several graph configurations and graph similarity
methods were analysed for this task. The approach was evaluated on a large open corpus
manually annotated with 17 selected types of CST relations. The experimental setup was
similar to those used in SemEval, and the results obtained are very promising.
Keywords: Cross-document structure theory; CST; supervised learning; graph-based
representation; logistic model tree; LMT; support vector machine; SVM.
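To make the idea of turning graph similarity into classifier features concrete, the following is a minimal sketch, not the method evaluated in this paper. It assumes each sentence graph is stored as a set of hypothetical (head lemma, relation label, dependent lemma) triples, and it uses Jaccard and Dice overlap of nodes and edges as simple stand-ins for the graph similarity measures analysed here. The resulting vector for a sentence pair would then be passed to a classifier such as an LMT or SVM (not shown).

```python
from typing import FrozenSet, List, Tuple

# Hypothetical edge format (an assumption, not the paper's representation):
# (head_lemma, relation_label, dependent_lemma)
Edge = Tuple[str, str, str]
Graph = FrozenSet[Edge]


def graph_nodes(graph: Graph) -> FrozenSet[str]:
    """Collect every lemma that appears as a head or a dependent."""
    return frozenset(lemma for head, _, dep in graph for lemma in (head, dep))


def jaccard(a: frozenset, b: frozenset) -> float:
    """|A & B| / |A | B|; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a: frozenset, b: frozenset) -> float:
    """2|A & B| / (|A| + |B|); defined as 0.0 when both sets are empty."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def pair_features(g1: Graph, g2: Graph) -> List[float]:
    """Feature vector for one sentence pair: node overlap, then edge overlap."""
    n1, n2 = graph_nodes(g1), graph_nodes(g2)
    return [jaccard(n1, n2), dice(n1, n2), jaccard(g1, g2), dice(g1, g2)]


if __name__ == "__main__":
    s1 = frozenset({("kupić", "subj", "Jan"), ("kupić", "obj", "samochód")})
    s2 = frozenset({("kupić", "subj", "Jan"), ("kupić", "obj", "dom")})
    print(pair_features(s1, s2))
```

For these two toy graphs the sentences share two of four distinct lemmas and one of three distinct edges, so the node Jaccard score is 0.5 and the edge Jaccard score is 1/3. Edge-level overlap is stricter than node-level overlap because a shared lemma only counts on an edge when the relation label and the other endpoint also match.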