Journal: Conference on European Chapter of the Association for Computational Linguistics (EACL)
Publication year: 2012
Volume: 2012
Publisher: ACL Anthology
Abstract: We propose Symbol-Refined Tree Substitution Grammars (SR-TSGs) for syntactic parsing. An SR-TSG is an extension of the conventional TSG model where each nonterminal symbol can be refined (subcategorized) to fit the training data. We aim to provide a unified model where TSG rules and symbol refinement are learned from training data in a fully automatic and consistent fashion. We present a novel probabilistic SR-TSG model based on the hierarchical Pitman-Yor Process to encode backoff smoothing from a fine-grained SR-TSG to simpler CFG rules, and develop an efficient training method based on Markov Chain Monte Carlo (MCMC) sampling. Our SR-TSG parser achieves an F1 score of 92.4% in the Wall Street Journal (WSJ) English Penn Treebank parsing task, which is a 7.7-point improvement over a conventional Bayesian TSG parser, and better than state-of-the-art discriminative reranking parsers.