Journal: Conference on European Chapter of the Association for Computational Linguistics (EACL)
Publication year: 2011
Volume: 2011
Publisher: ACL Anthology
Abstract: Information extraction (IE) holds the promise
of generating a large-scale knowledge
base from the Web's natural language text.
Knowledge-based weak supervision, using
structured data to heuristically label a training
corpus, works towards this goal by enabling
the automated learning of a potentially
unbounded number of relation extractors.
Recently, researchers have developed multi-instance
learning algorithms to combat the
noisy training data that can come from
heuristic labeling, but their models assume
relations are disjoint; for example, they
cannot extract the pair Founded(Jobs,
Apple) and CEO-of(Jobs, Apple).
This paper presents a novel approach for
multi-instance learning with overlapping relations
that combines a sentence-level extraction
model with a simple, corpus-level component
for aggregating the individual facts. We
apply our model to learn extractors for NY
Times text using weak supervision from Freebase.
Experiments show that the approach
runs quickly and yields surprising gains in
accuracy, at both the aggregate and sentence
level.
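
The two ideas named in the abstract, heuristic labeling from structured data and corpus-level aggregation of sentence-level facts, can be illustrated with a small Python example. This is a minimal sketch and not the paper's model: the toy knowledge base, the sentences, the function names `heuristic_label` and `aggregate`, the hand-written sentence-level predictions, and the union-style aggregation rule are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: (entity1, entity2) -> set of relations.
# One pair may hold several relations at once (overlapping relations).
KB = {
    ("Jobs", "Apple"): {"Founded", "CEO-of"},
}

# Hypothetical corpus entries: (sentence, entity1, entity2).
SENTENCES = [
    ("Jobs founded Apple in 1976.", "Jobs", "Apple"),
    ("Jobs returned as chief executive of Apple.", "Jobs", "Apple"),
    ("Jobs visited Paris.", "Jobs", "Paris"),
]


def heuristic_label(sentences, kb):
    """Weak supervision: tag each sentence with every KB relation of its entity pair.

    The labels are noisy, because a sentence that mentions the pair need not
    express each of the pair's relations.
    """
    return [(s, e1, e2, set(kb.get((e1, e2), set()))) for s, e1, e2 in sentences]


def aggregate(sentence_predictions):
    """Corpus-level facts: a relation holds for a pair if any sentence predicts it."""
    facts = defaultdict(set)
    for e1, e2, relation in sentence_predictions:
        if relation is not None:  # None means "no relation expressed"
            facts[(e1, e2)].add(relation)
    return dict(facts)


labeled = heuristic_label(SENTENCES, KB)

# Hypothetical sentence-level outputs; a trained extractor would produce these.
predictions = [
    ("Jobs", "Apple", "Founded"),
    ("Jobs", "Apple", "CEO-of"),
    ("Jobs", "Paris", None),
]
facts = aggregate(predictions)
```

Because each sentence contributes its own prediction and the aggregate collects them per entity pair, the pair (Jobs, Apple) can end up with both Founded and CEO-of, which a model that assumes disjoint relations cannot represent.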