Abstract: As long as natural language processing applications are framed as prediction problems over insufficient context, typically a single sentence or paragraph, they cannot reflect how humans perceive natural language. When reading a text, humans are sensitive to much wider context, such as the rest of the document or other relevant documents. This study focuses on simultaneously capturing syntax and global semantics from a text, thereby acquiring document-level understanding. Accordingly, we introduce a Topic-Transformer that combines the benefits of a neural topic model, which captures global semantic information, and a transformer-based language model, which captures the local structure of texts both semantically and syntactically. Experiments on various datasets confirm that our model achieves lower perplexity than the standard transformer architecture and recent topic-guided language models, and that it generates topics that are more coherent than those of the standard Latent Dirichlet Allocation (LDA) topic model.
Keywords: Neural Topic Model; Neural Language Model; Topic-Guided Language Model; Document-Level Understanding; Long-Range Semantic Dependencies
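To make the combination described in the abstract concrete, the following is a minimal sketch of one common topic-guided fusion rule: biasing the language model's next-token logits by a topic-dependent term b_w = Σ_k θ_k β_{k,w}, where θ is the document's topic proportions from a neural topic model and β is the topic-word weight matrix. The function names, the additive fusion rule, and all numbers are illustrative assumptions, not the paper's exact architecture.

```python
import math

def topic_guided_logits(lm_logits, topic_word, doc_topics):
    # Hypothetical fusion: add a topic bias b_w = sum_k theta_k * beta[k][w]
    # to the transformer LM's next-token logits.
    vocab = len(lm_logits)
    bias = [sum(theta * topic_word[k][w] for k, theta in enumerate(doc_topics))
            for w in range(vocab)]
    return [l + b for l, b in zip(lm_logits, bias)]

def softmax(xs):
    # Numerically stable softmax over a list of logits.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

# Toy example: 4-word vocabulary, 2 topics (all values made up).
lm_logits = [0.1, 0.2, 0.0, -0.1]     # next-token logits from the transformer LM
topic_word = [[2.0, 0.0, 0.0, 0.0],   # topic 0 favours word 0
              [0.0, 0.0, 2.0, 0.0]]   # topic 1 favours word 2
doc_topics = [0.9, 0.1]               # document is mostly about topic 0

probs = softmax(topic_guided_logits(lm_logits, topic_word, doc_topics))
```

In this toy setting the topic bias pulls probability mass toward word 0, the word favoured by the document's dominant topic, illustrating how global topic information can steer locally fluent generation.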