Abstract: Computational corpora are used in Natural Language Processing (NLP) to address disambiguation, translation, and automated text generation problems. For these tasks, the defining feature of a computational corpus (that it records attested uses of a language) is combined with statistical analysis and with information extraction methods based on neural networks or genetic algorithms. In software engineering, there is no evidence that computational corpora of diagrams are in use. Diagram repositories serve a similar purpose, working with real examples of diagrams (mainly for reuse), but they use neither statistics nor heuristic methods for information extraction. This paper proposes the UNC-Corpus, a tool for managing a corpus of UML (Unified Modelling Language) diagrams that applies traditional NLP techniques to solve completeness problems in software engineering.
Keywords: annotated corpus; UML diagrams; XMI; repository; metamodelling; NLP; information extraction