Basic Article Information

  • Title: Language Engineering for Creating Relevance Corpus
  • Authors: Nuha H. El-Khalili; Bassam Haddad; Haya El-Ghalayini
  • Journal: International Journal of Software Engineering and Its Applications
  • Print ISSN: 1738-9984
  • Publication year: 2015
  • Volume: 9
  • Issue: 3
  • Pages: 107-116
  • DOI: 10.14257/ijseia.2015.9.3.11
  • Publisher: SERSC
  • Abstract: Building large relevance datasets is important for training and evaluating Information Retrieval (IR) systems. The process involves collecting documents, queries, and assessors' judgments of the degree of relevance of a query to a document, and it is expensive and time-consuming. Nor is it a one-off project: it can be repeated for different languages, different corpus scopes, and different techniques. This paper presents a software engineering solution for the process of creating relevance corpora that achieves reusability, flexibility, multilingualism, and modularity, in order to respect the experimental nature of the IR field. The solution is presented as UML models. The paper then shows how the proposed design model was used to implement the process of building an open-source Arabic relevance corpus based on the Clue Web 2009 data set, with the aim of supporting research on evaluating and improving search engines for the Arabic language.
  • Keywords: Software Engineering Models; Information Retrieval; Relevance Corpus; Language Engineering