Article information

  • Title: Unified, Labeled, and Semi-Structured Database of Pre-Processed Mexican Laws
  • Authors: Bella Martinez-Seis; Obdulia Pichardo-Lagunas; Harlan Koff
  • Journal: Data
  • Print ISSN: 2306-5729
  • Publication year: 2022
  • Volume: 7
  • Issue: 7
  • Pages: 1-13
  • DOI: 10.3390/data7070091
  • Language: English
  • Publisher: MDPI Publishing
  • Abstract: This paper presents a corpus of pre-processed Mexican laws for computational tasks. The main contributions are the proposed JSON structure and the methodology used to achieve the semi-structured corpus with the selected algorithms. Law PDF documents were transformed into plain text, unified by a deconstruction of law–document structure, and labeled with natural language processing techniques considering part of speech (PoS); a process of entity extraction was also performed. The corpus includes the Mexican constitution and the Mexican laws that were collected from the official site in PDF format repealed before 14 October 2021. The collection has 305 documents, including: the Mexican constitution, 289 laws, 8 federal codes, 3 regulations, 2 statutes, 1 decree, and 1 ordinance. The semi-structured database includes the transformation of the set of laws from PDF format to a digital representation in order to facilitate its computational analysis. The documents were migrated to JSON type files to represent internal hierarchical relations. In addition, basic natural language processing techniques were implemented on laws for the identification of part of speech and named entities. The presented data set is mainly useful for text analysis and data science. It could be used for various legislative analysis tasks including: comprehension, interpretation, translation, classification, accessibility, coherence, and searches. Finally, we present some statistics of the identified entities and an example of the usefulness of the corpus for environmental laws.
  • Keywords: Mexican legislation; laws; natural language processing; legislative documents