Article information

Title: ENCODING POLYLEXICAL UNITS WITH TEI LEX-0: A CASE STUDY
Other title (Slovenian): KODIRANJE VEČBESEDNIH LEKSIKALNIH ENOT S TEI LEX-0: ŠTUDIJA PRIMERA
Authors: Toma TASOVAC; Ana SALGADO; Rute COSTA; et al.
Journal: Slovenščina 2.0: empirical, applied and interdisciplinary research
Electronic ISSN: 2335-2736
Year: 2020
Volume: 8
Issue: 2
Pages: 28-57
DOI: 10.4312/slo2.0.2020.2.28-57
Language: Slovenian
Publisher: Trojina, Institute for Applied Slovene Studies
Abstract: The modelling and encoding of polylexical units, i.e. recurrent sequences of lexemes that are perceived as independent lexical units, is a topic that the Guidelines of the Text Encoding Initiative (TEI), a de facto standard for the digital representation of textual resources in the scholarly research community, have not covered adequately or in sufficient depth. In this paper, we use the Dictionary of the Portuguese Academy of Sciences as a case study for presenting our ongoing work on encoding polylexical units using TEI Lex-0, an initiative aimed at simplifying and streamlining the encoding of lexical data with TEI in order to improve interoperability. We introduce the notion of macro- and microstructural relevance to differentiate between polylexicals that serve as headwords of their own independent dictionary entries and those that appear inside the entries of other headwords. We develop the notion of lexicographic transparency to distinguish between units that are not accompanied by an explicit definition and those that are: the former are encoded as –like constructs, whereas the latter become –like constructs, which can have further constraints imposed on them (sense numbers, domain labels, grammatical labels, etc.). We codify the use of attributes on to encode different kinds of labels for polylexicals (implicit, explicit and normalised), concluding that the interoperability of lexical resources would be significantly improved if dictionary encoders had access to an expressive but relatively simple typology of polylexical units.
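The lexicographic-transparency rule summarised above (no explicit definition: a form-like construct; explicit definition: an entry-like construct that can carry sense numbers and domain labels) can be illustrated with a minimal Python sketch. It is not the authors' encoding: the element names (`form`, `orth`, `entry`, `sense`, `usg`, `def`) are generic TEI-style names assumed for illustration, and the `type` values `phrase` and `polylexical`, as well as the `PolylexicalUnit` class and `encode` function, are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PolylexicalUnit:
    """Hypothetical model of a polylexical unit found inside another headword's entry."""
    text: str
    definition: Optional[str] = None    # None means no explicit definition
    sense_number: Optional[str] = None  # only used for defined units
    domains: List[str] = field(default_factory=list)


def encode(unit: PolylexicalUnit) -> ET.Element:
    """Return a form-like or entry-like element depending on whether a definition exists.

    ASSUMPTION: element and attribute names are illustrative TEI-style names,
    not the exact markup prescribed by TEI Lex-0 or used in the paper.
    """
    if unit.definition is None:
        # No explicit definition: a form-like construct that only records the string.
        form = ET.Element("form", {"type": "phrase"})  # hypothetical type value
        ET.SubElement(form, "orth").text = unit.text
        return form

    # Explicit definition: an entry-like construct that can carry further constraints.
    entry = ET.Element("entry", {"type": "polylexical"})  # hypothetical type value
    form = ET.SubElement(entry, "form", {"type": "lemma"})
    ET.SubElement(form, "orth").text = unit.text
    sense = ET.SubElement(entry, "sense")
    if unit.sense_number is not None:
        sense.set("n", unit.sense_number)  # sense number as a constraint
    for domain in unit.domains:
        ET.SubElement(sense, "usg", {"type": "domain"}).text = domain  # domain label
    ET.SubElement(sense, "def").text = unit.definition
    return entry


if __name__ == "__main__":
    plain = PolylexicalUnit("unit without definition")
    defined = PolylexicalUnit("unit with definition", definition="an example gloss",
                              sense_number="2", domains=["Law"])
    for unit in (plain, defined):
        print(ET.tostring(encode(unit), encoding="unicode"))
```

Keeping the decision in a single function makes the transparency criterion explicit: the presence of a definition is the only switch, and every constraint mentioned in the abstract attaches to the entry-like branch.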
Other abstract (Slovenian): Modeliranje in kodiranje večbesednih leksikalnih enot oz. pogostih nizov leksemov, ki jih obravnavamo kot samostojne leksikalne enote, je tematika, ki v smernicah Text Encoding Initiative (TEI) ni ustrezno in dovolj poglobljeno predstavljena, čeprav je TEI v raziskovalni skupnosti de facto standard pri delu z elektronskimi besedili. V prispevku na primeru Slovarja Portugalske akademije znanosti predstavimo nekatere rešitve pri kodiranju večbesednih leksikalnih enot v formatu TEI Lex-0, iniciative, katere namen je poenostaviti in racionalizirati kodiranje leksikalnih podatkov s TEI in posledično izboljšati interoperabilnost. Vpeljemo pojem makro- in mikrostrukturne relevantnosti z namenom razločevati med večbesednimi leksikalnimi enotami, ki so samostojne slovarske iztočnice, in tistimi, ki se nahajajo v geslih enobesednih iztočnic. Vpeljemo tudi pojem leksikografske transparentnosti za razlikovanje med enotami, ki nimajo razlage, in tistimi, ki jo imajo; prve so kodirane v okviru elementa, slednje pa v okviru elementa in lahko vsebujejo nadaljnje omejitve (številke pomenov, področne oznake, slovnične oznake ipd.). V elementu vpeljemo uporabo atributov za kodiranje različnih tipov oznak za večbesedne leksikalne enote (implicitne, eksplicitne in normirane). Prispevek zaključimo s sklepom, da bi se interoperabilnost leksikalnih virov močno izboljšala, če bi avtorji slovarskih shem imeli dostop do bogate, a relativno enostavne tipologije večbesednih leksikalnih enot.
Keywords: TEI; Lexicography; Language Resources; Polylexical Units; Interoperability
Other keywords (Slovenian): TEI; leksikografija; jezikovni viri; večbesedne leksikalne enote; interoperabilnost