Journal: Slovenščina 2.0: empirical, applied and interdisciplinary research
Electronic ISSN: 2335-2736
Publication year: 2020
Volume: 8
Issue: 2
Pages: 28-57
DOI: 10.4312/slo2.0.2020.2.28-57
Language: Slovenian
Publisher: Trojina, Institute for Applied Slovene Studies
Abstract: The modelling and encoding of polylexical units, i.e. recurrent sequences of lexemes that are perceived as independent lexical units, is a topic that has not been covered adequately and in sufficient depth by the Guidelines of the Text Encoding Initiative (TEI), a de facto standard for the digital representation of textual resources in the scholarly research community. In this paper, we use the Dictionary of the Portuguese Academy of Sciences as a case study for presenting our ongoing work on encoding polylexical units using TEI Lex-0, an initiative aimed at simplifying and streamlining the encoding of lexical data with TEI in order to improve interoperability. We introduce the notion of macro- and microstructural relevance to differentiate between polylexicals that serve as headwords for their own independent dictionary entries and those which appear inside entries for different headwords. We develop the notion of lexicographic transparency to distinguish between those units which are not accompanied by an explicit definition and those that are: the former are encoded as
Other abstract (translated from Slovenian): The modelling and encoding of multiword lexical units, i.e. frequent sequences of lexemes that are treated as independent lexical units, is a topic that is not covered adequately or in sufficient depth in the Guidelines of the Text Encoding Initiative (TEI), even though TEI is the de facto standard in the research community for working with electronic texts. In this paper, using the Dictionary of the Portuguese Academy of Sciences as an example, we present some solutions for encoding multiword lexical units in TEI Lex-0, an initiative whose aim is to simplify and streamline the encoding of lexical data with TEI and thereby improve interoperability. We introduce the notion of macro- and microstructural relevance in order to distinguish between multiword lexical units that are independent dictionary headwords and those that appear within the entries of single-word headwords. We also introduce the notion of lexicographic transparency to distinguish between units that have no definition and those that do; the former are encoded within the element