Publisher: Eesti Rakenduslingvistika Ühing (Estonian Association for Applied Linguistics)
Abstract: This article concentrates on two aspects of Estonian that differ between computer-mediated communication and the standard written language: orthography and the divergence of word-forms. The authors analyse these differences and propose a way to adapt an existing morphological analyser to computer-mediated communication. The method entails creating a user lexicon for the morphological analyser, largely in an automated manner, and automatically pre-processing the texts. A comparison of the word-forms used in new media texts with those of the standard written language suggests that most of the differences result from conscious language play. The lexical traits of Internet language include particles, emoticons, genre-specific neologisms, acronyms, borrowings from foreign languages, and colloquial words. There is a great deal of play with orthography: substituting letters, omitting letters, lengthening and shortening letter sequences, and non-standard use of capitalization. With pre-processing and the user lexicon, the percentage of unrecognized tokens decreases from 27.2 to 10.5 for chatroom texts, from 10.3 to 8.8 for Internet forum texts, from 5.6 to 4.8 for comments, and from 11.7 to 10.5 for newsgroup texts. The main remaining source of errors when analysing texts with the customized morphological analyser is non-Estonian words, phrases, and sentences, which the analyser cannot handle.
Other abstract: The article analyses the lexical and orthographic peculiarities of language use in Estonian new media, i.e. Internet language, and the difficulties they raise for automatic morphological analysis. Methods for solving these problems are presented: frequent deviations from the standard written language are handled with a user lexicon, and less frequent but regular deviations are handled by applying automatic transformation rules. For the analysis of corpus-specific vocabulary, a method for automatically extending the user lexicon is proposed. The authors take the position that the differences between the language variety used in Internet communication and the standard written language are predominantly the result of the writers' conscious language play, not an expression of poor literacy.
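The two-stage approach described in the abstracts (a user lexicon for frequent deviations, transformation rules for regular ones) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `USER_LEXICON`, `KNOWN_WORDS`, `normalize`, and `unrecognized_rate` are hypothetical, the tiny word set merely stands in for a real morphological analyser, and the only rules shown are lowercasing and the shortening of lengthened letter sequences.

```python
import re

# Hypothetical user lexicon: non-standard form -> standard form (illustrative entry).
USER_LEXICON = {"tegelt": "tegelikult"}

# Stand-in for the analyser's vocabulary; a real system would query the analyser.
KNOWN_WORDS = {"tere", "jah", "tegelikult"}

# A run of three or more identical characters, e.g. "eee" in "tereee".
LENGTHENED = re.compile(r"(.)\1{2,}")


def normalize(token):
    """Return a recognized form of the token, or None if it stays unrecognized."""
    lower = token.lower()  # undo non-standard capitalization
    if lower in USER_LEXICON:  # frequent deviations: direct lexicon lookup
        return USER_LEXICON[lower]
    # Regular deviations: try the token as is, then with lengthened runs
    # shortened to a double letter, then to a single letter.
    candidates = (
        lower,
        LENGTHENED.sub(r"\1\1", lower),
        LENGTHENED.sub(r"\1", lower),
    )
    for candidate in candidates:
        if candidate in KNOWN_WORDS:
            return candidate
    return None


def unrecognized_rate(tokens):
    """Percentage of tokens that remain unrecognized after normalization."""
    if not tokens:
        return 0.0
    missed = sum(1 for token in tokens if normalize(token) is None)
    return 100.0 * missed / len(tokens)
```

Under these assumptions, `normalize("TEREEE")` yields `"tere"` and `normalize("tegelt")` yields `"tegelikult"`, while a foreign-language token such as `"lol"` stays unrecognized, mirroring the main error source the abstract reports.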