Abstract: This paper describes the process of preparing Bulgarian lexical databases for the CONCEDE EC project, whose aim is to harmonise the methodology, tools and resources for building Lexical Data Bases (LDBs) in a general-purpose document-interchange format for six Central European languages: Bulgarian, Czech, Estonian, Hungarian, Romanian and Slovene. Selecting words on the basis of their frequency in a naturally occurring text (Orwell's 1984) ensures that the project produces lexical databases that are useful for real applications.
Keywords: dictionary encoding; lexical databases; document type definition