Abstract: Unicode 6.1 (2012) encodes more than 74,000 Han characters. This large repertoire goes a long way toward solving the problem of unencoded Han characters. However, most information systems today still support input and display of only the first 20,902 Han characters, encoded in Unicode 1.0 (1991). Even on the latest systems, designed to support 32-bit Unicode and with suitable fonts installed, the newly encoded Han characters are not easy to use. Many of them are rarely seen in everyday life, so an ordinary user may be unsure of their glyph shapes, pronunciations, meanings, and usages. IMEs (input method editors) for Han characters usually require users to know the wanted character well, and it is not unusual for users to try and fail to input an unfamiliar Han character. In this paper, we present an auxiliary lookup service that finds Unicode Han characters by their radicals. A user can key in one or more radicals with any Han character IME to look up a wanted character. Every Unicode Han character is decomposed into a glyph expression of radicals. The similarity between each glyph expression and the user input is estimated by a derived edit distance algorithm, and the most similar Unicode Han characters are returned. As a result, the system gives users a convenient way to look up unfamiliar Unicode Han characters.
Keywords: Unicode; Han character lookup; glyph expression; radicals; edit distance
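
The radical-based matching outlined in the abstract can be illustrated with a minimal sketch. It rests on assumptions not taken from the paper: a tiny hand-written decomposition table (`GLYPH_EXPRESSIONS`, hypothetical), plain Levenshtein distance as a stand-in for the derived edit distance, whose details are not given here, and illustrative function names `edit_distance` and `lookup`.

```python
from typing import Dict, List, Sequence

# Hypothetical decomposition table: each Han character maps to a glyph
# expression, simplified here to a flat sequence of radicals. A real
# service would cover the whole Unicode Han repertoire.
GLYPH_EXPRESSIONS: Dict[str, List[str]] = {
    "明": ["日", "月"],
    "昌": ["日", "日"],
    "晶": ["日", "日", "日"],
    "林": ["木", "木"],
    "森": ["木", "木", "木"],
    "相": ["木", "目"],
}


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Plain Levenshtein distance between two radical sequences.

    Assumption: the paper uses a derived variant; this is the textbook
    version with unit costs for insertion, deletion, and substitution.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def lookup(radicals: str, top_n: int = 3) -> List[str]:
    """Return the top_n characters whose glyph expressions are closest
    to the radicals typed by the user (ties broken by code point)."""
    query = list(radicals)
    ranked = sorted(
        GLYPH_EXPRESSIONS,
        key=lambda ch: (edit_distance(query, GLYPH_EXPRESSIONS[ch]), ch),
    )
    return ranked[:top_n]


if __name__ == "__main__":
    print(lookup("木木"))
    print(lookup("日月", top_n=1))
```

With this toy table, typing 木木 ranks 林 first (distance 0), followed by characters one edit away. The flat radical sequence ignores the spatial layout of components, which a fuller glyph expression would presumably capture.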