Alternative title: Labor saving for reprinting Japanese rare classical books: The development of the new method for OCR technology including kana and kanji characters in cursive style
Alternative abstract: Most modern Japanese readers cannot read rare Japanese classical books written in cursive-style kana and kanji, which makes the contents of the large number of surviving books difficult to understand. To reduce the heavy labor of reprinting them, we developed a new OCR method. Principle validation tests on books containing cursive-style kana and kanji showed that text data can be generated automatically with a precision of more than 80% under fixed conditions. In the new OCR method, character images are extracted together with their position information and used to build an ideographic variation database; the character codes of the rare classical books to be reprinted are then identified from this database by a similar-kanji retrieval method. In addition, rather than aiming for full automation, we designed a working process that combines automatic processing with manual work to reduce the overall reprinting load. We report the structure of the new OCR method and the current state of reprinting with it.
Keywords: 古典籍 (classical books); くずし字 (cursive-style characters); 変体仮名 (hentaigana); 翻刻 (reprint); 字形データベース (glyph database)
Other keywords: Japanese rare classical book; OCR; kana and kanji characters in cursive style; hentaigana; anomalous Japanese cursive syllabary; reprint; ideographic variation database