Publisher: Academy & Industry Research Collaboration Center (AIRCC)
Abstract: This paper describes a method for detecting impulse noise in a set of scanned documents as part of OCR preprocessing. Because the document set is to be processed at large scale, the primary concern for the noise detection method was efficiency within existing project constraints. Given the nature of the noise, the method looks for it in document margins. The method works in two stages. The first stage is margin detection, based on color spectrum analysis. The second stage is noise recognition in margin samples, based on a pixel contrast score. The resulting implementation proved efficient in terms of both detection accuracy and algorithmic complexity.
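
The abstract does not give the algorithm's details, so the following is only a minimal sketch of how a two-stage check of this kind might look, not the paper's implementation. It assumes a non-empty grayscale page stored as a list of rows of 0-255 integers. All function names and thresholds are hypothetical. The most frequent gray level stands in for the color spectrum analysis, and the contrast score is assumed to be the share of margin pixels that differ sharply from the mean of their four neighbours.

```python
from collections import Counter


def background_level(image):
    """Most frequent gray level, taken as the page background (assumption)."""
    counts = Counter(px for row in image for px in row)
    return counts.most_common(1)[0][0]


def left_margin_width(image, tolerance=40, min_background=0.9):
    """Stage 1 (sketch): count leading columns that are mostly background."""
    bg = background_level(image)
    height, width = len(image), len(image[0])
    for x in range(width):
        near_bg = sum(
            1 for y in range(height) if abs(image[y][x] - bg) <= tolerance
        )
        if near_bg / height < min_background:
            return x  # first column that looks like content
    return width


def contrast_score(image, margin_width, threshold=100):
    """Stage 2 (sketch): fraction of interior margin pixels that differ
    from the mean of their 4 neighbours by more than `threshold`."""
    height = len(image)
    outliers = 0
    total = 0
    for y in range(1, height - 1):
        for x in range(1, margin_width - 1):
            neighbour_mean = (
                image[y - 1][x] + image[y + 1][x]
                + image[y][x - 1] + image[y][x + 1]
            ) / 4
            total += 1
            if abs(image[y][x] - neighbour_mean) > threshold:
                outliers += 1
    return outliers / total if total else 0.0


def has_impulse_noise(image, min_score=0.01):
    """Flag a page whose left-margin contrast score reaches `min_score`."""
    margin = left_margin_width(image)
    return contrast_score(image, margin) >= min_score
```

Restricting the scan to the margin keeps the per-page cost proportional to the margin area rather than the full page, which fits the efficiency concern stated in the abstract. The cutoff values here are placeholders that would need tuning on real scans.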