Printed Text Recognition System for Multi-Script Image

Inderpreet Kaur and Kiran Jot Singh
Department of ECE, Chandigarh University, Gharuan, India
Abstract—An Optical Character Recognition (OCR) system converts text in an input image into editable form. Multi-script recognition systems are needed in countries like India, where people in different states speak and write different languages. Multi-script recognition remains a challenging problem, and further research is needed to extend OCR schemes to the classification of multiple scripts. In this paper, a multi-script recognition system is proposed for English, numerals, and Gurumukhi script. For recognition, the image is processed through several stages: pre-processing, segmentation, feature extraction, and classification. After binarization, the image is segmented using the line, word, and character segmentation techniques of the proposed system. Features such as the number of holes and projection histogram profiles are then computed for classification. The system's efficiency is evaluated using test images with different text sizes. The Arial font is used for English script and the Gurbanikalmi font for Gurumukhi script to train the system. Results show that the proposed system provides high accuracy.
Index Terms—OCR, multi-script, English, Gurumukhi, segmentation, feature extraction, recognition
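
To make the segmentation and feature extraction stages named in the abstract concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes an already binarized image stored as a list of rows with 1 for ink and 0 for background. The helper names (projection, runs, count_holes) are hypothetical, and the choices of cutting at empty rows or columns and of using 4-connectivity for hole counting are assumptions made for illustration. The sketch also ignores the headline stroke that joins Gurumukhi characters within a word, which a plain column projection would not separate.

```python
from collections import deque


def projection(img, axis):
    """Projection histogram: ink count per row (axis=0) or per column (axis=1)."""
    if axis == 0:
        return [sum(row) for row in img]
    return [sum(col) for col in zip(*img)]


def runs(profile):
    """Return (start, end) spans of consecutive non-zero entries, end exclusive.

    Applied to a row projection this gives candidate text lines; applied to
    the column projection of one line it gives candidate words or characters.
    """
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def count_holes(glyph):
    """Count background regions fully enclosed by ink (assumed 4-connected)."""
    h, w = len(glyph), len(glyph[0])
    # Pad with a background border so all outside background forms one region.
    padded = [[0] * (w + 2)]
    padded += [[0] + list(row) + [0] for row in glyph]
    padded += [[0] * (w + 2)]
    seen = [[False] * (w + 2) for _ in range(h + 2)]
    regions = 0
    for y in range(h + 2):
        for x in range(w + 2):
            if padded[y][x] == 0 and not seen[y][x]:
                regions += 1
                seen[y][x] = True
                queue = deque([(y, x)])
                # Breadth-first flood fill over this background region.
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx),
                                   (cy, cx + 1), (cy, cx - 1)):
                        if (0 <= ny < h + 2 and 0 <= nx < w + 2
                                and padded[ny][nx] == 0 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    # Every region except the outer background is a hole.
    return regions - 1
```

In this sketch, runs(projection(page, 0)) yields the row spans of candidate text lines, and the same call with axis 1 on a single line yields column spans. The hole count, together with the two projection profiles of a segmented glyph, could then form a feature vector for whichever classifier is used.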

Cite: Inderpreet Kaur and Kiran Jot Singh, "Printed Text Recognition System for Multi-Script Image," International Journal of Signal Processing Systems, Vol. 4, No. 5, pp. 411-416, October 2016. doi: 10.18178/ijsps.4.5.411-416
