A Heuristic Approach for Designing Regional Language Based Raw–Text Extractor and Unicode Font–Mapping Tool
"A Heuristic Approach for Designing ..." refers to background in this paper
...Not only that, a detailed study of Hindi fonts is required, as the algorithm is based on this language [4, 5]. The approach to raw-text extraction is the same for any other language, since it follows a generic heuristic: a regular expression searches for a pattern of tags associated with the keyword that specifies the proprietary font name. Here, however, Hindi is used for our test set....
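The tag-matching heuristic described in the excerpt might look roughly like the following. This is only a minimal sketch, not the authors' implementation: it assumes the proprietary font is declared through an HTML `<font face="...">` tag, and the function name `extract_raw_text` and the sample font name "Kruti Dev 010" are hypothetical choices made for illustration.

```python
import re
from html import unescape


def extract_raw_text(html: str, font_name: str) -> list[str]:
    """Return the text runs enclosed in <font> tags whose face names font_name.

    Assumption: the font is declared as <font face="...">; the excerpt does
    not state the exact tag pattern, so adapt the regex to the real markup.
    """
    pattern = re.compile(
        r"<font\b[^>]*\bface\s*=\s*[\"']?[^\"'>]*"
        + re.escape(font_name)
        + r"[^\"'>]*[\"']?[^>]*>(.*?)</font>",
        re.IGNORECASE | re.DOTALL,
    )
    runs = []
    for match in pattern.finditer(html):
        # Drop any tags nested inside the matched run (e.g. <b>, <i>).
        inner = re.sub(r"<[^>]+>", "", match.group(1))
        text = unescape(inner).strip()
        if text:
            runs.append(text)
    return runs


sample = (
    '<p><font face="Kruti Dev 010">fganh <b>ikB</b></font> and '
    '<font face="Arial">plain</font></p>'
)
hindi_runs = extract_raw_text(sample, "Kruti Dev 010")
```

Because only the font name changes between calls, the same function can be pointed at a font for any other language. A regex of this kind does not handle `<font>` tags nested inside one another; a real extractor would likely need an HTML parser for such pages.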