
Showing papers on "Noisy text analytics published in 2017"


Posted Content
TL;DR: Several of the most fundamental text mining tasks and techniques, including text pre-processing, classification and clustering, are described, and text mining in the biomedical and health care domains is briefly explained.
Abstract: The amount of text that is generated every day is increasing dramatically. This tremendous volume of mostly unstructured text cannot be simply processed and perceived by computers. Therefore, efficient and effective techniques and algorithms are required to discover useful patterns. Text mining is the task of extracting meaningful information from text, which has gained significant attention in recent years. In this paper, we describe several of the most fundamental text mining tasks and techniques including text pre-processing, classification and clustering. Additionally, we briefly explain text mining in biomedical and health care domains.

422 citations
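As a concrete illustration of the pre-processing and clustering steps this survey covers, here is a minimal scikit-learn sketch (the example documents, stop-word handling and cluster count are our own illustrative choices, not from the paper): TF-IDF performs the tokenization, lowercasing and stop-word removal, and k-means groups the resulting vectors.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "Patients with diabetes require regular glucose monitoring.",
    "Glucose levels are tracked for diabetic patients.",
    "The new smartphone camera recognizes printed text.",
    "Scene text recognition works on camera images.",
]

# Pre-processing: tokenization, lowercasing and stop-word removal happen inside TF-IDF.
vectors = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Clustering: group the documents into two topical clusters.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
print(labels)  # e.g. [0 0 1 1]
```

The same TF-IDF vectors could equally be fed to a classifier for the supervised tasks the survey also describes.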


Posted Content
TL;DR: Experimental results on public benchmark datasets show the ability of the STN-OCR model to handle a variety of different tasks, without substantial changes in its overall network structure.
Abstract: Detecting and recognizing text in natural scene images is a challenging, yet not completely solved task. In recent years several new systems that try to solve at least one of the two sub-tasks (text detection and text recognition) have been proposed. In this paper we present STN-OCR, a step towards semi-supervised neural networks for scene text recognition, that can be optimized end-to-end. In contrast to most existing works that consist of multiple deep neural networks and several pre-processing steps we propose to use a single deep neural network that learns to detect and recognize text from natural images in a semi-supervised way. STN-OCR is a network that integrates and jointly learns a spatial transformer network, that can learn to detect text regions in an image, and a text recognition network that takes the identified text regions and recognizes their textual content. We investigate how our model behaves on a range of different tasks (detection and recognition of characters, and lines of text). Experimental results on public benchmark datasets show the ability of our model to handle a variety of different tasks, without substantial changes in its overall network structure.

70 citations
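The toy PyTorch sketch below illustrates the core mechanism the abstract describes, a localization network predicting affine parameters and a differentiable spatial transformer feeding a recognition network. It is not the authors' STN-OCR architecture; the layer sizes and the single-character classification head are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySTNOCR(nn.Module):
    def __init__(self, num_classes=36):
        super().__init__()
        # Localization network: predicts 6 affine parameters from the image.
        self.loc = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(8 * 4 * 4, 6),
        )
        # Initialize to the identity transform so training starts stably.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))
        # Recognition network: classifies the attended region.
        self.rec = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(16 * 4 * 4, num_classes),
        )

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)                     # affine parameters per image
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        region = F.grid_sample(x, grid, align_corners=False)   # differentiable "crop" of the text region
        return self.rec(region)

model = TinySTNOCR()
logits = model(torch.randn(2, 1, 32, 100))   # batch of grayscale text images
print(logits.shape)                          # torch.Size([2, 36])
```

Because the grid sampling is differentiable, the detection and recognition parts can be trained jointly end-to-end, which is the central point of the paper.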


Proceedings ArticleDOI
01 Jul 2017
TL;DR: A novel recurrent Dense Text Localization Network (DTLN) is proposed to sequentially decode the intermediate convolutional representations of a cluttered scene image into a set of distinct text instance detections, and a Context Reasoning Text Retrieval model is proposed, which jointly encodes text instances and their context information through a recurrent network, and ranks localized text bounding boxes by a scoring function of context compatibility.
Abstract: Text instance as one category of self-described objects provides valuable information for understanding and describing cluttered scenes. In this paper, we explore the task of unambiguous text localization and retrieval, to accurately localize a specific targeted text instance in a cluttered image given a natural language description that refers to it. To address this issue, first a novel recurrent Dense Text Localization Network (DTLN) is proposed to sequentially decode the intermediate convolutional representations of a cluttered scene image into a set of distinct text instance detections. Our approach avoids repeated detections at multiple scales of the same text instance by recurrently memorizing previous detections, and effectively tackles crowded text instances in close proximity. Second, we propose a Context Reasoning Text Retrieval (CRTR) model, which jointly encodes text instances and their context information through a recurrent network, and ranks localized text bounding boxes by a scoring function of context compatibility. Quantitative evaluations on standard scene text localization benchmarks and a newly collected scene text retrieval dataset demonstrate the effectiveness and advantages of our models for both scene text localization and retrieval.

31 citations


Journal ArticleDOI
TL;DR: This paper attempts to analyze and classify the various text extraction schemes for scene-text and document images, compares the different approaches based on common problems, and discusses their merits and demerits.
Abstract: One of the major applications of text retrieval from images is to extract the text information and then recognize its characters. This is helpful for indexing the images within storage media. When we want to search a particular image or document, there is no need to go through a large bunch of images. We go only through the group of indexed images, so that the task of finding the particular image becomes easy. Extracting text lines from scanned document images presents a major problem in the optical character recognition process, as skewed text lines raise the complexity. The problem gets even worse with text lines of different orientations. Such lines are called multi-skewed lines. These multi-skewed lines are easily observed in both printed and handwritten documents. It is a challenging task to design a real-time system which can maintain a high recognition rate with good accuracy and is independent of the type of documents and character fonts. In this paper, we attempt to analyze and classify the various text extraction schemes for scene-text and document images, compare the different approaches based on common problems, and discuss their merits and demerits.

27 citations


Journal ArticleDOI
TL;DR: A new hybrid ensemble approach is proposed that combines the predictions obtained by the classifiers using the original text samples along with their variations created by applying text normalization and semantic indexing techniques, which can improve the text content quality and enhance the performance of the expert systems for spamming detection.
Highlights: A new classifier is presented to detect undesired short text comments. The proposed approach is light, fast, multinomial and offers incremental learning. The impact of applying text normalization and semantic indexing is studied. The results indicate the proposed techniques outperformed most of the approaches. Text normalization and semantic indexing enhanced the classifiers' performance.
Abstract: The popularity and reach of short text messages commonly used in electronic communication have led spammers to use them to propagate undesired content. This is often composed of misleading information, advertisements, viruses, and malware that can be harmful and annoying to users. The dynamic nature of spam messages demands knowledge-based systems with online learning and, therefore, the most traditional text categorization techniques cannot be used. In this study, we introduce MDLText, a text classifier based on the minimum description length principle, in the context of filtering undesired short text messages. The proposed approach supports incremental learning and, therefore, its predictive model is scalable and can adapt to continuously evolving spamming techniques. It is also fast, with computational cost increasing linearly with the number of samples and features, which is very desirable for expert systems applied to real-time electronic communication. In addition to the dynamic nature of these messages, they are also short and usually poorly written, rife with slang, symbols, and abbreviations that make text representation, learning, and filtering difficult. In this scenario, we also investigated the benefits of using text normalization and semantic indexing techniques. We show that these techniques can improve the text content quality and, consequently, enhance the performance of expert systems for spam detection. Based on these findings, we propose a new hybrid ensemble approach that combines the predictions obtained by the classifiers using the original text samples along with their variations created by applying text normalization and semantic indexing techniques. It has the advantage of being independent of the classification method, and the results indicate it is efficient for filtering undesired short text messages.

23 citations
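A simplified sketch of the hybrid-ensemble idea described above (not the MDLText classifier itself; the toy slang dictionary, the naive Bayes base learners and the example messages are illustrative assumptions): one classifier is trained on the raw messages, another on their normalized variants, and their predicted probabilities are averaged.

```python
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

SLANG = {"u": "you", "2": "to", "gr8": "great", "txt": "text"}   # toy normalization table

def normalize(msg):
    tokens = re.findall(r"\w+", msg.lower())
    return " ".join(SLANG.get(t, t) for t in tokens)

raw = [
    "WIN a FREE prize, txt 80082 now!!",
    "are u coming 2 dinner",
    "gr8, see you soon",
    "claim your free ringtone now",
]
labels = [1, 0, 0, 1]   # 1 = spam, 0 = ham

clf_raw = make_pipeline(CountVectorizer(), MultinomialNB()).fit(raw, labels)
clf_norm = make_pipeline(CountVectorizer(), MultinomialNB()).fit([normalize(m) for m in raw], labels)

msg = "txt now to win a free prize"
# Hybrid ensemble: average the probabilities from the raw-text and normalized-text classifiers.
probs = (clf_raw.predict_proba([msg]) + clf_norm.predict_proba([normalize(msg)])) / 2
print(probs.argmax(axis=1))   # ensemble decision: [1] (spam)
```

Averaging probabilities keeps the ensemble independent of the underlying classification method, which is the property highlighted in the abstract.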


Journal ArticleDOI
TL;DR: This paper combines the Text Proposals algorithm with Fully Convolutional Networks to efficiently reduce the number of proposals while maintaining the same recall level, thus gaining a significant speed-up.

20 citations
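A toy numpy sketch of the filtering step suggested by the TL;DR: each text proposal is scored by the mean text probability inside its box and only the best-scoring boxes are kept. Here the heatmap is random; in the paper it would come from the Fully Convolutional Network, and the actual scoring and suppression rules may differ.

```python
import numpy as np

heatmap = np.random.rand(480, 640)   # stand-in for an FCN text-probability map
proposals = [(10, 20, 110, 60), (300, 200, 420, 240), (50, 400, 90, 430)]  # x1, y1, x2, y2

def score(box):
    x1, y1, x2, y2 = box
    return float(heatmap[y1:y2, x1:x2].mean())   # average text confidence inside the box

kept = sorted(proposals, key=score, reverse=True)[:2]   # keep only the top proposals
print(kept)
```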


Proceedings ArticleDOI
13 Aug 2017
TL;DR: In this paper, a deep memory network is proposed to automatically find relevant information from a collection of longer documents and reformulate the short text through a gating mechanism, which significantly outperforms classical text expansion methods.
Abstract: Effectively making sense of short texts is a critical task for many real world applications such as search engines, social media services, and recommender systems. The task is particularly challenging as a short text contains very sparse information, often too sparse for a machine learning algorithm to pick up useful signals. A common practice for analyzing short text is to first expand it with external information, which is usually harvested from a large collection of longer texts. In literature, short text expansion has been done with all kinds of heuristics. We propose an end-to-end solution that automatically learns how to expand short text to optimize a given learning task. A novel deep memory network is proposed to automatically find relevant information from a collection of longer documents and reformulate the short text through a gating mechanism. Using short text classification as a demonstrating task, we show that the deep memory network significantly outperforms classical text expansion methods with comprehensive experiments on real world data sets.

13 citations
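The sketch below illustrates the attend-and-gate mechanism in a single memory hop. The dimensions, the single-hop setup and the exact gating formula are simplifying assumptions rather than the paper's architecture: the short-text vector attends over a memory of long-document vectors, and a learned gate controls how much of the retrieved information reformulates it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShortTextExpander(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)   # decides how much retrieved memory to accept

    def forward(self, short_vec, memory):
        # short_vec: (batch, dim) encoded short texts
        # memory:    (num_docs, dim) encoded long documents
        attn = F.softmax(short_vec @ memory.t(), dim=-1)   # relevance of each long document
        retrieved = attn @ memory                          # weighted summary of relevant documents
        g = torch.sigmoid(self.gate(torch.cat([short_vec, retrieved], dim=-1)))
        return g * short_vec + (1 - g) * retrieved         # gated reformulation of the short text

expander = ShortTextExpander()
expanded = expander(torch.randn(8, 64), torch.randn(100, 64))
print(expanded.shape)   # torch.Size([8, 64])
```

In the paper this reformulated representation is trained end-to-end with the downstream classifier, so the expansion is optimized for the given learning task rather than being a fixed heuristic.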


Journal ArticleDOI
TL;DR: The results suggest that spoken text is a cause of the transient information effect, which can be best avoided by substituting written text for spoken text on tasks that require integration of information.
Abstract: Two experiments involving 125 grade-10 students learning about commerce investigated strategies to overcome the transient information effect caused by explanatory spoken text. The transient information effect occurs when learning is reduced as a result of information disappearing before the learner has time to adequately process it, or link it with new information. Spoken text, unless recorded or repeated in some fashion, is fleeting in nature and can be a major cause of transiency. The three strategies investigated, all theoretically expected to enhance learning, were: (a) replacing lengthy spoken text with written text (Experiments 1 and 2), (b) replacing lengthy continuous text with segmented text (Experiment 1), and (c) adding a diagram to lengthy spoken text (Experiment 2). In both experiments on tasks that required information to be integrated across segments, written text was found to be superior to spoken text. In Experiment 1 the expected advantage of segmented text in reducing transitory effects was not found. Compared with written continuous text the segmented spoken text strategy was inferior. Experiment 2 found that adding a diagram to spoken text was an advantage compared to spoken text alone consistent with a multimedia effect. Overall, the results suggest that spoken text is a cause of the transient information effect, which can be best avoided by substituting written text for spoken text on tasks that require integration of information.

8 citations


Journal Article
TL;DR: This paper focuses on using regular-expression pattern matching to find relevant data in text/word files and to extract patterns from text mining documents.
Abstract: Text mining is also known as knowledge discovery from textual databases; its job is to derive high-level knowledge from text. The process of obtaining useful information from records is also known as text mining. The system uses many data mining approaches to extract patterns from text documents. Using those updated patterns and implementing an algorithm for pattern discovery is still an open research issue. The paper focuses on using regular-expression pattern matching to find relevant data in a text/word file. A text file containing a large amount of free text is used to fetch all the discovered words or characters from the documents. The system helps users to search for the relevant document, and converts the unstructured data into structured form.

6 citations
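An illustrative example of the regular-expression matching the paper relies on (the patterns and sample text are made up for demonstration): e-mail addresses and dates are pulled out of unstructured text so they can be stored in structured form.

```python
import re

text = "Contact alice@example.com before 12/05/2017 or bob@example.org after 01/06/2017."

emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text)   # e-mail address pattern
dates = re.findall(r"\d{2}/\d{2}/\d{4}", text)          # dd/mm/yyyy date pattern

print({"emails": emails, "dates": dates})
# {'emails': ['alice@example.com', 'bob@example.org'], 'dates': ['12/05/2017', '01/06/2017']}
```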


Proceedings ArticleDOI
03 Apr 2017
TL;DR: This tutorial introduces data-driven methods to construct structured information networks for text corpora of different kinds to represent their factual information and demonstrates how these constructed networks aid in text analytics and knowledge discovery at a large scale.
Abstract: In today's computerized and information-based society, text data is rich but messy. People are soaked with vast amounts of natural-language text data, ranging from news articles, social media posts, and advertisements to a wide range of textual information from various domains (medical records, corporate reports). To turn such massive unstructured text data into actionable knowledge, one of the grand challenges is to gain an understanding of the factual information (e.g., entities, attributes, relations, events) in the text. In this tutorial, we introduce data-driven methods to construct structured information networks (where nodes are different types of entities attached with attributes, and edges are different relations between entities) for text corpora of different kinds (especially for massive, domain-specific text corpora) to represent their factual information. We focus on methods that are minimally-supervised, domain-independent, and language-independent for fast network construction across various application domains (news, web, biomedical, reviews). We demonstrate on real datasets including news articles, scientific publications, tweets and reviews how these constructed networks aid in text analytics and knowledge discovery at a large scale.

5 citations
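A toy sketch of the network-construction idea using networkx. Capitalized tokens stand in for the tutorial's entity extraction and sentence-level co-occurrence stands in for relation extraction; both are crude assumptions made only to keep the example self-contained.

```python
import itertools
import re
import networkx as nx

corpus = [
    "Google acquired DeepMind in London.",
    "DeepMind collaborates with Oxford on health research.",
    "Google opened a new office in London.",
]

G = nx.Graph()
for sentence in corpus:
    entities = set(re.findall(r"[A-Z][A-Za-z]+", sentence))       # crude entity spotting
    for a, b in itertools.combinations(sorted(entities), 2):
        weight = G.get_edge_data(a, b, default={"weight": 0})["weight"]
        G.add_edge(a, b, weight=weight + 1)                        # weighted co-occurrence edge

print(G.edges(data=True))   # e.g. Google -- London carries weight 2
```

In the tutorial's setting, the nodes would be typed entities with attributes and the edges typed relations, but the resulting structure is queried and analyzed in much the same graph form.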


Proceedings ArticleDOI
17 Mar 2017
TL;DR: This paper presents a framework for text detection and recognition from natural images for mobile devices, focusing particularly on the text contained in images.
Abstract: In light of the remarkable audio-visual impact on modern life and the massive use of new technologies (smartphones, tablets, ...), the image has been given great importance in the field of communication. It has become the most effective, attractive and suitable means of communication for transmitting information between different people. Of all the various pieces of information that can be extracted from an image, our focus is particularly on the text. Since its detection and recognition in a natural image is a major problem in many applications, text has drawn the attention of a great number of researchers in recent years. In this paper, we present a framework for text detection and recognition from natural images for mobile devices.

Proceedings ArticleDOI
01 Feb 2017
TL;DR: This paper presents work on identifying text from an image containing two languages and translating it into a single language using OCR and an SVM classifier.
Abstract: Nowadays it has become a trend to write movie names, company names, name plates, vehicle number plates and brand names in mixed languages. Because of a lack of language knowledge, it becomes difficult for people to identify bi-lingual words. The text written in the image contains different fonts and background images. This paper presents work on identifying text from an image containing two languages and translating it into a single language. OCR is implemented in order to convert the electronic form of the image into machine-editable text. An SVM classifier is used to differentiate scripts based on character density features and to map new examples to the predicted category. Bi-lingual or cross-language word recognition from an image solves the problem of reading text from an image that contains both Hindi and English.
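A simplified sketch of script identification with an SVM on density features, as mentioned above. The synthetic patches, the two hand-picked features and the labels are illustrative assumptions; a real system would use actual character images and richer density descriptors.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def density_features(img):
    # img: binary character patch (1 = ink). Features: overall ink density plus the
    # density of the top band, where a Devanagari head-line would show up strongly.
    top = img[: img.shape[0] // 4]
    return [img.mean(), top.mean()]

# Synthetic stand-ins: "Devanagari-like" patches get a solid top band, "Latin-like" do not.
devanagari = [np.vstack([np.ones((4, 16)), rng.integers(0, 2, (12, 16))]) for _ in range(20)]
latin = [rng.integers(0, 2, (16, 16)) for _ in range(20)]

X = [density_features(img) for img in devanagari + latin]
y = [0] * 20 + [1] * 20                       # 0 = Devanagari script, 1 = Latin script

clf = SVC(kernel="linear").fit(X, y)
print(clf.predict([density_features(devanagari[0]), density_features(latin[0])]))  # expected: [0 1]
```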

Posted Content
TL;DR: A new approach for full page text recognition based on regressions with Fully Convolutional Neural Networks and Multidimensional Long Short-Term Memory as contextual layers, in which only the position of the left side of the text lines is predicted.
Abstract: Text line detection and localization is a crucial step for full page document analysis, but still suffers from the heterogeneity of real life documents. In this paper, we present a new approach for full page text recognition. Localization of the text lines is based on regressions with Fully Convolutional Neural Networks and Multidimensional Long Short-Term Memory as contextual layers. In order to increase the efficiency of this localization method, only the position of the left side of the text lines is predicted. The text recognizer is then in charge of predicting the end of the text to recognize. This method has shown good results for full page text recognition on the highly heterogeneous Maurdor dataset.

Posted Content
23 Mar 2017
TL;DR: Experimental results show that the proposed learning-based approach to text image retrieval, whose purpose is to find the original or similar text from a query text image, has good ability to retrieve the original text content.
Abstract: The rapid increase of digitized documents gives rise to a high demand for document image retrieval. While conventional document image retrieval approaches depend on complex OCR-based text recognition and text similarity detection, this paper proposes a new content-based approach, in which more attention is paid to feature extraction and fusion. In the proposed approach, multiple features of document images are extracted by different CNN models. After that, the extracted CNN features are reduced and fused into a weighted average feature. Finally, the document images are ranked based on feature similarity to a provided query image. The experimental procedure is performed on a group of document images converted from academic papers, which contain both English and Chinese documents. The results show that the proposed approach has good ability to retrieve document images with similar text content, and that the fusion of CNN features can effectively improve the retrieval accuracy.
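A schematic sketch of the reduction, weighted fusion and cosine ranking described above. The CNN feature extraction itself is omitted: the random arrays stand in for features produced by two different CNN models, and the fusion weights and PCA size are arbitrary assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
feats_a = rng.random((50, 512))     # placeholder for features from CNN model A
feats_b = rng.random((50, 2048))    # placeholder for features from CNN model B

# Reduce both feature sets to a common size, then fuse them by a weighted average.
fused = normalize(0.6 * PCA(n_components=32).fit_transform(feats_a)
                  + 0.4 * PCA(n_components=32).fit_transform(feats_b))

# Rank the document images by cosine similarity to the query image's fused feature.
query = fused[0]
ranking = np.argsort(-(fused @ query))
print(ranking[:5])                  # indices of the most similar document images
```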

Proceedings ArticleDOI
17 Oct 2017
TL;DR: An architecture that efficiently integrates localization, extraction and recognition algorithms applied to text recognition in web images is presented, making the system flexible, scalable and robust in detecting texts from complex web images with different orientations, dimensions and colors.
Abstract: Web images play an important role in delivering multimedia content on the Web. The text embedded in web images carries semantic information related to the layout and content of the pages. Statistics show that there is a significant need to detect and recognize text from web images. This paper presents an architecture that efficiently integrates localization, extraction and recognition algorithms applied to text recognition in web images. In the recognition step, a procedure based on super-resolution and an iterative method is proposed for improving the performance. The approach is implemented and evaluated using Matlab and cloud computing, making the system flexible, scalable and robust in detecting texts from complex web images with different orientations, dimensions and colors. Competitive results are presented, both in precision and recognition rate, when compared with other systems in the existing literature.
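A minimal Python stand-in for the "enlarge before recognizing" step (the paper's system is implemented in Matlab): bicubic upscaling substitutes for the super-resolution procedure, and pytesseract, which requires the Tesseract engine to be installed, performs the recognition. The file name is a placeholder.

```python
import cv2
import pytesseract

img = cv2.imread("web_image.png", cv2.IMREAD_GRAYSCALE)   # placeholder: small web image with embedded text
upscaled = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)  # stand-in for super-resolution
_, binarized = cv2.threshold(upscaled, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

print(pytesseract.image_to_string(binarized))              # recognized text
```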

Journal Article
TL;DR: A technology for text recognition using Optical Character Recognition (OCR) for a mobile phone application is proposed, which can easily recognize text written on a document in various languages.
Abstract: In this paper, we propose a technology for text recognition using Optical Character Recognition (OCR) for a mobile phone application. The camera within an Android mobile phone scans the text written on a document, and OCR is then applied, which can easily recognize the text written on that document in various languages. OCR is the machine replication of human reading and has been the subject of intensive research for more than three decades. OCR can be described as the mechanical or electronic conversion of scanned text, where the text can be in handwritten, typewritten or printed form. It is a method of digitizing printed texts so that they can be electronically searched and used in various machine processes. It converts the images into machine-encoded text that can be used in machine translation, text-to-speech and text mining.

Journal ArticleDOI
TL;DR: The experimental results demonstrate the better performance of the proposed text information representation model in terms of its time and space complexity.
Abstract: Text representation is the essential step for the tasks of text mining. To represent textual information more expressively, a Text Mining based Semantic Graph approach is proposed, in which more semantic and ordering information among terms, as well as the structural information of the text, is incorporated. Such a model can be constructed by extracting representative terms from texts and their mutual semantic relationships. The proposed work is implemented in the Java and Python environments. Moreover, WordNet is used to provide the relationships among word nodes, and the Gephi tool is used to construct the semantic graph more effectively. The comparative performance is also measured against traditional approaches, with memory consumption and time consumption taken as the standard parameters. The experimental results have demonstrated the better performance of the proposed text information representation model in terms of its time and space complexity.
Keywords: SVM, Semantic Graphs, POS Tagging, WordNet, Text Representation, Text Mining, Graph Model, Semantic Networks
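A small sketch of the semantic-graph construction idea: representative terms become nodes, WordNet hypernym links supply semantic edges, and the graph is exported as GEXF so it can be inspected in Gephi. The term list is illustrative, and NLTK's WordNet corpus must be downloaded beforehand.

```python
import networkx as nx
from nltk.corpus import wordnet as wn      # requires: nltk.download("wordnet")

terms = ["dog", "cat", "animal", "computer", "machine", "device"]

G = nx.Graph()
G.add_nodes_from(terms)
for term in terms:
    for syn in wn.synsets(term, pos=wn.NOUN):
        for hyper in syn.hypernyms():
            for lemma in hyper.lemma_names():
                if lemma in terms and lemma != term:
                    G.add_edge(term, lemma, relation="hypernym")   # semantic edge from WordNet

print(G.edges(data=True))
nx.write_gexf(G, "semantic_graph.gexf")    # open this file in Gephi for visual inspection
```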

13 Mar 2017
TL;DR: The paper discusses some of the developments in text mining applications, primarily reviewing techniques in the classification, summarization and analysis of text, as advocated by academia.
Abstract: Due to the ever-increasing rate at which information is generated, text mining and its automated analysis have become the need of the hour. The paper discusses some of the developments in text mining applications, primarily reviewing techniques in the classification, summarization and analysis of text, as advocated by academia. The goal is, in essence, to ultimately turn unstructured text into useful data and information for analysis using critical methods. We begin by introducing the concept of "textual analysis", akin to text mining performed through the analysis of natural-language texts, the techniques in use and the open source tools available for it. We survey varied topics that use NLP and expand the horizons of this domain by considering new techniques for improving efficiency even with limited amounts of data, improved accuracy, novel approaches, and new application areas relating to text summarization and text classification. Various text mining techniques used in text classification and summarization are reviewed, followed by the application areas of text mining being worked on by businesses. Finally, the paper concludes by introducing "organizational text mining" and emphasizing the need for it.

Journal ArticleDOI
TL;DR: Two novel frameworks for the classification and extraction of the association rules and the visualisation of financial Arabic text are presented in order to realize both the general structure and the sentiment within an accumulated corpus.
Abstract: Text mining methods involve various techniques, such as text categorization, summarisation, information retrieval, document clustering, topic detection, and concept extraction. In addition, because of the difficulties involved in text mining, visualisation techniques can play a paramount role in the analysis and pre-processing of textual data. This paper will present two novel frameworks for the classification and extraction of the association rules and the visualisation of financial Arabic text in order to realize both the general structure and the sentiment within an accumulated corpus. However, mining unstructured data with natural language processing (NLP) and machine learning techniques can be arduous, especially where the Arabic language is concerned, because of limited research in this area. The results show that our frameworks can readily classify Arabic tweets. Furthermore, they can handle many antecedent text association rules for the positive class and the negative class.
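A toy sketch of the association-rule side of such a framework using mlxtend, with English-glossed term sets standing in for pre-processed Arabic tweets; the Arabic-specific NLP, the sentiment tags inside the transactions and the support/confidence thresholds are all illustrative assumptions.

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Each transaction is the set of terms (plus a sentiment tag) from one tweet.
tweets = [
    ["profit", "rise", "positive"],
    ["profit", "growth", "positive"],
    ["loss", "decline", "negative"],
    ["loss", "debt", "negative"],
    ["profit", "rise", "positive"],
]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(tweets).transform(tweets), columns=te.columns_)

frequent = apriori(onehot, min_support=0.4, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.8)
print(rules[["antecedents", "consequents", "support", "confidence"]])   # e.g. {profit} -> {positive}
```

Rules whose consequent is the positive or negative tag correspond to the antecedent text association rules for each sentiment class mentioned in the abstract.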

Proceedings ArticleDOI
01 May 2017
TL;DR: A Turkish scene text database is collected for the first time in the literature, and the contents of this database, called STRIT (Scene Text Recognition In Turkish) for short, are detailed.
Abstract: Scene text localization and recognition keep attracting increasing interest from researchers due to their valuable advantage in extracting content from real world images and in image retrieval via text search. Nevertheless, due to the fact that the majority of the image datasets commonly used in this field consist of text in English, the related studies have mostly been limited to a single language. On that account, in order to apply the technologies developed for scene text detection and recognition to Turkish scene text, analyze their performance and develop Turkish-language-specific algorithms, a Turkish scene text database is collected for the first time in the literature. In this paper, the contents of this database, called STRIT (Scene Text Recognition In Turkish) for short, are detailed. Additionally, two baseline methods are tested to detect and recognize scene text in Turkish, and the preliminary results are presented.

Book ChapterDOI
12 Dec 2017
TL;DR: The proposed system is the integration of an efficiently trained text classifier model with an open source speech-to-text conversion platform, which will eliminate the need for agents to manually process the conversation and initiate the required action.
Abstract: In the services industry, chat helplines were seen as more effective than a voice-based service because more users could be serviced at the same time with the help of standard text message templates. By training text classifier models and integrating them with speech-to-text conversion systems, we can further reduce human effort and thereby deliver efficient solutions with minimal participation and increase user convenience many times over. Our proposed system is the integration of an efficiently trained text classifier model with an open source speech-to-text conversion platform. Our trained model can receive input in text format from the conversion tool and can accurately classify its category (i.e., label it). Based on the classification, the consequent action is initiated. Our trained model will eliminate the need for agents manually processing the conversation and initiating the required action. The system can save a lot of energy, time and other resources.
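A small sketch of the classifier half of the proposed system. In the full system the utterance would arrive from the speech-to-text component; here it is a hard-coded string, and the categories, training utterances and the TF-IDF/linear-SVM choice are our own assumptions rather than the chapter's model.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

training_utterances = [
    "I want to reset my password",
    "my password is not working",
    "I was charged twice on my bill",
    "there is a wrong amount on my invoice",
]
labels = ["account", "account", "billing", "billing"]     # hypothetical helpline categories

clf = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(training_utterances, labels)

transcribed = "please help me reset the password for my account"   # would come from speech-to-text
print(clf.predict([transcribed]))   # -> ['account'], which would trigger the corresponding action
```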

11 Jul 2017
TL;DR: A camera-based assistive text reading framework is proposed to help blind persons read text labels from hand-held objects in their day-to-day lives.
Abstract: We propose a camera-based assistive text reading framework to help blind persons read text labels from hand-held objects in their day-to-day lives. In this framework, the camera acts as the main vision device, capturing images of product packaging and hand-held objects. To isolate the object from complicated backgrounds, we first propose an efficient motion-based technique to define a region of interest (ROI) within the image. Within the extracted ROI, text localization and recognition are conducted to acquire text data. Text characters are then recognized by off-the-shelf optical character recognition (OCR) software. Using a text-to-speech converter, the extracted text is output as audio.
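A minimal sketch of the final "read the label aloud" step described above (the motion-based ROI detection is omitted): pytesseract extracts text from a cropped region and pyttsx3 speaks it. The image path and ROI coordinates are placeholders, and both the Tesseract engine and a local TTS voice must be available.

```python
import cv2
import pytesseract
import pyttsx3

frame = cv2.imread("held_object.jpg")      # placeholder: camera frame of a hand-held product
roi = frame[100:300, 150:450]              # placeholder ROI (normally found by motion analysis)

text = pytesseract.image_to_string(roi).strip()
if text:
    engine = pyttsx3.init()
    engine.say(text)                       # read the recognized label aloud
    engine.runAndWait()
```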