Topic

Noisy text analytics

About: Noisy text analytics is a research topic. Over its lifetime, 700 publications have been published within this topic, receiving 28,759 citations.

Papers

Proceedings Article

PhotoOCR: Reading Text in Uncontrolled Conditions

Alessandro Bissacco, Mark Cummins, Yuval Netzer, Hartmut Neven (Google)

01 Dec 2013

TL;DR: This work describes Photo OCR, a system for text extraction from images that is capable of recognizing text in a variety of challenging imaging conditions where traditional OCR systems fail, notably in the presence of substantial blur, low resolution, low contrast, high image noise and other distortions.

Abstract: We describe Photo OCR, a system for text extraction from images. Our particular focus is reliable text extraction from smartphone imagery, with the goal of text recognition as a user input modality similar to speech recognition. Commercially available OCR performs poorly on this task. Recent progress in machine learning has substantially improved isolated character classification; we build on this progress by demonstrating a complete OCR system using these techniques. We also incorporate modern data-center-scale distributed language modelling. Our approach is capable of recognizing text in a variety of challenging imaging conditions where traditional OCR systems fail, notably in the presence of substantial blur, low resolution, low contrast, high image noise and other distortions. It also operates with low latency: mean processing time is 600 ms per image. We evaluate our system on public benchmark datasets for text extraction and outperform all previously reported results, more than halving the error rate on multiple benchmarks. The system is currently in use in many applications at Google, and is available as a user input modality in Google Translate for Android.

499 citations

Book

Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications

Gary D. Miner, John Elder, Thomas Hill, Robert Nisbet, Dursun Delen, Andrew Fast

25 Jan 2012

TL;DR: This comprehensive professional reference brings together all the information, tools and methods a professional will need to efficiently use text mining applications and statistical analysis, and presents a comprehensive how-to reference that shows the user how to conduct text mining and statistically analyze the results.

Abstract: The world contains an unimaginably vast amount of digital information, which is growing ever faster. This makes it possible to do many things that previously could not be done: spot business trends, prevent diseases, combat crime and so on. Managed well, textual data can be used to unlock new sources of economic value, provide fresh insights into science and hold governments to account. As the Internet expands and our natural capacity to process the unstructured text it contains diminishes, the value of text mining for information retrieval and search will increase dramatically. This comprehensive professional reference brings together all the information, tools and methods a professional will need to efficiently use text mining applications and statistical analysis. The Handbook of Practical Text Mining and Statistical Analysis for Non-structured Text Data Applications presents a comprehensive how-to reference that shows the user how to conduct text mining and statistically analyze the results. In addition to providing an in-depth examination of core text mining and link detection tools, methods and operations, the book examines advanced preprocessing techniques, knowledge representation considerations, and visualization approaches. Finally, the book explores current real-world, mission-critical applications of text mining and link detection using real-world example tutorials in such varied fields as corporate, finance, business intelligence, genomics research, and counterterrorism activities.
- Extensive case studies, most in a tutorial format, allow the reader to 'click through' the examples using a software program, thus learning to conduct text mining analyses as quickly as possible
- Numerous examples, tutorials, PowerPoint slides and datasets are available via a companion website on Elsevierdirect.com
- A glossary of text mining terms is provided in the appendix

450 citations

Text Classification Using Machine Learning Techniques

M. Ikonomakis, Sotiris Kotsiantis, Vassilis Tampakas

01 Jan 2005

TL;DR: This paper illustrates the text classification process using machine learning techniques, as a way to manage and process the vast and continuously growing number of documents in digital form.

Abstract: Automated text classification has been considered a vital method for managing and processing the vast and continuously increasing number of documents in digital form. In general, text classification plays an important role in information extraction and summarization, text retrieval, and question answering. This paper illustrates the text classification process using machine learning techniques. The references cited cover the major theoretical issues and guide the researcher to interesting research directions.

447 citations
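As a rough illustration of the text classification process this paper surveys, the sketch below trains a multinomial naive Bayes classifier, one common machine learning technique for text classification, using only the Python standard library. It is not taken from the paper; the whitespace tokenizer, the class and function names, and the toy spam/ham training documents are hypothetical choices made for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayesClassifier":
        self.class_counts = Counter(labels)        # documents per class
        self.word_counts = defaultdict(Counter)    # class -> word frequencies
        self.vocab = set()                         # all words seen in training
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc: str) -> str:
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = "", -math.inf
        for label, count in self.class_counts.items():
            # log P(class) + sum over tokens of log P(token | class)
            score = math.log(count / n_docs)
            total = sum(self.word_counts[label].values())
            for tok in tokenize(doc):
                if tok not in self.vocab:
                    continue  # skip words never seen during training
                score += math.log(
                    (self.word_counts[label][tok] + 1) / (total + vocab_size)
                )
            if score > best_score:
                best_label, best_score = label, score
        return best_label


train_docs = ["cheap pills buy now", "meeting agenda attached",
              "buy cheap watches", "project meeting notes"]
train_labels = ["spam", "ham", "spam", "ham"]
clf = NaiveBayesClassifier().fit(train_docs, train_labels)
print(clf.predict("buy pills now"))   # spam
print(clf.predict("meeting notes"))   # ham
```

Add-one smoothing keeps a word that never appeared with a given class from driving that class's probability to zero, and summing log probabilities instead of multiplying raw probabilities avoids floating-point underflow on long documents.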

Posted Content

A Brief Survey of Text Mining: Classification, Clustering and Extraction Techniques

Mehdi Allahyari, Seyedamin Pouriyeh, Mehdi Assefi, Saeid Safaei, Elizabeth D. Trippe, Juan B. Gutierrez, Krys J. Kochut

10 Jul 2017 - arXiv: Computation and Language

TL;DR: This paper describes several of the most fundamental text mining tasks and techniques, including text pre-processing, classification and clustering, and briefly explains text mining in the biomedical and health care domains.

Abstract: The amount of text that is generated every day is increasing dramatically. This tremendous volume of mostly unstructured text cannot simply be processed and perceived by computers. Therefore, efficient and effective techniques and algorithms are required to discover useful patterns. Text mining, the task of extracting meaningful information from text, has gained significant attention in recent years. In this paper, we describe several of the most fundamental text mining tasks and techniques, including text pre-processing, classification and clustering. Additionally, we briefly explain text mining in the biomedical and health care domains.

422 citations
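The text pre-processing step that this survey lists among the fundamental text mining tasks can be sketched as: lowercase the text, split it into alphanumeric tokens, drop stop words, and count term frequencies to get a bag-of-words representation. This is a minimal illustration, not code from the survey; the tiny stop-word list and the function names are hypothetical, and practical pipelines typically use far larger stop-word lists and often add stemming or lemmatization.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for"}

# Tokens are maximal runs of lowercase letters or digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize on alphanumeric runs, and drop stop words."""
    tokens = TOKEN_RE.findall(text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(text: str) -> Counter:
    """Term-frequency representation of the preprocessed tokens."""
    return Counter(preprocess(text))


doc = "Text mining is the task of extracting meaningful information from text."
print(preprocess(doc))
# ['text', 'mining', 'task', 'extracting', 'meaningful', 'information', 'from', 'text']
print(bag_of_words(doc).most_common(1))
# [('text', 2)]
```

The resulting term-frequency vectors are the usual input to the classification and clustering techniques the survey goes on to describe.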

Journal Article

Automatic text categorization in terms of genre and author

Efstathios Stamatatos, George Kokkinakis, Nikos Fakotakis (University of Patras)

01 Dec 2000 - Computational Linguistics

TL;DR: To take full advantage of existing natural language processing (NLP) tools, this paper proposes a set of style markers, including analysis-level measures that represent the way in which the input text has been analyzed and capture useful stylistic information without additional cost.

Abstract: The two main factors that characterize a text are its content and its style, and both can be used as a means of categorization. In this paper we present an approach to text categorization in terms of genre and author for Modern Greek. In contrast to previous stylometric approaches, we attempt to take full advantage of existing natural language processing (NLP) tools. To this end, we propose a set of style markers including analysis-level measures that represent the way in which the input text has been analyzed and capture useful stylistic information without additional cost. We present a set of small-scale but reasonable experiments in text genre detection, author identification, and author verification tasks and show that the proposed method performs better than the most popular distributional lexical measures, i.e., functions of vocabulary richness and frequencies of occurrence of the most frequent words. All the presented experiments are based on unrestricted text downloaded from the World Wide Web without any manual text preprocessing or text sampling. Various performance issues regarding the training set size and the significance of the proposed style markers are discussed. Our system can be used in any application that requires fast and easily adaptable text categorization in terms of stylistically homogeneous categories. Moreover, the procedure of defining analysis-level markers can be followed in order to extract useful stylistic information using existing text processing tools.

416 citations
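To make the baseline in this abstract concrete, the sketch below computes the kind of distributional lexical measures the authors compare against: functions of vocabulary richness (here, assumed to be the type-token ratio and the proportion of hapax legomena) and relative frequencies of the most frequent words. The specific measures and all function names are illustrative assumptions; the paper's own analysis-level style markers are derived from how an existing NLP tool analyzes Modern Greek text and are not reproduced here.

```python
from collections import Counter


def most_frequent_words(corpus: list[list[str]], k: int) -> list[str]:
    """Return the k most frequent words across a tokenized corpus."""
    totals = Counter()
    for tokens in corpus:
        totals.update(tokens)
    return [word for word, _ in totals.most_common(k)]


def lexical_features(tokens: list[str], common_words: list[str]) -> dict[str, float]:
    """Vocabulary-richness measures plus relative frequencies of common words."""
    n = len(tokens)
    if n == 0:
        raise ValueError("cannot compute lexical features of an empty text")
    counts = Counter(tokens)
    features = {
        # Type-token ratio: distinct words divided by total words.
        "type_token_ratio": len(counts) / n,
        # Hapax legomena: words occurring exactly once, divided by total words.
        "hapax_ratio": sum(1 for c in counts.values() if c == 1) / n,
    }
    # Relative frequency of each word from the shared most-frequent-word list.
    for word in common_words:
        features[f"freq_{word}"] = counts[word] / n
    return features


corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "sat", "on", "a", "log"]]
top = most_frequent_words(corpus, 1)   # ['the']
print(lexical_features(corpus[0], top))
```

One such feature vector per text would then be passed to a classifier for genre detection or author identification; computing the most-frequent-word list over the whole corpus keeps the feature dimensions identical across texts.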

Metrics

Papers: 715
Citations: 30,953

No. of papers in the topic in previous years
Year	Papers
2023	6
2022	8
2020	1
2019	1
2018	4
2017	23
