
Showing papers on "Noisy text analytics published in 2023"


Journal ArticleDOI
TL;DR: Zhang et al. propose RUArt (Reading, Understanding and Answering the Related Text), a text-centered method for text-based VQA that takes an image and a question as input and obtains text and scene objects.
Abstract: Text-based visual question answering (VQA) requires reading and understanding text in an image to correctly answer a given question. However, most current methods simply add optical character recognition (OCR) tokens extracted from the image into the VQA model without considering the contextual information of OCR tokens or mining the relationships between OCR tokens and scene objects. In this paper, we propose a novel text-centered method called RUArt (Reading, Understanding and Answering the Related Text) for text-based VQA. Taking an image and a question as input, RUArt first reads the image and obtains text and scene objects. Then, it understands the question, the OCRed text and the objects in the context of the scene, and further mines the relationships among them. Finally, it answers with the related text for the given question through text semantic matching and reasoning. We evaluate RUArt on two text-based VQA benchmarks (ST-VQA and TextVQA) and conduct extensive ablation studies to explore the reasons behind RUArt’s effectiveness. Experimental results demonstrate that our method can effectively explore the contextual information of the text and mine the stable relationships between the text and objects.
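The abstract describes answering via text semantic matching between the question and OCR tokens considered in the context of the scene. A minimal, stdlib-only sketch of that matching idea, reduced to word overlap (the actual RUArt model uses learned attention and reasoning, not word overlap; all names and data below are hypothetical):

```python
def rank_ocr_tokens(question, ocr_tokens):
    """Toy stand-in for RUArt's answering step: rank OCR tokens by
    word overlap between the question and each token's scene context.
    ocr_tokens is a list of (token, context_description) pairs."""
    q_words = set(question.lower().split())
    scored = []
    for token, context in ocr_tokens:
        ctx_words = set(context.lower().split())
        scored.append((len(q_words & ctx_words), token))
    # The highest-overlap token is returned as the answer candidate.
    return max(scored)[1]

ans = rank_ocr_tokens(
    "what brand is on the bottle",
    [("coke", "red bottle brand label"), ("exit", "green sign above door")],
)
```

Here "coke" wins because its context shares "bottle" and "brand" with the question, illustrating why context around OCR tokens matters for answer selection.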

1 citation


Journal ArticleDOI
TL;DR: In this paper, the authors present an application to analyze text from documents on smartphones using optical character recognition (OCR) and text-to-speech (T2S) algorithms.
Abstract: With the exponential growth in data, it is essential to analyze it. Data is a valuable source of information and knowledge that should be effectively interpreted to be helpful, and it should be presented in different forms based on user requirements to yield precise information and knowledge. Nowadays smartphones are the most commonly used electronic devices; a smartphone is not only a communication device but also a powerful computing device, so it is possible to apply translation, text extraction, summarization, and many other computationally demanding techniques. This paper presents an application to analyze text from documents on smartphones, which is otherwise challenging. The proposed application converts documents or images to searchable and editable digital text, which can then be analyzed in different forms. The objective of this application is four-fold: 1) recognize text from documents or images using optical character recognition; 2) summarize the text; 3) translate the extracted text to different languages; 4) generate speech from the text using a text-to-speech algorithm.
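The four-fold objective (OCR, summarization, translation, text-to-speech) is essentially a chain of text-transforming stages. A minimal sketch of such a chain, with placeholder stage functions standing in for real engines (a real app would plug in an actual OCR engine, summarizer, translation API, and TTS engine; everything below is illustrative):

```python
def run_pipeline(data, stages):
    """Feed the output of each stage into the next and return the
    final result. Each stage is any callable taking one argument."""
    for stage in stages:
        data = stage(data)
    return data

# Placeholder stages operating on plain strings for illustration only.
recognize = lambda img: img.upper()        # stands in for OCR
summarize = lambda txt: txt.split(".")[0]  # keep the first sentence

result = run_pipeline("scanned page. more text.", [recognize, summarize])
```

Structuring the app this way lets each stage (e.g. translation or speech generation) be added or swapped without touching the others.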

Proceedings ArticleDOI
03 Mar 2023
TL;DR: In this paper, an integrated model that uses machine learning techniques to perform text-to-text, image-to-text, and audio-to-text conversions, with a particular focus on Indian languages, is presented.
Abstract: This paper presents an integrated model that uses machine learning techniques to perform text-to-text, image-to-text, and audio-to-text conversions, with a particular focus on Indian languages. The proposed model, which can translate text, images, and voice, has been tested on large datasets of various Indian languages and utilizes state-of-the-art techniques in machine learning, computer vision, and speech recognition to accurately transcribe and translate the input data. The experimental results demonstrate the effectiveness of the model in accurately converting text, images, and audio to text. Potential applications range from language learning and accessibility for non-verbal or non-hearing individuals to cross-language communication. The proposed model is intended to bridge the language gap and facilitate communication among people from different linguistic backgrounds.
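An integrated model accepting three input modalities naturally takes the shape of a dispatcher that routes each input to the matching converter. A hedged sketch of that interface, with injectable converters whose bodies here are placeholders rather than the paper's actual models:

```python
def to_text(payload, modality, converters):
    """Route an input to the converter for its modality
    (text/image/audio -> text). Converters are injected so each
    can be backed by any engine (translation, OCR, ASR, ...)."""
    try:
        return converters[modality](payload)
    except KeyError:
        raise ValueError(f"unsupported modality: {modality}")

# Placeholder converters for illustration only.
converters = {
    "text": lambda s: s,                       # pass-through / translation
    "image": lambda img: "<ocr of %r>" % img,  # stands in for OCR
    "audio": lambda wav: "<asr of %r>" % wav,  # stands in for speech recognition
}

out = to_text("namaste.png", "image", converters)
```

Keeping the modalities behind one interface is what makes the conversions feel like a single integrated model to the caller.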

Proceedings ArticleDOI
21 Apr 2023
TL;DR: In this article, a text processing tool based on the concepts of Machine Learning and Natural Language Processing is presented. The tool aims to improve productivity and efficiency by generating precise and meaningful data faster for providers across all platforms.
Abstract: The automated analysis of electronic text is referred to as “text processing”. The amount of online textual data is increasing, and automatic text processing techniques have the potential to be tremendously beneficial because they can gather more meaningful information faster. Text summarization, language translation, emotion classification, and headline generation for news articles are some of the popular forms of text processing. The fundamental goal of text summarization is to extract the most important information from a text and deliver it in a concise and legible form. The practice of transforming written text from one language into another so that it may be easily understood is known as language translation. An emotion classifier analyzes text and categorizes it into various emotions. Headline generation is the process of deriving headlines from news articles. This paper is about developing a text processing tool based on the concepts of Machine Learning and Natural Language Processing. This tool automates text processing and aims to improve productivity and efficiency by generating precise and meaningful data faster for providers across all platforms.
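Of the listed features, extractive text summarization is the easiest to illustrate concretely. A minimal frequency-based sketch (the classic heuristic of scoring sentences by summed word frequency; this is a generic illustration, not necessarily the method the tool uses):

```python
import re
from collections import Counter

def extractive_summary(text, n=1):
    """Score each sentence by the summed frequency of its words
    across the whole text, then keep the top n sentences in their
    original order."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    freq = Counter(re.findall(r"\w+", text.lower()))
    score = lambda s: sum(freq[w] for w in re.findall(r"\w+", s.lower()))
    top = set(sorted(sentences, key=score, reverse=True)[:n])
    return ". ".join(s for s in sentences if s in top)

summary = extractive_summary(
    "Cats are great. Cats sleep a lot every day. Dogs bark."
)
```

The second sentence wins here because it is long and contains the frequent word "cats"; production summarizers replace this scoring with learned models, but the extract-and-rank structure is the same.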

Posted ContentDOI
18 May 2023
TL;DR: In TextDiffuser, a Transformer model generates the layout of keywords extracted from text prompts, and diffusion models then generate images conditioned on the text prompt and the generated layout.
Abstract: Diffusion models have gained increasing attention for their impressive generation abilities but currently struggle with rendering accurate and coherent text. To address this issue, we introduce TextDiffuser, focusing on generating images with visually appealing text that is coherent with backgrounds. TextDiffuser consists of two stages: first, a Transformer model generates the layout of keywords extracted from text prompts, and then diffusion models generate images conditioned on the text prompt and the generated layout. Additionally, we contribute the first large-scale text images dataset with OCR annotations, MARIO-10M, containing 10 million image-text pairs with text recognition, detection, and character-level segmentation annotations. We further collect the MARIO-Eval benchmark to serve as a comprehensive tool for evaluating text rendering quality. Through experiments and user studies, we show that TextDiffuser is flexible and controllable to create high-quality text images using text prompts alone or together with text template images, and conduct text inpainting to reconstruct incomplete images with text. The code, model, and dataset will be available at \url{https://aka.ms/textdiffuser}.
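TextDiffuser's first stage must identify which keywords from the prompt are to be rendered as text in the image before the Transformer lays them out. As a toy illustration of that extraction step only, assuming (hypothetically) that the keywords are marked with single quotes in the prompt; the real system uses the trained layout Transformer described in the paper:

```python
import re

def extract_keywords(prompt):
    """Toy stand-in for TextDiffuser's keyword extraction: return
    the spans that should appear as rendered text in the image.
    Hypothetical convention: keywords are wrapped in single quotes."""
    return re.findall(r"'([^']+)'", prompt)

kws = extract_keywords("a storefront with a sign that says 'OPEN' and 'SALE'")
```

The extracted keywords would then be positioned on a canvas by the layout model and passed, together with the prompt, as conditioning to the diffusion stage.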