Book Chapter

Processing Large Text Corpus Using N-Gram Language Modeling and Smoothing

TLDR
This article discusses N-gram models for predicting the next word from user input and evaluates them using Good-Turing estimation, the perplexity measure, and the type-to-token ratio.
Abstract
Predicting the next word, letter, or phrase while a user is typing is a valuable way to improve the user experience. Users communicate, write reviews, and express opinions on social media platforms frequently, often while on the move, so an application that reduces typing effort and spelling errors when time is limited has become necessary. Because extensive use of social media keeps increasing the volume of text data, implementing a text prediction application is difficult: a large amount of text must be processed for language modeling. The primary objective of this research paper is to process a large text corpus and implement a probabilistic model, such as N-grams, to predict the next word when the user provides input. In this exploratory research, N-gram models are discussed and evaluated using Good-Turing estimation, the perplexity measure, and the type-to-token ratio.
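To make the quantities named in the abstract concrete, here is a minimal Python sketch of a bigram next-word predictor together with the three evaluation measures. It is an illustration under stated assumptions, not the chapter's implementation: all function names are hypothetical, whitespace tokenization is assumed, only the Good-Turing estimate of the probability mass of unseen bigrams (N1 / N) is shown, and perplexity is computed with add-one (Laplace) smoothing as a simpler stand-in for full Good-Turing smoothing.

```python
import math
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Count how often each word follows each context word."""
    bigrams = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        bigrams[prev][nxt] += 1
    return bigrams


def predict_next(bigrams, word, k=3):
    """Return up to k most frequent followers of `word` (empty if unseen)."""
    return [w for w, _ in bigrams.get(word, Counter()).most_common(k)]


def type_token_ratio(tokens):
    """Number of distinct words divided by the total number of words."""
    return len(set(tokens)) / len(tokens)


def good_turing_unseen_mass(bigrams):
    """Good-Turing estimate of the total probability of unseen bigrams:
    N1 / N, where N1 is the number of bigram types seen exactly once
    and N is the total number of bigram occurrences."""
    counts = [c for followers in bigrams.values() for c in followers.values()]
    singletons = sum(1 for c in counts if c == 1)
    return singletons / sum(counts)


def perplexity(bigrams, vocab_size, test_tokens):
    """Bigram perplexity on test_tokens.
    Assumption: add-one smoothing, not Good-Turing, keeps probabilities nonzero."""
    pairs = list(zip(test_tokens, test_tokens[1:]))
    log_prob = 0.0
    for prev, nxt in pairs:
        followers = bigrams.get(prev, Counter())
        p = (followers[nxt] + 1) / (sum(followers.values()) + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / len(pairs))
```

For example, after training on the tokens of "the cat sat on the mat the cat ate", `predict_next(bigrams, "the", k=1)` returns `["cat"]`, and the type-to-token ratio is 6 / 9. A lower perplexity on held-out text indicates that the model assigns higher probability to the words that actually occur.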

