Book Chapter

Topic Modeling on Online News Extraction

Aashka Sahni, +1 more
- pp 611-622
TLDR
A word co-occurrence network-based topic model named WNTM is presented, which works for both long and short news texts by overcoming the shortcomings of previous approaches on short texts. It is intended as the basis for a news recommendation system that recommends news according to user preference.
Abstract
News media includes print media, broadcast news, and the Internet (online newspapers, news blogs, etc.). The proposed system intends to collect news data from such diverse sources, capture the varied perceptions, and summarize and present the news. It involves identifying topics from real-time news extractions and then clustering the news documents based on those topics. Previous approaches, such as LDA, identify topics efficiently for long news texts but fail to do so for short news texts, where acute sparsity and irregularity are prevalent. In this paper, we present a solution for topic modeling, i.e., a word co-occurrence network-based model named WNTM, which works for both long and short news by overcoming these shortcomings. It works effectively without incurring much time or space overhead. Further, we intend to create a news recommendation system that recommends news to users according to their preferences.
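
As an illustration only, and not the chapter's own implementation, the general idea behind a word co-occurrence network model can be sketched as follows: count how often words co-occur within a sliding window, then build one pseudo-document per word from its neighbours in that network, so that a standard topic model could later be fitted on the pseudo-documents instead of the sparse short texts. The function names, the window size, and the sample headlines are assumptions made for this sketch.

```python
from collections import Counter, defaultdict


def build_word_network(docs, window=3):
    """Count co-occurrences of distinct words inside a sliding window.

    `docs` is a list of token lists; `window` is the number of consecutive
    tokens treated as co-occurring (an assumed value for this sketch).
    """
    edges = Counter()
    for tokens in docs:
        for i, w in enumerate(tokens):
            # Pair the current word with the next (window - 1) words.
            for v in tokens[i + 1 : i + window]:
                if v != w:
                    edges[tuple(sorted((w, v)))] += 1
    return edges


def pseudo_documents(edges):
    """For each word, list its neighbours repeated by edge weight."""
    pseudo = defaultdict(list)
    for (w, v), n in edges.items():
        pseudo[w].extend([v] * n)
        pseudo[v].extend([w] * n)
    return {w: sorted(ns) for w, ns in pseudo.items()}


# Hypothetical short news headlines, already tokenized.
headlines = [
    "stocks fall as oil prices rise".split(),
    "oil prices rise after supply cut".split(),
]
edges = build_word_network(headlines, window=3)
pseudo = pseudo_documents(edges)
print(pseudo["oil"])
```

Each pseudo-document is usually much denser than an individual headline, which is the property that makes this kind of model attractive for short texts; fitting a topic model on `pseudo` is left out of the sketch.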


Citations
Journal Article

An Efficient Topic Modeling Approach for Text Mining and Information Retrieval through K-means Clustering

TL;DR: The proposed k-means topic modeling (KTM) approach is applicable to classification and clustering tasks in text mining and achieves higher performance than its competitors, LDA and LSA.
Proceedings Article

Automatic Text summarization in Gujarati language

TL;DR: In this article, statistical text summarization is performed on text in Gujarati, a resource-poor South Asian language, using TF-IDF, LSA, and LDA methods on a custom dataset.

Proceedings Article

Text summarization using Secretary problem

TL;DR: In this article, a mathematical model was proposed to generate a summary that does not include some important sentences, which is called the secretary problem and comes under the extractive text summarization method.
Journal Article

A novel centroid based sentence classification approach for extractive summarization of COVID-19 news reports

TL;DR: In this paper, a vector space model (VSM) is used to extract a centroid that captures the lexical pattern of the sentences on those subtopics from their frequently used words; the centroid is then used as a query in the VSM for sentence classification and extraction.
References
Book Chapter

Comparing twitter and traditional media using topic models

TL;DR: This paper empirically compares the content of Twitter with that of a traditional news medium, the New York Times, using unsupervised topic modeling, and reports findings useful for downstream IR or DM applications.
Proceedings Article

Learning to classify short and sparse text & web with hidden topics from large-scale data collections

TL;DR: A general framework for building classifiers that deal with short and sparse text & Web segments by making the most of hidden topics discovered from large-scale data collections; the framework is general enough to be applied to different data domains and genres, ranging from Web search results to medical text.
Proceedings Article

A web-based kernel function for measuring the similarity of short text snippets

TL;DR: This paper defines a similarity kernel function, mathematically analyzes some of its properties, provides examples of its efficacy, and shows the use of this kernel function in a large-scale system for suggesting related queries to search engine users.
Proceedings Article

Characterizing Microblogs with Topic Models

TL;DR: A scalable implementation of a partially supervised learning model (Labeled LDA) that maps the content of the Twitter feed into dimensions that correspond roughly to substance, style, status, and social characteristics of posts is presented.
Journal Article

BTM: Topic Modeling over Short Texts

TL;DR: This paper proposes a novel approach to short text topic modeling, referred to as the biterm topic model (BTM), which learns topics by directly modeling the generation of word co-occurrence patterns in the corpus, making inference effective by using the rich corpus-level information.