Book Chapter
Topic Modeling on Online News Extraction
Aashka Sahni, Sushila Palwe, et al.
pp. 611-622
TL;DR: The chapter presents WNTM, a word co-occurrence network-based topic model. WNTM works for both long and short news by overcoming the shortcomings of earlier approaches such as LDA. The chapter also outlines a planned news recommendation system that would recommend news according to user preference.
Abstract:
News media include print media, broadcast news, and the Internet (online newspapers, news blogs, etc.). The proposed system is intended to collect news data from such diverse sources, capture the varied perceptions, and summarize and present the news. It involves identifying topics in real-time news extractions and then clustering the news documents based on those topics. Previous approaches such as LDA identify topics efficiently in long news texts but fail to do so in short news texts, where acute sparsity and irregularity are prevalent. In this paper, we present a topic modeling solution, WNTM, a word co-occurrence network-based model that works for both long and short news by overcoming these shortcomings. It works effectively without incurring high time or space complexity. Further, we intend to create a news recommendation system that would recommend news to users according to their preferences.
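The chapter gives no implementation details for WNTM. The sketch below is a rough illustration of the general word-network idea behind such models. It links words that co-occur within a sliding window across all news texts. It then turns each word's neighbor list into a longer, denser pseudo-document, which a standard topic model such as LDA could be trained on instead of the sparse short texts. The following are all assumptions for illustration, not the authors' code:

- the window size,
- the plain whitespace tokenization,
- the example headlines,
- the names `build_word_network` and `pseudo_documents`.

```python
from collections import defaultdict


def build_word_network(docs, window=3):
    """Build a weighted, undirected word co-occurrence network.

    Two distinct words are linked when they occur fewer than `window`
    positions apart in the same text (i.e. inside one sliding window of
    `window` tokens); each such position pair adds 1 to the edge weight.
    Tokenization is a lowercase whitespace split (an assumption; real
    news text would need proper cleaning and stop-word removal).
    """
    edges = defaultdict(int)
    for doc in docs:
        tokens = doc.lower().split()
        for i, word in enumerate(tokens):
            for j in range(i + 1, min(i + window, len(tokens))):
                other = tokens[j]
                if other != word:
                    # Store each undirected edge under a canonical (sorted) key.
                    edges[tuple(sorted((word, other)))] += 1
    return dict(edges)


def pseudo_documents(edges):
    """Turn each word's neighbor list into a pseudo-document.

    Each neighbor is repeated once per co-occurrence (the edge weight), so
    frequently co-occurring words dominate the pseudo-document.
    """
    adjacency = defaultdict(list)
    for (a, b), weight in sorted(edges.items()):
        adjacency[a].extend([b] * weight)
        adjacency[b].extend([a] * weight)
    return dict(adjacency)


if __name__ == "__main__":
    headlines = [
        "stocks rally markets rebound",
        "markets fall stocks slide",
        "team wins final match",
    ]
    edges = build_word_network(headlines, window=3)
    print(pseudo_documents(edges)["stocks"])
    # ['fall', 'markets', 'markets', 'rally', 'slide']
```

The sketch stops at pseudo-document construction. The topic model itself, and the step that maps learned topics back onto the original news items for clustering, are omitted.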
Citations
Journal Article
An Efficient Topic Modeling Approach for Text Mining and Information Retrieval through K-means Clustering
TL;DR: The proposed k-means topic modeling (KTM) approach is applicable to classification and clustering tasks in text mining and achieves higher performance than its competitors, LDA and LSA.
Proceedings Article
Automatic Text summarization in Gujarati language
TL;DR: In this article, statistical text summarization is performed on Gujarati, a resource-poor South Asian language, using TF-IDF, LSA, and LDA methods on a custom dataset.
Proceedings Article
Text summarization using Secretary problem
TL;DR: In this article, a mathematical model was proposed to generate a summary that does not include some important sentences. The model is called the secretary problem and falls under the extractive text summarization method.
Journal Article
A novel centroid based sentence classification approach for extractive summarization of COVID-19 news reports
TL;DR: In this paper, a vector space model (VSM) is used to extract a centroid that captures the lexical pattern of the sentences on those subtopics through their frequently used words. The centroid is then used as a query in the VSM for sentence classification and extraction.
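Two of the citing works summarize text using TF-IDF weighting or a vector-space centroid. The sketch below is a generic illustration of that family of extractive methods, not the procedure of any specific paper listed. It scores each sentence by cosine similarity to the average TF-IDF vector and keeps the top `k` sentences in their original order. The smoothed IDF formula, the whitespace tokenization, and the names `tfidf_vectors`, `cosine`, and `centroid_summary` are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(sentences):
    """Return one sparse TF-IDF vector (dict: word -> weight) per sentence.

    Uses relative term frequency and an assumed smoothed IDF,
    ln((1 + n) / (1 + df)) + 1, so words found in every sentence keep a
    non-zero weight.
    """
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            word: (count / len(tokens)) * (math.log((1 + n) / (1 + df[word])) + 1)
            for word, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid_summary(sentences, k=2):
    """Pick the k sentences closest to the TF-IDF centroid, in original order."""
    vectors = tfidf_vectors(sentences)
    centroid = defaultdict(float)
    for vec in vectors:
        for word, weight in vec.items():
            centroid[word] += weight / len(vectors)
    scores = [cosine(vec, centroid) for vec in vectors]
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(best)]


if __name__ == "__main__":
    report = [
        "the stock market rallied today",
        "stock market gains lifted investors",
        "a local team won the final",
        "investors cheered the market rally",
    ]
    for line in centroid_summary(report, k=2):
        print(line)
```

Because the centroid is dominated by words shared across many sentences, the off-topic sports sentence scores lowest and is left out of the summary.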
References
Book Chapter
Comparing twitter and traditional media using topic models
TL;DR: This paper empirically compares the content of Twitter with a traditional news medium, the New York Times, using unsupervised topic modeling, and reports findings that are useful for downstream IR or DM applications.
Proceedings Article
Learning to classify short and sparse text & web with hidden topics from large-scale data collections
TL;DR: A general framework for building classifiers that handle short and sparse text and Web segments by making the most of hidden topics discovered from large-scale data collections. The framework is general enough to be applied to data domains and genres ranging from Web search results to medical text.
Proceedings Article
A web-based kernel function for measuring the similarity of short text snippets
TL;DR: This paper defines a similarity kernel function, mathematically analyzes some of its properties, provides examples of its efficacy, and shows its use in a large-scale system for suggesting related queries to search engine users.
Proceedings Article
Characterizing Microblogs with Topic Models
TL;DR: A scalable implementation of a partially supervised learning model (Labeled LDA) that maps the content of the Twitter feed into dimensions that correspond roughly to substance, style, status, and social characteristics of posts is presented.
Journal Article
BTM: Topic Modeling over Short Texts
TL;DR: This paper proposes a novel approach to short-text topic modeling, referred to as the biterm topic model (BTM). BTM learns topics by directly modeling the generation of word co-occurrence patterns in the corpus, which makes inference effective by drawing on rich corpus-level information.
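The BTM entry models word co-occurrence patterns ("biterms") pooled over the whole corpus rather than per document. This is one way around the sparsity of short news texts discussed in the abstract. The sketch below shows only the biterm-extraction step and omits the topic-inference step. It assumes each short text is treated as a single context window. `extract_biterms` and `corpus_biterms` are hypothetical helper names, not part of any published BTM code.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(doc):
    """Return the biterms of one short text.

    Every pair of token positions (i < j) yields one unordered word pair,
    stored in sorted order so that (a, b) and (b, a) are the same biterm.
    The whole short text is assumed to be a single context window.
    """
    tokens = doc.lower().split()
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def corpus_biterms(docs):
    """Pool biterm counts over the whole corpus."""
    counts = Counter()
    for doc in docs:
        counts.update(extract_biterms(doc))
    return counts


if __name__ == "__main__":
    tweets = ["apple iphone launch", "iphone launch event"]
    print(corpus_biterms(tweets).most_common(1))
    # [(('iphone', 'launch'), 2)]
```

Pooling the counts lets a pair seen in many different short texts gain weight, even though each individual text contributes only a handful of words.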