Open Access, Journal Article

Research on Topic Detection and Tracking for Online News Texts

TLDR
This paper proposes a method for modeling the evolution of news topics over time. The method detects and tracks topics in a news text set and clearly reflects the trend of topic evolution.
Abstract
With the rapid development of the Internet, the amount of data has grown exponentially. On the one hand, the accumulation of big data provides basic support for artificial intelligence. On the other hand, faced with such a huge volume of data, how to extract the knowledge of interest from it has become a matter of general concern. Topic tracking helps people trace how a topic develops across large and complex collections of online text. This paper proposes a method that organizes large-scale news documents and models the evolution of news topics over time, so that topics in a news text set can be tracked as they change. First, the LDA (latent Dirichlet allocation) model is used to extract topics from news texts, with Gibbs sampling used to estimate its parameters. Topic mining with the K-means method is run as a comparison to highlight the advantages of LDA for topic discovery. Second, an improved single-pass algorithm is used to track news topics. JS (Jensen-Shannon) divergence measures topic similarity, and a time decay function is introduced to raise the similarity between topics that are close in time. Finally, the strength of each news topic and the change in its content across different time windows are analyzed. The experiments show that the proposed method can effectively detect and track topics and clearly reflect the trend of topic evolution.
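
The tracking step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the form of the time decay function, how it is combined with JS divergence, or the details of the improved single-pass algorithm. The sketch assumes that each topic is already available as a word-probability vector over a shared vocabulary (for example, from LDA), that similarity is (1 - JS divergence) multiplied by an exponential decay in the time gap, and that a topic joins the most similar existing chain when that similarity reaches a threshold. The names `time_decayed_similarity`, `single_pass_track`, `decay_rate`, and `threshold`, and their default values, are hypothetical.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (log base 2) of two discrete distributions.

    With base-2 logarithms the value lies in [0, 1].
    """
    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0
        # because b is the midpoint distribution.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def time_decayed_similarity(p, q, dt, decay_rate=0.1):
    """ASSUMED form: (1 - JSD) scaled by exp(-decay_rate * |dt|).

    Topics that are close in time keep more of their similarity.
    """
    return (1.0 - js_divergence(p, q)) * math.exp(-decay_rate * abs(dt))


def single_pass_track(topics, threshold=0.6, decay_rate=0.1):
    """Link topics into chains in a single chronological pass.

    topics: list of (time_window, word_distribution), sorted by time_window.
    Returns a list of chains; each chain is a list of indices into topics.
    """
    chains = []
    for i, (t, dist) in enumerate(topics):
        best_chain, best_sim = None, threshold
        for chain in chains:
            last_t, last_dist = topics[chain[-1]]
            sim = time_decayed_similarity(dist, last_dist, t - last_t, decay_rate)
            if sim >= best_sim:
                best_chain, best_sim = chain, sim
        if best_chain is None:
            chains.append([i])      # no chain is similar enough: new topic
        else:
            best_chain.append(i)    # continue the best-matching chain
    return chains


# Toy input: two topics in window 0 and their slightly shifted versions in window 1.
topics = [
    (0, [0.7, 0.2, 0.1, 0.0]),
    (0, [0.0, 0.1, 0.2, 0.7]),
    (1, [0.6, 0.3, 0.1, 0.0]),
    (1, [0.1, 0.1, 0.1, 0.7]),
]
chains = single_pass_track(topics)
print(chains)
```

Comparing a new topic only against the most recent member of each chain lets a chain drift gradually, which is one simple way to model content change across time windows; comparing against a chain centroid would be an equally plausible choice.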

