Showing papers on "Dynamic topic model published in 2018"


Journal ArticleDOI
TL;DR: In this article, a bipartite network of documents and words is used to detect the number of topics and hierarchically cluster both the words and documents, which leads to better topic models than LDA.
Abstract: One of the main computational and scientific challenges in the modern age is to extract useful information from unstructured texts. Topic models are one popular machine-learning approach that infers the latent topical structure of a collection of documents. Despite their success—particularly of the most widely used variant called latent Dirichlet allocation (LDA)—and numerous applications in sociology, history, and linguistics, topic models are known to suffer from severe conceptual and practical problems, for example, a lack of justification for the Bayesian priors, discrepancies with statistical properties of real texts, and the inability to properly choose the number of topics. We obtain a fresh view of the problem of identifying topical structures by relating it to the problem of finding communities in complex networks. We achieve this by representing text corpora as bipartite networks of documents and words. By adapting existing community-detection methods (using a stochastic block model (SBM) with nonparametric priors), we obtain a more versatile and principled framework for topic modeling (for example, it automatically detects the number of topics and hierarchically clusters both the words and documents). The analysis of artificial and real corpora demonstrates that our SBM approach leads to better topic models than LDA in terms of statistical model selection. Our work shows how to formally relate methods from community detection and topic modeling, opening the possibility of cross-fertilization between these two fields.

148 citations
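
As an illustration of the bipartite representation this approach builds on, here is a minimal sketch in Python. The toy corpus and the use of networkx are assumptions for illustration only; the paper's actual inference fits a nonparametric stochastic block model to such a network rather than using networkx.

```python
# Sketch: represent a toy corpus as a bipartite document-word network.
import networkx as nx

docs = {
    "d1": "topic models infer latent structure",
    "d2": "community detection in complex networks",
    "d3": "latent structure of complex networks",
}

g = nx.Graph()
for doc_id, text in docs.items():
    g.add_node(doc_id, bipartite=0)           # document nodes
    for word in text.split():
        g.add_node(word, bipartite=1)         # word nodes (shared across docs)
        if g.has_edge(doc_id, word):
            g[doc_id][word]["weight"] += 1    # weight counts repeated occurrences
        else:
            g.add_edge(doc_id, word, weight=1)

# Fitting an SBM to this network clusters documents and words jointly,
# playing the role that topic inference plays in LDA.
print(g.number_of_nodes(), "nodes,", g.number_of_edges(), "edges")
```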


Posted Content
TL;DR: In this article, the authors extend the class of tractable priors from Wiener processes to the generic class of Gaussian processes (GPs) to explore topics that develop smoothly over time, that have a long-term memory or are temporally concentrated (for event detection).
Abstract: Dynamic topic models (DTMs) model the evolution of prevalent themes in literature, online media, and other forms of text over time. DTMs assume that word co-occurrence statistics change continuously and therefore impose continuous stochastic process priors on their model parameters. These dynamical priors make inference much harder than in regular topic models, and also limit scalability. In this paper, we present several new results around DTMs. First, we extend the class of tractable priors from Wiener processes to the generic class of Gaussian processes (GPs). This allows us to explore topics that develop smoothly over time, that have a long-term memory or are temporally concentrated (for event detection). Second, we show how to perform scalable approximate inference in these models based on ideas around stochastic variational inference and sparse Gaussian processes. This way we can train a rich family of DTMs to massive data. Our experiments on several large-scale datasets show that our generalized model allows us to find interesting patterns that were not accessible by previous approaches.

24 citations
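
The key modeling choice here is the covariance kernel placed on topic trajectories. A minimal sketch of that idea, with arbitrary illustrative hyperparameters (not values from the paper): sampling one word's latent topic weight over time under the classic Wiener-process prior versus a smoother RBF Gaussian-process prior.

```python
import numpy as np

t = np.arange(1, 51, dtype=float)             # 50 time steps

wiener = np.minimum.outer(t, t)               # Wiener kernel: k(s, t) = min(s, t)
rbf = np.exp(-0.5 * (t[:, None] - t[None, :]) ** 2 / 5.0 ** 2)  # RBF kernel

rng = np.random.default_rng(0)
jitter = 1e-8 * np.eye(len(t))                # numerical stability
path_wiener = rng.multivariate_normal(np.zeros(len(t)), wiener + jitter)
path_rbf = rng.multivariate_normal(np.zeros(len(t)), rbf + jitter)

# path_wiener drifts like Brownian motion (the standard DTM prior);
# path_rbf varies smoothly, illustrating why the kernel choice matters.
```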


Journal ArticleDOI
TL;DR: A model named iosLDA boosts the performance of topic modeling via the joint discovery of latent topics and the differing objective and subjective power hidden in every word, and has lower computational complexity than supervised LDA, especially as the number of topics increases.
Abstract: It is observed that distinct words in a given document have either strong or weak ability in delivering facts (i.e., the objective sense) or expressing opinions (i.e., the subjective sense) depending on the topics they associate with. Motivated by the intuitive assumption that different words have varying degrees of discriminative power in delivering the objective or the subjective sense with respect to their assigned topics, a model named identified objective–subjective latent Dirichlet allocation (iosLDA) is proposed in this paper. In the iosLDA model, the simple Polya urn model adopted in traditional topic models is modified by incorporating a probabilistic generative process, in which a novel “Bag-of-Discriminative-Words” (BoDW) representation for the documents is obtained; each document has two BoDW representations, with regard to the objective and subjective senses respectively, which are employed in joint objective and subjective classification instead of the traditional Bag-of-Topics representation. The experiments reported on documents and images demonstrate that: 1) the BoDW representation is more predictive than the traditional ones; 2) iosLDA boosts the performance of topic modeling via the joint discovery of latent topics and the different objective and subjective power hidden in every word; and 3) iosLDA has lower computational complexity than supervised LDA, especially under an increasing number of topics.

22 citations


Journal ArticleDOI
TL;DR: This paper proposes a new latent variable model where latent topics and their proportionals are learned by incorporating the prior based on Dirichlet mixture model, and carries out the inference for LDMM according to the variational Bayes and the collapsed variationalBayes.

20 citations


Journal ArticleDOI
TL;DR: This paper proposes new learning algorithms for activity analysis in video, based on the expectation-maximization approach and variational Bayes inference, together with an anomaly localization procedure elegantly embedded in the topic modeling framework.
Abstract: Semisupervised and unsupervised systems provide operators with invaluable support and can tremendously reduce the operators’ load. In the light of the necessity to process large volumes of video data and provide autonomous decisions, this paper proposes new learning algorithms for activity analysis in video. The activities and behaviors are described by a dynamic topic model. Two novel learning algorithms based on the expectation maximization approach and variational Bayes inference are proposed. Theoretical derivations of the posterior estimates of model parameters are given. The designed learning algorithms are compared with the Gibbs sampling inference scheme introduced earlier in the literature. A detailed comparison of the learning algorithms is presented on real video data. We also propose an anomaly localization procedure, elegantly embedded in the topic modeling framework. It is shown that the developed learning algorithms can achieve 95% success rate. The proposed framework can be applied to a number of areas, including transportation systems, security, and surveillance.

15 citations


Journal ArticleDOI
TL;DR: Wang et al. propose to assume that each topic is a probability distribution over concepts and each concept is a probability distribution over words, adding a latent concept layer between the topic layer and the word layer of the traditional three-layer assumption.
Abstract: Recently, topic modeling has been widely used to discover the abstract topics in the multimedia field. Most existing topic models are based on the assumption of a three-layer hierarchical Bayesian structure, i.e. each document is modeled as a probability distribution over topics, and each topic is a probability distribution over words. However, this assumption is not optimal. Intuitively, it is more reasonable to assume that each topic is a probability distribution over concepts and each concept is a probability distribution over words, i.e. to add a latent concept layer between the topic layer and the word layer of the traditional three-layer assumption. In this paper, we verify the proposed assumption by incorporating it into two representative topic models, obtaining two novel topic models. Extensive experiments were conducted on the proposed models and the corresponding baselines, and the results show that the proposed models significantly outperform the baselines in terms of case studies and perplexity, which means the new assumption is more reasonable than the traditional one.

15 citations
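
The proposed four-layer generative process (document → topic → concept → word) can be made concrete with a small ancestral-sampling sketch. All distributions and the vocabulary below are toy values invented for illustration, not parameters from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

doc_topic = np.array([0.7, 0.3])                      # P(topic | document)
topic_concept = np.array([[0.8, 0.2],                 # P(concept | topic)
                          [0.1, 0.9]])
concept_word = np.array([[0.5, 0.4, 0.1],             # P(word | concept)
                         [0.1, 0.2, 0.7]])
vocab = ["network", "model", "image"]

def sample_word():
    z = rng.choice(2, p=doc_topic)         # draw a topic for this token
    c = rng.choice(2, p=topic_concept[z])  # draw a concept given the topic
    w = rng.choice(3, p=concept_word[c])   # draw a word given the concept
    return vocab[w]

print([sample_word() for _ in range(8)])
```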


Journal ArticleDOI
TL;DR: A parallel sparse partially collapsed Gibbs sampler is proposed and compared with existing samplers, and it is proved that the partially collapsed samplers scale well with the size of the corpus and can be used in more modeling situations than the ordinary collapsed sampler.
Abstract: Topic models, and more specifically the class of latent Dirichlet allocation (LDA), are widely used for probabilistic modeling of text. Markov chain Monte Carlo (MCMC) sampling from the posterior distribution…

13 citations
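
For context, the baseline being parallelized here is the standard collapsed Gibbs sampler for LDA. A minimal serial sketch with a toy corpus and arbitrary hyperparameters (the paper's contribution, a parallel sparse partially collapsed sampler, is more involved than this):

```python
import numpy as np

docs = [[0, 1, 2, 1], [2, 3, 3, 0], [1, 1, 2, 3]]   # word ids per document
V, K, alpha, beta = 4, 2, 0.1, 0.01                 # toy vocab/topic sizes, priors
rng = np.random.default_rng(0)

# count tables and initial random topic assignments
ndk = np.zeros((len(docs), K))          # document-topic counts
nkw = np.zeros((K, V))                  # topic-word counts
nk = np.zeros(K)                        # topic totals
z = [[rng.integers(K) for _ in doc] for doc in docs]
for d, doc in enumerate(docs):
    for i, w in enumerate(doc):
        ndk[d, z[d][i]] += 1; nkw[z[d][i], w] += 1; nk[z[d][i]] += 1

for sweep in range(200):
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]                  # remove current assignment
            ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
            # full conditional of the collapsed sampler
            p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + V * beta)
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k                  # re-add with the new topic
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

print(np.round((nkw + beta) / (nk[:, None] + V * beta), 2))  # topic-word estimates
```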


Posted Content
TL;DR: This work presents a semi-automatic transfer topic labeling method, using the coding instructions of the Comparative Agendas Project to label topics, and shows that it works well for a majority of the topics it estimates, but finds that institution-specific topics require manual input.
Abstract: Topic models are widely used in natural language processing, allowing researchers to estimate the underlying themes in a collection of documents. Most topic models use unsupervised methods and hence require the additional step of attaching meaningful labels to estimated topics. This process of manual labeling is not scalable and suffers from human bias. We present a semi-automatic transfer topic labeling method that seeks to remedy these problems. Domain-specific codebooks form the knowledge-base for automated topic labeling. We demonstrate our approach with a dynamic topic model analysis of the complete corpus of UK House of Commons speeches 1935-2014, using the coding instructions of the Comparative Agendas Project to label topics. We show that our method works well for a majority of the topics we estimate; but we also find that institution-specific topics, in particular on subnational governance, require manual input. We validate our results using human expert coding.

10 citations
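
The transfer-labeling idea can be sketched as matching a topic's top words against a domain codebook. The codebook entries and topic words below are invented examples; the paper uses the Comparative Agendas Project coding instructions as its knowledge base.

```python
# Sketch: assign each estimated topic the codebook label whose keywords
# overlap most with the topic's top words; no overlap flags the topic
# for manual labeling (e.g., institution-specific topics).
codebook = {
    "Health": {"nhs", "hospital", "doctor", "patient", "health"},
    "Defence": {"army", "navy", "war", "defence", "troops"},
}

def label_topic(top_words, codebook):
    """Return the codebook label with the largest keyword overlap, or None."""
    scores = {label: len(set(top_words) & kws) / len(kws)
              for label, kws in codebook.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(label_topic(["hospital", "patient", "waiting", "nhs"], codebook))
```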


Proceedings Article
31 Mar 2018
TL;DR: This paper extends the class of tractable priors from Wiener processes to the generic class of Gaussian processes (GPs), allowing the exploration of topics that develop smoothly over time, have a long-term memory, or are temporally concentrated (for event detection).
Abstract: Dynamic topic models (DTMs) model the evolution of prevalent themes in literature, online media, and other forms of text over time. DTMs assume that word co-occurrence statistics change continuously and therefore impose continuous stochastic process priors on their model parameters. These dynamical priors make inference much harder than in regular topic models, and also limit scalability. In this paper, we present several new results around DTMs. First, we extend the class of tractable priors from Wiener processes to the generic class of Gaussian processes (GPs). This allows us to explore topics that develop smoothly over time, that have a long-term memory or are temporally concentrated (for event detection). Second, we show how to perform scalable approximate inference in these models based on ideas around stochastic variational inference and sparse Gaussian processes. This way we can train a rich family of DTMs to massive data. Our experiments on several large-scale datasets show that our generalized model allows us to find interesting patterns that were not accessible by previous approaches.

10 citations


Proceedings ArticleDOI
01 Jun 2018
TL;DR: This paper presents a domain Expert Identification method based on an improved dynamic LDA algorithm that addresses the shortcomings of existing methods and considers both the semantic information of the domain and expert authority.
Abstract: In recent years, human society has been transferring from an information society to a knowledge society. Experts mastering professional knowledge are becoming ever more valuable resources in society, so Expert Identification, also known as Expert Finding, has become an important research field. Existing Expert Identification work is mainly based on traditional information retrieval or standard topic models. Expert finding still faces many problems, such as missing semantic information, or inaccuracy when changes over time are not taken into consideration. This paper presents a domain Expert Identification method with an improved dynamic LDA algorithm which solves these shortcomings of existing methods. Based on the standard LDA model, this method divides a corpus with a large time span according to time in order to apply the dynamic LDA model, and combines profile modelling and file modelling for expert modelling. In addition, this method considers both the semantic information of the domain and expert authority. Experiments show its feasibility and effectiveness, and its advantage over the traditional static topic model. It has opened up new application fields for the dynamic topic model.

9 citations
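
The time-sliced core of this method — partitioning a long-span corpus by period and fitting a topic model per slice — can be sketched with gensim. The documents and slicing below are invented for illustration; the paper additionally combines profile modelling and file modelling and weighs expert authority.

```python
from gensim.corpora import Dictionary
from gensim.models import LdaModel

# Toy corpus already split into time slices (tokenized documents).
slices = {
    "2010-2013": [["topic", "model", "text"], ["expert", "retrieval", "text"]],
    "2014-2017": [["neural", "expert", "embedding"], ["neural", "topic", "model"]],
}

for period, docs in slices.items():
    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(doc) for doc in docs]
    # one LDA model per slice approximates the "dynamic LDA" division by time
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
                   random_state=0, passes=10)
    print(period, lda.print_topics(num_words=3))
```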


Proceedings ArticleDOI
01 Jun 2018
TL;DR: This work introduces a novel unsupervised neural dynamic topic model named the Recurrent Neural Network–Replicated Softmax Model (RNN-RSM), in which the topics discovered at each time step influence topic discovery in the subsequent time steps, and introduces a metric to quantify the capability of a dynamic topic model to capture word evolution in topics over time.
Abstract: Dynamic topic modeling facilitates the identification of topical trends over time in temporal collections of unstructured documents. We introduce a novel unsupervised neural dynamic topic model named the Recurrent Neural Network-Replicated Softmax Model (RNN-RSM), where the discovered topics at each time step influence the topic discovery in the subsequent time steps. We account for the temporal ordering of documents by explicitly modeling a joint distribution of latent topical dependencies over time, using distributional estimators with temporal recurrent connections. Applying RNN-RSM to 19 years of articles on NLP research, we demonstrate that, compared to state-of-the-art topic models, RNN-RSM shows better generalization, topic interpretation, evolution, and trends. We also introduce a metric (named SPAN) to quantify the capability of a dynamic topic model to capture word evolution in topics over time.

Journal ArticleDOI
TL;DR: The authors' results show a better performance of mDTM in terms of the quality of the mined information compared to prior research, and showcase mDTM as a promising tool for the effective mining of microblogs in a rapidly changing global information space.
Abstract: In this paper, the authors build on prior literature to develop an adaptive and time-varying metadata-enabled dynamic topic model (mDTM) and apply it to a large Weibo dataset using an online Gibbs sampler for parameter estimation. Their approach simultaneously captures the maximum number of inherent dynamic features of microblogs, thereby setting it apart from other online document mining methods in the extant literature. In summary, the authors' results show a better performance of mDTM in terms of the quality of the mined information compared to prior research and showcase mDTM as a promising tool for the effective mining of microblogs in a rapidly changing global information space.

Proceedings ArticleDOI
19 Jul 2018
TL;DR: Novel applications of the Negative-Binomial augmentation trick yield simple, efficient, closed-form updates of all the required conditional posteriors, resulting in far lower computational requirements as well as less sensitivity to initial conditions compared to existing approaches.
Abstract: The abundance of digital text has led to extensive research on topic models that reason about documents using latent representations. Since for many online or streaming textual sources, such as news outlets, the number and nature of topics change over time, there have been several efforts to address such situations using dynamic versions of topic models. Unfortunately, existing approaches encounter more complex inference when their model parameters are varied over time, resulting in high computational complexity and performance degradation. This paper introduces the DM-DTM, a dual Markov chain dynamic topic model, for characterizing a corpus that evolves over time. This model uses a gamma Markov chain and a Dirichlet Markov chain to allow the topic popularities and word-topic assignments, respectively, to vary smoothly over time. Novel applications of the Negative-Binomial augmentation trick yield simple, efficient, closed-form updates of all the required conditional posteriors, resulting in far lower computational requirements as well as less sensitivity to initial conditions, as compared to existing approaches. Moreover, via a gamma process prior, the number of desired topics is inferred directly from the data rather than being pre-specified, and can vary as the data changes. Empirical comparisons using multiple real-world corpora demonstrate a clear superiority of DM-DTM over strong baselines for both static and dynamic topic models.
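
The gamma Markov chain that lets topic popularity drift smoothly can be sketched in a few lines. The coupling constant below is an arbitrary illustrative choice; the chain is mean-preserving, with variance shrinking as the coupling grows.

```python
import numpy as np

rng = np.random.default_rng(0)
T, c = 50, 20.0            # time steps; c controls smoothness (illustrative)
lam = np.empty(T)
lam[0] = 1.0               # initial topic popularity
for t in range(1, T):
    # Gamma(shape = c * lam[t-1], rate = c):
    # E[lam_t | lam_{t-1}] = lam_{t-1}, Var = lam_{t-1} / c
    lam[t] = rng.gamma(shape=c * lam[t - 1], scale=1.0 / c)

print(np.round(lam[:10], 3))
```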

Journal ArticleDOI
TL;DR: An improved framework called SONMFSR (Soft Orthogonal NMF with Sparse Representation) makes full use of soft orthogonality and sparsity constraints to tackle practical NMF problems, and exhibits great potential in real-world applications.

Posted Content
TL;DR: The authors create Viscovery, a platform for opinion summarization and trend tracking that analyzes a stream of opinions recovered from forums, using dynamic topic models to uncover the hidden structure of topics behind opinions and to characterize vocabulary dynamics.
Abstract: Opinions in forums and social networks are released by millions of people, owing to the increasing number of users that use Web 2.0 platforms to opine about brands and organizations. For enterprises or government agencies it is almost impossible to track what people say, producing a gap between user needs/expectations and organizational actions. To bridge this gap we create Viscovery, a platform for opinion summarization and trend tracking that is able to analyze a stream of opinions recovered from forums. To do this we use dynamic topic models, which uncover the hidden structure of topics behind opinions and characterize vocabulary dynamics. We extend dynamic topic models for incremental learning, a key aspect needed in Viscovery for model updating in near-real time. In addition, we include sentiment analysis in Viscovery, allowing positive and negative words to be separated for a specific topic at different levels of granularity. Viscovery allows representative opinions and terms in each topic to be visualized. At a coarse level of granularity, the dynamics of the topics can be analyzed using a 2D topic embedding, suggesting longitudinal topic merging or segmentation. In this paper we report our experience developing this platform, sharing lessons learned and opportunities that arise from the use of sentiment analysis and topic modeling in real-world applications.

Proceedings Article
01 Jan 2018
TL;DR: This paper proposes a Multi-Scale Dynamic Topic Model (MS-DTM) and a complementary Incremental Multi-Scale Dynamic Topic Model (IMS-DTM) inference method that can be used to capture latent topics and their dynamics simultaneously, at different scales.
Abstract: Dynamic topic models (DTMs) are commonly used for mining latent topics in evolving web corpora. In this paper, we note that a major limitation of the conventional DTM-based models is that they assume a predetermined and fixed scale of topics. In reality, however, topics may have varying spans, and topics of multiple scales can co-exist in a single web or social media data stream. Therefore, DTMs that assume a fixed epoch length may not be able to effectively capture latent topics and thus negatively affect accuracy. In this paper, we propose a Multi-Scale Dynamic Topic Model (MS-DTM) and a complementary Incremental Multi-Scale Dynamic Topic Model (IMS-DTM) inference method that can be used to capture latent topics and their dynamics simultaneously, at different scales. In this model, topic-specific feature distributions are generated based on a multi-scale feature distribution of the previous epochs; moreover, multiple scales of the current epoch are analyzed together through a novel multi-scale incremental Gibbs sampling technique. We show that the proposed model significantly improves efficiency and effectiveness compared to single-scale DTMs and prior models that consider only multiple scales of the past.

Journal ArticleDOI
Zhinan Gou, Lixin Han, Ling Sun, Jun Zhu, Hong Yan
TL;DR: This paper introduces a new method for constructing DTMs based on variational autoencoders and factor graphs, which uses re-parameterization of the variational lower bound to generate a lower-bound estimator that is optimized directly by standard stochastic gradient descent.
Abstract: Topic models are widely used in various fields of machine learning and statistics. Among them, the dynamic topic model (DTM) is the most popular time-series topic model for dynamic representations of text corpora. A major challenge is that posterior inference in a DTM requires a complex reasoning process with a high computational cost, and even a tiny change to the model requires restructuring the inference. For these reasons, the flexibility and generality of DTMs are poor, and they are difficult to apply. In this paper, we introduce a new method for constructing DTMs based on variational autoencoders and factor graphs. This model uses re-parameterization of the variational lower bound to generate a lower-bound estimator that is optimized directly by the standard stochastic gradient descent method. At the same time, the optimization process is simplified by integrating a dynamic factor graph in the state space to achieve a better model. The experimental dataset is a journal paper corpus from DBLP that mainly focuses on natural language processing and spans twenty-five years (1984–2009). Experimental results comparing several state-of-the-art baselines indicate that the proposed method is effective and feasible.
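
The re-parameterization step this method relies on is the standard trick from variational autoencoders: write the latent draw as a deterministic function of the variational parameters plus independent noise, so the lower-bound estimator is differentiable in those parameters. A minimal sketch with illustrative shapes and values:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([0.2, -0.1, 0.5])         # encoder mean for one document (toy)
log_sigma = np.array([-1.0, -0.5, -1.5])

eps = rng.standard_normal(mu.shape)     # noise, independent of parameters
z = mu + np.exp(log_sigma) * eps        # differentiable in mu and log_sigma

# z would feed the decoder; the variational lower bound can then be
# optimized directly with standard stochastic gradient descent.
print(z)
```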

01 Jan 2018
TL;DR: This work outlines a research agenda for approaching that task by using LDA as a base, in combination with the observation of state transitions in topics at consecutive time steps, in order to omit the fixed number of topics k.
Abstract: Scientific communities are always changing and evolving. Topics of today might split or even disappear in the future; other topics might merge or appear at some time. Nowadays, the closest we come to picturing these developments are dynamic topic models, which come with a fixed number of topics k. It would be desirable to omit k. This work outlines a research agenda for approaching that task by using LDA as a base, in combination with the observation of state transitions in topics at consecutive times.

Dissertation
01 Jan 2018
TL;DR: The application of topic models, a machine learning algorithm, to detect behaviour patterns in different types of data produced by a monitoring system is presented, suggesting potential for dynamic topic models to identify changes in routines that could aid early diagnosis of chronic diseases.
Abstract: Healthcare systems worldwide are facing growing demands on their resources due to an ageing population and an increase in the prevalence of chronic diseases. Innovative residential healthcare monitoring systems, using a variety of sensors, are being developed to help address these needs. Interpreting the vast wealth of data generated is key to fully exploiting the benefits offered by a monitoring system. This thesis presents the application of topic models, a machine learning algorithm, to detect behaviour patterns in different types of data produced by a monitoring system. Latent Dirichlet Allocation was applied to real-world activity data with corresponding ground-truth labels of daily routines. The results from an existing dataset and a novel dataset collected using a custom mobile phone app demonstrated that the patterns found are the equivalent of routines. Long-term monitoring can identify changes that could indicate an alteration in health status. Dynamic topic models were applied to simulated long-term activity datasets to detect changes in the structure of daily routines. It was shown that the changes occurring in the simulated data can successfully be detected. This result suggests potential for dynamic topic models to identify changes in routines that could aid early diagnosis of chronic diseases. Furthermore, chronic conditions, such as diabetes and obesity, are related to quality of diet. Current research findings on the association between eating behaviours, especially snacking, and the impact on diet quality and health are often conflicting. One problem is the lack of consistent definitions for different types of eating event. The novel application of Latent Dirichlet Allocation to three nutrition datasets is described. The results demonstrated that combinations of food groups representative of eating event types can be detected. Moreover, labels assigned to these combinations showed good agreement with alternative methods for labelling eating event types.

Patent
04 May 2018
TL;DR: In this article, a dynamic short-text cluster searching method is proposed: short text stream data are used to build a short-term topic model, a long-term historical topic model is synthesized to amend the short-term topic model in the data stream to obtain the probability distribution of topics and feature words, clustering is performed using the conditional probability of the text and the topics, and dynamic, accurate keyword searching is achieved.
Abstract: The invention discloses a dynamic short-text cluster searching method. In this method, short text stream data are used to build a short-term topic model, and a long-term historical topic model is synthesized to amend the short-term topic model in the data stream, yielding the probability distribution of topics and feature words; clustering is performed using the conditional probability of the text and the topics, forming dynamic, accurate keyword searching. By building the dynamic topic model, a keyword searching function that changes over time is realized; problems such as the sparsity of short text data and information loss are addressed by a multinomial mixture topic model; and the efficiency and performance of information searching are improved.
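
One simple reading of "synthesizing a long-term historical topic model to amend the short-term topic model" is a convex combination of the two word distributions. The mixing weight and distributions below are assumptions for illustration only; the patent does not specify its correction scheme at this level of detail.

```python
import numpy as np

short_term = np.array([0.6, 0.3, 0.1])   # P(word | topic) from the recent stream (toy)
long_term = np.array([0.4, 0.2, 0.4])    # P(word | topic) from history (toy)
w = 0.7                                   # trust placed in the short-term model

amended = w * short_term + (1 - w) * long_term
amended /= amended.sum()                  # keep it a valid distribution
print(amended)
```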

Proceedings ArticleDOI
01 Nov 2018
TL;DR: An unsupervised learning approach is used to identify in-trouble students with higher effectiveness and without preparing additional labeled data sets; it obtains better temporal clusters with dynamic topic modeling and is well suited to the early in-trouble student identification task.
Abstract: Early in-trouble student identification in an academic credit system is a popular and challenging task in the educational data mining field. Only the first few semesters of each student can be observed for the task, so that in-trouble students can be recognized soon enough to have time to improve their study performance. The task can be tackled with different machine learning approaches. In this paper, we use an unsupervised learning approach, which identifies such students with higher effectiveness and without preparing additional labeled data sets. In this approach, a temporal cluster analysis method is proposed based on the temporal clusters returned by dynamic topic models. In addition, we consider temporal characteristics of each student's study performance to form a pattern from the temporal clusters he/she belongs to over time. Similar students share similar patterns, allowing us to determine the pattern types of in-trouble students and recognize them more accurately. In an evaluation study, experimental results show that our method outperforms other unsupervised and supervised learning methods with higher Recall and F-measure values. It also obtains better temporal clusters with dynamic topic modeling. As a result, our method is suitable for the early in-trouble student identification task.

Proceedings ArticleDOI
09 May 2018
TL;DR: A novel method to predict the influence of a new paper by collaboratively learning the latent vectors of paper features and their correlations through the Factorization Machine method, which does not require citation information to evaluate paper quality.
Abstract: An increasing number of papers are published every year. Researchers want to find new high-quality papers, which is a challenging task due to the lack of citation information. In this paper, we propose a novel method to predict the influence of a new paper by collaboratively learning the latent vectors of paper features and their correlations. We propose the concept of topic-related authority to integrate the dynamic topic model with paper citations, so as to learn how content and authors influence a paper's quality. We adopt the Factorization Machine method to collaboratively learn the latent vectors of correlations between different paper features. Compared with traditional methods, our approach does not require citation information to evaluate a paper's quality, which makes it appropriate for newly published papers. We conduct an extensive evaluation against a real dataset crawled from the ACM Digital Library. The results show that our method outperforms the other methods.
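
For reference, the second-order Factorization Machine prediction used here to learn feature correlations looks like this. The weights below are random placeholders standing in for learned parameters, and the O(nk) pairwise-interaction identity is the standard FM formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 3                    # feature count (paper attributes), latent dim
x = rng.random(n)              # one paper's feature vector (toy)
w0, w = 0.1, rng.normal(size=n)          # global bias, linear weights
V = rng.normal(scale=0.1, size=(n, k))   # latent factor vectors

# pairwise interactions in O(n*k) via the standard FM identity:
# sum_{i<j} <v_i, v_j> x_i x_j = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2 ]
interactions = 0.5 * np.sum((x @ V) ** 2 - (x ** 2) @ (V ** 2))
y_hat = w0 + w @ x + interactions
print(y_hat)
```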