Author

Bolun Chen

Bio: Bolun Chen is an academic researcher. The author has contributed to research on the topics Topic model and Academic writing. The author has an h-index of 3 and has co-authored 3 publications receiving 22 citations.

Papers
Journal ArticleDOI
Yongjun Zhang, Jialin Ma, Zijian Wang, Bolun Chen, Yongtao Yu
TL;DR: A pipeline model, named collective topical PageRank, is proposed, which incorporates the venue, the correlations of the scientific topics, and the publication year of each paper into a random walk to evaluate the topic-dependent impact of scientific papers.
Abstract: With the explosive growth of academic writing, it is difficult for researchers to find significant papers in their area of interest. In this paper, we propose a pipeline model, named collective topical PageRank, to evaluate the topic-dependent impact of scientific papers. First, we fit the model to a correlation topic model based on the textual content of papers to extract scientific topics and correlations. Then, we present a modified PageRank algorithm, which incorporates the venue, the correlations of the scientific topics, and the publication year of each paper into a random walk to evaluate the paper’s topic-dependent academic impact. Our experiments showed that the model can effectively identify significant papers as well as venues for each scientific topic, recommend papers for further reading or citing, explore the evolution of scientific topics, and calculate the venues’ dynamic topic-dependent academic impact.

16 citations
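
The random-walk step described in the abstract above can be illustrated with a generic topic-biased PageRank. This is only a minimal sketch under stated assumptions, not the authors' collective topical PageRank: it ignores venue, topic correlations, and publication year, and the function name `topical_pagerank`, the input dictionaries, and the damping value are all hypothetical choices made for illustration.

```python
def topical_pagerank(citations, topic_weight, damping=0.85, iters=100, tol=1e-10):
    """Generic topic-biased PageRank (illustrative, not the paper's model).

    citations:    dict mapping a paper to the list of papers it cites.
    topic_weight: dict mapping a paper to a non-negative weight for one topic
                  (assumed to come from a topic model fitted elsewhere).
    """
    papers = sorted(
        set(citations)
        | {q for cited in citations.values() for q in cited}
        | set(topic_weight)
    )
    total = sum(topic_weight.get(p, 0.0) for p in papers)
    if total <= 0:
        raise ValueError("topic_weight must contain a positive weight")

    # The walker restarts at a paper in proportion to its topic weight.
    teleport = {p: topic_weight.get(p, 0.0) / total for p in papers}
    rank = dict(teleport)

    for _ in range(iters):
        new = {p: (1.0 - damping) * teleport[p] for p in papers}
        for p in papers:
            cited = citations.get(p, [])
            if cited:
                # Follow one outgoing citation uniformly at random.
                share = damping * rank[p] / len(cited)
                for q in cited:
                    new[q] += share
            else:
                # Papers that cite nothing hand their mass to the restart distribution.
                for q in papers:
                    new[q] += damping * rank[p] * teleport[q]
        delta = sum(abs(new[p] - rank[p]) for p in papers)
        rank = new
        if delta < tol:
            break
    return rank


# Toy citation graph: A cites B, B cites C, D cites C.
toy_citations = {"A": ["B"], "B": ["C"], "C": [], "D": ["C"]}
toy_weights = {"A": 0.1, "B": 0.2, "C": 0.6, "D": 0.1}
scores = topical_pagerank(toy_citations, toy_weights)
print(sorted(scores, key=scores.get, reverse=True))
```

Running the walk once per topic, each time with that topic's weights as the restart distribution, yields one ranking per topic; the model in the abstract additionally modulates the walk by venue, topic correlation, and publication year.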

Journal ArticleDOI
Yongjun Zhang, Zijian Wang, Yongtao Yu, Bolun Chen, Jialin Ma, Liang Shi
TL;DR: A supervised topic model, named labeled LDA with function terms (LF-LDA), is introduced to filter out noisy function terms from text documents and thereby improve multi-label classification.
Abstract: This article describes how text documents are a major data structure in the era of big data. With the explosive growth of data, the number of documents with multiple labels has increased dramatically. The popular multi-label classification technology, which is usually employed to handle multinomial text documents, is sensitive to the noise terms of text documents. Therefore, there is still substantial room for improvement in multi-label classification of text documents. This article introduces a supervised topic model, named labeled LDA with function terms (LF-LDA), to filter out the noisy function terms from text documents, which can help to improve the performance of multi-label classification of text documents. The article also shows the derivation of the Gibbs Sampling formulas in detail, which can be generalized to other similar topic models. Based on the textual data set RCV1-v2, the article compares the proposed model with two other state-of-the-art multi-label classifiers, Tuned SVM and labeled LDA, on both Macro-F1 and Micro-F1 metrics. The results show that LF-LDA outperforms both and has the lowest variance, which indicates the robustness of the LF-LDA classifier. Keywords: Function Terms, Gibbs Sampling, Graph Model, Multi Label, Parameter Estimation, Probability Generation Process, Text Classification, Topic Model

8 citations
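
The Macro-F1 and Micro-F1 metrics mentioned in the abstract above differ in where the averaging happens. The following is a small illustrative sketch with made-up labels and predictions (not data from RCV1-v2 or from the article); the helper names `f1` and `macro_micro_f1` are hypothetical.

```python
def f1(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred, labels):
    """y_true and y_pred are lists of label sets, one set per document."""
    counts = {label: [0, 0, 0] for label in labels}  # [tp, fp, fn] per label
    for true, pred in zip(y_true, y_pred):
        for label in labels:
            if label in pred and label in true:
                counts[label][0] += 1
            elif label in pred:
                counts[label][1] += 1
            elif label in true:
                counts[label][2] += 1

    # Macro-F1: compute F1 per label, then average, so rare labels count equally.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)

    # Micro-F1: pool the counts over all labels first, so frequent labels dominate.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)
    return macro, micro


# Made-up example with two labels and four documents.
y_true = [{"a", "b"}, {"a"}, {"b"}, {"a"}]
y_pred = [{"a"}, {"a", "b"}, {"b"}, {"a"}]
macro, micro = macro_micro_f1(y_true, y_pred, ["a", "b"])
print(macro, micro)  # 0.75 0.8
```

Here label "a" is always predicted correctly (F1 = 1.0) while label "b" has one miss and one false alarm (F1 = 0.5), so the per-label average is 0.75, whereas pooling the counts gives 8/10 = 0.8.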

Book ChapterDOI
10 Jun 2017
TL;DR: The experimental results on the RCV1-v2 textual dataset show that LF-LDA outperforms two other state-of-the-art multi-label classifiers, Tuned SVM and L-LDA, on both Macro-F1 and Micro-F1 metrics, and the low variance also indicates that LF-LDA is a robust classifier.
Abstract: Textual data is growing explosively with the advent of the era of big data, and a significant portion of it consists of text documents with multiple labels, such as papers with keywords. Multi-label classification is a powerful technique for handling multi-labeled textual data, but substantial room remains for improving its effectiveness. This paper introduces labeled LDA with function terms (LF-LDA), a topic model that extracts noisy function terms from textual data to improve the performance of multi-label classification. The experimental results on the RCV1-v2 textual dataset show that LF-LDA outperforms two other state-of-the-art multi-label classifiers, Tuned SVM and L-LDA, on both Macro-F1 and Micro-F1 metrics. The low variance also indicates that LF-LDA is a robust classifier.

6 citations


Cited by
Journal ArticleDOI
TL;DR: An analysis of a Deep Learning architecture for extreme multi-class and multi-label text classification with a hierarchical label set is presented, together with a methodology named Hierarchical Label Set Expansion (HLSE).

86 citations

Journal ArticleDOI
TL;DR: It was found that there were some unique sub-fields to Indian Library and Information Science research, such as open access; online exhibition; virtual libraries; multimedia libraries; open source software; library automation; and library management system.
Abstract: This study analyzed 928 full-text research articles retrieved from DESIDOC Journal of Library and Information Technology for the period of 1981–2018 using Latent Dirichlet Allocation. The study further tagged the articles with the modeled topics. Fifty core topics were identified over the 38-year period, whereas only 26 topics were unique in nature. Bibliometrics, ICT, information retrieval, and user studies were highly researched areas in India during this period. Further, Spain and Taiwan showed research trends and areas similar to India's, whereas India's research interests are quite distinct from those of America and China. Therefore, researchers in Library and Information Science in India should pay more attention to the topics which are under-researched. Further, it was found that there were some sub-fields unique to Indian Library and Information Science research, such as open access; online exhibition; virtual libraries; multimedia libraries; open source software; library automation; and library management system. Topics evolve over time: new topics emerge, and old ones become obsolete. Topic modeling not only helps researchers to determine the trending themes or related fields with respect to their field of interest but also helps them to identify new concepts and fields over time.

30 citations

Journal ArticleDOI
TL;DR: This paper presents a meta-analysis of the determinants of infectious disease in eight operation theatres of the immune system and shows clear patterns of decline in the number of vaccinated patients and their ages.
Abstract: Correction to Annals of Applied Statistics 1 (2007) 17--35 [doi:10.1214/07-AOAS114]

27 citations

Proceedings ArticleDOI
01 Jan 2018
TL;DR: This paper presents an analysis of the usage of Deep Neural Networks for extreme multi-label and multiclass text classification, and investigates the behaviour of the neural networks as a function of the training hyperparameters, analysing the link between them and the dataset complexity.
Abstract: In this paper we present an analysis of the usage of Deep Neural Networks for extreme multi-label and multiclass text classification. We consider two network models: the first is formed by a word embeddings (WEs) stage followed by two dense layers, hereinafter Dense, and the second adds a convolution stage between the WEs and the dense layers, hereinafter CNN-Dense. We take into account classification problems characterized by different numbers of labels, ranging from an order of 10 to an order of 30,000, showing how the performance of the neural networks varies with the total number of labels and the average number of labels per sample, exploiting the hierarchical structure of the label space of the dataset used for experimental assessment. It is worth noting that multi-label classification is a harder problem than multi-class classification, due to the variable number of labels associated with each sample. We also investigate the behaviour of the neural networks as a function of the training hyperparameters, analysing the link between them and the dataset complexity. All the results will be evaluated using the PubMed scientific articles collection as

19 citations