A collective topic model for milestone paper discovery
Summary
1. INTRODUCTION
- Surveying the academic literature plays a vital role in research: it shows researchers what has been done, what research gaps might exist and which research directions are worth pursuing.
- Academic search engines such as Google Scholar and CiteSeerX enable researchers to find related literature or prior art.
- The authors' experimental results show that their model captures paper importance well, and that authorship and publication venue have considerable influence on milestone paper discovery.
3. PROBABILISTIC TOPIC MODEL
- The importance of a paper depends on a variety of factors, including the authority of authors, the publication venue and co-citation relationship with other papers.
- Since authors and venues are linked with documents in the academic document collection, the authors build a “virtual document” for each author and venue by aggregating all documents associated with that author or venue (they call the result author document and venue document, respectively).
- This way, for each author or venue the authors also derive a bag of citation IDs.
- Based on [3], the authors assume that the multiple-typed documents (paper documents, author documents and venue documents) share a common set of latent topics, and that each topic is represented as a distribution over citations.
- The problem of milestone paper discovery is then defined on top of this representation.
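The aggregation into author and venue documents described above can be illustrated with a minimal Python sketch. The record layout (`id`, `authors`, `venue`, `cites`) and the toy IDs are assumptions made for illustration only; they are not taken from the paper or its dataset.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: each paper lists its authors, its venue and the
# IDs of the papers it cites. None of these values come from the paper.
papers = [
    {"id": "P1", "authors": ["A1", "A2"], "venue": "V1", "cites": ["C1", "C2"]},
    {"id": "P2", "authors": ["A1"], "venue": "V2", "cites": ["C2", "C3"]},
    {"id": "P3", "authors": ["A2"], "venue": "V1", "cites": ["C1"]},
]


def build_virtual_documents(papers):
    """Return bags of citation IDs for papers, authors and venues."""
    paper_docs = {}
    author_docs = defaultdict(Counter)
    venue_docs = defaultdict(Counter)
    for p in papers:
        bag = Counter(p["cites"])
        paper_docs[p["id"]] = bag
        # An author document aggregates the citations of all papers by that author.
        for a in p["authors"]:
            author_docs[a].update(bag)
        # A venue document aggregates the citations of all papers published there.
        venue_docs[p["venue"]].update(bag)
    return paper_docs, dict(author_docs), dict(venue_docs)


paper_docs, author_docs, venue_docs = build_virtual_documents(papers)
```

In this toy example, `author_docs["A1"]` counts `C2` twice because both of A1's papers cite it, which is how a virtual document ends up weighting citations that recur across an author's or venue's papers.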
3.1 Model Description
- Table 1 describes the notation used in their model.
- Each document is represented as a distribution over topics and each topic is represented as a distribution over citations.
- The process of generating an academic document is then as follows: for each citation in that document, first sample a topic z_k from the paper topic distribution δ(z; d), the author topic distribution ζ(z; a) or the venue topic distribution ψ(z; v), depending on the document type.
- Then, draw a citation c from φ(:; z_k), the distribution of the sampled topic within the topic citation distribution φ(c; z).
- The authors developed their model based on PLSA [4].
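The generative process above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the topic names, citation IDs and probabilities are hypothetical, and `sample_citations` is a helper name introduced here, not something defined in the paper. The same function applies to a paper, author or venue document, since only the topic distribution passed in changes.

```python
import random


def sample_citations(topic_dist, topic_citation_dist, n_citations, rng):
    """Generate (topic, citation) pairs for one document.

    topic_dist: dict topic -> probability (delta, zeta or psi for this document).
    topic_citation_dist: dict topic -> dict citation -> probability (phi).
    """
    topics = list(topic_dist)
    topic_weights = [topic_dist[z] for z in topics]
    drawn = []
    for _ in range(n_citations):
        # Step 1: sample a topic from the document's topic distribution.
        z = rng.choices(topics, weights=topic_weights, k=1)[0]
        # Step 2: sample a citation from that topic's citation distribution.
        cites = list(topic_citation_dist[z])
        cite_weights = [topic_citation_dist[z][c] for c in cites]
        c = rng.choices(cites, weights=cite_weights, k=1)[0]
        drawn.append((z, c))
    return drawn


# Hypothetical toy parameters, chosen only to make the sketch runnable.
phi = {"z0": {"C1": 0.7, "C2": 0.3}, "z1": {"C3": 1.0}}
delta_d = {"z0": 0.5, "z1": 0.5}

draws = sample_citations(delta_d, phi, n_citations=5, rng=random.Random(0))
```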
3.2 Parameter Inference
- The authors use the Expectation-Maximization (EM) algorithm for parameter inference.
- Each E-step computes the lower bound function Q of L(θ).
- In the first E-step, the posterior probabilities are randomly initialized.
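As a rough illustration of EM for a PLSA-style model, the sketch below fits plain PLSA to a single table of document-citation counts, starting from randomly initialized posteriors as the summary describes. It is not the authors' collective model: how they combine the paper, author and venue likelihoods is not given in this summary, so treating every document as one row of a single table is an assumption. The function name `plsa_em` and the toy counts are hypothetical.

```python
import math
import random


def plsa_em(counts, n_topics, n_iters=30, seed=0):
    """Plain PLSA via EM. counts: dict doc -> dict citation -> count.

    Returns (theta, phi, trace): theta[d][z] = P(z | d), phi[z][c] = P(c | z),
    and the log-likelihood after each iteration.
    """
    rng = random.Random(seed)
    docs = list(counts)
    cites = sorted({c for bag in counts.values() for c in bag})
    # Randomly initialized posteriors P(z | d, c), normalized over topics.
    post = {}
    for d in docs:
        for c in counts[d]:
            w = [rng.random() + 1e-3 for _ in range(n_topics)]
            total = sum(w)
            post[d, c] = [x / total for x in w]
    trace = []
    for _ in range(n_iters):
        # M-step: re-estimate theta and phi from expected counts.
        theta = {d: [0.0] * n_topics for d in docs}
        phi = [{c: 0.0 for c in cites} for _ in range(n_topics)]
        for d in docs:
            for c, n in counts[d].items():
                for z in range(n_topics):
                    r = n * post[d, c][z]
                    theta[d][z] += r
                    phi[z][c] += r
        for d in docs:
            total = sum(theta[d])
            theta[d] = [x / total for x in theta[d]]
        for z in range(n_topics):
            total = sum(phi[z].values()) or 1.0  # guard against an empty topic
            phi[z] = {c: v / total for c, v in phi[z].items()}
        # E-step: update posteriors and accumulate the log-likelihood.
        log_lik = 0.0
        for d in docs:
            for c, n in counts[d].items():
                joint = [theta[d][z] * phi[z][c] for z in range(n_topics)]
                total = sum(joint)
                post[d, c] = [x / total for x in joint]
                log_lik += n * math.log(total)
        trace.append(log_lik)
    return theta, phi, trace


# Hypothetical toy counts with two obvious groups of citations.
counts = {
    "d1": {"C1": 3, "C2": 2},
    "d2": {"C1": 2, "C2": 3},
    "d3": {"C3": 4, "C4": 1},
    "d4": {"C3": 1, "C4": 4},
}
theta, phi, trace = plsa_em(counts, n_topics=2)
```

The log-likelihood trace should be non-decreasing across iterations, which is a convenient sanity check for any EM implementation of this kind.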
4.1 Dataset
- The ACL Anthology Network (ANN) [7] was used in their experiments.
- This dataset is also used in previous work [8]; thus, the authors can use it to perform some comparisons with [8].
- Figure 2 shows the perplexity scores during model estimation for different values of k.
- The graph suggests that a value of k around 150 is appropriate for this dataset, since it gives the lowest perplexity score among all tested values.
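The summary does not give the perplexity formula the authors used, so the sketch below assumes the standard definition: the exponential of the negative average log-likelihood per citation occurrence. The function name and the toy model are hypothetical, and whether the authors evaluated on training or held-out citations is not stated here.

```python
import math


def perplexity(counts, theta, phi):
    """Perplexity of a topic model on document-citation counts.

    counts: dict doc -> dict citation -> count.
    theta: dict doc -> list of P(z | d).
    phi: list (one entry per topic) of dict citation -> P(c | z).
    """
    log_lik = 0.0
    total = 0
    for d, bag in counts.items():
        for c, n in bag.items():
            # P(c | d) = sum over topics of P(z | d) * P(c | z).
            p = sum(t * phi_z.get(c, 0.0) for t, phi_z in zip(theta[d], phi))
            log_lik += n * math.log(p)
            total += n
    return math.exp(-log_lik / total)


# Toy check: a single uniform topic over four citations.
uniform_counts = {"d": {"C1": 1, "C2": 1, "C3": 1, "C4": 1}}
uniform_theta = {"d": [1.0]}
uniform_phi = [{"C1": 0.25, "C2": 0.25, "C3": 0.25, "C4": 0.25}]
score = perplexity(uniform_counts, uniform_theta, uniform_phi)  # approximately 4.0
```

Under this definition, choosing k amounts to fitting the model for each candidate number of topics and keeping the one with the lowest score.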
4.2 Experimental Results
4.2.1 Results of Topic Milestone Paper Discovery
- Each topic is represented as a mixture of citations in their model.
- These citations can be ranked by φ(c_i; z_k), and the top-ranked citations for each topic z_k are considered topic milestone papers.
- Table 2 presents the topic milestone papers for Sentiment Analysis from [8], while Table 3 shows the authors' results.
- Finally, their model can also indicate popular topics for an author or a venue.
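The two read-outs described above, milestone papers per topic and popular topics per author or venue, both reduce to sorting a learned distribution. A minimal sketch, with hypothetical helper names and toy probabilities that are not results from the paper:

```python
def top_citations(phi_z, n=10):
    """Rank citations of one topic by phi(c; z), highest first.

    Ties are broken by citation ID so the ordering is deterministic.
    """
    ranked = sorted(phi_z.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


def top_topics(topic_dist, n=3):
    """Return the indices of the n most probable topics for an author or venue."""
    order = sorted(range(len(topic_dist)), key=lambda z: (-topic_dist[z], z))
    return order[:n]


# Hypothetical toy distributions, for illustration only.
phi_z = {"C1": 0.10, "C2": 0.55, "C3": 0.35}
zeta_a = [0.2, 0.7, 0.1]

milestones = top_citations(phi_z, n=2)   # [("C2", 0.55), ("C3", 0.35)]
popular = top_topics(zeta_a, n=2)        # [1, 0]
```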
Citations
38 citations
13 citations
Cites background from "A collective topic model for milest..."
...Lu et al. (2014) proposed a topic model which uses authorship, published venues, and citation relations among scientific documents to detect topics and identify the most notable works in the corpus....
[...]
...The collective topic model (CTM) proposed by Lu et al. (2014) simultaneously discovers topics and related milestone papers in the corpus by modeling papers, authors, and published venues as a bag of citations based on the PLSA model....
[...]
3 citations
Cites background or methods from "A collective topic model for milest..."
...In our model, different from [6, 17], we use the topics extracted from textual information....
[...]
...Lu et al.[6] extend the method by considering additional factors that influence the importance of papers, such as authorship and published venues....
[...]
...Thus, the topics described in [6, 17] are too general but imprecise....
[...]
...Although [6, 17] use “topic” in the discription of their methods, the topic defined in [6, 17] is actually a cluster of documents....
[...]
...In [6, 17], the reference for a document is determined by sampling cited documents according to the topic-document distribution....
[...]
2 citations
Cites background from "A collective topic model for milest..."
...Topic model is a common technology for the evolution of research themes [31,32] and discovery of high quality papers [33]....
[...]
2 citations
Cites methods from "A collective topic model for milest..."
...Further, topic models have been employed for retrieving relevant papers [7]....
[...]
References
332 citations
"A collective topic model for milest..." refers methods in this paper
...Experiments on a real dataset ANN show that our model can better evaluate the impact of papers and its result is not biased against new publications....
[...]
...The ACL Anthology Network (ANN) [7] was used in our experiments....
[...]
144 citations
"A collective topic model for milest..." refers background in this paper
...Some citation recommendation systems have been designed to recommend appropriate citations for academic works [5, 1]....
[...]
...For example, [5] designed a translation model between citation contexts and reference words, and recommended a list of citations by using long queries such as sentences or a manuscript....
[...]
...Previous work exists in citation recommendation [5, 1]; i....
[...]
140 citations
"A collective topic model for milest..." refers background in this paper
...Some citation recommendation systems have been designed to recommend appropriate citations for academic works [5, 1]....
[...]
...Bethard and Jurafsky [1] designed a feature-based learning model for literature retrieval....
[...]
...Previous work exists in citation recommendation [5, 1]; i....
[...]
...[1] S. Bethard and D. Jurafsky....
[...]
54 citations
"A collective topic model for milest..." refers background or methods or result in this paper
...Table 2: Topic milestone papers (top-10 papers) for Sentiment Analysis from [8]....
[...]
...The top-2 paper in our model is ranked highly, compared with that in [8] (top-4)....
[...]
...This dataset is also used in previous work [8]; thus, we can use it to perform some comparisons with [8]....
[...]
...However, [8] only considered co-citation relations for topic milestone paper discovery....
[...]
...In order to compare with previous work [8], we use the top-10 papers for the topic Sentiment Analysis as an example....
[...]
17 citations