Posted Content

Automatic Term Identification for Bibliometric Mapping

TL;DR: A methodology for automatic term identification is proposed and used to select the terms to be included in a term map; in general, the proposed methodology performs quite well.
Abstract: A term map is a map that visualizes the structure of a scientific field by showing the relations between important terms in the field. The terms shown in a term map are usually selected manually with the help of domain experts. Manual term selection has the disadvantages of being subjective and labor-intensive. To overcome these disadvantages, we propose a methodology for automatic term identification and we use this methodology to select the terms to be included in a term map. To evaluate the proposed methodology, we use it to construct a term map of the field of operations research. The quality of the map is assessed by a number of operations research experts. It turns out that in general the proposed methodology performs quite well.


Citations
Journal ArticleDOI
TL;DR: VOSviewer’s ability to handle large maps is demonstrated by using the program to construct and display a co-citation map of 5,000 major scientific journals.
Abstract: We present VOSviewer, a freely available computer program that we have developed for constructing and viewing bibliometric maps. Unlike most computer programs that are used for bibliometric mapping, VOSviewer pays special attention to the graphical representation of bibliometric maps. The functionality of VOSviewer is especially useful for displaying large bibliometric maps in an easy-to-interpret way. The paper consists of three parts. In the first part, an overview of VOSviewer’s functionality for displaying bibliometric maps is provided. In the second part, the technical implementation of specific parts of the program is discussed. Finally, in the third part, VOSviewer’s ability to handle large maps is demonstrated by using the program to construct and display a co-citation map of 5,000 major scientific journals.

7,719 citations

Posted Content
TL;DR: In this report, the new text mining functionality of VOSviewer is presented and a number of examples are given of applications in which VOSviewer is used for analyzing large amounts of text data.
Abstract: VOSviewer is a computer program for creating, visualizing, and exploring bibliometric maps of science. In this report, the new text mining functionality of VOSviewer is presented. A number of examples are given of applications in which VOSviewer is used for analyzing large amounts of text data.

474 citations

Posted Content
TL;DR: It is concluded that in general maps constructed using VOS provide a more satisfactory representation of a data set than maps constructed using well-known multidimensional scaling approaches.
Abstract: VOS is a new mapping technique that can serve as an alternative to the well-known technique of multidimensional scaling. We present an extensive comparison between the use of multidimensional scaling and the use of VOS for constructing bibliometric maps. In our theoretical analysis, we show the mathematical relation between the two techniques. In our experimental analysis, we use the techniques for constructing maps of authors, journals, and keywords. Two commonly used approaches to bibliometric mapping, both based on multidimensional scaling, turn out to produce maps that suffer from artifacts. Maps constructed using VOS turn out not to have this problem. We conclude that in general maps constructed using VOS provide a more satisfactory representation of a data set than maps constructed using well-known multidimensional scaling approaches.

453 citations


Cites methods from "Automatic Term Identification for B..."

  • ...The keywords data set has already been used in an earlier paper (Van Eck et al., 2010)....

    [...]

Journal IssueDOI
TL;DR: It is concluded that in general maps constructed using VOS provide a more satisfactory representation of a dataset than maps constructed using well-known MDS approaches.
Abstract: VOS is a new mapping technique that can serve as an alternative to the well-known technique of multidimensional scaling (MDS). We present an extensive comparison between the use of MDS and the use of VOS for constructing bibliometric maps. In our theoretical analysis, we show the mathematical relation between the two techniques. In our empirical analysis, we use the techniques for constructing maps of authors, journals, and keywords. Two commonly used approaches to bibliometric mapping, both based on MDS, turn out to produce maps that suffer from artifacts. Maps constructed using VOS turn out not to have this problem. We conclude that in general maps constructed using VOS provide a more satisfactory representation of a dataset than maps constructed using well-known MDS approaches. © 2010 Wiley Periodicals, Inc.

272 citations

Posted Content
TL;DR: The aim in this paper is to provide an overview of the functionality of VOSviewer and to elaborate on the technical implementation of specific parts of the program.
Abstract: We present VOSviewer, a computer program that we have developed for constructing and viewing bibliometric maps. VOSviewer combines the VOS mapping technique and an advanced viewer into a single easy-to-use computer program that is freely available to the bibliometric research community. Our aim in this paper is to provide an overview of the functionality of VOSviewer and to elaborate on the technical implementation of specific parts of the program.

217 citations


Cites background or methods from "Automatic Term Identification for B..."

  • ...Such maps are difficult to show on paper (e.g., Van Eck et al., 2008b), and a simple static representation on a computer screen usually also does not provide satisfactory results....

    [...]

  • ...The viewer component of VOSviewer is based on the viewer software used by Van Eck and Waltman (2007b) and Van Eck et al. (2006, 2008b)....

    [...]

  • ...The VOS mapping technique was introduced by Van Eck and Waltman (2007a), who used it in a number of papers (Van Eck & Waltman, 2007b; Van Eck et al., 2006, 2008b)....

    [...]

  • ...In certain cases, this technique produces much better structured maps than multidimensional scaling (Van Eck et al., 2008a)....

    [...]

  • ...Because the VOS mapping technique shows a very good performance (Van Eck et al., 2008a), this technique has been fully integrated into VOSviewer....

    [...]

References
Journal ArticleDOI
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.

30,570 citations

Proceedings Article
03 Jan 2001
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.

25,546 citations
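
The generative story in the abstract above (each document is a mixture of topics, with mixture proportions drawn from a Dirichlet distribution) can be sketched roughly as follows. This is only an illustrative sketch using the Python standard library: the function names, the toy vocabulary, and the fixed per-topic word probabilities are assumptions made for the example, and it does not cover the variational inference the paper describes.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one sample from Dirichlet(alpha) by normalizing gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(alpha, topic_word_probs, vocab, length, rng):
    """Generate one document under the assumed generative process.

    alpha            -- Dirichlet parameters, one per topic (hypothetical values)
    topic_word_probs -- one word distribution over vocab per topic (hypothetical)
    """
    # Per-document topic proportions, drawn once per document.
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(length):
        # Pick a topic for this word position, then a word from that topic.
        z = rng.choices(range(len(theta)), weights=theta)[0]
        w = rng.choices(vocab, weights=topic_word_probs[z])[0]
        words.append(w)
    return theta, words

# Toy setup: two topics over a four-word vocabulary (illustrative only).
vocab = ["map", "term", "queue", "inventory"]
topic_word_probs = [
    [0.5, 0.5, 0.0, 0.0],  # a "bibliometrics" topic
    [0.0, 0.0, 0.5, 0.5],  # an "operations research" topic
]
rng = random.Random(0)
theta, words = generate_document([0.5, 0.5], topic_word_probs, vocab, 20, rng)
```

Here `theta` holds the sampled topic proportions for the document and `words` the generated tokens; small values in `alpha` tend to produce documents dominated by a single topic.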

Book
28 May 1999
TL;DR: This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear and provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations.
Abstract: Statistical approaches to processing natural language text have become dominant in recent years. This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear. The book contains all the theory and algorithms needed for building NLP tools. It provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations. The book covers collocation finding, word sense disambiguation, probabilistic parsing, information retrieval, and other applications.

9,295 citations


"Automatic Term Identification for B..." refers result in this paper

  • ...Our idea of a semantic unit is similar to that of a collocation (Manning and Schütze, 1999)....

    [...]

  • ...Statistical approaches to measure unithood are discussed extensively by Manning and Schütze (1999)....

    [...]

Journal ArticleDOI
TL;DR: VOSviewer’s ability to handle large maps is demonstrated by using the program to construct and display a co-citation map of 5,000 major scientific journals.
Abstract: We present VOSviewer, a freely available computer program that we have developed for constructing and viewing bibliometric maps. Unlike most computer programs that are used for bibliometric mapping, VOSviewer pays special attention to the graphical representation of bibliometric maps. The functionality of VOSviewer is especially useful for displaying large bibliometric maps in an easy-to-interpret way. The paper consists of three parts. In the first part, an overview of VOSviewer’s functionality for displaying bibliometric maps is provided. In the second part, the technical implementation of specific parts of the program is discussed. Finally, in the third part, VOSviewer’s ability to handle large maps is demonstrated by using the program to construct and display a co-citation map of 5,000 major scientific journals.

7,719 citations

Journal ArticleDOI
TL;DR: The proposed measure, the association ratio, estimates word association norms directly from computer readable corpora, making it possible to estimate norms for tens of thousands of words.
Abstract: The term word association is used in a very particular sense in the psycholinguistic literature. (Generally speaking, subjects respond quicker than normal to the word nurse if it follows a highly associated word such as doctor.) We will extend the term to provide the basis for a statistical description of a variety of interesting linguistic phenomena, ranging from semantic relations of the doctor/nurse type (content word/content word) to lexico-syntactic co-occurrence constraints between verbs and prepositions (content word/function word). This paper will propose an objective measure based on the information theoretic notion of mutual information, for estimating word association norms from computer readable corpora. (The standard method of obtaining word association norms, testing a few thousand subjects on a few hundred words, is both costly and unreliable.) The proposed measure, the association ratio, estimates word association norms directly from computer readable corpora, making it possible to estimate norms for tens of thousands of words.

4,272 citations
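
As a rough illustration of a mutual-information-based association measure of the kind the abstract above describes, the sketch below computes log2(P(x, y) / (P(x) P(y))) from a token list. It assumes that the joint probability is estimated from immediately adjacent word pairs and that all probabilities are normalized by the corpus length; the function name and the toy corpus are hypothetical and chosen only for this example.

```python
import math
from collections import Counter

def association_ratio(tokens, x, y):
    """Return log2(P(x, y) / (P(x) * P(y))) for x immediately followed by y.

    Assumption: the joint count uses adjacent pairs only, and every
    probability is the raw count divided by the number of tokens.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    joint = bigrams[(x, y)]
    if joint == 0:
        # The pair never occurs, so the ratio inside the log is zero.
        return float("-inf")
    p_xy = joint / n
    p_x = unigrams[x] / n
    p_y = unigrams[y] / n
    return math.log2(p_xy / (p_x * p_y))

# Toy corpus (illustrative only).
tokens = ["operations", "research", "is", "research", "on",
          "operations", "research"]
score = association_ratio(tokens, "operations", "research")
```

A positive score indicates that the pair occurs together more often than would be expected if the two words were independent, which is the intuition behind using such measures to assess whether a word sequence forms a unit.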


"Automatic Term Identification for B..." refers background in this paper

  • ...However, sample is a quite general statistical term, while chi-square test is more specific and, consequently, more discriminatory....

    [...]

  • ...(e.g., Church and Hanks 1990; Damerau 1993; Daille et al. 1994) or a likelihood ratio (e.g., Dunning 1993; Daille et al. 1994)....

    [...]