Citation-based clustering of publications using CitNetExplorer and VOSviewer

01 May 2017-Scientometrics (Springer Netherlands)-Vol. 111, Iss: 2, pp 1053-1070
TL;DR: Using the approach presented in this paper, bibliometricians are able to carry out sophisticated cluster analyses without the need to have a deep knowledge of clustering techniques and without requiring advanced computer skills.
Abstract: Clustering scientific publications is an important problem in bibliometric research. We demonstrate how two software tools, CitNetExplorer and VOSviewer, can be used to cluster publications and to analyze the resulting clustering solutions. CitNetExplorer is used to cluster a large set of publications in the field of astronomy and astrophysics. The publications are clustered based on direct citation relations. CitNetExplorer and VOSviewer are used together to analyze the resulting clustering solutions. Both tools use visualizations to support the analysis of the clustering solutions, with CitNetExplorer focusing on the analysis at the level of individual publications and VOSviewer focusing on the analysis at an aggregate level. The demonstration provided in this paper shows how a clustering of publications can be created and analyzed using freely available software tools. Using the approach presented in this paper, bibliometricians are able to carry out sophisticated cluster analyses without the need to have a deep knowledge of clustering techniques and without requiring advanced computer skills.


Nees Jan van Eck (1), Ludo Waltman (1)
Received: 6 June 2016 / Published online: 27 February 2017
© The Author(s) 2017. This article is published with open access at Springerlink.com
Keywords: Citation, Clustering, CitNetExplorer, VOSviewer
Nees Jan van Eck: ecknjpvan@cwts.leidenuniv.nl
Ludo Waltman: waltmanlr@cwts.leidenuniv.nl
(1) Centre for Science and Technology Studies, Leiden University, Leiden, The Netherlands
Scientometrics (2017) 111:1053–1070
DOI 10.1007/s11192-017-2300-7

Introduction
Clustering techniques play a prominent role in bibliometric research. They are for instance
used to identify groups of related publications, authors, or journals. Clustering techniques
have been developed mainly in fields such as statistics, computer science, and network
science. Bibliometricians usually do not develop their own clustering techniques, but they
use existing clustering techniques developed in other fields. They apply these techniques to
bibliometric data sets, sometimes after adapting the techniques to the specific
characteristics of bibliometric data.
When the number of objects to be clustered is relatively limited (e.g., at most a few
hundred objects), analyzing and interpreting the results obtained from a clustering
technique usually does not cause any significant difficulties. However, when dealing with
large numbers of objects, analyzing and interpreting a clustering solution is far from
straightforward. This can be a problem especially when clustering techniques are applied at
the level of individual publications. We may then have clustering solutions that include many
thousands or even many millions of publications (e.g., Boyack and Klavans 2014; Klavans and
Boyack 2017; Waltman and Van Eck 2012). Making sense of these clustering solutions
can be a serious challenge.
In this paper, our aim is to demonstrate how two software tools that we have developed,
CitNetExplorer (Van Eck and Waltman 2014a, b; www.citnetexplorer.nl) and VOSviewer
(Van Eck and Waltman 2010, 2014b; www.vosviewer.com), can be used to cluster
publications and to analyze the resulting clustering solutions. We use CitNetExplorer to
cluster publications based on their citation relations and to analyze the resulting
clustering solutions at the level of individual publications. We use VOSviewer to analyze
the clustering solutions obtained using CitNetExplorer at an aggregate level. CitNetExplorer
and VOSviewer both rely strongly on visualizations to facilitate the analysis of clustering
solutions.
CitNetExplorer, which is an abbreviation of ‘citation network explorer’, is a software
tool that we have developed for analyzing and visualizing citation networks. In the
approach that we take in this paper, we first use CitNetExplorer to cluster publications
based on their citation relations. For this purpose, CitNetExplorer employs a clustering
technique that we have introduced in earlier papers (Waltman and Van Eck 2012, 2013).
We then use CitNetExplorer to analyze the resulting clustering solution at the level of
individual publications. To facilitate the analysis of a clustering solution, the following
features of CitNetExplorer are essential:
Visualizing a citation network. CitNetExplorer can be used to visualize a citation
network of publications, with publications shown along a time axis and with colors
indicating the clusters to which publications belong. Using the visualization
functionality of CitNetExplorer, we obtain an overview of the most frequently cited
publications in a citation network, the citation relations between these publications, and
the clusters to which the publications belong.
Drilling down into a citation network. The drill down functionality of CitNetExplorer
can be used to analyze a clustering solution at different levels of detail. We may for
instance start with a visualization at the level of the entire citation network. We may
then perform a drill down into one or more selected clusters, after which we are
provided with a visualization at the level of the subnetwork consisting of the
publications belonging to the selected clusters.
Searching for publications. We can search for publications based on title, publication
year, author name, and journal name. The search functionality of CitNetExplorer can
be used to find publications that are of special interest, for instance all publications in a
specific journal, and to find out to which clusters these publications belong.
VOSviewer is a software tool for constructing and visualizing bibliometric networks. In
this paper, VOSviewer is used to complement CitNetExplorer. While we use CitNetExplorer
to analyze a clustering solution at the level of individual publications, we use
VOSviewer to analyze a clustering solution at an aggregate level. Two visualizations
provided by VOSviewer play an important role. The first visualization shows the clusters in
a clustering solution and the citation relations between these clusters. The second
visualization uses a so-called term map to indicate the topics that are covered by a cluster.
This visualization shows the most important terms occurring in the publications belonging to
a cluster and the co-occurrence relations between these terms.
This paper is organized as follows. The ‘Clustering technique’ section discusses the
clustering technique that is used by CitNetExplorer to cluster publications based on their
citation relations. The ‘Results’ section demonstrates the use of CitNetExplorer and
VOSviewer to cluster publications and to analyze the resulting clustering solutions.
CitNetExplorer is used to cluster more than 100,000 publications in the field of astronomy
and astrophysics, and CitNetExplorer and VOSviewer are used together to analyze the
resulting clustering solutions. The ‘Conclusion’ section concludes the paper.
Clustering technique
In this paper, we use the clustering technique that is available in the CitNetExplorer
software tool. This section provides a discussion of this clustering technique. The
‘Determining the relatedness of publications’ section explains how the relatedness of
publications is determined, and the ‘Clustering publications’ section describes how
publications are assigned to clusters. We refer to Waltman and Van Eck (2012, 2013) for a
more extensive discussion of our clustering technique.
Determining the relatedness of publications
To cluster publications, we first need to determine the relatedness of publications. In the
bibliometric literature, the most commonly used approaches to determine the relatedness of
publications are based on either citation relations or word relations (for a more extensive
discussion, see Van Eck and Waltman 2014b). In the case of citation relations, a further
distinction can be made between direct citation relations, bibliographic coupling relations,
and co-citation relations (e.g., Boyack and Klavans 2010; Klavans and Boyack 2017). In
the case of word relations, shared words in the titles, abstracts, or full texts of publications
serve as an indication of the relatedness of publications (e.g., Boyack et al. 2011; Janssens
et al. 2006). Sometimes the relatedness of publications is determined using a combined
approach that takes into account both citation relations and word relations (e.g., Boyack
and Klavans 2010; Janssens et al. 2008).
Our clustering technique determines the relatedness of publications based on direct
citation relations. We prefer to use citation relations rather than word relations because the
use of word relations involves some difficulties. Some words have a different meaning in
different fields of science. These words may incorrectly indicate that publications from
different fields are related to each other. Also, some words are very general and are used in
many different fields. These words do not provide useful information on the relatedness of
publications.
We prefer to use direct citation relations rather than bibliographic coupling relations
(i.e., relations between publications that cite the same publication) or co-citation relations
(i.e., relations between publications that are cited by the same publication) for two reasons.
First, bibliographic coupling and co-citation relations are indirect relations, and we
therefore expect them to provide less accurate information on the relatedness of
publications than direct citation relations (Waltman and Van Eck 2012). Second, there are
many more bibliographic coupling or co-citation relations between publications than direct
citation relations, and therefore the use of bibliographic coupling or co-citation relations
may easily lead to computational problems. (This also applies to the use of word relations.)
Although we prefer the use of direct citation relations over the use of bibliographic
coupling or co-citation relations, we acknowledge that the use of direct citation relations
also has a disadvantage. Within the period of analysis, some publications may have no direct
citation relations with other publications. When using direct citation relations, these
publications cannot be properly assigned to a cluster. This problem is especially serious
when the period of analysis is relatively short. When using bibliographic coupling relations
rather than direct citation relations, one usually does not have this problem. We note that,
in addition to our own work, the use of direct citation relations is also advocated in recent
work by Klavans and Boyack (2017).
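The three types of citation relations discussed above can be illustrated with a small
sketch. The citation pairs below are hypothetical toy data and the function names are
chosen for this illustration only; they are not part of CitNetExplorer.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy data: each pair is (citing publication, cited publication).
citations = [("C", "A"), ("C", "B"), ("D", "A"), ("D", "B"), ("E", "D")]

def direct_citation(pairs):
    """Undirected direct citation relations (the direction of a citation is ignored)."""
    return {frozenset(pair) for pair in pairs}

def bibliographic_coupling(pairs):
    """Pairs of publications that cite at least one common publication."""
    citing_by_cited = defaultdict(set)
    for citing, cited in pairs:
        citing_by_cited[cited].add(citing)
    return {frozenset(pair)
            for group in citing_by_cited.values()
            for pair in combinations(sorted(group), 2)}

def co_citation(pairs):
    """Pairs of publications that are cited together by at least one publication."""
    cited_by_citing = defaultdict(set)
    for citing, cited in pairs:
        cited_by_citing[citing].add(cited)
    return {frozenset(pair)
            for group in cited_by_citing.values()
            for pair in combinations(sorted(group), 2)}

print(direct_citation(citations))
print(bibliographic_coupling(citations))
print(co_citation(citations))
```

In this toy example, C and D are bibliographically coupled because both cite A and B, and
A and B are co-cited because both are cited by C and by D.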
Clustering publications
After the relatedness of publications has been determined, our clustering technique assigns
publications to clusters. Each publication is assigned to exactly one cluster. Hence, there is
no overlap of clusters and there are no publications without a cluster assignment. It may be
argued that there should be room for publications to be assigned to more than one cluster.
However, allowing publications to be assigned to multiple clusters introduces significant
technical challenges. For this reason, we prefer to assign publications to a single cluster
only. For most publications, we believe that it is reasonable to assign them to just one
cluster.
Publications are assigned to clusters by maximizing a quality function. The quality
function that is used has been introduced in an earlier paper (Waltman and Van Eck 2012).
This quality function is a variant of the well-known modularity function of Newman and
Girvan (2004) and Newman (2004) developed in the field of network science. The quality
function is very similar to the quality function resulting from the so-called constant Potts
model proposed by Traag et al. (2011). Our quality function has an important advantage
over the popular modularity function. The modularity function suffers from a problem
known as the resolution limit (Fortunato and Barthélemy 2007). This problem causes the
modularity function to yield counterintuitive results in certain situations. As shown by
Traag et al. (2011), our quality function does not suffer from the resolution limit problem.
More specifically, our clustering technique assigns publications to clusters by
maximizing the quality function
\[
Q(x_1, \ldots, x_n) = \sum_{i=1}^{n} \sum_{j=1}^{n} \delta(x_i, x_j) \left( a_{ij} - \frac{\gamma}{2n} \right), \tag{1}
\]
where $n$ denotes the number of publications, $a_{ij}$ denotes the relatedness of publication $i$
with publication $j$, $\gamma$ denotes a so-called resolution parameter, and $x_i$ denotes the
cluster to which publication $i$ is assigned. The function $\delta(x_i, x_j)$ equals 1 if
$x_i = x_j$ and 0 otherwise.
The relatedness of publication $i$ with publication $j$ is given by
\[
a_{ij} = \frac{c_{ij}}{\sum_{k=1}^{n} c_{ik}}, \tag{2}
\]
where $c_{ij}$ equals 1 if either publication $i$ cites publication $j$ or publication $j$ cites
publication $i$ and $c_{ij}$ equals 0 otherwise. Hence, if there is a direct citation relation
between publications $i$ and $j$, the relatedness of publication $i$ with publication $j$ is
inversely proportional to the total number of direct citation relations of publication $i$. If
there is no direct citation relation between publications $i$ and $j$, the relatedness of the
publications equals 0. Notice that our clustering technique ignores the direction of a
citation (i.e., no distinction is made between publication $i$ citing publication $j$ and
publication $j$ citing publication $i$).
The value of the resolution parameter $\gamma$ in (1) should be chosen based on the purpose of
the cluster analysis. Higher values of this parameter will yield a larger number of clusters.
In other words, the higher the value of $\gamma$, the higher the level of detail of the clustering
solution that will be obtained. In CitNetExplorer, the default value of $\gamma$ is 1. However, we
emphasize that there is no generally optimal value of $\gamma$. Our recommendation to users of
our clustering technique is to try out different values of $\gamma$ and to choose the value that
seems to give the most useful results for the specific needs of a user.
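To make Eqs. (1) and (2) concrete, the following minimal sketch evaluates the relatedness
matrix and the quality function for a toy citation network. The network (two triangles of
publications joined by a single citation), the function names, and the parameter name
`resolution` are assumptions made for this illustration; the sketch is not the
CitNetExplorer implementation.

```python
def relatedness(n, citation_pairs):
    """Relatedness matrix a[i][j] of Eq. (2) for publications numbered 0..n-1."""
    c = [[0] * n for _ in range(n)]
    for i, j in citation_pairs:
        if i != j:
            c[i][j] = c[j][i] = 1          # the direction of a citation is ignored
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        total = sum(c[i])                  # number of direct citation relations of i
        if total:
            for j in range(n):
                a[i][j] = c[i][j] / total
    return a

def quality(a, clusters, resolution=1.0):
    """Quality function of Eq. (1); clusters[i] is the cluster of publication i."""
    n = len(a)
    return sum(a[i][j] - resolution / (2 * n)
               for i in range(n) for j in range(n)
               if clusters[i] == clusters[j])

# Hypothetical toy network: triangles {0, 1, 2} and {3, 4, 5}, linked by 3 citing 2.
pairs = [(1, 0), (2, 0), (2, 1), (4, 3), (5, 3), (5, 4), (3, 2)]
a = relatedness(6, pairs)

two_clusters = [0, 0, 0, 1, 1, 1]
one_cluster = [0] * 6
singletons = list(range(6))

for resolution in (0.1, 1.0, 10.0):
    print(resolution,
          quality(a, one_cluster, resolution),
          quality(a, two_clusters, resolution),
          quality(a, singletons, resolution))
```

For this toy network, a low value of the resolution parameter favors the single large
cluster, a value of 1 favors the two triangles, and a high value favors singleton clusters,
in line with the statement that higher values yield a larger number of clusters.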
In order to maximize the quality function in (1), our clustering technique uses the smart
local moving algorithm introduced by Waltman and Van Eck (2013). This algorithm offers
a more sophisticated alternative to the popular Louvain algorithm for modularity
optimization (Blondel et al. 2008). When the smart local moving algorithm and the Louvain
algorithm are given a similar amount of computing time, the smart local moving algorithm
typically identifies a clustering solution with a significantly higher value for the quality
function. We refer to Waltman and Van Eck (2013) for an extensive comparison of the two
algorithms.
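As a rough illustration of how a quality function such as the one in (1) can be maximized
heuristically, the sketch below implements a plain greedy local moving heuristic:
publications are visited one by one and moved to the cluster that gives the largest
increase in quality. This is only the basic local moving idea. It is not the smart local
moving algorithm of Waltman and Van Eck (2013), nor the full Louvain algorithm, and all
names and the toy network are assumptions made for this illustration.

```python
def relatedness(n, citation_pairs):
    """Relatedness matrix of Eq. (2); the direction of citations is ignored."""
    c = [[0] * n for _ in range(n)]
    for i, j in citation_pairs:
        if i != j:
            c[i][j] = c[j][i] = 1
    return [[c[i][j] / sum(c[i]) if sum(c[i]) else 0.0 for j in range(n)]
            for i in range(n)]

def local_moving(a, resolution=1.0, max_passes=100):
    """Greedy local moving for Eq. (1); returns one cluster label per publication."""
    n = len(a)
    clusters = list(range(n))      # start with every publication in its own cluster
    sizes = [1] * n                # sizes[k] = number of publications in cluster k
    for _ in range(max_passes):
        moved = False
        for i in range(n):
            old = clusters[i]
            sizes[old] -= 1        # temporarily take publication i out of its cluster
            # Total relatedness (both directions) between i and each cluster.
            weight = {}
            for j in range(n):
                w = a[i][j] + a[j][i]
                if j != i and w > 0:
                    weight[clusters[j]] = weight.get(clusters[j], 0.0) + w

            def gain(k):
                # Change in Eq. (1) when i joins cluster k instead of staying alone.
                return weight.get(k, 0.0) - sizes[k] * resolution / n

            best, best_gain = old, gain(old)
            candidates = list(weight)
            if sizes[old] > 0:
                candidates.append(sizes.index(0))   # an empty cluster (gain 0)
            for k in candidates:
                if gain(k) > best_gain + 1e-12:     # move only on a strict improvement
                    best, best_gain = k, gain(k)
            clusters[i] = best
            sizes[best] += 1
            moved = moved or best != old
        if not moved:
            break
    return clusters

# Hypothetical toy network: triangles {0, 1, 2} and {3, 4, 5}, linked by 3 citing 2.
pairs = [(1, 0), (2, 0), (2, 1), (4, 3), (5, 3), (5, 4), (3, 2)]
a = relatedness(6, pairs)
print(local_moving(a, resolution=1.0))
print(local_moving(a, resolution=10.0))
```

A heuristic of this kind can get stuck in a local optimum, which is one reason why more
sophisticated algorithms are used in practice.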
Our clustering technique usually identifies a relatively limited number of larger clusters
and a more substantial number of smaller clusters. Sometimes clusters are very small and
for instance include only one or two publications. Because in many cases small clusters are
of limited interest, a minimum cluster size parameter can be specified. Clusters that are too
small can be either discarded or merged with other clusters. We refer to Waltman and Van
Eck (2012) for a discussion of the approach that we take to merge small clusters with larger
ones.
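The effect of a minimum cluster size parameter can be sketched as follows. The merging
rule used here (merge a too-small cluster into the cluster with which it shares the most
direct citation relations, and discard it if it has no such relations) is an assumption
made for illustration and is not necessarily the rule described by Waltman and Van Eck
(2012). The function name and the toy data are hypothetical as well.

```python
from collections import Counter

def merge_small_clusters(clusters, edges, min_size):
    """Merge or discard clusters with fewer than min_size publications.

    clusters[i] is the cluster label of publication i; edges are undirected
    citation relations (i, j). Discarded publications get the label None.
    """
    clusters = list(clusters)
    while True:
        sizes = Counter(c for c in clusters if c is not None)
        small = [k for k, size in sizes.items() if size < min_size]
        if not small:
            return clusters
        k = min(small, key=lambda c: sizes[c])       # handle the smallest cluster first
        # Count citation relations between cluster k and every other cluster.
        links = Counter()
        for i, j in edges:
            ci, cj = clusters[i], clusters[j]
            if ci == k and cj is not None and cj != k:
                links[cj] += 1
            elif cj == k and ci is not None and ci != k:
                links[ci] += 1
        # Merge into the most strongly linked cluster, or discard if there is none.
        target = links.most_common(1)[0][0] if links else None
        clusters = [target if c == k else c for c in clusters]

# Hypothetical toy data: two clusters of three publications, one singleton (6)
# linked mostly to the second cluster, and one isolated singleton (7).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (6, 3), (6, 4), (6, 0)]
print(merge_small_clusters([0, 0, 0, 1, 1, 1, 2, 3], edges, min_size=2))
```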
Results
We now demonstrate how CitNetExplorer and VOSviewer can be used to cluster publi-
cations and to analyze the resulting clustering solutions. In our demonstration, we work
with a large data set of publications in the field of astronomy and astrophysics. We
emphasize that in this paper it is not our aim to assess the quality of our clustering solutions
or to compare our clustering solutions with other alternative solutions. We do not have the
domain knowledge required to provide an in-depth interpretation of our clusters and to
assess their quality. For a comparison of our clustering solutions with other alternative
solutions, we refer to the comparison paper by Velden et al. (2017) in this special issue.
Data
We use the ‘Astro data set’ that is also used in other papers in this special issue. A general
introduction to the data set is provided in the introductory paper by Gläser et al. (2017) in
this special issue. The data set was extracted from the Web of Science bibliographic
database. It includes all publications of the document types ‘article’, ‘letter’, and
‘proceedings paper’ published between 2003 and 2010 in journals belonging to the Web of
Science subject category ‘Astronomy and Astrophysics’. The number of publications in the

References
Newman and Girvan (2004)
TL;DR: It is demonstrated that the algorithms proposed are highly effective at discovering community structure in both computer-generated and real-world network data, and can be used to shed light on the sometimes dauntingly complex structure of networked systems.
Abstract: We propose and study a set of algorithms for discovering community structure in networks-natural divisions of network nodes into densely connected subgroups. Our algorithms all share two definitive features: first, they involve iterative removal of edges from the network to split it into communities, the edges removed being identified using any one of a number of possible "betweenness" measures, and second, these measures are, crucially, recalculated after each removal. We also propose a measure for the strength of the community structure found by our algorithms, which gives us an objective metric for choosing the number of communities into which a network should be divided. We demonstrate that our algorithms are highly effective at discovering community structure in both computer-generated and real-world network data, and show how they can be used to shed light on the sometimes dauntingly complex structure of networked systems.

12,882 citations


"Citation-based clustering of public..." refers methods in this paper

  • ...This quality function is a variant of the well-known modularity function of Newman and Girvan (2004) and Newman (2004) developed in the field of network science....

Blondel et al. (2008)
TL;DR: In this paper, the authors proposed a simple method to extract the community structure of large networks based on modularity optimization, which is shown to outperform all other known community detection methods in terms of computation time.
Abstract: We propose a simple method to extract the community structure of large networks. Our method is a heuristic method that is based on modularity optimization. It is shown to outperform all other known community detection methods in terms of computation time. Moreover, the quality of the communities detected is very good, as measured by the so-called modularity. This is shown first by identifying language communities in a Belgian mobile phone network of 2 million customers and by analysing a web graph of 118 million nodes and more than one billion links. The accuracy of our algorithm is also verified on ad hoc modular networks.

11,078 citations

Van Eck and Waltman (2010)
TL;DR: VOSviewer’s ability to handle large maps is demonstrated by using the program to construct and display a co-citation map of 5,000 major scientific journals.
Abstract: We present VOSviewer, a freely available computer program that we have developed for constructing and viewing bibliometric maps. Unlike most computer programs that are used for bibliometric mapping, VOSviewer pays special attention to the graphical representation of bibliometric maps. The functionality of VOSviewer is especially useful for displaying large bibliometric maps in an easy-to-interpret way. The paper consists of three parts. In the first part, an overview of VOSviewer’s functionality for displaying bibliometric maps is provided. In the second part, the technical implementation of specific parts of the program is discussed. Finally, in the third part, VOSviewer’s ability to handle large maps is demonstrated by using the program to construct and display a co-citation map of 5,000 major scientific journals.

7,719 citations


"Citation-based clustering of public..." refers methods in this paper

  • ...…our aim is to demonstrate how two software tools that we have developed, CitNetExplorer (Van Eck and Waltman 2014a, b; www.citnetexplorer.nl) and VOSviewer (Van Eck and Waltman 2010, 2014b; www.vosviewer.com), can be used to cluster publications and to analyze the resulting clustering solutions....

Newman (2004)
TL;DR: An algorithm is described which gives excellent results when tested on both computer-generated and real-world networks and is much faster, typically thousands of times faster, than previous algorithms.
Abstract: Many networks display community structure--groups of vertices within which connections are dense but between which they are sparser--and sensitive computer algorithms have in recent years been developed for detecting this structure. These algorithms, however, are computationally demanding, which limits their application to small networks. Here we describe an algorithm which gives excellent results when tested on both computer-generated and real-world networks and is much faster, typically thousands of times faster, than previous algorithms. We give several example applications, including one to a collaboration network of more than 50,000 physicists.

5,127 citations


"Citation-based clustering of public..." refers methods in this paper

  • ...This quality function is a variant of the well-known modularity function of Newman and Girvan (2004) and Newman (2004) developed in the field of network science....

Rosvall and Bergstrom (2008)
TL;DR: An information theoretic approach is introduced that reveals community structure in weighted and directed networks of large-scale biological and social systems and reveals a directional pattern of citation from the applied fields to the basic sciences.
Abstract: To comprehend the multipartite organization of large-scale biological and social systems, we introduce an information theoretic approach that reveals community structure in weighted and directed networks. We use the probability flow of random walks on a network as a proxy for information flows in the real system and decompose the network into modules by compressing a description of the probability flow. The result is a map that both simplifies and highlights the regularities in the structure and their relationships. We illustrate the method by making a map of scientific communication as captured in the citation patterns of >6,000 journals. We discover a multicentric organization with fields that vary dramatically in size and degree of integration into the network of science. Along the backbone of the network—including physics, chemistry, molecular biology, and medicine—information flows bidirectionally, but the map reveals a directional pattern of citation from the applied fields to the basic sciences.

4,051 citations


"Citation-based clustering of public..." refers methods in this paper

  • ...For instance, in a recent study (Šubelj et al. 2015), we found indications suggesting that the map equation technique, used together with the Infomap optimization algorithm (Bohlin et al. 2014; Rosvall and Bergstrom 2008), may give particularly good results....
