Jointly published by Akadémiai Kiadó, Budapest, and Springer, Dordrecht
Scientometrics, Vol. 72, No. 2 (2007) 253–280
DOI: 10.1007/s11192-007-1722-z
Received September 15, 2006
Address for correspondence:
DIMITRIOS KATSAROS
Aristotle University, Thessaloniki, 54124, Greece
E-mail: dimitris@delab.csd.auth.gr
0138–9130/US $ 20.00
Copyright © 2007 Akadémiai Kiadó, Budapest
All rights reserved
Generalized Hirsch h-index for disclosing latent facts
in citation networks
ANTONIS SIDIROPOULOS,a DIMITRIOS KATSAROS,a,b YANNIS MANOLOPOULOSa

a Informatics Department, Aristotle University, Thessaloniki (Greece)
b Computer & Communications Engineering Department, University of Thessaly, Volos (Greece)
What is the value of a scientist and of his/her impact upon scientific thinking? How can we measure the prestige of a journal or a conference? The evaluation of the scientific work of a scientist and the estimation of the quality of a journal or conference have long attracted significant interest, due to the benefits of obtaining an unbiased and fair criterion. Although it appears to be simple, defining a quality metric is not an easy task. To overcome the disadvantages of the present metrics used for ranking scientists and journals, J. E. Hirsch proposed a pioneering metric, the now famous h-index. In this article we demonstrate several inefficiencies of this index and develop a pair of generalizations and effective variants of it to deal with scientist ranking and publication forum ranking. The new citation indices are able to disclose trendsetters in scientific research, as well as researchers who constantly shape their field with influential work, no matter how old they are. We exhibit the effectiveness and the benefits of the new indices in unfolding the full potential of the h-index, with extensive experimental results obtained from DBLP, a widely known on-line digital library.
Introduction
The evaluation of the scientific work of a scientist has long attracted significant interest, due to the benefits of obtaining an unbiased and fair criterion. Having defined such a metric, we can use it for faculty recruitment, promotion, prize awarding, funding allocation, comparison of personal scientific merit, etc. Similarly, the estimation of a

publication forum’s (journal or conference) quality is of particular interest, since it
impacts the scientists’ decisions about where to publish their work, the researchers’
preference in seeking for important articles, and so on.
Although the issue of ranking a scientist or a journal/conference dates back to the seventies with the seminal work of Eugene Garfield [16] and continued with sparse publications [18, 20], during the last five years we have witnessed a blossoming of this field [4, 6, 7, 21, 22, 25–28, 31, 35], due to the proliferation of digital libraries, which have made available a huge amount of bibliographic data.
To date there are two popular ways of evaluating scientific work, plus a hybrid of them. The first method is to let a panel of experts perform the ranking; the second is based on what is termed citation analysis, which involves examining the articles referring to an item (scientist/journal/conference). An amalgamation of the two is also possible, although it is closer to the latter approach.
The first method adopts an ad hoc approach, which works by collecting the opinions of different experts (or non-experts) in a domain. The study reported in Ref. 26 focused on the area of Information Systems and performed an on-line survey of 87 journals with approximately 1000 respondents, whereas the authors of Ref. 25 conducted the most extensive survey to date of IS journal rankings. They collected responses from 2559 respondents (32% of the 8741 targeted faculty members in 414 IS departments worldwide). Instead of using a predetermined journal list, they asked the respondents to freely nominate their top four research journals. This kind of work is very interesting, because it produces a ranking according to readers' (and authors') perception, which is not always adequately expressed through citation analysis; such rankings, however, suffer from being basically "manual", sometimes biased, and not highly computerized (automated) and objective.
On the other hand, the second way of evaluating scientific work is to define an objective function that calculates a "score" for the "objects" under evaluation, taking into account the graph structure created by the citations among the published articles. Defining a quality, representative metric is not an easy task, since it should account for both the productivity of a scientist and the impact of all of his/her work (analogously for journals/conferences). Most of the existing methods to date are based on some form of (arithmetic upon) the total number of authored papers, the average number of authored papers per year, the total number of citations, the average number of citations per paper, the average number of citations per year, etc. A comprehensive description of many of them can be found in Ref. 36.
Finally, characteristic works implementing the hybrid approach of combining expert judgment and citation analysis are described in Refs 22, 38. Their rankings are realized by taking averages over the results obtained from citation analysis and experts' opinions, thus implementing a post-processing step over the two major approaches.

Motivation for new citation indices
Although there is no clear winner between citation analysis and experts' assessment, the former is usually preferred, because it can be performed in a fully automated and computerized manner and is able to exploit the wealth of citation information available in digital libraries.
All the metrics used so far in citation analysis, even those based on popular spectral techniques like HITS [23], PageRank [29] and its variations for bibliometrics, like Ref. 8, present one or more of the following drawbacks (see also Ref. 19):
x They do not measure the importance or impact of papers, e.g., the metrics
based solely on the total number of papers.
x They are affected by a small number of “big hits” articles, which received
huge number of citations, whereas the rest of the articles may have
negligible total impact, e.g., the metrics based on the total number of
citations.
x They can not measure productivity, e.g., the metrics based on the average
number of citations per paper.
x They have difficulty to set administrative parameters, e.g., the metrics
based on the number x of articles, which have received y citations each, or
the metrics based on the number z of the most cited articles.
To collectively overcome all these disadvantages of the present metrics, in 2005 J. E. Hirsch proposed the pioneering h-index [19], which, in a short period of time, became extremely popular.* The h-index is defined as follows:
Definition 1. A researcher has h-index h, if h of his/her Np articles have received at least h citations each, and the rest (Np − h) articles have received no more than h citations each. [1, 19]
This metric calculates how broad the research work of a scientist is. The h-index accounts for both productivity and impact. For a researcher to have a large h-index, s/he must have many "good" articles, not just a few "good" articles.
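As a concrete illustration (ours, not from the original article), Definition 1 can be computed directly from a plain list of a researcher's per-paper citation counts:

```python
def h_index(citations):
    """Hirsch h-index (Definition 1): the largest h such that h of the
    researcher's papers have received at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited papers first
    h = 0
    for rank, c in enumerate(ranked, start=1):
        if c >= rank:  # the rank-th paper still has >= rank citations
            h = rank
        else:
            break
    return h

# A researcher with per-paper citation counts [10, 8, 5, 4, 3] has h = 4:
# four papers have at least 4 citations each, but there are no five
# papers with at least 5 citations each.
```
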
The h-index acts as a lower bound on the real number of citations of a scientist. In fact, there is a significant gap between the total number of citations as accounted for by the h-index and the real number of citations of a scientist. Note that the quantity h will always be smaller than or equal to the number Np of the articles of a researcher; it holds that h² ≤ Nc,tot, where Nc,tot is the total number of citations that the researcher has
* Notice that the economics literature defines the H index (the Herfindahl-Hirschman index), which is a way of measuring the concentration of market share held by particular suppliers in a market. The H index is the sum of squares of the percentages of the market shares held by the firms in a market. If there is a monopoly, i.e., one firm with all sales, the H index is 10000. If there is perfect competition, with an infinite number of firms with near-zero market share each, the H index is approximately zero. Other industry structures will have H indices between zero and 10000.

received. Apparently, equality holds when all the articles which contribute to the h-index have received exactly h citations each, which is quite improbable. Therefore, in the usual case it will hold that h² < Nc,tot. To bridge this gap, J. E. Hirsch defined the index a as follows:
Definition 2. A scientist has a-index a if the following equation holds: [19]

    Nc,tot = a·h²    (1)

The a-index can be used as a second metric-index for the evaluation and ranking of scientists. It describes the "magnitude" of each scientist's "hits". A large a implies that some article(s) have received a fairly large number of citations compared to the rest of his/her articles and with respect to what the h-index suggests.
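Continuing the same hypothetical citation counts, the a-index follows directly from Eq. (1) by solving for a; the sketch below is our illustration, not code from the article:

```python
def a_index(citations):
    """a-index (Definition 2): solves N_c,tot = a * h**2 for a."""
    ranked = sorted(citations, reverse=True)
    # This count equals the h-index because `ranked` is non-increasing:
    # c >= rank holds exactly for ranks 1..h and fails afterwards.
    h = sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)
    return sum(citations) / (h * h) if h else 0.0

# With citation counts [10, 8, 5, 4, 3]: h = 4 and N_c,tot = 30,
# so a = 30 / 16 = 1.875, the "magnitude" of the scientist's hits.
# The equality case a = 1 occurs when all h articles have exactly
# h citations each, e.g. [4, 4, 4, 4].
```
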
The introduction of the h-index was a major breakthrough in citation analysis. Nevertheless, several aspects of the inefficiency of the original h-index are apparent; or, to state it in its real dimension, significant efforts are needed to unfold the full potential of the h-index. Firstly, the original h-index assigns the same importance to all citations, no matter what their age is, thus refraining from revealing trendsetting scientists. Secondly, the h-index assigns the same importance to all articles, thus causing young researchers to have a relatively small h-index, because they have not had enough time either to publish many good articles, or to accumulate a large number of citations for their good papers. Thus, the h-index cannot reveal brilliant though young scientists.
Our contributions
The purpose of our work is to extend and generalize the original h-index so as to reveal various latent though strong facts hidden in citation networks. Our proposals aim to maintain the elegance and ease of computation of the original h-index; thus we strive to develop relatively simple indices, since we believe that the simplicity of the h-index is one of its beauties. In this context, the article makes the following contributions:
• It introduces two generalizations of the h-index, namely the contemporary h-index and the trend h-index, which are appropriate for scientist ranking and are able to reveal brilliant young scientists and trendsetters, respectively. These two generalizations can also be used for conference and journal ranking.
• It introduces a normalized version of the h-index for scientist ranking, namely the normalized h-index.
• It introduces two variants of the h-index appropriate for journal/conference ranking, namely the yearly h-index and the normalized yearly h-index.

x Performs an extensive experimental evaluation of the aforementioned
citation indices, using real data from DBLP, an online bibliographic
database.
Developing mathematical models and conducting theoretical analysis of the
properties of the proposed indices is the next step in this work, but it is beyond the
scope of this paper; here we are interested in providing extensive experimental evidence
of the power of the generalizations to the h-index.
Novel citation indices for scientist ranking
After the introduction of the h-index, a number of other proposals followed, either presenting case studies using it [3, 5, 10, 11, 30, 32, 34], or describing a new variation of it [13, 33] (aiming to bridge the gap between the lower bound on the total number of citations calculated by the h-index and their real number), or presenting normalized versions of it with respect to the time-span [2, 44], or studying its mathematics and its performance [9, 12, 14, 17].
Deviating from their line of research, we develop in this article a pair of generalizations of the h-index for ranking scientists, which are novel citation indices, a normalized variant of the h-index, and a pair of variants of the h-index suitable for journal/conference ranking.
The contemporary h-index
The original h-index does not take into account the "age" of an article. It may be the case that some scientist contributed a number of significant articles that produced a large h-index, but now s/he is rather inactive or retired. Therefore, senior scientists who keep contributing nowadays, or brilliant young scientists who are expected to contribute a large number of significant works in the near future but for now have only a small number of important articles due to the time constraint, are not distinguished by the original h-index. Thus arises the need to define a generalization of the h-index that accounts for these facts.
We define a novel score Sc(i) for an article i based on citation counting, as follows:

    Sc(i) = γ · (Y(now) − Y(i) + 1)^(−δ) · |C(i)|    (2)

where Y(i) is the publication year of article i and C(i) is the set of articles citing article i. If we set δ = 1, then Sc(i) is the number of citations that article i has received, divided by the "age" of the article. Since we divide the number of citations by the time interval, the quantities Sc(i) would be too small to create a meaningful h-index; thus, we use the coefficient γ. In our experiments, reported in the Experiments section, we use the value of 4 for the coefficient γ. Thus, for an article published during the current
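Equation (2) can be sketched as follows; this is our illustration, assuming the contemporary h-index is defined over the scores Sc(i) analogously to Definition 1 (with γ = 4 and δ = 1, as in the experiments):

```python
def contemporary_scores(pub_years, cite_counts, now, gamma=4.0, delta=1.0):
    """Eq. (2): S_c(i) = gamma * (now - Y(i) + 1)**(-delta) * |C(i)|,
    i.e. each article's citation count is damped by the article's age."""
    return [gamma * (now - y + 1) ** (-delta) * c
            for y, c in zip(pub_years, cite_counts)]

def contemporary_h_index(pub_years, cite_counts, now, gamma=4.0, delta=1.0):
    """Largest h such that h articles have contemporary score S_c(i) >= h."""
    scores = sorted(contemporary_scores(pub_years, cite_counts, now,
                                        gamma, delta), reverse=True)
    h = 0
    for rank, s in enumerate(scores, start=1):
        if s >= rank:
            h = rank
        else:
            break
    return h

# With gamma = 4 and delta = 1, an article published in the current year
# (age 1) has its citations counted four times, while a four-year-old
# article has them counted only once.
```
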
