(Open Access) The R- and AR-indices: Complementing the h-index (2007) | Jin Bihui

Chinese Science Bulletin

Springer-Verlag

www.scichina.com www.springerlink.com Chinese Science Bulletin | March 2007 | vol. 52 | no. 6 | 855-863

ARTICLES SCIENTOMETRICS

The R- and AR-indices: Complementing the h-index

JIN BiHui¹†, LIANG LiMing²,³, Ronald ROUSSEAU³,⁴,⁵ & Leo EGGHE³,⁵

¹ National Science Library, Chinese Academy of Sciences, Beijing 100080, China;

² Institute for Science, Technology and Society, Henan Normal University, Xinxiang 453002, China;

³ University of Antwerp (UA), IBW, B-2610 Wilrijk, Belgium;

⁴ KHBO, Industrial Sciences and Technology, B-8400, Oostende, Belgium;

⁵ Universiteit Hasselt (UHasselt), Agoralaan, B-3590 Diepenbeek, Belgium

Based on the foundation laid by the h-index we introduce and study the R- and AR-indices. These new indices eliminate some of the disadvantages of the h-index, especially when they are used in combination with the h-index. The R-index measures the h-core’s citation intensity, while AR goes one step further and takes the age of publications into account. This allows for an index that can actually increase and decrease over time. We propose the pair (h, AR) as a meaningful indicator for research evaluation. We further prove a relation characterizing the h-index in the power law model.

h-index, A-index, R-index, AR-index, g-index, performance evaluation, power law

1 The Hirsch index

The h-index, also known as the Hirsch index, was introduced by Hirsch[1] as an indicator for lifetime achievement. Considering a scientist’s list of publications, ranked according to the number of citations received, the h-index is defined as the highest rank h such that the first h publications each received at least h citations. It soon became clear that the h-index can be used not only for lifetime achievements, but also in the context of many, but not all, other source-item relationships[2,3]. Consequently, the Hirsch index has been calculated for journal citations[2,4], topics[5,6], library loans per category[7], and, pre-dating its actual introduction, even cycling[8]. In this paper we will, however, mainly use the terminology of publications and citations.
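The definition above can be sketched in a few lines of Python. This is only an illustration: the function name `h_index` and the sample citation counts are our own assumptions and are not taken from any data set used in this paper.

```python
def h_index(citations):
    """Return the highest rank h such that the h most-cited items
    each received at least h citations."""
    ranked = sorted(citations, reverse=True)  # most cited first
    h = 0
    for rank, cit in enumerate(ranked, start=1):
        if cit >= rank:
            h = rank
        else:
            break  # ranks only grow and citations only shrink from here on
    return h


# Hypothetical publication list with five articles.
print(h_index([10, 8, 5, 4, 3]))  # → 4
```

The fifth article has only 3 citations, which is fewer than its rank, so the index stops at 4.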

1.1 The Hirsch core

All publications ranked between rank 1 and rank h form the Hirsch core. If there are several publications with the same number of citations, then one may use two approaches to determine the Hirsch core. Either one includes all publications with h citations (hence the Hirsch core may contain more than h elements), or one introduces a secondary criterion for ranking. A good idea is to rank articles with the same number of citations in anti-chronological order, so that more recent articles are more likely to belong to the Hirsch core than older ones. The Hirsch core can be considered as a group of high-performance publications with respect to the scientist’s career. Hence the term ‘high-performance’ should be understood in a relative sense.
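The second approach, ranking ties in anti-chronological order, might be sketched as follows. The representation of a publication as a `(citations, year)` tuple and the name `hirsch_core` are assumptions made for this illustration only.

```python
def hirsch_core(papers):
    """papers: list of (citations, publication_year) tuples.
    Returns the h top-ranked papers; ties in citations are broken
    in favour of the more recent publication."""
    # Primary key: citations, descending. Secondary key: year, descending.
    ranked = sorted(papers, key=lambda p: (-p[0], -p[1]))
    h = 0
    for rank, (cit, _year) in enumerate(ranked, start=1):
        if cit >= rank:
            h = rank
        else:
            break
    return ranked[:h]


# Hypothetical list: three articles are tied at 3 citations and h = 3.
papers = [(9, 1999), (3, 2001), (3, 2005), (3, 1995), (1, 2006)]
print(hirsch_core(papers))  # → [(9, 1999), (3, 2005), (3, 2001)]
```

With this tie-breaking rule the core always contains exactly h elements; here the oldest of the tied articles (1995) is left out.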

1.2 Advantages and disadvantages of the h-index

We recall some advantages and disadvantages of the h-index that have been put forward in the literature[1,9].

Advantages

● It is a mathematically simple index.

● It encourages a large amount of high quality (at least highly visible) work.

● The h-index can be applied to any level of aggregation.

● It combines two types of activity (in the original setting this is citation impact and publications).

Received February 5, 2007; accepted February 26, 2007

doi: 10.1007/s11434-007-0145-9

†Corresponding author (email: jinbh@mail.las.ac.cn)

Supported by a Major State Basic Research Special Program of China (Grant No. 2004CCC00400) and the National Natural Science Foundation of China (Grant No. 70376019)


● It is a robust indicator[10]. Increasing the number of publications alone does not have an immediate effect on this index.

● Single peaks (top publications) have hardly any influence on the h-index.

● In principle, any document type can be included.

● Publications that are hardly ever cited do not influence the h-index. In this way, the h-index discourages publishing unimportant work.

● It has been shown that the h-index is closely correlated to total publication output[1].

Disadvantages

● The h-index, in its original setting[1], puts newcomers at a disadvantage since both publication output and observed citation rates will be relatively low. In other words, it is based on long-term observations.

● The index allows scientists to rest on their laurels since the number of citations received may increase even if no new papers are published.

● The h-index is only useful for comparing the better scientists in a field. It does not discriminate among average scientists.

● This indicator can never decrease.

● The h-index is only weakly sensitive to the number of citations received. Indeed, when a scientist’s h-index is equal to h, then this scientist’s first h articles received at least h times h, i.e. $h^2$, citations. This lower bound is the only relation that logically exists between publications and citations when the h-index is known.

The two previously mentioned disadvantages may be summarized by stating that the h-index lacks sensitivity to performance changes.

Moreover, the h-index suffers from the same problems as all simple indicators that use citations.

● Like most pure citation measures it is field-dependent, and may be influenced by self-citations.

● There is a problem finding reference standards.

● There exist many more versatile indicators[11].

● It is rather difficult to collect all data necessary for the determination of the h-index. Often a scientist’s complete publication list is necessary in order to discriminate between scientists with the same name and initial. We refer to this problem as the precision problem.

It seems that in most applications colleagues have used only Web of Science (WoS) data. Such a practice is not implied by the definition of the h-index, and restricting the data to the WoS punishes colleagues who have highly cited articles in conference proceedings or journals, including web journals, not covered by the WoS.

Although (or because?) the h-index is a relatively simple indicator it immediately attracted a lot of attention from the scientific community[12-18].

1.3 Other h-type indices

In view of the advantages and disadvantages mentioned above it is no surprise that colleagues proposed some simple variations on the h-index idea[19,20], elaborated mathematical models[3,21,22] and proposed some new ‘Hirsch-type’ indices trying to overcome some of the disadvantages. Among these we mention Egghe’s g-index[23,24], Kosmulski’s $H^{(2)}$-index[25] and Jin’s A-index[26].

For the g-index as well as for the $H^{(2)}$-index one draws the same list as for the h-index. The g-index, on the one hand, is defined as the highest rank such that the cumulative sum of the number of citations received is larger than or equal to the square of this rank. Clearly h ≤ g. The $H^{(2)}$-index, on the other hand, is k if k is the highest rank such that the first k publications each received at least $k^2$ citations. The main advantage of this index is that it reduces the precision problem. We think, however, that this index is not sensitive enough[7] and will not consider it further in this article. The g-index clearly overcomes the problem that the h-index does not include an indicator for the internal changes of the Hirsch core. Yet, it requires drawing a longer list than necessary for the h-index, hence increasing the precision problem.
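Both definitions can be illustrated with a short Python sketch. The function names are ours, and the sketch assumes that g cannot exceed the number of publications in the list, i.e. no fictitious zero-citation articles are added.

```python
from itertools import accumulate


def g_index(citations):
    """Highest rank g such that the g most-cited items together
    received at least g**2 citations (g limited to the list length)."""
    ranked = sorted(citations, reverse=True)
    g = 0
    for rank, total in enumerate(accumulate(ranked), start=1):
        if total >= rank * rank:
            g = rank
    return g


def h2_index(citations):
    """Highest rank k such that the k most-cited items each
    received at least k**2 citations."""
    ranked = sorted(citations, reverse=True)
    k = 0
    for rank, cit in enumerate(ranked, start=1):
        if cit >= rank * rank:
            k = rank
        else:
            break
    return k


cits = [10, 8, 5, 4, 3]  # hypothetical data; the h-index of this list is 4
print(g_index(cits))   # → 5
print(h2_index(cits))  # → 2
```

The example also shows why the g-index needs a longer list than the h-index: the fifth article, which lies outside the Hirsch core, contributes to g.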

1.4 The A-index and the new R-index

Jin’s A-index achieves the same goal as the g-index, namely correcting for the fact that the original h-index does not take the exact number of citations of articles included in the h-core into account. This index is simply defined as the average number of citations received by the publications included in the Hirsch core. The name of this index is derived from the fact that it is just an average (A). Mathematically, this is

$$A = \frac{1}{h}\sum_{j=1}^{h} \mathrm{cit}_j \tag{1}$$

In formula (1) the numbers of citations ($\mathrm{cit}_j$) are ranked in decreasing order. Note that, as long as the Hirsch core contains exactly h elements, the A-index is unambiguously defined. The A-index, moreover, uses the same data as the h-index so that the precision problem is exactly the same as for the original h-index, and is not increased as in the case of the g-index. Clearly h ≤ A. Yet, the A-index suffers from another problem, illustrated by the following fictitious case. Assume that scientist X₁ has published 20 articles, one cited 10 times and all other ones just once. Scientist X₂ has published 30 articles, one cited 10 times and all other ones exactly twice. Clearly, scientist X₂ is the better one. This is expressed by their h-indices, which are 1 for X₁ and 2 for X₂. Yet their A-indices are 10 for X₁ and 6 for X₂. The better scientist is ‘punished’ for having a higher h-index, as the A-index involves a division by h. This is, however, only a small problem which can easily be solved by simply taking the sum or the square root of the sum. Taking the square root has the advantage of leading to indicator values which are not very high and of the same dimension as the A-index. As this new index is calculated using a (square) root we refer to it as the R-index.

As a mathematical formula the R-index is defined as

$$R = \sqrt{\sum_{j=1}^{h} \mathrm{cit}_j} \tag{2}$$

Clearly, $R = \sqrt{A \cdot h}$. In general one may write R(X,Y), where X denotes a particular scientist and Y the year for which the R-index has been calculated. As this is of no importance in our investigations we omit the symbols X and Y. It is clear that h ≤ R as each $\mathrm{cit}_j$ is at least equal to h. In the special case where each $\mathrm{cit}_j$ is exactly equal to h, R = h. This nice result is another advantage of using the square root of the sum, and not the sum itself.
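The fictitious case of the two scientists can be reproduced with a small Python sketch. The helper names are our own, and the Hirsch core is taken to contain exactly h elements.

```python
from math import sqrt


def h_core_citations(citations):
    """Citation counts of the h most-cited items (the Hirsch core)."""
    ranked = sorted(citations, reverse=True)
    # In a decreasing list the test cit >= rank holds for ranks 1..h only.
    h = sum(1 for rank, cit in enumerate(ranked, start=1) if cit >= rank)
    return ranked[:h]


def a_index(citations):
    """Average number of citations in the Hirsch core, formula (1)."""
    core = h_core_citations(citations)
    return sum(core) / len(core) if core else 0.0


def r_index(citations):
    """Square root of the number of citations in the Hirsch core, formula (2)."""
    return sqrt(sum(h_core_citations(citations)))


x1 = [10] + [1] * 19  # 20 articles: one cited 10 times, the others once
x2 = [10] + [2] * 29  # 30 articles: one cited 10 times, the others twice

print(len(h_core_citations(x1)), a_index(x1), round(r_index(x1), 3))  # → 1 10.0 3.162
print(len(h_core_citations(x2)), a_index(x2), round(r_index(x2), 3))  # → 2 6.0 3.464
```

The A-index ranks the first scientist higher, whereas the R-index, like the h-index, ranks the second scientist higher.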

1.5 Further relations between h, A, R and g

We have already observed that h ≤ g, h ≤ A and that $R = \sqrt{A \cdot h}$. Now we show one less obvious relation between A and g, and hence between h, A, R and g.

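Proposition 1. The following inequalities always hold:

$$A \ge g \ge h. \tag{3}$$

Proof. The last inequality is already known. Now

$$A = \frac{1}{h}\sum_{j=1}^{h} \mathrm{cit}_j \ge \frac{1}{g}\sum_{j=1}^{g} \mathrm{cit}_j .$$

This inequality holds because h ≤ g and the citations ($\mathrm{cit}_j$) are ranked in decreasing order, hence the average number of citations of the first m articles is a decreasing function of m. As the g-index satisfies the relation $\sum_{j=1}^{g} \mathrm{cit}_j \ge g^2$, we obtain

$$A \ge \frac{1}{g}\sum_{j=1}^{g} \mathrm{cit}_j \ge g.$$

This proves the first inequality in line (3).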

The following corollary, involving the four indices under study, follows immediately.

Corollary.

$$R = \sqrt{A \cdot h} \ge \sqrt{g \cdot h} \ge h. \tag{4}$$

In practice the R-index is correlated to the h-index (see further) but, especially for individual scientists, does add another view on a scientist’s achievements.
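As a sanity check, inequalities (3) and (4) can be tested numerically on randomly generated citation lists. The sketch below is not part of the proof; the names `indices` and `check_chain`, the seed, and the ranges of the random data are arbitrary assumptions. Every article gets at least one citation so that h ≥ 1 and A is defined, and g is limited to the length of the list.

```python
import random
from itertools import accumulate
from math import isclose, sqrt


def indices(citations):
    """Return (h, g, A, R) for a list of positive citation counts."""
    ranked = sorted(citations, reverse=True)
    h = sum(1 for rank, cit in enumerate(ranked, start=1) if cit >= rank)
    g = sum(1 for rank, total in enumerate(accumulate(ranked), start=1)
            if total >= rank * rank)
    core_sum = sum(ranked[:h])
    return h, g, core_sum / h, sqrt(core_sum)


def check_chain(trials=1000, seed=0):
    """Check A >= g >= h and R = sqrt(A*h) >= sqrt(g*h) >= h on random lists."""
    rng = random.Random(seed)
    for _ in range(trials):
        cits = [rng.randint(1, 60) for _ in range(rng.randint(1, 40))]
        h, g, a, r = indices(cits)
        if not (a >= g >= h):
            return False
        if not isclose(r, sqrt(a * h)):
            return False
        if not (r >= sqrt(g * h) - 1e-12 and sqrt(g * h) >= h):
            return False
    return True


print(check_chain())
```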

1.6 Relations between h, A, R and g in the power law model

In this section we show how the four indices h, A, R and g are related in the power law model. The power law model[27] assumes that the number of sources producing x items, e.g. authors’ articles receiving citations, is given by the function

$$F: [1, +\infty[ \, \to \, ]0, C]: x \to F(x) = \frac{C}{x^{\alpha}}. \tag{5}$$

In eq. (5) C is a strictly positive constant, and α > 1. Equivalently[27], the corresponding rank-frequency function (number of citations received by the article ranked r) is given by the function G:

$$G: \, ]0, T] \to [1, +\infty[: r \to G(r) = \frac{B}{r^{\beta}} \tag{6}$$

with B, β > 0. The relation between the parameters α and β is

$$\beta = \frac{1}{\alpha - 1}. \tag{7}$$

In the power law model the four Hirsch-type indices are defined as follows: h is the unique solution of $r = G(r)$, g is the unique solution of $r^2 = \int_0^r G(s)\,ds$,

$$A = \frac{1}{h}\int_0^h G(r)\,dr \quad \text{and} \quad R = \sqrt{\int_0^h G(r)\,dr}.$$

Note that we do not claim that actual sources follow a power law: we just apply this model as a first approximation of an observed frequency distribution. Assuming further that α > 2, we prove the following proposition.

Proposition 2. Assuming a power law model as described above with α > 2 or equivalently 0 < β < 1, we have

$$A = \frac{\alpha-1}{\alpha-2}\,h \quad \text{and} \quad R = \sqrt{\frac{\alpha-1}{\alpha-2}}\;h, \tag{8}$$

$$A = \left(\frac{\alpha-1}{\alpha-2}\right)^{1/\alpha} g \quad \text{and} \quad R = \left(\frac{\alpha-1}{\alpha-2}\right)^{1/(2\alpha)} (g\,h)^{1/2}. \tag{9}$$

Proof. A is defined as the average number of citations received by publications belonging to the Hirsch core. Hence

$$A = \frac{1}{h}\int_0^h B\,r^{-\beta}\,dr = \frac{B\,h^{-\beta}}{1-\beta}.$$

Now $h = B^{1/(\beta+1)}$ (by Theorem C)[3], and by eq. (7) this result implies that

$$A = \frac{B\,h^{-\beta}}{1-\beta} = \frac{h^{\beta+1}\,h^{-\beta}}{1-\beta} = \frac{h}{1-\beta} = \frac{\alpha-1}{\alpha-2}\,h.$$

This proves the first equality of line (8). It is further shown by Egghe[16] that

$$g = \left(\frac{\alpha-1}{\alpha-2}\right)^{(\alpha-1)/\alpha} h. \tag{10}$$

Eliminating h from eqs. (8) and (10) yields the first equality of line (9). The corresponding relations for R then follow easily from those for A.

Remark 1. As α > 2, eqs. (8) and (9) imply that A and R are always larger than h. Moreover, A > g, while R is larger than the geometric average of h and g; this follows from the fact that for α > 2, $\frac{\alpha-1}{\alpha-2} > 1$. Note that the power law model yields the same inequalities as in the discrete case.

Remark 2. Eq. (8) or eq. (9) does not prove that h and A, or h and R, are linearly related. The reason is that in the power law model $h = T^{1/\alpha}$, where T is the total number of sources (here the total number of publications). Hence the factor $\frac{\alpha-1}{\alpha-2}$ cannot be considered as a constant.

Finally we prove a very remarkable relation, characterizing the h-index in the power law model.

Characterization Theorem. Assuming a power law model as described above with α > 2 and denoting by μ the average production (here: total number of citations divided by the total number of articles in the author’s publication list) the following relations hold:

$$A = \mu\,h \quad \text{and} \quad R = \sqrt{\mu}\;h. \tag{11}$$

Proof. Eqs. (11) follow immediately from eqs. (8) and the fact that, in the power law model, $\mu = \frac{\alpha-1}{\alpha-2}$ (as shown on page 115 of ref. [27]).

This result shows that in the power law model h is the unique number N such that the average number of citations of the first N publications is equal to the global average multiplied by N. Uniqueness follows from the fact that the average number of citations of the first N publications is a decreasing function of N, while μN is an increasing function of N.
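The relations of this section can be illustrated numerically. The following Python sketch evaluates the continuous definitions of h, g, A, R and μ for a given α and T, using the closed-form integral of $G(r) = B\,r^{-\beta}$. It assumes that the last-ranked source has exactly one item, $G(T) = 1$, so that $B = T^{\beta}$; this normalization, the function name and the sample values α = 3 and T = 1000 are assumptions made for the illustration.

```python
from math import isclose, sqrt


def power_law_indices(alpha, T):
    """Return (h, g, A, R, mu) in the continuous power law model, alpha > 2."""
    beta = 1.0 / (alpha - 1.0)   # eq. (7)
    B = T ** beta                # assumption: G(T) = 1

    def integral_G(r):
        # Integral of B * s**(-beta) over ]0, r]; finite because beta < 1.
        return B * r ** (1.0 - beta) / (1.0 - beta)

    h = B ** (1.0 / (1.0 + beta))                    # solves r = G(r)
    g = (B / (1.0 - beta)) ** (1.0 / (1.0 + beta))   # solves r**2 = integral_G(r)
    A = integral_G(h) / h
    R = sqrt(integral_G(h))
    mu = integral_G(T) / T       # total items divided by total sources
    return h, g, A, R, mu


alpha, T = 3.0, 1000.0
h, g, A, R, mu = power_law_indices(alpha, T)
ratio = (alpha - 1.0) / (alpha - 2.0)

print(isclose(h, T ** (1.0 / alpha)))                                # h = T^(1/alpha)
print(isclose(A, ratio * h), isclose(R, sqrt(ratio) * h))            # eq. (8)
print(isclose(A, ratio ** (1.0 / alpha) * g))                        # eq. (9), first part
print(isclose(R, ratio ** (1.0 / (2.0 * alpha)) * sqrt(g * h)))      # eq. (9), second part
print(isclose(g, ratio ** ((alpha - 1.0) / alpha) * h))              # eq. (10)
print(isclose(A, mu * h), isclose(R, sqrt(mu) * h))                  # eq. (11)
```

For α = 3 and T = 1000 this gives h close to 10, μ close to 2 and A close to 20, so that A = μh as stated by the Characterization Theorem.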

1.7 The h-, A-, R- and g-indices are highly correlated in practice

Notwithstanding remark 2 above, we think that in most practical cases the four Hirsch-type indices h, A, R and g are linearly correlated. Indeed, they more or less use the same, highly restricted, data set, and this with similar objectives. In order to investigate this we study in this section a number of practical cases.

Using Egghe’s data for Price awardees[16] we calculated the A- and the R-index of each of these colleagues. We did the same for publications in the WoS of a number of physics, chemistry and biology subfields (1996-2005) and of the contribution of four large national research institutes in the WoS (2001-2005). Data were obtained from the China in World Science Series[28-30]. Details of the calculations can be found in the Appendices. Table 1 shows the observed Pearson correlation coefficients (CCs).

Table 1 Correlation coefficients between R and g

Data set                        CC (R vs. g)    CC (R/h vs. g/h)
Price awardees                  0.998           0.995
Chemistry subfields             0.999           0.998
Biology subfields               0.999           0.997
Physics subfields               0.999           0.998
CAS physics subfields           0.999           0.995
Max Planck physics subfields    0.999           0.997
CNRS physics subfields          0.998           0.995
RAS physics subfields           0.991           0.959

These data speak for themselves: there is no doubt that the R-index and the g-index are highly correlated in practice. The same observation holds for the ratios R/h and g/h. A similar remark (not shown) holds for A and g, but with slightly smaller correlations. We further observe that the CC between R and g is always higher than that between R and h or g and h. The latter two are very similar (see appendices for details).

1.8 A preliminary conclusion

It seems that the g-index and R-index are highly correlated while the latter has a computational advantage. Yet, as a stand-alone index R may be overly sensitive to one article receiving an extremely high number of citations[31]. In the extreme case one may encounter a scientist with an h-index of 1 and an R-index of 10 (any high number). This observation similarly applies to the g-index (in particular when fictitious articles with zero citations are added[16]). For this reason we suggest using the R-index in conjunction with h. Consequently we propose, as a preliminary conclusion, the pair (h, R) as a good indicator for research evaluation. For practical evaluation purposes applying time windows, e.g. a 5-year window, seems advisable. Moreover, the ratio R/h might be an interesting indicator in its own right.

2 An age-dependent indicator: The AR-index

In order to overcome the problem that the h-index may never decrease and that scientists may, so to speak, ‘rest on their laurels’ we propose the following adaptation of the R-index[32].

2.1 Definition: the AR-index

If $a_j$ denotes the age of article j we define the age-dependent R-index, denoted by AR, by the following equation:

$$AR = \sqrt{\sum_{j=1}^{h} \frac{\mathrm{cit}_j}{a_j}}. \tag{12}$$

If there are several publications with exactly h citations then we include the most recent ones in the h-core. This means that we include those with the more favorable (cit/a) ratio.

Advantages of the AR-index are clear. Besides taking the actual number of citations into account, it also makes use of the age of the publications. In this way, the h-index is complemented by an index that can actually decrease. Such behavior is, in our opinion, a necessary condition for a good research evaluation indicator. We note, moreover, that the AR-index is based on the h-index as it makes use of the h-core. For the AR-index the inequality h ≤ AR is not necessarily true anymore, contrary to the corresponding relation involving the R-index (see eq. (4)). We note that calculation of the AR-index only requires the age of the publications in the h-core, besides the data necessary for the calculation of the h-index. This does not make the calculation of the AR-index more difficult than that of the h-index. Note that for source-item relations where age has no meaning this indicator just does not apply. This is somewhat similar to the h-index, which also does not apply for all possible source-item relations.

Favorable points concerning the R-index also apply here. Hence we propose the pair (h, AR) as a good indicator for research evaluation.
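A minimal Python sketch of eq. (12) is given below. The representation of a publication as a `(citations, age)` tuple, the function name and the sample data are assumptions made for this illustration; ties in citations are resolved in favour of the younger article, as described above.

```python
from math import sqrt


def ar_index(papers):
    """papers: list of (citations, age_in_years) tuples, with age > 0.
    Returns the AR-index of eq. (12)."""
    # Rank by citations (descending); among ties the younger article comes first.
    ranked = sorted(papers, key=lambda p: (-p[0], p[1]))
    h = sum(1 for rank, (cit, _age) in enumerate(ranked, start=1) if cit >= rank)
    return sqrt(sum(cit / age for cit, age in ranked[:h]))


# Hypothetical scientist with h = 3 who receives no new citations for one year.
now = [(16, 4.0), (9, 9.0), (4, 2.0), (2, 1.0)]
one_year_later = [(cit, age + 1.0) for cit, age in now]

print(ar_index(now))             # about 2.65
print(ar_index(one_year_later))  # about 2.33
```

In this example the h-index stays at 3 while the AR-index drops, which is exactly the behavior the pair (h, AR) is meant to capture.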

2.2 An example

We calculated the AR-index over several years for the articles written by B.C. Brookes after 1971 (WoS publication and citation data on January 1, 2007). Recall that B.C. Brookes was a Price awardee in 1989. He died in 1991. Results are shown in Table 2.

Table 2 Evolution of B.C. Brookes’ AR-index

Year    R-index    AR-index
2002    18.60      3.93
2003    18.81      3.89
2004    18.97      3.84
2005    19.13      3.79
2006    19.34      3.76
2007    19.54      3.73

Brookes’ h-index over the whole period (2002-2007) stays fixed at h = 12 (hence here h > AR). Between 2002 and 2007 his R-index increased by about 5% while the AR-index decreased by about 5%. A year written in the first column of Table 2 stands for January 1 of that year. The (average) age of an article on January 1 of year Y is (k − 0.5) if the article is published during the year Y − k. Indeed, if an article is published during the year Y − 3 then it is, on January 1 of the year Y, at least two years and at most three years old. On average it is 2.5 years old[33]. This is how we calculated the average age of an article in order to obtain the AR-index. Figure 1 illustrates the decrease of Brookes’ AR-index over the latest years.

Figure 1 Decrease of Brookes’ AR-index.
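The age convention used for Table 2 can be written as a one-line helper. The function name `article_age` is our own, and the publication years in the example are hypothetical.

```python
def article_age(publication_year, evaluation_year):
    """Average age, on January 1 of evaluation_year, of an article
    published during publication_year: (k - 0.5) with k the year difference."""
    k = evaluation_year - publication_year
    if k < 1:
        raise ValueError("the article must be published before the evaluation year")
    return k - 0.5


print(article_age(2004, 2007))  # → 2.5
print(article_age(1980, 2007))  # → 26.5
```

Under this convention every article in the h-core ages by exactly one year between two consecutive rows of Table 2, so a term cit/a of eq. (12) shrinks unless the article gains enough new citations to compensate.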
