Journal Article

An index to quantify an individual's scientific research output

15 Nov 2005 - Proceedings of the National Academy of Sciences of the United States of America (National Academy of Sciences), Vol. 102, Iss. 46, pp. 16569-16572
TL;DR: The index h, defined as the number of papers with citation number ≥h, is proposed as a useful index to characterize the scientific output of a researcher.
Abstract: I propose the index h, defined as the number of papers with citation number ≥h, as a useful index to characterize the scientific output of a researcher.
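A minimal sketch of the definition in the abstract, for illustration only (it is not code from the paper; the function name `h_index` and the sample citation counts are hypothetical). Sort a researcher's citation counts in descending order; h is the last rank at which the paper's citation count is still at least its rank.

```python
def h_index(citations: list[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break  # every later paper has fewer citations than its rank
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([25, 8, 5, 3, 3]))  # 3
```

The second call shows that a single very highly cited paper does not raise h on its own: the index is capped by how many papers reach the threshold, not by the largest count.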


Citations
Book
05 Jul 2012
TL;DR: Data matching (also known as record or data linkage, entity resolution, object identification, or field matching) is the task of identifying, matching and merging records that correspond to the same entities from several databases or even within one database.
Abstract: Data matching (also known as record or data linkage, entity resolution, object identification, or field matching) is the task of identifying, matching and merging records that correspond to the same entities from several databases or even within one database. Based on research in various domains including applied statistics, health informatics, data mining, machine learning, artificial intelligence, database management, and digital libraries, significant advances have been achieved over the last decade in all aspects of the data matching process, especially on how to improve the accuracy of data matching, and its scalability to large databases. Peter Christen's book is divided into three parts: Part I, Overview, introduces the subject by presenting several sample applications and their special challenges, as well as a general overview of a generic data matching process. Part II, Steps of the Data Matching Process, then details its main steps like pre-processing, indexing, field and record comparison, classification, and quality evaluation. Part III, Further Topics, deals with specific aspects like privacy, real-time matching, or matching unstructured data. Finally, the book briefly describes the main features of many research and open source systems available today. By providing the reader with a broad range of data matching concepts and techniques and touching on all aspects of the data matching process, this book helps researchers as well as students specializing in data quality or data matching aspects to familiarize themselves with recent research advances and to identify open research challenges in the area of data matching. To this end, each chapter of the book includes a final section that provides pointers to further background and research material. Practitioners will better understand the current state of the art in data matching as well as the internal workings and limitations of current systems. In particular, they will learn that it is often not feasible to simply implement an existing off-the-shelf data matching system without substantial adaptation and customization. Such practical considerations are discussed for each of the major steps in the data matching process.
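The abstract above names the main steps of the data matching process. The toy pipeline below is a minimal sketch of four of them (pre-processing, indexing, field and record comparison, classification), written for illustration and not taken from the book. The blocking key (first character of the first field), the character-bigram Jaccard similarity, the 0.7 threshold and all function names are hypothetical choices.

```python
from collections import defaultdict
from itertools import combinations


def normalise(value: str) -> str:
    # Pre-processing: lower-case and keep only letters, digits and spaces.
    return "".join(ch for ch in value.lower() if ch.isalnum() or ch == " ").strip()


def bigrams(text: str) -> set[str]:
    return {text[i:i + 2] for i in range(len(text) - 1)}


def similarity(a: str, b: str) -> float:
    # Field comparison: Jaccard similarity of character bigrams.
    ga, gb = bigrams(a), bigrams(b)
    if not ga and not gb:
        return 1.0 if a == b else 0.0
    return len(ga & gb) / len(ga | gb)


def match(records: list[dict[str, str]], fields: list[str], threshold: float = 0.7):
    cleaned = [{f: normalise(r[f]) for f in fields} for r in records]
    # Indexing (blocking): only compare records whose first field starts alike.
    blocks: dict[str, list[int]] = defaultdict(list)
    for i, rec in enumerate(cleaned):
        blocks[rec[fields[0]][:1]].append(i)
    matches = []
    for ids in blocks.values():
        for i, j in combinations(ids, 2):
            # Record comparison: average the per-field similarities.
            score = sum(similarity(cleaned[i][f], cleaned[j][f]) for f in fields) / len(fields)
            # Classification: a single threshold separates matches from non-matches.
            if score >= threshold:
                matches.append((i, j, round(score, 3)))
    return matches


records = [
    {"name": "Peter Christen", "city": "Canberra"},
    {"name": "peter christen.", "city": "Canberra"},
    {"name": "Jorge Hirsch", "city": "San Diego"},
]
print(match(records, ["name", "city"]))  # [(0, 1, 1.0)]
```

Blocking trades recall for speed: pairs that land in different blocks are never compared, which is why the choice of blocking key matters for scalability to large databases.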

713 citations

Journal Article
TL;DR: This paper compares the h-indices of a list of highly cited Israeli researchers computed from citation counts retrieved from the Web of Science, Scopus and Google Scholar, and finds that in several cases the Google Scholar results differ considerably from those based on the Web of Science and Scopus.
Abstract: This paper compares the h-indices of a list of highly-cited Israeli researchers based on citation counts retrieved from the Web of Science, Scopus and Google Scholar respectively. In several cases the results obtained through Google Scholar are considerably different from the results based on the Web of Science and Scopus. Data cleansing is discussed extensively.

672 citations


Cites methods from "An index to quantify an individual'..."

  • ...The new bibliometric measure, the h-index, was introduced by Jorge Hirsch in August 2005 [HIRSCH, 2005A, B], and it is defined as follows: A scientist has index h if h of his/her Np papers have at least h citations each, and the other (Np − h) papers have no more than h citations each [HIRSCH, 2005B: 16569]....

  • ...In this paper we tried to provide a partial answer by considering the h-indexes [HIRSCH, 2005A, B] of a group of highly cited researchers based on each of the three citation databases....

Journal Article
TL;DR: SciMAT is a new open-source software tool that performs science mapping analysis within a longitudinal framework and provides different modules that help the analyst carry out all the steps of the science mapping workflow.
Abstract: This article presents a new open-source software tool, SciMAT, which performs science mapping analysis within a longitudinal framework. It provides different modules that help the analyst to carry out all the steps of the science mapping workflow. In addition, SciMAT presents three key features that are remarkable with respect to other science mapping software tools: (a) a powerful preprocessing module to clean the raw bibliographical data, (b) the use of bibliometric measures to study the impact of each studied element, and (c) a wizard to configure the analysis. © 2012 Wiley Periodicals, Inc.

660 citations


Cites background or methods from "An index to quantify an individual'..."

  • ...Bibliometric measures (mainly based on citations) such as the h-index (Alonso et al., 2009; Hirsch, 2005), g-index (Egghe, 2006), hg-index (Alonso et al....

  • ...In this sense, SciMAT provides several bibliometric measures based on citations, such as the sum, minimum, maximum, and average citations, or complex measures such as the h-index (Alonso et al., 2009; Hirsch, 2005), g-index (Egghe, 2006), hg-index (Alonso et al....

  • ...In this sense, basic measures such as the sum, minimum, maximum, and average citations or complex measures such as the h-index (Alonso et al., 2009; Hirsch, 2005), g-index (Egghe, 2006), hg-index (Alonso et al., 2010), or q2-index (Cabrerizo et al., 2010) can be used, even simultaneously....

  • ...…et al. (2011a), the performance analysis uses bibliometric measures and indicators (based on citations), such as the h-index (Alonso et al., 2009; Hirsch, 2005), g-index (Egghe, 2006), hg-index (Alonso et al., 2010), or q2-index (Cabrerizo et al., 2010) to quantify the importance, impact, and…...
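The snippets above list several citation-based measures alongside the h-index. As a hedged sketch (not SciMAT's implementation), they can be computed from one list of citation counts: the g-index is the largest g whose g most-cited papers have at least g² citations in total, the hg-index is the geometric mean of h and g, and the q2-index is the geometric mean of h and the median citation count of the h-core (the h most-cited papers). This sketch caps g at the number of papers; some formulations let g exceed it by appending uncited papers. Function names and the sample data are hypothetical.

```python
import math
from statistics import median


def h_index(citations: list[int]) -> int:
    # Ranks where citations >= rank form a prefix of the sorted list.
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def g_index(citations: list[int]) -> int:
    """Largest g such that the g most-cited papers have >= g**2 citations in total."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g


def hg_index(citations: list[int]) -> float:
    # hg-index: geometric mean of h and g.
    return math.sqrt(h_index(citations) * g_index(citations))


def q2_index(citations: list[int]) -> float:
    # q2-index: geometric mean of h and the median citations of the h-core.
    h = h_index(citations)
    if h == 0:
        return 0.0
    core = sorted(citations, reverse=True)[:h]
    return math.sqrt(h * median(core))


cites = [10, 8, 5, 4, 3]
print(h_index(cites), g_index(cites))  # 4 5
print(round(hg_index(cites), 3))       # 4.472
print(round(q2_index(cites), 3))       # 5.099
```

In the example the g-index (5) exceeds the h-index (4) because the g-index gives credit for the surplus citations of the top papers, which the h-index ignores.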

Journal Article
TL;DR: Several actions could improve the research landscape: developing a common evaluation framework, agreement on the information to include in research papers, a stronger focus on non-accuracy aspects and user modeling, a platform for researchers to exchange information, and an open-source framework that bundles the available recommendation approaches.
Abstract: In the last 16 years, more than 200 research articles were published about research-paper recommender systems. We reviewed these articles and present some descriptive statistics in this paper, as well as a discussion about the major advancements and shortcomings and an overview of the most common recommendation concepts and approaches. We found that more than half of the recommendation approaches applied content-based filtering (55 %). Collaborative filtering was applied by only 18 % of the reviewed approaches, and graph-based recommendations by 16 %. Other recommendation concepts included stereotyping, item-centric recommendations, and hybrid recommendations. The content-based filtering approaches mainly utilized papers that the users had authored, tagged, browsed, or downloaded. TF-IDF was the most frequently applied weighting scheme. In addition to simple terms, n-grams, topics, and citations were utilized to model users' information needs. Our review revealed some shortcomings of the current research. First, it remains unclear which recommendation concepts and approaches are the most promising. For instance, researchers reported different results on the performance of content-based and collaborative filtering. Sometimes content-based filtering performed better than collaborative filtering and sometimes it performed worse. We identified three potential reasons for the ambiguity of the results. (A) Several evaluations had limitations: they used strongly pruned datasets, had few participants in user studies, or did not use appropriate baselines. (B) Some authors provided little information about their algorithms, which makes it difficult to re-implement the approaches. Consequently, researchers use different implementations of the same recommendation approaches, which might lead to variations in the results.
(C) We speculated that minor variations in datasets, algorithms, or user populations inevitably lead to strong variations in the performance of the approaches. Hence, finding the most promising approaches is a challenge. As a second limitation, we noted that many authors neglected to take into account factors other than accuracy, for example overall user satisfaction. In addition, most approaches (81 %) neglected the user-modeling process and did not infer information automatically but let users provide keywords, text snippets, or a single paper as input. Information on runtime was provided for 10 % of the approaches. Finally, few research papers had an impact on research-paper recommender systems in practice. We also identified a lack of authority and long-term research interest in the field: 73 % of the authors published no more than one paper on research-paper recommender systems, and there was little cooperation among different co-author groups. We concluded that several actions could improve the research landscape: developing a common evaluation framework, agreement on the information to include in research papers, a stronger focus on non-accuracy aspects and user modeling, a platform for researchers to exchange information, and an open-source framework that bundles the available recommendation approaches.
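The review reports that content-based filtering was the most common approach and TF-IDF the most frequent weighting scheme. The sketch below combines the two in the simplest way, for illustration only: it assumes whitespace tokenisation, length-normalised term frequency, a natural-log IDF, and a user model that sums the vectors of papers the user authored or downloaded. None of these choices, nor the function names, come from the review.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Term frequency x inverse document frequency for whitespace-tokenised docs."""
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty docs
        vectors.append({t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()})
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(user_papers: list[int], docs: list[str], k: int = 2) -> list[int]:
    vectors = tfidf_vectors(docs)
    # User model: sum of the TF-IDF vectors of the user's own papers.
    profile: dict[str, float] = {}
    for i in user_papers:
        for term, weight in vectors[i].items():
            profile[term] = profile.get(term, 0.0) + weight
    candidates = [i for i in range(len(docs)) if i not in user_papers]
    # Rank unseen papers by cosine similarity to the profile (ties broken by index).
    scores = {i: cosine(profile, vectors[i]) for i in candidates}
    return sorted(candidates, key=lambda i: (-scores[i], i))[:k]


papers = [
    "citation index h index researchers",
    "h index citation analysis",
    "deep learning image classification",
    "collaborative filtering recommender",
]
print(recommend([0], papers, k=1))  # [1]
```

Because the user model is built only from the text of the user's own papers, this approach needs no data about other users, which is one practical reason content-based filtering is easy to deploy.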

648 citations


Cites methods from "An index to quantify an individual'..."

  • ...Some of the measures (h-index [232], co-citation strength [233] and bibliographic coupling strength [234]) have also been applied by research-paper recommender systems [13,123,...
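Co-citation strength and bibliographic coupling strength, mentioned in the snippet above, are two ways of counting overlap in a citation graph: two papers are co-cited once for every third paper that cites both, and two papers are bibliographically coupled by the number of references they share. A minimal sketch, assuming the graph is given as a hypothetical `references` mapping from each citing paper to the set of papers it cites:

```python
from collections import Counter
from itertools import combinations


def cocitation_strength(references: dict[str, set[str]]) -> Counter:
    """For each pair of cited papers, count how many citing papers cite both."""
    counts: Counter = Counter()
    for cited in references.values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return counts


def coupling_strength(references: dict[str, set[str]]) -> dict[tuple[str, str], int]:
    """For each pair of citing papers, count how many references they share."""
    counts = {}
    for a, b in combinations(sorted(references), 2):
        shared = len(references[a] & references[b])
        if shared:
            counts[(a, b)] = shared
    return counts


refs = {"P1": {"A", "B"}, "P2": {"A", "B", "C"}, "P3": {"C"}}
print(dict(cocitation_strength(refs)))  # {('A', 'B'): 2, ('A', 'C'): 1, ('B', 'C'): 1}
print(coupling_strength(refs))          # {('P1', 'P2'): 2, ('P2', 'P3'): 1}
```

The two measures look in opposite directions: coupling is fixed once a paper's reference list is published, while co-citation strength keeps changing as new papers cite old ones.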

Journal Article
02 Mar 2018 - Science
TL;DR: The science of science (SciSci) provides a quantitative understanding of the interactions among scientific agents across diverse geographic and temporal scales, providing insights into the conditions underlying creativity and the genesis of scientific discovery, with the ultimate goal of developing tools and policies that have the potential to accelerate science.
Abstract:
BACKGROUND: The increasing availability of digital data on scholarly inputs and outputs (from research funding, productivity, and collaboration to paper citations and scientist mobility) offers unprecedented opportunities to explore the structure and evolution of science. The science of science (SciSci) offers a quantitative understanding of the interactions among scientific agents across diverse geographic and temporal scales: It provides insights into the conditions underlying creativity and the genesis of scientific discovery, with the ultimate goal of developing tools and policies that have the potential to accelerate science. In the past decade, SciSci has benefited from an influx of natural, computational, and social scientists who together have developed big data–based capabilities for empirical analysis and generative modeling that capture the unfolding of science, its institutions, and its workforce. The value proposition of SciSci is that with a deeper understanding of the factors that drive successful science, we can more effectively address environmental, societal, and technological problems.
ADVANCES: Science can be described as a complex, self-organizing, and evolving network of scholars, projects, papers, and ideas. This representation has unveiled patterns characterizing the emergence of new scientific fields through the study of collaboration networks and the path of impactful discoveries through the study of citation networks. Microscopic models have traced the dynamics of citation accumulation, allowing us to predict the future impact of individual papers. SciSci has revealed choices and trade-offs that scientists face as they advance both their own careers and the scientific horizon. For example, measurements indicate that scholars are risk-averse, preferring to study topics related to their current expertise, which constrains the potential of future discoveries. Those willing to break this pattern engage in riskier careers but become more likely to make major breakthroughs. Overall, the highest-impact science is grounded in conventional combinations of prior work but features unusual combinations. Last, as the locus of research is shifting into teams, SciSci is increasingly focused on the impact of team research, finding that small teams tend to disrupt science and technology with new ideas drawing on older and less prevalent ones. In contrast, large teams tend to develop recent, popular ideas, obtaining high, but often short-lived, impact.
OUTLOOK: SciSci offers a deep quantitative understanding of the relational structure between scientists, institutions, and ideas because it facilitates the identification of fundamental mechanisms responsible for scientific discovery. These interdisciplinary data-driven efforts complement contributions from related fields such as scientometrics and the economics and sociology of science. Although SciSci seeks long-standing universal laws and mechanisms that apply across various fields of science, a fundamental challenge going forward is accounting for undeniable differences in culture, habits, and preferences between different fields and countries. This variation makes some cross-domain insights difficult to appreciate and associated science policies difficult to implement. The differences among the questions, data, and skills specific to each discipline suggest that further insights can be gained from domain-specific SciSci studies, which model and identify opportunities adapted to the needs of individual research fields.

630 citations

References
Journal Article
TL;DR: The authors propose the stretched exponential family as a complement to the often-used power-law distributions; its advantages include economy, with only two adjustable parameters that have a clear physical interpretation.
Abstract: To account quantitatively for many reported “natural” fat tail distributions in Nature and Economy, we propose the stretched exponential family as a complement to the often-used power-law distributions. It has many advantages, among them economy: it has only two adjustable parameters, each with a clear physical interpretation. Furthermore, it derives from a simple and generic mechanism in terms of multiplicative processes. We show that stretched exponentials describe very well the distributions of radio and light emissions from galaxies, of US GOM OCS oilfield reserve sizes, of World, US and French agglomeration sizes, of country population sizes, of daily Forex US-Mark and Franc-Mark price variations, of Vostok (near the south pole) temperature variations over the last 400 000 years, of the Raup-Sepkoski's kill curve and of citations of the most cited physicists in the world. We also discuss its potential for the distribution of earthquake sizes and fault displacements. We suggest physical interpretations of the parameters and provide a short toolkit of the statistical properties of the stretched exponentials. We also provide a comparison with other distributions, such as the shifted linear fractal, the log-normal and the recently introduced parabolic fractal distributions.
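For reference, the standard two-parameter stretched exponential can be written with an exponent c and a scale x_0 (we assume, but have not checked, that this matches the paper's own notation). The complementary cumulative distribution and the density obtained by differentiating it are:

```latex
P(X \ge x) = \exp\!\left[-\left(\frac{x}{x_0}\right)^{c}\right],
\qquad
p(x) = -\frac{d}{dx} P(X \ge x)
     = \frac{c}{x_0^{c}}\, x^{c-1} \exp\!\left[-\left(\frac{x}{x_0}\right)^{c}\right],
\qquad x \ge 0 .
```

With c = 1 this reduces to an ordinary exponential; with 0 < c < 1 the tail decays more slowly than any exponential, which is what lets the family describe fat-tailed data with only the two parameters c and x_0.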

763 citations

Journal Article
TL;DR: The authors report the first extensive measurement of the occurrence of Sleeping Beauties in the scientific literature, derive an ‘awakening’ probability function from the measurements, and identify the ‘most extreme Sleeping Beauty so far’.
Abstract: A 'Sleeping Beauty in Science' is a publication that goes unnoticed ('sleeps') for a long time and then, almost suddenly, attracts a lot of attention ('is awakened by a prince'). We here report the (to our knowledge) first extensive measurement of the occurrence of Sleeping Beauties in the science literature. We derived from the measurements an 'awakening' probability function and identified the 'most extreme Sleeping Beauty so far'.
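The definition above (a long period with little attention followed by a sudden burst) can be turned into a simple test on a paper's yearly citation counts. The sketch below is illustrative only: every threshold is a hypothetical default, not a value from the paper, and it checks one fixed window starting at publication rather than searching for the awakening year.

```python
def is_sleeping_beauty(
    yearly_citations: list[int],
    sleep_years: int = 10,        # hypothetical: minimum length of the "sleep"
    max_sleep_rate: float = 1.0,  # hypothetical: max mean citations/year while asleep
    awake_years: int = 4,         # hypothetical: length of the "awake" window
    min_awake_rate: float = 5.0,  # hypothetical: min mean citations/year once awake
) -> bool:
    """yearly_citations[i] = citations received i years after publication."""
    if len(yearly_citations) < sleep_years + awake_years:
        return False  # not enough history to judge
    sleep = yearly_citations[:sleep_years]
    awake = yearly_citations[sleep_years:sleep_years + awake_years]
    deep_sleep = sum(sleep) / sleep_years <= max_sleep_rate
    strong_awakening = sum(awake) / awake_years >= min_awake_rate
    return deep_sleep and strong_awakening


print(is_sleeping_beauty([0] * 10 + [6, 8, 10, 12]))  # True
print(is_sleeping_beauty([3] * 14))                    # False
```

Measuring how often such papers occur, as the abstract describes, would amount to applying a test of this kind across a large corpus while varying the sleep length and awakening thresholds.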

466 citations

01 Jan 2004

96 citations

Journal Article

13 citations