Showing papers in "arXiv: Digital Libraries in 2014"


Posted Content
TL;DR: In this paper, the authors re-examine the question of the growth of science and analyse it across all disciplines and also separately for the natural sciences and for the medical and health sciences.
Abstract: Many studies in information science have looked at the growth of science. In this study, we re-examine the question of the growth of science. To do this we (i) use current data up to publication year 2012 and (ii) analyse it across all disciplines and also separately for the natural sciences and for the medical and health sciences. Furthermore, the data are analysed with an advanced statistical technique - segmented regression analysis - which can identify specific segments with similar growth rates in the history of science. The study is based on two different sets of bibliometric data: (1) The number of publications held as source items in the Web of Science (WoS, Thomson Reuters) per publication year and (2) the number of cited references in the publications of the source items per cited reference year. We have looked at the rate at which science has grown since the mid-1600s. In our analysis of cited references we identified three growth phases in the development of science, which each led to growth rates tripling in comparison with the previous phase: from less than 1% up to the middle of the 18th century, to 2 to 3% up to the period between the two world wars and 8 to 9% to 2012.

617 citations
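
One way to read the annual growth rates quoted in the abstract above is to convert them into doubling times. The sketch below assumes constant compound annual growth within each phase, which is a simplification and not the segmented regression used in the paper; the function name `doubling_time` is hypothetical.

```python
import math


def doubling_time(annual_growth_rate: float) -> float:
    """Years needed to double at a constant compound annual growth rate.

    Assumes growth compounds once per year: N(t) = N0 * (1 + r) ** t.
    """
    if annual_growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / math.log(1 + annual_growth_rate)


# Illustrative rates taken from the ranges mentioned in the abstract.
for rate in (0.01, 0.03, 0.09):
    print(f"{rate:.0%} per year -> doubles in about {doubling_time(rate):.1f} years")
```

Under this assumption, a rate near 1% corresponds to a doubling time of roughly seven decades, while a rate near 9% corresponds to roughly eight years.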


Posted Content
TL;DR: An increase in the number of authors leads to an increase in impact from the beginning of the last century onward, and this is not due simply to self-citations.
Abstract: This paper provides the first historical analysis of the relationship between collaboration and scientific impact, using three indicators of collaboration (number of authors, number of addresses, and number of countries) and including articles published between 1900 and 2011. The results demonstrate that an increase in the number of authors leads to an increase in impact--from the beginning of the last century onwards--and that this is not simply due to self-citations. A similar trend is also observed for the number of addresses and number of countries represented in the byline of an article. However, the constant inflation of collaboration since 1900 has resulted in diminishing citation returns: larger and more diverse (in terms of institutional and country affiliation) teams are necessary to realize higher impact. The paper concludes with a discussion of the potential causes of the impact gain in citations of collaborative papers.

235 citations


Posted Content
Lutz Bornmann
TL;DR: An overview of research into three of the most important altmetrics, microblogging (Twitter), online reference managers (Mendeley and CiteULike) and blogging, with a focus on the correlation between altmetric counts and citation counts.
Abstract: Alternative metrics are currently one of the most popular research topics in scientometric research. This paper provides an overview of research into three of the most important altmetrics: microblogging (Twitter), online reference managers (Mendeley and CiteULike) and blogging. The literature is discussed in relation to the possible use of altmetrics in research evaluation. Since the research was particularly interested in the correlation between altmetrics counts and citation counts, this overview focuses particularly on this correlation. For each altmetric, a meta-analysis is calculated for its correlation with traditional citation counts. As the results of the meta-analyses show, the correlation with traditional citations for micro-blogging counts is negligible (pooled r=0.003), for blog counts it is small (pooled r=0.12) and for bookmark counts from online reference managers, medium to large (CiteULike pooled r=0.23; Mendeley pooled r=0.51).

125 citations


Posted Content
TL;DR: It is suggested that power laws in citation distributions, when present, account only for a very small fraction of the published papers and that the power-law scaling parameter is substantially higher than found in the older literature.
Abstract: Modeling distributions of citations to scientific papers is crucial for understanding how science develops. However, there is a considerable empirical controversy on which statistical model fits the citation distributions best. This paper is concerned with rigorous empirical detection of power-law behaviour in the distribution of citations received by the most highly cited scientific papers. We have used a large, novel data set on citations to scientific papers published between 1998 and 2002 drawn from Scopus. The power-law model is compared with a number of alternative models using a likelihood ratio test. We have found that the power-law hypothesis is rejected for around half of the Scopus fields of science. For these fields of science, the Yule, power-law with exponential cut-off and log-normal distributions seem to fit the data better than the pure power-law model. On the other hand, when the power-law hypothesis is not rejected, it is usually empirically indistinguishable from most of the alternative models. The pure power-law model seems to be the best model only for the most highly cited papers in "Physics and Astronomy". Overall, our results seem to support theories implying that the most highly cited scientific papers follow the Yule, power-law with exponential cut-off or log-normal distribution. Our findings suggest also that power laws in citation distributions, when present, account only for a very small fraction of the published papers (less than 1% for most of science fields) and that the power-law scaling parameter (exponent) is substantially higher (from around 3.2 to around 4.7) than found in the older literature.

91 citations


Posted Content
TL;DR: Network analysis shows that China was embedded in this top-layer of internationally co-authored publications, where some of the EU28 member states overtook the United States during this decade; but a clear divide remains between EU15 (Western Europe) and the Accession Countries.
Abstract: The percentages of shares of world publications of the European Union and its member states, China, and the United States have been represented differently as a result of using different databases. An analytical variant of the Web-of-Science (of Thomson Reuters) enables us to study the dynamics in the world publication system in terms of the field-normalized top-1% and top-10% most-frequently-cited publications. Comparing the EU28, USA, and China at the global level shows a top-level dynamics that is different from the analysis in terms of shares of publications: the United States remains far more productive in the top-1% of all papers; China drops out of the competition for elite status; and the EU28 increased its share among the top-cited papers from 2000-2010. Some of the EU28 member states overtook the U.S. during this decade, but a clear divide remains between EU15 (Western Europe) and the Accession Countries. Network analysis shows that internationally co-authored top-1% publications perform far above expectation and also above top-10% ones. In 2005, China was embedded in this top-layer of internationally co-authored publications. These publications often involve more than a single European nation.

84 citations


Posted Content
TL;DR: The objective of this work is to identify the set of highly cited documents in Google Scholar and define their core characteristics: their languages, their file format, or how many of them can be accessed free of charge.
Abstract: The study of highly cited documents on Google Scholar (GS) has never been addressed to date in a comprehensive manner. The objective of this work is to identify the set of highly cited documents in Google Scholar and define their core characteristics: their languages, their file format, or how many of them can be accessed free of charge. We will also try to answer some additional questions that hopefully shed some light on the use of GS as a tool for assessing scientific impact through citations. The decalogue of research questions is shown below: 1. Which are the most cited documents in GS? 2. Which are the most cited document types in GS? 3. In what languages are the most cited documents in GS written? 4. How many highly cited documents are freely accessible? 4.1 What file types are the most commonly used to store these highly cited documents? 4.2 Which are the main providers of these documents? 5. How many of the highly cited documents indexed by GS are also indexed by WoS? 6. Is there a correlation between the number of citations that these highly cited documents have received in GS and the number of citations they have received in WoS? 7. How many versions of these highly cited documents has GS detected? 8. Is there a correlation between the number of versions GS has detected for these documents and the number of citations they have received? 9. Is there a correlation between the number of versions GS has detected for these documents and their position in the search engine result pages? 10. Is there some relation between the positions these documents occupy in the search engine result pages and the number of citations they have received?

82 citations


Posted Content
Lutz Bornmann
TL;DR: Altmetrics as discussed by the authors is a term to describe web-based metrics for the impact of publications and other scholarly material by using data from social media platforms (e.g. Twitter or Mendeley).
Abstract: Today, it is not clear how the impact of research on other areas of society than science should be measured. While peer review and bibliometrics have become standard methods for measuring the impact of research in science, there is not yet an accepted framework within which to measure societal impact. Alternative metrics (called altmetrics to distinguish them from bibliometrics) are considered an interesting option for assessing the societal impact of research, as they offer new ways to measure (public) engagement with research output. Altmetrics is a term to describe web-based metrics for the impact of publications and other scholarly material by using data from social media platforms (e.g. Twitter or Mendeley). This overview of studies explores the potential of altmetrics for measuring societal impact. It deals with the definition and classification of altmetrics. Furthermore, their benefits and disadvantages for measuring impact are discussed.

68 citations


Posted Content
TL;DR: It is demonstrated that almost all disciplines show similar numbers of references in the appendices of their papers, and that the average citation rate is far more influenced by the extent to which the papers (cited as references) are included in WoS as linked database records.
Abstract: It is well known in bibliometrics that the average number of citations per paper differs greatly between the various disciplines. The differing citation culture (in particular the different average number of references per paper and thereby the different probability of being cited) is widely seen as the cause of this variation. Based on all Web of Science (WoS) records published in 1990, 1995, 2000, 2005, and 2010 we demonstrate that almost all disciplines show similar numbers of references in the appendices of their papers. Our results suggest that the average citation rate is far more influenced by the extent to which the papers (cited as references) are included in WoS as linked database records. For example, the comparatively low citation rates in the humanities are not at all the result of a lower average number of references per paper but are caused by the low fraction of linked references which refer to papers published in the core journals covered by WoS.

66 citations


Journal ArticleDOI
TL;DR: China's international scientific collaboration is focused on a handful of countries: nearly 95% of internationally co-authored papers involve only 20 countries, among which the USA accounts for more than 40% of the total.
Abstract: Using bibliometric methods, we investigate China's international scientific collaboration at three levels: collaborating countries, institutions and individuals. We design a database in SQL Server and analyse Chinese SCI papers based on the corresponding author field. We find that China's international scientific collaboration is focused on a handful of countries. Nearly 95% of internationally co-authored papers involve only 20 countries, among which the USA accounts for more than 40% of the total. Results also show that Chinese lineage in international co-authorship is evident, which means Chinese immigrant scientists are playing an important role in China's international scientific collaboration, especially in English-speaking countries.

64 citations


Posted Content
Lutz Bornmann
TL;DR: In this article, the authors discuss how impact is generally measured within science and beyond, and the effects impact measurements have on the science system, and which problems are associated with impact measurement.
Abstract: The impact of science is one of the most important topics in scientometrics. Recent developments show a fundamental change in impact measurements from impact on science to impact on society. Since impact measurement is currently in a state of far-reaching change, this paper describes recent developments and the problems faced in this area. To that end, the results of key publications (dealing with impact measurement) are discussed. The paper discusses how impact is generally measured within science and beyond (section 2), which effects impact measurements have on the science system (section 3), and which problems are associated with impact measurement (section 4). The problems associated with impact measurement constitute the focus of this paper: Science is marked by inequality, random chance, anomalies, the right to make mistakes, unpredictability, and a high significance of extreme events, which might distort impact measurements. Scientometricians as the producers of impact scores and decision makers as their consumers should be aware of these problems and should consider them in the generation and interpretation of bibliometric results, respectively.

56 citations


Posted Content
TL;DR: In this paper, the authors examined the evolution of the impact of older scholarly articles and found that the number of citations to older articles has increased substantially over 1990-2013 and that the trend of a growing impact for older articles also holds for even older articles.
Abstract: In this paper, we examine the evolution of the impact of older scholarly articles. We attempt to answer four questions. First, how often are older articles cited and how has this changed over time. Second, how does the impact of older articles vary across different research fields. Third, is the change in the impact of older articles accelerating or slowing down. Fourth, are these trends different for much older articles. To answer these questions, we studied citations from articles published in 1990-2013. We computed the fraction of citations to older articles from articles published each year as the measure of impact. We considered articles that were published at least 10 years before the citing article as older articles. We computed these numbers for 261 subject categories and 9 broad areas of research. Finally, we repeated the computation for two other definitions of older articles, 15 years and older and 20 years and older. There are three conclusions from our study. First, the impact of older articles has grown substantially over 1990-2013. In 2013, 36% of citations were to articles that are at least 10 years old; this fraction has grown 28% since 1990. The fraction of older citations increased over 1990-2013 for 7 out of 9 broad areas and 231 out of 261 subject categories. Second, the increase over the second half (2002-2013) was double the increase in the first half (1990-2001). Third, the trend of a growing impact of older articles also holds for even older articles. In 2013, 21% of citations were to articles >= 15 years old with an increase of 30% since 1990 and 13% of citations were to articles >= 20 years old with an increase of 36%. Now that finding and reading relevant older articles is about as easy as finding and reading recently published articles, significant advances aren't getting lost on the shelves and are influencing work worldwide for years after.

Posted Content
TL;DR: This article examines the sub-field of philosophy of science using a new method developed in information science, Referenced Publication Years Spectroscopy (RPYS), which identifies peak years in citations in a field and promises to help scholars identify the key contributions to, and revolutionary discoveries in, that field.
Abstract: We examine the sub-field of philosophy of science using a new method developed in information science, Referenced Publication Years Spectroscopy (RPYS). RPYS allows us to identify peak years in citations in a field, which promises to help scholars identify the key contributions to a field, and revolutionary discoveries in a field. We discovered that philosophy of science, a sub-field in the humanities, differs significantly from other fields examined with this method. Books play a more important role in philosophy of science than in the sciences. Further, Einstein's famous 1905 papers created a citation peak in the philosophy of science literature. But rather than being a contribution to the philosophy of science, their importance lies in the fact that they are revolutionary contributions to physics with important implications for philosophy of science.

Posted Content
TL;DR: The evolution of the impact of non-elite journals is examined to answer two questions: what fraction of the top-cited articles are published in non-elite journals, and what fraction of total citations go to non-elite journals, and how each has changed over time. Now that finding and reading relevant articles in non-elite journals is about as easy as finding and reading articles in elite journals, researchers are increasingly building on and citing work published everywhere.
Abstract: In this paper, we examine the evolution of the impact of non-elite journals. We attempt to answer two questions. First, what fraction of the top-cited articles are published in non-elite journals and how has this changed over time. Second, what fraction of the total citations are to non-elite journals and how has this changed over time. We studied citations to articles published in 1995-2013. We computed the 10 most-cited journals and the 1000 most-cited articles each year for all 261 subject categories in Scholar Metrics. We marked the 10 most-cited journals in a category as the elite journals for the category and the rest as non-elite. There are two conclusions from our study. First, the fraction of top-cited articles published in non-elite journals increased steadily over 1995-2013. While the elite journals still publish a substantial fraction of high-impact articles, many more authors of well-regarded papers in diverse research fields are choosing other venues. The number of top-1000 papers published in non-elite journals for the representative subject category went from 149 in 1995 to 245 in 2013, a growth of 64%. Looking at broad research areas, 4 out of 9 areas saw at least one-third of the top-cited articles published in non-elite journals in 2013. For 6 out of 9 areas, the fraction of top-cited papers published in non-elite journals for the representative subject category grew by 45% or more. Second, now that finding and reading relevant articles in non-elite journals is about as easy as finding and reading articles in elite journals, researchers are increasingly building on and citing work published everywhere. Considering citations to all articles, the percentage of citations to articles in non-elite journals went from 27% in 1995 to 47% in 2013. Six out of nine broad areas had at least 50% of citations going to articles published in non-elite journals in 2013.

Book ChapterDOI
TL;DR: A survey of open knowledge bases, focusing on their geospatial dimension, with particular attention to the crucial issue of the quality of geoknowledge bases, as well as of crowdsourced data.
Abstract: Over the past decade, rapid advances in web technologies, coupled with innovative models of spatial data collection and consumption, have generated a robust growth in geo-referenced information, resulting in spatial information overload. Increasing 'geographic intelligence' in traditional text-based information retrieval has become a prominent approach to respond to this issue and to fulfill users' spatial information needs. Numerous efforts in the Semantic Geospatial Web, Volunteered Geographic Information (VGI), and the Linking Open Data initiative have converged in a constellation of open knowledge bases, freely available online. In this article, we survey these open knowledge bases, focusing on their geospatial dimension. Particular attention is devoted to the crucial issue of the quality of geo-knowledge bases, as well as of crowdsourced data. A new knowledge base, the OpenStreetMap Semantic Network, is outlined as our contribution to this area. Research directions in information integration and Geographic Information Retrieval (GIR) are then reviewed, with a critical discussion of their current limitations and future prospects.

Posted Content
TL;DR: The present findings suggest that it would be productive for existing and future mandates to adopt the three identified conditions so as to maximize their effectiveness, and thereby the growth of OA.
Abstract: MELIBEA is a Spanish database that uses a composite formula with eight weighted conditions to estimate the effectiveness of Open Access mandates (registered in ROARMAP). We analyzed 68 mandated institutions for publication years 2011-2013 to determine how well the MELIBEA score and its individual conditions predict what percentage of published articles indexed by Web of Knowledge is deposited in each institution's OA repository, and when. We found a small but significant positive correlation (0.18) between MELIBEA score and deposit percentage. We also found that for three of the eight MELIBEA conditions (deposit timing, internal use, and opt-outs), one value of each was strongly associated with deposit percentage or deposit latency (immediate deposit required, deposit required for performance evaluation, unconditional opt-out allowed for the OA requirement but no opt-out for deposit requirement). When we updated the initial values and weights of the MELIBEA formula for mandate effectiveness to reflect the empirical association we had found, the score's predictive power doubled (.36). There are not yet enough OA mandates to test further mandate conditions that might contribute to mandate effectiveness, but these findings already suggest that it would be useful for future mandates to adopt these three conditions so as to maximize their effectiveness, and thereby the growth of OA.

Posted Content
TL;DR: Findings show that the use of scientific knowledge negatively affects patent influence outside the biotechnology industry, while it positively contributes to making a patent more relevant for the assignee's subsequent technological developments.
Abstract: The present paper extends the literature investigating key drivers leading certain patents to exert a stronger influence on subsequent technological developments (inventions) than others. We investigated six key determinants, namely (i) the use of scientific knowledge, (ii) the breadth of the technological base, (iii) the existence of collaboration in patent development, (iv) the number of claims, (v) the scope, and (vi) the novelty, and how the effect of these determinants varies when patent influence, as measured by the number of forward citations the patent received, is distinguished as within and across the industrial and organizational boundaries. We conducted an empirical analysis on a sample of 5671 patents granted to 293 US biotechnology firms from 1976 to 2003. Results reveal that the contribution of the determinants to patent influence differs across the domains that are identified by the industrial and organizational boundaries. Findings, for example, show that the use of scientific knowledge negatively affects patent influence outside the biotechnology industry, while it positively contributes to making a patent more relevant for the assignee's subsequent technological developments. In addition, the broader the scope of a patent, the higher the number of citations the patent receives from subsequent non-biotechnology patents. This relationship is inverted-U shaped when considering the influence of a patent on inventions granted to organizations other than the patent's assignee. Finally, the novelty of a patent has an inverted-U relationship with the influence the patent exerts on subsequent inventions granted across the industrial and organizational boundaries.

Book ChapterDOI
TL;DR: This chapter addresses the statistical analysis of percentiles and shows how examinations of effect sizes and confidence intervals can lead to a clear understanding of citation impact differences.
Abstract: In this chapter we address the statistical analysis of percentiles: How should the citation impact of institutions be compared? In educational and psychological testing, percentiles are already used widely as a standard to evaluate an individual’s test scores (intelligence tests, for example) by comparing them with the scores of a calibrated sample. Percentiles, or percentile rank classes, are also a very suitable method for bibliometrics to normalize citations of publications in terms of the subject category and the publication year and, unlike the mean-based indicators (the relative citation rates), percentiles are scarcely affected by skewed distributions of citations. The percentile of a certain publication provides information about the citation impact this publication has achieved in comparison to other similar publications in the same subject category and publication year. Analyses of percentiles, however, have not always been presented in the most effective and meaningful way. New APA guidelines (American Psychological Association, Publication manual of the American Psychological Association (6th ed.). Washington, DC: American Psychological Association (APA), 2010) suggest a lesser emphasis on significance tests and a greater emphasis on the substantive and practical significance of findings. Drawing on work by Cumming (Understanding the new statistics: effect sizes, confidence intervals, and meta-analysis. London: Routledge, 2012) we show how examinations of effect sizes (e.g., Cohen’s d statistic) and confidence intervals can lead to a clear understanding of citation impact differences.
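
As a rough illustration of the effect-size idea mentioned in the abstract above, the following sketch computes Cohen's d for two groups of percentile scores using the pooled standard deviation. The function name `cohens_d` and the sample values are hypothetical and are not taken from the chapter; it shows only the standard textbook formula, not the authors' full procedure.

```python
import math
import statistics


def cohens_d(sample_a, sample_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / pooled_sd


# Hypothetical citation percentiles for papers from two institutions.
institution_a = [60, 70, 80]
institution_b = [50, 60, 70]
print(cohens_d(institution_a, institution_b))
```

The sign of d indicates which group has the higher mean percentile, and its magnitude expresses the gap in units of the pooled standard deviation.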

Journal ArticleDOI
TL;DR: In this paper, the authors present a review of 108 indicators that can potentially be used to measure performance on the individual author level, and examine the complexity of their calculations in relation to what they are supposed to reflect and ease of end-user application.
Abstract: An increasing demand for bibliometric assessment of individuals has led to a growth of new bibliometric indicators as well as new variants or combinations of established ones. The aim of this review is to contribute objective facts about the usefulness of bibliometric indicators of the effects of publication activity at the individual level. This paper reviews 108 indicators that can potentially be used to measure performance at the individual author level, and examines the complexity of their calculations in relation to what they are supposed to reflect and ease of end-user application.

Posted Content
TL;DR: A framework for assessing temporal coherence between a root resource and its embedded resource depending on Memento-Datetime, Last-Modified datetime, and entity body is introduced.
Abstract: Most archived HTML pages embed other web resources, such as images and stylesheets. Playback of the archived web pages typically provides only the capture date (or Memento-Datetime) of the root resource and not the Memento-Datetime of the embedded resources. In the course of our research, we have discovered that the Memento-Datetime of embedded resources can be up to several years in the future or past, relative to the Memento-Datetime of the embedding root resource. We introduce a framework for assessing temporal coherence between a root resource and its embedded resource depending on Memento-Datetime, Last-Modified datetime, and entity body.

Posted Content
TL;DR: In this article, the authors compare the network of aggregated journal-journal citation relations provided by the Journal Citation Reports (JCR) 2012 of the Science and Social Science Citation Indexes (SCI and SSCI) with similar data based on Scopus 2012.
Abstract: We compare the network of aggregated journal-journal citation relations provided by the Journal Citation Reports (JCR) 2012 of the Science and Social Science Citation Indexes (SCI and SSCI) with similar data based on Scopus 2012. First, global maps were developed for the two sets separately; sets of documents can then be compared using overlays to both maps. Using fuzzy-string matching and ISSN numbers, we were able to match 10,524 journal names between the two sets; that is, 96.4% of the 10,936 journals contained in JCR or 51.2% of the 20,554 journals covered by Scopus. Network analysis was then pursued on the set of journals shared between the two databases and the two sets of unique journals. Citations among the shared journals are more comprehensively covered in JCR than Scopus, so the network in JCR is denser and more connected than in Scopus. The ranking of shared journals in terms of indegree (that is, numbers of citing journals) or total citations is similar in both databases overall (Spearman's ρ > 0.97), but some individual journals rank very differently. Journals that are unique to Scopus seem to be less important--they are citing shared journals rather than being cited by them--but the humanities are covered better in Scopus than in JCR.

Journal ArticleDOI
TL;DR: The author reports results of an empirical study analyzing how the timed h-index, defined by restricting the evaluation to a citation time window, evolves with the length of that window.
Abstract: The h-index has been shown to increase in many cases mostly because of citations to rather old publications. This inertia can be circumvented by restricting the evaluation to a citation time window. Here I report results of an empirical study analyzing the evolution of the thus defined timed h-index in dependence on the length of the citation time window.
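
The idea of restricting the h-index to a citation time window can be sketched as follows. This is a minimal illustration under an assumption that the abstract does not spell out: the window is taken to cover the most recent `window` calendar years up to and including `current_year`. The function names and the sample data are hypothetical.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def timed_h_index(citation_years_per_paper, current_year, window):
    """h-index counting only citations received within the time window.

    citation_years_per_paper: one list per paper holding the years in which
    that paper was cited (assumed input layout, not from the source).
    """
    first_year = current_year - window + 1
    counts = [
        sum(1 for year in years if first_year <= year <= current_year)
        for years in citation_years_per_paper
    ]
    return h_index(counts)


# Hypothetical citation years for four papers.
papers = [
    [2005, 2006, 2013, 2014],
    [2012, 2013, 2014],
    [2001, 2002, 2003],
    [2014],
]
for window in (1, 3, 20):
    print(window, timed_h_index(papers, current_year=2014, window=window))
```

With a long window the value approaches the ordinary h-index, while a short window ignores citations to older work that arrived long ago.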

Posted Content
TL;DR: Article-level metrics (ALMs), or altmetrics, have recently become a new trendsetter for measuring the impact of scientific publications and their social outreach to intended audiences.
Abstract: Article-level metrics (ALMs), or altmetrics, have recently become a new trendsetter for measuring the impact of scientific publications and their social outreach to intended audiences. Popular social networks such as Facebook, Twitter, and LinkedIn and social bookmarking services such as Mendeley and CiteULike are now widely used for communicating research to larger transnational audiences. In 2012, the San Francisco Declaration on Research Assessment was signed by scientific and research communities across the world. This declaration gives preference to ALMs or altmetrics over the traditional but faulty journal impact factor (JIF)-based assessment of career scientists. The JIF does not consider impact or influence beyond citation counts, as these counts are reflected only through Thomson Reuters' Web of Science database. Furthermore, the JIF provides an indicator related to the journal, not to a published paper. Thus, altmetrics are now becoming an alternative metric for performance assessment of individual scientists and their scholarly publications. This paper provides a glimpse of the genesis of altmetrics in measuring the efficacy of scholarly communications and highlights available altmetric tools and the social platforms linking altmetric tools, which are widely used in deriving altmetric scores of scholarly publications. The paper thus argues for institutions and policy makers to pay more attention to altmetrics-based indicators for evaluation purposes but cautions that proper safeguards and validations are needed before their adoption.

Journal ArticleDOI
TL;DR: In this paper, the authors summarize the main empirical evidence provided by the scientific community on the comparison between the two main citation-based academic search engines, Google Scholar and Microsoft Academic Search, paying special attention to the following issues: coverage, correlations between journal rankings, and usage of these academic search engines.
Abstract: The goal of this working paper is to summarize the main empirical evidence provided by the scientific community on the comparison between the two main citation-based academic search engines, Google Scholar and Microsoft Academic Search, paying special attention to the following issues: coverage, correlations between journal rankings, and usage of these academic search engines. Additionally, self-elaborated data are offered, which are intended to provide current evidence about the popularity of these tools on the Web, by measuring the number of rich files (PDF, PPT and DOC) in which these tools are mentioned, the number of external links that both products receive, and the search query frequency from Google Trends. The poor results obtained by MAS led us to an unexpected and unnoticed discovery: Microsoft Academic Search has not been updated since 2013. Therefore, the second part of the working paper aims at advancing some data demonstrating this lack of updates. For this purpose we gathered the number of total records indexed by Microsoft Academic Search since 2000. The data show an abrupt drop in the number of documents indexed, from 2,346,228 in 2010 to 8,147 in 2013 and 802 in 2014. This decrease is also broken down by 15 thematic areas. In view of these problems it seems logical that Microsoft Academic Search was not only little used by academics and students searching for articles, who mostly use Google or Google Scholar, but also virtually ignored by bibliometricians.

Posted Content
TL;DR: An empirical application comparing some impact indicators with the topic normalized impact factor in a set of 224 journals from four different fields shows that the normalization, using the citation potential in the journal topic, reduces the between-group variance with respect to the within-group variance in a higher proportion than the rest of indicators analyzed.
Abstract: The journal impact factor is not comparable among fields of science and social science because of systematic differences in publication and citation behaviour across disciplines. In this work, a source normalization of the journal impact factor is proposed. We use the aggregate impact factor of the citing journals as a measure of the citation potential in the journal topic, and we employ this citation potential in the normalization of the journal impact factor to make it comparable between scientific fields. An empirical application comparing some impact indicators with our topic normalized impact factor in a set of 224 journals from four different fields shows that our normalization, using the citation potential in the journal topic, reduces the between-group variance with respect to the within-group variance in a higher proportion than the rest of indicators analysed. The effect of journal self-citations over the normalization process is also studied.
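
The normalization described above can be sketched roughly as follows. The abstract states only that the aggregate impact factor of the citing journals serves as the citation potential; the exact formula is not given, so the pooled ratio used for the aggregate impact factor, the simple division used for normalization, the input layout, and the function names are all assumptions of this example rather than the authors' method.

```python
def aggregate_impact_factor(citing_journals):
    """Pooled impact factor of a set of citing journals.

    citing_journals: list of (citations, citable_items) pairs, one per
    citing journal (hypothetical input layout).
    """
    total_citations = sum(cites for cites, _ in citing_journals)
    total_items = sum(items for _, items in citing_journals)
    return total_citations / total_items


def topic_normalized_if(journal_if, citing_journals):
    """Journal impact factor divided by the citation potential of its topic.

    Division is an assumed form of normalization, used here for illustration.
    """
    return journal_if / aggregate_impact_factor(citing_journals)


# Hypothetical journal with impact factor 3.0, cited by two journals.
citing = [(300, 100), (100, 100)]
potential = aggregate_impact_factor(citing)
normalized = topic_normalized_if(3.0, citing)
```

Under these assumptions, a journal in a high-citation topic and a journal in a low-citation topic with the same raw impact factor end up with different normalized values, which is the kind of cross-field comparability the abstract aims for.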

Book ChapterDOI
TL;DR: The Fair Dealing Button as mentioned in this paper is a feature designed for authors who have deposited their papers in an Open Access Institutional Repository but have deposited them as "Closed Access" (meaning only the metadata are visible and retrievable, not the full eprint) rather than Open Access.
Abstract: We describe the "Fair Dealing Button," a feature designed for authors who have deposited their papers in an Open Access Institutional Repository but have deposited them as "Closed Access" (meaning only the metadata are visible and retrievable, not the full eprint) rather than Open Access. The Button allows individual users to request and authors to provide a single eprint via semi-automated email. The purpose of the Button is to tide over research usage needs during any publisher embargo on Open Access and, more importantly, to make it possible for institutions to adopt the "Immediate-Deposit/Optional-Access" Mandate, without exceptions or opt-outs, instead of a mandate that allows delayed deposit or deposit waivers, depending on publisher permissions or embargoes (or no mandate at all). This is only "Almost-Open Access," but in facilitating exception-free immediate-deposit mandates it will accelerate the advent of universal Open Access.

Journal ArticleDOI
TL;DR: In this paper, the authors present, apply and discuss up to four empirical methods: Khabsa & Giles's method, an estimate based on empirical data, and estimates based on direct queries and absurd queries.
Abstract: The emergence of academic search engines (essentially Google Scholar and Microsoft Academic Search) has revived and increased the interest in the size of the academic web, since their aspiration is to index the entirety of current academic knowledge. The search engine functionality and human search patterns sometimes lead us to believe that what we see in the search engine's results page is all that really exists. And, even when this is not true, we wonder which information is missing and why. The main objective of this working paper is to calculate the size of Google Scholar at present (May 2014). To do this, we present, apply and discuss up to four empirical methods: Khabsa & Giles's method, an estimate based on empirical data, and estimates based on direct queries and absurd queries. The results, despite providing disparate values, place the estimated size of Google Scholar at about 160 million documents. However, the fact that all methods show great inconsistencies, limitations and uncertainties makes us wonder why Google does not simply provide this information to the scientific community if the company really knows this figure.

Posted Content
TL;DR: A systematic account of the potential usefulness and limitations of a set of 10 important metrics applied at the level of individual articles, individual researchers, research groups, and institutions and introduces the concept of a “meta‐analysis” of the units under assessment.
Abstract: This article introduces the Multidimensional Research Assessment Matrix of scientific output. Its base notion holds that the choice of metrics to be applied in a research assessment process depends upon the unit of assessment, the research dimension to be assessed, and the purposes and policy context of the assessment. An indicator may be highly useful within one assessment process, but less so in another. For instance, publication counts are useful tools to help discriminate between those staff members who are research active and those who are not, but are of little value if active scientists are to be compared with one another according to their research performance. This paper gives a systematic account of the potential usefulness and limitations of a set of 10 important metrics, including altmetrics, applied at the level of individual articles, individual researchers, research groups and institutions. It presents a typology of research impact dimensions, and indicates which metrics are the most appropriate to measure each dimension. It introduces the concept of a meta-analysis of the units under assessment, in which metrics are not used as tools to evaluate individual units, but to reach policy inferences regarding the objectives and general setup of an assessment process.

Journal ArticleDOI
TL;DR: In this article, the authors analyzed a set of publications with a DOI indexed in the Web of Science during the period 2011-2013 and collected their data with the Altmetric API.
Abstract: This paper analyzes Altmetric.com, one of the most important altmetric data providers currently used. We have analyzed a set of publications with a DOI indexed in the Web of Science during the period 2011-2013 and collected their data with the Altmetric API. 19% of the original set of papers was retrieved from Altmetric.com including some altmetric data. We identified 16 different social media sources from which Altmetric.com retrieves data. However, five of them cover 95.5% of the total set. Twitter (87.1%) and Mendeley (64.8%) have the highest coverage. We conclude that Altmetric.com is a transparent, rich and accurate tool for altmetric data. Nevertheless, there are still potential limitations on its exhaustiveness as well as on the selection of social media sources that need further research.
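
Per-source coverage figures of the kind reported above can be computed along the following lines. This is only a minimal sketch: the record layout (one set of source names per paper that has any altmetric data) and the function name are hypothetical, and no real API response format is implied.

```python
from collections import Counter


def source_coverage(records):
    """Share of papers in which each social media source appears.

    records: list of sets, one per paper with altmetric data, each holding
    the names of the sources that mention the paper (hypothetical layout).
    """
    total = len(records)
    counts = Counter(source for sources in records for source in sources)
    return {source: n / total for source, n in counts.items()}


# Hypothetical sample of four papers.
sample = [
    {"twitter", "mendeley"},
    {"twitter"},
    {"mendeley"},
    {"twitter", "facebook"},
]
coverage = source_coverage(sample)
```

Because one paper can be mentioned in several sources, the shares do not sum to one; each value is the fraction of papers covered by that source alone.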

Posted Content
TL;DR: It is found that social attention is highly correlated with article views, especially browser HTML views, and that a high altmetric score has a potential role in promoting the long-term academic impact of articles.
Abstract: The scholarly and social impact of scientific publications can be measured by various metrics. In this study, the relationships between various metrics of 63,805 PLOS research articles are studied. Generally, article views correlate well with citations; however, different types of article view show different levels of correlation with citations, with PDF downloads correlating most strongly. It is necessary for publishers and journals to provide detailed and comprehensive article metrics. Although this study, like previous studies, confirms the low correlation between social attention and citations, we find that social attention is highly correlated with article views, especially browser HTML views. Social attention is an important source of network traffic to browser HTML views and may subsequently lead to citations. A high altmetric score has a potential role in promoting the long-term academic impact of articles, and a conceptual model is proposed to interpret the conversion from social attention to article views and finally to citations.
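
Correlations between article-level metrics such as those discussed above are often measured with a rank correlation, because view and citation counts are heavily skewed. The abstract does not say which coefficient was used, so the choice of Spearman's coefficient here is an assumption, as are the function names and the sample numbers.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-article counts.
html_views = [120, 450, 90, 3000, 800]
citations = [2, 5, 1, 40, 9]
rho = spearman(html_views, citations)
```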

Journal ArticleDOI
TL;DR: In this article, the authors compared 16 bibliometric indicators with respect to their validity at the level of the individual researcher by estimating their power to predict later successful researchers, and found that field and citation-window normalisation substantially improves the predictive power of citation indicators.
Abstract: We test 16 bibliometric indicators with respect to their validity at the level of the individual researcher by estimating their power to predict later successful researchers. We compare the indicators of a sample of astrophysics researchers who later co-authored highly cited papers, measured before their first landmark paper, with the distributions of these indicators over a random control group of young authors in astronomy and astrophysics. We find that field and citation-window normalisation substantially improves the predictive power of citation indicators. The two indicators of total influence based on citation numbers normalised with expected citation numbers are the only indicators which show differences between later stars and random authors that are significant at the 1% level. Indicators of paper output are not very useful for predicting later stars. The famous h-index does not discriminate at all between later stars and the random control group.