Journal ArticleDOI

Patent citation analysis with Google

01 Jan 2017 - Vol. 68, Iss. 1, pp. 48-61
TL;DR: This article introduces a semiautomatic indirect method via Bing to extract and filter patent citations from Google to academic papers with an overall precision of 98%; low but positive correlations with Scopus citations suggest that traditional citation counts cannot substitute for patent citations when evaluating research.
Abstract: Citations from patents to scientific publications provide useful evidence about the commercial impact of academic research, but automatically searchable databases are needed to exploit this connection for large-scale patent citation evaluations. Google covers multiple different international patent office databases but does not index patent citations or allow automatic searches. In response, this article introduces a semiautomatic indirect method via Bing to extract and filter patent citations from Google to academic papers with an overall precision of 98%. The method was evaluated with 322,192 science and engineering Scopus articles from every second year for the period 1996-2012. Although manual Google Patent searches give more results, especially for articles with many patent citations, the difference is not large enough to be a major problem. Within Biomedical Engineering, Biotechnology, and Pharmacology &Pharmaceutics, 7% to 10% of Scopus articles had at least one patent citation but other fields had far fewer, so patent citation analysis is only relevant for a minority of publications. Low but positive correlations between Google Patent citations and Scopus citations across all fields suggest that traditional citation counts cannot substitute for patent citations when evaluating research.

Summary (3 min read)

Introduction

  • The share of scientific references in patents seems to differ across technological domains (Callaert et al. 2006), patent offices (Michel & Bettels, 2001) and between domestic and international patents (Tijssen, 2001) and so results from one investigation may not be valid in other contexts.
  • Counting citations to academic articles from all web pages provides a free impact indicator that correlates with conventional citations at both the article and journal levels (Vaughan & Shaw, 2003, 2005).

Data Sets

  • Bibliographic information and citation counts for English-language articles from every second year from 1996 to 2012 were extracted from Scopus for sixteen science and engineering fields (see Table 3).
  • These years were selected to investigate the impact of time on citations to relatively recent academic articles in patents.
  • For each selected field and year, a random sample of 2,250 articles was taken from the Scopus set (e.g., Biotechnology articles published in 2002).
  • Scopus articles with fewer than three words in their titles were also excluded.
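The sampling and filtering rules above can be sketched in Python. The record format and field names here are hypothetical illustrations, not the authors' actual data structures:

```python
import random

def select_sample(articles, sample_size=2250, min_title_words=3, seed=0):
    """Drop articles whose titles have fewer than three words, then draw a
    random sample per field/year, as described in the Data Sets section."""
    eligible = [a for a in articles if len(a["title"].split()) >= min_title_words]
    random.seed(seed)
    if len(eligible) <= sample_size:
        return eligible
    return random.sample(eligible, sample_size)

# Toy records standing in for Scopus bibliographic data.
records = [
    {"title": "Gene expression profiling in yeast"},
    {"title": "Erratum"},  # excluded: fewer than three words in the title
    {"title": "A short note"},
]
sample = select_sample(records, sample_size=2)
assert all(len(a["title"].split()) >= 3 for a in sample)
```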

Removing Duplicate Matches

  • In a few cases the Bing search results included multiple versions of the same or overlapping patents, such as the initially submitted version and the finally accepted one, or an original patent and a continuation or continuation-in-part.
  • In order to avoid counting both, the authors filtered out patents with the same titles and descriptions in the Bing API search results.
  • When two matches shared the same title and description, only one was counted as unique and the other was treated as a duplicate.
  • This removed about 3% (1,564 out of 52,453) of the initial Bing search results, ranging from 1.4% in Biochemistry & Molecular Biology to 4.6% in Electrical & Electronic Engineering.
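The duplicate filtering described above can be sketched as keeping the first result seen for each (title, description) pair. The record fields below are hypothetical, not the study's actual data format:

```python
def deduplicate(results):
    """Remove duplicate patents (e.g. an original and its continuation)
    from Bing API results by title and description, keeping the first."""
    seen = set()
    unique = []
    for r in results:
        key = (r["title"].strip().lower(), r["description"].strip().lower())
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

hits = [
    {"patent": "US1234567", "title": "Widget", "description": "A widget."},
    # Continuation of the same patent: same title and description.
    {"patent": "US1234567B2", "title": "Widget", "description": "A widget."},
    {"patent": "EP7654321", "title": "Gadget", "description": "A gadget."},
]
assert len(deduplicate(hits)) == 2
```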

Manual Checks of the Bing API Searches

  • Manual checking was used to investigate the coverage of the filtered automatic Bing searches.
  • A stratified sample of 320 articles was selected for this, with five high-cited, five medium-cited, five low-cited and five uncited articles (according to the original Bing API searches) for each field (20 articles * 16 fields = 320).
  • To check coverage, each article was searched for in the main Google Patents search interface, and duplicate patents were identified by comparing the patent titles, authors and initial descriptions while ignoring the patent number.
  • Patents were considered to be duplicates if they described the same invention, even if the wording was slightly different because these changes were presumably due to revisions of the text rather than major changes to the product.

Estimating the Accuracy and Coverage of Bing API Searches

  • The manual checking of the stratified sample of 320 articles (uncited, low-cited, medium-cited and high-cited) found that the Bing API results tended to be less comprehensive than the direct manual Google Patent searches.
  • For low-cited articles, the authors randomly chose five articles with one or two citations in the Bing API searches.
  • In Food Science, Environmental Science, Industrial & Manufacturing Engineering and Pharmacology & Pharmaceutics the medians were 4, but in Biotechnology and Biomedical Engineering the medians were 5 and 6 respectively.
  • Out of 80 uncited articles from the Bing API searches, 8 (10%) had one or two citations in the manual Google Patent searches.
  • Of the 1.6% (9 of 560) false matches, four results were patent citations with the same title and author.
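The overall precision figure quoted in the abstract (about 98%) follows directly from these counts; a minimal worked check:

```python
checked = 560       # manually checked matches from the stratified sample
false_matches = 9   # matches that were not genuine citations to the article
precision = (checked - false_matches) / checked   # 551/560
print(f"precision = {precision:.1%}")  # 98.4%, i.e. 1.6% false matches
```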

Google Patent Citation Counts

  • In all fields (Table 3) the vast majority of articles had no patent citations in the Bing results (and hence also in manual Google Patent searches; see above).
  • Biomedical Engineering (10%), Biotechnology (9%), and Pharmacology & Pharmaceutics (7%) had the highest proportions of Scopus articles with at least one patent citation, suggesting that these fields either have a particularly direct commercial value or more of a patenting culture than others.
  • These three fields' proportions are at least triple those of Mechanical Engineering (1.9%) and Energy Engineering (2.2%), showing that there are substantial disciplinary differences in the proportion of academic articles that are cited in patents.

The Relationship between Google Patents and Scopus Citations

  • There are statistically significant positive low correlations between Scopus and Google Patent citations from the Bing API searches in all fields (Table 3).
  • Spearman correlations were used because citation data is typically skewed.
  • The weak but significant positive correlations between patents and Scopus citations across all sixteen science and engineering fields analysed suggest that academic papers are more likely to be commercially valuable if they are more highly cited (see also Tijssen, Buter, & van Leeuwen, 2000).
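Spearman's rank correlation, used here because citation counts are skewed, is simply the Pearson correlation of the ranks. A from-scratch sketch follows; the citation counts below are invented for illustration, not the study's data:

```python
def rank(values):
    """Average ranks (1-based), assigning tied values their mean position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the tied positions, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Invented per-article counts for one hypothetical field.
scopus = [0, 2, 5, 1, 30, 4, 0, 12, 3, 7]
patents = [0, 0, 1, 0, 3, 1, 0, 2, 0, 1]
rho = spearman(scopus, patents)
assert 0 < rho < 1  # positive, imperfect association
```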

Patent Citations by Year

  • Citations take time to accrue, whether from patents or academic articles, and so the publication year is important.
  • There are higher Spearman correlations between Scopus and Google Patent citations for longer time periods in most fields (Table 4), presumably because the additional data makes the statistic more powerful.
  • One reason for the large increases could be that it takes a long time for patents to be processed and granted – over two years for the USPTO (http://www.uspto.gov/learning-and-resources/general-faqs) – and industrial inventors may also be slower to patent than researchers are to publish, or may be less up-to-date with the academic literature if they are not publishing scientists.
  • From the results, long time periods are needed for assessing the commercial or technical value of academic publications based on patent citations (see also: Breschi et al., 2006).
  • For the fields analysed, five years would be an absolute minimum since the figures are very low for 2012, and even fifteen years would give substantially more results than ten years.

Results by Patent Office

  • The URLs of the Google Patent search results were used to assess the share of the citations that originated from the US (United States Patent and Trademark Office), WO (for World Intellectual Property Organization-WIPO), EP (for European Patent Office), CA (Canadian Intellectual Property Office), CN (State Intellectual Property Office of China) and DE (German Patent and Trade Mark Office).
  • Over two thirds of the Google Patents citations in all fields were from US patents.
  • This probably reflects the more extensive use of citations in USPTO patents than in the other offices covered, as well as the English-language selection criterion for the articles analysed.
  • The relatively large size of the USPTO database is another important factor: since 1996 it has had at least three times as many applications per year as the other indexed offices except for China, which broke this pattern in about 2008 and grew rapidly to overtake the US in about 2010 (WIPO, 2014, p14).
  • The duplicate removal process eliminated 3% of the citations, irrespective of patent office (see methods), and this may have had a minor impact on the proportions.
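Assigning a citation to a patent office, as described above, amounts to reading the two-letter country prefix of the patent number in the result URL. The URL format assumed below is an illustration, not necessarily the exact format returned at the time of the study:

```python
import re
from urllib.parse import urlparse

def patent_office(url):
    """Infer the issuing office from a Google Patents result URL via the
    two-letter prefix of the patent number (e.g. US, WO, EP, CA, CN, DE)."""
    path = urlparse(url).path                 # e.g. "/patents/US6285999"
    last = path.rstrip("/").split("/")[-1]
    match = re.match(r"([A-Z]{2})\d", last)
    return match.group(1) if match else None

assert patent_office("https://www.google.com/patents/US6285999") == "US"
assert patent_office("https://www.google.com/patents/EP0927945B1") == "EP"
```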

Patent-Cited Topics and Scopus-Cited Topics

  • A simple heuristic was used to detect whether the topic areas that tended to attract patent citations were the same as those that tended to attract Scopus citations.


Patent Citation Analysis with Google¹
Kayvan Kousha and Mike Thelwall
Statistical Cybermetrics Research Group, School of Mathematics and Computer Science, University of
Wolverhampton, Wulfruna Street, Wolverhampton WV1 1LY, UK.
E-mail: {k.kousha, m.thelwall}@wlv.ac.uk
Citations from patents to scientific publications provide useful evidence about the commercial impact
of academic research but automatically searchable databases are needed to exploit this connection
for large scale patent citation evaluations. Google covers multiple different international patent office
databases but does not index patent citations or allow automatic searches. In response, this article
introduces a semi-automatic indirect method via Bing to extract and filter patent citations from
Google to academic papers with an overall precision of 98%. The method was evaluated with 322,192
science and engineering Scopus articles from every second year during 1996-2012. Although manual
Google Patent searches give more results, especially for articles with many patent citations, the
difference is not large enough to be a major problem. Within Biomedical Engineering, Biotechnology,
and Pharmacology & Pharmaceutics, 7%-10% of Scopus articles had at least one patent citation but
other fields had far fewer so patent citation analysis is only relevant for a minority of publications.
Low but positive correlations between Google Patent citations and Scopus citations across all fields
suggest, however, that traditional citation counts cannot substitute for patent citations when
evaluating research.
Introduction
Bibliometric methods are commonly used to help assess the research impact of scientific publications
based upon citations in conventional citation indexes, such as the Web of Science (WoS) and Scopus.
However, some scholarly publications have a commercial utility that does not directly translate into
academic citations. Since governments value research that has commercial value, alternative methods
are needed to assess it in order to track it or adequately reward its authors. Citations to academic
publications from patents, for instance, suggest the commercial value or at least technological
innovation of the cited article. In some cases, the invention may even have been triggered by the
academic research (Verbeek, et al., 2002). Citations from patents have been used to assess the
relationship between science and industry (e.g., Narin & Olivastro, 1992; Schmoch, 1993; Narin,
Hamilton, & Olivastro, 1997). Of the many bibliometric studies using patent citations (e.g., Tijssen,
Buter, & Van Leeuwen, 2000; Tijssen, 2001; Callaert et al., 2006; Meyer, Debackere, & Glänzel, 2010;
Callaert, Grouwels, & van Looy, 2012; Roach & Cohen, 2013), some applications include assessing the
technological value of academic journals (Huang, Huang, & Chen, 2014; Liaw, Chan, Fan, & Chiang,
2014), the research performance of firms (Nagaoka, 2007; Subramanian & Soh, 2010; Hung, 2012),
university-industry knowledge relationships (Leydesdorff, 2004) and the research performance of
countries (Van Looy et al., 2003). Hence patent citations are an important data source for bibliometrics
but, as discussed below, they are difficult to use for large scale studies.
¹ This is a preprint of an article to be published in the Journal of the Association for Information Science and
Technology © copyright 2015 John Wiley & Sons, Inc.

Bibliometric studies of patent citations can use time-consuming manual searches of patent databases,
such as Google Patents or the Derwent World Patents Index, none of which currently include an
academic citation index to aid the process. This is difficult for researchers or evaluators who want to
estimate the overall citation impact of large numbers of articles from a reasonably comprehensive
collection of patents (Verbeek et al., 2002; Shirabe, 2014) and so many studies are restricted to patents
from a narrow range of years and disciplines (e.g., Tijssen, Buter & van Leeuwen, 2000; Meyer, 2003;
Callaert et al., 2006; Meyer, Debackere, & Glänzel, 2010; Callaert, Grouwels, & van Looy, 2012).
Although attempts have been made to partially automate this process (e.g., Lawson, Kemp, Lynch, &
Chowdhury, 1996; Nanba, Anzen, & Okumura, 2008; Lopez, 2010; Ma, Sun, Wang, & Yang, 2010;
Callaert, Grouwels, & van Looy, 2012), these methods require all the patents to be downloaded or
collected as a first step, which is inefficient and requires repeated downloading to keep the results up-
to-date.
In response to the above problems, this article introduces a practical method to help evaluators and
funders to extract patent citation counts from Google Patents for large collections of academic articles.
Google Patents contains a large collection of fully searchable patents from the United States Patent and
Trademark Office (USPTO) since 1790 and the European Patent Office (EPO) since 1978
(https://support.google.com/faqs/answer/2539193?). It also indexes patents from the World
Intellectual Property Organization (WIPO), Canada, China and Germany. Google Patents is not a citation
index but its full-text search capability can be used to locate citations to other scientific publications in
the patent references. It does not support automatic API searches
(https://developers.google.com/patent-search/) and its main search interface is impractical for large-
scale research evaluations. To solve these problems, the new method introduced here uses automatic
Bing searches (exploiting Bing’s crawl of the Google Patents website) in combination with automatic
duplicate results filtering of the data returned by Bing. The method was evaluated with a study of
citations to 322,192 Scopus articles across sixteen science and engineering fields and manual checks of a
sample of the search results.
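The core of the method can be sketched as query construction for the Bing API: a quoted article title plus a site restriction to Bing's crawl of Google Patents. The exact query template and site pattern used in the study are not given here, so this is an assumption for illustration:

```python
def build_patent_query(title, first_author_lastname):
    """Build a Bing web-search query that looks for an article's title
    within the Google Patents site. Template and site pattern are
    hypothetical sketches of the approach, not the study's exact queries."""
    return f'"{title}" {first_author_lastname} site:google.com/patents'

q = build_patent_query("Patent citation analysis with Google", "Kousha")
assert q.startswith('"Patent citation analysis with Google"')
assert q.endswith("site:google.com/patents")
```

Each query's results would then be passed through the duplicate filtering described in the Methods before counting citing patents.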
Patent Citations
Three decades ago, Narin and Noma (1985) first convincingly argued that patent citations to scientific
papers could be used to investigate the relationship between science and technology. They found a
significant number of citations from US biotechnology patents 1978-1980 to scientific publications.
About half (48%) of the non-patent references (i.e., patent citations to documents other than other
patents) were to journal articles. A slightly larger proportion (56%) was found later for Netherlands-
invented U.S. patents between 1982 and 1985 in all technological fields (van Vianen, Moed, & van Raan,
1990) and so it is likely that academic research is the most important non-patent source of evidence for
U.S. patents. A large-scale study analysed 430,226 non-patent references in about 397,600 U.S. patents
issued 1987-1988 and 1993-1994 (Narin, Hamilton, & Olivastro, 1997). For 1993-1994 patents there
were 1.5 non-patent references per patent. About 73% of the papers cited by industry patents were
public science, authored at academic, governmental, and other public institutions, with the remaining
27% authored by industrial scientists. There was a strong tendency for inventors to cite articles authored
in their own country, at prestigious universities and laboratories, and supported by well-known funding
bodies such as the National Institutes of Health (NIH) and the National Science Foundation (NSF). This
confirms that public science plays an important role in supporting U.S. industry and technology.
Instructions in the US for applicants to include a complete list of references to the state of the art have
led to 3.5 times more non-patent references than in European patents, however (Michel & Bettels,
2001), and so other patent databases may contain substantially fewer academic citations.

Out of 33,127 pharmaceutical EPO patents 1990-1997, 46.5% of the non-patent references were to
scientific articles in ISI (Institute for Scientific Information) databases (now WoS) (Brusoni, Criscuolo, &
Geuna, 2005). A study of 10,000 patents from 1991-2001 found that 55% (USPTO) and 64% (EPO) of the
non-patent references were to journal articles (Callaert et al., 2006). Similarly, in Chinese-authored
USPTO patents 1995-2004, 64% of the non-patent references were to journal articles (Guan & He, 2007)
and in New Zealand USPTO patents 1976-2004, 65% of the non-patent references were to either WoS-
indexed journals (52.6%) or other articles (12.6%) (He & Deng, 2007). In contrast, 90% of the non-patent
references were to journal articles in a study of 6,274 USPTO genetics patents 1980-2004 (Lo, 2010),
revealing the importance of disciplinary differences. Overall, however, the majority of non-patent
references in patents are probably to scientific articles.
A study of patent citations to Dutch research papers from U.S. patents 1987-1996 found a low but
statistically significant correlation (Pearson r=0.16, n=2,241) between academic citations and USPTO
citations to Dutch papers from 1993-1996, suggesting that academic and commercial impact have
something in common (Tijssen, Buter, & van Leeuwen, 2000). Supporting this, although nearly all
respondents (94%) to a survey of 35 Dutch inventors believed that their internal research was important
or critical for their patents, non-patent citations were important sources of information in 70% of cases.
The majority of these non-patent citations were created by the applicants themselves rather than by
examiners (Tijssen, Buter, & van Leeuwen, 2000). Applicants were also found to create more references
than did examiners for 502,687 patents issued by USPTO during 2001-2003, with the difference being
26% (Sampat, 2004).
Citations in patents can reflect the differing citing motivations of both patent authors and examiners
(Schmoch, 1993; Meyer, 2000a; Oppenheim, 2000) and some do not reflect technological innovation
(Jaffe, Trajtenberg, & Fogarty, 2000), although this problem is less frequent in self-citations (Li et al.,
2014). Citations from patent examiners probably do not reflect knowledge flows from public research to
industry (Alcacer & Gittelman 2006; Alcácer, Gittelman, & Sampat, 2009; Roach & Cohen, 2013), but this
is not a substantial problem because they are in a minority (Tijssen, Buter, & Van Leeuwen, 2000;
Sampat, 2004; Lemley & Sampat 2010). The share of scientific references in patents seems to differ
across technological domains (Callaert et al. 2006), patent offices (Michel & Bettels, 2001) and between
domestic and international patents (Tijssen, 2001) and so results from one investigation may not be
valid in other contexts. Overall, however, scientific references in patents can be used cautiously as
indicators of the impact of science on technology (e.g., Meyer, 2000b; Callaert, Pellens, & Van Looy,
2014).
A study of the relationship between patent citations and citation impact for articles from the Science
Citation Index (SCI) matched nanoscience and nanotechnology SCI articles 1991-2004 with patent
citations in the Derwent Innovations Index (DII) up to 2004. Few (4.6% or 7,000) of the SCI papers had
received at least one patent citation and less than 1.5% (2,000) had been cited at least twice. However,
about 14% of the most cited papers had also been cited in patents, indicating that highly cited articles
are more likely to receive patent citations in this field (Meyer, Debackere, & Glänzel, 2010).
Non-patent citations have been used to indicate the technological value of academic journals in a similar
way to Journal Impact Factors (JIFs) for scientific impact (Huang, Huang, & Chen, 2014; Liaw, Chan, Fan,
& Chiang, 2014). One study matched the references in 2011 US patents with journal articles published in
journals in the 2011 Journal Citation Reports (JCR) to generate five and ten-year Technological Impact
Factors (TIFs). There were low positive correlations between five and ten-year TIFs and JIFs, suggesting
that TIFs may reflect a genuinely different type of impact (Huang, Huang, & Chen, 2014).

Text mining techniques to extract information from patents, such as with keywords from titles or
abstracts, can be used for patent analyses (e.g., Yoon & Park, 2004; Tseng, Lin, & Lin, 2007; Lee, Yoon, &
Park, 2009). Several studies have used keywords in patent titles to analyse technical topics (Han et al.,
2014), technological trends (Courtial, Callon, & Sigogneau, 1993) or university-industry relationships
(Leydesdorff, 2004).
Patents play a more significant role in some technological fields than in others (see Cohen, 2010).
According to World Intellectual Property Organization (WIPO) statistics, the share of US patent
applications between 1999 and 2013 was higher in technology fields such as Computer Technology (10.8%),
Medical Technology (7.9%) and Pharmaceuticals (6.3%) than in other fields
(http://www.wipo.int/ipstats/en/statistics/country_profile/profile.jsp?code=US). Another issue is that
there might be some areas that are frequently patented but whose patents are rarely informed by
scientific research. In contrast, biotechnology, biomedical science and pharmaceutics have a high
linkage with scientific research, as discussed above (see Narin & Olivastro 1992; Narin, Hamilton, &
Olivastro, 1997; Verbeek et al. 2002).
Experiments in Web Citation Extraction
Web-based citation analyses of academic publications have the potential to provide data about the
wider impact of research beyond that of conventional citation indicators (Cronin, 2001). For example,
webometric investigations have attempted to extract impact evidence from scholarly articles, digitised
books, clinical guidelines, online presentations, and academic course syllabi on the web.
The Web: Counting citations to academic articles from all web pages provides a free impact indicator
that correlates with conventional citations at both the article and journal levels (Vaughan & Shaw, 2003,
2005). Most of these web citations originate from web CVs, journal tables of contents and library web
sites, which are mainly created for navigational, self-publicity or current awareness (Kousha & Thelwall,
2007b). In response, later studies extracted web citations from specific parts of the web in order to get
more targeted information for the impact assessment.
Google Scholar: Google Scholar indexes citations from documents that are online or provided by
publishers. It seems to cover about 88% (100 million) of the English-language scholarly documents
accessible on the web (Khabsa & Giles, 2014), which is double the size of WoS (about 53 million:
authors' own WoS searches in March 2015). Google Scholar indexes more citations than do WoS and
Scopus in many fields, especially in the social sciences, arts and humanities and computing (Meho &
Yang 2007; Kousha & Thelwall, 2007a; Bar-Ilan, 2008; Kulkarni, Aziz, Shams, & Busse, 2009). Although
the Publish or Perish software can be used to automate data gathering from Google Scholar to some
extent (Harzing & van der Wal, 2008), it cannot be used to extract citations from Google Patents. The
substantial coverage of Google Scholar seems to be useful for assessments of recently published or in-press
publications and other publications (e.g., non-English) that are invisible in conventional citation
indexes. Hence, citation indicators derived from Google Scholar, such as the h-index, have been found to be much
higher than those from either WoS or Scopus (e.g., Amara & Landry, 2012; De Groote & Raszewski, 2012).
Moreover, the retroactive growth of Google Scholar citations seems to be considerably higher than that of WoS,
making it a promising tool for citation tracking (de Winter, Zadpoor, & Dodou, 2014). Nonetheless, the
lack of quality control and potential for manipulation of citation results makes it problematic to use
Google Scholar for impact assessments (e.g., Jacso 2011; Beel & Gipp, 2010; López-Cózar, Robinson-
García, & Torres-Salinas, 2014). Google Scholar includes citations from patents but extensive manual
searching and filtering are needed to locate them and automatic searching is not allowed for this.

Google Books: Google Books does not index citations but contains digitised versions of millions of
books and can be searched for citations (Kousha & Thelwall, 2009). Google Books citations are useful for
the impact assessment of research, especially in book-based fields, because existing citation indexes
include few citations from books (Kousha, Thelwall, & Rezaie, 2011). Although the Google Books API
(Applications Programming Interface) can be used to automate citation counting from digitised books
(Kousha & Thelwall, 2014) and it uses similar scanning technology to Google Patents, Google Books
searches do not include citations from patents.
Online Presentations: Citations from online presentations can give impact evidence in conference-
based subject areas, such as computer science and engineering, where proceedings papers are
important. Many conference papers have associated presentation files (e.g., in Microsoft PowerPoint)
that are shared online (e.g., in slideshare.net or slideshow.com). Citations from presentations can be
collected by automatic Bing searches with web queries that combine bibliographic information with the
advanced search operator filetype:ppt to restrict the results to presentation files (Thelwall & Kousha,
2008).
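Such queries can be assembled in the same way as the patent searches, combining bibliographic information with the filetype: operator; the template below is a sketch, not the exact one used in the cited study:

```python
def presentation_query(title, author_lastname):
    """Combine bibliographic data with Bing's filetype: operator to
    restrict results to PowerPoint files, as described above. The exact
    query template is a hypothetical illustration."""
    return f'"{title}" {author_lastname} filetype:ppt'

q = presentation_query("Patent citation analysis with Google", "Thelwall")
assert q.endswith("filetype:ppt")
```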
Clinical Guidelines: Citations from clinical guidelines directly reflect the impact of published research
on the treatment of patients and are to some extent the health equivalent of patent citations. These
citations can sometimes be systematically gathered from websites that publish them, such as the
National Institute for Health and Clinical Excellence (NICE) site in the UK. A study of NICE citations using
this method found that articles cited in guidelines are more likely to be highly cited in WoS (Thelwall &
Maflahi, 2015).
Academic Syllabi: Mentions of academic outputs in course reading lists can be used as an indicator of
their teaching utility (Kousha & Thelwall, 2008) and this is particularly useful for
textbooks and introductory science books that have primarily educational value. It is possible to
automatically count mentions of monographs in online academic syllabi through a combination of Bing
searches and rules to filter out false matches. Over a third of 14,000 monographs in one study had at
least one academic syllabus mention, with more in the arts and humanities (56%) and social sciences
(52%), confirming the importance of monographs for teaching in book-based subject areas (Kousha &
Thelwall, in press).
In summary, there is empirical evidence that different types of citations can be extracted from the web
for impact assessment. Nevertheless, no previous study has used an automatic method to extract large
numbers of patent citations from the web.
Research Questions
This study introduces and assesses a new method to semi-automatically extract patent citations on a
large scale from the web. The technique exploits both the Google Patents database and the Bing API
automatic search interface. The following questions drive the evaluation and investigations into the
value of the extracted information.
1. Can citations to academic articles be automatically extracted from the Google Patent database
with an acceptable degree of coverage and accuracy?
2. Do Google Patent citations correlate with Scopus citations to academic articles?
3. How do publication date and discipline affect the answers to the above questions?
4. Which patent offices do the Google Patent results mainly originate from?

Citations
Journal ArticleDOI
TL;DR: Mendeley reader counts were more precise than Scopus citations for the most recent articles and all three funders could be demonstrated to have an impact in Wikipedia that was significantly above the world average.

92 citations

Journal ArticleDOI
01 Mar 2017
TL;DR: The results show that citations from Wikipedia to articles are too rare for most research evaluation purposes, with only 5% of articles being cited in all fields, and so Wikipedia is not recommended for evaluations affecting stakeholder interests.
Abstract: Individual academics and research evaluators often need to assess the value of published research. Although citation counts are a recognized indicator of scholarly impact, alternative data is needed to provide evidence of other types of impact, including within education and wider society. Wikipedia is a logical choice for both of these because the role of a general encyclopaedia is to be an understandable repository of facts about a diverse array of topics and hence it may cite research to support its claims. To test whether Wikipedia could provide new evidence about the impact of scholarly research, this article counted citations to 302,328 articles and 18,735 monographs in English indexed by Scopus in the period 2005 to 2012. The results show that citations from Wikipedia to articles are too rare for most research evaluation purposes, with only 5% of articles being cited in all fields. In contrast, a third of monographs have at least one citation from Wikipedia, with the most in the arts and humanities. Hence, Wikipedia citations can provide extra impact evidence for academic monographs. Nevertheless, the results may be relatively easily manipulated and so Wikipedia is not recommended for evaluations affecting stakeholder interests.

89 citations


Cites methods or results from "Patent citation analysis with Google"

  • ...Mentions of books in course reading lists can reflect their teaching value (Kousha & Thelwall, 2008) and hence citations from online academic course syllabi were counted in order to investigate the teaching value of articles cited in Wikipedia (for method details, see Kousha & Thelwall, 2015a)....


  • ...Weak but statistically significant correlations between academic syllabus mentions and traditional citations in many fields are broadly consistent with this conclusion (Kousha & Thelwall, 2015a)....


  • ...A semi-automatic method can extract citations from Google-indexed patents on a large scale and low correlations between patent citations and journal citations suggest that patent citations may reflect the wider commercial value of research (Kousha & Thelwall, 2015b)....


Journal ArticleDOI
TL;DR: This study deals with a relatively new form of societal impact measurements, and reveals that papers published in Nature and Science as well as from the areas “Earth and related environmental sciences” and “Social and economic geography” are especially relevant in the policy context.
Abstract: In the current UK Research Excellence Framework (REF) and the Excellence in Research for Australia (ERA), societal impact measurements are inherent parts of the national evaluation systems. In this study, we deal with a relatively new form of societal impact measurements. Recently, Altmetric--a start-up providing publication level metrics--started to make data for publications available which have been mentioned in policy documents. We regard this data source as an interesting possibility to specifically measure the (societal) impact of research. Using a comprehensive dataset with publications on climate change as an example, we study the usefulness of the new data source for impact measurement. Only 1.2 % (n = 2341) out of 191,276 publications on climate change in the dataset have at least one policy mention. We further reveal that papers published in Nature and Science as well as from the areas "Earth and related environmental sciences" and "Social and economic geography" are especially relevant in the policy context. Given the low coverage of the climate change literature in policy documents, this study can be only a first attempt to study this new source of altmetrics data. Further empirical studies are necessary, because mentions in policy documents are of special interest in the use of altmetrics data for measuring target-oriented the broader impact of research.

71 citations
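The 1.2% coverage figure quoted above is easy to verify from the reported counts (2,341 policy-mentioned publications out of 191,276). A minimal check:

```python
# Reproducing the coverage figure reported in the abstract above:
# 2,341 of 191,276 climate-change publications had at least one policy mention.
policy_mentioned = 2341
total_publications = 191276

coverage = policy_mentioned / total_publications
print(f"{coverage:.1%}")  # → 1.2%
```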

Journal ArticleDOI
Lutz Bornmann1
TL;DR: In this article, the authors discuss how impact is generally measured within science and beyond, which effects impact measurements have on the science system and which problems are associated with impact measurement, including inequality, random chance, anomalies, the right to make mistakes, unpredictability and a high significance of extreme events.
Abstract: Impact of science is one of the most important topics in scientometrics. Recent developments show a fundamental change in impact measurements from impact on science to impact on society. Since impact measurement is currently in a state of far reaching changes, this paper describes recent developments and facing problems in this area. For that, the results of key publications (dealing with impact measurement) are discussed. The paper discusses how impact is generally measured within science and beyond, which effects impact measurements have on the science system and which problems are associated with impact measurement. The problems associated with impact measurement constitute the focus of this paper: Science is marked by inequality, random chance, anomalies, the right to make mistakes, unpredictability and a high significance of extreme events, which might distort impact measurements. Scientometricians as the producer of impact scores and decision makers as their consumers should be aware of these problems and should consider them in the generation and interpretation of bibliometric results, respectively.

69 citations

Posted Content
TL;DR: In this article, the authors deal with a relatively new form of societal impact measurements: Altmetric - a start-up providing publication level metrics - started to make data for publications available which have been mentioned in policy documents and regard this data source as an interesting possibility to specifically measure the (societal) impact of research.
Abstract: In the current UK Research Excellence Framework (REF) and the Excellence in Research for Australia (ERA) societal impact measurements are inherent parts of the national evaluation systems. In this study, we deal with a relatively new form of societal impact measurements. Recently, Altmetric - a start-up providing publication level metrics - started to make data for publications available which have been mentioned in policy documents. We regard this data source as an interesting possibility to specifically measure the (societal) impact of research. Using a comprehensive dataset with publications on climate change as an example, we study the usefulness of the new data source for impact measurement. Only 1.2% (n=2,341) out of 191,276 publications on climate change in the dataset have at least one policy mention. We further reveal that papers published in Nature and Science as well as from the areas "Earth and related environmental sciences" and "Social and economic geography" are especially relevant in the policy context. Given the low coverage of the climate change literature in policy documents, this study can be only a first attempt to study this new source of altmetric data. Further empirical studies are necessary in upcoming years, because mentions in policy documents are of special interest in the use of altmetric data for measuring target-oriented the broader impact of research.

64 citations

References
Journal ArticleDOI
TL;DR: A detailed and systematic examination of the contribution of public science to industrial technology would be useful evidence in arguing the case for governmental support of science as mentioned in this paper, by tracing the rapidly growing citation linkage between U.S. patents and scientific research papers.

1,222 citations


"Patent citation analysis with Googl..." refers background in this paper

  • ...In contrast, biotechnology, biomedical science, and pharmaceutics have a high linkage with scientific research, as discussed above (see Narin & Olivastro, 1992; Narin et al., 1997; Verbeek et al., 2002)....


  • ...A large-scale study analyzed 430,226 nonpatent references in about 397,600 US patents issued in 1987–1988 and 1993–1994 (Narin et al., 1997)....


Journal ArticleDOI
TL;DR: Results show that Scopus significantly alters the relative ranking of those scholars that appear in the middle of the rankings and that GS stands out in its coverage of conference proceedings as well as international, non-English language journals.
Abstract: The Institute for Scientific Information's (ISI, now Thomson Scientific, Philadelphia, PA) citation databases have been used for decades as a starting point and often as the only tools for locating citations and/or conducting citation analyses. The ISI databases (or Web of Science [WoS]), however, may no longer be sufficient because new databases and tools that allow citation searching are now available. Using citations to the work of 25 library and information science (LIS) faculty members as a case study, the authors examine the effects of using Scopus and Google Scholar (GS) on the citation counts and rankings of scholars as measured by WoS. Overall, more than 10,000 citing and purportedly citing documents were examined. Results show that Scopus significantly alters the relative ranking of those scholars that appear in the middle of the rankings and that GS stands out in its coverage of conference proceedings as well as international, non-English language journals. The use of Scopus and GS, in addition to WoS, helps reveal a more accurate and comprehensive picture of the scholarly impact of authors. The WoS data took about 100 hours of collecting and processing time, Scopus consumed 200 hours, and GS a grueling 3,000 hours. © 2007 Wiley Periodicals, Inc.

784 citations

Book ChapterDOI
Wesley M. Cohen1
TL;DR: The authors reviewed the empirical literature on the determination of firms and industries' innovative activity and performance, highlighting the questions addressed, the approaches adopted, impediments to progress in the field, and research opportunities.
Abstract: This chapter reviews the empirical literature on the determination of firms’ and industries’ innovative activity and performance, highlighting the questions addressed, the approaches adopted, impediments to progress in the field, and research opportunities. We review the “neo-Schumpeterian” empirical literature that examines the effects of firm size and market concentration upon innovation, focusing on robust findings, questions of interpretation, and the identification of major gaps. We also consider the more modest literature that considers the effect on innovation of firm characteristics other than size. Finally, we review the literature that considers three classes of factors that affect interindustry variation in innovative activity and performance: demand, appropriability, and technological opportunity conditions.

759 citations


"Patent citation analysis with Googl..." refers background in this paper

  • ...Patents play a more significant role in some technological fields than in others (see Cohen, 2010)....


Journal ArticleDOI
TL;DR: Analysis of patent citations with respect to self-citation, distance, technology overlap, and vintage indicates that inferences about inventor knowledge using pooled citations may suffer from bias or overinflated significance levels.
Abstract: Analysis of patent citations is a core methodology in the study of knowledge diffusion. However, citations made by patent examiners have not been separately reported, adding unknown noise to the data. We leverage a recent change in the reporting of patent data showing citations added by examiners. The magnitude is high: two-thirds of citations on the average patent are inserted by examiners. Furthermore, 40% of all patents have all citations added by examiners. We analyze the distribution of examiner and inventor citations with respect to self-citation, distance, technology overlap, and vintage. Results indicate that inferences about inventor knowledge using pooled citations may suffer from bias or overinflated significance levels.

723 citations


"Patent citation analysis with Googl..." refers background in this paper

  • ...Citations from patent examiners probably do not reflect knowledge flows from public research to industry (Alcácer & Gittelman, 2006; Alcácer, Gittelman, & Sampat, 2009; Roach & Cohen, 2013), but this is not a substantial problem because they are in a minority (Lemley & Sampat, 2012; Sampat, 2004;…...

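The distinction drawn above between examiner-added and inventor-added citations matters for knowledge-flow studies: pooling the two can bias results. A minimal sketch of partitioning a citation set on that flag (the `cited_by_examiner` field is hypothetical, not a real patent-database schema name):

```python
# Sketch: split pooled patent citations into inventor- vs examiner-added,
# mirroring the examiner/inventor distinction discussed above.
from dataclasses import dataclass

@dataclass
class Citation:
    cited_doc: str
    cited_by_examiner: bool  # hypothetical flag, not a real USPTO field name

def partition_citations(citations):
    """Return (inventor_added, examiner_added) citation lists."""
    inventor = [c for c in citations if not c.cited_by_examiner]
    examiner = [c for c in citations if c.cited_by_examiner]
    return inventor, examiner

citations = [
    Citation("US1234567", True),
    Citation("Narin et al. 1997", False),
    Citation("US7654321", True),
]
inventor, examiner = partition_citations(citations)
print(len(inventor), len(examiner))  # → 1 2
```

With this split, an analysis of knowledge flows would be run on the inventor-added list only, since examiner citations probably do not reflect what the inventors actually knew.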

Journal ArticleDOI
TL;DR: The attempt in this paper to automate the whole process not only helps create final patent maps for topic analyses, but also facilitates or improves other patent analysis tasks such as patent classification, organization, knowledge sharing, and prior art searches.
Abstract: Patent documents contain important research results. However, they are lengthy and rich in technical terminology such that it takes a lot of human efforts for analyses. Automatic tools for assisting patent engineers or decision makers in patent analysis are in great demand. This paper describes a series of text mining techniques that conforms to the analytical process used by patent analysts. These techniques include text segmentation, summary extraction, feature selection, term association, cluster generation, topic identification, and information mapping. The issues of efficiency and effectiveness are considered in the design of these techniques. Some important features of the proposed methodology include a rigorous approach to verify the usefulness of segment extracts as the document surrogates, a corpus- and dictionary-free algorithm for keyphrase extraction, an efficient co-word analysis method that can be applied to large volume of patents, and an automatic procedure to create generic cluster titles for ease of result interpretation. Evaluation of these techniques was conducted. The results confirm that the machine-generated summaries do preserve more important content words than some other sections for classification. To demonstrate the feasibility, the proposed methodology was applied to a real-world patent set for domain analysis and mapping, which shows that our approach is more effective than existing classification systems. The attempt in this paper to automate the whole process not only helps create final patent maps for topic analyses, but also facilitates or improves other patent analysis tasks such as patent classification, organization, knowledge sharing, and prior art searches.

695 citations
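The co-word analysis step in the pipeline above can be illustrated with a toy version: count how often term pairs co-occur in the same patent text. This is only a sketch of the general technique with made-up abstracts; the cited paper's actual method adds segmentation, keyphrase extraction, clustering, and cluster titling on top.

```python
# Minimal co-word analysis sketch: count co-occurring term pairs per document.
from itertools import combinations
from collections import Counter

abstracts = [  # toy patent abstracts, for illustration only
    "polymer electrolyte fuel cell",
    "fuel cell membrane polymer",
    "lithium battery electrolyte",
]

cooccurrence = Counter()
for text in abstracts:
    terms = sorted(set(text.split()))  # unique terms, sorted for stable pairs
    cooccurrence.update(combinations(terms, 2))

# Pairs with the highest co-occurrence counts form the basis for clustering.
print(cooccurrence.most_common(3))
```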

Frequently Asked Questions (1)
Q1. What have the authors contributed in "Patent citation analysis with Google"?

In response, this article introduces a semi-automatic indirect method via Bing to extract and filter patent citations from Google to academic papers with an overall precision of 98%. The method was evaluated with 322,192 science and engineering Scopus articles from every second year during 1996-2012. However, low but positive correlations between Google Patent citations and Scopus citations across all fields suggest that traditional citation counts cannot substitute for patent citations when evaluating research.
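The core idea of the indirect method is to query a web search engine for an article's title restricted to patent pages, then filter the hits. The paper's exact query syntax and filtering rules are not reproduced here; the sketch below is a hypothetical illustration of building such a phrase query against Google Patents via Bing:

```python
# Hypothetical sketch of the indirect-search idea: a quoted article title
# restricted to Google Patents pages, submitted as a Bing query URL.
# The actual queries and filtering steps of the paper are not reproduced here.
from urllib.parse import quote_plus

def patent_citation_query(article_title: str) -> str:
    """Build a Bing search URL for patents citing the given article title."""
    phrase = f'"{article_title}" site:patents.google.com'
    return "https://www.bing.com/search?q=" + quote_plus(phrase)

url = patent_citation_query("Patent citation analysis with Google")
print(url)
```

In practice the returned hits would still need the paper's filtering stage, since a title phrase can match a patent page for reasons other than a genuine citation, which is where the reported 98% precision comes from.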