scispace - formally typeset
Author

Bela Gipp

Bio: Bela Gipp is an academic researcher from the University of Wuppertal. The author has contributed to research in topics: Plagiarism detection & Computer science. The author has an h-index of 32, has co-authored 187 publications, and has received 3759 citations. Previous affiliations of Bela Gipp include University of California & University of California, Berkeley.


Papers
Journal ArticleDOI
TL;DR: Several actions could improve the research landscape: developing a common evaluation framework, agreement on the information to include in research papers, a stronger focus on non-accuracy aspects and user modeling, a platform for researchers to exchange information, and an open-source framework that bundles the available recommendation approaches.
Abstract: In the last 16 years, more than 200 research articles were published about research-paper recommender systems. We reviewed these articles and present some descriptive statistics in this paper, as well as a discussion about the major advancements and shortcomings and an overview of the most common recommendation concepts and approaches. We found that more than half of the recommendation approaches applied content-based filtering (55 %). Collaborative filtering was applied by only 18 % of the reviewed approaches, and graph-based recommendations by 16 %. Other recommendation concepts included stereotyping, item-centric recommendations, and hybrid recommendations. The content-based filtering approaches mainly utilized papers that the users had authored, tagged, browsed, or downloaded. TF-IDF was the most frequently applied weighting scheme. In addition to simple terms, n-grams, topics, and citations were utilized to model users' information needs. Our review revealed some shortcomings of the current research. First, it remains unclear which recommendation concepts and approaches are the most promising. For instance, researchers reported different results on the performance of content-based and collaborative filtering. Sometimes content-based filtering performed better than collaborative filtering and sometimes it performed worse. We identified three potential reasons for the ambiguity of the results. (A) Several evaluations had limitations. They were based on strongly pruned datasets, few participants in user studies, or did not use appropriate baselines. (B) Some authors provided little information about their algorithms, which makes it difficult to re-implement the approaches. Consequently, researchers use different implementations of the same recommendation approaches, which might lead to variations in the results.
(C) We speculated that minor variations in datasets, algorithms, or user populations inevitably lead to strong variations in the performance of the approaches. Hence, finding the most promising approaches is a challenge. As a second limitation, we noted that many authors neglected to take into account factors other than accuracy, for example overall user satisfaction. In addition, most approaches (81 %) neglected the user-modeling process and did not infer information automatically but let users provide keywords, text snippets, or a single paper as input. Information on runtime was provided for 10 % of the approaches. Finally, few research papers had an impact on research-paper recommender systems in practice. We also identified a lack of authority and long-term research interest in the field: 73 % of the authors published no more than one paper on research-paper recommender systems, and there was little cooperation among different co-author groups. We concluded that several actions could improve the research landscape: developing a common evaluation framework, agreement on the information to include in research papers, a stronger focus on non-accuracy aspects and user modeling, a platform for researchers to exchange information, and an open-source framework that bundles the available recommendation approaches.
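The dominant concept the review identifies, content-based filtering with TF-IDF weighting and cosine similarity over papers a user has authored or downloaded, can be illustrated with a minimal sketch. The function names, the toy tokenized documents, and the plain-Python implementation are assumptions for demonstration, not code from any of the reviewed systems:

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Compute a TF-IDF weight vector for each tokenized document."""
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # term frequency scaled by inverse document frequency
        vectors.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def recommend(user_profile, candidates):
    """Rank candidate papers by similarity to the user's own papers."""
    vecs = tf_idf_vectors([user_profile] + candidates)
    user_vec, cand_vecs = vecs[0], vecs[1:]
    scores = [(i, cosine(user_vec, v)) for i, v in enumerate(cand_vecs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

In this sketch the "user model" is simply the concatenated terms of the user's papers, which mirrors the review's observation that most approaches let users supply keywords, text snippets, or a single paper rather than inferring interests automatically.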

648 citations

Journal ArticleDOI
TL;DR: In this paper, the concept of academic search engine optimization (ASEO) is introduced and guidelines are provided on how to optimize scholarly literature for academic search engines in general and for Google Scholar in particular.
Abstract: This article introduces and discusses the concept of academic search engine optimization (ASEO). Based on three recently conducted studies, guidelines are provided on how to optimize scholarly literature for academic search engines in general, and for Google Scholar in particular. In addition, we briefly discuss the risk of researchers' illegitimately ‘over-optimizing’ their articles.

166 citations

01 Jan 2009
TL;DR: The first steps to reverse-engineering Google Scholar’s ranking algorithm are performed and the results may help authors to optimize their articles for Google Scholar and enable researchers to estimate the usefulness of Google Scholar with respect to their search intention and hence the need to use further academic search engines or databases.
Abstract: Google Scholar is one of the major academic search engines, but its ranking algorithm for academic articles is unknown. We performed the first steps towards reverse-engineering Google Scholar's ranking algorithm and present the results in this research-in-progress paper. The results are: citation count is the most highly weighted factor in Google Scholar's ranking algorithm. Therefore, highly cited articles are found significantly more often in higher positions than articles that have been cited less often. As a consequence, Google Scholar seems to be more suitable for finding standard literature than gems or articles by authors advancing a new or different view from the mainstream. However, interesting exceptions occurred for some search queries. Moreover, the occurrence of a search term in an article's title seems to have a strong impact on the article's ranking. The impact of search term frequencies in an article's full text is weak: it makes no difference to an article's ranking whether the article contains the query terms once or multiple times. We further investigated whether the name of an author or journal has an impact on the ranking, and whether differences exist between the ranking algorithms of the different search modes that Google Scholar offers; the answer in both cases was "yes". The results of our research may help authors to optimize their articles for Google Scholar, and they enable researchers to estimate the usefulness of Google Scholar with respect to their search intention, and hence the need to consult further academic search engines or databases. Keywords: Academic Search Engines, Google Scholar, Ranking Algorithm, Research in Progress

146 citations

Proceedings ArticleDOI
12 Oct 2013
TL;DR: It is found that results of offline and online evaluations often contradict each other, and it is concluded that offline evaluations may be inappropriate for evaluating research paper recommender systems, in many settings.
Abstract: Offline evaluations are the most common evaluation method for research paper recommender systems. However, no thorough discussion on the appropriateness of offline evaluations has taken place, despite some voiced criticism. We conducted a study in which we evaluated various recommendation approaches with both offline and online evaluations. We found that results of offline and online evaluations often contradict each other. We discuss this finding in detail and conclude that offline evaluations may be inappropriate for evaluating research paper recommender systems, in many settings.
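The contrast the study draws, between offline evaluations (comparing recommendations against a historical ground truth) and online evaluations (measuring how real users react), can be made concrete with two toy metrics. The specific metric choices (precision@k offline, click-through rate online) and all names below are illustrative assumptions, not the exact measures used in the paper:

```python
def precision_at_k(recommended, relevant, k=10):
    """Offline metric: fraction of the top-k recommendations that appear in
    a ground-truth relevant set (e.g., papers the user later cited)."""
    top_k = recommended[:k]
    return sum(1 for r in top_k if r in relevant) / k

def click_through_rate(shown, clicked):
    """Online metric: fraction of displayed recommendations users clicked."""
    return len(clicked) / len(shown) if shown else 0.0
```

The paper's finding is that an approach scoring well on a metric like the first can still score poorly on a metric like the second, because the historical ground truth is only a proxy for what users actually find useful.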

136 citations

01 Jan 2009
TL;DR: The approach called Citation Proximity Analysis (CPA) is a further development of co-citation analysis, but in addition, considers the proximity of citations to each other within an article’s full-text.
Abstract: This paper presents an approach for identifying similar documents that can be used to assist scientists in finding related work. The approach, called Citation Proximity Analysis (CPA), is a further development of co-citation analysis that additionally considers the proximity of citations to each other within an article's full text. The underlying idea is that the closer citations are to each other, the more likely it is that they are related. In comparison to existing approaches, such as bibliographic coupling, co-citation analysis, or keyword-based approaches, the advantages of CPA are higher precision and the possibility to identify related sections within documents. Moreover, CPA allows a more precise automatic document classification. CPA is used as the primary approach to analyse similarity and to classify the 1.2 million publications contained in the research paper recommender system Scienstein.org.

Introduction and Motivation: The search for related scientific work can be tedious, and important documents are often missed. Difficulties are caused by an increasing number of publications, growing exponentially at a yearly rate of 3.7 %, unclear nomenclature, synonyms, and numerous other factors [1]. In practice, most searches for related work start with some initial papers and navigate the citation web nearest to those papers. However, even the more advanced approaches for identifying related work, based on co-word analysis, collaborative filtering, Subject-Action-Object (SAO) structures, or citation analysis, often do not deliver satisfying results [2-8]. Therefore, we developed a new approach to determine the similarity of documents, which we name Citation Proximity Analysis (CPA). The approach is based on co-citation analysis and improves precision by considering the position of citations. The presented approach was developed for the research paper recommender Scienstein (www.scienstein.org, a research paper recommender focusing on identifying related work, developed by the authors) to assist researchers in finding related work.

The first part of this paper gives an overview of existing methods to identify similar documents, with a focus on the most popular citation analysis approaches and their strengths and weaknesses. The second part explains how CPA can be used to measure similarity and the steps necessary to calculate a new metric that we call the Citation Proximity Index (CPI). Afterwards, first results from an empirical study comparing the performance of co-citation analysis and CPA are presented. Finally, an outlook on further implications and on how CPA could be used in other fields is given.

Related Work: Various approaches exist to determine the degree of similarity of documents in order to identify related work. Whereas text-mining approaches are used in cases in which references are not stated, citation analysis approaches usually deliver superior results, as synonyms and unclear nomenclature do not lead to misleading results [3, 4, 5]. Many citation analysis approaches exist, and they all have their own strengths and weaknesses for identifying similar documents. Among the most widely used are the easily applicable 'cited by' approach, which considers papers as relevant that cite the same input document, and the 'reference list' approach, which considers papers as relevant that were referenced by the input document. The best results can usually be obtained by bibliographic coupling and co-citation analysis, which allow calculating the coupling strength [6]. These approaches, invented in the 1960s and 1970s, are used by scientists and on academic search engine websites like CiteSeer.
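The core idea of CPA, weighting co-citations by how close the two citations appear in the citing document's full text, can be sketched as follows. The proximity weights used here (same sentence 1.0, same paragraph 0.5, same document 0.25) and the data layout are illustrative assumptions for demonstration, not the exact CPI scheme from the paper:

```python
def citation_proximity_index(corpus, a, b):
    """Toy CPI: sum proximity weights over every co-citation of papers a and b.

    `corpus` maps each citing document to a list of (cited_id, paragraph, sentence)
    tuples, recording where in the full text each citation occurs. The weights
    below are illustrative assumptions, not the values from the paper.
    """
    score = 0.0
    for citations in corpus.values():
        locs_a = [(p, s) for cid, p, s in citations if cid == a]
        locs_b = [(p, s) for cid, p, s in citations if cid == b]
        for (pa, sa) in locs_a:
            for (pb, sb) in locs_b:
                if (pa, sa) == (pb, sb):
                    score += 1.0    # co-cited in the same sentence
                elif pa == pb:
                    score += 0.5    # co-cited in the same paragraph
                else:
                    score += 0.25   # co-cited anywhere in the same document
    return score
```

Classical co-citation analysis corresponds to the degenerate case where every co-citation in a document contributes the same weight regardless of position; CPA's refinement is precisely the position-dependent weighting.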

133 citations


Cited by
Posted Content
TL;DR: It is found that BERT was significantly undertrained, and can match or exceed the performance of every model published after it, and the best model achieves state-of-the-art results on GLUE, RACE and SQuAD.
Abstract: Language model pretraining has led to significant performance gains but careful comparison between different approaches is challenging. Training is computationally expensive, often done on private datasets of different sizes, and, as we will show, hyperparameter choices have significant impact on the final results. We present a replication study of BERT pretraining (Devlin et al., 2019) that carefully measures the impact of many key hyperparameters and training data size. We find that BERT was significantly undertrained, and can match or exceed the performance of every model published after it. Our best model achieves state-of-the-art results on GLUE, RACE and SQuAD. These results highlight the importance of previously overlooked design choices, and raise questions about the source of recently reported improvements. We release our models and code.

13,994 citations

01 Jan 2009

3,235 citations

01 Jan 1985
TL;DR: In this paper, Meyrowitz shows how changes in media have created new social situations that are no longer shaped by where we are or who is "with" us, making it impossible for us to behave with each other in traditional ways.
Abstract: How have changes in media affected our everyday experience, behavior, and sense of identity? Such questions have generated endless arguments and speculations, but no thinker has addressed the issue with such force and originality as Joshua Meyrowitz in No Sense of Place. Advancing a daring and sophisticated theory, Meyrowitz shows how television and other electronic media have created new social situations that are no longer shaped by where we are or who is "with" us. While other media experts have limited the debate to message content, Meyrowitz focuses on the ways in which changes in media rearrange "who knows what about whom" and "who knows what compared to whom," making it impossible for us to behave with each other in traditional ways. No Sense of Place explains how the electronic landscape has encouraged the development of:
- More adultlike children and more childlike adults;
- More career-oriented women and more family-oriented men; and
- Leaders who try to act more like the "person next door" and real neighbors who want to have a greater say in local, national, and international affairs.
The dramatic changes fostered by electronic media, notes Meyrowitz, are neither entirely good nor entirely bad. In some ways, we are returning to older, pre-literate forms of social behavior, becoming "hunters and gatherers of an information age." In other ways, we are rushing forward into a new social world. New media have helped to liberate many people from restrictive, place-defined roles, but the resulting heightened expectations have also led to new social tensions and frustrations. Once taken-for-granted behaviors are now subject to constant debate and negotiation. The book richly explicates the quadruple pun in its title: Changes in media transform how we sense information and how we make sense of our physical and social places in the world.

1,361 citations

Journal ArticleDOI
TL;DR: A comprehensive classification of blockchain-enabled applications across diverse sectors such as supply chain, business, healthcare, IoT, privacy, and data management is presented, and key themes, trends and emerging areas for research are established.

1,310 citations

Journal ArticleDOI
TL;DR: A holistic framework which incorporates different components from IoT architectures/frameworks proposed in the literature, in order to efficiently integrate smart home objects in a cloud-centric IoT based solution is proposed.

1,003 citations