Showing papers in "Journal of the Association for Information Science and Technology in 2017"
TL;DR: This review provides an extensive account of the state of the art in both scholarly use of social media and altmetrics, reviewing the various functions these platforms have in the scholarly communication process and the factors that affect this use.
Abstract: Social media has become integrated into the fabric of the scholarly communication system in fundamental ways, principally through scholarly use of social media platforms and the promotion of new indicators on the basis of interactions with these platforms. Research and scholarship in this area have accelerated since the coining of, and subsequent advocacy for, altmetrics (that is, research indicators based on social media activity). This review provides an extensive account of the state of the art in both scholarly use of social media and altmetrics. The review consists of 2 main parts: the first examines the use of social media in academia, reviewing the various functions these platforms have in the scholarly communication process and the factors that affect this use. The second part reviews empirical studies of altmetrics, discussing the various interpretations of altmetrics, data collection and methodological limitations, and differences according to platform. The review ends with a critical discussion of the implications of this transformation in the scholarly communication system.
TL;DR: The range of RDM activities explored in this study is positioned on a “landscape maturity model,” which reflects current and planned research data services and practice in academic libraries, representing a “snapshot” of current developments and a baseline for future research.
Abstract: This article reports an international study of research data management (RDM) activities, services, and capabilities in higher education libraries. It presents the results of a survey covering higher education libraries in Australia, Canada, Germany, Ireland, the Netherlands, New Zealand, and the UK. The results indicate that libraries have provided leadership in RDM, particularly in advocacy and policy development. Service development is still limited, focused especially on advisory and consultancy services (such as data management planning support and data-related training), rather than technical services (such as provision of a data catalog, and curation of active data). Data curation skills development is underway in libraries, but skills and capabilities are not consistently in place and remain a concern. Other major challenges include resourcing, working with other support services, and achieving “buy in” from researchers and senior managers. Results are compared with previous studies in order to assess trends and relative maturity levels. The range of RDM activities explored in this study is positioned on a “landscape maturity model,” which reflects current and planned research data services and practice in academic libraries, representing a “snapshot” of current developments and a baseline for future research.
TL;DR: In this article, the authors take stock of the main uses of patent citation data and highlight the pitfalls associated with patent data, related to office, time and technology, examiner, and strategic effects.
Abstract: The last 2 decades have witnessed a dramatic increase in the use of patent citation data in social science research. Facilitated by digitization of the patent data and increasing computing power, a community of practice has grown up that has developed methods for using these data to: measure attributes of innovations such as impact and originality; to trace flows of knowledge across individuals, institutions and regions; and to map innovation networks. The objective of this article is threefold. First, it takes stock of these main uses. Second, it discusses 4 pitfalls associated with patent citation data, related to office, time and technology, examiner, and strategic effects. Third, it highlights gaps in our understanding and offers directions for future research.
TL;DR: Comparing science, technology, and medicine with arts, humanities, and social sciences showed a significant difference in attitude on a number of questions, but the effect size was small, suggesting that attitudes are relatively consistent across the academic community.
Abstract: While there is significant progress with policy and a lively debate regarding the potential impact of open access publishing, few studies have examined academics' behavior and attitudes to open access publishing (OAP) in scholarly journals. This article seeks to address this gap through an international and interdisciplinary survey of academics. Issues covered include: use of and intentions regarding OAP, and perceptions regarding advantages and disadvantages of OAP, journal article publication services, peer review, and reuse. Despite reporting engagement in OAP, academics were unsure about their future intentions regarding OAP. Broadly, academics identified the potential for wider circulation as the key advantage of OAP, and were more positive about its benefits than they were negative about its disadvantages. As regards services, rigorous peer review, followed by rapid publication were most valued. Academics reported strong views on reuse of their work; they were relatively happy with noncommercial reuse, but not in favor of commercial reuse, adaptations, and inclusion in anthologies. Comparing science, technology, and medicine with arts, humanities, and social sciences showed a significant difference in attitude on a number of questions, but, in general, the effect size was small, suggesting that attitudes are relatively consistent across the academic community.
TL;DR: This work investigates text relevance decision dynamics in a question‐answering task by direct measurement of eye movement using eye‐tracking and brain activity using electroencephalography (EEG), suggesting differences in cognitive processes used to assess texts of varied relevance levels and providing evidence for the potential to detect these differences in information search sessions.
Abstract: Assessment of text relevance is an important aspect of human–information interaction. For many search sessions it is essential to achieving the task goal. This work investigates text relevance decision dynamics in a question-answering task by direct measurement of eye movement using eye-tracking and brain activity using electroencephalography (EEG). The EEG measurements are correlated with the user's goal-directed attention allocation revealed by their eye movements. In a within-subject lab experiment (N = 24), participants read short news stories of varied relevance. Eye movement and EEG features were calculated in three epochs of reading each news story (early, middle, final) and for periods where relevant words were read. Perceived relevance classification models were learned for each epoch. The results show reading epochs where relevant words were processed could be distinguished from other epochs. The classification models show increasing divergence in processing relevant vs. irrelevant documents after the initial epoch. This suggests differences in cognitive processes used to assess texts of varied relevance levels and provides evidence for the potential to detect these differences in information search sessions using eye tracking and EEG.
TL;DR: The authors identified the 250 most heavily used journals in each of 26 research fields (4,721 journals, 19.4M articles) indexed by the Scopus database, and test whether topic, academic status, and accessibility make articles from these journals more or less likely to be referenced on Wikipedia.
Abstract: With the rise of Wikipedia as a first-stop source for scientific information, it is important to understand whether Wikipedia draws upon the research that scientists value most. Here we identify the 250 most heavily used journals in each of 26 research fields (4,721 journals, 19.4M articles) indexed by the Scopus database, and test whether topic, academic status, and accessibility make articles from these journals more or less likely to be referenced on Wikipedia. We find that a journal's academic status (impact factor) and accessibility (open access policy) both strongly increase the probability of it being referenced on Wikipedia. Controlling for field and impact factor, the odds that an open access journal is referenced on the English Wikipedia are 47% higher than for paywall journals. These findings provide evidence that a major consequence of open access policies is to significantly amplify the diffusion of science, through an intermediary like Wikipedia, to a broad audience.
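To make the 47% figure concrete, the sketch below converts an odds ratio of 1.47 into a change in probability. The baseline probability of 0.10 and the function name `apply_odds_ratio` are illustrative assumptions, not values or methods from the study. The point is only that 47% higher odds is not the same as a 47% higher probability.

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Scale the odds implied by baseline_prob by odds_ratio and return the new probability."""
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline_prob must be strictly between 0 and 1")
    baseline_odds = baseline_prob / (1.0 - baseline_prob)  # p / (1 - p)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)  # convert odds back to a probability


# Hypothetical baseline (assumed, not from the study): 10% of paywall journals referenced.
paywall_prob = 0.10
open_access_prob = apply_odds_ratio(paywall_prob, 1.47)
print(f"{paywall_prob:.3f} -> {open_access_prob:.3f}")
```

Under this assumed baseline, the probability rises by about four percentage points, which is a 40% relative increase rather than 47%. The gap between the two widens as the baseline probability grows.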
TL;DR: It was shown that scientists' data reuse intentions are influenced by both disciplinary level factors and individual level factors, which has practical implications for promoting data reuse practices.
Abstract: This study explores the factors that influence the data reuse behaviors of scientists and identifies the generalized patterns that occur in data reuse across various disciplines. This research employed an integrated theoretical framework combining institutional theory and the theory of planned behavior. The combined theoretical framework can apply the institutional theory at the individual level and extend the theory of planned behavior by including relevant contexts. This study utilized a survey method to test the proposed research model and hypotheses. Study participants were recruited from the Community of Science's (CoS) Scholar Database, and a total of 1,528 scientists responded to the survey. A multilevel analysis method was used to analyze the 1,237 qualified responses. This research showed that scientists' data reuse intentions are influenced by both disciplinary level factors (availability of data repositories) and individual level factors (perceived usefulness, perceived concern, and the availability of internal resources). This study has practical implications for promoting data reuse practices. Three main areas that need to be improved are identified: Educating scientists, providing internal supports, and providing external resources and supports such as data repositories.
TL;DR: Analysis of data from higher education institutions in the UK on their experience of the open‐access (OA) publishing market working within a policy environment favoring “Gold” OA suggests a correlation between APC price and journal quality.
Abstract: This paper reports analysis of data from higher education institutions in the UK on their experience of the open-access (OA) publishing market working within a policy environment favouring ‘Gold’ OA (OA publishing in journals). It models the ‘total cost of publication’ – comprising costs of journal subscriptions, OA article-processing charges (APCs) and new administrative costs – for a sample of 24 institutions. APCs are shown to constitute 12% of the ‘total cost of publication’, APC administration, 1%, and subscriptions, 87% (for a sample of seven publishers). APC expenditure in institutions rose between 2012 and 2014 at the same time as rising subscription costs. There was disproportionately high take up of Gold options for Health and Life Sciences articles. APC prices paid varied widely, with a mean APC of £1,586 in 2014. ‘Hybrid’ options (subscription journals also offering OA for individual articles on payment of an APC) were considerably more expensive than fully-OA titles, but the data indicate a correlation between APC price and journal quality (as reflected in the citation rates of journals). The policy implications of these developments are explored particularly in relation to hybrid OA and potential of offsetting subscription and APC costs.
TL;DR: This integrative study of the use of CiteSpace, a visual analytic tool for finding trends and patterns in scientific literature, identifies three levels of proficiency: level 1 (low proficiency), level 2 (intermediate proficiency), and level 3 (high proficiency).
Abstract: Using visual analytic systems effectively may incur a steep learning curve for users, especially for those who have little prior knowledge of either using the tool or accomplishing analytic tasks. ...
TL;DR: This article discusses how research infrastructures are identified and referenced by scholars in the research literature and how those references are being collected and analyzed for the purposes of evaluating impact and identifies notable challenges that impede the analysis of impact metrics.
Abstract: Recent policy shifts on the part of funding agencies and journal publishers are causing changes in the acknowledgment and citation behaviors of scholars. A growing emphasis on open science and reproducibility is changing how authors cite and acknowledge “research infrastructures”—entities that are used as inputs to or as underlying foundations for scholarly research, including data sets, software packages, computational models, observational platforms, and computing facilities. At the same time, stakeholder interest in quantitative understanding of impact is spurring increased collection and analysis of metrics related to use of research infrastructures. This article reviews work spanning several decades on tracing and assessing the outcomes and impacts from these kinds of research infrastructures. We discuss how research infrastructures are identified and referenced by scholars in the research literature and how those references are being collected and analyzed for the purposes of evaluating impact. Synthesizing common features of a wide range of studies, we identify notable challenges that impede the analysis of impact metrics for research infrastructures and outline key open research questions that can guide future research and applications related to such metrics.
TL;DR: An approach for triaging user content into four severity categories that are defined based on an indication of self‐harm ideation is proposed and it is shown that overall, long‐term users of the forum demonstrate decreased severity of risk over time.
Abstract: In recent years, social media has become a significant resource for improving healthcare and mental health. Mental health forums are online communities where people express their issues, and seek help from moderators and other users. In such forums, there are often posts with severe content indicating that the user is in acute distress and there is a risk of attempted self-harm. Moderators need to respond to these severe posts in a timely manner to prevent potential self-harm. However, the large volume of daily posted content makes it difficult for the moderators to locate and respond to these critical posts. We propose an approach for triaging user content into four severity categories that are defined based on an indication of self-harm ideation. Our models are based on a feature-rich classification framework, which includes lexical, psycholinguistic, contextual, and topic modeling features. Our approaches improve over the state of the art in triaging the content severity in mental health forums by large margins (up to 17% improvement over the F-1 scores). Furthermore, using our proposed model, we analyze the mental state of users and we show that overall, long-term users of the forum demonstrate decreased severity of risk over time. Our analysis on the interaction of the moderators with the users further indicates that without an automatic way to identify critical content, it is indeed challenging for the moderators to provide timely response to the users in need.
TL;DR: It is argued that cooperation and collaborations among iSchools can promote a culture of sustainable information practices among university graduates and researchers in different disciplines that will pave the way for achieving SDGs in every sector.
Abstract: In September 2015, the United Nations (UN) General Assembly passed a resolution identifying 17 Sustainable Development Goals (SDGs) and 169 associated targets, and countries around the world agreed to achieve these by 2030. By conducting a thematic analysis of four key UN policy documents related to sustainable development, this paper argues that alongside financial and other resources, access to, and use of, appropriate information are essential for achieving SDGs. The paper also reviews research on information and sustainability undertaken at the iSchools and the computer and human–computer interaction (HCI) communities. Given that the mission of iSchools is to connect people and society with the required information through the use of appropriate technologies and tools, this paper argues that iSchools can play a key role in helping people, institutions, and businesses, and thus countries around the world, achieve SDGs. The paper identifies 4 broad areas of teaching and research that can help iSchools around the world prepare a trained workforce who can manage, and facilitate access to, information in specific domains and contexts. It is also argued that cooperation and collaborations among iSchools can promote a culture of sustainable information practices among university graduates and researchers in different disciplines that will pave the way for achieving SDGs in every sector.
TL;DR: This paper reports on a set of case studies in which researchers were embedded within data science teams, with observations and analysis focused on the attributes that can help describe data science projects and the challenges faced by the teams executing them, rather than on the algorithms and technologies used to perform the analytics.
Abstract: The challenge in executing a data science project is more than just identifying the best algorithm and tool set to use. Additional sociotechnical challenges include items such as how to define the project goals and how to ensure the project is effectively managed. This paper reports on a set of case studies where researchers were embedded within data science teams and where the researcher observations and analysis were focused on the attributes that can help describe data science projects and the challenges faced by the teams executing these projects, as opposed to the algorithms and technologies that were used to perform the analytics. Based on our case studies, we identified 14 characteristics that can help describe a data science project. We then used these characteristics to create a model that defines two key dimensions of the project. Finally, by clustering the projects within these two dimensions, we identified four types of data science projects, and based on the type of project, we identified some of the sociotechnical challenges that project teams should expect to encounter when executing data science projects.
TL;DR: This meta‐synthesis provides an in‐depth description of acknowledgments research and reveals the five main thematic categories that emerge from this corpus of literature.
Abstract: This review of the literature presents an overview of the last 50 years of research on acknowledgments in the context of scholarly communication. Through qualitative coding and bibliometric methods, this meta-synthesis provides an in-depth description of acknowledgments research and reveals the five main thematic categories that emerge from this corpus of literature. Adopting a historical approach, this review shows a diversified and scattered research landscape. Despite five decades of analysis putting forward the potential value of acknowledgments as markers of scientific capital, the literature still lacks consensus as to the value and functions of acknowledgments within the reward system of science.
TL;DR: This study analyzed a set of retracted articles indexed in Thomson Reuters Web of Science (WoS) and ran multiple experiments to compare changes in scholarly impact against a control set of nonretracted articles, authors, and institutions.
Abstract: During the past few decades, the rate of publication retractions has increased dramatically in academia. In this study, we investigate retractions from a quantitative perspective, aiming to answer two fundamental questions. One, how do retractions influence the scholarly impact of retracted papers, authors, and institutions? Two, does this influence propagate to the wider academic community through scholarly associations? Specifically, we analyzed a set of retracted articles indexed in Thomson Reuters Web of Science (WoS), and ran multiple experiments to compare changes in scholarly impact against a control set of nonretracted articles, authors, and institutions. We further applied the Granger Causality test to investigate whether different scientific topics are dynamically affected by retracted papers occurring within those topics. Our results show two key findings: first, the scholarly impact of retracted papers and authors significantly decreases after retraction, and the most severe impact decrease correlates with retractions based on proven, purposeful scientific misconduct; second, this retraction penalty does not seem to spread through the broader scholarly social graph, but instead has a limited and localized effect. Our findings may provide useful insights for scholars or science committees to evaluate the scholarly value of papers, authors, or institutions related to retractions.
TL;DR: It is argued that banter fosters disclosure in both subreddits, and that banter and disclosure are linked with information‐seeking behaviors in online forums.
Abstract: Although people disclose illicit activities such as drug use online, we currently know little about what information people choose to disclose and share or whether there are differences in behavior depending on the illicit activity being disclosed. This exploratory mixed-methods study examines how people discuss and disclose the use of two different drugs—marijuana and opioids—on Reddit. In this study, hermeneutic content analysis is employed to describe the type of comments people make in forums dedicated to discussions about illicit drugs. With inductive analysis, seven categories of comments were identified: disclosure, instruction and advice, culture, community norms, moralizing, legality, and banter. Our subsequent quantitative analysis indicates that although the amounts of disclosure are similar in each subreddit, there are more instances of instruction and advice in discussions about opiates, and more examples of banter in comments about marijuana use. In fact, both subreddits have high rates of banter. We argue that banter fosters disclosure in both subreddits, and that banter and disclosure are linked with information-seeking behaviors in online forums. This work has implications for future explorations of disclosure online and for public health interventions aimed at disseminating credible information about drug use to at-risk individuals.
TL;DR: This work first leverages a pattern-based method to automatically extract drug–disease pairs with treatment and inducement relationships from free text, and then proposes a network embedding algorithm to calculate the degree of correlation of a drug–disease pair.
Abstract: Automatic extraction of large-scale and accurate drug–disease pairs from the medical literature plays an important role in drug repurposing. However, most existing extraction methods are supervised, and it is costly and time-consuming to manually label datasets of drug–disease pairs. Meanwhile, many drug–disease pairs remain buried in free text. In this work, we first leverage a pattern-based method to automatically extract drug–disease pairs with treatment and inducement relationships from free text. Then, to reflect a drug–disease relation, a network embedding algorithm is proposed to calculate the degree of correlation of a drug–disease pair. In the experiments, we use the method to extract treatment and inducement drug–disease pairs from 27 million medical abstracts and titles available on PubMed. We extract 138,318 unique treatment pairs and 75,396 unique inducement pairs. Our algorithm achieves a precision of 0.912 and a recall of 0.898 in extracting the frequent treatment drug–disease pairs, and a precision of 0.923 and a recall of 0.833 in extracting the frequent inducement drug–disease pairs. In addition, our proposed information network embedding algorithm can efficiently reflect the degree of correlation of drug–disease pairs. Our algorithm can achieve a precision of 0.802 and a recall of 0.783 in the fine-grained evaluation of extracting frequent pairs.
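The precision and recall pairs quoted in this abstract can be combined into a single F1 score (the harmonic mean of the two) for easier comparison. The sketch below is a reader-side illustration: the abstract does not report F1, the helper name `f1_score` and the dictionary labels are hypothetical, and the derived numbers are not results claimed by the authors.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as quoted in the abstract; labels are illustrative.
reported = {
    "treatment pairs": (0.912, 0.898),
    "inducement pairs": (0.923, 0.833),
    "fine-grained evaluation": (0.802, 0.783),
}

for task, (p, r) in reported.items():
    print(f"{task}: F1 = {f1_score(p, r):.3f}")
```

Because the harmonic mean is pulled toward the smaller value, the inducement setting (higher precision but noticeably lower recall) ends up with a lower F1 than the treatment setting.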
TL;DR: This study examines how the goals behind methodology surface in everyday DH work practices and in DH curricula in order to investigate if the critiques that have appeared in relation to DH information work are well founded and to suggest alternative narratives about information work in DH that will help advance the impact of the field in the humanities and beyond.
Abstract: The omnipresence and escalating efficiency of digital, networked information systems alongside the resulting deluge of digital corpora, apps, software, and data has coincided with increased concerns in the humanities with new topics and methods of inquiry. In particular, digital humanities (DH), the subfield that has emerged as the site of most of this work, has received growing attention in higher education in recent years. This study seeks to facilitate a better understanding of digital humanities by studying the motivations and practices of digital humanists as information workers in the humanities. To this end, we observe information work through interviews with DH scholars about their work practices and through a survey of DH programs such as graduate degrees, certificates, minors, and training institutes. In this study we focus on how the goals behind methodology (a link between theories and method) surface in everyday DH work practices and in DH curricula in order to investigate if the critiques that have appeared in relation to DH information work are well founded and to suggest alternative narratives about information work in DH that will help advance the impact of the field in the humanities and beyond.
TL;DR: In this paper, the authors measure the diversification explanatory power of the patent network map, and present a method to objectively choose an optimal trade-off between explanatory power and removing weak links.
Abstract: In the information science literature, recent studies have used patent databases and patent classification information to construct network maps of patent technology classes. In such a patent technology map, almost all pairs of technology classes are connected, whereas most of the connections between them are extremely weak. This observation suggests the possibility of filtering the patent network map by removing weak links. However, removing links may reduce the explanatory power of the network on inventor or organization diversification. The network links may explain the patent portfolio diversification paths of inventors and inventing organizations. We measure the diversification explanatory power of the patent network map, and present a method to objectively choose an optimal trade-off between explanatory power and removing weak links. We show that this method can remove a degree of arbitrariness compared with previous filtering methods based on arbitrary thresholds, and also identify previous filtering methods that created filters outside the optimal trade-off. The filtered map aims to aid in network visualization analyses of the technological diversification of inventors, organizations and other innovation agents, and potential foresight analysis. Such applications to a prolific inventor (Leonard Forbes) and company (Google) are demonstrated.
TL;DR: This paper characterizes the popularity of news articles through a set of online metrics and tries to predict their values across time using machine learning techniques on a large collection of features obtained from various sources, indicating that predicting news popularity at cold start is a difficult task.
Abstract: Prominent news sites on the web provide hundreds of news articles daily. The abundance of news content competing to attract online attention, coupled with the manual effort involved in article selection, necessitates the timely prediction of future popularity of these news articles. The future popularity of a news article can be estimated using signals indicating the article's penetration in social media (e.g., number of tweets) in addition to traditional web analytics (e.g., number of page views). In practice, it is important to make such estimations as early as possible, preferably before the article is made available on the news site (i.e., at cold start). In this paper we perform a study on cold-start news popularity prediction using a collection of 13,319 news articles obtained from Yahoo News, a major news provider. We characterize the popularity of news articles through a set of online metrics and try to predict their values across time using machine learning techniques on a large collection of features obtained from various sources. Our findings indicate that predicting news popularity at cold start is a difficult task, contrary to the findings of a prior work on the same topic. Most articles' popularity may not be accurately anticipated solely on the basis of content features, without having the early-stage popularity values.
TL;DR: The nature of the relationship between UE and learning was more nuanced than expected and has implications for the design of information systems and, more fundamentally, the impetus to make digital environments engaging.
Abstract: User engagement (UE) is a quality of user experience characterized by the depth of an actor's cognitive, temporal, and/or emotional investment in an interaction with a digital system. Currently more art than science, UE has gained theoretical and methodological traction over the past decade, yet there is still a need to establish empirical links between UE and desired outcomes (e.g., learning, behavior change), and to understand the myriad user, system, contextual, and so on, factors that predict successful digital engagement. This paper focuses on the relationship between UE and media format as a potential antecedent, and the outcome of learning, operationalized as short‐term knowledge retention. Participants interacted with two human‐interest stories in one of four media formats: video, audio, narrative text, or transcript‐style text; short‐term knowledge retention was measured using post‐task multiple choice and short‐answer questions. It was anticipated that format would have a strong effect on UE, and that more engaged users would recall more information about the stories. However, these hypotheses were not fully supported, and the nature of the relationship between UE and learning was more nuanced than expected. This research has implications for the design of information systems and, more fundamentally, the impetus to make digital environments engaging.
TL;DR: The proposed framework, named TS‐Petar (Two‐Stage POI Extractor with Temporal Awareness), consists of a POI inventory and a two‐stage time‐aware POI tagger, devised to disambiguate the POI mentions and to resolve their associated temporal awareness accordingly.
Abstract: Twitter has attracted billions of users for life logging and sharing activities and opinions. In their tweets, users often reveal their location information and short-term visiting histories or plans. Capturing users' short-term activities could benefit many applications for providing the right context at the right time and location. In this paper we are interested in extracting locations mentioned in tweets at fine-grained granularity, with temporal awareness. Specifically, we recognize the points-of-interest (POIs) mentioned in a tweet and predict whether the user has visited, is currently at, or will soon visit the mentioned POIs. A POI can be a restaurant, a shopping mall, a bookstore, or any other fine-grained location. Our proposed framework, named TS-Petar (Two-Stage POI Extractor with Temporal Awareness), consists of two main components: a POI inventory and a two-stage time-aware POI tagger. The POI inventory is built by exploiting the crowd wisdom of the Foursquare community. It contains both POIs' formal names and their informal abbreviations, commonly observed in Foursquare check-ins. The time-aware POI tagger, based on the Conditional Random Field (CRF) model, is devised to disambiguate the POI mentions and to resolve their associated temporal awareness accordingly. Three sets of contextual features (linguistic, temporal, and inventory features) and two labeling schema features (OP and BILOU schemas) are explored for the time-aware POI extraction task. Our empirical study shows that the subtask of POI disambiguation and the subtask of temporal awareness resolution call for different feature settings for best performance. We have also evaluated the proposed TS-Petar against several strong baseline methods. The experimental results demonstrate that the two-stage approach achieves the best accuracy and outperforms all baseline methods in terms of both effectiveness and efficiency.
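For readers unfamiliar with the BILOU labeling schema named in this abstract, the sketch below shows how token-level tags are commonly derived from entity spans: B (begin), I (inside), L (last), O (outside), U (unit-length). It is a generic illustration and not the TS-Petar implementation. The example tweet, the `POI` label, and the function name `bilou_tags` are assumptions made for the example.

```python
from typing import List, Tuple


def bilou_tags(n_tokens: int, spans: List[Tuple[int, int]], label: str = "POI") -> List[str]:
    """Return one BILOU tag per token for non-overlapping [start, end) token spans."""
    tags = ["O"] * n_tokens  # every token starts as Outside
    for start, end in spans:
        if end - start == 1:
            tags[start] = f"U-{label}"  # single-token mention
        else:
            tags[start] = f"B-{label}"  # first token of a multi-token mention
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"  # interior tokens
            tags[end - 1] = f"L-{label}"  # last token
    return tags


# Hypothetical tweet, tokenized by hand; spans mark two POI mentions.
tokens = ["Lunch", "at", "Marina", "Bay", "Sands", "then", "Starbucks"]
tags = bilou_tags(len(tokens), [(2, 5), (6, 7)])
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

A sequence model such as a CRF would then be trained to predict these per-token tags. The explicit L and U tags give the model boundary information that a plain inside/outside scheme does not.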
TL;DR: The aim of this paper is to extend the knowledge about the power‐law relationship between citation‐based performance and coauthorship patterns in papers in the natural sciences by analyzing 829,924 articles that received 16,490,346 citations.
Abstract: The aim of this paper is to extend our knowledge about the power-law relationship between citation-based performance and collaboration patterns for papers in the natural sciences. We analyzed 829,924 articles that received 16,490,346 citations. Articles published through collaboration account for 89% of the total. The citation-based performance and collaboration patterns exhibit a power-law correlation with a scaling exponent of 1.20 ± 0.07. Citations to a subfield’s research articles tended to increase 2^1.20, or 2.30, times each time the subfield doubles its number of collaborative papers. The scaling exponent for the power-law relationship for single-authored papers was 0.85 ± 0.11. The citations to a subfield’s single-authored research articles increased 2^0.85, or 1.89, times each time the research area doubles the number of non-collaborative papers. The Matthew effect is stronger for collaborative papers than for single-authored ones. In fact, with a scaling exponent < 1.0, the impact of single-authored papers exhibits a cumulative disadvantage, or inverse Matthew effect.
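Illustrative note: the doubling factors in this abstract follow from the power-law form, in which citations scale as the number of papers raised to the scaling exponent, so doubling the paper count multiplies citations by 2 raised to that exponent. A minimal sketch, assuming that form; the function name `doubling_factor` is hypothetical.

```python
def doubling_factor(exponent: float) -> float:
    """Factor by which citations grow when the paper count doubles,
    assuming citations are proportional to papers ** exponent."""
    return 2.0 ** exponent


# Exponent reported in the abstract for collaborative papers.
print(f"{doubling_factor(1.20):.2f}")  # 2.30

# An exponent above 1 means doubling output more than doubles citations
# (cumulative advantage); an exponent below 1 means it less than doubles them.
print(doubling_factor(1.20) > 2.0)  # True
print(doubling_factor(0.85) < 2.0)  # True
```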
TL;DR: It is shown that online content creation, digital freedom, and access to the mobile Internet may positively impact political engagement, and the development of these factors may not only promote the inclusion of marginalized populations in future political events, but also help to build a more equal society.
Abstract: Information and communication technologies (ICTs) provide a distinctive structure of opportunities with the potential to promote political engagement. However, concerns remain over unequal technological access in our society, as political resources available on the internet empower those with the resources and motivation to take advantage of them, leaving those who are disengaged farther behind. Hence, those who face digital inequalities are not only deprived of the benefits of the so-called Information Society, they are also deprived of exercising their civic rights. To promote political engagement among the marginalized, we analyze different sociotechnical factors that may play a role in promoting their inclusion in future political activities. We employed a survey for marginalized communities to analyze a set of research questions relating to sociotechnical factors. We show that online content creation, digital freedom, and access to the mobile Internet may positively impact political engagement. The development of these factors may not only promote the inclusion of marginalized populations in future political events, but also help to build a more equal society where everyone's voice has a chance to be heard.
TL;DR: This work introduces two systems designed to help retrieve medical literature, both of which receive a long, discursive clinical note as input query, and return highly relevant literature that could be used in support of clinical practice.
Abstract: The large volume of biomedical literature poses a serious problem for medical professionals, who often struggle to keep current with it. At the same time, many health providers consider knowledge of the latest literature in their field a key component of successful clinical practice. In this work, we introduce two systems designed to help retrieve medical literature. Both receive a long, discursive clinical note as input query, and return highly relevant literature that could be used in support of clinical practice. The first system is an improved version of a method previously proposed by the authors; it combines pseudo relevance feedback and a domain-specific term filter to reformulate the query. The second is an approach that uses a deep neural network to reformulate a clinical note. Both approaches were evaluated on the 2014 and 2015 TREC CDS datasets; in our tests, they outperform the previously proposed method by up to 28% in inferred NDCG; furthermore, they are competitive with the state of the art, achieving up to 8% improvement in inferred NDCG.
TL;DR: This article proposes to study the multidimensional user relevance model (MURM) on large-scale query logs, which record users' various search behaviors in natural search settings, and investigates the impact of each dimension on retrieval performance.
Abstract: Modeling multidimensional relevance in information retrieval (IR) has attracted much attention in recent years. However, most existing studies are conducted through relatively small-scale user studies, which may not reflect a real-world and natural search scenario. In this paper, we propose to study the multidimensional user relevance model (MURM) on large-scale query logs, which record users’ various search behaviors (e.g., query reformulations, clicks, and dwell time) in natural search settings. We advance an existing MURM model (including five dimensions: topicality, novelty, reliability, understandability, and scope) by providing two additional dimensions, i.e., interest and habit. The two new dimensions represent personalized relevance judgment on retrieved documents. Further, for each dimension in the enriched MURM model, a set of computable features is formulated. By conducting extensive document ranking experiments on Bing’s query logs and TREC Session Track data, we systematically investigated the impact of each dimension on retrieval performance and gained a series of insightful findings that may benefit the design of future IR systems.
TL;DR: A personalized graph‐based recommender framework is proposed, representing rating history and background multi‐facet information jointly as a relational graph, and a random walk measure is applied to rank available complementary multimedia presentations by their relevancy to a visitor's profile, integrating the various dimensions.
Abstract: Visitors to museums and other cultural heritage sites encounter a wealth of exhibits in a variety of subject areas, but can explore only a small number of them. Moreover, there typically exists rich complementary information that can be delivered to the visitor about exhibits of interest, but only a fraction of this information can be consumed during the limited time of the visit. Recommender systems may help visitors to cope with this information overload. Ideally, the recommender system of choice should model user preferences, as well as background knowledge about the museum's environment, considering aspects of physical and thematic relevancy. We propose a personalized graph-based recommender framework, representing rating history and background multi-facet information jointly as a relational graph. A random walk measure is applied to rank available complementary multimedia presentations by their relevancy to a visitor's profile, integrating the various dimensions. We report the results of experiments conducted using authentic data collected at the Hecht Museum. An evaluation of multiple graph variants, compared with several popular and state-of-the-art recommendation methods, indicates advantages of the graph-based approach.
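Illustrative note: ranking items by a random walk over a relational graph can be sketched with a random walk with restart, which scores every node by how often a walker starting from the visitor ends up there. This is a generic technique chosen for illustration; the abstract does not specify which random walk measure the paper uses. The toy graph, node names, restart probability, and the function `random_walk_with_restart` are all assumptions.

```python
from typing import Dict, List


def random_walk_with_restart(
    graph: Dict[str, List[str]],
    start: str,
    restart: float = 0.15,
    iterations: int = 50,
) -> Dict[str, float]:
    """Approximate visiting probabilities of a walker that, at each step,
    jumps back to `start` with probability `restart` and otherwise moves
    to a uniformly chosen neighbor. Every neighbor must be a key of `graph`."""
    scores = {node: 1.0 if node == start else 0.0 for node in graph}
    for _ in range(iterations):
        nxt = {node: (restart if node == start else 0.0) for node in graph}
        for node, neighbors in graph.items():
            if not neighbors:
                # Dead end: send the remaining probability mass back to start.
                nxt[start] += (1.0 - restart) * scores[node]
                continue
            share = (1.0 - restart) * scores[node] / len(neighbors)
            for neighbor in neighbors:
                nxt[neighbor] += share
        scores = nxt
    return scores


# Hypothetical graph: a visitor rated exhibit_a; exhibit_b shares a theme with it.
graph = {
    "visitor": ["exhibit_a"],
    "exhibit_a": ["visitor", "presentation_a", "theme_x"],
    "theme_x": ["exhibit_a", "exhibit_b"],
    "exhibit_b": ["theme_x", "presentation_b"],
    "presentation_a": ["exhibit_a"],
    "presentation_b": ["exhibit_b"],
}
scores = random_walk_with_restart(graph, "visitor")
ranked = sorted(
    (n for n in graph if n.startswith("presentation")),
    key=scores.get,
    reverse=True,
)
print(ranked)
```

In this toy graph the presentation attached to the rated exhibit ranks above the one reachable only through a shared theme, which is the kind of proximity-based ordering a graph walk provides.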
TL;DR: The goal of this work was to classify tumor event attributes: negation, temporality, and malignancy, using biomedical ontology and linguistically enriched features, and show that the improved classification improves overall template structuring.
Abstract: Radiology reports contain vital diagnostic information that characterizes patient disease progression. However, information from reports is represented in free text, which is difficult to query against for secondary use. Automatic extraction of important information, such as tumor events, using natural language processing offers possibilities for improved clinical decision support, cohort identification, and retrospective evidence-based research for cancer patients. The goal of this work was to classify tumor event attributes: negation, temporality, and malignancy, using biomedical ontology and linguistically enriched features. We report our results on an annotated corpus of 101 hepatocellular carcinoma patient radiology reports, and show that the improved classification improves overall template structuring. Classification performances for negation identification, past temporality classification, and malignancy classification were at 0.94, 0.62, and 0.77 F1, respectively. Incorporating the attributes into full templates raised performance on tumor-related events to 0.72 F1, over a baseline of 0.65 F1. Improvement of negation, malignancy, and temporality classifications led to significant improvements in template extraction for the majority of categories. We present our machine-learning approach to identifying these tumor event attributes from radiology reports, and highlight challenges and areas for improvement.
TL;DR: The results demonstrate that authors in the field of Healthcare tend to cite highly cited documents when they have a choice, and the average citation‐gap between selected or deselected studies narrows slightly over time, which fits poorly with the name‐dropping interpretation and better with the quality and impact‐interpretation.
Abstract: Citation frequencies are commonly interpreted as measures of quality or impact. Yet, the true nature of citations and their proper interpretation have been the center of a long, but still unresolved discussion in Bibliometrics. A comparison of 67,578 pairs of studies on the same healthcare topic, with the same publication age (1–15 years) reveals that when one of the studies is being selected for citation, it has on average received about three times as many citations as the other study. However, the average citation-gap between selected or deselected studies narrows slightly over time, which fits poorly with the name-dropping interpretation and better with the quality and impact-interpretation. The results demonstrate that authors in the field of Healthcare tend to cite highly cited documents when they have a choice. This is more likely caused by differences related to quality than differences related to status of the publications cited.
TL;DR: This study presents a comprehensive analysis of hashtags, tweet contents, and user profiles in Twitter spamming, which is useful for both tweet‐level and user‐level spam detection.
Abstract: Over the years, Twitter has become a popular platform for information dissemination and information gathering. However, the popularity of Twitter has attracted not only legitimate users but also spammers who exploit social graphs, popular keywords, and hashtags for malicious purposes. In this paper, we present a detailed analysis of the HSpam14 dataset, which contains 14 million tweets with spam and ham (i.e., nonspam) labels, to understand spamming activities on Twitter. The primary focus of this paper is to analyze various aspects of spam on Twitter based on hashtags, tweet contents, and user profiles, which are useful for both tweet-level and user-level spam detection. First, we compare the usage of hashtags in spam and ham tweets based on frequency, position, orthography, and co-occurrence. Second, for content-based analysis, we analyze the variations in word usage, metadata, and near-duplicate tweets. Third, for user-based analysis, we investigate user profile information. In our study, we validate that spammers use popular hashtags to promote their tweets. We also observe differences in the usage of words in spam and ham tweets. Spam tweets are more likely to be emphasized using exclamation points and capitalized words. Furthermore, we observe that spammers use multiple accounts to post near-duplicate tweets to promote their services and products. Unlike spammers, legitimate users are likely to provide more information such as their locations and personal descriptions in their profiles. In summary, this study presents a comprehensive analysis of hashtags, tweet contents, and user profiles in Twitter spamming.