Author

Ana Lucic

Other affiliations: DePaul University
Bio: Ana Lucic is an academic researcher from the University of Illinois at Urbana–Champaign. The author has contributed to research in the topics of Noun and Support vector machine, has an h-index of 5, and has co-authored 16 publications receiving 56 citations. Previous affiliations of Ana Lucic include DePaul University.

Papers
Journal ArticleDOI


TL;DR: A two-step approach is introduced to automatically extract three facets - two entities (the agent and object) and the way in which the entities are compared (the endpoint) - from direct comparative sentences in full-text articles, accelerating the systematic review process and identifying gaps where future research should be focused.
Abstract: Highlights: Manual processes used in systematic reviews and comparative effectiveness studies are slow. We built a system to extract three facets (agent, object, and endpoints) from comparative sentences. The system achieved 73%, 93%, and 73% accuracy for agents, objects, and endpoints respectively. A situated case study of Metformin shows how the system results can inform a systematic review. The gaps shown in the tabular summary can help to focus where future work should be directed. Preparing a systematic review can take hundreds of hours to complete, but the process of reconciling different results from multiple studies is the bedrock of evidence-based medicine. We introduce a two-step approach to automatically extract three facets - two entities (the agent and object) and the way in which the entities are compared (the endpoint) - from direct comparative sentences in full-text articles. The system does not require a user to predefine entities in advance and thus can be used in domains where entity recognition is difficult or unavailable. As with a systematic review, the tabular summary produced using the automatically extracted facets shows how experimental results differ between studies. Experiments were conducted using a collection of more than 2 million sentences from three journals (Diabetes, Carcinogenesis, and Endocrinology) and two machine learning algorithms, support vector machines (SVM) and a general linear model (GLM). F1 and accuracy measures for the SVM and GLM differed by only 0.01 across all three comparison facets in a randomly selected set of test sentences. The system achieved its best accuracy, 92%, for objects, whereas the accuracy for both agents and endpoints was 73%. F1 scores were higher for objects (0.77) than for endpoints (0.51) or agents (0.47). A situated evaluation of Metformin, a drug used to treat diabetes, showed system accuracy of 95%, 83%, and 79% for the object, endpoint, and agent respectively. The situated evaluation had higher F1 scores of 0.88, 0.64, and 0.62 for object, endpoint, and agent respectively. On average, only 5.31% of the sentences in a full-text article are direct comparisons, but the tabular summaries suggest that these sentences provide a rich source of currently underutilized information that can be used to accelerate the systematic review process and identify gaps where future research should be focused.
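
The extraction step is framed as supervised classification. As a minimal sketch (not the authors' system; the context features and training examples below are hypothetical), candidate nouns in a comparative sentence can be labeled as agent, object, or endpoint with a linear SVM:

# A minimal sketch, not the authors' system: role classification for
# candidate nouns in a comparative sentence. Features and examples
# are hypothetical illustrations.
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_features = [
    {"head": "metformin", "left": "of",     "cue": "higher"},
    {"head": "placebo",   "left": "than",   "cue": "higher"},
    {"head": "glucose",   "left": "plasma", "cue": "higher"},
]
train_labels = ["agent", "object", "endpoint"]

model = make_pipeline(DictVectorizer(), LinearSVC())
model.fit(train_features, train_labels)

# Predict the role of a new candidate noun from its local context.
test = {"head": "insulin", "left": "than", "cue": "lower"}
print(model.predict([test])[0])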

14 citations


01 Jan 2020
TL;DR: This essay presents quantitative capture and predictive modeling for one of the largest and longest-running mass reading programs of the past two decades: "One Book One Chicago" (OBOC), sponsored by the Chicago Public Library (CPL).
Abstract: This essay presents quantitative capture and predictive modeling for one of the largest and longest-running mass reading programs of the past two decades: "One Book One Chicago" (OBOC), sponsored by the Chicago Public Library (CPL). The Reading Chicago Reading project uses data associated with OBOC as a probe into city-scale library usage and, by extension, as a window onto contemporary reading behavior. The first half of the essay explains why CPL's OBOC program is conducive to modeling, and the second half documents the creation of our models, their underlying data, and the results.

8 citations

Journal ArticleDOI


TL;DR: A new high-level feature is introduced, based on the local syntactic dependencies that an author uses when referring to a named entity (in this case a person's name), and a series of experiments reveals how the amount of data in both the training and test sets influences predictive performance.
Abstract: Accurately determining who wrote a manuscript has captivated scholars of literary history for centuries, as the true author can have important ramifications in religion, law, literary studies, philosophy, and education. A wide array of lexical, character, syntactic, semantic, and application-specific features have been proposed to represent a text so that authorship attribution can be established automatically. Although surface-level features have been tested extensively, few studies have systematically explored high-level features, in part due to limitations in the natural language processing techniques required to capture them. However, high-level features, such as sentence structure, are used subconsciously by a writer and thus may be more consistent than surface-level features, such as word choice. In this article, we introduce a new high-level feature based on the local syntactic dependencies that an author uses when referring to a named entity (in our case a person's name). A series of experiments in the context of movie reviews reveals how the amount of data in both the training and test sets influences predictive performance. Finally, we measure authorship consistency with respect to this new feature and show how consistency influences predictive performance. These results provide other researchers with a new model for how to evaluate new features and suggest that local syntactic dependencies warrant further investigation.
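
As a minimal sketch of this kind of feature (assuming spaCy and its small English model are installed; an illustration of the idea, not the article's pipeline), the local dependency relations surrounding person-name mentions can be collected into a profile:

# A minimal sketch, assuming spaCy with the en_core_web_sm model
# installed (python -m spacy download en_core_web_sm); not the
# article's actual pipeline.
import spacy
from collections import Counter

nlp = spacy.load("en_core_web_sm")

def name_dependency_profile(text):
    """Count (dependency relation, head POS) pairs for PERSON mentions."""
    doc = nlp(text)
    profile = Counter()
    for ent in doc.ents:
        if ent.label_ == "PERSON":
            root = ent.root  # syntactic head token of the name span
            profile[(root.dep_, root.head.pos_)] += 1
    return profile

print(name_dependency_profile(
    "Hitchcock directs the scene masterfully, and Grant carries the film."
))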

7 citations

Proceedings Article


01 Jan 2016
TL;DR: It is shown empirically that establishing whether the head noun is an amount or measure provides a statistically significant improvement, increasing endpoint precision from 0.42 to 0.56 on longer sentences and from 0.51 to 0.58 on shorter sentences, with corresponding gains in recall.
Abstract: Authors of biomedical articles use comparison sentences to communicate the findings of a study and to compare the results of the current study with earlier studies. The Claim Framework defines a comparison claim as a sentence that includes at least two entities that are being compared, and an endpoint that captures the way in which the entities are compared. Although automated methods have been developed to identify comparison sentences in text, identifying the role that a specific noun plays (i.e., entity or endpoint) is much more difficult. Automated methods have been successful at identifying the second entity, but classification models were unable to clearly differentiate between the first entity and the endpoint. We show empirically that establishing whether the head noun is an amount or measure provides a statistically significant improvement that increases endpoint precision from 0.42 to 0.56 on longer sentences and from 0.51 to 0.58 on shorter sentences, and recall from 0.64 to 0.71 on longer and from 0.69 to 0.74 on shorter sentences. The differences were not statistically significant for the second compared entity.
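
A minimal sketch of such a head-noun test (assuming NLTK with the WordNet data downloaded; the "measure"/"quantity" anchor synsets below are my assumption, not the paper's exact resource):

# A minimal sketch, assuming NLTK with WordNet data downloaded
# (nltk.download('wordnet')). The anchor synsets are my assumption,
# not the paper's exact resource.
from nltk.corpus import wordnet as wn

# Anchor concepts: every noun sense of "measure" and "quantity".
ANCHORS = set(wn.synsets("measure", pos=wn.NOUN)) | set(
    wn.synsets("quantity", pos=wn.NOUN)
)

def is_amount_or_measure(noun):
    """True if some noun sense of `noun` is, or inherits from, an anchor."""
    for synset in wn.synsets(noun, pos=wn.NOUN):
        if synset in ANCHORS:
            return True
        if ANCHORS & set(synset.closure(lambda s: s.hypernyms())):
            return True
    return False

print(is_amount_or_measure("concentration"))  # quantity-like head noun
print(is_amount_or_measure("mouse"))          # concrete head noun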

6 citations


15 Mar 2015
TL;DR: This paper focuses on the identification of any claim in an article and on the identification of explicit claims, a subtype of the more general claim; the problem is framed as a classification task employing three domain-independent feature selection strategies.
Abstract: The idea of automating systematic reviews has been motivated both by advances in technology that have increased the availability of full-text scientific articles and by sociological changes that have increased the adoption of evidence-based medicine. Although much work has focused on automating the information retrieval step of the systematic review process, the information extraction and analysis steps have, with a few exceptions, been largely overlooked. In particular, there is a lack of systems that automatically identify the results of an empirical study. Our goal in this paper is to fill that gap. More specifically, we focus on the identification of 1) any claim in an article and 2) explicit claims, a subtype of the more general claim. We frame the problem as a classification task and employ three different domain-independent feature selection strategies (the χ² statistic, information gain, and mutual information) with two different classifiers [support vector machines (SVM) and naive Bayes (NB)]. With respect to both accuracy and F1, the χ² statistic and information gain consistently outperform mutual information. The SVM and NB classifiers had similar accuracy when predicting any claim, but NB had better F1 performance for explicit claims. Lastly, we explored a semantic model developed for a different dataset. Accuracy was lower for the semantic model, but when used with SVM plus sentence location information, this model actually achieved a higher F1 score for predicting explicit claims than all of the feature selection strategies. When used with NB, the prior model for explicit claims performed better than MI, but the F1 score dropped 0.04 to 0.08 compared with models built on training data from the same collection. Further work is needed to understand how features developed for one collection might be used to minimize the amount of training data needed for a new collection.
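
As a minimal sketch of the experimental grid (not the paper's code; the miniature corpus below is hypothetical), scikit-learn pairs each feature selection strategy with each classifier in a few lines; note that scikit-learn ships chi-squared and mutual-information scorers but has no built-in information gain:

# A minimal sketch, not the paper's code: pairing feature selection
# strategies with classifiers on a hypothetical miniature corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical sentences labeled 1 (claim) or 0 (not a claim).
sentences = [
    "Our results show a significant increase in insulin secretion.",
    "We found that the treatment reduced tumor growth.",
    "Samples were incubated at 37 degrees for two hours.",
    "Cells were washed twice with buffered saline.",
]
labels = [1, 1, 0, 0]

for scorer_name, scorer in [("chi2", chi2), ("mutual_info", mutual_info_classif)]:
    for clf in (LinearSVC(), MultinomialNB()):
        model = make_pipeline(
            CountVectorizer(),
            SelectKBest(scorer, k=5),  # keep the 5 highest-scoring terms
            clf,
        )
        model.fit(sentences, labels)
        # Training accuracy only; a real evaluation needs held-out data.
        print(scorer_name, type(clf).__name__, model.score(sentences, labels))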

5 citations


Cited by
Journal ArticleDOI


TL;DR: A review of Benedict Anderson's Imagined Communities: Reflections on the Origin and Spread of Nationalism, published in History of European Ideas.
Abstract: (1995). Imagined communities: Reflections on the origin and spread of nationalism. History of European Ideas: Vol. 21, No. 5, pp. 721-722.

13,241 citations

Journal ArticleDOI


TL;DR: A review of Robert J. Sampson's Great American City: Chicago and the Enduring Neighborhood Effect (Chicago: University of Chicago Press, 2012).
Abstract: Sampson, Robert J. 2012. Great American City: Chicago and the Enduring Neighborhood Effect. Chicago: University of Chicago Press. ISBN-13: 9780226734569. pp. 552, $27.50 cloth. Robert J. Sampson's ...

1,006 citations

Proceedings Article


16 May 2014
TL;DR: A conference paper on crisis informatics, concerned with the collection and analysis of social media data (EPFL record EPFL-CONF-203561).
Abstract: Keywords: Crisis Informatics; Social Media Collection; Social Media Analysis. Reference: EPFL-CONF-203561.

277 citations

Journal Article


TL;DR: The authors reviewed the book "The Sixth Extinction: An Unnatural History" by Elizabeth Kolbert and found it a worthwhile read regardless of a reader's preferred genre.
Abstract: The article reviews the book "The Sixth Extinction: An Unnatural History" by Elizabeth Kolbert.

175 citations


01 Jan 2016

128 citations