Author

Fabio Crestani

Other affiliations: University UCINF, University of Glasgow, Leiden University
Bio: Fabio Crestani is an academic researcher from the University of Lugano. The author has contributed to research in topics including Relevance (information retrieval) and Ranking (information retrieval). The author has an h-index of 40 and has co-authored 365 publications receiving 6237 citations. Previous affiliations of Fabio Crestani include University UCINF and the University of Glasgow.


Papers
Journal ArticleDOI
TL;DR: This survey reviews the algorithms proposed for sentiment analysis in Twitter, categorized by approach, and discusses related fields that have recently attracted increasing attention: Twitter opinion retrieval, tracking sentiments over time, irony detection, emotion detection, and tweet sentiment quantification.
Abstract: Sentiment analysis in Twitter is a field that has recently attracted research interest. Twitter is one of the most popular microblog platforms on which users can publish their thoughts and opinions. Sentiment analysis in Twitter tackles the problem of analyzing the tweets in terms of the opinion they express. This survey provides an overview of the topic by investigating and briefly describing the algorithms that have been proposed for sentiment analysis in Twitter. The presented studies are categorized according to the approach they follow. In addition, we discuss fields related to sentiment analysis in Twitter including Twitter opinion retrieval, tracking sentiments over time, irony detection, emotion detection, and tweet sentiment quantification, tasks that have recently attracted increasing attention. Resources that have been used in the Twitter sentiment analysis literature are also briefly presented. The main contributions of this survey include the presentation of the proposed approaches for sentiment analysis in Twitter, their categorization according to the technique they use, and the discussion of recent research trends of the topic and its related fields.

406 citations

Journal ArticleDOI
TL;DR: This article outlines the basic concepts of probabilistic approaches to information retrieval, presents the principles and assumptions on which they are based, and describes, classifies, and compares the various models proposed in the development of IR using a common formalism.
Abstract: This article surveys probabilistic approaches to modeling information retrieval. The basic concepts of probabilistic approaches to information retrieval are outlined and the principles and assumptions upon which the approaches are based are presented. The various models proposed in the development of IR are described, classified, and compared using a common formalism. New approaches that constitute the basis of future research are described.

244 citations

Book ChapterDOI
05 Sep 2016
TL;DR: This paper proposes a novel early detection task and defines a novel effectiveness measure for systematically comparing early detection algorithms; the measure takes into account both the accuracy of the algorithm's decisions and the delay in detecting positive cases.
Abstract: Several studies in the literature have shown that the words people use are indicative of their psychological states. In particular, depression was found to be associated with distinctive linguistic patterns. However, there is a lack of publicly available data for doing research on the interaction between language and depression. In this paper, we describe our first steps to fill this gap. We outline the methodology we have adopted to build and make publicly available a test collection on depression and language use. The resulting corpus includes a series of textual interactions written by different subjects. The new collection not only encourages research on differences in language between depressed and non-depressed individuals, but also on the evolution of the language use of depressed individuals. Further, we propose a novel early detection task and define a novel effectiveness measure to systematically compare early detection algorithms. This new measure takes into account both the accuracy of the decisions taken by the algorithm and the delay in detecting positive cases. We also present baseline results with novel detection methods that process users’ interactions in different ways.

199 citations
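
The abstract above describes an effectiveness measure that combines decision accuracy with detection delay, but it does not state the formula. The following is a minimal sketch of one plausible shape for such a measure, not the paper's definition: it assumes a sigmoid delay penalty, and the function names, the parameter `o` (the delay at which the penalty reaches one half), and the cost constants `c_fp`, `c_fn`, and `c_tp` are all hypothetical choices made for illustration.

```python
import math


def latency_cost(k: int, o: int) -> float:
    """Sigmoid penalty in [0, 1) that grows as the delay k passes the point o.

    k is the number of textual items seen before deciding; o is an assumed
    tuning parameter. The exponent is clamped to avoid float overflow.
    """
    exponent = min(k - o, 700)
    return 1.0 - 1.0 / (1.0 + math.exp(exponent))


def early_detection_error(
    decision: bool,
    truth: bool,
    k: int,
    o: int,
    c_fp: float = 0.1,  # hypothetical cost of a false alarm
    c_fn: float = 1.0,  # hypothetical cost of a missed positive case
    c_tp: float = 1.0,  # hypothetical maximum cost of a late true positive
) -> float:
    """Error for one subject: lower is better, 0.0 for a true negative."""
    if decision and not truth:
        return c_fp
    if not decision and truth:
        return c_fn
    if decision and truth:
        # A correct positive decision is penalized more the later it arrives.
        return latency_cost(k, o) * c_tp
    return 0.0


def mean_error(results: list[tuple[bool, bool, int]], o: int) -> float:
    """Average error over (decision, truth, delay) triples."""
    return sum(early_detection_error(d, t, k, o) for d, t, k in results) / len(results)
```

Under these assumptions, a correct positive decision made after few items costs almost nothing, while the same decision made long after `o` items costs nearly as much as missing the case entirely, so accuracy and delay are both reflected in a single score.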

Proceedings ArticleDOI
18 Jul 2019
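TL;DR: This paper formulates the task of asking clarifying questions in open-domain information-seeking conversational systems and proposes a retrieval framework consisting of three components: question retrieval, question selection, and document retrieval. The question selection model takes into account the original query and previous question-answer interactions while selecting the next question.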
Abstract: Users often fail to formulate their complex information needs in a single query. As a consequence, they may need to scan multiple result pages or reformulate their queries, which may be a frustrating experience. Alternatively, systems can improve user satisfaction by proactively asking questions of the users to clarify their information needs. Asking clarifying questions is especially important in conversational systems since they can only return a limited number of (often only one) result(s). In this paper, we formulate the task of asking clarifying questions in open-domain information-seeking conversational systems. To this end, we propose an offline evaluation methodology for the task and collect a dataset, called Qulac, through crowdsourcing. Our dataset is built on top of the TREC Web Track 2009-2012 data and consists of over 10K question-answer pairs for 198 TREC topics with 762 facets. Our experiments on an oracle model demonstrate that asking only one good question leads to over 170% retrieval performance improvement in terms of P@1, which clearly demonstrates the potential impact of the task. We further propose a retrieval framework consisting of three components: question retrieval, question selection, and document retrieval. In particular, our question selection model takes into account the original query and previous question-answer interactions while selecting the next question. Our model significantly outperforms competitive baselines. To foster research in this area, we have made Qulac publicly available.

193 citations
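
The Qulac abstract above reports its oracle result as a relative improvement in P@1, the precision of the top-ranked result. As a minimal sketch of how that metric and a relative improvement can be computed, the code below uses hypothetical helper names and invented toy relevance judgments; it does not reproduce the paper's data or its reported figure.

```python
from typing import Sequence


def precision_at_k(ranked_relevance: Sequence[bool], k: int = 1) -> float:
    """Fraction of the top-k ranked results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(ranked_relevance[:k]) / k


def mean_precision_at_k(runs: Sequence[Sequence[bool]], k: int = 1) -> float:
    """Average P@k over the ranked lists of several queries."""
    return sum(precision_at_k(run, k) for run in runs) / len(runs)


def relative_improvement(baseline: float, improved: float) -> float:
    """Percentage change of `improved` over a non-zero `baseline`."""
    return (improved - baseline) / baseline * 100.0


# Toy judgments for four queries (True = relevant), invented for illustration.
baseline_runs = [[False, True], [True, False], [False, False], [False, True]]
clarified_runs = [[True, False], [True, True], [False, True], [True, False]]

baseline_p1 = mean_precision_at_k(baseline_runs, k=1)    # 0.25
clarified_p1 = mean_precision_at_k(clarified_runs, k=1)  # 0.75
gain = relative_improvement(baseline_p1, clarified_p1)   # 200.0 (percent)
```

Because P@1 only looks at the first result, it suits conversational settings where, as the abstract notes, the system can often return only one result.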

Book ChapterDOI
11 Sep 2017
TL;DR: This paper provides an overview of eRisk 2017, the main purpose of which was to explore issues of evaluation methodology, effectiveness metrics and other processes related to early risk detection.
Abstract: This paper provides an overview of eRisk 2017. This was the first year that this lab was organized at CLEF. The main purpose of eRisk was to explore issues of evaluation methodology, effectiveness metrics and other processes related to early risk detection. Early detection technologies can be employed in different areas, particularly those related to health and safety. The first edition of eRisk included a pilot task on early risk detection of depression.

122 citations


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: This book covers probability distributions, linear models for regression and classification, neural networks, kernel methods, sparse kernel machines, graphical models, mixture models and EM, approximate inference, sampling methods, continuous latent variables, sequential data, and combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal ArticleDOI
TL;DR: This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
Abstract: The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.

7,539 citations

Book
08 Jul 2008
TL;DR: This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems and focuses on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis.
Abstract: An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.

7,452 citations

Book
01 Jan 2002
TL;DR: This book introduces social research in four parts: an introduction to inquiry, the structuring of inquiry, modes of observation, and the analysis of quantitative and qualitative data.
Abstract: Part I: AN INTRODUCTION TO INQUIRY. 1. Human Inquiry and Science. 2. Paradigms, Theory, and Research. 3. The Ethics and Politics of Social Research. Part II: THE STRUCTURING OF INQUIRY: QUANTITATIVE AND QUALITATIVE. 4. Research Design. 5. Conceptualization, Operationalization, and Measurement. 6. Indexes, Scales, and Typologies. 7. The Logic of Sampling. Part III: MODES OF OBSERVATION: QUANTITATIVE AND QUALITATIVE. 8. Experiments. 9. Survey Research. 10. Qualitative Field Research. 11. Unobtrusive Research. 12. Evaluation Research. Part IV: ANALYSIS OF DATA: QUANTITATIVE AND QUALITATIVE. 13. Qualitative Data Analysis. 14. Quantitative Data Analysis. 15. Reading and Writing Social Research. Appendix A. Using the Library. Appendix B. Random Numbers. Appendix C. Distribution of Chi Square. Appendix D. Normal Curve Areas. Appendix E. Estimated Sampling Error.

2,884 citations