Author

Fernando Martínez-Santiago

Bio: Fernando Martínez-Santiago is an academic researcher from the University of Jaén. The author has contributed to research on topics including CLEF and query expansion. The author has an h-index of 7 and has co-authored 26 publications receiving 182 citations.

Papers
Posted ContentDOI
TL;DR: Bias is introduced formally, its treatment in several networks is explored in terms of detection and correction, and a strategy to deal with bias in deep NLP is proposed.
Abstract: Deep neural networks are hegemonic approaches to many machine learning areas, including natural language processing (NLP). Thanks to the availability of large corpora collections and the capability of deep architectures to shape internal language mechanisms in self-supervised learning processes (also known as “pre-training”), versatile and performing models are released continuously for every new network design. These networks, somehow, learn a probability distribution of words and relations across the training collection used, inheriting the potential flaws, inconsistencies and biases contained in such a collection. As pre-trained models have been found to be very useful approaches to transfer learning, dealing with bias has become a relevant issue in this new scenario. We introduce bias in a formal way and explore how it has been treated in several networks, in terms of detection and correction. In addition, available resources are identified and a strategy to deal with bias in deep NLP is proposed.

78 citations

Journal ArticleDOI
TL;DR: The results show that CESA is a valid solution for sentiment analysis and that similar approaches for model building from the continuous flow of posts could be exploited in other scenarios.
Abstract: With the rapid growth of data generated by social web applications new paradigms in the generation of knowledge are opening. This paper introduces Crowd Explicit Sentiment Analysis (CESA) as an approach for sentiment analysis in social media environments. Similar to Explicit Semantic Analysis, microblog posts are indexed by a predefined collection of documents. In CESA, these documents are built up from common emotional expressions in social streams. In this way, texts are projected to feelings or emotions. This process is performed within a Latent Semantic Analysis. A few simple regular expressions (e.g. “I feel X”, considering X a term representing an emotion or feeling) are used to scratch the enormous flow of micro-blog posts to generate a textual representation of an emotional state with clear polarity value (e.g. angry, happy, sad, confident, etc.). In this way, new posts can be indexed by these feelings according to the distance to their textual representation. The approach is suitable in many scenarios dealing with social media publications and can be implemented in other languages with little effort. In particular, we have evaluated the system on Polarity Classification with both English and Spanish data sets. The results show that CESA is a valid solution for sentiment analysis and that similar approaches for model building from the continuous flow of posts could be exploited in other scenarios.

40 citations
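
The CESA abstract above describes harvesting posts with simple patterns such as "I feel X" and then indexing new posts by their distance to the resulting emotion documents. A minimal sketch of that idea follows. The emotion list, the regular expression, the sample posts and all function names are assumptions made for illustration, and plain bag-of-words cosine similarity stands in for the Latent Semantic Analysis step that the paper uses.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical emotion list and pattern, chosen for illustration only.
EMOTIONS = {"happy", "sad", "angry", "confident"}
FEEL_PATTERN = re.compile(r"\bI feel (\w+)", re.IGNORECASE)


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_emotion_documents(posts):
    """Collect the tokens of every post matching 'I feel X' under emotion X."""
    documents = defaultdict(Counter)
    for post in posts:
        match = FEEL_PATTERN.search(post)
        if match and match.group(1).lower() in EMOTIONS:
            documents[match.group(1).lower()].update(tokenize(post))
    return documents


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def project(post, documents):
    """Index a new post by its similarity to each emotion document."""
    vector = Counter(tokenize(post))
    return {emotion: cosine(vector, doc) for emotion, doc in documents.items()}


# Made-up stream of posts; only the first three match the pattern.
stream = [
    "I feel happy because the sun is shining today",
    "I feel sad, the rain ruined my holiday",
    "I feel happy about my new job",
    "Nothing to report",
]
emotion_docs = build_emotion_documents(stream)
scores = project("The sun is shining", emotion_docs)
best = max(scores, key=scores.get)
```

In this toy stream the new post shares most of its words with the posts harvested under "happy", so that emotion receives the highest score. A closer reproduction would apply a dimensionality reduction over the emotion documents before measuring distances.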

Journal ArticleDOI
TL;DR: The Otium planner system for scheduling of leisure tasks in tourism allows users to create their own agenda of activities within specified dates, and the Ajax-based web interface eases the creation of the final plan, offering an interactive experience to the user.
Abstract: This paper introduces the Otium planner system for scheduling of leisure tasks in tourism. This novel service allows users to create their own agenda of activities within specified dates. Activities are selected from a list of recommended events according to last selected events, user preferences and other parameters. The proposed restrictions on the recommendation procedure have been found to capture static and dynamic user context. The recommendation function is linear and shows low computational cost. The events are extracted from web sources with almost no human manipulation, so the recommender is always showing new and recent events. The Ajax-based web interface eases the creation of the final plan, offering an interactive experience to the user. We consider that the trade-off between interactivity and recommendation complexity exists, and that the second issue is preferable in this type of services. The details about the design and implementation of the system are described, along with the issues the system resolves and some guidelines for enhancement.

25 citations
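
The Otium abstract above mentions a linear recommendation function driven by the last selected events, user preferences and the requested dates. The sketch below reads "linear" as a weighted sum of two signals, which is one possible interpretation and not the published formula. The `Event` fields, the weights and the novelty rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    name: str
    category: str
    day: int


# Hypothetical weights; the abstract does not state the actual ones.
W_PREFERENCE = 0.7
W_NOVELTY = 0.3


def score(event, preferences, selected):
    """Weighted sum of a static signal (preference) and a dynamic one (novelty)."""
    preference = preferences.get(event.category, 0.0)
    # Assumed novelty rule: categories already in the agenda get no bonus.
    already_planned = any(s.category == event.category for s in selected)
    novelty = 0.0 if already_planned else 1.0
    return W_PREFERENCE * preference + W_NOVELTY * novelty


def recommend(candidates, preferences, selected, first_day, last_day):
    """Keep events inside the requested dates and rank them by score."""
    in_range = [e for e in candidates if first_day <= e.day <= last_day]
    return sorted(in_range, key=lambda e: score(e, preferences, selected), reverse=True)


# Made-up user profile, agenda and candidate events.
preferences = {"music": 0.9, "museum": 0.4, "sport": 0.1}
agenda = [Event("Jazz night", "music", 2)]
candidates = [
    Event("Rock concert", "music", 3),
    Event("Art exhibition", "museum", 3),
    Event("Marathon", "sport", 9),
    Event("Local derby", "sport", 4),
]
ranking = [e.name for e in recommend(candidates, preferences, agenda, 1, 5)]
```

Each candidate is scored in constant time, so scoring the whole list costs one pass over the events, which is in keeping with the low computational cost the abstract reports.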

Journal ArticleDOI
TL;DR: This paper proposes 2-step RSV (RSV: Retrieval Status Value), an approach to obtain a single list of relevant documents for CLIR systems driven by query translation. It is based on re-indexing the retrieved documents according to the query vocabulary, and it performs noticeably better than traditional methods.
Abstract: A usual strategy to implement CLIR (Cross-Language Information Retrieval) systems is the so-called query translation approach. The user query is translated for each language present in the multilingual collection in order to compute an independent monolingual information retrieval process per language. Thus, this approach divides documents according to language. In this way, we obtain as many different collections as languages. After searching in these corpora and obtaining a result list per language, we must merge them in order to provide a single list of retrieved articles. In this paper, we propose an approach to obtain a single list of relevant documents for CLIR systems driven by query translation. This approach, which we call 2-step RSV (RSV: Retrieval Status Value), is based on the re-indexing of the retrieved documents according to the query vocabulary, and it performs noticeably better than traditional methods. The proposed method requires query vocabulary alignment: given a word for a given query, we must know the translation or translations to the other languages. Because this is not always possible, we have investigated a mixed model. This mixed model is applied in order to deal with queries with partial word-level alignment. The results prove that even in this scenario, 2-step RSV performs better than traditional merging methods.

24 citations
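
The 2-step RSV abstract above describes re-indexing the documents retrieved per language according to an aligned query vocabulary, so that one score ranks all of them. The following sketch shows one way this could look, assuming full word-level alignment. The data layout, the function name and the smoothed tf-idf weighting are assumptions for illustration, since the abstract does not specify the weighting scheme.

```python
import math
from collections import Counter


def two_step_rsv(retrieved, aligned_query):
    """Re-score documents retrieved by the monolingual runs with a shared index.

    retrieved: list of (doc_id, language, tokens) from the first step.
    aligned_query: list of concepts, each mapping language -> query term.
    """
    n_docs = len(retrieved)
    # Re-index: a query term and its translations count as a single concept.
    concept_tf = {}
    for doc_id, lang, tokens in retrieved:
        counts = Counter(tokens)
        concept_tf[doc_id] = [counts[c.get(lang, "")] for c in aligned_query]
    # Document frequency of each concept across every language at once.
    df = [
        sum(1 for tf in concept_tf.values() if tf[i] > 0)
        for i in range(len(aligned_query))
    ]
    # Assumed weighting: tf * log(1 + N / df), skipping concepts never seen.
    scores = {}
    for doc_id, tf in concept_tf.items():
        scores[doc_id] = sum(
            tf[i] * math.log(1 + n_docs / df[i])
            for i in range(len(aligned_query))
            if df[i] > 0
        )
    return sorted(scores, key=scores.get, reverse=True)


# Made-up output of an English run and a Spanish run for the same query.
retrieved = [
    ("en1", "en", ["oil", "prices", "rise", "oil"]),
    ("en2", "en", ["football", "results"]),
    ("es1", "es", ["precios", "del", "petróleo"]),
    ("es2", "es", ["precios", "de", "la", "vivienda"]),
]
aligned_query = [
    {"en": "oil", "es": "petróleo"},
    {"en": "prices", "es": "precios"},
]
merged = two_step_rsv(retrieved, aligned_query)
```

Because document frequencies are computed over the merged set of retrieved documents, scores from different languages become directly comparable, which is the point of the second step. The mixed model for partially aligned queries is not covered by this sketch.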

Journal ArticleDOI
TL;DR: Simple Upper Ontology (SUpO) is presented: a multilingual semantic grammar, implemented with Grammatical Framework, that is made up of detailed knowledge of everyday facts expressed with simple words.
Abstract: Beginning communicators are children faced with the task of language learning. Young, typically developing children are early speakers before the age of 2 years, the initial period of communication and language development. When this development is not happening because of disabilities or delays, it is possible to use computer-aided tools in order to help people to palliate or overcome such limitations, at least partially. For example, an Augmentative and Alternative Communication system must manage a vocabulary made up of several hundred concepts, usually without knowledge of the language at a semantic and pragmatic level. Such knowledge would make possible, for example, the implementation of new strategies for word prediction based on meaning, a very precise natural language generation or semantic parsing of the messages so that it would not allow the composition of meaningless messages. We present Simple Upper Ontology, SUpO, a semantic grammar which is made up of detailed knowledge of facts of the everyday life of simple words. In order to build SUpO we developed a procedure designed to give syntactic detail to part of FrameNet, a well-known ontology which encodes knowledge about usages of language at the semantic level. The result of this procedure is a multilingual semantic grammar which has been implemented by using Grammatical Framework. Finally, we propose some examples of tools where the use of SUpO would be suitable.

12 citations


Cited by
Journal ArticleDOI
TL;DR: A rigorous survey on sentiment analysis is presented, which portrays views presented by over one hundred articles published in the last decade regarding necessary tasks, approaches, and applications of sentiment analysis.
Abstract: With the advent of Web 2.0, people became more eager to express and share their opinions on the web regarding day-to-day activities and global issues as well. Evolution of social media has also contributed immensely to these activities, thereby providing us a transparent platform to share views across the world. These electronic Word of Mouth (eWOM) statements expressed on the web are much prevalent in business and service industry to enable customers to share their point of view. In the last one and a half decades, research communities, academia, public and service industries have been working rigorously on sentiment analysis, also known as opinion mining, to extract and analyze public mood and views. In this regard, this paper presents a rigorous survey on sentiment analysis, which portrays views presented by over one hundred articles published in the last decade regarding necessary tasks, approaches, and applications of sentiment analysis. Several sub-tasks need to be performed for sentiment analysis, which in turn can be accomplished using various approaches and techniques. This survey, covering published literature during 2002-2015, is organized on the basis of sub-tasks to be performed, machine learning and natural language processing techniques used and applications of sentiment analysis. The paper also presents open issues along with a summary table of a hundred and sixty-one articles.

1,011 citations

Journal ArticleDOI
TL;DR: A detailed and up-to-date survey of the field, considering the different kinds of interfaces, the diversity of recommendation algorithms, the functionalities offered by these systems and their use of Artificial Intelligence techniques.
Abstract: Recommender systems are currently being applied in many different domains. This paper focuses on their application in tourism. A comprehensive and thorough search of the smart e-Tourism recommenders reported in the Artificial Intelligence journals and conferences since 2008 has been made. The paper provides a detailed and up-to-date survey of the field, considering the different kinds of interfaces, the diversity of recommendation algorithms, the functionalities offered by these systems and their use of Artificial Intelligence techniques. The survey also provides some guidelines for the construction of tourism recommenders and outlines the most promising areas of work in the field for the next years.

402 citations

Journal ArticleDOI
TL;DR: This paper provides a detailed survey of popular deep learning models that are increasingly applied in sentiment analysis, presents a taxonomy of sentiment analysis, and highlights the power of deep learning architectures for solving sentiment analysis problems.
Abstract: Social media is a powerful source of communication among people to share their sentiments in the form of opinions and views about any topic or article, which results in an enormous amount of unstructured information. Business organizations need to process and study these sentiments to investigate data and to gain business insights. Hence, to analyze these sentiments, various machine learning, and natural language processing-based approaches have been used in the past. However, deep learning-based methods are becoming very popular due to their high performance in recent times. This paper provides a detailed survey of popular deep learning models that are increasingly applied in sentiment analysis. We present a taxonomy of sentiment analysis and discuss the implications of popular deep learning architectures. The key contributions of various researchers are highlighted with the prime focus on deep learning approaches. The crucial sentiment analysis tasks are presented, and multiple languages are identified on which sentiment analysis is done. The survey also summarizes the popular datasets, key features of the datasets, deep learning model applied on them, accuracy obtained from them, and the comparison of various deep learning models. The primary purpose of this survey is to highlight the power of deep learning architectures for solving sentiment analysis problems.

385 citations

BookDOI
01 Jan 2004
TL;DR: The paper discusses the evaluation approach adopted, describes the tracks and tasks offered and the test collections used, and provides an outline of the guidelines given to the participants.
Abstract: We describe the overall organization of the CLEF 2003 evaluation campaign, with a particular focus on the cross-language ad hoc and domain-specific retrieval tracks. The paper discusses the evaluation approach adopted, describes the tracks and tasks offered and the test collections used, and provides an outline of the guidelines given to the participants. It concludes with an overview of the techniques employed for results calculation and analysis for the monolingual, bilingual, multilingual and GIRT tasks.

214 citations

01 Jan 2005
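TL;DR: A collection of CLEF 2004 papers covering ad hoc text retrieval, domain-specific document retrieval, interactive cross-language retrieval, multiple language question answering, cross-language image retrieval and cross-language spoken document retrieval.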
Abstract: What Happened in CLEF 2004?.- I. Ad Hoc Text Retrieval Tracks.- CLEF 2004: Ad Hoc Track Overview and Results Analysis.- Selection and Merging Strategies for Multilingual Information Retrieval.- Using Surface-Syntactic Parser and Deviation from Randomness.- Cross-Language Retrieval Using HAIRCUT at CLEF 2004.- Experiments on Statistical Approaches to Compensate for Limited Linguistic Resources.- Application of Variable Length N-Gram Vectors to Monolingual and Bilingual Information Retrieval.- Integrating New Languages in a Multilingual Search System Based on a Deep Linguistic Analysis.- IR-n r2: Using Normalized Passages.- Using COTS Search Engines and Custom Query Strategies at CLEF.- Report on Thomson Legal and Regulatory Experiments at CLEF-2004.- Effective Translation, Tokenization and Combination for Cross-Lingual Retrieval.- Two-Stage Refinement of Transitive Query Translation with English Disambiguation for Cross-Language Information Retrieval: An Experiment at CLEF 2004.- Dictionary-Based Amharic - English Information Retrieval.- Dynamic Lexica for Query Translation.- SINAI at CLEF 2004: Using Machine Translation Resources with a Mixed 2-Step RSV Merging Algorithm.- Mono- and Crosslingual Retrieval Experiments at the University of Hildesheim.- University of Chicago at CLEF2004: Cross-Language Text and Spoken Document Retrieval.- UB at CLEF2004: Cross Language Information Retrieval Using Statistical Language Models.- MIRACLE's Hybrid Approach to Bilingual and Monolingual Information Retrieval.- Searching a Russian Document Collection Using English, Chinese and Japanese Queries.- Dublin City University at CLEF 2004: Experiments in Monolingual, Bilingual and Multilingual Retrieval.- Finnish, Portuguese and Russian Retrieval with Hummingbird SearchServer™ at CLEF 2004.- Data Fusion for Effective European Monolingual Information Retrieval.- The XLDB Group at CLEF 2004.- The University of Glasgow at CLEF 2004: French Monolingual Information Retrieval with Terrier.- II. Domain-Specific Document Retrieval.- The Domain-Specific Track in CLEF 2004: Overview of the Results and Remarks on the Assessment Process.- University of Hagen at CLEF 2004: Indexing and Translating Concepts for the GIRT Task.- IRIT at CLEF 2004: The English GIRT Task.- Ricoh at CLEF 2004.- GIRT and the Use of Subject Metadata for Retrieval.- III. Interactive Cross-Language Information Retrieval.- iCLEF 2004 Track Overview: Pilot Experiments in Interactive Cross-Language Question Answering.- Interactive Cross-Language Question Answering: Searching Passages Versus Searching Documents.- Improving Interaction with the User in Cross-Language Question Answering Through Relevant Domains and Syntactic Semantic Patterns.- Cooperation, Bookmarking, and Thesaurus in Interactive Bilingual Question Answering.- Summarization Design for Interactive Cross-Language Question Answering.- Interactive and Bilingual Question Answering Using Term Suggestion and Passage Retrieval.- IV. Multiple Language Question Answering.- Overview of the CLEF 2004 Multilingual Question Answering Track.- A Question Answering System for French.- Cross-Language French-English Question Answering Using the DLT System at CLEF 2004.- Experiments on Robust NL Question Interpretation and Multi-layered Document Annotation for a Cross-Language Question/Answering System.- Making Stone Soup: Evaluating a Recall-Oriented Multi-stream Question Answering System for Dutch.- The DIOGENE Question Answering System at CLEF-2004.- Cross-Lingual Question Answering Using Off-the-Shelf Machine Translation.- Bulgarian-English Question Answering: Adaptation of Language Resources.- Answering French Questions in English by Exploiting Results from Several Sources of Information.- Finnish as Source Language in Bilingual Question Answering.- miraQA: Experiments with Learning Answer Context Patterns from the Web.- Question Answering for Spanish Supported by Lexical Context Annotation.- Question Answering Using Sentence Parsing and Semantic Network Matching.- First Evaluation of Esfinge - A Question Answering System for Portuguese.- University of Evora in QA@CLEF-2004.- COLE Experiments at QA@CLEF 2004 Spanish Monolingual Track.- Does English Help Question Answering in Spanish?.- The TALP-QA System for Spanish at CLEF 2004: Structural and Hierarchical Relaxing of Semantic Constraints.- ILC-UniPI Italian QA.- Question Answering Pilot Task at CLEF 2004.- Evaluation of Complex Temporal Questions in CLEF-QA.- V. Cross-Language Retrieval in Image Collections.- The CLEF 2004 Cross-Language Image Retrieval Track.- Caption and Query Translation for Cross-Language Image Retrieval.- Pattern-Based Image Retrieval with Constraints and Preferences on ImageCLEF 2004.- How to Visually Retrieve Images from the St. Andrews Collection Using GIFT.- UNED at ImageCLEF 2004: Detecting Named Entities and Noun Phrases for Automatic Query Expansion and Structuring.- Dublin City University at CLEF 2004: Experiments with the ImageCLEF St. Andrew's Collection.- From Text to Image: Generating Visual Query for Image Retrieval.- Toward Cross-Language and Cross-Media Image Retrieval.- FIRE - Flexible Image Retrieval Engine: ImageCLEF 2004 Evaluation.- MIRACLE Approach to ImageCLEF 2004: Merging Textual and Content-Based Image Retrieval.- Cross-Media Feedback Strategies: Merging Text and Image Information to Improve Image Retrieval.- ImageCLEF 2004: Combining Image and Multi-lingual Search for Medical Image Retrieval.- Multi-modal Information Retrieval Using FINT.- Medical Image Retrieval Using Texture, Locality and Colour.- SMIRE: Similar Medical Image Retrieval Engine.- A Probabilistic Approach to Medical Image Retrieval.- UB at CLEF2004 Cross Language Medical Image Retrieval.- Content-Based Queries on the CasImage Database Within the IRMA Framework.- Comparison and Combination of Textual and Visual Features for Interactive Cross-Language Image Retrieval.- MSU at ImageCLEF: Cross Language and Interactive Image Retrieval.- VI. Cross-Language Spoken Document Retrieval.- CLEF 2004 Cross-Language Spoken Document Retrieval Track.- VII. Issues in CLIR and in Evaluation.- The Key to the First CLEF with Portuguese: Topics, Questions and Answers in CHAVE.- How Do Named Entities Contribute to Retrieval Effectiveness?.

201 citations