
Showing papers by "Rubén San-Segundo published in 2018"


Journal ArticleDOI
TL;DR: This work analyzes and proposes several techniques to improve the robustness of a Human Activity Recognition (HAR) system that uses accelerometer signals from different smartwatches and smartphones.

75 citations


Proceedings ArticleDOI
26 Sep 2018
TL;DR: A comparison of several feature sets across two classification algorithms for PD tremor detection finds that features automatically learned by a Convolutional Neural Network give the best performance, with the authors' handcrafted features close behind.
Abstract: Wearable sensor technology has the potential to transform the treatment of Parkinson's Disease (PD) by providing objective analysis of the frequency and severity of symptoms in everyday life. However, many challenges remain in developing a system that can robustly distinguish PD motor symptoms from normal motion. Stronger feature sets may help to improve the detection accuracy of such a system. In this work, we compare several feature sets across two classification algorithms for PD tremor detection. We find that features automatically learned by a Convolutional Neural Network (CNN) lead to the best performance, although our handcrafted features are close behind. We also find that CNNs benefit from training on data decomposed into tremor and activity spectra as opposed to raw data.
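The paper itself does not list code; as a rough illustration of the tremor/activity decomposition idea, here is a minimal Python sketch using scipy band-pass filters. The sampling rate and cutoff frequencies are assumptions (PD tremor energy is commonly reported around 3-7 Hz), not values taken from the paper.

```python
# Hypothetical sketch of decomposing an accelerometer signal into
# tremor and activity bands; all frequency values are assumptions.
import numpy as np
from scipy.signal import butter, filtfilt

FS = 50.0  # assumed accelerometer sampling rate (Hz)

def bandpass(signal, low_hz, high_hz, fs=FS, order=4):
    """Zero-phase Butterworth band-pass filter."""
    nyq = fs / 2.0
    b, a = butter(order, [low_hz / nyq, high_hz / nyq], btype="band")
    return filtfilt(b, a, signal)

def decompose(signal):
    """Split one accelerometer axis into tremor and activity bands.

    Assumed bands: ~3-7 Hz for PD tremor, ~0.5-3 Hz for voluntary
    activity; the paper's exact decomposition is not reproduced here.
    """
    tremor = bandpass(signal, 3.0, 7.0)
    activity = bandpass(signal, 0.5, 3.0)
    return tremor, activity

# Example: decompose one second of synthetic data with a 5 Hz "tremor"
# component superimposed on a slow 1 Hz "activity" component.
t = np.arange(0, 1.0, 1.0 / FS)
raw = np.sin(2 * np.pi * 5.0 * t) + 0.5 * np.sin(2 * np.pi * 1.0 * t)
tremor, activity = decompose(raw)
```

The two filtered streams could then be fed to the CNN in place of the raw signal, which is the comparison the abstract describes.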

13 citations


Journal ArticleDOI
TL;DR: A memetic framework is proposed for recognizing faces under adverse conditions such as facial occlusions and expression and illumination variations, together with an intelligent single-particle optimizer that combines global and local search capabilities.

9 citations


Journal ArticleDOI
TL;DR: This strategy allows out-of-vocabulary (OOV) terms to be discovered and inserted into the final transcription of a Large Vocabulary Continuous Speech Recognition (LVCSR) system, significantly improving the transcription in terms of Word Error Rate (WER).
Abstract: In this work we present Resource2Vec, a neural network embedding that represents the resources making up Linked Data (LD) corpora. A vector representation of these resources allows more advantageous processing (in computational terms), as with well-known word or document embeddings. We provide a quantitative analysis of these embeddings. Furthermore, we employ them in an Automatic Speech Recognition (ASR) task to demonstrate their functionality, designing a strategy for term discovery. This strategy permits out-of-vocabulary (OOV) terms in a Large Vocabulary Continuous Speech Recognition (LVCSR) system to be discovered and then inserted into the final transcription. First, we detect where a potential OOV term may have been uttered in the LVCSR output speech segments. Second, we carry out a candidate OOV search in several LD corpora. This search is guided by distance measurements between the transcription context around the potential-OOV speech segment and the resources of the LD corpora in Resource2Vec format, yielding a set of candidates. To rank them, we rely mainly on the phone transcription of that segment. Finally, we decide whether or not to incorporate a candidate into the final transcription. The results show that our strategy significantly improves the transcription of Spanish speech in terms of Word Error Rate (WER).
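As a rough illustration of the candidate-retrieval step described above, the sketch below ranks Linked Data resources by cosine similarity between a Resource2Vec-style resource vector and an embedding of the transcription context around a potential OOV segment. The names, the embedding dimensionality, and the toy random vectors are all hypothetical; this is not the authors' implementation.

```python
# Hypothetical sketch: rank LD resources by cosine similarity to the
# embedding of the transcription context around a potential OOV segment.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def rank_candidates(context_vec, resource_vecs, top_k=5):
    """Return the top_k resources closest to the context embedding."""
    scored = [(name, cosine(context_vec, vec))
              for name, vec in resource_vecs.items()]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:top_k]

# Toy example with random 100-dimensional embeddings standing in for
# Resource2Vec vectors and the context embedding.
rng = np.random.default_rng(0)
resources = {f"resource_{i}": rng.normal(size=100) for i in range(1000)}
context = rng.normal(size=100)
print(rank_candidates(context, resources, top_k=3))
```

In the paper's pipeline, the candidates produced by a step like this are then re-ranked using the phone transcription of the segment before the final accept/reject decision.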

6 citations


Journal ArticleDOI
TL;DR: This project focuses on advancing, developing and improving speech and language technologies, as well as image and video technologies, for the analysis of multimedia content, adding the extraction of affective-emotional information to this analysis.
Abstract: Traditionally, textual content has been the main source for information extraction and indexing; technologies capable of extracting information from the audio and video of multimedia documents have joined later. Another major axis of analysis is the emotional and affective aspect intrinsic to human communication. This information about emotions, stances, preferences, figurative language, irony, sarcasm, etc. is fundamental and irreplaceable for a complete understanding of the content of conversations, speeches, debates, discussions, etc. This project focuses on advancing, developing and improving speech and language technologies, as well as image and video technologies, for the analysis of multimedia content, adding the extraction of affective-emotional information to this analysis. As additional steps forward, we will advance the methodologies for presenting information to the user, working on technologies for language simplification, automatic report and summary generation, emotional speech synthesis, and natural and inclusive interaction.

4 citations


Proceedings ArticleDOI
21 Nov 2018
TL;DR: Neural Embeddings (NEs) are proposed as features for phone-gram sequences and used as entries in a classical i-Vector framework to train a multi-class logistic classifier; they yield relative improvements over the baseline with both a Skip-Gram model and a GloVe model.
Abstract: Language Identification (LID) can be defined as the process of automatically identifying the language of a given spoken utterance. We have focused on a phonotactic approach in which the system input is the phoneme sequence generated by a speech recognizer (ASR); instead of phonemes, however, we have used phonetic units that contain context information, the so-called "phone-gram sequences". In this context, we propose the use of Neural Embeddings (NEs) as features for these phone-gram sequences, which are used as entries in a classical i-Vector framework to train a multi-class logistic classifier. These NEs incorporate information from the neighbouring phone-grams in the sequence and implicitly model longer-context information. The NEs have been trained using both a Skip-Gram and a GloVe model. Experiments have been carried out on the KALAKA-3 database, using Cavg as the metric to compare systems. We propose as baseline the Cavg obtained using the NEs as features in the LID task, 24.7%. Our strategy of incorporating information from the neighbouring phone-grams to define the final sequences contributes up to a 24.3% relative improvement over the baseline using the Skip-Gram model, and up to 32.4% using the GloVe model. Finally, the fusion of our best system with an MFCC-based acoustic i-Vector system provides up to a 34.1% improvement over the acoustic system alone.
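To make the embedding step concrete, here is a minimal sketch of learning Skip-Gram embeddings over phone-gram sequences with gensim's Word2Vec (sg=1 selects Skip-Gram). The toy phone-gram "utterances" and all hyperparameters are assumptions; the paper's exact units, corpus, and settings are not reproduced here.

```python
# Hypothetical sketch: Skip-Gram embeddings for phone-gram sequences.
from gensim.models import Word2Vec

# Each utterance is a sequence of context-dependent phonetic units
# (phone-grams), written here as "left-center-right" strings.
utterances = [
    ["sil-o-l", "o-l-a", "l-a-sil"],          # e.g. "hola"
    ["sil-a-d", "a-d-i", "d-i-o", "i-o-s"],   # e.g. "adios"
]

model = Word2Vec(
    sentences=utterances,
    vector_size=50,   # embedding dimensionality (assumed)
    window=3,         # neighbouring phone-grams used as context
    min_count=1,
    sg=1,             # 1 = Skip-Gram, 0 = CBOW
)

# The learned vector for one phone-gram unit, which would then serve
# as a feature entering the i-Vector framework:
print(model.wv["o-l-a"])
```

Because the window spans neighbouring phone-grams, each vector absorbs longer-context information than the unit itself encodes, which is the property the abstract highlights.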

3 citations