
Showing papers by "INESC-ID published in 2019"


Journal ArticleDOI
TL;DR: An in-depth analysis of 22 studies on automatic cyberbullying detection is conducted, complemented by an experiment that validates current practices through the analysis of two datasets. The results indicate that cyberbullying is often misrepresented in the literature, leading to inaccurate systems that would have little real-world application.

178 citations


Proceedings ArticleDOI
12 May 2019
TL;DR: In this paper, a cycle-consistency loss based on the speech encoder state sequence instead of the raw speech signal was proposed to mitigate the problem of limited paired data, which reduced the word error rate by 14.7% from an initial model trained with 100-hour paired data.
Abstract: This paper presents a method to train end-to-end automatic speech recognition (ASR) models using unpaired data. Although the end-to-end approach can eliminate the need for expert knowledge such as pronunciation dictionaries to build ASR systems, it still requires a large amount of paired data, i.e., speech utterances and their transcriptions. Cycle-consistency losses have been recently proposed as a way to mitigate the problem of limited paired data. These approaches compose a reverse operation with a given transformation, e.g., text-to-speech (TTS) with ASR, to build a loss that only requires unsupervised data, speech in this example. Applying cycle consistency to ASR models is not trivial since fundamental information, such as speaker traits, is lost in the intermediate text bottleneck. To solve this problem, this work presents a loss that is based on the speech encoder state sequence instead of the raw speech signal. This is achieved by training a Text-To-Encoder model and defining a loss based on the encoder reconstruction error. Experimental results on the LibriSpeech corpus show that the proposed cycle-consistency training reduced the word error rate by 14.7% from an initial model trained with 100-hour paired data, using an additional 360 hours of audio data without transcriptions. We also investigate the use of text-only data mainly for language modeling to further improve the performance in the unpaired data training scenario.

78 citations


Proceedings ArticleDOI
01 Feb 2019
TL;DR: In this article, three patch correctness assessment techniques were proposed to comprehensively study overfitting and incorrect patches in the QuixBugs benchmark, which has never been studied in the context of program repair.
Abstract: Automatic program repair papers tend to repeatedly use the same benchmarks. This poses a threat to the external validity of the findings of the program repair research community. In this paper, we perform an automatic repair experiment on a benchmark called QuixBugs that has never been studied in the context of program repair. In this study, we report on the characteristics of QuixBugs, and study five repair systems, Arja, Astor, Nopol, NPEfix and RSRepair, which are representatives of generate-and-validate repair techniques and synthesis repair techniques. We propose three patch correctness assessment techniques to comprehensively study overfitting and incorrect patches. Our key results are: 1) 15 / 40 buggy programs in QuixBugs can be repaired with a test-suite adequate patch; 2) a total of 64 plausible patches for those 15 buggy programs in QuixBugs are present in the search space of the considered tools; 3) the three patch assessment techniques discard in total 33 / 64 patches that are overfitting. This sets a baseline for future research of automatic repair on QuixBugs. Our experiment also highlights the major properties and challenges of how to perform automated correctness assessment of program repair patches. All experimental results are publicly available on GitHub in order to facilitate future research on automatic program repair.

70 citations


Journal ArticleDOI
TL;DR: A VR environment where user interactions are supported by untethered, easy to operate, peripherals, using a mobile virtual reality headset to provide virtual immersion and simplified geometric information to create voxel-based maquettes is developed.

60 citations


Journal ArticleDOI
06 Mar 2019
TL;DR: The results show that the autonomous robot with empathy fosters meaningful discussions about sustainability, which is a learning outcome in sustainability education.
Abstract: This work explores a group learning scenario with an autonomous empathic robot. We address two research questions: (1) Can an autonomous robot designed with empathic competencies foster collaborative learning in a group context? (2) Can an empathic robot sustain positive educational outcomes in long-term collaborative learning interactions with groups of students? To answer these questions, we developed an autonomous robot with empathic competencies that is able to interact with a group of students in a learning activity about sustainable development. Two studies were conducted. The first study compares learning outcomes in children across three conditions: learning with an empathic robot; learning with a robot without empathic capabilities; and learning without a robot. The results show that the autonomous robot with empathy fosters meaningful discussions about sustainability, which is a learning outcome in sustainability education. The second study features groups of students who interact with the robot in a school classroom for 2 months. The long-term educational interaction did not seem to provide significant learning gains, although there was a change in game-actions to achieve more sustainability during game-play. This result reflects the need to perform more long-term research in the field of educational robots for group learning.

45 citations


Journal ArticleDOI
01 Apr 2019
TL;DR: This review examines the pros and cons of humanizing social robots from a psychological perspective, discusses the overall effects of the humanization of robots in HRI, and suggests new avenues of research and development.
Abstract: The aim of this review was to examine the pros and cons of humanizing social robots following a psychological perspective. As such, we had six goals. First, we defined what social robots are. Second, we clarified the meaning of humanizing social robots. Third, we presented the theoretical backgrounds for promoting humanization. Fourth, we conducted a review of empirical results of the positive effects and the negative effects of humanization on human–robot interaction (HRI). Fifth, we presented some of the political and ethical problems raised by the humanization of social robots. Lastly, we discussed the overall effects of the humanization of robots in HRI and suggested new avenues of research and development. Funding information: National Funds provided by the Portuguese Foundation for Science and Technology (FCT), Grant/Award Numbers: UID/PSI/03125/2013, PTDC/EEI-SII/7174/2014, SFRH/BD/110223/2015, CIPPSI/04345/2013.

42 citations


Journal ArticleDOI
TL;DR: The Person-Action-Locator (PAL), a novel UAV-based situational awareness system that relies on Deep Learning models to automatically detect people and recognize their actions in near real-time, was developed and successfully tested in the field.
Abstract: Situational awareness by Unmanned Aerial Vehicles (UAVs) is important for many applications such as surveillance, search and rescue, and disaster response. In those applications, detecting and locating people and recognizing their actions in near real-time can play a crucial role for preparing an effective response. However, there are currently three main limitations to perform this task efficiently. First, it is currently often not possible to access the live video feed from a UAV’s camera due to limited bandwidth. Second, even if the video feed is available, monitoring and analyzing video over prolonged time is a tedious task for humans. Third, it is typically not possible to locate random people via their cellphones. Therefore, we developed the Person-Action-Locator (PAL), a novel UAV-based situational awareness system. The PAL system addresses the first issue by analyzing the video feed onboard the UAV, powered by a supercomputer-on-a-module. Specifically, as a support for human operators, the PAL system relies on Deep Learning models to automatically detect people and recognize their actions in near real-time. To address the third issue, we developed a Pixel2GPS converter that estimates the location of people from the video feed. The result – icons representing detected people labeled by their actions – is visualized on the map interface of the PAL system. The Deep Learning models were first tested in the lab and demonstrated promising results. The fully integrated PAL system was successfully tested in the field. We also performed another collection of surveillance data to complement the lab results.

41 citations


Journal ArticleDOI
TL;DR: Two high-sensitivity probes for Eddy Current Nondestructive Test (NDT) of buried and surface defects are disclosed, showing that there is an increase in spatial resolution of surface defects when contrasted to prior art, enabling the probes to resolve defects.
Abstract: This paper discloses two high-sensitivity probes for Eddy Current Nondestructive Test (NDT) of buried and surface defects. These probes incorporate eight and 32 magnetoresistive sensors, respectively, which are optimized for high sensitivity and spatial resolution. The signal processing and interfacing are carried out by a full-custom application-specific integrated circuit (ASIC). The ASIC signal chain performs with a thermal input-referred noise of 30 nV/$\sqrt{\text{Hz}}$ at 1 kHz, with 66 mW of power consumption, in a die measuring 3.7 × 3.4 mm$^2$. NDT results are presented, showing that there is an increase in spatial resolution of surface defects when contrasted to prior art, enabling the probes to resolve defects with diameters of 0.44 mm, pitches of 0.6 mm, and minimum edge distance as low as 0.16 mm. The results also show that the probe for buried defects is a good all-round tool for detection of defects under cladding and multiple-plate flat junctions.

36 citations


Journal ArticleDOI
TL;DR: It is shown that, as in other types of software, testing increases the quality of apps and evidence that tests are essential when it comes to engaging the community to contribute to mobile open source software is found.
Abstract: Software testing is an important phase in the software development lifecycle because it helps in identifying bugs in a software system before it is shipped into the hands of its end users. There are numerous studies on how developers test general-purpose software applications. The idiosyncrasies of mobile software applications, however, set mobile apps apart from general-purpose systems (e.g., desktop, stand-alone applications, web services). This paper investigates working habits and challenges of mobile software developers with respect to testing. A key finding of our exhaustive study, using 1000 Android apps, is that mobile apps are still tested in a very ad hoc way, if tested at all. However, we show that, as in other types of software, testing increases the quality of apps (demonstrated in user ratings and number of code issues). Furthermore, we find evidence that tests are essential when it comes to engaging the community to contribute to mobile open source software. We discuss reasons and potential directions to address our findings. Yet another relevant finding of our study is that Continuous Integration and Continuous Deployment (CI/CD) pipelines are rare in the mobile apps world (only 26% of the apps are developed in projects employing CI/CD) – we argue that one of the main reasons is the lack of exhaustive and automatic testing.

34 citations


Proceedings ArticleDOI
11 Mar 2019
TL;DR: The results suggest that different levels of warmth and competence are associated with distinct emotional responses from users and that these variables are useful in predicting future intention to work, thus hinting at the importance of considering warmth and competence stereotypes in Human-Robot Interaction.
Abstract: In this paper we sought to understand how the display of different levels of warmth and competence, as well as different roles (opponent versus partner) portrayed by a robot, affect the display of emotional responses towards robots and how they can be used to predict future intention to work. For this purpose we devised an entertainment card-game group scenario involving two humans and two robots ($n = 54$). The results suggest that different levels of warmth and competence are associated with distinct emotional responses from users and that these variables are useful in predicting future intention to work, thus hinting at the importance of considering warmth and competence stereotypes in Human-Robot Interaction.

30 citations


Journal ArticleDOI
TL;DR: The hardware and software infrastructure that supports such rich form of interaction in ASD therapy while featuring a fully autonomous robot is described, as well as the design methodology that guided the development of the INSIDE system.

Journal ArticleDOI
TL;DR: This report focuses on a recent microcytometric technology based on magnetic sensors and magnetic particles integrated into microfluidic structures for dynamic bioanalysis of fluid samples—magnetic flow cytometry.
Abstract: The growing need for biological information at the single cell level has driven the development of improved cytometry technologies. Flow cytometry is a particularly powerful method that has evolved over the past few decades. Flow cytometers have become essential instruments in biomedical research and routine clinical tests for disease diagnosis, prognosis, and treatment monitoring. However, the increasing number of cellular parameters unveiled by genomic, proteomic, and metabolomic data platforms demands an augmented multiplexability. Also, the need for identification and quantification of relevant biomarkers at low levels requires outstanding analytical sensitivity and reliability. In addition, growing awareness of the advantages associated with miniaturization of analytical devices is pushing forward the progress in integrated and compact, microfluidic-based devices at the point-of-care. In this context, novel types of flow cytometers are emerging during the search to tackle these challenges. Notwithstanding the relevance of other promising alternatives to standard optical flow cytometry (e.g., mass cytometry, various optical and electrical microcytometers), this report focuses on a recent microcytometric technology based on magnetic sensors and magnetic particles integrated into microfluidic structures for dynamic bioanalysis of fluid samples: magnetic flow cytometry. Its concept, main developments, targeted applications, as well as the challenges and trends behind this technology are presented and discussed.

Proceedings ArticleDOI
25 Mar 2019
TL;DR: ROLP is a Runtime Object Lifetime Profiler that profiles application code at runtime and helps pretenuring GC algorithms allocate objects with similar lifetimes close to each other so that the overall fragmentation, GC effort, and application pauses are reduced.
Abstract: Latency sensitive services such as credit-card fraud detection and website targeted advertisement rely on Big Data platforms which run on top of memory managed runtimes, such as the Java Virtual Machine (JVM). These platforms, however, suffer from unpredictable and unacceptably high pause times due to inadequate memory management decisions (e.g., allocating objects with very different lifetimes next to each other, resulting in severe memory fragmentation). This leads to frequent and long application pause times, breaking Service Level Agreements (SLAs). This problem has been previously identified, and results show that current memory management techniques are ill-suited for applications that hold in memory massive amounts of long-lived objects (which is the case for a wide spectrum of Big Data applications). Previous works reduce such application pauses by allocating objects off-heap, in special allocation regions/generations, or by using ultra-low latency Garbage Collectors (GC). However, all these solutions either require a combination of programmer effort and knowledge, source code access, offline profiling (with clear negative impacts on programmer productivity), or impose a significant impact on application throughput and/or memory to reduce application pauses. We propose ROLP, a Runtime Object Lifetime Profiler that profiles application code at runtime and helps pretenuring GC algorithms allocate objects with similar lifetimes close to each other so that the overall fragmentation, GC effort, and application pauses are reduced. ROLP is implemented for the OpenJDK 8 and was evaluated with a recently proposed open-source pretenuring collector (NG2C). Results show long-tail latency reductions of up to 51% for Lucene, 85% for GraphChi, and 69% for Cassandra. This is achieved with negligible throughput (

Journal ArticleDOI
TL;DR: A control system that designs an optimal therapy based on adaptive control methods, aiming to allow the eradication of a metastatic renal cell carcinoma as quickly and efficiently as possible, and with lower associated toxicity, is developed.

Journal ArticleDOI
TL;DR: This work studied if a miniaturized and easy to use spectrometer could deliver data whose quality was enough to allow varieties separation even with data being collected in the field, non-destructively, and under uncontrolled solar lighting.

Book ChapterDOI
01 Jan 2019
TL;DR: Four distinct classifiers were trained using a supervised approach, each considering a group of features extracted from a different source: user name and screen name, user description, content of the tweets, and profile picture. A final classifier that combines the predictions of the four individual classifiers achieves the best performance.
Abstract: Twitter provides a simple way for users to express feelings, ideas and opinions, makes the user-generated content and associated metadata available to the community, and provides easy-to-use web and application programming interfaces to access data. The user profile information is important for many studies, but essential information, such as gender and age, is not provided when accessing a Twitter account. However, clues about the user profile, such as the age and gender, behaviors, and preferences, can be extracted from other content provided by the user. The main focus of this paper is to infer the gender of the user from unstructured information, including the username, screen name, description and picture, or from the user-generated content. We have performed experiments using an English labelled dataset containing 6.5 M tweets from 65 K users, and a Portuguese labelled dataset containing 5.8 M tweets from 58 K users. We have created four distinct classifiers, trained using a supervised approach, each one considering a group of features extracted from four different sources: user name and screen name, user description, content of the tweets, and profile picture. Features related to activity, such as number of following and number of followers, were discarded, since these features were found not to be indicative of gender. A final classifier that combines the prediction of each one of the four previous individual classifiers achieves the best performance, corresponding to 93.2% accuracy for English and 96.9% accuracy for Portuguese data.
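As a minimal sketch of how a final classifier might combine four per-source predictions, the Python below averages the class probabilities produced by each source-specific classifier (soft voting). The abstract does not state how the combination is performed, so the averaging rule, the function name `combine_predictions`, and the example probabilities are all illustrative assumptions rather than the authors' method.

```python
from typing import Dict, List


def combine_predictions(per_source: List[Dict[str, float]]) -> str:
    """Average class probabilities across sources and return the top label.

    Assumption: every source classifier outputs a probability per label.
    """
    if not per_source:
        raise ValueError("need at least one source prediction")
    totals: Dict[str, float] = {}
    for probs in per_source:
        for label, p in probs.items():
            totals[label] = totals.get(label, 0.0) + p
    averaged = {label: s / len(per_source) for label, s in totals.items()}
    return max(averaged, key=averaged.get)


# Hypothetical outputs of the four classifiers, in this order:
# names, description, tweet content, profile picture.
example = [
    {"female": 0.70, "male": 0.30},
    {"female": 0.40, "male": 0.60},
    {"female": 0.65, "male": 0.35},
    {"female": 0.55, "male": 0.45},
]
predicted = combine_predictions(example)
```

In this made-up example one source disagrees with the other three, and the averaged probabilities still favour the majority label. A learned meta-classifier could weight the sources differently.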

Proceedings ArticleDOI
02 May 2019
TL;DR: Warping Deixis is presented, a novel approach to improving the perception of pointing gestures and facilitating communication in collaborative Extended Reality environments by warping the virtual representation of the pointing individual to match the pointing expression to the observer's perception.
Abstract: When engaged in communication, people often rely on pointing gestures to refer to out-of-reach content. However, observers frequently misinterpret the target of a pointing gesture. Previous research suggests that to perform a pointing gesture, people place the index finger on or close to a line connecting the eye to the referent, while observers interpret pointing gestures by extrapolating the referent using a vector defined by the arm and index finger. In this paper we present Warping Deixis, a novel approach to improving the perception of pointing gestures and facilitating communication in collaborative Extended Reality environments. By warping the virtual representation of the pointing individual, we are able to match the pointing expression to the observer's perception. We evaluated our approach in a co-located side-by-side virtual reality scenario. Results suggest that our approach is effective in improving the interpretation of pointing gestures in shared virtual environments.

Journal ArticleDOI
TL;DR: This work presents the theoretical development and performance of novel Input–Output Linearization AC voltage controllers applied to Dynamic Voltage Restorers (DVR) with Flywheel Energy Storage (FES). The proposed controller proves to be faster and more aggressive than the PI controller, which is softer and introduces less voltage distortion.

Proceedings ArticleDOI
09 Dec 2019
TL;DR: It is shown that Pando can provide throughput improvements compared to a single personal device, on a variety of compute-bound applications including animation rendering and image processing, and the flexibility of the approach is shown by deploying Pando on personal devices connected over a local network.
Abstract: The large penetration and continued growth in ownership of personal electronic devices represents a freely available and largely untapped source of computing power. To leverage them, we present Pando, a new volunteer computing tool based on a declarative concurrent programming model and implemented using JavaScript, WebRTC, and WebSockets. This tool enables a dynamically varying number of failure-prone personal devices contributed by volunteers to parallelize the application of a function on a stream of values, by using the devices' browsers. We show that Pando can provide throughput improvements compared to a single personal device, on a variety of compute-bound applications including animation rendering and image processing. We also show the flexibility of our approach by deploying Pando on personal devices connected over a local network, on Grid5000, a France-wide computing grid in a virtual private network, and seven PlanetLab nodes distributed in a wide area network over Europe.
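The idea of applying one function to a stream of values across failure-prone workers can be sketched as follows. This is a sequential Python simulation written for illustration only: it is not Pando's implementation (which runs in browsers over WebRTC and WebSockets), and the names `parallel_map`, `WorkerFailure`, and `make_flaky` are hypothetical. The sketch assumes that a value whose worker fails is simply re-queued and that the failed worker leaves the pool.

```python
from collections import deque
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class WorkerFailure(Exception):
    """Raised when a (simulated) volunteer device drops out mid-task."""


def parallel_map(values: Iterable[T], workers: List[Callable[[T], R]]) -> List[R]:
    """Apply the workers' function to every value, preserving input order.

    Values are handed out round-robin. When a worker fails, its value goes
    back in the queue and the worker is removed from the pool.
    """
    pending = deque(enumerate(values))
    pool = list(workers)
    results = {}
    turn = 0
    while pending:
        if not pool:
            raise RuntimeError("all workers failed")
        index, value = pending.popleft()
        worker = pool[turn % len(pool)]
        try:
            results[index] = worker(value)
            turn += 1
        except WorkerFailure:
            pool.remove(worker)
            pending.append((index, value))
    return [results[i] for i in range(len(results))]


def square(x: int) -> int:
    return x * x


def make_flaky(fail_on: int) -> Callable[[int], int]:
    """Build a worker that drops out when asked to process `fail_on`."""
    def worker(x: int) -> int:
        if x == fail_on:
            raise WorkerFailure(f"device dropped while processing {x}")
        return square(x)
    return worker


outputs = parallel_map([1, 2, 3, 4], [square, make_flaky(2)])
```

Here the flaky worker drops out on the value 2, which is retried on the remaining worker, so every input still produces a result in its original position.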

Journal ArticleDOI
TL;DR: YOLO is a non-anthropomorphic social robot designed to stimulate creativity in children during free play, in which they use the robot as a character in the stories they create.

Journal ArticleDOI
TL;DR: The successful development of two autonomous robots that are able to interact with a group of two humans in the execution of a task for social and entertainment purposes is described, and how humans choose robots to partner with in a multi-party game context is investigated.
Abstract: Although groups of robots are expected to interact with groups of humans in the near future, research related to teams of humans and robots is still scarce. This paper contributes to the study of human–robot teams by describing the development of two autonomous robotic partners and by investigating how humans choose robots to partner with in a multi-party game context. Our work concerns the successful development of two autonomous robots that are able to interact with a group of two humans in the execution of a task for social and entertainment purposes. The creation of these two characters was motivated by psychological research on learning goal theory, according to which we interpret and approach a given task differently depending on our learning goal. Thus, we developed two robotic characters implemented in two robots: Emys (a competitive robot, based on characteristics related to performance-orientation goals) and Glin (a relationship-driven robot, based on characteristics related to learning-orientation goals). In our study, a group of four (two humans and two autonomous robots) engaged in a card game for social and entertainment purposes. Our study yields several important conclusions regarding groups of humans and robots. (1) When a partner is chosen without previous partnering experience, people tend to prefer robots with relationship-driven characteristics as their partners compared with competitive robots. (2) After some partnering experience has been gained, the choice becomes less clear, and additional driving factors emerge as follows: (2a) participants with higher levels of competitiveness (personal characteristics) tend to prefer Emys, whereas those with lower levels prefer Glin, and (2b) the choice of which robot to partner with also depends on team performance, with the winning team being the preferred choice.

Proceedings ArticleDOI
01 Nov 2019
TL;DR: An investigation of whether people will cheat while in the presence of a robot, and to what extent this depends on the role the robot plays, found that participants cheated significantly more than chance when they were alone or with the robot giving instructions.
Abstract: People are not perfect, and if given the chance, some will be dishonest with no regrets. Some people will cheat just a little to gain some advantage, and others will not do it at all. With the prospect of more human-robot interactions in the future, it will become very important to understand which kind of roles a robot can have in the regulation of cheating behavior. We investigated whether people will cheat while in the presence of a robot and to what extent this depends on the role the robot plays. We ran a study to test cheating behavior with a die task, and allocated people to one of the following conditions: 1) participants were alone in the room while doing the task; 2) with a robot with a vigilant role or 3) with a robot that had a supporting role in the task, accompanying and giving instructions. Our results showed that participants cheated significantly more than chance when they were alone or with the robot giving instructions. In contrast, cheating could not be proven when the robot presented a vigilant role. This study has implications for human-robot interaction and for the deployment of autonomous robots in sensitive roles in which people may be prone to dishonest behavior.

Journal ArticleDOI
TL;DR: This study proposes a new structure for the three-phase DC−AC conversion stage for a grid-connected PV system that consists of two four-leg two-level voltage source inverters that are connected to two PV generators.
Abstract: Voltage source inverters (VSIs) are power converters that are considered essential in grid connected photovoltaic (PV) generators. There are several types of topologies for these converters. However, from the point of view of high-quality AC output voltage, multilevel inverters are considered the most adequate. Under this context, this study proposes a new structure for the three-phase DC−AC conversion stage for a grid-connected PV system. It consists of two four-leg two-level voltage source inverters that are connected to two PV generators. These inverters are associated with two Scott transformers. The secondary windings of the transformer are connected in a way that allows a series connection. Due to this, a multilevel operation will be achieved. The performance of the proposed power conditioning system will be verified through simulation and experimental results.

Posted Content
TL;DR: The results show that the autonomous robot with empathy fosters meaningful discussions about sustainability, which is a learning outcome in sustainability education.
Abstract: This work explores a group learning scenario with an autonomous empathic robot. We address two research questions: (1) Can an autonomous robot designed with empathic competencies foster collaborative learning in a group context? (2) Can an empathic robot sustain positive educational outcomes in long-term collaborative learning interactions with groups of students? To answer these questions, we developed an autonomous robot with empathic competencies that is able to interact with a group of students in a learning activity about sustainable development. Two studies were conducted. The first study compares learning outcomes in children across 3 conditions: learning with an empathic robot; learning with a robot without empathic capabilities; and learning without a robot. The results show that the autonomous robot with empathy fosters meaningful discussions about sustainability, which is a learning outcome in sustainability education. The second study features groups of students who interact with the robot in a school classroom for two months. The long-term educational interaction did not seem to provide significant learning gains, although there was a change in game-actions to achieve more sustainability during game-play. This result reflects the need to perform more long-term research in the field of educational robots for group learning.

Journal ArticleDOI
TL;DR: This work uses electronic power converters as an interface between the batteries and the grid to supply an ancillary service, specifically reactive power compensation, which improves the capital investment return of these systems.
Abstract: Integrating energy storage systems into power distribution networks brings several benefits, such as minimizing energy losses, improving the voltage profile, and reducing energy costs. However, due to the high cost of these energy storage systems, this integration must be carefully applied. Thus, this work proposes the integration of energy storage systems based on a multi-objective optimization. The storage systems considered are batteries. These systems require electronic power converters as an interface between the batteries and the grid. Thus, this work uses those converters to supply an ancillary service, more specifically, reactive power compensation. In this way, besides peak shaving, the optimization approach also considers reactive power compensation, improving the capital investment return of these systems. The reactive power compensation considers the maximum active power of the converter, to minimize the cost of the system. Consequently, when the energy storage system is at its maximum discharge mode, the reactive power compensation function is inhibited. Since the multi-objective optimization generates a Pareto-optimal set with a large number of solutions, an approach to support the choice of the solution is also proposed. This approach considers a new post-Pareto analysis, which is based on the sum of the ranking differences. To demonstrate the applicability of the proposed approach, a case study using the 94-bus real test feeder is presented. Three test scenarios are also presented for the post-Pareto optimality analysis, each considering different weights for the objective functions. The results show that even for a specific case where weights are assigned to each of the objective functions, more than one solution is obtained.
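To illustrate what a Pareto-optimal set is, the Python sketch below filters a list of candidate solutions down to the non-dominated ones, assuming every objective is minimized. The two objectives (for example, energy losses and system cost) and all numeric values are hypothetical, and the post-Pareto ranking step described in the abstract is not reproduced here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (energy losses, cost) pairs for five candidate solutions.
candidates = [(10.0, 5.0), (8.0, 7.0), (12.0, 4.0), (9.0, 8.0), (11.0, 6.0)]
front = pareto_front(candidates)
```

Three of the five made-up candidates survive, and none of them is better than the others on both objectives at once. This is why a further decision step is needed to pick a single solution from the set.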

Proceedings ArticleDOI
01 Apr 2019
TL;DR: This solution aims to achieve fast switching between main and standby redundant devices in fault-tolerant ABH topologies through the combination of mechanical commutators and semiconductors, increasing reliability for safety-critical applications.
Abstract: This paper proposes several modifications of the conventional Asymmetric Half-Bridge (ABH) topology for switched reluctance motor (SRM) drives with the aim of increasing its reliability for safety-critical applications. This solution aims to achieve fast switching between main and standby redundant devices in fault-tolerant ABH topologies through the combination of mechanical commutators and semiconductors. With those modifications, the energy processing capability can be maintained for the most common failure modes. A survey of previously developed fault-tolerant topologies for SRM drives and several aspects of failure modes, detection and isolation mechanisms are also included. The theoretical validity of one of the proposed solutions is confirmed by several experimental results.

Journal ArticleDOI
TL;DR: It is illustrated that classical statistical methods may fail to identify outliers due to their heavy influence, prompting the need for robust statistics, and robust regression and outlier detection constitute key strategies to cope with high-dimensional clinical data such as omics data.
Abstract: Correct classification of breast cancer subtypes is of high importance as it directly affects the therapeutic options. We focus on triple-negative breast cancer, which has the worst prognosis among breast cancer types. Using cutting-edge methods from the field of robust statistics, we analyze Breast Invasive Carcinoma transcriptomic data publicly available from The Cancer Genome Atlas data portal. Our analysis identifies statistical outliers that may correspond to misdiagnosed patients. Furthermore, it is illustrated that classical statistical methods may fail to identify outliers due to their heavy influence, prompting the need for robust statistics. Using robust sparse logistic regression we obtain 36 relevant genes, of which ca. 60% have been previously reported as biologically relevant to triple-negative breast cancer, reinforcing the validity of the method. The remaining 14 genes identified are new potential biomarkers for triple-negative breast cancer. Of these, JAM3, SFT2D2, and PAPSS1 were previously associated with breast tumors or other types of cancer. The relevance of these genes is confirmed by the new DetectDeviatingCells outlier detection technique. A comparison of gene networks on the selected genes showed significant differences between triple-negative breast cancer and non-triple-negative breast cancer data. The individual role of FOXA1 in triple-negative breast cancer and non-triple-negative breast cancer, and the strong FOXA1-AGR2 connection in triple-negative breast cancer, stand out. The goal of our paper is to contribute to the understanding and management of breast cancer and triple-negative breast cancer. At the same time, it demonstrates that robust regression and outlier detection constitute key strategies to cope with high-dimensional clinical data such as omics data.

Proceedings ArticleDOI
01 Jun 2019
TL;DR: A graph clustering algorithm is applied on contextualized embedding representations of the verbs and arguments that provide cues for word-sense disambiguation and is able to outperform all of the baselines reported for the task on the test set.
Abstract: Building large datasets annotated with semantic information, such as FrameNet, is an expensive process. Consequently, such resources are unavailable for many languages and specific domains. This problem can be alleviated by using unsupervised approaches to induce the frames evoked by a collection of documents. That is the objective of the second task of SemEval 2019, which comprises three subtasks: clustering of verbs that evoke the same frame and clustering of arguments into both frame-specific slots and semantic roles. We approach all the subtasks by applying a graph clustering algorithm on contextualized embedding representations of the verbs and arguments. Using such representations is appropriate in the context of this task, since they provide cues for word-sense disambiguation. Thus, they can be used to identify different frames evoked by the same words. Using this approach we were able to outperform all of the baselines reported for the task on the test set in terms of Purity F1, as well as in terms of BCubed F1 in most cases.
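
As an illustration of the Purity F1 metric named in this abstract, the sketch below uses the common definition: the harmonic mean of purity and inverse purity. Whether the task's official scorer computes it in exactly this way is an assumption. The function names and the toy labels are hypothetical, and the input is assumed to be non-empty.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def purity(pred: Sequence[Hashable], gold: Sequence[Hashable]) -> float:
    """Fraction of items that fall in the majority gold class of their
    predicted cluster."""
    clusters = defaultdict(Counter)
    for p, g in zip(pred, gold):
        clusters[p][g] += 1
    return sum(max(counts.values()) for counts in clusters.values()) / len(pred)


def purity_f1(pred: Sequence[Hashable], gold: Sequence[Hashable]) -> float:
    """Harmonic mean of purity and inverse purity (purity with roles swapped)."""
    pu = purity(pred, gold)
    ipu = purity(gold, pred)
    return 2 * pu * ipu / (pu + ipu)


# Toy example: five verb occurrences, two induced clusters, two gold frames.
predicted = [0, 0, 0, 1, 1]
gold_frames = ["A", "A", "B", "B", "B"]
score = purity_f1(predicted, gold_frames)
print(round(score, 3))
```

Here four of the five items sit in the majority class of their cluster in both directions, so purity and inverse purity are both 0.8.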

Proceedings ArticleDOI
01 Aug 2019
TL;DR: A comprehensive study to reassess the effects of combining Dynamic Slicing with Spectrum-based Fault Localization finds that the DS-SFL combination is practical and effective, and encourages new SFL techniques to be evaluated against that optimization.
Abstract: Several approaches have been proposed to reduce debugging costs through automated software fault diagnosis. Dynamic Slicing (DS) and Spectrum-based Fault Localization (SFL) are popular fault diagnosis techniques and are normally seen as complementary. This paper reports on a comprehensive study to reassess the effects of combining DS with SFL. With this combination, components that are often involved in failing but seldom in passing test runs could be located and their suspiciousness reduced. Results show that the DS-SFL combination, coined Tandem-FL, improves the diagnostic accuracy by up to 73.7% (13.4% on average). Furthermore, results indicate that the risk of missing faulty statements, which is a key limitation of DS, is not high: DS misses faulty statements in 9% of the 260 cases. To sum up, we found that the DS-SFL combination was practical and effective, and we encourage new SFL techniques to be evaluated against that optimization.
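
To make the SFL half of this combination concrete, the following sketch scores components from test coverage and pass/fail outcomes. It uses the Ochiai coefficient, a widely used SFL similarity formula. The abstract does not say which formula the study uses, so that choice is an assumption, as are the function name and the toy spectra. The dynamic slicing step is not modelled.

```python
import math
from typing import Dict, List, Set


def ochiai(spectra: List[Set[str]], failed: List[bool]) -> Dict[str, float]:
    """Suspiciousness per component: ef / sqrt(total_failed * (ef + ep)),
    where ef / ep count failing / passing runs that cover the component."""
    total_failed = sum(failed)
    scores = {}
    for component in set().union(*spectra):
        ef = sum(1 for cov, f in zip(spectra, failed) if f and component in cov)
        ep = sum(1 for cov, f in zip(spectra, failed) if not f and component in cov)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[component] = ef / denom if denom else 0.0
    return scores


# Three test runs: the statements each run covers, and whether it failed.
coverage = [{"s1", "s2"}, {"s1", "s3"}, {"s2", "s3"}]
outcomes = [True, False, True]
scores = ochiai(coverage, outcomes)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0], scores[ranking[0]])
```

Statement `s2` is covered by both failing runs and by no passing run, so it ranks first. A slicing step such as the one the abstract describes could then restrict the ranking to statements in the dynamic slice of the failure.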

Journal ArticleDOI
TL;DR: When the developed system was compared with the only publicly available alternative for flight search, it provided the best-recommended and the cheapest solutions 74% and 95% of the time, respectively, allowing the user to save time and money.
Abstract: This work introduces and formalizes the Flying Tourist Problem (FTP), whose goal is to find the best schedule, route, and set of flights for any given unconstrained multi-city flight request. To solve the FTP, this work proposes a methodology that allows an efficient resolution of this rather demanding problem. The strategy uses different heuristics and meta-heuristic optimization algorithms, allowing solutions to be identified in real time, even for large problem instances. The implemented system was evaluated using different criteria, including the gains it provides (in terms of total flight price and duration) and its performance compared to other similar systems. The results show that the developed optimization system consistently presents solutions that are up to 35% cheaper (or 60% faster) than those produced by simpler heuristics. Furthermore, when the developed system was compared with the only publicly available (but undisclosed) alternative for flight search, it provided the best-recommended and the cheapest solutions 74% and 95% of the time, respectively, allowing the user to save time and money.
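
A heavily simplified view of the routing part of this problem can be sketched as an exhaustive search over visiting orders. The sketch assumes one fixed, date-independent price per directed city pair, which the real FTP does not. The function name, the airport codes, and the prices are hypothetical. This is not the paper's heuristic or meta-heuristic method, and it only scales to a handful of cities.

```python
from itertools import permutations
from typing import Dict, Sequence, Tuple


def cheapest_route(
    home: str, cities: Sequence[str], price: Dict[Tuple[str, str], float]
) -> Tuple[Tuple[str, ...], float]:
    """Try every visiting order and return the cheapest round trip from `home`."""
    best_route, best_cost = (), float("inf")
    for order in permutations(cities):
        route = (home, *order, home)
        # Sum the price of each consecutive leg of the trip.
        cost = sum(price[(a, b)] for a, b in zip(route, route[1:]))
        if cost < best_cost:
            best_route, best_cost = route, cost
    return best_route, best_cost


# Hypothetical one-way prices between three airports.
prices = {
    ("LIS", "MAD"): 50, ("MAD", "LIS"): 60,
    ("LIS", "PAR"): 90, ("PAR", "LIS"): 80,
    ("MAD", "PAR"): 70, ("PAR", "MAD"): 40,
}
route, cost = cheapest_route("LIS", ["MAD", "PAR"], prices)
print(route, cost)
```

With these prices, visiting PAR before MAD costs 190 against 200 for the reverse order. The number of orders grows factorially with the number of cities, which is why heuristic and meta-heuristic methods such as those the abstract mentions become necessary for larger requests.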