Institution
INESC-ID
Nonprofit • Lisbon, Portugal
About: INESC-ID is a nonprofit organization based in Lisbon, Portugal. It is known for its research contributions in the topics of Computer science and Context (language use). The organization has 932 authors who have published 2618 publications, receiving 37658 citations.
Topics: Computer science, Context (language use), Field-programmable gate array, Control theory, Adaptive control
Papers published on a yearly basis
Papers
TL;DR: A new approach for corpus-based speech enhancement is presented that significantly improves over a method published by Xiao and Nickel in 2010, employing a Gaussian mixture model instead of a vector quantizer in the phoneme recognition front-end.
Abstract: We present a new approach for corpus-based speech enhancement that significantly improves over a method published by Xiao and Nickel in 2010. Corpus-based enhancement systems do not merely filter an incoming noisy signal, but resynthesize its speech content via an inventory of pre-recorded clean signals. The goal of the procedure is to perceptually improve the sound of speech signals in background noise. The proposed new method modifies Xiao's method in four significant ways. Firstly, it employs a Gaussian mixture model (GMM) instead of a vector quantizer in the phoneme recognition front-end. Secondly, the state decoding of the recognition stage is supported with an uncertainty modeling technique. With the GMM and the uncertainty modeling it is possible to eliminate the need for noise-dependent system training. Thirdly, the post-processing of the original method via sinusoidal modeling is replaced with a powerful cepstral smoothing operation. And lastly, due to the improvements of these modifications, it is possible to extend the operational bandwidth of the procedure from 4 kHz to 8 kHz. The performance of the proposed method was evaluated across different noise types and different signal-to-noise ratios. The new method was able to significantly outperform traditional methods, including the one by Xiao and Nickel, in terms of PESQ scores and other objective quality measures. Results of subjective CMOS tests over a smaller set of test samples support our claims.
18 citations
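The first of the four modifications above, the GMM phoneme recognition front-end, can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the feature frames are synthetic stand-ins for real cepstral features, and the phoneme labels, dimensions, and model sizes are arbitrary.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic training frames for two illustrative phoneme classes
# (a real system would use cepstral features of recorded speech).
train = {
    "aa": rng.normal(loc=0.0, scale=1.0, size=(500, 13)),
    "sh": rng.normal(loc=3.0, scale=1.0, size=(500, 13)),
}

# One GMM per phoneme class forms the recognition front-end.
models = {
    ph: GaussianMixture(n_components=4, random_state=0).fit(X)
    for ph, X in train.items()
}

def classify_frame(frame):
    """Return the phoneme whose GMM assigns the highest log-likelihood."""
    scores = {ph: gmm.score(frame.reshape(1, -1)) for ph, gmm in models.items()}
    return max(scores, key=scores.get)

print(classify_frame(np.zeros(13)))
print(classify_frame(np.full(13, 3.0)))
```

A real front-end would decode phoneme sequences over time rather than classify isolated frames, which is where the abstract's uncertainty modeling of the state decoding enters.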
01 Jan 2010
TL;DR: This chapter studies different approaches that have been proposed for XML fuzzy duplicate detection, and shows that the DogmatiX system is the most effective overall, as it yields the highest recall and precision values for various kinds of differences between duplicates.
Abstract: Fuzzy duplicate detection aims at identifying multiple representations of real-world objects in a data source, and is a task of critical relevance in data cleaning, data mining, and data integration tasks. It has a long history for relational data, stored in a single table or in multiple tables with an equal schema. However, algorithms for fuzzy duplicate detection in more complex structures, such as hierarchies of a data warehouse, XML data, or graph data, have only recently emerged. These algorithms use similarity measures that consider the duplicate status of their direct neighbors to improve duplicate detection effectiveness. In this chapter, we study different approaches that have been proposed for XML fuzzy duplicate detection. Our study includes a description and analysis of the different approaches, as well as a comparative experimental evaluation performed on both artificial and real-world data. The two main dimensions used for comparison are the methods' effectiveness and efficiency. Our comparison shows that the DogmatiX system [44] is the most effective overall, as it yields the highest recall and precision values for various kinds of differences between duplicates. Another system, called XMLDup [27], has a similar performance, being especially effective at low recall values. Finally, the SXNM system [36] is the most efficient, as it avoids executing too many pairwise comparisons, but its effectiveness is greatly affected by errors in the data.
18 citations
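As a deliberately naive illustration of the idea these systems share, a pair of XML nodes scored by its own textual similarity combined with its children's similarities, consider the following sketch. It is none of the surveyed systems (DogmatiX, XMLDup, SXNM); the node representation, weights, and string measure are all assumptions.

```python
from difflib import SequenceMatcher

def text_sim(a, b):
    """Crude string similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()

def node_sim(n1, n2, alpha=0.6):
    """Similarity of (text, [children]) trees: own text plus best-matched children."""
    own = text_sim(n1[0], n2[0])
    c1, c2 = n1[1], n2[1]
    if not c1 or not c2:
        return own
    # Greedy best match per child, averaged; real systems do this far more carefully.
    child = sum(max(node_sim(a, b) for b in c2) for a in c1) / len(c1)
    return alpha * own + (1 - alpha) * child

# Two slightly different representations of the same person.
a = ("John Smith", [("Lisboa", []), ("1980", [])])
b = ("Jon Smith", [("Lisbon", []), ("1980", [])])
print(round(node_sim(a, b), 2))
```

The recursion is what makes the measure neighbor-aware: a match between child elements raises the score of the parent pair even when the parents' own text differs.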
09 Dec 2019
TL;DR: It is shown that Pando can provide throughput improvements compared to a single personal device on a variety of compute-bound applications, including animation rendering and image processing, and the flexibility of the approach is shown by deploying Pando on personal devices connected over a local network.
Abstract: The large penetration and continued growth in ownership of personal electronic devices represents a freely available and largely untapped source of computing power. To leverage those, we present Pando, a new volunteer computing tool based on a declarative concurrent programming model and implemented using JavaScript, WebRTC, and WebSockets. This tool enables a dynamically varying number of failure-prone personal devices contributed by volunteers to parallelize the application of a function on a stream of values, by using the devices' browsers. We show that Pando can provide throughput improvements compared to a single personal device, on a variety of compute-bound applications including animation rendering and image processing. We also show the flexibility of our approach by deploying Pando on personal devices connected over a local network, on Grid5000, a French-wide computing grid in a virtual private network, and seven PlanetLab nodes distributed in a wide area network over Europe.
18 citations
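Pando itself is implemented in JavaScript over WebRTC and WebSockets; purely as an illustration of its declarative model, applying one function to a stream of values on a dynamically varying set of failure-prone workers, here is a toy Python sketch in which items from failed workers are simply re-queued:

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed

def flaky_square(x):
    """Stand-in for a compute-bound task on a volunteer that may leave mid-job."""
    if random.random() < 0.3:
        raise RuntimeError("volunteer left")
    return x * x

def volunteer_map(func, values, workers=4):
    """Apply func to every value, retrying values whose worker failed."""
    vals = list(values)
    results, pending = {}, vals[:]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while pending:
            futures = {pool.submit(func, v): v for v in pending}
            pending = []
            for fut in as_completed(futures):
                v = futures[fut]
                try:
                    results[v] = fut.result()
                except RuntimeError:
                    pending.append(v)  # churn: hand the value to another worker
    return [results[v] for v in vals]

print(volunteer_map(flaky_square, range(8)))
```

Because results are keyed by input value, stragglers and retries do not disturb the output order, mirroring the declarative "map over a stream" semantics described above.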
03 Aug 2013
TL;DR: Thorough automatic evaluation shows that the new centrality-based relevance model for automatic summarization achieves state-of-the-art performance, both in written text and in automatically transcribed speech summarization, even when compared to considerably more complex approaches.
Abstract: In automatic summarization, centrality-as-relevance means that the most important content of an information source, or of a collection of information sources, corresponds to the most central passages, considering a representation where such notion makes sense (graph, spatial, etc.). We assess the main paradigms and introduce a new centrality-based relevance model for automatic summarization that relies on the use of support sets to better estimate the relevant content. Geometric proximity is used to compute semantic relatedness. Centrality (relevance) is determined by considering the whole input source (and not only local information), and by taking into account the existence of minor topics or lateral subjects in the information sources to be summarized. The method consists in creating, for each passage of the input source, a support set consisting only of the most semantically related passages. Then, the determination of the most relevant content is achieved by selecting the passages that occur in the largest number of support sets. This model produces extractive summaries that are generic, and language- and domain-independent. Thorough automatic evaluation shows that the method achieves state-of-the-art performance, both in written text, and automatically transcribed speech summarization, even when compared to considerably more complex approaches.
18 citations
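The support-set procedure described above is simple to state; the following sketch illustrates it with a crude bag-of-words cosine standing in for the paper's semantic-relatedness measure, and with illustrative values for the support-set size `k` and the summary length.

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    num = sum(a[w] * b[w] for w in a)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def summarize(passages, k=2, size=1):
    """Select the passages that occur in the largest number of support sets."""
    vecs = [Counter(p.lower().split()) for p in passages]
    counts = Counter()
    for i, vi in enumerate(vecs):
        # Support set of passage i: its k most semantically related passages.
        sims = sorted(((cosine(vi, vj), j) for j, vj in enumerate(vecs) if j != i),
                      reverse=True)
        for _, j in sims[:k]:
            counts[j] += 1
    best = sorted(j for j, _ in counts.most_common(size))
    return [passages[j] for j in best]

passages = [
    "the festival attracts visitors to lisbon",
    "lisbon hosts a festival for visitors",
    "visitors enjoy the lisbon festival",
    "quantum chips use qubits",
]
print(summarize(passages))
```

Note how the off-topic passage earns a place in no support set, which is how the method tolerates minor or lateral topics in the input.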
TL;DR: An overview is provided of the design, deployment, and utilization of the CitySDK Tourism API, which aims to provide access to information about Points of Interest, Events, and Itineraries.
Abstract: Tourism is a major social and cultural activity with relevant economic impact. In an effort to promote their attractions with tourists, some cities have adopted the open-data model, publishing touristic data for programmers to use in their own applications. Unfortunately, each city publishes touristic information in its own way.
18 citations
Authors
Showing all 967 results
| Name | H-index | Papers | Citations |
| --- | --- | --- | --- |
| João Carvalho | 126 | 1278 | 77017 |
| Jaime G. Carbonell | 72 | 496 | 31267 |
| Chris Dyer | 71 | 240 | 32739 |
| Joao P. S. Catalao | 68 | 1039 | 19348 |
| Muhammad Bilal | 63 | 720 | 14720 |
| Alan W. Black | 61 | 413 | 19215 |
| João Paulo Teixeira | 60 | 636 | 19663 |
| Bhiksha Raj | 51 | 359 | 13064 |
| Joao Marques-Silva | 48 | 289 | 9374 |
| Paulo Flores | 48 | 321 | 7617 |
| Ana Paiva | 47 | 472 | 9626 |
| Miadreza Shafie-khah | 47 | 450 | 8086 |
| Susana Cardoso | 44 | 400 | 7068 |
| Mark J. Bentum | 42 | 226 | 8347 |
| Joaquim Jorge | 41 | 290 | 6366 |