Institution

INESC-ID

Nonprofit · Lisbon, Portugal
About: INESC-ID is a nonprofit organization based in Lisbon, Portugal. It is known for research contributions in the topics Computer science and Context (language use). The organization has 932 authors who have published 2618 publications receiving 37658 citations.


Papers
Journal Article (DOI)
TL;DR: A new approach for corpus-based speech enhancement that significantly improves over a method published by Xiao and Nickel in 2010 and employs a Gaussian mixture model instead of a vector quantizer in the phoneme recognition front-end is presented.
Abstract: We present a new approach for corpus-based speech enhancement that significantly improves over a method published by Xiao and Nickel in 2010. Corpus-based enhancement systems do not merely filter an incoming noisy signal, but resynthesize its speech content via an inventory of pre-recorded clean signals. The goal of the procedure is to perceptually improve the sound of speech signals in background noise. The proposed new method modifies Xiao's method in four significant ways. Firstly, it employs a Gaussian mixture model (GMM) instead of a vector quantizer in the phoneme recognition front-end. Secondly, the state decoding of the recognition stage is supported with an uncertainty modeling technique. With the GMM and the uncertainty modeling it is possible to eliminate the need for noise dependent system training. Thirdly, the post-processing of the original method via sinusoidal modeling is replaced with a powerful cepstral smoothing operation. And lastly, due to the improvements of these modifications, it is possible to extend the operational bandwidth of the procedure from 4 kHz to 8 kHz. The performance of the proposed method was evaluated across different noise types and different signal-to-noise ratios. The new method was able to significantly outperform traditional methods, including the one by Xiao and Nickel, in terms of PESQ scores and other objective quality measures. Results of subjective CMOS tests over a smaller set of test samples support our claims.

18 citations

Book Chapter (DOI)
01 Jan 2010
TL;DR: This chapter studies different approaches that have been proposed for XML fuzzy duplicate detection, and shows that the DogmatiX system is the most effective overall, as it yields the highest recall and precision values for various kinds of differences between duplicates.
Abstract: Fuzzy duplicate detection aims at identifying multiple representations of real-world objects in a data source, and is a task of critical relevance in data cleaning, data mining, and data integration tasks. It has a long history for relational data, stored in a single table or in multiple tables with an equal schema. However, algorithms for fuzzy duplicate detection in more complex structures, such as hierarchies of a data warehouse, XML data, or graph data have only recently emerged. These algorithms use similarity measures that consider the duplicate status of their direct neighbors to improve duplicate detection effectiveness. In this chapter, we study different approaches that have been proposed for XML fuzzy duplicate detection. Our study includes a description and analysis of the different approaches, as well as a comparative experimental evaluation performed on both artificial and real-world data. The two main dimensions used for comparison are the methods' effectiveness and efficiency. Our comparison shows that the DogmatiX system [44] is the most effective overall, as it yields the highest recall and precision values for various kinds of differences between duplicates. Another system, called XMLDup [27], has a similar performance, being most effective especially at low recall values. Finally, the SXNM system [36] is the most efficient, as it avoids executing too many pairwise comparisons, but its effectiveness is greatly affected by errors in the data.

18 citations

Proceedings Article (DOI)
09 Dec 2019
TL;DR: It is shown that Pando can provide throughput improvements compared to a single personal device, on a variety of compute-bound applications including animation rendering and image processing, and the flexibility of the approach is shown by deploying Pando on personal devices connected over a local network.
Abstract: The large penetration and continued growth in ownership of personal electronic devices represents a freely available and largely untapped source of computing power. To leverage those, we present Pando, a new volunteer computing tool based on a declarative concurrent programming model and implemented using JavaScript, WebRTC, and WebSockets. This tool enables a dynamically varying number of failure-prone personal devices contributed by volunteers to parallelize the application of a function on a stream of values, by using the devices' browsers. We show that Pando can provide throughput improvements compared to a single personal device, on a variety of compute-bound applications including animation rendering and image processing. We also show the flexibility of our approach by deploying Pando on personal devices connected over a local network, on Grid5000, a French-wide computing grid in a virtual private network, and seven PlanetLab nodes distributed in a wide area network over Europe.

18 citations
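
The Pando abstract above describes parallelizing the application of a function on a stream of values across failure-prone devices. The following is only a minimal sketch of that idea, not Pando's API: local threads stand in for volunteer devices, and the names `apply_with_retry` and `parallel_map`, the retry count, and the worker count are all hypothetical choices made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_with_retry(func, value, attempts=3):
    """Apply func to value, retrying when a (simulated) worker fails."""
    last_error = None
    for _ in range(attempts):
        try:
            return func(value)
        except Exception as error:  # assumption: any exception is a worker failure
            last_error = error
    raise last_error


def parallel_map(func, stream, workers=4, attempts=3):
    """Yield func(value) for each value of a finite stream, in input order.

    Threads are a stand-in for volunteer devices; this is not how Pando
    distributes work over WebRTC or WebSockets.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the order of the inputs.
        yield from pool.map(lambda v: apply_with_retry(func, v, attempts), stream)
```

A caller would consume the results lazily, for example `for result in parallel_map(render_frame, frames): ...`, where `render_frame` and `frames` are placeholders for a compute-bound function and its inputs.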

Proceedings Article
03 Aug 2013
TL;DR: Thorough automatic evaluation shows that the new centrality-based relevance model for automatic summarization achieves state-of-the-art performance, both in written text, and automatically transcribed speech summarization, even when compared to considerably more complex approaches.
Abstract: In automatic summarization, centrality-as-relevance means that the most important content of an information source, or of a collection of information sources, corresponds to the most central passages, considering a representation where such notion makes sense (graph, spatial, etc.). We assess the main paradigms and introduce a new centrality-based relevance model for automatic summarization that relies on the use of support sets to better estimate the relevant content. Geometric proximity is used to compute semantic relatedness. Centrality (relevance) is determined by considering the whole input source (and not only local information), and by taking into account the existence of minor topics or lateral subjects in the information sources to be summarized. The method consists in creating, for each passage of the input source, a support set consisting only of the most semantically related passages. Then, the determination of the most relevant content is achieved by selecting the passages that occur in the largest number of support sets. This model produces extractive summaries that are generic, and language- and domain-independent. Thorough automatic evaluation shows that the method achieves state-of-the-art performance, both in written text, and automatically transcribed speech summarization, even when compared to considerably more complex approaches.

18 citations
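
The support-set procedure described in the summarization abstract above (build a support set for each passage, then select the passages that occur in the largest number of support sets) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the bag-of-words representation, cosine similarity as the proximity measure, the fixed support-set size `k`, and all function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(passage):
    """Assumed representation: lowercase whitespace-token counts."""
    return Counter(passage.lower().split())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def support_set(index, vectors, k):
    """Indices of the k passages most similar to passage `index`.

    Passages with zero similarity are never included.
    """
    scored = [
        (cosine(vectors[index], vectors[j]), j)
        for j in range(len(vectors))
        if j != index
    ]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return {j for _, j in scored[:k]}


def rank_passages(passages, k=2):
    """Order passage indices by how many support sets they occur in."""
    vectors = [bag_of_words(p) for p in passages]
    counts = Counter()
    for i in range(len(passages)):
        counts.update(support_set(i, vectors, k))
    return sorted(range(len(passages)), key=lambda j: (-counts[j], j))


def summarize(passages, k=2, size=1):
    """Extract the `size` top-ranked passages, kept in document order."""
    chosen = sorted(rank_passages(passages, k)[:size])
    return [passages[j] for j in chosen]
```

Ties are broken by document position here, which is an arbitrary choice of this sketch.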

Journal Article (DOI)
TL;DR: An overview of the design, deployment, and utilization of the CitySDK Tourism API, which aims to provide access to information about Points of Interest, Events, and Itineraries, is provided.
Abstract: Tourism is a major social and cultural activity with relevant economic impact. In an effort to promote their attractions with tourists, some cities have adopted the open-data model, publishing touristic data for programmers to use in their own applications. Unfortunately, each city publishes touristic information in its own way.

18 citations


Authors

Showing all 967 results

Name | H-index | Papers | Citations
João Carvalho | 126 | 1278 | 77017
Jaime G. Carbonell | 72 | 496 | 31267
Chris Dyer | 71 | 240 | 32739
Joao P. S. Catalao | 68 | 1039 | 19348
Muhammad Bilal | 63 | 720 | 14720
Alan W. Black | 61 | 413 | 19215
João Paulo Teixeira | 60 | 636 | 19663
Bhiksha Raj | 51 | 359 | 13064
Joao Marques-Silva | 48 | 289 | 9374
Paulo Flores | 48 | 321 | 7617
Ana Paiva | 47 | 472 | 9626
Miadreza Shafie-khah | 47 | 450 | 8086
Susana Cardoso | 44 | 400 | 7068
Mark J. Bentum | 42 | 226 | 8347
Joaquim Jorge | 41 | 290 | 6366
Network Information
Related Institutions (5)
Carnegie Mellon University: 104.3K papers, 5.9M citations, 88% related
Eindhoven University of Technology: 52.9K papers, 1.5M citations, 88% related
Microsoft: 86.9K papers, 4.1M citations, 88% related
Vienna University of Technology: 49.3K papers, 1.3M citations, 86% related

Performance
Metrics
No. of papers from the Institution in previous years
Year | Papers
2023 | 11
2022 | 52
2021 | 96
2020 | 131
2019 | 133
2018 | 126