
Brazilian Symposium on Databases 

About: The Brazilian Symposium on Databases (SBBD) is an academic conference. The conference publishes mainly in the areas of Data warehouse and Humanities. Over its lifetime, the conference has published 473 papers, which have received 2,164 citations.


Papers

Proceedings Article
01 Jan 2003
TL;DR: This paper introduces a family of geometric data transformation methods (GDTMs) that ensure the mining process will not violate privacy up to a certain degree of security, focusing primarily on privacy-preserving data clustering.
Abstract: Despite their benefit in a wide range of applications, data mining techniques have also raised a number of ethical issues, including privacy, data security, intellectual property rights, and many others. In this paper, we address the privacy problem against unauthorized secondary use of information. To do so, we introduce a family of geometric data transformation methods (GDTMs) which ensure that the mining process will not violate privacy up to a certain degree of security. We focus primarily on privacy-preserving data clustering, notably on partition-based and hierarchical methods. Our proposed methods distort only confidential numerical attributes to meet privacy requirements, while preserving general features for clustering analysis. Our experiments demonstrate that our methods are effective and provide acceptable values in practice for balancing privacy and accuracy. We report the main results of our performance evaluation and discuss some open research issues.

138 citations
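
The abstract describes the GDTM idea only at a high level; the sketch below is a minimal, hypothetical illustration (the function name and parameters are ours, not the paper's) of one such geometric transformation: translating and scaling only the confidential numeric columns, with a little additive noise.

```python
import numpy as np

def geometric_transform(data, confidential_cols, shift=5.0, scale=1.3,
                        noise_sd=0.05, seed=42):
    """Distort confidential numeric attributes with a fixed geometric
    transformation (translate + scale + small noise) while leaving the
    remaining attributes untouched, so the distance-based structure that
    clustering relies on is largely preserved.

    `data` is an (n_samples, n_features) float array; `confidential_cols`
    lists the column indices that must be protected.
    """
    rng = np.random.default_rng(seed)
    transformed = data.astype(float).copy()
    for col in confidential_cols:
        column = transformed[:, col]
        # Uniform translation and scaling change pairwise distances within
        # the column only by a constant factor.
        column = (column + shift) * scale
        # A little noise makes the exact original values harder to invert.
        column += rng.normal(0.0, noise_sd * column.std(), size=column.shape)
        transformed[:, col] = column
    return transformed

# Example: protect columns 0 and 2 (say, salary and age) of a toy dataset.
X = np.array([[50_000, 1, 34],
              [62_000, 0, 41],
              [48_000, 1, 29]], dtype=float)
print(geometric_transform(X, confidential_cols=[0, 2]))
```

Because every record in a column receives the same translation and scale, pairwise distances within that column change only by a constant factor, which is why partition-based and hierarchical clustering results tend to survive the distortion.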

Proceedings Article
01 Jan 2000
TL;DR: In this paper, the authors propose a fast, scalable algorithm for selecting the most important attributes (dimensions) of a given set of n-dimensional vectors; it can spot attributes with either linear or nonlinear correlations, and it gives a good estimate of how many attributes should be kept.
Abstract: Dimensionality curse and dimensionality reduction are two key issues that have retained high interest for data mining, machine learning, multimedia indexing, and clustering. In this paper we present a fast, scalable algorithm to quickly select the most important attributes (dimensions) for a given set of n-dimensional vectors. In contrast to older methods, our method has the following desirable properties: (a) it does not do rotation of attributes, thus leading to easy interpretation of the resulting attributes; (b) it can spot attributes that have either linear or nonlinear correlations; (c) it requires a constant number of passes over the dataset; (d) it gives a good estimate on how many attributes should be kept. The idea is to use the 'fractal' dimension of a dataset as a good approximation of its intrinsic dimension, and to drop attributes that do not affect it. We applied our method on real and synthetic datasets, where it gave fast and correct results.

97 citations
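
As a rough illustration of the approach, the sketch below estimates a dataset's correlation fractal dimension by box counting and greedily drops attributes whose removal barely changes it; the function names, the tolerance, and the elimination order are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def box_count_dimension(data, n_scales=8):
    """Estimate the correlation fractal dimension D2 of a point set by box
    counting: D2 is the slope of log(sum of squared box occupancies)
    against log(box side length)."""
    # Normalize every attribute to [0, 1] so a single grid fits all columns.
    mins, maxs = data.min(axis=0), data.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)
    unit = (data - mins) / span
    log_r, log_s = [], []
    for level in range(1, n_scales + 1):
        cells = 2 ** level                       # grid resolution per axis
        idx = np.minimum((unit * cells).astype(int), cells - 1)
        _, counts = np.unique(idx, axis=0, return_counts=True)
        occupancy = counts / len(data)
        log_r.append(np.log(1.0 / cells))        # box side length
        log_s.append(np.log(np.sum(occupancy ** 2)))
    slope, _ = np.polyfit(log_r, log_s, 1)       # slope ~ intrinsic dimension
    return slope

def select_attributes(data, tolerance=0.1):
    """Backward elimination: drop any attribute whose removal leaves the
    fractal (intrinsic) dimension essentially unchanged."""
    keep = list(range(data.shape[1]))
    base = box_count_dimension(data)
    for col in reversed(range(data.shape[1])):
        trial = [c for c in keep if c != col]
        if trial and abs(box_count_dimension(data[:, trial]) - base) < tolerance:
            keep = trial                         # attribute carried no new information
    return keep

# Toy check: column 1 is a linear function of column 0, so one of the two
# correlated columns should be dropped while the independent one survives.
rng = np.random.default_rng(0)
x = rng.random(2000)
data = np.column_stack([x, 2 * x + 1, rng.random(2000)])
print(select_attributes(data))
```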

Proceedings Article
03 Oct 2005
TL;DR: GKB, a repository based on a domain-independent meta-model for integrating geographic knowledge collected from multiple sources, is introduced, and the rules developed to add new knowledge to GKB are described.
Abstract: This paper introduces GKB, a repository based on a domain-independent meta-model for integrating geographic knowledge collected from multiple sources. We present the architecture, the repository design, and the data cleaning and knowledge integration processes. We also describe the rules developed to add new knowledge to GKB. GKB includes tools for generating ontologies, which are being used by multiple semantic web applications. To illustrate how it is being used, we present some of the applications that interact with the repository or load ontologies created with GKB.
Keywords: Information Integration, Knowledge Management, Ontology

61 citations
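
The paper presents GKB only at the architecture level; as a purely hypothetical sketch of what a domain-independent meta-model can look like, the toy classes below (all names are ours) treat feature types as data rather than as a fixed class hierarchy, and encode one integration rule (merge on shared name and type) of the kind the abstract alludes to.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class FeatureType:
    """A geographic feature type (e.g. 'city', 'river') kept as data,
    which is what makes the meta-model domain independent."""
    name: str

@dataclass
class Feature:
    names: set            # alternative names gathered from different sources
    ftype: FeatureType
    source: str           # provenance of the entry

@dataclass
class Repository:
    features: list = field(default_factory=list)

    def add(self, candidate):
        """Integration rule sketch: merge the candidate into an existing
        feature when they share a type and at least one name; otherwise
        store it as new knowledge."""
        for existing in self.features:
            if existing.ftype == candidate.ftype and existing.names & candidate.names:
                existing.names |= candidate.names   # accumulate alternative names
                return
        self.features.append(candidate)

gkb = Repository()
gkb.add(Feature({"Lisboa", "Lisbon"}, FeatureType("city"), "source A"))
gkb.add(Feature({"Lisbon"}, FeatureType("city"), "source B"))
print(len(gkb.features))   # 1: the two entries were merged
```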

Proceedings Article
01 Jan 2004
TL;DR: This paper proposes the first fully automatic approach to crawling the Hidden Web through keyword-based interfaces, using an algorithm that automatically derives a series of keyword-based queries whose goal is to obtain high coverage while minimizing cost.
Abstract: In this paper, we study the problem of automating the retrieval of data hidden behind simple search interfaces that accept keyword-based queries. Our goal is to automatically retrieve all available results (or, as many as possible). We propose a new approach to siphon hidden data that automatically generates a small set of representative keywords and builds queries which lead to high coverage. We evaluate our algorithms over several real Web sites. Preliminary results indicate our approach is effective: coverage of over 90% is obtained for most of the sites considered.

58 citations
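
A minimal sketch of the greedy idea behind such siphoning is shown below; `search` stands in for the site's keyword form, and the selection policy (always submit the most frequent word not yet tried) is one simple instance of the coverage-versus-cost trade-off the paper studies, not the authors' exact algorithm.

```python
from collections import Counter

def siphon(search, seed_word, max_queries=50):
    """Greedy keyword-based siphoning sketch. `search(word)` is assumed to
    return an iterable of (doc_id, text) pairs from a hidden-web keyword
    interface; we keep querying with the most frequent unseen word until
    the query budget runs out or no candidates remain."""
    submitted = set()
    retrieved = {}                    # doc_id -> text
    frequencies = Counter()
    next_word = seed_word
    for _ in range(max_queries):
        submitted.add(next_word)
        for doc_id, text in search(next_word):
            if doc_id not in retrieved:
                retrieved[doc_id] = text
                # Words harvested from new results feed the next query choice.
                frequencies.update(w.lower() for w in text.split())
        candidates = [w for w, _ in frequencies.most_common() if w not in submitted]
        if not candidates:
            break                     # nothing new left to try
        next_word = candidates[0]
    return retrieved
```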

Performance Metrics

No. of papers from the conference in previous years:

Year    Papers
2022    44
2020    18
2019    35
2018    32
2017    19
2016    35