Author

Serge Abiteboul

Bio: Serge Abiteboul is an academic researcher from the French Institute for Research in Computer Science and Automation. The author has contributed to research in the topics of XML and query languages. The author has an h-index of 73 and has co-authored 278 publications receiving 24,576 citations. Previous affiliations of Serge Abiteboul include the University of Southern California and PSL Research University.


Papers
Proceedings Article
03 Sep 1996
TL;DR: This paper shows how many common fusion operations can be specified non-procedurally and succinctly and presents key optimization techniques that significantly reduce the processing costs associated with information fusion.
Abstract: One of the main tasks of mediators is to fuse information from heterogeneous information sources. This may involve, for example, removing redundancies, and resolving inconsistencies in favor of the most reliable source. The problem becomes harder when the sources are unstructured/semistructured and we do not have complete knowledge of their contents and structure. In this paper we show how many common fusion operations can be specified non-procedurally and succinctly. The key to our approach is to assign semantically meaningful object ids to objects as they are “imported” into the mediator. These semantic ids can then be used to specify how various objects are combined or merged into objects “exported” by the mediator. In this paper we also discuss the implementation of a mediation system based on these principles. In particular, we present key optimization techniques that significantly reduce the processing costs associated with information fusion.
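
The abstract describes the approach only in prose; the following is a minimal, hypothetical sketch of fusion keyed on semantic object ids, assuming simple record-like objects. The attribute names, the semantic_id key, and the reliability ordering are illustrative assumptions, not the paper's actual mediator specification language.

```python
def semantic_id(obj):
    # Hypothetical key built from semantically meaningful attributes
    # (title and year here) rather than a source-local surrogate key,
    # so the "same" object imported from different sources gets the same id.
    return (obj["title"].strip().lower(), obj["year"])

def fuse(sources, reliability):
    """Fuse objects imported from heterogeneous sources into the objects
    exported by the mediator: objects with the same semantic id are merged,
    and conflicting attribute values are resolved in favour of the most
    reliable source.

    sources: dict mapping a source name to a list of record-like objects.
    reliability: source names ordered from most to least reliable.
    """
    exported = {}
    for name in reliability:                    # most reliable source first
        for obj in sources.get(name, []):
            slot = exported.setdefault(semantic_id(obj), {})
            for attr, value in obj.items():
                # a value already set by a more reliable source wins
                slot.setdefault(attr, value)
    return exported
```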

265 citations

Proceedings ArticleDOI
02 Apr 1984
TL;DR: A new, formally defined database model is introduced which combines fundamental principles of "semantic" database modeling in a coherent fashion and can serve as the foundation for a theoretical investigation into a wide variety of fundamental issues concerning the logical representation of data in databases.
Abstract: A new, formally defined database model is introduced which combines fundamental principles of "semantic" database modeling in a coherent fashion. The model provides mechanisms for representing structured objects and functional and ISA relationships between them. It is anticipated that the model can serve as the foundation for a theoretical investigation into a wide variety of fundamental issues concerning the logical representation of data in databases. Preliminary applications of the model include an efficient algorithm for computing the set of object types which can occur in a given entity set, even in the presence of a complex set of ISA relationships. The model can also be applied to precisely articulate "good" design policies.
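
As a rough illustration of the type computation mentioned above (not the paper's actual algorithm), here is a simplified sketch that treats the ISA relationships as a plain subtype graph and computes, by transitive closure, the object types that can occur in an entity set of a given type; the paper's model is richer than this assumption.

```python
def possible_types(entity_type, isa):
    """Simplified sketch: the object types that can occur in an entity set
    of `entity_type` are that type plus every type reachable below it in
    the ISA hierarchy.

    isa: dict mapping a type to the list of its direct subtypes.
    """
    seen, stack = set(), [entity_type]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(isa.get(t, []))
    return seen

# Example:
#   isa = {"person": ["student", "employee"], "student": ["phd_student"]}
#   possible_types("person", isa) == {"person", "student", "employee", "phd_student"}
```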

262 citations

Proceedings ArticleDOI
20 May 2003
TL;DR: A new on-line algorithm, OPIC, is introduced that uses far fewer resources, does not require storing the link matrix, and can be used to focus crawling on the most interesting pages.
Abstract: The computation of page importance in a huge dynamic graph has recently attracted a lot of attention because of the web. Page importance, or page rank, is defined as the fixpoint of a matrix equation. Previous algorithms compute it off-line and require a lot of extra CPU as well as disk resources (e.g. to store, maintain, and read the link matrix). We introduce a new algorithm, OPIC, that works on-line and uses far fewer resources. In particular, it does not require storing the link matrix. It is on-line in that it continuously refines its estimate of page importance while the web/graph is visited. Thus it can be used to focus crawling on the most interesting pages. We prove the correctness of OPIC. We present Adaptive OPIC, which also works on-line but adapts dynamically to changes of the web. A variant of this algorithm is now used by Xyleme. We report on experiments with synthetic data. In particular, we study the convergence and adaptiveness of the algorithms for various scheduling strategies for the pages to visit. We also report on experiments based on crawls of significant portions of the web.
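
As a hedged sketch of the on-line idea described above (each page carries "cash" that is distributed along its out-links when the page is visited, and the accumulated history estimates importance), the following toy implementation follows the published description of OPIC; crawling, page scheduling, and the adaptive variant are omitted.

```python
import random

def opic(graph, steps=100_000):
    """Toy sketch of On-line Page Importance Computation (OPIC).

    graph: dict mapping each page to the list of pages it links to
    (every link target must itself be a key of the dict).
    Every page starts with an equal share of cash; visiting a page adds
    its cash to its history and redistributes that cash to its successors.
    The importance estimate is each page's share of (history + cash).
    """
    pages = list(graph)
    cash = {p: 1.0 / len(pages) for p in pages}
    history = {p: 0.0 for p in pages}

    for _ in range(steps):
        p = random.choice(pages)        # any fair page-selection policy works
        amount = cash[p]
        history[p] += amount
        cash[p] = 0.0
        succ = graph[p] or pages        # a page with no out-links spreads everywhere
        share = amount / len(succ)
        for q in succ:
            cash[q] += share

    total = sum(history.values()) + 1.0  # +1: the cash still in circulation
    return {p: (history[p] + cash[p]) / total for p in pages}

# Example: opic({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
```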

258 citations

Proceedings Article
11 Sep 2001
TL;DR: The foundations of the logical representation and some aspects of the physical storage policy are presented, and the implementation of the change-centric method to manage versions in a web warehouse of XML data is discussed.
Abstract: We present a change-centric method to manage versions in a web warehouse of XML data. The starting point is a sequence of snapshots of XML documents we obtain from the web. By running a diff algorithm, we compute the changes between two consecutive versions. We then represent the sequence using a novel representation of changes based on completed deltas and persistent identifiers. We present the foundations of the logical representation and some aspects of the physical storage policy. The work presented here was developed in the context of the Xyleme project, a massive warehouse for XML data from the web. It has been implemented and tested. We briefly discuss the implementation.
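
The abstract describes the representation only at a high level; as a minimal sketch (assuming each node already carries a persistent identifier, and ignoring the tree structure and move operations handled by the real diff algorithm), a delta between two snapshots could be recorded as follows.

```python
def delta(old, new):
    """Sketch of a change-centric delta between two document snapshots.

    old, new: dicts mapping a persistent node identifier to that node's
    content.  The delta records inserts, deletes, and updates so that
    applying it to `old` yields `new`, and its inverse reconstructs `old`.
    """
    return {
        "insert": {i: new[i] for i in new.keys() - old.keys()},
        "delete": sorted(old.keys() - new.keys()),
        "update": {i: (old[i], new[i])
                   for i in old.keys() & new.keys() if old[i] != new[i]},
    }
```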

236 citations

Proceedings ArticleDOI
01 May 1997
TL;DR: The evaluation of path expression queries on semi-structured data in a distributed asynchronous environment is considered, and decidability and complexity results on the implication problem for path constraints are established.
Abstract: The evaluation of path expression queries on semi-structured data in a distributed asynchronous environment is considered. The focus is on the use of local information expressed in the form of path constraints in the optimization of path expression queries. In particular, decidability and complexity results on the implication problem for path constraints are established.
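
For concreteness, a path expression over semi-structured data can be read as a sequence of edge labels navigated from a starting node; the sketch below is a simplified, hypothetical evaluator (not the paper's distributed algorithm) showing the basic evaluation that path constraints are used to optimize, e.g. by pruning which remote sources need to be contacted.

```python
def eval_path(graph, start, path):
    """Evaluate a simple path expression over edge-labelled data.

    graph: dict mapping a node to a list of (label, target) pairs.
    path:  sequence of edge labels, e.g. ["member", "affiliation"].
    Returns the set of nodes reachable from `start` along `path`.
    """
    frontier = {start}
    for label in path:
        frontier = {t for v in frontier
                      for (l, t) in graph.get(v, [])
                      if l == label}
    return frontier
```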

225 citations


Cited by
Book
08 Sep 2000
TL;DR: This book presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects, and provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.
Abstract: The increasing volume of data in modern business and science calls for more complex and sophisticated tools. Although advances in data mining technology have made extensive data collection much easier, the field is still evolving and there is a constant need for new techniques and tools that can help us transform this data into useful information and knowledge. Since the previous edition's publication, great advances have been made in the field of data mining. Not only does the third edition of Data Mining: Concepts and Techniques continue the tradition of equipping you with an understanding and application of the theory and practice of discovering patterns hidden in large data sets, it also focuses on new, important topics in the field: data warehouses and data cube technology, mining data streams, mining social networks, and mining spatial, multimedia, and other complex data. Each chapter is a stand-alone guide to a critical topic, presenting proven algorithms and sound implementations ready to be used directly or with strategic modification against live data. This is the resource you need if you want to apply today's most powerful data mining techniques to meet real business challenges.
* Presents dozens of algorithms and implementation examples, all in pseudo-code and suitable for use in real-world, large-scale data mining projects.
* Addresses advanced topics such as mining object-relational databases, spatial databases, multimedia databases, time-series databases, text databases, the World Wide Web, and applications in several fields.
* Provides a comprehensive, practical look at the concepts and techniques you need to get the most out of real business data.

23,600 citations

Journal ArticleDOI
01 Apr 1998
TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
Abstract: In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and web proliferation, creating a web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale web search engine -- the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
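
The abstract refers to exploiting the link structure of hypertext without spelling out the computation; as a hedged illustration, here is a generic PageRank-style power-iteration sketch of link-based importance, not the engineered system described in the paper.

```python
def link_rank(graph, damping=0.85, iters=50):
    """Generic power-iteration sketch of link-based page importance.

    graph: dict mapping each page to the list of pages it links to
    (every link target must itself be a key of the dict).
    damping: probability of following a link rather than jumping to a
    random page.  Returns a score per page; the scores sum to 1.
    """
    pages = list(graph)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = graph[p] or pages     # dangling pages spread evenly
            share = damping * rank[p] / len(targets)
            for q in targets:
                new[q] += share
        rank = new
    return rank
```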

14,696 citations

Journal Article
TL;DR: Google, as discussed by the authors, is a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext; it is designed to crawl and index the Web efficiently and to produce much more satisfying search results than existing systems.

13,327 citations

01 Jan 2002

9,314 citations