Fabien Azavant

Researcher at École Normale Supérieure

Publications -  6
Citations -  644

Fabien Azavant is an academic researcher from École Normale Supérieure. The author has contributed to research on the topics Efficient XML Interchange and XML Base. The author has an h-index of 5 and has co-authored 6 publications receiving 636 citations. Previous affiliations of Fabien Azavant include Télécom ParisTech.

Papers
Journal Article

Building intelligent web applications using lightweight wrappers

TL;DR: The World Wide Web Wrapper Factory (W4F) is presented, a toolkit for the generation of wrappers for Web sources that offers an expressive language to specify the extraction of complex structures from HTML pages and a declarative mapping to various data formats like XML.
Proceedings Article

Building Light-Weight Wrappers for Legacy Web Data-Sources Using W4F

TL;DR: The Web has become a major conduit to information repositories of all kinds, but Web data sources also include standalone HTML pages, hand-coded by individuals, that provide very useful information such as reviews, digests, and links.

WysiWyg Web Wrapper Factory (W4F)

TL;DR: The W4F toolkit consists of a retrieval language to identify Web sources, a declarative extraction language to express robust extraction rules and a mapping interface to export the extracted information into some user-defined data-structures.

Web Ecology: Recycling HTML pages as XML documents using W4F

TL;DR: This paper presents the World-Wide Web Wrapper Factory (W4F), a Java toolkit to generate wrappers for Web data sources. It offers an expressive language to extract information from HTML pages in a structured way, a mapping to export it as XML documents, and some visual tools to assist the user during wrapper creation.
Proceedings Article

Looking at the Web through XML glasses

TL;DR: The World Wide Web Wrapper Factory (W4F) is presented, a Java toolkit for the generation of wrappers for Web sources, whose main contributions include an expressive language to specify the extraction of complex structures from HTML pages.