Author

Martin Alberink

Bio: Martin Alberink is an academic researcher. The author has contributed to research in topics: Semantics & Cluster analysis. The author has an h-index of 3 and has co-authored 4 publications receiving 85 citations.

Papers
Proceedings ArticleDOI
26 Aug 2003
TL;DR: This paper presents the results of the first phase of the Topia project, which explored generating a discourse structure derived from generic processing of the underlying domain semantics, transforming this to a structured progression and then using this to steer the choice of hypermedia communicative devices used to convey the actual information in the resulting presentation.
Abstract: Generating hypermedia presentations requires processing constituent material into coherent, unified presentations. One large challenge is creating a generic process for producing hypermedia presentations from the semantics of potentially unfamiliar domains. The resulting presentations must both respect the underlying semantics and appear as coherent, plausible and, if possible, pleasant to the user. Among the related unsolved problems is the inclusion of discourse knowledge in the generation process. One potential approach is generating a discourse structure derived from generic processing of the underlying domain semantics, transforming this to a structured progression and then using this to steer the choice of hypermedia communicative devices used to convey the actual information in the resulting presentation. This paper presents the results of the first phase of the Topia project, which explored this approach. These results include an architecture for this more domain-independent processing of semantics and discourse into hypermedia presentations. We demonstrate this architecture with an implementation using Web standards and freely available technologies.

77 citations

01 Jan 2003
TL;DR: Different clustering techniques for deriving discourse from semantics use properties, relations and numerical properties of information objects and semantic networks to contribute to generating the hierarchical and sequential components of discourse.
Abstract: In earlier work, we have shown how clustering techniques can transform semantic markup into discourse, which style sheets can transform into hypermedia presentations. This paper discusses the different clustering techniques for deriving discourse from semantics. These techniques use properties, relations and numerical properties of information objects and semantic networks. We show how these clustering techniques contribute to generating the hierarchical and sequential components of discourse.

4 citations

01 Jan 2003
TL;DR: This paper shows generation of object sequences and emphasis in accordance with a user input of relevance of information attributes in the authors' Topia architecture, which makes it easier to identify relevant information objects among many others, as well as to observe their relations with the other information objects.
Abstract: For humans to gain comprehensive views of large amounts of repository contents, they need to have insight into the relations among information objects. It is a challenge to automatically generate presentations of repository contents, through, for example, search results, which reveal such relations to readers. Such presentations must reflect properties of information objects such that large sets of information objects appear as a coherent whole. An approach to this is generation of discourse structures that convey such properties of information objects in presentations. Semantic Web technology provides a conceptual basis for generation of discourse in Web-based information environments. This paper describes automatic generation of sequence and emphasis in presentations of information objects. It shows generation of object sequences and emphasis in accordance with a user input of relevance of information attributes in our Topia architecture. The resulting presentations allow users to encounter information objects in decreasing order of relevance. This makes it easier to identify relevant information objects among many others, as well as to observe their relations with the other information objects.

4 citations
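The abstract above describes presenting information objects in decreasing order of relevance, driven by user-supplied relevance of information attributes. A minimal sketch of that idea follows. It is not the Topia implementation: the object layout, the `rank_by_relevance` function, and the additive scoring rule are all assumptions made for illustration.

```python
def rank_by_relevance(objects, weights):
    """Order objects by decreasing relevance.

    objects: list of dicts with a "name" and a set of "attributes" (hypothetical layout).
    weights: dict mapping attribute name to a user-assigned relevance weight.
    The additive score is an assumed rule, not one taken from the paper.
    """
    def score(obj):
        # Attributes the user did not rate contribute nothing.
        return sum(weights.get(attr, 0.0) for attr in obj["attributes"])

    # sorted() is stable, so objects with equal scores keep their input order.
    return sorted(objects, key=score, reverse=True)


# Hypothetical repository contents and user preferences.
objects = [
    {"name": "sketch", "attributes": {"drawing"}},
    {"name": "portrait", "attributes": {"painting", "17th century"}},
    {"name": "landscape", "attributes": {"painting"}},
]
weights = {"painting": 2.0, "17th century": 1.0, "drawing": 0.5}

ranked = [obj["name"] for obj in rank_by_relevance(objects, weights)]
```

Emphasis could then be assigned from rank position, for example by highlighting the first few entries of `ranked`.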

Proceedings ArticleDOI
06 Sep 2005
TL;DR: This poster presents a general clustering-based algorithm for deriving presentation structure from semantic structure; domain-independent presentation generation results from this algorithm.
Abstract: This poster presents a general clustering-based algorithm for deriving presentation structure from semantic structure. Domain-independent presentation generation results from this algorithm.

Cited by
Posted Content
01 Jan 2004
TL;DR: A new theory of relative semantics between objects is presented, based on information distance and Kolmogorov complexity, which is then applied to construct a method to automatically extract the meaning of words and phrases from the world-wide-web using Google page counts.
Abstract: Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers the equivalent of 'society' is 'database,' and the equivalent of 'use' is 'way to search the database.' We present a new theory of similarity between words and phrases based on information distance and Kolmogorov complexity. To fix thoughts we use the world-wide-web as database, and Google as search engine. The method is also applicable to other search engines and databases. This theory is then applied to construct a method to automatically extract similarity, the Google similarity distance, of words and phrases from the world-wide-web using Google page counts. The world-wide-web is the largest database on earth, and the context information entered by millions of independent users averages out to provide automatic semantics of useful quality. We give applications in hierarchical clustering, classification, and language translation. We give examples to distinguish between colors and numbers, cluster names of paintings by 17th century Dutch masters and names of books by English novelists, the ability to understand emergencies, and primes, and we demonstrate the ability to do a simple automatic English-Spanish translation. Finally, we use the WordNet database as an objective baseline against which to judge the performance of our method. We conduct a massive randomized trial in binary classification using support vector machines to learn categories based on our Google distance, resulting in a mean agreement of 87% with the expert crafted WordNet categories.

129 citations
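The abstract above derives a similarity distance from search-engine page counts but does not spell out the formula. The sketch below uses the commonly cited definition of the normalized Google distance, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), where f gives page counts and N is the number of indexed pages. The function name and the page counts are illustrative assumptions, not values from the paper.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Commonly cited NGD formula computed from page counts.

    fx, fy: page counts for each term alone.
    fxy:    page count for pages containing both terms.
    n:      total number of indexed pages (an assumed normalizing constant).
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never occur, or never co-occur, are treated as infinitely far apart.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Made-up counts: terms that always co-occur versus terms that rarely do.
always_together = normalized_google_distance(1000, 1000, 1000, 10**9)
rarely_together = normalized_google_distance(1000, 1000, 10, 10**6)
```

Smaller values indicate terms that tend to appear on the same pages; the resulting pairwise distances can feed the clustering and classification applications the abstract mentions.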

Proceedings ArticleDOI
06 Sep 2005
TL;DR: In this article, the mSpace framework and architecture is proposed as a platform to deploy lightweight Semantic Web applications which foreground associative interaction, and as a means to evaluate both interaction needs and the cost/benefits of using Semantic Web technologies to support them.
Abstract: Vannevar Bush proposed the memex as a means to support building knowledge in the way he says the human brain works: by association. Achieving this vision has been a core motivation for hypertext research. In this paper, we suggest first that Bush's memex reflects an interaction paradigm rather than system design. Second, we propose that Semantic Web promises to provide the mechanisms to enable these interaction requirements. Third, we propose the mSpace framework and architecture as a platform to deploy lightweight Semantic Web applications which foreground associative interaction. We propose this lightweight approach as a means to evaluate both interaction needs and the cost/benefits of using Semantic Web technologies to support them.

123 citations

DOI
01 Nov 2006
TL;DR: A methodology to automatically organize video material in an edited video sequence with a rhetorical structure is developed, to enable an alternative authoring process for film makers to make all their material dynamically available to users, without having to edit a static final cut that would select out possible informative footage.
Abstract: The scenario in which the present research takes place is that of one or more online video repositories containing several hours of documentary footage and users possibly interested only in particular topics of that material. In such a setting it is not possible to craft a single version containing all possible topics the user might like to see, unless including all the material, which is clearly not feasible. The main motivation for this research is, therefore, to enable an alternative authoring process for film makers to make all their material dynamically available to users, without having to edit a static final cut that would select out possible informative footage. We developed a methodology to automatically organize video material in an edited video sequence with a rhetorical structure. This was enabled by defining an annotation schema for the material and a generation process with the following two requirements:
• the data repository used by the generation process could be extended by simply adding annotated material to it
• the final resulting structure of the video generation process would seem familiar to a video literate user
The first requirement was satisfied by developing an annotation schema that explicitly identifies rhetorical elements in the video material, and a generation process that assembles longer sequences of video by manipulating the annotations in a bottom-up fashion. The second requirement was satisfied by modelling the generation process according to documentary making and general film theory techniques, in particular making the role of rhetoric in video documentaries explicit. A specific case study was carried out using video material for video documentaries. These used an interview structure, where people are asked to make statements about subjective matters. This category is characterized by rich information encoded in the audio track and by the controversiality of the different opinions expressed in the interviews.
The approach was tested by implementing a system called Vox Populi that realizes a user-driven generation of rhetoric-based video sequences. Using the annotation schema, Vox Populi can be used to generate the story space and to allow the user to select and browse such a space. The user can specify the topic but also the characters of the rhetorical dialogue and the rhetoric form of the presentation. Presenting controversial topics can introduce some bias: Vox Populi tries to control that by modelling some rhetoric and film theory editing techniques that influence

91 citations

Proceedings Article
01 Jan 2006
TL;DR: This paper presented a new theory of relative semantics between objects, based on information distance and Kolmogorov complexity, which is then applied to construct a method to automatically extract the meaning of words and phrases from the world-wide-web using Google page counts.
Abstract: We present a new theory of relative semantics between objects, based on information distance and Kolmogorov complexity. This theory is then applied to construct a method to automatically extract the meaning of words and phrases from the world-wide-web using Google page counts. The approach is novel in its unrestricted problem domain, simplicity of implementation, and manifestly ontological underpinnings. The world-wide-web is the largest database on earth, and the latent semantic context information entered by millions of independent users averages out to provide automatic meaning of useful quality. We give examples to distinguish between colors and numbers, cluster names of paintings by 17th century Dutch masters and names of books by English novelists, the ability to understand emergencies, and primes, and we demonstrate the ability to do a simple automatic English-Spanish translation. Finally, we use the WordNet database as an objective baseline against which to judge the performance of our method. We conduct a massive randomized trial in binary classification using support vector machines to learn categories based on our Google distance, resulting in a mean agreement of 87% with the expert crafted WordNet categories.

87 citations