Bio: Bernd Neumann is an academic researcher at the University of Hamburg. His research focuses on knowledge representation and reasoning and description logic. He has an h-index of 20 and has co-authored 97 publications receiving 1377 citations.
TL;DR: In this paper, the authors examine the possible use of description logics (DLs) as a knowledge representation and reasoning system for high-level scene interpretation, and show that aggregates can be represented by concept expressions of a description logic which provides a concrete-domain extension for quantitative temporal and spatial constraints.
Abstract: We examine the possible use of description logics (DLs) as a knowledge representation and reasoning system for high-level scene interpretation. It is shown that so-called aggregates composed of multiple parts and constrained primarily by temporal and spatial relations can be used to represent high-level concepts such as object configurations, occurrences, events, and episodes that are required in an application context. Scene interpretation is modelled as a stepwise process which exploits the taxonomical and compositional relations between aggregate concepts while incorporating visual evidence and contextual information. It is shown that aggregates can be represented by concept expressions of a description logic which provides a concrete-domain extension for quantitative temporal and spatial constraints. The analysis reveals that different kinds of representation constructs have to be carefully selected in order to provide for the required expressivity while retaining decidability in general as well as practical support from description logic system implementations in particular. Reasoning services of the DL system can be used as building blocks for the interpretation process, but additional information is required to generate preferred interpretations. A probabilistic model is sketched which can be integrated with the knowledge-based framework.
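The kind of aggregate concept described above can be sketched, under illustrative assumptions, as a description-logic concept expression with a concrete-domain predicate over time points. The names Overtake, Approach, and PassBy, and the features approachEnd and passStart, are hypothetical placeholders, not taken from the paper:

```latex
\mathit{Overtake} \sqsubseteq \mathit{Occurrence}
  \sqcap\; \exists \mathit{hasPart}.\mathit{Approach}
  \sqcap\; \exists \mathit{hasPart}.\mathit{PassBy}
  \sqcap\; \exists (\mathit{approachEnd}, \mathit{passStart}).{\leq}
```

Here approachEnd and passStart are concrete-domain features denoting time points, and the predicate ≤ expresses the quantitative temporal constraint that the approach phase ends no later than the pass-by phase begins.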
01 Jan 1984
TL;DR: No computational theory has yet been offered which promises satisfactory results for unrestricted real-world images; considerable progress has been made in certain restricted situations, and interesting results on how to exploit optical flow are still being uncovered.
Abstract: A large number of contributions to this workshop are concerned with computing or making use of optical flow. This is the term now commonly used for an intermediate representation of time-varying imagery where each pixel is assigned a velocity vector describing its temporal displacement in the image plane or, for human vision, in the retinal field. Optical flow can be consciously experienced by human observers (e.g. when travelling in a car) and was early recognized as a valuable source of information pertaining to the motion and 3D characteristics of a scene (GIBSON 50). Thorough quantitative analyses, however, have only become available during the last five years, when an increasing number of vision researchers turned to motion problems. As can be seen from this workshop, interesting results on how to exploit optical flow are still being uncovered. Before optical flow can be made use of, it must, unfortunately, first be computed. As it turns out, no computational theory has yet been offered which promises satisfactory results for unrestricted real-world images. Nevertheless, considerable progress has been made in certain restricted situations. This is also documented by several contributions to this workshop. In this introductory survey I shall try to point out the major differences in the approaches taken so far. Fig. 1 gives a rough sketch of the representations and the processing connected with optical flow. Much of the variety of the research contributions is due to certain assumptions about the visual world. These will be discussed in the following section. The visual world is projected, yielding intensity arrays from which optical flow computation per se proceeds. Three rather distinct directions of processing have been proposed. As a first possibility, optical flow is directly computed from the intensity array. The result is usually a dense flow field.
Alternately, descriptive elements like prominent points or edges may be computed first. Points usually give rise to a sparse flow field after correspondence is established. Edges lead to a quite different flow computation due to the remaining degree of freedom. In Section 3 these distinctions are elaborated in some more detail. Finally, I shall briefly review ways of extracting useful information from optical flow.
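The first direction, dense flow computed directly from the intensity array, rests on the brightness constancy assumption. A minimal one-dimensional sketch, with an illustrative signal and a single global displacement (not an implementation from the survey), estimates the flow from spatial and temporal gradients by least squares:

```python
# Minimal 1-D sketch of gradient-based optical flow estimation under
# brightness constancy: I(x, t+1) ~ I(x - v, t). The signal below is an
# illustrative assumption, not data from the paper.

def estimate_flow(frame0, frame1):
    """Estimate a single displacement v (pixels/frame) by least squares
    over the derivatives: v = -sum(Ix * It) / sum(Ix * Ix)."""
    num, den = 0.0, 0.0
    for x in range(1, len(frame0) - 1):
        ix = (frame0[x + 1] - frame0[x - 1]) / 2.0  # spatial gradient
        it = frame1[x] - frame0[x]                  # temporal gradient
        num += ix * it
        den += ix * ix
    return -num / den if den else 0.0

# A smooth intensity ramp shifted right by one pixel between frames.
frame0 = [0, 1, 2, 3, 4, 5, 6, 7]
frame1 = [0, 0, 1, 2, 3, 4, 5, 6]
v = estimate_flow(frame0, frame1)  # recovers a displacement of 1 pixel
```

Real dense-flow methods solve this per pixel with an added smoothness constraint; the sparse, feature-based direction instead matches prominent points between frames and assigns flow only where correspondence is established.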
••01 Jan 2008
TL;DR: This chapter shows how formal knowledge representation and reasoning techniques can be used for the retrieval and interpretation of multimedia data and introduces description logics (DLs) as the formal basis for ontology languages of the OWL (web ontology language) family.
Abstract: In this chapter, we show how formal knowledge representation and reasoning techniques can be used for the retrieval and interpretation of multimedia data. This section explains what we mean by an “interpretation” using examples of audio and video interpretation. Intuitively, interpretations are descriptions of media data at a high abstraction level, exposing interrelations and coherencies. In Section 3.2.3, we introduce description logics (DLs) as the formal basis for ontology languages of the OWL (web ontology language) family and for the interpretation framework described in subsequent sections. As a concrete example, we consider the interpretation of images describing a sports event in Section 3.3. It is shown that interpretations can be obtained by abductive reasoning, and a general interpretation framework is presented. Stepwise construction of an interpretation can be viewed as navigation in the compositional and taxonomical hierarchies spanned by a conceptual knowledge base. What do we mean by “interpretation” of media objects? Consider the image shown in Fig. 3.1. One can think of the image as a set of primitive objects such as persons, garbage containers, a garbage truck, a bicycle, traffic signs, trees, etc. An interpretation of the image is a description which “makes sense” of these primitive objects. In our example, the interpretation could include the assertions “two workers empty garbage containers into a garbage truck” and “a mailman distributes mail” expressed in some knowledge representation language. When including the figure caption into the interpretation process, we have a multimodal interpretation task which in this case involves visual and textual media objects. The result could be a refinement of the assertions above in terms of the location “in Hamburg”. Note that the interpretation describes activities extending in time although it is only based on a snapshot. Interpretations may generally include
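The abductive step described above, hypothesizing a high-level aggregate that explains the observed media objects, can be sketched in a toy form. The concept names and part lists below are illustrative assumptions, not the chapter's actual knowledge base:

```python
# Toy sketch of interpretation by abduction: pick the aggregate concept
# whose expected parts best cover the observed objects; parts not observed
# are the abduced (assumed) facts. Rules here are hypothetical examples.

RULES = {
    "garbage_collection": {"worker", "garbage_container", "garbage_truck"},
    "mail_delivery": {"mailman", "bicycle", "mailbag"},
}

def abduce(observations):
    """Return the hypothesis explaining the most observations, plus the
    parts it additionally assumes to hold."""
    best, explained, assumed = None, set(), set()
    for hypothesis, parts in RULES.items():
        covered = parts & observations
        if len(covered) > len(explained):
            best, explained, assumed = hypothesis, covered, parts - observations
    return best, assumed

obs = {"worker", "garbage_container", "garbage_truck", "traffic_sign"}
hyp, assumed = abduce(obs)  # hypothesis: garbage_collection
```

A full framework would rank competing explanations (e.g. probabilistically) and navigate the compositional and taxonomical hierarchies rather than a flat rule table, but the explain-by-hypothesis structure is the same.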
01 Jan 2002
TL;DR: A comprehensive research strategy for the next decade of intelligent information processing must be of an integrated socio-technical nature covering different levels.
Abstract: A very exciting development in current intelligent information processing is the Semantic Web and the innovative e-applications it promises to enable. This promise will not come true, however, if research limits itself to the technological aspects and challenges only. Both supply-demand sides and business-technology sides need to be investigated in an integrated fashion. This implies that we simultaneously have to address technological, social, and business considerations. Therefore, a comprehensive research strategy for the next decade of intelligent information processing must be of an integrated socio-technical nature covering different levels: (1) Definition and standardization of the baseline infrastructures, content libraries and languages that make up the Semantic Web; (2) The associated construction of generic smart web services that dynamically bridge the low-level (for the end user) infrastructures and the high-level user applications; (3) Designing and studying innovative e-services, information systems, and business processes at the domain, customer, and business level; (4) Understanding and influencing the business and market logics and critical success factors that will determine the social adoption of smart web-based innovations.
01 Jan 2003
TL;DR: This work presents the SmartKom system, which provides full symmetric multimodality in a mixed-initiative dialogue system with an embodied conversational agent, and details its three-tiered representation of multimodal discourse, consisting of a domain layer, a discourse layer, and a modality layer.
Abstract: We introduce the notion of symmetric multimodality for dialogue systems in which all input modes (e.g., speech, gesture, facial expression) are also available for output, and vice versa. A dialogue system with symmetric multimodality must not only understand and represent the user's multimodal input, but also its own multimodal output. We present the SmartKom system, which provides full symmetric multimodality in a mixed-initiative dialogue system with an embodied conversational agent. SmartKom represents a new generation of multimodal dialogue systems that deal not only with simple modality integration and synchronization, but cover the full spectrum of dialogue phenomena that are associated with symmetric multimodality (including crossmodal references, one-anaphora, and backchannelling). We show that SmartKom's plug-and-play architecture supports multiple recognizers for a single modality, e.g., the user's speech signal can be processed by three unimodal recognizers in parallel (speech recognition, emotional prosody, boundary prosody). Finally, we detail SmartKom's three-tiered representation of multimodal discourse, consisting of a domain layer, a discourse layer, and a modality layer.
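The plug-and-play idea of running several unimodal recognizers on one signal and merging their outputs can be sketched as follows. The recognizer functions and the merge policy are illustrative assumptions, not SmartKom's actual interfaces:

```python
# Hedged sketch: three hypothetical recognizers process the same speech
# signal in parallel and their annotations are merged into one result.

def speech_recognizer(signal):
    return {"words": signal.split()}

def emotional_prosody(signal):
    return {"emotion": "neutral"}

def boundary_prosody(signal):
    return {"phrase_boundaries": [len(signal.split())]}

# Plug-and-play: recognizers are registered in a list, so adding one for a
# modality requires no change to the processing loop.
RECOGNIZERS = [speech_recognizer, emotional_prosody, boundary_prosody]

def process_speech(signal):
    """Run all registered recognizers on one signal and merge outputs."""
    result = {}
    for recognize in RECOGNIZERS:
        result.update(recognize(signal))
    return result

analysis = process_speech("show me the map")
```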
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
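The mail-filtering example above can be sketched as a minimal learner that counts, per word, how often the user rejected or kept messages containing it, and filters new mail by a simple score. The training messages and decision rule are illustrative assumptions:

```python
# Minimal sketch of a learned mail filter: reject a message when its words
# have, on balance, appeared more often in rejected than in kept mail.

from collections import defaultdict

class LearnedMailFilter:
    def __init__(self):
        self.rejects = defaultdict(int)  # word -> rejected-message count
        self.keeps = defaultdict(int)    # word -> kept-message count

    def learn(self, message, rejected):
        counts = self.rejects if rejected else self.keeps
        for word in set(message.lower().split()):
            counts[word] += 1

    def is_unwanted(self, message):
        score = 0
        for word in set(message.lower().split()):
            score += self.rejects[word] - self.keeps[word]
        return score > 0

f = LearnedMailFilter()
f.learn("win a free prize now", rejected=True)
f.learn("free prize offer", rejected=True)
f.learn("meeting agenda for monday", rejected=False)
verdict = f.is_unwanted("claim your free prize")  # flagged as unwanted
```

Practical filters use probabilistic models (e.g. naive Bayes) rather than raw count differences, but the point stands: the rules are maintained automatically from the user's own accept/reject behavior.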
01 Jan 1990
TL;DR: An overview of the self-organizing map algorithm, on which the papers in this issue are based, is presented.
Abstract: An overview of the self-organizing map algorithm, on which the papers in this issue are based, is presented in this article.
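The core of the self-organizing map algorithm can be sketched in a few lines: for each input, find the best-matching unit and pull it and its topological neighbors toward the input. The map size, learning rate, radius, and data below are illustrative assumptions, not parameters from the overview:

```python
# Minimal 1-D self-organizing map sketch: one weight per map unit,
# best-matching-unit search, and neighborhood update.

import random

def train_som(data, n_units=5, epochs=200, lr=0.3, radius=1, seed=0):
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(n_units)]
    for _ in range(epochs):
        for x in data:
            # best-matching unit: the unit whose weight is closest to x
            bmu = min(range(n_units), key=lambda i: abs(weights[i] - x))
            # pull the BMU and its map neighbors toward the input
            for i in range(max(0, bmu - radius), min(n_units, bmu + radius + 1)):
                weights[i] += lr * (x - weights[i])
    return weights

data = [0.05, 0.1, 0.5, 0.9, 0.95]
w = sorted(train_som(data))
```

Full implementations decay the learning rate and neighborhood radius over time and use vector-valued inputs on a 2-D map grid, but the competitive update above is the essential mechanism.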
01 Aug 2004
TL;DR: This paper reviews recent developments and general strategies of the processing framework of visual surveillance in dynamic scenes, and analyzes possible research directions, e.g., occlusion handling, a combination of two and three-dimensional tracking, and fusion of information from multiple sensors, and remote surveillance.
Abstract: Visual surveillance in dynamic scenes, especially for humans and vehicles, is currently one of the most active research topics in computer vision. It has a wide spectrum of promising applications, including access control in special areas, human identification at a distance, crowd flux statistics and congestion analysis, detection of anomalous behaviors, and interactive surveillance using multiple cameras, etc. In general, the processing framework of visual surveillance in dynamic scenes includes the following stages: modeling of environments, detection of motion, classification of moving objects, tracking, understanding and description of behaviors, human identification, and fusion of data from multiple cameras. We review recent developments and general strategies of all these stages. Finally, we analyze possible research directions, e.g., occlusion handling, a combination of two- and three-dimensional tracking, a combination of motion analysis and biometrics, anomaly detection and behavior prediction, content-based retrieval of surveillance videos, behavior understanding and natural language description, fusion of information from multiple sensors, and remote surveillance.
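The "detection of motion" stage of the pipeline above can be illustrated with the simplest background-subtraction scheme: compare each frame against a background model and flag pixels that changed beyond a threshold. The frames and threshold below are illustrative assumptions, not from the survey:

```python
# Toy sketch of motion detection by background subtraction: flag pixel
# coordinates whose intensity differs from the background model by more
# than a threshold.

def detect_motion(background, frame, threshold=20):
    """Return the set of (x, y) pixels that changed significantly."""
    moving = set()
    for y, (bg_row, row) in enumerate(zip(background, frame)):
        for x, (b, v) in enumerate(zip(bg_row, row)):
            if abs(v - b) > threshold:
                moving.add((x, y))
    return moving

background = [[10, 10, 10],
              [10, 10, 10],
              [10, 10, 10]]
frame      = [[10, 10, 10],
              [10, 200, 10],
              [10, 10, 10]]
pixels = detect_motion(background, frame)  # only the center pixel moved
```

Deployed systems maintain adaptive background models (e.g. mixtures of Gaussians) and feed the resulting foreground masks into the later classification and tracking stages.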
TL;DR: The book "The Perception of the Visual World" is presented; no scientific abstract is available for this entry.
TL;DR: This survey identifies a number of promising applications of "looking at people" and provides an overview of recent developments in the domain, with emphasis on whole-body or hand motion and the various methodologies.
Abstract: The ability to recognize humans and their activities by vision is key for a machine to interact intelligently and effortlessly with a human-inhabited environment. Because of many potentially important applications, “looking at people” is currently one of the most active application domains in computer vision. This survey identifies a number of promising applications and provides an overview of recent developments in this domain. The scope of this survey is limited to work on whole-body or hand motion; it does not include work on human faces. The emphasis is on discussing the various methodologies; they are grouped in 2-D approaches with or without explicit shape models and 3-D approaches. Where appropriate, systems are reviewed. We conclude with some thoughts about future directions.