Showing papers by "AT&T Labs published in 1997"


Journal ArticleDOI
01 Aug 1997
TL;DR: The model studied can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting, and it is shown that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems.
Abstract: In the first part of the paper we consider the problem of dynamically apportioning resources among a set of options in a worst-case on-line framework. The model we study can be interpreted as a broad, abstract extension of the well-studied on-line prediction model to a general decision-theoretic setting. We show that the multiplicative weight-update Littlestone-Warmuth rule can be adapted to this model, yielding bounds that are slightly weaker in some cases, but applicable to a considerably more general class of learning problems. We show how the resulting learning algorithm can be applied to a variety of problems, including gambling, multiple-outcome prediction, repeated games, and prediction of points in R^n. In the second part of the paper we apply the multiplicative weight-update technique to derive a new boosting algorithm. This boosting algorithm does not require any prior knowledge about the performance of the weak learning algorithm. We also study generalizations of the new boosting algorithm to the problem of learning functions whose range, rather than being binary, is an arbitrary finite set or a bounded segment of the real line.

15,813 citations
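The multiplicative weight-update rule at the heart of this paper can be illustrated with a short sketch. The following is a minimal Hedge-style allocation under assumed losses in [0, 1] and an assumed parameter beta; it illustrates the general technique, not the paper's exact algorithm or its boosting extension.

```python
import numpy as np

def hedge(losses, beta=0.9):
    """Allocate weight across options and update multiplicatively.

    losses: (T, N) array; losses[t, i] in [0, 1] is the loss of option i
    at round t. Returns the sequence of probability distributions used.
    A minimal sketch of a multiplicative weight-update rule; the paper's
    algorithm and analysis are more general.
    """
    T, N = losses.shape
    w = np.ones(N)
    distributions = []
    for t in range(T):
        p = w / w.sum()            # current allocation over options
        distributions.append(p)
        w = w * beta ** losses[t]  # penalize options that incurred loss
    return np.array(distributions)

# Toy usage: the second option is consistently better, so its weight grows.
rng = np.random.default_rng(0)
losses = np.column_stack([rng.uniform(0.4, 1.0, 50), rng.uniform(0.0, 0.3, 50)])
print(hedge(losses)[-1])  # final distribution concentrates on the lower-loss option
```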


Journal ArticleDOI
TL;DR: This special section includes descriptions of five recommender systems, in which people provide recommendations as inputs that the system aggregates and directs to appropriate recipients; some of the systems also combine evaluations with content analysis.
Abstract: It is often necessary to make choices without sufficient personal experience of the alternatives. In everyday life, we rely on [...] Recommender systems assist and augment this natural social process. In a typical recommender system people provide recommendations as inputs, which the system then aggregates and directs to appropriate recipients. In some cases the primary transformation is in the aggregation; in others the system’s value lies in its ability to make good matches between the recommenders and those seeking recommendations. The developers of the first recommender system, Tapestry [1], coined the phrase “collaborative filtering” and several others have adopted it. We prefer the more general term “recommender system” for two reasons. First, recommenders may not explicitly collaborate with recipients, who may be unknown to each other. Second, recommendations may suggest particularly interesting items, in addition to indicating those that should be filtered out. This special section includes descriptions of five recommender systems. A sixth article analyzes incentives for provision of recommendations. Figure 1 places the systems in a technical design space defined by five dimensions. First, the contents of an evaluation can be anything from a single bit (recommended or not) to unstructured textual annotations. Second, recommendations may be entered explicitly, but several systems gather implicit evaluations: GroupLens monitors users’ reading times; PHOAKS mines Usenet articles for mentions of URLs; and Siteseer mines personal bookmark lists. Third, recommendations may be anonymous, tagged with the source’s identity, or tagged with a pseudonym. The fourth dimension, and one of the richest areas for exploration, is how to aggregate evaluations. GroupLens, PHOAKS, and Siteseer employ variants on weighted voting. Fab takes that one step further to combine evaluations with content analysis. ReferralWeb combines suggested links between people to form longer referral chains. Finally, the (perhaps aggregated) evaluations may be used in several ways: negative recommendations may be filtered out, the items may be sorted according to numeric evaluations, or evaluations may accompany items in a display. Figures 2 and 3 identify dimensions of the domain space: the kinds of items being recommended and the people among whom evaluations are shared. Consider, first, the domain of items. The sheer volume is an important variable: detailed textual reviews of restaurants or movies may be practical, but applying the same approach to thousands of daily Netnews messages would not. Ephemeral media such as Netnews (most news servers throw away articles after one or two weeks) place a premium on gathering and distributing evaluations quickly, while evaluations for 19th century books can be gathered at a more leisurely pace. The last dimension describes the cost structure of the choices people make about the items. Is it very costly to miss [...]

3,993 citations
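The “variants on weighted voting” mentioned above as the aggregation step in GroupLens, PHOAKS, and Siteseer can be illustrated with a hedged sketch: predict one user's missing rating as that user's mean rating plus a similarity-weighted average of other users' deviations. The Pearson similarity, the data layout, and the function names are assumptions chosen for the example, not the formula of any particular system in the section.

```python
import numpy as np

def predict_rating(ratings, target_user, item):
    """Generic collaborative-filtering sketch (not any specific system's formula).

    ratings: (num_users, num_items) array with np.nan for missing values.
    Prediction = target user's mean rating + similarity-weighted average of
    other users' deviations from their own means, for the requested item.
    """
    target = ratings[target_user]
    baseline = np.nanmean(target)
    num, den = 0.0, 0.0
    for u in range(ratings.shape[0]):
        if u == target_user or np.isnan(ratings[u, item]):
            continue
        both = ~np.isnan(target) & ~np.isnan(ratings[u])   # co-rated items
        if both.sum() < 2:
            continue
        sim = np.corrcoef(target[both], ratings[u][both])[0, 1]  # Pearson similarity
        if np.isnan(sim):
            continue
        num += sim * (ratings[u, item] - np.nanmean(ratings[u]))
        den += abs(sim)
    if den == 0.0:
        return baseline
    return baseline + num / den

R = np.array([[5, 4, np.nan, 1],
              [4, 5, 3, 1],
              [1, 2, 4, 5]], dtype=float)
print(predict_rating(R, target_user=0, item=2))  # prediction for the missing cell
```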


Journal ArticleDOI
29 Jun 1997
TL;DR: In this article, the problem of finding quantum error-correcting codes is transformed into one of finding additive codes over the field GF(4) which are self-orthogonal with respect to a trace inner product.
Abstract: The unreasonable effectiveness of quantum computing is founded on coherent quantum superposition or entanglement which allows a large number of calculations to be performed simultaneously. This coherence is lost as a quantum system interacts with its environment. In the present paper the problem of finding quantum-error-correcting codes is transformed into one of finding additive codes over the field GF(4) which are self-orthogonal with respect to a certain trace inner product. Many new codes and new bounds are presented, as well as a table of upper and lower bounds on such codes of length up to 30 qubits.

1,525 citations
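For readers unfamiliar with the construction, the LaTeX fragment below records a standard presentation of the trace inner product on GF(4)^n referred to in the abstract; conventions and details of the correspondence with quantum codes should be checked against the paper itself.

```latex
% Trace inner product on GF(4)^n (standard presentation; the paper's
% conventions may differ slightly).
% GF(4) = \{0, 1, \omega, \omega^2\}, with conjugation \bar{x} = x^2
% and trace \operatorname{Tr}(x) = x + x^2 \in GF(2).
\[
  \langle u, v \rangle
  \;=\; \sum_{i=1}^{n} \operatorname{Tr}\!\left( u_i \, \overline{v_i} \right)
  \;=\; \sum_{i=1}^{n} \left( u_i v_i^{2} + u_i^{2} v_i \right) \;\in\; GF(2).
\]
% An additive code C \subseteq GF(4)^n that is self-orthogonal with respect
% to this form corresponds to a quantum error-correcting code, as described
% in the abstract above.
```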


Journal ArticleDOI
TL;DR: It is shown that if the two-member committee algorithm achieves information gain with positive lower bound, then the prediction error decreases exponentially with the number of queries, and this exponential decrease holds for query learning of perceptrons.
Abstract: We analyze the “query by committee” algorithm, a method for filtering informative queries from a random stream of inputs. We show that if the two-member committee algorithm achieves information gain with positive lower bound, then the prediction error decreases exponentially with the number of queries. We show that, in particular, this exponential decrease holds for query learning of perceptrons.

1,234 citations
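A hedged sketch of the filtering idea analyzed above: maintain a two-member committee trained on the labels gathered so far, and query an incoming point only when the members disagree. The perceptron committee, the random-initialization stand-in for sampling the version space, and the synthetic data stream are illustrative assumptions, not the paper's analytical setting.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_perceptron(X, y, seed, epochs=20):
    """Plain perceptron; random init crudely stands in for sampling the version space."""
    r = np.random.default_rng(seed)
    w = r.normal(size=X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w) <= 0:
                w = w + yi * xi
    return w

true_w = np.array([1.0, -2.0, 0.5])      # unknown target concept (made up)
X_pool, y_pool, queries = [], [], 0
for t in range(500):
    x = rng.normal(size=3)               # random input stream
    if len(X_pool) < 2:
        committee = [rng.normal(size=3), rng.normal(size=3)]
    else:
        X, y = np.array(X_pool), np.array(y_pool)
        committee = [train_perceptron(X, y, seed=t),
                     train_perceptron(X, y, seed=t + 10_000)]
    votes = [np.sign(x @ w) for w in committee]
    if votes[0] != votes[1]:             # committee disagreement -> query the label
        queries += 1
        X_pool.append(x)
        y_pool.append(np.sign(x @ true_w))
print(f"labels queried: {queries} out of 500 stream points")
```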


Journal ArticleDOI
TL;DR: ReferralWeb as mentioned in this paper is an interactive system for reconstructing, visualizing, and searching social networks on the World Wide Web, which is based on the six degrees of separation phenomenon.
Abstract: Part of the success of social networks can be attributed to the “six degrees of separation” phenomenon, which means that the distance between any two individuals in terms of direct personal relationships is relatively small. An equally important factor is that there are limits to the amount and kinds of information a person is able or willing to make available to the public at large. For example, an expert in a particular field is almost certainly unable to write down all he or she knows about the topic, and is likely to be unwilling to make letters of recommendation he or she has written for various people publicly available. Thus, searching for a piece of information in this situation becomes a matter of searching the social network for an expert on the topic together with a chain of personal referrals from the searcher to the expert. The referral chain serves two key functions: It provides a reason for the expert to agree to respond to the requester by making their relationship explicit (for example, they have a mutual collaborator), and it provides a criterion for the searcher to use in evaluating the trustworthiness of the expert. Nonetheless, manually searching for a referral chain can be a frustrating and time-consuming task. One is faced with the trade-off of contacting a large number of individuals at each step, and thus straining both the time and goodwill of the possible respondents, or of contacting a smaller, more focused set, and being more likely to fail to locate an appropriate expert. In response to these problems we are building ReferralWeb, an interactive system for reconstructing, visualizing, and searching social networks on the World-Wide Web. Simulation experiments we ran before we began construction of ReferralWeb showed that automatically generated referrals can be highly [...]

1,094 citations
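The referral-chain search described above is, at its core, a shortest-path problem on an acquaintance graph. The sketch below finds a chain from a searcher to anyone matching an expertise keyword using breadth-first search; the network, the expertise labels, and the function names are invented for illustration and are not ReferralWeb's actual data model.

```python
from collections import deque

# Hypothetical social network: edges are mutual acquaintances;
# expertise maps each person to topic keywords.
network = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "erin"],
    "dave": ["bob"],
    "erin": ["carol"],
}
expertise = {"dave": {"wavelets"}, "erin": {"cryptography"}}

def referral_chain(start, topic):
    """Breadth-first search for the shortest chain of personal referrals
    from `start` to someone whose expertise includes `topic`."""
    seen = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        person = path[-1]
        if topic in expertise.get(person, set()):
            return path
        for friend in network.get(person, []):
            if friend not in seen:
                seen.add(friend)
                queue.append(path + [friend])
    return None

print(referral_chain("alice", "cryptography"))  # ['alice', 'carol', 'erin']
```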


Journal Article
Mehryar Mohri1
TL;DR: This work recalls classical theorems and gives new ones characterizing sequential string-to-string transducers, including algorithms for determinizing and minimizing these transducers very efficiently, and characterizations of the transducers admitting determinization and the corresponding algorithms.
Abstract: Finite-state machines have been used in various domains of natural language processing. We consider here the use of a type of transducer that supports very efficient programs: sequential transducers. We recall classical theorems and give new ones characterizing sequential string-to-string transducers. Transducers that output weights also play an important role in language and speech processing. We give a specific study of string-to-weight transducers, including algorithms for determinizing and minimizing these transducers very efficiently, and characterizations of the transducers admitting determinization and the corresponding algorithms. Some applications of these algorithms in speech recognition are described and illustrated.

1,052 citations
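To make the notion of a sequential string-to-string transducer concrete, here is a hedged sketch of an input-deterministic transducer that emits output incrementally as it reads its input, with a final output function to flush buffered symbols. The transition table and the example are invented; the determinization and minimization algorithms discussed in the paper are not reproduced here.

```python
# A sequential (input-deterministic) string-to-string transducer: at most one
# transition per (state, input symbol), each emitting an output string.
# This toy machine rewrites every "ab" to "x" and copies other symbols.
transitions = {
    ("q0", "a"): ("qa", ""),    # buffer the 'a', emit nothing yet
    ("qa", "a"): ("qa", "a"),   # the previous 'a' was not part of an "ab"
    ("qa", "b"): ("q0", "x"),   # saw "ab": emit the rewrite
    ("q0", "b"): ("q0", "b"),
}
final_output = {"q0": "", "qa": "a"}  # flush any buffered symbol at the end

def run(word, start="q0"):
    state, out = start, []
    for symbol in word:
        if (state, symbol) not in transitions:
            raise ValueError(f"no transition from {state} on {symbol!r}")
        state, emitted = transitions[(state, symbol)]
        out.append(emitted)
    out.append(final_output[state])
    return "".join(out)

print(run("aababba"))  # -> "axxba"
```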


Journal ArticleDOI
TL;DR: In this paper, a group theoretic framework is introduced that simplifies the description of known quantum error-correcting codes and greatly facilitates the construction of new examples, and codes are given which map 3 qubits to 8 qubits correcting 1 error.
Abstract: A group theoretic framework is introduced that simplifies the description of known quantum error-correcting codes and greatly facilitates the construction of new examples. Codes are given which map 3 qubits to 8 qubits correcting 1 error, 4 to 10 qubits correcting 1 error, 1 to 13 qubits correcting 2 errors, and 1 to 29 qubits correcting 5 errors.

774 citations
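For context, the group-theoretic framework referred to here is what is now usually called the stabilizer formalism; the LaTeX below states the standard textbook definition rather than reproducing the paper's own development.

```latex
% Standard stabilizer-code definition (textbook presentation; the paper's
% notation and development may differ).
% Let P_n be the n-qubit Pauli group and S \le P_n an abelian subgroup with
% -I \notin S, generated by n-k independent commuting elements. The code is
% the joint +1 eigenspace
\[
  C(S) \;=\; \left\{\, |\psi\rangle \in (\mathbb{C}^{2})^{\otimes n} \;:\;
                g\,|\psi\rangle = |\psi\rangle \ \ \forall g \in S \,\right\},
\]
% which has dimension 2^k, i.e. it encodes k logical qubits into n physical
% qubits; errors that anticommute with some generator of S are detectable.
```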


Journal ArticleDOI
01 Apr 1997
TL;DR: The superposition of many ON/OFF sources with strictly alternating ON- and OFF-periods can produce aggregate network traffic that exhibits the Joseph Effect, and this mathematical result can be combined with modern high-performance computing capabilities to yield a simple and efficient linear-time algorithm for generating self-similar traffic traces.
Abstract: We state and prove the following key mathematical result in self-similar traffic modeling: the superposition of many ON/OFF sources (also known as packet trains) with strictly alternating ON- and OFF-periods and whose ON-periods or OFF-periods exhibit the Noah Effect (i.e., have high variability or infinite variance) can produce aggregate network traffic that exhibits the Joseph Effect (i.e., is self-similar or long-range dependent). There is, moreover, a simple relation between the parameters describing the intensities of the Noah Effect (high variability) and the Joseph Effect (self-similarity). This provides a simple physical explanation for the presence of self-similar traffic patterns in modern high-speed network traffic that is consistent with traffic measurements at the source level. We illustrate how this mathematical result can be combined with modern high-performance computing capabilities to yield a simple and efficient linear-time algorithm for generating self-similar traffic traces. We also show how to obtain in the limit a Lévy stable motion, that is, a process with stationary and independent increments but with infinite variance marginals. While we have presently no empirical evidence that such a limit is consistent with measured network traffic, the result might prove relevant for some future networking scenarios.

760 citations
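The construction lends itself to a very short simulation: superpose many independent ON/OFF sources whose ON and OFF periods are heavy-tailed, and count how many sources are ON in each time slot. The Pareto period distribution, the parameter values, and the discretization below are illustrative assumptions, not the paper's linear-time generator.

```python
import numpy as np

rng = np.random.default_rng(42)

def pareto_period(alpha):
    """Heavy-tailed period length >= 1 (infinite variance for alpha < 2)."""
    return int(np.ceil(rng.pareto(alpha) + 1.0))

def on_off_source(slots, alpha=1.4):
    """One packet-train source: strictly alternating ON and OFF periods."""
    out = np.zeros(slots, dtype=int)
    t, on = 0, bool(rng.integers(2))
    while t < slots:
        length = pareto_period(alpha)
        if on:
            out[t:t + length] = 1
        t += length
        on = not on
    return out

# Aggregate traffic: number of active sources per time slot.
slots, sources = 10_000, 100
aggregate = sum(on_off_source(slots) for _ in range(sources))
print(aggregate[:10], aggregate.mean())
```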


Journal ArticleDOI
TL;DR: This paper focuses on the task of automatically routing telephone calls based on a user's fluently spoken response to the open-ended prompt of “How may I help you?”.

664 citations


Journal ArticleDOI
TL;DR: The feasibility of automatic recognition of recommendations is supported by empirical results; some resources are recommended by more than one person, and these multiconfirmed recommendations appear to be significant resources for the relevant community.
Abstract: The feasibility of automatic recognition of recommendations is supported by empirical results. First, Usenet messages are a significant source of recommendations of Web resources: 23% of Usenet messages mention Web resources, and a substantial fraction of these mentions are recommendations. Second, recommendation instances can be machine-recognized with nearly 90% accuracy. Third, some resources are recommended by more than one person. These multiconfirmed recommendations appear to be significant resources for the relevant community. Finally, the number of distinct recommenders of a resource is [...]

636 citations


Journal ArticleDOI
TL;DR: This work analyzes algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts, and shows how this leads to certain kinds of pattern recognition/learning algorithms with performance bounds that improve on the best results currently known in this context.
Abstract: We analyze algorithms that predict a binary value by combining the predictions of several prediction strategies, called experts. Our analysis is for worst-case situations, i.e., we make no assumptions about the way the sequence of bits to be predicted is generated. We measure the performance of the algorithm by the difference between the expected number of mistakes it makes on the bit sequence and the expected number of mistakes made by the best expert on this sequence, where the expectation is taken with respect to the randomization in the predictions. We show that the minimum achievable difference is on the order of the square root of the number of mistakes of the best expert, and we give efficient algorithms that achieve this. Our upper and lower bounds have matching leading constants in most cases. We then show how this leads to certain kinds of pattern recognition/learning algorithms with performance bounds that improve on the best results currently known in this context. We also compare our analysis to the case in which log loss is used instead of the expected number of mistakes.
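Stated loosely in LaTeX, the central quantitative claim has the following shape; the exact constants and lower-order terms, and the precise dependence on the number of experts, are in the paper and are not reproduced here.

```latex
% Rough form of the bound discussed above (constants and lower-order terms
% omitted; the logarithmic dependence on the number of experts N is the
% standard form of such statements; see the paper for the exact result).
% If the best of N experts makes L^* mistakes on the sequence, the
% algorithm's expected number of mistakes M satisfies
\[
  M \;\le\; L^{*} \;+\; O\!\left( \sqrt{L^{*} \,\ln N} \right),
\]
% and a matching lower bound shows this square-root gap cannot be improved
% in general.
```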

Proceedings ArticleDOI
07 Jul 1997
TL;DR: Paradise (PARAdigm for DIalogue System Evaluation) as discussed by the authors is a general framework for evaluating spoken dialogue agents, which decouples task requirements from an agent's dialogue behaviors, supports comparisons among dialogue strategies, enables the calculation of performance over subdialogues and whole dialogues, specifies the relative contribution of various factors to performance, and makes it possible to compare agents performing different tasks by normalizing for task complexity.
Abstract: This paper presents PARADISE (PARAdigm for DIalogue System Evaluation), a general framework for evaluating spoken dialogue agents. The framework decouples task requirements from an agent's dialogue behaviors, supports comparisons among dialogue strategies, enables the calculation of performance over subdialogues and whole dialogues, specifies the relative contribution of various factors to performance, and makes it possible to compare agents performing different tasks by normalizing for task complexity.
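As a point of reference, PARADISE is usually summarized by a performance function that trades task success against dialogue costs; the hedged LaTeX below gives the commonly cited form, which should be checked against the paper for the exact normalization and weight estimation.

```latex
% Commonly cited form of the PARADISE performance function (check the paper
% for the exact definitions). \kappa measures task success, the c_i are
% dialogue cost measures (e.g. number of utterances, repair ratio), and
% \mathcal{N} is a normalization (Z-score) applied to each factor.
\[
  \text{performance} \;=\; \alpha \,\mathcal{N}(\kappa) \;-\; \sum_{i} w_i \,\mathcal{N}(c_i)
\]
```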

Proceedings ArticleDOI
01 Apr 1997
TL;DR: New protocols for two parties to exchange documents with fairness are presented, such that no party can gain an advantage by quitting prematurely or otherwise misbehaving, and a third party that is “semi-trusted” is used, in the sense that it may misbehave on its own but will not conspire with either of the main parties.
Abstract: We present new protocols for two parties to exchange documents with fairness, i.e., such that no party can gain an advantage by quitting prematurely or otherwise misbehaving. We use a third party that is “semi-trusted”, in the sense that it may misbehave on its own but will not conspire with either of the main parties. In our solutions, disruption by any one of the three parties will not allow the disrupter to gain any useful new information about the documents. Our solutions are efficient and can be based on any of several cryptographic assumptions (e.g., factoring, discrete log, graph isomorphism). We also discuss the application of our techniques to electronic commerce protocols to achieve fair payment.

Proceedings ArticleDOI
01 Jun 1997
TL;DR: This paper shows how to compress a very large dataset comprising multiple distinct time sequences into a format that supports ad hoc querying, provided that a small error can be tolerated when the data is uncompressed.
Abstract: Ad hoc querying is difficult on very large datasets, since it is usually not possible to have the entire dataset on disk. While compression can be used to decrease the size of the dataset, compressed data is notoriously difficult to index or access. In this paper we consider a very large dataset comprising multiple distinct time sequences. Each point in the sequence is a numerical value. We show how to compress such a dataset into a format that supports ad hoc querying, provided that a small error can be tolerated when the data is uncompressed. Experiments on large, real world datasets (AT&T customer calling patterns) show that the proposed method achieves an average of less than 5% error in any data value after compressing to a mere 2.5% of the original space (i.e., a 40:1 compression ratio), with these numbers not very sensitive to dataset size. Experiments on aggregate queries achieved a 0.5% reconstruction error with a space requirement under 2%.
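One generic way to trade space for bounded reconstruction error on many correlated time sequences is a truncated SVD across sequences, which also answers point queries directly from the compressed representation. The sketch below illustrates that generic idea with made-up data and parameters; it is not the paper's actual compression scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: 500 correlated time sequences of length 365
# (e.g. daily totals per customer), built from a few shared patterns.
patterns = rng.normal(size=(3, 365))
data = rng.normal(size=(500, 3)) @ patterns + 0.05 * rng.normal(size=(500, 365))

# Truncated SVD: keep k components as the compressed representation.
k = 3
U, s, Vt = np.linalg.svd(data, full_matrices=False)
coeffs, basis = U[:, :k] * s[:k], Vt[:k]      # per-sequence coefficients / shared basis

def reconstruct(sequence_id, day):
    """Answer a point query directly from the compressed representation."""
    return float(coeffs[sequence_id] @ basis[:, day])

approx = coeffs @ basis
rel_err = np.abs(approx - data).mean() / np.abs(data).mean()
space = (coeffs.size + basis.size) / data.size
print(f"mean relative error {rel_err:.3%} using {space:.1%} of the original space")
```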

Journal ArticleDOI
Pamela Zave1
TL;DR: This article proposes and justifies a trial classification scheme for requirements engineering, and the scheme has been refined somewhat in response to inadequacies discovered during the process of selecting the program.
Abstract: Requirements engineering is the branch of software engineering concerned with the real-world goals for, functions of, and constraints on software systems. It is also concerned with the relationship of these factors to precise specifications of software behavior, and to their evolution over time and across software families. Of all the areas in which computer scientists do research, requirements engineering is probably the most informal, interdisciplinary, and subjective. Although these qualities are inherent to the topic under investigation, they make scientists and mathematicians uncomfortable. Given these circumstances, a rigorous classification of research efforts in requirements engineering-if comprehensive and intelligible-might have several benefits, including: 1. It would delineate the area and would encourage research coverage of the whole area. 2. It would provide structure that might encourage the discovery and articulation of new principles. 3. It would assist in grouping similar things, such as competing solutions to the same problem. These groupings would be a great help in comparing, extending, and exploiting results. This article proposes and justifies a trial classification scheme. An earlier version was used to organize the papers submitted to this symposium, and the scheme has been refined somewhat in response to inadequacies discovered during the process of selecting the program. It is offered in hopes of stimulating discussion and eventual consensus. The first issue to be tackled is the heterogeneity of the topics usually considered part of requirements engineering. They include tasks that must be completed (elicitation, validation, specification); problems that must be solved (barriers to communication, incompleteness, inconsistency); solutions to problems (formal languages and analysis algorithms, prototyping, metrics, traceability); ways of contributing to knowledge (descriptions of practice, case studies, controlled experiments); and types of systems (embedded systems, safety-critical systems, distributed systems). A list with all these topics is intended to be comprehensive, but its heterogeneity undermines all chance of bringing order to the field. There seems to be a need for several orthogonal dimensions of classification. While multiple dimensions will certainly help us cope with the heterogeneity of concerns, there is a danger of making the classification scheme too complex to use. I have compromised by settling on two dimensions, which are presented separately in the next two sections.

Book ChapterDOI
08 Jan 1997
TL;DR: It is proposed that both data and schema be represented as edge-labeled graphs and notions of conformance between a graph database and a graph schema are developed and it is shown that there is a natural and efficiently computable ordering on graph schemas.
Abstract: We develop a new schema for unstructured data. Traditional schemas resemble the type systems of programming languages. For unstructured data, however, the underlying type may be much less constrained and hence an alternative way of expressing constraints on the data is needed. Here, we propose that both data and schema be represented as edge-labeled graphs. We develop notions of conformance between a graph database and a graph schema and show that there is a natural and efficiently computable ordering on graph schemas. We then examine certain subclasses of schemas and show that schemas are closed under query applications. Finally, we discuss how they may be used in query decomposition and optimization.
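A hedged sketch of the conformance idea: represent both data and schema as edge-labeled graphs, let schema edges carry predicates on labels, and check conformance with a simulation computed as a greatest fixpoint. The graphs, predicates, and names below are invented, and the paper's definitions are more general than this simplified reading.

```python
# Data graph: node -> list of (label, target) edges.
data_edges = {
    "db": [("name", "n1"), ("paper", "p1"), ("paper", "p2")],
    "p1": [("title", "t1"), ("year", "y1")],
    "p2": [("title", "t2")],
    "n1": [], "t1": [], "t2": [], "y1": [],
}
# Schema graph: edges carry predicates on labels rather than constants.
schema_edges = {
    "S": [(lambda l: l in {"name", "paper"}, "T")],
    "T": [(lambda l: True, "U")],
    "U": [],
}

def conforms(data, schema, d_root, s_root):
    """Greatest-fixpoint check that every data edge can be matched by a schema
    edge whose predicate accepts its label, recursively (a simulation of the
    data graph by the schema graph)."""
    sim = {(d, s) for d in data for s in schema}
    changed = True
    while changed:
        changed = False
        for d, s in list(sim):
            for label, d2 in data[d]:
                if not any(pred(label) and (d2, s2) in sim
                           for pred, s2 in schema[s]):
                    sim.discard((d, s))
                    changed = True
                    break
    return (d_root, s_root) in sim

print(conforms(data_edges, schema_edges, "db", "S"))  # True
```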

Proceedings Article
27 Jul 1997
TL;DR: This work presents two statistical measures of the local search process that allow one to quickly find the optimal noise settings, and applies these principles to the problem of evaluating new search heuristics, and discovered two promising new strategies.
Abstract: It is well known that the performance of a stochastic local search procedure depends upon the setting of its noise parameter, and that the optimal setting varies with the problem distribution. It is therefore desirable to develop general principles for tuning the procedures. We present two statistical measures of the local search process that allow one to quickly find the optimal noise settings. These properties are independent of the fine details of the local search strategies, and appear to be relatively independent of the structure of the problem domains. We applied these principles to the problem of evaluating new search heuristics, and discovered two promising new strategies.
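For readers unfamiliar with the noise parameter in question, the sketch below shows where it appears in a generic WalkSAT-style local search step: with probability `noise` flip a random variable in an unsatisfied clause, otherwise flip greedily. The clause representation is a common one, and the paper's statistical measures for tuning the noise are not reproduced.

```python
import random

def walksat(clauses, n_vars, noise=0.5, max_flips=100_000, seed=0):
    """Generic WalkSAT-style search. `noise` is the probability of a random
    walk move inside an unsatisfied clause instead of a greedy flip."""
    rng = random.Random(seed)
    assign = [rng.random() < 0.5 for _ in range(n_vars + 1)]  # index 0 unused

    def satisfied(clause):
        return any(assign[abs(l)] == (l > 0) for l in clause)

    def broken_count(var):
        # clauses that would become unsatisfied if `var` were flipped
        assign[var] = not assign[var]
        count = sum(not satisfied(c) for c in clauses if var in map(abs, c))
        assign[var] = not assign[var]
        return count

    for _ in range(max_flips):
        unsat = [c for c in clauses if not satisfied(c)]
        if not unsat:
            return assign
        clause = rng.choice(unsat)
        if rng.random() < noise:
            var = abs(rng.choice(clause))                          # noisy move
        else:
            var = min((abs(l) for l in clause), key=broken_count)  # greedy move
        assign[var] = not assign[var]
    return None

# Tiny satisfiable instance: (x1 or x2 or x3), (not x1 or x2), (not x2 or x3)
clauses = [(1, 2, 3), (-1, 2), (-2, 3)]
print(walksat(clauses, n_vars=3, noise=0.5))
```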

Journal ArticleDOI
TL;DR: The authors modified an interrupt-driven networking implementation to schedule network interrupt handling carefully; the modification eliminates receive livelock without degrading other aspects of system performance, using polling when the system is heavily loaded while retaining the use of interrupts under lighter load.
Abstract: Most operating systems use interface interrupts to schedule network tasks. Interrupt-driven systems can provide low overhead and good latency at low offered load, but degrade significantly at higher arrival rates unless care is taken to prevent several pathologies. These are various forms of receive livelock, in which the system spends all of its time processing interrupts, to the exclusion of other necessary tasks. Under extreme conditions, no packets are delivered to the user application or the output of the system. To avoid livelock and related problems, an operating system must schedule network interrupt handling as carefully as it schedules process execution. We modified an interrupt-driven networking implementation to do so; this modification eliminates receive livelock without degrading other aspects of system performance. Our modifications include the use of polling when the system is heavily loaded, while retaining the use of interrupts under lighter load. We present measurements demonstrating the success of our approach.
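The scheduling idea can be caricatured in a few lines: service packets from interrupts while load is light, but switch to bounded polling once a backlog threshold is crossed, so that packet processing cannot starve the rest of the system. This is a toy simulation of that policy with made-up thresholds, not the authors' kernel modifications.

```python
from collections import deque

QUEUE_LIMIT = 8     # switch to polling when the backlog reaches this
POLL_BUDGET = 4     # packets processed per polling pass before yielding the CPU

queue = deque()
polling = False
delivered = 0

def on_interrupt(packet):
    """Interrupt handler: do minimal work and defer the rest under load."""
    global polling
    queue.append(packet)
    if not polling and len(queue) >= QUEUE_LIMIT:
        polling = True                      # stand-in for disabling receive interrupts
        print("heavy load: switching to polling")

def poll_pass():
    """Called from the scheduler like any other task: drain a bounded batch,
    then return so user processes and transmit processing still get CPU time."""
    global polling, delivered
    for _ in range(POLL_BUDGET):
        if not queue:
            if polling:
                polling = False             # light load again: back to interrupt mode
                print("light load: back to interrupt mode")
            return
        queue.popleft()
        delivered += 1

# Toy driver: a packet burst followed by scheduler passes.
for i in range(12):
    on_interrupt(i)
while queue:
    poll_pass()
poll_pass()                                  # one idle pass flips back to interrupt mode
print("delivered", delivered, "packets")
```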

Proceedings Article
08 Dec 1997
TL;DR: The potential benefit of a shared proxy-caching server in a large environment is quantified by using traces that were collected at the Internet connection points for two large corporations, representing significant numbers of references.
Abstract: Caching in the World Wide Web is based on two critical assumptions: that a significant fraction of requests reaccess resources that have already been retrieved; and that those resources do not change between accesses. We tested the validity of these assumptions, and their dependence on characteristics of Web resources, including access rate, age at time of reference, content type, resource size, and Internet top-level domain. We also measured the rate at which resources change, and the prevalence of duplicate copies in the Web. We quantified the potential benefit of a shared proxy-caching server in a large environment by using traces that were collected at the Internet connection points for two large corporations, representing significant numbers of references. Only 22% of the resources referenced in the traces we analyzed were accessed more than once, but about half of the references were to those multiply-referenced resources. Of this half, 13% were to a resource that had been modified since the previous traced reference to it. We found that the content type and rate of access have a strong influence on these metrics, the domain has a moderate influence, and size has little effect. In addition, we studied other aspects of the rate of change, including semantic differences such as the insertion or deletion of anchors, phone numbers, and email addresses.

Proceedings ArticleDOI
01 May 1997
TL;DR: A comprehensive suite of measures to quantify the level of class coupling during the design of object-oriented systems takes into account the different OO design mechanisms provided by the C++ language but it can be tailored to other OO languages.
Abstract: This paper proposes a comprehensive suite of measures to quantify the level of class coupling during the design of object-oriented systems. This suite takes into account the different OO design mechanisms provided by the C++ language (e.g., friendship between classes, specialization, and aggregation) but it can be tailored to other OO languages. The different measures in our suite thus reflect different hypotheses about the different mechanisms of coupling in OO systems. Based on actual project defect data, the hypotheses underlying our coupling measures are empirically validated by analyzing their relationship with the probability of fault detection across classes. The results demonstrate that some of these coupling measures may be useful early quality indicators of the design of OO systems. These measures are conceptually different from the OO design measures defined by Chidamber and Kemerer; in addition, our data suggests that they are complementary quality indicators.

Proceedings ArticleDOI
01 Oct 1997
TL;DR: It is shown that delta encoding can provide remarkable improvements in response size and response delay for an important subset of HTTP content types, and that the combination of delta encoding and data compression yields the best results.
Abstract: Caching in the World Wide Web currently follows a naive model, which assumes that resources are referenced many times between changes. The model also provides no way to update a cache entry if a resource does change, except by transferring the resource's entire new value. Several previous papers have proposed updating cache entries by transferring only the differences, or "delta," between the cached entry and the current value. In this paper, we make use of dynamic traces of the full contents of HTTP messages to quantify the potential benefits of delta-encoded responses. We show that delta encoding can provide remarkable improvements in response size and response delay for an important subset of HTTP content types. We also show the added benefit of data compression, and that the combination of delta encoding and data compression yields the best results. We propose specific extensions to the HTTP protocol for delta encoding and data compression. These extensions are compatible with existing implementations and specifications, yet allow efficient use of a variety of encoding techniques.
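The mechanism is easy to demonstrate end to end with standard library tools: compute a delta between the cached and current response bodies, compress it, and compare sizes. The unified-diff delta format and the example payload below are illustrative choices; the HTTP extensions proposed in the paper are not shown.

```python
import difflib
import zlib

# Hypothetical cached response body and a slightly changed current version.
lines = ["<html><body><h1>Stock quotes</h1>"]
lines += [f"<p>TICKER{i}: {100 + i}</p>" for i in range(200)]
lines += ["</body></html>"]
cached = "\n".join(lines)
current = cached.replace("TICKER7: 107", "TICKER7: 110") \
                .replace("TICKER42: 142", "TICKER42: 145")

# Delta: a unified diff of the two bodies (one of many possible delta formats).
delta = "\n".join(difflib.unified_diff(
    cached.splitlines(), current.splitlines(), lineterm=""))

print("full response:      ", len(current.encode()), "bytes")
print("compressed response:", len(zlib.compress(current.encode())), "bytes")
print("delta:              ", len(delta.encode()), "bytes")
print("compressed delta:   ", len(zlib.compress(delta.encode())), "bytes")
```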

Proceedings ArticleDOI
26 Oct 1997
TL;DR: This work presents an approach to build integer to integer wavelet transforms based upon the idea of factoring wavelet transformations into lifting steps, which allows the construction of an integer version of every wavelet transform.
Abstract: Invertible wavelet transforms that map integers to integers are important for lossless representations. We present an approach to build integer to integer wavelet transforms based upon the idea of factoring wavelet transforms into lifting steps. This allows the construction of an integer version of every wavelet transform. We demonstrate the use of these transforms in lossless image compression.
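A one-level integer Haar transform built from lifting steps illustrates the idea: each step maps integers to integers and is exactly invertible, so no information is lost. The Haar wavelet (the S-transform) is chosen here for brevity; the paper constructs integer versions of much more general wavelet filters.

```python
def forward_haar_lifting(x):
    """One-level integer-to-integer Haar transform via lifting.
    x has even length; returns (approximation s, detail d), both integer."""
    even, odd = x[0::2], x[1::2]
    d = [o - e for e, o in zip(even, odd)]          # predict step
    s = [e + (di >> 1) for e, di in zip(even, d)]   # update step (floor of d/2)
    return s, d

def inverse_haar_lifting(s, d):
    """Exact inverse: undo the lifting steps in reverse order."""
    even = [si - (di >> 1) for si, di in zip(s, d)]
    odd = [di + e for e, di in zip(even, d)]
    x = [0] * (2 * len(s))
    x[0::2], x[1::2] = even, odd
    return x

x = [12, 10, 7, 7, 3, 0, -2, 5]
s, d = forward_haar_lifting(x)
print(s, d)                               # integer approximation and detail signals
assert inverse_haar_lifting(s, d) == x    # lossless round trip
```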

Proceedings ArticleDOI
04 May 1997
TL;DR: It is shown how to transform algorithms that assume that all experts are always awake to algorithms that do not require this assumption, and how to derive corresponding loss bounds.
Abstract: We study online learning algorithms that predict by combining the predictions of several subordinate prediction algorithms, sometimes called "experts." These simple algorithms belong to the multiplicative weights family of algorithms. The performance of these algorithms degrades only logarithmically with the number of experts, making them particularly useful in applications where the number of experts is very large. However, in applications such as text categorization, it is often natural for some of the experts to abstain from making predictions on some of the instances. We show how to transform algorithms that assume that all experts are always awake to algorithms that do not require this assumption. We also show how to derive corresponding loss bounds. Our method is very general, and can be applied to a large family of online learning algorithms. We also give applications to various prediction models including decision graphs and "switching" experts.
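The transformation can be sketched as follows: in each round only the "awake" experts vote and are updated, and their updated weights are rescaled so the total weight of the awake set is preserved, leaving sleeping experts untouched. This is a schematic rendering of that idea for a simple multiplicative-weights predictor with made-up losses; the paper's precise algorithms and loss bounds differ.

```python
import numpy as np

def sleeping_experts_round(w, awake, losses, beta=0.8):
    """One round of a multiplicative-weights update restricted to awake experts.

    w: current weights; awake: boolean mask of experts predicting this round;
    losses: per-expert losses in [0, 1] (ignored for sleeping experts).
    The awake experts' total weight is preserved, so abstaining costs nothing.
    """
    p = np.where(awake, w, 0.0)
    mixture_loss = p @ losses / p.sum()            # expected loss of the awake mixture
    new_awake = np.where(awake, w * beta ** losses, 0.0)
    scale = (w * awake).sum() / new_awake.sum()    # renormalize the awake mass
    w = np.where(awake, new_awake * scale, w)      # sleeping experts unchanged
    return w, mixture_loss

w = np.ones(4)
awake = np.array([True, True, False, True])
losses = np.array([0.9, 0.1, 0.0, 0.5])
w, mixture_loss = sleeping_experts_round(w, awake, losses)
print(w, mixture_loss)   # expert 2 slept, so its weight is untouched
```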

Journal ArticleDOI
01 Sep 1997
TL;DR: The need for trust management in Web applications is fleshed out, the design philosophy of the REFEREE trust management system is explained, and a prototype implementation of REFEREE, a system for writing policies about policies, is described.
Abstract: Digital signatures provide a mechanism for guaranteeing integrity and authenticity of Web content but not more general notions of security or trust. Web-aware applications must permit users to state clearly their own security policies and, of course, must provide the cryptographic tools for manipulating digital signatures. This paper describes the REFEREE trust management system for Web applications; REFEREE provides both a general policy-evaluation mechanism for Web clients and servers and a language for specifying trust policies. REFEREE places all trust decisions under explicit policy control; in the REFEREE model, every action, including evaluation of compliance with policy, happens under the control of some policy. That is, REFEREE is a system for writing policies about policies, as well as policies about cryptographic keys, PICS label bureaus, certification authorities, trust delegation, or anything else. In this paper, we flesh out the need for trust management in Web applications, explain the design philosophy of the REFEREE trust management system, and describe a prototype implementation of REFEREE.

Journal ArticleDOI
TL;DR: Comparisons with outdoor experimental data collected in Manhattan and Boston show that the computer-based propagation tool can predict signal strengths in these environments with very good accuracy, showing that simulations, rather than costly field measurements, can lead to accurate determination of the coverage area for a given system design.
Abstract: Engineers designing and installing outdoor and indoor wireless communications systems need effective and practical tools to help them determine base station antenna locations for adequate signal coverage. Computer-based radio propagation prediction tools are now often used in designing these systems. We assess the performance of such a propagation tool based on ray-tracing and advanced computational methods. We have compared its predictions with outdoor experimental data collected in Manhattan and Boston (at 900 MHz and 2 GHz). The comparisons show that the computer-based propagation tool can predict signal strengths in these environments with very good accuracy. The prediction errors are within 6 dB in both mean and standard deviation. This shows that simulations, rather than costly field measurements, can lead to accurate determination of the coverage area for a given system design.

Proceedings Article
27 Jul 1997
TL;DR: P-CLASSIC is presented, a probabilistic version of the description logic CLASSIC that combines description logic with Bayesian networks, and it is shown that the complexity of the inference algorithm is the best that can be hoped for in a language that combines description logic with Bayesian networks.
Abstract: Knowledge representation languages invariably reflect a trade-off between expressivity and tractability. Evidence suggests that the compromise chosen by description logics is a particularly successful one. However, description logic (as for all variants of first-order logic) is severely limited in its ability to express uncertainty. In this paper, we present P-CLASSIC, a probabilistic version of the description logic CLASSIC. In addition to terminological knowledge, the language utilizes Bayesian networks to express uncertainty about the basic properties of an individual, the number of fillers for its roles, and the properties of these fillers. We provide a semantics for P-CLASSIC and an effective inference procedure for probabilistic subsumption: computing the probability that a random individual in class C is also in class D. The effectiveness of the algorithm relies on independence assumptions and on our ability to execute lifted inference: reasoning about similar individuals as a group rather than as separate ground terms. We show that the complexity of the inference algorithm is the best that can be hoped for in a language that combines description logic with Bayesian networks. In particular, if we restrict to Bayesian networks that support polynomial time inference, the complexity of our inference procedure is also polynomial time.

Proceedings ArticleDOI
01 Jul 1997
TL;DR: This article proves sanity-check bounds for the error of the leave-one-out cross-validation estimate of the generalization error: that is, bounds showing that the worst-case error of this estimate is not much worse than that of the training error estimate.
Abstract: In this article we prove sanity-check bounds for the error of the leave-one-out cross-validation estimate of the generalization error: that is, bounds showing that the worst-case error of this estimate is not much worse than that of the training error estimate. The name sanity check refers to the fact that although we often expect the leave-one-out estimate to perform considerably better than the training error estimate, we are here only seeking assurance that its performance will not be considerably worse. Perhaps surprisingly, such assurance has been given only for limited cases in the prior literature on cross-validation. Any nontrivial bound on the error of leave-one-out must rely on some notion of algorithmic stability. Previous bounds relied on the rather strong notion of hypothesis stability, whose application was primarily limited to nearest-neighbor and other local algorithms. Here we introduce the new and weaker notion of error stability and apply it to obtain sanity-check bounds for leave-one-out.
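For reference, the estimator being bounded is the standard leave-one-out cross-validation error; the LaTeX below records its usual definition (the paper's error stability condition and the resulting bounds are not reproduced here).

```latex
% Standard definition of the leave-one-out estimate.
% Given a sample S = ((x_1,y_1),\dots,(x_m,y_m)) and a learning algorithm A,
% let h_{S^{\setminus i}} = A\bigl(S \setminus \{(x_i, y_i)\}\bigr). Then
\[
  \widehat{\mathrm{err}}_{\mathrm{loo}}(A, S)
  \;=\; \frac{1}{m} \sum_{i=1}^{m}
        \mathbf{1}\!\left[\, h_{S^{\setminus i}}(x_i) \neq y_i \,\right].
\]
```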

Journal ArticleDOI
TL;DR: The first part of this paper presents a method for empirically validating multi-utterance units referred to as discourse segments, and reports highly significant results of segmentations performed by naive subjects, where a commonsense notion of speaker intention is the segmentation criterion.
Abstract: The need to model the relation between discourse structure and linguistic features of utterances is almost universally acknowledged in the literature on discourse. However, there is only weak consensus on what the units of discourse structure are, or the criteria for recognizing and generating them. We present quantitative results of a two-part study using a corpus of spontaneous, narrative monologues. The first part of our paper presents a method for empirically validating multi-utterance units referred to as discourse segments. We report highly significant results of segmentations performed by naive subjects, where a commonsense notion of speaker intention is the segmentation criterion. In the second part of our study, data abstracted from the subjects' segmentations serve as a target for evaluating two sets of algorithms that use utterance features to perform segmentation. On the first algorithm set, we evaluate and compare the correlation of discourse segmentation with three types of linguistic cues (referential noun phrases, cue words, and pauses). We then develop a second set using two methods: error analysis and machine learning. Testing the new algorithms on a new data set shows that when multiple sources of linguistic knowledge are used concurrently, algorithm performance improves.

Proceedings ArticleDOI
14 Dec 1997
TL;DR: Using stochastic modeling of real users the authors can both debug and evaluate a speech dialogue system while it is still in the lab, thus substantially reducing the amount of field testing with real users.
Abstract: Automatic speech dialogue systems are becoming common. In order to assess their performance, a large sample of real dialogues has to be collected and evaluated. This process is expensive, labor intensive, and prone to errors. To alleviate this situation we propose a user simulation to conduct dialogues with the system under investigation. Using stochastic modeling of real users we can both debug and evaluate a speech dialogue system while it is still in the lab, thus substantially reducing the amount of field testing with real users.
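A hedged sketch of the simulation idea: model the user as a probability distribution over responses conditioned on the system's prompt, and drive the dialogue system with samples from that model over many simulated dialogues. The dialogue states, action names, probabilities, and the stub dialogue manager below are invented for illustration; in practice such a model would be estimated from a corpus of real dialogues.

```python
import random

rng = random.Random(7)

# Hypothetical stochastic user model: P(user action | system prompt).
user_model = {
    "ask_destination": [("give_destination", 0.8), ("ask_repeat", 0.15), ("hang_up", 0.05)],
    "confirm_destination": [("yes", 0.7), ("no", 0.2), ("ask_repeat", 0.1)],
}

def simulated_user(prompt):
    actions, weights = zip(*user_model[prompt])
    return rng.choices(actions, weights=weights)[0]

def run_dialogue(max_turns=10):
    """Drive a stub dialogue manager with the simulated user."""
    prompt, turns = "ask_destination", 0
    while turns < max_turns:
        action = simulated_user(prompt)
        turns += 1
        if action == "hang_up":
            return "failure"
        if prompt == "ask_destination" and action == "give_destination":
            prompt = "confirm_destination"
        elif prompt == "confirm_destination" and action == "yes":
            return "success"
    return "timeout"

outcomes = [run_dialogue() for _ in range(1000)]
print({o: outcomes.count(o) for o in set(outcomes)})  # simulated evaluation statistics
```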