Journal ArticleDOI

Density Estimation for Statistics and Data Analysis

01 Oct 1987 · The Statistician (John Wiley & Sons, Ltd) · Vol. 36, Iss. 4, pp. 420-421
About: This article was published in The Statistician on 1987-10-01 and has received 5,674 citations to date. It focuses on the topic of density estimation.
Citations
Book
01 Jan 1995
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Abstract: From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, principal algorithms for error function minimization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.

19,056 citations

Journal ArticleDOI
01 Aug 1998
TL;DR: It will be shown that probabilistic methods can be used to predict topic changes in the context of the task of new event detection and provide further proof of concept for the use of language models for retrieval tasks.
Abstract: In today's world, there is no shortage of information. However, for a specific information need, only a small subset of all of the available information will be useful. The field of information retrieval (IR) is the study of methods to provide users with that small subset of information relevant to their needs and to do so in a timely fashion. Information sources can take many forms, but this thesis will focus on text-based information systems and investigate problems germane to the retrieval of written natural language documents. Central to these problems is the notion of "topic." In other words, what are documents about? However, topics depend on the semantics of documents, and retrieval systems are not endowed with knowledge of the semantics of natural language. The approach taken in this thesis will be to make use of probabilistic language models to investigate text-based information retrieval and related problems. One such problem is the prediction of topic shifts in text, the topic segmentation problem. It will be shown that probabilistic methods can be used to predict topic changes in the context of the task of new event detection. Two complementary sets of features are studied individually and then combined into a single language model. The language modeling approach allows this problem to be approached in a principled way without complex semantic modeling. Next, the problem of document retrieval in response to a user query will be investigated. Models of document indexing and document retrieval have been extensively studied over the past three decades. The integration of these two classes of models has been the goal of several researchers, but it is a very difficult problem. Much of the reason for this is that the indexing component requires inferences as to the semantics of documents. Instead, an approach to retrieval based on probabilistic language modeling will be presented. Models are estimated for each document individually. The approach to modeling is non-parametric and integrates the entire retrieval process into a single model. One advantage of this approach is that collection statistics, which are used heuristically for the assignment of concept probabilities in other probabilistic models, are used directly in the estimation of language model probabilities in this approach. The language modeling approach has been implemented and tested empirically and performs very well on standard test collections and query sets. In order to improve retrieval effectiveness, IR systems use additional techniques such as relevance feedback, unsupervised query expansion and structured queries. These and other techniques are discussed in terms of the language modeling approach and empirical results are given for several of the techniques developed. These results provide further proof of concept for the use of language models for retrieval tasks.
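The core scoring rule of this query-likelihood approach is compact enough to sketch. The snippet below is an illustrative stand-in, not the thesis's exact estimator (which is non-parametric and differs in its smoothing): it ranks a document by the log-probability of the query under the document's language model, with collection statistics entering the estimate directly. The name score_document and the mixing weight lam are assumptions for illustration.

```python
import math
from collections import Counter

def score_document(query_terms, doc_tokens, collection_counts, collection_len, lam=0.5):
    """Log-probability of the query under a smoothed document language model
    (illustrative linear smoothing; the thesis's estimator differs in detail)."""
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for term in query_terms:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        # Collection statistics are used directly in the probability estimate,
        # not as a heuristic weighting; assumes each query term occurs somewhere
        # in the collection so p_coll > 0.
        p_coll = collection_counts[term] / collection_len
        score += math.log(lam * p_doc + (1.0 - lam) * p_coll)
    return score
```

Documents are then ranked by this score for a given query; a higher log-probability means the document is more likely relevant under the model.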

2,736 citations


Cites methods from "Density Estimation for Statistics a..."

  • ...Rather than making parametric assumptions, as is done in the 2-Poisson model, where it is assumed that terms follow a mixture of two Poisson distributions, as Silverman said, "the data will be allowed to speak for themselves" [16]....

    [...]
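For contrast, the parametric alternative the excerpt sets aside, a mixture of two Poisson distributions, can be fit by EM in a few lines. This is a generic, hypothetical sketch, not code from the citing thesis; the initialization and iteration count are arbitrary choices.

```python
import numpy as np

def fit_two_poisson(counts, iters=200):
    """EM for a two-component Poisson mixture: pi*Pois(lam1) + (1-pi)*Pois(lam2)."""
    x = np.asarray(counts, dtype=float)
    pi, lam1, lam2 = 0.5, x.mean() * 0.5 + 0.1, x.mean() * 1.5 + 0.1
    for _ in range(iters):
        # E-step: responsibility of component 1; the log(x!) term cancels
        # between the two components, so it is omitted.
        logp1 = np.log(pi) + x * np.log(lam1) - lam1
        logp2 = np.log(1.0 - pi) + x * np.log(lam2) - lam2
        r = 1.0 / (1.0 + np.exp(logp2 - logp1))
        # M-step: mixing weight and rates as responsibility-weighted means.
        pi = float(np.clip(r.mean(), 1e-6, 1.0 - 1e-6))
        lam1 = (r * x).sum() / r.sum()
        lam2 = ((1.0 - r) * x).sum() / (1.0 - r).sum()
    return pi, lam1, lam2
```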

Journal ArticleDOI
TL;DR: The results show the importance of taking characteristics of several regions of the recorded electropherogram into account in order to get a robust and reliable prediction of RNA integrity, especially if compared to traditional methods.
Abstract: The integrity of RNA molecules is of paramount importance for experiments that try to reflect the snapshot of gene expression at the moment of RNA extraction. Until recently, there has been no reliable standard for estimating the integrity of RNA samples and the ratio of 28S:18S ribosomal RNA, the common measure for this purpose, has been shown to be inconsistent. The advent of microcapillary electrophoretic RNA separation provides the basis for an automated high-throughput approach, in order to estimate the integrity of RNA samples in an unambiguous way. A method is introduced that automatically selects features from signal measurements and constructs regression models based on a Bayesian learning technique. Feature spaces of different dimensionality are compared in the Bayesian framework, which allows selecting a final feature combination corresponding to models with high posterior probability. This approach is applied to a large collection of electrophoretic RNA measurements recorded with an Agilent 2100 bioanalyzer to extract an algorithm that describes RNA integrity. The resulting algorithm is a user-independent, automated and reliable procedure for standardization of RNA quality control that allows the calculation of an RNA integrity number (RIN). Our results show the importance of taking characteristics of several regions of the recorded electropherogram into account in order to get a robust and reliable prediction of RNA integrity, especially if compared to traditional methods.
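The paper's central move, comparing feature spaces of different dimensionality by posterior model probability, can be illustrated with a deliberately simplified stand-in: scoring feature subsets of a linear model by BIC, which approximates the log posterior model probability under a flat model prior. The function names and the use of BIC in place of the paper's full Bayesian learning technique are assumptions for illustration.

```python
import itertools
import numpy as np

def bic_linear(X, y):
    """BIC of an ordinary least-squares fit with Gaussian noise (lower is better)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = (resid @ resid) / n
    return n * np.log(sigma2) + (k + 1) * np.log(n)

def best_feature_subset(X, y, max_size=3):
    """Exhaustively score small feature subsets; low BIC ~ high posterior probability."""
    best_score, best_idx = np.inf, None
    for size in range(1, max_size + 1):
        for idx in itertools.combinations(range(X.shape[1]), size):
            score = bic_linear(X[:, list(idx)], y)
            if score < best_score:
                best_score, best_idx = score, idx
    return best_idx, best_score
```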

2,406 citations

Book
01 Nov 1989
TL;DR: This book develops Bayesian forecasting around the Dynamic Linear Model (DLM), introducing it through the first-order polynomial and dynamic regression special cases and extending it to seasonal, multi-process, non-linear, and multivariate models.
Abstract: Contents: Introduction to the DLM: The First-Order Polynomial Model; Introduction to the DLM: The Dynamic Regression Model; The Dynamic Linear Model; Univariate Time Series DLM Theory; Model Specification and Design; Polynomial Trend Models; Seasonal Models; Regression, Autoregression, and Related Models; Illustrations and Extensions of Standard DLMs; Intervention and Monitoring; Multi-Process Models; Non-Linear Dynamic Models: Analytic and Numerical Approximations; Exponential Family Dynamic Models; Simulation-Based Methods in Dynamic Models; Multivariate Modelling and Forecasting; Distribution Theory and Linear Algebra.
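The book's entry point, the first-order polynomial DLM (a local level model), has a closed-form filtering recursion. Below is a minimal sketch using the standard recurrences R_t = C_{t-1} + W, Q_t = R_t + V, A_t = R_t / Q_t; the function name and default prior are illustrative choices.

```python
def filter_first_order_dlm(ys, v, w, m0=0.0, c0=1e6):
    """Forward filtering for the first-order polynomial DLM:
        y_t  = mu_t + nu_t,         nu_t    ~ N(0, v)
        mu_t = mu_{t-1} + omega_t,  omega_t ~ N(0, w)
    Returns the posterior mean and variance (m_t, C_t) at each time."""
    m, c = m0, c0
    history = []
    for y in ys:
        r = c + w            # prior variance:        R_t = C_{t-1} + W
        q = r + v            # forecast variance:     Q_t = R_t + V
        a = r / q            # adaptive coefficient:  A_t = R_t / Q_t
        m = m + a * (y - m)  # posterior mean:        m_t = m_{t-1} + A_t * e_t
        c = a * v            # posterior variance:    C_t = A_t * V
        history.append((m, c))
    return history
```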

2,129 citations


Cites background or methods from "Density Estimation for Statistics a..."

  • ...Conventional density estimation techniques (Silverman 1986) choose the window width h as a slowly decreasing function of n, so that the kernel components are naturally more concentrated about the locations θj for larger sample sizes....

    [...]

  • ...Useful background on posterior simulation appears in Bernardo and Smith (1994, Section 5.5), and Gelman, Carlin, Stern and Rubin (1995), Chapters 10 and 11....

    [...]

  • ...through Woodward and Goldsmith (1964), and of British Nylon Spinners, later to become part of ICI, through Ewan and Kemp (1960)....

    [...]
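The bandwidth behaviour described in the first excerpt above is exactly what Silverman's (1986) rule of thumb delivers: the window width shrinks like n^(-1/5), so kernel components concentrate around the data as the sample grows. A minimal sketch of a Gaussian kernel density estimate using that rule (the rule is from the book; the function names are illustrative):

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule-of-thumb window width:
    h = 0.9 * min(std, IQR / 1.34) * n**(-1/5), a slowly decreasing function of n."""
    x = np.asarray(x, dtype=float)
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    scale = min(x.std(ddof=1), iqr / 1.34)
    return 0.9 * scale * x.size ** (-0.2)

def kde(grid, x, h=None):
    """Gaussian kernel density estimate of the sample x, evaluated at the grid points."""
    grid = np.asarray(grid, dtype=float)
    x = np.asarray(x, dtype=float)
    h = silverman_bandwidth(x) if h is None else h
    z = (grid[:, None] - x[None, :]) / h
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (x.size * h * np.sqrt(2.0 * np.pi))
```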

Proceedings ArticleDOI
20 Aug 2006
TL;DR: This work presents a method for "compressing" large, complex ensembles into smaller, faster models, usually without significant loss in performance.
Abstract: Often the best performing supervised learning models are ensembles of hundreds or thousands of base-level classifiers. Unfortunately, the space required to store this many classifiers, and the time required to execute them at run-time, prohibits their use in applications where test sets are large (e.g. Google), where storage space is at a premium (e.g. PDAs), and where computational power is limited (e.g. hearing aids). We present a method for "compressing" large, complex ensembles into smaller, faster models, usually without significant loss in performance.
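Schematically, the method labels a large pool of unlabeled (possibly synthetic) inputs with the ensemble's predicted probabilities and then trains one small model to reproduce those outputs. The sketch below uses scikit-learn estimators as stand-ins; the paper itself compresses into neural networks and generates pseudo-data with schemes such as MUNGE.

```python
from sklearn.neural_network import MLPRegressor

def compress(ensemble, unlabeled_X):
    """Train a small 'student' model to mimic a large ensemble's soft predictions."""
    soft_labels = ensemble.predict_proba(unlabeled_X)[:, 1]  # teacher's P(class = 1)
    student = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500)
    student.fit(unlabeled_X, soft_labels)  # fit to teacher outputs, not true labels
    return student
```

At run time only the student is kept, trading a small loss in accuracy for orders of magnitude less storage and compute.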

2,091 citations