
Showing papers on "Pointwise mutual information" published in 2002


Posted Content
TL;DR: This article presents a simple unsupervised learning algorithm, called PMI-IR, for recognizing synonyms based on statistical data acquired by querying a web search engine; it uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words.
Abstract: This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words. PMI-IR is empirically evaluated using 80 synonym test questions from the Test of English as a Foreign Language (TOEFL) and 50 synonym test questions from a collection of tests for students of English as a Second Language (ESL). On both tests, the algorithm obtains a score of 74%. PMI-IR is contrasted with Latent Semantic Analysis (LSA), which achieves a score of 64% on the same 80 TOEFL questions. The paper discusses potential applications of the new unsupervised learning algorithm and some implications of the results for LSA and LSI (Latent Semantic Indexing).

1,303 citations
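
The core of the PMI-IR approach is a pointwise mutual information score estimated from search-engine hit counts. Below is a minimal sketch of that scoring idea, assuming a hypothetical hit_count function standing in for the search-engine query interface; the paper's actual query operators (such as NEAR) and score variants are not reproduced here.

```python
import math

def hit_count(query: str) -> int:
    """Hypothetical stand-in for the number of documents a search engine
    returns for `query`; a real implementation would call a search API."""
    raise NotImplementedError

def pmi_score(word: str, choice: str, total_docs: float = 1e9) -> float:
    """Estimate pointwise mutual information between two words from
    document co-occurrence counts:
        PMI(w1, w2) = log2( p(w1, w2) / (p(w1) * p(w2)) )
    where each probability is approximated by hit count / total_docs.
    """
    p_joint = hit_count(f"{word} AND {choice}") / total_docs
    p_word = hit_count(word) / total_docs
    p_choice = hit_count(choice) / total_docs
    if p_joint == 0 or p_word == 0 or p_choice == 0:
        return float("-inf")
    return math.log2(p_joint / (p_word * p_choice))

def best_synonym(problem_word: str, choices: list[str]) -> str:
    """Pick the choice with the highest PMI score, as in a TOEFL-style
    synonym question."""
    return max(choices, key=lambda c: pmi_score(problem_word, c))
```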


Journal ArticleDOI
01 Oct 2002
TL;DR: The findings show that the algorithms used so far may be quite substantially improved upon; in particular, when dealing with small datasets, finite sample effects and other sources of potentially misleading results have to be taken into account.
Abstract: Motivation: Clustering co-expressed genes usually requires the definition of ‘distance’ or ‘similarity’ between measured datasets, the most common choices being Pearson correlation or Euclidean distance. With the size of available datasets steadily increasing, it has become feasible to consider other, more general, definitions as well. One alternative, based on information theory, is the mutual information, providing a general measure of dependencies between variables. While the use of mutual information in cluster analysis and visualization of large-scale gene expression data has been suggested previously, the earlier studies did not focus on comparing different algorithms to estimate the mutual information from finite data. Results: Here we describe and review several approaches to estimate the mutual information from finite datasets. Our findings show that the algorithms used so far may be quite substantially improved upon. In particular when dealing with small datasets, finite sample effects and other sources of potentially misleading results have to be taken into account.

764 citations
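
For context, the simplest estimator in this setting is the naive histogram ("plug-in") estimate obtained by binning the two expression profiles and applying the discrete mutual information formula. A minimal sketch is below, with an optional Miller-Madow bias correction shown as one example of a finite-sample adjustment; which estimators and corrections the paper ultimately favours is not restated here.

```python
import numpy as np

def mutual_information_hist(x, y, bins=10, miller_madow=False):
    """Naive plug-in estimate of I(X;Y) in nats from two 1-D samples.

    Both samples are discretized into equal-width bins, and the discrete
    formula I = H(X) + H(Y) - H(X,Y) is applied to the empirical
    histogram.  With miller_madow=True, each entropy term receives the
    first-order Miller-Madow bias correction (K - 1) / (2N), where K is
    the number of occupied bins and N the sample size.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / n
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    mi = entropy(p_x) + entropy(p_y) - entropy(p_xy)
    if miller_madow:
        k_x = np.count_nonzero(p_x)
        k_y = np.count_nonzero(p_y)
        k_xy = np.count_nonzero(p_xy)
        mi += (k_x - 1) / (2 * n) + (k_y - 1) / (2 * n) - (k_xy - 1) / (2 * n)
    return max(mi, 0.0)

# Example: two correlated profiles give a clearly positive MI estimate.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = x + rng.normal(scale=0.5, size=200)
print(mutual_information_hist(x, y, bins=8, miller_madow=True))
```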


Posted Content
TL;DR: This paper introduces a simple algorithm for unsupervised learning of semantic orientation from extremely large corpora by issuing queries to a Web search engine and using pointwise mutual information to analyse the results.
Abstract: The evaluative character of a word is called its semantic orientation. A positive semantic orientation implies desirability (e.g., "honest", "intrepid") and a negative semantic orientation implies undesirability (e.g., "disturbing", "superfluous"). This paper introduces a simple algorithm for unsupervised learning of semantic orientation from extremely large corpora. The method involves issuing queries to a Web search engine and using pointwise mutual information to analyse the results. The algorithm is empirically evaluated using a training corpus of approximately one hundred billion words — the subset of the Web that is indexed by the chosen search engine. Tested with 3,596 words (1,614 positive and 1,982 negative), the algorithm attains an accuracy of 80%. The 3,596 test words include adjectives, adverbs, nouns, and verbs. The accuracy is comparable with the results achieved by Hatzivassiloglou and McKeown (1997), using a complex four-stage supervised learning algorithm that is restricted to determining the semantic orientation of adjectives.

375 citations
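
The semantic-orientation score contrasts a word's association with positive versus negative reference words, with each association measured by PMI over search-engine counts. The sketch below uses "excellent" and "poor" as illustrative seed words and a hypothetical hit_count function; the paper's actual query operators and seed choices are not reproduced here.

```python
import math

def hit_count(query: str) -> int:
    """Hypothetical stand-in for a search engine's document count for `query`."""
    raise NotImplementedError

def pmi(a: str, b: str) -> float:
    """log2 of hits(a NEAR b) * N / (hits(a) * hits(b)).  The corpus size N
    cancels out of the orientation difference below, so a rough constant
    is sufficient here."""
    n = 1e11  # rough corpus size; cancels in semantic_orientation()
    joint = hit_count(f"{a} NEAR {b}")
    if joint == 0:
        return float("-inf")
    return math.log2(joint * n / (hit_count(a) * hit_count(b)))

def semantic_orientation(word: str,
                         pos_seed: str = "excellent",
                         neg_seed: str = "poor") -> float:
    """Positive values suggest positive orientation, negative values
    negative orientation.  The seed words are illustrative choices."""
    return pmi(word, pos_seed) - pmi(word, neg_seed)
```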


Proceedings Article
01 Aug 2002
TL;DR: In this article, the distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution, is analyzed for the problem of selecting features for incremental learning and classification of the naive Bayes classifier.
Abstract: Mutual information is widely used in artificial intelligence, in a descriptive way, to measure the stochastic dependence of discrete random variables. In order to address questions such as the reliability of the empirical value, one must consider sample-to-population inferential approaches. This paper deals with the distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean and an analytical approximation of the variance are reported. Asymptotic approximations of the distribution are proposed. The results are applied to the problem of selecting features for incremental learning and classification of the naive Bayes classifier. A fast, newly defined method is shown to outperform the traditional approach based on empirical mutual information on a number of real data sets. Finally, a theoretical development is reported that allows one to efficiently extend the above methods to incomplete samples in an easy and effective way.

116 citations
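
The paper derives analytical expressions and approximations for the moments of this distribution. As a concrete (if brute-force) point of comparison, the same distribution can be approximated by Monte Carlo: sample joint cell probabilities from the Dirichlet posterior of a contingency table and evaluate the mutual information of each draw. The sketch below uses a symmetric Dirichlet prior as an assumption and is not the authors' analytical method.

```python
import numpy as np

def mi_of_joint(p):
    """Mutual information (in nats) of a 2-D joint probability table."""
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / (px @ py)[mask])))

def mi_posterior_samples(counts, prior=1.0, n_samples=5000, rng=None):
    """Draw samples of I(X;Y) under a Dirichlet(counts + prior) posterior
    over the joint cell probabilities of the contingency table `counts`."""
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=float)
    alpha = (counts + prior).ravel()
    draws = rng.dirichlet(alpha, size=n_samples)      # shape (n_samples, r*c)
    return np.array([mi_of_joint(d.reshape(counts.shape)) for d in draws])

# Example: posterior mean, standard deviation and a 95% credible interval
# for the MI of a small contingency table.
counts = np.array([[30, 5], [4, 25]])
samples = mi_posterior_samples(counts)
print(samples.mean(), samples.std(), np.percentile(samples, [2.5, 97.5]))
```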


Proceedings Article
01 May 2002
TL;DR: A frequency-biased version of mutual dependency is found to perform best, followed closely by the log-likelihood ratio; the paper also points out some implications of using available electronic dictionaries such as WordNet for the evaluation of collocation extraction.
Abstract: Corpus-based automatic extraction of collocations is typically carried out employing some statistic indicating co-occurrence in order to identify words that co-occur more often than expected by chance. In this paper we are concerned with some typical measures such as the t-score, Pearson’s χ-square test, log-likelihood ratio, pointwise mutual information and a novel information-theoretic measure, namely mutual dependency. Apart from some theoretical discussion about their correlation, we perform comparative evaluation experiments, judging performance by the measures' ability to identify lexically associated bigrams. We use two different gold standards: WordNet and lists of named entities. Besides discovering that a frequency-biased version of mutual dependency performs best, followed closely by the likelihood ratio, we point out some implications of using available electronic dictionaries such as WordNet for the evaluation of collocation extraction.

76 citations
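
For reference, the two information-theoretic scores compared above can be written as follows for a bigram (x, y). The exact form of the frequency-biased variant used in the paper is not restated here; a common log-frequency weighting is shown as an assumption.

```latex
% Pointwise mutual information of a bigram (x, y)
\[
  \mathrm{PMI}(x,y) \;=\; \log_2 \frac{P(x,y)}{P(x)\,P(y)}
\]
% Mutual dependency: PMI penalized by the bigram's own self-information
\[
  \mathrm{MD}(x,y) \;=\; \log_2 \frac{P(x,y)^2}{P(x)\,P(y)}
              \;=\; \mathrm{PMI}(x,y) + \log_2 P(x,y)
\]
% An assumed log-frequency-biased variant: add the bigram's log-probability
% once more so that frequent pairs are favoured
\[
  \mathrm{LFMD}(x,y) \;=\; \mathrm{MD}(x,y) + \log_2 P(x,y)
\]
```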


Proceedings ArticleDOI
01 Nov 2002
TL;DR: The focus of this article is simple networks in which the nodal random variables are time series; it includes an example from hydrology.
Abstract: A statistical network is a collection of nodes representing random variables and a set of edges that connect the nodes. A probabilistic model for such a network is called a statistical graphical model. These models, graphs and networks are particularly useful for examining statistical dependencies amongst quantities via conditioning. In this article the nodal random variables are time series. Basic to the study of statistical networks is some measure of the strength of (possibly directed) connections between the nodes. The use of the ordinary and partial coherences and of mutual information is considered as a means of inference concerning statistical graphical models. The focus of this article is simple networks. The article includes an example from hydrology.

25 citations
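
In this time-series setting, the strength of an undirected edge between two nodes can be scored, for example, by the magnitude-squared coherence of the two series. A minimal sketch using scipy is below; the paper also considers partial coherence and mutual information, which are not shown.

```python
import numpy as np
from scipy.signal import coherence

def edge_strength(x, y, fs=1.0, nperseg=256):
    """Score the connection between two nodal time series by the peak
    magnitude-squared coherence across frequencies (a value in [0, 1])."""
    freqs, cxy = coherence(x, y, fs=fs, nperseg=nperseg)
    return float(np.max(cxy))

# Example: two series driven by a common component show high coherence.
rng = np.random.default_rng(1)
common = rng.normal(size=2048)
x = common + 0.5 * rng.normal(size=2048)
y = common + 0.5 * rng.normal(size=2048)
print(edge_strength(x, y))
```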


Proceedings ArticleDOI
23 Oct 2002
TL;DR: Mutual information similarity metrics computed from fractional order Renyi entropy and entropy kind t are presented as novel similarity metrics for ultrasound/MRI registration and are shown to be more accurate than Shannon mutual information in many cases.
Abstract: Mutual information has been widely used as a similarity metric for biomedical image registration. Although usually based on the Shannon definition of entropy, mutual information may be computed from other entropy definitions. Mutual information similarity metrics computed from fractional order Renyi entropy and entropy kind t are presented as novel similarity metrics for ultrasound/MRI registration. These metrics are shown to be more accurate than Shannon mutual information in many cases, and frequently facilitate faster convergence to the optimum. They are particularly effective for local optimization, but some measures may potentially be exploited for global searches.

11 citations
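
One common way to build such a generalized similarity metric is to replace each Shannon entropy in the usual mutual-information decomposition with a Rényi entropy of order α. That construction is sketched below as an assumption; the paper's exact definitions, including the "entropy kind t" variant, are not restated here.

```latex
% Renyi entropy of order alpha (Shannon entropy is the limit alpha -> 1)
\[
  H_\alpha(A) \;=\; \frac{1}{1-\alpha}\,\log \sum_i p_i^{\alpha},
  \qquad \alpha > 0,\ \alpha \neq 1
\]
% An MI-style similarity metric assembled from marginal and joint
% Renyi entropies of the two images (assumed construction)
\[
  I_\alpha(A,B) \;=\; H_\alpha(A) + H_\alpha(B) - H_\alpha(A,B)
\]
```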


Posted Content
Don H. Johnson
TL;DR: To measure mutual information, the experimenter defines a stimulus set and, from the measured response, estimates the probability distribution of the response under each stimulus condition.
Abstract: Mutual information between stimulus and response has been advocated as an information theoretic measure of a neural system’s capability to process information. Once calculated, the result is a single number that supposedly captures the system’s information characteristics over the range of stimulus conditions used to measure it. I show that mutual information is a flawed measure, the standard approach to measuring it has theoretical difficulties, and that relating capacity to information processing capability is quite complicated.

11 citations


Proceedings ArticleDOI
08 May 2002
TL;DR: The expectation-maximization algorithm is introduced for Gaussian clustering of MI estimates, and a set of rules is specified for intelligently determining the binning interval of the input and target spaces.
Abstract: The mutual information-radial basis function network (MI-RBFN) is an efficient, general, and integrated method of approximating complex, continuous, deterministic systems from incomplete information. The nodes of the MI-RBFN are located by clustering local mutual information estimates, thereby yielding a mapping that inherently generalizes better than one formulated by seeking solely to minimize residuals. The expectation-maximization algorithm is introduced for Gaussian clustering of MI estimates. A further improvement in the methodology is marked by the specification of a set of rules for intelligently determining the binning interval of the input and target spaces.

5 citations
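
The abstract is terse about how the clustering is set up; one way to read it is sketched below: assign each input-target sample a local (pointwise) mutual information value from a binned joint histogram, then fit a Gaussian mixture by EM to the inputs with the highest local MI and use the mixture means as candidate RBF centres. The binning rule, the thresholding, and the use of scikit-learn's GaussianMixture are all assumptions made for illustration, not the authors' procedure.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def local_mi(x, y, bins=16):
    """Pointwise mutual information log p(x,y)/(p(x)p(y)) assigned to each
    (x, y) sample via an equal-width 2-D histogram (illustrative binning)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    joint, xe, ye = np.histogram2d(x, y, bins=bins)
    p_xy = joint / len(x)
    p_x = p_xy.sum(axis=1)
    p_y = p_xy.sum(axis=0)
    ix = np.clip(np.digitize(x, xe[1:-1]), 0, bins - 1)
    iy = np.clip(np.digitize(y, ye[1:-1]), 0, bins - 1)
    return np.log(p_xy[ix, iy] / (p_x[ix] * p_y[iy]))

def rbf_centres(x, y, n_centres=5, keep_frac=0.5):
    """Fit a Gaussian mixture (EM) to the input locations whose local MI is
    highest; the mixture means serve as candidate RBF centres."""
    x = np.asarray(x, dtype=float)
    pmi = local_mi(x, y)
    thresh = np.quantile(pmi, 1.0 - keep_frac)
    pts = x[pmi >= thresh].reshape(-1, 1)
    gm = GaussianMixture(n_components=n_centres, random_state=0).fit(pts)
    return gm.means_.ravel()
```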


15 May 2002
TL;DR: A fast, newly defined filter is shown to outperform the traditional approach based on empirical mutual information on a number of real data sets, and a theoretical development allows the above methods to be extended to incomplete samples in an easy and effective way.
Abstract: Mutual information is widely used in artificial intelligence, in a descriptive way, to measure the stochastic dependence of discrete random variables. In order to address questions such as the reliability of the empirical value, one must consider sample-to-population inferential approaches. This paper deals with the distribution of mutual information, as obtained in a Bayesian framework by using second-order Dirichlet prior distributions. We derive reliable and quickly computable analytical approximations for the distribution of mutual information. We concentrate on the mean, variance, skewness, and kurtosis. For the mean we also provide an exact expression. The results are applied to the problem of selecting features for incremental learning and classification of the naive Bayes classifier. A fast, newly defined filter is shown to outperform the traditional approach based on empirical mutual information on a number of real data sets. A theoretical development allows the above methods to be extended to incomplete samples in an easy and effective way. Further experiments on incomplete data sets support the extension of the proposed filter to the case of missing data.

4 citations
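
As a usage illustration of such a filter, the decision rule can be phrased as: keep a feature only if, under the posterior distribution of its mutual information with the class, the probability that the MI exceeds a small threshold is high. The sketch below approximates that posterior by Dirichlet Monte Carlo rather than by the paper's analytical approximations; the threshold and credibility level are arbitrary illustrative values.

```python
import numpy as np

def keep_feature(counts, eps=0.02, credibility=0.95, prior=1.0,
                 n_samples=4000, rng=None):
    """Bayesian filter decision for one feature.

    `counts` is the feature-by-class contingency table.  The feature is
    kept if P( I(feature; class) > eps ) >= credibility under a
    Dirichlet(counts + prior) posterior over the joint cell probabilities
    (posterior approximated here by Monte Carlo, not analytically).
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=float)
    draws = rng.dirichlet((counts + prior).ravel(), size=n_samples)
    mis = []
    for d in draws:
        p = d.reshape(counts.shape)       # strictly positive almost surely
        px = p.sum(axis=1, keepdims=True)
        py = p.sum(axis=0, keepdims=True)
        mis.append(np.sum(p * np.log(p / (px @ py))))
    return float(np.mean(np.array(mis) > eps)) >= credibility
```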


Journal ArticleDOI
TL;DR: This work uses a new method for calculating mutual information based on empirical classification to show how mutual information and the Kullback–Leibler distance summarize coding efficacy, and suggests that knowledge gained through mutual information methods could be more easily obtained and interpreted using the Kullback–Leibler distance.
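
The link between the two quantities in this summary is that mutual information is itself a Kullback–Leibler distance, namely the divergence of the joint distribution from the product of its marginals; the specific KL distance between response distributions used in the paper is not restated here.

```latex
% Mutual information as a Kullback-Leibler distance between the joint
% distribution and the product of its marginals
\[
  I(X;Y) \;=\; D_{\mathrm{KL}}\!\bigl( p(x,y) \,\big\|\, p(x)\,p(y) \bigr)
         \;=\; \sum_{x,y} p(x,y)\,\log \frac{p(x,y)}{p(x)\,p(y)}
\]
```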

Proceedings Article
08 Jul 2002
TL;DR: In this article, the authors proposed a new family of Hidden Markov Models (MMIHMMs) which have the same graphical structure as HMMs, but the cost function being optimized is not the joint likelihood of the observations and the hidden states.
Abstract: This paper proposes a new family of Hidden Markov Models named Maximum Mutual Information Hidden Markov Models (MMIHMMs). MMIHMMs have the same graphical structure as HMMs. However, the cost function being optimized is not the joint likelihood of the observations and the hidden states. It consists of the weighted linear combination of the mutual information between the hidden states and the observations and the likelihood of the observations and the states. We present both theoretical and practical motivations for having such a cost function. Next, we derive the parameter estimation (learning) equations for both the discrete and continuous observation cases. Finally we illustrate the superiority of our approach in different classification tasks by comparing the classification performance of our proposed Maximum Mutual Information HMMs (MMIHMMs) with standard Maximum Likelihood HMMs (HMMs), in the case of synthetic and real, discrete and continuous, supervised and unsupervised data. We believe that MMIHMMs are a powerful tool to solve many of the problems associated with HMMs when used for classification and/or clustering.
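
As described, the MMIHMM objective is a weighted combination of the joint likelihood of the observations and states with the mutual information between hidden states and observations. One way to write such an objective, with the weighting parameterized by α as an assumption (the paper's exact weighting scheme is not restated here), is:

```latex
% An assumed parameterization of the MMIHMM objective: a convex combination
% of the joint likelihood of observations O and hidden states Q (model
% parameters lambda) with the mutual information between states and
% observations
\[
  \mathcal{F}_{\alpha} \;=\; (1-\alpha)\,\log P(O, Q \mid \lambda)
                     \;+\; \alpha\, I(Q;\,O),
  \qquad \alpha \in [0,1]
\]
```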

Journal ArticleDOI
TL;DR: The standard quantum information theory of block messages with fixed block length is generalized to variable block length, and it is shown that the states belonging to a sufficiently large Hilbert space are highly distinguishable states.
Abstract: By making use of the theoretical framework presented by Bostroem (K. J. Bostroem, LANL quant-ph/0009052), we generalize the standard quantum information theory of block messages with fixed block length to the variable-length case. We show that the states belonging to a sufficiently large Hilbert space are highly distinguishable states. We also consider the collection states (product states of more than one qubit state) and seek a "pretty good measurement" (PGM) with measurement vectors that improve the mutual information. The average mutual information over random block-message ensembles with variable block length n is discussed in detail.
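
For reference, the "pretty good measurement" (also called the square-root measurement) for an ensemble of states is usually defined by the POVM elements below; whether the paper uses exactly this form for its measurement vectors is not restated here.

```latex
% Square-root ("pretty good") measurement for an ensemble {p_i, rho_i};
% the POVM elements sum to the identity on the support of rho
\[
  \rho \;=\; \sum_i p_i\,\rho_i, \qquad
  E_i \;=\; \rho^{-1/2}\, p_i\,\rho_i\, \rho^{-1/2}, \qquad
  \sum_i E_i \;=\; \mathbb{1}_{\mathrm{supp}(\rho)}
\]
```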