Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

Machine learning

In 1974 an article appeared in Science magazine with the dry-sounding title “Judgment Under Uncertainty: Heuristics and Biases” by a pair of psychologists who were not well known outside their discipline of decision theory. In it Amos Tversky and Daniel Kahneman introduced the world to Prospect Theory, which mapped out how humans actually behave when faced with decisions about gains and losses, in contrast to how economists assumed that people behave. Prospect Theory turned Economics on its head by demonstrating through a series of ingenious experiments that people are much more concerned with losses than they are with gains, and that framing a choice from one perspective or the other will result in decisions that are exactly the opposite of each other, even if the outcomes are monetarily the same. Prospect Theory led cognitive psychology in a new direction that began to uncover other human biases in thinking that are probably not learned but are part of our brain’s wiring.

Thinking fast and slow.

The PASCAL Visual Object Classes Challenge

Text classification is a foundational task in many NLP applications. Traditional text classifiers often rely on many human-designed features, such as dictionaries, knowledge bases and special tree kernels. In contrast to traditional methods, we introduce a recurrent convolutional neural network for text classification without human-designed features. In our model, we apply a recurrent structure to capture contextual information as far as possible when learning word representations, which may introduce considerably less noise compared to traditional window-based neural networks. We also employ a max-pooling layer that automatically judges which words play key roles in text classification to capture the key components in texts. We conduct experiments on four commonly used datasets. The experimental results show that the proposed method outperforms the state-of-the-art methods on several datasets, particularly on document-level datasets.

/pdf/recurrent-convolutional-neural-networks-for-text-1vxjlhvkh6.pdf

Recurrent convolutional neural networks for text classification

The major change in the second edition of this book is the addition of a new chapter on probabilistic retrieval. This chapter has been included because I think this is one of the most interesting and active areas of research in information retrieval. There are still many problems to be solved so I hope that this particular chapter will be of some help to those who want to advance the state of knowledge in this area. All the other chapters have been updated by including some of the more recent work on the topics covered. In preparing this new edition I have benefited from discussions with Bruce Croft, The material of this book is aimed at advanced undergraduate information (or computer) science students, postgraduate library science students, and research workers in the field of IR. Some of the chapters, particularly Chapter 6 * , make simple use of a little advanced mathematics. However, the necessary mathematical tools can be easily mastered from numerous mathematical texts that now exist and, in any case, references have been given where the mathematics occur. I had to face the problem of balancing clarity of exposition with density of references. I was tempted to give large numbers of references but was afraid they would have destroyed the continuity of the text. I have tried to steer a middle course and not compete with the Annual Review of Information Science and Technology. Normally one is encouraged to cite only works that have been published in some readily accessible form, such as a book or periodical. Unfortunately, much of the interesting work in IR is contained in technical reports and Ph.D. theses. For example, most the work done on the SMART system at Cornell is available only in reports. Luckily many of these are now available through the National Technical Information Service (U.S.) and University Microfilms (U.K.). I have not avoided using these sources although if the same material is accessible more readily in some other form I have given it preference. I should like to acknowledge my considerable debt to many people and institutions that have helped me. Let me say first that they are responsible for many of the ideas in this book but that only I wish to be held responsible. My greatest debt is to Karen Sparck Jones who taught me to research information retrieval as an experimental science. Nick Jardine and Robin …

Information retrieval

In this paper, we propose Latent Dirichlet Allocation (LDA) [1] based document classification algorithm which does not require any labeled dataset. In our algorithm, we construct a topic model using LDA, assign one topic to one of the class labels, aggregate all the same class label topics into a single topic using the aggregation property of the Dirichlet distribution and then automatically assign a class label to each unlabeled document depending on its "closeness" to one of the aggregated topics. We present an extension to our algorithm based on the combination of Expectation-Maximization (EM) algorithm and a naive Bayes classifier. We show effectiveness of our algorithm on three real world datasets.

Document classification by topic labeling

Latent Semantic Indexing (LSI) has been shown to be effective in recovering from synonymy and polysemy in text retrieval applications. However, since LSI ignores class labels of training documents, LSI generated representations are not as effective in classification tasks. To address this limitation, a process called 'sprinkling' is presented. Sprinkling is a simple extension of LSI based on augmenting the set of features using additional terms that encode class knowledge. However, a limitation of sprinkling is that it treats all classes (and classifiers) in the same way. To overcome this, we propose a more principled extension called Adaptive Sprinkling (AS). AS leverages confusion matrices to emphasise the differences between those classes which are hard to separate. The method is tested on diverse classification tasks, including those where classes share ordinal or hierarchical relationships. These experiments reveal that AS can significantly enhance the performance of instance-based techniques (kNN) to make them competitive with the state-of-the-art SVM classifier. The revised representations generated by AS also have a favourable impact on SVM performance.

/pdf/supervised-latent-semantic-indexing-using-adaptive-ydeq43w4z4.pdf

Supervised latent semantic indexing using adaptive sprinkling

Latent Semantic Indexing (LSI) is an established dimensionality reduction technique for Information Retrieval applications. However, LSI generated dimensions are not optimal in a classification setting, since LSI fails to exploit class labels of training documents. We propose an approach that uses class information to influence LSI dimensions whereby class labels of training documents are endoded as new terms, which are appended to the documents. When LSI is carried out on the augmented term-document matrix, terms pertaining to the same class are pulled closer to each other. Evaluation over experimental data reveals significant improvement in classification accuracy over LSI. The results also compare favourably with naive Support Vector Machines.

Sprinkling: supervised latent semantic indexing

We present a novel approach to mine word similarity in Textual Case Based Reasoning. We exploit indirect associations of words, in addition to direct ones for estimating their similarity. If word Aco-occurs with word B, we say Aand Bshare a first order association between them. If Aco-occurs with Bin some documents, and Bwith Cin some others, then Aand Care said to share a second order co-occurrence via B. Higher orders of co-occurrence may similarly be defined. In this paper we present algorithms for mining higher order co-occurrences. A weighted linear model is used to combine the contribution of these higher orders into a word similarity model. Our experimental results demonstrate significant improvements compared to similarity models based on first order co-occurrences alone. Our approach also outperforms state-of-the-art techniques like SVM and LSI in classification tasks of varying complexity.

Acquiring Word Similarities with Higher Order Association Mining

Supervised text classifiers require extensive human expertise and labeling efforts. In this paper, we propose a weakly supervised text classification algorithm based on the labeling of Latent Dirichlet Allocation (LDA) topics. Our algorithm is based on the generative property of LDA. In our algorithm, we ask an annotator to assign one or more class labels to each topic, based on its most probable words. We classify a document based on its posterior topic proportions and the class labels of the topics. We also enhance our approach by incorporating domain knowledge in the form of labeled words. We evaluate our approach on four real world text classification datasets. The results show that our approach is more accurate in comparison to semi-supervised techniques from previous work. A central contribution of this work is an approach that delivers effectiveness comparable to the state-of-the-art supervised techniques in hard-to-classify domains, with very low overheads in terms of manual knowledge engineering.

Sutanu Chakraborti

Papers

Document classification by topic labeling

Supervised latent semantic indexing using adaptive sprinkling

Sprinkling: supervised latent semantic indexing

Acquiring Word Similarities with Higher Order Association Mining

Topic labeled text classification: a weakly supervised approach