Author

Zuhair Bandar

Bio: Zuhair Bandar is an academic researcher from Manchester Metropolitan University. The author has contributed to research in topics: Semantic similarity & Decision tree. The author has an h-index of 18 and has co-authored 80 publications receiving 1728 citations.


Papers
Journal ArticleDOI
TL;DR: Experiments demonstrate that the proposed method provides a similarity measure that shows a significant correlation to human intuition and can be used in a variety of applications that involve text knowledge representation and discovery.
Abstract: Sentence similarity measures play an increasingly important role in text-related research and applications in areas such as text mining, Web page retrieval, and dialogue systems. Existing methods for computing sentence similarity have been adopted from approaches used for long text documents. These methods process sentences in a very high-dimensional space and are consequently inefficient, require human input, and are not adaptable to some application domains. This paper focuses directly on computing the similarity between very short texts of sentence length. It presents an algorithm that takes account of semantic information and word order information implied in the sentences. The semantic similarity of two sentences is calculated using information from a structured lexical database and from corpus statistics. The use of a lexical database enables our method to model human common sense knowledge and the incorporation of corpus statistics allows our method to be adaptable to different domains. The proposed method can be used in a variety of applications that involve text knowledge representation and discovery. Experiments on two sets of selected sentence pairs demonstrate that the proposed method provides a similarity measure that shows a significant correlation to human intuition.

850 citations

Book ChapterDOI
26 Mar 2008
TL;DR: A comparative study of STASIS and LSA is described, which shows measures of semantic similarity can be applied to short texts for use in Conversational Agents (CAs), and a benchmark data set of 65 sentence pairs with human-derived similarity ratings is presented.
Abstract: This paper describes a comparative study of STASIS and LSA. These measures of semantic similarity can be applied to short texts for use in Conversational Agents (CAs). CAs are computer programs that interact with humans through natural language dialogue. Business organizations have spent large sums of money in recent years developing them for online customer self-service, but achievements have been limited to simple FAQ systems. We believe this is due to the labour-intensive process of scripting, which could be reduced radically by the use of short-text semantic similarity measures. "Short texts" are typically 10-20 words long but are not required to be grammatically correct sentences, for example spoken utterances and text messages. We also present a benchmark data set of 65 sentence pairs with human-derived similarity ratings. This data set is the first of its kind, specifically developed to evaluate such measures, and we believe it will be valuable to future researchers.

86 citations

Journal ArticleDOI
TL;DR: A novel fuzzy inference algorithm to generate fuzzy decision trees from induced crisp decision trees is proposed, suggesting that the later fuzzy tree is significantly more robust and produces a more balanced classification.

50 citations

Proceedings Article
01 May 2004
TL;DR: A novel algorithm for computing similarity between very short texts of sentence length that takes account of not only semantic information but also word order information implied in the sentences is presented.
Abstract: This paper presents a novel algorithm for computing similarity between very short texts of sentence length. It introduces a method that takes account of not only semantic information but also word order information implied in the sentences. Firstly, semantic similarity between two sentences is derived from information from a structured lexical database and from corpus statistics. Secondly, word order similarity is computed from the position of word appearance in the sentence. Finally, sentence similarity is computed as a combination of semantic similarity and word order similarity. The proposed algorithm is applied to a real-world domain of conversational agents. Experimental results demonstrate that the proposed algorithm reduces the scripter's effort to devise a rule base for a conversational agent.
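The three steps named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the semantic component is replaced by a plain word-overlap (Jaccard) stand-in, because the lexical database and corpus statistics are not described in enough detail here; words absent from a sentence get position 0; and the weighting parameter `delta`, its default value, and all function names are hypothetical.

```python
import math

def word_order_vector(tokens, joint_words):
    # 1-based position of each joint word in the sentence; 0 if the word is absent
    # (a simplifying assumption made for this sketch).
    first_pos = {}
    for pos, tok in enumerate(tokens, start=1):
        first_pos.setdefault(tok, pos)
    return [first_pos.get(w, 0) for w in joint_words]

def word_order_similarity(tokens_a, tokens_b):
    # Compare the position vectors of both sentences over their joint word set.
    joint = list(dict.fromkeys(tokens_a + tokens_b))
    ra = word_order_vector(tokens_a, joint)
    rb = word_order_vector(tokens_b, joint)
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(ra, rb)))
    total = math.sqrt(sum((x + y) ** 2 for x, y in zip(ra, rb)))
    return 1.0 if total == 0 else 1.0 - diff / total

def overlap_similarity(tokens_a, tokens_b):
    # Stand-in for the semantic component: Jaccard overlap of the word sets.
    # The abstract instead uses a lexical database and corpus statistics.
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

def sentence_similarity(sentence_a, sentence_b, delta=0.85):
    # Weighted combination of the two components; delta is a hypothetical weight.
    ta, tb = sentence_a.lower().split(), sentence_b.lower().split()
    semantic = overlap_similarity(ta, tb)
    order = word_order_similarity(ta, tb)
    return delta * semantic + (1.0 - delta) * order
```

With this sketch, "dog bites man" and "man bites dog" share every word, so the overlap stand-in alone would rate them identical; the word order term is what lowers their combined score below 1.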

48 citations

Journal ArticleDOI
TL;DR: A hybrid radial basis function (RBF) sigmoid neural network with a three-step training algorithm that utilizes both global search and gradient descent training and is seen to compare favorably with both perceptron radial basis net and regression tree derived RBFs.
Abstract: We present a hybrid radial basis function (RBF) sigmoid neural network with a three-step training algorithm that utilizes both global search and gradient descent training. The algorithm used is intended to identify global features of an input-output relationship before adding local detail to the approximating function. It aims to achieve efficient function approximation through the separate identification of aspects of a relationship that are expressed universally from those that vary only within particular regions of the input space. We test the effectiveness of our method using five regression tasks; four use synthetic datasets while the last problem uses real-world data on the wave overtopping of seawalls. It is shown that the hybrid architecture is often superior to architectures containing neurons of a single type in several ways: lower mean square errors are often achievable using fewer hidden neurons and with less need for regularization. Our global-local artificial neural network (GL-ANN) is also seen to compare favorably with both perceptron radial basis net and regression tree derived RBFs. A number of issues concerning the training of GL-ANNs are discussed: the use of regularization, the inclusion of a gradient descent optimization step, the choice of RBF spreads, model selection, and the development of appropriate stopping criteria.
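To make the idea of a hybrid hidden layer concrete, the following sketch computes the forward pass of a network whose hidden layer mixes sigmoid units (responding across the whole input space) with Gaussian RBF units (responding only near a centre). It covers the architecture only, not the three-step training algorithm, and the function names, parameter layout, and Gaussian form of the RBF are assumptions made for illustration.

```python
import math

def sigmoid_unit(x, weights, bias):
    # Global feature: a sigmoid of a weighted sum of the inputs.
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def rbf_unit(x, centre, spread):
    # Local feature: a Gaussian bump that decays with distance from the centre.
    sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-sq_dist / (2.0 * spread ** 2))

def hybrid_forward(x, sigmoid_params, rbf_params, out_weights, out_bias):
    # Hidden layer: sigmoid units first, then RBF units; output is linear.
    hidden = [sigmoid_unit(x, w, b) for w, b in sigmoid_params]
    hidden += [rbf_unit(x, c, s) for c, s in rbf_params]
    return sum(v * h for v, h in zip(out_weights, hidden)) + out_bias

# Example: one sigmoid unit and one RBF unit on a two-dimensional input.
y = hybrid_forward(
    [0.0, 0.0],
    sigmoid_params=[([1.0, 1.0], 0.0)],
    rbf_params=[([0.0, 0.0], 1.0)],
    out_weights=[2.0, 3.0],
    out_bias=0.5,
)
```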

41 citations


Cited by
01 Jan 2009

7,241 citations

Journal ArticleDOI
TL;DR: This article proposes a framework for representing the meaning of word combinations in vector space in terms of additive and multiplicative functions, and introduces a wide range of composition models that are evaluated empirically on a phrase similarity task.
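As a rough illustration of what additive and multiplicative composition of word vectors can look like, the sketch below combines two vectors by a weighted sum and by an element-wise product. The function names, the weights `alpha` and `beta`, and the toy vectors are hypothetical; the article's own models are only summarised here, so this should be read as one simple instance of each family.

```python
def additive_composition(u, v, alpha=1.0, beta=1.0):
    # Weighted sum of the two word vectors, dimension by dimension.
    return [alpha * ui + beta * vi for ui, vi in zip(u, v)]

def multiplicative_composition(u, v):
    # Element-wise product: a dimension stays large only if both words share it.
    return [ui * vi for ui, vi in zip(u, v)]

# Toy three-dimensional vectors standing in for two words of a phrase.
u = [1.0, 0.0, 2.0]
v = [3.0, 4.0, 0.5]
added = additive_composition(u, v)
multiplied = multiplicative_composition(u, v)
```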

981 citations

Journal ArticleDOI
TL;DR: This first of its kind, comprehensive literature review of the diverse field of affective computing focuses mainly on the use of audio, visual and text information for multimodal affect analysis, and outlines existing methods for fusing information from different modalities.

969 citations

Book
01 Jan 1975
TL;DR: The major change in the second edition of this book is the addition of a new chapter on probabilistic retrieval, which I think is one of the most interesting and active areas of research in information retrieval.
Abstract: The major change in the second edition of this book is the addition of a new chapter on probabilistic retrieval. This chapter has been included because I think this is one of the most interesting and active areas of research in information retrieval. There are still many problems to be solved so I hope that this particular chapter will be of some help to those who want to advance the state of knowledge in this area. All the other chapters have been updated by including some of the more recent work on the topics covered. In preparing this new edition I have benefited from discussions with Bruce Croft, The material of this book is aimed at advanced undergraduate information (or computer) science students, postgraduate library science students, and research workers in the field of IR. Some of the chapters, particularly Chapter 6, make simple use of a little advanced mathematics. However, the necessary mathematical tools can be easily mastered from numerous mathematical texts that now exist and, in any case, references have been given where the mathematics occur. I had to face the problem of balancing clarity of exposition with density of references. I was tempted to give large numbers of references but was afraid they would have destroyed the continuity of the text. I have tried to steer a middle course and not compete with the Annual Review of Information Science and Technology. Normally one is encouraged to cite only works that have been published in some readily accessible form, such as a book or periodical. Unfortunately, much of the interesting work in IR is contained in technical reports and Ph.D. theses. For example, most of the work done on the SMART system at Cornell is available only in reports. Luckily many of these are now available through the National Technical Information Service (U.S.) and University Microfilms (U.K.).
I have not avoided using these sources although if the same material is accessible more readily in some other form I have given it preference. I should like to acknowledge my considerable debt to many people and institutions that have helped me. Let me say first that they are responsible for many of the ideas in this book but that only I wish to be held responsible. My greatest debt is to Karen Sparck Jones who taught me to research information retrieval as an experimental science. Nick Jardine and Robin …

822 citations