Showing papers by "Qiang Yang" published in 2004


Proceedings Article•DOI•
04 Jul 2004
TL;DR: A simple, novel, and yet effective method for building and testing decision trees that minimizes the sum of the misclassification and test costs, together with several intelligent test strategies that suggest ways of obtaining missing values at a cost in order to minimize the total cost.
Abstract: We propose a simple, novel and yet effective method for building and testing decision trees that minimizes the sum of the misclassification and test costs. More specifically, we first put forward an original and simple splitting criterion for attribute selection in tree building. Our tree-building algorithm has many desirable properties for a cost-sensitive learning system that must account for both types of costs. Then, assuming that the test cases may have a large number of missing values, we design several intelligent test strategies that can suggest ways of obtaining the missing values at a cost in order to minimize the total cost. We experimentally compare these strategies and C4.5, and demonstrate that our new algorithms significantly outperform C4.5 and its variations. In addition, our algorithm's complexity is similar to that of C4.5, and is much lower than that of previous work. Our work is useful for many diagnostic tasks which must factor in the misclassification and test costs for obtaining missing information.

291 citations
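
The abstract does not spell out the splitting criterion, so the sketch below only illustrates the general idea of cost-sensitive attribute selection: choose the attribute whose test yields the largest expected reduction in total (misclassification plus test) cost, and stop splitting when no attribute pays for itself. The function names, toy data, and cost values are illustrative assumptions, not the paper's algorithm.

import numpy as np

def leaf_cost(labels, mc_cost):
    """Minimal expected misclassification cost if we stop and predict one class.

    mc_cost[i][j] = cost of predicting class i when the true class is j.
    """
    counts = np.bincount(labels, minlength=mc_cost.shape[0])
    return min((mc_cost[i] * counts).sum() for i in range(mc_cost.shape[0]))

def split_gain(X, y, attr, test_cost, mc_cost):
    """Reduction in total cost from testing `attr` and splitting on its values."""
    before = leaf_cost(y, mc_cost)
    after = sum(leaf_cost(y[X[:, attr] == v], mc_cost)
                for v in np.unique(X[:, attr]))
    # every case in the node pays the test cost for this attribute
    return before - after - test_cost[attr] * len(y)

def choose_attribute(X, y, test_cost, mc_cost):
    gains = [split_gain(X, y, a, test_cost, mc_cost) for a in range(X.shape[1])]
    best = int(np.argmax(gains))
    return best if gains[best] > 0 else None   # None: stop splitting

# toy usage: 2 binary attributes, binary class, asymmetric misclassification costs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [1, 1], [0, 0]])
y = np.array([0, 0, 1, 1, 1, 0])
mc_cost = np.array([[0.0, 50.0], [200.0, 0.0]])   # false positives cost more
test_cost = np.array([5.0, 30.0])                 # attribute 1 is expensive to test
print(choose_attribute(X, y, test_cost, mc_cost)) # -> 0: the cheap attribute separates the classes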


Proceedings Article•DOI•
25 Jul 2004
TL;DR: This paper gives empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms, and proposes a new Web summarization-based classification algorithm that achieves an approximately 8.8% improvement over pure-text-based methods.
Abstract: Web-page classification is much more difficult than pure-text classification due to the large variety of noisy information embedded in Web pages. In this paper, we propose a new Web-page classification algorithm based on Web summarization for improving the accuracy. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then propose a new Web summarization-based classification algorithm and evaluate it along with several other state-of-the-art text summarization algorithms on the LookSmart Web directory. Experimental results show that our proposed summarization-based classification algorithm achieves an approximately 8.8% improvement as compared to a pure-text-based classification algorithm. We further introduce an ensemble classifier using the improved summarization algorithm and show that it achieves about a 12.9% improvement over pure-text-based methods.

204 citations
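
The abstract names neither the summarizers nor the base classifier, so the following is only a minimal sketch of the overall pipeline: summarize each page, then train a standard text classifier on the summaries instead of the full text. The crude frequency-based sentence scorer and the scikit-learn components are stand-ins, not the paper's methods.

from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

def summarize(text, n_sentences=2):
    """Crude extractive summary: keep the sentences with the most frequent words."""
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    freq = Counter(w.lower() for s in sentences for w in s.split())
    scored = sorted(sentences, key=lambda s: -sum(freq[w.lower()] for w in s.split()))
    return '. '.join(scored[:n_sentences])

pages = ["Cheap flights and hotel deals. Book your holiday today. Weather is nice.",
         "New GPU benchmarks released. The processor performs well. Drivers updated."]
labels = ["travel", "computing"]

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit([summarize(p) for p in pages], labels)       # train on summaries, not full text
print(clf.predict([summarize("Hotel prices drop. Flights to Rome are cheap.")]))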


Proceedings Article•DOI•
01 Nov 2004
TL;DR: This paper shows how to obtain a test-cost-sensitive naive Bayes classifier (csNB) by including a test strategy that determines how unknown attributes are selected for testing in order to minimize the sum of the misclassification costs and test costs.
Abstract: Inductive learning techniques such as the naive Bayes and decision tree algorithms have been extended in the past to handle different types of costs, mainly by distinguishing different costs of classification errors. However, it is an equally important issue to consider how to handle the test costs associated with querying the missing values in a test case. When the value of an attribute is missing in a test case, it may or may not be worthwhile to take the effort to obtain its missing value, depending on how much the value contributes to a potential gain in classification accuracy. In this paper, we show how to obtain a test-cost-sensitive naive Bayes classifier (csNB) by including a test strategy that determines how unknown attributes are selected for testing in order to minimize the sum of the misclassification costs and test costs. We propose and evaluate several potential test strategies, including one that allows several tests to be done at once. We empirically evaluate the csNB method and show that it compares favorably with its decision tree counterpart.

152 citations
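
As a rough illustration of a sequential test strategy for a cost-sensitive naive Bayes classifier, the sketch below tests the unknown attribute whose expected reduction in misclassification cost exceeds its test cost by the largest margin, and stops when no test is worthwhile. This is a generic sketch of the idea; the paper's actual strategies (including the batch strategy) are not reproduced, and all names and numbers are illustrative.

import numpy as np

def posterior(prior, cond, evidence):
    """Naive Bayes posterior; cond[a][v] = vector of P(attribute a has value v | class)."""
    p = prior.copy()
    for a, v in evidence.items():
        p = p * cond[a][v]
    return p / p.sum()

def expected_mc_cost(post, mc_cost):
    """Expected misclassification cost of the cost-minimizing prediction."""
    return min(mc_cost[i] @ post for i in range(len(post)))

def choose_test(prior, cond, evidence, unknown, test_cost, mc_cost):
    """Pick the unknown attribute whose test most reduces expected total cost."""
    post = posterior(prior, cond, evidence)
    base = expected_mc_cost(post, mc_cost)
    best, best_saving = None, 0.0
    for a in unknown:
        exp_after = 0.0
        for v in range(cond[a].shape[0]):
            p_v = cond[a][v] @ post          # predictive probability of observing value v
            exp_after += p_v * expected_mc_cost(
                posterior(prior, cond, {**evidence, a: v}), mc_cost)
        saving = base - exp_after - test_cost[a]
        if saving > best_saving:
            best, best_saving = a, saving
    return best                              # None: no test is worth its cost, predict now

# toy usage: 2 classes, 2 binary attributes
prior = np.array([0.7, 0.3])
cond = {0: np.array([[0.8, 0.3], [0.2, 0.7]]),     # weakly informative attribute
        1: np.array([[0.9, 0.05], [0.1, 0.95]])}   # highly informative attribute
mc_cost = np.array([[0.0, 100.0], [300.0, 0.0]])
print(choose_test(prior, cond, {}, unknown=[0, 1],
                  test_cost={0: 10.0, 1: 5.0}, mc_cost=mc_cost))
# -> 1: the cheap, highly informative attribute is worth testing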


Journal Article•DOI•
TL;DR: A promising approach to utilizing the correlative information for improving the peptide identification accuracy by extending the tandem mass spectral dot product to the kernel SDP (KSDP), which outperforms two SDP-based software tools, SEQUEST and Sonar MS/MS, in terms of identification accuracy.
Abstract: Motivation: The correlation among fragment ions in a tandem mass spectrum is crucial in reducing stochastic mismatches for peptide identification by database searching. Until now, an efficient scoring algorithm that considers the correlative information in a tunable and comprehensive manner has been lacking. Results: This paper provides a promising approach to utilizing the correlative information for improving the peptide identification accuracy. The kernel trick, rooted in statistical learning theory, is exploited to address this issue with low computational effort. The common scoring method, the tandem mass spectral dot product (SDP), is extended to the kernel SDP (KSDP). Experiments on a previously reported dataset demonstrate the effectiveness of the KSDP. The implementation on consecutive fragments shows a decrease of 10% in the error rate compared with the SDP. Our software tool, pFind, using a simple scoring function based on the KSDP, outperforms two SDP-based software tools, SEQUEST and Sonar MS/MS, in terms of identification accuracy. Supplementary Information: http://www.jdl.ac.cn/user/yfu/pfind/index.html

101 citations
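
For readers unfamiliar with the baseline, the spectral dot product (SDP) scores a candidate peptide by the inner product of binned, normalized intensity vectors of the observed and predicted spectra; the KSDP replaces that inner product with a kernel evaluation. The sketch below shows the SDP and a generic polynomial kernel as a stand-in; the paper's kernel over consecutive fragments is not reproduced here.

import numpy as np

def binned_spectrum(mz, intensity, bin_width=1.0, max_mz=2000.0):
    """Turn a peak list into a fixed-length, L2-normalized intensity vector."""
    vec = np.zeros(int(max_mz / bin_width))
    for m, i in zip(mz, intensity):
        if m < max_mz:
            vec[int(m / bin_width)] += i
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def sdp(observed, predicted):
    """Spectral dot product: cosine-style score between two binned spectra."""
    return float(observed @ predicted)

def ksdp_poly(observed, predicted, degree=2, c=0.0):
    """Kernelized score: a polynomial kernel as a simple stand-in for the paper's
    consecutive-fragment kernel (illustrative only)."""
    return float((observed @ predicted + c) ** degree)

obs = binned_spectrum([175.1, 303.2, 402.3], [50.0, 80.0, 30.0])
pred = binned_spectrum([175.1, 303.2, 500.0], [1.0, 1.0, 1.0])
print(sdp(obs, pred), ksdp_poly(obs, pred))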


Proceedings Article•
25 Jul 2004
TL;DR: An integrated plan-recognition model is presented that combines low-level sensory readings with high-level goal inference, using a dynamic Bayesian network to infer a user's actions from raw signals and an N-gram model to infer the user's goals from actions.
Abstract: Plan recognition has traditionally been developed for logically encoded application domains with a focus on logical reasoning. In this paper, we present an integrated plan-recognition model that combines low-level sensory readings with high-level goal inference. A two-level architecture is proposed to infer a user's goals in a complex indoor environment using an RF-based wireless network. The novelty of our work derives from our ability to infer a user's goals from sequences of signal trajectories, and from the ability to trade off model accuracy against inference efficiency. The model relies on a dynamic Bayesian network to infer a user's actions from raw signals, and an N-gram model to infer the user's goals from actions. We present a method for constructing the model from past data and demonstrate the effectiveness of our proposed solution through empirical studies using real data that we have collected.

70 citations
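
The upper level of the architecture can be pictured as a per-goal N-gram model over recognized actions, with the goal chosen by maximum likelihood over the observed action sequence. The sketch below shows only such a bigram layer; the lower-level dynamic Bayesian network that maps raw signals to actions is omitted, and all action and goal names are made up for illustration.

from collections import defaultdict
import math

def train_bigram(goal_traces):
    """goal_traces: {goal: [action sequences]} -> per-goal bigram counts."""
    models = {}
    for goal, traces in goal_traces.items():
        counts, totals = defaultdict(float), defaultdict(float)
        for trace in traces:
            for prev, cur in zip(trace, trace[1:]):
                counts[(prev, cur)] += 1.0
                totals[prev] += 1.0
        models[goal] = (counts, totals)
    return models

def goal_log_likelihood(model, actions, alpha=0.1, vocab=10):
    counts, totals = model
    ll = 0.0
    for prev, cur in zip(actions, actions[1:]):
        # add-alpha smoothing so unseen transitions do not zero out the likelihood
        ll += math.log((counts[(prev, cur)] + alpha) / (totals[prev] + alpha * vocab))
    return ll

def infer_goal(models, actions):
    return max(models, key=lambda g: goal_log_likelihood(models[g], actions))

# toy usage: actions produced by the lower (sensor) level feed the N-gram level
traces = {"go_to_printer": [["leave_office", "walk_corridor", "enter_print_room"]],
          "go_to_seminar": [["leave_office", "walk_corridor", "take_stairs", "enter_hall"]]}
models = train_bigram(traces)
print(infer_goal(models, ["leave_office", "walk_corridor", "take_stairs"]))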


Journal Article•DOI•
TL;DR: A comparative study on different kinds of sequential association rules for web document prediction shows that the existing approaches can be cast under two important dimensions, namely the type of antecedents of rules and the criterion for selecting prediction rules.
Abstract: Web servers keep track of web users' browsing behavior in web logs. From these logs, one can build statistical models that predict the users' next requests based on their current behavior. These data are complex due to their large size and sequential nature. In the past, researchers have proposed different methods for building association-rule based prediction models using the web logs, but there has been no systematic study on the relative merits of these methods. In this paper, we provide a comparative study on different kinds of sequential association rules for web document prediction. We show that the existing approaches can be cast under two important dimensions, namely the type of antecedents of rules and the criterion for selecting prediction rules. From this comparison we propose a best overall method and empirically test the proposed model on real web logs.

70 citations
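
One family of rules compared in such studies uses the last k pages of a session as the antecedent and picks the prediction rule by confidence. The sketch below mines and applies rules of that form; it is a generic illustration, not necessarily the configuration the paper recommends.

from collections import defaultdict

def mine_rules(sessions, k=2, min_support=2):
    """Rules of the form (last k pages) -> next page, kept if the antecedent is frequent."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in sessions:
        for i in range(len(s) - k):
            counts[tuple(s[i:i + k])][s[i + k]] += 1
    rules = {}
    for antecedent, nexts in counts.items():
        total = sum(nexts.values())
        if total >= min_support:
            page, c = max(nexts.items(), key=lambda kv: kv[1])
            rules[antecedent] = (page, c / total)          # prediction, confidence
    return rules

def predict(rules, recent_pages, k=2):
    return rules.get(tuple(recent_pages[-k:]))

sessions = [["/home", "/products", "/cart", "/checkout"],
            ["/home", "/products", "/cart", "/home"],
            ["/blog", "/home", "/products", "/cart", "/checkout"]]
rules = mine_rules(sessions)
print(predict(rules, ["/products", "/cart"]))   # -> ('/checkout', 0.666...)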


Proceedings Article•DOI•
22 Aug 2004
TL;DR: An incremental supervised subspace learning algorithm is proposed to infer an adaptive subspace by optimizing the Maximum Margin Criterion, and experimental results show that IMMC converges to a subspace similar to that of the batch approach.
Abstract: Subspace learning approaches have attracted much attention in academia recently. However, the classical batch algorithms no longer satisfy applications on streaming or large-scale data. To meet this need, the Incremental Principal Component Analysis (IPCA) algorithm has been well established, but it is an unsupervised subspace learning approach and is not optimal for general classification tasks, such as face recognition and Web document categorization. In this paper, we propose an incremental supervised subspace learning algorithm, called Incremental Maximum Margin Criterion (IMMC), to infer an adaptive subspace by optimizing the Maximum Margin Criterion. We also present a proof of convergence for the proposed algorithm. Experimental results on both synthetic and real-world datasets show that IMMC converges to a subspace similar to that of the batch approach.

61 citations
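
The Maximum Margin Criterion seeks a projection W that maximizes trace(W^T (S_b - S_w) W), where S_b and S_w are the between-class and within-class scatter matrices. The batch computation below illustrates the criterion that IMMC optimizes incrementally; the incremental update rule itself is not reproduced, and the toy data are illustrative.

import numpy as np

def mmc_projection(X, y, n_components=2):
    """Batch Maximum Margin Criterion: top eigenvectors of S_b - S_w."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sb = np.zeros((d, d))
    Sw = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        diff = (Xc.mean(axis=0) - mean).reshape(-1, 1)
        Sb += len(Xc) * diff @ diff.T
        Sw += (Xc - Xc.mean(axis=0)).T @ (Xc - Xc.mean(axis=0))
    # S_b - S_w is symmetric, so eigh applies; keep eigenvectors of the largest eigenvalues
    vals, vecs = np.linalg.eigh(Sb - Sw)
    return vecs[:, np.argsort(vals)[::-1][:n_components]]

X = np.vstack([np.random.randn(50, 5) + 2, np.random.randn(50, 5) - 2])
y = np.array([0] * 50 + [1] * 50)
W = mmc_projection(X, y, n_components=1)
print((X @ W).shape)   # (100, 1): data projected onto the learned subspace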


Proceedings Article•DOI•
13 Nov 2004
TL;DR: Experimental results show that the proposed algorithm outperforms the traditional Cosine similarity and is superior to LSI, and a novel iterative algorithm for computing non-orthogonal space similarity measures is proposed.
Abstract: Many machine learning and data mining algorithms crucially rely on similarity metrics. The Cosine similarity, which calculates the inner product of two normalized feature vectors, is one of the most commonly used similarity measures. However, in many practical tasks such as text categorization and document clustering, the Cosine similarity is calculated under the assumption that the input space is orthogonal, an assumption that usually cannot be satisfied due to synonymy and polysemy. Various algorithms such as Latent Semantic Indexing (LSI) have been used to address this problem by projecting the original data into an orthogonal space. However, LSI also suffers from high computational cost and data sparseness, shortcomings that increase computation time and storage requirements for large-scale realistic data. In this paper, we propose a novel and effective similarity metric in the non-orthogonal input space. The basic idea of our proposed metric is that the similarity of features should affect the similarity of objects, and vice versa. A novel iterative algorithm for computing non-orthogonal space similarity measures is then proposed. Experimental results on a synthetic dataset, real MSN search click-through logs, and the 20NG dataset show that our algorithm outperforms the traditional Cosine similarity and is superior to LSI.

45 citations
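
The core idea, that feature similarity and object similarity should reinforce each other, can be pictured as a coupled iterative update over an object-by-feature matrix. The sketch below is one generic way to realize that iteration (a SimRank-style propagation), not the paper's exact update rule; the toy matrix is made up.

import numpy as np

def iterative_similarity(X, n_iter=10, decay=0.8):
    """Coupled object/feature similarity on an (objects x features) matrix X.

    Each round, object similarity is propagated through shared features and
    feature similarity through shared objects; diagonals are pinned to 1.
    """
    # row-normalize both orientations so the propagation stays bounded
    R = X / np.maximum(X.sum(axis=1, keepdims=True), 1e-12)
    C = X.T / np.maximum(X.T.sum(axis=1, keepdims=True), 1e-12)
    S_obj = np.eye(X.shape[0])
    S_feat = np.eye(X.shape[1])
    for _ in range(n_iter):
        S_obj = decay * R @ S_feat @ R.T
        S_feat = decay * C @ S_obj @ C.T
        np.fill_diagonal(S_obj, 1.0)
        np.fill_diagonal(S_feat, 1.0)
    return S_obj, S_feat

# toy term-document style matrix: documents 0 and 1 share no terms directly,
# but their terms co-occur with document 2's terms, so they still gain similarity
X = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 1.0, 1.0, 0.0]])
S_obj, _ = iterative_similarity(X)
print(np.round(S_obj, 3))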


Journal Article•DOI•
TL;DR: This method enhances the efficiency and the stability of nitrogen removal, and reduces operating costs and construction investment in the process of wastewater treatment.

26 citations


Proceedings Article•DOI•
01 Nov 2004
TL;DR: This paper proposes a method, called principal sparse nonnegative matrix factorization (PSNMF), for learning the associations between itemsets in the form of ratio rules, and provides a support measurement to weigh the importance of each rule for the entire dataset.
Abstract: Association rules are traditionally designed to capture statistical relationships among itemsets in a given database. To additionally capture the quantitative association knowledge, Korn et al. (1998) proposed a paradigm named ratio rules for quantifiable data mining. However, their approach is mainly based on principal component analysis (PCA) and, as a result, it cannot guarantee that the ratio coefficients are nonnegative. This may lead to serious problems in the rules' application. In this paper, we propose a method, called principal sparse nonnegative matrix factorization (PSNMF), for learning the associations between itemsets in the form of ratio rules. In addition, we provide a support measurement to weigh the importance of each rule for the entire dataset.

22 citations
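
As background, plain nonnegative matrix factorization already yields nonnegative basis vectors that can be read as ratio rules over items; PSNMF additionally enforces sparsity and attaches a support measure, neither of which is implemented in the minimal sketch below (standard multiplicative updates on a made-up transaction matrix).

import numpy as np

def nmf(V, rank=2, n_iter=200, eps=1e-9):
    """Multiplicative-update NMF: V (items x transactions) ~ W @ H, with W, H >= 0."""
    rng = np.random.default_rng(0)
    n, m = V.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# toy transaction data: rows are items, columns are baskets (quantities)
V = np.array([[2.0, 4.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 6.0],
              [0.0, 0.0, 1.0, 2.0]])
W, H = nmf(V, rank=2)
# each column of W, rescaled, reads as a nonnegative ratio rule over the items,
# e.g. "item 0 and item 1 are bought in roughly a 2 : 1 ratio"
print(np.round(W / W.max(axis=0), 2))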



Proceedings Article•DOI•
01 Nov 2004
TL;DR: IRC classifies interrelated Web objects by iteratively reinforcing the individual classification results of different object types via their interrelationships, exploiting the full interrelationships between heterogeneous objects on the Web.
Abstract: Most existing categorization algorithms deal with homogeneous Web data objects, and consider interrelated objects merely as additional features when taking the interrelationships with other types of objects into account. However, focusing on any single aspect of these interrelationships and objects does not fully reveal their true categories. In this paper, we propose a categorization algorithm, the iterative reinforcement categorization algorithm (IRC), to exploit the full interrelationships between heterogeneous objects on the Web. IRC attempts to classify the interrelated Web objects by iterative reinforcement between the individual classification results of different types via the interrelationships. Experiments on a clickthrough log dataset from the MSN search engine show that, in terms of the F1 measure, IRC achieves a 26.4% improvement over a pure content-based classification method, a 21% improvement over a query metadata-based method, and a 16.4% improvement over a virtual document-based method. Furthermore, our experiments show that IRC converges rapidly.

Journal Article•DOI•
TL;DR: This paper describes the solution for the protein homology prediction task in the KDD Cup 2004 competition, focusing on making full use of the abundant information within the blocks and on a new technique for reducing and balancing training data to make the support vector machine applicable to this kind of large-scale, imbalanced learning task.
Abstract: This paper describes our solution for the protein homology prediction task in the KDD Cup 2004 competition. This task is modeled as a supervised learning problem with multiple performance metrics. Several key characteristics make the problem both novel and challenging, including the concept of data blocks and the presence of large-scale and imbalanced training data. These features make a naive application of traditional classification algorithms infeasible. Our approach focuses on making full use of the abundant information within the blocks, and on developing a new technique for reducing and balancing the training data to make the support vector machine applicable to this kind of large-scale, imbalanced learning task.
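
The abstract does not describe the reduction technique itself, so the sketch below only illustrates the general recipe of balancing a large, skewed training set before fitting an SVM: random undersampling of the majority class plus class weighting, using scikit-learn. It is an illustrative stand-in, not the authors' method, and the data are synthetic.

import numpy as np
from sklearn.svm import LinearSVC

def undersample(X, y, ratio=3, seed=0):
    """Keep all positives and at most `ratio` times as many randomly chosen negatives."""
    rng = np.random.default_rng(seed)
    pos = np.where(y == 1)[0]
    neg = np.where(y == 0)[0]
    keep_neg = rng.choice(neg, size=min(len(neg), ratio * len(pos)), replace=False)
    idx = np.concatenate([pos, keep_neg])
    return X[idx], y[idx]

# toy imbalanced data: 1000 negatives, 30 positives
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (1000, 10)), rng.normal(1.5, 1, (30, 10))])
y = np.array([0] * 1000 + [1] * 30)

X_bal, y_bal = undersample(X, y)
clf = LinearSVC(class_weight="balanced")   # weight classes inversely to their frequency
clf.fit(X_bal, y_bal)
print(clf.decision_function(X[:5]))        # ranking scores, useful for rank-based metrics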

Journal Article•
TL;DR: In this article, an integrated framework called LEAPS (location estimation and action prediction), jointly developed by Hong Kong University of Science and Technology, and the Institute of Computing, Shanghai, of the Chinese Academy of Sciences, is presented.
Abstract: Location estimation and user behavior recognition are research issues that go hand in hand. In the past, these two issues have been investigated separately. In this paper, we present an integrated framework called LEAPS (location estimation and action prediction), jointly developed by Hong Kong University of Science and Technology, and the Institute of Computing, Shanghai, of the Chinese Academy of Sciences that combines two areas of interest, namely, location estimation and plan recognition, in a coherent whole. Under this framework, we have been carrying out several investigations, including action and plan recognition from low-level signals and location estimation by intelligently selecting access points (AP). Our two-layered model, including a sensor-level model and an action and goal prediction model, allows for future extensions in more advanced features and services.

Book Chapter•DOI•
18 Oct 2004
TL;DR: An integrated framework called LEAPS (location estimation and action prediction), jointly developed by Hong Kong University of Science and Technology and the Institute of Computing, Shanghai, is presented that combines two areas of interest, namely, location estimation and plan recognition, in a coherent whole.
Abstract: Location estimation and user behavior recognition are research issues that go hand in hand. In the past, these two issues have been investigated separately. In this paper, we present an integrated framework called LEAPS (location estimation and action prediction), jointly developed by Hong Kong University of Science and Technology, and the Institute of Computing, Shanghai, of the Chinese Academy of Sciences that combines two areas of interest, namely, location estimation and plan recognition, in a coherent whole. Under this framework, we have been carrying out several investigations, including action and plan recognition from low-level signals and location estimation by intelligently selecting access points (AP). Our two-layered model, including a sensor-level model and an action and goal prediction model, allows for future extensions in more advanced features and services.

Book Chapter•DOI•
26 May 2004
TL;DR: A new prediction model, based on Kolmogorov's backward equations, is presented for predicting when an online customer will leave the current page and which Web page the customer will visit next.
Abstract: This paper presents a new prediction model for predicting when an online customer will leave the current page and which Web page the customer will visit next. The model can also forecast the total number of visits to a given Web page by all incoming users at the same time. The prediction technique can be used as a component in many Web-based applications. The prediction model regards a Web browsing session as a continuous-time Markov process whose transition probability matrix can be computed from Web log data using Kolmogorov's backward equations. The model is tested against real Web-log data, where the scalability and accuracy of our method are analyzed.
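
Concretely, for a continuous-time Markov chain with generator matrix Q, Kolmogorov's backward equation dP(t)/dt = Q P(t) has the solution P(t) = exp(Qt), from which dwell times and next-page probabilities follow. The sketch below uses a small, made-up generator matrix; in the paper the corresponding quantities are estimated from Web log data.

import numpy as np
from scipy.linalg import expm

# illustrative generator matrix Q for pages [home, products, exit]:
# off-diagonal entries are transition rates (per minute), rows sum to zero
Q = np.array([[-1.0,  0.8,  0.2],
              [ 0.5, -1.2,  0.7],
              [ 0.0,  0.0,  0.0]])   # 'exit' is absorbing

# Kolmogorov's backward equation dP/dt = Q P(t) has solution P(t) = expm(Q t)
P_2min = expm(Q * 2.0)
print(np.round(P_2min, 3))           # P_2min[i, j] = P(on page j at t = 2 | on page i now)

# expected time on a page before any transition is -1 / Q[i, i]
print("expected dwell on 'home':", -1.0 / Q[0, 0], "minutes")

# where the visitor goes next from 'home': rates normalized over off-diagonal entries
next_probs = np.array([Q[0, 1], Q[0, 2]]) / -Q[0, 0]
print("next page from 'home':", dict(zip(["products", "exit"], next_probs)))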

Book Chapter•DOI•
TL;DR: This paper explores how to handle case retrieval when the case base is nonlinear in its similarity measurement, a situation in which linear similarity functions result in wrong solutions.
Abstract: Good similarity functions are at the heart of effective case-based reasoning. However, the similarity functions that have been designed so far have been mostly linear, weighted-sum in nature. In this paper, we explore how to handle case retrieval when the case base is nonlinear in its similarity measurement, a situation in which linear similarity functions result in wrong solutions. Our approach is to first transform the case base into a feature space using kernel computation. We perform correlation analysis with the maximum correlation criterion (MCC) in the feature space to find the most important features, from which we construct a feature-space case base. We then solve the new case in the feature space using traditional similarity-based retrieval. We show that for nonlinear case bases, our method results in a performance gain by a large margin. We provide the theoretical foundation and an empirical evaluation to support our observations.

Proceedings Article•DOI•
Yiming Yang, Hui Wang, Lei Li, Tianyi Li, Wen-Min Li, Qiang Yang, Wei Lv, Ping Huang •
26 Aug 2004
TL;DR: An innovative sequential data mining system for mining customers' churning behaviors in the telecommunications industry, which uses a model-based clustering method, extended to handle multi-dimensional data, to automatically and efficiently partition customers according to their behavior.
Abstract: We develop an innovative sequential data mining system for mining customers' churning behaviors in the telecommunications industry. Recently, an increasing number of telecommunications customers have been switching from one service or service provider to another. This phenomenon is called 'churn', and it is a major cause of corporations' loss of profitability. It is important for a telecommunications company to find out the transitional behavior of its customers through data mining. Our approach is to use a model-based clustering method, extended to handle multi-dimensional data, to automatically and efficiently partition customers according to their behavior. We model this problem as a sequential clustering problem, and present an effective solution for the case where the elements in the sequences are of a multi-dimensional nature. We provide theory and algorithms for the task, and empirically demonstrate that the method is effective in mining customer data for the telecommunications industry.

Proceedings Article•DOI•
01 Nov 2004
TL;DR: This work proposes a new approach to clustering high dimensional data based on a novel notion of cluster cores, instead of on nearest neighbors, which outperforms the well-known clustering algorithm, ROCK, with both lower time complexity and higher accuracy.
Abstract: We propose a new approach to clustering high-dimensional data based on a novel notion of cluster cores, instead of on nearest neighbors. A cluster core is a fairly dense group with a maximal number of pairwise similar objects. It represents the core of a cluster, as all objects in the cluster are attracted to it to a great degree. As a result, building clusters from cluster cores achieves high accuracy. Other major characteristics of the approach include: (1) it uses a semantics-based similarity measure; (2) it does not incur the curse of dimensionality and scales linearly with the dimensionality of the data; (3) it outperforms the well-known clustering algorithm ROCK, with both lower time complexity and higher accuracy.
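
To make the notion concrete, a cluster core can be grown greedily by repeatedly adding any object that is similar to every current member above a threshold. The sketch below does exactly that on a toy similarity matrix; the paper's semantics-based similarity measure and its actual core construction are not reproduced.

def grow_core(sim, threshold, seed):
    """Greedily grow a set whose members are all pairwise similar above `threshold`.

    sim: dict-of-dicts of pairwise similarities; seed: the object to start from.
    """
    core = {seed}
    candidates = set(sim.keys()) - core
    changed = True
    while changed:
        changed = False
        for obj in sorted(candidates):
            if all(sim[obj][member] >= threshold for member in core):
                core.add(obj)
                candidates.discard(obj)
                changed = True
    return core

# toy similarity matrix over 4 objects: a, b, c are mutually similar, d is not
sim = {"a": {"a": 1.0, "b": 0.9, "c": 0.8, "d": 0.1},
       "b": {"a": 0.9, "b": 1.0, "c": 0.7, "d": 0.2},
       "c": {"a": 0.8, "b": 0.7, "c": 1.0, "d": 0.3},
       "d": {"a": 0.1, "b": 0.2, "c": 0.3, "d": 1.0}}
print(grow_core(sim, threshold=0.6, seed="a"))   # -> {'a', 'b', 'c'}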

Book Chapter•DOI•
01 Apr 2004
TL;DR: This chapter presents three examples of actionable Web log mining, including an example of applying Web query log knowledge to improve Web search for a search engine application.
Abstract: Every day, popular Websites attract millions of visitors. These visitors leave behind vast amounts of Website traversal information in the form of Web server and query logs. By analyzing these logs, it is possible to discover various kinds of knowledge, which can be applied to improve the performance of Web services. A particularly useful kind of knowledge is knowledge that can be immediately applied to the operation of the Websites; we call this type of knowledge actionable knowledge. In this chapter, we present three examples of actionable Web log mining. The first method is to mine a Web log for Markov models that can be used for improving the caching and prefetching of Web objects. A second method is to use the mined knowledge for building better, adaptive user interfaces; the new user interface can adjust as the user's behavior changes over time. Finally, we present an example of applying Web query log knowledge to improving Web search for a search engine application.
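
The first of the three examples, mining a Web log for Markov models used in caching and prefetching, can be pictured with a first-order model that prefetches any page whose transition probability from the current page clears a threshold. The sketch below is a generic illustration with made-up sessions, not the chapter's exact models.

from collections import defaultdict

def build_markov_model(sessions):
    """First-order Markov model: P(next page | current page), estimated from log sessions."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in sessions:
        for cur, nxt in zip(s, s[1:]):
            counts[cur][nxt] += 1
    return {page: {n: c / sum(nexts.values()) for n, c in nexts.items()}
            for page, nexts in counts.items()}

def pages_to_prefetch(model, current_page, min_prob=0.4):
    """Prefetch every successor whose transition probability clears the threshold."""
    return [p for p, prob in model.get(current_page, {}).items() if prob >= min_prob]

sessions = [["/home", "/news", "/sports"],
            ["/home", "/news", "/weather"],
            ["/home", "/mail"]]
model = build_markov_model(sessions)
print(pages_to_prefetch(model, "/home"))   # ['/news']: fetched into the cache in advance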

Journal Article•DOI•
TL;DR: This special issue of IEEE Intelligent Systems features five articles that address the problem of actionable Web mining.
Abstract: The Web, with its resources and users, offers a wealth of information for data mining and knowledge discovery. Up to now, a great deal of work has been done applying data mining and machine learning methods to discover novel and useful knowledge on the Web. However, many techniques aim only at extracting knowledge for human users to view and use. Recently, more and more work addresses mining the Web for knowledge that computer systems themselves will use. You can apply such actionable knowledge back to the Web for measurable performance improvements. This special issue of IEEE Intelligent Systems features five articles that address the problem of actionable Web mining.

Book Chapter•DOI•
09 Aug 2004
TL;DR: An approach to utilizing the correlative information among features to compute the similarity of cases for case retrieval is provided by extending the dot product-based linear similarity measures to their nonlinear versions with kernel functions.
Abstract: Case retrieval in case-based reasoning relies heavily on the design of a good similarity function. This paper provides an approach to utilizing the correlative information among features to compute the similarity of cases for case retrieval. This is achieved by extending the dot product-based linear similarity measures to their nonlinear versions with kernel functions. An application to the peptide retrieval problem in bioinformatics shows the effectiveness of the approach. In this problem, the objective is to retrieve the peptide corresponding to the input tandem mass spectrum from a large database of known peptides. By using a kernel function to implicitly map the tandem mass spectrum to a high-dimensional space, the correlative information among fragment ions in a tandem mass spectrum can be modeled to dramatically reduce stochastic mismatches. The experiment on a real spectra dataset shows a significant reduction of 10% in the error rate as compared to a common linear similarity function.
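
In general terms, kernelizing a dot-product similarity simply means scoring a query case against stored cases with a kernel function instead of the raw inner product. The sketch below does this with a Gaussian (RBF) kernel on generic feature vectors; the specific kernel the paper designs for tandem mass spectra is not reproduced, and all data are synthetic.

import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """Gaussian kernel: a nonlinear similarity that implicitly maps to a feature space."""
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

def retrieve(case_base, query, k=3, kernel=rbf_kernel):
    """Return the indices of the k cases most similar to the query under the kernel."""
    scores = [kernel(case, query) for case in case_base]
    return sorted(range(len(case_base)), key=lambda i: -scores[i])[:k]

rng = np.random.default_rng(0)
case_base = rng.normal(size=(20, 8))        # e.g. binned spectrum vectors of known peptides
query = case_base[7] + 0.05 * rng.normal(size=8)
print(retrieve(case_base, query, k=3))      # case 7 should rank first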

01 Jan 2004
TL;DR: In this paper, the authors used a support vector machine (SVM) to solve the protein homology prediction task in the KDD Cup 2004 competition, which was modeled as a supervised learning problem with multiple performance metrics.
Abstract: This paper describes our solution for the protein homology prediction task in the KDD Cup 2004 competition. This task is modeled as a supervised learning problem with multiple performance metrics. Several key characteristics make the problem both novel and challenging, including the concept of data blocks and the presence of large-scale and imbalanced training data. These features make a naive application of traditional classification algorithms infeasible. Our approach focuses on making full use of the abundant information within the blocks, and on developing a new technique for reducing and balancing the training data to make the support vector machine applicable to this kind of large-scale, imbalanced learning task.

Proceedings Article•DOI•
06 Dec 2004
TL;DR: An adaptive CBR model that can learn continually by detecting feedback from the outside is constructed to enhance the system's ability to solve problems in a dynamic environment.
Abstract: Adaptation is one of the necessary capabilities of any expert system. In a traditional expert system, the evolving environment is often treated from a static viewpoint, and the system responds to change passively. Our focus in this paper is to construct an adaptive CBR model that can learn continually by detecting feedback from the outside, partially alleviating this limitation. The knowledge base is improved gradually so as to enhance the system's ability to solve problems in a dynamic environment.