Showing papers by "Koji Tsuda" published in 2010


Proceedings Article
31 Oct 2010
TL;DR: A novel method based on blockwise masked sorting that can dramatically reduce the number of candidate pairs that must be verified by distance calculation, in exchange for more sorting operations; it is especially attractive for high-dimensional dense data, where distance calculation is expensive.
Abstract: To save memory and improve speed, vectorial data such as images and signals are often represented as strings of discrete symbols (i.e., sketches). Charikar (2002) proposed a fast approximate method for finding neighbor pairs of strings by sorting and scanning with a small window. This method, which we shall call “single sorting”, is applied to locality-sensitive codes and is widely used in speed-demanding web-related applications. To improve on single sorting, we propose a novel method that employs blockwise masked sorting. Our method can dramatically reduce the number of candidate pairs that have to be verified by distance calculation, in exchange for an increased number of sorting operations. It is therefore especially attractive for high-dimensional dense data, where distance calculation is expensive. Empirical results show the efficiency of our method in comparison to single sorting and recent fast nearest neighbor methods.
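The sort-and-scan idea that the abstract calls single sorting can be illustrated with a minimal sketch. This is not the authors' implementation and does not cover blockwise masked sorting; the function names, the `window` and `max_dist` parameters, and the toy bit strings are assumptions made for illustration only.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def single_sorting_pairs(sketches, window=2, max_dist=1):
    """Approximate neighbor search: sort the sketches, then verify only
    pairs that fall within `window` positions of each other in sorted order.
    Returns index pairs (i, j) with i < j and Hamming distance <= max_dist."""
    # Sort indices lexicographically by their sketch string.
    order = sorted(range(len(sketches)), key=lambda i: sketches[i])
    found = set()
    for pos, i in enumerate(order):
        # Candidates are the next `window` entries in the sorted list.
        for j in order[pos + 1 : pos + 1 + window]:
            # Each candidate pair is verified by an explicit distance calculation.
            if hamming(sketches[i], sketches[j]) <= max_dist:
                found.add((min(i, j), max(i, j)))
    return found


# Hypothetical toy sketches (4-bit strings).
sketches = ["0000", "1111", "0001", "1110", "0111"]
narrow = single_sorting_pairs(sketches, window=1, max_dist=1)
wide = single_sorting_pairs(sketches, window=2, max_dist=1)
print(sorted(narrow))
print(sorted(wide))
```

With `window=1` the pair ("1111", "0111") is missed because the two strings differ in their leading symbol and are not adjacent after sorting, which shows why the method is approximate and why the window size trades candidate count against recall.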

20 citations


Journal ArticleDOI
TL;DR: Reaction graph kernels are a new metric for comparing enzymatic reactions; they compute the similarity between two chemical reactions by considering the similarity of the chemical compounds in the reactions and their relationships.
Abstract: Background: Understanding secondary metabolic pathways in plants is essential for finding druggable candidate enzymes. However, there are many enzymes whose functions in organism-specific metabolic pathways have not yet been discovered. Towards identifying the functions of those enzymes, the assignment of EC numbers to the enzymatic reactions they catalyze plays a key role, since EC numbers represent the categorization of enzymes on the one hand and the categorization of enzymatic reactions on the other.

15 citations


Journal ArticleDOI
TL;DR: Experimental results show that the Cartesian kernel is much faster than the Kronecker kernel and, at the same time, competitive with the Kronecker kernel in predictive performance.
Abstract: Pairwise classification has many applications including network prediction, entity resolution, and collaborative filtering. The pairwise kernel has been proposed for those purposes by several research groups independently, and has been used successfully in several fields. In this paper, we propose an efficient alternative which we call a Cartesian kernel. While the existing pairwise kernel (which we refer to as the Kronecker kernel) can be interpreted as the weighted adjacency matrix of the Kronecker product graph of two graphs, the Cartesian kernel can be interpreted as that of the Cartesian graph, which is more sparse than the Kronecker product graph. We discuss the generalization bounds of the two pairwise kernels by using eigenvalue analysis of the kernel matrices. Also, we consider the N-wise extensions of the two pairwise kernels. Experimental results show the Cartesian kernel is much faster than the Kronecker kernel, and at the same time, competitive with the Kronecker kernel in predictive performance.
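The sparsity difference between the two product graphs can be made concrete with a small sketch. It assumes the standard constructions: the Kronecker product graph has weights K1[a][c] * K2[b][d], and the Cartesian product graph has weights K1[a][c] * [b = d] + [a = c] * K2[b][d]. Whether the paper's kernels match these formulas exactly is an assumption here; the helper names and the toy matrices are hypothetical.

```python
def kronecker_kernel(K1, K2):
    """Pairwise kernel matrix over pairs (a, b): entry is K1[a][c] * K2[b][d]."""
    n1, n2 = len(K1), len(K2)
    return [[K1[a][c] * K2[b][d] for c in range(n1) for d in range(n2)]
            for a in range(n1) for b in range(n2)]


def cartesian_kernel(K1, K2):
    """Assumed Cartesian form: K1[a][c] if b == d, plus K2[b][d] if a == c."""
    n1, n2 = len(K1), len(K2)
    return [[K1[a][c] * (b == d) + (a == c) * K2[b][d]
             for c in range(n1) for d in range(n2)]
            for a in range(n1) for b in range(n2)]


def nonzeros(M):
    """Count the nonzero entries of a matrix given as a list of rows."""
    return sum(1 for row in M for v in row if v != 0)


# Hypothetical base kernels on 2 and 3 objects (all entries nonzero).
K1 = [[1.0, 0.5], [0.5, 1.0]]
K2 = [[1.0, 0.2, 0.1], [0.2, 1.0, 0.3], [0.1, 0.3, 1.0]]

kron = kronecker_kernel(K1, K2)
cart = cartesian_kernel(K1, K2)
print(nonzeros(kron), nonzeros(cart))
```

In this toy case every entry of the 6 x 6 Kronecker matrix is nonzero, while the Cartesian matrix is nonzero only where the two pairs share at least one element, which is the kind of sparsity the abstract refers to.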

15 citations



Book ChapterDOI
01 Dec 2010
TL;DR: In the first step of the drug discovery process, a large number of lead compounds are found by high-throughput screening, and SAR and QSAR analyses are commonly applied to identify the physicochemical properties of the lead compounds.
Abstract: In the first step of the drug discovery process, a large number of lead compounds are found by high-throughput screening. To identify the physicochemical properties of the lead compounds, SAR and QSAR analyses are commonly applied (Gasteiger & Engel, 2003). In machine learning terminology, SAR is understood as a classification task where a chemical compound is given as an input, and the learning machine predicts the value of a binary output variable indicating the activity. In QSAR, the output variable is real-valued and it is a regression task. For accurate prediction, numerical features that characterize physicochemical properties are ...

5 citations


Proceedings ArticleDOI
01 Jan 2010
TL;DR: The results showed that combinations detected by IT included non-switching combinations, and that Pearson was easily affected by outliers while Spearman and the midcorrelation seemed likely to avoid them.
Abstract: We address the problem of detecting a switching mechanism in gene expression, where two genes are positively correlated under one experimental condition while they are negatively correlated under another. We compare the performance of existing methods for this problem, roughly divided into two types: the interaction test (IT) and the difference of correlation coefficients. The interaction test, currently a standard approach for detecting epistasis in genetics, is the log-likelihood ratio test between two logistic regressions with and without an interaction term, which checks the strength of the interaction between two genes. On the other hand, two correlation coefficients can be computed for the two experimental conditions, and their difference shows the alteration of expression trends in a more straightforward manner. In our experiments, we tested three different types of correlation coefficients: Pearson, Spearman and a midcorrelation (biweight midcorrelation). The experiment was performed using ~2.3 × 10^9 combinations selected from the GEO (Gene Expression Omnibus) database. We sorted all combinations according to the p-values of IT or by the absolute values of the difference of correlation coefficients and then visually evaluated the top-ranked combinations in terms of the switching mechanism. The results showed that 1) combinations detected by IT included non-switching combinations and 2) Pearson was easily affected by outliers while Spearman and the midcorrelation seemed likely to avoid them.
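The difference-of-correlations score described in the abstract can be sketched as follows. This is a minimal illustration under assumptions, not the authors' pipeline: the function names, the toy expression values, and the use of average ranks for ties are choices made here, and the biweight midcorrelation and the interaction test are not covered.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(v):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def switching_score(x1, y1, x2, y2, corr=pearson):
    """Absolute difference of the correlations under two conditions."""
    return abs(corr(x1, y1) - corr(x2, y2))


# Hypothetical expression values of two genes under conditions A and B.
gene_x = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_y_a = [2.0, 4.0, 6.0, 8.0, 10.0]   # rises with gene_x in condition A
gene_y_b = [10.0, 8.0, 6.0, 4.0, 2.0]   # falls with gene_x in condition B
score = switching_score(gene_x, gene_y_a, gene_x, gene_y_b)

# A decreasing trend with a single extreme value at the end.
gene_y_out = [5.0, 4.0, 3.0, 2.0, 100.0]
print(score, pearson(gene_x, gene_y_out), spearman(gene_x, gene_y_out))
```

In the last toy series a single extreme value pulls the Pearson coefficient strongly positive while the rank-based Spearman coefficient stays at zero, consistent with the abstract's observation about outlier sensitivity.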

1 citation