Papers by Koji Tsuda published in 2010

Open Access

Proceedings Article

Single versus Multiple Sorting in All Pairs Similarity Search

Yasuo Tabei, Takeaki Uno, Masashi Sugiyama, Koji Tsuda

31 Oct 2010

TL;DR: A novel method that employs blockwise masked sorting to dramatically reduce the number of candidate pairs that must be verified by distance calculation, at the cost of more sorting operations; it is suited to high-dimensional dense data, where distance calculation is expensive.

Abstract: To save memory and improve speed, vectorial data such as images and signals are often represented as strings of discrete symbols (i.e., sketches). Charikar (2002) proposed a fast approximate method for finding neighbor pairs of strings by sorting them and scanning with a small window. This method, which we call “single sorting”, is applied to locality-sensitive codes and is widely used in speed-demanding web applications. To improve on single sorting, we propose a novel method that employs blockwise masked sorting. Our method can dramatically reduce the number of candidate pairs that have to be verified by distance calculation, in exchange for more sorting operations. It is therefore especially attractive for high-dimensional dense data, where distance calculation is expensive. Empirical results show the efficiency of our method in comparison with single sorting and recent fast nearest neighbor methods.
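
The contrast between single sorting and blockwise masked sorting can be illustrated with a small Python sketch. It assumes binary sketches stored as equal-length strings and Hamming distance as the verification step; the function names and the parameters `window`, `n_blocks` and `max_dist` are hypothetical, and the masking scheme is only our reading of the pigeonhole idea behind blockwise masking, not the authors' implementation. If two sketches differ in at most `max_dist` positions and are cut into `n_blocks > max_dist` blocks, at least one choice of `max_dist` masked blocks leaves all remaining blocks identical, so sorting once per choice and grouping equal keys yields every such pair as a candidate.

```python
import itertools
import random


def hamming(a, b):
    """Number of positions at which two equal-length sketches differ."""
    return sum(x != y for x, y in zip(a, b))


def single_sorting(sketches, window):
    """Charikar-style single sorting: sort the sketches once and pair each
    sketch with the next window - 1 sketches in sorted order (approximate)."""
    order = sorted(range(len(sketches)), key=lambda i: sketches[i])
    candidates = set()
    for pos, i in enumerate(order):
        for j in order[pos + 1:pos + window]:
            candidates.add((min(i, j), max(i, j)))
    return candidates


def masked_sorting(sketches, n_blocks, max_dist):
    """Candidate pairs from one sort per choice of max_dist masked blocks."""
    if n_blocks <= max_dist:
        raise ValueError("n_blocks must exceed max_dist")
    length = len(sketches[0])
    blocks = [(k * length // n_blocks, (k + 1) * length // n_blocks)
              for k in range(n_blocks)]
    candidates = set()
    for masked in itertools.combinations(range(n_blocks), max_dist):
        kept = [blocks[k] for k in range(n_blocks) if k not in masked]

        def key(i, kept=kept):
            # Sort key: the concatenation of the unmasked blocks only.
            return tuple(sketches[i][s:e] for s, e in kept)

        order = sorted(range(len(sketches)), key=key)
        # After sorting, sketches that agree on all unmasked blocks are adjacent.
        for _, run in itertools.groupby(order, key=key):
            for i, j in itertools.combinations(run, 2):
                candidates.add((min(i, j), max(i, j)))
    return candidates


def verify(sketches, candidates, max_dist):
    """Keep only candidate pairs whose true Hamming distance is <= max_dist."""
    return {(i, j) for i, j in candidates
            if hamming(sketches[i], sketches[j]) <= max_dist}


if __name__ == "__main__":
    random.seed(0)
    data = ["".join(random.choice("01") for _ in range(16)) for _ in range(200)]
    exact = verify(data, masked_sorting(data, n_blocks=4, max_dist=2), max_dist=2)
    approx = verify(data, single_sorting(data, window=5), max_dist=2)
    # Single sorting recovers a subset of the pairs found with masking.
    print(len(exact), len(approx))
```

In this sketch, `masked_sorting` pays for `C(n_blocks, max_dist)` sorts with a candidate set that is guaranteed to contain every pair within `max_dist`, whereas single sorting uses one sort but can miss close pairs that do not land near each other in lexicographic order. Expanding all pairs inside a run is quadratic in the run length, which a practical implementation would need to control.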

20 citations

Journal Article

Reaction graph kernels predict EC numbers of unknown enzymatic reactions in plant secondary metabolism

Hiroto Saigo¹, Masahiro Hattori², Hisashi Kashima³, Koji Tsuda⁴

Max Planck Society¹, Kyoto University², University of Tokyo³, National Institute of Advanced Industrial Science and Technology⁴

18 Jan 2010, BMC Bioinformatics

TL;DR: Reaction graph kernels are a new similarity measure for enzymatic reactions: they compute the similarity between two chemical reactions by considering both the similarity of the compounds involved and the relationships among them.

Abstract: Background: Understanding secondary metabolic pathways in plants is essential for finding druggable candidate enzymes. However, the functions of many enzymes in organism-specific metabolic pathways have not yet been determined. Assigning EC numbers to the enzymatic reactions these enzymes catalyze plays a key role in identifying their functions, since EC numbers categorize both enzymes and enzymatic reactions.
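
To make the idea of comparing reactions through compound similarity concrete, here is a deliberately simplified Python sketch. It is not the reaction graph kernel of the paper: it assumes each compound is given as a set of fingerprint bits, scores compounds with the Tanimoto similarity, and sums those scores over all substrate pairs and all product pairs, ignoring the relationships between compounds that the reaction graph kernel also takes into account. The fingerprints, the placeholder labels and the 1-nearest-neighbour rule for assigning a class are hypothetical.

```python
from itertools import product


def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity of two fingerprint bit sets."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 1.0


def reaction_kernel(r1, r2):
    """Hypothetical set kernel: sum of compound similarities over all
    substrate pairs plus all product pairs (compound relationships ignored)."""
    subs = sum(tanimoto(a, b) for a, b in product(r1["substrates"], r2["substrates"]))
    prods = sum(tanimoto(a, b) for a, b in product(r1["products"], r2["products"]))
    return subs + prods


def normalised_kernel(r1, r2):
    """Cosine-normalised kernel, so every reaction has self-similarity 1.
    Assumes each reaction has at least one substrate or product."""
    return reaction_kernel(r1, r2) / (reaction_kernel(r1, r1) * reaction_kernel(r2, r2)) ** 0.5


def predict_label(query, labelled):
    """1-nearest-neighbour label assignment under the normalised kernel."""
    _, label = max(labelled, key=lambda item: normalised_kernel(query, item[0]))
    return label


if __name__ == "__main__":
    # Toy fingerprints and placeholder labels, for illustration only.
    ref_a = ({"substrates": [frozenset({1, 2, 3})], "products": [frozenset({1, 2, 4})]}, "label_A")
    ref_b = ({"substrates": [frozenset({7, 8, 9})], "products": [frozenset({7, 9})]}, "label_B")
    query = {"substrates": [frozenset({1, 2, 3, 5})], "products": [frozenset({1, 4})]}
    print(predict_label(query, [ref_a, ref_b]))
```

A sum over all compound pairs is used here because it keeps the reaction-level score a valid kernel whenever the compound-level similarity is one; the relational information that distinguishes the paper's method would have to be added on top of this.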

15 citations

Journal Article

Cartesian Kernel: An Efficient Alternative to the Pairwise Kernel

Hisashi Kashima¹, Satoshi Oyama², Yoshihiro Yamanishi³, Koji Tsuda⁴

University of Tokyo¹, Hokkaido University², Mines ParisTech³, National Institute of Advanced Industrial Science and Technology⁴

01 Oct 2010, IEICE Transactions on Information and Systems

TL;DR: Experimental results show that the Cartesian kernel is much faster than the Kronecker kernel while remaining competitive with it in predictive performance.

Abstract: Pairwise classification has many applications, including network prediction, entity resolution, and collaborative filtering. The pairwise kernel has been proposed for these purposes independently by several research groups and has been used successfully in several fields. In this paper, we propose an efficient alternative, which we call the Cartesian kernel. While the existing pairwise kernel (which we refer to as the Kronecker kernel) can be interpreted as the weighted adjacency matrix of the Kronecker product graph of two graphs, the Cartesian kernel can be interpreted as that of the Cartesian product graph, which is sparser than the Kronecker product graph. We discuss generalization bounds for the two pairwise kernels using eigenvalue analysis of the kernel matrices. We also consider N-wise extensions of the two pairwise kernels. Experimental results show that the Cartesian kernel is much faster than the Kronecker kernel while remaining competitive with it in predictive performance.
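
The two pairwise kernels can be compared numerically. The sketch below assumes, for illustration, that a pair (a, b) takes a from one domain with base Gram matrix `K1` and b from another with base Gram matrix `K2`, and that pairs are indexed as `a * n2 + b`. Under that convention the Kronecker kernel is `K1 ⊗ K2` and the Cartesian kernel is the Kronecker sum `K1 ⊗ I + I ⊗ K2`, mirroring the adjacency matrices of the Kronecker product graph and the Cartesian product graph. The RBF base kernel and the helper names are hypothetical choices, not taken from the paper.

```python
import numpy as np


def rbf_gram(X, gamma=1.0):
    """Gaussian (RBF) Gram matrix of the rows of X; a hypothetical base kernel."""
    sq = np.sum(X ** 2, axis=1)
    dist2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-gamma * dist2)


def kronecker_pairwise(K1, K2):
    """K((a, b), (c, d)) = K1[a, c] * K2[b, d], with pair (a, b) at row a * n2 + b."""
    return np.kron(K1, K2)


def cartesian_pairwise(K1, K2):
    """K((a, b), (c, d)) = K1[a, c] * [b == d] + [a == c] * K2[b, d]."""
    n1, n2 = K1.shape[0], K2.shape[0]
    return np.kron(K1, np.eye(n2)) + np.kron(np.eye(n1), K2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K1 = rbf_gram(rng.normal(size=(5, 3)))
    K2 = rbf_gram(rng.normal(size=(4, 3)))
    Kk, Kc = kronecker_pairwise(K1, K2), cartesian_pairwise(K1, K2)
    # Dense base kernels: every Kronecker entry is nonzero, but the Cartesian
    # kernel is nonzero only for pairs that share one element.
    print(np.count_nonzero(Kk), np.count_nonzero(Kc))  # 400 160
```

With dense base kernels the Cartesian Gram matrix has at most n1²·n2 + n1·n2² − n1·n2 nonzero entries instead of (n1·n2)², and its eigenvalues are the sums λᵢ + μⱼ of the base-kernel eigenvalues rather than the products λᵢμⱼ; this gives one way to place the two kernels side by side in an eigenvalue analysis.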

15 citations

Book Chapter

Graph kernels for chemoinformatics

Hisashi Kashima¹, Hiroto Saigo², Masahiro Hattori³,⁴, Koji Tsuda⁵

University of Tokyo¹, Kyushu Institute of Technology², Kyoto University³, Tokyo University of Technology⁴, National Institute of Advanced Industrial Science and Technology⁵

01 Dec 2010

13 citations

Book Chapter

Graph Mining in Chemoinformatics

Hiroto Saigo¹,², Koji Tsuda³

Max Planck Society¹, Kyushu Institute of Technology², National Institute of Advanced Industrial Science and Technology³

01 Dec 2010

TL;DR: In the first step of the drug discovery process, a large number of lead compounds are found by high-throughput screening, and SAR and QSAR analyses are commonly applied to identify the physicochemical properties of these lead compounds.

Abstract: In the first step of the drug discovery process, a large number of lead compounds are found by high-throughput screening. To identify the physicochemical properties of the lead compounds, SAR and QSAR analyses are commonly applied (Gasteiger & Engel, 2003). In machine learning terminology, SAR is a classification task: a chemical compound is given as input, and the learning machine predicts the value of a binary output variable indicating activity. In QSAR, the output variable is real-valued, making it a regression task. For accurate prediction, numerical features that characterize physicochemical properties are …
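
Turning molecular graphs into the numerical features that SAR and QSAR models need is where graph mining comes in. As a toy stand-in for frequent subgraph mining, the Python sketch below treats each molecule as a heavy-atom graph (hydrogens omitted), enumerates labelled single-edge patterns such as `C=O`, keeps those that occur in at least `min_support` molecules, and encodes each molecule as a binary presence vector. Real miners (for example gSpan) enumerate connected subgraphs of any size; the molecule encoding, the function names and the `min_support` value are assumptions for illustration, not the chapter's method. The resulting vectors could then feed a classifier (SAR) or a regressor (QSAR).

```python
from collections import Counter


def edge_patterns(molecule):
    """Set of labelled-edge patterns such as 'C-O' or 'C=O' in one molecule.
    A molecule is (atom_labels, [(u, v, bond_symbol), ...])."""
    atoms, bonds = molecule
    patterns = set()
    for u, v, bond in bonds:
        a, b = sorted((atoms[u], atoms[v]))  # canonical order of end labels
        patterns.add(f"{a}{bond}{b}")
    return patterns


def frequent_pattern_features(molecules, min_support):
    """Binary feature vectors over patterns present in >= min_support molecules."""
    per_mol = [edge_patterns(m) for m in molecules]
    support = Counter(p for pats in per_mol for p in pats)
    vocab = sorted(p for p, count in support.items() if count >= min_support)
    vectors = [[int(p in pats) for p in vocab] for pats in per_mol]
    return vocab, vectors


if __name__ == "__main__":
    ethanol = (["C", "C", "O"], [(0, 1, "-"), (1, 2, "-")])
    acetic_acid = (["C", "C", "O", "O"], [(0, 1, "-"), (1, 2, "="), (1, 3, "-")])
    acetone = (["C", "C", "C", "O"], [(0, 1, "-"), (1, 2, "-"), (1, 3, "=")])
    vocab, X = frequent_pattern_features([ethanol, acetic_acid, acetone], min_support=2)
    print(vocab)  # ['C-C', 'C-O', 'C=O']
    print(X)      # [[1, 1, 0], [1, 1, 1], [1, 0, 1]]
```

Raising `min_support` shrinks the vocabulary to patterns shared by more compounds, which is the usual trade-off between feature coverage and the cost of mining.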

5 citations

Proceedings Article

On the performance of methods for finding a switching mechanism in gene expression.

Mitsunori Kayano¹, Ichigaku Takigawa, Motoki Shiga, Koji Tsuda, Hiroshi Mamitsuka

Kyoto University¹

01 Jan 2010

TL;DR: The results showed that the combinations detected by IT included non-switching combinations, and that Pearson correlation was easily affected by outliers, whereas Spearman correlation and the midcorrelation appeared to avoid them.

Abstract: We address the problem of detecting a switching mechanism in gene expression, where two genes are positively correlated under one experimental condition but negatively correlated under another. We compare the performance of existing methods for this problem, which fall roughly into two types: the interaction test (IT) and the difference of correlation coefficients. The interaction test, currently a standard approach for detecting epistasis in genetics, is a log-likelihood ratio test between two logistic regressions, one with and one without an interaction term; it thus measures the strength of the interaction between two genes. Alternatively, a correlation coefficient can be computed for each of the two experimental conditions, and their difference reflects the change in expression trends more directly. In our experiments, we tested three types of correlation coefficients: Pearson, Spearman, and a midcorrelation (biweight midcorrelation). The experiments used ~2.3 × 10⁹ combinations selected from the GEO (Gene Expression Omnibus) database. We ranked all combinations by the p-values of IT or by the absolute difference of correlation coefficients, and then visually evaluated the top-ranked combinations in terms of the switching mechanism. The results showed that 1) combinations detected by IT included non-switching combinations, and 2) Pearson correlation was easily affected by outliers, whereas Spearman correlation and the midcorrelation appeared to avoid them.
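
The correlation-difference approach is straightforward to sketch. The Python code below, assuming NumPy and no tied values (so that ranks can be computed with a double `argsort`), computes Pearson, Spearman and biweight midcorrelation and scores a gene pair by the absolute difference of a chosen coefficient between two conditions. The biweight midcorrelation follows the common median/MAD definition with tuning constant 9; the function names and the toy data are hypothetical, and the interaction test is not shown.

```python
import numpy as np


def pearson(x, y):
    return float(np.corrcoef(x, y)[0, 1])


def _ranks(v):
    """0-based ranks via double argsort; assumes no ties."""
    return np.argsort(np.argsort(v))


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of ranks."""
    return pearson(_ranks(x), _ranks(y))


def _biweight_center(v, c=9.0):
    """Median-centred values with Tukey biweights; assumes MAD > 0."""
    v = np.asarray(v, dtype=float)
    med = np.median(v)
    mad = np.median(np.abs(v - med))
    u = (v - med) / (c * mad)
    weights = (1 - u ** 2) ** 2 * (np.abs(u) < 1)  # zero weight when |u| >= 1
    return (v - med) * weights


def midcorrelation(x, y):
    """Biweight midcorrelation of x and y."""
    a, b = _biweight_center(x), _biweight_center(y)
    return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))


def switching_score(xa, ya, xb, yb, corr=pearson):
    """|corr under condition A - corr under condition B|; large values hint at a switch."""
    return abs(corr(xa, ya) - corr(xb, yb))


if __name__ == "__main__":
    x = np.arange(1, 11, dtype=float)
    print(switching_score(x, x, x, -x))  # perfect switch from +1 to -1
    # A strongly negative trend plus one extreme outlier at (100, 100).
    xo = np.append(x, 100.0)
    yo = np.append(11.0 - x, 100.0)
    for f in (pearson, spearman, midcorrelation):
        print(f.__name__, round(f(xo, yo), 2))
```

On this toy data the single extreme point turns Pearson strongly positive, while Spearman stays negative and the midcorrelation, which gives the outlier zero weight, stays close to the underlying negative trend. This only illustrates, on made-up numbers, the kind of outlier sensitivity the abstract describes.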

1 citation