Showing papers by "IBM published in 2007"

PDF

Open Access

Journal Article•DOI•

[...]

Xindong Wu¹, Vipin Kumar², J. Ross Quinlan, Joydeep Ghosh³, Qiang Yang⁴, Hiroshi Motoda⁵, Geoffrey J. McLachlan⁶, Angus S. K. Ng⁷, Bing Liu⁸, Philip S. Yu⁹, Zhi-Hua Zhou¹⁰, Michael Steinbach², David J. Hand¹¹, Dan Steinberg¹² - Show less +10 more•Institutions (12)

University of Vermont¹, University of Minnesota², University of Texas at Austin³, Hong Kong University of Science and Technology⁴, Osaka University⁵, University of Queensland⁶, Griffith University⁷, University of Illinois at Chicago⁸, IBM⁹, Nanjing University¹⁰, Imperial College London¹¹, University of Salford¹²

19 Dec 2007-Knowledge and Information Systems

TL;DR: This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART.

...read moreread less

Abstract: This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART. These top 10 algorithms are among the most influential data mining algorithms in the research community. With each algorithm, we provide a description of the algorithm, discuss the impact of the algorithm, and review current and further research on the algorithm. These 10 algorithms cover classification, clustering, statistical learning, association analysis, and link mining, which are all among the most important topics in data mining research and development.

...read moreread less

4,944 citations

Journal Article•DOI•

Carbon-based electronics.

[...]

Phaedon Avouris¹, Zhihong Chen¹, Vasili Perebeinos¹•Institutions (1)

IBM¹

30 Sep 2007-Nature Nanotechnology

TL;DR: This work reviews the progress that has been made with carbon nanotubes and, more recently, graphene layers and nanoribbons and suggests that it could be possible to make both electronic and optoelectronic devices from the same material.

...read moreread less

Abstract: The semiconductor industry has been able to improve the performance of electronic systems for more than four decades by making ever-smaller devices. However, this approach will soon encounter both scientific and technical limits, which is why the industry is exploring a number of alternative device technologies. Here we review the progress that has been made with carbon nanotubes and, more recently, graphene layers and nanoribbons. Field-effect transistors based on semiconductor nanotubes and graphene nanoribbons have already been demonstrated, and metallic nanotubes could be used as high-performance interconnects. Moreover, owing to the excellent optical properties of nanotubes it could be possible to make both electronic and optoelectronic devices from the same material.

...read moreread less

2,274 citations

Journal Article•DOI•

Graphene nano-ribbon electronics

[...]

Zhihong Chen¹, Yu-Ming Lin¹, Michael J. Rooks¹, Phaedon Avouris¹•Institutions (1)

IBM¹

01 Dec 2007-Physica E-low-dimensional Systems & Nanostructures

TL;DR: In this article, the electrical properties of nano-ribboned field effect transistor (FE transistor) devices were investigated as a function of ribbon width, and it was shown that the resistivity of a ribbon increases as its width decreases, indicating the impact of edge states.

...read moreread less

Abstract: We have fabricated graphene nano-ribbon field-effect transistor devices and investigated their electrical properties as a function of ribbon width. Our experiments show that the resistivity of a ribbon increases as its width decreases, indicating the impact of edge states. Analysis of temperature-dependent measurements suggests a finite quantum confinement gap opening in narrow ribbons. The electrical current noise of the graphene ribbon devices at low frequency is found to be dominated by the 1/f noise.

...read moreread less

1,506 citations

Journal Article•DOI•

Steps toward a science of service systems

[...]

James C. Spohrer¹, Paul P. Maglio¹, John H. Bailey¹, Daniel Gruhl¹•Institutions (1)

IBM¹

01 Jan 2007-IEEE Computer

TL;DR: A science of service systems could provide theory and practice around service innovation in the service sector.

...read moreread less

Abstract: The service sector accounts for most of the world's economic activity, but it's the least-studied part of the economy. A service system comprises people and technologies that adaptively compute and adjust to a system's changing value of knowledge. A science of service systems could provide theory and practice around service innovation

...read moreread less

1,282 citations

Book•

The Minimum Description Length Principle

[...]

Peter Grünwald¹•Institutions (1)

IBM¹

23 Mar 2007

TL;DR: The minimum description length (MDL) principle as mentioned in this paper is a powerful method of inductive inference, the basis of statistical modeling, pattern recognition, and machine learning, which is particularly well suited for dealing with model selection, prediction, and estimation problems in situations where the models under consideration can be arbitrarily complex, and overfitting the data is a serious concern.

...read moreread less

Abstract: The minimum description length (MDL) principle is a powerful method of inductive inference, the basis of statistical modeling, pattern recognition, and machine learning. It holds that the best explanation, given a limited set of observed data, is the one that permits the greatest compression of the data. MDL methods are particularly well-suited for dealing with model selection, prediction, and estimation problems in situations where the models under consideration can be arbitrarily complex, and overfitting the data is a serious concern.This extensive, step-by-step introduction to the MDL Principle provides a comprehensive reference (with an emphasis on conceptual issues) that is accessible to graduate students and researchers in statistics, pattern classification, machine learning, and data mining, to philosophers interested in the foundations of statistics, and to researchers in other applied sciences that involve model selection, including biology, econometrics, and experimental psychology. Part I provides a basic introduction to MDL and an overview of the concepts in statistics and information theory needed to understand MDL. Part II treats universal coding, the information-theoretic notion on which MDL is built, and part III gives a formal treatment of MDL theory as a theory of inductive inference based on universal coding. Part IV provides a comprehensive overview of the statistical theory of exponential families with an emphasis on their information-theoretic properties. The text includes a number of summaries, paragraphs offering the reader a "fast track" through the material, and boxes highlighting the most important concepts.

...read moreread less

1,270 citations

Journal Article•DOI•

Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite.

[...]

Falk Warnecke¹, Peter Luginbuhl², Natalia Ivanova¹, Majid Ghassemian², Toby Richardson³, Toby Richardson², Justin T. Stege², Michelle H. Cayouette², Alice C. McHardy³, Gordana Djordjevic², Nahla Aboushadi², Rotem Sorek¹, Susannah G. Tringe¹, Mircea Podar⁴, Hector Garcia Martin¹, Victor Kunin¹, Daniel Dalevi¹, Julita Madejska¹, Edward Kirton¹, Darren Platt¹, Ernest Szeto¹, Asaf Salamov¹, Kerrie Barry¹, Natalia Mikhailova¹, Nikos C. Kyrpides¹, Eric G. Matson, Elizabeth A. Ottesen⁵, Xinning Zhang, Myriam Hernandez⁶, Catalina Patricia Morales Murillo⁶, Luis G. Acosta⁶, Isidore Rigoutsos⁷, Giselle Tamayo⁶, Brian D. Green², Cathy Chang², Edward M. Rubin¹, Eric J. Mathur³, Eric J. Mathur², Dan E. Robertson², Philip Hugenholtz¹, Jared R. Leadbetter - Show less +37 more•Institutions (7)

Joint Genome Institute¹, Verenium Corporation², Max Planck Society³, Oak Ridge National Laboratory⁴, California Institute of Technology⁵, Instituto Nacional de Biodiversidad⁶, IBM⁷

22 Nov 2007-Nature

TL;DR: In this article, a metagenomic analysis of the bacterial community resident in the hindgut paunch of a wood-feeding Nasutitermes species (which do not contain cellulose-fermenting protozoa) was performed to show the presence of a large, diverse set of bacterial genes for cellulose and xylan hydrolysis.

...read moreread less

Abstract: From the standpoints of both basic research and biotechnology, there is considerable interest in reaching a clearer understanding of the diversity of biological mechanisms employed during lignocellulose degradation. Globally, termites are an extremely successful group of wood-degrading organisms and are therefore important both for their roles in carbon turnover in the environment and as potential sources of biochemical catalysts for efforts aimed at converting wood into biofuels. Only recently have data supported any direct role for the symbiotic bacteria in the gut of the termite in cellulose and xylan hydrolysis. Here we use a metagenomic analysis of the bacterial community resident in the hindgut paunch of a wood-feeding ‘higher’ Nasutitermes species (which do not contain cellulose-fermenting protozoa) to show the presence of a large, diverse set of bacterial genes for cellulose and xylan hydrolysis. Many of these genes were expressed in vivo or had cellulase activity in vitro, and further analyses implicate spirochete and fibrobacter species in gut lignocellulose degradation. New insights into other important symbiotic functions including H_2 metabolism, CO_2-reductive acetogenesis and N_2 fixation are also provided by this first system-wide gene analysis of a microbial community specialized towards plant lignocellulose degradation. Our results underscore how complex even a 1-μl environment can be.

...read moreread less

1,247 citations

Journal Article•DOI•

Ultracompact optical buffers on a silicon chip

[...]

Fengnian Xia¹, Lidija Sekaric¹, Yurii A. Vlasov¹•Institutions (1)

IBM¹

01 Jan 2007-Nature Photonics

TL;DR: In this paper, the trade-offs between resonantly enhanced group delay, device size, insertion loss and operational bandwidth are analyzed for various delay-line designs, and a large fractional group delay exceeding 10 bits is achieved for bit rates as high as 20 Gbps.

...read moreread less

Abstract: On-chip optical buffers based on waveguide delay lines might have significant implications for the development of optical interconnects in computer systems. Silicon-on-insulator (SOI) submicrometre photonic wire waveguides are used, because they can provide strong light confinement at the diffraction limit, allowing dramatic scaling of device size. Here we report on-chip optical delay lines based on such waveguides that consist of up to 100 microring resonators cascaded in either coupled-resonator or all-pass filter (APF) configurations. On-chip group delays exceeding 500 ps are demonstrated in a device with a footprint below 0.09 mm2. The trade-offs between resonantly enhanced group delay, device size, insertion loss and operational bandwidth are analysed for various delay-line designs. A large fractional group delay exceeding 10 bits is achieved for bit rates as high as 20 Gbps. Measurements of system-level metrics as bit error rates for different bit rates demonstrate error-free operation up to 5 Gbps.

...read moreread less

1,161 citations

Journal Article•DOI•

Logarithmic regret algorithms for online convex optimization

[...]

Elad Hazan¹, Amit Agarwal², Satyen Kale²•Institutions (2)

IBM¹, Princeton University²

01 Dec 2007-Machine Learning

TL;DR: Several algorithms achieving logarithmic regret are proposed, which besides being more general are also much more efficient to implement, and give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field.

...read moreread less

Abstract: In an online convex optimization problem a decision-maker makes a sequence of decisions, i.e., chooses a sequence of points in Euclidean space, from a fixed feasible set. After each point is chosen, it encounters a sequence of (possibly unrelated) convex cost functions. Zinkevich (ICML 2003) introduced this framework, which models many natural repeated decision-making problems and generalizes many existing problems such as Prediction from Expert Advice and Cover's Universal Portfolios. Zinkevich showed that a simple online gradient descent algorithm achieves additive regret $O(\sqrt{T})$ , for an arbitrary sequence of T convex cost functions (of bounded gradients), with respect to the best single decision in hindsight. In this paper, we give algorithms that achieve regret O(log?(T)) for an arbitrary sequence of strictly convex functions (with bounded first and second derivatives). This mirrors what has been done for the special cases of prediction from expert advice by Kivinen and Warmuth (EuroCOLT 1999), and Universal Portfolios by Cover (Math. Finance 1:1---19, 1991). We propose several algorithms achieving logarithmic regret, which besides being more general are also much more efficient to implement. The main new ideas give rise to an efficient algorithm based on the Newton method for optimization, a new tool in the field. Our analysis shows a surprising connection between the natural follow-the-leader approach and the Newton method. We also analyze other algorithms, which tie together several different previous approaches including follow-the-leader, exponential weighting, Cover's algorithm and gradient descent.

...read moreread less

1,124 citations

Journal Article•DOI•

A Survey of Control Issues in Nanopositioning

[...]

Santosh Devasia¹, Evangelos Eleftheriou², S.O.R. Moheimani³•Institutions (3)

University of Washington¹, IBM², University of Newcastle³

27 Aug 2007-IEEE Transactions on Control Systems and Technology

TL;DR: This paper presents an overview of nanopositioning technologies and devices emphasizing the key role of advanced control techniques in improving precision, accuracy, and speed of operation of these systems.

...read moreread less

Abstract: Nanotechnology is the science of understanding matter and the control of matter at dimensions of 100 nm or less. Encompassing nanoscale science, engineering, and technology, nanotechnology involves imaging, measuring, modeling, and manipulation of matter at this level of precision. An important aspect of research in nanotechnology involves precision control and manipulation of devices and materials at a nanoscale, i.e., nanopositioning. Nanopositioners are precision mechatronic systems designed to move objects over a small range with a resolution down to a fraction of an atomic diameter. The desired attributes of a nanopositioner are extremely high resolution, accuracy, stability, and fast response. The key to successful nanopositioning is accurate position sensing and feedback control of the motion. This paper presents an overview of nanopositioning technologies and devices emphasizing the key role of advanced control techniques in improving precision, accuracy, and speed of operation of these systems.

...read moreread less

1,027 citations

Proceedings Article•DOI•

Approximating the Kullback Leibler Divergence Between Gaussian Mixture Models

[...]

John R. Hershey¹, Peder A. Olsen¹•Institutions (1)

IBM¹

15 Apr 2007

TL;DR: Two new methods, the variational approximation and the Variational upper bound, are introduced and compared to existing methods and the benefits of each one are considered and the performance of each is evaluated through numerical experiments.

...read moreread less

Abstract: The Kullback Leibler (KL) divergence is a widely used tool in statistics and pattern recognition. The KL divergence between two Gaussian mixture models (GMMs) is frequently needed in the fields of speech and image recognition. Unfortunately the KL divergence between two GMMs is not analytically tractable, nor does any efficient computational algorithm exist. Some techniques cope with this problem by replacing the KL divergence with other functions that can be computed efficiently. We introduce two new methods, the variational approximation and the variational upper bound, and compare them to existing methods. We discuss seven different techniques in total and weigh the benefits of each one against the others. To conclude we evaluate the performance of each one through numerical experiments.

...read moreread less

998 citations

Journal Article•DOI•

Nonvolatile memory elements based on organic materials

[...]

J. C. Scott¹, L. D. Bozano¹•Institutions (1)

IBM¹

04 Jun 2007-Advanced Materials

TL;DR: In this article, a review of the materials used in switching devices is presented, focusing particularly on the role of filamentary conduction and deliberately introduced or accidental nanoparticles, and the reported device parameters (on-off ratio, on-state current, switching time, retention time, cycling endurance, and rectification) are compared with those that would be necessary for a viable memory technology.

...read moreread less

Abstract: Many organic electronic devices exhibit switching behavior, and have therefore been proposed as the basis for a nonvolatile memory (NVM) technology. This Review summarizes the materials that have been used in switching devices, and describes the variety of device behavior observed in their charge-voltage (capacitive) or current-voltage (resistive) response. A critical summary of the proposed charge-transport mechanisms for resistive switching is given, focusing particularly on the role of filamentary conduction and of deliberately introduced or accidental nanoparticles. The reported device parameters (on-off ratio, on-state current, switching time, retention time, cycling endurance, and rectification) are compared with those that would be necessary for a viable memory technology.

...read moreread less

Patent•

System for rebuilding dispersed data

[...]

Christopher S. Gladwin¹, Matthew M. England¹, Krishna Kapila Lakshmana Harsha Dhanvi Gopala, Zachary J. Mark, Vance Thornton - Show less +1 more•Institutions (1)

IBM¹

22 Mar 2007

TL;DR: In this article, a digital data file storage system is disclosed in which original data files to be stored are dispersed using some form of information dispersal algorithm into a number of file subsets in such a manner that the data in each file share is less usable or less recognizable or completely unusable or completely unrecognizable by itself except when combined with some or all of the other file shares.

...read moreread less

Abstract: A digital data file storage system is disclosed in which original data files to be stored are dispersed using some form of information dispersal algorithm into a number of file “slices” or subsets in such a manner that the data in each file share is less usable or less recognizable or completely unusable or completely unrecognizable by itself except when combined with some or all of the other file shares. These file shares are stored on separate digital data storage devices as a way of increasing privacy and security. As dispersed file shares are being transferred to or stored on a grid of distributed storage locations, various grid resources may become non-operational or may operate below at a less than optimal level. When dispersed file shares are being written to a dispersed storage grid which not available, the grid clients designates the dispersed data shares that could not be written at that time on a Rebuild List. In addition when grid resources already storing dispersed data become non-available, a process within the dispersed storage grid designates the dispersed data shares that need to be recreated on the Rebuild List. At other points in time a separate process reads the set of Rebuild Lists used to create the corresponding dispersed data and stores that data on available grid resources.

...read moreread less

Patent•

Billing system for information dispersal system

[...]

S. Christopher Gladwin¹, Matthew M. England¹, Zachary J. Mark, Vance Thornton, Joshua J. Mullin, Sejal Kumarbhai Modi - Show less +2 more•Institutions (1)

IBM¹

22 Mar 2007

TL;DR: In this paper, the original data to be stored is separated into a number of data'slices' or shares (22, 24, 26, 28, 30, and 32) and stored on separate digital data storage devices (34, 36, 38, 40, 42, and 44) as a way of increasing privacy and security.

...read moreread less

Abstract: A billing process is disclosed for an information dispersal system or digital data storage system. The original data to be stored is separated into a number of data 'slices' or shares (22, 24, 26, 28, 30, and 32). These data subsets are stored on separate digital data storage devices (34, 36, 38, 40, 42, and 44) as a way of increasing privacy and security. A set of metadata tables are created, separate from the dispersed file share storage, to maintain information about the original data size of each block, file or set of file shares dispersed on the grid.

...read moreread less

Proceedings Article•DOI•

Dynamic Placement of Virtual Machines for Managing SLA Violations

[...]

Norman Bobroff¹, Andrzej Kochut¹, Kirk A. Beaty¹•Institutions (1)

IBM¹

25 Jun 2007

TL;DR: A dynamic server migration and consolidation algorithm is introduced and is shown to provide substantial improvement over static server consolidation in reducing the amount of required capacity and the rate of service level agreement violations.

...read moreread less

Abstract: A dynamic server migration and consolidation algorithm is introduced. The algorithm is shown to provide substantial improvement over static server consolidation in reducing the amount of required capacity and the rate of service level agreement violations. Benefits accrue for workloads that are variable and can be forecast over intervals shorter than the time scale of demand variability. The management algorithm reduces the amount of physical capacity required to support a specified rate of SLA violations for a given workload by as much as 50% as compared to static consolidation approach. Another result is that the rate of SLA violations at fixed capacity may be reduced by up to 20%. The results are based on hundreds of production workload traces across a variety of operating systems, applications, and industries.

...read moreread less

Journal Article•DOI•

Generating Cancelable Fingerprint Templates

[...]

Nalini K. Ratha¹, Sharat Chikkerur, Jonathan H. Connell, Ruud M. Bolle•Institutions (1)

IBM¹

01 Apr 2007-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper demonstrates several methods to generate multiple cancelable identifiers from fingerprint images to overcome privacy concerns and concludes that feature-level cancelable biometric construction is practicable in large biometric deployments.

...read moreread less

Abstract: Biometrics-based authentication systems offer obvious usability advantages over traditional password and token-based authentication schemes. However, biometrics raises several privacy concerns. A biometric is permanently associated with a user and cannot be changed. Hence, if a biometric identifier is compromised, it is lost forever and possibly for every application where the biometric is used. Moreover, if the same biometric is used in multiple applications, a user can potentially be tracked from one application to the next by cross-matching biometric databases. In this paper, we demonstrate several methods to generate multiple cancelable identifiers from fingerprint images to overcome these problems. In essence, a user can be given as many biometric identifiers as needed by issuing a new transformation "key". The identifiers can be cancelled and replaced when compromised. We empirically compare the performance of several algorithms such as Cartesian, polar, and surface folding transformations of the minutiae positions. It is demonstrated through multiple experiments that we can achieve revocability and prevent cross-matching of biometric databases. It is also shown that the transforms are noninvertible by demonstrating that it is computationally as hard to recover the original biometric identifier from a transformed version as by randomly guessing. Based on these empirical results and a theoretical analysis we conclude that feature-level cancelable biometric construction is practicable in large biometric deployments

...read moreread less

Journal Article•DOI•

ManyEyes: a Site for Visualization at Internet Scale

[...]

Fernanda B. Viégas¹, Martin Wattenberg¹, F. van Ham¹, Jesse H. Kriss¹, Matthew Mehall McKeon¹ - Show less +1 more•Institutions (1)

IBM¹

01 Nov 2007-IEEE Transactions on Visualization and Computer Graphics

TL;DR: The design and deployment of Many Eyes is described, a public Web site where users may upload data, create interactive visualizations, and carry on discussions to support collaboration around visualizations at a large scale by fostering a social style of data analysis.

...read moreread less

Abstract: We describe the design and deployment of Many Eyes, a public Web site where users may upload data, create interactive visualizations, and carry on discussions. The goal of the site is to support collaboration around visualizations at a large scale by fostering a social style of data analysis in which visualizations not only serve as a discovery tool for individuals but also as a medium to spur discussion among users. To support this goal, the site includes novel mechanisms for end-user creation of visualizations and asynchronous collaboration around those visualizations. In addition to describing these technologies, we provide a preliminary report on the activity of our users.

...read moreread less

Proceedings Article•

Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation

[...]

Masashi Sugiyama¹, Shinichi Nakajima², Hisashi Kashima³, Paul V. Buenau⁴, Motoaki Kawanabe - Show less +1 more•Institutions (4)

Tokyo Institute of Technology¹, Nikon², IBM³, Technical University of Berlin⁴

03 Dec 2007

TL;DR: This paper proposes a direct importance estimation method that does not involve density estimation and is equipped with a natural cross validation procedure and hence tuning parameters such as the kernel width can be objectively optimized.

...read moreread less

Abstract: A situation where training and test samples follow different input distributions is called covariate shift. Under covariate shift, standard learning methods such as maximum likelihood estimation are no longer consistent—weighted variants according to the ratio of test and training input densities are consistent. Therefore, accurately estimating the density ratio, called the importance, is one of the key issues in covariate shift adaptation. A naive approach to this task is to first estimate training and test input densities separately and then estimate the importance by taking the ratio of the estimated densities. However, this naive approach tends to perform poorly since density estimation is a hard task particularly in high dimensional cases. In this paper, we propose a direct importance estimation method that does not involve density estimation. Our method is equipped with a natural cross validation procedure and hence tuning parameters such as the kernel width can be objectively optimized. Simulations illustrate the usefulness of our approach.

...read moreread less

Journal Article•DOI•

Ultra-compact, low RF power, 10 Gb/s silicon Mach-Zehnder modulator.

[...]

William M. J. Green¹, Michael J. Rooks¹, Lidija Sekaric¹, Yurii A. Vlasov¹•Institutions (1)

IBM¹

10 Dec 2007-Optics Express

TL;DR: Silicon p(+)-i-n(+) diode Mach-Zehnder electrooptic modulators having an ultra-compact length of 100 to 200 mum are presented and exhibit high modulation efficiency.

...read moreread less

Abstract: Silicon p(+)-i-n(+) diode Mach-Zehnder electrooptic modulators having an ultra-compact length of 100 to 200 mum are presented. These devices exhibit high modulation efficiency, with a V(pi)L figure of merit of 0.36 V-mm. Optical modulation at data rates up to 10 Gb/s is demonstrated with low RF power consumption of only 5 pJ/bit.

...read moreread less

Proceedings Article•DOI•

Cross-domain video concept detection using adaptive svms

[...]

Jun Yang¹, Rong Yan², Alexander G. Hauptmann¹•Institutions (2)

Carnegie Mellon University¹, IBM²

29 Sep 2007

TL;DR: This paper proposes Adaptive Support Vector Machines (A-SVMs) as a general method to adapt one or more existing classifiers of any type to the new dataset and outperforms several baseline and competing methods in terms of classification accuracy and efficiency in cross-domain concept detection in the TRECVID corpus.

...read moreread less

Abstract: Many multimedia applications can benefit from techniques for adapting existing classifiers to data with different distributions. One example is cross-domain video concept detection which aims to adapt concept classifiers across various video domains. In this paper, we explore two key problems for classifier adaptation: (1) how to transform existing classifier(s) into an effective classifier for a new dataset that only has a limited number of labeled examples, and (2) how to select the best existing classifier(s) for adaptation. For the first problem, we propose Adaptive Support Vector Machines (A-SVMs) as a general method to adapt one or more existing classifiers of any type to the new dataset. It aims to learn the "delta function" between the original and adapted classifier using an objective function similar to SVMs. For the second problem, we estimate the performance of each existing classifier on the sparsely-labeled new dataset by analyzing its score distribution and other meta features, and select the classifiers with the best estimated performance. The proposed method outperforms several baseline and competing methods in terms of classification accuracy and efficiency in cross-domain concept detection in the TRECVID corpus.

...read moreread less

Proceedings Article•DOI•

Trojan Detection using IC Fingerprinting

[...]

Dakshi Agrawal¹, Selcuk Baktir¹, Deniz Karakoyunlu², Pankaj Rohatgi¹, Berk Sunar² - Show less +1 more•Institutions (2)

IBM¹, Worcester Polytechnic Institute²

20 May 2007

TL;DR: These results show that Trojans that are 3-4 orders of magnitude smaller than the main circuit can be detected by signal processing techniques and provide a starting point to address this important problem.

...read moreread less

Abstract: Hardware manufacturers are increasingly outsourcing their IC fabrication work overseas due to their much lower cost structure. This poses a significant security risk for ICs used for critical military and business applications. Attackers can exploit this loss of control to substitute Trojan ICs for genuine ones or insert a Trojan circuit into the design or mask used for fabrication. We show that a technique borrowed from side-channel cryptanalysis can be used to mitigate this problem. Our approach uses noise modeling to construct a set of fingerprints/or an IC family utilizing side- channel information such as power, temperature, and electromagnetic (EM) profiles. The set of fingerprints can be developed using a few ICs from a batch and only these ICs would have to be invasively tested to ensure that they were all authentic. The remaining ICs are verified using statistical tests against the fingerprints. We describe the theoretical framework and present preliminary experimental results to show that this approach is viable by presenting results obtained by using power simulations performed on representative circuits with several different Trojan circuitry. These results show that Trojans that are 3-4 orders of magnitude smaller than the main circuit can be detected by signal processing techniques. While scaling our technique to detect even smaller Trojans in complex ICs with tens or hundreds of millions of transistors would require certain modifications to the IC design process, our results provide a starting point to address this important problem.

...read moreread less

Journal Article•DOI•

Discovering Motifs in Ranked Lists of DNA Sequences

[...]

Eran Eden¹, Doron Lipson¹, Sivan Yogev¹, Sivan Yogev², Zohar Yakhini³, Zohar Yakhini¹ - Show less +2 more•Institutions (3)

Technion – Israel Institute of Technology¹, IBM², Agilent Technologies³

23 Mar 2007-PLOS Computational Biology

TL;DR: The implementation of this framework in a software application, termed DRIM (discovery of rank imbalanced motifs), which identifies sequence motifs in lists of ranked DNA sequences, is demonstrated, demonstrating that the statistical framework embodied in the DRIM software tool is highly effective for identifying regulatory sequence elements in a variety of applications.

...read moreread less

Abstract: Computational methods for discovery of sequence elements that are enriched in a target set compared with a background set are fundamental in molecular biology research. One example is the discovery of transcription factor binding motifs that are inferred from ChIP–chip (chromatin immuno-precipitation on a microarray) measurements. Several major challenges in sequence motif discovery still require consideration: (i) the need for a principled approach to partitioning the data into target and background sets; (ii) the lack of rigorous models and of an exact p-value for measuring motif enrichment; (iii) the need for an appropriate framework for accounting for motif multiplicity; (iv) the tendency, in many of the existing methods, to report presumably significant motifs even when applied to randomly generated data. In this paper we present a statistical framework for discovering enriched sequence elements in ranked lists that resolves these four issues. We demonstrate the implementation of this framework in a software application, termed DRIM (discovery of rank imbalanced motifs), which identifies sequence motifs in lists of ranked DNA sequences. We applied DRIM to ChIP–chip and CpG methylation data and obtained the following results. (i) Identification of 50 novel putative transcription factor (TF) binding sites in yeast ChIP–chip data. The biological function of some of them was further investigated to gain new insights on transcription regulation networks in yeast. For example, our discoveries enable the elucidation of the network of the TF ARO80. Another finding concerns a systematic TF binding enhancement to sequences containing CA repeats. (ii) Discovery of novel motifs in human cancer CpG methylation data. Remarkably, most of these motifs are similar to DNA sequence elements bound by the Polycomb complex that promotes histone methylation. Our findings thus support a model in which histone methylation and CpG methylation are mechanistically linked. Overall, we demonstrate that the statistical framework embodied in the DRIM software tool is highly effective for identifying regulatory sequence elements in a variety of applications ranging from expression and ChIP–chip to CpG methylation data. DRIM is publicly available at http://bioinfo.cs.technion.ac.il/drim.

...read moreread less

Proceedings Article•DOI•

GraphScope: parameter-free mining of large time-evolving graphs

[...]

Jimeng Sun¹, Christos Faloutsos¹, Spiros Papadimitriou², Philip S. Yu²•Institutions (2)

Carnegie Mellon University¹, IBM²

12 Aug 2007

TL;DR: The efficiency and effectiveness of the GraphScope is demonstrated, which is designed to operate on large graphs, in a streaming fashion, on real datasets from several diverse domains, and produces meaningful time-evolving patterns that agree with human intuition.

...read moreread less

Abstract: How can we find communities in dynamic networks of socialinteractions, such as who calls whom, who emails whom, or who sells to whom? How can we spot discontinuity time-points in such streams of graphs, in an on-line, any-time fashion? We propose GraphScope, that addresses both problems, using information theoretic principles. Contrary to the majority of earlier methods, it needs no user-defined parameters. Moreover, it is designed to operate on large graphs, in a streaming fashion. We demonstrate the efficiency and effectiveness of our GraphScope on real datasets from several diverse domains. In all cases it produces meaningful time-evolving patterns that agree with human intuition.

...read moreread less

Journal Article•DOI•

Current-Induced Hydrogen Tautomerization and Conductance Switching of Naphthalocyanine Molecules

[...]

Peter Liljeroth¹, Peter Liljeroth², Jascha Repp¹, Jascha Repp², Gerhard Meyer¹, Gerhard Meyer² - Show less +2 more•Institutions (2)

University of Regensburg¹, IBM²

31 Aug 2007-Science

TL;DR: A coupling of the switching process so that the charge injection in one molecule induced tautomerization in an adjacent molecule is demonstrated.

...read moreread less

Abstract: The bistability in the position of the two hydrogen atoms in the inner cavity of single free-base naphthalocyanine molecules constitutes a two-level system that was manipulated and probed by low-temperature scanning tunneling microscopy. When adsorbed on an ultrathin insulating film, the molecules can be switched in a controlled fashion between the two states by excitation induced by the inelastic tunneling current. The tautomerization reaction can be probed by resonant tunneling through the molecule and is expressed as considerable changes in the conductivity of the molecule. We also demonstrated a coupling of the switching process so that the charge injection in one molecule induced tautomerization in an adjacent molecule.

...read moreread less

Journal Article•DOI•

Secure Distributed Key Generation for Discrete-Log Based Cryptosystems

[...]

Rosario Gennaro¹, Stanislaw Jarecki², Hugo Krawczyk¹, Tal Rabin¹•Institutions (2)

IBM¹, University of California, Irvine²

01 Jan 2007-Journal of Cryptology

TL;DR: This paper shows that a widely used dlog-based DKG protocol suggested by Pedersen does not guarantee a uniformly random distribution of generated keys, and presents a new protocol which proves to satisfy the security requirements from DKG protocols and ensures a uniform distribution of the generated keys.

...read moreread less

Abstract: A Distributed Key Generation (DKG) protocol is an essential component of threshold cryptosystems required to initialize the cryptosystem securely and generate its private and public keys. In the case of discrete-log-based (dlog-based) threshold signature schemes (ElGamal and its derivatives), the DKG protocol is further used in the distributed signature generation phase to generate one-time signature randomizers (r = gk). In this paper we show that a widely used dlog-based DKG protocol suggested by Pedersen does not guarantee a uniformly random distribution of generated keys: we describe an efficient active attacker controlling a small number of parties which successfully biases the values of the generated keys away from uniform. We then present a new DKG protocol for the setting of dlog-based cryptosystems which we prove to satisfy the security requirements from DKG protocols and, in particular, it ensures a uniform distribution of the generated keys. The new protocol can be used as a secure replacement for the many applications of Pedersen's protocol. Motivated by the fact that the new DKG protocol incurs additional communication cost relative to Pedersen's original protocol, we investigate whether the latter can be used in specific applications which require relaxed security properties from the DKG protocol. We answer this question affirmatively by showing that Pedersen's protocol suffices for the secure implementation of certain threshold cryptosystems whose security can be reduced to the hardness of the discrete logarithm problem. In particular, we show Pedersen's DKG to be sufficient for the construction of a threshold Schnorr signature scheme. Finally, we observe an interesting trade-off between security (reductions), computation, and communication that arises when comparing Pedersen's DKG protocol with ours.

...read moreread less

Proceedings Article•DOI•

Optimizing web search using social annotations

[...]

Shenghua Bao¹, Gui-Rong Xue¹, Xiaoyuan Wu¹, Yong Yu¹, Ben Fei², Zhong Su² - Show less +2 more•Institutions (2)

Shanghai Jiao Tong University¹, IBM²

08 May 2007

TL;DR: Preliminary experimental results show that SSR can find the latent semantic association between queries and annotations, while SPR successfully measures the quality of a webpage from the web users' perspective.

...read moreread less

Abstract: This paper explores the use of social annotations to improve websearch. Nowadays, many services, e.g. del.icio.us, have been developed for web users to organize and share their favorite webpages on line by using social annotations. We observe that the social annotations can benefit web search in two aspects: 1) the annotations are usually good summaries of corresponding webpages; 2) the count of annotations indicates the popularity of webpages. Two novel algorithms are proposed to incorporate the above information into page ranking: 1) SocialSimRank (SSR)calculates the similarity between social annotations and webqueries; 2) SocialPageRank (SPR) captures the popularity of webpages. Preliminary experimental results show that SSR can find the latent semantic association between queries and annotations, while SPR successfully measures the quality (popularity) of a webpage from the web users' perspective. We further evaluate the proposed methods empirically with 50 manually constructed queries and 3000 auto-generated queries on a dataset crawledfrom delicious. Experiments show that both SSR and SPRbenefit web search significantly.

...read moreread less

Proceedings Article•DOI•

BLINKS: ranked keyword searches on graphs

[...]

Hao He¹, Haixun Wang², Jun Yang¹, Philip S. Yu²•Institutions (2)

Duke University¹, IBM²

11 Jun 2007

TL;DR: BLINKS follows a search strategy with provable performance bounds, while additionally exploiting a bi-level index for pruning and accelerating the search, and offers orders-of-magnitude performance improvement over existing approaches.

...read moreread less

Abstract: Query processing over graph-structured data is enjoying a growing number of applications. A top-k keyword search query on a graph finds the top k answers according to some ranking criteria, where each answer is a substructure of the graph containing all query keywords. Current techniques for supporting such queries on general graphs suffer from several drawbacks, e.g., poor worst-case performance, not taking full advantage of indexes, and high memory requirements. To address these problems, we propose BLINKS, a bi-level indexing and query processing scheme for top-k keyword search on graphs. BLINKS follows a search strategy with provable performance bounds, while additionally exploiting a bi-level index for pruning and accelerating the search. To reduce the index space, BLINKS partitions a data graph into blocks: The bi-level index stores summary information at the block level to initiate and guide search among blocks, and more detailed information for each block to accelerate search within blocks. Our experiments show that BLINKS offers orders-of-magnitude performance improvement over existing approaches.

...read moreread less

Journal Article•DOI•

Germanium nanowire growth below the eutectic temperature

[...]

Suneel Kodambaka¹, Jerry Tersoff¹, Mark C. Reuter¹, Frances M. Ross¹•Institutions (1)

IBM¹

04 May 2007-Science

TL;DR: In situ microscopy is showed that, for the classic Ge/Au system, nanowire growth can occur below the eutectic temperature with either liquid or solid catalysts at the same temperature, and it is found, unexpectedly, that the catalyst state depends on the growth pressure and thermal history.

...read moreread less

Abstract: Nanowires are conventionally assumed to grow via the vapor-liquid-solid process, in which material from the vapor is incorporated into the growing nanowire via a liquid catalyst, commonly a low-melting point eutectic alloy. However, nanowires have been observed to grow below the eutectic temperature, and the state of the catalyst remains controversial. Using in situ microscopy, we showed that, for the classic Ge/Au system, nanowire growth can occur below the eutectic temperature with either liquid or solid catalysts at the same temperature. We found, unexpectedly, that the catalyst state depends on the growth pressure and thermal history. We suggest that these phenomena may be due to kinetic enrichment of the eutectic alloy composition and expect these results to be relevant for other nanowire systems.

...read moreread less

Book Chapter•DOI•

Synthesis, Structure, and Properties of Organic-Inorganic Perovskites and Related Materials

[...]

David B. Mitzi¹•Institutions (1)

IBM¹

09 Mar 2007

Journal Article•DOI•

SAWSDL: Semantic Annotations for WSDL and XML Schema

[...]

Jacek Kopecky¹, Tomas Vitvar¹, C. Bournez, J. Farrell²•Institutions (2)

Digital Enterprise Research Institute¹, IBM²

01 Nov 2007-IEEE Internet Computing

TL;DR: The World Wide Web Consortium (W3C) has recently finished work on two important standards for describing Web services the Web Services Description Language (W SDL) 2.0 and Semantic Annotations for WSDL and XML Schema (SAWSDL).

...read moreread less

Abstract: Web services are important for creating distributed applications on the Web. In fact, they're a key enabler for service-oriented architectures that focus on service reuse and interoperability. The World Wide Web Consortium (W3C) has recently finished work on two important standards for describing Web services the Web Services Description Language (WSDL) 2.0 and Semantic Annotations for WSDL and XML Schema (SAWSDL). Here, the authors discuss the latter, which is the first standard for adding semantics to Web service descriptions.

...read moreread less

Journal Article•DOI•

Fingerprint-Based Fuzzy Vault: Implementation and Performance

[...]

Karthik Nandakumar¹, Anil K. Jain¹, Sharathchandra U. Pankanti²•Institutions (2)

Michigan State University¹, IBM²

01 Dec 2007-IEEE Transactions on Information Forensics and Security

TL;DR: This work presents a fully automatic implementation of the fuzzy vault scheme based on fingerprint minutiae, a biometric cryptosystem that secures both the secret key and the biometric template by binding them within a cryptographic framework.

...read moreread less

Abstract: Reliable information security mechanisms are required to combat the rising magnitude of identity theft in our society. While cryptography is a powerful tool to achieve information security, one of the main challenges in cryptosystems is to maintain the secrecy of the cryptographic keys. Though biometric authentication can be used to ensure that only the legitimate user has access to the secret keys, a biometric system itself is vulnerable to a number of threats. A critical issue in biometric systems is to protect the template of a user which is typically stored in a database or a smart card. The fuzzy vault construct is a biometric cryptosystem that secures both the secret key and the biometric template by binding them within a cryptographic framework. We present a fully automatic implementation of the fuzzy vault scheme based on fingerprint minutiae. Since the fuzzy vault stores only a transformed version of the template, aligning the query fingerprint with the template is a challenging task. We extract high curvature points derived from the fingerprint orientation field and use them as helper data to align the template and query minutiae. The helper data itself do not leak any information about the minutiae template, yet contain sufficient information to align the template and query fingerprints accurately. Further, we apply a minutiae matcher during decoding to account for nonlinear distortion and this leads to significant improvement in the genuine accept rate. We demonstrate the performance of the vault implementation on two different fingerprint databases. We also show that performance improvement can be achieved by using multiple fingerprint impressions during enrollment and verification.

...read moreread less

Collapse