
Showing papers published by Google in 2012


Journal ArticleDOI
TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Abstract: Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition benchmarks, sometimes by a large margin. This article provides an overview of this progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.

9,091 citations
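
The hybrid setup the abstract describes is easy to sketch: a feed-forward network maps a window of acoustic frames to posteriors over HMM states, which are then divided by state priors to give the scaled likelihoods an HMM decoder consumes. A minimal illustration with random weights; all sizes and priors below are made-up placeholders, not the systems from the article:

```python
# Sketch of the hybrid DNN-HMM idea: a feed-forward net maps a window of
# acoustic frames to posteriors over HMM states. Shapes, weights, and priors
# here are illustrative, not from the article.
import numpy as np

rng = np.random.default_rng(0)

n_states = 100          # HMM tied states (assumed)
frame_dim = 40          # e.g. filterbank coefficients per frame (assumed)
context = 11            # frames per input window (assumed)

# Two hidden layers with random weights stand in for a trained DNN.
W1 = rng.normal(scale=0.01, size=(frame_dim * context, 512))
W2 = rng.normal(scale=0.01, size=(512, 512))
W3 = rng.normal(scale=0.01, size=(512, n_states))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dnn_posteriors(window):
    """Window of `context` frames -> posteriors P(state | acoustics)."""
    h = np.maximum(window.reshape(-1) @ W1, 0.0)
    h = np.maximum(h @ W2, 0.0)
    return softmax(h @ W3)

# A hybrid system feeds the HMM likelihoods, so posteriors are divided by
# state priors to get "scaled likelihoods" before Viterbi decoding.
priors = np.full(n_states, 1.0 / n_states)   # assumed uniform here
window = rng.normal(size=(context, frame_dim))
scaled_likelihoods = dnn_posteriors(window) / priors
print(scaled_likelihoods.shape)
```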


Proceedings Article
03 Dec 2012
TL;DR: This paper considers the problem of training a deep network with billions of parameters using tens of thousands of CPU cores and develops two algorithms for large-scale distributed training, Downpour SGD and Sandblaster L-BFGS, which increase the scale and speed of deep network training.
Abstract: Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. Downpour SGD and Sandblaster L-BFGS both increase the scale and speed of deep network training. We have successfully used our system to train a deep network 30x larger than previously reported in the literature, achieving state-of-the-art performance on ImageNet, a visual object recognition task with 16 million images and 21k categories. We show that these same techniques dramatically accelerate the training of a more modestly-sized deep network for a commercial speech recognition service. Although we focus on and report performance of these methods as applied to training large neural networks, the underlying algorithms are applicable to any gradient-based machine learning algorithm.

3,475 citations
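
The Downpour SGD pattern (asynchronous replicas pulling possibly stale parameters and pushing gradients to a central server) can be sketched in a few lines. This is a toy single-process simulation on a least-squares problem, with made-up shard counts and learning rate, not the DistBelief implementation:

```python
# Toy sketch of the Downpour SGD pattern: model replicas asynchronously fetch
# parameters from a central server, compute gradients on their own data
# shard, and push updates back. Single-process simulation; the real system
# runs across thousands of machines.
import numpy as np

rng = np.random.default_rng(1)

# A least-squares problem stands in for "a deep network".
X, true_w = rng.normal(size=(1000, 10)), rng.normal(size=10)
y = X @ true_w

params = np.zeros(10)                        # parameter server state
shards = np.array_split(np.arange(1000), 4)  # one data shard per replica

def gradient(w, idx):
    err = X[idx] @ w - y[idx]
    return X[idx].T @ err / len(idx)

lr, n_fetch = 0.1, 5
for step in range(200):
    replica = step % len(shards)
    # Replicas refresh their local copy only every n_fetch steps, so they
    # often compute gradients against slightly stale parameters.
    if step % n_fetch == 0:
        local = params.copy()
    batch = rng.choice(shards[replica], size=32)
    params -= lr * gradient(local, batch)    # async push to the server

print("distance to optimum:", np.linalg.norm(params - true_w))
```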


Journal ArticleDOI
Hyunyoung Choi1, Hal R. Varian1
TL;DR: This paper shows how to use search engine data to forecast near-term values of economic indicators such as automobile sales, unemployment claims, travel destination planning, and consumer confidence.
Abstract: In this paper we show how to use search engine data to forecast near-term values of economic indicators. Examples include automobile sales, unemployment claims, travel destination planning and consumer confidence.

1,619 citations
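
The paper's basic recipe can be sketched as regressing an indicator on its own lag plus a contemporaneous search-volume index. A sketch on synthetic series; real use would plug in Google Trends data and a richer seasonal model:

```python
# Sketch of the nowcasting approach: regress an economic indicator on its own
# lag plus a contemporaneous search-volume index. The series below are
# synthetic stand-ins for real indicator and Google Trends data.
import numpy as np

rng = np.random.default_rng(2)
T = 120
trends = rng.normal(size=T)                       # search index (synthetic)
y = np.zeros(T)
for t in range(1, T):                             # AR(1) + trends signal
    y[t] = 0.7 * y[t - 1] + 0.5 * trends[t] + 0.1 * rng.normal()

# Design matrix: intercept, lagged indicator, current search index.
Xmat = np.column_stack([np.ones(T - 1), y[:-1], trends[1:]])
coef, *_ = np.linalg.lstsq(Xmat, y[1:], rcond=None)

# "Nowcast" the next value from the latest lag and the latest search data.
nowcast = coef @ np.array([1.0, y[-1], trends[-1]])
print(coef, nowcast)
```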


Proceedings ArticleDOI
08 Oct 2012
TL;DR: This article describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty, critical to supporting external consistency and a variety of powerful features.
Abstract: Spanner is Google's scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: nonblocking reads in the past, lock-free read-only transactions, and atomic schema changes, across all of Spanner.

1,366 citations
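
The clock-uncertainty idea behind the time API can be sketched directly: now() returns an interval guaranteed to contain true time, and a writer waits out the uncertainty before releasing locks so its commit timestamp is definitely in the past. A toy sketch with an assumed uncertainty bound, not the TrueTime implementation:

```python
# Sketch of the clock-uncertainty idea: now() returns an interval
# [earliest, latest] guaranteed to contain true time, and a writer
# "commit waits" until its timestamp is definitely in the past.
# EPSILON is a made-up constant.
import time

EPSILON = 0.007  # assumed clock uncertainty bound, in seconds

def tt_now():
    t = time.time()
    return (t - EPSILON, t + EPSILON)   # (earliest, latest)

def commit_wait(commit_ts):
    # Block until earliest > commit_ts, so every later reader's interval
    # lies entirely after the commit timestamp (external consistency).
    while tt_now()[0] <= commit_ts:
        time.sleep(EPSILON / 10)

commit_ts = tt_now()[1]     # pick a timestamp at the top of the interval
commit_wait(commit_ts)
print("safe to release locks at", time.time())
```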


Journal ArticleDOI
01 Aug 2012
TL;DR: This talk will trace the history of the Approximate Frequency Counts paper, how it was conceptualized and how it influenced data stream research.
Abstract: Research in data stream algorithms has blossomed since the late 1990s. The talk will trace the history of the Approximate Frequency Counts paper, how it was conceptualized, and how it influenced data stream research. The talk will also touch upon a recent development: analysis of personal data streams for improving our quality of life.

1,291 citations
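
The Approximate Frequency Counts paper the talk traces introduced lossy counting, a one-pass algorithm for finding frequent items in a stream. A compact rendering of the published algorithm:

```python
# Lossy counting: one pass over a stream, frequency undercounted by at most
# epsilon * N, with memory that grows only logarithmically in N.
import math

def lossy_count(stream, epsilon):
    """Return surviving items with their estimated frequencies."""
    width = math.ceil(1 / epsilon)      # bucket width
    counts, deltas = {}, {}
    bucket = 1
    for n, item in enumerate(stream, start=1):
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
            deltas[item] = bucket - 1   # max undercount for a new entry
        if n % width == 0:              # prune at each bucket boundary
            for it in [i for i in counts if counts[i] + deltas[i] <= bucket]:
                del counts[it], deltas[it]
            bucket += 1
    return counts

stream = ["a", "b", "a", "c", "a", "b", "a"] * 100
print(lossy_count(stream, epsilon=0.01))
```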


Journal ArticleDOI
TL;DR: In this paper, a generic objectness measure is proposed to quantify how likely an image window is to contain an object of any class, such as cows and telephones, from amorphous background elements such as grass and road.
Abstract: We present a generic objectness measure, quantifying how likely it is for an image window to contain an object of any class. We explicitly train it to distinguish objects with a well-defined boundary in space, such as cows and telephones, from amorphous background elements, such as grass and road. The measure combines in a Bayesian framework several image cues measuring characteristics of objects, such as appearing different from their surroundings and having a closed boundary. These include an innovative cue to measure the closed boundary characteristic. In experiments on the challenging PASCAL VOC 07 dataset, we show this new cue to outperform a state-of-the-art saliency measure, and the combined objectness measure to perform better than any cue alone. We also compare to interest point operators, a HOG detector, and three recent works aiming at automatic object segmentation. Finally, we present two applications of objectness. In the first, we sample a small number of windows according to their objectness probability and give an algorithm to employ them as location priors for modern class-specific object detectors. As we show experimentally, this greatly reduces the number of windows evaluated by the expensive class-specific model. In the second application, we use objectness as a complementary score in addition to the class-specific model, which leads to fewer false positives. As shown in several recent papers, objectness can act as a valuable focus of attention mechanism in many other applications operating on image windows, including weakly supervised learning of object categories, unsupervised pixelwise segmentation, and object tracking in video. Computing objectness is very efficient and takes only about 4 seconds per image.

1,223 citations
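
The first application above, sampling windows according to their objectness probability, reduces to weighted sampling without replacement. A toy sketch with synthetic windows and scores:

```python
# Sampling candidate windows in proportion to their objectness scores, to use
# as location priors for a class-specific detector. Windows and scores here
# are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(3)

# Candidate windows as (x, y, w, h) with one objectness score each.
windows = rng.integers(0, 200, size=(1000, 4))
scores = rng.random(1000) ** 4            # most windows score low

probs = scores / scores.sum()
# Draw a small set; high-objectness windows are picked far more often.
picked = rng.choice(len(windows), size=50, replace=False, p=probs)
location_priors = windows[picked]
print(location_priors[:5])
```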


Journal ArticleDOI
TL;DR: This work proposes a semi-supervised hashing (SSH) framework that minimizes empirical error over the labeled set and an information theoretic regularizer over both labeled and unlabeled sets, and presents three different semi-supervised hashing methods, including orthogonal hashing, nonorthogonal hashing, and sequential hashing.
Abstract: Hashing-based approximate nearest neighbor (ANN) search in huge databases has become popular due to its computational and memory efficiency. The popular hashing methods, e.g., Locality Sensitive Hashing and Spectral Hashing, construct hash functions based on random or principal projections. The resulting hashes are either not very accurate or are inefficient. Moreover, these methods are designed for a given metric similarity. On the contrary, semantic similarity is usually given in terms of pairwise labels of samples. There exist supervised hashing methods that can handle such semantic similarity, but they are prone to overfitting when labeled data are small or noisy. In this work, we propose a semi-supervised hashing (SSH) framework that minimizes empirical error over the labeled set and an information theoretic regularizer over both labeled and unlabeled sets. Based on this framework, we present three different semi-supervised hashing methods, including orthogonal hashing, nonorthogonal hashing, and sequential hashing. Particularly, the sequential hashing method generates robust codes in which each hash function is designed to correct the errors made by the previous ones. We further show that the sequential learning paradigm can be extended to unsupervised domains where no labeled pairs are available. Extensive experiments on four large datasets (up to 80 million samples) demonstrate the superior performance of the proposed SSH methods over state-of-the-art supervised and unsupervised hashing techniques.

834 citations
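
The family of methods discussed here produces binary codes as signs of linear projections; SSH's contribution is learning the projection matrix from labeled pairs plus the regularizer. The baseline below just uses random projections (the LSH-style starting point), purely for illustration:

```python
# Projection-based binary hashing: codes are signs of linear projections.
# SSH would *learn* W from labeled pairs plus a regularizer over all data;
# here W is random, i.e. the baseline the paper improves on.
import numpy as np

rng = np.random.default_rng(4)

X = rng.normal(size=(10000, 64))      # database points
W = rng.normal(size=(64, 16))         # 16-bit codes; SSH would learn this

codes = (X @ W > 0).astype(np.uint8)  # one binary code per point

def hamming_neighbors(q, k=5):
    qc = (q @ W > 0).astype(np.uint8)
    dists = (codes != qc).sum(axis=1)  # Hamming distance to every code
    return np.argsort(dists)[:k]

print(hamming_neighbors(rng.normal(size=64)))
```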


Proceedings Article
13 Jun 2012
TL;DR: The paper presents AddressSanitizer, a new memory error detector that achieves efficiency without sacrificing comprehensiveness, and has found over 300 previously unknown bugs in the Chromium browser and many bugs in other software.
Abstract: Memory access bugs, including buffer overflows and uses of freed heap memory, remain a serious problem for programming languages like C and C++. Many memory error detectors exist, but most of them are either slow or detect a limited set of bugs, or both. This paper presents AddressSanitizer, a new memory error detector. Our tool finds out-of-bounds accesses to heap, stack, and global objects, as well as use-after-free bugs. It employs a specialized memory allocator and code instrumentation that is simple enough to be implemented in any compiler, binary translation system, or even in hardware. AddressSanitizer achieves efficiency without sacrificing comprehensiveness. Its average slowdown is just 73% yet it accurately detects bugs at the point of occurrence. It has found over 300 previously unknown bugs in the Chromium browser and many bugs in other software.

795 citations
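
AddressSanitizer's published shadow-memory scheme maps every 8 bytes of application memory to one shadow byte at (addr >> 3) + offset; 0 means all 8 bytes are addressable, 1-7 means only the first k bytes are, and negative values mark redzones. A Python model of that check, with illustrative constants:

```python
# Python model of the ASan shadow-memory check: every 8 application bytes map
# to one shadow byte at (addr >> 3) + offset. Constants are illustrative.
SHADOW_OFFSET = 0x7FFF8000  # typical x86-64 offset; platform-dependent

shadow = {}  # sparse stand-in for the real shadow region

def poison(addr, size, value=-1):
    for a in range(addr, addr + size, 8):
        shadow[(a >> 3) + SHADOW_OFFSET] = value

def check_access(addr, size=1):
    s = shadow.get((addr >> 3) + SHADOW_OFFSET, 0)
    # 0: all 8 bytes OK; 1..7: only first s bytes OK; negative: redzone.
    if s != 0 and (s < 0 or (addr & 7) + size > s):
        raise MemoryError(f"ASan-style report: bad access at {addr:#x}")

# A 16-byte "allocation" at 0x1000 surrounded by poisoned redzones.
poison(0x1000 - 16, 16)          # left redzone
poison(0x1000 + 16, 16)          # right redzone
check_access(0x1000 + 8)         # in bounds: fine
try:
    check_access(0x1000 + 16)    # out of bounds -> detected
except MemoryError as e:
    print(e)
```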


Proceedings Article
26 Jun 2012
TL;DR: In this paper, a 9-layered locally connected sparse autoencoder with pooling and local contrast normalization was used to learn high-level, class-specific feature detectors from only unlabeled data.
Abstract: We consider the problem of building high-level, class-specific feature detectors from only unlabeled data. For example, is it possible to learn a face detector using only unlabeled images? To answer this, we train a 9-layered locally connected sparse autoencoder with pooling and local contrast normalization on a large dataset of images (the model has 1 billion connections, the dataset has 10 million 200×200 pixel images downloaded from the Internet). We train this network using model parallelism and asynchronous SGD on a cluster with 1,000 machines (16,000 cores) for three days. Contrary to what appears to be a widely-held intuition, our experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not. Control experiments show that this feature detector is robust not only to translation but also to scaling and out-of-plane rotation. We also find that the same network is sensitive to other high-level concepts such as cat faces and human bodies. Starting with these learned features, we trained our network to obtain 15.8% accuracy in recognizing 20,000 object categories from ImageNet, a leap of 70% relative improvement over the previous state-of-the-art.

786 citations


Proceedings ArticleDOI
Mike Schuster1, Kaisuke Nakajima1
25 Mar 2012
TL;DR: The techniques used to deal with an infinite vocabulary, how modeling completely in the written domain for language model and dictionary can avoid some system complexity, and how dictionaries, language and acoustic models are built in this framework are described.
Abstract: This paper describes challenges and solutions for building a successful voice search system as applied to Japanese and Korean at Google. We describe the techniques used to deal with an infinite vocabulary, how modeling completely in the written domain for language model and dictionary can avoid some system complexity, and how we built dictionaries, language and acoustic models in this framework. We show how to deal with the difficulty of scoring results for multiple-script languages because of ambiguities. The development of voice search for these languages led to a significant simplification of the original process to build a system for any new language, which in part became our default process for internationalization of voice search.

774 citations


Proceedings Article
01 May 2012
TL;DR: This work proposes a tagset that consists of twelve universal part-of-speech categories and develops a mapping from 25 different treebank tagsets to this universal set, which when combined with the original treebank data produces a dataset consisting of common parts-of-speech for 22 different languages.
Abstract: To facilitate future research in unsupervised induction of syntactic structure and to standardize best-practices, we propose a tagset that consists of twelve universal part-of-speech categories. In addition to the tagset, we develop a mapping from 25 different treebank tagsets to this universal set. As a result, when combined with the original treebank data, this universal tagset and mapping produce a dataset consisting of common parts-of-speech for 22 different languages. We highlight the use of this resource via three experiments that (1) compare tagging accuracies across languages, (2) present an unsupervised grammar induction approach that does not use gold standard part-of-speech tags, and (3) use the universal tags to transfer dependency parsers between languages, achieving state-of-the-art results.
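
The released mapping is essentially a lookup table per treebank. A sketch with a few Penn Treebank entries (the full resource covers 25 tagsets); unknown fine-grained tags fall back to the catch-all category X:

```python
# Sketch of the tagset mapping: a lookup from one treebank's fine-grained
# tags to the twelve universal categories (NOUN, VERB, ADJ, ADV, PRON, DET,
# ADP, NUM, CONJ, PRT, '.', X). Only a few Penn Treebank entries shown.
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB", "VBG": "VERB", "VBP": "VERB",
    "JJ": "ADJ", "RB": "ADV", "PRP": "PRON", "DT": "DET",
    "IN": "ADP", "CD": "NUM", "CC": "CONJ", "RP": "PRT",
    ".": ".", ",": ".",
}

def to_universal(tagged_sentence):
    # Unknown fine-grained tags fall back to the catch-all category X.
    return [(w, PTB_TO_UNIVERSAL.get(t, "X")) for w, t in tagged_sentence]

print(to_universal([("Dogs", "NNS"), ("bark", "VBP"), (".", ".")]))
```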

Proceedings ArticleDOI
16 Apr 2012
TL;DR: This paper studies spam detection in the collaborative setting, i.e., discovering fake reviewer groups, using several behavioral models derived from the collusion phenomenon among fake reviewers, together with relation models based on the relationships among groups, individual reviewers, and the products they reviewed.
Abstract: Opinionated social media such as product reviews are now widely used by individuals and organizations for their decision making. However, for profit or fame, people try to game the system by opinion spamming (e.g., writing fake reviews) to promote or demote some target products. For reviews to reflect genuine user experiences and opinions, such spam reviews should be detected. Prior works on opinion spam focused on detecting fake reviews and individual fake reviewers. However, a fake reviewer group (a group of reviewers who work collaboratively to write fake reviews) is even more damaging, as it can take total control of the sentiment on the target product due to its size. This paper studies spam detection in the collaborative setting, i.e., to discover fake reviewer groups. The proposed method first uses a frequent itemset mining method to find a set of candidate groups. It then uses several behavioral models derived from the collusion phenomenon among fake reviewers and relation models based on the relationships among groups, individual reviewers, and products they reviewed to detect fake reviewer groups. Additionally, we built a labeled dataset of fake reviewer groups. Although labeling individual fake reviews and reviewers is very hard, to our surprise labeling fake reviewer groups is much easier. We also note that the proposed technique departs from the traditional supervised learning approach for spam detection because the inherent nature of our problem makes the classic supervised learning approach less effective. Experimental results show that the proposed method outperforms multiple strong baselines including the state-of-the-art supervised classification, regression, and learning to rank algorithms.
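
The candidate-generation step described above treats each product's reviewer list as a transaction and mines reviewer sets that co-review enough products. A brute-force stand-in for the frequent itemset miner, on made-up data:

```python
# Candidate fake-reviewer groups via (brute-force) frequent itemset mining:
# each product's reviewer list is a transaction, and reviewer sets that
# co-occur on at least `minsup` products become candidates. Data is made up.
from itertools import combinations
from collections import Counter

reviews = {                      # product -> reviewers (synthetic)
    "p1": {"u1", "u2", "u3"},
    "p2": {"u1", "u2", "u3", "u9"},
    "p3": {"u1", "u2", "u3"},
    "p4": {"u4", "u5"},
}

def candidate_groups(reviews, size=3, minsup=2):
    counts = Counter()
    for reviewers in reviews.values():
        for group in combinations(sorted(reviewers), size):
            counts[group] += 1
    return [g for g, c in counts.items() if c >= minsup]

# ('u1', 'u2', 'u3') co-reviewed three products: a candidate collusion group,
# to be scored next by the paper's behavioral and relation models.
print(candidate_groups(reviews))
```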

Journal ArticleDOI
TL;DR: It is proved that obfuscation is impossible, by constructing a family of efficient programs that are unobfuscatable, in the sense that given any efficient program, the “source code” of that program can be efficiently reconstructed.
Abstract: Informally, an obfuscator O is an (efficient, probabilistic) “compiler” that takes as input a program (or circuit) P and produces a new program O(P) that has the same functionality as P yet is “unintelligible” in some sense. Obfuscators, if they exist, would have a wide variety of cryptographic and complexity-theoretic applications, ranging from software protection to homomorphic encryption to complexity-theoretic analogues of Rice's theorem. Most of these applications are based on an interpretation of the “unintelligibility” condition in obfuscation as meaning that O(P) is a “virtual black box,” in the sense that anything one can efficiently compute given O(P), one could also efficiently compute given oracle access to P. In this work, we initiate a theoretical investigation of obfuscation. Our main result is that, even under very weak formalizations of the above intuition, obfuscation is impossible. We prove this by constructing a family of efficient programs 𝒫 that are unobfuscatable in the sense that (a) given any efficient program P' that computes the same function as a program P ∈ 𝒫, the “source code” P can be efficiently reconstructed, yet (b) given oracle access to a (randomly selected) program P ∈ 𝒫, no efficient algorithm can reconstruct P (or even distinguish a certain bit in the code from random) except with negligible probability. We extend our impossibility result in a number of ways, including even obfuscators that (a) are not necessarily computable in polynomial time, (b) only approximately preserve the functionality, and (c) only need to work for very restricted models of computation (TC0). We also rule out several potential applications of obfuscators, by constructing “unobfuscatable” signature schemes, encryption schemes, and pseudorandom function families.

Journal ArticleDOI
TL;DR: The current practices in the information visualization research community are encapsulated and a different approach is provided to reaching decisions about what might be the most effective evaluation of a given information visualization.
Abstract: We take a new, scenario-based look at evaluation in information visualization. Our seven scenarios (evaluating visual data analysis and reasoning; evaluating user performance; evaluating user experience; evaluating environments and work practices; evaluating communication through visualization; evaluating visualization algorithms; and evaluating collaborative data analysis) were derived through an extensive literature review of over 800 visualization publications. These scenarios distinguish different study goals and types of research questions and are illustrated through example studies. Through this broad survey and the distillation of these scenarios, we make two contributions. One, we encapsulate the current practices in the information visualization research community and, two, we provide a different approach to reaching decisions about what might be the most effective evaluation of a given information visualization. Scenarios can be used to choose appropriate research questions and goals and the provided examples can be consulted for guidance on how to design one's own study.

Proceedings ArticleDOI
13 Aug 2012
TL;DR: This work proposes Deadline-Aware Datacenter TCP (D2TCP), a novel transport protocol that handles bursts, is deadline-aware, and is readily deployable; it employs a novel congestion avoidance algorithm that uses ECN feedback and deadlines to modulate the congestion window via a gamma-correction function.
Abstract: An important class of datacenter applications, called Online Data-Intensive (OLDI) applications, includes Web search, online retail, and advertisement. To achieve good user experience, OLDI applications operate under soft-real-time constraints (e.g., 300 ms latency) which imply deadlines for network communication within the applications. Further, OLDI applications typically employ tree-based algorithms which, in the common case, result in bursts of children-to-parent traffic with tight deadlines. Recent work on datacenter network protocols is either deadline-agnostic (DCTCP) or is deadline-aware (D3) but suffers under bursts due to race conditions. Further, D3 has the practical drawbacks of requiring changes to the switch hardware and not being able to coexist with legacy TCP. We propose Deadline-Aware Datacenter TCP (D2TCP), a novel transport protocol, which handles bursts, is deadline-aware, and is readily deployable. In designing D2TCP, we make two contributions: (1) D2TCP uses a distributed and reactive approach for bandwidth allocation which fundamentally enables D2TCP's properties. (2) D2TCP employs a novel congestion avoidance algorithm, which uses ECN feedback and deadlines to modulate the congestion window via a gamma-correction function. Using a small-scale implementation and at-scale simulations, we show that D2TCP reduces the fraction of missed deadlines compared to DCTCP and D3 by 75% and 50%, respectively.
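
The gamma-correction function can be sketched from the description above: a flow tracks alpha, the fraction of ECN-marked packets (as in DCTCP), and raises it to a deadline-imminence factor d; since alpha <= 1, near-deadline flows (d > 1) shrink the penalty and back off less. Constants below are illustrative:

```python
# Sketch of D2TCP's gamma-corrected congestion avoidance: alpha is the EWMA
# fraction of ECN-marked packets (as in DCTCP), d is the deadline-imminence
# factor. Near-deadline flows (d > 1) get p = alpha**d closer to 0 and back
# off less; far-deadline flows (d < 1) back off more.
def d2tcp_window_update(cwnd, alpha, d):
    """One congestion-avoidance step; cwnd in packets."""
    p = alpha ** d                      # gamma-correction of the ECN signal
    if p > 0:
        return cwnd * (1 - p / 2)       # deadline-aware multiplicative backoff
    return cwnd + 1                     # no congestion: additive increase

# Same congestion level, different deadlines:
print(d2tcp_window_update(100, alpha=0.5, d=2.0))   # near deadline: 87.5 pkts
print(d2tcp_window_update(100, alpha=0.5, d=0.5))   # far deadline: ~64.6 pkts
```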

Journal ArticleDOI
TL;DR: Findings from an exploratory study conducted with government officials in Arlington, VA between June and December 2010 are presented, with the broad goal of understanding social media use by government officials as well as community organizations, businesses, and the public at large.

Proceedings Article
25 Apr 2012
TL;DR: The HULL (High-bandwidth Ultra-Low Latency) architecture is presented to balance two seemingly contradictory goals, near baseline fabric latency and high bandwidth utilization; results show that by sacrificing a small amount of bandwidth, HULL can dramatically reduce average and tail latencies in the data center.
Abstract: Traditional measures of network goodness--goodput, quality of service, fairness--are expressed in terms of bandwidth. Network latency has rarely been a primary concern because delivering the highest level of bandwidth essentially entails driving up latency--at the mean and, especially, at the tail. Recently, however, there has been renewed interest in latency as a primary metric for mainstream applications. In this paper, we present the HULL (High-bandwidth Ultra-Low Latency) architecture to balance two seemingly contradictory goals: near baseline fabric latency and high bandwidth utilization. HULL leaves 'bandwidth headroom' using Phantom Queues that deliver congestion signals before network links are fully utilized and queues form at switches. By capping utilization at less than link capacity, we leave room for latency sensitive traffic to avoid buffering and the associated large delays. At the same time, we use DCTCP, a recently proposed congestion control algorithm, to adaptively respond to congestion and to mitigate the bandwidth penalties which arise from operating in a bufferless fashion. HULL further employs packet pacing to counter burstiness caused by Interrupt Coalescing and Large Send Offloading. Our implementation and simulation results show that by sacrificing a small amount (e.g., 10%) of bandwidth, HULL can dramatically reduce average and tail latencies in the data center.
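
A Phantom Queue is essentially a counter beside the link, drained at a configured fraction of line rate and marking ECN once the virtual backlog crosses a threshold, so congestion is signaled before the real queue builds. A toy simulation with illustrative numbers:

```python
# Toy Phantom Queue: a virtual backlog drained at less than line rate (here
# 90% of C) that ECN-marks packets once it crosses a threshold, signaling
# congestion before the real queue builds. Numbers are illustrative.
LINK_RATE = 10e9          # 10 Gbps link
DRAIN_FRACTION = 0.9      # phantom queue drains at 90% of line rate
MARK_THRESHOLD = 3000     # bytes of virtual backlog before marking

phantom_bytes = 0.0
last_time = 0.0

def on_packet(size_bytes, now):
    """Returns True if the packet should be ECN-marked."""
    global phantom_bytes, last_time
    drained = (now - last_time) * LINK_RATE * DRAIN_FRACTION / 8
    phantom_bytes = max(0.0, phantom_bytes - drained) + size_bytes
    last_time = now
    return phantom_bytes > MARK_THRESHOLD

# Back-to-back 1500B packets at full line rate overflow the phantom queue
# even though a real queue draining at C would stay empty.
t, marks = 0.0, 0
for _ in range(100):
    marks += on_packet(1500, t)
    t += 1500 * 8 / LINK_RATE        # packets arriving at line rate
print(marks, "of 100 packets marked")
```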

Journal ArticleDOI
TL;DR: A set of software tools for building and distributing models of macromolecular assemblies uses an integrative structure modeling approach, which casts the building of models as a computational optimization problem where information is encoded into a scoring function used to evaluate candidate models.
Abstract: A set of software tools for building and distributing models of macromolecular assemblies uses an integrative structure modeling approach, which casts the building of models as a computational optimization problem where information is encoded into a scoring function used to evaluate candidate models.

Journal ArticleDOI
TL;DR: OMERO is a software platform that enables access to and use of a wide range of biological data, and its design and flexibility have enabled its use for light-microscopy, high-content-screening, electron-microscopy and even non-image genotype data.
Abstract: The Open Microscopy Environment Remote Objects (OMERO) software platform provides a server-based system for managing and analyzing microscopy images and non-image data.

Proceedings ArticleDOI
12 Aug 2012
TL;DR: This paper proposes RolX (Role eXtraction), a scalable (linear in the number of edges), unsupervised learning approach for automatically extracting structural roles from general network data, and compares network role discovery with network community discovery.
Abstract: Given a network, intuitively two nodes belong to the same role if they have similar structural behavior. Roles should be automatically determined from the data, and could be, for example, "clique-members," "periphery-nodes," etc. Roles enable numerous novel and useful network-mining tasks, such as sense-making, searching for similar nodes, and node classification. This paper addresses the question: Given a graph, how can we automatically discover roles for nodes? We propose RolX (Role eXtraction), a scalable (linear in the number of edges), unsupervised learning approach for automatically extracting structural roles from general network data. We demonstrate the effectiveness of RolX on several network-mining tasks: from exploratory data analysis to network transfer learning. Moreover, we compare network role discovery with network community discovery. We highlight fundamental differences between the two (e.g., roles generalize across disconnected networks, communities do not); and show that the two approaches are complementary in nature.
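
The RolX pipeline can be sketched as: build a node-by-feature matrix of structural features, then factorize it with nonnegative matrix factorization so each node gets a mixed membership over r roles. The features below are deliberately simple degree-based stand-ins for the paper's recursive features, and the rank is fixed here rather than model-selected:

```python
# Sketch of role extraction: structural features per node, factorized with
# NMF so each node gets a soft membership over roles. Simple degree features
# stand in for the richer recursive features used in the paper.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(5)

# Random undirected graph as an adjacency matrix.
A = rng.random((50, 50)) < 0.08
A = np.triu(A, 1)
A = (A | A.T).astype(float)

degree = A.sum(axis=1)
neighbor_deg = A @ degree / np.maximum(degree, 1)   # mean neighbor degree
features = np.column_stack([degree, neighbor_deg])  # node-by-feature matrix

# Factorize: features ~= G @ F, where G (nodes x roles) gives memberships.
model = NMF(n_components=2, init="nndsvda", max_iter=500)
G = model.fit_transform(features)
roles = G.argmax(axis=1)        # hard role assignment per node
print(roles)
```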

Journal Article
TL;DR: This paper presents algorithms for learning kernels based on the notion of centered alignment, used as a similarity measure between kernels or kernel matrices; the algorithms consistently outperform the so-called uniform combination solution, which has proven difficult to improve upon in the past.
Abstract: This paper presents new and effective algorithms for learning kernels. In particular, as shown by our empirical results, these algorithms consistently outperform the so-called uniform combination solution that has proven to be difficult to improve upon in the past, as well as other algorithms for learning kernels based on convex combinations of base kernels in both classification and regression. Our algorithms are based on the notion of centered alignment which is used as a similarity measure between kernels or kernel matrices. We present a number of novel algorithmic, theoretical, and empirical results for learning kernels based on our notion of centered alignment. In particular, we describe efficient algorithms for learning a maximum alignment kernel by showing that the problem can be reduced to a simple QP and discuss a one-stage algorithm for learning both a kernel and a hypothesis based on that kernel using an alignment-based regularization. Our theoretical results include a novel concentration bound for centered alignment between kernel matrices, the proof of the existence of effective predictors for kernels with high alignment, both for classification and for regression, and the proof of stability-based generalization bounds for a broad family of algorithms for learning kernels based on centered alignment. We also report the results of experiments with our centered alignment-based algorithms in both classification and regression.
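
Centered alignment has a short closed form: center each kernel matrix with H = I - (1/n)11^T, then take a normalized Frobenius inner product. Computed directly from that definition:

```python
# Centered alignment between two kernel matrices: center each with
# H = I - (1/n) 11^T, then take a normalized Frobenius inner product.
import numpy as np

def centered_alignment(K1, K2):
    n = K1.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    K1c, K2c = H @ K1 @ H, H @ K2 @ H
    return np.sum(K1c * K2c) / (
        np.linalg.norm(K1c, "fro") * np.linalg.norm(K2c, "fro"))

rng = np.random.default_rng(6)
X = rng.normal(size=(30, 5))
y = np.sign(X[:, 0])

K = X @ X.T                     # linear kernel on the data
K_target = np.outer(y, y)       # ideal kernel induced by the labels
print(centered_alignment(K, K_target))   # in [-1, 1]; higher is better
```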

Proceedings Article
10 Jul 2012
TL;DR: A new edition of the Google Books Ngram Corpus, which describes how often words and phrases were used over a period of five centuries in eight languages, is presented; its syntactic annotations will facilitate the study of linguistic trends, especially those related to the evolution of syntax.
Abstract: We present a new edition of the Google Books Ngram Corpus, which describes how often words and phrases were used over a period of five centuries, in eight languages; it reflects 6% of all books ever published. This new edition introduces syntactic annotations: words are tagged with their part-of-speech, and head-modifier relationships are recorded. The annotations are produced automatically with statistical models that are specifically adapted to historical text. The corpus will facilitate the study of linguistic trends, especially those related to the evolution of syntax.
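
Consuming the corpus is straightforward: the published files carry one n-gram per line in the form "ngram TAB year TAB match_count TAB volume_count", and in this edition tokens may carry part-of-speech annotations such as burnt_VERB. A small parser, with the field layout assumed from the public distribution:

```python
# Parser for one line of the Ngram Corpus files; field layout
# ("ngram TAB year TAB match_count TAB volume_count", tokens optionally
# suffixed with "_POS") is assumed from the public distribution.
def parse_ngram_line(line):
    ngram, year, matches, volumes = line.rstrip("\n").split("\t")
    tokens = []
    for tok in ngram.split(" "):
        word, _, pos = tok.partition("_")   # "_POS" suffix is optional
        tokens.append((word, pos or None))
    return tokens, int(year), int(matches), int(volumes)

line = "burnt_VERB the_DET toast_NOUN\t1920\t17\t12"
print(parse_ngram_line(line))
```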

Patent
14 Aug 2012
TL;DR: In this paper, an electronic device including a frame configured to be worn on the head of a user is disclosed, which can include a bridge configured to be supported on the nose of the user and a brow portion coupled to and extending away from the bridge.
Abstract: An electronic device including a frame configured to be worn on the head of a user is disclosed. The frame can include a bridge configured to be supported on the nose of the user and a brow portion coupled to and extending away from the bridge and configured to be positioned over a side of a brow of the user. The frame can further include a first arm coupled to the brow portion and extending to a free end. The first arm can be positioned over a temple of the user with the free end disposed near an ear of the user. The device can also include a transparent display affixed to the frame adjacent the brow portion and an input affixed to the frame and configured for receiving from the user an input associated with a function. Information related to the function can be presentable on the display.

Journal ArticleDOI
TL;DR: This paper proposes a criterion for increasing the sample size based on variance estimates obtained during the computation of a batch gradient, and establishes an O(1/ε) complexity bound on the total cost of a gradient method.
Abstract: This paper presents a methodology for using varying sample sizes in batch-type optimization methods for large-scale machine learning problems. The first part of the paper deals with the delicate issue of dynamic sample selection in the evaluation of the function and gradient. We propose a criterion for increasing the sample size based on variance estimates obtained during the computation of a batch gradient. We establish an O(1/ε) complexity bound on the total cost of a gradient method. The second part of the paper describes a practical Newton method that uses a smaller sample to compute Hessian vector-products than to evaluate the function and the gradient, and that also employs a dynamic sampling technique. The focus shifts in the third part of the paper to L1-regularized problems designed to produce sparse solutions. We propose a Newton-like method that consists of two phases: a (minimalistic) gradient projection phase that identifies zero variables, and a subspace phase that applies a subsampled Hessian Newton iteration in the free variables. Numerical tests on speech recognition problems illustrate the performance of the algorithms.
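
The variance criterion from the first part can be sketched as: while computing a batch gradient over a sample S, estimate the per-example gradient variance, and grow S when the noise term dominates the gradient signal. A sketch in the spirit (not the letter) of the paper, on a least-squares stand-in:

```python
# Dynamic sample sizes for a batch gradient method: grow the sample when the
# estimated gradient variance, relative to |S|, dominates the gradient norm.
# Constants and the test follow the spirit of the paper, not its exact form.
import numpy as np

rng = np.random.default_rng(7)

def per_example_grads(w, X, y):
    # Least-squares loss stands in for the ML objective.
    return (X @ w - y)[:, None] * X

def needs_bigger_sample(w, X, y, idx, theta=0.9):
    g_i = per_example_grads(w, X[idx], y[idx])
    g = g_i.mean(axis=0)
    var = g_i.var(axis=0, ddof=1)
    # Grow the sample when the noise term dominates the gradient signal.
    return var.sum() / len(idx) > theta**2 * (g @ g)

X, w_true = rng.normal(size=(5000, 20)), rng.normal(size=20)
y = X @ w_true + 0.1 * rng.normal(size=5000)

w, sample_size = np.zeros(20), 50
for step in range(100):
    idx = rng.choice(5000, size=sample_size, replace=False)
    g = per_example_grads(w, X[idx], y[idx]).mean(axis=0)
    w -= 0.05 * g
    if needs_bigger_sample(w, X, y, idx):
        sample_size = min(5000, int(sample_size * 1.5))   # enlarge sample
print("final sample size:", sample_size)
```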

Journal Article
TL;DR: This work reports results of extensive experiments that provide a detailed comparison of various fixed and adaptive sampling techniques, and demonstrates the performance improvement associated with the ensemble Nyström method when used in conjunction with either fixed or adaptive sampling schemes.
Abstract: The Nyström method is an efficient technique to generate low-rank matrix approximations and is used in several large-scale learning applications. A key aspect of this method is the procedure according to which columns are sampled from the original matrix. In this work, we explore the efficacy of a variety of fixed and adaptive sampling schemes. We also propose a family of ensemble-based sampling algorithms for the Nyström method. We report results of extensive experiments that provide a detailed comparison of various fixed and adaptive sampling techniques, and demonstrate the performance improvement associated with the ensemble Nyström method when used in conjunction with either fixed or adaptive sampling schemes. Corroborating these empirical findings, we present a theoretical analysis of the Nyström method, providing novel error bounds guaranteeing a better convergence rate of the ensemble Nyström method in comparison to the standard Nyström method.
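
The standard Nyström approximation underlying the paper: sample m columns of an n x n kernel matrix K, then reconstruct K ≈ C W⁺ Cᵀ from the sampled columns C and their intersection W. Shown with fixed uniform sampling, the baseline the paper's adaptive and ensemble schemes improve on:

```python
# Standard Nystrom low-rank approximation: sample m columns of K, keep the
# sampled columns C and their intersection W, reconstruct K ~= C pinv(W) C^T.
# Uniform sampling here; the paper studies smarter and ensemble schemes.
import numpy as np

rng = np.random.default_rng(8)

X = rng.normal(size=(500, 10))
K = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))  # RBF kernel

m = 50
idx = rng.choice(len(X), size=m, replace=False)   # fixed uniform sampling
C = K[:, idx]
W = K[np.ix_(idx, idx)]
K_approx = C @ np.linalg.pinv(W) @ C.T

# Relative reconstruction error in Frobenius norm.
print(np.linalg.norm(K - K_approx, "fro") / np.linalg.norm(K, "fro"))
```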

Proceedings Article
21 Mar 2012
TL;DR: This work proposes a method that learns to assign MRs to a wide range of text thanks to a training scheme that combines learning from knowledge bases with learning from raw text.
Abstract: Open-text semantic parsers are designed to interpret any statement in natural language by inferring a corresponding meaning representation (MR – a formal representation of its sense). Unfortunately, large scale systems cannot be easily machine-learned due to a lack of directly supervised data. We propose a method that learns to assign MRs to a wide range of text (using a dictionary of more than 70,000 words mapped to more than 40,000 entities) thanks to a training scheme that combines learning from knowledge bases (e.g. WordNet) with learning from raw text. The model jointly learns representations of words, entities and MRs via a multi-task training process operating on these diverse sources of data. Hence, the system ends up providing methods for knowledge acquisition and word-sense disambiguation within the context of semantic parsing in a single elegant framework. Experiments on these various tasks indicate the promise of the approach.

Proceedings Article
01 Jan 2012
TL;DR: This work introduces a model which uses a deep recurrent autoencoder neural network to denoise input features for robust ASR, and demonstrates that the model is competitive with existing feature denoising approaches on the Aurora2 task and outperforms a tandem approach where deep networks are used to predict phoneme posteriors directly.
Abstract: Recent work on deep neural networks as acoustic models for automatic speech recognition (ASR) has demonstrated substantial performance improvements. We introduce a model which uses a deep recurrent autoencoder neural network to denoise input features for robust ASR. The model is trained on stereo (noisy and clean) audio features to predict clean features given noisy input. The model makes no assumptions about how noise affects the signal, nor the existence of distinct noise environments. Instead, the model can learn to model any type of distortion or additive noise given sufficient training data. We demonstrate the model is competitive with existing feature denoising approaches on the Aurora2 task, and outperforms a tandem approach where deep networks are used to predict phoneme posteriors directly.
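
The training setup is stereo regression: feed the network noisy features and minimize squared error against the clean ones. A minimal numpy stand-in with a single dense layer in place of the paper's deep recurrent autoencoder, on synthetic features:

```python
# Minimal sketch of the stereo denoising objective: regress clean features
# from noisy ones. One dense tanh layer stands in for the paper's deep
# recurrent autoencoder; features are synthetic.
import numpy as np

rng = np.random.default_rng(9)

clean = rng.normal(size=(2000, 24))            # clean features (synthetic)
noisy = clean + 0.5 * rng.normal(size=clean.shape)

W1 = rng.normal(scale=0.1, size=(24, 64)); b1 = np.zeros(64)
W2 = rng.normal(scale=0.1, size=(64, 24)); b2 = np.zeros(24)

lr = 0.01
for epoch in range(200):
    h = np.tanh(noisy @ W1 + b1)               # encode noisy input
    out = h @ W2 + b2                          # predict clean features
    err = out - clean                          # MSE gradient wrt output
    dW2 = h.T @ err / len(clean)
    dh = (err @ W2.T) * (1 - h**2)             # backprop through tanh
    dW1 = noisy.T @ dh / len(clean)
    W2 -= lr * dW2; b2 -= lr * err.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * dh.mean(axis=0)

print("denoising MSE:", float((err**2).mean()))
```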

Patent
26 Jun 2012
TL;DR: In this paper, the authors present a system and methods for a virtual input device that includes a projector and a camera, where the camera captures images that can be interpreted by a processor to determine actions.
Abstract: The present application discloses systems and methods for a virtual input device. In one example, the virtual input device includes a projector and a camera. The projector projects a pattern onto a surface. The camera captures images that can be interpreted by a processor to determine actions. The projector may be mounted on an arm of a pair of eyeglasses and the camera may be mounted on an opposite arm of the eyeglasses. A pattern for a virtual input device can be projected onto a “display hand” of a user, and the camera may be able to detect when the user uses an opposite hand to select items of the virtual input device. In another example, the camera may detect when the display hand is moving and interpret display hand movements as inputs to the virtual input device, and/or realign the projection onto the moving display hand.

Journal ArticleDOI
TL;DR: Logs contain a wealth of information that can help manage systems, improve their quality and efficiency, and improve the user experience.
Abstract: Computer-system logs provide a glimpse into the states of a running system. Instrumentation occasionally generates short messages that are collected in a system-specific log. The content and format...

Proceedings ArticleDOI
13 Aug 2012
TL;DR: 3D beamforming is proposed and evaluated, where 60 GHz signals bounce off data center ceilings, establishing indirect line-of-sight between any two racks in a data center and improving link range and the number of concurrent transmissions.
Abstract: Modern data centers are massive, and support a range of distributed applications across potentially hundreds of server racks. As their utilization and bandwidth needs continue to grow, traditional methods of augmenting bandwidth have proven complex and costly in time and resources. Recent measurements show that data center traffic is often limited by congestion loss caused by short traffic bursts. Thus an attractive alternative to adding physical bandwidth is to augment wired links with wireless links in the 60 GHz band. We address two limitations with current 60 GHz wireless proposals. First, 60 GHz wireless links are limited by line-of-sight, and can be blocked by even small obstacles. Second, even beamforming links leak power, and potential interference will severely limit concurrent transmissions in dense data centers. We propose and evaluate a new wireless primitive for data centers, 3D beamforming, where 60 GHz signals bounce off data center ceilings, thus establishing indirect line-of-sight between any two racks in a data center. We build a small 3D beamforming testbed to demonstrate its ability to address both link blockage and link interference, thus improving link range and number of concurrent transmissions in the data center. In addition, we propose a simple link scheduler and use traffic simulations to show that these 3D links significantly expand wireless capacity compared to their 2D counterparts.
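
The geometry behind the ceiling bounce is simple: to link racks d meters apart with antennas h meters below the ceiling, each antenna aims at the ceiling midpoint and the signal travels a mirror path of length sqrt(d^2 + (2h)^2). A quick look at the path stretch and tilt angle this implies (numbers illustrative):

```python
# Mirror-path geometry of a ceiling bounce: racks d meters apart, antennas
# h meters below the ceiling. Distances and heights below are illustrative.
import math

def bounce_path(d, h):
    """Length of the ceiling-reflected path between two racks."""
    return math.sqrt(d**2 + (2 * h) ** 2)

def elevation_angle(d, h):
    """Antenna tilt above horizontal toward the ceiling midpoint."""
    return math.degrees(math.atan2(2 * h, d))

h = 3.0                          # ceiling clearance in meters (assumed)
for d in (5, 20, 50):            # rack separation in meters (assumed)
    print(d, round(bounce_path(d, h), 1), round(elevation_angle(d, h), 1))
```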