
Showing papers published by Google in 2012


Journal ArticleDOI
TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Abstract: Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition benchmarks, sometimes by a large margin. This article provides an overview of this progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.

9,091 citations
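
The hybrid setup the abstract describes is easy to sketch: a feed-forward network maps a window of acoustic frames to posteriors over HMM states, which are then divided by state priors to give the scaled likelihoods an HMM decoder consumes. A minimal illustration with random weights; all sizes and priors below are made-up placeholders, not the systems from the article:

```python
# Sketch of the hybrid DNN-HMM idea: a feed-forward net maps a window of
# acoustic frames to posteriors over HMM states. Shapes, weights, and priors
# here are illustrative, not from the article.
import numpy as np

rng = np.random.default_rng(0)

n_states = 100          # HMM tied states (assumed)
frame_dim = 40          # e.g. filterbank coefficients per frame (assumed)
context = 11            # frames per input window (assumed)

# Two hidden layers with random weights stand in for a trained DNN.
W1 = rng.normal(scale=0.01, size=(frame_dim * context, 512))
W2 = rng.normal(scale=0.01, size=(512, 512))
W3 = rng.normal(scale=0.01, size=(512, n_states))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dnn_posteriors(window):
    """Window of `context` frames -> posteriors P(state | acoustics)."""
    h = np.maximum(window.reshape(-1) @ W1, 0.0)
    h = np.maximum(h @ W2, 0.0)
    return softmax(h @ W3)

# A hybrid system feeds the HMM likelihoods, so posteriors are divided by
# state priors to get "scaled likelihoods" before Viterbi decoding.
priors = np.full(n_states, 1.0 / n_states)   # assumed uniform here
window = rng.normal(size=(context, frame_dim))
scaled_likelihoods = dnn_posteriors(window) / priors
print(scaled_likelihoods.shape)
```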


Proceedings Article
03 Dec 2012
TL;DR: This paper considers the problem of training a deep network with billions of parameters using tens of thousands of CPU cores and develops two algorithms for large-scale distributed training, Downpour SGD and Sandblaster L-BFGS, which increase the scale and speed of deep network training.
Abstract: Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. Downpour SGD and Sandblaster L-BFGS both increase the scale and speed of deep network training. We have successfully used our system to train a deep network 30x larger than previously reported in the literature, achieving state-of-the-art performance on ImageNet, a visual object recognition task with 16 million images and 21k categories. We show that these same techniques dramatically accelerate the training of a more modestly-sized deep network for a commercial speech recognition service. Although we focus on and report performance of these methods as applied to training large neural networks, the underlying algorithms are applicable to any gradient-based machine learning algorithm.

3,475 citations
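
The Downpour SGD pattern (asynchronous replicas pulling possibly stale parameters and pushing gradients to a central server) can be sketched in a few lines. This is a toy single-process simulation on a least-squares problem, with made-up shard counts and learning rate, not the DistBelief implementation:

```python
# Toy sketch of the Downpour SGD pattern: model replicas asynchronously fetch
# parameters from a central server, compute gradients on their own data
# shard, and push updates back. Single-process simulation; the real system
# runs across thousands of machines.
import numpy as np

rng = np.random.default_rng(1)

# A least-squares problem stands in for "a deep network".
X, true_w = rng.normal(size=(1000, 10)), rng.normal(size=10)
y = X @ true_w

params = np.zeros(10)                        # parameter server state
shards = np.array_split(np.arange(1000), 4)  # one data shard per replica

def gradient(w, idx):
    err = X[idx] @ w - y[idx]
    return X[idx].T @ err / len(idx)

lr, n_fetch = 0.1, 5
for step in range(200):
    replica = step % len(shards)
    # Replicas refresh their local copy only every n_fetch steps, so they
    # often compute gradients against slightly stale parameters.
    if step % n_fetch == 0:
        local = params.copy()
    batch = rng.choice(shards[replica], size=32)
    params -= lr * gradient(local, batch)    # async push to the server

print("distance to optimum:", np.linalg.norm(params - true_w))
```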


Journal ArticleDOI
Hyunyoung Choi1, Hal R. Varian1
TL;DR: This paper shows how to use search engine data to forecast near-term values of economic indicators such as automobile sales, unemployment claims, travel destination planning, and consumer confidence.
Abstract: In this paper we show how to use search engine data to forecast near-term values of economic indicators. Examples include automobile sales, unemployment claims, travel destination planning and consumer confidence.

1,619 citations
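
The paper's basic recipe can be sketched as regressing an indicator on its own lag plus a contemporaneous search-volume index. A sketch on synthetic series; real use would plug in Google Trends data and a richer seasonal model:

```python
# Sketch of the nowcasting approach: regress an economic indicator on its own
# lag plus a contemporaneous search-volume index. The series below are
# synthetic stand-ins for real indicator and Google Trends data.
import numpy as np

rng = np.random.default_rng(2)
T = 120
trends = rng.normal(size=T)                       # search index (synthetic)
y = np.zeros(T)
for t in range(1, T):                             # AR(1) + trends signal
    y[t] = 0.7 * y[t - 1] + 0.5 * trends[t] + 0.1 * rng.normal()

# Design matrix: intercept, lagged indicator, current search index.
Xmat = np.column_stack([np.ones(T - 1), y[:-1], trends[1:]])
coef, *_ = np.linalg.lstsq(Xmat, y[1:], rcond=None)

# "Nowcast" the next value from the latest lag and the latest search data.
nowcast = coef @ np.array([1.0, y[-1], trends[-1]])
print(coef, nowcast)
```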


Proceedings ArticleDOI
08 Oct 2012
TL;DR: This article describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty, critical to supporting external consistency and a variety of powerful features.
Abstract: Spanner is Google's scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: nonblocking reads in the past, lock-free read-only transactions, and atomic schema changes, across all of Spanner.

1,366 citations
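
The clock-uncertainty idea behind the time API can be sketched directly: now() returns an interval guaranteed to contain true time, and a writer waits out the uncertainty before releasing locks so its commit timestamp is definitely in the past. A toy sketch with an assumed uncertainty bound, not the TrueTime implementation:

```python
# Sketch of the clock-uncertainty idea: now() returns an interval
# [earliest, latest] guaranteed to contain true time, and a writer
# "commit waits" until its timestamp is definitely in the past.
# EPSILON is a made-up constant.
import time

EPSILON = 0.007  # assumed clock uncertainty bound, in seconds

def tt_now():
    t = time.time()
    return (t - EPSILON, t + EPSILON)   # (earliest, latest)

def commit_wait(commit_ts):
    # Block until earliest > commit_ts, so every later reader's interval
    # lies entirely after the commit timestamp (external consistency).
    while tt_now()[0] <= commit_ts:
        time.sleep(EPSILON / 10)

commit_ts = tt_now()[1]     # pick a timestamp at the top of the interval
commit_wait(commit_ts)
print("safe to release locks at", time.time())
```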


Journal ArticleDOI
01 Aug 2012
TL;DR: This talk will trace the history of the Approximate Frequency Counts paper, how it was conceptualized and how it influenced data stream research.
Abstract: Research in data stream algorithms has blossomed since the late 1990s. The talk will trace the history of the Approximate Frequency Counts paper, how it was conceptualized, and how it influenced data stream research. The talk will also touch upon a recent development: analysis of personal data streams for improving our quality of life.

1,291 citations
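
The Approximate Frequency Counts paper the talk traces introduced lossy counting, a one-pass algorithm for finding frequent items in a stream. A compact rendering of the published algorithm:

```python
# Lossy counting: one pass over a stream, frequency undercounted by at most
# epsilon * N, with memory that grows only logarithmically in N.
import math

def lossy_count(stream, epsilon):
    """Return surviving items with their estimated frequencies."""
    width = math.ceil(1 / epsilon)      # bucket width
    counts, deltas = {}, {}
    bucket = 1
    for n, item in enumerate(stream, start=1):
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
            deltas[item] = bucket - 1   # max undercount for a new entry
        if n % width == 0:              # prune at each bucket boundary
            for it in [i for i in counts if counts[i] + deltas[i] <= bucket]:
                del counts[it], deltas[it]
            bucket += 1
    return counts

stream = ["a", "b", "a", "c", "a", "b", "a"] * 100
print(lossy_count(stream, epsilon=0.01))
```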


Journal ArticleDOI
TL;DR: In this paper, a generic objectness measure is proposed to quantify how likely an image window is to contain an object of any class, such as cows and telephones, from amorphous background elements such as grass and road.
Abstract: We present a generic objectness measure, quantifying how likely it is for an image window to contain an object of any class. We explicitly train it to distinguish objects with a well-defined boundary in space, such as cows and telephones, from amorphous background elements, such as grass and road. The measure combines in a Bayesian framework several image cues measuring characteristics of objects, such as appearing different from their surroundings and having a closed boundary. These include an innovative cue to measure the closed boundary characteristic. In experiments on the challenging PASCAL VOC 07 dataset, we show this new cue to outperform a state-of-the-art saliency measure, and the combined objectness measure to perform better than any cue alone. We also compare to interest point operators, a HOG detector, and three recent works aiming at automatic object segmentation. Finally, we present two applications of objectness. In the first, we sample a small number of windows according to their objectness probability and give an algorithm to employ them as location priors for modern class-specific object detectors. As we show experimentally, this greatly reduces the number of windows evaluated by the expensive class-specific model. In the second application, we use objectness as a complementary score in addition to the class-specific model, which leads to fewer false positives. As shown in several recent papers, objectness can act as a valuable focus of attention mechanism in many other applications operating on image windows, including weakly supervised learning of object categories, unsupervised pixelwise segmentation, and object tracking in video. Computing objectness is very efficient and takes only about 4 seconds per image.

1,223 citations
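
The first application above, sampling windows according to their objectness probability, reduces to weighted sampling without replacement. A toy sketch with synthetic windows and scores:

```python
# Sampling candidate windows in proportion to their objectness scores, to use
# as location priors for a class-specific detector. Windows and scores here
# are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(3)

# Candidate windows as (x, y, w, h) with one objectness score each.
windows = rng.integers(0, 200, size=(1000, 4))
scores = rng.random(1000) ** 4            # most windows score low

probs = scores / scores.sum()
# Draw a small set; high-objectness windows are picked far more often.
picked = rng.choice(len(windows), size=50, replace=False, p=probs)
location_priors = windows[picked]
print(location_priors[:5])
```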


Journal ArticleDOI
TL;DR: This work proposes a semi-supervised hashing (SSH) framework that minimizes empirical error over the labeled set and an information theoretic regularizer over both labeled and unlabeled sets, and presents three different semi-supervised hashing methods, including orthogonal hashing, nonorthogonal hashing, and sequential hashing.
Abstract: Hashing-based approximate nearest neighbor (ANN) search in huge databases has become popular due to its computational and memory efficiency. The popular hashing methods, e.g., Locality Sensitive Hashing and Spectral Hashing, construct hash functions based on random or principal projections. The resulting hashes are either not very accurate or are inefficient. Moreover, these methods are designed for a given metric similarity. On the contrary, semantic similarity is usually given in terms of pairwise labels of samples. There exist supervised hashing methods that can handle such semantic similarity, but they are prone to overfitting when labeled data are small or noisy. In this work, we propose a semi-supervised hashing (SSH) framework that minimizes empirical error over the labeled set and an information theoretic regularizer over both labeled and unlabeled sets. Based on this framework, we present three different semi-supervised hashing methods, including orthogonal hashing, nonorthogonal hashing, and sequential hashing. Particularly, the sequential hashing method generates robust codes in which each hash function is designed to correct the errors made by the previous ones. We further show that the sequential learning paradigm can be extended to unsupervised domains where no labeled pairs are available. Extensive experiments on four large datasets (up to 80 million samples) demonstrate the superior performance of the proposed SSH methods over state-of-the-art supervised and unsupervised hashing techniques.

834 citations
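
The family of methods discussed here produces binary codes as signs of linear projections; SSH's contribution is learning the projection matrix from labeled pairs plus the regularizer. The baseline below just uses random projections (the LSH-style starting point), purely for illustration:

```python
# Projection-based binary hashing: codes are signs of linear projections.
# SSH would *learn* W from labeled pairs plus a regularizer over all data;
# here W is random, i.e. the baseline the paper improves on.
import numpy as np

rng = np.random.default_rng(4)

X = rng.normal(size=(10000, 64))      # database points
W = rng.normal(size=(64, 16))         # 16-bit codes; SSH would learn this

codes = (X @ W > 0).astype(np.uint8)  # one binary code per point

def hamming_neighbors(q, k=5):
    qc = (q @ W > 0).astype(np.uint8)
    dists = (codes != qc).sum(axis=1)  # Hamming distance to every code
    return np.argsort(dists)[:k]

print(hamming_neighbors(rng.normal(size=64)))
```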


Proceedings Article
13 Jun 2012
TL;DR: The paper presents AddressSanitizer, a new memory error detector that achieves efficiency without sacrificing comprehensiveness, and has found over 300 previously unknown bugs in the Chromium browser and many bugs in other software.
Abstract: Memory access bugs, including buffer overflows and uses of freed heap memory, remain a serious problem for programming languages like C and C++. Many memory error detectors exist, but most of them are either slow or detect a limited set of bugs, or both. This paper presents AddressSanitizer, a new memory error detector. Our tool finds out-of-bounds accesses to heap, stack, and global objects, as well as use-after-free bugs. It employs a specialized memory allocator and code instrumentation that is simple enough to be implemented in any compiler, binary translation system, or even in hardware. AddressSanitizer achieves efficiency without sacrificing comprehensiveness. Its average slowdown is just 73% yet it accurately detects bugs at the point of occurrence. It has found over 300 previously unknown bugs in the Chromium browser and many bugs in other software.

795 citations
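
AddressSanitizer's published shadow-memory scheme maps every 8 bytes of application memory to one shadow byte at (addr >> 3) + offset; 0 means all 8 bytes are addressable, 1-7 means only the first k bytes are, and negative values mark redzones. A Python model of that check, with illustrative constants:

```python
# Python model of the ASan shadow-memory check: every 8 application bytes map
# to one shadow byte at (addr >> 3) + offset. Constants are illustrative.
SHADOW_OFFSET = 0x7FFF8000  # typical x86-64 offset; platform-dependent

shadow = {}  # sparse stand-in for the real shadow region

def poison(addr, size, value=-1):
    for a in range(addr, addr + size, 8):
        shadow[(a >> 3) + SHADOW_OFFSET] = value

def check_access(addr, size=1):
    s = shadow.get((addr >> 3) + SHADOW_OFFSET, 0)
    # 0: all 8 bytes OK; 1..7: only first s bytes OK; negative: redzone.
    if s != 0 and (s < 0 or (addr & 7) + size > s):
        raise MemoryError(f"ASan-style report: bad access at {addr:#x}")

# A 16-byte "allocation" at 0x1000 surrounded by poisoned redzones.
poison(0x1000 - 16, 16)          # left redzone
poison(0x1000 + 16, 16)          # right redzone
check_access(0x1000 + 8)         # in bounds: fine
try:
    check_access(0x1000 + 16)    # out of bounds -> detected
except MemoryError as e:
    print(e)
```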


Proceedings Article
26 Jun 2012
TL;DR: In this paper, a 9-layered locally connected sparse autoencoder with pooling and local contrast normalization was used to learn high-level, class-specific feature detectors from only unlabeled data.
Abstract: We consider the problem of building high-level, class-specific feature detectors from only unlabeled data. For example, is it possible to learn a face detector using only unlabeled images? To answer this, we train a 9-layered locally connected sparse autoencoder with pooling and local contrast normalization on a large dataset of images (the model has 1 billion connections, the dataset has 10 million 200×200 pixel images downloaded from the Internet). We train this network using model parallelism and asynchronous SGD on a cluster with 1,000 machines (16,000 cores) for three days. Contrary to what appears to be a widely-held intuition, our experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not. Control experiments show that this feature detector is robust not only to translation but also to scaling and out-of-plane rotation. We also find that the same network is sensitive to other high-level concepts such as cat faces and human bodies. Starting with these learned features, we trained our network to obtain 15.8% accuracy in recognizing 20,000 object categories from ImageNet, a leap of 70% relative improvement over the previous state-of-the-art.

786 citations


Proceedings ArticleDOI
Mike Schuster1, Kaisuke Nakajima1
25 Mar 2012
TL;DR: The techniques used to deal with an infinite vocabulary, how modeling completely in the written domain for language model and dictionary can avoid some system complexity, and how dictionaries, language and acoustic models are built in this framework are described.
Abstract: This paper describes challenges and solutions for building a successful voice search system as applied to Japanese and Korean at Google. We describe the techniques used to deal with an infinite vocabulary, how modeling completely in the written domain for language model and dictionary can avoid some system complexity, and how we built dictionaries, language and acoustic models in this framework. We show how to deal with the difficulty of scoring results for multiple-script languages because of ambiguities. The development of voice search for these languages led to a significant simplification of the original process to build a system for any new language, which in part became our default process for internationalization of voice search.

774 citations


Proceedings Article
01 May 2012
TL;DR: This work proposes a tagset that consists of twelve universal part-of-speech categories and develops a mapping from 25 different treebank tagsets to this universal set, which when combined with the original treebank data produces a dataset consisting of common parts-of-speech for 22 different languages.
Abstract: To facilitate future research in unsupervised induction of syntactic structure and to standardize best-practices, we propose a tagset that consists of twelve universal part-of-speech categories. In addition to the tagset, we develop a mapping from 25 different treebank tagsets to this universal set. As a result, when combined with the original treebank data, this universal tagset and mapping produce a dataset consisting of common parts-of-speech for 22 different languages. We highlight the use of this resource via three experiments that (1) compare tagging accuracies across languages, (2) present an unsupervised grammar induction approach that does not use gold standard part-of-speech tags, and (3) use the universal tags to transfer dependency parsers between languages, achieving state-of-the-art results.
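
The released mapping is essentially a lookup table per treebank. A sketch with a few Penn Treebank entries (the full resource covers 25 tagsets); unknown fine-grained tags fall back to the catch-all category X:

```python
# Sketch of the tagset mapping: a lookup from one treebank's fine-grained
# tags to the twelve universal categories (NOUN, VERB, ADJ, ADV, PRON, DET,
# ADP, NUM, CONJ, PRT, '.', X). Only a few Penn Treebank entries shown.
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB", "VBG": "VERB", "VBP": "VERB",
    "JJ": "ADJ", "RB": "ADV", "PRP": "PRON", "DT": "DET",
    "IN": "ADP", "CD": "NUM", "CC": "CONJ", "RP": "PRT",
    ".": ".", ",": ".",
}

def to_universal(tagged_sentence):
    # Unknown fine-grained tags fall back to the catch-all category X.
    return [(w, PTB_TO_UNIVERSAL.get(t, "X")) for w, t in tagged_sentence]

print(to_universal([("Dogs", "NNS"), ("bark", "VBP"), (".", ".")]))
```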

Proceedings ArticleDOI
16 Apr 2012
TL;DR: This paper studies spam detection in the collaborative setting, i.e., discovering fake reviewer groups, using several behavioral models derived from the collusion phenomenon among fake reviewers, together with relation models based on the relationships among groups, individual reviewers, and the products they reviewed.
Abstract: Opinionated social media such as product reviews are now widely used by individuals and organizations for their decision making. However, for profit or fame, people try to game the system by opinion spamming (e.g., writing fake reviews) to promote or demote some target products. For reviews to reflect genuine user experiences and opinions, such spam reviews should be detected. Prior works on opinion spam focused on detecting fake reviews and individual fake reviewers. However, a fake reviewer group (a group of reviewers who work collaboratively to write fake reviews) is even more damaging, as it can take total control of the sentiment on the target product due to its size. This paper studies spam detection in the collaborative setting, i.e., to discover fake reviewer groups. The proposed method first uses a frequent itemset mining method to find a set of candidate groups. It then uses several behavioral models derived from the collusion phenomenon among fake reviewers and relation models based on the relationships among groups, individual reviewers, and products they reviewed to detect fake reviewer groups. Additionally, we built a labeled dataset of fake reviewer groups. Although labeling individual fake reviews and reviewers is very hard, to our surprise labeling fake reviewer groups is much easier. We also note that the proposed technique departs from the traditional supervised learning approach for spam detection because the inherent nature of our problem makes the classic supervised learning approach less effective. Experimental results show that the proposed method outperforms multiple strong baselines including the state-of-the-art supervised classification, regression, and learning to rank algorithms.
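
The candidate-generation step described above treats each product's reviewer list as a transaction and mines reviewer sets that co-review enough products. A brute-force stand-in for the frequent itemset miner, on made-up data:

```python
# Candidate fake-reviewer groups via (brute-force) frequent itemset mining:
# each product's reviewer list is a transaction, and reviewer sets that
# co-occur on at least `minsup` products become candidates. Data is made up.
from itertools import combinations
from collections import Counter

reviews = {                      # product -> reviewers (synthetic)
    "p1": {"u1", "u2", "u3"},
    "p2": {"u1", "u2", "u3", "u9"},
    "p3": {"u1", "u2", "u3"},
    "p4": {"u4", "u5"},
}

def candidate_groups(reviews, size=3, minsup=2):
    counts = Counter()
    for reviewers in reviews.values():
        for group in combinations(sorted(reviewers), size):
            counts[group] += 1
    return [g for g, c in counts.items() if c >= minsup]

# ('u1', 'u2', 'u3') co-reviewed three products: a candidate collusion group,
# to be scored next by the paper's behavioral and relation models.
print(candidate_groups(reviews))
```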

Journal ArticleDOI
TL;DR: It is proved that obfuscation is impossible, by constructing a family of efficient programs that are unobfuscatable, in the sense that given any efficient program, the “source code” of that program can be efficiently reconstructed.
Abstract: Informally, an obfuscator O is an (efficient, probabilistic) “compiler” that takes as input a program (or circuit) P and produces a new program O(P) that has the same functionality as P yet is “unintelligible” in some sense. Obfuscators, if they exist, would have a wide variety of cryptographic and complexity-theoretic applications, ranging from software protection to homomorphic encryption to complexity-theoretic analogues of Rice's theorem. Most of these applications are based on an interpretation of the “unintelligibility” condition in obfuscation as meaning that O(P) is a “virtual black box,” in the sense that anything one can efficiently compute given O(P), one could also efficiently compute given oracle access to P. In this work, we initiate a theoretical investigation of obfuscation. Our main result is that, even under very weak formalizations of the above intuition, obfuscation is impossible. We prove this by constructing a family of efficient programs 𝒫 that are unobfuscatable in the sense that (a) given any efficient program P' that computes the same function as a program P ∈ 𝒫, the “source code” P can be efficiently reconstructed, yet (b) given oracle access to a (randomly selected) program P ∈ 𝒫, no efficient algorithm can reconstruct P (or even distinguish a certain bit in the code from random) except with negligible probability. We extend our impossibility result in a number of ways, including even obfuscators that (a) are not necessarily computable in polynomial time, (b) only approximately preserve the functionality, and (c) only need to work for very restricted models of computation (TC0). We also rule out several potential applications of obfuscators, by constructing “unobfuscatable” signature schemes, encryption schemes, and pseudorandom function families.

Journal ArticleDOI
TL;DR: The current practices in the information visualization research community are encapsulated and a different approach is provided to reaching decisions about what might be the most effective evaluation of a given information visualization.
Abstract: We take a new, scenario-based look at evaluation in information visualization. Our seven scenarios (evaluating visual data analysis and reasoning; evaluating user performance; evaluating user experience; evaluating environments and work practices; evaluating communication through visualization; evaluating visualization algorithms; and evaluating collaborative data analysis) were derived through an extensive literature review of over 800 visualization publications. These scenarios distinguish different study goals and types of research questions and are illustrated through example studies. Through this broad survey and the distillation of these scenarios, we make two contributions. One, we encapsulate the current practices in the information visualization research community and, two, we provide a different approach to reaching decisions about what might be the most effective evaluation of a given information visualization. Scenarios can be used to choose appropriate research questions and goals and the provided examples can be consulted for guidance on how to design one's own study.

Proceedings ArticleDOI
13 Aug 2012
TL;DR: This work proposes Deadline-Aware Datacenter TCP (D2TCP), a novel transport protocol that handles bursts, is deadline-aware, and is readily deployable; it employs a novel congestion avoidance algorithm that uses ECN feedback and deadlines to modulate the congestion window via a gamma-correction function.
Abstract: An important class of datacenter applications, called Online Data-Intensive (OLDI) applications, includes Web search, online retail, and advertisement. To achieve good user experience, OLDI applications operate under soft-real-time constraints (e.g., 300 ms latency) which imply deadlines for network communication within the applications. Further, OLDI applications typically employ tree-based algorithms which, in the common case, result in bursts of children-to-parent traffic with tight deadlines. Recent work on datacenter network protocols is either deadline-agnostic (DCTCP) or is deadline-aware (D3) but suffers under bursts due to race conditions. Further, D3 has the practical drawbacks of requiring changes to the switch hardware and not being able to coexist with legacy TCP. We propose Deadline-Aware Datacenter TCP (D2TCP), a novel transport protocol, which handles bursts, is deadline-aware, and is readily deployable. In designing D2TCP, we make two contributions: (1) D2TCP uses a distributed and reactive approach for bandwidth allocation which fundamentally enables D2TCP's properties. (2) D2TCP employs a novel congestion avoidance algorithm, which uses ECN feedback and deadlines to modulate the congestion window via a gamma-correction function. Using a small-scale implementation and at-scale simulations, we show that D2TCP reduces the fraction of missed deadlines compared to DCTCP and D3 by 75% and 50%, respectively.
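
The gamma-correction function can be sketched from the description above: a flow tracks alpha, the fraction of ECN-marked packets (as in DCTCP), and raises it to a deadline-imminence factor d; since alpha <= 1, near-deadline flows (d > 1) shrink the penalty and back off less. Constants below are illustrative:

```python
# Sketch of D2TCP's gamma-corrected congestion avoidance: alpha is the EWMA
# fraction of ECN-marked packets (as in DCTCP), d is the deadline-imminence
# factor. Near-deadline flows (d > 1) get p = alpha**d closer to 0 and back
# off less; far-deadline flows (d < 1) back off more.
def d2tcp_window_update(cwnd, alpha, d):
    """One congestion-avoidance step; cwnd in packets."""
    p = alpha ** d                      # gamma-correction of the ECN signal
    if p > 0:
        return cwnd * (1 - p / 2)       # deadline-aware multiplicative backoff
    return cwnd + 1                     # no congestion: additive increase

# Same congestion level, different deadlines:
print(d2tcp_window_update(100, alpha=0.5, d=2.0))   # near deadline: 87.5 pkts
print(d2tcp_window_update(100, alpha=0.5, d=0.5))   # far deadline: ~64.6 pkts
```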

Journal ArticleDOI
TL;DR: Findings from an exploratory study conducted with government officials in Arlington, VA between June and December 2010 are presented, with the broad goal of understanding social media use by government officials as well as community organizations, businesses, and the public at large.

Proceedings Article
25 Apr 2012
TL;DR: The HULL (High-bandwidth Ultra-Low Latency) architecture is presented to balance two seemingly contradictory goals, near baseline fabric latency and high bandwidth utilization; results show that by sacrificing a small amount of bandwidth, HULL can dramatically reduce average and tail latencies in the data center.
Abstract: Traditional measures of network goodness--goodput, quality of service, fairness--are expressed in terms of bandwidth. Network latency has rarely been a primary concern because delivering the highest level of bandwidth essentially entails driving up latency--at the mean and, especially, at the tail. Recently, however, there has been renewed interest in latency as a primary metric for mainstream applications. In this paper, we present the HULL (High-bandwidth Ultra-Low Latency) architecture to balance two seemingly contradictory goals: near baseline fabric latency and high bandwidth utilization. HULL leaves 'bandwidth headroom' using Phantom Queues that deliver congestion signals before network links are fully utilized and queues form at switches. By capping utilization at less than link capacity, we leave room for latency sensitive traffic to avoid buffering and the associated large delays. At the same time, we use DCTCP, a recently proposed congestion control algorithm, to adaptively respond to congestion and to mitigate the bandwidth penalties which arise from operating in a bufferless fashion. HULL further employs packet pacing to counter burstiness caused by Interrupt Coalescing and Large Send Offloading. Our implementation and simulation results show that by sacrificing a small amount (e.g., 10%) of bandwidth, HULL can dramatically reduce average and tail latencies in the data center.
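
A Phantom Queue is essentially a counter beside the link, drained at a configured fraction of line rate and marking ECN once the virtual backlog crosses a threshold, so congestion is signaled before the real queue builds. A toy simulation with illustrative numbers:

```python
# Toy Phantom Queue: a virtual backlog drained at less than line rate (here
# 90% of C) that ECN-marks packets once it crosses a threshold, signaling
# congestion before the real queue builds. Numbers are illustrative.
LINK_RATE = 10e9          # 10 Gbps link
DRAIN_FRACTION = 0.9      # phantom queue drains at 90% of line rate
MARK_THRESHOLD = 3000     # bytes of virtual backlog before marking

phantom_bytes = 0.0
last_time = 0.0

def on_packet(size_bytes, now):
    """Returns True if the packet should be ECN-marked."""
    global phantom_bytes, last_time
    drained = (now - last_time) * LINK_RATE * DRAIN_FRACTION / 8
    phantom_bytes = max(0.0, phantom_bytes - drained) + size_bytes
    last_time = now
    return phantom_bytes > MARK_THRESHOLD

# Back-to-back 1500B packets at full line rate overflow the phantom queue
# even though a real queue draining at C would stay empty.
t, marks = 0.0, 0
for _ in range(100):
    marks += on_packet(1500, t)
    t += 1500 * 8 / LINK_RATE        # packets arriving at line rate
print(marks, "of 100 packets marked")
```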

Journal ArticleDOI
TL;DR: A set of software tools for building and distributing models of macromolecular assemblies uses an integrative structure modeling approach, which casts the building of models as a computational optimization problem where information is encoded into a scoring function used to evaluate candidate models.
Abstract: A set of software tools for building and distributing models of macromolecular assemblies uses an integrative structure modeling approach, which casts the building of models as a computational optimization problem where information is encoded into a scoring function used to evaluate candidate models.

Journal ArticleDOI
TL;DR: OMERO is a software platform that enables access to and use of a wide range of biological data, and its design and flexibility have enabled its use for light-microscopy, high-content-screening, electron-microscopy and even non-image genotype data.
Abstract: The Open Microscopy Environment Remote Objects (OMERO) software platform provides a server-based system for managing and analyzing microscopy images and non-image data.

Proceedings ArticleDOI
12 Aug 2012
TL;DR: This paper proposes RolX (Role eXtraction), a scalable (linear in the number of edges), unsupervised learning approach for automatically extracting structural roles from general network data, and compares network role discovery with network community discovery.
Abstract: Given a network, intuitively two nodes belong to the same role if they have similar structural behavior. Roles should be automatically determined from the data, and could be, for example, "clique-members," "periphery-nodes," etc. Roles enable numerous novel and useful network-mining tasks, such as sense-making, searching for similar nodes, and node classification. This paper addresses the question: Given a graph, how can we automatically discover roles for nodes? We propose RolX (Role eXtraction), a scalable (linear in the number of edges), unsupervised learning approach for automatically extracting structural roles from general network data. We demonstrate the effectiveness of RolX on several network-mining tasks: from exploratory data analysis to network transfer learning. Moreover, we compare network role discovery with network community discovery. We highlight fundamental differences between the two (e.g., roles generalize across disconnected networks, communities do not); and show that the two approaches are complementary in nature.
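
The RolX pipeline can be sketched as: build a node-by-feature matrix of structural features, then factorize it with nonnegative matrix factorization so each node gets a mixed membership over r roles. The features below are deliberately simple degree-based stand-ins for the paper's recursive features, and the rank is fixed here rather than model-selected:

```python
# Sketch of role extraction: structural features per node, factorized with
# NMF so each node gets a soft membership over roles. Simple degree features
# stand in for the richer recursive features used in the paper.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(5)

# Random undirected graph as an adjacency matrix.
A = rng.random((50, 50)) < 0.08
A = np.triu(A, 1)
A = (A | A.T).astype(float)

degree = A.sum(axis=1)
neighbor_deg = A @ degree / np.maximum(degree, 1)   # mean neighbor degree
features = np.column_stack([degree, neighbor_deg])  # node-by-feature matrix

# Factorize: features ~= G @ F, where G (nodes x roles) gives memberships.
model = NMF(n_components=2, init="nndsvda", max_iter=500)
G = model.fit_transform(features)
roles = G.argmax(axis=1)        # hard role assignment per node
print(roles)
```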

Journal Article
TL;DR: This paper presents algorithms for learning kernels based on the notion of centered alignment, used as a similarity measure between kernels or kernel matrices; the algorithms consistently outperform the so-called uniform combination solution, which has proven difficult to improve upon in the past.
Abstract: This paper presents new and effective algorithms for learning kernels. In particular, as shown by our empirical results, these algorithms consistently outperform the so-called uniform combination solution that has proven to be difficult to improve upon in the past, as well as other algorithms for learning kernels based on convex combinations of base kernels in both classification and regression. Our algorithms are based on the notion of centered alignment which is used as a similarity measure between kernels or kernel matrices. We present a number of novel algorithmic, theoretical, and empirical results for learning kernels based on our notion of centered alignment. In particular, we describe efficient algorithms for learning a maximum alignment kernel by showing that the problem can be reduced to a simple QP and discuss a one-stage algorithm for learning both a kernel and a hypothesis based on that kernel using an alignment-based regularization. Our theoretical results include a novel concentration bound for centered alignment between kernel matrices, the proof of the existence of effective predictors for kernels with high alignment, both for classification and for regression, and the proof of stability-based generalization bounds for a broad family of algorithms for learning kernels based on centered alignment. We also report the results of experiments with our centered alignment-based algorithms in both classification and regression.
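
Centered alignment has a short closed form: center each kernel matrix with H = I - (1/n)11^T, then take a normalized Frobenius inner product. Computed directly from that definition:

```python
# Centered alignment between two kernel matrices: center each with
# H = I - (1/n) 11^T, then take a normalized Frobenius inner product.
import numpy as np

def centered_alignment(K1, K2):
    n = K1.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    K1c, K2c = H @ K1 @ H, H @ K2 @ H
    return np.sum(K1c * K2c) / (
        np.linalg.norm(K1c, "fro") * np.linalg.norm(K2c, "fro"))

rng = np.random.default_rng(6)
X = rng.normal(size=(30, 5))
y = np.sign(X[:, 0])

K = X @ X.T                     # linear kernel on the data
K_target = np.outer(y, y)       # ideal kernel induced by the labels
print(centered_alignment(K, K_target))   # in [-1, 1]; higher is better
```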

Proceedings Article
10 Jul 2012
TL;DR: A new edition of the Google Books Ngram Corpus, which describes how often words and phrases were used over a period of five centuries in eight languages, is presented; its syntactic annotations will facilitate the study of linguistic trends, especially those related to the evolution of syntax.
Abstract: We present a new edition of the Google Books Ngram Corpus, which describes how often words and phrases were used over a period of five centuries, in eight languages; it reflects 6% of all books ever published. This new edition introduces syntactic annotations: words are tagged with their part-of-speech, and head-modifier relationships are recorded. The annotations are produced automatically with statistical models that are specifically adapted to historical text. The corpus will facilitate the study of linguistic trends, especially those related to the evolution of syntax.
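
Consuming the corpus is straightforward: the published files carry one n-gram per line in the form "ngram TAB year TAB match_count TAB volume_count", and in this edition tokens may carry part-of-speech annotations such as burnt_VERB. A small parser, with the field layout assumed from the public distribution:

```python
# Parser for one line of the Ngram Corpus files; field layout
# ("ngram TAB year TAB match_count TAB volume_count", tokens optionally
# suffixed with "_POS") is assumed from the public distribution.
def parse_ngram_line(line):
    ngram, year, matches, volumes = line.rstrip("\n").split("\t")
    tokens = []
    for tok in ngram.split(" "):
        word, _, pos = tok.partition("_")   # "_POS" suffix is optional
        tokens.append((word, pos or None))
    return tokens, int(year), int(matches), int(volumes)

line = "burnt_VERB the_DET toast_NOUN\t1920\t17\t12"
print(parse_ngram_line(line))
```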

Patent
14 Aug 2012
TL;DR: In this paper, an electronic device including a frame configured to be worn on the head of a user is disclosed, which can include a bridge configured to be supported on the nose of the user and a brow portion coupled to and extending away from the bridge.
Abstract: An electronic device including a frame configured to be worn on the head of a user is disclosed. The frame can include a bridge configured to be supported on the nose of the user and a brow portion coupled to and extending away from the bridge and configured to be positioned over a side of a brow of the user. The frame can further include a first arm coupled to the brow portion and extending to a free end. The first arm can be positioned over a temple of the user with the free end disposed near an ear of the user. The device can also include a transparent display affixed to the frame adjacent the brow portion and an input affixed to the frame and configured for receiving from the user an input associated with a function. Information related to the function can be presentable on the display.

Journal ArticleDOI
TL;DR: This paper proposes a criterion for increasing the sample size based on variance estimates obtained during the computation of a batch gradient, and establishes an O(1/ε) complexity bound on the total cost of a gradient method.
Abstract: This paper presents a methodology for using varying sample sizes in batch-type optimization methods for large-scale machine learning problems. The first part of the paper deals with the delicate issue of dynamic sample selection in the evaluation of the function and gradient. We propose a criterion for increasing the sample size based on variance estimates obtained during the computation of a batch gradient. We establish an O(1/ε) complexity bound on the total cost of a gradient method. The second part of the paper describes a practical Newton method that uses a smaller sample to compute Hessian vector-products than to evaluate the function and the gradient, and that also employs a dynamic sampling technique. The focus shifts in the third part of the paper to L1-regularized problems designed to produce sparse solutions. We propose a Newton-like method that consists of two phases: a (minimalistic) gradient projection phase that identifies zero variables, and a subspace phase that applies a subsampled Hessian Newton iteration in the free variables. Numerical tests on speech recognition problems illustrate the performance of the algorithms.
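
The variance criterion from the first part can be sketched as: while computing a batch gradient over a sample S, estimate the per-example gradient variance, and grow S when the noise term dominates the gradient signal. A sketch in the spirit (not the letter) of the paper, on a least-squares stand-in:

```python
# Dynamic sample sizes for a batch gradient method: grow the sample when the
# estimated gradient variance, relative to |S|, dominates the gradient norm.
# Constants and the test follow the spirit of the paper, not its exact form.
import numpy as np

rng = np.random.default_rng(7)

def per_example_grads(w, X, y):
    # Least-squares loss stands in for the ML objective.
    return (X @ w - y)[:, None] * X

def needs_bigger_sample(w, X, y, idx, theta=0.9):
    g_i = per_example_grads(w, X[idx], y[idx])
    g = g_i.mean(axis=0)
    var = g_i.var(axis=0, ddof=1)
    # Grow the sample when the noise term dominates the gradient signal.
    return var.sum() / len(idx) > theta**2 * (g @ g)

X, w_true = rng.normal(size=(5000, 20)), rng.normal(size=20)
y = X @ w_true + 0.1 * rng.normal(size=5000)

w, sample_size = np.zeros(20), 50
for step in range(100):
    idx = rng.choice(5000, size=sample_size, replace=False)
    g = per_example_grads(w, X[idx], y[idx]).mean(axis=0)
    w -= 0.05 * g
    if needs_bigger_sample(w, X, y, idx):
        sample_size = min(5000, int(sample_size * 1.5))   # enlarge sample
print("final sample size:", sample_size)
```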

Journal Article
TL;DR: This work reports results of extensive experiments that provide a detailed comparison of various fixed and adaptive sampling techniques, and demonstrates the performance improvement associated with the ensemble Nyström method when used in conjunction with either fixed or adaptive sampling schemes.
Abstract: The Nyström method is an efficient technique to generate low-rank matrix approximations and is used in several large-scale learning applications. A key aspect of this method is the procedure according to which columns are sampled from the original matrix. In this work, we explore the efficacy of a variety of fixed and adaptive sampling schemes. We also propose a family of ensemble-based sampling algorithms for the Nyström method. We report results of extensive experiments that provide a detailed comparison of various fixed and adaptive sampling techniques, and demonstrate the performance improvement associated with the ensemble Nyström method when used in conjunction with either fixed or adaptive sampling schemes. Corroborating these empirical findings, we present a theoretical analysis of the Nyström method, providing novel error bounds guaranteeing a better convergence rate of the ensemble Nyström method in comparison to the standard Nyström method.
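
The standard Nyström approximation underlying the paper: sample m columns of an n x n kernel matrix K, then reconstruct K ≈ C W⁺ Cᵀ from the sampled columns C and their intersection W. Shown with fixed uniform sampling, the baseline the paper's adaptive and ensemble schemes improve on:

```python
# Standard Nystrom low-rank approximation: sample m columns of K, keep the
# sampled columns C and their intersection W, reconstruct K ~= C pinv(W) C^T.
# Uniform sampling here; the paper studies smarter and ensemble schemes.
import numpy as np

rng = np.random.default_rng(8)

X = rng.normal(size=(500, 10))
K = np.exp(-0.5 * np.sum((X[:, None] - X[None]) ** 2, axis=-1))  # RBF kernel

m = 50
idx = rng.choice(len(X), size=m, replace=False)   # fixed uniform sampling
C = K[:, idx]
W = K[np.ix_(idx, idx)]
K_approx = C @ np.linalg.pinv(W) @ C.T

# Relative reconstruction error in Frobenius norm.
print(np.linalg.norm(K - K_approx, "fro") / np.linalg.norm(K, "fro"))
```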

Proceedings Article
21 Mar 2012
TL;DR: This work proposes a method that learns to assign MRs to a wide range of text thanks to a training scheme that combines learning from knowledge bases with learning from raw text.
Abstract: Open-text semantic parsers are designed to interpret any statement in natural language by inferring a corresponding meaning representation (MR – a formal representation of its sense). Unfortunately, large scale systems cannot be easily machine-learned due to a lack of directly supervised data. We propose a method that learns to assign MRs to a wide range of text (using a dictionary of more than 70,000 words mapped to more than 40,000 entities) thanks to a training scheme that combines learning from knowledge bases (e.g. WordNet) with learning from raw text. The model jointly learns representations of words, entities and MRs via a multi-task training process operating on these diverse sources of data. Hence, the system ends up providing methods for knowledge acquisition and word-sense disambiguation within the context of semantic parsing in a single elegant framework. Experiments on these various tasks indicate the promise of the approach.

Proceedings Article
01 Jan 2012
TL;DR: This work introduces a model which uses a deep recurrent autoencoder neural network to denoise input features for robust ASR, and demonstrates that the model is competitive with existing feature denoising approaches on the Aurora2 task and outperforms a tandem approach where deep networks are used to predict phoneme posteriors directly.
Abstract: Recent work on deep neural networks as acoustic models for automatic speech recognition (ASR) has demonstrated substantial performance improvements. We introduce a model which uses a deep recurrent autoencoder neural network to denoise input features for robust ASR. The model is trained on stereo (noisy and clean) audio features to predict clean features given noisy input. The model makes no assumptions about how noise affects the signal, nor the existence of distinct noise environments. Instead, the model can learn to model any type of distortion or additive noise given sufficient training data. We demonstrate the model is competitive with existing feature denoising approaches on the Aurora2 task, and outperforms a tandem approach where deep networks are used to predict phoneme posteriors directly.
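
The training setup is stereo regression: feed the network noisy features and minimize squared error against the clean ones. A minimal numpy stand-in with a single dense layer in place of the paper's deep recurrent autoencoder, on synthetic features:

```python
# Minimal sketch of the stereo denoising objective: regress clean features
# from noisy ones. One dense tanh layer stands in for the paper's deep
# recurrent autoencoder; features are synthetic.
import numpy as np

rng = np.random.default_rng(9)

clean = rng.normal(size=(2000, 24))            # clean features (synthetic)
noisy = clean + 0.5 * rng.normal(size=clean.shape)

W1 = rng.normal(scale=0.1, size=(24, 64)); b1 = np.zeros(64)
W2 = rng.normal(scale=0.1, size=(64, 24)); b2 = np.zeros(24)

lr = 0.01
for epoch in range(200):
    h = np.tanh(noisy @ W1 + b1)               # encode noisy input
    out = h @ W2 + b2                          # predict clean features
    err = out - clean                          # MSE gradient wrt output
    dW2 = h.T @ err / len(clean)
    dh = (err @ W2.T) * (1 - h**2)             # backprop through tanh
    dW1 = noisy.T @ dh / len(clean)
    W2 -= lr * dW2; b2 -= lr * err.mean(axis=0)
    W1 -= lr * dW1; b1 -= lr * dh.mean(axis=0)

print("denoising MSE:", float((err**2).mean()))
```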

Patent
26 Jun 2012
TL;DR: In this paper, the authors present a system and methods for a virtual input device that includes a projector and a camera, where the camera captures images that can be interpreted by a processor to determine actions.
Abstract: The present application discloses systems and methods for a virtual input device. In one example, the virtual input device includes a projector and a camera. The projector projects a pattern onto a surface. The camera captures images that can be interpreted by a processor to determine actions. The projector may be mounted on an arm of a pair of eyeglasses and the camera may be mounted on an opposite arm of the eyeglasses. A pattern for a virtual input device can be projected onto a “display hand” of a user, and the camera may be able to detect when the user uses an opposite hand to select items of the virtual input device. In another example, the camera may detect when the display hand is moving and interpret display hand movements as inputs to the virtual input device, and/or realign the projection onto the moving display hand.

Journal ArticleDOI
TL;DR: Logs contain a wealth of information that can help manage systems, improve their quality and efficiency, and improve the user experience.
Abstract: Computer-system logs provide a glimpse into the states of a running system. Instrumentation occasionally generates short messages that are collected in a system-specific log. The content and format...

Proceedings ArticleDOI
13 Aug 2012
TL;DR: 3D beamforming is proposed and evaluated, where 60 GHz signals bounce off data center ceilings, establishing indirect line-of-sight between any two racks in a data center and improving link range and the number of concurrent transmissions.
Abstract: Modern data centers are massive, and support a range of distributed applications across potentially hundreds of server racks. As their utilization and bandwidth needs continue to grow, traditional methods of augmenting bandwidth have proven complex and costly in time and resources. Recent measurements show that data center traffic is often limited by congestion loss caused by short traffic bursts. Thus an attractive alternative to adding physical bandwidth is to augment wired links with wireless links in the 60 GHz band. We address two limitations with current 60 GHz wireless proposals. First, 60 GHz wireless links are limited by line-of-sight, and can be blocked by even small obstacles. Second, even beamforming links leak power, and potential interference will severely limit concurrent transmissions in dense data centers. We propose and evaluate a new wireless primitive for data centers, 3D beamforming, where 60 GHz signals bounce off data center ceilings, thus establishing indirect line-of-sight between any two racks in a data center. We build a small 3D beamforming testbed to demonstrate its ability to address both link blockage and link interference, thus improving link range and number of concurrent transmissions in the data center. In addition, we propose a simple link scheduler and use traffic simulations to show that these 3D links significantly expand wireless capacity compared to their 2D counterparts.
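
The geometry behind the ceiling bounce is simple: to link racks d meters apart with antennas h meters below the ceiling, each antenna aims at the ceiling midpoint and the signal travels a mirror path of length sqrt(d^2 + (2h)^2). A quick look at the path stretch and tilt angle this implies (numbers illustrative):

```python
# Mirror-path geometry of a ceiling bounce: racks d meters apart, antennas
# h meters below the ceiling. Distances and heights below are illustrative.
import math

def bounce_path(d, h):
    """Length of the ceiling-reflected path between two racks."""
    return math.sqrt(d**2 + (2 * h) ** 2)

def elevation_angle(d, h):
    """Antenna tilt above horizontal toward the ceiling midpoint."""
    return math.degrees(math.atan2(2 * h, d))

h = 3.0                          # ceiling clearance in meters (assumed)
for d in (5, 20, 50):            # rack separation in meters (assumed)
    print(d, round(bounce_path(d, h), 1), round(elevation_angle(d, h), 1))
```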