
Showing papers by "Google" published in 2011


Journal Article
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
Abstract: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net.

47,974 citations
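
A minimal sketch of the uniform estimator API the abstract emphasizes (written against today's module layout; the 2011 release organized some of these helpers differently). Any estimator can be swapped in unchanged, which is the consistency point:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)                      # uniform training interface
print("accuracy:", clf.score(X_test, y_test))  # uniform evaluation interface
```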


Journal Article
TL;DR: This work describes and analyzes an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight.
Abstract: We present a new family of subgradient methods that dynamically incorporate knowledge of the geometry of the data observed in earlier iterations to perform more informative gradient-based learning. Metaphorically, the adaptation allows us to find needles in haystacks in the form of very predictive but rarely seen features. Our paradigm stems from recent advances in stochastic optimization and online learning which employ proximal functions to control the gradient steps of the algorithm. We describe and analyze an apparatus for adaptively modifying the proximal function, which significantly simplifies setting a learning rate and results in regret guarantees that are provably as good as the best proximal function that can be chosen in hindsight. We give several efficient algorithms for empirical risk minimization problems with common and important regularization functions and domain constraints. We experimentally study our theoretical analysis and show that adaptive subgradient methods outperform state-of-the-art, yet non-adaptive, subgradient algorithms.

6,984 citations
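
The best-known instance of this adaptive scheme is the diagonal variant, where the proximal function's geometry is a per-coordinate scaling by accumulated squared gradients. A minimal sketch (the learning rate and toy objective are illustrative choices, not from the paper):

```python
import numpy as np

def adagrad_step(w, g, G, lr=0.1, eps=1e-8):
    """Diagonal-AdaGrad update: coordinates with large accumulated squared
    (sub)gradients get small steps, so rarely seen but highly predictive
    features keep relatively large learning rates."""
    G = G + g * g                          # accumulate squared gradients
    w = w - lr * g / (np.sqrt(G) + eps)    # per-coordinate step size
    return w, G

# toy usage: minimize f(w) = ||w - 1||^2, whose gradient is 2(w - 1)
w, G = np.zeros(3), np.zeros(3)
for _ in range(500):
    w, G = adagrad_step(w, 2.0 * (w - 1.0), G)
print(np.round(w, 3))  # approaches [1. 1. 1.]
```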


Journal Article
TL;DR: A unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling is proposed.
Abstract: We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is achieved by trying to avoid task-specific engineering and therefore disregarding a lot of prior knowledge. Instead of exploiting man-made input features carefully optimized for each task, our system learns internal representations on the basis of vast amounts of mostly unlabeled training data. This work is then used as a basis for building a freely available tagging system with good performance and minimal computational requirements.

6,734 citations
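
At its core, the shared architecture is a window-based feed-forward network over learned word embeddings, with only the output layer differing per task. A forward-pass sketch with made-up dimensions (the lookup table and weights stand in for parameters the paper learns from unlabeled text):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, embed_dim, window, hidden, n_tags = 10_000, 50, 5, 100, 45

E  = rng.normal(0, 0.1, (vocab, embed_dim))  # shared word embeddings
W1 = rng.normal(0, 0.1, (window * embed_dim, hidden))
W2 = rng.normal(0, 0.1, (hidden, n_tags))    # task-specific output layer

def tag_scores(word_ids):
    """Score the tags of the centre word given a window of word ids."""
    x = E[word_ids].reshape(-1)   # concatenate the window's embeddings
    h = np.tanh(x @ W1)           # shared nonlinear hidden layer
    return h @ W2                 # one score per tag for this task

print(tag_scores(rng.integers(0, vocab, size=window)).shape)  # (45,)
```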


01 Jan 2011
TL;DR: A new benchmark dataset for research use is introduced containing over 600,000 labeled digits cropped from Street View images, and variants of two recently proposed unsupervised feature learning methods are employed and found to be convincingly superior on these benchmarks.
Abstract: Detecting and reading text from natural images is a hard computer vision task that is central to a variety of emerging applications. Related problems like document character recognition have been widely studied by computer vision and machine learning researchers and are virtually solved for practical applications like reading handwritten digits. Reliably recognizing characters in more complex scenes like photographs, however, is far more difficult: the best existing methods lag well behind human performance on the same tasks. In this paper we attack the problem of recognizing digits in a real application using unsupervised feature learning methods: reading house numbers from street level photos. To this end, we introduce a new benchmark dataset for research use containing over 600,000 labeled digits cropped from Street View images. We then demonstrate the difficulty of recognizing these digits when the problem is approached with hand-designed features. Finally, we employ variants of two recently proposed unsupervised feature learning methods and find that they are convincingly superior on our benchmarks.

5,311 citations


Journal ArticleDOI
TL;DR: The high level of collaboration on the gem5 project, combined with the previous success of the component parts and a liberal BSD-like license, make gem5 a valuable full-system simulation tool.
Abstract: The gem5 simulation infrastructure is the merger of the best aspects of the M5 [4] and GEMS [9] simulators. M5 provides a highly configurable simulation framework, multiple ISAs, and diverse CPU models. GEMS complements these features with a detailed and flexible memory system, including support for multiple cache coherence protocols and interconnect models. Currently, gem5 supports most commercial ISAs (ARM, ALPHA, MIPS, Power, SPARC, and x86), including booting Linux on three of them (ARM, ALPHA, and x86). The project is the result of the combined efforts of many academic and industrial institutions, including AMD, ARM, HP, MIPS, Princeton, MIT, and the Universities of Michigan, Texas, and Wisconsin. Over the past ten years, M5 and GEMS have been used in hundreds of publications and have been downloaded tens of thousands of times. The high level of collaboration on the gem5 project, combined with the previous success of the component parts and a liberal BSD-like license, make gem5 a valuable full-system simulation tool.

4,039 citations


Journal ArticleDOI
14 Jan 2011-Science
TL;DR: This work surveys the vast terrain of 'culturomics,' focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000, and shows how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, and the pursuit of fame.
Abstract: We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of 'culturomics,' focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. Culturomics extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.

2,257 citations


Journal ArticleDOI
TL;DR: A simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines is described; it is particularly well suited for large text classification problems, where it demonstrates an order-of-magnitude speedup over previous SVM learning methods.
Abstract: We describe and analyze a simple and effective stochastic sub-gradient descent algorithm for solving the optimization problem cast by Support Vector Machines (SVM). We prove that the number of iterations required to obtain a solution of accuracy $\epsilon$ is $\tilde{O}(1/\epsilon)$, where each iteration operates on a single training example. In contrast, previous analyses of stochastic gradient descent methods for SVMs require $\Omega(1/\epsilon^2)$ iterations. As in previously devised SVM solvers, the number of iterations also scales linearly with $1/\lambda$, where $\lambda$ is the regularization parameter of SVM. For a linear kernel, the total run-time of our method is $\tilde{O}(d/(\lambda\epsilon))$, where $d$ is a bound on the number of non-zero features in each example. Since the run-time does not depend directly on the size of the training set, the resulting algorithm is especially suited for learning from large datasets. Our approach also extends to non-linear kernels while working solely on the primal objective function, though in this case the runtime does depend linearly on the training set size. Our algorithm is particularly well suited for large text classification problems, where we demonstrate an order-of-magnitude speedup over previous SVM learning methods.

2,037 citations
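
The single-example update is short enough to show directly: at step $t$, draw one example, use step size $1/(\lambda t)$, shrink $w$ for the regularizer, and add the hinge-loss subgradient only if the margin is violated. A sketch of this Pegasos-style loop (labels in {-1, +1}; hyperparameters illustrative):

```python
import numpy as np

def pegasos(X, y, lam=0.1, epochs=20, seed=0):
    """Stochastic sub-gradient descent for the primal SVM objective:
    each iteration touches a single training example."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, t = np.zeros(d), 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)               # step size 1/(lambda * t)
            violated = y[i] * (w @ X[i]) < 1.0  # check margin with current w
            w *= 1.0 - eta * lam                # shrinkage from regularizer
            if violated:
                w += eta * y[i] * X[i]          # hinge-loss subgradient step
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
w = pegasos(X, y)
print(np.mean(np.sign(X @ w) == y))  # close to 1.0 on this separable toy data
```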


Patent
Christopher A. Tillman
28 Sep 2011
TL;DR: A routing score is computed for a source node based on its hardware capabilities, available applications, and networking capabilities, and data is sent to the destination node based on the routing scores received from intermediate nodes and the route paths associated with them.
Abstract: Embodiments disclosed herein relate to ad hoc networking. An embodiment includes computing a routing score for a source node based on at least hardware capabilities of the source node, applications available to the source node, and networking capabilities of the source node. The embodiment further includes receiving, at the source node, one or more routing scores from intermediate nodes directly or indirectly connected to the source node, and sending the data to the destination node based on at least the routing scores received from each intermediate node and one or more route paths associated with each intermediate node.

1,378 citations
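
One hypothetical reading of the scoring scheme in the abstract, sketched with invented weights and fields (none of these numbers or names come from the patent): each node folds hardware, application, and networking capabilities into one score, and the source forwards along the path whose weakest intermediate node scores highest.

```python
def routing_score(node, w_hw=0.4, w_app=0.2, w_net=0.4):
    # Illustrative weighted combination of capability metrics in [0, 1].
    return (w_hw * node["hardware"] + w_app * node["apps"]
            + w_net * node["network"])

def pick_route(routes):
    """routes: list of (path_name, [intermediate-node dicts]); choose the
    path whose weakest intermediate node has the highest routing score."""
    return max(routes, key=lambda r: min(routing_score(n) for n in r[1]))[0]

routes = [
    ("via A", [{"hardware": 0.9, "apps": 0.5, "network": 0.4}]),
    ("via B", [{"hardware": 0.6, "apps": 0.7, "network": 0.8}]),
]
print(pick_route(routes))  # "via B"
```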


Journal ArticleDOI
TL;DR: A system that can match and reconstruct 3D scenes from extremely large collections of photographs such as those found by searching for a given city on Internet photo sharing sites and is designed to scale gracefully with both the size of the problem and the amount of available computation.
Abstract: We present a system that can reconstruct 3D geometry from large, unorganized collections of photographs such as those found by searching for a given city (e.g., Rome) on Internet photo-sharing sites. Our system is built on a set of new, distributed computer vision algorithms for image matching and 3D reconstruction, designed to maximize parallelism at each stage of the pipeline and to scale gracefully with both the size of the problem and the amount of available computation. Our experimental results demonstrate that it is now possible to reconstruct city-scale image collections with more than a hundred thousand images in less than a day.

1,307 citations


Journal ArticleDOI
TL;DR: The practice of crowdsourcing is transforming the Web and giving rise to a new field of inquiry: the study of crowdsourcing systems, which enlist crowds of users to help solve problems defined by the system owners.

1,165 citations


01 Aug 2011
TL;DR: A generic objectness measure, quantifying how likely it is for an image window to contain an object of any class, and uses objectness as a complementary score in addition to the class-specific model, which leads to fewer false positives.
Abstract: We present a generic objectness measure, quantifying how likely it is for an image window to contain an object of any class. We explicitly train it to distinguish objects with a well-defined boundary in space, such as cows and telephones, from amorphous background elements, such as grass and road. The measure combines in a Bayesian framework several image cues measuring characteristics of objects, such as appearing different from their surroundings and having a closed boundary. These include an innovative cue to measure the closed boundary characteristic. In experiments on the challenging PASCAL VOC 07 dataset, we show this new cue to outperform a state-of-the-art saliency measure, and the combined objectness measure to perform better than any cue alone. We also compare to interest point operators, a HOG detector, and three recent works aiming at automatic object segmentation. Finally, we present two applications of objectness. In the first, we sample a small number of windows according to their objectness probability and give an algorithm to employ them as location priors for modern class-specific object detectors. As we show experimentally, this greatly reduces the number of windows evaluated by the expensive class-specific model. In the second application, we use objectness as a complementary score in addition to the class-specific model, which leads to fewer false positives. As shown in several recent papers, objectness can act as a valuable focus of attention mechanism in many other applications operating on image windows, including weakly supervised learning of object categories, unsupervised pixelwise segmentation, and object tracking in video. Computing objectness is very efficient and takes only about 4 sec. per image.
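
The Bayesian cue combination the abstract mentions can be pictured as a naive-Bayes fusion of per-window cue observations into one posterior probability of "any object here". A toy sketch with invented cue likelihoods (the paper's actual cues, binning, and learned parameters are not reproduced):

```python
def objectness(cue_values, likelihoods, prior=0.1):
    """Naive-Bayes fusion: p(object | cues) from per-cue likelihood tables.
    cue_values: cue name -> observed bin index; likelihoods: cue name ->
    (p(bin | object), p(bin | background)) lookup lists."""
    p_obj, p_bg = prior, 1.0 - prior
    for name, b in cue_values.items():
        p_given_obj, p_given_bg = likelihoods[name]
        p_obj *= p_given_obj[b]
        p_bg *= p_given_bg[b]
    return p_obj / (p_obj + p_bg)

# one invented cue: window has a closed boundary (bin 1) or not (bin 0)
lik = {"closed_boundary": ([0.1, 0.9], [0.7, 0.3])}
print(objectness({"closed_boundary": 1}, lik))  # 0.25, up from the 0.1 prior
```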

Book ChapterDOI
01 Jan 2011
TL;DR: In this paper, the authors survey the recent progress in the field of collaborative filtering, describe several extensions that bring competitive accuracy into neighborhood methods, which used to dominate the field, and demonstrate how to utilize temporal models and implicit feedback to extend the models' accuracy.
Abstract: The collaborative filtering (CF) approach to recommenders has recently enjoyed much interest and progress. The fact that it played a central role within the recently completed Netflix competition has contributed to its popularity. This chapter surveys the recent progress in the field. Matrix factorization techniques, which became a first choice for implementing CF, are described together with recent innovations. We also describe several extensions that bring competitive accuracy into neighborhood methods, which used to dominate the field. The chapter demonstrates how to utilize temporal models and implicit feedback to extend models' accuracy. In passing, we include detailed descriptions of some of the central methods developed for tackling the challenge of the Netflix Prize competition.
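
Plain matrix factorization, the "first choice" technique the chapter highlights, reduces to a short SGD loop over observed ratings. A minimal sketch without the chapter's bias, temporal, or implicit-feedback terms (hyperparameters are illustrative):

```python
import numpy as np

def mf_sgd(ratings, n_users, n_items, k=20, lr=0.01, reg=0.05, epochs=200):
    """Learn user factors P and item factors Q so that P[u] . Q[i] ~ r_ui."""
    rng = np.random.default_rng(0)
    P = rng.normal(0, 0.1, (n_users, k))
    Q = rng.normal(0, 0.1, (n_items, k))
    for _ in range(epochs):
        for u, i, r in ratings:
            e = r - P[u] @ Q[i]                   # prediction error
            P[u] += lr * (e * Q[i] - reg * P[u])  # gradient step + shrinkage
            Q[i] += lr * (e * P[u] - reg * Q[i])
    return P, Q

ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0)]
P, Q = mf_sgd(ratings, n_users=2, n_items=3)
print(round(float(P[0] @ Q[0]), 2))  # fitted close to the observed 5.0
```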

Proceedings Article
28 Jun 2011
TL;DR: This paper proposes a novel graph-based hashing method which automatically discovers the neighborhood structure inherent in the data to learn appropriate compact codes and describes a hierarchical threshold learning procedure in which each eigenfunction yields multiple bits, leading to higher search accuracy.
Abstract: Hashing is becoming increasingly popular for efficient nearest neighbor search in massive databases. However, learning short codes that yield good search performance is still a challenge. Moreover, in many cases real-world data lives on a low-dimensional manifold, which should be taken into account to capture meaningful nearest neighbors. In this paper, we propose a novel graph-based hashing method which automatically discovers the neighborhood structure inherent in the data to learn appropriate compact codes. To make such an approach computationally feasible, we utilize Anchor Graphs to obtain tractable low-rank adjacency matrices. Our formulation allows constant time hashing of a new data point by extrapolating graph Laplacian eigenvectors to eigenfunctions. Finally, we describe a hierarchical threshold learning procedure in which each eigenfunction yields multiple bits, leading to higher search accuracy. Experimental comparison with the other state-of-the-art methods on two large datasets demonstrates the efficacy of the proposed method.
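
Under simplifying assumptions (one bit per eigenvector, without the paper's hierarchical multi-bit thresholds), the pipeline can be sketched as: k-means anchors, a sparse nonnegative affinity matrix Z, a small m x m eigenproblem standing in for the full graph Laplacian, and sign thresholding. Parameter choices below are illustrative:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances

def anchor_graph_hash(X, m=64, s=3, n_bits=16):
    """Sketch of Anchor Graph Hashing: Z makes the graph adjacency
    low-rank, so spectral codes come from an m x m eigenproblem."""
    anchors = KMeans(n_clusters=m, n_init=4, random_state=0).fit(X).cluster_centers_
    D2 = pairwise_distances(X, anchors, metric="sqeuclidean")
    Z = np.zeros((len(X), m))
    nn = np.argsort(D2, axis=1)[:, :s]             # s nearest anchors per point
    for r, cols in enumerate(nn):
        w = np.exp(-D2[r, cols] / D2[r, cols].mean())
        Z[r, cols] = w / w.sum()                   # rows of Z sum to one
    lam = Z.sum(axis=0)                            # diag(Z^T 1)
    M = (Z / np.sqrt(lam)).T @ (Z / np.sqrt(lam))  # small m x m surrogate
    vals, vecs = eigh(M)                           # ascending eigenvalues
    V, S = vecs[:, -(n_bits + 1):-1], vals[-(n_bits + 1):-1]  # skip trivial top
    Y = (Z / np.sqrt(lam)) @ (V / np.sqrt(S))      # extrapolate to all points
    return (Y > 0).astype(np.uint8)                # sign thresholding

codes = anchor_graph_hash(np.random.default_rng(0).normal(size=(500, 8)))
print(codes.shape)  # (500, 16) binary codes
```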

Journal ArticleDOI
01 Mar 2011
TL;DR: Cython is a Python language extension that allows explicit type declarations and is compiled directly to C, addressing Python's large overhead for numerical loops and the difficulty of efficiently using existing C and Fortran code, which Cython can interact with natively.
Abstract: Cython is a Python language extension that allows explicit type declarations and is compiled directly to C. As such, it addresses Python's large overhead for numerical loops and the difficulty of efficiently using existing C and Fortran code, which Cython can interact with natively.
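
A minimal example of the pattern described, as a sketch (build tooling details omitted): the same loop written with explicit C type declarations compiles to a plain C loop instead of generic Python object operations.

```cython
# example.pyx; one way to build in place is: cythonize -i example.pyx
# cython: language_level=3
cimport cython

@cython.boundscheck(False)
def dot(double[:] a, double[:] b):
    """Typed memoryview loop: runs at C speed over contiguous doubles."""
    cdef Py_ssize_t i, n = a.shape[0]
    cdef double total = 0.0
    for i in range(n):
        total += a[i] * b[i]
    return total
```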

Proceedings Article
07 Aug 2011
TL;DR: A learning process based on an innovative neural network architecture is designed to embed the symbolic representations of knowledge bases into a more flexible continuous vector space in which the original knowledge is kept and enhanced, allowing data from any KB to be easily used in recent machine learning methods for prediction and information retrieval.
Abstract: Many Knowledge Bases (KBs) are now readily available and encompass colossal quantities of information thanks to either a long-term funding effort (e.g. WordNet, OpenCyc) or a collaborative process (e.g. Freebase, DBpedia). However, each of them is based on a different rigid symbolic framework which makes it hard to use their data in other systems. It is unfortunate because such rich structured knowledge might lead to a huge leap forward in many other areas of AI like natural language processing (word-sense disambiguation, natural language understanding, ...), vision (scene classification, image semantic annotation, ...) or collaborative filtering. In this paper, we present a learning process based on an innovative neural network architecture designed to embed any of these symbolic representations into a more flexible continuous vector space in which the original knowledge is kept and enhanced. These learnt embeddings would allow data from any KB to be easily used in recent machine learning methods for prediction and information retrieval. We illustrate our method on WordNet and Freebase and also present a way to adapt it to knowledge extraction from raw text.
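
One concrete way to read the embedding idea, sketched under stated assumptions: entities become vectors, each relation becomes a pair of linear maps, and a triple (head, relation, tail) is scored by how close the two projections land (lower distance = more plausible). Dimensions and parameters below are made up, and the ranking-based training loop is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)
n_entities, n_relations, d = 1000, 20, 32
E     = rng.normal(0, 0.1, (n_entities, d))      # entity embeddings
R_lhs = rng.normal(0, 0.1, (n_relations, d, d))  # per-relation left map
R_rhs = rng.normal(0, 0.1, (n_relations, d, d))  # per-relation right map

def score(h, r, t):
    """L1 distance between relation-specific projections of head and tail."""
    return np.abs(R_lhs[r] @ E[h] - R_rhs[r] @ E[t]).sum()

# rank candidate tails for a (head, relation) query; training would push
# observed triples to out-score corrupted ones (not shown here)
candidates = np.array([score(3, 5, t) for t in range(n_entities)])
print(candidates.argsort()[:5])  # five most plausible tails (lowest distance)
```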

Posted Content
TL;DR: The authors propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including: part-of-speech tagging, chunking, named entity recognition, and semantic role labeling.
Abstract: We propose a unified neural network architecture and learning algorithm that can be applied to various natural language processing tasks including: part-of-speech tagging, chunking, named entity recognition, and semantic role labeling. This versatility is achieved by trying to avoid task-specific engineering and therefore disregarding a lot of prior knowledge. Instead of exploiting man-made input features carefully optimized for each task, our system learns internal representations on the basis of vast amounts of mostly unlabeled training data. This work is then used as a basis for building a freely available tagging system with good performance and minimal computational requirements.

Proceedings ArticleDOI
16 Jul 2011
TL;DR: This work proposes a strongly performing method that scales to image annotation datasets by simultaneously learning to optimize precision at the top of the ranked list of annotations for a given image and learning a low-dimensional joint embedding space for both images and annotations.
Abstract: Image annotation datasets are becoming larger and larger, with tens of millions of images and tens of thousands of possible annotations. We propose a strongly performing method that scales to such datasets by simultaneously learning to optimize precision at the top of the ranked list of annotations for a given image and learning a low-dimensional joint embedding space for both images and annotations. Our method, called WSABIE, both outperforms several baseline methods and is faster and consumes less memory.
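
A simplified sketch of the WARP-style update behind this approach, with made-up dimensions: images and annotations share a low-dimensional embedding space, negatives are sampled until one violates the margin, and the number of samples needed gives a rank estimate that weights the update (a log-harmonic approximation stands in for the exact rank weighting):

```python
import numpy as np

rng = np.random.default_rng(0)
n_labels, d_img, d_emb = 1000, 128, 64
V = rng.normal(0, 0.01, (d_img, d_emb))     # image-side projection
W = rng.normal(0, 0.01, (n_labels, d_emb))  # annotation embeddings

def warp_step(x, pos, lr=0.05, margin=1.0):
    """One SGD step: a violation found after few samples implies the
    positive label ranks low, so the correction is weighted heavily."""
    phi = x @ V                              # embed the image
    s_pos = W[pos] @ phi
    for k in range(1, n_labels):
        neg = int(rng.integers(n_labels))
        if neg != pos and W[neg] @ phi > s_pos - margin:
            weight = np.log1p((n_labels - 1) // k)  # approximate rank weight
            direction = W[neg] - W[pos]             # capture before updating W
            W[pos] += lr * weight * phi             # pull positive label up
            W[neg] -= lr * weight * phi             # push violator down
            V      -= lr * weight * np.outer(x, direction)
            return

warp_step(rng.normal(size=d_img), pos=7)     # one update for one image
```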

Proceedings ArticleDOI
20 Jun 2011
TL;DR: The design and implementation of new inexact Newton type Bundle Adjustment algorithms that exploit hardware parallelism for efficiently solving large scale 3D scene reconstruction problems and show that overcoming the severe memory and bandwidth limitations of current generation GPUs not only leads to more space efficient algorithms, but also to surprising savings in runtime.
Abstract: We present the design and implementation of new inexact Newton type Bundle Adjustment algorithms that exploit hardware parallelism for efficiently solving large scale 3D scene reconstruction problems. We explore the use of multicore CPU as well as multicore GPUs for this purpose. We show that overcoming the severe memory and bandwidth limitations of current generation GPUs not only leads to more space efficient algorithms, but also to surprising savings in runtime. Our CPU based system is up to ten times and our GPU based system is up to thirty times faster than the current state of the art methods [1], while maintaining comparable convergence behavior. The code and additional results are available at http://grail.cs.washington.edu/projects/mcba.

Proceedings Article
01 Jan 2011
TL;DR: Megastore provides fully serializable ACID semantics within fine-grained partitions of data, which allows us to synchronously replicate each write across a wide area network with reasonable latency and support seamless failover between datacenters.
Abstract: Megastore is a storage system developed to meet the requirements of today’s interactive online services. Megastore blends the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way, and provides both strong consistency guarantees and high availability. We provide fully serializable ACID semantics within fine-grained partitions of data. This partitioning allows us to synchronously replicate each write across a wide area network with reasonable latency and support seamless failover between datacenters. This paper describes Megastore’s semantics and replication algorithm. It also describes our experience supporting a wide range of Google production services built with Megastore.

Journal ArticleDOI
TL;DR: A randomized $(1-1/e)$-approximation algorithm is presented for maximizing a monotone submodular function subject to a matroid constraint in the value oracle model, based on pipage rounding and a continuous greedy process, and it implies an optimal approximation for the submodular welfare problem.
Abstract: Let $f:2^X \rightarrow \cal R_+$ be a monotone submodular set function, and let $(X,\cal I)$ be a matroid. We consider the problem ${\rm max}_{S \in \cal I} f(S)$. It is known that the greedy algorithm yields a $1/2$-approximation [M. L. Fisher, G. L. Nemhauser, and L. A. Wolsey, Math. Programming Stud., no. 8 (1978), pp. 73-87] for this problem. For certain special cases, e.g., ${\rm max}_{|S| \leq k} f(S)$, the greedy algorithm yields a $(1-1/e)$-approximation. It is known that this is optimal both in the value oracle model (where the only access to $f$ is through a black box returning $f(S)$ for a given set $S$) [G. L. Nemhauser and L. A. Wolsey, Math. Oper. Res., 3 (1978), pp. 177-188] and for explicitly posed instances assuming $P \neq NP$ [U. Feige, J. ACM, 45 (1998), pp. 634-652]. In this paper, we provide a randomized $(1-1/e)$-approximation for any monotone submodular function and an arbitrary matroid. The algorithm works in the value oracle model. Our main tools are a variant of the pipage rounding technique of Ageev and Sviridenko [J. Combin. Optim., 8 (2004), pp. 307-328], and a continuous greedy process that may be of independent interest. As a special case, our algorithm implies an optimal approximation for the submodular welfare problem in the value oracle model [J. Vondrak, Proceedings of the 38th ACM Symposium on Theory of Computing, 2008, pp. 67-74]. As a second application, we show that the generalized assignment problem (GAP) is also a special case; although the reduction requires $|X|$ to be exponential in the original problem size, we are able to achieve a $(1-1/e-o(1))$-approximation for GAP, simplifying previously known algorithms. Additionally, the reduction enables us to obtain approximation algorithms for variants of GAP with more general constraints.
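
For intuition, here is the classical greedy baseline the abstract cites for the special case ${\rm max}_{|S| \leq k} f(S)$, where greedy already achieves $(1-1/e)$, shown on a monotone submodular coverage function. The paper's actual contribution (continuous greedy plus pipage rounding for a general matroid) is substantially more involved and is not reproduced here:

```python
def greedy_max_coverage(sets, k):
    """sets: name -> set of covered elements; greedily pick k sets, each
    time taking the set with the largest marginal coverage gain."""
    chosen, covered = [], set()
    for _ in range(k):
        best = max((s for s in sets if s not in chosen),
                   key=lambda s: len(sets[s] - covered))  # marginal gain
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered

sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6, 7}, "D": {1, 7}}
print(greedy_max_coverage(sets, k=2))  # (['C', 'A'], {1, 2, 3, 4, 5, 6, 7})
```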

Posted Content
TL;DR: This paper proposes a tagset that consists of twelve universal part-of-speech categories and develops a mapping from 25 different treebank tagsets to this universal set; when combined with the original treebank data, this universal tagset and mapping produce a dataset consisting of common parts of speech for 22 different languages.
Abstract: To facilitate future research in unsupervised induction of syntactic structure and to standardize best-practices, we propose a tagset that consists of twelve universal part-of-speech categories. In addition to the tagset, we develop a mapping from 25 different treebank tagsets to this universal set. As a result, when combined with the original treebank data, this universal tagset and mapping produce a dataset consisting of common parts-of-speech for 22 different languages. We highlight the use of this resource via two experiments, including one that reports competitive accuracies for unsupervised grammar induction without gold standard part-of-speech tags.
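
The twelve categories are concrete enough to show directly; the Penn Treebank entries below are a small excerpt-style example (the paper ships full mapping tables for 25 treebanks):

```python
UNIVERSAL_TAGS = ["VERB", "NOUN", "PRON", "ADJ", "ADV", "ADP",
                  "CONJ", "DET", "NUM", "PRT", "X", "."]

# a few illustrative Penn Treebank -> universal mappings
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "RB": "ADV", "IN": "ADP", "DT": "DET",
    "PRP": "PRON", "CD": "NUM", "CC": "CONJ", "RP": "PRT",
}

tagged = [("the", "DT"), ("dog", "NN"), ("barked", "VBD")]
print([(w, PTB_TO_UNIVERSAL[t]) for w, t in tagged])
# [('the', 'DET'), ('dog', 'NOUN'), ('barked', 'VERB')]
```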

Proceedings ArticleDOI
26 Oct 2011
TL;DR: CloudScale is a system that automates fine-grained elastic resource scaling for multi-tenant cloud computing infrastructures and achieves significantly higher SLO conformance than alternative approaches at low resource and energy cost.
Abstract: Elastic resource scaling lets cloud systems meet application service level objectives (SLOs) with minimum resource provisioning costs. In this paper, we present CloudScale, a system that automates fine-grained elastic resource scaling for multi-tenant cloud computing infrastructures. CloudScale employs online resource demand prediction and prediction error handling to achieve adaptive resource allocation without assuming any prior knowledge about the applications running inside the cloud. CloudScale can resolve scaling conflicts between applications using migration, and integrates dynamic CPU voltage/frequency scaling to achieve energy savings with minimal effect on application SLOs. We have implemented CloudScale on top of Xen and conducted extensive experiments using a set of CPU and memory intensive applications (RUBiS, Hadoop, IBM System S). The results show that CloudScale can achieve significantly higher SLO conformance than other alternatives with low resource and energy cost. CloudScale is non-intrusive and light-weight, and imposes negligible overhead.
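
A toy sketch of the predict-then-correct control loop the abstract outlines. The real system uses signal-analysis-based online demand prediction plus explicit error handling; here a sliding-window maximum stands in for the predictor and a fixed burst pad for the error handling, and all numbers are invented:

```python
from collections import deque

class ToyScaler:
    def __init__(self, window=12, pad=0.10, cap=1.0):
        self.history = deque(maxlen=window)  # recent demand observations
        self.pad, self.cap = pad, cap        # burst padding, resource cap

    def allocate(self, observed_demand):
        self.history.append(observed_demand)
        predicted = max(self.history)        # stand-in demand predictor
        alloc = min(predicted * (1 + self.pad), self.cap)
        if observed_demand > alloc:          # under-prediction: correct up
            alloc = min(observed_demand * (1 + self.pad), self.cap)
        return alloc

scaler = ToyScaler()
for demand in [0.2, 0.25, 0.6, 0.3]:         # CPU demand as fraction of a core
    print(round(scaler.allocate(demand), 3))
```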

Patent
05 Oct 2011
TL;DR: The features described in this patent may be used alone or in combination in order to improve the safety, use, driver experience, and performance of autonomous vehicles, such as self-driving cars.
Abstract: Aspects of the invention relate generally to autonomous vehicles. Specifically, the features described may be used alone or in combination in order to improve the safety, use, driver experience, and performance of these vehicles.

Proceedings ArticleDOI
03 Dec 2011
TL;DR: Bubble-Up is presented, a characterization methodology that enables the accurate prediction of the performance degradation that results from contention for shared resources in the memory subsystem, predicting the performance interference between co-located applications with an accuracy within 1% to 2% of the actual performance degradation.
Abstract: As much of the world's computing continues to move into the cloud, the overprovisioning of computing resources to ensure the performance isolation of latency-sensitive tasks, such as web search, in modern datacenters is a major contributor to low machine utilization. Being unable to accurately predict performance degradation due to contention for shared resources on multicore systems has led to the heavy-handed approach of simply disallowing the co-location of high-priority, latency-sensitive tasks with other tasks. Performing this precise prediction has been a challenging and unsolved problem. In this paper, we present Bubble-Up, a characterization methodology that enables the accurate prediction of the performance degradation that results from contention for shared resources in the memory subsystem. By using a bubble to apply a tunable amount of “pressure” to the memory subsystem on processors in production datacenters, our methodology can predict the performance interference between co-located applications with an accuracy within 1% to 2% of the actual performance degradation. Using this methodology to arrive at “sensible” co-locations in Google's production datacenters with real-world large-scale applications, we can improve the utilization of a 500-machine cluster by 50% to 90% while guaranteeing a high quality of service of latency-sensitive applications.
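
The lookup at the heart of the methodology can be sketched with illustrative curves (the numbers below are fabricated shapes, not measurements): profiling yields a sensitivity curve for the latency-sensitive application (its performance at each bubble "pressure") and a single pressure score per candidate co-runner, so predicted degradation becomes a table lookup instead of an exhaustive pairwise co-location test.

```python
import numpy as np

pressures = np.array([0, 2, 4, 6, 8, 10])  # bubble sizes (e.g., MB of pressure)
sensitivity = {"websearch": np.array([1.0, 0.99, 0.96, 0.90, 0.82, 0.70])}
pressure_score = {"batch_job_A": 3.0, "batch_job_B": 9.0}

def predicted_performance(latency_sensitive_app, co_runner):
    """Interpolate the sensitivity curve at the co-runner's pressure."""
    curve = sensitivity[latency_sensitive_app]
    return np.interp(pressure_score[co_runner], pressures, curve)

for job in pressure_score:
    print(job, round(float(predicted_performance("websearch", job)), 3))
# batch_job_A leaves ~97% performance; batch_job_B would degrade search more
```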

Journal ArticleDOI
TL;DR: This work investigates two representative ways of approximating the dense similarity matrix, picks the strategy of sparsifying the matrix by retaining nearest neighbors, and parallelizes it so that large problems can be handled effectively.
Abstract: Spectral clustering algorithms have been shown to be more effective in finding clusters than some traditional algorithms, such as k-means. However, spectral clustering suffers from a scalability problem in both memory use and computational time when the size of a data set is large. To perform clustering on large data sets, we investigate two representative ways of approximating the dense similarity matrix. We compare one approach by sparsifying the matrix with another by the Nyström method. We then pick the strategy of sparsifying the matrix via retaining nearest neighbors and investigate its parallelization. We parallelize both memory use and computation on distributed computers. Through an empirical study on a document data set of 193,844 instances and a photo data set of 2,121,863 instances, we show that our parallel algorithm can effectively handle large problems.
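
A single-machine sketch of the chosen strategy: keep only nearest neighbors in the similarity matrix, then cluster in the Laplacian eigenvector embedding. The paper's contribution is distributing exactly these steps (neighbor search, sparse eigensolve) across machines, which is not shown; the dense eigensolver below is a small-scale stand-in.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.sparse.csgraph import laplacian
from sklearn.cluster import KMeans
from sklearn.neighbors import kneighbors_graph

def knn_spectral_clustering(X, n_clusters=3, n_neighbors=10):
    A = kneighbors_graph(X, n_neighbors, mode="connectivity")
    A = 0.5 * (A + A.T)                      # symmetrize the k-NN graph
    L = laplacian(A, normed=True).toarray()  # dense here; sparse at scale
    _, vecs = eigh(L, subset_by_index=[0, n_clusters - 1])  # smallest pairs
    rows = vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-12)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(rows)

X = np.concatenate([np.random.default_rng(s).normal(4 * s, 1, (100, 2))
                    for s in range(3)])      # three well-separated blobs
print(np.bincount(knn_spectral_clustering(X)))  # roughly [100 100 100]
```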

Journal ArticleDOI
TL;DR: This paper designs the first constant-factor approximation algorithms for maximizing nonnegative (non-monotone) submodular functions and proves NP-hardness of $(\frac{5}{6}+\epsilon)$-approximation in the symmetric case and NP-hardness of $(\frac{3}{4}+\epsilon)$-approximation in the general case.
Abstract: Submodular maximization generalizes many important problems including Max Cut in directed and undirected graphs and hypergraphs, certain constraint satisfaction problems, and maximum facility location problems. Unlike the problem of minimizing submodular functions, the problem of maximizing submodular functions is NP-hard. In this paper, we design the first constant-factor approximation algorithms for maximizing nonnegative (non-monotone) submodular functions. In particular, we give a deterministic local-search $\frac{1}{3}$-approximation and a randomized $\frac{2}{5}$-approximation algorithm for maximizing nonnegative submodular functions. We also show that a uniformly random set gives a $\frac{1}{4}$-approximation. For symmetric submodular functions, we show that a random set gives a $\frac{1}{2}$-approximation, which can also be achieved by deterministic local search. These algorithms work in the value oracle model, where the submodular function is accessible through a black box returning $f(S)$ for a given set $S$. We show that in this model, a $(\frac{1}{2}+\epsilon)$-approximation for symmetric submodular functions would require an exponential number of queries for any fixed $\epsilon>0$. In the model where $f$ is given explicitly (as a sum of nonnegative submodular functions, each depending only on a constant number of elements), we prove NP-hardness of $(\frac{5}{6}+\epsilon)$-approximation in the symmetric case and NP-hardness of $(\frac{3}{4}+\epsilon)$-approximation in the general case.

Journal ArticleDOI
TL;DR: This work presents two randomized algorithms that provide accurate relative-error approximations to the optimal value and the solution vector of a least squares approximation problem more rapidly than existing exact algorithms.
Abstract: Least squares approximation is a technique to find an approximate solution to a system of linear equations that has no exact solution. In a typical setting, one lets $n$ be the number of constraints and $d$ be the number of variables, with $n \gg d$. Then, existing exact methods find a solution vector in $O(nd^2)$ time. We present two randomized algorithms that provide accurate relative-error approximations to the optimal value and the solution vector of a least squares approximation problem more rapidly than existing exact algorithms. Both of our algorithms preprocess the data with the Randomized Hadamard transform. One then uniformly randomly samples constraints and solves the smaller problem on those constraints, and the other performs a sparse random projection and solves the smaller problem on those projected coordinates. In both cases, solving the smaller problem provides relative-error approximations, and, if $n$ is sufficiently larger than $d$, the approximate solution can be computed in $O(nd \ln d)$ time.
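
The sampling variant can be sketched in a few lines under simplifications: flip row signs at random, mix with a Hadamard transform (the explicit matrix here; fast $O(n \log n)$ transforms are used in practice, and $n$ must be a power of two for this construction), uniformly sample rows, and solve the small problem:

```python
import numpy as np
from scipy.linalg import hadamard

def sketched_lstsq(A, b, n_samples, seed=0):
    """Randomized-Hadamard preprocessing spreads out the leverage scores,
    so uniform row sampling then gives a good small surrogate problem."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]                              # must be a power of 2 here
    D = rng.choice([-1.0, 1.0], size=n)         # random sign flips
    H = hadamard(n) / np.sqrt(n)                # orthonormal Hadamard matrix
    HA, Hb = H @ (D[:, None] * A), H @ (D * b)  # mixed system
    rows = rng.choice(n, size=n_samples, replace=False)
    x, *_ = np.linalg.lstsq(HA[rows], Hb[rows], rcond=None)
    return x

rng = np.random.default_rng(1)
n, d = 1024, 10
A = rng.normal(size=(n, d)); x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=n)
print(np.linalg.norm(sketched_lstsq(A, b, 200) - x_true))  # small error
```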

Proceedings ArticleDOI
07 May 2011
TL;DR: It is demonstrated that consensus exists among participants on parameters of movement and on mappings of motion gestures onto commands, and this consensus is used to develop a taxonomy for motion gestures and to specify an end-user inspired motion gesture set.
Abstract: Modern smartphones contain sophisticated sensors to monitor three-dimensional movement of the device. These sensors permit devices to recognize motion gestures - deliberate movements of the device by end-users to invoke commands. However, little is known about best-practices in motion gesture design for the mobile computing paradigm. To address this issue, we present the results of a guessability study that elicits end-user motion gestures to invoke commands on a smartphone device. We demonstrate that consensus exists among our participants on parameters of movement and on mappings of motion gestures onto commands. We use this consensus to develop a taxonomy for motion gestures and to specify an end-user inspired motion gesture set. We highlight the implications of this work to the design of smartphone applications and hardware. Finally, we argue that our results influence best practices in design for all gestural interfaces.

Proceedings ArticleDOI
04 Jun 2011
TL;DR: This work evaluates the applicability of active and idle low-power modes to reduce the power consumed by the primary server components (processor, memory, and disk), while maintaining tight response time constraints, particularly on 95th-percentile latency.
Abstract: Much of the success of the Internet services model can be attributed to the popularity of a class of workloads that we call Online Data-Intensive (OLDI) services. These workloads perform significant computing over massive data sets per user request but, unlike their offline counterparts (such as MapReduce computations), they require responsiveness in the sub-second time scale at high request rates. Large search products, online advertising, and machine translation are examples of workloads in this class. Although the load in OLDI services can vary widely during the day, their energy consumption sees little variance due to the lack of energy proportionality of the underlying machinery. The scale and latency sensitivity of OLDI workloads also make them a challenging target for power management techniques. We investigate what, if anything, can be done to make OLDI systems more energy-proportional. Specifically, we evaluate the applicability of active and idle low-power modes to reduce the power consumed by the primary server components (processor, memory, and disk), while maintaining tight response time constraints, particularly on 95th-percentile latency. Using Web search as a representative example of this workload class, we first characterize a production Web search workload at cluster-wide scale. We provide a fine-grain characterization and expose the opportunity for power savings using low-power modes of each primary server component. Second, we develop and validate a performance model to evaluate the impact of processor- and memory-based low-power modes on the search latency distribution and consider the benefit of current and foreseeable low-power modes. Our results highlight the challenges of power management for this class of workloads. In contrast to other server workloads, for which idle low-power modes have shown great promise, for OLDI workloads we find that energy-proportionality with acceptable query latency can only be achieved using coordinated, full-system active low-power modes.

Proceedings Article
12 Dec 2011
TL;DR: An algorithm that bridges the gap between source and target domains by slowly adding to the training set both the target features and instances in which the current algorithm is the most confident, and is named CODA (Co-training for domain adaptation).
Abstract: Domain adaptation algorithms seek to generalize a model trained in a source domain to a new target domain. In many practical cases, the source and target distributions can differ substantially, and in some cases crucial target features may not have support in the source domain. In this paper we introduce an algorithm that bridges the gap between source and target domains by slowly adding to the training set both the target features and instances in which the current algorithm is the most confident. Our algorithm is a variant of co-training [7], and we name it CODA (Co-training for domain adaptation). Unlike the original co-training work, we do not assume a particular feature split. Instead, for each iteration of co-training, we formulate a single optimization problem which simultaneously learns a target predictor, a split of the feature space into views, and a subset of source and target features to include in the predictor. CODA significantly out-performs the state-of-the-art on the 12-domain benchmark data set of Blitzer et al. [4]. Indeed, over a wide range (65 of 84 comparisons) of target supervision CODA achieves the best performance.