Open AccessBook
Data-Intensive Text Processing with MapReduce
Jimmy Lin,Chris Dyer +1 more
TLDR
This half-day tutorial introduces participants to data-intensive text processing with the MapReduce programming model using the open-source Hadoop implementation, with a focus on scalability and the tradeoffs associated with distributed processing of large datasets.Abstract:
This half-day tutorial introduces participants to data-intensive text processing with the MapReduce programming model [1], using the open-source Hadoop implementation. The focus will be on scalability and the tradeoffs associated with distributed processing of large datasets. Content will include general discussions about algorithm design, presentation of illustrative algorithms, case studies in HLT applications, as well as practical advice in writing Hadoop programs and running Hadoop clusters.read more
Citations
More filters
Book
Ontology Matching
Jérôme Euzenat,Pavel Shvaiko +1 more
TL;DR: The second edition of Ontology Matching has been thoroughly revised and updated to reflect the most recent advances in this quickly developing area, which resulted in more than 150 pages of new content.
Journal ArticleDOI
Toward Scalable Systems for Big Data Analytics: A Technology Tutorial
TL;DR: This paper presents a systematic framework to decompose big data systems into four sequential modules, namely data generation, data acquisition, data storage, and data analytics, and presents the prevalent Hadoop framework for addressing big data challenges.
Journal ArticleDOI
Parallel data processing with MapReduce: a survey
TL;DR: In this survey, the MapReduce framework is characterized and its inherent pros and cons are discussed, and its optimization strategies reported in the recent literature are introduced.
Proceedings ArticleDOI
WTF: the who to follow service at Twitter
TL;DR: An architectural overview of the architecture of WTF is provided and a few graph recommendation algorithms implemented in Cassovary are described and evaluated, including a novel approach based on a combination of random walks and SALSA.
Proceedings ArticleDOI
Counting triangles and the curse of the last reducer
TL;DR: This work describes a sequential triangle counting algorithm and shows how to adapt it to the MapReduce setting, and presents a new algorithm designed specifically for the Map Reduce framework that achieves a factor of 10-100 speed up over the naive approach.
References
More filters
Journal ArticleDOI
Maximum likelihood from incomplete data via the EM algorithm
Journal ArticleDOI
Collective dynamics of small-world networks
TL;DR: Simple models of networks that can be tuned through this middle ground: regular networks ‘rewired’ to introduce increasing amounts of disorder are explored, finding that these systems can be highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs.
Journal ArticleDOI
The Strength of Weak Ties
TL;DR: In this paper, it is argued that the degree of overlap of two individuals' friendship networks varies directly with the strength of their tie to one another, and the impact of this principle on diffusion of influence and information, mobility opportunity, and community organization is explored.
Journal ArticleDOI
A tutorial on hidden Markov models and selected applications in speech recognition
TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.
Book
Introduction to Algorithms
TL;DR: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures and presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers.