Open Access Book

Data-Intensive Text Processing with MapReduce

Abstract
This half-day tutorial introduces participants to data-intensive text processing with the MapReduce programming model [1], using the open-source Hadoop implementation. The focus will be on scalability and the tradeoffs associated with distributed processing of large datasets. Content will include general discussions of algorithm design, presentations of illustrative algorithms, case studies in human language technology (HLT) applications, and practical advice on writing Hadoop programs and running Hadoop clusters.


Citations
Book

Ontology Matching

TL;DR: The second edition of Ontology Matching has been thoroughly revised and updated to reflect the most recent advances in this quickly developing area, which resulted in more than 150 pages of new content.
Journal Article

Toward Scalable Systems for Big Data Analytics: A Technology Tutorial

TL;DR: This paper presents a systematic framework that decomposes big data systems into four sequential modules (data generation, data acquisition, data storage, and data analytics) and introduces the prevalent Hadoop framework for addressing big data challenges.
Journal Article

Parallel data processing with MapReduce: a survey

TL;DR: This survey characterizes the MapReduce framework, discusses its inherent pros and cons, and introduces optimization strategies reported in the recent literature.
Proceedings Article

WTF: the who to follow service at Twitter

TL;DR: An architectural overview of WTF is provided, and a few graph recommendation algorithms implemented in Cassovary are described and evaluated, including a novel approach based on a combination of random walks and SALSA.
Proceedings Article

Counting triangles and the curse of the last reducer

TL;DR: This work describes a sequential triangle counting algorithm, shows how to adapt it to the MapReduce setting, and presents a new algorithm designed specifically for the MapReduce framework that achieves a speedup of a factor of 10-100 over the naive approach.
References
Journal Article

Collective dynamics of small-world networks

TL;DR: This paper explores simple models of networks that can be tuned through the middle ground between regular and random networks: regular networks ‘rewired’ to introduce increasing amounts of disorder. It finds that these systems can be highly clustered, like regular lattices, yet have small characteristic path lengths, like random graphs.
Journal Article

The Strength of Weak Ties

TL;DR: In this paper, it is argued that the degree of overlap of two individuals' friendship networks varies directly with the strength of their tie to one another, and the impact of this principle on diffusion of influence and information, mobility opportunity, and community organization is explored.
Journal Article

A tutorial on hidden Markov models and selected applications in speech recognition

TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.
Book

Introduction to Algorithms

TL;DR: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures and presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers.