Journal ArticleDOI

Avatara: OLAP for web-scale analytics products

TLDR
To serve LinkedIn's growing 160 million member base, the company built a scalable and fast OLAP serving system called Avatara to solve the many, small cubes problem.
Abstract
Multidimensional data generated by members on websites has seen massive growth in recent years. OLAP is a well-suited solution for mining and analyzing this data. Providing insights derived from this analysis has become crucial for these websites to give members greater value. For example, LinkedIn, the largest professional social network, provides its professional members rich analytics features like "Who's Viewed My Profile?" and "Who's Viewed This Job?" The data behind these features form cubes that must be efficiently served at scale, and can be neatly sharded to do so. To serve our growing 160 million member base, we built a scalable and fast OLAP serving system called Avatara to solve this many, small cubes problem. At LinkedIn, Avatara has been powering several analytics features on the site for the past two years.
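The "many, small cubes" idea in the abstract can be sketched as follows. This is a hypothetical illustration (the shard count, dimension names, and function names are all invented, not Avatara's actual implementation): each member's cube is small enough to store whole, and the member key hashes to exactly one shard, so a point query touches a single node.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count

def shard_for(member_id: int) -> int:
    # Hash the member key so each small cube lands on exactly one shard.
    digest = hashlib.md5(str(member_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Each shard maps member_id -> that member's small cube
# (here, profile-view counts broken down by one dimension).
shards = [defaultdict(dict) for _ in range(NUM_SHARDS)]

def record_view(member_id: int, viewer_industry: str) -> None:
    cube = shards[shard_for(member_id)][member_id]
    cube[viewer_industry] = cube.get(viewer_industry, 0) + 1

def whos_viewed_my_profile(member_id: int) -> dict:
    # A point query reads only one shard and one small cube.
    return dict(shards[shard_for(member_id)].get(member_id, {}))

record_view(42, "Software")
record_view(42, "Software")
record_view(42, "Finance")
print(whos_viewed_my_profile(42))
```

Because each cube is tiny and keyed by a single member, the workload shards cleanly: no query ever needs to fan out across shards.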



Citations
Journal ArticleDOI

Reference Architecture and Classification of Technologies, Products and Services for Big Data Systems

TL;DR: The contribution of this paper is a technology-independent reference architecture for big data systems, based on an analysis of published implementation architectures of big data use cases, together with a classification of related implementation technologies and products/services, based on an analysis of published use cases and a survey of related work.
Proceedings ArticleDOI

The big data ecosystem at LinkedIn

TL;DR: LinkedIn's Hadoop-based analytics stack is presented, which allows data scientists and machine learning researchers to extract insights and build product features from massive amounts of data, and solutions to the "last mile" issues in providing a rich developer ecosystem are presented.
Journal ArticleDOI

Mesa: geo-replicated, near real-time, scalable data warehousing

TL;DR: The Mesa system is presented, along with the performance and scale it achieves, including near real-time data ingestion and queryability, as well as high availability, reliability, fault tolerance, and scalability for large data and query volumes.
Journal ArticleDOI

HaoLap: a Hadoop based OLAP system for big data

TL;DR: HaoLap (Hadoop-based OLAP) is an OLAP (OnLine Analytical Processing) system for big data that adopts the specified multidimensional model to map dimensions and measures, and shows a clear OLAP performance advantage as data set size and query complexity grow.

Towards a big data reference architecture

TL;DR: The proposed reference architecture and a survey of the current state of the art in "big data" technologies guide designers in creating systems that derive new value from existing, but previously under-used, data.
References
Journal ArticleDOI

MapReduce: simplified data processing on large clusters

TL;DR: This paper presents MapReduce, a programming model and associated implementation for processing and generating large data sets; it runs on large clusters of commodity machines and is highly scalable.
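The programming model summarized above can be sketched in a few lines. This single-process simulation (document contents and function names are hypothetical) shows only the shape of the map, shuffle, and reduce phases, not Google's distributed implementation:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(documents):
    # map: emit an intermediate (word, 1) pair for every word seen
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # shuffle: group intermediate pairs by key; reduce: sum each group
    for word, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

docs = ["olap at scale", "olap cubes at scale"]
counts = dict(reduce_phase(map_phase(docs)))
print(counts)
```

In the real system, the map and reduce calls run on different machines and the shuffle moves intermediate pairs over the network; the user-visible contract is just the two functions.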
Proceedings ArticleDOI

On power-law relationships of the Internet topology

TL;DR: These power-laws hold for three snapshots of the Internet, between November 1997 and December 1998, despite a 45% growth of its size during that period, and can be used to generate and select realistic topologies for simulation purposes.
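The power-law claim can be illustrated numerically. Assuming frequency(d) is proportional to d^(-k), log frequency is linear in log degree, so a least-squares fit in log-log space recovers the exponent; the degree data below are synthetic, constructed to follow freq = 64 * d^(-2) exactly:

```python
import math

# Synthetic node degrees following freq(d) = 64 * d^(-2).
degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1

# Tally the frequency of each distinct degree.
freq = {}
for d in degrees:
    freq[d] = freq.get(d, 0) + 1

# Least-squares slope in log-log space estimates the exponent -k.
xs = [math.log(d) for d in freq]
ys = [math.log(f) for f in freq.values()]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
print(round(slope, 2))
```

On real Internet snapshots the fit is of course approximate; the point is that a straight line in log-log space is the signature the paper exploits.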
Proceedings ArticleDOI

Dynamo: amazon's highly available key-value store

TL;DR: Dynamo is presented, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience; it makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.
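The object versioning mentioned above can be sketched with vector clocks. This is a minimal sketch; the helper names and replica labels are hypothetical, and the real system layers truncation and application-level reconciliation on top of this:

```python
def merge_clocks(a: dict, b: dict) -> dict:
    # Element-wise max merges two vector clocks into their least upper bound.
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}

def descends(a: dict, b: dict) -> bool:
    # a descends from b if a has seen every event that b has seen.
    return all(a.get(node, 0) >= count for node, count in b.items())

# Two replicas accept concurrent writes to the same key during a partition.
v1 = {"replica_A": 2, "replica_B": 0}
v2 = {"replica_A": 1, "replica_B": 1}

# Neither version descends from the other, so the versions conflict
# and both are kept for the application to reconcile.
conflict = not descends(v1, v2) and not descends(v2, v1)
merged = merge_clocks(v1, v2)
```

The merged clock descends from both inputs, so once the application reconciles the two values, the reconciled write carries a clock that supersedes both branches.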
Book

Hadoop: The Definitive Guide

Tom White
TL;DR: This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.
Book

The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling

Ralph Kimball, +1 more
TL;DR: Greatly expanded to cover both basic and advanced techniques for optimizing data warehouse design, this second edition of Ralph Kimball's classic guide is more than sixty percent updated.
Related Papers (5)