Journal ArticleDOI

Avatara: OLAP for web-scale analytics products

TLDR
To serve LinkedIn's growing 160 million member base, the company built a scalable and fast OLAP serving system called Avatara to solve the many, small cubes problem.
Abstract
Multidimensional data generated by members on websites has seen massive growth in recent years. OLAP is a well-suited solution for mining and analyzing this data. Providing insights derived from this analysis has become crucial for these websites to give members greater value. For example, LinkedIn, the largest professional social network, provides its professional members rich analytics features like "Who's Viewed My Profile?" and "Who's Viewed This Job?" The data behind these features form cubes that must be efficiently served at scale, and can be neatly sharded to do so. To serve our growing 160 million member base, we built a scalable and fast OLAP serving system called Avatara to solve this many, small cubes problem. At LinkedIn, Avatara has been powering several analytics features on the site for the past two years.
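The "many, small cubes" idea in the abstract can be sketched as follows. This is a hypothetical illustration (the shard count, dimension names, and function names are all invented, not Avatara's actual implementation): each member's cube is small enough to store whole, and the member key hashes to exactly one shard, so a point query touches a single node.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count

def shard_for(member_id: int) -> int:
    # Hash the member key so each small cube lands on exactly one shard.
    digest = hashlib.md5(str(member_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Each shard maps member_id -> that member's small cube
# (here, profile-view counts broken down by one dimension).
shards = [defaultdict(dict) for _ in range(NUM_SHARDS)]

def record_view(member_id: int, viewer_industry: str) -> None:
    cube = shards[shard_for(member_id)][member_id]
    cube[viewer_industry] = cube.get(viewer_industry, 0) + 1

def whos_viewed_my_profile(member_id: int) -> dict:
    # A point query reads only one shard and one small cube.
    return dict(shards[shard_for(member_id)].get(member_id, {}))

record_view(42, "Software")
record_view(42, "Software")
record_view(42, "Finance")
print(whos_viewed_my_profile(42))
```

Because each cube is tiny and keyed by a single member, the workload shards cleanly: no query ever needs to fan out across shards.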



Citations
Journal ArticleDOI

Reference Architecture and Classification of Technologies, Products and Services for Big Data Systems

TL;DR: The contribution of this paper is a technology-independent reference architecture for big data systems, based on an analysis of published implementation architectures of big data use cases, together with a classification of related implementation technologies and products/services, based on an analysis of published use cases and a survey of related work.
Proceedings ArticleDOI

The big data ecosystem at LinkedIn

TL;DR: LinkedIn's Hadoop-based analytics stack is presented, which allows data scientists and machine learning researchers to extract insights and build product features from massive amounts of data, and solutions to the "last mile" issues in providing a rich developer ecosystem are presented.
Journal ArticleDOI

Mesa: geo-replicated, near real-time, scalable data warehousing

TL;DR: The Mesa system is presented, along with the performance and scale it achieves, including near real-time data ingestion and queryability, as well as high availability, reliability, fault tolerance, and scalability for large data and query volumes.
Journal ArticleDOI

HaoLap: a Hadoop based OLAP system for big data

TL;DR: HaoLap (Hadoop-based OLAP) is an OLAP (OnLine Analytical Processing) system for big data that adopts the specified multidimensional model to map dimensions and measures, and shows a clear OLAP performance advantage as data set size and query complexity grow.

Towards a big data reference architecture

TL;DR: The proposed reference architecture and a survey of the current state of the art in "big data" technologies guide designers in creating systems that derive new value from existing, but previously under-used, data.
References
Journal ArticleDOI

MapReduce: simplified data processing on large clusters

TL;DR: This paper presents MapReduce, a programming model and associated implementation for processing and generating large data sets; it runs on large clusters of commodity machines and is highly scalable.
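The programming model summarized above can be sketched in a few lines. This single-process simulation (document contents and function names are hypothetical) shows only the shape of the map, shuffle, and reduce phases, not Google's distributed implementation:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(documents):
    # map: emit an intermediate (word, 1) pair for every word seen
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def reduce_phase(pairs):
    # shuffle: group intermediate pairs by key; reduce: sum each group
    for word, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

docs = ["olap at scale", "olap cubes at scale"]
counts = dict(reduce_phase(map_phase(docs)))
print(counts)
```

In the real system, the map and reduce calls run on different machines and the shuffle moves intermediate pairs over the network; the user-visible contract is just the two functions.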
Proceedings ArticleDOI

On power-law relationships of the Internet topology

TL;DR: These power-laws hold for three snapshots of the Internet, between November 1997 and December 1998, despite a 45% growth of its size during that period, and can be used to generate and select realistic topologies for simulation purposes.
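The power-law claim can be illustrated numerically. Assuming frequency(d) is proportional to d^(-k), log frequency is linear in log degree, so a least-squares fit in log-log space recovers the exponent; the degree data below are synthetic, constructed to follow freq = 64 * d^(-2) exactly:

```python
import math

# Synthetic node degrees following freq(d) = 64 * d^(-2).
degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1

# Tally the frequency of each distinct degree.
freq = {}
for d in degrees:
    freq[d] = freq.get(d, 0) + 1

# Least-squares slope in log-log space estimates the exponent -k.
xs = [math.log(d) for d in freq]
ys = [math.log(f) for f in freq.values()]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
print(round(slope, 2))
```

On real Internet snapshots the fit is of course approximate; the point is that a straight line in log-log space is the signature the paper exploits.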
Proceedings ArticleDOI

Dynamo: amazon's highly available key-value store

TL;DR: Dynamo is presented, a highly available key-value storage system that some of Amazon's core services use to provide an "always-on" experience; it makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.
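The object versioning mentioned above can be sketched with vector clocks. This is a minimal sketch; the helper names and replica labels are hypothetical, and the real system layers truncation and application-level reconciliation on top of this:

```python
def merge_clocks(a: dict, b: dict) -> dict:
    # Element-wise max merges two vector clocks into their least upper bound.
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in set(a) | set(b)}

def descends(a: dict, b: dict) -> bool:
    # a descends from b if a has seen every event that b has seen.
    return all(a.get(node, 0) >= count for node, count in b.items())

# Two replicas accept concurrent writes to the same key during a partition.
v1 = {"replica_A": 2, "replica_B": 0}
v2 = {"replica_A": 1, "replica_B": 1}

# Neither version descends from the other, so the versions conflict
# and both are kept for the application to reconcile.
conflict = not descends(v1, v2) and not descends(v2, v1)
merged = merge_clocks(v1, v2)
```

The merged clock descends from both inputs, so once the application reconciles the two values, the reconciled write carries a clock that supersedes both branches.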
Book

Hadoop: The Definitive Guide

Tom White
TL;DR: This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.
Book

The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling

Ralph Kimball, +1 more
TL;DR: Greatly expanded to cover both basic and advanced techniques for optimizing data warehouse design, this second edition of Ralph Kimball's classic guide is more than sixty percent updated.
Related Papers (5)