
Kamil Bajda-Pawlikowski

Researcher at Yale University

Publications: 7
Citations: 1330

Kamil Bajda-Pawlikowski is an academic researcher from Yale University. The author has contributed to research on parallel databases and scalability, has an h-index of 6, and has co-authored 7 publications that have received 1307 citations.

Papers
Journal Article

HadoopDB: an architectural hybrid of MapReduce and DBMS technologies for analytical workloads

TL;DR: This paper explores the feasibility of building a hybrid system that takes the best features from both technologies; the resulting prototype approaches parallel databases in performance and efficiency while retaining the scalability, fault tolerance, and flexibility of MapReduce-based systems.
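The TL;DR above describes combining single-node databases with MapReduce-style coordination. As a rough illustration only (not HadoopDB's actual code), the sketch below uses Python's built-in sqlite3 module as a stand-in for each node's DBMS: a hypothetical `map_phase` pushes a filter and a partial SUM/GROUP BY into every node-local database, and a hypothetical `reduce_phase` merges the partial results by key. The `sales` table, its columns, and the sample rows are assumptions made for the example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: each inner list stands in for one node's partition.
PARTITIONS = [
    [("east", 10), ("west", 5), ("east", 7)],
    [("west", 3), ("north", 8)],
    [("east", 1), ("north", 2), ("west", 4)],
]

def load_node(rows):
    """Create an in-memory SQLite database standing in for one node's DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn

def map_phase(conn, min_amount):
    """Push the filter and a partial aggregation into the node-local database."""
    return conn.execute(
        "SELECT region, SUM(amount) FROM sales WHERE amount >= ? GROUP BY region",
        (min_amount,),
    ).fetchall()

def reduce_phase(partials):
    """Merge per-node partial sums into final totals, as a reduce step would."""
    totals = defaultdict(int)
    for node_result in partials:
        for region, subtotal in node_result:
            totals[region] += subtotal
    return dict(totals)

if __name__ == "__main__":
    nodes = [load_node(rows) for rows in PARTITIONS]
    partials = [map_phase(conn, min_amount=3) for conn in nodes]
    print(reduce_phase(partials))
```

Pushing the filter and partial aggregation into each database means only one small row per group per node travels to the reduce step, which is the kind of work placement the hybrid design aims for.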
Patent

Systems and methods for processing data

TL;DR: A system, method, and computer program product for processing data are disclosed. The system includes a data processing framework, a plurality of database systems coupled to the data processing framework, and a storage component that communicates with both the data processing framework and the database systems and is configured to store information about each partition of the data processing task being processed.
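The storage component in this patent summary keeps information about each partition. A minimal sketch of such a partition catalog follows, assuming simple hash partitioning with round-robin placement of partitions onto database nodes; the class names `PartitionInfo` and `PartitionCatalog` and the node labels are hypothetical, and the patented design may record different information.

```python
import zlib
from dataclasses import dataclass, field

@dataclass
class PartitionInfo:
    partition_id: int
    node: str           # hypothetical label of the database node holding this partition
    row_count: int = 0  # simple per-partition statistic kept by the catalog

@dataclass
class PartitionCatalog:
    num_partitions: int
    nodes: list
    partitions: dict = field(default_factory=dict)

    def __post_init__(self):
        # Place partitions on database nodes round-robin (an assumption for this sketch).
        for pid in range(self.num_partitions):
            self.partitions[pid] = PartitionInfo(pid, self.nodes[pid % len(self.nodes)])

    def partition_for(self, key: str) -> int:
        # crc32 is stable across runs, unlike Python's salted hash() for strings.
        return zlib.crc32(key.encode("utf-8")) % self.num_partitions

    def record_insert(self, key: str) -> PartitionInfo:
        """Update the catalog entry for the partition that receives this key."""
        info = self.partitions[self.partition_for(key)]
        info.row_count += 1
        return info

    def node_for(self, key: str) -> str:
        """Look up which database node should process work for this key."""
        return self.partitions[self.partition_for(key)].node

if __name__ == "__main__":
    catalog = PartitionCatalog(num_partitions=4, nodes=["node-0", "node-1"])
    for customer in ["alice", "bob", "carol", "dave"]:
        catalog.record_insert(customer)
    print(catalog.node_for("alice"))
```

A framework consulting such a catalog can send each piece of a task only to the database node that owns the relevant partition instead of broadcasting it to every node.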
Proceedings Article

Efficient processing of data warehousing queries in a split execution environment

TL;DR: This work studies the processing of data warehousing queries over very large datasets, analyzing the complexity of the problem in the split execution environment of HadoopDB with particular focus on join and aggregation operations.
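One property that split execution can exploit for joins is co-partitioning: if both tables are hash-partitioned on the join key, every node can run the join inside its own database, and the union of the local results equals the global join with no data shuffled between nodes. The sketch below checks this property with sqlite3 in-memory databases; the `customers`/`orders` tables, the modulo partitioning function, and the helper names are hypothetical, and the paper's actual join and aggregation strategies are more involved.

```python
import sqlite3

NUM_NODES = 3
# Hypothetical sample data.
CUSTOMERS = [(1, "alice"), (2, "bob"), (3, "carol"), (4, "dave")]
ORDERS = [(101, 1, 30), (102, 2, 15), (103, 1, 5), (104, 4, 40), (105, 3, 12)]

SCHEMA = (
    "CREATE TABLE customers (id INTEGER, name TEXT)",
    "CREATE TABLE orders (id INTEGER, customer_id INTEGER, amount INTEGER)",
)
JOIN_SQL = (
    "SELECT c.name, o.id, o.amount FROM customers c "
    "JOIN orders o ON o.customer_id = c.id"
)

def new_db():
    """Create an in-memory database with the example schema."""
    conn = sqlite3.connect(":memory:")
    for stmt in SCHEMA:
        conn.execute(stmt)
    return conn

def node_of(customer_id):
    """Partition both tables on the join key so matching rows are co-located."""
    return customer_id % NUM_NODES

def build_nodes():
    nodes = [new_db() for _ in range(NUM_NODES)]
    for cid, name in CUSTOMERS:
        nodes[node_of(cid)].execute("INSERT INTO customers VALUES (?, ?)", (cid, name))
    for oid, cid, amount in ORDERS:
        nodes[node_of(cid)].execute("INSERT INTO orders VALUES (?, ?, ?)", (oid, cid, amount))
    return nodes

def local_joins(nodes):
    """Run the join inside every node's database and union the results."""
    rows = []
    for conn in nodes:
        rows.extend(conn.execute(JOIN_SQL).fetchall())
    return sorted(rows)

def global_join():
    """Reference answer: the same join over unpartitioned data in one database."""
    conn = new_db()
    conn.executemany("INSERT INTO customers VALUES (?, ?)", CUSTOMERS)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
    return sorted(conn.execute(JOIN_SQL).fetchall())

if __name__ == "__main__":
    assert local_joins(build_nodes()) == global_join()
    print(len(global_join()), "joined rows computed without repartitioning")
```

When the tables are not partitioned on the join key, this shortcut no longer holds and rows must be repartitioned (for example, through a MapReduce shuffle) before joining.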
Proceedings Article

HadoopDB in action: building real world applications

TL;DR: A thorough walk-through of how to build applications on top of HadoopDB, demonstrating the flexibility and versatility of its architecture through two real-world application scenarios: a semantic web data application for protein sequence analysis and a business data warehousing application based on TPC-H.
Patent

Systems and methods for fault tolerant, adaptive execution of arbitrary queries at low latency

TL;DR: A system for distributed execution of database queries includes a query server that receives a query to be executed on a database, forms a query plan based on the query, and assigns tasks to task slots on a plurality of worker nodes in a cluster; upon receiving notification that a task has completed on a worker node, the query server immediately assigns an unassigned task to a free task slot on that worker node.
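The scheduling rule in this TL;DR, handing an unassigned task to a slot as soon as that slot frees up on its worker, can be simulated with a priority queue of completion events. In the sketch below, the function name `schedule`, the task durations, the worker names, and the slots-per-worker setting are hypothetical inputs; query planning and fault tolerance, which the patent also covers, are not modeled.

```python
import heapq
from collections import deque

def schedule(task_durations, workers, slots_per_worker):
    """Simulate greedy slot assignment: a freed slot immediately takes the next pending task.

    Returns (makespan, assignments), where assignments maps task index to worker name.
    """
    pending = deque(range(len(task_durations)))
    events = []  # min-heap of (finish_time, worker, task)
    assignments = {}

    # Initial wave: fill every free slot on every worker.
    for worker in workers:
        for _ in range(slots_per_worker):
            if pending:
                task = pending.popleft()
                assignments[task] = worker
                heapq.heappush(events, (task_durations[task], worker, task))

    makespan = 0
    while events:
        finish, worker, _task = heapq.heappop(events)
        makespan = max(makespan, finish)
        # Completion notification: reuse the freed slot on the same worker right away.
        if pending:
            task = pending.popleft()
            assignments[task] = worker
            heapq.heappush(events, (finish + task_durations[task], worker, task))
    return makespan, assignments

if __name__ == "__main__":
    makespan, assignments = schedule([4, 2, 3, 1, 5], ["w1", "w2"], slots_per_worker=1)
    print(makespan, assignments)
```

Because work is pulled by whichever slot frees up first, faster or less loaded workers naturally absorb more tasks, which is one way such a scheduler adapts to uneven node performance.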