Tathagata Das
Researcher at University of California, Berkeley
Publications - 31
Citations - 9347
Tathagata Das is an academic researcher at the University of California, Berkeley. The author has contributed to research on the topics of stream processing and Spark (mathematics). The author has an h-index of 18 and has co-authored 28 publications that have received 8464 citations. Previous affiliations of Tathagata Das include Microsoft and the Indian Institute of Technology Kharagpur.
Papers
Proceedings Article
Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing
Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, Ankur Dave, Justin Ma, Murphy McCauley, Michael J. Franklin, Scott Shenker, Ion Stoica
TL;DR: Resilient Distributed Datasets (RDDs) are presented, a distributed memory abstraction that lets programmers perform in-memory computations on large clusters in a fault-tolerant manner. RDDs are implemented in a system called Spark, which is evaluated through a variety of user applications and benchmarks.
Journal Article
Apache Spark: a unified engine for big data processing
Matei Zaharia, Reynold Xin, Patrick Wendell, Tathagata Das, Michael Armbrust, Ankur Dave, Xiangrui Meng, Josh Rosen, Shivaram Venkataraman, Michael J. Franklin, Ali Ghodsi, Joseph E. Gonzalez, Scott Shenker, Ion Stoica
TL;DR: This open source computing framework unifies streaming, batch, and interactive big data workloads to unlock new applications.
Proceedings Article
Discretized streams: fault-tolerant streaming computation at scale
TL;DR: D-Streams enable a parallel recovery mechanism that improves efficiency over traditional replication and backup schemes and tolerates stragglers. They can also be composed easily with batch and interactive query models like MapReduce, enabling rich applications that combine these modes.
Proceedings Article
Discretized streams: an efficient and fault-tolerant model for stream processing on large clusters
TL;DR: D-Streams support a new recovery mechanism that improves efficiency over the traditional replication and upstream backup solutions in streaming databases: parallel recovery of lost state across the cluster.
Proceedings Article
DeTail: reducing the flow completion time tail in datacenter networks
TL;DR: A new cross-layer network stack aimed at reducing the long tail of flow completion times is presented. It exploits cross-layer information to reduce packet drops, prioritize latency-sensitive flows, and evenly distribute network load, which effectively shortens that tail.