Accessing composite data structures in tiered storage across network nodes

Patent

George Johnu, Amit Kumar Saha, Arun Saha, Debojyoti Dutta

27 Feb 2018

TL;DR: The disclosed technology determines how to store data structures across memory devices associated with physically disparate network nodes. One process includes receiving a first retrieval request for a first object, searching a local PMEM device for the first object based on that request, and, on failing to find the first object on the local PMEM device, transmitting a second retrieval request to a remote node, which causes the remote node to retrieve the first object from a remote PMEM device.

Abstract: Aspects of the disclosed technology relate to ways to determine the optimal storage of data structures across different memory devices associated with physically disparate network nodes. In some aspects, a process of the technology can include steps for receiving a first retrieval request for a first object, searching a local PMEM device for the first object based on the first retrieval request, in response to a failure to find the first object on the local PMEM device, transmitting a second retrieval request to a remote node, wherein the second retrieval request is configured to cause the remote node to retrieve the first object from a remote PMEM device. Systems and machine-readable media are also provided.
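
The retrieval flow in the abstract can be illustrated with a minimal Python sketch. This is not the patented implementation: the `Node` class, its `retrieve` method, and the use of a dictionary as a stand-in for a PMEM device are all assumptions made for illustration.

```python
from typing import Dict, Optional


class Node:
    """Hypothetical network node with a local PMEM tier modeled as a dict."""

    def __init__(self, name: str, remote: Optional["Node"] = None) -> None:
        self.name = name
        self.pmem: Dict[str, bytes] = {}  # stand-in for a local PMEM device
        self.remote = remote              # peer to ask on a local miss

    def retrieve(self, key: str) -> Optional[bytes]:
        """Handle a retrieval request for the object named `key`."""
        # Step 1: search the local PMEM device.
        obj = self.pmem.get(key)
        if obj is not None:
            return obj
        # Step 2: on a local miss, send a second retrieval request to the
        # remote node, which searches its own PMEM device.
        if self.remote is not None:
            return self.remote.retrieve(key)
        return None


remote = Node("node-b")
remote.pmem["obj-1"] = b"payload"
local = Node("node-a", remote=remote)

print(local.retrieve("obj-1"))    # served from the remote node's PMEM
print(local.retrieve("missing"))  # found on neither node
```

In a real deployment the second request would travel over the network rather than through a direct method call; the sketch only shows the local-first, remote-on-miss ordering.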

Citations

Patent

Systems and methods for managing distributed database deployments

Eliot Horowitz, John Morales, Cory P. Mintz, Louisa Berger, Cailin Anne Nelson

20 Jun 2017

TL;DR: This patent describes a database offered as a cloud service that eliminates the design challenges associated with many distributed database implementations while still allowing client input on configuration choices, so that clients can simply identify a number of database nodes and the capability of those nodes, and within minutes have a fully functioning, scalable, replicated, and secure distributed database in the cloud.

Abstract: Various aspects provide for implementation of a cloud service for running, monitoring, and maintaining cloud distributed database deployments and, in particular examples, provide cloud-based services to run, monitor, and maintain deployments of the known MongoDB database. Various embodiments provide services and interfaces, and manage provisioning of dedicated servers for the distributed database instances (e.g., MongoDB instances). Further aspects include providing a database as a cloud service that eliminates the design challenges associated with many distributed database implementations, while allowing the client's input on configuration choices in building the database. In some implementations, clients can simply identify a number of database nodes and the capability of the nodes, and within minutes have a fully functioning, scalable, replicated, and secure distributed database in the cloud.

74 citations

Patent

System and method for determining consensus within a distributed database

Eliot Horowitz, Andrew Michalski Schwerin, Siyuan Zhou, Eric Andrew Milkie

25 May 2017

TL;DR: A protocol is provided that reduces or eliminates heartbeat communication between nodes of a replica set; nodes instead communicate liveness information through existing database commands and their metadata.

Abstract: A system and method for determining consensus within a distributed database are provided. According to one aspect, a protocol is provided that reduces or eliminates heartbeat communication between nodes of a replica set. Nodes may communicate liveness information using existing database commands and metadata associated with the database commands. According to another aspect, improved systems and methods are provided for detection of node failures and election of a new primary node.

35 citations

Patent

Distributed database systems and methods with encrypted storage engines

Eliot Horowitz, Per Andreas Nilsson

25 May 2017

TL;DR: A distributed database system is described that includes an encryption API configured to initialize callback functions for encrypting and decrypting database data, a storage API for executing the callback functions, and a database API configured to manage database operations (e.g., read and write requests), wherein the database API calls the storage API to access data on a stable storage medium.

Abstract: Methods and systems are provided for selectively employing storage engines in a distributed database environment. The methods and systems can include a processor configured to execute a plurality of system components that comprise an operation prediction component for determining an expected set of operations to be performed on a portion of the database; a data format selection component for selecting a data format based on at least one characteristic of the expected set of operations; and at least one storage engine for writing the portion of the database in the selected data format. According to one embodiment, the system includes an encryption API configured to initialize callback functions for encrypting and decrypting database data, a storage API for executing the callback functions, and a database API configured to manage database operations (e.g., read and write requests), wherein the database API calls the storage API to access data on a stable storage medium.

32 citations

Patent

Systems and methods for database zone sharding and api integration

Dwight Merriman, Eliot Horowitz, Cory P. Mintz, Cailin Anne Nelson, Akshay Kumar, David Lenox Storch, Charles William Swanson¹, Keith Bostic¹, Michael Cahill¹, Dan Pasette¹, Mathias Benjamin Stearn¹, Geert Bosch¹

Bosch¹

20 Jun 2018

TL;DR: Systems and methods are presented to enable control and placement of data repositories: the system segments data into zones so that, for example, data related to operations executed in North America is placed in a North America zone and data related to transactions in Europe is placed in a Europe zone.

Abstract: Systems and methods are provided to enable control and placement of data repositories. In some embodiments, the system segments data into zones. A website, for example, may need to segment data according to location. In this example, a zone may be created for North America and another zone may be created for Europe. Data related to operations executed in North America, for example, can be placed in the North America zone and data related to transactions in Europe can be placed in the Europe zone. According to some embodiments, the system may use zones to accommodate a range of deployment scenarios.

22 citations

Patent

Method and apparatus for reading and writing committed data

Eliot Horowitz, Andrew Michalski Schwerin, Mathias Benjamin Stearn, Eric Andrew Milkie

25 May 2017

TL;DR: A database system comprising a processor configured to execute a plurality of system components is provided, including an interface component configured to receive a write commit command and provide a write commit confirmation, and a snapshot component configured to generate a plurality of snapshots of data stored in a data storage node of a plurality of data storage nodes and identify a committed snapshot representative of data that has been replicated on a majority of the plurality of data storage nodes.

Abstract: According to some aspects, a database system comprising a processor configured to execute a plurality of system components is provided. The plurality of system components may include an interface component configured to receive a write commit command and provide a write commit confirmation, a snapshot component configured to generate a plurality of snapshots of data stored in a data storage node of a plurality of data storage nodes and identify a committed snapshot representative of data that has been replicated on a majority of the plurality of data storage nodes, and a command processing component configured to modify a data element based on the write commit command, determine whether the majority of the plurality of storage nodes have replicated the modification using the committed snapshot, and generate the write commit confirmation responsive to a determination that the majority of the plurality of data storage nodes have replicated the modification.
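
The majority rule behind the write commit confirmation can be sketched in a few lines of Python. This is a simplified illustration, not the patented design: the `StorageNode` class and `write_commit` function are hypothetical names, and the committed-snapshot mechanism described in the abstract is replaced here by a direct count of acknowledgements.

```python
from typing import List


class StorageNode:
    """Hypothetical data storage node holding an append-only list of writes."""

    def __init__(self, reachable: bool = True) -> None:
        self.reachable = reachable
        self.log: List[str] = []

    def replicate(self, entry: str) -> bool:
        """Apply the write if the node is reachable; report success."""
        if self.reachable:
            self.log.append(entry)
        return self.reachable


def write_commit(nodes: List[StorageNode], entry: str) -> bool:
    """Return True (the write commit confirmation) only when a majority
    of the storage nodes have replicated the modification."""
    acks = sum(node.replicate(entry) for node in nodes)
    return acks > len(nodes) // 2


healthy = [StorageNode(), StorageNode(), StorageNode(reachable=False)]
degraded = [StorageNode(), StorageNode(reachable=False), StorageNode(reachable=False)]

print(write_commit(healthy, "x=1"))   # 2 of 3 nodes replicated the write
print(write_commit(degraded, "x=1"))  # only 1 of 3 nodes replicated the write
```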

16 citations

References

Proceedings Article

NV-Heaps: making persistent objects fast and safe with next-generation, non-volatile memories

Joel Coburn¹, Adrian M. Caulfield¹, Ameen Akel¹, Laura M. Grupp¹, Rajesh Gupta¹, Ranjit Jhala¹, Steven Swanson¹

University of California, San Diego¹

05 Mar 2011

TL;DR: A lightweight, high-performance persistent object system called NV-heaps is implemented that provides transactional semantics while preventing these errors and providing a model for persistence that is easy to use and reason about.

Abstract: Persistent, user-defined objects present an attractive abstraction for working with non-volatile program state. However, the slow speed of persistent storage (i.e., disk) has restricted their design and limited their performance. Fast, byte-addressable, non-volatile technologies, such as phase change memory, will remove this constraint and allow programmers to build high-performance, persistent data structures in non-volatile storage that is almost as fast as DRAM. Creating these data structures requires a system that is lightweight enough to expose the performance of the underlying memories but also ensures safety in the presence of application and system failures by avoiding familiar bugs such as dangling pointers, multiple free()s, and locking errors. In addition, the system must prevent new types of hard-to-find pointer safety bugs that only arise with persistent objects. These bugs are especially dangerous since any corruption they cause will be permanent. We have implemented a lightweight, high-performance persistent object system called NV-heaps that provides transactional semantics while preventing these errors and providing a model for persistence that is easy to use and reason about. We implement search trees, hash tables, sparse graphs, and arrays using NV-heaps, BerkeleyDB, and Stasis. Our results show that NV-heap performance scales with thread count and that data structures implemented using NV-heaps out-perform BerkeleyDB and Stasis implementations by 32x and 244x, respectively, by avoiding the operating system and minimizing other software overheads. We also quantify the cost of enforcing the safety guarantees that NV-heaps provide and measure the costs of NV-heap primitive operations.

850 citations

Proceedings Article

Consistent and durable data structures for non-volatile byte-addressable memory

Shivaram Venkataraman¹, Niraj Tolia, Parthasarathy Ranganathan², Roy H. Campbell¹

University of Illinois at Urbana–Champaign¹, Hewlett-Packard²

15 Feb 2011

TL;DR: This paper presents Consistent and Durable Data Structures (CDDSs), a single-level data store that, on current hardware, allows programmers to safely exploit the low-latency and non-volatile aspects of new memory technologies.

Abstract: The predicted shift to non-volatile, byte-addressable memory (e.g., Phase Change Memory and Memristor), the growth of "big data", and the subsequent emergence of frameworks such as memcached and NoSQL systems require us to rethink the design of data stores. To derive the maximum performance from these new memory technologies, this paper proposes the use of single-level data stores. For these systems, where no distinction is made between a volatile and a persistent copy of data, we present Consistent and Durable Data Structures (CDDSs) that, on current hardware, allows programmers to safely exploit the low-latency and non-volatile aspects of new memory technologies. CDDSs use versioning to allow atomic updates without requiring logging. The same versioning scheme also enables rollback for failure recovery. When compared to a memory-backed Berkeley DB B-Tree, our prototype-based results show that a CDDS B-Tree can increase put and get throughput by 74% and 138%. When compared to Cassandra, a two-level data store, Tembo, a CDDS B-Tree enabled distributed Key-Value system, increases throughput by up to 250%-286%.

403 citations

Proceedings Article

Data tiering in heterogeneous memory systems

Subramanya R. Dulloor¹, Amitabha Roy², Zheguang Zhao³, Narayanan Sundaram², Nadathur Satish², Rajesh M. Sankaran², Jeff Jackson², Karsten Schwan¹

Georgia Institute of Technology¹, Intel², Brown University³

18 Apr 2016

TL;DR: The contribution of this paper is the design and implementation of a set of libraries and automatic tools that enables programmers to achieve optimal data placement with minimal effort on their part and shows that it is indeed possible to use a mix of a small amount of fast DRAM and large amounts of slower NVM without a proportional impact to an application's performance.

Abstract: Memory-based data center applications require increasingly large memory capacities, but face the challenges posed by the inherent difficulties in scaling DRAM and also the cost of DRAM. Future systems are attempting to address these demands with heterogeneous memory architectures coupling DRAM with high capacity, low cost, but also lower performance, non-volatile memories (NVM) such as PCM and RRAM. A key usage model intended for NVM is as cheaper high capacity volatile memory. Data center operators are bound to ask whether this model for the usage of NVM to replace the majority of DRAM memory leads to a large slowdown in their applications? It is crucial to answer this question because a large performance impact will be an impediment to the adoption of such systems. This paper presents a thorough study of representative applications -- including a key-value store (MemC3), an in-memory database (VoltDB), and a graph analytics framework (GraphMat) -- on a platform that is capable of emulating a mix of memory technologies. Our conclusions are that it is indeed possible to use a mix of a small amount of fast DRAM and large amounts of slower NVM without a proportional impact to an application's performance. The caveat is that this result can only be achieved through careful placement of data structures. The contribution of this paper is the design and implementation of a set of libraries and automatic tools that enables programmers to achieve optimal data placement with minimal effort on their part. With such guided placement and with DRAM constituting only 6% of the total memory footprint for GraphMat and 25% for VoltDB and MemC3 (remaining memory is NVM with 4x higher latency and 8x lower bandwidth than DRAM), we show that our target applications demonstrate only a 13% to 40% slowdown. Without guided placement, these applications see, in the worst case, 1.5x to 5.9x slowdown on the same configuration. 
Based on a realistic assumption that NVM will be 5x cheaper (per bit) than DRAM, this hybrid solution also results in 2x to 2.8x better performance/$ than a DRAM-only system.

189 citations

Patent

Erasure coding across multiple zones and sub-zones

Sergey Yekhanin¹, Huseyin Simitci¹, Aaron W. Ogus¹, Jin Li¹, Cheng Huang¹, Parikshit Gopalan¹, Bradley Gene Calder¹

Microsoft¹

24 Mar 2014

TL;DR: A data chunk is divided into a plurality of sub-fragments, each associated with a zone, and a plurality of reconstruction parities is computed from them, at least one of which is a cross-zone parity.

Abstract: In various embodiments, methods and systems for erasure coding data across multiple storage zones are provided. This may be accomplished by dividing a data chunk into a plurality of sub-fragments. Each of the plurality of sub-fragments is associated with a zone. Zones comprise buildings, data centers, and geographic regions providing a storage service. A plurality of reconstruction parities is computed. Each of the plurality of reconstruction parities is computed using at least one sub-fragment from the plurality of sub-fragments. The plurality of reconstruction parities comprises at least one cross-zone parity. The at least one cross-zone parity is assigned to a parity zone. The cross-zone parity provides cross-zone reconstruction of a portion of the data chunk.
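
The idea of a cross-zone parity can be illustrated with a small Python sketch. The abstract does not say which code is used, so a single XOR parity is assumed here as the simplest possible stand-in; the zone names and the helper functions `split` and `xor_bytes` are hypothetical.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split(chunk: bytes, n: int) -> List[bytes]:
    """Divide a chunk into n equal-size sub-fragments (length must divide evenly)."""
    size = len(chunk) // n
    return [chunk[i * size:(i + 1) * size] for i in range(n)]


chunk = b"ABCDEFGHIJKL"
zones = ["zone-1", "zone-2", "zone-3"]  # hypothetical data zones
fragments = dict(zip(zones, split(chunk, len(zones))))

# Cross-zone parity: XOR of the sub-fragments from every data zone,
# stored in a separate (hypothetical) parity zone.
parity = reduce(xor_bytes, fragments.values())

# Simulate losing zone-2, then rebuild its sub-fragment from the
# surviving sub-fragments plus the cross-zone parity.
survivors = [frag for zone, frag in fragments.items() if zone != "zone-2"]
rebuilt = reduce(xor_bytes, survivors, parity)

print(rebuilt == fragments["zone-2"])
```

A single XOR parity tolerates the loss of only one zone; production erasure codes use several parities to survive more failures.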

178 citations

Patent

Continuous full scan data store table and distributed data store featuring predictable answer time for unpredictable workload

Dietmar Fauser, Jeremy Meyer, Cédric Florimond, Donald Kossmann, Gustavo Alonso, Georgios Giannikis, Philipp Unterbrunner

23 Aug 2010

TL;DR: A method for storing and retrieving data in a storage node of a data store is described, in which the storage node stores at least one segment of a relational table in main memory.

Abstract: A method for storing and retrieving data in a storage node of a data store, and a storage node of a data store storing in main memory at least one segment of a relational table, are described. The storage node comprises at least one computational core running at least one scan thread each dedicated to the scanning of one of the at least one segment. The storage node is characterized in that the at least one scan thread uniquely, continuously and exhaustively scans the dedicated segment of the relational table. The storage node receives and processes batches of query and update operations for the at least one segment of the relational table. The query and update operations of a batch are re-indexed at the beginning of each scan by the scan thread. Then, the indexed query and update operations of a batch are independently joined to data records of said segment that match with predicates of the indexed query and update operations so that the indexed query and update operations of a batch are progressively fulfilled whenever joined data records are retrieved by the scan thread while scanning said segment. This allows maximizing the sharing and access of data records in main memory between query and update operations of a batch.
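
The batch-at-a-time scan in the abstract can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented method: it handles only equality predicates, ignores update operations, and the names `scan_batch`, `Record`, and `Query` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Dict[str, object]
Query = Tuple[int, str, object]  # (query id, column, value to match)


def scan_batch(segment: List[Record], batch: List[Query]) -> Dict[int, List[Record]]:
    """Answer every query in the batch with one pass over the segment."""
    # Index the batch by (column, value) so each record probes the
    # queries instead of each query probing the records.
    index: Dict[Tuple[str, object], List[int]] = defaultdict(list)
    for qid, column, value in batch:
        index[(column, value)].append(qid)
    columns = {column for _, column, _ in batch}

    results: Dict[int, List[Record]] = {qid: [] for qid, _, _ in batch}
    for record in segment:  # the single, exhaustive scan of the segment
        for column in columns:
            for qid in index.get((column, record.get(column)), []):
                results[qid].append(record)
    return results


segment = [
    {"id": 1, "city": "Zurich"},
    {"id": 2, "city": "Nice"},
    {"id": 3, "city": "Zurich"},
]
batch = [(10, "city", "Zurich"), (11, "id", 2)]
answers = scan_batch(segment, batch)

print([r["id"] for r in answers[10]])
print([r["id"] for r in answers[11]])
```

Because the cost is dominated by the one pass over the segment, adding queries to the batch changes the answer time only slightly, which is the predictability property the title refers to.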

133 citations