Topic

Data access

About: Data access is a research topic. Over its lifetime, 13141 publications have been published within this topic, receiving 172859 citations.


Papers
Proceedings ArticleDOI
Yi Shan, Bo Wang, Jing Yan, Yu Wang, Ningyi Xu, Huazhong Yang
21 Feb 2010
TL;DR: FPMR, a MapReduce framework on FPGA, provides a programming abstraction, hardware architecture, and basic building blocks to developers so that more attention can be paid to the application itself; the speedup of the framework is demonstrated.
Abstract: Machine learning and data mining are attracting increasing attention from the computing community. FPGA provides a highly parallel, low-power, and flexible hardware platform for this domain, but the difficulty of programming FPGAs greatly limits their adoption. MapReduce is a parallel programming framework that can easily exploit the inherent parallelism in algorithms. In this paper, we describe FPMR, a MapReduce framework on FPGA, which provides a programming abstraction, hardware architecture, and basic building blocks to developers. An on-chip processor scheduler is implemented to maximize the utilization of computation resources and achieve better load balancing. An efficient data access scheme is carefully designed to maximize data reuse and throughput. Meanwhile, the FPMR framework hides task control, synchronization, and communication from designers so that more attention can be paid to the application itself. A case study of RankBoost acceleration based on FPMR demonstrates that FPMR improves development productivity, with a speedup of 31.8x over a CPU-based implementation. This performance is comparable to a fully manually designed version, which achieves a 33.5x speedup. Two other applications, SVM and PageRank, are also discussed to show the generality of the framework.
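
The abstract above describes the programming abstraction a MapReduce framework exposes: developers supply map and reduce functions while the framework handles scheduling, synchronization, and data movement. The sketch below illustrates that abstraction in plain Python; the function names and the software-only shuffle are illustrative assumptions, not the actual FPMR hardware interface.

```python
# Minimal sketch of the MapReduce programming abstraction that a framework
# such as FPMR exposes; run_mapreduce and its signature are hypothetical,
# not the actual FPMR API. In FPMR the shuffle, scheduling, and
# synchronization below would be handled by on-chip hardware.
from collections import defaultdict
from typing import Any, Callable, Iterable, Tuple


def run_mapreduce(records: Iterable[Any],
                  map_fn: Callable[[Any], Iterable[Tuple[Any, Any]]],
                  reduce_fn: Callable[[Any, list], Any]) -> dict:
    """Apply map_fn to every record, group by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:                      # "map" phase
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values)         # "reduce" phase
            for key, values in groups.items()}


# Usage example: word count, the canonical MapReduce workload.
lines = ["fpga map reduce", "map reduce map"]
counts = run_mapreduce(
    lines,
    map_fn=lambda line: [(w, 1) for w in line.split()],
    reduce_fn=lambda _key, values: sum(values),
)
print(counts)  # {'fpga': 1, 'map': 3, 'reduce': 2}
```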

154 citations

Patent
30 Oct 1990
TL;DR: In this paper, a two-level lock management system is used to prevent data corruption due to unsynchronized data access by multiple processors in a multi-processor computer system in which each processor is under the control of separate system software and accesses a common database.
Abstract: A multi-processor computer system in which each processor is under the control of separate system software and accesses a common database. A two-level lock management system is used to prevent data corruption due to unsynchronized data access by the multiple processors. Under this system, subsets of data in the database are assigned respectively different lock entities. Before a task running on one of the processors accesses data in the database, it first requests permission to access the data in a given mode with reference to the appropriate lock entity. A first-level lock manager handles these requests synchronously, using a simplified model of the locking system with shared and exclusive lock modes to either grant or deny the request. All requests are then forwarded to a second-level lock manager, which grants or denies the requests based on a more robust model of the locking system and queues denied requests. The denied requests are granted, in turn, as the tasks which have been granted access finish processing data in the database.
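
To make the two-level scheme concrete, here is a minimal single-process sketch: requests are answered synchronously using shared/exclusive rules, and denied requests are queued and retried as holders release. The class and method names are hypothetical illustrations, not taken from the patent.

```python
# Illustrative sketch of the two-level locking idea from the abstract:
# requests are granted or denied immediately under shared/exclusive rules,
# and denied requests are queued and granted later as holders release.
from collections import deque


class TwoLevelLockManager:
    def __init__(self):
        self.shared_holders = set()     # tasks holding a shared lock
        self.exclusive_holder = None    # task holding the exclusive lock
        self.wait_queue = deque()       # (task, mode) pairs denied so far

    def request(self, task, mode):
        """First level: grant or deny synchronously using shared/exclusive rules."""
        if mode == "shared" and self.exclusive_holder is None:
            self.shared_holders.add(task)
            return True
        if mode == "exclusive" and self.exclusive_holder is None and not self.shared_holders:
            self.exclusive_holder = task
            return True
        # Second level: queue the denied request and grant it later.
        self.wait_queue.append((task, mode))
        return False

    def release(self, task):
        """Release the task's lock and retry queued requests in arrival order."""
        self.shared_holders.discard(task)
        if self.exclusive_holder == task:
            self.exclusive_holder = None
        retry = list(self.wait_queue)
        self.wait_queue.clear()
        for waiting_task, mode in retry:
            self.request(waiting_task, mode)


# Usage: the writer is queued until both readers release their shared locks.
mgr = TwoLevelLockManager()
mgr.request("reader-1", "shared")    # granted
mgr.request("reader-2", "shared")    # granted
mgr.request("writer", "exclusive")   # denied, queued
mgr.release("reader-1")
mgr.release("reader-2")              # writer now acquires the exclusive lock
print(mgr.exclusive_holder)          # writer
```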

153 citations

Patent
09 Aug 2007
TL;DR: In this paper, the authors present systems and methods for automating EII using a smart integration engine based on metadata, which is used for seamless integration of a fully distributed organization with many data sources and technologies.
Abstract: The present invention discloses systems and methods for automating EII using a smart integration engine based on metadata. On-line execution (i.e. data access, retrieval, or update) is automated by integrating heterogeneous data sources via a centralized smart engine based on the metadata of all data sources, managed in a metadata repository. The data-source assets are mapped to business metadata (terminology), giving programmers the ability to use business terms rather than technical terms. IT departments can use the business-level terms for easy and fast programming of all services "at the business level". The integration is performed by the engine (via pre-configuration) automatically, dynamically, and on-line, regardless of topology or technology changes, without user or administrator intervention. MDOA is a high-level concept in which the metadata maps low-level technical terms to high-level business terms. MDOA is used for seamless integration of a fully distributed organization with many data sources and technologies.
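
The core mechanism described above is a metadata repository that maps business-level terms to technical data-source locations so callers never touch low-level names. The sketch below shows one way such a mapping could look; the repository entries, source names, and the resolve helper are hypothetical examples, not the patent's actual design.

```python
# Illustrative sketch of metadata-driven term mapping: a repository maps
# business terms to technical data-source fields so callers program against
# business terminology. All entries below are invented for illustration.
metadata_repository = {
    "customer name": {"source": "crm_db", "table": "customers", "column": "cust_nm"},
    "order total":   {"source": "erp_db", "table": "orders",    "column": "amt_total"},
}


def resolve(business_term: str) -> str:
    """Translate a business-level term into the technical location of the data."""
    entry = metadata_repository[business_term]
    return f'{entry["source"]}.{entry["table"]}.{entry["column"]}'


# A business-level request is rewritten to technical terms automatically;
# if a source is migrated, only the repository entry changes, not the caller.
print(resolve("order total"))   # erp_db.orders.amt_total
```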

152 citations

Journal ArticleDOI
TL;DR: A set of principles for designing highly scalable distributed storage systems that are optimized for heavy data access concurrency, and a set of versioning algorithms that enable high throughput under concurrency, are proposed.
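
The TL;DR mentions versioning algorithms that sustain throughput under heavy access concurrency. As a rough illustration of why versioning helps, the sketch below keeps immutable versions so readers never block writers; it is a generic multi-version scheme assumed for illustration, not the paper's actual algorithms.

```python
# Illustrative multi-version store: writers publish new immutable versions,
# readers pick a version without blocking, so reads and writes do not
# contend on the same lock. Class and method names are hypothetical.
import threading


class VersionedValue:
    def __init__(self, initial):
        self._versions = [initial]            # append-only list of versions
        self._write_lock = threading.Lock()   # serializes writers only

    def read(self, version=None):
        """Readers never block writers: they read an already-published version."""
        versions = self._versions
        index = len(versions) - 1 if version is None else version
        return versions[index]

    def write(self, new_value):
        """Writers serialize among themselves and publish a new version."""
        with self._write_lock:
            self._versions.append(new_value)
            return len(self._versions) - 1


v = VersionedValue({"size": 0})
v.write({"size": 42})
print(v.read())      # latest version: {'size': 42}
print(v.read(0))     # older version:  {'size': 0}
```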

151 citations

Proceedings ArticleDOI
13 May 2012
TL;DR: This paper builds a mathematical model of scheduling in MapReduce, proposes an algorithm that schedules multiple tasks simultaneously rather than one by one to achieve optimal data locality, and runs extensive experiments to quantify the performance improvement of the proposed algorithm and measure how different factors impact data locality.
Abstract: Traditional HPC architectures separate compute nodes and storage nodes, which are interconnected with high-speed links to satisfy data access requirements in multi-user environments. However, the capacity of those high-speed links is still much less than the aggregate bandwidth of all compute nodes. In data-parallel systems such as GFS/MapReduce, clusters are built with commodity hardware and each node takes the roles of both computation and storage, which makes it possible to bring compute to data. Data locality is a significant advantage of data-parallel systems over traditional HPC systems. Good data locality reduces cross-switch network traffic, one of the bottlenecks in data-intensive computing. In this paper, we investigate data locality in depth. First, we build a mathematical model of scheduling in MapReduce and theoretically analyze the impact of configuration factors, such as the numbers of nodes and tasks, on data locality. Second, we find that the default Hadoop scheduling is non-optimal and propose an algorithm that schedules multiple tasks simultaneously rather than one by one to give optimal data locality. Third, we run extensive experiments to quantify the performance improvement of our proposed algorithms, measure how different factors impact data locality, and investigate how data locality influences job execution time in both single-cluster and cross-cluster environments.
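
The key idea in the abstract is that scheduling several tasks at once, rather than one at a time, lets the scheduler pick placements that maximize node-local data access. The greedy batch matcher below is a small illustrative stand-in for that idea, using hypothetical task and node names; it is not the paper's actual algorithm.

```python
# Illustrative batch scheduler: consider all idle slots and all tasks
# together and prefer node-local placements, instead of assigning tasks
# one by one as the default scheduler does.
def schedule_batch(tasks, idle_nodes, block_locations):
    """Assign tasks to idle nodes, preferring nodes that hold the task's data.

    tasks: list of task ids
    idle_nodes: list of node ids with a free slot
    block_locations: dict mapping task -> set of nodes holding its input block
    """
    assignment = {}
    free = set(idle_nodes)
    # First pass: give every task a data-local node if one is still free.
    for task in tasks:
        local = block_locations.get(task, set()) & free
        if local:
            node = local.pop()
            assignment[task] = node
            free.discard(node)
    # Second pass: place remaining tasks anywhere, paying remote data access.
    for task in tasks:
        if task not in assignment and free:
            assignment[task] = free.pop()
    return assignment


tasks = ["t1", "t2", "t3"]
idle_nodes = ["n1", "n2", "n3"]
block_locations = {"t1": {"n2"}, "t2": {"n2", "n3"}, "t3": {"n1"}}
print(schedule_batch(tasks, idle_nodes, block_locations))
# {'t1': 'n2', 't2': 'n3', 't3': 'n1'} -- all three placements are data-local
```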

151 citations


Network Information
Related Topics (5)
Software: 130.5K papers, 2M citations, 86% related
Cloud computing: 156.4K papers, 1.9M citations, 86% related
Cluster analysis: 146.5K papers, 2.9M citations, 85% related
The Internet: 213.2K papers, 3.8M citations, 85% related
Information system: 107.5K papers, 1.8M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    51
2022    125
2021    403
2020    721
2019    906
2018    816