Author

Hongfeng Yu

Bio: Hongfeng Yu is an academic researcher from the University of Nebraska–Lincoln. The author has contributed to research in topics: Visualization & Data visualization. The author has an h-index of 25 and has co-authored 107 publications receiving 2,176 citations. Previous affiliations of Hongfeng Yu include Zhejiang University & University of California, Davis.


Papers
Proceedings ArticleDOI
10 Nov 2012
TL;DR: The lightweight, flexible framework allows scientists dealing with the data deluge at extreme scale to perform analyses at increased temporal resolutions, mitigate I/O costs, and significantly improve the time to insight.
Abstract: With the onset of extreme-scale computing, I/O constraints make it increasingly difficult for scientists to save a sufficient amount of raw simulation data to persistent storage. One potential solution is to change the data analysis pipeline from a post-process centric to a concurrent approach based on either in-situ or in-transit processing. In this context computations are considered in-situ if they utilize the primary compute resources, while in-transit processing refers to offloading computations to a set of secondary resources using asynchronous data transfers. In this paper we explore the design and implementation of three common analysis techniques typically performed on large-scale scientific simulations: topological analysis, descriptive statistics, and visualization. We summarize algorithmic developments, describe a resource scheduling system to coordinate the execution of various analysis workflows, and discuss our implementation using the DataSpaces and ADIOS frameworks that support efficient data movement between in-situ and in-transit computations. We demonstrate the efficiency of our lightweight, flexible framework by deploying it on the Jaguar XK6 to analyze data generated by S3D, a massively parallel turbulent combustion code. Our framework allows scientists dealing with the data deluge at extreme scale to perform analyses at increased temporal resolutions, mitigate I/O costs, and significantly improve the time to insight.

185 citations
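
A minimal Python sketch (not the paper's DataSpaces/ADIOS implementation) of the distinction the abstract above draws: cheap analysis runs in-situ on the primary resources inside the simulation loop, while heavier analysis is shipped asynchronously to secondary resources for in-transit processing. All names (in_situ_analysis, in_transit_worker, staging_queue) are hypothetical stand-ins.

import threading
import queue

# Hypothetical illustration only; these are not DataSpaces or ADIOS APIs.
staging_queue = queue.Queue()  # stands in for asynchronous transfers to staging nodes

def in_situ_analysis(field):
    # Runs synchronously on the simulation's own (primary) compute resources.
    return {"min": min(field), "max": max(field), "mean": sum(field) / len(field)}

def in_transit_worker():
    # Runs on secondary (staging) resources, consuming asynchronously shipped data.
    while True:
        step, field = staging_queue.get()
        if field is None:
            break
        hist = {}
        for v in field:  # e.g., a coarse histogram computed off the critical path
            hist[round(v, 1)] = hist.get(round(v, 1), 0) + 1
        print(f"[in-transit] step {step}: {len(hist)} histogram bins")

worker = threading.Thread(target=in_transit_worker)
worker.start()

for step in range(3):                                  # stand-in simulation time loop
    field = [0.1 * step * i for i in range(1000)]      # stand-in simulation field
    print(f"[in-situ] step {step}: {in_situ_analysis(field)}")   # stays in situ
    staging_queue.put((step, field))                   # heavier work goes in transit

staging_queue.put((None, None))                        # signal the worker to stop
worker.join()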

Journal ArticleDOI
TL;DR: In situ visualization offers a scalable way for scientists to view the data their simulations generate; this full picture is particularly crucial for capturing and understanding highly intermittent transient phenomena, such as ignition and extinction events in turbulent combustion.
Abstract: As scientific supercomputing moves toward petascale and exascale levels, in situ visualization stands out as a scalable way for scientists to view the data their simulations generate. This full picture is particularly crucial for capturing and understanding highly intermittent transient phenomena, such as ignition and extinction events in turbulent combustion.

182 citations

Proceedings ArticleDOI
11 Nov 2006
TL;DR: This work proposes an end-to-end approach in which all simulation components - meshing, partitioning, solver, and visualization - are tightly coupled and execute in parallel with shared data structures and no intermediate I/O.
Abstract: Parallel supercomputing has traditionally focused on the inner kernel of scientific simulations: the solver. The front and back ends of the simulation pipeline - problem description and interpretation of the output - have taken a back seat to the solver when it comes to attention paid to scalability and performance, and are often relegated to offline, sequential computation. As the largest simulations move beyond the realm of the terascale and into the petascale, this decomposition in tasks and platforms becomes increasingly untenable. We propose an end-to-end approach in which all simulation components - meshing, partitioning, solver, and visualization - are tightly coupled and execute in parallel with shared data structures and no intermediate I/O. We present our implementation of this new approach in the context of octree-based finite element simulation of earthquake ground motion. Performance evaluation on up to 2048 processors demonstrates the ability of the end-to-end approach to overcome the scalability bottlenecks of the traditional approach.

167 citations
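
A toy Python sketch of the end-to-end coupling idea described above: meshing, partitioning, solving, and visualization hand data to one another through shared in-memory structures, with no intermediate I/O. The functions below are hypothetical stand-ins, not the authors' octree-based earthquake code.

def mesh(domain):
    # Produce a trivial "mesh": a list of element ids covering the domain.
    return list(range(domain["elements"]))

def partition(elements, ranks):
    # Block-partition elements across ranks (a stand-in for parallel partitioning).
    chunk = (len(elements) + ranks - 1) // ranks
    return [elements[r * chunk:(r + 1) * chunk] for r in range(ranks)]

def solve(local_elements, steps):
    # Stand-in solver: one value per local element per time step.
    return [[float(e + s) for e in local_elements] for s in range(steps)]

def visualize(local_solution):
    # Stand-in visualization: reduce each time step to a single summary number.
    return [sum(vals) / len(vals) for vals in local_solution]

# The whole pipeline runs in one process space; nothing is written to disk in between.
elements = mesh({"elements": 16})
for rank, local in enumerate(partition(elements, ranks=4)):
    print(f"rank {rank}: per-step summaries {visualize(solve(local, steps=3))}")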

Journal ArticleDOI
TL;DR: An importance-driven approach to time-varying volume data visualization that conducts a block-wise analysis of the data in the joint feature-temporal space and derives an importance curve for each data block based on the formulation of conditional entropy from information theory.
Abstract: The ability to identify and present the most essential aspects of time-varying data is critically important in many areas of science and engineering. This paper introduces an importance-driven approach to time-varying volume data visualization for enhancing that ability. By conducting a block-wise analysis of the data in the joint feature-temporal space, we derive an importance curve for each data block based on the formulation of conditional entropy from information theory. Each curve characterizes the local temporal behavior of the respective block, and clustering the importance curves of all the volume blocks effectively classifies the underlying data. Based on different temporal trends exhibited by importance curves and their clustering results, we suggest several interesting and effective visualization techniques to reveal the important aspects of time-varying data.

164 citations
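
The sketch below illustrates the importance-curve construction described above under simplifying assumptions: for each data block, the conditional entropy of its values at time t given time t-1 is estimated from a joint histogram, yielding one curve per block, and blocks are then grouped by their curves. The binning, block layout, and thresholding "clustering" step are simplified stand-ins, not the paper's exact formulation.

import numpy as np

def conditional_entropy(curr, prev, bins=16):
    # H(curr | prev) = H(curr, prev) - H(prev), estimated from a joint histogram.
    joint, _, _ = np.histogram2d(curr, prev, bins=bins)
    p_joint = joint / joint.sum()
    p_prev = p_joint.sum(axis=0)
    h_joint = -np.sum(p_joint[p_joint > 0] * np.log2(p_joint[p_joint > 0]))
    h_prev = -np.sum(p_prev[p_prev > 0] * np.log2(p_prev[p_prev > 0]))
    return h_joint - h_prev

rng = np.random.default_rng(0)
timesteps, blocks, block_size = 8, 4, 512
data = rng.normal(size=(timesteps, blocks, block_size))
data[:, 0, :] += np.linspace(0, 4, timesteps)[:, None]   # give one block a strong trend

# One importance curve per block: conditional entropy between consecutive time steps.
curves = np.array([
    [conditional_entropy(data[t, b], data[t - 1, b]) for t in range(1, timesteps)]
    for b in range(blocks)
])

# Crude grouping of blocks by mean importance (a stand-in for clustering the curves).
labels = (curves.mean(axis=1) > np.median(curves.mean(axis=1))).astype(int)
for b in range(blocks):
    print(f"block {b}: group {labels[b]}, curve {np.round(curves[b], 2)}")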

Journal ArticleDOI
01 Jul 2007
TL;DR: It is conjectured that the most plausible solution for the peta- and exa-scale data problem is to reduce or transform the data in-situ as it is being generated, so the amount of data that must be transferred over the network is kept to a minimum.
Abstract: The growing power of parallel supercomputers gives scientists the ability to simulate more complex problems at higher fidelity, leading to many high-impact scientific advances. To maximize the utilization of the vast amount of data generated by these simulations, scientists also need scalable solutions for studying their data to different extents and at different abstraction levels. As we move into peta- and exa-scale computing, simply dumping as much raw simulation data as the storage capacity allows for post-processing analysis and visualization is no longer a viable approach. A common practice is to use a separate parallel computer to prepare data for subsequent analysis and visualization. A naive realization of this strategy not only limits the amount of data that can be saved, but also turns I/O into a performance bottleneck when using a large parallel system. We conjecture that the most plausible solution for the peta- and exa-scale data problem is to reduce or transform the data in-situ as it is being generated, so the amount of data that must be transferred over the network is kept to a minimum. In this paper, we discuss different approaches to in-situ processing and visualization as well as the results of our preliminary study using large-scale simulation codes on massively parallel supercomputers.

94 citations
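
A small Python sketch of the in-situ reduction idea conjectured above: rather than dumping every raw field, the simulation reduces or transforms data as it is generated, and only the much smaller result is moved over the network or saved. The block-averaging reduction and summary statistics below are hypothetical choices for illustration.

import numpy as np

def reduce_in_situ(field, factor=4):
    # Downsample a 3D field by block-averaging and attach a few summary statistics.
    nx, ny, nz = (s // factor for s in field.shape)
    coarse = field[:nx * factor, :ny * factor, :nz * factor].reshape(
        nx, factor, ny, factor, nz, factor).mean(axis=(1, 3, 5))
    stats = {"min": float(field.min()), "max": float(field.max()),
             "mean": float(field.mean())}
    return coarse, stats

field = np.random.default_rng(1).random((64, 64, 64))   # stand-in simulation field
coarse, stats = reduce_in_situ(field)

# Only `coarse` and `stats` would cross the network or reach persistent storage.
print(f"raw: {field.nbytes} bytes, reduced: {coarse.nbytes} bytes "
      f"({field.nbytes / coarse.nbytes:.0f}x smaller), stats: {stats}")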


Cited by
01 May 1993
TL;DR: Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems.
Abstract: Three parallel algorithms for classical molecular dynamics are presented. The first assigns each processor a fixed subset of atoms; the second assigns each a fixed subset of inter-atomic forces to compute; the third assigns each a fixed spatial region. The algorithms are suitable for molecular dynamics models which can be difficult to parallelize efficiently - those with short-range forces where the neighbors of each atom change rapidly. They can be implemented on any distributed-memory parallel machine which allows for message-passing of data between independently executing processors. The algorithms are tested on a standard Lennard-Jones benchmark problem for system sizes ranging from 500 to 100,000,000 atoms on several parallel supercomputers - the nCUBE 2, Intel iPSC/860 and Paragon, and Cray T3D. Comparing the results to the fastest reported vectorized Cray Y-MP and C90 algorithm shows that the current generation of parallel machines is competitive with conventional vector supercomputers even for small problems. For large problems, the spatial algorithm achieves parallel efficiencies of 90% and an 1840-node Intel Paragon performs up to 165 times faster than a single Cray C90 processor. Trade-offs between the three algorithms and guidelines for adapting them to more complex molecular dynamics simulations are also discussed.

29,323 citations
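
A toy Python illustration of the third strategy in the abstract above (spatial decomposition): each processor owns a spatial region and only needs its own atoms plus a thin "ghost" layer from neighboring regions to compute short-range forces. This is a serial, one-dimensional stand-in, not the paper's message-passing implementation.

import numpy as np

box, cutoff, n_atoms, n_procs = 10.0, 1.0, 200, 4
positions = np.random.default_rng(2).random(n_atoms) * box   # stand-in atom coordinates

slab = box / n_procs
for rank in range(n_procs):
    lo, hi = rank * slab, (rank + 1) * slab
    owned = positions[(positions >= lo) & (positions < hi)]
    # Ghost atoms: what neighboring ranks would send in the message-passing version.
    ghosts = positions[((positions >= lo - cutoff) & (positions < lo)) |
                       ((positions >= hi) & (positions < hi + cutoff))]
    local = np.concatenate([owned, ghosts])
    # Count neighbors within the cutoff for each owned atom (a stand-in for force work).
    near = sum(int(np.sum(np.abs(local - x) < cutoff) - 1) for x in owned)
    print(f"rank {rank}: {owned.size} owned, {ghosts.size} ghost, {near} near neighbors")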

Book ChapterDOI
01 Dec 2018
TL;DR: Preliminary performance data on a subset of TPC-H is presented and it is shown that the system the team is building, C-Store, is substantially faster than popular commercial products.
Abstract: This paper presents the design of a read-optimized relational DBMS that contrasts sharply with most current systems, which are write-optimized. Among the many differences in its design are: storage of data by column rather than by row, careful coding and packing of objects into storage including main memory during query processing, storing an overlapping collection of column-oriented projections, rather than the current fare of tables and indexes, a non-traditional implementation of transactions which includes high availability and snapshot isolation for read-only transactions, and the extensive use of bitmap indexes to complement B-tree structures. We present preliminary performance data on a subset of TPC-H and show that the system we are building, C-Store, is substantially faster than popular commercial products. Hence, the architecture looks very encouraging.

1,063 citations
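
A small Python sketch of the row-store versus column-store contrast discussed above: an aggregate over a couple of attributes touches only those columns in the column layout, instead of every field of every row. This is a conceptual illustration, not C-Store's storage engine.

# Row-oriented layout: one record per row; scanning `price` reads every field of every row.
rows = [
    {"order_id": 1, "customer": "a", "price": 10.0, "qty": 2},
    {"order_id": 2, "customer": "b", "price": 5.5,  "qty": 1},
    {"order_id": 3, "customer": "a", "price": 7.25, "qty": 4},
]
total_row_store = sum(r["price"] * r["qty"] for r in rows)

# Column-oriented layout: one array per attribute; the same query touches two columns.
columns = {key: [r[key] for r in rows] for key in rows[0]}
total_col_store = sum(p * q for p, q in zip(columns["price"], columns["qty"]))

assert total_row_store == total_col_store
print(total_col_store)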

Book ChapterDOI
01 Dec 2018
TL;DR: The current RDBMS code lines, while attempting to be a "one size fits all" solution, in fact, excel at nothing and should be retired in favor of a collection of "from scratch" specialized engines.
Abstract: In previous papers [SC05, SBC+07], some of us predicted the end of "one size fits all" as a commercial relational DBMS paradigm. These papers presented reasons and experimental evidence that showed that the major RDBMS vendors can be outperformed by 1-2 orders of magnitude by specialized engines in the data warehouse, stream processing, text, and scientific database markets. Assuming that specialized engines dominate these markets over time, the current relational DBMS code lines will be left with the business data processing (OLTP) market and hybrid markets where more than one kind of capability is required. In this paper we show that current RDBMSs can be beaten by nearly two orders of magnitude in the OLTP market as well. The experimental evidence comes from comparing a new OLTP prototype, H-Store, which we have built at M.I.T., to a popular RDBMS on the standard transactional benchmark, TPC-C. We conclude that the current RDBMS code lines, while attempting to be a "one size fits all" solution, in fact, excel at nothing. Hence, they are 25-year-old legacy code lines that should be retired in favor of a collection of "from scratch" specialized engines. The DBMS vendors (and the research community) should start with a clean sheet of paper and design systems for tomorrow's requirements, not continue to push code lines and architectures designed for yesterday's needs.

679 citations

Journal ArticleDOI
TL;DR: This work presents scalable algorithms for parallel adaptive mesh refinement and coarsening (AMR), partitioning, and 2:1 balancing on computational domains composed of multiple connected two-dimensional quadtrees or three-dimensional octrees, referred to as a forest of octrees.
Abstract: We present scalable algorithms for parallel adaptive mesh refinement and coarsening (AMR), partitioning, and 2:1 balancing on computational domains composed of multiple connected two-dimensional quadtrees or three-dimensional octrees, referred to as a forest of octrees. By distributing the union of octants from all octrees in parallel, we combine the high scalability proven previously for adaptive single-octree algorithms with the geometric flexibility that can be achieved by arbitrarily connected hexahedral macromeshes, in which each macroelement is the root of an adapted octree. A key concept of our approach is an encoding scheme of the interoctree connectivity that permits arbitrary relative orientations between octrees. Based on this encoding we develop interoctree transformations of octants. These form the basis for high-level parallel octree algorithms, which are designed to interact with an application code such as a numerical solver for partial differential equations. We have implemented and tested these algorithms in the p4est software library. We demonstrate the parallel scalability of p4est on its own and in combination with two geophysics codes. Using p4est we generate and adapt multioctree meshes with up to 5.13×10^11 octants on as many as 220,320 CPU cores and execute the 2:1 balance algorithm in less than 10 seconds per million octants per process.

648 citations
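
A one-dimensional Python analogue of the adaptive refinement and 2:1 balancing described above: leaves are dyadic intervals (level, index) on [0, 1); after refining around a feature, any two adjacent leaves are forced to differ by at most one level. This is a conceptual sketch, not the p4est algorithms or data structures.

def refine_around(leaves, point, max_level):
    # Refine every leaf containing `point` until it reaches `max_level`.
    out = []
    for level, idx in leaves:
        width = 1.0 / (1 << level)
        if level < max_level and idx * width <= point < (idx + 1) * width:
            out += refine_around([(level + 1, 2 * idx), (level + 1, 2 * idx + 1)],
                                 point, max_level)
        else:
            out.append((level, idx))
    return out

def balance_2to1(leaves):
    # Repeatedly split the coarser of two adjacent leaves whose levels differ by > 1.
    leaves = sorted(leaves, key=lambda leaf: leaf[1] / (1 << leaf[0]))
    changed = True
    while changed:
        changed = False
        for i in range(len(leaves) - 1):
            (l1, _), (l2, _) = leaves[i], leaves[i + 1]
            if abs(l1 - l2) > 1:
                j = i if l1 < l2 else i + 1            # index of the coarser leaf
                level, idx = leaves[j]
                leaves[j:j + 1] = [(level + 1, 2 * idx), (level + 1, 2 * idx + 1)]
                changed = True
                break
    return leaves

leaves = refine_around([(0, 0)], point=0.3, max_level=4)   # start from a single root
balanced = balance_2to1(leaves)
print("levels before balance:", sorted(level for level, _ in leaves))
print("levels after balance: ", sorted(level for level, _ in balanced))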