Author

Xiaoming Li

Bio: Xiaoming Li is an academic researcher from the University of Delaware. The author has contributed to research in the topics of stream processing and extensible programming, has an h-index of 1, and has co-authored 1 publication receiving 9 citations.

Papers
Journal ArticleDOI
01 Jan 2014
TL;DR: COStream, a programming language based on the synchronous dataflow execution model for data-driven applications, is proposed, along with a compiler framework for COStream on general-purpose multi-core architectures.
Abstract: The dataflow programming paradigm offers an important way to improve programming productivity for streaming systems. In this paper we propose COStream, a programming language based on the synchronous dataflow execution model for data-driven applications. We also propose a compiler framework for COStream on general-purpose multi-core architectures. It features an inter-thread software-pipelining scheduler to exploit parallelism among the cores. We implemented the COStream compiler framework on an x86 multi-core architecture and performed experiments to evaluate the system.
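The COStream language itself is not shown in the abstract; as a rough illustration of the synchronous dataflow model it builds on, the following Python sketch (names and structure are my own, not COStream syntax) runs a pipeline of actors with fixed token consumption and production rates, firing an actor whenever enough input tokens are queued:

```python
from collections import deque

class Actor:
    """A synchronous-dataflow actor with fixed token rates per firing."""
    def __init__(self, name, consume, produce, fn):
        self.name, self.consume, self.produce, self.fn = name, consume, produce, fn

def run_pipeline(actors, source_tokens):
    """Fire each actor whenever its input queue holds enough tokens."""
    queues = [deque(source_tokens)] + [deque() for _ in actors]
    fired = True
    while fired:
        fired = False
        for i, a in enumerate(actors):
            if len(queues[i]) >= a.consume:
                ins = [queues[i].popleft() for _ in range(a.consume)]
                for out in a.fn(ins):       # produce tokens downstream
                    queues[i + 1].append(out)
                fired = True
    return list(queues[-1])

# A two-stage pipeline: double each value, then sum adjacent pairs.
doubler = Actor("double", 1, 1, lambda xs: [2 * xs[0]])
pairsum = Actor("pairsum", 2, 1, lambda xs: [xs[0] + xs[1]])
print(run_pipeline([doubler, pairsum], [1, 2, 3, 4]))  # [6, 14]
```

Because rates are fixed, a real SDF compiler (as the paper describes) can compute this schedule statically and pipeline the stages across cores, rather than discovering firings at runtime as this sketch does.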

11 citations


Cited by
Proceedings ArticleDOI
05 Jun 2017
TL;DR: This paper makes a preliminary attempt to develop the dataflow insight into a specialized graph accelerator, and argues that this work could open a wide range of opportunities to improve the performance of computation and memory access for large-scale graph processing.
Abstract: Existing graph processing frameworks greatly improve the performance of the memory subsystem, but they are still subject to the underlying modern processor, resulting in potential inefficiencies for graph processing in the form of low instruction-level parallelism and high branch misprediction. These inefficiencies, according to our comprehensive micro-architectural study, mainly arise from a wealth of dependencies, the serial semantics of instruction streams, and complex conditional instructions in graph processing. In this paper, we propose that a fundamental shift of approach is necessary to break through the inefficiencies of the underlying processor via the dataflow paradigm. Applying the dataflow approach to graph processing is appealing for two reasons. First, since the execution and retirement of instructions depend only on the availability of input data in the dataflow model, a high degree of parallelism can be provided to relax the heavy dependencies and serial semantics. Second, dataflow makes it possible to reduce the cost of branch misprediction by simultaneously executing all branches of a conditional instruction. Consequently, we make a preliminary attempt to develop this dataflow insight into a specialized graph accelerator. We believe that our work opens a wide range of opportunities to improve the performance of computation and memory access for large-scale graph processing.
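The two properties the abstract highlights, firing on operand availability and evaluating both sides of a conditional, can be sketched in a few lines. This is an illustrative toy interpreter, not the paper's accelerator; the graph encoding and node names are assumptions of the sketch:

```python
def fire(graph, inputs):
    """Repeatedly fire any node whose operands are all available,
    in dataflow style (no program counter, no branch prediction)."""
    values = dict(inputs)
    pending = dict(graph)   # node -> (operand names, operation)
    while pending:
        ready = [n for n, (deps, _) in pending.items()
                 if all(d in values for d in deps)]
        if not ready:
            raise ValueError("graph has an unsatisfiable dependency")
        for n in ready:
            deps, op = pending.pop(n)
            values[n] = op(*[values[d] for d in deps])
    return values

# out = a*b if a > b else a+b; both branches are computed eagerly and
# the predicate merely selects a result, so no misprediction can occur.
graph = {
    "mul":  (("a", "b"), lambda a, b: a * b),
    "add":  (("a", "b"), lambda a, b: a + b),
    "pred": (("a", "b"), lambda a, b: a > b),
    "out":  (("pred", "mul", "add"), lambda p, m, s: m if p else s),
}
print(fire(graph, {"a": 3, "b": 5})["out"])  # 8
```

In one pass, `mul`, `add`, and `pred` all become ready simultaneously, which is the parallelism the abstract attributes to the dataflow model.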

8 citations

Journal ArticleDOI
01 Jan 2015
TL;DR: This paper presents Fresh Breeze, a dataflow-based execution and programming model and computer architecture, and shows how it provides a sound basis for developing future computing systems that match the DDDAS challenges.
Abstract: The DDDAS paradigm, unifying applications, mathematical modeling, and sensors, is now more relevant than ever with the advent of Large-Scale/Big-Data and Big-Computing. Large-Scale-Dynamic-Data (advertised as the next wave of Big Data) includes the integrated range of data from high-end systems and instruments together with the dynamic data arising from ubiquitous sensing and control in engineered, natural, and societal systems. In this paper we present Fresh Breeze, a dataflow-based execution and programming model and computer architecture, and show how it provides a sound basis for developing future computing systems that match the DDDAS challenges. Development of simulation models and a compiler for Fresh Breeze computer systems is discussed and initial results are reported.

8 citations

Dissertation
01 Jan 2015
TL;DR: This thesis presents an end-to-end framework that takes advantage of information gained by locality-aware multithreading, called "Group Locality", to seek a common ground where both processing and memory resources can collaborate to yield better overall utilization.
Abstract: When powerful computational cores are matched against limited resources such as bandwidth and memory, competition across computational units can result in serious performance degradation due to contention at different levels of the memory hierarchy. In such a case, it becomes very important to maximize reuse in low-latency memory space. The high-performance community has been actively working on techniques that can improve data locality and reuse, as such endeavors have a direct and positive impact on both performance and power. To better utilize low-latency memory such as caches and scratch-pad SRAMs, software techniques such as hierarchical tiling have proven very effective. However, these techniques still operate under the paradigm that memory-agnostic coarse-grain parallelism is the norm. Therefore, they function in a coarse-grain fashion where inner tiles run serially. Such behavior can result in under-utilization of processing and network resources. Even when the inner tiles are assigned to fine-grain tasks, their memory-agnostic placement will still incur heavy penalties when resources are contended. Finding a balance between parallelism and locality in such cases is an arduous task, and it is essential to seek a common ground where both processing and memory resources can collaborate to yield better overall utilization. This thesis explores the concept of locality-aware multithreading called "Group Locality". Multiple groups of threads work together in close proximity in time and space as a unit, taking advantage of each other's data movement and collaborating at a very fine-grain level. When an execution pattern is known a priori through static analysis and data is shared among different thread groups, the concept of group locality extends further to rearrange data to provide a better access pattern for other thread groups. This thesis presents an end-to-end framework that takes advantage of information gained by locality-aware multithreading to let processing and memory resources collaborate for better overall utilization.
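The hierarchical tiling the abstract builds on can be shown with a standard blocked matrix multiply. This is a generic sketch of tiling for reuse, not the thesis's group-locality framework; the tile size `T` is an assumed tuning parameter:

```python
def tiled_matmul(A, B, n, T):
    """Blocked n x n matrix multiply: each T x T tile of A and B is
    reused many times while it is resident in fast memory, which is
    the reuse that tiling techniques aim to maximize."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, T):            # iterate over output tiles
        for kk in range(0, n, T):
            for jj in range(0, n, T):
                # Work entirely within one tile triple before moving on.
                for i in range(ii, min(ii + T, n)):
                    for k in range(kk, min(kk + T, n)):
                        a = A[i][k]
                        for j in range(jj, min(jj + T, n)):
                            C[i][j] += a * B[k][j]
    return C
```

The thesis's point is that in conventional practice the inner tiles run serially per thread; group locality instead lets multiple fine-grain threads cooperate on tiles so that one group's data movement benefits the others.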

4 citations

Journal ArticleDOI
20 Jun 2016
TL;DR: In this article, a Domain-Specific Language (DSL) is proposed to specify, unambiguously, the elements related to the design phase of adaptive software, and a functional prototype based on the Sirius plugin for Eclipse is presented.
Abstract: Adaptive software has the ability to modify its own behavior at runtime in response to changes in its users and their context, in its requirements, or in the environment in which the system is deployed, and thus give users a better experience. However, the development of this kind of system is not a simple task. There are two main issues: (1) there is a lack of languages to specify, unambiguously, the elements related to the design phase. As a consequence, these systems are often developed in an ad-hoc manner, without the required formalism, increasing the complexity of carrying design models forward into the next phases of the development cycle. (2) Design decisions and the adaptation model tend to be implemented directly in the source code and not thoroughly specified at the design level. Since the adaptation models become tangled with the code, system evolution becomes more difficult. To address these issues, this paper proposes DMLAS, a Domain-Specific Language (DSL) for designing adaptive systems. As proof of concept, the paper also provides a functional prototype based on the Sirius plugin for Eclipse. This prototype is a tool to model, at several layers of abstraction, the main components of an adaptive system. The notation used in both the models and the tool was validated against the nine principles for designing cognitively effective visual notations presented by Moody.

3 citations