Jianxin Xiong

Other affiliations: University of Illinois at Urbana–Champaign

Bio: Jianxin Xiong is an academic researcher from the University of Cambridge. The author has contributed to research on compilers and multidimensional signal processing, has an h-index of 6, and has co-authored 7 publications receiving 1,269 citations. Previous affiliations of Jianxin Xiong include the University of Illinois at Urbana–Champaign.

Papers

Journal Article

SPIRAL: Code Generation for DSP Transforms

Markus Püschel¹, Jose M. F. Moura¹, Jeremy Johnson², David Padua³, Manuela Veloso¹, Bryan Singer, Jianxin Xiong⁴, Franz Franchetti¹, A. Gacic¹, Yevgen Voronenko¹, K. Chen⁵, R. W. Johnson, Nick Rizzolo³ • Institutions (5)

Carnegie Mellon University¹, Drexel University², University of Illinois at Urbana–Champaign³, University of Cambridge⁴, STMicroelectronics⁵

27 Jun 2005

TL;DR: SPIRAL generates high-performance code for a broad set of DSP transforms, including the discrete Fourier transform, other trigonometric transforms, filter transforms, and discrete wavelet transforms.

Abstract: Fast-changing, increasingly complex, and diverse computing platforms pose central problems in scientific computing: How to achieve, with reasonable effort, portable optimal performance? We present SPIRAL, which considers this problem for the performance-critical domain of linear digital signal processing (DSP) transforms. For a specified transform, SPIRAL automatically generates high-performance code that is tuned to the given platform. SPIRAL formulates the tuning as an optimization problem and exploits the domain-specific mathematical structure of transform algorithms to implement a feedback-driven optimizer. Similar to a human expert, for a specified transform, SPIRAL "intelligently" generates and explores algorithmic and implementation choices to find the best match to the computer's microarchitecture. The "intelligence" is provided by search and learning techniques that exploit the structure of the algorithm and implementation space to guide the exploration and optimization. SPIRAL generates high-performance code for a broad set of DSP transforms, including the discrete Fourier transform, other trigonometric transforms, filter transforms, and discrete wavelet transforms. Experimental results show that the code generated by SPIRAL competes with, and sometimes outperforms, the best available human-tuned transform library code.

853 citations

Proceedings Article

SPL: a language and compiler for DSP algorithms

Jianxin Xiong¹, Jeremy Johnson, Robert E. Johnson¹, David Padua¹ • Institutions (1)

University of Illinois at Urbana–Champaign¹

01 May 2001

TL;DR: The design and implementation of a compiler that translates formulas representing signal processing transforms into efficient C or Fortran programs are discussed, and SPL formulations of the fast Fourier transform (FFT) are used to evaluate the compiler.

Abstract: We discuss the design and implementation of a compiler that translates formulas representing signal processing transforms into efficient C or Fortran programs. The formulas are represented in a language that we call SPL, an acronym for Signal Processing Language. The compiler is a component of the SPIRAL system, which makes use of formula transformations and intelligent search strategies to automatically generate optimized digital signal processing (DSP) libraries. After a discussion of the translation and optimization techniques implemented in the compiler, we use SPL formulations of the fast Fourier transform (FFT) to evaluate the compiler. Our results show that SPIRAL, which can be used to implement many classes of algorithms, produces programs that perform as well as “hard-wired” systems like FFTW.
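
The compiler described above turns formulas into loop code. As a minimal sketch of one natural way to lower the two basic tensor-product constructs (written in Python for brevity; the function names are hypothetical and this is not the SPL compiler's actual output), I_m ⊗ A becomes a loop applying A to m contiguous blocks, while A ⊗ I_n becomes a loop applying A to n interleaved subvectors read and written at stride n. The explicit Kronecker-product matrices are built only to check the loops.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT; plays the role of the block A in the formulas below."""
    n = len(x)
    return [sum(x[l] * cmath.exp(-2j * cmath.pi * k * l / n) for l in range(n))
            for k in range(n)]


def apply_i_kron_a(m, a, x):
    """y = (I_m ⊗ A) x: apply A to m contiguous blocks of x."""
    n = len(x) // m
    y = []
    for i in range(m):
        y.extend(a(x[i * n:(i + 1) * n]))
    return y


def apply_a_kron_i(a, n, x):
    """y = (A ⊗ I_n) x: apply A to n interleaved subvectors read at stride n."""
    m = len(x) // n
    y = [0j] * len(x)
    for j in range(n):
        out = a(x[j::n])           # gathers x[j], x[j+n], x[j+2n], ...
        for i in range(m):
            y[i * n + j] = out[i]  # scatters back at the same stride
    return y


# Reference versions: build the matrices explicitly and multiply.
def dft_matrix(n):
    return [[cmath.exp(-2j * cmath.pi * k * l / n) for l in range(n)] for k in range(n)]


def identity(n):
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]


def kron(a, b):
    """Kronecker product: entry [r1*p + r2][c1*q + c2] = a[r1][c1] * b[r2][c2]."""
    p, q = len(b), len(b[0])
    return [[a[r // p][c // q] * b[r % p][c % q] for c in range(len(a[0]) * q)]
            for r in range(len(a) * p)]


def matvec(mat, x):
    return [sum(v * xi for v, xi in zip(row, x)) for row in mat]


if __name__ == "__main__":
    m, n = 3, 4
    x = [complex(k, -k) for k in range(m * n)]
    loop1 = apply_i_kron_a(m, dft, x)
    ref1 = matvec(kron(identity(m), dft_matrix(n)), x)
    loop2 = apply_a_kron_i(dft, n, x)
    ref2 = matvec(kron(dft_matrix(m), identity(n)), x)
    print(max(abs(u - v) for u, v in zip(loop1, ref1)))
    print(max(abs(u - v) for u, v in zip(loop2, ref2)))
```

Both printed differences should be at the level of floating-point rounding. The loop forms never build the mn × mn matrix, which is the practical payoff of expressing an algorithm as a structured factorization.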

208 citations

Journal Article

Spiral: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms

Markus Püschel¹, Jose M. F. Moura¹, Bryan Singer, Jianxin Xiong², Jeremy Johnson³, David Padua⁴, Manuela Veloso¹, Robert W. Johnson • Institutions (4)

Carnegie Mellon University¹, University of Cambridge², Drexel University³, University of Illinois at Urbana–Champaign⁴

01 Feb 2004

TL;DR: The main components of SPIRAL are described: the mathematical framework that concisely describes signal transforms and their fast algorithms; the formula generator that captures at the algorithmic level the degrees of freedom in expressing a particular signal processing transform; and a formula translator that encapsulates the compilation degrees of freedom when translating a specific algorithm into an actual code implementation.

Abstract: SPIRAL is a generator for libraries of fast software implementations of linear signal processing transforms. These libraries are adapted to the computing platform and can be re-optimized as the hardware is upgraded or replaced. This paper describes the main components of SPIRAL: the mathematical framework that concisely describes signal transforms and their fast algorithms; the formula generator that captures at the algorithmic level the degrees of freedom in expressing a particular signal processing transform; the formula translator that encapsulates the compilation degrees of freedom when translating a specific algorithm into an actual code implementation; and, finally, an intelligent search engine that finds within the large space of alternative formulas and implementations the "best" match to the given computing platform. We present empirical data that demonstrate the high performance of SPIRAL-generated code.
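
For concreteness, here is the best-known example of such a formula: the Cooley-Tukey FFT written as a factorization of the DFT matrix, in the Kronecker-product style used in the SPIRAL literature. The explicit index conventions for the stride permutation L and the twiddle diagonal T are stated here as assumptions, chosen so that the identity holds exactly as written:

$$
\mathrm{DFT}_{mn} = (\mathrm{DFT}_m \otimes I_n)\, T^{mn}_n\, (I_m \otimes \mathrm{DFT}_n)\, L^{mn}_m
$$

$$
\mathrm{DFT}_N = \left[\omega_N^{kl}\right]_{0 \le k,l < N}, \qquad \omega_N = e^{-2\pi i/N}, \qquad (L^{mn}_m x)_{in+j} = x_{jm+i}, \qquad \left(T^{mn}_n\right)_{in+j,\; in+j} = \omega_{mn}^{ij}, \qquad 0 \le i < m,\ 0 \le j < n.
$$

Read right to left, the formula gathers the input at stride m, computes m DFTs of size n, scales by twiddle factors, and then computes n DFTs of size m at stride n. Applying the rule again to the smaller DFTs produces the many alternative formulas that the search engine chooses among.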

206 citations

Book Chapter

Searching for the Best FFT Formulas with the SPL Compiler

Jeremy Johnson¹, Robert W. Johnson, David Padua², Jianxin Xiong² • Institutions (2)

Drexel University¹, University of Illinois at Urbana–Champaign²

10 Aug 2000

TL;DR: This paper applies an approach to implementing and optimizing fast signal transforms, based on a domain-specific computer language called SPL, to the implementation of the FFT.

Abstract: This paper discusses an approach to implementing and optimizing fast signal transforms based on a domain-specific computer language called SPL. SPL programs, which are essentially mathematical formulas, represent matrix factorizations that provide fast algorithms for computing many important signal transforms. A special-purpose compiler translates SPL programs into efficient FORTRAN programs. Since there are many formulas for a given transform, a fast implementation can be obtained by generating alternative formulas and searching for the one with the fastest execution time. This paper presents an application of this methodology to the implementation of the FFT.
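
A toy Python sketch of the generate-and-search idea, under stated assumptions: the plans are hypothetical breakdown trees built only from the Cooley-Tukey rule with direct-DFT leaves, and they are interpreted in Python rather than compiled to FORTRAN as in the paper. The sketch enumerates every plan for a given size, times each one on the same input, and keeps the fastest.

```python
import cmath
import random
import time


def naive_dft(x):
    """Leaf case: direct O(n^2) DFT."""
    n = len(x)
    return [sum(x[l] * cmath.exp(-2j * cmath.pi * k * l / n) for l in range(n))
            for k in range(n)]


def plans(n):
    """Enumerate alternative 'formulas' for DFT_n as (hypothetical) breakdown trees.

    A plan is ('leaf', n) or ('split', m, plan_m, plan_k) with n = m * k, meaning
    one Cooley-Tukey step DFT_n = (DFT_m ⊗ I_k) T (I_m ⊗ DFT_k) L.
    """
    yield ('leaf', n)
    for m in range(2, n):
        if n % m == 0:
            for plan_m in plans(m):
                for plan_k in plans(n // m):
                    yield ('split', m, plan_m, plan_k)


def run(plan, x):
    """Interpret a plan on input x (a list of complex numbers) and return DFT(x)."""
    if plan[0] == 'leaf':
        return naive_dft(x)
    _, m, plan_m, plan_k = plan
    n = len(x)
    k = n // m
    # L then (I_m ⊗ DFT_k): m DFTs of size k on the stride-m subsequences x[i::m].
    z = [run(plan_k, x[i::m]) for i in range(m)]
    # Twiddle diagonal T: scale entry c of block i by w_n^(i*c).
    for i in range(m):
        for c in range(k):
            z[i][c] *= cmath.exp(-2j * cmath.pi * i * c / n)
    # DFT_m ⊗ I_k: k DFTs of size m across the blocks; output index c + k*r.
    y = [0j] * n
    for c in range(k):
        col = run(plan_m, [z[i][c] for i in range(m)])
        for r in range(m):
            y[c + k * r] = col[r]
    return y


def search(n, repeats=3, seed=0):
    """Time every plan on one random input; return (fastest_plan, seconds_per_run)."""
    rng = random.Random(seed)
    x = [complex(rng.random(), rng.random()) for _ in range(n)]
    best_plan, best_time = None, float('inf')
    for plan in plans(n):
        start = time.perf_counter()
        for _ in range(repeats):
            run(plan, x)
        elapsed = (time.perf_counter() - start) / repeats
        if elapsed < best_time:
            best_plan, best_time = plan, elapsed
    return best_plan, best_time


if __name__ == "__main__":
    best, secs = search(16)
    print(best, secs)
```

Which plan wins depends on the machine and on interpreter overhead, so no particular result is implied here; the point is only the loop of generating alternative formulas, measuring them, and keeping the fastest.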

19 citations

Journal Article

Erratum: SPIRAL: A generator for platform-adapted libraries of signal processing algorithms (High Performance Computing Applications (2004) 18 (21-45))

Markus Püschel¹, Jose M. F. Moura¹, Bryan Singer, Jianxin Xiong², Jeremy Johnson³, David Padua, Manuela Veloso¹, Robert W. Johnson • Institutions (3)

Carnegie Mellon University¹, University of Cambridge², Drexel University³

01 Jun 2004 • International Journal of High Performance Computing Applications

18 citations

Cited by

Journal Article

The Design and Implementation of FFTW3

Matteo Frigo¹, Steven G. Johnson² • Institutions (2)

IBM¹, Massachusetts Institute of Technology²

24 Jan 2005

TL;DR: It is shown that such an approach can yield an implementation of the discrete Fourier transform that is competitive with hand-optimized libraries, and the software structure that makes the current FFTW3 version flexible and adaptive is described.

...read moreread less

Abstract: FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the hardware in order to maximize performance. This paper shows that such an approach can yield an implementation that is competitive with hand-optimized libraries, and describes the software structure that makes our current FFTW3 version flexible and adaptive. We further discuss a new algorithm for real-data DFTs of prime size, a new way of implementing DFTs by means of machine-specific single-instruction, multiple-data (SIMD) instructions, and how a special-purpose compiler can derive optimized implementations of the discrete cosine and sine transforms automatically from a DFT algorithm.

5,172 citations

Journal Article

When and how to develop domain-specific languages

Marjan Mernik¹, Jan Heering, Anthony M. Sloane² • Institutions (2)

University of Maribor¹, Macquarie University²

01 Dec 2005 • ACM Computing Surveys

TL;DR: In this article, the authors identify patterns in the decision, analysis, design, and implementation phases of DSL development and discuss domain analysis tools and language development systems that may help to speed up DSL development.

...read moreread less

Abstract: Domain-specific languages (DSLs) are languages tailored to a specific application domain. They offer substantial gains in expressiveness and ease of use compared with general-purpose programming languages in their domain of application. DSL development is hard, requiring both domain knowledge and language development expertise. Few people have both. Not surprisingly, the decision to develop a DSL is often postponed indefinitely, if considered at all, and most DSLs never get beyond the application library stage. Although many articles have been written on the development of particular DSLs, there is very limited literature on DSL development methodologies and many questions remain regarding when and how to develop a DSL. To aid the DSL developer, we identify patterns in the decision, analysis, design, and implementation phases of DSL development. Our patterns improve and extend earlier work on DSL design patterns. We also discuss domain analysis tools and language development systems that may help to speed up DSL development. Finally, we present a number of open problems.

1,778 citations

Proceedings Article

Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines

Jonathan Ragan-Kelley¹, Connelly Barnes², Andrew Adams¹, Sylvain Paris², Frédo Durand¹, Saman Amarasinghe¹ • Institutions (2)

Massachusetts Institute of Technology¹, Adobe Systems²

16 Jun 2013

TL;DR: The paper presents a systematic model of the tradeoff space fundamental to stencil pipelines; a schedule representation that describes concrete points in this space for each stage in an image processing pipeline; and an optimizing compiler for the Halide image processing language that synthesizes high-performance implementations from a Halide algorithm and a schedule.

Abstract: Image processing pipelines combine the challenges of stencil computations and stream programs. They are composed of large graphs of different stencil stages, as well as complex reductions, and stages with global or data-dependent access patterns. Because of their complex structure, the performance difference between a naive implementation of a pipeline and an optimized one is often an order of magnitude. Efficient implementations require optimization of both parallelism and locality, but due to the nature of stencils, there is a fundamental tension between parallelism, locality, and introducing redundant recomputation of shared values. We present a systematic model of the tradeoff space fundamental to stencil pipelines, a schedule representation which describes concrete points in this space for each stage in an image processing pipeline, and an optimizing compiler for the Halide image processing language that synthesizes high-performance implementations from a Halide algorithm and a schedule. Combining this compiler with stochastic search over the space of schedules enables terse, composable programs to achieve state-of-the-art performance on a wide range of real image processing pipelines, and across different hardware architectures, including multicores with SIMD, and heterogeneous CPU+GPU execution. From simple Halide programs written in a few hours, we demonstrate performance up to 5x faster than hand-tuned C, intrinsics, and CUDA implementations optimized by experts over weeks or months, for image processing applications beyond the reach of past automatic compilers.
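
The separation between an algorithm and a schedule can be illustrated with a two-stage blur. This is a plain-Python sketch with hypothetical function names, not Halide's API: the algorithm (a horizontal box sum feeding a vertical box sum) is fixed, and two schedules at opposite ends of the tradeoff space, roughly what Halide calls computing a stage at root versus inlining it, produce identical output with different amounts of intermediate storage and recomputation.

```python
def blur_x(inp, x, y):
    """Algorithm, stage 1: horizontal 3-tap box sum (unnormalized to keep results exact)."""
    return inp[y][x - 1] + inp[y][x] + inp[y][x + 1]


def blur_y_from(bx, x, y):
    """Algorithm, stage 2: vertical 3-tap box sum over stage-1 values supplied by bx."""
    return bx(x, y - 1) + bx(x, y) + bx(x, y + 1)


def schedule_root(inp):
    """Breadth-first: compute all of blur_x into a buffer, then let blur_y read it.
    No recomputation, but the whole intermediate image is stored."""
    h, w = len(inp), len(inp[0])
    calls = 0
    buf = {}
    for y in range(h):
        for x in range(1, w - 1):
            buf[(x, y)] = blur_x(inp, x, y)
            calls += 1
    out = [[blur_y_from(lambda a, b: buf[(a, b)], x, y) for x in range(1, w - 1)]
           for y in range(1, h - 1)]
    return out, calls


def schedule_inline(inp):
    """Fully fused: recompute blur_x on demand for every output pixel.
    No intermediate buffer, but most blur_x values are computed three times."""
    h, w = len(inp), len(inp[0])
    calls = 0

    def bx(a, b):
        nonlocal calls
        calls += 1
        return blur_x(inp, a, b)

    out = [[blur_y_from(bx, x, y) for x in range(1, w - 1)] for y in range(1, h - 1)]
    return out, calls


if __name__ == "__main__":
    img = [[(3 * x + 7 * y) % 10 for x in range(8)] for y in range(8)]
    root_out, n_root = schedule_root(img)
    inline_out, n_inline = schedule_inline(img)
    print(root_out == inline_out, n_root, n_inline)  # prints: True 48 108
```

Intermediate schedules, such as computing blur_x per tile or per scanline of the output, trade between these two extremes; the paper's contribution is a representation and compiler that make such choices explicit and searchable.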

1,074 citations

Journal Article

SPIRAL: Code Generation for DSP Transforms

Carnegie Mellon University¹, Drexel University², University of Illinois at Urbana–Champaign³, University of Cambridge⁴, STMicroelectronics⁵

27 Jun 2005

853 citations

Proceedings Article

Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU

Victor W. Lee¹, Changkyu Kim¹, Jatin Chhugani¹, Michael E. Deisher¹, Daehyun Kim¹, Anthony D. Nguyen¹, Nadathur Satish¹, Mikhail Smelyanskiy¹, Srinivas Chennupaty¹, Per Hammarlund¹, Ronak Singhal¹, Pradeep Dubey¹ • Institutions (1)

Intel¹

19 Jun 2010

TL;DR: This paper discusses optimization techniques for both CPU and GPU, analyzes what architecture features contributed to performance differences between the two architectures, and recommends a set of architectural features which provide significant improvement in architectural efficiency for throughput kernels.

Abstract: Recent advances in computing have led to an explosion in the amount of data being generated. Processing the ever-growing data in a timely manner has made throughput computing an important aspect for emerging applications. Our analysis of a set of important throughput computing kernels shows that there is an ample amount of parallelism in these kernels which makes them suitable for today's multi-core CPUs and GPUs. In the past few years there have been many studies claiming GPUs deliver substantial speedups (between 10X and 1000X) over multi-core CPUs on these kernels. To understand where such large performance differences come from, we perform a rigorous performance analysis and find that after applying optimizations appropriate for both CPUs and GPUs the performance gap between an Nvidia GTX280 processor and the Intel Core i7-960 processor narrows to only 2.5x on average. In this paper, we discuss optimization techniques for both CPU and GPU, analyze what architecture features contributed to performance differences between the two architectures, and recommend a set of architectural features which provide significant improvement in architectural efficiency for throughput kernels.

810 citations
