# Non-uniform DFT implementation for channel simulations in GPU

...Equation (8) may be reduced to a 1-D non-uniform DFT to compute the vector of diagonal elements [11]:...

...A non-uniform DFT based approach was used in [11] to implement a frequency-domain channel simulator on a GPU....
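As a rough illustration of what a 1-D non-uniform DFT computes in this setting, the sketch below evaluates a frequency response at arbitrary frequencies from a set of channel taps. Equation (8) is not reproduced in this excerpt, so the tap model (complex amplitudes `amplitudes` at delays `delays`), the function name `nudft`, and the sign convention are all assumptions made for illustration, not the formulation of [11].

```python
import cmath
import math


def nudft(amplitudes, delays, freqs):
    """Direct non-uniform DFT: H(f) = sum_l a_l * exp(-j*2*pi*f*tau_l).

    amplitudes, delays: per-tap complex gains and delays (assumed model).
    freqs: frequencies at which the response is evaluated; they need not
    lie on a uniform grid, which is what makes the transform non-uniform.
    """
    return [
        sum(a * cmath.exp(-2j * math.pi * f * tau)
            for a, tau in zip(amplitudes, delays))
        for f in freqs
    ]


if __name__ == "__main__":
    # Two taps with non-integer delays, evaluated at three frequencies.
    response = nudft([1.0, 0.5], [0.0, 0.3], [0.0, 1.0, 2.5])
    for h in response:
        print(h)
```

Each output element depends only on its own frequency, so the outer loop is the part that would map naturally onto one GPU thread per frequency.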

...Compared with other similar existing frameworks [3]–[5], the proposed solution takes into consideration the consequences of wireless channel non-stationarity and practical aspects of multiuser scenarios....

...The baseband OFDM system [3] is shown in Figure 1, where x is the transmitted symbol, g(t) is the channel impulse response, ñ(t) is additive white Gaussian noise and y is the received symbol....

...A second kernel uses a parallel scan method [8] to get all the rows of the twiddle factor matrices (e^{-j1θ}, e^{-j2θ}, ....
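The idea of building a row of twiddle factors with a scan can be sketched as an inclusive prefix product over copies of e^{-jθ}. The code below is a sequential Python stand-in using the Hillis-Steele scan pattern; the function name `twiddle_row_scan` is hypothetical, and the actual kernel and the scan variant of [8] may differ.

```python
import cmath


def twiddle_row_scan(theta, n):
    """Return [e^{-j*1*theta}, ..., e^{-j*n*theta}] via an inclusive scan.

    Starts from n copies of w = e^{-j*theta} and applies a Hillis-Steele
    prefix product: after the pass with a given stride, element i holds
    the product of the last min(i + 1, 2 * stride) inputs ending at i.
    """
    w = cmath.exp(-1j * theta)
    row = [w] * n
    stride = 1
    while stride < n:
        # On a GPU, every element of this pass would be updated in parallel.
        row = [
            row[i] * row[i - stride] if i >= stride else row[i]
            for i in range(n)
        ]
        stride *= 2
    return row


if __name__ == "__main__":
    for k, value in enumerate(twiddle_row_scan(0.1, 8), start=1):
        print(k, value)
```

The scan needs about log2(n) passes of complex multiplications instead of n separate evaluations of the complex exponential, which is the usual motivation for this construction.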

...A note on the CPU implementations that we use for comparison: ATLAS [10] is a library that can be tuned for optimum performance on CPUs, and makes use of some of the parallel features available on modern CPUs....

...However, the optimizations done by ATLAS lead to variations that are very sensitive to the number of taps and similar parameters....

...The CPU code with the ATLAS library gives 2.2x and 3.9x speedup for the scan and time-shift methods, respectively, compared to its single threaded ...

Footnote 2: We have restricted ourselves to a single-threaded CPU implementation with the -O3 compiler optimisation switch: the system has obvious data parallelism across multiple users/channels that can be accounted for separately....

...For consistency, we have compared our GPU implementation against the regular CPU implementations with normal compiler optimizations, with the understanding that a further speedup may be possible using ATLAS, but this does not fundamentally change the observations....

