# Non-uniform DFT implementation for channel simulations in GPU

##### Citations

6 citations

### Cites background or methods from "Non-uniform DFT implementation for ..."

...Equation (8) may be reduced to 1-D non-uniform DFT to compute the vector of diagonal elements [11]:...

[...]

...A non-uniform DFT based approach was used in [11] to implement a frequency-domain channel simulator on a GPU....
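As a rough illustration of what a 1-D non-uniform DFT computes in this setting, the sketch below directly evaluates a frequency response from channel taps that sit at arbitrary (non-uniform) delays. The function name `nudft`, the parameter names, and the formula H(f_k) = sum_l a_l exp(-j 2 pi f_k tau_l) are assumptions chosen for illustration; the snippets above do not give the cited paper's exact formulation.

```python
import cmath


def nudft(taps, delays, freqs):
    """Direct O(L*K) non-uniform DFT (illustrative, hypothetical helper).

    taps   : complex tap gains a_l
    delays : tap delays tau_l, not required to lie on a uniform grid
    freqs  : frequencies f_k at which the response is evaluated
    Returns the list H(f_k) = sum_l a_l * exp(-j*2*pi*f_k*tau_l).
    """
    return [
        sum(a * cmath.exp(-2j * cmath.pi * f * tau) for a, tau in zip(taps, delays))
        for f in freqs
    ]


if __name__ == "__main__":
    # Two equal taps half a period apart cancel at f = 1 and add at f = 0.
    print(nudft([1.0, 1.0], [0.0, 0.5], [0.0, 1.0]))
```

Each output frequency is independent of the others, which is the kind of data parallelism a GPU implementation could exploit by assigning one thread per frequency.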

[...]

1 citation

### Cites background from "Non-uniform DFT implementation for ..."

...Compared to other similar existing frameworks [3]–[5], the proposed solution takes into consideration the consequences of wireless channel non-stationarity and practical aspects of multiuser scenarios....

[...]

##### References

1,647 citations

### "Non-uniform DFT implementation for ..." refers to methods in this paper

...The baseband OFDM system [3] is shown in Figure 1, where x is the transmitted symbol, g(t) is the channel impulse response, ñ(t) is additive white Gaussian noise and y is the received symbol....

[...]

1,594 citations

1,511 citations

### "Non-uniform DFT implementation for ..." refers to methods in this paper

...A second kernel uses parallel scan method [8] to get all the rows of the twiddle factor matrices ($e^{-j1\theta}$, $e^{-j2\theta}$, ....
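One way to read this snippet: a row of twiddle factors is the sequence of successive powers of a single phasor w = exp(-j*theta), and such powers can be produced by an inclusive scan (prefix product) over copies of w. The sketch below simulates a Hillis-Steele style scan sequentially in Python. The choice of Hillis-Steele and the names `inclusive_scan` and `twiddle_row` are assumptions for illustration; the snippet only says that a parallel scan method [8] is used.

```python
import cmath


def inclusive_scan(values, op):
    """Hillis-Steele inclusive scan, simulated sequentially.

    Each pass of the while loop corresponds to one data-parallel sweep;
    about log2(n) sweeps are needed for n elements.
    """
    out = list(values)
    step = 1
    while step < len(out):
        prev = out  # read-only buffer for this sweep (double buffering)
        out = [
            prev[i] if i < step else op(prev[i - step], prev[i])
            for i in range(len(prev))
        ]
        step *= 2
    return out


def twiddle_row(theta, n):
    """Return [w**1, w**2, ..., w**n] with w = exp(-j*theta) via a prefix product."""
    w = cmath.exp(-1j * theta)
    return inclusive_scan([w] * n, lambda a, b: a * b)


if __name__ == "__main__":
    for k, value in enumerate(twiddle_row(0.3, 4), start=1):
        print(k, value)
```

The scan replaces n separate complex exponential evaluations with one evaluation plus multiplications, at the cost of some accumulated rounding error for long rows.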

[...]

1,302 citations

994 citations

### "Non-uniform DFT implementation for ..." refers to methods in this paper

...A note on the CPU implementations that we use for comparison: ATLAS [10] is a library that can be tuned for optimum performance on CPUs, and makes use of some of the parallel features available on modern CPUs....

[...]

...However, the optimizations done by ATLAS lead to variations that are very sensitive to the number of taps and similar parameters....

[...]

...The CPU code with ATLAS library gives 2.2x and 3.9x speedup for scan and time-shift method respectively compared to its single threaded... (Footnote 2: We have restricted to single-threaded CPU implementation with -O3 compiler optimisation switch: the system has obvious data parallelism across multiple user/channels that can be accounted for separately.)...

[...]

...For consistency, we have compared our GPU implementation against the regular CPU implementations with normal compiler optimizations, with the understanding that a further speedup may be possible using ATLAS, but this does not fundamentally change the observations....

[...]