Showing papers on "Fast Fourier transform" published in 2019


Book
23 Sep 2019
TL;DR: PRELIMINARIES: An Elementary Introduction to the Discrete Fourier Transform; Some Mathematical and Computational Preliminaries. SEQUENTIAL FFT ALGORITHMS: The Divide-and-Conquer Paradigm and Two Basic FFT Algorithms; Deciphering the Scrambled Output from In-Place FFT Computation; Bit-Reversed Input to the Radix-2 DIF FFT.
Abstract: PRELIMINARIES: An Elementary Introduction to the Discrete Fourier Transform; Some Mathematical and Computational Preliminaries. SEQUENTIAL FFT ALGORITHMS: The Divide-and-Conquer Paradigm and Two Basic FFT Algorithms; Deciphering the Scrambled Output from In-Place FFT Computation; Bit-Reversed Input to the Radix-2 DIF FFT; Performing Bit-Reversal by Repeated Permutation of Intermediate Results; An In-Place Radix-2 DIT FFT for Input in Natural Order; An In-Place Radix-2 DIT FFT for Input in Bit-Reversed Order; An Ordered Radix-2 DIT FFT; Ordering Algorithms and Computer Implementation of Radix-2 FFTs; The Radix-4 and the Class of Radix-$2^s$ FFTs; The Mixed-Radix and Split-Radix FFTs; FFTs for Arbitrary N; FFTs for Real Input; FFTs for Composite N; Selected FFT Applications. PARALLEL FFT ALGORITHMS: Parallelizing the FFTs: Preliminaries on Data Mapping; Computing and Communications on Distributed-Memory Multiprocessors; Parallel FFTs without Inter-Processor Permutations; Parallel FFTs with Inter-Processor Permutations; A Potpourri of Variations on Parallel FFTs; Further Improvement and a Generalization of Parallel FFTs; Parallelizing Two-Dimensional FFTs; Computing and Distributing Twiddle Factors in the Parallel FFTs. APPENDICES: Fundamental Concepts of Efficient Scientific Computation; Solving Recurrence Equations by Substitution; Bibliography.

148 citations
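
The contents above mention an in-place radix-2 DIT FFT and bit-reversed ordering. As a rough illustration only (this is not code from the book; the function names are made up for this sketch and the input length is assumed to be a power of two), the two ideas can be combined like this:

```python
import cmath

def bit_reverse_permute(a):
    """Swap each element with the one at its bit-reversed index (in place)."""
    n = len(a)
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:          # propagate the "carry" from the top bit down
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:               # swap each pair only once
            a[i], a[j] = a[j], a[i]

def fft_dit_inplace(a):
    """In-place radix-2 decimation-in-time FFT (assumes len(a) is a power of two)."""
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bit_reverse_permute(a)      # scrambled input yields naturally ordered output
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)   # twiddle increment for this stage
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t                # butterfly: top output
                a[start + k + half] = u - t         # butterfly: bottom output
                w *= w_step
        size *= 2
    return a
```

Calling `fft_dit_inplace(data)` overwrites `data` with its transform, using the forward sign convention $e^{-2\pi i nk/N}$ chosen for this sketch.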


Journal ArticleDOI
TL;DR: FINUFFT is presented, an efficient parallel library for type 1 (nonuniform to uniform), type 2 (uniform to nonuniform), or type 3 (nonuniform to nonuniform) transforms, in dimensions 1, 2, or 3, which uses minimal RAM, requires no precomputation or plan steps, and has a simple interface to several languages.
Abstract: The nonuniform fast Fourier transform (NUFFT) generalizes the FFT to off-grid data. Its many applications include image reconstruction, data analysis, and the numerical solution of differential equ...

88 citations
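
To make the "type 1 (nonuniform to uniform)" transform concrete, here is a slow reference evaluation of the sum that a NUFFT approximates quickly. This is a sketch, not the FINUFFT interface: the function name `nufft_type1_direct`, the sign convention, and the mode ordering are all assumptions chosen for illustration.

```python
import cmath

def nufft_type1_direct(x, c, n_modes, isign=+1):
    """Direct O(M*N) evaluation of a 1-D type 1 sum (reference only, not fast).

    f[k] = sum_j c[j] * exp(isign * 1j * k * x[j])
    for integer modes k = -(n_modes // 2), ..., ascending (assumed ordering).
    x holds nonuniform points (radians), c the matching strengths.
    """
    k_min = -(n_modes // 2)
    return [
        sum(cj * cmath.exp(isign * 1j * k * xj) for xj, cj in zip(x, c))
        for k in range(k_min, k_min + n_modes)
    ]
```

A fast library replaces the double loop with spreading onto a fine grid followed by an ordinary FFT. A direct sum like this is mainly useful for checking accuracy on small problems.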


Journal ArticleDOI
TL;DR: This paper proposes an efficient monotonic algorithm based on fast Fourier transform, to minimize the multi-dimensional objective function of weighted auto- and cross-correlation functions for multiple-input multiple-output (MIMO) radar systems.
Abstract: In this paper, we aim at designing sets of binary sequences with good aperiodic/periodic auto- and cross-correlation functions for multiple-input multiple-output (MIMO) radar systems. We show that such a set of sequences can be obtained by minimizing a weighted sum of peak sidelobe level (PSL) and integrated sidelobe level (ISL) with the binary element constraint at the design stage. The sets of designed sequences are neighboring the lower bound on ISL and have a better PSL than the best-known structured sets of binary sequences. To formulate the problem, we introduce a Pareto-objective of weighted auto- and cross-correlation functions by establishing a multi-objective NP-hard constrained optimization problem. Then, by using the block coordinate descent framework, we propose an efficient monotonic algorithm based on fast Fourier transform, to minimize the multi-dimensional objective function. Numerical results illustrate the superior performance of the proposed algorithm in comparison with the state-of-the-art methods.

78 citations
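
The PSL and ISL metrics in this abstract are built from autocorrelation sidelobes, which can be computed with FFTs. The sketch below is not the paper's optimization algorithm and covers a single sequence only. It assumes a one-sided ISL definition (sum over positive lags), and the helper names are invented for the example.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def ifft(X):
    """Inverse FFT via conjugation of the forward transform."""
    n = len(X)
    return [v.conjugate() / n for v in fft([u.conjugate() for u in X])]

def aperiodic_autocorr(seq):
    """Aperiodic autocorrelation r[0..N-1] of a real sequence via zero-padded FFTs."""
    n = len(seq)
    m = 1
    while m < 2 * n:            # pad so circular correlation equals linear correlation
        m *= 2
    spectrum = fft(list(seq) + [0] * (m - n))
    r = ifft([abs(v) ** 2 for v in spectrum])
    return [r[k].real for k in range(n)]

def psl_isl(seq):
    """Peak sidelobe level and (one-sided, assumed definition) integrated sidelobe level."""
    r = aperiodic_autocorr(seq)
    sidelobes = r[1:]
    return max(abs(v) for v in sidelobes), sum(v * v for v in sidelobes)
```

For example, `psl_isl` applied to the length-13 Barker code gives a PSL of 1 up to rounding error.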


Journal ArticleDOI
TL;DR: It is found that processing with the proposed technique closely matches the reference-data and outperforms the inverse cosine windowing and zeroing techniques in 2-D cross correlation, amplitude, and phase average errors and phase root-mean-square error.
Abstract: A frequency-modulated continuous-wave (FMCW) radar interference mitigation technique using the interpolation of beat frequencies in the short-time Fourier transform (STFT) domain, phase matching, and reconfigurable linear prediction coefficients estimation for Coherent Processing Interval processing is proposed. The technique is noniterative and does not rely on algorithm convergence. It allows the usage of the fast Fourier transform (FFT) as the radar’s beat-frequency estimation tool, for reasons such as real-time implementation, noise linearity after the FFT, and compatibility with legacy receiver architectures. Verification is done in range and in range-Doppler using radar experimental data in two ways: first by removing interferences from interference-contaminated data and second by using interference-free data as the reference data, and processing it—as if it had interferences—using the proposed technique, inverse cosine windowing and zeroing for comparison. We found that processing with the proposed technique closely matches the reference-data and outperforms the inverse cosine windowing and zeroing techniques in 2-D cross correlation, amplitude, and phase average errors and phase root-mean-square error. It is expected that the proposed technique will be operationally deployed on the TU Delft simultaneous-polarimetric PARSAX radar.

76 citations


Journal ArticleDOI
TL;DR: An algorithm based on FFT methods has been introduced to solve the phase-field model of brittle fracture and is capable of predicting different crack modes and complex crack configuration, such as crack interaction, branching and coalescence.

64 citations


Journal ArticleDOI
TL;DR: To achieve the optimal BSR output, the IABSR method based on salp swarm algorithm (SSA) is presented and optimizes not only the BSR system parameters but also the calculation step size.
Abstract: A machinery vibration signal is a typical multi-component signal, and fault features are often submerged by interference components. To accurately extract fault features, a weak feature enhancement method based on empirical wavelet transform (EWT) and an improved adaptive bistable stochastic resonance (IABSR) is proposed. This method makes full use of the signal decomposition performance of EWT and the signal enhancement of the IABSR to enhance fault features in the low-frequency band of the FFT spectrum. Firstly, EWT is used as the preprocessing step for bistable stochastic resonance (BSR) to decompose the machinery vibration signal into a set of sub-components. Then, the sensitive component that contains the main fault information is further input into the BSR system to enhance fault features with the assistance of residual noise. Finally, the fault features are identified from the fast Fourier transform (FFT) spectrum of the BSR output. To achieve the optimal BSR output, the IABSR method based on the salp swarm algorithm (SSA) is presented. Compared with the traditional adaptive BSR (ABSR), the IABSR optimizes not only the BSR system parameters but also the calculation step size. Two case studies on machinery fault diagnosis demonstrate the effectiveness and superiority of the proposed method. In addition, the proposed method is easy to implement and is robust to noise to some extent.

64 citations


Journal ArticleDOI
TL;DR: Experimental and comparative results show that the proposed method can be more effectively and accurately applied to the fault diagnosis of planetary gear transmission systems compared with typical fault diagnosis methods based on analytic flexible wavelet transform, Morlet wavelet transform, and infograms.

60 citations


Journal ArticleDOI
TL;DR: Vibration signals combined with a deep learning predictive model could be applied to predict the surface roughness in the milling process; predicting surface roughness from vibration signals using FFT-LSTM or 1-D CNN is recommended for developing an intelligent system.
Abstract: The use of surface roughness (Ra) to indicate product quality in the milling process in an intelligent monitoring system applied in-process has been developing. From the considerations of convenient installation and cost-effectiveness, accelerator vibration signals combined with deep learning predictive models for predicting surface roughness is a potential tool. In this paper, three models, namely, Fast Fourier Transform-Deep Neural Networks (FFT-DNN), Fast Fourier Transform Long Short Term Memory Network (FFT-LSTM), and one-dimensional convolutional neural network (1-D CNN), are used to explore the training and prediction performances. Feature extraction plays an important role in the training and predicting results. FFT and the one-dimensional convolution filter, known as 1-D CNN, are employed to extract vibration signals’ raw data. The results show the following: (1) the LSTM model presents the temporal modeling ability to achieve a good performance at higher Ra value and (2) 1-D CNN, which is better at extracting features, exhibits highly accurate prediction performance at lower Ra ranges. Based on the results, vibration signals combined with a deep learning predictive model could be applied to predict the surface roughness in the milling process. Based on this experimental study, the use of prediction of the surface roughness via vibration signals using FFT-LSTM or 1-D CNN is recommended to develop an intelligent system.

60 citations


Journal ArticleDOI
TL;DR: A new approach is proposed for the indexing of electron back-scattered diffraction (EBSD) patterns that employs a spherical master EBSD pattern and computes its cross-correlation with a back-projected experimental pattern using the spherical harmonic transform (SHT).

59 citations


Journal ArticleDOI
TL;DR: A novel boundary point detection algorithm and a spatial FFT-based filtering approach are presented, which are not based on pre-defined threshold values and together allow for direct generation of low-noise tessellated surfaces from point cloud data.

55 citations


Journal ArticleDOI
TL;DR: The design of constant-modulus probing waveforms is considered to improve the spectral compatibility of radar systems with the congested radio frequency environments and a weighted least-squares fitting approach is used to formulate the spectral shaping problem.
Abstract: We consider the design of constant-modulus probing waveforms to improve the spectral compatibility of radar systems with the congested radio frequency environments. We seek to synthesize radar probing waveforms with desired spectral shapes. We use a weighted least-squares fitting approach to formulate the spectral shaping problem. We introduce two algorithms to tackle the optimization problem we encounter. Both algorithms are devised based on cyclic approaches and have guaranteed convergence of the objective values. Moreover, the proposed algorithms can be implemented via fast Fourier transforms and, hence, are computationally efficient. Furthermore, we extend the proposed algorithms to deal with peak-to-average-power ratio and similarity constraints, which are desirable in some radar applications. Finally, we provide several numerical examples to demonstrate the effectiveness of the proposed algorithms.

Journal ArticleDOI
TL;DR: Compared with the conventional subspace detection (SD) algorithm, the ADT-SFT algorithm only needs to search a small number of suspected target Doppler frequencies, and therefore, the computational complexity can be greatly reduced.
Abstract: In this paper, an adaptive dual-threshold sparse Fourier transform (ADT-SFT) algorithm is proposed, which enables the application of the SFT and robust SFT (RSFT) to the moving target detection in clutter background. Two levels of detection are introduced in this algorithm. First, a scalar constant false alarm rate (CFAR) detection is employed in each frequency channel formed by subsampled fast Fourier transform (FFT) to suppress the influence of strong clutter points on the sparsity and frequencies estimation. Second, the subspace detector constructed by suspected target Doppler frequencies is adopted to complete the target detection. The simulation analysis and results of the measured sea clutter data show that the ADT-SFT algorithm is more suitable for the clutter background and can obtain better detection performance than SFT and RSFT. In addition, compared with the conventional subspace detection (SD) algorithm, which needs to search all the Doppler frequencies one-by-one to establish the detector, the ADT-SFT algorithm only needs to search a small number of suspected target Doppler frequencies, and therefore, the computational complexity can be greatly reduced.

Journal ArticleDOI
TL;DR: In this article, the authors show that there exist fast constructions for computing approximate projections onto the leading Slepian basis elements of the discrete Prolate Spheroidal Sequence (DPSS).

Journal ArticleDOI
TL;DR: In this article, a multiphysics computational model for electroconvective flow between two infinitely long parallel electrodes is investigated via a two-relaxation-time Lattice Boltzmann Method for fluid and charge transport coupled to a Fast Fourier Transform Poisson solver for the electric potential.

Journal ArticleDOI
TL;DR: In this paper, a novel signal optimization based generalized demodulation transform (SOGDT) is proposed for rolling bearing nonstationary fault characteristic extraction, which mainly involves five steps: (a) the resonance frequency band excited by bearing fault is obtained using the spectral kurtosis (SK) based band-pass filtering algorithm; (b) the instantaneous fault characteristic frequencies (IFCFs) are extracted via the peak search algorithm from the envelope time-frequency spectrum (TFS) of the filtered signal, and based on the optimal criteria, an optimal signal and an optimal

Journal ArticleDOI
TL;DR: This paper presents different approximate designs for computing the FFT, where the tradeoff between accuracy and performance is achieved by adjusting the word length in each computational stage, using two algorithms for word length modification under a specific error margin.
Abstract: This paper presents different approximate designs for computing the FFT. The tradeoff between accuracy and performance is achieved by adjusting the word length in each computational stage. Two algorithms for word length modification under a specific error margin are proposed. The first algorithm targets an approximate FFT for an area-limited design compared to the conventional fixed design; the second algorithm targets performance so it achieves a higher operating frequency. Both of the proposed algorithms show that an efficient balance between hardware utilization and performance is possible at stage-level. The proposed approximate FFT designs are implemented on FPGA; experimental results show that hardware utilization using the first approximate algorithm is reduced by at least nearly 40%. The second algorithm increases performance of the designs by over 20%. Fine granularity design is also investigated, where the FPGA resources for a 256-point FFT computation can be further reduced by nearly 10% compared to a coarse design. Finally, the proposed approximate designs are applied to a feature extraction module in an isolated word recognition system; the numbers of LUTs and FFs for the Mel frequency cepstrum coefficients (MFCC) extraction module are decreased by up to 47.2% and 39.0%, respectively, with a power reduction of up to 27.0% at a loss in accuracy of less than 2%.

Journal ArticleDOI
01 Jan 2019
TL;DR: The experimental results show that the proposed FFT combined with InfoGain method can generate better performance than the DWT method and outperforms six other reported methods and achieves an 11.9% improvement.
Abstract: This paper proposes a new algorithm which combines the information in frequency domain with the Information Gain (InfoGain) technique for the detection of epileptic seizures from electroencephalogram (EEG) data. The proposed method consists of four main steps. Firstly, in order to investigate which method is most suitable to decompose the EEG signals into frequency bands, we implement separately a fast Fourier transform (FFT) or discrete wavelet transform (DWT). Secondly, each band is partitioned into k windows and a set of statistical features are extracted from each window. Thirdly, the InfoGain is used to rank the extracted features and the most important ones are selected. Lastly, these features are forwarded to a least square support vector machine (LS-SVM) classifier to classify the EEG. This scheme is implemented and tested on a benchmark EEG database and also compared with other existing methods, based on some performance evaluation measures. The experimental results show that the proposed FFT combined with InfoGain method can generate better performance than the DWT method. This method achieves 100% accuracy for five different pairs: healthy people with eyes open (z) versus epileptic patients with activity seizures (s); healthy people with eyes closed (o) versus s; epileptic patients with free seizures (n) versus s; patients with free seizures epileptic (f) versus s; and z versus o. The accuracies obtained for two other pairs, (o vs. n) and (z vs. f), are 95.62 and 88.32%, respectively. These two pairs have more similarities with each other, leading to a lower level of accuracy. The proposed approach outperforms six other reported methods and achieves an 11.9% improvement. Finally, it can be concluded that the proposed FFT combined with InfoGain method has the capacity to detect epileptic seizures in EEG most effectively.

Journal ArticleDOI
TL;DR: A brief overview of the key developments in FFT algorithms is presented, along with some popular applications in speech and image processing, signal analysis, and communication systems.
Abstract: The fast Fourier transform (FFT) algorithm was developed by Cooley and Tukey in 1965. It could reduce the computational complexity of the discrete Fourier transform significantly from $O(N^2)$ to $O(N \log_2 N)$. The invention of FFT is considered as a landmark development in the field of digital signal processing (DSP), since it could expedite the DSP algorithms significantly such that real-time digital signal processing could be possible. During the past 50 years, many researchers have contributed to the advancements in the FFT algorithm to make it faster and more efficient in order to match with the requirements of various applications. In this article, we present a brief overview of the key developments in FFT algorithms along with some popular applications in speech and image processing, signal analysis, and communication systems.
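
The drop from quadratic to $N \log_2 N$ cost follows from the standard radix-2 decimation-in-time split, sketched here for $N$ a power of two. The notation $W_N$, $E_k$, $O_k$, and the constant $c$ are introduced for this sketch and are not taken from the article.

```latex
% Assumed notation: W_N = e^{-2\pi i/N}, N a power of two.
X_k = \sum_{n=0}^{N-1} x_n W_N^{nk}
    = \underbrace{\sum_{m=0}^{N/2-1} x_{2m}\, W_{N/2}^{mk}}_{E_k}
      + W_N^{k} \underbrace{\sum_{m=0}^{N/2-1} x_{2m+1}\, W_{N/2}^{mk}}_{O_k},
\qquad
X_{k+N/2} = E_k - W_N^{k} O_k, \quad 0 \le k < N/2.

% Cost: two half-size transforms plus O(N) butterfly work per level.
T(N) = 2\,T(N/2) + cN, \quad T(1) = O(1)
\;\Longrightarrow\;
T(N) = cN \log_2 N + O(N) = O(N \log_2 N).
```

The second identity uses $W_N^{k+N/2} = -W_N^{k}$ and the fact that $E_k$ and $O_k$ are periodic in $k$ with period $N/2$.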

Journal ArticleDOI
TL;DR: A closed-form expression of the bit error rate (BER) for FFT-NOMA as well as wavelet-based NOMA (WNOMA) systems is presented, and the theoretical and simulation BER results show that WNOMA outperforms FFT-NOMA in additive white Gaussian noise.
Abstract: The non-orthogonal multiple access (NOMA) technique is a strong candidate for 5G cellular networks that enable greater multiuser capacity and user fairness through multiplexing in the power domain. The user data are pulse-shaped using the orthogonal frequency-division multiplexing (OFDM) technique based on the fast Fourier transform (FFT) for conventional NOMA. We propose a discrete wavelet transform-based pulse shaping technique for NOMA. We present a closed-form expression of the bit error rate (BER) for FFT-NOMA as well as wavelet-based NOMA (WNOMA) systems. The theoretical and simulation BER results show that WNOMA outperforms FFT-NOMA in additive white Gaussian noise.
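
For readers unfamiliar with the FFT-based OFDM baseline mentioned here, the following sketch shows one OFDM block passing through an ideal, noise-free channel. An inverse FFT maps subcarrier symbols to time samples, a cyclic prefix is prepended, and a forward FFT recovers the symbols. It does not model NOMA power-domain multiplexing or the wavelet alternative, and the function names and the cyclic-prefix handling are illustrative assumptions.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def ifft(X):
    """Inverse FFT via conjugation of the forward transform."""
    n = len(X)
    return [v.conjugate() / n for v in fft([u.conjugate() for u in X])]

def ofdm_modulate(symbols, cp_len):
    """Turn one block of subcarrier symbols into time samples with a cyclic prefix."""
    time_samples = ifft(symbols)
    if cp_len == 0:
        return time_samples
    return time_samples[-cp_len:] + time_samples   # copy the tail to the front

def ofdm_demodulate(samples, cp_len):
    """Drop the cyclic prefix and return the subcarrier symbols."""
    return fft(samples[cp_len:])
```

With eight QPSK symbols and `cp_len=2`, the transmitted block has ten samples, and demodulation returns the original symbols up to rounding error.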

Journal ArticleDOI
TL;DR: This paper combines the notion of converging functions with the CAN and PeCAN frameworks, to considerably enhance their performances for binary sequence synthesis, and shows that the proposed algorithms can outperform existing approaches for aperiodic and periodic binary sequence synthesis.
Abstract: Sequences with low autocorrelation sidelobes are needed in a diverse set of active sensing applications, including radar, sonar, communications, and biomedicine. The recently proposed and widely adopted methods of the Cyclic Algorithm-New (CAN) and Periodic CAN (PeCAN) are known to be computationally efficient, particularly as they employ fast Fourier transform (FFT) operations to design unimodular (i.e., unit-modulus) sequences with good autocorrelation properties. However, these cyclic algorithms cannot be directly used to design binary sequences with good autocorrelation properties due to the extremely multimodal nature of the associated optimization objectives. In this paper, we combine the notion of converging functions with the CAN and PeCAN frameworks, to considerably enhance their performances for binary sequence synthesis. Moreover, the convergence of these algorithms is established and their convergence properties are thoroughly analyzed. Numerical examples are provided to show that the proposed algorithms can outperform existing approaches for aperiodic and periodic binary sequence synthesis, especially for long sequence designs.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a normalized frequency-domain block FxLMS (NFB-FXLMS) algorithm for active vehicle interior noise control, which can be directly used in an active vehicle internal noise control system of the sample vehicle.

Journal ArticleDOI
TL;DR: To obtain a better cancellation of the PLI, a designing approach, generating adaptive notch filter (ANF) of sharp resolution, is proposed, which outperforms conventional notch filters and better preserves the QRS-complex features in the filtered signal.
Abstract: Noise cancellation in electrocardiogram (ECG) signals is important for distinguishing the essential signal features masked by noise. The power line interference (PLI) is the main source of noise in most bio-electric signals. Digital notch filters can be used to suppress the PLI in ECG signals. However, the problems of transient interferences and the ringing effect occur, especially when the digitization of PLI does not meet the condition of full period sampling. In this paper, to obtain a better cancellation of the PLI, a designing approach, generating adaptive notch filter (ANF) of sharp resolution, is proposed. The proposed method is concise in algorithm and achieves a more comprehensive reduction of the PLI. It requires only one fast Fourier transform on the input signal. The spectrum correction method, based on the information from the FFT spectrum of the corrupted signal, is utilized to estimate the harmonic parameters of the PLI. The information of a few main lobe spectral bins in the FFT spectrum is merged such that a compensation signal can be synthesized. By subtracting the compensation signal from the original measurement, the PLI within the investigated signal can be substantially reduced. A distinguished advantage of the proposed ANF lies in the fact that no parameters are required to be specified, making the algorithm easier to implement. The proposed ANF outperforms conventional notch filters because it not only alleviates the undesirable effects but also better preserves the QRS-complex features in the filtered signal.
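
The core idea of synthesizing a compensation signal from the FFT spectrum and subtracting it can be sketched in a much simplified form. The sketch assumes the interference is the strongest non-DC component and falls exactly on an FFT bin (full-period sampling). The paper's spectrum correction step, which handles the off-bin case by merging several main-lobe bins, is not reproduced, and the function name is hypothetical.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def cancel_dominant_tone(x):
    """Estimate the strongest on-bin sinusoid from one FFT and subtract it.

    Returns (cleaned_signal, bin_index). Simplifying assumption: the tone
    sits exactly on an FFT bin, so a single bin gives amplitude and phase.
    """
    n = len(x)
    spectrum = fft(x)
    k = max(range(1, n // 2), key=lambda i: abs(spectrum[i]))  # skip DC
    amp = 2 * abs(spectrum[k]) / n          # real sinusoid splits energy over +/- k
    phase = cmath.phase(spectrum[k])
    tone = [amp * math.cos(2 * math.pi * k * i / n + phase) for i in range(n)]
    return [xi - ti for xi, ti in zip(x, tone)], k
```

When the tone is not on a bin, a single-bin estimate like this leaks. That is the situation the abstract's multi-bin spectrum correction is meant to address.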

Journal ArticleDOI
TL;DR: A novel feature extraction approach for chatter detection is introduced that uses image analysis of dominant frequency bands from short-time Fourier transform (STFT) spectrograms; the results indicate that the time-frequency image features from dominant frequency bands are efficient for chatter detection and perform better than time-domain and wavelet-based features in terms of separability.
Abstract: Chatter is a cause of low surface quality and productivity in milling and crucial features need to be extracted for accurate chatter detection and suppression. This paper introduces a novel feature extraction approach for chatter detection by using image analysis of dominant frequency bands from the short-time Fourier transform (STFT) spectrograms. In order to remove the environmental noises and highlight chatter related characteristics, dominant frequency bands with high energy are identified by applying the squared energy operator to the synthesized fast Fourier transform (FFT) spectrum. The time-frequency spectrogram of the vibration signal is divided into a set of grayscale sub-images according to the dominant frequency bands. Statistical image features are extracted from those sub-images to describe the machining condition and assessed in terms of their separability capabilities. The proposed feature extraction method is verified by using dry milling tests of titanium alloy Ti6Al4V and compared with two existing feature extraction techniques. The results indicate the efficiency of the time-frequency image features from dominant frequency bands for chatter detection and their better performance than the time domain features and wavelet-based features in terms of their separability capabilities.

Journal ArticleDOI
TL;DR: A low-complexity joint extrapolation-multiple signal classification (MUSIC)-based 2-D parameter estimator that combines extrapolated FFT and MUSIC to reduce the computational load for vital detection for FMCW radar is proposed.
Abstract: In this paper, a low-complexity joint extrapolation-multiple signal classification (MUSIC)-based 2-D parameter estimator is proposed for vital frequency-modulated continuous-wave (FMCW) radar. Recently, an FMCW radar, which can detect the distance and vital Doppler information, has been considered for vital non-contact radar. In the conventional FMCW radar system, fast Fourier transform (FFT)-based algorithms with low complexity are used to extract multiple parameters. However, the resolution and accuracy of an FFT-based parameter estimator are considerably low. Thus, 2-D high-resolution algorithms, such as the 2-D estimation of signal parameters via rotational invariance techniques and 2-D MUSIC, have been suggested as an alternative method. However, a large computation power is required compared with the FFT-based methods. Therefore, this paper proposes a 2-D parameter estimator that combines extrapolated FFT and MUSIC to reduce the computational load for vital detection. The proposed method uses an extrapolated FFT to overcome the disadvantages of the low-resolution FFT for the distance information, and then, the 1-D MUSIC algorithm is applied to the Doppler domain direction only for the extracted magnitude and phase information of the target’s extrapolated FFT results. Hence, the proposed algorithm combines the advantages of FFT and MUSIC. The performance of the proposed estimation is compared with that of other algorithms using Monte Carlo simulation results. The root-mean-square error of the proposed method is compared with that of 2-D MUSIC with various parameters. To verify the performance of the proposed combination method, the FMCW radar was used, and its performance was verified in an indoor environment.

Journal ArticleDOI
Yan Wang, Qunzhan Li, Fulin Zhou, Yang Zhou, Xiuqing Mu
TL;DR: A new method for automatic monitoring of noisy power quality is presented, based on the Hilbert transform (HT) and the proposed slip-singular value decomposition (SVD)-based noise-suppression algorithm; compared with traditional disturbance detection methods, it offers a low false detection rate, good noise tolerance, short computational time, fewer parameters, practicability, and compatibility.
Abstract: This paper presents a new method for automatic monitoring of noisy power quality, which is based on the Hilbert transform (HT) and the proposed slip-singular value decomposition (SVD)-based noise-suppression algorithm. The proposed method first employs the fast Fourier transform (FFT)-based low-pass filter and HT to obtain the instantaneous fundamental amplitude and the FFT sequence of the signal. Second, the slip-SVD-based noise-suppression algorithm and threshold filtering are used to extract cleaned singular value characteristic waveform of the high-frequency signal. Through judging the instantaneous fundamental amplitude, cleaned singular value characteristic waveform, and the FFT sequence, the presence of disturbances including single and combined disturbances can be easily detected by the proposed method. To demonstrate the effectiveness of the proposed method, extensive tests are conducted on the diverse simulation disturbances and the actual data obtained from the practical power systems of China. The test results show that the proposed method has the advantages such as low false detection rate, good noise tolerance capability, short computational time, fewer parameters, practicability, and compatibility in comparison with the traditional disturbance detection methods. Besides, the proposed method can provide some important features such as amplitude, duration, and frequency for classification. Such advantages make the proposed method to be a good choice for real-time applications.
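
The first step described above, an FFT-based low-pass filter followed by a Hilbert transform to obtain the instantaneous fundamental amplitude, can be sketched as follows. This is not the authors' implementation. The bin-index cutoff, the even power-of-two length, and the function names are assumptions of the sketch, and the slip-SVD stage is omitted.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def ifft(X):
    """Inverse FFT via conjugation of the forward transform."""
    n = len(X)
    return [v.conjugate() / n for v in fft([u.conjugate() for u in X])]

def lowpass_fft(x, cutoff_bin):
    """Zero every FFT bin whose frequency index exceeds cutoff_bin, then invert."""
    n = len(x)
    spec = fft(x)
    for k in range(n):
        if min(k, n - k) > cutoff_bin:      # covers positive and negative frequencies
            spec[k] = 0
    return [v.real for v in ifft(spec)]

def instantaneous_amplitude(x):
    """Envelope |x + j*H{x}| from the FFT-based analytic signal (even length)."""
    n = len(x)
    spec = fft(x)
    for k in range(1, n // 2):              # double the positive frequencies
        spec[k] *= 2
    for k in range(n // 2 + 1, n):          # remove the negative frequencies
        spec[k] = 0
    return [abs(v) for v in ifft(spec)]
```

Applying `instantaneous_amplitude(lowpass_fft(x, cutoff_bin))` to a fundamental with a superimposed higher-frequency component returns an approximately constant envelope equal to the fundamental's amplitude. Sags or swells would show up as deviations from that level.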

Journal ArticleDOI
TL;DR: To ensure directional transmission, it is proposed to design the transmit signal by minimizing the weighted mean-squared error (MSE) between the formed beampattern and a given one through an alternating minimization (AM) method.
Abstract: Deploying low-resolution (e.g. one-bit) digital-to-analog converters (DACs) is of great importance in the multiple-input multiple-output (MIMO) system equipped with a large-scale antenna array since such a hardware architecture brings low-cost and circuit power saving for each antenna. In this paper, the problem of transmit signal design in a large-scale MIMO system with 1-bit DACs is investigated. To ensure directional transmission, we propose to design the transmit signal by minimizing the weighted mean-squared error (MSE) between the formed beampattern and a given one. The resulting design problem, which involves a nonconvex fourth-order objective and a set of nonconvex discrete constraints, is NP-hard, and therefore, an alternating minimization (AM) method is devised. In order to obtain a high-quality 1-bit solution, we propose a continuous and differentiable function to approximate the 1-bit signal, such that the problem with discrete 1-bit constraint is recast to an unconstrained optimization problem with a penalty term, which can be effectively solved via the limited-memory Broyden, Fletcher, Goldfarb, and Shanno (L-BFGS) approach. Moreover, it is found that a closed-form solution can be obtained when equal weights are applied. In addition, low-complexity schemes are developed based on the fast Fourier transform (FFT). The numerical simulations are conducted to demonstrate the effectiveness and superiority of the proposed method.

Journal ArticleDOI
TL;DR: In this article, the authors present a new method for performing global redistributions of multidimensional arrays essential to parallel fast Fourier (or similar) transforms, taking advantage of subarray datatypes and generalized all-to-all scatter/gather from the MPI-2 standard to communicate discontiguous memory buffers, effectively eliminating the need for local data realignments.

Posted Content
TL;DR: This work introduces a parameterization of divide-and-conquer methods that can automatically learn an efficient algorithm for many important transforms, and can be incorporated as a lightweight replacement of generic matrices in machine learning pipelines to learn efficient and compressible transformations.
Abstract: Fast linear transforms are ubiquitous in machine learning, including the discrete Fourier transform, discrete cosine transform, and other structured transformations such as convolutions. All of these transforms can be represented by dense matrix-vector multiplication, yet each has a specialized and highly efficient (subquadratic) algorithm. We ask to what extent hand-crafting these algorithms and implementations is necessary, what structural priors they encode, and how much knowledge is required to automatically learn a fast algorithm for a provided structured transform. Motivated by a characterization of fast matrix-vector multiplication as products of sparse matrices, we introduce a parameterization of divide-and-conquer methods that is capable of representing a large class of transforms. This generic formulation can automatically learn an efficient algorithm for many important transforms; for example, it recovers the $O(N \log N)$ Cooley-Tukey FFT algorithm to machine precision, for dimensions $N$ up to $1024$. Furthermore, our method can be incorporated as a lightweight replacement of generic matrices in machine learning pipelines to learn efficient and compressible transformations. On a standard task of compressing a single hidden-layer network, our method exceeds the classification accuracy of unconstrained matrices on CIFAR-10 by 3.9 points -- the first time a structured approach has done so -- with 4X faster inference speed and 40X fewer parameters.

Journal ArticleDOI
TL;DR: A signal-processing-theoretic model is developed for the power of the approximation noise, defined as the integral of the error spectral density over the bandwidth, and a mathematical optimization approach based on Lagrange multipliers for optimizing design parameters is presented.
Abstract: In this paper, we present a framework for analytically estimating the output quality of common digital signal processing (DSP) blocks that utilize approximate adders. The framework is based on considering the error of approximate adders as an additive noise (approximation noise) that disturbs the output of the DSP block in question. A signal-processing-theoretic modeling approach is developed for describing the power of the approximation noise, which is the integral of the error spectral density over the bandwidth. The output qualities of DSP blocks that utilize approximate adders, such as the finite impulse response filter, discrete cosine transform, and fast Fourier transform, are thus estimated. The accuracy of the proposed framework is evaluated by comparing mathematical model predictions to simulation results by using the signal-to-noise ratio (SNR) metric. The inaccuracy of the SNRs predicted by the framework was, on average, less than 2.5 dB compared with that obtained from simulations. Therefore, a mathematical optimization approach based on Lagrange multipliers for optimizing design parameters is also presented. The optimization is realized by choosing a proper configuration of the target block, such as determining the data width of the inexact computation part for each approximate adder in the design.
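The simulation side of this comparison, treating the adder error as additive noise and scoring it with SNR, can be sketched as follows. This is a minimal illustration and not the paper's framework: the adder below is a simplified lower-part-OR style design chosen as an assumption (the abstract does not say which approximate adders are used), and the filter taps, word length, and names such as `approx_add` and `snr_db` are hypothetical.

```python
import math
import random


def approx_add(a, b, k):
    """Add non-negative ints: exact on the upper bits, bitwise OR on the k lowest.

    Assumed adder model for illustration; k is the width of the inexact part.
    """
    mask = (1 << k) - 1
    low = (a & mask) | (b & mask)          # inexact part: no carries generated
    high = ((a >> k) + (b >> k)) << k      # exact part
    return high | low


def fir(samples, taps, add):
    """Direct-form FIR with integer taps; `add` is the accumulator's adder."""
    out = []
    for n in range(len(taps) - 1, len(samples)):
        acc = 0
        for i, h in enumerate(taps):
            acc = add(acc, h * samples[n - i])
        out.append(acc)
    return out


def snr_db(reference, test):
    """SNR in dB, with the approximation error treated as additive noise."""
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


rng = random.Random(0)
samples = [rng.randrange(0, 1 << 12) for _ in range(2000)]  # 12-bit test input
taps = [1, 3, 5, 3, 1]                                       # hypothetical filter

exact = fir(samples, taps, lambda a, b: a + b)
for k in (0, 4, 8):
    approx = fir(samples, taps, lambda a, b, k=k: approx_add(a, b, k))
    print(f"inexact width k={k}: SNR = {snr_db(exact, approx):.1f} dB")
```

Sweeping `k` per adder is the kind of design parameter the abstract's optimization step selects; here it is varied by hand only to show how the measured SNR responds.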

Journal ArticleDOI
TL;DR: A fast implementation of the SAR-based image reconstruction method for 1-D and 2-D multistatic arrays is presented; it is Fourier based across the uniform direction in $k$-space and SAR based along the nonuniform direction.
Abstract: Multistatic millimeter-wave imaging structures are superior to their monostatic counterparts for imaging natural objects with sudden profile variations. Multistatic image reconstruction is conventionally performed via synthetic aperture radar (SAR)-based methods which, in spite of their high accuracy, are computationally burdensome. On the other hand, Fourier-based image reconstruction in multistatic systems also faces a few challenges, including multidimensional interpolation, a plane-wave approximation of spherical waves, and $k$-space partitioning. This paper presents a fast implementation of the SAR-based image reconstruction method for 1-D and 2-D multistatic arrays. The proposed implementation is Fourier based across the uniform direction in $k$-space and SAR based along the nonuniform direction. Both methods are fully parallelizable and are arranged into a vector format to maximize memory usage and minimize the computational time. Extensive validation and benchmarking have been performed with both simulation and experimental data, which show a $256\times$ improvement in the reconstruction time for the worst-case scenario compared to that of SAR-based methods with the same image quality. Furthermore, the reconstructed image performance is around $10\times$ better than the most recent Fourier-based reconstruction method, in terms of the root-mean-square error metric, and with $28\times$ less computational time.