Showing papers on "Fast Fourier transform" published in 2018


Posted Content
TL;DR: In this paper, the authors introduce the building blocks for constructing spherical CNNs and demonstrate the computational efficiency, numerical accuracy, and effectiveness of spherical convolutional networks applied to 3D model recognition and atomization energy regression.
Abstract: Convolutional Neural Networks (CNNs) have become the method of choice for learning problems involving 2D planar images. However, a number of problems of recent interest have created a demand for models that can analyze spherical images. Examples include omnidirectional vision for drones, robots, and autonomous cars, molecular regression problems, and global weather and climate modelling. A naive application of convolutional networks to a planar projection of the spherical signal is destined to fail, because the space-varying distortions introduced by such a projection will make translational weight sharing ineffective. In this paper we introduce the building blocks for constructing spherical CNNs. We propose a definition for the spherical cross-correlation that is both expressive and rotation-equivariant. The spherical correlation satisfies a generalized Fourier theorem, which allows us to compute it efficiently using a generalized (non-commutative) Fast Fourier Transform (FFT) algorithm. We demonstrate the computational efficiency, numerical accuracy, and effectiveness of spherical CNNs applied to 3D model recognition and atomization energy regression.

322 citations


Journal ArticleDOI
TL;DR: In this article, a broadband channel estimation algorithm for mmWave multiple input multiple output (MIMO) systems with few-bit analog-to-digital converters (ADCs) is proposed.
Abstract: We develop a broadband channel estimation algorithm for millimeter wave (mmWave) multiple input multiple output (MIMO) systems with few-bit analog-to-digital converters (ADCs). Our methodology exploits the joint sparsity of the mmWave MIMO channel in the angle and delay domains. We formulate the estimation problem as a noisy quantized compressed-sensing problem and solve it using efficient approximate message passing (AMP) algorithms. In particular, we model the angle-delay coefficients using a Bernoulli–Gaussian-mixture distribution with unknown parameters and use the expectation-maximization forms of the generalized AMP and vector AMP algorithms to simultaneously learn the distributional parameters and compute approximately minimum mean-squared error (MSE) estimates of the channel coefficients. We design a training sequence that allows fast, fast Fourier transform based implementation of these algorithms while minimizing peak-to-average power ratio at the transmitter, making our methods scale efficiently to large numbers of antenna elements and delays. We present the results of a detailed simulation study that compares our algorithms to several benchmarks. Our study investigates the effect of SNR, training length, training type, ADC resolution, and runtime on channel estimation MSE, mutual information, and achievable rate. It shows that, in a mmWave MIMO system, the methods we propose to exploit joint angle-delay sparsity allow 1-bit ADCs to perform comparably to infinite-bit ADCs at low SNR, and 4-bit ADCs to perform comparably to infinite-bit ADCs at medium SNR.

319 citations


Journal ArticleDOI
TL;DR: In this article, a flexible piezoelectric acoustic sensor (f-PAS) with a highly sensitive multi-resonant frequency band was fabricated by mimicking the operating mechanism of the basilar membrane in the human cochlea.

113 citations


Journal ArticleDOI
TL;DR: Three variations of convolutions are evaluated, including direct convolution, fast Fourier transform-based convolution (FFT-Conv), and FFT overlap and add convolution for popular CNN networks in embedded hardware to explore the tradeoff between software and hardware implementation, domain-specific logic and instructions, as well as various parallelism across different architectures.
Abstract: Fueled by ImageNet Large Scale Visual Recognition Challenge and Common Objects in Context competitions, the convolutional neural network (CNN) has become important in computer vision and natural language processing. However, state-of-the-art CNNs are computationally memory-intensive, thus energy-efficient implementation on the embedded platform is challenging. Recently, VGGNet and ResNet showed that deep neural networks with more convolution layers and a few fully connected layers can achieve lower error rates, thus reducing the complexity of convolution layers is of utmost importance. In this paper, we evaluate three variations of convolutions, including direct convolution (Direct-Conv), fast Fourier transform (FFT)-based convolution (FFT-Conv), and FFT overlap and add convolution (FFT-OVA-Conv) in terms of computation complexity and memory storage requirements for popular CNN networks in embedded hardware. We implemented these three techniques for ResNet-20 with the CIFAR-10 data set on a low-power domain-specific many-core architecture called power-efficient nanoclusters (PENCs), NVIDIA Jetson TX1 graphics processing unit (GPU), ARM Cortex A53 CPU, and SPARse Convolutional NETwork (SPARCNet) accelerator on Zynq 7020 FPGA to explore the tradeoff between software and hardware implementation, domain-specific logic and instructions, as well as various parallelism across different architectures. Results are evaluated and compared with respect to throughput per layer, energy consumption, and execution time for the three methods. SPARCNet deployed on Zynq FPGA achieved 42-ms runtime with 135-mJ energy consumption with a 10.8-MB/s throughput per layer using FFT-Conv for ResNet-20. Using built-in FFT instruction in PENC, the FFT-OVA-Conv performs $2.9\times $ and $1.65\times $ faster and achieves $6.8\times $ and $2.5\times $ higher throughput per watt than Direct-Conv and FFT-Conv. 
In ARM A53 CPU, FFT-OVA-Conv achieves $3.36\times $ and $1.38\times $ improvement in execution time and $2.72\times $ and $1.32\times $ higher throughput than Direct-Conv and FFT-Conv. In TX1 GPU, FFT-Conv is $1.9\times $ faster, $2.2\times $ more energy-efficient, and achieves $5.6\times $ higher throughput per layer than Direct-Conv. PENC is $10\,916\times $ and $1.8\times $ faster and $5053\times $ and $4.3\times $ more energy-efficient and achieves $7.5\times $ and $1.2\times $ higher throughput per layer than ARM A53 CPU and TX1 GPU, respectively.

84 citations
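
The three convolution variants compared in the entry above can be illustrated with a minimal one-dimensional sketch. This is not the paper's implementation (which targets 2-D CNN layers on embedded hardware); the function names `conv_direct`, `conv_fft`, and `conv_overlap_add`, the block size, and the test data are all assumptions made for illustration.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def ifft(x):
    """Inverse FFT: unnormalized inverse pass followed by a single 1/n scaling."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]


def conv_direct(x, h):
    """Direct linear convolution, O(len(x) * len(h)) multiplications."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y


def conv_fft(x, h):
    """Linear convolution via zero-padded FFTs and a pointwise product."""
    n_out = len(x) + len(h) - 1
    size = 1
    while size < n_out:
        size *= 2
    xf = fft([complex(v) for v in x] + [0j] * (size - len(x)))
    hf = fft([complex(v) for v in h] + [0j] * (size - len(h)))
    y = ifft([a * b for a, b in zip(xf, hf)])
    return [v.real for v in y[:n_out]]


def conv_overlap_add(x, h, block=4):
    """Overlap-add: convolve short input blocks with small FFTs and sum the tails."""
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        part = conv_fft(x[start:start + block], h)
        for k, v in enumerate(part):
            y[start + k] += v
    return y


# Illustrative data (assumed, not from the paper).
x = [1.0, -2.0, 3.0, 0.5, 4.0, -1.0, 2.0, 0.0, 1.5, -3.0]
h = [0.25, 0.5, 0.25]
y_direct = conv_direct(x, h)
y_fft = conv_fft(x, h)
y_ola = conv_overlap_add(x, h)
```

All three functions should agree to round-off. The overlap-add variant only ever transforms short blocks, which is the property that makes it attractive when memory is limited.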


Journal ArticleDOI
Tianwai Bo, Hoon Kim
TL;DR: A new DSP algorithm for KK receiver operable at 2 samples per symbol is proposed to avoid the use of nonlinear operations such as logarithm and exponential functions and demonstrates the transmission of 112-Gb/s SSB orthogonal frequency-division-multiplexed signal over an 80-km fiber link.
Abstract: The Kramers-Kronig (KK) receiver is capable of retrieving the phase information of optical single-sideband (SSB) signal from the optical intensity when the optical signal satisfies the minimum phase condition. Thus, it is possible to direct-detect the optical SSB signal without suffering from the signal-signal beat interference and linear transmission impairments. However, due to the spectral broadening induced by nonlinear operations in the conventional KK algorithm, it is necessary to employ the digital upsampling at the beginning of the digital signal processing (DSP). The increased number of samples at the DSP would hinder the real-time implementation of this attractive receiver. Hence, we propose a new DSP algorithm for KK receiver operable at 2 samples per symbol. We adopt a couple of mathematical approximations to avoid the use of nonlinear operations such as logarithm and exponential functions. By using the proposed algorithm, we demonstrate the transmission of 112-Gb/s SSB orthogonal frequency-division-multiplexed signal over an 80-km fiber link. The results show that the proposed algorithm operating at 2 samples per symbol exhibits similar performance to the conventional KK one operating at 6 samples per symbol. We also present the error analysis of the proposed algorithm for KK receiver in comparison with the conventional one.

82 citations


Journal ArticleDOI
TL;DR: This study considers a mmWave MIMO-orthogonal frequency division multiplexing (OFDM) receiver with a generalized hybrid architecture in which a small number of radio frequency (RF) chains and low-resolution ADCs are employed simultaneously and proposes a computationally efficient data detection algorithm that provides a minimum mean-square error estimate on data symbols and is extended to a mixed-ADC architecture.
Abstract: Hybrid analog–digital precoding architectures and low-resolution analog-to-digital converter (ADC) receivers are two solutions to reduce hardware cost and power consumption for millimeter wave (mmWave) multiple-input multiple-output (MIMO) communication systems with large antenna arrays. In this study, we consider a mmWave MIMO-orthogonal frequency division multiplexing (OFDM) receiver with a generalized hybrid architecture in which a small number of radio frequency (RF) chains and low-resolution ADCs are employed simultaneously. Owing to the strong nonlinearity introduced by low-resolution ADCs, the task of data detection is challenging, particularly achieving a Bayesian optimal data detection. This study aims to fill this gap. By using a generalized expectation consistent signal recovery technique, we propose a computationally efficient data detection algorithm that provides a minimum mean-square error estimate on data symbols and is extended to a mixed-ADC architecture. Considering the particular structure of the MIMO-OFDM channel matrix, we provide a low-complexity realization in which only fast Fourier transform (FFT) operations and matrix-vector multiplications are required. Furthermore, we present an analytical framework to study the theoretical performance of the detector in the large-system limit, which can precisely evaluate the performance expressions, such as mean-square error and symbol error rate. Based on this optimal detector, the potential of adding a few low-resolution RF chains and high-resolution ADCs for a mixed-ADC architecture is investigated. Simulation results confirm the accuracy of our theoretical analysis and can be used for rapid system design. The results reveal that adding a few low-resolution RF chains to original unquantized systems can obtain significant gains.

78 citations


Journal ArticleDOI
TL;DR: A new method based on autocorrelation to measure the RR and HR using IR-UWB radar with high accuracy and variational mode decomposition algorithm is adopted to successfully separate the respiration and heartbeat signals.
Abstract: Respiration rate (RR) and heartbeat rate (HR) are important physiological parameters for a person. Impulse radio ultra-wideband (IR-UWB) is a promising technology for non-contact sensing and monitoring. This brief presents a new method based on autocorrelation to measure the RR and HR using IR-UWB radar. The correlation coefficient waveform contains the vital sign signals, overcoming the effect of noise and clutter. By applying the fast Fourier transform, the respiration frequency can be acquired easily. A clever method also based on autocorrelation is proposed to locate the subject. The received signal matrix is divided into a set of bins in the direction of fast time. By removing one block from the matrix each time and re-applying the autocorrelation, the removed block that results in the smallest correlation corresponds to the location of the subject. Moreover, the variational mode decomposition algorithm is adopted to successfully separate the respiration and heartbeat signals. Experiments are carried out using a PulsOn410 UWB radar. The results show that the proposed low-complexity algorithm has high accuracy.

76 citations


Journal ArticleDOI
TL;DR: The results demonstrate that COBA outperforms DAS in terms of resolution and contrast and that the suggested beamformers offer a sizable element reduction while generating images with an equivalent or improved quality in comparison with DAS.
Abstract: The standard technique used by commercial medical ultrasound systems to form B-mode images is delay and sum (DAS) beamforming. However, DAS often results in limited image resolution and contrast that are governed by the center frequency and the aperture size of the ultrasound transducer. A large number of elements lead to improved resolution but at the same time increase the data size and the system cost due to the receive electronics required for each element. Therefore, reducing the number of receiving channels while producing high-quality images is of great importance. In this paper, we introduce a nonlinear beamformer called COnvolutional Beamforming Algorithm (COBA), which achieves significant improvement of lateral resolution and contrast. In addition, it can be implemented efficiently using the fast Fourier transform. Based on the COBA concept, we next present two sparse beamformers with closed-form expressions for the sensor locations, which result in the same beam pattern as DAS and COBA while using far fewer array elements. Optimization of the number of elements shows that they require a minimal number of elements that are on the order of the square root of the number used by DAS. The performance of the proposed methods is tested and validated using simulated data, phantom scans, and in vivo cardiac data. The results demonstrate that COBA outperforms DAS in terms of resolution and contrast and that the suggested beamformers offer a sizable element reduction while generating images with an equivalent or improved quality in comparison with DAS.

74 citations


Journal ArticleDOI
TL;DR: In this paper, a frequency synchronization scheme for multiuser orthogonal frequency division multiplexing uplink with a large-scale uniform linear array at base station (BS) by exploiting the angle information of users is proposed.
Abstract: In this paper, we propose a frequency synchronization scheme for multiuser orthogonal frequency division multiplexing uplink with a large-scale uniform linear array at base station (BS) by exploiting the angle information of users. Considering that the incident signal at BS from each user can be restricted within a certain angular spread, the proposed scheme could perform carrier frequency offset (CFO) estimation for each user individually through a joint spatial-frequency alignment procedure and can be completed efficiently with the aid of fast Fourier transform. A multi-branch receive beamforming is further designed to yield an equivalent single user transmission model for which the conventional single-user channel estimation and data detection can be carried out. To make the study complete, theoretical performance analysis of the CFO estimation is also conducted. We further develop a user grouping scheme to deal with the unexpected scenarios that some users may not be separated well from the spatial domain. Finally, various numerical results are provided to verify the proposed studies.

71 citations


Journal ArticleDOI
TL;DR: An efficient solver for massively-parallel direct numerical simulations of incompressible turbulent flows using a second-order, finite-volume pressure-correction scheme, where the pressure Poisson equation is solved with the method of eigenfunction expansions.
Abstract: We present an efficient solver for massively-parallel direct numerical simulations of incompressible turbulent flows. The method uses a second-order, finite-volume pressure-correction scheme, where the pressure Poisson equation is solved with the method of eigenfunction expansions. This approach allows for very efficient FFT-based solvers in problems with different combinations of homogeneous pressure boundary conditions. Our algorithm explores all combinations of pressure boundary conditions valid for such a solver, in a single, general framework. The method is implemented in a 2D pencil-like domain decomposition, which enables efficient massively-parallel simulations. The implementation was validated against different canonical flows, and its computational performance was examined. Excellent strong scaling performance up to $10^4$ cores is demonstrated for a domain with $10^9$ spatial degrees of freedom, corresponding to a very small wall-clock time/time step. The resulting tool, CaNS, has been made freely available and open-source.

71 citations
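
To make the eigenfunction-expansion idea in the entry above concrete, here is a minimal sketch for the simplest case: a one-dimensional periodic grid with the second-order finite-difference Laplacian. It is not taken from CaNS; the function name `solve_poisson_periodic`, the grid size, and the test field are assumptions. The complex exponentials are eigenfunctions of the discrete Laplacian with eigenvalues $(2\cos(2\pi k/n) - 2)/\Delta x^2$, so the solve reduces to one FFT, a pointwise division, and one inverse FFT.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def ifft(x):
    """Inverse FFT: unnormalized inverse pass followed by a single 1/n scaling."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]


def solve_poisson_periodic(f, dx):
    """Solve (p[i+1] - 2 p[i] + p[i-1]) / dx^2 = f[i] on a periodic grid.

    Assumes f has zero mean; the k = 0 mode of p is set to zero, which
    fixes the arbitrary additive constant of the periodic problem.
    """
    n = len(f)
    f_hat = fft([complex(v) for v in f])
    p_hat = [0j] * n
    for k in range(1, n):
        # Eigenvalue of the second-difference operator for Fourier mode k.
        lam = (2.0 * math.cos(2.0 * math.pi * k / n) - 2.0) / dx ** 2
        p_hat[k] = f_hat[k] / lam
    return [v.real for v in ifft(p_hat)]


# Manufactured test (assumed data): pick a zero-mean p, build f from it, recover p.
n = 16
dx = 2.0 * math.pi / n
p_exact = [math.sin(i * dx) + 0.5 * math.cos(3 * i * dx) for i in range(n)]
f = [(p_exact[(i + 1) % n] - 2.0 * p_exact[i] + p_exact[i - 1]) / dx ** 2
     for i in range(n)]
p = solve_poisson_periodic(f, dx)
err = max(abs(a - b) for a, b in zip(p, p_exact))
```

Other homogeneous boundary conditions would replace the complex exponentials with sine or cosine eigenfunctions and use the corresponding eigenvalues; that generalization is not shown here.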


Posted Content
TL;DR: FINUFFT is an efficient parallel library for the nonuniform fast Fourier transform (NUFFT) in dimensions 1, 2, or 3; it uses minimal RAM, requires no precomputation or plan steps, and has a simple interface to several languages.
Abstract: The nonuniform fast Fourier transform (NUFFT) generalizes the FFT to off-grid data. Its many applications include image reconstruction, data analysis, and the numerical solution of differential equations. We present FINUFFT, an efficient parallel library for type 1 (nonuniform to uniform), type 2 (uniform to nonuniform), or type 3 (nonuniform to nonuniform) transforms, in dimensions 1, 2, or 3. It uses minimal RAM, requires no precomputation or plan steps, and has a simple interface to several languages. We perform the expensive spreading/interpolation between nonuniform points and the fine grid via a simple new kernel, the "exponential of semicircle" $e^{\beta \sqrt{1-x^2}}$ in $x\in[-1,1]$, in a cache-aware load-balanced multithreaded implementation. The deconvolution step requires the Fourier transform of the kernel, for which we propose efficient numerical quadrature. For types 1 and 2, rigorous error bounds asymptotic in the kernel width approach the fastest known exponential rate, namely that of the Kaiser-Bessel kernel. We benchmark against several popular CPU-based libraries, showing favorable speed and memory footprint, especially in three dimensions when high accuracy and/or clustered point distributions are desired.

Journal ArticleDOI
01 Dec 2018
TL;DR: In this article, the authors present results of accuracy evaluation of numerous numerical algorithms for the numerical approximation of the Inverse Laplace Transform, including Stehfest, Abate and Whitt, Vlach and Singhai.
Abstract: In this paper we present the results of an accuracy evaluation of numerous algorithms for the numerical approximation of the Inverse Laplace Transform. The selected algorithms represent diverse lines of approach to this problem and include methods by Stehfest, Abate and Whitt, Vlach and Singhai, De Hoog, Talbot, Zakian and one in which the FFT is applied for the Fourier series convergence acceleration. We use C++ and Python languages with arbitrary precision mathematical libraries to address some crucial issues of numerical implementation. The test set includes Laplace transforms considered difficult to compute as well as some others commonly applied in fractional calculus. The evaluation results allow us to conclude that the Talbot method, which involves deformed Bromwich contour integration, and the De Hoog and the Abate and Whitt methods, which use Fourier series expansion with accelerated convergence, can be regarded as general-purpose high-accuracy algorithms. They can be applied to a wide variety of inversion problems.

Journal ArticleDOI
01 Jun 2018
TL;DR: This paper proposes a method to obtain approximate graph Fourier transforms that can be applied rapidly and stored efficiently, carried out using a modified version of the famous Jacobi eigenvalue algorithm.
Abstract: The fast Fourier transform is an algorithm of paramount importance in signal processing as it allows the Fourier transform to be applied in $\mathcal {O}(n \log n)$ instead of $\mathcal {O}(n^2)$ arithmetic operations. Graph signal processing is a recent research domain that generalizes classical signal processing tools, such as the Fourier transform, to situations where the signal domain is given by any arbitrary graph instead of a regular grid. Today, there is no method to rapidly apply graph Fourier transforms. In this paper, we propose a method to obtain approximate graph Fourier transforms that can be applied rapidly and stored efficiently. It is based on a greedy approximate diagonalization of the graph Laplacian matrix, carried out using a modified version of the famous Jacobi eigenvalue algorithm. The method is described and analyzed in detail, and then applied to both synthetic and real graphs, showing its potential.
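
The complexity gap mentioned in the abstract above can be seen in a small sketch that evaluates the same transform two ways. This is a textbook radix-2 Cooley-Tukey illustration, not the graph method the paper proposes; the names `dft_naive` and `fft_radix2` and the sample signal are assumptions.

```python
import cmath


def dft_naive(x):
    """Direct evaluation of the DFT definition: n outputs, n terms each, O(n^2)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT, O(n log n); n must be a power of two."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])  # transform of even-indexed samples
    odd = fft_radix2(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


# Illustrative signal (assumed data).
signal = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, -2.0, 3.0]
slow = dft_naive(signal)
fast = fft_radix2(signal)
max_diff = max(abs(a - b) for a, b in zip(slow, fast))
```

Both routines return the same coefficients up to round-off. The recursion halves the problem at each level and does linear work per level, which is where the $n \log n$ count comes from.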

Journal ArticleDOI
TL;DR: Faster-than-Nyquist (FTN) signaling reaches up to 67% higher transmission rate compared to the Nyquist counterpart without substantially consuming more transmitter energy per bit, and the overall complexities grow logarithmically with the length of the observations.
Abstract: Faster-than-Nyquist (FTN) signaling has attracted a lot of attention for the fifth-generation (5G) cellular communication systems. However, low-complexity receiver design for FTN signaling becomes challenging. In this paper, we develop frequency-domain joint channel estimation and decoding methods for FTN signaling transmitting systems over frequency-selective fading channels. To deal with the colored noise inherent in FTN signaling, we propose to approximate the corresponding autocorrelation matrix by a circulant matrix, the special eigenvalue decomposition of which facilitates an efficient fast Fourier transform operation and decoupling the noise in frequency domain. Through a specific partition of the received symbols, many independent estimates are obtained and combined to further improve the accuracy of the channel estimation and data detection. Moreover, instead of assuming the data symbols to be Gaussian random variables, a generalized approximated message passing-based equalization is developed and embedded in the turbo iterations between the channel estimation and the soft-in soft-out decoder. Simulation results show that the proposed algorithm outperforms the cyclic prefix-based and overlap-based frequency-domain equalization methods. With the proposed algorithms, FTN signaling reaches up to 67% higher transmission rate compared to the Nyquist counterpart without substantially consuming more transmitter energy per bit, and the overall complexities grow logarithmically with the length of the observations.
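
The circulant approximation in the abstract above is useful because every circulant matrix is diagonalized by the DFT: its eigenvalues are the FFT of its first column, so a matrix-vector product becomes two FFTs, a pointwise product, and one inverse FFT. The sketch below shows only that general property, not the paper's receiver; the function names and the small symmetric first column `c` (standing in for an autocorrelation sequence) are assumptions.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def ifft(x):
    """Inverse FFT: unnormalized inverse pass followed by a single 1/n scaling."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]


def circulant_matvec_dense(c, x):
    """Reference O(n^2) product with the circulant matrix C[i][j] = c[(i - j) % n]."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


def circulant_matvec_fft(c, x):
    """Same product in O(n log n): C x = IFFT(FFT(c) * FFT(x))."""
    eigenvalues = fft([complex(v) for v in c])  # spectrum of the circulant matrix
    x_hat = fft([complex(v) for v in x])
    y = ifft([lam * xv for lam, xv in zip(eigenvalues, x_hat)])
    return [v.real for v in y]


# Illustrative data (assumed): symmetric first column, as for an autocorrelation.
c = [4.0, 1.0, 0.0, 1.0]
x = [1.0, 2.0, 3.0, 4.0]
y_dense = circulant_matvec_dense(c, x)
y_fast = circulant_matvec_fft(c, x)
```

Solving a circulant system works the same way with a pointwise division by the eigenvalues, provided none of them is zero.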

Journal ArticleDOI
TL;DR: In this paper, a technique for de-noising the stator current signal is presented, based on a series of decompositions that are compared with one another; the discrete wavelet transform is an appropriate tool for studying transient phenomena and non-stationary signals.
Abstract: Motor current signature analysis has been in use for many years, but the fast Fourier transform (FFT) technique has disadvantages when the speed and the load torque are not constant. The FFT has problems with a non-stationary signal when the frequency characteristics of the defects must be reported accurately. The discrete wavelet transform (DWT) handles the non-stationary stator current signal, which becomes complex when it is noisy. In this paper, a technique for de-noising the stator current signal is presented, based on a series of decompositions that are compared with one another. We studied normal bearings and bearings with outer and inner faults. The decomposition order was chosen for the Daubechies, Symlets and Meyer wavelets. The limit point for determining the number of levels is presented. In addition, we look for information about the basic defect signal in the energy stored in each level of decomposition. The DWT allows simultaneous time–frequency analysis, so it is an appropriate tool for studying transient phenomena and non-stationary signals.

Journal ArticleDOI
TL;DR: A novel on-line chatter detection method by monitoring the vibration energy that works in discrete real time intervals, and can detect the chatter earlier than frequency domain-based methods, which rely on fast Fourier Transforms.
Abstract: Milling exhibits forced vibrations at tooth passing frequency and its harmonics, as well as chatter vibrations close to one of the natural modes. In addition, there are sidebands, which are spread at the multiples of tooth passing frequency above and below the chatter frequency, and make the robust chatter detection difficult. This paper presents a novel on-line chatter detection method by monitoring the vibration energy. Forced vibrations are removed from the measurements in discrete time domain using a Kalman filter. After removing all periodic components, the amplitude and frequency of chatter are searched in between the two consecutive tooth passing frequency harmonics using a nonlinear energy operator (NEO). When the energy of any chatter component grows relative to the energy of forced vibrations, the presence of chatter is detected. The proposed method works in discrete real time intervals, and can detect the chatter earlier than frequency domain-based methods, which rely on fast Fourier Transforms. The method has been experimentally validated in several milling tests using both microphone and accelerometer measurements, as well as using spindle speed and current signals.

Journal ArticleDOI
TL;DR: A transceiver architecture for broadband OAM orthogonal frequency-division multiplexing (OFDM) wireless communication systems is proposed, which uses baseband digital 2-D fast Fourier transform (FFT) rather than existing radio frequency analog phase shifters to generate and receive the OAM-OFDM signal, thus reducing energy consumption and hardware cost.
Abstract: Radio orbital angular momentum (OAM) provides another perspective of spatial multiplexing to improve the spectrum efficiency. However, multipath induces severe intra- and interchannel crosstalk. To solve the problem in a uniform circular array (UCA)-based OAM system, we first incorporate the effect of sign changing of OAM reflection in modeling the multipath OAM channel. Then, we propose a transceiver architecture for broadband OAM orthogonal frequency-division multiplexing (OFDM) wireless communication systems, which uses baseband digital 2-D fast Fourier transform (FFT) rather than existing radio frequency analog phase shifters to generate and receive the OAM-OFDM signal, thus reducing energy consumption and hardware cost. Finally, a flexible 2-D FFT algorithm is developed. Analysis and simulation results show that compared with the traditional row–column FFT algorithm, the proposed 2-D FFT algorithm could reduce the multiplication complexity by $\frac{1}{4}MN\log _2N$, where $N$ and $M$ are the number of UCA antenna elements and the number of subcarriers, respectively.

Journal ArticleDOI
TL;DR: In this paper, the effect of joint stiffness on the vibration behavior of a typical slider-crank mechanism with a flexible component and joint clearances is presented, based on the results, it is concluded that in mechanisms with high crank speeds, the fundamental natural frequency could be reached by lower external excitation frequencies.

Journal ArticleDOI
TL;DR: An iterative spectral formulation in which convolutions are calculated in the Fourier space is developed to solve for the mechanical state associated with the discrete eigenstrain-based microstructural representation and demonstrates the heterogeneous DDD-FFT approach's ability to inherently incorporate image forces arising from elastic inhomogeneities.

Journal ArticleDOI
TL;DR: In this article, a spectral method based on the Fast Fourier Transform (FFT) was developed to study the mechanical properties of three-dimensional (3D) braided composites with complex internal microstructures.

Journal ArticleDOI
TL;DR: A more general form of the DFT-interpolation-based frequency estimator for sinusoidal signals, using interpolation of three discrete Fourier transform spectral lines, is proposed.

Journal ArticleDOI
TL;DR: The Galerkin finite element method is applied to numerically solve the nonlinear fractional Schrödinger equation with wave operator to show that this fast algorithm is more practical than the traditional backslash and LU factorization methods, in terms of memory requirement and computational cost.

Journal ArticleDOI
20 Sep 2018-PLOS ONE
TL;DR: A scheme for the computation of NCC by fast Fourier transform that can favorably compare for speed efficiency with other existing techniques and may outperform some of them given an appropriate search scenario is developed.
Abstract: The normalized cross-correlation (NCC), usually its 2D version, is routinely encountered in template matching algorithms, such as in facial recognition, motion-tracking, registration in medical imaging, etc. Its rapid computation becomes critical in time-sensitive applications. Here I develop a scheme for the computation of NCC by fast Fourier transform that can favorably compare for speed efficiency with other existing techniques and may outperform some of them given an appropriate search scenario.
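
As a rough illustration of how an FFT can enter an NCC computation, the sketch below handles the one-dimensional case: the numerator (correlation with the zero-mean template) comes from one FFT product, and the per-window normalization comes from prefix sums. This is a commonly used generic approach and is not claimed to be the scheme developed in the paper; the function name `ncc_fft` and the test data are assumptions.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even, odd = fft(x[0::2], inverse), fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out


def ifft(x):
    """Inverse FFT: unnormalized inverse pass followed by a single 1/n scaling."""
    n = len(x)
    return [v / n for v in fft(x, inverse=True)]


def ncc_fft(signal, template):
    """NCC of `template` against every full-overlap window of `signal` (1-D)."""
    n, m = len(signal), len(template)
    mean_t = sum(template) / m
    t0 = [v - mean_t for v in template]          # zero-mean template
    energy_t = sum(v * v for v in t0)
    size = 1
    while size < n + m:                          # pad so valid lags do not wrap
        size *= 2
    s_hat = fft([complex(v) for v in signal] + [0j] * (size - n))
    t_hat = fft([complex(v) for v in t0] + [0j] * (size - m))
    # Cross-correlation theorem: corr[u] = sum_i signal[u + i] * t0[i].
    corr = ifft([a * b.conjugate() for a, b in zip(s_hat, t_hat)])
    # Prefix sums give each window's sum and sum of squares in O(1).
    ps, ps2 = [0.0], [0.0]
    for v in signal:
        ps.append(ps[-1] + v)
        ps2.append(ps2[-1] + v * v)
    scores = []
    for u in range(n - m + 1):
        win_sum = ps[u + m] - ps[u]
        win_energy = (ps2[u + m] - ps2[u]) - win_sum * win_sum / m
        denom = math.sqrt(max(win_energy, 0.0) * energy_t)
        scores.append(corr[u].real / denom if denom > 1e-12 else 0.0)
    return scores


# Illustrative data (assumed): the template occurs exactly at offset 3.
signal = [0.0, 1.0, 0.0, 2.0, 5.0, 3.0, 0.0, 1.0, 0.0, 0.0]
template = [2.0, 5.0, 3.0]
scores = ncc_fft(signal, template)
best = max(range(len(scores)), key=scores.__getitem__)
```

Because the template is made zero-mean first, the window mean drops out of the numerator, so a single correlation suffices. Flat windows with zero variance are given a score of 0 here, which is a design choice of this sketch.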

Journal ArticleDOI
TL;DR: A fast and quasi-optimal algorithm for computing the NUDFT based on the fast Fourier transform (FFT) is proposed; in the fully uniform case it is essentially the FFT, and it is competitive with state-of-the-art algorithms.
Abstract: By viewing the nonuniform discrete Fourier transform (NUDFT) as a perturbed version of a uniform discrete Fourier transform, we propose a fast and quasi-optimal algorithm for computing the NUDFT based on the fast Fourier transform (FFT). Our key observation is that an NUDFT and DFT matrix divided entry by entry is often well approximated by a low rank matrix, allowing us to express a NUDFT matrix as a sum of diagonally scaled DFT matrices. Our algorithm is simple to implement, automatically adapts to any working precision, and is competitive with state-of-the-art algorithms. In the fully uniform case, our algorithm is essentially the FFT. We also describe quasi-optimal algorithms for the inverse NUDFT and two-dimensional NUDFTs.

Journal ArticleDOI
TL;DR: An optimal interpolation factor is derived and a new two-stage TDOA/FDOA estimation algorithm using a resampling block is proposed to reduce the computational complexity and the data size simultaneously in EW systems.
Abstract: The cross ambiguity function (CAF) has been commonly used to find time difference of arrival (TDOA) and frequency difference of arrival (FDOA). In most cases, direct computation of the CAF by using a conventional method such as fast Fourier transform is too computationally intensive. Thus, a two-stage approach consisting of a coarse mode to find rough TDOA/FDOA estimates and a fine mode for precise estimation was introduced. However, there has been no methodology for selecting an interpolation factor determined by the sampling frequency and target precision which significantly affects the computational complexity. In addition, even if the computational complexity can be reduced by using the optimal interpolation factor, the huge transmission data through the datalink between sensors and the central station still remains to be an obstacle for an electronic warfare (EW) system. In this respect, we derive an optimal interpolation factor and then propose a new two-stage TDOA/FDOA estimation algorithm using a resampling block to reduce the computational complexity and the data size simultaneously in EW systems. In the proposed method, the optimal interpolation factor can be used irrespective of the sampling frequency and the target precision. Simulation results show that the optimal interpolation factor efficiently reduces the computational burden without the loss of estimation performance.

Journal ArticleDOI
TL;DR: A novel method based on an improved Nuttall double-window all-phase FFT is proposed by improving the window function and the spectrum correction method for achieving higher precision, and it has proven to perform better than the traditional algorithms for the detection of both harmonics and interharmonics.
Abstract: Harmonics and interharmonics adversely affect power grids. The fast Fourier transform (FFT) algorithm is one of the most commonly used methods for harmonic analysis. However, in practical applications, the accuracy of harmonic analysis can be seriously affected by the fence effect and spectral leakage, which are undesired characteristics inherent to discrete Fourier transforms. Moreover, when non-synchronous sampling is carried out, the phase measurement is not accurate enough, and there is a large error in the identification of interharmonics. In order to improve the measurement precision, the method of all-phase spectrum analysis is used, since it has the characteristics of phase invariance and good spectral leakage suppression. A novel method based on an improved Nuttall double-window all-phase FFT is proposed by improving the window function and the spectrum correction method for achieving higher precision. Through simulation and experimental verification, the proposed algorithm has proven to perform better than the traditional algorithms for the detection of both harmonics and interharmonics. In addition, the computational burden is not considerably increased when compared to such algorithms, which allows the on-line use of the proposed algorithm.
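The spectral leakage caused by non-synchronous sampling, and the way a window suppresses it, can be seen in a small sketch. A Hann window is used here purely as a familiar stand-in; it is not the improved Nuttall double window or the all-phase FFT of the paper. The sampling rate, record length, and tone frequency are assumed values.

```python
import numpy as np

fs, N = 1000.0, 1000            # assumed sampling rate (Hz) and record length
t = np.arange(N) / fs
f0 = 50.5                       # falls between FFT bins: non-synchronous sampling
x = np.sin(2 * np.pi * f0 * t)

def spectrum_db(x, window):
    """Amplitude spectrum in dB relative to its own peak."""
    mag = np.abs(np.fft.rfft(x * window))
    return 20 * np.log10(mag / mag.max() + 1e-300)

rect_db = spectrum_db(x, np.ones(N))       # no window (rectangular)
hann_db = spectrum_db(x, np.hanning(N))    # Hann window

far_bin = 200                   # 200 Hz, far from the 50.5 Hz tone
leak_rect, leak_hann = rect_db[far_bin], hann_db[far_bin]
```

With a bin spacing of 1 Hz, the 50.5 Hz tone lies exactly halfway between two bins, so the unwindowed spectrum spreads energy across the whole band, while the windowed spectrum falls off much faster away from the tone.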

Proceedings ArticleDOI
19 Mar 2018
TL;DR: This work proposes a Fast Fourier Transform-based DNN training and inference model suitable for embedded platforms with reduced asymptotic complexity of both computation and storage, and develops and deploys the FFT-based inference model on embedded platforms achieving extraordinary processing speed.
Abstract: Deep learning has proven powerful in many application domains, especially in image and speech recognition. As the backbone of deep learning, deep neural networks (DNNs) consist of multiple layers of various types with hundreds to thousands of neurons. Embedded platforms are now becoming essential for deep learning deployment due to their portability, versatility, and energy efficiency. The large model size of DNNs, while providing excellent accuracy, also burdens the embedded platforms with intensive computation and storage. Researchers have investigated reducing DNN model size with negligible accuracy loss. This work proposes a Fast Fourier Transform (FFT)-based DNN training and inference model suitable for embedded platforms with reduced asymptotic complexity of both computation and storage, which distinguishes our approach from existing approaches. We develop the training and inference algorithms based on FFT as the computing kernel and deploy the FFT-based inference model on embedded platforms, achieving extraordinary processing speed.
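One way an FFT kernel can lower both storage and computation is to constrain a layer's weight matrix to be circulant. The abstract does not state the weight structure, so the circulant assumption below is illustrative only, and the function name is hypothetical.

```python
import numpy as np

def circulant_matvec(w, x):
    """Compute C @ x where C[i, j] = w[(i - j) % n], using O(n log n) FFTs."""
    return np.real(np.fft.ifft(np.fft.fft(w) * np.fft.fft(x)))

rng = np.random.default_rng(2)
n = 8
w = rng.standard_normal(n)      # n stored parameters instead of n * n
x = rng.standard_normal(n)

# Dense circulant matrix, built only to check the FFT route.
idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
C = w[idx]
y_fft = circulant_matvec(w, x)
y_dense = C @ x
```

Under this assumption the layer stores n values rather than n squared, and the product costs O(n log n) rather than O(n^2) operations.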

Book ChapterDOI
29 Apr 2018
TL;DR: In this article, improved Gaussian preimage sampling algorithms for the MP12 trapdoor lattices are presented. The off-line perturbation sampling algorithm is based on a variant of the Fast Fourier Orthogonalization (FFO) algorithm that avoids the need to precompute and store the FFO matrix through a careful rearrangement of the operations.
Abstract: We present improved algorithms for Gaussian preimage sampling using the lattice trapdoors of (Micciancio and Peikert, CRYPTO 2012). The MP12 work only offered a highly optimized algorithm for the on-line stage of the computation in the special case when the lattice modulus q is a power of two. For arbitrary modulus q, the MP12 preimage sampling procedure resorted to general lattice algorithms with complexity cubic in the bitsize of the modulus (or quadratic, but with substantial preprocessing and storage overheads). Our new preimage sampling algorithm (for any modulus q) achieves linear complexity with very modest storage requirements, and experimentally outperforms the generic method of MP12 already for small values of q. As an additional contribution, we give a new, quasi-linear time algorithm for the off-line perturbation sampling phase of MP12 in the ring setting. Our algorithm is based on a variant of the Fast Fourier Orthogonalization (FFO) algorithm of (Ducas and Prest, ISSAC 2016), but avoids the need to precompute and store the FFO matrix by a careful rearrangement of the operations. All our algorithms are fairly simple, with small hidden constants, and offer a practical alternative for using the MP12 trapdoor lattices in a broad range of cryptographic applications.

Proceedings ArticleDOI
01 Dec 2018
TL;DR: An algorithm that dynamically splits the input single precision dataset into two half precision sets at the lowest level, uses half precision multiplication, and recombines the result at a later step is developed, paving the way for using tensor cores for high precision inputs.
Abstract: The Fast Fourier Transform is a fundamental tool in scientific and technical computation. The highly parallelizable nature of the algorithm makes it a suitable candidate for GPU acceleration. This paper focuses on exploiting the speedup due to using the half precision multiplication capability of the latest GPUs' tensor core hardware without significantly degrading the precision of the Fourier Transform result. We develop an algorithm that dynamically splits the input single precision dataset into two half precision sets at the lowest level, uses half precision multiplication, and recombines the result at a later step. This work paves the way for using tensor cores for high precision inputs.
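The split-and-recombine idea can be emulated on a CPU with NumPy. The sketch below assumes a 4-point DFT at the lowest level, whose matrix entries (0, +1, -1 in the real and imaginary parts) are exact in half precision, and emulates half precision operands with single precision accumulation. It is not the paper's GPU kernel, and the function names are hypothetical.

```python
import numpy as np

def split_fp32(x):
    """Split float32 data into a half precision value plus a half precision residual."""
    hi = x.astype(np.float16)
    lo = (x - hi.astype(np.float32)).astype(np.float16)
    return hi, lo

# 4-point DFT matrix: real and imaginary parts are 0, +1 or -1, exact in float16.
k = np.arange(4)
F4 = np.exp(-2j * np.pi * np.outer(k, k) / 4)
F4_re = np.rint(F4.real).astype(np.float16)
F4_im = np.rint(F4.imag).astype(np.float16)

def dft4_half_inputs(x16):
    """Multiply half precision operands, accumulating in float32 (CPU emulation)."""
    a = x16.astype(np.float32)
    return F4_re.astype(np.float32) @ a + 1j * (F4_im.astype(np.float32) @ a)

rng = np.random.default_rng(3)
x = rng.standard_normal((4, 1000)).astype(np.float32)   # one 4-point DFT per column
ref = np.fft.fft(x.astype(np.float64), axis=0)          # double precision reference

hi, lo = split_fp32(x)
# Casting the input to half precision once loses most of its digits.
err_half = np.max(np.abs(dft4_half_inputs(hi) - ref))
# By linearity, transforming both halves and adding recovers the lost digits.
err_split = np.max(np.abs(dft4_half_inputs(hi) + dft4_half_inputs(lo) - ref))
```

Because the transform is linear, the two half precision pieces can be processed independently and summed afterwards, which is why the recombination can be deferred to a later step.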

Posted Content
TL;DR: A new efficient implementation of the Iterative Filtering algorithm is provided, called Fast Iterative Filtering, which reduces the computational complexity of the original iterative algorithm by utilizing, in a nontrivial way, the Fast Fourier Transform in the computations.
Abstract: Real-life signals are in general non-stationary and non-linear. The development of methods able to extract their hidden features in a fast and reliable way is of high importance in many research fields. In this work we tackle the problem of further analyzing the convergence of the Iterative Filtering method both in a continuous and a discrete setting in order to provide a comprehensive analysis of its behavior. Based on these results we provide new ideas for efficient implementations of the Iterative Filtering algorithm, which are based on the Fast Fourier Transform (FFT) and on the reduction of the original iterative algorithm to a direct method.
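The reduction from an iterative scheme to a direct one can be sketched under simplifying assumptions: a periodic discrete signal, a fixed symmetric averaging filter `w`, and a fixed number of sweeps `m`. The triangular filter, the test signal, and the function names are illustrative choices, not taken from the paper.

```python
import numpy as np

def if_iterative(s, w, m):
    """m sweeps of s <- s - (w circularly convolved with s)."""
    w_hat = np.fft.fft(w)
    for _ in range(m):
        s = s - np.real(np.fft.ifft(w_hat * np.fft.fft(s)))
    return s

def if_direct(s, w, m):
    """Same result in one step: multiply the spectrum by (1 - w_hat)^m."""
    w_hat = np.fft.fft(w)
    return np.real(np.fft.ifft((1 - w_hat) ** m * np.fft.fft(s)))

N, L, m = 256, 10, 10
t = np.arange(N) / N
s = np.sin(2 * np.pi * 3 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)

# Triangular averaging filter of half-width L, centred at index 0 (wrapped), summing to 1.
w = np.zeros(N)
offsets = np.arange(-L, L + 1)
w[offsets % N] = L + 1 - np.abs(offsets)
w /= w.sum()

out_iter = if_iterative(s, w, m)
out_direct = if_direct(s, w, m)
```

Each sweep multiplies the spectrum by the same factor, so m sweeps collapse into a single pointwise power in the Fourier domain, at the cost of one forward and one inverse FFT.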