Showing papers on "Fast Fourier transform" published in 2018

Open Access

Posted Content•

Taco S. Cohen, Mario Geiger, Jonas Koehler, Max Welling

30 Jan 2018-arXiv: Learning

TL;DR: In this paper, the authors introduce the building blocks for constructing spherical CNNs and demonstrate the computational efficiency, numerical accuracy, and effectiveness of spherical convolutional networks applied to 3D model recognition and atomization energy regression.

Abstract: Convolutional Neural Networks (CNNs) have become the method of choice for learning problems involving 2D planar images. However, a number of problems of recent interest have created a demand for models that can analyze spherical images. Examples include omnidirectional vision for drones, robots, and autonomous cars, molecular regression problems, and global weather and climate modelling. A naive application of convolutional networks to a planar projection of the spherical signal is destined to fail, because the space-varying distortions introduced by such a projection will make translational weight sharing ineffective. In this paper we introduce the building blocks for constructing spherical CNNs. We propose a definition for the spherical cross-correlation that is both expressive and rotation-equivariant. The spherical correlation satisfies a generalized Fourier theorem, which allows us to compute it efficiently using a generalized (non-commutative) Fast Fourier Transform (FFT) algorithm. We demonstrate the computational efficiency, numerical accuracy, and effectiveness of spherical CNNs applied to 3D model recognition and atomization energy regression.

322 citations

Journal Article•DOI•

Channel Estimation in Broadband Millimeter Wave MIMO Systems With Few-Bit ADCs

Jianhua Mo¹, Philip Schniter², Robert W. Heath²•Institutions (2)

Samsung¹, Ohio State University²

01 Mar 2018-IEEE Transactions on Signal Processing

TL;DR: In this article, a broadband channel estimation algorithm for mmWave multiple input multiple output (MIMO) systems with few-bit analog-to-digital converters (ADCs) is proposed.

Abstract: We develop a broadband channel estimation algorithm for millimeter wave (mmWave) multiple input multiple output (MIMO) systems with few-bit analog-to-digital converters (ADCs). Our methodology exploits the joint sparsity of the mmWave MIMO channel in the angle and delay domains. We formulate the estimation problem as a noisy quantized compressed-sensing problem and solve it using efficient approximate message passing (AMP) algorithms. In particular, we model the angle-delay coefficients using a Bernoulli–Gaussian-mixture distribution with unknown parameters and use the expectation-maximization forms of the generalized AMP and vector AMP algorithms to simultaneously learn the distributional parameters and compute approximately minimum mean-squared error (MSE) estimates of the channel coefficients. We design a training sequence that allows a fast implementation of these algorithms based on the fast Fourier transform while minimizing peak-to-average power ratio at the transmitter, making our methods scale efficiently to large numbers of antenna elements and delays. We present the results of a detailed simulation study that compares our algorithms to several benchmarks. Our study investigates the effect of SNR, training length, training type, ADC resolution, and runtime on channel estimation MSE, mutual information, and achievable rate. It shows that, in a mmWave MIMO system, the methods we propose to exploit joint angle-delay sparsity allow 1-bit ADCs to perform comparably to infinite-bit ADCs at low SNR, and 4-bit ADCs to perform comparably to infinite-bit ADCs at medium SNR.

319 citations

Journal Article•DOI•

Machine learning-based self-powered acoustic sensor for speaker recognition

Jae Hyun Han¹, Bae Kang Min¹, Seong Kwang Hong¹, Hyunsin Park¹, Jun-Hyuk Kwak, Hee Seung Wang¹, Daniel J. Joe¹, Jung-Hwan Park¹, Younghoon Jung¹, Shin Hur, Chang D. Yoo¹, Keon Jae Lee¹•Institutions (1)

KAIST¹

01 Nov 2018-Nano Energy

TL;DR: In this article, a flexible piezoelectric acoustic sensor (f-PAS) with a highly sensitive multi-resonant frequency band was fabricated by mimicking the operating mechanism of the basilar membrane in the human cochlea.

113 citations

Journal Article•DOI•

Accelerating Convolutional Neural Network With FFT on Embedded Hardware

Tahmid Abtahi¹, Colin Shea¹, Amey Kulkarni¹, Tinoosh Mohsenin¹•Institutions (1)

University of Maryland, Baltimore County¹

21 Jun 2018-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: Three variations of convolutions are evaluated, including direct convolution, fast Fourier transform-based convolution (FFT-Conv), and FFT overlap and add convolution for popular CNN networks in embedded hardware to explore the tradeoff between software and hardware implementation, domain-specific logic and instructions, as well as various parallelism across different architectures.

Abstract: Fueled by the ImageNet Large Scale Visual Recognition Challenge and Common Objects in Context competitions, the convolutional neural network (CNN) has become important in computer vision and natural language processing. However, state-of-the-art CNNs are computationally and memory-intensive, so energy-efficient implementation on an embedded platform is challenging. Recently, VGGNet and ResNet showed that deep neural networks with more convolution layers and a few fully connected layers can achieve lower error rates, so reducing the complexity of convolution layers is of utmost importance. In this paper, we evaluate three variations of convolutions, including direct convolution (Direct-Conv), fast Fourier transform (FFT)-based convolution (FFT-Conv), and FFT overlap and add convolution (FFT-OVA-Conv) in terms of computation complexity and memory storage requirements for popular CNN networks in embedded hardware. We implemented these three techniques for ResNet-20 with the CIFAR-10 data set on a low-power domain-specific many-core architecture called power-efficient nanoclusters (PENCs), NVIDIA Jetson TX1 graphics processing unit (GPU), ARM Cortex A53 CPU, and SPARse Convolutional NETwork (SPARCNet) accelerator on Zynq 7020 FPGA to explore the tradeoff between software and hardware implementation, domain-specific logic and instructions, as well as various parallelism across different architectures. Results are evaluated and compared with respect to throughput per layer, energy consumption, and execution time for the three methods. SPARCNet deployed on Zynq FPGA achieved 42-ms runtime with 135-mJ energy consumption with a 10.8-MB/s throughput per layer using FFT-Conv for ResNet-20. Using the built-in FFT instruction in PENC, the FFT-OVA-Conv performs $2.9\times$ and $1.65\times$ faster and achieves $6.8\times$ and $2.5\times$ higher throughput per watt than Direct-Conv and FFT-Conv. In ARM A53 CPU, FFT-OVA-Conv achieves $3.36\times$ and $1.38\times$ improvement in execution time and $2.72\times$ and $1.32\times$ higher throughput than Direct-Conv and FFT-Conv. In TX1 GPU, FFT-Conv is $1.9\times$ faster, $2.2\times$ more energy-efficient, and achieves $5.6\times$ higher throughput per layer than Direct-Conv. PENC is $10\,916\times$ and $1.8\times$ faster and $5053\times$ and $4.3\times$ more energy-efficient and achieves $7.5\times$ and $1.2\times$ higher throughput per layer than ARM A53 CPU and TX1 GPU, respectively.
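The three convolution variants named in this abstract can be illustrated with a small sketch. The code below is a 1-D simplification written for this listing, not the authors' implementation (which targets 2-D CNN layers on embedded hardware). The function names, the block size of 8, and the test signal are all assumptions chosen for illustration.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def conv_direct(x, h):
    """Direct linear convolution: O(len(x) * len(h)) multiplications."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xv in enumerate(x):
        for j, hv in enumerate(h):
            y[i + j] += xv * hv
    return y


def conv_fft(x, h):
    """Linear convolution via zero-padded FFTs and a pointwise product."""
    n_out = len(x) + len(h) - 1
    n = 1
    while n < n_out:
        n *= 2
    X = fft(list(x) + [0.0] * (n - len(x)))
    H = fft(list(h) + [0.0] * (n - len(h)))
    y = fft([a * b for a, b in zip(X, H)], inverse=True)
    return [v.real / n for v in y[:n_out]]


def conv_overlap_add(x, h, block=8):
    """Overlap-add: FFT-convolve short blocks of x and sum the overlapping tails."""
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        seg = conv_fft(x[start:start + block], h)
        for i, v in enumerate(seg):
            y[start + i] += v
    return y


x = [float((5 * t) % 11 - 5) for t in range(30)]  # arbitrary test input
h = [1.0, -2.0, 0.5]                              # arbitrary short kernel
y_direct = conv_direct(x, h)
y_fft = conv_fft(x, h)
y_ova = conv_overlap_add(x, h)
err_fft = max(abs(a - b) for a, b in zip(y_direct, y_fft))
err_ova = max(abs(a - b) for a, b in zip(y_direct, y_ova))
print(err_fft < 1e-9, err_ova < 1e-9)
```

All three routines compute the same linear convolution. Overlap-add keeps each FFT short, which is one plausible reason it suits the memory-constrained platforms discussed above.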

84 citations

Journal Article•DOI•

Kramers-Kronig receiver operable without digital upsampling.

Tianwai Bo¹, Hoon Kim¹•Institutions (1)

KAIST¹

28 May 2018-Optics Express

TL;DR: A new DSP algorithm for KK receiver operable at 2 samples per symbol is proposed to avoid the use of nonlinear operations such as logarithm and exponential functions and demonstrates the transmission of 112-Gb/s SSB orthogonal frequency-division-multiplexed signal over an 80-km fiber link.

Abstract: The Kramers-Kronig (KK) receiver is capable of retrieving the phase information of optical single-sideband (SSB) signal from the optical intensity when the optical signal satisfies the minimum phase condition. Thus, it is possible to direct-detect the optical SSB signal without suffering from the signal-signal beat interference and linear transmission impairments. However, due to the spectral broadening induced by nonlinear operations in the conventional KK algorithm, it is necessary to employ the digital upsampling at the beginning of the digital signal processing (DSP). The increased number of samples at the DSP would hinder the real-time implementation of this attractive receiver. Hence, we propose a new DSP algorithm for KK receiver operable at 2 samples per symbol. We adopt a couple of mathematical approximations to avoid the use of nonlinear operations such as logarithm and exponential functions. By using the proposed algorithm, we demonstrate the transmission of 112-Gb/s SSB orthogonal frequency-division-multiplexed signal over an 80-km fiber link. The results show that the proposed algorithm operating at 2 samples per symbol exhibits similar performance to the conventional KK one operating at 6 samples per symbol. We also present the error analysis of the proposed algorithm for KK receiver in comparison with the conventional one.

82 citations

Journal Article•DOI•

Bayesian Optimal Data Detector for Hybrid mmWave MIMO-OFDM Systems With Low-Resolution ADCs

Hengtao He¹, Chao-Kai Wen², Shi Jin¹•Institutions (2)

Southeast University¹, National Sun Yat-sen University²

21 Mar 2018-IEEE Journal of Selected Topics in Signal Processing

TL;DR: This study considers a mmWave MIMO-orthogonal frequency division multiplexing (OFDM) receiver with a generalized hybrid architecture in which a small number of radio frequency (RF) chains and low-resolution ADCs are employed simultaneously and proposes a computationally efficient data detection algorithm that provides a minimum mean-square error estimate on data symbols and is extended to a mixed-ADC architecture.

Abstract: Hybrid analog–digital precoding architectures and low-resolution analog-to-digital converter (ADC) receivers are two solutions to reduce hardware cost and power consumption for millimeter wave (mmWave) multiple-input multiple-output (MIMO) communication systems with large antenna arrays. In this study, we consider a mmWave MIMO-orthogonal frequency division multiplexing (OFDM) receiver with a generalized hybrid architecture in which a small number of radio frequency (RF) chains and low-resolution ADCs are employed simultaneously. Owing to the strong nonlinearity introduced by low-resolution ADCs, the task of data detection is challenging, particularly achieving a Bayesian optimal data detection. This study aims to fill this gap. By using a generalized expectation consistent signal recovery technique, we propose a computationally efficient data detection algorithm that provides a minimum mean-square error estimate of the data symbols and is extended to a mixed-ADC architecture. Considering the particular structure of the MIMO-OFDM channel matrix, we provide a low-complexity realization in which only fast Fourier transform (FFT) operations and matrix-vector multiplications are required. Furthermore, we present an analytical framework to study the theoretical performance of the detector in the large-system limit, which can precisely evaluate the performance expressions, such as mean-square error and symbol error rate. Based on this optimal detector, the potential of adding a few low-resolution RF chains and high-resolution ADCs for a mixed-ADC architecture is investigated. Simulation results confirm the accuracy of our theoretical analysis, which can be used for rapid system design. The results reveal that adding a few low-resolution RF chains to original unquantized systems can obtain significant gains.

78 citations

Journal Article•DOI•

Respiration and Heartbeat Rates Measurement Based on Autocorrelation Using IR-UWB Radar

Shen Hongming¹, Xu Chen¹, Yang Yongjie¹, Ling Sun¹, Zhitian Cai¹, Lin Bai², Edward A. Clancy², Xinming Huang²•Institutions (2)

Nantong University¹, Worcester Polytechnic Institute²

26 Jul 2018-IEEE Transactions on Circuits and Systems II: Express Briefs

TL;DR: A new method based on autocorrelation to measure the RR and HR using IR-UWB radar with high accuracy and variational mode decomposition algorithm is adopted to successfully separate the respiration and heartbeat signals.

Abstract: Respiration rate (RR) and heartbeat rate (HR) are important physiological parameters for a person. Impulse radio ultra-wideband (IR-UWB) is a promising technology for non-contact sensing and monitoring. This brief presents a new method based on autocorrelation to measure the RR and HR using IR-UWB radar. The correlation coefficient waveform contains the vital sign signals while overcoming the effect of noise and clutter. By applying the fast Fourier transform, the respiration frequency can be acquired easily. A method also based on autocorrelation is proposed to locate the subject. The received signal matrix is divided into a set of bins in the direction of fast time. By removing one block from the matrix each time and re-applying the autocorrelation, the removed block that results in the smallest correlation is identified as the location of the subject. Moreover, the variational mode decomposition algorithm is adopted to successfully separate the respiration and heartbeat signals. Experiments are carried out using a PulsOn410 UWB radar. The results show that the proposed low-complexity algorithm has high accuracy.
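The "autocorrelation, then Fourier transform" step can be sketched on a synthetic 1-D signal. This is a toy illustration, not the radar processing chain of the paper. The sampling rate, the 0.3125 Hz test tone, the noise level, and the function names are assumptions. A plain DFT is used instead of an FFT to keep the example short.

```python
import cmath
import math
import random


def autocorrelation(x):
    """Biased autocorrelation estimate of the mean-removed signal, lags 0..n-1."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    return [sum(z[t] * z[t + lag] for t in range(n - lag)) / n for lag in range(n)]


def dominant_frequency(x, fs):
    """Frequency in Hz of the largest non-DC DFT magnitude of x."""
    n = len(x)
    best_k, best_mag = 1, -1.0
    for k in range(1, n // 2):
        mag = abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        if mag > best_mag:
            best_k, best_mag = k, mag
    return best_k * fs / n


fs = 10.0        # assumed slow-time sampling rate in Hz
n = 256          # number of samples (25.6 s of data)
f_resp = 0.3125  # assumed breathing frequency in Hz (18.75 breaths per minute)
rng = random.Random(0)  # fixed seed so the example is repeatable
x = [math.sin(2 * math.pi * f_resp * t / fs) + rng.gauss(0.0, 1.0) for t in range(n)]

estimate = dominant_frequency(autocorrelation(x), fs)
print("estimated breaths per minute:", 60.0 * estimate)
```

The autocorrelation concentrates the white noise at lag zero while preserving the periodic component, so the spectral peak of the correlation waveform should fall on or next to the bin of the breathing frequency.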

76 citations

Journal Article•DOI•

Sparse Convolutional Beamforming for Ultrasound Imaging

Regev Cohen¹, Yonina C. Eldar¹•Institutions (1)

Technion – Israel Institute of Technology¹

08 Oct 2018-IEEE Transactions on Ultrasonics Ferroelectrics and Frequency Control

TL;DR: The results demonstrate that COBA outperforms DAS in terms of resolution and contrast and that the suggested beamformers offer a sizable element reduction while generating images with an equivalent or improved quality in comparison with DAS.

Abstract: The standard technique used by commercial medical ultrasound systems to form B-mode images is delay and sum (DAS) beamforming. However, DAS often results in limited image resolution and contrast that are governed by the center frequency and the aperture size of the ultrasound transducer. A large number of elements lead to improved resolution but at the same time increase the data size and the system cost due to the receive electronics required for each element. Therefore, reducing the number of receiving channels while producing high-quality images is of great importance. In this paper, we introduce a nonlinear beamformer called COnvolutional Beamforming Algorithm (COBA), which achieves significant improvement of lateral resolution and contrast. In addition, it can be implemented efficiently using the fast Fourier transform. Based on the COBA concept, we next present two sparse beamformers with closed-form expressions for the sensor locations, which result in the same beam pattern as DAS and COBA while using far fewer array elements. Optimization of the number of elements shows that they require a minimal number of elements that are on the order of the square root of the number used by DAS. The performance of the proposed methods is tested and validated using simulated data, phantom scans, and in vivo cardiac data. The results demonstrate that COBA outperforms DAS in terms of resolution and contrast and that the suggested beamformers offer a sizable element reduction while generating images with an equivalent or improved quality in comparison with DAS.

74 citations

Journal Article•DOI•

Frequency Synchronization for Uplink Massive MIMO Systems

Weile Zhang¹, Feifei Gao², Shi Jin³, Hai Lin⁴•Institutions (4)

Xi'an Jiaotong University¹, Tsinghua University², Southeast University³, Osaka Prefecture University⁴

01 Jan 2018-IEEE Transactions on Wireless Communications

TL;DR: In this paper, a frequency synchronization scheme for multiuser orthogonal frequency division multiplexing uplink with a large-scale uniform linear array at base station (BS) by exploiting the angle information of users is proposed.

Abstract: In this paper, we propose a frequency synchronization scheme for multiuser orthogonal frequency division multiplexing uplink with a large-scale uniform linear array at base station (BS) by exploiting the angle information of users. Considering that the incident signal at BS from each user can be restricted within a certain angular spread, the proposed scheme could perform carrier frequency offset (CFO) estimation for each user individually through a joint spatial-frequency alignment procedure and can be completed efficiently with the aid of fast Fourier transform. A multi-branch receive beamforming is further designed to yield an equivalent single user transmission model for which the conventional single-user channel estimation and data detection can be carried out. To make the study complete, theoretical performance analysis of the CFO estimation is also conducted. We further develop a user grouping scheme to deal with the unexpected scenarios that some users may not be separated well from the spatial domain. Finally, various numerical results are provided to verify the proposed studies.

71 citations

Journal Article•DOI•

A FFT-based finite-difference solver for massively-parallel direct numerical simulations of turbulent flows

Pedro Costa

15 Oct 2018-Computers & Mathematics With Applications

TL;DR: An efficient solver for massively-parallel direct numerical simulations of incompressible turbulent flows using a second-order, finite-volume pressure-correction scheme, where the pressure Poisson equation is solved with the method of eigenfunction expansions.

Abstract: We present an efficient solver for massively-parallel direct numerical simulations of incompressible turbulent flows. The method uses a second-order, finite-volume pressure-correction scheme, where the pressure Poisson equation is solved with the method of eigenfunction expansions. This approach allows for very efficient FFT-based solvers in problems with different combinations of homogeneous pressure boundary conditions. Our algorithm explores all combinations of pressure boundary conditions valid for such a solver, in a single, general framework. The method is implemented in a 2D pencil-like domain decomposition, which enables efficient massively-parallel simulations. The implementation was validated against different canonical flows, and its computational performance was examined. Excellent strong scaling performance up to $10^4$ cores is demonstrated for a domain with $10^9$ spatial degrees of freedom, corresponding to a very small wall-clock time/time step. The resulting tool, CaNS, has been made freely available and open-source.
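The idea of solving a discrete Poisson equation by eigenfunction expansion can be sketched in one dimension with periodic boundaries. This toy is not CaNS and makes no claim about its code. The grid size, right-hand side, and function names are assumptions. For the second-order stencil $(p_{i+1} - 2p_i + p_{i-1})/h^2 = f_i$ on a periodic grid, the discrete Fourier modes are eigenvectors with eigenvalues $\lambda_k = (2\cos(2\pi k/n) - 2)/h^2$, so the solve reduces to a division in Fourier space.

```python
import cmath
import math


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def solve_poisson_periodic(f, h):
    """Solve (p[i+1] - 2 p[i] + p[i-1]) / h^2 = f[i] with periodic wrap-around.

    f must have zero mean; the undetermined constant is fixed by giving p zero mean.
    """
    n = len(f)
    f_hat = fft(f)
    p_hat = [0j] * n  # mode k = 0 stays zero (zero-mean solution)
    for k in range(1, n):
        lam = (2.0 * math.cos(2.0 * math.pi * k / n) - 2.0) / (h * h)
        p_hat[k] = f_hat[k] / lam
    p = fft(p_hat, inverse=True)
    return [v.real / n for v in p]


n = 32
h = 2.0 * math.pi / n
f = [math.sin(i * h) + 0.5 * math.cos(3 * i * h) for i in range(n)]  # zero-mean source
p = solve_poisson_periodic(f, h)

# Check the solution by applying the finite-difference stencil directly.
residual = max(abs((p[(i + 1) % n] - 2.0 * p[i] + p[i - 1]) / (h * h) - f[i])
               for i in range(n))
print(residual < 1e-9)
```

Because the eigenvalues of the discrete operator are used rather than those of the continuous Laplacian, the result satisfies the finite-difference equations to rounding error.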

71 citations

Posted Content•

A parallel non-uniform fast Fourier transform library based on an "exponential of semicircle" kernel

Alex H. Barnett, Jeremy F. Magland, Ludvig af Klinteberg

21 Aug 2018-arXiv: Numerical Analysis

TL;DR: FINUFFT is an efficient parallel library for the non-uniform fast Fourier transform (NUFFT) in dimensions 1, 2, or 3, which uses minimal RAM, requires no precomputation or plan steps, and has a simple interface to several languages.

Abstract: The nonuniform fast Fourier transform (NUFFT) generalizes the FFT to off-grid data. Its many applications include image reconstruction, data analysis, and the numerical solution of differential equations. We present FINUFFT, an efficient parallel library for type 1 (nonuniform to uniform), type 2 (uniform to nonuniform), or type 3 (nonuniform to nonuniform) transforms, in dimensions 1, 2, or 3. It uses minimal RAM, requires no precomputation or plan steps, and has a simple interface to several languages. We perform the expensive spreading/interpolation between nonuniform points and the fine grid via a simple new kernel, the "exponential of semicircle" $e^{\beta \sqrt{1-x^2}}$ in $x\in[-1,1]$, in a cache-aware load-balanced multithreaded implementation. The deconvolution step requires the Fourier transform of the kernel, for which we propose efficient numerical quadrature. For types 1 and 2, rigorous error bounds asymptotic in the kernel width approach the fastest known exponential rate, namely that of the Kaiser-Bessel kernel. We benchmark against several popular CPU-based libraries, showing favorable speed and memory footprint, especially in three dimensions when high accuracy and/or clustered point distributions are desired.

Journal Article•DOI•

Review of numerical methods for NumILPT with computational accuracy assessment for fractional calculus

Dariusz W. Brzeziński¹•Institutions (1)

Lodz University of Technology¹

01 Dec 2018

TL;DR: In this article, the authors present results of accuracy evaluation of numerous numerical algorithms for the numerical approximation of the Inverse Laplace Transform, including Stehfest, Abate and Whitt, Vlach and Singhai.

Abstract: In the paper we present results of accuracy evaluation of numerous numerical algorithms for the numerical approximation of the Inverse Laplace Transform. The selected algorithms represent diverse lines of approach to this problem and include methods by Stehfest, Abate and Whitt, Vlach and Singhai, De Hoog, Talbot, Zakian, and one in which the FFT is applied for the Fourier series convergence acceleration. We use the C++ and Python languages with arbitrary precision mathematical libraries to address some crucial issues of numerical implementation. The test set includes Laplace transforms considered difficult to compute as well as some others commonly applied in fractional calculus. The evaluation results allow us to conclude that the Talbot method, which involves deformed Bromwich contour integration, and the De Hoog and the Abate and Whitt methods, which use Fourier series expansion with accelerated convergence, can be regarded as general purpose high-accuracy algorithms. They can be applied to a wide variety of inversion problems.

Journal Article•DOI•

Approximate Fast Graph Fourier Transforms via Multilayer Sparse Approximations

Luc Le Magoarou¹, Rémi Gribonval¹, Nicolas Tremblay²•Institutions (2)

French Institute for Research in Computer Science and Automation¹, Centre national de la recherche scientifique²

01 Jun 2018

TL;DR: This paper proposes a method to obtain approximate graph Fourier transforms that can be applied rapidly and stored efficiently, carried out using a modified version of the famous Jacobi eigenvalue algorithm.

Abstract: The fast Fourier transform is an algorithm of paramount importance in signal processing as it allows the Fourier transform to be applied in $\mathcal{O}(n \log n)$ instead of $\mathcal{O}(n^2)$ arithmetic operations. Graph signal processing is a recent research domain that generalizes classical signal processing tools, such as the Fourier transform, to situations where the signal domain is given by any arbitrary graph instead of a regular grid. Today, there is no method to rapidly apply graph Fourier transforms. In this paper, we propose a method to obtain approximate graph Fourier transforms that can be applied rapidly and stored efficiently. It is based on a greedy approximate diagonalization of the graph Laplacian matrix, carried out using a modified version of the famous Jacobi eigenvalue algorithm. The method is described and analyzed in detail, and then applied to both synthetic and real graphs, showing its potential.
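The complexity gap quoted at the start of this abstract can be made concrete with a minimal sketch of the classical (non-graph) case. It compares a direct evaluation of the DFT definition with a textbook radix-2 Cooley-Tukey recursion. It does not implement the graph Fourier transform or the paper's approximation method. The function names and test signal are assumptions.

```python
import cmath


def dft_naive(x):
    """Direct evaluation of the DFT definition: O(n^2) arithmetic operations."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft_radix2(x):
    """Recursive radix-2 Cooley-Tukey FFT: O(n log n); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # DFT of even-indexed samples
    odd = fft_radix2(x[1::2])   # DFT of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


signal = [float((3 * t) % 7) for t in range(16)]  # arbitrary test input
max_diff = max(abs(a - b) for a, b in zip(dft_naive(signal), fft_radix2(signal)))
print(max_diff < 1e-9)
```

Both routines return the same coefficients. The recursion halves the problem at every level, which is where the $\log n$ factor comes from.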

Journal Article•DOI•

Frequency-Domain Joint Channel Estimation and Decoding for Faster-Than-Nyquist Signaling

Qiaolin Shi¹, Nan Wu¹, Xiaoli Ma², Hua Wang¹•Institutions (2)

Beijing Institute of Technology¹, Georgia Institute of Technology²

01 Feb 2018-IEEE Transactions on Communications

TL;DR: Faster-than-Nyquist (FTN) signaling reaches up to 67% higher transmission rate compared to the Nyquist counterpart without substantially consuming more transmitter energy per bit, and the overall complexities grow logarithmically with the length of the observations.

Abstract: Faster-than-Nyquist (FTN) signaling has attracted a lot of attention for the fifth-generation (5G) cellular communication systems. However, low-complexity receiver design for FTN signaling is challenging. In this paper, we develop frequency-domain joint channel estimation and decoding methods for FTN signaling systems over frequency-selective fading channels. To deal with the colored noise inherent in FTN signaling, we propose to approximate the corresponding autocorrelation matrix by a circulant matrix, the special eigenvalue decomposition of which facilitates an efficient fast Fourier transform operation and decoupling of the noise in the frequency domain. Through a specific partition of the received symbols, many independent estimates are obtained and combined to further improve the accuracy of the channel estimation and data detection. Moreover, instead of assuming the data symbols to be Gaussian random variables, a generalized approximated message passing-based equalization is developed and embedded in the turbo iterations between the channel estimation and the soft-in soft-out decoder. Simulation results show that the proposed algorithm outperforms the cyclic prefix-based and overlap-based frequency-domain equalization methods. With the proposed algorithms, FTN signaling reaches up to 67% higher transmission rate compared to the Nyquist counterpart without substantially consuming more transmitter energy per bit, and the overall complexities grow logarithmically with the length of the observations.
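The property this abstract relies on is that a circulant matrix is diagonalized by the DFT, so multiplying by it costs only a few FFTs. The sketch below checks that fact on a small example. It is a generic illustration, not the paper's receiver. The first column `c`, the vector `x`, and the function names are assumptions.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def circulant_matvec_dense(c, x):
    """y = C x with C[i][j] = c[(i - j) mod n], computed entry by entry: O(n^2)."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]


def circulant_matvec_fft(c, x):
    """Same product using the eigenvalues fft(c) of C: O(n log n)."""
    n = len(c)
    eig = fft(c)  # eigenvalues of the circulant matrix
    X = fft(x)    # coordinates of x in the Fourier eigenbasis
    y = fft([e * v for e, v in zip(eig, X)], inverse=True)
    return [v / n for v in y]


c = [4.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0]  # symmetric first column (autocorrelation-like)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_dense = circulant_matvec_dense(c, x)
y_fft = circulant_matvec_fft(c, x)
max_err = max(abs(a - b) for a, b in zip(y_dense, y_fft))
print(max_err < 1e-9)
```

In the frequency domain the matrix acts as a per-bin scaling, which is why correlated noise with a circulant covariance decouples across bins.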

Journal Article•DOI•

Diagnosis of bearing defects in induction motors using discrete wavelet transform

Noureddine Bessous, Salah Eddine Zouzou, Wafa Bentrah, Salim Sbaa, Mohamed Sahraoui

01 Apr 2018-International Journal of Systems Assurance Engineering and Management

TL;DR: In this paper, a technique for de-noising the stator current signal is presented, based on a series of decompositions that are compared with one another; the discrete wavelet transform is an appropriate tool for studying transient phenomena and non-stationary signals.

Abstract: Motor current signature analysis has been used for many years, but the fast Fourier transform (FFT) technique has disadvantages when the speed and the load torque are not constant. With a non-stationary signal, the FFT has difficulty reporting the frequency characteristics of the defects accurately. The discrete wavelet transform (DWT) handles the non-stationary stator current signal, which becomes complex in the presence of noise. In this paper, a technique for de-noising the stator current signal is presented, based on a series of decompositions that are compared with one another. We studied normal bearings and bearings with outer and inner faults. The decomposition order was chosen for the Daubechies, Symlets, and Meyer wavelets. The limit for determining the number of levels is presented. In addition, we look for information about the basic defect signal in the energy stored in each level of decomposition. The DWT allows simultaneous time–frequency analysis, so it is an appropriate tool for studying transient phenomena and non-stationary signals.

Journal Article•DOI•

On-line energy-based milling chatter detection

Hakan Caliskan¹, Zekai Murat Kilic¹, Yusuf Altintas¹•Institutions (1)

University of British Columbia¹

01 Nov 2018-Journal of Manufacturing Science and Engineering-Transactions of the ASME

TL;DR: A novel on-line chatter detection method by monitoring the vibration energy that works in discrete real time intervals, and can detect the chatter earlier than frequency domain-based methods, which rely on fast Fourier Transforms.

Abstract: Milling exhibits forced vibrations at tooth passing frequency and its harmonics, as well as chatter vibrations close to one of the natural modes. In addition, there are sidebands, which are spread at the multiples of tooth passing frequency above and below the chatter frequency, and make the robust chatter detection difficult. This paper presents a novel on-line chatter detection method by monitoring the vibration energy. Forced vibrations are removed from the measurements in discrete time domain using a Kalman filter. After removing all periodic components, the amplitude and frequency of chatter are searched in between the two consecutive tooth passing frequency harmonics using a nonlinear energy operator (NEO). When the energy of any chatter component grows relative to the energy of forced vibrations, the presence of chatter is detected. The proposed method works in discrete real time intervals, and can detect the chatter earlier than frequency domain-based methods, which rely on fast Fourier Transforms. The method has been experimentally validated in several milling tests using both microphone and accelerometer measurements, as well as using spindle speed and current signals.

Journal Article•DOI•

A 2-D FFT-Based Transceiver Architecture for OAM-OFDM Systems With UCA Antennas

Rui Chen¹, Wenhai Yang¹, Hui Xu¹, Jiandong Li¹•Institutions (1)

Xidian University¹

22 Mar 2018-IEEE Transactions on Vehicular Technology

TL;DR: A transceiver architecture for broadband OAM orthogonal frequency-division multiplexing (OFDM) wireless communication systems is proposed, which uses baseband digital 2-D fast Fourier transform (FFT) rather than existing radio frequency analog phase shifters to generate and receive the OAM-OFDM signal, thus reducing energy consumption and hardware cost.

Abstract: Radio orbital angular momentum (OAM) provides another perspective of spatial multiplexing to improve the spectrum efficiency. However, multipath induces severe intra- and interchannel crosstalk. To solve the problem in a uniform circular array (UCA)-based OAM system, we first incorporate the effect of sign changing of OAM reflection in modeling the multipath OAM channel. Then, we propose a transceiver architecture for broadband OAM orthogonal frequency-division multiplexing (OFDM) wireless communication systems, which uses baseband digital 2-D fast Fourier transform (FFT) rather than existing radio frequency analog phase shifters to generate and receive the OAM-OFDM signal, thus reducing energy consumption and hardware cost. Finally, a flexible 2-D FFT algorithm is developed. Analysis and simulation results show that compared with the traditional row–column FFT algorithm, the proposed 2-D FFT algorithm could reduce the multiplication complexity by $\frac{1}{4}MN\log_2 N$, where $N$ and $M$ are the number of UCA antenna elements and the number of subcarriers, respectively.
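For reference, the traditional row-column 2-D FFT that this abstract uses as its baseline can be sketched as follows: apply a 1-D FFT to every row, then to every column of the result. This is not the paper's proposed flexible algorithm, whose details are not given in the abstract. The array size, test data, and function names are assumptions.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def fft2_row_column(a):
    """2-D DFT of an M x N array: 1-D FFTs along rows, then along columns."""
    m, n = len(a), len(a[0])
    rows = [fft(row) for row in a]
    cols = [fft([rows[r][c] for r in range(m)]) for c in range(n)]
    return [[cols[c][r] for c in range(n)] for r in range(m)]  # transpose back to M x N


def dft2_direct(a):
    """2-D DFT straight from the definition, used only as a correctness check."""
    m, n = len(a), len(a[0])
    return [[sum(a[r][c] * cmath.exp(-2j * cmath.pi * (u * r / m + v * c / n))
                 for r in range(m) for c in range(n))
             for v in range(n)]
            for u in range(m)]


M, N = 4, 8  # assumed sizes: M subcarriers, N antenna elements
a = [[float((r * 3 + c * 5) % 7) for c in range(N)] for r in range(M)]
fast = fft2_row_column(a)
slow = dft2_direct(a)
max_err = max(abs(fast[u][v] - slow[u][v]) for u in range(M) for v in range(N))
print(max_err < 1e-9)
```

The row-column approach works because the 2-D DFT kernel factors into a product of two 1-D kernels, one per axis.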

Journal Article•DOI•

Frequency analysis of a typical planar flexible multibody system with joint clearances

Esmaeil Salahshoor¹, Saeed Ebrahimi¹, Yunqing Zhang²•Institutions (2)

Yazd University¹, Huazhong University of Science and Technology²

01 Aug 2018-Mechanism and Machine Theory

TL;DR: In this paper, the effect of joint stiffness on the vibration behavior of a typical slider-crank mechanism with a flexible component and joint clearances is presented. Based on the results, it is concluded that in mechanisms with high crank speeds, the fundamental natural frequency could be reached by lower external excitation frequencies.

Journal Article•DOI•

A FFT-based formulation for discrete dislocation dynamics in heterogeneous media

Nicolas Bertin¹, Laurent Capolungo²•Institutions (2)

Stanford University¹, Los Alamos National Laboratory²

15 Feb 2018-Journal of Computational Physics

TL;DR: An iterative spectral formulation in which convolutions are calculated in the Fourier space is developed to solve for the mechanical state associated with the discrete eigenstrain-based microstructural representation and demonstrates the heterogeneous DDD-FFT approach's ability to inherently incorporate image forces arising from elastic inhomogeneities.

Journal Article•DOI•

Progressive damage analysis of 3D braided composites using FFT-based method

Bing Wang¹, Guodong Fang¹, Shuo Liu¹, Maoqing Fu¹, Jun Liang²•Institutions (2)

Harbin Institute of Technology¹, Beijing Institute of Technology²

15 May 2018-Composite Structures

TL;DR: In this article, a spectral method based on the fast Fourier transform (FFT) was developed to study the mechanical properties of three-dimensional (3D) braided composites with complex internal microstructures.

Journal Article•DOI•

Frequency estimator of sinusoid based on interpolation of three DFT spectral lines

Lei Fan¹, Guoqing Qi²•Institutions (2)

Dalian Polytechnic University¹, Dalian Maritime University²

01 Mar 2018-Signal Processing

TL;DR: A more general form of the frequency estimator for sinusoidal signals, based on interpolation of three discrete Fourier transform (DFT) spectral lines, is proposed.

Journal Article•DOI•

A fast energy conserving finite element method for the nonlinear fractional Schrödinger equation with wave operator

Meng Li¹, Yong-Liang Zhao²•Institutions (2)

Zhengzhou University¹, University of Electronic Science and Technology of China²

01 Dec 2018-Applied Mathematics and Computation

TL;DR: The Galerkin finite element method is applied to numerically solve the nonlinear fractional Schrödinger equation with wave operator, and the resulting fast algorithm is shown to be more practical than the traditional backslash and LU factorization methods in terms of memory requirement and computational cost.

Journal Article•DOI•

Computation of the normalized cross-correlation by fast Fourier transform.

Artan Kaso¹•Institutions (1)

University of Maryland, Baltimore¹

20 Sep 2018-PLOS ONE

TL;DR: A scheme is developed for computing the normalized cross-correlation (NCC) by fast Fourier transform; it can compare favorably in speed with other existing techniques and may outperform some of them given an appropriate search scenario.

Abstract: The normalized cross-correlation (NCC), usually its 2D version, is routinely encountered in template matching algorithms, such as in facial recognition, motion-tracking, registration in medical imaging, etc. Its rapid computation becomes critical in time-sensitive applications. Here I develop a scheme for the computation of NCC by fast Fourier transform that can favorably compare for speed efficiency with other existing techniques and may outperform some of them given an appropriate search scenario.
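
To make the idea of computing NCC with an FFT concrete, here is a 1-D sketch of one common arrangement, not necessarily the scheme developed in the paper: the numerator is obtained from the correlation theorem using a zero-mean template, and the per-window normalization comes from running sums. The function names and the test signal are illustrative assumptions.

```python
import cmath
import math

def fft(x, sign=-1):
    """Recursive radix-2 DFT (unscaled); sign=+1 uses the inverse kernel."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2], sign), fft(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ncc_fft(signal, template):
    """Normalized cross-correlation of a 1-D template at every valid lag."""
    m = len(template)
    n = 1
    while n < len(signal):        # power-of-two size; valid lags never wrap around
        n *= 2
    mean_t = sum(template) / m
    t0 = [t - mean_t for t in template]                  # zero-mean template
    norm_t = math.sqrt(sum(t * t for t in t0))
    F = fft(list(signal) + [0.0] * (n - len(signal)))
    T = fft(t0 + [0.0] * (n - m))
    # Correlation theorem: corr[u] / n = sum_x signal[x + u] * t0[x]
    corr = fft([a * b.conjugate() for a, b in zip(F, T)], sign=+1)
    # Running sums give each window's sum and sum of squares in O(1)
    s1, s2 = [0.0], [0.0]
    for v in signal:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)
    scores = []
    for u in range(len(signal) - m + 1):
        win_sum = s1[u + m] - s1[u]
        win_energy = (s2[u + m] - s2[u]) - win_sum * win_sum / m
        denom = math.sqrt(max(win_energy, 0.0)) * norm_t
        scores.append(corr[u].real / n / denom if denom > 1e-12 else 0.0)
    return scores

signal = [0, 1, 0, 2, 5, 3, 1, 0, 4, 1, 0, 2]
scores = ncc_fft(signal, signal[3:7])                    # template cut from lag 3
best = max(range(len(scores)), key=scores.__getitem__)   # lag with the highest score
```

Because the template has zero mean, subtracting the local window mean in the numerator is unnecessary, which is what lets a single FFT-based correlation supply all numerators at once.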

Journal Article•DOI•

A nonuniform fast Fourier transform based on low rank approximation

Diego Ruiz-Antolin, Alex Townsend

20 Feb 2018-SIAM Journal on Scientific Computing

TL;DR: A fast and quasi-optimal algorithm for computing the NUDFT based on the fast Fourier transform (FFT) is proposed; it reduces to essentially the FFT in the fully uniform case and is competitive with state-of-the-art algorithms.

Abstract: By viewing the nonuniform discrete Fourier transform (NUDFT) as a perturbed version of a uniform discrete Fourier transform, we propose a fast and quasi-optimal algorithm for computing the NUDFT based on the fast Fourier transform (FFT). Our key observation is that an NUDFT and DFT matrix divided entry by entry is often well approximated by a low rank matrix, allowing us to express an NUDFT matrix as a sum of diagonally scaled DFT matrices. Our algorithm is simple to implement, automatically adapts to any working precision, and is competitive with state-of-the-art algorithms. In the fully uniform case, our algorithm is essentially the FFT. We also describe quasi-optimal algorithms for the inverse NUDFT and two-dimensional NUDFTs.
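
The "sum of diagonally scaled DFT matrices" structure can be illustrated with a deliberately simplified sketch. It assumes samples at $x_j = (j + \delta_j)/N$ with small perturbations $\delta_j$, and it swaps in a truncated Taylor series of $e^{-2\pi i \delta_j k/N}$ for the paper's low rank approximation, so it shows the structure rather than the authors' construction. The name `nudft_taylor` and the `terms` parameter are hypothetical.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def nudft_taylor(c, delta, terms=12):
    """Approximate f_j = sum_k c_k exp(-2 pi i (j + delta_j) k / N).

    Each Taylor term r contributes D_left^r * FFT(D_right^r * c), i.e. one
    diagonally scaled DFT; the result is the sum over r.
    """
    n = len(c)
    result = [0j] * n
    for r in range(terms):
        scaled = [c[k] * (k / n) ** r for k in range(n)]    # right diagonal scaling
        spec = fft(scaled)
        for j in range(n):
            left = (-2j * cmath.pi * delta[j]) ** r / math.factorial(r)
            result[j] += left * spec[j]                     # left diagonal scaling
    return result

coeffs = [1, 2j, -1, 0.5, 3, -2, 1j, 0.25]
shifts = [0.0, 0.1, -0.05, 0.02, -0.1, 0.07, 0.0, -0.03]
values = nudft_taylor(coeffs, shifts)
```

With all perturbations set to zero only the first term survives, so the sketch collapses to a single FFT, mirroring the abstract's remark about the fully uniform case.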

Journal Article•DOI•

Computationally Efficient TDOA/FDOA Estimation for Unknown Communication Signals in Electronic Warfare Systems

Dong-Gyu Kim¹, Geun-Ho Park¹, Hyoung-Nam Kim¹, Jin-Oh Park, Young Mi Park², Wook-Hyeon Shin²•Institutions (2)

Pusan National University¹, Agency for Defense Development²

01 Feb 2018-IEEE Transactions on Aerospace and Electronic Systems

TL;DR: An optimal interpolation factor is derived and a new two-stage TDOA/FDOA estimation algorithm using a resampling block is proposed to reduce the computational complexity and the data size simultaneously in EW systems.

Abstract: The cross ambiguity function (CAF) has been commonly used to find time difference of arrival (TDOA) and frequency difference of arrival (FDOA). In most cases, direct computation of the CAF by using a conventional method such as the fast Fourier transform is too computationally intensive. Thus, a two-stage approach consisting of a coarse mode to find rough TDOA/FDOA estimates and a fine mode for precise estimation was introduced. However, there has been no methodology for selecting an interpolation factor, determined by the sampling frequency and target precision, which significantly affects the computational complexity. In addition, even if the computational complexity can be reduced by using the optimal interpolation factor, the large volume of data transmitted through the datalink between sensors and the central station remains an obstacle for an electronic warfare (EW) system. In this respect, we derive an optimal interpolation factor and then propose a new two-stage TDOA/FDOA estimation algorithm using a resampling block to reduce the computational complexity and the data size simultaneously in EW systems. In the proposed method, the optimal interpolation factor can be used irrespective of the sampling frequency and the target precision. Simulation results show that the optimal interpolation factor efficiently reduces the computational burden without the loss of estimation performance.

Journal Article•DOI•

Power harmonic and interharmonic detection method in renewable power based on Nuttall double-window all-phase FFT algorithm

Taixin Su, Mingfa Yang, Tao Jin, Rodolfo C.C. Flesch

01 Jun 2018-IET Renewable Power Generation

TL;DR: A novel method based on improved Nuttall double-window all-phase FFT is proposed by improving the window function and the spectrum correction method for achieving higher precision and has proven to perform better than the traditional algorithms both for the detection of harmonics and interharmonics.

Abstract: Harmonics and interharmonics adversely affect power grids. The fast Fourier transform (FFT) algorithm is one of the most commonly used methods for harmonic analysis. However, in practical applications, the accuracy of harmonic analysis can be seriously affected by fence effect and spectral leakage, which are undesired characteristics inherent to discrete Fourier transforms. Moreover, when non-synchronous sampling is carried out, the phase measurement is not accurate enough, and there is a large error in the identification of interharmonics. In order to improve the measurement precision, the method of all-phase spectrum analysis is used, since it has the characteristics of phase invariance and good spectral leakage suppression. A novel method based on improved Nuttall double-window all-phase FFT is proposed by improving the window function and the spectrum correction method for achieving higher precision. Through simulation and experimental verification, the proposed algorithm has proven to perform better than the traditional algorithms both for the detection of harmonics and interharmonics. In addition, the computation burden is not considerably increased when compared to such algorithms, which allows the on-line use of the proposed algorithm.
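
The phase invariance mentioned above can be demonstrated with a small sketch of all-phase preprocessing, under stated assumptions: 2N-1 samples centred on the sample of interest are weighted by a convolution window (here the autocorrelation of one length-N window, so both windows are the same), folded onto N points, and passed to an FFT. The window uses the commonly tabulated 4-term Nuttall coefficients; the paper's improved window and its spectrum correction step are not reproduced, and all function names are illustrative.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def nuttall(n):
    """4-term Nuttall window (commonly tabulated coefficients, periodic form)."""
    a = (0.355768, 0.487396, 0.144232, 0.012604)
    return [a[0] - a[1] * math.cos(2 * math.pi * i / n)
            + a[2] * math.cos(4 * math.pi * i / n)
            - a[3] * math.cos(6 * math.pi * i / n) for i in range(n)]

def all_phase_fft(samples, window):
    """All-phase spectrum of 2N-1 samples centred on samples[N-1]; N = len(window)."""
    n = len(window)
    # Convolution window: autocorrelation of the window at lags -(N-1)..N-1
    conv = [sum(window[i] * window[i + abs(d)] for i in range(n - abs(d)))
            for d in range(-(n - 1), n)]
    scale = sum(conv)
    weighted = [c * s / scale for c, s in zip(conv, samples)]
    # Fold onto N points: the samples at times m and m - N share slot m
    folded = [weighted[n - 1]] + [weighted[m + n - 1] + weighted[m - 1]
                                  for m in range(1, n)]
    return fft(folded)

N, phase0 = 16, 0.7
omega = 2 * math.pi * 3.3 / N                  # tone 0.3 bins away from bin 3
x = [cmath.exp(1j * (omega * t + phase0)) for t in range(-(N - 1), N)]
spec = all_phase_fft(x, nuttall(N))
peak = max(range(N), key=lambda k: abs(spec[k]))
measured = cmath.phase(spec[peak])             # phase read at the peak bin
```

Because the convolution window is real and symmetric about the centre sample, the phase read at the peak equals the phase of the centre sample even though the tone does not fall on a bin; a real-valued signal would add a small perturbation from its negative-frequency component.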

Proceedings Article•DOI•

FFT-based deep learning deployment in embedded systems

Sheng Lin¹, Ning Liu¹, Mahdi Nazemi², Hongjia Li¹, Caiwen Ding¹, Yanzhi Wang¹, Massoud Pedram²•Institutions (2)

Syracuse University¹, University of Southern California²

19 Mar 2018

TL;DR: This work proposes a Fast Fourier Transform-based DNN training and inference model suitable for embedded platforms with reduced asymptotic complexity of both computation and storage, and develops and deploys the FFT-based inference model on embedded platforms achieving extraordinary processing speed.

Abstract: Deep learning has demonstrated its power in many application domains, especially in image and speech recognition. As the backbone of deep learning, deep neural networks (DNNs) consist of multiple layers of various types with hundreds to thousands of neurons. Embedded platforms are now becoming essential for deep learning deployment due to their portability, versatility, and energy efficiency. The large model size of DNNs, while providing excellent accuracy, also burdens the embedded platforms with intensive computation and storage. Researchers have investigated reducing DNN model size with negligible accuracy loss. This work proposes a Fast Fourier Transform (FFT)-based DNN training and inference model suitable for embedded platforms with reduced asymptotic complexity of both computation and storage, which distinguishes our approach from existing approaches. We develop the training and inference algorithms based on FFT as the computing kernel and deploy the FFT-based inference model on embedded platforms, achieving extraordinary processing speed.

Book Chapter•DOI•

Faster Gaussian Sampling for Trapdoor Lattices with Arbitrary Modulus

Nicholas Genise¹, Daniele Micciancio¹•Institutions (1)

University of California¹

29 Apr 2018

TL;DR: Improved Gaussian preimage sampling algorithms for the MP12 trapdoor lattices are proposed, including an off-line perturbation sampling algorithm based on a variant of the Fast Fourier Orthogonalization (FFO) algorithm that avoids the need to precompute and store the FFO matrix by a careful rearrangement of the operations.

Abstract: We present improved algorithms for Gaussian preimage sampling using the lattice trapdoors of (Micciancio and Peikert, CRYPTO 2012). The MP12 work only offered a highly optimized algorithm for the on-line stage of the computation in the special case when the lattice modulus q is a power of two. For arbitrary modulus q, the MP12 preimage sampling procedure resorted to general lattice algorithms with complexity cubic in the bitsize of the modulus (or quadratic, but with substantial preprocessing and storage overheads). Our new preimage sampling algorithm (for any modulus q) achieves linear complexity with very modest storage requirements, and experimentally outperforms the generic method of MP12 already for small values of q. As an additional contribution, we give a new, quasi-linear time algorithm for the off-line perturbation sampling phase of MP12 in the ring setting. Our algorithm is based on a variant of the Fast Fourier Orthogonalization (FFO) algorithm of (Ducas and Prest, ISSAC 2016), but avoids the need to precompute and store the FFO matrix by a careful rearrangement of the operations. All our algorithms are fairly simple, with small hidden constants, and offer a practical alternative to use the MP12 trapdoor lattices in a broad range of cryptographic applications.

Proceedings Article•DOI•

Optimizing the Fast Fourier Transform Using Mixed Precision on Tensor Core Hardware

Anumeena Sorna¹, Xiaohe Cheng², Eduardo D'Azevedo³, Kwai Won⁴, Stanimire Tomov⁴•Institutions (4)

National Institute of Technology, Tiruchirappalli¹, Hong Kong University of Science and Technology², Oak Ridge National Laboratory³, University of Tennessee⁴

01 Dec 2018

TL;DR: An algorithm that dynamically splits the input single precision dataset into two half precision sets at the lowest level, uses half precision multiplication, and recombines the result at a later step is developed, paving the way for using tensor cores for high precision inputs.

Abstract: The Fast Fourier Transform is a fundamental tool in scientific and technical computation. The highly parallelizable nature of the algorithm makes it a suitable candidate for GPU acceleration. This paper focuses on exploiting the speedup due to using the half precision multiplication capability of the latest GPUs' tensor core hardware without significantly degrading the precision of the Fourier Transform result. We develop an algorithm that dynamically splits the input single precision dataset into two half precision sets at the lowest level, uses half precision multiplication, and recombines the result at a later step. This work paves the way for using tensor cores for high precision inputs.
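
The split-and-recombine idea can be sketched without GPU hardware. Each input value is written as `hi + lo`, where both parts are representable in half precision; because the DFT is linear, the two parts can be transformed separately and the results added. This sketch only emulates half-precision storage through the `'e'` format of Python's `struct` module, performs the arithmetic in ordinary Python floats rather than on tensor cores, and applies no scaling to the residual; the helper names and the sample data are illustrative.

```python
import struct

def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack('e', struct.pack('e', x))[0]

def split_half(x):
    """Return (hi, lo), both representable in half precision, with hi + lo close to x."""
    hi = to_half(x)
    lo = to_half(x - hi)          # residual left over after rounding x to half
    return hi, lo

def dft4(v):
    """4-point DFT; its twiddle factors (1, -1, j, -j) are exact in any precision."""
    a, b, c, d = v
    return [a + b + c + d,
            (a - c) - 1j * (b - d),
            a - b + c - d,
            (a - c) + 1j * (b - d)]

data = [0.1, 0.7, -0.33, 0.9]
his, los = zip(*(split_half(x) for x in data))
reference = dft4(data)
half_only = dft4(his)                                         # rounded inputs only
recombined = [h + l for h, l in zip(dft4(his), dft4(los))]    # linearity of the DFT

err_half = max(abs(a - b) for a, b in zip(half_only, reference))
err_split = max(abs(a - b) for a, b in zip(recombined, reference))
```

In this example the recombined result lands much closer to the reference than the half-only transform, since the second half-precision value captures most of the rounding error of the first.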

Posted Content•

Numerical Analysis for Iterative Filtering with New Efficient Implementations Based on FFT

Antonio Cicone¹, Haomin Zhou²•Institutions (2)

University of L'Aquila¹, Georgia Institute of Technology²

05 Feb 2018-arXiv: Numerical Analysis

TL;DR: A new efficient implementation of the Iterative Filtering algorithm, called Fast Iterative Filtering, is provided; it reduces the computational complexity of the original iterative algorithm by using, in a nontrivial way, the Fast Fourier Transform in the computations.

Abstract: Real life signals are in general non-stationary and non-linear. The development of methods able to extract their hidden features in a fast and reliable way is of high importance in many research fields. In this work we tackle the problem of further analyzing the convergence of the Iterative Filtering method both in a continuous and a discrete setting in order to provide a comprehensive analysis of its behavior. Based on these results we provide new ideas for efficient implementations of the Iterative Filtering algorithm which are based on the Fast Fourier Transform (FFT), and the reduction of the original iterative algorithm to a direct method.
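
The reduction to a direct method can be sketched as follows, assuming periodic boundary conditions: one filtering step subtracts the circular moving average $w \circledast x$ from the signal, which in the DFT domain multiplies each coefficient by $(1 - \hat{w}_k)$, so $m$ steps collapse to a single multiplication by $(1 - \hat{w}_k)^m$. The filter, the fixed step count, and the function names are illustrative assumptions; the stopping criterion of the actual method is not reproduced.

```python
import cmath

def fft(x, sign=-1):
    """Recursive radix-2 DFT (unscaled); sign=+1 uses the inverse kernel."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2], sign), fft(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def circular_filter_step(x, w):
    """One filtering step: subtract the circular moving average of x under w."""
    n = len(x)
    return [x[i] - sum(w[j] * x[(i - j) % n] for j in range(n)) for i in range(n)]

def direct_filtering(x, w, steps):
    """Same result as `steps` calls of circular_filter_step, via one FFT pair."""
    n = len(x)
    X, W = fft(x), fft(w)
    Y = [(1 - W[k]) ** steps * X[k] for k in range(n)]
    return [v.real / n for v in fft(Y, sign=+1)]

x = [1, 3, 2, 5, 4, 6, 5, 8]
w = [0.5, 0.25, 0, 0, 0, 0, 0, 0.25]      # symmetric averaging kernel, sums to 1
iterated = list(x)
for _ in range(5):
    iterated = circular_filter_step(iterated, w)
direct = direct_filtering(x, w, 5)
```

The iterated and direct results agree up to rounding, while the direct form needs two forward transforms and one inverse transform regardless of the number of steps.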
