
Showing papers on "Bilinear interpolation" published in 2020


Proceedings ArticleDOI
14 Jun 2020
TL;DR: Introduces a unified attention block, the X-Linear attention block, that fully employs bilinear pooling to selectively capitalize on visual information or perform multi-modal reasoning.
Abstract: Recent progress on fine-grained visual recognition and visual question answering has featured bilinear pooling, which effectively models the 2nd-order interactions across multi-modal inputs. Nevertheless, there has been no evidence in support of building such interactions concurrently with an attention mechanism for image captioning. In this paper, we introduce a unified attention block, the X-Linear attention block, that fully employs bilinear pooling to selectively capitalize on visual information or perform multi-modal reasoning. Technically, the X-Linear attention block simultaneously exploits both the spatial and channel-wise bilinear attention distributions to capture the 2nd-order interactions between the input single-modal or multi-modal features. Higher and even infinity order feature interactions are readily modeled through stacking multiple X-Linear attention blocks and equipping the block with Exponential Linear Unit (ELU) in a parameter-free fashion, respectively. Furthermore, we present X-Linear Attention Networks (dubbed X-LAN) that integrate X-Linear attention block(s) into the image encoder and sentence decoder of an image captioning model to leverage higher-order intra- and inter-modal interactions. Experiments on the COCO benchmark demonstrate that X-LAN obtains the best published CIDEr performance to date, 132.0%, on the COCO Karpathy test split. When Transformer is further endowed with X-Linear attention blocks, CIDEr is boosted to 132.8%. Source code is available at https://github.com/Panda-Peter/image-captioning.

401 citations


Journal ArticleDOI
TL;DR: A deep bilinear model for blind image quality assessment that works for both synthetically and authentically distorted images and achieves state-of-the-art performance on both synthetic and authentic IQA databases is proposed.
Abstract: We propose a deep bilinear model for blind image quality assessment that works for both synthetically and authentically distorted images. Our model consists of two streams of deep convolutional neural networks (CNNs), specializing in two distortion scenarios separately. For synthetic distortions, we first pre-train a CNN to classify the distortion type and the level of an input image, whose ground truth label is readily available at a large scale. For authentic distortions, we make use of a pre-trained CNN (VGG-16) for the image classification task. The two feature sets are bilinearly pooled into one representation for a final quality prediction. We fine-tune the whole network on the target databases using a variant of stochastic gradient descent. The extensive experimental results show that the proposed model achieves state-of-the-art performance on both synthetic and authentic IQA databases. Furthermore, we verify the generalizability of our method on the large-scale Waterloo Exploration Database, and demonstrate its competitiveness using the group maximum differentiation competition methodology.

390 citations


Journal ArticleDOI
TL;DR: A novel training strategy is presented which can form a new object-level attention mechanism for the model during the training phase, and bilinear pooling is utilized to improve the model capability of detecting local contrast casting defects.
Abstract: Automatic detection of casting defects on radiography images is an important technology for automating digital radiography defect inspection. In industrial applications, conventional methods are inefficient when the detection targets are small, local, and subtle in a complex scenario. Meanwhile, the strong performance of deep learning models, such as the convolutional neural network (CNN), depends on a huge volume of data with precise annotations. To overcome these challenges, an efficient CNN model, trained only with image-level labels, is first proposed for detection of tiny casting defects in a complicated industrial scene. Then, in this article, we present a novel training strategy which can form a new object-level attention mechanism for the model during the training phase, and bilinear pooling is utilized to improve the model's capability of detecting local contrast casting defects. Moreover, to enhance the interpretability, we extend class activation maps (CAM) to bilinear CAM (Bi-CAM), which is adapted to bilinear architectures as a visualization technique to reason about the model output. Experimental results show that the proposed model achieves superior performance in terms of each quantitative metric and is suitable for most actual applications. The real-time defect detection of castings is efficiently implemented in the complex scenario.

80 citations


Journal ArticleDOI
TL;DR: In this paper, an adaptive filtering-based recursive identification algorithm for joint estimation of states and parameters of bilinear state-space systems with an autoregressive moving average noise was developed.
Abstract: This study develops an adaptive filtering-based recursive identification algorithm for joint estimation of states and parameters of bilinear state-space systems with an autoregressive moving average noise. In order to handle the correlated noise and unmeasurable states in parameter estimation, an adaptive filter is established to whiten the coloured noise and a bilinear state observer is constructed to update the unavailable states recursively. Then a hierarchical generalised extended least squares (HGELS) algorithm and an adaptive filtering-based HGELS algorithm are developed for simultaneously estimating the unknown states and parameters. The convergence analysis indicates that the parameter estimates can converge to their true values. A numerical example illustrates the convergence results.
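The abstract does not write out the model, so the following is only a sketch of the system class being identified. It assumes the common single-input form x(t+1) = A x(t) + B x(t) u(t) + f u(t), y(t) = h x(t), with the noise term left out; the matrices `A`, `B`, `f`, `h` and the function names are hypothetical values chosen for illustration, not taken from the paper.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def bilinear_step(A, B, f, x, u):
    """One state update: x_next = A x + (B x) u + f u.

    The (B x) u term is the product of state and input, which is what
    makes the system bilinear rather than linear.
    """
    Ax = mat_vec(A, x)
    Bx = mat_vec(B, x)
    return [a + b * u + fi * u for a, b, fi in zip(Ax, Bx, f)]


def simulate(A, B, f, h, x0, inputs):
    """Return the noise-free outputs y(t) = h x(t) for an input sequence."""
    x = list(x0)
    outputs = []
    for u in inputs:
        outputs.append(sum(hi * xi for hi, xi in zip(h, x)))
        x = bilinear_step(A, B, f, x, u)
    return outputs


# Hypothetical second-order system, used only to exercise the functions.
A = [[0.5, 0.1], [0.0, 0.4]]
B = [[0.1, 0.0], [0.0, 0.2]]
f = [1.0, 0.5]
h = [1.0, 0.0]
ys = simulate(A, B, f, h, [0.0, 0.0], [1.0, 0.0, 1.0])
```

An identification algorithm of the kind described would see only the inputs and (noisy) outputs of such a simulation and would have to estimate both the parameters and the hidden state sequence.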

61 citations


Journal ArticleDOI
TL;DR: A bilinear state observer is established to update the unavailable states recursively, and a new least squares based efficient estimation algorithm is presented for simultaneously estimating the unknown states, parameters and time delay.
Abstract: This paper develops a redundant recursive identification algorithm for joint estimation of states and parameters of bilinear state-space systems with time delays. In order to handle measurement delays in parameter identification and state estimation, the bilinear model is transformed to an extended identification model according to the redundant rule. In this regard, a bilinear state observer is established to update the unavailable states recursively, and a new least squares based efficient estimation algorithm is presented for simultaneously estimating the unknown states, parameters and time delay. The effectiveness of the proposed algorithm is evaluated by a numerical example.

57 citations


Journal ArticleDOI
02 Apr 2020-Sensors
TL;DR: Proposes a more efficient and lightweight convolutional neural network method that improves classification accuracy with a small training dataset; it includes feature fusion with bilinear pooling, which can greatly improve performance and accuracy for remote sensing scene classification.
Abstract: Classifying remote sensing images is vital for interpreting image content. Presently, remote sensing image scene classification methods using convolutional neural networks have drawbacks, including excessive parameters and heavy calculation costs. More efficient and lightweight CNNs have fewer parameters and calculations, but their classification performance is generally weaker. We propose a more efficient and lightweight convolutional neural network method to improve classification accuracy with a small training dataset. Inspired by fine-grained visual recognition, this study introduces a bilinear convolutional neural network model for scene classification. First, the lightweight convolutional neural network, MobileNetv2, is used to extract deep and abstract image features. Each feature is then transformed into two features with two different convolutional layers. The transformed features are subjected to a Hadamard product operation to obtain an enhanced bilinear feature. Finally, the bilinear feature after pooling and normalization is used for classification. Experiments are performed on three widely used datasets: UC Merced, AID, and NWPU-RESISC45. Compared with other state-of-the-art methods, the proposed method has fewer parameters and calculations, while achieving higher accuracy. By including feature fusion with bilinear pooling, performance and accuracy for remote sensing scene classification can greatly improve. This could be applied to any remote sensing image classification task.
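The transform, Hadamard product, pooling, and normalization steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the two convolutional layers act as per-location linear maps (as 1x1 convolutions would), that pooling is an average over locations, and that normalization is signed square root followed by L2 scaling. The weights `W1`, `W2` and all function names are hypothetical.

```python
import math


def project(W, x):
    """Apply a per-location linear map (what a 1x1 convolution does)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def hadamard_bilinear_pool(features, W1, W2):
    """features: one feature vector per spatial location."""
    d = len(W1)
    pooled = [0.0] * d
    for x in features:
        a = project(W1, x)  # first transformed feature
        b = project(W2, x)  # second transformed feature
        for k in range(d):
            pooled[k] += a[k] * b[k]  # element-wise (Hadamard) product
    pooled = [p / len(features) for p in pooled]  # average pooling
    # Signed square root, then L2 normalization (assumed choice).
    signed_sqrt = [math.copysign(math.sqrt(abs(p)), p) for p in pooled]
    norm = math.sqrt(sum(s * s for s in signed_sqrt))
    return [s / norm for s in signed_sqrt] if norm > 0 else signed_sqrt


# Two spatial locations with two channels each (toy values).
features = [[1.0, 2.0], [3.0, 0.0]]
W1 = [[1.0, 0.0], [0.0, 1.0]]
W2 = [[1.0, 1.0], [1.0, -1.0]]
descriptor = hadamard_bilinear_pool(features, W1, W2)
```

The Hadamard form keeps the descriptor at the projection width d, whereas a full outer product of two d-dimensional features would have d squared entries, which is consistent with the abstract's emphasis on a small parameter count.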

57 citations


Journal ArticleDOI
TL;DR: In this paper, Carando et al. showed that the multivariable Rubio de Francia extrapolation theorem for the multilinear Muckenhoupt classes $$A_{\vec{p},\vec{r}}$$ can be extended to bilinear rough singular integral operators.

53 citations


Posted Content
TL;DR: An improved DepthNet, HR-Depth, is presented with two effective strategies: (1) re-design the skip-connection in DepthNet to get better high-resolution features and (2) propose feature fusion Squeeze-and-Excitation module to fuse feature more efficiently.
Abstract: Self-supervised learning shows great potential in monocular depth estimation, using image sequences as the only source of supervision. Although people try to use the high-resolution image for depth estimation, the accuracy of prediction has not been significantly improved. In this work, we find the core reason comes from the inaccurate depth estimation in large gradient regions, making the bilinear interpolation error gradually disappear as the resolution increases. To obtain more accurate depth estimation in large gradient regions, it is necessary to obtain high-resolution features with spatial and semantic information. Therefore, we present an improved DepthNet, HR-Depth, with two effective strategies: (1) re-design the skip-connection in DepthNet to get better high-resolution features and (2) propose a feature fusion Squeeze-and-Excitation (fSE) module to fuse features more efficiently. Using ResNet-18 as the encoder, HR-Depth surpasses all previous state-of-the-art (SoTA) methods with the fewest parameters at both high and low resolution. Moreover, previous state-of-the-art methods are based on fairly complex and deep networks with a mass of parameters, which limits their real applications. Thus we also construct a lightweight network which uses MobileNetV3 as the encoder. Experiments show that the lightweight network can perform on par with many large models like Monodepth2 at high resolution with only 20% of the parameters. All codes and models will be available at this https URL.

52 citations


Journal ArticleDOI
TL;DR: Experimental results show that the S-K algorithm performs better than the other methods considered, including bilinear and bicubic interpolation and quasi-IIR (Infinite Impulse Response) approximation.

44 citations


Journal ArticleDOI
TL;DR: In this paper, a multi-objective matrix normalization (MOMN) method is proposed to simultaneously normalize a bilinear representation in terms of square-root, low-rank, and sparsity.
Abstract: Bilinear pooling achieves great success in fine-grained visual recognition (FGVC). Recent methods have shown that the matrix power normalization can stabilize the second-order information in bilinear features, but some problems, e.g., redundant information and over-fitting, remain to be resolved. In this paper, we propose an efficient Multi-Objective Matrix Normalization (MOMN) method that can simultaneously normalize a bilinear representation in terms of square-root, low-rank, and sparsity. These three regularizers can not only stabilize the second-order information, but also compact the bilinear features and promote model generalization. In MOMN, a core challenge is how to jointly optimize three non-smooth regularizers of different convex properties. To this end, MOMN first formulates them into an augmented Lagrange formula with approximated regularizer constraints. Then, auxiliary variables are introduced to relax different constraints, which allow each regularizer to be solved alternately. Finally, several updating strategies based on gradient descent are designed to obtain consistent convergence and efficient implementation. Consequently, MOMN is implemented with only matrix multiplication, which is well-compatible with GPU acceleration, and the normalized bilinear features are stabilized and discriminative. Experiments on five public benchmarks for FGVC demonstrate that the proposed MOMN is superior to existing normalization-based methods in terms of both accuracy and efficiency. The code is available: https://github.com/mboboGO/MOMN .

40 citations


Journal ArticleDOI
TL;DR: Experimental results on real multispectral data sets demonstrate the superiority of the proposed bilinear convolutional neural networks method over several well-known change-detection approaches.
Abstract: Recently, deep learning has been demonstrated to be an effective tool to detect changes in bitemporal remote sensing images. However, most existing methods based on deep learning obtain the ultimate change map by analyzing the difference image (DI) or the stacked feature vectors of input images, which cannot sufficiently capture the relationship between the two input images to obtain the change information. In this letter, a new method named bilinear convolutional neural networks (BCNNs) is proposed to detect changes in bitemporal multispectral images. The model can be trained end to end with two symmetric convolutional neural networks (CNNs), which are capable of learning the feature representation from bitemporal images and utilizing the relations between the two input images by a linear outer product operation in an effective way. Specifically, two sets of patches obtained from two multispectral images of different times are first input into two CNNs to extract deep features, respectively. Then, the matrix outer product is applied on the output feature maps to obtain the combined bilinear features. Finally, the ultimate change detected result can be produced by applying the softmax classifier on the combined features. Experimental results on real multispectral data sets demonstrate the superiority of the proposed method over several well-known change-detection approaches.
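The outer-product fusion and softmax classification described in this abstract can be sketched as below. This is an illustrative reading, not the authors' network: the feature values stand in for the outputs of the two CNN streams, the sum over spatial locations is an assumed pooling choice, and the classifier weights are hypothetical.

```python
import math


def outer_product_pool(feats_a, feats_b):
    """Sum the outer product of the two streams' features over locations.

    feats_a[k] and feats_b[k] are the channel vectors of the two dates'
    feature maps at the same spatial location k.
    """
    c_a, c_b = len(feats_a[0]), len(feats_b[0])
    pooled = [[0.0] * c_b for _ in range(c_a)]
    for fa, fb in zip(feats_a, feats_b):
        for i in range(c_a):
            for j in range(c_b):
                pooled[i][j] += fa[i] * fb[j]
    # Flatten the c_a x c_b matrix into one bilinear feature vector.
    return [v for row in pooled for v in row]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy features at two locations for the two acquisition dates.
feats_t1 = [[1.0, 0.0], [0.0, 2.0]]
feats_t2 = [[1.0, 1.0], [0.5, 0.0]]
bilinear = outer_product_pool(feats_t1, feats_t2)

# Hypothetical linear classifier with two output classes.
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
logits = [sum(w * b for w, b in zip(row, bilinear)) for row in W]
probs = softmax(logits)
```

Every channel of one date is multiplied with every channel of the other, so the pooled vector encodes pairwise relations between the two images rather than only their difference.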

Journal ArticleDOI
TL;DR: A novel multimodal bilinear fusion network (MBFNet) is proposed, which is used to fuse the optical and SAR features for land cover classification and achieves more effective land cover classification performance than the state-of-the-art methods.
Abstract: As two different tools for earth observation, the optical and synthetic aperture radar (SAR) images can provide complementary information of the same land types for better land cover classification. However, because of the different imaging mechanisms of optical and SAR images, how to efficiently exploit the complementary information becomes an interesting and challenging problem. In this article, we propose a novel multimodal bilinear fusion network (MBFNet), which is used to fuse the optical and SAR features for land cover classification. The MBFNet consists of three components: the feature extractor, the second-order attention-based channel selection module (SACSM), and the bilinear fusion module. First, to keep the network parameters from being biased toward the dominant modality, a pseudo-siamese convolutional neural network (CNN) is taken as the feature extractor to extract deep semantic feature maps of optical and SAR images, respectively. Then, the SACSM is embedded into each stream, and the fine channel-attention maps with second-order statistics are obtained by bilinearly integrating the global average-pooling and global max-pooling information. The SACSM not only automatically highlights the important channels of feature maps to improve the representation power of networks, but also uses the channel selection mechanism to reconfigure compact feature maps with better discrimination. Finally, bilinear pooling is used as the feature-level fusion method, which establishes the second-order association between two compact feature maps of the optical and SAR streams to obtain the low-dimension bilinear fusion features for land cover classification. Experimental results on three broad coregistered optical and SAR datasets demonstrate that our method achieves more effective land cover classification performance than the state-of-the-art methods.

Journal ArticleDOI
TL;DR: A new type of EEG classification network, the separable EEGNet (S-EEGNet), is proposed based on Hilbert–Huang transform and a separable convolutional neural network (CNN) with bilinear interpolation.
Abstract: As one of the most important research fields in the brain-computer interface (BCI) field, electroencephalogram (EEG) classification has a wide range of applications. However, it is difficult for traditional neural networks to capture the characteristics of the EEG signal comprehensively from the time and space dimensions, which affects the accuracy of EEG classification. To solve this problem, we can improve the accuracy of classification via end-to-end learning of the time and space dimensions of EEG. In this paper, a new type of EEG classification network, the separable EEGNet (S-EEGNet), is proposed based on Hilbert-Huang transform (HHT) and a separable convolutional neural network (CNN) with bilinear interpolation. The EEG signal is transformed into time-frequency representation by HHT, which allows the EEG signal to be better described in the frequency domain. Then, the depthwise and pointwise elements of the network are combined to extract the feature map. The displacement variable is added by the bilinear interpolation method to the convolution layer of the separable CNN, allowing the free deformation of the sampling grid. The deformation depends on the local, dense, and adaptive input characteristics of the EEG data. The network can learn from the time and space dimensions of EEG signals end to end to extract features to improve the accuracy of EEG classification. To show the effectiveness of S-EEGNet, we tested the method on two different types of public EEG datasets (motor imagery classification and emotion classification). The accuracy of motor imagery classification is 77.9%, and the accuracy of emotion classification is 89.91% and 88.31%, respectively. The experimental results showed that the classification accuracy of S-EEGNet improved by 3.6%, 1.15%, and 1.33%, respectively.
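When a learned displacement moves a sampling point off the integer grid, the feature value there has to be interpolated from its four neighbours. The sketch below shows standard bilinear interpolation used that way; the clamping at the border, the toy grid, and the function names are assumptions for illustration and are not taken from S-EEGNet.

```python
import math


def bilinear_sample(grid, y, x):
    """Interpolate grid (list of rows) at a fractional position (y, x)."""
    h, w = len(grid), len(grid[0])
    # Clamp to the valid range so border samples stay defined.
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = (1 - dx) * grid[y0][x0] + dx * grid[y0][x1]
    bottom = (1 - dx) * grid[y1][x0] + dx * grid[y1][x1]
    return (1 - dy) * top + dy * bottom


def deformed_sample(grid, y, x, offset):
    """Sample at an integer grid point shifted by a (dy, dx) offset."""
    return bilinear_sample(grid, y + offset[0], x + offset[1])


grid = [[0.0, 10.0],
        [20.0, 30.0]]
center = bilinear_sample(grid, 0.5, 0.5)
shifted = deformed_sample(grid, 0, 0, (0.25, 0.5))
```

Because the interpolated value is a piecewise-linear function of the offset, gradients can flow back to whatever produces the offsets, which is what lets a sampling grid deform during end-to-end training.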

Posted Content
TL;DR: This work designs a self-guided upsample module to tackle the interpolation blur problem caused by bilinear upsampling between pyramid levels, and proposes a pyramid distillation loss to add supervision for intermediate levels via distilling the finest flow as pseudo labels.
Abstract: We present an unsupervised learning approach for optical flow estimation by improving the upsampling and learning of the pyramid network. We design a self-guided upsample module to tackle the interpolation blur problem caused by bilinear upsampling between pyramid levels. Moreover, we propose a pyramid distillation loss to add supervision for intermediate levels via distilling the finest flow as pseudo labels. By integrating these two components together, our method achieves the best performance for unsupervised optical flow learning on multiple leading benchmarks, including MPI-Sintel, KITTI 2012 and KITTI 2015. In particular, we achieve EPE=1.4 on KITTI 2012 and F1=9.38% on KITTI 2015, which outperform the previous state-of-the-art methods by 22.2% and 15.7%, respectively.

Journal ArticleDOI
TL;DR: A two-stage recursive generalized extended least squares algorithm is presented to reduce the computational burden by using the hierarchical identification principle and the auxiliary model identification idea, respectively, and a stochastic gradient identification algorithm is proposed for comparison.
Abstract: This paper considers the parameter identification for a class of nonlinear stochastic systems with colored noise. An input-output representation is derived by eliminating the state variables in the bilinear system. Based on the obtained identification model, a recursive generalized extended least squares algorithm is proposed by using the auxiliary model identification idea. Moreover, a two-stage recursive generalized extended least squares algorithm is presented to reduce the computational burden by using the hierarchical identification principle and the auxiliary model identification idea, respectively. A stochastic gradient identification algorithm is proposed for comparison. The simulation results show that the proposed algorithms perform well in estimating the parameters of bilinear systems with colored noise.

Posted Content
TL;DR: This letter describes how the Koopman operator can be used to generate approximate linear, bilinear, and nonlinear model realizations from data, and argues in favor of bilinear realizations for characterizing systems with unknown dynamics.
Abstract: Nonlinear dynamical systems can be made easier to control by lifting them into the space of observable functions, where their evolution is described by the linear Koopman operator. This paper describes how the Koopman operator can be used to generate approximate linear, bilinear, and nonlinear model realizations from data, and argues in favor of bilinear realizations for characterizing systems with unknown dynamics. Necessary and sufficient conditions for a dynamical system to have a valid linear or bilinear realization over a given set of observable functions are presented and used to show that every control-affine system admits an infinite-dimensional bilinear realization, but does not necessarily admit a linear one. Therefore, approximate bilinear realizations constructed from generic sets of basis functions tend to improve as the number of basis functions increases, whereas approximate linear realizations may not. To demonstrate the advantages of bilinear Koopman realizations for control, a linear, bilinear, and nonlinear Koopman model realization of a simulated robot arm are constructed from data. In a trajectory following task, the bilinear realization exceeds the prediction accuracy of the linear realization and the computational efficiency of the nonlinear realization when incorporated into a model predictive control framework.

Journal ArticleDOI
TL;DR: In this paper, the behavior of specific dispersive waves in a new 3D-HB equation is studied and a Bäcklund transformation and a Hirota bilinear form of the model are first extracted from the truncated Painlevé expansion.
Abstract: The behavior of specific dispersive waves in a new $$(3+1)$$-dimensional Hirota bilinear (3D-HB) equation is studied. A Bäcklund transformation and a Hirota bilinear form of the model are first extracted from the truncated Painlevé expansion. Through a series of mathematical analyses, it is then revealed that the new 3D-HB equation possesses a series of rational-type solutions. The interaction of lump-type and 1-soliton solutions is studied and some interesting and useful results are presented.


Journal ArticleDOI
TL;DR: In this article, a general method is provided for classifying, up to equivalence, all bilinear maps f : V × V → V such that d i m ( r a...
Abstract: Let V be an n-dimensional linear space over an algebraically closed base field. We provide a general method for classifying, up to equivalence, all bilinear maps f : V × V → V such that d i m ( r a...

Journal ArticleDOI
TL;DR: In this paper, a generalized B-type Kadomtsev-Petviashvili equation was used to construct new mixed-type periodic and lump-type solutions via dependent variable transformation and Hirota's bilinear form.
Abstract: This paper aims to construct new mixed-type periodic and lump-type solutions via dependent variable transformation and Hirota’s bilinear form (general bilinear techniques). This study considers the (3 + 1)-dimensional generalized B-type Kadomtsev–Petviashvili equation, which describes the weakly dispersive waves in a homogeneous medium in fluid dynamics. The obtained solutions contain abundant physical structures. Consequently, the dynamical behaviors of these solutions are graphically discussed for different choices of the free parameters through 3D plots.

Journal ArticleDOI
TL;DR: In this paper, a general framework to construct fractal interpolation surfaces (FISs) on rectangular grids was presented and bilinear FIS was deduced by Ruan and Xu (Bull Aust Math Soc 91(3):435-446, 2015).
Abstract: A general framework to construct fractal interpolation surfaces (FISs) on rectangular grids was presented and bilinear FIS was deduced by Ruan and Xu (Bull Aust Math Soc 91(3):435–446, 2015). From the view point of operator theory and the stand point of developing some approximation aspects, we revisit the aforementioned construction to obtain a fractal analogue of a prescribed continuous function defined on a rectangular region in $${\mathbb {R}}^2$$. This approach leads to a bounded linear operator analogous to the so-called $$\alpha $$-fractal operator associated with the univariate fractal interpolation function. Several elementary properties of this bivariate fractal operator are reported. We extend the fractal operator to the $${\mathcal {L}}^p$$-spaces for $$1 \le p < \infty $$. Some approximation aspects of the bivariate continuous fractal functions are also discussed.

Journal ArticleDOI
03 Apr 2020
TL;DR: A new feature fusion algorithm, the factorized bilinear coding (FBC) method, is devised, which can generate compact and discriminative representations with substantially fewer parameters.
Abstract: Bilinear pooling has achieved state-of-the-art performance on fusing features in various machine learning tasks, owing to its ability to capture complex associations between features. Despite the success, bilinear pooling suffers from redundancy and burstiness issues, mainly due to the rank-one property of the resulting representation. In this paper, we prove that bilinear pooling is indeed a similarity-based coding-pooling formulation. This establishment then enables us to devise a new feature fusion algorithm, the factorized bilinear coding (FBC) method, to overcome the drawbacks of bilinear pooling. We show that FBC can generate compact and discriminative representations with substantially fewer parameters. Experiments on two challenging tasks, namely image classification and visual question answering, demonstrate that our method surpasses the bilinear pooling technique by a large margin.

Journal ArticleDOI
TL;DR: This paper reviews the classical problem of free-form curve registration and applies it to an efficient RGB-D visual odometry system called Canny-VO, as it efficiently tracks all Canny edge features extracted from the images.
Abstract: The present paper reviews the classical problem of free-form curve registration and applies it to an efficient RGB-D visual odometry system called Canny-VO, as it efficiently tracks all Canny edge features extracted from the images. Two replacements for the distance transformation commonly used in edge registration are proposed: Approximate Nearest Neighbour Fields and Oriented Nearest Neighbour Fields. 3D-2D edge alignment benefits from these alternative formulations in terms of both efficiency and accuracy. It removes the need for the more computationally demanding paradigms of data-to-model registration, bilinear interpolation, and sub-gradient computation. To ensure robustness of the system in the presence of outliers and sensor noise, the registration is formulated as a maximum a posteriori problem, and the resulting weighted least squares objective is solved by the iteratively re-weighted least squares method. A variety of robust weight functions are investigated and the optimal choice is made based on the statistics of the residual errors. Efficiency is furthermore boosted by an adaptively sampled definition of the nearest neighbour fields. Extensive evaluations on public SLAM benchmark sequences demonstrate state-of-the-art performance and an advantage over classical Euclidean distance fields.
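The iteratively re-weighted least squares idea mentioned in this abstract can be illustrated on a much smaller problem than edge registration. The sketch below estimates a single robust location parameter from samples containing one outlier. The Huber weight function is an assumption made for the example (the abstract says several weight functions were compared but does not name the chosen one), and all names are hypothetical.

```python
def huber_weight(residual, delta):
    """Huber weight: 1 inside the threshold, delta/|r| outside it."""
    r = abs(residual)
    return 1.0 if r <= delta else delta / r


def irls_location(samples, delta=1.0, iterations=50, tol=1e-12):
    """Robust location estimate by iteratively re-weighted least squares."""
    estimate = sum(samples) / len(samples)  # ordinary least-squares start
    for _ in range(iterations):
        # Re-compute weights from the current residuals ...
        weights = [huber_weight(s - estimate, delta) for s in samples]
        # ... then solve the weighted least-squares problem in closed form.
        updated = sum(w * s for w, s in zip(weights, samples)) / sum(weights)
        if abs(updated - estimate) < tol:
            return updated
        estimate = updated
    return estimate


samples = [1.0, 1.1, 0.9, 1.05, 0.95, 10.0]  # last value is an outlier
plain_mean = sum(samples) / len(samples)
robust = irls_location(samples)
```

Each pass down-weights samples with large residuals, so the outlier pulls the robust estimate far less than it pulls the plain mean. In a registration setting the closed-form weighted mean would be replaced by a weighted least-squares solve for the pose parameters.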

Journal ArticleDOI
TL;DR: The proposed Md-Net outperforms the representative medical image segmentation methods, including Unet, Unet++, MaskRcnn and CE-Net, in terms of sensitivity, accuracy and area under curve both on lung dataset and Bladder dataset.
Abstract: Accurate CT image segmentation is of great importance to clinical diagnosis. Due to the high similarity of gray values in CT images, the segmented areas are easily affected by their surroundings, which leads to the loss of semantic information. In this paper, we propose a multi-scale dilated convolution network (Md-Net) for CT image segmentation with superior segmentation performance compared with state-of-the-art methods. Specifically, our Md-Net utilizes dilated convolutions with different sizes to form feature pyramids for extracting the semantic information. Moreover, we use a weighted Dice loss to accelerate convergence during training. Meanwhile, bilinear interpolation and multiple convolutions are used to reduce the computational cost. Experimental results show that our proposed Md-Net outperforms the representative medical image segmentation methods, including Unet, Unet++, MaskRcnn and CE-Net, in terms of sensitivity, accuracy and area under curve on both the lung dataset and the bladder dataset.

Journal ArticleDOI
TL;DR: An autoencoder-like network architecture is presented to learn disentangled shape and pose embedding specifically for the 3D human body to improve the reconstruction accuracy and construct a large dataset of human body models with consistent connectivity for the learning of the neural network.
Abstract: Human bodies exhibit various shapes for different identities or poses, but the body shape has certain similarities in structure and thus can be embedded in a low-dimensional space. This article presents an autoencoder-like network architecture to learn disentangled shape and pose embedding specifically for the 3D human body. This is inspired by recent progress of deformation-based latent representation learning. To improve the reconstruction accuracy, we propose a hierarchical reconstruction pipeline for the disentangling process and construct a large dataset of human body models with consistent connectivity for the learning of the neural network. Our learned embedding can not only achieve superior reconstruction accuracy but also provide great flexibility in 3D human body generation via interpolation, bilinear interpolation, and latent space sampling. The results from extensive experiments demonstrate the effectiveness of our learned 3D human body embedding in various applications.

Journal ArticleDOI
TL;DR: In this paper, the maximal estimates for the bilinear spherical average and the bilinear Bochner-Riesz operator on the optimal range were studied, and a connection was drawn between these maximal estimates and the square function of the classical Bochner-Riesz operator.

Proceedings Article
01 Apr 2020
TL;DR: This work restricts to bilinear zero-sum games and gives a systematic analysis of popular gradient updates, for both simultaneous and alternating versions, offering formal evidence that alternating updates converge "better" than simultaneous ones.
Abstract: Min-max formulations have attracted great attention in the ML community due to the rise of deep generative models and adversarial methods, while understanding the dynamics of gradient algorithms for solving such formulations has remained a grand challenge. As a first step, we restrict to bilinear zero-sum games and give a systematic analysis of popular gradient updates, for both simultaneous and alternating versions. We provide exact conditions for their convergence and find the optimal parameter setup and convergence rates. In particular, our results offer formal evidence that alternating updates converge "better" than simultaneous ones.
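
The difference between simultaneous and alternating updates can be seen on the simplest bilinear zero-sum game, f(x, y) = x * y, where x minimises and y maximises. The following is a minimal sketch using plain gradient descent-ascent only; the paper analyses a wider family of updates, and the step size, starting point, and function names here are assumptions for illustration.

```python
import math


def simultaneous_gda(x, y, lr, steps):
    """Both players update from the same (old) iterate."""
    for _ in range(steps):
        gx, gy = y, x  # df/dx = y and df/dy = x for f(x, y) = x * y
        x, y = x - lr * gx, y + lr * gy
    return x, y


def alternating_gda(x, y, lr, steps):
    """The max player reacts to the min player's freshly updated x."""
    for _ in range(steps):
        x = x - lr * y
        y = y + lr * x
    return x, y


sim = simultaneous_gda(1.0, 1.0, lr=0.1, steps=1000)
alt = alternating_gda(1.0, 1.0, lr=0.1, steps=1000)
print(math.hypot(*sim), math.hypot(*alt))
```

For this game the simultaneous update multiplies the squared distance to the equilibrium (0, 0) by 1 + lr^2 at every step, so the iterates spiral outward. The alternating update preserves the quantity x^2 + y^2 - lr * x * y, so its iterates stay on a bounded orbit around the equilibrium; with plain gradient steps they cycle rather than converge.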

Book ChapterDOI
TL;DR: The proposed approach is data-driven and relies on the use of time-series data generated from the control dynamical system for the lifting of a nonlinear system in the Koopman eigenfunction coordinates to construct a finite-dimensional bilinear representation of a control-affine nonlinear dynamical system.
Abstract: We propose the application of Koopman operator theory for the design of a stabilizing feedback controller for a nonlinear control system. The proposed approach is data-driven and relies on the use of time-series data generated from the control dynamical system for the lifting of a nonlinear system in the Koopman eigenfunction coordinates. In particular, a finite-dimensional bilinear representation of a control-affine nonlinear dynamical system is constructed in the Koopman eigenfunction coordinates using time-series data. Sample complexity results are used to determine the data required to achieve the desired level of accuracy for the approximate bilinear representation of the nonlinear system in Koopman eigenfunction coordinates. A control Lyapunov function-based approach is proposed for the design of the stabilizing feedback controller. A systematic convex optimization-based formulation is proposed for the search for a control Lyapunov function. Several numerical examples are presented to demonstrate the application of the proposed data-driven stabilization approach.
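
Why a control-affine system becomes bilinear in Koopman eigenfunction coordinates can be sketched with the chain rule. The notation below is an assumption for illustration and is not taken from the paper: a scalar input $u$, eigenfunctions $\varphi_i$ of the uncontrolled drift $f$ with eigenvalues $\lambda_i$, and a matrix $B$ that approximates the action of the input vector field $g$ on the eigenfunctions.

```latex
% Control-affine dynamics and the evolution of one eigenfunction of the drift:
\dot{x} = f(x) + g(x)\,u,
\qquad
\frac{d}{dt}\varphi_i(x)
  = \nabla\varphi_i(x)^{\top}\bigl(f(x) + g(x)\,u\bigr)
  = \lambda_i\,\varphi_i(x) + u\,\nabla\varphi_i(x)^{\top} g(x).

% Assumption: the input term is (approximately) in the span of the eigenfunctions,
%   \nabla\varphi_i(x)^{\top} g(x) \approx \sum_{j=1}^{N} B_{ij}\,\varphi_j(x).
% With z = (\varphi_1(x), \dots, \varphi_N(x))^{\top} and
% \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_N):
\dot{z} \approx \Lambda z + u\,B z.
```

The lifted state $z$ evolves linearly when $u = 0$, and the input enters only through the product $u\,Bz$, which is what makes the representation bilinear.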

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a novel Constrained Bilinear Factorization Multi-view Subspace Clustering (CBF-MSC) method, where the bilinear factorization with an orthonormality constraint and a low-rank constraint is imposed for all coefficient matrices to make them have the same trace-norm instead of being equivalent, so as to explore the consensus information of multi-view data more fully.
Abstract: Multi-view clustering is an important and fundamental problem. Many multi-view subspace clustering methods have been proposed, and most of them assume that all views share the same coefficient matrix. However, the underlying information of multi-view data is not fully exploited under this assumption, since the coefficient matrices of different views should have the same clustering properties rather than be uniform among multiple views. To this end, this paper proposes a novel Constrained Bilinear Factorization Multi-view Subspace Clustering (CBF-MSC) method. Specifically, the bilinear factorization with an orthonormality constraint and a low-rank constraint is imposed for all coefficient matrices to make them have the same trace-norm instead of being equivalent, so as to explore the consensus information of multi-view data more fully. Finally, an Augmented Lagrangian Multiplier (ALM) based algorithm is designed to optimize the objective function. Comprehensive experiments on nine benchmark datasets validate the effectiveness and competitiveness of the proposed approach compared with several state-of-the-art methods.