Showing papers by "Rong Zhang" published in 2023


Journal ArticleDOI
TL;DR: Lau et al., as mentioned in this paper, reported the rapid spread of SARS-CoV-2 in the Macao Special Administrative Region, China, and conducted an online survey during December 27–30, 2022.

5 citations


Journal ArticleDOI
01 Jul 2023
TL;DR: In this article, a top-down convolutional network is proposed to incorporate priors about the structure of pose components and body configuration during training, which can improve the robustness under complex field conditions in the wild.
Abstract: Recent studies estimate human anatomical key points from a single monocular image, in which multichannel heatmaps are the key factor determining the quality of human pose estimation. Multichannel heatmaps can efficiently handle the image-to-coordinate mapping task and the processing of semantic features. However, most methods ignore the physical constraints and internal relationships of human body parts, and thus easily misclassify left and right symmetric parts, which share similar features. Some studies add RNNs on top of the network to incorporate priors about the structure of pose components and body configuration. In this work, a novel top-down convolutional network is instead proposed to incorporate these priors during training, which improves robustness under complex field conditions in the wild. To learn prior knowledge of human pose configuration, a hierarchy of fully convolutional networks (the discriminator) is used to distinguish real poses from fake ones. Consequently, the pose network is driven toward estimates that the discriminator judges as real, i.e., poses that remain plausible in complex situations. The performance of the method is experimentally validated on the MS COCO human key point detection task. The proposed approach outperforms the original method and generates robust pose predictions, demonstrating the effectiveness of adversarial learning.

1 citation
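
To make the adversarial scheme concrete, here is a minimal sketch of heatmap-based pose estimation with a fully convolutional discriminator, in the spirit of the abstract above. The network shapes, the number of keypoint channels, and the 0.01 adversarial loss weight are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal sketch of adversarial heatmap-based pose estimation.
# Shapes, depths, and the 0.01 loss weight are illustrative assumptions.
import torch
import torch.nn as nn

K = 17  # number of keypoint channels (COCO-style); assumed

pose_net = nn.Sequential(              # stand-in for the top-down pose network
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, K, 3, padding=1),
)
disc = nn.Sequential(                  # fully convolutional pose discriminator
    nn.Conv2d(K, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(64, 1, 3, stride=2, padding=1),
)

mse, bce = nn.MSELoss(), nn.BCEWithLogitsLoss()
opt_g = torch.optim.Adam(pose_net.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

def train_step(img, gt_heatmaps):
    # 1) Discriminator learns to separate real pose heatmaps from predictions.
    pred = pose_net(img)
    d_real, d_fake = disc(gt_heatmaps), disc(pred.detach())
    loss_d = bce(d_real, torch.ones_like(d_real)) + \
             bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Pose network fits the ground truth and also tries to fool the
    #    discriminator, pushing it toward structurally plausible poses.
    d_fake = disc(pred)
    loss_g = mse(pred, gt_heatmaps) + \
             0.01 * bce(d_fake, torch.ones_like(d_fake))  # assumed weighting
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```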


Journal ArticleDOI
TL;DR: In this paper, a multi-modal interactive attention network (MIA-Net) is proposed, which takes the modality that contributes the most to emotion as the main modality and the others as auxiliary modalities.
Abstract: When a multi-modal affective analysis model generalizes from a bimodal task to a trimodal or multi-modal task, it is usually transformed into a hierarchical fusion model built from pairwise combinations of modalities, similar to a binary tree structure. This easily leads to large growth in model parameters and computation as the number of modalities increases, which limits the model's generalization. Moreover, many multi-modal fusion methods ignore the fact that different modalities contribute differently to affective analysis. To tackle these challenges, this paper proposes a general multi-modal fusion model that supports trimodal or multi-modal affective analysis tasks, called the Multi-modal Interactive Attention Network (MIA-Net). Instead of treating different modalities equally, MIA-Net takes the modality that contributes most to emotion as the main modality and the others as auxiliary modalities. MIA-Net introduces multi-modal interactive attention modules that adaptively select the important information of each auxiliary modality, one by one, to improve the main-modal representation. Moreover, MIA-Net generalizes quickly to trimodal or multi-modal tasks by stacking multiple MIA modules, which maintains efficient training and requires only linear computation and a stable parameter count. Results of transfer, generalization, and efficiency experiments on widely used datasets demonstrate the effectiveness and generalizability of the proposed method.

1 citation
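
The abstract's main/auxiliary design can be illustrated with a small cross-attention sketch: the main modality queries one auxiliary modality at a time, and modules are stacked so that cost grows linearly with the number of modalities. The dimensions, module layout, and modality assignments below are assumptions, not the published MIA-Net.

```python
# Sketch of a main/auxiliary interactive-attention module (assumed layout).
import torch
import torch.nn as nn

class InteractiveAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, main, aux):
        # The main modality queries one auxiliary modality and absorbs only
        # the information that improves the main-modal representation.
        upd, _ = self.attn(query=main, key=aux, value=aux)
        return self.norm(main + upd)  # residual keeps the main modality dominant

dim = 128
mia_stack = nn.ModuleList(InteractiveAttention(dim) for _ in range(2))
main = torch.randn(8, 20, dim)              # e.g. text as the main modality
auxiliaries = [torch.randn(8, 50, dim),     # e.g. audio
               torch.randn(8, 30, dim)]     # e.g. video
for mia, aux in zip(mia_stack, auxiliaries):
    main = mia(main, aux)  # one module per auxiliary modality: linear growth
```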


Journal ArticleDOI
01 Jul 2023
TL;DR: In this paper, a near-realistic microstructural model is proposed to simulate the cement hydration system using deep learning and cellular automata, in which behavior is controlled by deep neural networks distilled from microstructural images.
Abstract: Cement is widely used in civil engineering and plays a critical role in cement-based materials, e.g., concrete. Because the microstructural evolution of cement hydration predominates the final physical properties, an accurate simulation of hydration is required to enable scientists to evaluate performance and to help design new cementitious materials. However, despite significant effort and progress, a satisfactory model that realistically and accurately simulates the evolution of the three-dimensional (3-D) microstructure has yet to be constructed, mainly because cement hydration is one of the most complex phenomena in materials science. In this work, a novel near-realistic microstructural model is proposed to simulate the cement hydration system using deep learning and cellular automata; it is designed to break through the bottleneck of fidelity to real microstructural evolution. The dynamical system is constructed on a 3-D cellular automaton whose behavior is controlled by deep neural networks distilled from microstructural images. In addition, a dynamic stratified sampling method with variable capacity is proposed to ensure the representativeness of training samples and reduce the computational cost of training. Experiments show that the simulated hydration agrees with actual development in several respects, producing near-realistic microstructures and closely approximating the real process. Furthermore, the constructed system demonstrates promising generalization capability under various conditions.
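
A hedged sketch of the core idea, a 3-D cellular automaton whose update rule is a neural network, is given below. The phase encoding, neighborhood size, and network are placeholders; in the paper the rule is distilled from real microstructural images.

```python
# Sketch: a 3-D cellular automaton whose update rule is a neural network.
# Phase set, neighborhood, and network size are placeholder assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_PHASES = 4  # e.g. water, unhydrated cement, hydrates, pores (assumed)

# One 3x3x3 convolution per step: each voxel's next phase is predicted from
# its 26-neighborhood. Here the weights are random, not distilled.
rule = nn.Sequential(
    nn.Conv3d(N_PHASES, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv3d(32, N_PHASES, kernel_size=1),
)

def ca_step(state):
    # state: (1, N_PHASES, D, H, W) one-hot phase volume
    logits = rule(state)
    nxt = F.one_hot(logits.argmax(dim=1), N_PHASES)
    return nxt.permute(0, 4, 1, 2, 3).float()

volume = F.one_hot(torch.randint(0, N_PHASES, (1, 16, 16, 16)), N_PHASES)
volume = volume.permute(0, 4, 1, 2, 3).float()
for _ in range(10):  # ten simulated hydration time steps
    volume = ca_step(volume)
```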

Journal ArticleDOI
TL;DR: SGML-Net, as mentioned in this paper, incorporates auxiliary information via saliency detection to guide discriminative representation learning, achieving high performance and low model complexity for few-shot fine-grained visual recognition.
Abstract: Recognizing novel sub-categories from scarce samples is an essential and challenging research topic in computer vision. The existing literature addresses this challenge through global-based or local-based representation approaches. The former employs global feature representations for recognition, which may lack fine-grained information. The latter captures local relationships with complex structures, possibly leading to high model complexity. To address these challenges, this article proposes a novel framework called SGML-Net for few-shot fine-grained visual recognition. SGML-Net incorporates auxiliary information via saliency detection to guide discriminative representation learning, achieving high performance and low model complexity. Specifically, SGML-Net utilizes a saliency detection model to emphasize the key regions of each sub-category, providing a strong prior for representation learning. SGML-Net transfers this prior between two independent branches in a mutual learning paradigm. To achieve effective transfer, SGML-Net leverages the relationships among different regions, making the representation more informative and thus providing better guidance. The auxiliary branch is removed once the transfer is complete, ensuring low model complexity at deployment. The proposed approach is empirically evaluated on three widely used benchmarks, demonstrating its superior performance.
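
The mutual learning transfer described above can be sketched as two branches that fit the labels while mimicking each other's softened predictions; the auxiliary (saliency) branch is dropped after training. The loss form and temperature below are standard deep-mutual-learning assumptions, not necessarily the exact SGML-Net objective.

```python
# Sketch of a deep-mutual-learning objective between two branches.
# Temperature and loss form are assumptions, not the exact paper loss.
import torch
import torch.nn.functional as F

def mutual_learning_loss(logits_a, logits_b, labels, tau=1.0):
    # Each branch fits the labels and mimics the other's softened output.
    ce = F.cross_entropy(logits_a, labels) + F.cross_entropy(logits_b, labels)
    kl_ab = F.kl_div(F.log_softmax(logits_a / tau, dim=1),
                     F.softmax(logits_b.detach() / tau, dim=1),
                     reduction="batchmean")
    kl_ba = F.kl_div(F.log_softmax(logits_b / tau, dim=1),
                     F.softmax(logits_a.detach() / tau, dim=1),
                     reduction="batchmean")
    return ce + kl_ab + kl_ba

logits_a, logits_b = torch.randn(8, 10), torch.randn(8, 10)  # two branches
labels = torch.randint(0, 10, (8,))
loss = mutual_learning_loss(logits_a, logits_b, labels)
# At deployment the auxiliary (saliency) branch is dropped, so inference
# cost is that of a single branch.
```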

Journal ArticleDOI
TL;DR: Wang et al., as discussed by the authors, proposed a convolutional broad learning system (ConvBLS) based on the spherical K-means (SKM) algorithm and two-stage multi-scale (TSMS) feature fusion.
Abstract: Deep learning generally suffers from enormous computational resource demands and time-consuming training processes. The Broad Learning System (BLS) and its convolutional variants have been proposed to mitigate these issues and have achieved superb performance in image classification. However, existing convolutional-based broad learning systems (C-BLS) either lack an efficient training method and incremental learning capability or suffer from poor performance. To this end, we propose a convolutional broad learning system (ConvBLS) based on the spherical K-means (SKM) algorithm and two-stage multi-scale (TSMS) feature fusion, which consists of the convolutional feature (CF) layer, the convolutional enhancement (CE) layer, the TSMS feature fusion layer, and the output layer. First, unlike current C-BLS, the simple yet efficient SKM algorithm is utilized to learn the weights of the CF layers. Compared with random filters, the SKM algorithm lets the CF layer learn more comprehensive spatial features. Second, similar to the vanilla BLS, CE layers are established to expand the feature space. Third, the TSMS feature fusion layer is proposed to extract more effective multi-scale features through the integration of CF layers and CE layers. Thanks to the above design and the pseudo-inverse calculation of the output layer weights, our proposed ConvBLS method is unprecedentedly efficient and effective. Finally, corresponding incremental learning algorithms are presented for rapid remodeling when the model needs to be expanded. Experiments and comparisons demonstrate the superiority of our method.
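
The efficiency claim rests on the BLS-style output layer, whose weights are obtained by a regularized pseudo-inverse rather than backpropagation. Below is a minimal sketch of that closed-form step; the feature matrix stands in for the fused CF/CE/TSMS features, and the regularization value is an assumption.

```python
# Sketch of the BLS-style closed-form output layer (ridge regression).
# The feature matrix A stands in for the fused CF/CE/TSMS features.
import numpy as np

def ridge_output_weights(A, Y, lam=1e-3):
    # W = (A^T A + lam * I)^{-1} A^T Y  -- solved directly, no backprop.
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ Y)

A = np.random.randn(1000, 256)                   # stand-in fused features
Y = np.eye(10)[np.random.randint(0, 10, 1000)]   # one-hot labels, 10 classes
W = ridge_output_weights(A, Y)
pred = (A @ W).argmax(axis=1)                    # class predictions
```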

Journal ArticleDOI
TL;DR: Zhang et al., as discussed by the authors, proposed semantic embedding for image transformers (SEiT) to explore semantic features of facial morphology in the AU detection task; the SEiT can learn morphological features intrinsically from the face image.
Abstract: This article proposes semantic embedding for image transformers (SEiT) to explore semantic features of facial morphology in the action unit (AU) detection task. Conventional approaches typically rely on external information (e.g., facial landmarks) to obtain the locations of facial components, whereas the SEiT learns morphological features intrinsically from the face image. The pre-training task, semantic masked facial image modeling (SMFIM), is designed to actively acquire facial morphological information: pixels of the input facial image are randomly erased with semantic masks (e.g., nose, eyes, eyebrows, mouth, and lip), and the embedding model predicts the presence of facial components in the input image, learning semantic representations of the face at the same time. The learned semantic embeddings are fed to transformer blocks, which enable global interaction between semantic elements. The SEiT thus integrates facial morphological information with global interaction characteristics, making it well suited for AU detection. Experiments are conducted on the Binghamton-Pittsburgh 4D (BP4D) dataset and the Denver intensity of spontaneous facial action (DISFA) dataset, and the results demonstrate the effectiveness of the proposed SEiT.
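
A rough sketch of the SMFIM pre-training objective as described: erase one semantic component region and train the model to predict which components remain visible. The region boxes, backbone, and image size below are hypothetical stand-ins, not the paper's setup.

```python
# Sketch of the SMFIM objective: mask one semantic component region and
# predict which components remain visible. Boxes/backbone/size are assumed.
import torch
import torch.nn as nn

COMPONENTS = ["nose", "eyes", "eyebrows", "mouth", "lip"]

def semantic_mask(img, region_boxes):
    # Zero out one randomly chosen component; the multi-label target marks
    # which components are still present in the masked image.
    present = torch.ones(len(COMPONENTS))
    i = torch.randint(len(COMPONENTS), (1,)).item()
    y0, y1, x0, x1 = region_boxes[COMPONENTS[i]]
    img = img.clone()
    img[:, y0:y1, x0:x1] = 0.0
    present[i] = 0.0
    return img, present

encoder = nn.Sequential(  # placeholder backbone; the paper uses transformers
    nn.Flatten(), nn.Linear(3 * 64 * 64, 128), nn.ReLU(),
    nn.Linear(128, len(COMPONENTS)))
boxes = {"nose": (28, 40, 26, 38), "eyes": (18, 26, 12, 52),  # assumed boxes
         "eyebrows": (12, 18, 12, 52), "mouth": (44, 56, 20, 44),
         "lip": (48, 54, 22, 42)}
img = torch.rand(3, 64, 64)
masked, target = semantic_mask(img, boxes)
loss = nn.BCEWithLogitsLoss()(encoder(masked.unsqueeze(0)), target.unsqueeze(0))
```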

Journal ArticleDOI
TL;DR: Zhang et al., as mentioned in this paper, proposed a broad generative network (BG-Net) for two-stage image outpainting, in which the reconstruction network is trained by ridge regression optimization and a seam line discriminator is designed for transition smoothing.
Abstract: Image outpainting is a challenge for image processing since it must produce a large scene image from a few image patches. In general, two-stage frameworks are used to unpack complex tasks and complete them step by step. However, the time cost of training two networks hinders such methods from adequately optimizing the network parameters within a limited number of iterations. In this article, a broad generative network (BG-Net) for two-stage image outpainting is proposed. As the reconstruction network in the first stage, it can be trained quickly using ridge regression optimization. In the second stage, a seam line discriminator (SLD) is designed for transition smoothing, which greatly improves image quality. Compared with state-of-the-art image outpainting methods, experimental results on the Wiki-Art and Places365 datasets show that the proposed method achieves the best results under the evaluation metrics: the Fréchet inception distance (FID) and the kernel inception distance (KID). The proposed BG-Net has good reconstructive ability and trains faster than deep learning-based networks, reducing the overall training duration of the two-stage framework to the same level as a one-stage framework. Furthermore, the proposed method is adapted to recurrent image outpainting, demonstrating the model's powerful associative drawing capability.
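
The seam line discriminator idea can be sketched as a discriminator that scores only a strip straddling the boundary between the given patch and the outpainted region. The strip width and network below are assumptions, not the published SLD.

```python
# Sketch of a seam-line discriminator: score only a strip straddling the
# boundary between the given and outpainted regions. Widths/depths assumed.
import torch
import torch.nn as nn

class SeamLineDiscriminator(nn.Module):
    def __init__(self, strip_w: int = 16):
        super().__init__()
        self.strip_w = strip_w
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, img, seam_x: int):
        # Crop a vertical strip centered on the seam column and judge only
        # that transition region, not the whole image.
        strip = img[:, :, :, seam_x - self.strip_w : seam_x + self.strip_w]
        return self.net(strip)

sld = SeamLineDiscriminator()
fake = torch.rand(2, 3, 128, 256)  # left half given, right half outpainted
score = sld(fake, seam_x=128)      # realism of the transition only
```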