
Showing papers by "Huizhu Jia" published in 2022


Proceedings Article
22 Jan 2022
TL;DR: This work proposes CrowdX, a simulated crowd counting dataset with large scale, accurate labeling, parameterized realization, and high fidelity; using it for data enhancement significantly improves the performance of the proposed streamlined and efficient benchmark network ESA-Net.
Abstract: In this article, we propose a simulated crowd counting dataset CrowdX, which has a large scale, accurate labeling, parameterized realization, and high fidelity. The experimental results of using this dataset as data enhancement show that the performance of the proposed streamlined and efficient benchmark network ESA-Net can be improved by 8.4%. The other two classic heterogeneous architectures, MCNN and CSRNet, pre-trained on CrowdX also show significant performance improvements. Many factors influence performance, such as background, camera angle, human density, and resolution. Although these factors are important, there is still a lack of research on how they affect crowd counting. Thanks to the rich annotation information in the CrowdX dataset, we conduct a large number of data-driven comparative experiments to analyze these factors. Our research provides a reference for a deeper understanding of the crowd counting problem and puts forward some useful suggestions for the actual deployment of the algorithm.

6 citations


Proceedings Article
10 Oct 2022
TL;DR: This work presents a framework Combining the Invertible and Non-invertible (CIN) mechanisms that significantly outperforms current state-of-the-art methods in imperceptibility and robustness.
Abstract: Blind watermarking provides powerful evidence for copyright protection, image authentication, and tampering identification. However, it remains a challenge to design a watermarking model with high imperceptibility and robustness against strong noise attacks. To resolve this issue, we present a framework Combining the Invertible and Non-invertible (CIN) mechanisms. The CIN is composed of the invertible part to achieve high imperceptibility and the non-invertible part to strengthen the robustness against strong noise attacks. For the invertible part, we develop a diffusion and extraction module (DEM) and a fusion and split module (FSM) to embed and extract watermarks symmetrically in an invertible way. For the non-invertible part, we introduce a non-invertible attention-based module (NIAM) and the noise-specific selection module (NSM) to solve the asymmetric extraction under a strong noise attack. Extensive experiments demonstrate that our framework significantly outperforms the current state-of-the-art methods in imperceptibility and robustness. Our framework can achieve an average of 99.99% accuracy and 67.66 dB PSNR under noise-free conditions, and 96.64% and 39.28 dB under combined strong noise attacks. The code will be available at https://github.com/RM1110/CIN.

3 citations


Proceedings Article
18 Jul 2022
TL;DR: This paper proposes an adaptive rate estimation algorithm based on a piece-wise linear function, which is friendly to hardware implementation and accelerates the rate estimation process in the RDO for practical AVS3 applications.
Abstract: Towards enabling advanced video coding for emerging applications, the AVS3 standard has been developed recently, achieving twice the coding efficiency of the AVS2 standard through complex coding tools including advanced rate-distortion optimization (RDO) to select the best mode. The bit-rates are produced with the Advanced-Entropy-Coding (AEC) in the RDO process of AVS3. However, AEC dominates the time complexity of RDO, and among its steps, context updating and interval subdivision are performed recursively, which is not conducive to real-time application, especially for hardware implementation. Thus, this paper proposes an adaptive rate estimation algorithm with a piece-wise linear function that is very friendly to hardware implementation to accelerate the rate estimation process in the RDO for AVS3 practical applications. The proposed architecture can meet the requirement of 4K@120fps ultra-high-definition videos at 200 MHz, whereas the BD-Rate increases only by 0.67% under the All-Intra (AI) configuration.
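
The general idea of replacing a costly rate computation with a piece-wise linear lookup can be sketched as follows. This is only an illustration, not the paper's algorithm: it assumes the quantity being approximated is the ideal bit cost -log2(p) of a symbol with probability p, and the function names, uniform breakpoints, and probability range are all hypothetical. The abstract does not state the paper's actual function, breakpoints, or adaptation rule.

```python
import math

def make_breakpoints(n_segments, p_min=0.01, p_max=0.99):
    """Uniformly spaced probability breakpoints (an assumption for this sketch)."""
    step = (p_max - p_min) / n_segments
    return [p_min + i * step for i in range(n_segments + 1)]

def build_table(breakpoints):
    """Precompute one (slope, intercept) pair per segment.

    Each segment is the chord of -log2(p) between two neighbouring breakpoints,
    so evaluation later needs only one multiply and one add.
    """
    table = []
    for lo, hi in zip(breakpoints, breakpoints[1:]):
        y_lo, y_hi = -math.log2(lo), -math.log2(hi)
        slope = (y_hi - y_lo) / (hi - lo)
        table.append((slope, y_lo - slope * lo))
    return table

def estimate_bits(p, breakpoints, table):
    """Approximate the bit cost of a symbol with probability p."""
    # Clamp to the tabulated range so every input falls in some segment.
    p = min(max(p, breakpoints[0]), breakpoints[-1])
    for i, (slope, intercept) in enumerate(table):
        if p <= breakpoints[i + 1]:
            return slope * p + intercept
    slope, intercept = table[-1]
    return slope * p + intercept
```

Because -log2(p) is convex, the chord never underestimates the true cost inside a segment, and the error shrinks as segments are added. A table of this kind avoids recursive state updates at estimation time, which is the property the abstract highlights as helpful for hardware.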

Journal Article
TL;DR: MA2ML takes each machine learning module, such as data augmentation, neural architecture search, or hyper-parameters, as an agent and the performance as the reward to form a multi-agent reinforcement learning problem.
Abstract: In this paper, we propose multi-agent automated machine learning (MA2ML) with the aim of effectively handling the joint optimization of modules in automated machine learning (AutoML). MA2ML takes each machine learning module, such as data augmentation (AUG), neural architecture search (NAS), or hyper-parameter optimization (HPO), as an agent and the final performance as the reward, to formulate a multi-agent reinforcement learning problem. MA2ML explicitly assigns credit to each agent according to its marginal contribution to enhance cooperation among modules, and incorporates off-policy learning to improve search efficiency. Theoretically, MA2ML guarantees monotonic improvement of joint optimization. Extensive experiments show that MA2ML yields the state-of-the-art top-1 accuracy on ImageNet under constraints of computational cost, e.g., 79.7%/80.5% with FLOPs fewer than 600M/800M. Extensive ablation studies verify the benefits of credit assignment and off-policy learning of MA2ML.
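
One common way to read "credit according to marginal contribution" is a counterfactual difference: compare the reward of the joint choice with the reward obtained when a single agent's choice is swapped for a baseline. The sketch below illustrates that reading only. The abstract does not give MA2ML's exact credit definition, and the function name, the baseline, and the toy reward are hypothetical.

```python
def marginal_credits(reward_fn, joint_action, baseline_action):
    """Credit each agent with the reward lost when only its action is
    replaced by a baseline while all other agents keep their actions."""
    full_reward = reward_fn(joint_action)
    credits = {}
    for agent in joint_action:
        counterfactual = dict(joint_action)
        counterfactual[agent] = baseline_action[agent]
        credits[agent] = full_reward - reward_fn(counterfactual)
    return credits

def toy_reward(action):
    """Made-up accuracy model: each module adds a gain when enabled (1),
    plus a small interaction between augmentation and architecture search."""
    aug, nas, hpo = action["AUG"], action["NAS"], action["HPO"]
    return 0.70 + 0.03 * aug + 0.05 * nas + 0.01 * hpo + 0.01 * aug * nas

joint = {"AUG": 1, "NAS": 1, "HPO": 1}
baseline = {"AUG": 0, "NAS": 0, "HPO": 0}
credits = marginal_credits(toy_reward, joint, baseline)
```

In this toy setup the interaction term is credited to both AUG and NAS, so the credits need not sum to the total gain over the baseline; that is a property of this simple counterfactual scheme, not a claim about MA2ML.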