Author
Yingjie Yin
Other affiliations: Hong Kong Polytechnic University
Bio: Yingjie Yin is an academic researcher from the Chinese Academy of Sciences. The author has contributed to research on topics including monocular vision and object detection, has an h-index of 10, and has co-authored 29 publications receiving 320 citations. Previous affiliations of Yingjie Yin include Hong Kong Polytechnic University.
Papers
TL;DR: A different scales face detector (DSFD) based on Faster R-CNN is proposed that achieves promising performance on popular benchmarks including FDDB, AFW, PASCAL faces, and WIDER FACE.
Abstract: In recent years, the application of deep learning based on deep convolutional neural networks has gained great success in face detection. However, one of the remaining open challenges is the detection of small-scale faces. The depth of the convolutional network can cause the projected feature map for small faces to shrink quickly, and most scale-invariant detection approaches can hardly handle faces smaller than $15\times 15$ pixels. To solve this problem, we propose a different scales face detector (DSFD) based on Faster R-CNN. The new network improves the precision of face detection while performing in real time, like Faster R-CNN. First, an efficient multitask region proposal network (RPN), combined with boosting face detection, is developed to obtain the human face ROI. With the ROI as a constraint, anchors are inhomogeneously produced on the top feature map by the multitask RPN. A human face proposal is extracted through the anchor combined with facial landmarks. Then, a parallel-type Fast R-CNN network is proposed based on the proposal scale. According to the different percentages of the image they cover, the proposals are assigned to three corresponding Fast R-CNN networks. The three networks are separated by proposal scale and differ from each other in the weight of feature map concatenation. A variety of strategies are introduced in our face detection network, including multitask learning, feature pyramid, and feature concatenation. Compared to state-of-the-art face detection methods such as UnitBox, HyperFace, and FastCNN, the proposed DSFD method achieves promising performance on popular benchmarks including FDDB, AFW, PASCAL faces, and WIDER FACE.
91 citations
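The scale-based routing of proposals to the three parallel Fast R-CNN branches can be sketched as follows. This is a minimal illustration: the function name, thresholds, and branch indices are assumptions for exposition, not values taken from the paper.

```python
# Hypothetical sketch of DSFD-style proposal routing: a proposal is sent to
# one of three detection branches according to the fraction of the image it
# covers. Thresholds here are illustrative, not the paper's.

def route_proposal(box, image_w, image_h, small_frac=0.01, large_frac=0.1):
    """Return a branch index (0=small, 1=medium, 2=large) for a proposal.

    box is (x1, y1, x2, y2) in pixels.
    """
    x1, y1, x2, y2 = box
    frac = ((x2 - x1) * (y2 - y1)) / float(image_w * image_h)
    if frac < small_frac:
        return 0          # small-face branch
    elif frac < large_frac:
        return 1          # medium-face branch
    return 2              # large-face branch
```

Routing by covered-area fraction rather than absolute pixel size keeps the assignment resolution-independent, which matches the abstract's description of assigning proposals "according to the different percentages they cover on the images".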
TL;DR: The experimental results on several challenging video sequences validate the effectiveness and robustness of the proposed robust visual detection-learning-tracking framework for autonomous aerial refueling of unmanned aerial vehicles.
Abstract: In this paper, we propose a robust visual detection–learning–tracking framework for autonomous aerial refueling of unmanned aerial vehicles. Two classifiers (D-classifier and T-classifier) are defined in the proposed framework. The D-classifier is a robust linear support vector machine (SVM) classifier trained offline for detecting the drogue object of aerial refueling, and a low-dimensional normalized robust local binary pattern feature is proposed to describe the drogue object in the D-classifier. The T-classifier is a state-based structured SVM classifier trained online for tracking the drogue object. A combination strategy between the D-classifier and the T-classifier is proposed in the framework: the D-classifier is used to assess whether some positive support vectors in the T-classifier should be replaced by positive examples with density peaks. The experimental results on several challenging video sequences validate the effectiveness and robustness of our proposed framework.
65 citations
TL;DR: A robust state-based structured support vector machine (SVM) tracking algorithm combined with incremental principal component analysis (PCA) that directly learns and predicts the object's states and not the 2-D translation transformation during tracking.
Abstract: In this paper, we propose a robust state-based structured support vector machine (SVM) tracking algorithm combined with incremental principal component analysis (PCA). Different from the current structured SVM for tracking, our method directly learns and predicts the object’s states and not the 2-D translation transformation during tracking. We define the object’s virtual state to combine the state-based structured SVM and incremental PCA. The virtual state is considered as the most confident state of the object in every frame. The incremental PCA is used to update the virtual feature vector corresponding to the virtual state and the principal subspace of the object’s feature vectors. In order to improve the accuracy of the prediction, all the feature vectors are projected onto the principal subspace in the learning and prediction process of the state-based structured SVM. Experimental results on several challenging video sequences validate the effectiveness and robustness of our approach.
44 citations
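The projection step described above, where feature vectors are mapped into a principal subspace before the structured SVM learns and predicts, can be sketched as follows. A one-shot SVD stands in for the paper's incremental PCA update, and all names and array sizes are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not the paper's implementation): feature vectors are
# projected onto a low-dimensional principal subspace before learning and
# prediction. The paper updates this subspace incrementally per frame; here
# a plain SVD on a batch of features plays that role.

def principal_subspace(features, k):
    """Top-k principal directions of an (n_samples, d) feature matrix."""
    centered = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]                        # (k, d) orthonormal basis

def project(features, basis, mean):
    """Project feature vectors into the principal subspace."""
    return (features - mean) @ basis.T   # (n_samples, k)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 16))            # 50 toy feature vectors of dim 16
basis = principal_subspace(X, k=4)
Z = project(X, basis, X.mean(axis=0))    # low-dimensional features for the SVM
```

Projecting both training and candidate features into the same subspace, as the abstract notes, keeps the structured SVM's scores comparable while reducing dimensionality.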
TL;DR: Experimental results on a platform with two KUKA robots verify the effectiveness and robustness of the proposed position measurement system, including drogue landmark detection and position computation, for aerial refueling of unmanned aerial vehicles.
Abstract: In this paper, a position measurement system for autonomous aerial refueling of unmanned aerial vehicles, including drogue landmark detection and position computation, is proposed. A multitask parallel deep convolution neural network (MPDCNN) is designed to detect the landmarks of the drogue target. In MPDCNN, two parallel convolution networks are used, and a fusion mechanism is proposed to accomplish effective fusion of the landmark detections for the drogue's two salient parts. Considering the drogue target's geometric constraints, a position measurement method based on monocular vision is proposed. An effective fusion strategy, which fuses the measurement results of the drogue's different parts, is proposed to achieve robust position measurement. The error of landmark detection with the proposed method is 3.9%, which is clearly lower than the errors of other methods. Experimental results on a platform with two KUKA robots verify the effectiveness and robustness of the proposed position measurement system for aerial refueling.
38 citations
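The monocular position computation can be illustrated with a back-of-the-envelope pinhole-camera sketch that assumes a known physical drogue diameter as the geometric constraint. The function, parameter names, and numbers are illustrative assumptions; the paper's method additionally fuses measurements from the drogue's different parts.

```python
# Pinhole-model sketch of monocular range estimation from a target of known
# physical size: depth follows from the ratio of real to apparent diameter,
# and the lateral offsets follow from back-projecting the image centre.

def estimate_position(u, v, d_pixels, d_metres, f, cx, cy):
    """Estimate target 3-D position from its image centre (u, v) and
    apparent diameter d_pixels, given focal length f (pixels) and
    principal point (cx, cy)."""
    z = f * d_metres / d_pixels          # depth from known physical size
    x = (u - cx) * z / f                 # lateral offset
    y = (v - cy) * z / f                 # vertical offset
    return x, y, z
```

For example, a 0.6 m drogue imaged at 100 px diameter by a camera with an 800 px focal length sits about 4.8 m away; averaging such estimates over several detected parts is one plausible reading of the fusion strategy mentioned in the abstract.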
TL;DR: The experimental results validate the effectiveness and robustness of the proposed framework and show the precision of drogue object tracking is 98.7%, which is obviously higher than the other comparison methods.
Abstract: In this paper, we propose a robust detection and tracking strategy for autonomous aerial refueling of unmanned aerial vehicles. The proposed framework includes two modules: a faster deep-learning-based detector (DLD) and a more accurate reinforcement-learning-based tracker (RLT). In the detection stage, the DLD achieves faster speed by combining the efficient MobileNet with the You Only Look Once (YOLO) framework. In the tracking stage, the RLT obtains the target's position accurately and quickly by hierarchically positioning and adjusting the target bounding box according to reinforcement learning. The precision of drogue object tracking is 98.7%, which is clearly higher than that of the other comparison methods. The speed of our network reaches 15 frames/s on a GPU Titan X. The experimental results validate the effectiveness and robustness of the proposed framework.
30 citations
Cited by
TL;DR: This special issue aims at gathering the recent advances in learning with shared information methods and their applications in computer vision and multimedia analysis and addressing interesting real-world computer Vision and multimedia applications.
Abstract: In the real world, a realistic setting for computer vision or multimedia recognition problems is that some classes contain lots of training data while many classes contain only a small amount. Therefore, how to use frequent classes to help learn rare classes, for which it is harder to collect training data, is an open question. Learning with shared information is an emerging topic in machine learning, computer vision, and multimedia analysis. Components at different levels can be shared during the concept modeling and machine learning stages, such as generic object parts, attributes, transformations, regularization parameters, and training examples. Regarding specific methods, multi-task learning, transfer learning, and deep learning can be seen as different strategies for sharing information. These learning-with-shared-information methods are very effective in solving real-world large-scale problems. This special issue aims at gathering recent advances in learning-with-shared-information methods and their applications in computer vision and multimedia analysis. Both state-of-the-art work and literature reviews are welcome for submission. Papers addressing interesting real-world computer vision and multimedia applications are especially encouraged.
Topics of interest include, but are not limited to:
• Multi-task learning or transfer learning for large-scale computer vision and multimedia analysis
• Deep learning for large-scale computer vision and multimedia analysis
• Multi-modal approaches for large-scale computer vision and multimedia analysis
• Different sharing strategies, e.g., sharing generic object parts, attributes, transformations, regularization parameters, and training examples
• Real-world computer vision and multimedia applications based on learning with shared information, e.g., event detection, object recognition, object detection, action recognition, human head pose estimation, object tracking, location-based services, semantic indexing
• New datasets and metrics to evaluate the benefit of the proposed sharing ability for the specific computer vision or multimedia problem
• Survey papers regarding the topic of learning with shared information
Authors who are unsure whether their planned submission is in scope may contact the guest editors prior to the submission deadline with an abstract, in order to receive feedback.
1,758 citations
TL;DR: A comprehensive survey of algorithms proposed for binary neural networks, mainly categorized into the native solutions directly conducting binarization, and the optimized ones using techniques like minimizing the quantization error, improving the network loss function, and reducing the gradient error are presented.
Abstract: The binary neural network, largely saving the storage and computation, serves as a promising technique for deploying deep models on resource-limited devices. However, the binarization inevitably causes severe information loss, and even worse, its discontinuity brings difficulty to the optimization of the deep network. To address these issues, a variety of algorithms have been proposed, and achieved satisfying progress in recent years. In this paper, we present a comprehensive survey of these algorithms, mainly categorized into the native solutions directly conducting binarization, and the optimized ones using techniques like minimizing the quantization error, improving the network loss function, and reducing the gradient error. We also investigate other practical aspects of binary neural networks such as the hardware-friendly design and the training tricks. Then, we give the evaluation and discussions on different tasks, including image classification, object detection and semantic segmentation. Finally, the challenges that may be faced in future research are prospected.
346 citations
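The basic binarization step surveyed above can be shown with a toy sketch. The per-tensor scaling by the mean absolute value is one well-known choice (XNOR-Net-style) for minimizing quantization error, used here as an illustrative assumption rather than a claim about any particular algorithm in the survey.

```python
import numpy as np

# Toy sketch of weight binarization in binary neural networks: full-precision
# weights are replaced by their sign, with an optional per-tensor scaling
# factor alpha that reduces the quantization error.

def binarize(weights):
    """Return {-1, +1} weights and a scalar scaling factor alpha."""
    alpha = np.abs(weights).mean()   # mean |w| minimizes L2 quantization error
    return np.sign(weights), alpha

W = np.array([[0.4, -0.2], [-0.9, 0.1]])
Wb, alpha = binarize(W)              # Wb holds only +1/-1; alpha rescales it
```

In a forward pass the layer then uses `alpha * Wb` in place of `W`, which is what enables the storage and compute savings the survey describes; the gradient-side tricks (e.g., straight-through estimators) are separate and not shown here.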
TL;DR: A novel ensemble convolutional neural network (CNN) based architecture for effective detection of both packed and unpacked malware, named Image-based Malware Classification using Ensemble of CNNs (IMCEC), is proposed.
Abstract: Both researchers and malware authors have demonstrated that malware scanners are unfortunately limited and are easily evaded by simple obfuscation techniques. This paper proposes a novel ensemble convolutional neural network (CNN) based architecture for effective detection of both packed and unpacked malware, named Image-based Malware Classification using Ensemble of CNNs (IMCEC). Our main assumption is that, owing to their different deep architectures, different CNNs provide different semantic representations of the image; therefore, a set of CNN architectures makes it possible to extract features of higher quality than traditional methods. Experimental results show that IMCEC is particularly suitable for malware detection: it achieves high detection accuracy with low false-alarm rates using raw malware input. Results demonstrate more than 99% accuracy for unpacked malware and over 98% accuracy for packed malware. IMCEC is flexible, practical, and efficient, as it takes only 1.18 s on average to identify a new malware sample.
221 citations
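The ensemble idea behind IMCEC can be sketched minimally. Simple averaging of per-class probabilities is used here as an illustrative assumption; the paper's exact fusion rule may differ.

```python
import numpy as np

# Minimal sketch of ensembling several CNN classifiers: each model outputs
# per-class probabilities for a batch of samples, the probabilities are
# averaged across models, and the argmax gives the final label.

def ensemble_predict(prob_list):
    """prob_list: list of (n_samples, n_classes) probability arrays,
    one per model. Returns the fused class index per sample."""
    avg = np.mean(np.stack(prob_list), axis=0)   # (n_samples, n_classes)
    return avg.argmax(axis=1)
```

Averaging tends to cancel the idiosyncratic errors of individual models, which is consistent with the abstract's rationale that architecturally different CNNs capture different semantic representations of the malware image.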
01 Oct 2019
TL;DR: By enforcing restriction to the rate of alteration in response maps generated in the detection phase, the ARCF tracker can evidently suppress aberrances and is thus more robust and accurate to track objects.
Abstract: The traditional framework of discriminative correlation filters (DCF) is often subject to undesired boundary effects. Several approaches that enlarge search regions have been proposed in past years to make up for this shortcoming. However, with excessive background information, more background noise is also introduced, and the discriminative filter is prone to learning from the ambiance rather than the object. This situation, along with appearance changes of objects caused by full/partial occlusion, illumination variation, and other factors, makes aberrances in the detection process more likely, which can substantially degrade the credibility of the result. Therefore, in this work, a novel approach to repress the aberrances occurring during the detection process is proposed: the aberrance repressed correlation filter (ARCF). By restricting the rate of alteration in response maps generated in the detection phase, the ARCF tracker can evidently suppress aberrances and is thus more robust and accurate in tracking objects. Extensive experiments are conducted on different UAV datasets for object tracking from an aerial view, i.e., UAV123, UAVDT, and DTB70, with 243 challenging image sequences containing over 90K frames. The ARCF tracker outperforms 20 other state-of-the-art trackers based on DCF and deep frameworks, with sufficient speed for real-time applications.
208 citations
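The restriction on the rate of alteration between consecutive response maps can be illustrated with a minimal sketch of the penalty term itself. In the actual tracker this term is folded into the DCF learning objective after aligning the previous map by the estimated shift; that machinery is omitted here, so this is an illustration of the idea, not the ARCF formulation.

```python
import numpy as np

# Illustrative sketch of ARCF's core idea: an "aberrance" is an abrupt change
# between consecutive detection response maps, so a squared-difference penalty
# between them discourages such changes when added to the filter's objective.

def aberrance_penalty(resp_prev, resp_curr):
    """Squared L2 distance between consecutive response maps."""
    return float(np.sum((resp_curr - resp_prev) ** 2))
```

A stable track yields a near-zero penalty, while a response map that suddenly changes shape (e.g., due to occlusion or background clutter) is penalized, which is the suppression behaviour the abstract describes.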
Posted Content
TL;DR: This paper proposes Complete-IoU (CIoU) loss and Cluster-NMS for enhancing geometric factors in both bounding box regression and Non-Maximum Suppression, leading to notable gains of average precision (AP) and average recall (AR), without the sacrifice of inference efficiency.
Abstract: Deep learning-based object detection and instance segmentation have achieved unprecedented progress. In this paper, we propose Complete-IoU (CIoU) loss and Cluster-NMS for enhancing geometric factors in both bounding box regression and Non-Maximum Suppression (NMS), leading to notable gains in average precision (AP) and average recall (AR) without sacrificing inference efficiency. In particular, we consider three geometric factors, i.e., overlap area, normalized central point distance, and aspect ratio, which are crucial for measuring bounding box regression in object detection and instance segmentation. The three geometric factors are incorporated into the CIoU loss for better distinguishing difficult regression cases. Training deep models with the CIoU loss results in consistent AP and AR improvements over the widely adopted $\ell_n$-norm and IoU-based losses. Furthermore, we propose Cluster-NMS, where NMS during inference is done by implicitly clustering detected boxes, usually requiring fewer iterations. Cluster-NMS is very efficient due to its pure GPU implementation, and geometric factors can be incorporated to improve both AP and AR. In the experiments, the CIoU loss and Cluster-NMS have been applied to state-of-the-art instance segmentation (e.g., YOLACT) and object detection (e.g., YOLO v3, SSD, and Faster R-CNN) models. Taking YOLACT on MS COCO as an example, our method achieves performance gains of +1.7 AP and +6.2 AR$_{100}$ for object detection, and +0.9 AP and +3.5 AR$_{100}$ for instance segmentation, with 27.1 FPS on one NVIDIA GTX 1080Ti GPU. All the source code and trained models are available at this https URL
185 citations
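The three geometric factors named above combine into a closed-form loss that can be sketched directly for axis-aligned boxes: IoU, penalized by the normalized centre distance and an aspect-ratio consistency term. This is a plain-Python illustration of that formula, not the authors' released implementation.

```python
import math

# Sketch of the Complete-IoU (CIoU) loss for boxes given as (x1, y1, x2, y2):
# loss = 1 - IoU + rho^2 / c^2 + alpha * v, where rho is the distance between
# box centres, c the diagonal of the smallest enclosing box, and v measures
# aspect-ratio inconsistency.

def ciou_loss(box, gt):
    x1, y1, x2, y2 = box
    g1, h1, g2, h2 = gt
    # overlap area -> IoU
    iw = max(0.0, min(x2, g2) - max(x1, g1))
    ih = max(0.0, min(y2, h2) - max(y1, h1))
    inter = iw * ih
    union = (x2 - x1) * (y2 - y1) + (g2 - g1) * (h2 - h1) - inter
    iou = inter / union
    # squared centre distance over squared enclosing-box diagonal
    rho2 = ((x1 + x2 - g1 - g2) ** 2 + (y1 + y2 - h1 - h2) ** 2) / 4.0
    cw = max(x2, g2) - min(x1, g1)
    ch = max(y2, h2) - min(y1, h1)
    c2 = cw ** 2 + ch ** 2
    # aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan((g2 - g1) / (h2 - h1))
                              - math.atan((x2 - x1) / (y2 - y1))) ** 2
    alpha = v / (1 - iou + v) if v > 0 else 0.0
    return 1 - iou + rho2 / c2 + alpha * v
```

Unlike a plain IoU loss, the centre-distance term still provides a useful gradient when the boxes do not overlap at all, which is what makes the loss helpful for the "difficult regression cases" mentioned in the abstract.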