
Showing papers by "Heesung Kwon published in 2022"


Proceedings ArticleDOI
08 Feb 2022
TL;DR: This paper introduces a self-supervised learning framework suitable for hyperspectral images, which are inherently challenging to annotate. The framework leverages a cross-domain CNN, allowing representations to be learned from different hyperspectral images with varying spectral characteristics and no pixel-level annotation.
Abstract: Recently, self-supervised learning has attracted attention due to its remarkable ability to acquire meaningful representations for classification tasks without using semantic labels. This paper introduces a self-supervised learning framework suitable for hyperspectral images that are inherently challenging to annotate. The proposed framework architecture leverages cross-domain CNN [1], allowing for learning representations from different hyperspectral images with varying spectral characteristics and no pixel-level annotation. In the framework, cross-domain representations are learned via contrastive learning where neighboring spectral vectors in the same image are clustered together in a common representation space encompassing multiple hyperspectral images. In contrast, spectral vectors in different hyperspectral images are separated into distinct clusters in the space. To verify that the learned representation through contrastive learning is effectively transferred into a downstream task, we perform a classification task on hyperspectral images. The experimental results demonstrate the advantage of the proposed self-supervised representation over models trained from scratch or other transfer learning methods.
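As a rough illustration of the contrastive objective described above, the sketch below pulls neighboring spectral vectors from the same image together while pushing away vectors drawn from other hyperspectral images. This is a minimal InfoNCE-style sketch under assumed shapes and temperature, not the paper's exact formulation.

```python
# Minimal InfoNCE-style sketch (assumed form, not the paper's exact loss).
# Anchors and positives are embeddings of neighboring spectral vectors from
# the same image; negatives come from other hyperspectral images.
import torch
import torch.nn.functional as F

def spectral_contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """anchor:    (B, D) embeddings of anchor pixels
    positive:  (B, D) embeddings of spatial neighbors in the same image
    negatives: (B, K, D) embeddings of pixels from other images
    """
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    negatives = F.normalize(negatives, dim=-1)

    pos_logit = (anchor * positive).sum(dim=-1, keepdim=True)    # (B, 1)
    neg_logits = torch.einsum("bd,bkd->bk", anchor, negatives)   # (B, K)
    logits = torch.cat([pos_logit, neg_logits], dim=1) / temperature
    # The positive sits at index 0 of each row of logits.
    labels = torch.zeros(anchor.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)
```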

7 citations


Journal ArticleDOI
TL;DR: This work physically furnishes multiple inlets to the model while keeping a universal portion designed to handle the inconsistent spectral characteristics among different domains; training on multiple domains simultaneously acts as an effective workaround for the absence of a large-scale dataset.
Abstract: A pretrain-finetune strategy is widely used to reduce the overfitting that can occur when data is insufficient for CNN training. The first few layers of a CNN pretrained on a large-scale RGB dataset are capable of acquiring general image characteristics that are remarkably effective in tasks targeting different RGB datasets. However, in the hyperspectral domain, where each domain has its unique spectral properties, the pretrain-finetune strategy can no longer be deployed in the conventional way, as three major issues arise: 1) inconsistent spectral characteristics among the domains (e.g., frequency range), 2) an inconsistent number of data channels among the domains, and 3) the absence of a large-scale hyperspectral dataset. We seek to train a universal cross-domain model which can later be deployed for various spectral domains. To achieve this, we physically furnish multiple inlets to the model while keeping a universal portion designed to handle the inconsistent spectral characteristics among different domains. Note that only the universal portion is used in the finetune process. This approach naturally enables the model to learn on multiple domains simultaneously, which acts as an effective workaround for the absence of a large-scale dataset. We have carried out a study extensively comparing models trained using the cross-domain approach with ones trained from scratch. Our approach was found to be superior both in accuracy and in training efficiency. In addition, we have verified that our approach effectively reduces overfitting, enabling us to deepen the model from 9 to 13 layers without compromising accuracy.
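A minimal sketch of the multi-inlet design is given below, assuming simple 1x1-convolution inlets that map each domain's band count to a common width before a shared universal trunk. The layer sizes and domain names are illustrative assumptions, not the paper's architecture.

```python
# Multi-inlet sketch: one inlet per spectral domain handles the inconsistent
# channel counts; the universal trunk is shared and is the only part reused
# at finetune time. Layer sizes are illustrative.
import torch.nn as nn

class CrossDomainNet(nn.Module):
    def __init__(self, domain_channels, width=64, n_classes=10):
        super().__init__()
        # Domain-specific inlets: map each band count to a common width.
        self.inlets = nn.ModuleDict({
            name: nn.Sequential(nn.Conv2d(ch, width, kernel_size=1), nn.ReLU())
            for name, ch in domain_channels.items()
        })
        # Universal portion shared across all domains.
        self.universal = nn.Sequential(
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(width, n_classes),
        )

    def forward(self, x, domain):
        return self.universal(self.inlets[domain](x))

# e.g., two hyperspectral domains with different band counts
net = CrossDomainNet({"paviaU": 103, "salinas": 204})
```

Training batches from the different domains can then be interleaved so that the universal portion learns from every domain simultaneously, matching the workaround described in the abstract.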

6 citations


Proceedings ArticleDOI
27 May 2022
TL;DR: The proposed system design employs principles of signal-processing-oriented dataflow modeling along with pipelining of dataflow subsystems and integration on top of optimized, off-the-shelf software components for lower-level processing to improve detection accuracy and robustness.
Abstract: Object detection from high resolution images is increasingly used for many important application areas of defense and commercial sensing. However, object detection on high resolution images requires intensive computation, which makes it challenging to apply on resource-constrained platforms such as in edge-cloud deployments. In this work, we present a novel system for streamlined object detection on edge-cloud platforms. The system integrates multiple object detectors into an ensemble to improve detection accuracy and robustness. The subset of object detectors that is active in the ensemble can be changed dynamically to provide adaptively adjusted trade-offs among object detection accuracy, real-time performance, and energy consumption. Such adaptivity can be of great utility for resource-constrained deployment to edge-cloud environments, where the execution time and energy cost of full-accuracy processing may be excessive if utilized all of the time. To promote efficient and reliable implementation on resource-constrained devices, the proposed system design employs principles of signal-processing-oriented dataflow modeling along with pipelining of dataflow subsystems and integration on top of optimized, off-the-shelf software components for lower-level processing. The effectiveness of the proposed object detection system is demonstrated through extensive experiments involving the Unmanned Aerial Vehicle Benchmark and KITTI Vision Benchmark Suite. While the proposed system is developed for the specific problem of object detection, we envision that the underlying design methodology, which integrates adaptive ensemble processing with dataflow modeling and optimized lower-level libraries, is applicable to a wide range of applications in defense and commercial sensing.
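The dynamic detector-subset selection lends itself to a simple budgeted policy. Below is a hypothetical sketch, not the paper's scheduler: the detector names, accuracy/cost numbers, and the greedy accuracy-per-cost rule are all illustrative assumptions.

```python
# Hypothetical budgeted selection of the active detector subset.
from dataclasses import dataclass

@dataclass
class DetectorProfile:
    name: str
    accuracy: float  # e.g., validation mAP
    cost: float      # e.g., estimated latency or energy per frame

def select_active_subset(profiles, budget):
    """Greedily add detectors with the best accuracy-per-cost ratio
    until the per-frame budget is exhausted."""
    chosen, spent = [], 0.0
    for p in sorted(profiles, key=lambda p: p.accuracy / p.cost, reverse=True):
        if spent + p.cost <= budget:
            chosen.append(p)
            spent += p.cost
    return chosen

profiles = [
    DetectorProfile("detector-small", accuracy=0.58, cost=1.0),
    DetectorProfile("detector-medium", accuracy=0.67, cost=3.5),
    DetectorProfile("detector-large", accuracy=0.70, cost=5.0),
]
print([p.name for p in select_active_subset(profiles, budget=4.0)])
# -> ['detector-small'] under this budget; a looser budget admits more detectors
```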

2 citations


Proceedings ArticleDOI
17 Jul 2022
TL;DR: This work proposes an algorithmic framework that utilizes the combination of a learned model and a physics-based simulation model for fast planning, and provides a theoretical analysis and empirical evaluation showing a significant reduction in planning times.
Abstract: Off-road and unstructured environments often contain complex patches of various types of terrain, rough elevation changes, deformable objects, etc. An autonomous ground vehicle traversing such environments experiences physical interactions that are extremely hard to model at scale and thus very hard to predict. Nevertheless, planning a safely traversable path through such an environment requires the ability to predict the outcomes of these interactions rather than avoiding them. One approach is to learn the interaction model offline from collected data. Unfortunately, this requires large amounts of data, and the resulting models can often be brittle. Alternatively, physics-based simulators can generate large amounts of data and provide reliable predictions; however, they are very slow to query online within the planning loop. This work proposes an algorithmic framework that combines a learned model and a physics-based simulation model for fast planning. Specifically, it uses the learned model as much as possible to accelerate planning while sparsely querying the physics-based simulator to verify the feasibility of the planned path. We provide a theoretical analysis of the algorithm and an empirical evaluation showing a significant reduction in planning times.
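One plausible way to realize the sparse use of the simulator is lazy verification: plan with the fast learned model, then simulate only the edges on the returned path and replan around any refuted ones. The sketch below is a conceptual illustration under that assumption; `plan`, `simulator_ok`, and the edge representation are placeholders, not the paper's API.

```python
# Conceptual sketch of lazy simulator verification (an assumed realization
# of "sparsely using the simulator").
def plan_with_lazy_verification(start, goal, plan, simulator_ok):
    """plan(start, goal, blocked) -> list of edges or None, where edge
    feasibility inside the planner comes from the fast learned model.
    simulator_ok(edge) -> slow but reliable feasibility check.
    """
    blocked = set()
    while True:
        path = plan(start, goal, blocked)    # fast: learned model only
        if path is None:
            return None                      # no feasible path remains
        bad = [e for e in path if not simulator_ok(e)]  # sparse sim queries
        if not bad:
            return path                      # every edge confirmed feasible
        blocked.update(bad)                  # replan around refuted edges
```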

1 citation


Journal ArticleDOI
TL;DR: This paper introduces the first paired real-image benchmark dataset with hazy and haze-free images and in-situ haze density measurements, and evaluates a set of representative state-of-the-art dehazing approaches as well as object detectors on the dataset.
Abstract: Imagery collected from outdoor visual environments is often degraded due to the presence of dense smoke or haze. A key challenge for research in scene understanding in these degraded visual environments (DVE) is the lack of representative benchmark datasets. These datasets are required to evaluate state-of-the-art object recognition and other computer vision algorithms in degraded settings. In this paper, we address some of these limitations by introducing the first realistic haze image benchmark, from both aerial and ground views, with paired haze-free images and in-situ haze density measurements. This dataset was produced in a controlled environment with professional smoke-generating machines that covered the entire scene, and consists of images captured from the perspective of both an unmanned aerial vehicle (UAV) and an unmanned ground vehicle (UGV). We also evaluate a set of representative state-of-the-art dehazing approaches as well as object detectors on the dataset. The full dataset presented in this paper, including the ground truth object classification bounding boxes and haze density measurements, is provided for the community to evaluate their algorithms at: https://a2i2-archangel.vision. A subset of this dataset has been used for the “Object Detection in Haze” Track of the CVPR UG2 2022 challenge at https://cvpr2022.ug2challenge.org/track1.html.

1 citation


Proceedings ArticleDOI
23 May 2022
TL;DR: A UAV-based image dataset, called the Archangel dataset, is introduced that includes pose and range information in the form of metadata; it aims to advance optimal training and testing of machine learning models in general, as well as the more specific case of UAV-based object detection using deep neural networks.
Abstract: Object detection on imagery captured onboard aerial platforms involves different challenges than in ground-to-ground object detection. For example, images captured from UAVs with varying altitude and view angles present challenges for machine learning that are due to variations in appearance and scene attributes. Thus, it is essential to closely examine the critical variables that impact object detection from UAV platforms, such as the significant variations in pose, range to objects, background clutter, lighting, weather conditions, and velocity/acceleration of the UAV. To that end, in this work, we introduce a UAV-based image dataset, called the Archangel dataset, collected with a UAV that includes pose and range information in the form of metadata. Additionally, we use the Archangel dataset to conduct comprehensive studies of how the critical attributes of UAV-based images affect machine learning models for object detection. The extensive analysis on the Archangel dataset aims to advance optimal training and testing of machine learning models in general as well as the more specific case of UAV-based object detection using deep neural networks.

1 citation


Proceedings ArticleDOI
20 Jul 2022
TL;DR: MoReID as discussed by the authors uses a dictionary to store current and past batches to build a large set of encoded samples to leverage a very large number of negative samples in training for general re-ID task.
Abstract: . We present a Momentum Re-identification (MoReID) framework that can leverage a very large number of negative samples in training for general re-identification task. The design of this framework is inspired by Momentum Contrast (MoCo), which uses a dictionary to store current and past batches to build a large set of encoded samples. As we find it less effective to use past positive samples which may be highly inconsistent to the encoded feature property formed with the current positive samples, MoReID is designed to use only a large number of negative samples stored in the dictionary. However, if we train the model using the widely used Triplet loss that uses only one sample to represent a set of positive/negative samples, it is hard to effectively leverage the enlarged set of negative samples acquired by the MoReID framework. To maximize the advantage of using the scaled-up negative sample set, we newly introduce Hard-distance Elastic loss (HE loss), which is capa-ble of using more than one hard sample to represent a large number of samples. Our experiments demonstrate that a large number of negative samples provided by MoReID framework can be utilized at full capacity only with the HE loss, achieving the state-of-the-art accuracy on three re-ID benchmarks, VeRi-776, Market-1501, and VeRi-Wild.

1 citation


Journal ArticleDOI
TL;DR: This paper presents Archangel, the first UAV-based object detection dataset composed of real and synthetic subsets captured under similar imaging conditions, with UAV position and object pose metadata, and discusses the benefits of leveraging the metadata during model evaluation.
Abstract: Learning to detect objects, such as humans, in imagery captured by an unmanned aerial vehicle (UAV) usually suffers from tremendous variations caused by the UAV's position relative to the objects. In addition, existing UAV-based benchmark datasets do not provide adequate metadata, which is essential for precise model diagnosis and for learning features invariant to those variations. In this paper, we introduce Archangel, the first UAV-based object detection dataset composed of real and synthetic subsets captured under similar imaging conditions and accompanied by UAV position and object pose metadata. A series of experiments are carefully designed with a state-of-the-art object detector to demonstrate the benefits of leveraging the metadata during model evaluation. Moreover, several crucial insights involving both real and synthetic data during model optimization are presented. Finally, we discuss the advantages, limitations, and future directions regarding Archangel to highlight its distinct value for the broader machine learning community.
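As a hypothetical illustration of the metadata-driven evaluation the paper advocates, one could slice detection accuracy by the recorded UAV-to-object range. The record fields (`range_m`, `correct`) and bin edges below are assumptions for illustration, not Archangel's actual schema.

```python
# Hypothetical metadata-sliced evaluation: bin detection outcomes by the
# UAV-to-object range recorded in the metadata.
from collections import defaultdict

def accuracy_by_range(records, bin_edges=(0, 20, 40, 60, 80)):
    """records: iterable of dicts with 'range_m' (distance to the object)
    and 'correct' (whether the detection matched ground truth)."""
    bins = defaultdict(lambda: [0, 0])  # (lo, hi) -> [n_correct, n_total]
    for r in records:
        for lo, hi in zip(bin_edges, bin_edges[1:]):
            if lo <= r["range_m"] < hi:
                bins[(lo, hi)][0] += int(r["correct"])
                bins[(lo, hi)][1] += 1
                break
    return {b: c / t for b, (c, t) in bins.items() if t}
```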

Proceedings ArticleDOI
31 Oct 2022
TL;DR: In this paper, a face tracking system for edge computing devices, which combines a tracking-by-detection algorithm with an ensemble of detectors, is presented to enhance real-time processing capability and reduce energy consumption by adaptively making trade-offs in resolution and active detectors to maintain tracking performance.
Abstract: The deployment of face tracking capabilities at the network edge requires (near) real-time performance under strict computational and energy constraints. Existing approaches often use object detectors with low complexity for tracking to satisfy limited resource constraints. An obvious problem with limiting complexity is that it has a direct impact on performance (e.g., ability to detect faces), especially at lower resolutions. In this study, we present a novel face tracking system for edge computing devices, which combines a tracking-by-detection algorithm with an ensemble of detectors. This system utilizes an online decision-making strategy based on extracting scene information from a density map to inform the active configuration of object detectors and image resolution. Using this system, we enhance real-time processing capability and reduce energy consumption by adaptively making trade-offs in resolution and active detectors to maintain tracking performance while minimizing resource costs. The proposed face tracking system is coupled with a multi-frame bounding box matching algorithm to provide multi-facial tracking functionality. We demonstrate the effectiveness of our system through experiments using the Multiple Object Tracking (MOT) Head Tracking 21 dataset.
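A rough sketch of the density-map-driven decision rule is shown below, assuming the density map's integral approximates the number of faces in view; the thresholds, detector names, and resolutions are placeholders rather than the paper's tuned configuration.

```python
# Rough sketch: pick the active detector set and input resolution from the
# estimated face count; crowded scenes get the heavier configuration.
def choose_config(density_map, high=8, low=2):
    """density_map: array whose sum approximates the number of faces."""
    est_faces = float(density_map.sum())
    if est_faces >= high:
        return {"detectors": ["large", "small"], "resolution": (1080, 1920)}
    if est_faces >= low:
        return {"detectors": ["small"], "resolution": (720, 1280)}
    return {"detectors": ["tiny"], "resolution": (480, 854)}
```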

Proceedings ArticleDOI
11 Oct 2022
TL;DR: In this article, a simplified depth approximation method is proposed to obtain depth information by quantizing the depth values into a small number of representative values; the regions of interest are then projected onto the secondary image to concatenate the information from the additional image.
Abstract: Stereo image inputs provide higher object detection accuracy than monocular images by enabling the detection of objects that are missed from one view while being detectable from another view. To take advantage of additional information from the secondary image, it is necessary to search for the corresponding region in the images of different views by projecting with depth information of the target object. However, most existing studies utilize highly complex computations to estimate depth for simple 2D object detection. This complexity limits the potential for deploying the methods on platforms, such as unmanned aerial vehicles, that involve significant resource constraints. In this paper, we introduce a simplified depth approximation that obtains depth information by quantizing the depth values into a small number of representative values. With these values, the regions of interest are projected onto the secondary image to concatenate the information from the additional image. We validate our method on the KITTI dataset. Our results show that, while having very low complexity, our approximation method greatly improves object detection performance in two out of three difficulty groups of the dataset, and achieves comparable performance in the remaining group, compared to using monocular image input.
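For a rectified stereo pair, a box in the primary image maps to the secondary image by a horizontal shift equal to the disparity d = fB/Z, so quantizing depth into a few representative values yields a small set of candidate regions. The sketch below illustrates this idea under that assumption; the function name, depth levels, and camera parameters are illustrative (the printed example uses approximate KITTI-like values).

```python
# Sketch of projecting a region of interest with quantized depth: a point
# at depth Z shifts left by the disparity d = f * B / Z between views.
def candidate_boxes(box, focal_px, baseline_m, depth_levels=(5.0, 15.0, 40.0)):
    """box: (x1, y1, x2, y2) in the primary (left) image.
    Returns one candidate box in the secondary (right) image per
    representative depth value."""
    x1, y1, x2, y2 = box
    out = []
    for z in depth_levels:
        d = focal_px * baseline_m / z  # disparity in pixels
        out.append((x1 - d, y1, x2 - d, y2))
    return out

# Approximate KITTI-like camera: focal ~721 px, baseline ~0.54 m
print(candidate_boxes((300.0, 180.0, 360.0, 240.0), focal_px=721.0, baseline_m=0.54))
```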