
Showing papers presented at "International Symposium on Image and Signal Processing and Analysis in 2021"


Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this article, an approach based on a variational autoencoder (VAE) was proposed to filter out the scans without anomalies/defects and in doing so, partially automate the procedure.
Abstract: Analysis of ultrasonic testing (UT) data is a time-consuming assignment. In order to make it less demanding, we propose an approach based on a variational autoencoder (VAE) to filter out the scans without anomalies/defects and, in doing so, partially automate the procedure. The implemented approach uses an additional encoder network that encodes the reconstructed images. The differences between the encodings of the input and reconstructed images have been shown to be good indicators of anomalous data. Anomaly detection results surpass those of other VAE-based anomaly criteria.
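The re-encoding criterion can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `encode`/`decode` are hypothetical linear stand-ins for the trained VAE networks, and the anomaly score is the distance between the code of a scan and the code of its reconstruction.

```python
import numpy as np

# Toy sketch of the anomaly criterion: encode the input, decode it,
# re-encode the reconstruction, and score the distance between the codes.
# encode/decode are hypothetical stand-ins for the trained VAE networks.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 64)) * 0.1    # hypothetical encoder weights

def encode(x):
    return W @ x                          # latent code of the scan

def decode(z):
    return W.T @ z                        # reconstruction of the scan

def anomaly_score(x):
    z = encode(x)
    z_rec = encode(decode(z))             # code of the reconstructed image
    return np.linalg.norm(z - z_rec)      # large distance => likely anomalous

normal = rng.standard_normal(64)
anomalous = normal + 100.0                # simulated gross deviation
print(anomaly_score(normal) < anomaly_score(anomalous))
```

A real VAE would be trained only on defect-free scans, so reconstructions of anomalous scans (and hence their re-encodings) drift away from the input code.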

7 citations


Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this article, a method is proposed to parameterize and automate the extraction of tiles from multi-level gigapixel images, with a region of interest (ROI) defined at one of the scales.
Abstract: In many image domains using multilevel gigapixel images, each image level may reveal different information. E.g., a histological image will reveal specific diagnostic information at different resolutions. By incorporating all levels in deep learning models, the accuracy can be improved. It is necessary to extract tiles from the image since it is intractable to process an entire gigapixel image at full resolution at once. Therefore, a sound method for finding and extracting tiles from multiple levels is essential both during training and prediction. In this paper, we have presented a method to parameterize and automate the task of extracting tiles from different scales with a region of interest (ROI) defined at one of the scales. The proposed method makes it easy to extract different datasets from the same group of gigapixel images with different choices of parameters, and it is reproducible and easy to describe by reporting the parameters. The method is suitable for many image domains and is demonstrated here with different parameter settings using histological images from urinary bladder cancer. An efficient implementation of the method is openly provided via Github.
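The key coordinate bookkeeping can be illustrated with a short sketch. This assumes dyadic pyramid levels (a factor of 2 per level), which is typical for gigapixel formats; the function name and signature are illustrative, not the paper's API.

```python
# Map an ROI defined at one pyramid level to tile origins at another level,
# assuming a downsampling factor of 2 per level (an assumption, not the
# paper's exact parameterization).
def roi_to_tiles(roi, roi_level, target_level, tile_size):
    """roi = (x, y, w, h) at roi_level; returns tile origins at target_level."""
    scale = 2 ** (roi_level - target_level)   # >1 when the target is finer
    x, y, w, h = (int(v * scale) for v in roi)
    return [(tx, ty)
            for ty in range(y, y + h, tile_size)
            for tx in range(x, x + w, tile_size)]

# A 100x100 ROI at level 2 covers 400x400 pixels at level 0:
tiles = roi_to_tiles((0, 0, 100, 100), roi_level=2, target_level=0, tile_size=200)
print(len(tiles))  # 4 tiles of 200x200 px
```

Reporting `roi_level`, `target_level`, and `tile_size` is what makes such an extraction reproducible across datasets built from the same image group.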

6 citations


Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this paper, the authors solve a mushroom classification task by systematically going through the above key questions, including how they created and cleaned a proper data set for training, then why they selected a specific neural network considering the constraints of limited hardware resources.
Abstract: Picking mushrooms is traditionally a popular hobby for many people; on the other hand, image-based mushroom recognition is a great challenge for machine learning methods due to the large number of species, similarities in appearance, and the wide spectrum of environmental effects during imaging. While deep learning convolutional neural networks (CNNs) have become dominant in image-based recognition, the large number of possible architectures, the alternatives for training, the preparation of proper data sets, and the settings of hyperparameters make it difficult for researchers and developers to find optimal solutions for classification problems. In our article we solve a mushroom classification task by systematically going through the above key questions. First, we introduce how we created and cleaned a proper data set for training, then why we selected a specific neural network considering the constraints of limited hardware resources. We go through different alternatives for training such as transfer learning, gradual freezing, changing model size, incremental-size learning, and also applying task-specific subnetworks. Performance evaluation is made on our data set of 106 species, the best approach reaching 92.6% accuracy.

5 citations


Proceedings ArticleDOI
13 Sep 2021
TL;DR: SampledABMIL proposes within-bag sampling as a way of applying end-to-end multiple instance learning (MIL) to very large data. End-to-end MIL is gaining popularity in the (bio)medical imaging community since it may make it possible to obtain fine-grained information while relying only on weak labels assigned to large regions.
Abstract: End-to-end multiple instance learning (MIL) is an important concept with a wide range of applications. It is gaining increased popularity in the (bio)medical imaging community since it may provide a possibility to, while relying only on weak labels assigned to large regions, obtain more fine-grained information. However, processing very large bags in end-to-end MIL is problematic due to computer memory constraints. We propose within-bag sampling as one way of applying end-to-end MIL methods on very large data. We explore how different levels of sampling affect the performance of a well-known high-performing end-to-end attention-based MIL method, to understand the conditions when sampling can be utilized. We compose two new datasets tailored for the purpose of the study, and propose a strategy for sampling during MIL inference to arrive at reliable bag labels as well as instance level attention weights. We perform experiments without and with different levels of sampling, on the two publicly available datasets, and for a range of learning settings. We observe that in most situations the proposed bag-level sampling can be applied to end-to-end MIL without performance loss, supporting its confident usage to enable end-to-end MIL also in scenarios with very large bags. We share the code as open source at https://github.com/MIDA-group/SampledABMIL
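The idea of within-bag sampling can be sketched as follows. This is a simplified illustration: the attention weights here are a softmax over a toy scoring vector, whereas in attention-based MIL (ABMIL) they come from a small learned network, and sampling would be repeated across training steps.

```python
import numpy as np

# Sketch of within-bag sampling for attention-based MIL: instead of pooling
# all instances in a huge bag, pool a random subset per forward pass.
rng = np.random.default_rng(1)

def attention_pool(instances, v):
    scores = instances @ v                       # toy attention scores
    a = np.exp(scores) / np.exp(scores).sum()    # softmax attention weights
    return a @ instances                         # weighted bag embedding

bag = rng.standard_normal((10_000, 16))          # very large bag of instances
v = rng.standard_normal(16)                      # stand-in attention vector

# Within-bag sampling: pool only 256 sampled instances, fitting in memory.
idx = rng.choice(len(bag), size=256, replace=False)
embedding = attention_pool(bag[idx], v)
print(embedding.shape)  # (16,)
```

Because the pooled embedding has a fixed size regardless of how many instances were sampled, the downstream bag classifier is unchanged; only memory use scales with the sample size.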

4 citations


Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this paper, the authors proposed a low computational cost algorithm for emergency siren detection with a Convolutional Neural Network-based deep learning model, which employed Short-Time Fourier Transform spectrograms as features and improved the classification performance by applying a harmonic percussive source separation technique.
Abstract: Emergency Siren Detection is a topic of great importance for road safety. Nowadays, the design of cars with every comfort has improved the quality of driving, but distractions have also increased. Hence the usefulness of implementing an Emergency Vehicle Detection System: if installed inside the car, it alerts the driver of its approach, and if installed outdoors in strategic locations, it automatically activates reserved lanes. In this paper, we perform Emergency Siren Detection with a Convolutional Neural Network-based deep learning model. We investigate acoustic features to propose a low computational cost algorithm. We employ Short-Time Fourier Transform spectrograms as features and improve the classification performance by applying a harmonic percussive source separation technique. The enhancement of the harmonic components of the spectrograms gives better results than more computationally complex features. We also demonstrate the relevance of the siren harmonic contents in the classification task. The reduction of the network hyperparameters decreases the computational load of the algorithm and facilitates its implementation in real-time embedded systems.
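Harmonic-percussive source separation on a spectrogram is commonly done with median filtering: smoothing along time enhances harmonic components (horizontal ridges, like siren partials), while smoothing along frequency enhances percussive events (vertical ridges). The sketch below uses this classic median-filtering formulation; the paper's exact filter settings are not given in the abstract, so the window sizes here are assumptions.

```python
import numpy as np
from scipy.ndimage import median_filter

# Median-filtering HPSS sketch on a synthetic magnitude spectrogram.
rng = np.random.default_rng(0)
S = rng.random((128, 256))                 # |STFT|, freq x time
S[40, :] += 5.0                            # a steady harmonic (siren partial)
S[:, 100] += 5.0                           # a percussive click

H = median_filter(S, size=(1, 17))         # smooth across time -> harmonic
P = median_filter(S, size=(17, 1))         # smooth across freq -> percussive
harmonic_mask = H > P                      # bins dominated by harmonics
S_harmonic = S * harmonic_mask             # enhanced harmonic spectrogram
```

Feeding `S_harmonic` instead of the raw spectrogram to the CNN is one plausible way the harmonic enhancement described in the abstract could be realized.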

4 citations


Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this article, a new integer-valued combinatorial coordinate system for the vertices in the octagonal C4C8(S) grid was proposed, and the neighborhood relation between vertices can be obtained through arithmetic operations on vertex coordinates.
Abstract: We define a new integer-valued combinatorial coordinate system for the vertices in the octagonal C4C8(S) grid. We review the existing coordinate systems proposed in the literature and provide formulas for the conversion between our and existing coordinate systems, as well as with Cartesian coordinates. The neighborhood relation between vertices can be obtained through arithmetic operations on vertex coordinates.

3 citations


Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this article, the applicability of deep learning and image analysis methods to automate this task, thus allowing for easier reproducibility of assessments, reduction of the time experts lose on repetitive tasks, and potentially better performance.
Abstract: Sex assessment is an important step of the forensic process. Dental remains are often the only remains left to examine due to their resistance to decay and external factors. Contemporary forensic odontology literature describes multiple methods for sex assessment from mandibular parameters, all of which require manual measurements and expert training. This study aims to explore the applicability of deep learning and image analysis methods to automate this task, thus allowing for easier reproducibility of assessments, reduction of the time experts lose on repetitive tasks, and potentially better performance. We have evaluated state-of-the-art deep learning models and components on the largest dataset of individual adult tooth x-ray images, consisting of 76293 samples. This study also explores the usage of decayed or structurally altered teeth, with which contemporary methods struggle. Two types of models are constructed, a family of models specialized for specific tooth types, and a general model that can assess the sex from any tooth type. We examine the performance of those models per tooth type and age group, as well as the impact of decayed and structurally altered teeth. The specialized models achieve an overall accuracy of 72.40%, and the general model reaches an overall accuracy of 72.68%.

3 citations


Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this article, the significance of morphological and border related features used in addition to spectral information was demonstrated and a feature set that provided a substantial improvement in classification results was proposed.
Abstract: Varietal classification of rice seeds is a crucial task in the process of rice crop production, management, and quality control. Traditionally, classification is performed manually which gives slow and inconsistent results. Machine vision technology provides an automated, real-time, non-destructive and cost-effective solution to this problem. Methods that combine RGB and hyperspectral imaging have shown very good results in rice seed classification. In this paper, we demonstrate the significance of morphological and border related features used in addition to spectral information and propose a feature set that provides a substantial improvement in classification results. The proposed approach was successfully tested on a publicly available dataset of 8640 seed samples corresponding to 90 different rice seed varieties, contained in 180 hyperspectral and RGB image pairs, and resulted in an average F1 score of 85.65%.

3 citations


Proceedings ArticleDOI
13 Sep 2021
TL;DR: A survey of recent work on various Graph Convolutional Network (GCN)-based approaches being applied to skeleton-based activity recognition can be found in this paper, where the conventional implementation of a GCN is introduced and methods that address the limitations of conventional GCN's are presented.
Abstract: Skeleton-Based Activity recognition is an active research topic in Computer Vision. In recent years, deep learning methods have been used in this area, including Recurrent Neural Network (RNN)-based, Convolutional Neural Network (CNN)-based and Graph Convolutional Network (GCN)-based approaches. This paper provides a survey of recent work on various Graph Convolutional Network (GCN)-based approaches being applied to Skeleton-Based Activity Recognition. We first introduce the conventional implementation of a GCN. Then methods that address the limitations of conventional GCN's are presented.
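The "conventional GCN" that such surveys start from is the propagation rule H' = σ(D^(-1/2)(A+I)D^(-1/2) H W) of Kipf and Welling. A minimal sketch on a toy skeleton (a 3-joint chain, untrained random weights) looks like this:

```python
import numpy as np

# One conventional GCN layer over a toy 3-joint skeleton chain.
rng = np.random.default_rng(0)
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)     # joint adjacency (a chain)
A_hat = A + np.eye(3)                      # add self-loops
d = A_hat.sum(axis=1)
D_inv_sqrt = np.diag(d ** -0.5)
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # symmetric normalization

H = rng.standard_normal((3, 4))            # per-joint features (e.g. x, y, z, conf)
W = rng.standard_normal((4, 8))            # untrained layer weights
H_next = np.maximum(A_norm @ H @ W, 0)     # graph convolution + ReLU
print(H_next.shape)  # (3, 8)
```

The limitations the surveyed methods address (fixed adjacency, limited receptive field over the skeleton, no temporal modeling in this basic form) are all properties of this single propagation rule.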

3 citations


Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this paper, a convolutional neural network (CNN) was proposed for multi-illuminant (sun and shadow) illumination estimation, which can remove the chromatic influence of the illumination on objects in the scene.
Abstract: White-balancing is an important part of the image processing pipeline and is used in many computer vision applications. It removes the chromatic influence of the illumination on objects in the scene. White balancing is important in tasks such as object detection and object tracking. This problem is tackled in a myriad of ways, but most methods use the assumption that images contain only one dominant uniform illuminant. In recent years, neural networks have been used to create state-of-the-art methods for single-illuminant white-balancing, but the problem of multi-illuminant white-balancing has been largely ignored. The main reason for this is the lack of multi-illuminant datasets. In this paper, we introduce a convolutional neural network for multi-illuminant (sun and shadow) illumination estimation. For the training and testing of the created model, over 100 outdoor daytime images were taken using the Canon EOS 550D camera. We show that the model outperforms existing statistics-based methods on the test data.
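Gray-world is the archetype of the statistics-based, single-illuminant methods such a CNN is compared against: assume the average scene reflectance is achromatic, so the per-channel means estimate the illuminant. A minimal sketch (the specific baselines used in the paper are not named in the abstract):

```python
import numpy as np

# Gray-world white balance: estimate a single global illuminant from the
# channel means, then divide it out (von Kries-style correction).
rng = np.random.default_rng(0)
img = rng.random((64, 64, 3)) * np.array([1.0, 0.8, 0.6])   # warm color cast

illum = img.reshape(-1, 3).mean(axis=0)
illum /= illum.max()                       # normalized illuminant estimate
balanced = img / illum                     # remove the cast channel-wise

means = balanced.reshape(-1, 3).mean(axis=0)
print(np.allclose(means, means[0]))        # channels now share one mean
```

A single global estimate like this is exactly what fails in sun-and-shadow scenes, which motivates the per-region (multi-illuminant) estimation the paper proposes.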

2 citations


Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this paper, the authors proposed a novel remote sensing object detection dataset for deep learning assisted search and rescue (SaR) using satellite imagery. And they evaluated the application of popular object detection models to this dataset as a baseline to inform further research.
Abstract: Access to high resolution satellite imagery has dramatically increased in recent years as several new constellations have entered service. High revisit frequencies as well as improved resolution has widened the use cases of satellite imagery to areas such as humanitarian relief and even Search and Rescue (SaR). We propose a novel remote sensing object detection dataset for deep learning assisted SaR. This dataset contains only small objects that have been identified as potential targets as part of a live SaR response. We evaluate the application of popular object detection models to this dataset as a baseline to inform further research. We also propose a novel object detection metric, specifically designed to be used in a deep learning assisted SaR setting.

Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this article, the authors compared conventional machine learning and more recent deep learning-based models for baby cry classification, using acoustic features, spectrograms, and a combination of the two.
Abstract: The reason behind an infant's cry has been elusive to sometimes even the most skilled and experienced paediatricians. Our comprehensive research aims to classify infants' cries into behavioural traits using objective and analytical machine learning approaches. Towards this goal, we compare conventional machine learning and more recent deep learning-based models for baby cry classification, using acoustic features, spectrograms, and a combination of the two. We performed a detailed empirical study on the publicly available donateacry-corpus and the CRIED dataset to highlight the effectiveness of appropriate acoustic features, signal processing, and machine learning techniques for this task. We also conclude that acoustic features and spectrograms together bring better results. As a side result, this work also emphasizes the challenge of an inadequate baby cry database in modelling infant behavioural traits.

Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this paper, a local tone mapping operator is proposed based on the theory of sprays introduced in the Random Sprays Retinex algorithm, a white balance algorithm dealing with the locality of color perception.
Abstract: In this paper, a local tone mapping operator is proposed. It is based on the theory of sprays introduced in the Random Sprays Retinex algorithm, a white balance algorithm dealing with the locality of color perception. This tone mapping implementation compresses high dynamic range images by using three types of computations on sprays. These operations are carefully chosen so that the result of their convex combination is a low dynamic range image with a high level of detail in all its parts regardless of the original luminance values that may span over large dynamic ranges. Furthermore, a simple local formulation of the Naka-Rushton equation based on random sprays is given. The experimental results are presented and discussed.
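The Naka-Rushton equation referenced here is the standard compressive response r(L) = L / (L + L_a), where L_a is an adaptation level. Globally, L_a is a scene statistic; the paper's contribution is computing it locally from random sprays. A minimal global sketch:

```python
import numpy as np

# Naka-Rushton compression: maps luminances spanning many orders of
# magnitude into (0, 1). L_a is the adaptation level; here it is a global
# scene statistic, whereas the paper computes it locally via sprays.
def naka_rushton(L, L_a):
    return L / (L + L_a)

hdr = np.array([0.01, 1.0, 100.0, 10000.0])   # luminances over a huge range
ldr = naka_rushton(hdr, L_a=hdr.mean())
print(ldr)            # compressed into (0, 1), monotonically increasing
```

Replacing the single `L_a` with a per-pixel value estimated from that pixel's spray is what turns this global operator into a local one that preserves detail in both dark and bright regions.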

Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this paper, an annealing epsilon greedy algorithm, a reinforcement learning technique, is used to tune the attributes of a neural network equalizer for high frequency (HF) channels.
Abstract: In wireless communications, equalization can be used to remove channel impairments from transmissions. Neural networks (NNs) have proven to be effective compared with conventional equalizers (e.g., decision-feedback, zero-forcing). High Frequency (HF) channels require high-performance equalizers to overcome Doppler shifts and large delay spreads. When using a NN equalizer, tuning its structure (e.g., activation function, optimizer) can be time-consuming. This work proposes using an annealing epsilon greedy algorithm, a reinforcement learning technique, to tune the attributes of a neural network equalizer. Reinforcement learning has been used to tune NNs in different applications, but to the best of our knowledge, it has not been done for NN equalization. The objective of this work is to analyze whether using reinforcement learning can improve the performance of a NN equalizer.
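An annealing epsilon-greedy search over equalizer attributes can be sketched as a bandit: each "arm" is a candidate configuration, and the reward would be a validation metric of the trained NN equalizer. Everything below is illustrative, with simulated rewards (the arm names and reward values are invented, not from the paper).

```python
import random

# Annealing epsilon-greedy over hypothetical equalizer configurations.
random.seed(0)
arms = ["relu+adam", "tanh+adam", "relu+sgd", "tanh+sgd"]
true_reward = [0.2, 0.4, 0.9, 0.3]        # simulated equalizer quality
counts = [0] * 4
values = [0.0] * 4

eps = 1.0
for step in range(2000):
    if random.random() < eps:
        a = random.randrange(4)           # explore a random configuration
    else:
        a = values.index(max(values))     # exploit the best one so far
    r = true_reward[a] + random.gauss(0, 0.1)   # noisy evaluation
    counts[a] += 1
    values[a] += (r - values[a]) / counts[a]    # incremental mean update
    eps = max(0.05, eps * 0.995)          # anneal the exploration rate

print(arms[values.index(max(values))])
```

Annealing epsilon front-loads exploration (trying all configurations) and then settles on exploiting the best-performing one, which is why it suits one-shot tuning budgets.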

Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this article, the authors proposed a method to generate 3D ultrasonic scans using computer vision and deep learning methods, which can be used for training human experts on non-destructive ultrasonic scan analysis.
Abstract: Non-destructive ultrasonic analysis of materials is a method for assessing the integrity of the inspected components. It is commonly used in monitoring critical parts of power plants, in aeronautics, oil and gas, and the automotive industry. Since most ultrasonic inspections rely on experts' previous experience, the experts must constantly practice on new, unseen data. Acquiring enough data for training human experts in non-destructive ultrasonic scan analysis can be an expensive and time-consuming task. The only possibility to get new data for practicing is to implant synthetic defects in real metal blocks. Artificial defects are made by temperature strain, electrical discharge, and physical damage. All of these methods are very complicated and expensive to perform. Moreover, the metal blocks have to be taken from the components of the power plants to have the same structure and be realistic. In this work, some attempts have been made to generate 3D ultrasonic scans using computer vision and deep learning methods.

Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this paper, the authors propose a method based on the detection of flaws in C-scans of non-destructive ultrasonic testing data, which reduces the complexity of manual detection of flaws in B-scans.
Abstract: The analysis of the data in non-destructive ultrasonic testing of materials is a very time-intensive task. To alleviate the aforementioned strain on the human expert inspectors, a plethora of assisted analysis methods based on deep learning have been developed recently. However, most of these methods are based on the automated detection of flaws in A-scans and B-scans and therefore we propose a method based on the detection of flaws in C-scans that can reduce the complexity of manual detection of flaws in B-scans. The proposed method classifies each row of the C-scan based on whether it contains any flaws or not. Afterward, the positively classified rows are forwarded for further automated (and manual) inspection. The results show that the developed method significantly reduces the number of B-scans that have to be further analyzed.

Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this paper, the authors proposed an automated way of determining the optimal number of low-rank components in dimension reduction of image data based on the combination of two-dimensional principal component analysis and an augmentation estimator.
Abstract: We propose an automated way of determining the optimal number of low-rank components in dimension reduction of image data. The method is based on the combination of two-dimensional principal component analysis and an augmentation estimator proposed recently in the literature. Intuitively, the main idea is to combine a scree plot with information extracted from the eigenvectors of a variation matrix. Simulation studies show that the method provides accurate estimates and a demonstration with a finger data set showcases its performance in practice.
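The 2DPCA step the method builds on can be sketched briefly: eigendecompose the mean row-covariance of an image stack and examine the eigenvalue drop-off (the scree). The paper's augmentation estimator additionally exploits the eigenvectors; that refinement is not reproduced here, and the largest-gap rule below is a generic stand-in.

```python
import numpy as np

# 2DPCA scree sketch: recover the number of low-rank components from the
# eigenvalues of the mean image covariance G = mean(X^T X).
rng = np.random.default_rng(0)
n, h, w, k = 200, 10, 12, 3
basis = rng.standard_normal((k, w))
images = rng.standard_normal((n, h, k)) @ basis          # rank-k structure
images += 0.01 * rng.standard_normal((n, h, w))          # small noise

G = np.mean([x.T @ x for x in images], axis=0)           # w x w covariance
eigvals = np.sort(np.linalg.eigvalsh(G))[::-1]
# Largest-gap heuristic: cut where consecutive eigenvalues drop most.
n_components = 1 + int(np.argmax(eigvals[:-1] / eigvals[1:]))
print(n_components)  # recovers the planted rank, 3
```

The augmentation idea is to make such a cut-off automatic and robust by adding eigenvector-based information, instead of eyeballing the scree plot.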

Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this article, a novel deep learning-based speech emotion recognition method was proposed, which exploits a convolutional neural network (CNN), enriched with a GhostVLAD feature aggregation layer.
Abstract: In this paper, we introduce a novel deep learning-based speech emotion recognition method. The proposed approach exploits a convolutional neural network (CNN), enriched with a GhostVLAD feature aggregation layer. The resulting representation adjusts the contribution of each spectrogram segment to the final class prototype representation and is used for trainable and discriminative clustering purposes. In addition, we introduce a modified triplet loss function which integrates the relations between the various emotional patterns. The experimental evaluation, carried out on the RAVDESS and CREMA-D datasets, validates the proposed methodology, which yields emotion recognition rates superior to 83% and 64%, respectively. The comparative evaluation shows that the proposed approach outperforms state of the art techniques, with gains in accuracy of more than 3%.
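For reference, the standard triplet loss that the paper modifies is max(0, d(a, p) - d(a, n) + margin), pulling same-emotion embeddings together and pushing different-emotion embeddings apart. The class-relation modification described in the abstract is not reproduced in this sketch.

```python
import numpy as np

# Standard triplet loss on utterance embeddings (squared Euclidean).
def triplet_loss(anchor, positive, negative, margin=0.2):
    d_ap = np.sum((anchor - positive) ** 2)   # anchor-positive distance
    d_an = np.sum((anchor - negative) ** 2)   # anchor-negative distance
    return max(0.0, d_ap - d_an + margin)

a = np.array([1.0, 0.0])
p = np.array([0.9, 0.1])     # same emotion: close to the anchor
n = np.array([-1.0, 0.0])    # different emotion: far away
print(triplet_loss(a, p, n))  # 0.0 -- this triplet is already satisfied
```

A modified triplet loss can, for example, make the margin depend on how related two emotion classes are, so confusable emotions are separated more gently than unrelated ones.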

Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this article, a delta function-based optimal shift in formants for enhancing the near-end speech intelligibility is proposed, which does not require the knowledge of noise statistics in designing the delta function.
Abstract: The present study proposes a novel delta function-based optimal shift in formants for enhancing the near-end speech intelligibility. The delta function being used here is trapezoidal in shape. The shaping parameters of this delta function are determined using comprehensive learning particle swarm optimization (CLPSO) which maximizes the short time objective intelligibility (STOI) of speech sequences. The proposed method does not require the knowledge of noise statistics in designing the delta function. Further, the proposed method does not require post-processing in terms of the computation of smoothing of the shifted formants. The performance of the proposed method is illustrated using speech signals from the Hearing In Noise Test (HINT) French database by including the engine noise from a car running at 130 km/h. The results of the investigation, at various SNRs, convincingly demonstrate that the optimal delta function (function with the optimized parameters) could significantly improve the speech intelligibility at very low SNRs while preserving the quality and naturalness of the sound.

Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this paper, a semi-automated approach for bounding box annotation was developed in the context of nighttime driving videos, where the authors generate trajectory proposals through a tracking-by-detection method, extend and verify object trajectories through single object tracking, and propose a pipeline for efficient semiautomatic annotation of object bounding boxes in videos.
Abstract: Ground-truth annotations are a fundamental requirement for the development of computer vision and deep learning algorithms targeting autonomous driving. Available public datasets have for the most part been recorded in urban settings, while scenes showing countryside roads and nighttime driving conditions are underrepresented in current datasets. In this paper, we present a semi-automated approach for bounding box annotation which was developed in the context of nighttime driving videos. In our three-step approach, we (a) generate trajectory proposals through a tracking-by-detection method, (b) extend and verify object trajectories through single object tracking, and (c) propose a pipeline for efficient semi-automatic annotation of object bounding boxes in videos. We evaluate our approach on the CVL dataset, which focuses on nighttime driving conditions on European countryside roads. We demonstrate the improvements achieved by each processing step, and observe an increase of 23% in recall while precision remains almost constant when compared to the initial tracking-by-detection approach.

Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this paper, a real-time implementation of an impulse response interpolation method that allows to obtain an accurate binaural reproduction reducing measurement sets is described, based on the time decomposition and frequency division of the HRIRs and the application of a peak detection and matching procedure in combination with an alignment algorithm and a linear interpolation.
Abstract: Binaural synthesis is a very important aspect of immersive audio, and it requires knowledge of the head related impulse responses (HRIRs). This paper describes the real-time implementation of an impulse response interpolation method that allows an accurate binaural reproduction to be obtained with reduced measurement sets. The method is based on the time decomposition and frequency division of the HRIRs and the application of a peak detection and matching procedure in combination with an alignment algorithm and a linear interpolation. A 3D set-up has been considered, and the algorithm has been evaluated by means of objective and subjective tests, comparing it with the state of the art. The obtained results have demonstrated the excellent performance of the proposed system.

Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this article, a deep learning model for image segmentation based on the transformer architecture is proposed, which can identify regions in outdoor scenes where the global estimation and subsequent color correction of the image is not accurate.
Abstract: Color constancy is an important property of the human visual system that allows us to recognize the colors of objects regardless of the scene illumination. Computational color constancy is an unavoidable part of all modern camera image processing pipelines. However, most modern computational color constancy methods focus on the estimation of only one illuminant per scene, even though the scene may have multiple illuminations, such as very common outdoor scenes illuminated by sunlight. In this work, we address this problem by creating a deep learning model for image segmentation based on the transformer architecture, which can identify regions in outdoor scenes where the global estimation and subsequent color correction of the image is not accurate. We compare our convolution-free model to a convolutional model and a simpler baseline model and achieve excellent results.

Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this article, the authors developed a new model with a weighting module that dynamically determines the importance of each of the scanning angles to reduce the information loss during the process of merging.
Abstract: Ultrasonic testing (UT) is a commonly used approach for inspection of material and defect detection without causing harm to the inspected component. To improve the reliability of defect detection, the material is often scanned from various angles leading to an immense amount of data that needs to be analyzed. Some of the defects are only seen on B-scans taken from a particular angle so discarding some of the data would increase the risk of not detecting all of the defects. Recently there has been significant progress in the development of methods for automated defect analysis from the UT data. Using such methods the inspection can be performed quicker, but it is still necessary to inspect all of the angles to detect defects. In this work, we test a novel approach for accelerating the analysis by merging the images from various angles. To reduce the information loss during the process of merging, we develop a new model with a weighting module that dynamically determines the importance of each of the scanning angles. Using the proposed module, the loss of information is minimal, so the precision of the detection model is comparable to the model tested on each of the images separately. Using the merged images input, the analysis can be accelerated by almost 15 times.

Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this article, a new approach for the 2D PET data acquisition is introduced, which uses the intersections of lines of response (LORs) for the generation of a larger number of virtual LORs in the cases when the number of coincident events is initially small, i.e., when the amount of injected radiotracer is low.
Abstract: In this paper, a new approach for the 2D PET data acquisition is introduced, which uses the intersections of lines of response (LORs) for the generation of a larger number of virtual LORs in the cases when the number of coincident events is initially small, i.e., when the amount of injected radiotracer is low. This approach is based on the fact that the statistical properties of the unknown 2D process are preserved in the statistical properties of intersections of LORs. The 2D image is reconstructed from virtual LORs using the well-known Filtered back-projection method, thereby achieving high temporal resolution with a reduced dose of radiotracer injected into the living organisms. Moreover, the larger number of virtual LORs yields the reconstructed 2D image of higher spatial resolution compared with the reconstruction from original LORs.
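The geometric core is that two LORs passing through a common annihilation point intersect at that point, so pairwise LOR intersections concentrate where activity is. A sketch of the intersection computation (lines given as point plus direction; the emission point and directions below are invented for illustration):

```python
import numpy as np

# Intersect two 2D lines p1 + t*d1 and p2 + s*d2 by solving a 2x2 system.
def intersect(p1, d1, p2, d2):
    A = np.column_stack([d1, -d2])        # [d1 | -d2] [t, s]^T = p2 - p1
    t, _ = np.linalg.solve(A, p2 - p1)
    return p1 + t * d1

source = np.array([2.0, 3.0])             # hypothetical emission point
lor_a = (source + 5 * np.array([1.0, 0.2]), np.array([1.0, 0.2]))
lor_b = (source - 4 * np.array([0.3, 1.0]), np.array([0.3, 1.0]))
virtual_point = intersect(*lor_a, *lor_b)
print(virtual_point)  # recovers [2. 3.]
```

In the low-dose regime, many such intersection points (and virtual LORs built from them) densify the data fed to filtered back-projection without injecting more radiotracer.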

Proceedings ArticleDOI
13 Sep 2021
TL;DR: Wang et al. as discussed by the authors extend the ML-CSC framework towards multimodal data processing, and propose three different CNN architectures with increasing performance without introducing any additional learnable parameters.
Abstract: In recent years, Convolutional Neural Networks (CNNs) have led to huge successes across various computer vision applications. However, the lack of interpretability poses a severe barrier to their wider adoption in healthcare. The recently introduced Multilayer Convolutional Sparse Coding (ML-CSC) data model provides a model-based explanation of CNNs. This article aims to extend the ML-CSC framework towards multimodal data processing, which to our knowledge has not been addressed so far. In particular, we focus on interpretable medical image segmentation architecture design for multimodal data. We derive a novel sparse coding algorithm and propose three different CNN architectures with increasing performance, without introducing any additional learnable parameters. Based on the sparse coding theory, our multimodal extension enables the systematic design of interpretable CNN segmentation architectures. Experimental analysis demonstrates that the achieved segmentation results are consistent with the obtained theoretical expectations.

Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this article, the authors used automatic facial expression recognition, trained and evaluated on the AffectNet database, to predict the valence and arousal of 48 subjects during an HRC scenario.
Abstract: Human-Robot Collaboration (HRC) in the context of industrial workflows is becoming more and more important. However, cooperation with powerful industrial robots might be problematic for human workers, who could suffer from fear or irritation. In this paper, we use automatic facial expression recognition, trained and evaluated on the AffectNet database, to predict the valence and arousal of 48 subjects during an HRC scenario. This covers an assembly task under regular and three kinds of aggravated conditions. The subjects are divided into two groups: the feedback group, which automatically receives information according to the new situation, and the no-feedback group, which does not. We found that while arousal levels remained unaffected, the no-feedback group showed lower valence under aggravated conditions. This effect was compensated in the feedback group.

Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this paper, an appearance-based metric is used to associate detections with tracks after false negatives and occlusion, and only high-confidence tracks with a minimum frequency of appearance are counted.
Abstract: Simple Online and Real-time Tracking (SORT) and its deep extension (DeepSORT) are simple, fast, and effective multi-object tracking-by-detection frameworks. Their main strengths are simplicity and speed. However, they still suffer from problems such as identity switches, instance merges, and many false positives, which prevent the tracking results from being used for subsequent tasks such as counting. In this paper, we strengthen and improve the tracking using EfficientDet and DeepSORT. In our approach, the motion prediction uses appearance, and the appearance embedding uses location. First, we modify the deep detection network to predict the objects' motion in the next frame by leveraging the attention between the current image and the next image. Second, an appearance-based metric is used to associate detections with tracks after false negatives and occlusion. This metric is a learned Mahalanobis distance between two feature descriptors constructed using EfficientDet and attention given to regions of interest from their images. Finally, we count only high-confidence tracks with a minimum frequency of appearance. Our approach has been applied to a challenging real-life problem, namely seabed species tracking and counting. Our experimental results show that Robust DeepSORT reduces identity switches and merges. Thus, it improves tracking and counting evaluation measures while keeping the simplicity of the original DeepSORT.
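The abstract mentions a learned Mahalanobis distance between feature descriptors for data association. A minimal sketch of such an association metric, assuming a precomputed covariance estimate for the descriptor space (the function names and the gating usage are illustrative, not the paper's implementation):

```python
import numpy as np

def mahalanobis_distance(det_feat, track_feat, cov):
    """Mahalanobis distance between a detection descriptor and a track
    descriptor, given a covariance estimate of the feature space."""
    diff = det_feat - track_feat
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

def associate(det_feat, track_feats, cov, gate=9.4877):
    """Return the index of the closest track within the gating
    threshold, or None if every track is too far (missed detection)."""
    dists = [mahalanobis_distance(det_feat, t, cov) for t in track_feats]
    best = int(np.argmin(dists))
    return best if dists[best] <= gate else None
```

With an identity covariance the metric reduces to the Euclidean distance; a learned covariance (as in the paper) reweights feature dimensions by their reliability.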

Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this paper, a novel minimax algorithm is proposed to tackle the shortcomings of the conventional minimax approach, at the cost of increased computational complexity compared to the conventional minimax algorithm.
Abstract: Global active noise control (ANC) employs the multichannel filtered-x least mean square (MCFxLMS) algorithm, as it is well suited to obtaining a large quiet zone. The minimax algorithm was proposed to counter the high computational complexity of MCFxLMS-based ANC by minimizing the square of the maximum of the absolute values of the residual noise at the error microphones. However, the minimax approach leads to inferior performance in terms of both convergence and noise reduction. The classical minimax approach also offers little flexibility in adjusting ANC performance. In this paper, a novel minimax algorithm is proposed to tackle these shortcomings of the conventional minimax algorithm, at the cost of increased computational complexity compared to it. The performance of the proposed approach is evaluated and compared with the classical minimax algorithm for global noise reduction in a 2-dimensional quiet zone of size 1 m x 1 m inside a 3-dimensional reverberant room. The proposed scheme improves performance with much lower computational complexity than MCFxLMS, though with higher complexity than the classical minimax approach.
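The classical minimax idea described above updates the adaptive filter using only the error microphone with the largest residual, rather than summing over all microphones as MCFxLMS does. A minimal single-filter sketch of one such adaptation step (array shapes and the step size are illustrative; the paper's novel variant is not reproduced here):

```python
import numpy as np

def minimax_fxlms_step(w, x_filt, e, mu):
    """One classical minimax adaptation step for ANC.

    w      : (L,) adaptive filter weights
    x_filt : (M, L) filtered reference signal, one row per error mic
    e      : (M,) residual noise at the M error microphones
    mu     : LMS step size

    Only the microphone with the largest absolute residual drives the
    update, which is what cuts the cost relative to full MCFxLMS.
    """
    m = int(np.argmax(np.abs(e)))          # worst-case error microphone
    return w - mu * e[m] * x_filt[m]
```

MCFxLMS would instead accumulate `mu * e[m] * x_filt[m]` over all M microphones per step, which is why its complexity grows with the microphone count.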

Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this article, a comprehensive review about the different techniques applied in MOT of deep learning based on different methods is presented, which analyzes the benefits and the constraints of current strategies, techniques and methods.
Abstract: The world is undergoing a major shift from the information era to the artificial intelligence (AI) era. Machines are gaining the ability to sense the surrounding world and to make decisions. Computer vision, and especially multi-object tracking (MOT), which relies on deep learning, is at the heart of this shift. Indeed, with the growth of deep learning, the methods and algorithms tackling this problem have gained better performance through the integration of deep learning models. Deep learning has proven effective for MOT, which faces challenges such as objects entering and leaving the scene, unlabeled data, confusing appearance, and occlusion. Deep-learning-based MOT techniques have recently gained ground rapidly, from representation learning to network modelling, thanks to advances in deep learning theory and benchmark organization. This paper summarizes and analyzes state-of-the-art deep-learning-based MOT techniques. It also offers a comprehensive review of the different deep learning methods applied to MOT. Furthermore, this study analyzes the benefits and the constraints of current strategies, techniques and methods.

Proceedings ArticleDOI
13 Sep 2021
TL;DR: In this paper, the authors simulate reduced axial scan (A-scan) resolution by Gaussian windowing in the spectral domain and investigate the use of a learning-based approach for image feature reconstruction.
Abstract: Optical coherence tomography (OCT) is a noninvasive imaging modality utilized by ophthalmologists to acquire volumetric data to characterize the retina, the light-sensitive tissue at the back of the eye. OCT captures cross-sectional data and is used for the screening, monitoring, and treatment planning of retinal diseases. Technological developments to increase the speed of acquisition often result in systems with narrower spectral bandwidth, and hence a lower axial resolution. Traditionally, image-processing-based techniques have been utilized to reconstruct subsampled OCT data, and more recently, deep-learning-based methods have been explored. In this study, we simulate reduced axial scan (A-scan) resolution by Gaussian windowing in the spectral domain and investigate the use of a learning-based approach for image feature reconstruction. Our experiment is limited by the size of our current dataset, and we leverage techniques such as transfer learning from large natural image databases and image augmentation in our implementation. In anticipation of the reduced resolution that accompanies wide-field OCT systems, we attempt to reconstruct lost features using a pixel-to-pixel approach with an altered super-resolution GAN (SRGAN) architecture. Similar techniques have been used to upscale medical images of lower size and resolution, such as radiographs. We build upon methods of super-resolution to explore ways of better aiding clinicians in their decision-making to improve patient outcomes.
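The degradation step described above, Gaussian windowing in the spectral domain, can be sketched as multiplying an A-scan's spectral fringe by a Gaussian before the inverse FFT; a narrower window mimics a narrower source bandwidth and hence lower axial resolution. The `sigma_frac` parameter and function name are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def gaussian_window_spectrum(fringe, sigma_frac=0.25):
    """Simulate reduced axial resolution for one A-scan.

    fringe     : (N,) complex or real spectral-domain interference signal
    sigma_frac : Gaussian std as a fraction of the spectrum length;
                 smaller values mean a narrower effective bandwidth.
    Returns the magnitude A-scan after windowing and inverse FFT.
    """
    n = fringe.shape[-1]
    k = np.arange(n) - n / 2               # spectral axis centered at 0
    window = np.exp(-0.5 * (k / (sigma_frac * n)) ** 2)
    return np.abs(np.fft.ifft(fringe * window))
```

Applying this to every A-scan of a volume yields the paired low-resolution inputs from which a network like the modified SRGAN can learn to recover the original features.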