
Showing papers in "Journal of Imaging in 2022"


Journal ArticleDOI
TL;DR: This article surveys different data augmentation techniques employed on mammogram images and aims to provide insights into basic and deep learning-based augmentation techniques.
Abstract: Research in the medical imaging field using deep learning approaches has become increasingly prevalent. Scientific findings reveal that supervised deep learning methods’ performance heavily depends on the size of the training set, which expert radiologists must manually annotate. The latter is quite a tiring and time-consuming task. Therefore, most of the freely accessible biomedical image datasets are small-sized. Furthermore, it is challenging to have big-sized medical image datasets due to privacy and legal issues. Consequently, many supervised deep learning models are prone to overfitting and cannot produce generalized output. One of the most popular methods to mitigate this issue is data augmentation. This technique increases the training set size by applying various transformations and has been shown to improve model performance when tested on new data. This article surveys different data augmentation techniques employed on mammogram images. The article aims to provide insights into basic and deep learning-based augmentation techniques.
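As a minimal illustration of the basic (non-deep-learning) augmentations this kind of survey covers, the sketch below applies a few common geometric and intensity transformations with torchvision; the file path and parameter values are hypothetical, not taken from the surveyed papers.

```python
# Minimal sketch of basic augmentations commonly applied to mammogram patches
# (flips, small rotations, mild intensity jitter). Path and parameters are illustrative.
from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                 # mirror the breast laterally
    transforms.RandomRotation(degrees=10),                  # small rotations preserve anatomy
    transforms.ColorJitter(brightness=0.1, contrast=0.1),   # mild intensity variation
])

image = Image.open("mammogram_patch.png").convert("L")       # hypothetical grayscale patch
augmented_samples = [augment(image) for _ in range(8)]       # 8 augmented copies per original
```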

30 citations


Journal ArticleDOI
TL;DR: On the whole, MWI has proven its potential as a screening tool for breast cancer detection, either as a standalone or a complementary technique. However, a few challenges need to be addressed to unlock the full potential of this imaging modality and translate it to clinical settings.
Abstract: Breast cancer is the most commonly diagnosed cancer type and is the leading cause of cancer-related death among females worldwide. Breast screening and early detection are currently the most successful approaches for the management and treatment of this disease. Several imaging modalities are currently utilized for detecting breast cancer, of which microwave imaging (MWI) is gaining quite a lot of attention as a promising diagnostic tool for early breast cancer detection. MWI is a noninvasive, relatively inexpensive, fast, convenient, and safe screening tool. The purpose of this paper is to provide an up-to-date survey of the principles, developments, and current research status of MWI for breast cancer detection. This paper is structured into two sections; the first is an overview of current MWI techniques used for detecting breast cancer, followed by an explanation of the working principle behind MWI and its various types, namely, microwave tomography and radar-based imaging. In the second section, a review of the initial experiments along with more recent studies on the use of MWI for breast cancer detection is presented. Furthermore, the paper summarizes the challenges facing MWI as a breast cancer detection tool and provides future research directions. On the whole, MWI has proven its potential as a screening tool for breast cancer detection, either as a standalone or a complementary technique. However, a few challenges need to be addressed to unlock the full potential of this imaging modality and translate it to clinical settings.

27 citations


Journal ArticleDOI
TL;DR: The aim of this review is to briefly explain the technical principles of photon-counting CT and, more extensively, the potential clinical applications of this technology.
Abstract: Photon-counting computed tomography (CT) is a technology that has attracted increasing interest in recent years since, thanks to new-generation detectors, it holds the promise to radically change the clinical use of CT imaging. Photon-counting detectors overcome the major limitations of conventional CT detectors by providing very high spatial resolution without electronic noise, a higher contrast-to-noise ratio, and optimized spectral imaging. Additionally, photon-counting CT can lead to reduced radiation exposure, reconstruction of higher-spatial-resolution images, reduction of image artifacts, optimization of the use of contrast agents, and new opportunities for quantitative imaging. The aim of this review is to briefly explain the technical principles of photon-counting CT and, more extensively, the potential clinical applications of this technology.

18 citations


Journal ArticleDOI
TL;DR: This survey provides a comprehensive overview of brain tumor classification and segmentation techniques, with a focus on ML-based, CNN-based, CapsNet-based, and ViT-based techniques.
Abstract: Management of brain tumors is based on clinical and radiological information with presumed grade dictating treatment. Hence, a non-invasive assessment of tumor grade is of paramount importance to choose the best treatment plan. Convolutional Neural Networks (CNNs) represent one of the effective Deep Learning (DL)-based techniques that have been used for brain tumor diagnosis. However, they are unable to handle input modifications effectively. Capsule neural networks (CapsNets) are a novel type of machine learning (ML) architecture that was recently developed to address the drawbacks of CNNs. CapsNets are resistant to rotations and affine translations, which is beneficial when processing medical imaging datasets. Moreover, Vision Transformers (ViT)-based solutions have been very recently proposed to address the issue of long-range dependency in CNNs. This survey provides a comprehensive overview of brain tumor classification and segmentation techniques, with a focus on ML-based, CNN-based, CapsNet-based, and ViT-based techniques. The survey highlights the fundamental contributions of recent studies and the performance of state-of-the-art techniques. Moreover, we present an in-depth discussion of crucial issues and open challenges. We also identify some key limitations and promising future research directions. We envisage that this survey shall serve as a good springboard for further study.

16 citations


Journal ArticleDOI
TL;DR: This paper focused on the development of an automatic ship detection (ASD) approach using DL methods to assess the Airbus ship dataset (composed of about 40,000 satellite images), and found that the YOLOv5 object detection algorithm outperforms the other versions of the YOLO algorithm.
Abstract: The remote sensing surveillance of maritime areas represents an essential task for both security and environmental reasons. Recently, learning strategies belonging to the field of machine learning (ML) have become a niche of interest for the remote sensing community. Specifically, a major challenge is the automatic classification of ships from satellite imagery, which is needed for traffic surveillance systems, protection against illegal fisheries, control systems for oil discharge, and the monitoring of sea pollution. Deep learning (DL) is a branch of ML that has emerged in the last few years as a result of advancements in digital technology and data availability. DL has shown capacity and efficacy in tackling difficult learning tasks that were previously intractable. Specifically, DL methods, such as convolutional neural networks (CNNs), have been reported to be efficient in image detection and recognition applications. In this paper, we focused on the development of an automatic ship detection (ASD) approach using DL methods to assess the Airbus ship dataset (composed of about 40,000 satellite images). The paper explores and analyzes the distinct variations of the YOLO algorithm for the detection of ships from satellite images. A comparison of different versions of the YOLO algorithm for ship detection, such as YOLOv3, YOLOv4, and YOLOv5, is presented, after training them on a personal computer with a large dataset of satellite images from the Airbus Ship Challenge and Shipsnet. The differences between the algorithms could be observed on the personal computer. We have confirmed that these algorithms can be used for effective ship detection from satellite images. The conclusion drawn from the conducted research is that the YOLOv5 object detection algorithm outperforms the other YOLO versions, i.e., YOLOv4 and YOLOv3, in terms of accuracy, achieving 99% for YOLOv5 compared to 98% and 97% for YOLOv4 and YOLOv3, respectively.
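As a hedged sketch of how one of the compared detectors can be applied to a satellite image, the snippet below loads a pretrained YOLOv5 model through torch.hub and runs inference; fine-tuning on the Airbus Ship data, as done in the paper, would additionally require a dataset configuration and the repository's training script, which are not reproduced here. The image path is hypothetical.

```python
# Sketch: run a pretrained YOLOv5 model on a satellite image tile.
# The paper fine-tunes YOLOv3/v4/v5 on ship data; only the inference
# entry point is shown here, and the tile path is illustrative.
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
results = model("satellite_tile.jpg")     # hypothetical image tile
results.print()                           # per-class counts and timing
detections = results.pandas().xyxy[0]     # bounding boxes as a DataFrame
print(detections[["xmin", "ymin", "xmax", "ymax", "confidence", "name"]])
```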

16 citations


Journal ArticleDOI
TL;DR: This paper aims to uncover the limitations faced in the image acquisition (through the use of cameras), image segmentation and tracking, feature extraction, and gesture classification stages of vision-driven hand gesture recognition under various camera orientations.
Abstract: Researchers have recently focused their attention on vision-based hand gesture recognition. However, due to several constraints, achieving an effective vision-driven hand gesture recognition system in real time has remained a challenge. This paper aims to uncover the limitations faced in the image acquisition (through the use of cameras), image segmentation and tracking, feature extraction, and gesture classification stages of vision-driven hand gesture recognition under various camera orientations. This paper looked at research on vision-based hand gesture recognition systems from 2012 to 2022. Its goal is to identify areas that are improving and those that require further work. We used specific keywords to find 108 articles in well-known online databases. In this article, we put together a collection of the most notable research works related to gesture recognition. We suggest different categories for gesture recognition-related research, with subcategories, to create a valuable resource in this domain. We summarize and analyze the methodologies in tabular form. After comparing similar types of methodologies in the gesture recognition field, we have drawn conclusions based on our findings. Our research also looked at how well the vision-based system recognized hand gestures in terms of recognition accuracy. There is a wide variation in identification accuracy, from 68% to 97%, with the average being 86.6%. The limitations considered comprise multiple text and interpretations of gestures and complex non-rigid hand characteristics. In comparison to current research, this paper is unique in that it discusses all types of gesture recognition techniques.

15 citations


Journal ArticleDOI
TL;DR: Though promising system accuracy on the order of 2–5 mm has been demonstrated in phantom models, several human factors and technical challenges remain to be addressed prior to widespread adoption of OST-HMD-led surgical navigation.
Abstract: We conducted a systematic review of recent literature to understand the current challenges in the use of optical see-through head-mounted displays (OST-HMDs) for augmented reality (AR) assisted surgery. Using Google Scholar, 57 relevant articles from 1 January 2021 through 18 March 2022 were identified. Selected articles were then categorized based on a taxonomy that described the required components of an effective AR-based navigation system: data, processing, overlay, view, and validation. Our findings indicated a focus on orthopedic (n=20) and maxillofacial surgeries (n=8). For preoperative input data, computed tomography (CT) (n=34) and surface-rendered models (n=39) were most commonly used to represent image information. Virtual content was commonly superimposed directly on the target site (n=47); this was achieved by surface tracking of fiducials (n=30), external tracking (n=16), or manual placement (n=11). Microsoft HoloLens devices (n=24 in 2021, n=7 in 2022) were the most frequently used OST-HMDs; gestures and/or voice (n=32) served as the preferred interaction paradigm. Though promising system accuracy on the order of 2–5 mm has been demonstrated in phantom models, several human factors and technical challenges—perception, ease of use, context, interaction, and occlusion—remain to be addressed prior to widespread adoption of OST-HMD-led surgical navigation.

15 citations


Journal ArticleDOI
TL;DR: The current development of AI for COVID-19 management and the outlook for emerging trends of combining AI-based LUS with robotics, telehealth, and other techniques are summarized.
Abstract: Ultrasound imaging of the lung has played an important role in managing patients with COVID-19–associated pneumonia and acute respiratory distress syndrome (ARDS). During the COVID-19 pandemic, lung ultrasound (LUS) or point-of-care ultrasound (POCUS) has been a popular diagnostic tool due to its unique imaging capability and logistical advantages over chest X-ray and CT. Pneumonia/ARDS is associated with the sonographic appearances of pleural line irregularities and B-line artefacts, which are caused by interstitial thickening and inflammation, and increase in number with severity. Artificial intelligence (AI), particularly machine learning, is increasingly used as a critical tool that assists clinicians in LUS image reading and COVID-19 decision making. We conducted a systematic review from academic databases (PubMed and Google Scholar) and preprints on arXiv or TechRxiv of the state-of-the-art machine learning technologies for LUS images in COVID-19 diagnosis. Openly accessible LUS datasets are listed. Various machine learning architectures have been employed to evaluate LUS and showed high performance. This paper will summarize the current development of AI for COVID-19 management and the outlook for emerging trends of combining AI-based LUS with robotics, telehealth, and other techniques.

14 citations


Journal ArticleDOI
TL;DR: In this paper, the use of cameras capable of capturing a 360° scene with a single image was assessed. Using spherical photogrammetry and algorithms based on structure from motion and multi-view stereo, it is possible to reconstruct the geometry (point cloud) of an object or structure.
Abstract: The digitization of Cultural Heritage is an important activity for the protection, management, and conservation of structures of particular historical and architectural interest. In this context, the use of low-cost sensors, especially in the photogrammetric field, represents a major research challenge. In this paper, the use of cameras capable of capturing a 360° scene with a single image was assessed. By using spherical photogrammetry and algorithms based on structure from motion and multi-view stereo, it is possible to reconstruct the geometry (point cloud) of an object or structure. In particular, for this experiment, the Ricoh Theta SC2 camera was used. The analysis was conducted on two sites: one in the laboratory and another directly in the field for the digitization of a large structure (Colonada in Buziaș, Romania). In the case study of the laboratory, several tests were carried out to identify the best strategy for reconstructing the 3D model of the observed environment. In this environment, the approach that provided the best result in terms of both detail and dimensional accuracy was subsequently applied to the case study of Colonada in Buziaș. In this latter case study, a comparison of the point cloud generated by this low-cost sensor with one produced by a high-performance Terrestrial Laser Scanner (TLS) showed a difference of 15 centimeters for 80% of the points. In addition, the 3D point cloud obtained from 360° images is rather noisy and unable to reconstruct complex geometries with small dimensions. However, the photogrammetric dataset can be used for the reconstruction of a virtual tour for the documentation and dissemination of Cultural Heritage.
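A minimal sketch of the kind of cloud-to-cloud comparison reported here, assuming both the 360° photogrammetric cloud and the TLS reference are already co-registered and available as N×3 arrays; the file names are hypothetical and the 80th-percentile summary simply mirrors the "80% of the points" statistic quoted in the abstract.

```python
# Sketch: nearest-neighbour distance between a photogrammetric point cloud
# and a TLS reference cloud, summarised at the 80th percentile.
import numpy as np
from scipy.spatial import cKDTree

cloud_360 = np.loadtxt("cloud_360.xyz")   # hypothetical photogrammetric cloud (N x 3)
cloud_tls = np.loadtxt("cloud_tls.xyz")   # hypothetical TLS reference cloud (M x 3)

tree = cKDTree(cloud_tls)
distances, _ = tree.query(cloud_360, k=1)            # distance of each 360° point to the TLS cloud
print("80th percentile distance [m]:", np.percentile(distances, 80))
```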

13 citations


Journal ArticleDOI
TL;DR: The workflow, architecture, application, and future development of matRadiomics are discussed, and its working principles are demonstrated in a real case study, with the aim of establishing a reference standard for the whole radiomics analysis, starting from image visualization up to predictive model implementation.
Abstract: Radiomics aims to support clinical decisions through its workflow, which is divided into: (i) target identification and segmentation, (ii) feature extraction, (iii) feature selection, and (iv) model fitting. Many radiomics tools were developed to fulfill the steps mentioned above. However, to date, users must switch between different software packages to complete the radiomics workflow. To address this issue, we developed a new free and user-friendly radiomics framework, namely matRadiomics, which allows the user: (i) to import and inspect biomedical images, (ii) to identify and segment the target, (iii) to extract the features, (iv) to reduce and select them, and (v) to build a predictive model using machine learning algorithms. As a result, biomedical images can be visualized and segmented and, through the integration of Pyradiomics into matRadiomics, radiomic features can be extracted. These features can be selected using a hybrid descriptive–inferential method and, consequently, used to train three different classifiers: linear discriminant analysis, k-nearest neighbors, and support vector machines. Model validation is performed using k-fold cross-validation and k-fold stratified cross-validation. Finally, the performance metrics of each model are shown in the graphical interface of matRadiomics. In this study, we discuss the workflow, architecture, application, and future development of matRadiomics, and we demonstrate its working principles in a real case study, with the aim of establishing a reference standard for the whole radiomics analysis, starting from image visualization up to predictive model implementation.
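The sketch below illustrates the kind of pipeline matRadiomics wraps in its interface: Pyradiomics feature extraction followed by a scikit-learn classifier validated with stratified k-fold cross-validation. It is not the matRadiomics code itself; file names, labels, and the choice of SVM parameters are assumptions, and a real study would include many more cases.

```python
# Sketch of a radiomics pipeline in the spirit of matRadiomics:
# Pyradiomics extraction -> feature matrix -> classifier -> stratified k-fold CV.
# Image/mask paths and labels are illustrative placeholders only.
import numpy as np
from radiomics import featureextractor
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

extractor = featureextractor.RadiomicsFeatureExtractor()

# (image, segmentation mask, label); a real dataset would contain many more cases
cases = [("pat01_img.nrrd", "pat01_mask.nrrd", 0),
         ("pat02_img.nrrd", "pat02_mask.nrrd", 1)]

X, y = [], []
for image_path, mask_path, label in cases:
    features = extractor.execute(image_path, mask_path)
    # keep only the numeric radiomic features, dropping diagnostic metadata
    X.append([float(v) for k, v in features.items() if not k.startswith("diagnostics")])
    y.append(label)

model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, np.array(X), np.array(y), cv=cv)
print("Accuracy per fold:", scores)
```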

12 citations


Journal ArticleDOI
TL;DR: iPPG perfusion maps were successfully extracted from the intestine microvasculature, demonstrating that iPPG can be successfully used for detecting perturbations and perfusion changes in intestinal tissues during surgery.
Abstract: Surgical excision is the gold standard for treatment of intestinal tumors. In this surgical procedure, inadequate perfusion of the anastomosis can lead to postoperative complications, such as anastomotic leakages. Imaging photoplethysmography (iPPG) can potentially provide objective and real-time feedback on the perfusion status of tissues. This feasibility study aims to evaluate an iPPG acquisition system during intestinal surgeries to detect the perfusion levels of the microvasculature tissue bed under different perfusion conditions. The study assesses three patients who underwent resection of a portion of the small intestine. Data were acquired from fully perfused, non-perfused, and anastomosis parts of the intestine during different phases of the surgical procedure. Strategies for limiting motion and noise during acquisition were implemented. iPPG perfusion maps were successfully extracted from the intestine microvasculature, demonstrating that iPPG can be successfully used for detecting perturbations and perfusion changes in intestinal tissues during surgery. This study provides proof of concept for iPPG to detect changes in organ perfusion levels.
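One common way to compute an iPPG perfusion map, sketched below, is to take each pixel's temporal intensity signal over a short, stabilised video segment and measure its spectral energy in the cardiac frequency band; higher pulsatile energy indicates stronger perfusion. This is a generic illustration, not necessarily the authors' exact pipeline, and the frame rate and band limits are assumptions.

```python
# Sketch: per-pixel pulsatile-energy map from an intensity video clip.
# 'frames' is assumed to be a (T, H, W) array from a stabilised recording;
# the 0.8-3.0 Hz band roughly covers adult heart rates (assumed values).
import numpy as np

def ippg_perfusion_map(frames: np.ndarray, fps: float = 30.0) -> np.ndarray:
    signal = frames.astype(np.float64)
    signal -= signal.mean(axis=0, keepdims=True)          # remove per-pixel DC level
    spectrum = np.abs(np.fft.rfft(signal, axis=0))        # temporal spectrum per pixel
    freqs = np.fft.rfftfreq(frames.shape[0], d=1.0 / fps)
    band = (freqs >= 0.8) & (freqs <= 3.0)                # cardiac band (assumed)
    return spectrum[band].sum(axis=0)                     # pulsatile energy per pixel

frames = np.random.rand(300, 64, 64)                      # placeholder 10 s clip at 30 fps
perfusion_map = ippg_perfusion_map(frames)
```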

Journal ArticleDOI
TL;DR: The proposed methodology may improve the method of calculating the [64Cu]chelator biodistribution and open the way towards a decision support system in the field of new radiopharmaceuticals used in preclinical imaging trials.
Abstract: The 64Cu-labeled chelator was analyzed in vivo by positron emission tomography (PET) imaging to evaluate its biodistribution in a murine model at different acquisition times. For this purpose, nine 6-week-old female Balb/C nude strain mice underwent micro-PET imaging at three different time points after 64Cu-labeled chelator injection. Specifically, the mice were divided into group 1 (acquisition 1 h after [64Cu]chelator administration, n = 3 mice), group 2 (acquisition 4 h after [64Cu]chelator administration, n = 3 mice), and group 3 (acquisition 24 h after [64Cu]chelator administration, n = 3 mice). Subsequently, all PET studies were segmented by means of registration with a standard template space (3D whole-body Digimouse atlas), and 108 radiomics features were extracted from seven organs (namely, heart, bladder, stomach, liver, spleen, kidney, and lung) to investigate possible changes over time in the [64Cu]chelator biodistribution. The one-way analysis of variance and post hoc Tukey Honestly Significant Difference test revealed that, while the heart, stomach, spleen, kidney, and lung districts showed a very low percentage of radiomics features with significant variations (p-value < 0.05) among the three groups of mice, a large number of features (greater than 60% and 50%, respectively) that varied significantly between groups were observed in the bladder and liver, indicating a different in vivo uptake of the 64Cu-labeled chelator over time. The proposed methodology may improve the calculation of the [64Cu]chelator biodistribution and open the way towards a decision support system in the field of new radiopharmaceuticals used in preclinical imaging trials.
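The statistical comparison described here (one-way ANOVA followed by a Tukey HSD post hoc test across the three acquisition-time groups, applied per radiomic feature) can be sketched with SciPy and statsmodels as follows; the feature values are placeholders, not the study's data.

```python
# Sketch: one-way ANOVA + Tukey HSD for a single radiomic feature measured in
# three groups of mice (1 h, 4 h, 24 h after injection). Values are placeholders.
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

group_1h = np.array([0.42, 0.45, 0.40])
group_4h = np.array([0.50, 0.53, 0.49])
group_24h = np.array([0.61, 0.58, 0.63])

f_stat, p_value = f_oneway(group_1h, group_4h, group_24h)
print(f"ANOVA: F = {f_stat:.2f}, p = {p_value:.4f}")

if p_value < 0.05:   # significance threshold used in the paper
    values = np.concatenate([group_1h, group_4h, group_24h])
    labels = ["1h"] * 3 + ["4h"] * 3 + ["24h"] * 3
    print(pairwise_tukeyhsd(values, labels, alpha=0.05))
```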

Journal ArticleDOI
TL;DR: A narrative review of Generative Adversarial Networks in brain imaging is provided, discussing the clinical potential of GANs, future clinical applications, as well as pitfalls that radiologists should be aware of.
Abstract: Artificial intelligence (AI) is expected to have a major effect on radiology as it has demonstrated remarkable progress in many clinical tasks, mostly regarding the detection, segmentation, classification, monitoring, and prediction of diseases. Generative Adversarial Networks (GANs) have been proposed as one of the most exciting applications of deep learning in radiology. GANs are a new approach to deep learning that leverages adversarial learning to tackle a wide array of computer vision challenges. Brain radiology was one of the first fields where GANs found their application. In neuroradiology, indeed, GANs open unexplored scenarios, allowing new processes such as image-to-image and cross-modality synthesis, image reconstruction, image segmentation, image synthesis, data augmentation, disease progression models, and brain decoding. In this narrative review, we provide an introduction to GANs in brain imaging, discussing the clinical potential of GANs, future clinical applications, as well as pitfalls that radiologists should be aware of.

Journal ArticleDOI
TL;DR: Whether deep learning techniques may be helpful in performing accurate and low-cost measurements related to glaucoma, which may promote patient empowerment and help medical doctors better monitor patients is verified.
Abstract: Artificial intelligence techniques are now being applied in different medical solutions ranging from disease screening to activity recognition and computer-aided diagnosis. The combination of computer science methods and medical knowledge facilitates and improves the accuracy of the different processes and tools. Inspired by these advances, this paper performs a literature review focused on state-of-the-art glaucoma screening, segmentation, and classification based on images of the papilla and excavation using deep learning techniques. These techniques have been shown to have high sensitivity and specificity in glaucoma screening based on papilla and excavation images. The automatic segmentation of the contours of the optic disc and the excavation then allows the identification and assessment of the glaucomatous disease’s progression. As a result, we verified whether deep learning techniques may be helpful in performing accurate and low-cost measurements related to glaucoma, which may promote patient empowerment and help medical doctors better monitor patients.

Journal ArticleDOI
TL;DR: The aim of the review is to provide information and advice for practitioners to select the appropriate version of watershed for their problem solving, and to forecast future directions of software development for 3D image segmentation by watershed.
Abstract: Watershed is a widely used image segmentation algorithm. Most researchers understand just the basic idea of this method: a grayscale image is considered as a topographic relief, which is flooded from initial basins. However, they are frequently not aware of the options of the algorithm and the peculiarities of its implementations. There are many watershed implementations in software packages and products. Even if these packages are based on the identical algorithm (watershed by flooding), their outcomes, processing speed, and consumed memory vary greatly. In particular, the difference among various implementations is noticeable for huge volumetric images; for instance, tomographic 3D images, for which the low performance and high memory requirements of watershed might be bottlenecks. In our review, we discuss the peculiarities of algorithms with and without waterline generation, the impact of connectivity type and relief quantization level on the result, approaches for parallelization, as well as other method options. We present detailed benchmarking of seven open-source and three commercial software implementations of marker-controlled watershed for semantic or instance segmentation. We compare those software packages on one synthetic and two natural volumetric images. The aim of the review is to provide information and advice for practitioners to select the appropriate version of watershed for their problem solving, and to forecast future directions of software development for 3D image segmentation by watershed.
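A compact example of the marker-controlled variant that the benchmark covers, using scikit-image: markers are seeded at maxima of the distance transform of a binary mask, and the inverted distance map is flooded. The synthetic two-disk input and the footprint size are illustrative choices, not taken from the review.

```python
# Sketch: marker-controlled watershed on a binary mask of two touching disks,
# using the distance transform to seed one marker per object.
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.segmentation import watershed

# Placeholder binary image: two overlapping disks of radius 20 px.
yy, xx = np.mgrid[0:100, 0:100]
mask = ((xx - 35) ** 2 + (yy - 50) ** 2 < 400) | ((xx - 65) ** 2 + (yy - 50) ** 2 < 400)

distance = ndi.distance_transform_edt(mask)
peaks = peak_local_max(distance, labels=mask.astype(int), footprint=np.ones((11, 11)))
markers = np.zeros(mask.shape, dtype=int)
markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)   # one integer label per seed

labels = watershed(-distance, markers, mask=mask)         # flood the inverted distance map
print("Objects found:", labels.max())
```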

Journal ArticleDOI
TL;DR: This review aims to provide an overview of the different noninvasive imaging techniques for the evaluation of ICC, providing information ranging from the anatomical assessment of the coronary arteries to the assessment of ischemic myocardium and myocardial infarction.
Abstract: Ischemic chronic cardiomyopathy (ICC) is still one of the most common cardiac diseases leading to the development of myocardial ischemia, infarction, or heart failure. The application of several imaging modalities can provide information regarding coronary anatomy, coronary artery disease, myocardial ischemia, and tissue characterization. In particular, coronary computed tomography angiography (CCTA) can provide information regarding coronary plaque stenosis, its composition, and the possible evaluation of myocardial ischemia using fractional flow reserve CT or CT perfusion. Cardiac magnetic resonance (CMR) can be used to evaluate cardiac function as well as the presence of ischemia. In addition, CMR can be used to characterize the myocardial tissue of hibernated or infarcted myocardium. Echocardiography is the most widely used technique to obtain information regarding function and myocardial wall motion abnormalities during myocardial ischemia. Nuclear medicine can be used to evaluate perfusion in both qualitative and quantitative assessments. In this review, we aim to provide an overview of the different noninvasive imaging techniques for the evaluation of ICC, providing information ranging from the anatomical assessment of the coronary arteries to the assessment of ischemic myocardium and myocardial infarction. In particular, this review presents the different noninvasive approaches based on the specific clinical history of patients with ICC.

Journal ArticleDOI
TL;DR: The various types of medical images and segmentation techniques and the assessment criteria for segmentation outcomes in kidney tumor segmentation are discussed, highlighting their building blocks and various strategies.
Abstract: Cure rates for kidney cancer vary according to stage and grade; hence, accurate diagnostic procedures for early detection and diagnosis are crucial. Some difficulties with manual segmentation have necessitated the use of deep learning models to assist clinicians in effectively recognizing and segmenting tumors. Deep learning (DL), particularly convolutional neural networks, has produced outstanding success in classifying and segmenting images. Simultaneously, researchers in the field of medical image segmentation employ DL approaches to solve problems such as tumor segmentation, cell segmentation, and organ segmentation. Semantic segmentation of tumors is critical in radiation and therapeutic practice. This article discusses current advances in kidney tumor segmentation systems based on DL. We discuss the various types of medical images and segmentation techniques and the assessment criteria for segmentation outcomes in kidney tumor segmentation, highlighting their building blocks and various strategies.

Journal ArticleDOI
TL;DR: This work proposes a simple way to train few-shot classification models, with the aim of reaching top performance on multiple standardized benchmarks in the field.
Abstract: Few-shot classification aims at leveraging knowledge learned in a deep learning model, in order to obtain good classification performance on new problems, where only a few labeled samples per class are available. Recent years have seen a fair number of works in the field, each one introducing their own methodology. A frequent problem, though, is the use of suboptimally trained models as a first building block, leading to doubts about whether proposed approaches bring gains if applied to more sophisticated pretrained models. In this work, we propose a simple way to train such models, with the aim of reaching top performance on multiple standardized benchmarks in the field. This methodology offers a new baseline on which to propose (and fairly compare) new techniques or adapt existing ones.

Journal ArticleDOI
TL;DR: Nonlinear reconstruction (NLR), as discussed by the authors, was developed in 2017 to reconstruct the object image in the case of optical-scattering modulators; it can also reconstruct an object image modulated by axicons, bifocal lenses, and even exotic spiral diffractive elements, which generate deterministic optical fields.
Abstract: Indirect-imaging methods involve at least two steps, namely optical recording and computational reconstruction. The optical-recording process uses an optical modulator that transforms the light from the object into a typical intensity distribution. This distribution is numerically processed to reconstruct the object’s image corresponding to different spatial and spectral dimensions. There have been numerous optical-modulation functions and reconstruction methods developed in the past few years for different applications. In most cases, a compatible pair of the optical-modulation function and reconstruction method gives optimal performance. A new reconstruction method, termed nonlinear reconstruction (NLR), was developed in 2017 to reconstruct the object image in the case of optical-scattering modulators. Over the years, it has been revealed that the NLR can reconstruct an object’s image modulated by axicons, bifocal lenses, and even exotic spiral diffractive elements, which generate deterministic optical fields. Apparently, NLR seems to be a universal reconstruction method for indirect imaging. In this review, the performance of NLR is investigated for many deterministic and stochastic optical fields. Simulation and experimental results for different cases are presented and discussed.
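In broad terms, NLR forms a cross-correlation between the recorded object pattern and the point spread function in Fourier space, with the magnitudes of their spectra raised to tunable powers while the phases are retained. The NumPy sketch below illustrates this idea under those assumptions; the exponent values, test arrays, and shift-recovery demonstration are placeholders, and the cited NLR literature should be consulted for the exact formulation.

```python
# Sketch of the nonlinear-reconstruction (NLR) idea: cross-correlate the object
# pattern with the PSF pattern in Fourier space, raising the spectral magnitudes
# to tunable powers (alpha, beta) while keeping the phases.
import numpy as np

def nlr_reconstruct(obj_pattern, psf_pattern, alpha=0.0, beta=0.6):
    O = np.fft.fft2(obj_pattern)
    P = np.fft.fft2(psf_pattern)
    filtered = (np.abs(O) ** alpha * np.exp(1j * np.angle(O)) *
                np.abs(P) ** beta * np.exp(-1j * np.angle(P)))   # conjugated PSF phase
    return np.abs(np.fft.ifft2(filtered))

rng = np.random.default_rng(0)
psf = rng.random((256, 256))                 # placeholder scattered PSF intensity
obj = np.roll(psf, (20, 30), axis=(0, 1))    # "object" = shifted PSF -> peak at the shift
image = nlr_reconstruct(obj, psf)
print("Reconstruction peak at:", np.unravel_index(image.argmax(), image.shape))
```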

Journal ArticleDOI
TL;DR: A detailed comparative review of all types of available techniques and sensors used for sign language recognition was presented in this paper , where the focus was to explore emerging trends and strategies for SLR recognition and to point out deficiencies in existing systems.
Abstract: Sign language recognition is challenging due to the lack of communication between normal and affected people. Many social and physiological impacts are created due to speaking or hearing disability. A variety of techniques have been proposed previously to bridge this gap. A sensor-based smart glove for sign language recognition (SLR) proved helpful to generate data based on various hand movements related to specific signs. A detailed comparative review of all types of available techniques and sensors used for sign language recognition is presented in this article. The focus of this paper is to explore emerging trends and strategies for sign language recognition and to point out deficiencies in existing systems. This paper will act as a guide for other researchers to understand the materials and techniques, such as flex resistive sensor-based, vision sensor-based, or hybrid system-based technologies, used for sign language recognition to date.

Journal ArticleDOI
TL;DR: The proposed method reduces the time-cost associated with manually labelling drones, and it is proved that the convolutional neural network is transferable to real-life videos.
Abstract: We present a convolutional neural network (CNN) that identifies drone models in real-life videos. The neural network is trained on synthetic images and tested on a real-life dataset of drone videos. To create the training and validation datasets, we show a method of generating synthetic drone images. Domain randomization is used to vary the simulation parameters, such as model textures, background images, and orientation. Three common drone models are classified: DJI Phantom, DJI Mavic, and DJI Inspire. To test the performance of the neural network model, Anti-UAV, a real-life dataset of flying drones, is used. The proposed method reduces the time cost associated with manually labelling drones, and we prove that it is transferable to real-life videos. The CNN achieves an overall accuracy of 92.4%, a precision of 88.8%, a recall of 88.6%, and an F1 score of 88.7% when tested on the real-life dataset.
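One possible way to compose a domain-randomized training image, in the spirit of the synthetic data generation described here, is sketched below: a rendered drone cut-out with a transparent background is pasted onto a random background with random scale, rotation, and position. File names and parameter ranges are assumptions, not the authors' pipeline.

```python
# Sketch: compose one domain-randomized training image by pasting a rendered
# drone cut-out (RGBA) onto a background with random scale, rotation, and position.
# Paths and randomization ranges are illustrative only.
import random
from PIL import Image

background = Image.open("backgrounds/sky_0001.jpg").convert("RGB")   # hypothetical background
drone = Image.open("renders/dji_phantom.png").convert("RGBA")        # hypothetical render

scale = random.uniform(0.1, 0.4)                                     # randomized apparent size
drone = drone.resize((int(drone.width * scale), int(drone.height * scale)))
drone = drone.rotate(random.uniform(0, 360), expand=True)            # randomized orientation

x = random.randint(0, background.width - drone.width)                # randomized position
y = random.randint(0, background.height - drone.height)
background.paste(drone, (x, y), mask=drone)                          # alpha-composite onto scene
background.save("synthetic/train_0001.jpg")
```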

Journal ArticleDOI
TL;DR:
Abstract: Secure image transmission is one of the most challenging problems in the age of communication technology. Millions of people use and transfer images for either personal or commercial purposes over the internet. One way of achieving secure image transmission over the network is encryption techniques that convert the original image into a non-understandable or scrambled form, called a cipher image, so that even if an attacker gains access to the cipher, they would not be able to retrieve the original image. In this study, chaos-based image encryption and block cipher techniques are implemented and analyzed for image encryption. The Arnold cat map alone and in combination with a logistic map are used as the native chaotic and hybrid chaotic approaches, respectively, whereas the advanced encryption standard (AES) is used as the block cipher approach. The chaotic and AES methods are applied to encrypt images and are subjected to measures of different performance parameters, such as peak signal-to-noise ratio (PSNR), number of pixels change rate (NPCR), unified average changing intensity (UACI), and histogram and computation time analysis, to measure the strength of each algorithm. The results show that the hybrid chaotic map has better NPCR and UACI values, which makes it more robust to differential attacks or chosen-plaintext attacks. The Arnold cat map is computationally efficient in comparison to the other two approaches. However, AES has a lower PSNR value (7.53 to 11.93) and more variation between the histograms of the original and cipher images, thereby indicating that it is more resistant to statistical attacks than the other two approaches.
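The Arnold cat map used as the native chaotic approach is a simple, invertible permutation of the pixels of a square image; a minimal sketch follows, where the iteration count and placeholder image are assumptions rather than the study's settings.

```python
# Sketch: Arnold cat map scrambling of a square N x N image.
# The pixel at (x, y) is moved to ((x + y) mod N, (x + 2y) mod N); repeating the
# map several times scrambles the image, and the mapping is invertible (periodic).
import numpy as np

def arnold_cat_map(image: np.ndarray, iterations: int = 10) -> np.ndarray:
    n = image.shape[0]
    assert image.shape[0] == image.shape[1], "Arnold cat map needs a square image"
    x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    scrambled = image.copy()
    for _ in range(iterations):
        shuffled = np.empty_like(scrambled)
        shuffled[(x + y) % n, (x + 2 * y) % n] = scrambled   # move (x, y) -> (x+y, x+2y) mod n
        scrambled = shuffled
    return scrambled

plain = np.arange(64 * 64, dtype=np.uint8).reshape(64, 64)   # placeholder "image"
cipher = arnold_cat_map(plain, iterations=10)
```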

Journal ArticleDOI
TL;DR: The results suggest that the use of an HMD to align the clinician’s visual and motor fields promotes successful needle guidance, highlighting the importance of continued HMD-guidance research.
Abstract: While ultrasound (US) guidance has been used during central venous catheterization to reduce complications, including the puncturing of arteries, the rate of such problems remains non-negligible. To further reduce complication rates, mixed-reality systems have been proposed as part of the user interface for such procedures. We demonstrate the use of a surgical navigation system that renders a calibrated US image, and the needle and its trajectory, in a common frame of reference. We compare the effectiveness of this system, whereby images are rendered on a planar monitor and within a head-mounted display (HMD), to the standard-of-care US-only approach, via a phantom-based user study that recruited 31 expert clinicians and 20 medical students. These users performed needle-insertions into a phantom under the three modes of visualization. The success rates were significantly improved under HMD-guidance as compared to US-guidance, for both expert clinicians (94% vs. 70%) and medical students (70% vs. 25%). Users more consistently positioned their needle closer to the center of the vessel’s lumen under HMD-guidance compared to US-guidance. The performance of the clinicians when interacting with this monitor system was comparable to using US-only guidance, with no significant difference being observed across any metrics. The results suggest that the use of an HMD to align the clinician’s visual and motor fields promotes successful needle guidance, highlighting the importance of continued HMD-guidance research.

Journal ArticleDOI
TL;DR: In this article , the authors introduce a probabilistic approach that connects these perspectives based on variational inference in a single deep autoencoder model, and propose to bound the approximate posterior by fitting regions of high density on the basis of correctly classified data points.
Abstract: Modern deep neural networks are well known to be brittle in the face of unknown data instances and recognition of the latter remains a challenge. Although it is inevitable for continual-learning systems to encounter such unseen concepts, the corresponding literature appears to nonetheless focus primarily on alleviating catastrophic interference with learned representations. In this work, we introduce a probabilistic approach that connects these perspectives based on variational inference in a single deep autoencoder model. Specifically, we propose to bound the approximate posterior by fitting regions of high density on the basis of correctly classified data points. These bounds are shown to serve a dual purpose: unseen unknown out-of-distribution data can be distinguished from already trained known tasks towards robust application. Simultaneously, to retain already acquired knowledge, a generative replay process can be narrowed to strictly in-distribution samples, in order to significantly alleviate catastrophic interference.

Journal ArticleDOI
TL;DR: This study aims to develop an efficient deep learning model that can be used to predict British sign language in an attempt to narrow this communication gap between speech-impaired and non-speech-impaired people in the community.
Abstract: Human beings usually rely on communication to express their feelings and ideas and to solve disputes among themselves. A major component required for effective communication is language. Language can occur in different forms, including written symbols, gestures, and vocalizations. It is usually essential for all of the communicating parties to be fully conversant with a common language. However, to date this has not been the case between speech-impaired people who use sign language and people who use spoken languages. A number of different studies have pointed out a significant gap between these two groups which can limit the ease of communication. Therefore, this study aims to develop an efficient deep learning model that can be used to predict British sign language in an attempt to narrow this communication gap between speech-impaired and non-speech-impaired people in the community. Two models were developed in this research, a CNN and an LSTM, and their performance was evaluated using a multi-class confusion matrix. The CNN model emerged with the highest performance, attaining training and testing accuracies of 98.8% and 97.4%, respectively. In addition, the model achieved average weighted precision and recall of 97% and 96%, respectively. On the other hand, the LSTM model’s performance was quite poor, with the maximum training and testing accuracies achieved being 49.4% and 48.7%, respectively. Our research concluded that the CNN model was the best for recognizing and determining British sign language.

Journal ArticleDOI
TL;DR: In this article , a systematic review was conducted following the PRISMA methodology, carefully analysing all studies that reported visual games that include AR activities and somehow included presence data, or related dimensions that may be referred to as immersion-related feelings, analysis or results.
Abstract: The sense of presence in augmented reality (AR) has been studied by multiple researchers through diverse applications and strategies. In addition to the valuable information provided to the scientific community, new questions keep being raised. These approaches vary from following the standards from virtual reality to ascertaining the presence of users’ experiences and new proposals for evaluating presence that specifically target AR environments. It is undeniable that the idea of evaluating presence across AR may be overwhelming due to the different scenarios that may be possible, whether this regards technological devices—from immersive AR headsets to the small screens of smartphones—or the amount of virtual information that is being added to the real scenario. Taking into account the recent literature that has addressed the sense of presence in AR as a true challenge given the diversity of ways that AR can be experienced, this study proposes a specific scope to address presence and other related forms of dimensions such as immersion, engagement, embodiment, or telepresence, when AR is used in games. This systematic review was conducted following the PRISMA methodology, carefully analysing all studies that reported visual games that include AR activities and somehow included presence data—or related dimensions that may be referred to as immersion-related feelings, analysis or results. This study clarifies what dimensions of presence are being considered and evaluated in AR games, how presence-related variables have been evaluated, and what the major research findings are. For a better understanding of these approaches, this study takes note of what devices are being used for the AR experience when immersion-related feelings are one of the behaviours that are considered in their evaluations, and discusses to what extent these feelings in AR games affect the player’s other behaviours.

Journal ArticleDOI
TL;DR: In this set of mammography image processing results, the NCA clinically significantly surpasses YOLOv4 in the case of asymmetric density and of changes invisible on the dense parenchyma background.
Abstract: Background: We directly compared the mammography image processing results obtained with the help of the YOLOv4 convolutional neural network (CNN) model versus those obtained with the help of the NCA-based nested contours algorithm model. Method: We used 1080 images to train the YOLOv4, plus 100 images with proven breast cancer (BC) and 100 images with proven absence of BC to test both models. Results: The rates of true-positive, false-positive and false-negative outcomes were 60, 10 and 40, respectively, for YOLOv4, and 93, 63 and 7, respectively, for NCA. The sensitivities of the YOLOv4 and the NCA were comparable to each other for star-like lesions, masses with unclear borders, round- or oval-shaped masses with clear borders, and partly visualized masses. On the contrary, the NCA was superior to the YOLOv4 in the case of asymmetric density and of changes invisible on the dense parenchyma background. Radiologists changed their earlier decisions in six cases per 100 for the NCA, whereas the YOLOv4 outputs did not influence the radiologists’ decisions. Conclusions: In our set, the NCA clinically significantly surpasses YOLOv4.
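For reference, the per-model sensitivity and positive predictive value implied by the reported counts (100 cancer-positive and 100 cancer-free test images) can be computed directly; the sketch below only restates the paper's numbers, it does not add new results.

```python
# Derived metrics from the reported counts of true positives, false positives,
# and false negatives per model on the 100 + 100 test images.
counts = {"YOLOv4": {"TP": 60, "FP": 10, "FN": 40},
          "NCA":    {"TP": 93, "FP": 63, "FN": 7}}

for model, c in counts.items():
    sensitivity = c["TP"] / (c["TP"] + c["FN"])   # recall over the 100 cancer cases
    ppv = c["TP"] / (c["TP"] + c["FP"])           # precision of the positive calls
    print(f"{model}: sensitivity = {sensitivity:.2f}, PPV = {ppv:.2f}")
```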

Journal ArticleDOI
TL;DR: A prediction framework including a 3D tumor segmentation in positron emission tomography (PET) images, based on a weakly supervised deep learning method, and an outcome prediction based onA 3D-CNN classifier applied to the segmented tumor regions, achieves state-of-the-art prediction results.
Abstract: It is proven that radiomic characteristics extracted from the tumor region are predictive. The first step in radiomic analysis is the segmentation of the lesion. However, this task is time consuming and requires a highly trained physician. This process could be automated using computer-aided detection (CAD) tools. Current state-of-the-art methods are trained in a supervised learning setting, which requires a lot of data that are usually not available in the medical imaging field. The challenge is to train one model to segment different types of tumors with only a weak segmentation ground truth. In this work, we propose a prediction framework including a 3D tumor segmentation in positron emission tomography (PET) images, based on a weakly supervised deep learning method, and an outcome prediction based on a 3D-CNN classifier applied to the segmented tumor regions. The key step is to locate the tumor in 3D. We propose to (1) calculate two maximum intensity projection (MIP) images from the 3D PET images in two directions, (2) classify the MIP images into different types of cancers, (3) generate the class activation maps through a multitask learning approach with weak prior knowledge, and (4) segment the 3D tumor region from the two 2D activation maps with a proposed new loss function for the multitask setting. The proposed approach achieves state-of-the-art prediction results with a small data set and with a weak segmentation ground truth. Our model was tested and validated for treatment response and survival in lung and esophageal cancers on 195 patients, with an area under the receiver operating characteristic curve (AUC) of 67% and 59%, respectively, and a Dice coefficient of 73% and 77% for tumor segmentation.
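Step (1) of the pipeline, computing two maximum intensity projections from the 3D PET volume along two directions, is straightforward in NumPy; the axis convention and volume shape below are assumptions for illustration.

```python
# Sketch of step (1): two maximum-intensity-projection (MIP) images from a 3D PET
# volume, taken along two directions. Axis order assumed to be (z, y, x).
import numpy as np

pet_volume = np.random.rand(128, 200, 200)    # placeholder (z, y, x) PET volume

mip_coronal = pet_volume.max(axis=1)           # project along y -> (z, x) image
mip_sagittal = pet_volume.max(axis=2)          # project along x -> (z, y) image

print(mip_coronal.shape, mip_sagittal.shape)   # (128, 200) (128, 200)
```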

Journal ArticleDOI
TL;DR: This paper demonstrates that scene recognition can be performed solely using object-level information in line with advances in computer vision and natural language processing and could be further helpful in the field of embodied research and dynamic scene classification.
Abstract: Indoor scene recognition and semantic information can be helpful for social robots. Recently, in the field of indoor scene recognition, researchers have incorporated object-level information and shown improved performances. This paper demonstrates that scene recognition can be performed solely using object-level information in line with these advances. A state-of-the-art object detection model was trained to detect objects typically found in indoor environments and then used to detect objects in scene data. These predicted objects were then used as features to predict room categories. This paper successfully combines approaches conventionally used in computer vision and natural language processing (YOLO and TF-IDF, respectively). These approaches could be further helpful in the field of embodied research and dynamic scene classification, which we elaborate on.
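The object-to-scene step can be sketched as follows: the labels returned by the object detector for each image are concatenated into a "document", converted into TF-IDF features, and fed to a standard classifier. The detector itself is omitted, and the example object lists and room labels are placeholders rather than the paper's data.

```python
# Sketch: turn per-image detected object labels into TF-IDF features and
# classify the room category. Detector output and labels are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each "document" lists the objects the detector reported for one image.
train_docs = ["bed lamp pillow wardrobe", "oven sink refrigerator kettle",
              "bed pillow curtain", "sink stove microwave"]
train_rooms = ["bedroom", "kitchen", "bedroom", "kitchen"]

room_classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
room_classifier.fit(train_docs, train_rooms)

print(room_classifier.predict(["refrigerator sink oven"]))   # expected: ['kitchen']
```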

Journal ArticleDOI
TL;DR: In this article , an adaptive binarization method based on a combination of local threshold processing, hologram division into blocks, and error diffusion procedure (the LDE method) was proposed.
Abstract: High-speed optical reconstruction of 3D scenes can be achieved using digital holography with binary digital micromirror devices (DMDs) or a ferroelectric spatial light modulator (fSLM). There are many algorithms for binarizing digital holograms. The most common are methods based on global and local thresholding and error diffusion techniques. In addition, hologram binarization is used in optical encryption, data compression, beam shaping, 3D displays, nanofabrication, materials characterization, etc. This paper proposes an adaptive binarization method based on a combination of local threshold processing, hologram division into blocks, and an error diffusion procedure (the LDE method). The method is applied to the binarization of optically recorded and computer-generated digital holograms of flat objects and three-dimensional scenes. The quality of the reconstructed images was compared with that of different error diffusion and thresholding methods. Image reconstruction quality was up to 22% higher by various metrics than that for standard binarization methods. The optical hologram reconstruction using a DMD confirms the results of the numerical simulations.
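A minimal sketch of the error-diffusion building block is given below, using plain Floyd-Steinberg diffusion with a single global threshold; it is not the paper's full LDE scheme, which additionally combines local thresholds and block division, and the placeholder hologram and threshold value are assumptions.

```python
# Sketch: Floyd-Steinberg error diffusion to binarize a normalized hologram pattern.
# Only the error-diffusion component is shown, not the full LDE combination of
# local thresholds and block division described in the paper.
import numpy as np

def error_diffusion_binarize(img: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    work = img.astype(np.float64).copy()
    h, w = work.shape
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            old = work[y, x]
            new = 1.0 if old >= threshold else 0.0
            out[y, x] = int(new)
            err = old - new                        # quantization error to spread forward
            if x + 1 < w:                work[y, x + 1]     += err * 7 / 16
            if y + 1 < h and x > 0:      work[y + 1, x - 1] += err * 3 / 16
            if y + 1 < h:                work[y + 1, x]     += err * 5 / 16
            if y + 1 < h and x + 1 < w:  work[y + 1, x + 1] += err * 1 / 16
    return out

hologram = np.random.rand(64, 64)          # placeholder normalized hologram pattern
binary_hologram = error_diffusion_binarize(hologram)
```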