scispace - formally typeset

Showing papers on "Object (computer science)" published in 2022


Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed an incremental learning mechanism based on progressive fuzzy three-way concepts for object classification in dynamic environments, which can directly process continuous data by converting numerical values into the membership degree of an object to an attribute.
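The membership-degree conversion the TL;DR mentions can be illustrated with a minimal sketch; the paper's actual membership function is not given here, so simple min-max normalization stands in for it, and the function name is illustrative:

```python
import numpy as np

def membership_degrees(values):
    """Map raw numeric attribute values to [0, 1] membership degrees
    via min-max normalization (one simple choice of membership function)."""
    v = np.asarray(values, dtype=float)
    lo, hi = v.min(), v.max()
    if hi == lo:                      # constant attribute: full membership
        return np.ones_like(v)
    return (v - lo) / (hi - lo)

# Example: heights of objects mapped to degrees of membership in "tall"
print(membership_degrees([150, 160, 180, 190]))
```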

59 citations


Journal ArticleDOI
TL;DR: This paper introduces two novel elements to learn the video object segmentation model: the scribble attention module, which captures more accurate context information and learns an effective attention map to enhance the contrast between foreground and background, and the scribble-supervised loss, which can optimize the unlabeled pixels and dynamically correct inaccurately segmented areas during the training stage.
Abstract: Recently, video object segmentation has received great attention in the computer vision community. Most existing methods rely heavily on pixel-wise human annotations, which are expensive and time-consuming to obtain. To tackle this problem, we make an early attempt to achieve video object segmentation with scribble-level supervision, which can alleviate the large amount of human labor needed to collect manual annotations. However, conventional network architectures and learning objective functions do not work well under this scenario, as the supervision information is highly sparse and incomplete. To address this issue, this paper introduces two novel elements to learn the video object segmentation model. The first is the scribble attention module, which captures more accurate context information and learns an effective attention map to enhance the contrast between foreground and background. The other is the scribble-supervised loss, which can optimize the unlabeled pixels and dynamically correct inaccurately segmented areas during the training stage. To evaluate the proposed method, we conduct experiments on two video object segmentation benchmark datasets, YouTube-video object segmentation (VOS) and densely annotated video segmentation (DAVIS)-2017. We first generate the scribble annotations from the original per-pixel annotations. Then, we train our model and compare its test performance with the baseline models and other existing works. Extensive experiments demonstrate that the proposed method works effectively and approaches the performance of methods requiring dense per-pixel annotations.
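The scribble-supervision setting can be illustrated with a common baseline idea: computing the loss only on scribble-annotated pixels. This is a hedged sketch, not the paper's scribble-supervised loss (which additionally optimizes unlabeled pixels); all names are illustrative:

```python
import numpy as np

def partial_cross_entropy(probs, labels, mask):
    """Binary cross-entropy averaged over scribble-annotated pixels only.

    probs:  (H, W) predicted foreground probabilities
    labels: (H, W) 0/1 scribble labels (values outside mask are ignored)
    mask:   (H, W) boolean, True where a scribble provides supervision
    """
    eps = 1e-7
    p = np.clip(probs[mask], eps, 1 - eps)
    y = labels[mask]
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

probs = np.array([[0.9, 0.2], [0.8, 0.1]])
labels = np.array([[1, 0], [1, 0]])
mask = np.array([[True, True], [False, False]])  # only the top row is scribbled
print(partial_cross_entropy(probs, labels, mask))
```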

40 citations


Journal ArticleDOI
01 Jan 2022
TL;DR: In this paper, a Visual Foresight Tree (VFT) is proposed to intelligently rearrange the clutter surrounding a target object so that it can be grasped easily, using a combination of robotic pushing and grasping actions.
Abstract: This letter considers the problem of retrieving an object from many tightly packed objects using a combination of robotic pushing and grasping actions. Object retrieval in dense clutter is an important skill for robots to operate in households and everyday environments effectively. The proposed solution, Visual Foresight Tree ( VFT ), intelligently rearranges the clutter surrounding a target object so that it can be grasped easily. Rearrangement with nested nonprehensile actions is challenging as it requires predicting complex object interactions in a combinatorially large configuration space of multiple objects. We first show that a deep neural network can be trained to accurately predict the poses of the packed objects when the robot pushes one of them. The predictive network provides visual foresight and is used in a tree search as a state transition function in the space of scene images. The tree search returns a sequence of consecutive push actions yielding the best arrangement of the clutter for grasping the target object. Experiments in simulation and using a real robot and objects show that the proposed approach outperforms model-free techniques as well as model-based myopic methods both in terms of success rates and the number of executed actions, on several challenging tasks. A video introducing VFT , with robot experiments, is accessible at https://youtu.be/7cL-hmgvyec . The full source code is available at https://github.com/arc-l/vft .
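The tree-search idea behind VFT can be sketched in miniature: a transition function predicts the state after each push, and the search returns the shortest push sequence that makes the target graspable. The real system uses a learned dynamics network over scene images; here a toy scalar "clutter" state and a hand-written transition stand in, so every name below is illustrative:

```python
from itertools import product

def predict(state, action):
    """Stand-in for the learned dynamics network: pushing with strength
    `action` reduces the clutter around the target by that amount."""
    return state - action

def graspable(state):
    return state <= 0              # target graspable once clutter is cleared

def best_push_sequence(state, actions, depth):
    """Exhaustive tree search over push sequences up to `depth`,
    returning the shortest sequence that makes the target graspable."""
    for d in range(1, depth + 1):
        for seq in product(actions, repeat=d):
            s = state
            for a in seq:
                s = predict(s, a)
            if graspable(s):
                return list(seq)
    return None

print(best_push_sequence(5, actions=[1, 2, 3], depth=3))  # shortest clearing sequence
```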

28 citations


Journal ArticleDOI
01 Jan 2022
TL;DR: GMM-Det as discussed by the authors is a real-time method for extracting epistemic uncertainty from object detectors to identify and reject open-set errors, where the detector produces a structured logit space that is modelled with class-specific Gaussian Mixture Models.
Abstract: Deployed into an open world, object detectors are prone to open-set errors, false positive detections of object classes not present in the training dataset.We propose GMM-Det, a real-time method for extracting epistemic uncertainty from object detectors to identify and reject open-set errors. GMM-Det trains the detector to produce a structured logit space that is modelled with class-specific Gaussian Mixture Models. At test time, open-set errors are identified by their low log-probability under all Gaussian Mixture Models. We test two common detector architectures, Faster R-CNN and RetinaNet, across three varied datasets spanning robotics and computer vision. Our results show that GMM-Det consistently outperforms existing uncertainty techniques for identifying and rejecting open-set detections, especially at the low-error-rate operating point required for safety-critical applications. GMM-Det maintains object detection performance, and introduces only minimal computational overhead. We also introduce a methodology for converting existing object detection datasets into specific open-set datasets to evaluate open-set performance in object detection.
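The logit-space modelling can be sketched with a simplified stand-in: a single diagonal Gaussian per class (the paper fits class-specific Gaussian Mixture Models) and a log-probability threshold below which a detection is flagged as open-set. All names and the threshold are illustrative:

```python
import numpy as np

def fit_gaussians(logits, labels):
    """Fit a diagonal Gaussian per class to training logit vectors
    (a 1-component stand-in for the class-specific GMMs)."""
    models = {}
    for c in np.unique(labels):
        x = logits[labels == c]
        models[c] = (x.mean(axis=0), x.var(axis=0) + 1e-6)
    return models

def log_prob(x, mean, var):
    return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var))

def is_open_set(x, models, threshold):
    """Reject as open-set if log-probability is low under ALL class models."""
    return max(log_prob(x, m, v) for m, v in models.values()) < threshold

rng = np.random.default_rng(0)
logits = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(5, 1, (50, 3))])
labels = np.array([0] * 50 + [1] * 50)
models = fit_gaussians(logits, labels)
print(is_open_set(np.array([0.1, -0.2, 0.3]), models, threshold=-10))   # in-distribution
print(is_open_set(np.array([20.0, -20.0, 20.0]), models, threshold=-10))  # far from all classes
```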

21 citations


Journal ArticleDOI
TL;DR: This paper advocates equipping two-stage detectors with top-down signals, which provide high-level contextual cues to complement low-level features in object detection.

20 citations


Journal ArticleDOI
Wei Gao1, Fang Wan1, Jun Yue2, Songcen Xu2, Qixiang Ye1 
TL;DR: D-MIL adopts multiple MIL learners to pursue discrepant yet complementary solutions indicating object parts, which are fused with a collaboration module for precise object localization.

15 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present an approach for running analytics on live video directly on embedded or mobile devices (e.g., mobile phones, tablets, and computers), motivated by the time it takes to transport video over the network.
Abstract: Videos take a lot of time to transport over the network, hence running analytics on the live video on embedded or mobile devices has become an important system driver. Considering such devices, e.g...

12 citations


Journal ArticleDOI
TL;DR: In this article, a robust perspective-sensitive network (PSNet) is proposed to overcome the influence of different view angles on intra-class similarity by replacing the uniform feature representation of traditional detectors with a perspective-specific structural feature.

12 citations


Journal ArticleDOI
01 Jan 2022
TL;DR: In this article, a Kalman filter was proposed to synchronize delayed outputs of low frame rate Convolutional Neural Networks for instance segmentation and 6D object pose estimation with the RGB-D input stream to achieve fast and precise object pose and velocity tracking.
Abstract: 6D object pose tracking has been extensively studied in the robotics and computer vision communities. The most promising solutions, leveraging deep neural networks and/or filtering and optimization, exhibit notable performance on standard benchmarks. However, to the best of our knowledge, these have not been tested thoroughly against fast object motions. Tracking performance in this scenario degrades significantly, especially for methods that do not achieve real-time performance and introduce non-negligible delays. In this work, we introduce ROFT, a Kalman filtering approach for 6D object pose and velocity tracking from a stream of RGB-D images. By leveraging real-time optical flow, ROFT synchronizes delayed outputs of low-frame-rate convolutional neural networks for instance segmentation and 6D object pose estimation with the RGB-D input stream to achieve fast and precise 6D object pose and velocity tracking. We test our method on a newly introduced photorealistic dataset, Fast-YCB, which comprises fast-moving objects from the YCB model set, and on the HO-3D dataset for object and hand pose estimation. Results demonstrate that our approach outperforms state-of-the-art methods for 6D object pose tracking, while also providing 6D object velocity tracking. A video showing the experiments is provided as supplementary material.
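The filtering idea can be illustrated with a minimal 1-D constant-velocity Kalman filter that estimates position and velocity from noisy position measurements; ROFT's actual filter operates on full 6D poses with optical-flow-aided synchronization, so this is only a sketch with illustrative noise parameters:

```python
import numpy as np

def kalman_cv(measurements, dt=1.0, q=1e-3, r=0.25):
    """1-D constant-velocity Kalman filter: state = [position, velocity]."""
    F = np.array([[1, dt], [0, 1]])       # state transition
    H = np.array([[1.0, 0.0]])            # we only measure position
    Q = q * np.eye(2)                     # process noise covariance
    R = np.array([[r]])                   # measurement noise covariance
    x = np.array([[measurements[0]], [0.0]])
    P = np.eye(2)
    for z in measurements[1:]:
        x = F @ x; P = F @ P @ F.T + Q                 # predict
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain
        x = x + K @ (np.array([[z]]) - H @ x)          # update
        P = (np.eye(2) - K @ H) @ P
    return x.ravel()                      # filtered [position, velocity]

# Object moving at ~1 unit/step, noisy position readings
pos, vel = kalman_cv([0.0, 1.1, 1.9, 3.05, 4.0, 5.1, 5.95, 7.02])
print(pos, vel)
```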

8 citations


Journal ArticleDOI
TL;DR: In this article, a well-structured knowledge-based framework for object search is proposed; to improve searching efficiency and reasonability, an ontology-based hierarchical and interrelated knowledge structure is formed to support the implementation of complicated service planning with either single or multiple tasks.
Abstract: In the unstructured family environment, robots are expected to provide various services to improve the quality of human life, based on specific action sequences generated by service planning. This paper focuses on one of the greatest challenges in service planning: accomplishing service tasks by generating appropriate object sequences that guide the robot in searching for the corresponding target objects efficiently and reasonably. A well-structured, knowledge-based framework for object search is proposed in our approach, taking into account multi-domain knowledge of objects, scenes, and services in its design. To improve searching efficiency and reasonability, an ontology-based hierarchical and interrelated knowledge structure is formed to support the implementation of complicated service planning with either single or multiple tasks. The proposed framework is tested in comprehensive experiments, and its performance is compared with other mainstream methods in both simulation and real-world environments. The experimental results demonstrate the feasibility and effectiveness of applying this knowledge-based framework to efficient object search in service planning.
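The hierarchical-knowledge idea can be sketched with a toy object-to-scene prior table that orders the scenes a robot should search first; the paper's ontology is far richer, and the table entries below are illustrative assumptions:

```python
# Toy knowledge base: object -> list of (scene, prior likelihood of finding it there).
# Purely illustrative values, not from the paper.
KNOWLEDGE = {
    "mug":    [("kitchen", 0.7), ("living_room", 0.2), ("bedroom", 0.1)],
    "pillow": [("bedroom", 0.8), ("living_room", 0.2)],
}

def search_plan(task_objects):
    """Merge per-object priors into one ranked list of scenes to visit,
    supporting service plans with either single or multiple target objects."""
    scores = {}
    for obj in task_objects:
        for scene, p in KNOWLEDGE.get(obj, []):
            scores[scene] = scores.get(scene, 0.0) + p
    return sorted(scores, key=scores.get, reverse=True)

print(search_plan(["mug", "pillow"]))  # multi-task search order
```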

8 citations


DOI
01 Jan 2022
TL;DR: An improved technique for real-time object detection and recognition from web-camera video is introduced, and a convolutional neural network (CNN) is used to classify the detected objects.
Abstract: In computer vision, real-time object detection and recognition is considered a challenging task in uncontrolled environments. In this research work, an improved technique for real-time object detection and recognition from web-camera video is introduced. Objects such as people, vehicles, and animals are detected and recognized by this technique. Single Shot Detector (SSD) and You Only Look Once (YOLO) models, which have shown promising results in object detection and recognition, are used in our work for better performance. Our system can detect objects even in adverse and uncontrolled environments, such as excess or lack of light, rotation, mirroring, and a variety of backgrounds. Here, a convolutional neural network (CNN) has been used to classify the objects. Our technique achieves real-time performance with satisfactory detection and classification results and also provides better accuracy. The accuracy of detection and classification with our model is about 63–90%.
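Detectors such as SSD and YOLO emit many overlapping candidate boxes, which are typically pruned with non-maximum suppression (NMS). A minimal sketch of greedy NMS (not code from the paper):

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring boxes,
    dropping any box that overlaps an already-kept box by IoU > thresh."""
    order = np.argsort(scores)[::-1]
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]])
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # indices of the boxes kept
```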

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed 3D-2D structural information fusion (SIF) for 3D object detection on a LiDAR-camera system, which is based on hand-crafted 3D and 2D descriptors, generates primary structural features, and has stable performance in outdoor scenes.

Journal ArticleDOI
TL;DR: In this article, a siamese convolutional neural network is developed to detect and track inherently useful landmarks from sensor data, after training upon synthetic datasets of visual, LiDAR or RGB-D data.

Journal ArticleDOI
TL;DR: Zhang et al. as mentioned in this paper proposed a visual persistence module in encoder and decoder for image captioning based on Encoder-Decoder frameworks, which seeks the core object features to replace the image global representation and fuses them as the final attended feature to generate a new word.

Journal ArticleDOI
TL;DR: In this paper, a new backbone network was designed to reinforce the feature extraction layers using convolutional blocks, and it was added to the object detection component to maintain lightweight compactness and complement feature mapping.

Journal ArticleDOI
TL;DR: This work demonstrates how a feature-fusion strategy of the orientation components leads to further improving visual recognition accuracy to 97% and carries out extensive experimentation on the publicly available Yale dataset, finding significant improvements.

Journal ArticleDOI
TL;DR: Li et al. as mentioned in this paper proposed a more lightweight framework to progressively sample discriminative parts for learning details from coarse-scale to fine-scale, without any pre-designed bounding boxes.
Abstract: Fine-grained image recognition poses a special challenge due to the difficulty of distinguishing subtle inter-class differences and large intra-class variances. Existing weakly supervised approaches tend to capture the most discriminative regions, thereby guiding the network to learn fine-grained features. However, current methods neglect the correlation between object and details, where object localization is conducive to part detection. In addition, they generally not only incur heavy computational cost to find details with an auxiliary subnet or selective strategy, but also require well-designed bounding boxes, which are inflexible for targets of different scales. In this paper, we propose a more lightweight framework that progressively samples discriminative parts for learning details from coarse scale to fine scale, without any pre-designed bounding boxes. Our method first amplifies the object (e.g., bird, car) from the original image in light of class visual patterns; then a self-adaptive region sampler is applied to detect the most informative regions from attention maps to learn fine-grained representations. The framework consists of three streams, i.e., the whole, the object, and the detail, so hierarchical features can be preserved and learned. Furthermore, our approach can be trained end-to-end in a weakly supervised manner, and little computational cost is needed at the inference phase. Comprehensive experiments and ablation studies demonstrate that the proposed method obtains competitive performance on three benchmarks.

Book ChapterDOI
01 Jan 2022
TL;DR: In this paper, a Support Vector Machine (SVM) algorithm is utilized for analysis and decision-making in the object classification domain, which can be used to improve accuracy in classifying similar objects.
Abstract: Due to rapid technological advancements in recent years, humans have gained the ability to design machines and imbue them with knowledge, allowing the machines to perform functions such as autonomous reasoning, understanding, and problem-solving. Moreover, machine learning (ML) plays an important role in developing image-processing models and applications. In real-time applications, the labels of objects may be unfamiliar to those who are unaware of them, or there may be several identical objects that are labeled differently. In this paper, an approach for detecting and classifying various objects is presented, with a trial study utilizing various datasets. This method can be used to improve accuracy in classifying similar objects. In the process of classifying real-time objects, this system uses a supervised learning technique, where various datasets are trained and compared with the queried object. Here, the Support Vector Machine (SVM) algorithm is utilized for analysis and decision-making in the object classification domain. The goal of this project is to read the given objects with the help of computer vision and allow the machine to perform prediction or classification for the given object. More than 800 images obtained from standard datasets of trained labeled classes are used in our application to classify the objects.
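The SVM decision-making step can be illustrated with a minimal linear SVM trained by sub-gradient descent on the hinge loss; in practice a library such as scikit-learn would be used on image features, so this sketch only shows the mechanics on toy 2-D data:

```python
import numpy as np

def train_linear_svm(X, y, lr=0.01, lam=0.01, epochs=200):
    """Linear SVM via sub-gradient descent on the regularized hinge loss.
    Labels y must be in {-1, +1}."""
    w = np.zeros(X.shape[1]); b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1:          # margin violated: hinge gradient
                w += lr * (yi * xi - lam * w)
                b += lr * yi
            else:                              # margin satisfied: regularize only
                w -= lr * lam * w
    return w, b

# Toy linearly separable "object classes"
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = train_linear_svm(X, y)
predict = lambda x: 1 if x @ w + b > 0 else -1
print(predict(np.array([2.5, 2.0])), predict(np.array([-2.5, -2.0])))
```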

Journal ArticleDOI
01 Mar 2022
TL;DR: In this article, an edge computing-based multivariate time series (EC-MTS) framework was developed to track mobile objects and exploit edge computing to offload its intensive computation tasks.
Abstract: Mobile object tracking, which has broad applications, utilizes a large number of Internet of Things (IoT) devices to identify, record, and share the trajectory information of physical objects. Nonetheless, IoT devices are energy constrained, and deploying advanced tracking techniques on them is not feasible due to significant computing requirements. To address these issues, in this paper we develop an edge computing-based multivariate time series (EC-MTS) framework to accurately track mobile objects and exploit edge computing to offload intensive computation tasks. Specifically, EC-MTS leverages a statistical technique (i.e., vector autoregression (VAR)) to revisit arbitrary historical object trajectory data and fit a best-effort trajectory model for accurate mobile object location prediction. Our framework offers the benefit of offloading computation-intensive tasks from IoT devices by using edge computing infrastructure. We have validated the efficacy of EC-MTS, and our experimental results demonstrate that the EC-MTS framework can significantly improve mobile object tracking in terms of trajectory goodness-of-fit and location prediction accuracy. In addition, we extend the proposed EC-MTS framework to track multiple objects in IoT systems.
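The VAR-based trajectory fitting can be sketched as a VAR(1) model estimated by least squares on a 2-D trajectory, followed by a one-step location prediction; the paper's pipeline is presumably more elaborate, and the toy trajectory below is illustrative:

```python
import numpy as np

def fit_var1(traj):
    """Fit x_t = A x_{t-1} + c by least squares on a (T, d) trajectory."""
    X = np.hstack([traj[:-1], np.ones((len(traj) - 1, 1))])  # [x_{t-1}, 1]
    Y = traj[1:]
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    A, c = coef[:-1].T, coef[-1]
    return A, c

def predict_next(traj, A, c):
    """One-step-ahead location prediction from the last observed point."""
    return A @ traj[-1] + c

# Object moving with constant velocity (1, 2) per step
traj = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
A, c = fit_var1(traj)
print(predict_next(traj, A, c))  # predicted next location
```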


Book ChapterDOI
01 Jan 2022
TL;DR: A comprehensive overview of recent advances in 3D recognition of indoor objects using convolutional neural networks (CNNs) is presented in this paper; a comparison of the main recognition methods, based on geometric shape descriptors and supervised learning, along with their strengths and weaknesses, is also included.
Abstract: Recognition of an object from a point cloud, image, or video is an important task in computer vision that plays a crucial role in many real-world applications. The challenges in object recognition, which aims to locate object instances from a large number of predefined categories in collections (images, video, or a model library), are multi-model and multi-pose objects, complicated backgrounds, occlusion, and depth variations. In the past few years, numerous methods have been developed to tackle these challenges, and remarkable progress has been reported for 3D objects. However, suitable methods of object recognition are needed to achieve added value in the built environment, and suitable acquisition methods are necessary to compensate for the impact of darkness, dirt, and occlusion. This chapter provides a comprehensive overview of recent advances in 3D recognition of indoor objects using convolutional neural networks (CNNs). Methodology for object recognition, approaches for point cloud generation, and test bases are presented. A comparison of the main recognition methods, based on geometric shape descriptors and supervised learning, including their strengths and weaknesses, is also included. The focus lies on the specific requirements and constraints of an industrial environment, such as tight assembly, light, dirt, occlusion, or incomplete data sets. Finally, a recommendation is given for using an existing CNN framework to implement an automatic object recognition procedure.

Journal ArticleDOI
TL;DR: In this article, a divergent activation module and a similarity module are introduced to improve the response strength of the low-response areas in the shallow feature map and suppress background noise.

Journal ArticleDOI
TL;DR: In this paper, a CNN is used to map arbitrary objects to blob-like structures; then, using a Laplacian of Gaussian (LoG) filter, the positions of all detected objects are recovered.
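The LoG step can be sketched directly: a blob-like structure produces a strong extremum in the Laplacian-of-Gaussian response at its center. The synthetic heatmap below stands in for the CNN output described in the TL;DR; the sizes and sigma are illustrative:

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

# Synthetic "network output": one bright blob centered at (16, 24) on a 32x48 map
yy, xx = np.mgrid[0:32, 0:48]
heatmap = np.exp(-((yy - 16) ** 2 + (xx - 24) ** 2) / (2 * 3.0 ** 2))

# LoG response: a bright blob yields a strong negative extremum at its center
response = gaussian_laplace(heatmap, sigma=3.0)
cy, cx = np.unravel_index(np.argmin(response), response.shape)
print(cy, cx)  # recovered blob center
```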




DOI
01 Jan 2022
TL;DR: In this article, a novel framework is developed for the discovery of objects in the IoT ecosystem, based on the data gathered by the data center about those objects.
Abstract: The Internet of Things (IoT) paradigm joins physical objects in the real world to the internet and prompts the creation of smart environments and applications. A real-world object, known as a smart object, is the building block of the IoT and keeps an observation of its environment. These objects can communicate among themselves and also possess data processing capabilities. A novel framework is developed for the discovery of objects in the IoT ecosystem based on the data gathered by the data center about those objects. The search technique depends on the gathered data; after the data are analyzed, the central data center maintains a record of the type of data that each object often sends. Based on the gathered data, analytics are performed and each object is categorized into a predefined category, thereby enhancing the search mechanism with clues from the data that the objects transmit to the central processing entity.
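The categorization step can be sketched as matching the data fields an object transmits against predefined category signatures; the signatures and field names below are illustrative assumptions, not from the paper:

```python
# Predefined categories keyed by the data fields an object typically sends.
# Illustrative only -- the paper's analytics pipeline is not specified in detail.
CATEGORY_SIGNATURES = {
    "environment_sensor": {"temperature", "humidity"},
    "tracker":            {"latitude", "longitude"},
    "camera":             {"frame", "resolution"},
}

def categorize(observed_fields):
    """Assign the predefined category whose signature overlaps most with
    the fields actually observed in the object's transmissions."""
    overlap = lambda sig: len(sig & observed_fields)
    best = max(CATEGORY_SIGNATURES, key=lambda c: overlap(CATEGORY_SIGNATURES[c]))
    return best if overlap(CATEGORY_SIGNATURES[best]) > 0 else "unknown"

print(categorize({"temperature", "humidity", "battery"}))
print(categorize({"serial_number"}))
```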


Book ChapterDOI
01 Jan 2022
TL;DR: In this article, the authors propose the use of SQL to facilitate the removal of outliers from an embedding database and discuss related works which study the geometric and statistical relationships between embeddings to formulate methods for outlier embedding removal.
Abstract: Deep metric learning (DML) has been gaining popularity recently as a way of exploiting the advantages of deep learning in applications where the task involves adapting to variable object features, e.g., facial verification and person re-identification. In applications of deep learning where generalisability is difficult to achieve, DML provides an architecture whose outputs can be adapted to each use case by framing classification tasks as a re-identification problem. At the inference stage, query embeddings generated by the DML model are compared against a gallery of embeddings in a latent space. This paper investigates online database management strategies to preserve the quality and diversity of data and the representation of each class in the gallery of embeddings. We propose the use of SQL to facilitate the removal of outliers from an embedding database and also discuss related works which study the geometric and statistical relationships between embeddings to formulate methods for outlier embedding removal.
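The SQL-based outlier removal can be sketched with sqlite3: embeddings are stored as rows, each class's centroid is computed, and rows farther than a radius from their centroid are deleted with a SQL DELETE. The schema, distance rule, and threshold below are illustrative assumptions, not the paper's method:

```python
import sqlite3, json, math

# In-memory gallery of (class, embedding) rows; embeddings stored as JSON text.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE gallery (id INTEGER PRIMARY KEY, cls TEXT, emb TEXT)")
rows = [("A", [0.0, 0.0]), ("A", [0.1, 0.1]), ("A", [5.0, 5.0]),  # last one is an outlier
        ("B", [1.0, 1.0])]
db.executemany("INSERT INTO gallery (cls, emb) VALUES (?, ?)",
               [(c, json.dumps(e)) for c, e in rows])

def remove_outliers(db, cls, radius):
    """Delete gallery embeddings farther than `radius` from the class centroid."""
    embs = [(i, json.loads(e)) for i, e in
            db.execute("SELECT id, emb FROM gallery WHERE cls = ?", (cls,))]
    centroid = [sum(v) / len(embs) for v in zip(*(e for _, e in embs))]
    bad = [i for i, e in embs if math.dist(e, centroid) > radius]
    db.executemany("DELETE FROM gallery WHERE id = ?", [(i,) for i in bad])
    return bad

removed = remove_outliers(db, "A", radius=3.0)
print(removed, db.execute("SELECT COUNT(*) FROM gallery").fetchone()[0])
```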