
Showing papers on "Sketch recognition published in 2022"


Journal ArticleDOI
TL;DR: Peng et al. as mentioned in this paper proposed a graph neural network (GNN) for learning representations of sketches from multiple graphs, which simultaneously capture global and local geometric stroke structures as well as temporal information.
Abstract: Learning meaningful representations of free-hand sketches remains a challenging task given the signal sparsity and the high-level abstraction of sketches. Existing techniques have focused on exploiting either the static nature of sketches with convolutional neural networks (CNNs) or the temporal sequential property with recurrent neural networks (RNNs). In this work, we propose a new representation of sketches as multiple sparsely connected graphs. We design a novel graph neural network (GNN), the multigraph transformer (MGT), for learning representations of sketches from multiple graphs, which simultaneously capture global and local geometric stroke structures as well as temporal information. We report extensive numerical experiments on a sketch recognition task to demonstrate the performance of the proposed approach. Particularly, MGT applied on 414k sketches from Google QuickDraw: 1) achieves a small recognition gap to the CNN-based performance upper bound (72.80% versus 74.22%) and infers faster than the CNN competitors and 2) outperforms all RNN-based models by a significant margin. To the best of our knowledge, this is the first work proposing to represent sketches as graphs and apply GNNs for sketch recognition. Code and trained models are available at https://github.com/PengBoXiangShang/multigraph_transformer.

13 citations
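To make the multigraph idea above concrete, the snippet below is an illustrative sketch (not the authors' released code; the two-graph choice, function name, and toy strokes are assumptions) of how a QuickDraw-style stroke list can be turned into sparse adjacency matrices, one for temporal stroke order and one for intra-stroke membership, which could then serve as attention masks in a graph/transformer model.

```python
# Illustrative only: build two sparse adjacency matrices from a stroke list,
# one encoding temporal drawing order and one encoding intra-stroke membership.
import numpy as np

def sketch_graphs(strokes):
    """strokes: list of (N_i, 2) arrays of point coordinates, one per stroke."""
    points = np.concatenate(strokes, axis=0)             # (N, 2) all points
    n = points.shape[0]

    # Graph 1: temporal edges between consecutively drawn points.
    temporal = np.zeros((n, n), dtype=np.float32)
    idx = np.arange(n - 1)
    temporal[idx, idx + 1] = temporal[idx + 1, idx] = 1.0

    # Graph 2: intra-stroke edges connecting all points of the same stroke.
    intra = np.zeros((n, n), dtype=np.float32)
    start = 0
    for s in strokes:
        end = start + len(s)
        intra[start:end, start:end] = 1.0
        start = end

    return points, temporal, intra

# Toy usage: two strokes of three and two points.
pts, g_time, g_stroke = sketch_graphs(
    [np.array([[0, 0], [1, 0], [2, 0]]), np.array([[0, 1], [1, 1]])])
print(g_time.shape, g_stroke.sum())
```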


Proceedings ArticleDOI
01 Jan 2022
TL;DR: Zhang et al. as mentioned in this paper proposed an open-domain sketch-to-photo translation method, which can synthesize a realistic photo from a freehand sketch with its class label, even if the sketches of that class are missing in the training data.
Abstract: In this paper, we explore open-domain sketch-to-photo translation, which aims to synthesize a realistic photo from a freehand sketch with its class label, even if the sketches of that class are missing in the training data. It is challenging due to the lack of training supervision and the large geometric distortion between the freehand sketch and photo domains. To synthesize the absent freehand sketches from photos, we propose a framework that jointly learns sketch-to-photo and photo-to-sketch generation. However, the generator trained from fake sketches might lead to unsatisfying results when dealing with sketches of missing classes, due to the domain gap between synthesized sketches and real ones. To alleviate this issue, we further propose a simple yet effective open-domain sampling and optimization strategy to "fool" the generator into treating fake sketches as real ones. Our method takes advantage of the learned sketch-to-photo and photo-to-sketch mapping of in-domain data and generalizes it to the open-domain classes. We validate our method on the Scribble and SketchyCOCO datasets. Compared with the recent competing methods, our approach shows impressive results in synthesizing realistic color, texture, and maintaining the geometric composition for various categories of open-domain sketches.

10 citations


Journal ArticleDOI
TL;DR: In this article, the authors used Google Scholar and the literature database Web of Science (WoS) to search for studies related to HCI and deep learning, such as intelligent HCI, speech recognition, emotion recognition, and intelligent robotics.
Abstract: In recent years, gesture recognition and speech recognition, as important input methods in Human–Computer Interaction (HCI), have been widely used in the field of virtual reality. In particular, with the rapid development of deep learning, artificial intelligence, and other computer technologies, gesture recognition and speech recognition have achieved breakthrough research progress. The search platforms used in this work are mainly Google Scholar and the literature database Web of Science. Using keywords related to HCI and deep learning, such as “intelligent HCI”, “speech recognition”, “gesture recognition”, and “natural language processing”, nearly 1000 studies were retrieved. From these, nearly 500 studies describing research methods were selected, and 100 studies were finally chosen as the research content of this work after screening by publication year (2019–2022). First, the current situation of intelligent HCI systems is analyzed, the realization of gesture interaction and voice interaction in HCI is summarized, and the advantages brought by deep learning are singled out for study. Then, the core concepts of gesture interaction are introduced and the progress of gesture recognition and speech recognition interaction is analyzed. Furthermore, representative applications of gesture recognition and speech recognition interaction are described. Finally, current HCI work in the direction of natural language processing is surveyed. The results show that the combination of intelligent HCI and deep learning is deeply applied in gesture recognition, speech recognition, emotion recognition, and intelligent robotics. A wide variety of recognition methods have been proposed in these fields and verified by experiments, achieving higher recognition accuracy than interactive methods without deep learning. In Human–Machine Interfaces (HMIs) with voice support, context plays an important role in improving user interfaces. Whether in voice search, mobile communication, or children’s speech recognition, HCI combined with deep learning maintains better robustness. The combination of convolutional neural networks and long short-term memory networks can greatly improve the accuracy and precision of action recognition. Therefore, the application field of HCI is expected to involve more industries in the future, with greater prospects ahead.

8 citations


Proceedings ArticleDOI
23 May 2022
TL;DR: This work replaces the natural images in an image-captioning dataset with sketches of the corresponding objects to generate pseudo sketches, yielding pseudo-paired sketch-caption and sketch-image data, and proposes four novel objectives that help the model learn mappings between sketch and story from more perspectives.
Abstract: Sketch storytelling aims to generate a story for a given sketch. Although image captioning based on deep learning has made great progress, describing a sketch in a story style is still a challenge. The reason is that there is currently no paired sketch-story data, which is expensive to acquire. Therefore, it is necessary to train a sketch storytelling model without using any paired sketch-story data. To address these issues, we replace the natural images in an image-captioning dataset with sketches of the corresponding objects to generate pseudo sketches, which yields pseudo-paired sketch-caption and sketch-image data. Because these pseudo sketches are not drawn in a standardized way, we present a selective attention module to reduce their noise. Furthermore, we propose four novel objectives, including sketch-image matching, image-caption generation, sketch-caption generation, and mask infilling, which help the model learn mappings between sketch and story from more perspectives. We also built a test set for sketch-story evaluation. The experimental results show that our model achieves state-of-the-art performance compared to other methods.

5 citations


Journal ArticleDOI
TL;DR: In this paper, the authors used sketches represented as a sequence of strokes, i.e., as vector images, to effectively capture the long-term temporal dependencies in hand-drawn sketches.
Abstract: For the past few decades, machines have replaced humans in several disciplines. However, machine cognition still lags behind human capabilities. In this work, we address machines’ ability to recognize human-drawn sketches. Visual representations such as sketches have long been a medium of communication for humans. For artificially intelligent systems to immerse effectively in interactive environments, machines must understand such notations. The abstract nature and varied artistic styling of sketches make automatic recognition of drawings more challenging than other areas of image classification. In this article, we use sketches represented as a sequence of strokes, i.e., as vector images, to effectively capture the long-term temporal dependencies in hand-drawn sketches. The proposed approach combines the self-attention capabilities of Transformers with temporal convolution networks (TCNs), which effectively exploit those long-term temporal dependencies for sketch recognition. The confidence scores obtained from the two techniques are combined using a triangular norm (T-norm). Attention heat maps are plotted to isolate the discriminating parts of a sketch that contribute to sketch classification. Extensive quantitative and qualitative evaluation confirms that the proposed network performs favorably against state-of-the-art techniques.

3 citations
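The fusion step described above can be illustrated with a small hedged example; the paper does not specify which t-norm is used, so the product t-norm and the toy scores below are assumptions.

```python
# Illustrative only: fuse per-class confidence scores from a Transformer
# branch and a TCN branch with the product t-norm T(a, b) = a * b, then
# renormalize so the fused scores again sum to one.
import numpy as np

def tnorm_fuse(p_transformer, p_tcn, eps=1e-12):
    fused = p_transformer * p_tcn                 # element-wise t-norm per class
    return fused / (fused.sum(axis=-1, keepdims=True) + eps)

p_a = np.array([0.7, 0.2, 0.1])                   # Transformer softmax scores
p_b = np.array([0.6, 0.3, 0.1])                   # TCN softmax scores
print(tnorm_fuse(p_a, p_b))                       # the class both branches support wins
```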


Proceedings ArticleDOI
23 Mar 2022
TL;DR: In this paper, an approach to recognize hand gestures is introduced, and a virtual mouse and keyboard with hand gesture recognition using Computer Vision techniques are implemented; full keyboard features as well as mouse cursor movement and click events are supported to control the computer virtually.
Abstract: Human-Computer Interaction (HCI) is the interface between humans and computers. Traditionally, the mouse and keyboard are used to interact with computers. A recently introduced way to interact with computers is hand gestures. In this research paper, an approach to recognize hand gestures is introduced, and a virtual mouse and keyboard with hand gesture recognition using Computer Vision techniques are implemented. Full keyboard features as well as mouse cursor movement and click events are implemented to control the computer virtually. The recognition rate and response rate of all the considered inputs are calculated and presented in the results. The accuracy of the presented approach is compared with other state-of-the-art algorithms, and the comparison shows that the method presented here performs better, with an accuracy of 95%.

3 citations


Journal ArticleDOI
TL;DR: TASK-former as mentioned in this paper uses a text description and a sketch as input to improve image retrieval performance, and empirically demonstrates that using an input sketch (even a poorly drawn one) in addition to text considerably increases retrieval recall compared to traditional text-based image retrieval.
Abstract: We address the problem of retrieving in-the-wild images with both a sketch and a text query. We present TASK-former (Text And SKetch transformer), an end-to-end trainable model for image retrieval using a text description and a sketch as input. We argue that both input modalities complement each other in a manner that cannot be achieved easily by either one alone. TASK-former follows the late-fusion dual-encoder approach, similar to CLIP [35], which allows efficient and scalable retrieval since the retrieval set can be indexed independently of the queries. We empirically demonstrate that using an input sketch (even a poorly drawn one) in addition to text considerably increases retrieval recall compared to traditional text-based image retrieval. To evaluate our approach, we collect 5,000 hand-drawn sketches for images in the test set of the COCO dataset. The collected sketches are available at https://janesjanes.github.io/tsbir/.

2 citations
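The late-fusion dual-encoder retrieval described above can be sketched as follows; this is an illustrative toy (random vectors stand in for learned encoders, and summation as the fusion rule is an assumption), meant only to show why the gallery can be indexed independently of the text and sketch queries.

```python
# Illustrative late-fusion retrieval loop (not TASK-former itself): images are
# embedded and indexed once; at query time the text and sketch embeddings are
# fused and ranked against the index by cosine similarity.
import numpy as np

rng = np.random.default_rng(0)
D, N = 128, 1000

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Stand-ins for learned encoders; in practice these are neural networks.
image_index = normalize(rng.normal(size=(N, D)))   # precomputed gallery embeddings
text_emb = normalize(rng.normal(size=D))           # encoder(text query)
sketch_emb = normalize(rng.normal(size=D))         # encoder(sketch query)

query = normalize(text_emb + sketch_emb)           # late fusion of the two modalities
scores = image_index @ query                       # cosine similarity (unit vectors)
top10 = np.argsort(-scores)[:10]                   # retrieval result
print(top10)
```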



Journal ArticleDOI
TL;DR: This paper studies the application of gesture recognition to continuous sign language sentence recognition based on an inertial sensor and a rule-matching recognition algorithm; the recognition rate of nine single HRI gestures is 92.7%, and HRI with combined gestures is realized.
Abstract: Human-computer cooperation based on gesture recognition frees people from the limitations of traditional input devices such as the mouse and keyboard, and lets them control artificial intelligence devices more efficiently and naturally. As a new mode of human-robot interaction (HRI), gesture recognition has made some progress. Gesture recognition can be realized in many ways, including visual recognition, motion information acquisition, and EMG signals. Research on isolated sign language gesture recognition is quite mature, but isolated gestures convey only limited semantics. In order to improve interaction efficiency, the application of continuous gesture recognition is essential. This paper studies its application in continuous sign language sentence recognition based on an inertial sensor and a rule-matching recognition algorithm. The recognition rate of nine single HRI gestures is 92.7%, and HRI with combined gestures is realized.

2 citations


Journal ArticleDOI
26 Sep 2022-Sensors
TL;DR: This work features a novel approach to the use of an intermediate latent space between the two modalities that circumvents the problem of modality gap for face photo-sketch recognition and introduces a three-step training scheme.
Abstract: The photo-sketch matching problem is challenging because the modality gap between a photo and a sketch is very large. This work features a novel approach to the use of an intermediate latent space between the two modalities that circumvents the problem of modality gap for face photo-sketch recognition. To set up a stable homogenous latent space between a photo and a sketch that is effective for matching, we utilize a bidirectional (photo → sketch and sketch → photo) collaborative synthesis network and equip the latent space with rich representation power. To provide rich representation power, we employ StyleGAN architectures, such as StyleGAN and StyleGAN2. The proposed latent space equipped with rich representation power enables us to conduct accurate matching because we can effectively align the distributions of the two modalities in the latent space. In addition, to resolve the problem of insufficient paired photo/sketch samples for training, we introduce a three-step training scheme. Extensive evaluation on a public composite face sketch database confirms superior performance of the proposed approach compared to existing state-of-the-art methods. The proposed methodology can be employed in matching other modality pairs.

2 citations


Journal ArticleDOI
TL;DR: Experimental results of gesture recognition on public data sets NTU and VIVA show that the proposed algorithm can effectively avoid the over-fitting problem of training models, and has higher recognition accuracy and stronger robustness than traditional algorithms.
Abstract: The application development of emerging technology is both an opportunity and a challenge. Vision-based gesture recognition rates are low and real-time performance is poor, so various algorithms need to be studied to improve the accuracy and speed of recognition. In this paper, we propose a novel gesture recognition method based on a two-channel region-based convolutional neural network for explainable human-computer interaction understanding. Features of the input gesture image are extracted through two mutually independent channels. The two channels have convolution kernels of different scales, which extract features at different scales from the input image; feature fusion is then carried out at the fully connected layer. Finally, the result is classified by a softmax classifier. The two-channel convolutional neural network model is proposed to solve the problem of insufficient feature extraction by a single convolution kernel. Experimental results of gesture recognition on the public datasets NTU and VIVA show that the proposed algorithm can effectively avoid the over-fitting problem of training models, and has higher recognition accuracy and stronger robustness than traditional algorithms.
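A minimal PyTorch sketch of the two-channel idea is given below; layer sizes, kernel choices, and the class count are assumptions rather than the paper's architecture, but the structure (two independent branches with different kernel scales, fusion at the fully connected layer) follows the description above.

```python
import torch
import torch.nn as nn

class TwoChannelNet(nn.Module):
    """Two independent conv branches with different kernel scales, fused at the FC layer."""

    def __init__(self, num_classes=10):
        super().__init__()

        def branch(kernel):
            # One channel: two conv layers with the given kernel scale.
            return nn.Sequential(
                nn.Conv2d(1, 16, kernel, padding=kernel // 2), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel, padding=kernel // 2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(4))

        self.small = branch(3)                       # fine-scale features
        self.large = branch(7)                       # coarse-scale features
        self.fc = nn.Linear(2 * 32 * 4 * 4, num_classes)

    def forward(self, x):
        a = self.small(x).flatten(1)
        b = self.large(x).flatten(1)
        return self.fc(torch.cat([a, b], dim=1))     # feature fusion at the FC layer

logits = TwoChannelNet()(torch.randn(2, 1, 64, 64))
print(logits.shape)                                  # torch.Size([2, 10])
```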

Proceedings ArticleDOI
01 Jun 2022
TL;DR: In this article, the authors present a study on the use of unlabeled data to improve a sketch-based model, evaluate variations of the VAE and semi-supervised VAE, and present an extension of BYOL to deal with sketches.
Abstract: Sketch-based understanding is a critical component of human cognitive learning and is a primitive communication means between humans. This topic has recently attracted the interest of the computer vision community as sketching represents a powerful tool to express static objects and dynamic scenes. Unfortunately, despite its broad application domains, the current sketch-based models strongly rely on labels for supervised training, ignoring knowledge from unlabeled data, thus limiting the underlying generalization and the applicability. Therefore, we present a study about the use of unlabeled data to improve a sketch-based model. To this end, we evaluate variations of VAE and semi-supervised VAE, and present an extension of BYOL to deal with sketches. Our results show the superiority of sketch-BYOL, which outperforms other self-supervised approaches increasing the retrieval performance for known and unknown categories. Furthermore, we show how other tasks can benefit from our proposal.
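For readers unfamiliar with BYOL, the hedged toy below shows the generic BYOL objective that such an extension builds on; encoder sizes, augmentations, and hyperparameters are assumptions, and this is not the sketch-BYOL implementation. An online encoder plus predictor is trained to match a slowly updated target encoder on two views of the same sketch.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(),
                        nn.Linear(256, 128))       # toy online encoder
predictor = nn.Sequential(nn.Linear(128, 128))     # online predictor head
target = copy.deepcopy(encoder)                    # target network (no gradients)
for p in target.parameters():
    p.requires_grad_(False)

def byol_loss(view1, view2):
    # Predict the target's embedding of view2 from the online embedding of view1.
    p = F.normalize(predictor(encoder(view1)), dim=-1)
    with torch.no_grad():
        z = F.normalize(target(view2), dim=-1)
    return 2 - 2 * (p * z).sum(dim=-1).mean()      # negative cosine similarity

@torch.no_grad()
def ema_update(tau=0.99):
    # Slowly move the target weights toward the online weights.
    for pt, po in zip(target.parameters(), encoder.parameters()):
        pt.mul_(tau).add_(po, alpha=1 - tau)

v1, v2 = torch.randn(8, 1, 28, 28), torch.randn(8, 1, 28, 28)  # two augmented views
loss = byol_loss(v1, v2) + byol_loss(v2, v1)       # symmetrized loss
print(float(loss))
```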

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a hierarchical residual network for sketch recognition and evaluated it thoroughly on the TU-Berlin benchmark; the experimental results show that the proposed network outperforms most baseline methods and is excellent among non-sequential models.
Abstract: With the widespread use of touch-screen devices, it is more and more convenient for people to draw sketches on screen. This creates a demand for automatically understanding such sketches, so the sketch recognition task becomes more significant than before. To accomplish this task, it is necessary to solve the critical issue of improving the distinctiveness of sketch features. To this end, we have made efforts in three aspects. First, a novel multi-scale residual block is designed. Compared with the conventional basic residual block, it can better perceive multi-scale information and reduces the number of parameters during training. Second, a hierarchical residual structure is built by stacking multi-scale residual blocks in a specific way. In contrast with a single-level residual structure, the features learned from this structure are richer. Last but not least, the compact triplet-center loss is proposed specifically for the sketch recognition task. It addresses the problem that the triplet-center loss does not fully account for the overly large intra-class space and overly small inter-class space in the sketch domain. Combining the above modules, a hierarchical residual network is proposed for sketch recognition and evaluated thoroughly on the TU-Berlin benchmark. The experimental results show that the proposed network outperforms most baseline methods and is excellent among current non-sequential models.
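As a hedged illustration of a multi-scale residual block (the paper's exact block design, channel widths, and scale choices may differ), one possible form uses parallel dilated 3x3 branches whose outputs are concatenated, projected, and added to the identity shortcut.

```python
import torch
import torch.nn as nn

class MultiScaleResBlock(nn.Module):
    def __init__(self, channels, scales=(1, 2, 4)):
        super().__init__()
        width = channels // len(scales)
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels, width, 3, padding=d, dilation=d),
                          nn.BatchNorm2d(width), nn.ReLU())
            for d in scales])                                     # one branch per scale
        self.project = nn.Conv2d(width * len(scales), channels, 1)

    def forward(self, x):
        multi = torch.cat([b(x) for b in self.branches], dim=1)  # multi-scale features
        return torch.relu(x + self.project(multi))               # identity shortcut

y = MultiScaleResBlock(48)(torch.randn(2, 48, 32, 32))
print(y.shape)  # torch.Size([2, 48, 32, 32])
```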

Journal ArticleDOI
TL;DR: ExtrudeNet as discussed by the authors is an unsupervised end-to-end network for discovering sketch and extrude from point clouds, which can model extrusion with freeform sketches and conventional cylinder and box primitives as well.
Abstract: Sketch-and-extrude is a common and intuitive modeling process in computer aided design. This paper studies the problem of learning the shape given in the form of point clouds by “inverse” sketch-and-extrude. We present ExtrudeNet, an unsupervised end-to-end network for discovering sketch and extrude from point clouds. Behind ExtrudeNet are two new technical components: 1) an effective representation for sketch and extrude, which can model extrusion with freeform sketches and conventional cylinder and box primitives as well; and 2) a numerical method for computing the signed distance field which is used in the network learning. This is the first attempt that uses machine learning to reverse engineer the sketch-and-extrude modeling process of a shape in an unsupervised fashion. ExtrudeNet not only outputs a compact, editable and interpretable representation of the shape that can be seamlessly integrated into modern CAD software, but also aligns with the standard CAD modeling process facilitating various editing applications, which distinguishes our work from existing shape parsing research. Code is released at https://github.com/kimren227/ExtrudeNet .
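As a worked toy example of the signed distance field of an extrusion (not ExtrudeNet's numerical method, and with a circle profile assumed instead of a freeform sketch), the standard construction combines the profile's 2-D signed distance with the distance to the extrusion's height interval.

```python
import numpy as np

def sdf_circle_2d(x, y, r=1.0):
    # 2-D signed distance of the sketch profile (a circle here, for illustration).
    return np.hypot(x, y) - r

def sdf_extrusion(p, half_height=0.5):
    x, y, z = p
    d2 = sdf_circle_2d(x, y)                     # signed distance in the sketch plane
    wx, wy = d2, abs(z) - half_height            # distance components (plane, height)
    inside = min(max(wx, wy), 0.0)               # negative inside the solid
    outside = np.hypot(max(wx, 0.0), max(wy, 0.0))
    return inside + outside

print(sdf_extrusion((0.0, 0.0, 0.0)))            # -0.5: inside the extruded cylinder
print(sdf_extrusion((2.0, 0.0, 0.0)))            #  1.0: outside in the sketch plane
```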

Journal ArticleDOI
TL;DR: In this article, 571 papers related to gesture recognition and artificial intelligence are analyzed, extracting relevant information on scientific production, such as the most productive authors and journals and the most pertinent articles on the subject.
Abstract: Gesture recognition is an ideal means of interaction because it allows users to avoid contact with any surface, which is safe and hygienic, especially in the pandemic situation occurring worldwide. However, gesture recognition is not a new discipline; it has been researched for many years, yet this type of interaction has not succeeded in replacing the keyboard and mouse. It is very useful to know about the advances being made with artificial intelligence in gesture recognition in order to achieve more robust and reliable recognition with a low response time. Deep learning is being integrated into various areas to improve performance, and artificial intelligence is one such area. In this way, there is the possibility that in the future gesture recognition will be a viable option as a means of daily interaction for the user, and the main objective of this paper is to contribute to that process. For this reason, this study has analyzed 571 papers related to gesture recognition and artificial intelligence. This analysis has extracted relevant information related to scientific production, such as the most productive authors and journals and the most pertinent articles on the subject. Furthermore, we have developed our own model, which shows the relationship between the types of gesture recognition and the artificial intelligence techniques that have been applied to this task.

Book ChapterDOI
01 Jan 2022
TL;DR: DoodleFormer as mentioned in this paper decomposes the creative sketch generation problem into the creation of a coarse sketch composition followed by the incorporation of fine details in the sketch, and introduces graph-aware transformer encoders that effectively capture global dynamic as well as local static structural relations among different body parts.
Abstract: Creative sketching or doodling is an expressive activity, where imaginative and previously unseen depictions of everyday visual objects are drawn. Creative sketch image generation is a challenging vision problem, where the task is to generate diverse, yet realistic creative sketches possessing the unseen composition of the visual-world objects. Here, we propose a novel coarse-to-fine two-stage framework, DoodleFormer, that decomposes the creative sketch generation problem into the creation of coarse sketch composition followed by the incorporation of fine-details in the sketch. We introduce graph-aware transformer encoders that effectively capture global dynamic as well as local static structural relations among different body parts. To ensure diversity of the generated creative sketches, we introduce a probabilistic coarse sketch decoder that explicitly models the variations of each sketch body part to be drawn. Experiments are performed on two creative sketch datasets: Creative Birds and Creative Creatures. Our qualitative, quantitative and human-based evaluations show that DoodleFormer outperforms the state-of-the-art on both datasets, yielding realistic and diverse creative sketches. On Creative Creatures, DoodleFormer achieves an absolute gain of 25 in Fréchet inception distance (FID) over state-of-the-art. We also demonstrate the effectiveness of DoodleFormer for related applications of text to creative sketch generation, sketch completion and house layout generation. Code is available at: https://github.com/ankanbhunia/doodleformer.

Proceedings ArticleDOI
08 Mar 2022
TL;DR: In this paper, a gesture recognition system is constructed using a 1DCNN, the recognition accuracy is verified to improve when a self-attention mechanism is introduced, and skeletal detection and its accuracy-improvement technique are described.
Abstract: This research is aimed at recognizing the gesture of a lifting coordinator and automating the operation of a crane by introducing a system with deep learning. This paper first explains the outline of a gesture recognition system, and describes skeletal detection and its accuracy improvement technique. Furthermore, a gesture recognition system is constructed using a 1DCNN, and the recognition accuracy is verified to be improved by introducing a self-attention mechanism.
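A hedged PyTorch sketch of the 1DCNN-plus-self-attention idea is shown below; the joint count, layer sizes, and class count are assumptions, not the system described above.

```python
import torch
import torch.nn as nn

class Gesture1DCNNAttention(nn.Module):
    def __init__(self, in_dim=34, num_classes=8):       # e.g. 17 joints x (x, y)
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_dim, 64, 5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, 5, padding=2), nn.ReLU())  # local temporal features
        self.attn = nn.MultiheadAttention(64, num_heads=4, batch_first=True)
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):                                # x: (batch, time, in_dim)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2) # (batch, time, 64)
        h, _ = self.attn(h, h, h)                        # temporal self-attention
        return self.head(h.mean(dim=1))                  # pool over time, classify

logits = Gesture1DCNNAttention()(torch.randn(2, 60, 34))
print(logits.shape)                                      # torch.Size([2, 8])
```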

Journal ArticleDOI
TL;DR: In this paper, a sketch-based editing operation is presented that can simultaneously modify the geometry and topology of multiple geometric objects via over-sketching, building on the fuzzy-logic-based strategy of SKIT.

Journal ArticleDOI
TL;DR: In this paper, the authors identify the frequently occurring generalizations in 108 sketch maps of a small urban area by manually extracting generalized features such as streets and buildings and classifying them based on similarities.

Journal ArticleDOI
TL;DR: SketchMaker is developed as an interactive sketch extraction and composition system that helps users generate scene sketches by reusing object sketches from existing scene sketches with minimal manual intervention, and it is demonstrated how SBIR improves with composited scene sketches, verifying the performance of the interactive sketch processing system.
Abstract: Sketching is an intuitive and simple way to depict scenes with varied object forms and appearance characteristics. In the past few years, widely available touchscreen devices have made sketch-based human-AI co-creation applications increasingly popular. One key issue of sketch-oriented interaction is how non-professionals can prepare input sketches efficiently, because it is usually difficult and time-consuming to draw an ideal sketch with appropriate outlines and rich details, especially for novice users with no sketching skills. Thus, sketching poses great obstacles for sketch applications in daily life. On the other hand, hand-drawn sketches are scarce and hard to collect. Although several large-scale sketch datasets provide sketch data resources, they usually cover a limited number of objects and categories and do not support users in collecting new sketch materials according to their personal preferences. In addition, few sketch-related applications support the reuse of existing sketch elements. Thus, knowing how to extract sketches from existing drawings and effectively re-use them in interactive scene sketch composition provides an elegant path for sketch-based image retrieval (SBIR) applications, which are widely used on various touch-screen devices. In this study, we first conduct a study on current SBIR to better understand the main requirements and challenges in sketch-oriented applications. Then we develop SketchMaker as an interactive sketch extraction and composition system to help users generate scene sketches by reusing object sketches from existing scene sketches with minimal manual intervention. Moreover, we demonstrate how SBIR improves with composited scene sketches to verify the performance of our interactive sketch processing system. We also include a sketch-based video localization task as an alternative application of our sketch composition scheme. Our pilot study shows that our system is effective and efficient, and provides a way to promote practical applications of sketches.

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a four-branch Siamese network based on sketch-specific data augmentation to generate discriminative feature representations and improve the sketch-recognition accuracy.
Abstract: Sketch recognition has become an important hotspot issue because of the intuitiveness and visual nature of sketches. Existing sketch-recognition methods based on handcrafted features and deep features are insufficient at recognizing the local information of sketches, and their recognition accuracy is not ideal. Accordingly, this paper proposes a four-branch Siamese network based on sketch-specific data augmentation to generate discriminative feature representations and improve sketch-recognition accuracy. A sketch is an ordered list of strokes; we adopt the semantic information of strokes as the decomposition criterion to divide a sketch into three disjoint local blocks, and then combine the local blocks in pairs to form three new sketches. To give full play to the positive effect of local blocks on category prediction and enhance the fine-grained capability of the network, the three newly generated sketches and the original sketch are combined to construct a four-branch Siamese network. Each branch adopts the Sketch-A-Net architecture with the fully connected layer removed as its basic network, and we improve it by adding a shortcut connection layer and multi-scale weighted bilinear coding (MWBC) modules. Compared with state-of-the-art methods, the experimental results on the TU-Berlin dataset demonstrate the excellent performance of our model.
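The block-and-recombine augmentation described above can be illustrated with a toy snippet; note the paper splits by stroke semantics, whereas this sketch simply cuts the ordered stroke list into three equal blocks, which is an assumption for illustration.

```python
# Illustrative only: split an ordered stroke list into three local blocks and
# recombine them in pairs, producing three new partial sketches that accompany
# the original sketch through the four Siamese branches.
def local_block_pairs(strokes):
    """strokes: ordered list of strokes (each stroke is a list of points)."""
    third = max(1, len(strokes) // 3)
    blocks = [strokes[:third], strokes[third:2 * third], strokes[2 * third:]]
    pairs = [blocks[0] + blocks[1], blocks[0] + blocks[2], blocks[1] + blocks[2]]
    return pairs                                     # three partial sketches

original = [f"stroke_{i}" for i in range(6)]         # placeholder strokes
for p in local_block_pairs(original):
    print(p)
```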

Proceedings ArticleDOI
23 May 2022
TL;DR: Zhang et al. as mentioned in this paper proposed a method to generate a potential grasp configuration relevant to the sketch-depicted objects by incorporating the structure of the sketch to enhance the representation ability.
Abstract: In this paper, we are interested in the problem of generating target grasps by understanding freehand sketches. Sketches are useful for persons who cannot formulate language and in cases where a textual description is not available on the fly. However, very few works are aware of the usability of this novel interactive mode between humans and robots. To this end, we propose a method to generate a potential grasp configuration relevant to the sketch-depicted objects. Due to the inherent ambiguity of sketches with abstract details, we take advantage of a graph representation, incorporating the structure of the sketch to enhance representation ability. This graph-represented sketch is further shown to improve the generalization of the network, which is capable of learning sketch-queried grasp detection from a small collection (around 100 samples) of hand-drawn sketches. Additionally, our model is trained and tested in an end-to-end manner, which makes it easy to implement in real-world applications. Experiments on the multi-object VMRD and GraspNet-1Billion datasets demonstrate the good generalization of the proposed method. The physical robot experiments confirm the utility of our method in object-cluttered scenes.

Journal ArticleDOI
Lili Bao
TL;DR: In this article, the evaluation methodology used by Guo et al. in “Domain Alignment Embedding Network for Sketch Face Recognition”, where their method is deemed superior to several other methods published in the literature, is critically examined.
Abstract: Guo et al. proposed a method for automated face-sketch recognition in their paper “Domain Alignment Embedding Network for Sketch Face Recognition”, where it is deemed to be superior to several other methods published in the literature. However, the employed evaluation methodology contains several critical flaws, such that definitive conclusions with regard to performance comparisons between the methods considered cannot be made. Moreover, erroneous results are reported for the works published in the literature given the employed evaluation framework, potentially compromising the accuracy of future works that use the results published by Guo et al. Discussions on these and other observations are given in this manuscript.

Book ChapterDOI
Andrew Popp
01 Jan 2022

Posted ContentDOI
03 Mar 2022
TL;DR: The FS-COCO dataset as discussed by the authors contains 10,000 freehand scene vector sketches with per point space-time information by 100 non-expert individuals, offering both object and scene-level abstraction.
Abstract: We advance sketch research to scenes with the first dataset of freehand scene sketches, FS-COCO. With practical applications in mind, we collect sketches that convey scene content well but can be sketched within a few minutes by a person with any sketching skills. Our dataset comprises 10,000 freehand scene vector sketches with per point space-time information by 100 non-expert individuals, offering both object- and scene-level abstraction. Each sketch is augmented with its text description. Using our dataset, we study for the first time the problem of fine-grained image retrieval from freehand scene sketches and sketch captions. We draw insights on: (i) Scene salience encoded in sketches using the strokes temporal order; (ii) Performance comparison of image retrieval from a scene sketch and an image caption; (iii) Complementarity of information in sketches and image captions, as well as the potential benefit of combining the two modalities. In addition, we extend a popular vector sketch LSTM-based encoder to handle sketches with larger complexity than was supported by previous work. Namely, we propose a hierarchical sketch decoder, which we leverage at a sketch-specific "pre-text" task. Our dataset enables for the first time research on freehand scene sketch understanding and its practical applications.

Proceedings ArticleDOI
28 Nov 2022
TL;DR: In this paper, the authors summarize recent gesture recognition solutions for drone imagery and apply them to improve recognition accuracy, providing an additional option for gesture-based control.
Abstract: With the development of UAVs, more and more people use drone cameras for image recognition. However, work on gesture recognition in this setting has not systematically organized the relevant information, and recognition accuracy remains a problem. In addition, during the recognition process, different scenes also affect the judgment of a gesture, so reducing the interference introduced by the venue is another challenge. This paper surveys recent papers on the application of gesture recognition on drones and applies their findings to gesture recognition. For the training module, RNN and CNN architectures are used, and to counter interference caused by the environment, more environment maps are added to the training data. By summarizing recent gesture recognition solutions and applying them to improve recognition accuracy, this work provides an additional option for gesture recognition control.

Book ChapterDOI
01 Jan 2022
TL;DR: In this article, the authors propose a primitive-based sketch abstraction task and a network that maps each stroke of a sketch to its most similar primitive in a given set, predicting an affine transformation that aligns the selected primitive to the target stroke.
Abstract: Humans show a high level of abstraction capability in games that require quickly communicating object information. They decompose the message content into multiple parts and communicate them in an interpretable protocol. Toward equipping machines with such capabilities, we propose the Primitive-based Sketch Abstraction task, where the goal is to represent sketches using a fixed set of drawing primitives under the influence of a budget. To solve this task, our Primitive-Matching Network (PMN) learns interpretable abstractions of a sketch in a self-supervised manner. Specifically, PMN maps each stroke of a sketch to its most similar primitive in a given set, predicting an affine transformation that aligns the selected primitive to the target stroke. We learn this stroke-to-primitive mapping end-to-end with a distance-transform loss that is minimal when the original sketch is precisely reconstructed with the predicted primitives. Our PMN abstraction empirically achieves the highest performance on sketch recognition and sketch-based image retrieval given a communication budget, while at the same time being highly interpretable. This opens up new possibilities for sketch analysis, such as comparing sketches by extracting the most relevant primitives that define an object category. Code is available at https://github.com/ExplainableML/sketch-primitives.
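To make the stroke-to-primitive alignment concrete, the toy below solves a least-squares affine map between sampled primitive points and stroke points; PMN instead predicts this transformation with a network trained under a distance-transform loss, so the closed-form fit, point sampling, and names here are illustrative assumptions.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2-D affine map (A, t) such that A @ src_i + t ≈ dst_i."""
    ones = np.ones((len(src), 1))
    X = np.hstack([src, ones])                       # (N, 3) homogeneous source points
    params, *_ = np.linalg.lstsq(X, dst, rcond=None) # (3, 2) solution
    A, t = params[:2].T, params[2]
    return A, t

# Primitive: a unit horizontal segment; target stroke: a slanted, shifted segment.
primitive = np.stack([np.linspace(0, 1, 10), np.zeros(10)], axis=1)
stroke = primitive @ np.array([[2.0, 1.0], [0.0, 1.0]]) + np.array([3.0, -1.0])

A, t = fit_affine(primitive, stroke)
residual = np.abs(primitive @ A.T + t - stroke).max()
print(A, t, residual)                                # residual ≈ 0
```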

Journal ArticleDOI
TL;DR: A standalone application is presented that allows users to create a composite face sketch of a suspect without the help of forensic artists, using a drag-and-drop feature, and can automatically match the drawn composite face sketch against the police database much faster and more efficiently using deep learning and cloud infrastructure.
Abstract: In forensic science, hand-drawn face sketches are still very limited and time-consuming to use with the latest technologies for recognition and identification of criminals. In this paper, we present a standalone application that allows users to create a composite face sketch of a suspect without the help of forensic artists, using a drag-and-drop feature, and can automatically match the drawn composite face sketch against the police database much faster and more efficiently using deep learning and cloud infrastructure. Keywords: Forensic Face Sketch, Face Sketch Construction, Face Recognition, Criminal Identification, Deep Learning, Machine Locking, Two Step Verification.

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed the Domain Adaptation Scaled Entropy Meta-Network (DASEMN) to tackle the sketch face recognition problem, in which a meta-learning training strategy is designed to tackle the few-shot problem and improve the generalization ability of the network.
Abstract: In recent years, sketch face recognition has found wide application in law enforcement and criminal investigation. Deep learning plays a crucial role in recent developments in face recognition; however, it is challenging to employ deep learning methods for sketch face recognition due to insufficient face photo–sketch data. Moreover, compared to photos, sketches lack detailed texture, and there exists a domain gap between photos and sketches; hence, traditional homogeneous face recognition methods perform poorly on sketch face recognition. In this paper, a novel deep learning method termed the Domain Adaptation Scaled Entropy Meta-Network (DASEMN) is proposed to tackle sketch face recognition tasks. Specifically, a meta-learning training strategy is designed to tackle the few-shot problem and improve the generalization ability of the network. Then, a generalized entropy loss termed the scaled mean entropy loss is proposed to guide the network to extract discriminative features. Finally, a domain adaptation module is introduced in training to reduce the domain gap between the sketch domain and the photo domain. Experiments on the UoM-SGFS and CUFSF sketch face databases show that the proposed method is superior to other sketch face recognition methods.

Book ChapterDOI
29 Sep 2022
TL;DR: In this article , a low-cost infrastructure device that alters the need for keyboards and mice in laptops and computers is proposed. But it is applied to a hand gesture recognition system to provide input to the computer to manipulate virtual objects by simply moving hand parts which act as commands.
Abstract: Hand gesture recognition has a significant impact in the field of human–computer interaction. It introduces the information, tools, and systematic design techniques by which accuracy and easy implementation of daily tasks can be achieved. Gesture recognition is the approach by which computers can detect hand gestures. Human–computer interaction provides appropriate feedback, effortless implementation, and timely completion of the goal. Computer vision plays an important role in extracting high-level comprehension from electronic images and videos. It is applied to a hand gesture recognition system to provide input to the computer and manipulate virtual objects by simply moving hand parts, which act as commands. This provides a low-cost infrastructure device that replaces the need for a keyboard and mouse in laptops and computers.