
Showing papers by "Fuji Ren" published in 2020


Journal ArticleDOI
TL;DR: Experimental studies demonstrate that the proposed PFLSCM algorithm achieves improved segmentation performance in comparison with related FCM-based algorithms.

72 citations


Journal ArticleDOI
TL;DR: This research highlights the existing technologies of listening, speaking, reading, writing, and other senses, which are widely used in human interaction, and introduces some intelligent robot systems and platforms.
Abstract: In the field of artificial intelligence, human–computer interaction (HCI) technology and its related intelligent robot technologies are essential and interesting areas of research. From the perspectives of software algorithms and hardware systems, research on these technologies attempts to build a natural HCI environment. The purpose of this research is to provide an overview of HCI and intelligent robots. This research highlights the existing technologies of listening, speaking, reading, writing, and other senses, which are widely used in human interaction. Based on these technologies, this research introduces some intelligent robot systems and platforms. This paper also forecasts some vital challenges of researching HCI and intelligent robots. The authors hope that this work will help researchers in the field acquire the necessary information and technologies to conduct more advanced research.

65 citations


Journal ArticleDOI
TL;DR: The proposed CGMVQA model, including classification and answer generation capabilities, is effective in medical visual question answering and can better assist doctors in clinical analysis and diagnosis.
Abstract: Medical images play an important role in the medical domain. A mature medical visual question answering system can aid diagnosis, but no satisfactory method has solved this comprehensive problem so far. Considering that there are many different types of questions, we propose a model called CGMVQA, with both classification and answer generation capabilities, to turn this complex problem into multiple simple problems. We adopt data augmentation on images and tokenization on texts. We use a pre-trained ResNet152 to extract image features and add three kinds of embeddings together to handle texts. We reduce the parameters of the multi-head self-attention transformer to cut the computational cost. We adjust the masking and output layers to change the functions of the model. The model establishes new state-of-the-art results: a classification accuracy of 0.640, a word-matching score of 0.659, and a semantic similarity of 0.678 on the ImageCLEF 2019 VQA-Med dataset. This suggests that CGMVQA is effective in medical visual question answering and can better assist doctors in clinical analysis and diagnosis.
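
The "three kinds of embeddings added together" follows the standard BERT-style input construction (token + segment + position). A minimal PyTorch sketch of that idea, with hypothetical sizes, not the authors' code:

```python
import torch
import torch.nn as nn

class TextEmbedding(nn.Module):
    """Sum of token, segment, and position embeddings (BERT-style input)."""
    def __init__(self, vocab_size=30522, max_len=128, n_segments=2, dim=768):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, dim)
        self.seg = nn.Embedding(n_segments, dim)
        self.pos = nn.Embedding(max_len, dim)

    def forward(self, token_ids, segment_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return self.tok(token_ids) + self.seg(segment_ids) + self.pos(positions)

ids = torch.randint(0, 30522, (2, 16))       # batch of 2 questions, 16 tokens each
segs = torch.zeros(2, 16, dtype=torch.long)  # single-segment input
print(TextEmbedding()(ids, segs).shape)      # torch.Size([2, 16, 768])
```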

58 citations


Journal ArticleDOI
TL;DR: A novel Energy Optimized Congestion Control based on Temperature Aware Routing Algorithm (EOCC-TARA) using Enhanced Multi-objective Spider Monkey Optimization (EMSMO) for SDN-based WBAN overcomes the vital challenges of energy efficiency, congestion-free communication, and adverse thermal effects in WBAN routing.
Abstract: Wireless Body Area Network (WBAN) is a promising cost-effective technology for privacy-confined military applications and healthcare applications like remote health monitoring, telemedicine, and e-health services. The use of a Software-Defined Network (SDN) approach improves the control and management processes of complex structured WBANs and also provides higher flexibility and a dynamic network structure. To achieve seamless routing performance in SDN-based WBAN, energy-efficiency problems must be tackled effectively. The main contribution of this paper is to develop a novel Energy Optimized Congestion Control based on Temperature Aware Routing Algorithm (EOCC-TARA) using Enhanced Multi-objective Spider Monkey Optimization (EMSMO) for SDN-based WBAN. This algorithm overcomes the vital challenges of energy efficiency, congestion-free communication, and adverse thermal effects in WBAN routing. First, the proposed EOCC-TARA routing algorithm considers the effects of temperature due to the thermal dissipation of sensor nodes and formulates a strategy to adaptively select the forwarding nodes based on temperature and energy. Then the congestion avoidance concept is combined with energy efficiency, link reliability, and path loss to model the cost function, based on which the EMSMO provides the optimal routing. Simulations were performed, and the evaluation results showed that the proposed EOCC-TARA routing algorithm outperforms traditional routing approaches in terms of energy consumption, network lifetime, throughput, temperature control, congestion overhead, delay, and successful transmission rate.
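
As a rough illustration of the kind of multi-objective forwarding cost EOCC-TARA optimizes, the sketch below scores candidate next hops by residual energy, temperature, congestion, and path loss. The fixed weights and [0, 1] normalization are assumptions; the paper optimizes such an objective with EMSMO rather than hand-set weights:

```python
def route_cost(node, w_energy=0.4, w_temp=0.3, w_congestion=0.2, w_loss=0.1):
    """Weighted cost of forwarding through a node; lower is better.
    All metrics are assumed pre-normalized to [0, 1] (an assumption)."""
    return (w_energy * (1.0 - node["residual_energy"])   # prefer high residual energy
            + w_temp * node["temperature"]               # avoid hot nodes
            + w_congestion * node["queue_occupancy"]     # avoid congested nodes
            + w_loss * node["path_loss"])                # prefer reliable links

neighbors = [
    {"residual_energy": 0.9, "temperature": 0.2, "queue_occupancy": 0.1, "path_loss": 0.3},
    {"residual_energy": 0.5, "temperature": 0.6, "queue_occupancy": 0.4, "path_loss": 0.2},
]
best = min(neighbors, key=route_cost)  # pick the cheapest next hop
print(best)
```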

48 citations


Journal ArticleDOI
TL;DR: An electroencephalogram (EEG) emotion recognition method based on the LIBSVM classifier, where EEG features are calculated to represent the characteristics associated with emotion states and the classification results of each channel are fused by the Takagi-Sugeno fuzzy model.

39 citations


Journal ArticleDOI
TL;DR: A triplet training framework based on the multiclass classification approach to conduct the training for the intention detection task is proposed and a Siamese neural network architecture with metric learning is utilized to construct a robust and discriminative utterance feature embedding model.
Abstract: Understanding the user's intention is an essential task for the spoken language understanding (SLU) module in a dialogue system, as it provides vital information for managing and generating future actions and responses. In this paper, we propose a triplet training framework based on the multiclass classification approach to conduct the training for the intention detection task. Specifically, we utilize a Siamese neural network architecture with metric learning to construct a robust and discriminative utterance feature embedding model. We modify the RMCNN model and fine-tune the BERT model as Siamese encoders to train utterance triplets from different semantic aspects. The triplet loss can effectively distinguish the details of two inputs by learning a mapping from sequence utterances to a compact Euclidean space. After generating the mapping, the intention detection task can be easily implemented using standard techniques with the pre-trained embeddings as feature vectors. In addition, we use a fusion strategy to enhance the utterance feature representation in the downstream intention detection task. We conduct experiments on several benchmark datasets for intention detection: the Snips, ATIS, Facebook multilingual task-oriented, Daily Dialogue, and MRDA datasets. The results illustrate that the proposed method effectively improves recognition performance on these datasets and achieves new state-of-the-art results on the single-turn task-oriented datasets (Snips, Facebook) and a multi-turn dataset (Daily Dialogue).
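
The triplet objective described here is the standard margin-based triplet loss over embedded utterances. A minimal PyTorch sketch, with the margin and dimensions chosen for illustration and the Siamese encoder stubbed with random tensors:

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Standard triplet loss: pull same-intent utterances together and push
    different-intent utterances apart in the embedding space."""
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return torch.clamp(d_pos - d_neg + margin, min=0).mean()

# Embeddings would come from a Siamese encoder (e.g., fine-tuned BERT); random here.
a, p, n = (torch.randn(8, 256) for _ in range(3))
print(triplet_loss(a, p, n))
```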

34 citations


Journal ArticleDOI
16 Nov 2020
TL;DR: This paper proposes a feature fusion residual block for the insect pest recognition task and constructs the Deep Feature Fusion Residual Network (DFF-ResNet), which outperforms the original ResNet and other state-of-the-art methods.
Abstract: Insect pest control is considered a significant factor in the yield of commercial crops. Thus, to avoid economic losses, we need a valid method for insect pest recognition. In this paper, we propose a feature fusion residual block to perform the insect pest recognition task. Based on the original residual block, we fuse the feature from a previous layer between two 1×1 convolution layers in the residual signal branch to improve the capacity of the block. Furthermore, we explore the contribution of each residual group to the model performance. We find that adding the residual blocks of earlier residual groups promotes the model performance significantly, improving the generalization capacity of the model. By stacking the feature fusion residual block, we construct the Deep Feature Fusion Residual Network (DFF-ResNet). To prove the validity and adaptivity of our approach, we construct it with two common residual networks (Pre-ResNet and the Wide Residual Network (WRN)) and validate these models on the Canadian Institute For Advanced Research (CIFAR) and Street View House Number (SVHN) benchmark datasets. The experimental results indicate that our models have a lower test error than the baseline models. We then apply our models to insect pest recognition and validate them on the IP102 benchmark dataset. The experimental results show that our models outperform the original ResNet and other state-of-the-art methods.
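
A rough PyTorch sketch of the described block, fusing an earlier feature map between the two 1×1 convolutions of a bottleneck residual branch; the channel sizes and additive fusion are assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn

class FeatureFusionResidualBlock(nn.Module):
    """Bottleneck residual branch whose middle is fused (here by addition)
    with a feature map carried over from an earlier layer."""
    def __init__(self, channels, mid_channels):
        super().__init__()
        self.reduce = nn.Sequential(nn.Conv2d(channels, mid_channels, 1),
                                    nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True))
        self.conv3 = nn.Sequential(nn.Conv2d(mid_channels, mid_channels, 3, padding=1),
                                   nn.BatchNorm2d(mid_channels), nn.ReLU(inplace=True))
        self.expand = nn.Sequential(nn.Conv2d(mid_channels, channels, 1),
                                    nn.BatchNorm2d(channels))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x, earlier_feat):
        out = self.reduce(x)
        out = out + earlier_feat   # fusion between the two 1x1 convolutions
        out = self.conv3(out)
        out = self.expand(out)
        return self.relu(out + x)  # residual connection

block = FeatureFusionResidualBlock(64, 16)
x = torch.randn(1, 64, 32, 32)
earlier = torch.randn(1, 16, 32, 32)  # feature carried from an earlier layer
print(block(x, earlier).shape)        # torch.Size([1, 64, 32, 32])
```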

31 citations


Journal ArticleDOI
TL;DR: A novel non-split condition with an easily set hyperparameter, which focuses more on the minority classes of the current node, is proposed and applied in the BDTKS model, avoiding ignoring the minority classes in class-imbalance cases and speeding up the classification process.

30 citations


Journal ArticleDOI
TL;DR: The key idea of Sleepy is that the energy feature of the wireless channel follows a Gaussian Mixture Model derived from the accumulated channel data over a long period, leading to a low-cost yet promising solution for sleep monitoring.
Abstract: Sleep is a major event of our daily lives. Its quality constitutes a critical indicator of people's health conditions, both mentally and physically. Existing sensor-based or vision-based sleep monitoring systems either are obstructive to use or fail to provide adequate coverage. With the fast expansion of wireless infrastructures nowadays, channel data, which is pervasive and transparent, emerges as another alternative. To this end, we propose Sleepy, a wireless channel data driven sleep monitoring system leveraging commercial WiFi devices. The key idea of Sleepy is that the energy feature of the wireless channel follows a Gaussian Mixture Model (GMM) derived from the accumulated channel data over a long period. Therefore, a GMM based foreground extraction method has been designed to adaptively distinguish motions like rollovers (foreground) from background (stationary postures), leading to certain major merits, e.g., no calibrations or target-dependent training needed. We prototype Sleepy and evaluate it in two real environments. In the short-term controlled experiments, Sleepy achieves 95.65 percent detection accuracy (DA) and 2.16 percent false negative rate (FNR) on average. In the 60-minute real sleep studies, Sleepy demonstrates strong stability, i.e., 0 percent FNR and 98.22 percent DA. Considering that Sleepy is compatible with existing WiFi infrastructure, it constitutes a low-cost yet promising solution for sleep monitoring.
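
The GMM-based foreground extraction idea can be sketched with scikit-learn: fit a mixture to long-term channel-energy samples (the background), then flag low-likelihood windows as motion. The feature dimensions, component count, and threshold below are illustrative assumptions:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in for long-term channel-energy features of the stationary background.
background = rng.normal(loc=[0.0, 1.0], scale=0.1, size=(5000, 2))

gmm = GaussianMixture(n_components=3, random_state=0).fit(background)
threshold = np.percentile(gmm.score_samples(background), 1)  # 1st-percentile log-likelihood

def is_motion(sample):
    """A window whose energy feature is unlikely under the background GMM
    is treated as foreground motion (e.g., a rollover)."""
    return gmm.score_samples(sample.reshape(1, -1))[0] < threshold

print(is_motion(np.array([0.8, 0.2])))  # far from the background modes -> True
```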

27 citations


Journal ArticleDOI
01 Jun 2020
TL;DR: In this paper, a first-of-its-kind wireless emotion sensing system driven by computational intelligence is presented, where the basic methodology is to explore the physical expression of emotions from wireless channel response via data mining.
Abstract: Emotion is well recognized as a distinguished symbol of human beings, and it plays a crucial role in our daily lives. Existing vision-based or sensor-based solutions are either obstructive to use or rely on specialized hardware, hindering their applicability. This paper introduces EmoSense, a first-of-its-kind wireless emotion sensing system driven by computational intelligence. The basic methodology is to explore the physical expression of emotions from wireless channel response via data mining. The design and implementation of EmoSense faces two major challenges—extracting physical expression from wireless channel data and recovering emotion from the corresponding physical expression. For the former, we present a Fresnel zone-based theoretical model depicting the fingerprint of the physical expression on channel response. For the latter, we design an efficient computational intelligence driven mechanism to recognize emotion from the corresponding fingerprints. We prototyped EmoSense on the commodity WiFi infrastructure and compared it with mainstream sensor-based and vision-based approaches in the real-world scenario. The numerical study over 3360 cases confirms that EmoSense achieves a comparable performance to the vision-based and sensor-based rivals under different scenarios. EmoSense only leverages the low-cost and prevalent WiFi infrastructures and thus, constitutes a tempting solution for emotion sensing.

20 citations


Journal ArticleDOI
TL;DR: A Multi-label Emotion Detection Architecture (MEDA) is proposed to detect all associated emotions expressed in a given piece of text and achieves better performance than state-of-the-art methods on this task.
Abstract: Textual emotion detection is an attractive task, yet previous studies mainly focused on polarity or single-emotion classification. However, human expressions are complex, and multiple emotions often occur simultaneously with non-negligible emotion correlations. In this paper, a Multi-label Emotion Detection Architecture (MEDA) is proposed to detect all associated emotions expressed in a given piece of text. MEDA is mainly composed of two modules: a Multi-Channel Emotion-Specified Feature Extractor (MC-ESFE) and an Emotion Correlation Learner (ECorL). MEDA first captures underlying emotion-specified features through the MC-ESFE module, which is composed of multiple channel-wise ESFE networks. Each channel is devoted to the feature extraction of a specified emotion from the sentence level to the context level through a hierarchical structure. Based on the obtained features, emotion correlation learning is implemented through an emotion sequence predictor in ECorL. For model training, we define a new loss function called multi-label focal loss. With this loss function, the model can focus more on misclassified positive-negative emotion pairs and improve overall performance by balancing the prediction of positive and negative emotions. The evaluation of the proposed MEDA architecture is carried out on two emotional corpora: the Ren-CECps and NLPCC2018 datasets. The experimental results indicate that the proposed method achieves better performance than state-of-the-art methods on this task.
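
The multi-label focal loss described here follows the focal-loss idea applied per emotion label. A hedged PyTorch sketch, with gamma and alpha values chosen for illustration rather than taken from the paper:

```python
import torch

def multilabel_focal_loss(logits, targets, gamma=2.0, alpha=0.5):
    """Focal-style loss over independent emotion labels: well-classified labels
    are down-weighted so training focuses on hard positive/negative emotion
    pairs. gamma and alpha here are illustrative, not the paper's values."""
    p = torch.sigmoid(logits)
    pt = p * targets + (1 - p) * (1 - targets)         # prob assigned to the true label
    w = alpha * targets + (1 - alpha) * (1 - targets)  # positive/negative balance
    return (-w * (1 - pt) ** gamma * torch.log(pt.clamp(min=1e-8))).mean()

logits = torch.randn(4, 8)                    # batch of 4 texts, 8 emotion labels
targets = torch.randint(0, 2, (4, 8)).float()
print(multilabel_focal_loss(logits, targets))
```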

Journal ArticleDOI
Xiaohua Wang, Jianqiao Gong, Min Hu, Yu Gu, Fuji Ren
TL;DR: The experimental results demonstrate that the improved StarGAN model can alleviate some flaws in the face generated by the original StarGAN, and can generate person images with better quality with different poses and expressions.
Abstract: In the field of facial expression recognition, deep learning is extensively used. However, insufficient and unbalanced facial training data in available public databases is a major challenge for improving the expression recognition rate. Generative Adversarial Networks (GANs) can produce more one-to-one faces with different expressions, which can be used to enhance databases. StarGAN can perform one-to-many translations for multiple expressions; compared with original GANs, it increases the efficiency of sample generation. Nevertheless, there are defects in essential areas of the generated faces, such as the mouth, and in fuzzy side-face image generation. To address these limitations, we improved StarGAN to alleviate the defects of image generation by modifying the reconstruction loss and adding a Contextual loss. Meanwhile, we added Attention U-Net to StarGAN's generator, replacing the original generator; we therefore propose the Contextual loss and Attention U-Net (LAUN) improved StarGAN. The U-shaped structure and skip connections in Attention U-Net can effectively integrate the details and semantic features of images, and the network's attention structure can attend to the essential areas of the human face. The experimental results demonstrate that the improved model can alleviate some flaws in the faces generated by the original StarGAN, so it can generate person images of better quality with different poses and expressions. Experiments were conducted on the Karolinska Directed Emotional Faces database, where the facial expression recognition accuracy is 95.97%, 2.19% higher than that using StarGAN, and on the MMI Facial Expression Database, where the expression recognition accuracy is 98.30%, 1.21% higher than that using StarGAN. Moreover, experiments based on databases enhanced by the LAUN-improved StarGAN perform better than those without enhancement.

Journal ArticleDOI
TL;DR: The key idea is to visualize the channel data affected by human movements into time-series heat-map images, which are processed by a Convolutional Neural Network to understand the corresponding user behaviors.

Journal ArticleDOI
TL;DR: A real-time FEI method for a humanoid robot is proposed based on a smooth-constraint reversed mechanical model (SRMM), combining a sequence-to-sequence deep learning model and a motion-smoothing constraint to improve the space–time similarity and motion smoothness of facial expression imitation.
Abstract: To improve the space–time similarity and motion smoothness of facial expression imitation (FEI), a real-time FEI method for a humanoid robot is proposed based on a smooth-constraint reversed mechanical model (SRMM) that combines a sequence-to-sequence deep learning model with a motion-smoothing constraint. First, on the basis of facial data from a Kinect capture device, a facial feature vector is characterized by 3 head postures, 17 facial animation units, and facial geometric deformation cascaded by Laplace coordinates. Second, a reversed mechanical model is constructed via a multilayer long short-term memory neural network to accomplish direct mapping from facial feature sequences to motor position sequences. Additionally, to overcome the motor chattering phenomenon during real-time FEI, a high-order polynomial is constructed to fit the position sequence of the motors, and an SRMM is designed based on the deviations of position, velocity, and acceleration. Finally, to imitate the real-time facial feature sequences of a performer captured by Kinect, the optimal position sequences generated by the SRMM are sent to the hardware system to keep the space–time characteristics consistent with those of the performer. The experimental results demonstrate that the motor position deviation of the SRMM is less than 8%, the space–time similarity between the robot and the performer is greater than 85%, and the motion smoothness of online FEI exceeds 90%. Compared with other related methods, the proposed method achieves a remarkable improvement in motor position deviation, space–time similarity, and motion smoothness.
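
The chattering-suppression step, fitting a high-order polynomial to the motor position sequence, can be sketched with NumPy; the polynomial degree and the absence of explicit velocity/acceleration deviation terms are simplifications of the SRMM:

```python
import numpy as np

def smooth_motor_sequence(positions, degree=5):
    """Fit a high-order polynomial to a motor position sequence to suppress
    chattering; degree is illustrative, not the paper's value."""
    t = np.arange(len(positions))
    coeffs = np.polyfit(t, positions, degree)
    return np.polyval(coeffs, t)

raw = np.sin(np.linspace(0, 3, 60)) + 0.05 * np.random.randn(60)  # chattering positions
smooth = smooth_motor_sequence(raw)
print(float(np.max(np.abs(raw - smooth))))  # residual deviation stays small
```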

Proceedings ArticleDOI
12 Oct 2020
TL;DR: A fusion model for text based on self-attention and topic clustering is proposed for multi-label emotion classification, which outperforms several strong baselines and related works.
Abstract: As one of the most critical tasks of natural language processing (NLP), emotion classification has a wide range of applications in many fields. However, restricted by corpus size, semantic ambiguity, and other constraints, researchers in emotion classification face many difficulties, and the accuracy of multi-label emotion classification is not ideal. In this paper, to improve the accuracy of multi-label emotion classification, especially when semantic ambiguity occurs, we propose a fusion model for text based on self-attention and topic clustering. We use pre-trained BERT to extract the hidden emotional representations of a sentence and use an improved LDA topic model to cluster the topics of different levels of text. We then fuse the hidden representations of the sentence and use a classification neural network to calculate the multi-label emotional intensity of the sentence. In tests on the Chinese emotion corpus Ren-CECps, extensive experimental results demonstrate that our model outperforms several strong baselines and related works. The F1-score of our model reaches 0.484, which is 0.064 higher than the best results in similar studies.

Journal ArticleDOI
TL;DR: An emotion expression extraction method is proposed to process millions of user-generated opinionated sentences automatically, and experimental results demonstrate the effectiveness of the algorithms in the proposed method.
Abstract: With the rapid spread of Chinese microblogs, a large number of microblog topics are generated in real time. More and more users pay attention to the emotion expressions of opinionated sentences on different topics, and it is challenging to label these emotion expressions manually. To this end, an emotion expression extraction method is proposed in this paper to process millions of user-generated opinionated sentences automatically. Specifically, the proposed method mainly contains two tasks: emotion classification and opinion target extraction. We first use a lexicon-based emotion classification method to compute the different emotion values in the emotion label vectors of opinionated sentences. The emotion label vectors are then revised by an unsupervised emotion label propagation algorithm. After extracting candidate opinion targets from the opinionated sentences, the opinion target extraction task is performed by a random walk-based ranking algorithm, which ranks the candidate opinion targets by considering both the connections between candidate opinion targets and the textual similarity between opinionated sentences. Experimental results demonstrate the effectiveness of the algorithms in the proposed method.
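
The random walk-based ranking step is in the spirit of PageRank-style power iteration over a graph of candidate opinion targets. A generic sketch: the paper's transition weights combine target connections and sentence similarity, while a plain adjacency matrix stands in here:

```python
import numpy as np

def rank_targets(adjacency, damping=0.85, iters=50):
    """PageRank-style power iteration over a graph of candidate opinion
    targets; a plain adjacency matrix stands in for the paper's weights."""
    A = np.asarray(adjacency, dtype=float)
    col_sums = A.sum(axis=0)
    col_sums[col_sums == 0] = 1.0
    M = A / col_sums                          # column-stochastic transitions
    n = M.shape[0]
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = (1 - damping) / n + damping * M @ r
    return r

print(rank_targets([[0, 1, 1], [1, 0, 0], [1, 1, 0]]))  # ranking score per candidate
```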


Journal ArticleDOI
TL;DR: A multi-reservoir feature-coding continuous-label-fusion semi-supervised Generative Adversarial Network (MCLFS-GAN) is proposed, using permutation phase transfer entropy as the EEG signal feature, to effectively improve recognition performance.

Journal ArticleDOI
TL;DR: A novel approach for KG entity typing is proposed which is trained by jointly utilizing local typing knowledge from existing entity type assertions and global triple knowledge in KGs, and two distinct knowledge-driven effective mechanisms of entity type inference are presented.
Abstract: Knowledge graph (KG) entity typing aims at inferring possible missing entity type instances in a KG, which is a very significant but still under-explored subtask of knowledge graph completion. In this paper, we propose a novel approach for KG entity typing which is trained by jointly utilizing local typing knowledge from existing entity type assertions and global triple knowledge in KGs. Specifically, we present two distinct knowledge-driven mechanisms of entity type inference and build two novel embedding models to realize them. Afterward, a joint model connecting the two is used to infer missing entity type instances, favoring inferences that agree with both the entity type instances and the triple knowledge in KGs. Experimental results on two real-world datasets (Freebase and YAGO) demonstrate the effectiveness of our proposed mechanisms and models for improving KG entity typing.

Posted Content
TL;DR: A hybrid emotion recognition system is proposed, leveraging two emotion-rich and tightly-coupled modalities, i.e., facial expression and body gesture, together with a signal sensitivity enhancement method based on Rician K-factor theory.
Abstract: Emotion is an essential part of Artificial Intelligence (AI) and human mental health. Current emotion recognition research mainly focuses on a single modality (e.g., facial expression), while human emotion expressions are multi-modal in nature. In this paper, we propose a hybrid emotion recognition system leveraging two emotion-rich and tightly-coupled modalities, i.e., facial expression and body gesture. However, unbiased and fine-grained facial expression and gesture recognition remain a major problem. To this end, unlike our rivals relying on contact or even invasive sensors, we explore the commodity WiFi signal for device-free and contactless gesture recognition, while adopting a vision-based approach for facial expression. There exist two design challenges, i.e., how to improve the sensitivity of WiFi signals and how to process the large-volume, heterogeneous, and non-synchronous data contributed by the two modalities. For the former, we propose a signal sensitivity enhancement method based on Rician K-factor theory; for the latter, we combine CNN and RNN to mine the high-level features of the bi-modal data and perform a score-level fusion for fine-grained recognition. To evaluate the proposed method, we build a first-of-its-kind Vision-CSI Emotion Database (VCED) and conduct extensive experiments. Empirical results show the superiority of the bi-modality approach, achieving 83.24% recognition accuracy for seven emotions, compared with 66.48% and 66.67% recognition accuracy for the gesture-only and facial-only solutions, respectively. The VCED database download link is this https URL.
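
The score-level fusion of the two modality classifiers can be sketched as a weighted combination of per-class scores; equal weighting is an assumption, not the paper's setting:

```python
import numpy as np

def score_level_fusion(p_face, p_gesture, w=0.5):
    """Late fusion: combine per-class scores from the vision branch and the
    WiFi gesture branch, then pick the argmax. Equal weighting is assumed."""
    fused = w * np.asarray(p_face) + (1 - w) * np.asarray(p_gesture)
    return int(np.argmax(fused))

p_face = [0.1, 0.6, 0.3]     # e.g., CNN softmax over three emotion classes
p_gesture = [0.2, 0.3, 0.5]  # e.g., RNN softmax over the same classes
print(score_level_fusion(p_face, p_gesture))  # -> 1
```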

Journal ArticleDOI
TL;DR: A lip matching scheme based on vowel priority is proposed, together with a similarity evaluation model based on the Manhattan distance over computer-vision lip features, which quantifies lip shape similarity on a 0–1 scale and provides an effective evaluation standard.
Abstract: At present, the significance of humanoid robots has increased dramatically, yet such robots rarely enter human life because of their immature development. The lip shape of a humanoid robot is crucial in the speech process since it makes the robot look like a real human. Many studies show that vowels are the essential elements of pronunciation in all the world's languages. Building on traditional viseme research, we increase the priority of smooth lip transitions between vowels and propose a lip matching scheme based on vowel priority. Additionally, we design a similarity evaluation model based on the Manhattan distance over computer-vision lip features, which quantifies lip shape similarity on a scale of 0–1 and provides an effective evaluation standard, compensating for the shortcomings of lip shape similarity evaluation criteria in this field. We applied this lip-matching scheme to the Ren-Xin humanoid robot and performed robot teaching experiments as well as a similarity comparison experiment on 20 sentences with two male and two female participants and the robot. Notably, all the experiments achieved excellent results.
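
A minimal sketch of a Manhattan-distance similarity mapped to [0, 1]; the exponential mapping is an assumption, since the abstract only states that similarity is quantified between 0 and 1:

```python
import numpy as np

def lip_similarity(features_a, features_b):
    """Map the Manhattan distance between two lip-feature vectors to a
    similarity in [0, 1]: identical shapes give 1, larger distances decay
    toward 0. The exponential mapping is a hypothetical choice."""
    d = np.sum(np.abs(np.asarray(features_a) - np.asarray(features_b)))
    return float(np.exp(-d))

print(lip_similarity([0.2, 0.5, 0.1], [0.25, 0.45, 0.1]))  # close shapes -> near 1
```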

Journal ArticleDOI
TL;DR: The authors agree that the AsI model was first developed by Panagiotis Petrantonakis, per references [11] and [18] in their thesis.
Abstract: The authors quite agree that the AsI model was first developed by Panagiotis Petrantonakis, per references [11] and [18] in their thesis.

Proceedings ArticleDOI
13 Oct 2020
TL;DR: In this paper, a GEI with noise removal is created using Mask R-CNN, and the CNN used for gait recognition is strengthened by applying Batch Normalization.
Abstract: Currently, biometric authentication is being actively researched as a personal authentication technology for security, and gait recognition, which uses Convolutional Neural Networks (CNN) to recognize human walking, is one such technology. When creating a Gait Energy Image (GEI) using background subtraction, noises such as shadows and illumination fluctuations often hinder the accuracy of the method. In this paper, a GEI with noise removal is created using Mask R-CNN, and the CNN used for gait recognition is strengthened by applying Batch Normalization. The effectiveness of this method was confirmed by conducting experiments on two types of gaits, one without a bag and one with a bag.
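
A small PyTorch sketch of a BatchNorm-strengthened CNN over single-channel GEIs, in the spirit of the described network; the layer sizes and the 64×64 input are assumptions:

```python
import torch
import torch.nn as nn

class GaitCNN(nn.Module):
    """CNN over single-channel GEIs with Batch Normalization after each
    convolution; sizes are illustrative, not the paper's architecture."""
    def __init__(self, n_subjects=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, n_subjects)

    def forward(self, x):  # x: (batch, 1, 64, 64) GEIs
        return self.classifier(self.features(x).flatten(1))

print(GaitCNN()(torch.randn(2, 1, 64, 64)).shape)  # torch.Size([2, 10])
```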

Journal ArticleDOI
TL;DR: Investigations show that the proposed method outperforms state-of-the-art approaches on datasets for two tasks, namely semantic relatedness and Microsoft Research paraphrase identification, and also boosts similarity accuracy.
Abstract: Neural networks have received considerable attention in sentence similarity measuring systems due to their efficiency in dealing with semantic composition. However, existing neural network methods are not sufficiently effective in capturing the most significant semantic information buried in an input. To address this problem, a novel weighted-pooling attention layer is proposed to retain the most remarkable attention vector. It has already been established that long short-term memory and convolutional neural networks have a strong ability to accumulate enriched patterns of whole-sentence semantic representation. First, a sentence representation is generated by employing a siamese structure based on bidirectional long short-term memory and a convolutional neural network. Subsequently, a weighted-pooling attention layer is applied to obtain an attention vector. Finally, the attention vector pair information is leveraged to calculate the sentence similarity score. The amalgamation of bidirectional long short-term memory and a convolutional neural network has resulted in a model with enhanced information extracting and learning capacity. Investigations show that the proposed method outperforms state-of-the-art approaches on datasets for two tasks, namely semantic relatedness and Microsoft Research paraphrase identification. The new model improves the learning capability and also boosts the similarity accuracy.
Key words: sentence similarity, sentence embedding, deep learning, long short-term memory, convolutional neural network
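
The weighted-pooling attention layer can be sketched generically: score each encoder state, softmax the scores, and return the weighted sum as the attention vector. A PyTorch sketch of this idea, not the exact model:

```python
import torch
import torch.nn as nn

class WeightedPoolingAttention(nn.Module):
    """Attention pooling over encoder states: a learned score per time step,
    softmax-normalized, produces a weighted sum as the attention vector."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, states):                # states: (batch, seq, dim)
        weights = torch.softmax(self.score(states), dim=1)
        return (weights * states).sum(dim=1)  # attention vector: (batch, dim)

states = torch.randn(2, 12, 128)              # e.g., BiLSTM+CNN encoder outputs
print(WeightedPoolingAttention(128)(states).shape)  # torch.Size([2, 128])
```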

Journal ArticleDOI
TL;DR: This article presents a novel research method for mining the actionable intents for search users, by generating a ranked list of the potentially most informative actions based on a massive pool of action samples.
Abstract: Understanding search engine users' intents has been a popular study in information retrieval, which directly affects the quality of retrieved information. One of the fundamental problems in this field is to find a connection between the entity in a query and the potential intents of the users, the latter of which would further reveal important information for facilitating the users' future actions. In this article, we present a novel research method for mining the actionable intents for search users, by generating a ranked list of the potentially most informative actions based on a massive pool of action samples. We compare different search strategies and their combinations for retrieving the action pool and develop three criteria for measuring the informativeness of the selected action samples, that is, the significance of an action sample within the pool, the representativeness of an action sample for the other candidate samples, and the diverseness of an action sample with respect to the selected actions. Our experiment, based on the Action Mining (AM) query entity data set from the Actionable Knowledge Graph (AKG) task at NTCIR-13, suggests that the proposed approach is effective in generating an informative and early-satisfying ranking of potential actions for search users.

Proceedings ArticleDOI
13 Oct 2020
TL;DR: An attention-based Bi-LSTM-CRF network is proposed to integrate both the contextual information and the latent semantic relations of the emotion expression and candidate clause, in order to identify the causes behind an emotion expressed in a document.
Abstract: Emotion cause extraction is the task of identifying the causes behind an emotion expressed in a document, a challenging problem for fine-grained emotion analysis in natural language processing. Most existing methods regard the task as an independent clause classification problem, ignoring the relationships among multiple clauses in the same document. Moreover, the relative position of the candidate clause and the emotion clause provides a critical emotion cause clue. In this paper, an attention-based Bi-LSTM-CRF network is proposed to integrate the above information. In this network, a bi-directional long short-term memory is first used to capture both the contextual information and the latent semantic relations of the emotion expression and the candidate clause. Then, two attention mechanisms are designed to encode the mutual influence between the emotion expression and the candidate clause, and between the relative position and the candidate clause, creating better-distributed representations. Finally, these representations are fed into a Conditional Random Fields layer for labeling. Results on a benchmark Chinese emotion cause dataset prove the effectiveness of our method, which achieves an F-score of 88.40%.
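
A hedged PyTorch sketch of an attention-re-weighted Bi-LSTM producing per-clause emission scores that a CRF layer (omitted here) would decode; the dimensions and the dot-product attention form are illustrative, not the paper's exact design:

```python
import torch
import torch.nn as nn

class AttnBiLSTMEmitter(nn.Module):
    """Bi-LSTM over clause embeddings with a simple attention re-weighting
    by the emotion-expression vector; emits label scores for a CRF."""
    def __init__(self, dim=128, n_labels=2):
        super().__init__()
        self.lstm = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.emit = nn.Linear(dim, n_labels)

    def forward(self, clauses, emotion_vec):   # (b, n, dim), (b, dim)
        h, _ = self.lstm(clauses)               # contextual clause states
        attn = torch.softmax((h * emotion_vec.unsqueeze(1)).sum(-1), dim=1)
        h = h * attn.unsqueeze(-1)              # emotion-aware re-weighting
        return self.emit(h)                     # emission scores for the CRF

scores = AttnBiLSTMEmitter()(torch.randn(2, 6, 128), torch.randn(2, 128))
print(scores.shape)  # torch.Size([2, 6, 2])
```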

Book ChapterDOI
07 Nov 2020
TL;DR: A multi-view weighted kernel fuzzy clustering method with collaborative evident and concealed views (MV-Co-KFCM) is put forward; the proposed algorithm proves superior on 5 clustering validity indexes.
Abstract: With the development of media technology, the data types that cluster analysis needs to face have become more and more complicated. One of the more typical problems is the clustering of multi-view data sets, which existing clustering methods find difficult to handle well. To remedy this deficiency, a multi-view weighted kernel fuzzy clustering method with collaborative evident and concealed views (MV-Co-KFCM) is put forward. To begin with, the hidden shared information is extracted from several different views of the data set by means of non-negative matrix factorization and then applied in the iterative clustering process. This not only takes advantage of the difference information in distinct views but also utilizes the consistency knowledge across them, yielding a pre-processing algorithm for extracting hidden information from multiple views (EHI-MV). Furthermore, in order to coordinate the different views during the iteration, a weight is assigned to each view, and a Shannon entropy regularization term is introduced to regulate the weights adaptively. Entropy is maximized as far as possible while minimizing the objective function, yielding the MV-Co-KFCM algorithm. On 5 multi-view databases and in comparison with 6 current leading algorithms, the algorithm we put forward proves superior on 5 clustering validity indexes.
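
Under Shannon-entropy regularization, per-view weights typically take a softmax-like closed form: views with lower clustering cost receive higher weight, and the regularization strength controls how far weights deviate from uniform. A generic NumPy sketch of that adaptive weighting idea, not the paper's exact objective:

```python
import numpy as np

def view_weights(view_costs, lam=1.0):
    """Entropy-regularized view weights: lower-cost views get larger weight;
    lam (regularization strength) is an assumed hyperparameter."""
    costs = np.asarray(view_costs, dtype=float)
    w = np.exp(-costs / lam)
    return w / w.sum()

print(view_weights([2.0, 1.0, 3.5]))  # the lowest-cost view dominates
```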

Proceedings ArticleDOI
16 Oct 2020
TL;DR: This paper introduces ELMo representations and adds a gated self-attention layer to the Bi-Directional Attention Flow network (BIDAF); it also employs a feature reuse method and modifies the linear function of the answer layer to further improve the model, and experiments prove the model's validity.
Abstract: Machine reading comprehension (MRC) has always been a significant part of artificial intelligence and a focus in the field of natural language processing (NLP). Given a context paragraph, to answer a query about it we need to encode the complex interaction between the question and the context. In recent years, with the rapid progress of neural network models and attention theory, MRC has made great advances; in particular, attention mechanisms have been widely used in MRC. However, the accuracy of previous classic baseline models leaves room for improvement, and some of them did not take long-range context dependence and polysemy into account. In this paper, to resolve these problems and further improve the model, we introduce ELMo representations and add a gated self-attention layer to the Bi-Directional Attention Flow network (BIDAF). In addition, we employ a feature reuse method and modify the linear function of the answer layer to further improve performance. In experiments on SQuAD, we show that this model greatly exceeds the baseline BIDAF model and that its performance is close to the average human level, which proves the model's validity.
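
A gated self-attention layer of the kind added to BIDAF can be sketched as self-attention over the passage followed by a sigmoid gate that mixes attended context into each position; the details below differ from the paper's exact formulation:

```python
import torch
import torch.nn as nn

class GatedSelfAttention(nn.Module):
    """Self-attention plus a sigmoid gate deciding how much attended context
    to mix into each position; a generic sketch, not the paper's layer."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, x):                 # x: (batch, seq, dim)
        ctx, _ = self.attn(x, x, x)       # self-attention context
        g = torch.sigmoid(self.gate(torch.cat([x, ctx], dim=-1)))
        return g * ctx + (1 - g) * x      # gated mixture

x = torch.randn(2, 20, 128)
print(GatedSelfAttention(128)(x).shape)   # torch.Size([2, 20, 128])
```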

Proceedings ArticleDOI
01 May 2020
TL;DR: The purpose of this research is to create an environmental recognition system, specifically on emotion expressions and human states, for a humanoid robot aiming for interpersonal services using a Mask R-CNN model.
Abstract: The purpose of this research is to create an environmental recognition system, specifically for emotion expressions and human states, for a humanoid robot aimed at interpersonal services. Region Convolutional Neural Networks (R-CNN) are often used for detecting objects in the environment. We employ a Mask R-CNN model to detect the emotions and states of a target person from the robot's field of view. The model was trained using various images of a human body in several emotional states. Experiments were conducted to validate the effectiveness of the model in detecting the states of surrounding people from the robot's camera. Although the set of human states assumed in the experiment was limited, the results imply the potential of the proposed method to serve as the basis of a recognition model for an intelligent humanoid robot for interpersonal services.

Proceedings ArticleDOI
09 Aug 2020
TL;DR: This paper prototypes a new method for user authentication leveraging commodity WiFi and explores four classifiers, K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Random Forest, and Decision Tree, for recognizing users; results show that KNN provides the best performance.
Abstract: User authentication is a major area of interest within the field of Human Computer Interaction (HCI), as it prevents unauthorized access and ensures the security of data. Personal Identification Numbers (PINs) and biometrics are the main approaches for identifying a user on the basis of his/her identity. However, a PIN can be easily leaked to others, and biometrics usually require specialized devices. In this paper, we prototype a new method for user authentication by leveraging commodity WiFi. The basic methodology is to explore the typing habits of users from Channel State Information (CSI). The design and implementation of our system face two challenges, i.e., extracting keystroke features from wireless channel data and authenticating the user via typing habits from the corresponding keystroke features. For the former, we capture signal fluctuations caused by micro movements like typing and extract the keystroke features from the channel response obtained from commodity WiFi devices. For the latter, we design a computational intelligence driven mechanism to authenticate users from the corresponding keystroke features. We prototype our system on low-cost off-the-shelf WiFi devices and evaluate its performance in real-world experiments. We explored four classifiers, K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Random Forest, and Decision Tree, for recognizing users. Empirical results show that KNN provides the best performance, i.e., 85.2% authentication accuracy, 12.8% false accept rate, and 11.2% false reject rate on average over 9 participants.
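
The classification stage can be sketched with scikit-learn's KNN on keystroke feature vectors; the synthetic features below are stand-ins, so the printed accuracy is near chance rather than the ~85% reported on real CSI data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative pipeline: rows are keystroke feature vectors (here random
# stand-ins for CSI-derived features), labels are user IDs.
rng = np.random.default_rng(0)
X = rng.normal(size=(900, 30))       # 9 users x 100 samples, 30-dim features
y = np.repeat(np.arange(9), 100)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("accuracy:", knn.score(X_te, y_te))  # near chance on random features
```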