
Showing papers on "Voice activity detection published in 2018"


Proceedings ArticleDOI
01 Oct 2018
TL;DR: A custom annotation tool has been developed to carry out the manual labelling task which, among other things, allows the annotators to choose whether to read the context of a sentence before labelling it.
Abstract: Hate speech is commonly defined as any communication that disparages a target group of people based on some characteristic such as race, colour, ethnicity, gender, sexual orientation, nationality, religion, or other characteristic. Due to the massive rise of user-generated web content on social media, the amount of hate speech is also steadily increasing. Over the past years, interest in online hate speech detection and, particularly, the automation of this task has continuously grown, along with the societal impact of the phenomenon. This paper describes a hate speech dataset composed of thousands of sentences manually labelled as containing hate speech or not. The sentences have been extracted from Stormfront, a white supremacist forum. A custom annotation tool has been developed to carry out the manual labelling task which, among other things, allows the annotators to choose whether to read the context of a sentence before labelling it. The paper also provides a thoughtful qualitative and quantitative study of the resulting dataset and several baseline experiments with different classification models. The dataset is publicly available.

289 citations


Journal ArticleDOI
TL;DR: This paper proposes an approach to detect hate expressions on Twitter based on unigrams and patterns that are automatically collected from the training set and used, among others, as features to train a machine learning algorithm.
Abstract: With the rapid growth of social networks and microblogging websites, communication between people from different cultural and psychological backgrounds has become more direct, resulting in more and more “cyber” conflicts between these people. Consequently, hate speech is used more and more, to the point where it has become a serious problem invading these open spaces. Hate speech refers to the use of aggressive, violent or offensive language, targeting a specific group of people sharing a common property, whether this property is their gender (i.e., sexism), their ethnic group or race (i.e., racism) or their beliefs and religion. While most online social networks and microblogging websites forbid the use of hate speech, the size of these networks and websites makes it almost impossible to control all of their content. Therefore, the necessity arises to detect such speech automatically and to filter any content that presents hateful language or language inciting hatred. In this paper, we propose an approach to detect hate expressions on Twitter. Our approach is based on unigrams and patterns that are automatically collected from the training set. These patterns and unigrams are later used, among others, as features to train a machine learning algorithm. Our experiments on a test set composed of 2010 tweets show that our approach reaches an accuracy of 87.4% on detecting whether a tweet is offensive or not (binary classification), and an accuracy of 78.4% on detecting whether a tweet is hateful, offensive, or clean (ternary classification).

251 citations
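The unigram-plus-classifier pipeline in the entry above can be sketched in miniature. The sketch below covers only the unigram part (not the paper's pattern features) and substitutes a hand-rolled multinomial Naive Bayes for whichever learner the authors actually trained; the training sentences and labels are entirely hypothetical toy data:

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase whitespace unigrams -- a simplification of the paper's features.
    return text.lower().split()

class NaiveBayes:
    """Multinomial Naive Bayes over unigram counts, with add-one smoothing."""
    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, y in zip(texts, labels):
            for tok in tokenize(text):
                self.word_counts[y][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        best, best_lp = None, float("-inf")
        total = sum(self.class_counts.values())
        for y in self.class_counts:
            lp = math.log(self.class_counts[y] / total)  # log prior
            denom = sum(self.word_counts[y].values()) + len(self.vocab)
            for tok in tokenize(text):
                # Counter returns 0 for unseen tokens; +1 is the smoothing.
                lp += math.log((self.word_counts[y][tok] + 1) / denom)
            if lp > best_lp:
                best, best_lp = y, lp
        return best

# Toy, hypothetical training data -- not from the paper's corpus.
texts = ["i hate this group", "those people are awful",
         "have a nice day", "lovely weather today"]
labels = ["offensive", "offensive", "clean", "clean"]
clf = NaiveBayes().fit(texts, labels)
```

In a real system the unigram likelihoods would be replaced or augmented by the automatically collected patterns the paper describes, and trained on thousands of labelled tweets rather than four toy sentences.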


Proceedings ArticleDOI
01 Jun 2018
TL;DR: This work presents a Hindi-English code-mixed dataset consisting of tweets posted online on Twitter and proposes a supervised classification system for detecting hate speech in the text using various character level, word level, and lexicon based features.
Abstract: Hate speech detection in social media texts is an important Natural Language Processing task, with several crucial applications such as sentiment analysis, investigating cyberbullying and examining socio-political controversies. While relevant research has been done independently on code-mixed social media texts and hate speech detection, our work is the first attempt at detecting hate speech in Hindi-English code-mixed social media text. In this paper, we analyze the problem of hate speech detection in code-mixed texts and present a Hindi-English code-mixed dataset consisting of tweets posted on Twitter. The tweets are annotated with the language at the word level and with the class they belong to (Hate Speech or Normal Speech). We also propose a supervised classification system for detecting hate speech in the text using various character-level, word-level, and lexicon-based features.

175 citations


Proceedings ArticleDOI
28 Aug 2018
TL;DR: This work created the first publicly available Arabic dataset annotated for the task of religious hate speech detection and the first Arabic lexicon consisting of terms commonly found in religious discussions along with scores representing their polarity and strength and developed various classification models using lexicon- based, n-gram-based, and deep-learning-based approaches.
Abstract: Religious hate speech in the Arabic Twittersphere is a notable problem that requires developing automated tools to detect messages that use inflammatory sectarian language to promote hatred and violence against people on the basis of religious affiliation. Distinguishing hate speech from other profane and vulgar language is quite a challenging task that requires deep linguistic analysis. The richness of Arabic morphology and the limited available resources for the Arabic language make this task even more challenging. To the best of our knowledge, this paper is the first to address the problem of identifying speech promoting religious hatred on Arabic Twitter. In this work, we describe how we created the first publicly available Arabic dataset annotated for the task of religious hate speech detection and the first Arabic lexicon consisting of terms commonly found in religious discussions, along with scores representing their polarity and strength. We then developed various classification models using lexicon-based, n-gram-based, and deep-learning-based approaches. A detailed comparison of the performance of the different models on a completely new unseen dataset is then presented. We find that a simple Recurrent Neural Network (RNN) architecture with Gated Recurrent Units (GRU) and pre-trained word embeddings can adequately detect religious hate speech with an Area Under the Receiver Operating Characteristic curve (AUROC) of 0.84.

149 citations
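The 0.84 AUROC reported above is the probability that a randomly chosen hateful tweet is scored higher than a randomly chosen non-hateful one. That rank-based definition (equivalent to a normalized Mann-Whitney U statistic) can be computed directly, as a minimal illustration of the metric rather than of the paper's models:

```python
def auroc(scores, labels):
    """AUROC as P(score(pos) > score(neg)), with ties counted as 1/2.
    O(n^2) pairwise form -- fine for illustration, not for large data."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A perfect ranker scores 1.0, a perfectly inverted one 0.0, and chance is 0.5; production code would use a library routine that sorts once instead of comparing all pairs.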


Journal ArticleDOI
TL;DR: A detection scheme that is an ensemble of Recurrent Neural Network (RNN) classifiers incorporating various features associated with user-related information, such as the users’ tendency towards racism or sexism; it can successfully distinguish racist and sexist messages from normal text, and achieves higher classification quality than current state-of-the-art algorithms.
Abstract: This paper addresses the important problem of discerning hateful content in social media. We propose a detection scheme that is an ensemble of Recurrent Neural Network (RNN) classifiers, and it incorporates various features associated with user-related information, such as the users’ tendency towards racism or sexism. This data is fed as input to the above classifiers along with the word frequency vectors derived from the textual content. We evaluate our approach on a publicly available corpus of 16k tweets, and the results demonstrate its effectiveness in comparison to existing state-of-the-art solutions. More specifically, our scheme can successfully distinguish racist and sexist messages from normal text, and achieves higher classification quality than current state-of-the-art algorithms.

143 citations
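The core combination step of such an ensemble, blending several classifiers' text-based probabilities with a user-level tendency score, can be sketched very simply. This is a deliberately crude stand-in for the paper's trained RNN ensemble; the weighting scheme and parameter names are ours, not the authors':

```python
def ensemble_score(classifier_probs, user_tendency=0.0, user_weight=0.2):
    """Average the per-classifier hate probabilities, then interpolate
    with a user-level tendency score in [0, 1]. The 0.2 interpolation
    weight is an arbitrary illustrative choice."""
    text_score = sum(classifier_probs) / len(classifier_probs)
    return (1 - user_weight) * text_score + user_weight * user_tendency
```

In the actual scheme the user information is fed into the classifiers themselves rather than mixed in afterwards; the sketch only shows why a user with a strong hateful-posting history shifts borderline messages toward the hateful class.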


Book ChapterDOI
01 Jan 2018
TL;DR: The Hate Speech Detection task is a shared task on Italian social media for the detection of hateful content, and it has been proposed for the first time at EVALITA 2018, providing two datasets from two different online social platforms differently featured from the linguistic and communicative point of view.
Abstract: The Hate Speech Detection (HaSpeeDe) task is a shared task on Italian social media (Facebook and Twitter) for the detection of hateful content, and it was proposed for the first time at EVALITA 2018. Providing two datasets from two different online social platforms with different linguistic and communicative features, we organized the task in three sub-tasks in which systems must be trained and tested on the same resource, or trained on one and tested on the other: HaSpeeDe-FB, HaSpeeDe-TW and Cross-HaSpeeDe (further subdivided into the Cross-HaSpeeDe FB and Cross-HaSpeeDe TW sub-tasks). Overall, 9 teams participated in the task, and the best system achieved a macro F1-score of 0.8288 for HaSpeeDe-FB, 0.7993 for HaSpeeDe-TW, 0.6541 for Cross-HaSpeeDe FB and 0.6985 for Cross-HaSpeeDe TW. In this report, we describe the datasets released and the evaluation measures, and we discuss the results.

139 citations


Posted Content
TL;DR: It is argued that for successful hate speech detection, model architecture is less important than the type of data and labeling criteria, and all proposed detection techniques are brittle against adversaries who can (automatically) insert typos, change word boundaries or add innocuous words to the original hate speech.
Abstract: With the spread of social networks and their unfortunate use for hate speech, automatic detection of the latter has become a pressing problem. In this paper, we reproduce seven state-of-the-art hate speech detection models from prior work, and show that they perform well only when tested on the same type of data they were trained on. Based on these results, we argue that for successful hate speech detection, model architecture is less important than the type of data and labeling criteria. We further show that all proposed detection techniques are brittle against adversaries who can (automatically) insert typos, change word boundaries or add innocuous words to the original hate speech. A combination of these methods is also effective against Google Perspective -- a cutting-edge solution from industry. Our experiments demonstrate that adversarial training does not completely mitigate the attacks, and using character-level features makes the models systematically more attack-resistant than using word-level features.

124 citations
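The three attack families described above (typo insertion, word-boundary changes, and appending innocuous words) are simple text transformations. A minimal sketch follows; the function names are ours and the perturbations are the generic versions, not the paper's exact attack implementations:

```python
import random

def insert_typo(text, rng):
    """Swap two adjacent characters inside a randomly chosen word."""
    words = text.split()
    candidates = [i for i, w in enumerate(words) if len(w) > 1]
    if not candidates:
        return text
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(len(w) - 1)
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)

def remove_word_boundaries(text):
    """Delete whitespace, defeating word-level tokenization."""
    return text.replace(" ", "")

def add_innocuous_word(text, word="love"):
    """Append a benign word to dilute word-level hate signals."""
    return text + " " + word
```

Character-level models are harder to fool with these edits because the character n-grams of the hateful words largely survive the perturbation, which matches the paper's finding that character features are systematically more attack-resistant.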


Proceedings ArticleDOI
15 Jan 2018
TL;DR: In this paper, the authors reproduce seven state-of-the-art hate speech detection models from prior work, and show that they perform well only when tested on the same type of data they were trained on.
Abstract: With the spread of social networks and their unfortunate use for hate speech, automatic detection of the latter has become a pressing problem. In this paper, we reproduce seven state-of-the-art hate speech detection models from prior work, and show that they perform well only when tested on the same type of data they were trained on. Based on these results, we argue that for successful hate speech detection, model architecture is less important than the type of data and labeling criteria. We further show that all proposed detection techniques are brittle against adversaries who can (automatically) insert typos, change word boundaries or add innocuous words to the original hate speech. A combination of these methods is also effective against Google Perspective - a cutting-edge solution from industry. Our experiments demonstrate that adversarial training does not completely mitigate the attacks, and using character-level features makes the models systematically more attack-resistant than using word-level features.

114 citations


Journal ArticleDOI
TL;DR: A smartphone app that performs real-time voice activity detection based on a convolutional neural network, meant to act as a switch for noise reduction in the signal processing pipelines of hearing devices, enabling noise estimation or classification to be conducted in noise-only parts of noisy speech signals.
Abstract: This paper presents a smartphone app that performs real-time voice activity detection based on a convolutional neural network. Real-time implementation issues are discussed, showing how the slow inference time associated with convolutional neural networks is addressed. The developed smartphone app is meant to act as a switch for noise reduction in the signal processing pipelines of hearing devices, enabling noise estimation or classification to be conducted in noise-only parts of noisy speech signals. The developed app is compared with a previously developed voice activity detection app as well as with two highly cited voice activity detection algorithms. The experimental results indicate that the developed app using a convolutional neural network outperforms the previously developed smartphone app.

105 citations
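For context on what such a CNN replaces: the simplest classical VAD labels each fixed-length frame speech or non-speech by its short-term energy. The sketch below is that classical baseline (the kind such apps are compared against), not the paper's CNN, and the frame length and threshold are ad hoc choices:

```python
import math

def frame_energy_vad(samples, frame_len=160, threshold=0.01):
    """Label each non-overlapping frame True (speech-like) if its
    mean squared amplitude exceeds a fixed threshold."""
    decisions = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        decisions.append(energy > threshold)
    return decisions

# Synthetic test signal at a nominal 8 kHz: one silent frame, then one
# frame of a 440 Hz tone at amplitude 0.5.
silence = [0.0] * 160
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(160)]
decisions = frame_energy_vad(silence + tone)
```

Energy thresholds fail in low-SNR or non-stationary noise, which is exactly the gap that learned detectors like the paper's CNN are built to close.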


Posted Content
TL;DR: The authors proposed novel deep neural network structures serving as effective feature extractors, and explore the usage of background information in the form of different word embeddings pre-trained from unlabeled corpora.
Abstract: In recent years, the increasing propagation of hate speech on social media and the urgent need for effective counter-measures have drawn significant investment from governments, companies, and empirical research. Despite a large number of emerging scientific studies to address the problem, the performance of existing automated methods at identifying specific types of hate speech, as opposed to identifying non-hate, is still very unsatisfactory, and the reasons behind this are poorly understood. This work undertakes the first in-depth analysis of this problem and shows that the very challenging nature of identifying hate speech on social media is largely due to the extremely unbalanced presence of real hateful content in typical datasets, and the lack of unique, discriminative features in such content, both causing it to reside in the 'long tail' of a dataset that is difficult to discover. To address this issue, we propose novel Deep Neural Network structures serving as effective feature extractors, and explore the usage of background information in the form of different word embeddings pre-trained from unlabelled corpora. We empirically evaluate our methods on the largest collection of hate speech datasets based on Twitter, and show that our methods can significantly outperform the state of the art, as they are able to obtain a maximum improvement of between 4 and 16 percentage points (macro-average F1) depending on the dataset.

94 citations


Journal ArticleDOI
Juntae Kim1, Minsoo Hahn1
TL;DR: This letter improves the use of context information by using an adaptive context attention model (ACAM) with a novel training strategy for effective attention, which weights the most crucial parts of the context for proper classification.
Abstract: Voice activity detection (VAD) classifies incoming signal segments into speech or background noise; its performance is crucial in various speech-related applications. Although speech-signal context is a relevant VAD asset, its usefulness varies in unpredictable noise environments. Therefore, its usage should be adaptively adjustable to the noise type. This letter improves the use of context information by using an adaptive context attention model (ACAM) with a novel training strategy for effective attention, which weights the most crucial parts of the context for proper classification. Experiments in real-world scenarios demonstrate that the proposed ACAM-based VAD outperforms the other baseline VAD methods.
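The weighting idea at the heart of an attention model like the ACAM above can be reduced to its essentials: score each context frame for relevance, softmax the scores into weights, and pool the frames with those weights. The sketch below keeps only that core and omits the adaptive, noise-dependent machinery and the training strategy the letter actually contributes:

```python
import math

def attend(context_scores, context_frames):
    """Softmax-weight context frames by relevance scores, then take the
    weighted average -- the basic attention pooling step, much simplified."""
    m = max(context_scores)                      # subtract max for stability
    exps = [math.exp(s - m) for s in context_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * f[d] for w, f in zip(weights, context_frames))
              for d in range(len(context_frames[0]))]
    return weights, pooled
```

Equal scores reduce to plain averaging, while one dominant score makes the pooled vector track that single frame; in the ACAM the scores themselves are produced by a learned network conditioned on the noisy input.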

Book ChapterDOI
01 Jan 2018
TL;DR: This paper investigates methods for bridging differences in annotation and data collection of abusive language tweets such as different annotation schemes, labels, or geographic and cultural influences from data sampling, and considers three distinct sets of annotations.
Abstract: Accurately detecting hate speech using supervised classification is dependent on data that is annotated by humans. Attaining high agreement amongst annotators, though, is difficult due to the subjective nature of the task and the different cultural, geographic and social backgrounds of the annotators. Furthermore, existing datasets capture only single types of hate speech, such as sexism or racism, or single demographics, such as people living in the United States, which negatively affects recall when classifying data that are not captured in the training examples. End users of websites where hate speech may occur are at risk of being exposed to explicit content due to shortcomings in the training of automatic hate speech detection systems, where unseen forms of hate speech, or hate speech towards unseen groups, are not captured. In this paper, we investigate methods for bridging differences in annotation and data collection of abusive language tweets, such as different annotation schemes, labels, or geographic and cultural influences from data sampling. We consider three distinct sets of annotations, namely the annotations provided by Waseem (2016), Waseem and Hovy (2016), and Davidson et al. (2017). Specifically, we train a machine learning model using a multi-task learning (MTL) framework, where typically some auxiliary task is learned alongside a main task in order to gain better performance on the latter. Our approach distinguishes itself from most previous work in that we aim to train a model that is robust across data originating from different distributions and labeled under differing annotation guidelines, and in that we treat these different datasets as different learning objectives, in the way that classical work in multi-task learning does with different tasks. Here, we experiment with using fine-grained tags for annotation.
Aided by the predictions of our models as well as the baseline models, we seek to show that it is possible to utilize distinct domains for classification, as well as to show how cultural contexts influence classifier performance, as the datasets we use are collected either exclusively from the U.S. (Davidson et al. 2017) or globally with no geographic restriction (Waseem 2016; Waseem and Hovy 2016). Our choice of a multi-task learning set-up is motivated by a number of factors. Most importantly, MTL allows us to share knowledge between two or more objectives, such that we can leverage information encoded in one dataset to better fit another. As shown by Bingel and Søgaard (2017) and Martínez Alonso and Plank (2017), this is particularly promising when the auxiliary task has a more coarse-grained set of labels in comparison to the main task. Another benefit of MTL is that it lets us learn lower-level representations from greater amounts of data when compared to a single-task setup. This, in connection with MTL being known to work as a regularizer, is not only promising when it comes to fitting the training data, but also helps to prevent overfitting, especially when we have to deal with small datasets.
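The hard-parameter-sharing structure described above (shared lower layers, one output head per dataset/objective) can be shown as a bare forward pass. This is a structural sketch in plain Python with hand-picked weights, not the paper's trained neural model:

```python
def forward_mtl(x, shared_w, head_ws):
    """One forward pass of a hard-parameter-sharing MTL model: a shared
    linear layer feeds a separate linear head per task. Each dataset's
    annotation scheme would correspond to one head; the shared layer is
    where knowledge transfers between objectives."""
    shared = [sum(w * xi for w, xi in zip(row, x)) for row in shared_w]
    return {task: [sum(w * h for w, h in zip(row, shared)) for row in head_w]
            for task, head_w in head_ws.items()}
```

During training, each batch updates its own head plus the shared weights, so gradients from the auxiliary dataset regularize the representation the main task uses.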

Proceedings Article
01 May 2018
TL;DR: An ensemble method is adapted for use with neural networks and presented to better classify hate speech; the method yields a nearly 5-point improvement in F-measure when compared to the original work on a publicly available hate speech evaluation dataset.
Abstract: Hate speech has become a major issue and is currently a hot topic in the domain of social media. Simultaneously, currently proposed methods to address the issue raise concerns about censorship. Broadly speaking, our research focus is the area of human rights, including the development of new methods to identify and better address discrimination while protecting freedom of expression. As neural network approaches are becoming state of the art for text classification problems, an ensemble method is adapted for use with neural networks and is presented to better classify hate speech. Our method utilizes a publicly available embedding model, which is tested against a hate speech corpus from Twitter. To confirm the robustness of our results, we additionally test against a popular sentiment dataset. Given our goal, we are pleased that our method yields a nearly 5-point improvement in F-measure when compared to the original work on a publicly available hate speech evaluation dataset. We also note difficulties encountered with the reproducibility of deep learning methods and the comparison of findings from other work. Based on our experience, more details are needed in published work reliant on deep learning methods, along with additional evaluation information. This information is provided to foster discussion within the research community for future work.
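Since the headline result above is an F-measure improvement, it is worth being precise about the metric. A minimal F-beta implementation from confusion counts (illustrative only; the paper's evaluation tooling is not specified):

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta from true positives, false positives, false negatives.
    beta=1 gives the usual F1, the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

Because the harmonic mean punishes imbalance, a "5-point" F1 gain can come from precision, recall, or both, which is one reason the authors argue papers should report more evaluation detail.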

Journal ArticleDOI
TL;DR: For all tasks and conditions, the proposed optimization method significantly improves the SAD performance and among all the tested SAD methods the CG-LSTM model gives the best results.
Abstract: Speech activity detection (SAD) is an essential component of automatic speech recognition systems impacting the overall system performance. This paper investigates an optimization process for recurrent neural network (RNN) based SAD. This process optimizes all system parameters including those used for feature extraction, the NN weights, and the back-end parameters. Three cost functions are considered for SAD optimization: the frame error rate, the NIST detection cost function, and the word error rate of a downstream speech recognizer. Different types of RNN models and optimization methods are investigated. Three types of RNNs are compared: a basic RNN, long short-term memory (LSTM) network with peepholes, and a coordinated-gate LSTM (CG-LSTM) network introduced by Gelly and Gauvain. Well suited for nondifferentiable optimization problems, quantum-behaved particle swarm optimization is used to optimize feature extraction and posterior smoothing, as well as for the initial training of the neural networks. Experimental SAD results are reported on the NIST 2015 SAD evaluation data as well as REPERE and AMI meeting corpora. Speech recognition results are reported on the OpenKWS’13 test data. For all tasks and conditions, the proposed optimization method significantly improves the SAD performance and among all the tested SAD methods the CG-LSTM model gives the best results.
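One of the back-end components the paper optimizes is posterior smoothing. A generic version of that stage (moving-average smoothing of frame speech posteriors, thresholding, and a hangover that holds the speech decision for a few frames) looks like this; the window, threshold, and hangover values are illustrative defaults, not the particle-swarm-optimized ones from the paper:

```python
def smooth_and_hang(posteriors, threshold=0.5, win=3, hangover=2):
    """Moving-average smoothing of per-frame speech posteriors, then
    thresholding with a hangover so short dips don't cut off speech."""
    n = len(posteriors)
    smoothed = [sum(posteriors[max(0, i - win + 1):i + 1]) /
                len(posteriors[max(0, i - win + 1):i + 1]) for i in range(n)]
    decisions, hold = [], 0
    for s in smoothed:
        if s > threshold:
            decisions.append(True)
            hold = hangover          # re-arm the hangover
        elif hold > 0:
            decisions.append(True)   # keep speech on during the hangover
            hold -= 1
        else:
            decisions.append(False)
    return decisions
```

Exactly these kinds of non-differentiable parameters (window lengths, thresholds, hangover counts) are what make quantum-behaved particle swarm optimization attractive in the paper, since they cannot be tuned by backpropagation.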

Journal ArticleDOI
TL;DR: It is argued that VADs should prioritize accuracy over area and power, and a VAD circuit is introduced that uses an NN to classify modulation frequency features with 22.3-μW power consumption.
Abstract: This paper describes digital circuit architectures for automatic speech recognition (ASR) and voice activity detection (VAD) with improved accuracy, programmability, and scalability. Our ASR architecture is designed to minimize off-chip memory bandwidth, which is the main driver of system power consumption. A SIMD processor with 32 parallel execution units efficiently evaluates feed-forward deep neural networks (NNs) for ASR, limiting memory usage with a sparse quantized weight matrix format. We argue that VADs should prioritize accuracy over area and power, and introduce a VAD circuit that uses an NN to classify modulation frequency features with 22.3-μW power consumption. The 65-nm test chip is shown to perform a variety of ASR tasks in real time, with vocabularies ranging from 11 words to 145,000 words and full-chip power consumption ranging from 172 μW to 7.78 mW.
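The "sparse quantized weight matrix format" mentioned above combines two compressions: store only nonzero weights, and store each as a small code into a table of quantized values. A software sketch of that flavor of format follows; the chip's actual encoding details differ, and the two-level quantizer here is purely illustrative:

```python
def to_sparse_quantized(rows, levels):
    """Encode a dense weight matrix as, per row, a list of
    (column, code) pairs, where code indexes the quantization table."""
    sparse = []
    for row in rows:
        entries = []
        for col, w in enumerate(row):
            if w != 0.0:
                # Nearest quantization level wins.
                code = min(range(len(levels)), key=lambda k: abs(levels[k] - w))
                entries.append((col, code))
        sparse.append(entries)
    return sparse

def sparse_matvec(sparse, levels, x):
    """Matrix-vector product directly on the compressed format, touching
    only nonzero entries -- the bandwidth saving the paper is after."""
    return [sum(levels[code] * x[col] for col, code in row) for row in sparse]
```

Reading codes instead of full-precision weights, and skipping zeros entirely, is what cuts the off-chip memory traffic that dominates system power.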

Proceedings ArticleDOI
15 Apr 2018
TL;DR: This paper proposes an alternative architecture that does not suffer from saturation problems by modeling temporal variations through a stateless dilated convolution neural network (CNN), which differs from conventional CNNs in three respects: it uses dilated causal convolution, gated activations and residual connections.
Abstract: Voice activity detection (VAD) is the task of predicting which parts of an utterance contain speech versus background noise. It is an important first step in determining which samples to send to the decoder and when to close the microphone. The long short-term memory neural network (LSTM) is a popular architecture for sequential modeling of acoustic signals, and has been successfully used in several VAD applications. However, it has been observed that LSTMs suffer from state saturation problems when the utterance is long (i.e., for voice dictation tasks), and thus require the LSTM state to be periodically reset. In this paper, we propose an alternative architecture that does not suffer from saturation problems, modeling temporal variations through a stateless dilated convolution neural network (CNN). The proposed architecture differs from conventional CNNs in three respects: it uses dilated causal convolution, gated activations and residual connections. Results on a Google Voice Typing task show that the proposed architecture achieves a 14% relative FA improvement at an FR of 1% over state-of-the-art LSTMs for the VAD task. We also include detailed experiments investigating the factors that distinguish the proposed architecture from conventional convolution.
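The first of the three architectural ingredients above, dilated causal convolution, is easy to state precisely: output frame t sees only inputs at t, t-d, t-2d, ... (never the future), so the receptive field grows with the dilation d without adding parameters. A single-channel sketch, not the paper's multi-layer gated/residual network:

```python
def dilated_causal_conv(x, kernel, dilation):
    """1-D dilated causal convolution with implicit left zero-padding:
    y[t] = sum_k kernel[k] * x[t - (K-1-k)*dilation], so kernel[-1]
    multiplies the current sample and y[t] never reads future samples."""
    K = len(kernel)
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k in range(K):
            idx = t - (K - 1 - k) * dilation
            if idx >= 0:             # taps before the signal start are zero
                acc += kernel[k] * x[idx]
        y.append(acc)
    return y
```

Stacking such layers with dilations 1, 2, 4, ... gives an exponentially large receptive field while remaining stateless, which is why the architecture avoids the LSTM's state-saturation issue on long utterances.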

Journal ArticleDOI
TL;DR: For both training schemes, ASR-based predictions outperform established measures such as the extended speech intelligibility index (ESII), the multi-resolution speech envelope power spectrum model (mr-sEPSM) and others.

Proceedings ArticleDOI
09 Apr 2018
TL;DR: This paper radically improve automated hate speech detection by presenting a novel model that leverages intra-user and inter-user representation learning for robust hate speech Detection on Twitter.
Abstract: Hate speech detection is a critical, yet challenging problem in Natural Language Processing (NLP). Despite the existence of numerous studies dedicated to the development of NLP hate speech detection approaches, the accuracy is still poor. The central problem is that social media posts are short and noisy, and most existing hate speech detection solutions take each post as an isolated input instance, which is likely to yield high false positive and negative rates. In this paper, we radically improve automated hate speech detection by presenting a novel model that leverages intra-user and inter-user representation learning for robust hate speech detection on Twitter. In addition to the target Tweet, we collect and analyze the user’s historical posts to model intra-user Tweet representations. To suppress the noise in a single Tweet, we also model the similar Tweets posted by all other users with reinforced inter-user representation learning techniques. Experimentally, we show that leveraging these two representations can significantly improve the f-score of a strong bidirectional LSTM baseline model by 10.1%.

Proceedings ArticleDOI
01 Oct 2018
TL;DR: A quantitative analysis of Twitter data was conducted, but no correlations were found between hateful text and the characteristics of the users who had posted it, and experiments with a hate speech classifier showed that combining certain user features with textual features gave slight improvements of classification performance.
Abstract: The paper investigates the potential effects user features have on hate speech classification. A quantitative analysis of Twitter data was conducted to better understand user characteristics, but no correlations were found between hateful text and the characteristics of the users who had posted it. However, experiments with a hate speech classifier based on datasets from three different languages showed that combining certain user features with textual features gave slight improvements of classification performance. While the incorporation of user features resulted in varying impact on performance for the different datasets used, user network-related features provided the most consistent improvements.

Journal ArticleDOI
TL;DR: A novel age estimation system based on LSTM-RNNs that is able to deal with short utterances, easily deployed in a real-time architecture and compared with a state-of-the-art i-vector approach.
Abstract: Age estimation from speech has recently received increased interest as it is useful for many applications such as user profiling, targeted marketing, or personalized call-routing. This kind of application needs to quickly estimate the age of the speaker and might greatly benefit from real-time capabilities. Long short-term memory (LSTM) recurrent neural networks (RNNs) have been shown to outperform state-of-the-art approaches in related speech-based tasks, such as language identification or voice activity detection, especially when an accurate real-time response is required. In this paper, we propose a novel age estimation system based on LSTM-RNNs. This system is able to deal with short utterances (from 3 to 10 s) and it can be easily deployed in a real-time architecture. The proposed system has been tested and compared with a state-of-the-art i-vector approach using data from the NIST speaker recognition evaluation 2008 and 2010 data sets. Experiments on short duration utterances show a relative improvement of up to 28% in terms of mean absolute error of this new approach over the baseline system.

Posted Content
TL;DR: A neural-network based approach to classifying online hate speech in general, as well as racist and sexist speech in particular, using pre-trained word embeddings and max/mean pooling from simple, fully-connected transformations of these embeddings is presented.
Abstract: We present a neural-network based approach to classifying online hate speech in general, as well as racist and sexist speech in particular. Using pre-trained word embeddings and max/mean pooling from simple, fully-connected transformations of these embeddings, we are able to predict the occurrence of hate speech on three commonly used publicly available datasets. Our models match or outperform state of the art F1 performance on all three datasets using significantly fewer parameters and minimal feature preprocessing compared to previous methods.
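The pooling step above turns a variable-length sequence of word vectors into one fixed-size vector by taking the element-wise max or mean across positions. That operation in isolation (without the fully-connected transformations the paper applies first) looks like this:

```python
def pool_embeddings(embeddings, mode="max"):
    """Collapse a list of equal-length word vectors into a single vector
    by element-wise max or mean pooling across the sequence."""
    dim = len(embeddings[0])
    if mode == "max":
        return [max(e[d] for e in embeddings) for d in range(dim)]
    return [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]
```

Max pooling keeps the strongest activation of each feature anywhere in the sentence (useful for spotting a single hateful term), while mean pooling summarizes the whole sentence; the paper feeds such pooled vectors to a small classifier, which is why its parameter count stays low.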

Journal ArticleDOI
05 Jul 2018
TL;DR: A novel feature modeling technique named NN2Vec is presented that identifies and exploits the inherent relationship between speakers' vocal states and symptoms/affective states and achieves F-1 scores 17% and 13% higher than those of the best available baselines.
Abstract: Although social anxiety and depression are common, they are often underdiagnosed and undertreated, in part due to difficulties identifying and accessing individuals in need of services. Current assessments rely on client self-report and clinician judgment, which are vulnerable to social desirability and other subjective biases. Identifying objective, nonburdensome markers of these mental health problems, such as features of speech, could help advance assessment, prevention, and treatment approaches. Prior research examining speech detection methods has focused on fully supervised learning approaches employing strongly labeled data. However, strong labeling of individuals high in symptoms or state affect in speech audio data is impractical, in part because it is not possible to identify with high confidence which regions of a long speech indicate the person's symptoms or affective state. We propose a weakly supervised learning framework for detecting social anxiety and depression from long audio clips. Specifically, we present a novel feature modeling technique named NN2Vec that identifies and exploits the inherent relationship between speakers' vocal states and symptoms/affective states. Detecting speakers high in social anxiety or depression symptoms using NN2Vec features achieves F-1 scores 17% and 13% higher than those of the best available baselines. In addition, we present a new multiple instance learning adaptation of a BLSTM classifier, named BLSTM-MIL. Our novel framework of using NN2Vec features with the BLSTM-MIL classifier achieves F-1 scores of 90.1% and 85.44% in detecting speakers high in social anxiety and depression symptoms.
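The multiple-instance assumption behind the weak supervision above is that a long clip (a "bag") is positive if at least one of its segments (the "instances") is, even though no segment is individually labelled. The simplest aggregation expressing that assumption is max pooling over instance scores; the paper's BLSTM-MIL learns a richer variant, so treat this only as the underlying idea:

```python
def mil_bag_score(instance_scores):
    """A bag is as positive as its most positive instance."""
    return max(instance_scores)

def mil_bag_label(instance_scores, threshold=0.5):
    """Bag-level decision from per-segment scores under the standard
    MIL assumption; the 0.5 threshold is an illustrative choice."""
    return mil_bag_score(instance_scores) > threshold
```

Training then only needs bag-level labels: the gradient flows through whichever segment currently scores highest, letting the model discover which parts of a long speech indicate the symptom.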

Proceedings ArticleDOI
TL;DR: An Apache Spark-based model is developed to classify Amharic Facebook posts and comments as hate or not hate, achieving a promising result by exploiting Spark's strengths for big data.
Abstract: The anonymity of social networks makes them attractive to purveyors of hate speech, who use them to mask their criminal activities online, posing a challenge to the world and in particular to Ethiopia. With the ever-increasing volume of social media data, hate speech identification becomes a challenge, aggravating conflict between citizens of nations. The high rate of content production makes it difficult to collect, store, and analyze such big data using traditional detection methods. This paper proposes the application of Apache Spark to hate speech detection to address these challenges. The authors developed an Apache Spark-based model to classify Amharic Facebook posts and comments into hate and not hate, employing Random Forest and Naïve Bayes for learning and Word2Vec and TF-IDF for feature selection. Tested by 10-fold cross-validation, the model based on Word2Vec embeddings performed best, with 79.83% accuracy. The proposed method achieves a promising result by leveraging Spark's strengths for big data.
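The TF-IDF feature step of such a pipeline can be sketched in a few lines. The toy implementation below mirrors the smoothed IDF formula that Spark MLlib's IDF estimator uses, but it is plain Python on a tokenized toy corpus, not the authors' Spark pipeline or Amharic data.

```python
import math
from collections import Counter

def tfidf_vectors(corpus):
    """TF-IDF weights for a list of tokenized documents.
    IDF uses the smoothed form log((N + 1) / (df + 1)),
    the same formula Spark MLlib's IDF estimator applies."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))          # document frequency per term
    idf = {t: math.log((n + 1) / (c + 1)) for t, c in df.items()}
    vecs = []
    for doc in corpus:
        tf = Counter(doc)            # raw term frequency
        vecs.append({t: tf[t] * idf[t] for t in tf})
    return vecs
```

The resulting sparse vectors (or Word2Vec embeddings in the better-performing variant) would then feed a Random Forest or Naïve Bayes learner.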

Proceedings ArticleDOI
01 Oct 2018
TL;DR: This paper examines methods to classify hate speech in social media using a dataset annotated for this purpose and obtains results of 80.56% accuracy, which represents an increase of almost 100% from the original analysis used as a reference.
Abstract: In this paper, we examine methods to classify hate speech in social media. We aim to establish lexical baselines for this task by applying classification methods using a dataset annotated for this purpose. As features, our system uses Natural Language Processing (NLP) techniques in order to expand the original dataset with emotional information and provide it for machine learning classification. We obtain results of 80.56% accuracy in hate speech identification, which represents an increase of almost 100% from the original analysis used as a reference.

Proceedings ArticleDOI
01 Feb 2018
TL;DR: A 1μW VAD system utilizing AFE and a digital BNN classifier with an event-encoding A/D interface is presented and the whole AFE is 9.4x more power-efficient than the prior art [5] and 7.9x than the state-of-the-art digital filter bank [6].
Abstract: Voice user interfaces (UIs) are highly compelling for wearable and mobile devices. They have the advantage of using compact and ultra-low-power (ULP) input devices (e.g. passive microphones). Together with ULP signal acquisition and processing, voice UIs can give energy-harvesting acoustic sensor nodes and battery-operated devices the sought-after capability of natural interaction with humans. Voice activity detection (VAD), separating speech from background noise, is a key building block in such voice UIs; e.g., it can enable power gating of higher-level speech tasks such as speaker identification and speech recognition [1]. As an always-on block, the power consumption of VAD must be minimized while maintaining high classification accuracy. Motivated by the high power efficiency of analog signal processing, a VAD system using analog feature extraction (AFE) and a mixed-signal decision tree (DT) classifier was demonstrated in [2]. While it achieved a record-low 6μW, that system requires machine-learning-based calibration of the DT thresholds on a chip-to-chip basis due to ill-controlled AFE variation. Moreover, the 7-node DT may deliver inferior classification accuracy, especially under low input SNR and difficult noise scenarios, compared to more advanced classifiers such as deep neural networks (DNNs) [1,3]. Although the heavy computational load of conventional floating-point DNNs prevents their adoption in embedded systems, the binarized neural networks (BNNs) with binary weights and activations proposed in [4] may pave the way to ULP implementations. In this paper, we present a 1μW VAD system utilizing AFE and a digital BNN classifier with an event-encoding A/D interface. The whole AFE is 9.4x more power-efficient than the prior art [5] and 7.9x more than the state-of-the-art digital filter bank [6], and the BNN consumes only 0.63μW.
To avoid costly chip-wise training, a variation-aware Python model of the AFE was created, and the generated features were used for offline BNN training. Measurements show 84.4%/85.4% mean speech/non-speech hit rates with 1.88%/4.65% 1-σ standard deviation across 10 dies, using the same weights, for 10dB SNR speech with restaurant noise.
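The arithmetic that makes BNNs attractive for ULP hardware is the XNOR-popcount identity: with weights and activations constrained to ±1, a dot product reduces to counting bit agreements. The snippet below illustrates that identity in plain Python; it is a didactic sketch of the general BNN trick, not the chip's 0.63μW implementation.

```python
def sign_binarize(x):
    """Binarize activations to ±1 (zero maps to +1 by convention)."""
    return [1 if v >= 0 else -1 for v in x]

def dot_mul(x, w):
    """Ordinary dot product of two ±1 vectors (needs multipliers)."""
    return sum(a * b for a, b in zip(x, w))

def dot_xnor_popcount(x, w):
    """Same result without multipliers: XNOR of the sign bits counts
    agreements, and dot = 2*matches - n. This equivalence is what lets
    BNN multiply-accumulates collapse into popcounts in hardware."""
    matches = sum(1 for a, b in zip(x, w) if a == b)  # popcount(XNOR)
    return 2 * matches - len(x)
```

A layer then just re-binarizes the accumulated sums with `sign_binarize` before passing them to the next layer.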

Journal ArticleDOI
TL;DR: This paper proposes a new approach to detecting synthetic speech using score-level fusion of front-end features, namely constant Q cepstral coefficients (CQCCs), the all-pole group delay function (APGDF), and fundamental frequency variation (FFV), which outperforms all existing baseline features for both known and unknown attacks.

Journal ArticleDOI
TL;DR: A VAD technique is presented that uses line spectral frequency-based statistical features, namely LSF-S, coupled with extreme learning machine-based classification; VAD helps reduce the computational overhead as well as elevate the recognition performance of speech-based systems.
Abstract: Voice activity detection (VAD) refers to the task of identifying vocal segments in an audio clip. It helps reduce the computational overhead as well as elevate the recognition performance of speech-based systems by discarding the non-vocal portions of an input signal. In this paper, a VAD technique is presented that uses line spectral frequency-based statistical features, namely LSF-S, coupled with extreme learning machine-based classification. The experiments were performed on a database of more than 350 h consisting of data from multifarious sources. We obtained an encouraging overall accuracy of 99.43%.
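The extreme learning machine used for classification here has a simple closed-form training rule: the hidden-layer weights are random and fixed, and only the linear output weights are fit by regularized least squares. The sketch below is a generic ELM on toy 2-D data, not the paper's LSF-S pipeline; the small normal-equations solve stands in for the usual pseudo-inverse, and all names are illustrative.

```python
import math
import random

def _solve(A, b):
    """Gaussian elimination with partial pivoting for A @ beta = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (M[r][n] - s) / M[r][r]
    return beta

def elm_train(X, y, hidden=8, ridge=1e-6, seed=0):
    """Random fixed tanh hidden layer; output weights by solving the
    ridge-regularized normal equations (H^T H + ridge*I) beta = H^T y."""
    rng = random.Random(seed)
    d = len(X[0])
    W = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(hidden)]
    b = [rng.uniform(-1, 1) for _ in range(hidden)]
    H = [[math.tanh(sum(W[h][j] * x[j] for j in range(d)) + b[h])
          for h in range(hidden)] for x in X]
    HtH = [[sum(H[r][i] * H[r][j] for r in range(len(H)))
            + (ridge if i == j else 0.0) for j in range(hidden)]
           for i in range(hidden)]
    Hty = [sum(H[r][i] * y[r] for r in range(len(H))) for i in range(hidden)]
    return W, b, _solve(HtH, Hty)

def elm_predict(model, x):
    W, b, beta = model
    h = [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + bh)
         for w, bh in zip(W, b)]
    return sum(bi * hi for bi, hi in zip(beta, h))
```

Because only the output layer is trained, and in closed form, ELM training is very fast, which is part of its appeal for large audio corpora like the 350 h database used here.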

Posted Content
TL;DR: An increment to the state-of-the-art in hate speech detection for English-Hindi code-mixed tweets by using domain-specific embeddings results in an improved representation of target groups, and an improved F-score.
Abstract: This paper reports an increment to the state-of-the-art in hate speech detection for English-Hindi code-mixed tweets. We compare three typical deep learning models using domain-specific embeddings. On experimenting with a benchmark dataset of English-Hindi code-mixed tweets, we observe that using domain-specific embeddings results in an improved representation of target groups, and an improved F-score.

Proceedings ArticleDOI
02 Sep 2018
TL;DR: A standalone replay spoof speech detection system to classify the natural vs. replay speech signals using Convolutional Restricted Boltzmann Machine (ConvRBM) with the pre-emphasized speech signals.
Abstract: In this paper, we present a standalone replay spoof speech detection (SSD) system to classify natural vs. replay speech. The replay speech spectrum is known to be affected in the higher frequency range. In this context, we propose to exploit auditory filterbank learning using a Convolutional Restricted Boltzmann Machine (ConvRBM) with pre-emphasized speech signals. Temporal modulations in amplitude (AM) and frequency (FM) are extracted from the ConvRBM subbands using the Energy Separation Algorithm (ESA). ConvRBM-based short-time AM and FM features are developed using cepstral processing, denoted as AM-ConvRBM-CC and FM-ConvRBM-CC. The proposed temporal modulation features performed better than the baseline Constant-Q Cepstral Coefficient (CQCC) features. On the evaluation set, absolute reductions of 7.48% and 5.28% in Equal Error Rate (EER) are obtained using AM-ConvRBM-CC and FM-ConvRBM-CC, respectively, compared to our CQCC baseline. The best results are achieved by combining scores from AM and FM cues (0.82% and 8.89% EER for the development and evaluation sets, respectively). The statistics of the AM-FM features are analyzed to understand the performance gap and the complementary information in the two feature sets.
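The Energy Separation Algorithm underlying such AM-FM features is built on the discrete Teager energy operator, Ψ[x](n) = x(n)² − x(n−1)x(n+1). The sketch below implements the DESA-2 variant on a synthetic tone; the paper applies ESA to ConvRBM subband signals and does not state here which ESA variant it uses, so take this as a generic illustration of energy separation, not the paper's exact front end.

```python
import math

def teager(x, n):
    """Discrete Teager energy operator: Psi[x](n) = x(n)^2 - x(n-1)*x(n+1)."""
    return x[n] * x[n] - x[n - 1] * x[n + 1]

def desa2(x, n):
    """DESA-2 energy separation: estimate the instantaneous amplitude
    and frequency (rad/sample) of a narrowband signal at sample n."""
    psi_x = teager(x, n)
    y = lambda k: x[k + 1] - x[k - 1]          # symmetric difference signal
    psi_y = y(n) * y(n) - y(n - 1) * y(n + 1)  # Teager energy of y at n
    omega = 0.5 * math.acos(1.0 - psi_y / (2.0 * psi_x))
    amp = 2.0 * psi_x / math.sqrt(psi_y)
    return amp, omega
```

For a pure discrete cosine both estimates are exact, which makes the operator attractive for tracking modulations sample by sample within each subband.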

Proceedings ArticleDOI
01 Oct 2018
TL;DR: The authors used pre-trained word embeddings and max/mean pooling from simple, fully-connected transformations of these embedding to predict the occurrence of hate speech on three commonly used publicly available datasets.
Abstract: We present a neural-network based approach to classifying online hate speech in general, as well as racist and sexist speech in particular. Using pre-trained word embeddings and max/mean pooling from simple, fully-connected transformations of these embeddings, we are able to predict the occurrence of hate speech on three commonly used publicly available datasets. Our models match or outperform state of the art F1 performance on all three datasets using significantly fewer parameters and minimal feature preprocessing compared to previous methods.
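The pooled-embedding representation described above is easy to sketch: each token's pre-trained vector is mean- and max-pooled across the sequence into one fixed-size feature. The minimal sketch below uses toy 2-D "embeddings" and omits the fully-connected transformation and the classifier on top.

```python
def pool_embeddings(vectors):
    """Concatenate the element-wise mean and max of a sequence of
    word vectors into one fixed-size document representation."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    vmax = [max(v[i] for v in vectors) for i in range(dim)]
    return mean + vmax
```

Because the pooled feature has a fixed size regardless of sentence length, a small fully-connected classifier on top suffices, which is what keeps the parameter count low in approaches of this kind.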