
Showing papers by "Dennis P. Wall published in 2021"


Proceedings ArticleDOI
08 May 2021
TL;DR: In this paper, a feature representation consisting exclusively of head pose keypoints was proposed for detecting head banging in home videos using a time-distributed convolutional neural network (CNN) in which a single CNN extracts features from each frame in the input sequence, and these extracted features are fed as input to a long short-term memory (LSTM) network.
Abstract: Activity recognition computer vision algorithms can be used to detect the presence of autism-related behaviors, including what are termed “restricted and repetitive behaviors”, or stimming, by diagnostic instruments. Examples of stimming include hand flapping, spinning, and head banging. One of the most significant bottlenecks for implementing such classifiers is the lack of sufficiently large training sets of human behavior specific to pediatric developmental delays. The data that do exist are usually recorded with a handheld camera which is itself shaky or even moving, posing a challenge for traditional feature representation approaches for activity detection which capture the camera's motion as a feature. To address these issues, we first document the advantages and limitations of current feature representation techniques for activity recognition when applied to head banging detection. We then propose a feature representation consisting exclusively of head pose keypoints. We create a computer vision classifier for detecting head banging in home videos using a time-distributed convolutional neural network (CNN) in which a single CNN extracts features from each frame in the input sequence, and these extracted features are fed as input to a long short-term memory (LSTM) network. On the binary task of predicting head banging and no head banging within videos from the Self Stimulatory Behaviour Dataset (SSBD), we reach a mean F1-score of 90.77% using 3-fold cross validation (with individual fold F1-scores of 83.3%, 89.0%, and 100.0%) when ensuring that no child who appeared in the train set was in the test set for all folds. This work documents a successful process for training a computer vision classifier which can detect a particular human motion pattern with few training examples and even when the camera recording the source clip is unstable. 
The process of engineering useful feature representations by visually inspecting them, as described here, can be a useful practice for designers and developers of interactive systems that detect human motion patterns in mobile and ubiquitous interactive systems.
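The child-disjoint evaluation constraint described above (no child appearing in both the train and test split of any fold) can be sketched as a group-aware fold assignment. The helper below is an illustrative reconstruction, not the authors' code, and the video and child identifiers are hypothetical:

```python
from collections import defaultdict

def child_disjoint_folds(video_ids, child_of, n_folds=3):
    """Assign videos to cross-validation folds so that all videos of a
    given child land in the same fold (no child appears in both the
    train and test split of any fold)."""
    # Group videos by the child who appears in them.
    by_child = defaultdict(list)
    for vid in video_ids:
        by_child[child_of[vid]].append(vid)
    # Greedily place each child's videos into the currently smallest fold.
    folds = [[] for _ in range(n_folds)]
    for child_videos in sorted(by_child.values(), key=len, reverse=True):
        min(folds, key=len).extend(child_videos)
    return folds

videos = ["v1", "v2", "v3", "v4", "v5", "v6"]
child = {"v1": "a", "v2": "a", "v3": "b", "v4": "c", "v5": "d", "v6": "e"}
folds = child_disjoint_folds(videos, child)
# Every child's videos stay together in exactly one fold.
```

Library utilities such as scikit-learn's `GroupKFold` implement the same idea for real pipelines.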

26 citations


Journal ArticleDOI
TL;DR: The role of insertions and deletions (indels) as well as recombination in SARS-CoV-2 evolution has been examined in this paper, using sequences from the GISAID database.
Abstract: The evolutionary dynamics of SARS-CoV-2 have been carefully monitored since the COVID-19 pandemic began in December 2019. However, analysis has focused primarily on single nucleotide polymorphisms and largely ignored the role of insertions and deletions (indels) as well as recombination in SARS-CoV-2 evolution. Using sequences from the GISAID database, we catalogue over 100 insertions and deletions in the SARS-CoV-2 consensus sequences. We hypothesize that these indels are artifacts of recombination events between SARS-CoV-2 replicates whereby RNA-dependent RNA polymerase (RdRp) re-associates with a homologous template at a different locus (“imperfect homologous recombination”). We provide several independent pieces of evidence that suggest this. (1) The indels from the GISAID consensus sequences are clustered at specific regions of the genome. (2) These regions are also enriched for 5’ and 3’ breakpoints in the transcription regulatory site (TRS) independent transcriptome, presumably sites of RdRp template-switching. (3) Within raw reads, these indel hotspots have cases of both high intra-host heterogeneity and intra-host homogeneity, suggesting that these indels are both consequences of de novo recombination events within a host and artifacts of previous recombination. We briefly analyze the indels in the context of RNA secondary structure, noting that indels preferentially occur in “arms” and loop structures of the predicted folded RNA, suggesting that secondary structure may be a mechanism for TRS-independent template-switching in SARS-CoV-2 or other coronaviruses. These insights into the relationship between structural variation and recombination in SARS-CoV-2 can improve our reconstructions of the SARS-CoV-2 evolutionary history as well as our understanding of the process of RdRp template-switching in RNA viruses.

20 citations


Journal ArticleDOI
TL;DR: In this paper, the authors show that a trustworthy crowd of non-experts can efficiently annotate the behavioral features needed for accurate machine learning detection of the common childhood developmental disorder Autism Spectrum Disorder (ASD) in children under 8 years old.
Abstract: Standard medical diagnosis of mental health conditions requires licensed experts who are increasingly outnumbered by those at risk, limiting reach. We test the hypothesis that a trustworthy crowd of non-experts can efficiently annotate behavioral features needed for accurate machine learning detection of the common childhood developmental disorder Autism Spectrum Disorder (ASD) for children under 8 years old. We implement a novel process for identifying and certifying a trustworthy distributed workforce for video feature extraction, selecting a workforce of 102 workers from a pool of 1,107. Two previously validated ASD logistic regression classifiers, evaluated against parent-reported diagnoses, were used to assess the accuracy of the trusted crowd’s ratings of unstructured home videos. A representative, balanced sample of videos (N = 50) was evaluated with and without face box and pitch shift privacy alterations, with AUROC and AUPRC scores > 0.98. With both privacy-preserving modifications, sensitivity is preserved (96.0%) while maintaining specificity (80.0%) and accuracy (88.0%) at levels comparable to prior classification methods without alterations. We find that machine learning classification from features extracted by a certified non-expert crowd achieves high performance for ASD detection from natural home videos of the child at risk and maintains high sensitivity when privacy-preserving mechanisms are applied. These results suggest that privacy-safeguarded crowdsourced analysis of short home videos can help enable rapid and mobile machine-learning detection of developmental delays in children.
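As a quick consistency check, the reported sensitivity, specificity, and accuracy are mutually consistent if the balanced N = 50 sample is assumed to contain 25 ASD and 25 control videos (an assumption; the exact split is not stated in this summary):

```python
# Assume the balanced N = 50 sample splits as 25 ASD ("positive") and
# 25 neurotypical ("negative") videos -- an assumption for illustration.
tp, fn = 24, 1   # sensitivity 96% -> 24 of 25 ASD videos correctly flagged
tn, fp = 20, 5   # specificity 80% -> 20 of 25 controls correctly cleared

sensitivity = tp / (tp + fn)                 # 24/25 = 0.96
specificity = tn / (tn + fp)                 # 20/25 = 0.80
accuracy = (tp + tn) / (tp + fn + tn + fp)   # 44/50 = 0.88
```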

19 citations


Posted ContentDOI
31 Jul 2021-medRxiv
TL;DR: In this paper, a new emotion classifier designed specifically for pediatric populations, trained with images crowdsourced from an educational mobile charades-style game: Guess What?, was proposed.
Abstract: Autism spectrum disorder (ASD) is a neurodevelopmental disorder affecting one in 40 children in the United States and is associated with impaired social interactions, restricted interests, and repetitive behaviors. Previous studies have demonstrated the promise of applying mobile systems with real-time emotion recognition to autism therapy, but existing platforms have shown limited performance on videos of children with ASD. We propose the development of a new emotion classifier designed specifically for pediatric populations, trained with images crowdsourced from an educational mobile charades-style game: Guess What?. We crowdsourced the acquisition of videos of children portraying emotions during remote game sessions of Guess What? that yielded 6,344 frames from fifteen subjects. Two raters manually labeled the frames with four of the Ekman universal emotions (happy, scared, angry, sad), a “neutral” class, and “n/a” for frames with an indeterminable label. The data were pre-processed, and a model was trained with a transfer-learning and neural-architecture-search approach using the Google Cloud AutoML Vision API. The resulting classifier was evaluated against existing approaches (Microsoft’s Azure Face API and Amazon Web Service’s Rekognition) using the standard metrics of F1 score. The resulting classifier demonstrated superior performance across all evaluated emotions, supporting our hypothesis that a model trained with a pediatric dataset would outperform existing emotion-recognition approaches for the population of interest. These results suggest a new strategy to develop precision therapy for autism at home by integrating the model trained with a personalized dataset to the mobile game.

13 citations


Journal ArticleDOI
06 Apr 2021
TL;DR: In this paper, the authors used 16S rRNA V4 amplicon sequencing of 117 samples (60 ASD and 57 controls) to identify 21 amplicon sequence variants (ASVs) that differed significantly between the two cohorts: 11 were found to be enriched in neurotypical children (six ASVs belonging to the Lachnospiraceae family), while 10 were enriched in children with ASD (including Ruminococcaceae and Bacteroidaceae families).
Abstract: The existence of a link between the gut microbiome and autism spectrum disorder (ASD) is well established in mice, but in human populations, efforts to identify microbial biomarkers have been limited due to a lack of appropriately matched controls, stratification of participants within the autism spectrum, and sample size. To overcome these limitations, we crowdsourced the recruitment of families with age-matched sibling pairs between 2 and 7 years old (within 2 years of each other), where one child had a diagnosis of ASD and the other did not. Parents collected stool samples, provided a home video of their ASD child's natural social behavior, and responded online to diet and behavioral questionnaires. 16S rRNA V4 amplicon sequencing of 117 samples (60 ASD and 57 controls) identified 21 amplicon sequence variants (ASVs) that differed significantly between the two cohorts: 11 were found to be enriched in neurotypical children (six ASVs belonging to the Lachnospiraceae family), while 10 were enriched in children with ASD (including Ruminococcaceae and Bacteroidaceae families). Summarizing the expected KEGG orthologs of each predicted genome, the taxonomic biomarkers associated with children with ASD can use amino acids as precursors for butyrogenic pathways, potentially altering the availability of neurotransmitters like glutamate and gamma-aminobutyric acid (GABA).

IMPORTANCE Autism spectrum disorder (ASD), which now affects 1 in 54 children in the United States, is known to have comorbidity with gut disorders of a variety of types; however, the link to the microbiome remains poorly characterized. Recent work has provided compelling evidence to link the gut microbiome to the autism phenotype in mouse models, but identification of specific taxa associated with autism has suffered replicability issues in humans.
This has been due in part to a lack of sample sizes that sufficiently cover the spectrum of phenotypes known to autism (which range from subtle to severe) and a lack of appropriately matched controls. Our original study proposes to overcome these limitations by collecting stool-associated microbiome samples from 60 sibling pairs of children, one with autism and one neurotypically developing, both 2 to 7 years old and no more than 2 years apart in age. We use exact sequence variant analysis and both permutation and differential abundance procedures to identify 21 taxa with significant enrichment or depletion in the autism cohort compared to their matched sibling controls. Several of these 21 biomarkers have been identified in previous smaller studies; however, some are new to autism and known to be important in gut-brain interactions and/or are associated with specific fatty acid biosynthesis pathways.

11 citations


Journal ArticleDOI
TL;DR: In this paper, the authors explore the feasibility of using crowdsourcing to acquire reliable soft-target labels and evaluate an emotion detection classifier trained with these labels, and compare the resulting softmax output distributions of the two classifiers with a 2-sample independent t-test of L1 distances between the classifier's output probability distribution and the distribution of human labels.
Abstract: Emotion detection classifiers traditionally predict discrete emotions. However, emotion expressions are often subjective, thus requiring a method to handle compound and ambiguous labels. We explore the feasibility of using crowdsourcing to acquire reliable soft-target labels and evaluate an emotion detection classifier trained with these labels. We hypothesize that training with labels that are representative of the diversity of human interpretation of an image will result in predictions that are similarly representative on a disjoint test set. We also hypothesize that crowdsourcing can generate distributions which mirror those generated in a lab setting. We center our study on the Child Affective Facial Expression (CAFE) dataset, a gold standard collection of images depicting pediatric facial expressions along with 100 human labels per image. To test the feasibility of crowdsourcing to generate these labels, we used Microworkers to acquire labels for 207 CAFE images. We evaluate both unfiltered workers and workers selected through a short crowd filtration process. We then train two versions of a ResNet-152 neural network on soft-target CAFE labels using the original 100 annotations provided with the dataset: (1) a classifier trained with traditional one-hot encoded labels and (2) a classifier trained with vector labels representing the distribution of CAFE annotator responses. We compare the resulting softmax output distributions of the two classifiers with a 2-sample independent t-test of L1 distances between the classifier’s output probability distribution and the distribution of human labels. While agreement with CAFE is weak for unfiltered crowd workers, the filtered crowd agree with the CAFE labels 100% of the time for happy, neutral, sad, and “fear + surprise” and 88.8% for “anger + disgust.” While the F1-score for a one-hot encoded classifier is much higher (94.33% vs. 
78.68%) with respect to the ground truth CAFE labels, the output probability vector of the crowd-trained classifier more closely resembles the distribution of human labels (t = 3.2827, p = 0.0014). For many applications of affective computing, reporting an emotion probability distribution that accounts for the subjectivity of human interpretation can be more useful than an absolute label. Crowdsourcing, including a sufficient filtering mechanism for selecting reliable crowd workers, is a feasible solution for acquiring soft-target labels.
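The soft-target labeling and L1-distance comparison described above can be sketched in a few lines. The annotator counts below are hypothetical, not actual CAFE data, and the prediction vectors are invented for illustration:

```python
def soft_target(counts):
    """Turn raw annotator counts for one image into a probability-vector label."""
    total = sum(counts.values())
    return {emotion: n / total for emotion, n in counts.items()}

def l1_distance(p, q):
    """L1 distance between two label distributions over the same classes."""
    return sum(abs(p[k] - q.get(k, 0.0)) for k in p)

# Hypothetical image: 100 annotators split across two plausible readings.
counts = {"happy": 60, "neutral": 30, "sad": 10, "angry": 0, "fear": 0}
target = soft_target(counts)  # {"happy": 0.6, "neutral": 0.3, "sad": 0.1, ...}

# A confident one-hot prediction vs. a soft prediction near the crowd split.
one_hot = {"happy": 1.0, "neutral": 0.0, "sad": 0.0, "angry": 0.0, "fear": 0.0}
soft_pred = {"happy": 0.55, "neutral": 0.35, "sad": 0.10, "angry": 0.0, "fear": 0.0}
# The soft prediction sits closer (in L1) to the human label distribution.
```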

9 citations


Journal ArticleDOI
TL;DR: In this paper, a mobile application, GuessWhat, was designed to deliver game-based therapy to children aged 3 to 12 in home settings through a smartphone, held by a caregiver on their forehead, displaying one of a range of appropriate and therapeutically relevant prompts (e.g., a surprised face) that the child must recognize and mimic sufficiently to allow the caregiver to guess what is being imitated and proceed to the next prompt.
Abstract: Background Many children with autism cannot receive timely in-person diagnosis and therapy, especially in situations where access is limited by geography, socioeconomics, or global health concerns such as the current COVID-19 pandemic. Mobile solutions that work outside of traditional clinical environments can safeguard against gaps in access to quality care. Objective The aim of the study is to examine the engagement level and therapeutic feasibility of a mobile game platform for children with autism. Methods We designed a mobile application, GuessWhat, which, in its current form, delivers game-based therapy to children aged 3 to 12 in home settings through a smartphone. The phone, held by a caregiver on their forehead, displays one of a range of appropriate and therapeutically relevant prompts (e.g., a surprised face) that the child must recognize and mimic sufficiently to allow the caregiver to guess what is being imitated and proceed to the next prompt. Each game runs for 90 seconds to create a robust social exchange between the child and the caregiver. Results We examined the therapeutic feasibility of GuessWhat in 72 children (75% male, average age 8 years 2 months) with autism who were asked to play the game for three 90-second sessions per day, 3 days per week, for a total of 4 weeks. The group showed significant improvements in Social Responsiveness Score-2 (SRS-2) total score (3.97, p …). Conclusion The results support that the GuessWhat mobile game is a viable approach for efficacious treatment of autism and further support the possibility that the game can be used in natural settings to increase access to treatment when barriers to care exist.

7 citations


Journal ArticleDOI
TL;DR: In this article, a method that uses Mendelian errors in sequencing data to make highly granular per-sample estimates of precision and recall for any set of variant calls, regardless of sequencing platform or calling methodology is presented.
Abstract: As next-generation sequencing technologies make their way into the clinic, knowledge of their error rates is essential if they are to be used to guide patient care. However, sequencing platforms and variant-calling pipelines are continuously evolving, making it difficult to accurately quantify error rates for the particular combination of assay and software parameters used on each sample. Family data provide a unique opportunity for estimating sequencing error rates since it allows us to observe a fraction of sequencing errors as Mendelian errors in the family, which we can then use to produce genome-wide error estimates for each sample. We introduce a method that uses Mendelian errors in sequencing data to make highly granular per-sample estimates of precision and recall for any set of variant calls, regardless of sequencing platform or calling methodology. We validate the accuracy of our estimates using monozygotic twins, and we use a set of monozygotic quadruplets to show that our predictions closely match the consensus method. We demonstrate our method’s versatility by estimating sequencing error rates for whole genome sequencing, whole exome sequencing, and microarray datasets, and we highlight its sensitivity by quantifying performance increases between different versions of the GATK variant-calling pipeline. We then use our method to demonstrate that: 1) Sequencing error rates between samples in the same dataset can vary by over an order of magnitude. 2) Variant calling performance decreases substantially in low-complexity regions of the genome. 3) Variant calling performance in whole exome sequencing data decreases with distance from the nearest target region. 4) Variant calls from lymphoblastoid cell lines can be as accurate as those from whole blood. 5) Whole-genome sequencing can attain microarray-level precision and recall at disease-associated SNV sites. 
Genotype datasets from families are powerful resources that can be used to make fine-grained estimates of sequencing error for any sequencing platform and variant-calling methodology.
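The core idea of the family-based approach, observing a subset of sequencing errors as Mendelian errors, reduces at a single biallelic site to checking whether the child's genotype can be assembled from one allele of each parent. A minimal sketch with genotypes coded as alternate-allele counts (not the authors' implementation):

```python
def transmissible_alleles(genotype):
    """Possible single-allele transmissions from a diploid genotype,
    coded as the count of alternate alleles (0, 1, or 2)."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[genotype]

def is_mendelian(child, mother, father):
    """True if the child's genotype can be formed from one allele of each
    parent; False marks a Mendelian error, which may reflect a sequencing
    or variant-calling error in any of the three genotypes."""
    return any(m + f == child
               for m in transmissible_alleles(mother)
               for f in transmissible_alleles(father))

# A het child with two hom-ref parents is a Mendelian error:
# is_mendelian(1, 0, 0) -> False; is_mendelian(1, 0, 1) -> True.
```

Tallying such inconsistencies across a genome gives the observable error signal the method extrapolates from.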

6 citations


Posted Content
10 Jan 2021
TL;DR: This work replaces traditional one-hot encoded label representations with a crowd's distribution of labels and demonstrates that the consensus labels from the crowd tend to match the consensus from the original CAFE raters, validating the utility of crowdsourcing.
Abstract: Current emotion detection classifiers predict discrete emotions. However, literature in psychology has documented that compound and ambiguous facial expressions are often evoked by humans. As a stride towards development of machine learning models that more accurately reflect compound and ambiguous emotions, we replace traditional one-hot encoded label representations with a crowd's distribution of labels. We center our study on the Child Affective Facial Expression (CAFE) dataset, a gold standard dataset of pediatric facial expressions which includes 100 human labels per image. We first acquire crowdsourced labels for 207 images from CAFE and demonstrate that the consensus labels from the crowd tend to match the consensus from the original CAFE raters, validating the utility of crowdsourcing. We then train two versions of a ResNet-152 classifier on CAFE images with two types of labels: (1) traditional one-hot encoding and (2) vector labels representing the crowd distribution of responses. We compare the resulting output distributions of the two classifiers. While the traditional F1-score for the one-hot encoding classifier is much higher (94.33% vs. 78.68%), the output probability vector of the crowd-trained classifier much more closely resembles the distribution of human labels (t=3.2827, p=0.0014). For many applications of affective computing, reporting an emotion probability distribution that more closely resembles human interpretation can be more important than traditional machine learning metrics. This work is a first step for engineers of interactive systems to account for machine learning cases with ambiguous classes, and we hope it will generate a discussion about machine learning with ambiguous labels and leveraging crowdsourcing as a potential solution.

6 citations


Posted ContentDOI
06 Jul 2021-medRxiv
TL;DR: In this article, the authors explore the potential for global image transformations to provide privacy while preserving behavioral annotation quality and find that crowd annotations can approximate clinician-provided autism impression from home videos in a privacy-preserved manner.
Abstract: Artificial Intelligence (A.I.) solutions are increasingly considered for telemedicine. For these methods to adapt to the field of behavioral pediatrics, serving children and their families in home settings, it will be crucial to ensure the privacy of the child and parent subjects in the videos. To address this challenge in A.I. for healthcare, we explore the potential for global image transformations to provide privacy while preserving behavioral annotation quality. Crowd workers have previously been shown to reliably annotate behavioral features in unstructured home videos, allowing machine learning classifiers to detect autism using the annotations as input. We evaluate this method with videos altered via pixelation, dense optical flow, and Gaussian blurring. On a balanced test set of 30 videos of children with autism and 30 neurotypical controls, we find that the visual privacy alterations do not drastically alter any individual behavioral annotation at the item level. The AUROC on the evaluation set was 90.0% +/- 7.5% for the unaltered condition, 85.0% +/- 9.0% for pixelation, 85.0% +/- 9.0% for optical flow, and 83.3% +/- 9.3% for blurring, demonstrating that an aggregation of small changes across multiple behavioral questions can collectively result in increased misdiagnosis rates. We also compare crowd answers against clinicians who provided the same annotations on the same videos and find that clinicians are more sensitive to autism-related symptoms. We also find that there is a linear correlation (r=0.75, p<0.0001) between the mean Clinical Global Impression (CGI) score provided by professional clinicians and the corresponding classifier score emitted by the logistic regression classifier with crowd inputs, indicating that the classifier's output probability is a reliable estimate of clinical impression of autism from home videos.
A significant correlation is maintained with privacy alterations, indicating that crowd annotations can approximate clinician-provided autism impression from home videos in a privacy-preserved manner.

5 citations


Posted ContentDOI
25 Jun 2021-medRxiv
TL;DR: In this article, a mobile health application was used to collect over 11 hours of video footage depicting 95 children engaged in gameplay in a natural home environment, and an LSTM neural network was trained to determine if gaze indicators could be predictive of ASD.
Abstract: Objective Autism spectrum disorder (ASD) is a widespread neurodevelopmental condition with a range of potential causes and symptoms. Children with ASD exhibit behavioral and social impairments, giving rise to the possibility of utilizing computational techniques to evaluate a child's social phenotype from home videos. Methods Here, we use a mobile health application to collect over 11 hours of video footage depicting 95 children engaged in gameplay in a natural home environment. We utilize automated dataset annotations to analyze two social indicators that have previously been shown to differ between children with ASD and their neurotypical (NT) peers: (1) gaze fixation patterns and (2) visual scanning methods. We compare the gaze fixation and visual scanning methods utilized by children during a 90-second gameplay video in order to identify statistically significant differences between the two cohorts; we then train an LSTM neural network in order to determine if gaze indicators could be predictive of ASD. Results Our work identifies one statistically significant region of fixation and one significant gaze transition pattern that differ between our two cohorts during gameplay. In addition, our deep learning model demonstrates mild predictive power in identifying ASD based on coarse annotations of gaze fixations. Discussion Ultimately, our results demonstrate the utility of game-based mobile health platforms in quantifying visual patterns and providing insights into ASD. We also show the importance of automated labeling techniques in generating large-scale datasets while simultaneously preserving the privacy of participants. Our approaches can generalize to other healthcare needs.

Journal ArticleDOI
TL;DR: In this article, the authors proposed sequence-based biomarkers (SBBs), an aggregation method that groups and aggregates microbes using single variants and combinations of variants within their 16S sequences.
Abstract: Sequencing partial 16S rRNA genes is a cost-effective method for quantifying the microbial composition of an environment, such as the human gut. However, downstream analysis relies on binning reads into microbial groups by either considering each unique sequence as a different microbe, querying a database to get taxonomic labels from sequences, or clustering similar sequences together. Yet these approaches do not fully capture evolutionary relationships between microbes, limiting the ability to identify differentially abundant groups of microbes between a diseased and control cohort. We present sequence-based biomarkers (SBBs), an aggregation method that groups and aggregates microbes using single variants and combinations of variants within their 16S sequences. We compare SBBs against other existing aggregation methods (OTU clustering and MicroPheno or DiTaxa features) in several benchmarking tasks: biomarker discovery via permutation test, biomarker discovery via linear discriminant analysis, and phenotype prediction power. We demonstrate that SBBs perform on par with or better than the state-of-the-art methods in biomarker discovery and phenotype prediction. On two independent datasets, SBBs identify differentially abundant groups of microbes with similar or higher statistical significance than existing methods in both a permutation-test-based analysis and using linear discriminant analysis effect size. By grouping microbes by SBB, we can identify several differentially abundant microbial groups (FDR < .1) between children with autism and neurotypical controls in a set of 115 discordant siblings. Porphyromonadaceae, Ruminococcaceae, and an unnamed species of Blastocystis were significantly enriched in autism, while Veillonellaceae was significantly depleted. Likewise, aggregating microbes by SBB on a dataset of obese and lean twins, we find several significantly differentially abundant microbial groups (FDR < .1).
We observed Megasphaera and Sutterellaceae highly enriched in obesity, and Phocaeicola significantly depleted. SBBs also perform on par with or better than existing aggregation methods as features in a phenotype prediction model, predicting the autism phenotype with an ROC-AUC score of .64 and the obesity phenotype with an ROC-AUC score of .84. SBBs provide a powerful method for aggregating microbes to perform differential abundance analysis as well as phenotype prediction. Our source code can be freely downloaded from http://github.com/briannachrisman/16s_biomarkers.
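A single-variant SBB can be thought of as grouping 16S sequences by the base they carry at one alignment position and summing abundances within each group. A toy sketch under that simplification (the sequences and abundances are invented for illustration; the paper's method also considers combinations of variants):

```python
from collections import defaultdict

def aggregate_by_variant(abundances, position):
    """Group aligned 16S sequences by the base at one alignment position
    and sum their relative abundances -- a simplified single-variant
    analogue of a sequence-based biomarker (SBB)."""
    groups = defaultdict(float)
    for seq, abundance in abundances.items():
        groups[seq[position]] += abundance
    return dict(groups)

# Toy aligned 16S fragments with relative abundances (illustrative only).
sample = {"ACGT": 0.4, "ACGA": 0.3, "ATGT": 0.2, "ATGA": 0.1}
by_pos1 = aggregate_by_variant(sample, 1)  # groups C-carriers vs T-carriers
```

Comparing such grouped abundances between cohorts is what the permutation and LDA analyses then operate on.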

Posted Content
TL;DR: In this article, a time-distributed convolutional neural network (CNN) was used to detect head banging in home videos: a single CNN extracts features from each frame in the input sequence, and these extracted features are fed as input to an LSTM network.
Abstract: Activity recognition computer vision algorithms can be used to detect the presence of autism-related behaviors, including what are termed "restricted and repetitive behaviors", or stimming, by diagnostic instruments. The limited data that exist in this domain are usually recorded with a handheld camera which can be shaky or even moving, posing a challenge for traditional feature representation approaches for activity detection which mistakenly capture the camera's motion as a feature. To address these issues, we first document the advantages and limitations of current feature representation techniques for activity recognition when applied to head banging detection. We then propose a feature representation consisting exclusively of head pose keypoints. We create a computer vision classifier for detecting head banging in home videos using a time-distributed convolutional neural network (CNN) in which a single CNN extracts features from each frame in the input sequence, and these extracted features are fed as input to a long short-term memory (LSTM) network. On the binary task of predicting head banging and no head banging within videos from the Self Stimulatory Behaviour Dataset (SSBD), we reach a mean F1-score of 90.77% using 3-fold cross validation (with individual fold F1-scores of 83.3%, 89.0%, and 100.0%) when ensuring that no child who appeared in the train set was in the test set for all folds. This work documents a successful technique for training a computer vision classifier which can detect human motion with few training examples and even when the camera recording the source clips is unstable. The general methods described here can be applied by designers and developers of interactive systems towards other human motion and pose classification problems used in mobile and ubiquitous interactive systems.

Posted Content
TL;DR: NetVec, as described in this paper, is a multi-level framework for scalable unsupervised hypergraph embedding that can be combined with any graph embedding algorithm to produce embeddings of hypergraphs with millions of nodes and hyperedges in a few minutes.
Abstract: Many problems such as vertex classification and link prediction in network data can be solved using graph embeddings, and a number of algorithms are known for constructing such embeddings. However, it is difficult to use graphs to capture non-binary relations such as communities of vertices. These kinds of complex relations are expressed more naturally as hypergraphs. While hypergraphs are a generalization of graphs, state-of-the-art graph embedding techniques are not adequate for solving prediction and classification tasks on large hypergraphs accurately in reasonable time. In this paper, we introduce NetVec, a novel multi-level framework for scalable unsupervised hypergraph embedding, that can be coupled with any graph embedding algorithm to produce embeddings of hypergraphs with millions of nodes and hyperedges in a few minutes.
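One standard way to hand a hypergraph to an ordinary graph-embedding algorithm, as NetVec-style pipelines require, is star expansion: each hyperedge becomes a fresh node linked to its members. A minimal sketch of that generic construction (not necessarily NetVec's internal representation):

```python
def star_expansion(hyperedges, n_vertices):
    """Convert a hypergraph into a bipartite graph by adding one new node
    per hyperedge and connecting it to that hyperedge's member vertices.
    The resulting edge list can be fed to any graph-embedding algorithm."""
    edges = []
    for i, members in enumerate(hyperedges):
        hub = n_vertices + i  # fresh node standing in for hyperedge i
        edges.extend((hub, v) for v in members)
    return edges

# Two hyperedges over vertices 0..3: {0, 1, 2} and {2, 3}.
edges = star_expansion([{0, 1, 2}, {2, 3}], n_vertices=4)
# Nodes 4 and 5 represent the two hyperedges in the expanded graph.
```

The expansion preserves which vertices co-occur in a hyperedge while keeping every edge binary.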

Journal ArticleDOI
TL;DR: In this paper, a graph-based methodology based on maximum flow is proposed to identify stable sets of variants associated with complex multigenic disorders, which can help pave the way towards biomarker-based diagnosis methods for complex genetic disorders.
Abstract: Machine learning approaches for predicting disease risk from high-dimensional whole genome sequence (WGS) data often result in unstable models that can be difficult to interpret, limiting the identification of putative sets of biomarkers. Here, we design and validate a graph-based methodology based on maximum flow, which leverages the presence of linkage disequilibrium (LD) to identify stable sets of variants associated with complex multigenic disorders. We apply our method to a previously published logistic regression model trained to identify variants in simple repeat sequences associated with autism spectrum disorder (ASD); this L1-regularized model exhibits high predictive accuracy yet demonstrates great variability in the features selected from over 230,000 possible variants. In order to improve model stability, we extract the variants assigned non-zero weights in each of 5 cross-validation folds and then assemble the five sets of features into a flow network subject to LD constraints. The maximum flow formulation allowed us to identify 55 variants, which we show to be more stable than the features identified by the original classifier. Our method allows for the creation of machine learning models that can identify predictive variants. Our results help pave the way towards biomarker-based diagnosis methods for complex genetic disorders.
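The paper casts feature stabilization as a maximum-flow problem over a network assembled from the five fold-specific feature sets under LD constraints. The routine below is a generic Edmonds-Karp maximum-flow solver run on a toy network, shown only to illustrate the underlying primitive; the paper's actual network construction (its sources, sinks, and LD edges) is not reproduced here:

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow over an adjacency-dict capacity graph."""
    # Build residual capacities, adding zero-capacity reverse edges.
    res = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in res[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Trace the path back and push the bottleneck capacity along it.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[a][b] for a, b in path)
        for a, b in path:
            res[a][b] -= bottleneck
            res[b][a] += bottleneck
        flow += bottleneck

# Toy network: two source-to-sink routes, each limited to 2 units of flow.
cap = {"s": {"a": 3, "b": 2}, "a": {"t": 2}, "b": {"t": 3}}
print(max_flow(cap, "s", "t"))  # 4
```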

Posted Content
TL;DR: In this paper, the authors used hand landmark detection over time as a feature representation which was then fed into a Long Short-Term Memory (LSTM) model for detecting whether videos from the Self Stimulatory Behaviour Dataset contain hand flapping or not.
Abstract: A formal autism diagnosis is an inefficient and lengthy process. Families often have to wait years before receiving a diagnosis for their child; some may not receive one at all due to this delay. One approach to this problem is to use digital technologies to detect the presence of behaviors related to autism, which in aggregate may lead to remote and automated diagnostics. One of the strongest indicators of autism is stimming, which is a set of repetitive, self-stimulatory behaviors such as hand flapping, headbanging, and spinning. Using computer vision to detect hand flapping is especially difficult due to the sparsity of public training data in this space and excessive shakiness and motion in such data. Our work demonstrates a novel method that overcomes these issues: we use hand landmark detection over time as a feature representation which is then fed into a Long Short-Term Memory (LSTM) model. We achieve a validation accuracy and F1 Score of about 72% on detecting whether videos from the Self-Stimulatory Behaviour Dataset (SSBD) contain hand flapping or not. Our best model also predicts accurately on external videos we recorded of ourselves outside of the dataset it was trained on. This model uses less than 26,000 parameters, providing promise for fast deployment into ubiquitous and wearable digital settings for a remote autism diagnosis.
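The sub-26,000-parameter figure is easy to sanity-check with the standard LSTM parameter formula. The sizes below are assumptions for illustration: 21 hand landmarks with (x, y, z) coordinates each, in the style of common hand-tracking toolkits, feeding a single 32-unit LSTM with a one-unit dense head; the paper's actual layer sizes may differ:

```python
def lstm_params(input_dim, hidden):
    # Each of the 4 gates has input weights, recurrent weights, and a bias.
    return 4 * (input_dim * hidden + hidden * hidden + hidden)

input_dim = 21 * 3   # 21 landmarks x (x, y, z) per frame (assumed)
hidden = 32
total = lstm_params(input_dim, hidden) + (hidden * 1 + 1)  # + dense head
print(total)  # 12321, comfortably under a ~26,000-parameter budget
```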

Posted Content
TL;DR: In this paper, the authors optimized and profiled various machine learning models designed for inference on edge devices and were able to match previous state-of-the-art results for emotion recognition on children.
Abstract: Implementing automated emotion recognition on mobile devices could provide an accessible diagnostic and therapeutic tool for those who struggle to recognize emotion, including children with developmental behavioral conditions such as autism. Although recent advances have been made in building more accurate emotion classifiers, existing models are too computationally expensive to be deployed on mobile devices. In this study, we optimized and profiled various machine learning models designed for inference on edge devices and were able to match previous state-of-the-art results for emotion recognition in children. Our best model, a MobileNet-V2 network pre-trained on ImageNet, achieved 65.11% balanced accuracy and 64.19% F1-score on CAFE, while achieving a 45-millisecond inference latency on a Motorola Moto G6 phone. This balanced accuracy is only 1.79% less than the current state of the art for CAFE, which used a model that contains 26.62x more parameters and was unable to run on the Moto G6, even when fully optimized. This work validates that with specialized design and optimization techniques, machine learning models can become lightweight enough for deployment on mobile devices and still achieve high accuracies on difficult image classification tasks.
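One standard technique for shrinking a network for on-device inference is post-training quantization of float weights to int8. The snippet below shows the symmetric per-tensor variant in pure Python as an illustration of the idea; it is not the exact optimization pipeline used in the study:

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> int8 values + one scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    return [max(-128, min(127, round(w / scale))) for w in weights], scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.5, -1.27, 0.01, 1.0]
q, s = quantize_int8(w)       # 1 byte per weight instead of 4
restored = dequantize(q, s)   # close to the originals, within scale / 2
```

Quartering the weight memory this way, often with fused int8 kernels at inference time, is a large part of how architectures like MobileNet-V2 reach mobile-friendly latencies.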

Posted Content
TL;DR: In this article, the authors used crowdsourcing to acquire reliable soft-target labels and evaluated an emotion detection classifier trained with these labels on the Child Affective Facial Expression (CAFE) dataset, a gold standard collection of images depicting pediatric facial expressions along with 100 human labels per image.
Abstract: Emotion classifiers traditionally predict discrete emotions. However, emotion expressions are often subjective, thus requiring a method to handle subjective labels. We explore the use of crowdsourcing to acquire reliable soft-target labels and evaluate an emotion detection classifier trained with these labels. We center our study on the Child Affective Facial Expression (CAFE) dataset, a gold standard collection of images depicting pediatric facial expressions along with 100 human labels per image. To test the feasibility of crowdsourcing to generate these labels, we used Microworkers to acquire labels for 207 CAFE images. We evaluate both unfiltered workers as well as workers selected through a short crowd filtration process. We then train two versions of a classifier on CAFE labels using the original 100 annotations provided with the dataset: (1) a classifier trained with traditional one-hot encoded labels, and (2) a classifier trained with vector labels representing the distribution of CAFE annotator responses. We compare the resulting softmax output distributions of the two classifiers with a 2-sample independent t-test of L1 distances between the classifier's output probability distribution and the distribution of human labels. While agreement with CAFE is weak for unfiltered crowd workers, the filtered crowd agree with the CAFE labels 100% of the time for many emotions. While the F1-score for a one-hot encoded classifier is much higher (94.33% vs. 78.68%) with respect to the ground truth CAFE labels, the output probability vector of the crowd-trained classifier more closely resembles the distribution of human labels (t=3.2827, p=0.0014). We therefore recommend reporting an emotion probability distribution that accounts for the subjectivity of human interpretation. Crowdsourcing, including a sufficient filtering mechanism, is a feasible solution for acquiring soft-target labels.
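The two label encodings compared in the study can be sketched as follows: a one-hot label keeps only the plurality emotion, while a soft target keeps the full distribution of annotator votes. The vote counts and emotion names below are made up for illustration; the study uses the 100 CAFE annotations per image:

```python
def soft_target(votes):
    """Normalize raw annotator counts into a probability distribution."""
    total = sum(votes.values())
    return {emotion: n / total for emotion, n in votes.items()}

def one_hot(votes):
    """Keep only the plurality emotion, as in traditional hard labels."""
    top = max(votes, key=votes.get)
    return {emotion: float(emotion == top) for emotion in votes}

def l1_distance(p, q):
    return sum(abs(p[e] - q[e]) for e in p)

votes = {"happy": 60, "surprise": 30, "neutral": 10}  # 100 annotators
soft, hard = soft_target(votes), one_hot(votes)
# The soft target matches the human label distribution exactly, while
# the one-hot label sits at L1 distance 0.8 from it.
print(l1_distance(hard, soft))
```

A classifier trained against the soft target is penalized for deviating from the whole distribution, which is why its output probabilities track human disagreement more closely even when its top-1 F1-score is lower.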