Open Access · Proceedings Article
Leveraging Quality Prediction Models for Automatic Writing Feedback.
Hamed Nilforoshan, Eugene Wu +1 more
- pp 211-220
TLDR
In this paper, a perturbation-based explanation method for tree-ensembles is proposed to identify writing features that, if changed, will most improve the text quality.
Abstract:
User-generated, multi-paragraph writing is pervasive and important in many social media platforms (e.g., Amazon reviews, Airbnb host profiles). Ensuring high-quality content is important; unfortunately, content submitted by users is often not of high quality. Moreover, the characteristics that constitute high quality may vary between domains in ways that users are unaware of. Automated writing feedback has the potential to point out problems and suggest improvements immediately, during the writing process. Most approaches, however, focus on syntax/phrasing, which is only one characteristic of high-quality content.
Existing research develops accurate quality prediction models. We propose combining these models with model explanation techniques to identify writing features that, if changed, will most improve the text quality. To this end, we develop a perturbation-based explanation method for a popular class of models called tree-ensembles. Furthermore, we use a weak-supervision technique to adapt this method to generate feedback for specific text segments in addition to feedback for the entire document. Our user study finds that the perturbation-based approach, when combined with segment-specific feedback, can help improve writing quality on Amazon (review helpfulness) and Airbnb (host profile trustworthiness) by >14% (a 3X improvement over recent automated feedback techniques).
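As a rough illustration of the perturbation idea described above (not the authors' implementation: the features, the synthetic quality scores, and the gradient-boosted model below are all hypothetical stand-ins), one can train a tree-ensemble quality model and then search for the single-feature change that most raises the predicted quality:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical writing features, e.g. [avg_sentence_len, num_examples, sentiment, word_count],
# with a synthetic "quality" score for training purposes only.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 0.5 * X[:, 1] - 0.3 * X[:, 0] + rng.normal(scale=0.1, size=500)

model = GradientBoostingRegressor().fit(X, y)  # tree-ensemble quality model

def feedback(doc_features, deltas=(-1.0, 1.0)):
    """Return (feature index, delta, gain) for the perturbation that most
    improves the predicted quality of this document, or (None, None, 0.0)
    if no perturbation helps."""
    base = model.predict([doc_features])[0]
    best = (None, None, 0.0)
    for i in range(len(doc_features)):
        for d in deltas:
            perturbed = np.array(doc_features, dtype=float)
            perturbed[i] += d
            gain = model.predict([perturbed])[0] - base
            if gain > best[2]:
                best = (i, d, gain)
    return best

idx, delta, gain = feedback(np.zeros(4))
```

In the paper's setting the features would be interpretable writing attributes extracted from the document, and the winning perturbation would be translated back into human-readable feedback (e.g. "add more concrete examples").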
Citations
Proceedings Article · DOI
Complaint-driven Training Data Debugging for Query 2.0
TL;DR: This work proposes Rain, a complaint-driven training data debugging system that allows users to specify complaints over the query's intermediate or final output, and aims to return a minimum set of training examples so that if they were removed, the complaints would be resolved.
Proceedings Article · DOI
A Study of Incorrect Paraphrases in Crowdsourced User Utterances
TL;DR: This work investigates common crowdsourced paraphrasing issues, and proposes an annotated dataset called Para-Quality, for detecting the quality issues and investigates existing tools and services to provide baselines for detecting each category of issues.
Posted Content
SliceNDice: Mining Suspicious Multi-attribute Entity Groups with Multi-view Graphs
Hamed Nilforoshan, Neil Shah +1 more
TL;DR: In this article, a multi-view graph mining problem is formulated to find groups of entities which share too many properties with one another across multiple attributes (sybil accounts created at the same time and location, propaganda spreaders broadcasting articles with the same rhetoric and with similar reshares, etc.).
Posted Content
Generative Grading: Near Human-level Accuracy for Automated Feedback on Richly Structured Problems
Ali Ahmad Malik, Mike Wu, Vrinda Vasavada, Jinpeng Song, Madison Coots, John C. Mitchell, Noah D. Goodman, Chris Piech +7 more
TL;DR: Generative grading uses generative descriptions of student cognition, written as probabilistic programs, to synthesise millions of labelled example solutions to a problem, then learns to infer feedback for real student solutions based on this cognitive model.
Journal Article · DOI
Zero-shot causal learning
Hamed Nilforoshan, Michael Moor, Yusuf H. Roohani, Yining Chen, Anja Šurina, Michihiro Yasunaga, Sara Oblak, Jure Leskovec +7 more
TL;DR: This paper proposes CaML, a causal meta-learning framework which formulates the personalized prediction of each intervention's effect as a task and trains a single meta-model across thousands of tasks, each constructed by sampling an intervention along with its recipients and nonrecipients.
References
Journal Article · DOI
Support-Vector Networks
Corinna Cortes, Vladimir Vapnik +1 more
TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated, and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Proceedings Article
Latent Dirichlet Allocation
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
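Topic models such as LDA are commonly used in this line of work to extract topic features from review text. A minimal sketch using scikit-learn's implementation (the toy documents are invented for illustration; the original paper uses variational inference over a larger corpus):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus mixing two rough themes (product reviews vs. host profiles).
docs = [
    "great battery life and great screen",
    "battery died fast poor battery",
    "lovely host clean apartment",
    "apartment was clean and the host friendly",
]
counts = CountVectorizer().fit_transform(docs)  # bag-of-words counts
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
doc_topics = lda.transform(counts)  # per-document topic mixtures; each row sums to 1
```

The resulting per-document topic proportions can then serve as input features to a downstream quality prediction model.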
Journal Article · DOI
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
TL;DR: The Elements of Statistical Learning is a widely used textbook on statistical learning, covering methods for data mining, inference, and prediction.
Proceedings Article
VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text
Clayton J. Hutto, Eric Gilbert +1 more
TL;DR: When the authors' parsimonious rule-based model is used to assess the sentiment of tweets, VADER is found to outperform individual human raters and to generalize more favorably across contexts than any of their benchmarks.
Proceedings Article
Mining opinion features in customer reviews
Minqing Hu, Bing Liu +1 more
TL;DR: This project aims to summarize all the customer reviews of a product by mining the opinion/product features that reviewers have commented on; a number of techniques are presented to mine such features.