Open Access Proceedings Article

Leveraging Quality Prediction Models for Automatic Writing Feedback.

TLDR
This paper proposes a perturbation-based explanation method for tree ensembles that identifies the writing features that, if changed, would most improve text quality.
Abstract
User-generated, multi-paragraph writing is pervasive on many social media platforms (e.g., Amazon reviews and Airbnb host profiles), and ensuring that this content is of high quality is important. Unfortunately, user-submitted content is often of low quality, and the characteristics that constitute high quality can vary between domains in ways that users are unaware of. Automated writing feedback can point out problems and suggest improvements immediately, during the writing process. Most approaches, however, focus on syntax and phrasing, which is only one characteristic of high-quality content. Existing research has developed accurate quality prediction models. We propose combining these models with model explanation techniques to identify the writing features that, if changed, would most improve text quality. To this end, we develop a perturbation-based explanation method for a popular class of models called tree ensembles. Furthermore, we use a weak-supervision technique to adapt this method to generate feedback for specific text segments in addition to feedback for the entire document. Our user study finds that the perturbation-based approach, when combined with segment-specific feedback, can help improve writing quality on Amazon (review helpfulness) and Airbnb (host profile trustworthiness) by more than 14% (a 3x improvement over recent automated feedback techniques).
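
The abstract does not spell out the perturbation procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a toy tree ensemble written as nested tuples, perturbs one feature at a time to values on either side of each split threshold, and ranks features by the best resulting gain in predicted quality. The names `predict`, `rank_feedback`, `step`, and the features `word_count` and `num_topics` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple, Union

# Hypothetical toy representation (an assumption, not the paper's):
# a tree is a leaf score (float) or a split tuple
# (feature_name, threshold, subtree_if_value_<=_threshold, subtree_otherwise).
Tree = Union[float, tuple]
Features = Dict[str, float]


def predict_tree(tree: Tree, x: Features) -> float:
    """Walk one tree from the root down to a leaf score."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree


def predict(ensemble: List[Tree], x: Features) -> float:
    """Ensemble prediction: the mean of the per-tree scores."""
    return sum(predict_tree(t, x) for t in ensemble) / len(ensemble)


def collect_thresholds(tree: Tree, out: Dict[str, Set[float]]) -> None:
    """Record every split threshold used for each feature."""
    if not isinstance(tree, tuple):
        return
    feature, threshold, left, right = tree
    out.setdefault(feature, set()).add(threshold)
    collect_thresholds(left, out)
    collect_thresholds(right, out)


def rank_feedback(
    ensemble: List[Tree], x: Features, step: float = 1.0
) -> List[Tuple[str, float, float]]:
    """Return (feature, suggested_value, gain) sorted by gain, largest first.

    Each feature is perturbed on its own; candidate values sit at each
    threshold and `step` above it, so both sides of every split are tried.
    """
    base = predict(ensemble, x)
    cuts: Dict[str, Set[float]] = {}
    for tree in ensemble:
        collect_thresholds(tree, cuts)

    results = []
    for feature, thresholds in cuts.items():
        best_value, best_gain = x[feature], 0.0
        for threshold in sorted(thresholds):
            for candidate in (threshold, threshold + step):
                gain = predict(ensemble, {**x, feature: candidate}) - base
                if gain > best_gain:
                    best_value, best_gain = candidate, gain
        results.append((feature, best_value, best_gain))
    return sorted(results, key=lambda r: r[2], reverse=True)


# Toy usage with made-up trees and a made-up document.
ensemble = [
    ("word_count", 50.0, 0.2, ("num_topics", 2.0, 0.5, 0.9)),
    ("num_topics", 2.0, 0.3, 0.7),
]
document = {"word_count": 30.0, "num_topics": 1.0}
ranked = rank_feedback(ensemble, document)
for feature, value, gain in ranked:
    print(f"{feature}: try {value} (predicted gain {gain:.2f})")
```

In this toy case the top-ranked suggestion is to raise `num_topics`, because that single change yields the largest increase in the ensemble's predicted score. A real system would also need to restrict perturbations to changes a writer can plausibly make, which this sketch ignores.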

