
Rotem Dror

Researcher at Technion – Israel Institute of Technology

Publications: 18
Citations: 581

Rotem Dror is an academic researcher from the Technion – Israel Institute of Technology. The author has contributed to research in topics: computer science and statistical hypothesis testing. The author has an h-index of 5 and has co-authored 13 publications receiving 330 citations. Previous affiliations of Rotem Dror include the University of Pennsylvania and IBM.

Papers
Proceedings ArticleDOI

The Hitchhiker’s Guide to Testing Statistical Significance in Natural Language Processing

TL;DR: This opinion/theoretical paper proposes a simple practical protocol for selecting a statistical significance test in NLP setups, and accompanies this protocol with a brief survey of the most relevant tests.
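One of the standard significance tests such a survey covers is the paired bootstrap, which compares two systems evaluated on the same test instances. The sketch below is illustrative and not taken from the paper; the function name and the per-instance 0/1 score representation are assumptions.

```python
import random

def paired_bootstrap_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Paired bootstrap significance test (sketch).

    scores_a / scores_b: per-instance scores (e.g. 1 = correct, 0 = wrong)
    for two systems on the SAME test set. Returns an approximate one-sided
    p-value: the fraction of bootstrap resamples in which system A's
    advantage over system B disappears or reverses.
    """
    assert len(scores_a) == len(scores_b)
    rng = random.Random(seed)
    n = len(scores_a)
    count = 0
    for _ in range(n_resamples):
        # Resample test instances with replacement, keeping the pairing.
        idx = [rng.randrange(n) for _ in range(n)]
        delta = sum(scores_a[i] - scores_b[i] for i in idx) / n
        if delta <= 0:
            count += 1
    return count / n_resamples
```

If the returned value is below the chosen significance level (e.g. 0.05), the difference between the two systems is unlikely to be an artifact of the particular test-set sample.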
Proceedings ArticleDOI

Deep Dominance - How to Properly Compare Deep Neural Models

TL;DR: The criteria for a high-quality comparison method between DNNs are defined, and it is shown that the proposed test meets all of these criteria while previously proposed methods fail to do so.
Journal ArticleDOI

Replicability Analysis for Natural Language Processing: Testing Significance with Multiple Datasets

TL;DR: This paper proposes a Replicability Analysis framework for a statistically sound analysis of multiple comparisons between algorithms for NLP tasks, and demonstrates its empirical value across four applications: multi-domain dependency parsing, multilingual POS tagging, cross-domain sentiment classification and word similarity prediction.
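When one algorithm is compared against another across several datasets, each comparison yields its own p-value, and the family-wise error rate must be controlled. A classical building block for this kind of multiple-comparison analysis is the Holm–Bonferroni step-down procedure, sketched below; this is an illustrative baseline, not the paper's specific partial-conjunction framework, and the function name is an assumption.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm-Bonferroni step-down correction (sketch).

    p_values: one p-value per dataset/comparison.
    Returns a list of booleans, one per input p-value, indicating
    whether the corresponding null hypothesis is rejected while
    controlling the family-wise error rate at level alpha.
    """
    m = len(p_values)
    # Test p-values from smallest to largest against shrinking thresholds
    # alpha/m, alpha/(m-1), ..., alpha/1.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject
```

For example, with p-values 0.01, 0.04, 0.03, and 0.005 across four datasets, only the comparisons with p = 0.005 and p = 0.01 survive the correction at alpha = 0.05.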
Book

Statistical Significance Testing for Natural Language Processing

TL;DR: Data-driven experimental analysis has become the main evaluation tool of Natural Language Processing (NLP) algorithms; in the last decade, it has become rare to see an NLP paper that does not include such an analysis.