Understanding and explaining Delta measures for authorship attribution
Citations
96 citations
39 citations
22 citations
19 citations
17 citations
Cites methods from "Understanding and explaining Delta ..."
...$\sum_{i=1}^{m} \max(rtf_{iA}, rtf_{iB})$ (1) With the Burrows’ Delta model, the relative term frequency $rtf_{iA}$ of each term $t_i$ in Text A is computed, as well as the mean ($mean_i$) and standard deviation ($s_i$) of that term over all texts belonging to the corpus....
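The computation described in this excerpt can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the function names, the toy corpus layout (a dict of token lists), the use of the population standard deviation, and the averaging over terms with non-zero deviation are all assumptions made here. Each text is assumed to be non-empty.

```python
from collections import Counter
from statistics import mean, pstdev


def relative_frequencies(tokens, vocabulary):
    """Relative term frequency of each vocabulary term in one text."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[term] / total for term in vocabulary]


def burrows_delta(corpus, name_a, name_b, m=3):
    """Mean absolute difference of z-scores over the m most frequent terms.

    `corpus` maps a text name to its list of tokens (hypothetical layout).
    """
    # Select the m most frequent word tokens over the whole corpus.
    overall = Counter(tok for tokens in corpus.values() for tok in tokens)
    vocabulary = [term for term, _ in overall.most_common(m)]

    rtf = {name: relative_frequencies(tokens, vocabulary)
           for name, tokens in corpus.items()}

    total, used = 0.0, 0
    for i in range(len(vocabulary)):
        column = [rtf[name][i] for name in corpus]
        mu = mean(column)       # mean of term i over all texts
        sd = pstdev(column)     # standard deviation (population form assumed)
        if sd == 0:
            continue            # term carries no information; skip it
        z_a = (rtf[name_a][i] - mu) / sd
        z_b = (rtf[name_b][i] - mu) / sd
        total += abs(z_a - z_b)
        used += 1
    return total / used if used else 0.0


corpus = {
    "A": "the cat and the dog and the bird".split(),
    "B": "the the the cat sat".split(),
    "C": "and and the dog".split(),
}
print(burrows_delta(corpus, "A", "B"))
```

Smaller values indicate more similar frequency profiles; a text compared with itself yields 0.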
[...]
...This article proposes to revisit this authorship attribution problem by considering two effective methods (Burrows’ Delta, Labbé’s intertextual distance)....
[...]
...In this article, two computer-based authorship methods (Burrows’ Delta, Burrows, 2002), and intertextual distance (Labbé, 2014), have been applied....
[...]
...As well-known strategies, one can mention Burrows’ Delta (Burrows, 2002; Evert et al., 2017) using the top m most frequent word-tokens (with m = 40 to 1,000), the Kullback–Leibler divergence (Zhao & Zobel, 2007) using a predefined set of 363 English words, or Labbé’s method (Labbé, 2014) based on the entire vocabulary and opting for a variant of the Tanimoto distance, an approach found effective for Authorship Attribution (AA; Kocher & Savoy, 2017b)....
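For orientation, one common form of a Tanimoto-style distance over relative term frequencies divides the summed absolute differences by the summed element-wise maxima. The excerpt does not spell out the exact variant used, so the formula and the function name below are assumptions for illustration only.

```python
def tanimoto_distance(rtf_a, rtf_b):
    """Tanimoto-style distance between two relative-frequency vectors.

    Assumed form: sum(|a_i - b_i|) / sum(max(a_i, b_i)).
    Both vectors must be aligned on the same vocabulary.
    """
    numerator = sum(abs(a - b) for a, b in zip(rtf_a, rtf_b))
    denominator = sum(max(a, b) for a, b in zip(rtf_a, rtf_b))
    # Two all-zero vectors are treated as identical.
    return numerator / denominator if denominator else 0.0


print(tanimoto_distance([0.5, 0.5], [0.25, 0.75]))
```

With non-negative frequencies the value lies between 0 (identical profiles) and 1 (no shared terms).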
[...]
References
14,483 citations
[...]
9,857 citations
6,615 citations
"Understanding and explaining Delta ..." refers methods in this paper
...For example, bootstrapping approaches (Efron 1979) cannot easily be applied because the clustering quality is not based on individual measurements for the texts in the sample but rather on the sample as a whole; permutation tests (Hunter & McCoy 2004) can only be used to show that a clustering is significantly better than chance, which is entirely obvious given the excellent ARI in our experiments; and calculating p-values for clusters (Suzuki & Shimodaira 2006) assumes that features are independent and identically distributed, which is clearly not the case for language data due to Zipf’s law....
[...]
2,155 citations
"Understanding and explaining Delta ..." refers methods in this paper
...…better than chance, which is entirely obvious given the excellent ARI in our experiments; and calculating p-values for clusters (Suzuki & Shimodaira 2006) assumes that features are independent and identically distributed, which is clearly not the case for language data due to…...
[...]
1,186 citations