N-gram-based text categorization
Citations
20,196 citations
7,539 citations
Cites background from "N-gram-based text categorization"
...2000; Schapire and Singer 2000], multimedia document categorization through the analysis of textual captions [Sable and Hatzivassiloglou 2000], author identification for literary texts of unknown or disputed authorship [Forsyth 1999], language identification for texts of unknown language [Cavnar and Trenkle 1994], automated identification of text genre [Kessler et al....
[...]
...2000], author identification for literary texts of unknown or disputed authorship [Forsyth 1999], language identification for texts of unknown language [Cavnar and Trenkle 1994], automated identification of text genre [Kessler et al. 1997], and automated essay grading [Larkey 1998]....
[...]
1,057 citations
Additional excerpts
..., by using n-grams (Cavnar and Trenkle 1994)....
[...]
703 citations
Cites background from "N-gram-based text categorization"
...First, we drop non-English-language blogs (Cavnar and Trenkle 1994), as well as spam blogs (with a technology we do not share publicly; for another, see Kolari, Finin, and Joshi 2006)....
[...]
679 citations
Cites background or methods from "N-gram-based text categorization"
...Therefore, recall in this setting is measured relative to the set of candidate pairs that was generated....
[...]
...The simplest possibility is to separate the pages on a site into the two languages of interest using automatic language identification (Ingle 1976; Beesley 1988; Cavnar and Trenkle 1994; Dunning 1994), throwing away any pages that are not in either language, and then generate the cross product....
[...]
References
1,944 citations
566 citations
237 citations
"N-gram-based text categorization" refers to background in this paper
...N-gram-based matching has had some success in dealing with noisy ASCII input in other problem domains, such as in interpreting postal addresses ([1] and [2]), in text retrieval ([3] and [4]), and in a wide variety of other natural language processing applications[5]....
[...]
46 citations
37 citations