An open-source toolkit for mining Wikipedia
Citations
702 citations
343 citations
Cites background from "An open-source toolkit for mining W..."
...Scientists have begun to develop Web services with interfaces to collectors of Big Data sets, e.g., Milne and Witten (2009) for Wikipedia at http://wikipedia-miner.cms.waikato.ac.nz/ and Reips and Garaizar (2011) for Twitter at http://tweetminer.eu....
185 citations
Additional excerpts
...Most of the research is focused on Wikipedia [11], which is understandable considering the availability of its data sets, in particular the whole edit history [27] and the availability of tools for working with Wikipedia [22]....
180 citations
Cites methods from "An open-source toolkit for mining W..."
...[127,128] R: open-source programming language and software environment designed for data mining/analysis and visualization....
176 citations
Cites background or methods from "An open-source toolkit for mining W..."
...We argue that the popularity enjoyed by this line of research is a consequence of the fact that (i) it provides a viable solution to some of AI’s long-lasting problems, crucially including the quest for knowledge [179]; (ii) it has wide applicability spanning many different sub-areas of AI – as shown by the papers found in this special issue, which range from computational neuroscience [154] to information retrieval [82,102], through works in knowledge acquisition [73,130,192] and a variety of NLP applications such as Named Entity Recognition [148], Named Entity disambiguation [67] and computing semantic relatedness [122,216]....
...The Wikipedia Miner toolkit from Milne and Witten [122] makes the supervised wikification system originally presented in [121] freely available, while Tonelli et al. [192] present instead the Wiki Machine, a high-performance wikification system which is shown to outperform Wikipedia Miner thanks to a state-of-the-art kernel-based WSD algorithm [62]....
...Milne and Witten [120] compared a tf*idf-like measure computed on Wikipedia links with a more refined link co-occurrence measure modeled after the Normalized Google Distance [37]....
...Finally, the last two papers present tools for working with semi-structured resources like Wikipedia [122], and its use to acquire computational semantic models of mental representations of concepts [154]....
References
20,309 citations
"An open-source toolkit for mining W..." refers background in this paper
...The performance of this file-based database is a bottleneck for many applications....
20,196 citations
"An open-source toolkit for mining W..." refers methods in this paper
...It also provides a platform for sharing mining techniques, and for taking advantage of powerful technologies like the distributed computing framework Hadoop [27] and the Weka machine learning workbench [28]....
17,663 citations
9,995 citations
5,429 citations