
Showing papers by "Nello Cristianini" published in 2011


Proceedings ArticleDOI
21 Aug 2011
TL;DR: A method is developed to identify how a piece of text evolves as it moves through an underlying network, and how substring information can be used to narrow down where in the evolutionary process a particular observation at a node lies.
Abstract: Inferring causal networks behind observed data is an active area of research with wide applicability to areas such as epidemiology, microbiology and social science. In particular, recent research has focused on identifying how information propagates through the Internet. This research has so far only used temporal features of observations, and while reasonable results have been achieved, there is often further information which can be used. In this paper we show that additional features of the observed data can be used very effectively to improve an existing method. Our particular example is one of inferring an underlying network for how text is reused on the Internet, although the general approach is applicable to other inference methods and information sources. We develop a method to identify how a piece of text evolves as it moves through an underlying network and how substring information can be used to narrow down where in the evolutionary process a particular observation at a node lies. Hence we narrow down the number of ways the node could have acquired the infection. Text reuse is detected using a suffix tree, which is also used to identify the substring relations between chunks of reused text. We then use a modification of the NetCover method to infer the underlying network. Experimental results on both synthetic and real-life data show that using more information than just timing leads to greater accuracy in the inferred networks.

43 citations
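As a rough, hypothetical illustration of the substring idea in the abstract above (the paper itself builds a suffix tree over the whole corpus and uses a modified NetCover algorithm, neither of which is reproduced here), the sketch below finds a shared chunk between two documents with plain dynamic programming and then uses substring containment between observed chunks to rule out one direction of copying, under the assumption that reused text only shrinks or mutates as it propagates.

```python
# Minimal sketch: find a reused chunk shared by two documents and use
# substring containment between observed chunks to constrain who could
# have copied from whom. The paper uses a suffix tree for scalability;
# dynamic programming is used here only to keep the example self-contained.

def longest_common_substring(a: str, b: str) -> str:
    """Longest contiguous chunk shared by a and b (O(len(a) * len(b)))."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]

def v_could_descend_from_u(chunk_u: str, chunk_v: str) -> bool:
    """Containment heuristic (an assumption for this sketch): if the chunk
    observed at node v is a substring of the chunk at node u, then v could
    have acquired its text downstream of u, but not vice versa."""
    return chunk_v in chunk_u

if __name__ == "__main__":
    doc_u = "The minister announced a full review of the policy on Tuesday."
    doc_v = "Reports say a full review of the policy is expected."
    print(repr(longest_common_substring(doc_u, doc_v)))   # shared chunk

    chunk_u = "announced a full review of the policy on Tuesday"
    chunk_v = "a full review of the policy"
    print(v_could_descend_from_u(chunk_u, chunk_v))  # True: v's chunk sits inside u's
    print(v_could_descend_from_u(chunk_v, chunk_u))  # False: the reverse is ruled out
```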



21 Oct 2011
TL;DR: A working system for large-scale quantitative narrative analysis of news corpora is presented, which includes various recent ideas from text mining and pattern analysis in order to solve a problem arising in the computational social sciences.
Abstract: We present a working system for large-scale quantitative narrative analysis (QNA) of news corpora, which includes various recent ideas from text mining and pattern analysis in order to solve a problem arising in the computational social sciences. The task is that of identifying the key actors in a body of news, and the actions they perform, so that further analysis can be carried out. This step is normally performed by hand and is very labour-intensive. We then characterise the actors by: studying their position in the overall network of actors and actions; studying the time series associated with some of their properties; generating scatter plots describing the subject/object bias of each actor; and investigating the types of actions each actor is most associated with. The system is demonstrated on a set of 100,000 articles about crime that appeared in the New York Times between 1987 and 2007. As an example, we find that Men were most commonly responsible for crimes against the person, while Women and Children were most often victims of those crimes.

17 citations
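The actor/action extraction step described in the abstract above is the labour-intensive part being automated. As a rough illustration only (the QNA system itself is considerably richer and is not reproduced here), the sketch below pulls subject-verb-object triples out of a sentence with spaCy; the model name en_core_web_sm and the example sentence are assumptions for this sketch, not taken from the paper.

```python
# Rough sketch of extracting (actor, action, object) triples from news text,
# the kind of preprocessing the QNA system above automates.
# Requires spaCy with the en_core_web_sm model installed.
import spacy

nlp = spacy.load("en_core_web_sm")

def svo_triples(text):
    """Yield (subject, verb, object) triples found in the dependency parse."""
    doc = nlp(text)
    for token in doc:
        if token.pos_ != "VERB":
            continue
        subjects = [t for t in token.children if t.dep_ in ("nsubj", "nsubjpass")]
        objects = [t for t in token.children if t.dep_ in ("dobj", "obj")]
        for s in subjects:
            for o in objects:
                yield (s.text, token.lemma_, o.text)

if __name__ == "__main__":
    sentence = "The man robbed the bank, and the police arrested him."
    for triple in svo_triples(sentence):
        print(triple)   # e.g. ('man', 'rob', 'bank'), ('police', 'arrest', 'him')
```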


Book ChapterDOI
27 Jun 2011
TL;DR: The results presented in this survey show how the automatic analysis of millions of documents in dozens of different languages can detect non-trivial macro-patterns that could not be observed at a smaller scale, and how the social sciences can benefit from closer interaction with the pattern analysis, artificial intelligence and text mining research communities.
Abstract: The strong trend towards the automation of many aspects of scientific enquiry and scholarship has started to affect also the social sciences and even the humanities. Several recent articles have demonstrated the application of pattern analysis techniques to the discovery of non-trivial relations in various datasets that have relevance for social and human sciences, and some have even heralded the advent of "Computational Social Sciences" and "Culturomics". In this review article I survey the results obtained over the past 5 years at the Intelligent Systems Laboratory in Bristol, in the area of automating the analysis of news media content. This endeavor, which we approach by combining pattern recognition, data mining and language technologies, is traditionally a part of the social sciences, and is normally performed by human researchers on small sets of data. The analysis of news content is of crucial importance due to the central role that the global news system plays in shaping public opinion, markets and culture. It is today possible to access freely online a large part of global news, and to devise automated methods for large-scale constant monitoring of patterns in content. The results presented in this survey show how the automatic analysis of millions of documents in dozens of different languages can detect non-trivial macro-patterns that could not be observed at a smaller scale, and how the social sciences can benefit from closer interaction with the pattern analysis, artificial intelligence and text mining research communities.

5 citations


Posted Content
TL;DR: A generic model for multiplicative algorithms suitable for the MapReduce parallel programming paradigm is introduced, and three typical machine learning algorithms are implemented to demonstrate how similarity comparison, gradient descent, the power method and other classic learning techniques fit this model well.
Abstract: In this paper we introduce a generic model for multiplicative algorithms which is suitable for the MapReduce parallel programming paradigm. We implement three typical machine learning algorithms to demonstrate how similarity comparison, gradient descent, the power method and other classic learning techniques fit this model well. Two versions of large-scale matrix multiplication are discussed in this paper, and different methods are developed for both cases with regard to their unique computational characteristics and problem settings. In contrast to earlier research, we focus on fundamental linear algebra techniques that establish a generic approach for a range of algorithms, rather than specific ways of scaling up algorithms one at a time. Experiments show promising results when evaluated on both speedup and accuracy. Compared with a standard implementation with computational complexity $O(m^3)$ in the worst case, the large-scale matrix multiplication experiments show that our design is considerably more efficient and maintains a good speedup as the number of cores increases. Algorithm-specific experiments also produce encouraging results on runtime performance.

4 citations
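To make the kind of computation discussed above concrete, here is a toy, single-process simulation of the classic two-stage MapReduce pattern for sparse matrix multiplication (map on the shared inner index k, join A and B entries in the first reduce, sum partial products in the second). It is a generic textbook formulation written for this summary, not one of the specific designs evaluated in the paper.

```python
# Toy simulation of two-stage MapReduce matrix multiplication C = A @ B.
# A is given as {(i, k): value}, B as {(k, j): value}; a real job would
# distribute the map and reduce phases across a cluster.
from collections import defaultdict

def mapreduce_matmul(A, B):
    # Stage 1 map: key every entry of A and B by the shared inner index k.
    by_k = defaultdict(lambda: ([], []))
    for (i, k), a in A.items():
        by_k[k][0].append((i, a))
    for (k, j), b in B.items():
        by_k[k][1].append((j, b))

    # Stage 1 reduce: join on k and emit partial products keyed by (i, j).
    partials = defaultdict(list)
    for k, (a_entries, b_entries) in by_k.items():
        for i, a in a_entries:
            for j, b in b_entries:
                partials[(i, j)].append(a * b)

    # Stage 2 reduce: sum the partial products for each output cell.
    return {(i, j): sum(vals) for (i, j), vals in partials.items()}

if __name__ == "__main__":
    A = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0}   # sparse 2x2 matrix
    B = {(0, 0): 4.0, (1, 0): 5.0, (1, 1): 6.0}   # sparse 2x2 matrix
    print(mapreduce_matmul(A, B))   # {(0, 0): 14.0, (0, 1): 12.0, (1, 0): 12.0}
```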


Book ChapterDOI
14 Apr 2011
TL;DR: This work casts the problem of learning and predicting the popularity of articles from online news media as a ranking task over pairs of popular and non-popular articles, and shows that this approach can reach an accuracy of up to 76%.
Abstract: We explore the problem of learning and predicting the popularity of articles from online news media. The only available information we exploit is the textual content of the articles and whether or not they became popular, as measured by users clicking on them. First we show that this problem cannot be solved satisfactorily in a naive way by modelling it as a binary classification problem. Next, we cast the problem as a ranking task over pairs of popular and non-popular articles and show that this approach can reach an accuracy of up to 76%. Finally we show that prediction performance can improve if more content-based features are used. Support Vector Machine approaches are used for all experiments.

2 citations
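The pairwise ranking formulation described above can be reduced to binary classification in a standard way: represent each (popular, non-popular) pair by the difference of the two articles' feature vectors and train a linear SVM on those differences. The sketch below uses scikit-learn with TF-IDF features and made-up headlines purely as stand-ins; the paper's actual features, data and settings are not reproduced here.

```python
# Sketch of the pairwise-ranking reduction: learn to order (popular, non-popular)
# article pairs with a linear SVM on feature-vector differences.
# TF-IDF features and the toy headlines are illustrative assumptions only.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

popular = ["celebrity scandal shocks fans", "dramatic rescue caught on video"]
unpopular = ["committee publishes quarterly minutes", "minor road closure announced"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(popular + unpopular).toarray()
X_pop, X_unpop = X[:len(popular)], X[len(popular):]

# Build difference vectors for every (popular, non-popular) pair, in both orders,
# so the classifier learns a direction that ranks popular articles higher.
pairs, labels = [], []
for xp in X_pop:
    for xu in X_unpop:
        pairs.append(xp - xu); labels.append(+1)
        pairs.append(xu - xp); labels.append(-1)

clf = LinearSVC().fit(np.array(pairs), labels)

# The learned weight vector acts as a scoring function: a higher score means the
# article is predicted to be more popular. Pairwise accuracy (here computed on
# the training pairs just to show the mechanics) is the kind of number the
# abstract reports on held-out data.
scores = np.array(pairs) @ clf.coef_.ravel()
print((np.sign(scores) == np.array(labels)).mean())
```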