Towards Memory Supporting Personal Information Management Tools
David Elsweiler, Ian Ruthven and Christopher Jones
{david.elsweiler, ian.ruthven, cjones}@cis.strath.ac.uk
Department of Computer and Information Sciences, University of Strathclyde
Livingstone Tower, 26 Richmond Street, Glasgow, G1 1XH
Abstract
In this article we discuss re-retrieving personal information objects and relate the task to recovering
from lapse(s) in memory. We propose that fundamentally it is lapses in memory that impede users from
successfully re-finding the information they need. Our hypothesis is that by learning more about
memory lapses in non-computing contexts and how people cope and recover from these lapses, we can
better inform the design of PIM tools and improve the user's ability to re-access and re-use objects. We
describe a diary study that investigates the everyday memory problems of 25 people from a wide range
of backgrounds. Based on the findings, we present a series of principles that we hypothesize will
improve the design of personal information management tools. We test this hypothesis through an
evaluation of a tool for managing personal photographs, which was designed according to our
findings. The evaluation suggests that users’ performance when re-finding objects can be improved by
building personal information management tools to support characteristics of human memory.
1. Introduction
In our daily lives we constantly interact with a wide range of electronically stored information
objects: email messages, web pages, digital images, video samples, etc. The sheer quantity of the
information we create and use combined with limitations of human memory means that we cannot rely
solely on our memories to recollect precisely what information we have seen, where we may have stored
an object or how we can find it again. Consequently, we are forced to rely on tools to support our access
and management of digital information. These tools are either dedicated to searching our personal
information stores, such as Stuff-I’ve-Seen [Dumais et al. 2003], or are tools which allow us to manage
information objects, e.g. folders in email applications. Information management tools are intended to
help people find previously stored information by allowing the user to organise their information
objects. However, both the searching and managing approaches place the load for successful recovery
of information on the user’s memory.
To conduct a successful search on a query-based system such as Google Desktop, for example, a user
must remember sufficient details about the information they want to retrieve in order to form a query.
However, psychological research indicates that people are not good at remembering precise details.
Instead what tends to be remembered are high-level meanings or gists [Sachs 1967; Clark and Clark
1977; Rubin 1977]. This suggests that people would not be adept at remembering terms in a document,
the subject of an email etc. – the kind of recollections required to construct queries.
The major alternatives to query-based systems are browse-based systems in which a user looks
through information objects in order to find the objects they want. Browsing systems either show users
all the objects available, limiting the approach to relatively small data sets, or force a classification on
the objects, such as colour distribution for images [Heesch & Rüger 2004], concepts for documents
[Yang 1994], etc.
Similarly, information management tools force a classification on users, either by automatically
classifying objects, as in text categorisation systems [Hayes et al. 1990], or by forcing users to classify
objects, usually in some form of hierarchical system [Malone 1983]. For example, photographs and
music are generally organised in albums and possibly further sub-categorised by artist, date, genre etc.
Operating systems manage applications and files in a hierarchical system of folders, email tools provide
facilities to group messages hierarchically, and standard web page book-marking features are
hierarchical.
Despite their popularity, hierarchical systems have been shown to have problems. Malone’s study of
natural office behaviour demonstrated that they are cognitively challenging and that users are reluctant
to use them either because they cannot decide how to categorise an item, or because they are not
confident in their ability to retrieve a categorised item at a later date [Malone 1983]. Similar behaviour
has been observed with digital documents [Boardman & Sasse 2003] and email messages [Whittaker &
Sidner 1996].
Our work is motivated by the limitations of existing Personal Information Management (PIM) tools
and by the likelihood that the quantity of information people are required to process will continue to
grow. In particular, we are interested in the role that human memory plays in the
management of personal information. In PIM people try to obtain information based on the features of
an object that they can recall. Therefore, the information that people forget is the barrier to successful
retrieval: if they could remember everything that they once knew about an object then it would be
simple to re-access it. To improve PIM systems we need to understand in more detail what people can
remember, what strategies are successful for remembering and how we can design tools that better
support personal information management.
The role memory plays in PIM is non-trivial and involves different types of memory. For example,
when re-retrieving an object from our personal stores our strategy may be based on the recollection of a
property that object has (semantic memory), a previous experience with the object (autobiographical
memory), a temporal reference to that object, such as when it was previously accessed, etc. Depending
on the context of the search it may be easier for the searcher to utilise some types of memory over
others, e.g. in email retrieval, it may be easier to remember who sent an email, when it was sent or what
it said depending on properties of the email and the search. Thus, supporting PIM should, we argue,
allow for searchers to utilise different types of memory in retrieval. Further, it is lapses in memory, such
as a failure to recall the specific location, property, or source of an object, that prevent successful
re-retrieval in PIM. For example, in the period shortly after an information object has been stored or
accessed it can be re-accessed with ease because the recollection of the object and its location is lucid.
However, popular theories of memory emphasize the transient nature of human memory; recollection
diminishes over time [decay theory e.g. Rubin & Wenzel 1996] and focusing on other tasks and
interaction with other objects can also degrade the recollection [interference theory e.g. Bower et al.
1994].
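As an illustration only, and not a description of any tool discussed in this article, the following Python sketch shows one way a re-finding function might accept whichever cues a searcher happens to recall. The `Email` record, the `refind` function, its parameters and the sample data are all hypothetical names introduced for this example. Every cue is optional, so a lapse in one type of recollection (for instance, forgetting the sender) does not prevent the other cues from narrowing the search.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional, Sequence


@dataclass
class Email:
    # Hypothetical record type used only for this illustration.
    sender: str
    sent: date
    body: str


def refind(
    emails: Iterable[Email],
    sender: Optional[str] = None,      # who sent it, if remembered
    around: Optional[date] = None,     # roughly when it was sent, if remembered
    tolerance_days: int = 30,          # how vague the temporal recollection is
    gist_words: Sequence[str] = (),    # words capturing the remembered gist
) -> List[Email]:
    """Return the emails that match every cue the searcher supplied."""
    matches = []
    for email in emails:
        # Skip on sender only if a sender cue was actually given.
        if sender is not None and sender.lower() not in email.sender.lower():
            continue
        # Temporal cue is approximate: accept anything within the tolerance.
        if around is not None and abs((email.sent - around).days) > tolerance_days:
            continue
        # Every remembered gist word must appear somewhere in the body.
        body = email.body.lower()
        if any(word.lower() not in body for word in gist_words):
            continue
        matches.append(email)
    return matches


# Hypothetical sample data.
inbox = [
    Email("alice@example.org", date(2005, 3, 14), "Draft of the study protocol"),
    Email("bob@example.org", date(2005, 3, 20), "Photos from the workshop"),
    Email("alice@example.org", date(2005, 9, 2), "Reviewer comments on the paper"),
]

# The searcher recalls the sender and a rough date, but not the content.
by_person_and_time = refind(inbox, sender="alice", around=date(2005, 3, 1))

# The searcher recalls only the gist of the message.
by_gist = refind(inbox, gist_words=["photos"])
```

The substring and keyword matching here is deliberately crude; the point of the sketch is only that each cue is independent and optional, which is one possible reading of what it means to let searchers draw on different types of memory.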
We hypothesize that in order to ascertain which types of tool will be effective, and how existing tools
can be changed to enhance rather than restrict human recall, it will be useful to investigate memory
lapses in other contexts: what do people forget, why do they forget and what automated support might
make the process of remembering easier? Further, as we show in section 4, there are similarities
between memory lapses that people suffer from and learn to deal with effectively in everyday life and
those that hinder PIM. Therefore, can lessons be learned from everyday behaviour with respect to
improving PIM practices and tools? These are questions we address in this work.
This article is divided into two main parts. In the first part, we report on a diary study that evaluates
the variety, frequency and severity of everyday memory lapses. The study also explores the types of
tasks that cause memory failure (or the memory failure to be reported), as well as the methods employed
to recover from lapses. By comparing and contrasting the recorded memory problems and
compensatory strategies with those that hinder PIM, we demonstrate restrictive aspects of existing PIM
systems and their interface designs. We discuss the possible implications this work has for the design of
PIM tools and illustrate them in the context of a tool for the management of personal photographs. In
the second part of the article we deal with the main aim of this work: to determine whether taking memory
into account in the design of PIM systems is advantageous. To this end, we perform a pilot evaluation
comparing the performance of our tool with a traditional browse-based interface.
The remainder of this article is structured as follows: Section 2 describes the background literature;
section 3 details the research methodology used to examine everyday memory lapses; section 4 provides
the results of the study, the implications of which are discussed in section 5, outlined as a series of
design principles. Section 6 presents a tool for managing personal photographs, which embodies the
principles established from our findings. Section 7 presents an evaluation of the tool. Finally, our
conclusions are presented in section 8 set against the context of future work.
2. Related Work
This section describes the background literature for the primary themes of this article. Section 2.1
describes previous studies that have also taken a psychological approach to investigating PIM behaviour.
Section 2.2 presents previous work that relates memory lapses and PIM. Section 2.3 describes
knowledge of everyday memory problems, while section 2.4 details methods for studying everyday
memory problems.
2.1 Personal Information Management Behaviour
Several studies have investigated personal information management behaviour in natural settings.
These studies aimed to uncover the strategies people employ
when storing and retrieving information, the reasons why they choose to use these strategies and the
problems they have when doing so.
It has been observed, for example, that documents are often placed in piles rather than being filed in
a more appropriate location [Malone 1983]. A number of explanations have been offered for this.
Firstly, piling results from people having multiple and conflicting uses for their document collections.
Barreau and Nardi [1995] discovered that people use collections both for preserving information that
they may need at a later time and for reminding themselves that tasks have still to be completed. Piles
are common because, to a certain extent, they achieve both of these goals. When the number of
documents in the collection remains small it can be easy to re-find sought-after documents. Further,
piles represent a kind of short-term memory: a buffer which retains tasks that must be performed [Jones
et al. 2002]. This is useful because when documents are filed in folders there is an “out of sight, out of
mind problem” [Bruce et al. 2004]. It is only when the number of files or piles grows beyond a certain
threshold that the disadvantages of employing a piling strategy become apparent. In this situation
different groups of people react in different ways. “Frequent filers” file documents as they use them and
never let piles become large enough to cause trouble, “spring cleaners” respond to over-sized piles by
archiving certain files into longer-term storage, whereas “no filers” make no efforts to manage the piles
and struggle to work productively [Whittaker & Sidner 1996].
The use of piling as an information strategy demonstrates that the function of the information space
plays a role in determining how people manage that space. Kwasnik [1989a] also observed that the
function or use of a document or specific elements within a document influences the way that people
will store or file that document. For example, resources for teaching may be stored together. Bruce
[2005] argues that it is the user's predicted need for information, i.e. their estimation of the value that the
information may hold for them in the future as well as the reason for that importance, that has the
greatest influence on the way they store it. Again, there is a problem with this because if people
inaccurately predict future needs the information becomes difficult to retrieve when they require it for a
purpose unrelated to its filed location.
Other researchers have observed that people use different management strategies depending on the
format of the information [Kwasnik 1989b], their role within a company [Jones et al. 2002], and their
relationship with the information [Jones et al. 2002]. In their studies of keeping information found on
the web, Jones and others [2002, 2003] and Bruce and others [2004] observed many strategies for
retaining information from web pages. They discovered that people, for example, use bookmarks, email
themselves URLs, print out entire web pages, and cut and paste useful information into other kinds of
document. If the above studies are correct, then the exact method of retention will depend on a
complicated array of factors. The lack of a well defined or easily predictable storage strategy places
further burden on the memory when re-retrieving documents because to retrieve the document the user
must remember contextual facts such as the tool used to retrieve it, the task they were undertaking at the
time, their location etc. to determine where they would have stored the information.
Capra and Perez-Quinones [2003] noticed that when re-retrieving information objects users take a
two-stage iterative approach. The first stage identifies an appropriate information source, while the
second focuses on narrowing towards specific information from within that source. Their findings align
with those of Teevan and her colleagues [2004] who discovered similarities between the way people re-