Showing papers in "Journal of Information Science in 2012"


Journal ArticleDOI
TL;DR: In this article, existing definitions of crowdsourcing are analysed to extract common elements and to establish the basic characteristics of any crowdsourcing initiative.
Abstract: 'Crowdsourcing' is a relatively recent concept that encompasses many practices. This diversity leads to the blurring of the limits of crowdsourcing that may be identified virtually with any type of internet-based collaborative activity, such as co-creation or user innovation. Varying definitions of crowdsourcing exist, and therefore some authors present certain specific examples of crowdsourcing as paradigmatic, while others present the same examples as the opposite. In this article, existing definitions of crowdsourcing are analysed to extract common elements and to establish the basic characteristics of any crowdsourcing initiative. Based on these existing definitions, an exhaustive and consistent definition for crowdsourcing is presented and contrasted in 11 cases.

1,616 citations


Journal ArticleDOI
TL;DR: It is concluded that recasting this field as the study of information in social practice – in other words, as about exploring how information activities are woven through social practices – could be a highly productive perspective.
Abstract: A number of cognate disciplines, such as science and technology studies and media studies, appear to be turning to practice theories as a theoretical perspective. Using Schatzki's work as a starting point, this conceptual paper explores the practice approach, and its actual and potential application to different fields in information science. The paper begins by discussing definitions of practice, and charting differences in how authors have emphasized different aspects within the theory, such as the body, materiality, routine and knowing. Examples drawn from a study of family photography illustrate the discussion. The paper also locates a familiar concept, communities of practice, in the wider development of the theory. It then evaluates the practice approach as a perspective. The last sections of the paper examine the ways in which the practice approach has begun to be used in the study of information behaviour. The paper concludes that recasting this field as the study of information in social practice - in other words, as about exploring how information activities are woven through social practices - could be a highly productive perspective.

117 citations


Journal ArticleDOI
Gang Li, Fei Liu
TL;DR: This article introduces a novel approach for sentiment analysis – the clustering-based sentiment analysis approach, which applies a TF-IDF weighting method, a voting mechanism and importing term scores to obtain an acceptable and stable clustering result.
Abstract: This article introduces a novel approach for sentiment analysis - the clustering-based sentiment analysis approach. By applying a TF-IDF weighting method, a voting mechanism and importing term scores, an acceptable and stable clustering result can be obtained. The methodology has competitive advantages over the two existing types of approaches: symbolic techniques and supervised learning methods. It is a well-performing, efficient approach to solving sentiment analysis problems that requires no human participation.

69 citations
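
The TF-IDF weighting step named in the abstract above can be illustrated with a minimal sketch. It assumes the common formulation (term frequency normalized by document length, multiplied by the log of inverse document frequency); the abstract does not say which variant the authors use, and the function name `tf_idf` and the two toy reviews are illustrative only. The voting mechanism and term scores are not shown.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Two toy reviews (hypothetical data).
docs = [
    "great plot great cast".split(),
    "dull plot weak cast".split(),
]
weights = tf_idf(docs)
```

Terms that occur in every document ("plot", "cast") receive weight zero, so only the discriminating terms ("great", "dull", "weak") would contribute to a subsequent clustering step.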


Journal ArticleDOI
TL;DR: This paper provides a literature review of the quality evaluation of DLs based on users’ perceptions to bring together previously disparate streams of work to help shed light on this thriving area.
Abstract: In the past two decades, the use of digital libraries (DLs) has grown significantly. Accordingly, questions about the utility, usability and cost of DLs have started to arise, and greater attention is being paid to the quality evaluation of this type of information system. Since DLs are destined to serve user communities, one of the main aspects to be considered in DL evaluation is the user’s opinion. The literature on this topic has produced a set of varied criteria to judge DLs from the user’s perspective, measuring instruments to elicit users’ opinions, and approaches to analyse the elicited data to conclude an evaluation. This paper provides a literature review of the quality evaluation of DLs based on users’ perceptions. Its main contribution is to bring together previously disparate streams of work to help shed light on this thriving area. In addition, the various studies are discussed, and some challenges to be faced in the future are proposed.

68 citations


Journal ArticleDOI
TL;DR: This study proposes an integrated layer model of trust, which suggests that trust in information is influenced by trust in its source, which is influenced by trust in the medium, which in turn is influenced by a more general propensity to trust.
Abstract: Credibility evaluation has become a daily task in the current world of online information that varies in quality. The way this task is performed has been a topic of research for some time now. In this study, we aim to extend this research by proposing an integrated layer model of trust. According to this model, trust in information is influenced by trust in its source. Moreover, source trust is influenced by trust in the medium, which in turn is influenced by a more general propensity to trust. We provide an initial validation of the proposed model by means of an online quasi-experiment (n = 152) in which participants rated the credibility of Wikipedia articles. Additionally, the results suggest that the participants were more likely to have too little trust in Wikipedia than too much trust.

62 citations


Journal ArticleDOI
Dan Wu, Daqing He, Jiepu Jiang, Wuyi Dong, Kim Thien Vo
TL;DR: The results show that iSchools share the same vision and mission of working on relationships between information, people and technology, and have established themselves as the appropriate institutions for researchers from diverse subject areas to study this interdisciplinary integration.
Abstract: The emergence of the iSchool movement and the establishment of iSchools have helped to reshape the landscape of the library and information science (LIS) discipline. In this article, based on a set of research questions focusing on the research and education efforts of about 25 iSchools, we performed a study using both quantitative and qualitative methods on publicly available data obtained from the web. Our results show that iSchools share the same vision and mission of working on relationships between information, people and technology, and have established themselves as the appropriate institutions for researchers from diverse subject areas to study this interdisciplinary integration. Overall, we are seeing an emerging iSchool identity and a defining iField, but there are still many important developments to make.

52 citations


Journal ArticleDOI
TL;DR: Results show that published journal articles are by far the most popular type of source bookmarked, followed by conference proceedings and books, and there is a marked preference for the use of subject-based repositories over institutional repositories in the case of open access repositories.
Abstract: This paper explores the possibility of using data from social bookmarking services to measure the use of information by academic researchers. Social bookmarking data can be used to augment participative methods (e.g. interviews and surveys) and other, non-participative methods (e.g. citation analysis and transaction logs) to measure the use of scholarly information. We use BibSonomy, a free resource-sharing system, as a case study. Results show that published journal articles are by far the most popular type of source bookmarked, followed by conference proceedings and books. Commercial journal publisher platforms are the most popular type of information resource bookmarked, followed by websites, records in databases and digital repositories. Usage of open access information resources is low in comparison with toll access journals. In the case of open access repositories, there is a marked preference for the use of subject-based repositories over institutional repositories. The results are consistent with those observed in related studies based on surveys and citation analysis, confirming the possible use of bookmarking data in studies of information behaviour in academic settings. The main advantages of using social bookmarking data are that it is an unobtrusive approach, it captures the reading habits of researchers who are not necessarily authors, and data are readily available. The main limitation is that a significant amount of human resources is required in cleaning and standardizing the data.

44 citations


Journal ArticleDOI
TL;DR: The article briefly presents and discusses 12 different approaches to the evaluation of information sources (for example a Wikipedia entry or a journal article), including the checklist approach, classical peer review, modified peer review, and evidence-based evaluation.
Abstract: The article briefly presents and discusses 12 different approaches to the evaluation of information sources (for example a Wikipedia entry or a journal article): (1) the checklist approach; (2) classical peer review; (3) modified peer review; (4) evaluation based on examining the coverage of controversial views; (5) evidence-based evaluation; (6) comparative studies; (7) author credentials; (8) publisher reputation; (9) journal impact factor; (10) sponsoring: tracing the influence of economic, political, and ideological interests; (11) book reviews and book reviewing; and (12) broader criteria. Reading a text is often not a simple process. All the methods discussed here are steps on the way to learning how to read, understand, and criticize texts. According to hermeneutics it involves the subjectivity of the reader, and that subjectivity is influenced, more or less, by different theoretical perspectives. Good, scholarly reading is to be aware of different perspectives, and to situate oneself among them.

42 citations


Journal ArticleDOI
TL;DR: Three key KS mechanisms and three contingency factors affecting their application were identified based on the research results, and future research that examines the interrelationships among these contingency factors and how they collectively influence KS practices in similar contexts is encouraged.
Abstract: Prior studies indicate that undesired consequences may occur if knowledge cannot be effectively shared among members of a project team. Nevertheless, there are few studies that explore the knowledge-sharing (KS) mechanisms used and the contingency factors affecting their application in the context of managing new product development projects that encounter changes in project scope. Therefore, in this research the principles of the contingency approach were adopted in order to examine the KS mechanisms used and the contingency factors affecting their use in this context via an in-depth case study. Three key KS mechanisms and three contingency factors affecting their application were identified based on the research results. The relationship between the KS mechanisms and the contingency factors is formalized in five propositions. Future research that examines the interrelationships among these contingency factors and how they collectively influence KS practices in similar contexts is encouraged.

37 citations


Journal ArticleDOI
TL;DR: This article proposes an adaptive collaborative filtering algorithm which takes time into account when predicting users’ behaviour, and shows that the proposed algorithm is more accurate than the classical collaborative filtering technique.
Abstract: Recommendation systems manage information overload in order to present personalized content to users based on their interests. One of the most efficient recommendation approaches is collaborative filtering, through which recommendation is based on previously rated data. Collaborative filtering techniques feature impressive solutions for suggesting favourite items to certain users. However, recommendation methods fail to reflect fluctuations in users' behaviour over time. In this article, we propose an adaptive collaborative filtering algorithm which takes time into account when predicting users' behaviour. The transitive relationship from one user to another is considered when computing the similarity of two different users. We predict variations of users' preferences using their profiles. Our experimental results show that the proposed algorithm is more accurate than the classical collaborative filtering technique.

35 citations
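
One way to take time into account when comparing two users is sketched below. This is an assumption for illustration, not the authors' algorithm: the abstract above does not describe how time enters the computation, so the sketch simply down-weights older ratings with an exponential half-life before computing cosine similarity. The function name `decayed_cosine`, the `half_life` parameter and the sample ratings are hypothetical, and the transitive user relationships mentioned in the abstract are not modelled.

```python
import math

def decayed_cosine(ratings_a, ratings_b, now, half_life=30.0):
    """Cosine similarity over co-rated items, with older ratings down-weighted.

    Each ratings dict maps item -> (rating, timestamp_in_days).
    A rating loses half of its weight every `half_life` days (assumed scheme).
    """
    def weight(timestamp):
        return 0.5 ** ((now - timestamp) / half_life)

    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = norm_a = norm_b = 0.0
    for item in common:
        rating_a, time_a = ratings_a[item]
        rating_b, time_b = ratings_b[item]
        wa = rating_a * weight(time_a)
        wb = rating_b * weight(time_b)
        dot += wa * wb
        norm_a += wa * wa
        norm_b += wb * wb
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / math.sqrt(norm_a * norm_b)

# Hypothetical users: they agree on m1 today, but disagreed on m2,
# which alice rated 90 days ago.
alice = {"m1": (5, 100), "m2": (5, 10)}
bob = {"m1": (5, 100), "m2": (1, 100)}
recent_focus = decayed_cosine(alice, bob, now=100, half_life=30.0)
almost_no_decay = decayed_cosine(alice, bob, now=100, half_life=1e9)
```

With a 30-day half-life the old disagreement on m2 counts for little, so `recent_focus` is higher than `almost_no_decay`, which approximates the classical, time-agnostic similarity.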



Journal ArticleDOI
TL;DR: This paper proposes a fresh approach that expresses the behaviour of interactive users and various web robots in terms of a sequence of request types, called request patterns, and shows that the proposed approach is more accurate, and that real-time detection of web robots is feasible.
Abstract: In web robot detection it is important to find features that are common characteristics of diverse robots, in order to differentiate between them and humans. Existing approaches employ fairly simple features (e.g. empty referrer field, interval between successive requests), which often fail to reflect web robots' behaviour accurately. False alarms may therefore occur unacceptably often. In this paper we propose a fresh approach that expresses the behaviour of interactive users and various web robots in terms of a sequence of request types, called request patterns. Previous proposals have primarily targeted the detection of text crawlers, but our approach works well on many other web robots, such as image crawlers, email collectors and link checkers. In empirical evaluation of more than 1 billion requests collected at www.microsoft.com, our approach achieved 94% accuracy in web robot detection, estimated by F-measure. A decision tree algorithm proposed by Tan and Kumar was also applied to the same data. A comparison shows that the proposed approach is more accurate, and that real-time detection of web robots is feasible.
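
The idea of describing a session as a sequence of request types can be sketched as follows. The mapping from file extension to request type, the type names and the helper `request_pattern` are assumptions made for illustration; the abstract does not list the request types the authors use or how the patterns are classified.

```python
import os
from urllib.parse import urlparse

# Hypothetical mapping from file extension to a coarse request type.
TYPE_BY_EXT = {
    ".html": "page", ".htm": "page", "": "page",
    ".jpg": "image", ".png": "image", ".gif": "image",
    ".css": "style", ".js": "script",
}

def request_pattern(urls):
    """Turn a session's URLs into a sequence of request types,
    collapsing consecutive requests of the same type."""
    pattern = []
    for url in urls:
        ext = os.path.splitext(urlparse(url).path)[1].lower()
        req_type = TYPE_BY_EXT.get(ext, "other")
        if not pattern or pattern[-1] != req_type:
            pattern.append(req_type)
    return pattern

# A browser typically fetches a page and then its embedded resources,
# whereas an image crawler requests images only (toy sessions).
browser_session = ["/index.html", "/site.css", "/app.js", "/logo.png", "/banner.jpg"]
image_crawler = ["/a.jpg", "/b.png", "/c.gif"]
browser_pattern = request_pattern(browser_session)
crawler_pattern = request_pattern(image_crawler)
```

The two sessions produce visibly different patterns, which is the kind of signal a classifier could be trained on.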

Journal ArticleDOI
TL;DR: A user study is presented, involving 60 subjects in 30 pairs, in which the experience and performance of users are compared while performing an information-seeking task in three different spatially defined collaboration settings, indicating the impact of space on collaboration.
Abstract: Space and time are considered the most important dimensions for studying systems and methods that support collaboration in information seeking. Several investigations have provided us with insights into people's preferences and experiences relating to these two dimensions, but there is a lack of empirical evidence. A user study is presented, involving 60 subjects in 30 pairs, in which the experience and performance of users are compared while performing an information-seeking task in three different spatially defined collaboration settings: (1) working at the same workstation, (2) working in the same room at different workstations, and (3) working in different rooms. The results show significant differences among the experimental conditions, indicating the impact of space on collaboration. The pros and cons of different spatial set-ups are derived from an extensive analysis that uses several traditional information retrieval measures such as precision and recall, as well as unconventional assessments involving coverage and diversity.

Journal ArticleDOI
TL;DR: This study explores the relationships between cultural and social capital and online social tagging behaviour in Delicious.com, a social bookmarking web site that offers social tagging functionalities, and made inferences on the user roles and the power structure of a social tagging folksonomy community.
Abstract: This study explores the relationships between cultural and social capital and online social tagging behaviour in Delicious.com, a social bookmarking web site that offers social tagging functionalities. Based on Bourdieu's conception of cultural and social capital, an online questionnaire was developed to measure Delicious users' capital possession and its influences on social tagging behavioural tendencies. The study findings showed that the offline/online cultural capital and offline social capital affected information organization-oriented tagging; offline/online social capital affected socially oriented tagging; offline/online cultural capital and offline/online social capital both affected strategic tagging; offline/online social capital affected tagging imitation. Based on the findings, we made inferences on the user roles and the power structure of a social tagging folksonomy community.

Journal ArticleDOI
TL;DR: This research has investigated four different classification algorithms (naïve Bayes, decision tree, SVM and K-NN) to detect Arabic web spam pages, based on content, and revealed that the Decision Tree was the best classifier for this purpose.
Abstract: Search engines are important outlets for information query and retrieval. They have to deal with the continual increase of information available on the web, and provide users with convenient access to such huge amounts of information. Furthermore, with this huge amount of information, a more complex challenge that continuously gets more and more difficult to illuminate is the spam in web pages. For several reasons, web spammers try to intrude in the search results and inject artificially biased results in favour of their websites or pages. Spam pages are added to the internet on a daily basis, thus making it difficult for search engines to keep up with the fast-growing and dynamic nature of the web, especially since spammers tend to add more keywords to their websites to deceive the search engines and increase the rank of their pages. In this research, we have investigated four different classification algorithms (naïve Bayes, decision tree, SVM and K-NN) to detect Arabic web spam pages, based on content. The three groups of datasets used, with 1%, 15% and 50% spam contents, were collected using a crawler that was customized for this study. Spam pages were classified manually. Different tests and comparisons have revealed that the Decision Tree was the best classifier for this purpose.

Journal ArticleDOI
TL;DR: An algorithm is proposed that recommends answer providers in community-based question answering services by leveraging answer provider interest and expertise, allowing for more effective differentiation.
Abstract: Obtaining answers from community-based question answering (CQA) services is typically a lengthy process. In this light, the authors propose an algorithm that recommends answer providers. A two-step...

Journal ArticleDOI
TL;DR: This work supports the contention that adolescent social groups in which SNs are embedded form a distinct domain, and establishes a rationale for further investigation of adolescents’ contextualized use of SNs within social relationships.
Abstract: Exploring ways in which new technology impacts adolescents' information behaviours and creates a social space requires holistic investigation. A qualitative study of 21 seniors in an upper-middle-class suburban high school revealed highly individualized use of Facebook and its features. These included: (i) Friends groups of 50-3700 members, with even the largest groups representative primarily of face-to-face connections, and (ii) a clear articulation within those groups of various categories, each with its own distinct communicative channel and style. A meaningful connection was found between the social value of various social network (SN)-mediated relationships and the communicative modes used to maintain and enhance them. Through a comprehensive literature review and clearly grounded analysis of rich data, this work supports the contention that adolescent social groups in which SNs are embedded form a distinct domain, and establishes a rationale for further investigation of adolescents' contextualized use of SNs within social relationships.

Journal ArticleDOI
TL;DR: A social inverted index is proposed – a novel inverted index extended for social-tagging-based IR – that maintains a separate user sublist for each resource in a resource-posting list to contain each user’s various features as weights.
Abstract: Keywords have played an important role not only for searchers who formulate a query, but also for search engines that index documents and evaluate the query. Recently, tags chosen by users to annotate web resources are gaining significance for improving information retrieval (IR) tasks, in that they can act as meaningful keywords bridging the gap between humans and machines. One critical aspect of tagging (besides the tag and the resource) is the user (or tagger); there exists a ternary relationship among the tag, resource, and user. The traditional inverted index, however, does not consider the user aspect, and is based on the binary relationship between term and document. In this paper we propose a social inverted index - a novel inverted index extended for social-tagging-based IR - that maintains a separate user sublist for each resource in a resource-posting list to contain each user's various features as weights. The social inverted index is different from the normal inverted index in that it regards each user as a unique person, rather than simply count the number of users, and highlights the value of a user who has participated in tagging. This extended structure facilitates the use of dynamic resource weights, which are expected to be more meaningful than simple user-frequency-based weights. It also allows a flexible response to the conditional queries that are increasingly required in tag-based IR. Our experiments have shown that this user-considering indexing performs better in IR tasks than a normal inverted index with no user sublists. The time and space overhead required for index construction and maintenance was also acceptable.
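
The index structure described above, in which each resource in a posting list carries its own user sublist with per-user weights, can be sketched as a nested mapping. The class name `SocialInvertedIndex`, its methods, the sum-of-weights scoring and the sample data are assumptions for illustration; the abstract does not specify how user features are turned into weights or how resources are scored.

```python
from collections import defaultdict

class SocialInvertedIndex:
    """Toy index: tag -> resource -> {user: weight} (the user sublist)."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(dict))

    def add(self, tag, resource, user, weight=1.0):
        # Record that `user` annotated `resource` with `tag`.
        self.postings[tag][resource][user] = weight

    def search(self, tag, user_filter=None):
        """Rank resources for a tag by the summed weights of their taggers.

        `user_filter` is an optional predicate on users, which allows
        conditional queries such as "only taggers I follow".
        """
        scores = {}
        for resource, users in self.postings.get(tag, {}).items():
            total = sum(
                weight for user, weight in users.items()
                if user_filter is None or user_filter(user)
            )
            if total > 0:
                scores[resource] = total
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical tagging data.
index = SocialInvertedIndex()
index.add("python", "docs.example/a", "ann", 0.75)
index.add("python", "docs.example/a", "bob", 0.25)
index.add("python", "docs.example/b", "cat", 0.5)

all_users = index.search("python")
without_ann = index.search("python", user_filter=lambda user: user != "ann")
```

Because the sublist keeps each tagger as an individual rather than a count, the ranking changes when the query restricts which users are taken into account.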

Journal ArticleDOI
TL;DR: This research proposed a framework to enable users to use their slang language in order to retrieve the relevant documents that have been posted in both forms – slang and classical, designed and implemented based on a context-free grammar.
Abstract: Due to the widespread use of the internet, there are large amounts of information and documents available in several languages. The Arabic language is one of the available important languages in terms of its usage and structure. Search engines like Google and Yahoo support searching in Arabic, yet fail to get good results when slang terms are used in the query. There are difficulties associated with the Arabic language. The main goal of this research is to refine Arabic text-based searching by using Arabic slang terms in queries. This research proposed a framework to enable users to use their slang language in order to retrieve the relevant documents that have been posted in both forms - slang and classical. The framework is designed and implemented based on a context-free grammar that is used to map the user's slang queries to the equivalent classical ones. On a classical dataset, results showed a 3% improvement on the average values of precision, recall, and F-measure achieved using classical-based queries rather than slang-based ones. Using slang-based queries gives 13% improvement on the average values of the used measures on a slang dataset and 7% improvement on the average values of the used measures on a hybrid dataset.

Journal ArticleDOI
TL;DR: A project to build a scalable OCR workflow for historical collections based on open source tools that is particularly tailored towards use in small-scale historical archives is discussed.
Abstract: Large-scale digitization projects dealing with text-based historical material face challenges that are not well catered for by commercial software. This article discusses the results of a project to build a scalable OCR workflow for historical collections based on open source tools that is particularly tailored towards use in small-scale historical archives. It argues that open source tools allow for better customization to match these requirements, particularly with regard to character model training and per-project language modelling. We offer insights into our accuracy evaluation results of various open source OCR tools, as well as a case study about the challenges and opportunities of open source OCR in historical archives.

Journal ArticleDOI
TL;DR: This article describes the process of constructing a vocabulary of personal names of jazz artists in the form of Linked Open Data (LOD), created as a name directory to support the development of the Linked Jazz project.
Abstract: This article describes the process of constructing a vocabulary of personal names of jazz artists in the form of Linked Open Data (LOD). Created as a name directory to support the development of the Linked Jazz project, it provides a case study that demonstrates the value and the challenges of developing a domain-specific vocabulary tool that draws upon the semantics of DBpedia, a major LOD dataset. The article also addresses possible strategies for enhancing the directory to make it a more robust personal name vocabulary.

Journal ArticleDOI
TL;DR: This work proposes a novel approach, the Automatic Thai Legal Ontology Building (ATOB) algorithm, for building legal ontologies automatically and improving the retrieval of court sentences, and concludes that an effective ontology should be weight-embedded.
Abstract: Ontology plays an important role in knowledge representation, especially in the domain of information retrieval. However, building ontology remains a challenging problem because it is a time-consum...

Journal ArticleDOI
TL;DR: The significance of this research is not only that it has identified blind spots existing in information policy making and research, but also that the new Chinese information policy domain framework will help regulate informationization in China.
Abstract: Information policies, the guidelines and standards for the macro information management of a nation, play an important role in the course of informationization in China. Owing to the absence of an ...

Journal ArticleDOI
TL;DR: Key challenges and problems associated with OA journals in the humanities and social sciences in China are identified and development strategies to address these issues are outlined, including actively promoting the transition of scholarly journals from print form to OA, speeding up network construction of OA journals, and enhancing the functionality of the OA journals’ websites.
Abstract: We identified and analysed the 147 journals offering open access (OA) among the 2960 scholarly journals indexed by the Chinese National Knowledge Information (CNKI) database in the humanities and social sciences. Data were analysed concerning each journal’s organizer, discipline, publishing cycle, areas, regions or provinces covered, and first date that content was offered free of charge, together with the journal’s website construction, the way full text was accessed, and the time delay in publication. On the basis of the survey results, we identify key challenges and problems associated with OA journals in the humanities and social sciences in China, and we outline development strategies to address these issues, including actively promoting the transition of scholarly journals from print form to OA, speeding up network construction of OA journals, and enhancing the functionality of the OA journals’ websites.

Journal ArticleDOI
TL;DR: A novel metric termed a differentor is defined to assess the probability that a similarity measure can find the one-to-one mappings between two ontologies at the entity level, and use it to integrate different similarity measures.
Abstract: Ontology matching, aimed at finding semantically related entities from different ontologies, plays an important role in establishing interoperation among Semantic Web applications. Recently, many similarity measures have been proposed to explore the lexical, structural or semantic features of ontologies. However, a key problem is how to integrate various similarities automatically. In this paper, we define a novel metric termed a “differentor” to assess the probability that a similarity measure can find the one-to-one mappings between two ontologies at the entity level, and use it to integrate different similarity measures. The proposed approach can assign weights automatically to each pair of entities from different ontologies without any prior knowledge, and the aggregation task is accomplished based on these weights. The proposed approach has been tested on OAEI2010 benchmarks for evaluation. The experimental results show that the differentor can reflect the performance of individual similarity measures, and a differentor-based aggregation strategy outperforms other existing aggregation strategies.

Journal ArticleDOI
TL;DR: This paper presents a novel stepwise paradigm, SPCF, which in the first step clusters users and items separately using their latent similarity, and is able to alleviate the well known sparsity problem which intrinsically exists in collaborative filtering.
Abstract: Collaborative filtering is a widely used approach in recommendation systems which predict user preferences by learning from user-item ratings. To extract either user relationship or item dependencies, there exist several well known approaches; among them clustering is of great importance. Traditional clustering methods in collaborative filtering usually suffer from two fundamental problems: sparsity and scalability. Sparsity refers to a situation where most users rate only a small number of items, while scalability denotes a huge number of both users and items. Inspired by these problems, this paper presents a novel stepwise paradigm, SPCF, which in the first step clusters users and items separately using their latent similarity. Once the primary clusters of the first level are formed, the second level simultaneously clusters the user and item clusters by means of co-clustering. The advantages of SPCF are threefold; first, it is able to alleviate the well known sparsity problem which intrinsically exists in collaborative filtering; second, the proposed method offers an elegant solution to the scalability problem based on dimensionality reduction which in turn leads to better performance of the model; third, experimental results on two versions of a Movielens dataset for prediction have demonstrated that the proposed method can reveal major interests of users or items in a promising manner.

Journal ArticleDOI
TL;DR: This paper proposes a modified information inference model that can mimic human cognitive behaviour to categorize various web short texts in an unsupervised manner and indicates the applicability and usefulness of the proposed method.
Abstract: Traditional text-processing methods encounter significant performance degradation when they are applied to web short texts, with their inherent characteristics including feature sparseness, lack of sufficient hand-labelled training examples, domain dependence, and asyntactic expression. In this paper we propose a modified information inference model that can mimic human cognitive behaviour to categorize various web short texts in an unsupervised manner. The model is based on the conceptual space theory and hyperspace analogue to language (HAL) model, and it is a novel development in that it combines domain-specific knowledge and universal knowledge via a fusion mechanism for multiple HAL spaces. Moreover, in the realization of conceptual space, a concept is represented geometrically by a two-tuple of property sets, which can effectively improve the representation accuracy of the information contained in combined concepts. Two measurements of the relationship between concepts are used to implement the information inference for web short texts. The experimental evaluation of our model is conducted via three different tasks on web short text categorization, and the results indicate the applicability and usefulness of the proposed method.

Journal ArticleDOI
TL;DR: A new approach is proposed that includes a hierarchical feature selection (HFS) algorithm which can be used to select more representative features with greater discriminative ability to represent a set of patent documents with hierarchical class labels.
Abstract: A vector space model (VSM) composed of selected important features is a common way to represent documents, including patent documents. Patent documents have some special characteristics that make it difficult to apply traditional feature selection methods directly: (a) it is difficult to find common terms for patent documents in different categories; and (b) the class label of a patent document is hierarchical rather than flat. Hence, in this article we propose a new approach that includes a hierarchical feature selection (HFS) algorithm which can be used to select more representative features with greater discriminative ability to represent a set of patent documents with hierarchical class labels. The performance of the proposed method is evaluated through application to two document sets with 2400 and 9600 patent documents, where we extract candidate terms from their titles and abstracts. The experimental results reveal that a VSM whose features are selected by a proportional selection process gives better coverage, while a VSM whose features are selected with a weighted-summed selection process gives higher accuracy.

Journal ArticleDOI
TL;DR: A novel method for effectively retrieving sentences that contain a given terminological concept based on semantic units called predicate-argument tuples is proposed, which enables effective textual similarity computations and minimizes errors based on six TP ranking models.
Abstract: Terminological paraphrases (TPs) are sentences or phrases that express the concepts of terminologies in a different form. Here we propose an effective way to identify and extract TPs from large-scale scientific literature databases. We propose a novel method for effectively retrieving sentences that contain a given terminological concept based on semantic units called predicate-argument tuples. This method enables effective textual similarity computations and minimizes errors based on six TP ranking models. For evaluation, we constructed an evaluation collection for the TP recognition task by extracting TPs from a target literature database using the proposed method. Through two experiments, we learned that scientific literature contains many TPs that could not previously be identified. The experimental results also showed the potential and extensibility of our proposed methods for extracting TPs.

Journal ArticleDOI
TL;DR: This work proposes dividing the document into regions through the document structure and image position, and weighting links between these regions according to their hierarchical positions, in order to distinguish between links that are useful and those that are not useful.
Abstract: In this paper, we are interested in XML multimedia retrieval, the aim of which is to find relevant multimedia objects such as images, audio and video through their context, such as the document structure. In context-based multimedia retrieval, the most common technique is based on the text surrounding the image. However, such textual information can be irrelevant to the image content. Therefore many studies turn to alternative techniques to extend the image description, such as the use of ontologies, relevance feedback, and user profiles. In our work, we studied the use of links between XML elements to improve image retrieval. More precisely, we propose dividing the document into regions through the document structure and image position. Then we weight links between these regions according to their hierarchical positions, in order to distinguish between links that are useful and those that are not useful. We then apply an updated version of the HITS algorithm at the region level, and compute a final image score by combining link scores with initial image scores. Experiments were done on the INEX 2006 and 2007 multimedia tracks, and showed the potential of our method.
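
For readers unfamiliar with HITS, the sketch below runs the standard, unweighted algorithm on a tiny directed graph whose nodes stand in for document regions. It is not the paper's updated version: the link weighting by hierarchical position and the combination with initial image scores are not specified in the abstract and are left out. The function name `hits`, the region labels and the iteration count are assumptions made for illustration.

```python
import math


def hits(edges, iterations=50):
    """Standard HITS on a directed graph given as (source, target) pairs.

    Returns two dicts, (hub, authority), each L2-normalised.
    """
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of nodes linking in.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {n: v / norm for n, v in new_auth.items()}
        # Hub update: sum the authority scores of nodes linked to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return hub, auth


# Hypothetical links between three regions of one document.
edges = [("r1", "r3"), ("r2", "r3"), ("r1", "r2")]
hub, auth = hits(edges)
print("best authority:", max(auth, key=auth.get))
print("best hub:", max(hub, key=hub.get))
```

In this toy graph the region that receives the most links becomes the strongest authority, and the region that links to the strongest authorities becomes the strongest hub; a region-level variant would use these scores as the link evidence for the images each region contains.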