Showing papers in "Journal of Documentation" in 2001


Journal Article
TL;DR: The concept of ‘literacy’ is expanded to include newer forms of literacy, more suitable for complex information environments, and related concepts, including computer literacy, library literacy, network literacy, Internet literacy and hyper‐literacy are discussed.
Abstract: The concepts of ‘information literacy’ and ‘digital literacy’ are described, and reviewed, by way of a literature survey and analysis. Related concepts, including computer literacy, library literacy, network literacy, Internet literacy and hyper‐literacy are also discussed, and their relationships elucidated. After a general introduction, the paper begins with the basic concept of ‘literacy’, which is then expanded to include newer forms of literacy, more suitable for complex information environments. Some of these, for example library, media and computer literacies, are based largely on specific skills, but have some extension beyond them. They lead to general concepts, such as information literacy and digital literacy, which are based on knowledge, perceptions and attitudes, though reliant on the simpler skills‐based literacies.

886 citations


Journal Article
TL;DR: In this article, a theory of task-based information searching based on the empirical findings of the study is presented and corroborated hypotheses expand the ideas in Kuhlthau's model in the domain of information retrieval.
Abstract: The aim of this article is threefold: (1) to give a summary of empirical results reported earlier on relations between students’ problem stages in the course of writing their research proposals for a master’s thesis and the information sought, choice of search terms and tactics and relevance assessments of the information found for that task; (2) to show how the findings of the study refine Kuhlthau’s model of the information search process in the field of information retrieval (IR); and (3) to construe a tentative theory of a task‐based IR process based on the supported hypotheses. The results of the empirical studies show that there is a close connection between the students’ problem stages (mental model) in the task performance and the information sought, the search tactics used and the assessment of the relevance and utility of the information found. The corroborated hypotheses expand the ideas in Kuhlthau’s model in the domain of IR. A theory of task‐based information searching based on the empirical findings of the study is presented.

260 citations


Journal Article
TL;DR: This study sought to gain a better understanding of the variety of tasks that involve lawyers as a particular group of information workers, how they use information to accomplish their work, and the role mediators play in their process of information seeking and use.
Abstract: The study reported in this paper is part of a programme of ongoing research based on the model of the Information Search Process (ISP) developed in a series of prior studies by Kuhlthau. This study sought to gain a better understanding of the variety of tasks that involve lawyers as a particular group of information workers, how they use information to accomplish their work, and the role mediators play in their process of information seeking and use. Findings revealed that these lawyers frequently were involved in complex tasks that required a constructive process of interpreting, learning and creating. To accomplish these complex tasks, they preferred printed texts over computer databases primarily because computer databases required well‐specified requests and did not offer an option for examining a wide range of information at one time. These lawyers called for an active potential role for mediators in ‘just for me’ services. ‘Just for me’ services would encompass designing systems to provide a wider range of access more compatible with the process of construction, applying and developing principles of classification that would offer a more uniform system for organising and accessing files, and providing direction in filtering the overwhelming amount of information available on electronic resources.

183 citations


Journal Article
TL;DR: The paper elaborates the linguistic morphological typology for the purposes of IR research and studies how the indexes of synthesis and fusion could be used as practical tools in mono‐ and cross‐lingual IR research.
Abstract: This paper presents a morphological classification of languages from the IR perspective. Linguistic typology research has shown that the morphological complexity of every language in the world can be described by two variables, index of synthesis and index of fusion. These variables provide a theoretical basis for IR research handling morphological issues. A common theoretical framework is needed in particular because of the increasing significance of cross‐language retrieval research and CLIR systems processing different languages. The paper elaborates the linguistic morphological typology for the purposes of IR research. It studies how the indexes of synthesis and fusion could be used as practical tools in mono‐ and cross‐lingual IR research. The need for semantic and syntactic typologies is discussed. The paper also reviews studies made in different languages on the effects of morphology and stemming in IR.

92 citations


Journal Article
TL;DR: This paper explains at least some of the major problems related to the subject indexing process and proposes a new approach to understanding the process, which is ordinarily described as a process that takes a number of steps.
Abstract: This paper explains at least some of the major problems related to the subject indexing process and proposes a new approach to understanding the process, which is ordinarily described as a process that takes a number of steps. The subject is first determined, then it is described in a few sentences and, lastly, the description of the subject is converted into the indexing language. It is argued that this typical approach characteristically lacks an understanding of what the central nature of the process is. Indexing is not a neutral and objective representation of a document’s subject matter but the representation of an interpretation of a document for future use. Semiotics is offered here as a framework for understanding the “interpretative” nature of the subject indexing process. By placing this process within Peirce’s semiotic framework of ideas and terminology, a more detailed description of the process is offered which shows that the uncertainty generally associated with this process is created by the fact that the indexer goes through a number of steps and creates the subject matter of the document during this process. The creation of the subject matter is based on the indexer’s social and cultural context. The paper offers an explanation of what occurs in the indexing process and suggests that there is only little certainty to its result.

89 citations


Journal Article
TL;DR: In view of the increasing importance of the Internet as a publication/communication medium, the fluctuations in the result sets of Internet search engines can no longer be neglected.
Abstract: An empirical investigation of the consistency of retrieval through Internet search engines is reported. Thirteen engines are evaluated: AltaVista, EuroFerret, Excite, HotBot, InfoSeek, Lycos, MSN, NorthernLight, Snap, WebCrawler and three national Dutch engines: Ilse, Search.nl and Vindex. The focus is on a characteristic related to size: the degree of consistency to which an engine retrieves documents. Does an engine always present the same relevant documents that are, or were, available in its databases? We observed and identified three types of fluctuations in the result sets of several kinds of searches, many of them significant. These should be taken into account by users who apply an Internet search engine, for instance to retrieve as many relevant documents as possible, or to retrieve a document that was already found in a previous search, or to perform scientometric/bibliometric measurements. The fluctuations should also be considered as a complication of other research on the behaviour and performance of Internet search engines. In conclusion: in view of the increasing importance of the Internet as a publication/communication medium, the fluctuations in the result sets of Internet search engines can no longer be neglected.

84 citations


Journal Article
TL;DR: The principal findings were that with certain restrictions, WIFs can be calculated reliably, but do not correlate with accepted research rankings owing to the variety of material hosted on university servers.
Abstract: Web impact factors, the proposed web equivalent of impact factors for journals, can be calculated by using search engines. It has been found that the results are problematic because of the variable coverage of search engines as well as their ability to give significantly different results over short periods of time. The fundamental problem is that although some search engines provide a functionality that is capable of being used for impact calculations, this is not their primary task and therefore they do not give guarantees as to performance in this respect. In this paper, a bespoke web crawler designed specifically for the calculation of reliable WIFs is presented. This crawler was used to calculate WIFs for a number of UK universities, and the results of these calculations are discussed. The principal findings were that with certain restrictions, WIFs can be calculated reliably, but do not correlate with accepted research rankings owing to the variety of material hosted on university servers. Changes to the calculations to improve the fit of the results to research rankings are proposed, but there are still inherent problems undermining the reliability of the calculation. These problems still apply if the WIF scores are taken on their own as indicators of the general impact of any area of the Internet, but with care would not apply to online journals.

69 citations


Journal Article
TL;DR: The author analyses recent fieldwork in three online communities to discover to what extent they may be described as communities of practice, and to establish how they support participants’ learning.
Abstract: Communities of practice have been identified as sites where knowledge is created in organisations. The author reviews studies of situated learning and situated action and suggests that these two activities may characterise the learning process in communities of practice where they are supported by a distinctive ‘social’ infrastructure. She analyses recent fieldwork in three online communities (a digital library reference service, a virtual enterprise and an online shopping group) to discover to what extent they may be described as communities of practice, and to establish how they support participants’ learning.

55 citations


Journal Article
Blaise Cronin
TL;DR: Analysis of the aggregate data confirms the general impression that acknowledgement has become an institutionalised element of the scholarly communication process, reflecting the growing cognitive and structural complexity of contemporary research.
Abstract: Data were gathered on acknowledgements in five leading information science journals for the years 1991‐1999. The results were compared with data from two earlier studies of the same journals. Analysis of the aggregate data (1971‐1999) confirms the general impression that acknowledgement has become an institutionalised element of the scholarly communication process, reflecting the growing cognitive and structural complexity of contemporary research.

46 citations


Journal Article
Elin K. Jacob
TL;DR: Two different but complementary approaches to the investigation of situated cognition are presented: cognition‐as‐scaffolding and cognition‐as‐infrastructure, shown to build upon and extend T.D. Wilson’s contention that research is most productive when it attends to the social and organisational contexts of cognitive activity by focusing on the everyday world of work.
Abstract: One major aspect of T.D. Wilson’s research has been his insistence on situating the investigation of information behaviour within the context of its occurrence – within the everyday world of work. The significance of this approach is reviewed in light of the notion of embodied cognition that characterises the evolving theoretical episteme in cognitive science research. Embodied cognition employs complex external props such as stigmergic structures and cognitive scaffoldings to reduce the cognitive burden on the individual and to augment human problem‐solving activities. The cognitive function of the classification scheme is described as exemplifying both stigmergic structures and cognitive scaffoldings. Two different but complementary approaches to the investigation of situated cognition are presented: cognition‐as‐scaffolding and cognition‐as‐infrastructure. Classification‐as‐scaffolding views the classification scheme as a knowledge storage device supporting and promoting cognitive economy. Classification‐as‐infrastructure views the classification system as a social convention that, when integrated with technological structures and organisational practices, supports knowledge management work. Both approaches are shown to build upon and extend Wilson’s contention that research is most productive when it attends to the social and organisational contexts of cognitive activity by focusing on the everyday world of work.

46 citations


Journal Article
TL;DR: The history of adult education with its corresponding study modes is traced, and the experience of students is set within the wider framework of educational change in the information society.
Abstract: The information needs and practices of part‐time and distance‐learning students in higher education (HE) in the UK outside the Open University (OU) have been evaluated. In recent years, the government has pointed out the importance of individuals engaging in lifelong learning initiatives, in order to remain competitive in a globalised economy which draws increasingly on cumulative knowledge creation. In response, the HE sector in the UK offers a growing number of its programmes on a part‐time and/or distance‐learning basis for students who can remain in full‐ or part‐time employment while studying for their qualifications. We trace the history of adult education with its corresponding study modes, and set the experience of students within the wider framework of educational change in the information society. We distributed a questionnaire and conducted telephone and face‐to‐face interviews with a substantial sample of part‐time and distance learners. Based on our research findings, we question whether the i...

Journal Article
TL;DR: There appears to be no relationship between production costs and subscription prices of scholarly journals, and initiatives that aim to influence the structure of the market for scholarly journals with a view to driving prices down such as SPARC and HighWire Press are reviewed.
Abstract: This article explores recent developments in the production and delivery of scholarly journal articles in digital form. It identifies the key stakeholders as authors, publishers, librarians and end users. It explores their concerns with regard to the digital journal production and delivery chain. It also explores the interrelationships of different stakeholder groups and considers how their concerns accord or conflict. The paper goes on to review cost and pricing developments. There appears to be no relationship between production costs and subscription prices of scholarly journals. Journals are priced according to what the market will bear, but, at the same time, the market is inelastic. As a result, prices have consistently increased annually at a rate well above the general inflation rate for the last two decades. Digital publishing by publishers has done nothing to relieve this problem. The ‘serials crisis’ has been the impetus for a number of developments that aim to use digital technology to reduce costs for the HE sector. These include alternative models of journal production such as that proposed by Harnad, and initiatives that aim to influence the structure of the market for scholarly journals with a view to driving prices down such as SPARC and HighWire Press. These developments are reviewed.

Journal Article
TL;DR: The findings showed that the study methods together provided the domain knowledge needed to define the role of the thesaurus and design its content and structure.
Abstract: Design and construction of indexing languages require thorough knowledge and understanding of the information environment. This empirical study investigated a mixed set of methods (group interviews, recollection of information needs and word association tests to collect data; content analysis and discourse analysis to analyse data) to evaluate whether these methods collected the data needed for work domain oriented thesaurus design. The findings showed that the study methods together provided the domain knowledge needed to define the role of the thesaurus and design its content and structure. The study was carried out from a person‐in‐situation perspective. The findings reflected the information environment and made it possible to develop a thesaurus according to the characteristics of the work domain. It seemed more difficult to capture the needs of the individual user and adapt the thesaurus to individual characteristics.

Journal Article
TL;DR: Evidence suggests that some varieties of community can be constituted via electronic communication, but it is probably not possible to replicate those features of community that many people find lacking in modern life.
Abstract: Communities and neighbourhoods are often perceived to be under threat in the information society, as technological developments accelerate economic and social change. Technological developments may also provide a solution: ‘virtual communities’. There has been much debate about whether virtual communities can exist, but in the midst of such debates there has been little recognition that ‘community’ is a complex phenomenon. Many varieties of community exist, which can be categorised as moral, normative or proximate. Evidence suggests that some varieties of community can be constituted via electronic communication, but it is probably not possible to replicate those features of community that many people find lacking in modern life. Such a lack, and the desire for virtual communities as a response to that lack, are symptomatic of individuals’ disengagement from social and political participation. If the process continues, this suggests an information society constituted by segmented diversity with isolated pockets of sociability.

Journal Article
TL;DR: Pilot thesaurus InfoDEFT is introduced as a possible model for new online thesauri, which are semantically structured, encyclopedic and multilingual, and can be used as training sets for artificial learning programmes, thus increasing their volume considerably at relatively little extra cost.
Abstract: In the 21st century, multilingual tools are gaining importance as increasingly diverse user groups from different cultural and linguistic backgrounds seek access to equally diverse pieces of information. The authors of this paper believe that most current forms of multilingual information access are inadequate for this role, and that a new form of multilingual thesaurus is required. The core of this paper introduces their pilot thesaurus InfoDEFT as a possible model for new online thesauri, which are semantically structured, encyclopedic and multilingual. The authors conclude that while the manual construction of such thesauri is labour intensive and hence costly, pilot thesauri can be used as training sets for artificial learning programmes, thus increasing their volume considerably at relatively little extra cost.

Journal Article
TL;DR: A new concept‐based method to analyse the text characteristics of documents at varying relevance levels is introduced and it was shown that highly relevant documents benefit essentially more from the concept-based QE in ranking than marginally relevant documents.
Abstract: The increasing flood of documentary information through the Internet and other information sources challenges the developers of information retrieval systems. It is not enough that an IR system is able to make a distinction between relevant and non‐relevant documents. The reduction of information overload requires that IR systems provide the capability of screening the most valuable documents out of the mass of potentially or marginally relevant documents. This paper introduces a new concept‐based method to analyse the text characteristics of documents at varying relevance levels. The results of the document analysis were applied in an experiment on query expansion (QE) in a probabilistic IR system. Statistical differences in textual characteristics of highly relevant and less relevant documents were investigated by applying a facet analysis technique. In highly relevant documents a larger number of aspects of the request were discussed, searchable expressions for the aspects were distributed over a larger set of text paragraphs, and a larger set of unique expressions were used per aspect than in marginally relevant documents. A query expansion experiment verified that the findings of the text analysis can be exploited in formulating more effective queries for best match retrieval in the search for highly relevant documents. The results revealed that expanded queries with concept‐based structures performed better than unexpanded queries or ‘natural language’ queries. Further, it was shown that highly relevant documents benefit essentially more from the concept‐based QE in ranking than marginally relevant documents.

Journal Article
TL;DR: The aim of legal deposit is to ensure the preservation of and access to a nation’s intellectual and cultural heritage over time, and there is a global trend towards extending legal deposit to cover digital publications in order to maintain comprehensive national archives.
Abstract: The aim of legal deposit is to ensure the preservation of and access to a nation’s intellectual and cultural heritage over time. There is a global trend towards extending legal deposit to cover digital publications in order to maintain comprehensive national archives. However, including digital publications in legal deposit regulations is not enough to ensure the long‐term preservation of these publications. Indeed, there are many practical difficulties associated with the entire deposit process. Concepts, principles and practices that are accepted and understood in the print environment, such as publication, publisher, place of publication and edition, may have new meanings or no longer be appropriate in a networked environment. Mechanisms for identifying, selecting and depositing digital material either do not exist or are inappropriate for some kinds of digital publication. There is a great deal of work on developing digital preservation strategies; this is at an early stage. National and other deposit libraries are at the forefront of research and development in this area, often in partnership with other libraries, publishers and technology vendors. Most of this activity is of a technical nature. There is some work on developing policies and strategies for managing digital resources. However, not all management issues or users’ needs are being addressed.

Journal Article
TL;DR: An exception to the general rule that the three‐year synchronous impact factor is larger than the two‐year one is presented, based on data from the Chinese Science Citation Database (CSCD): in 1998, 42% of this database’s source journals did not follow the expected trend.
Abstract: Generally speaking, the three‐year synchronous impact factor is larger than the two‐year one. This follows from theoretical models derived from observations based on ISI’s database. In this article we present an exception to this general rule, based on data from the Chinese Science Citation Database (CSCD). In 1998 42% of this database’s source journals did not follow the expected trend. As a possible explanation we note that, contrary to intuition, in the CSCD the changes in the number of both publications and citations are largely independent. It is, however, not ruled out that the observed discrepancies are nothing but statistical fluctuations of the basic publication‐citation model.
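
Illustrative note (not part of the abstract): the comparison described above can be sketched in a few lines of Python. The sketch assumes the usual definition of the n‐year synchronous impact factor for a year Y, namely citations received in Y by items published in the n preceding years, divided by the number of items published in those years. The function name and all counts below are hypothetical; they are not taken from ISI or the CSCD.

```python
def synchronous_impact_factor(publications, citations, year, window):
    """n-year synchronous impact factor for `year` (assumed standard definition).

    publications: dict mapping publication year -> number of items published
    citations:    dict mapping publication year -> citations received in `year`
                  by the items published in that publication year
    window:       number of preceding publication years to include (2 or 3 here)
    """
    years = range(year - window, year)
    n_items = sum(publications[y] for y in years)
    if n_items == 0:
        raise ValueError("no items published in the citation window")
    n_cites = sum(citations[y] for y in years)
    return n_cites / n_items


# Hypothetical journal: 100 items per year, cited in 1998.
publications = {1995: 100, 1996: 100, 1997: 100}

# Case A: the oldest cohort (1995) is cited more heavily than the 2-year average.
typical_citations = {1995: 60, 1996: 50, 1997: 30}
# Case B: the oldest cohort is cited less than the 2-year average.
atypical_citations = {1995: 20, 1996: 50, 1997: 30}

for label, cites in [("A", typical_citations), ("B", atypical_citations)]:
    if2 = synchronous_impact_factor(publications, cites, 1998, 2)
    if3 = synchronous_impact_factor(publications, cites, 1998, 3)
    print(label, round(if2, 3), round(if3, 3), "3-year larger:", if3 > if2)
```

Under this definition the three‐year value exceeds the two‐year value exactly when the citation rate of the third‐year cohort (here 1995) is higher than the two‐year impact factor, which is the pattern case A follows and case B breaks.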

Journal Article
TL;DR: Some of the factors leading to changes in the organisational structures of academic libraries are described and an overview of trends, excluding convergence, discernible in North America is provided.
Abstract: As a result of rapid environmental changes, organisations of all types are rethinking their organisational structures in an attempt to provide greater effectiveness and efficiency. A few years ago business process re‐engineering (BPR) was considered the most promising way to restructure an organisation, but has become less popular as shortcomings associated with the process have become evident. Today, greater emphasis is being placed upon modifying the actual organisational structure. Most restructured organisations have moved away from rigid hierarchies to flatter, more flexible structures. Many of the same forces (including increased automation, changing information needs and expectations of users, reduced budgets and the need for staff to have more autonomy over their own work) that have precipitated the reshaping of other organisations have also affected academic libraries. This paper describes some of the factors leading to changes in the organisational structures of academic libraries and provides an overview of trends, excluding convergence, discernible in North America. The paper includes suggestions for steps to be taken to facilitate successful reorganisations, and comments on possible future developments that might radically alter the organisational structures of academic libraries.

Journal Article
TL;DR: The reasoning indicated that both country groups considered not only question‐related reasons but also source‐ and search‐strategy related reasons in making their decision, which raises questions about considering cultural differences in designing web search access mechanisms.
Abstract: This paper uses a mix of qualitative and quantitative methodology to analyse differences between Finnish and American web searchers (n=27 per country) in their choice of initial search strategies (direct address, subject directory and search engines) and their reasoning underlying these choices, with data gathered via a questionnaire. The paper looks at these differences for four types of questions with two variables: closed/open and predictable/unpredictable source of answer (n=16 questions per searcher; total n=864 questions). The paper found significant differences between the two groups’ initial search strategies and for three of the four types of questions. The reasoning varied across countries and questions as well, with Finns mentioning fewer reasons although both groups mentioned in aggregate a total of 1,284 reasons in twenty‐four reason categories. The reasoning indicated that both country groups considered not only question‐related reasons but also source‐ and search‐strategy related reasons in making their decision. The research raises questions about considering cultural differences in designing web search access mechanisms.

Journal Article
TL;DR: The paper verifies the hypothesis that a dominant central cluster exists consisting of the large Anglo‐American countries: USA, Canada and the UK and makes a strong case for adjusting or tuning the baseline impact to the actual national publication profiles when comparing NIFs of different countries.
Abstract: The paper investigates the advantages of graphical mapping of national research publication and citation profiles from scientific fields in order to provide additional information with respect to research performance. By means of multi‐dimensional scaling techniques national social science profiles from seventeen OECD countries and two periods, 1989‐1993 and 1994‐1998, are mapped, each profile represented by a vector of either publication volumes or citation values for nine social science fields. Aside from demonstrating the developments of publication volumes and citedness ranges as well as patterns, the graphical maps display clusters and similarities of national profiles over time. Combined with international rankings of averaged national impact factors (NIF) relative to the average world impact of field (WIF) for the same number of fields and periods, the graphical display supplies additional otherwise concealed information of the differences in research patterns between countries – even when the NIFs are quite similar. The analyses show that low Pearson correlation coefficients can be applied to flag extraordinary instances of either high or low national citation impacts during a period. Most importantly, the graphical maps make a strong case for adjusting or tuning the baseline impact to the actual national publication profiles when comparing NIFs of different countries. A new indicator, the Tuned Citation Impact Index (TCII) is proposed. It is constructed from the amount of expected citations a country ought to have received in each research field aggregated over its true profile. Common baseline profiles, like those of the world or EU, are consequently not regarded as the ideal benchmark. In the case illustrated by the journal publications of the social sciences the paper verifies the hypothesis that a dominant central cluster exists consisting of the large Anglo‐American countries: USA, Canada and the UK. A further hypothesis, that the smaller northern EU countries with English as the second language are located together and close to the central cluster on the publication maps is only partly satisfied in the second period. A third hypothesis, that countries located near the central cluster on the citation maps may hold high(er) NIFs is falsified.

Journal Article
TL;DR: Present and possible future developments in the techniques of document management are reviewed, the major ones being text retrieval and scanning and OCR.
Abstract: Present and possible future developments in the techniques of document management are reviewed, the major ones being text retrieval and scanning and OCR. Acquisition, indexing and thesauri, publishing and dissemination and the document management industry are also addressed. The emerging standards are reviewed and the impact of the Internet is analysed.

Journal Article
TL;DR: The semantic information theory, formulated by the philosopher Fred I. Dretske, is presented as a contribution to the discussion of metatheories and their practical implications in the field of library and information science.
Abstract: This article presents the semantic information theory, formulated by the philosopher Fred I. Dretske, as a contribution to the discussion of metatheories and their practical implications in the field of library and information science. Dretske’s theory is described in Knowledge and the flow of information. It is founded on mathematical communication theory but developed and elaborated into a cognitive, functionalistic theory, is individually oriented, and deals with the content of information. The topics are: the information process from perception to cognition, and how concept formation takes place in terms of digitisation. Other important issues are the concepts of information and knowledge, truth and meaning. Semantic information theory can be used as a frame of reference in order to explain, clarify and refute concepts currently used in library and information science, and as the basis for critical reviews of elements of the cognitive viewpoint in IR, primarily the notion of “potential information”. T...

Journal ArticleDOI
TL;DR: The results show that women value information highly and that they search for and use a wide range of categories of information in relation to education and their professional and personal life and that respondents tend to predict high levels of future use of European information.
Abstract: This paper describes the results of an exploratory survey by questionnaire distributed via a variety of information agencies, designed to investigate women’s information needs and patterns of information‐seeking behaviour in relation to the European Union. The results explore women’s attitudes to information and its value to them in a range of different life contexts, as well as their use of information agencies and of information and communications technologies. The results show that women value information highly and that they search for and use a wide range of categories of information in relation to education and their professional and personal life. Findings also suggest that respondents tend to predict high levels of future use of European information, in particular in relation to democratic participation and self development. Women were conscious of barriers to information access and suggested a range of measures that might improve access. They were generally positive about participating in trainin...

Journal ArticleDOI
TL;DR: Overall, PROPIE was found to have the potential both for enhancing the user’s interaction with information captured within e‐journals and for adding value to e‐documents in various ways.
Abstract: In this paper, we present a proposed information environment (PROPIE) for enhanced interaction and value‐adding of electronic documents (e‐documents). The design of PROPIE was based on a thorough user needs and requirements assessment in interacting with information through well‐documented findings, and a focus group with twelve participants to elicit features that were deemed desirable in future interactions. The design was also based on an earlier work which reviewed the advancements in various user interface (UI) technologies, visualisation and interactive techniques, and a consideration of novel information structuring and organisation techniques that pose important implications for the design of more advanced UIs. Providing a suite of novel features and interactive tools that can be flexibly combined, PROPIE allows users to apply multiple novel ways to query intuitively and navigate information in an e‐document. The querying and browsing processes in PROPIE are supported by various interactive and visualisation techniques. Users work within a visually sovereign, integrated environment for information gathering and organising, based on navigable, fractional information objects that are also affiliated with rich metadata and additional layers of value‐adding information. A set of interface mock‐ups was developed to demonstrate the potential of the environment in supporting the design of a new generation of electronic journals (e‐journals). We report here empirical results from a study conducted to obtain representative users’ feedback with regard to using PROPIE for interacting with e‐journals. Twenty‐two participants from a variety of academic backgrounds participated in the evaluation. Overall, PROPIE was found to have the potential both for enhancing the user’s interaction with information captured within e‐journals and for adding value to e‐documents in various ways.

Journal ArticleDOI
TL;DR: An application of the approach to medical records of a psychiatric hospital is presented, which helps physicians to extract knowledge about patients and diseases.
Abstract: This paper presents an approach for performing knowledge discovery in texts through qualitative and quantitative analyses of high‐level textual characteristics. Instead of applying mining techniques on attribute values, terms or keywords extracted from texts, the discovery process works over concepts identified in texts. Concepts represent real world events and objects, and they help the user to understand ideas, trends, thoughts, opinions and intentions present in texts. The approach combines a quasi‐automatic categorisation task (for qualitative analysis) with a mining process (for quantitative analysis). The goal is to find new and useful knowledge inside a textual collection through the use of mining techniques applied over concepts (representing text content). In this paper, an application of the approach to medical records of a psychiatric hospital is presented. The approach helps physicians to extract knowledge about patients and diseases. This knowledge may be used for epidemiological studies and for training professionals, and it may also be used to support physicians in diagnosing and evaluating diseases.

Journal ArticleDOI
TL;DR: The possibilities of elaborating pattern vectors that include the characteristics of different classes or categories of documents, using techniques based on those applied to the expansion of queries by relevance are shown.
Abstract: Automatic categorisation can be understood as a learning process during which a program recognises the characteristics that distinguish each category or class from others, i.e. those characteristics which the documents should have in order to belong to that category. As yet few experiments have been carried out with documents in Spanish. Here we show the possibilities of elaborating pattern vectors that include the characteristics of different classes or categories of documents, using techniques based on those applied to the expansion of queries by relevance; likewise, the results of applying these techniques to a collection of documents in Spanish are given. The same collection of documents was categorised manually and the results of both procedures were compared.
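
The idea of building a pattern vector per category with relevance-feedback-style techniques can be illustrated with a minimal sketch. It assumes a Rocchio-like formulation (centroid of the category's documents minus a down-weighted centroid of the other documents) and raw term frequencies; the abstract does not state the exact weighting used, so the function names, the `beta` and `gamma` parameters and the sample Spanish sentences are all illustrative assumptions.

```python
from collections import Counter
import math


def tf_vector(text):
    """Raw term-frequency vector from whitespace-tokenised, lower-cased text."""
    return Counter(text.lower().split())


def centroid(vectors):
    """Average term weight over a list of term-frequency vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    n = len(vectors) or 1
    return {t: w / n for t, w in total.items()}


def pattern_vector(positive, negative, beta=1.0, gamma=0.25):
    """Rocchio-like pattern vector (an assumption, not the paper's exact method):
    beta * centroid(positive) - gamma * centroid(negative), keeping positive weights."""
    pos, neg = centroid(positive), centroid(negative)
    terms = set(pos) | set(neg)
    vec = {t: beta * pos.get(t, 0.0) - gamma * neg.get(t, 0.0) for t in terms}
    return {t: w for t, w in vec.items() if w > 0}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def categorise(doc_vector, patterns):
    """Assign the category whose pattern vector is most similar to the document."""
    return max(patterns, key=lambda c: cosine(doc_vector, patterns[c]))


# Hypothetical training data: two tiny categories of Spanish text.
sports = [tf_vector(t) for t in ("el equipo gana el partido", "gol en el partido final")]
economy = [tf_vector(t) for t in ("el banco sube el interés", "la bolsa y el banco caen")]
patterns = {
    "deportes": pattern_vector(sports, economy),
    "economia": pattern_vector(economy, sports),
}
label = categorise(tf_vector("el partido del equipo"), patterns)
```

A comparison with manual categorisation, as described in the abstract, would then amount to checking how often `label` agrees with the human-assigned category.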

Journal ArticleDOI
TL;DR: It is argued that the expression inconsistency is a clear sign of a retrieval problem and that query expansion based on semantic relationships can significantly improve retrieval performance on free‐text sources.
Abstract: This article investigates how consistent different newspapers are in their choice of words when writing about the same news events. News articles on the same news events were taken from three Finnish newspapers and compared in regard to their central concepts and words representing the concepts in the news texts. Consistency figures were calculated for each set of three articles (the total number of sets was sixty). Inconsistency in words and concepts was found between news articles from different newspapers. The mean value of consistency calculated on the basis of words was 65 per cent; this however depended on the article length. For short news wires consistency was 83 per cent while for long articles it was only 47 per cent. At the concept level, consistency was considerably higher, ranging from 92 per cent to 97 per cent between short and long articles. The articles also represented three categories of topic (event, process and opinion). Statistically significant differences in consistency were found in regard to length but not in regard to the categories of topic. We argue that the expression inconsistency is a clear sign of a retrieval problem and that query expansion based on semantic relationships can significantly improve retrieval performance on free‐text sources.
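
A consistency figure for a set of three articles can be sketched as follows. The abstract does not give the formula, so this example assumes one common choice: the mean pairwise Jaccard overlap of the word (or concept) sets. The function names and the sample terms are hypothetical.

```python
from itertools import combinations


def pairwise_consistency(a, b):
    """Jaccard overlap of two term sets (an assumed measure, not the paper's)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def set_consistency(term_sets):
    """Mean pairwise consistency over all article pairs covering one news event."""
    pairs = list(combinations(term_sets, 2))
    if not pairs:
        raise ValueError("need at least two articles to compare")
    return sum(pairwise_consistency(a, b) for a, b in pairs) / len(pairs)


# Hypothetical central words from three articles on the same event.
articles = [
    {"strike", "union", "pay"},
    {"strike", "union", "wage"},
    {"strike", "walkout", "pay"},
]
word_level = set_consistency(articles)
```

Mapping words to shared concepts before comparison (for example, treating "pay" and "wage" as one concept) raises the overlap, which is in line with the higher concept-level consistency reported above.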

Journal ArticleDOI
TL;DR: Economic aspects of a national electronic reserve service (NERS) were explored using Ithink Analyst, a modelling software package and it is concluded that as a whole this service is inefficient.
Abstract: Economic aspects of a national electronic reserve service (NERS) were explored using Ithink Analyst, a modelling software package. A model was developed and simulations were used to monitor the effect of variations in the values of key model elements. The model was based on developments within the UK HE community and primarily on Higher Education ON demand (HERON), a national service which is part‐funded by the UK HE funding councils. The two principal activities of HERON are rights clearance and digitisation but the service is also building a repository of digitised texts which are stored for future use to avoid duplication of effort. Model elements were manipulated to compare the cost per student of providing reserve materials using this service with the cost per student of a traditional print service. The level of overlap in materials required by different universities using the service was varied as was the copyright fee paid to rights holders for use of their texts. The results suggest that this service is extremely expensive for a library when compared with an equivalent print service. Furthermore, if the service operated within the library budget for reserve materials, the income generated for publishers would be a fraction of that generated from selling print copies to libraries at the current rate. The authors conclude that as a whole this service is inefficient. Specific elements of the service, e.g. the copyright clearance function, may be efficient in a different context.

Journal ArticleDOI
TL;DR: Economic aspects of a resource discovery network (RDN) consisting of a centre and eight subject‐based hubs were explored using Ithink Analyst, a modelling software package, and the results suggest that with a combination of sponsorship and subscription income an RDN could succeed without grant funding within ten years of its launch.
Abstract: Economic aspects of a resource discovery network (RDN) consisting of a centre and eight subject‐based hubs were explored using Ithink Analyst, a modelling software package. A model was developed and simulations were used to monitor the effect of variations in the values of key model elements. The model was based on a recent report which suggested that an RDN could survive on a combination of grant funding and sponsorship. Model elements were manipulated to determine the level of sponsorship required for an RDN to be self‐sustaining within ten years if grant funding contributed 50% of required income. Additional simulations were used to explore the feasibility of subscription as an income source. The results suggest that with a combination of sponsorship and subscription income an RDN could succeed without grant funding within ten years of its launch.