
Showing papers in "Information Processing and Management in 1995"


Journal ArticleDOI
TL;DR: The relationships of task complexity, necessary information types, information channels, and sources are analyzed at the task level on the basis of a qualitative investigation using diaries and questionnaires.
Abstract: It is nowadays generally agreed that a person's information seeking depends on his or her tasks and the problems encountered in performing them. The relationships of broad job types and information-seeking characteristics have been analyzed both conceptually and empirically, mostly through questionnaires after task performance rather than during task performance. In this article, the relationships of task complexity, necessary information types, information channels, and sources are analyzed at the task level on the basis of a qualitative investigation. Tasks were categorized in five complexity classes and information into problem information, domain information, and problem-solving information. Moreover, several classifications of information channels and sources were utilized. The data were collected in a public administration setting through diaries, which were written during task performance, and questionnaires. The findings were structured into work charts for each task and summarized in qualitative process description tables for each task complexity category. Quantitative indices further summarizing the results were also computed. The findings indicate systematic and logical relationships among task complexity, types of information, information channels, and sources.

852 citations


Journal ArticleDOI
TL;DR: A system that performs domain-independent automatic condensation of news from a large commercial news service encompassing 41 different publications is described, with the result that the lead-based summaries outperformed the “intelligent” summaries significantly.
Abstract: As electronic information access becomes the norm, and the variety of retrievable material increases, automatic methods of summarizing or condensing text will become critical. This paper describes a system that performs domain-independent automatic condensation of news from a large commercial news service encompassing 41 different publications. This system was evaluated against a system that condensed the same articles using only the first portion of the texts (the lead), up to the target length of the summaries. Three lengths of articles were evaluated for 250 documents by both systems, totalling 1500 suitability judgements in all. The outcome of perhaps the largest evaluation of human vs machine summarization performed to date was unexpected. The lead-based summaries outperformed the “intelligent” summaries significantly, achieving acceptability ratings of over 90%, compared to 74.4%. This paper briefly reviews the literature, details the implications of these results, and addresses the remaining hopes for content-based summarization. We expect the results presented here to be useful to other researchers currently investigating the viability of summarization through sentence selection heuristics.
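
For context, the lead-based baseline that won this comparison is almost trivial to implement. A minimal sketch (function and parameter names are ours, not the paper's):

```python
import re

def lead_summary(text, target_words=100):
    """Keep sentences from the start of the article until the
    target length is reached (hypothetical parameter names)."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    summary, count = [], 0
    for sentence in sentences:
        n = len(sentence.split())
        if summary and count + n > target_words:
            break
        summary.append(sentence)
        count += n
    return ' '.join(summary)
```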

432 citations


Journal ArticleDOI
TL;DR: Several optimization techniques that can be used to reduce evaluation costs are discussed, and simulation results are presented comparing the performance of these techniques when evaluating natural language queries against a collection of full-text legal materials.
Abstract: This paper discusses the two major query evaluation strategies used in large text retrieval systems and analyzes the performance of these strategies. We then discuss several optimization techniques that can be used to reduce evaluation costs and present simulation results to compare the performance of these optimization techniques when evaluating natural language queries with a collection of full text legal materials.
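
As an illustration of the kind of evaluation being optimized, here is a minimal term-at-a-time sketch with score accumulators; processing terms in decreasing weight order is what makes early-termination optimizations of the sort studied here possible (the names and the idf weighting are our assumptions, not the paper's exact strategies):

```python
def evaluate_query(query_terms, postings, idf):
    """Minimal term-at-a-time evaluation with score accumulators.
    postings: term -> {doc_id: term_frequency}; idf: term -> weight.
    Processing terms in decreasing weight order lets 'quit'-style
    optimizations stop early once low-weight terms can no longer
    change the top of the ranking."""
    accumulators = {}
    for term in sorted(query_terms, key=lambda t: idf.get(t, 0.0), reverse=True):
        weight = idf.get(term, 0.0)
        for doc_id, tf in postings.get(term, {}).items():
            accumulators[doc_id] = accumulators.get(doc_id, 0.0) + tf * weight
    return sorted(accumulators.items(), key=lambda kv: kv[1], reverse=True)
```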

300 citations


Journal ArticleDOI
TL;DR: A modified technique is presented that attempts to match the likelihood of retrieving a document of a certain length to the likelihood of documents of that length being judged relevant, and it is shown that this technique yields significant improvements in retrieval effectiveness.
Abstract: In the TREC collection -a large full-text experimental text collection with widely varying document lengths -we observe that the likelihood of a document being judged relevant by a user increases with the document length. We show that a retrieval strategy, such as the vector-space cosine match, that retrieves documents of different lengths with roughly equal probability, will not optimally retrieve useful documents from such a collection. We present a modified technique that attempts to match the likelihood of retrieving a document of a certain length to the likelihood of documents of that length being judged relevant, and show that this technique yields significant improvements in retrieval effectiveness.
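
This line of work was later refined in the literature into pivoted length normalization. A hedged sketch of that idea, assuming a slope/pivot parameterization that may differ from the paper's exact technique:

```python
def pivoted_norm(doc_norm, pivot, slope=0.7):
    """Blend the document's own normalization factor with a
    collection-wide pivot (e.g. the average factor), so that long
    documents are penalized less than pure cosine normalization
    would penalize them. Slope value is illustrative."""
    return (1.0 - slope) * pivot + slope * doc_norm

# score(d, q) = sum_of_matching_term_weights / pivoted_norm(norm(d), avg_norm)
```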

215 citations



Journal ArticleDOI
TL;DR: A list of characteristics (dimensions) that are crucial for data model quality is presented and discussed, organized into six categories: content, scope, level of detail, composition, consistency, and reaction to change.
Abstract: Data quality is usually associated with the quality of data values. But even perfectly correct data values are of little use if they are based on a deficient data model. The purpose of this paper is to present and discuss a list of characteristics (dimensions) that are crucial for data model quality. We single out 14 quality dimensions, organized into six categories: content, scope, level of detail, composition, consistency, and reaction to change. Two types of correlation among dimensions called “reinforcements” and “tradeoffs” are recognized and discussed as well.

97 citations


Journal ArticleDOI
TL;DR: An approach to summary generation is described that opportunistically folds information from multiple facts into a single sentence using concise linguistic constructions, allowing the construction of compact summaries containing complex sentences that pack in information.
Abstract: Summaries typically convey maximal information in minimal space. In this paper, we describe an approach to summary generation that opportunistically folds information from multiple facts into a single sentence using concise linguistic constructions. Unlike previous work in generation, how information gets added into a summary depends in part on constraints from how the text is worded so far. This approach allows the construction of concise summaries, containing complex sentences that pack in information. The resulting summary sentences are, in fact, longer than sentences generated by previous systems. We describe two applications we have developed using this approach, one of which produces summaries of basketball games (STREAK) while the other (PLANDOC) produces summaries of telephone network planning activity; both systems summarize input data as opposed to full text. The applications implement opportunistic summary generation using complementary approaches. STREAK uses revision, creating a draft of essential facts and then using revision rules constrained by the draft wording to add in additional facts as the text allows. PLANDOC uses discourse planning, looking ahead in its text plan to group together facts which can be expressed concisely using conjunction and deleting repetitions. In this paper, we describe the problems for summary generation, the two domains, the linguistic constructions that the systems use to convey information concisely and the textual constraints that determine what information gets included.
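
The conjunction-based aggregation attributed to PLANDOC can be illustrated with a toy sketch (data shapes and names are ours; the real system plans over a full discourse representation):

```python
from collections import defaultdict

def conjoin(facts):
    """Group facts that share an action and express them once, with
    their objects conjoined, instead of as one repetitive sentence
    per fact."""
    groups = defaultdict(list)
    for action, obj in facts:
        groups[action].append(obj)
    sentences = []
    for action, objs in groups.items():
        phrase = objs[0] if len(objs) == 1 else ", ".join(objs[:-1]) + " and " + objs[-1]
        sentences.append(f"{action} {phrase}.")
    return sentences

# conjoin([("Installed fiber cable on", "route A"),
#          ("Installed fiber cable on", "route B")])
# -> ['Installed fiber cable on route A and route B.']
```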

81 citations


Journal ArticleDOI
TL;DR: A system, SumGen, is described, which selects key information from an event database by reasoning about event frequencies, frequencies of relations between events, and domain specific importance measures and then aggregates similar information and plans a summary presentation tailored to a stereotypical user.
Abstract: Summarization entails analysis of source material, selection of key information, condensation of this, and generation of a compact summary form. While there have been many investigations into the automatic summarization of text, relatively little attention has been given to the summarization of information from structured information sources such as data or knowledge bases, despite this being a desirable capability for a number of application areas including report generation from databases (e.g. weather, financial, medical) and simulations (e.g. military, manufacturing, economic). After a brief introduction indicating the main elements of summarization and referring to some illustrative approaches to it, this article considers specific issues in the generation of text summaries of event data. It describes a system, SumGen, which selects key information from an event database by reasoning about event frequencies, frequencies of relations between events, and domain specific importance measures. The article describes how SumGen then aggregates similar information and plans a summary presentation tailored to a stereotypical user. Finally, the article evaluates SumGen performance, and also that of a much more limited second summariser, by assessing information extraction by 22 human subjects from both source and summary texts. This evaluation shows that the use of SumGen reduces average sentence length by approx. 15%, document length by 70%, and time to perform information extraction by 58%.
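
A loose sketch of frequency-plus-importance selection, with the caveat that SumGen also reasons about relations between events and that our scoring rule is only a stand-in for the paper's measures:

```python
from collections import Counter

def select_key_events(events, importance, top_k=10):
    """Score each event by how common its type is in the database,
    weighted by a domain-specific importance factor, and keep the
    top-scoring ones. 'importance' and the scoring rule are our
    stand-ins, not SumGen's actual measures."""
    freq = Counter(event["type"] for event in events)
    ranked = sorted(events,
                    key=lambda e: freq[e["type"]] * importance.get(e["type"], 1.0),
                    reverse=True)
    return ranked[:top_k]
```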

77 citations


Journal ArticleDOI
TL;DR: Probabilistic retrieval, based on BI assumptions and applied to simple subject descriptions of documents and queries, can retrieve all relevant documents and only relevant documents, when term relevance weights are computed accurately.
Abstract: Computing formulas for binary independent (BI) term relevance weights are evaluated as a function of query representations and retrieval expectations in the CF database. Query representations consist of the limited set of terms appearing in each query statement and the complete set of terms appearing in the database. Retrieval expectations include comprehensive searches, for which many relevant documents are sought, and specific searches, for which only a few documents have merit. Conventional computing equations, which are known to overestimate term relevance weights, are shown to produce mediocre results for all combinations of query representations and retrieval expectations. Modified computing equations, which do not overestimate relevance weights, produce essentially perfect retrieval results for both comprehensive and specific searches, when the query representation is complete. Probabilistic retrieval, based on BI assumptions and applied to simple subject descriptions of documents and queries, can retrieve all relevant documents and only relevant documents, when term relevance weights are computed accurately.
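
The conventional BI weight the paper critiques is usually computed in the Robertson/Sparck Jones form with 0.5 corrections; a sketch of that textbook estimate (the paper's modified equations estimate the probabilities differently to avoid overestimation):

```python
import math

def bi_weight(r, R, n, N):
    """Conventional binary-independence relevance weight with the
    usual 0.5 corrections. r: relevant docs containing the term,
    R: total relevant docs, n: docs containing the term,
    N: collection size."""
    p = (r + 0.5) / (R + 1.0)          # P(term present | relevant)
    q = (n - r + 0.5) / (N - R + 1.0)  # P(term present | non-relevant)
    return math.log((p * (1.0 - q)) / (q * (1.0 - p)))
```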

68 citations


Journal ArticleDOI
TL;DR: The discussion begins with the empirical model and aims at a computational model which is implementable without determining the concrete implementation tools (the design model according to KADS), and concludes that a small simulation model of professional summarizing is feasible.
Abstract: Four working steps taken from a comprehensive empirical model of expert abstracting are studied in order to prepare an explorative implementation of a simulation model. It aims at explaining the knowledge processing activities during professional summarizing. Following the case-based and holistic strategy of qualitative empirical research, we develop the main features of the simulation system by investigating in detail a small but central test case—four working steps where an expert abstractor discovers what the paper is about and drafts the topic sentence of the abstract. Following the KADS methodology of knowledge engineering, our discussion begins with the empirical model (a conceptual model in KADS terms) and aims at a computational model which is implementable without determining the concrete implementation tools (the design model according to KADS). The envisaged solution uses a blackboard system architecture with cooperating object-oriented agents representing cognitive strategies and a dynamic text representation which borrows its conceptual relations in particular from RST (Rhetorical Structure Theory). As a result of the discussion we feel that a small simulation model of professional summarizing is feasible.

67 citations


Journal ArticleDOI
TL;DR: The notion of semantic links is highlighted and it is shown how the semantic content of hypertext links can be used for retrieval purposes and indexing and retrieval algorithms that exploit the link content in addition to the content of the nodes are presented.
Abstract: Hypermedia links support the manual browsing through large hypertext or hypermedia collections; however, retrieving specific portions of information in such a collection cannot be achieved by browsing only. Retrieval mechanisms are necessary. In this article we highlight the notion of semantic links and show how the semantic content of hypertext links can be used for retrieval purposes. We present indexing and retrieval algorithms that exploit the link content in addition to the content of the nodes. Retrieval status values are obtained from a combination of conventional information retrieval and constrained spreading activation techniques. The results of some retrieval experiments in a hypertext test collection are presented: They are clearly superior to those obtained when the links are ignored. Since hypermedia collections basically have the same structure as hypertexts, the hope is that the same techniques can be applied to hypermedia information with the same effect of improving the retrieval results. Moreover, we hope that the results can be improved further by more sophisticated indexing algorithms.
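
Constrained spreading activation can be sketched compactly; the per-link-type weights below stand in for the "semantic content" of the links, and the hop limit is one typical constraint (all names are illustrative, not the article's exact algorithm):

```python
def spread_activation(graph, seeds, link_weight, hops=2, decay=0.5):
    """Constrained spreading activation over a typed link graph.
    graph: node -> [(neighbor, link_type), ...]
    seeds: node -> initial retrieval status value
    link_weight: link_type -> attenuation factor; hops bounds the spread."""
    scores = dict(seeds)
    frontier = dict(seeds)
    for _ in range(hops):
        nxt = {}
        for node, act in frontier.items():
            for nbr, ltype in graph.get(node, []):
                gain = act * decay * link_weight.get(ltype, 0.0)
                if gain > 0.0:
                    nxt[nbr] = nxt.get(nbr, 0.0) + gain
        for nbr, gain in nxt.items():
            scores[nbr] = scores.get(nbr, 0.0) + gain
        frontier = nxt
    return scores
```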

Journal ArticleDOI
TL;DR: A descriptive model of design that characterizes communication among users, designers, and developers as they create an artifact was developed and is a first step towards a predictive design model that suggests strategies which may help participants interact more effectively and ultimately improve the quality of design outcomes and the design process.
Abstract: Many information system design situations today include users, designers, and developers who, with their own unique group and individual perspectives, need to interact so that they can come to a working understanding of how the information system being developed will coexist with and ideally support patterns of work activities, social groups, and personal beliefs. In these situations, design is fundamentally an interactive process that requires communication among users, designers, and developers. However, communication among these groups is often difficult, although of paramount importance to design outcomes. Through a qualitative analysis of three design situations (a house, an expert system, and a telecommunications network architecture and management system), a descriptive model of design that characterizes communication among users, designers, and developers as they create an artifact was developed. The model describes design phases, roles, themes, and intergroup communication networks as they evolve throughout the design process and characterizes design as a process of “contested collaboration”. It is a first step towards a predictive design model that suggests strategies which may help participants interact more effectively and ultimately improve the quality of design outcomes and the design process.

Journal ArticleDOI
TL;DR: The findings demonstrate that the r-lohi, wpq, emim, and porter algorithms have similar performance in bringing good terms to the top of a ranked list of terms for query expansion, however, further evaluation of the algorithms in different environments is needed before these results can be generalized.
Abstract: The performance of eight ranking algorithms was evaluated with respect to their effectiveness in ranking terms for query expansion. The evaluation was conducted within an investigation of interactive query expansion and relevance feedback in a real operational environment. This study focuses on the identification of algorithms that most effectively take cognizance of user preferences. User choices (i.e. the terms selected by the searchers for the query expansion search) provided the yardstick for the evaluation of the eight ranking algorithms. This methodology introduces a user-oriented approach in evaluating ranking algorithms for query expansion in contrast to the standard, system-oriented approaches. Similarities in the performance of the eight algorithms and the ways that these algorithms rank terms were the main focus of this evaluation. The findings demonstrate that the r-lohi, wpq, emim, and porter algorithms have similar performance in bringing good terms to the top of a ranked list of terms for query expansion. However, further evaluation of the algorithms in different (e.g. full-text) environments is needed before these results can be generalized beyond the context of the present study.
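
Of the algorithms named, wpq is perhaps the most widely cited; in its usual form in the literature (with the standard 0.5 corrections) it scores a candidate expansion term t as

```latex
wpq_t = \left(\frac{r_t}{R} - \frac{n_t - r_t}{N - R}\right)
\log \frac{(r_t + 0.5)\,(N - n_t - R + r_t + 0.5)}
          {(n_t - r_t + 0.5)\,(R - r_t + 0.5)}
```

where r_t is the number of known relevant documents containing t, R the number of known relevant documents, n_t the number of documents containing t, and N the collection size.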

Journal ArticleDOI
TL;DR: In this article, a statistical model for citation processes is presented as a particular version of a nonhomogeneous birth process, where the mean value function and transition probabilities are derived from known and estimated parameters.
Abstract: A statistical model for citation processes is presented as a particular version of a nonhomogeneous birth process. The mean value function E(X(t) − X(s) | X(s) = i) and special transition probabilities such as P(X(t) − X(s) > 0 | X(s) = 0) and P(X(t) − X(s) = 0 | X(s) > 0) give essential information on the change of citation impact in time. It is shown that the mean value functions and transition probabilities can readily be calculated on the basis of known and estimated parameters. The analysis is illustrated by five examples. The citation rate for papers published in 1980 has been recorded in the period 1980 through 1989 in five science fields. The model provides sufficiently good approximations for both the empirical mean value functions and the transition frequencies for the years 1985 and 1989, based on the number of citations the papers have received until 1982.

Journal ArticleDOI
TL;DR: Both intersearcher and intrasearcher consistency grew most immediately after a rather simple evaluation of linguistic expressions, and statistically very significant differences in consistency were found according to the types of search environments and search requests.
Abstract: Intersearcher and intrasearcher consistency in the selection of search concepts and search terms are considered. The article is based on an empirical study where 32 searchers from four different types of search environments analyzed altogether 12 search requests of four different types in two separate test situations, between which two months elapsed. Statistically very significant differences in consistency were found according to the types of search environments and search requests. Consistency was also considered according to the extent of the scope of the search concept. At Level I, search terms were compared character by character. At Level II, different search terms were accepted as the same search concept with a rather simple evaluation of linguistic expressions. At Level III, in addition to Level II, the hierarchical approach of the search request was also controlled. At Level IV, different search terms were accepted as the same search concept with a broad interpretation of the search concept. Both intersearcher and intrasearcher consistency grew most immediately after a rather simple evaluation of linguistic expressions.
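
A common way to quantify such consistency between two searchers is set overlap; the study's own formula may differ, but the level structure maps naturally onto what counts as "the same" element:

```python
def consistency(terms_a, terms_b):
    """Pairwise overlap of two searchers' term sets (intersection
    over union). At Level I the sets would hold raw search terms
    compared character by character; at Levels II-IV, terms mapped
    to the same search concept first."""
    a, b = set(terms_a), set(terms_b)
    return len(a & b) / len(a | b) if a | b else 1.0
```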


Journal ArticleDOI
TL;DR: In this paper, the authors used the human approach to examine the sources and effectiveness of search terms selected during mediated interactive information retrieval, and found that terms selected from particular database fields of retrieved items during term relevance feedback were more effective than search terms from intermediary, database thesauri or users' domain knowledge during the interaction, but not as effective as terms from the users' written question statements.
Abstract: Research into both the algorithmic and human approaches to information retrieval is required to improve information retrieval system design and database searching effectiveness. This study uses the human approach to examine the sources and effectiveness of search terms selected during mediated interactive information retrieval. The study focuses on determining the retrieval effectiveness of search terms identified by users and intermediaries from retrieved items during term relevance feedback. Results show that terms selected from particular database fields of retrieved items during term relevance feedback (TRF) were more effective than search terms from the intermediary, database thesauri or users' domain knowledge during the interaction, but not as effective as terms from the users' written question statements. Implications for the design and testing of automatic relevance feedback techniques that place greater emphasis on these sources and the practice of database searching are also discussed.

Journal ArticleDOI
TL;DR: A first prototype is presented that provides a generic basis for a functionality allowing an editor working on producing a large-scale encyclopedia to access dynamically selected aspects of the contents of submitted articles, in effect achieving a “summarization” of that content.
Abstract: In this paper we focus on an experimental application scenario in which the presentation of appropriately selected information is crucial. The scenario involves an editor working on producing a large-scale encyclopedia on the basis of a large number of submitted source articles. In order to make editorial decisions, that editor needs to have access to dynamically selected aspects of the contents of those articles—a “summarization” of that content needs to be achieved. We present a first prototype that provides a generic basis for such a functionality. The essential features of our system supporting this functionality build on multilingual, genre-driven automatic text generation. The central role of genre in this model is motivated and briefly illustrated by considering examples of generated texts. The scenario as a whole naturally extends to allow considerations of the information needs of the information-seeking non-expert and to open information systems.

Journal ArticleDOI
TL;DR: The preliminary investigation of the concept of impact of information and the research questions raised by its assessment suggest that the most significant impact may be found in the transformation of knowledge structures at the deep paradigmatic level as a result of information-as-contents.
Abstract: A renewal of interest in the theory of information seems to emerge from a series of recent publications. A number of them advocate a shift toward a cognitive perspective. The preliminary investigation of the concept of impact of information and the research questions raised by its assessment suggest that the most significant impact may be found in the transformation of knowledge structures at the deep paradigmatic level as a result of information-as-contents. This leads to the proposal of a revised formulation of Brookes' Fundamental equation and of possible approaches for describing the attributes of the beneficiaries and their knowledge structure.
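
For reference, Brookes' fundamental equation in its usual form states that a knowledge structure K[S], modified by an increment of information ΔI, becomes a changed structure:

```latex
K[S] + \Delta I = K[S + \Delta S]
```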

Journal ArticleDOI
TL;DR: German non-native speaker (GNNS) abstracts were author translations and contained structural and linguistic inadequacies which may hamper their general readability for the scientific community; abstracting should therefore be systematically incorporated into language courses for the medical profession and for technical translators.
Abstract: Studies on contrastive genre analysis have become a current issue in research on languages for specific purposes (LSP) and are intended to economize specialist communication. The present article compares the formal schemata and linguistic devices of German abstracts and their English equivalents, written by German medical scholars, with English native speaker (NS) abstracts. The source material is a corpus of 20 abstracts taken from German medical journals representing different degrees of specialism/professionalism. The method of linguistic analysis includes (1) the overall length of articles/abstracts, (2) the representation/arrangement of “moves”, and (3) the linguistic means (complexity of sentences, finite verb forms, active and passive voice, tenses, linking words, and lexical hedging). Results show no correlation between the length of articles and the length of abstracts. In contrast to NS author abstracts, the move “Background information” predominated in the structure of the studied German non-native speaker (GNNS) abstracts, whereas “Purpose of study” and “Conclusions” were not clearly stated. In linguistic terms, the German abstracts frequently contained lexical hedges, complex and enumerating sentence structures, passive voice and past tense, as well as linkers of adversative, concessive and consecutive character. The GNNS English equivalent abstracts were author translations and contained structural and linguistic inadequacies which may hamper the general readability for the scientific community. Therefore abstracting should be systematically incorporated into language courses for the medical profession and for technical translators.


Journal ArticleDOI
TL;DR: An international project to test suitable approaches for assessing the benefits derived from all types of information activities in developing countries is described, together with a review, from a personal perspective, of the many research questions related to the concepts of information and development, the impact of information, and the methodological and practical constraints on its assessment.
Abstract: At a time when competition for scarce resources is tougher than ever, policy-makers, decision-makers, and information specialists alike can no longer be satisfied with general assumptions that describe the role of information in the achievement of individual, organizational and societal goals as being “a critical resource”. An international project called “Impact of information on development” is being carried out by the International Development Research Centre (IDRC, Canada) with a view to testing suitable approaches for the assessment of the benefits derived from all types of information activities in the developing countries. This effort aims at assembling more solid evidence of the benefits associated with information. The rationale for the project and its progress to date are briefly presented. The main features of the suggested problem- and constituency-centered approach to impact assessment are discussed. Future developments in the project call for the establishment of a decentralized research network. While the IDRC project has to be focused exclusively on developing countries, the issues raised are in fact of universal significance. On the basis of the outcome of the project so far, the paper attempts to present a review, from a personal perspective, of the many research questions related to the concepts of information and development, the impact of information, and the methodological and practical constraints in its assessment.

Journal ArticleDOI
TL;DR: A gender effect was found for three of the variables (perceived immediacy, decision confidence, and effectiveness), and the results provide some support for the notion that perception changes over time.
Abstract: This study investigates group members' perceived communication outcome variables across three CMC systems. The outcome variables were perceived satisfaction, decision confidence, immediacy, effectiveness, and system ease of use. The role of communication medium and individual characteristics, specifically gender, in perception differences was explored. The results lend support to the argument that perception of communication outcomes differs by CMC medium, gender, and group gender composition status. Specifically, there were medium effects for all five outcome variables. A gender effect was found for three of the variables: perceived immediacy, decision confidence, and effectiveness. Interaction of medium and gender, and group composition status effects were also found for these three variables. The results also provide some support for the notion that perception changes over time. The discussion addresses implications for mixed gender groups and CMC interaction.

Journal ArticleDOI
TL;DR: The decision making process is overviewed, an MSS for supporting the process is presented, and the influences of the MSS on the process and outcomes of health care decision making are assessed.
Abstract: Since accurate decisions are required for effective management, much research has focused on information systems that support decision making. Recently, this research has engendered frameworks, such as the management support system (MSS), that are designed to provide comprehensive and integrated support for the decision making process. Few, if any, studies have empirically measured the effects of these frameworks on decision making. This article offers empirical evidence on MSS effectiveness. It overviews the decision making process, presents an MSS for supporting the process, and assesses the influences of the MSS on the process and outcomes of health care decision making. The paper also examines the implications of the analyses for information systems research and health care practice.

Journal ArticleDOI
TL;DR: In this paper, the authors report findings of a study that sought better understanding of communications and other interactions within research teams composed of individuals from a variety of cultures, focusing on economically developing countries, addressed the questions: What kinds of processes do multicultural team researchers use to develop, exchange, and disseminate data and information, and which factors affect the quality and outcome of such processes? Key concepts are introduced and assumptions regarding information flows and technology transfer are examined.
Abstract: This paper reports findings of a study that sought better understanding of communications and other interactions within research teams composed of individuals from a variety of cultures. The study, focusing on economically developing countries, addressed the questions: (1) What kinds of processes do multicultural team researchers use to develop, exchange, and disseminate data and information, and (2) which factors affect the quality and outcome of such processes? Key concepts are introduced and assumptions regarding information flows and technology transfer are examined. Research in technology transfer and the diffusion of innovation within multicultural settings is reviewed briefly, as are the settings of multicultural research and recent trends in the operation of multicultural teams. Research methods for the study were descriptive and exploratory, employing a survey and in-person interviews. Preliminary analysis of data identified five major themes, characterizing the external environment of the respondents' projects, which may form an organizational framework for future research: sociopolitical climate, cultural climate, development-related trends within countries, information climate, and behavior patterns and attitudes. The study addressed facilitators of research and dissemination, barriers and their effects, and team research approaches considered by participants to be most likely to succeed in the future.


Journal ArticleDOI
TL;DR: An effective methodology for determining normal forms by employing a cost/benefit model coupled with a decision tree is proposed, and the resulting cost/benefit analysis enables database analysts to produce more cost-effective normalized databases.
Abstract: During the information systems development process within an organization, the data resource is typically analyzed in the form of a data model. During this data analysis phase, the data model is further refined so that it obeys certain rules of good behavior. Normalization is the process of grouping data into such well-refined structures. Determining an appropriate normal form has not been clear to database systems analysts. This paper proposes an effective methodology for determining normal forms by employing a cost/benefit model coupled with a decision tree. Three primary variables that impact the benefits and costs of normalization are addressed. The resulting cost/benefit analysis enables database analysts to produce more cost-effective normalized databases.
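
A deliberately toy version of such a cost/benefit trade-off, with variable names that are ours rather than the paper's three variables:

```python
def normalization_benefit(update_freq, anomaly_risk, query_join_cost):
    """Hypothetical toy trade-off: moving to a higher normal form
    pays off when updates are frequent and redundancy anomalies are
    costly, but loses when queries must reassemble the data through
    extra joins."""
    benefit = update_freq * anomaly_risk  # anomalies avoided
    cost = query_join_cost                # joins added at query time
    return benefit - cost

# Normalize further only while normalization_benefit(...) stays positive.
```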

Journal ArticleDOI
TL;DR: The first phase of research is reported that demonstrates how a neural network augmented by an inductive learning technique results in effective information retrieval performance in the areas that demand flexible inferencing and reasoning when incomplete queries and inconsistent indexing problems are present.
Abstract: Traditional information retrieval systems based on Boolean logic suffer from two inherent problems: (1) inaccurate or incomplete query representation, and (2) inconsistent indexing. While many researchers have demonstrated that neural networks can solve the incomplete query problems for information retrieval, the inconsistent indexing problem still remains unsolved. In this paper, we present a hybrid methodology of integrating an inductive learning technique with a neural network (connectionist model) in order to solve both inconsistent indexing and incomplete query problems. Since an inductive learning technique has the ability to identify the most significant document index terms with various levels of relationship to their semantic significance, it provides a possible solution to the problem of inconsistent indexing. This paper reports the first phase of research that demonstrates how a neural network augmented by an inductive learning technique results in effective information retrieval performance in the areas that demand flexible inferencing and reasoning when incomplete queries and inconsistent indexing problems are present.

Journal ArticleDOI
TL;DR: A software tool to decipher abbreviations by finding their whole-word equivalents or “disabbreviations” is described; it uses a large English dictionary and a rule-based system to guess the most-likely candidates, with users having final approval.
Abstract: Abbreviations adversely affect information retrieval and text comprehensibility. We describe a software tool to decipher abbreviations by finding their whole-word equivalents or “disabbreviations”. It uses a large English dictionary and a rule-based system to guess the most-likely candidates, with users having final approval. The rule-based system uses a variety of knowledge to limit its search, including phonetics, known methods of constructing multiword abbreviations, and analogies to previous abbreviations. The tool is especially helpful for retrieval from computer programs, a form of technical text in which abbreviations are notoriously common; disabbreviation of programs can make programs more reusable, improving software engineering. It also helps decipher the often-specialized abbreviations in technical captions. Experimental results confirm that the prototype tool is easy to use, finds many correct disabbreviations, and improves text comprehensibility.
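
One plausible candidate-generation rule for such a tool is subsequence matching against the dictionary; this sketch is our illustration, not necessarily one of the tool's actual rules (which also draw on phonetics and known construction methods):

```python
def candidate_expansions(abbrev, dictionary):
    """A dictionary word is a candidate expansion if the abbreviation's
    letters occur in it, in order, starting with the same first letter
    (illustrative rule only)."""
    def is_subsequence(short, word):
        it = iter(word)
        return all(ch in it for ch in short)
    a = abbrev.lower()
    return [w for w in dictionary
            if w[0].lower() == a[0] and is_subsequence(a, w.lower())]

# candidate_expansions("num", ["number", "numeral", "menu"])
# -> ["number", "numeral"]
```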

Journal ArticleDOI
TL;DR: This paper classifies typical data restructuring tasks in an IR environment, and shows that query specification in this interface remains compact and truly declarative—also in the context of complex NF2 relational queries.
Abstract: In information retrieval (IR) there is a need for greater structural expressiveness than that provided by ordinary retrieval systems or the ordinary relational model. Hierarchical structures, especially, are common in IR applications. Therefore the non-first-normal-form (NF2) relational model is often a more natural and intuitive way to model the data of IR applications than the pure relational model. Because many-to-many relationships often exist among the real-world entities of IR applications, it is impossible to find a stable hierarchical structure suitable to all needs of users. This means that a tool is needed that has a powerful restructuring capability. In other words, it has to be able to produce for the user result NF2 relations in which hierarchical relationships among data have been organized in a way that is drastically different from that in the source NF2 relations. In this paper we classify typical data restructuring tasks in an IR environment, and give several examples of their specification. It has been widely recognized that NF2 relational query formulation with conventional query languages is too cumbersome for ordinary end users. In order to simplify NF2 relational query formulation, we have developed and implemented a novel user interface. We show that query specification in this interface remains compact and truly declarative—also in the context of complex NF2 relational queries.
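
A nested relation and the unnest restructuring step can be sketched with ordinary data structures; the paper's interface specifies such restructurings declaratively rather than through code like this (structure and names illustrative only):

```python
def unnest(rel, attr):
    """Flatten a relation-valued attribute back into 1NF rows; the
    inverse 'nest' operation would group flat rows into subrelations."""
    for row in rel:
        for sub in row[attr]:
            flat = {k: v for k, v in row.items() if k != attr}
            flat.update(sub)
            yield flat

# A toy NF2 relation: each author row nests a papers subrelation.
authors = [{"name": "Smith",
            "papers": [{"title": "Query Languages", "year": 1994},
                       {"title": "NF2 Interfaces", "year": 1995}]}]
# list(unnest(authors, "papers")) ->
# [{'name': 'Smith', 'title': 'Query Languages', 'year': 1994},
#  {'name': 'Smith', 'title': 'NF2 Interfaces', 'year': 1995}]
```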