Assessing repository technology: where do we go from here?

20 May 1992-International Journal of Software Engineering and Knowledge Engineering (World Scientific Publishing Company)-Vol. 02, Iss: 03, pp 467-481

TL;DR: Three sample information retrieval systems, archie, autoLib, and WAIS, are compared as to their expressiveness and usefulness — first, in the general context of information retrieval, and then as prospective software reuse repositories.
Abstract: Three sample information retrieval systems, archie, autoLib, and Wide Area Information Service (WAIS), are compared with regard to their expressiveness and usefulness, first in the general context of information retrieval, and then as prospective software reuse repositories. While the representational capabilities of these systems are limited, they provide a useful foundation for future repository efforts, particularly from the perspective of repository distribution and coherent user interface design.

N93:12391
Assessing Repository Technology:
Where Do We Go From Here?*
David Eichmann†
Software Reuse Repository Lab (SoRReL)
Dept. of Statistics and Computer Science
West Virginia University
Send correspondence to:
David Eichmann
SoRReL
Dept. of Statistics and Computer Science
West Virginia University
Morgantown, WV 26506
email: eichmann@cs.wvu.wvnet.edu
* to appear in the International Journal of Software Engineering and Knowledge Engineering.
† This work was supported in part by NASA as part of the Repository Based Software Engineering project,
cooperative agreement NCC-9-16, project no. RICIS SE.43, subcontract no. 089 and in part by a grant from
MountainNet Inc.
Abstract
Three sample information retrieval systems, archie, autoLib, and
WAIS, are compared as to their expressiveness and usefulness, first
in the general context of information retrieval, and then as prospective
software reuse repositories. While the representational capabilities
of these systems are limited, they provide a useful foundation
for future repository efforts, particularly from the perspective of
repository distribution and coherent user interface design.
1 - Introduction
As information becomes an increasingly important sector of the global economy, the way in
which we access that information - and thereby the way in which we access and structure
knowledge - becomes a critical concern. The engineering of knowledge is quickly becoming an area of
research in its own right, independent of its parent disciplines of artificial intelligence, database
systems, and information retrieval; consider the title of the journal that you now hold in your hands.
Wegner recognized the value of knowledge engineering in his landmark article on the role of
capital in software development:
"Knowledge engineering is a body of techniques for managing the complexity of knowledge... it is
capital-intensive in the sense that reusability is a primary consideration in the development of books,
expert systems, and other structures for the management and use of knowledge." [10, p. 33]
Just as Wegner observed that the products of software engineering are capital, so are the products
of knowledge engineering a form of capital. Identification, structure, and locatability are critical to
the enabling of this knowledge capital. Innovation in this area is driven from two diverse
perspectives, the traditional perspective of researchers and a not-so-traditional perspective of what might
be referred to as an information underground.
The goal of this information underground is not necessarily an extension of the state of the art,
but a rather more pragmatic development of an informational infrastructure [4]. The prototypes
resulting from this type of work propagate quickly over the Internet, immediately generating large
numbers of users. Even while still experimental, systems that provide distinct benefit frequently
need to limit access in order to maintain reasonable system performance for other users of the
underlying platforms.
My reference to this community as an underground is calculated, for even within the computer
science community (let alone the academic or commercial communities as a whole), only a small
percentage of individuals are aware of such information systems. This article was spurred by my
interest in software repositories, a number of conversations that I've had in recent months, and the
benefit I think can be gained by widening the forum for such systems to a larger audience.
In particular, it is interesting to evaluate these systems as an enabling technology for software
reuse repositories. Repositories, and by implication, information retrieval mechanisms, play a
critical role in successful reuse. This statement disagrees with the conventional wisdom [9] that reuse
is a social and managerial issue, and not a technical one. A closer examination of the conventional
wisdom leads to a recognition that without a repository with substantial representational capability
many of the social and managerial requirements cannot be supported.
This paper surveys a number of interesting information server projects, with an eye towards
enabling technologies. Section 2 lays down a typical scenario in which such systems are used.
Sample sessions for three systems appear in section 3, and an analysis appears in section 4. I
conclude with remarks on the potential of future systems.
2 - A Scenario and User Profile
Consider a programmer involved in a research project in some reasonably sized university. I
choose this context not only for its personal familiarity, but also because
• such projects typically take place in facilities with rich local and wide area network connectivity;
• programmers typically have a personal workstation with substantial display capabilities (e.g.,
  X Windows); and
• there are strong incentives in avoiding the redevelopment of capabilities available from other
  projects, either local or remote.
In effect, the development environment is one which is typical, or will be within the next few years.
In addition, the social infrastructure and equipment infrastructure for a successful reuse program
are present, if not an explicit charter for reuse, or a true repository.
Our programmer is now faced with a dilemma: aware that there is a strong likelihood that a
needed tool or component already exists somewhere out on the network, but uncertain as to where
to begin the search in the thousands of systems that currently make up the Internet, or even how to
identify the needed artifact. Until recently the only choices included asking acquaintances for
advice (although the study by Schwartz and Wood [7] demonstrated the amazing potential for even
ad hoc mechanisms such as this), poring over intermittently posted electronic digest news articles
for likely sounding names, or manually searching a few sites maintained by volunteers and
accessible through anonymous ftp. Obviously, our programmer is ripe for recruitment as a client of the
services provided by the information underground.
3 - Example Repositories
Early in the evolution of the Internet, system administrators began adapting file transfer
facilities into what today is referred to as anonymous ftp, comprised of publicly accessible accounts, a
limited file space, and a restricted command set. These facilities, while amazingly popular as a
dissemination tool, presume a fair amount of user knowledge, not the least of which being where to
look for the sought-after artifact. This section describes three information systems, archie, WAIS,
and autoLib. Each of these systems has a distinct design focus: anonymous ftp access in archie,
document retrieval/display in WAIS, and a limited form of electronic library in autoLib. However,
the resulting systems have much in common, and their look and feel has several similarities. These
systems were selected for discussion because they were designed primarily as information
retrieval systems, rather than as software repository systems.
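The "where to look" problem can be made concrete with a small sketch of a directory built over anonymous ftp listings. This is only an illustration under stated assumptions: the host names, paths, and the functions build_index and search are invented for the example, and nothing here is claimed to reflect how archie itself is implemented.

```python
from collections import defaultdict

# Hypothetical listings: host -> file paths, as a periodic scan of
# anonymous ftp sites might record them. Hosts and paths are invented.
SITE_LISTINGS = {
    "ftp.example.edu": ["/pub/tools/xgrep.tar.Z", "/pub/docs/README"],
    "archive.example.org": ["/mirrors/x11/xgrep.tar.Z", "/pub/lib/libsort.tar.Z"],
}


def build_index(listings):
    """Map each file name (last path component) to its (host, path) locations."""
    index = defaultdict(list)
    for host, paths in listings.items():
        for path in paths:
            name = path.rsplit("/", 1)[-1]
            index[name].append((host, path))
    return index


def search(index, fragment):
    """Return sorted (host, path) pairs whose file name contains fragment, ignoring case."""
    fragment = fragment.lower()
    hits = []
    for name, locations in index.items():
        if fragment in name.lower():
            hits.extend(locations)
    return sorted(hits)


index = build_index(SITE_LISTINGS)
for host, path in search(index, "xgrep"):
    print(host, path)
```

With such an index the programmer asks one service for a name fragment and receives candidate sites, rather than visiting each site in turn. Note that the lookup matches only on file names, so it says nothing about what an artifact actually does.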
3.1 - archie
The archie system is "an on-line resource directory service for an internetworked environment"
[3]. While archie isn't truly a repository per se, since it doesn't actually contain the artifacts that it
classifies, when treated as a whole with the diverse anonymous ftp sites that it references, it does
fit into our discussion. Archie grew out of the efforts of Emtage and Deutsch to automate the
creation and referencing of previously hand-maintained lists of anonymous ftp sites. A demon peri-

Citations


Journal ArticleDOI
David Eichmann
TL;DR: This paper surveys the beginnings of network information discovery and retrieval, how the Web has created a surprising level of integration of these systems, and where the current state of the art lies in creating globally accessible information spaces and supporting access to those information spaces.
Abstract: Access to information using the Internet has undergone dramatic change and expansion recently. The unrivaled success of the World Wide Web has altered the Internet from something approachable only by the initiated to something of a media craze — the information superhighway made manifest in the personal "home page". This paper surveys the beginnings of network information discovery and retrieval, how the Web has created a surprising level of integration of these systems, and where the current state of the art lies in creating globally accessible information spaces and supporting access to those information spaces.

5 citations


Cites background from "Assessing repository technology: wh..."

  • ...1 Advances in Network Information Access to information using the Internet has undergone dramatic change and expansion recently....

    [...]


Book ChapterDOI
01 Jun 1995-
TL;DR: Recent network information retrieval systems are compared for their expressiveness and usefulness — first, in the general context of information retrieval, and then as prospective software reuse repositories.
Abstract: Recent network information retrieval systems are compared for their expressiveness and usefulness, first in the general context of information retrieval, and then as prospective software reuse repositories. While the representational capabilities of some of these systems are limited, they provide a useful foundation for future repository efforts, particularly from the perspective of repository distribution and coherent user interface design.

References

Book
01 Dec 1988-
TL;DR: This paper examines statistical techniques for exploiting relevance information to weight search terms using information about the distribution of index terms in documents in general and shows that specific weighted search methods are implied by a general probabilistic theory of retrieval.
Abstract: This paper examines statistical techniques for exploiting relevance information to weight search terms. These techniques are presented as a natural extension of weighting methods using information about the distribution of index terms in documents in general. A series of relevance weighting functions is derived and is justified by theoretical considerations. In particular, it is shown that specific weighted search methods are implied by a general probabilistic theory of retrieval. Different applications of relevance weighting are illustrated by experimental results for test collections.

2,055 citations


"Assessing repository technology: wh..." refers background in this paper

  • ...Relevance feedback has been shown to be more effective than boolean expression as a search mechanism for textual information (a report of one such study appears in [6])....

    [...]


Journal ArticleDOI
Stephen Robertson, K. Sparck Jones
Abstract: This paper examines statistical techniques for exploiting relevance information to weight search terms. These techniques are presented as a natural extension of weighting methods using information about the distribution of index terms in documents in general. A series of relevance weighting functions is derived and is justified by theoretical considerations. In particular, it is shown that specific weighted search methods are implied by a general probabilistic theory of retrieval. Different applications of relevance weighting are illustrated by experimental results for test collections.

1,846 citations


Journal ArticleDOI
TL;DR: The design and implementation decisions made for the three-dimensional data manager POSTGRES are discussed, and attention is restricted to the DBMS backend functions.
Abstract: The design and implementation decisions made for the three-dimensional data manager POSTGRES are discussed. Attention is restricted to the DBMS backend functions. The POSTGRES data model and query language, the rules system, the storage system, the POSTGRES implementation and the current status and performance are discussed.

426 citations


"Assessing repository technology: wh..." refers methods in this paper

  • ...tabase. This approach has been carried much further in work on extensible database systems such as POSTGRES [ 8 ]....

    [...]


Journal ArticleDOI
Will Tracz
TL;DR: This paper analyzes nine commonly believed software reuse myths and reveals certain technical, organizational, and psychological software engineering research issues and trends.
Abstract: Reusing software is a simple, straightforward concept that has appealed to programmers since the first stored-program computer was created. Unfortunately, software reuse has not evolved beyond its most primitive forms of subroutine libraries and brute force program modification. This paper analyzes nine commonly believed software reuse myths. These myths reveal certain technical, organizational, and psychological software engineering research issues and trends.

113 citations


"Assessing repository technology: wh..." refers background in this paper

  • ...This statement disagrees with the conventional wisdom [9], that reuse is a social and managerial issue, and not a technical one....

    [...]


Journal ArticleDOI
P. Wegner
01 Jul 1984-IEEE Software
TL;DR: Each section of this four-part article deals with a different aspect of capital-intensive software technology and presents an integrated view of the subject.
Abstract: Each section of this four-part article deals with a different aspect of capital-intensive software technology. Together, they present an integrated view of the subject.

99 citations



Performance
Metrics
No. of citations received by the Paper in previous years
Year    Citations
1995    2
1990    1