Assessing repository technology: where do we go from here?

doi:10.1142/S0218194092000221


N93:12391

Assessing Repository Technology: Where Do We Go From Here?*

David Eichmann†

Software Reuse Repository Lab (SoRReL)

Dept. of Statistics and Computer Science

West Virginia University


Send correspondence to:

David Eichmann

SoRReL

Dept. of Statistics and Computer Science

West Virginia University

Morgantown, WV 26506

email: eichmann@cs.wvu.wvnet.edu


* To appear in the International Journal of Software Engineering and Knowledge Engineering.

† This work was supported in part by NASA as part of the Repository Based Software Engineering project, cooperative agreement NCC-9-16, project no. RICIS SE.43, subcontract no. 089, and in part by a grant from MountainNet Inc.


Abstract

Three sample information retrieval systems, archie, autoLib, and WAIS, are compared as to their expressiveness and usefulness, first in the general context of information retrieval, and then as prospective software reuse repositories. While the representational capabilities of these systems are limited, they provide a useful foundation for future repository efforts, particularly from the perspective of repository distribution and coherent user interface design.


1 - Introduction

As information becomes an increasingly important sector of the global economy, the way in which we access that information, and thereby the way in which we access and structure knowledge, becomes a critical concern. The engineering of knowledge is quickly becoming an area of research in its own right, independent of its parent disciplines of artificial intelligence, database systems, and information retrieval; consider the title of the journal that you now hold in your hands.


Wegner recognized the value of knowledge engineering in his landmark article on the role of capital in software development:

"Knowledge engineering is a body of techniques for managing the complexity of knowledge... it is capital-intensive in the sense that reusability is a primary consideration in the development of books, expert systems, and oth_ stng'tutes for the management and use of knowledge." [10, p. 33]

Just as Wegner observed that the products of software engineering are capital, so too are the products of knowledge engineering a form of capital. Identification, structure, and locatability are critical to enabling this knowledge capital. Innovation in this area is driven from two diverse perspectives: the traditional perspective of researchers, and a not-so-traditional perspective of what might be referred to as an information underground.

The goal of this information underground is not necessarily to extend the state of the art, but rather the more pragmatic development of an informational infrastructure [4]. The prototypes resulting from this type of work propagate quickly over the Internet, immediately attracting large numbers of users. Even while still experimental, systems that provide a distinct benefit frequently need to limit access in order to maintain reasonable performance for other users of the underlying platforms.

My reference to this community as an underground is calculated, for even within the computer science community (let alone the academic or commercial communities as a whole), only a small percentage of individuals are aware of such information systems. This article was spurred by my interest in software repositories, by a number of conversations that I've had in recent months, and by the benefit I think can be gained by widening the forum for such systems to a larger audience.

In particular, it is interesting to evaluate such systems as an enabling technology for software reuse repositories. Repositories, and by implication information retrieval mechanisms, play a critical role in successful reuse. This statement runs counter to the conventional wisdom [9] that reuse is a social and managerial issue rather than a technical one. A closer examination of the conventional wisdom, however, shows that without a repository with substantial representational capability, many of the social and managerial requirements cannot be supported.

This paper surveys a number of interesting information server projects, with an eye toward enabling technologies. Section 2 lays out a typical scenario in which such systems are used. Sample sessions for three systems appear in Section 3, and an analysis appears in Section 4. I conclude with remarks on the potential of future systems.

2 - A Scenario and User Profile

Consider a programmer involved in a research project at some reasonably sized university. I choose this context not only because it is personally familiar, but also because

• such projects typically take place in facilities with rich local and wide area network connectivity;

• programmers typically have a personal workstation with substantial display capabilities (e.g., X Windows); and

• there are strong incentives to avoid redeveloping capabilities available from other projects, either local or remote.

In effect, the development environment is one that is typical today, or will be within the next few years. In addition, the social and equipment infrastructure for a successful reuse program is present, even if an explicit charter for reuse or a true repository is not.

Our programmer is now faced with a dilemma: aware that there is a strong likelihood that a needed tool or component already exists somewhere out on the network, but uncertain where to begin the search among the thousands of systems that currently make up the Internet, or even how to identify the needed artifact. Until recently, the only choices included asking acquaintances for advice (although the study by Schwartz and Wood [7] demonstrated the remarkable potential of even ad hoc mechanisms such as this), poring over intermittently posted electronic digest news articles for likely-sounding names, or manually searching a few sites maintained by volunteers and accessible through anonymous ftp. Our programmer is therefore ripe for recruitment as a client of the services provided by the information underground.

3 - Example Repositories

Early in the evolution of the Internet, system administrators began adapting file transfer facilities into what today is referred to as anonymous ftp, comprising publicly accessible accounts, a limited file space, and a restricted command set. These facilities, while amazingly popular as a dissemination tool, presume a fair amount of user knowledge, not the least of which is where to look for the sought-after artifact. This section describes three information systems: archie, WAIS, and autoLib. Each of these systems has a distinct design focus: anonymous ftp access in archie, document retrieval and display in WAIS, and a limited form of electronic library in autoLib. However, the resulting systems have much in common, and their look and feel is similar in several respects. These systems were selected for discussion because they were designed primarily as information retrieval systems, rather than as software repository systems.
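To make the "where to look" problem of anonymous ftp concrete, the following is a minimal sketch of the kind of index that a directory service such as archie automates: file names drawn from each site's directory listing are mapped back to the sites and paths that hold them, so a user can search by name rather than browsing site by site. This is an illustrative simplification, not archie's actual implementation. It assumes the listings have already been retrieved, supports only substring matching, and the function names `build_index` and `search`, the site names, and the file paths are all hypothetical.

```python
from collections import defaultdict


def build_index(listings):
    """Map each file name (last path component) to the (site, path) pairs holding it.

    `listings` is a hypothetical dict of site -> list of file paths, standing in
    for directory listings already fetched from each anonymous ftp site.
    """
    index = defaultdict(list)
    for site, paths in listings.items():
        for path in paths:
            name = path.rsplit("/", 1)[-1]
            index[name].append((site, path))
    return index


def search(index, pattern, case_sensitive=False):
    """Return (site, path) pairs whose file name contains `pattern` as a substring.

    Names are visited in sorted order so results are deterministic.
    """
    if not case_sensitive:
        pattern = pattern.lower()
    hits = []
    for name in sorted(index):
        key = name if case_sensitive else name.lower()
        if pattern in key:
            hits.extend(index[name])
    return hits


if __name__ == "__main__":
    # Illustrative listings only; example.edu/example.org are placeholder hosts.
    listings = {
        "ftp.example.edu": ["/pub/tools/grep-1.5.tar.Z", "/pub/docs/README"],
        "ftp.example.org": ["/pub/gnu/grep-1.6.tar.Z", "/pub/gnu/make-3.62.tar.Z"],
    }
    idx = build_index(listings)
    for site, path in search(idx, "GREP"):
        print(site, path)
```

Run as a script, this prints the two `grep` archives, one per site. The index holds only names and locations, not the artifacts themselves, which mirrors the observation that such a service is a repository only when taken together with the ftp sites it references.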


3.1 - archie

The archie system is "an on-line resource directory service for an internetworked environment" [3]. While archie isn't truly a repository per se, since it doesn't actually contain the artifacts that it classifies, it does fit into our discussion when treated as a whole together with the diverse anonymous ftp sites that it references. Archie grew out of the efforts of Emtage and Deutsch to automate the creation and referencing of previously hand-maintained lists of anonymous ftp sites. A demon peri-



References

Relevance weighting of search terms


The implementation of POSTGRES

Software reuse myths

Capital-Intensive Software Technology
