
Showing papers in "Computers and the Humanities" in 1997


Journal ArticleDOI
TL;DR: In this paper, an analysis is presented in which word senses are abstractions from clusters of corpus citations, in accordance with current lexicographic practice, where the corpus citations are the basic objects in the ontology.
Abstract: Word sense disambiguation assumes word senses. Within the lexicography and linguistics literature, they are known to be very slippery entities. The first part of the paper looks at problems with existing accounts of ‘word sense’ and describes the various kinds of ways in which a word's meaning can deviate from its core meaning. An analysis is presented in which word senses are abstractions from clusters of corpus citations, in accordance with current lexicographic practice. The corpus citations, not the word senses, are the basic objects in the ontology. The corpus citations will be clustered into senses according to the purposes of whoever or whatever does the clustering. In the absence of such purposes, word senses do not exist. Word sense disambiguation also needs a set of word senses to disambiguate between. In most recent work, the set has been taken from a general-purpose lexical resource, with the assumption that the lexical resource describes the word senses of English/French/..., between which NLP applications will need to disambiguate. The implication of the first part of the paper is, by contrast, that word senses exist only relative to a task. The final part of the paper pursues this, exploring, by means of a survey, whether and how word sense ambiguity is in fact a problem for current NLP applications.

419 citations
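As a rough illustration of the clustering idea described in the abstract above, the following sketch groups a handful of invented corpus citations for one word by the overlap of their context words. It is only a toy stand-in: the citations, stopword list, similarity threshold and greedy grouping are all assumptions, not the paper's procedure.

```python
# Illustrative sketch only: clustering corpus citations for a word into
# candidate "senses" via their context words. The citation lines, stopword
# list and threshold are invented; this is not the paper's own method.
from collections import Counter
from math import sqrt

citations = [
    "the bank raised interest rates on savings deposits",
    "interest on deposits at the bank fell sharply",
    "they walked along the grassy bank of the river",
    "the river bank was muddy after the rain",
]

STOP = {"the", "a", "of", "on", "at", "was", "after"}

def context_vector(line, target="bank"):
    """Bag of context words around the target, minus stopwords."""
    return Counter(w for w in line.split() if w != target and w not in STOP)

def cosine(a, b):
    num = sum(a[w] * b[w] for w in a)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

vectors = [context_vector(c) for c in citations]

# Greedy single-link grouping: a citation joins a cluster if its context
# overlaps sufficiently with any citation already in that cluster.
clusters = []
for i, v in enumerate(vectors):
    for cluster in clusters:
        if any(cosine(v, vectors[j]) > 0.2 for j in cluster):
            cluster.append(i)
            break
    else:
        clusters.append([i])

for k, cluster in enumerate(clusters):
    print(f"candidate sense {k}:")
    for i in cluster:
        print("   ", citations[i])
```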


Journal ArticleDOI
TL;DR: The statement, "Results of most non-traditional authorship attribution studies are not universally accepted as definitive," is explicated.
Abstract: The statement, "Results of most non-traditional authorship attribution studies are not universally accepted as definitive," is explicated. A variety of problems in these studies are listed and discussed: studies governed by expediency; a lack of competent research; flawed statistical techniques; corrupted primary data; lack of expertise in allied fields; a dilettantish approach; inadequate treatment of errors. Various solutions are suggested: construct a correct and complete experimental design; educate the practitioners; study style in its totality; identify and educate the gatekeepers; develop a complete theoretical framework; form an association of practitioners.

263 citations


Journal ArticleDOI
TL;DR: The present state of digital preservation is discussed, requirements of both users and custodians are articulated, and research needs in storage media, migration, conversion, and overall management strategies are suggested.
Abstract: The difficulty and expense of preserving digital information is a potential impediment to digital library development. Preservation of traditional materials became more successful and systematic after libraries and archives integrated preservation into overall planning and resource allocation. Digital preservation is largely experimental and replete with the risks associated with untested methods. Digital preservation strategies are shaped by the needs and constraints of repositories with little consideration for the requirements of current and future users of digital scholarly resources. This article discusses the present state of digital preservation, articulates requirements of both users and custodians, and suggests research needs in storage media, migration, conversion, and overall management strategies. Additional research in these areas would help developers of digital libraries and other institutions with preservation responsibilities to integrate long-term preservation into program planning, administration, system architectures, and resource allocation.

237 citations


Journal ArticleDOI
TL;DR: The results of this work provide confirmation of connoisseurial claims regarding craquelure as a broad indicator of authorship.
Abstract: This paper describes the formal representation and analysis of a visual structure – craquelure (an accidental feature of paintings). Various statistical methods demonstrate a relationship between a formal representation of craquelure and art-historical categories of paintings. The results of this work provide confirmation of connoisseurial claims regarding craquelure as a broad indicator of authorship. The techniques employed in this study are: repertory grid, hierarchical clustering, multidimensional scaling and discriminant analysis.

38 citations
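The statistical pipeline named in the abstract above can be made concrete with a short, hypothetical sketch: hierarchical clustering of paintings described by numeric crack-pattern features. The feature names, values and library choices (NumPy/SciPy) are assumptions for illustration only, not the study's data or methods.

```python
# Illustrative sketch only: hierarchical clustering of paintings described by
# numeric craquelure features. The feature names and values are invented.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# rows: paintings; columns: e.g. mean crack spacing (mm), crack width,
# predominant orientation (degrees), island size -- all hypothetical values
features = np.array([
    [2.1, 0.3, 10.0, 4.0],
    [2.3, 0.4, 12.0, 4.2],
    [5.0, 0.9, 80.0, 9.5],
    [5.2, 1.0, 85.0, 9.8],
])

# standardise columns so no single feature dominates the distance measure
z = (features - features.mean(axis=0)) / features.std(axis=0)

# agglomerative clustering (Ward's method) on Euclidean distances
tree = linkage(pdist(z), method="ward")
labels = fcluster(tree, t=2, criterion="maxclust")
print(labels)   # e.g. [1 1 2 2]: two broad groups of crack patterns
```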


Journal ArticleDOI
TL;DR: In this article, an illuminating new humanities analogy was found by constructing a search statement in which proper names were coupled with associated concepts, drawing upon an efficacious method for discovering previously unknown causes of medical syndromes, and searching in Humanities Index, a periodical index included in WILS, the Wilson Database.
Abstract: Voluminous databases contain hidden knowledge, i.e., literatures that are logically but not bibliographically linked. Unlinked literatures containing academically interesting commonalities cannot be retrieved via normal searching methods. Extracting hidden knowledge from humanities databases is especially problematic because the literature, written in “everyday” rather than technical language, lacks the precision required for efficient retrieval, and because humanities scholars seek new analogies rather than causes. Drawing upon an efficacious method for discovering previously unknown causes of medical syndromes, and searching in Humanities Index, a periodical index included in WILS, the Wilson Database, an illuminating new humanities analogy was found by constructing a search statement in which proper names were coupled with associated concepts.

35 citations


Journal ArticleDOI
TL;DR: An empirical study was undertaken to evaluate second language learning with a videodisc named Vi-Conte and suggestions for the development of adaptive learning environments are made based on these findings.
Abstract: An empirical study was undertaken to evaluate second language learning with a videodisc named Vi-Conte. The 78 subjects were post-secondary students and adults and belonged either to a control group, a video-control group or an experimental group. The research methodology is presented as well as analyses of learners' navigational patterns, strategies, gains in vocabulary items, changes in attitude and global evaluation of the videodisc. Suggestions for the development of adaptive learning environments are made based on these findings.

33 citations


Journal ArticleDOI
TL;DR: The point of this paper is to examine critically two well-known contributions to sense-tagging: the first, widely taken to show that the task, as defined, cannot be carried out systematically by humans, and the second, which claims strikingly good results at doing exactly that.
Abstract: This paper addresses the question of whether it is possible to sense-tag systematically, and on a large scale, and how we should assess progress so far. That is to say, how to attach each occurrence of a word in a text to one and only one sense in a dictionary - a particular dictionary of course, and that is part of the problem. The paper does not propose a solution to the question, though we have reported empirical findings elsewhere (Cowie et al., 1992; Wilks et al., 1996; Wilks and Stevenson, 1997), and intend to continue and refine that work. The point of this paper is to examine two well-known contributions critically: the first (Kilgarriff, 1993), which is widely taken to show that the task, as defined, cannot be carried out systematically by humans, and the second (Yarowsky, 1995), which claims strikingly good results at doing exactly that.

33 citations


Journal ArticleDOI
TL;DR: An information-based framework for punctuation is presented, influenced by treatments of several related phenomena in computational linguistics, to outline a current perspective for the usage and functions of punctuation marks.
Abstract: Some recent studies in computational linguistics have aimed to take advantage of various cues presented by punctuation marks. This short survey is intended to summarise these research efforts and additionally, to outline a current perspective for the usage and functions of punctuation marks. We conclude by presenting an information-based framework for punctuation, influenced by treatments of several related phenomena in computational linguistics.

33 citations


Journal ArticleDOI
TL;DR: This paper describes the LT NSL system, an architecture for writing corpus processing tools, and addresses the advantages and disadvantages of an SGML approach compared with a non-SGML database approach.
Abstract: This paper describes the LT NSL system (McKelvie et al., 1996), an architecture for writing corpus processing tools. This system is then compared with two other systems which address similar issues, the GATE system (Cunningham et al., 1995) and the IMS Corpus Workbench (Christ, 1994). In particular we address the advantages and disadvantages of an SGML approach compared with a non-SGML database approach.

29 citations


Journal ArticleDOI
TL;DR: A morphological analyser for Estonian is described, together with the ways in which using a text corpus influenced both the process of creating it and the resulting program itself, an influence noticeable in the algorithm and implementation as well as in the lexicon.
Abstract: The paper describes a morphological analyser for Estonian and how using a text corpus influenced the process of creating it and the resulting program itself. The influence is not limited to the lexicon only, but is also noticeable in the resulting algorithm and implementation. When work on the analyser began, there was no computational treatment of Estonian derivatives and compounds. After some cycles of development and testing on the corpus, we came up with an acceptable algorithm for their treatment. Both the morphological analyser and the speller based on it have been successfully marketed.

26 citations


Journal ArticleDOI
TL;DR: Four examples of content analysis studies with and without autocorrelation are discussed and several methods specifically intended for small samples have been developed.
Abstract: Many content analysis studies involving temporal data are biased by some unknown dose of autocorrelation. The effect of autocorrelation is to inflate or deflate the significant differences that may exist among the different parts of texts being compared. The solution consists in removing effects due to autocorrelation, even if the latter is not statistically significant. Procedures such as Crosbie's (1993) ITSACORR remove the effect of at least first-order autocorrelations and can be used with small samples. The AREG procedure of SPSS (1994) and the AUTOREG procedure of SAS (1993) can be employed to detect and remove first-order autocorrelations, and higher-order ones too in the case of AUTOREG, while several methods specifically intended for small samples (Huitema and McKean, 1991, 1994) have been developed. Four examples of content analysis studies with and without autocorrelation are discussed.
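For readers who want a concrete sense of the correction being discussed, the following sketch detects a first-order autocorrelation with the Durbin-Watson statistic and then refits the comparison with an AR(1) correction using statsmodels. It is a stand-in under invented data, not ITSACORR, AREG or AUTOREG.

```python
# Illustrative sketch only: detect and remove a first-order (AR(1))
# autocorrelation from a coded text series before comparing two segments.
# The data are simulated; this stands in for procedures such as AREG/AUTOREG.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)

# hypothetical series: frequency of a content category in 40 successive
# text segments, with an artificial AR(1) component (rho = 0.6)
n = 40
e = rng.normal(size=n)
y = np.empty(n)
y[0] = e[0]
for t in range(1, n):
    y[t] = 0.6 * y[t - 1] + e[t]

group = np.repeat([0, 1], n // 2)        # first vs second half of the text

# naive OLS comparison of the two halves ignores the autocorrelation
X = sm.add_constant(group)
ols = sm.OLS(y, X).fit()
print("Durbin-Watson:", durbin_watson(ols.resid))   # well below 2 -> AR(1)

# GLSAR fits the same comparison while estimating and removing an AR(1) term
glsar = sm.GLSAR(y, X, rho=1).iterative_fit(maxiter=10)
print("estimated rho:", glsar.model.rho)
print("group effect p-value:", glsar.pvalues[1])
```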

Journal ArticleDOI
TL;DR: Two high-resolution digital imaging systems developed by the National Gallery in London are used in the technical study of paintings from the Collection, for example in recording changes of colour that result from conservation treatment, clarification of infrared images, comparison of related compositions and computer reconstruction of faded or altered colours.
Abstract: To allow permanent records of the condition of paintings to be made, the National Gallery in London has developed two high-resolution digital imaging systems over the past ten years; the VASARI scanner and the MARC camera. Each is capable of recording images of paintings with excellent colour accuracy, permitting comparisons between the state of paintings now and in the future to be made. In addition to their prime uses in documenting condition and measuring change, the systems have also been used in the technical study of paintings from the Collection, for example in recording changes of colour that result from conservation treatment, clarification of infrared images, comparison of related compositions and computer reconstruction of faded or altered colours.

Journal ArticleDOI
TL;DR: In this article, two programs for statistical analysis of concordance lines are described, which can be used for analyzing the lexical context of a given word.
Abstract: In this report two programs for statistical analysis of concordance lines are described. The programs have been developed for analyzing the lexical context of a given word. It is shown how different parameter settings influence the outcome of collocational analysis, and how the concept of collocation can be extended to allow the extraction of lines typical for a word from a set of concordance lines. Even though all the examples are for English, the software is completely language independent and only requires minimal linguistic resources.
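A minimal sketch of the kind of collocational analysis such programs perform is given below: it counts collocates of a node word within a fixed window over a few invented concordance lines and ranks them with a simple t-score. The lines, window size and scoring formula are illustrative assumptions, not the authors' software.

```python
# Illustrative sketch only: counting collocates of a node word in concordance
# lines and scoring them with a simple t-score. The lines and the scoring
# choices (window size, t-score) are assumptions, not the authors' programs.
from collections import Counter
import math

node = "strong"
concordance = [
    "he made a strong case for reform",
    "a strong cup of tea in the morning",
    "strong winds battered the coast",
    "she has a strong case to answer",
]

window = 2           # words either side of the node
cooc = Counter()     # node-collocate co-occurrence counts
totals = Counter()   # overall word frequencies
n_tokens = 0

for line in concordance:
    words = line.split()
    n_tokens += len(words)
    totals.update(words)
    for i, w in enumerate(words):
        if w == node:
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            cooc.update(words[lo:i] + words[i + 1:hi])

def t_score(collocate):
    """t-score: (observed - expected co-occurrences) / sqrt(observed)."""
    observed = cooc[collocate]
    expected = totals[node] * totals[collocate] * (2 * window) / n_tokens
    return (observed - expected) / math.sqrt(observed)

for w, _ in cooc.most_common(5):
    print(f"{w:10s} cooc={cooc[w]}  t={t_score(w):.2f}")
```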

Journal ArticleDOI
TL;DR: This paper presents a method for developing limited-context grammar rules in order to mark up text automatically, by attaching specific text segments to a small number of well-defined and application-determined semantic categories.
Abstract: This paper presents a method for developing limited-context grammar rules in order to mark up text automatically, by attaching specific text segments to a small number of well-defined and application-determined semantic categories. The Text Analysis Tool with Object Encoding (TATOE) was used in order to support the iterative process of developing a set of rules as well as for constructing and managing the lexical resources. The work reported here is part of a real-world application scenario: the automatic semantic mark up of German news messages, as provided by a German press agency, according to the SGML-based standard News Industry Text Format (NITF) to facilitate their further exchange. The implemented export mechanism of the semantic mark up into NITF is also described in the paper.
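To make the idea of limited-context rules concrete, here is a hedged sketch in which a few pattern rules attach text segments to semantic categories and emit inline tags. The rules, categories and tag names are invented for illustration; they are not the TATOE rule set or the NITF element inventory.

```python
# Illustrative sketch only: limited-context rules that attach text segments to
# a few semantic categories and emit inline tags. The rules, categories and
# tag names are invented; they are not the paper's actual rules or NITF markup.
import re

RULES = [
    # (category, compiled pattern) -- a small, application-determined rule set
    ("money", re.compile(r"\b\d+(?:\.\d+)?\s?(?:million|billion)?\s?(?:DM|euros?)\b")),
    ("date",  re.compile(r"\b\d{1,2}\s(?:January|February|March|April|May|June|"
                         r"July|August|September|October|November|December)\s\d{4}\b")),
    ("org",   re.compile(r"\b(?:[A-Z][a-z]+\s)?(?:AG|GmbH)\b")),
]

def mark_up(text):
    """Wrap every rule match in its category tag, applying rules in order."""
    for category, pattern in RULES:
        text = pattern.sub(lambda m: f"<{category}>{m.group(0)}</{category}>", text)
    return text

sample = "Siemens AG reported profits of 2.3 billion euros on 12 March 1997."
print(mark_up(sample))
```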

Journal ArticleDOI
TL;DR: This article is a detailed account of COMLEX Syntax, an on-line syntactic dictionary of English, developed by the Proteus Project at New York University under the auspices of the Linguistics Data Consortium.
Abstract: This article is a detailed account of COMLEX Syntax, an on-line syntactic dictionary of English, developed by the Proteus Project at New York University under the auspices of the Linguistic Data Consortium. This lexicon was intended to be used for a variety of tasks in natural language processing by computer and as such has very detailed classes with a large number of syntactic features and complements for the major parts of speech and is, as far as possible, theory neutral. The dictionary was entered by hand with reference to hard copy dictionaries, an on-line concordance and native speakers' intuition. Thus it is without prior encumbrances and can be used for both pure research and commercial purposes.

Journal ArticleDOI
Julia Flanders1
TL;DR: It is argued that, to truly determine the importance of images to the function of electronic editions, the authors must understand the contribution the image makes to the form of textual knowledge provided by the edition.
Abstract: This paper discusses the contested role of images in electronic editions, and summarizes some of the chief arguments for their inclusion. I then argue that, to truly determine the importance of images to the function of electronic editions, we must understand the contribution the image makes to the form of textual knowledge provided by the edition. I suggest a distinction between editions which are primarily pedagogical in their aims, those which aim above all at scholarly authority, and those which attempt to provide textual information as high-quality data which can be analysed and processed. I conclude that the latter represents the most significant future trend in electronic editing.

Journal ArticleDOI
TL;DR: Because the TEI Guidelines provide a fairly flexible system for the encoding of proper names, projects may need to collaborate to determine more specific constraints, to ensure consistency of approach and compatibility of data.
Abstract: This paper discusses the encoding of proper names using the TEI Guidelines, describing the practice of the Women Writers Project at Brown University, and the CELT Project at University College, Cork. We argue that such encoding may be necessary to enable historical and literary research, and that the specific approach taken will depend on the needs of the project and the audience to be served. Because the TEI Guidelines provide a fairly flexible system for the encoding of proper names, we conclude that projects may need to collaborate to determine more specific constraints, to ensure consistency of approach and compatibility of data.

Journal ArticleDOI
TL;DR: The authors describe and analyze the didactic dimensions to be considered when designing a multi-media tool, based on their own experience as software authors and language trainers.
Abstract: Multi-Media does not, just by itself, guarantee accelerated learning and enhanced motivation unless there is a clear pedagogical progression and learning strategy. The authors describe and analyze the didactic dimensions to be considered when designing a multi-media tool, based on their own experience as software authors and language trainers.

Journal ArticleDOI
TL;DR: The integration of the 1871 Canadian census public use sample with similar samples of the 1850 and 1880 American censuses to form the Integrated Canadian-American Public Use Microdata Series (ICAPUMS) is described.
Abstract: The comparative use of census data is a useful way to study social characteristics across national boundaries. However, truly comparative demographic history is not possible without fully integrating separate census data, uniting multiple data files with a common set of comparably coded variables. This paper describes the integration of the 1871 Canadian census public use sample with similar samples of the 1850 and 1880 American censuses to form the Integrated Canadian-American Public Use Microdata Series (ICAPUMS). These data sets lent themselves well to integration because of their strong similarities in sampling design, data collection and data organization. Consistency in the availability and treatment of variables also eased integration of the samples, although the harmonization of occupation variables presented significant challenges. The ICAPUMS features a general household relationship variable which allows us to examine household structure across the two countries and three years. The paper concludes by proposing some general principles of census data set integration. This integrated data set is now available to researchers on the website of the University of Minnesota Historical Census Projects (www.hist.umn.edu/~ipums).
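The harmonization step at the heart of such integration can be sketched briefly: recode each sample's occupation variable onto a common scheme and pool the files. The column names and code mappings below are invented for illustration and are not the ICAPUMS coding scheme.

```python
# Illustrative sketch only: harmonising one variable across two census samples
# before pooling them. Column names and code mappings are invented.
import pandas as pd

canada_1871 = pd.DataFrame({
    "occ_text": ["farmer", "blacksmith", "teacher"],
    "age": [34, 41, 27],
})
usa_1880 = pd.DataFrame({
    "occ_code_1880": [100, 213, 350],
    "age": [29, 52, 33],
})

# map each source's occupation variable onto one common code
COMMON_FROM_TEXT = {"farmer": 1, "blacksmith": 2, "teacher": 3}
COMMON_FROM_1880 = {100: 1, 213: 2, 350: 3}

canada_1871["occ_common"] = canada_1871["occ_text"].map(COMMON_FROM_TEXT)
usa_1880["occ_common"] = usa_1880["occ_code_1880"].map(COMMON_FROM_1880)

canada_1871["country"] = "CA"
canada_1871["year"] = 1871
usa_1880["country"] = "US"
usa_1880["year"] = 1880

pooled = pd.concat(
    [canada_1871[["country", "year", "age", "occ_common"]],
     usa_1880[["country", "year", "age", "occ_common"]]],
    ignore_index=True,
)
print(pooled.groupby(["country", "occ_common"])["age"].mean())
```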

Journal ArticleDOI
Jack Child1
TL;DR: This article uses several approaches to assess the impact that using computer-assisted instruction (CAI) had in several undergraduate courses taught at American University, with an emphasis on possible reasons for the success (or lack thereof) in using CAI in these courses.
Abstract: This article uses several approaches to assess the impact that using computer-assisted instruction (CAI) had in several undergraduate courses taught at American University (Washington, DC). The various CAI materials are first described (Part II in general, and Part III in greater detail for one program which was authored in-house) as part of an evolutionary process from conventional teaching of several courses to increasingly heavy use of CAI. The principal focus is a General Education survey course for first and second year college students, "Latin America: History, Art, Literature". Part IV describes the methodology of the assessment process, along with the various sets of data developed. The data are then analyzed in Part IV, and discussed in Part V, with an emphasis on possible reasons for the success (or lack thereof) in using CAI in these courses, along with some limitations and problems observed. Conclusions are reached in Part VI. In examining the literature on CAI, one is struck by the generally accepted premise that CAI has a strong positive impact on teaching, especially at the K-12 and lower university levels. Understandably, this premise is enthusiastically echoed by those involved in developing and selling the hardware and software labeled as "educational".

Journal ArticleDOI
TL;DR: FOIL extends the capabilities of earlier anthropology-specific learning programs by providing a more powerful representation for induced relationships, and is better able to learn in the face of noisy or incomplete data.
Abstract: A common problem in anthropological field work is generalizing rules governing social interactions and relations (particularly kinship) from a series of examples. One class of machine learning algorithms is particularly well-suited to this task: inductive logic programming systems, as exemplified by FOIL. A knowledge base of relationships among individuals is established, in the form of a series of single-predicate facts. Given a set of positive and negative examples of a new relationship, the machine learning programs build a Horn clause description of the target relationship. The power of these algorithms to derive complex hypotheses is demonstrated for a set of kinship relationships drawn from the anthropological literature. FOIL extends the capabilities of earlier anthropology-specific learning programs by providing a more powerful representation for induced relationships, and is better able to learn in the face of noisy or incomplete data.
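A small sketch can show the representation such systems work over: ground, single-predicate facts plus positive and negative examples of a target relation, and a check that one candidate Horn clause covers the positives while rejecting the negatives. The facts are invented, and FOIL's actual search over candidate literals is not implemented here.

```python
# Illustrative sketch only: the kind of knowledge base an ILP system like FOIL
# works over -- ground facts plus positive/negative examples -- and a check
# that one candidate Horn clause covers them. FOIL's hill-climbing search over
# candidate literals is not implemented; the facts are invented.
parent = {("alice", "bob"), ("alice", "carol"), ("david", "bob"), ("david", "carol")}
male = {"bob", "david"}

# target relation: brother(X, Y)
positives = {("bob", "carol")}
negatives = {("carol", "bob"), ("alice", "bob")}

def brother(x, y):
    """Candidate clause: brother(X,Y) :- male(X), parent(Z,X), parent(Z,Y), X != Y."""
    return x in male and x != y and any(
        (z, x) in parent and (z, y) in parent for z, _ in parent
    )

covered = all(brother(*p) for p in positives)
excluded = all(not brother(*n) for n in negatives)
print("covers all positives:", covered)     # True
print("rejects all negatives:", excluded)   # True
```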

Journal ArticleDOI
TL;DR: Comparative analysis of Common Sense and other pre-Revolutionary pamphlets suggests that Common Sense was indeed stylistically unique; no other pamphleteer came close to matching Paine's combination of simplicity and forcefulness.
Abstract: The extraordinary impact of Thomas Paine's Common Sense has often been attributed to its style — to the simplicity and forcefulness with which Paine expressed ideas that many others before him had expressed. Comparative analysis of Common Sense and other pre-Revolutionary pamphlets suggests that Common Sense was indeed stylistically unique; no other pamphleteer came close to matching Paine's combination of simplicity and forcefulness.
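Two of the simplest style measures involved in such comparisons, mean sentence length and mean word length, can be computed with a few lines of code. The sample passages below are invented stand-ins, not the pamphlet corpus or the paper's actual measures.

```python
# Illustrative sketch only: two simple style measures of the kind used in
# comparative stylistic analysis. The sample passages are invented, and the
# measures are generic examples, not the paper's own feature set.
import re

def style_profile(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return {
        "mean_sentence_len": len(words) / len(sentences),
        "mean_word_len": sum(len(w) for w in words) / len(words),
    }

plain = ("The cause is plain. The argument is simple. We ask only what is ours.")
ornate = ("It is with no inconsiderable degree of diffidence that the author "
          "ventures to submit these observations to the candour of an impartial "
          "and discerning public.")

for name, text in [("plain sample", plain), ("ornate sample", ornate)]:
    print(name, style_profile(text))
```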

Journal ArticleDOI
TL;DR: It is shown that machine learning techniques can be used for designing an archaeological typology, at an early stage when the classes are not yet well defined, and results show a good compatibility between the classes as they are defined by the system and the archaeological hypotheses.
Abstract: The authors here show that machine learning techniques can be used for designing an archaeological typology, at an early stage when the classes are not yet well defined. The program (LEGAL, LEarning with GAlois Lattice) is a machine learning system which uses a set of examples and counter-examples in order to discriminate between classes. Results show a good compatibility between the classes as they are defined by the system and the archaeological hypotheses.
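A drastically simplified sketch of learning from examples and counter-examples follows: find the attributes shared by every positive example of a class and check that no counter-example satisfies that whole description. This is not the LEGAL system's Galois-lattice construction, and the artefact descriptions are invented.

```python
# Illustrative sketch only: a class description built from the attributes
# common to all positive examples and not covered by any counter-example.
# This is a simplification, not LEGAL's Galois-lattice method; the artefact
# descriptions are invented.
positives = [
    {"rim:everted", "base:flat", "decoration:incised"},
    {"rim:everted", "base:flat", "decoration:painted"},
]
negatives = [
    {"rim:straight", "base:flat", "decoration:incised"},
    {"rim:everted", "base:round", "decoration:none"},
]

# attributes shared by every positive example of the class
common = set.intersection(*positives)           # {'rim:everted', 'base:flat'}

# the description discriminates if no counter-example satisfies all of it
discriminates = not any(common <= neg for neg in negatives)

print("candidate description:", sorted(common))
print("rejects all counter-examples:", discriminates)   # True
```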

Journal ArticleDOI
TL;DR: A way of creating and maintaining a ‘dynamic encyclopedia’, i.e., an encyclopedia whose entries can be improved and updated on a continual basis without requiring the production of an entire new edition.
Abstract: This paper describes a way of creating and maintaining a ‘dynamic encyclopedia’, i.e., an encyclopedia whose entries can be improved and updated on a continual basis without requiring the production of an entire new edition. Such an encyclopedia is therefore responsive to new developments and new research. We discuss our implementation of a dynamic encyclopedia and the problems that we had to solve along the way. We also discuss ways of automating the administration of the encyclopedia.

Journal ArticleDOI
TL;DR: The different developmental trajectories of computer-aided historical research and teaching in Western Europe and in the United States are surveyed and synergies which promise to enhance the discipline are sought.
Abstract: This historiographical article surveys the different developmental trajectories of computer-aided historical research and teaching in Western Europe and in the United States, and seeks synergies which promise to enhance the discipline.

Journal ArticleDOI
TL;DR: A detailed evaluation is reported of the effectiveness of a system developed for the identification and retrieval of morphological variants in searches of Latin text databases, applied to the Latin portion of the Hartlib Papers Collection and to a range of classical, vulgar and medieval Latin texts drawn from the Patrologia Latina and from the PHI Disk 5.3 datasets.
Abstract: This paper reports a detailed evaluation of the effectiveness of a system that has been developed for the identification and retrieval of morphological variants in searches of Latin text databases. A user of the retrieval system enters the principal parts of the search term (two parts for a noun or adjective, three parts for a deponent verb, and four parts for other verbs), thus enabling the identification of the type of word that is to be processed and of the rules that are to be followed in determining the morphological variants that should be retrieved. Two different search algorithms are described. The algorithms are applied to the Latin portion of the Hartlib Papers Collection and to a range of classical, vulgar and medieval Latin texts drawn from the Patrologia Latina and from the PHI Disk 5.3 datasets. The results of these searches demonstrate the effectiveness of our procedures in providing access to the full range of classical and post-classical Latin text databases.
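The principal-parts idea can be illustrated with a toy generator for a single declension: strip the genitive ending to obtain the stem, then attach the case endings a search would need to match. The ending table below covers only first-declension nouns and is a simplification of the rule sets the paper evaluates.

```python
# Illustrative sketch only: deriving search variants for a first-declension
# Latin noun from its two principal parts (nominative and genitive singular).
# The ending table covers just this one declension and is a simplification of
# the rule sets and search algorithms the paper describes.
FIRST_DECLENSION_ENDINGS = [
    "a", "ae", "am",        # singular: nom/voc/abl, gen/dat, acc
    "ae", "arum", "is", "as",   # plural: nom, gen, dat/abl, acc
]

def variants(nominative, genitive):
    """Strip the genitive ending to get the stem, then attach each case ending."""
    if not genitive.endswith("ae"):
        raise ValueError("sketch handles only first-declension nouns")
    stem = genitive[:-2]
    return sorted({nominative} | {stem + e for e in FIRST_DECLENSION_ENDINGS})

print(variants("puella", "puellae"))
# ['puella', 'puellae', 'puellam', 'puellarum', 'puellas', 'puellis']
```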

Journal ArticleDOI
TL;DR: The Orlando Project, based at the Universities of Alberta and Guelph, is using SGML to create an integrated electronic history of British women's writing in English, which incorporates sophisticated SGML encoding for content as well as structure.
Abstract: This paper describes the novel ways in which the Orlando Project, based at the Universities of Alberta and Guelph, is using SGML to create an integrated electronic history of British women's writing in English. Unlike most other SGML-based humanities computing projects which are tagging existing texts, we are researching and writing new material, including biographies, items of historical significance, and many kinds of literary and historical interpretation, all of which incorporate sophisticated SGML encoding for content as well as structure. We have created three DTDs, for biographies, for writing-related activities and publications, and for social, political and other events. A major factor influencing the design of the DTDs was the requirement to be able to merge and restructure the entire text base in many ways in order to retrieve and index it and to reflect multiple views and interpretations. In addition, a stable and well-documented system for tagging was deemed essential for a team which involves almost twenty people, including eight graduate students, in two locations.

Journal ArticleDOI
TL;DR: The article briefly reviews the history of the ELSE program, one of the first Computer-Assisted Language Learning (CALL) programs to use sophisticated error analysis, and describes ten desiderata for such programs and the problems that ensue as one tries to implement them in a specific piece of software.
Abstract: The article briefly reviews the history of the ELSE program, one of the first Computer-Assisted Language Learning (CALL) programs to use sophisticated error analysis. Its development evokes ten desiderata for such programs (not all of which are found in ELSE). The article, addressed to language teachers and to programmers working with CALL, describes those desiderata and the problems that ensue as one tries to implement them into a specific piece of software.

Journal ArticleDOI
TL;DR: In this article, the authors present two groups of text encoding problems encountered by the Brown University Women Writers Project (WWP), analyze the issues they raise, and present several possible approaches to these encoding problems.
Abstract: This paper presents two groups of text encoding problems encountered by the Brown University Women Writers Project (WWP). The WWP is creating a full-text database of transcriptions of pre-1830 printed books written by women in English. For encoding our texts we use Standard Generalized Markup Language (SGML), following the Text Encoding Initiative’s Guidelines for Electronic Text Encoding and Interchange. SGML is a powerful text encoding system for describing complex textual features, but a full expression of these may require very complex encoding, and careful thought about the intended purpose of the encoded text. We present here several possible approaches to these encoding problems, and analyze the issues they raise.

Journal ArticleDOI
TL;DR: An excellent collection of work is assembled which illustrates the flavour of humanities computing as the nineties draw to a close, and this snapshot is contextualized with respect to the past and present state of the field.
Abstract: At the conclusion of the 1997 Joint Annual Meeting of the Association for Computers in the Humanities and the Association for Literary and Linguistic Computing held at Queen's University in Canada, it was decided to bring together a representative collection of articles based on papers from the conference in order to provide a 'snapshot' of the state of the art in humanities computing. While it was not possible to cover all areas of the field, we have been able to assemble, in our opinion, an excellent collection of work which illustrates the flavour of humanities computing as the nineties draw to a close. At the same time, in what follows, we will attempt to contextualize this snapshot with respect to the past and present state of the field.