
Showing papers in "Computers and the Humanities in 2003"


Journal ArticleDOI
TL;DR: C-rater is an automated scoring engine that has been developed to score responses to content-based short answer questions, using predicate-argument structure, pronominal reference, morphological analysis and synonyms to assign full or partial credit.
Abstract: C-rater is an automated scoring engine that has been developed to score responses to content-based short answer questions. It is not simply a string-matching program – instead it uses predicate-argument structure, pronominal reference, morphological analysis and synonyms to assign full or partial credit to a short answer question. C-rater has been used in two studies: the National Assessment for Educational Progress (NAEP) and a statewide assessment in Indiana. In both studies, c-rater agreed with human graders about 84% of the time.

363 citations
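The abstract names the linguistic components but not the scoring algorithm, so the following is only a toy sketch of the general idea: crediting a response in proportion to the required concepts it covers after crude morphological normalization and synonym matching. The stemmer, concept lists and example item are all invented:

# Toy concept-matching scorer in the spirit of c-rater (not its actual algorithm).
# The model answer is a list of required concepts; each concept is a set of
# interchangeable word forms (synonyms). Credit is proportional to coverage.

def stem(word):
    # Crude morphological normalization, for the sketch only.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def score(response, concepts):
    tokens = {stem(w) for w in response.lower().split()}
    matched = sum(1 for synonyms in concepts if tokens & synonyms)
    return matched / len(concepts)   # 1.0 = full credit, fractions = partial

# Hypothetical item: "Why does ice float on water?"
concepts = [{"dense", "density"}, {"ice"}, {"water", "liquid"}]
print(score("Ice floats because it is less dense than liquid water", concepts))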


Journal ArticleDOI
TL;DR: Vocabulary richness is of marginal value in stylistic and authorship studies because the basic assumption that it constitutes a wordprint for authors is false.
Abstract: This article examines the usefulness of vocabulary richness for authorship attribution and tests the assumption that appropriate measures of vocabulary richness can capture an author's distinctive style or identity. After briefly discussing perceived and actual vocabulary richness, I show that doubling and combining texts affects some measures in computationally predictable but conceptually surprising ways. I discuss some theoretical and empirical problems with some measures and develop simple methods to test how well vocabulary richness distinguishes texts by different authors. These methods show that vocabulary richness is ineffective for large groups of texts because of the extreme variability within and among them. I conclude that vocabulary richness is of marginal value in stylistic and authorship studies because the basic assumption that it constitutes a wordprint for authors is false.

109 citations
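The doubling effect is easy to check numerically: concatenating a text with itself leaves every relative frequency unchanged, so a spectrum-based measure such as Yule's K changes only marginally, while the type-token ratio drops sharply. A minimal sketch in Python (the input file name is a placeholder; any plain text will do):

from collections import Counter

def ttr(tokens):
    return len(set(tokens)) / len(tokens)           # type-token ratio

def yules_k(tokens):
    n = len(tokens)
    freqs = Counter(tokens)                          # word -> frequency
    spectrum = Counter(freqs.values())               # frequency -> no. of types
    return 1e4 * (sum(i * i * v for i, v in spectrum.items()) - n) / (n * n)

text = open("sample.txt").read().lower().split()     # placeholder file
print(ttr(text), yules_k(text))                      # original text
print(ttr(text * 2), yules_k(text * 2))              # doubled text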


Journal ArticleDOI
TL;DR: A novel approach to the problem that employs a scoring scheme for computing phonetic similarity between phonetic segments on the basis of multivalued articulatory phonetic features, which performs better than comparable algorithms reported in the literature.
Abstract: The computation of the optimal phonetic alignment and the phonetic similarity between words is an important step in many applications in computational phonology, including dialectometry. After discussing several related algorithms, I present a novel approach to the problem that employs a scoring scheme for computing phonetic similarity between phonetic segments on the basis of multivalued articulatory phonetic features. The scheme incorporates the key concept of feature salience, which is necessary to properly balance the importance of various features. The new algorithm combines several techniques developed for sequence comparison: an extended set of edit operations, local and semiglobal modes of alignment, and the capability of retrieving a set of near-optimal alignments. On a set of 82 cognate pairs, it performs better than comparable algorithms reported in the literature.

93 citations
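In outline, the scheme assigns each pair of segments a similarity computed from multivalued feature values weighted by feature salience, and feeds that into a dynamic-programming alignment. The sketch below uses invented feature values and salience weights, and plain global alignment rather than the paper's local/semiglobal modes and near-optimal retrieval:

# Illustrative feature vectors: each feature takes a value in [0, 1];
# salience weights reflect how much each feature matters for similarity.
FEATURES = {                     # segment -> {feature: value}
    "p": {"place": 1.0, "manner": 1.0, "voice": 0.0},
    "b": {"place": 1.0, "manner": 1.0, "voice": 1.0},
    "f": {"place": 0.95, "manner": 0.8, "voice": 0.0},
    "v": {"place": 0.95, "manner": 0.8, "voice": 1.0},
}
SALIENCE = {"place": 40, "manner": 50, "voice": 10}

def sim(a, b):
    # Salience-weighted similarity between two phonetic segments.
    return sum(s * (1 - abs(FEATURES[a][f] - FEATURES[b][f]))
               for f, s in SALIENCE.items())

def align(x, y, gap=-30):
    # Global DP alignment maximizing total segment similarity.
    m, n = len(x), len(y)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1): d[i][0] = d[i - 1][0] + gap
    for j in range(1, n + 1): d[0][j] = d[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = max(d[i - 1][j - 1] + sim(x[i - 1], y[j - 1]),
                          d[i - 1][j] + gap, d[i][j - 1] + gap)
    return d[m][n]

print(align(["p", "f"], ["b", "v"]))   # similar segments align cheaply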


Journal ArticleDOI
TL;DR: A statistical analysis of the results shows, first, that language change can be measured, and second, that the rate of language change has not been uniform: in particular, the period 1939–1948 showed particularly slow change, while 1949–1958 and 1959–1968 showed particularly rapid change.
Abstract: This paper presents a numeric and information-theoretic model for the measurement of language change, without specifying the particular type of change. It is shown that this measurement is intuitively plausible and that meaningful measurements can be made from as few as 1000 characters. This measurement technique is extended to the task of determining the "rate" of language change based on an examination of brief excerpts from the National Geographic Magazine, determining both their linguistic distance from one another and the number of years of temporal separation. A statistical analysis of these results shows, first, that language change can be measured, and second, that the rate of language change has not been uniform; in particular, the period 1939–1948 showed particularly slow change, while 1949–1958 and 1959–1968 showed particularly rapid change.

72 citations
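The abstract does not spell out the measure, so the sketch below substitutes a generic information-theoretic proxy, normalized compression distance, which likewise needs only short character samples; it illustrates the kind of measurement rather than the authors' actual model (file names are placeholders):

import zlib

def C(s):
    return len(zlib.compress(s.encode("utf-8"), 9))

def distance(a, b):
    # Normalized compression distance: small when b is predictable from a.
    return (C(a + b) - min(C(a), C(b))) / max(C(a), C(b))

# Hypothetical excerpts from two decades, about 1000 characters each:
text_1940 = open("ng_1940.txt").read()[:1000]
text_1960 = open("ng_1960.txt").read()[:1000]
print(distance(text_1940, text_1960))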


Journal ArticleDOI
TL;DR: The author presents a method for identifying the author of a text, based on a hierarchical list of the frequencies of the words common to the whole set of texts.
Abstract: The author speaks on the occasion of receiving the 2001 Roberto Busa Award, conferred for his contribution to the field of computing and the humanities. He reviews his career and presents a new method for identifying the author of a text. His interest in this field of research began in the 1970s, with the analysis of a text by Jane Austen. He now presents a new approach based on a hierarchical list of the frequencies of the words common to the whole set of texts. He describes the procedures and results of this method, suggests some possible future developments, and surveys the state of the art in this field of computer-assisted research.

70 citations
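The bookkeeping behind such a method can be sketched as follows: build the ranked list of words common across the whole set of texts, then express each text as its relative frequencies over that list. This is only the scaffolding, with the statistical comparison itself left open:

from collections import Counter

def common_word_profiles(texts, n_words=150):
    # Rank words by total frequency across all texts; keep the top n.
    total = Counter(w for t in texts for w in t.lower().split())
    wordlist = [w for w, _ in total.most_common(n_words)]
    profiles = []
    for t in texts:
        counts = Counter(t.lower().split())
        size = sum(counts.values())
        profiles.append([counts[w] / size for w in wordlist])  # relative freqs
    return wordlist, profiles

# Texts by candidate authors plus a disputed text could then be compared
# word by word, e.g. by standardizing and correlating the profiles.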


Journal ArticleDOI
TL;DR: "Profile-based linguistic uniformity" is a method designed to compare language varieties on the basis of a wide range of potentially heterogeneous linguistic variables; its global similarity to current dialectometric methods makes it possible to compare the two approaches and to investigate the implications of notable differences.
Abstract: In this text we present "profile-based linguistic uniformity", a method designed to compare language varieties on the basis of a wide range of potentially heterogeneous linguistic variables. In many respects a parallel can be drawn with current methods in dialectometry (for an overview, see Nerbonne and Heeringa, 2001; Heeringa, Nerbonne and Kleiweg, 2002): in both cases dissimilarities between varieties on the basis of individual variables are summarized in global dissimilarities, and a series of language varieties are subsequently clustered or charted using multivariate techniques such as cluster analysis or multidimensional scaling. This global similarity between the methods makes it possible to compare them and to investigate the implications of notable differences. In this text we specifically focus on, and defend, one characteristic of our methodology: its profile-based nature.

57 citations
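The profile idea reduces each linguistic variable to a frequency distribution over its alternative variants in each variety; per-variable dissimilarities are then averaged into a global one, which can feed the clustering or multidimensional scaling mentioned above. A minimal sketch (variables, variants and frequencies are invented):

def profile_distance(p, q):
    # City-block distance between two variant-frequency distributions,
    # halved so it ranges from 0 (identical) to 1 (disjoint).
    variants = set(p) | set(q)
    return sum(abs(p.get(v, 0) - q.get(v, 0)) for v in variants) / 2

def global_dissimilarity(a, b):
    # Average the per-variable profile distances over shared variables.
    shared = set(a) & set(b)
    return sum(profile_distance(a[v], b[v]) for v in shared) / len(shared)

# Hypothetical data: variety -> variable -> variant -> relative frequency
variety_a = {"soft_drink": {"soda": 0.9, "pop": 0.1}}
variety_b = {"soft_drink": {"soda": 0.6, "pop": 0.4}}
print(global_dissimilarity(variety_a, variety_b))    # 0.3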


Journal ArticleDOI
TL;DR: A lexical distance measure is applied to assess the lexical relatedness of LAMSAS's sites, a popular focus of investigation in the past; the paper also extends dialectometric technique in suggesting means of dealing with alternate forms and multiple responses.
Abstract: The Linguistic Atlas of the Middle and South Atlantic States (LAMSAS) is admirably accessible for reanalysis (see http://hyde.park.uga.edu/lamsas/, Kretzschmar, 1994). The present paper applies a lexical distance measure to assess the lexical relatedness of LAMSAS's sites, a popular focus of investigation in the past (Kurath, 1949; Carver, 1989; McDavid, 1994). Several conclusions are noteworthy: First, and least controversially, we note that LAMSAS is dialectometrically challenging at least due to the range of fieldworkers and questionnaires employed. Second, on the issue of which areas ought to be recognized, we note that our investigations tend to support a three-way North/South/Midlands division rather than a two-way North/South division, i.e. they tend to support Kurath and McDavid rather than Carver, but this tendency is not conclusive. Third, we extend dialectometric technique in suggesting means of dealing with alternate forms and multiple responses.

55 citations
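One natural way to handle multiple responses per site, in the spirit of the paper's third point though not necessarily its exact proposal, is to treat each site's answers to an item as a set and use a Jaccard-style distance. The responses below are invented:

def item_distance(a, b):
    # a, b: sets of lexical variants offered at two sites for one item.
    # Set overlap copes naturally with multiple responses per informant.
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)

def site_distance(site1, site2):
    # Mean item distance over the questionnaire items both sites answered.
    items = [i for i in site1 if i in site2]
    return sum(item_distance(site1[i], site2[i]) for i in items) / len(items)

# Hypothetical responses to the "dragonfly" and "pail" items:
s1 = {"dragonfly": {"snake doctor"}, "pail": {"pail", "bucket"}}
s2 = {"dragonfly": {"snake feeder"}, "pail": {"bucket"}}
print(site_distance(s1, s2))    # (1 + 0.5) / 2 = 0.75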


Journal ArticleDOI
TL;DR: A new digital infrastructure for discovering language resources being developed by the Open Language Archives Community is reported on, designed to facilitate description and discovery of all kinds of language resources, including data, tools, or advice.
Abstract: As language data and associated technologies proliferate and as the language resources community expands, it is becoming increasingly difficult to locate and reuse existing resources. Are there any lexical resources for such-and-such a language? What tool works with transcripts in this particular format? What is a good format to use for linguistic data of this type? Questions like these dominate many mailing lists, since web search engines are an unreliable way to find language resources. This paper reports on a new digital infrastructure for discovering language resources being developed by the Open Language Archives Community (OLAC). At the core of OLAC is its metadata format, which is designed to facilitate description and discovery of all kinds of language resources, including data, tools, or advice. The paper describes OLAC metadata, its relationship to Dublin Core metadata, and its dissemination using the metadata harvesting protocol of the Open Archives Initiative.

37 citations
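Because OLAC disseminates its metadata through the Open Archives Initiative's harvesting protocol (OAI-PMH), a minimal harvester fits in a few lines of standard-library Python. The endpoint URL is a placeholder; real OLAC archives also serve the richer olac metadata prefix alongside plain Dublin Core:

import urllib.request
import xml.etree.ElementTree as ET

BASE = "http://www.example.org/oai"          # placeholder repository endpoint
NS = {"oai": "http://www.openarchives.org/OAI/2.0/",
      "dc": "http://purl.org/dc/elements/1.1/"}

url = BASE + "?verb=ListRecords&metadataPrefix=oai_dc"
tree = ET.parse(urllib.request.urlopen(url))
for record in tree.findall(".//oai:record", NS):
    title = record.find(".//dc:title", NS)
    if title is not None:
        print(title.text)                    # one harvested resource per line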


Journal ArticleDOI
TL;DR: Two essay-based discourse analysis systems that identify thesis and conclusion statements from student essays written on six different essay topics show similar results, indicating that a system can generalize to unseen data – that is, essay responses on topics that the system has not seen in training.
Abstract: This study describes and evaluates two essay-based discourse analysis systems that identify thesis and conclusion statements from student essays written on six different essay topics. Essays used to train and evaluate the systems were annotated by two human judges, according to a discourse annotation protocol. Using a machine learning approach, a number of discourse-related features were automatically extracted from a set of annotated training data. Using these features, two discourse analysis models were built using C5.0 with boosting: a topic-dependent and a topic-independent model. Both systems outperformed a positional algorithm. While the topic-dependent system showed somewhat higher performance, the topic-independent system showed similar results, indicating that a system can generalize to unseen data – that is, essay responses on topics that the system has not seen in training.

34 citations
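The positional baseline that both trained systems beat is easy to state exactly: guess the first sentence as the thesis and the last as the conclusion. A sketch (the sentence splitter is deliberately naive, and the file name is a placeholder):

import re

def positional_baseline(essay):
    # Naive sentence split; a real system would use a proper tokenizer.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", essay) if s.strip()]
    return {"thesis": sentences[0],          # guess: first sentence
            "conclusion": sentences[-1]}     # guess: last sentence

labels = positional_baseline(open("essay.txt").read())
print(labels["thesis"])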


Journal ArticleDOI
TL;DR: The aim of this paper is to find an acoustic distance measure between dialects which approximates a perceptual distance measure, applying the Levenshtein algorithm to spectra or formant value bundles instead of transcription segments.
Abstract: Gooskens (2003) described an experiment which determined linguistic distances between 15 Norwegian dialects as perceived by Norwegian listeners. The results are compared to Levenshtein distances, calculated on the basis of transcriptions (of the words) of the same recordings as used in the perception experiment. The Levenshtein distance is equal to the sum of the weights of the insertions, deletions and substitutions needed to change one pronunciation into another. The success of the method depends on the reliability of the transcriber. The aim of this paper is to find an acoustic distance measure between dialects which approximates the perceptual distance measure. We use and compare different representations of the acoustic signal: Barkfilter spectrograms, cochleagrams and formant tracks. We apply the Levenshtein algorithm to spectra or formant value bundles instead of transcription segments. Among these acoustic representations we got the best results using the formant track representation. However, the transcription-based Levenshtein distances still correlate more closely with the perceptual distances. The acoustic signal retains some speaker-dependent influence, while a transcriber abstracts from voice quality. Using more samples per dialect word (instead of only one, as in our research) should improve the accuracy of the measurements.

27 citations
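Replacing transcription segments with acoustic vectors changes only the cost functions of the Levenshtein computation. A sketch with hypothetical two-formant vectors, using Euclidean distance for substitutions and distance from a zero "silence" vector for indels (an illustrative choice, not the paper's exact weighting):

import math

def levenshtein_acoustic(x, y):
    # x, y: sequences of acoustic feature vectors (e.g., formant values
    # per segment). Substitution cost = Euclidean distance between vectors;
    # indel cost = distance from the zero vector, as an illustration.
    def indel(v):
        return math.dist(v, (0,) * len(v))
    m, n = len(x), len(y)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1): d[i][0] = d[i - 1][0] + indel(x[i - 1])
    for j in range(1, n + 1): d[0][j] = d[0][j - 1] + indel(y[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j - 1] + math.dist(x[i - 1], y[j - 1]),
                          d[i - 1][j] + indel(x[i - 1]),
                          d[i][j - 1] + indel(y[j - 1]))
    return d[m][n]

# Hypothetical (F1, F2) formant values in Hz, one vector per segment:
word_a = [(310, 2020), (600, 1200)]
word_b = [(320, 1990), (650, 1100)]
print(levenshtein_acoustic(word_a, word_b))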


Journal ArticleDOI
TL;DR: This special issue of Computers and the Humanities presents a range of recent work on dialectology and dialectometry, fields in which there has long been a perceived need for techniques that can deal with large amounts of data in a controlled manner, i.e. computational techniques.
Abstract: Dialectology is the study of dialects, and dialectometry is the measurement of dialect differences, i.e. linguistic differences whose distribution is determined primarily by geography. The earliest works in dialectology showed that language variation is complex both geographically and linguistically and cannot be reduced to simple characterizations. There has thus always been a perceived need for techniques which can deal with large amounts of data in a controlled manner, i.e. computational techniques. This special issue of Computers and the Humanities presents a range of recent work on this topic.

Journal ArticleDOI
TL;DR: The Theatre of Pompey became the architectural Ur-text for many of the numerous theatres built throughout the Roman Empire, and in the Renaissance left its imprint upon such seminal theatres as the Teatro Olimpico at Vicenza and theTeatro Farnese at Parma.
Abstract: In 55 BC the triumphal general Pompey the Great dedicated Rome's first permanent theatre and named it after himself. This was no ordinary theatre. Pompey's sumptuous and grandiose edifice, probably the largest theatre ever built, comprised, in addition to the Theatre itself (the stage of which was 300 feet wide), an extensive "leisure-complex" of gardens enclosed within a colonnade, and galleries displaying rare works of art. It also included a curia (a meeting house for the Senate), and it was in this building that Caesar was assassinated in 44 BC. A grand temple above the uppermost tiers of the auditorium, dedicated to Pompey's patron divinity, Venus Victrix, crowned the entire architecturally unified monument. Although the theatre was built upon the flats of the Campus Martius, this, its highest point, was second in height only to the temple of Jupiter on the Capitol. According to our research, the auditorium or cavea beneath it may have accommodated some 25,000 spectators. Pompey's gift to the Roman people was for centuries the site of many of the most important events in the cultural and political life of the city. Nero himself performed upon its stage, much to the disgust of the senatorial class and the delight of the masses. As late as the 6th century AD, when it was restored for the last time, the theatre was still sufficiently imposing for Cassiodorus to exclaim, "one would have thought it more likely for mountains to subside, than this strong building be shaken". Over five centuries earlier, when Vitruvius wrote his influential treatise, De Architectura, his detailed account of how a "typical" Roman theatre should be built was based upon Pompey's recently-completed edifice; indeed, at the time he wrote, it was probably still the only stone theatre in the city of Rome. Thus, through Vitruvius, the Theatre of Pompey became the architectural Ur-text for many of the numerous theatres built throughout the Roman Empire. Subsequently, in the Renaissance, through the influence of Vitruvius, the Theatre of Pompey left its imprint upon such seminal theatres as the Teatro Olimpico at Vicenza and the Teatro Farnese at Parma. This single theatre, therefore, had a unique

Journal ArticleDOI
TL;DR: Gilbert Adair's pastiche of Lewis Carroll, Alice Through the Needle's Eye, is compared with the original 'Alice' books; a principal component analysis based on word frequencies finds that the main differences are not due to authorship.
Abstract: This paper considers the question of authorship attribution techniques when faced with a pastiche. We ask whether the techniques can distinguish the real thing from the fake, or can the author fool the computer? If the latter, is this because the pastiche is good, or because the technique is faulty? Using a number of mainly vocabulary-based techniques, Gilbert Adair's pastiche of Lewis Carroll, Alice Through the Needle's Eye, is compared with the original 'Alice' books. Standard measures of lexical richness, Yule's K and Orlov's Z, both distinguish Adair from Carroll, though Z also distinguishes the two originals. A principal component analysis based on word frequencies finds that the main differences are not due to authorship. A discriminant analysis based on word usage and lexical richness successfully distinguishes the pastiche from the originals. Weighted cusum tests were also unable to distinguish the two authors in a majority of cases. As a cross-validation, we made similar comparisons with control texts: another children's story from the same era, and other work by Carroll and Adair. The implications of these findings are discussed.
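A principal component analysis of this kind needs only a matrix of standardized word frequencies, one row per text. A sketch via singular value decomposition (the input file is a hypothetical texts-by-words frequency table):

import numpy as np

# Rows: texts (Carroll originals, Adair pastiche, controls);
# columns: relative frequencies of the most common words.
freqs = np.loadtxt("word_freqs.csv", delimiter=",")   # hypothetical matrix
z = (freqs - freqs.mean(axis=0)) / freqs.std(axis=0)  # standardize columns
u, s, vt = np.linalg.svd(z, full_matrices=False)      # PCA via SVD
scores = u * s                   # texts projected onto the components
print(scores[:, :2])             # first two PCs, e.g. for a scatter plot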

Journal ArticleDOI
TL;DR: The mechanics of ebook production at the Etext Center, the limits of the current technology, and the conversion workflow the authors hope to implement in the future are discussed.
Abstract: Between August 2000 and August 2002, the Electronic Text Center at the University of Virginia distributed over seven million freely-available electronic books to users from more than 100 different countries. Delivered in a variety of formats, including .lit and .pdb, these ebooks have provided proof-of-concept for the adaptive uses of TEI standards beyond the World Wide Web – standards that the Electronic Text Center has employed since its inception in 1992. The first half of this paper discusses the mechanics of ebook production at the Etext Center, the limits of the current technology, and the conversion workflow we hope to implement in the future. The second half discusses user response to our ebook collection, classroom applications of ebook technology, and the advantages and disadvantages that different formats offer to scholars and instructors in the humanities.

Journal ArticleDOI
TL;DR: Meta-interpretation, a method that combines individual responses to a text, reading logs, screen recordings and limited qualitative/quantitative analysis, and critical interpretation, is outlined; the method addresses Espen Aarseth's concerns and illuminates interesting features of interactive processes in fictional environments.
Abstract: Traditional discourses upon literature have been predicated upon the ability to refer to a text that others may consult (Landow, 1994, p. 33). Texts that involve elements of feedback and non-trivial decision-making on the part of the reader (Aarseth, 1997, p. 1) therefore present a challenge to readers and critics alike. Since a persuasive case has been made against a critical method that sets out to "identify the task of interpretation as a task of territorial exploration and territorial mastery" (Aarseth, p. 87), this paper proposes the use of readers in an empirically based approach to hypertext fiction. Meta-interpretation, a method that combines individual responses to a text, reading logs, screen recordings and limited qualitative/quantitative analysis, and critical interpretation is outlined. By analysing readers' responses it is possible to suggest both the ways that textual elements may have influenced or determined readers' choices and the ways that readers' choices "configure" the text. The method thus addresses Espen Aarseth's concerns and illuminates interesting features of interactive processes in fictional environments. The paper is divided into two parts: the first part sketches out meta-interpretation through consideration of the main problems confronting the literary critic; the second part describes reading research aimed at generating data for the literary critic.

Journal ArticleDOI
TL;DR: Gene order analysis for Chaucer's Canterbury Tales supports the idea that there was no established order when the first manuscripts were written; the resulting stemma shows relationships predicted by earlier scholars, reveals new relationships, and shares features with a word variation stemma.
Abstract: Chaucer's Canterbury Tales consists of loosely-connected stories, appearing in many different orders in extant manuscripts. Differences in order result from rearrangements by scribes during copying, and may reveal relationships among manuscripts. Identifying these relationships is analogous to determining evolutionary relationships among organisms from the order of genes on a genome. We use gene order analysis to construct a stemma for the Canterbury Tales. This stemma shows relationships predicted by earlier scholars, reveals new relationships, and shares features with a word variation stemma. Our results support the idea that there was no established order when the first manuscripts were written.
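The biological analogy can be made concrete with a breakpoint distance: two manuscripts are close when tales adjacent in one are also adjacent in the other. A sketch with invented tale orders (not transcriptions of real manuscripts); a matrix of such pairwise distances could then feed a stemma-building method:

def breakpoint_distance(order1, order2):
    # Count adjacencies in order1 that are not preserved (in either
    # direction) in order2 -- a standard gene-order dissimilarity.
    adj2 = {frozenset(p) for p in zip(order2, order2[1:])}
    return sum(1 for p in zip(order1, order1[1:]) if frozenset(p) not in adj2)

ms_a = ["Knight", "Miller", "Reeve", "Cook", "Man of Law"]
ms_b = ["Knight", "Miller", "Cook", "Reeve", "Man of Law"]
print(breakpoint_distance(ms_a, ms_b))   # 2 adjacencies broken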


Journal ArticleDOI
TL;DR: An adjusted version of an articulation-based system developed by Almeida and Braun (1986) is used to find sound distances between IPA transcriptions; classifying dialects with these distances yields a division with clear similarities to traditional dialect maps.
Abstract: Measuring dialect distances can be based on the comparison of words, and the comparison of words should be based on the comparison of sounds. In this research we used an adjusted version of an articulation-based system, developed by Almeida and Braun (1986), for finding sound distances between IPA transcriptions. For the comparison of two pronunciations of a word corresponding with two different varieties, we used the Levenshtein algorithm, which finds the easiest way in which one word can be changed into the other by inserting, deleting or substituting sounds. As weights for these three operations we used the distances found with the Almeida and Braun system. The dialect distance is then equal to the average of a range of word distances. We applied the technique to 360 Dutch dialects. The transcriptions of 125 words for each dialect are taken from the Reeks Nederlandse Dialectatlassen (Blancquaert and Pee, 1925-1982). Classifying dialects with these distances yields a division with clear similarities to traditional dialect maps. Using logarithmic sound distances improves results compared to results based on constant sound distances.
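In outline: a sound-distance table supplies the operation weights, the Levenshtein algorithm finds the cheapest edit sequence per word, and the dialect distance is the mean over the word list. The toy distance table and transcriptions below are invented; the adjusted Almeida and Braun values (optionally log-scaled) would take their place:

def sound_dist(a, b):
    # Toy articulatory distances between sounds; a calibrated table
    # (possibly log-scaled) would replace these numbers.
    table = {frozenset("ao"): 0.4, frozenset("sz"): 0.2}
    if a == b:
        return 0.0
    return table.get(frozenset(a + b), 1.0)

def word_distance(w1, w2, indel=0.5):
    # Weighted Levenshtein distance between two transcriptions.
    m, n = len(w1), len(w2)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1): d[i][0] = i * indel
    for j in range(1, n + 1): d[0][j] = j * indel
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j - 1] + sound_dist(w1[i - 1], w2[j - 1]),
                          d[i - 1][j] + indel, d[i][j - 1] + indel)
    return d[m][n]

def dialect_distance(words1, words2):
    # Mean over the word list (125 words per dialect pair in the paper).
    return sum(map(word_distance, words1, words2)) / len(words1)

print(dialect_distance(["huis"], ["hoes"]))   # hypothetical transcriptions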

Journal ArticleDOI
TL;DR: This paper shall concentrate on the problem of assisting the automatic categorisation of small segments of a philosophical text into a set of thematic categories.
Abstract: There are two important strategies in computer-assisted reading and analysis of text (CARAT). The first relates to the classification process, and the second pertains to the categorisation process. These two often-interrelated operations have been regularly recognised as essential components of text analysis. However, the two operations are highly time-consuming. A possible solution to this problem calls upon more inductive or bottom-up strategies that are numerical and statistical in nature. In our own research, we have been exploring a few of these techniques and their combination. We now know, through our own past research and others' work, that the classification methods allow a good empirical thematic exploration of a corpus. More specifically, in this paper we shall concentrate on the problem of assisting the automatic categorisation of small segments of a philosophical text into a set of thematic categories.
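The abstract leaves the particular numerical techniques open, so the following is only a generic sketch of one bottom-up option: representing segments as word-count vectors and assigning each to the thematic category with the most similar centroid, the centroids having been built from hand-labelled training segments. All names and numbers are invented:

from collections import Counter

def vector(text, vocab):
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return dot / norm if norm else 0.0

def categorise(segment, centroids, vocab):
    # Assign the segment to the thematic category whose centroid is closest.
    v = vector(segment, vocab)
    return max(centroids, key=lambda cat: cosine(v, centroids[cat]))

# Hypothetical vocabulary and centroids from hand-labelled segments:
vocab = ["being", "essence", "cause", "virtue"]
centroids = {"metaphysics": [5, 4, 3, 0], "ethics": [1, 0, 1, 6]}
print(categorise("Virtue is a cause of the good", centroids, vocab))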

Journal ArticleDOI
TL;DR: Responses in personal interviews about education and career with 415 Swedish men and women (age 34) form the basis of a speech corpus with 1.8 million words, whose vocabulary is described by means of two sets of variables and related to a broad set of respondent characteristics.
Abstract: Responses in personal interviews about education and career with 415 Swedish men and women (age 34) form the basis of a speech corpus with 1.8 million words. The vocabulary is described by means of two sets of variables. One is based on the number of tokens and types, word length and sectioning of the running text. The other set divides the corpus into grammatical categories. Both sets of variables are related to a number of background variables such as gender, socioeconomic background, education, and indicators of verbal proficiency at age 13 and 32. This possibility to study the relationship between vocabulary and a broad set of respondent characteristics is a unique feature of this corpus.

Journal ArticleDOI
TL;DR: The use of a finite state machine (FSM) to disambiguate speech acts in a machine translation system is described; evaluation results show that the discourse processor is able to disambiguate speech acts and improve the quality of the dialogue translation.
Abstract: A common tool for improving the performance quality of natural language processing systems is the use of contextual information for disambiguation. Here I describe the use of a finite state machine (FSM) to disambiguate speech acts in a machine translation system. The FSM has two layers that model, respectively, the global and local structures found in naturally-occurring conversations. The FSM has been modeled on a corpus of task-oriented dialogues in a travel planning situation. In the dialogues, one of the interactants is a travel agent or hotel clerk, and the other a client requesting information or services. A discourse processor based on the FSM was implemented in order to process contextual information in a machine translation system. Evaluation results show that the discourse processor is able to disambiguate and improve the quality of the dialogue translation. Other applications include human-computer interaction and computer-assisted language learning.
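The two-layer design can be sketched as a global machine over dialogue phases plus, per phase, a local mapping from surface cues to speech acts. States, cues and acts below are invented stand-ins for the corpus-derived model:

# Global layer: transitions between dialogue phases in a travel dialogue.
GLOBAL = {("opening", "greet"): "negotiation",
          ("negotiation", "accept"): "closing",
          ("negotiation", "request"): "negotiation"}

# Local layer: within a phase, map a surface cue to a speech act.
LOCAL = {"opening":     {"hello": "greet"},
         "negotiation": {"could you": "request", "that's fine": "accept"},
         "closing":     {"bye": "farewell"}}

def disambiguate(utterances):
    state, acts = "opening", []
    for u in utterances:
        # Pick the first matching cue in the current phase; default to
        # a generic act when no cue matches.
        act = next((a for cue, a in LOCAL[state].items() if cue in u.lower()),
                   "inform")
        acts.append(act)
        state = GLOBAL.get((state, act), state)    # global transition
    return acts

print(disambiguate(["Hello!", "Could you book a room?", "That's fine."]))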

Journal ArticleDOI
TL;DR: This study shows how cluster analysis can shed light on very complex variation in a transitional dialect zone in eastern Finland, demonstrating that the effects of the old parishes, borders and settlements are still visible in the dialects.
Abstract: The aim of this study is to show how cluster analysis can shed light on very complex variation in a transitional dialect zone in eastern Finland. In the course of history this area has been on the border between Sweden and Russia, and the population has clearly been of two kinds: the Savo people and the Karelians. It is a well-known fact that there is variation among these dialects, but the spread and extent of the variation has not been demonstrated previously. The idiolects of the area were studied in the light of ten phonological and morphological features. The material consisted of recordings of 198 idiolects, totalling around 195 hours and representing 19 parishes. The variation was analysed using hierarchical cluster analysis. While the analysis showed the extent of the variation between idiolects and parishes, it also demonstrated how the effects of the old parishes, borders and settlements are still visible in the dialects. On the parish level, the data formed clear clusters that correspond with the main dialects in the area and its surroundings. On the idiolect level, however, the speakers from the surrounding areas formed fairly homogenous clusters, but the idiolects from the Savonlinna area were spread across almost all clusters.
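The analysis pipeline is conventional enough to sketch with SciPy (the feature file layout is hypothetical: one row per idiolect, one column per linguistic feature frequency):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Rows: idiolects; columns: frequencies of the ten phonological and
# morphological features (hypothetical file layout).
data = np.loadtxt("idiolect_features.csv", delimiter=",")
z = linkage(data, method="ward")                   # agglomerative clustering
labels = fcluster(z, t=5, criterion="maxclust")    # cut tree into 5 clusters
for idiolect, label in enumerate(labels):
    print(idiolect, label)                         # cluster per speaker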

Journal ArticleDOI
TL;DR: Another approach is given to constructing functions similar to the so-called "volume function" describing the chronological distribution of information in historical texts.
Abstract: In their papers, Kalashnikov et al. (1986), Rachev et al. (1989) and Fomenko et al. (1990) introduced the so-called "volume function" describing the chronological distribution of information in historical texts. Here we give another approach to constructing similar functions.
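The cited papers define the volume function formally; informally, it records how much text a chronicle devotes to each year it covers. A toy construction under that informal reading, with invented data:

def volume_function(chapters):
    # chapters: {year: text devoted to that year}. The function maps each
    # year to the amount of text (characters, here) describing it.
    return {year: len(text) for year, text in sorted(chapters.items())}

# Hypothetical chronicle split by year:
chronicle = {1389: "A long account of the battle ...",
             1390: "Brief note.",
             1391: "Another extensive narrative ..."}
print(volume_function(chronicle))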



Journal ArticleDOI
Øyvind Eide1
TL;DR: This paper will present a publication system in which selected material from letter collections is presented as dialogues between two persons.
Abstract: In this paper, we will present a publication system in which selected material from letter collections is presented as dialogues between two persons.
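The core of such a dialogue view is a date-ordered merge of the two correspondents' letters. A minimal sketch with invented letters:

from datetime import date

def as_dialogue(letters_a, letters_b):
    # Each list holds (date, sender, text) tuples for one side of a
    # correspondence; merging by date yields a dialogue-like view.
    return sorted(letters_a + letters_b, key=lambda letter: letter[0])

a = [(date(1890, 3, 1), "A", "Dear friend, ..."),
     (date(1890, 4, 2), "A", "Thank you for ...")]
b = [(date(1890, 3, 15), "B", "In reply to yours ...")]
for when, sender, text in as_dialogue(a, b):
    print(when, sender + ":", text)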

Journal ArticleDOI
Anne Mahoney1
TL;DR: An encoding for representing quantitative metrical analyses in TEI SGML or XML documents, using only characters from the standard keyboard set, and a system for converting this encoding to other forms for display is described.
Abstract: This paper describes an encoding for representing quantitative metrical analyses in TEI SGML or XML documents, using only characters from the standard keyboard set, and a system for converting this encoding to other forms for display.
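To convey the flavour of such a scheme, here is a hypothetical keyboard notation (not the paper's actual encoding) and a converter to conventional display symbols:

# Hypothetical keyboard encoding: '-' long, 'u' short, 'x' anceps,
# '|' foot boundary; converted to conventional metrical symbols.
DISPLAY = {"-": "\u2013", "u": "\u23D1", "x": "\u00D7", "|": "|"}

def render(scansion):
    return "".join(DISPLAY[c] for c in scansion)

# A dactylic hexameter pattern, say:
print(render("-uu|-uu|-uu|-uu|-uu|-x"))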

Journal ArticleDOI
TL;DR: GIS methodology was used for the purpose of locating the disputed site of a historically significant battle, which took place in 1854 when miners on an Australian goldfield staged an armed uprising against government forces.
Abstract: GIS methodology was used for the purpose of locating the disputed site of a historically significant battle, which took place in 1854 when miners on an Australian goldfield staged an armed uprising against government forces. The route of the first survey of the area (1854) and the earliest known contour map (1856–1857) were overlaid on a modern street grid. Other features such as the vantage points of illustrators and the authors of eyewitness accounts were also incorporated. The resulting composite map was used as the key reference framework for comparing and critically evaluating a large body of primary and secondary written accounts, and for reaching a conclusion concerning the site.
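Overlaying an 1850s survey route and contour map on a modern street grid amounts to estimating a coordinate transform from control points identifiable in both. A least-squares affine fit sketch with invented coordinates:

import numpy as np

def fit_affine(src, dst):
    # Least-squares affine transform mapping historical-map coordinates
    # (src) onto modern grid coordinates (dst); needs >= 3 control points.
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    A = np.hstack([src, np.ones((len(src), 1))])   # rows of [x, y, 1]
    coeffs, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return lambda pts: np.hstack([np.asarray(pts, float),
                                  np.ones((len(pts), 1))]) @ coeffs

# Hypothetical control points: church corners, survey markers, etc.
to_modern = fit_affine(src=[(0, 0), (10, 0), (0, 10), (10, 10)],
                       dst=[(502, 311), (612, 309), (500, 422), (611, 421)])
print(to_modern([(5, 5)]))    # a disputed location in modern coordinates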

Journal ArticleDOI
TL;DR: This paper examines the were-subjunctive in British rural dialects in the light of data from two sources: the Survey of English Dialects (SED) questionnaire, and the Leeds Corpus of English Dialect (LCED), consisting of transcribed recordings made at the same time as the data was gathered for the questionnaire.
Abstract: This paper examines the were-subjunctive in British rural dialects in the light of data from two sources: the Survey of English Dialects (SED) questionnaire, and the Leeds Corpus of English Dialect (LCED), consisting of transcribed recordings made at the same time as the data was gathered for the questionnaire. We begin by surveying previous work on the subjunctive in general, and the were-subjunctive in dialect grammar in particular (section 1), culminating in a discussion of the SED data on the were-subjunctive. We then move on in section 2 to pose two hypotheses: firstly that the SED does not provide a complete picture of this phenomenon and thus corpus data may be of use in enriching it; secondly a "null" hypothesis that no were-subjunctive is consistently marked in the dialects in question. We then look at the methodology and data used (section 3), describing the source of our data, the LCED. We also note some potential difficulties (3.1) before moving on to discuss the choice of an area of England to examine (3.2) and of texts to analyse (3.3). In section 3.4 we describe the mark-up scheme used in the analysis of the texts, and in 3.5 the process of annotation and extraction of results from the texts. These results are presented in section 4. We consider the corpus data in relation to the questionnaire data (4.1), and to our two hypotheses (4.2 and 4.3). In our Conclusion (section 5) we summarise the implications of this study and consider some possible future routes of enquiry into the were-subjunctive in the rural dialects of England.
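Once the transcripts are machine-readable, candidate were-subjunctive contexts can be pulled out mechanically before hand-checking. A crude first-pass filter (not the paper's mark-up scheme; the file name is a placeholder):

import re

# First-pass filter: was/were within a few words of a hypothetical trigger
# (if, wish, suppose); hits still need manual annotation.
PATTERN = re.compile(
    r"\b(if|wish|wished|suppose|supposing)\b(?:\W+\w+){0,4}?\W+(was|were)\b",
    re.IGNORECASE)

for line in open("lced_transcript.txt"):        # hypothetical file name
    m = PATTERN.search(line)
    if m:
        print(m.group(2).lower(), "|", line.strip())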