
Showing papers in "Digital Scholarship in the Humanities in 2017"


Journal ArticleDOI
TL;DR: It is shown that feature vector normalization, that is, the transformation of the feature vectors to a uniform length of 1 (implicit in the cosine measure), is the decisive factor for the improvement of Delta proposed recently.
Abstract: This article builds on a mathematical explanation of one of the most prominent stylometric measures, Burrows’s Delta (and its variants), to understand and explain its working. Starting with the conceptual separation between feature selection, feature scaling, and distance measures, we have designed a series of controlled experiments in which we used the kind of feature scaling (various types of standardization and normalization) and the type of distance measure (notably Manhattan, Euclidean, and Cosine) as independent variables and the correct authorship attributions as the dependent variable indicative of the performance of each of the methods proposed. In this way, we are able to describe in some detail how each of these two variables interacts with the other and how they influence the results. Thus we can show that feature vector normalization, that is, the transformation of the feature vectors to a uniform length of 1 (implicit in the cosine measure), is the decisive factor for the improvement of Delta proposed recently. We are also able to show that the information particularly relevant to the identification of the author of a text lies in the profile of deviation across the most frequent words rather than in the extent of the deviation or in the deviation of specific words only.
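
A minimal sketch of the experimental setup described above, assuming nothing beyond numpy; this is not the authors' code, and the tiny word-frequency matrix is invented purely for illustration:

import numpy as np

# rows = texts, columns = relative frequencies of the most frequent words (toy values)
freqs = np.array([
    [0.061, 0.042, 0.030, 0.011],   # text A1 (author A)
    [0.058, 0.044, 0.028, 0.013],   # text A2 (author A)
    [0.045, 0.055, 0.020, 0.021],   # text B1 (author B)
])

# feature scaling: standardize each word column across the corpus (Burrows's z-scores)
z = (freqs - freqs.mean(axis=0)) / freqs.std(axis=0)

def manhattan(a, b):              # classic Burrows's Delta is a mean Manhattan distance
    return np.abs(a - b).mean()

def euclidean(a, b):
    return np.linalg.norm(a - b)

def cosine(a, b):                 # implicitly normalizes both vectors to unit length,
    a, b = a / np.linalg.norm(a), b / np.linalg.norm(b)   # the decisive factor identified above
    return 1.0 - np.dot(a, b)

for name, dist in [("Manhattan", manhattan), ("Euclidean", euclidean), ("Cosine", cosine)]:
    same = dist(z[0], z[1])       # distance between two texts by the same author
    diff = dist(z[0], z[2])       # distance to a text by a different author
    print(f"{name:9s} same-author {same:.3f}  different-author {diff:.3f}")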

102 citations


Journal ArticleDOI
TL;DR: In this paper, the authors discuss reliability issues of a few visual techniques used in stylometry, and introduce a new method that enhances the explanatory power of visualization with a procedure of validation inspired by advanced statistical methods.
Abstract: The aim of this article is to discuss reliability issues of a few visual techniques used in stylometry, and to introduce a new method that enhances the explanatory power of visualization with a procedure of validation inspired by advanced statistical methods. A promising way of extending cluster analysis dendrograms with a self-validating procedure involves producing numerous particular ‘snapshots’, or dendrograms produced using different input parameters, and combining them all into the form of a consensus tree. Significantly better results, however, can be obtained using a new visualization technique, which combines the idea of nearest neighborhood derived from cluster analysis and the idea of hammering out a clustering consensus from bootstrap consensus trees with the idea of mapping textual similarities onto a form of a network. Additionally, network analysis seems to be a good solution for large data sets.
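
The bootstrap-consensus idea behind the network visualization can be illustrated with a short sketch (not the authors' implementation; the z-scored frequencies below are random stand-ins): resample the feature set many times, record each text's nearest neighbour in every resampling, and let the frequency of a link become the weight of a network edge.

import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
texts = ["A1", "A2", "B1", "B2"]
z = rng.normal(size=(4, 8))                       # toy z-scored frequencies of 8 frequent words
z[1] = z[0] + rng.normal(scale=0.3, size=8)       # make A2 resemble A1
z[3] = z[2] + rng.normal(scale=0.3, size=8)       # make B2 resemble B1

edges = Counter()
for _ in range(200):                              # 200 bootstrap "snapshots"
    cols = rng.choice(8, size=6, replace=False)   # a random subset of the features
    sub = z[:, cols]
    for i, name in enumerate(texts):
        d = np.abs(sub - sub[i]).mean(axis=1)     # Manhattan distance to every other text
        d[i] = np.inf
        neighbour = texts[int(d.argmin())]        # nearest neighbour in this snapshot
        edges[tuple(sorted((name, neighbour)))] += 1

# edge weight ~ how consistently two texts are nearest neighbours across snapshots
print(edges.most_common())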

67 citations


Journal ArticleDOI
TL;DR: The ancient Maya belief that sight played a key role in structuring everyday experiences because it triggered perception in the other senses serves to bridge the computational and experiential results in this case study.
Abstract: For several decades, Geographic Information Systems (GISs) have held center stage in archaeological studies of ancient landscapes. Recently, three-dimensional (3D) technologies such as airborne LiDAR and aerial photogrammetry are allowing us to acquire inordinate amounts of georeferenced 3D data to locate, map, and visualize archaeological sites within their surrounding landscapes. GIS offers locational precision, data overlay, and complex spatial analysis. Three-dimensionality adds a ground-based perspective lacking in two-dimensional GIS maps to provide archaeologists a sense of mass and space more closely attuned with human perception. This article uses comparative and iterative approaches ‘tacking back and forth’ between GIS and 3D visualization to explore the role of visibility in conveying sociopolitical and ideological messages at ancient Copan—today a UNESCO World Heritage Site in Honduras. A two-prong approach comprising computational and experiential components explores the potential role of visibility in sending messages that participate in the shaping of social interaction on a daily basis. The organization of built forms within the natural landscape created spatial configurations that sent visual messages targeting specific groups, subsequently influencing how people negotiated their physical surroundings and the frequency and intensity of social interactions. The ancient Maya belief that sight played a key role in structuring everyday experiences because it triggered perception in the other senses thus serves to bridge the computational and experiential results in this case study.
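
A hedged sketch of the most basic building block of such a computational visibility analysis, a line-of-sight test over an elevation raster; the toy grid and observer height are invented, and this is not the project's GIS workflow:

import numpy as np

# toy digital elevation model in metres; a real analysis would use a LiDAR-derived raster
dem = np.array([
    [100, 101, 103, 104],
    [100, 102, 107, 105],
    [100, 101, 104, 103],
    [100, 100, 102, 110],
], dtype=float)

def visible(dem, observer, target, eye_height=1.6):
    """Return True if the target cell can be seen from the observer cell."""
    (r0, c0), (r1, c1) = observer, target
    z0 = dem[r0, c0] + eye_height
    z1 = dem[r1, c1]
    for t in np.linspace(0, 1, 20)[1:-1]:              # sample points along the sight line
        terrain = dem[int(round(r0 + t * (r1 - r0))), int(round(c0 + t * (c1 - c0)))]
        sightline = z0 + t * (z1 - z0)                  # height of the sight line at this point
        if terrain > sightline:
            return False                                # terrain blocks the view
    return True

print(visible(dem, (0, 0), (3, 3)))                     # is the far corner visible from the origin?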

28 citations


Journal ArticleDOI
TL;DR: In this article, a meso-level visualization of aligned text is presented for comparative reading on the screen, all the while assembling non-contradictory, intuitive solutions for the visual exploration of multi-scalar variance.
Abstract: Medieval literary traditions provide a particularly challenging test case for textual alignment and the visualization of variance. Whereas the editors of medieval traditions working with the printed page struggle to illustrate the complex phenomena of textual instability, research in screen-based visualization has made significant progress, allowing for complex textual situations to be captured at the micro- and the macro-level. This article uses visualization and a computational approach to identifying variance to allow the analysis of different medieval poetic works using the transcriptions of how they are found in particular manuscripts. It introduces the notion of a meso-level visualization, a visual representation of aligned text providing for comparative reading on the screen, all the while assembling non-contradictory, intuitive solutions for the visual exploration of multi-scalar variance. Building upon the literary notion of mouvance, it delves into medieval French literature and, in particular, different visualizations of three versions of the Chanson de Roland (the Oxford, the Châteauroux, and the Venice 4 manuscripts). The article presents experimental prototypes for such meso-level visualization and explores how they can advance our understanding of formulaically rich medieval poetry.
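
As a minimal illustration of the word-level alignment that such a meso-level view aggregates, the sketch below aligns two variant verses with Python's standard-library SequenceMatcher (a stand-in for the project's aligner; the first verse follows the opening of the Oxford Roland, the second is invented for illustration):

from difflib import SequenceMatcher

v1 = "Carles li reis nostre emperere magnes".split()
v2 = "Carles li reis a la barbe grifaigne".split()

sm = SequenceMatcher(a=v1, b=v2)
for tag, i1, i2, j1, j2 in sm.get_opcodes():
    if tag == "equal":
        print("shared :", " ".join(v1[i1:i2]))
    else:
        print("variant:", " ".join(v1[i1:i2]) or "-", "<->", " ".join(v2[j1:j2]) or "-")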

24 citations



Journal ArticleDOI
TL;DR: The set-up of 'The Newstracker', a tool that primarily allowed us to analyse online news consumption of a group of young Dutch news users on their desktop and laptop computers, and the need for a multimethod study design when aiming to understand online user behaviour is described.
Abstract: Understanding people's online behaviour has traditionally been a field of interest of commercial research agencies. However, academic researchers in a variety of fields are interested in the same type of data to gain insights into the Web behaviour of users. Digital Humanities scholars interested in the use of digital collections are, for example, interested in the navigation paths of users to these collections. In our case we wanted (1) to analyse the way news consumers visit news websites and (2) to understand how these websites fit into their daily news consumption patterns. Until now, the most commonly applied scholarly research methods for analysing online user behaviour have focused on analyses of log files provided by website owners or on user behaviour recalled through survey, diary, or interview methods. Only recently have scholars started to experiment with gathering real-world data on Web behaviour by monitoring a group of respondents. In this article we describe the set-up of 'The Newstracker', a tool that primarily allowed us to analyse the online news consumption of a group of young Dutch news users on their desktop and laptop computers. We demonstrate the workflow of the Newstracker and how we designed the data collection and pre-processing phases. By reflecting on the technical, methodological, and analytical challenges we encountered, we illustrate the potential of online monitoring tools such as the Newstracker. We end our article by discussing its limitations, stressing the need for a multimethod study design when aiming not only to analyse but also to understand online user behaviour.
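
A hedged sketch of one pre-processing step of the kind described (not the Newstracker code): classify logged page visits as news or non-news by domain and count news visits per respondent. The log records and domain whitelist are invented.

from urllib.parse import urlparse
from collections import Counter

NEWS_DOMAINS = {"nos.nl", "nu.nl", "volkskrant.nl"}          # illustrative whitelist

log = [                                                      # invented log records
    {"user": "r01", "url": "https://nos.nl/artikel/123", "ts": "2016-03-01T08:02:11"},
    {"user": "r01", "url": "https://www.google.com/search?q=news", "ts": "2016-03-01T08:03:40"},
    {"user": "r02", "url": "https://www.nu.nl/binnenland/456", "ts": "2016-03-01T09:15:02"},
]

news_visits = Counter()
for record in log:
    domain = urlparse(record["url"]).netloc.lower()
    if domain.startswith("www."):
        domain = domain[4:]                                  # normalize away the www prefix
    if domain in NEWS_DOMAINS:
        news_visits[record["user"]] += 1

print(news_visits)                                           # news-site visits per respondent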

16 citations


Journal ArticleDOI
TL;DR: There is a need to develop code criticism as a critical and reflexive tool within the humanities given the increasing softwarization of both society and scholarship that pushes the boundaries of the methods and objects of study of the humanities.
Abstract: What is the scholarly nature of code and how do we evaluate the scholarship involved with coding? Our claim is that the humanities need an urgent answer to these questions given the increasing softwarization of both society and scholarship, which pushes the boundaries of the methods and objects of study of the humanities. We argue that, as a result, there is a need to develop code criticism as a critical and reflexive tool within the humanities. Code criticism is described and positioned with respect to critical code studies, textual criticism, literary criticism, and tool and interface critique. Finally, we outline an approach to code criticism based on ideas of reciprocal inquiry and of a continuum of literacies that connects code, code criticism, textual criticism, and literature.

13 citations


Journal ArticleDOI
Espen S. Ore
TL;DR: The chapter focuses on the EpiDoc community as a positive example of a specialist community of practitioners who take a flexible approach to TEI instruction that meets both the shared and individual needs of scholars.
Abstract: This chapter reviews the current online resources available to learn the TEI Guidelines for structured data in the humanities, as well as the theory that drives their construction and continued improvement. It focuses on the EpiDoc community as a positive example of a specialist community of practitioners who take a flexible approach to TEI instruction that meets both the shared and individual needs of scholars (cf. Bodard and Stoyanova, q.v.). We also address some of the barriers to multilingual contribution to the online digital Classics, and report on a case study in which we discuss the experience of Masters-level students trained in non-digital Classics methods with the translation and transcription of texts via the Perseids platform (cf. Almas and Beaulieu, q.v.). We consider how templates revealing the TEI markup allow students to gain comfort and familiarity with the XML, as well as to enable their own work to serve as a model for future contributors. However, we also note the pedagogical limitations of contribution without direct instruction as seen in this case study, and posit that a mixed model of experiential education combined with interpersonal guidance might better serve students hoping to contribute machine-actionable data in the digital Classics. How to cite this book chapter: Dee, S., Foradi, M. and Šarić, F. 2016. Learning By Doing: Learning to Implement the TEI Guidelines Through Digital Classics Publication. In: Bodard, G. & Romanello, M. (eds.) Digital Classics Outside the Echo-Chamber: Teaching, Knowledge Exchange & Public Engagement, pp. 15–32. London: Ubiquity Press. DOI: http://dx.doi.org/10.5334/bat.b. License: CC-BY 4.0.

12 citations


Journal ArticleDOI
TL;DR: The first quantitative analysis on these linguistic phenomena—syntactic parallelism, imagistic language, and propositional language—on a treebank of selected poems from the Complete Tang Poems is presented.
Abstract: It is widely believed that different parts of a classical Chinese poem vary in syntactic properties. The middle part is usually parallel, i.e. the two lines in a couplet have similar sentence structure and parts of speech; in contrast, the beginning and final parts tend to be non-parallel. Imagistic language, dominated by noun phrases evoking images, is concentrated in the middle; propositional language, with more complex grammatical structures, is more often found at the end. We present the first quantitative analysis of these linguistic phenomena—syntactic parallelism, imagistic language, and propositional language—on a treebank of selected poems from the Complete Tang Poems. Written during the Tang Dynasty between the 7th and 9th centuries CE, these poems are often considered the pinnacle of classical Chinese poetry. Our analysis affirms the traditional observation that the final couplet is rarely parallel; the middle couplets are more frequently parallel, especially at the phrase rather than the word level. Further, the final couplet more often takes a non-declarative mood, uses function words, and adopts propositional language. In contrast, the beginning and middle couplets employ more content words and tend toward imagistic language.
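
A toy sketch of a position-wise parallelism measure of the sort such an analysis relies on (not the authors' treebank code; the couplet lines and part-of-speech tags below are invented):

def parallelism(pos_line1, pos_line2):
    """Share of corresponding positions whose part-of-speech tags match."""
    assert len(pos_line1) == len(pos_line2), "regulated verse: both lines have the same length"
    matches = sum(a == b for a, b in zip(pos_line1, pos_line2))
    return matches / len(pos_line1)

# a hypothetical five-character couplet, one part-of-speech tag per character
middle_1 = ["NOUN", "NOUN", "VERB", "ADJ", "NOUN"]
middle_2 = ["NOUN", "NOUN", "VERB", "ADJ", "NOUN"]    # middle couplets tend to be parallel
final_2  = ["ADV",  "VERB", "NOUN", "PART", "VERB"]   # final couplets tend not to be

print(parallelism(middle_1, middle_2))   # 1.0
print(parallelism(middle_1, final_2))    # 0.0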

11 citations


Journal ArticleDOI
TL;DR: This article highlights shared methods, questions, and challenges between Research Through Design (RtD) and Digital Humanities (DH) through the discussion of an archival research project and emphasizes that this relationship can and will be productive for both disciplines.
Abstract: This work was supported by the Arts and Humanities Research Council [grant number AH/L007746/1].

11 citations


Journal ArticleDOI
TL;DR: Stylometric methods are used to provide evidence for a claim about the authorship of the story and to analyze the nature of Eddy’s collaboration with Lovecraft.
Abstract: The authorship of the 1924 short story ‘The Loved Dead’ has been contested by family members of Clifford Martin Eddy, Jr. and Sunand Tryambak Joshi, a leading scholar on Howard Phillips Lovecraft. The authors of this article use stylometric methods to provide evidence for a claim about the authorship of the story and to analyze the nature of Eddy’s collaboration with Lovecraft. Further, we extend Rybicki, Hoover, and Kestemont’s (Collaborative authorship: Conrad, Ford, and rolling delta. Literary and Linguistic Computing, 2014; 29, 422–31) analysis of stylometry as it relates to collaborations in order to reveal the necessary considerations for employing a stylometric approach to authorial collaboration.
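
A much-simplified sketch in the spirit of a rolling attribution pass (not the authors' method, which applies Rolling Delta to full reference corpora): slide a window over the contested text and ask which candidate author's word-frequency profile each window sits closer to. The word list and all texts below are placeholders.

import numpy as np
from collections import Counter

MFW = ["the", "of", "and", "to", "a"]                      # tiny most-frequent-word list

def profile(tokens):
    counts = Counter(tokens)
    return np.array([counts[w] / max(len(tokens), 1) for w in MFW])

eddy_ref = "the of the and a to the of and".split()        # stand-ins for reference corpora
lovecraft_ref = "of of the to to and of a of".split()
contested = ("the and the of a to " * 30).split()          # stand-in for 'The Loved Dead'

window, step = 40, 20
for start in range(0, len(contested) - window + 1, step):
    w = profile(contested[start:start + window])
    d_eddy = np.abs(w - profile(eddy_ref)).mean()          # Delta-style Manhattan distance
    d_lovecraft = np.abs(w - profile(lovecraft_ref)).mean()
    print(f"tokens {start}-{start + window}:", "Eddy" if d_eddy < d_lovecraft else "Lovecraft")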




Journal ArticleDOI
TL;DR: It is argued that recommender systems offer an opportunity to discover new humanistic interpretative possibilities by building new metadata from text and images for recommender systems to reorganize and reshape the archive.
Abstract: The way materials are archived and organized shapes knowledge production (Derrida, J. Archive Fever: A Freudian Impression. Vancouver: University of Chicago Press, 1996; Foucault, M. L’archéologie du savoir. Paris, France: Éditions Gallimard, 1969; Kramer, M. Going meta on metadata. Journal of Digital Humanities, 3(2), 2014; Hart, T. How do you archive the sky? Archive Journal, 5, 2015; Taylor, D. Save As. e-misférica, 9, 2012). We argue that recommender systems offer an opportunity to discover new humanistic interpretative possibilities. We can do so by building new metadata from text and images for recommender systems to reorganize and reshape the archive. In the process, we can remix and reframe the archive allowing users to mine the archive in multiple ways while making visible the organizing logics that shape interpretation. To show how recommender systems can shape the digital humanities, we will look closely at how they are used in digital media and then applied to the digital humanities by focusing on the Photogrammar project, a Web platform showcasing US government photography from 1935 to 1945.
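
A minimal sketch of the item-to-item logic such a recommender might use (an assumption for illustration, not Photogrammar's actual code): Jaccard similarity over photograph metadata tags, with the tag sets invented.

photos = {                                      # invented metadata tag sets
    "p1": {"dust bowl", "farm", "oklahoma", "1936"},
    "p2": {"farm", "tractor", "kansas", "1938"},
    "p3": {"shipyard", "war effort", "1943"},
}

def jaccard(a, b):
    return len(a & b) / len(a | b)

def recommend(photo_id, k=2):
    scores = {other: jaccard(photos[photo_id], tags)
              for other, tags in photos.items() if other != photo_id}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(recommend("p1"))                          # photos whose metadata most resembles p1's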

Journal ArticleDOI
Claire Warwick
TL;DR: The article argues that problems in the design of digital systems for use in cultural heritage and the humanities may persist because of the very complex relationship between physical and digital information and information resources.
Abstract: Certain problems in the design of digital systems for use in cultural heritage and the humanities have proved to be unexpectedly difficult to solve. For example, Why is it difficult to locate ourselves and understand the extent and shape of digital information resources? Why is digital serendipity still so unusual? Why do users persist in making notes on paper rather than using digital annotation systems? Why do we like to visit and work in a library, and browse open stacks, even though we could access digital information remotely? Why do we still love printed books, but feel little affection for digital e-readers? Why are vinyl records so popular? Why is the experience of visiting a museum still relatively unaffected by digital interaction? The article argues that the reasons these problems persist may lie in the very complex relationship between physical and digital information and information resources. I will discuss the importance of spatial orientation, memory, pleasure, and multi-sensory input, especially touch, in making sense of, and connections between physical and digital information. I will also argue that, in this context, we have much to learn from the designers of early printed books and libraries, such as the Priory Library and that of John Cosin, a seventeenth-century bishop of Durham, which is part of the collections of Durham University library.

Journal ArticleDOI
TL;DR: The Histogram of Orientation Shape Context (HOOSC) shape descriptor is introduced to the Digital Humanities community and a graph-based glyph visualization interface is developed to facilitate efficient exploration and analysis of hieroglyphs.
Abstract: Maya hieroglyphic analysis requires epigraphers to spend a significant amount of time browsing existing catalogs to identify individual glyphs. Automatic Maya glyph analysis provides an efficient way to assist scholars’ daily work. We introduce the Histogram of Orientation Shape Context (HOOSC) shape descriptor to the Digital Humanities community. We discuss key issues for practitioners and study the effect that certain parameters have on the performance of the descriptor. Different HOOSC parameters are tested in an automatic ancient Maya hieroglyph retrieval system with two different settings, namely, when shape alone is considered and when glyph co-occurrence information is incorporated. Additionally, we developed a graph-based glyph visualization interface to facilitate efficient exploration and analysis of hieroglyphs. Specifically, a force-directed graph prototype is applied to visualize Maya glyphs based on their visual similarity. Each node in the graph represents a glyph image; the width of an edge indicates the visual similarity between the two corresponding glyphs. The HOOSC descriptor is used to represent glyph shape, based on which pairwise glyph similarity scores are computed. To evaluate our tool, we designed evaluation tasks and questionnaires for two separate user groups, namely, a general public user group and an epigrapher scholar group. Evaluation results and feedback from both groups show that our tool provides intuitive access to explore and discover Maya hieroglyphic writing, and could potentially facilitate epigraphy work. The positive evaluation results and feedback further hint at the practical value of the HOOSC descriptor.
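
A hedged sketch of the visualization side described above (illustrative, not the project's interface): build a graph whose edge weights are pairwise glyph similarity scores and lay it out with a force-directed (spring) algorithm. networkx is assumed, and the glyph identifiers and similarity scores are invented; in the tool itself the scores come from HOOSC descriptors.

import networkx as nx

glyphs = ["T001", "T017", "T024", "T102"]         # hypothetical glyph identifiers
similarity = {                                    # invented pairwise similarity scores
    ("T001", "T017"): 0.82,
    ("T001", "T024"): 0.35,
    ("T017", "T024"): 0.40,
    ("T024", "T102"): 0.77,
}

G = nx.Graph()
G.add_nodes_from(glyphs)
for (a, b), s in similarity.items():
    if s > 0.3:                                   # keep only reasonably similar pairs
        G.add_edge(a, b, weight=s)

# force-directed (spring) layout: visually similar glyphs are pulled closer together
pos = nx.spring_layout(G, weight="weight", seed=42)
for node, (x, y) in pos.items():
    print(node, round(x, 2), round(y, 2))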


Journal ArticleDOI
TL;DR: The evaluation shows that DIVADIAWI has two advantages over state-of-the-art tools, that its automatic functions greatly accelerate GT generation, and that it obtains a high score in a system usability test.
Abstract: Historical documents usually have a complex layout, making them one of the most challenging types of documents for automatic image analysis. In the pipeline of automatic document image analysis (DIA), layout analysis is an important prerequisite for further steps including optical character recognition, script analysis, and image recognition. It aims at splitting a document image into regions of interest such as text lines, background, and decorations. To train a layout analysis system, an essential prerequisite is a set of pages with corresponding ground truth (GT), i.e. existing labels (e.g. text line and decoration) annotated by human experts. Although many methods and tools for GT generation exist, most of them are not suitable for our specific data sets. In this article, we propose to use Gabor features to generate GT, and based on Gabor features, we developed a web-based interface called DIVADIAWI. DIVADIAWI applies automatic functions using Gabor features to generate GT of text lines. For other region types such as background and decorations, users can manually draw their GT with user-friendly operations. The evaluation shows that (1) DIVADIAWI has two advantages over state-of-the-art tools, (2) the automatic functions of DIVADIAWI greatly accelerate GT generation, and (3) DIVADIAWI obtains a high score in a system usability test.
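
A hedged sketch of the Gabor-feature step (an illustration, not DIVADIAWI's code): filter a page image with a small bank of Gabor filters at several orientations and keep the strongest per-pixel response, which tends to highlight text lines. scikit-image is assumed, and the input page below is synthetic.

import numpy as np
from skimage.filters import gabor

# synthetic "page": dark horizontal stripes stand in for text lines
page = np.ones((64, 64))
page[10:14, 5:60] = 0.0
page[30:34, 5:60] = 0.0

responses = []
for theta in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):      # small filter bank of orientations
    real, imag = gabor(page, frequency=0.25, theta=theta)
    responses.append(np.hypot(real, imag))                  # magnitude of the complex response

feature_map = np.max(responses, axis=0)       # strongest response per pixel across orientations
mask = feature_map > feature_map.mean()       # crude text/background split for GT proposals
print(mask[12].sum(), mask[20].sum())         # pixels flagged in a "text line" row vs. between lines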

Journal ArticleDOI
TL;DR: In order to develop a more robust, quantitative approach to the study of motion in theatre performances, video processing techniques are used to analyze a puppet theatre recording from Indonesia found that there is a strong correspondence between the narrative structure and the speed of the puppets.
Abstract: The digital humanities have been very successful in proposing quantitative methods for the analysis of textual data. However, similar methods are not widespread for the study of artistic expressions that rely on motion (such as theatre). In order to develop a more robust, quantitative approach to the study of motion in theatre performances, we use video processing techniques to analyze a puppet theatre recording from Indonesia. By calculating the average speed of the different scenes, we found that there is a strong correspondence between the narrative structure and the speed of the puppets. We hope this work contributes to the development of quantitative analysis methods for the study of theatre and that it also impacts the way in which theatre documentation projects are carried out in the future.
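
A hedged sketch of a frame-differencing measure in the spirit of the approach described (not the authors' pipeline): the mean absolute difference between consecutive grayscale frames is a rough proxy for on-screen motion, averaged per scene. OpenCV is assumed; the file name and scene boundaries are placeholders.

import cv2
import numpy as np

cap = cv2.VideoCapture("wayang_recording.mp4")    # placeholder file name
scene_bounds = [(0, 500), (500, 1200)]            # placeholder scene boundaries (frame indices)

prev, frame_idx = None, 0
motion = {i: [] for i in range(len(scene_bounds))}
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if prev is not None:
        diff = cv2.absdiff(gray, prev)            # pixel-wise change between consecutive frames
        for i, (start, end) in enumerate(scene_bounds):
            if start <= frame_idx < end:
                motion[i].append(float(diff.mean()))
    prev, frame_idx = gray, frame_idx + 1
cap.release()

for i, values in motion.items():                  # average motion per scene ~ puppet speed
    print("scene", i, round(np.mean(values), 2) if values else "no frames")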



Journal ArticleDOI
TL;DR: An automated method for identifying articles that are likely to be poems by searching for a number of signals embedded in articles is developed, and 88% of the newspaper articles that a knowledgeable human would classify as ‘poetry’ are identified.
Abstract: AustLit is a major Australian cultural heritage database and the most comprehensive record of a nation’s literary history in the world. In this article we will present the successful results of a project addressing the challenge of discovering and recording creative writing published in digitized historical Australian newspapers, provided by the National Library of Australia’s Trove service. As a first step in identifying creative writing, we developed an automated method for identifying articles that are likely to be poems by searching for a number of signals embedded in articles. When this work began, AustLit contained more than 10,200 bibliographical records for poems published between 1803 and 1954 (75% prior to 1900) with links to the full text in 115 different newspapers. The aim of the project was to expand this number of bibliographical records in AustLit and provide a foundation for analysing the importance of poetry in newspaper publishing of the period. Taking advantage of Ted Underwood’s (Getting Everything you Want from HathiTrust, and Open Data ( ): The Stone and the Shell, Underwood blog posts (both accessed 27 October 2015), 2012) work with seventeenth- and eighteenth-century full text in the HathiTrust collection, we trained a naive Bayesian classifier, modifying code from Daniel Shiffman (Bayesian Filtering. (accessed 27 October 2015), 2008) and Paul Graham (A Plan for Spam. (accessed 27 October 2015), 2002) and improving the quality of Optical Character Recognition (OCR) by using the overProof correction algorithm. We have been able to successfully identify large numbers of poems in the newspapers database, greatly expanding AustLit’s coverage of this important literary form. After suitable training of the classifier, we were able to successfully identify 88% of the newspaper articles that a knowledgeable human would classify as ‘poetry’. Our results have encouraged us to consider enhancing and extending the techniques to aid the identification of other forms of literature and criticism.
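
A much-simplified sketch of a Graham-style naive Bayes filter of the kind adapted in the project (illustrative, not the project's code; the training snippets are invented and real training would use many OCR-corrected articles):

import math
from collections import Counter

def tokens(text):
    return text.lower().split()

poems = ["o'er the silver moonlit stream the shadows softly creep",
         "thy gentle heart doth sing of love and sorrow"]
prose = ["the council met on tuesday to discuss the new railway line",
         "wheat prices fell sharply at the local market yesterday"]

poem_counts, prose_counts = Counter(), Counter()
for t in poems:
    poem_counts.update(tokens(t))
for t in prose:
    prose_counts.update(tokens(t))
vocab = set(poem_counts) | set(prose_counts)

def log_likelihood(text, counts):
    total = sum(counts.values())
    score = 0.0
    for w in tokens(text):
        score += math.log((counts[w] + 1) / (total + len(vocab)))   # Laplace smoothing
    return score

article = "the shadows creep softly o'er the stream"
is_poem = log_likelihood(article, poem_counts) > log_likelihood(article, prose_counts)
print("poetry" if is_poem else "prose")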

Journal ArticleDOI
TL;DR: The point of departure for this article is the Renderings project, established in 2014 and developed at the Massachusetts Institute of Technology in a laboratory called the Trope Tank, to translate highly computational and otherwise unusual digital literature into English.

Journal ArticleDOI
TL;DR: The first virtual research environment devoted mainly to Spanish speakers interested in digital scholarly edition is EVI-LINHD (Entorno Virtual de Investigación del Laboratorio de Innovación en Humanidades Digitales).
Abstract: The Laboratorio de Innovación en Humanidades Digitales (UNED) has developed the Entorno Virtual de Investigación del Laboratorio de Innovación en Humanidades Digitales (EVI-LINHD), the first virtual research environment devoted mainly to Spanish speakers interested in digital scholarly edition. EVI-LINHD combines different open-source software for developing a complete digital project: (1) a Web-based application markup tool—TEIscribe—combined with an eXistdb solution and a TEIPublisher platform, (2) Omeka for digital libraries, and (3) WordPress for simple Web pages. All these instances are linked to a local installation of the LINDAT/Common Language Resources and Technology Infrastructure (CLARIN) digital repository. LINDAT/CLARIN allows EVI-LINHD users to have their projects deposited and stored safely. Thanks to this solution, EVI-LINHD projects also improve their visibility. The specific metadata profile used in the repository is based on Dublin Core, and it is enriched with the Spanish translation of DARIAH’s Taxonomy of Digital Research Activities in the Humanities.
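
As an illustration of the kind of Dublin Core-based record such a deposit involves, the sketch below builds a minimal record with Python's standard library; the field values are invented, and the actual EVI-LINHD profile extends Dublin Core with terms from DARIAH's taxonomy.

import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

record = ET.Element("record")
fields = {                                   # invented example values
    "title": "Edición digital de un cancionero",
    "creator": "Proyecto de ejemplo, LINHD",
    "language": "spa",
    "type": "Dataset",
    "date": "2017",
}
for name, value in fields.items():
    element = ET.SubElement(record, f"{{{DC}}}{name}")   # Clark notation for the dc namespace
    element.text = value

print(ET.tostring(record, encoding="unicode"))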

Journal ArticleDOI
TL;DR: In this article, a semi-automated extraction of details corresponding to narratological fabula from a corpus of narrative interviews on a single event provides decontextualized building blocks for transversal, or cross-document, narratives.
Abstract: Semi-automated extraction of details corresponding to narratological fabula from a corpus of narrative interviews on a single event provides decontextualized building blocks for transversal, or cross-document, narratives. With information extracted from 503 World Trade Center Task Force Interviews comprising 12,000 pages of testimony and novel visualization techniques, this article proposes a computational method for the emergence of narratives that cross beyond the boundaries of one interview. These assembled narratives, in cases like that of Chief Ganci, can document those who did not survive to tell their own story.
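
A hedged sketch of the cross-document assembly step (not the authors' pipeline): pull time-stamped event mentions out of individual interviews with a simple pattern and interleave them into one transversal timeline. The interview snippets and the pattern are invented, and real extraction would be far richer.

import re

# invented snippets standing in for sentences from different interviews
interviews = {
    "interview_042": "At 9:03 we saw the second plane hit. By 9:30 we had set up a command post.",
    "interview_187": "Around 9:15 the chief ordered us to the lobby.",
}

pattern = re.compile(r"(?:At|By|Around)\s+(\d{1,2}:\d{2})\s+(.+?)(?:\.|$)")

events = []
for source, text in interviews.items():
    for time, action in pattern.findall(text):
        events.append((time, source, action.strip()))

# interleave details from all interviews into a single cross-document timeline
# (toy: times within the same hour sort correctly as strings)
for time, source, action in sorted(events):
    print(time, source, "-", action)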

Journal ArticleDOI
Jacob Shell
TL;DR: There was enough geographic information contained in the volumes of Karl Marx's Capital to produce a geovisually rich map, presenting the themes, places, and relationships in this text in a new and revealing way.
Abstract: Presented here is a geovisual reading of all three volumes of Karl Marx's Capital. Marx's seminal treatise on political economy is normally treated as a work of abstract conceptualization. However, Marx names hundreds of geographic locations in Capital, usually in a highly relational and dynamic fashion. It seemed to me there was enough geographic information contained in the volumes to produce a geovisually rich map, presenting the themes, places, and relationships in this text in a new and revealing way.
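
The extraction step behind such a map can be sketched with a small gazetteer look-up (illustrative only; the place list, coordinates, and passage below are stand-ins rather than the map's actual data):

import re
from collections import Counter

gazetteer = {                         # a few places named in Capital, with rough coordinates
    "Manchester": (53.48, -2.24),
    "London": (51.51, -0.13),
    "Ireland": (53.0, -8.0),
    "India": (21.0, 78.0),
}

passage = ("The cotton spun in Manchester was shipped through London, while famine "
           "depopulated Ireland and cheap calico undid the weavers of India and London alike.")

mentions = Counter()
for place in gazetteer:
    mentions[place] = len(re.findall(rf"\b{place}\b", passage))

# each mention becomes a weighted point that a mapping library could then plot
for place, count in mentions.most_common():
    lat, lon = gazetteer[place]
    print(place, count, (lat, lon))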

Journal ArticleDOI
TL;DR: The authors argue that the functions of document and text are realized primarily by their fluid nature and by the dynamic character of their interpretation, and that to define the purpose of textual scholarship as a stabilisation of text is therefore fallacious.
Abstract: In this article we aim to provide a minimally sufficient theoretical framework to argue that it is time for a re-conception of the notion of text in the field of digital textual scholarship. This should allow us to reconsider the ontological status of digital text, and that will ground future work discussing the specific analytical affordances offered by digital texts understood as digital texts. Following from the argument of Suzanne Briet regarding documentation, referring to Eco’s understanding of ‘infinite semiosis’, and accounting for the reciprocal effects between carrier technology and meaning observed by McLuhan, we argue that the functions of document and text are realized primarily by their fluid nature and by the dynamic character of their interpretation. To define the purpose of textual scholarship as a ‘stabilisation’ of text is therefore fallacious. The delusive focus on ‘stability’ and discrete ‘philological fact’ gives rise to a widespread belief in textual scholarship that digital texts can be treated simply as representations of print or manuscript texts. On the contrary—digital texts are texts in and of themselves, in numerous digital models and data structures which may include, but are not limited to, text meant for graphical display on a screen. We conclude with the observation that philological treatment of these texts demands an adequate digital and/or computational literacy.