Showing papers by "Jonathan L. Zittrain" published in 2021


Journal Article
TL;DR: This letter outlines how to identify, and potentially mitigate, common sources of dataset shift in machine-learning systems.
Abstract: Dataset Shift in Clinical Trials. This letter outlines how to identify, and potentially mitigate, common sources of “dataset shift” in machine-learning systems. This occurs when the model “training ...

144 citations


Journal Article
TL;DR: The substantial linkrot and content drift the authors find across the New York Times corpus accurately reflect the inherent difficulties of long-term linking to pieces of a volatile web.
Abstract: Hyperlinks are a powerful tool for journalists and their readers. Diving deep into the context of an article is just a click away. But hyperlinks are a double-edged sword; for all of the internet’s boundlessness, what’s found on the web can also be modified, moved, or entirely disappeared. This often-irreversible decay of web content is commonly known as linkrot. It comes with a similar problem of content drift, or the often-unannounced changes (retractions, additions, replacement) to the content at a particular URL. Our team of researchers at Harvard Law School has undertaken a project to gain insight into the extent and characteristics of journalistic linkrot and content drift. We examined hyperlinks in New York Times articles starting with the launch of the Times website in 1996 up through mid-2019, developed on the basis of a dataset provided to us by the Times. We focus on the Times not because it is an influential publication whose archives are often used to help form a historical record. Rather, the substantial linkrot and content drift we find here across the New York Times corpus accurately reflect the inherent difficulties of long-term linking to pieces of a volatile web. Results show a near-linear increase of linkrot over time, with interesting patterns emerging within certain sections of the paper or across top-level domains. Over half of articles containing at least one URL also contained a dead link. Additionally, of the ostensibly “healthy” links existing in articles, a hand review revealed additional erosion to citations via content drift.

4 citations