Assembling the Fragments of Mobilizing the Archaeological Report for a Future of Reuse, Part Two
- Sep 25, 2024
The second part of our piece for the Journal of Field Archaeology’s 50th anniversary issue is a bit scrappier than the first, but I think it says many of the things that we want to say (even if the citations and some elements of emphasis remain unpolished). If you want some background to what this post is about, check out yesterday’s post and follow the links.
Scaffolding the Archaeological Report: A Guide to Promoting Data Reuse
The digital age has often brought expectations that archaeological datasets will revolutionize archaeological knowledge. Indeed, a special issue of this journal celebrated the potential and recognized the challenges of Big Data-oriented approaches to archaeology. Publishing data to facilitate reuse and replication, however, is easier said than done. Anyone who has critically read a written archaeological report (the most common form for encountering archaeological data today) knows how hard it is to navigate the relationship between interpretation and evidence. Archaeologists have, of course, improved the prospects of reuse by sharing the data itself through online catalogs, photos, and data tables. Yet even clean, accessible, and interoperable data cannot be mobilized without appropriate descriptive frameworks, metadata, paradata, and instructions for analysis. Néhémie Strupler’s efforts (2021) to reproduce the published analyses of three regional surface surveys on the basis of their online datasets demonstrate the fundamental challenges of verifying results without the published code of the original analysis.
In preparing scaffolding for the Eastern Korinthia Archaeological Survey, we began with the foundation itself: the creation of a reliable dataset that was simultaneously findable, accessible, interoperable, and reusable. The lead editor (Pettegrew) worked with EKAS’ project directors and the editors of Open Context to fine-tune and finesse open data over a period of several years (2020-2023). Some of this work entailed tracking down and compiling decades-old Microsoft Access databases and migrating them to current versions. Some entailed making higher-order decisions about curating data, including which information to include and exclude and how to structure data for downloading and archiving. Pettegrew also spent considerable time restoring the integrity of bad data by refining values so that searches return clear results: correcting misspellings, standardizing inconsistent spellings in text fields, and verifying that fields defined as numeric (e.g., counted artifacts, visibility estimates, or fieldwalker spacing) contained no text (e.g., “10” rather than “10M” to denote 10-meter spacing). Although the project staff had keyed most of its data at the time of survey, Pettegrew still devoted 300 hours to this work in fall 2020 alone. As he notes in the volume, he spent this time “pressing usable and meaningful information out of messy digitized survey records” (Pettegrew 2024, 181).
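The kind of refinement described above can be illustrated with a minimal sketch. It is not the procedure Pettegrew used, which the volume itself documents; the field names, the spelling table, and the choice to lowercase text values are all assumptions made for the sake of the example.

```python
import re

# Hypothetical table of variant spellings and their preferred form.
SPELLING_FIXES = {"amphorra": "amphora", "tablewear": "tableware"}


def clean_numeric(value):
    """Return the leading number in a field such as '10M', or None if there is none."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)", str(value))
    if match is None:
        return None
    number = float(match.group(1))
    return int(number) if number.is_integer() else number


def clean_text(value):
    """Collapse whitespace, lowercase, and apply the spelling table."""
    normalized = " ".join(str(value).split()).lower()
    return SPELLING_FIXES.get(normalized, normalized)


def clean_record(record, numeric_fields, text_fields):
    """Return a cleaned copy of one survey record; the original is left untouched."""
    cleaned = dict(record)
    for field in numeric_fields:
        cleaned[field] = clean_numeric(record[field])
    for field in text_fields:
        cleaned[field] = clean_text(record[field])
    return cleaned


# Hypothetical survey-unit record with a text suffix in a numeric field.
raw = {"unit": "U-001", "walker_spacing": "10M", "artifact_type": " Amphorra "}
print(clean_record(raw, ["walker_spacing"], ["artifact_type"]))
```

A value with no recoverable number becomes `None` rather than a guess, so doubtful entries stay visible for manual review.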
Beyond the initial period of data refinement in 2020, Pettegrew worked with the editors-in-chief at Open Context over the next three years to make the online data adhere to FAIR standards. Together, they cleaned data for neater presentation, standardized filenames and uploaded files, provided guidelines and overviews to contextualize the data, defined metadata and paradata, and validated data and corrected errors encountered in the process. They also created stable links to online objects and particular search queries to make sense of the 25,000 objects available today at Open Context. These objects included primary reports, manuals, and copies of various forms used in the field and during artifact processing; an extensive archive of over 7,000 digital photographs and illustrations of survey units, field conditions, and finds; a wide range of geospatial records; and a sizable collection of information relating to environment, counted artifacts, and identified artifacts and features. The 25,000 digital objects were assigned stable and unique identifiers and made available under an open access (CC BY-NC 4.0) license to encourage reuse while prohibiting commercial exploitation of the data. This ensured that Pettegrew or any other analyst could link data directly to the publications and that the data would be visible and accessible to any user without paywalls or other barriers. Archiving and preservation through the California Digital Library ensure that the open data remains accessible for generations of future analysts. The publication of the Eastern Korinthia Archaeological Survey datasets in 2021 provided a project landing page for understanding the project methods, results, significance, datasets, and usability, as well as a Related Media hub for downloading documents, tables, and GIS files.
The accompanying volume, Corinthian Countrysides, reflected a desire to publish a fuller contextual treatment of the EKAS data that would scaffold it for reuse. Although Pettegrew initially set out in 2019 to write a synthetic interpretation of Corinthian countrysides from prehistory to the modern period, the overwhelming prospect of synthesizing enormous quantities of survey data, maps, aerial photos, and historical texts recommended paring down the scope. In conversation with Bill Caraher, he revised his aim to create a slim volume that would serve simply as a guide to the datasets published on the data publishing platform Open Context. The first draft of the book offered a brief history of the project, a summary of the survey methods and procedures, and a thorough description of the data available online. This draft, however, earned a critical response through peer review. The reviewers recommended more context, more complete presentation of results, increased subtlety and detail in analysis, and greater efforts to synthesize interpretations: in short, the elements of a traditional final report. The author responded to these critiques by creating a more hybridized volume that is neither a final report nor simply a description of the datasets. Pettegrew expanded the book with new chapters that added context and case studies showcasing the possibilities of analysis and interpretation. These changes presented more original analysis of the data, but did not change the character of the book as fundamentally a guide to its reuse. Caraher returned this heavily revised manuscript to the reviewers and they gave it their blessing.
The published version of the book has three sections that prepare a reader to understand and reuse data. The first six chapters introduce the reader to the project through a comprehensive description of the intellectual backgrounds, theoretical and methodological orientations, and the contingent historical factors and agendas that shaped the decisions to collect particular kinds of information with particular practices. This section familiarizes the reader with siteless (artifact-level) approaches to survey that aim to map artifact counts and artifact samples (rather than sites) to interpret diachronic land use and settlement. These chapters show that distributional approaches reward data-centered analyses that allow archaeologists to deconstruct, aggregate, parse, and layer the artifactual landscape. They also highlight elements of tension and contradiction in the formulation of the survey—for example, in the project’s decision to record “Localized Cultural Anomalies” (EKAS’ definition of a site worth exploring more) in the context of a survey that self-consciously privileged “siteless” and “non-site” techniques.
The second section (Ch. 7-8, plus appendices) offers detailed information about the digital datasets at Open Context, including the processes used to refine and clean the data, comprehensive metadata definitions for all files and fields, and instructions for downloading and exploring online data. Archaeologists often leave unexplained the processes by which they transform data between collection and publication, in turn leaving readers uncertain about the quality and integrity of the information. One aim of these chapters, then, is to foster greater transparency about how decisions affected the character of the published data available for analysis and interpretation. Pettegrew also added a chapter on navigating Open Context’s interface to promote browsing, querying, downloading, and discovering records.
The third section (Ch. 9-14) focuses on major issues of source criticism and different modes of analysis that establish the potential and limitations of survey data for interpreting Corinthian history. Two chapters, for example, are devoted to a critical discussion of the range of factors (e.g., ground visibility) affecting the analysis of artifact counts and densities, and to model approaches to working with survey data. These approaches include creating defined analytical toponyms and survey zones that allow an analyst to map and manipulate data at scales beyond a small individual survey unit. These analytical units are interpretative, of course, and Pettegrew’s use of these designations serves as much to reveal new patterns in the data as to demonstrate how a user can aggregate and disaggregate the data in productive ways. Other chapters in this section mine the information that sampled artifacts provide to reveal how they can contribute to understanding patterns of land use and settlement at different spatial and chronological scales. These chapters highlight the tradeoffs between narrowly parsing and deconstructing the landscape (e.g., mapping all of the Early Roman tablewares dating to the second century CE) and lumping or aggregating larger corpuses of records that are chronologically coarser (e.g., mapping all ceramic remains marking the three-century-long Early Roman period).
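As a rough illustration of what aggregating survey units into larger analytical zones involves, the following sketch sums artifact counts and surveyed area by zone and reports a density for each. The field names (`zone`, `artifact_count`, `area_ha`) and the sample values are hypothetical rather than drawn from the EKAS tables, and the sketch makes no correction for ground visibility or the other factors the chapters discuss.

```python
from collections import defaultdict


def density_by_zone(units):
    """Sum counts and area per zone, then return artifacts per hectare by zone."""
    totals = defaultdict(lambda: {"count": 0, "area_ha": 0.0})
    for unit in units:
        zone_total = totals[unit["zone"]]
        zone_total["count"] += unit["artifact_count"]
        zone_total["area_ha"] += unit["area_ha"]
    # Zones with no recorded area are skipped to avoid dividing by zero.
    return {
        name: total["count"] / total["area_ha"]
        for name, total in totals.items()
        if total["area_ha"] > 0
    }


# Hypothetical survey units assigned to two analytical zones.
sample_units = [
    {"unit": "U-001", "zone": "A", "artifact_count": 30, "area_ha": 0.5},
    {"unit": "U-002", "zone": "A", "artifact_count": 10, "area_ha": 1.5},
    {"unit": "U-003", "zone": "B", "artifact_count": 6, "area_ha": 0.25},
]
print(density_by_zone(sample_units))
```

Filtering the units by period or artifact class before calling the function is one way to explore the tradeoff between narrow and coarse groupings: the narrower the filter, the fewer records remain in each zone.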
Altogether, the work comprises a critical edition of the survey data by demonstrating the human elements of its creation, delimiting its potential and limitations in analysis, and imagining its relevance to writing histories of Corinthian countrysides. Analysis and interpretation, where present, are offered not as definitive readings but as illustrations of how EKAS datasets can inform the history of the Corinthian landscape. While the resulting book is a far more comprehensive study than originally planned, in introducing and modeling the potential uses of the EKAS datasets, it hands readers the keys to mobilizing complex survey datasets for discovery, browsing, and tinkering.
On the publishing side, the hyperlink creates coherence and unity between book and online data. Corinthian Countrysides contains thousands of hyperlinks. Some of these were added to ease internal cross-referencing of chapters, sections, figures, tables, and definitions of metadata. But we also imagined links as creating opportunities to learn more about a period, place, or assemblage of artifacts beyond the book. To that end, we added hyperlinks to online texts of ancient authors, period definitions, and toponyms at the well-known sites of Pleiades and GeoNames. We added thousands of hyperlinks to the data at Open Context, including field reports, publications, archaeological units, and particular queries, whether of individual objects (e.g., Late Roman African Red Slip Form 104-106 rim) or entire object assemblages (e.g., all artifacts of Early Modern date). We had no intention of linking for linking’s sake, but added hyperlinks when we felt they created opportunities for going beneath the surface, browsing, and exploring source data.
The format of the book remains conventional by design and relies on off-the-shelf formats and components. The book circulates as a PDF document, which is both an open and an archival format common to scholarly publishing. The hyperlinks present in the digital version of the book are visible to the reader as endnotes. This ensures that the print version and the digital version share the same pagination, even if it is unlikely that a reader of the paper book will hand-type the URIs present in the endnotes. Their presence, however, ensures that the relationship between the book and the data remains as transparent as possible for the reader. We have archived final versions of the book in both paper and digital form in the University of North Dakota’s Scholarly Commons digital repository at the university’s Chester Fritz Library, where it also received a DOI. Between the long-term commitments to archival preservation at the University of North Dakota and at Open Context (and their partners), the book’s persistence is secure.








