Machaut Data Part 1

Introduction

For those interested in the fourteenth-century French poet and composer Guillaume de Machaut, one resource has remained indispensable for the last seventeen years: Lawrence Earp's Guillaume de Machaut: A Guide to Research (Garland Publishing Inc., 1995). The brilliance of the book lies in how far Earp went beyond even the goal stated in his preface, that of "a guide to secondary sources" [p. xii]. What he actually provided for the field was a framework for organizing the objects of study (source manuscripts, works, documentary evidence), as a preliminary step toward navigating the body of secondary sources published up to that time.

Machaut studies is rife with alternate numbering systems: some works have two or three competing identifiers, some manuscripts bear multiple sigla, and so on. A Guide to Research provided, in a single volume, a clarification of the primary sources and the conceptual works contained in them. This didn't necessarily make it easier for later authors to index the Machaut corpus in their own publications (as Elizabeth Eva Leach has discussed here), but it does provide a handy cross-reference for anyone working in the field.

No publication has rivaled Earp's guide in nearly two decades, and there does not appear to be a revised edition on the horizon (though I would be happy to hear otherwise). Meanwhile, publication on Machaut has continued apace, if not accelerated: major articles, collections, and monographs appear regularly, and a new complete edition is in the works. New recordings of Machaut's music continue to be produced, and the burgeoning digitization efforts of national and academic libraries, along with specialist sites like DIAMM, mean that high-resolution facsimiles of the manuscript sources are increasingly available (an up-to-date list of digitized facsimiles of Machaut sources can be found here).

This situation got me thinking about the relationship humanities scholars have to the data we work with. As more of our work takes place in the digital world, how do we translate a massive effort like Earp's into an ongoing, sustainable, and extensible dataset? How do we enable others in the field to aggregate snapshots of the state of research? How do we make our digital resources useful and, more importantly, how do we express the relationships between them in a way that does not depend on a single scholar (or team of scholars) for maintenance and updating?

Earp's guide remains relevant because the basic description of resources it provides remains accessible (though the printed book is increasingly expensive to buy now). While its list of secondary sources is increasingly out of date, the delivery technology, a printed book, will remain functional for generations.

In the digital world, we have to flip that expectation. Delivery technologies will change, and they will change rapidly. A Machaut-oriented website built today will not have a 100-year lifespan. It won't have a 50-year lifespan. It probably won't even have a 17-year lifespan. Rather than focus on the delivery mechanism, we should focus on building extensible data, using robust ontologies, standards, and common identifiers for our primary objects. Further, we should publish those datasets freely for re-use by others, so that the information available in our areas of interest becomes part of a growing network of data that can be assembled, aggregated, re-used, and expanded. The value is not in creating the next "online Earp" or something similar; the value is in building a framework of identifiers so that new resources can be added to the collection, new relationships can be expressed between resources, and scholarly by-products (transcription and analysis) can be integrated into the network of information, and so that we can easily take a snapshot of that network at any point.

This argument is not new. This approach is not new. However, we don't seem to be seeing much uptake in humanities projects, for any number of reasons, including the underlying values of humanities scholars and entrenched structures of evaluation for promotion. Still, the technologies are emerging, and scholarship outside the humanities is making increasing use of them. Linked data offers a tempting approach for building a Machaut data framework, with enough freely available tools to provide access to that data in ways that might be useful in the near term and that could easily be replaced as new tools supersede them.

In this series of posts, I will document the creation of a graph of data for Machaut studies focusing on a subset of the resources that form the objects of study:

manuscript sources containing Machaut's works
works attributed to (or likely authored by) Machaut
works not by Machaut but that appear in manuscript sources alongside his works
documentary evidence about Machaut, his works, or the manuscript sources containing them.
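To make the idea of a "graph of data" a little more concrete, here is a minimal sketch in Python of how these kinds of resources might be linked as subject-predicate-object statements (triples) around stable identifiers, and how a snapshot of the whole graph could be exported. It is illustrative only: the example.org namespace, the identifier scheme, and the predicate names ("type", "label", "attributedTo", "contains") are hypothetical placeholders, not an established ontology, and the JSON output does not attempt to match Exhibit's own data format. A real dataset would more likely use an RDF library such as rdflib and a published vocabulary.

```python
import json
from collections import defaultdict

# Hypothetical namespace: a real project would mint stable, resolvable URIs.
BASE = "http://example.org/machaut/"


def uri(kind, local_id):
    """Build an identifier such as http://example.org/machaut/source/C."""
    return f"{BASE}{kind}/{local_id}"


class Graph:
    """A tiny in-memory set of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def objects(self, subject, predicate):
        """All objects linked from `subject` by `predicate`, sorted."""
        return sorted(o for (s, p, o) in self.triples
                      if s == subject and p == predicate)

    def subjects(self, predicate, obj):
        """All subjects that point at `obj` via `predicate`, sorted."""
        return sorted(s for (s, p, o) in self.triples
                      if p == predicate and o == obj)

    def snapshot(self):
        """Serialize the current graph as JSON, one item per subject."""
        items = defaultdict(dict)
        for s, p, o in sorted(self.triples):
            items[s].setdefault(p, []).append(o)
        return json.dumps(
            {"items": [{"id": s, **props} for s, props in sorted(items.items())]},
            indent=2,
            sort_keys=True,
        )


# Illustrative data: one manuscript source and one work it contains.
g = Graph()
ms_c = uri("source", "C")
remede = uri("work", "remede-de-fortune")

g.add(ms_c, "type", "Manuscript")
g.add(ms_c, "label", "Paris, BnF fr. 1586 (MS C)")
g.add(remede, "type", "Work")
g.add(remede, "label", "Remede de Fortune")
g.add(remede, "attributedTo", "Guillaume de Machaut")
g.add(ms_c, "contains", remede)

if __name__ == "__main__":
    # Which sources contain the Remede? Queried through the shared identifier.
    print(g.subjects("contains", remede))
    print(g.snapshot())
```

The point of the design is that every statement hangs on an identifier rather than on a page or a display: adding a new source, a new work, or a new relationship is just another triple, and the snapshot can be regenerated at any time for whatever visualization tool is current.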

I will use existing tools to visualize this data, starting with the Simile Exhibit tools, though possibly exploring others along the way.

A caveat: I'm new to much of this, so I will be learning along the way. I want to produce something useful to the field, with data that can be re-used by others. In the next post, I will talk about building the data for a list of the manuscript sources containing Machaut's works. A preliminary snapshot of that data is here, but this is really just a first pass: dates, provenance, choice of ontology, and so on still need some thought and will probably be revised multiple times in the months ahead (clicking the "scissors" icon on the right-hand side of the page at the link above gives access to the data as it stands right now).

Published

13 January 2013