PREFACE: in keeping with the rest of my website, my blog is a low-tech affair; no fancy-pants bells and whistles like categories or tags or proofreading. You may wish to use cmd + F (Mac) or ctrl + F (Windows) to search for terms of interest.


UNPACKING AN ARCHIVES AS DATA PARADIGM: Or, Let's Stop Comparing Data to Oil, Shall We?

[The blog post below is based on a talk I gave at the 2020 Association of Canadian Archivists (virtual) conference called "Seeing Archives as Data: Exploring the Emergence of a Data Paradigm in Archival Thinking." Because it was a conference presentation, it's... uh, kinda dry.

I will probably expand on it from time to time because, like most of my conference papers, it was freshly (half-)baked when I presented it; so, not the product of endless iterations and revisions - and it shows.

The slides for the presentation contain the references for the blog post.

I also began the talk with a territorial acknowledgement and statement of positionality[1] which is included in the notes if you wish to read it.]

In his 2013 article, “Evidence, memory, identity and community: four shifting archival paradigms,” Terry Cook identifies four key phases of archival thinking over the past 150 years that he represents through the paradigms of evidence, memory, identity and community respectively. Clarifying that the paradigms are more like “frameworks for thinking about archives, or archival mindsets, ways of imagining archives and archiving” (97), he suggests that they have been integral to the formation of a shared professional identity and embedded in the processes of archival work.

I have elsewhere proposed that a fifth paradigm – data – may capture a more recent shift in the archival profession towards thinking about and working with archives as data. In this post, I'll map out some possible features of an archives as data paradigm in greater – though not yet sufficient – depth.

I'll focus first on the ways in which archives – as data – are imagined in both concrete and metaphorical terms, specifically in relation to the idea that data is a natural resource from which value can be extracted. I'll then analyze what implications for archival identity a data paradigm might entail, highlighting Acker and Kriesberg's notion of the developer steward. Lastly, I'll conclude by (briefly) speaking to sites and strategies for resistance to the more oppressive aspects of an archives as data paradigm [NB: this would be the part that needs more development but given the rate at which I write blog posts... outlook not so good].

I. WAYS OF IMAGINING ARCHIVES

Cook demonstrates the interwovenness of archival identity, archival processes and the ways in which archives are imagined by the profession; that is, for archives to be memory, the archival profession must orient its principles and practices towards that telos. It also demands that the archivist play a particular role.

Cook explains: "[the memory] paradigm was distinctively concerned… with appraising records as historical sources, with the historian-archivist subjectively creating a cultural memory resource" (2013, 109). Or, in shorthand: "the historian-archivist selects the archive" (2013, 107 - underline added).

So, what might this formulation look like for an archives as data paradigm?

Computational tools have been enlisted by recordkeepers to manage large volumes of digital archives. Keywords or even categorical entities like people, places and organizations can be rapidly surfaced to assist with arrangement and description. Personally identifiable information (PII), like home addresses and credit card information, can be detected and anonymized.
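
To make this a little more concrete (this wasn't in the talk), here's a minimal sketch of the kind of processing described above, using the open-source spaCy library for named-entity recognition and a simple regular expression standing in for PII detection; the model name, sample text and pattern are placeholders, not a production workflow:

```python
import re
import spacy

# Small English pipeline; assumes it has been downloaded separately
nlp = spacy.load("en_core_web_sm")

text = "Jane Doe wrote to the Hudson's Bay Company from 123 Main St. on 4 May 1921."

# Surface people, places and organizations to assist with arrangement and description
doc = nlp(text)
entities = [(ent.text, ent.label_) for ent in doc.ents
            if ent.label_ in {"PERSON", "GPE", "LOC", "ORG"}]
print(entities)

# Crude pattern-based redaction of a street address; real PII detection is far more involved
redacted = re.sub(r"\d+\s+\w+\s+(St|Ave|Rd)\.?", "[REDACTED ADDRESS]", text)
print(redacted)
```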

But there's a catch: to leverage the capabilities afforded by computing, archives must first be translated into a form that can be operated upon by machines; that is, they must be datafied. While born-digital archives are created as data, physical archives not only need to be digitized but also made legible to a computer through techniques like optical character recognition, tokenization – or the breaking down of unstructured text into individual words – and image segmentation. Of course, each step "cooks" the data to some extent, using tools that were often initially developed for scientific or industry applications and so are influenced by the biases and assumptions of those contexts.
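
A hedged illustration of those datafication steps, assuming the pytesseract wrapper around the Tesseract OCR engine and a naive whitespace tokenizer (the file name is hypothetical, and real pipelines add layout analysis, cleanup and much more):

```python
from PIL import Image
import pytesseract

# Step 1: make the scanned page legible to the machine via OCR
page_text = pytesseract.image_to_string(Image.open("scanned_page.tif"))

# Step 2: break the unstructured text into individual tokens (words)
tokens = page_text.split()

# Each step "cooks" the data: OCR errors, hyphenation and punctuation handling
# all reflect choices baked into the tools
print(tokens[:20])
```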

At the heart of an archives as data paradigm, then, is an amenability to computation.

It is – of course – more complicated than stating that archives are data, however. Evidence, memory, identity, community and data are all extra-disciplinary concepts, so – in comparison, say, with "respect des fonds" or "original order" – there's another layer of contested meaning to complicate a shared understanding of how archives are imagined by the profession. And data, as an especially abstract and elusive concept, accumulates metaphors easily.

In "Critical Questions for Archives as (Big) Data," I concentrated on the conceptual framing of data as raw material, arguing that it allowed for problematic claims of professional objectivity and neutrality to be reasserted. To unravel another thread of the archives as data paradigm, I'm going to take up the same metaphor but from a different angle: data as a(n unrefined) natural resource, with data processing being the means through which value is extracted.

Imagining data as a natural resource is not something that arises from an archival context but rather comes readymade from its discursive construction – or, the way it's talked about – in the technocapitalist realm. Niels Kerssens, in his analysis of metaphor-driven representations of data practices, traces the notion of data mining as an extraction of value from natural resources back to the 1990s. But it's still going strong: he cites numerous recent examples from trade publications conflating data and natural resources, including an article in The Economist entitled "The World's Most Valuable Resource Is No Longer Oil, but Data" which plainly states: "Artificial-intelligence (AI) techniques such as machine learning extract more value from data." Sort of a triple-whammy of hammering home the metaphor.

Aligning archives with data as a natural resource also functions to signal their potential for monetization. Jarrett Drake has already remarked on the commodification of the archival record which is reinforced through the use of neoliberal terms like producer, manager and consumer in the OAIS reference model (2019, 276). Similarly, an archives as data (as natural resource) paradigm conveys a value expressed in terms of capital rather than disciplinary notions of archival value. At a time when archival institutions – unlike police – are increasingly being defunded and – in an era of Google, Ancestry, and other historical resources available online – are compelled to defend their importance under neoliberal conditions, it may be tempting to frame datafied archives as containing financial worth: a gold mine of data.

Now, I’m not necessarily suggesting that we avoid using language like "entity extraction" or "data mining" – though we can :) – as it is the discourse of data disciplines, the vocabulary that enables us to communicate with other professionals in related fields. But it is important that we resist an uncritical adoption of terms from business-oriented data practices, and that we are attuned to "the representational politics of big data metaphors" (Kerssens 2019, 2) when we talk about our work. The data as a natural resource metaphor is located within a larger narrative of colonial exploitation, which simultaneously signifies the dispossession of Indigenous lands. Imagining and talking about archives as data through the metaphor of data as a natural resource thus stands to ground the paradigm in the world view of settler colonialism. And maybe we want to untether ourselves from that legacy.

II. IMAGINING ARCHIVAL IDENTITY

When imagining archives as data, and Big Data specifically, we must also ask what such a paradigm demands of archival identity. Cook illustrates how the archives as evidence paradigm – that is, imagining archives as trustworthy witnesses to facts, actions and ideas – prescribes the characteristics of an ideal archivist: "an honest broker between creator and researcher" (2013, 100). In order to fulfill the role of an honest broker – the unobtrusive middleman – the archivist is presupposed to be neutral, objective and impartial, and this identity is operationalized through functions, procedures, archival education and more. So, what then does an archives as data paradigm require the archivist to become?

In "Social media data archives in an API‑driven world" (which is brilliant and you should read it), Amelia Acker and Adam Kriesberg introduce the role of the developer steward: memory workers who leverage Application Programming Interfaces, or APIs, to request data from social media platforms for the purposes of preservation and future use (2019, 114). Acker and Kriesberg situate the developer steward as – essentially – a rescuer of social media data from the proprietary custody of corporate systems, whose primary interests in profiting from user data obviously diverge from the aims of the archival profession. I would argue, however, that the figure of the developer steward could actually be taken up more broadly to encapsulate the identity of archivists working with archives as data.
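
For a sense of what that work looks like in practice, here's a sketch of the developer steward's basic move – requesting platform data over an API and setting a copy aside for future use. The endpoint, parameters and token below are hypothetical placeholders rather than any platform's actual API:

```python
import json
import requests

API_URL = "https://api.example-platform.com/v1/posts"   # hypothetical endpoint
TOKEN = "REPLACE_WITH_ACCESS_TOKEN"                      # hypothetical credential

# Request a batch of posts from the platform
response = requests.get(
    API_URL,
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"account": "some_account", "limit": 100},
    timeout=30,
)
response.raise_for_status()

# Write the payload to disk alongside basic capture metadata for preservation
capture = {
    "requested_at": response.headers.get("Date"),
    "source": response.url,
    "payload": response.json(),
}
with open("capture.json", "w", encoding="utf-8") as f:
    json.dump(capture, f, indent=2)
```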

The practices involved in working with archives as data have expanded the scope of archival labour considerably: in addition to the more traditional activities of selection, arrangement and description, recordkeepers working with datafied archives engage in a range of computational tasks: processing metadata, running scripts, making API requests and so on. They may also create tools to assist with their work, like ePADD, BitCurator or Archivematica. The "developer" in developer steward, then, suitably captures this wider set of practices and the technical fluency they call for.

We thus arrive at a possible summary statement, in the style of Cook, for an archives as data paradigm: the developer steward renders the archive amenable to computation (and I realize "render" is yet another term borrowed from imperial capitalism to suggest the refining of natural resources – gah).

When imagining archival identity in relation to how archives are imagined, however, we must also be attentive to the ways in which that identity is implicitly coded along the lines of race, gender, sexuality, ability and other markers of difference. Though Cook advocates for a pluralistic and diverse understanding of archival identity, he stops short of naming the dominant obstacle to it: that is, the default subject position envisioned within an archives as data paradigm is likely to be white.

Mario Ramirez has critiqued the normative whiteness of the archival profession, while D'Ignazio & Klein (and many others) have commented on the predominance of white and male practitioners in data science fields. Given that the developer steward occupies a space where these two disciplines overlap, the continued centring of white subjectivity could easily persist if not actively challenged.

The barriers to participation for minoritized groups are therefore not only structural but ideological, as the quotation below, referenced by Kim Gallon, exemplifies:

"When all these [administrators] are looking around for the folks who are going to do cutting edge work, the last folks they think about are black folks" (2016, 45-46).

Though Gallon is referring more specifically to the Digital Humanities, her point is nonetheless germane to the archival profession: Black and Indigenous archivists are simply not imagined (by white recordkeepers) to be doing the work of the developer steward.

III. EXPANDING THE PARADIGM (WHEREIN I WRAP THINGS UP ABRUPTLY BECAUSE I'VE ALREADY GONE OVER THE ALLOTTED TIME)

Just as Cook maintained that the four paradigms coexist and enrich each other, I would argue that an archives as data paradigm can similarly accommodate a more expansive imagining of archives and archival identity. That is, there isn't one definitive role within an archives as data paradigm but several in dialogue with each other. The Feminist Data Manifest-no, led by Marika Cifor and Patricia Garcia, points to an identity engaged in dismantling harmful data regimes. Likewise, the Archivists4ClimateAction group counters the metaphors of abundance tied to big data by educating colleagues about the significant carbon footprint of digital storage and preservation actions. But, while this multiplicity of narratives is promising, those with privilege must also guard against the tendency for such roles to be distributed along the lines of positionality, leaving underrepresented practitioners to perform the majority of the critique.

It's not my intent to thwart an archives as data paradigm, of course – nip it in the bud. What I am hoping for, however, is to encourage those of us engaged in working with archives as data to be intentional about imagining datafied archives and our professional identity differently from the representations that have been handed to us from technocapitalism. To recognize the oppressive dimensions of metaphors like data as a natural resource which further entrench the unequal power relations that marginalize recordkeepers, creators and users of archives who are Black, Indigenous and People of Colour.

I'll leave you to consider an alternative framing of data – and by extension, datafied archives – proposed by Stark and Hoffmann that might resonate with the archival profession: "data as a record of human activity" (2019, 7). A bit circular, perhaps, but rife with possibility.


****NOTES****

[1] Being a settler of Lithuanian and Scottish ancestry, I'll begin my talk by acknowledging that I live and work on Ontario's Treaty 2 territory, which had previously been (and remains to this day) the home of numerous First Nations including the Ojibwa, the Odawa, and the Potawatomi. I am grateful, as an uninvited guest, to be able to walk these lands and have a deep respect for the life within them.

My positionality as a white cisgender woman further affords me the unearned privilege of having my identity and experiences reflected back to me, of feeling included in the profession. I have not encountered the same structural oppression – in both work and life – that my colleagues and potential colleagues who are Black, Indigenous and People of Colour, trans, dis-abled or otherwise marginalized have. I am, however, committed to decentring the normative assumption that archivists, and digital archivists in particular, are white.
-> return to post.


A Paean to Low-Tech Web Design: Or, Two Arguments – Environmental and Archival – for Slowing our Collective Roll when Creating Websites

This (inaugural!) post is the product of a 3x melee combo of sessions from several conferences I've been lucky to attend over the past few weeks:

  1. a presentation by the folks from Low-Tech Magazine on their solar powered website at Our Networks in Toronto,
  2. U of Regina Library's Dale Storie speaking about Wax, an open-source static-site digital collections platform, at the superb and rousing Access conference in Edmonton and,
  3. my friend and colleague Mita Williams' panel talk at the DIY Urbanism symposium in Windsor, wherein she asserted “there is no greater time to blog” – so, here goes...

(As invigorating as it's been, I'm declaring a personal moratorium on conferences for at least the next month)

Now, I'm late to the party – archivists have been discussing the carbon footprint of digital preservation for years now (see Pendergrass et al. for a thorough background on the subject). And the rest of the world seems to be catching on as well (heh) when it comes to acknowledging that an information economy is not the clean, immaterial alternative to the manufacturing industries of old that we imagine it to be: for example, MIT Technology Review's “The Carbon Footprint of AI” and IEEE Spectrum's “Green Data: The Next Step to Zero-Emissions Data Centers.”

Storie made a compelling and pragmatic case for using Wax, grounded in minimal computing principles, to create web-based digital exhibits. For those of us who have been tasked with updating a steady parade of open-source content management systems over the years – WordPress, Drupal, Omeka, oh my! – the free-floating malaise of monitoring an ever-expanding roster of legacy sites for security vulnerabilities is likely familiar (not intended as a slight against open-source, BTW; just the reality of maintaining them). As is the Rubik's cube of patching the core application only to discover that you've broken two mission-critical third-party plug-ins. So, the lack of dependencies – both within the application itself and at the server level – is reason enough to get back to basics with Wax and other static site generators. Not to mention faster performance, etc. etc.

But to Storie's list of benefits, I would add another pair: less resource-intensive (ergo, smaller carbon footprint until we shift to an energy infrastructure that isn't reliant upon fossil fuels) and ease of capture for web archiving. And I'll generalize beyond digital collections platforms to advocating for a minimalist's approach to creating *any* website. To ask oneself – do I really need an Escalade when a Yugo would do? Or a Prius? Let's go with Prius.

NB: I'm an HTML hobbyist – not a web designer. So, speaking as an amateur whose earnestness outweighs her expertise 10:1.

I. An Environmental Rationale

By way of concrete illustration, check out Low-Tech Magazine's *solar-powered website*. I ♥ the idea of having a website with such minimal energy requirements that it can be run from a little-solar-powered-server-that-could; it unapologetically states “This is a solar-powered website, which means it sometimes goes offline” (and last time I checked up on it, the site was hovering defiantly at 4% power).

You can read more about the site's energy saving design or, if you'd rather not find yourself down a rabbit hole (because there's plenty on the site to get lost in if you have a kindred soft spot for low-tech), allow me to summarize:

  1. start with a static site (i.e. not one that's dynamically generated on the fly, as with database-driven web applications),
  2. minimize image size through dithering (a quick sketch follows this list),
  3. use the browser's default typeface, and
  4. omit a (graphical) logo.
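
On point 2, here's a rough sketch of what dithering an image down to one bit per pixel can look like with the Pillow library (the file names are placeholders, and Low-Tech Magazine's actual pipeline differs, but the idea is the same):

```python
from PIL import Image

img = Image.open("photo.jpg").convert("L")            # convert to grayscale
img = img.resize((img.width // 2, img.height // 2))   # halve the dimensions
dithered = img.convert("1")                           # 1-bit; Floyd-Steinberg dithering by default
dithered.save("photo-dithered.png", optimize=True)    # small, Web-1.0-flavoured output
```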

I'm pleased to report that my own humble website (accidentally) follows most of these principles – because I love the Web 1.0 aesthetic and not because I've forgotten almost everything I knew about HTML5, right? (I seriously do loves my low-tech, though) The whole shebang clocks in at a mere 100kb, images included. I don't use the browser's default typeface because that would be where I draw the line between design and conservation – I still have standards, damnit.

Of course, a static website doesn't have to *look* low-tech, or be feature-sparse. I get that not everyone appreciates the gritty charm of dithered images as much as I do (LIKE THEM!!!). But with dozens of open source static-site generators available, there will definitely be use cases in which going with a low-tech website will be viable and even desirable.

Yes, it takes more effort than WordPress's one-click install. It requires a bit of technical know-how; however, I prefer making skills more accessible through good instruction to tools that make content creation more accessible but keep the mechanisms by which they operate hidden. And yes, it's a minuscule drop in an anthropogenic ocean. But it also represents a broader shift away from an abundance mindset in tech: a sort of digital frugality, enjoying doing more with less and finding creativity within constraints.

II. The Archival Rationale

Because I'm running out of steam (but more so to mask my ignorance), I'll keep it brief: static websites > dynamic websites when it comes to archiving them (amiright? Feel free to correct me as I've literally *just* started with web archiving). Not only will low-tech websites, as active records, be immune to an aeon's worth of updates to the M and P (MySQL and PHP) of your server's LAMP stack, there are also fewer moving parts to muss up one's web archiving efforts.

Speaking more from theoretical learnin' than actual experience: the difficulties with archiving dynamic, database-driven websites are substantially sorted? (Yes/no) But rather than a dynamic site that is archived as static with mixed results, a static website is captured as it exists on the web; "An end user should be able to navigate the preserved website in the same way that the original website was navigated, and as much as possible should see the same content and experience the same functionality" (Archivematica wiki). It follows that archiving a static website would be less prone to errors and to variability from the original when accessing its archive. If someone with a heck-ton more time logged with web archiving would like to expand on this, please do! I'm getting into fast and loose territory...

There are definitely sound and legitimate reasons for creating dynamic websites. But there are also cases in which a static site is more than sufficient for one's purposes, and then it's worth considering whether the benefits described by Storie and me outweigh the convenience of spinning up a Weebly or Omeka site. To invoke one of the lessons from the InterPARES project: preservation begins at creation.