Activist Internet Archivists: Caring about how and who to archive on the internet

By Elena Rowan

For the past year, I’ve been spending a lot of time online looking at digital libraries and archives. I was initially drawn to these places while looking for old lifestyle books and magazines from the 1990s, but remained interested when I discovered just how much these archives and libraries contain. My search kept bringing me to the metaphorical doorstep of the Internet Archive (IA). In their archive, I’ve found materials as varied as old Canadian public service announcements, vintage books and lifestyle magazines, and cereal-box video games. I started to wonder: why did someone decide to save all this information? What inspires these archivists, and why do they dedicate themselves to their cause? People often take archives for granted. But someone must do the work to care for this data: collecting it, tending to it, making it accessible. What motivates people to preserve things like old Tumblr blogs? What motivates them to preserve funny cat videos? How are they able to gather and make sense of so much data? How do they choose what to put in the archive, and what to leave out, when the goal is to ‘save history’?

The Internet Archive is a nonprofit organisation attempting to build a digital library containing historical records of everything meaningful on the internet since its inception. They define meaningful as something that another human being has linked to, spoken about, or referenced in their digital life, usually a website or cultural artifact in digital form. Their mission is to provide Universal Access to All Knowledge.

Started by Brewster Kahle, a self-described ‘digital librarian’, the Internet Archive has grown to contain one of the largest archives of English-language digital materials on the Internet. A variety of archivists working directly and indirectly with IA and its supporters contribute to the archive. One of these contributors is Archive Team – a self-described ‘loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage’. Archive Team archivists use web crawlers and bots to scrape webpages and URL links on the internet, preserving as much data as possible so that websites can later be recalled in their entirety wherever possible. This archival method sets out to provide a snapshot of the internet at any given point in time, from the early 2000s to the present period. The Wayback Machine – the Internet Archive’s digital archive – allows users to go ‘back in time’ and see what websites looked like in the past. It is the ultimate ‘home’ for many of Archive Team’s archived materials.

The work Archive Team does is vital to academics, researchers, students, journalists, activists and anyone with an interest in the past, because it ensures that historical records, cultural artifacts, and other valuable resources are accessible to future generations. Through these tools, critical researchers can explore how the internet has changed over the past three decades, in the shift from a decentralized and democratic web to a more corporatized one. They can track platforms as they move from an open structure to an enclosed property regime instituted with the rise of large monopolies, and map out the changing ownership arrangements underpinning the internet. Archives like the Internet Archive also provide a means of preserving the memory of grassroots digital communities, documenting the time and energy that people put into building communities from the ground up. However, despite their best intentions, archivists in Archive Team and at the Internet Archive face several ethical, practical, legal and conceptual dilemmas in doing their work.

Consent

One dilemma Archive Team faces is the issue of consent. Archive Team is committed to archiving everything on the internet. But what if those associated with the content don’t want it shared? And whose voices should be taken into consideration? Should Archive Team respect the property rights of large corporations who assume ownership over data? Or the rights of the individuals who created the data in the first place? Though Archive Team’s members feel strongly that all information should be archived, not everyone feels the same way. Archive Team has faced pushback from different actors, including corporations and individuals, who don’t want to have data they’ve created or collected archived. For example, while archiving Tumblr pages being removed from the web, they faced criticism from both the corporation and users, some of whom were for archiving the site and others against it (Ogden, 2022). In almost every case, they archive the site, often to the dismay of the corporate owner. In their own words,

“the approach Archive Team built up meant that initially, when a website announces it’s going down, we try to grab a copy of the website via best practices. Grab until they scream at you then grab harder, when they block you get around the block. When people hear about it, explain to them what you’re doing. And then make it available as best you can after the site comes down.”… “But, you know, sometimes the company doesn’t like it if we archive them. And then we archive them anyway… there’s people that don’t like what we do, especially some of the websites that we archive, but you know, that kind of sucks for them?”

In the face of this pushback, the archivists continue because their care for the materials themselves, and their concern over what it would mean for society not to keep track of its digital past, is so great. They justify their position as one that looks out for the needs of society, not just the desires of the few corporations who claim to own the data. Both Archive Team and the Internet Archive reject the idea that information publicly shared on the internet can or should be privatized. Materials made publicly available are and should remain the property of the internet commons, not private corporations. In this case, internet commons refers to the idea that the resources publicly available on the internet can and should be collectively governed, not privately owned by any one individual. They work to stop the reorientation of the web towards the needs of businesses and corporations above those of its users, especially when the users created the content being erased. For example, when Archive Team began archiving the many Tumblr sites being removed by its new corporate owner Yahoo for containing ‘sexually explicit materials’, Yahoo blocked entire swaths of Archive Team’s IP addresses (Ogden, 2022). This slowed down but did not stop Archive Team from continuing their work. Although the response seems to be universally positive, the community and individuals who made the Tumblr sites weren’t consulted or given the space to consent to the archiving of their materials. As the internet has grown to be controlled by monopolistic and exploitative platforms, Archive Team and the Internet Archive have worked to ensure the internet remains public, not only privately owned by those in powerful positions.

Caring for the digital materials on the internet and ensuring that they will remain accessible in the future is a key part of their relationship to these materials and their work as Archive Team. The materials they archive are disappearing, and their archives serve to create spaces which foster collective memory and collective remembering. Their care is based on an ethic that anyone should be allowed to curate, collect and tend to the data on the internet, not just those who ‘own’ the data. While each member of Archive Team’s motivations for doing this work are unique and personal, they all centre around finding value in being a part of preserving and documenting pieces of the internet that otherwise would disappear unnoticed. One member of Archive Team explained the main motivations behind their involvement:

“You have websites shutting down, there’s a deadline. You try to get it copied in time. And, yeah, in the beginning, it’s — you also feel like you really are doing something important because for many of these sites… it’s a part of history. And you know, we were the only group really that worked on getting a copy of that. And just like that, there’s a lot of cases where we are really the only people who make a copy of it. And yeah, that, that you feel like you really do something important with your time. And it’s fun.”

Out of concern for preserving what would otherwise be lost, these activist archivists spend countless hours finding, scraping, collecting, and documenting disappearing materials on the web. They give their time as volunteers, because their care for the materials is something they feel is important and meaningful. However, they have also faced criticisms from community groups, Indigenous groups, data sovereignty advocates, and research groups for making many errors in their work as Archive Team. There are serious issues of ethics, digital security, colonialism and custodianship to be considered when archiving materials from marginalized communities. While archivists use the language of accountability and transparency to frame their work, there are questions about how the archive is itself kept accountable and to whom. Many groups are concerned that by aiming to archive the entire internet, Archive Team and the Internet Archive are facilitating the work of surveillance, which can be turned around and weaponized against marginalized communities. For instance, in the case of archiving Tumblr, some of those whose Tumblr pages were being deleted did not want their materials archived. Archive Team initially ignored those requests, and only after public outcry did Archive Team create a place for the creators of blogs to opt-out of the archiving of their sites (Ogden, 2022). In this situation, Archive Team treated the community as an object to archive without considering the accuracy of what they were archiving or the consent of the Tumblr users who created the materials. Critics have called attention to how such practices may reinforce the surveillance of marginalized communities: “One person’s archive is another person’s police dossier” (New Design Congress, 2022, p. 45). When an open, digital archive is created, it is equally accessible to researchers with good intent as it is to organisations who wish to use data for surveillance of marginalized communities.

This example raises larger questions around the actors to whom archives should be accountable. Archive Team often treats the rights of the corporate owners and the rights of the individual creators as interchangeable, when they are very different types of owners. As critics note, the corporate owner only bought the rights to the content; they are not the same as the individual users who created the content. Content creators may not want all their work to be archived on the internet indefinitely, but may not have the power to remove it or have it taken out of an archive. While confronting the privatization of the Internet by corporate monopolies may be laudable, it raises questions around the ethics of data sharing in a context where data is treated as a commons, appropriable by anyone. However, as we will see in the next section, while activist archivists preach an ethic of universal access for all, their everyday practices highlight a more careful process of data curation, speaking to how the day-to-day work of building and sustaining archives involves negotiating social relationships.

Selection

The fact that archivists don’t take into consideration the consent of those who they archive might suggest that they lack ethics in caring for and curating information. However, a closer examination of their methods reveals a strong ethics of care, just not towards individual property owners. The archivists themselves often demonstrate a devotion to collecting everything. Archive Team’s founder describes the attributes of an ideal member as “kleptomania, paranoia and rage. Because there’s a constant worry that everything is not permanent. Everything’s on shifting sands”. This quote highlights some of the motivations Archive Team members have in doing their work; they worry about the ephemerality of the internet. The impermanence of digital materials elicits passionate responses, and for members of the group, means that everything must be documented. For the group, hoarding and collecting materials that may disappear is the most important part of their work. For the past decade, the group’s members have archived digital materials which would otherwise be lost in shutdowns, shutoffs, mergers, and plain old deletions, in order to “save history before it’s lost forever” (Archive Team, n.d.).

However, preserving the memory of the entire internet is a daunting task. After conducting interviews over Zoom with members of Archive Team, most of whom have contributed digital material to the Internet Archive, I found that the activists – despite seeking to provide access to “all knowledge” – cannot archive everything; they must be selective. In their selection processes they discuss potential ramifications like privacy violations and archiving false materials; however, they are not concerned with defining what should be classified as ‘history’ (Ogden, 2022). In deciding what to include in their archives, the archivists use a specific philosophy in their decision making (Ogden, 2021), and a robust ethic of care to guide their work. Their care is evident throughout their descriptions of how they gather and make sense of data.

Ogden criticizes Archive Team for failing to seek consent to archive data, in her case Tumblr sites, from those who created them. She concludes that such practices indicate a lack of care within the group. However, this does not mean that there is no ethic of care. In fact, speaking with Archive Team members, I discovered that they care deeply about the data that they are archiving. Their ethic of care deviates from a privatized ownership model, in which data is considered to be the sole property of the creators. Instead, they tend to data as a commons, meaning that they treat it as common property that involves practices of collective stewardship. Archive Team cares about preserving online materials. Their care is for the preservation of the materials, and less for the finished archive itself. Although on the surface it could be said that Archive Team doesn’t have an ethic of care because they do not seek consent from property owners before they archive materials, their ethic of care is situated in caring for the archive as a record of the past. Their care is for the internet as a common, collectively owned space, not a privately controlled network. They do not seek consent from those corporations they are working against. Though people take archives for granted, someone must do the work to collect and care for this data. What motivates activist archivists to collect and tend to digital materials is an ethic of care. Archivists care less about the finished archive itself, and more about how the materials created online in the past are available for future use.

Archivists are committed to an ideal of the archive as a space that is unbiased and accurate. As Ogden also observed in her ethnographic study of the group, Archive Team “espouses to treat all websites with equal priority, while positioning the collective as a non-partisan protector of web content and mobilizing a community of practice around notions of history, heritage and the future of the web” (Ogden, 2022, p. 119). Her article outlines two core tenets of practice which Archive Team follows: everything online is created equal; and archive first, ask questions later (Ogden, 2022, p. 119).

These tenets can create many challenges and issues when deciding which materials to archive, and which not to archive. For example, the founder of Archive Team talked about difficulties that arose when the team had to decide whether to archive conversations on the controversial web forum 8chan. As he describes:

“There was a schism… because what happened is, that group was archiving image boards, specifically, 4chan, 8ch, and a couple others. And I, in a rare show of power was like, no, go to hell. We’re not going to do that. Because (4chan, 8chan) is a machine designed to create offensive material, and iterates itself to become more and more offensive, and its power rests in its ephemerality. So by grabbing it through the Archive Team channels, we are rewarding the offensiveness while punishing the ephemerality”.

This quote highlights a dilemma faced by archivists. While fighting the ephemerality of the Internet, it is acknowledged by some members that some things are more ephemeral than others, and they should be kept ephemeral. By reproducing these materials, and storing them forever in an archive, they would be feeding the outrage machine that is 8chan. Creating an archive of ephemeral, toxic hate-speech would be breaking from their ethic of care.

More broadly, this example highlights how 8chan is differently positioned as a material to Archive Team. Linked to white supremacy, Nazism, hate crimes and multiple mass shootings, 8chan’s power comes from its ephemerality and anonymity, which is used to stoke outrage. Many of the comments made on the site are treated as disposable, encouraging violence and carelessness. In contrast, Archive Team’s care is focused on inclusion and preservation of materials worthy of future study. Ephemeral content that stokes outrage is not something to be archived for the future. In this case, the founder of Archive Team asked questions before they archived, breaking from their core tenets. They made a decision about what to archive on the internet, and what to leave out of the archive, based on their ethic of care for the future of the internet.

By creating an archive of digital materials that would otherwise be ephemeral, Internet archivists are playing a critical role in preserving the collective memory of the Internet. Against the trend over the past two decades to privatize data, transforming it into property hoarded by the Big Tech companies, their work defends knowledge as a commons. However, the ethic of care advanced by Archive Team, and the challenges it faces, teach us that archiving digital materials is contentious work. Deciding when to seek consent to archive materials and what to archive is a process that must be enacted through social relationships that are premised on care and responsibility. Deciding what an archive holds is complicated. While both Archive Team and the Internet Archive focus on disengaging data from property rights, in doing so they raise questions about the ownership of data. If this data is not the property of large corporate platforms, then who has the right to manage it and on what basis? Archive Team and the Internet Archive are wrestling with these complex questions of data ownership in their day-to-day operations.

 

References

Archive Team. (2023, June). Wikipedia. https://en.wikipedia.org/w/index.php?title=Archive_Team&oldid=999937182

New Design Congress. (2022). Memory in Uncertainty: Web preservation in the polycrisis. New Design Congress. 1st Edition.

Ogden, J. (2022). “Everything on the Internet can be saved”: Archive Team, Tumblr and the cultural significance of web archiving. Internet Histories, 6(1–2), 113–132. https://doi.org/10.1080/24701475.2021.1985835
