How much can we trust The Archive?
Issue 4. The Archive

One popular notion holds that archives are neutral storehouses of fact. But archives have always been active, interpretive spaces: shaped by power, vulnerable to error, and at times complicit in distortion. From medieval chancery forgeries to Cold War propaganda, from the misattributions of ancient librarians to the vulnerabilities of digital records, archives are far from infallible. Yet they remain indispensable. Drawing from his experience at the Blinken OSA Archivum in Budapest, where a unique Cold War collection is kept, historian István Rév argues in this essay that the archive must be defended. Not just as a technological system, but as a civic institution grounded in public responsibility.

Those who work on issues related to history approach the archive as natural scientists approach the laboratory: changing and manipulating hypotheses and assertions to fit records, or using records that will fit into the hypothesis. Archives have never been perceived as depositories of dead certainties, but archival documents have usually been considered as credibilia, sources worthy of belief. Generations perceived the archive as a trustworthy institution. Archives (or archivists) could play the role of social arbiters, both in forensic and historical disputes. The Archives were usually considered credible institutions even though, from the earliest times, they had housed infelicitous, misattributed, misidentified, or bluntly and consciously forged documents.

The authority of the archive as an institution traditionally rests on trust in the authenticity and integrity of the documents housed inside its walls, as well as on trust in the integrity of the archivists themselves, the custodians of the documents. To ensure documents can be trusted as evidence, there must be a clear and unbroken chain of custody: a record of who handled them and what was done with them. Such a record shows that they were protected physically and handled with moral responsibility. A chronological record must be kept of how documents circulate (from officials or authors to the archive itself, and then their movement within the archive). From the 1830s onwards, several public manuals codified archival theory and practice. Among them were regulations requiring the respect pour les fonds, issued by François Guizot, the French minister of public instruction. Archival integrity also rested on the principle of provenance, which stipulates that records originating from a common source are kept together, if not physically, at least intellectually, with the help of the archival finding aids.

But a difficulty arises: the sheer fact of working on archives, the daily routine of this activity, endangers the authenticity and integrity of documents. To this day, no archive has ever been able to exist without some harm being done to the documents it keeps. Even in traditional archives, documents couldn’t remain completely unaltered. Keepers of the archives, minor officials, monks, scribes, or learned antiquarians would copy, re-scribe, translate, and annotate documents. The Library of Alexandria was one of the first known archives: in Ptolemaic Alexandria, the librarian, “the guardian of the books”, was also considered to be the “keeper of the archives”. It contained tens of thousands of papyrus scrolls, many of which were confiscated from ships in the city’s harbor and copied in the library (the copy was then returned to the owner). In the course of copying, the text was frequently altered. This could happen involuntarily, through a mistake made by the scribe, or consciously, to “improve” the original.

Archivists or philologists (“lovers of words”) of the Ptolemaic Museum were engaged in conserving, “rectifying” and restoring a past (corpus) that had supposedly become altered, distorted, contaminated or corrupted. The philologist Daniel Heller-Roazen writes that the Library of Alexandria, in its very notion and its practices, stood for an understanding of “history as catastrophe”. The ongoing daily activity of the Archive is a heroic attempt to preserve or restore the presumed but corrupted “original” and to prevent the worst from happening — a flood, a fire, an invasion of mice or worms, sudden technological changes, digital decay, or any event that can make retrieving documents impossible.

Libraries and archives have been set up to collect otherwise dispersed texts under one roof, and thus to prevent the disappearance and destruction of important records. Yet collections of documents have always been highly vulnerable: the majority of the papyrus scrolls of the Library of Alexandria most probably would have disappeared even without the fire that allegedly destroyed this institution. Papyri survive more than two or three hundred years only in exceptional climatic circumstances, and even then, bugs and mice might finish off what climate conditions have left intact. Papyri, like other manuscripts, had to be copied to be preserved, and the corrected documents were then often reattributed, with individuals named in copied documents possibly reappearing in new contexts.

Archives have never been completely immune from the suspicion of containing forged documents, something which could happen in the interests of external authorities, private individuals, or the archives themselves. Historians of the Middle Ages are familiar with the so-called “chancery forgery”, in which documents, as entitlements, were produced retroactively. The medievalist Patrick Geary persuasively argues in his book Phantoms of Remembrance that Western monastic archives started with massive selective remembrance. Documents deemed contrary to the interests of a monastery were discarded, or fake documents were produced, to strengthen the spiritual, legal, or economic standing of the house. Forgeries implicated benefactors, legal heirs, whether dead or still alive, as well as their past deeds. This was aimed at securing a monastery’s legal titles to property allegedly bequeathed to it by a benefactor. Revisiting and rectifying the past was a double process of creation and destruction. In most cases, original documents were destroyed to cover up any trace of alteration. The archive of the Abbey of Saint Denis, in France, reached “back to the dawn of institutional archival formation” and “was systematically pillaged and destroyed [already in the 11th century] to build from its fragments a more useful and appropriate past”, writes Geary. The history of the French royal archives started with “the reconstruction” of the archive itself, after King Philip Augustus’ tax books were captured by Richard the Lionheart in 1194. In medieval times, a document was considered authentic, thus true, simply because it was in the archive.

On February 9, 2024, when Vladimir Putin gave an interview to Tucker Carlson, a purveyor of disinformation, he handed over, in a very demonstrative way, a folder with alleged archival material. This material, Putin claimed, confirmed the “historical unity” of Russians and Ukrainians. Such bella archive, or war with archives, is hardly a new phenomenon. When Louis XIV decided to claim German border territories, he ordered a search for (or, if need be, the fabrication of) titles that could substantiate his territorial ambitions.

Precisely because documents kept in archives have always been prone to both material and textual deterioration, they had to be moved, reshelved, reboxed, transcribed, altered, reattributed and, as a result, recontextualized. The emergence of digitization has made this problem worse.

The danger to an archive’s authenticity and integrity has become more pervasive. Digitization might affect a text and its readability, if only because optical character recognition software is far from perfect and cannot faithfully recognize printed text, manuscript, or longhand. Digitized information is always in flux: moved from one server to another, converted from one format to another, uploaded to the cloud, and then copied and stored on multiple servers. Cloud architectures require the replication of data. That data is in constant automated movement from one location to another, without the consent or the knowledge of the administrator or of the archivist.

Archivists working today in a digital environment are confronted with the so-called Collingridge dilemma, named after the British academic David Collingridge. He came to this conclusion: when a (technical) innovation first becomes available, the problems it may cause cannot be foreseen; by the time those problems and dangers become apparent, it is already too late, because change has become expensive, politically entrenched, difficult and time-consuming. Machiavelli formulated a similar idea in The Prince, some five hundred years ago: “when one recognizes from afar the evils that arise in a state (...), they are soon healed; but when they are left to grow because they were not recognized, to the point that everyone recognizes them, there is no longer any remedy for them”.

Archivists are unable to foresee the exact impact of technological changes, including the introduction of AI, on the integrity of the material they use. Had they been able to understand those implications when new technologies were first introduced, before they became embedded and widely distributed, there would still have been a chance to weigh in, to voice concerns, and perhaps modify digital tools or their parameters. But by the time the full impact of the new technology becomes apparent, it’s too late: strong technological, corporate, and/or political forces have vested interests in keeping digital things as they are, even at high social costs.

Being able to connect digitized archival documents to other collections, such as specialized data, means that original documents and their content can be set in a new and completely different frame. Descriptive documents can now be related to sensor or geospatial data. Radio-frequency identification and social data can be linked to images obtained from surveillance cameras or to data originating from the Internet of Things. All of this can recontextualize the document. Relating and connecting archived records and data coming from different repositories (historical, social, commercial, surveillance) results in a deep layer of recursivity. Or, to put it simply, there can be endless iterations. As a result, the collectors or keepers of the original records find themselves unable to predict what such an aggregation of data might lead to.

Archives exist not only for the collection, storage, and preservation of documents, but also to make them available and retrievable. They exist for the people who want to study, consult, or scrutinize documents — whatever their reason to do so. Archives should provide access to the documents they keep. But the very manner in which documents can be accessed matters a lot. Electronic copies of documents accessible on an archive’s website may become available to anyone and without any public control, without, say, the mediation of a researcher bearing ethical and moral (not just legal) responsibility for how personal data might end up being published. It is in the public interest that relevant information — even containing the names of identifiable people — should become available, but it is also in the public interest that archives should retain their status as trusted institutions.

Trust springs from the assumption that an archive guards the authenticity and integrity of documents, and that it will neither “de-access” them nor destroy them. Trust flows from an understanding that an archive will not mishandle sensitive personal information, that it will handle documents in a legally and ethically sound way. In a nutshell, archives are expected to engage in a never-ending balancing act between their responsibility to the public, which has the right to know, and their responsibility to individual people, who have the right to be protected.

The archive where I work, the Blinken OSA Archivum, in Budapest, is one of the largest Cold War and human rights archives in the world. Its two large Russian language collections demonstrate the dilemma I have in mind. On one hand, there is the so-called “Red Archive”, with official reports drawn up by Soviet party and government sources, and on the other, there is the “Samizdat Archive”, which includes unofficial, underground documents produced by generations of anti-Soviet opposition networks.

Here’s an example of the dilemma: documents in the “Red Archive” mention the name of a specific Russian psychiatrist, who, according to official Soviet sources, had “betrayed his country” and defected from the USSR to live in the West. Yet the name of that person also surfaced in samizdat publications, which describe him as having taken part in the forcible psychiatric treatment of political opponents. According to the “Samizdat” collection, he had then arrived in London as a self-styled critic of Soviet psychiatry and was offered a position at the famous Tavistock clinic. Because an archive must protect the integrity of its documents, redacting the name in either of the collections is out of the question. Nor is the archive’s task to verify the veracity of the information contained in separate sets of documents. So, in this example, as in others, the Archive as an institution does not take a stand as to the truthfulness of sources.

For a historian, some of the most important data found in an archive are the names of people connected to certain events. “Sentences containing proper names can be used to make identity statements which convey factual and not merely linguistic information,” noted the philosopher John Searle. In a specific and limited sense, there is no difference between natural sciences and the historian’s profession: both require experiments that can be repeated, checked, confirmed, or rebutted, using the same data. For the historian, repeating the experiment means going back to the archive and scrutinizing documents. To be sure, since the end of the 1960s, when Searle wrote his essay, things have changed. In today’s world, aggregated sets of metadata, including geospatial information, can help establish a person’s identity, even without mentioning their name. For Google, a name is just noise, as an individual can now be identified without it.

So the credibility of an archive is both an epistemological and an ethical question: working with archival documents requires a particular style of reasoning, and a specific author-function. A historian’s exploration of the archive was once compared to what fieldwork is for the anthropologist: first-hand experience that provides a unique status and the earned right to make certain claims with authority. This is obviously no longer the case, either in anthropology or in history writing. The ubiquity of captured documents, the diminishing importance of physical storage, the zero-marginal cost of reproducibility, and the diminishing status of expertise, reliability, factuality, and truthfulness have all created a new situation. This no doubt affects the credibility of sources and of traditional institutions.

In order to restore credibility, archives should today revisit their catalogue and their finding aids. The technique of cataloging goes back to Hellenistic times, when the poet and scholar Callimachus first devised the Pinakes, the Tablets that aimed to record documents in summary abbreviations. As the philologist Daniel Heller-Roazen puts it: “Like every technical advance in the forms of writing, the Pinakes marked a rupture in the tradition from which they emerged (…). The works listed in the Tables could be inscribed in the archive (…) only by being transcribed in a new form; they were transmitted by being transformed. (...) The works registered in the Pinakes thus became, by necessity, what they had until then never been: figures, ciphers, mere names of themselves. Such was the price each work paid for its admission to the archive (…), it would be remembered only in being dismembered.”

This is the fate of every archive. But our Cold War archive has other specificities as well. We inherited a heavily biased catalogue, with ideologically loaded subject headings. The structure of our archive, to a certain extent, is a faithful image of the work that went on at Radio Free Europe/Radio Liberty, the most storied propaganda media during the Cold War from the early 1950s to 1993. No doubt, the catalogue and subject headings are important historical documents, as integral parts of the records they describe and refer to. At the same time, we, the guardians of this documentary heritage, and also the researchers who come to the archive, are all, in more than one sense, prisoners of the Cold War logic. We have to face this double bind and find a way to remain historically faithful to the origins, logic, perception, language, and circumstances of the original archive, while at the same time re-contextualizing (from a post-Cold War perspective) the inherited documents. Credibility requires both of those efforts.

Following the events of 1989/90 (the fall of the Berlin Wall, the dissolution of the Soviet Union, and the end of South Africa’s apartheid regime), the opening of secret archives fueled two contradictory assumptions. One of them was that documents suddenly made publicly available would finally provide answers to disturbing historical questions, make uncertainties disappear, and point to those responsible for crimes. They would provide a basis for justice, or at least restitution. The opening of archives (some would be closed again, later on) created a perception that “facts” were just there, hidden until now in the depths of well-guarded archives, awaiting discovery, rescue, and reworking. They would help debunk a number of historical claims. Anything seemed possible, like a retroactive historical miracle. Yet these hopes went unfulfilled, and the public felt betrayed. As a result (and therein lay the contradictory assumption), some people pointed to a sort of conspiracy, involving the shredding of documents that would have either unequivocally identified those responsible for misdeeds and crimes, or restored the dignity of victims. These illusions and disappointments led to the further erosion of trust in public institutions, including in the archives themselves.

At this point, I am convinced that trust in the infallibility of technology cannot be a substitute for the strengthening and restoring of institutional trust in the Archive. In this, I differ from the ARCHANGEL project run by the British National Archives and the Open Data Institute, which aims to enable “a shift from an institutional to a technological underscoring of trust”. The assumption behind the ARCHANGEL project is that “historically an archives’ word was authoritative, but we are now in an age where people are increasingly questioning institutions and their legitimacy”. I have serious doubts about replacing trust in institutions with trust in algorithms, artificial intelligence, or blockchain technology. Indeed, distributed ledger technology (blockchain) and/or digitally fingerprinting digital documents can be used to secure the integrity of some records in the future. Once stored in the blockchain, the fingerprint can be verified, and a document will remain the same no matter what changes may be introduced to the format of a file over time. Digital data, however, is much more fragile than analog or printed documents. Digital data is intangible; it is vulnerable to tampering. Digital formats constantly change and evolve. Content needs updating. So there is a risk that files may be consciously or accidentally corrupted. Digital preservation and the constant need to migrate documents can endanger the integrity of documents.
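
To make the notion of a digital fingerprint concrete, here is a minimal sketch. It assumes a plain SHA-256 hash computed over a file's bytes; the essay does not say which fingerprinting scheme is meant, no actual ledger is involved, and the record text and variable names are invented for illustration.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical record contents, invented for this illustration.
original = "Report on samizdat circulation, 1974.".encode("utf-8")

# The value one would register (for example, in a ledger) at deposit time.
registered = fingerprint(original)

# A byte-for-byte copy verifies against the registered fingerprint.
untouched_copy = bytes(original)
copy_verifies = fingerprint(untouched_copy) == registered

# Changing a single character produces a different fingerprint.
tampered = original.replace(b"1974", b"1975")
tampering_detected = fingerprint(tampered) != registered

# A format migration (here, re-encoding the same words as UTF-16) keeps the
# text but changes the bytes, so a byte-level fingerprint no longer matches.
migrated = original.decode("utf-8").encode("utf-16")
migration_breaks_match = fingerprint(migrated) != registered
```

Under these assumptions, a byte-level fingerprint detects tampering but also flags a harmless migration as a mismatch. A fingerprint meant to survive format changes would have to be computed on the content rather than on the file's bytes, which this sketch does not attempt.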

Studying the events of the past based on archival work involves a credibility handicap, compared to the verification process of natural sciences. The post-1945 dream of “big science” — a close connection between universities, research institutes in natural sciences, and governmental, including military, funding agencies — arguably contributed to a diminishing trust in academia and science. It’s one of the reasons why visible moral engagement for honesty, respect for factuality, and issues of credibility should be considered as one of the most urgent tasks of archives. It took about a thousand years to build a certain level of trust in the Archives, and we are surprised to see how little time it can take to lose such trust. When Jean Mabillon, the author of De re diplomatica (1681), the founding work of diplomatic and archival science, set forth the “rules of history”, he put the “love of truth” in first place, followed by sincerity, and authority.

The historian can never know all the relevant facts. Some important details can never be recovered, some were never recorded, and some are utterly misrepresented in the surviving documents.

All these factors make historical reconstruction extremely risky and difficult, but not, perhaps, hopeless or impossible. That limit to a historian’s work is, of course, particularly stark when working with unreliable archive documents. Skeptics will seize upon that incomplete knowledge to suggest a general blank acquittal from possible historical responsibility. They will argue that unknown, undisclosed, unattainable, perished, or destroyed documents would shed a different light on a historical event, and that the consequences of that event would be essentially different if all the facts could be properly taken into consideration.

We usually do not know all the relevant facts, even in our ordinary everyday life. As Ignazio Silone put it: “Behind every secret there is another secret”. Still, we can form reasonable, usable opinions about incidents in the lives of others, however different they may be from us (a different gender, a different past, a different tradition, environment, etc.). Historians have been trained to reconstruct incidents of the past through the critical interpretation of partially available sources and their connections. The historian can achieve accuracy (which is not the same thing as absolute certainty) without total, full knowledge. By having access to relevant and sufficient documentary sources, serious historians can base their claims on sound foundations.

Historical interpretation can’t be formalized in a mechanical way. The historian might fail, even while trying to be faithful to the two virtues of truth: sincerity and accuracy. Accuracy is the virtue of carefully investigating and deliberating over the evidence for and against a belief, before asserting it. And sincerity is the virtue of genuinely expressing what one in fact believes, which in the case of history means beliefs based on identified facts, meticulously scrutinized in their proper context. As the philosopher Ian Hacking remarked, it is difficult to accept that skeptics can be better at delivering the truth about the alleged impossibility of historical reconstruction than some historians are at delivering truthful accounts of past events.

Surprisingly, those who are skeptical about the very possibility of making the recent past intelligible never seem to hesitate much before making strong assertions about “heroic” figures of a distant, allegedly heroic national past. Skeptics harbour more doubts, it seems, when the discussion is about a figure from the recent past. The closer the historical figure is in time, the more particular, complex, and less penetrable his or her motivations supposedly become. The closer we get to our present, the more demanding the historian’s task becomes. Large quantities of (oftentimes contradictory) information should, according to the skeptics, constrain or silence the historian. Still, most of us who live under the rule of law, skeptics included, accept that judges are entitled to administer justice, although, as we know, not even a judge can ever be in possession of all the relevant facts and motives of a defendant. Courts, while carefully weighing the deeds and motivations of a person, can naturally make mistakes. Justice is not exempt from occasional miscarriage, but if the possibility of ever arriving at an intelligible reconstruction of past events were denied to the courts, as is so often done in the case of some historical reconstructions, none of us would ever stand a chance of living under the rule of law.

In our current, adverse political climate, rebuilding lost trust, without which citizens can’t live together in a normal way, is a crucial task. The trust we place in the Archive can’t be disconnected from the trust we need in the institutions a democracy relies on to function properly.

Illustration: Alisa Gots

István Rév

Historian
