Trove, led by the National Library of Australia, has been an exemplary service in digital cultural heritage for as long as I can remember. Even today, the landing page's meta description introduces it as "Australia's free online research portal". Browsing Trove data is indeed free.
To what extent working with Trove's data is still free is an open question. To use Trove's API, one needs an API key. In February, Tim Sherratt's API keys were revoked without prior notice or - even upon request - much of an explanation. Trove thus put a stop to much-used services such as the GLAM Workbench, effectively hindering research and teaching in the digital humanities.
Libraries (and other GLAM institutions) should support research, not prevent it. Obviously: Solidarity with Tim Sherratt.
But almost more importantly, the situation there highlights a more general problem in many GLAM institutions' handling of their APIs. Tim asks:
What I want is pretty simple:
- my API keys back
- an apology for the way I’ve been treated
- more transparency from the NLA about API access
- an open discussion within the research sector about the problems and possibilities of working with Trove data
If there's a need for API keys, return them. But if there is no need for API keys: Abolish them!
What Are API Keys?
If one is to provide an online service, the default view for whatever data is to be presented is an HTML view. This HTML presentation (visible when one uses the "view page source" option in one's browser) is translated by the browser into a human-readable interface. As it caters to human readers, the data is often at best loosely structured and even minor design changes tend to change that structure as well.
If somebody wants to work with the data provided by a web service in any way that goes beyond the functionality the web service itself offers, they really want machine-readable, structured data. APIs provide a stable way to access such machine-readable data.
API keys, then, are a way to identify and authenticate a user of an API. This roughly works as follows: There is some form of registration. Upon registration, the user gets a long, unique string (the key) that needs to be sent along with every API request. Based on the key, the user's permissions are evaluated, and outputs are generated based on what they should be able to read or do.
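To make the difference between the two views concrete, here is a minimal sketch. The HTML snippets, the JSON response and the field names are all made up for illustration and do not come from any real service. The scraper is tied to one specific markup, so a small redesign silently breaks it, while the structured response can be read the same way before and after.

```python
import json
from html.parser import HTMLParser


class TitleScraper(HTMLParser):
    """Collects the text inside <h1 class="title"> elements (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "h1" and ("class", "title") in attrs:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data.strip())


def scrape_titles(html):
    """Return all titles the scraper can find in an HTML string."""
    parser = TitleScraper()
    parser.feed(html)
    parser.close()
    return parser.titles


# Invented example data: the same record before and after a redesign,
# and the same record as a structured API response.
HTML_BEFORE = '<h1 class="title">Example object</h1>'
HTML_AFTER = '<h2 class="object-name">Example object</h2>'
API_RESPONSE = '{"title": "Example object", "year": 1901}'

titles_before = scrape_titles(HTML_BEFORE)  # finds the title
titles_after = scrape_titles(HTML_AFTER)    # finds nothing after the redesign
record = json.loads(API_RESPONSE)           # unaffected by design changes
```

The scraper is not wrong, it is just coupled to presentation details that nobody promised to keep stable.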
In part due to secondary effects, API keys can be used to:
- Limit access to a user's permissions (Should the user be able to read / edit the requested data?)
- Provide user-specific results based on preferences a user has previously entered for their account (The user has expressed their interest in pop music, the service will thus rank pop music over classical music in the outputs of a search query.)
- Track the use of the APIs (Users who are interested in a) are also interested in b), but not interested in c). As a) is a popular API, the institution should prioritize improvements to API b).)
- Revoke a key to block a specific user's access in a targeted fashion.
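The lifecycle described above (registration, a key sent with every request, a permission check, and finally revocation) can be sketched in a few lines. This is an illustration under simple assumptions, not how Trove or any other service actually implements it: the class name `KeyRegistry`, its methods and the permission names are invented, and keys live in memory only.

```python
import secrets


class KeyRegistry:
    """Toy in-memory registry mapping API keys to permissions (hypothetical)."""

    def __init__(self):
        self._permissions = {}

    def register(self, permissions):
        """Issue a long, unique key for a newly registered user."""
        key = secrets.token_hex(32)  # 64 hexadecimal characters
        self._permissions[key] = set(permissions)
        return key

    def is_allowed(self, key, action):
        """Evaluate the permissions attached to the key sent with a request."""
        return action in self._permissions.get(key, set())

    def revoke(self, key):
        """Withdraw a key; every later request using it is refused."""
        self._permissions.pop(key, None)


registry = KeyRegistry()
key = registry.register({"read"})

may_read = registry.is_allowed(key, "read")
may_write = registry.is_allowed(key, "write")

# The provider can withdraw access at any time, without involving the user.
registry.revoke(key)
may_read_after_revocation = registry.is_allowed(key, "read")
```

Note how little the user has to do with the last step: revocation is a single call on the provider's side.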
When Do API Keys Make Sense For Us?
Now, as GLAM institutions, which of those effects are actually in line with our aims? As the 2022 ICOM definition of a museum was controversial, here's the pretty uncontroversial and still useful 2007 one (Archived copy):
A museum is a non-profit, permanent institution in the service of society and its development, open to the public, which acquires, conserves, researches, communicates and exhibits the tangible and intangible heritage of humanity and its environment for the purposes of education, study and enjoyment.
Reading the potential use cases of API keys against that definition:
Limiting access based on a user's permissions makes sense only if the data (or the action triggered by an API call) is actually access-restricted. If the data is publicly available in HTML form anyway, it makes little sense to restrict API access to it. Doing so stands in stark contrast to the central aspect of being an open institution in the service of society that researches and fosters education and study.
It is obviously different if the API call triggers some form of update to the data or the data is not public anyway. If the API concerns, for example, data that should only be available to workers at an institution (think location in the depot, insurance value), it makes sense to limit access. If the API is to update published information, it similarly makes sense to restrict access. And there is even a good argument for limiting access to APIs that go beyond what the web version offers in some computationally expensive way.
But in no way does it make sense to limit access to otherwise publicly accessible data that is requested in the same way as through the publicly accessible, human-centric interface.
Second, providing specific results tailored to a user makes sense. If one is Facebook. If we are speaking about research, study, and education, reproducibility is a virtue, and ranking or limiting search results based on a user's preferences should be out of the question, at least as long as it is not made very transparent. In the case of API results, it makes even less sense, as the use cases are potentially much broader - and one user of the API likely translates into multiple real-world users.
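The reasoning of the last three paragraphs boils down to a short rule, sketched here with invented endpoint names and flags. It is meant as a summary of the argument, not as a description of any existing system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical description of an API endpoint, for illustration only."""
    name: str
    writes_data: bool = False
    exposes_unpublished_data: bool = False
    computationally_expensive: bool = False


def key_is_justified(endpoint):
    """An API key is defensible only if at least one of the reasons applies."""
    return (
        endpoint.writes_data
        or endpoint.exposes_unpublished_data
        or endpoint.computationally_expensive
    )


# Invented examples matching the cases discussed in the text.
public_search = Endpoint("search published objects")
depot_location = Endpoint("depot location", exposes_unpublished_data=True)
update_record = Endpoint("update object record", writes_data=True)

decisions = {
    endpoint.name: key_is_justified(endpoint)
    for endpoint in (public_search, depot_location, update_record)
}
```

Everything that merely mirrors the public web view falls through all three conditions and needs no key.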
Tracking API use cases might seem to make sense. But if museum-digital's example teaches one thing, then it is exactly that mail works. If people need an API and you provide a findable mail address, they will simply tell you. If, on the other hand, you want to track your API usage to better allocate resources, good old server logs are entirely sufficient to do that.
What remains is that requiring API keys gives the provider power over the API's users and allows the provider to withdraw access unequivocally and without announcement. Again, it may sound reasonable to do so. What if someone were to use one's API to promote malicious / hateful / otherwise unwanted messaging? But a truly malicious, motivated actor will find ways to circumvent the block. If it's about collections that are already published anyway, there is no way to really lock somebody out. A non-malicious actor who ends up locked out will be frustrated and much more likely to give up - if only because they do not want to resort to adversarial means.
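As a rough illustration of how far plain server logs get you, the following sketch counts requests per API endpoint. It assumes access logs in the widespread Common Log Format and API paths that start with `/api/`. Both are assumptions, and the sample lines and endpoint names are invented.

```python
import re
from collections import Counter

# Matches the request part of a Common Log Format line, e.g. "GET /path HTTP/1.1"
REQUEST_PATTERN = re.compile(
    r'"(?:GET|POST|PUT|DELETE|HEAD) (?P<path>\S+) HTTP/[\d.]+"'
)


def count_api_requests(log_lines, prefix="/api/"):
    """Count requests per API path, ignoring query strings and non-API paths."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_PATTERN.search(line)
        if match is None:
            continue
        path = match.group("path").split("?", 1)[0]
        if path.startswith(prefix):
            counts[path] += 1
    return counts


# Invented sample data using documentation-only IP addresses.
SAMPLE_LOG = [
    '203.0.113.7 - - [20/Apr/2025:10:00:00 +0000] "GET /api/objects?q=vase HTTP/1.1" 200 512',
    '203.0.113.8 - - [20/Apr/2025:10:00:01 +0000] "GET /api/objects?q=coin HTTP/1.1" 200 498',
    '203.0.113.7 - - [20/Apr/2025:10:00:02 +0000] "GET /api/institutions HTTP/1.1" 200 1024',
    '203.0.113.9 - - [20/Apr/2025:10:00:03 +0000] "GET /index.html HTTP/1.1" 200 2048',
]

usage = count_api_requests(SAMPLE_LOG)
most_used = usage.most_common(1)[0]
```

This tells you which endpoints deserve attention without requiring anybody to register or identify themselves.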
Benefits to Keyless APIs
Conversely, offering an API without keys, without the power to lock out specific users, removes the burden of options from oneself on an institutional level. Imagine a leadership not sharing the previously held principles and suddenly wanting to monetize data that should be freely accessible (this currently seems to be roughly what happened in the case of Trove / the NLA). Or, worse yet, somebody getting elected who orders you to share only their interpretation of history (which is the opposite of the multitude of histories one can tell based on freely accessible data). Or who wants to target political enemies and shut down their services.
From a service provider's perspective, offering one's APIs without requiring API keys is a guarantee against oneself turning rogue. All the more, if one were to use the same APIs to build the human-readable version of whatever content one serves.
In this respect, offering one's APIs without requiring API keys tracks other established open culture and open source practice: Once one has released one's collections data under a CC0 license, one cannot legally step back and suddenly start to sue people for using the data in the wrong context. If you release your code under an open source licence, you cannot suddenly backtrack and complain about unloved use cases. Open culture is open and stays so. The same goes for open source and free software.
Especially in the case of free software and open source, it is clear that the software would not be reliable and usable without such a guarantee. Why would it be different in the cases of open culture or GLAM institutions' APIs?
Many Are the Problem / Let's be Better!
Criticising Trove and the NLA is timely and right. But above, I criticized them using the ICOM definition of museums. Sure, because the ICOM definition comes to mind more easily. But also because GLAM institutions of all kinds share the same problem.
The Smithsonian Institution's collections are hidden away in data.gov, requiring an API key. The official, documented API of the German Digital Library (DDB) requires an API key. And the list goes on. The DDB's official API also suffers from a related problem, where not even all published data is included in the API.
At museum-digital we make it a point to provide our public APIs without requiring API keys. If we do require a key, it is for an actually good reason - that the API provides access to unpublished data and/or allows writing to the database. And we make it a point to use the same APIs ourselves, so that the whole service would break down if we started to lock them away.
It would be great if more did the same. So that situations like the one surrounding Trove may never occur again - at least among institutions that claim to stand for openness and intellectual vigor.
Openness and intellectual freedom are not meant to benefit us as service providers. They are a benefit to society. And if they mean that somebody writes a more appealing frontend to the data we provide, we can either be sad - or we take up the challenge, learn from what they did better, and become better ourselves. Then, openness and intellectual freedom are not just in line with our mission statement, but also beneficial to the most self-interested among those who strive not to be.
Let's be open. Let's get better.
Update (2025-04-20): Fix some typos and style errors, add links.