
The Weaponisation of Openness? Toward a New Social Contract for Data in the AI Era

By Stefaan G. Verhulst

8 min read · Oct 21, 2025


For years, public interest advocates and other defenders of freedom on the Internet used “open” as a rallying cry. Open data. Open science. Open government. The idea was simple and noble: Knowledge should be shared freely, accessibly, and transparently to empower citizens, accelerate discovery, and improve governance.

For a time, this vision made sense, even if it was imperfectly implemented. But as with many well-intentioned revolutions, openness has more recently been weaponised. What began as a movement to democratise knowledge has instead become justification for a new kind of extraction — this time not of oil or minerals, but of meaning. This phenomenon has become especially evident with the rise of generative AI, whose voracious appetite for public data fuels the training of its models and the refinement of its predictions. In the process, the very datasets, research repositories, and public web archives that were designed to serve the public interest have been harvested to train the large language models now controlled by a few corporations in a handful of countries.

The situation is dire, but it is not hopeless. In what follows, we describe the problem in greater detail, outline the insufficiency of current mechanisms, and then discuss some possible mitigating responses.

(Image created with DALL·E)

The Emergent Data Winter

Scholars of geopolitics increasingly speak of “the weaponisation of interdependence.” The term refers to the way states use global networks of finance, energy, and technology — originally built to promote trade, prosperity, and mutual benefit — to exert power and serve national security objectives, thereby subverting the very ideals of openness and cooperation on which those networks were founded. In the realm of artificial intelligence, a subtler but equally consequential dynamic has emerged: the weaponisation of openness and data accessibility.

Early in the history of the Internet, a number of projects and initiatives were built to expand knowledge, foster collaboration, and democratise access to information. These initiatives, which include public databases, commons-based projects like Wikipedia, and government portals, rested on the assumption that shared knowledge strengthens society. Yet over time, as commercialism and rising regulatory control have challenged some of the founding assumptions of the Internet itself, these open resources have been mined at planetary scale for AI training. This process has largely taken place without meaningful consent, compensation, or even attribution. The result is a modern-day twist on the tragedy of the commons: The more open the digital ecosystem became, the more aggressively it was extracted.

The response to this weaponisation of access has been swift and almost as harmful as the extraction itself, if certainly more understandable. Faced with the relentless scraping and appropriation of the open data they produce, mostly without compensation or consent, organisations once celebrated for their transparency are closing their data doors. Governments are retreating behind paywalls or access restrictions. Researchers are locking their archives. Those few companies that previously opened up their data for non-commercial use are likewise imposing new controls. As a result, what was once an era of open data optimism is now shading into a data winter — a growing reluctance to share, born of the fear that anything open will be exploited by others for private gain.

This backlash risks undoing decades of progress. The open data movement was never naïve about the dangers of misuse, but it believed that public value would largely outweigh private appropriation. It also believed that certain norms surrounding data access and reuse would be upheld, and it certainly did not envision appropriation at today’s scale. Those assumptions have been upended.

Broken Guardrails: Copyright, Creative Commons, and the Commons Itself

Part of the challenge today is that traditional regimes meant to govern (and adequately compensate for) information use are ill-suited to the new age of extraction driven by AI. Copyright law is premised on identifiable authorship and discrete works, not the probabilistic remixing of millions of pages to train an algorithm. The once-unifying frameworks of open licensing are likewise under strain. For example, the Creative Commons movement, long the bedrock of open digital culture, finds itself split between those who still see openness as an intrinsic good and those who now demand mechanisms to signal preferences or restrict use. Initiatives such as CC Signals — a proposed method of indicating acceptable forms of reuse — reflect the difficulty of reclaiming some control without entirely abandoning the commons.

The changing environment is evident in the adapting policies and behaviors of some key entities, too. Wikimedia, one of the stalwarts of the open access movement, has recently been forced to change. Its Wikimedia Enterprise project now charges commercial actors — including AI developers — for structured access to its data. This move, controversial within the open movement, underscores an uncomfortable truth: Sustaining openness in an age of industrial-scale extraction may require new economic and governance models.

The dilemma is particularly acute for those whose mission depends on public dissemination. If your mandate is to inform or educate the public — as it is for statistical agencies, media outlets, universities, and many public institutions — you arguably fail that mission if your content is not included in AI training sets. Exclusion means invisibility. Yet inclusion under current terms, without attribution or compensation, can mean exploitation. The answer, it seems clear, is not restricting or blocking access, but resetting the terms under which it takes place. We need, in short, new frameworks — and new governance models — that better align public value and private incentive, access rights and accountability, openness and equity.

A Reinvented Data Commons

What would a mechanism that fulfils the above goals look like? We are searching for an institution, or a governance framework, that would allow data to be reused responsibly and limit the extractive harvesting of data without shutting the digital door on the collective benefits of data and AI itself.

This is where the idea of a reinvented data commons comes in. Unlike the old data commons model of “open by default”, reinvented frameworks are designed to combine accessibility with governance. They offer structured ways to grant access to data — whether public, private, or hybrid — while specifying conditions of use, obligations for benefit-sharing, mechanisms for oversight, and systems for attribution and compensation. They represent a rethinking of openness for an era in which context, consent, and control matter as much as availability.

To imagine these new frameworks, it may be useful to consider historical precedent. In particular, the global governance of biological and genetic resources offers lessons and insights. For decades, indigenous communities saw their traditional knowledge and biodiversity extracted by pharmaceutical firms without recognition or return. The response was the creation of benefit-sharing regimes, such as the Nagoya Protocol, which ensured that those who provide access to knowledge or material resources would share in the resulting benefits.

A similar principle should apply to data. If AI developers rely on open repositories of public or citizen-generated data, then value derived from that data — whether monetary, intellectual, or societal — should, in some form, flow back to those who made it possible. Such reciprocity would not only be fair; it would sustain the very openness on which innovation depends, and with it the AI ecosystem itself. Without such self-reinforcing mechanisms, there is a real risk that AI will collapse into a feedback loop of diminishing returns, in which models increasingly train on their own exhaust and the quality of intelligence steadily decays — the “model collapse” that researchers warn about.

The contours of a reinvented digital commons are only beginning to come into focus, but a few design principles are becoming clear.

  • First, such a commons would be modular, allowing different levels of access depending on the sensitivity and provenance of the data, as well as the status of the stakeholder requesting the data.
  • Second, a revitalized commons would embed benefit-sharing protocols, so that those who supply or steward data — whether individuals, communities, or institutions — receive recognition, compensation, or access to the resulting AI application when that data is reused.
  • Third, it would incorporate participatory governance, including boards that represent data contributors, users, and affected communities, to ensure that decisions about reuse reflect collective interests and are aligned across stakeholders (echoing Elinor Ostrom’s principles for governing commons).
  • And finally, a reinvented data commons framework would leverage technology, making use of tools — such as data provenance tracking, audit trails, and licensing metadata — to ensure that openness does not mean loss of control.

Together, these features would enable data to circulate productively while maintaining trust, fairness, and accountability in its reuse. They could help rebalance the relationship between those who generate data and those who use it, and ultimately lead to a more equitable, sustainable, and innovation-friendly digital ecosystem.
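
To make these design principles concrete, here is a minimal sketch of how they might be encoded as machine-readable licensing and provenance metadata, in the spirit of the tools mentioned above. It is an illustration only, assuming a hypothetical schema: the names (AccessTier, BenefitSharingTerms, DatasetRecord, may_train_on) and access rules are invented for the example and do not correspond to any existing standard or API.

    # A hypothetical encoding of the four design principles as dataset metadata.
    # All names and rules are illustrative assumptions, not an existing standard.
    from dataclasses import dataclass, field
    from enum import Enum


    class AccessTier(Enum):
        """Modular access levels (first principle), keyed to sensitivity and provenance."""
        OPEN = "open"                # anyone may reuse; attribution still required
        CONDITIONAL = "conditional"  # reuse requires accepting benefit-sharing terms
        RESTRICTED = "restricted"    # reuse requires sign-off by a governance board


    @dataclass
    class BenefitSharingTerms:
        """What flows back to data stewards on reuse (second principle)."""
        attribution_required: bool = True
        revenue_share_pct: float = 0.0      # monetary return, if any
        model_access_granted: bool = False  # access to the resulting AI applications


    @dataclass
    class DatasetRecord:
        """Provenance and governance metadata that travels with a dataset."""
        dataset_id: str
        steward: str                                      # individual, community, or institution
        sources: list[str] = field(default_factory=list)  # provenance / audit trail (fourth principle)
        tier: AccessTier = AccessTier.OPEN
        terms: BenefitSharingTerms = field(default_factory=BenefitSharingTerms)
        board_approved: bool = False                      # participatory-governance sign-off (third principle)


    def may_train_on(record: DatasetRecord, terms_accepted: bool) -> bool:
        """Illustrative gate deciding whether a dataset may enter an AI training set."""
        if record.tier is AccessTier.OPEN:
            return True
        if record.tier is AccessTier.CONDITIONAL:
            return terms_accepted
        # RESTRICTED: both board approval and accepted terms are needed.
        return record.board_approved and terms_accepted


    # Example: a community archive that permits training only under shared terms.
    archive = DatasetRecord(
        dataset_id="community-archive-001",
        steward="Example Community Trust",
        sources=["oral-history-project", "local-newspaper-scans"],
        tier=AccessTier.CONDITIONAL,
        terms=BenefitSharingTerms(revenue_share_pct=2.0, model_access_granted=True),
    )
    assert may_train_on(archive, terms_accepted=True)
    assert not may_train_on(archive, terms_accepted=False)

The design choice worth noting is that the terms travel with the dataset itself, so that provenance and obligations remain auditable wherever the data is reused.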

Conclusion: A Rethink, Not a Retreat, on Openness

The problem we face today is not one of openness per se, but rather one stemming from an absence of governance around openness. We need to move from open data to trusted data, where access is coupled with accountability, and we need to rethink the existing paradigm of data use, sharing, and reuse. While the old paradigm assumed that more openness automatically led to more progress, the new reality demands greater nuance.

Rethinking openness is not an act of retreat but one of renewal. As the world around it changes, the open movement must likewise evolve from the libertarian ethos of the early Internet toward a civic ethos fit for the age of AI. We must design infrastructures — data commons, benefit-sharing mechanisms, and governance protocols — that preserve the spirit of accessibility while protecting against its weaponisation.

Without such evolution, the very ideals that built the digital commons will continue to be turned against it. A world where openness fuels concentration and transparency becomes a source of asymmetry is not the world the open data movement set out to create. What a cruel irony, and what a tragedy, it would be if the legacy of the open movement were ultimately not to empower the many but to entrench the few. Only by evolving in this way can we reclaim the original spirit of openness: not as a source of extraction and control, but as one of collaboration, democratic participation, and shared progress.

Such a world is avoidable. But avoiding it requires us to build new social contracts around data reuse, integrating ethical, legal, and institutional frameworks that recognise data as both a shared asset and a site of rights and responsibilities, and to rethink the very meaning of openness.

Thanks to Akash Kapur, Andrew Zahuranec, and Hannah Chafetz for reviewing and suggesting improvements to an earlier draft.
