Data Reboot: 10 Reasons why we need to change how we approach data in today’s society
Stefaan G. Verhulst and Julia Stamm, The Data Tank
(Appeared first as Data Reboot: 10 Gründe, warum wir unseren Umgang mit Daten ändern müssen at 1E9)
For several decades, society has been aware of the rapidly increasing amount of data. The term “information explosion” was first used in 1941, and discussions about data overload reached a peak in the 1990s. The term “big data” is often considered to have first been used in a 1997 article (though its origins are debated). Yet as we approach a new era, marked by advancements such as ChatGPT and other generative AI, which many believe to be a new paradigm in computing, search, and information management, it’s time to reframe our approach to data and reconsider existing frameworks and governance practices that guide its use.
Below, we consider 10 reasons why we need to reboot the data conversation and change our approach to data governance. They also provide the rationale for “The Data Tank”, a new think-and-do tank that aims to address societal challenges by fostering data sharing across sectors and contributing to building a fair and healthy data ecosystem.
1. Data is not the new oil: The phrase “data is the new oil,” sometimes attributed to Clive Humby in 2006, has become a staple of media and other commentaries. In fact, the analogy is flawed in many ways. As Mathias Risse, from the Carr Center for Human Rights Policy at Harvard, points out, oil is scarce, fungible, and rivalrous (it can be used and owned by only a single entity). Data, by contrast, possesses none of these properties. In particular, as we explain further below, data is shareable (i.e., non-rivalrous); its societal and economic value also greatly increases through sharing. The data-as-oil analogy should thus be discarded, both because it is inaccurate and because it artificially inhibits the potential of data.
2. Not all data is equal: Assessing the value of data can be challenging, leading many organizations to treat (e.g., collect and store) all data equally. The value of data varies widely, however, depending on context, use case, and the underlying properties of the data (the information it contains, its quality, etc.). Establishing metrics or processes to accurately value data is therefore essential. This is particularly true as the amount of data continues to explode, potentially exceeding stakeholders’ ability to store or process all generated data.
3. Weighing risks and benefits of data use: Following a string of high-profile privacy violations in recent years, public and regulatory attention has largely focused on the risks associated with data, and on the steps required to minimize those risks. Such concerns are, of course, valid and important. At the same time, an exclusive focus on preventing harms has artificially limited the potential benefits of data; put another way, it has neglected the risks of not using data. It is time to apply a more balanced approach, one that weighs risks against benefits. By freeing up large amounts of currently siloed and unused data, such a responsible data framework could unleash huge amounts of social innovation and public benefit.
4. Data is key to AI: Amid recent enthusiasm over GPT-3 and ChatGPT, it is easy to forget the key role that data has played (and continues to play) in enabling artificial intelligence. Indeed, it is the rapid expansion of data, powered among other things by our interactions with social media and the proliferation of data-enabled devices on the Internet of Things (IoT), that has led to the powerful machine learning algorithms that underlie advances in AI. Acknowledging the role of data is critical not just to better understand AI advances; it also allows us to better understand, and help address, some of the biases embedded within that data, which could exacerbate wider socio-economic inequities.
5. Collaboration is key: To a significant extent, the modern data ecology has been defined by a zero-sum game — one that pits data holders against data consumers, and large, data-rich companies against civil society and government regulators. In particular, the potential of data has been stunted by an outdated concept of “ownership” that walls off data from those who could most benefit from its application. The truth is that data is most powerful when it is treated as a shared resource, a public good that can benefit all stakeholders. Collaboration is key: it helps match data supply and demand and helps channel the most relevant data to those who can make the most effective use of it. Indeed, the key to unlocking innovation in the modern data economy lies in building a trusted and collaborative ecology that transitions from zero-sum to win-win.
6. Data is relational: Data is often treated as a disembodied entity containing similarly atomized “facts.” Data should instead be seen as relational, or contextual: its meaning and potential value are determined as much by the information it contains as by the broader context within which it is collected and deployed. As Sabina Leonelli has written, “the presentation of data, the way they are identified, selected and included (or excluded) in databases and the information provided to users to re-contextualize them are fundamental to producing knowledge — and significantly influence its content.”
7. From individual consent to a social license: Social license refers to the informal demands or expectations set by society on how data may be used, reused, and shared. The notion, which originates in the field of environmental resource management, recognizes that social license may not overlap perfectly with legal or regulatory license. In some cases, it may exceed formal approvals for how data can be used, and in others, it may be more limited. Either way, public trust is as essential as legal compliance — a thriving data ecology can only exist if data holders and other stakeholders operate within the boundaries of community norms and expectations.
8. From data ownership to data stewardship: Many of the above propositions add up to an implicit recognition that we need to move beyond notions of ownership when it comes to data. As a non-rivalrous resource, data offers massive potential for public benefit and social transformation. That potential varies by context and use case; sharing and collaboration are essential to ensuring that the right data is brought to bear on the most relevant social problems. A notion of stewardship, which recognizes that data is held in public trust and is available to be shared in a responsible manner, is thus more helpful (and socially beneficial) than outdated notions of ownership. A number of tools and mechanisms exist to encourage stewardship and sharing. As we have written elsewhere, data collaboratives are among the most promising.
9. Data asymmetries: Data, it was often proclaimed, would be a harbinger of greater societal prosperity and well-being. The era of big data was to usher in a new tide of innovation and economic growth that would lift all boats. The reality has been somewhat different. The era of big data has instead been characterized by persistent, and in many ways worsening, asymmetries. These manifest in inequalities in access to data itself and, more problematically, in inequalities in the way the social and economic fruits of data are distributed. We thus need to reconceptualize our approach to data, ensuring that its benefits are more equitably spread and that it does not end up exacerbating the widespread and systematic inequalities that characterize our times.
10. Reconceptualizing self-determination: A number of measures have been proposed to address asymmetries in the data economy. These include consent mechanisms, as well as methods such as personal information management systems and new notions of ownership. While each of these might offer some benefits, a more comprehensive approach is needed, one that would reconceptualize self-determination for the digital age. The notion of digital self-determination remains emergent; it builds on existing philosophical concepts of self-determination (developed, for instance, by Immanuel Kant). It is concerned primarily with agency over data; has both an individual and a collective dimension; is specifically targeted at benefiting the already marginalized; and is flexible and context-specific (while also enforceable). If operationalized, these and other properties can help mitigate, and perhaps eventually eliminate, some of the imbalances in the data ecology.