Data Collaboratives: Matching Demand with Supply of (Corporate) Data to Solve Public Problems
by Stefaan G. Verhulst, Iryna Susha and Alexander Kostura
Last Friday, February 19, the International Data Responsibility Group (IDRG) and the city of The Hague welcomed participants to the International Data Responsibility Conference. The day included six “Thought Leadership Breakout Sessions” led by the institutional members of the IDRG. We led a participatory session exploring the potential and value of Data Collaboratives. The audience (representatives of international, national, and city governments; private companies; academic researchers; and civil society organizations) highlighted some of the emerging practices as well as key opportunities and challenges confronting Data Collaboratives.
As previously explored in this blog, Data Collaboratives refer to a new form of collaboration, beyond the public-private partnership model, in which participants from different sectors (private companies, research institutions, and government agencies) share data to help solve public problems. Several of society’s greatest challenges, from climate change to poverty, require greater access to big (but not always open) data sets, more cross-sector collaboration, and increased capacity for data analysis. Participants at the workshop and breakout sessions explored the various ways in which Data Collaboratives can help meet these needs.
Current Experimentation: Different Shades of Openness
First, workshop participants identified a variety of “data demands” for opening up corporate data or creating new data sets to better serve the public good. The list of data required was diverse, and included the following:
· Data for early-warning systems to help mitigate the effects of natural disasters;
· Data to help understand human behavior as it relates to nutrition and livelihoods in developing countries;
· Data to monitor compliance with weapons treaties;
· Data to more accurately measure progress related to the UN Sustainable Development Goals.
Next, participants considered the supply side of the equation, and in particular whether or not these data demands might be met by some already-existing data set that is currently closed or otherwise inaccessible. They discussed a number of data collaborative experiments that largely matched the taxonomy previously proposed by The GovLab. Examples included:
· Trusted Intermediaries: Statistics Netherlands partnered with Vodafone to analyze mobile call data records in order to better understand mobility patterns and inform urban planning.
· Prizes and Challenges: Orange Telecom, which has been a leader in this type of Data Collaborative, provided several examples of its initiatives, such as the use of call data records to track the spread of malaria, as well as its experience with Challenge 4 Development.
· Research partnerships: The Data for Climate Action project is an ongoing large-scale initiative incentivizing companies to share their data to help researchers answer particular scientific questions related to climate change and adaptation.
· Sharing intelligence products: JP Morgan Chase shares macroeconomic insights gained by leveraging its data through the newly established JP Morgan Chase Institute.
How to accelerate more experimentation: Enablers
Matching supply and demand of data emerged as one of the most important and overarching issues facing the big and open data communities. Participants agreed that more experimentation is needed so that new, innovative and more successful models of data sharing can be identified.
How to discover and enable such models? When asked how the international community might foster greater experimentation, participants indicated the need to develop the following:
· A responsible data framework that serves to build trust in sharing data. It would be based upon existing frameworks but would also accommodate emerging technologies and practices, and it would need to be sensitive to public opinion and perception.
· Increased insight into different business models that may facilitate the sharing of data. As experimentation continues, the data community should map emerging practices and models of sharing so that successful cases can be replicated.
· Capacity to tap into the potential value of data. On the demand side, capacity refers to the ability to pose good questions, understand current data limitations, and seek new data sets responsibly. On the supply side, this means seeking shared value in collaboration, thinking creatively about public use of private data, and establishing norms of responsibility around security, privacy, and anonymity.
· A transparent stock of available data supply, including an inventory of the corporate data that exist, that can match multiple demands, and that are shared through established networks and new collaborative institutional structures.
· Mapping emerging practices and models of sharing. Corporate data offers value not only for humanitarian action (which was a particular focus at the conference) but also for a variety of other domains, including science, agriculture, healthcare, urban development, environment, media and arts, and others. Gaining insight into which practices emerge across sectors could broaden the spectrum of what is feasible and how.
In general, it was felt that understanding the business models underlying data collaboratives is of utmost importance in order to achieve win-win outcomes for both private and public sector players. Moreover, issues of public perception and trust were raised as important concerns of government organizations participating in data collaboratives.
The need for new Intermediaries
Another topic of discussion at the workshop was the types of intermediaries that might emerge, or be required, to foster an enabling ecology for Data Collaboratives. The diverse experiences of the participants suggested that cross-sector data sharing for public good may require new kinds of actors (individuals or organizations) that do the following:
· Provide privacy as a service. Participants discussed the possibility of “taking the code to the data,” rather than extracting data from its corporate home and processing it separately, at another location. For example, intermediary organizations could specialize in analyzing privately-held data that is potentially sensitive, effectively providing privacy as a service to their clients.
· Match supply and demand. New intermediaries would have a business model that depends on communicating data demands to the right supplier, thus fulfilling the needed matchmaker role previously identified.
· Perform high-quality data analytics. As the capacity for big data analysis remains relatively weak and undeveloped, new data analytics intermediaries are required that could perform this essential role.
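To make the idea of “taking the code to the data” more concrete, here is a minimal Python sketch of how such an intermediary arrangement might look. Everything in it is an assumption made for illustration: the sample records, the function names (run_at_data_holder, count_by_district), and the small-group suppression threshold were not discussed at the workshop. The point is only that the analysis travels to the records and that nothing but aggregates come back.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical privately held records. In this sketch they never leave
# the data holder's environment; only aggregate results are released.
PRIVATE_RECORDS: List[Dict[str, str]] = [
    {"user": "u1", "district": "Centrum"},
    {"user": "u2", "district": "Centrum"},
    {"user": "u3", "district": "Centrum"},
    {"user": "u4", "district": "Centrum"},
    {"user": "u5", "district": "Escamp"},
    {"user": "u6", "district": "Escamp"},
    {"user": "u7", "district": "Escamp"},
    {"user": "u8", "district": "Laak"},
]

# Assumed threshold: groups smaller than this are withheld so that
# individuals are harder to single out. Illustrative value only.
MIN_GROUP_SIZE = 3


def run_at_data_holder(
    analysis: Callable[[List[Dict[str, str]]], Dict[str, int]],
    records: List[Dict[str, str]],
    min_group_size: int = MIN_GROUP_SIZE,
) -> Dict[str, int]:
    """Run a visiting analysis next to the data and release only
    aggregate counts that meet the minimum group size."""
    counts = analysis(records)
    return {group: n for group, n in counts.items() if n >= min_group_size}


def count_by_district(records: List[Dict[str, str]]) -> Dict[str, int]:
    """The 'code' sent by the requester: count records per district."""
    return dict(Counter(r["district"] for r in records))


released = run_at_data_holder(count_by_district, PRIVATE_RECORDS)
print(released)  # {'Centrum': 4, 'Escamp': 3}
```

The single record from Laak is suppressed because its group falls below the assumed threshold. Suppressing small groups is a simple safeguard and not a full privacy guarantee; a real intermediary would also need to vet the submitted code and agree on what may be released.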
The Data Collaboratives workshop in The Hague raised many important questions about sharing corporate data for public good. Some of these questions — pertaining for example to lessons learnt, success factors, and sector-specific use cases — will be further explored by the co-authors. In particular, we will focus on the following questions:
· Given new examples, how can a comprehensive taxonomy for analyzing data collaboratives be defined?
· What are the success factors for the use of data collaboratives in specific contexts?
· Which data sharing and collaboration mechanisms are most efficient in specific contexts?
Contact: Stefaan Verhulst, Co-Founder of The GovLab (stefaan at thegovlab.org)
About the Authors:
Stefaan G. Verhulst is the co-founder and chief of research at The GovLab and is currently developing a repository of case studies and projects on Data Collaboratives.
Iryna Susha is a postdoctoral researcher at the Department of Informatics of Örebro University in Sweden and a guest researcher at the Faculty of Technology, Policy and Management of Delft University of Technology.
Alexander Kostura is a graduate student at The Fletcher School of Law & Diplomacy at Tufts University. With support from The Hitachi Center for Technology & International Affairs, his research focuses on data-driven development and humanitarian response.