Better Data for Better Policy: Accessing New Data Sources for Statistics Through Data Collaboratives

Oct 20, 2017

We live in an increasingly quantified world, one where data is driving key business decisions. Data is claimed to be the new competitive advantage. Yet, paradoxically, even as our reliance on data increases and the call for agile, data-driven policy making becomes more pronounced, many Statistical Offices are confronted with shrinking budgets and an increased demand to adjust their practices to a data age. If Statistical Offices fail to find new ways to deliver “evidence of tomorrow” by leveraging new data sources, public policy may be formed without access to the full range of available and relevant intelligence, access that most business leaders already have. At worst, a thinning evidence base and the lack of a rigorous data foundation could lead to errors and more “fake news,” with possibly harmful public policy implications.

This week I was delighted and honored to provide the opening keynote of the Power from Statistics conference in Brussels, co-organized by Eurostat and the European Political Strategy Centre, two organizations at the forefront of some of the best and most innovative thinking on how to better deliver evidence to policy makers. The conference built upon the insights generated by various roundtable events that culminated in the “Power from Statistics Outlook Report,” which is recommended reading for anyone interested in data-driven policy making.

The four ways data can inform and ultimately transform the full policy cycle

My talk focused on the key ways data can inform and ultimately transform the full policy cycle (see full presentation here). A key premise I examined was the need to access, use and find insight in the vast reams of data and data expertise that exist in private hands. Doing so requires new kinds of public-private partnerships, or “data collaboratives,” to establish more agile and data-driven policy making.

Applied to statistics, such approaches have already shown promise in a number of settings and countries. Eurostat itself has, for instance, experimented together with Statistics Belgium on using call detail records provided by Proximus to document population density. Statistics Netherlands (CBS) recently launched a Center for Big Data Statistics (CBDS) in partnership with companies like Dell-EMC and Microsoft. Other National Statistics Offices (NSOs) are considering using scanner data for monitoring consumer prices (Austria); leveraging smart meter data (Canada); or using telecom data for complementing transportation statistics (Belgium). We are now undeniably living in an era of data, and much of this data is held by private corporations. The key task is thus to find a way of using this data for the greater public good.

Value Proposition — and Challenges

There are several reasons to believe that public policy making and official statistics could indeed benefit from access to privately collected and held data. Among the value propositions:

Using private data can increase the scope and breadth and thus insights offered by available evidence for policymakers;
Using private data can increase the quality and credibility of existing data sets (for instance, by complementing or validating them);
Private data can increase the timeliness and thus relevance of often-outdated information held by statistical agencies (social media streams, for example, can provide real-time insights into public behavior); and
Private data can lower costs and increase other efficiencies (for example, through more sophisticated analytical methods) for statistical organizations.

These are just some of the ways in which private data could complement publicly held data. But our research has also made us acutely aware that challenges exist, such as privacy and security. Another challenge concerns the representativeness of private data. Data held by private companies often represents a particular demographic subset while ignoring others, creating so-called “data invisibles.” Caution needs to be exercised in extrapolating from such data to general observations that carry consequences for the population at large.
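To make the representativeness concern concrete, here is a minimal sketch of how one might compare the make-up of a private data set with the population it is meant to describe. Everything in it is an assumption for illustration: the age groups, the counts, the function names and the 0.5 cut-off are hypothetical and do not come from our research or from any statistical office.

```python
from typing import Dict, List


def representation_ratios(sample_counts: Dict[str, int],
                          population_counts: Dict[str, int]) -> Dict[str, float]:
    """Share of each group in the private data divided by its share in the population.

    A ratio of 1.0 means the group is represented proportionally;
    a ratio near 0 means the group is almost absent from the data.
    """
    sample_total = sum(sample_counts.values())
    population_total = sum(population_counts.values())
    if sample_total == 0 or population_total == 0:
        raise ValueError("both data sets must contain at least one record")

    ratios = {}
    for group, population_count in population_counts.items():
        if population_count == 0:
            continue  # no meaningful ratio for a group absent from the population
        population_share = population_count / population_total
        sample_share = sample_counts.get(group, 0) / sample_total
        ratios[group] = sample_share / population_share
    return ratios


def underrepresented(ratios: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Groups whose ratio falls below the (illustrative) threshold."""
    return sorted(group for group, ratio in ratios.items() if ratio < threshold)


# Hypothetical figures: census-style population counts versus the users
# found in a privately held data set.
population = {"18-34": 300, "35-64": 500, "65+": 200}
private_sample = {"18-34": 600, "35-64": 380, "65+": 20}

ratios = representation_ratios(private_sample, population)
flagged = underrepresented(ratios)
print(ratios)
print("Possible data invisibles:", flagged)
```

In this made-up example the oldest group makes up a fifth of the population but only a small fraction of the private data, so any general conclusion drawn from that data would say little about them.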

Concerns and Challenges of Accessing Private Data

Insufficient access to private data poses an even larger problem. For all its potential benefits, private data is often tightly held and not readily available to third parties — including statistical organizations. Many barriers to access exist. Some are legitimate, for instance concerns over privacy or security. Other barriers stem from companies’ desire to maintain a competitive advantage, outdated concepts of “data as property,” or simply from the general absence of a data-sharing culture.

Overcoming these barriers requires a careful and sophisticated balancing of incentives, risks and rewards. There need to be conversations about data responsibility and, importantly, about the potential incentives for companies to share data. Our research leads us to one undeniable conclusion: concerns and access barriers are a serious issue, and will need to be addressed if the optimistic vision of private data complementing and informing official statistics is to become a reality. At the same time, we need to become smarter about the opportunity costs of not seeking to access new data sources, including, for instance, social media data.

Steps Toward Sustained and Responsible Data Collaboration

Access to New Data Sources by Thilo Klein and Stefaan Verhulst

In a recent paper, complementing various other work we have done on data collaboratives, Thilo Klein and I considered some steps that could help lower barriers to access to private data for statistical agencies. “Access to New Data Sources for Statistics: Business Models and Incentives for the Corporate Sector” was released as part of an occasional OECD — Paris 21 working series on statistics. It includes a number of potential approaches to data sharing, as well as specific recommendations that could help increase sharing between the private sector and statistical agencies. Since then, our work at The GovLab has expanded the initial findings.

Various generic models for sharing exist, and have been put into practice in contexts around the world. In our research, we considered several avenues to enable private-sector data to be accessed by third parties. These include direct API transfers of data; access via trusted third-party intermediaries or research partnerships; the organization of prizes and challenges; the creation of data pools or cooperatives; and the sharing of intelligence gained from the data but not the data itself. Each of these approaches comes with a particular set of advantages and disadvantages. Usually, there are tradeoffs between access and flexibility, on one hand, and security or comfort (for the data sharers), on the other hand. Which model is deployed might depend on the specific context; in some cases, it may also be possible to use a hybrid approach.
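The last of these models, sharing intelligence rather than the data itself, can be sketched in a few lines. This is only an illustration under stated assumptions: the region names, the record counts, the function name and the minimum cell size of 10 are all hypothetical, and suppressing small cells is just one common disclosure-control technique, not a method prescribed in our paper.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def aggregate_with_suppression(record_regions: Iterable[str],
                               min_cell_size: int = 10) -> Dict[str, Optional[int]]:
    """Turn record-level data into per-region counts that are safer to share.

    Counts below min_cell_size are replaced with None so that very small
    groups cannot be singled out in the published table.
    """
    counts = Counter(record_regions)
    return {
        region: (count if count >= min_cell_size else None)
        for region, count in sorted(counts.items())
    }


# Hypothetical record-level data held by a company: one region label per record.
records = ["Brussels"] * 25 + ["Ghent"] * 12 + ["Hamlet"] * 3

published = aggregate_with_suppression(records)
print(published)
```

The company keeps the individual records and hands over only the aggregated table, which illustrates the tradeoff described above: the statistical agency gains less flexibility, while the data holder gains more security and comfort.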

Considering these various models, as well as other research and existing examples, has led us to a set of recommendations for data sharing between private companies and public agencies. These recommendations are not quite universal (context still matters), but they generally apply across models and can serve as guideposts for organizations considering sharing data or setting up data collaboratives. We end with a summary of the key recommendations:

Data Stewards: Companies should develop and empower “data stewards” within their organizations to help define the value of their data to the public good, to enable responsible data sharing arrangements, and to advocate for using the resulting insights to improve people’s lives. This definition presents a different, yet complementary, understanding of data stewardship than the concept of data stewards as actors solely seeking to secure institutional information.
Networks of Data Stewards and Experts: Networks and groups should be created to share experiences on data sharing, establish professional practices to collaborate on data, and in particular, to establish what works and what does not.
Repository of Case Studies and Examples: A repository of detailed case studies where statistical agencies have been able to leverage private-sector data, considering lessons learned and best practices, can inform the creation of new methods for data collaboration.
Responsible Data Decision Tree: A responsible data decision tree, which would enable stakeholders to assess the benefits and risks of exchanging data, can help ensure that sharing data does not harm individual, group and organizational privacy and security.
Common Trusted Sharing Environment: A common trusted data sharing environment can go a significant way toward easing the inherent burden (especially on smaller organizations) of sharing data in a safe and secure way. Such an environment could be established by private companies, or by an established trusted third-party intermediary with an existing track record of safety and reliability.

Ultimately, we need a movement toward better data for better policy, with a focus on how statistical organizations can leverage their immense expertise and credibility in a “post-truth” environment and complement these strengths with new data sources and methods provided by the private sector. Toward that end, please leave your thoughts, concerns and recommendations on this short essay.

Written by Stefaan G. Verhulst