A New Social Contract for AI?
Comparing CC Signals and the Social License for Data Reuse
By Stefaan Verhulst
Introduction
Last week, Creative Commons, the global nonprofit best known for its open copyright licenses, released “CC Signals: A New Social Contract for the Age of AI.” This framework seeks to offer creators a means to signal their preferences for how their works are used in machine learning, including the training of artificial intelligence (AI) systems. It marks an important step toward integrating reuse preferences and shared benefits directly into the AI development lifecycle.
Creative Commons has long enabled artists, educators, researchers, and institutions to share their work under flexible licensing terms that promote openness while protecting attribution. With CC Signals, they are extending their mission into the realm of AI, seeking to empower creators and content stewards to express normative preferences, not merely legal terms, for how their content should be reused in AI development. Rather than creating new property rights, their goal is to establish a social protocol for responsible reuse, rooted in reciprocity, credit, and openness. (For a more detailed rationale behind these efforts, see Creative Commons’ post “From Human Content to Machine Data: Introducing CC Signals.”)
In particular, their new framework introduces four “signals”:
- Credit — ensuring attribution in a way that is appropriate for the context and method of reuse;
- Direct Contribution — encouraging monetary or in-kind support to the content steward or community;
- Ecosystem Contribution — supporting the broader knowledge commons that AI systems rely on;
- Open — requiring that the resulting AI systems themselves adhere to principles of openness, based on emerging global definitions.
These signals are designed to be machine- and human-readable, technically interoperable, and tied to standardized categories of machine use (e.g., AI training, inference), building on work from the Internet Engineering Task Force (IETF).
Adherence is voluntary but is incentivized through reputational accountability, reciprocity norms, and the self-interest of AI developers in maintaining a vibrant and sustainable digital commons.
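To make the idea of a machine-readable, combinable signal more concrete, here is a minimal Python sketch. It is purely illustrative and does not reproduce the actual CC Signals syntax or the IETF vocabulary. The `use=signal,signal` text format, the enum identifiers, and the names `Declaration`, `parse`, and `unmet` are all hypothetical. The sketch pairs one category of machine use with a set of the four signals, round-trips between a human-readable text form and a structured form, and reports which requested signals a reuser has not yet honored.

```python
from dataclasses import dataclass
from enum import Enum


class UseCategory(Enum):
    # Hypothetical labels for categories of machine use (the text cites
    # AI training and inference as examples); not the real IETF vocabulary.
    AI_TRAINING = "ai-training"
    INFERENCE = "inference"


class Signal(Enum):
    # The four signals named in the framework, with hypothetical identifiers.
    CREDIT = "credit"
    DIRECT_CONTRIBUTION = "direct-contribution"
    ECOSYSTEM_CONTRIBUTION = "ecosystem-contribution"
    OPEN = "open"


@dataclass(frozen=True)
class Declaration:
    """One declaring party's conditions for one category of machine use."""
    use: UseCategory
    signals: frozenset[Signal]

    def to_text(self) -> str:
        # Human-readable form in a hypothetical format, e.g. "ai-training=credit,open".
        # Sorting makes the output stable regardless of set iteration order.
        names = ",".join(sorted(s.value for s in self.signals))
        return f"{self.use.value}={names}"


def parse(text: str) -> Declaration:
    """Parse the hypothetical "use=signal,signal" form back into a Declaration."""
    use_part, _, signal_part = text.strip().partition("=")
    signals = frozenset(Signal(name) for name in signal_part.split(",") if name)
    return Declaration(UseCategory(use_part), signals)


def unmet(declaration: Declaration, honored: set[Signal]) -> frozenset[Signal]:
    """Return the requested signals that a reuser has not (yet) honored.

    This only reports gaps; it enforces nothing, since adherence is voluntary.
    """
    return declaration.signals - honored


if __name__ == "__main__":
    decl = Declaration(UseCategory.AI_TRAINING, frozenset({Signal.CREDIT, Signal.OPEN}))
    text = decl.to_text()
    print(text)                 # ai-training=credit,open
    print(parse(text) == decl)  # True
    print(unmet(decl, {Signal.CREDIT}))
```

Keeping each signal as a separate set member, rather than a fixed bundle, mirrors the modular, combinable design the framework describes; whether real declarations are structured this way is an assumption.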
From a responsible AI perspective, the CC Signals framework is an important development. It demonstrates how soft governance mechanisms — declarations, usage expressions, and social signaling — can supplement or even fill gaps left by inconsistent global copyright regimes in the context of AI.
At the same time, this initiative provides an interesting point of comparison with our ongoing work to develop a Social License for Data Reuse. A social license for data reuse is a participatory governance framework that allows communities to collectively define, signal and enforce the conditions under which data about them can be reused — including training AI. Unlike traditional consent-based mechanisms, which focus on individual permissions at the point of collection, a social license introduces a community-centered, continuous process of engagement — ensuring that data practices align with shared values, ethical norms, and contextual realities. It provides a complementary layer to legal compliance, emphasizing trust, legitimacy, and accountability in data governance.
While both frameworks enable signaling of expectations and conditions for reuse and offer a bottom-up complement to hard law and regulation, they differ meaningfully in scope, method, and theory of change.
Below, we offer a comparative analysis of the two frameworks — highlighting how each approaches the challenge of embedding legitimacy and trust into AI and data ecosystems.
Comparative analysis
1. Focus of Governance
Creative Commons Signals
- Focused on creators’ intent and enabling them to signal conditions under which their content (often copyrighted) can be reused in AI.
- Emphasis is on creative works, authorship, and licensing within the AI lifecycle.
Social License for Data Reuse
- Focused on communities (not individuals as content creators) who are data contributors rather than originators of creative work.
- Concerned with how data about people is reused, especially non-personal, collective, or community-level data.
- Prioritizes collective governance, especially where traditional consent or copyright doesn’t apply.
2. Mechanism for Agency
Creative Commons Signals
- Agency is expressed through copyright and licensing — creators or content stewards assert conditions (like source citation, cultural attribution, or reuse contributions).
- Operates primarily within existing IP regimes and leverages licensing frameworks.
- Declaring Parties can represent multiple contributors, coordinating reuse preferences across collections.
- Signals include options like Credit, Direct Contribution, Ecosystem Contribution, and Open.
Social License for Data Reuse
- Agency is expressed through participatory engagement, community-defined preferences, and negotiated conditions.
- Not reliant on IP rights; instead it builds on agency and social consent.
- Provides a detailed, structured process for co-creating enforceable terms of data reuse (see Social License Questionnaire and Sample Clauses for Social License-Compliant Data Sharing and Use Agreements).
3. Application Domain
Creative Commons Signals
- Applies to AI model inputs and outputs that rely on creative content (e.g., images, music, writing).
- Addresses harms like deepfakes, cultural misappropriation, or unauthorized AI training.
Social License for Data Reuse
- Applies more broadly to data-driven development, public service delivery, and community-collected or community-implicated datasets (e.g., health, environment, Indigenous data).
- Often concerns non-IP data that nonetheless requires social legitimacy for ethical reuse.
4. Underlying Principles and Values
Creative Commons Signals
- Builds on openness, attribution, and reciprocity principles.
- Seeks to clarify reuse rights, sustain the digital commons, and amplify creator voice in shaping AI.
Social License for Data Reuse
- Builds on digital self-determination, community participation, and agency.
- Emphasizes rectifying agency asymmetries, especially for historically marginalized groups.
- Focuses on ongoing principled negotiation and oversight, not just signaling at a single point in time.
5. Implementation Approach
Creative Commons Signals
- Built for standardization and scale, using machine-readable, interoperable formats aligned with global norms (e.g., IETF categories).
- Applies modular, use-specific signals (e.g., AI training, inference) that can be combined in various ways.
- Designed to be lightweight and low-friction, enabling broad adoption without requiring individualized negotiation.
Social License for Data Reuse
- Involves structured consultation and community participation to co-develop reuse terms.
- Produces tailored agreements based on context, values, and risk — often requiring more time and resourcing.
- Prioritizes governance legitimacy and relationship-building over scalability or automation.
6. Enforcement and Oversight
Creative Commons Signals
- Enforcement relies on norms, declarations, and licensing infrastructure.
- Still experimental in terms of integration into actual AI pipelines.
Social License for Data Reuse
- Envisions multi-layered enforcement, including:
  - contracts with sample clauses;
  - trusted intermediaries (e.g., data trusts or co-ops);
  - compliance mechanisms such as certification and smart contracts.
- Treats social license as an ongoing process rather than a static artifact.
7. Opportunities for Complementarity
- Together, these frameworks address both content creators’ rights (CC Signals) and community-level agency and consent (the Social License for Data Reuse, or SLDR).
- There’s a shared emphasis on participatory governance, non-exploitative reuse, and the need for clear, community-aligned norms in AI development.
- A combined approach could help develop layered governance mechanisms — e.g., a data commons or AI dataset that includes both creator consent signals and community social licenses.
Possible Use Cases
1. Possible Use Cases for Creative Commons “Signals”
When to Use:
- The data or content is covered by intellectual property (IP) rights.
- The creator’s intent is essential (e.g., attribution, non-commercial use).
- The AI model is trained on cultural, artistic, or creative artifacts.
Examples:
AI Trained on Visual Art, Music, or Writing
- Use case: An AI tool that generates artwork trained on online illustrations.
- Why Signals? Creators can indicate reuse preferences (e.g., no use in AI training, attribution required).
Cultural Heritage Data
- Use case: Archives of Indigenous oral histories or traditional songs.
- Why Signals? Helps communities or rights holders assert conditions like cultural attribution or sacred content exclusions.
Academic or Open Educational Resources
- Use case: AI summarizers trained on open-access textbooks.
- Why Signals? Creators can tag content with reuse permissions tailored for AI (e.g., restrict use for profit-generating tools).
2. Possible Use Cases for Social License for Data Reuse
When to Use:
- The data involves groups or communities, not individual creators.
- The data is not IP-governed (e.g., health, climate, behavioral, or mobility data).
- The focus is on participatory governance and contextual reuse.
- The data is part of a commons or shared infrastructure.
Examples:
Community Health Data Commons
- Use case: A data cooperative aggregating health data from rural clinics for AI disease prediction.
- Why SLDR? Ensures community oversight, benefit-sharing, and ethical reuse — not governed by IP but deeply sensitive.
Environmental Monitoring Data
- Use case: Citizen-collected air quality data reused in climate models.
- Why SLDR? The data is collectively generated and impacts communities; requires ongoing community consent and benefit alignment.
Mobility Data in Smart Cities
- Use case: Reuse of aggregated location data to optimize transportation.
- Why SLDR? High surveillance risk and uneven power dynamics make social license essential for trust and legitimacy.
Language Data from Marginalized Communities
- Use case: Building large language models (LLMs) incorporating underrepresented dialects.
- Why SLDR? These communities may not own IP but need governance over cultural and social implications of data reuse.
3. Possible Complementary Use Cases
AI Model Trained on Local Language Audio Files and Community Health Reports
- Signals: Helps local creators tag audio files with licensing restrictions (e.g., no commercial use).
- SLDR: Ensures the community can set collective rules on how health insights are derived, used, and monetized.
Conclusion
This is a dynamic and rapidly evolving space. As both of our initiatives seek to reimagine governance for reuse in the AI era, comparisons like this are intended not to compete but to contribute constructively. Creative Commons is currently inviting public input on its CC Signals framework as part of shaping a new social contract for the age of AI. We highly recommend reviewing and engaging with their effort. They are actively seeking feedback through November, and broader dialogue will be essential to ensure that emerging governance models remain inclusive, credible, and impactful.
Thanks to Adam Zable for reviewing an earlier draft.