Article 10 EU AI Act: Data and Data Governance Explained
- cici BEL
- Apr 23
- 9 min read
If Article 9 is the heart of high-risk AI compliance, Article 10 is the bloodstream. It governs everything that flows into your AI system: the data that trains it, validates it, and tests it. Get data governance wrong, and no amount of risk management will save you. Article 10 establishes requirements for data and data governance that apply to all high-risk AI systems — and it's the article with the closest connection to GDPR, requiring technical teams, data scientists, and compliance officers to work hand in hand.
What Does Article 10 Cover?
Article 10 applies to training, validation, and testing datasets used for high-risk AI systems. If your system learns from data — whether through machine learning, statistical methods, or rule-based approaches refined with data — Article 10 applies. The article establishes requirements across five subsections: Article 10(1) sets the scope for high-risk AI systems using data-driven techniques. Article 10(2) defines data governance and management practices — the process requirements for how you handle data throughout its lifecycle. Article 10(3) specifies data quality attributes — what "good data" means under the AI Act. Article 10(4) requires contextualization — ensuring your data reflects where and how the system will actually be deployed. And Article 10(5) creates a narrow exception for processing special categories of personal data (like ethnicity or health data) when strictly necessary for bias detection and correction.
Key Insight: Article 10 creates an "obligation de moyens" (best-efforts obligation), not an "obligation de résultat" (results obligation). You're not required to have perfect, error-free data — but you must demonstrate serious, documented efforts to achieve quality. This distinction matters enormously for liability.
The 3 Levels of Article 10 Requirements
Article 10 structures data governance into three interconnected levels, each building on the previous to create a comprehensive framework.
Level 1: Data Governance and Management (Article 10(2)) — This is the process layer. It defines how you manage data throughout its lifecycle, from initial design decisions through collection, preparation, and ongoing use. Article 10(2) requires documentation and practices covering eight specific areas. Think of this as the "how" of data governance.
Level 2: Data Quality (Article 10(3)) — This is the attributes layer. It specifies what properties your datasets must have: relevance, representativeness, error-freeness (to the best extent possible), and completeness. These four quality attributes define what "good data" means under the AI Act and apply to training, validation, and testing datasets, either individually or in combination.
Level 3: Contextualization (Article 10(4)) — This is the deployment layer. It requires that your datasets reflect the specific context where the AI system will actually operate, including geographic, behavioral, contextual, and functional characteristics. A system trained on US hospital data and deployed in Germany needs validation data from German healthcare settings. Context matters.
Bonus: Article 42(1) Compliance Presumption — If your datasets appropriately reflect the specific geographic, behavioral, contextual, and functional setting of deployment, they are presumed to comply with Article 10(4). Context-specific data isn't just compliant — it's strategically advantageous.
Article 10(2): The 8 Data Governance Obligations
Article 10(2) establishes eight specific areas that your data governance and management practices must address. This is the operational core of Article 10 compliance.
1. Relevant Design Decisions — Document the choices you make when designing your datasets. Why did you select these data sources? Why these features? What alternatives did you consider and reject?
2. Data Collection Processes and Origin — Document where your data comes from and how it was collected. For personal data, you must also document the original purpose for which the data was collected. This includes the collection mode: was the data crowd-sourced, scraped from public sources, synthetically generated, or collected in real-time? Was it provided voluntarily? Was there an opt-out option?
3. Data Preparation Operations — Document all processing steps: annotation, labeling, cleaning, updating, enrichment, and aggregation. Each transformation can introduce bias — and each must be traceable.
4. Formulation of Assumptions — Explicitly state what you assume the data measures or represents. What are the statistical assumptions underlying your dataset? These assumptions shape everything downstream.
5. Assessment of Availability, Quantity, and Suitability — Evaluate whether your datasets are sufficient for the intended purpose. Do you have enough data? Is it available when needed? Is it suitable for training, validation, and testing?
6. Examination for Bias — This is the core obligation. You must examine your datasets for biases that could harm health and safety, negatively affect fundamental rights, or result in discrimination prohibited under EU law. Pay particular attention to feedback loops — where data outputs influence future inputs, potentially amplifying bias over time.
7. Measures for Bias Detection, Prevention, and Mitigation — Document the specific measures you implement to detect, prevent, and mitigate identified biases. This isn't just about finding bias — it's about acting on what you find.
8. Identification of Data Gaps and Shortcomings — Identify any gaps or deficiencies in your data that could prevent compliance, and document how you plan to address them.
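One practical way to keep these eight areas traceable is to capture them as structured metadata stored alongside each dataset. Below is a minimal sketch in Python; the record type, field names, and example values are illustrative choices, not terms prescribed by the Act.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DatasetGovernanceRecord:
    """Illustrative metadata record covering the eight Article 10(2) areas."""
    dataset_name: str
    design_decisions: List[str]      # 1. why these sources and features were chosen
    collection_origin: str           # 2. provenance and collection mode
    original_purpose: str            # 2. original purpose (required for personal data)
    preparation_steps: List[str]     # 3. annotation, labeling, cleaning, enrichment
    assumptions: List[str]           # 4. what the data is assumed to measure
    suitability_assessment: str      # 5. availability, quantity, suitability
    bias_findings: List[str]         # 6. biases examined and identified
    bias_mitigations: List[str]      # 7. detection, prevention, mitigation measures
    known_gaps: List[str] = field(default_factory=list)  # 8. gaps and remediation plans

# Hypothetical example record for a clinical dataset
record = DatasetGovernanceRecord(
    dataset_name="triage_notes_v3",
    design_decisions=["Excluded free-text fields lacking consent coverage"],
    collection_origin="EHR export 2019-2023, opt-out offered",
    original_purpose="Clinical documentation",
    preparation_steps=["De-duplication", "ICD-10 label normalization"],
    assumptions=["Diagnosis codes approximate true condition prevalence"],
    suitability_assessment="Sufficient volume for training; validation split held out",
    bias_findings=["Under-representation of patients over 80"],
    bias_mitigations=["Stratified re-sampling by age band"],
    known_gaps=["No data from outpatient settings yet"],
)
```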

IEEE 7003 Connection: The IEEE Standard for Algorithmic Bias Considerations (IEEE 7003-2024) provides detailed methodology for obligations 6 and 7. Its Clause 7 on "Data Representation" maps directly to Article 10(2) requirements, offering a 17-point metadata checklist that operationalizes these obligations.
Article 10(3): The 4 Quality Attributes
Article 10(3) specifies four quality attributes that your training, validation, and testing datasets must possess. The phrase "to the best extent possible" signals a best-efforts obligation — you must demonstrate serious attempts, not guarantee perfection.
1. Relevance — Your data must be relevant to the intended purpose of the AI system. Irrelevant data isn't just wasteful — it can introduce noise that degrades performance or masks bias.
2. Representativeness — Your data must be sufficiently representative of the population or context where the system will be used. This is directly connected to the contextualization requirement in Article 10(4). Representativeness isn't just about sample size — it's about whether your data captures the full diversity of real-world scenarios the system will encounter.
3. Error-Freeness — Your data must be as free from errors as possible. This includes factual errors, labeling mistakes, corrupted entries, and systematic measurement errors. The "to the best extent possible" qualifier is crucial — perfect data doesn't exist, but documented quality assurance processes do.
4. Completeness — Your data must be complete in view of the intended purpose. Missing values, truncated records, or gaps in coverage can compromise system performance and introduce bias. Completeness also means having appropriate statistical properties, including adequate representation of the persons or groups on whom the system will be used.
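Relevance ultimately requires expert judgment, but completeness, error-freeness, and representativeness lend themselves to automated spot checks. A minimal sketch, assuming a pandas DataFrame and an illustrative reference distribution; the checks and thresholds are examples, not regulatory requirements.

```python
import pandas as pd

def quality_report(df: pd.DataFrame, group_col: str, reference_shares: dict) -> dict:
    """Rough spot checks for completeness, error-freeness, and representativeness.

    reference_shares: expected share per group in the target population,
    e.g. {"18-40": 0.35, "41-65": 0.40, "66+": 0.25} (illustrative values).
    """
    report = {}

    # Completeness: fraction of missing values per column
    report["missing_ratio"] = df.isna().mean().to_dict()

    # Error-freeness proxy: exact duplicate rows (one of many possible error checks)
    report["duplicate_rows"] = int(df.duplicated().sum())

    # Representativeness: gap between observed group shares and the reference
    observed = df[group_col].value_counts(normalize=True)
    report["group_share_gap"] = {
        group: round(float(observed.get(group, 0.0)) - expected, 3)
        for group, expected in reference_shares.items()
    }
    return report
```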

Important: These requirements can be met at the level of individual datasets or through combinations of datasets. If one dataset lacks certain properties, another can compensate — provided the combination meets all four attributes.
Article 10(4): Contextualization
Article 10(4) requires that datasets account for the specific characteristics of the setting where the AI system will be deployed. This includes four dimensions: Geographic context — where will the system operate? Different regions have different demographics, regulations, and conditions. Behavioral context — how do users in the target environment actually behave? Contextual setting — what is the industry, domain, or application environment? And Functional context — how will the system be integrated into existing processes?
A medical diagnosis AI trained exclusively on US hospital data may perform poorly in Germany — not because the underlying medicine is different, but because patient demographics, disease prevalence rates, treatment protocols, and healthcare system structures differ. Contextualization bridges the gap between training data and deployment reality. It's the difference between a system that works in the lab and one that works in the field.
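One way to make that gap measurable is to test training-data demographics against published statistics for the deployment setting. A minimal sketch using a chi-square goodness-of-fit test; all counts and shares below are placeholder values, not real US or German figures.

```python
from scipy.stats import chisquare

# Observed age-band counts in the training data (illustrative numbers)
training_counts = [5200, 7800, 3000]          # e.g. 18-40, 41-65, 66+

# Expected counts if the data matched the deployment population
# (shares below are placeholders; use published statistics in practice)
deployment_shares = [0.30, 0.42, 0.28]
total = sum(training_counts)
expected_counts = [share * total for share in deployment_shares]

stat, p_value = chisquare(f_obs=training_counts, f_exp=expected_counts)
if p_value < 0.01:
    print(f"Training data deviates from deployment context (chi2={stat:.1f}, p={p_value:.2e})")
```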
Article 10(5): Special Data Categories — The Exception
Article 10(5) creates a narrow exception allowing the processing of special categories of personal data (as defined in GDPR Article 9) — but only when strictly necessary for bias detection and correction. Under normal GDPR rules, processing data about race, ethnicity, health, or other sensitive categories is heavily restricted. Article 10(5) creates an AI Act-specific pathway — but with strict conditions.
The Cascade Principle: Before using actual sensitive data, you must demonstrate that alternatives are insufficient. First, try synthetic data. If synthetic data cannot adequately detect or correct the bias, then consider anonymized data. Only if anonymized data is also insufficient can you use actual sensitive personal data — and only under the six cumulative conditions below.
The 6 Cumulative Conditions: All six must be met simultaneously.
(a) Necessity — Bias detection or correction cannot be achieved with other data, including synthetic or anonymized alternatives.
(b) Technical restrictions — Technical measures must limit further use of the data, including state-of-the-art security measures such as pseudonymization.
(c) Access controls — Data must be secured, protected, and subject to strict access controls with documentation; only authorized persons with confidentiality obligations may access it.
(d) No third-party access — No transfer, disclosure, or other access by third parties.
(e) Deletion obligation — Data must be deleted as soon as the bias is corrected or the retention period expires, whichever comes first.
(f) GDPR Article 30 records — Your records of processing activities must include justification for why special categories were strictly necessary.
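Condition (b) above names pseudonymization as a state-of-the-art security measure. Here is a minimal sketch of keyed pseudonymization applied before a bias analysis, assuming a pandas DataFrame; key management, access controls, and deletion workflows sit outside this snippet.

```python
import hashlib
import hmac
import pandas as pd

def pseudonymize(df: pd.DataFrame, id_col: str, secret_key: bytes) -> pd.DataFrame:
    """Replace direct identifiers with keyed hashes so the bias check can run
    on data that no longer directly reveals who the records belong to."""
    out = df.copy()
    out[id_col] = out[id_col].astype(str).map(
        lambda value: hmac.new(secret_key, value.encode(), hashlib.sha256).hexdigest()
    )
    return out

# In practice: keep only the columns strictly necessary for the bias check
# (data minimization) and delete the working copy once the bias is corrected,
# in line with condition (e).
```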

Recommendation: If you need to invoke Article 10(5), coordinate with your GDPR compliance team. Conduct a joint assessment combining the Data Protection Impact Assessment (DPIA under GDPR Article 35) with the Fundamental Rights Impact Assessment (FRIA under AI Act Article 27). This avoids duplication while ensuring complete coverage.
The GDPR Connection
Article 10 has the closest connection to GDPR of any provision in the AI Act. Understanding this relationship is essential for compliance. Key intersections include: GDPR Article 5(1)(d) — The accuracy principle mirrors Article 10(3)'s error-freeness requirement; data must be accurate and kept up to date. GDPR Article 9 — Special categories of personal data; Article 10(5) adds two new exception grounds specific to AI: bias detection and bias correction. GDPR Article 30 — Records of processing activities must be updated to include Article 10(5) justifications when special categories are used. GDPR Article 35 — Data Protection Impact Assessments should be coordinated with AI Act Article 27 FRIAs.
Recital 69 of the AI Act explicitly confirms GDPR principles apply throughout the AI lifecycle. This includes data minimization, privacy by design, and privacy by default. Privacy-preserving techniques — including technologies that enable training without data transfer (like federated learning) — are explicitly encouraged.
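To illustrate what "training without data transfer" means in practice, here is a toy federated-averaging sketch in NumPy: each site computes an update on its own data, and only the resulting model parameters leave the site. This is a conceptual sketch under simplified assumptions, not a production federated-learning setup.

```python
import numpy as np

def local_update(weights: np.ndarray, X: np.ndarray, y: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """One gradient step of linear regression computed entirely on-site."""
    grad = X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def federated_round(weights: np.ndarray, sites: list) -> np.ndarray:
    """Average the locally updated weights; raw records never leave a site.

    sites: list of (X, y) tuples, one per participating site.
    Unweighted average for simplicity; FedAvg normally weights by site size.
    """
    updates = [local_update(weights, X, y) for X, y in sites]
    return np.mean(updates, axis=0)
```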
Standards and Methodologies
Two key standards provide operational guidance for Article 10 compliance. The ISO/IEC 5259 series addresses data quality for AI and machine learning systems. Part 3 covers data quality management requirements across the lifecycle, while Part 4 provides guidance specific to ML systems (supervised, unsupervised, semi-supervised, and reinforcement learning). Note that these standards predate the AI Act and don't yet fully address the Act's specific risk perspective focused on impacts to individuals — expect harmonized European standards to emerge.
The IEEE 7003-2024 standard provides the most directly operationalizable methodology for Article 10(2) compliance — particularly for bias analysis. Key elements include Stakeholder Identification (Clause 6) for identifying all parties who influence or are impacted by the AI system; Data Representation (Clause 7) for mapping how well your data represents identified stakeholders, with a 17-point metadata checklist that directly operationalizes Article 10(2) requirements; and the Bias Profile concept — a living document that captures bias considerations throughout the AI lifecycle.
IEEE 7003 is the methodological complement to Article 10. Where the AI Act says "what" you must do, IEEE 7003 shows "how" to do it.
Common Pitfalls
Based on regulatory guidance and industry practice, these are the most common Article 10 compliance failures:
Proxy Variables — A feature that appears neutral may actually proxy for a protected attribute. Postal code can proxy for ethnicity or income. University name can proxy for socioeconomic background. Identifying and addressing proxy variables is essential; a quick screening sketch follows this list.
Synthetic Data Doesn't Mean Bias-Free — Synthetic data can inherit bias from the generation algorithm or the real data it was based on. Always analyze synthetic data for bias — don't assume it's clean.
Ignoring Deployment Context — Training on convenient data rather than contextually appropriate data is a common shortcut that leads to Article 10(4) violations and poor real-world performance.
Feedback Loops — When system outputs influence future inputs, bias can amplify over time. Predictive policing systems are the classic example: more surveillance leads to more recorded incidents, which leads to even more surveillance — regardless of actual crime rates.
Documentation Gaps — Article 10(2) requires documentation of design decisions, collection processes, preparation operations, and more. Missing documentation means non-compliance — even if your actual practices are sound.
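Returning to the proxy-variable pitfall: a quick screen is to measure how much information each nominally neutral feature carries about the protected attribute. A minimal sketch using scikit-learn's mutual information estimator; the 0.05 threshold is an illustrative cut-off, not an established standard.

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

def flag_proxy_candidates(df: pd.DataFrame, protected_col: str, threshold: float = 0.05) -> dict:
    """Rank features by how much information they carry about the protected attribute."""
    features = df.drop(columns=[protected_col])
    # Encode categorical columns as integer codes so the estimator can handle them
    X = features.apply(
        lambda col: col.astype("category").cat.codes if col.dtype == object else col
    )
    scores = mutual_info_classif(X, df[protected_col], discrete_features="auto")
    # Features above the threshold deserve a closer look as potential proxies
    return {
        name: round(score, 3)
        for name, score in zip(features.columns, scores)
        if score > threshold
    }
```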
How TrustTroiAI Helps
Article 10 compliance requires systematic data governance across multiple dimensions — from initial design decisions through ongoing monitoring. TrustTroiAI provides structured support: Troi (Scope Check) determines whether your AI system is high-risk and Article 10 applies. Data Governance Templates offer structured workflows covering all 8 obligations from Article 10(2), with built-in documentation. The Bias Analysis Framework operationalizes IEEE 7003 methodology with stakeholder identification, data representation mapping, and bias profiling. Quality Assessment provides systematic evaluation against the 4 quality attributes of Article 10(3). GDPR Coordination offers joint assessment templates that align DPIA (GDPR Art. 35) with FRIA (AI Act Art. 27). And Finn (Context Assistant) provides situation-specific guidance on contextualization requirements.
Your data is only as compliant as your governance. Start your Data Governance Assessment at trusttroiai.com — and build the documentation that proves you did it right.
Key Takeaways
Article 10 establishes data governance requirements at three levels: governance processes (Article 10(2)), quality attributes (Article 10(3)), and contextualization (Article 10(4)), with a narrow exception for sensitive data used in bias detection and correction (Article 10(5)). The 8 obligations in Article 10(2) create a comprehensive checklist for data management practices — from design decisions through bias mitigation. The 4 quality attributes — relevance, representativeness, error-freeness, and completeness — define what compliant data looks like.
Article 10(5)'s exception for special data categories requires exhausting alternatives first (synthetic, then anonymized) and meeting all 6 cumulative conditions. IEEE 7003-2024 provides the operational methodology for implementing Article 10(2) requirements, particularly stakeholder identification and data representation. And documentation is not optional — Article 10 creates a best-efforts obligation, but proving your efforts requires complete, traceable records.
In our next article, we'll show you exactly how to implement these requirements in practice — with step-by-step guidance, real examples, and templates you can use today.
Sources
EU AI Act, Regulation (EU) 2024/1689, Article 10 and Recitals 67-70
IEEE 7003-2024, IEEE Standard for Algorithmic Bias Considerations
JRC132833
Academic Guide (hal-05365570v1)


