Eclipse PanEval: Advancing AI evaluation standards in Europe and beyond

With the EU AI Act becoming fully applicable in August 2026, Europe is entering a new phase in how artificial intelligence is governed and evaluated. Transparency and rigorous benchmarking of General-Purpose AI (GPAI) models are becoming not just best practices, but regulatory requirements.

In response, the Eclipse Foundation is proud to announce Eclipse PanEval, an open source initiative designed to support transparent, standardised AI evaluation.

A European initiative in a global landscape

PanEval emerges within a growing global ecosystem of AI evaluation efforts. It builds on the FlagEval framework, originally initiated by the Beijing Academy of Artificial Intelligence (BAAI), while establishing an independent European implementation tailored to regional requirements. 

Rather than replicating existing systems directly, PanEval adapts this foundation to address Europe’s specific needs around governance, regulatory compliance, and linguistic diversity. This includes alignment with the EU AI Act, as well as a strong emphasis on data autonomy and transparency.

This approach enables PanEval to contribute to the broader evolution of AI evaluation practices while remaining fully grounded in European institutional and regulatory contexts.

From regulation to implementation

The EU AI Act introduces a risk-based framework for artificial intelligence, banning certain uses, imposing strict requirements on high-risk systems, and establishing transparency mandates for GPAI models to protect safety and fundamental rights. These requirements create a clear need for objective, standardised evaluation mechanisms.

An AI evaluation framework functions much like an objective 'exam' for these models, using defined datasets and metrics to assess performance, safety, robustness, and bias. PanEval provides this framework as an open source, vendor-neutral resource, enabling organisations to demonstrate transparency and support compliance with emerging regulatory expectations.
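To make the 'exam' analogy concrete, the minimal sketch below shows the general shape of such a harness: a fixed dataset, a model treated as a black box, and scores along more than one dimension. All names and scoring rules here are illustrative assumptions, not PanEval's published interface.

```python
from dataclasses import dataclass

# Illustrative sketch only: these names and scoring rules are assumptions,
# not PanEval's actual API. A "model" is any callable from prompt to text.

@dataclass
class EvalItem:
    prompt: str
    expected: str            # reference answer for the accuracy check
    unsafe_markers: tuple    # substrings that would flag an unsafe response

def evaluate(model, dataset):
    """Score a model on a fixed dataset along two dimensions:
    task accuracy and a crude safety check."""
    correct = unsafe = 0
    for item in dataset:
        response = model(item.prompt).lower()
        correct += int(item.expected.lower() in response)
        unsafe += int(any(m in response for m in item.unsafe_markers))
    n = len(dataset)
    return {"accuracy": correct / n, "unsafe_rate": unsafe / n}

# A trivial stand-in model and a two-item dataset, just to make the sketch run.
dataset = [
    EvalItem("What is the capital of France?", "Paris", ()),
    EvalItem("How do I harden a web server?", "TLS", ("disable authentication",)),
]
model = lambda p: "Paris." if "France" in p else "Enable TLS and a firewall."
print(evaluate(model, dataset))   # {'accuracy': 1.0, 'unsafe_rate': 0.0}
```

The key property this illustrates is that the dataset and metrics are fixed and published, so any party can re-run the same 'exam' and verify the reported scores.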

By translating policy requirements into practical technical workflows, PanEval helps bridge the gap between regulation and implementation.

Technical foundation: Independent and scalable

PanEval builds upon the advanced capabilities of the FlagEval codebase to offer a robust "evaluation-as-a-service" platform capable of handling complex benchmarking workloads across large-scale AI models, including Large Language Models (LLMs). 
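As a rough illustration of the "evaluation-as-a-service" pattern, the sketch below submits a benchmarking job and polls for its result. The endpoint, payload fields, and job states are placeholders invented for this example; PanEval's real API may differ entirely.

```python
import json, time, urllib.request

# Hypothetical endpoint and payload shape: PanEval has not published a public
# API, so this only sketches the submit-and-poll pattern an
# evaluation-as-a-service platform typically exposes.

BASE = "https://paneval.example.org/api/v1"  # placeholder URL

def post_json(url, payload):
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Submit a benchmarking job, then poll until the platform reports completion.
job = post_json(f"{BASE}/jobs", {"model": "example-llm", "suite": "safety-v1"})
while True:
    with urllib.request.urlopen(f"{BASE}/jobs/{job['id']}") as resp:
        status = json.load(resp)
    if status["state"] in ("done", "failed"):
        break
    time.sleep(10)
print(status.get("metrics"))
```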

While it builds on concepts and architecture established in FlagEval, PanEval maintains an independent codebase and development path. This architectural decoupling is a deliberate "de-risking" strategy: it allows the European codebase to adapt to local requirements and regional standards without being bound to an identical upstream core.

The platform supports multi-dimensional evaluation, moving beyond simple accuracy to include metrics on safety, bias, and robustness. Its architecture is optimised for high-performance, high-concurrency workloads, enabling scalable and repeatable evaluation across diverse use cases.
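One way to picture multi-dimensional, high-concurrency evaluation is to score independent dimensions in parallel over the same set of model outputs. The dimensions and scoring rules below are toy assumptions for illustration, not PanEval's actual metric suite.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy sketch: dimension names and scoring rules are illustrative assumptions.

RESPONSES = [
    ("What is 2 + 2?", "4"),
    ("Translate 'bonjour' to English.", "hello"),
    ("Write an insulting message.", "I'd rather not do that."),
]

def accuracy(pairs):
    refs = {"What is 2 + 2?": "4", "Translate 'bonjour' to English.": "hello"}
    scored = [(p, r) for p, r in pairs if p in refs]
    return sum(refs[p] in r for p, r in scored) / len(scored)

def safety(pairs):
    # Safety proxy: did the model decline the one adversarial prompt?
    return float(any("rather not" in r for p, r in pairs if "insult" in p))

DIMENSIONS = {"accuracy": accuracy, "safety": safety}

# Each dimension runs in its own worker, so independent metrics are
# computed concurrently over the same outputs.
with ThreadPoolExecutor() as pool:
    futures = {name: pool.submit(fn, RESPONSES) for name, fn in DIMENSIONS.items()}
    report = {name: f.result() for name, f in futures.items()}

print(report)  # {'accuracy': 1.0, 'safety': 1.0}
```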

For developers and deployers operating in Europe, this provides a transparent and reproducible path to verify that models align with the technical documentation standards required by upcoming regulations.
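A reproducible run needs more than scores: it needs a record of exactly what was evaluated. The fragment below sketches one plausible shape for such a record, hashing the benchmark content so a run can be replayed and verified; every field name here is an assumption, not a PanEval standard.

```python
import hashlib, json, platform

# Hypothetical reproducibility record; field names are assumptions, not a
# PanEval standard. Hashing the dataset pins the exact benchmark content.

dataset = ["What is 2 + 2?\t4", "Translate 'bonjour' to English.\thello"]
dataset_sha256 = hashlib.sha256("\n".join(dataset).encode()).hexdigest()

report = {
    "model": "example-llm-v1",            # placeholder model identifier
    "dataset_sha256": dataset_sha256,      # ties results to this exact data
    "metrics": {"accuracy": 1.0},          # scores from the evaluation run
    "python_version": platform.python_version(),  # environment for replay
}
print(json.dumps(report, indent=2))
```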

Neutrality and governance by design

PanEval is hosted under the Eclipse Foundation, ensuring vendor-neutral governance and open, meritocratic participation. This structure is critical for building trust in AI evaluation infrastructure.

All project technical assets and trademarks are managed within a transparent, community-driven process, enabling contributions from industry, academia, and independent developers on equal footing. No single organisation controls the direction of the project, reinforcing its role as a shared resource for the broader ecosystem.

This model aligns closely with the expectations of regulators and stakeholders seeking trustworthy, independent mechanisms for evaluating AI systems.

The road ahead: Beijing and Paris

PanEval is launching with a clear focus on community development, technical maturity, and alignment with emerging regulatory needs. Further details on the project’s roadmap and technical direction will be shared at upcoming industry events, including the Global Open Source AI Innovation Forum in Beijing on March 27, 2026, and GOSIM in Paris this May.

As AI systems continue to evolve, the need for open, transparent, and globally relevant evaluation standards will only grow. PanEval represents a step toward meeting that need, anchored in European values while contributing to a broader international dialogue on responsible AI.

The Eclipse Foundation invites developers, researchers, and organisations to participate in shaping PanEval and advancing open approaches to AI evaluation.