Internal Warnings, Public Stakes: Strengthening AI Whistleblower Protections in the EU
By Jakub Kryś (Spring ‘25 Talos Alumni)
Note: This is a policy brief written during the Talos Fellowship. The views and opinions expressed in this policy paper are those of the author(s) and do not necessarily reflect the official policy or position of Talos Network.
Originally published by AIWI.
Abstract
Whistleblowers may play a crucial role in the governance of advanced artificial intelligence systems. Their importance is particularly acute in the context of risks that are difficult to detect from the outside, are not yet captured by existing regulation, or require significant technical expertise to evaluate. Within the European Union, the EU AI Act and the Whistleblower Protection Directive offer a strong legal foundation, yet we identify important gaps in both the scope and implementation of these protections. In particular, risks arising from internal deployment may not clearly fall under the AI Act’s legal remit. Moreover, internal reporting channels are often missing or inadequate, and external authorities lack the staffing and expertise to reliably handle whistleblower disclosures. Civil society initiatives can help fill these gaps but remain informal and undersupported. We present a set of recommendations to strengthen the whistleblowing ecosystem in the EU, including reinforcing internal and external channels, formally engaging advisory NGOs, and establishing a dedicated mailbox within the EU AI Office.
Executive summary
The rapid pace of AI development has left regulation struggling to keep up. As with any emerging technology, it will be very challenging to preemptively cover all new sources of risk with a static legal framework. Combined with the fact that frontier AI systems are created by private companies rather than governments or academia, this means that there is a pressing need for greater transparency, understanding and accountability. In this work, we argue that AI whistleblowers could play a crucial role in achieving all these objectives. In particular, as insiders placed within such companies, they are in a prime position to warn against risks that are hard to detect from outside, difficult to assess without deep expertise, or are concerning, yet do not violate any existing laws.
Unfortunately, current legal protections in the EU are insufficient to fully leverage the potential of AI whistleblowers. The EU AI Act invokes the Whistleblower Protection Directive, which provides safeguards against retaliation for those reporting breaches of the Act. However, several classes of risks arising from pre-market internal deployment may fall outside the Act’s scope. These legal limitations are compounded by significant implementation gaps of the Directive: internal reporting channels are often missing or untrusted, external channels lack expertise and are chronically understaffed, and not all Member States guarantee anonymity or legal coverage for public disclosures. Moreover, certain whistleblowing support functions, such as third-party advisory organisations, fall outside the Directive’s narrow definition of protected entities.
To address the identified shortcomings, we propose several policy interventions. These include:
Enforce the establishment of trusted and anonymous internal reporting channels. Many insiders currently lack a safe and realistic pathway for escalating concerns within their organisations, making this the first necessary layer of protection.
Strengthen and harmonise external reporting authorities across Member States. Current authorities are often fragmented, understaffed, and lack the technical expertise to handle frontier AI concerns.
Engage and fund third-party advisory organisations to support whistleblowers. Civil society actors already offer informal guidance, but should be further integrated into the ecosystem to help triage reports, reduce false positives, and build trust.
Launch a dedicated whistleblower mailbox within the EU AI Office. Crucially, the Office must publicly commit to offering the same legal and procedural safeguards as those guaranteed under the Whistleblower Protection Directive – including confidentiality, non-retaliation, and timely follow-up. Without this, whistleblowers may lose eligibility for protection under the Directive, particularly if no formal response is received.
Adopt a principle-based regulatory approach in future whistleblower provisions. This would ensure protection even for disclosures involving novel risks or conduct that violates the spirit, but not the letter, of existing law.
Boost public awareness of whistleblower rights. This could include mandatory training for employees, the creation of accessible guides explaining reporting procedures, and the publication of anonymised summary statistics on whistleblower cases in order to build trust in the system.
The proposal explores not only the legislative shortfalls but also the broader landscape of whistleblowing, including second-order benefits (such as improved corporate safety culture), the expected low rate of malicious or false reports, and the valuable role of civil society actors offering confidential guidance. The overall message is simple: AI whistleblowers are positioned to surface risks that no audit, database, or system card can reliably uncover. But to do so, they need confidence that the system will listen and protect them when they speak. The EU has a solid legal foundation, but must act swiftly to operationalise these protections before the AI Act comes fully into force.
Introduction
Artificial Intelligence (AI) is set to become arguably the most transformative technology ever created. While it offers unprecedented opportunities across all aspects of our lives, it also comes with risks that are often grouped into three categories: systemic (e.g. job loss, misinformation), misuse (e.g. a malicious actor using AI for a sophisticated cyberattack) and misalignment (e.g. the AI system acting against its creators’ intentions, either on purpose or by accident). Although many of these risks are not entirely novel – especially the first two categories – the scale and pace at which they can materialise pose a tremendous challenge for governance and policymaking. Indeed, the phenomenon whereby the development of emerging technologies leaves regulation unable to keep up is known as the ‘pacing problem’, and it is expected to be particularly pronounced in the context of AI. As an example, the origins of the EU AI Act (AIA), which represents the most comprehensive set of regulations on General Purpose AI (GPAI) in the world, can be traced back to a white paper from early 2020, whereas the last of the AIA’s provisions is set to come into effect in August 2026. This window spans the gap between GPT-3-level systems and AI agents that can autonomously perform complex multi-hour coding tasks.
Another factor which exacerbates the situation is that, barring nationalisation, Artificial General Intelligence (AGI) might become one of the few truly revolutionary technologies whose development was not conducted, funded or at least heavily shaped by a government agency (unlike nuclear weapons, space technologies, radar, GPS or microprocessors) [1]. As a result, regulators may lack comprehensive insight into AI system development – an input that is crucial to informed policymaking.
Overall, this suggests a pressing need for increasing transparency into the workings of AI companies and their development of frontier AI systems. A multitude of solutions have been considered, for example information sharing schemes, incident reporting schemes, responsible scaling policies that trigger certain actions when models cross pre-defined capability thresholds, system cards that detail specific information about the model [2], and requirements for AI labs to register their systems in a database. However, such solutions typically suffer from at least three problems:
We cannot fully trust that voluntary commitments will be upheld indefinitely. Unfortunately, AI companies have already reneged on their prior declarations or at least failed to fulfil them. This problem can be expected to deepen as the stakes of AI competition rise.
Even regulations that are legally binding are not infallible. Firstly, there are multiple cases of tech companies choosing to violate existing laws and trying to conceal it [3]. Moreover, laws may not cover the full range of risks, especially the ones that are yet to emerge. To future-proof regulation, legislators can frame their laws in terms of broad, high-level principles. However, some effects can be very difficult to quantify or specify legally. For example, accusations of ‘gross mismanagement’ or ‘failing to adhere to best practices’ are challenging to prove in court and may take years.
Even if an all-encompassing and precise set of rules could be established, its violation might not come to light until after disaster strikes. Case studies from various industries show that actions clearly falling under ‘gross mismanagement’ were often evident well in advance, yet continued to be ignored. Due to the stark information asymmetry between AI companies and external observers, it is crucial to retain the ability to gather this information from the inside. We can think of it as a ‘right to warn’ the outside world. This is particularly important if we expect relevant parties to become more secretive in the future or the consequences of a violation to be particularly hard to mitigate. AI will likely satisfy both conditions.
Altogether, it is clear that voluntary commitments, legally binding mandates and post hoc enforcement will not be sufficient to prevent AI-driven risks. In the rest of this work, we will argue that whistleblowers can help mitigate all three of these concerns and explain how we can protect their right to warn while ensuring privacy and fairness for their employers. In particular, we will focus our analysis on existing EU regulation, since the EU AI Act constitutes the first horizontal framework for governing AI.
Role of whistleblowers in AI
Whistleblowers have a long history spanning multiple industries and sectors. There are also good reasons to believe their importance will be higher than ever when it comes to monitoring and regulating the development of advanced AI systems. Firstly, due to the breakneck speed of AI progress, traditional channels of information sharing (such as system cards or external audits) often lag behind deployment. Insiders, by contrast, may be the only ones able to detect and flag issues as they unfold in real time.
Next, one key aspect which distinguishes AI from most other technologies is that advanced AI systems could be extremely dangerous even if not deployed to the general public. For example, a model that gains the ability to copy itself onto other servers and resist shutdown can lead to catastrophic consequences even if used purely in internal experimentation [4]. Indeed, risks from internal deployment have recently attracted growing attention from the AI safety community. Again, whistleblowers placed inside frontier AI labs could be our first line of defence against such novel risks from future AI systems.
Even with perfect insight into a GPAI developer’s operations, legislators may still struggle to decide whether a given activity should be prohibited. This is simply due to the very high level of expertise required to monitor cutting-edge AI development – much of which originates in the private sector rather than academia. To illustrate the difference, take the following two examples. If a developer were to disclose their training dataset and it turned out that they had mismanaged personally identifiable information or copyrighted materials, this would constitute a clear violation of the AIA. As a counterexample, consider the recent rise of reasoning models that take their time to think through a question before returning the final answer. Most such models ‘think’ in tokens, that is, in language understandable to humans. However, models can also think using their own internal representations in a way that we cannot comprehend. While this has been shown to often outperform human-understandable reasoning, it completely sacrifices our ability to interpret the model’s thought process. For this reason, some have advocated that this technique should be banned. Needless to say, it will be difficult for policymakers or external auditors to make such decisions at the cutting edge of AI developments. Whistleblowers, as insiders embedded in this development, can offer not only information, but also the necessary context needed to make these decisions.
Overall, whistleblowers are uniquely positioned to warn about issues that are: (i) difficult to detect from the outside, (ii) concerning in ways not yet captured by existing regulation, or (iii) hard to assess without significant tacit knowledge. This suggests that we should seriously consider what kinds of protections and support structures they need, so we do not miss out on their value for AI governance.
Existing whistleblower protections
Overview
AI companies have been accused of stifling criticism and forcing their employees to sign restrictive non-disparagement agreements [5]. Those who refuse to sign them risk losing their vested equity, which can normally be monetised and may constitute the overwhelming majority of a person’s income (on top of their regular salary). Apart from this financial pressure, the prospect of being sued by powerful corporations with entire legal teams and endless resources is more than enough to deter employees from raising their concerns. Moreover, in an industry where top talent is concentrated in just a few tech hubs such as London and Silicon Valley, many fear that becoming known as a whistleblower could jeopardise future opportunities – even if recent cases suggest this risk may be lower than assumed.
Before focusing on EU regulation, it is worth briefly examining the US context as a point of comparison. After all, most frontier labs are headquartered there and transatlantic enforcement challenges may arise. While legal protections for whistleblowers against prosecution or retaliation do exist in the US, a quick review suffices to conclude that they are likely insufficient in the context of the AI industry [6]. For example, the Whistleblower Protection Act applies to ‘a violation of any law, rule, or regulation’ or ‘a substantial and specific danger to public health or safety.’ As explained above, the former clause is not enough in the context of a pre-paradigmatic regulatory regime such as AI. The latter is more promising as it could be argued that releasing (or even training) a misaligned model constitutes a substantial danger to public safety. However, this Act applies only to federal workers in the executive branch. The US Department of Labor has its own protections, yet the categories it covers are not immediately relevant to the concerns about AI we articulated earlier: issues such as employee safety, overtime pay, lie detector testing or mine hazards. Finally, we note that various AI-related whistleblower protections may be introduced at the state level, but their details are still being worked out.
European Union
EU legislation has arguably the most forward-looking provisions for AI whistleblowers. This is because the AIA itself invokes the European Whistleblower Protection Directive in its Article 87:
“Directive (EU) 2019/1937 shall apply to the reporting of infringements of this Regulation and the protection of persons reporting such infringements.”
Thus, a whistleblower wishing to report a violation of any part of the AIA benefits from the protections afforded by this Directive. This also implies that the usefulness of AI whistleblowing in the EU is limited by the same factors that limit the AIA as a piece of legislation (discussed above). In particular, whistleblowers are not protected in cases of disclosing circumstances that are serious, but do not yet constitute a clear violation of the AIA. This statement in itself is not necessarily alarming. After all, workers should not be permitted to reveal arbitrary details of their companies if they pertain to activities that are legally allowed. Nonetheless, this approach might not be enough in the context of advanced AI systems, where a large number of ‘threat vectors’ has not emerged yet and regulation will likely struggle to keep up [7]. One particular example we highlighted in the previous section is that of risks from internal deployment. Currently, it is not clear to us whether the AIA covers all such scenarios [8]. Article 2(8) of the AIA states (emphasis ours [9]):
“This Regulation does not apply to any research, testing or development activity regarding AI systems or AI models prior to their being placed on the market or put into service. Such activities shall be conducted in accordance with applicable Union law. Testing in real world conditions shall not be covered by that exclusion.”
It is important to point out that the Whistleblower Protection Directive applies not only to past breaches of regulation, but also to potential breaches where there are reasonable grounds for suspicion. This preemptive clause is promising in the context of high-stakes situations that might occur when dealing with advanced AI systems. Article 5(2) of the Directive states that:
“ ‘information on breaches’ means information, including reasonable suspicions, about actual or potential breaches, which occurred or are very likely to occur in the organisation in which the reporting person works or has worked or in another organisation with which the reporting person is or was in contact through his or her work, and about attempts to conceal such breaches”
Article 6(1) further specifies that whistleblowers will be covered under the Directive if (emphasis ours):
“(a) they had reasonable grounds to believe that the information on breaches reported was true at the time of reporting and that such information fell within the scope of this Directive; and
(b) they reported either internally in accordance with Article 7 or externally in accordance with Article 10, or made a public disclosure in accordance with Article 15.”
The fact that both conditions need to be satisfied warrants further scrutiny of part (b).
Internal reporting
Some AI companies have made voluntary commitments to establish internal channels for whistleblowers. During the first AI Safety Summit in Bletchley, Anthropic’s CEO Dario Amodei said that:
“On the operational side, we will put in place a whistleblower policy before we reach ASL-3 and already have an officer responsible for ensuring compliance with the RSP and reporting to our Long Term Benefit Trust. As risk increases, we expect that stronger forms of accountability will be necessary.”
Anthropic’s updated Responsible Scaling Policy from October 2024 includes two relevant clauses:
“Noncompliance: We will maintain a process through which Anthropic staff may anonymously notify the Responsible Scaling Officer of any potential instances of noncompliance with this policy. We will also establish a policy governing noncompliance reporting, which will (1) protect reporters from retaliation and (2) set forth a mechanism for escalating reports to one or more members of the Board of Directors in cases where the report relates to conduct of the Responsible Scaling Officer. Further, we will track and investigate any reported or otherwise identified potential instances of noncompliance with this policy. Where reports are substantiated, we will take appropriate and proportional corrective action and document the same. The Responsible Scaling Officer will regularly update the Board of Directors on substantial cases of noncompliance and overall trends.”
and:
“Employee agreements: We will not impose contractual non-disparagement obligations on employees, candidates, or former employees in a way that could impede or discourage them from publicly raising safety concerns about Anthropic. If we offer agreements with a non-disparagement clause, that clause will not preclude raising safety concerns, nor will it preclude disclosure of the existence of that clause”
The ambiguous language (‘will maintain’, ‘will not impose’) makes it unclear whether Anthropic has already implemented these steps or merely plans to. Nonetheless, we welcome these commitments and hope other AI labs follow suit [10]. Ideally, the full text of companies’ whistleblowing policies should be made public so that (i) AI companies can draw on each others’ practices [11], (ii) their content can be scrutinised by third-party reviewers, and (iii) the risk of backtracking on previous commitments is reduced if they are openly available [12]. Guidelines on establishing internal reporting channels already exist and could serve as a starting point, perhaps with some modifications to adapt them to the AI context [13].
We stress the utmost importance of the anonymity of such reporting channels – lack of anonymity has consistently been listed as one of the biggest factors preventing whistleblowers from raising concerns within their organisations. Unfortunately, while the Whistleblower Protection Directive grants full rights to those reporting anonymously, it leaves it up to individual Member States to decide whether companies and authorities must accept and follow up on anonymous reports. In reality, few countries mandate this within their national laws.
External reporting
In terms of external reporting channels, the Directive mandates that each Member State establish a competent authority to handle the receipt, processing and communication of cases related to whistleblowers’ disclosures [14]. However, in contrast to ‘regulations’ such as the AIA, ‘directives’ require so-called transposition from the EU level to the national level. They usually specify the end goal to be achieved and leave more room for interpretation regarding the intermediate steps. Just as with anonymity requirements, this means that the precise implementation can vary among Member States, with unequal provisions written into national laws and varied adoption timelines. Indeed, a 2024 report on the transposition of the Whistleblower Protection Directive finds that [15]:
“All Member States have transposed the Directive’s main provisions, but the transposition needs to be improved on certain key areas, such as the material scope, the conditions for protection and the measures of protection against retaliation, in particular the exemptions from liability and the penalties. Moreover, the Commission regrets the overall very late transposition of the Directive.”
These comments are alarming in the context of rapid progress in AI and call for a more unified approach among all Member States. Relevant national authorities are often critically understaffed and lack both expertise and independence [16]. This can further decrease trust in action being taken and raise concerns about retaliation, discouraging whistleblowers from speaking up.
Public disclosure
Lastly, in terms of public disclosures, Article 15 of the Directive states that internal and external reporting can be skipped if the whistleblower has reason to fear retaliation, has doubts about the credibility and speed of these two channels, or if ‘the breach may constitute an imminent or manifest danger to the public interest, such as where there is an emergency situation or a risk of irreversible damage’. Considering the lack of official internal reporting channels in AI labs, and the inconsistency in national-level external channels, this is promising, albeit with the caveats presented in the footnote above. Unfortunately, in at least one case, transposition into national law expressly does not cover public disclosures, which constitutes a clear violation of the Directive.
Analysis
Overall, our analysis of existing provisions for whistleblower protections in the EU paints a picture of strong foundations, yet with key shortcomings that will be crucial to address if we wish to make the most of AI whistleblowers. Before suggesting concrete policy recommendations, we first discuss several aspects that are useful to bear in mind.
Rule-based and principle-based regulation
A common distinction made in policymaking is that between rule-based and principle-based regulation. The former type is more detailed and ‘low-level’, leaving less room for interpretation and ensuring that companies have clearly defined boundaries for their operation. However, such rules can potentially be ‘gamed’ (i.e. satisfied according to the letter of the law, while still undermining its intended purpose) [17] and might not be able to keep up in fast-moving areas such as AI. In contrast, using wide-ranging, high-level principles prevents these problems, but can introduce an additional burden in terms of verifying compliance, since both companies and regulators need to analyse carefully which actions adhere to these principles.
On the one hand, rule-based regulation makes whistleblower claims easier to verify. On the other, it cannot address one of the most serious concerns: the need to prevent emerging threats that existing law has yet to capture. If whistleblowers are meant to bring such risks to the surface, they must be allowed to uncover violations of the spirit of the law. As an example, let us take the case of evaluating models for their capability to create cyberattacks. The AIA does in fact have transparency requirements which specify what sort of evaluation information needs to be reported. Annex XI states that this includes:
“A detailed description of the evaluation strategies, including evaluation results, on the basis of available public evaluation protocols and tools or otherwise of other evaluation methodologies. Evaluation strategies shall include evaluation criteria, metrics and the methodology on the identification of limitations.”
Nonetheless, it is well known that evaluations can only provide a lower bound on a model’s capabilities. The exact score depends heavily on the ‘scaffolding’ used, i.e. whether the model has access to tools such as a calculator or Python code, or how much time a reasoning model is allowed to spend thinking. A developer afraid of their model being banned could deliberately perform only a weak elicitation of its capabilities during testing, such that the officially reported score on cyber threats falls below an acceptable threshold. They would then argue that this level of elicitation is typical in the industry. If a whistleblower has reasonable grounds to believe that stronger scaffolding will be readily available to the user [18], and that this scaffolding will elicit cyber capabilities above the threshold, they should have the right to voice the concern that the reported evaluation strategy was insufficient [19]. Thus, it seems that the usefulness of whistleblowers will be maximised if we adopt a principle-based approach to regulation.
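To make the elicitation concern more tangible, the sketch below is a purely illustrative toy example: the benchmark scores, the capability threshold and the scaffolding configurations are all hypothetical and are not drawn from the AIA, the Code of Practice or any real evaluation suite. It simply shows how the same model can land on either side of a threshold depending on how strongly its capabilities are elicited during testing.

```python
# Toy illustration only: all scores, names and the threshold below are invented.
# The point is that the reported verdict depends on the elicitation setup chosen.

CYBER_CAPABILITY_THRESHOLD = 0.50  # hypothetical regulatory threshold

# Benchmark scores a fictional model achieves under increasingly strong elicitation.
eval_results = {
    "plain prompting (weak elicitation)": 0.38,
    "plain prompting + best-of-8 sampling": 0.47,
    "tool use (code execution) + long reasoning budget": 0.61,
}

for setup, score in eval_results.items():
    verdict = "ABOVE threshold" if score >= CYBER_CAPABILITY_THRESHOLD else "below threshold"
    print(f"{setup:<50} score={score:.2f} -> {verdict}")

# A developer reporting only the first configuration could truthfully claim the model
# scores below the threshold, even though scaffolding readily available to users
# (last configuration) elicits capabilities above it -- the gap a whistleblower might flag.
```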
Civil society organisations
It is worth pointing out that several organisations with direct relevance to informing, empowering and advocating for whistleblowers already exist within civil society. For example, AIWI (formerly OAISIS), which functions as part of the wider German Whistleblower Network, is a project dedicated to individuals working at the frontier of AI development. In particular, they offer a free advisory service called Third Opinion, whose goal is to provide clarity on whether a given practice observed by an employee is concerning or not. The process involves submitting a question to their online platform, after which Third Opinion assembles a panel of experts who seek clarifications from the advisee and submit their views on the level of concern regarding the practice. The whole procedure is handled confidentially using tools widely adopted in the whistleblowing community: self-hosting an instance of the GlobalLeaks platform, using privacy-optimised operating systems such as Qubes OS or Tails, and requiring advisees to submit their questions through the Tor browser.
However, it must be remembered that Third Opinion is not in itself a whistleblower service. They do not forward the received information further through any internal, external or public channels. They also do not have the authority (nor the means) to offer financial or legal protections to the users of their service. Again, Third Opinion can only provide clarity on whether a given observation is concerning or not, yet it is still the advisee’s choice whether to escalate and report their concern through appropriate channels. Moreover, a potentially concerning gap in current legislation is that while the Whistleblower Protection Directive covers ‘facilitators’, these are defined in Article 5 as:
“ ‘facilitator’ means a natural person who assists a reporting person in the reporting process in a work-related context, and whose assistance should be confidential;”
Therefore, we are uncertain whether in practice courts would extend the protections of the Directive to third-party support organisations.
Overall, we see very high value in engaging with such civil society initiatives in the context of building robust whistleblowing channels. They could serve as the first step in a two-tier system. Initially, a concerned individual uses services like Third Opinion to confirm whether their observation warrants concern. Then, they may report the claim internally within their organisations, externally to national authorities, or – under certain conditions – make a public disclosure as permitted by the Directive. This first stage can give confidence to potential whistleblowers, inform them of existing protections and give guidance on how to proceed further. As argued in the next section, it would also have the useful side effect of minimising the probability that a whistleblower’s claim turns out to be mistaken. Ultimately, we believe that the EU should either engage with such organisations by fostering their growth in civil society, or replicate their function through a separate body managed at the Union level.
Finally, we point out that another non-profit organisation, The Signals Network, already offers clear and detailed guides on whistleblowing procedures and protections available in the US, UK and the Republic of Ireland, albeit not specifically in the context of AI. Such guides could be updated with the content of the AIA and extended to cover Member States other than Ireland.
Are false claims going to be a problem?
A natural concern regarding extensive whistleblower protection schemes is that they may allow for careless or false disclosures. This would place an unjustified burden on companies and potentially leak their sensitive information to competitors. In practice, however, we argue that this effect is very unlikely to materialise.
Broadly speaking, false claims made by whistleblowers can fall into two categories: malicious and accidental. Experts we spoke to agreed that the risk of malicious false disclosures is low. This is simply because whistleblowers, even if granted legal protections, are still placed in a very disadvantageous and vulnerable position compared to the organisations they blow the whistle on. They risk ostracism from their industry (where top AI talent is very concentrated), lengthy legal battles, daunting media attention and even threats to their personal safety [20]. Thus, in reality, people who become whistleblowers need to have a very high degree of confidence in their claims and truly believe that disclosing them is in the public interest. The incentives against false accusations are extremely strong.
Preventing accidental false claims is somewhat more difficult, but far from impossible. Naturally, not every insider at an AI lab will be an expert on the issue they find concerning. For example, a technical ML researcher without experience in organisational risk management might suspect that a certain practice at their company is worrying and should be disclosed. To help them decide whether what they are witnessing is really a cause for concern, we can use external committees composed of topic experts to provide independent reviews. This process should be arranged in a confidential manner and be free of charge. It seems that civil society services such as Third Opinion could already play this role – while they themselves cannot offer legally binding protections, they could give whistleblowers sufficient confidence that their case warrants escalation to the authorities and would be eligible for these protections. Moreover, accidental false claims can even be perceived as desirable, especially in the early stages of rolling out new regulation. This is because they could help highlight parts of the regulation which are not clear and require further specification, thereby providing useful feedback to the regulator.
Finally, we note that the Whistleblower Protection Directive already includes a clause protecting companies from false accusations and enabling appropriate compensations:
“Member States shall provide for effective, proportionate and dissuasive penalties applicable in respect of reporting persons where it is established that they knowingly reported or publicly disclosed false information. Member States shall also provide for measures for compensating damage resulting from such reporting or public disclosures in accordance with national law.”
Moreover, an associated Recital clarifies that ‘reasonable grounds’ do not include ‘information which is already fully available in the public domain or unsubstantiated rumours and hearsay.’
Second-order effects from whistleblowing
The usefulness of whistleblowers’ revelations extends beyond breaches of a particular law. Indeed, in the context of the AI industry, where a regulatory paradigm has not yet been established (possibly with the exception of the EU), we see a very large part of whistleblowing’s value in so-called second-order effects. Even if such revelations are not intended to warn us of an impending danger to public safety, they can nonetheless exert strong pressure on companies to adopt better safety practices and culture.
As a striking example, after a series of high-profile departures from OpenAI in 2024, it was revealed that employees had been forced to sign very restrictive non-disparagement agreements that prevented them from ever criticising the company or even mentioning the existence of these agreements. Some employees refused to sign them at the cost of losing their equity, which was often worth several times their regular salary. The fact that these documents surfaced and were handed over to a news outlet means that the insiders broke their contracts and could have been retaliated against. After all, restrictive non-disparagement agreements, while controversial, are not illegal [21, 22]. Nonetheless, the resultant public backlash was so large that it prompted a swift apology from OpenAI’s CEO, who stated that the company would remove these clauses from its exit agreements.
This example is a strong indication that whistleblower protections should – under reasonable circumstances – be extended beyond breaches of existing laws. As a result of such second-order effects, companies may feel pressured to improve their safety standards, as failing to do so paints them in a bad light and could negatively affect their user base or potential future contracts.
Another important second-order effect is that robust internal reporting schemes can be of direct benefit to the companies themselves. Fostering a culture where speaking out is the norm, not the exception, can help highlight a variety of issues before they escalate into major crises. Needless to say, such a work environment also improves employee satisfaction and loyalty – when staff trust that their concerns will be taken seriously, they are less likely to bypass internal channels and seek external assistance.
Recommendations
In this section, we present several policy proposals to take this discussion forward.
Enforce the establishment of internal reporting channels within frontier AI labs whose activities may pose serious risk to public safety
This applies first and foremost to providers and deployers of frontier GPAI, though not exclusively. For example, employees of companies providing computer vision models or biological sequence models should also have a straightforward, confidential and transparent pathway to reporting their concerns, as these systems have a high potential to fall under the ‘unacceptable’ or ‘high-risk’ categories of the AIA. The establishment of such channels is already mandatory under the Whistleblower Protection Directive for companies with over 50 employees; however, we lack evidence that these requirements have been implemented effectively or in a timely manner – if at all. The same requirements should also be considered for companies with fewer than 50 employees.
Article 9 of the Directive lays out minimal design standards for internal channels; these should be supplemented by guidelines drawn from the expertise of civil society organisations. In particular, we emphasise the need for anonymity – companies should be mandated to accept and follow up on anonymous reports, something that is currently left to each Member State to decide. Furthermore, it is essential that such bodies operate independently from company leadership, both legally and in practice. We also believe that encouraging companies to openly publish their internal reporting procedures would raise collective safety standards, improve accountability, and increase trust in action being taken. In addition, organisations should be required to demonstrate the effectiveness of their whistleblower education initiatives, for example by regularly testing employee comprehension of their rights and available reporting channels.
Ensure external reporting channels across Member States have adequate resources and technical expertise
In many cases, whistleblowers choose to raise their concerns outside of their organisations, mainly due to a lack of trust in internal mechanisms and fears of retaliation. While the establishment of such external channels is also mandated by the Directive, in practice they are often severely understaffed and lack the expertise necessary to investigate disclosures. This is a clear barrier to whistleblowers coming forward with information. Member States should ensure that sufficient funding is allocated to their whistleblowing channels and specify which authority is responsible for which provisions of the AI Act, particularly when multiple regulatory bodies are involved. We recommend that Member States publish clear, public-facing frameworks that map specific articles of the AI Act to the relevant authority or enforcement body, including instructions on how disclosures can be submitted. Furthermore, these bodies must be equipped with the technical know-how required to assess AI-specific claims. Failure to do so would risk procedural bottlenecks, delayed investigations, or inconsistent treatment of similar cases across jurisdictions.
Leaving too much room for legal interpretation to Member States also creates a risk of a ‘regulatory race to the bottom’ – individual states may be incentivised to limit whistleblower protections in order to avoid driving companies away from their territories. To prevent this effect, the European Commission should issue common guidance on minimum standards for national reporting channels. This should include guarantees for whistleblower anonymity, timelines for responding to disclosures, and expectations around publishing anonymised summary statistics. Critically, implementation must be completed before August 2026, when the AI Act enters into full force and will require effective national enforcement. Without functioning external reporting structures, Member States will not be able to fulfil their obligations under the Act, nor will they be able to act on the most time-sensitive and high-stakes whistleblower reports.
Engage with and fund civil society organisations that serve as the first step in the whistleblowing ladder
Multiple non-profit organisations dedicated to advising potential whistleblowers already exist. These groups have deep expertise in the concerns whistleblowers face, the tactics used to silence or retaliate against them, and the practicalities of secure, confidential communication. They could be formally integrated into the EU’s broader whistleblower protection framework as an initial point of contact, fulfilling three key functions: (i) advising individuals on whether their concerns are warranted, (ii) informing them of existing protections, and (iii) guiding them on how to escalate their disclosures through official channels. Involving third-party organisations in this way could also reduce the rate of ‘accidental false positives’ – claims that were made in good faith but ultimately turned out to be unfounded due to the whistleblower's limited expertise or legal complexities.
Aside from fostering and funding such organisations in civil society, the Union should also consider replicating some of their functionality as part of the AI Office. In either case, we do not recommend automatically escalating reports from support organisations to courts or authorities. Such automatic escalation could discourage whistleblowers from seeking advice in the first place. Moreover, it could increase the risk of false reporting – a malicious party could submit a fake claim anonymously, knowing that it will automatically lead to legal action. If the responsibility for taking action lies with the claimant, there is no incentive to lie to advisory organisations.
Finally, we point out that these organisations do not seem to be currently covered under the Whistleblower Protection Directive, which defines facilitators as ‘a natural person who assists a reporting person in the reporting process in a work-related context’. This definition should be extended to include external support organisations.
Establish a dedicated whistleblower mailbox within the EU AI Office [23]
Once a potential whistleblower gains confidence that what they observed should be reported, they need to have a clear pathway to do so. Unfortunately, internal channels are still missing (or may be untrusted), while national authorities are often underfunded and lack relevant expertise. Therefore, we recommend introducing an anonymous whistleblower mailbox directly as part of the EU AI Office. The European Commission already maintains several such mailboxes, for example in the context of antitrust, anti-fraud and sanction violations. As the official enforcer of the AIA, the AI Office is uniquely placed to establish an AI-specific mailbox, launch investigations and impose any resultant penalties.
In addition to providing advice and clarifications of the AIA, this mailbox could also serve as a first point of contact for informal queries, while offering the option to file a formal complaint that may lead to a follow-up investigation. It might be tempting to have higher standards for the depth of reported information prior to launching an investigation, however we do not recommend this. Preliminary survey results by AIWI show that insiders will likely be extremely uncertain about whether their concerns are valid and fall within the scope of the AI Act. Such uncertainty should not disqualify a report from being taken seriously, but rather be seen as a normal feature of complex, high-stakes situations. We therefore want to reduce barriers to reporting at the AI Office level as much as possible. For this reason, we also strongly advise against requiring whistleblowers to identify themselves.
It is crucial that the mailbox be appropriately staffed, both to handle the required volume of cases and to signal its professionalism to potential whistleblowers. For reference, the antitrust mailbox receives around 100 notices per year despite being relevant to millions of EU employees, whereas the pool of AI-relevant employees is much smaller. This suggests that it might be possible to avoid overwhelming the inbox without imposing reporting standards so strict that they intimidate potential whistleblowers.
In addition to processing individual disclosures, the AI Office should also play a coordinating role across the broader whistleblower ecosystem. In cases where it lacks the jurisdiction or capacity to follow up directly, the Office could serve as a central recipient that forwards reports to the appropriate national authority, while maintaining oversight of the overall process. This would help prevent cases from being lost between institutions or subjected to uneven treatment across Member States. To further support transparency and institutional learning, the AI Office should also publish periodic reports on the types of issues raised through the mailbox, including aggregated statistics and anonymised case summaries. Doing so would not only build trust in the mechanism itself, but also contribute to a clearer understanding of where safety concerns tend to cluster within the AI development pipeline.
As the AI Office will function outside of the national reporting channels covered by the Whistleblower Protection Directive, it is essential that it publicly clarify the standards it will uphold when receiving disclosures. In particular, the Office should commit to offering the same procedural and legal safeguards as those guaranteed under the Directive – including confidentiality, timely follow-up, and protection against retaliation. Without such commitments, whistleblowers who report through the AI Office may risk losing eligibility for public disclosure under Article 15 of the Directive, especially if they receive no formal response. Clarifying this institutional role would help build trust in the mailbox and reduce the uncertainty currently faced by would-be whistleblowers.
Overall, we recommend that this mailbox be operational by early 2026, such that it can begin supporting informal queries and infrastructure coordination ahead of the AI Act’s full enforcement in August 2026. While its formal role under the Whistleblower Protection Directive will only apply from that point onward, early deployment would enable smoother uptake, clearer institutional responsibilities, and a stronger baseline for future evaluation.
Follow the principle-based approach to policymaking when implementing future provisions relevant to whistleblower protections
One of the main takeaways of our work is that the ‘pacing problem’ (regulation unable to keep up with what it is trying to regulate) is likely to be particularly pronounced in the AI industry. New developments are emerging at breakneck speed and bring novel risks that simply cannot be preemptively covered by fine-grained rules. Therefore, we recommend that future whistleblower protections be implemented according to overarching principles that follow the spirit of the law. At the same time, we also highlight the danger of moving towards principles that are too broad and leave too much room for interpretation. Indeed, vague and unenforceable language was one of the key concerns highlighted by a roundtable on the Code of Practice. Striking the right balance between the two approaches will be crucial.
Promote the public awareness and understanding of the available legal protections within the EU
The unfortunate reality of whistleblowing is that the majority of people who wish to come forward with important information will not do so, because of fear, doubts about whether they are correct, and unfamiliarity with available protections. We thus recommend creating an extensive set of materials to raise awareness of the legal frameworks put in place in the EU. This is particularly important for employees of frontier AI labs, both those domiciled within the EU (e.g. Mistral) and those domiciled outside the EU but deploying their products on the EU market (as they are also covered by the AIA). This objective could be achieved, for example, by producing and disseminating easily understandable educational materials or by conducting dedicated training upon onboarding into the company.
We point out that a guide specific to the Republic of Ireland already exists and could be readily expanded to cover the whole EU, as well as adapted to the AIA and the Code of Practice.
Clarify what constitutes ‘dangers to the public interest, such as where there is an emergency situation or a risk of irreversible damage’
The Whistleblower Protection Directive offers this framing as an avenue to skip internal and external reporting channels, yet still be granted full legal protection. We believe this article should be clarified and expanded to include risks from advanced AI systems, such as threats created by misaligned AIs or model weight theft due to insufficient cybersecurity protocols. This could be done either through a modification of the article itself or through a new recital.
We recognise that several of the recommendations presented above would require changes to legal instruments that fall outside the scope of the AIA, most notably the Whistleblower Protection Directive. For example, mandating the acceptance of anonymous reports, lowering the >50 employee threshold for internal channels, and expanding the definition of whistleblowing facilitators would all require formal amendments to the Directive. As this would involve a full legislative process at the EU level, followed by national transposition across 27 Member States, such changes are unlikely to be implemented in the short term. We nonetheless believe it is essential to flag these issues now, such that they can be considered in future revisions of the Directive or – in the meantime – incorporated into Commission guidance documents and non-binding recommendations. In contrast, other proposals, such as the establishment of a dedicated whistleblower mailbox, fall well within the existing mandate of the AI Office and could be actioned during the 2025–2026 implementation period. We recommend that these short-term interventions be prioritised, while laying the groundwork for more systemic legal updates in parallel.
Conclusions
This work has argued that whistleblowers will be indispensable to any serious effort to govern advanced AI systems – especially those developed by private frontier labs with limited external oversight. While current EU frameworks offer a promising legal foundation, they are ultimately shaped by the limitations of the EU AI Act: whistleblowers are protected only when reporting legally defined breaches, rather than when raising concerns about novel or poorly understood risks. As we have shown, several of these risks – including internal deployment of dangerous models, under-elicited evaluations, or latent capabilities – may not yet fall under the AI Act’s regulatory scope, despite posing serious threats to public safety.
The Whistleblower Protection Directive offers some flexibility by extending protection to cases of potential breaches or imminent danger. However, the Directive’s uneven implementation across Member States, limited guarantees for anonymity, and lack of clarity around public disclosures significantly weaken its practical effectiveness. To solidify EU whistleblower protections in the context of AI, we recommend reinforcing both internal and external reporting channels, engaging expert civil society organisations, and establishing an anonymous, well-staffed reporting mailbox within the EU AI Office. As the frontier of AI capabilities continues to evolve rapidly, empowering insiders to speak up – and designing institutions that listen – may be one of our most important governance tools.
Acknowledgements
The author wishes to thank Karl Koch, Elsa Donnat, Mauricio Baker and Michelle Nie for useful discussions and comments, as well as Ethan Beri for early access to unpublished work.
This work was produced as part of the Talos Fellowship. The author also gratefully acknowledges the financial support of Open Philanthropy, which was used during the last stages of the project.
This assumes that neither the US nor the Chinese government decides to organise a ‘Manhattan Project’ for AI, which is in itself a contentious assumption.
System cards, also known as model cards, are technical reports accompanying the release of an AI model. They are published on a voluntary basis and may include aspects such as datasets used, architectures, training infrastructure, evaluation scores, safety precautions and environmental impact. Note that, at the time of writing, there are no legally binding requirements specifying the contents of system cards.
For a striking example, see documents unveiled as part of a lawsuit against Meta’s training on pirated content. These documents reveal internal discussions of legal consequences and possible coverup strategies.
One could argue that nuclear weapons are similar, in that they carry significant risks – such as accidental launch or mishandling – even when they are not used in war or proliferated beyond state control. However, a major difference is that nuclear weapons, being physical hardware rather than software, cannot self-replicate, self-improve or conspire against us. Consequently, containing and controlling them is easier than doing so for a superhuman AI.
Non-disparagement agreements are distinct from standard non-disclosure agreements in that they might prevent employees from ever criticising their employer, even after the end of their employment. They also often include clauses that prevent employees from mentioning the existence of such agreements.
A thorough review of whistleblower protections in various industries and legislations is beyond the scope of this work. We refer the reader to the following resources:
Governments Need to Protect AI Industry Whistleblowers: Here's How
How to design AI whistleblower legislation
Congress Introduces “Urgently Needed” AI Whistleblower Bill
Tech Whistleblowing Guides - The Signals Network
At first sight, a potential extenuating circumstance is given by Article 15(1)(b)(i), which states that a public disclosure of information will qualify for protection under this Directive when there are reasonable grounds to believe that ‘the breach may constitute an imminent or manifest danger to the public interest, such as where there is an emergency situation or a risk of irreversible damage’. However, a ‘breach’ here is defined by Article 5 in the context of existing Union law, which in this context is the AIA itself. Furthermore, it is not clear whether the scope of the Directive as defined through Article 2 is sufficient to cover all emerging threats missed by the AIA. While it does mention ‘product safety and compliance’, as well as ‘protection of privacy and personal data, and security of network and information systems’, it is not explicitly guaranteed that this would include situations such as an internal AI system used for research purposes that is about to cross the threshold of automated replication and self-improvement.
An argument for why ‘internal deployment’ is potentially covered has been given in Section 4.2 and Appendix B of AI Behind Closed Doors: a Primer on The Governance of Internal Deployment. However, even if this argument holds true, we are uncertain whether all types of internal usage would classify as ‘putting the model into service’, which would in turn satisfy the AIA applicability criterion. For example, experimenting on a penultimate checkpoint of a training run might not be covered (since such a checkpoint would not be used to directly accelerate the R&D of other models) and so would fall out of scope of the AIA due to Article 2(8), as explained in the text. Still, such internal experimentation could lead to the same risks as internal deployment.
Yet another example would be an AI lab training a model exclusively for the purpose of testing a new alignment technique, which then goes wrong by accident. Due to Article 2(6), such internal usage might also fall out of scope of the AIA.
The word ‘any’ is of course all-inclusive, but Recital 25 further confirms that this applies not only to purely scientific research, but also to product-oriented research that is meant to be monetised:
“This Regulation should support innovation, should respect freedom of science, and should not undermine research and development activity. It is therefore necessary to exclude from its scope AI systems and models specifically developed and put into service for the sole purpose of scientific research and development. Moreover, it is necessary to ensure that this Regulation does not otherwise affect scientific research and development activity on AI systems or models prior to being placed on the market or put into service. As regards product-oriented research, testing and development activity regarding AI systems or models, the provisions of this Regulation should also not apply prior to those systems and models being put into service or placed on the market.”
We have since learned that OpenAI has a dedicated 24/7 ‘integrity hotline’, as well as a policy document with commitments to anonymity and non-retaliation. Apparently, Anthropic maintains a similar hotline implemented through a third-party whistleblower support organisation, although we were not able to independently verify this claim.
A good analogy here is the publication of the Responsible Scaling Policy itself. After Anthropic’s initial release of their RSP in September 2023, OpenAI followed with the Preparedness Framework and Google DeepMind with the Frontier Safety Framework.
(Beri and Baker, forthcoming) find that lack of trust in action being taken was a strong demotivating factor in 40% of the whistleblowing cases they analysed.
Interestingly, one of the most extensive and transparent internal whistleblowing systems has been implemented by Volkswagen, which is a direct consequence of its infamous Emissions Scandal.
Importantly, whistleblowers should be aware that they do not need to pursue internal reporting first.
A report by Transparency International raises similar concerns, often finding that national laws are weakened or contradict the original content of the Directive.
A good example here is the Wirecard Scandal in Germany, which involved fraud of almost €2 billion. Journalists reported their findings to the Federal Financial Supervisory Authority (BaFin), which then filed a criminal complaint against the journalists for market manipulation. This was likely due to very aggressive tactics by Wirecard, which carried out sting operations on the journalists in order to ‘muddy the waters’. Moreover, investigators found that BaFin employees had increased their trading in Wirecard shares in the months leading up to the firm’s collapse, which potentially constitutes a strong conflict of interest.
To present a hypothetical example, Article 14 of the AIA already mandates including human-in-the-loop components in high-stakes AI systems. A company could design the user interface for these components such that the ‘Accept’ option is pre-selected by default. Note that this does not assume any bad intentions; it could be done inadvertently to reduce cognitive friction and make the system less cumbersome to use. While technically allowing for oversight, this design choice nudges the user toward automatic approval, especially if not paired with adequate training.
For example, through the official API or because it could be easily constructed by other means.
The requirement of appropriate elicitation in model evaluations is included in the Code of Practice. Measure II.4.6 of the Third Draft states:
“Signatories shall ensure that all model evaluations of their GPAISR (whether internal or external) are performed with a state-of-the-art level of model elicitation appropriate and proportionate to the systemic risk assessed to: (1) elicit the upper limit of current and reasonably foreseeable capabilities, propensities, and effects of the model under evaluation; (2) minimise the risk of under-elicitation; (3) minimise the risk of model deception during model evaluation; and (4) match the realistic model elicitation capabilities of potential misuse actors, where misuse actors play a role in the relevant systemic risk scenario (e.g. some potential misuse actors might not be able to fully elicit the model).”
Nonetheless, our example illustrates the sort of difficulties that can be encountered in practice.
(Beri and Baker, forthcoming) find that in 30 case studies they analysed, retaliation occurred ~60% of the time, with ~10% of whistleblowers receiving death threats.
Although a later SEC inquiry raised questions about the legality of the NDAs, the initial policy change was driven by reputational pressure, not legal compulsion.
There are likely significant differences between the US and EU in terms of how NDAs interplay with whistleblower protections. In the US, there is a high level of ‘freedom of contract’, meaning that employees are usually allowed to give up any rights they choose unless the contract would violate some important public policy. While courts may expressly rule in certain cases that whistleblower rights take precedence over NDAs, cases where courts allowed counterclaims for NDA violations to go forward do exist. Moreover, the mere existence of a strict NDA can successfully deter a potential whistleblower from speaking up. See this article from the Institute for Law and AI for more information on the US context.
On the other hand, in the EU, Recital 91 of the Whistleblower Protection Directive states that:
“It should not be possible to rely on individuals' legal or contractual obligations, such as loyalty clauses in contracts or confidentiality or non-disclosure agreements, so as to preclude reporting, to deny protection or to penalise reporting persons for having reported information on breaches or made a public disclosure where providing the information falling within the scope of such clauses and agreements is necessary for revealing the breach.”
Therefore, there is clear guidance that in cases of conflicts between NDAs and whistleblower rights, courts should rule in favour of the latter.
The recommendation to establish such a mailbox has now been included in the ‘Statement from Chairs and Vice-Chairs’ on the final version of the Code of Practice. Similarly to the considerations elucidated here, it calls for the mailbox to allow for anonymous communication and to ensure that the level of afforded protections matches that of the Member States.