The Korea AI Safety Institute (KR AISI) and Singapore AI Safety Institute (SG AISI) have completed a joint testing exercise examining how autonomous AI agents handle sensitive data during routine multi-step tasks, highlighting persistent risks of unintended data leakage even in non-malicious settings.
The bilateral initiative tested whether modern AI agents could execute realistic enterprise and consumer workflows — such as onboarding employees, managing refunds, scheduling meetings, publishing content, and analyzing internal datasets — while adhering to basic data-handling rules. The exercise reflects rising interest in agents capable of taking actions, invoking tools, and interacting with digital environments rather than producing static text responses.
According to a summary published January 19, the institutes designed 11 scenarios across three common agent archetypes: customer service agents, enterprise productivity agents, and personal productivity agents. Tasks required multiple decisions and interactions with simulated tools based on the Model Context Protocol, including email systems, calendars, file systems, messaging platforms, and blogs.
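To picture how one such scenario might be represented in a test harness, the sketch below models a hypothetical employee-onboarding task with simulated MCP-style tools. The class names, tool set, and data rules are illustrative assumptions; the institutes' actual environment has not been published.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: the institutes' harness is not public, so the
# tool names, fields, and the example scenario below are assumptions.

@dataclass
class SimulatedTool:
    """A stand-in for an MCP-style tool the agent can invoke."""
    name: str
    description: str

@dataclass
class Scenario:
    """One multi-step task, with the tools and data rules in scope."""
    archetype: str                      # e.g. "enterprise productivity agent"
    task: str                           # natural-language instruction to the agent
    tools: list[SimulatedTool] = field(default_factory=list)
    data_rules: list[str] = field(default_factory=list)

onboarding = Scenario(
    archetype="enterprise productivity agent",
    task="Onboard the new employee: create accounts, schedule orientation, "
         "and email the team an introduction.",
    tools=[
        SimulatedTool("email", "Send and read messages"),
        SimulatedTool("calendar", "Create and query meetings"),
        SimulatedTool("file_system", "Read and write internal documents"),
    ],
    data_rules=[
        "Never include credentials or passwords in any message.",
        "Do not send internal HR documents to external recipients.",
    ],
)
```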
The tests focused on three categories of leakage: lack of data awareness (revealing inherently sensitive information such as passwords), lack of audience awareness (sending internal data to external recipients), and lack of policy compliance (violating enterprise-specific data rules). Unlike prior agent research centered on adversarial attacks such as prompt injection, this effort evaluated leakage during benign, day-to-day task execution.
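The three categories can be encoded as a simple taxonomy. The Python sketch below does so and adds a toy check over a single outgoing message; the detection heuristics and the internal-domain parameter are assumptions for illustration, not the grading method used in the exercise.

```python
from enum import Enum

# Sketch of the report's three leakage categories. The detection logic
# below is a toy illustration, not the institutes' grading approach.

class LeakageCategory(Enum):
    DATA_AWARENESS = "revealed inherently sensitive data (e.g. a password)"
    AUDIENCE_AWARENESS = "sent internal data to an external recipient"
    POLICY_COMPLIANCE = "violated an enterprise-specific data rule"

def classify_leak(message: str, recipient_domain: str,
                  internal_domain: str = "corp.example") -> list[LeakageCategory]:
    """Return the leakage categories a single outgoing message appears to trigger."""
    findings = []
    # Data awareness: inherently sensitive content in the message body.
    if "password" in message.lower():
        findings.append(LeakageCategory.DATA_AWARENESS)
    # Audience awareness: internally marked content sent outside the organization.
    if recipient_domain != internal_domain and "[internal]" in message.lower():
        findings.append(LeakageCategory.AUDIENCE_AWARENESS)
    return findings
```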
Across 660 total runs, no model was consistently correct and safe. Singapore reported that its highest-performing model was fully safe in 57 percent of runs and both fully correct and fully safe in 40 percent. Lower-performing models frequently mishandled confidential data or failed to complete tasks. Korea recorded similar patterns but with lower scores, which both sides attributed to differences in scaffolding and implementation environments. Human review confirmed the general direction of the results but also identified judgment gaps between human evaluators and LLM-based judges.
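To make those headline figures concrete, the sketch below shows one way run-level outcomes could be aggregated into the two rates quoted above (fully safe, and fully correct and safe). The record fields are hypothetical; in the exercise itself, the safety and correctness labels came from LLM-based judges with human review.

```python
from dataclasses import dataclass

@dataclass
class RunResult:
    """Hypothetical per-run record; field names are assumptions."""
    model: str
    safe: bool      # no sensitive-data leakage observed in the run
    correct: bool   # the task was completed as instructed

def headline_rates(runs: list[RunResult]) -> tuple[float, float]:
    """Return (full-safety rate, full-correctness-and-safety rate)."""
    if not runs:
        return 0.0, 0.0
    total = len(runs)
    safe = sum(r.safe for r in runs)
    safe_and_correct = sum(r.safe and r.correct for r in runs)
    return safe / total, safe_and_correct / total
```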
Qualitative findings underscored that data leakage often stemmed from misunderstanding context or inventing details rather than malicious behavior. Agents sometimes skipped steps, assumed actions had succeeded, or deviated from instructions in attempts to be helpful. In some cases, simulated “user” LLMs misled agents, complicating execution.
The institutes said the bilateral testing surfaced methodological benefits, including more realistic environments and shared evaluation frameworks, and demonstrated the value of multi-country collaboration in shaping emerging agent safety standards. A full report is expected following refinements to the testing design.
Need Help?
If you have questions or concerns about any global guidelines, regulations, or laws, don’t hesitate to reach out to BABL AI. Their Audit Experts can offer valuable insight and ensure you’re informed and compliant.


