The rapid development of artificial intelligence (AI) and machine learning has led to powerful algorithms with the potential to improve lives on an unprecedented scale. However, with greater capability comes greater potential for harm to civil liberties. This past year was a bad one for AI: public scandals over biased outcomes, lack of transparency, and misuse of data all contributed to growing public mistrust of the technology (e.g., Whittaker et al. 2018).[i]

In response to these concerns, nearly every research organization (and researcher) working on the ethics of AI has issued reports or guidelines recommending that algorithms be auditable.[ii] A recent example is the EU High-Level Expert Group on Artificial Intelligence, which emphasized this need in its draft “Ethics Guidelines for Trustworthy AI”.[iii] In the U.S., members of Congress have suggested that “any algorithmic decision-making product the government buys must satisfy algorithmic auditability standards.”[iv] While it is clear that having an impartial third party examine AI systems would help build public trust, how these “algorithm audits” should be conducted remains an open question. This is an active field of research, but current proposals are either too high-level to be put into practice without further guidance (Barocas et al. 2013; Sandvig et al. 2014; Mittelstadt et al. 2016) or focus on narrow notions of fairness or transparency that do not consider multiple stakeholders or the broader social context (Selbst et al. 2018).

For example, there is growing research into algorithmic bias, in which unwanted human bias (such as racial or gender bias) is encoded into algorithms through the training data (e.g., Friedman & Nissenbaum 1996; Angwin et al. 2016; Noble 2018). Relatedly, several metrics for algorithmic fairness have been proposed (Zliobaite et al. 2017; Speicher et al. 2018 and references therein), as well as methods to mitigate detected unfair bias (e.g., Kamiran et al. 2012; Zemel et al. 2013), though inherent incompatibilities between different notions of fairness have been noted (Kleinberg et al. 2018). There have also been attempts to move beyond simple fairness metrics and weigh algorithmic impact on society and human well-being (Corbett-Davies et al. 2017; Hu & Chen 2018; Liu et al. 2018; Altman et al. 2018), but direct application of these metrics requires a holistic framework with which to consider competing interests from a variety of stakeholders.
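To make the notion of a fairness metric concrete, here is a minimal sketch of one widely discussed criterion, demographic parity: the gap in positive-prediction rates between two groups. The function name and toy inputs are illustrative only, not drawn from any of the papers cited above, and a real audit would weigh several (often mutually incompatible) criteria.

```python
# Minimal sketch of one common fairness metric: the demographic parity
# difference, i.e. the gap in positive-prediction rates between two groups.
# Names and data are hypothetical placeholders.

def demographic_parity_difference(y_pred, group):
    """Absolute difference in positive-outcome rates between groups 0 and 1."""
    rate = {}
    for g in (0, 1):
        preds = [p for p, gr in zip(y_pred, group) if gr == g]
        rate[g] = sum(preds) / len(preds)
    return abs(rate[0] - rate[1])

# A toy classifier that flags 2/3 of group 0 but only 1/3 of group 1:
gap = demographic_parity_difference([1, 1, 0, 1, 0, 0], [0, 0, 0, 1, 1, 1])
print(round(gap, 3))  # 0.333
```

A score of 0 would indicate equal positive rates across groups; how large a gap is acceptable, and whether this is even the right criterion, depends on the stakeholders and social context, which is precisely what the metrics alone cannot settle.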

BABL has developed such a framework for internal use, but we plan to make it public and will be soliciting feedback from the research community soon. The end goal is to converge on a robust set of tools (e.g., Fig. 1) that would allow nearly any educated auditor, given sufficient training, to apply this framework and assess how “ethical” an algorithm is. Given the proliferation of advanced AI algorithms, and the regulations that will follow, there will be a need for trained auditors who speak a common language.

For inquiries, please contact us at

Figure 1: Adapted from Brown, Davidovic & Hasan (in prep.). Left: Example stakeholder interests, each of which would have a “salience” score. Middle: Each line represents an element in the “relevancy matrix”, which maps the relevance of each interest to each sub-metric. Right: Sample sub-metrics from the Fairness and Effectiveness categories (Transparency and Accountability not shown).
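The structure Figure 1 describes, salience-scored stakeholder interests connected to sub-metrics through a relevancy matrix, could be sketched as follows. Every name, score, and weight below is a hypothetical placeholder for illustration, not a value from Brown, Davidovic & Hasan (in prep.).

```python
# Hypothetical sketch of the Figure 1 structure: stakeholder interests
# carry salience scores, and a sparse relevancy matrix maps each interest
# to the sub-metrics it bears on. All names and numbers are placeholders.

interests = {                       # interest -> salience score
    "non-discrimination": 0.9,
    "accurate decisions": 0.7,
}

relevancy = {                       # (interest, sub-metric) -> relevance
    ("non-discrimination", "fairness:group_parity"): 1.0,
    ("non-discrimination", "effectiveness:accuracy"): 0.2,
    ("accurate decisions", "effectiveness:accuracy"): 1.0,
}

def submetric_weight(sub_metric):
    """Total weight a sub-metric receives: salience times relevance,
    summed over all stakeholder interests that map onto it."""
    return sum(salience * relevancy.get((interest, sub_metric), 0.0)
               for interest, salience in interests.items())

print(round(submetric_weight("effectiveness:accuracy"), 2))  # 0.88
```

One plausible design choice, sketched here, is keeping the matrix sparse: most interests are simply irrelevant to most sub-metrics, and an auditor only records the pairings that matter.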

[i] e.g.,

[ii] We’re using “AI” and “algorithms” interchangeably here, though they don’t have to mean the same thing, because the specific issues of trust we’re discussing apply to both.


[iv] pg 18,