My research focuses on the evaluation of artificial intelligence (AI) systems: helping to determine which AI systems are “better” or “worse” for human outcomes. Evaluation tools can help guide AI development towards helpful and safe systems. As AI solves increasingly complex tasks, evaluating it becomes harder: judging state-of-the-art agents is more challenging than comparing simple image classifiers.

My work within AI evaluation focuses on aspects of AI systems critical for human utility but missed by existing evaluations. Our recently released Feedback Forensics toolkit enables testing AI personality, one such understudied aspect. Relatedly, our Inverse Constitutional AI work enables the interpretation of opaque human preferences commonly used in crowd-sourced evaluation methods, including preferences over personality traits. Previously, I worked on benchmarking more specialised (smaller) AI models: I built benchmark tools for (meta) reinforcement learning (RL) methods in the context of efficient building control systems (see the Bauwerk and Beobench libraries).

See the projects page for a list of research software projects and papers.