May Roundup

May 22, 2025

Hello friends!

Welcome to the May roundup - your monthly dose of papers, events, and other interesting bits I’ve been reading lately. I also share over on LinkedIn and Bluesky, where you can follow too.

This month I’ve been dividing my time between client coaching sessions and launching a new venture (stay tuned to learn more)! Last week, I had the privilege of delivering a keynote at Agile Manchester, where I also learned about topics like staying agile when delivering critical and fast-changing services, and improving team communication.

Need some guidance? I always offer a complimentary 30-minute discovery session. Simply email me or click to book a call, and schedule a time that works for you.

Career Spotlight

I know solid career advice can be hard to come by in the AI world, which is exactly why I started this spotlight series. The idea is to highlight a range of career paths in AI and hopefully spark a bit of inspiration, by sharing how some brilliant people have shaped their journeys.

This month’s spotlight:

Toju Duke, Responsible AI leader and founder

Podcast

This month’s podcast is with Mauro Nicolao, from Soapbox Labs. Listen to hear us talk about designing voice technology specifically for children, and how it can enhance and embed the great work of educators:

AI Updates

How People Are Really Using Gen AI in 2025

How are people actually using AI? It's been a couple of years now since ChatGPT launched, which has given people loads of time to adopt and experiment with it.

This article in HBR took a close look at how people are using genAI tools like ChatGPT, by trawling online forums and articles to find mentions of specific use cases, and categorising them. This gives some interesting insights into what capabilities people are finding valuable:

📎 Personal and professional support was the biggest category in 2025. While there's a lot of research into the appropriateness of LLMs for therapy and coaching, it seems people are trying them anyway

💡 Coding and other professional tasks were a big use case - we've seen tools like Cursor become commonplace for engineers

📈 Creating and editing content is a third big use case, and now there's even a growing backlash against certain words ('delve') or punctuation (em-dashes) as being indicators of AI-generated text

AI Papers I’ve read

Human Trust in AI Search: A Large-Scale Experiment

Find the paper: https://arxiv.org/abs/2504.06435

What makes people trust GenAI, or not?

This study used a range of search queries, and compared generative AI summaries to traditional search. It also looked at how design factors impacted people’s trust in the search results.

Participants generally trusted GenAI-generated summaries less than traditional search results. However, adding references to GenAI summaries significantly increased trust, even when some references were invalid or hallucinated. This suggests that references can create a persuasive appearance of credibility, regardless of their actual validity. In contrast, features like highlighting uncertainty or displaying social feedback did not increase trust. In fact, these sometimes reduced how much participants engaged with the results.

While references can make GenAI search results seem more trustworthy, this trust may not always be deserved. As AI-generated answers become more common, it’s crucial for both designers and users to look beyond surface cues like references, and critically evaluate the information provided.

TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

Find the paper: https://arxiv.org/abs/2412.14161

Ever wondered if AI could handle your job?

That's exactly what researchers explored in this paper, creating a virtual software company where AI agents face real-world work challenges. These digital workers were given access to everything they'd need - computers, task management tools, company chat, documentation, and code repositories - then assigned authentic workplace tasks to complete.

The results? Even the best performing model, Claude-3.5-Sonnet, successfully completed only 24% of assignments. This was notably more than double the success rate of the runner-up at 11.4%. The agents struggled with seemingly basic workplace requirements: applying common sense, demonstrating social awareness, effectively searching for information online, and executing multi-step plans.

This research reveals that despite recent advancements, there's still a significant gap between current AI capabilities and what's needed to autonomously perform most professional work. So while AI continues to make impressive strides, it looks like your job security remains intact!

Interesting AI articles & other links

AI for Science

If you’re interested in how AI is being applied in scientific research, Accelerate Science have released a series of 5 short videos shining a light on how researchers are using AI across different disciplines.

Head over to YouTube to watch:

Samia Mohinta talk about Computer Vision in Neuroscience
Dinithi Sumanaweera talk about unsupervised clustering in cell biology
Sireesha Chamarthi talk about anomaly detection in Astrophysics
Chris Bannon talk about supervised learning and explainable AI in medicine
Felix Steffek talk about natural language processing in law

Thanks for reading!

See you next month,

Catherine.

AI x Insights
