April Roundup: Studio Ghibli, Llama 4 & Maths Olympiad
Hello friends!
Welcome to the April roundup - your monthly dose of papers, events, and other interesting bits I’ve been reading lately. I also share over on LinkedIn and Bluesky.
Work with me! I work with AI leaders, helping you navigate the chaos and lead strategically. Email me or book a call to learn about 1-1 coaching and advisory packages.
Also coming up… Happy to announce that I’ll be speaking at Agile Manchester on 15th May. If you want to join the event, get 10% off tickets with the code ‘10Catherine’.
AI Updates
Llama 4 release
Llama 4 made its debut earlier this month on 5 April. And in a rare twist for AI launches, it dropped on a Saturday. That got people talking. Was it a rushed release to beat announcements from rival labs? Or, as Mark Zuckerberg put it on X, “That’s when it was ready.”
Since then it’s also come under fire for misleading benchmark results - the version performing well on LMArena wasn’t the same as the one publicly released - and for moving more to the right in an effort to ‘present both sides’. Still, like previous generations, Llama 4 models are likely to be popular and widely used.
Studio Ghibli in the spotlight
Studio Ghibli images went viral this month as people discovered you can use ChatGPT to generate images in the studio’s iconic style. While the trend was popular, plenty of people noted that it went against the ideals of the studio’s founder.
Personally, I think it shows how much people love the Studio Ghibli aesthetic, but it also highlights the pressing need for society and tech companies to figure out the issues with using copyrighted data for AI training. In the meantime, if you're after the real deal, don’t miss this: back in 2020, Studio Ghibli released 2,000 film stills for personal use. It's a beautiful archive that’s 100% authentic and totally worth a browse.
Stanford AI Index
What’s the state of AI in 2025?
Stanford just released their yearly AI index, and here are some of my highlights:
🏭 Industry continues to dominate AI R&D, with more than 90% of the last year’s notable model releases coming from industry
🌱 The power needed to train notable AI models is doubling annually, and as a result the carbon emissions from AI training are rising significantly
💰 The inference cost to use the most advanced models is dropping fast, and has gone from dollars to just a few cents per million tokens
📉 The performance gap at the frontier is closing fast - including the gap between open and closed weight models, and between Chinese and US models
🇬🇧 US private investment in AI is 24 times that of the UK - $109bn compared to $4.5bn
✍️ AI topics have grown from 22% of computer science papers to 42% over 10 years
Career Spotlight
I know solid career advice can be hard to come by in the AI world, which is exactly why I started this spotlight series. The idea is to highlight a range of career paths in AI and hopefully spark a bit of inspiration, by sharing how some brilliant people have shaped their journeys.
This month’s spotlights include:
Simon Fothergill, Senior AI Engineer
Mahana Mansfield, Head of AI at Deliveroo
Tom Diethe, Head of the Centre for AI at AstraZeneca
Neal Lathia, CTO at Gradient Labs
Andrew Jeske, Digital Product Manager at Harvard Business School
Know anyone who has an interesting career path in AI? Introduce us, I’d love to feature them!
Podcast
This month’s first podcast episode is with Chris Pedder, Chief Data and AI Officer at Obrizum, full of insight about how to empower Data and AI teams. BTW, Chris has his own Substack here, which you should go check out:
The second podcast this week features Shawn Wen, CTO at PolyAI. It’s a great listen about their journey to Series C, with Shawn sharing some honest, behind-the-scenes reflections on building a company, from navigating big tech shifts to keeping the culture strong as the team scales:
AI Papers I’ve read
LLM-based speaker diarization correction: A generalizable approach
Find the paper: https://arxiv.org/abs/2406.04927
Who said what? That’s the big question behind speaker diarisation. And it’s trickier to crack than you might think. Background noise, people talking over each other… it all adds to the challenge.
Diarisation usually works by clustering speaker “embeddings” from chunks of audio, or using models that handle transcription and diarisation together. But this new paper tried a fresh angle: using a large language model (LLM) to fix diarisation after the fact.
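To make the “clustering embeddings” idea concrete, here’s a minimal sketch (my own illustration, not from the paper): each audio chunk is represented by an embedding vector, and chunks whose embeddings are similar enough get assigned to the same speaker. Real systems use learned speaker embeddings (e.g. x-vectors) and proper agglomerative or spectral clustering; this toy version just greedily compares each chunk to the first embedding seen for each speaker.

```python
import numpy as np

def cluster_speakers(embeddings, threshold=0.75):
    """Greedily assign each chunk's embedding to a speaker cluster.

    A chunk joins the most similar existing cluster if cosine similarity
    exceeds `threshold`; otherwise it starts a new speaker. Returns one
    integer speaker label per chunk.
    """
    centroids = []   # first (normalised) embedding seen for each speaker
    labels = []
    for emb in embeddings:
        emb = np.asarray(emb, dtype=float)
        emb = emb / np.linalg.norm(emb)
        sims = [float(c @ emb) for c in centroids]  # cosine similarity
        if sims and max(sims) >= threshold:
            labels.append(int(np.argmax(sims)))
        else:
            centroids.append(emb)
            labels.append(len(centroids) - 1)
    return labels
```

The `threshold` is the knob that trades off splitting one speaker into two versus merging two speakers into one, which is exactly where real diarisation systems tend to make the mistakes the paper sets out to correct.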
The idea? Prompt the LLM to reassign any words that were misattributed in the transcript and so correct the diarisation. Since every speech recognition tool has its own quirks, the researchers fine-tuned the LLM on examples of common mistakes from a specific transcription system.
Without that fine-tuning, the results actually got worse. But once the model was trained on those patterns? Diarisation performance improved by 20–50%, depending on the setup. Not bad for some smart post-processing.
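The scaffolding around that correction step can be sketched as below. This is my own minimal illustration, not the paper’s code: the prompt wording is invented, the actual LLM call and the paper’s fine-tuning are omitted, and the focus is just on turning a diarised transcript into a prompt and parsing the model’s reply back into speaker-labelled segments.

```python
def build_correction_prompt(segments):
    """Format (speaker, text) segments as labelled lines and ask the model
    to reassign misattributed words without altering the words themselves."""
    transcript = "\n".join(f"{speaker}: {text}" for speaker, text in segments)
    return (
        "The speaker labels in this transcript may be wrong. "
        "Reassign any misattributed words to the correct speaker, keeping "
        "the words themselves unchanged, and reply in the same "
        "'Speaker: text' format.\n\n" + transcript
    )

def parse_corrected(reply):
    """Parse a 'Speaker: text' reply back into (speaker, text) segments."""
    segments = []
    for line in reply.strip().splitlines():
        speaker, _, text = line.partition(":")
        if text:
            segments.append((speaker.strip(), text.strip()))
    return segments
```

In the paper’s setup, the reply would come from an LLM fine-tuned on the transcription system’s characteristic mistakes; the prompt-and-parse plumbing stays the same either way.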
Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad
Find the paper: https://arxiv.org/abs/2503.21934
Are LLMs good at maths? They’ve been scoring impressively on benchmark tests. But here’s the catch: those tests often just check if the final answer is right, or they don’t push models too hard when it comes to evaluating full-blown proofs.
This new paper puts things to the test, using Maths Olympiad questions released just hours before the experiment. So, no chance of the answers sneaking into training data. The set included six problems, each requiring detailed, step-by-step proofs.
Interestingly, all the LLMs claimed they’d solved all six. But when four human experts marked the responses? Most models scored under 5%. Digging into the results, researchers found that the models struggled with logic, made bold assumptions without backing them up, and didn’t show much creativity, often failing to explore different lines of reasoning.
So, while LLMs might look good at maths on paper, give them a real challenge and the cracks start to show. A good reminder to always test them on your problems, not just the benchmarks.
Interesting AI articles & other links
Thanks for reading!
See you next month,
Catherine.


