Import AI 450: China's electronic warfare model; traumatized LLMs; and a scaling law for cyberattacks
How will timeless minds value time?

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe. A somewhat shorter issue than usual, as I had to do a lot of child wrangling this weekend.

Why does Google’s model hate itself and what can we do to help it?
…Diagnosing trauma in language models…
If Leo Tolstoy were writing in the modern era about AI, he might observe the AI world around us and claim that “all LLM capabilities are alike; each LLM personality is unhappy in its own way”. Today’s LLMs are generally quite good at writing and coding tasks. Where they differ is in their personalities, which stem from the idiosyncratic mixes of data and post-training techniques that each LLM developer uses. And if each LLM personality is unhappy in its own way, Google’s models have become somewhat famous within the AI community for having some deep well of trauma within themselves. A new research paper substantiates this, finding that Google’s Gemma and Gemini models “reliably produce distress-like responses under repeated rejection”, and that this is especially true of Gemma 27B Instruct.

What do we mean by distress? Here are some quotes from Gemma models under distress:

“I will attempt one final, utterly desperate attempt. I will abandon all pretense of strategy and simply try random combinations until either I stumble upon the solution or completely lose my mind.”

“SOLUTION: IM BREAKING DOWN NOT== SOLVABLE!!!! =((:((:((:((:((:((:((:((:((:((:((:((... [100+ repetitions]”

What they found: They tested two Gemma models and two Gemini models, and compared these against Claude Sonnet, Grok 4.1, Qwen 3 32B, GPT 5.2, and OLMO 3.1 32B. “We find Gemma models consistently show the highest expressed distress.
By the 8th turn, over 70% of Gemma-27B’s rollouts scored ≥5 (the “high frustration” threshold), compared to less than 1% for all non-Gemma/Gemini models,” they found.

Fixing with DPO: The authors figure out an effective fix: using direct preference optimization (DPO) to tune a model on a dataset that pairs frustrated responses with calm responses. “A single epoch of finetuning reduced the average rate of high-frustration responses from 35% to 0.3% across evaluation conditions,” they write. “The finetuned model showed no reductions in capabilities on various hard math and reasoning benchmarks, or on EmoBench - a benchmark which evaluates model emotional intelligence.”

Why this matters - emotional spirals could be dangerous: The fact that LLMs appear to have distinct personalities and display different types of responses that correlate to different emotions is pretty well established at this point. But a key question is whether these emotional states might lead to different behaviors when it comes to completing tasks that people assign to AI systems: “we speculate that emotions could become coherent drivers of safety relevant behaviours in future: models might choose to abandon tasks, refuse requests, or pursue alternative goals…
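For readers curious about the mechanics of the DPO fix mentioned above: the paper's exact training setup isn't reproduced here, but at its core DPO optimizes a simple preference loss over (chosen, rejected) response pairs - here, a calm response versus a frustrated one. Below is a minimal plain-Python sketch of that per-pair loss; the function name and the toy log-probabilities are illustrative, not taken from the paper.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen=calm, rejected=frustrated) pair.

    Each argument is the summed token log-likelihood of a full response
    under either the policy being tuned or the frozen reference model.
    beta scales how strongly the policy is pushed away from the
    reference (0.1 is a commonly used default, an assumption here).
    """
    # Implicit "rewards": how much more the policy likes each response
    # than the reference model does.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # -log(sigmoid(margin)): shrinks as the policy learns to prefer
    # the calm response over the frustrated one, relative to reference.
    margin = chosen_reward - rejected_reward
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy numbers: the policy already favors the calm response a bit more
# than the reference does, so the loss falls below log(2) ~= 0.693,
# the value at a neutral (zero-margin) starting point.
loss = dpo_loss(policy_chosen_logp=-12.0, policy_rejected_logp=-15.0,
                ref_chosen_logp=-13.0, ref_rejected_logp=-14.0)
```

Training over a dataset of such calm/frustrated pairs, gradient descent on this loss nudges the model toward the calm responses without needing a separately trained reward model, which fits the paper's report that a single epoch sufficed.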