Import AI 454: Automating alignment research; safety study of a Chinese model; HiFloat4
At what point do the financial markets price in the singularity?

Welcome to Import AI, a newsletter about AI research. Import AI runs on arXiv and feedback from readers. If you’d like to support this, please subscribe.

Huawei’s HiFloat4 training format beats Western-developed MXFP4 in Ascend chip bakeoff:
…Could this also be a symptom of export controls driving Chinese interest towards maximizing training and inference efficiency? Perhaps…

Huawei researchers have tested HiFloat4, a 4-bit precision format for AI training and inference, against MXFP4, an Open Compute Project 4-bit format, and found that HiFloat4 is superior. This is interesting because it reflects a broader interest among Chinese companies in developing their own low-precision data formats explicitly coupled with their own hardware platforms. “Our goal is to enable efficient FP4 LLM pretraining on specialized AI accelerators with strict power constraints. We focus on Huawei Ascend NPUs, which are domain-specific accelerators designed for deep learning workloads,” they write.

What they tested: In this paper, the authors train three model types on Huawei Ascend chips - OpenPangu-1B, Llama3-8B, and Qwen3-MoE-30B. In tests, the bigger the model, the smaller HiFloat4’s loss error relative to a BF16 baseline - and in all cases it does better than MXFP4.

What they found: “We conduct a systematic evaluation of the HiFloat4 (HiF4) format and show that it achieves lower relative loss (≈ 1.0%) compared to MXFP4 (≈ 1.5%) when measured against a full-precision baseline,” they write. “HiF4 consistently achieves significantly lower relative error compared to MXFP4.
For Llama and Qwen, HiF4 attains an error gap of less than 1% with respect to the baseline… HiF4 gets within ~1% of BF16 loss with only RHT as a stabilization trick, while MXFP4 needs RHT + stochastic rounding + truncation-free scaling to get to ~1.5%.”

Why this matters - symptom of hardware maturity, and a possible influence of export controls: HiFloat4 is an even lower precision version of HiFloat8 (#386), and reflects how Huawei (and Chinese chipmakers in general) is continually trying to eke as much efficiency out of its chips as possible. This comes against the broader background of export controls, where China is being starved of frontier compute because it cannot access H100s etc in large volume, making it even more valuable to improve the efficiency of its homegrown chips by carefully developing low-precision formats matched to its own hardware.

Read more: HiFloat4 Format for Language Model Pre-training on Ascend NPUs (arXiv).

***

Anthropic shows how to automate AI safety R&D:
…Very early and tentative signs that it’s possible to automate AI research…

For many people working in AI, the ultimate goal is to automate the art of AI research itself. Now, researchers with the Anthropic Fellows Program and Anthropic have published some early signs that automating AI…
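A note on the 4-bit formats in the HiFloat4 item above: MXFP4, per the OCP Microscaling spec, stores blocks of 32 values as 4-bit E2M1 floats sharing one power-of-two scale. HiFloat4’s exact encoding isn’t given in the paper excerpt quoted here, so this sketch covers only the MXFP4 baseline; the scale-selection rule is one simple illustrative choice, not the paper’s or any hardware’s exact implementation.

```python
# Sketch of MXFP4-style block quantization (OCP Microscaling layout:
# blocks of 32 elements, each a 4-bit E2M1 float, sharing one
# power-of-two scale). Scale choice below is illustrative only.
import math

# All magnitudes representable by E2M1 (1 sign, 2 exponent, 1 mantissa bits).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 32

def quantize_block(block):
    """Return (scale, values) for one block of at most 32 floats."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 1.0, [0.0] * len(block)
    # Power-of-two scale so the largest magnitude maps within E2M1's max (6.0).
    scale = 2.0 ** math.ceil(math.log2(amax / 6.0))
    quantized = []
    for v in block:
        mag = min(abs(v) / scale, 6.0)
        q = min(E2M1_GRID, key=lambda g: abs(g - mag))  # round to nearest
        quantized.append(math.copysign(q, v))
    return scale, quantized

def mxfp4_roundtrip(xs):
    """Quantize then dequantize a list of floats, block by block."""
    out = []
    for i in range(0, len(xs), BLOCK):
        scale, quantized = quantize_block(xs[i:i + BLOCK])
        out.extend(q * scale for q in quantized)
    return out
```

For example, `mxfp4_roundtrip([0.1, -1.7, 3.2, 0.0])` returns `[0.0, -1.5, 3.0, 0.0]` - small values snap to the sparse E2M1 grid, which is exactly the kind of rounding error that tricks like RHT and stochastic rounding exist to tame.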