$ timeahead_
★ TOP STORY · [AMLR] · Research · 101d ago

ParaRNN: Large-Scale Nonlinear RNNs, Trainable in Parallel

Recurrent Neural Networks (RNNs) are naturally suited to efficient inference, requiring far less memory and compute than attention-based architectures, but the sequential nature of their computation has historically made it impractical to scale up RNNs to billions of parameters. A new advance from Apple researchers makes RNN training dramatically more efficient, enabling large-scale training for the first time and widening the set of architecture choices available to practitioners designing LLMs, particularly for resource-constrained deployment. In ParaRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models, a new paper accepted to ICLR 2026 as an Oral, Apple researchers share a framework for parallelized RNN training that achieves a 665× speedup over the traditional sequential approach (see Figure 1 in the original post). This efficiency gain enables the training of the first 7-billion-parameter classical RNNs…

Apple Machine Learning Research
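The abstract doesn't spell out the parallelization algorithm, but the bottleneck it targets is the step-by-step dependency h_t = f(h_{t-1}, x_t). For a linear (or linearized) recurrence h_t = a_t·h_{t-1} + b_t, that dependency can be removed with an associative scan that runs in O(log T) parallel passes instead of T sequential steps. The NumPy sketch below illustrates only that generic idea, not Apple's ParaRNN framework; the function name and scalar-state simplification are mine.

    import numpy as np

    def scan_linear_recurrence(a, b, h0):
        # Computes h_t = a_t * h_{t-1} + b_t for t = 1..T without a loop over
        # time: each position holds an affine map h_t = A_t * h0 + B_t, and the
        # maps are composed pairwise in log2(T) passes (Hillis-Steele scan).
        A, B = a.astype(float).copy(), b.astype(float).copy()
        T, step = len(a), 1
        while step < T:
            A_prev = np.concatenate([np.ones(step), A[:-step]])
            B_prev = np.concatenate([np.zeros(step), B[:-step]])
            # compose the map ending at t-step with the map covering (t-step, t]
            A, B = A * A_prev, A * B_prev + B
            step *= 2
        return A * h0 + B

    # sanity check against the naive sequential recurrence
    rng = np.random.default_rng(0)
    a, b, h0 = rng.normal(size=8), rng.normal(size=8), 0.5
    h_seq, h = [], h0
    for t in range(8):
        h = a[t] * h + b[t]
        h_seq.append(h)
    assert np.allclose(scan_linear_recurrence(a, b, h0), h_seq)

Nonlinear RNN cells don't satisfy this affine form directly; the usual trick in related work is to linearize the recurrence iteratively (e.g. Newton-style updates) and solve each linearized system with a scan like the one above, but the specifics behind ParaRNN's 665× speedup are in the paper itself.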
[AMLR] Apple Machine Learning Research · 9 articles
157d ago · Research
International Conference on Learning Representations (ICLR) 2026
Apple is presenting new research at the annual International Conference on Learning Representations (ICLR), which takes place in person in Rio de Janeiro, Brazil, from April 23 to 27. We are proud to again sponsor the conference, which brings together the scientific and industrial research communities focused on deep learning. Below is an overview of Apple’s participation at ICLR 2026. Stop by the Apple booth #204 during exhibition hours, 9:30 AM - 5:30 PM (Thursday, April 23 - Saturday, April 25). All times referenced in the schedule are in BRT (local time). Schedule: Thursday, April 23 - Pretraining with Hierarchical Memories: Separating Long-Tail and Common Knowledge - 10:30 AM - 1:00 PM, Poster Session 1, Pavilion 3, #0309 - Hadi Pour Ansari, C Thomas, David Grangier, Michael Kirchhof, Oncel…
297d ago
MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining
Authors: Bingbing Wen**, Sirajul Salekin, Feiyang Kang†, Lucy Lu Wang‡, Bill Howe‡, Javier Movellan, Manjot Bilkhu. This paper was accepted at the Workshop on Navigating and Addressing Data Problems for Foundation Models (NADPFM) at ICLR 2026. Principled domain reweighting can substantially improve sample efficiency and downstream generalization; however, data-mixture optimization for multimodal pretraining remains underexplored. Current multimodal training recipes tune mixtures from only a single perspective such as data format or task type. We introduce MixAtlas, a principled framework for compute-efficient multimodal mixture optimization via systematic domain decomposition and smaller proxy models. MixAtlas factorizes the training data along two interpretable axes - image concepts and task supervision -…
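As a rough illustration of what proxy-based mixture reweighting looks like in code (a generic sketch, not MixAtlas's actual objective; the domain names and temperature heuristic are invented): measure a small proxy model's per-domain validation loss along the two axes the abstract names, then convert those losses into sampling weights that upweight domains the proxy still finds hard.

    import numpy as np

    # Hypothetical validation losses of a small proxy model, keyed by the two
    # axes mentioned in the abstract: (image concept, task supervision).
    proxy_loss = {
        ("charts",  "captioning"): 2.1,
        ("charts",  "qa"):         2.8,
        ("natural", "captioning"): 1.6,
        ("natural", "qa"):         2.3,
    }

    def mixture_weights(losses, temperature=1.0):
        # Softmax over proxy losses: harder (higher-loss) domains get more of
        # the sampling budget; temperature controls how aggressive that is.
        domains = list(losses)
        scores = np.array([losses[d] for d in domains]) / temperature
        w = np.exp(scores - scores.max())
        w /= w.sum()
        return dict(zip(domains, np.round(w, 3)))

    print(mixture_weights(proxy_loss))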
507d ago · Research
Apple Machine Learning Research at ICLR 2026
Apple is advancing AI and ML with fundamental research, much of which is shared through publications and engagement at conferences in order to accelerate progress in this important field and support the broader community. This week, the Fourteenth International Conference on Learning Representations (ICLR) will be held in Rio de Janeiro, Brazil, and Apple is proud to again participate in this important event for the research community and to support it with sponsorship. At the main conference and associated workshops, Apple researchers will present new research across a variety of topics, including work unlocking large-scale training for Recurrent Neural Networks, a technique for improving State Space Models, a new approach to unifying image understanding and generation, a method for generating 3D scenes from a single photo, and a new approach to protein folding. During exhibition hours, attendees will be able…
892d ago · Research · #benchmark
Can Large Language Models Understand Context?
Authors: Yilun Zhu†**, Joel Ruben Antony Moniz, Shruti Bhargava, Jiarui Lu, Dhivya Piraviperumal, Site Li, Yuan Zhang, Hong Yu, Bo-Hsiang Tseng. Understanding context is key to understanding human language, an ability that Large Language Models (LLMs) have increasingly been seen to demonstrate to an impressive extent. However, although LLM evaluation spans many domains within Natural Language Processing, limited attention has been paid to probing their linguistic capability of understanding contextual features. This paper introduces a context-understanding benchmark by adapting existing datasets to suit the evaluation of generative models. The benchmark comprises four distinct tasks and nine datasets, all featuring prompts designed to…
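The key move the abstract describes, recasting existing labeled datasets as prompts so that generative models can be scored on them, looks roughly like the sketch below; the prompt template, field names, and exact-match metric are my own simplifications, not the benchmark's.

    def to_prompt(example):
        # Rewrite a classification-style example as a free-form prompt.
        return (f"Dialogue: {example['context']}\n"
                "Question: What does the highlighted mention refer to?\n"
                "Answer:")

    def exact_match_accuracy(generate, dataset):
        # `generate` is any callable mapping a prompt string to a model answer.
        hits = sum(generate(to_prompt(ex)).strip().lower() == ex["answer"].lower()
                   for ex in dataset)
        return hits / len(dataset)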
1099d ago
Learning Long-Term Motion Embeddings for Efficient Kinematics Generation
Authors: Nick Stracke†‡, Kolja Bauer†‡, Stefan Andreas Baumann†‡, Miguel Ángel Bautista, Josh Susskind, Björn Ommer†‡. Understanding and predicting motion is a fundamental component of visual intelligence. Although modern video models exhibit strong comprehension of scene dynamics, exploring multiple possible futures through full video synthesis remains prohibitively inefficient. We model scene dynamics orders of magnitude more efficiently by directly operating on a long-term motion embedding that is learned from large-scale trajectories obtained from tracker models. This enables efficient generation of long, realistic motions that fulfill goals specified via text prompts or spatial pokes. To achieve this, we first learn a highly compressed motion embedding with a temporal compression factor of 64×. In…
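For a sense of what a 64× temporal compression factor means mechanically, here is a minimal sketch (my construction, not the paper's architecture): a 1-D convolutional autoencoder over point trajectories in which six stride-2 stages reduce T timesteps to T/64 latent tokens, plus a mirror-image decoder.

    import torch
    import torch.nn as nn

    class MotionAutoencoder(nn.Module):
        def __init__(self, in_dim=2, latent_dim=64):
            super().__init__()
            # six stride-2 stages -> 2**6 = 64x temporal downsampling
            enc, dec, dims = [], [], [in_dim, 32, 64, 128, 128, 128, latent_dim]
            for i in range(6):
                enc.append(nn.Conv1d(dims[i], dims[i + 1], 4, stride=2, padding=1))
                enc.append(nn.GELU())
            for i in reversed(range(6)):
                dec.append(nn.ConvTranspose1d(dims[i + 1], dims[i], 4, stride=2, padding=1))
                dec.append(nn.GELU() if i > 0 else nn.Identity())
            self.encoder, self.decoder = nn.Sequential(*enc), nn.Sequential(*dec)

        def forward(self, traj):            # traj: (batch, in_dim, T), T divisible by 64
            z = self.encoder(traj)          # (batch, latent_dim, T // 64)
            return self.decoder(z), z

    x = torch.randn(8, 2, 256)              # 256-step (x, y) point tracks
    recon, z = MotionAutoencoder()(x)
    print(z.shape, recon.shape)             # torch.Size([8, 64, 4]) torch.Size([8, 2, 256])

The paper presumably learns far richer embeddings from tracker-model trajectories at scale; the point here is only that a stack of stride-2 layers is one straightforward way to realize the stated 64× compression.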
1169d ago · Research · #training
Cram Less to Fit More: Training Data Pruning Improves Memorization of Facts
Authors: Jiayuan Ye, Vitaly Feldman, Kunal Talwar. This paper was accepted at the Workshop on Navigating and Addressing Data Problems for Foundation Models at ICLR 2026. Large language models (LLMs) can struggle to memorize factual knowledge in their parameters, often leading to hallucinations and poor performance on knowledge-intensive tasks. In this paper, we formalize fact memorization from an information-theoretic perspective and study how training data distributions affect fact accuracy. We show that fact accuracy is suboptimal (below the capacity limit) whenever the amount of information contained in the training data facts exceeds model capacity. This is further exacerbated when the fact frequency distribution is skewed (e.g. a power law). We propose…
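A toy calculation makes the capacity argument concrete (my simplification, not the paper's information-theoretic formalism): if fact queries follow a Zipf-like power law and the model can only retain a fixed number of facts in its parameters, then even in the best case its accuracy is capped at the query mass of the facts it keeps, which saturates well below 100% once the tail is long.

    import numpy as np

    n_facts = 1_000_000
    # Zipf-like query frequency: a short head dominates, a long tail remains.
    freq = 1.0 / np.arange(1, n_facts + 1) ** 1.1
    freq /= freq.sum()

    for capacity in (10_000, 100_000, 500_000):
        # Upper bound: the model memorizes exactly the `capacity` most-queried facts.
        best_accuracy = freq[:capacity].sum()
        print(f"capacity={capacity:>9,d} facts -> at most {best_accuracy:.1%} of queries answered")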
1468d ago · Research
ACM Human-Computer Interaction Conference (CHI) 2026
Apple is presenting new research at the annual ACM (Association for Computing Machinery) CHI Conference on Human Factors in Computing Systems, which takes place in person in Barcelona, Spain, from April 13 to 17. We are proud to again sponsor the conference, which brings together the scientific and industrial research communities focused on human-computer interaction. Below is the schedule of Apple-sponsored presentations, demos, and events at CHI 2026. Stop by the Apple booth during exhibition hours at the CHI 2026 venue in Barcelona, Spain. All times listed are in CEST (local time): - Monday, April 13: 10:30 - 16:30; CHI Reception 18:00 - 20:00 - Tuesday, April 14: 10:00 - 18:00 - Wednesday, April 15: 10:00 - 18:00 - Thursday,…
1620d ago · #local
Efficient Privacy Loss Accounting for Subsampling and Random Allocation
Authors: Vitaly Feldman, Moshe Shenfeld†. We consider the privacy amplification properties of a sampling scheme in which a user’s data is used in k steps chosen randomly and uniformly from a sequence (or set) of t steps. This sampling scheme has been recently applied in the context of differentially private optimization (Chua et al., 2024a; Choquette-Choo et al., 2025) and communication-efficient high-dimensional private aggregation (Asi et al., 2025), where it was shown to have utility advantages over the standard Poisson sampling. Theoretical analyses of this sampling scheme (Feldman & Shenfeld, 2025; Dong et al., 2025) lead to bounds that are close to those of Poisson sampling, yet still have two significant shortcomings. First, in many practical settings, the resulting…
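The sampling scheme the abstract analyzes is simple to state in code. Below is a sketch of the two schemes being compared, random allocation (each record participates in exactly k of t steps) versus the standard Poisson baseline (each step includes the record independently with probability k/t); the function names are mine, and this only illustrates the sampling, not the privacy accounting.

    import numpy as np

    rng = np.random.default_rng(0)

    def random_allocation(t, k):
        # The record participates in exactly k steps, chosen uniformly
        # without replacement from the t available steps.
        participates = np.zeros(t, dtype=bool)
        participates[rng.choice(t, size=k, replace=False)] = True
        return participates

    def poisson_sampling(t, k):
        # Baseline: each step includes the record independently with
        # probability k/t, so the participation count varies around k.
        return rng.random(t) < k / t

    print(random_allocation(100, 10).sum())  # always exactly 10
    print(poisson_sampling(100, 10).sum())   # 10 on average, Binomial(t, k/t)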
1679d ago · Tutorial · #multimodal
What Do Your Logits Know? (The Answer May Surprise You!)
Authors: Masha Fedzechkina, Eleonora Gualdoni, Rita Ramos, Sinead Williamson. Recent work has shown that probing model internals can reveal a wealth of information not apparent from the model generations. This poses the risk of unintentional or malicious information leakage, where model users are able to learn information that the model owner assumed was inaccessible. Using vision-language models as a testbed, we present the first systematic comparison of information retained at different “representational levels” as it is compressed from the rich information encoded in the residual stream through two natural bottlenecks: low-dimensional projections of the residual stream obtained using tuned lens, and the final top- logits most likely to impact the model’s answer. We show…
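To make "information retained at the final logits" concrete, here is a rough sketch of the kind of probe involved (the featurization and synthetic attribute are hypothetical, and the paper's actual protocol, including the tuned-lens projections of the residual stream, is more involved): fit a linear classifier that tries to recover a hidden attribute from only each example's largest output logits.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def top_logit_features(logits, k=50):
        # Keep only each example's k largest logit values, a crude stand-in for
        # the compressed view exposed at the output; a fuller featurization
        # could also record which tokens those logits belong to.
        return -np.sort(-logits, axis=1)[:, :k]

    def probe_accuracy(logits, labels, k=50):
        # Linear probe: chance-level accuracy means the bottleneck destroyed
        # the attribute; above chance means it leaks through the logits.
        feats = top_logit_features(logits, k)
        split = int(0.8 * len(labels))
        clf = LogisticRegression(max_iter=1000).fit(feats[:split], labels[:split])
        return clf.score(feats[split:], labels[split:])

    # synthetic sanity check: a hidden binary attribute that shifts the logits
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=400)
    logits = rng.normal(size=(400, 1000)) + 0.5 * labels[:, None]
    print(probe_accuracy(logits, labels))    # well above the 0.5 chance level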