Apple Machine Learning Research at ICLR 2026

Apple is advancing AI and ML with fundamental research, much of which is shared through publications and conference participation to accelerate progress in this important field and support the broader community. This week, the 14th International Conference on Learning Representations (ICLR) will be held in Rio de Janeiro, Brazil. Apple is proud to once again participate in and sponsor this important event for the research community.

At the main conference and associated workshops, Apple researchers will present new work across a variety of topics, including work that enables the large-scale training of recurrent neural networks, techniques to improve state-space models, new approaches to integrating image understanding and generation, methods for generating 3D scenes from a single photo, and new approaches to protein folding.

During exhibit hours, attendees can experience demonstrations of Apple’s ML research at booth #204, including local LLM inference on Apple silicon using MLX and sub-second sharp monocular view synthesis. Apple also sponsors and participates in numerous events hosted by affinity groups that support underrepresented groups in the ML community.

A comprehensive overview of Apple’s participation and contributions to ICLR 2026 can be found here. A selection of highlights follows below.

Although recurrent neural networks (RNNs) are inherently suited to efficient inference and require much less memory and computation than attention-based architectures, the sequential nature of their computations has historically made it impractical to scale up RNNs to billions of parameters. New advances by Apple researchers dramatically streamline RNN training, making it possible for the first time to train at scale, and expanding the architectural options available to practitioners in LLM design, especially in resource-constrained deployments.

In a new paper, “ParaRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models,” which was accepted as an oral presentation at ICLR 2026, Apple researchers share a new framework for parallel RNN training that achieves a 665x speedup over traditional sequential approaches (see Figure 1). This efficiency gain enables training the first 7-billion-parameter classical RNN that achieves language modeling performance competitive with Transformers (see Figure 2).

To accelerate research into efficient sequence modeling and enable researchers and practitioners to explore new nonlinear RNN models at scale, the ParaRNN codebase has been released as an open-source framework for automatic training parallelization of nonlinear RNNs.

The first author of this paper will also give an expo talk about this research at ICLR.
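This post does not describe ParaRNN's algorithm in detail. As background only, the sketch below shows a standard building block for parallelizing recurrences: a prefix scan over a linear recurrence h_t = a_t * h_{t-1} + b_t. Because composing affine maps is associative, the loop over time steps can be replaced by about log2(n) passes whose positions are all independent. Whether and how ParaRNN applies such a scan to nonlinear GRU and LSTM cells is not stated here, and every name in the code is an illustrative assumption.

```python
from typing import List, Sequence, Tuple

Affine = Tuple[float, float]  # (a, b) stands for the map h -> a * h + b


def combine(first: Affine, second: Affine) -> Affine:
    """Compose two affine maps: apply `first`, then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)


def sequential_recurrence(a: Sequence[float], b: Sequence[float], h0: float) -> List[float]:
    """Reference implementation: one step at a time, inherently serial."""
    h = h0
    states = []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        states.append(h)
    return states


def scan_recurrence(a: Sequence[float], b: Sequence[float], h0: float) -> List[float]:
    """Same result via an inclusive prefix scan with step doubling."""
    elems: List[Affine] = list(zip(a, b))
    n = len(elems)
    step = 1
    while step < n:
        # Every position in this pass depends only on the previous pass,
        # so on parallel hardware all n updates could run at once.
        elems = [
            combine(elems[i - step], elems[i]) if i >= step else elems[i]
            for i in range(n)
        ]
        step *= 2
    # elems[t] now maps h0 directly to h_t.
    return [a_t * h0 + b_t for a_t, b_t in elems]
```

The list comprehension here still runs serially in Python; the point is only that each pass has no dependencies between positions, which is what makes a parallel implementation possible.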

Acceleration with parallel RNN training

Figure 1: Comparison of parallel and sequential application runtimes for adapted ParaGRU and ParaLSTM cells as a function of input sequence length. ParaRNN unlocks the power of parallelism during training, allowing for dramatic speedups compared to regular sequential applications.

Performance of large-scale classic RNNs

Figure 2: Perplexity (lower is better) for various model sizes for Mamba2, ParaLSTM, ParaGRU, and Transformers. Because parallelization allows for large-scale training, the adapted GRU and LSTM models reach perplexity comparable to Transformers and Mamba2.
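For readers less familiar with the metric in Figure 2, here is a minimal sketch of how perplexity is computed from per-token log-probabilities. The function name and the toy inputs are illustrative assumptions and are not taken from the paper.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Exponential of the average negative log-likelihood per token (natural log)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# A model that assigns probability 0.5 to every observed token is, on average,
# as uncertain as a choice between two equally likely options.
two_way_uncertainty = perplexity([math.log(0.5)] * 4)
```

A lower value means the model assigns higher probability to the text it is evaluated on.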

State-space models (SSMs) like Mamba have become the leading alternative to Transformers for sequence modeling tasks. Their main advantage is efficiency in long-context and long-form generation, enabled by fixed-size memory and linear scaling of computational complexity. A new Apple paper, “To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models,” has been accepted as an oral presentation at ICLR and explores the capabilities and limitations of SSMs for long-form generation tasks. The paper shows that the efficiency of SSMs comes at the cost of inherent performance degradation. In fact, even though they can generate chains of thought (CoT) of arbitrary length, SSMs cannot solve long-form generation tasks once the complexity of the task exceeds the model’s capacity. This limitation stems from the model’s finite memory, which limits its expressive power when generating long sequences.

The paper also shows that SSMs can mitigate this limitation when given interactive access to external tools. With the right choice of tool access and problem-dependent training data, SSMs can learn to solve tractable problems and generalize to arbitrary problem length and complexity (see Figure 3). The study demonstrates that tool-augmented SSMs achieve strong length generalization on a variety of arithmetic, reasoning, and coding tasks. These findings highlight SSMs as a potentially efficient alternative to Transformers in interactive tool-based and agentic settings.

Figure 3: Left: Diagram of the trajectory of an interactive tool-using agent that uses a pointer-based memory tool to solve multi-digit additions. Agents can generate thoughts (blue), outputs (purple), or commands (orange), and receive observations (green) from the memory tool. At each step, the top line shows the state of the memory context and the line below it shows the sequence of generated tokens. Right: Accuracy of recurrent/SSM models (Mamba, LSTM, GRU) and Transformers (Pythia, Mistral) trained on trajectories with additions of fewer than 5 digits and evaluated on up to 1,000 digits (logarithmic scale).
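To make the idea in Figure 3 concrete, the toy sketch below shows why a pointer-based memory tool helps a fixed-memory model: the agent keeps only a single carry digit as internal state, while the operands live in the external tool and are read one position at a time. The class `PointerMemoryTool`, its `read` and `move_left` commands, and the overall protocol are hypothetical stand-ins, not the interface used in the paper, and the hand-written loop stands in for a learned policy.

```python
from typing import List, Tuple


class PointerMemoryTool:
    """Hypothetical external memory: holds both operands and a movable pointer."""

    def __init__(self, x: str, y: str) -> None:
        self.x, self.y = x, y
        self.offset = 0  # pointer position, counted from the least significant digit

    def read(self) -> Tuple[int, int, bool]:
        """Observation: the two digits under the pointer, and whether input is exhausted."""
        dx = int(self.x[-1 - self.offset]) if self.offset < len(self.x) else 0
        dy = int(self.y[-1 - self.offset]) if self.offset < len(self.y) else 0
        exhausted = self.offset >= max(len(self.x), len(self.y))
        return dx, dy, exhausted

    def move_left(self) -> None:
        """Command: advance the pointer to the next more significant digit."""
        self.offset += 1


def add_with_tool(x: str, y: str) -> str:
    """Add two non-negative decimal strings using constant internal state."""
    tool = PointerMemoryTool(x, y)
    carry = 0  # the agent's entire internal state: one carry digit
    emitted: List[str] = []  # output stream, least significant digit first
    while True:
        dx, dy, exhausted = tool.read()
        if exhausted:
            break
        total = dx + dy + carry
        emitted.append(str(total % 10))
        carry = total // 10
        tool.move_left()
    if carry:
        emitted.append(str(carry))
    return "".join(reversed(emitted))
```

The same loop handles 5-digit and 1,000-digit operands without any change to the agent's state, which is the intuition behind length generalization in this setting.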

Unified multimodal LLMs capable of both image understanding and generation are attractive not only for their architectural simplicity and efficiency, but also because shared representations enrich understanding, improve vision-language alignment, and enable distinctive capabilities such as instruction-driven image editing.

However, existing open-source models often trade off performance between image understanding and generation. At ICLR, Apple researchers will share “MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer.” As explained in the paper, Manzano is a unified framework designed to reduce this trade-off through simple architectural ideas (see Figure 4) and training recipes that scale well across model sizes.

Manzano uses a single shared vision encoder to feed two lightweight adapters, which produce continuous embeddings for image-to-text understanding and discrete tokens for text-to-image generation within a shared semantic space. A unified autoregressive LLM predicts high-level semantics in the form of text and image tokens, and an auxiliary diffusion decoder converts the image tokens into pixels. This architecture, combined with a unified training recipe over understanding and generation data, enables scalable joint learning of both capabilities. Manzano achieves state-of-the-art results among unified models and is competitive with specialized models, especially on text-rich evaluations.

Figure 4: Hybrid tokenizer workflow. Left: The tokenizer produces two distinct but homogeneous feature streams through separate adapters. During training, one adapter output is randomly sampled and passed to a small LLM decoder for tuning. Right: Once the tokenizer is trained, the two feature types are applied to understanding and generation tasks.
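The data flow of a hybrid tokenizer can be illustrated with a deliberately tiny sketch: one shared encoder, a continuous adapter for understanding, and a discrete adapter for generation. Everything below is an assumption made for illustration, including the 2-dimensional patch features, the scaling adapter, and nearest-neighbor codebook lookup; the post does not specify how Manzano's adapters are implemented.

```python
from typing import List, Sequence

Vector = List[float]


def shared_encoder(pixels: Sequence[float]) -> List[Vector]:
    """Toy stand-in for a vision encoder: pair up pixels into 2-dim patch features."""
    return [[pixels[i], pixels[i + 1]] for i in range(0, len(pixels) - 1, 2)]


def continuous_adapter(features: List[Vector], scale: float = 2.0) -> List[Vector]:
    """Understanding branch: keep real-valued embeddings (here, a simple rescaling)."""
    return [[scale * value for value in feature] for feature in features]


def discrete_adapter(features: List[Vector], codebook: List[Vector]) -> List[int]:
    """Generation branch: map each feature to the index of its nearest codebook entry."""

    def squared_distance(u: Vector, v: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return [
        min(range(len(codebook)), key=lambda k: squared_distance(feature, codebook[k]))
        for feature in features
    ]


# Both branches consume the same encoder output, so they share one feature space.
features = shared_encoder([0.1, 0.0, 0.9, 1.0])
embeddings = continuous_adapter(features)                           # fed to the LLM as vectors
image_tokens = discrete_adapter(features, [[0.0, 0.0], [1.0, 1.0]])  # predicted by the LLM as ids
```

The design point the sketch tries to capture is that both outputs derive from a single encoder, rather than from two unrelated tokenizers.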

At ICLR, Apple researchers will also share “Sharp Monocular View Synthesis in Less Than a Second.” The paper shows how to generate a 3D Gaussian representation from a single photo using one forward pass through a neural network, in less than 1 second on a standard GPU. The resulting representation can be rendered in real time, producing high-resolution, photorealistic images of the 3D scene from nearby views (see Figure 5).

This technology, called SHARP (Single-image High-Accuracy Real-time Parallax), produces a metric representation with absolute scale and supports metric camera movement. Experimental results show that SHARP achieves robust zero-shot generalization across datasets. It also sets a new state of the art on multiple datasets, reducing LPIPS by 25-34% and DISTS by 21-43%, and lowering synthesis time by three orders of magnitude compared with the best prior model.

To help the community further explore and build on this approach, the code is available here.

ICLR attendees will be able to experience this work first-hand in a demo at Apple booth #204 during exhibit hours.

Figure 5: SHARP synthesizes a photorealistic 3D representation from a single photo in less than a second. Top: Input image. Bottom: A novel view synthesized by SHARP. The synthesized representation supports high-resolution rendering of close views with sharp detail and fine structure at over 100 frames per second on standard GPUs.

Protein folding is a fundamental but notoriously difficult challenge in computational biology. The crux of the problem is predicting the exact three-dimensional coordinates of each atom in a protein structure based solely on the amino acid sequence (i.e., a string with 20 possible values for each letter). Predicting the 3D structure of a protein is of great importance because its function is intrinsically tied to its spatial organization. Breakthroughs in this field will allow researchers to rapidly design and understand proteins, potentially revolutionizing drug discovery, biotechnology, and more.

At ICLR, Apple researchers will share “SimpleFold: Folding Proteins is Simpler than You Think.” The paper details a new approach that uses a general-purpose architecture based solely on standard transformer blocks (similar to text-to-image or text-to-3D models). This approach allows SimpleFold to avoid the complex architectural design of traditional approaches while maintaining performance (see Figure 6). To help the research community build on this method, the paper comes with code and model checkpoints that can be run efficiently and locally on a Mac with Apple silicon using MLX.

Figure 6: Examples of SimpleFold predictions for targets (a) chain A of 7QSW (RubisCO large subunit) and (b) chain A of 8DAY (dimethylallyltryptophan synthase 1). The ground truth is shown in light blue and the predictions are shown in dark teal. (c) An ensemble generated by SimpleFold fine-tuned on MD ensemble data for target chain B of 6NDW (flagellar hook protein FlgE). (d) Performance of SimpleFold on CASP14 when increasing model size from 100M to 3B. (e) Inference time for different sizes of SimpleFold on an M2 Max 64GB MacBook Pro.

During exhibit hours, ICLR attendees can experience live demos of Apple ML research at booth #204, including:

SHARP – This demo shows SHARP running on a pre-recorded set of images, or on images captured directly by the user during the demo. Experience the fast pipeline from image selection to SHARP processing to display of the generated 3D Gaussian point cloud on an iPad Pro equipped with an M5 chip.
Local LLM Inference on Apple Silicon with MLX – This demo showcases on-device LLM inference on a MacBook Pro with M5 Max using MLX, Apple’s open-source array framework purpose-built for Apple silicon. A quantized frontier coding model runs completely locally within Xcode’s native development environment. The full stack (MLX, mlx-lm, and model weights) is open source, encouraging the research community to build on and extend these methods.

We are proud to once again sponsor affinity groups hosting events on-site at ICLR, including Women in Machine Learning (WiML) (April 24 Social) and Queer in AI (April 25 Social). In addition to supporting these groups with sponsorships, Apple employees also participate in these and other affinity events.

ICLR brings together experts dedicated to advancing deep learning, and Apple is proud to once again share innovative new research and connect with the community attending the event. This post highlights some of the work Apple ML researchers will present at ICLR 2026. A comprehensive overview and schedule of our participation can be found here.
