Ollama Adds Apple MLX Support, Boosts Mac Model Performance

Ars Technica

Key Points

  • Ollama adds preview support for Apple’s open‑source MLX framework.
  • NVFP4 compression format from Nvidia is now supported for better memory efficiency.
  • The update targets Apple Silicon Macs (M1 or later) with at least 32 GB RAM.
  • Initial support includes Alibaba’s 35‑billion‑parameter Qwen 3.5 model.
  • Improvements aim to boost caching performance and overall speed on Macs.
  • Local model interest is rising amid frustrations with cloud rate limits and subscription costs.
  • Ollama also expanded its Visual Studio Code integration.

Ollama, a runtime for running large language models locally, announced preview support for Apple’s open‑source MLX framework and added Nvidia’s NVFP4 compression format. The update targets Apple Silicon Macs, requiring at least 32 GB of RAM, and currently supports Alibaba’s 35‑billion‑parameter Qwen 3.5 model. These changes aim to improve caching, memory efficiency, and overall speed, aligning with growing interest in running AI models on personal machines amid frustrations with cloud‑based rate limits and subscription costs.

Ollama Expands Local Model Capabilities

Ollama, a runtime for running large language models on local hardware, has introduced two major enhancements in its latest preview release (Ollama 0.19). First, the platform now supports Apple’s open‑source MLX machine‑learning framework, which is tailored to Apple Silicon chips (M1 and later). Second, Ollama has added support for Nvidia’s NVFP4 format, a 4‑bit floating‑point quantization scheme that reduces the memory footprint of supported models.
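
For readers who want to try the preview, the snippet below sketches what pulling and querying a model looks like through the official `ollama` Python client. The model tag `qwen3.5` is an assumption for illustration; the article does not give the exact registry name of the MLX‑backed build, so check `ollama list` or the Ollama model library for the real tag.

```python
# Minimal sketch using the official `ollama` Python client
# (pip install ollama); requires a running Ollama server.
import ollama

# Hypothetical tag for the 35B Qwen 3.5 preview build; the actual
# registry name may differ.
MODEL = "qwen3.5"

# Download the weights locally (a large pull for a 35B model),
# then run a single chat turn against the local server.
ollama.pull(MODEL)
response = ollama.chat(
    model=MODEL,
    messages=[{"role": "user", "content": "In one sentence, what is MLX?"}],
)
print(response["message"]["content"])
```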

These technical upgrades are positioned to deliver noticeably faster performance on Macs equipped with Apple Silicon. The company notes that the combination of MLX support and NVFP4 compression promises “significantly improved performance” for users who meet the hardware requirements. Specifically, Ollama requires an Apple Silicon‑equipped Mac with at least 32 GB of RAM to run the supported model.
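
Because the 32 GB floor rules out many consumer Macs, a quick preflight check can save a very large download. The sketch below simply mirrors the requirements as stated here (Apple Silicon plus at least 32 GB of RAM) using the standard library; it is illustrative, not an official Ollama check.

```python
# Illustrative preflight check mirroring the stated requirements:
# an Apple Silicon Mac with at least 32 GB of RAM.
import platform
import subprocess

def meets_stated_requirements(min_ram_gb: int = 32) -> bool:
    # Natively running Python reports "arm64" on Apple Silicon
    # (a Rosetta-emulated interpreter would report "x86_64").
    if platform.system() != "Darwin" or platform.machine() != "arm64":
        return False
    # macOS exposes total physical memory in bytes via sysctl.
    mem_bytes = int(subprocess.check_output(["sysctl", "-n", "hw.memsize"]))
    return mem_bytes >= min_ram_gb * 1024**3

if __name__ == "__main__":
    print("Meets stated MLX preview requirements:", meets_stated_requirements())
```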

At launch, the preview supports a single model: the 35‑billion‑parameter variant of Alibaba’s Qwen 3.5. While the hardware demands are high by typical consumer standards, the target audience includes developers, researchers, and hobbyists experimenting with local AI models.

The timing of these enhancements coincides with a surge in interest in running large language models locally. The open‑source project OpenClaw, for example, quickly accumulated over 300,000 stars on GitHub and generated widespread attention, especially in China. Users are increasingly seeking alternatives to cloud‑based services such as Claude Code or ChatGPT Codex, which impose rate limits or require costly subscriptions. By enabling more efficient local execution, Ollama aims to address these pain points.

In addition to the MLX integration, Ollama recently expanded its Visual Studio Code integration, further streamlining the workflow for developers who wish to incorporate local AI models into their coding environment.
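
Editor and tooling integrations generally reach a local Ollama instance through its OpenAI‑compatible HTTP endpoint, which the server exposes at http://localhost:11434/v1 by default. The sketch below shows that pattern with the `openai` Python SDK; the model tag is again a hypothetical stand‑in.

```python
# Talking to a local Ollama server through its OpenAI-compatible
# endpoint (default: http://localhost:11434/v1), as editor tools do.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required by the SDK, ignored by the local server
)

completion = client.chat.completions.create(
    model="qwen3.5",  # hypothetical tag; use whatever `ollama list` reports
    messages=[{"role": "user", "content": "Write a Python hello-world."}],
)
print(completion.choices[0].message.content)
```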

Overall, Ollama’s latest preview release positions the platform as a more viable option for users who want high‑performance AI capabilities without relying on external cloud services. The focus on Apple Silicon, combined with memory‑saving compression techniques, reflects a broader industry trend toward on‑device AI processing.

Tags: Ollama, Apple Silicon, MLX, large language models, local AI, model compression, NVFP4, developer tools, open source, AI performance
Generated with News Factory - Source: Ars Technica
