We have been expecting this since Ollama's first pull request to MLX. This is just the beginning: the CUDA and CPU backends are still improving, and hopefully we will end up with one framework unifying inference and training across all platforms.
Ollama now runs faster than ever on Apple silicon, powered by MLX, Apple's machine learning framework.
This change unlocks much faster performance for demanding workloads on macOS:
- Personal assistants like OpenClaw
- Coding agents like Claude Code, OpenCode, or Codex
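
Existing clients shouldn't need any changes to benefit: the backend is an implementation detail behind the same local API. As a minimal sketch (assuming Ollama is running on its default port 11434, and that a model such as `llama3.2` has already been pulled with `ollama pull llama3.2`), a client can stream a chat completion like this:

```python
import json
import urllib.request

# Minimal sketch: stream a chat completion from a local Ollama server.
# Assumes the server is listening on the default port (11434) and the
# model "llama3.2" has already been pulled.
req = urllib.request.Request(
    "http://localhost:11434/api/chat",
    data=json.dumps({
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
    }).encode(),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    # Ollama streams responses as newline-delimited JSON chunks.
    for line in resp:
        chunk = json.loads(line)
        print(chunk.get("message", {}).get("content", ""), end="", flush=True)
print()
```

Tools like the agents listed above talk to Ollama through this same API, so they pick up the MLX speedup without modification.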