Ollama in 2025: Major Updates Transform Local AI Experience


Remember last July when we dug into Ollama and its promise for local AI? Well, a lot has happened since then. Ollama has rolled out some pretty significant updates that show they're not just sticking to the command line anymore — though they haven't forgotten their roots either.

Finally, a Real Desktop App

Ollama introduced a native desktop application for macOS and Windows on July 30, 2025, a notable usability step that opens the tool to a wider audience. The interface is clean and focused on the essentials: a chat window and a history of interactions, without unnecessary complexity. Provided the model supports it, you can drag and drop PDFs or images straight into the chat window. The new UI also exposes a handful of settings; a context-length slider, for example, gives precise control over how much of the conversation the model retains, which is valuable during extended sessions.
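
Under the hood, that slider maps to the same context-length option the API has long exposed. Here is a minimal sketch of the equivalent API call, assuming a local server on the default port; num_ctx and the base64 images field are documented parts of the chat API, while the model name, file, and context value are illustrative choices:

```python
import base64
import requests

# Mirrors two desktop-app features from the API side: "num_ctx" is the
# context-length control behind the slider, and base64-encoded "images"
# are the programmatic version of drag-and-drop (for vision-capable models).
with open("screenshot.png", "rb") as f:  # illustrative file
    image_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llava",  # any locally pulled vision model
        "messages": [{
            "role": "user",
            "content": "What does this screenshot show?",
            "images": [image_b64],
        }],
        "options": {"num_ctx": 8192},  # the context window the slider adjusts
        "stream": False,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```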

Turbo: Ollama Goes to the Cloud (Sort Of)

Ollama launched Turbo in preview this August: a cloud inference service priced at $20 per month. This represents a shift for a platform originally focused on local AI deployment. Turbo offers access to datacenter-grade hardware for running larger models that would be impractical on typical consumer hardware.

Performance and Technical Specifications:

  • Speed: 300+ tokens per second, with some reports of up to 1,200 tokens per second for certain models
  • Currently supports GPT-OSS models: Ollama partners with OpenAI to serve its open-weight models (Apache 2.0 license), including 20B and 120B parameter versions
  • US-based infrastructure with a stated no-data-retention policy
  • Compatible with Ollama's existing CLI, API, and desktop app (a hedged API sketch follows this list)
  • Includes hourly and daily usage limits
  • Usage-based pricing planned for future release
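
Since Turbo plugs into the existing API surface, using it from code should look almost identical to talking to a local server. The sketch below is hedged accordingly: the ollama.com endpoint and Bearer-style API key reflect how the cloud access is described, but treat the exact URL and header as assumptions rather than confirmed documentation:

```python
import os
import requests

# Hedged sketch: assumes Turbo mirrors the local /api/chat shape, hosted at
# ollama.com and authenticated with an account API key. Endpoint and auth
# header are assumptions based on the announcement, not verified docs.
api_key = os.environ["OLLAMA_API_KEY"]  # hypothetical: a key from your Ollama account

resp = requests.post(
    "https://ollama.com/api/chat",  # assumed Turbo endpoint
    headers={"Authorization": f"Bearer {api_key}"},
    json={
        "model": "gpt-oss:120b",  # one of the GPT-OSS sizes Turbo serves
        "messages": [{"role": "user", "content": "Why is the sky blue?"}],
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```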

What the Community Thinks

The announcement has generated mixed reactions. While some appreciate the option for enhanced performance, others question whether it aligns with Ollama's local-first philosophy. At $20/month, the service also competes with well-established cloud providers such as Anthropic and OpenAI, which offer their proprietary models at similar price points.

Important Note: Local Ollama functionality remains completely free and requires no account. You can download, run, and manage models locally without any registration or subscription. The account system only becomes relevant when accessing cloud-based features like Turbo.

Developer Features That Matter

The API also gained several enhancements that developers should be aware of:

  • Structured Outputs with JSON Schema: Type-safe API responses using JSON schemas; eliminates parsing errors (first sketch below).
  • Streaming Responses with Tool Calls: Real-time function execution during streaming responses; enables more interactive apps (second sketch below).
  • Enhanced Multimodal Support: Improved image and document handling; powers drag-and-drop and vision models (see the desktop-app sketch earlier).
  • Thinking Mode for Reasoning Models: Control visibility of model reasoning steps for flexible application UX; also shown in the second sketch below.
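
To make the structured-outputs point concrete, here is a minimal sketch against a local server. Passing a JSON schema in the format field is the documented mechanism; the schema and model name are our own illustrative choices:

```python
import json
import requests

# Constrain a model's reply to a JSON schema via the "format" field,
# so the response is guaranteed to parse. Schema and model are illustrative.
schema = {
    "type": "object",
    "properties": {
        "tool": {"type": "string"},
        "release_year": {"type": "integer"},
    },
    "required": ["tool", "release_year"],
}

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2",  # any locally pulled model
        "messages": [{"role": "user", "content": "Name one AI tool and the year it was released."}],
        "format": schema,  # structured outputs: reply must match the schema
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
data = json.loads(resp.json()["message"]["content"])  # parses reliably
print(data["tool"], data["release_year"])
```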
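
And a second sketch combining streaming, tool calls, and thinking mode. The tool definition follows the API's documented function-calling shape; the get_weather tool is hypothetical and nothing here actually executes it, and the model name is simply one that supports both features:

```python
import json
import requests

# Stream a chat where the model may emit reasoning ("thinking"), tool calls,
# and answer text as separate fields in each streamed JSON chunk.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

with requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen3",  # a model with tool-calling and thinking support
        "messages": [{"role": "user", "content": "What's the weather in Innsbruck?"}],
        "tools": tools,
        "think": True,  # surface reasoning steps alongside the answer
        "stream": True,
    },
    stream=True,
    timeout=300,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        msg = json.loads(line).get("message", {})
        if msg.get("thinking"):
            print("[thinking]", msg["thinking"])
        for call in msg.get("tool_calls", []):
            print("[tool call]", call["function"]["name"], call["function"]["arguments"])
        if msg.get("content"):
            print(msg["content"], end="", flush=True)
```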

Secure Minions Protocol

Announced in June 2025 in collaboration with Stanford's Hazy Research lab, Secure Minions lets local Ollama models work together with more powerful cloud models while keeping all data end-to-end encrypted. Running on NVIDIA H100 GPUs in confidential computing mode, it delivers up to 98% of frontier-model accuracy at 5–30x lower cost and under 1% added latency, ensuring sensitive context never leaves the device in plaintext.
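
The full protocol is more involved (remote attestation, an encrypted session with the enclave), but the division of labor is easy to sketch. What follows is a purely conceptual illustration of the Minions-style split, not the actual Secure Minions implementation: the local model reads the sensitive document on-device, and only ciphertext would ever cross the network. The remote_frontier_model stub and the Fernet session key are hypothetical stand-ins:

```python
import requests
from cryptography.fernet import Fernet  # pip install cryptography

# Conceptual sketch only. A real run negotiates keys with an attested H100
# enclave; here a local Fernet key stands in for that encrypted channel.
channel = Fernet(Fernet.generate_key())

sensitive_doc = "Internal Q3 report: revenue grew 12%, churn fell to 3%."

# Step 1: the small local model distills the sensitive context on-device.
local_summary = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2",
        "prompt": f"Extract the key figures from this document:\n{sensitive_doc}",
        "stream": False,
    },
    timeout=120,
).json()["response"]

# Step 2: encrypt before anything leaves the machine.
ciphertext = channel.encrypt(local_summary.encode())

def remote_frontier_model(payload: bytes) -> bytes:
    """Hypothetical stand-in for the cloud half of the protocol."""
    request = channel.decrypt(payload).decode()  # would happen inside the enclave
    return channel.encrypt(f"Frontier-model analysis of: {request}".encode())

print(channel.decrypt(remote_frontier_model(ciphertext)).decode())
```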

Under the Hood Improvements

Several optimization updates enhance performance and usability:

  • GPU acceleration: Improved VRAM usage for NVIDIA, AMD, and Apple Silicon.
  • Configurable model storage: Choose custom locations for model files.
  • LAN mode: Share models across your local network (sketch below).
  • Context window: Better memory management for longer sessions.
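
For LAN mode, the server side is a one-line environment change, and any machine on the network can then talk to the standard API. OLLAMA_HOST and OLLAMA_MODELS are documented environment variables; the IP address below is a placeholder for your own server's:

```python
import requests

# On the host machine, expose the server to the network before starting it:
#   OLLAMA_HOST=0.0.0.0 ollama serve
# (Relatedly, OLLAMA_MODELS=/path/to/models relocates model storage.)
#
# From another machine on the LAN, point the client at the host's address.
HOST = "http://192.168.1.50:11434"  # placeholder LAN IP of the Ollama host

resp = requests.post(
    f"{HOST}/api/chat",
    json={
        "model": "llama3.2",
        "messages": [{"role": "user", "content": "Hello from across the LAN!"}],
        "stream": False,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```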

Looking Ahead

The 2025 updates demonstrate Ollama's evolution from a simple local model runner to a robust AI platform offering both local and cloud-powered options. While the introduction of paid services like Turbo has sparked debate, Ollama continues to prioritize free, open-source local functionality, ensuring accessibility for privacy-focused and cost-conscious users. Looking forward, the team has hinted at upcoming features that will further enhance desktop integration, including a computer use agent update for seamless interaction with local files and applications — similar to recent tools like Claude's desktop agent.

This new hybrid approach positions Ollama to serve both individual developers seeking local AI capabilities and organizations requiring scalable, high-performance inference — though the long-term success of this strategy will depend on competitive pricing, model availability, and continued investment in local-first features.

We all know AI is a never-ending journey of improvement and adaptation, and it is often hard to stay up to date with the latest advancements. We at Infralovers are committed to helping you navigate this landscape, so be sure to subscribe to our newsletter for the latest updates and insights.
