Complete Implementation Guide — 2026 Edition

Getting Started
with Ollama

Install, configure, and master local LLMs — from your first ollama run to production-ready API integrations.

🪟 Windows (Native) 🐧 Linux / Ubuntu 🍎 macOS 🦙 5 Models Covered 🐍 Python & REST API

Prerequisites

Ensure your system meets the requirements and your tools are current before installing Ollama.

⚙️
System Requirements
Requirement | Minimum | Recommended
RAM | 8 GB | 16 GB+
Disk Space | 12 GB free | 50 GB+ free
OS (Windows) | Windows 10 (1903+) | Windows 11
OS (Linux) | Ubuntu 20.04+ | Ubuntu 22.04 / 24.04
OS (macOS) | macOS 12+ | macOS 14+ (Apple Silicon)
GPU (Optional) | NVIDIA 6 GB / AMD Radeon | NVIDIA 8 GB+ / Apple M-series
🔧
Update Your System First
PowerShell
# Check your Windows version (must be 10 1903+ or 11)
winver

# Ensure Windows is updated via Settings → Windows Update
# No additional package managers needed — Ollama has a
# native .exe installer that works without Admin rights

# Optional: Install Windows Terminal for a better CLI experience
winget install Microsoft.WindowsTerminal
🪟
No WSL needed! Ollama runs natively on Windows 10/11. The installer is a simple .exe that doesn't require Administrator privileges. It auto-starts in your system tray.
Bash — Terminal
# Update package lists and upgrade installed packages
sudo apt update && sudo apt upgrade -y

# Install essential tools
sudo apt install -y curl wget git

# Verify curl is installed
curl --version
💡
Why sudo? Many install commands need root privileges to write to system directories. The sudo prefix temporarily elevates your permissions and will prompt for your password.
Bash — Terminal
# Install Homebrew if you don't have it
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Update Homebrew and install essentials
brew update
brew install curl wget git

# Verify macOS version
sw_vers
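Before installing, you can also script the disk-space check from the requirements table. A minimal Python sketch using only the standard library; the 12 GB threshold mirrors the table's minimum and is adjustable:

```python
# Preflight check: verify free disk space before pulling models.
# The threshold mirrors the requirements table; adjust to taste.
import shutil

MIN_FREE_GB = 12   # table minimum; 50 GB+ recommended

def free_gb(path: str = ".") -> float:
    """Return free disk space at `path` in gigabytes."""
    return shutil.disk_usage(path).free / 1024**3

if __name__ == "__main__":
    free = free_gb()
    status = "OK" if free >= MIN_FREE_GB else "LOW"
    print(f"Free disk: {free:.1f} GB [{status}]")
```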

Install Ollama

Ollama runs natively on all three platforms — no virtual machines required. Pick your OS below.

🎯
Native Windows — No WSL required. Ollama installs as a native Windows application with full NVIDIA and AMD Radeon GPU support. It runs in the background via system tray, and the CLI works in CMD, PowerShell, or Windows Terminal.
1
Download the Installer
Grab OllamaSetup.exe from the official site. No admin rights needed.
Browser or PowerShell
# Option A: Open in browser
https://ollama.com/download/windows

# Option B: Download via PowerShell
Invoke-WebRequest -Uri "https://ollama.com/download/OllamaSetup.exe" -OutFile "OllamaSetup.exe"
2
Run the Installer
Double-click OllamaSetup.exe and click Install. The setup takes about 30 seconds. After installation, you'll see the 🦙 Ollama icon appear in your Windows system tray (bottom-right corner). The server starts automatically.
3
Verify in Terminal
Open Command Prompt, PowerShell, or Windows Terminal and run:
CMD / PowerShell / Windows Terminal
# Check Ollama version
ollama --version
# Expected: ollama version 0.x.x

# Test the server is responding
curl http://localhost:11434
# Expected: Ollama is running
4
Run Your First Model!
That's it — you're ready. Pull and run a model right away:
Terminal
ollama run llama3.2
# Downloads the model (~2 GB) then opens interactive chat
>>> Hello! Tell me about yourself.
Health Check — Open http://localhost:11434 in your browser. You should see "Ollama is running". The API and CLI both use port 11434.

Ollama stores files in these Windows locations. Press Win+R and paste these paths:

Path | Contains
%LOCALAPPDATA%\Ollama | Logs (app.log, server.log) and downloaded updates
%LOCALAPPDATA%\Programs\Ollama | Binaries (auto-added to your user PATH)
%USERPROFILE%\.ollama | Downloaded models and configuration

To change model storage location, set the OLLAMA_MODELS environment variable via Windows Settings → "Environment variables". To uninstall, use Settings → Add or remove programs → Ollama.

GPU Acceleration — Ollama automatically detects and uses NVIDIA (CUDA) and AMD Radeon GPUs on Windows. No additional driver setup needed beyond having your standard GPU drivers installed.
1
Run the Official Install Script
This single command downloads and installs Ollama, sets up the systemd service, and configures everything automatically.
Terminal
curl -fsSL https://ollama.com/install.sh | sh
2
Verify the Installation
Confirm Ollama is installed and the systemd service is running.
Terminal
# Check Ollama version
ollama --version
# Expected: ollama version 0.x.x

# Check the service status
sudo systemctl status ollama
# Should show: ● ollama.service — active (running)
3
Enable Auto-Start on Boot
Ensure Ollama starts automatically every time your machine boots.
Terminal
sudo systemctl enable ollama
sudo systemctl restart ollama
Quick Health Check — Open your browser and visit http://localhost:11434. You should see "Ollama is running".
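The browser check can also be scripted. A small Python sketch (standard library only) that probes the default endpoint; the URL and port are the defaults used throughout this guide:

```python
# Probe the Ollama server's root endpoint; a healthy server answers HTTP 200
# with the body "Ollama is running".
from urllib.request import urlopen

def ollama_alive(url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if the server answers with HTTP 200."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:      # connection refused, timeout, bad hostname, ...
        return False

if __name__ == "__main__":
    print("Ollama is up" if ollama_alive() else "Ollama is not responding")
```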
1
Download Ollama
Option A — Homebrew
brew install ollama
Option B — Direct Download
open https://ollama.com/download
2
Install & Launch
If using the .zip download, extract it and move Ollama.app to your Applications folder, then launch it — the llama icon will appear in your menu bar.
3
Verify Installation
Terminal
ollama --version

# Test that the server is responsive
curl http://localhost:11434
# Expected: Ollama is running
🍎
Apple Silicon Advantage — M1/M2/M3/M4 Macs automatically use the Metal GPU for acceleration. Zero extra drivers or configuration needed.

Sign In to Ollama

Signing in lets you push custom models to the registry. It's completely optional for pulling and running public models.

1
Create an Account
Visit ollama.com and click Sign Up. You can register with your email or GitHub account.
2
Authenticate via CLI
Any Terminal (Windows / Linux / macOS)
# Opens a browser window for OAuth authentication
ollama signin
3
Explore the Model Library
Hundreds of models at ollama.com/library — from 1B parameter pocket models to 70B+ powerhouses.
🔓
No account needed for most tasks. Pull and run any public model anonymously. Sign-in is only required to push your own custom models to the Ollama registry.

CLI Deep Dive

Every Ollama command you'll ever need — identical across Windows, macOS, and Linux.

🔄
Same CLI everywhere. All ollama commands work identically in Windows CMD/PowerShell, macOS Terminal, and Linux bash. The examples below work on all platforms.
⌨️
Complete Command Reference
Command | What It Does
ollama serve | Start the Ollama server (port 11434 by default)
ollama run <model> | Download (if needed) and start interactive chat
ollama pull <model> | Download a model without starting it
ollama list | Show all locally downloaded models with sizes
ollama ps | Show currently loaded / running models in memory
ollama show <model> | Display model details — parameters, template, license
ollama stop <model> | Unload a model from memory
ollama rm <model> | Permanently delete a model from disk
ollama cp <src> <dst> | Clone a model under a new name
ollama create <name> | Create a custom model from a Modelfile
ollama push <model> | Push a custom model to the Ollama registry
ollama signin | Sign in to your Ollama account
ollama signout | Sign out of your Ollama account
ollama --version | Display installed Ollama version
Any Terminal (Works on All OS)
# Step 1: Ensure the server is running
# Windows: It auto-starts via system tray
# Linux:   sudo systemctl start ollama
# macOS:   Open the Ollama app, or run:
ollama serve &

# Step 2: Pull some models
ollama pull llama3.2:1b        # Meta Llama 3.2  — ~1.3 GB
ollama pull gemma3:4b          # Google Gemma 3  — ~2.5 GB
ollama pull deepseek-r1:8b     # DeepSeek R1     — ~4.9 GB
ollama pull qwen3:7b           # Qwen 3          — ~4.4 GB
ollama pull mistral            # Mistral 7B      — ~4.1 GB

# Step 3: List downloaded models
ollama list
# NAME              ID            SIZE      MODIFIED
# llama3.2:1b       a80c4f17...   1.3 GB    just now

# Step 4: Start an interactive session
ollama run llama3.2:1b
>>> Hello! What can you help me with?

# Step 5: In another terminal — check what's loaded
ollama ps

# Step 6: Show model details
ollama show llama3.2:1b

# Step 7: Unload when done
ollama stop llama3.2:1b
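The same workflow can be driven from Python by shelling out to the CLI. A sketch that parses `ollama list` output into dictionaries; the column layout (NAME, ID, SIZE, MODIFIED) is assumed to match the example output in Step 3:

```python
# Script the workflow: call `ollama list` and parse its table output.
import subprocess

def parse_ollama_list(output: str) -> list[dict]:
    """Turn `ollama list` output into [{'name': ..., 'size': ...}, ...]."""
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    models = []
    for line in lines[1:]:                    # skip the header row
        parts = line.split()                  # NAME ID SIZE UNIT MODIFIED...
        models.append({"name": parts[0], "size": " ".join(parts[2:4])})
    return models

if __name__ == "__main__":
    try:
        out = subprocess.run(["ollama", "list"],
                             capture_output=True, text=True).stdout
        for m in parse_ollama_list(out):
            print(f"{m['name']}: {m['size']}")
    except FileNotFoundError:
        print("ollama CLI not found on PATH")
```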

Inside an ollama run session, these slash commands are available:

Command | Action
/bye | Exit the chat session
/clear | Clear the conversation context
/set system <msg> | Set a system prompt for the session
/show info | Display current model information
/show license | Show the model's license
/load <model> | Switch to a different model mid-session
/save <name> | Save the current session state
/? | Show all available slash commands

🪟 Windows

PowerShell or CMD
# Set environment variables for your user account
setx OLLAMA_HOST "0.0.0.0"
setx OLLAMA_ORIGINS "*"
setx OLLAMA_MODELS "D:\ollama-models"  # Optional: change model storage

# Or via GUI: Settings → search "environment variables"
# → Edit environment variables for your account

# After setting, quit Ollama from system tray and relaunch

🐧 Linux — systemd service override

Bash
# Edit the Ollama service configuration
sudo systemctl edit ollama

# Add these lines under [Service]:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_ORIGINS=*"
Environment="OLLAMA_NUM_PARALLEL=2"

# Save and reload
sudo systemctl daemon-reload
sudo systemctl restart ollama

🍎 macOS — launchctl

Terminal
launchctl setenv OLLAMA_HOST "0.0.0.0"
launchctl setenv OLLAMA_ORIGINS "*"
# Then restart the Ollama app from menu bar

Ollama integrates with a wide range of tools. See the full list at docs.ollama.com/integrations.

Category | Tools | Description
Coding Agents | Claude Code, Codex, OpenCode, Droid, Goose, Pi | AI agents that read, modify, and execute code
Assistants | OpenClaw | General-purpose AI assistant
IDEs | VS Code, Cline, Roo Code, JetBrains, Xcode, Zed | Native IDE integrations
Chat & RAG | Onyx | Chat interface with retrieval-augmented generation
Automation | n8n | Workflow automation with AI
Notebooks | marimo | Interactive computing with AI

Install & Run Models

Five powerful open-source LLMs with install commands. These commands work identically on Windows, Linux, and macOS.

📦
Storage Note: Models range from ~1.3 GB to ~4.9 GB each. Check free space first: df -h (Linux/macOS) or dir C:\ (Windows).
🦙 Llama 3.2
Meta AI1B / 3BGeneral Purpose

Meta's latest compact model family — excellent for general conversation, reasoning, and code. The 3B variant offers strong performance while staying lightweight enough for most hardware.

Install & Run
# Pull the 3B model (recommended starting point — ~2 GB)
ollama pull llama3.2

# Or the lighter 1B version (~1.3 GB)
ollama pull llama3.2:1b

# Run interactively
ollama run llama3.2
>>> Explain the concept of microservices architecture
🧠 DeepSeek R1
DeepSeek AI8B / 14BChain-of-Thought

A powerful reasoning model with visible chain-of-thought steps. Exceptional at complex problem-solving, mathematics, and logical inference tasks.

Install & Run
# Pull the 8B reasoning model (~4.9 GB)
ollama pull deepseek-r1:8b

# Run with thinking output visible
ollama run deepseek-r1:8b --think
>>> Design a rate limiter for an API gateway
💎 Gemma 3
Google4B / 12BMultimodal

Google's open model with multimodal support (text + images). Fast, efficient, and excellent for its size class across coding and language tasks.

Install & Run
# Pull the 4B model (~2.5 GB)
ollama pull gemma3:4b

ollama run gemma3:4b
>>> What is the difference between REST and GraphQL?
🌐 Qwen 3
Alibaba7B / 14B / 32BMultilingual

Alibaba's top-performing open model family — exceptional at multilingual tasks, coding, and reasoning. One of the best open models available in 2026.

Install & Run
# Pull the 7B model (~4.4 GB)
ollama pull qwen3:7b

ollama run qwen3:7b
>>> Write a Python decorator for caching with TTL
🌊 Mistral
Mistral AI7BFast & Efficient

Mistral's flagship 7B model — impressively fast with low latency. Punches well above its weight class for coding, summarisation, and instruction-following.

Install & Run
# Pull Mistral 7B (~4.1 GB)
ollama pull mistral

ollama run mistral
>>> Explain how Docker containers work
📋
Pull all 5 models in one command:
Terminal (any OS)
ollama pull llama3.2 && ollama pull deepseek-r1:8b && ollama pull gemma3:4b && ollama pull qwen3:7b && ollama pull mistral
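If you prefer scripting the downloads, the same pulls can go through the REST API's `/api/pull` endpoint using only the Python standard library. A sketch, assuming the server is running on the default port and that the endpoint accepts a JSON body with `model` and `stream` fields:

```python
# Pull each model through POST /api/pull; standard library only.
import json
from urllib.request import Request, urlopen

MODELS = ["llama3.2", "deepseek-r1:8b", "gemma3:4b", "qwen3:7b", "mistral"]

def pull_request(model: str, host: str = "http://localhost:11434") -> Request:
    """Build the POST /api/pull request for one model tag."""
    body = json.dumps({"model": model, "stream": False}).encode()
    return Request(f"{host}/api/pull", data=body,
                   headers={"Content-Type": "application/json"})

if __name__ == "__main__":
    for tag in MODELS:
        try:
            with urlopen(pull_request(tag)) as resp:
                print(tag, "->", json.load(resp).get("status"))
        except OSError as err:
            print(f"{tag}: server not reachable ({err})")
            break
```

Note that non-streaming pulls block until each download finishes, so expect the loop to take a while on a fresh install.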

Test & Validate Each Model

Three computer science prompts per model covering software development, AI agents, and LLM internals. Run these to compare quality.

🦙 Testing Llama 3.2
Launch
ollama run llama3.2
Question 1 — Development
Explain the SOLID principles in software engineering with a real-world Python example for each principle.
Question 2 — AI Agents
What is the ReAct (Reasoning + Acting) pattern in AI agents? How does it differ from simple chain-of-thought prompting? Provide a pseudocode example.
Question 3 — LLMs
Explain the transformer attention mechanism. What are Query, Key, and Value vectors, and how does multi-head attention improve over single-head attention?
Non-Interactive One-Liners
ollama run llama3.2 "Explain the SOLID principles with Python examples"
ollama run llama3.2 "What is the ReAct pattern in AI agents?"
ollama run llama3.2 "Explain the transformer attention mechanism"
🧠 Testing DeepSeek R1
Launch
ollama run deepseek-r1:8b
Question 1 — Development
Design a distributed task queue system (like Celery). What components are needed, how do you handle failures, and what are the trade-offs between at-least-once vs exactly-once delivery?
Question 2 — AI Agents
How would you implement a tool-using AI agent that can search the web, execute code, and manage files? Describe the architecture with a state machine diagram.
Question 3 — LLMs
Compare RAG (Retrieval Augmented Generation) vs fine-tuning for adding domain knowledge to an LLM. When should you use each approach? Provide concrete scenarios.
Non-Interactive One-Liners
ollama run deepseek-r1:8b "Design a distributed task queue system"
ollama run deepseek-r1:8b "How to implement a tool-using AI agent?"
ollama run deepseek-r1:8b "Compare RAG vs fine-tuning for LLMs"
💎 Testing Gemma 3
Launch
ollama run gemma3:4b
Question 1 — Development
Explain event-driven architecture. Write a Python example using an event bus pattern with publishers and subscribers for an e-commerce order system.
Question 2 — AI Agents
What are multi-agent systems? Explain how frameworks like CrewAI or AutoGen orchestrate multiple AI agents to collaborate on complex tasks.
Question 3 — LLMs
What is quantization in LLMs? Explain the difference between FP16, INT8, and INT4 quantization and their impact on model quality and performance.
Non-Interactive One-Liners
ollama run gemma3:4b "Explain event-driven architecture with Python code"
ollama run gemma3:4b "What are multi-agent systems like CrewAI?"
ollama run gemma3:4b "Explain LLM quantization: FP16 vs INT8 vs INT4"
🌐 Testing Qwen 3
Launch
ollama run qwen3:7b
Question 1 — Development
Implement a simple API gateway in Python using FastAPI that handles rate limiting, authentication via JWT tokens, and request routing to microservices.
Question 2 — AI Agents
Explain the concept of "function calling" in LLMs. How do models decide when and how to call external tools? Show the JSON schema approach.
Question 3 — LLMs
What are the key differences between decoder-only (GPT), encoder-only (BERT), and encoder-decoder (T5) transformer architectures? When would you choose each?
Non-Interactive One-Liners
ollama run qwen3:7b "Build an API gateway with FastAPI, JWT, and rate limiting"
ollama run qwen3:7b "Explain function calling in LLMs with JSON schema"
ollama run qwen3:7b "Decoder-only vs encoder-only vs encoder-decoder transformers"
🌊 Testing Mistral
Launch
ollama run mistral
Question 1 — Development
Explain the CAP theorem with real-world examples. How do databases like PostgreSQL, MongoDB, and Cassandra make different trade-offs?
Question 2 — AI Agents
What is the MCP (Model Context Protocol) by Anthropic? How does it standardize tool use and external service integration for AI agents?
Question 3 — LLMs
Explain the concept of "context window" in LLMs. How do techniques like sliding window attention, sparse attention, and RoPE help extend context length?
Non-Interactive One-Liners
ollama run mistral "Explain CAP theorem with database examples"
ollama run mistral "What is MCP (Model Context Protocol) by Anthropic?"
ollama run mistral "Explain context windows and attention mechanisms in LLMs"
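The non-interactive one-liners above lend themselves to batch testing. A Python sketch that runs one abbreviated prompt per model via `subprocess`; the model tags and prompts are taken from the sections above:

```python
# Batch-run one abbreviated prompt per model and print the replies.
import subprocess

TESTS = {
    "llama3.2":       "Explain the SOLID principles with Python examples",
    "deepseek-r1:8b": "Design a distributed task queue system",
    "gemma3:4b":      "Explain LLM quantization: FP16 vs INT8 vs INT4",
    "mistral":        "Explain CAP theorem with database examples",
}

def build_cmd(model: str, prompt: str) -> list[str]:
    """Assemble the non-interactive `ollama run` invocation."""
    return ["ollama", "run", model, prompt]

def run_prompt(model: str, prompt: str, timeout: int = 300) -> str:
    """Run one prompt non-interactively and return the reply text."""
    result = subprocess.run(build_cmd(model, prompt),
                            capture_output=True, text=True, timeout=timeout)
    return result.stdout

if __name__ == "__main__":
    for model, prompt in TESTS.items():
        print(f"=== {model} ===")
        try:
            print(run_prompt(model, prompt)[:300])
        except FileNotFoundError:
            print("ollama CLI not found on PATH")
            break
```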

API & Python Integration

Ollama exposes a REST API on port 11434 and provides official Python and JavaScript SDKs for seamless app integration.

cURL (Linux / macOS / Windows Terminal)
curl http://localhost:11434/api/chat \
  -d '{
    "model": "llama3.2",
    "messages": [
      { "role": "user", "content": "Explain Docker in 3 sentences" }
    ],
    "stream": false
  }'
PowerShell (Windows native)
# PowerShell equivalent using Invoke-WebRequest
(Invoke-WebRequest -Method POST `
  -Body '{"model":"llama3.2", "prompt":"Explain Docker in 3 sentences", "stream": false}' `
  -Uri http://localhost:11434/api/generate
).Content | ConvertFrom-Json
List Local Models
curl http://localhost:11434/api/tags
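The same listing works from Python with the standard library; a sketch that parses the JSON returned by `/api/tags`, assuming the response contains a `models` array whose entries carry a `name` field:

```python
# GET /api/tags and extract the local model names; standard library only.
import json
from urllib.request import urlopen

def model_names(payload: dict) -> list[str]:
    """Pull the `name` field out of an /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]

def list_local_models(host: str = "http://localhost:11434") -> list[str]:
    with urlopen(f"{host}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))

if __name__ == "__main__":
    try:
        print("\n".join(list_local_models()) or "(no models downloaded)")
    except OSError:
        print("server not reachable on port 11434")
```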
Install SDK
pip install ollama
basic_chat.py
from ollama import chat
from ollama import ChatResponse

response: ChatResponse = chat(
    model='llama3.2',
    messages=[
        {
            'role': 'user',
            'content': 'Explain microservices vs monolith',
        },
    ],
)

print(response.message.content)
streaming_chat.py
from ollama import chat

stream = chat(
    model='gemma3:4b',
    messages=[{
        'role': 'user',
        'content': 'Write a Python function for binary search'
    }],
    stream=True,
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
compare_models.py
# Benchmark the same prompt across multiple models
from ollama import chat
import time

models = ['llama3.2', 'gemma3:4b', 'mistral']
prompt = "Explain the Observer design pattern with a Python example"

for model in models:
    print(f"\n{'='*50}\nModel: {model}\n{'='*50}")
    start = time.time()
    response = chat(
        model=model,
        messages=[{'role': 'user', 'content': prompt}],
    )
    elapsed = time.time() - start
    print(response.message.content[:500])
    print(f"\n⏱ Response time: {elapsed:.1f}s")
Install
npm install ollama
index.mjs
import ollama from "ollama";

const response = await ollama.chat({
  model: "llama3.2",
  messages: [{
    role: "user",
    content: "Explain async/await in JavaScript",
  }],
});

console.log(response.message.content);

Troubleshooting

Quick fixes for the most common Ollama issues across all platforms.

The Ollama server isn't running. Start it for your platform:

Fix per OS
# 🪟 Windows — Restart from system tray or Start Menu
# Right-click the Ollama tray icon → Quit, then relaunch
# Or run manually:
ollama serve

# 🐧 Linux
sudo systemctl start ollama

# 🍎 macOS — open the Ollama app from Applications, or:
ollama serve
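A quick programmatic variant of the same check: test whether anything is listening on port 11434 at all. A Python sketch using only the standard library:

```python
# Check whether anything is listening on the Ollama port at all.
import socket

def port_open(host: str = "127.0.0.1", port: int = 11434,
              timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    if port_open():
        print("Port 11434 is open; the server process is up.")
    else:
        print("Nothing listening on 11434; start Ollama per the steps above.")
```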

The model may be running on CPU instead of GPU. Diagnose with:

Diagnose & Fix
# Check processor allocation
ollama ps
# PROCESSOR column: "100% GPU" = good  |  "100% CPU" = slow

# If on CPU, try a smaller model
ollama run llama3.2:1b

# Check NVIDIA GPU memory
nvidia-smi

# Check system RAM
free -h           # Linux
vm_stat           # macOS (or check Activity Monitor)
systeminfo        # Windows (look for "Available Physical Memory")
Solutions
# 1. Stop other running models first
ollama stop llama3.2

# 2. Use a smaller variant
ollama run llama3.2:1b    # instead of 3b
ollama run gemma3:1b      # instead of 4b

# 3. Remove unused models to free disk space
ollama rm deepseek-r1:8b

# 4. Audit what's in memory
ollama ps
🪟 Windows
# Open log files via Run dialog (Win+R):
%LOCALAPPDATA%\Ollama\app.log      # GUI app log
%LOCALAPPDATA%\Ollama\server.log   # Server log

# Or view in PowerShell:
Get-Content "$env:LOCALAPPDATA\Ollama\server.log" -Tail 50
🐧 Linux
# Stream live logs
journalctl -u ollama.service -f

# View last 50 lines
journalctl -u ollama.service -n 50