Complete Implementation Guide — 2026 Edition

Getting Started
with Ollama

Install, configure, and master local LLMs — from your first ollama run to production-ready API integrations.

🪟 Windows (Native) 🐧 Linux / Ubuntu 🍎 macOS 🦙 5 Models Covered 🐍 Python & REST API

Prerequisites

Ensure your system meets the requirements and your tools are current before installing Ollama.

⚙️
System Requirements
Requirement | Minimum | Recommended
RAM | 8 GB | 16 GB+
Disk Space | 12 GB free | 50 GB+ free
OS (Windows) | Windows 10 (1903+) | Windows 11
OS (Linux) | Ubuntu 20.04+ | Ubuntu 22.04 / 24.04
OS (macOS) | macOS 12+ | macOS 14+ (Apple Silicon)
GPU (Optional) | NVIDIA 6 GB / AMD Radeon | NVIDIA 8 GB+ / Apple M-series
🔧
Update Your System First
PowerShell
# Check your Windows version (must be 10 1903+ or 11)
winver

# Ensure Windows is updated via Settings → Windows Update
# No additional package managers needed — Ollama has a
# native .exe installer that works without Admin rights

# Optional: Install Windows Terminal for a better CLI experience
winget install Microsoft.WindowsTerminal
🪟
No WSL needed! Ollama runs natively on Windows 10/11. The installer is a simple .exe that doesn't require Administrator privileges. It auto-starts in your system tray.
Bash — Terminal
# Update package lists and upgrade installed packages
sudo apt update && sudo apt upgrade -y

# Install essential tools
sudo apt install -y curl wget git

# Verify curl is installed
curl --version
💡
Why sudo? Many install commands need root privileges to write to system directories. The sudo prefix temporarily elevates your permissions and will prompt for your password.
Bash — Terminal
# Install Homebrew if you don't have it
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Update Homebrew and install essentials
brew update
brew install curl wget git

# Verify macOS version
sw_vers
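Before installing, you can also script the disk-space check from the requirements table. A minimal Python sketch using only the standard library; the 12 GB threshold mirrors the table's minimum and is adjustable:

```python
# Preflight check: verify free disk space before pulling models.
# The threshold mirrors the requirements table; adjust to taste.
import shutil

MIN_FREE_GB = 12   # table minimum; 50 GB+ recommended

def free_gb(path: str = ".") -> float:
    """Return free disk space at `path` in gigabytes."""
    return shutil.disk_usage(path).free / 1024**3

if __name__ == "__main__":
    free = free_gb()
    status = "OK" if free >= MIN_FREE_GB else "LOW"
    print(f"Free disk: {free:.1f} GB [{status}]")
```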

Install Ollama

Ollama runs natively on all three platforms — no virtual machines required. Pick your OS below.

🎯
Native Windows — No WSL required. Ollama installs as a native Windows application with full NVIDIA and AMD Radeon GPU support. It runs in the background via system tray, and the CLI works in CMD, PowerShell, or Windows Terminal.
1
Download the Installer
Grab OllamaSetup.exe from the official site. No admin rights needed.
Browser or PowerShell
# Option A: Open in browser
https://ollama.com/download/windows

# Option B: Download via PowerShell
Invoke-WebRequest -Uri "https://ollama.com/download/OllamaSetup.exe" -OutFile "OllamaSetup.exe"
2
Run the Installer
Double-click OllamaSetup.exe and click Install. The setup takes about 30 seconds. After installation, you'll see the 🦙 Ollama icon appear in your Windows system tray (bottom-right corner). The server starts automatically.
3
Verify in Terminal
Open Command Prompt, PowerShell, or Windows Terminal and run:
CMD / PowerShell / Windows Terminal
# Check Ollama version
ollama --version
# Expected: ollama version 0.x.x

# Test the server is responding
curl http://localhost:11434
# Expected: Ollama is running
4
Run Your First Model!
That's it — you're ready. Pull and run a model right away:
Terminal
ollama run llama3.2
# Downloads the model (~2 GB) then opens interactive chat
>>> Hello! Tell me about yourself.
Health Check — Open http://localhost:11434 in your browser. You should see "Ollama is running". The API and CLI both use port 11434.

Ollama stores files in these Windows locations. Press Win+R and paste these paths:

Path | Contains
%LOCALAPPDATA%\Ollama | Logs (app.log, server.log) and downloaded updates
%LOCALAPPDATA%\Programs\Ollama | Binaries (auto-added to your user PATH)
%USERPROFILE%\.ollama | Downloaded models and configuration

To change model storage location, set the OLLAMA_MODELS environment variable via Windows Settings → "Environment variables". To uninstall, use Settings → Add or remove programs → Ollama.

GPU Acceleration — Ollama automatically detects and uses NVIDIA (CUDA) and AMD Radeon GPUs on Windows. No additional driver setup needed beyond having your standard GPU drivers installed.
1
Run the Official Install Script
This single command downloads and installs Ollama, sets up the systemd service, and configures everything automatically.
Terminal
curl -fsSL https://ollama.com/install.sh | sh
2
Verify the Installation
Confirm Ollama is installed and the systemd service is running.
Terminal
# Check Ollama version
ollama --version
# Expected: ollama version 0.x.x

# Check the service status
sudo systemctl status ollama
# Should show: ● ollama.service — active (running)
3
Enable Auto-Start on Boot
Ensure Ollama starts automatically every time your machine boots.
Terminal
sudo systemctl enable ollama
sudo systemctl restart ollama
Quick Health Check — Open your browser and visit http://localhost:11434. You should see "Ollama is running".
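The browser check can also be scripted. A small Python sketch (standard library only) that probes the default endpoint; the URL and port are the defaults used throughout this guide:

```python
# Probe the Ollama server's root endpoint; a healthy server answers HTTP 200
# with the body "Ollama is running".
from urllib.request import urlopen

def ollama_alive(url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if the server answers with HTTP 200."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:      # connection refused, timeout, bad hostname, ...
        return False

if __name__ == "__main__":
    print("Ollama is up" if ollama_alive() else "Ollama is not responding")
```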
1
Download Ollama
Option A — Homebrew
brew install ollama
Option B — Direct Download
open https://ollama.com/download
2
Install & Launch
If using the .zip download, extract it and move Ollama.app to your Applications folder, then launch it — the llama icon will appear in your menu bar.
3
Verify Installation
Terminal
ollama --version

# Test that the server is responsive
curl http://localhost:11434
# Expected: Ollama is running
🍎
Apple Silicon Advantage — M1/M2/M3/M4 Macs automatically use the Metal GPU for acceleration. Zero extra drivers or configuration needed.

Sign In to Ollama

Signing in lets you push custom models to the registry. It's completely optional for pulling and running public models.

1
Create an Account
Visit ollama.com and click Sign Up. You can register with your email or GitHub account.
2
Authenticate via CLI
Any Terminal (Windows / Linux / macOS)
# Opens a browser window for OAuth authentication
ollama signin
3
Explore the Model Library
Hundreds of models at ollama.com/library — from 1B parameter pocket models to 70B+ powerhouses.
🔓
No account needed for most tasks. Pull and run any public model anonymously. Sign-in is only required to push your own custom models to the Ollama registry.

CLI Deep Dive

Every Ollama command you'll ever need — identical across Windows, macOS, and Linux.

🔄
Same CLI everywhere. All ollama commands work identically in Windows CMD/PowerShell, macOS Terminal, and Linux bash. The examples below work on all platforms.
⌨️
Complete Command Reference
Command | What It Does
ollama serve | Start the Ollama server (port 11434 by default)
ollama run <model> | Download (if needed) and start interactive chat
ollama pull <model> | Download a model without starting it
ollama list | Show all locally downloaded models with sizes
ollama ps | Show currently loaded / running models in memory
ollama show <model> | Display model details — parameters, template, license
ollama stop <model> | Unload a model from memory
ollama rm <model> | Permanently delete a model from disk
ollama cp <src> <dst> | Clone a model under a new name
ollama create <name> | Create a custom model from a Modelfile
ollama push <model> | Push a custom model to the Ollama registry
ollama signin | Sign in to your Ollama account
ollama signout | Sign out of your Ollama account
ollama --version | Display installed Ollama version
Any Terminal (Works on All OS)
# Step 1: Ensure the server is running
# Windows: It auto-starts via system tray
# Linux:   sudo systemctl start ollama
# macOS:   Open the Ollama app, or run:
ollama serve &

# Step 2: Pull some models
ollama pull llama3.2:1b        # Meta Llama 3.2  — ~1.3 GB
ollama pull gemma3:4b          # Google Gemma 3  — ~2.5 GB
ollama pull deepseek-r1:8b     # DeepSeek R1     — ~4.9 GB
ollama pull qwen3:7b           # Qwen 3          — ~4.4 GB
ollama pull mistral            # Mistral 7B      — ~4.1 GB

# Step 3: List downloaded models
ollama list
# NAME              ID            SIZE      MODIFIED
# llama3.2:1b       a80c4f17...   1.3 GB    just now

# Step 4: Start an interactive session
ollama run llama3.2:1b
>>> Hello! What can you help me with?

# Step 5: In another terminal — check what's loaded
ollama ps

# Step 6: Show model details
ollama show llama3.2:1b

# Step 7: Unload when done
ollama stop llama3.2:1b
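The same workflow can be driven from Python by shelling out to the CLI. A sketch that parses `ollama list` output into dictionaries; the column layout (NAME, ID, SIZE, MODIFIED) is assumed to match the example output in Step 3:

```python
# Script the workflow: call `ollama list` and parse its table output.
import subprocess

def parse_ollama_list(output: str) -> list[dict]:
    """Turn `ollama list` output into [{'name': ..., 'size': ...}, ...]."""
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    models = []
    for line in lines[1:]:                    # skip the header row
        parts = line.split()                  # NAME ID SIZE UNIT MODIFIED...
        models.append({"name": parts[0], "size": " ".join(parts[2:4])})
    return models

if __name__ == "__main__":
    try:
        out = subprocess.run(["ollama", "list"],
                             capture_output=True, text=True).stdout
        for m in parse_ollama_list(out):
            print(f"{m['name']}: {m['size']}")
    except FileNotFoundError:
        print("ollama CLI not found on PATH")
```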

Inside an ollama run session, these slash commands are available:

Command | Action
/bye | Exit the chat session
/clear | Clear the conversation context
/set system <msg> | Set a system prompt for the session
/show info | Display current model information
/show license | Show the model's license
/load <model> | Switch to a different model mid-session
/save <name> | Save the current session state
/? | Show all available slash commands

🪟 Windows

PowerShell or CMD
# Set environment variables for your user account
setx OLLAMA_HOST "0.0.0.0"
setx OLLAMA_ORIGINS "*"
setx OLLAMA_MODELS "D:\ollama-models"  # Optional: change model storage

# Or via GUI: Settings → search "environment variables"
# → Edit environment variables for your account

# After setting, quit Ollama from system tray and relaunch

🐧 Linux — systemd service override

Bash
# Edit the Ollama service configuration
sudo systemctl edit ollama

# Add these lines under [Service]:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_ORIGINS=*"
Environment="OLLAMA_NUM_PARALLEL=2"

# Save and reload
sudo systemctl daemon-reload
sudo systemctl restart ollama

🍎 macOS — launchctl

Terminal
launchctl setenv OLLAMA_HOST "0.0.0.0"
launchctl setenv OLLAMA_ORIGINS "*"
# Then restart the Ollama app from menu bar

Ollama integrates with a wide range of tools. See the full list at docs.ollama.com/integrations.

Category | Tools | Description
Coding Agents | Claude Code, Codex, OpenCode, Droid, Goose, Pi | AI agents that read, modify, and execute code
Assistants | OpenClaw | General-purpose AI assistant
IDEs | VS Code, Cline, Roo Code, JetBrains, Xcode, Zed | Native IDE integrations
Chat & RAG | Onyx | Chat interface with retrieval-augmented generation
Automation | n8n | Workflow automation with AI
Notebooks | marimo | Interactive computing with AI

Install & Run Models

Five powerful open-source LLMs with install commands. These commands work identically on Windows, Linux, and macOS.

📦
Storage Note: Models range from ~1.3 GB to ~4.9 GB each. Check free space first: df -h (Linux/macOS) or dir C:\ (Windows).
🦙 Llama 3.2
Meta AI1B / 3BGeneral Purpose

Meta's latest compact model family — excellent for general conversation, reasoning, and code. The 3B variant offers strong performance while staying lightweight enough for most hardware.

Install & Run
# Pull the 3B model (recommended starting point — ~2 GB)
ollama pull llama3.2

# Or the lighter 1B version (~1.3 GB)
ollama pull llama3.2:1b

# Run interactively
ollama run llama3.2
>>> Explain the concept of microservices architecture
🧠 DeepSeek R1
DeepSeek AI8B / 14BChain-of-Thought

A powerful reasoning model with visible chain-of-thought steps. Exceptional at complex problem-solving, mathematics, and logical inference tasks.

Install & Run
# Pull the 8B reasoning model (~4.9 GB)
ollama pull deepseek-r1:8b

# Run with thinking output visible
ollama run deepseek-r1:8b --think
>>> Design a rate limiter for an API gateway
💎 Gemma 3
Google4B / 12BMultimodal

Google's open model with multimodal support (text + images). Fast, efficient, and excellent for its size class across coding and language tasks.

Install & Run
# Pull the 4B model (~2.5 GB)
ollama pull gemma3:4b

ollama run gemma3:4b
>>> What is the difference between REST and GraphQL?
🌐 Qwen 3
Alibaba7B / 14B / 32BMultilingual

Alibaba's top-performing open model family — exceptional at multilingual tasks, coding, and reasoning. One of the best open models available in 2026.

Install & Run
# Pull the 7B model (~4.4 GB)
ollama pull qwen3:7b

ollama run qwen3:7b
>>> Write a Python decorator for caching with TTL
🌊 Mistral
Mistral AI7BFast & Efficient

Mistral's flagship 7B model — impressively fast with low latency. Punches well above its weight class for coding, summarisation, and instruction-following.

Install & Run
# Pull Mistral 7B (~4.1 GB)
ollama pull mistral

ollama run mistral
>>> Explain how Docker containers work
📋
Pull all 5 models in one command:
Terminal (any OS)
ollama pull llama3.2 && ollama pull deepseek-r1:8b && ollama pull gemma3:4b && ollama pull qwen3:7b && ollama pull mistral
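If you prefer scripting the downloads, the same pulls can go through the REST API's `/api/pull` endpoint using only the Python standard library. A sketch, assuming the server is running on the default port and that the endpoint accepts a JSON body with `model` and `stream` fields:

```python
# Pull each model through POST /api/pull; standard library only.
import json
from urllib.request import Request, urlopen

MODELS = ["llama3.2", "deepseek-r1:8b", "gemma3:4b", "qwen3:7b", "mistral"]

def pull_request(model: str, host: str = "http://localhost:11434") -> Request:
    """Build the POST /api/pull request for one model tag."""
    body = json.dumps({"model": model, "stream": False}).encode()
    return Request(f"{host}/api/pull", data=body,
                   headers={"Content-Type": "application/json"})

if __name__ == "__main__":
    for tag in MODELS:
        try:
            with urlopen(pull_request(tag)) as resp:
                print(tag, "->", json.load(resp).get("status"))
        except OSError as err:
            print(f"{tag}: server not reachable ({err})")
            break
```

Note that non-streaming pulls block until each download finishes, so expect the loop to take a while on a fresh install.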

Test & Validate Each Model

Three computer science prompts per model covering software development, AI agents, and LLM internals. Run these to compare quality.

🦙 Testing Llama 3.2
Launch
ollama run llama3.2
Question 1 — Development
Explain the SOLID principles in software engineering with a real-world Python example for each principle.
Question 2 — AI Agents
What is the ReAct (Reasoning + Acting) pattern in AI agents? How does it differ from simple chain-of-thought prompting? Provide a pseudocode example.
Question 3 — LLMs
Explain the transformer attention mechanism. What are Query, Key, and Value vectors, and how does multi-head attention improve over single-head attention?
Non-Interactive One-Liners
ollama run llama3.2 "Explain the SOLID principles with Python examples"
ollama run llama3.2 "What is the ReAct pattern in AI agents?"
ollama run llama3.2 "Explain the transformer attention mechanism"
🧠 Testing DeepSeek R1
Launch
ollama run deepseek-r1:8b
Question 1 — Development
Design a distributed task queue system (like Celery). What components are needed, how do you handle failures, and what are the trade-offs between at-least-once vs exactly-once delivery?
Question 2 — AI Agents
How would you implement a tool-using AI agent that can search the web, execute code, and manage files? Describe the architecture with a state machine diagram.
Question 3 — LLMs
Compare RAG (Retrieval Augmented Generation) vs fine-tuning for adding domain knowledge to an LLM. When should you use each approach? Provide concrete scenarios.
Non-Interactive One-Liners
ollama run deepseek-r1:8b "Design a distributed task queue system"
ollama run deepseek-r1:8b "How to implement a tool-using AI agent?"
ollama run deepseek-r1:8b "Compare RAG vs fine-tuning for LLMs"
💎 Testing Gemma 3
Launch
ollama run gemma3:4b
Question 1 — Development
Explain event-driven architecture. Write a Python example using an event bus pattern with publishers and subscribers for an e-commerce order system.
Question 2 — AI Agents
What are multi-agent systems? Explain how frameworks like CrewAI or AutoGen orchestrate multiple AI agents to collaborate on complex tasks.
Question 3 — LLMs
What is quantization in LLMs? Explain the difference between FP16, INT8, and INT4 quantization and their impact on model quality and performance.
Non-Interactive One-Liners
ollama run gemma3:4b "Explain event-driven architecture with Python code"
ollama run gemma3:4b "What are multi-agent systems like CrewAI?"
ollama run gemma3:4b "Explain LLM quantization: FP16 vs INT8 vs INT4"
🌐 Testing Qwen 3
Launch
ollama run qwen3:7b
Question 1 — Development
Implement a simple API gateway in Python using FastAPI that handles rate limiting, authentication via JWT tokens, and request routing to microservices.
Question 2 — AI Agents
Explain the concept of "function calling" in LLMs. How do models decide when and how to call external tools? Show the JSON schema approach.
Question 3 — LLMs
What are the key differences between decoder-only (GPT), encoder-only (BERT), and encoder-decoder (T5) transformer architectures? When would you choose each?
Non-Interactive One-Liners
ollama run qwen3:7b "Build an API gateway with FastAPI, JWT, and rate limiting"
ollama run qwen3:7b "Explain function calling in LLMs with JSON schema"
ollama run qwen3:7b "Decoder-only vs encoder-only vs encoder-decoder transformers"
🌊 Testing Mistral
Launch
ollama run mistral
Question 1 — Development
Explain the CAP theorem with real-world examples. How do databases like PostgreSQL, MongoDB, and Cassandra make different trade-offs?
Question 2 — AI Agents
What is the MCP (Model Context Protocol) by Anthropic? How does it standardize tool use and external service integration for AI agents?
Question 3 — LLMs
Explain the concept of "context window" in LLMs. How do techniques like sliding window attention, sparse attention, and RoPE help extend context length?
Non-Interactive One-Liners
ollama run mistral "Explain CAP theorem with database examples"
ollama run mistral "What is MCP (Model Context Protocol) by Anthropic?"
ollama run mistral "Explain context windows and attention mechanisms in LLMs"
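The non-interactive one-liners above lend themselves to batch testing. A Python sketch that runs one abbreviated prompt per model via `subprocess`; the model tags and prompts are taken from the sections above:

```python
# Batch-run one abbreviated prompt per model and print the replies.
import subprocess

TESTS = {
    "llama3.2":       "Explain the SOLID principles with Python examples",
    "deepseek-r1:8b": "Design a distributed task queue system",
    "gemma3:4b":      "Explain LLM quantization: FP16 vs INT8 vs INT4",
    "mistral":        "Explain CAP theorem with database examples",
}

def build_cmd(model: str, prompt: str) -> list[str]:
    """Assemble the non-interactive `ollama run` invocation."""
    return ["ollama", "run", model, prompt]

def run_prompt(model: str, prompt: str, timeout: int = 300) -> str:
    """Run one prompt non-interactively and return the reply text."""
    result = subprocess.run(build_cmd(model, prompt),
                            capture_output=True, text=True, timeout=timeout)
    return result.stdout

if __name__ == "__main__":
    for model, prompt in TESTS.items():
        print(f"=== {model} ===")
        try:
            print(run_prompt(model, prompt)[:300])
        except FileNotFoundError:
            print("ollama CLI not found on PATH")
            break
```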

API & Python Integration

Ollama exposes a REST API on port 11434 and provides official Python and JavaScript SDKs for seamless app integration.

cURL (Linux / macOS / Windows Terminal)
curl http://localhost:11434/api/chat \
  -d '{
    "model": "llama3.2",
    "messages": [
      { "role": "user", "content": "Explain Docker in 3 sentences" }
    ],
    "stream": false
  }'
PowerShell (Windows native)
# PowerShell equivalent using Invoke-WebRequest
(Invoke-WebRequest -Method POST `
  -Body '{"model":"llama3.2", "prompt":"Explain Docker in 3 sentences", "stream": false}' `
  -Uri http://localhost:11434/api/generate
).Content | ConvertFrom-Json
List Local Models
curl http://localhost:11434/api/tags
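The same listing works from Python with the standard library; a sketch that parses the JSON returned by `/api/tags`, assuming the response contains a `models` array whose entries carry a `name` field:

```python
# GET /api/tags and extract the local model names; standard library only.
import json
from urllib.request import urlopen

def model_names(payload: dict) -> list[str]:
    """Pull the `name` field out of an /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]

def list_local_models(host: str = "http://localhost:11434") -> list[str]:
    with urlopen(f"{host}/api/tags", timeout=5) as resp:
        return model_names(json.load(resp))

if __name__ == "__main__":
    try:
        print("\n".join(list_local_models()) or "(no models downloaded)")
    except OSError:
        print("server not reachable on port 11434")
```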
Install SDK
pip install ollama
basic_chat.py
from ollama import chat
from ollama import ChatResponse

response: ChatResponse = chat(
    model='llama3.2',
    messages=[
        {
            'role': 'user',
            'content': 'Explain microservices vs monolith',
        },
    ],
)

print(response.message.content)
streaming_chat.py
from ollama import chat

stream = chat(
    model='gemma3:4b',
    messages=[{
        'role': 'user',
        'content': 'Write a Python function for binary search'
    }],
    stream=True,
)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
compare_models.py
# Benchmark the same prompt across multiple models
from ollama import chat
import time

models = ['llama3.2', 'gemma3:4b', 'mistral']
prompt = "Explain the Observer design pattern with a Python example"

for model in models:
    print(f"\n{'='*50}\nModel: {model}\n{'='*50}")
    start = time.time()
    response = chat(
        model=model,
        messages=[{'role': 'user', 'content': prompt}],
    )
    elapsed = time.time() - start
    print(response.message.content[:500])
    print(f"\n⏱ Response time: {elapsed:.1f}s")
Install
npm install ollama
index.mjs
import ollama from "ollama";

const response = await ollama.chat({
  model: "llama3.2",
  messages: [{
    role: "user",
    content: "Explain async/await in JavaScript",
  }],
});

console.log(response.message.content);

Troubleshooting

Quick fixes for the most common Ollama issues across all platforms.

The Ollama server isn't running. Start it for your platform:

Fix per OS
# 🪟 Windows — Restart from system tray or Start Menu
# Right-click the Ollama tray icon → Quit, then relaunch
# Or run manually:
ollama serve

# 🐧 Linux
sudo systemctl start ollama

# 🍎 macOS — open the Ollama app from Applications, or:
ollama serve
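A quick programmatic variant of the same check: test whether anything is listening on port 11434 at all. A Python sketch using only the standard library:

```python
# Check whether anything is listening on the Ollama port at all.
import socket

def port_open(host: str = "127.0.0.1", port: int = 11434,
              timeout: float = 1.0) -> bool:
    """True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    if port_open():
        print("Port 11434 is open; the server process is up.")
    else:
        print("Nothing listening on 11434; start Ollama per the steps above.")
```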

The model may be running on CPU instead of GPU. Diagnose with:

Diagnose & Fix
# Check processor allocation
ollama ps
# PROCESSOR column: "100% GPU" = good  |  "100% CPU" = slow

# If on CPU, try a smaller model
ollama run llama3.2:1b

# Check NVIDIA GPU memory
nvidia-smi

# Check system RAM
free -h           # Linux
vm_stat           # macOS (or check Activity Monitor)
systeminfo        # Windows (look for "Available Physical Memory")
Solutions
# 1. Stop other running models first
ollama stop llama3.2

# 2. Use a smaller variant
ollama run llama3.2:1b    # instead of 3b
ollama run gemma3:1b      # instead of 4b

# 3. Remove unused models to free disk space
ollama rm deepseek-r1:8b

# 4. Audit what's in memory
ollama ps
🪟 Windows
# Open log files via Run dialog (Win+R):
%LOCALAPPDATA%\Ollama\app.log      # GUI app log
%LOCALAPPDATA%\Ollama\server.log   # Server log

# Or view in PowerShell:
Get-Content "$env:LOCALAPPDATA\Ollama\server.log" -Tail 50
🐧 Linux
# Stream live logs
journalctl -u ollama.service -f

# View last 50 lines
journalctl -u ollama.service -n 50