Prerequisites
Ensure your system meets the requirements and your tools are current before installing Ollama.
| Requirement | Minimum | Recommended |
|---|---|---|
| RAM | 8 GB | 16 GB+ |
| Disk Space | 12 GB free | 50 GB+ free |
| OS (Windows) | Windows 10 (1903+) | Windows 11 |
| OS (Linux) | Ubuntu 20.04+ | Ubuntu 22.04 / 24.04 |
| OS (macOS) | macOS 12+ | macOS 14+ (Apple Silicon) |
| GPU (Optional) | NVIDIA 6 GB / AMD Radeon | NVIDIA 8 GB+ / Apple M-series |
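The disk requirement above is easy to check programmatically before you start downloading models. A minimal sketch using only the standard library; `has_free_space` is a hypothetical helper, and the 12 GB figure comes from the table:

```python
import shutil

def has_free_space(path: str, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least `required_bytes` free."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes

# 12 GB minimum, from the requirements table
MIN_DISK = 12 * 1024**3

if __name__ == "__main__":
    ok = has_free_space(".", MIN_DISK)
    print("Disk OK" if ok else "Free up disk space before installing")
```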
```powershell
# Check your Windows version (must be 10 1903+ or 11)
winver

# Ensure Windows is updated via Settings → Windows Update

# No additional package managers needed — Ollama has a
# native .exe installer that works without Admin rights

# Optional: Install Windows Terminal for a better CLI experience
winget install Microsoft.WindowsTerminal
```
Ollama ships as a native .exe that doesn't require Administrator privileges. It auto-starts in your system tray. On Linux, prepare your system first:

```shell
# Update package lists and upgrade installed packages
sudo apt update && sudo apt upgrade -y

# Install essential tools
sudo apt install -y curl wget git

# Verify curl is installed
curl --version
```
Why sudo? Many install commands need root privileges to write to system directories. The sudo prefix temporarily elevates your permissions and will prompt for your password.

On macOS, use Homebrew to prepare your system:

```shell
# Install Homebrew if you don't have it
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# Update Homebrew and install essentials
brew update
brew install curl wget git

# Verify macOS version
sw_vers
```
Install Ollama
Ollama runs natively on all three platforms — no virtual machines required. Pick your OS below.
Download OllamaSetup.exe from the official site. No admin rights needed.

```powershell
# Option A: Open in browser
# https://ollama.com/download/windows

# Option B: Download via PowerShell
Invoke-WebRequest -Uri "https://ollama.com/download/OllamaSetup.exe" -OutFile "OllamaSetup.exe"
```
Run OllamaSetup.exe and click Install. The setup takes about 30 seconds. After installation, you'll see the 🦙 Ollama icon appear in your Windows system tray (bottom-right corner). The server starts automatically.

```powershell
# Check Ollama version
ollama --version
# Expected: ollama version 0.x.x

# Test the server is responding
curl http://localhost:11434
# Expected: Ollama is running
```
```powershell
# Downloads the model (~2 GB) then opens interactive chat
ollama run llama3.2
>>> Hello! Tell me about yourself.
```
Open http://localhost:11434 in your browser. You should see "Ollama is running". The API and CLI both use port 11434.

Ollama stores files in these Windows locations. Press Win+R and paste these paths:
| Path | Contains |
|---|---|
| %LOCALAPPDATA%\Ollama | Logs (app.log, server.log) and downloaded updates |
| %LOCALAPPDATA%\Programs\Ollama | Binaries (auto-added to your user PATH) |
| %USERPROFILE%\.ollama | Downloaded models and configuration |
To change model storage location, set the OLLAMA_MODELS environment variable via Windows Settings → "Environment variables". To uninstall, use Settings → Add or remove programs → Ollama.
```shell
curl -fsSL https://ollama.com/install.sh | sh
```
```shell
# Check Ollama version
ollama --version
# Expected: ollama version 0.x.x

# Check the service status
sudo systemctl status ollama
# Should show: ● ollama.service — active (running)
```
```shell
sudo systemctl enable ollama
sudo systemctl restart ollama
```
Open http://localhost:11434 in your browser. You should see "Ollama is running".

On macOS, install via Homebrew or download the desktop app:

```shell
# Option A: Homebrew
brew install ollama

# Option B: Download the desktop app
open https://ollama.com/download

# Verify the install
ollama --version

# Test that the server is responsive
curl http://localhost:11434
# Expected: Ollama is running
```
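The browser check can also be scripted for use in setup scripts or CI. A minimal sketch using only the standard library; `check_ollama` is a hypothetical helper that just confirms the server answers on port 11434:

```python
import urllib.request
import urllib.error

def check_ollama(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server responds at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            # The root endpoint replies with the plain text "Ollama is running"
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    print("Server up" if check_ollama() else "Server not reachable on port 11434")
```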
Sign In to Ollama
Signing in lets you push custom models to the registry. It is completely optional for pulling and running public models.
```shell
# Opens a browser window for OAuth authentication
ollama signin
```
CLI Deep Dive
Every Ollama command you'll ever need — identical across Windows, macOS, and Linux.
All ollama commands work identically in Windows CMD/PowerShell, macOS Terminal, and Linux bash.

| Command | What It Does |
|---|---|
| ollama serve | Start the Ollama server (port 11434 by default) |
| ollama run <model> | Download (if needed) and start interactive chat |
| ollama pull <model> | Download a model without starting it |
| ollama list | Show all locally downloaded models with sizes |
| ollama ps | Show currently loaded / running models in memory |
| ollama show <model> | Display model details — parameters, template, license |
| ollama stop <model> | Unload a model from memory |
| ollama rm <model> | Permanently delete a model from disk |
| ollama cp <src> <dst> | Clone a model under a new name |
| ollama create <name> | Create a custom model from a Modelfile |
| ollama push <model> | Push a custom model to the Ollama registry |
| ollama signin | Sign in to your Ollama account |
| ollama signout | Sign out of your Ollama account |
| ollama --version | Display installed Ollama version |
```shell
# Step 1: Ensure the server is running
#   Windows: It auto-starts via system tray
#   Linux:   sudo systemctl start ollama
#   macOS:   Open the Ollama app, or run: ollama serve &

# Step 2: Pull some models
ollama pull llama3.2:1b     # Meta Llama 3.2 — ~1.3 GB
ollama pull gemma3:4b       # Google Gemma 3 — ~2.5 GB
ollama pull deepseek-r1:8b  # DeepSeek R1 — ~4.9 GB
ollama pull qwen3:7b        # Qwen 3 — ~4.4 GB
ollama pull mistral         # Mistral 7B — ~4.1 GB

# Step 3: List downloaded models
ollama list
# NAME            ID           SIZE    MODIFIED
# llama3.2:1b     a80c4f17...  1.3 GB  just now

# Step 4: Start an interactive session
ollama run llama3.2:1b
>>> Hello! What can you help me with?

# Step 5: In another terminal — check what's loaded
ollama ps

# Step 6: Show model details
ollama show llama3.2:1b

# Step 7: Unload when done
ollama stop llama3.2:1b
```
Inside an ollama run session, these slash commands are available:
| Command | Action |
|---|---|
| /bye | Exit the chat session |
| /clear | Clear the conversation context |
| /set system <msg> | Set a system prompt for the session |
| /show info | Display current model information |
| /show license | Show the model's license |
| /load <model> | Switch to a different model mid-session |
| /save <name> | Save the current session state |
| /? | Show all available slash commands |
🪟 Windows
```powershell
# Set environment variables for your user account
setx OLLAMA_HOST "0.0.0.0"
setx OLLAMA_ORIGINS "*"
setx OLLAMA_MODELS "D:\ollama-models"   # Optional: change model storage

# Or via GUI: Settings → search "environment variables"
#   → Edit environment variables for your account

# After setting, quit Ollama from the system tray and relaunch
```
🐧 Linux — systemd service override
```shell
# Edit the Ollama service configuration
sudo systemctl edit ollama

# Add these lines under [Service]:
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
Environment="OLLAMA_ORIGINS=*"
Environment="OLLAMA_NUM_PARALLEL=2"

# Save and reload
sudo systemctl daemon-reload
sudo systemctl restart ollama
```
🍎 macOS — launchctl
```shell
launchctl setenv OLLAMA_HOST "0.0.0.0"
launchctl setenv OLLAMA_ORIGINS "*"

# Then restart the Ollama app from the menu bar
```
Ollama integrates with a wide range of tools. See the full list at docs.ollama.com/integrations.
| Category | Tools | Description |
|---|---|---|
| Coding Agents | Claude Code, Codex, OpenCode, Droid, Goose, Pi | AI agents that read, modify, and execute code |
| Assistants | OpenClaw | General-purpose AI assistant |
| IDEs | VS Code, Cline, Roo Code, JetBrains, Xcode, Zed | Native IDE integrations |
| Chat & RAG | Onyx | Chat interface with retrieval-augmented generation |
| Automation | n8n | Workflow automation with AI |
| Notebooks | marimo | Interactive computing with AI |
Install & Run Models
Five powerful open-source LLMs with install commands. These commands work identically on Windows, Linux, and macOS.
Check your free disk space first with df -h (Linux/macOS) or dir C:\ (Windows).

Meta's latest compact model family — excellent for general conversation, reasoning, and code. The 3B variant offers strong performance while staying lightweight enough for most hardware.
```shell
# Pull the 3B model (recommended starting point — ~2 GB)
ollama pull llama3.2

# Or the lighter 1B version (~1.3 GB)
ollama pull llama3.2:1b

# Run interactively
ollama run llama3.2
>>> Explain the concept of microservices architecture
```
A powerful reasoning model with visible chain-of-thought steps. Exceptional at complex problem-solving, mathematics, and logical inference tasks.
```shell
# Pull the 8B reasoning model (~4.9 GB)
ollama pull deepseek-r1:8b

# Run with thinking output visible
ollama run deepseek-r1:8b --think
>>> Design a rate limiter for an API gateway
```
Google's open model with multimodal support (text + images). Fast, efficient, and excellent for its size class across coding and language tasks.
```shell
# Pull the 4B model (~2.5 GB)
ollama pull gemma3:4b

ollama run gemma3:4b
>>> What is the difference between REST and GraphQL?
```
Alibaba's top-performing open model family — exceptional at multilingual tasks, coding, and reasoning. One of the best open models available in 2026.
```shell
# Pull the 7B model (~4.4 GB)
ollama pull qwen3:7b

ollama run qwen3:7b
>>> Write a Python decorator for caching with TTL
```
Mistral's flagship 7B model — impressively fast with low latency. Punches well above its weight class for coding, summarisation, and instruction-following.
```shell
# Pull Mistral 7B (~4.1 GB)
ollama pull mistral

ollama run mistral
>>> Explain how Docker containers work
```
```shell
ollama pull llama3.2 && ollama pull deepseek-r1:8b && ollama pull gemma3:4b && ollama pull qwen3:7b && ollama pull mistral
```
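The same batch pull can be scripted, which makes it easy to resume or reorder downloads. A minimal sketch using subprocess; the model names come from the list above, and it assumes the ollama CLI is on your PATH:

```python
import subprocess

MODELS = ["llama3.2", "deepseek-r1:8b", "gemma3:4b", "qwen3:7b", "mistral"]

def pull_commands(models: list[str]) -> list[list[str]]:
    """Build one `ollama pull` invocation per model."""
    return [["ollama", "pull", m] for m in models]

if __name__ == "__main__":
    for cmd in pull_commands(MODELS):
        print("Running:", " ".join(cmd))
        # check=False so one failed download doesn't abort the rest
        subprocess.run(cmd, check=False)
```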
Test & Validate Each Model
Three computer science prompts per model covering software development, AI agents, and LLM internals. Run these to compare quality.
```shell
ollama run llama3.2 "Explain the SOLID principles with Python examples"
ollama run llama3.2 "What is the ReAct pattern in AI agents?"
ollama run llama3.2 "Explain the transformer attention mechanism"
```
```shell
ollama run deepseek-r1:8b "Design a distributed task queue system"
ollama run deepseek-r1:8b "How to implement a tool-using AI agent?"
ollama run deepseek-r1:8b "Compare RAG vs fine-tuning for LLMs"
```
```shell
ollama run gemma3:4b "Explain event-driven architecture with Python code"
ollama run gemma3:4b "What are multi-agent systems like CrewAI?"
ollama run gemma3:4b "Explain LLM quantization: FP16 vs INT8 vs INT4"
```
```shell
ollama run qwen3:7b "Build an API gateway with FastAPI, JWT, and rate limiting"
ollama run qwen3:7b "Explain function calling in LLMs with JSON schema"
ollama run qwen3:7b "Decoder-only vs encoder-only vs encoder-decoder transformers"
```
```shell
ollama run mistral "Explain CAP theorem with database examples"
ollama run mistral "What is MCP (Model Context Protocol) by Anthropic?"
ollama run mistral "Explain context windows and attention mechanisms in LLMs"
```
API & Python Integration
Ollama exposes a REST API on port 11434 and provides official Python and JavaScript SDKs for seamless app integration.
```shell
curl http://localhost:11434/api/chat \
  -d '{
    "model": "llama3.2",
    "messages": [
      { "role": "user", "content": "Explain Docker in 3 sentences" }
    ],
    "stream": false
  }'
```
```powershell
# PowerShell equivalent using Invoke-WebRequest
(Invoke-WebRequest -Method POST `
  -Body '{"model":"llama3.2", "prompt":"Explain Docker in 3 sentences", "stream": false}' `
  -Uri http://localhost:11434/api/generate
).Content | ConvertFrom-Json
```
List available models via the API:

```shell
curl http://localhost:11434/api/tags
```

Install the official Python SDK:

```shell
pip install ollama
```

```python
from ollama import chat
from ollama import ChatResponse

response: ChatResponse = chat(
    model='llama3.2',
    messages=[
        {
            'role': 'user',
            'content': 'Explain microservices vs monolith',
        },
    ],
)
print(response.message.content)
```
```python
from ollama import chat

stream = chat(
    model='gemma3:4b',
    messages=[{
        'role': 'user',
        'content': 'Write a Python function for binary search'
    }],
    stream=True,
)
for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
```
```python
# Benchmark the same prompt across multiple models
from ollama import chat
import time

models = ['llama3.2', 'gemma3:4b', 'mistral']
prompt = "Explain the Observer design pattern with a Python example"

for model in models:
    print(f"\n{'='*50}\nModel: {model}\n{'='*50}")
    start = time.time()
    response = chat(
        model=model,
        messages=[{'role': 'user', 'content': prompt}],
    )
    elapsed = time.time() - start
    print(response.message.content[:500])
    print(f"\n⏱ Response time: {elapsed:.1f}s")
```
Install the official JavaScript SDK:

```shell
npm install ollama
```

```javascript
import ollama from "ollama";

const response = await ollama.chat({
  model: "llama3.2",
  messages: [{
    role: "user",
    content: "Explain async/await in JavaScript",
  }],
});
console.log(response.message.content);
```
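Ollama also exposes an OpenAI-compatible endpoint under /v1, so existing OpenAI-style clients can point at a local server. A minimal sketch of the request shape; `openai_style_payload` is an illustrative helper, not part of any SDK:

```python
import json

def openai_style_payload(model: str, user_msg: str, stream: bool = False) -> str:
    """Serialize an OpenAI-style chat payload for POSTing to /v1/chat/completions."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "stream": stream,
    })

body = openai_style_payload("llama3.2", "Explain async/await in JavaScript")
print(body)
# POST this to http://localhost:11434/v1/chat/completions with
# Content-Type: application/json (e.g. via curl or urllib)
```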
Troubleshooting
Quick fixes for the most common Ollama issues across all platforms.
The Ollama server isn't running. Start it for your platform:
```shell
# 🪟 Windows — Restart from system tray or Start Menu
#   Right-click the Ollama tray icon → Quit, then relaunch
#   Or run manually:
ollama serve

# 🐧 Linux
sudo systemctl start ollama

# 🍎 macOS — open the Ollama app from Applications, or:
ollama serve
```
The model may be running on CPU instead of GPU. Diagnose with:
```shell
# Check processor allocation
ollama ps
# PROCESSOR column: "100% GPU" = good | "100% CPU" = slow

# If on CPU, try a smaller model
ollama run llama3.2:1b

# Check NVIDIA GPU memory
nvidia-smi

# Check system RAM
free -h       # Linux
vm_stat       # macOS
systeminfo    # Windows (look for "Available Physical Memory")
```
```shell
# 1. Stop other running models first
ollama stop llama3.2

# 2. Use a smaller variant
ollama run llama3.2:1b   # instead of 3b
ollama run gemma3:1b     # instead of 4b

# 3. Remove unused models to free disk space
ollama rm deepseek-r1:8b

# 4. Audit what's in memory
ollama ps
```
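When deciding what to remove, it helps to see per-model disk use at a glance. A small sketch that parses ollama list output; the sample text below is hypothetical output, and the simple column splitting assumes the tabular format shown earlier:

```python
def parse_sizes(listing: str) -> dict[str, str]:
    """Map model name -> reported size from `ollama list` output."""
    sizes = {}
    for line in listing.strip().splitlines()[1:]:  # skip the header row
        parts = line.split()
        if len(parts) >= 4:
            # Columns: NAME  ID  SIZE UNIT  MODIFIED...
            sizes[parts[0]] = f"{parts[2]} {parts[3]}"
    return sizes

# Hypothetical `ollama list` output
sample = """NAME            ID              SIZE      MODIFIED
llama3.2:1b     a80c4f17acd5    1.3 GB    2 days ago
mistral:latest  f974a74358d6    4.1 GB    5 hours ago
"""
print(parse_sizes(sample))
```

In practice you would feed it live output, e.g. `subprocess.run(["ollama", "list"], capture_output=True, text=True).stdout`.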
```powershell
# Open log files via Run dialog (Win+R):
#   %LOCALAPPDATA%\Ollama\app.log     — GUI app log
#   %LOCALAPPDATA%\Ollama\server.log  — Server log

# Or view in PowerShell:
Get-Content "$env:LOCALAPPDATA\Ollama\server.log" -Tail 50
```
```shell
# Stream live logs
journalctl -u ollama.service -f

# View last 50 lines
journalctl -u ollama.service -n 50
```