Updated March 2026
Vol. I · 2026 Edition
Comprehensive Reference · Local AI Models

The Ollama
Model Catalog

A structured reference for every major model available on Ollama's local inference platform — covering specialties, sizes, deployments, and licensing. Built for developers who need clarity, not noise.

16 Models Profiled
270M→671B Parameter Range
§ 01 Overview Summary Table
# | Model | Latest Ver. | Release | Maintainer | Category | Size Range | Downloads
01 | Llama 3.3 | 3.3 | Dec 2024 | Meta AI | General | 43 GB (70B Q4) | 2.9M+
02 | Llama 3.1 | 3.1 | Jul 2024 | Meta AI | General | 4.9 – 243 GB | 108M+
03 | Llama 3.2 | 3.2 | Sep 2024 | Meta AI | Edge | 1.3 – 2.0 GB | 51M+
04 | Llama 3.2-Vision | 3.2 | Sep 2024 | Meta AI | Vision | 7.8 – 55 GB | High
05 | DeepSeek-R1 | R1 | Jan 2025 | DeepSeek AI | Reasoning | 1.1 – 404 GB | 75M+
06 | DeepSeek-V3 | V3.2 | Dec 2024 | DeepSeek AI | General | ~404 GB (MoE) | High
07 | Mistral 7B | v0.3 | Sep 2023 | Mistral AI | General | 4.1 GB | Very High
08 | Mixtral 8×7B / 8×22B | 8×22B | Apr 2024 | Mistral AI | General | 26 – 80 GB | High
09 | Gemma 3 | 3 | Mar 2025 | Google DeepMind | General / Vision | 1.9 – 67 GB | 28M+
10 | Phi-4 | 4 | Dec 2024 | Microsoft | Reasoning | 8.9 GB (14B Q4) | High
11 | Phi-3 / Phi-3.5 | 3.5 | Apr 2024 | Microsoft | Edge | 2.2 – 7.9 GB | High
12 | Qwen3 | 3 | Apr 2025 | Alibaba Qwen Team | Reasoning | ~500 MB – 140 GB | High
13 | Qwen2.5 | 2.5 | Sep 2024 | Alibaba Qwen Team | Multilingual | ~400 MB – 44 GB | High
14 | CodeLlama | 70B | Aug 2023 | Meta AI | Code | 3.8 – 38 GB | High
15 | nomic-embed-text | v1.5 | 2024 | Nomic AI | Embedding | ~274 MB | High
16 | LLaVA | v1.6 | Jan 2024 | Haotian Liu et al. (UW-Madison) | Vision | 4.5 – 20 GB | High
§ 02 Individual Model Profiles
No. 01
Llama 3.3
Meta AI · December 2024
General · Instruction
Meta's current flagship open model. Delivers performance comparable to the 405B variant at a fraction of compute cost. Excels at multi-step reasoning, code, math, and multilingual tasks. Arguably the strongest open model that runs on a single high-end GPU heading into 2025.
Min Size: 43 GB (70B Q4)
Parameters: 70 Billion
Context Window: 128K tokens
Languages: 8 supported
Notable Deployments
  • Groq Cloud: Fastest known inference at ~276 tok/sec; used in production API endpoints
  • NVIDIA TRT-LLM: Up to 3.55× throughput via speculative decoding on HGX H200 hardware
License: Llama 3.3 Community License · Dec 2024
No. 02
Llama 3.1
Meta AI · July 2024
General · Code · Multilingual
The most downloaded model on Ollama with 108M+ pulls. The 8B variant is the go-to professional workhorse for private local AI. At 405B, the first open model to truly rival GPT-4. Supports native tool use and a 128K context window throughout.
Min Size: 4.9 GB (8B)
Max Size: 243 GB (405B)
Context Window: 128K tokens
Training Tokens: 15 Trillion
Notable Deployments
  • n8n Workflows: Ollama node integration for automated local business task pipelines
  • LiteLLM / LangChain: Unified proxy layer for swapping between Llama 3.1 and cloud APIs
License: Llama 3.1 Community License · Jul 2024
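Llama 3.1's native tool use is reachable through a local Ollama server's /api/chat endpoint. A minimal sketch, assuming Ollama is running on its default port and `llama3.1:8b` is pulled; the `get_weather` tool is a made-up example, not a real function:

```shell
# Build a /api/chat request that advertises one (hypothetical) tool to the model
BODY=$(cat <<'EOF'
{
  "model": "llama3.1:8b",
  "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Look up current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }],
  "stream": false
}
EOF
)
# Sanity-check the payload is valid JSON before sending it
echo "$BODY" | python3 -m json.tool > /dev/null && echo "payload ok"
# Requires a running server; the reply carries message.tool_calls when the model picks the tool:
# curl -s http://localhost:11434/api/chat -d "$BODY"
```

When the model decides to call the tool, your script executes it and posts the result back as a `tool`-role message; the model never runs anything itself.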
No. 03
Llama 3.2
Meta AI · September 2024
Edge / On-Device · Multilingual
Meta's push into on-device AI. Created via pruning and knowledge distillation from Llama 3.1 8B and 70B. Compatible with Qualcomm, MediaTek, and ARM chips. The 3B model significantly outperforms Gemma 2 2.6B on instruction following benchmarks.
Min Size: 1.3 GB (1B)
Max Size: 2.0 GB (3B)
Context Window: 128K tokens
Min RAM: 8 GB
Notable Deployments
  • Mobile / IoT: CLI copilots, homelab dashboards, low-latency edge inference agents
  • Qualcomm Snapdragon: On-device inference on mobile chipsets without any cloud dependency
License: Llama 3.2 Community License · Sep 2024
No. 04
Llama 3.2-Vision
Meta AI · September 2024
Vision · General
Meta's first multimodal Llama. Uses cross-attention adapters to attach a vision encoder to Llama 3.1. Language model weights stayed frozen during training — making it a full drop-in replacement for text tasks. Supports images up to 1120×1120px.
Min Size: 7.8 GB (11B)
Max Size: 55 GB (90B)
Image Resolution: 1120×1120 px
Training Pairs: 6 billion image-text pairs
Notable Deployments
  • Document OCR Pipelines: Chart, table, and form data extraction in enterprise workflows
  • Accessibility Tools: Image-to-text for accessibility apps running fully offline on local hardware
License: Llama 3.2 Community License (multimodal rights not granted to EU-domiciled entities) · Sep 2024
No. 05
DeepSeek-R1
DeepSeek AI · January 2025
Reasoning · Code
Open reasoning family trained with rule-based RL rather than expensive supervised data. Includes distilled variants (1.5B – 70B) and a 671B flagship. Approaches OpenAI o3 and Gemini 2.5 Pro on math and logic benchmarks. Chain-of-thought inference is native and transparent.
Min Size: 1.1 GB (1.5B distill)
Max Size: ~404 GB (671B)
Context Window: 128K tokens
Architecture: MoE (671B flagship)
Notable Deployments
  • Academic Research: Chain-of-thought benchmarks with visible scratchpad-style reasoning
  • Home Lab Engineering: 7B / 8B / 14B distills widely deployed as local reasoning and planning agents
License: MIT · Jan 2025
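Because R1 emits its scratchpad between `<think>` tags, downstream scripts usually strip that span before using the answer. A minimal sketch with a hard-coded sample response standing in for real output from `ollama run deepseek-r1:7b`:

```shell
# Sample R1-style output with the reasoning scratchpad inline
RESPONSE='<think>70 * 6 = 420, so half is 210.</think>The answer is 210.'
# Drop everything between the <think> tags, keeping only the final answer
ANSWER=$(printf '%s' "$RESPONSE" | sed 's/<think>.*<\/think>//')
echo "$ANSWER"   # prints: The answer is 210.
```

Note that sed works line by line; real scratchpads usually span many lines, so a multi-line-capable tool (e.g. `perl -0pe`) is the safer choice in practice.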
No. 06
DeepSeek-V3.2
DeepSeek AI · December 2024 (V3) / 2025 (V3.2)
General · MoE
General-purpose frontier model using Mixture-of-Experts with 671B total parameters, activating only ~37B per token. Designed for computational efficiency at frontier scale with strong reasoning performance and far lower inference cost than equivalent dense architectures.
Total Parameters: 671B (MoE)
Active per Token: ~37B
Context Window: 128K tokens
Local Format: GGUF Q4
Notable Deployments
  • DeepSeek API Platform: Hosted inference for enterprise coding, analysis, and research tasks
  • vLLM / SGLang: High-throughput server deployment for production agent and pipeline frameworks
License: DeepSeek License 2.0 · Dec 2024
No. 07
Mistral 7B
Mistral AI · September 2023
General · Code
The original benchmark-setter for efficient 7B-class models. Fast, accurate, and broadly capable — from email drafting to code summarization. Version 0.3 added function calling support. Extremely popular in small business integrations due to minimal hardware requirements.
Size (Q4): 4.1 GB
Parameters: 7 Billion
Min VRAM: 8 GB
Architecture: Sliding Window Attention
Notable Deployments
  • Small Business Tooling: Email drafting, report summarization, local customer support chatbots
  • Open Interpreter: Default local model for natural language computer control workflows
License: Apache 2.0 · Sep 2023 → v0.3 May 2024
No. 08
Mixtral 8×7B / 8×22B
Mistral AI · Dec 2023 / Apr 2024
General · MoE
Sparse Mixture-of-Experts — activates only 2 of 8 expert sub-networks per token. Massive total parameter count with manageable compute overhead. The 8×22B model rivals top-tier proprietary models under a fully permissive Apache 2.0 license. Excellent multilingual reasoning.
8×7B Size (Q4): 26 GB
8×22B Size (Q4): ~80 GB
Active Params: ~12.9B per token (8×7B)
Architecture: Sparse MoE
Notable Deployments
  • Dolphin Fine-tunes: Community fine-tunes by Eric Hartford for creative and research use cases
  • Enterprise Knowledge Bases: Document Q&A and analysis pipelines via LangChain + Ollama integrations
License: Apache 2.0 · Dec 2023 → 8×22B Apr 2024
No. 09
Gemma 3
Google DeepMind · March 2025
General · Vision · 140+ Languages
Google's latest open-weight multimodal family. The 27B model reportedly outperforms Llama 3.1 405B and DeepSeek-V3 in human preference evaluations. Features QK-norm, Grouped-Query Attention, and an efficient KV cache. Supports over 140 languages natively out of the box.
Min Size: 1.9 GB (1B)
Max Size: 67 GB (27B Q8)
Context Window: 128K tokens
Sizes: 1B (text-only), 4B, 12B, 27B
Notable Deployments
  • Google AI Studio: Hosted research access; Gemma 3 used as reference open model for developers
  • Multimodal Analysis: Document and chart understanding at 12B scale on consumer RTX graphics cards
License: Gemma Terms of Use · Mar 2025
No. 10
Phi-4
Microsoft Research · December 2024
Reasoning · Compact
14B parameter model from Microsoft that rivals much larger models on complex STEM reasoning. Phi-4-Reasoning matches frontier math performance at 14B scale — ideal for constrained hardware requiring deep analytical capability without giving up accuracy for speed.
Size (Q4): 8.9 GB
Parameters: 14 Billion
Min VRAM: 10 GB
Specialty: Math / STEM
Notable Deployments
  • Azure AI Studio: Microsoft-hosted managed endpoint for enterprise STEM reasoning tasks
  • QodeAssist (Qt Creator): AI coding assistant plugin using local Phi-4 via Ollama backend
License: MIT · Dec 2024
No. 11
Phi-3 / Phi-3.5
Microsoft Research · April 2024
Edge · General
Punches well above its 3.8B weight class. Matches much larger models on MMLU benchmarks. Designed for phones and embedded systems. Phi-3.5-mini extends context to 128K. Used in offline agricultural field tools and developer CLI copilots on severely constrained hardware.
Min Size: 2.2 GB (3.8B)
Max Size: 7.9 GB (14B)
Variants: Mini, Small, Medium
Min RAM: 4 GB
Notable Deployments
  • Agricultural Field Apps: On-device crop advice and field diagnostics with no internet required
  • AI Toolkit for VS Code: Microsoft's official extension uses Phi-3 as default local code assistance model
License: MIT · Apr 2024 → Phi-3.5 Aug 2024
No. 12
Qwen3
Alibaba Qwen Team · April 2025
General · Reasoning · Multilingual
Latest Qwen generation with dense and MoE variants. Qwen3 offers up to 256K tokens — among the longest context windows available on Ollama. The 235B MoE is the flagship. Excellent for multi-step agentic tasks, tool use, long-document analysis, and broad multilingual coverage.
Min Size: ~500 MB (0.6B)
Max Size: ~140 GB (235B)
Context Window: up to 256K tokens
Architecture: Dense + MoE
Notable Deployments
  • Enterprise Long-Doc Q&A: Contract review and summarization via LangChain + Ollama integrations
  • Qwen3-Coder-Next: Coding-focused variant optimized for agentic local development workflows
License: Apache 2.0 (most variants) · Apr 2025
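Ollama loads models with a much smaller default context than their maximum, so long-document work with Qwen3 usually means raising `num_ctx` in a custom Modelfile. A sketch; the `qwen3-long` name and the 65536 value are arbitrary choices for illustration:

```shell
# Write a Modelfile that raises the context window for long-document work
cat > Modelfile.qwen3-long <<'EOF'
FROM qwen3:14b
PARAMETER num_ctx 65536
EOF
# Register it as a new local model (requires an Ollama install):
# ollama create qwen3-long -f Modelfile.qwen3-long
cat Modelfile.qwen3-long
```

Larger `num_ctx` values grow the KV cache, so VRAM use rises roughly linearly with the context you configure.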
No. 13
Qwen2.5
Alibaba Qwen Team · September 2024
Multilingual · Code · General
Alibaba's mature, well-quantized general family. Pretrained on 18 trillion tokens. Sweet spots are 7B (fast) and 14B (balanced reasoning). Includes Qwen2.5-Coder with support for 300+ programming languages. One of the broadest multilingual models available locally.
Min Size: ~400 MB (0.5B)
Max Size: ~44 GB (72B)
Context Window: 128K tokens
Training Tokens: 18 Trillion
Notable Deployments
  • Home Lab AI Stacks: Q4 / Q5 / Q8 builds popular on consumer GPUs in self-hosted server setups
  • DevOps Scripting: Qwen2.5-Coder for code generation, review, and shell scripting automation
License: Apache 2.0 (≤72B) · Sep 2024
No. 14
CodeLlama
Meta AI · August 2023
Code · Fill-in-the-Middle
Meta's code-specialized Llama 2 derivative. Supports fill-in-the-middle (FIM) for inline code completion via the <FILL_ME> token. Available in Python-specific and instruction-tuned variants. Important: FIM support is limited to the 7B and 13B base model sizes only.
Min Size: 3.8 GB (7B)
Max Size: 38 GB (70B)
Sizes: 7B, 13B, 34B, 70B
FIM Support: 7B & 13B only
Notable Deployments
  • Continue (VS Code): Open-source AI coding assistant using CodeLlama for tab-completion
  • Cline (VS Code): Multi-file repository coding agent; CodeLlama 34B for local use cases
License: Llama 2 Community License · Aug 2023
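Through Ollama's API, fill-in-the-middle is driven by the `suffix` field of /api/generate: the model completes the gap between `prompt` and `suffix`. A sketch of the request body, assuming a FIM-capable code tag such as `codellama:7b-code` is pulled (tag names vary; check ollama.com/library):

```shell
# /api/generate body: the model fills the hole between "prompt" and "suffix"
BODY=$(cat <<'EOF'
{
  "model": "codellama:7b-code",
  "prompt": "def add(a, b):\n    ",
  "suffix": "\n    return result\n",
  "stream": false
}
EOF
)
# Sanity-check the payload is valid JSON before sending it
echo "$BODY" | python3 -m json.tool > /dev/null && echo "payload ok"
# Requires a running server:
# curl -s http://localhost:11434/api/generate -d "$BODY" | jq -r '.response'
```

Editor plugins like Continue build exactly this kind of request from the text before and after your cursor.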
No. 15
nomic-embed-text
Nomic AI · 2024
Embedding
High-performing open embedding model with an unusually large 8K token context window. Converts text to 768-dimensional vectors for semantic search, RAG pipelines, and similarity matching. One of the only fully open embedding models that genuinely competes with proprietary APIs.
Size: ~274 MB
Context Window: 8K tokens
Output Dimensions: 768
Version: v1.5
Notable Deployments
  • RAG Pipelines: Local vector stores (Chroma, pgvector, chromem-go) for private document Q&A
  • Semantic Kernel / LangChain: Default local embedding model in many open-source agent frameworks
License: Apache 2.0 · 2024
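Generating vectors locally is a single call to Ollama's /api/embed endpoint (older builds expose /api/embeddings with a `prompt` field instead). A sketch, assuming `nomic-embed-text` is pulled and the server is running on the default port:

```shell
# /api/embed request: "input" accepts a string or an array of strings to batch
BODY='{"model": "nomic-embed-text", "input": ["first document", "second document"]}'
# Sanity-check the payload is valid JSON before sending it
echo "$BODY" | python3 -m json.tool > /dev/null && echo "payload ok"
# Requires a running server; each returned vector has 768 dimensions:
# curl -s http://localhost:11434/api/embed -d "$BODY" | jq '.embeddings | length'
```

Store the returned vectors in any local store (Chroma, pgvector) and rank documents by cosine similarity against the query's embedding.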
No. 16
LLaVA
Haotian Liu et al., UW-Madison · April 2023
Vision · General
The original open-source vision-language model for local deployment. Connects a CLIP visual encoder to Llama or Mistral via a lightweight projection layer. LLaVA 1.6 (Mistral-based) is the most popular variant. The easiest entry point to multimodal AI in a home lab.
Min Size: 4.5 GB (7B)
Max Size: 20 GB (34B)
Vision Encoder: CLIP ViT-L
Latest Version: v1.6 (Jan 2024)
Notable Deployments
  • Home Lab Vision: Most popular multimodal model for self-hosted image description and analysis
  • Open WebUI: Integrated as the default vision option in popular Ollama browser front-ends
License: LLaVA License (research) · Apr 2023 → v1.6 Jan 2024
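Images reach LLaVA through Ollama's API as base64 strings in an `images` array. A sketch; `photo.jpg` is a placeholder path and the `llava:7b` tag is illustrative, so substitute your own file and check ollama.com/library for current tags:

```shell
# photo.jpg is a placeholder; drop this line and use a real image on disk
printf 'stand-in bytes' > photo.jpg
# Base64-encode the image (tr strips the line wrapping some base64 tools add)
IMG_B64=$(base64 < photo.jpg | tr -d '\n')
# Build the /api/generate request with the encoded image attached
BODY=$(printf '{"model": "llava:7b", "prompt": "Describe this image.", "images": ["%s"], "stream": false}' "$IMG_B64")
# Sanity-check the payload is valid JSON before sending it
echo "$BODY" | python3 -m json.tool > /dev/null && echo "payload ok"
# Requires a running server:
# curl -s http://localhost:11434/api/generate -d "$BODY" | jq -r '.response'
```

The same `images` array works with other vision models served by Ollama, such as Llama 3.2-Vision.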
§ 03 Verify Current Availability & Latest Releases
📡 How to Check Model Availability, Sizes & Updates

All data above reflects best publicly available information as of March 2026. Model sizes vary significantly by quantization level (Q4_K_M, Q5_K_M, Q8_0, F16) — the same model can differ 2–4× in disk size depending on the specific tag pulled. Always verify current availability using the commands and sources below.

bash · ollama cli
# List all locally installed models with their exact sizes
ollama list

# Pull a specific model and quantization variant
ollama pull llama3.1:8b
ollama pull deepseek-r1:7b
ollama pull qwen3:14b

# Inspect a model's metadata and Modelfile before using it
ollama show llama3.3 --modelfile

# List locally available models via the local REST API (same data as `ollama list`)
curl -s http://localhost:11434/api/tags | jq '.models[].name'

# Update all installed models to their latest published versions
ollama list | awk 'NR>1{print $1}' | xargs -I{} ollama pull {}
Not Publicly Documented
  • Exact pull / download counts for most models
  • Precise release dates for community fine-tunes
  • Hardware-specific performance benchmarks per model
  • Commercial deployment details (not self-reported)
  • Quantization quality tradeoff specifics per family