No. 01
Llama 3.3
Meta AI · December 2024
General
Instruction
Meta's current flagship open model. Delivers performance comparable to the 405B variant at a fraction of the compute cost. Excels at multi-step reasoning, code, math, and multilingual tasks. Best single-GPU open model heading into 2025.
Context Window
128K tokens
Notable Deployments
Groq Cloud — Fastest known inference at ~276 tok/sec; used in production API endpoints
NVIDIA TRT-LLM — Up to 3.55× throughput via speculative decoding on HGX H200 hardware
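A minimal sketch of hitting the Groq deployment above through its OpenAI-compatible endpoint; assumes the `openai` Python package and a GROQ_API_KEY environment variable, and the model id is Groq's current naming, worth verifying against their model list.

```python
# Sketch: query Llama 3.3 70B on Groq via the OpenAI-compatible API.
# The base URL is Groq's documented endpoint; the model id
# "llama-3.3-70b-versatile" is an assumption to check in Groq's catalog.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Explain speculative decoding in two sentences."}],
    max_tokens=200,
)
print(resp.choices[0].message.content)
```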
No. 02
Llama 3.1
Meta AI · July 2024
General
Code
Multilingual
The most downloaded model on Ollama with 108M+ pulls. The 8B variant is the go-to professional workhorse for private local AI. At 405B, the first open model to truly rival GPT-4. Supports native tool use and a 128K context window throughout.
Context Window
128K tokens
Training Tokens
15 Trillion
Notable Deployments
n8n Workflows — Ollama node integration for automated local business task pipelines
LiteLLM / LangChain — Unified proxy layer for swapping between Llama 3.1 and cloud APIs
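The LiteLLM deployment above amounts to one call shape for every backend; a hedged sketch, assuming `litellm` is installed, a local Ollama server with `llama3.1` pulled, and an OpenAI key for the cloud path. Model strings are illustrative.

```python
# Sketch: swap between local Llama 3.1 and a cloud model behind LiteLLM's
# unified completion() interface. The provider prefix picks the backend.
from litellm import completion

messages = [{"role": "user", "content": "Draft a two-line status update for the weekly report."}]

local = completion(model="ollama/llama3.1", messages=messages)   # local Ollama server
cloud = completion(model="gpt-4o-mini", messages=messages)       # cloud API, same call shape

print(local.choices[0].message.content)
print(cloud.choices[0].message.content)
```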
No. 03
Llama 3.2
Meta AI · September 2024
Edge / On-Device
Multilingual
Meta's push into on-device AI. Created via pruning and knowledge distillation from Llama 3.1 8B and 70B. Compatible with Qualcomm, MediaTek, and ARM chips. The 3B model significantly outperforms Gemma 2 2.6B on instruction following benchmarks.
Context Window
128K tokens
Notable Deployments
Mobile / IoT — CLI copilots, homelab dashboards, low-latency edge inference agents
Qualcomm Snapdragon — On-device inference on mobile chipsets without any cloud dependency
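For the CLI-copilot style deployments above, streaming keeps perceived latency low on small hardware; a sketch assuming the `ollama` Python package and the `llama3.2:3b` tag.

```python
# Sketch: stream tokens from Llama 3.2 3B for a low-latency CLI helper.
# Assumes a running Ollama daemon and `ollama pull llama3.2:3b`.
import ollama

stream = ollama.chat(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": "One-line awk command to sum column 3 of a CSV."}],
    stream=True,
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
print()
```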
No. 04
Llama 3.2-Vision
Meta AI · September 2024
Vision
General
Meta's first multimodal Llama. Uses cross-attention adapters to attach a vision encoder to Llama 3.1. The language model weights stayed frozen during training, so it remains a drop-in replacement for Llama 3.1 on text-only tasks. Supports images up to 1120×1120 px.
Image Resolution
1120×1120 px
Training Data
6B image-text pairs
Notable Deployments
Document OCR Pipelines — Chart, table, and form data extraction in enterprise workflows
Accessibility Tools — Image-to-text for accessibility apps running fully offline on local hardware
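A sketch of the document-extraction use case above via Ollama's image support; the model tag, prompt, and file path are placeholders.

```python
# Sketch: extract chart/table data from an image with Llama 3.2-Vision locally.
# Assumes `ollama pull llama3.2-vision` (11B) and a local chart.png.
import ollama

resp = ollama.chat(
    model="llama3.2-vision",
    messages=[{
        "role": "user",
        "content": "Extract the table in this chart as CSV.",
        "images": ["./chart.png"],   # local path; raw bytes also accepted
    }],
)
print(resp["message"]["content"])
```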
No. 05
DeepSeek-R1
DeepSeek AI · January 2025
Reasoning
Code
Open reasoning family trained with rule-based RL rather than expensive supervised data. Includes distilled variants (1.5B – 70B) and a 671B flagship. Approaches OpenAI o3 and Gemini 2.5 Pro on math and logic benchmarks. Chain-of-thought inference is native and transparent.
Context Window
128K tokens
Notable Deployments
Academic Research — Chain-of-thought benchmarks with visible scratchpad-style reasoning
Home Lab Engineering — 7B / 8B / 14B distills widely deployed as local reasoning and planning agents
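Because the chain of thought is emitted in the open, the distills' output can be split into scratchpad and answer; a sketch assuming the `deepseek-r1:8b` tag and the model's <think>...</think> convention.

```python
# Sketch: separate DeepSeek-R1's visible reasoning from its final answer.
# Assumes `ollama pull deepseek-r1:8b`; the distills wrap their scratchpad
# in <think>...</think> before answering.
import re
import ollama

resp = ollama.chat(
    model="deepseek-r1:8b",
    messages=[{"role": "user", "content": "Is 2027 prime? Answer yes or no, with a short justification."}],
)
text = resp["message"]["content"]

m = re.search(r"<think>(.*?)</think>(.*)", text, flags=re.DOTALL)
scratchpad, answer = (m.group(1), m.group(2)) if m else ("", text)
print("reasoning tokens:", len(scratchpad.split()))
print("answer:", answer.strip())
```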
No. 06
DeepSeek-V3.2
DeepSeek AI · December 2024 (V3) / 2025 (V3.2)
General
MoE
General-purpose frontier model using Mixture-of-Experts with 671B total parameters, activating only ~37B per token. Designed for computational efficiency at frontier scale with strong reasoning performance and far lower inference cost than equivalent dense architectures.
Total Parameters
671B (MoE)
Active / Token
~37B active
Context Window
128K tokens
Notable Deployments
DeepSeek API Platform — Hosted inference for enterprise coding, analysis, and research tasks
vLLM / SGLang — High-throughput server deployment for production agent and pipeline frameworks
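Both server stacks named above expose OpenAI-compatible endpoints, so client code stays unchanged whichever one hosts the model; host, port, and the served model name below are deployment-specific assumptions.

```python
# Sketch: call a self-hosted DeepSeek-V3.2 behind a vLLM or SGLang
# OpenAI-compatible server. The model name must match whatever the server
# was launched with (e.g. vLLM's --served-model-name); the API key is often
# a dummy value for local deployments.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="local-dummy-key")

resp = client.chat.completions.create(
    model="deepseek-v3.2",  # assumed served model name
    messages=[{"role": "user", "content": "Outline a migration plan from REST to gRPC in five bullets."}],
)
print(resp.choices[0].message.content)
```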
No. 07
Mistral 7B
Mistral AI · September 2023
General
Code
The original benchmark-setter for efficient 7B-class models. Fast, accurate, and broadly capable — from email drafting to code summarization. Version 0.3 added function calling support. Extremely popular in small business integrations due to minimal hardware requirements.
Architecture
Sliding Window Attention
Notable Deployments
Small Business Tooling — Email drafting, report summarization, local customer support chatbots
Open Interpreter — Default local model for natural language computer control workflows
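Since v0.3 adds function calling, a hedged sketch of tool use through Ollama's `tools` parameter; `get_invoice_total` is a made-up example function, and the model decides whether to emit a tool call.

```python
# Sketch: function calling with Mistral 7B v0.3 via Ollama's tools parameter.
# Assumes `ollama pull mistral`; the tool schema below is a hypothetical example.
import ollama

tools = [{
    "type": "function",
    "function": {
        "name": "get_invoice_total",   # hypothetical business function
        "description": "Return the total amount of an invoice by its id",
        "parameters": {
            "type": "object",
            "properties": {"invoice_id": {"type": "string"}},
            "required": ["invoice_id"],
        },
    },
}]

resp = ollama.chat(
    model="mistral",
    messages=[{"role": "user", "content": "How much was invoice INV-0042?"}],
    tools=tools,
)
for call in resp["message"].get("tool_calls") or []:
    print(call["function"]["name"], call["function"]["arguments"])
```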
No. 08
Mixtral 8×7B / 8×22B
Mistral AI · Dec 2023 / Apr 2024
General
MoE
Sparse Mixture-of-Experts — activates only 2 of 8 expert sub-networks per token. Massive total parameter count with manageable compute overhead. The 8×22B model rivals top-tier proprietary models under a fully permissive Apache 2.0 license. Excellent multilingual reasoning.
Active Params
~12.9B (8×7B) / ~39B (8×22B) per token
Notable Deployments
Dolphin Fine-tunes — Community fine-tunes by Eric Hartford for creative and research use cases
Enterprise Knowledge Bases — Document Q&A and analysis pipelines via LangChain + Ollama integrations
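The routing mechanism behind "2 of 8 experts per token" is simple to show in miniature; a toy sketch with made-up dimensions, not Mixtral's actual implementation, illustrating why only a fraction of the total parameters run for each token.

```python
# Toy sketch of sparse top-2 Mixture-of-Experts routing (the Mixtral mechanism).
# Dimensions are tiny and illustrative; a router scores 8 experts per token,
# and only the best 2 expert FFNs are actually executed.
import torch
import torch.nn.functional as F

d_model, n_experts, top_k = 16, 8, 2
experts = [torch.nn.Linear(d_model, d_model) for _ in range(n_experts)]
router = torch.nn.Linear(d_model, n_experts)

def moe_forward(x):                               # x: (tokens, d_model)
    weights, idx = router(x).topk(top_k, dim=-1)  # pick 2 of 8 experts per token
    weights = F.softmax(weights, dim=-1)          # renormalize over the chosen 2
    rows = []
    for t in range(x.size(0)):
        rows.append(sum(weights[t, k] * experts[int(idx[t, k])](x[t])
                        for k in range(top_k)))
    return torch.stack(rows)

print(moe_forward(torch.randn(4, d_model)).shape)  # torch.Size([4, 16])
```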
No. 09
Gemma 3
Google DeepMind · March 2025
General
Vision
140+ Languages
Google's latest open-weight multimodal family. The 27B model reportedly outperforms Llama 3 405B and DeepSeek-V3 in human preference evaluations. Features QK-norm, Grouped-Query Attention, and interleaved local/global attention for a compact KV cache. Supports over 140 languages out of the box.
Context Window
128K tokens
Notable Deployments
Google AI Studio — Hosted research access; Gemma 3 used as reference open model for developers
Multimodal Analysis — Document and chart understanding at 12B scale on consumer RTX graphics cards
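For the hosted-access deployment above, a hedged sketch using the google-genai SDK; the exact Gemma 3 model id exposed through AI Studio is an assumption to verify, and a GOOGLE_API_KEY is required.

```python
# Sketch: query Gemma 3 27B through Google AI Studio's API with the
# google-genai SDK (`pip install google-genai`). The model id
# "gemma-3-27b-it" is an assumption; check the current model list.
import os
from google import genai

client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
resp = client.models.generate_content(
    model="gemma-3-27b-it",
    contents="Summarize the key differences between GQA and multi-head attention.",
)
print(resp.text)
```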
No. 10
Phi-4
Microsoft Research · December 2024
Reasoning
Compact
14B parameter model from Microsoft that rivals much larger models on complex STEM reasoning. Phi-4-Reasoning matches frontier math performance at 14B scale, making it ideal for constrained hardware that needs deep analytical capability without trading accuracy for speed.
Notable Deployments
Azure AI Studio — Microsoft-hosted managed endpoint for enterprise STEM reasoning tasks
QodeAssist (Qt Creator) — AI coding assistant plugin using local Phi-4 via Ollama backend
No. 11
Phi-3 / Phi-3.5
Microsoft Research · April 2024
Edge
General
Punches well above its 3.8B weight class. Matches much larger models on MMLU benchmarks. Designed for phones and embedded systems. Phi-3.5-mini extends context to 128K. Used in offline agricultural field tools and developer CLI copilots on severely constrained hardware.
Notable Deployments
Agricultural Field Apps — On-device crop advice and field diagnostics with no internet required
AI Toolkit for VS Code — Microsoft's official extension uses Phi-3 as default local code assistance model
No. 12
Qwen3
Alibaba Qwen Team · April 2025
General
Reasoning
Multilingual
Latest Qwen generation with dense and MoE variants. Qwen3 offers up to 256K tokens — the longest context window on Ollama. The 235B MoE is the flagship. Excellent for multi-step agentic tasks, tool use, long-document analysis, and broad multilingual coverage.
Context Window
256K tokens
Notable Deployments
Enterprise Long-Doc Q&A — Contract review and summarization via LangChain + Ollama integrations
Qwen3-Coder-Next — Coding-focused variant optimized for agentic local development workflows
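For the long-document deployment above, a hedged sketch with langchain-ollama; the tag and context size are illustrative, and the window must be raised explicitly because Ollama's default is far below the model's 256K maximum.

```python
# Sketch: contract review over a long document with Qwen3 via langchain-ollama.
# Assumes `pip install langchain-ollama`, `ollama pull qwen3`, and enough memory
# for the requested context window; the file path is a placeholder.
from langchain_ollama import ChatOllama

llm = ChatOllama(model="qwen3", num_ctx=65536)   # window size is illustrative

contract = open("msa_2024.txt").read()
resp = llm.invoke(
    "List every termination clause in the contract below, with section numbers.\n\n" + contract
)
print(resp.content)
```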
No. 13
Qwen2.5
Alibaba Qwen Team · September 2024
Multilingual
Code Variant
General
Alibaba's mature, well-quantized general family. Pretrained on 18 trillion tokens. Sweet spots are 7B (fast) and 14B (balanced reasoning). Includes Qwen2.5-Coder with support for 300+ programming languages. One of the broadest multilingual models available locally.
Context Window
128K tokens
Training Tokens
18 Trillion
Notable Deployments
Home Lab AI Stacks — Q4 / Q5 / Q8 builds popular on consumer GPUs in self-hosted server setups
DevOps Scripting — Qwen2.5-Coder for code generation, review, and shell scripting automation
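A sketch of the scripting use case above with Qwen2.5-Coder through Ollama's generate endpoint; the tag and prompt are illustrative, and generated scripts should be reviewed before they run anywhere.

```python
# Sketch: shell-script generation with Qwen2.5-Coder via Ollama.
# Assumes `ollama pull qwen2.5-coder:7b`.
import ollama

resp = ollama.generate(
    model="qwen2.5-coder:7b",
    prompt=("Write a bash script that archives logs older than 7 days in /var/log/myapp "
            "and keeps only the last 5 archives. Output only the script."),
)
print(resp["response"])
```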
No. 14
CodeLlama
Meta AI · August 2023
Code
Fill-in-Middle
Meta's code-specialized Llama 2 derivative. Supports fill-in-the-middle (FIM) for inline code completion via the <FILL_ME> token. Available in Python-specific and instruction-tuned variants. Important: FIM support is limited to the 7B and 13B base model sizes only.
Notable Deployments
Continue (VS Code) — Open-source AI coding assistant using CodeLlama for tab-completion
Cline (VS Code) — Multi-file repository coding agent; CodeLlama 34B for local use cases
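A sketch of fill-in-the-middle with the <FILL_ME> token, following the Hugging Face transformers pattern for the 7B base checkpoint; the checkpoint name and generation settings are the usual defaults and worth verifying.

```python
# Sketch: FIM code completion with CodeLlama 7B via transformers.
# The tokenizer splits the prompt at <FILL_ME> into prefix/suffix for infilling;
# remember FIM only works on the 7B/13B base models.
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, device_map="auto")

PROMPT = '''def remove_non_ascii(s: str) -> str:
    """ <FILL_ME>
    return result
'''
inputs = tokenizer(PROMPT, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
infill = tokenizer.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0]
print(PROMPT.replace("<FILL_ME>", infill))
```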
No. 15
nomic-embed-text
Nomic AI · 2024
Embedding
High-performing open embedding model with an unusually large 8K token context window. Converts text to 768-dimensional vectors for semantic search, RAG pipelines, and similarity matching. One of the only fully open embedding models that genuinely competes with proprietary APIs.
Notable Deployments
RAG Pipelines — Local vector stores (Chroma, pgvector, chromem-go) for private document Q&A
Semantic Kernel / LangChain — Default local embedding model in many open-source agent frameworks
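A minimal local semantic-search sketch over the embeddings; the document snippets are invented, and a real RAG pipeline would persist the vectors in Chroma or pgvector rather than a Python list.

```python
# Sketch: tiny semantic search with nomic-embed-text via Ollama's embed endpoint.
# Assumes `ollama pull nomic-embed-text`; vectors are 768-dimensional, and the
# model expects task prefixes ("search_document:", "search_query:").
import numpy as np
import ollama

docs = [
    "Invoices are due within 30 days of receipt.",
    "The VPN requires multi-factor authentication.",
    "Backups run nightly at 02:00 UTC.",
]

def embed(texts):
    return np.array(ollama.embed(model="nomic-embed-text", input=texts)["embeddings"])

doc_vecs = embed([f"search_document: {d}" for d in docs])
query = embed(["search_query: When do we have to pay supplier bills?"])[0]

scores = doc_vecs @ query / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query))
print(docs[int(scores.argmax())])   # expected: the invoice policy line
```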
No. 16
LLaVA
Haotian Liu et al., UW-Madison · April 2023
Vision
General
The original open-source vision-language model for local deployment. Connects a CLIP visual encoder to Llama or Mistral via a lightweight projection layer. LLaVA 1.6 (Mistral-based) is the most popular variant. The easiest entry point to multimodal AI in a home lab.
Latest Version
v1.6 (Jan 2024)
Notable Deployments
Home Lab Vision — Most popular multimodal model for self-hosted image description and analysis
Open WebUI — Integrated as the default vision option in popular Ollama browser front-ends
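The connector described above (CLIP features mapped into the language model's embedding space) is small enough to show as a toy; dimensions below are illustrative, not LLaVA's actual configuration.

```python
# Toy sketch of LLaVA's connector idea: project CLIP patch features into the
# LLM's token-embedding space, then prepend them to the text tokens.
# Illustrative dimensions only (LLaVA 1.5+ uses a small MLP projector).
import torch
import torch.nn as nn

clip_dim, llm_dim, n_patches, n_text = 1024, 4096, 576, 12

projector = nn.Sequential(
    nn.Linear(clip_dim, llm_dim),
    nn.GELU(),
    nn.Linear(llm_dim, llm_dim),
)

image_feats = torch.randn(1, n_patches, clip_dim)   # stand-in for CLIP output
text_embeds = torch.randn(1, n_text, llm_dim)       # stand-in for token embeddings

visual_tokens = projector(image_feats)              # (1, 576, 4096)
llm_input = torch.cat([visual_tokens, text_embeds], dim=1)
print(llm_input.shape)                              # torch.Size([1, 588, 4096])
```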