No. 01
Llama 3.3
Meta AI · December 2024
General
Instruction
Meta's current flagship open model. Delivers performance comparable to the 405B variant at a fraction of the compute cost. Excels at multi-step reasoning, code, math, and multilingual tasks. Best single-GPU open model heading into 2025.
Context Window
128K tokens
Notable Deployments
Groq Cloud — Fastest known inference at ~276 tok/sec; used in production API endpoints
NVIDIA TRT-LLM — Up to 3.55× throughput via speculative decoding on HGX H200 hardware
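A minimal sketch of hitting the Groq deployment above through its OpenAI-compatible endpoint; assumes the `openai` Python package and a GROQ_API_KEY environment variable, and the model id is Groq's current naming, worth verifying against their model list.

```python
# Sketch: query Llama 3.3 70B on Groq via the OpenAI-compatible API.
# The base URL is Groq's documented endpoint; the model id
# "llama-3.3-70b-versatile" is an assumption to check in Groq's catalog.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key=os.environ["GROQ_API_KEY"],
)

resp = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Explain speculative decoding in two sentences."}],
    max_tokens=200,
)
print(resp.choices[0].message.content)
```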
No. 02
Llama 3.1
Meta AI · July 2024
General
Code
Multilingual
The most downloaded model on Ollama with 108M+ pulls. The 8B variant is the go-to professional workhorse for private local AI. At 405B, the first open model to truly rival GPT-4. Supports native tool use and a 128K context window throughout.
Context Window
128K tokens
Training Tokens
15 Trillion
Notable Deployments
n8n Workflows — Ollama node integration for automated local business task pipelines
LiteLLM / LangChain — Unified proxy layer for swapping between Llama 3.1 and cloud APIs
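The LiteLLM deployment above amounts to one call shape for every backend; a hedged sketch, assuming `litellm` is installed, a local Ollama server with `llama3.1` pulled, and an OpenAI key for the cloud path. Model strings are illustrative.

```python
# Sketch: swap between local Llama 3.1 and a cloud model behind LiteLLM's
# unified completion() interface. The provider prefix picks the backend.
from litellm import completion

messages = [{"role": "user", "content": "Draft a two-line status update for the weekly report."}]

local = completion(model="ollama/llama3.1", messages=messages)   # local Ollama server
cloud = completion(model="gpt-4o-mini", messages=messages)       # cloud API, same call shape

print(local.choices[0].message.content)
print(cloud.choices[0].message.content)
```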
No. 03
Llama 3.2
Meta AI · September 2024
Edge / On-Device
Multilingual
Meta's push into on-device AI. Created via pruning and knowledge distillation from Llama 3.1 8B and 70B. Compatible with Qualcomm, MediaTek, and ARM chips. The 3B model significantly outperforms Gemma 2 2.6B on instruction following benchmarks.
Context Window
128K tokens
Notable Deployments
Mobile / IoT — CLI copilots, homelab dashboards, low-latency edge inference agents
Qualcomm Snapdragon — On-device inference on mobile chipsets without any cloud dependency
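For the CLI-copilot style deployments above, streaming keeps perceived latency low on small hardware; a sketch assuming the `ollama` Python package and the `llama3.2:3b` tag.

```python
# Sketch: stream tokens from Llama 3.2 3B for a low-latency CLI helper.
# Assumes a running Ollama daemon and `ollama pull llama3.2:3b`.
import ollama

stream = ollama.chat(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": "One-line awk command to sum column 3 of a CSV."}],
    stream=True,
)
for chunk in stream:
    print(chunk["message"]["content"], end="", flush=True)
print()
```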
No. 04
Llama 3.2-Vision
Meta AI · September 2024
Vision
General
Meta's first multimodal Llama. Uses cross-attention adapters to attach a vision encoder to Llama 3.1. The language model weights stayed frozen during training, so it remains a drop-in replacement for Llama 3.1 on text-only tasks. Supports images up to 1120×1120 px.
Image Resolution
1120×1120 px
Training Data
6B image-text pairs
Notable Deployments
Document OCR Pipelines — Chart, table, and form data extraction in enterprise workflows
Accessibility Tools — Image-to-text for accessibility apps running fully offline on local hardware
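A sketch of the document-extraction use case above via Ollama's image support; the model tag, prompt, and file path are placeholders.

```python
# Sketch: extract chart/table data from an image with Llama 3.2-Vision locally.
# Assumes `ollama pull llama3.2-vision` (11B) and a local chart.png.
import ollama

resp = ollama.chat(
    model="llama3.2-vision",
    messages=[{
        "role": "user",
        "content": "Extract the table in this chart as CSV.",
        "images": ["./chart.png"],   # local path; raw bytes also accepted
    }],
)
print(resp["message"]["content"])
```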
No. 05
DeepSeek-R1
DeepSeek AI · January 2025
Reasoning
Code
Open reasoning family trained with rule-based RL rather than expensive supervised data. Includes distilled variants (1.5B – 70B) and a 671B flagship. Approaches OpenAI o3 and Gemini 2.5 Pro on math and logic benchmarks. Chain-of-thought inference is native and transparent.
Context Window
128K tokens
Notable Deployments
Academic Research — Chain-of-thought benchmarks with visible scratchpad-style reasoning
Home Lab Engineering — 7B / 8B / 14B distills widely deployed as local reasoning and planning agents
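Because the chain of thought is emitted in the open, the distills' output can be split into scratchpad and answer; a sketch assuming the `deepseek-r1:8b` tag and the model's <think>...</think> convention.

```python
# Sketch: separate DeepSeek-R1's visible reasoning from its final answer.
# Assumes `ollama pull deepseek-r1:8b`; the distills wrap their scratchpad
# in <think>...</think> before answering.
import re
import ollama

resp = ollama.chat(
    model="deepseek-r1:8b",
    messages=[{"role": "user", "content": "Is 2027 prime? Answer yes or no, with a short justification."}],
)
text = resp["message"]["content"]

m = re.search(r"<think>(.*?)</think>(.*)", text, flags=re.DOTALL)
scratchpad, answer = (m.group(1), m.group(2)) if m else ("", text)
print("reasoning tokens:", len(scratchpad.split()))
print("answer:", answer.strip())
```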
No. 06
DeepSeek-V3.2
DeepSeek AI · December 2024 (V3) / 2025 (V3.2)
General
MoE
General-purpose frontier model using Mixture-of-Experts with 671B total parameters, activating only ~37B per token. Designed for computational efficiency at frontier scale with strong reasoning performance and far lower inference cost than equivalent dense architectures.
Total Parameters
671B (MoE)
Active / Token
~37B active
Context Window
128K tokens
Notable Deployments
DeepSeek API Platform — Hosted inference for enterprise coding, analysis, and research tasks
vLLM / SGLang — High-throughput server deployment for production agent and pipeline frameworks
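Both server stacks named above expose OpenAI-compatible endpoints, so client code stays unchanged whichever one hosts the model; host, port, and the served model name below are deployment-specific assumptions.

```python
# Sketch: call a self-hosted DeepSeek-V3.2 behind a vLLM or SGLang
# OpenAI-compatible server. The model name must match whatever the server
# was launched with (e.g. vLLM's --served-model-name); the API key is often
# a dummy value for local deployments.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="local-dummy-key")

resp = client.chat.completions.create(
    model="deepseek-v3.2",  # assumed served model name
    messages=[{"role": "user", "content": "Outline a migration plan from REST to gRPC in five bullets."}],
)
print(resp.choices[0].message.content)
```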
No. 07
Mistral 7B
Mistral AI · September 2023
General
Code
The original benchmark-setter for efficient 7B-class models. Fast, accurate, and broadly capable — from email drafting to code summarization. Version 0.3 added function calling support. Extremely popular in small business integrations due to minimal hardware requirements.
Architecture
Sliding Window Attention
Notable Deployments
Small Business Tooling — Email drafting, report summarization, local customer support chatbots
Open Interpreter — Default local model for natural language computer control workflows
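Since v0.3 adds function calling, a hedged sketch of tool use through Ollama's `tools` parameter; `get_invoice_total` is a made-up example function, and the model decides whether to emit a tool call.

```python
# Sketch: function calling with Mistral 7B v0.3 via Ollama's tools parameter.
# Assumes `ollama pull mistral`; the tool schema below is a hypothetical example.
import ollama

tools = [{
    "type": "function",
    "function": {
        "name": "get_invoice_total",   # hypothetical business function
        "description": "Return the total amount of an invoice by its id",
        "parameters": {
            "type": "object",
            "properties": {"invoice_id": {"type": "string"}},
            "required": ["invoice_id"],
        },
    },
}]

resp = ollama.chat(
    model="mistral",
    messages=[{"role": "user", "content": "How much was invoice INV-0042?"}],
    tools=tools,
)
for call in resp["message"].get("tool_calls") or []:
    print(call["function"]["name"], call["function"]["arguments"])
```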
No. 08
Mixtral 8×7B / 8×22B
Mistral AI · Dec 2023 / Apr 2024
General
MoE
Sparse Mixture-of-Experts — activates only 2 of 8 expert sub-networks per token. Massive total parameter count with manageable compute overhead. The 8×22B model rivals top-tier proprietary models under a fully permissive Apache 2.0 license. Excellent multilingual reasoning.
Active Params
~12.9B (8×7B) / ~39B (8×22B) per token
Notable Deployments
Dolphin Fine-tunes — Community fine-tunes by Eric Hartford for creative and research use cases
Enterprise Knowledge Bases — Document Q&A and analysis pipelines via LangChain + Ollama integrations
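The routing mechanism behind "2 of 8 experts per token" is simple to show in miniature; a toy sketch with made-up dimensions, not Mixtral's actual implementation, illustrating why only a fraction of the total parameters run for each token.

```python
# Toy sketch of sparse top-2 Mixture-of-Experts routing (the Mixtral mechanism).
# Dimensions are tiny and illustrative; a router scores 8 experts per token,
# and only the best 2 expert FFNs are actually executed.
import torch
import torch.nn.functional as F

d_model, n_experts, top_k = 16, 8, 2
experts = [torch.nn.Linear(d_model, d_model) for _ in range(n_experts)]
router = torch.nn.Linear(d_model, n_experts)

def moe_forward(x):                               # x: (tokens, d_model)
    weights, idx = router(x).topk(top_k, dim=-1)  # pick 2 of 8 experts per token
    weights = F.softmax(weights, dim=-1)          # renormalize over the chosen 2
    rows = []
    for t in range(x.size(0)):
        rows.append(sum(weights[t, k] * experts[int(idx[t, k])](x[t])
                        for k in range(top_k)))
    return torch.stack(rows)

print(moe_forward(torch.randn(4, d_model)).shape)  # torch.Size([4, 16])
```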
No. 09
Gemma 3
Google DeepMind · March 2025
General
Vision
140+ Languages
Google's latest open-weight multimodal family. The 27B model reportedly outperforms Llama 3 405B and DeepSeek-V3 in human preference evaluations. Features QK-norm, Grouped-Query Attention, and interleaved local/global attention for a compact KV cache. Supports over 140 languages out of the box.
Context Window
128K tokens
Notable Deployments
Google AI Studio — Hosted research access; Gemma 3 used as reference open model for developers
Multimodal Analysis — Document and chart understanding at 12B scale on consumer RTX graphics cards
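For the hosted-access deployment above, a hedged sketch using the google-genai SDK; the exact Gemma 3 model id exposed through AI Studio is an assumption to verify, and a GOOGLE_API_KEY is required.

```python
# Sketch: query Gemma 3 27B through Google AI Studio's API with the
# google-genai SDK (`pip install google-genai`). The model id
# "gemma-3-27b-it" is an assumption; check the current model list.
import os
from google import genai

client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
resp = client.models.generate_content(
    model="gemma-3-27b-it",
    contents="Summarize the key differences between GQA and multi-head attention.",
)
print(resp.text)
```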
No. 10
Phi-4
Microsoft Research · December 2024
Reasoning
Compact
14B parameter model from Microsoft that rivals much larger models on complex STEM reasoning. Phi-4-Reasoning matches frontier math performance at 14B scale, making it ideal for constrained hardware that needs deep analytical capability without trading accuracy for speed.
Notable Deployments
Azure AI Studio — Microsoft-hosted managed endpoint for enterprise STEM reasoning tasks
QodeAssist (Qt Creator) — AI coding assistant plugin using local Phi-4 via Ollama backend
No. 11
Phi-3 / Phi-3.5
Microsoft Research · April 2024
Edge
General
Punches well above its 3.8B weight class. Matches much larger models on MMLU benchmarks. Designed for phones and embedded systems. Phi-3.5-mini extends context to 128K. Used in offline agricultural field tools and developer CLI copilots on severely constrained hardware.
Notable Deployments
Agricultural Field Apps — On-device crop advice and field diagnostics with no internet required
AI Toolkit for VS Code — Microsoft's official extension uses Phi-3 as default local code assistance model
No. 12
Qwen3
Alibaba Qwen Team · April 2025
General
Reasoning
Multilingual
Latest Qwen generation with dense and MoE variants. Qwen3 offers up to 256K tokens — the longest context window on Ollama. The 235B MoE is the flagship. Excellent for multi-step agentic tasks, tool use, long-document analysis, and broad multilingual coverage.
Context Window
256K tokens
Notable Deployments
Enterprise Long-Doc Q&A — Contract review and summarization via LangChain + Ollama integrations
Qwen3-Coder-Next — Coding-focused variant optimized for agentic local development workflows
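For the long-document deployment above, a hedged sketch with langchain-ollama; the tag and context size are illustrative, and the window must be raised explicitly because Ollama's default is far below the model's 256K maximum.

```python
# Sketch: contract review over a long document with Qwen3 via langchain-ollama.
# Assumes `pip install langchain-ollama`, `ollama pull qwen3`, and enough memory
# for the requested context window; the file path is a placeholder.
from langchain_ollama import ChatOllama

llm = ChatOllama(model="qwen3", num_ctx=65536)   # window size is illustrative

contract = open("msa_2024.txt").read()
resp = llm.invoke(
    "List every termination clause in the contract below, with section numbers.\n\n" + contract
)
print(resp.content)
```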
No. 13
Qwen2.5
Alibaba Qwen Team · September 2024
Multilingual
Code Variant
General
Alibaba's mature, well-quantized general family. Pretrained on 18 trillion tokens. Sweet spots are 7B (fast) and 14B (balanced reasoning). Includes Qwen2.5-Coder with support for 300+ programming languages. One of the broadest multilingual models available locally.
Context Window
128K tokens
Training Tokens
18 Trillion
Notable Deployments
Home Lab AI Stacks — Q4 / Q5 / Q8 builds popular on consumer GPUs in self-hosted server setups
DevOps Scripting — Qwen2.5-Coder for code generation, review, and shell scripting automation
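A sketch of the scripting use case above with Qwen2.5-Coder through Ollama's generate endpoint; the tag and prompt are illustrative, and generated scripts should be reviewed before they run anywhere.

```python
# Sketch: shell-script generation with Qwen2.5-Coder via Ollama.
# Assumes `ollama pull qwen2.5-coder:7b`.
import ollama

resp = ollama.generate(
    model="qwen2.5-coder:7b",
    prompt=("Write a bash script that archives logs older than 7 days in /var/log/myapp "
            "and keeps only the last 5 archives. Output only the script."),
)
print(resp["response"])
```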
No. 14
CodeLlama
Meta AI · August 2023
Code
Fill-in-Middle
Meta's code-specialized Llama 2 derivative. Supports fill-in-the-middle (FIM) for inline code completion via the <FILL_ME> token. Available in Python-specific and instruction-tuned variants. Important: FIM support is limited to the 7B and 13B base model sizes only.
Notable Deployments
Continue (VS Code) — Open-source AI coding assistant using CodeLlama for tab-completion
Cline (VS Code) — Multi-file repository coding agent; CodeLlama 34B for local use cases
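A sketch of fill-in-the-middle with the <FILL_ME> token, following the Hugging Face transformers pattern for the 7B base checkpoint; the checkpoint name and generation settings are the usual defaults and worth verifying.

```python
# Sketch: FIM code completion with CodeLlama 7B via transformers.
# The tokenizer splits the prompt at <FILL_ME> into prefix/suffix for infilling;
# remember FIM only works on the 7B/13B base models.
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "codellama/CodeLlama-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt, device_map="auto")

PROMPT = '''def remove_non_ascii(s: str) -> str:
    """ <FILL_ME>
    return result
'''
inputs = tokenizer(PROMPT, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
infill = tokenizer.batch_decode(out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)[0]
print(PROMPT.replace("<FILL_ME>", infill))
```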
No. 15
nomic-embed-text
Nomic AI · 2024
Embedding
High-performing open embedding model with an unusually large 8K token context window. Converts text to 768-dimensional vectors for semantic search, RAG pipelines, and similarity matching. One of the only fully open embedding models that genuinely competes with proprietary APIs.
Notable Deployments
RAG Pipelines — Local vector stores (Chroma, pgvector, chromem-go) for private document Q&A
Semantic Kernel / LangChain — Default local embedding model in many open-source agent frameworks
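A minimal local semantic-search sketch over the embeddings; the document snippets are invented, and a real RAG pipeline would persist the vectors in Chroma or pgvector rather than a Python list.

```python
# Sketch: tiny semantic search with nomic-embed-text via Ollama's embed endpoint.
# Assumes `ollama pull nomic-embed-text`; vectors are 768-dimensional, and the
# model expects task prefixes ("search_document:", "search_query:").
import numpy as np
import ollama

docs = [
    "Invoices are due within 30 days of receipt.",
    "The VPN requires multi-factor authentication.",
    "Backups run nightly at 02:00 UTC.",
]

def embed(texts):
    return np.array(ollama.embed(model="nomic-embed-text", input=texts)["embeddings"])

doc_vecs = embed([f"search_document: {d}" for d in docs])
query = embed(["search_query: When do we have to pay supplier bills?"])[0]

scores = doc_vecs @ query / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query))
print(docs[int(scores.argmax())])   # expected: the invoice policy line
```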
No. 16
LLaVA
Haotian Liu et al., UW-Madison · April 2023
Vision
General
The original open-source vision-language model for local deployment. Connects a CLIP visual encoder to Llama or Mistral via a lightweight projection layer. LLaVA 1.6 (Mistral-based) is the most popular variant. The easiest entry point to multimodal AI in a home lab.
Latest Version
v1.6 (Jan 2024)
Notable Deployments
Home Lab Vision — Most popular multimodal model for self-hosted image description and analysis
Open WebUI — Integrated as the default vision option in popular Ollama browser front-ends
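The connector described above (CLIP features mapped into the language model's embedding space) is small enough to show as a toy; dimensions below are illustrative, not LLaVA's actual configuration.

```python
# Toy sketch of LLaVA's connector idea: project CLIP patch features into the
# LLM's token-embedding space, then prepend them to the text tokens.
# Illustrative dimensions only (LLaVA 1.5+ uses a small MLP projector).
import torch
import torch.nn as nn

clip_dim, llm_dim, n_patches, n_text = 1024, 4096, 576, 12

projector = nn.Sequential(
    nn.Linear(clip_dim, llm_dim),
    nn.GELU(),
    nn.Linear(llm_dim, llm_dim),
)

image_feats = torch.randn(1, n_patches, clip_dim)   # stand-in for CLIP output
text_embeds = torch.randn(1, n_text, llm_dim)       # stand-in for token embeddings

visual_tokens = projector(image_feats)              # (1, 576, 4096)
llm_input = torch.cat([visual_tokens, text_embeds], dim=1)
print(llm_input.shape)                              # torch.Size([1, 588, 4096])
```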