K-EXAONE-236B-A23B is a state-of-the-art Mixture-of-Experts (MoE) large language model developed by LG AI Research. It contains 236B total parameters with only 23B active per token, delivering frontier-level reasoning, coding, agentic behavior, and long-context (256K tokens) performance—while keeping inference efficient.
Released on Hugging Face under the K-EXAONE AI Model License, the model targets research and enterprise deployments that need accuracy, scalability, and long-document understanding.
Why K-EXAONE Matters
- Frontier performance with efficiency: MoE routing activates just 8 of 128 experts (+1 shared) per token, keeping compute costs lower than dense models of similar capability.
- True long-context native support: 256K tokens without hacks—ideal for books, codebases, logs, and multi-document reasoning.
- Strong agentic skills: Tool calling, browsing, and multi-agent strategies are first-class capabilities.
- Multilingual by design: Optimized vocabulary (SuperBPE) improves token efficiency by ~30%.
Key Features
1) Architecture & Speed
- 236B MoE (23B active) with Multi-Token Prediction (MTP)
- Self-speculative decoding (drafting tokens with the MTP head) boosts throughput by ~1.5×
2) Long-Context Engineering
- 256K context window
- Hybrid attention with a 3:1 ratio of sliding-window (128-token) to global layers (see the layer-pattern sketch after this list)
- NoPE (no positional embedding) to reduce memory overhead
3) Multilingual Coverage (6 Languages)
Korean, English, Spanish, German, Japanese, Vietnamese
- 153,600-token vocabulary via SuperBPE
4) Agentic & Tool Use
- Compatible with OpenAI & Hugging Face tool calling
- Excels in search, browsing, and task automation
5) Safety & Cultural Alignment
- Tuned for universal human values
- Incorporates Korean cultural & historical context, improving regional reliability
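For intuition, the 3:1 hybrid attention can be pictured as a repeating layer pattern. The sketch below is illustrative only, assuming three sliding-window layers followed by one global layer in each group of four; the exact interleaving used by the model may differ.

```python
# Illustrative sketch of a 3:1 hybrid attention layout over the 48 main layers,
# assuming each group of four layers is three sliding-window layers + one global layer.
# This is an assumption for illustration, not the model's documented interleaving.
NUM_LAYERS = 48

layer_types = [
    "global" if (i + 1) % 4 == 0 else "sliding_window"
    for i in range(NUM_LAYERS)
]

print(layer_types[:4])
# ['sliding_window', 'sliding_window', 'sliding_window', 'global']
print(layer_types.count("sliding_window"), "local,", layer_types.count("global"), "global")
# 36 local, 12 global
```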
Model Configuration (At a Glance)
- Layers: 48 main + 1 MTP
- Hidden size: 6,144
- Heads: 64 Q / 8 KV (128-dim)
- Experts: 128 total | 8 active (+1 shared)
- Context: 262,144 tokens
- Knowledge cutoff: Dec 2024
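As a rough illustration of how this configuration keeps long-context inference manageable, the back-of-the-envelope sketch below estimates KV-cache size from the numbers above, assuming bf16 caches, ignoring the MTP layer, and assuming a 3:1 sliding-to-global layer split. Treat these as illustrative estimates, not reported figures.

```python
# Rough KV-cache arithmetic from the configuration above (bf16, MTP layer ignored).
# Illustrative estimates only, not numbers from the technical report.
layers, kv_heads, head_dim, bytes_per = 48, 8, 128, 2
context, window = 262_144, 128

per_token_per_layer = 2 * kv_heads * head_dim * bytes_per   # K and V -> 4,096 bytes/token/layer

# If every layer kept a full-length cache, 256K tokens would need:
full_global = layers * per_token_per_layer * context
print(f"{full_global / 2**30:.0f} GiB")   # -> 48 GiB

# With an assumed 3:1 sliding:global split (36 local layers capped at a 128-token
# window, 12 global layers growing with context), the cache is far smaller:
hybrid = 12 * per_token_per_layer * context + 36 * per_token_per_layer * window
print(f"{hybrid / 2**30:.1f} GiB")        # -> ~12.0 GiB
```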
Benchmark Highlights (Reasoning Mode)
K-EXAONE consistently competes with or surpasses strong MoE peers across domains:
- World Knowledge: MMLU-Pro 83.8
- Math: AIME 2025 92.8
- Coding: LiveCodeBench v6 80.7
- Agentic Use: τ2-Bench (Retail) 78.6
- Long Context: AA-LCR 53.5
- Korean Tasks: Ko-LongBench 86.8
- Safety: Wild-Jailbreak 89.9
(Full tables and methodology are in the technical report.)
How to Use
Reasoning vs Non-Reasoning Modes
- Reasoning mode (enable_thinking=True) → highest accuracy
- Non-reasoning mode (enable_thinking=False) → lower latency
Recommended sampling: temperature=1.0, top_p=0.95, presence_penalty=0.0
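The snippet below is a minimal generation sketch, assuming the EXAONE-MoE Transformers fork is installed and that the chat template accepts an enable_thinking flag; the repository ID is illustrative. Note that presence_penalty is a parameter of OpenAI-compatible servers rather than of transformers' generate().

```python
# Minimal sketch: reasoning-mode generation with the recommended sampling settings.
# Assumes the EXAONE-MoE Transformers fork; the repo ID below is illustrative.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LGAI-EXAONE/K-EXAONE-236B-A23B"   # assumed Hugging Face repo ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Explain Mixture-of-Experts routing in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    enable_thinking=True,        # set False for the lower-latency non-reasoning mode
    return_tensors="pt",
).to(model.device)

output = model.generate(
    input_ids,
    max_new_tokens=2048,
    do_sample=True,
    temperature=1.0,             # recommended sampling settings
    top_p=0.95,                  # (presence_penalty applies to OpenAI-style servers)
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```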
Tool Calling
- Works with HF docstring-to-schema utilities
- Supports OpenAI-style tool APIs
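A minimal tool-calling sketch is shown below. It assumes the model is served behind an OpenAI-compatible endpoint (for example via vLLM or SGLang); the endpoint URL, served model name, and the get_weather helper are illustrative.

```python
# Minimal sketch: exposing a Python function as a tool via Hugging Face's
# docstring-to-schema utility and an OpenAI-style tools list.
# The endpoint, model name, and get_weather helper are illustrative assumptions.
from transformers.utils import get_json_schema
from openai import OpenAI

def get_weather(city: str) -> str:
    """
    Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return f"Sunny in {city}"  # placeholder implementation

tools = [get_json_schema(get_weather)]  # OpenAI-style {"type": "function", ...} schema

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # assumed local server
response = client.chat.completions.create(
    model="K-EXAONE-236B-A23B",           # assumed served model name
    messages=[{"role": "user", "content": "What's the weather in Seoul?"}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```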
Deployment Options
- Transformers (custom EXAONE-MoE fork required)
- vLLM (256K-context serving with tensor parallelism on 4× H200 GPUs; see the sketch after this list)
- SGLang (native server)
- llama.cpp (EXAONE-MoE fork)
- TensorRT-LLM (support in progress)
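For the vLLM path, the sketch below shows offline inference with tensor parallelism across 4 GPUs and the full 256K window. It assumes a vLLM build with K-EXAONE support; the repository ID and settings are illustrative, and production deployments would typically use vLLM's OpenAI-compatible server instead.

```python
# Minimal sketch: offline vLLM inference with tensor parallelism over 4 GPUs and
# the native 256K context window. Repo ID and settings are illustrative; a vLLM
# build with K-EXAONE (EXAONE-MoE) support is assumed.
from vllm import LLM, SamplingParams

llm = LLM(
    model="LGAI-EXAONE/K-EXAONE-236B-A23B",  # assumed Hugging Face repo ID
    tensor_parallel_size=4,                   # e.g., 4x H200
    max_model_len=262_144,                    # native 256K context
)

params = SamplingParams(temperature=1.0, top_p=0.95, presence_penalty=0.0, max_tokens=1024)
outputs = llm.chat(
    [{"role": "user", "content": "Summarize the deployment options for K-EXAONE."}],
    params,
)
print(outputs[0].outputs[0].text)
```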
Limitations
- May generate biased or inappropriate outputs in edge cases
- Knowledge is static (cutoff Dec 2024)
- Long-context reasoning remains probabilistic; details deep in very long inputs can still be missed
- Users must comply with LG AI ethical guidelines
Who Should Use K-EXAONE?
- Research labs exploring long-context reasoning and MoE scaling
- Enterprises building agentic systems, code assistants, or knowledge-work automation
- Multilingual platforms needing high accuracy with efficiency
- Developers serving massive documents or tool-driven workflows
Final Takeaway
K-EXAONE-236B-A23B proves that frontier-level intelligence doesn’t require dense, always-on compute. With MoE efficiency, native 256K context, strong agentic behavior, and robust multilingual performance, it stands as one of the most capable open models released to date—especially for long-form and tool-augmented tasks.