Alibaba’s Qwen team has launched Qwen3.5, the latest generation of its open-weight large language and multimodal models. The series pushes the boundaries of performance and efficiency, enabling high-end capabilities on dramatically reduced compute budgets. The release aligns with an industry-wide pivot toward efficient, deployable AI: models that deliver advanced reasoning, coding, agentic behavior, and native multimodality while fitting on consumer hardware, edge devices, servers with modest resources, and even local, privacy-focused setups.
Qwen3.5 spans a broad family of sizes and architectures, from ultra-compact dense models below 1 billion parameters to large sparse MoE flagships exceeding 300 billion total parameters. This tiered lineup lets developers match models precisely to their needs for latency, throughput, memory footprint, cost, and capability.
On the lightweight end, the Qwen3.5 Small series includes four models: 0.8B, 2B, 4B, and 9B parameters. Released in early March 2026 (completing the family rollout that began in mid-February), these are optimized for on-device and edge deployment: smartphones, IoT devices, embedded systems, and privacy-sensitive local inference.
They achieve remarkable efficiency through architectural choices like hybrid attention (Gated Delta Networks for linear-time scaling) and techniques that minimize VRAM usage. Even the 9B model runs smoothly on modest consumer GPUs or high-end mobile hardware. All small models inherit native multimodality and a 262,144-token context window, making long-document processing and extended conversations feasible locally.
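The article does not describe Qwen3.5's internals, but the gated delta rule behind Gated Delta Networks is published elsewhere; below is a simplified conceptual sketch (not Qwen's implementation) of why it scales linearly: each token performs one constant-cost update to a fixed-size state matrix instead of attending over the whole history. All dimensions and gate values are illustrative.

```python
import numpy as np

def gated_delta_step(S, k, v, alpha, beta):
    """One recurrent step of a (simplified) gated delta rule.

    S     : (d_v, d_k) fixed-size state matrix (memory of past tokens)
    k, v  : key / value vectors for the current token
    alpha : decay gate in (0, 1) that fades old state
    beta  : write-strength gate in (0, 1]
    """
    k = k / np.linalg.norm(k)  # unit-norm key, as in delta-rule formulations
    # Decay the old state, erase the old association along k, write the new one.
    S = alpha * S @ (np.eye(len(k)) - beta * np.outer(k, k)) + beta * np.outer(v, k)
    return S

def read(S, q):
    """Constant-time readout per token, independent of sequence length."""
    return S @ q

rng = np.random.default_rng(0)
d_k, d_v = 8, 8
S = np.zeros((d_v, d_k))
k, v = rng.standard_normal(d_k), rng.standard_normal(d_v)
S = gated_delta_step(S, k, v, alpha=0.95, beta=1.0)

# With beta=1 and a fresh state, reading back with the stored key recovers v.
out = read(S, k / np.linalg.norm(k))
print(np.allclose(out, v))  # True
```

Because the state matrix has a fixed size, per-token cost stays constant, which is what makes very long contexts affordable compared with quadratic attention.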
The 9B variant stands out as the strongest small-model performer, closing much of the gap with far larger models in reasoning, logical problem-solving, and instruction following – thanks in part to extensive post-training reinforcement learning.
A core breakthrough in Qwen3.5 is its native multimodal architecture. Unlike many prior systems that retrofit vision encoders onto pretrained language models, Qwen3.5 integrates vision and language from the pre-training stage onward (early fusion). This unified training produces a cohesive representation space for text, images, diagrams, charts, screenshots, and documents.
The result is superior performance on visual understanding tasks: document layout analysis, chart/table interpretation, diagram reasoning, fine-grained OCR, visual question answering, and multimodal agent behaviors (e.g., understanding and acting on screen content).
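The early-fusion idea above can be illustrated with a toy sketch: image patches and text tokens are projected into one shared embedding space and processed as a single sequence from the first layer onward, rather than bolting a vision encoder onto a text-only model. All shapes below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 32  # toy shared embedding width

# In early fusion, both modalities land in the SAME d-dimensional space...
image_patches = rng.standard_normal((16, d))  # 16 vision-patch embeddings
text_tokens = rng.standard_normal((12, d))    # 12 text-token embeddings

# ...and form one interleaved sequence for the transformer stack,
# so attention can mix modalities from the very first layer.
sequence = np.concatenate([image_patches, text_tokens], axis=0)
print(sequence.shape)  # (28, 32)
```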
In the flagship and medium MoE models, only a small subset of parameters activates per token:
- Qwen3.5-397B-A17B (flagship): 397 billion total parameters, about 17 billion activated.
- Qwen3.5-122B-A10B: 122 billion total, about 10 billion activated.
- Qwen3.5-35B-A3B: 35 billion total, about 3 billion activated.
This sparsity enables high-end multimodal reasoning and agentic performance at inference costs and speeds far closer to much smaller dense models – often 60% cheaper and with 8 times higher throughput on large workloads than the prior generation.
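The mechanism behind those "A17B"-style figures is top-k expert routing: a router scores all experts per token but only the k highest-scoring ones actually run. A toy NumPy sketch under assumed sizes (8 experts, top-2; real models use far more experts and learned routers):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, k = 16, 8, 2  # toy sizes, not Qwen3.5's configuration

# Each "expert" here is just a small linear layer.
experts = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_experts)]
router_w = rng.standard_normal((d, n_experts)) / np.sqrt(d)

def moe_forward(x):
    """Sparse MoE layer: route a token to its top-k experts only."""
    logits = x @ router_w
    top = np.argsort(logits)[-k:]  # indices of the k best-scoring experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()           # softmax over the selected experts only
    # Only k of n_experts weight matrices are touched, so per-token compute
    # scales with the activated parameters, not the total parameter count.
    y = sum(g * (experts[i] @ x) for i, g in zip(top, gates))
    return y, top

x = rng.standard_normal(d)
y, used = moe_forward(x)
print(len(used), "of", n_experts, "experts activated")  # 2 of 8 experts activated
```

The parameter counts in the list above follow the same logic: for the flagship, roughly 17B of 397B parameters (about 4%) do work on any given token.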
Qwen3.5 leverages large-scale post-training reinforcement learning, including multi-agent simulation environments with progressively harder, real-world-inspired tasks. This sharpens instruction following, multi-step planning, tool use, structured-output adherence, and adaptability in agentic scenarios (coding agents, visual agents, long-horizon reasoning), while reducing hallucinations.
The series dramatically expands linguistic coverage to 201 languages and dialects, with particular emphasis on low-resource languages – advancing truly inclusive, culturally aware AI.
All models feature a native 262,144-token (262K) context window, sufficient for entire codebases, lengthy documents, multi-turn conversations, or complex multi-document reasoning. Hosted/API variants (e.g., Qwen3.5-Plus on Alibaba Cloud Model Studio) extend this to 1 million tokens.
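To make the 262K figure concrete, here is a back-of-envelope feasibility check using the common rule of thumb of roughly 4 characters per token for English text and code (an assumption, not Qwen's actual tokenizer ratio):

```python
# Rough check: does a body of text fit in a 262,144-token context window?
CONTEXT_TOKENS = 262_144
CHARS_PER_TOKEN = 4  # assumed rule of thumb, varies by tokenizer and language

def fits_in_context(total_chars: int) -> bool:
    """Estimate whether `total_chars` of text fits in the context window."""
    return total_chars / CHARS_PER_TOKEN <= CONTEXT_TOKENS

# e.g. a ~900 KB repository of source text is roughly 225K tokens:
print(fits_in_context(900_000))    # True
# ...while ~2 MB would overflow the window:
print(fits_in_context(2_000_000))  # False
```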
Available under permissive open licenses (primarily Apache 2.0) on Hugging Face, ModelScope, and GitHub, Qwen3.5 empowers developers and enterprises worldwide to build more capable, efficient, and accessible AI applications: from mobile assistants and edge analytics to powerful cloud agents and research frontiers.

