Alibaba
✓ Verified. Creator of the Qwen open-weight model family.
At a glance
Alibaba's AI arm
Alibaba's AI work centres on its DAMO Academy research lab and the cloud-services arm Alibaba Cloud, China's largest cloud provider. Alibaba's flagship LLM family is Qwen (通义千问, Tongyi Qianwen).
The Qwen series
Qwen has emerged as one of the most prominent open-weights LLM families in the world:
- Qwen 1 (2023) — initial release
- Qwen 2 (Jun 2024) — major architectural overhaul
- Qwen 2.5 (Sep 2024) — code, math, and long-context variants
- Qwen 3 (2025) — current flagship including 235B-parameter MoE
Qwen models are released under Apache 2.0 (with some exceptions for the largest sizes) and are widely used across the global open-source ecosystem, particularly in benchmarks and fine-tuning research.
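Because the instruct weights are openly downloadable, they can be prompted directly; Qwen's instruct models use the ChatML turn format. A minimal sketch of that formatting (the special tokens below match Qwen's published chat template, but the authoritative version ships with each model's tokenizer, so treat this as illustrative):

```python
def to_chatml(messages):
    """Format chat messages in the ChatML style used by Qwen instruct models.

    Sketch only: the canonical template is the tokenizer's own
    chat template (e.g. tokenizer.apply_chat_template in transformers).
    """
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    parts.append("<|im_start|>assistant\n")  # generation prompt for the reply
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Name one Qwen model."},
])
print(prompt)
```

In practice one would load the tokenizer from Hugging Face and let it apply this template, rather than hand-rolling the string.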
Strategic role
Alibaba has been one of the most aggressive open-weights publishers in China, alongside DeepSeek. Its US$53 billion three-year AI capex commitment (announced Feb 2025) signals continued prioritisation of generative AI over its traditional commerce business.
Latest activity
- Qwen2.5 72B Instruct released (discovered via OpenRouter; provider: Alibaba)
- Qwen2.5 7B Instruct released (discovered via OpenRouter; provider: Alibaba)
- Qwen2.5 Coder 32B Instruct released (discovered via OpenRouter; provider: Alibaba)
- Qwen-Plus released (discovered via OpenRouter; provider: Alibaba)
- Qwen2.5 VL 72B Instruct released (discovered via OpenRouter; provider: Alibaba)
- Qwen-Turbo released (discovered via OpenRouter; provider: Alibaba)
- Qwen VL Max released (discovered via OpenRouter; provider: Alibaba)
- Qwen VL Plus released (discovered via OpenRouter; provider: Alibaba)
Releases timeline
- Qwen: Qwen3.6 Plus
Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing, enabling strong scalability and high-performance inference. Compared to the 3.5 series, it delivers...
- Qwen: Qwen3.5-9B
Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design...
- Qwen: Qwen3.5-35B-A3B
The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall...
- Qwen: Qwen3.5-27B
The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of...
- Qwen: Qwen3.5-122B-A10B
The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of...
- Qwen: Qwen3.5-Flash
The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the...
- Qwen: Qwen3.5 Plus 2026-02-15
The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of...
- Qwen: Qwen3.5 397B A17B
The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers...
- Qwen: Qwen3 Max Thinking
Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes cognitive tasks that require deep, multi-step reasoning. By significantly scaling model capacity and reinforcement learning compute, it...
- Qwen: Qwen3 Coder Next
Qwen3-Coder-Next is an open-weight causal language model optimized for coding agents and local development workflows. It uses a sparse MoE design with 80B total parameters and only 3B activated per...
- Qwen: Qwen3 VL 32B Instruct
Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...
- Qwen: Qwen3 VL 8B Thinking
Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visual and textual reasoning across complex scenes, documents, and temporal sequences. It integrates enhanced multimodal alignment and...
Active models
- Qwen 3 14B (Active, 128K ctx)
  14B compact Qwen 3 model for efficient local deployment.
- Qwen 3 235B (Active, 128K ctx)
  Alibaba's frontier open-weight MoE model with hybrid thinking.
- Qwen 3 32B (Active, 128K ctx)
  32B Qwen 3 model offering strong reasoning at mid-size cost.
- Qwen 3 72B (Active, 128K ctx)
  72B dense open-weight model with hybrid thinking from Alibaba.
- Qwen2.5 72B Instruct (Active, 33K ctx)
  Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: significantly more knowledge and greatly improved capabilities in coding and...
- Qwen2.5 Coder 32B Instruct (Active, 33K ctx)
  Qwen2.5-Coder is the latest series of code-specific Qwen large language models (formerly known as CodeQwen). Qwen2.5-Coder brings the following improvements upon CodeQwen1.5: significant improvements in **code generation**, **code reasoning**...
- Qwen: QwQ 32B (Active, 131K ctx)
  QwQ is the reasoning model of the Qwen series. Compared with conventional instruction-tuned models, QwQ, which is capable of thinking and reasoning, can achieve significantly enhanced performance in downstream tasks,...
- Qwen: Qwen Plus 0728 (Active, 1M ctx)
  Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reasoning model with a balanced combination of performance, speed, and cost.
- Qwen: Qwen Plus 0728 (thinking) (Active, 1M ctx)
  Qwen Plus 0728, based on the Qwen3 foundation model, is a 1 million context hybrid reasoning model with a balanced combination of performance, speed, and cost.
- Qwen: Qwen VL Max (Active, 131K ctx)
  Qwen VL Max is a visual understanding model with 7500 tokens context length. It excels in delivering optimal performance for a broader spectrum of complex tasks.
- Qwen: Qwen VL Plus (Active, 131K ctx)
  Qwen's enhanced large visual language model, significantly upgraded for detailed recognition and text recognition, supporting ultra-high pixel resolutions up to millions of pixels and extreme aspect ratios for...
- Qwen: Qwen-Max (Active, 33K ctx)
  Qwen-Max, based on Qwen2.5, provides the best inference performance among [Qwen models](/qwen), especially for complex multi-step tasks. It's a large-scale MoE model that has been pretrained on over 20 trillion...
- Qwen: Qwen-Plus (Active, 1M ctx)
  Qwen-Plus, based on the Qwen2.5 foundation model, is a 131K context model with a balanced combination of performance, speed, and cost.
- Qwen: Qwen-Turbo (Active, 131K ctx)
  Qwen-Turbo, based on Qwen2.5, is a 1M context model that provides fast responses at low cost, suitable for simple tasks.
- Qwen: Qwen2.5 7B Instruct (Active, 33K ctx)
  Qwen2.5 7B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: significantly more knowledge and greatly improved capabilities in coding and...
- Qwen: Qwen2.5 VL 32B Instruct (Active, 128K ctx)
  Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned through reinforcement learning for enhanced mathematical reasoning, structured outputs, and visual problem-solving capabilities. It excels at visual analysis tasks, including object recognition, textual...
- Qwen: Qwen2.5 VL 72B Instruct (Active, 32K ctx)
  Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images.
- Qwen: Qwen3 14B (Active, 41K ctx)
  Qwen3-14B is a dense 14.8B parameter causal language model from the Qwen3 series, designed for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for...
- Qwen: Qwen3 235B A22B (Active, 131K ctx)
  Qwen3-235B-A22B is a 235B parameter mixture-of-experts (MoE) model developed by Qwen, activating 22B parameters per forward pass. It supports seamless switching between a "thinking" mode for complex reasoning, math, and...
- Qwen: Qwen3 235B A22B Instruct 2507 (Active, 262K ctx)
  Qwen3-235B-A22B-Instruct-2507 is a multilingual, instruction-tuned mixture-of-experts language model based on the Qwen3-235B architecture, with 22B active parameters per forward pass. It is optimized for general-purpose text generation, including instruction following,...
- Qwen: Qwen3 235B A22B Thinking 2507 (Active, 262K ctx)
  Qwen3-235B-A22B-Thinking-2507 is a high-performance, open-weight Mixture-of-Experts (MoE) language model optimized for complex reasoning tasks. It activates 22B of its 235B parameters per forward pass and natively supports up to 262,144...
- Qwen: Qwen3 30B A3B (Active, 41K ctx)
  Qwen3, the latest generation in the Qwen large language model series, features both dense and mixture-of-experts (MoE) architectures to excel in reasoning, multilingual support, and advanced agent tasks. Its unique...
- Qwen: Qwen3 30B A3B Instruct 2507 (Active, 262K ctx)
  Qwen3-30B-A3B-Instruct-2507 is a 30.5B-parameter mixture-of-experts language model from Qwen, with 3.3B active parameters per inference. It operates in non-thinking mode and is designed for high-quality instruction following, multilingual understanding, and...
- Qwen: Qwen3 30B A3B Thinking 2507 (Active, 131K ctx)
  Qwen3-30B-A3B-Thinking-2507 is a 30B parameter Mixture-of-Experts reasoning model optimized for complex tasks requiring extended multi-step thinking. The model is designed specifically for "thinking mode," where internal reasoning traces are separated...
- Qwen: Qwen3 32B (Active, 41K ctx)
  Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for...
- Qwen: Qwen3 8B (Active, 41K ctx)
  Qwen3-8B is a dense 8.2B parameter causal language model from the Qwen3 series, designed for both reasoning-heavy tasks and efficient dialogue. It supports seamless switching between "thinking" mode for math,...
- Qwen: Qwen3 Coder 30B A3B Instruct (Active, 160K ctx)
  Qwen3-Coder-30B-A3B-Instruct is a 30.5B parameter Mixture-of-Experts (MoE) model with 128 experts (8 active per forward pass), designed for advanced code generation, repository-scale understanding, and agentic tool use. Built on the...
- Qwen: Qwen3 Coder 480B A35B (Active, 262K ctx)
  Qwen3-Coder-480B-A35B-Instruct is a Mixture-of-Experts (MoE) code generation model developed by the Qwen team. It is optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning over...
- Qwen: Qwen3 Coder Flash (Active, 1M ctx)
  Qwen3 Coder Flash is Alibaba's fast and cost-efficient version of its proprietary Qwen3 Coder Plus. It is a powerful coding agent model specializing in autonomous programming via tool calling...
- Qwen: Qwen3 Coder Next (Active, 262K ctx)
  Qwen3-Coder-Next is an open-weight causal language model optimized for coding agents and local development workflows. It uses a sparse MoE design with 80B total parameters and only 3B activated per...
- Qwen: Qwen3 Coder Plus (Active, 1M ctx)
  Qwen3 Coder Plus is Alibaba's proprietary version of the open-source Qwen3 Coder 480B A35B. It is a powerful coding agent model specializing in autonomous programming via tool calling and...
- Qwen: Qwen3 Max (Active, 262K ctx)
  Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version. It...
- Qwen: Qwen3 Max Thinking (Active, 262K ctx)
  Qwen3-Max-Thinking is the flagship reasoning model in the Qwen3 series, designed for high-stakes cognitive tasks that require deep, multi-step reasoning. By significantly scaling model capacity and reinforcement learning compute, it...
- Qwen: Qwen3 Next 80B A3B Instruct (Active, 262K ctx)
  Qwen3-Next-80B-A3B-Instruct is an instruction-tuned chat model in the Qwen3-Next series optimized for fast, stable responses without "thinking" traces. It targets complex tasks across reasoning, code generation, knowledge QA, and multilingual...
- Qwen: Qwen3 Next 80B A3B Thinking (Active, 131K ctx)
  Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs structured "thinking" traces by default. It's designed for hard multi-step problems: math proofs, code synthesis/debugging, logic, and agentic...
- Qwen: Qwen3 VL 235B A22B Instruct (Active, 262K ctx)
  Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table...
- Qwen: Qwen3 VL 235B A22B Thinking (Active, 131K ctx)
  Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math.
- Qwen: Qwen3 VL 30B A3B Instruct (Active, 131K ctx)
  Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception...
- Qwen: Qwen3 VL 30B A3B Thinking (Active, 131K ctx)
  Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Thinking variant enhances reasoning in STEM, math, and complex tasks. It excels...
- Qwen: Qwen3 VL 32B Instruct (Active, 131K ctx)
  Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...
- Qwen: Qwen3 VL 8B Instruct (Active, 131K ctx)
  Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon...
- Qwen: Qwen3 VL 8B Thinking (Active, 131K ctx)
  Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visual and textual reasoning across complex scenes, documents, and temporal sequences. It integrates enhanced multimodal alignment and...
- Qwen: Qwen3.5 397B A17B (Active, 262K ctx)
  The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers...
- Qwen: Qwen3.5 Plus 2026-02-15 (Active, 1M ctx)
  The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of...
- Qwen: Qwen3.5-122B-A10B (Active, 262K ctx)
  The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of...
- Qwen: Qwen3.5-27B (Active, 262K ctx)
  The Qwen3.5 27B native vision-language dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of...
- Qwen: Qwen3.5-35B-A3B (Active, 262K ctx)
  The Qwen3.5 series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall...
- Qwen: Qwen3.5-9B (Active, 262K ctx)
  Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design...
- Qwen: Qwen3.5-Flash (Active, 1M ctx)
  The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the...
- Qwen: Qwen3.6 Plus (Active, 1M ctx)
  Qwen 3.6 Plus builds on a hybrid architecture that combines efficient linear attention with sparse mixture-of-experts routing, enabling strong scalability and high-performance inference. Compared to the 3.5 series, it delivers...
- Tongyi DeepResearch 30B A3B (Active, 131K ctx)
  Tongyi DeepResearch is an agentic large language model developed by Tongyi Lab, with 30 billion total parameters activating only 3 billion per token. It's optimized for long-horizon, deep information-seeking tasks...
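Many of the names above encode a mixture-of-experts split: total parameters versus parameters activated per token (235B-A22B, 30B-A3B, and so on). A rough sketch of what that sparsity implies for per-token compute, using figures taken directly from the model names (real cost also depends on attention layers and shared parameters, so treat the percentages as a first-order estimate):

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of the model's weights touched per token in a sparse MoE design."""
    return active_b / total_b

# (total billions, active billions) pulled from the model names above
models = {
    "Qwen3 235B A22B": (235, 22),
    "Qwen3 30B A3B": (30, 3),
    "Qwen3.5 122B A10B": (122, 10),
}
for name, (total, active) in models.items():
    print(f"{name}: ~{active_fraction(total, active):.0%} of weights active per token")
```

This is why an 80B-total/3B-active model like Qwen3-Coder-Next can target local development workflows: per-token compute tracks the active count, while memory still has to hold the full total.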