


Qwen 1.5 (1.8B): The Latest Iteration of Alibaba Cloud's Large Language Model Series
Qwen 1.5 (1.8B) is the newest iteration in Alibaba Cloud's Qwen series of large language models, a series that spans from 0.5 billion to 72 billion parameters. Aiming to surpass its competitors, Qwen 1.5 has made significant strides in delivering enhanced performance and aligning with human preferences.
Qwen 1.5 (1.8B), the beta version of Qwen2, is a transformer-based, decoder-only language model pre-trained on a large volume of data. The series is released in six sizes - 0.5B, 1.8B, 4B, 7B, 14B, and 72B - and each size ships both a base language model and an aligned chat model.
The core architecture of Qwen 1.5 is a Transformer with SwiGLU activation, attention QKV bias, grouped-query attention, a mixture of sliding-window and full attention, and more. The model supports a context length of 32K tokens, enabling it to process and generate longer text sequences. It also has multilingual capabilities, with an improved tokenizer adapted to multiple natural languages and to code.
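These architectural settings can be inspected directly from the published checkpoint's configuration. The sketch below uses the standard transformers API against the Qwen/Qwen1.5-1.8B repo on the Hugging Face Hub; the commented values reflect that checkpoint's config as published, but verify them against the repo itself.

```python
from transformers import AutoConfig

# Fetch the configuration of the 1.8B base checkpoint from the Hugging Face Hub.
config = AutoConfig.from_pretrained("Qwen/Qwen1.5-1.8B")

print(config.model_type)               # "qwen2" (Qwen 1.5 is the Qwen2 beta)
print(config.max_position_embeddings)  # 32768, i.e. the 32K-token context window
print(config.hidden_act)               # "silu", the gate activation in the SwiGLU blocks
```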
Qwen 1.5 offers stiff competition to other large language models. Compared against models like Claude 2.1, GPT-3.5-Turbo, and Mixtral, the larger Qwen 1.5 chat models deliver comparable or better results on several benchmarks.
On core capabilities such as language understanding and reasoning, Qwen 1.5 performs strongly across traditional benchmarks. On alignment with human preferences, the Qwen 1.5 chat models have posted impressive results on benchmarks like MT-Bench and AlpacaEval.

In terms of multilingual capabilities, Qwen 1.5 performs well across a diverse set of languages, evaluated on benchmarks covering exams, understanding, translation, and math.
When using Qwen 1.5, install transformers>=4.37.0 to avoid errors, since that release adds the Qwen2 architecture these checkpoints require. It is also advisable not to use the base language models for text generation directly; instead, apply post-training techniques like SFT, RLHF, or continued pretraining, or use the aligned chat models, as in the sketch below.
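As a minimal sketch of that chat workflow: the snippet below assumes the Qwen/Qwen1.5-1.8B-Chat repo on the Hugging Face Hub and a recent transformers install; the prompt content is purely illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-1.8B-Chat"  # the aligned chat variant of the 1.8B model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Build a chat prompt with the tokenizer's built-in chat template.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Briefly explain what a decoder-only transformer is."},
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer([prompt], return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the echoed prompt.
response = tokenizer.decode(
    output_ids[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True
)
print(response)
```

Note the use of apply_chat_template: the chat checkpoints expect the chat prompt format bundled with the tokenizer, so the prompt string should not be hand-assembled.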
Check the license of each model inside its HF repo. It is NOT necessary for you to submit a request for commercial usage.
Qwen 1.5 (1.8B) represents a significant milestone in the development of large language models. Its impressive capabilities and competitive performance make it a promising tool for various applications. As the model continues to evolve, it's likely to offer even more advanced features and improved performance.