CodeGen2 (7B)

A 7 billion parameter autoregressive language model, capable of generating and completing code in 12 programming languages and popular frameworks.


Basic Information

  • Model Name: CodeGen2 (7B)
  • Developer/Creator: Salesforce AI Research
  • Release Date: 2023
  • Version: 2.0
  • Model Type: Autoregressive language model



CodeGen2 (7B) is a 7 billion parameter autoregressive language model for program synthesis. Developed by Salesforce AI Research, it generates executable code from natural language descriptions and completes partially written code snippets with precision.

Key Features

  • Supports code infilling: CodeGen2 (7B) takes your partially completed code and fills in the gaps, bringing it to life.
  • Trained on a diverse dataset: Covering 12 programming languages and popular frameworks, this model is a diverse companion, capable of adapting to various coding environments and use cases.
  • Capable of multi-turn code generation and completion: Engage in a dynamic dialogue with CodeGen2 (7B), refining and iterating on your code until it meets your exact specifications.
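The infilling feature above works through sentinel tokens in the prompt. As a minimal sketch, assuming the `<mask_1>`/`<sep>` sentinel format published with the CodeGen2 checkpoints, an infill prompt can be assembled like this:

```python
# Sketch of CodeGen2-style infill prompting: the gap is marked with a
# <mask_1> sentinel, and <|endoftext|><sep><mask_1> is appended so the
# model generates the missing span after the final sentinel.
def build_infill_prompt(prefix: str, suffix: str) -> str:
    return prefix + "<mask_1>" + suffix + "<|endoftext|>" + "<sep>" + "<mask_1>"

prompt = build_infill_prompt(
    "def count_lines(path):\n    ",  # code before the gap
    "\n    return total",            # code after the gap
)
```

The text the model emits after the trailing `<mask_1>` is its suggestion for the gap.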

Intended Use

CodeGen2 (7B) is a powerful helper for program synthesis. Whether you're a seasoned developer looking to streamline your workflow or an aspiring coder, this model has you covered: it can generate code from natural language descriptions, complete partially written code snippets, and assist with code refactoring and optimization.

Language Support

Supported languages (and frameworks) are as follows: C, C++, C#, Dart, Go, Java, JavaScript, Kotlin, Lua, PHP, Python, Ruby, Rust, Scala, Shell, SQL, Swift, TypeScript, and Vue.

Technical Details


CodeGen2 (7B) builds on the transformer architecture popularized by GPT-3, with modifications introduced for program synthesis tasks. The result is a model that captures long-range dependencies in the input sequence, helping ensure that generated code is not only well-structured but also semantically coherent.

Training Data

This checkpoint was trained on the stricter permissive subset of the deduplicated version of The Stack dataset (v1.1). From complex algorithms to simple scripts, CodeGen2 (7B) has been exposed to a wide range of programming practices and techniques.

Data Source and Size

The dataset comprises approximately 1.5 billion tokens. The code was curated to ensure high quality and relevance to the target programming languages.

Knowledge Cutoff

Like a wise mentor, CodeGen2 (7B) has been trained on a wealth of knowledge, but even the most knowledgeable have their limits. The model's knowledge cutoff is determined by the timestamp of the training data, which was collected up to June 2022.

Diversity and Bias

From niche programming domains to popular use cases, this model has been exposed to a wide range of coding practices and techniques.

Performance Metrics

On the HumanEval benchmark, this model achieved a score of 30.7, outperforming GPT-3. On the MBPP (Mostly Basic Programming Problems) benchmark, it scored 43.1.


API Usage Example
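The page does not include a concrete snippet, so here is a hedged sketch of calling the model through the Hugging Face `transformers` library. The model id `Salesforce/codegen2-7B` and the `trust_remote_code`/`revision` arguments follow the public model card; loading the 7B checkpoint downloads multiple gigabytes of weights and needs substantial memory.

```python
# Sketch: causal code completion with CodeGen2 (7B) via transformers.
# The import is kept inside the function so this module can be loaded
# and inspected even without transformers installed.
def complete_code(prompt: str, max_length: int = 128) -> str:
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen2-7B")
    model = AutoModelForCausalLM.from_pretrained(
        "Salesforce/codegen2-7B", trust_remote_code=True, revision="main"
    )
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    generated_ids = model.generate(input_ids, max_length=max_length)
    return tokenizer.decode(generated_ids[0], skip_special_tokens=True)

# Example call (commented out to avoid an accidental large download):
# print(complete_code("def hello_world():"))
```

For a hosted API rather than local inference, the same prompt/completion shape applies; consult the provider's endpoint documentation for authentication and request format.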

License Type

CodeGen2 (7B) is available under a commercial license. Developers interested in using the model for commercial purposes should contact Salesforce for licensing information and terms of use.
