CodeGen2 (7B) is a 7 billion parameter autoregressive language model for program synthesis, developed by Salesforce AI Research. Given a natural language description, it generates executable code; it can also complete partially written code snippets and assist with code refactoring and optimization. Whether you're a seasoned developer looking to streamline your workflow or an aspiring coder, the model covers a wide range of programming languages and popular frameworks.
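As a sketch of how such a checkpoint is typically used for code completion, assuming it is published on the Hugging Face Hub under an id like `Salesforce/codegen2-7B` and works with the standard `transformers` causal-LM workflow (the model id and the `trust_remote_code` flag are assumptions, not confirmed by this card):

```python
# Hypothetical usage sketch: sampling a completion with Hugging Face
# transformers. Running main() downloads ~7B parameters of weights.
# The model id "Salesforce/codegen2-7B" is an assumption.

PROMPT = "def fibonacci(n):"  # a partially written snippet to complete

def complete(prompt: str, max_new_tokens: int = 64) -> str:
    # Imports are kept inside the function so the sketch can be read
    # and imported without transformers/torch installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("Salesforce/codegen2-7B")
    model = AutoModelForCausalLM.from_pretrained(
        "Salesforce/codegen2-7B", trust_remote_code=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(complete(PROMPT))
```

The prompt is simply the code written so far; the model continues it token by token, which is why the same interface serves both generation from scratch and completion of a partial snippet.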
Supported languages (and frameworks) are as follows: C, C++, C#, Dart, Go, Java, JavaScript, Kotlin, Lua, PHP, Python, Ruby, Rust, Scala, Shell, SQL, Swift, TypeScript, and Vue.
CodeGen2 (7B) builds on the transformer architecture popularized by GPT-3, with modifications introduced for program synthesis tasks. The resulting model captures long-range dependencies in the input sequence, helping generated code stay both well-structured and semantically coherent.
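To make "autoregressive" concrete: the model predicts one token at a time, with each prediction conditioned on everything generated so far. A minimal sketch of greedy decoding, where a toy lookup table stands in for the 7B network (the table and tokens are invented for illustration):

```python
# Toy greedy autoregressive decoding. A tiny lookup table stands in
# for the real next-token distribution of a 7B-parameter model.
TOY_MODEL = {
    "def": "add",
    "add": "(",
    "(": "a",
    "a": ",",
    ",": "b",
    "b": ")",
    ")": ":",
}

def decode(prompt_tokens, max_new_tokens=10, eos=":"):
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # Condition on the sequence so far (here: just the last token;
        # a transformer attends over the whole context).
        nxt = TOY_MODEL.get(tokens[-1])
        if nxt is None:
            break
        tokens.append(nxt)
        if nxt == eos:  # stop at an end-of-sequence marker
            break
    return tokens

print(decode(["def"]))  # → ['def', 'add', '(', 'a', ',', 'b', ')', ':']
```

The real model differs in that each step conditions on the full context via self-attention rather than only the previous token, which is what lets it keep distant parts of a program consistent.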
This checkpoint was trained on the stricter permissive-license subset of the deduplicated Stack dataset (v1.1), exposing the model to a wide range of programming practices and techniques, from complex algorithms to simple scripts. A dataset of approximately 1.5 billion tokens was used, curated for quality and relevance to the target programming languages.
Like any trained model, CodeGen2 (7B) has limits: its knowledge cutoff follows the timestamp of the training data, which was collected up to June 2022. Within that window, it has seen coding practices from niche programming domains to popular use cases.
On the HumanEval benchmark, this model achieves a score of 30.7, outperforming GPT-3; on MBPP (Mostly Basic Programming Problems), it scores 43.1.
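Both benchmarks score functional correctness rather than text similarity: a generated completion counts as a pass only if it satisfies the problem's unit tests when executed. A simplified sketch of that check (the candidate snippets and tests below are illustrative, not actual benchmark problems):

```python
# Simplified HumanEval/MBPP-style check: execute a generated candidate,
# then run unit-test assertions against it. Real harnesses additionally
# sandbox and time-limit the execution, which is omitted here.
def passes(candidate_src: str, test_src: str) -> bool:
    namespace = {}
    try:
        exec(candidate_src, namespace)  # define the candidate function
        exec(test_src, namespace)       # run the problem's assertions
        return True
    except Exception:
        return False

good = "def incr(x):\n    return x + 1"
bad = "def incr(x):\n    return x - 1"
tests = "assert incr(1) == 2\nassert incr(-1) == 0"

print(passes(good, tests))  # → True
print(passes(bad, tests))   # → False
```

A reported score is then the fraction of problems whose sampled completion passes such a check, so a HumanEval score of 30.7 means roughly 31% of problems were solved correctly.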
CodeGen2 (7B) is available under a commercial license; developers interested in commercial use should contact Salesforce for licensing information and terms of use.