CodeGen2-16B: Powerful program synthesis model by Salesforce AI Research.
CodeGen2-16B is a large language model developed by Salesforce AI Research. It is built for program synthesis: generating and understanding code across a wide range of programming languages.
CodeGen2-16B is a Swiss Army knife for developers, designed to assist in writing and understanding code. From code generation to code completion, this model is a great tool for those who want to bring AI into their coding work.
Supported languages (and frameworks) are as follows: C, C++, C#, Dart, Go, Java, JavaScript, Kotlin, Lua, PHP, Python, Ruby, Rust, Scala, Shell, SQL, Swift, TypeScript, and Vue.
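As a rough illustration of code completion with the model, here is a minimal sketch using the Hugging Face `transformers` library. The Hub identifier `Salesforce/codegen2-16B`, the `trust_remote_code=True` flag, and the helper name `complete` are assumptions not stated in the text above; check the model's own page before relying on them. Loading the full model needs a machine with a large amount of memory.

```python
# Assumed Hugging Face Hub identifier; not given in the description above.
MODEL_ID = "Salesforce/codegen2-16B"


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Return the prompt followed by the model's completion (hypothetical helper)."""
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # trust_remote_code is assumed to be required for this checkpoint's custom code.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

    # Tokenize the prompt and let the model continue it.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Downloads the weights on first use; expect this to be slow and memory-hungry.
    print(complete("def hello_world():"))
```

The prompt can be in any of the supported languages listed above; the model simply continues the text it is given.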
CodeGen2-16B is a Transformer-based model with 16 billion parameters, which makes it the largest model in the CodeGen2 family. It can process and generate code quickly, thanks to techniques like Flash Attention.
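To give a sense of what 16 billion parameters means in practice, the sketch below estimates the memory needed just to hold the weights at a few common precisions. It assumes a nominal count of exactly 16 billion parameters (the real count may differ slightly) and ignores activations, the key-value cache, and framework overhead; `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_params: int, bytes_per_param: int) -> float:
    """Memory needed to store the weights alone, in GiB (2**30 bytes)."""
    return num_params * bytes_per_param / 2**30


# Nominal parameter count; an assumption based on the "16B" in the model name.
NUM_PARAMS = 16_000_000_000

# Bytes used per parameter at each precision.
PRECISIONS = {"float32": 4, "float16": 2, "int8": 1}

for name, size in PRECISIONS.items():
    print(f"{name}: {weight_memory_gib(NUM_PARAMS, size):.1f} GiB")
```

Under these assumptions the weights take roughly 60 GiB in float32, 30 GiB in float16, and 15 GiB in int8, so half precision or quantization is usually what makes the model fit on available hardware.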
This model was trained on the stricter permissive subset of the deduplicated version of the Stack dataset (v1.1).
The model's knowledge extends only as far as its training data, which ends in June 2022.
The training data is a melting pot of programming languages and domains, but its exact diversity and potential biases are not something we can describe here. It's a topic that requires careful consideration and research.
The model is a gift to the research community, available for research and non-commercial use under the Salesforce AI Research license.