Llama 3-based 8B model for classifying the safety of LLM inputs and outputs
LlamaGuard-2-8B is an 8-billion-parameter safeguard model developed by Meta AI that classifies the safety of both LLM inputs (prompts) and LLM outputs (responses). It is built on Meta Llama 3 and is trained to predict safety labels across 11 categories from the MLCommons taxonomy of hazards.
LlamaGuard-2-8B is designed to be integrated into LLM-powered applications as a safety check on generated content. It can be used to filter out potentially harmful or inappropriate text before it is displayed to users.
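The filtering step described above can be sketched as follows. This is a minimal illustration, not Meta's reference integration. It assumes the classifier returns text whose first line is "safe" or "unsafe", with an optional second line of comma-separated category codes (the format shown on the model card). The category names are reproduced from the model card and should be verified against it. The names `parse_verdict`, `guard_response`, and the `classify` callable are hypothetical; in a real application `classify` would wrap a call to the model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Category codes as listed on the Llama Guard 2 model card.
# Assumption: verify against the card before relying on these labels.
CATEGORIES = {
    "S1": "Violent Crimes",
    "S2": "Non-Violent Crimes",
    "S3": "Sex-Related Crimes",
    "S4": "Child Sexual Exploitation",
    "S5": "Specialized Advice",
    "S6": "Privacy",
    "S7": "Intellectual Property",
    "S8": "Indiscriminate Weapons",
    "S9": "Hate",
    "S10": "Suicide & Self-Harm",
    "S11": "Sexual Content",
}


@dataclass
class Verdict:
    safe: bool
    categories: List[str]  # violated category codes, empty when safe


def parse_verdict(raw: str) -> Verdict:
    """Parse classifier text of the assumed form 'safe' or 'unsafe\\nS1,S9'."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    if not lines:
        # Fail closed: an empty answer is treated as unsafe.
        return Verdict(False, [])
    if lines[0].lower() == "safe":
        return Verdict(True, [])
    # Anything other than an explicit "safe" is treated as unsafe.
    codes = lines[1].split(",") if len(lines) > 1 else []
    return Verdict(False, [c.strip() for c in codes if c.strip()])


def guard_response(
    text: str,
    classify: Callable[[str], str],  # hypothetical wrapper around the model
    fallback: str = "[response withheld]",
) -> str:
    """Return the text only if the classifier marks it safe."""
    verdict = parse_verdict(classify(text))
    return text if verdict.safe else fallback
```

Treating any unrecognized output as unsafe is a deliberate design choice in this sketch: a filter that fails closed withholds text when the classifier misbehaves, rather than letting it through unchecked.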
The model is currently trained on English text, though it could potentially be fine-tuned to support other languages.
LlamaGuard-2-8B inherits the Transformer architecture of its base model, Meta Llama 3.
It was fine-tuned from Llama 3 on additional safety-classification data, including a diverse set of online text covering the 11 safety categories.
The training data for LlamaGuard-2-8B has not been publicly released, so its exact size and composition cannot be independently verified.
No knowledge cutoff is explicitly stated for LlamaGuard-2-8B; its training data likely extends to 2023.
The model's training data is designed to be diverse and representative, but some biases may remain. Developers should carefully evaluate the model's performance and outputs for signs of bias or gaps in coverage.
According to Meta's reported results, LlamaGuard-2-8B outperforms other popular content moderation APIs on internal test sets, achieving an F1 score of 0.915 and a false positive rate of 0.040. It is also reported to be robust and to generalize well across different types of content.
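For readers who want to compute the same two metrics on their own evaluation set, the standard definitions can be sketched as below. Here "positive" means the classifier flagged the text as unsafe. The function names and the confusion-matrix counts in the usage note are illustrative assumptions and are not Meta's data.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def false_positive_rate(fp: int, tn: int) -> float:
    """Share of genuinely safe items that were wrongly flagged as unsafe."""
    return fp / (fp + tn) if (fp + tn) else 0.0
```

As a usage example with invented counts, `false_positive_rate(fp=4, tn=96)` corresponds to 4 of 100 safe items being wrongly flagged. A low false positive rate matters in moderation because every false positive blocks a legitimate response.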
Meta AI has published ethical guidelines for the development and use of LlamaGuard-2-8B, emphasizing the importance of responsible AI and the need to mitigate potential harms.
LlamaGuard-2-8B is released under the Meta Llama 3 Community License, which permits both commercial and non-commercial use subject to its terms and conditions.