Summary
- Gemma is aimed at responsible AI development, with smaller model sizes that can run on a laptop and a design that adheres to Google's AI principles.
- Google releases a toolkit for safe AI applications, including model debugging, safety classification, and best practices for model builders.
- Gemma supports retrieval-augmented generation (RAG), which allows you to converse with documents, and it is compatible with a range of tools and systems.
Google's Gemini models have been around for a couple of months now, and the company has announced a step-up to Gemini 1.5, with the Pro model offering a context window of an astounding 1 million tokens. Now Google is releasing Gemma, a language model aimed at helping people develop AI responsibly. It's available worldwide and comes in two model sizes, Gemma 2B and Gemma 7B, each released with pre-trained and instruction-tuned variants. Google claims that it performs better on the MMLU benchmark than Mistral 7B and Llama 2 13B.
Gemma's big appeal is that, thanks to its smaller model sizes, it can run on a laptop or desktop computer with ease. Google says it excels at some key benchmarks, where it can perform better than larger alternative models. It's built by following Google's AI principles, and a significant amount of testing went into preventing it from giving responses that go against those principles. The Gemma models are trained in the same way that Gemini was, too, benefitting from those advanced processes.
Google also released a Responsible Generative AI Toolkit to help developers and researchers with building safe AI applications. It includes:
- Safety classification: A novel methodology for building robust safety classifiers with minimal examples.
- Debugging: A model debugging tool helps you investigate Gemma's behavior and address potential issues.
- Guidance: You can access best practices for model builders based on Google’s experience in developing and deploying large language models.
Gemma supports retrieval-augmented generation (RAG) too, which is how Chat with RTX allows you to converse with documents. It essentially means you can give Gemma a knowledge base to draw on when it responds, providing additional context beyond its training data.
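To make the idea concrete, here is a minimal sketch of the retrieval half of RAG in plain Python. It is not how Gemma or Chat with RTX implements it: the word-overlap scoring is a toy stand-in for a real embedding model, and the function names (`retrieve`, `build_prompt`) and the prompt wording are assumptions made up for illustration. The resulting prompt string is what you would hand to whichever model you are running.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question (toy scoring)."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, documents, k=1):
    """Prepend the retrieved passages to the question as extra context."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical knowledge base, using facts from this article.
knowledge_base = [
    "Gemma comes in two sizes: 2B and 7B.",
    "Chat with RTX runs a local LLM on Nvidia GPUs.",
    "Kaggle offers free notebooks.",
]

prompt = build_prompt("What sizes does Gemma come in?", knowledge_base)
print(prompt)
```

A real setup would swap the word-count vectors for learned embeddings and a vector store, but the flow is the same: find the most relevant passages, then place them in the prompt ahead of the question.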
Gemma supports the following tools and systems:
- Multi-framework tools: There are reference implementations for inference and fine-tuning across Keras 3.0, native PyTorch, JAX, and Hugging Face Transformers.
- Cross-device compatibility: Gemma models run across many different device types, including laptop, desktop, IoT, mobile and cloud.
- Nvidia partnership: Gemma is optimized for Nvidia GPUs, thanks to a partnership between Google and Nvidia. You'll be able to use Gemma in Chat with RTX soon.
- Optimized for Google Cloud: Vertex AI provides a broad MLOps toolset with a range of tuning options and one-click deployment using built-in inference optimizations. Advanced customization is available with fully-managed Vertex AI tools or with self-managed GKE, including deployment to cost-efficient infrastructure across GPU, TPU, and CPU from either platform.
Gemma is free to use on Kaggle and through the free tier for Colab notebooks, and there's also $300 in credits for first-time Google Cloud users. Researchers can also apply for up to $500,000 in free Google Cloud credits.
If you want to give Gemma a try, you can, and you'll be able to run it on basically any machine that has an Nvidia GPU with enough VRAM.
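As a rough guide to what "enough VRAM" means, the sketch below estimates the memory needed just to hold the model weights. It assumes the nominal parameter counts implied by the names (2 billion and 7 billion), which may differ from the real totals, and it ignores activations and other runtime overhead, so treat the numbers as a lower bound rather than a requirement published by Google. The function name `weight_memory_gib` is made up for this example.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Approximate memory for model weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# Nominal sizes taken from the model names (an assumption, not exact counts).
models = {"Gemma 2B": 2e9, "Gemma 7B": 7e9}

# Common storage precisions and their size per parameter in bytes.
precisions = {"32-bit float": 4, "16-bit float": 2, "8-bit integer": 1}

for name, params in models.items():
    for label, size in precisions.items():
        gib = weight_memory_gib(params, size)
        print(f"{name} at {label}: about {gib:.1f} GiB")
```

Halving the precision halves the footprint, which is why lower-precision versions of a model are the usual route to fitting it on a consumer GPU.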
