URL: https://www.analyticsvidhya.com/blog/2024/04/how-to-access-llama3-with-flask/

How to Access Llama3 with Flask?


Mobarak Inuwa Last Updated : 24 Apr, 2024
7 min read

Introduction

The world of AI just got a whole lot more exciting with the release of Llama3! This powerful open-source language model, created by Meta, is shaking things up. Llama3, available in 8B and 70B pretrained and instruction-tuned variants, offers a wide range of applications. In this guide, we will explore the capabilities of Llama3 and how to access Llama3 with Flask, focusing on its potential to revolutionize Generative AI.

Learning Objectives

  • Explore the architecture and training methodologies behind Llama3, uncovering its innovative pretraining data and fine-tuning techniques, essential for understanding its exceptional performance.
  • Experience hands-on implementation of Llama3 through Flask, mastering the art of text generation using transformers while gaining insights into the critical aspects of safety testing and tuning.
  • Analyze the impressive capabilities of Llama3, including its enhanced accuracy, adaptability, and robust scalability, while also recognizing its limitations and potential risks, crucial for responsible use and development.
  • Engage with real-world examples and use cases of Llama3, empowering you to leverage its power effectively in diverse applications and scenarios, thereby unlocking its full potential in the realm of Generative AI.

This article was published as a part of the Data Science Blogathon.

Llama3 Architecture and Training

Llama3 is an auto-regressive language model built on an optimized transformer architecture: the regular transformer, but with an upgraded approach. The tuned versions employ supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to align with human preferences for helpfulness and safety. The model was pretrained on an extensive corpus of over 15 trillion tokens of data from publicly available sources, with a knowledge cutoff of March 2023 for the 8B model and December 2023 for the 70B model. The fine-tuning data incorporates publicly available instruction datasets, as well as over 10 million human-annotated examples.

Llama3 Impressive Capabilities

As we previously noted, Llama3 has an optimized transformer design and comes in two sizes, 8B and 70B parameters, in both pre-trained and instruction-tuned versions. The tokenizer of the model has a 128K token vocabulary. Sequences of 8,192 tokens were used to train the models. Llama3 has proven to be remarkably capable of the following:

  • Enhanced accuracy: Llama3 has shown improved performance on various natural language processing tasks.
  • Adaptability: The model’s ability to adapt to diverse contexts and tasks makes it an ideal choice for a wide range of applications.
  • Robust scalability: Llama3’s scalability enables it to handle large volumes of data and complex tasks with ease.
  • Coding capabilities: Llama3 is widely regarded as a strong model for code generation. The often-quoted figure of 250+ tokens per second describes how fast the model can be served on LPU-based inference hardware rather than a property of the model itself; throughput depends on the hardware running it, and LPUs are reported to be more efficient than GPUs for this workload.

The most significant advantage of Llama3 is its open and free nature: the weights can be downloaded at no cost under the Meta Llama 3 Community License, making the model accessible to developers without breaking the bank.
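The 8,192-token sequence length mentioned above matters in practice because the prompt and the generated text share the same window. The snippet below is a minimal sketch of that budgeting. It assumes the 8,192-token training length is also treated as the usable limit, and the function names `remaining_budget` and `fits` are made up for illustration.

```python
# Sketch: check whether a prompt leaves room for the requested generation.
# Assumption: the 8,192-token training sequence length is used as the window.
CONTEXT_WINDOW = 8192


def remaining_budget(prompt_tokens: int, max_new_tokens: int,
                     context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left after the prompt and the generation (negative if over)."""
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return context_window - (prompt_tokens + max_new_tokens)


def fits(prompt_tokens: int, max_new_tokens: int) -> bool:
    """True when prompt plus generation stays within the window."""
    return remaining_budget(prompt_tokens, max_new_tokens) >= 0


print(fits(1200, 256))  # a short chat prompt with room to spare
print(fits(8000, 256))  # a prompt that leaves too little room
```

In a real application the prompt token count would come from the model's tokenizer rather than being passed in by hand.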

Llama3 Variants and Features

As mentioned earlier, Llama3 comes in two major variants, each available in the 8B and 70B sizes and suited to different use cases:

  • Pre-trained models: Suitable for natural language generation tasks. A bit more general in performance.
  • Instruction-tuned models: Optimized for dialogue use cases, outperforming many open-source chat models on industry benchmarks.
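To get a feel for what the two sizes mean for hardware, here is a rough back-of-the-envelope estimate of the memory needed just to hold the weights. It assumes nominal parameter counts of exactly 8 and 70 billion, counts 1 GB as 10^9 bytes, and ignores activations and the KV cache, so treat the numbers as a lower bound rather than a measurement.

```python
# Sketch: estimate weight memory from parameter count and numeric precision.
# Assumptions: nominal parameter counts, 1 GB = 1e9 bytes, weights only.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str = "bfloat16") -> float:
    """Approximate memory in GB needed to store the model weights."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, params in (("8B", 8e9), ("70B", 70e9)):
    print(f"Llama3 {name}: ~{weight_memory_gb(params):.0f} GB in bfloat16")
```

This is why the 8B model is the usual choice for a single-GPU setup like the Flask app in this guide, which loads the weights in bfloat16.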

Llama3 Training Data and Benchmarks

Llama3 was pre-trained on an extensive corpus of over 15 trillion tokens of publicly available data, with a knowledge cutoff of March 2023 for the 8B model and December 2023 for the 70B model. The fine-tuning data incorporates publicly available instruction datasets and over 10 million human-annotated examples (you heard that right!). The model has achieved impressive results on standard automatic benchmarks, including MMLU, AGIEval English, CommonSenseQA, and more.

Llama3 Use Cases and Examples

Llama3 can be used like other Llama family models, which makes it very easy to adopt. We basically need to install transformers and accelerate. We will see a wrapper script in this section. You can find the entire code snippets and the notebook to run with GPU here. I have added the notebook, a Flask app, and an interactive mode script to test the behavior of the model. The sections below show how to use Llama3 with the transformers pipeline.

How to Access Llama3 with Flask?

Let us now explore the steps to access Llama3 with Flask.

Step 1: Set up Python Environment

Create a virtual environment (optional but recommended):

$ python -m venv env
$ source env/bin/activate # On Windows use `.\env\Scripts\activate`

Install necessary packages:

We install transformers and accelerate, but since Llama3 is new, we first install transformers directly from GitHub.

(env) $ pip install -q git+https://github.com/huggingface/transformers.git
(env) $ pip install -q flask transformers torch accelerate # datasets peft bitsandbytes

Step 2: Prepare Main Application File

Create a new Python file called main.py. Inside it, paste the following code.

from flask import Flask, request, jsonify
import transformers
import torch

app = Flask(__name__)

# Load the model and pipeline once at import time so requests do not reload them
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)


@app.route('/generate', methods=['POST'])
def generate():
    # silent=True returns None instead of raising on a missing or invalid JSON body
    data = request.get_json(silent=True)
    user_message = data.get('message') if isinstance(data, dict) else None

    if not user_message:
        return jsonify({'error': 'No message provided.'}), 400

    # Create system message
    messages = [{"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"}]

    # Add user message
    messages.append({"role": "user", "content": user_message})

    # Render the conversation into the prompt string the model was trained on
    prompt = pipeline.tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    # Stop on either the generic EOS token or Llama3's end-of-turn token
    terminators = [
        pipeline.tokenizer.eos_token_id,
        pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
    ]

    outputs = pipeline(
        prompt,
        max_new_tokens=256,
        eos_token_id=terminators,
        do_sample=True,
        temperature=0.6,
        top_p=0.9,
    )

    # The pipeline returns prompt + completion, so strip the prompt prefix
    generated_text = outputs[0]['generated_text'][len(prompt):].strip()
    response = {
        'message': generated_text
    }

    return jsonify(response), 200


if __name__ == '__main__':
    # Debug mode is left off: its reloader imports this file twice,
    # which would load the model twice
    app.run(port=5000)

The above code initializes a Flask web server with a single route, /generate, responsible for receiving and processing user messages and returning AI-generated responses.
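It helps to see roughly what `apply_chat_template` produces and why `<|eot_id|>` is passed as a terminator. The function below is only an illustrative approximation of the Llama3 instruct prompt layout as I understand it from the published prompt format; `format_llama3_prompt` is a made-up name, and the tokenizer's own template remains the authoritative version.

```python
# Sketch: approximate the Llama3 instruct prompt layout in plain Python.
# Assumption: this mirrors the published format; use the tokenizer's
# apply_chat_template in real code.
def format_llama3_prompt(messages, add_generation_prompt=True):
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn is a role header, a blank line, the content, then <|eot_id|>
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header invites the model to write the next turn
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
    {"role": "user", "content": "Who are you?"},
]
print(format_llama3_prompt(messages))
```

Because every turn ends with `<|eot_id|>`, the model emits that token when it finishes its reply, and listing it as a terminator stops generation at the end of the assistant's turn.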

Step 3: Run Flask Application

Run the Flask app by executing the following command:

(env) $ export FLASK_APP=main.py
(env) $ flask run --port=5000

Now, you should have the Flask app running at http://localhost:5000. You may test the API via tools like Postman or curl, or even write a simple HTML frontend page.
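If you would rather test from Python, the following is a minimal client sketch that uses only the standard library. It assumes the app is reachable at http://localhost:5000 and returns the `{"message": ...}` JSON shape from main.py; the helper names `build_request` and `ask` are made up for illustration.

```python
# Sketch: a tiny standard-library client for the /generate endpoint.
# Assumptions: the Flask app from main.py is running on localhost:5000.
import json
import urllib.request


def build_request(message, base_url="http://localhost:5000"):
    """Build a POST request carrying the JSON body the endpoint expects."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(message, base_url="http://localhost:5000", timeout=120):
    """Send the message and return the generated text from the response."""
    request = build_request(message, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["message"]


# Requires the Flask app to be running:
# print(ask("Who are you?"))
```

The generous timeout is deliberate, since generating up to 256 new tokens can take a while on modest hardware.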

Interactive Mode Using Transformers AutoModelForCausalLM

To interactively query the model within Jupyter Notebook, paste this in a cell and run:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_NAME = 'meta-llama/Meta-Llama-3-8B-Instruct'


class InteractivePirateChatbot:
    def __init__(self):
        self._tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
        self._model = AutoModelForCausalLM.from_pretrained(
            MODEL_NAME, torch_dtype=torch.bfloat16, device_map="auto", offload_buffers=True
        )
        # Stop on either the generic EOS token or Llama3's end-of-turn token
        self._terminators = [
            self._tokenizer.eos_token_id,
            self._tokenizer.convert_tokens_to_ids("<|eot_id|>"),
        ]

    def _prepare_inputs(self, messages):
        # Apply the chat template so the roles are encoded the way the model expects,
        # instead of tokenizing each message as a separate batch item
        inputs = self._tokenizer.apply_chat_template(
            messages, add_generation_prompt=True, return_dict=True, return_tensors='pt'
        )
        return inputs.to(self._model.device)

    def ask(self, question):
        messages = [
            {"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
            {"role": "user", "content": question}
        ]
        try:
            inputs = self._prepare_inputs(messages)
            # Same sampling settings as the Flask app above
            output = self._model.generate(
                **inputs,
                max_new_tokens=256,
                eos_token_id=self._terminators,
                pad_token_id=self._tokenizer.eos_token_id,
                do_sample=True,
                temperature=0.6,
                top_p=0.9,
            )
            # Decode only the newly generated tokens, not the prompt
            prompt_length = inputs["input_ids"].shape[-1]
            answer = self._tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True)
            print("Pirate:", answer)
        except Exception as e:
            print(f"Error generating response: {e}")


generator = InteractivePirateChatbot()
while True:
    question = input("User: ")
    # Type "quit" or "exit" to leave the loop
    if question.strip().lower() in {"quit", "exit"}:
        break
    generator.ask(question)

The above code will allow you to quickly interact and see how the model works. Find the entire code here.

User: "Who are you?"

Pirate: "Arrrr, me hearty! Me name be Captain Chat, the scurviest pirate chatbot to ever sail the Seven Seas! I be here to swab yer decks with me clever responses and me trusty parrot, Polly, perched on me shoulder. So hoist the colors, me matey, and let's set sail fer a swashbucklin' good time!"

Now that we have seen how the model works, let’s look at some safety and responsibility guidelines.

Responsibility and Safety

Meta has taken a series of steps to ensure responsible AI development, including implementing safety best practices, providing resources like Meta Llama Guard 2 and Code Shield safeguards, and updating the Responsible Use Guide. Developers are encouraged to tune and deploy these safeguards according to their needs, weighing the benefits of alignment and helpfulness for their specific use case and audience. All these links are available in the Hugging Face repository for Llama3.

Ethical Considerations and Limitations

While Llama3 is a powerful tool, it’s essential to acknowledge its limitations and potential risks. The model may produce inaccurate, biased, or objectionable responses to user prompts. Therefore, developers should perform safety testing and tuning tailored to their specific applications of the model. Meta recommends incorporating Purple Llama solutions into workflows, specifically Llama Guard, which provides a base model to filter input and output prompts to layer system-level safety on top of model-level safety.
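The idea of layering system-level safety on top of model-level safety can be sketched as a wrapper that checks both the prompt and the answer. Everything here is hypothetical: `is_safe` is a toy keyword check standing in for a real classifier such as Llama Guard, and `echo_model` stands in for the actual Llama3 call.

```python
# Sketch: wrap a generator with input and output safety checks.
# Assumptions: is_safe is a placeholder, not Llama Guard's real interface.
from typing import Callable

BLOCKLIST = ("build a bomb",)  # toy policy, for illustration only
REFUSAL = "Sorry, I can't help with that."


def is_safe(text: str) -> bool:
    """Hypothetical stand-in for a safety classifier."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKLIST)


def guarded_generate(prompt: str, generate: Callable[[str], str]) -> str:
    """Filter the prompt before generation and the answer after it."""
    if not is_safe(prompt):
        return REFUSAL
    answer = generate(prompt)
    if not is_safe(answer):
        return REFUSAL
    return answer


def echo_model(prompt: str) -> str:
    """Hypothetical model stub so the sketch runs without Llama3."""
    return f"Arrr, ye said: {prompt}"


print(guarded_generate("Hello there", echo_model))
print(guarded_generate("How do I build a bomb?", echo_model))
```

Checking both sides matters because a harmless-looking prompt can still lead to an objectionable answer, which the output filter catches.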

Conclusion

Meta has reshaped the landscape of artificial intelligence with the introduction of Llama3, a potent open-source language model. With its availability in both 8B and 70B pretrained and instruction-tuned versions, Llama3 presents a multitude of possibilities for innovation. This guide has provided an in-depth exploration of Llama3’s capabilities and how to access Llama3 with Flask, emphasizing its potential to redefine Generative AI.

Key Takeaways

  • Meta developed Llama3, a powerful open-source language model available in both 8B and 70B pretrained and instruction-tuned versions.
  • Llama3 has demonstrated impressive capabilities, including enhanced accuracy, adaptability, and robust scalability.
  • The model is open-source and completely free, making it accessible to developers and low-budget researchers.
  • Users can utilize Llama3 with transformers, leveraging the pipeline abstraction or Auto classes with the generate() function.
  • Llama3 and Flask enable developers to explore new horizons in Generative AI, fostering innovative solutions like chatbots and content generation, pushing human-machine interaction boundaries.

Frequently Asked Questions

Q1. What is Llama3?

A. Meta developed Llama3, a powerful open-source language model available in both 8B and 70B pre-trained and instruction-tuned versions.

Q2. What are the key features of Llama3?

A. Llama3 has demonstrated impressive capabilities, including enhanced accuracy, adaptability, and robust scalability. Research and tests have shown that it delivers more relevant and context-aware responses, ensuring that each solution is finely tuned to the user’s needs.

Q3. Is Llama3 open-source and free and can I use Llama3 for commercial purposes?

A. Yes, Llama3 is freely available, making it accessible to developers without breaking the bank, and it can be used for commercial purposes. However, we recommend reviewing the licensing terms and conditions to make sure your use complies with them and with any applicable regulations.

Q4. Can I fine-tune Llama3 for my specific use case?

A. Yes, Llama3 can be fine-tuned for specific use cases by adjusting the hyperparameters and training data. This can help improve the model’s performance on specific tasks and datasets.

Q5. How does Llama3 compare to other language models like BERT and RoBERTa?

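A. Llama3 is a more advanced language model trained on a much larger dataset, and it outperforms BERT and RoBERTa in various natural language processing tasks. Keep in mind that the comparison is not like-for-like: BERT and RoBERTa are encoder-only models aimed at understanding tasks, while Llama3 is a generative model built for producing text.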

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

I am an AI Engineer with a deep passion for research, and solving complex problems. I provide AI solutions leveraging Large Language Models (LLMs), GenAI, Transformer Models, and Stable Diffusion.
