
URL: https://huggingface.co/Ateeqq/food-analysis

Ateeqq/food-analysis · Hugging Face


🍽️ Food Analyzer

A model that analyzes food images and generates structured nutritional information in JSON format.
It gives users a quick estimate of what they are eating by predicting calories, macronutrients, and meal composition directly from an image.

Built on Qwen3-VL-2B-Instruct (4-bit) and fine-tuned with LoRA, the model is designed to run with a small memory footprint.


🧠 Main Capabilities

🍕 Food Recognition

  • Identifies dish name and food type (homemade, restaurant, etc.)

🔥 Calorie Prediction

  • Estimates total calories per serving

🥗 Macronutrient Breakdown

  • Protein (g)
  • Carbohydrates (g)
  • Fat (g)

🍳 Cooking Method Detection

  • Boiled, fried, grilled, baked, mixed, etc.

📏 Portion Estimation

  • Approximates ingredient quantities

🚀 Features

✅ Accepts food images as input
✅ Outputs clean structured JSON only
✅ Detects dish name and cooking method
✅ Estimates nutritional values (calories, macros)


📥 Model Overview

| Property | Value |
|---|---|
| Base Model | Qwen3-VL-2B-Instruct |
| Finetuning Method | LoRA |
| Modality | Image + Text |
| Output | JSON |
| License | openrail |

🧠 Example Output

{
  "dish_name": "Vegetable Bowl",
  "food_type": "Homemade food",
  "cooking_method": "boiled and mixed",
  "nutritional_summary": {
    "calories_kcal": 500,
    "protein_g": 20.0,
    "carbohydrate_g": 70.0,
    "fat_g": 15.0
  },
  "portion_size": {
    "quinoa": 200,
    "vegetables": 300,
    "sauce": 50
  }
}
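
One way to sanity-check an output like the one above is to compare the reported calories with the value implied by the macronutrients, using the common approximation of 4 kcal/g for protein and carbohydrate and 9 kcal/g for fat. The sketch below is not part of the model: the helper names `macro_calories` and `is_consistent` and the 10% tolerance are assumptions chosen for illustration.

```python
import json

# Approximate energy density in kcal per gram (general Atwater factors).
KCAL_PER_G = {"protein_g": 4.0, "carbohydrate_g": 4.0, "fat_g": 9.0}


def macro_calories(summary: dict) -> float:
    """Return the calories implied by the macronutrient grams in `summary`."""
    return sum(summary[key] * factor for key, factor in KCAL_PER_G.items())


def is_consistent(summary: dict, tolerance: float = 0.10) -> bool:
    """Check that reported calories are within `tolerance` of the macro estimate."""
    estimated = macro_calories(summary)
    reported = summary["calories_kcal"]
    return abs(reported - estimated) <= tolerance * max(reported, 1)


# Values copied from the example output; other fields omitted for brevity.
example = json.loads("""
{
  "dish_name": "Vegetable Bowl",
  "nutritional_summary": {
    "calories_kcal": 500,
    "protein_g": 20.0,
    "carbohydrate_g": 70.0,
    "fat_g": 15.0
  }
}
""")

summary = example["nutritional_summary"]
print(macro_calories(summary))  # 20*4 + 70*4 + 15*9 = 495.0
print(is_consistent(summary))   # True: 495 is within 10% of 500
```

A large gap between the two numbers does not prove the prediction is wrong (fiber and alcohol are not covered by these factors), but it is a cheap signal that an output deserves a second look.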

⚙️ Usage

Colab notebook: https://colab.research.google.com/drive/1iPQTY_5sM4OZCj1fCXi_bHb-Lt4DeCxv?usp=sharing

Install dependencies

! pip install -U bitsandbytes accelerate
! pip install -U transformers==4.57.0
! pip install peft pillow requests
! pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

Python Inference Example

from transformers import Qwen3VLForConditionalGeneration, AutoProcessor
from peft import PeftModel
import torch
from PIL import Image
import requests
from io import BytesIO
# import gc

# torch.cuda.empty_cache()
# gc.collect()

base_model_name = "unsloth/Qwen3-VL-2B-Instruct-bnb-4bit"
print("Loading quantized base model...")

model = Qwen3VLForConditionalGeneration.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
    low_cpu_mem_usage=True,
    attn_implementation="sdpa",  # Use "flash_attention_2" if available
)

processor = AutoProcessor.from_pretrained(
    base_model_name,
    trust_remote_code=True,
)


print("Loading the saved LoRA adapter...")
model = PeftModel.from_pretrained(
    model,
    "Ateeqq/food-analysis",
)
print("LoRA adapter loaded successfully!")

model.eval()

user_prompt = """As a food-analyzer AI, analyze the image and return a single JSON object containing nutritional information.

Respond with JSON only. No extra text.
"""

image_url = "https://images.pexels.com/photos/1640777/pexels-photo-1640777.jpeg"
response = requests.get(image_url, timeout=30)
response.raise_for_status()  # Fail early if the download did not succeed
image = Image.open(BytesIO(response.content)).convert("RGB")

# Resize image to reduce memory usage (optional but helpful)
max_size = 1024
image.thumbnail((max_size, max_size), Image.Resampling.LANCZOS)

print("\nRunning inference on a new image...")

# Format messages
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": user_prompt},
        ],
    }
]

# Process inputs
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(
    text=[text],
    images=[image],
    return_tensors="pt",
    padding=True,
).to(model.device)

# # Clear cache before generation
# torch.cuda.empty_cache()

# Generate output (no gradients are needed for inference)
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.1,  # Low temperature keeps sampling close to deterministic
        do_sample=True,
        use_cache=True,  # Enable KV cache
        num_beams=1,  # No beam search, which saves memory
        # Explicit padding and end-of-sequence token ids:
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens, skipping the echoed prompt
generated_tokens = outputs[0][inputs["input_ids"].shape[1]:]
decoded_output = processor.decode(generated_tokens, skip_special_tokens=True)

print("\n\nFinetuned model's response:")
print(decoded_output)
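
The decoded string may contain extra text around the JSON, so it can help to extract the first JSON object before using the values. The following is a minimal sketch using only the standard library. `extract_json` is a hypothetical helper, not part of Transformers or this model, and `sample_output` stands in for `decoded_output`.

```python
import json


def extract_json(text: str) -> dict:
    """Return the first JSON object found in `text`, ignoring surrounding text."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # tolerates trailing text after it.
            obj, _ = decoder.raw_decode(text, start)
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass  # Not valid JSON from this brace; try the next one.
        start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")


# Stand-in for `decoded_output`, with extra text before and after the JSON.
sample_output = (
    "Here is the analysis:\n"
    '{"dish_name": "Vegetable Bowl", "nutritional_summary": {"calories_kcal": 500}}\n'
    "Done."
)

result = extract_json(sample_output)
print(result["dish_name"])
print(result["nutritional_summary"]["calories_kcal"])
```

Raising `ValueError` when nothing parses makes a malformed generation easy to catch and retry, rather than passing partial text downstream.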

📜 Citation

If you use this model, please cite:

@misc{food_analyzer_qwen3_vl,
  author = {Muhammad Ateeq},
  title = {Food Analyzer Vision-Language Model},
  year = {2026},
  base_model = {Qwen3-VL-2B-Instruct}
}

🤝 Acknowledgements

  • Qwen team for Qwen3-VL
  • Hugging Face Transformers
  • PEFT (LoRA) framework