Pricing
from $12.00 / 1,000 results
AI Image Captioner
Generate accurate text descriptions for any image using AI: bulk caption product photos, screenshots, or any image URL for SEO, accessibility, and content tagging.
Generate accurate, detailed text descriptions for any image using AI: bulk caption product photos, screenshots, or any image URL for SEO alt text, accessibility compliance, and content tagging.
What you get
- Natural-language captions generated by Molmo 2, trained on 712,000+ human-described images
- Three detail levels: brief one-liner, balanced description, or full detailed paragraph
- Optional focus directive to target specific aspects (text, background, faces, objects, etc.)
- One output record per image with the caption and source URL
- Supports bulk processing: pass up to 50 image URLs and get all captions in a single run
- Export to JSON or CSV directly from the Apify console
Use cases
- E-commerce SEO: generate alt text for thousands of product images automatically
- Accessibility compliance: add descriptive alt text to images on websites and apps
- Content moderation: understand what's in user-uploaded images before publishing
- Dataset labeling: annotate image datasets for machine learning pipelines
- Digital asset management: auto-tag and describe photos in large media libraries
- Social media monitoring: caption scraped images to make them searchable by content
Examples
| Image | Detail Level | Caption |
|---|---|---|
| Nike sneaker on red background | High | A vibrant red Nike sneaker takes center stage in this striking advertisement, set against a bold red background that creates a visually cohesive and eye-catching composition. The shoe is positioned at an angle, giving the impression of motion and energy. The sneaker features a white Nike swoosh, darker red laces, and "Nike Free" branding on the white sole. The lighting is bright and even, highlighting the shoe's textures and details. |
| Nike sneaker on red background | Medium | A vibrant red Nike sneaker is displayed against a matching red background. The shoe features a white Nike swoosh and "Nike Free" branding on the sole. The laces are a darker shade of red, complementing the overall design. |
| Nike sneaker on red background | Low | A red Nike sneaker with white accents. |
| Food flatlay with three dishes | High | A top-down view of a rustic wooden table with three round bowls arranged in a triangular formation. The central bowl features slices of medium-rare steak garnished with fresh green leaves and a red chili pepper. The left bowl holds crispy fried fish topped with a creamy sauce and herbs. The right bowl contains a meat dish garnished with thinly sliced red onions and nuts. Scattered around the bowls are whole chili peppers, cashews, and a small bowl of brown dipping sauce. |
| Food flatlay with three dishes | Medium | Three bowls of food are arranged on a gray wooden table, creating a rustic dining scene. The central bowl contains sliced steak, while the left bowl holds fried fish topped with sauce and herbs. The right bowl features a meat dish garnished with onions and nuts. |
| Food flatlay with three dishes | Low | Three bowls of food on a wooden table with garnishes. |
How to use
- Paste one or more image URLs into the Images field (or upload files directly)
- Choose a Detail Level. High gives the most descriptive output (recommended for SEO and accessibility)
- Optionally add a Focus hint to direct the model's attention (e.g. "describe only the text visible")
- Click Run. Captions appear in the Dataset tab when the run completes
- Export results as JSON or CSV, or connect to downstream actors via the Apify API
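The last step mentions the Apify API. A minimal sketch of that flow in Python, assuming the official `apify-client` package and its `actor(...).call(...)` and `dataset(...).iterate_items()` methods. The actor ID and the `images` input key are placeholders, since this page does not list them; check the actor's API tab for the real values.

```python
from typing import Any


def run_and_collect(client: Any, actor_id: str, run_input: dict) -> list[dict]:
    """Start one actor run, wait for it to finish, and return its dataset items.

    `client` is expected to behave like apify_client.ApifyClient.
    """
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a result")
    # Each run writes its output to a default dataset; read every record from it.
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


# Usage (not executed here; needs `pip install apify-client` and an API token):
#
#   from apify_client import ApifyClient
#   client = ApifyClient("<YOUR_APIFY_TOKEN>")
#   records = run_and_collect(
#       client,
#       "<username>/<actor-name>",                         # placeholder actor ID
#       {"images": ["https://example.com/product.jpg"]},   # hypothetical input key
#   )
```

Passing the client in as an argument keeps the helper easy to exercise with a stub before spending credits on a real run.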
Output format
Each dataset record:
{"inputImageUrl":"https://example.com/product.jpg","caption":"A white ceramic coffee mug sitting on a wooden table next to an open laptop. The mug has a minimalist logo on the front and steam rising from the top, suggesting the coffee is hot.","detailLevel":"high","status":"success","error":null}
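As an illustration of consuming these records, the sketch below separates successful captions from failures and renders a caption as HTML alt text. The field names come from the record above; treating any `status` other than `"success"` as a failure is an assumption, since only the success case is documented here, and both helper names are hypothetical.

```python
import html


def split_results(records):
    """Return (captions keyed by image URL, list of (url, error) failures)."""
    captions, failures = {}, []
    for rec in records:
        if rec.get("status") == "success" and rec.get("caption"):
            captions[rec["inputImageUrl"]] = rec["caption"]
        else:
            # Assumption: anything that is not "success" counts as a failure.
            failures.append((rec.get("inputImageUrl"), rec.get("error")))
    return captions, failures


def img_tag(url, caption):
    """Build an <img> tag with the caption as escaped alt text."""
    return '<img src="{}" alt="{}">'.format(
        html.escape(url, quote=True), html.escape(caption, quote=True)
    )
```

Escaping matters because captions can contain quotation marks (for example `"Nike Free"`), which would otherwise break the `alt` attribute.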
Input options
| Field | Type | Description |
|---|---|---|
| Images | URL list | One or more http/https image URLs or base64 data URIs |
| Upload Images | File upload | Upload images directly from your computer |
| Detail Level | Select | Low (one-liner), Medium (balanced), or High (detailed paragraph). Default: High |
| Focus | Text | Optional directive to focus the caption on a specific aspect of the image |
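For API runs, the same options are sent as a JSON input object. The sketch below validates and assembles one. The key names (`images`, `detailLevel`, `focus`) and the lowercase level values are assumptions inferred from the output record's `detailLevel` field, not the actor's documented input schema, so confirm them in the actor's Input tab before relying on them.

```python
ALLOWED_DETAIL_LEVELS = ("low", "medium", "high")
ALLOWED_PREFIXES = ("http://", "https://", "data:")


def build_input(images, detail_level="high", focus=None):
    """Assemble a run input dict. Key names are hypothetical."""
    if not images:
        raise ValueError("at least one image is required")
    level = detail_level.lower()
    if level not in ALLOWED_DETAIL_LEVELS:
        raise ValueError(f"detail level must be one of {ALLOWED_DETAIL_LEVELS}")
    for src in images:
        # The table allows http/https URLs or base64 data URIs only.
        if not src.startswith(ALLOWED_PREFIXES):
            raise ValueError(f"unsupported image source: {src[:40]}")
    run_input = {"images": list(images), "detailLevel": level}
    if focus:
        run_input["focus"] = focus  # optional directive, omitted when empty
    return run_input
```

For example, `build_input(["https://example.com/product.jpg"], "Medium", "describe only the text visible")` would yield a dict with all three keys, with the level normalized to `"medium"`.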
Limits
- Maximum 50 images per run
- Each image must be a publicly accessible URL or a base64 data URI
- Processing time is typically 5-15 seconds per image
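To caption more than 50 images, split the list across several runs. The sketch below batches a URL list and gives a rough estimate from the figures on this page. It assumes images are processed one after another (the page does not say whether they run concurrently) and uses the "from $12.00 / 1,000 results" price as a lower bound; the function names are illustrative.

```python
MAX_IMAGES_PER_RUN = 50
SECONDS_PER_IMAGE = (5, 15)      # typical range stated above
PRICE_PER_1000_RESULTS = 12.00   # "from" price, so treat the cost as a minimum


def batches(urls, size=MAX_IMAGES_PER_RUN):
    """Yield consecutive slices of at most `size` URLs, one per run."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def estimate(n_images):
    """Rough run count, processing time, and minimum cost for n_images."""
    runs = -(-n_images // MAX_IMAGES_PER_RUN)  # ceiling division
    return {
        "runs": runs,
        "seconds_min": n_images * SECONDS_PER_IMAGE[0],
        "seconds_max": n_images * SECONDS_PER_IMAGE[1],
        "min_cost_usd": round(n_images * PRICE_PER_1000_RESULTS / 1000, 2),
    }
```

For 120 images this gives 3 runs, roughly 10 to 30 minutes of processing under the sequential assumption, and a minimum cost of $1.44.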
Related AI image actors
Part of a complete AI image toolkit. Explore the rest of the suite:
- AI Image Background Remover: Remove backgrounds to get clean transparent PNGs
- AI Image Upscaler: Batch-upscale images to 4K or 8K
- AI Image Watermark Remover: Remove text and logo watermarks from images
- Image OCR Scraper: Extract text from images in 109 languages
- Photo Location Finder: Find where a photo was taken, no EXIF needed
