Pricing
from $1.00 / 1,000 results
Website Image Scraper
Extract every image URL from a website. Crawls the start page (and optionally internal links up to a configurable depth), parses `<img>` tags, `<picture>`/`<source>`, `srcset` candidates, and CSS `background-image` declarations. HTTP-only, no proxy or browser needed.
Extract every image URL from a website. Crawls the start page (and optionally internal links up to a configurable depth), then parses `<img>` tags, `<picture>`/`<source>`, `srcset` candidates, `<link rel="icon">`, and CSS `background-image` declarations. HTTP-only: no browser, no proxy, no API key.
What it does
- Pull every image URL referenced on a page: `<img src>`, lazy-loaded `data-src`, `srcset` candidates, `<picture>` sources, favicons, inline `style="background-image: url(...)"`.
- Crawl deeper: follow internal links up to `maxCrawlDepth` (same host only) to grab images from linked pages too.
- Filter by format: restrict to specific extensions (e.g. only SVG, only WebP/AVIF).
- Bounded: `maxImagesPerPage` and `maxTotalImages` keep runs cost-predictable on large galleries.
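The extraction sources listed above can be illustrated with a small standard-library sketch. This is not the actor's code: `ImageCollector`, `parse_srcset`, and the regular expression are hypothetical names chosen for this example, and the `srcset` split is deliberately naive.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed pattern for inline background-image declarations (illustrative only).
BG_URL_RE = re.compile(r"background-image\s*:[^;]*?url\(\s*['\"]?([^'\")]+)", re.I)


def parse_srcset(value):
    """Return the URL of each srcset candidate (naive split on commas)."""
    urls = []
    for candidate in value.split(","):
        parts = candidate.strip().split()
        if parts:
            urls.append(parts[0])
    return urls


class ImageCollector(HTMLParser):
    """Collects (absolute_url, discovered_via) pairs from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def _add(self, raw, via):
        if not raw:
            return
        absolute = urljoin(self.base_url, raw.strip())
        # Keep only http(s) URLs; this drops data: and javascript: values.
        if absolute.startswith(("http://", "https://")):
            self.found.append((absolute, via))

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "img":
            self._add(a.get("src"), "img-tag")
            self._add(a.get("data-src"), "img-tag")
            for url in parse_srcset(a.get("srcset") or ""):
                self._add(url, "srcset")
        elif tag == "source":
            for url in parse_srcset(a.get("srcset") or ""):
                self._add(url, "picture-source")
        elif tag == "link" and "icon" in (a.get("rel") or "").lower().split():
            self._add(a.get("href"), "link-icon")
        # Any element may carry an inline background-image style.
        for url in BG_URL_RE.findall(a.get("style") or ""):
            self._add(url, "css-background")
```

Feeding a page's HTML to `ImageCollector(page_url).feed(html)` fills `found` in document order. A production parser would also need to handle commas inside `srcset` URLs, which this sketch ignores.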
Input
| Field | Type | Default | Description |
|---|---|---|---|
| `startUrl` | string (required) | `https://apify.com` | Page to start crawling. Must be `http://` or `https://`. |
| `maxCrawlDepth` | integer | 1 (0-5) | 0 = only the start URL; 1 or more = follow internal links up to that many levels deep (same host only). |
| `maxImagesPerPage` | integer | 200 (1-5000) | Cap per page; keeps pathological galleries bounded. |
| `maxTotalImages` | integer | 1000 (1-50000) | Hard cap on total images emitted across the whole run. |
| `imageExtensions` | array | `[jpg, jpeg, png, gif, webp, svg, avif, bmp, ico]` | Only URLs whose path ends in one of these are kept. |
| `includeBackgroundImages` | boolean | `true` | Also extract from inline `style="background-image: url(...)"`. |
| `userAgent` | string | (Chrome 131) | Optional UA override. |
Example input
{"startUrl":"https://apify.com","maxCrawlDepth":1,"maxImagesPerPage":200,"maxTotalImages":500,"imageExtensions":["jpg","png","webp","svg"],"includeBackgroundImages":true}
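Before submitting an input like the one above, it can help to check it against the documented defaults and ranges. The sketch below copies those values from the input table; `build_input` is a hypothetical helper, not part of the actor or of any Apify SDK.

```python
# Defaults and (min, max) bounds copied from the input table.
DEFAULTS = {
    "maxCrawlDepth": 1,
    "maxImagesPerPage": 200,
    "maxTotalImages": 1000,
    "imageExtensions": ["jpg", "jpeg", "png", "gif", "webp", "svg", "avif", "bmp", "ico"],
    "includeBackgroundImages": True,
}
BOUNDS = {
    "maxCrawlDepth": (0, 5),
    "maxImagesPerPage": (1, 5000),
    "maxTotalImages": (1, 50000),
}


def build_input(raw):
    """Hypothetical helper: merge defaults into `raw` and reject out-of-range values."""
    if not str(raw.get("startUrl", "")).startswith(("http://", "https://")):
        raise ValueError("startUrl must start with http:// or https://")
    merged = {**DEFAULTS, **raw}
    for field, (low, high) in BOUNDS.items():
        if not low <= merged[field] <= high:
            raise ValueError(f"{field} must be between {low} and {high}")
    return merged


# Single-page run: only the start URL is fetched.
single_page = build_input({"startUrl": "https://apify.com", "maxCrawlDepth": 0})
```

Whether the platform itself rejects or clamps out-of-range values is not stated here, so raising an error is only one possible choice.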
Output
One record per unique image URL. Empty fields are omitted (no nulls).
{"url":"https://apify.com/static/hero.jpg","sourcePage":"https://apify.com/","pageTitle":"Apify · The full-stack web-scraping & automation platform","alt":"Apify hero image","hasAltText":true,"title":"Apify","width":1200,"height":600,"extension":"jpg","discoveredVia":"img-tag","mimeTypeHint":"image/jpeg","crawlDepth":0,"scrapedAt":"2024-12-16T14:23:11+00:00"}
Output fields
- `url`: absolute URL of the image (`data:` URIs and `javascript:` pseudo-URLs are filtered out).
- `sourcePage`: the page where the image was discovered.
- `pageTitle`: `<title>` of the page where the image was found (handy for grouping the dataset by page name).
- `alt`: `alt` attribute of the `<img>` tag (when present).
- `hasAltText`: derived boolean, `true` when `alt` is present and non-empty. Lets you filter accessibility issues without testing for field presence.
- `title`: `title` attribute (when present).
- `width` / `height`: explicit pixel dimensions from the tag (only emitted when numeric).
- `extension`: lowercase file extension parsed from the URL path (e.g. `"jpg"`, `"svg"`, `"webp"`). Useful for format-bucket aggregations.
- `discoveredVia`: one of `img-tag`, `srcset`, `picture-source`, `link-icon`, `css-background`.
- `mimeTypeHint`: derived from the file extension (e.g. `image/png`, `image/svg+xml`).
- `crawlDepth`: depth at which the page was crawled (0 = `startUrl`).
- `scrapedAt`: ISO-8601 timestamp.
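The derived fields (`extension`, `mimeTypeHint`, `hasAltText`) can be reproduced roughly as follows. This is a sketch under assumptions: `derive_fields` is a hypothetical helper, and the extension-to-MIME mapping is a guess at a reasonable table, not the actor's actual one.

```python
import posixpath
from urllib.parse import urlparse

# Assumed extension-to-MIME mapping; the actor's own table may differ.
MIME_BY_EXTENSION = {
    "jpg": "image/jpeg", "jpeg": "image/jpeg", "png": "image/png",
    "gif": "image/gif", "webp": "image/webp", "svg": "image/svg+xml",
    "avif": "image/avif", "bmp": "image/bmp", "ico": "image/x-icon",
}


def derive_fields(url, alt=None):
    """Build a partial record; empty fields are omitted rather than set to null."""
    path = urlparse(url).path  # query string and fragment are ignored
    extension = posixpath.splitext(path)[1].lstrip(".").lower()
    record = {"url": url, "hasAltText": bool(alt)}
    if alt:
        record["alt"] = alt
    if extension:
        record["extension"] = extension
    if extension in MIME_BY_EXTENSION:
        record["mimeTypeHint"] = MIME_BY_EXTENSION[extension]
    return record
```

Parsing the extension from the URL path rather than the full URL keeps cache-busting query strings such as `?v=2` from hiding the file type.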
Use cases
- Content audits: see every image a website serves up, broken down by source (img tag vs CSS background).
- Asset inventory: pull all logos, hero images, and icons from a competitor or brand site.
- Format migration: find every JPEG/PNG to convert to WebP/AVIF, or every PNG to convert to SVG.
- SEO / accessibility: list images with `hasAltText: false` to flag accessibility issues at a glance.
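Once the dataset is exported as a list of records, the audit and accessibility use cases reduce to a few lines of filtering. The helpers and sample rows below are invented for illustration; treating a missing `hasAltText` field as `false` is an assumption.

```python
from collections import Counter


def image_records(records):
    """Drop the no-images sentinel, which carries `type` instead of `url`."""
    return [r for r in records if "url" in r]


def missing_alt(records):
    """Image records whose derived hasAltText flag is false (or absent)."""
    return [r for r in image_records(records) if not r.get("hasAltText", False)]


def count_by(records, field):
    """Count image records per value of `field`, e.g. discoveredVia or extension."""
    return Counter(r[field] for r in image_records(records) if field in r)


# Invented sample rows, shaped like the output records described above.
sample = [
    {"url": "https://example.com/hero.jpg", "hasAltText": True,
     "extension": "jpg", "discoveredVia": "img-tag"},
    {"url": "https://example.com/bg.png", "hasAltText": False,
     "extension": "png", "discoveredVia": "css-background"},
    {"url": "https://example.com/icon.png", "hasAltText": False,
     "extension": "png", "discoveredVia": "img-tag"},
]
flagged = missing_alt(sample)               # the two PNG rows
by_source = count_by(sample, "discoveredVia")
by_format = count_by(sample, "extension")
```

Grouping by `extension` gives the starting list for a format migration, and grouping by `discoveredVia` gives the img-tag versus CSS-background breakdown.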
FAQ
Does it download the image binaries?
No. The actor only collects URLs and metadata. Combine with a separate downloader (or pipe URLs into Apify's standard "URL list" actor) if you need the bytes.
Does it work on JavaScript-rendered pages?
Mostly no. This scraper is HTTP-only: it sees the server-rendered HTML, not what runs after the page boots. If a site lazy-loads images via React/Vue, you may only see fallback or placeholder images. For SPA-rendered content, use a Playwright-based actor instead.
Can I limit it to a single page?
Set `maxCrawlDepth: 0`. Only the start URL is fetched.
Does it follow external links?
No. Internal-link crawling only follows links to the same host as `startUrl` to keep cost and scope bounded.
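A same-host filter of this kind might look like the sketch below. `internal_links` is a hypothetical helper, and it assumes "same host" means an exact hostname match, so `www.example.com` and `example.com` would count as different hosts; the actor may be more lenient.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def internal_links(page_url, hrefs, start_url):
    """Resolve hrefs against page_url and keep http(s) links on start_url's host."""
    start_host = urlparse(start_url).hostname
    seen = set()
    result = []
    for href in hrefs:
        # Resolve relative links and drop #fragments so anchors do not create duplicates.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skips mailto:, javascript:, tel:, and similar
        if parsed.hostname != start_host:
            continue  # external link
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

Links that pass this filter would be queued for the next depth level until `maxCrawlDepth` is reached.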
What if the site has no images at all?
You get a single sentinel record `{"type": "website_image_scraper_error", "reason": "no_images_found"}` so the dataset is non-empty. The run still completes successfully.
How does it deduplicate?
By absolute URL. The same image referenced from multiple pages produces one record (the first-seen page is recorded as `sourcePage`).
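First-seen deduplication with a total cap can be sketched in a few lines. `dedupe_first_seen` is a hypothetical helper; it assumes candidates arrive in crawl order and that the `maxTotalImages` cap is applied to unique records.

```python
def dedupe_first_seen(candidates, max_total):
    """Keep the first record per absolute URL, stopping at max_total unique records."""
    seen = set()
    unique = []
    for record in candidates:
        if record["url"] in seen:
            continue  # already recorded from an earlier page
        seen.add(record["url"])
        unique.append(record)
        if len(unique) >= max_total:
            break
    return unique


# Invented example: the same logo appears on two pages.
candidates = [
    {"url": "https://example.com/logo.svg", "sourcePage": "https://example.com/"},
    {"url": "https://example.com/logo.svg", "sourcePage": "https://example.com/about"},
    {"url": "https://example.com/team.jpg", "sourcePage": "https://example.com/about"},
]
records = dedupe_first_seen(candidates, max_total=1000)
```

Here the logo is kept once, with the start page as its `sourcePage`, because that is where it was seen first.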
