Impressum Standby Scraper (Playwright Version)
Pricing
from $2.52 / 1,000 results
Scrape German imprint pages instantly, using a headless browser for dynamic modern sites. This Apify Actor finds and extracts structured contact and legal data from any German website: company name, address, phone, fax, email, VAT ID, register number, social media and decision makers.
German Imprint Scraper (Standby API)
Find and extract structured contact and legal information from German imprint pages ("Impressum") in real time, one URL per request. Send a homepage URL to the actor's HTTP endpoint and it automatically discovers the site's imprint page and returns clean, structured data: company name, address, phone/fax, email, commercial register number, VAT ID, social media links, and decision-makers.
This actor runs in Apify Standby mode as a long-lived HTTP server. That makes it ideal for on-demand enrichment: low per-request latency, no run start-up overhead per URL, and a simple GET/POST API you can call directly from your application, a workflow tool, or another actor.
Which version is this?
This scraper is published in two variants, optimised for different kinds of websites:
Playwright version (this actor)
A headless-browser scraper that renders pages with a real Chromium engine. Use it for modern, JavaScript-heavy websites whose imprint links or content only appear after the page renders (e.g. Next.js / React apps). It is more robust but slower, and adds a small headless-browser charge per processed URL.
Most imprint pages are plain server-rendered HTML and don't need a browser. For those, the HTTP version is faster and cheaper.
Features
- Automatic imprint-page discovery: point the actor at a homepage; it finds the correct "Impressum" page for you.
- Selective data extraction: request only the fields you need, from basic contact info to ML-extracted decision-makers.
- Real-time Standby API: `GET` or `POST` a single URL and get structured JSON back immediately. One request is processed at a time per container.
- Proxy support: integrates with Apify Proxy for IP rotation and to reduce blocking.
- Structured JSON output: clean, predictable records ready for your CRM, database, or downstream pipeline.
Standby API
In Standby mode the actor exposes an HTTP server. Apify gives every Standby actor a base URL; append the query parameters below and authenticate with your Apify API token (e.g. as a ?token= query parameter or Authorization: Bearer <token> header).
GET / - scrape one URL (query string)
| Parameter | Required | Description |
|---|---|---|
| `startUrl` | Yes | Homepage URL to scrape. The actor discovers the imprint page automatically. `https://` is prepended if the scheme is missing. |
| `fieldsToExtract` | No | Comma-separated list of fields to extract. Defaults to all fields. |
| `metaData` | No | `true`/`false`: include extra technical details in the response. Default `false`. |

    curl 'https://dominic-quaiser--impressum-standby-scraper.apify.actor/?startUrl=https://www.renault.de/&fieldsToExtract=company_name,emails,phone_number&token=<APIFY_TOKEN>'

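The same GET request can be assembled in Python. The sketch below only builds the URL with the standard library and does not send it; the helper name `build_scrape_url` is hypothetical, and the base URL is copied from the curl example and may differ for your setup.

```python
from urllib.parse import urlencode

# Base URL copied from the curl example; replace it if your Standby URL differs.
BASE_URL = "https://dominic-quaiser--impressum-standby-scraper.apify.actor/"


def build_scrape_url(start_url, fields=None, meta_data=False, token=None,
                     base_url=BASE_URL):
    """Build the GET URL for one scrape request (hypothetical helper)."""
    params = {"startUrl": start_url}
    if fields:
        # In the query string, fieldsToExtract is a comma-separated list.
        params["fieldsToExtract"] = ",".join(fields)
    if meta_data:
        params["metaData"] = "true"
    if token:
        # The token may instead be sent as an "Authorization: Bearer" header.
        params["token"] = token
    # urlencode percent-encodes the values, so the nested URL stays unambiguous.
    return base_url + "?" + urlencode(params)
```

Percent-encoding `startUrl` is a defensive choice: it keeps any `&` or `?` inside the target URL from being read as part of the actor's own query string.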
POST / - scrape one URL (JSON body)

    curl -X POST 'https://dominic-quaiser--impressum-standby-scraper.apify.actor/?token=<APIFY_TOKEN>' \
      -H 'Content-Type: application/json' \
      -d '{"startUrl": "https://www.renault.de/", "fieldsToExtract": ["company_name", "emails", "phone_number"], "metaData": false}'

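A Python equivalent of the POST call might look like the sketch below. It prepares a `urllib.request.Request` without sending it and passes the token as a Bearer header rather than a query parameter. The helper name `build_post_request` is hypothetical, and the base URL is copied from the curl example.

```python
import json
from urllib.request import Request

# Base URL copied from the curl example; replace it if your Standby URL differs.
BASE_URL = "https://dominic-quaiser--impressum-standby-scraper.apify.actor/"


def build_post_request(start_url, fields=None, meta_data=False,
                       token="<APIFY_TOKEN>", base_url=BASE_URL):
    """Prepare, but do not send, the POST request (hypothetical helper)."""
    payload = {"startUrl": start_url, "metaData": meta_data}
    if fields:
        # In the JSON body, fieldsToExtract is an array of field names.
        payload["fieldsToExtract"] = list(fields)
    return Request(
        base_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

To send it, pass the request to `urllib.request.urlopen(request, timeout=...)`. Choose a generous timeout, because rendering a page in a headless browser can take a while.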
GET /health - health check & stats
Returns 200 with a snapshot of running counters (total requests, successful scrapes, errors, etc.). Useful for uptime checks.

    curl 'https://dominic-quaiser--impressum-standby-scraper.apify.actor/health'

Responses
| Status | Meaning |
|---|---|
| `200` | Scrape completed. Body is `{ "url": ..., "result": { ... } }`, or `{ "url": ..., "result": null, "message": "No data extracted" }` when nothing could be extracted. |
| `400` | Missing or invalid `startUrl`, or an invalid JSON body. |
| `500` | Unhandled scraper error. |
| `504` | Processing timed out. |
Each successful result is also pushed to the actor's default dataset, so you can browse or export your scrape history from the Apify Console even when calling the API directly.
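On the client side, the status codes above could be handled roughly as follows. This is a minimal sketch: `ScrapeError` and `extract_result` are hypothetical names, and the function only interprets a status code and body that you have already received from your HTTP client.

```python
import json


class ScrapeError(Exception):
    """Raised for any non-200 response (hypothetical exception type)."""


def extract_result(status, body):
    """Return the `result` object, or None when nothing was extracted."""
    if status == 200:
        payload = json.loads(body)
        # "result" is null when the body carries "No data extracted".
        return payload.get("result")
    reasons = {
        400: "missing or invalid startUrl, or invalid JSON body",
        500: "unhandled scraper error",
        504: "processing timed out",
    }
    raise ScrapeError(f"HTTP {status}: {reasons.get(status, 'unexpected status')}")
```

Treating an empty result as `None` rather than as an error mirrors the API, which answers 200 in that case. Whether to retry on 504 is left to the caller.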
Extractable data
Select any combination of the following fields via fieldsToExtract:
| Field | Description | Type |
|---|---|---|
| `company_name` | The official company name, with a confidence score for the match. | Object |
| `business_address` | Full address parsed into `full_address`, `street`, `house_number`, `postal_code`, `city`. | Object |
| `phone_number` | One or more phone numbers, keyed `phone_1`, `phone_2`, … | Object |
| `fax_number` | One or more fax numbers, keyed `fax_1`, `fax_2`, … | Object |
| `emails` | One or more email addresses; emails matching the site's domain are prioritised. | Object |
| `register_number` | Commercial register number ("Handelsregisternummer") and the registration court ("Registergericht"). | Object |
| `vat_id` | German VAT ID ("Umsatzsteuer-ID") with checksum validation, e.g. `DE123456788`. | Object |
| `social_media` | Links to platforms like LinkedIn, Xing, Facebook, Instagram, etc. | Object |
| `decision_makers` | (Premium) Names of key decision-makers ("Entscheidungsträger") extracted via an external NER (Named Entity Recognition) model. | Array |
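The table mentions checksum validation for `vat_id` but does not say how the actor performs it. As an illustration only, the sketch below uses the ISO 7064 MOD 11,10 check digit that is commonly described for German VAT IDs. The function name is hypothetical, and the actor's own check may differ.

```python
import re


def is_valid_german_vat_id(vat_id):
    """Check 'DE' plus 9 digits against an ISO 7064 MOD 11,10 check digit.

    Assumption: this is the scheme commonly described for German VAT IDs;
    the actor's actual validation is not documented here.
    """
    match = re.fullmatch(r"DE(\d{9})", vat_id.replace(" ", "").upper())
    if match is None:
        return False
    digits = [int(ch) for ch in match.group(1)]
    product = 10
    for digit in digits[:8]:
        total = (digit + product) % 10
        if total == 0:
            total = 10
        product = (2 * total) % 11
    # The ninth digit is the check digit; a computed value of 10 maps to 0.
    check = (11 - product) % 10
    return check == digits[8]
```

With this check, the example value from the table, `DE123456788`, passes, while `DE123456789` does not.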
Numbered outputs (emails, phone numbers, …) are ordered by how likely each value is the company's main contact.
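Because the numbered keys encode a ranking, a client can read them back in order by their numeric suffix. The helpers below are a hypothetical sketch. They assume key names of the form `<prefix>_<n>`, as in `phone_1` or `email_1`.

```python
def ordered_values(numbered, prefix):
    """Return the values of keys like phone_1, phone_2, ... in numeric order."""
    items = []
    for key, value in numbered.items():
        head, _, index = key.rpartition("_")
        if head == prefix and index.isdigit():
            items.append((int(index), value))
    # Sort on the numeric suffix so that phone_10 comes after phone_2.
    return [value for _, value in sorted(items, key=lambda pair: pair[0])]


def primary_value(numbered, prefix):
    """Return the top-ranked value (lowest suffix), or None if there is none."""
    values = ordered_values(numbered or {}, prefix)
    return values[0] if values else None
```

For example, `primary_value(record.get("emails"), "email")` would pick the address ranked as the most likely main contact.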
Output structure
The exact fields depend on your fieldsToExtract selection.

    {
      "start_url": "https://muster-firma.de/",
      "imprint_url": "https://muster-firma.de/impressum",
      "company_name": {
        "name": "Muster GmbH",
        "confidence": 1
      },
      "business_address": {
        "full_address": "Musterstraße 123, 12345 Berlin",
        "street": "Musterstraße",
        "house_number": "123",
        "postal_code": "12345",
        "city": "Berlin"
      },
      "phone_number": {
        "phone_1": "+493012345678"
      },
      "fax_number": {
        "fax_1": "+493012345679"
      },
      "emails": {
        "email_1": "kontakt@muster-firma.de"
      },
      "register_number": {
        "number": "HRB 12345 B",
        "court": "Amtsgericht Charlottenburg"
      },
      "vat_id": {
        "vat_id": "DE123456788"
      },
      "social_media": {
        "linkedin": "https://www.linkedin.com/company/muster-firma"
      },
      "decision_makers": ["Max Mustermann"],
      "metadata": {
        "domain": "muster-firma.de",
        "fetch_method": "http",
        "fallback_attempted": false,
        "scraped_at": "2026-06-22T12:04:48.003780"
      }
    }

The metadata block is only included when metaData is enabled.
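For a CRM or spreadsheet import, a nested record like the sample can be flattened into a single row. The sketch below is one possible mapping: `to_flat_row` and its column names are hypothetical, and it assumes that fields you did not request are absent or null.

```python
def to_flat_row(record):
    """Flatten one result record into a single-level dict (hypothetical columns)."""

    def section(name):
        # Assumption: fields that were not requested are absent or null.
        return record.get(name) or {}

    address = section("business_address")
    register = section("register_number")
    return {
        "imprint_url": record.get("imprint_url"),
        "company": section("company_name").get("name"),
        "street": address.get("street"),
        "house_number": address.get("house_number"),
        "postal_code": address.get("postal_code"),
        "city": address.get("city"),
        # Suffix _1 is the top-ranked value of each numbered output.
        "phone": section("phone_number").get("phone_1"),
        "email": section("emails").get("email_1"),
        "register_number": register.get("number"),
        "register_court": register.get("court"),
        "vat_id": section("vat_id").get("vat_id"),
        "decision_makers": "; ".join(record.get("decision_makers") or []),
    }
```

Missing values come back as `None` (or an empty string for `decision_makers`), so the row always has the same columns regardless of the `fieldsToExtract` selection.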
Legal disclaimer
You are solely responsible for determining the legality of your use of this actor and the data it generates. Scraping and handling data, particularly personal information, is subject to legal frameworks such as the GDPR (DSGVO), copyright law, and the terms of service of the sites you scrape. Ensure your use case is compliant with all applicable laws. This text is not legal advice.
GDPR notice: "Decision Makers" feature
The decision_makers feature uses an external API hosted on a private server in Europe (Germany) to process data.
- What is processed: the text of the imprint page is sent to the API to identify personal names.
- Why: the NER model needs the page text to accurately extract decision-makers.
- Data controller: you, the user, are the data controller; the actor's developer acts as data processor for this task.
- Location & compliance: all processing occurs within the EU and is subject to the GDPR (DSGVO).
- Data storage: the text is processed in-memory and is not stored or logged on the external server.
- Important: this processing is external to the Apify platform and not covered by Apify's DPA. By using this feature you acknowledge this separate processing activity.
Other actors
- Gelbe Seiten (German Yellow Pages) Scraper: extract business listings from Germany's Yellow Pages with three detail levels.
- Das Telefonbuch Scraper: extract business listings from Das Telefonbuch, Germany's official telephone directory.
- Das Örtliche Scraper: extract business listings from Das Örtliche, Germany's nationwide telephone directory.
Use cases
- Lead generation: build targeted contact lists for sales and marketing.
- Real-time enrichment: call the Standby API to enrich a record the moment a lead enters your CRM.
- Compliance & verification: check for legally compliant imprint information.
- Market research: aggregate company data for a specific industry or region.
Maintainer
- Author: Dominic M. Quaiser
- Contact: dev@krake.run
- Website: krake.run
