
URL: https://apify.com/exciting_perfume/web-page-to-single-page-pdf-and-html


Web Page to Single-Page PDF & HTML (Automation-Ready)

Pricing

$29.99/month + usage


Convert webpages to single-page PDFs and extract raw HTML via API. Captures full scroll height (no A4 splits). Built for automation with n8n, Make, and Zapier. Ideal for archiving, AI workflows, compliance, and bulk processing.


Rating: 0.0 (0)

Developer: Gavin Campbell (Maintained by Community)

Actor stats

  • Bookmarked: 1
  • Total users: 9
  • Monthly active users: 1
  • Last modified: 3 months ago

Web Page to Single-Page PDF Converter (Automation Ready)

Capture full-length webpages as single-page PDFs and extract raw HTML source code via API.

Designed for seamless integration with automation platforms like n8n, Make.com, and Zapier, this Apify Actor allows you to programmatically archive web content, generate visual reports, and feed clean data into your AI workflows.

Unlike standard converters that cut pages into A4 sheets, this tool captures the entire scrollable area of a webpage into one continuous PDF file, ensuring no data is cut off at page breaks.


🚀 Key Features

  • Single-Page "Long" PDFs: Captures the full height of the webpage in a single continuous document. Perfect for newsletters, landing pages, and social media feeds.
  • HTML Source Extraction: Option to save the exact view-source: HTML code alongside the visual PDF.
  • Bulk Processing: Handle thousands of URLs in a single run.
  • Anti-Blocking: Built-in support for Apify Proxy and stealth mode to bypass bot detection.
  • Smart Waiting: Configurable waitUntil strategies (e.g., networkidle0) ensure dynamic JavaScript content loads completely before capture.

💡 Use Cases

  1. Compliance & Archiving: Automatically capture a PDF render and save the HTML source of your legal pages, T&Cs, or partner sites for compliance auditing.
  2. Marketing Swipe Files: Build a visual database of competitor landing pages, emails, and ad creatives.
  3. AI Knowledge Base: Feed the raw HTML output into LLMs (like ChatGPT or Claude) via n8n to analyze page structure or content without parsing complex DOMs yourself.
  4. Invoicing & Receipts: Convert web-based invoice views into portable PDF files for accounting systems.
  5. Design QA: Automate visual regression testing by capturing full-page renders of your staging environment.

⚙️ Input Configuration

Field | Type | Default | Description
startUrls | Array | [] | A list of URLs you want to convert. Supports direct URLs or object format.
saveHtml | Boolean | true | If enabled, saves the raw HTML source code (.html) to the Key-Value store.
proxyConfiguration | Object | Apify Proxy | Recommended to keep enabled to avoid IP bans.
waitUntil | String | networkidle0 | When to take the snapshot. Use networkidle0 for strict loading or domcontentloaded for speed.
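
As a rough illustration of how these fields fit together, the sketch below assembles an input object in Python. The field names come from the table above; the helper name build_actor_input, the validation rules, and the assumption that only the two waitUntil values named in the table are accepted are illustrative, not part of the actor.

```python
from typing import Any

# Only the two values named in the table; the actor may accept others (assumption).
KNOWN_WAIT_UNTIL = {"networkidle0", "domcontentloaded"}


def build_actor_input(
    urls: list[str],
    save_html: bool = True,
    wait_until: str = "networkidle0",
    use_apify_proxy: bool = True,
) -> dict[str, Any]:
    """Assemble an input object using the field names from the table."""
    if not urls:
        raise ValueError("startUrls needs at least one URL")
    if wait_until not in KNOWN_WAIT_UNTIL:
        raise ValueError(f"unrecognised waitUntil value: {wait_until!r}")
    return {
        # Object format: one {"url": ...} entry per page to convert.
        "startUrls": [{"url": u} for u in urls],
        "saveHtml": save_html,
        "waitUntil": wait_until,
        "proxyConfiguration": {"useApifyProxy": use_apify_proxy},
    }
```

Calling build_actor_input(["https://example.com"]) yields a dictionary that can be serialized with json.dumps and pasted into any of the integrations described in this document.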

🔌 Automation Integrations

This Actor is built to be a backend microservice. Here is how to connect it to your favorite workflow automation tools.

1. n8n Integration

Goal: Trigger the actor from a workflow and download the resulting PDF.

  1. Add the "Apify" Node: In your n8n workflow, add the Apify node.
  2. Select Action: Choose Run Actor.
  3. Actor ID: Search for web-to-pdf-converter (or use the Actor ID from the Apify console).
  4. Input: Switch to JSON mode and map your URL:
    {
    "startUrls":[{"url":"{{$json.your_url_field}}"}],
    "saveHtml":true
    }
  5. Wait for Finish: Ensure the "Synchronous" option is checked (or use a separate "Wait" node and "Get Dataset Items" node for long runs).
  6. Retrieve Files: The output will contain a pdfUrl. Use an HTTP Request node to GET that URL and save the binary data.
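
For readers who prefer to call the Apify API directly instead of going through the n8n node, the same synchronous flow might look like the sketch below. It assumes the public Apify API v2 endpoint run-sync-get-dataset-items, which is meant for runs that finish within a few minutes; the actor ID "username/actor-name" and the token are placeholders, and the helper names are hypothetical.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"


def build_sync_run_request(
    actor_id: str, token: str, actor_input: dict
) -> urllib.request.Request:
    """Prepare a POST that runs the actor and returns its dataset items."""
    # In API paths the "/" in "username/actor-name" is written as "~".
    path_id = actor_id.replace("/", "~")
    url = f"{API_BASE}/acts/{path_id}/run-sync-get-dataset-items"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_and_get_items(request: urllib.request.Request, timeout: float = 300) -> list:
    """Send the prepared request; the response body is a JSON array of items."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Building the request separately from sending it keeps the URL and payload easy to inspect before any usage is billed. For long runs, the asynchronous route described in step 5 is likely the safer choice.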

2. Make.com (Integromat) Integration

Goal: Save a webpage to Google Drive every time a new row is added to Google Sheets.

  1. Trigger: Google Sheets (Watch Rows).
  2. Action: Add the Apify module -> Run Actor.
  3. Settings:
    • Actor: Select this actor.
    • Body:
      {
      "startUrls":[{"url":"{{1.url}}"}],
      "saveHtml":true
      }
  4. Action: Add Apify module -> Get Dataset Items.
    • Dataset ID: Map the defaultDatasetId from the previous step.
  5. Action: Add HTTP module -> Get a file.
    • URL: Map the pdfUrl from the dataset items.
  6. Action: Google Drive -> Upload a File.

3. Zapier Integration

Goal: Email a PDF version of a webpage when a specific event occurs.

  1. Trigger: Any Zapier trigger (e.g., "New Trello Card").
  2. Action: Search for Apify.
  3. Event: Select Run Actor.
  4. Configure:
    • Actor: Paste the Actor ID.
    • Input Body:
      {
      "startUrls":[{"url":"https://example.com"}]
      }
  5. Action: Select Apify -> Get Dataset Items (to get the PDF link).
  6. Action: Gmail -> Send Email. Use the pdfUrl in the attachment field or body.

📦 Output Format

The actor stores results in two locations:

  1. Key-Value Store: The physical files.
    • Page_Title_hash.pdf (The visual render)
    • Page_Title_hash_source.html (The source code)
  2. Dataset: The JSON metadata used for linking.

Sample Dataset JSON:

{
"url":"https://apify.com",
"title":"Apify: The Web Scraping and Automation Platform",
"pdfUrl":"https://api.apify.com/v2/key-value-stores/mYStoReId/records/Apify_hash.pdf",
"htmlUrl":"https://api.apify.com/v2/key-value-stores/mYStoReId/records/Apify_hash_source.html",
"timestamp":"2023-10-27T14:30:00.000Z"
}
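
If you consume this record in your own code, a typed wrapper can make the fields easier to work with. The sketch below parses one dataset item into a dataclass and converts the timestamp to a timezone-aware datetime. The class and function names are hypothetical, and treating htmlUrl as optional (for runs with saveHtml disabled) is an assumption.

```python
import json
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class CaptureRecord:
    url: str
    title: str
    pdf_url: str
    html_url: Optional[str]  # assumed absent when saveHtml is off
    captured_at: datetime


def parse_record(raw: str) -> CaptureRecord:
    """Parse one dataset item (a JSON object) into a CaptureRecord."""
    data = json.loads(raw)
    # Older Python versions do not accept a trailing "Z", so swap in an offset.
    timestamp = datetime.fromisoformat(data["timestamp"].replace("Z", "+00:00"))
    return CaptureRecord(
        url=data["url"],
        title=data["title"],
        pdf_url=data["pdfUrl"],
        html_url=data.get("htmlUrl"),
        captured_at=timestamp,
    )
```

Keeping the timestamp timezone-aware avoids ambiguity when records from several runs are sorted or compared.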

🛠 Troubleshooting

  • PDF is blank/white: Make sure waitUntil is set to networkidle0 (the default). This makes the crawler wait until network activity (images, scripts) has settled before it captures the page.
  • Cookie Consent Popups: The actor attempts to hide scrollbars, but popups may still obscure content. For complex sites, you may need an actor with custom "click" logic or a pre-navigation hook (advanced usage).
  • Access Denied: Make sure proxyConfiguration is set with useApifyProxy: true to avoid 403 errors.
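
Two of the fixes above are plain input changes, so they can be applied programmatically before a retry. The sketch below is one way to do that; the symptom labels "blank_pdf" and "access_denied" and the function name apply_fix are invented for illustration and are not something the actor reports.

```python
import copy


def apply_fix(actor_input: dict, symptom: str) -> dict:
    """Return a copy of the input adjusted for a troubleshooting symptom."""
    fixed = copy.deepcopy(actor_input)  # leave the caller's input untouched
    if symptom == "blank_pdf":
        # Wait for network activity to settle before capturing.
        fixed["waitUntil"] = "networkidle0"
    elif symptom == "access_denied":
        # Route requests through Apify Proxy to reduce 403 responses.
        fixed.setdefault("proxyConfiguration", {})["useApifyProxy"] = True
    else:
        raise ValueError(f"no known input fix for symptom: {symptom!r}")
    return fixed
```

Cookie consent popups are left out on purpose, since the text above suggests they need custom click logic rather than an input change.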

Built with ❤️ using the Apify SDK and Puppeteer.
