URL: https://apify.com/njoylab/ai-enhanced-website-metadata

AI-Enhanced Website Metadata · Apify


Pricing: from $7.00 / 1,000 results

AI-Enhanced Website Metadata

Extracts complete website metadata, including SEO tags, OpenGraph data, social media links and contact information, and performs link analysis. Features AI-powered content summarization with multilingual support, plus structured data extraction. Perfect for gathering deep insights from any URL.

Rating: 0.0 (0 ratings)

Developer: njoylab (Maintained by Community)

Actor stats: 1 bookmarked, 22 total users, 8 monthly active users, last modified 6 months ago

URL Summary Scraper with AI

A powerful Apify actor that extracts essential website information with optional AI-powered summaries and key facts extraction. Supports LLM analysis in 30+ languages.

Features

Core Scraping

  • Comprehensive metadata extraction - SEO, OpenGraph, Twitter Card data
  • Social media links - Facebook, X (Twitter), LinkedIn, Instagram, YouTube, TikTok, Pinterest, Trustpilot, GitHub, Discord, Telegram, WhatsApp, Medium, Reddit, Threads, Mastodon, Twitch, Vimeo, Spotify, Snapchat
  • Contact information - Email, phone numbers, addresses
  • Link analysis - Internal/external links with domain categorization
  • Media assets - Favicons, logos, featured images
  • Structured data - JSON-LD extraction
  • Robots.txt compliance - Respects crawling rules by default (can be bypassed with ignoreRobots)
  • Batch processing - Process single URL or multiple URLs in one run

AI-Powered Analysis (Optional)

  • Intelligent summaries - Short (50 words), Medium (150 words), Long (300 words)
  • Semantic keywords - AI-extracted keywords from content (works for any page type)
  • Multilingual support - 30+ languages, including English, Italian, Spanish, French, German and Portuguese
  • Key facts extraction - Company name, industry, services, target audience, business model
  • Graceful degradation - Returns metadata even if AI analysis fails

Input Parameters

Parameter | Type | Required | Default | Description
--- | --- | --- | --- | ---
url | array | Yes | - | Array of URLs to scrape (use a single-element array for one URL)
language | string | No | en, en-US;q=0.9, en-GB;q=0.8 | Accept-Language header sent with requests
ignoreRobots | boolean | No | false | Bypass robots.txt rules
ignoreExternalLinks | boolean | No | false | Skip external link extraction
ignoreInteralLinks | boolean | No | false | Skip internal link extraction
generateSummary | boolean | No | false | Enable AI-powered summaries (opt-in)
summaryLength | string | No | - | Summary length: short, medium, or long; leave empty to get all three
summaryLanguage | string | No | auto-detect | Target language code (e.g., en, it, es)
extractKeyFacts | boolean | No | false | Extract structured business information
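To avoid typos in parameter names, the input object can be assembled in code. The Python sketch below uses a hypothetical helper, build_run_input, that covers a subset of the parameters in the table (names and allowed summaryLength values are taken from it) and only includes options that differ from their defaults. It is an illustration, not part of the actor.

```python
from typing import Optional

# Allowed summaryLength values, per the parameter table.
SUMMARY_LENGTHS = {"short", "medium", "long"}


def build_run_input(
    urls: list[str],
    *,
    generate_summary: bool = False,
    summary_length: Optional[str] = None,
    summary_language: Optional[str] = None,
    extract_key_facts: bool = False,
    ignore_robots: bool = False,
) -> dict:
    """Hypothetical helper: build an input object for the actor.

    Options left at their documented defaults are omitted, so the
    actor applies its own defaults for them.
    """
    if not urls:
        raise ValueError("url must contain at least one URL")
    if summary_length is not None and summary_length not in SUMMARY_LENGTHS:
        raise ValueError(f"summaryLength must be one of {sorted(SUMMARY_LENGTHS)}")

    run_input: dict = {"url": list(urls)}  # always an array, even for one URL
    if generate_summary:
        run_input["generateSummary"] = True
    if summary_length is not None:
        run_input["summaryLength"] = summary_length
    if summary_language is not None:
        run_input["summaryLanguage"] = summary_language
    if extract_key_facts:
        run_input["extractKeyFacts"] = True
    if ignore_robots:
        run_input["ignoreRobots"] = True
    return run_input
```

For example, build_run_input(["https://apify.com"], generate_summary=True, summary_length="short") yields {"url": ["https://apify.com"], "generateSummary": True, "summaryLength": "short"}.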

Usage Examples

Single URL - Basic Scraping

{
  "url": ["https://apify.com"]
}

Multiple URLs - Batch Processing

{
  "url": [
    "https://example.com",
    "https://example.org",
    "https://example.net"
  ]
}

AI-Powered Analysis

{
  "url": ["https://apify.com"],
  "generateSummary": true,
  "extractKeyFacts": true
}

Multilingual Summary

{
  "url": ["https://example.it"],
  "generateSummary": true,
  "summaryLanguage": "it"
}
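The same input objects can be sent from code. This is a minimal standard-library sketch that assumes Apify's REST API v2 run-sync-get-dataset-items endpoint and an actor ID derived from the store URL (njoylab~ai-enhanced-website-metadata, with "~" replacing "/"). The helper names make_run_request and run_actor are hypothetical, and you need your own Apify API token.

```python
import json
import urllib.request

API_BASE = "https://api.apify.com/v2"
# Assumed actor ID, derived from the store URL; the API path uses "~" instead of "/".
ACTOR_ID = "njoylab~ai-enhanced-website-metadata"


def make_run_request(run_input: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs the actor
    synchronously and returns its dataset items as a JSON array."""
    url = f"{API_BASE}/acts/{ACTOR_ID}/run-sync-get-dataset-items"
    body = json.dumps(run_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def run_actor(run_input: dict, token: str, timeout: float = 300) -> list[dict]:
    """Send the request and decode the returned dataset items (one per URL)."""
    request = make_run_request(run_input, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires network access and a real token):
# items = run_actor({"url": ["https://example.it"], "generateSummary": True,
#                    "summaryLanguage": "it"}, token="<YOUR_APIFY_TOKEN>")
```

For large batches a synchronous call may run into API or client timeouts; starting the run asynchronously and reading its dataset afterwards is likely more robust.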

Output Schema

The actor returns a hierarchical JSON object for each URL:

{
  "url": "string",
  "seo": {
    "title": "string",
    "description": "string",
    "keywords": ["string"],
    "canonical": "string",
    "robots": "string",
    "language": "string",
    "viewport": "string"
  },
  "openGraph": {
    "title": "string",
    "description": "string",
    "image": "string",
    "url": "string",
    "type": "string",
    "siteName": "string"
  },
  "twitterCard": {
    "card": "string",
    "site": "string",
    "creator": "string",
    "title": "string",
    "description": "string",
    "image": "string"
  },
  "social": {
    "facebook": "string",
    "x": "string",
    "linkedin": "string",
    "instagram": "string",
    "youtube": "string",
    "tiktok": "string",
    "pinterest": "string",
    "trustpilot": "string",
    "github": "string",
    "discord": "string",
    "telegram": "string",
    "whatsapp": "string",
    "medium": "string",
    "reddit": "string",
    "threads": "string",
    "mastodon": "string",
    "twitch": "string",
    "vimeo": "string",
    "spotify": "string",
    "snapchat": "string"
  },
  "contact": {
    "email": "string",
    "phone": "string",
    "address": "string"
  },
  "technical": {
    "statusCode": 200,
    "finalUrl": "string",
    "originalUrl": "string",
    "robotsAllowed": true,
    "loadTime": 1234,
    "isSecure": true,
    "contentType": "text/html"
  },
  "media": {
    "favicon": "string",
    "appleTouchIcon": "string",
    "featuredImage": "string",
    "logo": "string",
    "screenshots": ["string"]
  },
  "links": {
    "internal": {
      "total": 42,
      "urls": ["string"]
    },
    "external": {
      "total": 15,
      "urls": ["string"],
      "domains": ["string"]
    },
    "mailto": ["string"],
    "tel": ["string"]
  },
  "structuredData": [{}],
  "ai": {
    "summary": {
      "short": "string",
      "medium": "string",
      "long": "string",
      "contentLength": 5000,
      "truncated": false
    },
    "keywords": ["string"],
    "keyFacts": {
      "companyName": "string",
      "companyType": "B2B SaaS",
      "industry": "Technology",
      "services": ["string"],
      "targetAudience": "string",
      "headquarters": "San Francisco, USA",
      "foundedYear": 2020,
      "keyFeatures": ["string"],
      "businessModel": "Subscription"
    },
    "processingTime": 2340,
    "error": "string"
  }
}

Note: When processing multiple URLs, one record per URL will be added to the dataset.
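Fields that were not found appear to be omitted from a record (the example response, for instance, has no facebook or phone entries), so it is safer to read records defensively. The Python sketch below uses hypothetical helpers, get_path and summarize_record, to flatten one record into a row, for example for CSV export. The chosen columns are only an illustration.

```python
from typing import Any


def get_path(record: dict, path: str, default: Any = None) -> Any:
    """Follow a dotted path such as "seo.title" through nested dicts,
    returning `default` as soon as a segment is missing."""
    current: Any = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def summarize_record(record: dict) -> dict:
    """Reduce one dataset record to a flat row of commonly used fields."""
    social = record.get("social") or {}
    return {
        "url": record.get("url"),
        # Fall back to the OpenGraph title when the SEO title is absent.
        "title": get_path(record, "seo.title") or get_path(record, "openGraph.title"),
        "statusCode": get_path(record, "technical.statusCode"),
        "email": get_path(record, "contact.email"),
        # Names of the social networks that have a non-empty link.
        "socialNetworks": sorted(name for name, link in social.items() if link),
        "internalLinks": get_path(record, "links.internal.total", 0),
        "externalLinks": get_path(record, "links.external.total", 0),
        "shortSummary": get_path(record, "ai.summary.short"),
    }
```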

Supported Languages for AI Summaries

English, Italian, Spanish, French, German, Portuguese, Dutch, Russian, Chinese, Japanese, Korean, Arabic, Hindi, Polish, Turkish, Swedish, Norwegian, Danish, Finnish, Greek, Czech, Romanian, Hungarian, Thai, Vietnamese, Indonesian, Malay, Ukrainian, Bulgarian, Croatian, Slovak, Slovenian, Lithuanian, Latvian, Estonian.
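summaryLanguage takes a language code. The sketch below maps locale tags such as "it-IT" or "pt_BR" to the two-letter ISO 639-1 codes of the languages listed above. It assumes the actor expects ISO 639-1 codes, as the documented examples (en, it, es) suggest; normalize_summary_language is a hypothetical helper, and returning None means leaving summaryLanguage empty so the actor auto-detects the language.

```python
from typing import Optional

# ISO 639-1 codes for the 35 languages listed above.
# Assumption: the actor accepts these two-letter codes (Norwegian as "no").
SUPPORTED_SUMMARY_LANGUAGES = {
    "en", "it", "es", "fr", "de", "pt", "nl", "ru", "zh", "ja", "ko", "ar",
    "hi", "pl", "tr", "sv", "no", "da", "fi", "el", "cs", "ro", "hu", "th",
    "vi", "id", "ms", "uk", "bg", "hr", "sk", "sl", "lt", "lv", "et",
}


def normalize_summary_language(tag: Optional[str]) -> Optional[str]:
    """Map a locale tag to a bare language code from the list above.

    Returns None (omit summaryLanguage and rely on auto-detection) when
    the tag is empty or its language is not in the list.
    """
    if not tag:
        return None
    code = tag.strip().replace("_", "-").split("-")[0].lower()
    return code if code in SUPPORTED_SUMMARY_LANGUAGES else None
```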

Performance

  • Basic scraping: < 5 seconds per URL
  • With AI analysis: < 30 seconds per URL
  • Memory: Recommended 2048 MB
  • Timeout: Recommended 300 seconds (5 minutes)
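These figures allow a rough sizing calculation. The sketch below ASSUMES that the URLs in a run are processed one after another at the stated upper bounds (the actor may be faster or work in parallel), and it uses the listed starting price of $7.00 per 1,000 results with one result per URL. Treat the outputs as conservative time estimates and as a lower bound on cost; the function names are hypothetical.

```python
import math

# Upper bounds from the Performance section, in seconds per URL.
BASIC_SECONDS_PER_URL = 5
AI_SECONDS_PER_URL = 30
RECOMMENDED_TIMEOUT_SECONDS = 300
# Listed starting price ("from $7.00 / 1,000 results"), one result per URL.
PRICE_PER_1000_RESULTS_USD = 7.00


def max_urls_per_run(with_ai: bool, timeout_s: int = RECOMMENDED_TIMEOUT_SECONDS) -> int:
    """Worst-case number of URLs that fit in one run, ASSUMING sequential processing."""
    per_url = AI_SECONDS_PER_URL if with_ai else BASIC_SECONDS_PER_URL
    return timeout_s // per_url


def runs_needed(n_urls: int, with_ai: bool) -> int:
    """Runs required to stay within the timeout under the same assumption."""
    return math.ceil(n_urls / max_urls_per_run(with_ai))


def estimated_cost_usd(n_urls: int) -> float:
    """Lower-bound cost at the listed starting price."""
    return n_urls * PRICE_PER_1000_RESULTS_USD / 1000
```

Under this worst-case assumption, 300 // 30 = 10 URLs with AI analysis fit in the recommended 300-second timeout (60 without AI), and 250 URLs cost at least 250 × $7.00 / 1,000 = $1.75. Raising the timeout is the other lever when a batch is larger.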

Error Handling

The actor implements graceful degradation:

  • AI failures → Returns metadata with the ai.error field set
  • Network errors → Retries with different URL variants (http/https, www/non-www)
  • Robots.txt blocking → Can be bypassed with ignoreRobots: true
  • Partial failures → When processing multiple URLs, failed URLs return error objects while successful ones return full data
  • Individual URL errors โ†’ Each URL is processed independently; one failure doesn't stop the batch
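When a batch finishes, it helps to separate results by outcome. In this sketch, ai.error is the documented field for AI failures; the top-level "error" key used to detect failed URLs is an ASSUMPTION, because the shape of the per-URL error objects is not documented here. partition_results is a hypothetical helper.

```python
from typing import Any


def partition_results(
    items: list[dict[str, Any]],
) -> tuple[list[dict], list[dict], list[dict]]:
    """Split dataset items into (ok, ai_failed, failed).

    - failed: items with a top-level "error" key (ASSUMED error-object shape).
    - ai_failed: metadata was returned but ai.error is set (documented).
    - ok: everything else.
    """
    ok, ai_failed, failed = [], [], []
    for item in items:
        if item.get("error"):
            failed.append(item)
        elif (item.get("ai") or {}).get("error"):
            ai_failed.append(item)
        else:
            ok.append(item)
    return ok, ai_failed, failed
```

Items in ai_failed still carry usable metadata, so they can be kept and retried for AI analysis only, while failed URLs can be resubmitted in a new run.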

Example Response

Here's a real example of the actor's output for a single URL, with long link lists abridged:

{
  "url": "https://apify.com/",
  "seo": {
    "title": "Apify: Full-stack web scraping and data extraction platform",
    "description": "Extract data from any website with Apify's scraping tools and ready-made scrapers. No coding needed.",
    "keywords": ["web scraping", "data extraction", "automation"],
    "canonical": "https://apify.com/",
    "language": "en",
    "viewport": "width=device-width, initial-scale=1"
  },
  "openGraph": {
    "title": "Apify: Full-stack web scraping and data extraction platform",
    "description": "Extract data from any website with Apify's scraping tools",
    "image": "https://apify.com/og-image.png",
    "url": "https://apify.com/",
    "type": "website",
    "siteName": "Apify"
  },
  "twitterCard": {
    "card": "summary_large_image",
    "site": "@apify",
    "title": "Apify: Web scraping platform",
    "image": "https://apify.com/twitter-card.png"
  },
  "social": {
    "x": "https://x.com/apify",
    "linkedin": "https://linkedin.com/company/apifytech",
    "youtube": "https://youtube.com/c/apify",
    "github": "https://github.com/apify",
    "discord": "https://discord.com/invite/apify",
    "medium": "https://medium.com/@apify"
  },
  "contact": {
    "email": "support@apify.com"
  },
  "technical": {
    "statusCode": 200,
    "finalUrl": "https://apify.com/",
    "originalUrl": "https://apify.com",
    "robotsAllowed": true,
    "loadTime": 1247,
    "isSecure": true,
    "contentType": "text/html; charset=utf-8"
  },
  "media": {
    "favicon": "https://apify.com/favicon.ico",
    "logo": "https://apify.com/logo.svg",
    "featuredImage": "https://apify.com/og-image.png"
  },
  "links": {
    "internal": {
      "total": 127,
      "urls": ["https://apify.com/pricing", "https://apify.com/about", "..."]
    },
    "external": {
      "total": 8,
      "urls": ["https://docs.apify.com", "..."],
      "domains": ["docs.apify.com", "blog.apify.com"]
    },
    "mailto": ["support@apify.com"],
    "tel": []
  },
  "ai": {
    "summary": {
      "short": "Apify is a web scraping and automation platform that allows users to extract data from websites without coding.",
      "medium": "Apify is a comprehensive web scraping and data extraction platform designed for both developers and non-technical users. It offers ready-made scrapers, custom scraping tools, and a cloud infrastructure to extract data from any website at scale. The platform features an extensive library of pre-built actors, proxy management, and scheduling capabilities.",
      "contentLength": 15420,
      "truncated": false
    },
    "keywords": ["web scraping", "data extraction", "automation", "B2B SaaS", "cloud platform", "API"],
    "keyFacts": {
      "companyName": "Apify",
      "companyType": "B2B SaaS",
      "industry": "Web Scraping & Data Extraction",
      "services": ["Web scraping tools", "Ready-made scrapers", "Cloud infrastructure", "Proxy services"],
      "targetAudience": "Developers, Data Scientists, Business Analysts",
      "businessModel": "Subscription",
      "keyFeatures": ["Actor marketplace", "Serverless computing", "Proxy management", "Scheduling"]
    },
    "processingTime": 3421
  }
}

Tips for Best Results

  1. Batch Processing - Use arrays for multiple URLs to process them efficiently
  2. AI Costs - Enable generateSummary and extractKeyFacts only when you need them; AI analysis adds cost and processing time
  3. Language Detection - Leave summaryLanguage empty to auto-detect from page content
  4. Specific Summaries - Use summaryLength to get only the length you need
  5. Robots.txt - robots.txt rules are respected by default; set ignoreRobots: true only when you are legally permitted to bypass them
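Before setting ignoreRobots: true, you can check locally what a site's robots.txt allows. This sketch uses Python's standard urllib.robotparser on robots.txt text you already have; robots_allows is a hypothetical helper, and the actor's own check may use a different user agent or matching rules, so treat the result as a pre-check only.

```python
from urllib.robotparser import RobotFileParser


def robots_allows(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


robots = "User-agent: *\nDisallow: /private/\n"
print(robots_allows(robots, "https://example.com/private/page"))  # False
print(robots_allows(robots, "https://example.com/about"))  # True
```

To check a live site instead, RobotFileParser.set_url() followed by read() downloads and parses the file before calling can_fetch().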

Disclaimer

This actor is provided for legitimate web scraping and data extraction purposes. Users are responsible for:

  • Compliance with Terms of Service - Ensure you have permission to scrape target websites
  • Respect for robots.txt - Follow website crawling guidelines unless legally permitted to override
  • Rate limiting - Implement appropriate delays to avoid overloading target servers
  • Data privacy - Comply with GDPR, CCPA, and other data protection regulations
  • Intellectual property - Respect copyright and trademark rights of scraped content

The developers of this actor are not responsible for misuse or violations of applicable laws and terms of service.

You might also like

URL to Metadata - mail, social and more

njoylab/url-summary-scraper

A powerful Apify actor that extracts essential website information, including title, description, images, mail, and social media links. Perfect for quick data gathering and insights from any URL.

Website Metadata Extractor

scrapers-hub/website-metadata-extractor

Website metadata extractor to extract titles, descriptions, keywords, and meta tags from any website. Perfect for SEO analysis, auditing, and research. Fast, accurate, and scalable extraction.

Website Scraper Search Email, Phone, & Social Media

scraping_solutions/website-scraper-search-email-phone-social-media

Automatically extracts emails, social media links, and phone numbers from any website. Perfect for quickly gathering contact details and online presence data of businesses or professionals.

๐Ÿ‘ User avatar

Scraping Solutions

106

SEO Intelligence Suite - Complete Analysis with AI

viralanalyzer/seo-intelligence-suite

Complete SEO audit: meta tags, headings, links, structured data, AI recommendations.

AI-Powered RSS Aggregator & Summarizer

primeparse/rss-aggregator

Enterprise-grade RSS aggregator with AI-powered summarization. Collects, filters, and processes feeds from any source. Ideal for content analysis, news monitoring, and AI training. Features keyword filtering, metadata extraction, and structured output in JSON/CSV. Built with Hugging Face.

AI Website Enricher & Metadata Scraper

express_kingfisher/website-details-gather-with-ai

🚀 Instantly turn any URL into rich data. Extract description, tech stack, pricing model, social links, and SEO categories using AI. Perfect for lead generation and market research.

AI Web Extractor

uxinfra/uxinfra-web-extractor

Intelligent web content extraction with AI-powered structuring. Extracts articles, products, reviews, and structured data from any website.