
URL: https://apify.com/alkausari_mujahid/ofsted-reports-data-scraper

Ofsted Reports Data Scraper · Apify


Pricing

from $10.00 / 1,000 results


Ofsted Reports Data Scraper

Scrape Ofsted full inspection reports for children's homes. Extracts 18 structured fields from PDFs (judgement ratings, provider details, inspector info, home capacity and type), filtered by date. Exports to MySQL and/or Apify dataset.

Rating

0.0

(0)

Developer

Alkausari M

Maintained by Community

Actor stats

0

Bookmarked

7

Total users

3

Monthly active users

a month ago

Last modified


Extract structured data from Ofsted full inspection PDF reports for children's homes, at scale. Judgement ratings, provider details, inspectors, home capacity, specialism, dates: 18 fields per report, parsed directly from the source PDFs. Export to your MySQL database, your Apify dataset, or both.

Built and maintained by Alkausari M.


✦ Highlights

  • 📄 Full PDF parsing: 18 structured fields extracted from each report
  • 📅 Date-filtered crawling: target only reports in your inspection date range
  • 🗄 MySQL export: direct insert/update with ON DUPLICATE KEY UPDATE, no duplicates on re-runs
  • ♻️ Smart deduplication: at startup, the Actor checks your existing records and skips already-processed PDFs
  • 🔗 Direct PDF URL support: pass a files.ofsted.gov.uk URL to process a single report
  • 🛡 Resilient: auto-retry with exponential backoff; unparseable PDFs are logged separately
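
The retry behaviour mentioned in the last highlight can be pictured with a minimal Python sketch. The listing does not document the Actor's retry settings, so the function name, the attempt count, and the base delay below are all assumptions chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fetch: Callable[[], T],
    max_attempts: int = 4,      # assumed value, not taken from the Actor
    base_delay: float = 1.0,    # assumed value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); after each failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper easy to test: a list's `append` method can stand in for `time.sleep` to record the delays without actually waiting.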

⚙ How it works

  1. Paste a search URL from the Ofsted reports portal with your filters applied, or pass a direct PDF URL.
  2. Set a date range: latest_report_date_start and latest_report_date_end (YYYY-MM-DD).
  3. Click Start. The Actor finds matching providers → Full Inspection reports → downloads and parses each PDF.

// Example input
{
  "start_urls": [
    { "url": "https://reports.ofsted.gov.uk/search?q=&level_1_types=3&level_2_types%5B0%5D=11&status%5B0%5D=1&start=0&rows=10" }
  ],
  "latest_report_date_start": "2026-02-15",
  "latest_report_date_end": "2026-02-28",
  "max_depth": 3,
  "skip_db_export": false,
  "db_host": "your-db-host",
  "db_database": "your-database-name",
  "db_user": "your-db-user",
  "db_password": "your-db-password"
}

Set skip_db_export: true to use the Actor without any database. All data still lands in your Apify dataset (JSON, CSV, Excel, API).
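
For dataset-only runs started from code rather than the Apify Console, a sketch along the following lines may help. It assumes the `apify-client` Python package and an `APIFY_TOKEN` environment variable. The Actor ID is taken from the Store URL, and the helper names `build_run_input` and `run_and_fetch` are hypothetical.

```python
import os

ACTOR_ID = "alkausari_mujahid/ofsted-reports-data-scraper"  # from the Store URL


def build_run_input(search_url: str, date_start: str, date_end: str) -> dict:
    """Dataset-only input: with skip_db_export=True no db_* fields are supplied."""
    return {
        "start_urls": [{"url": search_url}],
        "latest_report_date_start": date_start,  # YYYY-MM-DD
        "latest_report_date_end": date_end,      # YYYY-MM-DD
        "max_depth": 3,
        "skip_db_export": True,
    }


def run_and_fetch(run_input: dict) -> list:
    """Start the Actor, wait for the run to finish, and return its dataset items."""
    # Imported here so the input builder works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

A call such as `run_and_fetch(build_run_input(url, "2026-02-15", "2026-02-28"))` would then return one dictionary per parsed report, assuming the run succeeds.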

MySQL tables

When MySQL export is enabled, two tables are used:

  • ofsted_reports: primary output, keyed on pdf_url. Records are inserted on first run, updated on re-runs.
  • ofsted_unsupported_reports: PDFs that don't match the expected Ofsted format (e.g. older layouts) are logged here for review rather than silently dropped.
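
The insert-then-update behaviour of ofsted_reports can be illustrated with a small statement builder. This is a sketch of the general MySQL pattern, not the Actor's actual code: only the table name and the pdf_url key come from the description above, while `build_upsert` and every other column name are hypothetical. It relies on pdf_url being a primary or unique key.

```python
def build_upsert(table: str, record: dict, key: str = "pdf_url"):
    """Return (sql, params) for INSERT ... ON DUPLICATE KEY UPDATE.

    Table and column names are interpolated directly, so they must come
    from trusted code, never from user input. Values are passed as params.
    """
    columns = list(record)
    column_list = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    # On a key collision, overwrite every non-key column with the new value.
    # VALUES(col) is deprecated in recent MySQL 8.0 releases in favour of
    # row aliases, but is still accepted.
    updates = ", ".join(f"`{c}` = VALUES(`{c}`)" for c in columns if c != key)
    sql = (
        f"INSERT INTO `{table}` ({column_list}) VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )
    return sql, tuple(record[c] for c in columns)
```

The returned pair can be handed to `cursor.execute(sql, params)` in a MySQL driver that uses `%s` placeholders, which is why re-running over the same PDF updates the row instead of duplicating it.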

📦 What you get back

Each record represents one parsed inspection report:

{
  "PDF URL": "https://files.ofsted.gov.uk/v1/file/50298941",
  "Unique reference number": "2587763",
  "Registered provider": "Mercia Children Services Ltd",
  "Registered provider address": "Windsor House, Bayshill Road, Cheltenham, Gloucestershire GL50 3AT",
  "Provision sub-type": "Children's home",
  "Responsible individual": "Michael Lloyd",
  "Registered manager": "David Griffiths",
  "Inspection dates": "3 and 4 March 2026",
  "Inspection type": "Full inspection",
  "Overall experiences and progress": "good",
  "Help and protection": "good",
  "Leadership and management": "good",
  "Date of last inspection": "25 February 2025",
  "Overall judgement at last inspection": "good",
  "Enforcement action since last inspection": "None",
  "Inspectors": [
    { "name": "Helen Fee", "role": "Social Care Inspector" }
  ],
  "Home Capacity": "4",
  "Home Type": "social and emotional difficulties"
}
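
Date fields in the record arrive as free text ("3 and 4 March 2026", "25 February 2025"), so downstream analysis may want real date objects. The sketch below handles only the two shapes visible in the sample record. Other layouts, such as an inspection spanning two months, are not covered, and `parse_inspection_dates` is a hypothetical helper name.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def parse_inspection_dates(text: str) -> list:
    """Parse strings like '3 and 4 March 2026' into a list of date objects."""
    match = re.fullmatch(r"\s*(.+?)\s+([A-Za-z]+)\s+(\d{4})\s*", text)
    if not match or match.group(2) not in MONTHS:
        raise ValueError(f"Unrecognised date text: {text!r}")
    days_part, month_name, year = match.groups()
    month = MONTHS.index(month_name) + 1
    # Every number before the month name is treated as a day of that month.
    return [date(int(year), month, int(day))
            for day in re.findall(r"\d{1,2}", days_part)]
```

The same helper works for "Date of last inspection", where it returns a single-element list.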

📋 Input

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| start_urls | Array | Yes | - | Ofsted search URL(s) or a direct files.ofsted.gov.uk PDF URL |
| latest_report_date_start | String | Yes | Today | Start of inspection date range (YYYY-MM-DD) |
| latest_report_date_end | String | Yes | Today | End of inspection date range (YYYY-MM-DD) |
| max_depth | Integer | No | 3 | 1 = listing only, 2 = provider pages, 3 = full PDF extraction |
| skip_db_export | Boolean | No | false | true = skip MySQL, save to Apify dataset only |
| db_host | String | If DB export | - | MySQL host |
| db_database | String | If DB export | - | MySQL database name |
| db_user | String | If DB export | - | MySQL username |
| db_password | String | If DB export | - | MySQL password |
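
The rules in the table can be checked locally before a run is started. The following is a sketch of such a pre-flight check, not the Actor's own validation, and `validate_input` is a hypothetical name. Because the table gives the date fields a default of today and the single-report example omits them, this sketch only checks their format when a value is supplied.

```python
from datetime import date

DATE_FIELDS = ("latest_report_date_start", "latest_report_date_end")
DB_FIELDS = ("db_host", "db_database", "db_user", "db_password")


def validate_input(run_input: dict) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    if not run_input.get("start_urls"):
        errors.append("start_urls is required")
    for field in DATE_FIELDS:
        value = run_input.get(field)
        if value is None:
            continue  # assumed to fall back to the documented default (today)
        try:
            date.fromisoformat(value)
        except (TypeError, ValueError):
            errors.append(f"{field} must be YYYY-MM-DD")
    if run_input.get("max_depth", 3) not in (1, 2, 3):
        errors.append("max_depth must be 1, 2 or 3")
    if not run_input.get("skip_db_export", False):
        # The db_* fields only matter when MySQL export is enabled.
        errors.extend(f"{name} is required when skip_db_export is false"
                      for name in DB_FIELDS if not run_input.get(name))
    return errors
```

Returning all problems at once, rather than raising on the first, makes it easier to fix a misconfigured scheduled run in one pass.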

Direct PDF: single-report mode

{
  "start_urls": [{ "url": "https://files.ofsted.gov.uk/v1/file/50287454" }],
  "max_depth": 1,
  "skip_db_export": true
}
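
When start URLs come from a spreadsheet or another system, it can be useful to tell the two modes apart before submitting them. The sketch below distinguishes them by host name only. The two hosts are the ones that appear in the examples on this page, and `classify_start_url` is a hypothetical helper, not part of the Actor.

```python
from urllib.parse import urlparse

PDF_HOST = "files.ofsted.gov.uk"       # direct report files
SEARCH_HOST = "reports.ofsted.gov.uk"  # search portal used in the example input


def classify_start_url(url: str) -> str:
    """Return 'pdf' for a direct report URL and 'search' for a portal URL."""
    host = urlparse(url).hostname
    if host == PDF_HOST:
        return "pdf"
    if host == SEARCH_HOST:
        return "search"
    raise ValueError(f"Unexpected host for an Ofsted start URL: {host!r}")
```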

💡 Use cases

  • Research: academic and policy analysis of inspection trends across providers and regions.
  • Compliance monitoring: track ratings and enforcement actions across the providers you work with.
  • Sector consultancy: build a structured dataset of children's-home judgements for client reporting.
  • Scheduled syncs: set a rolling 7-day date window and schedule daily/weekly runs; dedup ensures no rework.
  • Data products: power dashboards and BI on top of clean, parsed Ofsted data via the Apify API.
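
For the scheduled-sync use case, the two date parameters have to move forward with each run. One possible way to compute a rolling window is sketched below, treating the window as inclusive of both ends, which is an assumption about how the Actor interprets the range. The helper name `rolling_window` is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def rolling_window(days: int = 7, today: Optional[date] = None) -> dict:
    """Return the two date inputs covering the last `days` days, ending today."""
    end = today or date.today()
    start = end - timedelta(days=days - 1)  # inclusive of both end points
    return {
        "latest_report_date_start": start.isoformat(),
        "latest_report_date_end": end.isoformat(),
    }
```

The returned dictionary can be merged into the rest of the run input before each scheduled run is started.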

📮 Support

Bugs, feature requests, or custom scraping work: open an issue on Apify or email alkausarimujahid@gmail.com.

