
URL: https://apify.com/parseforge/sf-open-data-scraper

SF Open Data Scraper - 659 Datasets · Apify


Pricing

from $25.72 / 1,000 results


San Francisco Open Data Scraper

Scrape any San Francisco Open Data dataset via Socrata SODA API. Business registrations, restaurants, permits, parking, 311 calls, evictions and more. No API key required.

Rating: 0.0 (0)

Developer: ParseForge

Maintained by Community

Actor stats: 0 bookmarked, 2 total users, 1 monthly active user

Last modified: a month ago


🌉 San Francisco Open Data Scraper

🚀 Export any San Francisco Open Data dataset in seconds. Tap into 659 published datasets, including business registrations, restaurants, building permits, 311 cases, parking citations, evictions, police incidents, and more, via the official Socrata SODA API. No API key, no registration.

🕒 Last updated: 2026-05-13 · 📊 Native dataset schema per record · 🗂️ 659 datasets · 🌉 City and County of San Francisco · 🔌 Socrata SODA API

The San Francisco Open Data Scraper is a universal export tool for every dataset on data.sfgov.org. The City and County of San Francisco publishes 659 datasets covering city operations, public safety, transportation, economy, environment, health, and culture. This Actor lets you pull any of them by passing the Socrata 4x4 dataset ID, optionally adding SoQL filters ($where, $select, $order, $q), and downloading the result as CSV, Excel, JSON, or XML.

The catalog spans every major SF civic data set, including building permits (i98e-djp9), registered businesses (pyih-qa8i), 311 service requests (vw6y-z8j6), parking citations (5cei-gny5), eviction notices (tu7p-pa2g), mobile food permits (rqzj-sfat), police incident reports (wg3w-h783), restaurant inspections, film locations, and historical crime statistics. Output preserves the dataset's native schema and appends three metadata fields: _datasetId, _datasetUrl, and _scrapedAt.

🎯 Target Audience | 💡 Primary Use Cases
Civic researchers, journalists, prop-tech startups, GIS engineers, data scientists, public health analysts, real-estate firms, urban planners, students | Civic dashboards, FOIA-style export, permit/business/restaurant feeds, eviction and 311 monitoring, journalism investigations, ML training data on municipal events

📋 What the SF Open Data Scraper does

Five inputs control what is returned: a dataset selector plus four filters that map straight to Socrata SoQL:

  • 🆔 Dataset selector. Pick any of 659 datasets by 4x4 ID. Find IDs in the URL of any dataset page on data.sfgov.org.
  • 🔍 WHERE clause. Standard SoQL $where, e.g. permit_type=3 AND filed_date>'2024-01-01'.
  • 📋 SELECT clause. Limit returned columns via $select.
  • 📈 ORDER clause. Sort with $order, e.g. filed_date DESC.
  • 🔎 Full-text search. Free-text $q across all string columns.
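As a rough illustration of how the input fields listed above could translate into a SODA request, here is a minimal Python sketch. The function name build_soda_url and the use of the standard /resource/<id>.json endpoint are assumptions for illustration; the Actor's actual internals are not published on this page.

```python
from urllib.parse import urlencode

# Assumption: the standard Socrata resource endpoint on the SF portal.
SODA_BASE = "https://data.sfgov.org/resource"


def build_soda_url(dataset_id, where="", select="", order="", query="",
                   limit=1000, offset=0):
    """Map Actor-style input fields onto SODA query parameters (hypothetical helper)."""
    params = {"$limit": limit, "$offset": offset}
    if where:
        params["$where"] = where
    if select:
        params["$select"] = select
    if order:
        params["$order"] = order
    if query:
        params["$q"] = query
    # safe="$" keeps the parameter names readable; values are still percent-encoded.
    return f"{SODA_BASE}/{dataset_id}.json?{urlencode(params, safe='$')}"


url = build_soda_url(
    "i98e-djp9",
    where="permit_type=3 AND filed_date>'2024-01-01'",
    order="filed_date DESC",
)
print(url)
```

Empty inputs are simply left out of the query string, so an unfiltered run only sends $limit and $offset.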

Each record returns the dataset's native columns verbatim (with Socrata's internal :@computed_region_* lookup columns stripped to keep the output clean), plus three appended metadata fields: _datasetId, _datasetUrl, and _scrapedAt. Pagination is automatic and capped at 1,000,000 rows.
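The clean-up described above can be pictured with a short Python sketch. The helper name clean_record is hypothetical, and the logic is only an approximation of the documented behavior (drop :@computed_region_* keys, append the three metadata fields), not the Actor's own code.

```python
from datetime import datetime, timezone


def clean_record(row, dataset_id, scraped_at=None):
    """Strip Socrata computed-region columns and stamp provenance fields (sketch)."""
    cleaned = {
        key: value
        for key, value in row.items()
        if not key.startswith(":@computed_region_")
    }
    if scraped_at is None:
        # ISO 8601 in UTC with millisecond precision, e.g. 2026-05-13T10:00:00.000Z
        scraped_at = (
            datetime.now(timezone.utc)
            .isoformat(timespec="milliseconds")
            .replace("+00:00", "Z")
        )
    cleaned["_datasetId"] = dataset_id
    cleaned["_datasetUrl"] = f"https://data.sfgov.org/d/{dataset_id}"
    cleaned["_scrapedAt"] = scraped_at
    return cleaned


raw = {"permit_number": "201903226060", ":@computed_region_abcd_1234": "7"}
print(clean_record(raw, "i98e-djp9", scraped_at="2026-05-13T10:00:00.000Z"))
```

The column name :@computed_region_abcd_1234 is a made-up placeholder; real datasets carry their own computed-region suffixes.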

💡 Why it matters: San Francisco publishes one of the richest open-data catalogs of any U.S. city, but the SODA API has its own query language, paging quirks, and computed-region noise. This Actor turns that into a clean, paginated export with no Socrata code on your side.


🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to go from sign-up to a downloaded SF dataset.


โš™๏ธ Input

InputTypeDefaultBehavior
datasetIdenum (4x4)"3pee-9qhc"Socrata 4x4 ID. Required. Enumerates all 659 datasets published on data.sfgov.org.
maxItemsinteger10Records to return. Free plan caps at 10, paid plan at 1,000,000.
wherestring (SoQL)""Socrata $where filter.
selectstring (SoQL)""Comma-separated columns to return.
orderstring (SoQL)""Sort, e.g. filed_date DESC.
querystring""Free-text full-text search (Socrata $q).

Example: building permits filed in 2026 with an estimated cost over $1M (newest first, up to 1,000 records).

{
  "datasetId": "i98e-djp9",
  "maxItems": 1000,
  "where": "filed_date>='2026-01-01' AND estimated_cost>1000000",
  "order": "filed_date DESC"
}

Example: 311 cases mentioning 'graffiti' in the Mission.

{
  "datasetId": "vw6y-z8j6",
  "maxItems": 500,
  "query": "graffiti",
  "where": "neighborhoods_sffind_boundaries='Mission'"
}

โš ๏ธ Good to Know: the input dataset list contains all 659 datasets currently exposed on data.sfgov.org. A small number are private (require Socrata authentication) and will return an HTTP 401 / 403 error record. Browse the full catalog and find the right 4x4 ID at data.sfgov.org.


📊 Output

Each record returns the dataset's native schema verbatim (Socrata internal :@computed_region_* columns are stripped) plus three metadata fields. Download as CSV, Excel, JSON, or XML.

🧾 Schema (illustrative for building permits dataset i98e-djp9)

Field | Type | Example
🆔 permit_number | string | "201903226060"
🏗️ permit_type_definition | string | "additions alterations or repairs"
📅 filed_date | ISO 8601 | "2019-03-22T14:35:59.000"
📋 status | string | "expired"
📍 street_number / street_name / street_suffix | string | "760" / "14th" / "St"
📝 description | string | "revision to pa 2017-1120-4452..."
💵 estimated_cost / revised_cost | string (number) | "15000.0" / "97000.0"
🏘️ existing_units / proposed_units | string (number) | "12.0" / "14.0"
📮 zipcode | string | "94114"
🗺️ neighborhoods_analysis_boundaries | string | "Castro/Upper Market"
📍 location | object | {"latitude":"...","longitude":"..."}
🆔 _datasetId | string | "i98e-djp9"
🔗 _datasetUrl | string | "https://data.sfgov.org/d/i98e-djp9"
🕒 _scrapedAt | ISO 8601 | "2026-05-13T10:00:00.000Z"

Every dataset has its own column set. The Actor passes through whatever Socrata returns for the dataset you picked.
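Because values pass through as Socrata returns them, numeric columns such as estimated_cost can arrive as strings like "15000.0". A small post-processing step along these lines may help before analysis; coerce_numbers is a hypothetical helper, and the list of numeric fields is something you would choose per dataset.

```python
def coerce_numbers(record, numeric_fields):
    """Return a copy of the record with the named string fields parsed as floats."""
    out = dict(record)
    for field in numeric_fields:
        value = out.get(field)
        if isinstance(value, str):
            try:
                out[field] = float(value)
            except ValueError:
                # Leave values that are not parseable numbers untouched.
                pass
    return out


permit = {"permit_number": "201903226060", "estimated_cost": "15000.0", "zipcode": "94114"}
print(coerce_numbers(permit, ["estimated_cost", "revised_cost"]))
```

Fields such as zipcode and permit_number are deliberately left as strings here, since they are identifiers rather than quantities.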

📦 Sample record (building permits)


✨ Why choose this Actor

Capability
🗂️ 659 datasets, one Actor. Every public dataset on data.sfgov.org enumerated in the input schema.
🔍 Full SoQL filtering. $where, $select, $order, $q exposed as input fields.
🧹 Cleaned output. Socrata :@computed_region_* internal columns stripped automatically.
🔗 Dataset provenance. Every record stamped with _datasetId, _datasetUrl, _scrapedAt.
⚡ Fast. 1,000-row pages, automatic pagination up to 1,000,000 rows.
🚫 No API key. The Socrata SODA API is public and unauthenticated for all public datasets.
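To make the paging arithmetic concrete, the sketch below computes the (offset, limit) pairs that 1,000-row pages with a 1,000,000-row cap would imply. The function page_plan is hypothetical and only mirrors the numbers quoted above.

```python
def page_plan(max_items, page_size=1000, hard_cap=1_000_000):
    """List the (offset, limit) pairs needed to fetch up to max_items rows."""
    total = min(max_items, hard_cap)
    return [
        (offset, min(page_size, total - offset))
        for offset in range(0, total, page_size)
    ]


print(page_plan(2500))  # [(0, 1000), (1000, 1000), (2000, 500)]
```

A request for 2,500 rows needs three pages, with the last one trimmed to 500 rows.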

📊 SF's open-data catalog is one of the most cited public-sector datasets in the country, powering everything from civic-tech projects to academic research.


📈 How it compares to alternatives

Approach | Cost | Coverage | Refresh | Filters | Setup
⭐ SF Open Data Scraper (this Actor) | $5 free credit, then pay-per-use | All 659 SF datasets | Live per run | Full SoQL ($where, $select, $order, $q) | ⚡ 2 min
Manual CSV download from data.sfgov.org | Free | One dataset at a time | Snapshot | None | 🐢 Manual
Raw Socrata SODA queries | Free | Full | Live | SoQL | 🛠️ Code required
Third-party civic-data aggregators | $99+/month | Mixed | Daily | Vendor-defined | ⏳ Hours

Pick this Actor when you want a clean, filtered export of any SF dataset without writing a single line of Socrata code.


🚀 How to use

  1. 📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. 🌐 Open the Actor. Go to the San Francisco Open Data Scraper page on the Apify Store.
  3. 🎯 Pick a dataset. Find the 4x4 ID on data.sfgov.org (it's in every dataset URL) and paste it in.
  4. 🔍 Add optional filters. Type a SoQL $where, $order, $select, or full-text $q if you want a slice.
  5. 🚀 Run it. Click Start and let the Actor collect your data.
  6. 📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


💼 Business use cases

🏢 Real Estate and Construction

  • Track every building permit filed in your target ZIP
  • Lead-gen from eviction notices and 3R reports
  • Comparable cost-per-unit analysis for development bids
  • Monitor neighborhood change with permit pipeline data

๐Ÿด Restaurant and Hospitality

  • Power restaurant-inspection lookup tools
  • Sync mobile food permit feeds for delivery startups
  • Track new business registrations by SIC code
  • Spot health violations across neighborhoods

🚓 Public Safety and Insurance

  • Build crime-density dashboards by neighborhood
  • Underwrite policies with live incident data
  • Risk-score parcels with parking-citation history
  • Track 311 service-request volume per district

๐Ÿ—ž๏ธ Journalism and Civic Tech

  • Investigate displacement via eviction notices
  • Quantify housing-supply changes year over year
  • Build live-updating civic dashboards
  • Power newsroom data-explainer features

🔌 Automating SF Open Data Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • 🟢 Node.js. Install the apify-client NPM package.
  • 🐍 Python. Use the apify-client PyPI package.
  • 📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Hourly, daily, or weekly refreshes keep downstream databases in sync automatically.
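A minimal Python sketch of a programmatic run with the apify-client package might look like the following. The Actor ID is taken from this page's Store URL; the helper names build_run_input and run_actor are hypothetical, and the token is assumed to be supplied by the caller.

```python
ACTOR_ID = "parseforge/sf-open-data-scraper"  # from the Store URL of this Actor


def build_run_input(dataset_id, max_items=10, where="", select="", order="", query=""):
    """Assemble the Actor input, leaving out filters that were not provided."""
    run_input = {"datasetId": dataset_id, "maxItems": max_items}
    optional = {"where": where, "select": select, "order": order, "query": query}
    run_input.update({key: value for key, value in optional.items() if value})
    return run_input


def run_actor(run_input, token):
    """Start the Actor, wait for it to finish, and return its dataset items."""
    # Imported lazily so the input helper works without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


graffiti_input = build_run_input(
    "vw6y-z8j6",
    max_items=500,
    query="graffiti",
    where="neighborhoods_sffind_boundaries='Mission'",
)
print(graffiti_input)
```

Calling run_actor(graffiti_input, token) with a valid Apify API token would then return the records as a list of dictionaries; that call is left out of the snippet because it needs network access and credentials.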


🌟 Beyond business use cases

Data like this powers more than commercial workflows. The same structured records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

  • Urban-studies papers on housing, transit, displacement
  • Public-health theses with 311 and inspection data
  • Reproducible policy-impact studies with versioned pulls
  • GIS coursework on real municipal datasets

🎨 Personal and creative

  • Neighborhood dashboards for your block
  • Side projects mapping every food truck in the city
  • Civic-art and visualization exhibitions
  • Hobby trackers for permit pipeline or 311 timing

๐Ÿค Non-profit and civic

  • Housing-justice orgs tracking eviction filings
  • Mutual-aid networks monitoring 311 categories
  • Civic-tech hackathons with structured datasets
  • Investigative journalism on city-government performance

🧪 Experimentation

  • Train classification ML models on 311 narratives
  • Prototype agent pipelines that summarize city activity
  • Test geocoding and address-normalization toolchains
  • Validate civic-tech product hypotheses with live data


โ“ Frequently Asked Questions

๐Ÿงฉ How does it work?

Paste the Socrata 4x4 ID of any SF dataset, optionally add SoQL filters and maxItems, click Start, and the Actor pages through the SODA API and emits the records verbatim with three appended metadata fields. No browser automation, no captchas, no setup.

🆔 How do I find a dataset ID?

Browse the catalog at data.sfgov.org. Every dataset URL ends in a 4x4 ID like i98e-djp9 (building permits) or vw6y-z8j6 (311 cases). Paste that ID into the input form.
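If you collect dataset links in bulk, the ID can be pulled out of each URL with a few lines of Python. This is a sketch: extract_dataset_id is a hypothetical helper, and it assumes the 4x4 ID is a lowercase alphanumeric path segment of the form xxxx-xxxx.

```python
import re
from urllib.parse import urlsplit

# Assumption: a 4x4 ID is four lowercase letters or digits, a hyphen, then four more.
FOUR_BY_FOUR = re.compile(r"[a-z0-9]{4}-[a-z0-9]{4}")


def extract_dataset_id(url):
    """Return the last path segment that looks like a 4x4 ID, or None."""
    segments = [segment for segment in urlsplit(url).path.split("/") if segment]
    for segment in reversed(segments):
        if FOUR_BY_FOUR.fullmatch(segment):
            return segment
    return None


print(extract_dataset_id("https://data.sfgov.org/d/i98e-djp9"))  # i98e-djp9
```

Scanning the segments from the end also copes with URLs that carry a trailing page name after the ID.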

๐Ÿ—‚๏ธ How many datasets are supported?

All 659 datasets currently exposed on data.sfgov.org are enumerated in the input dropdown. New datasets are added by the City regularly; reach out if you need a specific one that isn't yet in the list.

๐Ÿ” What is SoQL?

SoQL is Socrata's SQL-like query language for the SODA API. The Actor exposes $where, $select, $order, and $q as input fields. Reference docs: dev.socrata.com. A short cheat sheet: $where=col='value', $order=col DESC, $select=col1,col2, $q=search text.
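When filter values come from user input, quoting them by hand is error-prone. The sketch below builds a $where string from column/value pairs. The helper names are hypothetical, and it assumes SoQL's convention of single-quoted text literals with embedded single quotes doubled; check dev.socrata.com if your values contain unusual characters.

```python
def soql_literal(value):
    """Render a Python value as a SoQL literal (sketch; text is single-quoted)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Assumption: a single quote inside a text literal is escaped by doubling it.
    return "'" + str(value).replace("'", "''") + "'"


def soql_where(conditions):
    """Join (column, operator, value) triples with AND."""
    return " AND ".join(
        f"{column}{operator}{soql_literal(value)}"
        for column, operator, value in conditions
    )


print(soql_where([("permit_type", "=", 3), ("filed_date", ">", "2024-01-01")]))
# permit_type=3 AND filed_date>'2024-01-01'
```

The resulting string can be pasted into the where input field as-is.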

🧹 Why are some columns missing from the output?

Socrata appends internal :@computed_region_* lookup columns to most datasets. These are noise for downstream analytics, so the Actor strips them automatically. Everything else in the dataset's native schema is passed through verbatim.

🔄 How fresh is the data?

The City of San Francisco updates each dataset on its own cadence (some daily, some weekly, some monthly). Every run of this Actor fetches the latest data available on data.sfgov.org as of run time.

🚫 Why did I get a 401 or 403 error?

A small number of datasets are private and require Socrata authentication. The Actor will return a clean {error: ...} record indicating which one. Public datasets work without any credentials.
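Downstream code may want to separate such error records from real rows before loading them anywhere. The sketch below assumes the error record is a dictionary with a top-level error key, as the description above suggests; the exact shape is not documented here, and a dataset that natively has a column called error would need a stricter check.

```python
def split_errors(records):
    """Partition records into (rows, errors) based on a top-level 'error' key."""
    rows, errors = [], []
    for record in records:
        # Assumption: only the Actor's error records carry an 'error' key.
        (errors if "error" in record else rows).append(record)
    return rows, errors


items = [
    {"permit_number": "201903226060", "_datasetId": "i98e-djp9"},
    {"error": "HTTP 403"},  # hypothetical error payload
]
rows, errors = split_errors(items)
print(len(rows), len(errors))  # 1 1
```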

โฐ Can I schedule regular runs?

Yes. Use Apify Schedules to run this Actor on any cron interval (hourly, daily, weekly) and keep a downstream database in sync.

💳 Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small runs (10 records per run). A paid plan lifts the limit and gives you access to scheduling, higher concurrency, and larger datasets.

โš–๏ธ Is this data legal to use?

Yes. SF Open Data is published under the City of San Francisco Open Data Policy and is generally free to reuse with attribution. Specific datasets may carry additional notes on their landing page; check before commercial redistribution.

🆘 What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.


🔌 Integrate with any app

SF Open Data Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get notified when a new record matches your filters
  • Airbyte - Pipe SF datasets into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh SF civic data into your CRM or analytics backend.
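On the receiving end, a webhook handler mostly needs to find out which dataset the finished run wrote to. The sketch below assumes Apify's default webhook payload, where eventType names the event and resource holds the run object with its defaultDatasetId; if you define a custom payload template, the field names will differ. The function name is hypothetical.

```python
def dataset_id_from_webhook(payload):
    """Return the run's default dataset ID for a succeeded run, else None."""
    # Assumption: default Apify payload with 'eventType' and 'resource' keys.
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    return (payload.get("resource") or {}).get("defaultDatasetId")


example = {
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {"defaultDatasetId": "abc123"},  # placeholder ID for illustration
}
print(dataset_id_from_webhook(example))  # abc123
```

The returned ID can then be passed to the Apify API or client to download the fresh records.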


🔗 Recommended Actors

💡 Pro Tip: browse the complete ParseForge collection for more public-data scrapers.


🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


โš ๏ธ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by the City and County of San Francisco or Tyler Technologies / Socrata. All trademarks mentioned are the property of their respective owners. Only publicly available open data is collected.

You might also like

Los Angeles Open Data Scraper

parseforge/la-open-data-scraper

Scrape any Los Angeles Open Data dataset via Socrata SODA API. Crime, business taxes, building permits, parking, 311 service requests and more. No API key required.

Building Permits Search

hanamira/building-permits-search

Search building permits in Chicago, NYC, San Francisco, Seattle, Boston, Austin + any Socrata city. Filter by contractor, cost range, date. Find construction projects, renovations, permits. Official government open data for lead generation and market research.

San Francisco Crime Data Scraper | SFPD Reports

parseforge/san-francisco-crime-data-scraper

Pull San Francisco Police Department crime incidents with offense category, location, date, district, and resolution. Filter by district, type, or date range. Useful for journalists, neighborhood safety analysts, and researchers tracking SF public safety trends.

Socrata / Chicago Building Permits

moving_beacon-owner1/socrata-chicago-building-permits

Scrape data from any Socrata (SODA) open data portal, including Chicago Building Permits. Supports custom datasets, SoQL filters, pagination, sorting, and column selection. Export raw records to the Apify dataset, making it ideal for building permits, inspections, and 311 requests.
