ANVISA Medicine Scraper
Extracts complete data on registered medicines from ANVISA (Brazil's National Health Surveillance Agency), including commercial presentations, domestic and international manufacturers, ATC classification, therapeutic class, and registration holder details.
Why this Actor?
ANVISA's consultation portal is protected by a WAF (Cloudflare + Dynatrace), which blocks direct HTTP calls to the API: any request without valid session cookies gets a 403 Forbidden.
This Actor solves the problem using Playwright (a real headless browser) that:
- Resolves WAF challenges automatically: Cloudflare and Dynatrace are handled as part of normal browser navigation
- Intercepts JSON responses from the internal API: more robust than CSS selector scraping, and it won't break if the UI changes
- Returns complete, structured data: the same depth of data available in each medicine's detail panel
Input
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| startDate | string | No | 7 days ago | Start date of the publication period (DD/MM/YYYY) |
| endDate | string | No | Today | End date of the publication period (DD/MM/YYYY) |
| cnpj | string | No | none | Registration holder's CNPJ (digits only). Filters by holder |
| nomeProduto | string | No | none | Product name text search (partial match supported) |
| maxPages | integer | No | 0 | Listing page limit (0 = unlimited, each page = 10 products) |
| maxRequestsPerCrawl | integer | No | 1000 | Safety limit for HTTP requests per run |
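The defaults in the table above can be illustrated with a minimal TypeScript sketch. The names `ActorInput`, `ResolvedInput`, and `resolveInput` are hypothetical and are not taken from the Actor's source; the sketch also assumes that "today" and "7 days ago" are computed in UTC, which the table does not specify.

```typescript
// Hypothetical input shape mirroring the input table (not the Actor's own type).
interface ActorInput {
  startDate?: string; // DD/MM/YYYY
  endDate?: string; // DD/MM/YYYY
  cnpj?: string; // digits only
  nomeProduto?: string;
  maxPages?: number; // 0 = unlimited
  maxRequestsPerCrawl?: number;
}

interface ResolvedInput extends ActorInput {
  startDate: string;
  endDate: string;
  maxPages: number;
  maxRequestsPerCrawl: number;
}

const DATE_RE = /^(\d{2})\/(\d{2})\/(\d{4})$/;

// Formats a Date as DD/MM/YYYY using its UTC fields (assumption: UTC).
function formatDate(d: Date): string {
  const dd = String(d.getUTCDate()).padStart(2, "0");
  const mm = String(d.getUTCMonth() + 1).padStart(2, "0");
  return `${dd}/${mm}/${d.getUTCFullYear()}`;
}

// Throws if the value is not a real calendar date in DD/MM/YYYY form.
function assertValidDate(value: string, field: string): void {
  const m = DATE_RE.exec(value);
  if (!m) throw new Error(`${field} must be DD/MM/YYYY, got "${value}"`);
  const day = Number(m[1]);
  const month = Number(m[2]);
  const year = Number(m[3]);
  const d = new Date(Date.UTC(year, month - 1, day));
  if (d.getUTCDate() !== day || d.getUTCMonth() !== month - 1) {
    throw new Error(`${field} is not a valid calendar date: "${value}"`);
  }
}

// Applies the documented defaults and validates the date and CNPJ formats.
function resolveInput(input: ActorInput, now: Date = new Date()): ResolvedInput {
  const weekAgo = new Date(now.getTime() - 7 * 24 * 60 * 60 * 1000);
  const resolved: ResolvedInput = {
    ...input,
    startDate: input.startDate ?? formatDate(weekAgo),
    endDate: input.endDate ?? formatDate(now),
    maxPages: input.maxPages ?? 0,
    maxRequestsPerCrawl: input.maxRequestsPerCrawl ?? 1000,
  };
  assertValidDate(resolved.startDate, "startDate");
  assertValidDate(resolved.endDate, "endDate");
  if (resolved.cnpj !== undefined && !/^\d+$/.test(resolved.cnpj)) {
    throw new Error("cnpj must contain digits only");
  }
  return resolved;
}
```

Passing `now` explicitly keeps the helper deterministic, which is convenient when checking the default date window in tests.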
Input examples
```json
{
  "startDate": "01/04/2025",
  "endDate": "30/04/2025",
  "maxPages": 1
}
```

Search for a specific registration holder:

```json
{
  "startDate": "01/01/2025",
  "endDate": "30/06/2025",
  "cnpj": "00000000000100"
}
```

Search by product name:

```json
{
  "nomeProduto": "paracetamol",
  "maxPages": 3
}
```
Output
Each dataset item is a Medicine object with the following structure:
```json
{
  "anvisaRegistrationId": "100000001",
  "tradeName": "EXEMPLOMAX",
  "activeIngredient": "PARACETAMOL",
  "referenceMedicine": "TYLENOL",
  "atcCodes": ["N02BE01"],
  "therapeuticClasses": ["ANALGÉSICOS"],
  "regulatoryCategory": "Genérico",
  "registrationHolder": {
    "legalName": "PHARMA EXEMPLO LTDA.",
    "cnpj": "00000000000100",
    "authorizationNumber": "1000001"
  },
  "approvalDate": "2025-04-28",
  "expiryDate": "2035-04-28",
  "processNumber": "25351000000202500",
  "presentations": [
    {
      "registrationId": "1000000010010",
      "description": "500 MG COM CT BL AL PLAS INC X 20",
      "pharmaceuticalForms": ["COMPRIMIDO SIMPLES"],
      "routesOfAdministration": ["ORAL"],
      "destinations": ["Comercial"],
      "publicationDate": "2025-01-15",
      "validity": "54",
      "manufacturers": [
        {
          "name": "FÁBRICA EXEMPLO S.A.",
          "address": "RUA DAS INDÚSTRIAS, 123 - CIDADE/SP",
          "country": "BRASIL",
          "manufacturingStage": "FABRICAÇÃO DO PRODUTO TERMINADO",
          "uniqueCode": "X000001"
        }
      ]
    }
  ]
}
```
Output fields
| Field | Type | Description |
|---|---|---|
| anvisaRegistrationId | string | ANVISA registration number (up to 13 digits) |
| tradeName | string | Trade name (brand) |
| activeIngredient | string | Active pharmaceutical ingredient (API) |
| referenceMedicine | string \| null | Reference (innovator) medicine |
| atcCodes | string[] | ATC codes (WHO classification) |
| therapeuticClasses | string[] | ANVISA therapeutic classes |
| regulatoryCategory | string | Regulatory category (Generic, Similar, New, Biological) |
| registrationHolder | object | Registration holder company (legalName, cnpj, authorizationNumber) |
| approvalDate | string | Registration/approval date (YYYY-MM-DD) |
| expiryDate | string | Registration expiry date (YYYY-MM-DD) |
| processNumber | string | ANVISA administrative process number |
| presentations | array | Commercial presentations (dosage, packaging, manufacturers) |
Each presentation contains:
| Field | Type | Description |
|---|---|---|
| registrationId | string | Presentation registration code |
| description | string | Full description (dosage + form + packaging) |
| pharmaceuticalForms | string[] | Pharmaceutical forms |
| routesOfAdministration | string[] | Approved routes of administration |
| destinations | string[] | Commercial destination (Commercial, Hospital, etc.) |
| publicationDate | string | Official Gazette publication date (YYYY-MM-DD) |
| validity | string | Shelf life in months |
| manufacturers | array | Manufacturers (domestic and international, unified) |
Each manufacturer contains:
| Field | Type | Description |
|---|---|---|
| name | string | Legal name |
| address | string | Manufacturing plant address |
| country | string | Country (BRASIL for domestic) |
| manufacturingStage | string | Manufacturing process stage |
| uniqueCode | string | Unique code in the ANVISA system |
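For consumers of the dataset, the three field tables above translate directly into TypeScript interfaces. The sketch below is written from those tables, not copied from the Actor's source. The `ManufacturerRow` type and the `toManufacturerRows` helper are hypothetical additions that show one way to flatten the nested structure into one row per manufacturer, for example before loading it into a spreadsheet or BI tool.

```typescript
interface Manufacturer {
  name: string;
  address: string;
  country: string; // "BRASIL" for domestic manufacturers
  manufacturingStage: string;
  uniqueCode: string;
}

interface Presentation {
  registrationId: string;
  description: string;
  pharmaceuticalForms: string[];
  routesOfAdministration: string[];
  destinations: string[];
  publicationDate: string; // YYYY-MM-DD
  validity: string; // shelf life in months, delivered as a string
  manufacturers: Manufacturer[];
}

interface Medicine {
  anvisaRegistrationId: string;
  tradeName: string;
  activeIngredient: string;
  referenceMedicine: string | null;
  atcCodes: string[];
  therapeuticClasses: string[];
  regulatoryCategory: string;
  registrationHolder: { legalName: string; cnpj: string; authorizationNumber: string };
  approvalDate: string; // YYYY-MM-DD
  expiryDate: string; // YYYY-MM-DD
  processNumber: string;
  presentations: Presentation[];
}

// Hypothetical flat row: one per (medicine, presentation, manufacturer).
interface ManufacturerRow {
  anvisaRegistrationId: string;
  tradeName: string;
  presentationId: string;
  manufacturerName: string;
  country: string;
  manufacturingStage: string;
}

// Flattens the nested presentations/manufacturers arrays into rows.
function toManufacturerRows(medicine: Medicine): ManufacturerRow[] {
  const rows: ManufacturerRow[] = [];
  for (const presentation of medicine.presentations) {
    for (const manufacturer of presentation.manufacturers) {
      rows.push({
        anvisaRegistrationId: medicine.anvisaRegistrationId,
        tradeName: medicine.tradeName,
        presentationId: presentation.registrationId,
        manufacturerName: manufacturer.name,
        country: manufacturer.country,
        manufacturingStage: manufacturer.manufacturingStage,
      });
    }
  }
  return rows;
}
```

A medicine with no presentations, or a presentation with no manufacturers, yields no rows in this sketch; a caller that needs to keep such records would have to handle them separately.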
How it works
1. Playwright navigates to the ANVISA SPA
   - WAF (Cloudflare/Dynatrace) resolved
   - Session cookies captured by the browser
2. DEFAULT route: Paginated listing API
   - fetch() via browser context (with WAF cookies)
   - Filters out NOTIFICADO (notified-only) products
   - Enqueues DETAIL for each REGISTERED product
   - Next page if available
3. DETAIL route: Per-product detail API
   - fetch() via browser context (with WAF cookies)
   - Maps JSON to Medicine structure
   - Saves to Dataset via pushData()
The scraper does not rely on CSS selectors. Instead, it intercepts JSON responses from the internal API that ANVISA's own Angular SPA consumes. This makes extraction resilient to visual changes in the portal.
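The decision logic of the DEFAULT route (filter out NOTIFICADO products, enqueue a DETAIL request per remaining product, continue to the next page if allowed) can be sketched as a pure function. Everything about the data shape here is an assumption: the internal API's response format is not documented in this README, so `ListingItem`, `ListingPage`, the `status` and `hasNext` fields, and the function name `handleListingPage` are hypothetical stand-ins.

```typescript
// Hypothetical shape of one listing entry; the real API fields are not documented here.
interface ListingItem {
  id: string;
  status: string; // assumed to be "NOTIFICADO" for notified-only products
}

// Hypothetical shape of one listing page (10 products per page in practice).
interface ListingPage {
  items: ListingItem[];
  hasNext: boolean;
}

interface ListingOutcome {
  detailIds: string[]; // products for which a DETAIL request would be enqueued
  nextPage: number | null; // page number to request next, or null to stop
}

// Decides what to enqueue after one listing page has been fetched.
// maxPages follows the input convention: 0 means unlimited.
function handleListingPage(
  page: ListingPage,
  pageNumber: number,
  maxPages: number,
): ListingOutcome {
  const detailIds = page.items
    .filter((item) => item.status !== "NOTIFICADO")
    .map((item) => item.id);
  const underLimit = maxPages === 0 || pageNumber < maxPages;
  const nextPage = page.hasNext && underLimit ? pageNumber + 1 : null;
  return { detailIds, nextPage };
}
```

In a browser-based crawler, the `ListingPage` object would typically come from running `fetch()` inside the loaded page (for example through Playwright's `page.evaluate`), so that the request carries the session cookies obtained in step 1. Keeping the decision logic separate from the fetching makes it testable without a browser.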
Integration & API
You can easily integrate this Actor into your own data pipelines, backend applications, or BI tools using the Apify API.
Starting the Actor via REST API
Trigger a run by sending a POST request to the Apify API, passing your parameters in the JSON body:
```bash
curl "https://api.apify.com/v2/acts/YOUR_USERNAME~anvisa-raw-material-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"startDate": "01/04/2025", "endDate": "30/04/2025", "maxPages": 1}'
```

Note: Replace YOUR_USERNAME~anvisa-raw-material-scraper with your actual Actor ID and provide your Apify API Token.
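The same request can be issued from TypeScript. The following is a minimal sketch, not an official client: `buildRunRequest` and `startRun` are hypothetical helper names, and the HTTP function is injected as a parameter (typed by the local `HttpSend` alias) so that the sketch does not depend on a particular runtime. It assumes the Apify API wraps the created run in a `data` object containing `id` and `defaultDatasetId`.

```typescript
interface RunRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Minimal subset of a fetch-like function; the global fetch satisfies it.
type HttpSend = (
  url: string,
  init: RunRequest["init"],
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Builds the POST request that starts an Actor run with the given input.
function buildRunRequest(
  actorId: string,
  token: string,
  input: Record<string, unknown>,
): RunRequest {
  const url =
    `https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}/runs` +
    `?token=${encodeURIComponent(token)}`;
  return {
    url,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(input),
    },
  };
}

// Starts a run and returns its ID and default dataset ID.
async function startRun(
  send: HttpSend,
  actorId: string,
  token: string,
  input: Record<string, unknown>,
): Promise<{ id: string; defaultDatasetId: string }> {
  const { url, init } = buildRunRequest(actorId, token, input);
  const res = await send(url, init);
  if (!res.ok) throw new Error(`Apify API returned HTTP ${res.status}`);
  const body = (await res.json()) as { data: { id: string; defaultDatasetId: string } };
  return body.data;
}
```

In Node.js 18 or later, the built-in `fetch` can be passed as `send`, for example `startRun(fetch, "YOUR_USERNAME~anvisa-raw-material-scraper", "YOUR_API_TOKEN", { maxPages: 1 })`. The returned `defaultDatasetId` is the value to use when downloading results.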
Fetching the Results
Once the run finishes, download the extracted data (in JSON, CSV, or Excel format) directly from the run's dataset:
```bash
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?format=json"
```
For more details on integrating Apify Actors via Node.js, Python, or REST, refer to the official Apify API documentation.
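As a small illustration of the format choice mentioned above, the dataset URL can be built programmatically. `buildDatasetItemsUrl` is a hypothetical helper, not part of any Apify package. It assumes the `format` query parameter accepts `json`, `csv`, and `xlsx` (Excel), and it adds an optional `token` parameter on the assumption that a non-public dataset requires authentication.

```typescript
// Export formats named in this README; the API may support others.
type DatasetFormat = "json" | "csv" | "xlsx";

// Builds the download URL for a run's dataset items.
function buildDatasetItemsUrl(
  datasetId: string,
  format: DatasetFormat = "json",
  token?: string,
): string {
  const params = [`format=${format}`];
  if (token !== undefined) {
    // Assumption: private datasets need the API token as a query parameter.
    params.push(`token=${encodeURIComponent(token)}`);
  }
  const id = encodeURIComponent(datasetId);
  return `https://api.apify.com/v2/datasets/${id}/items?${params.join("&")}`;
}
```

For example, `buildDatasetItemsUrl("DATASET_ID", "csv")` yields a URL that can be passed to `curl` or opened in a browser to download the results as CSV.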
Tech stack
- Crawlee: Web scraping framework
- Playwright: Browser automation
- Apify SDK: Actor platform
- TypeScript: Strict typing
Limitations and considerations
- Rate limiting: The crawler runs with a max concurrency of 3 to be respectful to ANVISA's servers
- WAF: In rare cases, the WAF may require manual CAPTCHA solving. The automatic retry (3 attempts) usually handles it
- Proxy: In production on Apify, using a proxy is recommended to avoid IP-based blocks. The Actor attempts to configure a proxy automatically and works without one if no credits are available
- Data volume: Each listing page contains 10 products. For long date ranges, the volume can be large, so use maxPages to limit it during testing.
License
ISC
