URL: https://apify.com/david_craft/anvisa-raw-material-scraper

ANVISA Medicine Scraper · Apify


Pricing

from $1.00 / 1,000 results

ANVISA Medicine Scraper

Extracts complete ANVISA medicine data (presentations, manufacturers, ATC). Uses Playwright to automatically bypass Cloudflare/Dynatrace WAFs.

Developer

David Mendonça

Maintained by Community

ANVISA Medicine Scraper 💊

Extracts complete data on registered medicines from ANVISA (Brazil's National Health Surveillance Agency), including commercial presentations, domestic and international manufacturers, ATC classification, therapeutic class, and registration holder details.

Why this Actor?

ANVISA's consultation portal is protected by a WAF (Cloudflare + Dynatrace), which blocks direct HTTP calls to the API: any request without valid session cookies gets a 403 Forbidden.

This Actor solves the problem using Playwright (a real headless browser) that:

  1. Resolves WAF challenges automatically: Cloudflare and Dynatrace are handled as part of normal browser navigation
  2. Intercepts JSON responses from the internal API: more robust than CSS selector scraping, and it does not break when the UI changes
  3. Returns complete, structured data: the same depth of data available in each medicine's detail panel

Input

Field | Type | Required | Default | Description
startDate | string | No | 7 days ago | Start date of the publication period (DD/MM/YYYY)
endDate | string | No | Today | End date of the publication period (DD/MM/YYYY)
cnpj | string | No | - | Registration holder's CNPJ (digits only). Filters by holder
nomeProduto | string | No | - | Product name text search (partial match supported)
maxPages | integer | No | 0 | Listing page limit (0 = unlimited, each page = 10 products)
maxRequestsPerCrawl | integer | No | 1000 | Safety limit for HTTP requests per run

Input examples

{
  "startDate": "01/04/2025",
  "endDate": "30/04/2025",
  "maxPages": 1
}

Search for a specific registration holder:

{
  "startDate": "01/01/2025",
  "endDate": "30/06/2025",
  "cnpj": "00000000000100"
}

Search by product name:

{
  "nomeProduto": "paracetamol",
  "maxPages": 3
}
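
The defaults listed in the input table can be made explicit before a run is started. The TypeScript sketch below is illustrative only: `ActorInput`, `formatDate`, and `withDefaults` are hypothetical names that are not part of the Actor, and it assumes dates are interpreted as local calendar dates.

```typescript
// Hypothetical input type mirroring the documented input fields.
interface ActorInput {
  startDate?: string; // DD/MM/YYYY
  endDate?: string; // DD/MM/YYYY
  cnpj?: string; // digits only
  nomeProduto?: string;
  maxPages?: number; // 0 = unlimited
  maxRequestsPerCrawl?: number;
}

type ResolvedInput = ActorInput & {
  startDate: string;
  endDate: string;
  maxPages: number;
  maxRequestsPerCrawl: number;
};

// Formats a Date as DD/MM/YYYY using its local calendar fields.
function formatDate(d: Date): string {
  const dd = ("0" + d.getDate()).slice(-2);
  const mm = ("0" + (d.getMonth() + 1)).slice(-2);
  return `${dd}/${mm}/${d.getFullYear()}`;
}

// Applies the documented defaults: a window from 7 days ago to today,
// unlimited pages, and a 1000-request safety limit.
function withDefaults(input: ActorInput, now: Date = new Date()): ResolvedInput {
  const weekAgo = new Date(now.getFullYear(), now.getMonth(), now.getDate() - 7);
  const resolved: ResolvedInput = {
    ...input,
    startDate: input.startDate ?? formatDate(weekAgo),
    endDate: input.endDate ?? formatDate(now),
    maxPages: input.maxPages ?? 0,
    maxRequestsPerCrawl: input.maxRequestsPerCrawl ?? 1000,
  };
  // The table asks for a digits-only CNPJ, so punctuation is stripped here.
  if (input.cnpj !== undefined) {
    resolved.cnpj = input.cnpj.replace(/\D/g, "");
  }
  return resolved;
}
```

Calling `withDefaults({ nomeProduto: "paracetamol" })` would fill in the date window and limits while leaving the search term untouched.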

Output

Each dataset item is a Medicine object with the following structure:

{
  "anvisaRegistrationId": "100000001",
  "tradeName": "EXEMPLOMAX",
  "activeIngredient": "PARACETAMOL",
  "referenceMedicine": "TYLENOL",
  "atcCodes": ["N02BE01"],
  "therapeuticClasses": ["ANALGÉSICOS"],
  "regulatoryCategory": "Genérico",
  "registrationHolder": {
    "legalName": "PHARMA EXEMPLO LTDA.",
    "cnpj": "00000000000100",
    "authorizationNumber": "1000001"
  },
  "approvalDate": "2025-04-28",
  "expiryDate": "2035-04-28",
  "processNumber": "25351000000202500",
  "presentations": [
    {
      "registrationId": "1000000010010",
      "description": "500 MG COM CT BL AL PLAS INC X 20",
      "pharmaceuticalForms": ["COMPRIMIDO SIMPLES"],
      "routesOfAdministration": ["ORAL"],
      "destinations": ["Comercial"],
      "publicationDate": "2025-01-15",
      "validity": "54",
      "manufacturers": [
        {
          "name": "FÁBRICA EXEMPLO S.A.",
          "address": "RUA DAS INDÚSTRIAS, 123 - CIDADE/SP",
          "country": "BRASIL",
          "manufacturingStage": "FABRICAÇÃO DO PRODUTO TERMINADO",
          "uniqueCode": "X000001"
        }
      ]
    }
  ]
}

Output fields

Field | Type | Description
anvisaRegistrationId | string | ANVISA registration number (up to 13 digits)
tradeName | string | Trade name (brand)
activeIngredient | string | Active pharmaceutical ingredient (API)
referenceMedicine | string or null | Reference (innovator) medicine
atcCodes | string[] | ATC codes (WHO classification)
therapeuticClasses | string[] | ANVISA therapeutic classes
regulatoryCategory | string | Regulatory category (Generic, Similar, New, Biological)
registrationHolder | object | Registration holder company (legalName, cnpj, authorizationNumber)
approvalDate | string | Registration/approval date (YYYY-MM-DD)
expiryDate | string | Registration expiry date (YYYY-MM-DD)
processNumber | string | ANVISA administrative process number
presentations | array | Commercial presentations (dosage, packaging, manufacturers)

Each presentation contains:

Field | Type | Description
registrationId | string | Presentation registration code
description | string | Full description (dosage + form + packaging)
pharmaceuticalForms | string[] | Pharmaceutical forms
routesOfAdministration | string[] | Approved routes of administration
destinations | string[] | Commercial destination (Commercial, Hospital, etc.)
publicationDate | string | Official Gazette publication date (YYYY-MM-DD)
validity | string | Shelf life in months
manufacturers | array | Manufacturers (domestic and international, unified)

Each manufacturer contains:

Field | Type | Description
name | string | Legal name
address | string | Manufacturing plant address
country | string | Country (BRASIL for domestic)
manufacturingStage | string | Manufacturing process stage
uniqueCode | string | Unique code in the ANVISA system
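
For consumers of the dataset, the three field tables translate directly into types. The TypeScript sketch below transcribes them and adds one hypothetical helper, `toManufacturerRows`, which flattens a medicine into one row per presentation and manufacturer pair (for example, for a CSV export). The `ManufacturerRow` shape is an assumption made for illustration and is not something the Actor emits.

```typescript
// Types transcribed from the output field tables.
interface Manufacturer {
  name: string;
  address: string;
  country: string; // "BRASIL" for domestic plants
  manufacturingStage: string;
  uniqueCode: string;
}

interface Presentation {
  registrationId: string;
  description: string;
  pharmaceuticalForms: string[];
  routesOfAdministration: string[];
  destinations: string[];
  publicationDate: string; // YYYY-MM-DD
  validity: string; // shelf life in months, delivered as a string
  manufacturers: Manufacturer[];
}

interface Medicine {
  anvisaRegistrationId: string;
  tradeName: string;
  activeIngredient: string;
  referenceMedicine: string | null;
  atcCodes: string[];
  therapeuticClasses: string[];
  regulatoryCategory: string;
  registrationHolder: { legalName: string; cnpj: string; authorizationNumber: string };
  approvalDate: string; // YYYY-MM-DD
  expiryDate: string; // YYYY-MM-DD
  processNumber: string;
  presentations: Presentation[];
}

// Hypothetical flat row, one per (presentation, manufacturer) pair.
interface ManufacturerRow {
  anvisaRegistrationId: string;
  tradeName: string;
  presentationId: string;
  shelfLifeMonths: number;
  manufacturer: string;
  country: string;
}

// Flattens the nested presentations/manufacturers arrays into rows.
function toManufacturerRows(medicine: Medicine): ManufacturerRow[] {
  const rows: ManufacturerRow[] = [];
  for (const presentation of medicine.presentations) {
    for (const maker of presentation.manufacturers) {
      rows.push({
        anvisaRegistrationId: medicine.anvisaRegistrationId,
        tradeName: medicine.tradeName,
        presentationId: presentation.registrationId,
        shelfLifeMonths: Number(presentation.validity),
        manufacturer: maker.name,
        country: maker.country,
      });
    }
  }
  return rows;
}
```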

How it works

┌──────────────────────────────────────────────────────┐
│ 1. Playwright navigates to the ANVISA SPA            │
│    → WAF (Cloudflare/Dynatrace) resolved             │
│    → Session cookies captured by the browser         │
├──────────────────────────────────────────────────────┤
│ 2. DEFAULT route: Paginated listing API              │
│    → fetch() via browser context (with WAF cookies)  │
│    → Filters out NOTIFICADO (notified-only) products │
│    → Enqueues DETAIL for each REGISTERED product     │
│    → Next page if available                          │
├──────────────────────────────────────────────────────┤
│ 3. DETAIL route: Per-product detail API              │
│    → fetch() via browser context (with WAF cookies)  │
│    → Maps JSON to Medicine structure                 │
│    → Saves to Dataset via pushData()                 │
└──────────────────────────────────────────────────────┘

The scraper does not rely on CSS selectors; it intercepts JSON responses from the internal API that ANVISA's own Angular SPA consumes. This makes extraction resilient to visual changes in the portal.
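
The decision made on each listing page (which products to enqueue for the DETAIL route, and whether to request another page) can be sketched as a pure function. This is a simplified illustration and not the Actor's source: `ListingItem`, `ListingPage`, their field names, and `planListingPage` are all hypothetical, since the shape of ANVISA's internal API is not documented here. Only the `NOTIFICADO` filter and the `maxPages` semantics (0 = unlimited) come from the description above.

```typescript
// Hypothetical shape of one product in a listing response.
interface ListingItem {
  id: string;
  situation: string; // assumed to be "NOTIFICADO" for notified-only products
}

// Hypothetical shape of one listing page (pageNumber is 1-based).
interface ListingPage {
  items: ListingItem[];
  pageNumber: number;
  totalPages: number;
}

interface PagePlan {
  detailIds: string[]; // products to enqueue for the DETAIL route
  nextPage: number | null; // null when pagination should stop
}

// Decides what the DEFAULT route does with one listing page.
function planListingPage(page: ListingPage, maxPages: number): PagePlan {
  // Notified-only products are skipped; everything else gets a detail request.
  const detailIds = page.items
    .filter((item) => item.situation !== "NOTIFICADO")
    .map((item) => item.id);

  const hasMore = page.pageNumber < page.totalPages;
  // maxPages = 0 means no limit, as in the input table.
  const underLimit = maxPages === 0 || page.pageNumber < maxPages;

  return { detailIds, nextPage: hasMore && underLimit ? page.pageNumber + 1 : null };
}
```

Keeping this logic free of browser calls makes it easy to test separately from the Playwright session that supplies the WAF cookies.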

🔌 Integration & API

You can easily integrate this Actor into your own data pipelines, backend applications, or BI tools using the Apify API.

Starting the Actor via REST API

Trigger a run by sending a POST request to the Apify API, passing your parameters in the JSON body:

curl "https://api.apify.com/v2/acts/YOUR_USERNAME~anvisa-raw-material-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "startDate": "01/04/2025",
    "endDate": "30/04/2025",
    "maxPages": 1
  }'

Note: Replace YOUR_USERNAME~anvisa-raw-material-scraper with your actual Actor ID and provide your Apify API Token.
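
The same request can be issued from TypeScript. The sketch below assumes Node.js 18+ (for the global `fetch`) and that the run endpoint wraps its result in a `data` object containing `id` and `defaultDatasetId`; check the Apify API reference before relying on that shape. `buildRunUrl` and `startRun` are hypothetical helper names.

```typescript
const APIFY_API_BASE = "https://api.apify.com/v2";

// Builds the run-start URL used in the curl example.
function buildRunUrl(actorId: string, token: string): string {
  return `${APIFY_API_BASE}/acts/${encodeURIComponent(actorId)}/runs?token=${encodeURIComponent(token)}`;
}

// Starts a run and returns its id plus the id of its default dataset.
// The response shape is an assumption; see the lead-in.
async function startRun(
  actorId: string,
  token: string,
  input: Record<string, unknown>,
): Promise<{ id: string; defaultDatasetId: string }> {
  const response = await fetch(buildRunUrl(actorId, token), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(input),
  });
  if (!response.ok) {
    throw new Error(`Apify API returned HTTP ${response.status}`);
  }
  const body = (await response.json()) as { data: { id: string; defaultDatasetId: string } };
  return body.data;
}
```

A call such as `startRun("YOUR_USERNAME~anvisa-raw-material-scraper", "YOUR_API_TOKEN", { startDate: "01/04/2025", endDate: "30/04/2025", maxPages: 1 })` mirrors the curl command.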

Fetching the Results

Once the run finishes, download the extracted data (in JSON, CSV, or Excel format) directly from the run's dataset:

curl "https://api.apify.com/v2/datasets/DATASET_ID/items?format=json"
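
A TypeScript equivalent of this download step might look like the following sketch. It assumes Node.js 18+ for the global `fetch`; the helper names `buildItemsUrl` and `fetchItems` are hypothetical, and the optional `token` parameter is an addition for datasets that are not readable without authentication. The `xlsx` value is assumed to be the format string for the Excel export mentioned above.

```typescript
type DatasetFormat = "json" | "csv" | "xlsx";

// Builds the dataset items URL used in the curl example.
function buildItemsUrl(datasetId: string, format: DatasetFormat = "json", token?: string): string {
  let url = `https://api.apify.com/v2/datasets/${encodeURIComponent(datasetId)}/items?format=${format}`;
  if (token) {
    url += `&token=${encodeURIComponent(token)}`;
  }
  return url;
}

// Downloads all items of a dataset as parsed JSON.
async function fetchItems<T>(datasetId: string, token?: string): Promise<T[]> {
  const response = await fetch(buildItemsUrl(datasetId, "json", token));
  if (!response.ok) {
    throw new Error(`Apify API returned HTTP ${response.status}`);
  }
  return (await response.json()) as T[];
}
```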

For more details on integrating Apify Actors via Node.js, Python, or REST, refer to the official Apify API documentation.

Tech stack

  • Crawlee: Web scraping framework
  • Playwright: Browser automation
  • Apify SDK: Actor platform
  • TypeScript: Strict typing

Limitations and considerations

  • Rate limiting: The crawler runs with a max concurrency of 3 to be respectful to ANVISA's servers.
  • WAF: In rare cases, the WAF may require manual CAPTCHA solving. The automatic retry (3 attempts) usually handles it.
  • Proxy: In production on Apify, using a proxy is recommended to avoid IP-based blocks. The Actor attempts to configure a proxy automatically and falls back to running without one if no proxy credits are available.
  • Data volume: Each listing page contains 10 products. For long date ranges, the volume can be large β€” use maxPages to limit during testing.
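
As a rough guide to how maxPages interacts with maxRequestsPerCrawl, the arithmetic can be sketched as follows. This is a back-of-the-envelope estimate under stated assumptions, not the Actor's own accounting: every listed product is assumed to be registered and to cost exactly one detail request, and retries and the initial page navigation are ignored. The function names are hypothetical.

```typescript
// Each listing page contains 10 products (from the text above).
const PRODUCTS_PER_PAGE = 10;

// One listing request plus one detail request per product on the page.
function estimateRequests(pages: number): number {
  return pages * (1 + PRODUCTS_PER_PAGE);
}

// Largest number of full listing pages that fits in a request budget.
function maxPagesWithinBudget(maxRequestsPerCrawl: number): number {
  return Math.floor(maxRequestsPerCrawl / (1 + PRODUCTS_PER_PAGE));
}
```

Under these assumptions, the default limit of 1000 requests covers about 90 listing pages, or roughly 900 products; longer date ranges would need a higher maxRequestsPerCrawl.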

License

ISC
