You write a scraper with Playwright, wait for the page to load, close the cookie banner, click a filter, and parse a table out of the DOM. Then someone redesigns the page and your selector breaks. The annoying part is that the data probably never lived in the HTML in the first place.
Most modern websites render a UI around structured background requests. The browser loads the shell, runs JavaScript, and calls internal endpoints for prices, availability, inventory, search results, profile data, or whatever the page needs. If you scrape the rendered page, you often process hundreds of kilobytes of layout and tracking code to recover a few kilobytes of JSON.
Look at the network layer before writing browser code
Before reaching for Playwright or Puppeteer, open DevTools and check what the site actually does.
In Chrome:
- Open DevTools
- Go to the Network tab
- Filter by Fetch/XHR
- Perform the action manually, such as search, filter, paginate, or change dates
- Click the request that returns the data
- Inspect the request URL, method, headers, payload, and response
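If you export the captured traffic from the Network tab as a HAR file, a short script can narrow the list down for you. The sketch below assumes the standard HAR layout (`log.entries[]` with `request` and `response` objects); the function name `findJsonRequests` and the sample data are made up for illustration.

```javascript
// Sketch: list the requests in a HAR export whose responses are JSON.
// Assumes the standard HAR structure: har.log.entries[].request / .response.
function findJsonRequests(har) {
  const entries = har?.log?.entries ?? [];
  return entries
    .filter(entry => {
      const mimeType = entry.response?.content?.mimeType ?? '';
      return mimeType.includes('json');
    })
    .map(entry => ({
      method: entry.request.method,
      url: entry.request.url,
      status: entry.response.status,
      // Response content size in bytes, as recorded in the HAR
      size: entry.response.content.size
    }));
}

// Hypothetical, hand-written HAR fragment (not captured from a real site)
const sampleHar = {
  log: {
    entries: [
      {
        request: { method: 'GET', url: 'https://example.com/static/app.js' },
        response: { status: 200, content: { mimeType: 'text/javascript', size: 412000 } }
      },
      {
        request: { method: 'POST', url: 'https://example.com/api/search/hotels' },
        response: { status: 200, content: { mimeType: 'application/json', size: 102 } }
      }
    ]
  }
};

console.log(findJsonRequests(sampleHar));
```

Filtering on the response MIME type is a heuristic: some sites return JSON with a `text/plain` content type, so treat the output as a shortlist rather than a complete answer.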
You will often find something like this:
POST /api/search/hotels HTTP/2
content-type: application/json
x-csrf-token: 8f9c...
{"checkIn":"2026-03-12","checkOut":"2026-03-15","city":"Berlin","guests":2}
And the response is already the thing you wanted:
{"results":[{"id":"hotel_123","name":"Example Hotel","price":184,"currency":"EUR","available":true}]}
At that point, scraping the DOM is extra work. You can reproduce the request directly:
curl 'https://example.com/api/search/hotels' \
  -X POST \
  -H 'content-type: application/json' \
  -H 'x-csrf-token: 8f9c...' \
  --data '{"checkIn":"2026-03-12","checkOut":"2026-03-15","city":"Berlin","guests":2}'
Or from code:
// Replay the request discovered in DevTools
const res = await fetch('https://example.com/api/search/hotels', {
  method: 'POST',
  headers: {
    'content-type': 'application/json',
    'x-csrf-token': process.env.CSRF_TOKEN
  },
  body: JSON.stringify({
    checkIn: '2026-03-12',
    checkOut: '2026-03-15',
    city: 'Berlin',
    guests: 2
  })
});

// Surface the status code and response body when the request fails
if (!res.ok) {
  throw new Error(`Search failed: ${res.status} ${await res.text()}`);
}

const data = await res.json();
console.log(data.results.map(h => [h.name, h.price]));
This is the basic pattern: use the browser to discover the request, not to run every request forever.
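Once the request is known, it helps to wrap it in a small builder so that dates, cities, and guest counts become parameters instead of copy-pasted strings. This is a minimal sketch that reuses the example endpoint and payload fields shown above; `buildSearchRequest` is a hypothetical helper name, and the default of two guests is an assumption.

```javascript
// Hypothetical helper: builds the fetch arguments for the example search endpoint.
// The URL, header names, and body fields mirror the request captured in DevTools.
function buildSearchRequest({ city, checkIn, checkOut, guests = 2 }, csrfToken) {
  return {
    url: 'https://example.com/api/search/hotels',
    init: {
      method: 'POST',
      headers: {
        'content-type': 'application/json',
        'x-csrf-token': csrfToken
      },
      body: JSON.stringify({ checkIn, checkOut, city, guests })
    }
  };
}

const request = buildSearchRequest(
  { city: 'Berlin', checkIn: '2026-03-12', checkOut: '2026-03-15' },
  process.env.CSRF_TOKEN
);

// Pass the result straight to fetch: fetch(request.url, request.init)
console.log(request.url, request.init.body);
```

Keeping the builder free of network calls makes it easy to unit-test and to reuse for pagination or date sweeps.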
Wire operates at this network layer: it maps the background requests a site already uses and exposes stable endpoints around those flows.
Why this matters in production
Browser automation is useful, but it is expensive for data retrieval.
A headless browser has to start a browser process, load HTML, download scripts, execute JavaScript, build a DOM, run layout work, and wait for the app to settle. If you only need the JSON response behind a search form, most of that work is overhead.
The difference shows up quickly:
// Browser-level approach: render the page, then read text out of the DOM
const page = await browser.newPage();
await page.goto('https://example.com/hotels?q=berlin');
await page.waitForSelector('[data-testid="hotel-card"]');
const cardTexts = await page.$$eval('[data-testid="hotel-card"]', cards =>
  cards.map(card => card.textContent)
);

// Network-level approach: call the endpoint directly
// (`options` is the method, headers, and body from the earlier fetch example)
const res = await fetch('https://example.com/api/search/hotels', options);
const { results } = await res.json();
const prices = results.map(h => h.price);
The first version breaks if the frontend team renames data-testid, changes the card layout, lazy-loads the list differently, or inserts an experiment variant. The second version breaks if the internal API changes, auth changes, or the site adds request signing. Neither is magic. The network version just avoids tying your pipeline to visual presentation.
It also gives less junk to downstream systems. If an LLM needs to answer, "Which hotels are available under 200 EUR?", passing it rendered HTML or markdown forces it to ignore navigation, legal text, ads, and duplicated labels. Passing structured JSON lets you validate, filter, and summarize before the model sees anything.
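As a rough illustration of filtering before the model sees anything, the question about hotels under 200 EUR reduces to a few lines once the data is structured. The field names follow the example response earlier in this post; the function name `availableUnder` and the second hotel in the sample are invented for the example.

```javascript
// Sketch: keep only available hotels priced below a limit in a given currency.
// Field names (name, price, currency, available) follow the example response.
function availableUnder(results, maxPrice, currency) {
  return results
    .filter(h => h.available === true && h.currency === currency && h.price < maxPrice)
    .map(h => ({ name: h.name, price: h.price, currency: h.currency }));
}

// Sample data: the first record is from the example response, the second is hypothetical
const sampleResults = [
  { id: 'hotel_123', name: 'Example Hotel', price: 184, currency: 'EUR', available: true },
  { id: 'hotel_456', name: 'Hypothetical Suites', price: 240, currency: 'EUR', available: true }
];

console.log(availableUnder(sampleResults, 200, 'EUR'));
```

The model then receives a short list of matching records instead of a full page, which also makes the answer easier to verify.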
The edge cases are real
Internal endpoints are not public contracts. Treat them as dependencies that can change without notice.
Common failure modes include:
- 401 or 403 because the request needs a logged-in session
- 400 because the frontend generated a hidden token you did not include
- 429 because you ignored rate limits
- Empty results because a required header, locale, currency, or experiment flag is missing
- A successful 200 response with a changed JSON shape
- Requests signed with timestamps or hashes generated in bundled JavaScript
A small amount of defensive code helps:
function parseHotelSearch(payload) {
  if (!payload || !Array.isArray(payload.results)) {
    throw new Error('Unexpected hotel search response shape');
  }
  return payload.results.map(item => {
    if (typeof item.name !== 'string' || typeof item.price !== 'number') {
      throw new Error(`Invalid hotel item: ${JSON.stringify(item)}`);
    }
    return {
      id: item.id,
      name: item.name,
      price: item.price,
      currency: item.currency ?? 'UNKNOWN',
      available: item.available === true
    };
  });
}
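The parser covers the changed-shape case. The status-code failures from the list above can be triaged in a similar way before parsing. The sketch below is one possible mapping, not a standard: the function name and the category labels are hypothetical, and the right reaction to each (re-authenticate, back off, alert) depends on the site.

```javascript
// Sketch: map a response to one of the failure modes listed above.
// Category labels are made up for illustration.
function classifyResponse(status, payload) {
  if (status === 401 || status === 403) return 'auth';       // session or permission problem
  if (status === 400) return 'bad-request';                   // e.g. a missing hidden token
  if (status === 429) return 'rate-limited';                  // slow down before retrying
  if (status < 200 || status >= 300) return 'http-error';     // anything else that is not 2xx
  if (!payload || !Array.isArray(payload.results)) return 'shape-changed';
  if (payload.results.length === 0) return 'empty';           // possibly a missing header or flag
  return 'ok';
}

console.log(classifyResponse(429, null));
console.log(classifyResponse(200, { items: [] }));
console.log(classifyResponse(200, { results: [{ id: 'hotel_123' }] }));
```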
You still need monitoring. Log status codes, response sizes, parse failures, and schema changes. If the payload usually contains results and suddenly returns items, you want an alert before a customer notices missing data.
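A cheap way to catch that kind of rename is to compare the top-level keys of each payload against the keys you expect. This is a minimal sketch; `detectSchemaDrift` is a hypothetical helper, and a real pipeline would send the result to whatever alerting it already uses rather than to the console.

```javascript
// Sketch: report top-level keys that disappeared or newly appeared in a payload.
function detectSchemaDrift(payload, expectedKeys) {
  const actualKeys = payload && typeof payload === 'object' ? Object.keys(payload) : [];
  return {
    missing: expectedKeys.filter(key => !actualKeys.includes(key)),
    unexpected: actualKeys.filter(key => !expectedKeys.includes(key))
  };
}

// The payload used to contain `results` and now returns `items`
const drift = detectSchemaDrift({ items: [] }, ['results']);
if (drift.missing.length > 0 || drift.unexpected.length > 0) {
  console.warn('Schema drift detected:', drift);
}
```

This only checks one level deep. Nested changes still surface as parse failures, which is why both signals are worth logging.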
For teams that do not want to own endpoint discovery, request signing, and breakage handling for many third-party sites, Wire is a managed version of this same approach rather than a DOM scraping wrapper.
When a browser is still the right tool
Use browser automation when the browser behavior is the thing you need to test or reproduce.
Good cases for Playwright, Puppeteer, or Selenium:
- End-to-end testing user flows
- Capturing screenshots or PDFs
- Interacting with canvas-heavy or browser-only apps
- Debugging frontend behavior
- Handling flows where the data is only available after complex client-side state changes
- Verifying that the UI actually displays what the API returned
Bad cases:
- Polling prices every five minutes
- Pulling paginated search results
- Checking inventory across many SKUs
- Feeding structured records into a data pipeline
- Giving an agent live availability data
For those, inspect the network requests first. If the data is already JSON, call that layer directly, validate the response shape, and keep the browser out of the hot path unless you actually need it.
