
You write a scraper with Playwright, wait for the page to load, close the cookie banner, click a filter, and parse a table out of the DOM. Then someone redesigns the page and your selector breaks. The annoying part is that the data probably never lived in the HTML in the first place.

Most modern websites render a UI around structured background requests. The browser loads the shell, runs JavaScript, and calls internal endpoints for prices, availability, inventory, search results, profile data, or whatever the page needs. If you scrape the rendered page, you often process hundreds of kilobytes of layout and tracking code to recover a few kilobytes of JSON.

Look at the network layer before writing browser code

Before reaching for Playwright or Puppeteer, open DevTools and check what the site actually does.

In Chrome:

Open DevTools
Go to the Network tab
Filter by Fetch/XHR
Perform the action manually, such as search, filter, paginate, or change dates
Click the request that returns the data
Inspect the request URL, method, headers, payload, and response

You will often find something like this:

POST /api/search/hotels HTTP/2
content-type: application/json
x-csrf-token: 8f9c...

{"checkIn":"2026-03-12","checkOut":"2026-03-15","city":"Berlin","guests":2}

And the response is already the thing you wanted:

{"results":[{"id":"hotel_123","name":"Example Hotel","price":184,"currency":"EUR","available":true}]}

At that point, scraping the DOM is extra work. You can reproduce the request directly:

curl 'https://example.com/api/search/hotels' \
 -X POST \
 -H 'content-type: application/json' \
 -H 'x-csrf-token: 8f9c...' \
 --data '{"checkIn":"2026-03-12","checkOut":"2026-03-15","city":"Berlin","guests":2}'

Or from code:

// Replay the request that DevTools showed, with the same headers and body
const res = await fetch('https://example.com/api/search/hotels', {
  method: 'POST',
  headers: {
    'content-type': 'application/json',
    'x-csrf-token': process.env.CSRF_TOKEN
  },
  body: JSON.stringify({
    checkIn: '2026-03-12',
    checkOut: '2026-03-15',
    city: 'Berlin',
    guests: 2
  })
});

// Fail loudly on non-2xx responses and keep the body for debugging
if (!res.ok) {
  throw new Error(`Search failed: ${res.status} ${await res.text()}`);
}

const data = await res.json();
console.log(data.results.map(h => [h.name, h.price]));

This is the basic pattern: use the browser to discover the request, not to run every request forever.
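
One way to make the discovery step repeatable is to export the captured request from the Network tab as a HAR file and convert the entry into `fetch` arguments. The sketch below assumes the standard HAR 1.2 request layout (`method`, `url`, `headers`, `postData.text`). The helper name `harEntryToFetchArgs`, the list of skipped headers, and the sample entry are all illustrative rather than taken from any real site.

```javascript
// Headers the HTTP client sets on its own; replaying them verbatim tends to cause trouble.
const SKIPPED_HEADERS = new Set(['host', 'content-length', 'connection', 'accept-encoding']);

/**
 * Convert one HAR entry into [url, options] for fetch().
 * Assumes the HAR 1.2 layout: entry.request.{method, url, headers[], postData.text}.
 */
function harEntryToFetchArgs(entry) {
  const { method, url, headers = [], postData } = entry.request;

  const cleanHeaders = {};
  for (const { name, value } of headers) {
    const key = name.toLowerCase();
    // Drop HTTP/2 pseudo-headers (":authority", ":path", ...) and client-managed headers.
    if (key.startsWith(':') || SKIPPED_HEADERS.has(key)) continue;
    cleanHeaders[key] = value;
  }

  const options = { method, headers: cleanHeaders };
  if (postData && typeof postData.text === 'string') {
    options.body = postData.text;
  }
  return [url, options];
}

// Hypothetical entry, shaped like the hotel search request shown earlier.
const sampleEntry = {
  request: {
    method: 'POST',
    url: 'https://example.com/api/search/hotels',
    headers: [
      { name: ':authority', value: 'example.com' },
      { name: 'content-type', value: 'application/json' },
      { name: 'content-length', value: '28' },
      { name: 'x-csrf-token', value: '8f9c...' }
    ],
    postData: { mimeType: 'application/json', text: '{"city":"Berlin","guests":2}' }
  }
};

const [sampleUrl, sampleOptions] = harEntryToFetchArgs(sampleEntry);
```

The resulting pair can be passed straight to `fetch(sampleUrl, sampleOptions)`. Short-lived values such as CSRF tokens or cookies captured in the HAR will usually expire, so expect to refresh them rather than hard-code them.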

Wire works in this network-layer space by mapping the background requests a site already uses and exposing stable endpoints around those flows.

Why this matters in production

Browser automation is useful, but it is expensive for data retrieval.

A headless browser has to start a browser process, load HTML, download scripts, execute JavaScript, build a DOM, run layout work, and wait for the app to settle. If you only need the JSON response behind a search form, most of that work is overhead.

The difference shows up quickly:

// Browser-level approach: render the page, then read text out of the DOM
const page = await browser.newPage();
await page.goto('https://example.com/hotels?q=berlin');
await page.waitForSelector('[data-testid="hotel-card"]');
const cardTexts = await page.$$eval('[data-testid="hotel-card"]', cards =>
  cards.map(card => card.textContent)
);

// Network-level approach: call the endpoint the page itself calls
// (options holds the method, headers, and body from the earlier fetch example)
const res = await fetch('https://example.com/api/search/hotels', options);
const { results } = await res.json();
const prices = results.map(h => h.price);

The first version breaks if the frontend team renames data-testid, changes the card layout, lazy-loads the list differently, or inserts an experiment variant. The second version breaks if the internal API changes, auth changes, or the site adds request signing. Neither is magic. The network version just avoids tying your pipeline to visual presentation.

It also sends less junk to downstream systems. If an LLM needs to answer, "Which hotels are available under 200 EUR?", passing it rendered HTML or markdown forces it to ignore navigation, legal text, ads, and duplicated labels. Passing structured JSON lets you validate, filter, and summarize before the model sees anything.
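
As a rough illustration of filtering before the model sees anything, the sketch below reduces a results array to a few short lines. The function name `summarizeAvailable` and the sample records are made up for this example, and the field names simply follow the response shown earlier.

```javascript
/**
 * Keep only available hotels strictly under maxPrice in the given currency,
 * cheapest first, and render each as a short line of text.
 */
function summarizeAvailable(results, maxPrice, currency) {
  return results
    .filter(h => h.available === true && h.currency === currency && h.price < maxPrice)
    .sort((a, b) => a.price - b.price)
    .map(h => `${h.name}: ${h.price} ${h.currency}`);
}

// Hypothetical records in the same shape as the example response.
const sampleResults = [
  { id: 'hotel_123', name: 'Example Hotel', price: 184, currency: 'EUR', available: true },
  { id: 'hotel_456', name: 'Sold Out Inn', price: 120, currency: 'EUR', available: false },
  { id: 'hotel_789', name: 'Pricey Palace', price: 320, currency: 'EUR', available: true }
];

// Only this short list would be handed to the model, not the page.
const llmContext = summarizeAvailable(sampleResults, 200, 'EUR');
```

The model then receives one line per matching hotel instead of a full page of markup.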

The edge cases are real

Internal endpoints are not public contracts. Treat them as dependencies that can change without notice.

Common failure modes include:

401 or 403 because the request needs a logged-in session
400 because the frontend generated a hidden token you did not include
429 because you ignored rate limits
Empty results because a required header, locale, currency, or experiment flag is missing
A successful 200 response with a changed JSON shape
Requests signed with timestamps or hashes generated in bundled JavaScript

A small amount of defensive code helps:

function parseHotelSearch(payload) {
  if (!payload || !Array.isArray(payload.results)) {
    throw new Error('Unexpected hotel search response shape');
  }

  return payload.results.map(item => {
    if (typeof item.name !== 'string' || typeof item.price !== 'number') {
      throw new Error(`Invalid hotel item: ${JSON.stringify(item)}`);
    }

    return {
      id: item.id,
      name: item.name,
      price: item.price,
      currency: item.currency ?? 'UNKNOWN',
      available: item.available === true
    };
  });
}
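
The failure modes listed above can also be turned into a small triage step that runs before parsing. This is only a sketch: the function name `diagnoseResponse`, the `kind` labels, and the hint wording are invented here, and the mapping from status code to cause is a heuristic, not something a site guarantees.

```javascript
/**
 * Map an HTTP status and parsed payload to a coarse failure category.
 * The categories mirror the common failure modes; they are guesses, not diagnoses.
 */
function diagnoseResponse(status, payload) {
  if (status === 401 || status === 403) {
    return { kind: 'auth', hint: 'request probably needs a logged-in session' };
  }
  if (status === 400) {
    return { kind: 'bad-request', hint: 'a hidden token or required field may be missing' };
  }
  if (status === 429) {
    return { kind: 'rate-limited', hint: 'slow down and retry later' };
  }
  if (status >= 200 && status < 300) {
    if (!payload || !Array.isArray(payload.results)) {
      return { kind: 'shape-changed', hint: 'response no longer has a results array' };
    }
    if (payload.results.length === 0) {
      return { kind: 'empty', hint: 'check headers, locale, currency, and experiment flags' };
    }
    return { kind: 'ok', hint: '' };
  }
  return { kind: 'unexpected-status', hint: `unhandled status ${status}` };
}
```

Logging the `kind` alongside the raw status makes it easier to tell an expired session apart from a silent schema change.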

You still need monitoring. Log status codes, response sizes, parse failures, and schema changes. If the payload usually contains results and suddenly returns items, you want an alert before a customer notices missing data.
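
A minimal sketch of that kind of check compares the top-level keys of a payload against the keys you expect. The helper name `detectSchemaDrift` is hypothetical, and a real pipeline would likely check nested fields and types as well.

```javascript
/**
 * Compare a payload's top-level keys with the expected ones.
 * Returns the keys that disappeared and the keys that showed up unannounced.
 */
function detectSchemaDrift(payload, expectedKeys) {
  const actualKeys = payload && typeof payload === 'object' ? Object.keys(payload) : [];
  return {
    missing: expectedKeys.filter(key => !actualKeys.includes(key)),
    unexpected: actualKeys.filter(key => !expectedKeys.includes(key))
  };
}

// The results-to-items rename described above would surface like this.
const drift = detectSchemaDrift({ items: [] }, ['results']);
if (drift.missing.length > 0) {
  console.warn(`Schema drift: missing ${drift.missing.join(', ')}; new ${drift.unexpected.join(', ')}`);
}
```

Wiring the warning into whatever alerting you already use is left out here, since that part depends on your stack.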

For teams that do not want to own endpoint discovery, request signing, and breakage handling for many third-party sites, Wire is a managed version of this same approach rather than a DOM scraping wrapper.

When a browser is still the right tool

Use browser automation when the browser behavior is the thing you need to test or reproduce.

Good cases for Playwright, Puppeteer, or Selenium:

End-to-end testing user flows
Capturing screenshots or PDFs
Interacting with canvas-heavy or browser-only apps
Debugging frontend behavior
Handling flows where the data is only available after complex client-side state changes
Verifying that the UI actually displays what the API returned

Bad cases:

Polling prices every five minutes
Pulling paginated search results
Checking inventory across many SKUs
Feeding structured records into a data pipeline
Giving an agent live availability data

For those, inspect the network requests first. If the data is already JSON, call that layer directly, validate the response shape, and keep the browser out of the hot path unless you actually need it.
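
One possible way to keep the browser out of the hot path is to try the network call first and only fall back to a browser when that path fails. The sketch below is an assumption about how you might structure it, not a prescribed design: `getHotels` and its three injected functions (`fetchViaNetwork`, `fetchViaBrowser`, `validate`) are hypothetical names, and the fallback is assumed to return the same payload shape.

```javascript
/**
 * Try the cheap network path first; use the browser path only if it fails.
 * All three dependencies are injected so each can be swapped or tested alone.
 */
async function getHotels({ fetchViaNetwork, fetchViaBrowser, validate }) {
  try {
    const payload = await fetchViaNetwork();
    // validate() should throw on an unexpected shape, which also triggers the fallback.
    return { source: 'network', hotels: validate(payload) };
  } catch (err) {
    console.warn(`Network path failed, falling back to browser: ${err.message}`);
    const payload = await fetchViaBrowser();
    return { source: 'browser', hotels: validate(payload) };
  }
}
```

Recording which `source` served each request gives you an early signal: a rising share of browser fallbacks usually means the internal endpoint has changed.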

URL: https://dev.to/anakin_writers/stop-scraping-the-page-when-the-data-is-already-in-the-network-tab-314m

Stop scraping the page when the data is already in the network tab - DEV Community
