dotnet add package Soenneker.Playwrights.Crawler --version 4.0.101
NuGet\Install-Package Soenneker.Playwrights.Crawler -Version 4.0.101
<PackageReference Include="Soenneker.Playwrights.Crawler" Version="4.0.101" />
Directory.Packages.props:
<PackageVersion Include="Soenneker.Playwrights.Crawler" Version="4.0.101" />
Project file:
<PackageReference Include="Soenneker.Playwrights.Crawler" />
paket add Soenneker.Playwrights.Crawler --version 4.0.101
#r "nuget: Soenneker.Playwrights.Crawler, 4.0.101"
#:package Soenneker.Playwrights.Crawler@4.0.101
Install as a Cake Addin:
#addin nuget:?package=Soenneker.Playwrights.Crawler&version=4.0.101
Install as a Cake Tool:
#tool nuget:?package=Soenneker.Playwrights.Crawler&version=4.0.101
A configurable Playwright crawler for mirroring sites to disk.
dotnet add package Soenneker.Playwrights.Crawler
using Microsoft.Extensions.DependencyInjection;
using Soenneker.Playwrights.Crawler.Registrars;
var services = new ServiceCollection();
services.AddLogging();
services.AddPlaywrightCrawlerAsSingleton();
Use AddPlaywrightCrawlerAsScoped() if you prefer a scoped lifetime.
using Soenneker.Playwrights.Crawler.Abstract;
using Soenneker.Playwrights.Crawler.Dtos;
using Soenneker.Playwrights.Crawler.Enums;
// Assumes serviceProvider was built from the ServiceCollection above, e.g. services.BuildServiceProvider().
IPlaywrightCrawler crawler = serviceProvider.GetRequiredService<IPlaywrightCrawler>();
PlaywrightCrawlResult result = await crawler.Crawl(new PlaywrightCrawlOptions
{
    Url = "https://example.com",
    SaveDirectory = @"C:\temp\example",
    Mode = PlaywrightCrawlMode.Full,
    MaxDepth = 2,
    ClearSaveDirectory = true,
    SameHostOnly = true
});
using Soenneker.Playwrights.Crawler.Abstract;
using Soenneker.Playwrights.Crawler.Dtos;
using Soenneker.Playwrights.Crawler.Enums;
using Soenneker.Playwrights.Extensions.Stealth.Options;
using Microsoft.Playwright; // for WaitUntilState (assumed to be the Microsoft.Playwright enum)
PlaywrightCrawlResult result = await crawler.Crawl(new PlaywrightCrawlOptions
{
    Url = "https://example.com",
    SaveDirectory = @"C:\temp\example",
    Mode = PlaywrightCrawlMode.Full,
    MaxDepth = 2,
    MaxPages = 50,
    MaxStorageBytes = 250_000_000,
    MaxDuration = TimeSpan.FromMinutes(10),
    SameHostOnly = true,
    IgnoreQueryStringsInDuplicateDetection = true,
    FormatHtml = true,
    IncludeCrossOriginAssets = true,
    RewriteCrossOriginAssetUrls = true,
    ClearSaveDirectory = true,
    OverwriteExistingFiles = true,
    Headless = true,
    UseStealth = true,
    ThrottleMode = PlaywrightCrawlThrottleMode.Automatic,
    NavigationTimeoutMs = 45_000,
    WaitUntil = WaitUntilState.NetworkIdle,
    PostNavigationDelayMs = 0,
    ContinueOnPageError = true,
    StealthLaunchOptions = new StealthLaunchOptions
    {
        IgnoreDetectableDefaultArguments = true
    },
    StealthContextOptions = new StealthContextOptions
    {
        NormalizeDocumentHeaders = true,
        EnableCdpDomainHardening = false
    },
    Policy = new PlaywrightCrawlPolicy
    {
        GlobalMaxConcurrency = 20,
        PerDomainMaxConcurrency = 2,
        PerIpMaxConcurrency = 2,
        MinimumDelayBetweenRequestsMs = 750,
        DelayJitterMaxMs = 500,
        RequestTimeoutMs = 30_000,
        MaxRetries = 4
    }
});
HtmlOnly saves only rendered HTML documents discovered during the crawl.

Full saves rendered HTML documents plus resources observed during page loads, including:
- assets referenced through data-src and data-css-url
- cross-origin assets under _external when IncludeCrossOriginAssets = true
- HTML with cross-origin asset URLs rewritten to the local _external copies when RewriteCrossOriginAssetUrls = true
- HTML and CSS with same-origin absolute URLs rewritten to root-relative paths when RewriteSameOriginAbsoluteUrls = true

| Option | Description |
|---|---|
| Url | Required absolute http or https root URL. |
| SaveDirectory | Required output directory for mirrored content. |
| MaxDepth | Link depth to follow from the root page. 0 crawls only the starting page. |
| MaxPages | Optional hard cap on visited pages. |
| MaxStorageBytes | Optional hard cap on bytes written to disk. |
| MaxDuration | Optional maximum crawl duration. |
| SameHostOnly | Restricts queued pages to the same host as the root URL. |
| IgnoreQueryStringsInDuplicateDetection | Treats query-string variants as the same page when detecting duplicates. |
| FormatHtml | Formats saved HTML documents with Soenneker.Html.Formatter when true. Defaults to false. |
| IncludeCrossOriginAssets | In Full mode, saves cross-origin resources under _external. |
| RewriteCrossOriginAssetUrls | Rewrites saved HTML so captured cross-origin asset URLs point at the local _external copy. Requires IncludeCrossOriginAssets. |
| RewriteSameOriginAbsoluteUrls | Rewrites same-origin absolute URLs in saved HTML and CSS to root-relative paths, such as https://example.com/script.js to /script.js. |
| TriggerLazyLoading | In Full mode, scrolls pages after navigation to trigger lazy-loaded media before resources are saved. Defaults to true. |
| LazyLoadScrollStepPx | Pixel distance for each lazy-load scroll step. |
| LazyLoadScrollDelayMs | Delay after each lazy-load scroll step. |
| LazyLoadMaxScrolls | Maximum number of lazy-load scroll steps per page. |
| ClearSaveDirectory | Deletes the output directory before crawling. |
| OverwriteExistingFiles | Controls whether existing files can be replaced. |
| Headless | Runs Chromium headlessly when true. |
| UseStealth | Enables the Soenneker stealth Playwright extensions. |
| ThrottleMode | Controls automatic pacing and adaptive throttling. Defaults to Automatic; use Disabled to bypass automatic pacing, slow mode, cooldown waiting, and implicit post-navigation jitter. |
| NavigationTimeoutMs | Navigation timeout per page. |
| WaitUntil | Playwright load state awaited during navigation. Defaults to NetworkIdle. |
| PostNavigationDelayMs | Extra delay after navigation to allow late assets to settle. |
| ContinueOnPageError | Continues crawling after an individual page fails. |
| Policy | Crawl throttling, retries, concurrency, slow mode, and cooldown configuration. |
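
To make the IgnoreQueryStringsInDuplicateDetection row more concrete, here is a minimal sketch of how query-string variants could collapse to one page key. It is not the package's implementation; `DuplicateKey` and the sample URLs are hypothetical and use only `System.Uri` from the base class library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the package): builds a comparison key for a page URL.
static string DuplicateKey(Uri url, bool ignoreQueryString)
{
    // UriPartial.Path keeps scheme, host, and path; UriPartial.Query also keeps the query string.
    // The fragment is dropped in both cases.
    UriPartial part = ignoreQueryString ? UriPartial.Path : UriPartial.Query;
    return url.GetLeftPart(part);
}

var seen = new HashSet<string>();
string[] candidates =
{
    "https://example.com/docs?page=1",
    "https://example.com/docs?page=2",
    "https://example.com/about"
};

foreach (string candidate in candidates)
{
    string key = DuplicateKey(new Uri(candidate), ignoreQueryString: true);

    // HashSet.Add returns false when the key is already present.
    Console.WriteLine(seen.Add(key) ? $"queue {candidate}" : $"skip  {candidate}");
}
```

With the flag on, the second URL produces the same key as the first and is skipped; with it off, all three URLs would be queued.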
Crawl() returns PlaywrightCrawlResult, which includes:
- Timing (StartedAtUtc, CompletedAtUtc, Duration)
- Page counts (PagesDiscovered, PagesVisited)
- Saved file counts (HtmlFilesSaved, AssetFilesSaved)
- Bytes written (BytesWritten)
- Limit flags (StorageLimitReached, DurationLimitReached, PageLimitReached)
- Files
- Errors

Saved files preserve URL structure so the output can be served by a simple static web server.

Examples:
- https://example.com/ → index.html
- https://example.com/docs/getting-started → docs/getting-started/index.html
- https://example.com/script.js → /script.js inside saved HTML when same-origin URL rewriting is enabled
- https://cdn.example.com/app.css → _external/cdn.example.com/app.css when cross-origin asset capture is enabled
- References to that asset in saved HTML become relative paths such as ../../_external/cdn.example.com/app.css when URL rewriting is enabled

- HTML is formatted with Soenneker.Html.Formatter when FormatHtml = true.
- ThrottleMode = PlaywrightCrawlThrottleMode.Disabled keeps configured concurrency limits and retries, but skips the crawler's automatic pacing and adaptive slowdown behavior.
- Full mode captures resources observed during page loads, but the rewrite pass is limited to captured cross-origin asset URLs rather than a full offline-mirroring transform.

| Product | Versions (compatible and additional computed target frameworks) |
|---|---|
| .NET | net10.0 is compatible. net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, and net10.0-windows were computed. |
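
The URL-to-file examples earlier in this README can be approximated with a few lines of plain C#. The sketch below is only an illustration of that layout, not the crawler's actual mapping logic; `ToRelativePath` is a hypothetical helper, and treating any path without a file extension as a page is an assumption.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of the package): maps a URL to a path relative to the save directory.
static string ToRelativePath(Uri url, string rootHost)
{
    // "/" becomes "", "/docs/getting-started" becomes "docs/getting-started".
    string path = url.AbsolutePath.Trim('/');

    // Assumption: extensionless paths are pages and are stored as <path>/index.html.
    if (path.Length == 0)
        path = "index.html";
    else if (!Path.HasExtension(path))
        path += "/index.html";

    // Resources from other hosts are placed under _external/<host>/.
    bool sameHost = string.Equals(url.Host, rootHost, StringComparison.OrdinalIgnoreCase);
    return sameHost ? path : $"_external/{url.Host}/{path}";
}

Console.WriteLine(ToRelativePath(new Uri("https://example.com/"), "example.com"));
Console.WriteLine(ToRelativePath(new Uri("https://example.com/docs/getting-started"), "example.com"));
Console.WriteLine(ToRelativePath(new Uri("https://cdn.example.com/app.css"), "example.com"));
```

Under these assumptions the three calls yield index.html, docs/getting-started/index.html, and _external/cdn.example.com/app.css, matching the documented examples. The real crawler may treat query strings, content types, and unusual paths differently.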
This package is not used by any NuGet packages.
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 4.0.101 | 0 | 6/19/2026 |
| 4.0.100 | 0 | 6/19/2026 |
| 4.0.99 | 19 | 6/18/2026 |
| 4.0.98 | 86 | 6/18/2026 |
| 4.0.97 | 98 | 6/18/2026 |
| 4.0.96 | 94 | 6/17/2026 |
| 4.0.94 | 114 | 6/17/2026 |
| 4.0.92 | 100 | 6/17/2026 |
| 4.0.91 | 103 | 6/16/2026 |
| 4.0.90 | 214 | 6/14/2026 |
| 4.0.89 | 105 | 6/13/2026 |
| 4.0.88 | 101 | 6/13/2026 |
| 4.0.87 | 240 | 6/11/2026 |
| 4.0.86 | 101 | 6/11/2026 |
| 4.0.85 | 131 | 6/10/2026 |
| 4.0.84 | 96 | 6/10/2026 |
| 4.0.83 | 107 | 6/10/2026 |
| 4.0.82 | 110 | 6/10/2026 |
| 4.0.81 | 144 | 6/10/2026 |
| 4.0.80 | 153 | 6/9/2026 |