dotnet add package AgentSdk.Pdf --version 1.4.0
NuGet\Install-Package AgentSdk.Pdf -Version 1.4.0
<PackageReference Include="AgentSdk.Pdf" Version="1.4.0" />
Directory.Packages.props:
<PackageVersion Include="AgentSdk.Pdf" Version="1.4.0" />
Project file:
<PackageReference Include="AgentSdk.Pdf" />
paket add AgentSdk.Pdf --version 1.4.0
#r "nuget: AgentSdk.Pdf, 1.4.0"
#:package AgentSdk.Pdf@1.4.0
Install as a Cake Addin:
#addin nuget:?package=AgentSdk.Pdf&version=1.4.0
Install as a Cake Tool:
#tool nuget:?package=AgentSdk.Pdf&version=1.4.0
PDF processing extensions that provide PDF image extraction, content analysis, and Markdown conversion using PdfPig.
dotnet add package AgentSdk.Pdf
Platform Requirements:
# Ubuntu/Debian
sudo apt-get install libgdiplus
# Alpine (Docker)
apk add libgdiplus
DOTNET_SYSTEM_DRAWING_ENABLE_UNIX_SUPPORT=1 on Linux
using Cyclotron.Maf.AgentSdk.Options;
using Cyclotron.Maf.AgentSdk.Services;
using Microsoft.Extensions.DependencyInjection;
// Register PDF services
services.AddPdfServices();
// Use PDF content analyzer
var analyzer = serviceProvider.GetRequiredKeyedService<IPdfContentAnalyzer>("pdfpig");
var analysis = await analyzer.AnalyzeAsync("invoice.pdf", cancellationToken);
if (analysis.ContentType == PdfContentType.TextBased)
{
    // Convert to markdown
    var converter = serviceProvider.GetRequiredService<IPdfToMarkdownConverter>();
    var markdown = await converter.ConvertToMarkdownAsync("invoice.pdf", cancellationToken);
}
else if (analysis.ContentType == PdfContentType.ImageOnly)
{
    // Extract images for vision model
    var extractor = serviceProvider.GetRequiredKeyedService<IPdfImageExtractor>("pdfpig");
    var images = await extractor.ExtractImagesAsync("invoice.pdf", cancellationToken);
    // Use with Azure OpenAI GPT-4 Vision
    foreach (var image in images)
    {
        // image.ImageBase64 ready for DataContent
        // image.MimeType for content type
    }
}
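The comments in the loop above mention `image.ImageBase64` and `image.MimeType`. As a minimal sketch of how those two values could be turned into something a vision API accepts, the snippet below builds a data URI and also decodes the raw bytes. `ExtractedImageSketch` is a hypothetical stand-in type defined here for illustration only; it is not the library's own image type, and only the two property names come from the README.

```csharp
using System;

// Hypothetical stand-in for an extracted image; only the two properties
// named in the README comments (ImageBase64, MimeType) are modeled here.
var image = new ExtractedImageSketch(
    ImageBase64: Convert.ToBase64String(new byte[] { 0x89, 0x50, 0x4E, 0x47 }),
    MimeType: "image/png");

// Option 1: a data URI, a common way to pass inline images to vision models.
string dataUri = ToDataUri(image);

// Option 2: raw bytes plus the media type, for APIs that take binary content.
byte[] bytes = Convert.FromBase64String(image.ImageBase64);

Console.WriteLine(dataUri);
Console.WriteLine($"{bytes.Length} bytes of {image.MimeType}");

// Combines the MIME type and the base64 payload into an RFC 2397 data URI.
static string ToDataUri(ExtractedImageSketch img) =>
    $"data:{img.MimeType};base64,{img.ImageBase64}";

// Illustration-only type; not part of AgentSdk.Pdf.
record ExtractedImageSketch(string ImageBase64, string MimeType);
```

Which of the two forms is appropriate depends on the client library used to call the model; both carry the same information.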
Configure PDF processing in appsettings.json or agent.config.yaml:
{
  "PdfContentAnalysis": {
    "Enabled": true,
    "AnalyzerKey": "pdfpig",
    "FailureStrategy": "fallback",
    "TextRatioThreshold": 0.1,
    "MaxPagesToAnalyze": 0,
    "MinCharactersPerPage": 10,
    "LogDetailedResults": false,
    "FullPageImageAreaCoverageThreshold": 0.70,
    "FullPageImagePrimaryDimensionThreshold": 0.85,
    "FullPageImageSecondaryDimensionThreshold": 0.60
  }
}
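To make the three `FullPageImage*` thresholds in this configuration more concrete, here is a sketch of one plausible way such a check could combine them: an image counts as full-page if it covers enough of the page area, or if it nearly spans one axis and covers a reasonable share of the other. This is an assumption for illustration, not the library's actual algorithm; `FullPageCheck` and `IsFullPage` are hypothetical names, and treating the larger coverage ratio as the "primary" axis is a guess.

```csharp
using System;

// Sample inputs use a US Letter page measured in PDF points (612 x 792).
Console.WriteLine(FullPageCheck.IsFullPage(612, 792, 612, 792)); // full-bleed scan
Console.WriteLine(FullPageCheck.IsFullPage(540, 500, 612, 792)); // wide, partial-height image
Console.WriteLine(FullPageCheck.IsFullPage(150, 150, 612, 792)); // small logo

// Hypothetical helper; not part of AgentSdk.Pdf.
static class FullPageCheck
{
    // Default values copied from the configuration shown above.
    const double AreaCoverageThreshold = 0.70;
    const double PrimaryDimensionThreshold = 0.85;
    const double SecondaryDimensionThreshold = 0.60;

    public static bool IsFullPage(
        double imageWidth, double imageHeight, double pageWidth, double pageHeight)
    {
        // Fraction of each page axis covered by the image, capped at 100%.
        double widthRatio = Math.Min(imageWidth / pageWidth, 1.0);
        double heightRatio = Math.Min(imageHeight / pageHeight, 1.0);

        // Area check: the image covers most of the page surface.
        if (widthRatio * heightRatio >= AreaCoverageThreshold)
        {
            return true;
        }

        // Aspect-aware check: nearly spans one axis and a good share of the other.
        double primary = Math.Max(widthRatio, heightRatio);
        double secondary = Math.Min(widthRatio, heightRatio);
        return primary >= PrimaryDimensionThreshold
            && secondary >= SecondaryDimensionThreshold;
    }
}
```

Under this reading, the second sample fails the area check (about 56% coverage) but passes the aspect-aware check, which is the kind of case the two dimension thresholds appear designed to catch.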
Options:
- Enabled - Enable/disable PDF content analysis
- AnalyzerKey - Keyed service name ("pdfpig" by default)
- FailureStrategy - Skip, Throw, or Fallback on errors
- TextRatioThreshold - Minimum text ratio for TextBased classification (0.0 - 1.0)
- MaxPagesToAnalyze - Limit pages to analyze (0 = all pages)
- MinCharactersPerPage - Minimum characters to consider page as text
- LogDetailedResults - Enable detailed logging
- FullPageImageAreaCoverageThreshold - Minimum area coverage ratio (0.0 - 1.0) for an image to be classified as a dominant full-page image. Default: 0.70
- FullPageImagePrimaryDimensionThreshold - Minimum coverage ratio for an image's primary axis (width or height) in the aspect-aware full-page check. Default: 0.85
- FullPageImageSecondaryDimensionThreshold - Minimum coverage ratio for an image's secondary axis in the aspect-aware full-page check. Default: 0.60
{
  "PdfImageExtraction": {
    "Enabled": true,
    "ExtractorKey": "pdfpig",
    "MaxPagesToProcess": 0,
    "MaxImageSizeBytes": 5242880,
    "PreferredFormat": "jpeg",
    "JpegQuality": 85,
    "EncodeAsBase64": true,
    "MinImageWidth": 50,
    "MinImageHeight": 50,
    "SkipTextOnlyPages": true,
    "LogDetailedResults": false
  }
}
Options:
- Enabled - Enable/disable image extraction
- ExtractorKey - Keyed service name ("pdfpig" by default)
- MaxPagesToProcess - Limit pages to process (0 = all pages)
- MaxImageSizeBytes - Maximum image file size (5MB default)
- PreferredFormat - "jpeg" or "png" (PdfPig always outputs PNG)
- JpegQuality - JPEG compression quality (1-100)
- EncodeAsBase64 - Encode images as base64 strings
- MinImageWidth/MinImageHeight - Minimum dimensions to extract
- SkipTextOnlyPages - Skip pages with text but no images
- LogDetailedResults - Enable detailed logging
{
  "PdfConversion": {
    "Enabled": true,
    "SaveMarkdownForDebug": false,
    "OutputDirectory": "./output",
    "IncludePageNumbers": true,
    "PreserveParagraphStructure": true,
    "IncludeTimestampInFilename": false,
    "MarkdownFileExtension": ".md"
  }
}
Options:
- Enabled - Enable/disable markdown conversion
- SaveMarkdownForDebug - Save markdown files for debugging
- OutputDirectory - Directory for debug markdown files
- IncludePageNumbers - Add page number markers in markdown
- PreserveParagraphStructure - Maintain paragraph breaks
- IncludeTimestampInFilename - Add timestamp to debug filenames
- MarkdownFileExtension - File extension for markdown files
// 1. Analyze PDF content
var analyzer = serviceProvider.GetRequiredKeyedService<IPdfContentAnalyzer>("pdfpig");
var analysis = await analyzer.AnalyzeAsync(pdfPath, cancellationToken);
// Resolve both services up front so that every case, including Mixed, can use them
var converter = serviceProvider.GetRequiredService<IPdfToMarkdownConverter>();
var extractor = serviceProvider.GetRequiredKeyedService<IPdfImageExtractor>("pdfpig");
// 2. Route based on content type
switch (analysis.ContentType)
{
    case PdfContentType.TextBased:
        // Text extraction workflow
        var markdown = await converter.ConvertToMarkdownAsync(pdfPath, cancellationToken);
        // Process markdown with LLM
        break;
    case PdfContentType.ImageOnly:
        // Vision model workflow
        var images = await extractor.ExtractImagesAsync(pdfPath, cancellationToken);
        // Process images with GPT-4 Vision
        break;
    case PdfContentType.Mixed:
        // Hybrid workflow - use both
        var markdownContent = await converter.ConvertToMarkdownAsync(pdfPath, cancellationToken);
        var extractedImages = await extractor.ExtractImagesAsync(pdfPath, cancellationToken);
        // Combine text and image processing
        break;
}
For large PDFs, use streaming extraction to process images one at a time:
var extractor = serviceProvider.GetRequiredKeyedService<IPdfImageExtractor>("pdfpig");
var imageCount = await extractor.ExtractImagesStreamAsync(
    pdfStream,
    "large-document.pdf",
    async (image) =>
    {
        // Process each image as it's extracted
        await ProcessImageWithVisionModelAsync(image);
        // Return false to stop extraction
        return true;
    },
    cancellationToken);
var extractor = serviceProvider.GetRequiredKeyedService<IPdfImageExtractor>("pdfpig");
var pageNumbers = new[] { 1, 3, 5 }; // Extract from pages 1, 3, and 5
var images = await extractor.ExtractImagesAsync(pdfPath, pageNumbers, cancellationToken);
Register a custom analyzer alongside the default PdfPig implementation:
services.AddKeyedSingleton<IPdfContentAnalyzer>(
    "custom",
    (sp, _) => new MyCustomAnalyzer(
        sp.GetRequiredService<ILogger<MyCustomAnalyzer>>(),
        sp.GetRequiredService<IOptions<PdfContentAnalysisOptions>>()));
// Configure to use custom analyzer
services.Configure<PdfContentAnalysisOptions>(options =>
{
    options.AnalyzerKey = "custom";
});
PDF services integrate seamlessly with workflows:
// In your workflow executor
public class InvoiceExtractionWorkflow : IInvoiceExtractionWorkflow
{
    private readonly IPdfContentAnalyzer _analyzer;
    private readonly IPdfToMarkdownConverter _converter;
    private readonly IPdfImageExtractor _extractor;
    private readonly IAgentFactory _agentFactory;

    public InvoiceExtractionWorkflow(
        [FromKeyedServices("pdfpig")] IPdfContentAnalyzer analyzer,
        IPdfToMarkdownConverter converter,
        [FromKeyedServices("pdfpig")] IPdfImageExtractor extractor,
        [FromKeyedServices("extraction")] IAgentFactory agentFactory)
    {
        _analyzer = analyzer;
        _converter = converter;
        _extractor = extractor;
        _agentFactory = agentFactory;
    }

    public async Task<WorkflowResult<InvoiceData>> ExecuteAsync(
        WorkflowInput input,
        CancellationToken cancellationToken = default)
    {
        // Analyze content
        var analysis = await _analyzer.AnalyzeAsync(input.FilePath, cancellationToken);
        // Extract data based on content type
        var context = analysis.ContentType == PdfContentType.TextBased
            ? await _converter.ConvertToMarkdownAsync(input.FilePath, cancellationToken)
            : string.Empty;
        var images = analysis.ContentType != PdfContentType.TextBased
            ? await _extractor.ExtractImagesAsync(input.FilePath, cancellationToken)
            : Array.Empty<ExtractedPdfImage>();
        // Process with agent
        await _agentFactory.CreateAgentAsync(vectorStoreId: null, cancellationToken);
        var response = await _agentFactory.RunAgentWithPollingAsync(
            messages: BuildMessages(context, images),
            cancellationToken: cancellationToken);
        return ParseInvoiceData(response);
    }
}
See for complete workflow examples.
# Install libgdiplus
sudo apt-get update && sudo apt-get install -y libgdiplus
# Set environment variable
export DOTNET_SYSTEM_DRAWING_ENABLE_UNIX_SUPPORT=1
- Verify that Enabled is true in the PdfImageExtraction configuration
- Check the MinImageWidth and MinImageHeight thresholds
- Enable LogDetailedResults to see detailed extraction logs
- Enable PreserveParagraphStructure for better formatting
- Enable IncludePageNumbers for document navigation
See the main repository for contribution guidelines.
Licensed under the MIT License.
| Product | Versions Compatible and additional computed target framework versions. |
|---|---|
| .NET | Compatible: net8.0. Computed: net8.0-android, net8.0-browser, net8.0-ios, net8.0-maccatalyst, net8.0-macos, net8.0-tvos, net8.0-windows, net9.0, net9.0-android, net9.0-browser, net9.0-ios, net9.0-maccatalyst, net9.0-macos, net9.0-tvos, net9.0-windows, net10.0, net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows. |
Showing the top 1 NuGet packages that depend on AgentSdk.Pdf:
| Package | Downloads |
|---|---|
| AgentSdk: A .NET SDK for building AI agent workflows using Microsoft Agent Framework (MAF) and Azure AI Foundry. Provides workflow orchestration, agent factories, vector store management, and OpenTelemetry integration. | |
This package is not used by any popular GitHub repositories.
| Version | Downloads | Last Updated |
|---|---|---|
| 1.4.0 | 160 | 3/11/2026 |
| 1.4.0-alpha.2 | 72 | 3/11/2026 |
| 1.3.0 | 189 | 3/11/2026 |
| 1.3.0-alpha.7 | 72 | 3/11/2026 |
| 1.3.0-alpha.2 | 66 | 3/10/2026 |
| 1.2.0 | 115 | 3/10/2026 |
| 1.1.0 | 253 | 3/2/2026 |
| 1.1.0-alpha.1 | 106 | 2/28/2026 |
| 1.0.0 | 134 | 2/24/2026 |