Fuuga 1.0.7

.NET 10.0

dotnet add package Fuuga --version 1.0.7

NuGet\Install-Package Fuuga -Version 1.0.7

This command is intended to be used within the Package Manager Console in Visual Studio, as it uses the NuGet module's version of Install-Package.

<PackageReference Include="Fuuga" Version="1.0.7" />

For projects that support PackageReference, copy this XML node into the project file to reference the package.

For projects that support Central Package Management (CPM), copy this XML node into the solution Directory.Packages.props file to version the package:

<PackageVersion Include="Fuuga" Version="1.0.7" />

Then reference the package, without a version, in the project file:

<PackageReference Include="Fuuga" />

paket add Fuuga --version 1.0.7

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

#r "nuget: Fuuga, 1.0.7"

#r directive can be used in F# Interactive and Polyglot Notebooks. Copy this into the interactive tool or source code of the script to reference the package.

#:package Fuuga@1.0.7

#:package directive can be used in C# file-based apps starting in .NET 10 preview 4. Copy this into a .cs file before any lines of code to reference the package.

Install as a Cake Addin:

#addin nuget:?package=Fuuga&version=1.0.7

Install as a Cake Tool:

#tool nuget:?package=Fuuga&version=1.0.7

The NuGet Team does not provide support for this client. Please contact its maintainers for support.

Fuuga

Tired of paying for tokens? Think you could train a better model? Well, now you can try.

An LLM built from scratch in F# and .NET. Fuuga implements a complete language model pipeline: tokenization, data ingestion, model training, fine-tuning, and text generation -- with no Python dependencies.

Built on TorchSharp for tensor operations and Microsoft.ML.Tokenizers for BPE, Fuuga uses idiomatic F# (discriminated unions, pipelines, immutability) throughout. Works on GPU or CPU.

Features

Core Pipeline:

BPE Tokenizer -- Train a byte-pair encoding tokenizer on your own corpus with configurable vocabulary size
Data Ingestion -- Discover and tokenize epub, markdown, Parquet, and plain text files into a binary corpus
Parquet I/O -- Read and write HuggingFace-compatible Parquet datasets for SFT, DPO, and document data
Corpus Compression -- Zstd compression/decompression for .fuge corpus files
GPT-2 Transformer -- Decoder-only causal transformer with rotary position embeddings (RoPE), grouped-query attention (GQA), RMSNorm, and SwiGLU activation
Multi-Head Latent Attention (MLA) -- DeepSeek-V2 style compressed KV cache with query/KV compression, decoupled RoPE keys, and optional weight absorption for reduced memory during inference
Vision Encoder -- Vision model support for multimodal inputs
Vision Bridge -- Q-Former cross-attention bridge that compresses vision patch tokens into learned query vectors for multimodal (image+text) inputs
Paged Attention -- Paged KV-cache attention for efficient memory usage during long-context generation
Memory Hierarchy -- Compressed memory with external retrieval for extended context
Multi-Resolution Attention -- Chunk pooling with global tokens for efficient long-context processing
FlashAttention Config -- SDPA backend selection and benchmarking for attention kernels
Auto Config -- Hardware-aware auto-resolution of DU configuration cases (norm, activation, precision, offloading, communication) at startup
Early Exit -- Adaptive depth inference for faster generation when confidence is high
Training -- AdamW optimizer with cosine learning rate scheduling, warmup, gradient clipping, mixed precision support, and gradient accumulation
Multi-Token Prediction (MTP) -- DeepSeek-V3 style auxiliary heads predicting multiple future tokens (configurable Depth); adds a weighted multi-depth loss during training for better sample efficiency and powers MTP-drafted speculative decoding for faster generation. Enable from the CLI with train --mtp-depth <N> [--mtp-loss-weight <f>], or set MtpConfig in the model-config JSON
INT8 Optimizer Moments -- Optional INT8 quantization of AdamW M/V moment tensors with per-row symmetric quantization, reducing optimizer memory ~4× (--moment-quant int8)
Optimizer Variants -- Stochastic Weight Averaging (SWA) and Lookahead optimizer support with checkpointable optimizer state
Gradient Checkpointing -- Memory-efficient training via activation recomputation
GPU Offloading -- Layer-wise CPU/GPU offloading for reduced VRAM usage
Optimizer Offloading -- Offload optimizer states to CPU memory
NVMe Paging -- ZeRO-Infinity 3-tier GPU/CPU/NVMe memory management for training models larger than available VRAM
Per-Tensor Gradient Offloading -- Bulk-copy gradients to CPU after backward pass and restore before optimizer step, freeing GPU VRAM during the optimizer phase (--grad-offload)
Per-Parameter CUDA Flush -- Aggressive CUDA cache cleanup after optimizer step to reclaim transient VRAM spikes from M/V update temporaries (--flush-each-param)
VRAM Guard -- In-process background thread that polls GPU memory via nvidia-smi and signals the training loop to warn, skip batches, or abort when usage exceeds a configurable threshold (--vram-guard-gb <float>)
Memory Strategy Presets -- MemoryStrategyConfigs.none, .constrained (grad-offload + flush), and .full (all three with VRAM guard at 95%) with automatic ConfigWizard recommendations based on model-to-VRAM ratio
Model Parallelism -- Tensor and pipeline parallelism configuration for 70B+ parameter models, with automatic DataParallel recommendation for multi-GPU setups
Inference -- Greedy, top-k, top-p (nucleus), and temperature sampling with repetition penalty
Fill-in-the-Middle -- FIM support with prefix/suffix tokens for code completion
Checkpoints -- Save, load, resume training from checkpoints with full metadata; safetensors format support
Memory-Mapped Loading -- mmap-based model loading for fast startup
Streaming Inference -- Token-by-token generation with configurable stop conditions
Confidence Signals -- Entropy, repetition rate, hedging detection, calibrated confidence with Platt scaling, and stop reason reporting
Drift Detection -- Statistical drift monitoring (Kolmogorov-Smirnov, Population Stability Index) over confidence signals with ring-buffered accumulation
Drift Alerting -- Dual-threshold alerts with adaptive sigma-based thresholds, OpenTelemetry metrics, and retraining triggers
ONNX Export -- Export to ONNX format with fp16/int8 quantization, validation, and benchmarking
ONNX Inference -- ONNX Runtime backend for optimized inference (--backend onnx)
Benchmark Evaluation -- Built-in benchmark runner for MMLU, HellaSwag, ARC-Challenge, cached dataset downloads, and checkpoint-attached benchmark results
FP8 Dequantization -- FP8 format support for quantized weight loading with GPU-accelerated LUT path (256-entry cached lookup table using torch.index_select) auto-selected when CUDA is available
Validation Pipeline -- Input validation framework with composable validators
Scaling Heuristics -- Auto-scaling configuration from corpus and hardware stats
Config Wizard -- Corpus analysis and hardware-aware config generation using Chinchilla scaling laws, activation memory estimates, NTK-aware RoPE, multi-GPU detection (nvidia-smi), and memory strategy recommendations
CLI -- Subcommands for the full pipeline (tokenize, ingest, train, infer, info, sft, dpo, rl, merge, transfer, distill, merge-models, fisher, distributed, export onnx, compress, decompress, prune, eval, config, wordnet, serve, orchestrate, agent)
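
The Training entry above mentions cosine learning rate scheduling with warmup. As a rough illustration of that technique, here is a minimal, self-contained F# sketch. The function name, parameter names, and the sample values are assumptions made for this example; they are not taken from Fuuga's TrainingConfig, and Fuuga's own schedule may differ in detail (for instance in how the warmup ramp starts).

```fsharp
open System

/// Learning rate at a given step: linear warmup followed by cosine decay.
/// All names here are illustrative and are not Fuuga's configuration fields.
let cosineLr (maxLr: float) (minLr: float) (warmupSteps: int) (totalSteps: int) (step: int) : float =
    if warmupSteps > 0 && step < warmupSteps then
        // Linear ramp that reaches maxLr on the last warmup step.
        maxLr * float (step + 1) / float warmupSteps
    elif step >= totalSteps then
        // After the schedule ends, hold the floor value.
        minLr
    else
        // Fraction of the decay phase completed, in [0, 1).
        let progress = float (step - warmupSteps) / float (max 1 (totalSteps - warmupSteps))
        minLr + 0.5 * (maxLr - minLr) * (1.0 + Math.Cos(Math.PI * progress))

// Hypothetical settings: peak 3e-4, floor 3e-5, 100 warmup steps, 1000 steps in total.
let schedule = cosineLr 3e-4 3e-5 100 1000

[ 0; 99; 100; 550; 1000 ]
|> List.iter (fun s -> printfn "step %4d -> lr %.6f" s (schedule s))
```

With these sample settings the rate climbs during the first 100 steps, peaks at the end of warmup, passes the midpoint between peak and floor halfway through the decay phase, and settles at the floor once the step budget is used up.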

Fine-Tuning:

Supervised Fine-Tuning (SFT) -- LoRA-based fine-tuning on instruction/chat JSONL data with configurable rank, alpha, and target modules
Prompt Tuning -- Soft-prompt / virtual-token fine-tuning with frozen base weights for lightweight PEFT workflows
Direct Preference Optimization (DPO) -- Preference learning from chosen/rejected pairs with LoRA
Reinforcement Learning (RL) -- REINFORCE++ / GRPO fine-tuning with pluggable reward functions
Reward Functions -- Composable reward functions for RL training (correctness, formatting, safety)
LoRA Adapter Merging -- Merge trained LoRA adapters back into the base model weights
QLoRA -- NF4-quantized base weights with LoRA adapters for memory-efficient fine-tuning on consumer GPUs
Data Validation -- JSONL format validation for SFT and DPO datasets with honesty pattern classification
Data Augmentation -- Synonym replacement, rule-based paraphrasing, token-level noise injection, and SFT/DPO oversampling for training data diversity
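
To make the LoRA Adapter Merging entry above concrete, the following sketch shows the usual LoRA merge arithmetic on plain arrays: the merged weight is W' = W + (alpha / r) * (B x A). It assumes the standard alpha / rank scaling from the LoRA literature; whether Fuuga uses exactly this scaling is not stated here, and the function name mergeLora is made up for the example. Fuuga itself works on TorchSharp tensors rather than float[,].

```fsharp
/// Merge a LoRA update into a dense weight matrix: W' = W + (alpha / r) * (B x A).
/// Shapes: w is [out, in], b is [out, r], a is [r, in].
/// Illustrative only; this is not Fuuga's adapter-merging API.
let mergeLora (alpha: float) (w: float[,]) (b: float[,]) (a: float[,]) : float[,] =
    let outDim, inDim = Array2D.length1 w, Array2D.length2 w
    let rank = Array2D.length2 b
    if Array2D.length1 b <> outDim || Array2D.length1 a <> rank || Array2D.length2 a <> inDim then
        invalidArg "b" "LoRA factor shapes do not match the base weight"
    let scale = alpha / float rank
    Array2D.init outDim inDim (fun i j ->
        // Entry (i, j) of the low-rank product B x A.
        let mutable delta = 0.0
        for k in 0 .. rank - 1 do
            delta <- delta + b.[i, k] * a.[k, j]
        w.[i, j] + scale * delta)

// Tiny example: 2x2 identity base weight with a rank-1 adapter and alpha = 2.
let baseWeight = array2D [ [ 1.0; 0.0 ]; [ 0.0; 1.0 ] ]
let loraB = array2D [ [ 1.0 ]; [ 2.0 ] ]
let loraA = array2D [ [ 3.0; 4.0 ] ]
let merged = mergeLora 2.0 baseWeight loraB loraA
printfn "%A" merged
```

After merging, the adapter matrices are no longer needed at inference time, which is the point of folding them back into the base weights.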

Weight Transfer and Model Merging:

Weight Transfer -- Transfer weights from donor models with architecture-aware mapping (Phi-3, LLaMA3, Gemma-3, Gemma-4, DeepSeek-V3 dense FFN) and dimension adaptation for mismatched tensors
Knowledge Distillation -- Token-level, sequence-level, and reverse-KLD distillation from a teacher model
Attention Distillation -- MLA-to-GQA distillation that trains grouped-query attention layers to reproduce frozen Multi-head Latent Attention teacher outputs (DeepSeek-V3/Kimi K2) with KL-divergence + MSE loss
N-ary Model Merging -- Merge multiple models with configurable strategies (TIES, DARE, Karcher mean, ModelSoups, ModelStock) and EWC protection
Fisher Information -- Compute diagonal Fisher information matrices for Elastic Weight Consolidation
N-ary Data Mixing -- Weighted multi-source mixing with static, curriculum, proxy-based DoReMi (Group DRO domain reweighting from a unigram proxy), and self-paced (difficulty-ramped) strategies

Inference Capabilities:

Chain-of-Thought -- Thinking mode with ThinkStart/ThinkEnd token handling and dimmed thinking display
Constrained Decoding -- Grammar-guided JSON structured output generation
Self-Verification -- Draft/refine verification passes with learned verifier scoring for higher-confidence answers
Tool Calling -- MCP (Model Context Protocol) client for tool discovery and invocation during generation
Tool Policy -- Confidence-aware tool routing policy for deciding when external tools should be invoked
Web Search -- Web search integration for grounded generation with citations
Image Routing -- Route image-related queries to Fuuga.Image for generation or captioning
A2A Protocol -- Agent-to-Agent protocol client for multi-agent communication
Tree-Structured Speculative Decoding -- Speculative decoding with tree-structured candidates for faster generation
Draft-and-Refine -- Multi-pass reasoning pipeline for improved output quality
Autonomous Agent -- Web search agent loop for autonomous information gathering
Advanced Reasoning -- Consensus voting, verifier-scored selection, and tree-of-thoughts for improved answer quality
Backend Client -- HTTP client for calling external OpenAI-compatible LLM endpoints with structured response types
Context Awareness -- Convention file discovery (AGENTS.md, CLAUDE.md, .cursorrules), language/framework detection, git context
Semantic Knowledge -- WordNet WNDB parser with token-to-synset mapping and multi-lingual support

Orchestration:

Multi-Model Orchestrator -- Route tasks to appropriate models based on capability
Cost-Aware Routing -- Budget-tracked model routing with cost optimization
Fan-Out Orchestration -- Decompose tasks into subtasks, run in parallel, and aggregate results
Resumable Orchestration -- Checkpoint and resume fan-out plans across sessions

Agentic Persistence:

Experience Store -- Append-only JSON Lines log of attempt outcomes with thread-safe managed access
Strategy Lessons -- Persist and load distilled lessons from past experience for self-improvement
Persistent Retrieval Store -- Disk-backed IRetrievalStore for cross-session document retrieval
Agent Session Management -- Session lifecycle (init → active → completed/failed), save/load state, step-level experience recording
Orchestration Checkpoints -- Save and resume fan-out orchestration plans with per-subtask completion tracking
Cost Outcome Tracking -- Persist cost-aware routing outcomes for budget optimization across sessions

Model Compression:

Structured Pruning -- Attention head removal and layer removal with importance scoring
NF4 Quantization -- 4-bit NormalFloat quantization for weight compression
Quantization-Aware Training (STE) -- Straight-Through Estimator for NF4 weights: forward pass sees quantized values, backward pass flows gradients through identity (--ste)
Compression Pipeline -- Orchestrated prune → fine-tune → quantize workflow for production deployment

Server:

OpenAI-Compatible API -- Separate fuuga-serve project with /v1/chat/completions, /v1/completions, /v1/models, and /v1/embeddings endpoints
SSE Streaming -- Server-Sent Events for real-time token streaming
Continuous Batching (engine only) -- Iteration-level continuous batching scheduler, queueing, and preemption policies. NOTE: not yet connected to the HTTP server, which currently serializes requests behind a single inference lock; full integration requires a paged batched forward pass in FuugaModel
Dynamic Batching (engine only) -- Batch scheduling engine with metrics and backpressure. NOTE: same status as above — implemented and tested, but not yet wired into the serving path
Bearer Token Auth -- Optional API key authentication middleware
Guard Rails -- Prompt injection detection, PII masking, and content filtering middleware
MCP Tool Routing -- Server-side MCP tool integration for function calling
A2A Server -- Agent-to-Agent protocol server endpoint for multi-agent workflows
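
Because the server exposes OpenAI-compatible endpoints, any HTTP client that can post the familiar chat-completions JSON should be able to talk to it. The sketch below only builds such a request against /v1/chat/completions. The base URL, port, model name, and API key are placeholders assumed for illustration, and buildChatRequest is a hypothetical helper, not part of Fuuga.

```fsharp
open System
open System.Net.Http
open System.Net.Http.Headers
open System.Text
open System.Text.Json

/// Build a POST request for an OpenAI-style chat completion.
/// baseUrl, apiKey and model are placeholders chosen for this sketch.
let buildChatRequest (baseUrl: string) (apiKey: string option) (model: string) (prompt: string) =
    let payload =
        {| model = model
           messages = [| {| role = "user"; content = prompt |} |]
           stream = false |}
    let request = new HttpRequestMessage(HttpMethod.Post, Uri(Uri(baseUrl), "/v1/chat/completions"))
    request.Content <- new StringContent(JsonSerializer.Serialize(payload), Encoding.UTF8, "application/json")
    // Bearer auth is optional on the server, so attach the header only when a key is given.
    apiKey
    |> Option.iter (fun key -> request.Headers.Authorization <- AuthenticationHeaderValue("Bearer", key))
    request

// Assumed address and model name; adjust to however the server is actually started.
let chatRequest = buildChatRequest "http://localhost:5000" None "fuuga" "Once upon a time"
printfn "%s %O" chatRequest.Method.Method chatRequest.RequestUri

// To send it against a running server:
// use client = new HttpClient()
// let response = client.Send chatRequest
```

Setting stream to true in the payload would be the natural way to exercise the SSE streaming path, assuming the server follows the OpenAI convention for that flag.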

Distributed Training:

PyTorch/DeepSpeed Integration -- Export model weights for distributed training, import trained weights back, and auto-generate launch scripts

Image Generation (Fuuga.Image):

Text-to-Image -- Stable Diffusion image generation from text prompts with configurable samplers, steps, and guidance
Image-to-Image -- Transform existing images guided by text prompts with denoising strength control
Image Captioning -- Phi-3.5-vision captioning with brief/standard/detailed output modes
MCP Server Mode -- Run as an MCP tool server over stdio for integration with Fuuga LLM

Local Minion (Delegation):

Minion delegation -- Run Fuuga as a local "minion" that a more capable master agent (e.g. Claude Code) delegates small, well-scoped errands to, keeping simple work on a fully-local model with minimal energy cost
MCP server (fuuga-serve mcp) -- Exposes a fuuga_delegate tool plus local file tools to a master over stdio JSON-RPC; register several with distinct --name/--role to run a fleet of specialists (e.g. F# coder, C# coder, project manager)
Swappable brain -- The minion runs on a Fuuga-trained checkpoint (--checkpoint), an in-process GGUF via LLamaSharp (--gguf), or any OpenAI-compatible endpoint such as a local Ollama (--endpoint)
Sandboxed local tools -- read_file, list_dir, grep, plus permission-gated write_file, replace_in_file, and run_command; confined to a workspace --root (resists .. and symlink escapes), with writes/shell off by default
Extended reach -- Opt into the built-in web tools (--web) and configured MCP servers (--mcp-config) so a delegated errand can fetch pages and call other tools, not just touch the filesystem
Escalation contract -- Each delegation returns structured JSON {status, output, files_changed, escalate, reason, confidence}; the minion self-verifies and hands work back (escalate=true) when it is not confident, so the master only spends its own capacity when needed
One-shot CLI (fuuga delegate) -- Run a single errand locally and print the JSON result, without a master
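
The escalation contract above lists the fields a delegation returns: status, output, files_changed, escalate, reason, and confidence. As a sketch of how a master process might consume that JSON, the code below parses it and decides whether to accept the result or take the work back. The field types, the "ok" status value, the DelegationOutcome and MasterAction types, and the confidence threshold policy are all assumptions made for this example, not Fuuga's actual types.

```fsharp
open System.Text.Json

/// Hypothetical mirror of the delegation result; field types are assumed.
type DelegationOutcome =
    { Status: string
      Output: string
      FilesChanged: string list
      Escalate: bool
      Reason: string
      Confidence: float }

/// What the master does with a finished errand (illustrative).
type MasterAction =
    | Accept of output: string
    | TakeOver of reason: string

let parseOutcome (json: string) : DelegationOutcome =
    use doc = JsonDocument.Parse(json)
    let root = doc.RootElement
    { Status = root.GetProperty("status").GetString()
      Output = root.GetProperty("output").GetString()
      FilesChanged = [ for f in root.GetProperty("files_changed").EnumerateArray() -> f.GetString() ]
      Escalate = root.GetProperty("escalate").GetBoolean()
      Reason = root.GetProperty("reason").GetString()
      Confidence = root.GetProperty("confidence").GetDouble() }

/// Assumed policy: take over when the minion asks for it or reports low confidence.
let decide (minConfidence: float) (outcome: DelegationOutcome) : MasterAction =
    if outcome.Escalate || outcome.Confidence < minConfidence then TakeOver outcome.Reason
    else Accept outcome.Output

// Made-up sample payload shaped like the contract described above.
let sampleJson =
    """{"status":"ok","output":"renamed 3 symbols","files_changed":["src/A.fs"],"escalate":false,"reason":"","confidence":0.91}"""

printfn "%A" (sampleJson |> parseOutcome |> decide 0.8)
```

Keeping the decision in one small function makes the hand-back rule easy to tune: raising the threshold sends more work back to the master, lowering it keeps more work local.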

Observability:

OpenTelemetry -- OTLP trace and metrics export with Serilog integration
Spectre.Console -- Rich terminal output for training progress and diagnostics

Prerequisites

.NET 10 SDK (v10.0.103 or later)
GPU is optional -- CPU works for the dev configuration (small model). CUDA-capable GPU recommended for larger models.
~500 MB disk space for dependencies, plus space for training data and checkpoints

Quick Start

For most users, the fastest path to useful output is to start from donor weights, not from scratch training.

Recommended paths:

examples/donor-transfer-and-refine.fsx -- practical donor-first workflow for normal users, with two modes:
- SmokeTest for limited hardware, using a very small donor just to prove the F# pipeline is real
- Practical for a few-GB donor model that gives much better output quality
examples/complete-pipeline.fsx -- educational train-from-scratch pipeline
examples/advanced-scenarios.fsx -- high-level tour of advanced Fuuga capabilities

If you want to understand the full pipeline from scratch, use the CLI below:

# Build (CPU):
dotnet build
# Build (GPU, ~2 GB dependency):
dotnet build -p:TorchBackend=cuda

# Train a tokenizer, ingest a corpus, train, and generate text
dotnet run -- tokenize --input data/raw --vocab-size 8000 --output data/tokenizer
dotnet run -- ingest --input data/raw --output data/corpus.fuge --tokenizer data/tokenizer
dotnet run -- train --corpus data/corpus.fuge --tokenizer data/tokenizer --checkpoint-dir checkpoints/
dotnet run -- infer --checkpoint checkpoints/step-100 --tokenizer data/tokenizer --prompt "Once upon a time"

See the end-to-end walkthrough listed under Documentation for a complete guide from build to text generation.

A note on expectations:

Training from scratch is educational and flexible, but small early runs often produce weak or gibberish text.
Donor transfer is the better starting point when you want coherent output quickly.
A short refinement pass on your own domain data is usually much more useful than starting from random weights.

Use .fuge for tokenized corpus files and .fuuga for portable model packages.

F# Script Examples

Prefer the F# API over the CLI?

Practical donor-first path:

dotnet build
dotnet fsi examples/donor-transfer-and-refine.fsx

This loads donor weights, runs transfer into a Fuuga model, evaluates prompt outputs, and can do a short refinement pass. It is the recommended starting point for users who want useful results on limited hardware or with a few-GB donor model.

Train-from-scratch path:

dotnet build
dotnet fsi examples/complete-pipeline.fsx

This trains a tokenizer, ingests data, trains a model, and generates text -- all using the Fuuga modules directly. See examples/complete-pipeline.fsx for the full source.

For advanced workflows -- weight transfer from Phi-3/LLaMA3/DeepSeek, LoRA and prompt tuning, benchmark evaluation, ONNX export, verifier-assisted inference, constrained decoding, chain-of-thought, streaming inference, and serving via the OpenAI-compatible API -- see examples/advanced-scenarios.fsx.

For image generation and captioning, see the image example script listed under Documentation.

Image Generation

Fuuga.Image is a standalone CLI for image generation and captioning. See the image generation CLI reference listed under Documentation for the full command reference and MCP server mode.

Project Structure

Fuuga.fsproj # Project file with layered compilation order
Types.fs # All shared types (ModelConfig, TrainingConfig, GenerationConfig, etc.)
Logging.fs # ActivitySource/Meter definitions, ILoggerFactory
Observability.fs # OpenTelemetry providers, Spectre.Console, --observe flag
DriftDetection.fs # Statistical drift monitoring (KS, PSI) over confidence signals
DriftAlerting.fs # Dual-threshold alerts, adaptive thresholds, OTel metrics, retraining triggers
Config.fs # JSON config loading, CLI arg parsing, MCP config, LoRA target parsing
Validation.fs # Input validation pipeline with composable validators
Scaling.fs # Scaling heuristics from corpus and hardware stats
ConfigWizard.fs # Corpus analysis + hardware-aware config generation (Chinchilla scaling)
Tokenizer.fs # BPE tokenizer training and loading
ParquetIO.fs # HuggingFace Parquet dataset read/write (Document, SFT, DPO)
TextCleanup.fs # Ingestion/preparation text cleanup
RagCleanup.fs # RAG (Retrieval-Augmented Generation) cleanup algorithms
Ingest.fs # Document discovery and binary corpus writing
CorpusCompression.fs # Zstd compression/decompression for .fuge files
Tensor.fs # Device selection (CPU/CUDA), DisposeScope
MultiResolutionAttention.fs # Chunk pooling, global tokens for long context
Model.fs # GPT-2 transformer with RoPE, GQA, RMSNorm, SwiGLU, MLA
AttentionConfig.fs # FlashAttention verification, SDPA backend selection
AutoConfig.fs # Auto-resolution of DU Auto* config cases from hardware probing
Vision.fs # Vision encoder for multimodal inputs
VisionBridge.fs # Q-Former cross-attention bridge for vision-to-language compression
PagedAttention.fs # Paged KV-cache attention
MemoryHierarchy.fs # Compressed memory, external retrieval
PersistentRetrievalStore.fs # Disk-backed IRetrievalStore for cross-session retrieval
ConfidenceHead.fs # Calibrated confidence MLP, Platt scaling, bucket assignment
EarlyExit.fs # Early exit / adaptive depth inference
Optimizer.fs # AdamW, SWA, Lookahead, and INT8 moment-quantized optimizers
Checkpoint.fs # Checkpoint save/load/metadata, safetensors
MmapLoading.fs # Memory-mapped model loading
GradientCheckpointing.fs # Gradient checkpointing for memory-efficient training
DistributedTraining.fs # Distributed training (PyTorch/DeepSpeed export/import)
ModelParallelism.fs # Tensor/pipeline parallelism config for 70B+ models
GpuOffloading.fs # Layer-wise CPU/GPU offloading
OptimizerOffload.fs # Optimizer state offloading
NvmePaging.fs # ZeRO-Infinity 3-tier GPU/CPU/NVMe memory management
OnnxExport.fs # ONNX export with quantization and validation
DataMixture.fs # N-ary weighted data source mixing
VramGuard.fs # In-process VRAM monitoring with nvidia-smi polling, signal-based training loop integration
Training.fs # Training loop with AdamW/cosine LR, gradient offloading, per-param flush
FineTuningData.fs # SFT/DPO JSONL parsing, chat templates, tokenization, batching
FineTuning.fs # LoRA (LoraLinear), SFT training, DPO loss/training, adapter save/load
RewardFunctions.fs # Composable reward functions for RL training
DataValidation.fs # SFT/DPO JSONL validation, honesty pattern classification
Fp8Dequantization.fs # FP8 format dequantization with GPU LUT acceleration
Nf4Quantizer.fs # NF4/FP4 4-bit weight quantization, STE for QAT
QLoraTraining.fs # QLoRA training (NF4 base + LoRA adapters)
WeightTransfer.fs # Weight transfer from donor models (Phi-3, LLaMA3, Gemma-3/4, DeepSeek mappings)
ModelMerge.fs # N-ary model merging (TIES, DARE, Karcher) with EWC protection
AttentionDistillation.fs # MLA-to-GQA attention distillation (KL + MSE loss)
Pruning.fs # Structured pruning (attention heads, layers)
CompressionPipeline.fs # Prune → finetune → quantize orchestration
ConstrainedDecoding.fs # Grammar-guided JSON constrained decoding
ChainOfThought.fs # ThinkStart/ThinkEnd token handling, phase tracking
Verifier.fs # Rule-based and learned verifier strategies
ToolPolicy.fs # Confidence-aware tool invocation policy
McpClient.fs # MCP client: connection, tool discovery, tool invocation
WebSearch.fs # Web search integration for grounded generation
ImageRouting.fs # Image query routing to Fuuga.Image
A2AClient.fs # A2A protocol client for agent-to-agent communication
TreeSpeculation.fs # Tree-structured speculative decoding
ContinuousBatching.fs # Iteration-level continuous batching for serving
Inference.fs # Text generation with sampling, tool-augmented generation, structured output, CoT
DraftAndRefine.fs # Draft-and-refine multi-pass reasoning pipeline
AdvancedReasoning.fs # Consensus voting, verifier-scored, tree-of-thoughts
Orchestrator.fs # Multi-model orchestrator with capability-based routing
BackendClient.fs # HTTP client for external OpenAI-compatible LLM endpoints
ExperienceStore.fs # Append-only experience log, strategy lessons, managed store
CostAwareRouting.fs # Cost-aware routing with budget tracking
AgentSession.fs # Agent session lifecycle, save/load, orchestration checkpoints
FanOutOrchestration.fs # Fan-out/fan-in task decomposition and aggregation
ContextAwareness.fs # Convention file discovery, language/framework detection, git context
SemanticKnowledge.fs # WordNet WNDB parser, token-to-synset mapping, multi-lingual
DataAugmentation.fs # Synonym replacement, paraphrasing, token noise, oversampling
VerifierRuntime.fs # Verifier loading and runtime integration
MinionTools.fs # Sandboxed local tools (read/list/grep/write/replace/run) for delegated errands
MinionBrain.fs # Swappable minion brain: Fuuga checkpoint, GGUF (LLamaSharp), or OpenAI endpoint
MinionEscalation.fs # Escalation contract: self-verify + uncertainty/budget signals -> hand back to master
MinionToolEnv.fs # Minion reach beyond local files: built-in web tools and configured MCP servers
MinionAgent.fs # Delegate loop: tool-use rounds, structured DelegateResult, escalation verdict
MinionCli.fs # Shared minion flag parsing (brain, sandbox, reach, identity, escalation)
Eval.fs # Benchmark datasets, runners, result serialization
Program.fs # CLI entry point with subcommand routing (incl. `delegate`)

Fuuga.Server/ # OpenAI-compatible HTTP server (separate project)
 ApiTypes.fs # Request/response types (OpenAI-compatible)
 McpToolRouting.fs # Server-side MCP tool routing for function calling
 FuugaChatClient.fs # IChatClient adapter for Microsoft AI ecosystem
 FuugaEmbeddingGenerator.fs # IEmbeddingGenerator adapter for /v1/embeddings
 A2AServer.fs # A2A protocol server endpoint
 GuardRails.fs # Prompt injection detection, PII masking, content filtering
 DynamicBatchingServer.fs # Dynamic batching with HTTP/SSE integration
 Server.fs # Oxpecker HTTP server with SSE streaming, auth middleware
 MinionServer.fs # Minion MCP server (stdio JSON-RPC): fuuga_delegate + local tools
 Program.fs # Server entry point (incl. `mcp` minion mode)

Fuuga.Image/ # Standalone image generation and captioning CLI
 Types.fs # Domain types, error handling (ImageError DU)
 Config.fs # CLI argument parsing
 ImageIO.fs # Image load/save, format conversion, validation
 Diffusion.fs # Stable Diffusion model wrapper (txt2img, img2img)
 Caption.fs # Phi-3.5-vision captioning (ONNX Runtime GenAI)
 McpServer.fs # MCP JSON-RPC 2.0 server over stdio
 Program.fs # Entry point, subcommand routing

Fuuga.Tests/ # Unit and integration tests (xUnit + FsUnit)
Fuuga.Image/Fuuga.Image.Tests/ # Image module tests
docs/ # User-facing documentation
examples/ # Runnable F# script examples
scripts/ # Training data generation and validation scripts

Documentation

The docs/ and examples/ directories provide:

End-to-end walkthrough from build to text generation
CLI reference: all subcommands, flags, defaults, and exit codes
Configuration reference: model, training, generation, and fine-tuning parameters explained
Fuuga vs .NET ecosystem comparison
Guide to adding new natural languages via the KnownNaturalLanguages registry
Image generation CLI reference and MCP server mode
F# script demonstrating the full API
F# script covering weight transfer, prompt/LoRA tuning, RL, evaluation, ONNX export, and serving
F# script demonstrating image generation and captioning

Architecture

Fuuga uses a layered module architecture with strict dependency ordering enforced by F#'s compilation model:

Layer 0: Types, Logging, Observability (foundation, no dependencies)
 DriftDetection, DriftAlerting (statistical drift monitoring, OTel alerts)
Layer 1: Config, Validation, Scaling (configuration, validation, heuristics)
 ConfigWizard (hardware-aware config generation)
Layer 2: Tokenizer, ParquetIO (BPE training/loading, Parquet dataset I/O)
Layer 3: Ingest, CorpusCompression (document discovery, corpus writing, Zstd compression)
Layer 4: Tensor, MultiResolutionAttention (device selection, chunk pooling + global tokens)
 Model, AttentionConfig, AutoConfig (transformer with MLA, FlashAttention, auto-resolution)
 Vision, VisionBridge (vision encoder, Q-Former bridge)
 PagedAttention, MemoryHierarchy (paged KV-cache, compressed memory)
 PersistentRetrievalStore (disk-backed retrieval for cross-session use)
 ConfidenceHead, EarlyExit (calibration, adaptive depth)
Layer 5: Optimizer, Checkpoint, MmapLoading (optimizer variants incl. INT8 moments, save/load/metadata, mmap loading)
 GradientCheckpointing (activation recomputation)
 DistributedTraining, ModelParallelism (PyTorch/DeepSpeed, tensor/pipeline parallel)
 GpuOffloading, OptimizerOffload (CPU/GPU memory management)
 NvmePaging, OnnxExport (NVMe paging, ONNX export with quantization)
 DataMixture, VramGuard (data mixing, in-process VRAM monitoring)
Layer 6: Training, FineTuningData, FineTuning (training loop, SFT/DPO data, LoRA training)
 RewardFunctions, DataValidation (composable RL rewards, JSONL validation)
 Fp8Dequantization, Nf4Quantizer (FP8/NF4 quantization support, GPU LUT, STE for QAT)
 QLoraTraining (QLoRA: NF4 base + LoRA adapters)
 WeightTransfer, ModelMerge (donor model transfer, N-ary merging)
 AttentionDistillation (MLA-to-GQA distillation)
 Pruning, CompressionPipeline (structured pruning, prune→finetune→quantize)
Layer 7: ConstrainedDecoding, ChainOfThought (generation extensions)
 Verifier, ToolPolicy, McpClient (verifier strategies, tool policy, tool calling)
 WebSearch, ImageRouting, A2AClient (search, image routing, A2A protocol)
 TreeSpeculation, ContinuousBatching (speculation, batch scheduling)
 Inference (text generation with sampling, tools, structured output)
 DraftAndRefine, AdvancedReasoning (multi-pass reasoning, consensus/tree-of-thoughts)
 Orchestrator, BackendClient (multi-model routing, external LLM client)
 ExperienceStore, CostAwareRouting (cross-session persistence, cost optimization)
 AgentSession, FanOutOrchestration (agent lifecycle, parallel task decomposition)
Layer 8: ContextAwareness, SemanticKnowledge (project context, WordNet)
 DataAugmentation (synonym replacement, paraphrasing, token noise)
 VerifierRuntime (verifier loading and runtime integration)
 Eval (benchmark datasets, runners, result serialization)
Layer 9: Program (CLI entry point, subcommand routing)

More about design decisions can be found in the project documentation.

Running Tests

dotnet test Fuuga.Tests

Tests cover tokenization, ingestion, Parquet I/O, corpus compression, model architecture, MLA attention, vision encoder, vision bridge, paged attention, multi-resolution attention, attention configuration, early exit, training, optimizer variants, INT8 optimizer moments, gradient checkpointing, GPU offloading, optimizer offloading, ONNX export, inference, verifier-guided generation, speculative decoding, checkpoints, memory-mapped loading, fine-tuning, prompt tuning, QLoRA, reinforcement learning, reward functions, data validation, data augmentation, benchmark evaluation, FP8 dequantization, FP8 GPU LUT dequantization, NF4 quantization, STE quantization-aware training, structured pruning, compression pipeline, constrained decoding, chain-of-thought, MCP client, A2A client/server, web search, image routing, draft-and-refine, advanced reasoning, orchestrator, backend client, cost-aware routing, fan-out orchestration, weight transfer, attention distillation, model merging, distributed training, model parallelism, scaling, validation, continuous batching, dynamic batching server, experience persistence, agent session management, persistent retrieval, drift detection, drift alerting, context awareness, semantic knowledge, observability, guard rails, memory strategy (gradient offloading, per-param flush, VRAM guard lifecycle, ConfigWizard recommendations, multi-GPU parallelism), and CLI integration. Tests use xUnit with FsUnit assertions and include both unit tests and end-to-end integration tests.

Technology Stack

Component	Library	Purpose
Tensors & GPU	TorchSharp 0.106.0	Tensor operations, CUDA support
LibTorch	libtorch-cpu 2.10.0	LibTorch CPU backend
Tokenization	Microsoft.ML.Tokenizers 2.0.0	BPE tokenizer training
ONNX Runtime	Microsoft.ML.OnnxRuntime 1.24.3	ONNX model inference
ONNX Export	OnnxSharp 0.3.2	ONNX model construction and manipulation
Protobuf	Google.Protobuf 3.34.0	Protobuf serialization for ONNX
Epub parsing	VersOne.Epub 3.3.4	Extract text from epub files
Markdown	Markdig 1.1.1	Parse markdown to plain text
Parquet	Parquet.Net 5.5.0	HuggingFace-compatible dataset I/O
Compression	ZstdSharp.Port 0.8.7	Zstandard corpus compression
Statistics	MathNet.Numerics.FSharp 5.0.0	Drift detection (KS test, PSI)
Logging	Serilog 4.3.0	Structured logging
Logging sinks	Serilog.Sinks.Console 6.0.0, .File 6.0.0, .OpenTelemetry 4.2.0	Console, file, and OTel log sinks
Logging bridge	Serilog.Extensions.Logging 9.0.0	Serilog/Microsoft.Extensions.Logging bridge
JSON	FSharp.SystemTextJson 1.4.36	F# DU-aware serialization
Telemetry	OpenTelemetry 1.11.2	Distributed tracing and metrics
Telemetry export	OpenTelemetry.Exporter.OpenTelemetryProtocol 1.11.2	OTLP protocol export
Telemetry hosting	OpenTelemetry.Extensions.Hosting 1.11.2	OpenTelemetry hosting integration
Terminal UI	Spectre.Console 0.49.1	Rich terminal output
HuggingFace	TorchSharp.PyBridge 1.4.3	HuggingFace weight loading
SIMD	System.Numerics.Tensors 10.0.5	Preprocessing acceleration
AI abstractions	Microsoft.Extensions.AI 10.4.0	IChatClient adapter
AI evaluation	Microsoft.Extensions.AI.Evaluation 10.4.0	Benchmark evaluation framework
MCP	ModelContextProtocol 1.1.0	MCP client SDK for tool calling
A2A	A2A 0.3.3-preview	Agent-to-Agent protocol client
Testing	xUnit 2.9.3 + FsUnit.xUnit 6.0.1	Unit and integration tests
Server: HTTP	Oxpecker 2.0.0	F# HTTP server framework
Server: A2A	A2A.AspNetCore 0.3.3-preview	A2A protocol server
Image: Diffusion	StableDiffusion.NET 5.0.0	Stable Diffusion model wrapper
Image: Captioning	Microsoft.ML.OnnxRuntimeGenAI 0.12.1	Phi-3.5-vision captioning
Image: Processing	HPPH.SkiaSharp 1.0.0	Image load/save, format conversion

License

See the project's license for details.

Product	Compatible and additional computed target framework versions
.NET	net10.0 is compatible. net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, and net10.0-windows were computed.

net10.0
- A2A (>= 0.3.3-preview)
- FSharp.Core (>= 10.1.301)
- FSharp.SystemTextJson (>= 1.4.36)
- Google.Protobuf (>= 3.35.1)
- libtorch-cuda-12.8-win-x64 (>= 2.10.0)
- LLamaSharp (>= 0.27.0)
- LLamaSharp.Backend.Cpu (>= 0.27.0)
- Markdig (>= 1.2.0)
- MathNet.Numerics.FSharp (>= 5.0.0)
- Microsoft.Extensions.AI (>= 10.7.0)
- Microsoft.Extensions.AI.Evaluation (>= 10.7.0)
- Microsoft.Extensions.AI.Evaluation.Quality (>= 10.7.0)
- Microsoft.Extensions.AI.Evaluation.Reporting (>= 10.7.0)
- Microsoft.ML.OnnxRuntime (>= 1.26.0)
- Microsoft.ML.Tokenizers (>= 2.0.0)
- ModelContextProtocol (>= 1.4.0)
- OnnxSharp (>= 0.3.2)
- OpenTelemetry (>= 1.16.0)
- OpenTelemetry.Exporter.OpenTelemetryProtocol (>= 1.16.0)
- OpenTelemetry.Extensions.Hosting (>= 1.16.0)
- Parquet.Net (>= 6.0.3)
- Serilog (>= 4.3.1)
- Serilog.Extensions.Logging (>= 10.0.0)
- Serilog.Sinks.Console (>= 6.1.1)
- Serilog.Sinks.File (>= 7.0.0)
- Serilog.Sinks.OpenTelemetry (>= 4.2.0)
- Spectre.Console (>= 0.57.0)
- System.Numerics.Tensors (>= 10.0.9)
- TorchSharp (>= 0.107.0)
- TorchSharp.PyBridge (>= 1.4.3)
- VersOne.Epub (>= 3.3.6)
- ZstdSharp.Port (>= 0.8.8)

NuGet packages

This package is not used by any NuGet packages.

GitHub repositories

This package is not used by any popular GitHub repositories.

Version	Downloads	Last Updated
1.0.7	91	6/12/2026
1.0.6	97	6/2/2026
1.0.5	131	4/8/2026
1.0.4	109	4/2/2026
1.0.3	106	4/1/2026
1.0.2	112	3/31/2026
1.0.1	107	3/25/2026
1.0.0	103	3/24/2026

URL: https://www.nuget.org/packages/Fuuga/
