dotnet add package Fuuga --version 1.0.7
NuGet\Install-Package Fuuga -Version 1.0.7
<PackageReference Include="Fuuga" Version="1.0.7" />
Directory.Packages.props:
<PackageVersion Include="Fuuga" Version="1.0.7" />
Project file:
<PackageReference Include="Fuuga" />
paket add Fuuga --version 1.0.7
#r "nuget: Fuuga, 1.0.7"
#:package Fuuga@1.0.7
#addin nuget:?package=Fuuga&version=1.0.7 (install as a Cake addin)
#tool nuget:?package=Fuuga&version=1.0.7 (install as a Cake tool)
Tired of paying for tokens? Think you could train a better model? Well, now you can try.
An LLM built from scratch in F# and .NET. Fuuga implements a complete language model pipeline: tokenization, data ingestion, model training, fine-tuning, and text generation -- with no Python dependencies.
Built on TorchSharp for tensor operations and Microsoft.ML.Tokenizers for BPE, Fuuga uses idiomatic F# (discriminated unions, pipelines, immutability) throughout. Works on GPU or CPU.
Core Pipeline:
- .fuge corpus files
- Depth); adds a weighted multi-depth loss during training for better sample efficiency and powers MTP-drafted speculative decoding for faster generation. Enable from the CLI with train --mtp-depth <N> [--mtp-loss-weight <f>], or set MtpConfig in the model-config JSON
- --moment-quant int8)
- --grad-offload)
- --flush-each-param)
- --vram-guard-gb <float>)
- MemoryStrategyConfigs.none, .constrained (grad-offload + flush), and .full (all three with VRAM guard at 95%), with automatic ConfigWizard recommendations based on the model-to-VRAM ratio
- --backend onnx)
- torch.index_select) auto-selected when CUDA is available
- tokenize, ingest, train, infer, info, sft, dpo, rl, merge, transfer, distill, merge-models, fisher, distributed, export onnx, compress, decompress, prune, eval, config, wordnet, serve, orchestrate, agent)
Fine-Tuning:
Weight Transfer and Model Merging:
Inference Capabilities:
Orchestration:
Agentic Persistence:
Model Compression:
- --ste)
Server:
- fuuga-serve project with /v1/chat/completions, /v1/completions, /v1/models, and /v1/embeddings endpoints
Distributed Training:
Image Generation (Fuuga.Image):
Local Minion (Delegation):
- fuuga-serve mcp) -- Exposes a fuuga_delegate tool plus local file tools to a master over stdio JSON-RPC; register several with distinct --name/--role to run a fleet of specialists (e.g. F# coder, C# coder, project manager)
- --checkpoint), an in-process GGUF via LLamaSharp (--gguf), or any OpenAI-compatible endpoint such as a local Ollama (--endpoint)
- read_file, list_dir, grep, plus permission-gated write_file, replace_in_file, and run_command; confined to a workspace --root (resists .. and symlink escapes), with writes/shell off by default
- --web) and configured MCP servers (--mcp-config) so a delegated errand can fetch pages and call other tools, not just touch the filesystem
- {status, output, files_changed, escalate, reason, confidence}; the minion self-verifies and hands work back (escalate=true) when it is not confident, so the master only spends its own capacity when needed
- fuuga delegate) -- Run a single errand locally and print the JSON result, without a master
Observability:
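The Local Minion tools are described as confined to a workspace --root that resists .. and symlink escapes. The F# sketch below shows one way such a containment check can work. It assumes .NET 6+ for FileSystemInfo.ResolveLinkTarget. The function name isInsideRoot is hypothetical and is not Fuuga's MinionTools API. Treat it as an illustration rather than a hardened sandbox, because it only follows a link at the final path component.

```fsharp
open System
open System.IO

/// Hypothetical helper (not Fuuga's MinionTools API): returns true when
/// `requested` stays inside `root` after ".." segments are collapsed and a
/// symlink at the final path component is followed to its target.
let isInsideRoot (root: string) (requested: string) : bool =
    let sep = string Path.DirectorySeparatorChar
    // Normalize the root and give it a trailing separator so "/work/app"
    // does not also match the sibling directory "/work/app-evil".
    let rootFull = Path.TrimEndingDirectorySeparator(Path.GetFullPath(root)) + sep
    // Path.Combine keeps absolute requests as-is; GetFullPath collapses "..".
    let candidate = Path.GetFullPath(Path.Combine(rootFull, requested))
    // If the final component is an existing link, judge it by its final target.
    let info: FileSystemInfo =
        if Directory.Exists candidate then DirectoryInfo(candidate) :> FileSystemInfo
        else FileInfo(candidate) :> FileSystemInfo
    let resolved =
        match (if info.Exists then info.ResolveLinkTarget(true) else null) with
        | null -> candidate
        | target -> Path.GetFullPath(target.FullName)
    let comparison =
        if OperatingSystem.IsWindows() then StringComparison.OrdinalIgnoreCase
        else StringComparison.Ordinal
    (Path.TrimEndingDirectorySeparator(resolved) + sep).StartsWith(rootFull, comparison)
```

A production sandbox would also resolve links in intermediate directories and handle dangling links. Both are left out here to keep the idea visible.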
For most users, the fastest path to useful output is to start from donor weights rather than training from scratch.
Recommended paths:
- examples/donor-transfer-and-refine.fsx -- practical donor-first workflow for most users, with two modes:
  - SmokeTest -- for limited hardware; uses a very small donor just to prove the F# pipeline works end to end
  - Practical -- for a few-GB donor model that gives much better output quality
- examples/complete-pipeline.fsx -- educational train-from-scratch pipeline
- examples/advanced-scenarios.fsx -- high-level tour of advanced Fuuga capabilities

If you want to understand the full pipeline from scratch, use the CLI below:
# Build (CPU):
dotnet build
# Build (GPU, ~2 GB dependency):
dotnet build -p:TorchBackend=cuda
# Train a tokenizer, ingest a corpus, train, and generate text
dotnet run -- tokenize --input data/raw --vocab-size 8000 --output data/tokenizer
dotnet run -- ingest --input data/raw --output data/corpus.fuge --tokenizer data/tokenizer
dotnet run -- train --corpus data/corpus.fuge --tokenizer data/tokenizer --checkpoint-dir checkpoints/
dotnet run -- infer --checkpoint checkpoints/step-100 --tokenizer data/tokenizer --prompt "Once upon a time"
See the for a complete end-to-end walkthrough.
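The tokenize step above trains a byte-pair-encoding (BPE) vocabulary, and Fuuga relies on Microsoft.ML.Tokenizers for this. To show the core idea, here is a small self-contained F# sketch of BPE training: count adjacent symbol pairs, merge the most frequent pair, and repeat. It is an illustration only, not Fuuga's or Microsoft.ML.Tokenizers' implementation. It starts from characters rather than bytes, and the names pairCounts, mergePair, and trainBpe are hypothetical.

```fsharp
/// Count adjacent symbol pairs across all words, weighted by word frequency.
let pairCounts (words: (string list * int) list) : ((string * string) * int) list =
    words
    |> List.collect (fun (symbols, freq) -> symbols |> List.pairwise |> List.map (fun pair -> pair, freq))
    |> List.groupBy fst
    |> List.map (fun (pair, hits) -> pair, List.sumBy snd hits)

/// Replace each left-to-right occurrence of the pair (a, b) with one symbol a + b.
let rec mergePair (a: string, b: string) (symbols: string list) : string list =
    match symbols with
    | x :: y :: rest when x = a && y = b -> (a + b) :: mergePair (a, b) rest
    | x :: rest -> x :: mergePair (a, b) rest
    | [] -> []

/// Learn up to `numMerges` merge rules from a (word, frequency) table.
let trainBpe (numMerges: int) (corpus: (string * int) list) : (string * string) list =
    // Start from single characters (byte-level BPE would start from bytes).
    let initial = corpus |> List.map (fun (word, freq) -> (word |> Seq.map string |> List.ofSeq), freq)
    let rec loop remaining words merges =
        if remaining = 0 then List.rev merges
        else
            match pairCounts words with
            | [] -> List.rev merges // every word is already a single symbol
            | counts ->
                // Greedy step: merge the most frequent pair (the first one wins ties).
                let best = counts |> List.maxBy snd |> fst
                let merged = words |> List.map (fun (symbols, freq) -> mergePair best symbols, freq)
                loop (remaining - 1) merged (best :: merges)
    loop numMerges initial []
```

For example, `trainBpe 5 [ "aab", 3; "ab", 2 ]` learns ("a", "b") first, then ("a", "ab"). It stops early because no pairs remain. A --vocab-size target works the same way: it simply bounds how many merges are learned.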
Important expectation setting:
Use .fuge for tokenized corpus files and .fuuga for portable model packages.
Prefer the F# API over the CLI?
Practical donor-first path:
dotnet build
dotnet fsi examples/donor-transfer-and-refine.fsx
This loads donor weights, runs transfer into a Fuuga model, evaluates prompt outputs, and can do a short refinement pass. It is the recommended starting point for users who want useful results on limited hardware or with a few-GB donor model.
Train-from-scratch path:
dotnet build
dotnet fsi examples/complete-pipeline.fsx
This trains a tokenizer, ingests data, trains a model, and generates text -- all using the Fuuga modules directly. See for the full source.
For advanced workflows -- weight transfer from Phi-3/LLaMA3/DeepSeek, LoRA and prompt tuning, benchmark evaluation, ONNX export, verifier-assisted inference, constrained decoding, chain-of-thought, streaming inference, and serving via the OpenAI-compatible API -- see .
For image generation and captioning, see .
Fuuga.Image is a standalone CLI for image generation and captioning. See the for full command reference and MCP server mode, or run .
Fuuga.fsproj # Project file with layered compilation order
Types.fs # All shared types (ModelConfig, TrainingConfig, GenerationConfig, etc.)
Logging.fs # ActivitySource/Meter definitions, ILoggerFactory
Observability.fs # OpenTelemetry providers, Spectre.Console, --observe flag
DriftDetection.fs # Statistical drift monitoring (KS, PSI) over confidence signals
DriftAlerting.fs # Dual-threshold alerts, adaptive thresholds, OTel metrics, retraining triggers
Config.fs # JSON config loading, CLI arg parsing, MCP config, LoRA target parsing
Validation.fs # Input validation pipeline with composable validators
Scaling.fs # Scaling heuristics from corpus and hardware stats
ConfigWizard.fs # Corpus analysis + hardware-aware config generation (Chinchilla scaling)
Tokenizer.fs # BPE tokenizer training and loading
ParquetIO.fs # HuggingFace Parquet dataset read/write (Document, SFT, DPO)
TextCleanup.fs # Ingestion/preparation text cleanup
RagCleanup.fs # RAG (Retrieval-Augmented Generation) cleanup algorithms
Ingest.fs # Document discovery and binary corpus writing
CorpusCompression.fs # Zstd compression/decompression for .fuge files
Tensor.fs # Device selection (CPU/CUDA), DisposeScope
MultiResolutionAttention.fs # Chunk pooling, global tokens for long context
Model.fs # GPT-2 transformer with RoPE, GQA, RMSNorm, SwiGLU, MLA
AttentionConfig.fs # FlashAttention verification, SDPA backend selection
AutoConfig.fs # Auto-resolution of DU Auto* config cases from hardware probing
Vision.fs # Vision encoder for multimodal inputs
VisionBridge.fs # Q-Former cross-attention bridge for vision-to-language compression
PagedAttention.fs # Paged KV-cache attention
MemoryHierarchy.fs # Compressed memory, external retrieval
PersistentRetrievalStore.fs # Disk-backed IRetrievalStore for cross-session retrieval
ConfidenceHead.fs # Calibrated confidence MLP, Platt scaling, bucket assignment
EarlyExit.fs # Early exit / adaptive depth inference
Optimizer.fs # AdamW, SWA, Lookahead, and INT8 moment-quantized optimizers
Checkpoint.fs # Checkpoint save/load/metadata, safetensors
MmapLoading.fs # Memory-mapped model loading
GradientCheckpointing.fs # Gradient checkpointing for memory-efficient training
DistributedTraining.fs # Distributed training (PyTorch/DeepSpeed export/import)
ModelParallelism.fs # Tensor/pipeline parallelism config for 70B+ models
GpuOffloading.fs # Layer-wise CPU/GPU offloading
OptimizerOffload.fs # Optimizer state offloading
NvmePaging.fs # ZeRO-Infinity 3-tier GPU/CPU/NVMe memory management
OnnxExport.fs # ONNX export with quantization and validation
DataMixture.fs # N-ary weighted data source mixing
VramGuard.fs # In-process VRAM monitoring with nvidia-smi polling, signal-based training loop integration
Training.fs # Training loop with AdamW/cosine LR, gradient offloading, per-param flush
FineTuningData.fs # SFT/DPO JSONL parsing, chat templates, tokenization, batching
FineTuning.fs # LoRA (LoraLinear), SFT training, DPO loss/training, adapter save/load
RewardFunctions.fs # Composable reward functions for RL training
DataValidation.fs # SFT/DPO JSONL validation, honesty pattern classification
Fp8Dequantization.fs # FP8 format dequantization with GPU LUT acceleration
Nf4Quantizer.fs # NF4/FP4 4-bit weight quantization, STE for QAT
QLoraTraining.fs # QLoRA training (NF4 base + LoRA adapters)
WeightTransfer.fs # Weight transfer from donor models (Phi-3, LLaMA3, Gemma-3/4, DeepSeek mappings)
ModelMerge.fs # N-ary model merging (TIES, DARE, Karcher) with EWC protection
AttentionDistillation.fs # MLA-to-GQA attention distillation (KL + MSE loss)
Pruning.fs # Structured pruning (attention heads, layers)
CompressionPipeline.fs # Prune → finetune → quantize orchestration
ConstrainedDecoding.fs # Grammar-guided JSON constrained decoding
ChainOfThought.fs # ThinkStart/ThinkEnd token handling, phase tracking
Verifier.fs # Rule-based and learned verifier strategies
ToolPolicy.fs # Confidence-aware tool invocation policy
McpClient.fs # MCP client: connection, tool discovery, tool invocation
WebSearch.fs # Web search integration for grounded generation
ImageRouting.fs # Image query routing to Fuuga.Image
A2AClient.fs # A2A protocol client for agent-to-agent communication
TreeSpeculation.fs # Tree-structured speculative decoding
ContinuousBatching.fs # Iteration-level continuous batching for serving
Inference.fs # Text generation with sampling, tool-augmented generation, structured output, CoT
DraftAndRefine.fs # Draft-and-refine multi-pass reasoning pipeline
AdvancedReasoning.fs # Consensus voting, verifier-scored, tree-of-thoughts
Orchestrator.fs # Multi-model orchestrator with capability-based routing
BackendClient.fs # HTTP client for external OpenAI-compatible LLM endpoints
ExperienceStore.fs # Append-only experience log, strategy lessons, managed store
CostAwareRouting.fs # Cost-aware routing with budget tracking
AgentSession.fs # Agent session lifecycle, save/load, orchestration checkpoints
FanOutOrchestration.fs # Fan-out/fan-in task decomposition and aggregation
ContextAwareness.fs # Convention file discovery, language/framework detection, git context
SemanticKnowledge.fs # WordNet WNDB parser, token-to-synset mapping, multi-lingual
DataAugmentation.fs # Synonym replacement, paraphrasing, token noise, oversampling
VerifierRuntime.fs # Verifier loading and runtime integration
MinionTools.fs # Sandboxed local tools (read/list/grep/write/replace/run) for delegated errands
MinionBrain.fs # Swappable minion brain: Fuuga checkpoint, GGUF (LLamaSharp), or OpenAI endpoint
MinionEscalation.fs # Escalation contract: self-verify + uncertainty/budget signals -> hand back to master
MinionToolEnv.fs # Minion reach beyond local files: built-in web tools and configured MCP servers
MinionAgent.fs # Delegate loop: tool-use rounds, structured DelegateResult, escalation verdict
MinionCli.fs # Shared minion flag parsing (brain, sandbox, reach, identity, escalation)
Eval.fs # Benchmark datasets, runners, result serialization
Program.fs # CLI entry point with subcommand routing (incl. `delegate`)
Fuuga.Server/ # OpenAI-compatible HTTP server (separate project)
ApiTypes.fs # Request/response types (OpenAI-compatible)
McpToolRouting.fs # Server-side MCP tool routing for function calling
FuugaChatClient.fs # IChatClient adapter for Microsoft AI ecosystem
FuugaEmbeddingGenerator.fs # IEmbeddingGenerator adapter for /v1/embeddings
A2AServer.fs # A2A protocol server endpoint
GuardRails.fs # Prompt injection detection, PII masking, content filtering
DynamicBatchingServer.fs # Dynamic batching with HTTP/SSE integration
Server.fs # Oxpecker HTTP server with SSE streaming, auth middleware
MinionServer.fs # Minion MCP server (stdio JSON-RPC): fuuga_delegate + local tools
Program.fs # Server entry point (incl. `mcp` minion mode)
Fuuga.Image/ # Standalone image generation and captioning CLI
Types.fs # Domain types, error handling (ImageError DU)
Config.fs # CLI argument parsing
ImageIO.fs # Image load/save, format conversion, validation
Diffusion.fs # Stable Diffusion model wrapper (txt2img, img2img)
Caption.fs # Phi-3.5-vision captioning (ONNX Runtime GenAI)
McpServer.fs # MCP JSON-RPC 2.0 server over stdio
Program.fs # Entry point, subcommand routing
Fuuga.Tests/ # Unit and integration tests (xUnit + FsUnit)
Fuuga.Image/Fuuga.Image.Tests/ # Image module tests
docs/ # User-facing documentation
examples/ # Runnable F# script examples
scripts/ # Training data generation and validation scripts
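ModelMerge.fs is listed above as N-ary model merging with TIES, DARE, and Karcher. The sketch below applies the TIES-Merging procedure to flat float arrays:

1. Trim each task vector (fine-tuned minus base) to its largest-magnitude entries.
2. Elect a sign per parameter.
3. Average only the values that agree with the elected sign.

It is not Fuuga's implementation. It ignores tensors, DARE, Karcher, and EWC protection. The names trim and tiesMerge, and the density and lambda parameters, are illustrative assumptions.

```fsharp
/// Keep the `density` fraction of largest-magnitude entries and zero the rest
/// (entries tied with the threshold are all kept).
let trim (density: float) (tau: float[]) : float[] =
    if tau.Length = 0 then tau
    else
        let k = min tau.Length (max 1 (int (ceil (density * float tau.Length))))
        let threshold = (tau |> Array.map abs |> Array.sortDescending).[k - 1]
        tau |> Array.map (fun v -> if abs v >= threshold then v else 0.0)

/// TIES-style merge of several fine-tuned parameter vectors that share `baseW`.
let tiesMerge (density: float) (lambda: float) (baseW: float[]) (finetuned: float[] list) : float[] =
    // 1. Task vectors (fine-tuned minus base), trimmed to their top entries.
    let taus = finetuned |> List.map (fun w -> Array.map2 (-) w baseW |> trim density)
    baseW
    |> Array.mapi (fun i b ->
        let values = taus |> List.map (fun tau -> tau.[i])
        // 2. Elect a sign per parameter from the summed trimmed values.
        let elected = sign (List.sum values)
        // 3. Disjoint mean: average only non-zero values that agree with it.
        let agreeing = values |> List.filter (fun v -> v <> 0.0 && sign v = elected)
        let merged = if List.isEmpty agreeing then 0.0 else List.average agreeing
        b + lambda * merged)
```

Averaging only the agreeing entries keeps opposite-signed updates from cancelling each other out. That interference is the problem TIES was designed to reduce compared with plain weight averaging.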
Fuuga uses a layered module architecture with strict dependency ordering enforced by F#'s compilation model:
Layer 0: Types, Logging, Observability (foundation, no dependencies)
DriftDetection, DriftAlerting (statistical drift monitoring, OTel alerts)
Layer 1: Config, Validation, Scaling (configuration, validation, heuristics)
ConfigWizard (hardware-aware config generation)
Layer 2: Tokenizer, ParquetIO (BPE training/loading, Parquet dataset I/O)
Layer 3: Ingest, CorpusCompression (document discovery, corpus writing, Zstd compression)
Layer 4: Tensor, MultiResolutionAttention (device selection, chunk pooling + global tokens)
Model, AttentionConfig, AutoConfig (transformer with MLA, FlashAttention, auto-resolution)
Vision, VisionBridge (vision encoder, Q-Former bridge)
PagedAttention, MemoryHierarchy (paged KV-cache, compressed memory)
PersistentRetrievalStore (disk-backed retrieval for cross-session use)
ConfidenceHead, EarlyExit (calibration, adaptive depth)
Layer 5: Optimizer, Checkpoint, MmapLoading (optimizer variants incl. INT8 moments, save/load/metadata, mmap loading)
GradientCheckpointing (activation recomputation)
DistributedTraining, ModelParallelism (PyTorch/DeepSpeed, tensor/pipeline parallel)
GpuOffloading, OptimizerOffload (CPU/GPU memory management)
NvmePaging, OnnxExport (NVMe paging, ONNX export with quantization)
DataMixture, VramGuard (data mixing, in-process VRAM monitoring)
Layer 6: Training, FineTuningData, FineTuning (training loop, SFT/DPO data, LoRA training)
RewardFunctions, DataValidation (composable RL rewards, JSONL validation)
Fp8Dequantization, Nf4Quantizer (FP8/NF4 quantization support, GPU LUT, STE for QAT)
QLoraTraining (QLoRA: NF4 base + LoRA adapters)
WeightTransfer, ModelMerge (donor model transfer, N-ary merging)
AttentionDistillation (MLA-to-GQA distillation)
Pruning, CompressionPipeline (structured pruning, prune→finetune→quantize)
Layer 7: ConstrainedDecoding, ChainOfThought (generation extensions)
Verifier, ToolPolicy, McpClient (verifier strategies, tool policy, tool calling)
WebSearch, ImageRouting, A2AClient (search, image routing, A2A protocol)
TreeSpeculation, ContinuousBatching (speculation, batch scheduling)
Inference (text generation with sampling, tools, structured output)
DraftAndRefine, AdvancedReasoning (multi-pass reasoning, consensus/tree-of-thoughts)
Orchestrator, BackendClient (multi-model routing, external LLM client)
ExperienceStore, CostAwareRouting (cross-session persistence, cost optimization)
AgentSession, FanOutOrchestration (agent lifecycle, parallel task decomposition)
Layer 8: ContextAwareness, SemanticKnowledge (project context, WordNet)
DataAugmentation (synonym replacement, paraphrasing, token noise)
VerifierRuntime (verifier loading and runtime integration)
Eval (benchmark datasets, runners, result serialization)
Layer 9: Program (CLI entry point, subcommand routing)
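Layer 7 includes TreeSpeculation, and the Core Pipeline list mentions MTP-drafted speculative decoding. The F# sketch below shows only the simplest greedy acceptance rule for a single draft sequence. It covers neither tree-structured drafts nor the rejection-sampling variant used for stochastic sampling. acceptDraft and targetNext are hypothetical names, not Fuuga's API. targetNext stands in for the large model's argmax prediction.

```fsharp
/// Greedy speculative-decoding acceptance for a single draft branch.
/// `targetNext prefix` is a stand-in for the large model's argmax next token;
/// real implementations score every draft position in one batched forward pass.
let acceptDraft (targetNext: int list -> int) (context: int list) (draft: int list) : int list =
    let rec go prefix accepted remaining =
        let expected = targetNext prefix
        match remaining with
        | d :: rest when d = expected ->
            // The draft agrees with the target model, so the token is kept.
            go (prefix @ [ d ]) (d :: accepted) rest
        | _ ->
            // First disagreement (or draft used up): emit the target's own token
            // and discard the rest of the draft.
            List.rev (expected :: accepted)
    go context [] draft
```

Each verification step yields at least one target-quality token. With greedy decoding, the output is identical to running the target model alone; the speedup comes from accepting several cheap draft tokens per expensive target pass.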
More about design decisions can be read from the .
dotnet test Fuuga.Tests
Tests cover tokenization, ingestion, Parquet I/O, corpus compression, model architecture, MLA attention, vision encoder, vision bridge, paged attention, multi-resolution attention, attention configuration, early exit, training, optimizer variants, INT8 optimizer moments, gradient checkpointing, GPU offloading, optimizer offloading, ONNX export, inference, verifier-guided generation, speculative decoding, checkpoints, memory-mapped loading, fine-tuning, prompt tuning, QLoRA, reinforcement learning, reward functions, data validation, data augmentation, benchmark evaluation, FP8 dequantization, FP8 GPU LUT dequantization, NF4 quantization, STE quantization-aware training, structured pruning, compression pipeline, constrained decoding, chain-of-thought, MCP client, A2A client/server, web search, image routing, draft-and-refine, advanced reasoning, orchestrator, backend client, cost-aware routing, fan-out orchestration, weight transfer, attention distillation, model merging, distributed training, model parallelism, scaling, validation, continuous batching, dynamic batching server, experience persistence, agent session management, persistent retrieval, drift detection, drift alerting, context awareness, semantic knowledge, observability, guard rails, memory strategy (gradient offloading, per-param flush, VRAM guard lifecycle, ConfigWizard recommendations, multi-GPU parallelism), and CLI integration. Tests use xUnit with FsUnit assertions and include both unit tests and end-to-end integration tests.
| Component | Library | Purpose |
|---|---|---|
| Tensors & GPU | TorchSharp 0.106.0 | Tensor operations, CUDA support |
| LibTorch | libtorch-cpu 2.10.0 | LibTorch CPU backend |
| Tokenization | Microsoft.ML.Tokenizers 2.0.0 | BPE tokenizer training |
| ONNX Runtime | Microsoft.ML.OnnxRuntime 1.24.3 | ONNX model inference |
| ONNX Export | OnnxSharp 0.3.2 | ONNX model construction and manipulation |
| Protobuf | Google.Protobuf 3.34.0 | Protobuf serialization for ONNX |
| Epub parsing | VersOne.Epub 3.3.4 | Extract text from epub files |
| Markdown | Markdig 1.1.1 | Parse markdown to plain text |
| Parquet | Parquet.Net 5.5.0 | HuggingFace-compatible dataset I/O |
| Compression | ZstdSharp.Port 0.8.7 | Zstandard corpus compression |
| Statistics | MathNet.Numerics.FSharp 5.0.0 | Drift detection (KS test, PSI) |
| Logging | Serilog 4.3.0 | Structured logging |
| Logging sinks | Serilog.Sinks.Console 6.0.0, .File 6.0.0, .OpenTelemetry 4.2.0 | Console, file, and OTel log sinks |
| Logging bridge | Serilog.Extensions.Logging 9.0.0 | Serilog/Microsoft.Extensions.Logging bridge |
| JSON | FSharp.SystemTextJson 1.4.36 | F# DU-aware serialization |
| Telemetry | OpenTelemetry 1.11.2 | Distributed tracing and metrics |
| Telemetry export | OpenTelemetry.Exporter.OpenTelemetryProtocol 1.11.2 | OTLP protocol export |
| Telemetry hosting | OpenTelemetry.Extensions.Hosting 1.11.2 | OpenTelemetry hosting integration |
| Terminal UI | Spectre.Console 0.49.1 | Rich terminal output |
| HuggingFace | TorchSharp.PyBridge 1.4.3 | HuggingFace weight loading |
| SIMD | System.Numerics.Tensors 10.0.5 | Preprocessing acceleration |
| AI abstractions | Microsoft.Extensions.AI 10.4.0 | IChatClient adapter |
| AI evaluation | Microsoft.Extensions.AI.Evaluation 10.4.0 | Benchmark evaluation framework |
| MCP | ModelContextProtocol 1.1.0 | MCP client SDK for tool calling |
| A2A | A2A 0.3.3-preview | Agent-to-Agent protocol client |
| Testing | xUnit 2.9.3 + FsUnit.xUnit 6.0.1 | Unit and integration tests |
| Server: HTTP | Oxpecker 2.0.0 | F# HTTP server framework |
| Server: A2A | A2A.AspNetCore 0.3.3-preview | A2A protocol server |
| Image: Diffusion | StableDiffusion.NET 5.0.0 | Stable Diffusion model wrapper |
| Image: Captioning | Microsoft.ML.OnnxRuntimeGenAI 0.12.1 | Phi-3.5-vision captioning |
| Image: Processing | HPPH.SkiaSharp 1.0.0 | Image load/save, format conversion |
See for details.
| Product | Compatible and computed target framework versions |
|---|---|
| .NET | net10.0 is compatible. Computed: net10.0-android, net10.0-browser, net10.0-ios, net10.0-maccatalyst, net10.0-macos, net10.0-tvos, net10.0-windows |