DualMinded-Qwen3-1.7B

A 1.7B-parameter dual-cognition model trained on Opus 4.6 reasoning traces. The model runs a three-phase cognitive loop (explore, examine, respond): it reasons freely, critiques its own reasoning, and then synthesizes a clean answer.

Convergent Intelligence LLC: Research Division

Architecture

<explore> — unconstrained reasoning, derivation, speculation
</explore>

<examine> — adversarial self-critique, error detection, refinement
</examine>

<response> — clean synthesis from the internal dialogue
</response>

This collapses the multi-model collision array into a single architecture. The dialectical structure that the array derives from architectural diversity, and that produces its novel insights, is recreated here through role-conditioned generation on shared weights. There are no extra parameters and no routing: the same weights run in different cognitive modes.
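Role conditioning happens through the prompt. The opening tag that the prompt ends on selects the mode the shared weights run in. The sketch below is a minimal prompt builder. It assumes the `##USER:` format from the Usage section, and it assumes a blank line between a closing tag and the next opening tag, which the card does not specify. `role_prompt` is a hypothetical helper, not part of any released code.

```python
from typing import Optional


def role_prompt(question: str,
                explore: Optional[str] = None,
                examine: Optional[str] = None) -> str:
    """Build a prompt that ends on the opening tag of the next phase to generate.

    Hypothetical helper: the same weights are steered into a different
    cognitive mode only by which tag the prompt ends with.
    """
    prompt = f"##USER:\n{question}\n\n<explore>\n"
    if explore is None:
        return prompt  # the model writes the exploration
    prompt += f"{explore}\n</explore>\n\n<examine>\n"
    if examine is None:
        return prompt  # the model critiques the supplied exploration
    # the model writes the final answer from the supplied internal dialogue
    return prompt + f"{examine}\n</examine>\n\n<response>\n"


# Example: supply a flawed exploration (91 = 7 * 13) and ask only for the critique.
critique_prompt = role_prompt(
    "Is 91 prime?",
    explore="91 is odd and not divisible by 3, so it is prime.",
)
```

Ending the prompt on `<examine>` asks the model to act as the critic of reasoning it did not produce. This is one way to probe the self-critique mode in isolation.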

Training Pipeline

DualMinded-Qwen3-1.7B is the product of a four-stage pipeline:

Stage 1 — Multi-Teacher Distillation: Qwen3-30B-A3B in three variants (Instruct, Thinking, Coder) distilled into Qwen3-1.7B via proof-weighted KD with 2.25× loss amplification on reasoning tokens.
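The card does not spell out the proof-weighted KD loss. The following is a minimal sketch under stated assumptions:

- The loss is a per-token KL(teacher || student).
- A boolean mask `is_reasoning` marks reasoning tokens. How this mask is built is not described, so the mask is hypothetical.
- Reasoning-token terms are multiplied by 2.25 before averaging over tokens.

```python
import math
from typing import List, Sequence

# From the card: reasoning tokens get a 2.25x weight in the distillation loss.
REASONING_WEIGHT = 2.25


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vocabulary row."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def weighted_kd_loss(teacher_logits: Sequence[Sequence[float]],
                     student_logits: Sequence[Sequence[float]],
                     is_reasoning: Sequence[bool]) -> float:
    """Mean over tokens of w_t * KL(teacher_t || student_t).

    w_t = REASONING_WEIGHT where the (hypothetical) is_reasoning mask is True,
    else 1.0. Dividing by the token count, not the weight sum, keeps the
    2.25x factor as a true amplification of the reasoning-token terms.
    """
    total = 0.0
    for t_row, s_row, reasoning in zip(teacher_logits, student_logits, is_reasoning):
        w = REASONING_WEIGHT if reasoning else 1.0
        total += w * kl(softmax(t_row), softmax(s_row))
    return total / len(is_reasoning)


# Two toy tokens over a 3-word vocabulary; only the first is a reasoning token.
teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
student = [[1.5, 0.7, -0.5], [0.1, 0.2, 3.0]]
loss = weighted_kd_loss(teacher, student, is_reasoning=[True, False])
```

Normalizing by the weight sum instead would turn the 2.25 factor into a relative re-weighting that cancels when every token is a reasoning token. The sketch uses the token count for that reason, but the card does not say which choice was made.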

Stage 2 — DISC Refinement (Disctil-Qwen3-1.7B): the student is refined through Discrepancy Calculus (DISC), which detects and preserves structural boundaries in the teacher's distribution.

Stage 3 — Topological Knowledge Distillation (TKD): continuous-stream distillation from Qwen3-30B-A3B-Thinking with topology-guided windowing. The teacher's output is decomposed as a bounded variation (BV) function into smooth, jump, and drift components. Jump positions, flagged at a 3σ threshold, are amplified in the loss. Training windows are cut at low-discrepancy boundaries, and the data follow a 4-phase curriculum ordered from easy to hard.
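The card does not define how the teacher's output is turned into a BV signal, so the sketch below is illustrative only. It makes these assumptions:

- The teacher provides a scalar per-token signal. Per-token entropy would be one choice.
- Increments that deviate from their mean by more than 3σ count as jumps.
- Jump positions are up-weighted by a hypothetical factor `JUMP_WEIGHT`.
- Training windows are cut where the boundary increment is smallest.
- Windows are ordered into four easy-to-hard phases using a hypothetical difficulty score.

All function names are hypothetical. This is not the TKD implementation.

```python
import math
import statistics
from typing import List, Sequence, Tuple

JUMP_SIGMA = 3.0    # from the card: jumps flagged at 3 sigma
JUMP_WEIGHT = 2.0   # hypothetical amplification factor (not stated in the card)
NUM_PHASES = 4      # from the card: 4-phase curriculum


def increments(signal: Sequence[float]) -> List[float]:
    """Discrete increments of a per-token teacher signal (e.g. entropy; an assumption)."""
    return [b - a for a, b in zip(signal, signal[1:])]


def jump_positions(signal: Sequence[float], k: float = JUMP_SIGMA) -> List[int]:
    """Positions whose incoming increment deviates from the mean by more than k sigma."""
    d = increments(signal)
    if len(d) < 2:
        return []
    mu, sigma = statistics.fmean(d), statistics.pstdev(d)
    if sigma == 0:
        return []
    return [i + 1 for i, x in enumerate(d) if abs(x - mu) > k * sigma]


def token_weights(n: int, jumps: Sequence[int], jump_weight: float = JUMP_WEIGHT) -> List[float]:
    """Per-token loss weights: 1.0 everywhere, jump_weight at jump positions."""
    weights = [1.0] * n
    for j in jumps:
        weights[j] = jump_weight
    return weights


def cut_windows(signal: Sequence[float], min_len: int, max_len: int) -> List[Tuple[int, int]]:
    """Greedy [start, end) windows, cutting where the boundary increment is smallest.

    Requires 1 <= min_len <= max_len. Ties go to the latest cut, so windows
    stay as long as allowed.
    """
    n, start, windows = len(signal), 0, []
    while n - start > max_len:
        candidates = range(start + min_len, start + max_len + 1)
        cut = min(candidates, key=lambda c: (abs(signal[c] - signal[c - 1]), -c))
        windows.append((start, cut))
        start = cut
    windows.append((start, n))
    return windows


def difficulty(window: Sequence[float]) -> float:
    """Hypothetical difficulty score: mean absolute increment inside the window."""
    d = increments(window)
    return statistics.fmean(abs(x) for x in d) if d else 0.0


def curriculum(windows: Sequence[Sequence[float]], num_phases: int = NUM_PHASES) -> List[List[int]]:
    """Sort window indices easy -> hard and split them into num_phases groups."""
    order = sorted(range(len(windows)), key=lambda i: difficulty(windows[i]))
    size = math.ceil(len(order) / num_phases)
    return [order[p * size:(p + 1) * size] for p in range(num_phases)]


signal = [0.0] * 10 + [5.0] * 10                      # toy signal with one jump at position 10
jumps = jump_positions(signal)                         # [10]
weights = token_weights(len(signal), jumps)
windows = cut_windows(signal, min_len=4, max_len=12)   # cut lands at 12, not on the jump
```

The cut rule keeps the jump inside a window rather than on a boundary. This is one reading of "windows cut at low-discrepancy boundaries."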

Stage 4 — DualMind SFT on Opus 4.6: supervised fine-tuning on Opus-4.6-Reasoning-3000x-filtered. The dataset's `thinking` column maps directly to <explore>, so no heuristic sentence splitting is needed. The `solution` column is split into <examine> and <response>.
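A minimal sketch of how one dataset row could become a training string. Several parts are assumptions:

- The prompt column name `problem` is hypothetical.
- The rule for splitting `solution` is hypothetical: the last paragraph becomes <response> and the rest becomes <examine>. The card does not say how the split is made.
- The `##USER:` layout is taken from the Usage section.

```python
from typing import Dict, Tuple


def split_solution(solution: str) -> Tuple[str, str]:
    """Hypothetical split rule: the last paragraph becomes <response>,
    everything before it becomes <examine>."""
    parts = solution.strip().rsplit("\n\n", 1)
    if len(parts) == 1:
        return "", parts[0]  # no paragraph break: nothing left to examine
    return parts[0].strip(), parts[1].strip()


def build_example(row: Dict[str, str]) -> str:
    """Map one dataset row onto the explore/examine/response template."""
    examine, response = split_solution(row["solution"])
    return (
        f"##USER:\n{row['problem'].strip()}\n\n"
        f"<explore>\n{row['thinking'].strip()}\n</explore>\n\n"  # thinking -> explore, verbatim
        f"<examine>\n{examine}\n</examine>\n\n"
        f"<response>\n{response}\n</response>"
    )


row = {
    "problem": "What is 17 * 3?",  # hypothetical column name
    "thinking": "17 * 3 = 51.",
    "solution": "Check: 51 / 3 = 17, so the product is consistent.\n\nThe answer is 51.",
}
text = build_example(row)
```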

Training Configuration

Parameter	Value
Base checkpoint	TKD checkpoint-512
Dataset	Opus-4.6-Reasoning-3000x-filtered (50%)
Max seq length	2048
Batch size	2 × 8 accum = 16 effective
Learning rate	5e-6 (cosine)
Warmup	32 steps
Max steps	1024
Precision	BF16
Hardware	NVIDIA H100
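The table implies some arithmetic about the run. The card says only "cosine" for the schedule, so the sketch below assumes the usual shape: a linear warmup followed by a half-cosine decay to zero at the last step, as in `get_cosine_schedule_with_warmup` in Hugging Face transformers.

```python
import math

# Values from the training-configuration table.
PEAK_LR = 5e-6
WARMUP_STEPS = 32
MAX_STEPS = 1024
PER_STEP_BATCH = 2
GRAD_ACCUM = 8
MAX_SEQ_LEN = 2048

EFFECTIVE_BATCH = PER_STEP_BATCH * GRAD_ACCUM          # 16 sequences per optimizer step
MAX_TOKENS_PER_STEP = EFFECTIVE_BATCH * MAX_SEQ_LEN    # upper bound: 32,768 tokens
SEQUENCES_SEEN = EFFECTIVE_BATCH * MAX_STEPS           # 16,384 sequences over the run


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to 0 at MAX_STEPS (assumed shape)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Under this assumption, the rate reaches the peak of 5e-6 at step 32. At step 528, halfway through the decay, it has fallen to 2.5e-6.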

DualMind vs DualMinded

	DualMind	DualMinded
SFT Data	LogicInference_OA	Opus-4.6-Reasoning
Explore Source	Heuristic CoT split	Direct Opus `thinking` column
Strength	Formal logic, structured proofs	Extended reasoning, creative derivation
Base Checkpoint	TKD final	TKD checkpoint-512

Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "reaperdoesntknow/DualMinded-Qwen3-1.7B"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 training precision
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# The prompt opens the <explore> phase; the model continues through
# <examine> and <response>.
prompt = "##USER:\nProve the mean value theorem.\n\n<explore>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.6,
        top_p=0.9,
        repetition_penalty=1.15,
    )
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
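To keep only the final answer, the decoded text can be split on the phase tags. The sketch below assumes the tags survive decoding as plain text. If they are registered as special tokens, decode with `skip_special_tokens=False` instead. `split_phases` is a hypothetical helper.

```python
import re
from typing import Dict, Optional

PHASES = ("explore", "examine", "response")


def split_phases(text: str) -> Dict[str, Optional[str]]:
    """Return the text inside each <phase>...</phase> block, or None if absent."""
    result: Dict[str, Optional[str]] = {}
    for phase in PHASES:
        match = re.search(rf"<{phase}>(.*?)</{phase}>", text, flags=re.DOTALL)
        if match is None:
            # Generation can stop (e.g. at max_new_tokens) before the closing tag.
            match = re.search(rf"<{phase}>(.*)\Z", text, flags=re.DOTALL)
        result[phase] = match.group(1).strip() if match else None
    return result
```

With the example above, `split_phases(tokenizer.decode(out[0], skip_special_tokens=True))["response"]` yields the synthesized answer. If the 512-token budget runs out mid-answer, it yields the partial answer. It yields `None` if generation never reached <response>.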

Ghost Imprinting

Sequential distillation from multiple teachers (Instruct → Thinking → Coder → Opus) leaves residual fields in weight space. These residuals are identified with the singular-continuous component of the bounded variation decomposition applied to the parameter tensors. They produce capabilities absent from any individual teacher. Models in the DualMind family exhibit emergent behaviors (e.g., literary content from physics-only training data) that are attributed to these ghost imprints.

GGUF

Quantized versions available at DualMinded-Qwen3-1.7B-GGUF: F16, Q8_0, Q5_K_M, Q4_K_M.

Ollama: ollama run reaperdoesntrun/DualMinded-1.7B

DualMind — LogicInference-trained variant
DualMind_Methodolgy — Paper: DOI 10.57967/hf/8184
Structure Over Scale — Paper 1: CPU training methodology
DualMind Collection
DistilQwen Collection

Mathematical Foundations: Discrepancy Calculus (DISC)

This model's training pipeline is grounded in Discrepancy Calculus — a measure-theoretic framework that treats singularities as primary structure rather than pathology. Full theory: "On the Formal Analysis of Discrepancy Calculus" (Colca, 2026; Convergent Intelligence LLC: Research Division).

The Core Operator:

For smooth $f$: $Df(x) = |f'(x)|$. For rough $f$: $D$ localizes irregularity to null sets while preserving integral structure.

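The Mesh Fundamental Identity: every BV function decomposes into a smooth part, a jump part, and a Cantor (drift) part. In one dimension, for right-continuous $f \in BV([a,b])$ with jump set $J_f$:

$$
f(b) - f(a) = \underbrace{\int_a^b f'(x)\,dx}_{\text{term 1: smooth}} + \underbrace{\sum_{x \in J_f \cap (a,b]} \big(f(x^+) - f(x^-)\big)}_{\text{term 2: jumps}} + \underbrace{D^c f\big((a,b]\big)}_{\text{term 3: Cantor drift}}
$$
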
Standard knowledge distillation captures only term 1. Topological Knowledge Distillation (TKD) preserves all three by treating the teacher's output distribution as a BV function and computing discrepancy energy, jump sets, and gap energy density before training begins.
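To make the three terms concrete, here is an illustrative worked example on $[0,1]$, not taken from the DISC paper. It uses the standard one-dimensional decomposition of a BV function's increment into absolutely continuous, jump, and Cantor parts, applied to two textbook functions: a ramp plus a unit step, and the Cantor function $c$.

$$
\begin{aligned}
g(x) = x + \mathbf{1}_{[1/2,\,1]}(x):\quad g(1) - g(0) &= 2 = \underbrace{1}_{\int_0^1 g'\,dx} + \underbrace{1}_{\text{jump at } x = 1/2} + \underbrace{0}_{\text{Cantor part}},\\
c = \text{Cantor function}:\quad c(1) - c(0) &= 1 = \underbrace{0}_{\int_0^1 c'\,dx} + \underbrace{0}_{\text{no jumps}} + \underbrace{1}_{D^c c((0,1])}.
\end{aligned}
$$

A view built only on the pointwise derivative recovers 1 of the 2 units of change in $g$ and none of the change in $c$. This shows what is lost when only term 1 is retained.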

Citation

@misc{colca2026dualmind,
 title={From Three Teachers to Dual Cognition: Topology-Aware Multi-Teacher Distillation and Role-Conditioned Self-Critique at 1.7B Scale},
 author={Colca, Roy S.},
 year={2026},
 publisher={HuggingFace},
 url={https://doi.org/10.57967/hf/8184}
}

Convergent Intelligence LLC: Research Division — Apache 2.0

Convergent Intelligence Portfolio

Part of the DualMind Series by Convergent Intelligence LLC: Research Division

DualMind Family

Model	Format	Description
DualMind	BF16	LogicInference-trained. Explore→Examine→Response loop.
DualMinded-Qwen3-1.7B	BF16	Opus 4.6 reasoning traces. Higher quality splits.
Dualmind-Qwen-1.7B-Thinking	BF16	Thinking-teacher variant with extended deliberation.
DualMind-GGUF	GGUF	Quantized LogicInference variant. CPU/6GB GPU.
DualMinded-Qwen3-1.7B-GGUF	GGUF	Quantized Opus variant. Ollama ready.