Qwen3-0.6B MoE-Prune Checkpoints

This repo collects artifacts from the MoE-Prune research track (knowledge-parameter offloading via FFN neuron selection). It is a companion to hyunseoki/qwen3-0.6b-lambda-gates-chat, which carries the 4 published λ-gate variants; this repo holds the broader set of experiments.

Layout

lambda_gates/<name>/ — learned λ-gating checkpoints (full training runs). Each dir: lambda_logits.pt, neuron_indices.json, gate_stats.json, selected_thresholds.txt. Notable names: lambda_chat_energy*, lambda_chat_mean*, lambda_nke_optA_w0p05*, lambda_nke_optB_lf1p0*, lambda_stage1_qwen3_0p6b_*.
saliency_masks/<name>/ — training-free saliency masks (Taylor / Fisher / weight-magnitude / random), synthesized into the same lambda_logits.pt threshold-0.5 format. 70+ variants spanning taylor_{forget,retain,diff,ratio}_k*, fisher_{diff,ratio}_k*, weight_mag_k*, random_k* at budgets 0.05–5 % (per-layer and global).
smoke_matrix/<Vxx>/ — V0..V77 smoke-matrix gates from the gate-parameterization grid search. Each: lambda_logits.pt + neuron_indices.json + training_summary.json. Top-level aggregate.json / trajectories.csv summarize the run.
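One way to sanity-check a local copy against the layout above is to walk the three top-level directories and report any variant that is missing an expected file. This is a minimal sketch, assuming the repo has already been downloaded to a local directory. The names find_incomplete_variants and EXPECTED_FILES are illustrative and not part of the repo, and the saliency_masks entry assumes only lambda_logits.pt is required, since that is the only file named for it.

```python
from pathlib import Path

# Files each variant directory is expected to contain, per the layout above.
# The saliency_masks entry is an assumption: only lambda_logits.pt is named.
EXPECTED_FILES = {
    "lambda_gates": ["lambda_logits.pt", "neuron_indices.json",
                     "gate_stats.json", "selected_thresholds.txt"],
    "saliency_masks": ["lambda_logits.pt"],
    "smoke_matrix": ["lambda_logits.pt", "neuron_indices.json",
                     "training_summary.json"],
}


def find_incomplete_variants(repo_root):
    """Return {"<group>/<variant>": [missing files]} for incomplete variants."""
    root = Path(repo_root)
    incomplete = {}
    for group, expected in EXPECTED_FILES.items():
        group_dir = root / group
        if not group_dir.is_dir():
            continue  # group not downloaded; nothing to check
        # Only subdirectories count as variants; top-level files such as
        # aggregate.json or trajectories.csv are skipped.
        for variant in sorted(p for p in group_dir.iterdir() if p.is_dir()):
            missing = [f for f in expected if not (variant / f).is_file()]
            if missing:
                incomplete[f"{group}/{variant.name}"] = missing
    return incomplete
```

Calling find_incomplete_variants(".") from the root of a local copy returns an empty dict when every downloaded variant is complete.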

Format

lambda_logits.pt is a dict {layer_idx: tensor[intermediate_dim]} covering Qwen3-0.6B's FFN space (28 layers, intermediate dimension 3072). Pass it directly to scripts/eval_rq1.py via --lambda_checkpoint <path>. A neuron whose logit is below 0 (equivalently, σ(logit) < 0.5, where σ is the sigmoid) is treated as knowledge and ablated at inference.
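The thresholding rule can be illustrated with a small sketch. It works on plain Python lists so that it runs without PyTorch. The function names are hypothetical and this is not the logic of scripts/eval_rq1.py, only an illustration of the rule as stated: logit < 0 is equivalent to σ(logit) < 0.5, and a logit of exactly 0 is kept.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def ablated_neurons(lambda_logits):
    """Map each layer index to the neuron indices with logit < 0.

    `lambda_logits` is a dict {layer_idx: sequence of floats}, mirroring the
    checkpoint layout described above.
    """
    return {
        layer: [i for i, logit in enumerate(logits) if logit < 0]
        for layer, logits in lambda_logits.items()
    }


def ablation_fraction(lambda_logits):
    """Fraction of all FFN neurons that would be ablated."""
    total = sum(len(v) for v in lambda_logits.values())
    ablated = sum(len(v) for v in ablated_neurons(lambda_logits).values())
    return ablated / total if total else 0.0


# Toy example with 2 layers of 4 neurons (the real file is 28 x 3072).
toy = {0: [2.0, -1.5, 0.0, -0.1], 1: [3.0, 3.0, -4.0, 1.0]}
mask = ablated_neurons(toy)
```

For a real checkpoint, and assuming the file loads as a plain dict of tensors, something like `{k: v.tolist() for k, v in torch.load(path, map_location="cpu").items()}` would produce the input these functions expect.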

Provenance

Pinned commit and branch information is recorded in each variant's run_meta.json, where present. Per-variant eval artifacts (predictions JSONL snapshots) are available at hyunseoki/mjoint-eval-artifacts.
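Because run_meta.json is only present for some variants, it can help to gather whatever exists in one pass. The sketch below assumes a local copy of the repo; collect_run_meta is a hypothetical helper, and it makes no assumption about which keys the files contain.

```python
import json
from pathlib import Path


def collect_run_meta(repo_root):
    """Return {relative variant dir: parsed run_meta.json} for a local copy."""
    root = Path(repo_root)
    metas = {}
    # rglob finds run_meta.json at any depth; variants without one are
    # simply absent from the result.
    for meta_path in sorted(root.rglob("run_meta.json")):
        with meta_path.open(encoding="utf-8") as fh:
            key = meta_path.parent.relative_to(root).as_posix()
            metas[key] = json.load(fh)
    return metas
```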

URL: https://huggingface.co/hyunseoki/qwen3-0.6b-moe-prune-checkpoints