Qwen3-0.6B MoE-Prune Checkpoints
Artifacts from the MoE-Prune research track (knowledge-parameter-offloading
via FFN neuron selection). Companion to
hyunseoki/qwen3-0.6b-lambda-gates-chat (which carries the 4 published
λ-gate variants); this repo collects the broader set of experiments.
Layout
- lambda_gates/<name>/: learned λ-gating checkpoints (full training runs). Each directory contains lambda_logits.pt, neuron_indices.json, gate_stats.json, and selected_thresholds.txt. Notable names: lambda_chat_energy*, lambda_chat_mean*, lambda_nke_optA_w0p05*, lambda_nke_optB_lf1p0*, lambda_stage1_qwen3_0p6b_*.
- saliency_masks/<name>/: training-free saliency masks (Taylor / Fisher / weight-magnitude / random), synthesized into the same lambda_logits.pt threshold-0.5 format. 70+ variants spanning taylor_{forget,retain,diff,ratio}_k*, fisher_{diff,ratio}_k*, weight_mag_k*, and random_k* at budgets of 0.05–5% (per-layer and global).
- smoke_matrix/<Vxx>/: V0..V77 smoke-matrix gates from the gate-parameterization grid search. Each directory contains lambda_logits.pt, neuron_indices.json, and training_summary.json. The top-level aggregate.json and trajectories.csv summarize the run.
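As a rough illustration of what "synthesized into the threshold-0.5 format" could mean for the saliency masks, the sketch below turns per-neuron saliency scores into logits under a per-layer or a global budget. It is not the code that produced these checkpoints: the function name scores_to_logits, the ±1.0 logit magnitudes, and the rounding of the budget are all assumptions. Only the sign convention (negative logit means knowledge neuron) comes from the Format section.

```python
def scores_to_logits(scores, budget, scope="per_layer",
                     knowledge_logit=-1.0, keep_logit=1.0):
    """Mark the top-`budget` fraction of neurons as knowledge (hypothetical helper).

    scores: {layer_idx: list of per-neuron saliency floats}
    budget: fraction of neurons to select, e.g. 0.0005 for 0.05 %
    scope:  "per_layer" applies the budget inside each layer,
            "global" applies it across all layers at once
    """
    if scope == "per_layer":
        selected = {}
        for layer, vals in scores.items():
            k = max(1, round(budget * len(vals)))  # assumed: at least one neuron
            order = sorted(range(len(vals)), key=lambda i: vals[i], reverse=True)
            selected[layer] = set(order[:k])
    elif scope == "global":
        flat = [(v, layer, i)
                for layer, vals in scores.items()
                for i, v in enumerate(vals)]
        k = max(1, round(budget * len(flat)))
        flat.sort(key=lambda t: t[0], reverse=True)
        selected = {layer: set() for layer in scores}
        for _, layer, i in flat[:k]:
            selected[layer].add(i)
    else:
        raise ValueError(f"unknown scope: {scope!r}")

    # Negative logit (sigmoid < 0.5) marks a neuron as knowledge; positive keeps it.
    return {
        layer: [knowledge_logit if i in selected[layer] else keep_logit
                for i in range(len(vals))]
        for layer, vals in scores.items()
    }


if __name__ == "__main__":
    toy = {0: [0.9, 0.8, 0.7, 0.6], 1: [0.1, 0.2, 0.3, 0.4]}
    print(scores_to_logits(toy, budget=0.5, scope="per_layer"))
    print(scores_to_logits(toy, budget=0.5, scope="global"))
```

With the toy scores, the per-layer scope selects two neurons in each layer, while the global scope spends the whole budget on layer 0 because all of its scores are higher.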
Format
lambda_logits.pt is a dict {layer_idx: tensor[intermediate_dim]} covering
Qwen3-0.6B's 28 layers, each with a 3072-dim FFN intermediate space. Pass it
directly to scripts/eval_rq1.py via --lambda_checkpoint <path>; a neuron whose
logit is < 0 (i.e. σ(logit) < 0.5) is treated as knowledge and ablated at inference.
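The sign convention above can be sketched as a small inspection script that counts, per layer, how many neurons a checkpoint would ablate. This is a minimal sketch, not what scripts/eval_rq1.py does internally: the helper names (stable_sigmoid, ablation_summary, load_lambda_logits) are hypothetical, and it assumes the file loads with torch.load as a plain dict of 1-D tensors.

```python
import math


def stable_sigmoid(x):
    """Numerically stable logistic function for a single float."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def ablation_summary(lambda_logits, threshold=0.5):
    """Count neurons with sigmoid(logit) < threshold in each layer.

    lambda_logits: {layer_idx: sequence of per-neuron logits}
    Returns {layer_idx: {"n_neurons", "n_ablated", "ablated_indices"}}.
    """
    summary = {}
    for layer_idx, logits in sorted(lambda_logits.items()):
        values = [float(v) for v in logits]
        ablated = [i for i, v in enumerate(values)
                   if stable_sigmoid(v) < threshold]
        summary[layer_idx] = {
            "n_neurons": len(values),
            "n_ablated": len(ablated),
            "ablated_indices": ablated,
        }
    return summary


def load_lambda_logits(path):
    """Load a lambda_logits.pt file into plain Python lists (assumed layout)."""
    import torch  # imported lazily so the helpers above work without PyTorch

    state = torch.load(path, map_location="cpu")
    return {int(k): v.flatten().tolist() for k, v in state.items()}


if __name__ == "__main__":
    # Toy stand-in for a real checkpoint: two layers, a handful of neurons.
    demo = {0: [-2.0, 0.0, 3.0], 1: [-1.0, -0.5]}
    for layer, stats in ablation_summary(demo).items():
        print(layer, stats["n_ablated"], "of", stats["n_neurons"], "ablated")
```

For a real checkpoint, one would call ablation_summary(load_lambda_logits(path)) and expect 28 layers of 3072 neurons each. A logit of exactly 0 gives σ = 0.5 and is therefore kept under the strict inequality.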
Provenance
Pinned commit / branch info travels in each variant's run_meta.json
where present; see also the per-variant eval artifacts at
hyunseoki/mjoint-eval-artifacts (predictions JSONL snapshots).
