Dataset Preview
| config (string) | layers (string) | em (float64) | f1 (float64) | faithfulness (float64) | sdr_pct (int64) | cpr_pct (int64) | delta_em (float64) | key_addition (string) |
|---|---|---|---|---|---|---|---|---|
| A | Baseline | 61.2 | 68.4 | 0.71 | 0 | 0 | 0.0 | Standard DPR retrieval |
| B | A+TVE | 65.3 | 72.1 | 0.75 | 18 | 0 | 4.1 | Causal+syntactic arms added |
| C | B+VRC | 67.8 | 74.9 | 0.78 | 26 | 8 | 2.5 | Geometric causal suppression |
| D | C+SDC | 70.4 | 78.2 | 0.83 | 48 | 19 | 2.6 | Per-chunk SDS threshold |
| E | D+CPG | 72.1 | 80.3 | 0.88 | 54 | 58 | 1.7 | Collective ESR constraint (+39pp CPR) |
| F | E+RFG | 73.4 | 81.5 | 0.90 | 57 | 65 | 1.3 | No-weak-link Phi-score |
| G | F+CCB | 73.9 | 82.0 | 0.91 | 59 | 68 | 0.5 | Depth-0 at position 0 |
| H | G+FV (FULL) | 74.8 | 82.6 | 0.94 | 61 | 71 | 0.9 | Delta-R gate + regen loop |
VORTEXRAG Benchmark Evaluation Dataset
Benchmark evaluation data, results, and per-layer ablation scores for the VORTEXRAG 7-layer causal RAG framework.
Description
This dataset contains:
- Main benchmark results comparing VORTEXRAG against 7 baseline systems across NaturalQuestions (NQ), HotpotQA, MuSiQue, and 2WikiMultiHopQA
- Full ablation study showing per-layer contribution (configs A through H)
- Per-dataset breakdown showing performance on 6 evaluation datasets
- Domain preset evaluation for all 11 domain configurations
- Hyperparameter sensitivity sweep data for τ and θ_CPG
- Latency breakdown per layer on an A100 GPU
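The per-layer contribution in the ablation study can be read as the EM difference between consecutive configurations (A through H). The sketch below recomputes that difference from the EM values published in this dataset. The names `ABLATION_EM` and `per_layer_delta_em` are illustrative and not part of any VORTEXRAG API, and the values are copied by hand rather than loaded from the dataset files.

```python
# EM scores copied by hand from the ablation rows (configs A through H).
# The list name and helper below are hypothetical, for illustration only.
ABLATION_EM = [
    ("A", 61.2),
    ("B", 65.3),
    ("C", 67.8),
    ("D", 70.4),
    ("E", 72.1),
    ("F", 73.4),
    ("G", 73.9),
    ("H", 74.8),
]


def per_layer_delta_em(rows):
    """Return {config: EM gain over the previous config}, rounded to 1 decimal."""
    deltas = {rows[0][0]: 0.0}  # the baseline has no previous config
    for (_, prev_em), (config, em) in zip(rows, rows[1:]):
        deltas[config] = round(em - prev_em, 1)
    return deltas


deltas = per_layer_delta_em(ABLATION_EM)
# Cumulative EM gain of the full system (H) over the baseline (A).
total_gain = round(ABLATION_EM[-1][1] - ABLATION_EM[0][1], 1)

for config, delta in deltas.items():
    print(f"{config}: +{delta:.1f} EM")
print(f"Total A -> H: +{total_gain:.1f} EM")
```

Rounding to one decimal matches the precision of the published scores and avoids floating-point noise in the differences.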
Datasets Used for Evaluation
| Dataset | Type | Questions | Hops | Source |
|---|---|---|---|---|
| NaturalQuestions | Open-domain QA | 7,842 | 1–2 | Wikipedia Dec 2018 |
| HotpotQA | Multi-hop QA | 7,405 | 2 | Wikipedia 10K docs |
| MuSiQue | Multi-hop QA | 2,417 | 2–4 | Wikipedia filtered |
| 2WikiMultiHopQA | Multi-hop QA | 12,576 | 2 | Wikipedia + Wikidata |
| LegalBench | Legal QA | 1,200 | 1–3 | US federal case law |
| MedQA | Medical QA | 1,273 | 1–2 | PubMed abstracts |
Main Results
| System | EM | F1 | Faithfulness | SDR | CPR | Latency |
|---|---|---|---|---|---|---|
| Naive RAG | 61.2 | 68.4 | 0.71 | — | — | 120ms |
| BM25+Rerank | 59.8 | 66.1 | 0.69 | — | — | 95ms |
| HyDE | 64.1 | 71.8 | 0.74 | 12% | 8% | 340ms |
| CRAG | 66.9 | 74.3 | 0.78 | 31% | 19% | 290ms |
| Self-RAG | 68.4 | 75.9 | 0.81 | 35% | 24% | 410ms |
| FiD | 63.5 | 70.2 | 0.73 | 8% | 5% | 280ms |
| FLARE | 65.7 | 72.9 | 0.75 | 14% | 10% | 320ms |
| VORTEXRAG | 74.8 | 82.6 | 0.94 | 61% | 71% | 185ms |
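To put the table in perspective, the following sketch computes VORTEXRAG's margin over the strongest baseline for each quality metric. It uses only the EM, F1, and Faithfulness values from the table above; the dictionary layout and the helper `margin_over_best_baseline` are assumptions made for this example, not part of the released code.

```python
# Values copied by hand from the Main Results table.
# The structure and helper name are hypothetical, for illustration only.
MAIN_RESULTS = {
    "Naive RAG":   {"em": 61.2, "f1": 68.4, "faithfulness": 0.71},
    "BM25+Rerank": {"em": 59.8, "f1": 66.1, "faithfulness": 0.69},
    "HyDE":        {"em": 64.1, "f1": 71.8, "faithfulness": 0.74},
    "CRAG":        {"em": 66.9, "f1": 74.3, "faithfulness": 0.78},
    "Self-RAG":    {"em": 68.4, "f1": 75.9, "faithfulness": 0.81},
    "FiD":         {"em": 63.5, "f1": 70.2, "faithfulness": 0.73},
    "FLARE":       {"em": 65.7, "f1": 72.9, "faithfulness": 0.75},
    "VORTEXRAG":   {"em": 74.8, "f1": 82.6, "faithfulness": 0.94},
}


def margin_over_best_baseline(results, system, metric):
    """Return (best baseline name, system score minus that baseline's score)."""
    baselines = {
        name: row[metric] for name, row in results.items() if name != system
    }
    best_name = max(baselines, key=baselines.get)
    margin = round(results[system][metric] - baselines[best_name], 2)
    return best_name, margin


for metric in ("em", "f1", "faithfulness"):
    best, margin = margin_over_best_baseline(MAIN_RESULTS, "VORTEXRAG", metric)
    print(f"{metric}: +{margin} over {best}")
```

Self-RAG is the strongest baseline on all three metrics, so each margin is measured against it.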
Links
- 📄 Paper: https://doi.org/10.5281/zenodo.20285144
- 💻 GitHub: https://github.com/vignesh2027/VORTEXRAG
- 🤗 Space: https://huggingface.co/spaces/vigneshwar234/VORTEXRAG
Author: Vignesh L | License: MIT | Version: v2.0
