URL: https://huggingface.co/datasets/vigneshwar234/VORTEXRAG-Benchmarks

Dataset Preview

config (string) | layers (string) | em (float64) | f1 (float64) | faithfulness (float64) | sdr_pct (int64) | cpr_pct (int64) | delta_em (float64) | key_addition (string)
A | Baseline | 61.2 | 68.4 | 0.71 | 0 | 0 | 0 | Standard DPR retrieval
B | A+TVE | 65.3 | 72.1 | 0.75 | 18 | 0 | 4.1 | Causal+syntactic arms added
C | B+VRC | 67.8 | 74.9 | 0.78 | 26 | 8 | 2.5 | Geometric causal suppression
D | C+SDC | 70.4 | 78.2 | 0.83 | 48 | 19 | 2.6 | Per-chunk SDS threshold
E | D+CPG | 72.1 | 80.3 | 0.88 | 54 | 58 | 1.7 | Collective ESR constraint (+39pp CPR)
F | E+RFG | 73.4 | 81.5 | 0.9 | 57 | 65 | 1.3 | No-weak-link Phi-score
G | F+CCB | 73.9 | 82 | 0.91 | 59 | 68 | 0.5 | Depth-0 at position 0
H | G+FV (FULL) | 74.8 | 82.6 | 0.94 | 61 | 71 | 0.9 | Delta-R gate + regen loop
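
The delta_em column appears to be the EM gain of each configuration over the one before it. The following is a minimal sketch that checks this reading against the preview rows; the names `ABLATION` and `recompute_deltas` are hypothetical, and the values are transcribed by hand from the rows above rather than loaded from the dataset files.

```python
# Ablation rows transcribed from the preview above:
# (config, layers, EM, reported delta_em).
ABLATION = [
    ("A", "Baseline", 61.2, 0.0),
    ("B", "A+TVE", 65.3, 4.1),
    ("C", "B+VRC", 67.8, 2.5),
    ("D", "C+SDC", 70.4, 2.6),
    ("E", "D+CPG", 72.1, 1.7),
    ("F", "E+RFG", 73.4, 1.3),
    ("G", "F+CCB", 73.9, 0.5),
    ("H", "G+FV (FULL)", 74.8, 0.9),
]


def recompute_deltas(rows):
    """Return the EM gain of each config over its predecessor (0.0 for the first)."""
    deltas = [0.0]
    for prev, cur in zip(rows, rows[1:]):
        # Round to one decimal to match the precision used in the table.
        deltas.append(round(cur[2] - prev[2], 1))
    return deltas


deltas = recompute_deltas(ABLATION)
reported = [row[3] for row in ABLATION]

# Total EM gain of the full system (H) over the baseline (A).
total_gain = round(ABLATION[-1][2] - ABLATION[0][2], 1)

for (config, layers, em, _), delta in zip(ABLATION, deltas):
    print(f"{config:<2} {layers:<12} EM={em:<5} delta_em={delta}")
print("recomputed deltas match reported:", deltas == reported)
print("total EM gain A -> H:", total_gain)
```

Under this reading, the per-layer gains sum to the overall EM difference between configs A and H, so the column can be treated as an additive breakdown of the full system's improvement.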

VORTEXRAG Benchmark Evaluation Dataset

Benchmark evaluation data, results, and per-layer ablation scores for the VORTEXRAG 7-layer causal RAG framework.

Description

This dataset contains:

  1. Main benchmark results comparing VORTEXRAG against 7 baseline systems across NQ, HotpotQA, MuSiQue, and 2WikiMultiHopQA
  2. Full ablation study showing per-layer contribution (configs A through H)
  3. Per-dataset breakdown showing performance on 6 evaluation datasets
  4. Domain preset evaluation for all 11 domain configurations
  5. Hyperparameter sensitivity sweep data for τ and θ_CPG
  6. Latency breakdown per layer on an A100 GPU

Datasets Used for Evaluation

Dataset | Type | Questions | Hops | Source
NaturalQuestions | Open-domain QA | 7,842 | 1–2 | Wikipedia Dec 2018
HotpotQA | Multi-hop QA | 7,405 | 2 | Wikipedia 10K docs
MuSiQue | Multi-hop QA | 2,417 | 2–4 | Wikipedia filtered
2WikiMultiHopQA | Multi-hop QA | 12,576 | 2 | Wikipedia + Wikidata
LegalBench | Legal QA | 1,200 | 1–3 | US federal case law
MedQA | Medical QA | 1,273 | 1–2 | PubMed abstracts

Main Results

System | EM | F1 | Faithfulness | SDR | CPR | Latency
Naive RAG | 61.2 | 68.4 | 0.71 | n/a | n/a | 120ms
BM25+Rerank | 59.8 | 66.1 | 0.69 | n/a | n/a | 95ms
HyDE | 64.1 | 71.8 | 0.74 | 12% | 8% | 340ms
CRAG | 66.9 | 74.3 | 0.78 | 31% | 19% | 290ms
Self-RAG | 68.4 | 75.9 | 0.81 | 35% | 24% | 410ms
FiD | 63.5 | 70.2 | 0.73 | 8% | 5% | 280ms
FLARE | 65.7 | 72.9 | 0.75 | 14% | 10% | 320ms
VORTEXRAG | 74.8 | 82.6 | 0.94 | 61% | 71% | 185ms
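
To read the comparison at a glance, the sketch below computes VORTEXRAG's margin over the strongest baseline on each quality metric and lists the systems that report a lower latency. It is only an illustration: `RESULTS`, `margin_over_best_baseline`, and `faster_systems` are hypothetical names, and the numbers are copied by hand from the Main Results table, not loaded from the dataset.

```python
# Main Results rows transcribed from the table above:
# system -> (EM, F1, Faithfulness, latency in ms).
RESULTS = {
    "Naive RAG": (61.2, 68.4, 0.71, 120),
    "BM25+Rerank": (59.8, 66.1, 0.69, 95),
    "HyDE": (64.1, 71.8, 0.74, 340),
    "CRAG": (66.9, 74.3, 0.78, 290),
    "Self-RAG": (68.4, 75.9, 0.81, 410),
    "FiD": (63.5, 70.2, 0.73, 280),
    "FLARE": (65.7, 72.9, 0.75, 320),
    "VORTEXRAG": (74.8, 82.6, 0.94, 185),
}
METRICS = ("EM", "F1", "Faithfulness")


def margin_over_best_baseline(results, system="VORTEXRAG"):
    """For each metric, return (best baseline name, system score minus its score)."""
    baselines = {name: row for name, row in results.items() if name != system}
    margins = {}
    for i, metric in enumerate(METRICS):
        best_name = max(baselines, key=lambda name: baselines[name][i])
        gap = round(results[system][i] - baselines[best_name][i], 2)
        margins[metric] = (best_name, gap)
    return margins


def faster_systems(results, system="VORTEXRAG"):
    """Return the names of systems whose reported latency is below the given system's."""
    limit = results[system][3]
    return sorted(name for name, row in results.items() if row[3] < limit)


margins = margin_over_best_baseline(RESULTS)
faster = faster_systems(RESULTS)

for metric, (name, gap) in margins.items():
    print(f"{metric}: +{gap} over {name}")
print("lower latency than VORTEXRAG:", faster)
```

On the reported numbers, Self-RAG is the strongest baseline on all three quality metrics, while only the two simplest pipelines (Naive RAG and BM25+Rerank) report lower latency than VORTEXRAG.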

Links

Author: Vignesh L | License: MIT | Version: v2.0
