
URL: https://huggingface.co/datasets/ACSci/v3-eval-judge-gpt-oss-20b

ACSci/v3-eval-judge-gpt-oss-20b · Datasets at Hugging Face


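| Column | Type | Values / range |
|---|---|---|
| paper_id | stringclasses | 221 values |
| title | stringclasses | 221 values |
| model | stringclasses | 1 value |
| strategy | stringclasses | 6 values |
| pair_id | stringclasses | 37 values |
| variant_id | int64 | 1 to 3 |
| style | stringclasses | 5 values |
| query | stringlengths | 176 to 1.08k |
| response | stringlengths | 714 to 33.6k |
| finished | bool | 1 class |
| response_chars | int64 | 714 to 33.6k |
| score | float64 | 0 to 1 |
| n_criteria | int64 | 0 to 11 |
| n_yes | int64 | 0 to 10 |
| criterion_scores | stringlengths | 2 to 3.14k |

Each preview row lists its values one per line, in the column order above.
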
0FJYicpOj0
Gaussian certified unlearning in high dimensions: A hypothesis testing approach
azure_openai_gpt-5_4
direction_to_idea
p001
3
conversational
I'm trying to improve the efficiency of approximate unlearning for models where $p \sim n$. Right now, the state-of-the-art by Zou et al. (2025) needs at least two Newton steps to remove a data point because they use standard $\epsilon$-certifiability, which forces them to add massive amounts of Laplace noise. Since hi...
Yes. A promising way to do this is to replace the worst-case, coordinatewise notion behind standard \(\epsilon\)-certifiability with a distributional notion tailored to the fact that, in the \(p \sim n\) regime, the one-step parameter perturbation from removing a point is often well-approximated by a high-dimensional G...
true
20,831
0.7778
9
7
[{"criterion_id": 2, "judgment": "yes", "evidence": "Section 3.2 introduces a \"Gaussian certifiability\" framework, replacing standard \u03b5-certifiability with a Gaussian\u2011based definition."}, {"criterion_id": 3, "judgment": "yes", "evidence": "The response states: \"standard \u03b5\u2011certifiability ... force...
0FJYicpOj0
Gaussian certified unlearning in high dimensions: A hypothesis testing approach
azure_openai_gpt-5_4
literature_to_gap
p002
1
concise
Recent theoretical work on machine unlearning, such as Newton-based methods by Guo et al. and Sekhari et al., and gradient-based methods by Allouah et al., provides privacy guarantees for data removal. However, what is the critical gap or implicit assumption in these theoretical frameworks when applied to modern AI mod...
The critical gap is that these theoretical unlearning guarantees typically rely on a **classical asymptotic regime** that is badly mismatched to modern overparameterized models. More concretely, the implicit assumption is: ## Core hidden assumption They assume learning occurs in a regime where the **number of model p...
true
4,740
0.6
10
6
[{"criterion_id": 1, "judgment": "yes", "evidence": "They assume learning occurs in a regime where the number of model parameters is fixed or grows much more slowly than the number of training samples."}, {"criterion_id": 2, "judgment": "yes", "evidence": "In modern models, especially deep networks and large foundation...
0FJYicpOj0
Gaussian certified unlearning in high dimensions: A hypothesis testing approach
azure_openai_gpt-5_4
problem_gap_to_method
p003
3
conversational
I'm working on machine unlearning for high-dimensional models. The standard approach of exact retraining is too expensive, and existing approximate methods rely on low-dimensional assumptions that don't hold when parameters scale with the dataset size. If we want to build a randomized update procedure that relies on th...
A good way to think about this is: - You have a trained model parameter vector \(\hat\theta\) obtained on dataset \(S\). - You want to “delete” a subset \(U \subset S\), producing a new parameter vector that should behave like the retrained solution on \(S \setminus U\), but without full retraining. - In high dimensio...
true
19,569
1
10
10
[{"criterion_id": 1, "judgment": "yes", "evidence": "The goal is to construct a randomized map ... that is close in utility to retrained model ..."}, {"criterion_id": 2, "judgment": "yes", "evidence": "You asked specifically for formal metrics for both privacy protection and accuracy. In high dimensions, deterministic ...
0FJYicpOj0
Gaussian certified unlearning in high dimensions: A hypothesis testing approach
azure_openai_gpt-5_4
method_to_experiments
p004
2
detailed
I am working on a machine unlearning algorithm that avoids full retraining by updating the original estimator $\hat{\beta}$ using a randomized procedure with Gaussian noise $\mathbf{b}$. The goal is to satisfy $(\phi, \epsilon)$-Gaussian certifiability for privacy while maintaining accuracy, measured by Generalization ...
Below is a concrete experimental blueprint to show that a Gaussian-perturbation unlearning update yields better utility than a Laplace-perturbation baseline under matched certifiability guarantees, while avoiding full retraining. --- ## 1. Core claim to test You want to demonstrate: 1. **Privacy/certifiability:** y...
true
24,450
0.5
10
5
[{"criterion_id": 1, "judgment": "yes", "evidence": "The response explicitly defines GED in Section 4 and states that experiments will be run on both synthetic logistic regression tasks and real\u2011world datasets to evaluate GED."}, {"criterion_id": 2, "judgment": "yes", "evidence": "Section 2 lists both a Gaussian\u...
0FJYicpOj0
Gaussian certified unlearning in high dimensions: A hypothesis testing approach
azure_openai_gpt-5_4
results_to_analysis
p006
1
concise
In our unlearning experiments with large dimensions ($p=1000$ to $10000$), Gaussian noise consistently achieves lower Generalization Error Difference (GED) and Unlearning Error Difference (UED) than eps-delta and Laplace noise. To formally prove our theoretical results in this high-dimensional setting, what standard as...
For high-dimensional theoretical guarantees of unlearning with \(p \in [10^3,10^4]\), the standard route is to place assumptions on: 1. the **feature distribution** \(x \in \mathbb{R}^p\), 2. the **linear predictor / generalized linear model**, 3. the **loss curvature and regularization**, 4. the **scaling of norms an...
true
15,121
0.8
10
8
[{"criterion_id": 1, "judgment": "no", "evidence": "The response lists many assumptions (feature distribution, model, loss, scaling, noise) rather than stating two main assumptions."}, {"criterion_id": 2, "judgment": "yes", "evidence": "The response explicitly lists \"feature distribution\" as one of the key assumption...
0FJYicpOj0
Gaussian certified unlearning in high dimensions: A hypothesis testing approach
azure_openai_gpt-5_4
results_to_analysis
p007
1
concise
Our experiments show that Gaussian perturbation outperforms other unlearning estimators in the high-dimensional proportional regime where $p \sim n$. How can we mathematically demonstrate that standard optimization assumptions, like simultaneous $O(1)$-smoothness and $\Omega(1)$-strong convexity of the per-example loss...
Yes. In the proportional high-dimensional regime \(p \asymp n\), one can make this failure mathematically explicit by looking at the Hessian geometry of standard empirical risks. The key point is that the usual assumptions - per-example loss is \(L\)-smooth with \(L=O(1)\), and - empirical or population objective is \...
true
11,442
0.1111
9
1
[{"criterion_id": 1, "judgment": "yes", "evidence": "The response states: \"cannot both hold uniformly, once the feature dimension grows proportionally with sample size\" and that simultaneous O(1)-smoothness and \u03a9(1)-strong convexity fail in this regime."}, {"criterion_id": 3, "judgment": "no", "evidence": "not f...
0FJYicpOj0
Gaussian certified unlearning in high dimensions: A hypothesis testing approach
azure_openai_gpt-5_4
mixed_context
p008
1
concise
We have demonstrated that a Gaussian-perturbed one-step Newton estimator achieves vanishing Generalization Error Divergence in high dimensions for generalized linear models, with empirical unlearning batch sizes scaling as $m^3 = o(n)$. What are the most promising theoretical and practical directions to extend this unl...
Below are directions that seem most promising if you want to extend a Gaussian-perturbed one-step Newton unlearning framework beyond the current “high-dimensional GLM + vanishing GED + empirical \(m^3=o(n)\)” regime. I’ll organize them by what is likely to matter theoretically, algorithmically, and empirically. --- #...
true
22,740
0.6
10
6
[{"criterion_id": 1, "judgment": "yes", "evidence": "Section 3: \"Extend from GLMs to broader model classes\" lists general smooth M\u2011estimators, penalized estimators, kernel methods, and deep linear networks."}, {"criterion_id": 2, "judgment": "no", "evidence": "not found"}, {"criterion_id": 3, "judgment": "no", "...
0FJYicpOj0
Gaussian certified unlearning in high dimensions: A hypothesis testing approach
azure_openai_gpt-5_4
mixed_context
p009
2
detailed
I'm researching approximate machine unlearning in the high-dimensional proportional regime where the number of parameters $p$ is comparable to the sample size $n$. Existing theoretical guarantees for Newton-based unlearning rely on standard $\Omega(1)$ strong convexity and $O(1)$ smoothness assumptions, which break dow...
A useful way to attack this is to **change the notion of certifiability** so it matches what Newton-style unlearning actually produces in the proportional regime. The core issue is: - In the high-dimensional regime \(p \asymp n\), the Hessian is not uniformly well-conditioned in the classical \(\mu I \preceq \nabla^2 ...
true
21,757
0.4
5
2
[{"criterion_id": 1, "judgment": "no", "evidence": "The response discusses Gaussian certifiability but does not mention an epsilon\u2011Gaussian framework."}, {"criterion_id": 3, "judgment": "no", "evidence": "The response does not reference the hypothesis\u2011testing interpretation of differential privacy or the conv...
0jHyEKHDyx
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
azure_openai_gpt-5_4
direction_to_idea
p001
3
conversational
I've been looking into the persistent issue where BF16 flash attention causes training to suddenly diverge and explode after a few thousand steps. It seems like the community relies heavily on patches like QK-clip or higher precision, but I want to understand exactly why this happens at the arithmetic level. Can you br...
Yes. The short version is: **BF16 flash attention can blow up because attention is an exponentially sensitive computation sitting on top of low-precision dot products and low-precision reductions.** A small multiplicative or additive error in the logits can become a large relative error after softmax, and if that er...
true
17,234
0.3
10
3
[{"criterion_id": 1, "judgment": "no", "evidence": "not found"}, {"criterion_id": 2, "judgment": "yes", "evidence": "Fix B: Keep the softmax state in FP32, not BF16"}, {"criterion_id": 3, "judgment": "yes", "evidence": "softmax exponentiates these noisy logits ... This biases attention toward accidental winners"}, {"cr...
0jHyEKHDyx
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
azure_openai_gpt-5_4
problem_gap_to_method
p002
3
conversational
"We are trying to push our transformer training to lower precisions for efficiency, specifically usi(...TRUNCATED)
"A principled way to think about this is:\n\n**the failure is not fundamentally “large QK values(...TRUNCATED)
true
15,110
0
10
0
"[{\"criterion_id\": 1, \"judgment\": \"no\", \"evidence\": \"not found\"}, {\"criterion_id\": 2, \"(...TRUNCATED)
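
In the previewed rows, `score` appears to equal `n_yes / n_criteria` rounded to four decimals (for example 7 of 9 gives 0.7778, and 1 of 9 gives 0.1111). The following is a minimal sketch of how those three columns could be recomputed from a `criterion_scores` string, assuming the string is always a JSON list of objects with a `judgment` field, as in the visible rows. The function name `summarize_judgments`, the sample input, and the choice to return 0.0 when there are no criteria are illustrative assumptions, not part of the dataset.

```python
import json


def summarize_judgments(criterion_scores: str) -> dict:
    """Recompute n_criteria, n_yes and score from a criterion_scores JSON string.

    Assumes the string is a JSON list of {"criterion_id", "judgment", "evidence"}
    objects, which is the shape visible in the preview rows.
    """
    judgments = json.loads(criterion_scores)
    n_criteria = len(judgments)
    n_yes = sum(1 for item in judgments if item.get("judgment") == "yes")
    # Assumption: the schema allows n_criteria == 0; returning 0.0 in that
    # case is a guess, since the preview shows no such row.
    score = round(n_yes / n_criteria, 4) if n_criteria else 0.0
    return {"n_criteria": n_criteria, "n_yes": n_yes, "score": score}


# Hypothetical input mimicking the column's format (not a real dataset row).
sample = json.dumps([
    {"criterion_id": 1, "judgment": "yes", "evidence": "quoted passage"},
    {"criterion_id": 2, "judgment": "no", "evidence": "not found"},
    {"criterion_id": 3, "judgment": "yes", "evidence": "quoted passage"},
])
summary = summarize_judgments(sample)
print(summary)
```

Comparing the recomputed values against the stored `score`, `n_criteria`, and `n_yes` columns is one way to check that the judge output was aggregated consistently across rows.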
End of preview.

README.md exists but content is empty.
Downloads last month: 292