
URL: https://huggingface.co/datasets/ACSci/v3-eval-rubric-v2

ACSci/v3-eval-rubric-v2 · Datasets at Hugging Face


Columns (name, type, then distinct values or min to max):
paper_id (stringclasses): 221 values
title (stringclasses): 221 values
model (stringclasses): 1 value
strategy (stringclasses): 6 values
pair_id (stringclasses): 37 values
variant_id (int64): 1 to 3
style (stringclasses): 5 values
query (stringlengths): 217 to 1.23k
input_context (stringlengths): 77 to 187k
oracle_response (stringlengths): 38 to 201k
response (stringlengths): 679 to 34.8k
finished (bool): 2 classes
response_chars (int64): 679 to 34.8k
score (float64): 0 to 1
n_criteria (int64): 0 to 10
n_yes (int64): 0 to 10
rubric_criteria (stringlengths): 91 to 4.74k
judged_criteria (stringlengths): 2 to 6.97k

Preview rows (one field per line, in the column order above):
0FJYicpOj0
Gaussian certified unlearning in high dimensions: A hypothesis testing approach
cmu-pope-4b-v0300-iter-099
direction_to_idea
p001
1
concise
I am working on approximate machine unlearning in high-dimensional regimes where the number of parameters is comparable to the sample size ($p \sim n$). Current approaches like Zou et al. (2025) rely on standard $\epsilon$-certifiability and Laplace noise, which requires at least two Newton steps even for a single dele...
Modern ML models (from healthcare tools to systems like ChatGPT) are trained on data that often includes sensitive personal information, creating risks of memorization and leakage. Laws such as Union (2016); of California Department of Justice (2018); of Canada (2022) enforce a "right to be forgotten," requiring remova...
**Comparing with conclusions of Zou et al. (2025).** In high dimensions, the privacy-accuracy interplay is subtler than in low dimensions. Zou et al. (2025) is the only theoretical work we know of that has studied machine unlearning when $p \sim n$ by relaxing some of the standard optimization assumptions on the loss, ...
To address the challenge of high-dimensional machine unlearning ($p \sim n$) where existing two-step Newton-based methods (like Zou et al. 2025) impose unnecessary computational overhead, we must fundamentally decouple the **certifiability condition** from the **optimization step count** and introduce a mechanism that ...
true
17,876
0.1
10
1
[{"criterion_id": 1, "aspect": "key_idea", "criterion": "Does the response propose using a single noisy Gaussian-Newton step to achieve unlearning?", "reference_excerpt": "develop an approach where one noisy Gaussian-Newton step suffices to unlearn a regularized empirical risk minimizer."}, {"criterion_id": 2, "aspect"...
[{"criterion_id": 1, "aspect": "key_idea", "criterion": "Does the response propose using a single noisy Gaussian-Newton step to achieve unlearning?", "reference_excerpt": "develop an approach where one noisy Gaussian-Newton step suffices to unlearn a regularized empirical risk minimizer.", "judgment": "no", "evidence":...
0FJYicpOj0
Gaussian certified unlearning in high dimensions: A hypothesis testing approach
cmu-pope-4b-v0300-iter-099
problem_gap_to_method
p003
2
detailed
Most theoretical work on machine unlearning assumes the number of model parameters is much smaller than the sample size. However, in high-dimensional settings, these standard optimization assumptions fail. We want to develop an approximate unlearning method that avoids full retraining by updating the original estimator...
Modern ML models (from healthcare tools to systems like ChatGPT) are trained on data that often includes sensitive personal information, creating risks of memorization and leakage. Laws such as Union (2016); of California Department of Justice (2018); of Canada (2022) enforce a "right to be forgotten," requiring remova...
To process deletions efficiently, without full retraining, and obscure residual information, we update $\hat{\beta}$ via the randomized procedure below.[^2] $$\tilde{\beta}_{\setminus \mathcal{M}} := \bar{A}(\hat{\beta}, \mathcal{D}_{\mathcal{M}}, T(\mathcal{D}_n), \mathbf{b}), \text{ where } \mathbf{b} \text{ is a ran...
To address the challenge of machine unlearning in high-dimensional settings ($d \gg n$, where $d$ is the parameter dimension and $n$ is the original sample size, with $n$ potentially smaller than or comparable to $d$), we must abandon the standard convexity and global gradient descent convergence assumptions that under...
true
14,530
0.9
10
9
[{"criterion_id": 1, "aspect": "key_idea", "criterion": "Does the response propose updating the original estimator using a randomized procedure to avoid full retraining?", "reference_excerpt": "To process deletions efficiently without full retraining... you could update the original estimator using a randomized procedu...
[{"criterion_id": 1, "aspect": "key_idea", "criterion": "Does the response propose updating the original estimator using a randomized procedure to avoid full retraining?", "reference_excerpt": "To process deletions efficiently without full retraining... you could update the original estimator using a randomized procedu...
0FJYicpOj0
Gaussian certified unlearning in high dimensions: A hypothesis testing approach
cmu-pope-4b-v0300-iter-099
method_to_experiments
p004
2
detailed
I am working on a machine unlearning algorithm that avoids full retraining by updating the original estimator $\hat{\beta}$ using a randomized procedure with Gaussian noise $\mathbf{b}$. The goal is to satisfy $(\phi, \epsilon)$-Gaussian certifiability for privacy while maintaining accuracy, measured by Generalization ...
To process deletions efficiently, without full retraining, and obscure residual information, we update $\hat{\beta}$ via the randomized procedure below.[^2] $$\tilde{\beta}_{\setminus \mathcal{M}} := \bar{A}(\hat{\beta}, \mathcal{D}_{\mathcal{M}}, T(\mathcal{D}_n), \mathbf{b}), \text{ where } \mathbf{b} \text{ is a ran...
**Logistic regression model.** For simplicity, we use ridge penalized $r(\beta) = \|\beta\|_2^2$ logistic loss $\ell(y|x^\top \beta)$, with random features $x_i \sim N(0, \frac{1}{n} I_p)$ and true parameter $\beta_* \sim N(0, I_p)$. To depict high dimensionality, we take $n = p$, penalty parameter $\lambda = 0.5$, and...
To demonstrate the superior performance of Gaussian perturbation over Laplace noise in your machine unlearning algorithm regarding Generalization Error Divergence (GED) under $(\phi, \epsilon)$-Gaussian certifiability, you must construct an experimental framework that rigorously disentangles the effects of dimensionali...
true
16,188
0.5
10
5
[{"criterion_id": 1, "aspect": "key_idea", "criterion": "Does the response propose evaluating the Generalization Error Divergence (GED) across both synthetic and real-world datasets?", "reference_excerpt": "design a comprehensive suite of experiments evaluating the Generalization Error Divergence (GED) across synthetic...
[{"criterion_id": 1, "aspect": "key_idea", "criterion": "Does the response propose evaluating the Generalization Error Divergence (GED) across both synthetic and real-world datasets?", "reference_excerpt": "design a comprehensive suite of experiments evaluating the Generalization Error Divergence (GED) across synthetic...
0FJYicpOj0
Gaussian certified unlearning in high dimensions: A hypothesis testing approach
cmu-pope-4b-v0300-iter-099
results_to_analysis
p006
1
concise
In our unlearning experiments with large dimensions ($p=1000$ to $10000$), Gaussian noise consistently achieves lower Generalization Error Difference (GED) and Unlearning Error Difference (UED) than eps-delta and Laplace noise. To formally prove our theoretical results in this high-dimensional setting, what standard as...
### 5.1 LARGE DIMENSIONAL AND REAL DATA EXPERIMENTS In this section, we add additional experiments with larger dimension $p$. The problem setting is exactly the same as in the previous section, but $p$ now varies from 1000 to 10000. Figure 4 shows that the same pattern as in Figure 1 earlier persists here. In the abov...
Assumption (B1) on mean-zero sub-Gaussian features is mild and frequently made in many papers on high-dimensional statistics. See for example Miolane & Montanari (2021); Rahnama Rad & Maleki (2020); Auddy et al. (2024); Zheng et al. (2017); Donoho & Montanari (2016); Bellec et al. (2025). Assumption (B2) is not a str...
To formally prove theoretical results regarding the performance of Gaussian noise versus $\epsilon$-$\delta$ (Exponential Mechanism) and Laplace noise in the context of *unlearning* large-scale models ($p \in [1000, 10000]$), you must move beyond generic privacy-unlearning trade-offs and construct a framework that spec...
true
16,199
0.5
10
5
[{"criterion_id": 1, "aspect": "key_idea", "criterion": "Does the response state that the theoretical proofs rely on two main assumptions?", "reference_excerpt": "rely on two standard assumptions"}, {"criterion_id": 2, "aspect": "key_idea", "criterion": "Does the response identify that one of the assumptions must be ma...
[{"criterion_id": 1, "aspect": "key_idea", "criterion": "Does the response state that the theoretical proofs rely on two main assumptions?", "reference_excerpt": "rely on two standard assumptions", "judgment": "no", "evidence": "The response categorizes assumptions into three areas (data geometry, model architecture, a...
0FJYicpOj0
Gaussian certified unlearning in high dimensions: A hypothesis testing approach
cmu-pope-4b-v0300-iter-099
results_to_analysis
p007
3
conversational
I'm trying to write up the theoretical analysis for our unlearning results. We saw great performance in high dimensions ($p \sim n$), but I need to explain why prior theoretical frameworks don't apply here. Specifically, I want to show that assuming $\mu I_p \preceq \nabla^2 f(\beta, z_i) \preceq L I_p$ with constant $...
![Figure 1: Comparison of unlearned estimators on new test data: mean GED (with 3 SD error bars) across the dimension p (both in log scale) for Laplace (in red) vs. Gaussian (in cyan). We set λ = 0.5.](figures/figure_1.png) <!-- Figure 1 description: Three log-log plots showing Generalization Error Divergence (GED) vs....
**Failure of the assumptions in Allouah et al. (2025b), Sekhari et al. (2021).** In their theory Allouah et al. (2025b) assumes[^1] that $f$ satisfies the standard optimization assumptions. More precisely, they assume $\beta \to f(\beta, z_i)$ is simultaneously $\mu$-strongly convex and $L$-smooth, i.e., $$\mu I_p \pre...
To address the theoretical gap your unlearning framework encounters in the high-dimensional regime ($p \sim n$), we must first deconstruct the assumptions of standard convex optimization literature (e.g., those relying on uniform Hessian bounds $\mu I \preceq \nabla^2 f \preceq L I$). These bounds imply that the loss l...
true
18,872
0.3
10
3
[{"criterion_id": 1, "aspect": "key_idea", "criterion": "Does the response state that simultaneous strong convexity and smoothness conditions are incompatible in the high-dimensional proportional regime?", "reference_excerpt": "demonstrate that simultaneous strong convexity and smoothness conditions are incompatible."}...
[{"criterion_id": 1, "aspect": "key_idea", "criterion": "Does the response state that simultaneous strong convexity and smoothness conditions are incompatible in the high-dimensional proportional regime?", "reference_excerpt": "demonstrate that simultaneous strong convexity and smoothness conditions are incompatible.",...
0FJYicpOj0
Gaussian certified unlearning in high dimensions: A hypothesis testing approach
cmu-pope-4b-v0300-iter-099
mixed_context
p008
1
concise
"We have demonstrated that a Gaussian-perturbed one-step Newton estimator achieves vanishing General(...TRUNCATED)
"![Figure 1: Comparison of unlearned estimators on new test data: mean GED (with 3 SD error bars) ac(...TRUNCATED)
"Our analysis of models with a large number of parameters opens new avenues for tackling several oth(...TRUNCATED)
"Based on the strong foundation you have established regarding the Gaussian-perturbed one-step Newto(...TRUNCATED)
true
13,147
1
1
1
"[{\"criterion_id\": 1, \"aspect\": \"conceptual_depth\", \"criterion\": \"Does the response propose(...TRUNCATED)
"[{\"criterion_id\": 1, \"aspect\": \"conceptual_depth\", \"criterion\": \"Does the response propose(...TRUNCATED)
3RQ863cRbx
Visual symbolic mechanisms: Emergent symbol processing in Vision Language Models
cmu-pope-4b-v0300-iter-099
direction_to_idea
p001
1
concise
"I am investigating the 'binding problem' in Vision Language Models (VLMs), where models struggle to(...TRUNCATED)
"The visual world is composed of many commonly recurring elements — shapes, colors, textures, etc.(...TRUNCATED)
"![Figure 1: Overview of position ID architecture and supporting evidence. We identify three process(...TRUNCATED)
"To investigate the internal mechanisms by which Vision Language Models (VLMs) attempt to resolve th(...TRUNCATED)
true
16,505
0.2
10
2
"[{\"criterion_id\": 1, \"aspect\": \"key_idea\", \"criterion\": \"Does the response propose investi(...TRUNCATED)
"[{\"criterion_id\": 1, \"aspect\": \"key_idea\", \"criterion\": \"Does the response propose investi(...TRUNCATED)
3RQ863cRbx
Visual symbolic mechanisms: Emergent symbol processing in Vision Language Models
cmu-pope-4b-v0300-iter-099
literature_to_gap
p002
2
detailed
"Mechanistic interpretability has revealed structured, symbol-like processing mechanisms in text-onl(...TRUNCATED)
"A number of studies have investigated the emergent mechanisms that support symbol-like processing i(...TRUNCATED)
"Beyond the difference in modalities (images vs. text), there are also several novel contributions o(...TRUNCATED)
"The extension of mechanistic interpretability techniques—specifically those designed for the high(...TRUNCATED)
true
14,713
0.6
10
6
"[{\"criterion_id\": 1, \"aspect\": \"key_idea\", \"criterion\": \"Does the response state that the (...TRUNCATED)
"[{\"criterion_id\": 1, \"aspect\": \"key_idea\", \"criterion\": \"Does the response state that the (...TRUNCATED)
3RQ863cRbx
Visual symbolic mechanisms: Emergent symbol processing in Vision Language Models
cmu-pope-4b-v0300-iter-099
problem_gap_to_method
p003
1
concise
"Vision Language Models (VLMs) often struggle with the 'binding problem'—associating the correct f(...TRUNCATED)
"The visual world is composed of many commonly recurring elements — shapes, colors, textures, etc.(...TRUNCATED)
"![Figure 1: Overview of position ID architecture and supporting evidence. We identify three process(...TRUNCATED)
"The \"binding problem\" in Vision-Language Models (VLMs) is fundamentally a challenge of aligning h(...TRUNCATED)
true
16,806
0.1
10
1
"[{\"criterion_id\": 1, \"aspect\": \"key_idea\", \"criterion\": \"Does the response propose an inte(...TRUNCATED)
"[{\"criterion_id\": 1, \"aspect\": \"key_idea\", \"criterion\": \"Does the response propose an inte(...TRUNCATED)
3RQ863cRbx
Visual symbolic mechanisms: Emergent symbol processing in Vision Language Models
cmu-pope-4b-v0300-iter-099
method_to_experiments
p004
1
concise
"I am investigating how Vision-Language Models (VLMs) bind visual features to objects, specifically (...TRUNCATED)
"![Figure 1: Overview of position ID architecture and supporting evidence. We identify three process(...TRUNCATED)
"To identify the mechanisms underlying visual binding in VLMs, we used a scene description task that(...TRUNCATED)
"To rigorously investigate how Vision-Language Models (VLMs) bind visual features to objects and spe(...TRUNCATED)
true
13,168
0.7
10
7
"[{\"criterion_id\": 1, \"aspect\": \"key_idea\", \"criterion\": \"Does the response propose a scene(...TRUNCATED)
"[{\"criterion_id\": 1, \"aspect\": \"key_idea\", \"criterion\": \"Does the response propose a scene(...TRUNCATED)
End of preview.
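
In every preview row shown above, `score` equals `n_yes` divided by `n_criteria` (for example 0.1 with 1 of 10, 0.9 with 9 of 10, and 1 with 1 of 1). The sketch below recomputes the three values from a `judged_criteria` cell. It assumes that the cell is a JSON list of objects, that a satisfied criterion carries `"judgment": "yes"` (only `"no"` is visible in the preview), and that a row with zero criteria scores 0. The function name `recompute_score` and the two-item sample cell are hypothetical and are not part of the dataset.

```python
import json


def recompute_score(judged_criteria: str) -> tuple[int, int, float]:
    """Return (n_yes, n_criteria, score) for one judged_criteria cell.

    Assumption: the cell is a JSON list of objects, each with a
    "judgment" field whose value is "yes" for a satisfied criterion.
    """
    criteria = json.loads(judged_criteria)
    n_criteria = len(criteria)
    n_yes = sum(1 for c in criteria if c.get("judgment") == "yes")
    # Assumption: rows with no criteria get a score of 0.0.
    score = n_yes / n_criteria if n_criteria else 0.0
    return n_yes, n_criteria, score


# Hypothetical cell using the field names visible in the preview rows.
example = json.dumps([
    {"criterion_id": 1, "aspect": "key_idea", "criterion": "...",
     "reference_excerpt": "...", "judgment": "no", "evidence": "..."},
    {"criterion_id": 2, "aspect": "key_idea", "criterion": "...",
     "reference_excerpt": "...", "judgment": "yes", "evidence": "..."},
])

print(recompute_score(example))
```

Comparing these recomputed values against the stored `n_yes`, `n_criteria`, and `score` columns is one way to check that a row was judged completely. Any mismatch would indicate that one of the assumptions above does not hold for that row.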

README.md exists but content is empty.