URL: https://huggingface.co/datasets/ACSci/v3-eval-rubric-v2

ACSci/v3-eval-rubric-v2 · Datasets at Hugging Face

Dataset Viewer

paper_id (string, 221 distinct values)	title (string, 221 distinct values)	model (string, 1 distinct value)	strategy (string, 6 distinct values)	pair_id (string, 37 distinct values)	variant_id (int64, 1 to 3)	style (string, 5 distinct values)	query (string, 217 to 1.23k chars)	input_context (string, 77 to 187k chars)	oracle_response (string, 38 to 201k chars)	response (string, 679 to 34.8k chars)	finished (bool, 2 classes)	response_chars (int64, 679 to 34.8k)	score (float64, 0 to 1)	n_criteria (int64, 0 to 10)	n_yes (int64, 0 to 10)	rubric_criteria (string, 91 to 4.74k chars)	judged_criteria (string, 2 to 6.97k chars)
0FJYicpOj0	Gaussian certified unlearning in high dimensions: A hypothesis testing approach	cmu-pope-4b-v0300-iter-099	direction_to_idea	p001	1	concise	I am working on approximate machine unlearning in high-dimensional regimes where the number of parameters is comparable to the sample size ($p \sim n$). Current approaches like Zou et al. (2025) rely on standard $\epsilon$-certifiability and Laplace noise, which requires at least two Newton steps even for a single dele...	Modern ML models (from healthcare tools to systems like ChatGPT) are trained on data that often includes sensitive personal information, creating risks of memorization and leakage. Laws such as Union (2016); of California Department of Justice (2018); of Canada (2022) enforce a "right to be forgotten," requiring remova...	Comparing with conclusions of Zou et al. (2025). In high dimensions, the privacy-accuracy interplay is subtler than in low dimensions. Zou et al. (2025) is the only theoretical work we know of that has studied machine unlearning when $p \sim n$ by relaxing some of the standard optimization assumptions on the loss, ...	To address the challenge of high-dimensional machine unlearning ($p \sim n$) where existing two-step Newton-based methods (like Zou et al. 2025) impose unnecessary computational overhead, we must fundamentally decouple the certifiability condition from the optimization step count and introduce a mechanism that ...	true	17,876	0.1	10	1	[{"criterion_id": 1, "aspect": "key_idea", "criterion": "Does the response propose using a single noisy Gaussian-Newton step to achieve unlearning?", "reference_excerpt": "develop an approach where one noisy Gaussian-Newton step suffices to unlearn a regularized empirical risk minimizer."}, {"criterion_id": 2, "aspect"...	
[{"criterion_id": 1, "aspect": "key_idea", "criterion": "Does the response propose using a single noisy Gaussian-Newton step to achieve unlearning?", "reference_excerpt": "develop an approach where one noisy Gaussian-Newton step suffices to unlearn a regularized empirical risk minimizer.", "judgment": "no", "evidence":...
0FJYicpOj0	Gaussian certified unlearning in high dimensions: A hypothesis testing approach	cmu-pope-4b-v0300-iter-099	problem_gap_to_method	p003	2	detailed	Most theoretical work on machine unlearning assumes the number of model parameters is much smaller than the sample size. However, in high-dimensional settings, these standard optimization assumptions fail. We want to develop an approximate unlearning method that avoids full retraining by updating the original estimator...	Modern ML models (from healthcare tools to systems like ChatGPT) are trained on data that often includes sensitive personal information, creating risks of memorization and leakage. Laws such as Union (2016); of California Department of Justice (2018); of Canada (2022) enforce a "right to be forgotten," requiring remova...	To process deletions efficiently, without full retraining, and obscure residual information, we update $\hat{\beta}$ via the randomized procedure below.[^2] $$\tilde{\beta}_{\setminus \mathcal{M}} := \bar{A}(\hat{\beta}, \mathcal{D}_{\mathcal{M}}, T(\mathcal{D}_n), \mathbf{b}), \text{ where } \mathbf{b} \text{ is a ran...	To address the challenge of machine unlearning in high-dimensional settings ($d \gg n$, where $d$ is the parameter dimension and $n$ is the original sample size, with $n$ potentially smaller than or comparable to $d$), we must abandon the standard convexity and global gradient descent convergence assumptions that under...	true	14,530	0.9	10	9	[{"criterion_id": 1, "aspect": "key_idea", "criterion": "Does the response propose updating the original estimator using a randomized procedure to avoid full retraining?", "reference_excerpt": "To process deletions efficiently without full retraining... you could update the original estimator using a randomized procedu...	
[{"criterion_id": 1, "aspect": "key_idea", "criterion": "Does the response propose updating the original estimator using a randomized procedure to avoid full retraining?", "reference_excerpt": "To process deletions efficiently without full retraining... you could update the original estimator using a randomized procedu...
0FJYicpOj0	Gaussian certified unlearning in high dimensions: A hypothesis testing approach	cmu-pope-4b-v0300-iter-099	method_to_experiments	p004	2	detailed	I am working on a machine unlearning algorithm that avoids full retraining by updating the original estimator $\hat{\beta}$ using a randomized procedure with Gaussian noise $\mathbf{b}$. The goal is to satisfy $(\phi, \epsilon)$-Gaussian certifiability for privacy while maintaining accuracy, measured by Generalization ...	To process deletions efficiently, without full retraining, and obscure residual information, we update $\hat{\beta}$ via the randomized procedure below.[^2] $$\tilde{\beta}_{\setminus \mathcal{M}} := \bar{A}(\hat{\beta}, \mathcal{D}_{\mathcal{M}}, T(\mathcal{D}_n), \mathbf{b}), \text{ where } \mathbf{b} \text{ is a ran...	Logistic regression model. For simplicity, we use ridge penalized $r(\beta) = \\|\beta\\|_2^2$ logistic loss $\ell(y\|x^\top \beta)$, with random features $x_i \sim N(0, \frac{1}{n} I_p)$ and true parameter $\beta_* \sim N(0, I_p)$. To depict high dimensionality, we take $n = p$, penalty parameter $\lambda = 0.5$, and...	To demonstrate the superior performance of Gaussian perturbation over Laplace noise in your machine unlearning algorithm regarding Generalization Error Divergence (GED) under $(\phi, \epsilon)$-Gaussian certifiability, you must construct an experimental framework that rigorously disentangles the effects of dimensionali...	true	16,188	0.5	10	5	[{"criterion_id": 1, "aspect": "key_idea", "criterion": "Does the response propose evaluating the Generalization Error Divergence (GED) across both synthetic and real-world datasets?", "reference_excerpt": "design a comprehensive suite of experiments evaluating the Generalization Error Divergence (GED) across synthetic...	
[{"criterion_id": 1, "aspect": "key_idea", "criterion": "Does the response propose evaluating the Generalization Error Divergence (GED) across both synthetic and real-world datasets?", "reference_excerpt": "design a comprehensive suite of experiments evaluating the Generalization Error Divergence (GED) across synthetic...
0FJYicpOj0	Gaussian certified unlearning in high dimensions: A hypothesis testing approach	cmu-pope-4b-v0300-iter-099	results_to_analysis	p006	1	concise	In our unlearning experiments with large dimensions ($p=1000$ to $10000$), Gaussian noise consistently achieves lower Generalization Error Difference (GED) and Unlearning Error Difference (UED) than eps-delta and Laplace noise. To formally prove our theoretical results in this high-dimensional setting, what standard as...	### 5.1 LARGE DIMENSIONAL AND REAL DATA EXPERIMENTS In this section, we add additional experiments with larger dimension $p$. The problem setting is exactly the same as in the previous section, but $p$ now varies from 1000 to 10000. Figure 4 shows that the same pattern as in Figure 1 earlier persists here. In the abov...	Assumption (B1) on mean-zero sub-Gaussian features is a mild and frequently made in many papers on high dimensional statistics. See for example Miolane & Montanari (2021); Rahnama Rad & Maleki (2020); Auddy et al. (2024); Zheng et al. (2017); Donoho & Montanari (2016); Bellec et al. (2025). Assumption (B2) is not a str...	To formally prove theoretical results regarding the performance of Gaussian noise versus $\epsilon$-$\delta$ (Exponential Mechanism) and Laplace noise in the context of unlearning large-scale models ($p \in [1000, 10000]$), you must move beyond generic privacy-unlearning trade-offs and construct a framework that spec...	true	16,199	0.5	10	5	[{"criterion_id": 1, "aspect": "key_idea", "criterion": "Does the response state that the theoretical proofs rely on two main assumptions?", "reference_excerpt": "rely on two standard assumptions"}, {"criterion_id": 2, "aspect": "key_idea", "criterion": "Does the response identify that one of the assumptions must be ma...	
[{"criterion_id": 1, "aspect": "key_idea", "criterion": "Does the response state that the theoretical proofs rely on two main assumptions?", "reference_excerpt": "rely on two standard assumptions", "judgment": "no", "evidence": "The response categorizes assumptions into three areas (data geometry, model architecture, a...
0FJYicpOj0	Gaussian certified unlearning in high dimensions: A hypothesis testing approach	cmu-pope-4b-v0300-iter-099	results_to_analysis	p007	3	conversational	I'm trying to write up the theoretical analysis for our unlearning results. We saw great performance in high dimensions ($p \sim n$), but I need to explain why prior theoretical frameworks don't apply here. Specifically, I want to show that assuming $\mu I_p \preceq \nabla^2 f(\beta, z_i) \preceq L I_p$ with constant $...	![Figure 1: Comparison of unlearned estimators on new test data: mean GED (with 3 SD error bars) across the dimension p (both in log scale) for Laplace (in red) vs. Gaussian (in cyan). We set λ = 0.5.](figures/figure_1.png) <!-- Figure 1 description: Three log-log plots showing Generalization Error Divergence (GED) vs....	Failure of the assumptions in Allouah et al. (2025b), Sekhari et al. (2021). In their theory Allouah et al. (2025b) assumes[^1] that $f$ satisfies the standard optimization assumptions. More precisely, they assume $\beta \to f(\beta, z_i)$ is simultaneously $\mu$-strongly convex and $L$-smooth, i.e., $$\mu I_p \pre...	To address the theoretical gap your unlearning framework encounters in the high-dimensional regime ($p \sim n$), we must first deconstruct the assumptions of standard convex optimization literature (e.g., those relying on uniform Hessian bounds $\mu I \preceq \nabla^2 f \preceq L I$). These bounds imply that the loss l...	true	18,872	0.3	10	3	[{"criterion_id": 1, "aspect": "key_idea", "criterion": "Does the response state that simultaneous strong convexity and smoothness conditions are incompatible in the high-dimensional proportional regime?", "reference_excerpt": "demonstrate that simultaneous strong convexity and smoothness conditions are incompatible."}...	
[{"criterion_id": 1, "aspect": "key_idea", "criterion": "Does the response state that simultaneous strong convexity and smoothness conditions are incompatible in the high-dimensional proportional regime?", "reference_excerpt": "demonstrate that simultaneous strong convexity and smoothness conditions are incompatible.",...
0FJYicpOj0	Gaussian certified unlearning in high dimensions: A hypothesis testing approach	cmu-pope-4b-v0300-iter-099	mixed_context	p008	1	concise	"We have demonstrated that a Gaussian-perturbed one-step Newton estimator achieves vanishing General(...TRUNCATED)	"![Figure 1: Comparison of unlearned estimators on new test data: mean GED (with 3 SD error bars) ac(...TRUNCATED)	"Our analysis of models with a large number of parameters opens new avenues for tackling several oth(...TRUNCATED)	"Based on the strong foundation you have established regarding the Gaussian-perturbed one-step Newto(...TRUNCATED)	true	13,147	1	1	1	"[{\"criterion_id\": 1, \"aspect\": \"conceptual_depth\", \"criterion\": \"Does the response propose(...TRUNCATED)	"[{\"criterion_id\": 1, \"aspect\": \"conceptual_depth\", \"criterion\": \"Does the response propose(...TRUNCATED)
3RQ863cRbx	Visual symbolic mechanisms: Emergent symbol processing in Vision Language Models	cmu-pope-4b-v0300-iter-099	direction_to_idea	p001	1	concise	"I am investigating the 'binding problem' in Vision Language Models (VLMs), where models struggle to(...TRUNCATED)	"The visual world is composed of many commonly recurring elements — shapes, colors, textures, etc.(...TRUNCATED)	"![Figure 1: Overview of position ID architecture and supporting evidence. We identify three process(...TRUNCATED)	"To investigate the internal mechanisms by which Vision Language Models (VLMs) attempt to resolve th(...TRUNCATED)	true	16,505	0.2	10	2	"[{\"criterion_id\": 1, \"aspect\": \"key_idea\", \"criterion\": \"Does the response propose investi(...TRUNCATED)	"[{\"criterion_id\": 1, \"aspect\": \"key_idea\", \"criterion\": \"Does the response propose investi(...TRUNCATED)
3RQ863cRbx	Visual symbolic mechanisms: Emergent symbol processing in Vision Language Models	cmu-pope-4b-v0300-iter-099	literature_to_gap	p002	2	detailed	"Mechanistic interpretability has revealed structured, symbol-like processing mechanisms in text-onl(...TRUNCATED)	"A number of studies have investigated the emergent mechanisms that support symbol-like processing i(...TRUNCATED)	"Beyond the difference in modalities (images vs. text), there are also several novel contributions o(...TRUNCATED)	"The extension of mechanistic interpretability techniques—specifically those designed for the high(...TRUNCATED)	true	14,713	0.6	10	6	"[{\"criterion_id\": 1, \"aspect\": \"key_idea\", \"criterion\": \"Does the response state that the (...TRUNCATED)	"[{\"criterion_id\": 1, \"aspect\": \"key_idea\", \"criterion\": \"Does the response state that the (...TRUNCATED)
3RQ863cRbx	Visual symbolic mechanisms: Emergent symbol processing in Vision Language Models	cmu-pope-4b-v0300-iter-099	problem_gap_to_method	p003	1	concise	"Vision Language Models (VLMs) often struggle with the 'binding problem'—associating the correct f(...TRUNCATED)	"The visual world is composed of many commonly recurring elements — shapes, colors, textures, etc.(...TRUNCATED)	"![Figure 1: Overview of position ID architecture and supporting evidence. We identify three process(...TRUNCATED)	"The \"binding problem\" in Vision-Language Models (VLMs) is fundamentally a challenge of aligning h(...TRUNCATED)	true	16,806	0.1	10	1	"[{\"criterion_id\": 1, \"aspect\": \"key_idea\", \"criterion\": \"Does the response propose an inte(...TRUNCATED)	"[{\"criterion_id\": 1, \"aspect\": \"key_idea\", \"criterion\": \"Does the response propose an inte(...TRUNCATED)
3RQ863cRbx	Visual symbolic mechanisms: Emergent symbol processing in Vision Language Models	cmu-pope-4b-v0300-iter-099	method_to_experiments	p004	1	concise	"I am investigating how Vision-Language Models (VLMs) bind visual features to objects, specifically (...TRUNCATED)	"![Figure 1: Overview of position ID architecture and supporting evidence. We identify three process(...TRUNCATED)	"To identify the mechanisms underlying visual binding in VLMs, we used a scene description task that(...TRUNCATED)	"To rigorously investigate how Vision-Language Models (VLMs) bind visual features to objects and spe(...TRUNCATED)	true	13,168	0.7	10	7	"[{\"criterion_id\": 1, \"aspect\": \"key_idea\", \"criterion\": \"Does the response propose a scene(...TRUNCATED)	"[{\"criterion_id\": 1, \"aspect\": \"key_idea\", \"criterion\": \"Does the response propose a scene(...TRUNCATED)
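
In every preview row above, the score column equals n_yes divided by n_criteria (for example 0.9 with 9 of 10, and 1 with 1 of 1). The dataset itself does not document this rule, so the following is only a minimal sketch of how the three columns could be recomputed from a judged_criteria cell. The helper name summarize_judgments is hypothetical. The "yes" label is an assumption, since the visible cells only show "judgment": "no". Returning 0.0 when there are no criteria is also an assumption.

```python
import json


def summarize_judgments(judged_criteria: str) -> dict:
    """Recompute n_criteria, n_yes and score from a judged_criteria JSON string.

    Assumes each list entry carries a "judgment" field whose positive value
    is "yes" (the preview only shows "no"), and that score is the fraction
    of criteria judged "yes".
    """
    criteria = json.loads(judged_criteria)
    n_criteria = len(criteria)
    n_yes = sum(1 for c in criteria if c.get("judgment") == "yes")
    # Assumption: rows with no criteria get a score of 0.0.
    score = n_yes / n_criteria if n_criteria else 0.0
    return {"n_criteria": n_criteria, "n_yes": n_yes, "score": score}


# Synthetic cell for illustration only; not taken from the dataset.
example = json.dumps([
    {"criterion_id": 1, "aspect": "key_idea", "judgment": "yes"},
    {"criterion_id": 2, "aspect": "key_idea", "judgment": "no"},
])
summary = summarize_judgments(example)
print(summary)
```

Comparing the recomputed values against the stored score, n_criteria and n_yes columns would be one way to check whether the inferred rule holds beyond the previewed rows.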

End of preview.

README.md exists but content is empty.

Downloads last month: 75