paper_id (string, 221 values) | title (string, 221 values) | model (string, 1 value) | strategy (string, 6 values) | pair_id (string, 37 values) | variant_id (int64, 1 to 3) | style (string, 5 values) | query (string, 176 to 1.08k chars) | response (string, 714 to 33.6k chars) | finished (bool, 1 class) | response_chars (int64, 714 to 33.6k) | score (float64, 0 to 1) | n_criteria (int64, 0 to 11) | n_yes (int64, 0 to 10) | criterion_scores (string, 2 to 3.14k chars) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0FJYicpOj0 | Gaussian certified unlearning in high dimensions: A hypothesis testing approach | azure_openai_gpt-5_4 | direction_to_idea | p001 | 3 | conversational | I'm trying to improve the efficiency of approximate unlearning for models where $p \sim n$. Right now, the state-of-the-art by Zou et al. (2025) needs at least two Newton steps to remove a data point because they use standard $\epsilon$-certifiability, which forces them to add massive amounts of Laplace noise. Since hi... | Yes. A promising way to do this is to replace the worst-case, coordinatewise notion behind standard \(\epsilon\)-certifiability with a distributional notion tailored to the fact that, in the \(p \sim n\) regime, the one-step parameter perturbation from removing a point is often well-approximated by a high-dimensional G... | true | 20,831 | 0.7778 | 9 | 7 | [{"criterion_id": 2, "judgment": "yes", "evidence": "Section 3.2 introduces a \"Gaussian certifiability\" framework, replacing standard \u03b5-certifiability with a Gaussian\u2011based definition."}, {"criterion_id": 3, "judgment": "yes", "evidence": "The response states: \"standard \u03b5\u2011certifiability ... force... |
0FJYicpOj0 | Gaussian certified unlearning in high dimensions: A hypothesis testing approach | azure_openai_gpt-5_4 | literature_to_gap | p002 | 1 | concise | Recent theoretical work on machine unlearning, such as Newton-based methods by Guo et al. and Sekhari et al., and gradient-based methods by Allouah et al., provides privacy guarantees for data removal. However, what is the critical gap or implicit assumption in these theoretical frameworks when applied to modern AI mod... | The critical gap is that these theoretical unlearning guarantees typically rely on a **classical asymptotic regime** that is badly mismatched to modern overparameterized models.
More concretely, the implicit assumption is:
## Core hidden assumption
They assume learning occurs in a regime where the **number of model p... | true | 4,740 | 0.6 | 10 | 6 | [{"criterion_id": 1, "judgment": "yes", "evidence": "They assume learning occurs in a regime where the number of model parameters is fixed or grows much more slowly than the number of training samples."}, {"criterion_id": 2, "judgment": "yes", "evidence": "In modern models, especially deep networks and large foundation... |
0FJYicpOj0 | Gaussian certified unlearning in high dimensions: A hypothesis testing approach | azure_openai_gpt-5_4 | problem_gap_to_method | p003 | 3 | conversational | I'm working on machine unlearning for high-dimensional models. The standard approach of exact retraining is too expensive, and existing approximate methods rely on low-dimensional assumptions that don't hold when parameters scale with the dataset size. If we want to build a randomized update procedure that relies on th... | A good way to think about this is:
- You have a trained model parameter vector \(\hat\theta\) obtained on dataset \(S\).
- You want to “delete” a subset \(U \subset S\), producing a new parameter vector that should behave like the retrained solution on \(S \setminus U\), but without full retraining.
- In high dimensio... | true | 19,569 | 1 | 10 | 10 | [{"criterion_id": 1, "judgment": "yes", "evidence": "The goal is to construct a randomized map ... that is close in utility to retrained model ..."}, {"criterion_id": 2, "judgment": "yes", "evidence": "You asked specifically for formal metrics for both privacy protection and accuracy. In high dimensions, deterministic ... |
0FJYicpOj0 | Gaussian certified unlearning in high dimensions: A hypothesis testing approach | azure_openai_gpt-5_4 | method_to_experiments | p004 | 2 | detailed | I am working on a machine unlearning algorithm that avoids full retraining by updating the original estimator $\hat{\beta}$ using a randomized procedure with Gaussian noise $\mathbf{b}$. The goal is to satisfy $(\phi, \epsilon)$-Gaussian certifiability for privacy while maintaining accuracy, measured by Generalization ... | Below is a concrete experimental blueprint to show that a Gaussian-perturbation unlearning update yields better utility than a Laplace-perturbation baseline under matched certifiability guarantees, while avoiding full retraining.
---
## 1. Core claim to test
You want to demonstrate:
1. **Privacy/certifiability:** y... | true | 24,450 | 0.5 | 10 | 5 | [{"criterion_id": 1, "judgment": "yes", "evidence": "The response explicitly defines GED in Section 4 and states that experiments will be run on both synthetic logistic regression tasks and real\u2011world datasets to evaluate GED."}, {"criterion_id": 2, "judgment": "yes", "evidence": "Section 2 lists both a Gaussian\u... |
0FJYicpOj0 | Gaussian certified unlearning in high dimensions: A hypothesis testing approach | azure_openai_gpt-5_4 | results_to_analysis | p006 | 1 | concise | In our unlearning experiments with large dimensions ($p=1000$ to $10000$), Gaussian noise consistently achieves lower Generalization Error Difference (GED) and Unlearning Error Difference (UED) than eps-delta and Laplace noise. To formally prove our theoretical results in this high-dimensional setting, what standard as... | For high-dimensional theoretical guarantees of unlearning with \(p \in [10^3,10^4]\), the standard route is to place assumptions on:
1. the **feature distribution** \(x \in \mathbb{R}^p\),
2. the **linear predictor / generalized linear model**,
3. the **loss curvature and regularization**,
4. the **scaling of norms an... | true | 15,121 | 0.8 | 10 | 8 | [{"criterion_id": 1, "judgment": "no", "evidence": "The response lists many assumptions (feature distribution, model, loss, scaling, noise) rather than stating two main assumptions."}, {"criterion_id": 2, "judgment": "yes", "evidence": "The response explicitly lists \"feature distribution\" as one of the key assumption... |
0FJYicpOj0 | Gaussian certified unlearning in high dimensions: A hypothesis testing approach | azure_openai_gpt-5_4 | results_to_analysis | p007 | 1 | concise | Our experiments show that Gaussian perturbation outperforms other unlearning estimators in the high-dimensional proportional regime where $p \sim n$. How can we mathematically demonstrate that standard optimization assumptions, like simultaneous $O(1)$-smoothness and $\Omega(1)$-strong convexity of the per-example loss... | Yes. In the proportional high-dimensional regime \(p \asymp n\), one can make this failure mathematically explicit by looking at the Hessian geometry of standard empirical risks. The key point is that the usual assumptions
- per-example loss is \(L\)-smooth with \(L=O(1)\), and
- empirical or population objective is \... | true | 11,442 | 0.1111 | 9 | 1 | [{"criterion_id": 1, "judgment": "yes", "evidence": "The response states: \"cannot both hold uniformly, once the feature dimension grows proportionally with sample size\" and that simultaneous O(1)-smoothness and \u03a9(1)-strong convexity fail in this regime."}, {"criterion_id": 3, "judgment": "no", "evidence": "not f... |
0FJYicpOj0 | Gaussian certified unlearning in high dimensions: A hypothesis testing approach | azure_openai_gpt-5_4 | mixed_context | p008 | 1 | concise | We have demonstrated that a Gaussian-perturbed one-step Newton estimator achieves vanishing Generalization Error Divergence in high dimensions for generalized linear models, with empirical unlearning batch sizes scaling as $m^3 = o(n)$. What are the most promising theoretical and practical directions to extend this unl... | Below are directions that seem most promising if you want to extend a Gaussian-perturbed one-step Newton unlearning framework beyond the current “high-dimensional GLM + vanishing GED + empirical \(m^3=o(n)\)” regime. I’ll organize them by what is likely to matter theoretically, algorithmically, and empirically.
---
#... | true | 22,740 | 0.6 | 10 | 6 | [{"criterion_id": 1, "judgment": "yes", "evidence": "Section 3: \"Extend from GLMs to broader model classes\" lists general smooth M\u2011estimators, penalized estimators, kernel methods, and deep linear networks."}, {"criterion_id": 2, "judgment": "no", "evidence": "not found"}, {"criterion_id": 3, "judgment": "no", "... |
0FJYicpOj0 | Gaussian certified unlearning in high dimensions: A hypothesis testing approach | azure_openai_gpt-5_4 | mixed_context | p009 | 2 | detailed | I'm researching approximate machine unlearning in the high-dimensional proportional regime where the number of parameters $p$ is comparable to the sample size $n$. Existing theoretical guarantees for Newton-based unlearning rely on standard $\Omega(1)$ strong convexity and $O(1)$ smoothness assumptions, which break dow... | A useful way to attack this is to **change the notion of certifiability** so it matches what Newton-style unlearning actually produces in the proportional regime. The core issue is:
- In the high-dimensional regime \(p \asymp n\), the Hessian is not uniformly well-conditioned in the classical \(\mu I \preceq \nabla^2 ... | true | 21,757 | 0.4 | 5 | 2 | [{"criterion_id": 1, "judgment": "no", "evidence": "The response discusses Gaussian certifiability but does not mention an epsilon\u2011Gaussian framework."}, {"criterion_id": 3, "judgment": "no", "evidence": "The response does not reference the hypothesis\u2011testing interpretation of differential privacy or the conv... |
0jHyEKHDyx | Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention | azure_openai_gpt-5_4 | direction_to_idea | p001 | 3 | conversational | I've been looking into the persistent issue where BF16 flash attention causes training to suddenly diverge and explode after a few thousand steps. It seems like the community relies heavily on patches like QK-clip or higher precision, but I want to understand exactly why this happens at the arithmetic level. Can you br... | Yes. The short version is:
**BF16 flash attention can blow up because attention is an exponentially sensitive computation sitting on top of low-precision dot products and low-precision reductions.**
A small multiplicative or additive error in the logits can become a large relative error after softmax, and if that er... | true | 17,234 | 0.3 | 10 | 3 | [{"criterion_id": 1, "judgment": "no", "evidence": "not found"}, {"criterion_id": 2, "judgment": "yes", "evidence": "Fix B: Keep the softmax state in FP32, not BF16"}, {"criterion_id": 3, "judgment": "yes", "evidence": "softmax exponentiates these noisy logits ... This biases attention toward accidental winners"}, {"cr... |
0jHyEKHDyx | Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention | azure_openai_gpt-5_4 | problem_gap_to_method | p002 | 3 | conversational | "We are trying to push our transformer training to lower precisions for efficiency, specifically usi(...TRUNCATED) | "A principled way to think about this is:\n\n**the failure is not fundamentally “large QK values(...TRUNCATED) | true | 15,110 | 0 | 10 | 0 | "[{\"criterion_id\": 1, \"judgment\": \"no\", \"evidence\": \"not found\"}, {\"criterion_id\": 2, \"(...TRUNCATED) |
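
In the rows shown above, `score` appears to equal `n_yes / n_criteria` rounded to four decimals (for example, 7 of 9 gives 0.7778 and 1 of 9 gives 0.1111). This is inferred from the preview only and is not documented anywhere on the page. The sketch below recomputes the three numeric columns from a `criterion_scores` string, assuming that cell is a JSON list of objects with `criterion_id`, `judgment`, and `evidence` keys as seen in the preview. The function name `recompute_score` and the zero-criteria fallback of `0.0` are hypothetical choices made for illustration.

```python
import json


def recompute_score(criterion_scores_json: str) -> tuple[int, int, float]:
    """Return (n_criteria, n_yes, score) for one criterion_scores cell.

    Assumption: score = n_yes / n_criteria, rounded to 4 decimals,
    as suggested by the preview rows (not confirmed by any README).
    """
    criteria = json.loads(criterion_scores_json)
    n_criteria = len(criteria)
    # Count only judgments that are exactly "yes"; anything else counts as not met.
    n_yes = sum(1 for c in criteria if c.get("judgment") == "yes")
    # Assumption: a row with no criteria gets score 0.0 (the preview does not show this case).
    score = round(n_yes / n_criteria, 4) if n_criteria else 0.0
    return n_criteria, n_yes, score


# Toy cell in the same shape as the preview; the evidence strings are made up.
sample = json.dumps([
    {"criterion_id": 1, "judgment": "yes", "evidence": "toy evidence"},
    {"criterion_id": 2, "judgment": "no", "evidence": "not found"},
    {"criterion_id": 3, "judgment": "yes", "evidence": "toy evidence"},
])
print(recompute_score(sample))
```

Comparing the recomputed values against the stored `n_criteria`, `n_yes`, and `score` columns is one way to check that a row's `criterion_scores` cell was not truncated during export.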
