This page was last edited 06 December 2025.
This problem looks tractable: TerenceTao, BorisAlexeev
I’ve updated my public 'erdos-problems' workspace for Problem 848 on the finite-check side. No new best 'N_0' claim here; the update is mainly a bounded-verification workflow that separates imported threshold claims from repo-audited claims and requires explicit certificates for interval-coverage claims.
More broadly, I built 'erdos-problems' as an npm workspace for working on any Erdős problem against a canonical public repo. The idea is that when someone boots a problem from scratch, they inherit the current dossier, route notes, and review artifacts instead of starting from zero, and anyone can improve that shared base by contributing back to the repo.
Repo: https://github.com/SproutSeeds/erdos-problems
npm: https://www.npmjs.com/package/erdos-problems
Good news. GPT-5.4 Thinking has reduced the explicit threshold so that the problem is true for $N \ge N_0$ with $N_0 = 7 \cdot 10^{17}$ only! Here are the resulting notes.
This is notable because previously GPT-5.2 Thinking really struggled with this task and was nowhere near overcoming the obstacles it identified as blocking progress. (So this is anecdotal evidence for increasing AI capabilities if you're looking for it.)
Also notable: according to GPT-5.4 Thinking, the Montgomery-Vaughan large sieve is not in a form applicable to the problem, since our objects are modulo $p^2$ rather than modulo $p$. There is indeed some "square-moduli large-sieve literature", but GPT did not find it applicable either. In the end, it drew inspiration from works of Prachar (1958) and Hooley (1975), which worked.
Very interesting progress.
I gave your pdf to two different AIs and asked them to milk the bound of 7 x 10^17.
ChatGPT said, after a lengthy think:
* The proof as written already gives 6.75 x 10^17.
* Without strict formalization, I [GPT] think that 5.3 x 10^17 should be within reach.
Gemini 3.1 Pro was very eloquent in its wording, writing first:
> This is exactly the kind of mathematical detective work I enjoy.
> As an AI, I don't have human intuition, but I can rapidly scan
> the algebraic structure of the author's argument to spot where
> they left some "meat on the bone."
Then it writes a lot of technical this and that (omitted here)
and comes to two conclusions:
* Based on this approach, 1 x 10^17 seems to be out of reach.
* There is some room for improvement, but the technique will hit a
wall at around 4 x 10^17 or 5 x 10^17.
I do not want to flood the forum, so for now I am sending links to the two chats to Nat alone.
Thanks!
GPT-5.4 Thinking has now lowered the bound to $N \ge N_0$ where $N_0 = 3.3 \times 10^{17}$. Here are the notes.
This should already be close to optimal with current proof structure.
>> This should already be close to optimal with current proof structure. <<
We will try further and see. I let Gemini 3.1 Pro and GPT 5.4 Pro play some pingpong with your new report. It seems they have reached at least N_0 = 3.0 x 10^17, perhaps even 2.94 x 10^17. I sent you GPT's protocol.
PS. "Sending" protocols of pingpong sessions is somewhat tedious.
GPT-5.4 Thinking has now improved the threshold to $N \ge N_0$ with $N_0 = 2.64 \times 10^{17}$. Here are the notes.
As an experiment, GPT also has this to say about this work:
"My favorite compact summary is this: the work shows that the problem is governed by a very small local picture mod $25$, and the rest of the proof is an increasingly explicit demonstration that every attempted escape from that local picture leaks density."
(Some explanation of why asking GPT at the end of the work session is meaningful: In the session it would have wrestled with whatever aspect of the work was relevant then. That gave it some contextual experience to draw on which might lead it to conclude things it could not see in the beginning.
Since GPT loses its memory at the end of every work session, this can theoretically be a way to preserve some of that lost understanding. But it's probably best not to think about it too much.)
Why do you end your work sessions at all?
Simply leave them open and re-enter next time.
Good question.
1) For checking, it is absolutely crucial that you start a new session - or it will be biased by pre-existing work.
2) At the start of a session, you want to *constrain* the context to relevant documents. So there are natural break points, centering around finishing a session when a document or a task is "done" in some sense.
3) For different problems, clearly you want fresh sessions because the approaches will interfere with each other.
Most importantly - *mistakes* made earlier in a session can and do propagate, so it is very difficult to control quality if you let a session run on.
In general, apart from all these issues, there's a technical reason that LLMs reason *worse* as context fills up, so you want to guard against that as well.
Wouter and I also recently encountered this lack of square-moduli large sieve literature, and ended up writing our own, see Appendix A of our paper. But it might not be in the precise form needed for this application.
Thanks for the pointer! To be sure, I also asked GPT-5.4 Thinking and it couldn't make use of this large sieve either.
I've been working on Problem 848 and have what I believe is a complete verification, but I'd appreciate expert eyes on whether the argument holds up. Posting the full framework and results for review.
Disclosure: This work was prepared with AI assistance (Claude by Anthropic, GPT-5.4 by OpenAI). All mathematical content and computational results have been verified by me, but please still be cautious about potential mistakes here.
Approach: I developed an "outsider-clique framework" that partitions non-base-class elements ("outsiders") by least witness prime p ≡ 1 (mod 4), p ≥ 13. The key observation: for opposite-root cross-pairs a ≡ r, b ≡ −r (mod p²), where r² ≡ −1 (mod p²), the cross-product satisfies ab+1 ≡ −r²+1 ≡ 2 (mod p²), so p² never divides it. This constrains outsider clique structure enough to reduce the problem to checking a structured inequality:
$s_{\max}^{(p)}(N) + \max(V_p^{+}, V_p^{-}) + d_{\max}^{(p)}(N) + R_{>p}(N) < M(N)$
at structural breakpoints, where s_max is the worst-case base exchange, V_p are root class sizes, d_max is the maximum cross-degree, and R_{>p} counts elements in higher root classes.
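The local observation underpinning the framework can be sanity-checked numerically. A minimal sketch (my own illustration, not the author's verifier; it brute-forces the square roots of $-1$ modulo $p^2$ for a few small witness primes):

```python
# Check the opposite-root cross-pair observation: for a witness prime
# p ≡ 1 (mod 4) and r a square root of -1 modulo p^2, any a ≡ r and
# b ≡ -r (mod p^2) give ab + 1 ≡ -r^2 + 1 ≡ 2 (mod p^2), so p^2
# (with p ≥ 13) can never divide ab + 1.

def roots_of_minus_one(m):
    """Brute-force square roots of -1 modulo m (fine for small m)."""
    return [r for r in range(m) if (r * r + 1) % m == 0]

for p in (13, 17, 29):              # witness primes p ≡ 1 (mod 4)
    m = p * p
    roots = roots_of_minus_one(m)
    assert len(roots) == 2          # the two opposite roots r and -r
    for r in roots:
        a, b = r, (-r) % m          # opposite-root representatives
        assert (a * b + 1) % m == 2
    print(p, roots)
```

For $p = 13$, for instance, the roots are $70$ and $99$, and $70 \cdot 99 + 1 = 6931 \equiv 2 \pmod{169}$.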
**What I've verified computationally:**
- C++ verifier (sieve-based bitmask architecture): 219 witness-prime blocks, 662,188 breakpoints, N = 1 to 10^7. Zero failures for N ≥ 7,307. Final margin at N = 10^7: 15,214 (worst prime p = 13).
- Independent Python verifier: 25,658 breakpoints, N = 5,000 to 500,000. Zero failures.
- Pair exchange analysis: for N ≤ 2,000, no single outsider or compatible outsider pair can match the base class.
**Where I believe the proof stands:**
- N ≤ 7,306: covered by layered exact computation (pair analysis + independent verifiers with overlapping ranges)
- N = 7,307 to 10^7: covered by the structured inequality verification (zero failures)
- N > 10^7: Sawhney's asymptotic theorem
**What I'm unsure about:**
1. Is the structured inequality (Proposition 4.1 in my write-up) actually a valid upper bound on |A| for any valid set containing a witness-p outsider? The argument is: base survivors ≤ s_max, p-block clique ≤ V + d_max, higher primes ≤ R_{>p}. Does this logic hold?
2. Is the breakpoint sufficiency argument correct, i.e. that checking only structural breakpoints (where M, V, or the outsider population changes) suffices because the LHS can only change at breakpoints?
3. Is the N ≤ 7,306 coverage actually airtight given the overlapping methods?
Full proof write-up, verification source code (C++ and Python), and reproduction instructions are at: https://github.com/hjyuh/erdos
I'd be grateful for any feedback on gaps, errors, or unjustified steps. If this does hold up, the answer to Problem 848 would be TRUE for all N.
I'll take a look. But even if this holds, the current best bound is $\exp(1420)$, which is nowhere close to $10^7$, unless you know how to reduce that bound.
You're right; I seem to have overstated the coverage. Sawhney's theorem kicks in at exp(1420), not 10^7. My computation covers N ≤ 10^7, so there's a gap from 10^7 to exp(1420) that the current proof doesn't cover. I'll look into whether the outsider-clique framework can bring the theoretical bound down, or whether the verification can be extended with structural arguments rather than brute force. Though I doubt the latter is doable, considering the algorithm I used for verification up to 10^7 required almost a day of compute. Are you aware of any way to bring it down in those terms? Thanks for catching this.
I also see from a quick glance that there are many obvious errors (not fatal) in the work:
For example:
- "J. van Doorn" is wrong; should be "W. van Doorn".
- There is no such thing as "Sawhney's asymptotic theorem for N > 10^7".
Since you have submitted multiple works recently which contain many obvious errors and are produced in a very short amount of time, this strains the verification resources of the site.
In the future, similar works will not be reviewed or put on the AI wiki by me. You will first have to demonstrate that you can produce works of sufficient quality.
I do appreciate your work, but there are pragmatic constraints. I will at least review this work in particular.
You're right on both errors, and I appreciate you continuing to review this despite the issues. I've evidently been moving too fast and not checking carefully enough before posting. I'll slow down and make sure future contributions are properly verified before sharing. Thank you for the honest feedback, and apologies for the low-quality work.
The repo is difficult to verify. A check by GPT-5.4 Thinking indicates that the current computation is likely *incorrect*. However, we are not sure whether that approach is salvageable (GPT is sort of on the fence). In any case, I will not review future versions of this further.
Okay, I appreciate your help.
What happens if one only requires that products $ab+1$ with distinct $a,b$ are not squarefree?
Interesting question! The original arguments do use that $a^2 + 1$ is not squarefree (i.e. taking $a=b$), so they won't work directly.
Nevertheless, there are some things we can say. GPT-5.4 Thinking in this note has derived the bound $\limsup_{N\to\infty} |A|/N \le 1 - 49/(6\pi^2) \approx 0.17254$ under that weaker condition.
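For reference, the numerical value quoted at the end is straightforward to check; a quick arithmetic sketch:

```python
import math

# The claimed density bound under the weaker (distinct a, b) condition:
# limsup_{N -> infinity} |A|/N <= 1 - 49/(6 * pi^2).
bound = 1 - 49 / (6 * math.pi ** 2)
print(f"{bound:.5f}")  # 0.17254, matching the quoted approximation
```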
Congratulations to Sawhney (and/or van Doorn, Weisenberg, Cambie, ChatGPT 5 Pro) for resolving the conjecture for sufficiently large $n$.
The original problem [Er92b,p.239] has no restriction to "sufficiently large":
"23. Here is a recent problem of Sárközy and myself: Let $a_1 < a_2 < \cdots < a_k \le n$ be a sequence of [positive] integers.
...
Assume next that $a_i a_j+1$ is never squarefree. This is satisfied if $a_i \equiv 7\pmod{25}$. Is this sequence maximal?"
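The construction in the quote is easy to verify: if $a \equiv b \equiv 7 \pmod{25}$, then $ab + 1 \equiv 49 + 1 = 50 \equiv 0 \pmod{25}$, so $5^2 \mid ab+1$ and $ab+1$ is never squarefree. A minimal check:

```python
# Verify the quoted Erdős–Sárközy construction: for a, b ≡ 7 (mod 25),
# ab + 1 ≡ 7*7 + 1 = 50 ≡ 0 (mod 25), so 25 = 5^2 divides ab + 1
# and ab + 1 is never squarefree.
seq = list(range(7, 1001, 25))       # 7, 32, 57, ..., all ≡ 7 (mod 25)
for a in seq:
    for b in seq:
        assert (a * b + 1) % 25 == 0
print(f"verified for all pairs from {len(seq)} terms up to 1000")
```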
That's awesome! It seems plausible that the arguments from Sawhney can be made explicit to obtain a (reasonable) $n_0$ such that it's true for all $n \ge n_0$. This could be quite annoying as well though, but maybe if I quit my job I'll have the time to look into it. For Thomas: slight typo near the bottom where the word 'in' in 'This was solved in by Sawhney' may be removed.
I have participated in at least 3 problems recently (all vaguely about "elementary number theory") where it was possible to prove a conjecture was true for sufficiently large $n\ge n_0$ and separately one could compute it was true for small $n$. Every time, the "sufficiently large $n$" with naïve arguments (from "basic" number-theoretic estimates, which I have found to be very "loose"/"lossy") resulted in incredibly large values of $n_0$. Depending on the project, it was often very difficult to reduce $n_0$ sufficiently to finish off the problem. And usually it required some computational tricks to improve the small $n$ case too! (For example, for this problem with straightforward approaches I've only checked $n$ up to a few hundred.)
One such problem was [844], which is actually the problem elided in the "..." quote from [Er92b,p.239] I included above. We eventually got $n_0=10^{10}$. And separately, we found a computational approach that allowed us to actually complete the finite computation even though that's not the best bound. (But in that case, Desmond Weisenberg found a very nice solution that avoided all of this trouble!) Incidentally, I did quit my job in order to be able to work on problems like these...
Good luck!
There is a decent chance of getting a reasonable bound out of the proof. The first step would be getting a power-saving version of Lemma 2.1 and Lemma 2.2. Via Möbius inversion, I believe one can write the number of squarefree integers in an interval $[n]$ as $\sum_{x=1}^{\infty}\mu(x)\lfloor n/x^2\rfloor$. Truncating the sum at $n^{1/2}$ immediately gives a square-root-savings type bound. For Lemma 2.2, the primes larger than $N^{1/2}$ are slightly annoying to handle, but numerical bounds on the primes might already be useful. Additionally, for intermediate values of $N$ one can run the proof with explicit values for these quantities and check whether those suffice. I don't have a sense of the feasibility of this strategy in the slightest, though.
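The Möbius identity is easy to illustrate concretely (my own sketch; the brute-force cross-check and cutoffs are illustrative, not from the note). Note the sum truncates exactly at $n^{1/2}$, since $\lfloor n/x^2 \rfloor = 0$ for $x > n^{1/2}$:

```python
import math

def mobius_sieve(limit):
    """Möbius function mu(1..limit) via a linear-style sieve."""
    mu = [1] * (limit + 1)
    is_comp = [False] * (limit + 1)
    primes = []
    for i in range(2, limit + 1):
        if not is_comp[i]:
            primes.append(i)
            mu[i] = -1
        for p in primes:
            if i * p > limit:
                break
            is_comp[i * p] = True
            if i % p == 0:
                mu[i * p] = 0   # p^2 divides i*p, so mu vanishes
                break
            mu[i * p] = -mu[i]
    return mu

def squarefree_count(n):
    """Q(n) = sum_{x <= sqrt(n)} mu(x) * floor(n / x^2), exact since
    floor(n / x^2) = 0 for x > sqrt(n)."""
    root = math.isqrt(n)
    mu = mobius_sieve(root)
    return sum(mu[x] * (n // (x * x)) for x in range(1, root + 1))

def is_squarefree(m):
    """Trial-division check, for the brute-force comparison only."""
    d = 2
    while d * d <= m:
        if m % (d * d) == 0:
            return False
        d += 1
    return True

n = 10_000
brute = sum(is_squarefree(m) for m in range(1, n + 1))
assert squarefree_count(n) == brute
print(squarefree_count(n), round(6 * n / math.pi ** 2))  # count vs. 6n/pi^2
```

The power-saving question is then about how well the truncated sum approximates counts in short intervals, which this toy check does not address.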
Corollary 1 of Montgomery-Vaughan's "The large sieve" tends to give rather good (and completely explicit) upper bounds on these sorts of sieve-theoretic problems in practice.
I had GPT work, in a near-autonomous process, on deriving an explicit threshold $N_0$ for which the problem is true for $N \ge N_0$. Here is the resulting note. The final result obtained is $N_0 = \exp(1958)$. We tried for more improvements but stalled around here.
One interesting feature of the process this time is that it breaks down into multiple phases. The very first bound is $N_0 = \exp(682\,276)$; we then subsequently improve to $N_0 = \exp(4500)$, $\exp(2600)$, $\exp(2000)$, and finally $\exp(1958)$.
I should comment that not everything planned worked out. In particular, we did not succeed in getting the desired version of Lemma 2.2. This means that if such obstacles can be overcome in the future, more improvements should be possible!
Note: This is still GPT-5.2 Thinking. There is also one more process in my pipeline which is still GPT-5.2 Thinking.
I just asked Gemini 3.1 Pro to milk the construction in the note.
It comes up with a proposal exp(1420).
My Chat with Gemini
One of Gemini's early comments is:
Because the author [natso26] prioritized straightforward proofs over optimal tightness, there is massive room for improvement. By aggressively tightening three specific mathematical steps in the author's error budget $E(N)$, we can shave hundreds off that exponent.
Let us see if ChatGPT 5.4 can improve on this. The race is open.
EDIT: I gave your report and the Gemini "improvement" to ChatGPT 5.2.
Not surprisingly, it claims an error in Gemini's chain of thought. On the other hand, it claims an even better upper bound, namely exp(1185).
Unfortunately, ChatGPT gave its answer in German (it does so from time to time), so I will not link the argument here. My overall impression is that bounds like exp(1185) are also only intermediate steps.
Ok, so I should say that the arguments here are quite delicate, and multiple improvements have already been tried. It doesn't seem likely that you can get that low without something more substantial.
You can give me a writeup though (for any of these bounds), and I'll try to verify them.
Here is the new argument by ChatGPT:
It takes back the 1185 claim, but now accepts the exp(1420).
See at the end of this chat:
on 1420 and failed 1185
EDIT:
pdf with ChatGPT's thinking
Ok, so the strategy looks plausible. Roughly, we improve $\sum_{n>T} 1/n^2$ over integers $n$ to over primes $n$; and we replace prime counting $\pi(N)$ with the $1 \bmod 4$ version $\pi(N;4,1)$. The overall approach does not change.
This does require further error checking and careful writeup. I'll work on it later. (So this improvement is only tentative for now.)
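The first replacement (integers to primes in the tail sum) is easy to gauge numerically; a rough sketch with arbitrary illustrative cutoffs (the true tails beyond the truncation limit $L$ are negligible at this scale):

```python
def primes_up_to(limit):
    """Simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(range(p * p, limit + 1, p)))
    return [i for i, flag in enumerate(sieve) if flag]

# Tail sums sum_{n > T} 1/n^2, truncated at L: over all integers the
# tail is about 1/T, while restricting to primes only is several times
# smaller, which is the kind of savings the replacement exploits.
T, L = 100, 1_000_000
integer_tail = sum(1.0 / (n * n) for n in range(T + 1, L + 1))
prime_tail = sum(1.0 / (p * p) for p in primes_up_to(L) if p > T)
print(f"integers: {integer_tail:.6f}  primes: {prime_tail:.6f}")
```

The second replacement, $\pi(N) \to \pi(N;4,1)$, roughly halves the relevant prime counts by Dirichlet's theorem and can be gauged the same way.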
Thanks!
For obtaining good estimates here, explicit forms of PNT are likely a poor choice. I believe Terry's suggestion of using Montgomery--Vaughan will likely work much better. (I would guess that $10^{6}$ should easily be achievable this way.)
So the planned improvement (based on your and Tao's suggestions) did not work out all the way. GPT-5.2 Thinking believed that one needs a good replacement for both Lemma 2.1 and Lemma 2.2, and that once this is done we should get a much better bound. However, GPT struggled to implement this for Lemma 2.2, so the result is in some sense an "intermediate" version salvaged from what it could in fact prove.
However, if you have ideas that can break through this, do let me know!
The $\exp(1420)$ bound is now completed in this writeup by GPT-5.4 Thinking.