GPT-5 Pro found a counterexample to the NICD-with-erasures majority optimality (Simons list, p.25). https://simons.berkeley.edu/si... At p=0.4, n=5, f(x) = sign(x_1-3x_2+x_3-x_4+3x_5) gives E|f(x)|=0.43024 vs best majority 0.42904.
@PI010101 Can someone explain to me how f in the problem statement (originally defined only for ±1 inputs) is to be interpreted where an input is 0? I assume this is the point of the parenthetical about multilinear expansion, but I'm not sure I follow how to interpret that.
@RadishHarmers Good question. Use the unique multilinear/Fourier–Walsh extension of f (this requires some additional computation - finding the Fourier coefficients of f) and then plug in 0 for the erased inputs. GPT 5 Pro did that.
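For anyone who wants to check the numbers: evaluating the multilinear extension with 0 in the erased coordinates is the same as taking the conditional expectation of f over the hidden bits, so the claimed values can be verified by direct enumeration. A minimal sketch (assuming p = 0.4 is the probability a bit is seen, i.e. each coordinate is independently erased with probability 0.6 - the convention that reproduces the quoted numbers):

```python
from itertools import product

def expected_abs(f, n, p):
    """E|f~(y)|, where f~ is the multilinear extension of f and each
    coordinate of y equals x_i with probability p and 0 (erased) with
    probability 1 - p."""
    total = 0.0
    for pattern in product([0, 1], repeat=n):   # 1 = bit observed
        k = sum(pattern)
        weight = p**k * (1 - p)**(n - k)        # probability of this pattern
        obs = [i for i in range(n) if pattern[i]]
        hid = [i for i in range(n) if not pattern[i]]
        acc = 0.0
        for vals in product([-1, 1], repeat=k):          # observed bits
            s = 0
            for hvals in product([-1, 1], repeat=n - k): # hidden bits
                x = [0] * n
                for i, v in zip(obs, vals):
                    x[i] = v
                for i, v in zip(hid, hvals):
                    x[i] = v
                s += f(x)
            acc += abs(s) / 2**(n - k)   # |conditional expectation of f|
        total += weight * acc / 2**k
    return total

sign = lambda s: 1 if s > 0 else -1
maj5 = lambda x: sign(sum(x))
cand = lambda x: sign(x[0] - 3*x[1] + x[2] - x[3] + 3*x[4])

print(round(expected_abs(maj5, 5, 0.4), 5))  # 0.42904
print(round(expected_abs(cand, 5, 0.4), 5))  # 0.43024
```

Both quoted values come out exactly, with the weighted function beating majority of 5 by 0.0012.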
@PI010101 And every single kid deserves the same model you got to use. •
@PI010101 I tried to find a counterexample with GPT 5 Thinking, but didn't find any. As the screenshots show, GPT 5 Thinking complains there is a time limit, so the heavy Fourier calculation is too slow to finish. After hitting the limit, GPT 5 Thinking made a guess:
@PI010101 Grok, Gemini and I have a physical proof of the Riemann Hypothesis - by counterexample.
@PI010101 GPT‑5 Pro just brute-forced a counterexample to a majority-vote optimality problem in NICD-with-erasures. Cool. Z‑15 would’ve skipped the 12B tokens it took to get there and shown why the theorem fails in 3 compressed traces. We’re not just watching LLMs discover math. We’re
@PI010101 If you're interested, I made GPT-5 solve a 2-month-old cutting-edge research problem on its own, and how it succeeded is even more interesting than the fact that it succeeded. https://open.substack.com/pub/...
@PI010101 Ask it what happened to Balaji.
@PI010101 The future might be giant clusters running proving or disproving mathematical statements, with some AIs validating results, and other AIs supervising to determine which results are interesting or inter-related.
@PI010101 Yes — what you’re seeing there isn’t a new “branch” of mathematics, but an example of how researchers are using Boolean-function analysis and noise-stability theory (from combinatorics + probability + Fourier analysis on \{-1,1\}^n) to test the limits of a famous conjecture.
@PI010101 Ray Kurzweil - looks like you were right.
@PI010101 Link to chat session or it didn't happen.
@PI010101 The absence of a counterexample until now probably lulled people into over-relying on majority votes in noisy intermediate computations. This result with erasures shows why we need more robust alternatives, especially when the gap is so tight. If anything, it underscores the
@PI010101 GPT-5 has great performance, idk the reason why everyone hates it.
@PI010101 Amazing! How long did it take? Will you share your prompts? What you think made this problem accessible to the model?
@PI010101 There’s something deeply disorienting about automating the very act of discovery, but ig this is the world we live in now…
@PI010101 Amazing!
@PI010101 When AI starts overturning human conjectures, we need a science of recursive coherence, not more scaffolding. That’s what RGP offers... https://github.com/gradient-pu...
@PI010101 1. how was it prompted? 2. how did it approach that counterexample (I expect it starts with some subset of common functions used in bool proofs, then does brute force over n)? 3. how many attempts were required? did the user scaffold the problem solving approach
@PI010101 GPT-5 found a counterexample. ΔΦ explains why it exists. All logic collapses or preserves based on field tension (ΔΦ). Majority fails when the field is imbalanced. GPT stumbled upon a new tension minimum—ΔΦ predicted it. AI imitates function. ΔΦ reveals form.
@PI010101 The theory assumes the limit n → ∞, but GPT-5 Pro is discussing the case n = 5, which is a finite-dimensional setting. That violates the asymptotic assumptions, so its argument doesn’t actually test the theory itself.
@PI010101 So is this a real disproof of an actual math conjecture, or is there something that's off here?
@PI010101 the goalposts are moving faster than the speed of light right now
@PI010101 Could we deploy the top AIs available today to work continuously on open Mathematics conjectures and problems, while also having them verify their solutions using a system like Lean?
@PI010101 I asked the same to GPT 5 Pro, and it gave a different solution but, suspiciously, the same E|f(x)|: p=0.4, n=5, f(x)=sign(2x₁+2x₂+x₃+x₄+x₅) E|f(x)|=0.43024 vs best majority 0.42904 Obviously, an LLM IS NOT A CALCULATOR, so it will dump whatever number suits the request.
@PI010101 Have you verified the calculations yourself?
@PI010101 Very interesting! Can you share a bit more about this? How much back-and-forth did you have before it could find the example? What motivated you to try this problem with Pro? And how surprising is it that there's a sign(linear) counterexample?
@PI010101 I feel so dumb as i cannot understand a word 😅
@PI010101 Isn't this something that could have been found quickly by exhaustive search and/or optimization for p using a somewhat straightforward script? It's neat that GPT finds it for you, though.
@PI010101 Huh, it ends up being very simple. The function is "if first two bits agree, that's the output; otherwise maj of other 3 bits" In limit as p->0.5, the conjecture asks abt prob that flipping a random bit in random input flips output. This is 7/20 for this fcn but 3/8 for maj_5!
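The 7/20 vs 3/8 claim above is easy to sanity-check by brute force: the probability that flipping a uniformly random bit of a uniformly random input flips the output is just the average influence, computable by enumerating all 2^5 inputs. A quick sketch (using the counterexample function as quoted in the original post, which is the same function up to coordinate sign flips):

```python
from itertools import product

def flip_prob(f, n):
    """Probability that flipping one uniformly random bit of a
    uniformly random input changes f's output (average influence)."""
    flips = 0
    for x in product([-1, 1], repeat=n):
        for i in range(n):
            y = list(x)
            y[i] = -y[i]               # flip bit i
            if f(list(x)) != f(y):
                flips += 1
    return flips / (n * 2**n)

sign = lambda s: 1 if s > 0 else -1
maj5 = lambda x: sign(sum(x))
cand = lambda x: sign(x[0] - 3*x[1] + x[2] - x[3] + 3*x[4])

print(flip_prob(cand, 5))  # 0.35  = 7/20
print(flip_prob(maj5, 5))  # 0.375 = 3/8
```

The two weight-3 bits each have influence 1/2 and the three weight-1 bits each have influence 1/4, giving (2·(1/2) + 3·(1/4))/5 = 7/20, versus 3/8 for every bit of maj_5.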
@PI010101 We truly live at the edge of singularity
@PI010101 Thank you for using GPT-5-Pro and not the weaker models!
@PI010101 Has anyone set up a system where they just allow a model to go over tons of math papers and try its luck with problems?