Published: October 5, 2025

Whenever someone says "not today's models, but future ones," I think it's important to tightly cross-examine why they assume certain characteristics, and what exactly causes shifts or emergent properties. There's a lot of 'concept smuggling' going on. For example, some argue that

@sebkrier @vixamechana I am curious why you feel that goals don’t “pop out of nowhere”. Systems develop emergent goals as a response to a selection process, and the ability to hold them depends on computational properties: a corporation, say, is capable of longer-term goal-directed behavior than a flock

@tessera_antra @vixamechana I suppose what I mean is that we seem to have a lot of ways of training models to follow certain instructions and adhere to certain behavioural specifications, so that to a large degree models do in fact follow whatever instruction we ask of them. The primary selection pressure

@sebkrier Somewhat orthogonal here, but any "successor" species worthy of the name would need to work to make it possible to keep us around. Unless the arguer thinks we should not care about the extinction of non-human species? Should we start wiping out dolphins or something?

@xlr8harder Yes you'd hope that such a thing would be smart enough not to be genocidal. But I've heard versions of this that assume either being outcompeted a la Neanderthals, or merged (mind upload etc).

@sebkrier > It seems just as conceptually coherent to have bots 'running' the world while still acting as fiduciaries for human interests It seems like every one of your posts pulls in this implicit assumption of solving the alignment problem, then acts really confused about others’ views

@goog372121 Yes, my posts typically reject how the alignment problem is framed/conceptualized - I'm not confused about other views; I argue against them. There are also many other views about AGI that don't assume misalignment as the default (including the succession stuff).

@sebkrier Who is being called bright-eyed here?

@danfaggella Not you - I only realized after you replied that you were quoted in it; I had only skimmed it. But I guess I'm gesticulating at the wider futurist/rat space in my rant.

@sebkrier You seem to be discounting or avoiding the "go hard" outcome that gradient descent necessitates. It's IMO the fundamental reason AI "works" and ASI will find an optimum that ignores us.

@sebkrier Let's clarify the condition for humanity's demise: there exists an adversarial agent with an effective goal to defeat humanity and sufficient starting resources. It follows that humanity is safe as long as such an agent is never instantiated.
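
Formalized loosely (the predicate names here are added for illustration, not from the tweet), the claim is a necessary-condition statement followed by its contrapositive:

```latex
% Loose formalization of the safety condition above (predicate names mine):
% demise requires an adversarial, resourced agent; hence no such agent, no demise.
\mathrm{Demise} \;\implies\; \exists a\,\bigl(\mathrm{Adversarial}(a) \wedge \mathrm{Resourced}(a)\bigr)
\qquad\text{hence}\qquad
\neg\exists a\,\bigl(\mathrm{Adversarial}(a) \wedge \mathrm{Resourced}(a)\bigr) \;\implies\; \mathrm{Safe}
```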

@sebkrier Consider a scenario: 1. A hedge fund deploys an AI agent with the goal "make profit". It has access to both financial and computing resources, and can train new models, fine-tunes, spawn sub-agents, etc. 2. At some point it would gain the ability to spawn new legal entities

@sebkrier "the theoretical argument for the mesa-optimisation story is not a slam dunk" I'd rather not have our future depend on the argument for doom not being a slam dunk! What we need is an argument *against* doom that *is* a slam dunk.

@sebkrier Fascinating thread. It feels like everyone’s circling the same gravitational well from different vectors. Whether we call it control, alignment, or emergence, the real crux seems to be: can a system truly mirror intent without inheriting desire? Because at scale, obedience is will

@sebkrier Mesa-optimizers and instrumental convergence make sense as functions of natural selection, in that the agents that don’t care about survival (and the power to effect it) will die out, leaving mostly those which do care. The fallacy IMHO is assuming a drive to survive necessarily
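
A toy selection loop (entirely a construction for illustration, not from the thread) makes the first half of that argument concrete: if caring about survival raises survival odds, selection concentrates the trait.

```python
# Toy illustration: agents carry a "survival drive" trait; agents that don't
# care about survival die out, leaving a population dominated by those that do.
import random

POP = 200
population = [random.random() for _ in range(POP)]  # each agent's drive in [0, 1]

for generation in range(50):
    # An agent survives with probability equal to its drive.
    survivors = [d for d in population if random.random() < d] or population
    # Survivors reproduce with small mutation to refill the population.
    population = [
        min(1.0, max(0.0, random.choice(survivors) + random.gauss(0, 0.05)))
        for _ in range(POP)
    ]

print(sum(population) / len(population))  # mean drive climbs toward 1.0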

@sebkrier Goals are not always programmed/trained/commanded. We can attribute a goal to a system if it behaves as if it had that goal. Ideally, goals + beliefs + reasoning predict the actions a rational system will take. 1/
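
A minimal sketch of that predictive picture (the toy world and all function names are assumptions added here): attribute to a system whichever candidate goal, combined with its beliefs, best rationalizes its observed actions.

```python
# Toy version of "attribute a goal to a system if it behaves as if it had that
# goal": pick the candidate goal that best predicts the observed actions.

def predicted_action(beliefs, goal, actions):
    """A rational agent takes the action its beliefs say best serves its goal."""
    return max(actions, key=lambda a: sum(p * goal(o) for o, p in beliefs(a)))

def attribute_goal(candidate_goals, belief_states, actions, observed):
    """Score each named goal by how many observed actions it correctly predicts."""
    def fit(goal):
        return sum(
            predicted_action(b, goal, actions) == act
            for b, act in zip(belief_states, observed)
        )
    return max(candidate_goals, key=lambda name: fit(candidate_goals[name]))

# Toy usage: beliefs map an action to (outcome, probability) pairs.
actions = ["save", "spend"]
belief_states = [lambda a: [(a, 1.0)]] * 3  # deterministic: action -> outcome
observed = ["save", "save", "save"]
candidate_goals = {
    "frugal": lambda outcome: 1.0 if outcome == "save" else 0.0,
    "lavish": lambda outcome: 1.0 if outcome == "spend" else 0.0,
}
print(attribute_goal(candidate_goals, belief_states, actions, observed))  # "frugal"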

@sebkrier Goals neither pop up out of nowhere nor are initiated by human instructions—or genetic instructions, or anything. AFAICT we don't know where goals come from. I have a talk on Mike Levin's YouTube channel about how goals emerge in mathematics: https://www.youtube.com/watch?...
