Published: February 19, 2025
783
3.2k
18.6k

HOLY SHIT IT'S HAPPENING AI can now write genomes from scratch. Arc Institute an NVIDIA just published Evo-2, the largest AI model for biology, trained on 9.3 trillion DNA base pairs spanning the entire tree of life. it doesn’t just analyze genomes. it creates them 1/

Image in tweet by vittorio

Evo 2 generates mitochondrial, prokaryotic, and eukaryotic sequences at genome-scale Evo 2 is FULLY OPEN, including model parameters, training code, inference code, and the OpenGenome2 dataset LMFAO 2/

Image in tweet by vittorio

think of it as a DNA-focused LLM. instead of text, it generates genomic sequences. it reads and interprets complex DNA, including noncoding regions usually considered jink, generates entire chromosomes and new genomes, and predicts disease-causing mutations, even those not understood 3/

this is biology hacking AI is moving beyond describing biology to designing it. this allows for synthetic life engineered from scratch, programmable genomes optimized by AI, potential new gene therapies, and lays the groundwork for whole-cell simulation. biology is becoming a computational discipline 4/

it was trained on a dataset of 9.3 trillion base pairs from bacteria, archaea, eukaryotes, and bacteriophages it processes up to 1 million base pairs in a single context window, covering entire chromosomes. it identifies evolutionary patterns previously unseen by humans 5/

Evo-2 has demonstrated practical generation abilities, creating synthetic yeast chromosomes, mitochondrial genomes, and minimal bacterial genomes. this is computational design in action. 6/

Image in tweet by vittorio

Evo-2 understands noncoding DNA, which regulates gene expression and is involved in many genetic diseases. it predicts the functional impact of mutations in these regions, achieving state-of-the-art performance on noncoding variant pathogenicity and BRCA1 variant classification. this could lead to advances in precision medicine 7/

Evo-2 uses stripedhyena 2, combining convolution and attention mechanisms, not transformers. it models DNA at multiple scales, capturing long-range interactions, and autonomously learns features like exon-intron boundaries and transcription factor binding sites without human guidance. it’s not just memorizing it’s understanding biology. 8/

Image in tweet by vittorio

Evo-2 predicts whether mutations are harmful or benign without specific training on human disease data. WHAT it outperforms specialized models on BRCA1 variants and handles noncoding mutations effectively, suggesting it has learned DNA’s fundamental principles 9/

Evo-2 generates DNA sequences that influence chromatin accessibility, controlling gene expression. it has embedded simple Morse code into epigenomic designs as a proof of concept, not a practical application. this shows potential for designing programmable gene circuits. make me blonde. thank you 10/

Evo-2 is FULLY OPEN SOURCE, including model parameters, training data, and code. this will lead to massive widespread innovation in bioengineering, lowering barriers to genome design. it’s a revolution moment for the field. the era of biotech is here 11/

The Arc Institute aims to model entire cells, moving beyond DNA to whole organisms. this could lead to AI creating new life forms and synthetic biology becoming AI-driven. the future involves programming life at increasing scales 12/

three years ago, AI focused on chatbots. now it generates genomes. soon, it will design complex biological systems. this is a new phase humans are no longer just studying biology but rewriting its code. biology’s future is computational. are you prepared? 13/

Share this thread

Read on Twitter

View original thread

Navigate thread

1/14