Hacker Remix

RFdiffusion: Diffusion model generates protein backbones

114 points by jajoosam 2 years ago | 19 comments

zack-m 2 years ago

Corresponding paper to RFdiffusion: https://www.biorxiv.org/content/10.1101/2022.12.09.519842v2

Some context: Been waiting for this to come out for a while! Main innovation is leveraging RoseTTAFold (protein folding neural net) to generate protein backbones by diffusing in 3D space! From backbones, we can generate sequences that would fold into said structures via sequence design algorithms (check out: ProteinMPNN, Rosetta FastDesign).
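To make "diffusing in 3D space" a bit more concrete, here is a minimal sketch of the generic DDPM-style forward (noising) step applied to a handful of 3D coordinates. This is an illustration under stated assumptions, not RFdiffusion's actual code: the linear beta schedule, the function names, and the toy four-point "backbone" are all hypothetical, and residue orientations are ignored entirely. A trained model would learn to run this process in reverse, from noise back to a plausible backbone.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced (assumed schedule for illustration)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar[t] = product of (1 - beta) up to and including step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def noise_coords(coords, alpha_bar, rng):
    """Sample x_t ~ N(sqrt(alpha_bar) * x_0, (1 - alpha_bar) * I) for every point."""
    keep = math.sqrt(alpha_bar)
    spread = math.sqrt(1.0 - alpha_bar)
    return [
        tuple(keep * c + spread * rng.gauss(0.0, 1.0) for c in atom)
        for atom in coords
    ]

# Hypothetical toy "backbone": four points spaced 3.8 units apart on a line.
backbone = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
rng = random.Random(0)

slightly_noised = noise_coords(backbone, alpha_bars[10], rng)  # still close to x_0
mostly_noise = noise_coords(backbone, alpha_bars[-1], rng)     # nearly pure Gaussian noise
```

Early steps barely perturb the structure, while by the last step almost none of the original signal is left, which is why generation can start from random noise.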

In terms of applications: This is super relevant for our ability to create strong protein binders (e.g., timely creation of proteins that bind to virus spike proteins), and for designing enzymes from scratch!

Prior methods suffered from much lower success rates for generating “good” backbone structures. Extremely exciting!! If you want to learn more, check out the Baker group at UW!

baq 2 years ago

So in essence, if I understand correctly, instead of generating Balenciaga Pope or arrested Trump fake images, we can now dream up fake protein things which may actually be viable for whatever purpose if synthesized in the real world?

mimischi 2 years ago

Dreaming up a static three-dimensional structure does not guarantee that it is stable in a given environment, or that production of this structure in a lab is viable. A huge problem in the space is protein folding, which is concerned with figuring out how you get from an unfolded linear string of amino acids to this three-dimensional structure.

Folding takes into account many variables, and a big chunk of current experimental structure determination is concerned with controlling/adjusting these variables.

So this dreaming up will provide a potential “quicker way” into what a folded protein might look like, but it will not guarantee that humanity knows how to actually produce it in the real world.

Disclaimer: someone correct me if I’m wrong. I might be rusty on the latest developments, as I’ve left the field after my PhD.

flanman23 2 years ago

Indeed there are many pitfalls between a protein sequence and something useful to humanity, but there is reason to believe the technique is capable of generating such proteins:

1) In the paper they express several of their designs and show stability via circular dichroism experiments. They also show size exclusion chromatography results indicating some of the proteins are of the expected size and are not aggregating.

2) Since RFDiffusion and ProteinMPNN, which generates the actual amino acid sequence, are trained using Protein Data Bank (PDB) data, it's reasonable to presume the predicted proteins will be well behaved. To solve a protein structure via, say, X-ray crystallography, EM, or NMR and deposit it into the PDB requires bucket loads of stable protein. I used several grams of recombinant protein for an X-ray structure I solved. Since the ML models are trained on well-behaved proteins, I can believe the generated proteins will also be well behaved.

lysozyme 2 years ago

It’s a fun time to be interested in AI for proteins. Every new ML model type is inevitably tried out on proteins. As the functional molecules of life, proteins are uniquely important and fundamental to every process in biology. As the targets for every drug, the tools for every cellular job, and the squishy, wiggly, moving and alive parts of living things, proteins present both exciting possibilities and deep technical challenges for those who design them. A protein can be understood simply as a string of letters about 300 long using the alphabet ACDEFGHIKLMNPQRSTVWY. This turns out to be a great representation for sequence models like transformers. One big public database, UniProt, has 200 million protein sequences you can train your model on.
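As a rough illustration of the "string of letters" view, this is how a sequence might be turned into integer tokens or one-hot rows before being fed to a sequence model. It is only a sketch: the function names and the example peptide are made up, and real models such as ESM-2 use their own tokenizers with extra special tokens.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20-letter alphabet mentioned above
TOKEN_OF = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def tokenize(sequence):
    """Map each residue letter to an integer id, rejecting non-standard letters."""
    try:
        return [TOKEN_OF[aa] for aa in sequence.upper()]
    except KeyError as err:
        raise ValueError(f"not a standard amino acid: {err.args[0]!r}") from None

def one_hot(sequence):
    """One row per residue, with a single 1 in the column for that residue."""
    width = len(AMINO_ACIDS)
    return [[1 if col == tok else 0 for col in range(width)]
            for tok in tokenize(sequence)]

peptide = "MKTAYIAKQR"  # made-up example sequence
tokens = tokenize(peptide)
matrix = one_hot(peptide)  # 10 rows by 20 columns
```

The vocabulary is tiny compared with text (20 symbols versus tens of thousands of word pieces), which is part of why this representation maps so cleanly onto transformer-style models.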

The very largest plain transformer models trained on protein sequences (analogous to plain text) are about 15B parameters (I am thinking of Meta AI’s ESM-2 [1]). These can do for protein sequences what LLMs do for text (that is, they can “fill in the blank” to design variations, generate new proteins that look like their training data), and tell you how likely it is that a given sequence exists.

Some cool variations of transformers have applications for protein design, like the now-famous SE(3) equivariant transformer used in the structure prediction module of AlphaFold [2], now appearing in the research paper [3] accompanying TFA, as well as variations on the transformer such as the message passing model ProteinMPNN [4], which builds on a neighbor graph-structured transformer [5].

1. https://github.com/facebookresearch/esm

2. https://github.com/deepmind/alphafold

3. https://www.biorxiv.org/content/10.1101/2022.12.09.519842v2

4. https://github.com/dauparas/ProteinMPNN

5. https://github.com/jingraham/neurips19-graph-protein-design

folli 2 years ago

I'm interested in finding out more about de novo binder design to a given protein. Besides RFdiffusion, do you know of any other tools worth a look?

jajoosam 2 years ago

Check out ColabDesign!

Robotbeat 2 years ago

Can stuff like this be used for other polymers, like thermoplastics? Can you speed up molecular modeling of thermoplastic crystallization?

mr-ai 2 years ago

What I find fascinating about RFDiffusion is that it puts together two very powerful yet distinct deep learning architectures: Diffusion models and Graph Neural Networks. I wrote about this here: https://www.assemblyai.com/blog/ai-trends-graph-neural-netwo...

waynenilsen 2 years ago

So I guess this means easier drug discovery? Honestly, those wiggly diagrams are meaningless to me; I have no bio background.

ramraj07 2 years ago

Easier drug discovery is what they tell the public and grant agencies. In a roundabout way it’s true. Maybe. Many other hurdles still exist. But what this and other similar tools really do is significantly advance the basic science of creating our own protein designs.

Before AlphaFold changed this field, creating your own protein design was considered an insane task (not impossible; Baker's lab and others have done it a couple of times). But these tools (now we have multiple) allow you to create new proteins from scratch that can do exactly what you want (caveats galore). New enzymes that can catalyze reactions never found in nature, for example.

Before this all we could do was take proteins that already exist in nature and modify them. So you can imagine how new this world is.

og_kalu 2 years ago

Large language models can also generate novel, working protein structures that adhere to a specified purpose: https://www.nature.com/articles/s41587-022-01618-2

westurner 2 years ago

Can optical tweezers construct such proteins; or is there a more efficient way?

Optical tweezers: https://en.wikipedia.org/wiki/Optical_tweezers

"'Impossible' photonic breakthrough: scientist manipulate light at subwavelength scale" https://thedebrief.org/impossible-photonic-breakthrough-scie... :

> have successfully demonstrated that a beam of light can not only be confined to a spot that is 50 times smaller than its own wavelength but also “in a first of its kind” the spot can be moved by minuscule amounts at the point where the light is confined.

> According to that research, the key to confining light below the previous impermeable Abbe diffraction limit was accomplished by “storing a part of the electromagnetic energy in the kinetic energy of electric charges.” This clever adaptation, the researchers wrote, “opened the door to a number of groundbreaking real-world applications, which has contributed to the great success of the field of nanophotonics.”

> “Looking to the future, in principle, it could lead to the manipulation of micro and nanometre-sized objects, including biological particles,” De Liberato says, “or perhaps the sizeable enhancement of the sensitivity resolution of microscopic sensors.”

"Digging into DNA Repair with Optical Tweezer Technology" https://www.genengnews.com/topics/digging-into-dna-repair-wi...

c1ccccc1 2 years ago

It's much easier than that! Living cells already have ribosomes that construct proteins and all the other molecular machinery needed to go from DNA sequence to assembled protein. You can order a DNA sequence online and put it into E. coli or yeast cells and those cells will make that protein for you.
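The "DNA sequence to protein" step that the ribosome performs can be mimicked on paper with the standard genetic code. A minimal sketch follows, assuming a clean coding sequence read in frame from its first base; the function name and the example DNA string are made up, and real expression work involves much more (promoters, codon optimization for the host, purification).

```python
from itertools import product

# Standard genetic code, listed in the conventional TCAG order
# (first base varies slowest, third base fastest); "*" marks a stop codon.
BASES = "TCAG"
CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), CODE)
}

def translate(dna):
    """Translate coding DNA codon by codon, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):  # a trailing partial codon is ignored
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

example_dna = "ATGAAAACCGCTTAA"  # made-up sequence: ATG AAA ACC GCT TAA
print(translate(example_dna))    # prints MKTA
```

Going the other way (protein to DNA) is not unique, since most amino acids have several codons, which is one reason ordering a gene involves picking codons that suit the host organism.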

ramraj07 2 years ago

That’s like saying anyone who has a computer can hack into the NSA. In principle yes, but the amount of know-how and troubleshooting is being underplayed here. Not to mention the question of what you do with the protein once you produce it.

westurner 2 years ago

TIL about mail-order CRISPR kits. "Mail-Order CRISPR Kits Allow Absolutely Anyone to Hack DNA" (2017) https://www.scientificamerican.com/article/mail-order-crispr...

Protein production: https://en.wikipedia.org/wiki/Protein_production

Tissue Nanotransfection reprograms e.g. fibroblasts into neurons and endothelial cells (for ischemia) using electric charge. Are there different proteins then expressed? Which are the really useful targets?

> The delivered cargo then transforms the affected cells into a desired cell type without first transforming them to stem cells. TNT is a novel technique and has been used on mice models to successfully transfect fibroblasts into neuron-like cells along with rescue of ischemia in mice models with induced vasculature and perfusion

> [...] This chip is then connected to an electrical source capable of delivering an electrical field to drive the factors from the reservoir into the nanochannels, and onto the contacted tissue

https://en.wikipedia.org/wiki/Tissue_nanotransfection#Techni...

Are there lab safety standards for handling yeast or worse? https://en.wikipedia.org/wiki/Gene_drive

westurner 2 years ago

"Bacterial ‘Nanosyringe’ Could Deliver Gene Therapy to Human Cells" (2023) https://www.scientificamerican.com/article/bacterial-nanosyr... :

> In a paper published today in Nature, researchers report refashioning Photorhabdus’s syringe—called a contractile injection system—so that it can attach to human cells and inject large proteins into them. The work could provide a way to deliver various therapeutic proteins into any type of cell, including proteins that can “edit” the cell’s DNA. “It’s a very interesting approach,” says Mark Kay, a gene therapy researcher at Stanford University who was not involved in the study. “Where I think it could be very useful is when you want to express proteins that can do genome editing” to correct or knock out a gene that is mutated in a genetic disorder, he says.

> The nano injector could provide a critical tool for scientists interested in tweaking genes. “Delivery is probably the biggest unsolved problem for gene editing,” says study investigator Feng Zhang, a molecular biologist at the McGovern Institute for Brain Research at the Massachusetts Institute of Technology and the Broad Institute of M.I.T. and Harvard. Zhang is known for his work developing the gene editing system CRISPR-Cas9. Existing technology can insert the editing machinery “into a few tissues, blood and liver and the eye, but we don’t have a good way to get to anywhere else,” such as the brain, heart, lung or kidney, Zhang says. The syringe technology also holds promise for treating cancer because it can be engineered to attach to receptors on certain cancer cells.

westurner 2 years ago

From "New neural network architecture inspired by neural system of a worm" (2023) https://news.ycombinator.com/item?id=34715188 :

> "I’m skeptical that biological systems will ever serve as a basis for ML nets in practice"

>> First of all, ML engineers need to stop being so brainphiliacs, caring only about the 'neural networks' of the brain or brain-like systems. Lacrymaria olor has more intelligence, in terms of adapting to exploring/exploiting a given environment, than all our artificial neural networks combined and it has no neurons because it is merely a single-cell organism [1].

Which proteins code for organisms that compute?

jajoosam 2 years ago

ML for protein engineering is incredibly fascinating, and pretty much all of it, including RFdiffusion, is built on structure prediction models.

This series of talks by Nazim Bouatta is exceptional; it helped me appreciate and make sense of these models. Incredible how you can engineer neural nets to learn with far less data when you incorporate the right inductive biases: https://youtube.com/playlist?list=PL0NRmB0fnLJQPDZh-6utVnRpF...