Riskgaming

The soon-to-be-solved protein problem that will accelerate drug discovery

We’ve known for decades that one of the key mechanisms of biology — and of life itself — is the binding of molecules to proteins. Once bound, proteins change shape and thus their function, allowing our bodies to adapt and change their molecular machinery as needed for survival. The challenge that remains unsolved is to predict — across billions of potential proteins and a similar number of molecules — how those proteins change and how they might interact with each other.

The fervent hope of many scientists and entrepreneurs is that artificial intelligence, coupled with experimental and synthetic datasets, may finally unlock this critical piece of the biological puzzle, ushering in a new wave of therapeutics.

My guest today is one of those science entrepreneurs, Laksh Aithani, the co-founder and CEO of Charm Therapeutics. He’s made cancer the focus of his work, and through Charm and his team, is building expansive datasets to develop AI models that can predict the 3D shape of proteins.

Alongside host Danny Crichton and my Lux colleague Tess van Stekelenburg, we explore protein folding’s past, present and future, the utility and risks of synthetic data in biological research, how much money and time we might expect for future drug discovery, what individualized medicine might look like decades from now, and how new grads can get into the field as the century of biology kicks off.

Produced by Christopher Gates

Music by George Ko

Transcript

This is a human-generated transcript; however, it has not been verified for accuracy.

Danny Crichton:
All right, so look, we're talking about proteins today. When I think about proteins I think about protein shakes, I think about steak, but we're doing the biological version of this, which is also true of protein shakes. That's a biological substance. But nonetheless, proteins are the engines of life. They are what runs everything in our bodies, and yet, despite how important they are, we actually knew, up until recently, very little about them. But Laksh, I am really curious, you've spent your life's work on this sort of subject and there have been some tremendous discoveries over the last couple of years. Maybe walk us through a little bit of the history lesson of where we came from and where we are now.

Laksh Aithani:
Yeah. Well, look, proteins are fundamental to life. And in the world of drug discovery, it's all about finding drugs that bind these proteins. Now, a key step in that process is figuring out how they bind in 3D. It's really a 3D modeling problem. You want to find a drug with a complementary shape to the protein, and predicting the structure of these proteins has been very critical in doing that. So over the last four years or so, we've made as a field massive leaps using AI to predict the structure of these proteins. So this has been a problem that was essentially impossible to do four years ago, and now we can do it. So it's a big breakthrough.

Danny Crichton:
So I am going to date myself a little. 20 years ago, there was SETI@home, where you could search for extraterrestrial life by running a screensaver on your computer using your excess cycles. And then at the same time, I recall, there was Folding@home, where you could fold. There was a beautiful screensaver. You would see these protein folds creating those 3D shapes, trying to predict what they looked like and analyzing them. Did I contribute to this massive discovery? Do I get my prize from the Nobel Committee in a few years here?

Laksh Aithani:
As much as you'd like to hear the answer being yes, I don't think you have a prize.

Tess van Stekelenburg:
You're so political.

Laksh Aithani:
Well, I think you did it indirectly because there was a lot of work that went into that inspired people to use AI to try and do this problem. However, the ultimate solution to this problem was really a pure play AI approach.

Tess van Stekelenburg:
A lot of people in this audience will have heard of AlphaFold, which is protein structure prediction. It feels like we're moving beyond that over the past... You made DragonFold, but over the past 12 months there have been a lot of players in this space that have looked not just at protein structure but also at how it's doing this co-folding with small molecules or ligands.

Laksh Aithani:
The whole goal of drug discovery is to find the right drug, the right molecule from the trillions, actually more than trillions of molecules that exist and that could be potential drugs. You have to find the right one. So that's why people describe it as a needle in a haystack problem. If we were able to accurately predict how each molecule folded into the protein, I think that would really have a massive impact in terms of very quickly finding the right drug.

Danny Crichton:
And to me, to summarize, this is sort of the revolution with AI. So you're going from this labor-intensive approach, where we have to use every protein, we have to try to design them, we're trying to find the one-in-a-million, one-in-a-billion protein that fits well, versus having an AI model that could generate millions of different combinations, look at that experimentally and say, "Look, one of these 50 is probably the right answer." And you've taken this massive universe of stars in the sky and you've reduced it down to a couple that you can actually test in the lab and take forward in the drug pipeline process. So one of the questions I wanted to follow up on: when we talk about generalized artificial intelligence, one of the big questions is what kind of data do you need to actually get there?
And there are a couple of different arguments. One is that you need really high-quality data as inputs, that if you give it fake data, synthetic data, you get bad models. The other answer is that AGI or generalized artificial intelligence models are actually able to use synthetic and fake data really well and that what they really need is just quantity. As much as possible, however you get it, even if it's good, even if it's bad, just throw it all into the mix, the kitchen sink and everything. When it comes to biology, up until now we had all this sort of quality data, you had all this X-ray crystallography data. You had this PDB database and so you had a level of truth that was being inserted to create these models. As you sort of scale up, do you still need that sort of experimental truth to be inputted or can it start to use a wider mix of data, I don't want to say fake data, but synthetic data that sort of fleshes out the dataset in order to make these models as robust as possible?

Laksh Aithani:
Yeah, so that's a really interesting question. And we have seen, for example, in the AlphaFold paper and the AlphaFold work, they have used synthetic data in the form of self-distillation to gain a small improvement in accuracy. I would say I think it's around, let's say, a 5%-ish improvement in accuracy, which is by no means negligible. It is a good improvement but still small in the grand scheme of things. The other thing to note is that when you use the synthetic data, the question is how do you generate it? Let's say you generate the synthetic data with molecular dynamics. Molecular dynamics is essentially using physics to generate data. Now the problem with that is we don't know the rules of physics. If we knew the rules of physics perfectly, then we wouldn't need AI at all. We could just use physics to follow the process. The reality is the best force fields these days... And a force field is essentially a way to do these molecular dynamics simulations, to keep it very simple. The best force fields today are not able to even fold very simple and short proteins.
And that's because we're not able to write down the rules of physics effectively and then model them. Actually, that's not quite true. We can write down the exact rules of physics which is quantum mechanics essentially, but it's too computationally intensive to actually run those simulations.

Danny Crichton:
I was going to joke, I think my high school physics teacher would disagree with you. We know everything about physics. There are beautiful rules. All of it falls except we just need proteins to be the size of skyscrapers and we enter the realm of Newtonian mechanics as opposed to quantum mechanics.

Tess van Stekelenburg:
Well, yeah, I think the question that follows from this is if... It feels like we're still a long way out from this holy grail of generalizability, where you take any arbitrary sequence and it's able to predict co-folding with a ligand. But it does seem that despite all the bottlenecks that are there today, that if there is this economic objective function like this billion-dollar molecule at the end or large patient outcomes like being able to cure cancer, and we add an arbitrary length of time, let's say like 100 years, that we can solve this. That it's a problem that can be solved.

Laksh Aithani:
I think so. And the other thing is, as we've seen from the last five years in AI, it's very hard to predict the future of AI. If you asked anyone who was working in AI in 2019, "Would we be where we are today with ChatGPT-4?", they would've almost certainly said, absolutely not. The rate of progress suddenly increased very quickly starting from 2020, 2021 to where we are today. So it's quite hard to predict when those inflection points will take place. Let's say we increase the model size by 100x, these protein folding models, let's say we make them 100x bigger, maybe then we get a massive improvement in generalization ability. No one really knows, no one's done that experiment. And the problem is it takes a lot of money to do those types of experiments.

Tess van Stekelenburg:
And so if we look at the PDB, which has been generated over the last 50 years, and it might've cost around 10 billion or something to generate, what do you think a dataset like this would cost if you had to estimate? Just the sheer dollar value of getting the amount of data to get something that could get to predictive accuracies.

Laksh Aithani:
Yeah, well, so that's a really difficult question. One is how many extra crystal structures do you need? That's one component. But the other component is, what types of crystal structures do you need? Maybe you just need enough crystal structures in each area of protein space and protein-ligand space. Maybe that's enough. So I think that's a very hard question to answer, but let's say conservatively you need 10 times the size of the PDB, then you're talking about two million crystal structures. That's going to be a lot, and that's going to take a lot of money and a lot of time. That being said, I don't know a huge amount about the latest advances in high-throughput X-ray crystallography. Maybe in the next 10 years we'll have a breakthrough in our ability to generate crystal structures in a much higher throughput way.
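
[Editor's illustration] The scale being discussed can be put into rough numbers. The sketch below uses only the speakers' own loose figures (about $10 billion spent on the PDB, and two million structures as "10 times the size of the PDB"). The assumption that the average cost per structure stays constant is purely illustrative and is not something either speaker claims; the variable names are hypothetical.

```python
# Back-of-envelope sketch using the loose figures mentioned in the conversation.
PDB_COST_USD = 10e9            # "around 10 billion or something" (rough spoken estimate)
TARGET_STRUCTURES = 2_000_000  # "two million crystal structures"
SCALE_FACTOR = 10              # "10 times the size of the PDB"

# PDB size implied by the two figures above.
implied_pdb_structures = TARGET_STRUCTURES // SCALE_FACTOR

# Illustrative assumption: the historical average cost per structure stays constant.
cost_per_structure = PDB_COST_USD / implied_pdb_structures
naive_total_cost = cost_per_structure * TARGET_STRUCTURES

print(f"Implied PDB size: {implied_pdb_structures:,} structures")
print(f"Implied cost per structure: ${cost_per_structure:,.0f}")
print(f"Naive cost of a 10x dataset: ${naive_total_cost:,.0f}")
```

Under these assumptions the naive total is simply 10 times the original spend, which is why a throughput breakthrough of the kind mentioned above would matter so much: it would lower the cost per structure rather than the number of structures needed.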

Danny Crichton:
This is really interesting to me because when I think of... Obviously I've looked at AlphaFold. I think the original AlphaFold, wasn't that like science invention of the year or something like this when it was released a couple of years ago? And so it's had these huge headlines, hundreds of millions of proteins are sort of scanned or whatever. And what I find interesting is we sort of don't see all the other technologies that are leading into the creation of this. So unlike ChatGPT, which is built around pulling a corpus of text from the web and from books and everywhere we can get it, in this case, we actually need a lot of ancillary inventions even today to really get to where we need to go to build the next generation of models.

Laksh Aithani:
Oh yeah, completely. If you count the number of Nobel Prizes that have been given for X-ray crystallography across the last several decades, I think you'd be counting probably a number of Nobel Prizes in order to get to where we are now with the PDB.

Danny Crichton:
So maybe this is a good opportunity to spin into Charm and what you're working on today. So all this stuff has come together. We have this huge scaling up effect going on in AI and we reach you today in 2024. You're building out Charm. You're also very charming on this program as well, so it all matches together. But what are you working on, and of all the problems, we've already identified a bunch, but of all the problems to work on why did you choose to work on the one you did?

Laksh Aithani:
So me personally, well, I really have a passion for drug discovery and also especially for small molecule drug discovery. There's huge potential in this field, and that's really why I've essentially dedicated my whole life towards this type of field. So really it stemmed from my expertise and also I think we've decided to focus on cancer. Ultimately, cancer is still one of the most devastating if not the most devastating diseases. And whilst we've made a significant amount of progress in the last 30 years, there's still a long way to go towards achieving our ultimate vision of completely eliminating cancer. So it's really a problem that I feel like we could really get stuck into and spend definitely, at least for me, a significant chunk of my life trying to solve.

Tess van Stekelenburg:
And just zooming out again, let's say that we have a bunch of government funding that streams into this, almost like a CERN-level equivalent. What is this holy grail, like the Higgs boson equivalent for protein co-folding? What does that future or utopia look like if everything works?

Laksh Aithani:
Yeah, well, if we can truly fold any protein sequence with any small molecule drug and do that quickly, then I think what you're looking at is the ability to find leads. And there still needs to be a few other things that need to be solved. You need to solve the ADME and PK side, which I won't get into, but let's say you can solve that as well. Then you're really looking at potentially generating a drug for any target in a very rapid amount of time, let's say in a matter of months. And so then you could sort of imagine, let's say we're in 2100, so 80 years from now, let's say you're a scientist working in a pharmaceutical company. You get a patient who comes to you, they've just been diagnosed with cancer, then you can read their DNA sequence and you can identify a unique mutation in their genome that is driving that specific cancer. Or it might even be multiple different mutations and what you can then do is you can then say, "All right, I want to target this protein, this protein and this protein."
And the system, the computer will just essentially automatically create a drug for each one, and you can then give that drug to the patient a few months or maybe even a few weeks later and then ultimately that patient is cured. I think that would be the ultimate vision that I think could be possible in the next 80 years.

Danny Crichton:
Given the massive increase in cost of developing a drug, which we've seen go from tens of millions to hundreds of millions to billions, the idea that not only could we bring that cost all the way down but also be able to expand that across so many neglected diseases, rare diseases, small molecule diseases, all these categories that traditionally aren't Ozempic and obesity, to me is really exciting.

Tess van Stekelenburg:
Yeah, and even to make it more real outside of just therapeutics, so if someone that's listening to this is not interested in curing disease, but they are coming home to their family at the dinner table and they want to sound smart, Ozempic is a perfect case of this, with, for example, the GLP-1 receptor. Finding out how semaglutide binds to that, for example, has been able to lead to this drastic ability for people to lose a lot of weight and change their appetite completely.

Danny Crichton:
So this is very inspiring, but I'm curious when you think about young people going into the field, people who are just getting started in their careers, people who are learning computer science or biology, we've talked about X-ray crystallography, we've talked about force fields, quantum mechanics, these are scary. I'm scared and I'm in my early 20s, we'll say, just starting my career off. I haven't aged in 15 years. But I'm curious, is this a field where someone who wants to have impact can have impact, or what sort of credentials or training do they need to have to get to the edge and have a contribution here to make this dream a reality?

Laksh Aithani:
Yeah, look, the field of AI is very new. You can essentially jump in and start learning about the latest and greatest algorithm, which is probably going to be much better than what was used a few years ago. And start improving those algorithms and start making an impact immediately. So I would say yeah, as a young person you can definitely contribute to this field by trying to push the state of the art on AI.

Tess van Stekelenburg:
And where would you say the current bottlenecks are? So if Danny were to decide, okay, where's the highest leverage-

Danny Crichton:
I'm 19 years old, which I am and I have not regretfully given up on my scientist career to do policy and a bunch of other stuff. Yes, exactly, where would I go?

Laksh Aithani:
Yeah, [inaudible 00:14:48], I think the bottleneck is generalizability or improving generalizability. And the things that are going to get us there are, I think, improving the model architectures, scaling the models, and then also increasing the size of the dataset. So maybe looking at how can we use experimental methods to create data quicker or improve synthetic data generation. Those are definitely key areas that one could probably look at as well.

Danny Crichton:
Yeah, the thing that comes to mind, and I mentioned policy, but the other side of this is: do you think we're at the point where it makes sense to think and conceive of this next phase as a Human Genome-scale problem? Like in the '90s, we put a billion dollars to work to actually sequence the first human genome, at least on the public side, against [inaudible 00:15:36] on genomics on the private side. Do we need sort of that big push to say, "Look, we have the infrastructure, we just have a data collection problem"? No one's really incentivized. So every lab that has the tools and techniques you're going to devote to this, you get some nice grants, do other projects as well. But everyone's got to contribute to this big data bank for a couple of years to do the big push.

Laksh Aithani:
Yeah, for human science and human health and animal health and other types of health, I would love that. And I would think that that would really push the field forward. I have no idea about how the Human Genome Project got started. I'd be really interested to learn about that and who needs to take the initiative. How does something like that even start to happen? I just don't know.

Danny Crichton:
One of the interesting things, a little bit of the history, but one of the interesting things there is because it affects human health you always have an intrinsic set of constituencies who do want this work to happen, people who are affected by diseases. And we focus a lot on human health, but there's also agricultural applications as well to a lot of these different technologies. And so there's a nice tension between: are we just on the cusp? Are we wasting our money? Are we too early? Is it just the money that's holding us back right now? And what if we just spent on a five-year, multi-billion-dollar project, and on the other side of that would be five million lives saved? As soon as you can get to those sorts of numbers, and you are reasonably assured, in a reasonable sense, then a policy person could sit down with a spreadsheet and do the hard math: five billion divided by five million is $1,000 a life. That's an incredible deal. In the United States, we pay 30 million a life. That's what a value of a statistical life, a VSL, is worth.
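
[Editor's illustration] The spreadsheet math described above can be sketched in a few lines of Python. All three inputs are the hypothetical figures from the spoken example (a $5 billion program, five million lives saved, and the $30 million value of a statistical life as stated in the conversation); none of them is an independently verified estimate, and the variable names are illustrative.

```python
# Hypothetical figures taken directly from the spoken example.
PROGRAM_COST_USD = 5e9    # "five billion"
LIVES_SAVED = 5_000_000   # "five million lives saved"
VSL_USD = 30e6            # "30 million a life", used as stated in the conversation

# Cost per life saved under the hypothetical program.
cost_per_life = PROGRAM_COST_USD / LIVES_SAVED

# How many times cheaper that is than the stated value of a statistical life.
vsl_multiple = VSL_USD / cost_per_life

print(f"Cost per life saved: ${cost_per_life:,.0f}")
print(f"Stated VSL is {vsl_multiple:,.0f}x the cost per life saved")
```

The point of the comparison is the size of the gap: even if the inputs were off by an order of magnitude, the cost per life in this hypothetical would remain far below the stated VSL.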

Laksh Aithani:
Yeah, that's interesting. And the other thing is I do think doing something like this, at least in my opinion, might have nearer term potential than doing something like the Human Genome Project. I do think ultimately the Human Genome Project is going to have a huge amount of value. But if you look at what drugs have been developed based on the Human Genome Project, there are a number of drugs but it's potentially not that many. The pipeline of going from a discovery of which protein to go after to actually getting a drug is very long. And that's why there haven't been that many drugs that have been discovered based on the Human Genome Project or insights from the Human Genome Project.

Danny Crichton:
What's interesting to me between the two is the Human Genome Project was like this huge miracle that was about to come. As soon as we sequence this, we'll get all these drugs in a lot of ways and it's usually written up as quite a disappointment. We didn't really get a lot out of it. There's one or two good examples, but it didn't lead to this huge group but it was a milestone. We had never done it before. It's like walking on the moon. The first time it's ever sequenced, front cover on Time magazine. Now, you have the opposite problem which is, this is actually the practical thing. We actually know this will create a whole set of therapeutics coming down the line. Unfortunately, there's no milestone. Now we're just doing the same thing over and over again. It's like going to the moon the 50th time, and you are collecting data. And every time we go, we're going to get more information. We're going to get smarter, better, faster. We probably will actually get the applications we wanted the first time around.

Tess van Stekelenburg:
I guess it's the difference between NASA and SpaceX, right?

Danny Crichton:
Right. Well, SpaceX manages to hold together the excitement of this. But I feel like launching a rocket into space and re-sequencing or figuring out another protein, there's a little bit of a visual challenge there at least on the politics side. We need a BBC series of protein hunters and with a really sexy cast. And each week there's a new protein that's discovered and makes everything exciting. And we pass it through Parliament, Congress and everyone else and solve all the world's problems.

Laksh Aithani:
Definitely. I would be very supportive of that.

Tess van Stekelenburg:
And then as someone that spends basically every waking hour of your life when you're not sleeping, thinking about this particular problem, what are the things you're reading, things that you're excited about, developments in the field that people who haven't spent that number of hours might find interesting to hear about or should be digging into after this episode?

Laksh Aithani:
Things that I'm reading, well, I really like the healthcare field. Obviously it can be criticized but I do really like it. One book that I really enjoyed reading recently was, and I'm just getting it up, so it's called For Blood and Money: Billionaires, Biotech, and the Quest for a Blockbuster Drug. And it is essentially the story of actually two blood cancer drugs that were developed. I won't go into detail but it's a very thrilling story.

Tess van Stekelenburg:
And Danny, I think scary truth, we can edit this out, but I saw Laksh at some point talk into his ChatGPT and ask, how do I become well-rounded? And it had been so over-trained on drug discovery that its response was, read chemistry books, read biology books and learn about molecular biopharma. And so he's programmed into this world of small molecule drug discovery that he just can't exit even in his GPT environments.

Danny Crichton:
Well, I can believe that. Now all you need is a Vision Pro. And then as we have 3D, what we need is a 3D Folding@home Vision Pro app, so I can walk through the protein and actually experience ligands and protein folding all in one as an at-home experience.

Laksh Aithani:
I think if we could get that, that would definitely make me buy a Vision Pro.

Danny Crichton:
See, I'm telling you, they're selling them on every single podcast, and yet Apple doesn't send us a free one. But I think we've covered quite a bit of ground on this episode. Laksh, thank you so much for joining us. Tess, as always, pleasure to have you here.

Tess van Stekelenburg:
Fantastic.

Laksh Aithani:
Thanks so much, Danny. Really great to chat.
