Riskgaming

Why high-throughput bio research needs better tools immediately

Description

There have been data revolutions in most areas of human activity, and biological research is no exception. The rapidly shrinking cost of collecting data like DNA sequences means that there has been an exponential growth in the amount of data that bio researchers have at their disposal. Yet, most biologists still operate on top of general purpose cloud compute platforms, which don’t offer a native environment for them to engage in research at the cutting edge of the field.

On the Riskgaming podcast today, Lux’s ⁠Tess van Stekelenburg⁠ interviews ⁠Alfredo Andere⁠ and ⁠Kenny Workman⁠, the co-founders of ⁠LatchBio⁠ who are on a quest to rapidly accelerate the progress of biology’s tooling. The big challenge — even for big pharma — is a lack of access to top-flight AI/ML developers in the ferocious talent wars they face against even bigger Big Tech companies. As Workman says, “They just don't have world's best machine learning talent … And then they're working with usually 5- to 10-year-old machine learning technology, except for a small handful of outliers.” LatchBio and other startups are pioneering new ways of delivering those tools to biologists, today.

In this episode, the trio discuss the changing data economy of biological research, the lack of infrastructure for conducting laboratory and clinical work, why AstraZeneca has improved its pharma output over the past decade, what the ground truth is around AI and bio, the flaws of open-source software, and finally, how academia and commercial research will fit together in the future.

Episode Produced by ⁠⁠⁠⁠⁠⁠Christopher Gates⁠⁠⁠⁠⁠⁠

Music by ⁠⁠⁠⁠⁠⁠George Ko⁠⁠⁠⁠⁠⁠

Transcript

This is a human-generated transcript; however, it has not been verified for accuracy.

Tess van Stekelenburg:
Welcome on the podcast, Alfredo and Kenny from LatchBio. Can you do a very brief introduction to what you guys are building and in particular, what problems you guys are solving for your customers today?

Alfredo Andere:
Yeah, totally. And yeah, thanks for having us, Tess. I guess to explain, I'll just go into the high level of what we're doing, and then we can go into a customer example. We are building a cloud computing platform for biotech companies, and that raises two questions. First of all, why does biotech even need a cloud computing platform? What is that? Then the second question is why not just use one of the existing cloud computing platforms? I mean, AWS, GCP, they do the job, right?
For the first question of why they need a cloud computing platform, I think we need to go back about 12 years, to when biology was mostly being done at a data scale where you could look at most experiments after they were done and determine the results. I'm talking about fluorescence experiments, where you could look under a microscope and see how bright they were. Cell counting is actually my favorite example, just because when you think of a cell counting experiment, you think of some microscopic, complex machine, but really what it is is a biologist with one of those clickers they have at the entrance of a stadium, where you tap with your thumb and count people. Except the biologist is looking through a microscope and just counting maybe 10, 50, 60 cells and then determining what to do for the next experiment.
Then think about the last 12 years. In 2012, we had CRISPR, a huge advancement in gene editing. You also have automation going from $400,000 Hamilton machines down to $2,000 Opentrons machines. You have gene synthesis going down 1,000X in price, with Twist continuing to drive the price down. And you have many other technologies, but obviously the gorilla in the room is NGS, next-generation sequencing from Illumina, driving down the price of reading DNA from $100,000 to $100 over the last 10 years.
With all these technological tailwinds, suddenly the same biologist spends the same time and money doing an experiment, but instead they receive a file with a hundred thousand lines of ACTGs of DNA, or a hundred million lines of ACTGs. What are they going to do with that? They're not going to look at it with the human eye and count it with something in their hand. They're going to have to process it through huge computers that turn it into plots, graphs and statistics that you can actually look at with the human eye. And you can't do that on your laptop. Your laptop's too small, and so you need to do it with large computers. Ideally they're in the cloud, and so that's why biologists today need a cloud computing platform.
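As a rough illustration of the first-pass processing Alfredo is describing, here is a minimal sketch in Python, assuming a FASTQ file at the hypothetical path reads.fastq; real pipelines run far heavier steps (alignment, variant calling, assembly) on cloud-scale machines, not a laptop script.

# Minimal sketch: summarize raw sequencing reads from a FASTQ file.
# "reads.fastq" is a hypothetical path; real datasets hold millions of reads
# and need much heavier downstream processing (alignment, variant calling).
from collections import Counter

def summarize_fastq(path: str) -> dict:
    read_count = 0
    base_counts = Counter()
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # in FASTQ, every record's second line is the sequence
                read_count += 1
                base_counts.update(line.strip())
    total_bases = sum(base_counts.values()) or 1
    return {
        "reads": read_count,
        "bases": total_bases,
        "gc_fraction": (base_counts["G"] + base_counts["C"]) / total_bases,
    }

if __name__ == "__main__":
    print(summarize_fastq("reads.fastq"))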
Now, okay, so that solves question A of why. But the second one is, well, why not use the existing solutions, AWS, GCP, Azure? And the honest answer is that that is what they do today. Most of them, probably 80% of biotechs that need to process high-throughput data, are using AWS. The problem is pretty inherent to AWS. If you asked a manager at AWS, they wouldn't even say this is a problem, which is just how raw the platform is. They build very raw building blocks that you can then use to build whatever technology you need. I mean, they serve marketing, they serve FinTech, they serve healthcare, they serve everyone, including biotech. That means that if you're a biotech, you need to go into AWS, kind of get started, and then spend millions of dollars hiring a wing of software engineers, computational biologists and bioinformaticians to build out a platform with the raw building blocks that will then process your data from raw genomic files into some more interpretable results. And that takes millions of dollars.
If this were five, six, seven years ago, when there were like five companies doing this, that would be fine. That's their secret sauce; companies like Recursion or Invitae built their own platforms. But today there are hundreds, over a thousand, companies that need to do this kind of data processing, and yet they're each reinventing this wheel from scratch, grabbing AWS's building blocks, spending millions of dollars, and building out this data processing platform. We saw that through Kenny's research as an undergrad. We saw this by talking to over 300 people, and we decided that we could build a pre-configured cloud computing platform that biotechs could hook into from day one and start analyzing their data without needing to configure 80% of what they were doing before.

Tess van Stekelenburg:
I remember when I first came across Latch, you guys just had a website with a page, a very beautiful manifesto that was written out with your vision for biology. You have an analogy in your manifesto about benchtop experimentation being the machine code of the biological programmer. Can you walk us through your vision?

Kenny Workman:
Yeah. Yeah, absolutely. It's all quite abstract, but we're trying to articulate that biotechnology is uniquely poised in this kind of calm before a storm, where we are reducing it as a substrate to well-characterized building blocks that are conducive to the set of engineering tools that built most of the modern world throughout the 20th century.
So what are we talking about? Well, today in terms of therapeutics, we live in an era of genetic and programmable medicines. A decade ago, we discovered CRISPR. A decade has passed and we have Casgevy in the clinic, which is a molecular therapy that edits DNA. Similarly, we are taking cells out of patients' bodies and genetically modifying them to become adversarial towards cancers and cardiovascular disease; Kymriah was the first such approval, in 2017. Moderna, the company that pop culture is more or less familiar with, is actually developing cancer vaccines that are in Phase 2b trials. The rate of progress in medicine is insane.
People like to harp on the falling costs of DNA sequencing and synthesis a lot, but even more exciting is just our ability to measure many other types of molecules in a truly high-throughput, high-scale way: proteins, epigenetic signatures, these sorts of things. And as Alfredo touched on, there's the rise of automation and microfluidics, and perhaps most excitingly we're repurposing semiconductor technology to move biomolecules with transistor-scale precision and precisely configure molecules. Computers are quietly getting faster. Machine learning is actually becoming useful. We've seen over the past decade the rise of AlphaFold and evolutionary-scale models like ESM, and more recently Evo out of the Arc Institute, funded by Collison and led by Patrick Hsu, machine learning tools that are actually able to make a real dent in some of the molecular prediction tasks in biotech. There are quite a lot of tailwinds converging to really make biology a truly engineerable discipline. Although people have been talking about it for decades, we really have the tools we need now.

Tess van Stekelenburg:
There's a lot of hype right now off the back of this shift towards high-throughput data and the computational tools that we're developing in order to analyze the patterns and extract insight from them at scale. AI for biology has become the theme. Nvidia is talking about it. Every single biotech company is now positioning itself as a foundation model company on top of its assay, and we're all trying to figure out what these workflows are going to shift to. The one thing I think is unique about how you guys are thinking about it is that you tend to have a more sobering take. Can you dive a bit into your current thoughts on AI?

Kenny Workman:
We have a very strong engineering team that I would like to consider well-grounded in fundamentals. Machine learning is not magic technology, and to us, I think even words like AI are disingenuous. Statistical information processing is probably more appropriate to myself and others on the team, who come from work more steeped in traditional linear algebra, statistics and the grounded theoretical underpinnings of these methods.
What machine learning methods are doing is taking observations of physical phenomena, namely classes of molecules or proteins, converting them essentially to vectors of numbers embedded in some high-dimensional coordinate space, and fitting functions to map those vector representations to properties we care about, how good the molecules or proteins are at curing diseases, for example. Given some new object, can we then predict the property we care about? If the goal is to construct a drug to cure a human being of disease, there's just far too much complexity in the physical world to expect these fitted, learned models to actually predict things in a well-behaved way.
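A minimal sketch of the pattern Kenny describes, not LatchBio's actual stack: the amino-acid-frequency featurization, the toy dataset and the random forest are illustrative stand-ins for embedding molecules as vectors and fitting a function from those vectors to a measured property.

# Minimal sketch: embed protein sequences as vectors, fit a function mapping
# the vectors to a measured property, then predict that property for new ones.
from collections import Counter

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def featurize(sequence: str) -> np.ndarray:
    """Embed a protein sequence as a vector of amino-acid frequencies."""
    counts = Counter(sequence)
    total = max(len(sequence), 1)
    return np.array([counts.get(aa, 0) / total for aa in AMINO_ACIDS])

# Toy data: (sequence, measured activity) pairs standing in for assay results.
rng = np.random.default_rng(0)
sequences = ["".join(rng.choice(list(AMINO_ACIDS), size=50)) for _ in range(200)]
activities = rng.normal(size=200)  # stand-in for a real measured property

X = np.stack([featurize(s) for s in sequences])
y = np.array(activities)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit a function from vector representations to the property of interest.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Given new objects, predict the property we care about.
predictions = model.predict(X_test)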
A good way to understand the magnitude and class of variability is to look at the anatomy of a real drug discovery program. AstraZeneca actually published a super cool paper. AstraZeneca is a top-five or -six biopharma company by market cap, and they converted their clinical success from a 4% rate across all drug programs between 2005 and 2010 to 20%, so an enormous boost, just by looking at very specific levers they had to play with to influence success. And these levers were right target, right tissue, right safety, right patient, right market. They call these the five R's. But each one of these individual levers or variables has an enormous, hive-mind level of complexity that requires a university's worth of scientists, each with, for instance, expertise in a particular disease and a particular therapeutic modality.
Machine learning, when we fix many of these sources of variability and constrain our problem to one that is very well-defined, can be incredibly useful. We've seen lots of examples of this, but we cannot overstate the amount of rational thought that goes into fixing all of these sources of variability, essentially constructing a lucid model of how biology actually works. I think a lot of the people who are projecting optimistic visions for the future of AI foundation models are kind of ignorant of, or conveniently ignoring, that complexity in their picture of how these methods will work. The drug discovery industry is a freaking complex beast. I just do not see how these challenges are going to be met with what we have now.

Tess van Stekelenburg:
Alfredo, what do you see as some of the fundamental changes that AI is going to have in biology?

Alfredo Andere:
Obviously I'm extremely skeptical, just like Kenny, of AI in the short term and the impact it will have on biology, but I think in the long term that changes radically. I believe that the future of biology will be high-throughput, irrational drug design using artificial intelligence. And what that means is... well, let's go back in history. Think back to 1950 through 1980, when the Pfizers and the Mercks were coming up in the world and were identifying targets for disease and then screening thousands of natural drug candidates against them. Merck famously would pay its employees: if they went on a trip or a vacation, it would cover part of their plane ride, even if it was personal, as long as they brought back a vial of dirt samples, because it could then screen those against the targets it had. They built libraries of millions and millions of compounds that way.
Then came Regeneron with genomics and Vertex with small molecules. At the time, they had this crazy idea of, "What if we made drug design rational? What if instead of doing these kinds of high-throughput screens, we use techniques that have advanced in computation and crystallography and genetics and NMR to design these drugs around a target rather than just screening millions of molecules against that target?" And that worked out. I mean, if you look at Vertex's stock or Regeneron's stock, you can see it worked out incredibly well. They built the next generation of pharma, and over the past 30 years that has had massive, massive success.
But now what we're seeing is a new shift, right? Because all the trends of multiplexing, all the trends of being able to have these AI models that capture large amounts of data, mean that what Merck was doing very irrationally can now be rationalized. And so every trend is pointing towards these millions of data points generated through highly multiplexed biological experiments, just like Pfizer and Merck were doing, but in a way where you can detangle that and interpret it through these billion- or trillion-parameter general function approximators, LLMs.
That all brings me to the final answer. I used to think the holy grail of biology would be to simulate an organism like a mouse or a human, down to every cell, and then grow it out in simulation, like self-driving simulations in AI but for biology: simulate it growing so you have this perfect atomic-accuracy model of each cell, and then you could test a million molecules at the same time and see which one cures the cancer you just modeled.
Now I don't think that will ever happen. If you think about that model of the world, any model of the world that you're trying to simulate, it's actually quite inefficient. It could be compressed a lot into a function approximator. And so I think what we will have is we'll feed so much data into these AI LLM models that they'll be able to approximate the world super closely. And while we will probably not be able to interpret what is happening inside, we'll still be able to give it the million molecules and ask, "Hey, which one of these would move the model, your compression of the world, into the state that we want?" And it'll be able to interpret that and give you an answer, these two molecules that you should be trying. And so I think that is the future: the holy grail is AI models compressing the world, and us being able to interact with that like an oracle that we don't truly understand.

Tess van Stekelenburg:
Last question, which has been on my mind a lot. When you have increased accessibility, so when things are not just locked up as a very big model that's being used for internal drug development, but if you actually open up some of these models or open up some of these bioinformatic tools and make them more accessible to people, what are your thoughts on open source versus closed source? So the idea of having an ESM, an EVO and AlphaFold 2 versus keeping it internally.

Alfredo Andere:
I love open source and I think it's done a massive good for the world. I think it's a double-edged sword and a lot of people don't realize it. And I actually want to talk about the two dangers of open source, because everyone touts it as the greatest thing since sliced bread and in many ways, I agree. We have two things. So the first one is the fallacy that if you release something out to the world, you'll have all these contributors swarming to help you build it, and suddenly you'll have a huge community and you'll have to do much less work because the whole community is contributing. That is a total fallacy.
If you look at the stats around this, it turns out that most big projects are carried through by only a small number of people. The average number of contributors to most open source projects... sorry, not the average, the median, is actually zero. And to release an open source project, it needs to be of much higher quality. So it means you're putting in all this time to get it to open-source quality, to be able to release this code to the world in this really neat and readable state, and most of the time you're getting no benefit from it. I do think there needs to be some skepticism around that part.
And then the second part is the business model. I love companies open sourcing as much as they can, but I would much rather... if you look at the successful software companies in biotech, we have two. We have Veeva and we have Benchling. And, well, apart from the fact that neither of them is open source, we need more. Biotech needs more software. Biotech needs more tools, and yet everyone has this expectation of getting free open source software, and that doesn't always lend itself to the best business models. And so if you ask me, would I rather have more successful software for biology that is helping these biotechs advance their science, or would I rather have more open source? I would say I would rather have better tools and move the space forward. And so I think those are kind of the two edges of the sword. Again, huge, massive benefits to open source, and I'm extremely grateful for it, but these are important things to consider if you're thinking about making an open source project in biology.

Kenny Workman:
Yeah, I think I would agree with Alfredo that the effort and energy it takes to make a product open source is certainly not worth it. And I don't think it's actually even a super interesting conversation. Not to denigrate your question, but as someone interested in the progression of this technology and these methods towards solving actual useful problems, the question of whether something is open source or not seems very much like a rent-seeking type of behavior. It's like the same people who talk about AI alignment when we're making massive advances in the distributed training and architectures and theory behind large language models. I don't care about these superfluous questions. I care about how good the thing is at solving real problems.
And to Alfredo's point, they are getting faster and better. And by the time that, again, this podcast is released, we will be talking about a new class of these models, and the meat of the discussion is only going to be relevant to the people developing the project, a small circle of people who have all the tacit knowledge and understanding needed to actually make progress on it. So I don't think that open source has a particular utility broadly, especially in biology. And again, I want to harp on the fact that what is going to make the most progress here is our ability to assay and synthesize, at increasingly large throughput, the actual state of biomolecular systems, and anything we can do to move that needle is something I'm more interested in helping with.

Tess van Stekelenburg:
I mean, you can also imagine a lot of these drug discovery companies, they probably have some of the world's best machine learning talent that is building models that only they are using internally. I think there's increasing instances-

Kenny Workman:
They don't. They don't follow it. They just don't have world's best machine learning talent. And then they're working with usually 5- to 10-year-old machine learning technology, except for a small handful of outliers.

Tess van Stekelenburg:
So let's say that there's a group of these drug discovery companies that are developing their own internal machine learning models that they're pointing towards, or training on, their own assay data, and it's incredibly effective at solving their problem. But consider just the value of being able to access EVO for viral capsid design, which you were doing because it was an open source model and you could use it on Latch, and you could do it on a plane. If you guys think about your own demographic, I know that you'll always have a special place in your heart for academics and for making sure that they do get to use Latch even though they're not these big companies. How are you guys putting that into your product and into your practice, and is that going to remain the case?

Kenny Workman:
Our ability to host EVO on Latch does almost no one any good without the host of tacit understanding of how to build large libraries of AAV capsids and test them, and how to do these things. So what I think is actually useful, and what Latch is going to pursue and is pursuing, is building sleds between academia and companies like Dyno and companies who adopt different infrastructure, so that the moment gold-standard tools in computational biology, machine learning, etc., are developed, they're instantly rolled out to the thousands of companies in industry where they can instantly make use of them. And we're doing this with a variety of assays.
If you think of something as simple as Bulk RNA-seq, which has been around for over two decades, what aligner do you use? What differential expression statistical tool do you use? What libraries do you use? What's state of the art and how can you use it instantly in a way to get insight from your data so you're not bottlenecked? These are the things that we're interested in doing.
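To make the bulk RNA-seq example concrete, here is a deliberately naive sketch of the differential-expression step on a simulated counts matrix; the simulated data, the fixed 3-vs-3 column layout and the per-gene t-test are simplifications for illustration, not the gold-standard tools (negative-binomial models such as DESeq2-style approaches) a real pipeline would choose.

# Deliberately naive differential-expression sketch on a simulated bulk
# RNA-seq counts matrix; real analyses use purpose-built statistical tools.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Toy counts: 1,000 genes (rows) x 6 samples (columns 0-2 control, 3-5 treated).
counts = rng.poisson(lam=100, size=(1000, 6)).astype(float)

# Library-size normalization to counts-per-million, then log-transform.
cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6
log_expr = np.log2(cpm + 1)

# Per-gene two-sample t-test between the control and treated groups.
t_stats, p_values = stats.ttest_ind(log_expr[:, :3], log_expr[:, 3:], axis=1)

# Rank genes by apparent evidence of differential expression.
top_genes = np.argsort(p_values)[:10]
print("Top candidate gene indices:", top_genes)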

Alfredo Andere:
Just to continue on that, I mean, do you know how expensive training machine learning models is? I mean, obviously you know how expensive it is. You give people money to do that.

Tess van Stekelenburg:
Yeah.

Alfredo Andere:
Yeah, yeah. That's literally most of your money, most of VCs' money, being spent on going and training machine learning models, some of that in bio. And that's good, right? And it almost strikes me as the same question, or it can be reduced to the same question, as should drugs be price-controlled, and should we not allow people to charge more for drugs? And what we've seen is you can do that, and many countries do that, and the US does less of that. And what you end up seeing is that most of the innovation in biotech ends up coming from the United States, because if I want to move a state-of-the-art technology forward and I know I'm not going to be able to charge for it once I get to the end, then I'm not going to do that as a business. It just makes no sense. Maybe I will do it because I feel very giving that day, but in general, large corporations will tend to make decisions that align with the money they can make from them.
And so if we want to incentivize companies to create the best drugs, and if we want to incentivize companies to create the best machine learning models, we should then let them charge for those drugs and models when they create them. There are many ways to make knowledge accessible while still capturing some of the value you created, and academia is missing some of that. That said, academia doesn't have money, so how do you solve that problem? I do think we should be giving a lot of weight to academia and just hoping, and that's what we do at Latch. We just hope that the word of mouth and the goodwill and their amazing research to the world, and a little karma here and there, will at some point come back around and be good for Latch. But academia doesn't have a lot of money, and I think you shouldn't try to invent some novel business model for them. You should just give them your stuff as discounted or as free as possible, and then hope that karma pays you back later.

Tess van Stekelenburg:
I strongly resonate with that. I mean, I do think the future is likely going to change a lot: the way that we are monetizing these models, the way that drug discovery companies will maybe be using these models to go hire people, commercializing models instead of just drugs. I think we're probably going to enter a world where that whole intersection, the business models around how the data is commercialized, and maybe their own internal data sets, are going to become, like you said, just APIs that other companies can use. It looks a lot more like a puzzle piece. So yeah, it's very early days for this ecosystem. Thank you both for making time today.

Alfredo Andere:
Thank you. No, yeah, thank you so much for having us.

Kenny Workman:
Yeah, thank you.
