Connections are the key ingredient for careers, society, and AI neural networks to boot. Sometimes those connections arise spontaneously and other times they’re planned, but the most interesting ones tend to be the planned ones that veer off in unexpected directions. That’s the story of David Ha, the co-founder and CEO of Sakana, a world-class generative AI research lab in Tokyo, Japan.
We announced on the podcast a few weeks ago that Lux led a $30 million founding seed round in the company, but we didn’t dive deeper into the ricochets of David’s peripatetic career. Studying engineering science and machine learning at the University of Toronto, he worked down the hall from now-famous AI researcher Geoffrey Hinton. He ultimately headed to Goldman Sachs in Tokyo to trade derivatives, but on the side, he published an anonymous blog where he posted random experiments in artificial intelligence. A decade of serendipitous connections later, he is now leading one of Japan’s emerging national AI leaders.
We talk through the stochastic moments that defined David’s career, why complex systems knowledge would ultimately turn out to be so valuable, the unique features and benefits of Japan, why openly communicating ideas and particularly interactive demos can spawn such serendipitous connections, why industry has produced more innovation in AI than academia, and why Google’s creativity should never be discounted.
Produced by Christopher Gates
Music by George Ko
Transcript
Danny Crichton:
Grace, we're here today to talk about Sakana. You led, along with Brandon here at Lux Capital, a $30 million founding seed round into the company. It's our first big investment into Japan. Well, it's our first investment ever in Japan, so that's very exciting. But there are two questions on my mind. First, why do we need a whole new AI company, one that's focused on other types of large language models? And two, why Japan? So, maybe we can start with the first.
Grace Isford:
Yeah. Well, thanks for having me. We're very excited about Sakana AI and partnering with not only David Ha, but also Llion Jones and Ren Ito, the three co-founders. They're just really fantastic. When you think about AI today, AI scientists have made tremendous progress over the last few decades. First, we saw the rise of computer vision and ImageNet. We first partnered with Matt Zeiler of Clarifai back in 2013. Then we saw GANs, generative adversarial networks. And then, of course, the Transformer paper, with Llion Jones, now a co-founder of Sakana, among its co-authors. And now, we see Transformers undergirding huge systems. In tandem with that, you've seen the scale of data, the scale of compute, and the scale of these models themselves all grow.
So, bigger and bigger models, more and more compute and GPU usage, and it feels like we're scaling with no end in sight at this point: how big a model can you build, and how many GPUs can you get access to? Sakana is taking a different approach, and it's actually very emblematic of their logo, which is a red fish swimming away from a sea of fish.
Sakana is, of course, the Japanese word for fish, and while everyone else is looking at scaling, at how do we make Transformers bigger and models larger and larger, they're asking how they can take a different approach. And so, they are technically doing things quite differently. They're actually taking advantage of evolutionary methods of intelligence.
So, these are collective intelligence models. The simple, lay way of saying it is smaller agent-based models that work together in cool ways to create more efficient and adaptive models going forward. It dates back to a lot of David's incredible research on the JAX framework at Google. A few other folks who have done work on EvoJAX and other really exciting frameworks have since joined the Sakana team.
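To make the evolutionary idea concrete, here is a minimal sketch of a population-based search loop in plain Python with NumPy. The toy fitness function, population size, and mutation scale are all illustrative assumptions, not Sakana's or EvoJAX's actual setup; the point is just that selection and mutation can optimize parameters without any gradients.

```python
import numpy as np

# Toy objective: recover a hidden parameter vector. Illustrative only.
TARGET = np.array([0.5, -1.3, 2.0])

def fitness(params: np.ndarray) -> float:
    # Higher is better: negative squared distance to the target.
    return -float(np.sum((params - TARGET) ** 2))

def evolve(pop_size=64, n_params=3, n_generations=200, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    population = rng.normal(size=(pop_size, n_params))
    for _ in range(n_generations):
        scores = np.array([fitness(p) for p in population])
        # Select the top quarter as parents, refill with mutated copies.
        parents = population[np.argsort(scores)[-pop_size // 4:]]
        children = parents[rng.integers(len(parents), size=pop_size)]
        population = children + sigma * rng.normal(size=children.shape)
    return max(population, key=fitness)

print(evolve())  # lands near TARGET, no backprop involved
```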
It also builds on a lot of Llion's work on character-based training, which is another key aspect: basically, how do we train these large language models not on a per-word basis, but on a per-character basis? So, leveraging, for example, Japanese kanji characters rather than words, how can we train models that are specialized for that language specifically? A minimal sketch of the character-level idea follows below.
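As a toy illustration, and only that, the snippet below builds a vocabulary where each character, kanji included, is its own token, with no word-segmentation step. This is a generic character-level encoding sketch, not Sakana's tokenizer.

```python
# Character-level encoding: every character, including kanji, is one token.
text = "魚が泳ぐ"  # "the fish swims" -- sakana means fish

vocab = sorted(set(text))                     # unique characters
stoi = {ch: i for i, ch in enumerate(vocab)}  # char -> token id
itos = {i: ch for ch, i in stoi.items()}      # token id -> char

ids = [stoi[ch] for ch in text]
print(ids)                                # one id per character
print("".join(itos[i] for i in ids))      # round-trips to the original text
```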
So, with Sakana they're really looking at the bigger picture: how do we build models that are simpler, more efficient, and still achieve the same results with these nature-inspired methods, versus trying to raise hundreds of millions of dollars for compute and scale the biggest model possible?
Danny Crichton:
And I think it's amazing because the symbolism of the fish, this idea of a school of fish coming together, all swimming in the same direction, creating a path forward, encapsulates the thesis here. But one of the most interesting facts about this company is that David was at Google, Llion was at Google, and yet they're in the center of Tokyo, Japan. How did that happen, and why are we making our first investment in Japan with Sakana?
Grace Isford:
Well, Japan is a really interesting place. My fun fact is I actually spent part of my childhood in Japan as an expat, and it's a really special place for those who have visited. Not only is the technical infrastructure amazing, where you can hop on bullet trains and the subway is spotless, it's really a culture where people take care of each other.

I remember walking around alone when I was 10 years old and everyone was looking out for me. You'd see a three-year-old on the subway and everyone was looking out for them. It's a beautiful culture that I encourage everyone to go and visit, combined with huge early adoption of technology. I believe they were one of the first international markets to adopt Twitter.

Over 50% of the population uses it today. Think of all the cool innovations that have come out of Japan, whether it's the Sony Walkman, the bullet train, the LED light, or lithium-ion batteries, and they're huge robotics and humanoid robot adopters as well. So, it's really an innovation culture and economy, and an early adopter of technology.
Fast forward to David and Llion's stories: they came to Japan in unique ways. David moved there for Goldman Sachs, working in the derivatives trading division. He's not Japanese by background either; he was studying in Toronto prior to that. Llion came because he was looking for another Google office to move to, had a great vacation in Japan, and never left.

Japan has a great way of attracting folks and keeping them there, whether you're an expat or a visitor, because of that very unique culture I mentioned.
Danny Crichton:
Well, and we've just seen the government of Japan introduce a digital nomad visa in the two weeks since we recorded this. Obviously, the company has attracted an enormous number of candidates, both domestically, from the University of Tokyo and a lot of great research labs, and internationally, a lot of folks who want to work on the frontier of AI and LLMs, but who would also like to do that in an amazing city like Tokyo.

And so, bringing those talent pools together, bringing the fish all into one school, is an amazing story. We've got David on the line. We are actually recording this about 12 hours after talking to David because we were in different time zones. And so, let's dive into our conversation with David, his background, and what he's building with Sakana now.
Are we recording? Oh, wow. Oh, amazing. Wow, that was a sudden... we're live. Did you know this is a live show, David?
David Ha:
No, no.
Danny Crichton:
We're broadcasting live on YouTube right now. No, not really.
David Ha:
It's cool.
Danny Crichton:
Don't freak out. I don't know if that joke works better in the morning or late at night, but it's so great to have you on the show. I want to dive in because I think you are an extremely rare person who was on the rarefied track of investment banking, the IB world, derivatives trading, a popular place for doing complex mathematics.

But at the same time, you had this shadow life where you kept an anonymous blog, and not the kind of Wall Street Oasis anonymous blog complaining about your colleagues, but one where you actually did productive things. How did this come about and why did you build this outlet?
David Ha:
I'll give you some background. I studied engineering science at the University of Toronto, and when I graduated, it was still only a few years after the dot-com crash. So, it's quite a long time ago. Maybe many people don't remember the dot-com crash now. It was pretty hard to get technical jobs, and what I worked on as an undergrad was things like control systems, signals analysis, and even neural networks.
So, my lab was quite close to Geoff Hinton's back then, when no one cared about Geoff Hinton and neural networks. It was during the neural network winter. At the time, I was hanging out in the library, and I discovered some cool math books, and one of them was a book on derivatives pricing. It was in the business section, and in the appendix, there was C++ code. So, I thought it was a cool concept, and that led me into the world of finance, quantitative finance.

Luckily, at U of T, there were a few other profs, professors named John Hull and Alan White. Hull wrote the book Options, Futures, and Other Derivatives, one of the bibles of the field. They had an MBA class, and they let me sit in on it and audit it as an undergrad. So, I went through all of their exercises, and that was basically how I got into Wall Street.
Danny Crichton:
Now, I have to point out how crazy this is. So, Grace and I both went to Stanford, but at very different times. I was there during the great financial crisis. At Stanford at the time, the people who wanted to go into investment banking had their careers completely waylaid. No one got a job. Goldman didn't come. It's hard to believe, but the big banks and the big consulting firms were gone.

They didn't show up. And the only thing you could do in 2008 was join app companies. The App Store was coming, Apple had just released the iPhone, and so everyone got waylaid from economics, and the smart people ended up doing ad sales. They went into ads just as the mobile networks were getting started. You took the exact opposite course, which I find funny.

And maybe it's apropos of Sakana and the whole way you're building the company, but you went into finance during the great financial crisis, when no one else was able to break in.
David Ha:
It was crazy. It was like swimming against the tide of my class, if anything. And I liked it at the beginning. It was interesting. It taught me many concepts, like probability and distributions applied to the real world. As a practitioner in finance, as an options trader, you really learn how the math doesn't work, especially in a financial crisis.
I think that's one of the things that many academics, especially in machine learning, might not appreciate. You have the theory, you have tail distributions, you even have the so-called real-world distributions in the dataset. But whenever you deploy stuff in the real world, it'll never behave the way that you intended it to behave.

This is why I think understanding these concepts, complex systems and dynamical systems, was an important part of my education in finance. There's a whole school of academics taught to understand the efficient market hypothesis, where everything is a nicely behaved Gaussian distribution and a random walk. Maybe that might work for an individual stock trader or an individual investor.
But as a trader at a company like Goldman, or any large institution, the trajectory of the asset price is directly a function of what the participants do. So, if one day I decided, through some customer flow or through my own strategy, to buy or sell certain bonds, that would move the price, and that might actually, through some butterfly effect, change the trajectory of the price 15 years later.

I dove into an appreciation of the works of George Soros. He was a proponent of something called reflexivity theory: rather than the price being a reflection of the world, the world is a reflection of the price. The price actually affects behavior in the real world, and hence there's a positive feedback loop that can happen in finance. That was how he forced the Bank of England to change its policy through his price action.

He claims it was going to happen anyway and he just accelerated it. But regardless of what happened, it does show that this is related to machine learning in a sense as well. It shows the environment is actually shaped by the actions of its agents. So, right now in the machine learning world, you have the datasets and the benchmarks.
So, here's the ImageNet dataset, and you train a model to get a high score on the validation set, or you have some reinforcement learning environment and you train the agents to play that game. But in the real world, the line between the environment and the agents is very blurry. In some ways, the environment is part of the agents and the agent is part of the environment.

And I think these are the themes that I'm interested in as an AI researcher. Rather than training a state-of-the-art model to do some particular task, you want to look at the intricacies of how these models interact with their environment and with each other.
Danny Crichton:
I think what you're getting at is the core philosophical divide in statistics between frequentists and Bayesians. As a frequentist, you have a data-generating process, you have this collection of data, you have an ideal model, and you're trying to find which ideal model matches the data.

So, when you're trying to match an ImageNet benchmark, or any benchmark, you collect models, you have 50, let's say, you run them, you backtest them, and you're like, "This is the model. This one is the best, this is the most accurate." And it seems to me like you're much more of a Bayesian. You're coming at it from: I don't know anything about the world.

I have some prior knowledge, but I'm open to how all the intricacies, all the complexities work together. I have to see what emerges out of the system, and I will mold the model to fit the world as it is, not the world as I wish it were.
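For readers who want the Bayesian framing Danny is gesturing at in concrete form: start with a prior over a quantity, observe data, and get a posterior. The beta-binomial example below, with made-up counts, is the standard textbook illustration, not anything specific to Sakana's methods.

```python
from scipy import stats

# Prior belief about some success rate: Beta(2, 2), weakly centered at 0.5.
alpha_prior, beta_prior = 2, 2

# Observed data (illustrative numbers): 7 successes, 3 failures.
successes, failures = 7, 3

# Conjugate update: the posterior is again a Beta distribution.
posterior = stats.beta(alpha_prior + successes, beta_prior + failures)

print(posterior.mean())           # ~0.64: the 0.5 prior pulled toward the data
print(posterior.interval(0.9))    # a 90% credible interval, not a point estimate
```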
David Ha:
Much of my thinking and how I approached this subject also came from studying Bayesian probability theory, not just in finance. Reading that literature, the takeaway is basically: you can't really predict anything. It's all Bayesian. But in machine learning, if I could recommend one textbook that everyone should read, it's one of those timeless textbooks, written by Chris Bishop.

It's called Pattern Recognition and Machine Learning. I think that was where I picked up the basics of how to think in a Bayesian framework. Actually, that was one of the early books that I had the opportunity to study during lunch when I was working at Goldman Sachs as a trader. Luckily, I was in the Japan office, and you have really long lunches.
Danny Crichton:
I don't think that's a quality most people associate with Japan or East Asia: very long, three-martini lunches at the office.
David Ha:
It's true. People would go have long lunches, or in the old days before my time, it's not allowed now, they'd go drink and smoke with their brokers and talk about stuff. I thought I was going to come here and stay for two years because I liked anime and stuff, but I ended up staying for 10 years. I worked in a tower called Roppongi Hills.
Danny Crichton:
Yeah, of course.
David Ha:
And after buying lunch there for two years, you save up lots of points. They have this little point card, Japan loves point cards, and what I discovered was that I could use that point card to gain access to the Roppongi Hills library. It's in the same building, right above the Goldman trading floor. There's this private library, and I could access it for free because I did lots of lunch buying over two years.
So, every day at lunch, I would go to the library. I read up on machine learning. I also read many papers about evolution, some written by folks like Ken Stanley, and about the theories of open-endedness, meta-learning, and some of these things. I thought it was better to learn concepts by trying to build something, rather than just reading books and papers.

So, it was learn it by building. I like this simple language called JavaScript, because you don't really need a compiler, you just need a laptop and a text editor, and it'll run in your browser. And it looks like C, which I learned in school, even if it's a bit messy. So, I ended up implementing many machine learning experiments in JavaScript.

And one of the benefits of JavaScript is you can actually play around with the models in your browser. So, it was easy to create demos where I could test my results interactively. Some of my earlier work was really interactive machine learning. Many people publish papers, but I liked to publish the paper and also have an interactive version of the paper that people can play around with.
Danny Crichton:
Well, and I will say, when you think about that era, and even the present day, computer science is typically about the conference paper. You go to a big conference, NeurIPS, a bunch of us just went down to New Orleans, hopefully Vancouver next year. And the idea is this big poster, and you get a poster award, et cetera.

And it's always so sad to me, because on one hand, it's great that we're making these theoretical contributions and people are doing good work, but computer science is so experimental. You can actually write the code. You can actually play with it. You actually have this artifact you can do something with, and yet we print it out, paste it to a piece of cork board, and off we go, as opposed to something that people can actually use, demonstrating something in their browser or whatever the case may be.
David Ha:
There are deeper implications of this as well. When you present your research as something that people can play with, it's much more impactful than just having a paper with a Table 1, your figures, and your results on some benchmarks, because people can actually identify holes in your approach. The researcher might not like that, but it's actually important that people can find funny examples, post them on Twitter, and have them go viral.

It also allows people to fork the approach and extend it. I think this interactive ML phenomenon died down, but it got replaced by the current open source phenomenon. So, people can say, "Okay, here's something cool I did. I randomly interpolated the weights of two Stable Diffusion models and I merged them."

It doesn't make any sense, but the results look really cool. Here's the model, and they post it on Hugging Face. And now that's the proof of concept for what's called model merging, and other people take it from there. So, I think getting your hands dirty and building stuff is a lot more important than presenting results in tabular formats.
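The weight-interpolation trick David describes is simple enough to fit in a few lines. Here is a minimal sketch in PyTorch; the checkpoint filenames and the single blending coefficient are illustrative assumptions, and real model-merging recipes are usually more elaborate than a plain linear mix.

```python
import torch

def merge_state_dicts(sd_a: dict, sd_b: dict, alpha: float = 0.5) -> dict:
    """Linearly interpolate two checkpoints of the same architecture."""
    assert sd_a.keys() == sd_b.keys(), "models must share an architecture"
    return {name: alpha * sd_a[name] + (1.0 - alpha) * sd_b[name]
            for name in sd_a}

# Hypothetical usage with two fine-tunes of one base model:
# sd_a = torch.load("finetune_a.pt")
# sd_b = torch.load("finetune_b.pt")
# model.load_state_dict(merge_state_dicts(sd_a, sd_b, alpha=0.5))
```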
Danny Crichton:
And this is the tension between mathematics, the elegance of formulas, writing out a proof where at the end you're done, QED, and engineering. With engineering, there's never a done point. You have these demos, you play with them, you can optimize them, you can make them better. You could do that for the rest of your life and never get to the end, as we see all the time in artificial intelligence.

So, it's very different. But I do want to go back. You were in Roppongi Hills, you're at Goldman, you're doing derivatives trading, you're taking these long Japanese-style lunches, and you're in the library. Where did the idea for an anonymous blog come from, for all these different hacks and experiments you were playing with?
David Ha:
Yeah, I kept a diary of what I was learning and posted some of my results online. I had GitHub accounts, but I also wanted to write up some of the projects and get people to play around with them, and I really couldn't share them under my real name. Living in Japan, I like eating sushi, so I called my blog otoro.net, after a cut of tuna, and I put some of my experiments there.
Then some of them became popular. I think as a researcher, getting feedback from an audience is important. You post some model, people can play around with it and tell you what works and what doesn't. It's like, how do you say, Instagram for researchers in a sense: you collect feedback on what you're doing.

So, you post something and see if it organically has any impact. If people like it, or if people don't like it, maybe you tweak your approach to make it more interesting. I was really going about this as a pursuit of novelty, pursuing novel, interesting things. As someone with a MacBook in a library, you can't beat state of the art on ImageNet or CIFAR-10.
You can create some virtual creatures that battle each other, but you can't really beat state of the art. My whole machine learning education was about pursuing interestingness, doing things that haven't really been done before. And looking back, if we look at the education of the machine learning researcher, I call it the tiger mom theory.

So, you have a cohort of educated people who want to achieve the highest grades, and they finally get into the MIT undergrad. After that, they want to work for all the right profs and get into the right grad school. Then, the metric is producing the most papers, the most citations, and so on.

So, I think this culture might actually explain how we got to where we are now, where we have this culture of chasing high benchmarks, larger model sizes, improved state of the art. It's important, though. I think we need this in a sense, because if you add up a million incremental improvements, you get breakthroughs.
So, that's one side of how technology develops, but you also need the other side. You need people who just think of wacky new ideas that were never meant to work. Fast forward a few years and look at this open source, weird foundation model world: you find a lot of these breakthroughs are not happening at labs.

It's hobbyists doing cool stuff, hacking on other people's models, and they discover concepts that people at labs might not discover themselves. So, right now it's like a cyberpunk mentality. You have all these complicated big models and you have the hackers. It's the hackers versus the academics, and I think the hackers are important.
Danny Crichton:
I think what you're getting at is that this is fertile territory. You can do whatever you want. You can experiment. You can hack. And when you go to academia, unfortunately the pathways are very strict. There are a couple of conferences where you must present, and to have a paper accepted to one of these conferences, it has to meet a certain bar.
Those bars are determined by panels of judges, which are very constrained. You can't just show a little demo and be like, "Look at this," and pin it to a poster board or whatever. And unfortunately, those careers are so long, as you were pointing out, from undergrad into graduate school, into postdocs, into a professorship. And at any point, if you give it up, if you start to experiment or be a rebel, you're off track.
There's kind of no way back on. It's very competitive, there are very few slots, and it winnows all the way through. And so, in some ways, it shouldn't surprise us that a career path built on following the herd, or, more apropos in your case, a school of fish all going the same way, each salmon getting caught by the bear as it swims upstream against a very, very strong current, isn't where novelty comes from. Being the fish that doesn't follow along the same path may be the best way to find something new on the edge of what's possible. And so, I'm not surprised that we're seeing way more innovation in the decentralized open source communities than in some of these AI cathedral hubs at major universities.
David Ha:
Yeah, it's true. And the weird thing about your point is that in academia, the light at the end of the tunnel is tenure, where you're supposed to have the freedom.
Danny Crichton:
Now you're free. You're free to do whatever you want.
David Ha:
You're free, and you see-
Danny Crichton:
And no one believes that.
David Ha:
... they can do that. But the reality is many profs, at that stage of their lives, might not become so free. Although some profs I know are quite innovative once they get tenure, or at least close to tenure. So, I think some people still take advantage of that, but surprisingly, not so many.
Danny Crichton:
It's the selection effect. If you made it all the way there, there are some folks who still have the little pilot light that's ready to spark. As soon as they get the tenure offer, they're like: everything else goes away, all the useless papers are done, this part of my career is over. Now I have 20 brilliant ideas I'm ready to actually go do.

But here's the great part about AI, engineering, and computer science in general, unlike many other fields. Take biology: for all the democratization there, most of it is coming from AI, and the reality is that you still need a lab. It takes hundreds of thousands of dollars, if not more, to get going. You're not going to get a lab until your 40s.
It takes a really, really long time to get going and be able to do original science. What's amazing with computer science is you can do it in high school, you can be a middle schooler, you can download anything from Hugging Face, you can be part of this community, you can hack, you can distribute this on Twitter and be taken seriously.
Particularly, as you pointed out with your viral blog, tens of thousands, hundreds of thousands of people can look at this and realize immediately that you've done something original, and they don't even know who you are. It's a little bit like that New Yorker cartoon of the dog at the computer: "On the internet, nobody knows you're a dog."

But you could otherwise have been trapped. The internet here is so revolutionary for you, because otherwise you would've been trapped in Roppongi in the 1980s. I think Roppongi Hills actually got built more recently, so I'm making this up, but sequestered in a library, reading a textbook, coming up with brilliant ideas, it would all just be lost to history.

But instead, you had this platform and medium that allowed you to distribute those ideas, get them in front of the faces of really brilliant, smart people, and have the world enjoy the work that you were building.
David Ha:
It was really delightful. And it's also really delightful to me to stand on the shoulders of giants, reading some of the old papers by Ken Stanley on what was then called morphology search and is now called architecture search. One of the blog posts I did extended Ken's idea of neural network morphology search, but incorporated backprop so you can fine-tune the weights in real time.
It's simple, but you can make an interactive demo out of it. And I think that work and a few other works became popular with a few folks at Google. Jeff Dean reached out to me about the architecture search blog post, and Doug Eck, who became my colleague, manager, and mentor, reached out about some of the other creative machine learning experiments.

They encouraged the anonymous me: "Hey, whoever you are, you should try to apply to become a researcher at Google." They had a Google Brain Residency program that was opening up in 2016, and they were looking for that program to attract non-typical, non-PhD folks, or a mix. I went for it and I got a job at Google.

So, I moved to California and I had the best experience of my life. I never regretted joining Google. I think it's the best place, and especially that first batch of the Google Brain Residency was filled with amazing people. We had people like Peter Schroeder.
Danny Crichton:
The batch I was in was the best batch of all time. It's only been downhill since then.
David Ha:
I think you got a few good years. Yeah, you got a few good years. In my year, it was pretty good.
Danny Crichton:
I will say I was a part of the APM program way back in the day, but it was a similar notion when the APM program had just gotten started. When Google initiates one of these new ideas, there are all these folks who, plus or minus two or three years, are exactly the right folks. You have a couple of slots, and all of a sudden, you're bringing together this amazing cohort that should have known each other all along.

But it took Google building a little program to get it going, and then the pipeline of talent changes. You have this huge wave, and then it resets a little bit and becomes normal. But it's that first group in a lot of these programs, it's like a sucking sound from the market: you suddenly have all these people who needed to know each other, and it takes a program to bring them together.
David Ha:
I'm in a really privileged position to say this, but I would pay money to be in the Google Brain Residency program. It's like going to grad school. You just bounce around so many ideas, and I joined projects that resonated with me. My mentor, Doug Eck, started a project called Magenta, about applying machine learning to creativity.

And that had a big impact on my research: making interactive demos, having machine learning used in real time by the audience. Then I had the opportunity to move to Japan again in 2018. Around that time, Google built this gigantic building in the Shibuya district in Tokyo.
Danny Crichton:
It's very pretty.
David Ha:
It's awesome, yeah.
Danny Crichton:
I had good sushi there, probably similar, maybe not quite the Roppongi Hills sushi option, but it was really quite incredible.
David Ha:
No, yeah, it's incredible, 35 floors. I guess this is a theme we might come back to later, but my guess is that because, from Google's point of view, the China market didn't really work out, Japan is a great market. YouTube has its own culture in Japan. There are YouTubers.

There are people who do cosplay on YouTube, and Japanese YouTube is very different from YouTube elsewhere. The ads business is also phenomenal in Japan. So, I think they invested in Japan, and a bunch of us at Google Research who were interested in moving to Japan got together and, as Googlers do, we wrote a design doc.

We wrote the four-pager to explain, hey, why you should let us go to Japan, and it got approved. So, I was one of the first research folks from the Brain team to move to Japan, along with other folks like Heiga Zen from the speech team. I started a really small team in the Shibuya office, leading the Google Brain research team there. There were other teams as well; the translation and NLP teams were already there.

So, we weren't the first researchers, but with the Google Brain research team, we were the first folks establishing a team there. And then I worked on more of the wacky ideas, like neural network evolution, collective systems, scaling up evolution, and more of the creative machine learning angle, for another two to three years. So, that was basically my Google career. I ended up working at Google for six and a half years before venturing into the startup space. It was a fun ride.
Danny Crichton:
Let's go up to the present day. So, you're based in Shibuya, or at least Tokyo. I don't actually know.
David Ha:
Tokyo, yeah.
Danny Crichton:
Yeah. But you built out Sakana, focused on heterogeneous, nature-inspired AI models, and just raised a $30 million founding seed round from us and a bunch of our friends here at Lux. I'm curious, because you had this amazing experience at Google, a company that gets hit pretty hard in the news these days, both on the financial side and among employees who say the company's culture has changed a lot.

It's less inventive. It's harder to get projects and products underway. I have to admit that I was there 12 years ago and it was pretty hard to get a product launched back then too. So, I always feel like the grass is always greener in a different era. But I'm curious, as you start to build Sakana, you have this theme of a fish going against the stream, against the school of fish all swimming the same direction.

How do you build a culture that allows for creativity, innovation, this hacking mentality, all the personality that you just described and the experiences you just had? How do you think about building that for all these new employees? Because you're already attracting a great cohort and cross-section of Japanese talent and global talent, all looking to descend on Tokyo. How do you stitch that together?
David Ha:
There are a few parts. The first comment I want to make is that I think Googlers are still very creative. With these waves of technology, you always have, as my co-founder Llion Jones just said, waves of exploration and exploitation, exploration and exploitation. Google is pretty good at the exploration phase, and I think it takes some time to get it together.

When you're a company of 100,000 people, you're navigating an aircraft carrier. So, it takes a little more time to change direction and go into the exploitation phase. I'm still rooting for Google. I think they've recently shown some good progress on Gemini; you've seen some of the recent benchmarks. My favorite benchmark is the Chatbot Arena.

Two chatbots enter and one chatbot leaves, so you can't really cheat it; it's a human evaluation, and they were able to achieve quite good scores. I think having them improve is actually good for everyone. It might give OpenAI some pressure to move forward as well. So, I think it's always a good thing.
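For context on why David calls this benchmark hard to cheat: Chatbot Arena ranks models from pairwise human votes, aggregated with Elo-style ratings. Below is a generic Elo update in Python, the textbook formula rather than the Arena's exact implementation, with an illustrative K-factor.

```python
def expected_score(r_a: float, r_b: float) -> float:
    # Probability that model A beats model B under the Elo model.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update_elo(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    # Winner gains rating in proportion to how surprising the result was.
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)
    return r_a + delta, r_b - delta

# Example: an underdog win against a higher-rated model moves both
# ratings by about 24 points.
print(update_elo(1000.0, 1200.0, a_won=True))
```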
But getting to your other question, how do we encourage innovation? I don't know the perfect answer to that, otherwise it would be fantastic. But one of the principles I use for innovation is to enforce constraints on resources. Imagine on day one we spent all $30 million and signed up for, locked ourselves into, three years of compute with Google Cloud.

If we suddenly had massive TPU Pods at our disposal, the type of work we'd do would be very different. It would be like, okay, every minute I'm not using this compute, the clock is ticking and I'm wasting my resources. So, we'd naturally end up figuring out how to train a model that uses all of those resources. It becomes more of an engineering-type problem.

The other approach, you may have seen this on my Twitter feed, is what I actually did: I bought a few workstations, each with two GPUs, and told the researchers, okay, this is what you get to experiment with, to play around with. We keep ourselves GPU-poor.
Danny Crichton:
Well, I think when you think about the next generation of AI technologies to come, a lot of it is about on-device, particularly mobile devices. Can I actually run the entire neural net on an iPhone? You can't today. It's too big. It has to be in the cloud, and the costs are extremely high. We've seen the numbers from Anthropic, and I believe OpenAI as well.

The gross margins are actually quite bad at AI companies. Every query is very expensive. It's unprofitable. So, at some point, these curves have to change, which means we have to get much more efficient at both training a model and running inference against it. And if you have hundreds of millions of dollars in compute credits, there's really no incentive to get better at this.

You're not limited by that factor. And engineers are smart people. They're like, "Okay, if I can use 10,000 H100 chips, let's just go do that. Why would I not? I have all the compute power in the world." It's amazing, Terminator-style. But I think what you're getting at is a classic push for innovation, which is the constraint.
If you want engineers to get more creative, to really rethink solutions and get out of their paradigms and ideologies, they need a constraint that says, "Look, you can't do the thing that's in the paper, because that's a billion dollars to train that model. You've got a million bucks." And I'm forgetting all the computer science laws, but isn't there one that says if any input changes by 10x, you have to rewrite the system?

Well, it works in reverse. If the input gets cut by 90%, so you have to get to a tenth of what you had before, it's decimated, if you will. Suddenly you have to rethink the whole model: "Okay, my memory went from a gigabyte to 128 megabytes. I have to rethink the way I code. I have to rethink efficiencies. I have to track my pointers much better than I did the first time.
I can't allocate everything." And so, to me, setting that up right from the beginning is super interesting, because it sets you on a course of finding new solutions. And I will say, to connect it to my little world of policy, I always try to remind folks in DC: every time you cut off China or other countries from certain types of chips, you are actually just offering them new constraints.

And you don't actually know whether that harms or helps, because by cutting them off from all this compute, you might actually encourage them to find new solutions, and they might have better models in a couple of years than you realize, unintentionally, based on this policy.

And that gets back to the earlier part of our conversation around complexity, the butterfly effect: okay, we put in a license requirement, you can't buy an H100. Well, now I don't need it at all. Now I don't even have to buy it, because I've come up with something better that uses easier and cheaper chips, and I've arrived at a much more efficient outcome. Who would've thought that at the beginning of this whole process?
David Ha:
100%, I agree with that. I think one of the co-founders of Lux, Josh, also has interesting views on geopolitics. Some of the-
Danny Crichton:
Yes, interesting. Yes.
David Ha:
Yeah. Especially with the complex systems of trade, if you limit, say, China's ability to buy GPUs, they will come up with their own version of the GPU, better in their own way. So, I think the policy folks have to think harder.
Danny Crichton:
I always joke, and I wrote a piece a year ago, but I won't get distracted with this. Grace is here, looking scared, like, don't go in that direction. But I will say I do enjoy how much the US focuses on restrictions, blocking your access to technology, whereas China always goes the other direction. It just dumps: it dumps solar panels, it's dumping electric vehicles.

Now in Europe, there's a huge crisis because Europe is swarming with EVs. In fact, as we're recording this week, China took over from Japan in EV, and overall auto, exports for the first time ever, and it's because they have cheap EVs. So, now you have no domestic industry and no incentive to get better at it, because you're like, "Well, I can buy it so cheaply, it's just coming over the border. Why would I build out my own capability?"
And I always find this dichotomy so fascinating. But let's go to one final question, because I know it's very late in Japan, but you have lived in Japan for a long time. You've had careers there, Goldman, Google, and now Sakana, as CEO running your own company. What is the biggest attraction of building a company in Japan?

Why there, when everyone else is going to Silicon Valley, to Cerebral Valley, as Hayes Valley is now called? And I always have to say, from a branding perspective, brilliant, because I love Hayes Valley. It's a very nice neighborhood, but it is small. You can walk Hayes Valley in about four minutes, and to think that it's supposed to be the hub of all AI in the world is a little frightening to me. But why Tokyo?
David Ha:
Yeah, it's a good question. I can list so many reasons, but at the end of the day, it's a place that I love. I love the food here, to be honest, I love sushi. So, for me and my co-founders, it wasn't really a choice to start a company here. This is the place we've grown accustomed to living in, a place we call home. But when we did the analysis, the decision was: should we stay and start a company in Japan, or should we, say, move to the Bay Area to start a company?

If I frame it that way, it makes more sense. If I go to the Bay Area to start a company, there are suddenly 200 other competitors full of ex-Google Brain Residency people starting their own companies that I would compete with directly. Whereas in Japan, there are many good AI companies, but there's more room to achieve an impact.
So, it's less crowded. It also feels like Tokyo as a city is not just a tech hub. There are banks, fashion companies, creative industries, a whole mix of people. Even the non-Japanese people working here now are a really diverse crowd, from chefs to web developers, to lawyers and bankers, but also artists and sports people.

So, for me as a researcher, I got all of my ideas while I was working, traveling, and living in Japan. Even when I was at Google, many of my ideas were formulated while I was living in Japan; they were presented and refined while I was working at Google. So, I think there are certain elements of life here that allow me to innovate.

There are some examples of that, like the time difference from the Bay Area, which meant that during my Google days, I didn't have to go to all these meetings. You have your own time to develop your own stuff. That space is also the reason Japan develops its own weird tech that's different from the rest of the world. From a business point of view, recruiting is actually surprisingly easier than I thought.
Grace Isford:
Popping in here: the very fact that Sakana is not in Silicon Valley is one of the most unique things about it in our mind. With Japan's strong history of technological innovation, stable democracy, and top-five global economy, there's a really unique mix of ingredients. To David's point, there's also a talent arbitrage, where he can get a ton of amazing talent and be that regional hub. We often analogize it to the DeepMind equivalent for Asia, and Tokyo itself is a great place to attract expats.
Danny Crichton:
I think it's just such an attractive place. I spent six weeks in Tokyo in the last year, mostly in Shibuya, but also all across Japan. That's why I was at the Google building, even if just down the street; it's amazing to see this obelisk of technology. But I think what's so attractive is that people would love to live there, and yet there's this barrier: I don't want to work for NTT.

I don't want to work for a large conglomerate like Toshiba, or Sony, or any of these classic Japanese companies. I want to work on the edge, on something innovative. And that's always been a bit of a divide; the startup ecosystem has always been underdeveloped relative to Japan's economic success. So, I think you are one of these great bridges, where before it was: I can work for Google, or one or two other companies.

Twitter, I think, has an office there. But now there are these great startups, a conglomeration of talent, really smart people building things up. It's innovative. It's a hacker mentality. So, you're getting the best of both worlds, and I just think that's extremely rare. And for a certain type of person, look, it's not going to be attractive to everyone.
But for a certain type of person, it's the world's best place to be. But David, I know we've kept you way past your bedtime. Normally you're in bed by 7:00 P.M., and it's 11:30 your time, and I want to be appreciative of the time. So, David Ha, CEO and co-founder of Sakana AI, thank you so much for joining us.
David Ha:
Thanks, everyone.