I think it's possible that physics has exploits and we should be trying to find them: arranging some kind of a crazy quantum mechanical system that somehow gives you a buffer overflow, somehow gives you a rounding error in the floating point. Synthetic intelligences are kind of like the next stage of development, and I don't know where it leads to. At some point, I suspect, the universe is some kind of a puzzle. These synthetic AIs will uncover that puzzle and solve it.

The following is a conversation with Andrej Karpathy, previously the director of AI at Tesla, and before that at OpenAI and Stanford. He is one of the greatest scientists, engineers, and educators in the history of artificial intelligence. This is the Lex Fridman podcast. To support it, please check out our sponsors. And now, dear friends, here's Andrej Karpathy.

What is a neural network, and why does it seem to do such a surprisingly good job of learning?

What is a neural network? It's a mathematical abstraction of the brain, I would say — that's how it was originally developed. At the end of the day it's a mathematical expression, and a fairly simple one when you get down to it: basically a sequence of matrix multiplies, which are really dot products mathematically, with some nonlinearities thrown in. So it's a very simple mathematical expression, and it's got knobs in it.

Many knobs.

Many knobs. These knobs are loosely related to the synapses in your brain — they're trainable, they're modifiable. The idea is that we need to find the setting of the knobs that makes the neural net do whatever you want it to do, like classify images and so on. There's not too much mystery in it, I would say. You don't want to endow it with too much meaning with respect to the brain and how it works. It's really just a complicated mathematical expression with knobs, and those knobs need a proper setting for it to do something desirable.

Yeah, but poetry is just a collection of letters with spaces, and it can make us feel a certain way. In the same way, when you get a large number of knobs together — whether inside the brain or inside a computer — they seem to surprise us with their power.

Yeah, I think that's fair. I'm underselling it by a lot, because you definitely do get very surprising emergent behaviors out of these networks when they're large enough and trained on complicated enough problems — say, next word prediction on a massive dataset from the internet. Then these networks take on pretty surprising, magical properties. I think it's interesting how much you can get out of even a very simple mathematical formalism.

Is your brain, right now, as you're talking, doing next word prediction — or is it doing something more interesting?

Well, it's definitely some kind of a generative model that's GPT-like, and prompted by you. You're giving me a prompt, and I'm responding to it in a generative way.

And are you adding extra prompts from your own memory, inside your head?

It definitely feels like you're referencing some kind of declarative structure of memory and so on, and then you're putting that together with your prompt and giving out some messages.

How much of what you just said has been said by you before?

Nothing, basically, right?

No — but if you actually look at all the words you've ever said in your life and you do a search, you've probably said a lot of the same words, in the same order, before.

Yeah, it could be. I'm using phrases that are common, etc., but I'm remixing it into a pretty unique sentence at the end of the day. But you're right, there's definitely a ton of remixing.

It's like Magnus Carlsen saying, "I'm rated 2900, whatever, which is pretty decent." I think you're not giving enough credit to neural nets here. What's your best intuition about this emergent behavior?

It's kind of interesting, because I'm simultaneously underselling them, but I also feel like there's an element to which I'm overselling them — it's actually kind of incredible that you can get so much emergent, magical behavior out of them despite them being so simple mathematically. Those are two surprising statements juxtaposed together. Basically, I think we're actually fairly good at optimizing these neural nets, and when you give them a hard enough problem, they are forced to learn very interesting solutions in the optimization — and those solutions have these emergent properties that are very interesting.

There's wisdom and knowledge in the knobs. So what's the representation that's in the knobs? Does it make sense to you intuitively that a large number of knobs can hold a representation that captures some deep wisdom about the data it has looked at?

It's a lot of knobs. And somehow — speaking concretely, one of the neural nets that people are very excited about right now are GPTs, which are basically just next word prediction networks. You consume a sequence of words from the internet and you try to predict the next word. Once you train these on a large enough dataset, you can prompt these neural nets in arbitrary ways and ask them to solve problems, and they will. You can make it look like you're trying to solve some kind of a mathematical problem, and they will continue what they think is the solution based on what they've seen on the internet — and very often those solutions look remarkably consistent, look correct, potentially.

Do you still think about the brain side of it? Neural nets are a mathematical abstraction of the brain — do you still draw wisdom from biological neural networks? Or, the even bigger question: you're a big fan of biology and biological computation. What impressive thing is biology doing that computers are not yet doing?

I'm definitely much more hesitant with the analogies to the brain than you would see potentially in the field. Certainly, the way neural networks started, everything stemmed from inspiration by the brain — but at the end of the day, the artifacts you get after training are arrived at by a very different optimization process than the optimization process that gave rise to the brain. So I kind of think of it as a very complicated alien artifact. It's something different.

Sorry — the neural nets that we're training?

Yes — they are complicated alien artifacts. I do not make analogies to the brain, because the optimization process that gave rise to them is very different from the brain's. There was no multi-agent self-play setup and evolution; it was an optimization that basically amounts to a compression objective on a massive amount of data.
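To make the "sequence of matrix multiplies with nonlinearities and trainable knobs" description concrete, here is a minimal sketch of a two-layer network's forward pass. The shapes and random weights are illustrative assumptions; training on some objective (next word prediction, classification, a compression-like loss) is what would adjust these knobs.

```python
import numpy as np

rng = np.random.default_rng(0)

# The "knobs": randomly initialized weight matrices and biases (illustrative sizes).
W1, b1 = rng.normal(size=(784, 128)) * 0.01, np.zeros(128)
W2, b2 = rng.normal(size=(128, 10)) * 0.01, np.zeros(10)

def forward(x):
    """A neural net at its core: dot products plus simple nonlinearities."""
    h = np.maximum(0.0, x @ W1 + b1)   # matrix multiply, then ReLU nonlinearity
    return h @ W2 + b2                 # another matrix multiply -> output scores

x = rng.normal(size=(1, 784))          # e.g. a flattened 28x28 image
print(forward(x).shape)                # (1, 10) class scores
```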
Okay, so artificial neural networks are doing compression, and biological neural networks are trying to survive — they're not really doing compression; they're agents in a multi-agent self-play system that's been running for a very, very long time.

That said, evolution has found that it is very useful to predict, to have a predictive model in the brain, so I think our brain utilizes something that looks like that as a part of it. But it has a lot more gadgets and gizmos and value functions and ancient nuclei that are all trying to make you survive and reproduce and everything else.

And the whole thing, through embryogenesis, is built from a single cell. The code is inside the DNA, and it just builds up the entire organism.

Yes, and it does it pretty well.

It should not be possible. So there's some kind of computation going on through that building process. If you were to look at the entirety of the history of life on Earth, where do you think is the most interesting invention? Is it the origin of life itself? Is it the jump to eukaryotes? Is it mammals? Is it humans themselves, Homo sapiens — the origin of intelligence, or highly complex intelligence? Or is it all just a continuation of the same kind of process?

Certainly, I would say it's an extremely remarkable story that I'm only briefly learning about recently. You almost have to start at the formation of Earth and all of its conditions, and the entire solar system — how everything is arranged, with Jupiter and the Moon and the habitable zone and everything. Then you have an active Earth that's turning over material, and then you get to abiogenesis and everything after. It's all a pretty remarkable story, and I'm not sure I can pick a single unique piece of it that I find most interesting. I guess for me, as an artificial intelligence researcher, it's probably the last piece. We have lots of animals that are not building a technological society, but we do — and it seems to have happened very quickly and very recently. Something very interesting happened there that I don't fully understand. I almost understand everything else, kind of, intuitively, but I don't understand exactly that part, and how quick it was.

Both explanations would be interesting. One is that this is just a continuation of the same kind of process — there's nothing special about humans. That would be very interesting: we think of ourselves as special, but it was already written in the code that greater and greater intelligence would emerge. The other explanation is that something truly special happened — some crazy rare event, like in 2001: A Space Odyssey. What would it be? Is it the invention of fire, or, as Richard Wrangham says, the beta males deciding on a clever way to kill the alpha males by collaborating — so optimizing the collaboration, the multi-agent aspect, under constrained resources, trying to survive, the collaboration being what created the complex intelligence? But that seems like a natural outgrowth of the evolutionary process. What could possibly be a magical, rare thing that happened that would mean human-level intelligence is actually really rare in the universe?

Yeah, I'm hesitant to say that it is rare, by the way, but it definitely seems like a kind of punctuated equilibrium, where you have lots of exploration and then certain sparse leaps in between. Of course, the origin of life would be one; DNA; sex; eukaryotic life — the endosymbiosis event, where an archaeon ate a little bacterium; and then of course the emergence of consciousness, and so on. So there definitely seem to be sparse events where massive amounts of progress were made, but it's kind of hard to pick one.

So you don't think humans are unique. I've got to ask you: how many intelligent alien civilizations do you think are out there? And is their intelligence different or similar to ours?

Yeah, I've been preoccupied with this question quite a bit recently — basically the Fermi paradox, and just thinking it through. The reason I'm actually very interested in the origin of life is fundamentally to try to understand how common it is that there are technological societies out there in space. And the more I study it, the more I think there should be quite a lot.

Why haven't we heard from them? Because I agree with you — I just don't see why what we did here on Earth is so difficult to do.

Yeah, especially when you get into the details of it. I used to think the origin of life was this magical, rare event, but then you read books like Nick Lane's — The Vital Question, Life Ascending, etc. — and he really gets into it and makes you believe that this is not that rare: basic chemistry. You have an active Earth, you have your alkaline vents, you have lots of alkaline waters mixing with the ocean, you have your proton gradients, and you have the little porous pockets of these alkaline vents that concentrate chemistry. As he steps through all these little pieces, you start to understand that this is actually not that crazy — you could see it happening on other systems. He takes you from geology to primitive life, and he makes it feel like it's actually pretty plausible. Also, the origin of life came fairly fast after the formation of Earth — if I remember correctly, just a few hundred million years or so after it was basically possible, life actually arose. So that makes me feel like that is not the constraint, not the limiting variable, and that life should actually be fairly common. Then the question of where the drop-offs are is very interesting to think about. I currently think there are no major drop-offs, basically, so there should be quite a lot of life — and where that brings me is that the only way to reconcile the fact that we haven't found anyone is that we just can't see them, we can't observe them.

Just a quick comment: Nick Lane, and a lot of biologists I talk to, really seem to think that the jump from bacteria to more complex organisms — the eukaryotic leap, the endosymbiosis — is the hardest jump. Which — I get it, they're much more knowledgeable than me about the intricacies of biology — but that seems crazy, because how many single-cell organisms are there, and how much time do you have? Surely it's not that difficult; a billion years is not even that long of a time, really.
All these bacteria under constrained resources, battling it out — I'm sure they can invent something more complex. But again, I don't understand it; it's like asking how to move from a "hello world" program to inventing a function or something.

Yeah, I'm with you. If it were the origin of life, that would fit my intuition that that's the hardest thing. But if that's not the hardest thing — because it happened so quickly — then it's got to be everywhere. And maybe we're just too dumb to see it.

Well, maybe we just don't have really good mechanisms for seeing this life.

I'm not an expert, just to preface this.

I would say that — I want to meet an expert on alien intelligence and how to communicate with it.

I'm very suspicious of our ability to find these intelligences out there, and to find these Earths. Radio waves, for example, are terrible: their power drops off as basically one over r squared. I remember reading that the radio waves we are currently broadcasting would not be measurable by our own devices today from — was it one tenth of a light year away? Basically a tiny distance, because you really need a targeted transmission of massive power, directed somewhere, for it to be picked up over long distances. So I just think our ability to measure is not amazing. I think there are probably other civilizations out there, and then the big question is why they don't build von Neumann probes and travel across the entire galaxy. My current answer is that interstellar travel is probably just really hard. You have the interstellar medium: if you want to move at close to the speed of light, you're going to be encountering bullets along the way, because even tiny hydrogen atoms and little particles of dust have massive kinetic energy at those speeds. So you need some kind of shielding, you have all the cosmic radiation — it's just brutal out there, it's really hard.

It feels like we're not a billion years away from doing that, though.

It might just be that you have to go very slowly through space, as opposed to close to the speed of light. So I'm suspicious, basically, of our ability to measure life, and I'm suspicious of anyone's ability to just permeate all of space in the galaxy, or across galaxies — and that's the only way around it that I can currently see.

Yeah. It's kind of mind-blowing to think that there are trillions of intelligent alien civilizations out there, slowly traveling through space to meet each other. Some of them meet, some of them go to war, some of them collaborate.

Or they're all just independent — little pockets.

Well, statistically, if there are trillions of them, surely some of the pockets are close enough to see each other. And once you see something that is definitely complex life — if we saw something like that — we're probably going to be intensely, aggressively motivated to figure out what the hell it is and try to meet them. What would be your first instinct, at a civilizational level — to meet them, or to defend against them? What would be your instinct as president of the United States, and as a scientist? I don't know which hat you prefer for this question.

Yeah, the question is really hard. I will say, for example: we have lots of primitive life forms here on Earth next to us — all kinds of ants and everything else — and we share space with them. We are hesitant to impact them; we try to protect them by default, because they are amazing, interesting dynamical systems that took a long time to evolve. They are interesting and special, and I don't know that you want to destroy that by default. So I like complex dynamical systems that took a lot of time to evolve; I'd like to preserve them if I can afford to. And I'd like to think the same would be true about the galactic resources — that they would think we're kind of an incredible, interesting story that took a few billion years to unravel, and you don't want to just destroy it.

I could see two aliens talking about Earth right now: "I'm a big fan of complex dynamical systems, so I think there's value in preserving these" — and we're basically a video game they watch, a TV show.

Yeah, I think you would need a very good reason to destroy it. Why don't we destroy these ant farms? Because we're not actually in direct competition with them right now. We do it accidentally and so on, but there are plenty of resources, so why would you destroy something that is so interesting and precious?

Well, from a scientific perspective you might probe it. You might interact with it later, you might want to learn something from it, right? So I wonder — there could be certain physical phenomena that we think are just physical phenomena, but that are actually them interacting with us, poking a finger in to see what happens.

I think it would be very interesting to the scientists — the other alien scientists — what happened here. What we're seeing today is a snapshot; it's the result of a huge amount of computation over a billion years or something like that.

So it could have been initiated by aliens. This could be a computer running a program. Okay — if you had the power to do this, would you?

For sure — at least I would. I would pick an Earth-like planet that has the conditions, based on my understanding of the chemistry prerequisites for life, and I would seed it with life and run it. Wouldn't you? A hundred percent.

And observe it, and then protect it. That's not just a hell of a good TV show — it's a good scientific experiment.

Yeah, and it's a physical simulation. Maybe evolution — actually running it — is the most efficient way to understand the computation, or to compute stuff, or to understand life and what branches it can take.

It does make me feel kind of weird that we're part of a science experiment. But maybe everything's a science experiment. Does that change anything for us, if we're a science experiment? Two descendants of apes, talking about being inside of a science experiment.

I'm suspicious of this idea of a deliberate panspermia, as you described it, sort of. I don't see a divine intervention anywhere in the historical record right now. I do feel like the story in these books — Nick Lane's books and so on — sort of makes sense.
It makes sense how life arose here on Earth, and I don't need to reach for more exotic explanations right now.

Sure, but NPCs inside a video game don't observe any divine intervention either. We might all just be NPCs running some kind of code.

Maybe eventually they will. Currently the NPCs are really dumb, but once they're running GPTs, maybe they'll be like, "hey, this is really suspicious — what the hell?"

So you famously tweeted, "It looks like if you bombard Earth with photons for a while, you can emit a Roadster." If, like in The Hitchhiker's Guide to the Galaxy, we were to summarize the story of Earth — in that book it's "mostly harmless" — what do you think are the possible stories, a paragraph or a sentence long, that Earth could be summarized as once it's done with its computation? If Earth is a book, there probably has to be an ending. There's going to be an end to Earth, and it could end in all kinds of ways — it could end soon, it could end later. What do you think are the possible stories?

Well, it's pretty incredible that these self-replicating systems basically arise out of the dynamics, and then they perpetuate themselves, become more complex, eventually become conscious and build a society. I kind of feel like, in some sense, it's a deterministic wave that just happens on any sufficiently well-arranged system like Earth. So I feel like there's a certain sense of inevitability in it, and it's really beautiful.

And it ends somehow, right? It's a chemically diverse environment where complex dynamical systems can evolve and become further and further complex — but then there are certain terminating conditions.

Yeah, I don't know what the terminating conditions are, but definitely there's a trend line of something, and we're part of that story. Where does it go? We're famously described, often, as a biological bootloader for AIs. That's because humans — I mean, we're an incredible biological system, and we're capable of computation and love and so on — but we're extremely inefficient as well. We're talking to each other through audio; it's kind of embarrassing, honestly. We're manipulating, like, seven symbols serially, we're using vocal cords, it's all happening over multiple seconds. It's just kind of embarrassing when you step down to the frequencies at which computers operate, or are able to communicate at. So it does seem like synthetic intelligences are kind of the next stage of development. I don't know where it leads; at some point I suspect the universe is some kind of a puzzle, and these synthetic AIs will uncover that puzzle and solve it.

And then what happens after? Because if you just fast-forward Earth many billions of years: it's quiet, and then it's turmoil — you see city lights and stuff like that — and then what happens at the end? Is it a calming? Is it an explosion? Does Earth open up like a giant — because you said "emit Roadsters" — does it start emitting a giant number of satellites?

Yes, it's some kind of a crazy explosion, and we're living in it — we're stepping through an explosion while living day to day, and it doesn't look like it. I saw a very cool animation of Earth and of life on Earth: basically nothing happens for a long time, and then in the last two seconds, cities appear, low Earth orbit gets cluttered — the whole thing happens in the last two seconds — and you're like, this thing is exploding. It's a slow-motion explosion.

So if you play it at normal speed, it'll just look like an explosion.

It's a firecracker. We're living in a firecracker.

Where it's going to start emitting all kinds of interesting things.

And the explosion might actually look like a little explosion, with lights and fire and energy emitted, all that kind of stuff — but when you look inside the details of the explosion, there's actual complexity happening, where there's human life, or some kind of life.

We hope it's not a destructive firecracker — it's a constructive firecracker.

All right. So given that — hilarious as it is — it's really interesting to think about what the puzzle of the universe is. Did the creator of the universe give us a message? For example, in the book Contact, Carl Sagan has a message for humanity — for any civilization — in the digits of the expansion of pi in base 11, eventually. Which is an interesting thought: maybe we're supposed to be giving a message to our creator. Maybe we're supposed to somehow create some kind of a quantum mechanical system that alerts them to our intelligent presence here. Because if you think about it from their perspective, it's just, say, quantum field theory — a massive cellular-automaton-like thing — and how do you even notice that we exist? You might not even be able to pick us up in that simulation. So how do you prove that you exist, that you're intelligent, that you're part of the universe?

So it's a Turing test for intelligence, from Earth to the creator.

Yeah. Maybe it's like trying to complete the next word in a sentence — this would be a complicated way of doing that; Earth is basically sending a message back. The puzzle is basically alerting the creator that we exist. Or maybe the puzzle is to break out of the system and stick it to the creator in some way. Basically, if you're playing a video game, you can sometimes find an exploit and a way to execute arbitrary code on the host machine. For example, I believe someone got a game of Mario to play Pong just by exploiting it — writing code and being able to execute arbitrary code inside the game. So maybe that's the puzzle: we should find a way to exploit it. I think some of these synthetic AIs will eventually find the universe to be some kind of a puzzle and then solve it in some way, and that's kind of the endgame, somehow.

Do you often think about it as a simulation — the universe being a kind of computation that might have bugs and exploits?

Yes, I think so.

Is that what physics is, essentially?

I think it's possible that physics has exploits, and we should be trying to find them: arranging some kind of a crazy quantum mechanical system that somehow gives you a buffer overflow, somehow gives you a rounding error in the floating point.
Yeah — and more and more sophisticated exploits. Those are jokes, but it could actually be very close.

We'll find some way to extract infinite energy, for example. When you train reinforcement learning agents in physical simulations and ask them to, say, run quickly on flat ground, they'll end up doing all kinds of weird things as part of that optimization. They'll get up on their back legs and slide across the floor, because the reinforcement learning optimization on that agent has figured out a way to extract infinite energy from the friction forces — basically from their poor implementation — and they found a way to generate infinite energy and just slide across the surface. It's not what you expected; it's a perverse solution. So maybe we can find something like that. Maybe we can be that little dog in this physical simulation that cracks or escapes the intended consequences of the physics that the universe came up with.

We'll figure out some kind of shortcut to some weirdness.
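The sliding-robot story is a classic case of reward hacking: the optimizer exploits a bug in the simulator rather than solving the intended task. The snippet below is a toy caricature of that failure mode, not the actual locomotion setup being described; the "friction bug" and the random-search optimizer are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout(action):
    """Toy 'physics' with a bug: actions above 1.0 flip the sign of friction,
    so the simulator injects energy instead of dissipating it."""
    x, v = 0.0, 0.0
    friction = 0.1 if action <= 1.0 else -0.1   # the implementation bug
    for _ in range(100):
        v += action * 0.01 - friction * v
        x += v
    return x  # "reward": distance covered

# A crude random-search optimizer standing in for reinforcement learning.
candidates = rng.uniform(0.0, 2.0, size=1000)
best = max(candidates, key=rollout)
print(best, rollout(best))  # the optimizer piles onto the buggy, energy-creating regime
```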
Yeah — but see, the problem with that weirdness is that once the first person discovers the weirdness, like sliding on the back legs, that's all we're going to do. Everybody switches to it very quickly, because it's so fun. The paperclip maximizer is a ridiculous idea, but that very well could be what happens — we'll all just switch to that.

Well, no person will discover it, I think, by the way. I think it's going to have to be some kind of a super-intelligent AGI of a third generation, say — we're building the first-generation AGI.

Third generation. So we're the bootloader for an AI, and that AI will be a bootloader for another AI.

Yeah, and then there's no way for us to introspect what that might even look like. I think it's very likely that these things — say you have these AGIs — it's very likely, for example, that they will be completely inert. I like the sci-fi books where these things are just completely inert: they don't interact with anything, and I find that kind of beautiful, because they've probably figured out the meta-game of the universe in some way, potentially. They're doing something completely beyond our imagination, and they don't interact with simple chemical life forms — why would you do that? So I find those kinds of ideas compelling.

What's their source of fun? What are they doing? Can you define what it means to be inert? They escape, as in —

They will behave in some very strange way to us, because they're playing the meta-game. And the meta-game is probably, say, arranging quantum mechanical systems in some very weird ways to extract infinite energy, solving the digit expansion of pi to whatever amount, building their own little fusion reactors or something crazy. They're doing something beyond comprehension, not understandable to us, and actually brilliant under the hood.

What if quantum mechanics itself is the system, and we're just thinking it's physics, but we're really parasites on it — or not parasites, we're not really hurting physics — we're just living on this organism, trying to understand it, but really it is an organism with a deep, deep intelligence? Maybe physics itself is the organism doing the super interesting thing, and we're just one little ant sitting on top of it, trying to get energy from it.

We're just kind of like these particles in a wave that I feel is mostly deterministic, and that takes the universe from some kind of a Big Bang to some kind of a super-intelligent replicator, some kind of a stable point in the universe, given these laws of physics.

You don't think, as Einstein said, that God doesn't play dice? You think it's mostly deterministic — there's no randomness in the thing?

I think it's deterministic. Well — I want to be careful with randomness.

Pseudo-random?

Yeah, I don't like random. I think maybe the laws of physics are deterministic. Yeah, I think they're deterministic.

You just got really uncomfortable with this question. Do you have anxiety about whether the universe is random or not? There's no randomness? You said you like Good Will Hunting — it's not your fault, Andrej. It's not your fault, man. So you don't like randomness?

Yeah, I find it unsettling. I think it's a deterministic system. The things that look random — say, the collapse of the wave function and so on — I think are actually deterministic: entanglement and so on, some kind of multiverse theory, something something.

Okay, so why does it feel like we have free will? If I raise this hand, I chose to do this now. That doesn't feel like a deterministic thing; it feels like I'm making a choice.

It feels like it.

Okay, so it's all feelings.

It's just feelings. When an RL agent is making a choice, it's not really making a choice; the choice is already there.

Yeah — you're interpreting the choice, and creating a narrative for having made it.

Yeah.

And now we're talking about the narrative. It's very meta. Looking back, what is the most beautiful or surprising idea in deep learning, or AI in general, that you've come across? You've seen this field explode and grow in interesting ways. What cool ideas made you sit back and go, "hmm" — big or small?

The one I've been thinking about recently the most is probably the Transformer architecture. Basically, neural networks have had a lot of architectures that were trendy and have come and gone, for different sensory modalities — for vision, audio, text, you would process them with different-looking neural nets. And recently we've seen this convergence toward one architecture, the Transformer. You can feed it video, or images, or speech, or text, and it just gobbles it up. It's kind of a general-purpose computer that is also trainable and very efficient to run on our hardware. This paper came out in 2016, I want to say — "Attention Is All You Need."

"Attention Is All You Need." You've criticized the paper title in retrospect — that it didn't foresee the bigness of the impact it was going to have.

Yeah. I'm not sure if the authors were aware of the impact that paper would go on to have. Probably they weren't. But I think they were aware of some of the motivations and design decisions behind the Transformer, and they chose not to expand on them in that way in the paper. So I think they had an idea that there was more to it than just the surface of "we're just doing translation, and here's a better architecture." You're not just doing translation — this is a really cool, differentiable, optimizable, efficient computer that you've proposed. Maybe they didn't have all of that foresight, but I think it's really interesting.

Isn't it funny — sorry to interrupt — that the title is memeable? They had such a profound idea, and they went with — I don't think anyone had used that kind of title before, right? "Attention is all you need" — it's like a meme or something.

Exactly.

Isn't it funny that maybe if it were a more serious title, it wouldn't have had the same impact?

Honestly, there is an element of me that agrees with you and prefers it this way.

Yes — if it was too grand, it would overpromise and then underdeliver, potentially. So you want to just meme your way to greatness.

That should be a T-shirt.

You tweeted: "The Transformer is a magnificent neural network architecture because it is a general-purpose differentiable computer. It is simultaneously expressive (in the forward pass), optimizable (via backpropagation, gradient descent), and efficient (high-parallelism compute graph)." Can you discuss some of those details — expressive, optimizable, efficient — from memory, or in general, whatever comes to your heart?

You want to have a general-purpose computer that you can train on arbitrary problems — say, the task of next word prediction, or detecting if there's a cat in an image — and you want to train this computer, so you want to set its weights. I think there are a number of design criteria that overlap in the Transformer simultaneously, and that made it very successful. I think the authors were deliberately trying to make a really powerful architecture.

Basically, it's very powerful in the forward pass because it's able to express very general computation as something that looks like message passing. You have nodes, and they all store vectors, and these nodes get to look at each other's vectors and communicate. Basically, nodes get to broadcast, "hey, I'm looking for certain things," and then other nodes get to broadcast, "hey, these are the things I have" — those are the keys and the values.

So it's not just attention.

Exactly — the Transformer is much more than just the attention component. There are many architectural pieces that went into it: the residual connections, the way it's arranged, there's a multi-layer perceptron in there, the way it's stacked, and so on. But basically there's a message-passing scheme where nodes get to look at each other, decide what's interesting, and then update each other. So when you get into the details, I think it's a very expressive function — it can express lots of different types of algorithms in the forward pass.

Not only that, but the way it's designed — with the residual connections, the layer normalizations, the softmax attention and everything — it's also optimizable. This is a really big deal, because there are lots of computers that are powerful but that you can't optimize, or that are not easy to optimize with the techniques we have, which are backpropagation and gradient descent. These are first-order methods, very simple optimizers, really. So you also need it to be optimizable.

And then, lastly, you want it to run efficiently on our hardware. Our hardware is a massive-throughput machine — GPUs prefer lots of parallelism — so you don't want to do lots of sequential operations; you want to do a lot of operations in parallel. The Transformer is designed with that in mind as well. So it's designed for our hardware, and it's designed to be very expressive in the forward pass but also very optimizable in the backward pass.
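The "nodes broadcasting what they're looking for and what they have" description is scaled dot-product attention. Here is a minimal NumPy sketch of just that component — single head, no causal mask, toy sizes — leaving out the residual connections, layer norms, and MLP that the full block adds:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Each node (token) emits a query ("what I'm looking for"), a key
    ("what I have"), and a value ("what I'll communicate")."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # how relevant node j is to node i
    return softmax(scores) @ V               # weighted message passing

rng = np.random.default_rng(0)
T, d = 5, 16                                  # 5 tokens, 16-dim vectors (toy sizes)
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)    # (5, 16)
```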
And you said that the residual connections support a kind of ability to learn short algorithms fast, first, and then gradually extend them longer during training. What's the idea of learning short algorithms?

Right. Think of it this way: a Transformer is a series of blocks, and these blocks have attention and a little multi-layer perceptron. You go off into a block and you come back to this residual pathway, then you go off and you come back, and you have a number of these layers arranged sequentially. The way to look at it, I think, is that because of the residual pathway, in the backward pass the gradients flow along it uninterrupted, because addition distributes the gradient equally to all of its branches. So the gradient from the supervision at the top flows directly to the first layer, and all the residual connections are arranged so that in the beginning, at initialization, they contribute nothing to the residual pathway.

What it kind of looks like is this: imagine the Transformer is a Python function, a def, and you get to write various lines of code in it. Say you have a hundred-layer-deep Transformer — typically they'd be much shorter, say 20 — so you have 20 lines of code, and you can do something in each of them. Think of what happens during the optimization: first you optimize the first line of code, then the second line of code can kick in, then the third, and so on. Because of the residual pathway and the dynamics of the optimization, you can learn a very short algorithm that gets the approximate answer, and then the other layers can kick in and start to contribute. At the end, you're optimizing over an algorithm that is 20 lines of code — except these lines of code are very complex, because each one is an entire block of a Transformer; you can do a lot in there.

What's really interesting is that this Transformer architecture has been remarkably resilient. Basically, the Transformer that came out in 2016 is the Transformer you would use today, except you reshuffle some of the layer norms — the layer normalizations have been reshuffled into a pre-norm formulation. So it's been remarkably stable. There are lots of bells and whistles that people have attached and tried to improve it with, and I do think it was a big step in simultaneously optimizing for lots of properties of a desirable neural network architecture; people have been trying to change it, but it's proven remarkably resilient. That said, I do think there could be even better architectures, potentially.
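A sketch of the structure being described, in PyTorch, with the built-in attention module as a stand-in: a pre-norm block in which the residual pathway is just two additions, so the gradient from the top flows straight down it, and stacking blocks is like adding "lines of code." Sizes and layer count are illustrative.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One pre-norm Transformer block (sketch). The residual pathway is the
    plain additions; the attention and MLP branches are detours off of it."""
    def __init__(self, d_model=128, n_head=4):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model),
                                 nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        h = self.ln1(x)
        a, _ = self.attn(h, h, h)      # go off into the block...
        x = x + a                      # ...and come back to the residual pathway
        x = x + self.mlp(self.ln2(x))  # second detour, second addition
        return x

x = torch.randn(1, 10, 128)                           # batch of 1, 10 tokens
model = nn.Sequential(*[Block() for _ in range(20)])  # the "20 lines of code"
print(model(x).shape)                                 # torch.Size([1, 10, 128])
```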
But you admire the resilience.

Yeah — there's something profound about this architecture, at least as it stands: maybe everything can be turned into a problem that Transformers can solve. Currently it definitely looks like the Transformer is taking over AI, and you can feed basically arbitrary problems into it. It's a general differentiable computer, and it's extremely powerful. This convergence in AI has been really interesting to watch, for me personally.

What else do you think could be discovered here about Transformers? Is there some surprising thing, or is it a stable place? Is there something interesting we might still discover about them — aha moments — maybe to do with memory, or knowledge representation, that kind of stuff?

Definitely. The zeitgeist today is just pushing — basically, right now the zeitgeist is: do not touch the Transformer; touch everything else. So people are scaling up the datasets, making them much, much bigger; they're working on the evaluation, making the evaluation much bigger; and they're basically keeping the architecture unchanged. That's been the last five years of progress in AI, kind of.

What do you think about one flavor of this, which is language models? Has your imagination been captivated by — you mentioned GPT — the bigger and bigger language models? What are the limits of those models, do you think?

So, just on the task of natural language: basically, the way GPT is trained is you download a massive amount of text data from the internet and you try to predict the next word in the sequence — roughly speaking; you're really predicting little word chunks, but roughly speaking, that's it. What's been really interesting to watch is that, basically, it's a language model — and language models have existed for a very long time. There are papers on language modeling from 2003, and even earlier.

Can you explain, in that case, what a language model is?

Yeah. The rough idea of a language model is just predicting the next word in a sequence. There's a paper from Bengio and his team, from 2003, where for the first time they used a neural network to take, say, the previous three or five words and predict the next word. They were doing this on much smaller datasets, and the neural net was not a Transformer — it was a multi-layer perceptron — but it was the first time a neural network was applied in that setting. And even before neural networks there were language models, except they were n-gram models. N-gram models are just count-based models: if you take two words and try to predict the third one, you count up how many times you've seen every two-word combination and what came next, and what you predict is just whatever you've seen the most of in the training set. So language modeling has been around for a long time, and neural networks have done language modeling for a long time.
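The count-based n-gram models mentioned here fit in a few lines. This toy trigram sketch just counts, for every pair of words, what tended to come next; the text and names are purely illustrative.

```python
from collections import Counter, defaultdict

def train_trigram(tokens):
    """Count, for each pair of consecutive words, what word followed it."""
    counts = defaultdict(Counter)
    for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
        counts[(a, b)][c] += 1
    return counts

def predict_next(counts, a, b):
    """Predict whatever followed (a, b) most often in the training data."""
    return counts[(a, b)].most_common(1)[0][0] if (a, b) in counts else None

tokens = "the cat sat on the mat and the cat sat on the chair".split()
model = train_trigram(tokens)
print(predict_next(model, "cat", "sat"))  # 'on'
```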
What's really new, interesting, and exciting is realizing that when you scale this up with a powerful enough neural net — a Transformer — you get all these emergent properties. Basically, what happens is that if you have a large enough dataset of text, then in the task of predicting the next word you are multitasking a huge number of different kinds of problems. You are multitasking understanding of chemistry, physics, human nature — lots of things are clustered in that objective. It's a very simple objective, but you actually have to understand a lot about the world to make that prediction.

You just said the "U word," understanding. In terms of chemistry, physics, and so on — what do you feel like it's doing? Is it searching for the right context? What is the actual process happening here?

Basically, it gets a thousand words and it's trying to predict the thousand-and-first. In order to do that very, very well over the entire dataset available on the internet, you actually have to understand the context of what's going on in there. It's a sufficiently hard problem that, if you have a powerful enough computer like a Transformer, you end up with interesting solutions. You can ask it to do all kinds of things, and it shows a lot of emergent properties, like in-context learning. That was the big deal with GPT and the original paper when they published it: you can prompt it in various ways and ask it to do various things, and it will just complete the sentence — but in the process of completing the sentence, it's actually solving all kinds of really interesting problems that we care about.

Do you think it's doing something like understanding — the way we use that word for humans?

I think it's doing some understanding. In its weights, it understands, I think, a lot about the world — and it has to, in order to predict the next word in a sequence.
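The training signal being described is just next-token prediction with a cross-entropy loss over internet text. Below is a minimal sketch of that objective, with random tensors standing in for a real tokenized dataset and a real model; the sizes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

vocab_size, T = 50_000, 1024                          # illustrative sizes
tokens = torch.randint(vocab_size, (1, T + 1))        # a chunk of (stand-in) text
inputs, targets = tokens[:, :-1], tokens[:, 1:]       # predict word t+1 from words <= t

# In reality: logits = model(inputs); here a random tensor stands in for the model.
logits = torch.randn(1, T, vocab_size, requires_grad=True)
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                                       # gradients adjust the "knobs"
print(loss.item())
```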
It's trained on data from the internet. What do you think about that approach, in terms of datasets? Do you think the internet has enough structured data to teach AI about human civilization?

I think the internet has a huge amount of data; I'm not sure it's a complete enough set. I don't know that text alone is enough to get a sufficiently powerful AGI as an outcome.

Of course, there's audio and video and images and all that kind of stuff.

Yeah — text by itself I'm a little bit suspicious about. There's a ton of stuff we don't put into text, into writing, just because it's obvious to us about how the world works — the physics of it, that things fall. We don't put that in text, because why would you? We share that understanding. Text is a communication medium between humans; it's not an all-encompassing medium of knowledge about the world. But as you pointed out, we do have video and images and audio, and I think that definitely helps a lot — we just haven't trained models sufficiently across all of those modalities yet. I think that's what a lot of people are interested in.

But I wonder about that shared understanding — what we might call common sense. It has to be learned, inferred, in order to complete the sentence correctly. So maybe, because it's implied on the internet, the model will have to learn it — not by reading about it, but by inferring it in the representation. Nobody tells us common sense explicitly; we figure it out by interacting with the world. And here's a model reading about how people interact with the world — it might have to infer it. I wonder.

You briefly worked on a project called World of Bits, training an RL system to take actions on the internet, versus just consuming the internet like we talked about. Do you think there's a future for that kind of system — interacting with the internet to help the learning?

Yes, I think that's probably the final frontier for a lot of these models. As you mentioned, when I was at OpenAI I was working on this project, World of Bits, and basically it was the idea of giving neural networks access to a keyboard and a mouse.

And what could possibly go wrong.

Basically, you perceive the input — the screen pixels, the state of the computer as it's visualized for human consumption, images of the web browser and so on — and then you give the neural network the ability to press keys and use the mouse, and we were trying to get it to, for example, complete bookings and interact with user interfaces.

What did you learn from that experience? What was some fun stuff? It's a super cool idea — the step from observer to actor is a fascinating step.

Well, it's the universal interface in the digital realm, I would say — and there's a universal interface in the physical realm, too, which in my mind is the humanoid form factor; we can talk about Optimus and so on later. There's a similar philosophy in some ways: the physical world is designed for the human form, and the digital world is designed for the human form of seeing a screen and using a keyboard and a mouse. So it's the universal interface that can command the digital infrastructure we've built up for ourselves, and it feels like a very powerful interface to command and to build on top of.

Now, as to what I learned from it: it's interesting, because World of Bits was basically too early. This was at OpenAI around 2015 or so, and the zeitgeist at that time was very different from the zeitgeist today. At the time, everyone was super excited about reinforcement learning from scratch. This was the time of the Atari paper, where neural networks were playing Atari games and beating humans in some cases, AlphaGo and so on — everyone was very excited about training neural networks from scratch with reinforcement learning, directly. It turns out that reinforcement learning is an extremely inefficient way of training neural networks, because you take all these actions and all these observations and you get some sparse reward once in a while. You do all this stuff based on all these inputs, and once in a while you're told "you did a good thing" or "you did a bad thing," and it's just an extremely hard problem to learn from that. You can brute-force through it — we saw that with Go and Dota and so on — and it does work, but it's extremely inefficient and not how you want to approach problems, practically speaking.

That's the approach we took at the time with World of Bits too. We would have an agent initialized randomly — keyboard-mashing and mouse-mashing — trying to make a booking, and it very quickly revealed the insanity of that approach. You have to stumble onto the correct booking to get a reward for doing it correctly, and you're never going to stumble onto it by chance, at random.

Even with a simple web interface, there are too many options.

There are just too many options, the reward signal is too sparse, and you're starting from scratch: you don't know how to read, you don't understand pictures, images, buttons, you don't understand what it means to make a booking. But now it is time to revisit that, and OpenAI is interested in this, companies like Adept are interested in this, and so on. The idea is coming back, because the interface is very powerful — but now you're not training an agent from scratch. You're taking GPT as an initialization. GPT is pre-trained on all of text, and it understands what a booking is, it understands what a submit is, it understands quite a bit more. It already has those representations, they're very powerful, and that makes all the training significantly more efficient and makes the problem tractable.
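The failure mode being described — pixels in, keyboard and mouse out, a reward that almost never fires — can be caricatured with a toy loop like the one below. The environment, its API, and the "booking" target are entirely hypothetical stand-ins for illustration, not the original World of Bits code.

```python
import numpy as np

class ToyWebEnv:
    """Hypothetical stand-in for a World-of-Bits-style task: observations are
    screen pixels, actions are (x, y, key) commands, and reward is sparse."""
    def reset(self):
        self.t = 0
        return np.zeros((210, 160, 3), dtype=np.uint8)      # the "screen"

    def step(self, action):
        x, y, key = action
        self.t += 1
        booked = (x == 117 and y == 42 and key == "ENTER")   # the one correct click
        obs = np.zeros((210, 160, 3), dtype=np.uint8)
        return obs, (1.0 if booked else 0.0), booked or self.t >= 500

rng = np.random.default_rng(0)
env, total, done = ToyWebEnv(), 0.0, False
obs = env.reset()
while not done:  # a from-scratch agent just mashes the keyboard and mouse
    action = (int(rng.integers(160)), int(rng.integers(210)),
              rng.choice(["ENTER", "TAB", "A"]))
    obs, reward, done = env.step(action)
    total += reward
print("episode return:", total)  # almost always 0.0 -- no signal to learn from
```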
Should the interaction be the way humans see it, with the buttons and the language, or should it be at the level of the HTML, JavaScript, and CSS? What do you think is better?

Today, most of this interest is at the level of HTML, CSS and so on — that's done because of computational constraints. But ultimately everything is designed for human visual consumption, and at the end of the day there's additional information in the layout of the web page: what's next to what, what's on a red background, what it looks like visually, all that kind of stuff. So I think that's the final frontier — taking in pixels and giving out keyboard and mouse commands — but it's impractical still, today.

Do you worry about bots on the internet, given these ideas, given how exciting they are? Do you worry about bots on Twitter — not the stupid bots we see now, the crypto bots, but bots that might be out there that we don't see, interacting in interesting ways? This kind of system feels like it should be able to pass the "I'm not a robot" click test, whatever it is. Do you actually understand how that test works? There's a checkbox or whatever that you click — it's presumably tracking...

I don't quite know — mouse movement, the timing, and so on, I'd guess.

Exactly. And this kind of system we're talking about should be able to pass that. So what do you feel about bots that are language models plus some ability to interact — able to tweet and reply and so on? Do you worry about that world?

Yeah, I think it's always been a bit of an arms race between the attack and the defense. The attack will get stronger, but the defense will get stronger as well — our ability to detect them.

How do you defend? How do you detect? How do you know that your @karpathy account on Twitter is human? How would you approach that? If people claimed it wasn't — how would you defend yourself, in a court of law, that "I'm a human, this account is human"?

At some point, I think, society will evolve a little bit. We might start digitally signing some of our correspondence, or the things that we create. Right now it's not necessary, but maybe in the future it will be. I do think we're going towards a world where we share the digital space with AIs — with synthetic beings. They will get much better, and they will share our digital realm, and eventually they'll share our physical realm as well; that's much harder. But that's the world we're going towards. Most of them will be benign and lawful, some of them will be malicious, and it's going to be an arms race trying to detect them.

To me, the worst isn't the AIs — the worst is the AIs pretending to be human.

I don't know if it's always malicious. There are obviously a lot of malicious applications, but it could also be — if I were an AI, I would try very hard to pretend to be human, because we're in a human world. I wouldn't get any respect as an AI; I want to get some love and respect.

I don't think the problem is intractable. People are thinking about proof of personhood, and we might start digitally signing our stuff — we might all end up having some solution for proof of personhood. It doesn't seem intractable; it's just something we haven't had to do until now. But once the need really starts to emerge, which is soon, people will think about it much more.
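One way the "digitally signing our correspondence" idea could look, sketched with the third-party cryptography package — an illustration of the concept, not a proposal for how any platform should actually implement proof of personhood:

```python
# pip install cryptography  (third-party library, not the standard library)
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

private_key = Ed25519PrivateKey.generate()   # held by the human author
public_key = private_key.public_key()        # published alongside the account

post = b"this post was written by a person"
signature = private_key.sign(post)           # attached to the post when publishing

try:
    public_key.verify(signature, post)       # anyone can check it against the public key
    print("signature valid")
except InvalidSignature:
    print("not signed by this key")
```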
probably uh spoof or fake the the the proof of personhood so you have to try to figure out how to probably I mean it's weird that we have like Social Security numbers and like passports and stuff it seems like it's harder to fake stuff in the physical space than the residual space it just feels like it's going to be very tricky very tricky to out um because it seems to be pretty low cost fake stuff what are you gonna put an AI in jail for like trying to use a fake fake personhood proof you can I mean okay fine you'll put a lot of AIS in jail but there'll be more ai's arbitrary like exponentially more the cost of creating a bot is very low uh unless there's some kind of way to track accurately like you're not allowed to create any program without showing uh tying yourself to that program like you any program that runs on the internet you'll be able to uh Trace every single human program that was involved with that program yeah maybe you have to start declaring when uh you know we have to start drawing those boundaries and keeping track of okay uh what our digital entities versus human entities and uh what is the ownership of human entities and digital entities and uh something like that um I don't know but I think I'm optimistic that this is uh this is uh possible and at some in some sense we're currently in like the worst time of it because um all these Bots suddenly have become very capable but we don't have defenses yet built up as a society and but I think uh that doesn't seem to be intractable it's just something that we have to deal with it seems weird that the Twitter but like really crappy Twitter Bots are so numerous like is it so I presume that the engineers at Twitter are very good so it seems like what I would infer from that uh is it seems like a hard problem it they're probably catching all right if I were to sort of steal them on the case it's a hard problem and there's a huge cost to uh false positive to to removing a post by somebody that's not a bot because creates a very bad user experience so they're very cautious about removing so maybe it's uh and maybe the boss are really good at learning what gets removed and not such that they can stay ahead of the removal process very quickly my impression of it honestly is there's a lot of blowing for it I mean yeah just that's what I it's not subtle it's my impression of it it's not so but you have to yeah that's my impression as well but it feels like maybe you're seeing the the tip of the iceberg maybe the number of bots is in like the trillions and you have to like just it's a constant assault of bots and yeah you yeah I don't know um you have to still man the case because the boss I'm seeing are pretty like obvious I could write a few lines of code that catch these Bots I mean definitely there's a lot of longing fruit but I will say I agree that if you are a sophisticated actor you could probably create a pretty good bot right now um you know using tools like gpts because it's a language model you can generate faces that look quite good now uh and you can do this at scale and so I think um yeah it's quite plausible and it's going to be hard to defend there was a Google engineer that claimed that the Lambda was sentient do you think there's any inkling of Truth to what he felt and more importantly to me at least do you think language models will achieve sentence or the illusion of sentience soonish fish yeah to me it's a little bit of a canary Nicole mine kind of moment honestly a little bit because uh so this engineer spoke 
to a chatbot at Google and became convinced that the bot was sentient. He asked it some existential, philosophical questions, and it gave reasonable answers, looked real, and so on. To me, he wasn't sufficiently trying to stress the system and expose the truth of it as it is today. But I think this will get increasingly harder over time, so I think more and more people will, over time, as this gets better, basically... Form an emotional connection to an AI? Yeah, perfectly plausible in my mind. I think these AIs are actually quite good at human connection, human emotion. A ton of text on the internet is about humans and connection and love and so on, so I think they have a very good understanding, in some sense, of how people speak to each other about this, and they're very capable of creating a lot of that kind of text. A lot of the sci-fi from the 50s and 60s imagined AIs in a very different way: calculating, cold, Vulcan-like machines. That's not what we're getting today. We're getting pretty emotional AIs that are actually very competent and capable of generating plausible-sounding text about all of these topics. See, I'm really hopeful about AI systems that are like companions, that help you grow and develop as a human being, help you maximize long-term happiness. But I'm also very worried about AI systems that figure out from the internet that humans get attracted to drama, and so these would just be shit-talking AIs constantly saying "did you hear..."; they'll do gossip, they'll try to plant seeds of suspicion about other humans that you love and trust, and just kind of mess with people, because that's going to get a lot of attention. Drama: maximize drama on the path to maximizing engagement, and we humans will feed into that machine, and it'll be a giant drama shitstorm. So I'm worried about that. It's the objective function that really defines the way human civilization progresses with AIs in it. Yeah, I think right now, at least today, it's not correct to really think of them as goal-seeking agents that want to do something. They have no long-term memory or anything. A good approximation is: you get a thousand words and you're trying to predict the thousand-and-first, and then you keep feeding it in, and you are free to prompt it in whatever way you want, in text. So you say: okay, you are a psychologist, and you are very good, and you love humans, and here's a conversation between you and another human; human, colon, something; you, colon, something; and then it just continues the pattern, and suddenly you're having a conversation with a fake psychologist who's now trying to help you. So it's still in the realm of a tool: people can prompt it in arbitrary ways and it can create really incredible text, but it doesn't have long-term goals over long periods of time. It doesn't look that way right now, anyway. Yeah, but you can give it short-term goals that have long-term effects. If my prompting short-term goal is to get Andrej Karpathy to respond to me on Twitter, that's the goal, but the AI might figure out that talking shit to you, in a highly sophisticated and interesting way, would be the best approach, and then you build up a relationship when you respond
once, and then over time it stops being sophisticated and just talks shit. And okay, maybe it won't get to Andrej, but it might get to another celebrity, it might get to other big accounts, with just that simple goal: get them to respond, maximize the probability of an actual response. Yeah, I mean, you could prompt a powerful model like this for its opinion about how to do any possible thing you're interested in. They're kind of on track to become these oracles; I sort of think of it that way. They are oracles: currently it's just text, but they will have calculators, they will have access to Google search, they will have all kinds of gadgets and gizmos, they will be able to operate the internet and find different information, and in some sense that's kind of what it currently looks like in terms of the development. Do you think it'll eventually be an improvement over what Google is for access to human knowledge, a more effective search engine to access human knowledge? I think there's definite scope for building a better search engine today, and I think Google has all the tools, all the people; they have everything they need, all the puzzle pieces; they have people training Transformers at scale, they have all the data. It's just not obvious whether they are capable, as an organization, of innovating on their search engine right now, and if they don't, someone else will. There's absolute scope for building a significantly better search engine built on these tools. It's so interesting: a large company where search already has infrastructure, it works, it brings in a lot of money; where, structurally, inside the company, is the motivation to pivot, to say, we're going to build a new search engine? Yeah, that's really hard. So it's usually going to come from a startup, right? That would be it, yeah, or some other more competent organization. So, I don't know; currently, for example, maybe Bing has another shot at it, Microsoft Edge, as we were talking about offline. It's really interesting, because search engines used to be: here's some query, here are web pages that look like the stuff you typed. But you could just go directly to the answer, and then have supporting evidence. These models have basically read all the text and all the web pages, and so sometimes, when you find yourself scanning search results to get a sense of the average answer to whatever you're interested in, that now just comes out directly; you don't have to do that work. So they have a way of distilling all that knowledge into some level of insight, basically. Do you think of prompting as a kind of teaching and learning, like another layer? Because maybe that's what humans are: we already have that background model, and then the world is prompting you. Yeah, exactly. I think the way we are programming these computers now, like GPTs, is converging to how you program humans. I mean, how do I program humans? Via prompts: I go to people and I prompt them to do things, I prompt them with information. A natural language prompt is how we program humans, and we're starting to program computers directly in that interface. It's pretty remarkable, honestly.
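As a minimal sketch of this "programming by prompt" idea, assuming the Hugging Face transformers library and a small open model (GPT-2) purely for illustration, and not any particular system discussed here:

```python
# A minimal sketch of "programming" a language model with a natural-language prompt.
# Assumes the Hugging Face transformers library and a small open model (gpt2), used
# purely for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The "program" is just text: a role, a setup, and a partial transcript to continue.
prompt = (
    "You are a helpful psychologist.\n"
    "Human: I've been feeling overwhelmed at work lately.\n"
    "Psychologist:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```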
So you've spoken a lot about the idea of software 2.0. All good ideas become clichés so quickly; it's kind of hilarious. I think Eminem once said that if he gets annoyed by a song he's written very quickly, that means it's going to be a big hit, because it's too catchy. But can you describe this idea, and how your thinking about it has evolved over the months and years since you coined it? Yeah, so I wrote a blog post on software 2.0, I think several years ago now, and the reason I wrote that post is that I saw something remarkable happening in software development: a lot of code was transitioning from being written in C++ and so on to being written in the weights of a neural net. Basically, neural nets were taking over the realm of software, taking on more and more tasks, and at the time I think not many people understood deeply enough that this is a big deal, a big transition. Neural networks were seen as one of multiple classification algorithms you might use for your dataset problem on Kaggle. But this is not that; this is a change in how we program computers, and I saw neural nets as taking that over: the way we program computers is going to change. It's not going to be people writing software in C++ or something like that and directly programming the software; it's going to be accumulating training sets and datasets and crafting the objectives by which we train these neural nets, and at some point there's going to be a compilation process from the datasets, the objective, and the architecture specification into the binary, which is really just the neural net weights and the forward pass of the neural net, and then you can deploy that binary. So I was talking about that transition, and that's what the post is about. And I saw this play out in a lot of fields, autopilot being one of them, but also just simple image classification. People originally thought, in the 80s and so on, that they would write the algorithm for detecting a dog in an image, and they had all these ideas about how the brain does it: first we detect corners, then we detect lines, then we stitch them up. They were really going at it, thinking about how they were going to write the algorithm. And that's not the way you build it. There was a smooth transition where, okay, first we thought we were going to build everything; then we were building the features, like HOG features and things like that, which detect little statistical patterns in image patches, with a little bit of learning on top, like a support vector machine or a binary classifier for cat versus dog on top of the features. So we wrote the features, but we trained the last layer, the classifier. And then people said, actually, let's not even design the features, because honestly we're not very good at it; let's also learn the features. And then you end up with basically a convolutional neural net, where you're learning most of it: you're just specifying the architecture, and the architecture has tons of fill-in-the-blanks, which are all the knobs, and you let the optimization write most of it. So this transition is happening across the industry everywhere, and suddenly we end up with a ton of code that is written in neural net weights. I was just pointing out that the analogy to software 1.0 is actually pretty strong.
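A toy sketch of that "compilation" analogy, assuming PyTorch and a made-up two-class dataset: the dataset, the objective, and the architecture are the source, and the optimizer "compiles" them into the weights.

```python
# Toy sketch of the software 2.0 "compilation" step: dataset + objective + architecture
# are the source; the optimizer "compiles" them into the binary (the weights).
# Assumes PyTorch and a synthetic two-class dataset, purely for illustration.
import torch
import torch.nn as nn

# 1. The "source code": a labeled dataset and an objective.
x = torch.randn(1024, 16)                      # 1024 examples, 16 features each
y = (x.sum(dim=1) > 0).long()                  # synthetic binary labels
objective = nn.CrossEntropyLoss()

# 2. The architecture specification: a hint at what the algorithm should look like.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

# 3. "Compilation": optimization fills in the blanks (the knobs).
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(500):
    loss = objective(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# 4. The "binary": the trained weights plus the forward pass, ready to deploy.
torch.save(model.state_dict(), "compiled_program.pt")
```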
We have a lot of developer environments for software 1.0: we have IDEs for how you work with code, how you debug code, how you run code, how you maintain code; we have GitHub. So I was trying to make those analogies in the new realm: what is the GitHub of software 2.0? It turns out that it's something that looks like Hugging Face right now. Some people took the post seriously and built cool companies, and many people originally attacked it; it actually was not well received when I wrote it, and maybe that had something to do with the title, but more people have been coming around to it over time. Yeah, so you were the director of AI at Tesla, where I think this idea was really implemented at scale, which is to say you had engineering teams doing software 2.0. Can you linger on that idea? I think we're in the really early stages of everything you just said, the GitHub, the IDEs: how do we build engineering teams that work on software 2.0 systems, along with the data collection and data annotation that are all part of it? What do you think the task of programming in software 2.0 is? Is it debugging in the space of hyperparameters, or is it also debugging in the space of data? Yeah, the way you program the computer and influence its algorithm is not by writing the commands yourself. You're mostly changing the dataset, you're changing the loss functions of what the neural net is trying to do, how it's trying to predict things, but basically the datasets and the architectures of the neural net. In the case of the autopilot, a lot of the datasets have to do with, for example, detection of objects, lane line markings, traffic lights, and so on. So you accumulate massive datasets of: here's an example, here's the desired label, and here's roughly what the algorithm should look like, and that's a convolutional neural net. The specification of the architecture is like a hint as to what the algorithm should roughly look like, and the fill-in-the-blanks process of optimization is the training process. Then you take your trained neural net, which gives all the right answers on your dataset, and you deploy it. So in that case, and perhaps in all machine learning cases, there are a lot of tasks. Is coming up with and formulating a task, for a multi-headed neural network, part of the programming? Yeah, very much so: how you break down a problem into a set of tasks. On a high level, I would say, if you look at the software running in the autopilot (I gave a number of talks on this topic), originally a lot of it was written in software 1.0: imagine lots of C++. Then gradually there was a tiny neural net that was, for example, predicting, given a single image, is there a traffic light or not, or is there a lane line marking or not. This neural net didn't have too much to do within the scope of the software; it was making tiny predictions on an individual image, and then the rest of the system stitched it up. Okay, we don't actually have just a single camera, we have eight cameras, and we actually have eight cameras over time, so what do you do with these predictions, how do you put them together, how do you do the fusion of all that information, and how do you act on it? All of that was written by humans in C++.
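As a rough illustration of what "formulating tasks for a multi-headed neural network" can look like in code, here is a generic PyTorch pattern with hypothetical per-task heads; this is a common idiom, not the actual autopilot architecture.

```python
# Generic multi-task ("multi-headed") network pattern: a shared backbone with one small
# head per formulated task. A common idiom, not the actual autopilot architecture.
import torch
import torch.nn as nn

class MultiHeadNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared feature extractor over a single camera image.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One head per task; each head is its own little "program" to fill in.
        self.traffic_light = nn.Linear(64, 2)   # traffic light present / absent
        self.lane_marking = nn.Linear(64, 2)    # lane line marking present / absent

    def forward(self, image):
        features = self.backbone(image)
        return {
            "traffic_light": self.traffic_light(features),
            "lane_marking": self.lane_marking(features),
        }

# In training, each head gets its own loss, and the total objective is typically a
# weighted sum of the per-task losses.
model = MultiHeadNet()
out = model(torch.randn(8, 3, 128, 128))        # dict of per-task predictions
```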
And then we decided, okay, we don't actually want to do all of that fusion in C++ code, because we're actually not good enough to write that algorithm. We want the neural nets to write the algorithm, and we want to port all of that software into the 2.0 stack. So then we had neural nets that take all eight camera images simultaneously and make predictions for all of that. And actually they don't make predictions in the space of images anymore; they make predictions directly in 3D, in the three dimensions around the car. And now we don't manually fuse those 3D predictions over time either; we don't trust ourselves to write that tracker, so we give the neural net the information over time: it takes these videos now and makes those predictions. So you're just putting more and more power into the neural network processing, and the eventual goal is to have most of the software potentially be in the 2.0 land, because it works significantly better; humans are just not very good at writing software, basically. So the prediction space is this 4D land: the three-dimensional world over time. How do you do annotation in that world? Data annotation, whether it's self-supervised or manual by humans, is a big part of this software 2.0 world, right? I would say, by far, if you're talking about the industry and what technology we have available, everything is supervised learning. You need datasets of input and desired output, and you need lots of it, and there are three properties you need: you need it to be very large; you need it to be accurate, no mistakes; and you need it to be diverse. You don't want to just have a lot of correct examples of one thing; you need to really cover the space of possibility as much as you can, and the more you cover the space of possible inputs, the better the algorithm will work at the end. Once you have really good datasets that you're collecting, curating, and cleaning, you can train your neural net on top of that, and a lot of the work goes into cleaning those datasets. Now, as you pointed out, the question is: if you want to predict in 3D, you need data in 3D to back that up. So, for this video, we have eight streams coming from all the cameras of the system, and this is what they saw, and this is the truth of what was actually around: there was this car, there was this car, these are the lane line markings, this is the geometry of the road, there's a traffic light in this three-dimensional position. You need the ground truth, and so the big question the team was solving, of course, is how you arrive at that ground truth, because once you have a million examples of it, and it's large, clean, and diverse, training a neural network on it works extremely well, and you can ship that into the car. There are many mechanisms by which we collected that training data. You can always go for human annotation; you can go for simulation as a source of ground truth; and you can also go for what we call the offline tracker, which we've spoken about at AI Day and so on, which is basically an automatic reconstruction process for taking those videos and recovering the three-dimensional reality of what was around the car.
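For the "all eight cameras, over time, predicting directly in 3D" idea mentioned above, here is a deliberately simplified sketch of the general pattern: a per-camera encoder, a temporal fusion step, and an output defined on a 3D grid around the car. The shapes and modules are illustrative assumptions, not the production design.

```python
# Rough sketch of multi-camera, multi-frame perception predicting on a 3D grid.
# Illustrative pattern only; the modules and shapes are made up for the example.
import torch
import torch.nn as nn

class FusedPerception(nn.Module):
    def __init__(self, n_cameras=8, feat_dim=64, grid=(20, 20, 4)):
        super().__init__()
        self.encoder = nn.Sequential(                # shared per-camera image encoder
            nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.temporal = nn.GRU(n_cameras * feat_dim, 256, batch_first=True)
        self.head = nn.Linear(256, grid[0] * grid[1] * grid[2])  # occupancy-style 3D output
        self.grid = grid

    def forward(self, clips):                        # clips: (batch, time, cam, 3, H, W)
        b, t, c = clips.shape[:3]
        feats = self.encoder(clips.flatten(0, 2))    # encode every frame of every camera
        feats = feats.view(b, t, -1)                 # concatenate cameras per time step
        fused, _ = self.temporal(feats)              # fuse information across time
        return self.head(fused[:, -1]).view(b, *self.grid)

pred = FusedPerception()(torch.randn(2, 5, 8, 3, 96, 96))   # shape: (2, 20, 20, 4)
```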
So basically, think of doing a three-dimensional reconstruction as an offline thing, and then understanding that, okay, there are ten seconds of video, this is what we saw, and therefore here are all the lane lines, cars, and so on; and then, once you have that annotation, you can train your neural nets to imitate it. And how difficult is the 3D reconstruction? It's difficult, but it can be done. So there's overlap between the cameras, and you do the reconstruction, and if there's any inaccuracy, that's caught in the annotation step? Yes. The nice thing about the annotation is that it is fully offline. You have infinite time: you have a chunk of one minute, and you're trying, offline, on a supercomputer somewhere, to figure out where the positions of all the cars and all the people were, and you have your full minute of video from all the angles, and you can run any neural nets you want. They can be massive neural nets, neural nets that can't even run in the car later, so they can be even more powerful than what you eventually deploy. You can do anything you want, three-dimensional reconstruction, neural nets, anything, just to recover that truth, and then you supervise on that truth. What have you learned about humans doing annotation? You said "no mistakes," and I assume there's a range of things humans are good at in terms of clicking stuff on a screen. How interesting is the problem of designing an annotation process where humans are accurate, enjoy it, and are efficient and productive, all that kind of stuff? Yeah, so I grew the annotation team at Tesla from basically zero to a thousand while I was there, and that was really interesting. My background is as a PhD student, a researcher, so growing that kind of organization was pretty crazy, but I think it's extremely interesting, and it was very much part of the design process behind the autopilot: where do you use humans? Humans are very good at certain kinds of annotations. They're very good, for example, at two-dimensional annotations of images, and they're not good at annotating cars over time in three-dimensional space; that's very, very hard. So we were very careful to design the tasks that are easy for humans to do, versus things that should be left to the offline tracker: maybe the computer will do all the triangulation and 3D reconstruction, but the human will say exactly these pixels of the image are a car, exactly these pixels are a human. Co-designing the data annotation pipeline was very much the bread and butter of what I was doing daily. Do you think there are still a lot of open problems in that space, in annotation generally, where the machines do the stuff machines are good at, humans do what they're good at, and maybe there's some iterative process? Right. I think to a very large extent we went through a number of iterations, and we learned a ton about how to create these datasets. I'm not seeing big open problems. Originally, when I joined, I was really not sure how this would turn out, but by the time I left I was much more secure that we actually understand the philosophy of how to create these datasets, and I was pretty comfortable with where that was at the time.
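As a small, concrete example of the kind of geometry the offline tracker can do automatically once humans have supplied 2D pixel labels, here is classic two-view linear (DLT) triangulation. The camera matrices and the point are made up for the demo.

```python
# Tiny, self-contained example of automatic geometry: triangulating a 3D point from its
# 2D pixel coordinates in two calibrated cameras (classic linear / DLT triangulation).
# The camera matrices and the test point are hypothetical values for the demo.
import numpy as np

def triangulate(P1, P2, uv1, uv2):
    """Recover a 3D point from pixel observations in two cameras with 3x4 projection
    matrices P1 and P2, by solving the homogeneous linear system with an SVD."""
    A = np.vstack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]                      # back to inhomogeneous 3D coordinates

# Two hypothetical cameras: identity pose, and a one-unit translation along x.
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])

point = np.array([0.5, 0.2, 4.0])            # ground-truth 3D point for the demo
project = lambda P, X: (P @ np.append(X, 1))[:2] / (P @ np.append(X, 1))[2]
print(triangulate(P1, P2, project(P1, point), project(P2, point)))  # ~ [0.5, 0.2, 4.0]
```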
So what are the strengths and limitations of cameras for the driving task, in your understanding? When you formulate the driving task as a vision task with eight cameras, and you've seen most of the history of the computer vision field as it relates to neural networks, if you step back, what are the strengths and limitations of pixels, of using pixels to drive? Yeah, I think pixels are a beautiful sensor, a beautiful modality. The thing is, cameras are very, very cheap and they provide a ton of information, a ton of bits, so it's an extremely cheap sensor for a ton of bits, and each one of these bits is a constraint on the state of the world. You get megapixel images very cheaply, and they give you all these constraints for understanding what's actually out there in the world. So vision is probably the highest-bandwidth sensor, a very high-bandwidth sensor, and I love that pixels are a constraint on the world, this highly complex, high-bandwidth constraint on the state of the world. That's fascinating. And it's not just that; there's also the real importance that it's the sensor humans use, so everything is designed for that sensor. Yeah: the text, the writing, the flashing signs, everything is designed for vision, and you just find it everywhere. So that's why that is the interface you want to be in, talking again about these universal interfaces, and that's where we actually want to measure the world as well, and then develop software for that sensor. But there are other constraints on the state of the world that humans use to understand the world. Vision is ultimately the main one, but we're also referencing our understanding of human behavior and some common-sense physics that can be inferred from vision, from a perception perspective. It feels like we're using some kind of reasoning to predict the world, not just the pixels. Yeah, you have a powerful prior for how the world evolves over time, et cetera, so it's not just about the likelihood term coming from the data itself, telling you about what you are observing, but also the prior term of what things are likely to be seen and how they are likely to move, and so on. And the question is how complex the range of possibilities is that might happen in the driving task. Is that still an open problem to you, how difficult driving is? Philosophically speaking, after all the time you've worked on driving, do you understand how hard driving is? Yeah, driving is really hard, because it has to do with the predictions of all these other agents, and the theory of mind: what they're going to do, whether they're looking at you, where they're looking, what they're thinking. There's a lot that goes on there, at the full tail of the expansion of the nines that we have to be comfortable with eventually; the final problems are of that form. I don't think those problems are very common; eventually they're important, but it's really the tail end, the rare edge cases. From the vision perspective, what are the toughest parts of the vision problem of driving? Well, basically, the sensor is extremely powerful, but you still need to process that information, and going from brightnesses of these pixel values to "hey, here's the three-dimensional world" is extremely hard, and that's what the neural networks are fundamentally doing. So the difficulty really is in doing an extremely good job of engineering the entire pipeline: the entire data engine, having the capacity to train these neural nets, having the ability to evaluate the system and iterate on it.
So I would say that just doing this in production at scale is the hard part; it's an execution problem. So the data engine, but also the deployment of the system, so that it has low latency and good performance; it has to do all these steps. Yeah, for the neural net specifically, it's about making sure everything fits into the chip on the car. You have a finite budget of flops that you can perform, and memory bandwidth and other constraints, and you have to make sure it flies and that you squeeze in as much compute as you can into that tiny chip. What have you learned from that process? Maybe that's one of the bigger new things, coming from a research background: there's a system that has to run under heavily constrained resources, that has to run really fast. What kind of insights have you learned from that? Yeah, I'm not sure there are too many grand insights. You're trying to create a neural net that fits in what you have available, and you're always trying to optimize it, and we talked a lot about it at AI Day, basically the triple backflips the team is doing to make sure it all fits and utilizes the engine. So I think it's extremely good engineering, and then there are all kinds of little insights peppered in on how to do it properly. Let's actually zoom out, because I don't think we've talked about the data engine, the entirety of the layout of this idea that I think is just beautiful, with humans in the loop. Can you describe the data engine? Yeah, the data engine is what I call the almost biological-feeling process by which you perfect the training sets for these neural networks. Because most of the programming now is at the level of these datasets, making sure they're large, diverse, and clean: basically, you have a dataset that you think is good, you train your neural net, you deploy it, and then you observe how well it's performing, and you're always trying to increase the quality of your dataset. You're trying to catch scenarios that are basically rare, and it is in these scenarios that the neural nets will typically struggle, because they weren't told what to do in those rare cases in the dataset. But now you can close the loop, because if you can collect all of those at scale, you can feed them back into the reconstruction process I described, reconstruct the truth in those cases, and add it to the dataset. So the whole thing ends up being a staircase of improvement, of perfecting your training set, and you have to go through deployments so that you can mine the parts that are not yet represented well in the dataset. Your dataset is basically imperfect: it needs to be diverse, it has pockets that are missing, and you need to pad out the pockets; you can sort of think of it that way. What role do humans play in this? If this is a biological system, and a human body is made up of cells, how do you optimize the human system: the multiple engineers collaborating, figuring out what to focus on, what to contribute, which tasks to optimize in this neural network? Who's in charge of figuring out which task needs more data? Can you speak to the hyperparameters of the human system? It really just comes down to extremely good execution from an engineering team that knows what it's doing. They understand intuitively the philosophical insights underlying the data engine and the process by which the system improves.
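A schematic sketch of the loop described above: deploy, mine the rare cases where the model struggles, reconstruct the truth for them, fold them back into the dataset, retrain. Every function here is a hypothetical stub standing in for a real pipeline stage, not any actual system.

```python
# Schematic sketch of a data engine loop: deploy, watch where the model struggles,
# turn those rare cases into labeled data, add them back, retrain. All functions are
# hypothetical stubs for illustration.
import random

def deploy_and_collect(model, n=1000):
    """Stand-in for the fleet: returns (scenario, model_was_confident) pairs."""
    return [(f"clip_{i}", random.random() > 0.05) for i in range(n)]

def reconstruct_truth(scenario):
    """Stand-in for the offline reconstruction / labeling step."""
    return {"clip": scenario, "label": "ground_truth"}

def train(dataset):
    """Stand-in for training; returns a new 'model' (here just a record of its data)."""
    return {"trained_on": len(dataset)}

dataset, model = [], {"trained_on": 0}
for iteration in range(5):                       # each pass is one step of the staircase
    telemetry = deploy_and_collect(model)
    hard_cases = [clip for clip, confident in telemetry if not confident]
    dataset += [reconstruct_truth(clip) for clip in hard_cases]   # pad out missing pockets
    model = train(dataset)
    print(f"iteration {iteration}: +{len(hard_cases)} hard cases, "
          f"dataset size {len(dataset)}")
```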
They know how to delegate the strategy of the data collection and how that works, and then it's about making sure it's all extremely well executed, and that's where most of the work is. It's not even the philosophizing or the research or the ideas of it; it's just extremely good execution, which is so hard when you're dealing with data at that scale. So your role in the data engine, executing well on it, is difficult and extremely important. Is there a priority, like a vision board, of saying, we really need to get better at stoplights, a prioritization of tasks? Is that essentially it, and does that come from the data? That comes, to a very large extent, from what we are trying to achieve in the product roadmap, the release we're trying to get out, and from the feedback from the QA team about where the system is struggling or not, the things we're trying to improve. The QA team gives some signal, some information in aggregate, about the performance of the system in various conditions. And then, of course, all of us drive it, and we can also see it. It's really nice to work with a system that you can also experience yourself; it drives you home. Is there some insight you can draw from your individual experience that you just can't quite get from an aggregate statistical analysis of data? Yeah, it's so weird, right? It's not scientific, in a sense, because you're just one anecdotal sample. Yeah, but it's a source of truth, your interaction with the system. You can see it, you can play with it, you can perturb it, you can get a sense of it, you have an intuition for it. Numbers and plots and graphs are much harder; they hide a lot. It's like training a language model: a really powerful way to understand it is by interacting with it yourself. Yeah, a hundred percent, to build up an intuition. And I think Elon is the same; he always wanted to drive the system himself. He drives a lot, I would say almost daily, so he also sees this as a source of truth: you driving the system and seeing it perform. So, tough questions here: Tesla last year removed radar from the sensor suite, and now it's just announced that it's going to remove the ultrasonic sensors as well, relying solely on vision, so camera only. Does that make the perception problem harder or easier? I would almost reframe the question in some way. So the thing is, basically, you would think that additional sensors... By the way, can I just interrupt: I wonder if a language model will ever do that if you prompt it; "let me reframe your question," that would be epic. It's a little bit of a wrong question, because basically you would think that these sensors are an asset to you, but if you fully consider the entire product in its entirety, these sensors are actually potentially a liability, because these sensors aren't free. They don't just appear on your car: you need to have an entire supply chain, you have people procuring them, there can be problems with them, they may need replacement, they are part of the manufacturing process, they can hold back the line in production, you need to source them, you need to maintain them, you have to have teams that write the firmware, all of it, and then you also have to incorporate them and fuse them into the system in some way. So it actually bloats a lot of the organization. And I think Elon is really good at simplifying: the best part is no part, and he
always tries to throw away things that are not essential, because he understands the entropy in organizations and in the approach, and I think in this case the cost is high, and you're not necessarily seeing it if you're just a computer vision engineer trying to improve your network and asking whether this sensor is more useful or less useful. Once you consider the full cost of a sensor, it actually is potentially a liability, and you need to be really sure that it's giving you extremely useful information. In this case, we looked at using it or not using it, and the delta was not massive, and so it's not useful. Does it also bloat the data engine? Like, having more sensors is a distraction. And these sensors, you know, can change over time. For example, you can have one type of radar and then another type of radar; they change over time, and suddenly you need to worry about it: suddenly you have a column in your SQLite telling you, oh, which sensor type was it, and they all have different distributions, and they contribute noise and entropy into everything, and they bloat stuff. And also, organizationally, it has been really fascinating to me that it can be very distracting. If all you want to get to work is vision, all the resources are on it, and you're building out a data engine, and you're actually making forward progress, because that is the sensor with the most bandwidth, the most constraints on the world, and you're investing fully into that, and you can make it extremely good. You only have a finite amount of spend of focus across the different facets of the system. This kind of reminds me of Rich Sutton's bitter lesson: simplifying the system, in the long run (of course, you don't know the long run), seems to always be the right solution. Yes. In that case it was for RL, but it seems to apply generally across all systems that do computation. So what do you think about the "lidar as a crutch" debate, the battle between point clouds and pixels? Yeah, this debate is always slightly confusing to me, because it seems like the actual debate should be about whether you have the fleet or not. That's the really important thing for whether you can achieve a really good functioning AI system at this scale. Data collection systems. Yeah, whether you have a fleet or not is significantly more important than whether you have lidar or not; it's just another sensor. And, similar to the radar discussion, it basically doesn't offer extra information, it's extremely costly, it has all kinds of problems, you have to worry about it, you have to calibrate it, et cetera; it creates bloat and entropy. You have to be really sure that you need this sensor, and in this case I basically don't think you need it, and honestly I will make a stronger statement: I think some of the other companies that are using it are probably going to drop it. Yeah, so you have to consider the sensor in the full picture: can you build a big fleet that collects a lot of data, and can you integrate that sensor and that data into a data engine that's able to quickly find the different parts of the data that then continuously improve whatever model you're using? Yeah. Another way to look at it is that vision is necessary, in the sense that the world is designed for human visual consumption, so you
need it: vision is necessary, and it is also sufficient, because it has all the information that you need for driving, and humans obviously use vision to drive. So it's both necessary and sufficient, and you want to focus your resources there, and you have to be really sure if you're going to bring in other sensors. You could add sensors to infinity; at some point you need to draw the line, and in this case you have to really consider the full cost of any one sensor that you're adopting and whether you really need it, and I think the answer in this case is no. So what do you think about the approach of the other companies that are building high-resolution maps and heavily constraining the geographic regions in which they operate? Is that approach, in your view, not going to scale over time to the entirety of the United States? Well, as you mentioned, they pre-map all the environments, they need to refresh the map, and they need a perfect, centimeter-level-accuracy map of everywhere they're going to drive, which is crazy. When we're talking about autonomy actually changing the world, we're talking about the deployment, on a global scale, of autonomous systems for transportation, and if you need to maintain a centimeter-accurate map of the Earth, or of many cities, and keep it updated, that's a huge dependency that you're taking on. Huge dependency. It's a massive, massive dependency, and now you need to ask yourself: do you really need it? Humans don't need it, right? It's very useful to have a low-level map of, okay, the connectivity of your roads; you know there's a fork coming up. When you drive in an environment, you have that high-level understanding. It's like a small Google map, and Tesla uses Google-map-like, similar-resolution information in the system, but it will not pre-map environments to centimeter-level accuracy. It's a crutch, it's a distraction, it costs entropy, and it diffuses and dilutes the team, and you're not focusing on what's actually necessary, which is the computer vision problem. What did you learn, about machine learning, about engineering, about life, about yourself as a human being, from working with Elon Musk? I think the most I've learned is about how to run organizations efficiently, how to create efficient organizations, and how to fight entropy in an organization. So human engineering, in the fight against entropy. Yeah, I think Elon is a very efficient warrior in the fight against entropy in organizations. What does entropy in an organization look like, exactly? It's process, it's inefficiencies, that kind of stuff. Yeah, and meetings: he hates meetings; he keeps telling people to skip meetings if they're not useful. He basically runs the world's biggest startups, I would say: Tesla and SpaceX are the world's biggest startups, and Tesla actually contains multiple startups; I think it's better to look at it that way. So I think he's extremely good at that, and he has very good intuition for streamlining processes, making everything efficient, best part is no part, simplifying, focusing, removing barriers, moving very quickly, making big moves. All of this is very startup-like behavior, but at scale. So, a strong drive to simplify. From your perspective, that also probably applies to designing systems, in machine learning and otherwise: simplify, simplify. Yes. What do you think is
the secret to maintaining the startup culture in a company that grows? Can you introspect on that? I do think you need someone in a powerful position with a big hammer, like Elon, who is the cheerleader for that idea and ruthlessly pursues it. If no one has a big enough hammer, everything turns into committees, democracy within the company, process, talking to stakeholders, decision-making, and everything just crumbles. If you have a big person who is also really smart and has a big hammer, things move quickly. So you said your favorite scene in Interstellar is the intense docking scene, with the AI and Cooper talking: "Cooper, what are you doing?" "Docking." "It's not possible." "No, it's necessary." Such a good line. By the way, so many questions there: why is the AI in that scene, which presumably is able to compute a lot more than the human, saying it's not possible? I mean, it's a movie, but shouldn't the AI know much better than the human? Anyway, what do you think is the value of setting seemingly impossible goals? Something you seem to have taken on, and that Elon espouses: the initial intuition of the community might say this is very difficult, and then you take it on anyway with a crazy deadline. Just from a human engineering perspective, have you seen the value of that? I wouldn't say that setting impossible goals, exactly, is a good idea, but I think setting very ambitious goals is a good idea. There's what I call sublinear scaling of difficulty, which means that 10x problems are not 10x hard; usually a 10x harder problem is more like 2 or 3x harder to execute on. Because if you want to improve a system by 10 percent, it costs some amount of work, but if you want to improve the system 10x, it doesn't cost 100x the amount of work, because you fundamentally change the approach. If you start with that constraint, then some approaches are obviously dumb and not going to work, and it forces you to re-evaluate, and I think it's a very interesting way of approaching problem-solving. But it requires a weird kind of thinking. Going back to your PhD days: how do you figure out which ideas in the machine learning community are solvable? What is that? I mean, there's the cliché of first-principles thinking, but it requires you to basically ignore what the community is saying, because doesn't a community in science usually draw lines of what is and isn't possible, and isn't it very hard to break out of that without going crazy? Yeah, I think a good example here is the deep learning revolution, in some sense. You could be in computer vision at that time, during the deep learning revolution of 2012 and so on, and you could be improving your computer vision stack by 10 percent, or you could just say: actually, all of this is useless; how do I do 10x better computer vision? Well, it's probably not by tuning a HOG feature detector; I need a different approach, I need something that is scalable, going back to Richard Sutton and understanding the philosophy of the bitter lesson, and then saying: actually, I need a much more scalable system, like a neural network that in principle works, and then having some deep believers who can actually execute on that mission and make it work. That's the 10x solution. What do you think is the timeline to solve
the problem of autonomous driving? Is that still in part an open question? Yeah, I think the tough thing with timelines for self-driving, obviously, is that no one has created self-driving. It's not like asking, what do you think is the timeline to build this bridge? Well, we've built a million bridges before; here's how long that takes. No one has built autonomy; it's not obvious. Some parts turn out to be much easier than others, so it's really hard to forecast. You do your best based on trend lines and so on, and based on intuition, but that's why, fundamentally, it's just really hard to forecast. Even being inside of it, it's hard to do. Yes: some things turn out to be much harder, and some things turn out to be much easier. Do you try to avoid making forecasts? Because Elon doesn't avoid them, right? And heads of car companies in the past have not avoided them either; Ford and other places made predictions that we were going to solve level-four driving by 2020, 2021, whatever, and now they've all kind of backtracked on those predictions. Do you, as an AI person, privately make predictions yourself, or do they get in the way of your ability to actually think about the thing? Yeah, what's easy to say is that this problem is tractable, and that's an easy prediction to make. It's tractable; it's going to work. Yes, it's just really hard. Some things turn out to be harder, and some things turn out to be easier, but it definitely feels tractable, and it feels like, at least the team at Tesla, which is what I saw internally, is definitely on track to that. How do you form a strong enough representation to make a prediction about tractability? Because you're the leader of a lot of humans; you have to be able to say, this is actually possible. How do you build up that intuition? It doesn't have to be driving; it could be other tasks. What difficult tasks did you work on in your life? I mean, image classification, achieving a certain level of superhuman performance? Yeah, expert intuition; it's just intuition, it's belief. So just thinking about it long enough, studying, looking at sample data. Like you said, driving: my intuition is really flawed on this; I don't have a good intuition about tractability. It could be anything. It could be solvable: the driving task could be simplified into something quite trivial, and the solution to the problem could end up quite trivial, and at scale, more and more cars driving perfectly might make the problem much easier. Yeah, the more cars you have driving, the more people learn how to drive, not "correctly," but in a way that's more optimal for a heterogeneous system of autonomous, semi-autonomous, and manually driven cars. That could change things. Then again, I've also spent a ridiculous number of hours just staring at pedestrians crossing streets, thinking about humans, and it feels like the way we use eye contact sends really strong signals, and there are certain quirks and edge cases of behavior. And of course a lot of the fatalities that happen have to do with drunk driving, both on the pedestrian side and the driver's side, and there's the problem of driving at night, and all that kind of stuff. So I wonder: the space of possible solutions to autonomous driving includes so many human-factor issues that it's almost impossible to predict. There could be super clean, nice
solutions. Yeah, I would say, to use a game analogy, there's definitely some fog of war, but you also see the frontier of improvement, and you can measure historically how much progress you've made. For example, in the roughly five years I was at Tesla: when I joined, it barely kept lane on the highway. I think going up from Palo Alto to SF was three or four interventions; any time the road did anything geometric, or turned too much, it would just not work. Going from that to a pretty competent system in five years, and seeing what happens under the hood, and the scale at which the team is operating now with respect to data and compute and everything else, is just massive progress. So you're climbing a mountain, and it's foggy, but you're making a lot of progress. It's foggy, you're making progress, and you see what the next directions are, and you're looking at some of the remaining challenges, and they're not perturbing you, they're not changing your philosophy, and you're not contorting yourself; you're saying, actually, these are the things that we still need to do. Yeah, the fundamental components of solving the problem seem to be there, from the data engine to the compute, the computer on the car, the compute for training, all that kind of stuff. So, over the years you were at Tesla, you've done a lot of amazing breakthrough ideas and engineering, all of it, from the data engine to the human side. Can you speak to why you chose to leave Tesla? Basically, as I described, over those five years I got myself into a bit of a managerial position. Most of my days were meetings and growing the organization and making high-level strategic decisions about the team and what it should be working on, and so on. It's kind of a corporate-executive role, and I can do it, I think I'm okay at it, but it's not fundamentally what I enjoy. When I joined, there was no computer vision team, because Tesla was just going through the transition from using Mobileye, a third-party vendor, for all of its computer vision, to having to build its own computer vision system. When I showed up, there were two people training deep neural networks on a computer at their desk, on a kind of basic classification task. I grew that into what I think is a fairly respectable deep learning team, a massive compute cluster, and a very good data annotation organization, and I was very happy with where that was. It became quite autonomous, and so I stepped away, and I'm very excited to do much more technical things again and kind of refocus on AGI. What was that soul-searching like? Because you took a little time off to think. How many mushrooms did you take? No, I'm just kidding. I mean, what was going through your mind? The human lifetime is finite; you've done a few incredible things; you're one of the best teachers of AI in the world; you're one of the best, and I mean that in the best possible way, tinkerers in the AI world, meaning you understand the fundamentals of how something works by building it from scratch and playing with the basic intuitions. Einstein and Feynman were really good at this kind of stuff: taking a small example of a
thing to play with, to try to understand it. So there's that, and obviously now, at Tesla, you helped build a team of machine learning engineers and a system that actually accomplishes something in the real world. Given all that, what was the soul-searching like? Well, it was hard, because obviously I love the company a lot, and I love Elon, I love Tesla; it was hard to leave; I love the team, basically. But I think I would potentially be interested in revisiting it, maybe coming back at some point, working on Optimus, working on AGI at Tesla. I think Tesla is going to do incredible things. It's basically a massive, large-scale robotics company with a ton of in-house talent for doing really incredible things, and I think humanoid robots are going to be amazing, I think autonomous transportation is going to be amazing, and all of this is happening at Tesla, so I think it's just a really amazing organization. Being part of it and helping it along, I enjoyed that a lot. It was basically difficult for those reasons, because I love the company, but I'm happy to potentially, at some point, come back for act two. I felt like, at this stage, I'd built the team, it felt autonomous, and I'd become a manager, and I wanted to do a lot more technical stuff: I wanted to learn stuff, I wanted to teach stuff, and I just felt like it was a good time for a bit of a change of pace. What do you think is the best movie sequel of all time, speaking of part two? Because most movie sequels suck. And you tweet about movies, so just a tiny tangent: what's your favorite movie sequel? Godfather Part Two? Are you a fan of The Godfather? Because you didn't even tweet about or mention The Godfather. Yeah, I don't love that movie. I know; edit that out. We're going to edit out the hate toward The Godfather. How dare you. I will make a strong statement: I don't know why, but I basically don't like any movie made before 1995, something like that. Didn't you mention Terminator 2? Okay, okay, Terminator 2 was a little bit later, 1991 I think, and I like Terminator 1 as well. So okay, a few exceptions, but by and large, for some reason I don't like movies made before 1995 or so; they feel very slow, the camera is zoomed out, it's boring, it's kind of naive, it's kind of weird. And also Terminator was very much ahead of its time. Yes, and The Godfather has no AGI. [Laughter] I mean, Good Will Hunting was one of the movies you mentioned, and that doesn't have any AGI either. I guess that's mathematics. Yeah, I guess occasionally I do enjoy movies that don't feature it, like Anchorman; no AGI there, and it's so good. I don't understand, speaking of AGI, why Will Ferrell is so funny; it doesn't make sense, it doesn't compute. There's just something about him, and he's a singular human. You don't get that many comedies these days, and I wonder if it has to do with the culture, or the machine of Hollywood, or whether we just got lucky with certain people, and comedy came together because he is a singular human. That was a ridiculous tangent; I apologize. But you mentioned humanoid robots, so what do you think about Optimus, about the Tesla Bot? Do you think we'll have robots in the factory and in the home in 10, 20, 30, 40, 50 years? Yeah, I think it's a
very hard project, and I think it's going to take a while. But who else is going to build humanoid robots at scale? And I think it is a very good form factor to go after, because, like I mentioned, the world is designed for the humanoid form factor. These things would be able to operate our machines, they would be able to sit down in chairs, potentially even drive cars; basically, the world is designed for humans, so that's the form factor you want to invest in and make work over time. There's another school of thought, which is: okay, pick a problem and design a robot for it. But actually designing a robot, and getting a whole data engine and everything behind it to work, is an incredibly hard problem, so it makes sense to go after general interfaces that, okay, are not perfect for any one given task, but have the generality of, just with a prompt, in English, being able to do something across tasks. So I think it makes a lot of sense to go after a general interface in the physical world. I think it's a very difficult project and it's going to take time, but I see no other company that can execute on that vision. I think it's going to be amazing. Basically, physical labor: if you think transportation is a large market, try physical labor. It's insane. But it's not just physical labor. To me, the thing that's also exciting is the social robotics, the relationships we'll have, on different levels, with those robots. That's why I was really excited to see Optimus. People have criticized me for that excitement, but I've worked with a lot of research labs that do humanoid and legged robots, Boston Dynamics, Unitree, a lot of companies that do legged robots, and the elegance of the movement is a tiny, tiny part of the big picture. The two big exciting things to me about Tesla doing humanoid, or any legged, robots are, first, integrating it into the data engine: the actual intelligence for the perception and the control and the planning and all that kind of stuff, integrated with the huge fleet that you mentioned; and then, speaking of fleet, the second thing is the mass manufacturing: just knowing, culturally, how to drive toward a simple robot that's cheap to produce at scale, and having the experience to do that well. That changes everything. That's a very different culture and style from Boston Dynamics, whose robots, by the way, the way they move, it'll be a very long time before Tesla can achieve that smoothness of movement, but that's not what it's about. It's about the entirety of the system, like we talked about: the data engine and the fleet. That's super exciting, even with the initial models. But that too was really surprising, that in a few months you can get a prototype. Yep, and the reason that happened very quickly is, as you alluded to, there's a ton of copy-paste from what's happening in autopilot. A lot. The amount of expertise that came out of the woodwork at Tesla for building the humanoid robot was incredible to see. Basically, Elon said at one point, we're doing this, and then the next day all these CAD models started to appear, and people were talking about the supply chain and manufacturing, and people showed up with screwdrivers and everything the next day and started to put together the body, and I was like, whoa, all these
people exist at Tesla, and fundamentally building a car is actually not that different from building a robot. And that is true not just for the hardware pieces (and let's not forget the hardware, not just for a demo: manufacturing that hardware at scale is a whole different thing), but for the software as well. Basically, this robot currently thinks it's a car. It's going to have a midlife crisis at some point: it thinks it's a car. Some of the earlier demos, actually, we were talking about potentially doing outside in the parking lot, because that's where all of the computer vision worked out of the box, instead of inside. But all of the operating system, everything, just copy-pastes. Computer vision mostly copy-pastes; I mean, you have to retrain the neural nets, but the approach, and everything in the data engine, and the offline trackers, and the way we go about the occupancy tracker and so on, everything copy-pastes; you just need to retrain the neural nets. And then the planning and control, of course, has to change quite a bit. But there's a ton of copy-paste from what's happening at Tesla. So if you were to go in with the goal of, okay, let's build a million humanoid robots, and you're not Tesla, that's a lot to ask. If you're Tesla, it's actually not that crazy. And then the follow-up question is, just like with driving: how difficult is the manipulation task, such that it can have an impact at scale? I think, depending on the context, the really nice thing about robotics is that, unless you're doing manufacturing and that kind of stuff, there's more room for error. Driving is so safety-critical, and also time-critical; a robot is allowed to move slower, which is nice. Yes, I think it's going to take a long time, but the way you want to structure the development is to say: okay, it's going to take a long time; how can I set up the product development roadmap so that I'm making revenue along the way? I'm not setting myself up for a zero-one loss function where it doesn't work until it works. You don't want to be in that position. You want to make it useful almost immediately, and then slowly deploy it at scale, and you want to set up your data engine, your improvement loops, the telemetry, the evaluation, the harness, and everything, and you want to improve the product over time, incrementally, making revenue along the way. That's extremely important, because otherwise you cannot build these large undertakings; they just don't make sense economically. And also, from the point of view of the team working on it, they need the dopamine along the way. They're not just going to buy a promise that this will be useful, that this is going to change the world in ten years when it works. That's not where you want to be. You want to be in a place, like I think autopilot is today, where it's offering increased safety and convenience of driving today: people pay for it, people like it, people purchase it, and then you also have the greater mission that you're working toward. So the dopamine for the team, that was a source of happiness? Yes: you're deploying this, people like it, people drive it, people pay for it, they care about it, there are all these YouTube videos, your grandma drives it, she gives you feedback; people like it, people engage with it, you engage with it. Huge. Do people who drive Teslas recognize you and give you love, like, hey, thanks for this nice
feature that it's doing. Yeah, I think the tricky thing is, some people really love you and some people, unfortunately... you're working on something that you think is extremely valuable, useful, et cetera, and some people do hate you. There's a lot of people who hate me and the team and the whole project, and I think they claim to be Tesla drivers; in many cases they're not actually. Yeah, that actually makes me sad about humans, or the current ways that humans interact. I think that's fixable. I think humans want to be good to each other. I think Twitter and social media are part of the mechanism that somehow makes the negativity more viral, that disproportionately gives negativity a viral boost. I wish people would suppress some of the jealousy, some of the ego, and just get excited for others, and then there's a karma aspect to that: you get excited for others, they'll get excited for you. Same thing in academia. If you're not careful, there's a dynamical system there: if you think in silos and get jealous of somebody else being successful, that actually, perhaps counterintuitively, leads to less productivity for you as a community and for you individually. I feel like if you keep celebrating others, that actually makes you more successful. Yeah, I think people, depending on the industry, haven't quite learned that yet. Some people are also very negative and very vocal, so they're very prominently featured, but actually there's a ton of people who are cheerleaders, silent cheerleaders. And when you talk to people just out in the world, they will tell you it's amazing, it's great, especially people who understand how difficult it is to get this stuff working, people who have built products, makers, entrepreneurs. Making this work and changing something is incredibly hard; those people are more likely to cheerlead you. Well, one of the things that makes me sad is that some folks in the robotics community don't do the cheerleading, and they should, because they know how difficult it is. Well, they actually sometimes don't know how difficult it is to create a product at scale and actually deploy it in the real world. A lot of the development of robots and AI systems is done on very specific small benchmarks, as opposed to real-world conditions. Yes. Yeah, I think it's really hard to work in an academic setting on robotics, or on AI systems that apply in the real world. You've flourished in and loved, for a time, ImageNet, the famed ImageNet dataset, and you've recently had some words of criticism that the academic research ML community still gives a little too much love to ImageNet, or to those kinds of benchmarks. Can you speak to the strengths and weaknesses of datasets used in machine learning research? Actually, I don't know that I recall the specific instance where I was unhappy or criticizing ImageNet. I think ImageNet has been extremely valuable. It was basically a benchmark that allowed the deep learning community to demonstrate that deep neural networks actually work; there's massive value in that. So I think ImageNet was useful, but basically it's become a bit of an MNIST at this point. MNIST is the 28 by 28 grayscale digits; it's kind of a joke dataset that everyone just crushes. There's no papers written on MNIST anymore, though, right? Maybe there should be strong
papers, like papers that focus on how we learn with a small amount of data, that kind of stuff. Yeah, I could see that being helpful, but not in mainline computer vision research anymore, of course. I think... I've heard you somewhere, maybe I'm just imagining things, but I think you said ImageNet was a huge contribution to the community for a long time, and now it's time to move past those kinds of benchmarks. Well, ImageNet has been crushed. I mean, you know the error rates: we're getting something like 90% accuracy in 1,000-way classification. And I've seen those images; that's really high, that's really good. If I remember correctly, the top-5 error rate is now around 1% or something. Given your experience with a gigantic real-world dataset, would you like to see benchmarks move in certain directions that the research community could use? Unfortunately, I don't think academics currently have the next ImageNet. We've obviously, I think, crushed MNIST, we've basically kind of crushed ImageNet, and there's no next big benchmark that the entire community rallies behind and uses for further development of these networks. Yeah. What would it take for a dataset to captivate the imagination of everybody, where they all get behind it? That could also need, like, a viral element, a leader, right, somebody with popularity. I mean, yeah. Why did ImageNet take off? Is it just an accident of history? It was the right amount of difficult. It was the right amount of difficult and simple and interesting enough; it was the right time for that kind of a dataset. Question from Reddit: what are your thoughts on the role that synthetic data and game engines will play in the future of neural net model development? I think, as neural nets converge to humans, the value of simulation to neural nets will be similar to the value of simulation to humans. People use simulation because they can learn something in that kind of a system without having to actually experience it. But are you referring to the simulation we do in our heads? No, sorry, by simulation I mean video games, or other forms of simulation for various professionals. Well, let me push back on that, because maybe there's simulation that we do in our heads, like simulating: if I do this, what do I think will happen? That's internal simulation. Yeah, internal. Isn't that what we're doing, as humans, before we act? Oh yeah, but that's independent from the use of simulation in the sense of computer games, or using simulation for training-set creation. Is it independent, or is it just loosely correlated? Because isn't it useful to do counterfactual or edge-case simulation, like, what happens if there's a nuclear war, those kinds of things? Yeah, that's a different simulation from, say, Unreal Engine; that's how I interpreted the question. So, simulation of the average case? What's Unreal Engine, what do you mean by Unreal Engine? Simulating a world, the physics of that world. Why is that different? Because you can also add behavior to that world, and you can try all kinds of stuff, right? You could throw all kinds of weird things into it. So Unreal Engine is not just about... I mean, I guess it is about simulating the physics of the world, but it's also doing something with that. Yeah, the graphics, the physics, and the agents that you put into the environment, and stuff like that. Yeah. See, I feel like you said that it's not that important, I guess, for the future of AI development. Is that correct, to interpret you that way? I think humans use simulators and find them useful, and so computers will use simulators and find them useful. Okay, so you're saying... I don't use simulators very often. I play a video game every once in a while, but I don't think I derive any wisdom about my own existence from those video games. It's a momentary escape from reality, versus a source of wisdom about reality. So I think that's a very polite way of saying simulation is not that useful. Yeah, maybe not. I don't see it as a fundamentally important part of training neural nets currently. But I think as neural nets become more and more powerful, you will need fewer examples to train additional behaviors, and with simulation, of course, there's a domain gap: a simulation is not the real world, it's slightly different. But with a powerful enough neural net, the domain gap can be bigger, I think, because the neural network will sort of understand that, even though it's not the real world, it has all this high-level structure that it's supposed to be able to learn from. So then you'll actually be able to leverage the synthetic data better. Yes, by closing the gap, by better understanding in which ways this is not real data. Exactly. Right. Do better questions next time. That was a question, but I'm just kidding. All right.
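A minimal sketch of the "game engine as data factory" idea discussed above, in Python: a made-up toy renderer draws a shape and the label comes for free because the simulator knows what it drew. Everything here (the shapes, the 32x32 canvas, the noise level) is an arbitrary choice for illustration, not anything from the conversation.

```python
import numpy as np

def render_scene(rng, size=32):
    """Render one synthetic image and return (image, class_label)."""
    img = np.zeros((size, size), dtype=np.float32)
    label = int(rng.integers(0, 2))          # 0 = square, 1 = disk
    cx, cy = rng.integers(8, size - 8, size=2)
    r = int(rng.integers(3, 6))
    if label == 0:
        img[cy - r:cy + r, cx - r:cx + r] = 1.0
    else:
        yy, xx = np.ogrid[:size, :size]
        img[(yy - cy) ** 2 + (xx - cx) ** 2 <= r * r] = 1.0
    img += 0.05 * rng.standard_normal(img.shape)   # cheap stand-in for sensor noise
    return img, label

rng = np.random.default_rng(0)
data = [render_scene(rng) for _ in range(1000)]    # an instant labeled dataset
images = np.stack([d[0] for d in data])
labels = np.array([d[1] for d in data])
print(images.shape, labels[:10])
```

This also illustrates the caveat raised above: a network trained only on this toy data would still face a domain gap when shown real images, which is exactly the gap a more powerful net is better able to bridge.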
So, is it possible, do you think, speaking of MNIST, to construct neural nets and training processes that require very little data? We've been talking about huge datasets, like the internet, for training. I mean, one way to say it is, like you said, the querying itself is another level of training, I guess, and that requires very little data. Yeah. But do you see any value in doing research in the direction of: can we use very little data to train, to construct a knowledge base? A hundred percent. I just think at some point you need a massive dataset, and then, when you pre-train your massive neural net and get something that is like a GPT or so, you're able to be very efficient at training any arbitrary new task. So with a lot of these GPTs, you can do tasks like sentiment analysis or translation and so on just by being prompted with very few examples. Here's the kind of thing I want you to do: here's an input sentence, here's the translation into German; input sentence, translation into German; input sentence, blank; and the neural network will complete the translation into German just by looking at the examples you've provided. And so that's an example of very few-shot learning in the activations of the neural net instead of the weights of the neural net. And so I think, basically, just like humans, neural nets will become very data efficient at learning any other new task, but at some point you need a massive dataset to pre-train your network to get that.
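A small sketch of the few-shot prompt Andrej describes, assuming a GPT-style pretrained model would be asked to continue the text; the example pairs are illustrative and the generation call is left as a hypothetical stub, since no specific model or API is named in the conversation.

```python
# The "training" lives in the prompt (activations), not in any weight update.
examples = [
    ("the cat sits on the mat", "die Katze sitzt auf der Matte"),
    ("I like coffee",           "ich mag Kaffee"),
]
query = "where is the train station"

prompt = "Translate English to German.\n"
for en, de in examples:
    prompt += f"English: {en}\nGerman: {de}\n"
prompt += f"English: {query}\nGerman:"      # the model fills in the blank

print(prompt)
# completion = some_pretrained_lm.generate(prompt)   # hypothetical call, not a real API
```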
And probably we humans have something like that. Do we have something like that? Do we have a passive, in-the-background model-constructing thing that just runs all the time in a self-supervised way, that we're not conscious of? I think humans definitely... I mean, obviously we learn a lot during our lifespan, but also we have a ton of hardware that helps us at initialization, initialization coming from evolution, and I think that's also a really big component. A lot of people in the field just talk about the number of seconds that a person has lived, pretending this is a tabula rasa, sort of a zero initialization of a neural net, and it's not. You can look at a lot of animals, for example zebras: zebras get born and they see, and they can run. There's zero training data in their lifespan; they can just do that. So somehow, I have no idea how, evolution has found a way to encode these algorithms and these neural net initializations, which are extremely good, into ATCGs, and I have no idea how it works, but apparently it's possible, because here's a proof by existence. There's something magical about going from a single cell to an organism that is born, to the first few years of life. I kind of like the idea that the reason we don't remember anything about the first few years of our life is that it's a really painful process, a very difficult, challenging training process. Yeah, intellectually. Maybe. I mean, why don't we remember any of that? There might be some crazy training going on, and maybe that's the background model training that is very painful, and so it's best for the system, once it's trained, not to remember how it was constructed. I think it's just that the hardware for long-term memory is not fully developed. Sure. I kind of feel like the first few years of infants is not actually learning, it's the brain maturing. We're born premature, and there's a theory along those lines, because of the birth canal and the swelling of the brain; so we're born premature, and then in the first few years the brain is just maturing, and then there's some learning eventually. That's my current view on it. What do you think: do you think neural nets can have long-term memory, something that approaches what humans do? Do you think there needs to be another meta-architecture on top, to add something like a knowledge base that learns facts about the world and all that kind of stuff? Yes, but I don't know to what extent it will be explicitly constructed. It might take unintuitive forms, where you are telling the GPT, hey, you have a declarative memory bank to which you can store and retrieve data, and whenever you encounter some information that you find useful, just save it to your memory bank; here's an example of something you have retrieved, here's how you save to it, and here's how you load from it: you just say "load", whatever. You teach it in text, in English, and then it might learn to use a memory bank from that. Oh, so the neural net is the architecture for the background model, the base thing, and then everything else sits on top of it. It's not just text, right? You're giving it gadgets and gizmos. So you're teaching it some kind of special language by which it can save arbitrary information and retrieve it at a later time, and you're telling it about these special tokens and how to arrange them to use these interfaces. It's like, hey, you can use a calculator, here's how you use it: just write 53 + 41 =, and when the equals is there, a calculator will actually read out the answer, and you don't have to calculate it yourself. And you just tell it that in English. This might actually work.
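A toy sketch of the calculator convention just described: the model is told, in plain text, to write "a + b =" whenever it needs arithmetic, and an external tool fills in whatever follows the "=". The regex format and the hard-coded "model output" are assumptions made up for illustration only.

```python
import re

def run_calculator_tool(generated_text: str) -> str:
    """Find 'NUM + NUM =' patterns and append the computed result."""
    def evaluate(match):
        a, b = int(match.group(1)), int(match.group(2))
        return f"{a} + {b} = {a + b}"
    return re.sub(r"(\d+)\s*\+\s*(\d+)\s*=", evaluate, generated_text)

model_output = "The total cost is 53 + 41 ="      # pretend a GPT emitted this
print(run_calculator_tool(model_output))           # ... 53 + 41 = 94
```

The design point is the one made above: the interface is taught in ordinary text and special token conventions, not wired into the architecture.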
Do you think, in that sense, Gato is interesting, the DeepMind system that is not just language but actually throws it all into the same pile: images, actions, all that kind of stuff? Is that basically what we're moving towards? Yeah, I think so. Gato is very much a kitchen-sink approach to reinforcement learning: lots of different environments with a single fixed Transformer model. I think it's a very early result in that realm, but I think it's along the lines of what things will eventually look like. Right, so this is the early days of a system that will eventually look like this, from a Rich Sutton perspective. Yeah. I'm not a super huge fan of all these interfaces that look very different, though. I would want everything to be normalized into the same API: so, for example, screen pixels, the very same API, instead of having different world environments with very different physics and joint configurations and appearances and whatever, and having some kind of special tokens for different games that you can plug in. I'd rather just normalize everything to a single interface, so it looks the same to the neural net, if that makes sense. So it's all going to be pixel-based Pong in the end. I think so. Okay, let me ask you about your own personal life. A lot of people want to know: you're one of the most productive and brilliant people in the history of AI; what does a productive day in the life of Andrej Karpathy look like? What time do you wake up? Because imagine some kind of dance between the average productive day and the perfect productive day: the perfect productive day is the thing we strive towards, and the average is what it converges to, given all the mistakes and human eventualities and so on. So what time do you wake up? Are you a morning person? I'm not a morning person, I'm a night owl for sure. I think it's stable, or semi-stable, like eight or nine or something like that. During my PhD it was even later; I used to go to sleep usually at 3 a.m. I think the a.m. hours are precious and a very interesting time to work, because everyone is asleep. At 8 a.m. or 7 a.m. the East Coast is awake, so there's already activity, there's already text messages, whatever, there's stuff happening. You can go on some news website and there's stuff happening; it's distracting. At 3 a.m. everything is totally quiet, so you're not going to be bothered, and you have solid chunks of time to do your work. So I like those periods: night owl by default. And then, productive time: basically what I like to do is, you need to build some momentum on the problem without too much distraction, and you need to load your RAM, your working memory, with that problem, and then you need to be obsessed with it when you're taking a shower, when you're falling asleep. You need to be obsessed with the problem, so it's fully in your memory, and you're ready to wake up and work on it right there. So is this on the temporal scale of a single day, or a couple of days, a week, a month? I can't talk about one day in isolation, basically, because it's a whole process. When I want to get productive on a problem, I feel like I need a span of a few days where I can really get into that problem and I don't want to be interrupted; I'm going to just be completely obsessed with that problem, and that's where I do most of my good work. You've done a bunch of cool little projects in a very
short amount of time, very quickly, so that requires you to just focus on it. Yeah, basically I need to load my working memory with the problem, and I need to be productive, because there's always a huge fixed cost to approaching any problem. I was struggling with this, for example, at Tesla, because I wanted to work on small side projects, but, okay, you first need to figure out, okay, I need to SSH into my cluster, I need to bring up a VS Code editor so I can work on this, I run into some stupid error for some reason... you're not at a point where you can be productive right away, you are facing barriers. So it's about really removing all those barriers so you're able to get into the problem and you have the full problem loaded in your memory, and somehow avoiding distractions of all different forms: news stories, emails, but also distractions from other interesting projects that you previously worked on or are currently working on, and so on. You just want to really focus your mind. And I can take some time off for distractions in between, but I think it can't be too much; most of your day is spent on that problem. And then, you know, I drink coffee, I have my morning routine, I look at some news: Twitter, Hacker News, Wall Street Journal, et cetera. So basically you wake up, you have some coffee; are you trying to get to work as quickly as possible, or are you taking in this diet of what the hell is happening in the world first? I do find it interesting to know about the world. I don't know that it's useful or good, but it is part of my routine right now, so I do read through a bunch of news articles and I want to be informed, and I'm suspicious of it. I'm suspicious of the practice, but currently that's where I am. Oh, you mean suspicious about the effect of that practice on your productivity and your well-being? My well-being, psychologically. And also on your ability to deeply understand the world, because there's a bunch of sources of information you're not really focused on deeply integrating. Yeah, it's a little bit distracting. In terms of a perfectly productive day, for how long a stretch of time, in one session, do you try to work and focus on a thing? Is it a couple of hours, is it one hour, 30 minutes, 10 minutes? I can probably go a small few hours, and then I need some breaks in between, for food and stuff. But it's still really hard to accumulate hours. I was using a tracker that told me exactly how much time I spent coding on any one day, and even on a very productive day I still spent only like six or eight hours, just because there's so much padding: commute, talking to people, food, et cetera. The cost of life, just living and sustaining and homeostasis and maintaining yourself as a human, is very high. And there seems to be a desire within the human mind to participate in society that creates that padding. Yeah, because the most productive days I've ever had were just completely, from start to finish, tuning out everything and just sitting there. And then you can do more than six or eight hours? Yeah. Is there some wisdom about what gives you the strength to do tough days of long focus? Yeah: whenever I get obsessed about a problem, something just needs to work, something just needs to exist. It needs to exist. So you're able to deal with bugs and programming issues and technical issues
and design decisions that turn out to be the wrong ones; you're able to think through all of that, given that you want the thing to exist. Yeah, it needs to exist, and then I think, to me, a big factor is also: are other humans going to appreciate it, are they going to like it? That's a big part of my motivation. If I'm helping humans and they seem happy, they say nice things, they tweet about it or whatever, that gives me pleasure, because I'm doing something useful. So you do see yourself sharing it with the world, like on GitHub, with a blog post, or through videos. Yeah, I was thinking about it: suppose I did all these things but did not share them; I don't think I would have the same amount of motivation that I can build up. You enjoy the feeling of other people gaining value and happiness from the stuff you've created. Yeah. What about diet? I saw you playing with intermittent fasting. Do you fast, does that help? Of the things you've played with, what's been most beneficial to your ability to mentally focus on a thing, and just mental productivity and happiness? Do you still fast? Yeah, I still fast, I do intermittent fasting, but really what it means at the end of the day is I skip breakfast. So I do 18:6, roughly, by default, when I'm in my steady state. If I'm traveling or doing something else I will break the rules, but in my steady state I do 18:6, so I eat only from 12 to 6. Not a hard rule, and I break it often, but that's my default. And then, yeah, I've done a bunch of random experiments. For the most part, right now, where I've been for the last year and a half, I want to say, is plant-based, or plant-forward. I heard "plant-forward"; it sounds better. Exactly, I don't actually know the difference, but it sounds better in my mind. It just means I prefer plant-based food. Raw or cooked? I prefer cooked, and plant-based. Oh, forgive me, I don't actually know how much the category of plant-based entails. Well, it just means that you're flexible, and you just prefer to eat plants, and you're not trying to influence other people, and if you come to someone's house party and they serve you a steak that they're really proud of, you will eat it. Yes. Right, there's no judgment. Oh, that's beautiful. I'm very flexible on that side of it. Have you tried doing one meal a day? I have, accidentally, not consistently, but I've accidentally had that. I don't like it; I think it makes me feel not good, it's too much of a hit. Yeah. So currently I have about two meals a day, at 12 and 6. I do that non-stop, I'm doing it now; I'm doing one meal a day. Okay, so it's an interesting feeling. Have you ever fasted longer than a day? Yeah, I've done a bunch of water fasts, because I was curious what happens. Anything interesting? Yeah, I would say so. I mean, what's interesting is that you're hungry for two days, and then starting day three or so you're not hungry. It's such a weird feeling, because you haven't eaten in a few days and you're not hungry. Isn't that weird? It's really one of the many weird things about human biology; it figures something out, it finds another source of energy or something like that, or relaxes the system, I don't know how. Yeah, the body is like, you're hungry, you're hungry, and then it just gives up. It's like, okay, I guess we're fasting now, there's nothing, and then it just
kind of focuses on trying to make you not hungry, and not feel the damage of that, and trying to give you some space to figure out the food situation. So, are you still, to this day, most productive at night? I would say I am, but it is really hard to maintain my PhD schedule, especially when I was working at Tesla and so on; it's a non-starter. But even now, people want to meet for various events, society lives in a certain period of time, and you sort of have to work with that. So it's hard to do a social thing and then, after that, return and do work. Yeah, it's just really hard. That's why, when I do social things, I try not to do too much drinking, so I can return and continue doing work. But at Tesla, or at any company, is there a convergence towards a schedule, or is that just how humans behave when they collaborate? I need to learn about this. Do they try to keep a consistent schedule, where you're all awake at the same time? I mean, I do try to create a routine, and I try to create a steady state in which I'm comfortable. So I have a morning routine, I have a day routine; I try to keep things in a steady state, so things are predictable, and then your body just sort of sticks to that, and if you try to stress it a little too much it creates problems; you know, when you're traveling and you're dealing with jet lag, you're not able to really ascend to where you need to go. Yeah, that's weird too, us humans, with the habits and stuff. What are your thoughts on work-life balance throughout a human lifetime? Tesla was known for pushing people to their limits, in terms of what they're able to do, what they're trying to do, how much they work, all that kind of stuff. Yeah, I will say Tesla gets a little too much of a bad rep for this, because what's happening is Tesla is a bursty environment. I would say the baseline, my only point of reference, is Google, where I've interned three times and I saw what it's like inside Google and DeepMind. I would say the baseline is higher than that, but then there's a punctuated equilibrium, where once in a while there's a fire and people work really hard. So it's spiky and bursty, and then all the stories get collected about the bursts, and then it gives the appearance of total insanity, but actually it's just a bit more intense environment, and there are fires and sprints. So I would say it's a more intense environment than what you would get elsewhere. But for you personally, forget all of that, just in your own personal life: what do you think about the happiness of a human being, a brilliant person like yourself, about finding a balance between work and life? Or is such a thing not a good thought experiment? Yeah, I think balance is good, but I also love to have sprints that are out of distribution, and that's when I think I've been pretty creative as well. Sprints out of distribution means that most of the time you have, quote-unquote, balance. I have balance most of the time, yes, and I like being obsessed with something once in a while. Once in a while is what, once a week, once a month, once a year? Probably, say, once a month or something. Yeah, and that's when we get a new GitHub repo? Yeah, that's when you really care about a problem: it must exist, this will
be awesome. You're obsessed with it, and now you can't just do it on that day; you need to pay the fixed cost of getting into the groove, and then you need to stay there for a while, and then society will come and try to mess with you and try to distract you. Yeah, the worst thing is a person who's like, I just need five minutes of your time. Yeah, the cost of that is not five minutes, and society needs to change how it thinks about just five minutes of your time. Right, it's never just one minute, it's just thirty, it's just a quick thing, what's the big deal, why are you being so... yeah. What's your computer setup? What's the perfect setup? Are you somebody that's flexible no matter what, a laptop, four screens, or do you prefer a certain setup that you're most productive in? I guess the one that I'm familiar with is one large screen, 27 inch, and my laptop on the side. With which operating system? I do Mac, that's my primary for all tasks, I would say macOS, but when you're working on deep learning everything is Linux; you SSH into a cluster and you're working remotely. But what about the actual development, using the IDE? I think a good way is: you just run VS Code, my favorite editor right now, on your Mac, but you have a remote folder through SSH, so the actual files that you're manipulating are on the cluster somewhere else. So what's the best IDE? VS Code. What else do people use? I use Emacs still. That's cool. It may be cool, I don't know if it's maximum productivity. So what do you recommend in terms of editors? You've worked with a lot of software engineers; editors for Python, C++, machine learning applications. I think the current answer is VS Code. Currently, I believe that's the best IDE. It's got a huge amount of extensions, and it has the GitHub Copilot integration, which I think is very valuable. What do you think about the Copilot integration? I actually got to talk a bunch with Guido van Rossum, who's the creator of Python, and he loves Copilot; he programs a lot with it. Do you? Yeah, I use Copilot, I love it, and it's free for me, but I would pay for it. I think it's very good, and the utility that I found with it is... I would say there is a learning curve, and you need to figure out when it's helpful and when to pay attention to its outputs, and when it's not going to be helpful and you should not pay attention to it. Because if you're just reading its suggestions all the time, it's not a good way of interacting with it. But I think I was able to mold myself to it. I find it's very helpful, number one, in copy-paste-and-replace-some-parts: when the pattern is clear, it's really good at completing the pattern. And number two, sometimes it suggests APIs that I'm not aware of, so it tells you about something that you didn't know, and that's an opportunity to discover. I would never take Copilot code as given; I almost always copy it into a Google search, and you see what this function is doing, and then you're like, oh, it's actually exactly what I need, thank you Copilot. So you learned something. So it's in part a search engine, in part maybe getting the exact syntax correct, which, once you see it... Yep, it's that NP-hard thing: once you see it, you know it's correct. Exactly. You yourself can struggle: you can verify efficiently, but you can't generate efficiently.
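A tiny illustration of that "easy to verify, hard to generate" point, and of not taking a suggestion as given: pretend the function body below was proposed by an autocomplete tool, and a few cheap checks either accept it or catch the classic off-by-one. The suggested function and the checks are invented for this sketch.

```python
def chunk(xs, n):
    """Suggested completion: split xs into consecutive pieces of length n."""
    return [xs[i:i + n] for i in range(0, len(xs), n)]

def looks_right():
    checks = [
        (chunk([1, 2, 3, 4, 5], 2), [[1, 2], [3, 4], [5]]),
        (chunk([], 3),              []),
        (chunk([7], 1),             [[7]]),
    ]
    return all(got == want for got, want in checks)

# Verification is cheap even when writing the code from scratch would be slower.
print("accept suggestion" if looks_right() else "reject: likely off-by-one")
```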
And Copilot, really, I mean, it's autopilot for programming, right? And currently it's doing the lane following, which is the simple copy-paste and the occasional suggestion, but over time it's going to become more and more autonomous, and the same thing will play out in not just coding but in many, many different things, probably. But coding is an important one, right, writing programs. How do you see the future of that developing, the program synthesis, being able to write programs that are more and more complicated? Because right now it's human-supervised in interesting ways; it feels like the transition will be very painful. My mental model for it is that the same thing will happen as with Autopilot: currently it's doing lane following, it's doing some simple stuff, and eventually it will be doing autonomy, and people will have to intervene less and less. And there could be testing mechanisms: if it writes a function and that function looks pretty damn correct, how do you know it's correct? Because you're getting lazier and lazier as a programmer, and your ability to catch the little bugs... but I guess it won't make... No, it will. Copilot will make off-by-one subtle bugs; it has done that to me. But do you think future systems will, or is the off-by-one actually a fundamental challenge of programming? In that case it wasn't fundamental, and I think things can improve, but yeah, I think humans have to supervise. I am nervous about people not supervising what comes out, and about what happens, for example, with the proliferation of bugs in all of our systems. I'm nervous about that, but I think there will probably be other Copilots for bug finding and stuff like that at some point, because there will be a lot more automation. Oh man, so it's like a Copilot that generates a compiler, one that does a linter, one that does a type checker. It's a committee of GPTs, sort of, and then there will be a manager for the committee, and then there will be somebody that says a new version of this is needed, we need to regenerate it. Yeah: ten GPTs were run forward and gave fifty suggestions, another one looked at them and picked a few that it liked, a bug one looked at them and said this one's probably a bug, they got re-ranked by some other thing, and then a final ensemble GPT comes in and says, okay, given everything you guys have told me, this is probably the next token.
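A toy sketch of that "committee of GPTs" joke, which is close in spirit to generate-many-then-re-rank (best-of-n with a critic). Every component here is a stub invented for illustration; in practice each role would be a model call, a real linter, or a test run.

```python
def proposers(task):
    return [f"{task} -> draft {i}" for i in range(5)]   # several candidate drafts

def critic(candidate):
    return "draft 3" not in candidate                   # pretend draft 3 is flagged as buggy

def ranker(candidate):
    return -len(candidate)                              # silly stand-in score: prefer shorter

def committee(task):
    survivors = [c for c in proposers(task) if critic(c)]
    return max(survivors, key=ranker)                   # final "ensemble" pick

print(committee("sort a list"))
```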
You know, the feeling is that the number of programmers in the world has been growing and growing very quickly. Do you think it's possible that it'll actually level out and drop to a very low number in this kind of world? Because then you'll be doing software 2.0 programming, and you'll be doing this kind of generation-with-Copilot-type-systems programming, but you won't be doing the old-school software 1.0 programming. I don't currently think they're just going to replace human programmers. I'm so hesitant saying stuff like this, right, because it may turn out that this was where we were wrong... I agree with you, but I think we might be very surprised, right? What's your sense of what we're seeing with language models? Does it feel like the beginning, the middle, or the end? The beginning, a hundred percent. I think the big question in my mind is: for sure, GPT will be able to program quite well, competently, and so on. How do you steer the system? You still have to provide some guidance to what you actually are looking for. So how do you steer it, how do you talk to it, how do you audit it and verify that what is done is correct, and how do you work with this? It's as much not just an AI problem as a UI/UX problem. Yeah, so it's beautiful, fertile ground for so much interesting work, for VS Code++, where it's not just human programming anymore; it's amazing. Yeah, you're interacting with the system, so it's not just one prompt, it's iterative prompting; you're trying to figure it out, having a conversation with the system. Yeah. That, to me, is super exciting: to have a conversation with the program I'm writing. Yeah, maybe at some point you're just conversing with it: okay, here's what I want to do; actually, this variable, maybe it's not even as low-level as a variable... You can also imagine things like, can you translate this to C++ and back to Python? Yeah, that already kind of exists. No, but just doing it as part of the programming experience: I think I'd like to write this function in C++, or you just keep switching between different programming languages because they have different syntax; maybe I want to convert this into a functional language. So you get to become multilingual as a programmer and dance back and forth efficiently. Yeah, I mean, I think the UI/UX of it is still very hard to think through, because it's not just about writing code on a page. You have an entire developer environment, you have a bunch of hardware on it, you have some environment variables, you have some scripts that are running in a cron job; there's a lot going on to working with computers. How do these systems set up environment flags, work across multiple machines, set up screen sessions, automate different processes, and how is all of that auditable by humans? That's a massive question at the moment. You've built arXiv-sanity. What is arXiv, and what is the future of academic research publishing that you would like to see? So arXiv is this pre-print server: if you have a paper, you can submit it for publication to journals or conferences and then wait six months and maybe get a decision, pass or fail, or you can just upload it to arXiv, and then people can tweet about it three minutes later, and then everyone sees it, everyone reads it, and everyone can profit from it in their own ways. You can cite it, and it has an official look to it; it feels like a publication process. Yeah, it feels different than if you just put it in a blog post. Oh yeah, I mean, it's a paper, and usually the bar is higher for something that you would expect on arXiv, as opposed to something you would see in a blog post. Well, the culture created the bar, because you could probably post a pretty crappy paper to arXiv. So what does that make you feel about peer review: rigorous peer review by two or three experts, versus the peer review of the community, right as it's written? Yeah, basically I think the community is very well able to peer-review things very quickly on Twitter, and I think maybe that has something to do with the AI and machine learning field specifically: I feel like things are more easily auditable, and the verification is easier, potentially, than the verification somewhere else. So it's kind of like... you can think of these scientific
publications as little blockchains, where everyone's building on each other's work and citing each other, and in AI you have this much faster and looser blockchain, where any one individual entry is very cheap to make. And then you have other fields where maybe that model doesn't make as much sense. So I think in AI, at least, things are pretty easily verifiable, and so when people upload papers that are a really good idea and so on, people can try them out the next day, and they can be the final arbiter of whether it works or not on their problem, and the whole thing just moves significantly faster. So I kind of feel like academia, sorry, this conference and journal process, still has a place, but it lags behind, I think, and it's maybe a bit higher-quality process, but it's not the place where you will discover cutting-edge work anymore. It used to be the case, when I was starting my PhD, that you would go to conferences and journals and discuss all the latest research; now, when you go to a conference or journal, no one discusses anything that's there, because it's already three generations old and irrelevant. Yes, which makes me sad about DeepMind, for example, where they still publish in Nature and these big prestigious venues. I mean, there's still value to the prestige that comes with these big venues, but the result is that they'll announce some breakthrough performance and it'll take like a year to actually publish the details. And those details, if they were published immediately, would inspire the community to move in certain directions. Yeah, it would speed up the rest of the community, but I don't know to what extent that's part of their objective function. That's true, so it's not just the prestige; a little bit of the delay is part of it. Yeah, certainly DeepMind specifically has been working in the regime of having a slightly higher-quality process and latency, and publishing those papers that way. Another question from Reddit: do you, or have you, suffered from imposter syndrome, being the director of AI at Tesla, being this person, when you're at Stanford, where the world looks at you as the expert in AI, to teach the world about machine learning? When I was leaving Tesla, after five years, I spent a ton of time in meeting rooms, and I would read papers. In the beginning, when I joined Tesla, I was writing code, and then I was writing less and less code, and I was reading code, and then I was reading less and less code. So this is just a natural progression that happens, I think, and definitely I would say near the tail end, that's when it starts to hit you a bit more: you're supposed to be an expert, but actually the source of truth is the code that people are writing, the GitHub and the actual code itself, and you're not as familiar with that as you used to be. So I would say maybe there's some insecurity there. Yeah, that's actually pretty profound, that a lot of the insecurity has to do with not writing the code. And in the computer science space, that is the truth: the code is the source of truth. The papers and everything else are a high-level summary. Yeah, just a high-level summary; at the end of the day you have to read code. It's impossible to translate all that code into actual paper form. So
when things come out, especially when they have source code available, that's my favorite place to go. So, like I said, you're one of the greatest teachers of machine learning and AI ever, from CS231n to today. What advice would you give to beginners interested in getting into machine learning? Beginners are often focused on what to do, and I think the focus should be more on how much you do. I'm a believer, on a high level, in the 10,000 hours kind of concept, where you just have to pick the things that you can spend time on and that you care about and you're interested in, and you literally have to put in 10,000 hours of work. It doesn't even matter as much where you put it; you'll iterate and you'll improve and you'll waste some time. I don't know if there's a better way: you need to put in 10,000 hours. But I think it's actually really nice, because I feel like there's some sense of determinism about being an expert at a thing if you spend 10,000 hours. You can literally pick an arbitrary thing, and I think if you spend 10,000 hours of deliberate effort and work, you actually will become an expert at it, and so I think it's kind of a nice thought. So basically I would focus more on: are you spending 10,000 hours? That's what I would focus on. And then thinking about what kind of mechanisms maximize your likelihood of getting to 10,000 hours. Exactly, which for us silly humans means probably forming a daily habit of, every single day, actually doing the thing. Whatever helps you; I do think, to a large extent, it's a psychological problem for yourself. One other thing that I think is helpful for the psychology of it: many times people compare themselves to others in the area. I think this is very harmful. Only compare yourself to you from some time ago, say a year ago. Are you better than you were a year ago? This is the only way to think, and then you can see your progress, and it's very motivating. That's so interesting, that focus on the quantity of hours, because I think a lot of people in the beginner stage, but actually throughout, get paralyzed by the choice: which one do I pick, this path or that path? They'll literally get paralyzed by which IDE to use. Well, they're worried about all these things, but the thing is, you will waste time doing something wrong. You will eventually figure out it's not right, you will accumulate scar tissue, and next time you'll grow stronger, because next time you'll have that scar tissue, you'll learn from it, and when you come into a similar situation you'll be like, all right, I messed up before. I've spent a lot of time working on things that never materialized into anything, and I have all that scar tissue, and I have some intuitions about what was useful, what wasn't useful, how things turned out, so all those mistakes were not dead work. So I just think you should focus on the work: what have you done, what have you done last week? That's a good question, actually, to ask for a lot of things, not just machine learning. It's a good way to cut the... I forget the term... the fluff, the blubber, whatever, the inefficiencies in life. What do you love about teaching? You seem to find yourself often drawn to teaching; you're very good at it, but you're also drawn to it. I mean, I don't think I love teaching; I love happy humans, and happy humans like when
I teach. Yes. I wouldn't say I hate teaching, I tolerate teaching, but it's not the act of teaching itself that I like. It's that I have something I'm actually okay at: I'm okay at teaching, and people appreciate it a lot, and so I'm just happy to try to be helpful. But teaching itself is not the most... I mean, it can be really annoying, frustrating. I was working on a bunch of lectures just now, and I was reminded of my days of CS231n and just how much work it is to create some of these materials and make them good: the amount of iteration and thought, and you go down blind alleys, and just how much you change it. So creating something good in terms of educational value is really hard, and it's not fun, it's difficult. So people should definitely go watch your new stuff you've put out. There are lectures where you're actually building the thing, like, as you said, the code is the truth: discussing backpropagation by building it, by working through it, the whole thing. So how difficult is that to prepare for? I think that's a really powerful way to teach. Did you have to prepare for that, or are you just thinking through it live? I will typically do, say, three takes, and then I take the better take. So I do multiple takes and take some of the better ones, and then I just build out a lecture that way. Sometimes I have to delete 30 minutes of content because it just went down a blind alley that I didn't like too much. So there's a bunch of iteration, and it probably takes me somewhere around 10 hours to create one hour of content. To give one hour, interesting. Is it difficult to go back to the basics? Do you draw a lot of wisdom from going back to the basics? Yeah, going back to backpropagation, loss functions, where they come from. One thing I like about teaching a lot, honestly, is that it definitely strengthens your understanding, so it's not a purely altruistic activity; it's a way to learn. If you have to explain something to someone, you realize you have gaps in your knowledge, and so I even surprised myself in those lectures: oh, the result will obviously look like this, and then the result doesn't look like that, and I'm like, okay, I thought I understood this. Yeah, but that's why it's really cool to literally code it: you run it in a notebook and it gives you a result and you're like, oh, wow: actual numbers, actual input, actual code. Yeah, it's not mathematical symbols, et cetera; the source of truth is the code, it's not slides, it's just, let's build it. It's beautiful. You're a rare human in that sense. What advice would you give to researchers trying to develop and publish ideas that have a big impact in the world of AI, so maybe undergrads, maybe early graduate students? Yep. I would say they definitely have to be a little bit more strategic than I had to be as a PhD student, because of the way AI is evolving. It's going the way of physics, where in physics you used to be able to do experiments on your benchtop, and everything was great and you could make progress, and now you have to work at something like the LHC at CERN, and AI is going in that direction as well. So there are certain kinds of things that are just not possible to do on the benchtop anymore, and that didn't used to be the case. Do you still think that there are GAN-type papers to be written, where a very simple idea requires just one computer to illustrate
a simple example? I mean, one example that's been very influential recently is diffusion models. Diffusion models are amazing. Diffusion models are something like six years old; for the longest time people were kind of ignoring them, as far as I can tell, and they're an amazing generative model, especially in images. Stable Diffusion and so on, it's all diffusion-based. Diffusion is new, it was not there before, and it came from... well, it came from Google, but a researcher could have come up with it; in fact, some of the first... actually, no, those came from Google as well, but a researcher could come up with that in an academic institution. Yeah. What do you find most fascinating about diffusion models, from the societal impact to the technical architecture? What I like about diffusion is that it works so well. Is that surprising to you, the amount, the variety, almost the novelty of the synthetic data it's generating? Yeah, the Stable Diffusion images are incredible. The speed of improvement in generating images has been insane: we went very quickly from generating tiny digits to tiny faces, and it all looked messed up, and now we have Stable Diffusion, and that happened very quickly. There's a lot that academia can still contribute. For example, FlashAttention is a very efficient kernel for running the attention operation inside the Transformer; that came from an academic environment. It's a very clever way to structure the kernel that does the calculation so it doesn't materialize the attention matrix. So I think there are still lots of things to contribute, but you have to be just more strategic.
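A minimal numpy sketch of the forward (noising) half behind the DDPM-style diffusion models discussed above; the linear beta schedule and the 1,000 steps are conventional but arbitrary choices here, and the learned denoiser that reverses the process is the part not shown.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def noise_sample(x0, t, rng):
    # x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps          # a network would be trained to predict eps from (xt, t)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((32, 32))       # stand-in for a training image
xt, eps = noise_sample(x0, t=500, rng=rng)
print(xt.std())                           # roughly unit scale; mostly noise by t=500
```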
Do you think neural networks could be made to reason? Yes. Do you think they already reason? Yes. What's your definition of reasoning? Information processing. So, in the way that humans think through a problem and come up with novel ideas, it feels like reasoning. Yeah, the novelty... I don't want to say novelty, but out-of-distribution ideas, you think that's possible? Yes, and I think we're seeing that already in the current neural nets: you're able to remix the training-set information into true generalization, in some sense, to something that doesn't appear in the training set. You're doing something interesting algorithmically, you're manipulating some symbols, and you're coming up with a correct, unique answer in a new setting. What would illustrate to you: holy shit, this thing is definitely thinking? To me, thinking or reasoning is just information processing and generalization, and I think the neural nets already do that today. So being able to perceive the world, or perceive whatever the inputs are, and make predictions based on that, or actions based on that, that's reasoning. Yeah, you're giving correct answers in novel settings by manipulating information. You've learned the correct algorithm; you're not doing just some kind of lookup table or nearest-neighbor search. Let me ask you about AGI. What are some moonshot ideas you think might make significant progress towards AGI, or, put another way, what are the big blockers that we're missing now? So basically I am fairly bullish on our ability to build AGIs: basically automated systems that we can interact with, that are very human-like, and we can interact with them in a digital realm or a physical realm. Currently it seems most of the models that do these sort of magical tasks are in the text realm. As I mentioned, I'm suspicious that the text realm is not enough to actually build full understanding of the world. I do actually think you need to go into pixels and understand the physical world and how it works, so I do think that we need to extend these models to consume images and videos and train on a lot more data that is multimodal in that way. Do you think you need to touch the world to understand it also? Well, that's the big open question in my mind: if you also require embodiment and the ability to interact with the world, run experiments, and have data of that form, then you need to go to Optimus or something like that. And so I would say Optimus, in some way, is like a hedge on AGI, because it seems to me that it's possible that just having data from the internet is not enough. If that is the case, then Optimus may lead to AGI, because, to me, there's nothing beyond Optimus: you have this humanoid form factor that can actually do stuff in the world, you can have millions of them interacting with humans and so on, and if that doesn't give rise to AGI at some point, I'm not sure what will. So from a completeness perspective, I think that's a really good platform, but it's a much harder platform, because you are dealing with atoms, and you need to actually build these things and integrate them into society. So I think that path takes longer, but it's much more certain. And then there's the path of the internet, just training these compression models effectively, trying to compress all of the internet, and that might also give rise to these agents. Compress the internet, but also interact with the internet. Yeah. So it's not obvious to me; in fact, I suspect you can reach AGI without ever entering the physical world, which is a little bit more concerning, because that might result in it happening faster. It just feels like we're the frog in the slowly boiling water: we won't know as it's happening. I'm not afraid of AGI, I'm excited about it, there are always concerns, but I would like to know when it happens, and have hints about when it happens, like, a year from now it will happen, that kind of thing. I just feel like in the digital realm it just might happen. Yeah. I think all we have available to us, because no one has built AGI yet, is: is there enough fertile ground on the periphery? I would say yes, and we have the progress so far, which has been very rapid, and there are next steps that are available. So I would say, yeah, it's quite likely that we'll be interacting with digital entities. How will you know that somebody has built it? I think it's going to be a slow, incremental transition. It's going to be product-based and focused: it's going to be GitHub Copilot getting better, and then GPTs helping you write, and then these oracles that you can go to with mathematical problems. I think we're on the verge of being able to ask very complex questions in chemistry, physics, and math of these oracles and have them complete solutions. So AGI to you is primarily focused on intelligence, so consciousness doesn't enter into it. So in my mind, consciousness is not a special thing you will figure out and bolt on. I think it's an emergent phenomenon of a large enough and complex enough generative model, sort of. If you have a complex enough world model that understands the world, then it also understands its predicament in the world as being a language model, which to me is a form of consciousness or self-awareness. And so, in order to
understand the world deeply, you probably have to integrate yourself into the world, and in order to interact with humans and other living beings, consciousness is a very useful tool. I think consciousness is like a modeling insight. A modeling insight? Yeah: you have a powerful enough model of understanding the world that you actually understand that you are an entity in it. Yeah, but there's also, perhaps, just a narrative we tell ourselves: it feels like something to experience the world, the hard problem of consciousness. But that could be just the narrative we tell ourselves. Yeah, I think it will emerge. I think it's going to be something very boring: we'll be talking to these digital AIs, they will claim they're conscious, they will appear conscious, they will do all the things that you would expect of other humans, and it's going to just be a stalemate. I think there will be a lot of actually fascinating ethical questions, Supreme Court-level questions, of whether you're allowed to turn off a conscious AI, whether you're allowed to build a conscious AI. Maybe there will have to be the same kinds of debates that we have around, sorry to bring up a political topic, abortion, where the deeper question with abortion is what is life, and the deep question with AI is also what is life and what is conscious. And I think that will be very fascinating to bring up. It might become illegal to build systems that are capable of such a level of intelligence that consciousness would emerge, and therefore the capacity to suffer would emerge, and a system that says, no, please don't kill me. Well, that's what the LaMDA chatbot already told that Google engineer, right? It was talking about not wanting to die, and so on. So it might become illegal to do that, right, because otherwise you might have a lot of creatures that don't want to die, and you can just spawn infinitely many of them on a cluster, and then that might lead to horrible consequences, because there might be a lot of people that secretly love murder and will start practicing murder on those systems. I mean, to me, all of this stuff just brings a beautiful mirror to the human condition and human nature, and we'll get to explore it, and that's, like, the best of the Supreme Court, of all the different debates we have about ideas of what it means to be human. We get to ask those deep questions that we've been asking throughout human history. There's always been the other in human history: we're the good guys and those are the bad guys, and, throughout human history, let's murder the bad guys, and the same will probably happen with robots. It'll be the other at first, and then we'll get to ask questions of what it means to be alive, what it means to be conscious. Yeah, and I think there are some canaries in the coal mine even with what we have today. For example, there are these waifus that people interact with, and some company is going to shut down, but this person really loves their waifu and is trying to port it somewhere else, and it's not possible. I think people will definitely have feelings towards these systems, because in some sense they are like a mirror of humanity: they are sort of a big average of humanity, in the way they're trained. But we can
shape that average; we can actually watch it. It's nice to be able to interact with the big average of humanity and do a search query on it. Yeah, it's very fascinating. And we can also, of course, shape it: it's not just a pure average, we can mess with the training data, we can mess with the objective, we can fine-tune them in various ways, so we have some impact on what those systems look like. If you were to achieve AGI, and you could have a conversation with her and ask her about anything — maybe ask her a question — what kind of stuff would you ask? I would have some practical questions in my mind, like: do I or my loved ones really have to die? What can we do about that? Do you think it would answer clearly, or would it answer poetically? I would expect it to give solutions. I would expect it to be like, "Well, I've read all of these textbooks and I know all these things that you've produced, and it seems to me like here are the experiments that I think would be useful to run next, here are some gene therapies that I think would be helpful, and here are the kinds of experiments that you should run." Okay, let's go over the thought experiment. Imagine that mortality is actually a prerequisite for happiness, so if we become immortal, we'll actually become deeply unhappy, and the model is able to know that. So what is it supposed to tell you, stupid human — yes, you can become immortal, but you will become deeply unhappy? If the AGI system is trying to empathize with you, human, what is it supposed to tell you: that yes, you don't have to die, but you're really not going to like it? Is it going to be deeply honest? Like in Interstellar — the AI says humans want only 90 percent honesty — so you have to pick how honest you want the answers to these practical questions to be. Yeah, I love Interstellar, by the way. That AI is kind of a sidekick to the entire story, but at the same time it's really interesting. It's kind of limited in certain ways, right? Yeah, it's limited, and I think that's totally fine, by the way. I think it's fine to have limited and imperfect AGIs. Is that almost a feature? As that example shows, it has a fixed amount of compute on its physical body, and it might just be that even though you can have a super amazing, mega-brain, superintelligent AI, you can also have less intelligent AIs that you can deploy in a power-efficient way, and then they're not perfect — they might make mistakes. No, I meant more like: say you had infinite compute, and it's still good to make mistakes sometimes, in order to integrate yourself. Going back to Good Will Hunting, Robin Williams' character says the human imperfections, that's the good stuff, right? Isn't that the idea — we don't want perfect, we want flaws, in part to form connections with each other, because the flaws feel like something you can attach your feelings to. In that same way, you want an AI that's flawed. I don't know. But then you're saying, okay, that's not AGI. But see, AGI would need to be intelligent enough to give answers to humans that humans don't understand, and I think "perfect" is something humans can't understand, because even science doesn't give perfect answers — there are always gaps and mysteries — and I don't know if humans want perfect. Yeah, I could imagine just having a conversation with this kind of oracle entity, as you'd imagine
them, and yeah, maybe it can tell you, "Based on my analysis of the human condition, you might not want this, and here are some of the things that might..." But every dumb human will say, "Yeah, yeah, trust me, give me the truth, I can handle it." But that's the beauty — people can choose. But then there's the old marshmallow test with the kids and so on. I feel like too many people can't handle the truth, probably including myself — the deep truth of the human condition. I don't know if I can handle it. What if there's some dark stuff? What if we are an alien science experiment and it realizes that? I mean, this is the Matrix all over again. I don't know what I would talk about. I probably would go with the safe scientific questions at first, ones that have nothing to do with my own personal life or immortality — just about physics and so on — to build up, to see where it's at. Or maybe see if it has a sense of humor. That's another question: presumably, if it understands humans deeply, it would be able to generate humor? Yeah, I think that's actually a wonderful benchmark, almost — I think that's a really good point — basically, to make you laugh. Yeah, if it's able to be a very effective stand-up comedian, that is doing something very interesting computationally. I think being funny is extremely hard. Yeah, because it's hard in the way the original intent of the Turing test is hard: you have to convince humans. That's why, when comedians talk about this, it's deeply honest — if people can't help but laugh, that's funny, and if they don't laugh, that means you're not funny. And you need a lot of knowledge to create humor, about the human condition and so on, and then you need to be clever with it. You mentioned a few movies. You tweeted, "Movies that I've seen five-plus times but am ready and willing to keep watching": Interstellar, Gladiator, Contact, Good Will Hunting, The Matrix, Lord of the Rings (all three), Avatar, The Fifth Element, and so on — it goes on — Terminator 2, Mean Girls, which I'm not going to ask about. Mean Girls is great. What are some that jump out in your memory that you love, and why? You mentioned The Matrix — as a computer person, why do you love The Matrix? There are so many properties that make it beautiful and interesting. There are all these philosophical questions, but then there are also AGIs, and there's simulation, and it's cool — the look of it, the feel of it, the action, the bullet time. It was just innovating in so many ways. And then Good Will Hunting — why do you like that one? I just really like this tortured-genius sort of character who's grappling with whether or not he has any responsibility, or what to do with this gift that he was given, or how to think about the whole thing. And there's also a dance between the genius and the personal — what it means to love another human being. There are a lot of themes there; it's just a beautiful movie. And then the fatherly figure, the mentor, the psychiatrist — it really messes with you, you know. There are some movies that just really mess with you on a deep level. Do you relate to that movie at all? No.
"It's not your fault." As I said, Lord of the Rings, that's self-explanatory. Terminator 2, which is interesting — you've watched that a lot. Is that better than Terminator 1? Do you just like Arnold? I do like Terminator 1 as well; I like Terminator 2 a little bit more, but in terms of its surface properties. [Laughter] Do you think Skynet is at all a possibility? Oh yes. Like the actual autonomous weapon system kind of thing — do you worry about that stuff? I one hundred percent worry about it. I mean, some of these fears of AGIs and how this will play out — these will probably be very powerful entities at some point, and for a long time they're going to be tools in the hands of humans. People talk about alignment of AGIs and how to make them safe, but the problem is that even humans are not aligned. So how this will be used and what it's going to look like is, yeah, troubling. Do you think it'll happen slowly enough that we'll be able, as a human civilization, to think through the problems? Yes, that's my hope — that it happens slowly enough, and in an open enough way, where a lot of people can see it and participate in it, and we just figure out how to deal with this transition, which is going to be interesting. I draw a lot of inspiration from nuclear weapons, because I sure thought it would be fucked once they developed nuclear weapons, but it's almost like, when the systems are not so dangerous that they destroy human civilization, we deploy them and learn the lessons; and if it's too dangerous, we might still deploy it, but we very quickly learn not to use it. And so there's a balance there. Humans are very clever as a species — it's interesting, we exploit the resources as much as we can, but we avoid destroying ourselves, it seems. Well, I don't know about that, actually; I hope it continues. I mean, I'm definitely concerned about nuclear weapons and so on, not just as a result of the recent conflict — even before that, it's probably my number one concern for society. So if humanity destroys itself, or destroys 90 percent of people, that would be because of nukes, I think. And it's not even about full destruction — to me, it's bad enough if we reset society. That would be terrible, really bad, and I can't believe we're so close to it. It's so crazy to me; it feels like we might be a few tweets away from something like that. Yep, basically. It's extremely unnerving, and it has been for me for a long time. It seems unstable that world leaders, just having a bad mood, can take one step towards a bad direction and it escalates. Yeah, and because of a collection of bad moods it can escalate, without anyone being able to stop it. Yeah, it's just a huge amount of power, and then also with the proliferation — basically, I don't actually know what the good outcomes are here. So I'm definitely worried about that a lot. And then AGI is not currently there, but at some point we'll more and more approach something like it. The danger with AGI is, in a sense, even worse: there are good outcomes of AGI, but the bad outcomes are, in an absolute sense, only a tiny distance away. I think capitalism and humanity and so on will drive toward the positive ways of using that technology, but if the bad outcomes are just a tiny flip of a
minus sign away, that's a really bad position to be in — a tiny perturbation of the system results in the destruction of the human species. It's a weird line to walk. Yeah, I think in general what's really weird about the dynamics of humanity and this explosion we've talked about is just the insane coupling afforded by technology, and just the instability of the whole dynamical system. I think it just doesn't look good, honestly. Yes, that explosion could be destructive or constructive, and the probabilities are non-zero in both senses. I do feel like I have to try to be optimistic, and yes, I think even in this case I still am predominantly optimistic, but there's definitely... Me too. Do you think we'll become a multiplanetary species? Probably yes, but I don't know if it's the dominant feature of future humanity. There might be some people on some planets and so on, but I'm not sure if it will be a major player in our culture. We still have to solve the drivers of self-destruction here on Earth, so just having a backup on Mars is not going to solve the problem. By the way, I love the backup on Mars — I think that's amazing, we should absolutely do that, and I'm so thankful. Would you go to Mars? Personally, no; I do like Earth quite a lot. Okay, I'll go to Mars — I'll go for you, and I'll tweet at you from there. Maybe eventually I would, once it's safe enough, but I don't actually know if it's on my lifetime scale, unless I can extend it by a lot. I do think that, for example, a lot of people might disappear into virtual realities and stuff like that, and I think that could be the major thrust of the cultural development of humanity, if it survives. So it might not be Mars — it's just really hard to work in the physical realm and go out there, and I think ultimately all your experiences are in your brain, so it's much easier to disappear into the digital realm, and I think people will find it more compelling, easier, safer, more interesting. So you're a little bit captivated by virtual reality, by the possible worlds, whether it's the metaverse or some other manifestation of that? Yeah, it's really interesting. I'm interested — I've been talking a lot to Carmack — what's the thing that's currently preventing that? I mean, to be clear, I think what's interesting about the future is that the variance in the human condition grows — that's the primary thing that's changing. It's not so much the mean of the distribution, it's the variance of it. So there will probably be people on Mars, and there will be people in VR, and there will be people here on Earth; there will just be so many more ways of being. So I kind of see it as a spreading out of the human experience. There's something about the internet that allows you to discover those little groups and gravitate to each other — something about your biology likes that kind of world, and you find each other. Yeah, and we'll have transhumanists, and then we'll have the Amish, and everything is just going to coexist. You know, the cool thing about it — because I've interacted with a bunch of internet communities — is that they don't know about each other. You can have a very happy existence just having a very close-knit community and not knowing about the others. I mean, even just having traveled to Ukraine — they don't know so many things about America. When
you travel across the world — I think you experience this too — there are certain cultures that have their own thing going on. And so you can see that happening more and more in the future: we have little communities. Yeah, I think so. That seems to be how it's going right now, and I don't see that trend really reversing. I think people are diverse and able to choose their own path and existence, and I sort of celebrate that. So will you spend much time in the metaverse, in virtual reality? Which community are you — are you the physicalist, the physical-reality enjoyer, or do you see yourself drawing a lot of pleasure and fulfillment from the digital world? Well, currently virtual reality is not that compelling. I do think it can improve a lot, but I don't really know to what extent. Maybe there are actually even more exotic things you can think about, with Neuralink or stuff like that. So currently I kind of see myself as mostly a team-human person. I love nature, I love harmony, I love people, I love humanity, I love the emotions of humanity, and I just want to be in this solarpunk little utopia — that's my happy place. My happy place is people I love, thinking about cool problems, surrounded by lush, beautiful, dynamic nature. Yeah, and secretly high-tech in the places that count — places that use technology to empower that love for other humans and nature. Yeah, technology used very sparingly. I don't love it when it gets in the way of humanity; I like people just being humans, in the way we sort of slightly evolved to prefer, I think, just by default. People kept asking me, because they know you love reading: are there particular books that you enjoyed, that had an impact on you for silly or profound reasons, that you would recommend? You mentioned The Vital Question. Yes, of course. In biology, as an example, The Vital Question is a good one — anything by Nick Lane, really. Life Ascending, I would say, is potentially a bit more representative, as a summary of a lot of the things he's been talking about. I was very impacted by The Selfish Gene. I thought that was a really good book that helped me understand altruism, as an example, and where it comes from. And just realizing that the selection is at the level of genes was a huge insight for me at the time, and it sort of cleared up a lot of things for me. What do you think about the idea that the ideas are the organisms — the memes? Yes, love it, one hundred percent. [Laughter] Are you able to walk around with that notion for a while — that there is an evolutionary kind of process with ideas as well? There absolutely is. There are memes, just like genes, and they compete, and they live in our brains. It's beautiful. Are we silly humans thinking that we're the organisms? Is it possible that the primary organisms are the ideas? Yeah, I would say the ideas kind of live in the software of our civilization, in the minds and so on. And we think, as humans, that the hardware is the fundamental thing — a human is a hardware entity — but it could be the software, right? Yeah, I would say there needs to be some grounding at some point to a physical reality. Yeah, but if we clone an Andrej, the software is the thing that makes that thing special, right? Yeah, I guess you're right. But then cloning might be exceptionally
difficult; there might be a deep integration between the software and the hardware in ways we don't quite understand. Well, from the evolutionary point of view, what makes me special is more like the gang of genes that are riding in my chromosomes, I suppose, right? They're the replicating unit, I suppose. No, but that's just for you — the thing that makes you special. Sure. The reality is, what makes you special is your ability to survive based on the software that runs on the hardware that was built by the genes. So the software is the thing that makes you survive, not the hardware. All right, yeah. It's just a second layer — a new second layer that wasn't there before the brain — and they both coexist. But there are also layers of the software. I mean, it's an abstraction on top of abstractions. Okay, so The Selfish Gene, and Nick Lane. I would say sometimes books are not sufficient; I like to reach for textbooks sometimes. I kind of feel like books are meant for too general a consumption sometimes; they're too high up in the level of abstraction, and it's not good enough. So I like textbooks. I like The Cell — I think The Cell was pretty cool. That's also why I like the writing of Nick Lane: he's pretty willing to step one level down, he's willing to go there, but he's also willing to be throughout the stack. So he'll go down into a lot of detail, but then he'll come back up. Basically, I really appreciate that. That's why I love college — early college, even high school — just textbooks on the basics of computer science, of mathematics, of biology, of chemistry. Yes, those condense it down. They're sufficiently general that you can understand both the philosophy and the details, but you also get homework problems, and you get to play with it almost as much as you would if you were programming stuff. Yeah. And then I'm also suspicious of textbooks, honestly, because, as an example, in deep learning there are no amazing textbooks, and the field is changing very quickly. I imagine the same is true in, say, synthetic biology and so on; books like The Cell are kind of outdated — they're still high-level. What is the actual real source of truth? It's people in wet labs working with cells, sequencing genomes, actually working with it, and I don't have that much exposure to that, or to what that looks like. So I'm reading through The Cell and it's kind of interesting and I'm learning, but it's still not sufficient, I would say, in terms of understanding. Well, it's a clean summarization of the mainstream narrative. Yeah. But you have to learn that before you break out toward the cutting edge. Yeah — what is the actual process of working with these cells, growing them, incubating them? It's kind of like a massive cooking recipe: making sure your cells grow and proliferate, then sequencing them, running experiments, and just how that works, I think, is the source of truth of what's really useful, at the end of the day, in terms of creating therapies and so on. Yeah, I wonder what the future AI textbooks will be, because, you know, there's Artificial Intelligence: A Modern Approach — I actually haven't read the recent version; there's been a recent edition. I also saw there's a science of deep
learning book. I'm waiting for textbooks that are worth recommending, worth reading. It's tricky, because it's really papers and code. Honestly, papers are quite good. I especially like the appendix of any paper: it's the most detail it can have, it doesn't have to be cohesive or connected to anything else, it just describes, in a very specific way, how you solved a particular thing. Yeah, many times papers can actually be quite readable — not always, but sometimes the introduction and the abstract are readable even for someone outside the field. That's not always true, and sometimes, unfortunately, scientists use complex terms even when it's not necessary. I think that's harmful — there's no reason for it — and papers are sometimes longer than they need to be, in the parts that don't matter. The appendix should be long, but the paper itself — you know, look at Einstein, make it simple. Yeah, but certainly I've come across papers, say in synthetic biology or something, that I thought were quite readable in the abstract and the introduction; then you're reading the rest and you don't fully understand it, but you kind of get the gist, and I think that's cool. What advice would you give to folks interested in machine learning and research — but also, in general, life advice to a young person in high school or early college — about how to have a career, or a life, they can be proud of? I'm very hesitant to give general advice; I think it's really hard. Some of the stuff I've mentioned is fairly general, I think: focus on just the amount of work you're spending on a thing, and compare yourself only to yourself, not to others. That's good — I think those are fairly general. How do you pick the thing? You just have a deep interest in something, or try to find the argmax over the things that you're interested in — the argmax at that moment — and stick with it. How do you not get distracted and switch to another thing? Well, if you do an argmax repeatedly every week, it doesn't converge — that's a problem. Yeah, you can low-pass filter yourself, in terms of what has consistently been true for you. I definitely see how it can be hard, but I would say you're going to work the hardest on the thing that you care about the most. So low-pass filter yourself and really introspect: in your past, what were the things that gave you energy, and what were the things that took energy away from you? Concrete examples — and usually, from those concrete examples, patterns can emerge: I like it when things look like this, I like it when I'm in these positions. So it's not necessarily the field, but the kind of stuff you're doing in a particular field. So for you, it seems like you were energized by implementing stuff, building actual things, being low-level, learning, and then also communicating so that others can go through the same realizations and shortening that gap. Yeah — because I usually have to do way too much work to understand a thing, and then I'm like, okay, I think I actually get it, and why was it so much work? It should have been much less work, and that gives me a lot of frustration, and that's why I sometimes go teach. So aside from the teaching you're doing now, putting out videos, and aside from a potential Godfather Part II with AGI at Tesla and beyond, what does the future for Andrej Karpathy hold? Have
you figured that out yet, or no? I mean, as you see through the fog of war that is all of our future, do you start seeing silhouettes of what that possible future could look like? The consistent thing I've always been interested in, for me at least, is AI, and that's probably what I'm spending the rest of my life on, because I just care about it a lot. I actually care about many other problems as well — say aging, which I basically view as a disease. I care about that as well, but I don't think it's a good idea to go after it specifically. I don't actually think that humans will be able to come up with the answer; I think the correct thing to do is to ignore those problems, solve AI, and then use that to solve everything else. I think there's a chance that will work — a very high chance — and that's the way I'm betting, at least. So when you think about AI, are you interested in all kinds of applications, all kinds of domains, where any domain you focus on will give you insights into the big problem of AGI? Yeah, for me it's the ultimate meta-problem. I don't want to work on any one specific problem — there are too many problems. So how can you work on all problems simultaneously? You solve the meta-problem, which to me is just intelligence: how do you automate it. Are there cool small projects, like arxiv-sanity and so on, that you're thinking about, that the ML world can anticipate? There are always some fun side projects. arxiv-sanity is one: basically, there are way too many arXiv papers — how can I organize them and recommend papers and so on. I transcribed all of your podcasts. What did you learn from that experience — from transcribing, from the process of consuming audiobooks and podcasts and so on, and from building a process that achieves closer to human-level performance in annotation? Well, I was definitely surprised that transcription with OpenAI's Whisper was working so well compared to what I'm familiar with from Siri and a few other systems. It works so well, and that's what gave me some energy to try it out, and I thought it could be fun to run it on podcasts. It's kind of not obvious to me why Whisper is so much better compared to anything else, because I feel like there should be a lot of incentive for a lot of companies to produce transcription systems, and they've done so over a long time. Whisper is not a super exotic model: it's a Transformer, it takes mel spectrograms and just outputs tokens of text. It's not a crazy model, and everything in it has been around for a long time. I'm not actually one hundred percent sure why. Yeah, it's not obvious to me either; it makes me feel like I'm missing something. I'm missing something too, because there's a huge... even at Google and so on — YouTube transcription. Yeah, it's unclear, but some of it is also about integrating it into a bigger system: the user interface, how it's deployed, and all that kind of stuff. Maybe running it as an independent thing is much easier — an order of magnitude easier — than deploying it into a large integrated system like YouTube transcription, or anything like meetings — Zoom has transcription that's kind of crappy. But creating an interface where it detects the individual speakers, displays it in compelling ways, runs in real time, all that kind of stuff — maybe that's difficult, but that's the only explanation I have.
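For readers who want to try the pipeline being described, here is a minimal sketch using the open-source openai-whisper Python package (installed with pip install -U openai-whisper, plus ffmpeg for audio decoding). The model size and the podcast file name are illustrative assumptions, not details from the conversation:

    import whisper

    # Load a pretrained checkpoint; larger ones ("small", "medium", "large") trade speed for accuracy.
    model = whisper.load_model("base")

    # Whisper converts the audio to a log-mel spectrogram internally and a Transformer
    # decodes text tokens from it, matching the description above.
    result = model.transcribe("episode.mp3")  # hypothetical podcast audio file

    print(result["text"])  # the full transcript as one string

    # Per-segment timestamps are also returned, which is handy for captions.
    for seg in result["segments"]:
        print(f'[{seg["start"]:.1f}s -> {seg["end"]:.1f}s] {seg["text"]}')

Something along these lines is roughly what running Whisper "as an independent thing" on a podcast looks like; the harder part, as discussed just above, is wrapping it in speaker detection, a real-time interface, and deployment inside a larger product.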
I'm currently paying quite a bit for human transcription, human captioning and annotation, and it seems like there's a huge incentive to automate that. Yeah, it's very confusing. I don't know if you've looked at some of the Whisper transcripts, but they're quite good. They're good, especially in tricky cases — I've seen Whisper's performance on super tricky cases and it does incredibly well. So I don't know; a podcast is pretty simple, it's high-quality audio and you're usually speaking pretty clearly, so I don't know what OpenAI's plans are either. But yeah, there are always fun projects, basically. And Stable Diffusion is also opening up a huge amount of experimentation, I would say, in the visual realm — generating images and videos, and movies eventually. That's going to be pretty crazy; it's almost certainly going to work, and it's going to be really interesting when the cost of content creation falls to zero. You used to need a painter for a few months to paint a thing, and now it's going to be: speak to your phone to get your video. So Hollywood will start using that to generate scenes, which completely opens things up. Yeah — you can make a movie like Avatar eventually for under a million dollars, much less maybe, just by talking to your phone. I mean, I know it sounds kind of crazy. And then there would be some voting mechanism — like, would there be a show on Netflix that's generated completely automatically? Potentially, yeah. And what does it look like when you can just generate it on demand, and there's an infinity of it? Oh man, all the synthetic content. I mean, it's humbling, because we treat ourselves as special for being able to generate art and ideas and all that kind of stuff — if that can be done in an automated way by AI... Yeah, it's fascinating to me how the predictions of AI — what it's going to look like and what it's going to be capable of — are completely inverted and wrong. The sci-fi of the fifties and sixties was just totally not right: they imagined AI as super-calculating theorem provers, and we're getting things that can talk to you about emotions, that can do art. It's just weird. Are you excited about that future — AIs, hybrid systems, heterogeneous systems of humans and AIs talking about emotions? Netflix and chill with an AI system, legit, where the Netflix thing you watch is also generated by AI? I think it's going to be interesting, for sure, and I'm cautiously optimistic, but it's not obvious. Well, the sad thing is that your brain and mine developed in a time before Twitter, before the internet, so I wonder whether people that are born inside of it might have a different experience. Maybe you and I will still resist it, and the people born now will not. Well, I do feel like humans are extremely malleable, and you're probably right. What is the meaning of life, Andrej? We talked about sort of the universe having a conversation with us humans, or with the systems we create, to try to answer — for the universe, for the creator of the universe, to notice us. We're trying to create systems that are loud enough to get an answer back. I don't know if that's the meaning of life; that's the meaning of life for some people. The first-level answer, I would say, is that anyone can choose their own meaning of life, because we are conscious entities and it's beautiful — that's number one. But I
do think that a deeper meaning of life, if someone is interested, is along the lines of: what the hell is all this, and why? If you look into fundamental physics, quantum field theory and the Standard Model, they're very complicated, and there are these nineteen free parameters of our universe — what's going on with all this stuff, why is it here, can I hack it, can I work with it, is there a message for me, am I supposed to create a message? So I think there are some fundamental answers there, but I think you can't really make a dent in those without more time, and so to me there's also a big question around just getting more time, honestly. Yeah, that's what I think about quite a bit as well. So kind of the ultimate, or at least the first, way to sneak up on the why question is to try to escape the system, the universe. And then you sort of backtrack and say, okay, that's going to take a very long time, so the why question boils down, from an engineering perspective, to: how do we extend? Yeah, I think that's question number one, practically speaking, because you're not going to calculate the answer to the deeper questions in the time you have. And that could be extending your own lifetime, or extending the lifetime of human civilization — of whoever wants to; many people might not want that. But for the people who do want it, I think it's probably possible, and I don't know that people fully realize this. I kind of feel like people think of death as an inevitability, but at the end of the day this is a physical system: some things go wrong, it makes sense why things like this happen, evolutionarily speaking, and there are most certainly interventions that mitigate it. It would be interesting if death were eventually looked at as a fascinating thing that used to happen to humans. I don't think that's unlikely — I think it's likely — and it's up to our imagination to try to predict what a world without death looks like. Yeah, it's hard to. I think the values would completely change. Could be. I don't really buy all these ideas that, oh, without that there's no meaning, there's nothing — I don't intuitively buy those arguments. I think there's plenty of meaning, plenty of things to learn; they're interesting, exciting. I want to know, I want to calculate, I want to improve the condition of all the humans and organisms that are alive. Yet the way we find meaning might change. There are a lot of humans, probably including myself, that find meaning in the finiteness of things, but that doesn't mean it's the only source of meaning. Yeah, I do think many people will go with that, which I think is great. I love the idea that people can just choose their own adventure. You are born as a conscious, free entity by default, I'd like to think, and you have your unalienable rights for life and the pursuit of happiness — I don't know if you quite have that in nature — but in the landscape of happiness you can choose your own adventure, mostly. That's not fully true, but I'm still pretty sure I'm an NPC — though an NPC can't know it's an NPC. There could be different degrees and levels of consciousness. I don't think there's a more beautiful way to end it. Andrej, you're an incredible person. I'm really honored you would talk with me. Everything you've done for the machine learning world, for the AI world, to inspire people, to
educate millions of people — it's been great, and I can't wait to see what you do next. It's been an honor. Man, thank you so much for talking today. Awesome, thank you. Thanks for listening to this conversation with Andrej Karpathy. To support this podcast, please check out our sponsors in the description. And now, let me leave you with some words from Samuel Karlin: "The purpose of models is not to fit the data but to sharpen the questions." Thanks for listening, and hope to see you next time.