Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host, founder of Smol AI. And today we have in the studio Soumith Chintala. Welcome.

Thanks for having me.

On one of your rare visits from New York, where you live. You got your start in computer vision at NYU with Yann LeCun — that was a very fortuitous start. I was actually listening to your interview on the Gradient podcast, so if people want to know more about the history of Torch and the history of PyTorch, they can go to that podcast; we won't spend that much time there. But I was just marveling at your luck — or I don't know if it's your luck or your drive — to find AI early and then find the right quality of mentor, because I guess Yann really introduced you to that world.

You're talking about extrinsic success, right? A lot of people just have the drive to do things that they think are fun, and a lot of those things might or might not be extrinsically perceived as good and successful. I think I just happened to like something that is now one of the coolest things in the world, or whatever. The first thing I tried to become was a 3D VFX artist, and I was really interested in doing that, but I turned out to be very bad at it, so I ended up not doing it further. But even if I had been good at that and had gone down that path, I probably would have been equally happy. Maybe the perception of whether this person is successful or not would be different, but I think after a baseline, your happiness is more correlated with your intrinsic stuff.

Yes. I think Dan Pink has this book on Drive that I often refer to, about the power of intrinsic motivation versus extrinsic, and how long extrinsic lasts — it's not very long at all. But anyway, now you are an investor in Runway, so in a way you're working on VFX.

Yes, in a very convoluted way. It reminds me of Ed Catmull — I don't know if you guys know, but he actually tried to become an animator in his early years and failed, or didn't get accepted by Disney, and then went and created Pixar, which then got bought by Disney and made Toy Story.

Yeah. So you joined Facebook in 2014 and eventually became creator and maintainer of PyTorch, and there's a long story there you can refer to on the Gradient. But I think maybe people don't know that you were also involved in more of the hardware and cluster decision affairs, and we can dive into more details there because we're all about hardware this month. And then finally, what else should people know about you on the personal or professional side?

I think open source is definitely a big passion of mine and probably forms a little bit of my identity at this point. I'm irrationally interested in open source. I think open source has that fundamental way to distribute opportunity in a way that is very powerful. I grew up in India. I didn't have internet for a while — in college, actually, I didn't have internet except for GPRS or whatever. Knowledge was very centralized, but I saw that evolution of knowledge slowly getting decentralized, and that ended up helping me learn quicker and faster, and I think that was a strong reason why I ended up where I am.
So the open source set of things I always push regardless of what I get paid for — I think I would do that as a passion project on the side.

Yeah, that's wonderful, and we'll talk about the challenges that open source has as well, open models versus closed models. But maybe you want to touch a little bit on PyTorch before we move on to Meta AI in general. We've kind of touched on PyTorch in a lot of episodes. We had George Hotz from tinygrad; he called PyTorch a CISC and tinygrad a RISC. I would love to get your thoughts on PyTorch's design direction. I know you talk a lot about having a happy path to start with and then making complexity hidden away but still available to the end user. One of the things George mentioned is that I think you have like 250 primitive operators in PyTorch, and I think tinygrad has four. So how do you think about some of the learnings that maybe he's going to run into, that you already had in the past seven, eight years of running PyTorch?

Yeah, I think there are two different models that people generally start with. Either they go, "I have a grand vision and I'm going to build a giant system that achieves this grand vision, and my v1 is super complex, feature-complete, whatever." Or other people say they will get incrementally ambitious: "we'll start with something simple and then we'll slowly layer on complexity," in a way that optimally applies Huffman coding, or whatever — to where the density of users is and what they're using. I would want to keep the easy, happy path easy, and for the more niche, advanced use cases, I still want people to be able to try them, but they need to take additional, frictional steps. George, I think — just like how we started with PyTorch — started with the incrementally ambitious thing. I remember tinygrad used to be "we will be limited to a thousand lines of code," and I think now it's like 5,000. So I think there's no real magic as to why PyTorch has the kind of complexity it has. It's probably partly necessitated, and partly because we built it with the technology available to us at that time. PyTorch is like 190,000 lines of code or something at this point; if you had to rewrite it, we would probably think about ways to rewrite it in a vastly simplified way, for sure. But a lot of that complexity comes from the fact that — in a very simple, explainable way — you have memory hierarchies. The CPU has like three levels of caches, and then you have DRAM and SSD, and then you have the network. Similarly, the GPU has several levels of memory, and then you have different levels of network hierarchies: NVLink, plus InfiniBand or RoCE or something like that. And the flops available on your hardware are available in a certain way, and your computation is in a certain way, and you have to retrofit your computation onto both the memory hierarchy and the flops available. It is actually a fairly hard mathematical problem to find the optimal thing, and what is optimal depends on the input variables themselves: what is the shape of your input tensors, what is the operation you're trying to do, and various things like that.
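As an aside for readers, here's a toy sketch of that shape dependence. The configuration table and the `pick_matmul_config` helper below are made up for illustration — this is not PyTorch's actual dispatch logic — but it shows why a single operator ends up carrying many tuned variants:

```python
# Hypothetical illustration: the "best" kernel configuration depends on the
# input shapes, so a framework ends up carrying many configurations per op.
import torch

def pick_matmul_config(m: int, n: int, k: int) -> dict:
    # Toy heuristic: small problems favor small tiles, large ones bigger tiles.
    # Real libraries tune tables like this per GPU architecture.
    if m * n * k < 1 << 18:
        return {"block_m": 32, "block_n": 32, "num_warps": 2}
    if k >= 4096:
        return {"block_m": 64, "block_n": 128, "num_warps": 8}
    return {"block_m": 128, "block_n": 64, "num_warps": 4}

a, b = torch.randn(512, 2048), torch.randn(2048, 1024)
cfg = pick_matmul_config(a.shape[0], b.shape[1], a.shape[1])
print(cfg)  # which template you'd instantiate / codegen for this shape
```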
Finding that optimal configuration, and writing it down in code, is not the same for every op and every input configuration you have. For example, just as the shape of the tensors changes — let's say you have three input tensors going into a sparse dot product or something like that — the shape of each of those input tensors will vastly change how you optimally place that operation onto the hardware in a way that gets you maximal throughput. So a lot of our complexity comes from writing out hundreds of configurations for each single PyTorch operator, templatizing these things, and symbolically generating the final CUDA code or CPU code. There's no way to avoid it, because mathematically we haven't found symbolic ways to do this that also keep compile time near zero. You can write a very simple framework, but then you also should be willing to eat the long compile times of searching for that optimal performance at runtime. So that's the tradeoff. Unless we have great breakthroughs, I don't think George's vision is achievable — or he should be thinking about a narrower problem, such as "I'm only going to make this work for self-driving car convnets," or "I'm only going to make this work for LLM transformers of the Llama style." If you start narrowing the problem down, you can make a vastly simpler framework. But if you need the generality to power all of the AI research that is happening, and keep zero compile time and all these other factors, I think it's not easy to avoid the complexity.

That's interesting, and we kind of touched on this with Chris Lattner when he was on the podcast. If you think about frameworks, they have the model target, they have the hardware target, they have different things to think about. He mentioned that when he was at Google, TensorFlow was trying to be optimized to make TPUs go brr, as fast as possible. I think George is trying to make especially the AMD stack better than ROCm. How come PyTorch has been such a Switzerland, versus just making Meta hardware go fast first?

Meta is not in the business of selling hardware. Meta is not in the business of cloud compute. The way Meta thinks about funding PyTorch is: we're funding it because it's good for Meta to fund PyTorch — PyTorch has become a standard and a big open source project, and generally it gives us a timeline edge and all of that within our own work. So why is PyTorch more of a Switzerland rather than being opinionated? The way we think about it is not in terms of Switzerland or not. The way we articulate it to all the hardware vendors and software vendors who come to us saying "we want to build a backend in core for PyTorch and ship it by default" is: we only look at our user side of things. If users are using a particular piece of hardware, then we want to support it. We very much don't want to kingmake the hardware side of things. So as the MacBooks got GPUs and that stuff started getting increasingly interesting, we pushed Apple to put some engineers on the MPS support, and we spent significant time from Meta-funded engineers on that as well, because a lot of people are using the Apple GPUs and there's demand.
So we mostly look at it from the demand side; we never look at it from "which hardware should we start taking opinions on."

Is there a future in which — because Modular's Mojo is kind of a superset of Python — is there a future in which PyTorch might use Mojo features optionally?

I think it depends on how well integrated it is into the Python ecosystem. If Mojo is a pip install and it's readily available, and users feel like they can use Mojo smoothly within their workflows in a way that is low friction, we would definitely look into that. In the same way, PyTorch now depends on Triton — OpenAI's Triton — and we never had a conversation like, "huh, that's a dependency, should we just build a Triton of our own, or should we use Triton?" Those conversations don't really come up for us. The conversations are more like: does Triton have 10,000 dependencies, and is it hard to install? We almost don't look at these things from a strategic leverage point of view; we look at them from a user experience point of view. Is it easy to install? Is it smoothly integrated? If so, we should consider it. And does it give enough benefits for us to start depending on it? If so, yeah, we should consider it. That's how we think about it.

You're inclusive by default, as long as it meets the minimum bar.

Yeah, but maybe I phrased it wrongly. Maybe it's more like: okay, what problems would you look to solve that you have right now? I think it depends on what problems Mojo will be useful at.

It's mainly a performance pitch, and some amount of a cross-compiling pitch.

Yeah, I think the performance pitch for Mojo was: we're going to be performant even if you have a lot of custom stuff — you can write arbitrary custom things and it will be performant. And that value proposition is not clear enough to us, from the PyTorch side, to consider it for PyTorch. PyTorch exposes — it's actually not 250 operators, it's about a thousand operators — and people kind of write their ideas in the thousand operators of PyTorch. Mojo is saying, well, maybe it's okay to completely sidestep those thousand operators of PyTorch and just write it in a more natural form: just write raw Python, write for loops or whatever. So from the consideration of how we intersect PyTorch with Mojo, I can see one use case where you have custom stuff for some parts of your program, but mostly it's PyTorch. And so we can probably figure out how to make it easier for, say, torch.compile to smoothly also consume Mojo subgraphs, with the interoperability being actually usable. That, I think, is valuable.
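For readers, the hook he's gesturing at already exists in torch.compile: you can hand the captured FX subgraphs to a custom backend. A minimal sketch, where `mojo_backend` is a stand-in for whatever a real Mojo integration would do (here it just falls back to eager execution):

```python
import torch

def mojo_backend(gm: torch.fx.GraphModule, example_inputs):
    # A real integration might hand the captured FX subgraph to a hypothetical
    # Mojo compiler here; this stub just inspects it and falls back to eager.
    print(f"captured {len(list(gm.graph.nodes))} FX nodes")
    return gm.forward  # must return a callable with the same signature

def f(x, y):
    return torch.relu(x @ y) + 1.0

compiled_f = torch.compile(f, backend=mojo_backend)
print(compiled_f(torch.randn(8, 16), torch.randn(16, 4)).shape)
```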
But Mojo as a fundamental frontend would be replacing PyTorch, not augmenting PyTorch. So in that sense I don't see a synergy in more deeply integrating Mojo.

So call out to Mojo whenever they have written something in Mojo and there's some performance-related thing going on. And then, since you mentioned Apple — what should people think of PyTorch versus MLX?

I mean, MLX is early, and I know the folks well. Awni used to work at FAIR, and I used to chat with him all the time; he used to be based out of New York as well. The way I think about MLX is that MLX is specialized for Apple right now. It has a happy path because it's defined its product in a narrow way. At some point, MLX either says "we will only be supporting Apple, and we will just focus on this being the framework you use on your MacBook — but once you go server-side or whatever, that's not my problem and I don't care," or MLX enters the server-side set of things as well. One of those two things will happen. If the first thing happens, MLX's overall addressable market will be small, but it'll probably do well within that addressable market. If it enters the second phase, they're going to run into all the same complexities that we have to deal with. They will not have any magic wand, and they will have vastly more complex work to do. They probably wouldn't be able to move as fast — having to deal with distributed compute, distributed NVIDIA and AMD GPUs, having a genericization of the concept of a backend, how they treat compilation and its overheads. Right now they deeply assume the whole MPS graph thing. So they need to think about all these additional things if they end up expanding to the server side, and they'll probably build something like PyTorch as well — eventually that's where it will end up. And I think there they will kind of fall down on the lack of differentiation: it wouldn't be obvious to people why they would want to use it.

Yeah, I mean, there are some cloud companies offering M1 and M2 chips on servers. I feel like it might be interesting for Apple to pursue that market, but it's not their core.

Yeah, if Apple can figure out their interconnect story, maybe it can become a thing.

Honestly, that's more interesting than the cars.

Yes. I think the moat that NVIDIA has right now is that they have the interconnect that no one else has. AMD GPUs are pretty good, and I'm sure there's various silicon that is not bad at all, but the interconnect — NVLink — is uniquely awesome. I'm sure the other hardware providers are working on it.

When you say it's uniquely awesome, you have some appreciation of it that the rest of us don't. The rest of us just hear marketing lines. What do you mean when you say NVIDIA is very good at networking? They made that acquisition, maybe...

It's the bandwidth it offers and the latency it offers. I mean, TPUs also have a good interconnect, but you can't buy them — you have to go to Google to use it.

Who are some of the other FAIR and PyTorch alumni who are building cool companies? I know you have Fireworks AI, Lightning AI, Lepton, and Yangqing, who you've known since college when he was building Caffe.
Yeah. Yangqing and I used to be framework rivals — Caffe and Torch. We were all a very small, close-knit community back then: Caffe, Torch, Theano, Chainer, Keras... there used to be more like 20 frameworks, I can't remember all the names. CCV by Liu Liu, who is also based out of SF. One of the ways it was interesting: you went into the framework guts and saw whether someone wrote their own convolution kernel or whether they were just copying someone else's, and there were like four or five convolution kernels that were unique and interesting. There was one from this guy out of Russia — I forget the name, but I remember who was awesome enough to have written their own conv kernel. And at some point I built out these benchmarks called convnet-benchmarks, which were just benchmarking all the convolution kernels that were available at that time. And it hilariously became big enough — at that time AI was getting important, but not important enough that industrial-strength players came in to do that kind of benchmarking standardization, like we have with MLPerf today — that a lot of the startups were using convnet-benchmarks in their pitch decks: "on convnet-benchmarks this is how we fare, so you should fund us." I remember Nervana actually was at the top of the pack because Scott Gray wrote amazingly fast convolution kernels at that time. Very interesting, but separate times. But to answer your question, Alessio: I think mainly Lepton and Fireworks are the two most obvious ones, but I'm sure the fingerprints are a lot wider — there are people who worked within the PyTorch and Caffe2 cohort of things who now end up at various other places.

As both an investor and as someone looking to build on top of their services, it's an uncomfortable "I don't know what I don't know" pitch, because I've met Yangqing, I've met these folks, and they say "we were deep in the PyTorch ecosystem and we served billions of inferences a day at Facebook, and now we can do it for you," and I'm like, okay, that's great. What should I be wary of or cautious of when these things happen? Because obviously this experience is extremely powerful and valuable; I just don't know what I don't know. What should people know about these new inference-as-a-service companies?

At that point you would be investing in them for their expertise of one kind. If they've been at a large company but they've been doing amazing work, you would be thinking about it as: what these people bring to the table is that they're really good at GPU programming, or at understanding the complexity of serving models once it hits a certain scale — various expertise from the infra, AI, and GPU points of view. What you would obviously want to figure out is whether their understanding of the external market is clear, whether they know and understand how to think about running a business, whether they understand how to be disciplined about making money, various things like that.

Maybe I'll put it less as the investing bit and more as a potential customer.

Oh, okay.
It's more like: okay, you have the PyTorch gods, of course — what else should I know?

I mean, I would not care about who's building something if I'm trying to be a customer. I would care about whether—

Benchmarks?

Yeah — usability and reliability and speed, right.

Quality as well.

Yeah. If someone from some random, unknown place came to me and said "use our stuff, it's great," and I have the bandwidth, I probably will give it a shot, and if it turns out to be great, I'll just use it.

Okay, great. And then maybe one more thing about benchmarks, since we already brought it up and you brought up convnet-benchmarks: there was some recent drama around Anyscale. Anyscale released their own benchmarks, and obviously they look great on their own benchmarks, but maybe they didn't give the others... I feel like there are two lines of criticism. One is that they didn't test apples-to-apples on the kind of endpoints that the other providers — their competitors — offer, and that's a due diligence baseline. And the second would be more about just optimizing for the right thing. You had some commentary on it; I'll just let you riff.

Yeah. In summary, my criticism of that was: Anyscale built these benchmarks for end users to understand what they should pick, and that's a very good thing to do. What I think they didn't do a good job of is giving that end user a full understanding of what they should pick — they just gave them a very narrow slice of understanding. I think they just gave them latency numbers, and that's not sufficient. You need to understand your total cost of ownership at some reasonable scale — not "one API call is one cent" but "a thousand API calls are ten cents" — because people can misprice to cheat on those benchmarks. So you want to understand: okay, how much is it going to cost me if I actually subscribe to you and do a million API calls a month, or something? And then you want to understand the latency and reliability not just from one call you made, but as an aggregate of calls made over various times of the day and times of the week. And the nature of the workloads: is it just some generic single paragraph that you're sending that is cachable, or is it testing a real-world workload? That kind of rigor in presenting the benchmark wasn't there. It was a much narrower sliver of what should have been a good benchmark. That was my main criticism. And I'm pretty sure that if, before they released it, they had shown it to the other stakeholders who would care about this benchmark — because they are present in it — those people would have easily pointed out these gaps. I think they didn't do that, and they just released it. So those were the two main criticisms. I think they were fair, and Robert took it well.

He took it very well, and we'll have him on at some point and discuss it. But I think it's important — the market maturing enough that people start caring and competing on these kinds of things means we need to establish what best practice is, because otherwise everyone's going to play dirty.

Yeah, absolutely.
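To make that concrete, here's a minimal editor's sketch of the kind of measurement being described — aggregating latency over many calls, with varied prompts, instead of quoting a single number. The endpoint URL, request schema, and per-token price below are placeholders, not any real provider's API:

```python
import json, statistics, time, urllib.request

ENDPOINT = "https://api.example-llm-provider.com/v1/chat"  # placeholder
PRICE_PER_1K_TOKENS = 0.0005                               # placeholder

def one_call(prompt: str) -> tuple[float, int]:
    body = json.dumps({"prompt": prompt, "max_tokens": 128}).encode()
    req = urllib.request.Request(ENDPOINT, data=body,
                                 headers={"Content-Type": "application/json"})
    t0 = time.perf_counter()
    with urllib.request.urlopen(req) as resp:
        out = json.loads(resp.read())
    latency = time.perf_counter() - t0
    return latency, out.get("usage", {}).get("total_tokens", 0)

# Spread calls over time (ideally hours/days) and over varied, non-cachable prompts.
latencies, tokens = [], 0
for i in range(100):
    lat, tok = one_call(f"Summarize ticket #{i}: ...")  # varied workload, not one cached paragraph
    latencies.append(lat)
    tokens += tok
    time.sleep(1)

print(f"p50 latency: {statistics.median(latencies):.3f}s")
print(f"p95 latency: {statistics.quantiles(latencies, n=20)[18]:.3f}s")
print(f"token cost for {len(latencies)} calls: ${tokens / 1000 * PRICE_PER_1K_TOKENS:.4f}")
```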
My view of the LLM inference market in general is that it's like the laundromat model: the margins are going to drive down towards the bare minimum. It's going to be all kinds of arbitrage between how much you can get the hardware for, how much you sell the API for, and how much latency your customers are willing to let go of. You need to figure out how to squeeze your margins — what is your unique thing here? I think Together and Fireworks and all these people are trying to build faster CUDA kernels, and faster hardware kernels in general, but those moats only last for a month or two. These ideas quickly propagate.

Even if they're not published?

Even if they're not published, the idea space is small, so the discovery rate is going to be pretty high. It's not like we're talking about a combinatorial thing that is really large. You're talking about Llama-style LLM models, and we're going to beat those to death on a few different hardware SKUs — it's not even like there's a huge diversity of hardware you're going to aim to run it on. When you have such a narrow problem and you have a lot of people working on it, the rate at which these ideas get figured out is going to be pretty rapid.

Is it a standard bag of tricks? The standard one that I know of is fusing operators.

Yeah, it's the standard bag of tricks of figuring out how to improve your memory bandwidth and all that.

Okay, interesting. Any ideas, instead, of things that are not being beaten to death that people should be paying more attention to? One thing — you have a thousand operators, right? What's the most interesting usage of PyTorch that you're seeing, maybe outside of this little bubble?

PyTorch is very interesting and scary at the same time, because basically it's used in a lot of exotic ways. From the ML angle — what kind of models are being built — you get all the way from state space models and all these things, to higher-order differentiable models like neural ODEs and stuff like that. So there's one set of interestingness from the ML side of things, and then there's the other set of interestingness from the applications point of view. It's used in Mars rover simulations, in drug discovery, in Tesla cars — there's a huge diversity of applications in which it is used. In terms of the most interesting application side of things, I am scared at how many interesting things — that are also very critical and really important — it is used in. I think the scariest was when I went to visit CERN at some point, and they said they were using PyTorch, and they were using GANs at the same time, for particle physics research. I was scared more about the fact that they were using GANs than that they were using PyTorch, because at that time I was a researcher focusing on GANs. But the diversity is probably the most interesting thing: how many different things it is being used in. That's the most interesting to me from the applications perspective. From the models perspective, I think I've seen a lot of them; the really interesting ones to me are where we're starting to combine search and symbolic stuff with differentiable models. The whole AlphaGo-style family of models is one example, and then I think we're attempting to do it for LLMs as well, with various reward models and then search.
I mean, I don't think PyTorch is being used in this, but the whole AlphaGeometry thing was interesting, because again it's an example of combining symbolic models with gradient-based ones. But there is stuff like AlphaGeometry that PyTorch is used in — especially when you intersect biology and chemistry with ML; in those areas you want stronger guarantees on the output. So yeah, from the ML side, those things to me are very interesting right now.

Yeah, people are very excited about the AlphaGeometry thing, and for me it's kind of theoretical — it's great that you can solve some Olympiad questions, but I'm not sure how to make the bridge over into real-world applications. I'm sure people will figure it out.

Let me give you an example of it. You know how the whole thing about synthetic data being the next rage in LLMs is a thing—

Already is a rage.

—which I think is fairly misplaced in how people perceive it. People think synthetic data is some kind of magic wand that you wave and it's going to be amazing. Synthetic data is useful in neural networks right now because we as humans have figured out a bunch of symbolic models of the world, or made up certain symbolic models because of human innate biases. So we've figured out how to ground particle physics in a 30-parameter model. It's very hard to compute — as in, it takes a lot of flops to compute — but it only has 30 parameters or so; I'm not a physics expert, but it's a very low-rank model. We built mathematics as a field that basically is very low-rank. Language — a deep understanding of language, the whole syntactic parse trees, just understanding how language can be broken down into a formal symbolism — is something that we figured out. So we as humans have accumulated all this knowledge on these subjects: either we created those subjects in our heads, or we grounded some real-world phenomenon into a set of symbols. But we haven't figured out how to teach neural networks symbolic world models directly. The only way we have to teach them is to generate a bunch of inputs and outputs and gradient-descend over them. So in areas where we have the symbolic models, and we need to teach networks the knowledge we have that is better encoded in those symbolic models, what we're doing is generating a bunch of synthetic data — a bunch of input-output pairs — and then giving that to the neural network and asking it to learn, via gradient descent and in a much more over-parameterized way, the same thing that we already have a better low-rank model of. Outside of that — where we don't have good symbolic models — synthetic data obviously doesn't make any sense. So synthetic data is not a magic wand that will work in every case; it's just that for things we humans already have good symbolic models of, we need to impart that knowledge to neural networks, and we've figured out that synthetic data is a vehicle to impart that knowledge. But because people maybe don't know enough about synthetic data — they just hear that the next wave of the data revolution is synthetic data — they think it's some kind of magic where we just create a bunch of random data somehow. They don't think about how, and then they think that's just the revolution. I think that's maybe a gap in understanding most people have in this hype cycle.
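A minimal editor's sketch of the pattern he's describing — sampling input/output pairs from a known, low-rank symbolic model and gradient-descending an over-parameterized network on them. The projectile-range formula here is just a toy stand-in for "a symbolic model we already have":

```python
import math
import torch
import torch.nn as nn

# The "symbolic model": projectile range from launch angle and speed.
def true_range(angle_rad, speed, g=9.81):
    return speed**2 * torch.sin(2 * angle_rad) / g

# Generate synthetic input/output pairs from the symbolic model.
angles = torch.rand(10_000, 1) * (math.pi / 2)
speeds = torch.rand(10_000, 1) * 50
x = torch.cat([angles, speeds], dim=1)
y = true_range(angles, speeds)

# An over-parameterized network learns what the formula already encodes compactly.
net = nn.Sequential(nn.Linear(2, 256), nn.ReLU(),
                    nn.Linear(256, 256), nn.ReLU(),
                    nn.Linear(256, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(2_000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(x), y)
    loss.backward()
    opt.step()
print(f"final MSE: {loss.item():.4f}")
```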
Yeah, well, it's a relatively new concept.

Yeah. There are two more that I'll put in front of you, and then we'll see how you respond. One is — I have this joke that it's only synthetic data if it's from the Mistral region of France; otherwise it's just sparkling distillation — which is what Nous Research is doing: they're distilling GPT-4 by creating synthetic data from GPT-4, creating mock textbooks inspired by phi-2, and then fine-tuning open source models like Llama. So should we call that synthetic data? Should we call it something else? I don't know.

Yeah, I mean, the outputs of LLMs — are they synthetic data? They probably are, but I think it depends on the goal you have. If your goal is creating synthetic data with the aim of distilling GPT-4's superiority into another model, I guess you can call it synthetic data, but it also feels disingenuous, because your goal is "I need to copy the behavior of GPT-4" — and not just the behavior but the dataset as well.

So I've often thought of this as dataset washing. You need one model at the top of the chain — you know, an unnamed French company that makes a model that has all the data in it, and we don't know where it's from, but it's open source, hey — and then we distill from that, and it's great.

Yeah.

But, to be fair, they also use larger models as judges, for preference ranking. That is, I think, a very accepted use of synthetic data.

Correct. I think we're in a time where we don't really have good social models of what is acceptable, depending on how many bits of information you use from someone else. It's like: okay, you use one bit — is that okay? Yeah, that's accepted to be okay. What about if you use 20 bits — is that okay? I don't know. What if you use 200 bits? I don't think we as a society have ever been in this conundrum, where we have to decide where the boundary of copyright is, or where the boundary of the socially accepted understanding of copying someone else is. We haven't been tested on this, mathematically, before, in my opinion.

Whether it's transformative use.

Yes. So I think this New York Times versus OpenAI case is going to go to the Supreme Court, and we'll have to decide it, because we've never had to deal with it before. It'll be very interesting.

And then finally, for synthetic data, the thing I'm personally exploring is solving this very stark paradigm difference between RAG and fine-tuning, where you can create synthetic data off of your retrieved documents and then fine-tune on that. That's kind of synthetic. All you need is variation, or diversity, of samples to fine-tune on, and then you can fine-tune new knowledge into your model. I don't know if you've seen that as a direction for synthetic data.

I think what you're doing there is saying: well, I know how to parameterize language to an extent, and I need to teach my model variations of this input data so that it's resilient, or invariant, to the language uses of that data.

Yeah, so it doesn't overfit.

So I think that's 100% synthetic, right? The key is that you create variations of your documents, and you know how to do that because you have a symbolic model, or some implicit symbolic model, of language.
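A rough sketch of that direction, with the variation-generation step stubbed out — `generate_variations` here is a hypothetical call to whatever LLM or augmentation method you would use, and the retriever in the usage comment is likewise hypothetical; the rest just assembles a fine-tuning set from retrieved documents:

```python
import json

def generate_variations(document: str, n: int = 5) -> list[str]:
    # Hypothetical: in practice this would call an LLM to paraphrase, summarize,
    # or turn the document into Q&A pairs, so the model sees varied surface forms.
    raise NotImplementedError("plug in your paraphrasing / Q&A generation model here")

def build_finetune_set(retrieved_docs: list[str]) -> list[dict]:
    examples = []
    for doc in retrieved_docs:
        for variant in generate_variations(doc):
            # Each variation carries the same underlying knowledge in different words,
            # which is what lets the fine-tune generalize instead of overfitting.
            examples.append({"text": variant})
    return examples

# Usage (hypothetical retriever and output path):
# docs = retriever.search("internal policy on data retention")
# with open("synthetic_finetune.jsonl", "w") as f:
#     for ex in build_finetune_set(docs):
#         f.write(json.dumps(ex) + "\n")
```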
Okay. Do you think the issue with symbolic models is just the architecture of the language models that we're building? I think maybe the thing people grasp is the inability of transformers to deal with numbers because of the tokenizer. Is it a fundamental issue there too? And do you see alternative architectures that will be better at symbolic understanding?

I am not sure if it's a fundamental issue or not. I think we just don't understand transformers enough. I don't even mean transformers as an architecture — I mean the use of transformers today: combining the tokenizer and transformers and the dynamics of training, when you show math-heavy questions versus not. I don't have a good calibration of whether I know the answer or not. There are common criticisms like "transformers will just fail at X," but then when you scale them up to sufficient scale, they actually don't fail at X. There's this entire subfield trying to figure out these answers, called the science of deep learning or something, so we'll get to know more. I don't know the answer.

Got it. Let's touch a little bit on Meta AI and the stuff that's going on there — I don't know how deeply you're personally involved in it, but you're our first guest from Meta, which is really fantastic. And Llama 1 — you are such a believer in open source — Llama 1 was more or less the real breakthrough in open source AI. The most interesting thing for us to cover on this podcast was the death of Chinchilla, as people say. Any interesting insights there around the scaling laws for open source models, or smaller models, or whatever that design decision was when you guys were doing it?

So Llama 1 was Guillaume Lample and team. There was OPT before that, which I think I'm also very proud of, because we bridged a gap in the world's understanding of how complex it is to train these models. Until then, no one really published the logs in gory detail — everyone says "oh, it's complex," but no one really talked about why it's complex. So I think OPT was cool. We probably — I've met Susan, and she's very outspoken — we probably didn't train it for long enough, right? That's kind of obvious in retrospect.

For a 175B model. You trained it according to Chinchilla at the time, or—?

I can't remember the details, but I think it's a commonly held belief at this point that if we had trained OPT for longer, it would actually have ended up being better. Llama 1, I think, was Guillaume Lample and team — Guillaume is fantastic and went on to build Mistral. I wasn't too involved in that side of things, so I don't know the answer to what you're asking, which is how they thought about scaling laws and all of that. Llama 2 I was more closely involved in; I helped them a reasonable amount with their infrastructure needs and stuff. Llama 2, I think, was more like: let's get to the next evolution. At that point we kind of understood what we were missing from the industry's understanding of LLMs. We needed more data, we needed to train the models for longer, we made a few tweaks to the architecture, and we scaled up more. That was Llama 2.
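As a back-of-envelope aside for readers (an editor's illustration, not Meta's internal math): the Chinchilla paper's rough rule of thumb is about 20 training tokens per parameter, which is the baseline that "training for longer" gets compared against. The token counts below are the publicly reported figures for OPT and Llama 2:

```python
# Rough Chinchilla-style back-of-envelope (~20 tokens per parameter),
# purely illustrative -- not the actual planning numbers used for OPT or Llama.
def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    return n_params * tokens_per_param

for name, params, trained_tokens in [
    ("OPT-175B", 175e9, 180e9),   # OPT reported ~180B training tokens
    ("Llama-2-70B", 70e9, 2e12),  # Llama 2 reported ~2T training tokens
]:
    opt_tokens = chinchilla_optimal_tokens(params)
    print(f"{name}: trained on {trained_tokens:.2e} tokens, "
          f"~{opt_tokens:.2e} would be Chinchilla-optimal")
```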
I think Llama 2 you can think of as: after Guillaume left, the team kind of rebuilt their muscle around Llama 2. Hugo, who I think is the first author, is fantastic, and he played a reasonably big role in Llama 1 as well, so he overlaps between Llama 1 and 2. And Llama 3, obviously, hopefully will be awesome.

Just one question on Llama 2, and then we'll try and fish Llama 3 spoilers out of you. In the Llama 2 paper, the loss curves of the 34B and 70B parameter models still seem kind of steep, like they could go lower. From an infrastructure level, how do you allocate resources? Could they have just gone longer, or were you like, "hey, this is all the GPUs we can burn, let's move on to Llama 3 and make that one better instead"?

Instead of answering specifically about that Llama 2 situation, I'll tell you how we think about things generally. Mark released some numbers, right, so let's cite those again. All I remember is 600k GPUs —

That's by the end of this year: 600k H100-equivalents.

— with 350k H100s, and including all our other GPU and accelerator stuff it would be 600-something-k aggregate capacity.

That's a lot of GPUs; we'll talk about that separately.

But the way we think about it is: we have a train of models — Llama 1, 2, 3, 4 — and we have a bunch of GPUs. I don't think we're short of GPUs.

Yeah, no, I wouldn't say so.

So I think it's all a matter of time. Time is the biggest bottleneck: when do you stop training the previous one, when do you start training the next one, and how do you make those decisions? And the data: do you have net new, better, cleaner data for the next one, in a way that it's not worth really focusing on the previous one? It's just a standard iterative product — like, when is the iPhone 1, and when do you start working on the iPhone 2 versus the iPhone 3, and so on. So mostly the considerations are time and generation, rather than GPUs, in my opinion.

So one other thing with the scaling laws: Chinchilla is optimal to balance training and inference cost. I think at Facebook scale, or Meta scale, you would rather pay a lot more at training and then save on inference. How do you think about that from an infrastructure perspective? I think in your tweet you said people can try and guess how you're using these GPUs — can you give people a bit of understanding? Because I've already seen a lot of VCs say "Llama 3 has been trained on 600,000 GPUs," and that's obviously not true, I'm sure. How do you allocate between the research, like FAIR, the Llama training, the inference on Instagram suggestions that got me to scroll, the AI-generated stickers on WhatsApp, and all of that?

Yeah, we haven't talked about any of this publicly, but as a broad stroke, it's how we would allocate resources of any other kind at any company. You run a VC portfolio — how do you allocate your investments between different companies? You make various trade-offs, and you decide: should I invest in this project or that other project, and how much should I invest in this project? It's very much a zero-sum set of trade-offs. It also comes into play how your clusters are configured — what you can fit, of what size, in what cluster, and so on. So broadly, there's no magic sauce here.
I mean, I think the details would add more spice, but they wouldn't add more understanding. It would just be like, "oh, okay, they think about this as I would normally do."

Right. So even the GPU-rich run through the same struggles of having to decide where to allocate things.

Yeah. At some point — I forget who said it — it's like you kind of fit your models to the amount of compute you have. If you don't have enough compute, you figure out how to make small models. But no one as of today, I think, would feel like they have enough compute. I don't think I've heard any company within the AI space say, "oh yeah, we feel like we have sufficient compute and we couldn't have done better" — that conversation I don't think I've heard from any of my friends at other companies.

Stella from Eleuther sometimes says that, because she has a lot of donated compute and she's trying to put it to interesting uses. But for some reason she's decided to stop making large models.

I mean, that's a cool, high-conviction opinion that might pay off.

Right.

She's taking a path that most people don't care to take in this climate, and she will probably have very differentiated ideas. I mean, think about the correlation of ideas in AI right now — it's so bad. Everyone's fighting for the same pie. In some weird sense, that's partly why I don't directly work on LLMs. I used to do image models and stuff, and I actually stopped doing GANs because GANs were getting so hot that I didn't have any calibration of whether my work would be useful or not — oh yeah, someone else did the same thing you did. There's so much to do; I don't understand why people need to fight for the same pie. So I think Stella's decision is very smart.

And how do you reconcile that with how we started the discussion, about intrinsic versus extrinsic kinds of accomplishment or success? How should people think about that, especially when they're doing a PhD or early in their career? I think at NeurIPS I walked through a lot of the posters, and there seems to be mode collapse, in a way, in the research — a lot of people working on the same things. Is it worth it for a PhD student to take a bet on something that is maybe not as hot, given funding and visibility and whatnot? What suggestions would you give?

I think there's a baseline level of compatibility you need to have with the field. Basically, you need to figure out if you will get paid enough to eat, and have whatever reasonable, normal lifestyle you want, as a baseline. So you at least have to pick a problem within the neighborhood of fundable. You wouldn't want to be doing something so obscure that people are like, "ah, I don't know..."

Like a limit on fundability — I'm just throwing something out, like three months of compute. That's the top line, the max that you can spend on any one project.

But I think that's very ill-specified — how much compute. So I think the notion of fundability is broader. It's more like: hey, is this family of models within the acceptable set of "you're not crazy," or something? Even something like neural ODEs, which is a very boundary-pushing thing, or state space models, or whatever —
all of these things, I think, are still in fundable territory. When you're talking about "I'm going to do one of the neuromorphic models and then apply image classification to them," or something, then it becomes a bit questionable. Again, it depends on your motivation: maybe if you're a neuroscientist it actually is feasible, but if you're an AI engineer, like the audience of this podcast, then it's more questionable. The way I think about it is: you need to figure out how you can be at the baseline level of fundability, just so that you can live, and then after that really focus on intrinsic motivation — and it depends on your strengths, how you can play to your strengths and your interests at the same time. I try to look at a bunch of ideas that are interesting to me, but I also try to play to my strengths. I'm not going to go work on theoretical ML. I'm interested in it, but when I want to work on something like that, I try to partner with someone who is actually a good theoretical ML person and see if I have any value to provide — and if they think I do, then I come in. So I think you want to find that intersection of ideas you like that also plays to your strengths, and go from there. Everything else — actually finding extrinsic success and all of that — I think is somewhat immaterial. When you're talking about building ecosystems and stuff, slightly different considerations come into play, but that's a different conversation.

Yeah. We're going to pivot a little bit to talking about open source AI, but one more thing I wanted to establish for Meta: this 600k number, just to round out the discussion — that's for all of Meta, so including your own inference needs, right? It's not just training.

It's for all of Meta. It's going to be the number in our data centers for all of Meta.

Yeah, so there's a decent amount of workload serving Facebook and Instagram and whatever. And then, is there interest in your own hardware?

We've already talked about our own hardware; it's called MTIA — our own silicon. I think we've even shown the standard photograph of you holding the chip.

The chip that doesn't work — I mean, the chip that you basically just get as a test chip or whatever. So we are working on our silicon, and we'll probably talk more about it when the time is right.

But what gaps do you have that the market doesn't offer?

Okay, this is easy to answer. Remember how I told you about the whole memory hierarchy and sweet spots and all of that? Fundamentally, when you build a piece of hardware, you make it general enough that a wide set of customers and a wide set of workloads can use it effectively, while trying to get the maximum level of performance they can. The more specialized you make the chip, the more hardware-efficient it's going to be, the more power-efficient it's going to be, and the easier it's going to be to write the software — the kernels — that map that one or two workloads onto that hardware, and so on. So it's pretty well understood across the industry that if you have a sufficiently large volume of workload, you can specialize for it and get some efficiency gains — power gains and so on.
So the way you can think about every large company building silicon — and I think a bunch of the other large companies are building their own silicon as well — is that each large company has a sufficiently large set of verticalized workloads, with a pattern to them, that a more generic accelerator like an NVIDIA or AMD GPU does not exploit. So there is some level of power efficiency that you're leaving on the table by not exploiting that. And you have sufficient scale, and sufficient forecasted stability that those workloads will exist in the same form, that it's worth spending the time to build a chip to exploit that sweet spot. Obviously, something like this is only useful if you hit a certain scale, and if your forecasted prediction that those kinds of workloads will stay in the same exploitable form is true. So yeah, that's why we're building our own chips.

Amazing. Yeah, I know we've been talking on a lot of different topics, and going back to open source: you had a very good tweet. You said that a single company's closed-source effort rate-limits against people's imaginations and needs. How do you think about that? How do you think about the impact that some of the Meta work in open source has been having, and maybe the directions of the whole open source AI space?

Yeah. In general, I think first, I'm talking about this in terms of "open" and not just "open source," because with the whole notion of model weights, no one even knows what "source" means for these things. But just for the discussion, when I say open source you can assume I'm just talking about open. And then there's the whole notion of licensing and all that — commercial, non-commercial, commercial with clauses, and so on. I think at a fundamental level, the biggest value of open source is that you make the distribution very wide. It's just available with no friction, and people can do transformative things in a way that's very accessible. Maybe it's open source but it has a commercial license, and I'm a student in India — I don't care about the license, I don't even understand the license, but the fact that I can use it and do something with it is very transformative to me. I got this thing in a very accessible way. And then there are various degrees: if it's open source with an actual commercial license, then a lot of companies are going to benefit from gaining value that they didn't previously have, that they maybe had to pay a closed-source company for. So open source is just a very interesting tool that you can use in various ways.

There are, again, two kinds of open source. One is a large company doing a lot of work and then open sourcing it, and that kind of effort is not really feasible for, say, a band of volunteers to replicate the same way. So there's both a capital and an operational expenditure that the large company just decided to forgo and give away to the world, for some benefits of some kind — benefits that are not as tangible as direct revenue or something. In that bucket, Meta has been doing incredibly good things: they fund a huge amount of the PyTorch development, they open sourced Llama and that family of models, and several other fairly transformative projects — FAISS is one, Segment Anything, Detectron, Detectron 2, DensePose, Seamless —
— yeah, the list is so long that we're not going to cover it all. So I think Meta comes into that category, where we spend a lot of capex and opex, we have a high talent density of great AI people, and we open source our stuff. And the thesis for that — I remember when FAIR was started, the common question was: wait, why would Meta want to start an open AI lab? What exactly is the benefit, from a commercial perspective? And the thesis was very simple: AI was rate-limiting Meta's ability to do things. Our ability to build various product integrations, moderation, various other factors — AI was the limiting factor, and we just wanted AI to advance more, and we didn't care whether the IP of the AI was uniquely in our possession or not. For us, however the field advances, that accelerates Meta's ability to build a better product. So we just built an open AI lab and said: if this helps accelerate the progress of AI, that's strictly great for us. A very easy rationale. It's still the same, to a large extent, with the Llama stuff — I think it's the same values, but the argument is a bit more nuanced.

And then there's the second kind of open source, which is: we built this project on nights and weekends, with very smart people, we open sourced it, and then we built a community around it. That's the Linux kernel, and various software projects like that. So I think about open source as both of these things being beneficial and both of these things being different — they're different and beneficial in their own ways. The second one is really useful when there's an active arbitrage to be done: if no one is really looking at a particular space because it's not commercially viable or whatever, a band of volunteers can just coordinate online and make something happen, and that's great.

I want to cover a little bit about open source LLMs. Open source LLMs have been very interesting, because I think we were trending towards an increase in open source in AI from 2010 all the way to 2017 or so, where more and more pressure within the community was to open source your stuff so that your methods get adopted. And then the LLM revolution kind of took the opposite turn. OpenAI stopped open sourcing their stuff, DeepMind didn't, and all the other cloud providers didn't open source their stuff either, and that was not good. First, in the sense that science done in isolation probably will just form its own bubble, where people believe their own stuff, or whatever. So there was that problem. And then there was the other problem, which is the accessibility part: again, I always go back to, I'm a student in India with no money — what is my accessibility to any of these closed-source models? At some scale I have to pay money; that makes it a non-starter. And there's also the control thing. I strongly believe that if you want human-aligned stuff, you want all humans to give feedback, and you want all humans to have access to that technology in the first place. And I've actually seen, living in New York, that whenever I come to Silicon Valley I see a different cultural bubble — all the friends I hang out with talk about some random thing like Dyson spheres or whatever, that's a thing, and most of the world doesn't know or care about any of this stuff.
It's definitely a bubble, and bubbles can form very easily, and when you make a lot of decisions because you're in a bubble, they're probably not globally optimal decisions. So I think the distribution that open source provides powers a certain kind of non-falsifiability that I think is very important.

On the open source models: it's going great, in the sense that LoRA, I think, came out of the necessity of open source models needing to be fine-tunable in some way, and I think DPO also came out of the academic, open source side of things. Did any of the closed-source labs already have LoRA or DPO internally? Maybe, but that does not advance humanity in any way; it advances some company's probability of doing the winner-take-all that I talked about earlier in the podcast.

So, I don't know, it just feels fundamentally good. When people ask, "what are the ways in which it is not okay?" — and this might be a little controversial — I find a lot of the arguments about whether closed-source models are safer or open source models are safer to be very much related to what kind of culture people grew up in, what kind of society they grew up in. If they grew up in a society that they trusted, then I think they take the closed-source argument, and if they grew up in a society they couldn't trust, where the norm was that you didn't trust your government because it's obviously corrupt or whatever, then I think the open source argument is what they take. I think there's a deep connection to people's innate biases from their childhoods, and their trust in society and governmental aspects, that pushes them towards one opinion or the other. And I am definitely in the camp that open source is going to have better outcomes for society. Closed source to me just means centralization of power, which is really hard to trust. So I think it's going well in so many ways: we're actively disaggregating the centralization of power away from just two or three providers, and we are, I think, benefiting from so many people using these models in so many ways that aren't necessarily loved by, say, Silicon Valley left-wing tropes — some of these things are good or bad, but they're not culturally accepted universally in the world. So those are things worth thinking about.

Those are all the ways in which, as I mentioned, open source is actually being very good and beneficial and winning. But I think there's one way in which it's not winning — at some point I should write a long-form post about this — which is that it has a classic coordination problem. Open source in general always has a coordination problem: if there's a vertically integrated provider with more resources, they will just be better coordinated than open source. So now open source has to figure out how to have coordinated benefits. And the reason you want coordinated benefits is that these models are getting better based on human feedback. And if you look at open source models — if you go to, say, the Reddit LocalLlama subreddit — there are so many variations of models being produced, from, say, Nous Research onwards; there are so many variations built by so many people.
And I think open source is not winning in certain ways; at some point I should write a long-form post about this, but I think it has a classic coordination problem. Open source in general always has a coordination problem: if there's a vertically integrated provider with more resources, they will just be better coordinated than open source. So now open source has to figure out how to have coordinated benefits, and the reason you want coordinated benefits is that these models are getting better based on human feedback. With open-source models, if you go to the LocalLLaMA subreddit on Reddit, there are so many variations of models being produced, from Nous Research to so many others built by so many people, and one common theme is that they're all using fine-tuning or human-preference datasets that are very limited; someone published them somewhere and they're not sufficiently diverse. Then you look at the other side, the frontends, like Oobabooga or HuggingChat or Ollama: they don't really have feedback buttons. All the people using these frontends probably want to give feedback, but there's no way for them to give it. So these models are being built, they're being arbitrarily measured, and then they're being deployed into all these open-source frontends, or into closed-source apps that serve open-source models, and those frontends aren't exposing the ability to give feedback, so we're just losing all of it. In aggregate, and in a very fragmented way, the open-source models together are maybe being used as much as GPT is at this point, or close to it, but the amount of feedback flowing back into the open-source ecosystem is negligible, maybe less than 1% of the usage.

So I think the blueprint here is that you'd want someone to create a sinkhole for the feedback, some centralized sinkhole; maybe Hugging Face or someone just funds it: okay, I will make available a call to log a string along with a bit of information, positive or negative, something like that (a rough sketch of what that call could look like is below). Then you'd want to send pull requests to all the open-source frontends, Oobabooga and the rest, saying hey, we're just integrating a feedback UI, and work with the closed-source apps too, saying look, it doesn't cost you anything, just have a button. Then the sinkhole will have a bunch of this data coming in, and a bunch of open-source researchers should figure out how to filter that feedback down to only the high-quality stuff. I'm sure it'll be exploited by spam bots; it's the perfect way to inject your advertising, the next Coca-Cola, so there needs to be some level of filtering, in the same way that I'm sure all the closed providers today, OpenAI, Claude, are figuring out whether the feedback that comes in is legit or not. That kind of data filtering needs to be done, that loop has to be set up, and it requires both the central sinkhole and the data-cleaning effort to exist. They're not there right now, I think for capital reasons but also for coordination reasons: even if the central sinkhole is there, who's going to coordinate all of this integration across all these open-source frontends? But if that actually happens, I think the open-source models have a real chance of a runaway effect against OpenAI, with their rumored current daily active users. It probably doesn't have a chance against Google, because Google has Android and Chrome and Gmail and Google Docs and everything, and people just use those a lot. But I think there's a clear chance we can take at truly winning with open source.
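To make the sinkhole blueprint concrete, here is a rough sketch of what that logging call could look like from a frontend's side. The endpoint URL, payload fields, and the service itself are hypothetical assumptions for illustration, not an existing API; only the `requests` usage is standard Python.

```python
import requests  # only dependency; everything else here is hypothetical

# Placeholder endpoint for a community-run feedback "sinkhole"; no such service exists yet.
SINKHOLE_URL = "https://example.org/v1/feedback"

def log_feedback(model_id: str, prompt: str, completion: str,
                 rating: int, frontend: str) -> None:
    """Send one piece of human feedback (rating: +1 thumbs-up, -1 thumbs-down)."""
    payload = {
        "model_id": model_id,      # e.g. the repo name of the open-weight model
        "prompt": prompt,
        "completion": completion,
        "rating": rating,
        "frontend": frontend,      # which UI the user was in when they clicked
    }
    try:
        # Fire-and-forget with a short timeout; feedback logging must never
        # block or break the user-facing app.
        requests.post(SINKHOLE_URL, json=payload, timeout=2)
    except requests.RequestException:
        pass

# A frontend would call this from its thumbs-up / thumbs-down handler, e.g.:
# log_feedback("my-org/my-7b-instruct", user_prompt, model_reply, +1, "my-webui")
```

The design choice the transcript argues for is exactly this asymmetry: the integration cost for each frontend is one button and one POST, while the filtering and anti-spam work is centralized where the data lands.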
Do you think this feedback is helpful for making open-source models better, or for getting to open-source AGI? Because in a way OpenAI's goal is to get to AGI, whereas in open source we're more focused on personal, better usage.

I think that's a good question, but I actually don't think people have a good understanding of AGI, and I don't mean at the definition level. People say AGI means it's powering 40% of world economic output, or something like that, but what does that mean? Do you think electricity is powering 40% of world economic output or not? The notion of powering x% of economic output is not defined well enough for me to understand how we'd know when we got to AGI, or how to measure whether we're getting there. You can look at it in terms of intelligence or task automation or whatever, and I think that's what we're doing right now: we're integrating the current set of AI technologies into so many real-world use cases where we find value, and if some new version of AI comes in, we can say, ah, this helps me more. In that sense, the whole process of how we get to AGI will be continuous, not discontinuous the way the question is usually posed. So I think the open-source thing will be very much in line with getting to AGI, because open source has that natural-selection effect: if a better open-source model comes out, nobody says, I don't want to use it because of ecosystem effects, I'm locked into my ecosystem. It's a very pure, direct thing: if a better model comes out, it will be used. So I definitely think it has a good chance of achieving what I'd think of as a continuous path to whatever we might define as AGI.

For the listeners, I'll mention a couple of other related notes on this very interesting concept of feedback sinkholes for open source to really catch up, in terms of the overall Google versus OpenAI debate. Open Assistant was led by Yannic Kilcher, who recently ended that effort; I think the criticism there was that the kind of people who go to a specific website to give feedback are not representative of real-world usage, and that's why the models trained on Open Assistant didn't really seem to catch on in the open-source world. The two leading candidates in my mind are LMSYS out of UC Berkeley, who have the LMSYS Arena, which is being touted as one of the only reliable benchmarks anymore; I call them non-parametric benchmarks because there's nothing to cheat on except the ELO itself (the update behind that kind of scoring is sketched below). And then the other one is OpenRouter, which is Alex Atallah's thing. I don't know if you've talked to any of these people.

I obviously know all of the efforts you talked about, though I haven't talked to them directly about this yet. The way I think about it is that the way these models are going to be used is always going to be way more distributed than centralized, which is the power of the open-source movement. The UIs within which these models are used are decentralized; these models are going to be integrated into hundreds and thousands of projects and products, and I think that's important to recognize. The LMSYS leaderboard is the best thing we have right now to understand whether one model is better than another.
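Since the Arena's ELO scoring came up, here is a minimal sketch of the classic pairwise ELO update that this style of leaderboard builds on: each human vote between two anonymous models nudges their ratings. This is the textbook formula, not LMSYS's actual pipeline.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the ELO model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    """Return updated (r_a, r_b) after one human vote between models A and B."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + k * (s_a - e_a), r_b + k * ((1.0 - s_a) - (1.0 - e_a))

# One anonymous side-by-side vote where the user preferred model A's answer:
print(elo_update(1000.0, 1000.0, a_won=True))  # -> (1016.0, 984.0)
```

Because the only input is which answer a human preferred, there is no static test set to overfit to, which is roughly what the "non-parametric benchmark" remark is getting at.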
But it's also biased, in that it only has a sliver of a view into how people actually use these models. The people who end up coming to the LMSYS leaderboard and using a model there only use it for certain things; GitHub Copilot-style usage is not captured in LMSYS's numbers, and so many other styles, like Character AI-style usage, aren't captured there either. That's something OpenRouter could do; they just don't do it right now. So my point is that the way these models are used is always going to be a large surface area, and we need to figure out how to provide infrastructure that integrates with all the ways in which they're being used. Even if you just get the top 100 frontends through which open-source models are used to subscribe to the sinkhole, that's already substantial. Expecting one or two things to get a lot of data by themselves is not going to happen.

Yep, fair enough. Before we let you go, can we do a quick beyond-text segment? You're an investor in Runway, which does video generation, in 1X, which is building a humanoid assistant, and in Osmo, which is focused on using AI for smell recognition and synthesis. You advise a bunch of robotics projects at NYU, and you're building your own home robot. On a more open-ended note, what are the things you're most excited about beyond text generation and the more mundane usage?

In general, I have more things I'm excited about than I can possibly do; investing is one way to try to clear those urges. I'm generally excited about robotics being a possibility, with home robotics being five to seven years away from commercialization. It's not next year or two years from now, but five to seven years from now I think a lot more robotics companies might pop up. There's not a good consensus on whether hardware or AI is the bottleneck in robotics right now; my view is that hardware is still the bottleneck, and AI is also a bit of a bottleneck, but I don't think there are any obvious breakthroughs we need, I think it's just work. So I'm generally excited about robotics, and I spend a lot of personal time on it: every Wednesday afternoon I'm at NYU working with Lerrel Pinto and team, getting toward my home robot that just does my dishes and stuff.

What's the status of it? What does it do for you as of today?

A couple of months ago we deployed our home robotics stuff into several tens of New York City homes and tried to make it do a bunch of tasks, and we're basically starting to build out a framework that gets to a certain level of robustness on fairly simple tasks: picking up this cup and putting it somewhere else, taking a few pieces of cloth off the ground and putting them somewhere else, opening your microwave, various baseline tasks like that, with low sample complexity. One of the things people don't spend any time on in robotics is the user experience, which in the research I do at NYU we spend a huge amount of time on, and I think the key there is that sample complexity has to be really low. A lot of current robotics research, if you look at it, is: oh, we collected 50 demos and now it's able to do this task, or we collected 300 demos.
The number of samples you need for the thing to do the task is really high, so we're focusing a lot on: you show it the task two or three times, and that's sufficient for it to actually do it. That comes with less generalization, right; there are some initial conditions that have to be true for it to do the task. But we're making progress, and the space is very interesting in general. I don't think people in the space have settled on what the hardware should look like for it to be truly useful in the home, or on the UX, or on the AI/ML stuff needed to make it sample-efficient, but a lot of work is happening in the field.

Yeah, one of my friends, Carlo at Berkeley, worked on a project called M3L, which is two CNNs, one for tactile feedback and one for images. When you say hardware, is it running all these things on the edge, or is it the actual servos?

By hardware I mean the actual servos: the motors, the servos, even the sensors. We humans have incredible vision that is still so much better, in field of view and resolution, than any of the cameras we can buy. Our skin is touch sensing available everywhere, and we have some of the most efficient, highest-capacity motors, in the sense of a hand's dexterity that can still lift large loads. In terms of those capabilities, we haven't figured out how to do a lot of that in hardware. Tesla has been making incredible progress, 1X announced their new thing, which looks incredible, and some of the other companies, Figure and others, are doing great work, but we're really not anywhere close to the hardware we feel we need. The other thing I want to call out is that a lot of what people show works, but has to be fixed all the time, and that's the other thing we humans are incredible at: we don't need any maintenance, or rather the maintenance is part of us. If you buy an electronics product of any kind, say a PS5, you don't say, oh yeah, my PS5 breaks every six days and I have to do a reasonable amount of work on it. But that's robotics: unless it's industrial robotics, where everything is very controlled and specialized, you're talking about reliability in those ranges. I think people don't talk about the reliability thing enough. When we enter the commercialization phase, we're going to have to figure out how to get reliability high enough to deploy these into homes and just sell them to people, at Best Buy or something. So that's the other factor we have to make a lot of progress on.

I just realized Google has a play in this with PaLM-E and stuff, and OpenAI obviously has a long history of doing robotics. Is there anything at Meta?

We have a small robotics program at Meta, out of FAIR. I actually used to work on it at FAIR a little bit before I moved into infra and focused my Meta time on a lot of other infrastructural stuff, so Meta's robotics program is a lot smaller.

Seems like it would be a fit with personal computing.
You can think of it this way: Meta has a ridiculously large device strategy, right? That's what our Reality Labs stuff is; we're going at it from VR and AR and we showcase a lot of that. I think for Meta, the robot is not as important as the physical devices side of things.

Yeah, for sure. Okay, I want to touch on Osmo a bit, because it's a very unusual company compared to the stuff we normally discuss: not robotics, but the sense of smell. The original pitch I heard from the founder, and maybe you can correct me, is that you realize you can smell cancer. Is that intuitive, is that the pitch you got as well?

The very interesting reason I invested in Osmo is that Alex Wiltschko, the founder of Osmo, was also, before PyTorch there was Torch, and Alex actively worked on Torch. He's actually a frameworks guy: he built this thing called Tangent at Google, another autodiff framework, and so on, so I know him from that side of things. He's also a neurobiologist by training; he just happens to also love neural networks and hacking on those frameworks. Incredibly smart guy, one of the smartest people I know. So when he was going in this direction, I thought it was incredible that smell is something we haven't even started to scratch the surface of in terms of digitization. When you think about audio or images or video, they're so advanced: we have the concept of color spaces, we have the concept of frequency spectra, we figured out how ears process frequencies with mel spectra and logarithmic scaling, and for images we have RGB, YUV, so many different kinds of parameterizations (a couple are sketched below). We have formalized those two senses ridiculously well. Touch and smell, nada; we're where images were in, say, 1920, or maybe even the 1800s. And Alex has this incredible vision of a smell sensor eventually just being part of your daily life. As of today, when you're watching an Instagram Reel of food or something, you don't think, huh, I'd also love to know what that smells like, because as a society we really haven't built the muscle to even understand what a smell sensor can do. The more near-term effects are obviously going to be around things that provide more obvious utility in the short term, like maybe smelling cancer, or repelling mosquitoes better, or stuff like that.

Yeah, more recently he's been talking about categorizing perfumes; obviously that's a market you can pursue.

Yeah, think about how you could customize a perfume to your own liking, in the same way you can customize a shoe or something. That's the near-term stuff. If he's able to figure out near-term value, they as a company can sustain themselves to then eventually try to make progress on the long term, which is really uncharted territory. Think about it 50 years from now: it would be pretty obvious to kids of that generation, I was going to say when they scroll a Reel on their phone, but maybe phones won't be there, they'll just be watching something on their glasses, or in VR, and they'll immediately get a smell sense of that remote experience as well.
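As a concrete illustration of the formalized parameterizations being contrasted with smell here, the snippet below shows two standard representations, a log-mel spectrogram for audio and an RGB-to-luma conversion for images. It assumes `librosa` and `numpy` are installed and uses a placeholder audio file path; there is no comparably off-the-shelf representation for smell.

```python
import numpy as np
import librosa  # standard audio tooling; nothing comparably standard exists for smell

# Audio: a log-mel spectrogram, a widely agreed-upon parameterization of sound
# that mimics how ears weight frequencies logarithmically.
y, sr = librosa.load("example.wav")  # placeholder path to any short audio clip
log_mel = librosa.power_to_db(
    librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128))

# Images: RGB -> luma, one of many agreed-upon color-space parameterizations
# (BT.601 weights used here).
rgb = np.random.rand(64, 64, 3)
luma = rgb @ np.array([0.299, 0.587, 0.114])
```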
We haven't really progressed enough in that dimension, and I think they have a chance to do it.

Awesome. We touched on a lot of things; is there anything we're missing, anything you want to direct people to, any call to action, call for research, call for startups?

I don't really have a lot of calls to action, because usually I think people should be intrinsically motivated and figure that out for themselves.

Look inside yourself; that's a good one. Awesome, thank you so much for coming on, this was great. Thanks.