François, you've advocated a lot for program synthesis, and I'm curious: as someone who created Keras and is obviously a black belt in deep learning, how did you become a proponent of program synthesis? How did that happen?

Well, very early on I was really fascinated by deep learning, and like many other people in the field I thought deep learning was going to be able to do everything. I thought you could actually use gradient descent as a full replacement for programming: that neural networks, under some conditions, could be Turing-complete, and that gradient descent was a good enough way to program them into doing whatever you wanted, as long as you could provide enough examples. Around 2015-2016 there was a lot of research along these lines, Neural Turing Machines for instance, and the like. Around that time I was working on a project together with Christian Szegedy at Google, and we were trying to use deep learning to do theorem proving, in particular to guide a symbolic theorem prover, and we were training models to interpret higher-order logic statements, represented as strings back then. I had a very difficult time with it. I tried many tricks, many kinds of techniques, but the inevitable conclusion I kept reaching was that no matter what you tried, the neural network would always latch onto statistical regularities, essentially noise, and would not actually implement the parsing program I wanted it to implement; it just could not find it. This was despite the fact that, in theory, that program was representable by the network; it's just that gradient descent couldn't find it. I was struck by that, and I started realizing that deep learning was a good fit for some kinds of problems, problems where you're effectively doing pattern matching in a continuous space, but was not going to be able to do everything; that you actually needed to involve discrete symbolic programs, and that if you wanted to create these discrete symbolic programs, there were better ways than gradient descent. I think this is a realization that has been confirmed many times since. Nowadays we even have examples where you have a very simple algorithmic task, you try to solve it with a Transformer, and the solution that generalizes is representable by the Transformer; if you initialize the weights of the Transformer with the exact correct weights that implement the generalizable solution, and then keep fine-tuning the network on more examples of the same task, the network will unlearn the solution that generalizes and will learn a different, overfit solution. This is very strong evidence that gradient descent is just not the way to learn algorithms like this; you actually need discrete search.
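As a hedged illustration of the failure mode being described (a toy linear-regression stand-in of my own, not the actual Transformer experiment referred to above): initialize a model at the exactly correct, generalizing weights, then keep running gradient descent on a small noisy training set, and the held-out error ends up worse than it was at initialization.

```python
# Toy sketch only: a linear model initialized at the exact generalizing solution
# (the identity map y = x), then trained by gradient descent on a small, noisy sample.
# Gradient descent drifts away from the generalizing weights to fit sample-specific
# noise, so held-out error after training is worse than at step 0.
import numpy as np

rng = np.random.default_rng(0)
dim, n_train = 32, 20                             # fewer examples than dimensions
W = np.eye(dim)                                   # start at the correct solution

X = rng.normal(size=(n_train, dim))
Y = X + 0.1 * rng.normal(size=(n_train, dim))     # training targets carry a little noise
X_test = rng.normal(size=(1000, dim))

def held_out_error(W):
    return float(np.mean((X_test @ W - X_test) ** 2))

print("held-out error at init:", held_out_error(W))         # 0.0 by construction
lr = 0.05
for _ in range(5000):
    grad = X.T @ (X @ W - Y) / n_train            # gradient of mean squared training loss
    W -= lr * grad
print("held-out error after training:", held_out_error(W))  # now greater than at init
```

This is only an analogy; the point it makes concrete is that being able to represent the generalizing solution, and even starting from it, does not stop gradient descent from trading it away for a lower training loss.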
So how much of that do you put down to the learning mechanism? In that case you're saying you started with the initial conditions being the correct solution, and it gets unlearned. How much of that is deficiencies in the learning mechanism versus the representation? Before you came, we were talking about the advantages of symbolic, classical program representations versus more neural representations, and whether there's a benefit to hybrids, or to things that are distinct from either of those. So, thinking forward, do you think we primarily need new learning methods, new optimization methods, or new representations, or both, or something else?

It's a little bit of both, but the primary bottleneck is actually the learning mechanism; the primary problem is that gradient descent is just not the way to learn programs. The representation matters as well, and I think we need better representations. If you're faced with a problem that has a continuous structure, then vector spaces are the right data structure to approach it. If you have a problem that is more discrete in nature, it's always possible to embed your discrete structure in a continuous space, but of course you don't derive any benefits from interpolating or moving around that continuous space. So it is possible to handle discrete problems with neural networks, but it is clearly not an optimal choice; there are no benefits to doing so, and in fact it's just not very efficient. So the representation problem is definitely real, and that's a space where you can propose new ideas, but it's not the primary bottleneck. You could in principle do everything with neural networks if you had the proper learning mechanism, even though for many problems neural networks might not be the best fit.
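To make the "embedding a discrete problem in a continuous space" point concrete, here is a small illustrative sketch of my own (not an example discussed above): subset-sum posed both as a direct discrete search and as a continuous relaxation optimized by gradient descent and then rounded. The relaxation is easy to write down; whether it actually buys you anything is exactly the issue being raised.

```python
# Illustrative sketch: the same tiny discrete problem (subset-sum) handled directly
# versus embedded in a continuous space. Neither formulation is presented as "better";
# the point is just what the embedding looks like.
import itertools
import numpy as np

values, target = np.array([3.0, 7.0, 1.0, 8.0, 5.0]), 12.0

def discrete_search(values, target):
    # Enumerate the 2^n subsets directly.
    for bits in itertools.product([0, 1], repeat=len(values)):
        if np.dot(bits, values) == target:
            return np.array(bits)
    return None

def relaxed_search(values, target, steps=2000, lr=0.05, seed=0):
    # Relax each 0/1 choice to a sigmoid of a real parameter, run gradient descent
    # on a squared error, then round back to a discrete subset.
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=len(values))
    for _ in range(steps):
        w = 1.0 / (1.0 + np.exp(-theta))              # soft 0/1 choices
        err = np.dot(w, values) - target
        theta -= lr * err * values * w * (1.0 - w)    # gradient of 0.5 * err**2
    return (1.0 / (1.0 + np.exp(-theta)) > 0.5).astype(int)

print("discrete:", discrete_search(values, target))
print("relaxed :", relaxed_search(values, target), "(may or may not round to a valid subset)")
```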
~ Sponsored Segment Removed ~

Yeah, I mean, I guess we, and you, have done some work looking at embeddings of discrete structures into continuous spaces, and in some of the early work I remember you saying to me: we could try this, but all of the evidence you've seen so far is that even if you can encode it, it's just going to create this intractable surface to optimize, so it's actually not worthwhile. Do you think that's generally true? What are the cases where there is some benefit to having a relaxation or an embedding of a discrete problem in a continuous space, and what are the cases where that just doesn't work out, for whatever reason?

It seems helpful when you can prove that the relaxation is actually upper-bounding what you want, so it can give you some guidance: in the smooth space there tends to be stuff you can pretend to do that you couldn't actually do in the discrete space, but only as a kind of stepping stone to refining it into a discrete solution. Which is connected to, but I think distinct from, using neural networks as a heuristic to guide discrete search. But it is this weird empirical fact that when you try to train neural nets to emulate algorithms, they just don't do it, whereas they emulate all sorts of other interesting computations. It's really weird.

I think it really depends on the intrinsic structure of your problem. If you have a problem that has a continuous, interpolative structure, a problem that follows the assumptions of the manifold hypothesis, then sure, you should be using neural networks; that is probably the best fit. If you have a very discrete problem, say you want to find new prime numbers, you absolutely cannot use neural networks; it's just a terrible idea.

So we've been talking about continuous representations and discrete representations. What do you think could be a hybrid substrate, a data structure and a learning process that would not quite be continuous in nature like neural networks, and not quite discrete in nature like discrete program search, but something in between that could do both at the same time? Not hybrid in the sense of combining neural networks with discrete search, but hybrid in the sense that it is one data structure that natively has characteristics of both.

Yeah, I think this is the big question, and I don't think we necessarily have the answer. I think there are ways, and this may still come under hybrid, of integrating neural networks into programs at a deeper level than what is currently being done. Right now you've effectively got programs calling neural networks and neural networks calling programs, but I think there's a way to integrate the neural network more into the semantics of a programming language, which could at least be interesting. One mental model I have is debugging. Imagine you've got a Python program and instead of just running it you're debugging it: you're more or less stepping through every step of the execution trace, and at every step you can inspect the stack, maybe go backwards and revise the steps you took. What if you put a neural network there, where it can actually control the dynamics of the program's execution?
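As a hedged sketch of that "neural network inside the debugger" idea (my own illustration of the shape of the interface, with a trivial hand-written policy standing in for the neural network): the program supplies the steps, and a pluggable controller decides at each point in the trace whether to run, skip, or rewind.

```python
# Illustrative sketch only: a toy debugger-style interpreter in which a pluggable
# controller (in a real system, a neural network) steers execution, while the
# program itself acts as the guard rails.

def interpret(program, env, controller, max_steps=100):
    """program: list of (name, fn) steps, each mapping an env dict to a new env dict.
    controller: callable (step_index, env) -> one of "run", "skip", "back"."""
    trace = []                                  # execution trace the controller can inspect
    i = 0
    for _ in range(max_steps):
        if i >= len(program):
            break
        action = controller(i, dict(env))
        if action == "run":
            name, fn = program[i]
            trace.append((i, name, dict(env)))  # remember the state before this step
            env = fn(env)
            i += 1
        elif action == "skip":
            i += 1
        elif action == "back" and trace:
            i, _, env = trace.pop()             # rewind to just before that step
    return env, trace

# A trivial controller standing in for a learned policy: always step forward.
always_run = lambda i, env: "run"

program = [
    ("incr_x",   lambda e: {**e, "x": e["x"] + 1}),
    ("double_x", lambda e: {**e, "x": e["x"] * 2}),
]
print(interpret(program, {"x": 1}, always_run)[0])   # {'x': 4}
```

The controller argument is where a learned policy would plug in; the surrounding program stays fixed and provides the structure.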
So this is a deeper integration of a neural network than just a black-box function call, and I think it could lead to something where the program is almost the guard rails, the structure of how things should execute, but the neural network has more latitude about exactly what to do and when. For a more bottom-up reformulation, I think we need to think deeply about what we want from our representations of programs, and maybe my more concrete answer is to take a functional perspective. You can imagine using a neural network to implement the operations, essentially to implement the interpreter of a program. That is saying: I don't care about the underlying syntax or how it is represented, I just care that when I execute it, I get the right result. There's a lot of work that I've done experimentally, and that other people have done, saying: let's take a functional perspective, asking what we want our programs to do as opposed to what they are, and if we know what we want them to do, then we can build neural networks that implement those things, and that could perhaps make the learning problem, the discovery problem for these programs, a little bit easier. You were talking about Neural Turing Machines, and around the same time there were things like neural stacks and neural data structures. This is less "let's make a new architecture for this new thing" and more: use a standard neural architecture but implement, say, a Turing machine or a stack by behavioral equivalence rather than structural equivalence. It's a little bit fuzzy, but I think there's a scope of approaches that look at a higher level of behavior rather than at new architectures. Do you have ideas about this?

First, I think it is the big question, as you were saying, to do what you're suggesting: to make something that is more deeply integrated and not just a Frankenstein bolting-together of these pieces. And if we had a neural network that was executing the code, it might have the advantage that you could do some of the test-time training that we're seeing work on ARC, because you can fluidly redefine the semantics for each problem. That seems really valuable: not just having a fixed set of symbols, but being able to say, for this problem I want to define what an object is a little bit differently, and it might be more of a perceptual, sub-symbolic thing. If you define your semantics in terms of these neural operators, which have the API of a programming language but might have a sub-symbolic implementation, then maybe you could get the best of both worlds. Do you have any ideas?

Yeah, these are very interesting ideas, and trying to learn a program interpreter could have lots of useful applications. In particular, even if you're just doing discrete program search, you could leverage a neural program interpreter, because program execution is really what takes a lot of time in program search, and the forward pass of your program-interpreter neural network would probably be quite fast. So you could use a neural network as a sort of guessing machine with respect to program execution: you read the program, or some representation of the program, and guess what its behavior would be, and that would be a way to guide the search process more efficiently, without having to actually enter the execution loop.
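A hedged sketch of that "guessing machine" idea, with a crude hand-written proxy standing in for the trained neural scorer (the tiny DSL and the surrogate_score function here are made up for illustration): candidates are ranked by the cheap guess first, and only the most promising ones are actually executed against the examples.

```python
# Illustrative sketch: use a cheap "guesser" to rank candidate programs before
# paying for exact execution. The DSL and the surrogate are toy stand-ins.
import itertools

PRIMITIVES = {
    "reverse":   lambda xs: xs[::-1],
    "sort":      lambda xs: sorted(xs),
    "drop_last": lambda xs: xs[:-1],
    "double":    lambda xs: [2 * x for x in xs],
}

def run(program, xs):
    for name in program:
        xs = PRIMITIVES[name](xs)
    return xs

def surrogate_score(program, examples):
    """Stand-in for a trained neural scorer: guess, without running the program,
    how plausible it is (here, crudely, by predicting output lengths). A real
    system would use a model trained on (program, examples) -> success."""
    score = 0.0
    for xs, ys in examples:
        predicted_len = len(xs) - program.count("drop_last")
        score += 1.0 if predicted_len == len(ys) else 0.0
    return score

def search(examples, max_depth=3, budget=20):
    candidates = [list(p) for d in range(1, max_depth + 1)
                  for p in itertools.product(PRIMITIVES, repeat=d)]
    # Rank everything with the cheap guess, then fully execute only the top few.
    ranked = sorted(candidates, key=lambda p: -surrogate_score(p, examples))
    for program in ranked[:budget]:
        if all(run(program, xs) == ys for xs, ys in examples):
            return program
    return None

examples = [([3, 1, 2], [2, 4, 6]), ([5, 4], [8, 10])]
print(search(examples))   # e.g. ['sort', 'double']
```

In a real system the surrogate would be a model trained to predict, from a program and the task examples, whether execution would succeed, and the expensive exact check would only run on its top guesses.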
One question I've had for you is: how important do you think systems-level engineering problems, or approaches, are in this? I think about that in the context of your work on Keras and so on, and we discussed it a little over email. There's a whole ecosystem of machine learning built on a framework of automatic differentiation, and then, as you built with Keras, layers on top of that. So as you think about these kinds of high-level problems, do you think we need more infrastructure to support them, or do you think we more or less have the right foundation to tackle them?

I really don't think we have the right foundations today. I do think we need a lot more and a lot better infrastructure, and that will come, but right now we're actually at the research stage where we need to figure out what works. The state of program synthesis today is pretty much where deep learning was in the early 2010s: there's a lot more interest in it, some things are starting to work, but we are still missing the big breakthrough moment, that crystallization of understanding of which techniques are actually going to work and which techniques are actually going to scale to solve real-world problems. Once we understand that, once we've settled on an algorithm, then we can start building up infrastructure around it, and in the future there will be a Keras for program synthesis, I'm quite sure. I don't think today is the right time to build it, because we just don't know enough about the lay of the land, but in a few years, hopefully, yes, we're going to have that.

So my main PhD adviser, Armando, his main thing was program synthesis, and he's kind of the original program synthesis guy; jointly with Kevin, well, he advised both of us. There were a lot of different techniques for program synthesis that were developed, mostly symbolic, but symbolic doesn't just mean searching over programs; there are a lot of more interesting techniques. You can use abstraction, abstract interpretation, SAT and SMT solvers, which are incredible pieces of technology. Then language models came along and started to generate programs, and in some sense the whole research paradigm was validated, because it turns out that program synthesis is indeed very useful, but a lot of the methods that were developed aren't directly feeding into the current leading program synthesis methods, which are largely LLM-based. So I wonder, when you think about the current technical foundations, do you think we're going to see a return to some of the more classical techniques? Maybe classical isn't even the right word, but methods which incorporate more of the semantic understanding of what programs are, which do more analysis, versus things which are more purely data-driven and learned, even if they are using programs as the underlying representation that's being constructed.
I think the future is definitely more learned and more data-driven, but I think you can learn in a more symbolic manner; you can learn symbolic abstractions via a symbolic search process. I think the reason people are using LLMs so much for code generation today is just because it works and it's easy; it's a readily available tool. And the reason it works so well is just scale. The amount of resources that went into developing these models and training them is roughly 10,000 times what went into the symbolic techniques you're describing, which today remain very obscure academic topics. Meanwhile, hundreds of billions of dollars went into developing these huge models trained on all the code in the world, and when you put that amount of resources into something, even if it may not be the optimal approach, even if it's actually severely suboptimal, it will create something useful, something powerful. And once it's out there, you kind of have to use it: if you're not making use of it, you're leaving a lot of power on the table, missing out on a very powerful, very useful tool. So from a game-theory perspective, the sheer amount of investment in LLMs kind of forces everyone to standardize on this set of techniques, at least for now. I don't think it's the best way to approach program synthesis, but it's a way that is giving good results today, so it's just worth pursuing.

One thing I wondered is: imagine what program compilation would look like if you used GPT-4-level compute to compile your programs. Do you have any thoughts on that?

I don't want GPT compiling; maybe it could write part of the compiler.

I just meant the level of compute. If you think about how much compute goes into the analysis of programs at the moment, we get tired if we've spent a few minutes on a single machine. If you had a cluster, a supercomputer, it expands the class of analyses and methods you would employ.

Oh, like a superoptimizer?

Yeah, I mean, that could happen, probably hand in hand with the LLMs, just because they're so useful in practice.

So, speaking of scaling methods, are either of you freaked out by the fact that things like Cyc tried hard to scale symbolic knowledge, and today we don't use Cyc but we do use GPT? How do you reconcile that?

Cyc was never very scalable, because it was very reliant on human labor.

I see, so you think learning is really just the missing ingredient?

To scale, you need to be able to delegate everything to the computer; there should be no human in the loop.

Yeah, and I think also there's a class of approaches that have been human-made ontologies of knowledge, built by groups of dedicated people in decade-long projects, and they've scaled in the sense that they've grown and grown, but not comparably to, let's say, learning-based approaches. I think part of the reason is what François said: you want to delegate as much as possible, if not everything, to things that scale, to computers.
But also, classical ontologies are just not good representations of knowledge; the world is not encoded in graphs. So we need both things that can capture the complexity of the world and systems that can scale to build that knowledge from data.

Yes, and you can definitely argue that vector spaces and embeddings are intrinsically a much better representation of knowledge than graphs. It may not be true for every type of data, but for most of the data we care about it is, and that's why neural networks do so well across so many modalities; they're just a fundamentally good idea compared to ontologies.

But I feel like there are a few different layers there. There's the vector space, which is the most grounded layer, in which things are encoded, but on top of that there are the learned representations that do the encoding. So when you were speaking I was thinking that, in my mind, there's not a direct analogy between the kinds of graph ontologies people have constructed and the vector space representation; they seem to sit at slightly different levels of implementation. I've also got some questions about ARC and your findings from the ARC challenge. I'm sure we'll talk more about this soon, but maybe you could just say what the biggest insight you learned was, or what was unexpected, going through this most recent round.

The biggest insight is that I've come to realize it's definitely possible to make LLMs and deep learning models adapt to novelty, and there's more than one way to achieve this. Test-time training has definitely emerged as one of the big techniques working today for getting them to adapt to novelty. It's still an open question whether test-time training is actually good enough, whether it is going to be able to do everything and solve strong generalization in neural networks; I think it's possible, with further refinements. And test-time training is not even the only approach you can take. You can also do things in the style of the o1 model, where you try to adapt to novelty by writing down a program, and that program is written step by step in a very iterative fashion, in fact probably by an AlphaZero-style search process. By writing down this program, which is an artifact that is new and adapted to the task at hand, you can make your static model adapt to novelty. A few years ago it seemed to me that we were never going to get deep learning out of the classical paradigm where you just train a big model on a lot of data and then apply it in a static fashion at inference time. Of course, if you stay within that paradigm, you're fundamentally limited to memorizing and then, at inference time, fetching and reapplying patterns, and if that's all you're doing you're never going to generalize very well; you're always going to be limited to situations that are very close to what you've seen in the past. But now, if you include test-time fine-tuning to adapt to new patterns, if you include the ability to iteratively write your own program, to reprogram yourself on the fly to adapt to a new task, and so on, then maybe these systems can achieve a much higher degree of generality, starting with cracking ARC-AGI. I think it's highly possible at this point.
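For concreteness, here is a minimal, hedged sketch of the test-time-training idea mentioned above (plain PyTorch on a toy linear model; the actual ARC systems are of course far more elaborate): clone the pretrained model and take a few gradient steps on the new task's demonstration pairs before answering its test input.

```python
# Minimal sketch of test-time training: per-task adaptation on the task's own
# demonstration pairs, leaving the original pretrained weights untouched.
import copy
import torch
import torch.nn as nn

def test_time_adapt(model, demos, steps=20, lr=1e-2):
    """demos: list of (input, target) tensors for the new task."""
    adapted = copy.deepcopy(model)                 # adapt a copy, not the original
    opt = torch.optim.Adam(adapted.parameters(), lr=lr)
    xs = torch.stack([x for x, _ in demos])
    ys = torch.stack([y for _, y in demos])
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(adapted(xs), ys)
        loss.backward()
        opt.step()
    return adapted

# Toy usage: a "pretrained" model is specialized to three demonstrations, then queried.
model = nn.Linear(4, 4)
demos = [(torch.randn(4), torch.randn(4)) for _ in range(3)]
adapted = test_time_adapt(model, demos)
print(adapted(torch.randn(4)))
```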
Do you see a clear delineation between classes of novelty, between things that are more genuinely novel and things which are not? In the past you've said that one of the technical implications of encoding your knowledge in a vector space is that you're really going to be interpolating between, as you said, patterns or programs. But one of the things ARC tries to test for, at least in my view, is a certain class of novelty which goes beyond that. We've thought and talked about where that boundary of novelty is: there's clearly some generalization you can get by interpolating, and intuitively, when you look at problems, there are some which seem to require some new concept or new idea we haven't seen before, but it's hard to say here is the strict boundary between these two classes of generalization, if there are two. I wonder if you have thoughts on that.

Yeah, I think what you're describing is compositional novelty, where you're taking elementary building blocks and recombining them, typically via function composition, and this is something that Transformers are really bad at; in general, Transformers are very bad at function composition, broadly speaking. It doesn't mean that you cannot solve these problems with Transformers; it means you cannot solve these problems with Transformers alone. You need to add something, but if you put your Transformer in a loop, for instance, you start being able to solve a much larger set of problems. So one thing we're doing right now is trying to understand which of the tasks in ARC are easy to solve without exhibiting strong generalization, and which ones actually do require a little bit more while remaining very easy for humans, because humans have the secret sauce for generalization and we don't quite know what it is yet. So early next year we're going to be releasing a new version of the dataset that tries to carry a higher signal towards AGI, to require stronger generalization: it's going to contain fewer tasks that are easy to crack via just brute-force search, and more tasks that involve compositional complexity and really require strong generalization to solve.

Can you give any more hints about the design decisions that push in that direction? Is it a systematic process, where you'd say, okay, this new candidate ARC 2 problem does force strong generalization versus not, or is it more of an intuitive process, where you just know as a human designer?

It's a little bit of both. The thing is that we don't yet have a crisp answer as to how humans perform strong generalization, or what kind of task would strictly require it and would not be approachable at all by brute force, so we don't quite know. We're looking at the results we're currently getting, and we're also looking at human difficulty data.
By the way, this is also a big novelty with ARC 2: we're getting all the new tasks solved by humans, by just random people that we're hiring, and we're getting them to solve our little puzzles. That's how we know, first of all, that they're solvable, and second, we have human difficulty data available: we know how many attempts were used, how many people solved each problem, and so on, and we can see how that correlates with AI-facing difficulty as well.

I see. I guess I've got a question for maybe both of you. Within this new project, Mara, that we've been developing and launched, we've been thinking both about our own benchmarks and about existing benchmarks; obviously we've started working on ARC. But there's a question of whether we should continue working on ARC, both ARC 1 and ARC 2 as it comes out. This came up in our discussions earlier with Tim: how much of the approaches taken to ARC in the previous round were truly in the spirit of ARC, versus just trying to get the best score? I think I said that our approach was a little bit of both: we had some ARC hacks, and we also thought deeply about some principles. So maybe as a question to Kevin, and I'd like to hear your thoughts as well, François: how much of the Mara project's time should we dedicate to ARC, and if not ARC, what other things might we want to pursue outside the ARC framework?

So I think ARC continues to be a source of new ideas, and there's been a lot of progress on it, but it's also been amazingly resistant to people trying incredibly hard to crack it, so I think there's going to be a good payoff in continuing to work on it. With Mara, the main thing we want to bring into the picture is the fact that you're not just passively receiving the data; you're trying to play with something, experiment with it, and, in the spirit of ARC, build a model on the fly, but where it's your job to figure out what questions you want to ask and what experiments you want to run. So I think our plan here is to keep doing ARC stuff and things in the ARC sphere, but also to supplement it with new kinds of tasks that are ARC-style but more about active learning. So, François, should we wait for ARC 2 or should we continue on ARC 1?

I think ARC 1 is starting to saturate, but of course it remains a very useful playground, so you don't have to wait for ARC 2. If you have ideas you want to try, you can try them on ARC 1, and if they're good ideas they will score well, and then you can try them again on ARC 2 when it comes out; you're not going to have to wait too long, by the way. And I think the reason you should continue working on ARC is that it's a micro-world for a lot of these important problems around generalization and on-the-fly adaptation. If you look at any other benchmark that features the same problems, you will see that it also involves a very significant amount of knowledge. For instance, you could be working on code benchmarks, software engineering benchmarks, and they do feature the same kind of problems, but because they involve so much specialized knowledge about programming, programming languages, code patterns and so on, it kind of gets in the way.
ARC is this very clean, very minimalistic micro-world where there's almost no knowledge involved, just a few bits of core knowledge, and it's all about abstraction and generalization, all about the key problems. So this really brings, I think, a lot of focus onto the questions that really matter.