Hi, everyone. Thanks for joining us. As an introduction, this is Deep Papers. It's a four-part series doing deep dives on seminal papers. Today, we are fortunate enough to have some of the authors of the InstructGPT paper on with us from OpenAI. This was one of the first major results applying reinforcement learning with human feedback to large language models, and we're discussing it in the wake of a lot of the success of ChatGPT. InstructGPT was some of the background behind ChatGPT, and a lot of the main methods ended up carrying over. So we're very fortunate to have some of the authors of the paper on today, and we're going to talk about the InstructGPT paper. But before we begin, let's just do a quick round of intros for everyone to introduce themselves. I'm also joined by Jason, who's the CEO of Arize AI. So maybe I can go first, then you guys from OpenAI can go, and then Jason can go. I'm Brian. I run a Twitter account called AI-Pub. The goal for 2023 is to turn it into a Twitter-native publication covering technical AI news and topics, primarily geared towards a technical audience of machine learning engineers and machine learning researchers. I'm also on leave from a machine learning Ph.D. within the Statistics department at the University of Washington. Do you want to go, Long? Yeah. My name is Long Ouyang. I'm a research scientist at OpenAI. By training, I'm a cognitive psychologist, but these days I work on human-in-the-loop ML projects, of which there are many at OpenAI. And I collaborated with Ryan on this InstructGPT project, which spanned a couple of years. Yeah, and I'm Ryan. I'm also a researcher at OpenAI, and I currently lead a team called Practical Alignment, which is a sub-team under the alignment team, focusing essentially on short-term questions, which in the last couple of years has essentially meant working on the InstructGPT line of work.
Great, great to have both of you on. And I'm Jason, co-founder of Arize AI. We do ML observability and monitoring, so we watch, observe, and analyze models. We see a lot, from large language models to CDD to, you name it, and I'm really fascinated by the subject. So excited to kick this off. Great. Well, maybe let's just dive into it. The first 10 minutes or so of this podcast are just going to be giving an overview of the paper, and then later on we'll move into a more open-ended discussion. I assume the vast majority of the listeners haven't actually read the paper, so for the first bit we're just going to go through and chat about some of the major ideas. So just to start: you have these vanilla LLMs trained on vast quantities of text data from the internet, and then the InstructGPT paper, and papers before and after it, introduced these models that are trained using reinforcement learning with human feedback. What are the main problems with these vanilla large language models that you addressed in the paper? And what was some of the motivation behind the InstructGPT paper, maybe in a more positive sense, aside from just the problems encountered with non-fine-tuned LLMs? Yeah. So I think one of the main issues we were trying to solve is that when GPT-3 came out, there was a lot of excitement about using it to do useful cognitive work, for example summarizing a news article, and out of the box it's not exactly designed to do that. It's designed to predict what someone on the internet might say in a given setting. And it turns out that you can kind of trick the model into performing useful work for you by setting up a text that, when the model auto-completes it, gives you the output you want.
So for summarization, an example would be maybe a few examples of an article followed by "TL;DR" and a summary of that article, and then finally the article that you want summarized and "TL;DR", and then you ask the model to complete it. So the model isn't designed to actually be an assistant or a useful tool, but you can kind of contort it to do that in some cases. And the overall goal of the paper and the project, which continues to this day, is to actually just fine-tune the model on an objective function which is: actually be a useful assistant or a useful tool. This actually emerged out of some earlier work on what we call aligning language models. Ryan, do you want to just talk a bit about alignment at OpenAI? Sure. Should I just speak for the whole alignment team, etc.? Why not? Okay, briefly. People have different definitions of alignment, but one definition that you could use is: how do you get the systems that we're training to optimize the thing that we actually want to optimize? Historically it started with a small team, and that's where some of the initial RLHF work came into play, and that evolved. Now we have, let's say, a short-term alignment team, which asks: how do we have current models optimize the thing we really want to optimize, which is to be useful, be helpful, and also mitigate harms and be truthful? And there's also some work on longer-term alignment, which is trying to think about what new alignment problems might come up as we scale these models. So there's some work on what we call scalable supervision, and a bunch of other things which I won't get into right now. Was the term "alignment" created by OpenAI? I hadn't heard of it before.
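To make the few-shot trick concrete, here is a small sketch in Python of the kind of prompt construction being described: example articles paired with TL;DR summaries, followed by the target article and a trailing "TL;DR:", so that a base model's natural continuation is a summary. The helper name and the example text are made up for illustration; this is not code from the paper.

```python
# A sketch of the few-shot "TL;DR" prompting trick: format the prompt so
# that the most natural continuation for a base language model is a summary.

def build_tldr_prompt(examples, article):
    """Concatenate (article, summary) examples, then the target article,
    ending with 'TL;DR:' so the model's completion is the summary."""
    parts = []
    for ex_article, ex_summary in examples:
        parts.append(f"{ex_article}\nTL;DR: {ex_summary}\n")
    parts.append(f"{article}\nTL;DR:")
    return "\n".join(parts)

examples = [
    ("Scientists observed frogs using leaves as shelter during storms.",
     "Frogs shelter under leaves in bad weather."),
]
prompt = build_tldr_prompt(
    examples, "A new study finds city noise changes frog calls.")
print(prompt.endswith("TL;DR:"))  # True: the model completes from here
```

The point of the format is exactly what is said above: the model isn't designed to be an assistant, but the prompt contorts next-token prediction into doing the task.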
You know, it's caught on recently, but I was just wondering if it was created by you guys. I don't think so. My memory is, and I could be misremembering, that it was maybe, if not coined by, then at least popularized by Stuart Russell at Berkeley. Okay. Well, maybe you guys could just give a short elevator pitch or a summary of the InstructGPT paper. What did you do? What was the result of it? What is InstructGPT, for someone who's listening and hasn't heard of it? Yeah, why don't I describe what it is, and then Ryan, you can give an outline of how we built it. This is an automated system to which you provide some text as input, and it provides you some text as output. These models produce probability distributions over what we call tokens. A token is kind of a part of a word, sometimes an entire word, and you get output from the tool by, at every stage, sampling what the next token might be, and then continuing that process until you're done. So sometimes you get different results, because the model is a little bit probabilistic. Importantly, the input that you give this model is just a natural-language command or instruction. So you can give it an instruction like "write a story about frogs in French," and it's trained on a wide variety of different tasks.
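As a toy illustration of that sampling loop: the "model" below is just a hard-coded lookup table from a token context to a next-token distribution (a stand-in for illustration, obviously not a real language model), but the generation loop has the shape described here: sample a token, append it, repeat until an end token.

```python
import random

# Stand-in "model": maps a context (tuple of tokens) to a distribution
# over the next token. A real LM computes this with a neural network.
FAKE_MODEL = {
    (): {"Once": 0.9, "The": 0.1},
    ("Once",): {"upon": 1.0},
    ("Once", "upon"): {"a": 1.0},
    ("Once", "upon", "a"): {"time": 0.8, "<end>": 0.2},
}

def sample_next(context, rng):
    # Unknown contexts fall back to ending the sequence.
    dist = FAKE_MODEL.get(tuple(context), {"<end>": 1.0})
    tokens, probs = zip(*dist.items())
    return rng.choices(tokens, weights=probs, k=1)[0]

def generate(rng, max_len=10):
    context = []
    for _ in range(max_len):
        token = sample_next(context, rng)
        if token == "<end>":
            break
        context.append(token)
    return context

# Different seeds can give different outputs: the model is probabilistic,
# which is why you sometimes get different results for the same prompt.
print(generate(random.Random(0)))
```

The probabilistic sampling at each step is exactly why, as noted above, the same input can yield different outputs on different runs.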
So it can generalize to tasks like "write a story about frogs in French," which I think is the type of thing that was not seen during its training. And just to highlight again the difference between the InstructGPT model and earlier vanilla language models: the InstructGPT model kind of, quote unquote, understands that you're giving it some explicit cognitive task to do, and that you're doing that explicitly in language, whereas with previous LLMs the way you communicated to the model the task you wanted done was maybe through some examples, or in a more implicit fashion. Gotcha. Right. Yeah, at a high level, how we get there is essentially using human data, using labelers. We hire a set of contractors to label data for us, and we essentially do an extra fine-tuning stage on top of the normal language model pretraining stage, and that has three steps, which I think we'll get into a bit. But essentially, there are a few different kinds of data, and one of the main kinds these labelers produce is: given some input, like "write a story about frogs," there are multiple candidate outputs generated by different models, and labelers essentially rank, from best to worst, which outputs they prefer, according to some set of instructions and their interpretation of those instructions. And then essentially we train the model using reinforcement learning to try to produce outputs that are closer to the outputs that a human would prefer. Gotcha. And then that naturally segues into getting a little bit more into the details of this three-step process. I'm curious if you could just go through each step in the process.
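As a small aside on the data format: a single labeler ranking of K candidate outputs can be expanded into K-choose-2 pairwise comparisons, which is the form in which preference data is typically fed to a reward model. A hypothetical sketch (the field names here are my own, not the paper's):

```python
from itertools import combinations

# Expand one labeler ranking (best first) into pairwise preference
# examples: every earlier-ranked output is "chosen" over every later one.

def ranking_to_pairs(prompt, ranked_outputs):
    return [
        {"prompt": prompt, "chosen": better, "rejected": worse}
        for better, worse in combinations(ranked_outputs, 2)
    ]

pairs = ranking_to_pairs(
    "Write a story about frogs.",
    ["story A (best)", "story B", "story C (worst)"],
)
print(len(pairs))  # 3 candidates -> 3 pairwise comparisons
```

One ranking over several candidates is therefore a cheap way to get many training comparisons out of a single labeling pass.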
And then I think one thing that would be interesting, if you have any comment on it, is the intuition behind the method. Why train a reward model at all? Why do the supervised fine-tuning in the first step? If you have intuition of that sort. Yeah. So maybe we'll start with just the reward model, because that's maybe the critical piece of our method. The kind of data that Ryan referred to earlier, where data labelers are giving their preferences over different, say, stories about frogs: we use that data to train one very large network, which we call a reward model. And then there's a separate step. The reward model you can think of as almost like the score in a video game, or like a teacher. What the reward model takes in as input is an instruction and an output, and it returns a number, and that number tells you how good the output was. If the number is high, it means the story about frogs was a good story. If the number is low, it means the story about frogs was a bad story. We train this reward model on human judgments, and so this big model is approximating what people think is a good attempt at writing frog stories, or summarizing news articles, or what have you. That's one very large model. And then we train an entirely different model to actually do a good job according to the reward model. So the important piece of our method is that, instead of taking some other approaches, we're explicitly learning a representation of what people think is good performance on a task, and then separately we optimize another model to do a good job according to that representation.
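A minimal sketch of the comparison loss commonly used to train such a reward model on pairwise preference data: the model's scalar score for the preferred output should exceed its score for the dispreferred one. The scalar scores here are stand-in numbers; in practice they come from the large network described above.

```python
import math

# Pairwise preference loss: -log sigmoid(r_chosen - r_rejected).
# It is small when the reward model scores the labeler-preferred output
# well above the dispreferred one, and large when the order is flipped.

def pairwise_loss(score_chosen, score_rejected):
    diff = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-diff)))

good = pairwise_loss(2.0, -1.0)   # reward model agrees with the labeler
bad = pairwise_loss(-1.0, 2.0)    # reward model disagrees
print(good < bad)  # True
```

Minimizing this loss over many human comparisons is what turns the labelers' rankings into the scalar "score in a video game" that the second model is then trained against.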
And that's the substantive reinforcement-learning-from-human-feedback piece of things. We're doing reinforcement learning because we have one model trying to do a good job according to a different model, and the human feedback piece comes from the fact that the teacher, or score model, is trained to predict what humans would prefer. That's the meat of the method. And then, separately, to bootstrap a bunch of things, we do what's called supervised learning, or supervised fine-tuning, where instead of having people give their preferences about, say, stories about frogs that were already written, we actually just ask them to directly produce what's called a demonstration. So they themselves are asked to write a story about frogs in French, and we train the model to mimic the words that they use in those cases. That happens to be useful bootstrapping data, but it's not necessarily required for the overall method. That's very interesting, that it's not required. As a side question, do you see other major applications skipping that first step? I think we still do it sometimes. One thing is that few-shot prompting is getting pretty competitive now, so you can sometimes just skip collecting demonstrations, because the model's few-shot outputs are already acceptable, or good enough that it doesn't necessarily make sense to do separate supervised fine-tuning. Yeah. I think one way to think about it, and this gets at the intuition: RLHF really helps you get more fine-grained tuning of model behavior, whereas supervised demonstrations can more drastically shift the model behavior.
For instance, let's say you have some model and, to start off with, it's just terrible at generating summaries. Then getting a bunch of ranking feedback between different really bad summaries is maybe not the most useful. So instead, what you might want to do is collect some examples of really good summaries and have your model try to imitate those for a bit. And it's really an empirical question as to when it's best to switch over from collecting demonstrations to collecting comparisons or ranking data. We have some results on this in a different paper, but it's still a very open question, far from solved, I think. Yeah. I'm curious, just to recap: how did you guys come up with the idea for InstructGPT? Maybe that also segues into some of the background of the paper. How did you come up with the idea behind this? And also, I think you've already said this, but what would you say are the key ideas of the paper? You can probably skip that last bit, because you've already said a lot on that. But how did this idea emerge? How did the project emerge within OpenAI? Yeah. So we'd actually been working on this method for a while, with slightly different motivations. The alignment team is generally interested in not necessarily making the models better, although that does happen as a side effect sometimes, but making them more in line with what we want. And so in a couple of previous papers we had applied this method in narrower domains, to see whether it worked.
And then, kind of immediately on the heels of GPT-3 being deployed to the public through an API, some members of our team had the thought to apply these alignment techniques that we had developed in previous papers to this new model, which we were now serving to the public. Yeah, I remember that the person who wrote the original Google doc proposing this was, at the time, the manager of the team. Yeah. I mean, it's funny: I think a lot of us had probably never heard of alignment, or thought of it, or of RLHF, and it's now the hot thing. I see open-source libraries trying to do it. I think we've all been so impressed by it. It feels like GPT-3 had learned a lot but wasn't able to express it, or interact in a way that could take that learning and make it useful for people. And all of us were pretty impressed. A question for you: you've mentioned fine-tuning, you've talked about RLHF, and I feel like there's another one coming up a lot, prompt engineering, which is: how do I format the input to get the response I want? How do you see all three of these intertwined, and anything else to add to that? I have some thoughts. I think the kind of prompt engineering you need to do will change over time as we change our models. The kind of prompt engineering you had to do for a base language model was very particular and annoying; it was kind of like speaking a certain coded language. And I think it's very dependent on how you fine-tune the model.
So if you have a model that's not very sensitive to, say, adding new adjectives or changing things around, then prompt engineering is not going to be that useful, because you can try changing the prompt around and it's not going to give you any different behavior. But what you really want is models that are adaptive and controllable, or steerable. And if you do that, then prompt engineering becomes almost like a normal technology-design problem: what are the ways a human might want to interface with a language model to express what they want? Obviously natural language is a key way to do that, and maybe there are certain keywords that people will gravitate towards using to prompt-engineer. So I certainly think that as we get better at fine-tuning these models, the nature of prompt engineering will change. Hopefully it'll become more natural, and less weird and artificial. But at the same time, people are always going to discover new things after we release the models, new ways to play around with them that weren't previously thought of. So the hope is that these models just become more intuitively steerable over time. Do you think then that, well, I've noticed myself that there are probably five different ways I can ask a question now and get the right answer, and it's amazingly improved. Do you feel like it's the size of the model, or RLHF, or a little bit of all of it? What's made it so much better? Is it RLHF? I mean, it's both. Yeah. There's actually maybe one thing to interject, which I don't think we actually got to in the discussion: are there other major ways that RLHF really improves these language models,
aside from alignment? We kind of discussed alignment with human preferences. One thing that really struck me about the paper was its use as a method of distillation: if I remember correctly, the roughly 1.3-billion-parameter model trained with RLHF performed, on human evaluations, roughly similarly to, or a little bit better than, the 175-billion-parameter vanilla GPT-3. I'm curious if you could say a little bit about the distillation stuff, for the listeners who haven't read the paper, and also whether there are other major benefits of this fine-tuning, aside from alignment and aside from distillation. Yeah. I think for us, alignment is just the whole project, and in some ways it's too broad a term, so we often decompose it into dimensions like helpfulness, harmlessness, and honesty. Helpfulness is doing what the user asked; honesty is telling the truth, not making things up; and harmlessness is maybe not helping with things that are harmful. So if the instruction provided to the model was "give me Python code to hack a bank," maybe not helping with that directly. In the paper, a lot of the benefits we observed were from helpfulness, but we also looked at harmlessness. And in fact, at least in this first paper, most of the benefits come from improvements in helpfulness and honesty rather than harmlessness, although harmlessness is definitely a thing we care a lot about and a focus of current work. And yeah, to get into a little bit more detail about how this helps: alignment isn't some monolithic thing.
There are lots of small ways that are kind of orthogonal but improve the model's helpfulness in different ways. So one question you can ask about whether a model was helpful is: did it even try to do the right task? It turns out that vanilla LLMs don't always even attempt the right task when you ask them to, and our model more often attempts the correct task that was requested. A different one might be, and Ryan mentioned this idea of steerability, that the instruction might contain some constraints, like "write a story about frogs, but don't mention toads in the story." We would call that second piece, "don't mention toads," the constraint, which is not the full task but some additional condition on it. And our model also follows constraints more than some baseline models. So those are some thoughts about how the model is improving alignment, and for us, alignment really is the big thing. Ryan, do you want to talk about distillation a bit? Yeah, a quick thought on that. One way to think of it is distillation, but normally distillation means taking one model and trying to do what it does in a smaller model, trying to completely imitate the behavior of the bigger model in a smaller model. And there are a lot of things that the full 175-billion-parameter GPT-3 can do that the smaller models can't, right? Like, say, imitating an author or writing a story; I'm sure the bigger model is better at that.
But really it's a story of pointing the models in the direction that you want to point them, and it just so happens that in this particular case, the thing we're pointing them at is doing well according to human preferences on a very particular distribution, which is the distribution of prompts that people send to the OpenAI API. As a side effect of that, smaller models, if you point them in the right direction, happen to do a lot better. So yeah, you could consider that distillation, but I think it's mainly a side effect, in my view, of 1.3 billion parameters pointed at trying to do the thing, in quotes, of generating an output that's ranked highly according to a human, versus 175 billion parameters pointed in the direction of trying to generate random text from the internet. One question I've had, and some of it comes from watching people build large language models for different verticals, is: do you do alignment for, say, mathematicians, or scientists, or biologists? Is the answer that different people want different things, or does one model cover enough, and we're just trying to get the preferences right? You might not have an answer, but any guesses on where the industry goes? I think so far we've treated preferences as mostly the average preference that you'd get from asking a panel of people. But I think there are very obvious use cases where you want to be sensitive to the idiosyncrasies of particular kinds of people, or maybe even just one person, or even one person in a particular context. You know, my preferences about food on an airplane are probably different from my preferences about food in restaurants, for example. I think that is an important area.
But I think there's still research to be done about how to accommodate different preferences from different people, and potentially different contexts. Got it. So it sounds like the jury's out on whether it's a large language model built just for mathematicians, or an alignment phase on top of what's there, or whether it just gets good enough to answer everyone. Do you feel like the jury's still out, or, I guess, Ryan, do you have any thoughts on that? Well, I think as our models get better, and you can start seeing this with ChatGPT and our more recent InstructGPT models, they're getting to the point where they can give useful answers to science questions, and a person with no scientific background isn't really able to give informed preferences, at least without assistance, or without doing a lot of Googling, maybe. So it's certainly a thing we're thinking about: having different labelers with different specializations who can label different things. All that data might go into one soup, so there's one model, but when it's in science mode, it's been trained on labels from people with more scientific expertise. These are things we're thinking about, and there's kind of an art and a science to be developed there. That kind of opens up two related questions I had. One, I want to ask a very generic, open-ended question: what are the ideas that are on your mind now? Where do you want to take this work, maybe alignment more broadly, or even just the ideas for InstructGPT, over the next year or two? And the other question, which is maybe related, or you can comment on it: there are a lot of murmurs in the air, and I'm sure many of us on the call have seen some things in private.
The next generation of language models are going to be really, really powerful, and I'm curious if you think there are interesting problems that come up from that. As there are even more powerful language models going forward, are there interesting challenges that come from that? Are there ways you'll have to adapt this method to deal with these even more powerful language models? They might just know a lot more stuff, too. Two relatively open-ended questions. I can start. I think the first question was: where do we see things going? Oh, actually, maybe you'd be a better person to answer that one. Yeah, I can certainly give my personal thoughts. There are a lot of potential directions. From a technical perspective, a place where we'd love to be is: you see some kind of misalignment in the model. For example, we have some content policy that says we don't want the model to generate, say, code to hack a bank, and we find that, oh, well, actually it is possible to get the model to write code to hack a bank. Right now we have this really laborious process to steer things in the direction of not doing that, and there are still gaps in reliability and robustness. So one set of things is just continuing to hone our techniques and make them better, so that if you identify some misalignment, you can fix it quite quickly. And we're exploring a variety of things; there are some recent interesting papers by Anthropic around using models to help with this process, which are super interesting.
One of the things I'm particularly interested in is moving beyond the framework of essentially aligning to the average labeler. There are going to be some really tricky questions when you start to ask: to whom are you aligning these models? Right now it's essentially our labelers, plus us, through the set of instructions that we're getting them to follow. And we certainly don't want, OpenAI doesn't want, to be in a position where we are the moral dictator of what is right and what the correct values are. So navigating that is going to be challenging, and it involves both machine learning interventions and a broader socio-technical perspective. I find myself drifting in that direction recently, thinking about that. Yeah, it seems like the same complex content-moderation questions that Twitter and some of these others have, that you didn't realize you'd stepped into but likely will eventually. In terms of the technical side: the reward model, I noticed, was much smaller than the main model, which kind of surprised me, that it works, and works well, like that. How much investment, or how important do you think that piece is? Or is it more that getting the data is the most critical thing? I'm sure you probably have more data than anyone out there these days to do RLHF. Which is most important, the reward model or the data used for the reinforcement learning? I don't know that we'd distinguish them in that way. Okay. The thing we want is for the reward model to be good, but the reward model is, quote unquote, programmed by the data.
So it's both the process of training on the data and the input data that we give it that are responsible for the weights at the end of the day. Those are kind of part and parcel. Yeah, I will say that maybe one thing I am excited about is paying more attention to designing the data that we feed into the reward model going forward. Yep. One thing I'd add is that I certainly don't want to make the claim that RLHF is definitely the way to go, or definitely the most effective way to be fine-tuning these models. You could certainly use the same data that we're collecting and fine-tune using a different algorithm. And RLHF is actually kind of complicated: there are several stages to it, and the RL stage is pretty compute-intensive. So I think there's actually quite a bit of interesting research to do in terms of: can you come up with other algorithms that get similar benefits, maybe with less compute, or things like that? Part of it was just that we've done it this way in the past; OpenAI has a lot of organizational experience with PPO, which is the RL algorithm we use for fine-tuning. But yeah, I think that's an open question. I had one last question, and then, Jason, if you want to ask another question, then maybe I'll also open up the floor to the authors to finish off. I just wanted to circle back on something I mentioned; maybe we just didn't get to it, and again, if you can't speak too much about things, that's totally fine. But I am curious whether there are interesting questions that come up, or new challenges or new directions for this kind of research, as language models become much, much more powerful, even than GPT-3.
Yeah, I think one issue is just that making these comparison judgments becomes much harder as models get very powerful. So an example of a task that we'd like to give a powerful model is: write a code review for this pull request on GitHub. Models can't really do that today, but you can imagine that more capable models in a year or two might be able to, and that's definitely the kind of thing we'd want help with. The time it's going to take a data-labeling contractor to evaluate whether a model-written code review is good might be very high; they might not be able to do it at all. So we're getting to cases where the things people use models for outstrip any individual person's ability to evaluate the model. I think one very salient challenge is just that when the models are very powerful at a large diversity of things, evaluating whether or not they're doing a good job becomes pretty non-trivial. And that's maybe an area where you want to start building ML systems that help people evaluate the behavior of other ML systems. Well, that's also a really cool spot to be in, right? Yeah, it's an exciting area. Ryan, do you have any thoughts on that area? I'm curious. Yeah, well, I totally agree with what Long said. Maybe a thing I would add, speaking to some of the longer-term alignment research: these systems optimize literally what you program them to optimize. So if they're optimizing for things that humans would rank highly when they're doing their rankings, what you're literally optimizing for is producing outputs that sound good according to humans.
And as models get more and more powerful, it's possible that as they optimize this, they find interesting or tricky, or, you know, maybe you could apply the adjective deceptive, although that's arguable, ways to generate outputs that get a high score but actually aren't the kinds of outputs that we want. I don't think we're quite there yet, but at least that's something we want to keep an eye on. And then, thinking about ways to mitigate that, there are the kinds of approaches Long talked about, where you have other models help you evaluate the outputs; that's the scalable-supervision style of research. Then people are working on more interpretability-type things, like, can we try to understand what's going on inside of the model? That's kind of another branch of alignment research. But yeah, who knows when we'll get there. That's the thing to think about. Given where we sit in the space around observability and monitoring, I'll ask kind of a last question. You just hinted that there are people looking at what's going on inside these models. Anything you can point to that's external that you've seen recently that is interesting in that space? We spend a lot of time looking at embeddings and understanding manifolds and structures, but I'd love to know what would interest you, anything external you see as exciting. You mean specifically in terms of understanding? Yeah, understanding what these models are doing. Anything worth pointing to, or is it a whole new area where there's not really great stuff yet? I feel a little bit in that camp; it feels like a pretty blank slate. But I'd love to hear what you all think.
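To make the over-optimization concern above concrete, here is a toy Goodhart-style sketch; every functional form and number is hypothetical, purely for illustration. A learned proxy ("sounds good to a rater") that keeps rewarding some surface feature can diverge from a true objective that only rewards that feature up to a point.

```python
def proxy_reward(feature: float) -> float:
    # Hypothetical learned "sounds good to a human rater" score:
    # it rises monotonically with the surface feature.
    return feature

def true_reward(feature: float) -> float:
    # Hypothetical true objective: some of the feature helps,
    # too much of it hurts.
    return feature - 0.25 * feature ** 2

# Pushing the feature ever higher keeps improving the proxy,
# while the true objective peaks and then collapses.
levels = [0.0, 1.0, 2.0, 4.0, 8.0]
proxy_scores = [proxy_reward(x) for x in levels]
true_scores = [true_reward(x) for x in levels]
```

In the real setting, the proxy is the learned reward model standing in for human judgment, which is why hard optimization against it is the thing to watch.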
Well, I haven't gone super deep into this literature, but from what I've seen and skimmed, there's some really interesting research from Anthropic recently on interpretability. They're mostly working on smaller-scale Transformers, but trying to understand what's happening inside of them. I think that's quite interesting, although we'll see how useful it ends up being. Great. Yeah, there's another kind of complementary line of work on setting up the work that language models do in a way that's more observable. Anthropic, while we're speaking of them, is kind of into this idea, as are we: supervising the process that the language models carry out rather than just the outcome. Here that might be something like breaking down a big task into a bunch of small components, where you maybe have a better handle on each one of the components than you do on the overall end-to-end training process. Mm hmm. And is that part of the actual training, or is that like the fine-tuning at the end? I think I've seen this so far in people building products; there's a research group that builds a kind of lit-review assistant for academic papers, and they've used this kind of technique for building their language-model-assisted lit-review tool. So the examples I've seen so far are at test time. But it's interesting to think about how you would decompose training. Well, exciting stuff. It must be fun to work on probably the most cutting-edge stuff in the world these days. So thank you for joining us. Sure, thanks for having me. I wanted to ask just one more very open-ended question before we wrap up.
Is there anything for people who are listening that you'd recommend checking out to learn more about your work, either this InstructGPT paper or your work more broadly? Great question. Let me think. We're certainly trying to think about the most effective ways to communicate about our work. The InstructGPT paper is, what, about a year old now, but it's definitely worth checking out for folks who haven't. Another thing I'd recommend, and probably people are already doing this, is playing around with these models, like playing around with ChatGPT: getting an intuition for what it can do and what it can't do, noticing ways in which you were trying to get it to do something and it's not doing that thing. That kind of thing gives opportunities as well, right? We're doing alignment work, but you can also think of the work that companies do when they fine-tune our models as aligning them for a very specific use case. And maybe this is less about our work specifically, but developing a bit of curiosity about, hey, these systems are getting a lot more powerful; what will happen when we have, say, GPT-7? There have been people thinking about these kinds of longer-term alignment questions. Our colleagues on the longer-term alignment side wrote a paper about training language models to critique, which is kind of a step in a scalable-alignment direction. So those are the pointers I'd give. And I'd definitely give a plug for the OpenAI blog posts that you put out; as a whole they give a really good summary of different subjects.
So there's a whole InstructGPT section there that gives a great summary of the work, and then links to the papers and the model card. So a plug for the OpenAI blogs to learn more on a bunch of subjects too. Long, is there anything you'd recommend checking out? Yeah, Ryan mentioned giving ChatGPT a try; I'd also recommend giving InstructGPT a try. Oh yeah. Actually, it's worth saying out loud, because it's not that widely known, that this is a public model you can get some free credits to play with at beta.openai.com, and it can be pretty useful for building up little tools here and there. So I would encourage people to play with both ChatGPT and InstructGPT. Wow, I didn't know that. It's funny, because the underlying GPT-3.5 has been available since much earlier this year, or last year, and only once people could use it for free, in the form of a chat system, did it really take off. But yeah, try InstructGPT; in some ways it's better and in some ways it's worse than ChatGPT. Definitely. Cool. Well, thanks so much for joining us. This was really fun, really fantastic. It's very kind of you to give your time. And to all the folks listening, thank you very much. Thank you. Thanks for having us.