Hello everyone, my name's Daniel. I work for Runtime Verification, where I'm a formal verification engineer, and a lot of what I do is related to Rust and the Rust compiler, so I thought I'd share at least a cursory overview of that with you today. It's an overview that's going to try to go from start to finish as much as possible, which I think is a somewhat ambitious goal to fit within an hour. It's a pretty complicated piece of software, and trying to squeeze all of the intro details into an hour is a bit like trying to fit a school bus into your living room, but we'll try to make it happen.

Can everyone see the screen share? Yeah, we can see it. You might have to move your... what is it called? Yeah, just move it up into the upper right corner, I guess. Oh, you guys can see the overlay? Yeah. Oh, nice. Okay, how about that? You probably won't be able to see any reactions if you put it down there. Wait, let me try. So, just as an FYI. That's okay, I'll fly blind.

Okay, so some of what I'm going to talk about today is a basic overview of what Rust as a language is trying to do. This is only going to be brief, three to five minutes, just talking about what the language is. That's important because after that we're going to be talking about the compiler, and the compiler's goal is to take the source language that we write in, Rust source code, and turn it into a binary. So it's important to know what it is about Rust that the compiler has to maintain and enforce as it goes through these transitions. There are a bunch of properties held by the source language that it needs to preserve, and to do that it has to do rounds of analysis, so it's going to transform the language into different intermediate
representations in order to perform different analyses at the best points in time that it can. These two points here will take a bit of time, maybe between 20 and 30 minutes, but this is where we're actually going to be looking inside the Rust compiler. After that I'll talk about using callbacks, which is a more interactive way of dealing with the Rust compiler. And these last two points: unless I time travel, I don't think I'm going to get to them, because I did a bit of a practice run and I had to strip a bunch of content out just to make it fit into an hour. But maybe that can be left as a teaser for another day.

So let's start off: what is Rust? Rust, as defined by the Rust lang group themselves, is "a language empowering everyone to build reliable and efficient software". I think what they mean by empowering is that you get a systems-level language that gives you explicit control over your memory allocation, it's able to compile to many targets, and it's interoperable with other languages through the foreign function interface. So it's pretty powerful; it does a lot. Furthermore, it's efficient: it can be compiled down small enough to run on embedded systems, and it doesn't have a garbage collector. Not having a garbage collector is interesting, because Rust is reliable without one: its type system is able to give some strong guarantees of memory safety and thread safety. By memory safety I mean that the typical footguns available to you in languages like C, where you control your own memory allocation, such as buffer overflows, use-after-free, and dangling pointers, are prohibited by the type system. So you know that if it compiles, you're not going to run into those errors. You also know that there's thread safety, in the sense of data race freedom, when you do concurrency. This doesn't prohibit you from, I think, logically
deadlocking or livelocking your code, and in that sense you can still be thread unsafe, but you are at least free of data races. All of this is enforced at compile time, and that's done through the fact that Rust has borrowing and lifetime semantics. That's the way it deals with references: if you have shared data, meaning multiple borrows, multiple references out in the world, you are unable to mutate it. So you know that if you have shared data, you can't change the data. You also know that if you have mutable data, you're unable to alias it, so you can't make two mutable borrows of it. Making sure that these rules are all enforced at compile time is how they get those strong guarantees.

As with anything, though, there are some caveats. The memory safety and thread safety come with the assumption that you haven't inappropriately used the unsafe keyword. Using the unsafe keyword gives you a lot more power to directly manipulate things like memory and raw pointers, but it also hands you the footguns, and you get all of the memory bugs back. On the flip side, you're also able to safely mutate shared data and alias mutable data with appropriate use of the unsafe keyword. So if you use it appropriately, you can do these things and be empowered. A good way to do that is to use the standard library, which is essentially wrapping unsafe code to give you things like vectors and atomic reference-counted pointers for concurrency. So there are canonical ways to use unsafe code in a reasonable way. That's what's going on with the Rust language.

Now, moving on to rustc. rustc is, as I mentioned, the program that's going to take the Rust source language down to a target binary, and we'll have a look at the GitHub for that here. So this is the rust-lang GitHub, the Rust compiler's GitHub, and maybe I'll bump that up in case it's a bit hard for people to see. Also, feel free to unmute
and interrupt me at any time. I love questions and interjections; they fuel me. Here, I'll point out four interesting directories and do a bit of orientation, because when you first get to this GitHub it can be a little overwhelming. tests is all of your test code, so there's not much we'll need to talk about in there today. The src directory is not relevant to the conversation we're having today; there's a bunch of stuff with documentation and other things auxiliary to the point of this talk. And then there's compiler and library. compiler is where most of the code that we're going to interact with today lives, and it relies on the library code existing, so I'll take a look in these in a moment.

If you clone this repository and build it, there'll be a further directory called build, and that will contain the actual rustc binary that you can run. It takes a long time to build, and it uses up quite a bit of space, but if you want to do it, you're more than able to. In order to do it you'll need to use the scripts down here, which are the x scripts, so there's x.py and x. x.py allows you to have some configuration, maybe stuff we'll be able to touch on later, but if you run it, it will put some compilers in your build directory. And I say compilers because the Rust compiler is a bootstrapped compiler, meaning that the compiler for Rust is itself written in Rust. So if we take a look in one of these many, many crates that make up the compiler, say this one here, we see that it's all Rust code. This is the code that makes up rustc, and it is itself written in Rust, which might be a bit confusing to some people if you're unfamiliar with bootstrapping. But it's pretty clever compiler design, and I encourage you to look into it if you're interested. So inside the compiler directory we have many, many crates, and these
crates are responsible for everything in going from the binary... oh, sorry, from the source language down to the binary, and performing all the analyses on the way. All of this is going to be happening at different stages along here. When I first started writing this talk I thought that I might actually go through these directories, but...

Sorry, a question in the chat: do you need to build a compiler using the language it compiles? Are C compilers also built in C?

I don't think that this is a rule. You can build compilers in external languages, and in fact there are a lot of frameworks for that, like Java CUP and, I think they're called, Flex and Yacc. These are all frameworks for building compilers and parsers for a grammar that you can define on the spot. So it isn't a particular rule, but I think there are many different bootstrapped compilers, and there's definitely a lot of benefit that people get from doing their own bootstrapping.

What compiles Rust into a binary?

That would be the rustc program. If you're experienced with using surface-language Rust, you might have used a tool called cargo. I don't know if you can see this being shared on the screen, but cargo is your canonical way of interacting with the Rust compiler. When you do a cargo run, what it's actually doing under the hood is running rustc, and it does that with a bunch of arguments to make things easier for you, so that you don't have to worry about them. But rustc here is the program that is going to take the source language down into a binary. And back over here, this GitHub is the one that contains all of the source code that will give us that rustc binary if we build it.

It's also worth mentioning that there are a couple of different versions of the compiler. When I was back here, you may have seen that when I ran rustc --version it came up with the word nightly, and this is because on this repo
there are nightly pushes to the Rust compiler, as in every single night, with changes. This is all sorts of activity coming from the community to try to improve the compiler, add new features, clean things up, and fix bugs. But people generally want something that's a bit more stable, and so there are also stable releases of the compiler, which are more reliable and which exclude more of the experimental features.

Aside from what's in this compiler directory, there's also the library directory, which contains a lot of the primitives that exist in the language. There's the core library, where you have the core types and things like pointers, things to do with all your types, handling panics, and all of that kind of thing. In here, in alloc, there's everything to do with allocation. And there's the standard library down the bottom, which has things like vectors and slices and, if you're familiar with using Rust, things like println: these kinds of macros are inside the std crate here. All of this is in some way useful when building the compiler; the compiler directory depends on some things inside here.

Getting back to the slides. I mentioned the bootstrapping and the nightly releases. If you want to interact with the community and get involved in learning more about the Rust compiler, there is a Zulip. It's a fantastic place to go and ask questions, and I encourage everyone who's interested to go there.

So that's the GitHub for the compiler, which has all of the source code. But this is trying to implement an idea, and the idea is of course going from the source code down to the binary. This is my illustration of all of the things that we need to go through in order to achieve that goal. At the top level, I break the process down into three main stages.
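As a rough map of where this is heading, the stages and the representations covered in the rest of the talk can be written down as a small Rust sketch. The names `Stage` and `Repr` are made up for illustration and are not rustc types, the grouping of representations into stages is an assumption based on the slide description, and LLVM is assumed as the code generation backend.

```rust
/// The three broad stages of the pipeline (illustrative names, not rustc types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    RustSource,
    RustIr,
    Codegen,
}

/// The representations, in the order the compiler produces them.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Repr {
    Source,
    Ast,
    Hir,
    Thir,
    Mir,
    LlvmIr,
    Binary,
}

impl Repr {
    /// Which broad stage a representation belongs to (this grouping is an assumption).
    fn stage(self) -> Stage {
        match self {
            Repr::Source | Repr::Ast => Stage::RustSource,
            Repr::Hir | Repr::Thir | Repr::Mir => Stage::RustIr,
            Repr::LlvmIr | Repr::Binary => Stage::Codegen,
        }
    }

    /// The representation that this one is lowered into, if any.
    fn next(self) -> Option<Repr> {
        match self {
            Repr::Source => Some(Repr::Ast),
            Repr::Ast => Some(Repr::Hir),
            Repr::Hir => Some(Repr::Thir),
            Repr::Thir => Some(Repr::Mir),
            Repr::Mir => Some(Repr::LlvmIr),
            Repr::LlvmIr => Some(Repr::Binary),
            Repr::Binary => None,
        }
    }
}

/// Walks the chain of representations starting from `start`.
fn pipeline_from(start: Repr) -> Vec<Repr> {
    let mut out = vec![start];
    let mut cur = start;
    while let Some(n) = cur.next() {
        out.push(n);
        cur = n;
    }
    out
}
```

Calling `pipeline_from(Repr::Source)` lists every representation from source text to binary, and `Repr::Mir.stage()` places MIR in the intermediate representation stage.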
There's the Rust source code level, there's the Rust intermediate representation level, and finally we have code generation. Each one of these inner boxes is a different representation of the code along the way, and some of these dot points are things we have to do to the code to get it into the format we'd like. There are also different rounds of analysis here, to make sure that the code is well formed and conforms to what the language promises. So I'll start going down through here, and I'll use a hello world example as a motivating example for viewing some of the different forms.

If you're familiar with Rust, this is pretty much as simple a program as you can get: you define the main function and you tell it to print out hello world. When rustc sees this source code, the first thing it's going to do is transform it into its next form, which is an abstract syntax tree. But on the way to getting there it has to do a few steps that I'll point out. I'm aware that some people might not be familiar with compiler terms, so I've added some definitions. An abstract syntax tree is a tree, as in a graph-theory tree, representation of a source program. To get the tree we need to go through two things, called lexing and parsing. Lexing is where we take in a stream of characters, read them in, and recognize the tokens of the language that they match. So up here we see fn and then some whitespace, and we know that this is the start of the declaration of a function; this is the function keyword. And seeing a string of characters before either whitespace or the open parenthesis means this is the identifier, the function name. So lexing is taking in all of those tokens, and once we've done that we parse them into the tree: this is where we take those
tokens and put them into the tree. To give you a graphical idea of what I mean by tree, this, from Wikipedia, is what an abstract syntax tree looks like. It represents a simple program where there's a sequence of statements. One of them is a while, and a while has a condition and a body. The condition is the comparison of a variable with a constant, and if you go into the body there is a branch. In that branch there's a condition that compares two variables, a and b, and it has an if and an else, both of these being assignments. So this tree is a way of representing a program that is useful for compilers, and the first thing that rustc is going to do is try to get there. It lexes and parses to get to the AST. However, to get all the way down, it then needs to do macro expansion and name resolution. But before it expands the macros, and that's why I put some whitespace here, we can view what the unexpanded program looks like.

Over here I have our hello world example, and there's a bunch of commands I've written down that you can run with rustc in order to view different points of what's going on inside the compiler along the way. This command is using the -Z flag unpretty. These flags are not necessarily easy to find, but if you use rustc --help, it will tell you that if you want to know more about the compiler there are some unstable options that you can list with -Z help. So rustc -Z help will spit out a whole bunch of -Z compiler flags, and using these you're able to tell it to dump out information about its state, or turn off certain things in the compiler, depending on what you're interested in, which could be a whole range of different things. So if we wanted to know about the AST, we could run rustc -Z help and grep for AST, and it shows us a bunch of values here: ast-tree, and ast-tree,expanded.
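Going back to the tree shape described above (a while whose body is an if/else of two assignments), here is a minimal sketch of how such a tree could be modelled as a Rust data structure, with a tiny evaluator to show that the tree alone is enough to run the program. The `Expr`, `Stmt`, `sample_tree`, `eval`, and `exec` names are hypothetical and have nothing to do with rustc's real AST types, and the concrete conditions (`b != 0`, `a > b`) are assumptions, since the talk only gives the shape.

```rust
use std::collections::HashMap;

/// Expressions: the leaves and operators of the tree.
#[derive(Debug)]
enum Expr {
    Var(&'static str),
    Const(i64),
    Sub(Box<Expr>, Box<Expr>),
    NotEq(Box<Expr>, Box<Expr>),
    Greater(Box<Expr>, Box<Expr>),
}

/// Statements: the inner nodes that give the program its structure.
#[derive(Debug)]
enum Stmt {
    Assign(&'static str, Expr),
    While { cond: Expr, body: Vec<Stmt> },
    If { cond: Expr, then_branch: Vec<Stmt>, else_branch: Vec<Stmt> },
}

fn var(name: &'static str) -> Box<Expr> {
    Box::new(Expr::Var(name))
}

/// Tree for: while b != 0 { if a > b { a = a - b } else { b = b - a } }
fn sample_tree() -> Vec<Stmt> {
    vec![Stmt::While {
        cond: Expr::NotEq(var("b"), Box::new(Expr::Const(0))),
        body: vec![Stmt::If {
            cond: Expr::Greater(var("a"), var("b")),
            then_branch: vec![Stmt::Assign("a", Expr::Sub(var("a"), var("b")))],
            else_branch: vec![Stmt::Assign("b", Expr::Sub(var("b"), var("a")))],
        }],
    }]
}

type Env = HashMap<&'static str, i64>;

/// Evaluates an expression; comparisons yield 1 (true) or 0 (false).
fn eval(e: &Expr, env: &Env) -> i64 {
    match e {
        Expr::Var(name) => env[name],
        Expr::Const(c) => *c,
        Expr::Sub(l, r) => eval(l, env) - eval(r, env),
        Expr::NotEq(l, r) => (eval(l, env) != eval(r, env)) as i64,
        Expr::Greater(l, r) => (eval(l, env) > eval(r, env)) as i64,
    }
}

/// Executes a sequence of statements by walking the tree.
fn exec(stmts: &[Stmt], env: &mut Env) {
    for s in stmts {
        match s {
            Stmt::Assign(name, e) => {
                let v = eval(e, env);
                env.insert(*name, v);
            }
            Stmt::While { cond, body } => {
                while eval(cond, env) != 0 {
                    exec(body, env);
                }
            }
            Stmt::If { cond, then_branch, else_branch } => {
                if eval(cond, env) != 0 {
                    exec(then_branch, env);
                } else {
                    exec(else_branch, env);
                }
            }
        }
    }
}
```

With `a` and `b` set in the environment, `exec(&sample_tree(), &mut env)` repeatedly subtracts the smaller value from the larger one until `b` reaches zero. A real compiler's tree carries far more detail (spans, identifiers, attributes), but the nesting of nodes is the same idea.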
And so these two, ast-tree and ast-tree,expanded, are things that we can provide to the unpretty flag in order for it to dump some state for us to look at. I've already run both of these commands, so we'll have a look at what an abstract syntax tree looks like when rustc dumps it out. It doesn't look like that graphical representation where you have nodes and arrows pointing to the nodes. Instead it's more like a graph in the mathematical sense, where you have nodes and then part of each node indicates what it points to next, with a label. This here corresponds to this program, if I split it to the right, although it looks quite different. This is the first thing that the Rust compiler has done to try to turn this into a binary: it's taken the hello world program, it's started at the crate level, and it's created some idea of there being some items. We can see a main function exists here, which makes sense; it is indeed of kind function. And inside here, I know this is hard to read and we won't spend a lot of time on it, but inside here we can find a call to println, and we can also find our string literal, hello world. So all of this information is in here; it's just been expanded into a graph.

As we saw in there, println is still there as a macro, but this ends up getting expanded straight away. We can use rust-analyzer to preview what it will get expanded into, by using the expand macro recursively command at the macro, and we can see that this println will get expanded into a call to the standard library's io module, or crate, _print function.

Is that a VS Code plugin you use for that?

Yeah, it absolutely is. If you want to do stuff with Rust, I'd almost certainly encourage you to use rust-analyzer. I thought it would tell me how many people use it there... oh yeah, it's got four million people using it. This is part of the Rust
lang team, and it's using an LSP to give you a bunch of information in the IDE. It's very, very helpful for using Rust, and one of the things it can do, amongst many others, is expand macros.

Right, we have another question in the chat. Sorry, yep, go ahead. Could you read it out to me, if possible? Sure. Why or when would someone ever want to see the AST of their program? Is this tool mostly for people who want to work on the compiler?

It is mostly for people who are interested in compilers, but it's not exclusively for that. If you're interested in compilers, going through these different stages is important. But for us at Runtime Verification, we need to understand what's going on in these intermediate representations because we'd like to build tooling on top of them. We want to be able to do theorem proving and deductive verification, and build our own interpreters for Rust, and we can't just do this at the source language level. We end up needing a more low-level representation, and so we go further down, to something called the mid-level intermediate representation. There are many other programs that work with intermediate representations of the Rust compiler for their own business cases, although I will admit that, generally, people looking at this are people interested in compilers, or in the Rust compiler itself.

Okay, another question came in, a question related to the AST: Rust has macros that allow you to generate code. Are these related? I think the question is asking about the relationship between the AST and macros.

There is one, in the sense that the AST that I showed you before has them unexpanded, and then there is the ability to expand them. That tree ends up being a lot larger, because all of the macros have been expanded. But the existence of an AST isn't dependent on a macro, so in that sense they are like
mutually exclusive. Well, I shouldn't say that, actually, because the macro must exist in the AST. But those ideas are related in that way. And if the question is about when you write a procedural macro in Rust, it is true that you have to think about lexing and parsing tokens at that point, which is probably what the question meant, if it's from someone who has looked at that. So you do have to think about this sort of thing when you're working at the surface language level for those macros.

Okay, I got in the chat: that answers the question, thank you. Nice.

Obviously we could look at all this stuff all day, and I know that everyone attending would be thrilled to do so, but we move on. So that's the abstract syntax tree. Another reason we might want to look at it is that we might want to debug a particularly nefarious program, we have some knowledge of compilers, and we want to see what is actually getting spat out.

The next thing we need to do after the AST is transform it into something that's very similar to it but more amenable to analysis, and this is going to be the high-level intermediate representation, the HIR. In order to get there we need to do some lowering and desugaring. What that means is that where features of the Rust language do the same thing, we want to streamline them into one representation. In Rust you can have for loops, while loops, and the infinite loop, which is just the loop keyword with a break or return inside it to exit the loop. All three of these are ways to have a loop, but the HIR wants to get rid of the multiple representations, so it just works with the loop keyword. So where a bunch of different features do the same thing, it streamlines them into one. Control flow all gets turned into match statements, I'm pretty sure. And that's lowering.
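To make the loop example concrete, here is a hand-written sketch of the idea in surface Rust. The function names are made up for illustration. The second function is roughly the documented equivalent of a for loop, a bare loop that matches on the iterator's next() result, and the third shows a while loop rewritten with only loop, a conditional, and break. This is not the compiler's actual HIR output, just the same program expressed with fewer constructs.

```rust
/// Sums a slice with an ordinary `for` loop (the sugared form).
fn sum_with_for(xs: &[i32]) -> i32 {
    let mut total = 0;
    for x in xs {
        total += *x;
    }
    total
}

/// Roughly the same loop with the `for` sugar removed:
/// a bare `loop` that matches on the iterator's `next()` result.
fn sum_desugared(xs: &[i32]) -> i32 {
    let mut total = 0;
    let mut iter = IntoIterator::into_iter(xs);
    loop {
        match iter.next() {
            Some(x) => total += *x,
            None => break,
        }
    }
    total
}

/// A `while n > 0 { ... }` loop written using only `loop`, `if`, and `break`.
fn count_down_desugared(mut n: u32) -> u32 {
    let mut steps = 0;
    loop {
        if n > 0 {
            n -= 1;
            steps += 1;
        } else {
            break;
        }
    }
    steps
}
```

Both sum functions return the same result for any slice, which is the point: after desugaring, later analyses only ever need to understand one kind of loop.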
And desugaring. Once we've done that, we're inside the HIR representation and we can do our first rounds of analysis, which are type inference, trait solving, and type checking. This is starting to get at the idea of what the Rust compiler guarantees: it guarantees you a type-safe program, but on top of that it has a lot of other things. So I'll show this roughly using some commands. We can do the same thing as before: we can dump unpretty hir and unpretty hir-tree. We won't need to spend much time looking at this. The first HIR output looks almost identical to the original program that we started with. On the left here we have our original hello world in the surface language, and the HIR at this point has added in the prelude and added in that we're using the standard library. Things that we're able to elide and don't have to mention in the surface language start to be made explicit. Here we can see that the macros have been expanded, and the actual construction of this string into a constant is a bit more explicit.

There's also the hir-tree output, and this is more what's really happening inside the Rust compiler; it gives you a less pretty-printed view. It looks very similar to the AST, except that there are some other things going on which are really important for the compiler to be able to do its analysis. It starts allocating things called DefIds to everything that has a body or is something that can be alloc'd, and every single expression that exists in the code is given a HirId. These DefIds are not just useful at the HIR level; they get carried down through many rounds of analysis as identifiers for different points of interest in the code.

Sorry, what do you mean when you say alloc? What does that mean in this context?

What I meant by an alloc is things that exist at some point in memory. So that might be a global, that might be... and,
sorry, you can also alloc consts. But there is this idea of things that end up getting allocated in memory, and that's what I meant by that.

So that's the HIR representation. When we want to do rounds of analysis, for the interest of... whoops... learning about this: at the HIR level, the things we do for analysis are trait solving, type inference, and, what's the last thing we do, type checking. An example of type inference here is this v: I haven't explicitly told it that this is a vector of strings. Now, rust-analyzer has told me in gray, "I know that this is a vector of strings", and how it knows that is that it's using information from the Rust compiler to do type inference; it knows from the context of what's happening here that it must be a vector of strings. So that's type inference.

Trait resolution, or trait solving, is the Rust compiler deciding, for the generics that I've used, whether I've used them in a way where it's possible to actually construct functions with the concrete instantiations I've asked for. That was very wordy, I know, but my example for it is this: I want to log some elements. I have a function log_elements, and it's a generic function over a vector of T. What I do inside log_elements is enumerate the elements, and for my test here I just print out the index and the element. But println says you can only print an element the way I've done it if it implements Display. And so here I've had to put a trait bound on T, where I've said: T is generic and can be anything, except it can't be something that doesn't have Display. One thing that must be true of T is that it has Display, and then this function will resolve. If I take this away, the Rust compiler will say: I can't solve for these
traits. You're asking me to satisfy a trait bound, and I don't have the ability to enforce it; what you've given me is too weak. So that's what trait solving is, and it happens at the HIR level as well. And then type checking is something we're probably all familiar with. I'm sure everyone here has written a Rust program, or any kind of program, and got the types wrong. Like here: you can't assign negative numbers to a u32, because it's meant to be unsigned. There's a lot more that goes on with type checking; it's much more complicated than just making sure you don't put a negative number in an unsigned type, but that's an example of going through and making sure type checking is satisfied. I might go back. All right, any more questions, Jeffrey? No, that's pretty clear. Cool.

So this is all happening at this stage, where we're getting about halfway down in our representations. This HIR still looks like an abstract syntax tree, like where we came from, but it has a little more information to allow us to do our analysis. The next thing we want to do is transform that HIR into the THIR, the typed high-level intermediate representation. This isn't that different from the HIR; it's just that everything has been type checked and all of the types are completely elaborated. I'm less familiar with this than with everything else. I did a little bit of Googling, admittedly, to see what the THIR is actually used for, and from what I can see, the analysis that's done on the THIR is things like unsafety checking. So if you're using the unsafe keyword, or functions that rely on unsafe, it will wait until it's got the THIR to check whether you're breaking any of the rules around unsafety. Just quickly, and I mean very quickly, because it is completely unreadable, in my opinion, at least when I had to look at it for the first time today: you can use rustc
-Z unpretty=thir-tree, and you can use unpretty=thir-flat, to have a look at some extremely massive graphs. I'm sure there's lots of information in there if you're someone who's familiar with reading this stuff, and there's a flattened version of it here. So this is a look at what it looks like, but the important thing is that it's used for rounds of analysis, this one for unsafety. So this is still an abstract-syntax-tree-looking form, but the next one we go to from the THIR, to me, is where things change up. Now we change from an AST to...

Sorry, before we transition, there's another question here. Yeah? Checking my understanding here, but can we create tools like rust-analyzer using the compiler artifacts provided by the Rust compiler, for example the HIR, etc.?

Yeah, so I must admit I'm not entirely up to speed on how rust-analyzer does what it does. But the LSP, the language server protocol, is in some way aware... I don't know, it must be referencing this source code directly, and it's able to ensure on the fly that some things about this process hold, if not everything at some particular levels. It isn't compiling your code and creating an artifact in the target directory when rust-analyzer is running, but the things that it's telling you are related to making sure you have a valid AST, whether or not you're breaking type inference, whether or not you're breaking your trait bounds. It can show you the type inference; it can tell you when you've type-checked wrong. All of this is coming from rust-analyzer, so it is definitely related to this. The exact relationship I haven't dug into myself. Any more questions? Nope. Cool.

So, to get to the MIR, we're going to need to change the format into a CFG format. CFG stands for control flow graph. I thought I'd look up a graphical representation for that as well, and to my delight, when I went to the
Wikipedia page and clicked on the picture, it actually shows a Rust MIR program as the example. So this is a better view, at least if you want to see what a CFG is in terms of actually seeing the blocks and the arrows. This is some sort of Rust program at the MIR level. What you can think of is that there's a bunch of information declaring some places in memory, each of which has some type; these places in memory are going to be assigned something at some point. Then the actual program is here inside these nodes of the graph, with these being the edges of the CFG. Here, basic block zero has a list of statements. These statements perform assignments to those places in memory, and then it gets down to a terminator. This terminator makes a decision: it does a switch based on some int, and it takes either this branch or this branch, to go to basic block one or basic block two. In basic block one there are two assignment statements, and then basic block one always has an edge to basic block four. Basic block four has a few statements and it returns. So this is the graphical representation of what a CFG looks like, and it's different from what our AST looked like. Also, these can point back around in loops; it would be reasonable for two to branch and wrap back around into one, and that would be a valid CFG.

Oops, where am I going? Oh, back here. So we've transformed the THIR into this CFG, and why we would want to do that is that it's really beneficial for the next rounds of analysis we want to do. The rounds of analysis on the MIR are drop elaboration and borrow checking. You can think of this as that Rust memory safety guarantee that I spoke of right at the start of the talk; this is where it's getting enforced. All of the borrows, all of the lifetimes, making sure that all of the memory allocation is handled
correctly, with allocation and freeing, all of that stuff: all of that analysis is happening at this level, in something called the borrow checker. If you remember, this is our hello world example. MIR has a really nice pretty-print option that's useful for getting a flavor of what's going on, which you can use with unpretty=mir. But a nicer option is to use unpretty=mir together with this flag here, where you pass minus PromoteTemps, which turns constant promotion off. Constant promotion isn't relevant to what we're trying to look at here; that's why I turned it off. If we split that one to the right and take a look, on the left we have our hello world source example and on the right we have our MIR representation of it. We get a warning straight away that this is a pretty-printed version, and that all bets are off if you take it too literally.

What we can see is that, just like before, there's a bunch of declarations of places in memory. With a CFG we don't have variables; we have places in memory, and they have types. Place zero here is always reserved for the return value of a function, and the rest of these you can think of as registers, if you will, or just literally places in memory. So this function prints hello world. The way it does that through a control flow graph is that it always starts with basic block zero, and it assigns to place four the constant string hello world. It then creates another place in memory, which is a non-mutable reference to that hello world constant, and then it tries to create, whatever an Arguments new_const is, from three, which was the reference to hello world. If that succeeds we branch to bb1, and if it fails we unwind, which is like aborting the program, but literally unwinding up the stack, all the way back up the call stack. bb1 here is going
to assign to place one in memory the output of printing what was in two, and what was in two came from basic block zero, which was the constant hello world. So we're going to literally print hello world. If that doesn't error, the terminator here is going to take us to basic block two, and otherwise we unwind back up the stack. Basic block two just returns. Because the return type here is the unit, it returns nothing. So that's a crash course on how to read a MIR program. As I said, MIR is doing a lot of those memory guarantees like lifetimes and borrow checking.

After all this we're finished with the Rust intermediate representations, and the last thing that's left is code generation. Code generation is something I'm going to breeze over pretty quickly, but what's important is that what generally ships with Rust is LLVM, and this section here you can actually change. You can exchange it with other really common codegen backends like GCC or Cranelift, but you can also write your own custom codegen backend. Maybe you have a particular use case for taking Rust after these rounds of analysis have happened at MIR, and you say, okay, there's a different kind of code generation from what everyone else in the world is doing that is useful for my particular business case. An example of this is the Kani model checker, which takes, I think, all of this and then uses some interesting code generation for, I think it's CBMC or something like this, to do model checking of programs.

In order to get from MIR to LLVM you have to go through quite a lot of different stages. There's constant evaluation, so constants in a program that you write in Rust are evaluated before you even get to code generation; well, it's evaluated every constant that it can, at least. As well as that, you do even more lowering. If you remember, lowering was sort of normalizing what was going on, and the
lowering that is happening here is into what's called static single assignment form. This is a big simplification as well; there's a lot more that goes on. Another thing of interest is monomorphization. When we write in a powerful language like Rust we have a lot of generics. We want functions to be able to take multiple types and we want them to be able to return multiple types, we want traits and all this stuff. But when we get down to a binary, a binary doesn't really understand what any of that stuff is. Instead there's a list of different functions, and as the program counter is stepping through these instructions it's jumping to different points. The binary itself doesn't understand a generic function, so monomorphization is taking all of the generic instantiations that are actually needed and creating individual functions in the code generation of the actual assembly, or LLVM in this case, so that you can handle all the different types you need to for your generic code.

Once you're in LLVM there's a ton of rounds of optimization, and LLVM is getting pretty close to a binary representation at this point. Using the emit commands I was able to emit it as `llvm-ir` and `llvm-bc`. The LLVM IR is this one here. I want to say this is readable, but I have no idea how to read it. I did notice I could see hello world in there, but I haven't spent any time in my life really going through LLVM. I can see that there are loads and stores. If you don't like that you can also use the LLVM bitcode, which, you know, if that's your kind of thing, that's good too. After LLVM there's only one more place to go, and that's to the actual binary. The actual binary is the target, so whatever you're compiling for. This machine that I'm running it on runs on x86, but you might compile to Wasm, ARM, whatever. This is probably the path that, if you've used cargo before, you're familiar with, or at least this is the end
that you're familiar with. You've written a Rust source program at the top, and then you say `cargo build` and it spits out at the end a target, but this is everything that happens on that journey. You can get a bit of a view into the assembly of it, without looking directly at the binary, using `--emit asm`. As I mentioned, my machine's an x86 machine, so this is the hello world program as x86 assembly instructions, which, if you've looked at x86 before, might be somewhat interesting. I know that you can do debugging at this level, and I assume on every level up above. So that's it for the round trip of what's going on with these particular IRs. That's start to finish.

I am aware that I'm pretty close to time. What did you want to do, Jeffrey?

Maybe what I can do is just collect everyone's emails and we'll invite everyone to a part two. Would that be...

Yeah, sure. I mean, I've got all the stuff here to show a bit more interaction, but maybe that's actually a good place to cut off. That's a walk through the different IRs and how to examine them. Still, if anyone would like to ask any questions, I'm happy to stay until the last ones are gone.

Oh, you actually have another question here: "Abstract syntax tree of Rust is application binary interface to Solidity?"

Hm, the application binary interface. If I understand correctly, and I'm not saying that I do, the ABI contains all of the actual, no, not Solidity, sorry, the EVM bytecode, and it contains all of the information for how to access it as well, like what are the handles into it. Actually, Jeffrey, you probably know more about this than me.

Yeah, that's right. That's correct.

No, I wouldn't say that that's what the AST is. The AST is really just showing you what the source program was in Rust, so showing you `fn main`, open parenthesis, close parenthesis, open curly
brace, but it's done it in a way where all of those are a node in a graph, and it has some way of determining, already at that point, whether you're parsing a valid program. So if you write, instead of `fn main`, close parenthesis and then open parenthesis, you won't be able to create the abstract syntax tree, because you've already broken the rule when you've started parsing. The ABI is a bit further advanced down the path. Solidity would have to create its own abstract syntax tree when it's initially reading in everything that is in the source language, all of the characters.

So the ABI just contains the functions. Yeah, yeah, nice. And the Rust compiler itself doesn't generate any binary code? That's the job of LLVM?

Yeah, actually, that is true. The code generation, that part that's at the bottom, whether that's LLVM, GCC or whatever, that's going to handle the target. You choose a target, whether that's x86, ARM or whatever, and that's your binary, and it's the job of the code generation to do that. I probably didn't present that quite fairly. I meant that it's a different IR to the ones here; it's separated. Really the code generation is in large part inside the Rust compiler itself, but I didn't consider it as one of the Rust IRs because it's so replaceable, and you can bring in your own custom one from outside completely. You can't do that with the Rust IRs, and that's why I separated them.

So going back to this, what we did last time was we looked at all the IRs. Those are these little boxes inside of the dotted ones. The dotted ones are my way of representing that this is the source code, these are some intermediate representations that are still in the Rust environment, and then there's code generation, which is kind of separate from the intermediate representations in some way. Now, from the source language you go down to an abstract syntax tree, so this hasn't really modified
anything from the program or done any analysis. It's simply taken all of the text that's come in as a stream of characters and said, let's group these into a tree structure, because we're going to prefer a tree structure to do some analysis later. So we turn it into an AST by lexing and parsing. It's still got the macros in it at that point, so we want to expand the macros, and we want to fully resolve the names to know where everything is concretely, all the way down to the crate root. Once we have that, the AST is then massaged a bit by lowering into the high-level intermediate representation, the HIR, and this is the first intermediate representation where we can do our rounds of analysis. From here the compiler does its first pass of guarantees, and that's making sure that all the types are right. It infers some of the types that you didn't explicitly put in there, and it will make sure that the bounds that you have on all the traits and generics are actually possible to satisfy. That's called trait solving. Once all the type checking is done, there's a little more desugaring and you get to the typed high-level intermediate representation, the THIR. Here all of the types are fully elaborated, and, I don't know much about this, but I do know that unsafety stuff is checked here. This is our last representation as an abstract syntax tree. From here we go to the MIR, the mid-level intermediate representation, and so we transform from an abstract syntax tree to a control flow graph. The control flow graph is a set of basic blocks, where we have statements inside those basic blocks which are not talking about variables anymore; they're talking about assignments to places in memory. After you go through a sequence of statements you get to a terminator, and the terminator will potentially branch, and it can point to more basic blocks. This is now the
layout of the program: there's basic block zero where you enter, and it can fork and go to different basic blocks, and eventually it will get to the final basic block and terminate. This intermediate representation is where we do the analysis that ensures that all the borrowing and the lifetimes are sound, so making sure that shared data is not mutated. There can be multiple references out there to shared data, and it's going to make sure that none of those references are mutating that data. It also makes sure that if you have a mutable reference to that data, then anytime someone tries to create an alias to it, it blocks it. This representation is where all of that work is done to give those guarantees.

From there we can go to code generation. There's a little more massaging that needs to happen before you get there. You need to evaluate the constants; all of the ones that can be evaluated, at least, I believe get done at that point. Then you need to process all of the generics out, so all of the generics become concrete instantiations of whatever functions are going to be called with concrete arguments. All of this is done to get into, in this example, LLVM, but as I mentioned there are different options; it doesn't have to be LLVM. Whatever the code generation representation is, we get to that point and then we go down to our target binary. Oops. If you're used to just doing `cargo build`, you start at this level, then you do build, and it comes down and you have your binary down the bottom here. Normally you skip looking at all these steps, but we had a bit of a look at them.

What I want to move on to, now that we've talked about these IRs, is this: it'd be really nice if, instead of just dumping them like we did, where at some point during the compilation they just spat out a
string representation, a pretty-printed representation of what they were, we could actually, while the compiler is running, say: hey, I want the compiler to pause now and I want to have a look at that representation. I also might want to write a program which is going to do some analysis or some manipulation on that data. There might be things I need to know, and you can write complex programs that look at it. The key to doing this is that Rust allows you to call the Rust compiler by bringing it in as a crate, and this is the `rustc_driver` crate. When I gave this talk at Rust Brisbane I think there was a little bit of trouble when I said that briefly, so I'll stress this point, because it might be a little bit of an elusive one. You can write a Rust program. Here this is just the main function; it's a bit pseudocode-like, I'm just using the ellipsis there for the arguments. This main function will look at an internal rustc crate called the driver, and in there it can create a new instance of the compiler and run it. So this whole program is going to get compiled by rustc, and then when you run that binary, that binary is going to again call a new instance of the Rust compiler, and we're going to point it at some Rust program and compile it. The reason why we would be interested in doing this nested compilation is that with this `rustc_driver` compiler we have the ability to pause that compilation and manipulate the IRs there, and we can write complicated Rust programs, in the lines previous to this, that run when we pause it, to do analysis and manipulation. If you want to do this type of thing there are a couple of steps. The paradigm to bring it in is you write `extern crate rustc_driver` in your Rust program, and you also have to add a feature flag that I'll show in the actual source. This is the crate to bring in to do this type of thing.
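To make the idea of a driver that pauses and hands control to your own code a bit more concrete, here is a toy model in plain Rust. It is only a sketch and deliberately does not use `rustc_driver`, which needs a nightly toolchain and extra components. Every name in it (`Compilation`, `Phase`, `Callbacks`, `run_pipeline`, `NoCallbacks`, `StopAfterExpansion`) is a hypothetical stand-in chosen for illustration, not the compiler's real API. What it tries to show is the shape of the pattern: a trait whose hooks all have do-nothing defaults, and a driver that calls each hook between phases and stops early if the hook asks it to.

```rust
// Toy model only: none of these names come from rustc_driver.

/// Whether the pipeline should keep going after a hook has run.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Compilation {
    Continue,
    Stop,
}

/// The stages of this toy "compiler", in the order they run.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Parsing,
    Expansion,
    Analysis,
    Codegen,
}

/// Hooks the toy driver calls after each of the first three phases.
/// Every method has a default body, so an empty `impl` changes nothing.
pub trait Callbacks {
    fn after_parsing(&mut self, _source: &str) -> Compilation {
        Compilation::Continue
    }
    fn after_expansion(&mut self, _source: &str) -> Compilation {
        Compilation::Continue
    }
    fn after_analysis(&mut self, _source: &str) -> Compilation {
        Compilation::Continue
    }
}

/// Runs the phases in order and returns the ones that were reached.
/// A hook returning `Stop` ends the run before the next phase starts.
pub fn run_pipeline(source: &str, callbacks: &mut dyn Callbacks) -> Vec<Phase> {
    let mut reached = vec![Phase::Parsing];
    if callbacks.after_parsing(source) == Compilation::Stop {
        return reached;
    }
    reached.push(Phase::Expansion);
    if callbacks.after_expansion(source) == Compilation::Stop {
        return reached;
    }
    reached.push(Phase::Analysis);
    if callbacks.after_analysis(source) == Compilation::Stop {
        return reached;
    }
    reached.push(Phase::Codegen);
    reached
}

/// Relies entirely on the default hooks, so the pipeline runs to the end.
pub struct NoCallbacks;
impl Callbacks for NoCallbacks {}

/// Records what it saw after expansion and halts before analysis.
pub struct StopAfterExpansion {
    pub log: Vec<String>,
}

impl Callbacks for StopAfterExpansion {
    fn after_expansion(&mut self, source: &str) -> Compilation {
        self.log
            .push(format!("after expansion: {} bytes of source", source.len()));
        Compilation::Stop
    }
}
```

Calling `run_pipeline` with `NoCallbacks` reaches all four phases, while `StopAfterExpansion` ends the run after the second one, so the codegen phase (the stand-in for producing a binary) is never reached. The real driver is far richer than this, but the control flow between driver and hooks is the part this sketch is meant to convey.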
But you need a little bit more than that. If you just do that and you try to run it, rustc is going to barf and say, I don't have all the things I need to be able to do this. If you're using rustup, which I certainly recommend you use, it will give you this command on the command line. It'll say, hey, I need to install some more components, and these are the ones I need to do what you're doing. So first I'll just point out where you can find this. If you recall from us talking last time, this is the Rust repository, and in there most of the code that we care about for this type of talk is in the compiler directory. In here there are two crates: there's `rustc_driver` and `rustc_driver_impl`. These are the crates that I'm referring to here. I think the driver just points to the driver impl. This is where the code is that we're going to be leveraging to do the nested call into the compiler. So what that looks like is this. This is what I would consider the minimum example for running the compiler. Jeffrey, maybe you can give me some feedback: do I need to bump up the font size or anything like that?

It looks fine to me, but it probably wouldn't hurt, seeing as you have enough space on the right side, unless you're going to use it later.

No worries, cool. So maybe before we look at this source code I'll bring up that inside this source directory there is only this `main.rs` file, but something that I have external is this hello world, saying hello RareSkills. This is actually outside of that source directory, and if I hover over it, it says this file isn't included in any crates. So this is not going to be involved when I do `cargo build` or anything like that. It's separate, but I am going to point to it later. I just want to point out now that it's not part of the package.

This minimum example of using the driver is a little bit simple. We're not going to do any interruption with the callbacks; we're just going to run the compiler from inside a compiled program. To do that we bring in the crate `rustc_driver`. If we're going to do that we need to add this feature flag saying that we're using `rustc_private`, and then, as long as we've added those components, everything's all good. The first thing that I'm going to do is take some arguments over the command line, and that's where I'm going to point to this hello world program. What this `rustc_driver` compiler needs to take are the arguments, what you want to point it at, and it takes those callbacks. Like I said, the whole point of this is to be able to interrupt the compiler, but we don't want to interrupt it at the moment, so all I did is I made an empty struct, callbacks, and I implemented the necessary trait in order for it to be an argument here. But I haven't done any changes. It doesn't have any overriding of the functions in here; it's an empty block. So this is just going to be the default callbacks, which as it turns out just do nothing. They just let everything pass, and it just calls the compiler. What I'll show first, actually, is that if I `cargo build`, nothing should happen except that I'm going to build this main program. So `cargo build`, and I now have in my target here a binary which, if I run it, is going to call the Rust compiler. So I
can run that with `cargo run`, and if I add an argument of `hello.rs`, the hello RareSkills program, what we're expecting is we'll come into this main function, we're going to grab the path to this Rust program here, and then it's going to run the compiler. I called build before, and no binary got built for this external source code, so now I should be expecting a binary to be output. In fact, I can show that it's going to call the Rust compiler by not pointing it at the program: it just prints out the help message for rustc, and it's telling me this is wrong usage; if you're going to call rustc you need to give me an input. So it's annoyed at me. If I do that again, but this time add an argument pointing at that program, we should expect it to build a binary that's going to print out hello RareSkills. And it does build a binary, and if I run it, it says hello RareSkills. That's pretty simple, we haven't done anything too crazy at the moment, but hopefully the idea that we're doing a nested call to the compiler is clear. So I'll remove that binary.

Let's have a look at what that trait is that I was implementing, inside the actual Rust compiler here. I guess I'll bump the size up a bit. If we come inside `rustc_driver_impl`, `src`, `lib`, and have a look, eventually we're going to see, ah, here we go, the `Callbacks` trait. You can have a `config` function, but there are three other functions which are interesting to us. There's `after_crate_root_parsing`, which, I did notice when I had a bit of a look at this before, looks like it's deprecated, but I think it still works at the moment. `after_crate_root_parsing` takes an interface to the compiler and a queries, and what you can do, if you override it, is add inside the body of this function your own custom code to look through the query system at what's happening
inside the compiler at that time. Then you can choose to continue the compilation, and if you do, it'll go into `after_expansion`, which happens after the macro expansion, and the same thing is possible there. There's one more point that you can enter, `after_analysis`.

Sorry, could you explain queries a little bit, really quick? You may have mentioned that before, but maybe I didn't catch it.

The queries are a system that's inside the Rust compiler to get information at different points. I think for the context of this presentation all we need to know is that we can give a query that says, can I look at the typing context? That's something that exists, and the typing context is the thing that has all of the intermediate representations. So I'll show the way that you craft that query, and then you can take the typing context and do some manipulations on it. There's more stuff that you can do with queries. They are, as the name suggests, a way of querying the Rust compiler at particular points that you might be interested in looking at.

Cool, thanks.

So there are three points at which we can interrupt the compiler to do some analysis. If we go back to the diagram that I have here, `after_crate_root_parsing` is happening after we've done the lexing and parsing, so we've just created our initial abstract syntax tree and the macros are still unexpanded. Then we have our second callback, `after_expansion`, which happens after the macro expansion and name resolution, so this is essentially happening on the abstract syntax tree. No analysis has happened at this point, because our first bit of analysis happens when we turn it into the HIR here, and our last bit of analysis happens after we get all the way down into the MIR. Our third callback is `after_analysis`, so we can have a look after we've done all of that. So we will be expecting that at these points the type
checking and the trait solving, all of that stuff, hasn't happened yet. So if we have a program that is breaking the borrow checker, for example, we would still be able to analyze what that abstract syntax tree looks like. It will still be able to craft one and we can still examine it here. But if we let it go to the point where it would reach `after_analysis`, we would expect that the borrow checker would throw an error. That's actually what I'll do for the example; we'll have a look at that.

So here I have expanded the struct that I had, my callbacks. The impl that I had before was just an empty block, if you remember; it did nothing. But now I've added in some, admittedly pretty empty, callback functions, and I have all three here. After the crate root parsing I'm still going to continue the compilation. I do have the ability to stop it: you can change this to stop and it won't keep going, which means it won't create a binary. What I've done for this example is just shown that you can add your own custom code at these points. Just for the sake of example, all it does is print out what phase I'm in. If I build, just to be explicit again, nothing happens, because it's just building. It doesn't need to take an argument; it got upset at me, but it doesn't create anything. It's when I run it that it runs the compiler for the second time, and I'm expecting that in the running of that compiler it's going to print out where it is. And so all three of these get communicated. Before I change anything, we'll notice that because I didn't stop it at all, it created the hello binary, and the hello binary does what the example says: it prints hello RareSkills. But if I delete that and I stop compilation after expansion, then we won't see this after analysis printed. So doing that, it doesn't get to after
analysis. Now, to show what I was talking about: if we haven't done any analysis, that means that for Rust programs that we know are actually illegal, we will still be able to look at their AST and things. They'll still be legal enough to be turned into an abstract syntax tree. So if we `let a`, it can be a `u32`, five. I haven't flagged this as mutable, so if I try to assign to it again, that's not allowed in Rust. This file is outside the scope of rust-analyzer, so rust-analyzer isn't able to do the extra bit of work where it usually gives you the red underline. As syntax this makes sense, but it's a broken Rust program; it's not allowed. So if we go to here, if we're stopping the compilation after expansion, then when I run this, still pointed there, it's not barfing. It's not throwing an error saying, hey, you're breaking the rules, I found that out in the analysis. But it's also stopping here; we're not going into the analysis at all, so it's not creating a binary. If I let that continue, we should see it throw an error. Even if I stop here, it shouldn't get to the point where it can print this, because the analysis should throw. And that's what happens: it straight away realizes that you need `mut` if you're going to reassign to a variable. So that's showing that we can inject code and we can look at different forms, and the different points mean that different guarantees have been established for that code. We'll look at a bit more of an example where we actually grab the typing context, like I said before, and do something with it with a query.

If we wrote something that doesn't even syntactically make sense, then, I think if I try to go `a =` and then, what's another keyword, if I go `a = fn`, this should be enough for it to say, I can't do this. I think we won't even see after expansion. I think that
won't even be printed, so let's have a look. Oh no, okay, it does actually. That's interesting. I thought that there was more checking happening when it was parsing things in.

What if we change `fn` to something empty? Because maybe it still might be trying to see it as a variable or something.

True, that's a good idea. If I add a semicolon, so that it doesn't read on into the print line, or I could just remove that. Actually, let's just do this: `a =` and then nothing. So if we run that, ah, still. Interesting. So it does create an abstract syntax tree. I did think that there was more of a guard here for a well-formed program, but evidently not; it will still go ahead.

Is it possible that it tries to make sense of the rest of the program, which in this case is empty, before presenting you all the errors? Because if you throw at the first error then you're going to have a hard time doing any...

Oh, actually that is true. There is a really rich error system going on, where it doesn't just hit the first thing and then throw. It tries to collect as much information about your program as possible. So that was a good point there: it probably has recorded the error, but it's continuing as far as it can with its compilation in case there's more information that it can spit out to you.

Got it. So if we wanted to break it at the after expansion phase, that would be like if we put some macro in there that doesn't exist or something?

That's a good idea. If we do an `abcd` macro, I wonder if it will expand this. It threw an error after parsing and continued with expansion, so both. It's interesting that it sees the error before the after expansion line. I'm guessing that standard error is getting passed straight through this thing. This printing of after expansion, you can think of it as the first thing that we're doing after expansion. So this is after expansion, and the first line is here, so the analysis
of whether things can be expanded is indeed happening before printing this line.

Got it, yeah. But the error that's printing out here, that's just getting streamed from the compiler through?

Yeah, this apparently is not delayed; it just goes through straight away, unless it's racy, but I don't think it would be racy.

Okay, cool, thank you.

So we'll have a look at something a little more juicy. I think I can go git switch seven. Okay, so here I've added in a few more crates to be able to look at a bit more stuff, and I've got rid of the after parsing and after expansion callbacks, so we're just looking at after analysis. Looking back at the graph, I'm in at this point here. The borrow checker, everything, has fired, so we have the maximum guarantees that we can get out of the compiler. And this is the construct with the query to be able to look at the typing context. There is a function on the queries saying, give me the global context. You unwrap it, you get a mutable reference to it, and then you use the function `enter`. `enter` takes a closure, and the closure's argument, the lambda variable, is your handle to the typing context. Through the typing context you can call tons of the internal rustc functions. There are many functions at many different stages that take a `TyCtxt`, which I'll go over and show you, and you can do a lot of things with this. Any function that takes one, you now have the ability to call. Here, just for the example, I have used the `rustc_span` crate to grab the ID of the local crate, the crate of whatever program we're pointing the internal compiler at. It's going to grab that crate number, and then from the typing context I've said, give me the name of that crate and turn it into a string. Then I get the crate ID again
(I could have just saved it into a variable, but I got it again), and I'm going to print both of those out. This is stuff that you normally don't really see at all, that's inside the internals, but using the `tcx` you can have a look at it. So I can see the name: we pointed it at this `hello.rs` program. And the reason a binary didn't get built is that I told it, don't build one at the end, just stop. So I can see the name of the crate and I can see the crate number, and since there's only one crate here, and it's zero-indexed, it being zero makes sense.

Depending on what you want to do inside the compiler, you would probably be motivated to look at some particular thing. Just for an example, one of the things I showed last week was the rustc CLI option `-Z unpretty=mir`. After parsing through all these CLI args, it's going to do a pretty-printing of the program that you pointed it at. So if I run that command from the CLI, I can see this is the MIR of the hello world program. It's got our warning at the top that we saw, and the basic blocks, and it's got a bit of the promoted thing here. Another thing that we can do, and this is just the example that I chose, is that inside the rustc crates there is the very function that does that, and all it takes is a handle to the typing context. Inside here there's a pretty module, and if you have a look for `write_mir_pretty`, this is going to print out a human-readable representation of the MIR. What it takes as an argument here is a `tcx`, which is what this query system is giving us. So any function that's in here I think we should be able to call, and we can do our own manipulations, or we can use what's available in the Rust compiler to look at what
we want. So for the sake of this example we're just going to call that directly; at least, that's what I thought of as an idea. We can bring that function in. Here it's not able to resolve it, because I haven't brought it in, but if I use `rustc_middle::mir::write_mir_pretty` to bring that function in, all it needs is access to a `tcx`, which we are given by using this setup. If I do the `cargo run` and point at it, first it should print out the crate name and number, and then it does indeed call that internal function that we saw before. So this is showing how we have some amount of ability to control and craft what we're calling.

I saw that this prints out a promoted. We can see here that this constant is a constant that gets promoted, which is a little bit out of scope of the discussion here. But another thing that I thought to do is, well, I want to know what a promoted looks like. What is this? What I did to discover that is I went into the Rust compiler and I grepped for functions that had the word promoted, and I found this one here, `promoted_mir`. It takes a `tcx`, and it needs a def ID. From the previous rounds of analysis I know that in the HIR everything that has a body gets a def ID. So from the typing context I'll grab the HIR, I'll get all of the body owners, and I'll iterate through them. For each one I'll first print out how many promoteds it has; we can see from here that there should only be one, that's all we're expecting. Then for every promoted that's in there, let's print it out and have a look at what that is. And running that, okay, it gives us a warning there, but it spits out a whole bunch of the internal representation of what this promoted thing is. So the version that
we saw further up here, this is a pretty-printed version, whereas this is just dumping out the struct of what it is. So that was just one way of getting this representation. I don't think you can do this from the command line, but you are able to do it, and much more, by interacting with these callbacks, and that's the point that I was hoping to demonstrate here.

Now, what does the length mean in here, for promoted?

Yeah, so there might be multiple things promoted, so this gives you a vector of promoted things. We can see here that inside promoted there's zero, and that's the only thing that gets pretty-printed, so there's just one thing in this promoted. But I took the length of it because I just wanted to check: for this example I'm expecting there to be one thing, and there was indeed just one thing. So that's why that was there.

Cool. So this ends up being handy for us at Runtime Verification; we use this setup to do some analysis. An environment where this actually comes in handy, aside from just poking around and looking at things: we have a project where we want to get out all of this intermediate representation called Stable MIR, and we wanted a JSON-serialized version of it, and that didn't exist. So we used this paradigm, where we took some args, same thing, and then we call this function, which is stable_mir_driver, which comes from driver; I'll go and show that in a second. And we pointed it at this function here called emit_smir, which comes from the printer directory. That driver is doing the same thing that I did in that example. We have a callback; in this case it takes a function, which ends up being that emit_smir function that I described, and it does the same thing: it grabs the queries, says give me a global context, then I want to enter it and give it a closure, whose argument is going to be the typing context, and then it
says I'm going to run that callback function, so in our case it's the emit_smir function. And so we use the rustc_driver and give it the arguments and give it the callbacks which we've created here.

Why we want to do that, like I said, is we want to do some serialization, and we have some problems because the MIR is not in a form that's ready for serialization. So here is the emit_smir function; both branches here call emit_smir_internal, and this is a big block of code that we're not going to dive into, but here's where we do all of the serialization to JSON. That was the whole goal of this project: we want to serialize this stuff as JSON. But what we had to do first is some normalizing of the form; we had to do the monomorphization ourselves. Being able to carry around this typing context and do all of the things we needed allowed us to get this MIR into a form where we were able to serialize it. So there's an industry-level use case for where this sort of stuff comes in handy. Once we have that serialized version, it is input to another program that we write, a tool that we're building. So that's the motivation for why we do that.

Cool. Codegen backend. At this point we've been interacting with the IRs at this level, but you do have the ability to let all this go through and replace the codegen with something that is more what you need for your use case. Inside the Rust compiler you get three for free, I guess: there's LLVM, the default one, GCC, and Cranelift. But you aren't limited to just using these. In fact, if you write your own, all you have to do is implement this trait, CodegenBackend, and call the rustc_driver function set_make_codegen_backend, pointing it at something that successfully implements this trait, and you can have your own custom code generation. I'll show just briefly what those traits look like. So the codegen
backend trait that you need to implement, it's quite beefy; there's quite a lot of stuff that you need to put in here. But that makes sense, because you need to tell Rust how to generate all the code from the IR. Yes, go ahead.

So if we were compiling Rust to some custom blockchain which has its own VM, this is the part we would override? It has its own instructions, I think.

Well, I guess you could do that, yeah, that is something that you could do. I'm not aware of anyone that uses this in the blockchain space, although that doesn't by any means mean that it's not used there. But yeah, if you wanted to do that, this would absolutely be a way that you could do it. You could create your own codegen backend and call this function pointing at it, and it would spit out the assembly of your definition, according to your implementation of the trait. So that is possible.

Okay, I think somebody just asked a similar question: are the rustc_driver callbacks used to generate a Solidity verifier for a Halo2 proof?

I don't believe so. I don't know. I'm not sure if anyone is consuming MIR from a stream like that, and I would think that if you were to do it you would need to do it after analysis. The reason why I'm not sure if people are consuming MIR like that is there wasn't really a serialization for it that we came across, and so we had to do that ourselves. What that means is no one was using it in a portable sense. I suppose if these people were doing this Halo2 work in the Rust environment, because you can then just access the stuff directly and write your Rust program to do whatever, that might be possible. I can't give a concrete answer for that, but if what you needed for that was to look at the IR, then yeah, you could hook up to it. Or another way you might do it might be through this codegen backend. So this project that I mentioned here, Kani, I think
they use this sort of paradigm to be able to translate things into what they need for CBMC, to do bounded model checking. I don't think we have the time to go into Kani, but I would encourage anyone that was interested to look for this trait being implemented and this function being called inside there, to see what they're doing with it in a real-world program.

Okay, I just got confirmation to answer the question, thank you. Nice.

Okay, building rustc. So this is where I'm going to point out some things. If you're using rustc, this is just a clone of it. How am I doing for time? Yeah, okay. If you're using rustc, there's every chance that the way that the compiler is set up will make it hard for you; and I'm saying this as in, you've got this downloaded like I have and you've decided I'm going to change things, I'm going to hack away, I'm going to do it for, you know, fun and profit. There's a couple of tricks that you can do to make that a lot easier for what you need. So the first thing that I'll say is, there's a couple of build scripts. I'm going to get the compiler building, and then while it's building I'll explain what they are, because the first pass is a little bit slow. So if we call this x.py (x and x.py are the build scripts) and we say that we want to do setup, it's going to download a few other things and then it's going to give us five options. It says here are some profiles, which one are you interested in, and one of them is compiler. Since this is about hacking away at the Rust compiler, choosing that one sets up a bunch of configuration so that our life is much easier in doing that. So just give it a tiny bit and we'll go through. It's asking us what we would like to do; there's a, b, c, d, e. We're going to choose b: we would like to work on the compiler. It asks us some things like, do you want hooks, what editor are we using, we're using VS Code, and, no,
we don't need to worry about that. Cool. So now the compiler is set up for us to do programming on it. I'll explain the compiler profile in just a sec, but let's first get things building. If we do time ./x build, this is just building the Rust compiler raw, and we'll time it and see how long that takes. So that's working away in the background.

Now, what exactly are you signing up for when you choose the compiler configuration? You can see in this file here, config.example.toml, a whole bunch of different settings, and this is a big file because there are a whole lot of ways that you can customize this compiler. What's happened when I've chosen the compiler profile is it's taken a series of these, set them to a configuration, and saved them in this config.toml. Okay, sorry, it just points to the profile for the compiler, but the file it points to has a bunch of settings that are going to be useful for us. So there's a whole bunch of different stuff, general build configurations, and this example file is great for explaining everything that's in here, but obviously it's a bit too much for a newcomer to know what to turn on and off. So what the Rust compiler team did is, inside src/bootstrap there is defaults, and they have a few default settings that you would be interested in. When this config.toml is pointing at the profile compiler, it's pointing at this file here, where, as you can see, these are uncommented. This is the setup that it thinks is best if you want to build the compiler and you're changing things and working on it. So that's already one thing that's really nice: we have things like debugging turned on in the correct version that's going to be useful for us, incremental compilation is enabled, and you can read through and have a look at what's in here in particular.

We're still building here, but this is maybe where I'll point out what exactly it is that's building. Once we start the build,
there is this build directory, and in here, under the name of our target (this machine that I'm on is an x86 Linux machine, so my target is this one here, the only one here), maybe I should have shown this earlier, but what it's been progressively doing is working with a stage 0 compiler: it's building with the stage 0 rustc, the standard library, and then it starts moving on to the stage 1 compiler. You remember from the last talk I mentioned that Rust is bootstrapped: Rust is built in Rust itself. This stage 0 compiler is an earlier, prebuilt release of the compiler that ends up building this newer version of the compiler, the stage 1 compiler. And then, I think, when anything is going to be actually published out there, like a binary for you to download from GitHub and use rustc with, or something through rustup, this stage 1 compiler tries to build itself again, and the result of that is the stage 2 compiler. I think that's just for extra safety, that this should be able to build itself identically.

Now, everything that I'm explaining here is available, and I would encourage people to look at it, in the rustc dev guide. Pretty much everything I've explained in all this talk is just a filtered-down, interactive version of the rustc dev guide. So in "How to build the compiler and run it" there's also a "Suggested workflows" section. All of that stuff about doing the profiles was in that first bit, but there's a bit more that you can do to make things even faster, and that's using --keep-stage, and so that's what I'm going to explain next.

So our first run here went at this time. That's not really the kind of time that you'd like to have for compilation if you're working on something. This machine's multi-threaded, and so 28 minutes of user time went down to 3 minutes, but even so, if you make a change to something in here and then you want to compile and see what happened to your change, that's
not really acceptable. By 3 minutes you've already opened Instagram and you're lost to the void. But let's choose something to change. I'll change that function, actually, that dumps the MIR, so that's in pretty; I think it's called, function write_mir_pretty, that's it. So we're going to make some changes to this. Here's where it writes the line; okay, actually I'm just going to grab that, so we're going to do that again, but now, what do we want to say? I guess "hello this is an example". Cool.

So now I've made a change to the Rust compiler. The documentation here tells me that, and I saw in the build here that a stage 1 compiler (okay, it doesn't want to show me) exists for my target. I've only changed one minor thing here, so I don't actually have to build the entire compiler again, and I can tell it not to do that. What did I do before? I did ./x build, and I want to tell it --keep-stage 1. What it's going to do is incremental compilation, and it's going to try and keep everything from stage 1 that it doesn't have to change. We'll see how fast it builds with this in mind. Here, all of these rustc_ blah blah blah are the crates that it must rebuild to accommodate the change that I put on this line here, and it shouldn't change anything else, although it will rebuild the standard library. This time is much better than what we had before: now we have a real time of 21 seconds and only a minute and 20 seconds of user time. So this is pretty good; 20 seconds isn't too bad a wait for building an extremely complicated compiler. It can be a little better, but what I'll show first is that we have actually changed the binary at this point. To do that, what I need to do is go inside the
build directory, under my target, which if you remember was the x86 Linux one. There should be the stage 1 compiler, and in here there should be a binary, and there should be rustc, because that's the whole thing we're building, rustc. And I'm going to have to give it a file. Is there one? No, okay, that was crazy. This hello.rs, oh yeah, here we go, with a function main. I need to give it a Rust program to point at. Cool. So there is a program to point the Rust compiler at, which is hello.rs, and we're expecting it to compile it. Actually, I need to make sure that I call this one here, and what I need to pass is -Z unpretty=mir, and it should spit out on the command line here... yep, and it's got our change; our change is in there. So we've now made a change to the Rust compiler. Admittedly all it is is a printing change, but you can do whatever change you want; you could make addition subtraction and subtraction addition, or something like that. And this is a pretty reasonable turnaround time.

Now, the last thing that I'll show is you can actually bring this time down a little bit more. So let's change this text, so it has a diff to make the difference with. Come on... yep. So I need to build this, and I want to do --keep-stage 1, but, I don't fully get this, if you point it at the library, you say build library, because you can put arguments in here (you could do that without the --keep-stage 1), but if you do that and tell it to keep stage 1... what did I do wrong? Building the bootstrap... oh, sorry, this is over SSH, so maybe it just got a little bit hairy. Yeah, okay, I think that the text that it was showing me was not actually what was on the server. But here it's building it again, and we'll have a look at the time. I don't really know why when you add the
library... see, it didn't build the library here. If we go all the way back up here, you can see that it started building, oh no, it started building rustdoc. Oh, I think I get it: so rustdoc must happen after the library, but here we're telling it just build up until the library, and then we're saying keep stage 1. Okay, I get it. And so this time is much faster: if we have a look here, we're now down to 36 seconds of user time, and it actually took 13 seconds to make that change. So now we can hack on the Rust compiler and we're not going to get interrupted in our flow of ideas as we start changing things.

Yeah, I think that's everything that I have to say, actually. I think I'm done, which, it looks like it's not a bad time to be done. So yeah, I'm open to questions.

Really nice, thank you.

Sweet, cool.

What are the most common backends that get swapped out, if it's not one of those three that are included?

The most common backends that get swapped out are definitely going to be between Cranelift, GCC and LLVM. But aside from that, as I mentioned, Amazon, I think it's Amazon that does Kani; it might actually be a third party, or at least Amazon's really tied with it. They have their own reason: they want to do this bounded model checker, and so they have that. And then there are projects like Aeneas and Charon, and they do a similar thing: they exchange the codegen backend to try and point to the different interactive theorem provers that they're working with. I'm sure that there's more; I have those examples off the top of my head because I'm from the formal methods world and those are formal methods projects that do this.

Are queries namespaced?

Oh, are queries namespaced? Those queries are already existing inside the Rust compiler, and so you have a really limited option of what you are... so in that sense they are namespaced in there, already predetermined;
they grab the typing context from where it is in the analysis, sorry, in the callback that you've got it. I'm trying to think what you mean exactly by namespace. At that point there shouldn't be any problems with the resolution of any names; as in, I don't think that you should be able to call a function by the name of a query and have it be confused when you're using the query system, like, hang on, which function do you want, because it will be fully elaborated by that point.

Oh, are they across different layers? Yeah, I mean, you could think of that in the sense that there are different IRs that, through the process of the compiler running, are still there and accessible to you. I showed that with the example where I was looking at the promoted stuff that was in the MIR, but actually I wanted to grab all the DefIds, and all the DefIds happen in the HIR. So I said, okay, well, we've already processed all of that HIR, it's just sitting there, give me that, let me see all the DefIds, and then from that I'll use some MIR queries, sorry, some functions on the MIR that take a DefId.

Were there any compiler vulnerabilities discovered from doing this?

Yeah, absolutely. If you go to the rustc GitHub there's an issues board there. So the compiler does have bugs; nearly all compilers have bugs, they're just hard to find. The Solidity compiler has bugs, it has previously in history had bugs, I think the Vyper compiler as well (because I think a lot of people here are more familiar with the EVM world). These are an inevitability in complicated software, and the Rust compiler itself has historically had some bugs that it needed cleaned up, and there are still ones there. When the compiler itself errors, that's called an ICE, an internal compiler error. Generally, if you're doing something and you get an error, and
it's clearly not a normal error, it's going to have crazy symbols, all this stuff going on, it will dump a ton of information, and if you start reading through that information you might be able to detect, oh wow, this is an internal error, this isn't anything to do with my project, this is internal to the compiler, it's gone haywire. Usually with that stuff you want to go to the compiler GitHub, go to the issue board, and report the conditions that your ICE occurred in, and then they can patch the bug. Really the only way to avoid bugs in compilers is, well, I guess you could try to formally verify them, and there are projects that try to formally verify compilers, but compilers are pretty complicated software, so usually you have to restrict to a subset of the language. You can't fully verify the entire compiler; you go, okay, well, we'll verify C, but C that looks like this kind of thing, it doesn't have these features, maybe. Making sure that an entire compiler, one that's as complex as something that can build an entire operating system, is free of bugs is a pretty big challenge.
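
As a small, standalone illustration of the promoted constant that showed up in the MIR dump during the callbacks demo, here is a minimal sketch; it is not the program used in the talk, and the function name promoted_ref is made up for this example. It relies on rvalue static promotion: a reference to a constant expression such as &42 can be given the 'static lifetime because the compiler lifts the constant out of the function body. Dumping the MIR of a file like this with -Z unpretty=mir, as in the demo, would be expected to list a separate promoted body, though the exact output depends on the compiler version.

```rust
// Hypothetical example function (not from the talk).
// Returning `&42` with a 'static lifetime compiles because rustc promotes
// the constant expression `42` out of the function body, so the reference
// does not point at a stack temporary.
fn promoted_ref() -> &'static i32 {
    &42
}

fn main() {
    // The reference is valid for the whole program, not just this scope.
    let r: &'static i32 = promoted_ref();
    println!("promoted value: {}", r);
}
```

Only constant expressions are eligible for this; a reference to a value computed at run time inside the function could not be returned as 'static in the same way.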