Hi everybody, I'm Steve. As I said earlier, I'm here from Oxide Computer Company, and this is "How Rust Makes Oxide Possible." Before the pandemic I used to go to a lot of conferences and speak at a lot of conferences. This is only my second talk back and my first time back at a Rust conference, so I'll admit I'm a little nervous. (Hey, there we go, there's a microphone.) I decided I would try something a little different at the beginning of this, so I wanted to give some shout-outs.

First of all, Florian, thank you so much for having me here. He's an old friend at this point: we realized that the first time I came to Berlin was 12 years ago, and that was thanks to Florian. It's one of my favorite places in the world, and I'm so excited to have so many good friends here and so many good times, so I'm really excited to be back. To Oxide: thank you for giving me the week off work and paying for my travel and stuff. To all of my colleagues: thank you, and sorry. We're about 60 people at Oxide, and I'm about to talk about the work of everyone at the company. Most of the stuff I'm going to talk about is not things that I personally did, so they deserve the credit for all of this, and if I get anything wrong about it, it's my fault, not theirs. And then finally, my barber, who gave me a great haircut right before this conference, and I forgot to bring any product to put in my hair, so it looks very flat. So if you ever see this, I'm sorry, Jason.

Lastly, I would like to give a shout-out to this person on Hacker News. 72 days ago OxidOS was on Hacker News, and they left a comment saying, "I don't think this is related to Oxide Computer, so they're probably going to get a cease and desist really soon." At that point Florian had asked me to be here but it wasn't announced yet, so I couldn't make the joke that I wanted to make, which was: not only do we not
sue people over this name, but there's even the Oxidize conference that we're going to have this year. So I think it's really funny that you have Oxide and OxidOS at this conference. Anyway, glad we all picked up the good name, basically.

Okay, so beyond all that stuff, some background on Oxide. I really don't want this to be like a sales pitch, but I do need to explain what we're doing so you have some context for the breadth of the stuff that we're doing at Oxide. We came up with this tagline, "the cloud computer," but fundamentally the business is very simple: you give us a bunch of money and we give you a computer. The computer looks like this. It's a whole rack of servers, but we call it a computer because it's been designed as a cohesive whole from top to bottom. So we've thrown out a lot of the stuff: you can't buy a single U or one of those individual sleds. I'm from Texas, so I like to joke it's like ribs: you get a whole rack or a half rack at a time. The idea here is that we build the hardware and the software designed together to make everything work in a cohesive, integrated way.

It is not a mistake that several of my former colleagues and the people who started the company are from Sun, so a lot of people make allusions to this being a sort of Sun 3.0 situation. Or, I was an Apple fanboy as a little kid, even though I haven't owned a Mac in a long time, so a lot of my motivation was being a little sad that, you know, I was born in the middle of the '80s, so I kind of missed that era of computers, and I never thought I would get to work at a computer company. So to me Oxide is kind of like an Apple, but not evil. I know some people kind of think that's a little... I'll get to that in a second. But the idea is that you don't manage individual computers. You get an API; it's like EC2, so you upload a VM
image and then you get a VM, and you manage that kind of thing.

The other part, and this is the "not evil" part of the Apple comparison, and a part that matters if you're interested (I'm going to share some code, but also if you'd like to see more about what we're doing with Rust): all the code that we can legally make open source is open source. There are still a couple of little things (AMD, give us your PSP code, please), but all the code that we write is open source by default and generally available. We think that's really important, not only for our customers but also to share what we're doing with people. So anyway, that's sort of the background on what the company is.

I'm not going to talk a ton about the hardware specifically, because I'm a software person, but I want to give you an idea of what we've built and the degree to which we are writing new code, because I think that matters for the context of this. This is what one of those sleds looks like on the inside. You can see that giant heat sink there has an AMD chip underneath it, and we've got some RAM, and we always talk about the fans. It's just become a meme at this point, but one of the fun things about not having to deal with the traditional server form factor is that we can use these big old 80 mm fans; you can see the three little yellow things there. I kept joking that for April 1st this year I wanted to start an Oxide OnlyFans and just post pictures of the fans to OnlyFans. Didn't manage to pull it off; maybe next year, we'll see how it goes.

On the left here is a development board for the service processor, a.k.a. "totally not a BMC." We've thrown away the baseboard management controller stuff and written our own. You can see we've got an ARM chip there in the middle and a whole bunch of development board stuff. But I'm not going to be speaking a ton about the hardware today, as I said
earlier, because I'm a software person. But we have a bunch of very gifted electrical engineers, and we are not using any reference designs from AMD or anything like that. If you buy something from a really big manufacturer, they tend to use those things. We're building everything from the ground up, and so to do that we had to decide how to do the software part of this thing.

As you might guess from the fact that I'm standing here, and it's been said a zillion times already, basically all the code is written in Rust. That really matters to me, having worked on Rust for basically 10 years. I started using Rust in 2012, so I'm one of the few people who can legitimately say I have 10 years of Rust experience on LinkedIn or whatever. It's been really satisfying: you build a systems programming language, but there's nothing more "systems programming" than this stuff, you know what I mean? Building an actual computer. So it's been personally gratifying to use Rust in this context, where we're dealing with the lowest levels of computing.

One more picture of the hardware. This is what it looks like, not as a glossy rendered version on the website, but in a data center. The pictures with people for scale are not as good, but it's taller than me, and I'm pretty tall. And then finally, the cable gore: you can't show a server without showing a bunch of cables coming out of the back. We ship them pre-cabled, and it looks like that.

But anyway, enough about that. Let's talk about the software and the stuff that we've written. I'm vaguely going to go from the lowest level to the highest level, so I'm going to show you a bunch of the different stuff that we're doing and some of the code. The very first thing that ends up running on
the rack that we're allowed to run (again, if you haven't dealt with AMD before, they have their own proprietary stuff that you have to deal with when their CPUs boot up) is the Pico Host Boot Loader, or phbl, which I think is sometimes pronounced like "pebble," but I'm not 100% on that; sorry, Dan, in advance. Basically this is a bootloader for x86. It gets loaded from SPI via the PSP and then starts in 16-bit real mode. If you've never had to deal with x86 boot-up stuff before, don't, unless you have to; ARM stuff is way nicer. The way I like to describe this boot process is that we have to pretend that we're in 16-bit mode, even though no one has used 16-bit PCs for a very long time, and then you have to go into 32-bit mode, and then you boot up into long mode, or 64-bit mode. So this code handles doing all that stuff and then actually bringing up the host operating system, which in this case is illumos. I'll talk about that a little more later.

Notably, one of the things that's interesting about this process is a little different (I'm going to show you some very classic code about this in a second): we are completely getting rid of the BIOS/UEFI layer entirely. So it's a very old-school thing in a sense. Operating systems are supposed to manage your hardware, but on modern PCs it's really UEFI that manages your hardware, and then operating systems talk to UEFI. We call this holistic boot, and the idea is basically that this boot loader just loads up the OS, and then the OS is responsible for bringing the rest of the hardware online. It's a really different division of responsibilities than your traditional PC, and that's cool because, first of all, we can write a lot of the stuff in Rust, but secondly, it works really well and really fast, because there are not all these intermediate layers that we're having to deal with.
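The 16-bit to 32-bit to 64-bit walk described above can be pictured with a small model. This is only an illustrative sketch, not code from phbl: the `Regs` struct, the `mode` function, and the `boot_steps` helper are hypothetical names invented here. The bit positions (CR0.PE, CR0.PG, CR4.PAE, EFER.LME) are the architectural ones. The model is deliberately simplified: it skips loading page tables into CR3 and does not distinguish 64-bit mode from compatibility mode within long mode.

```rust
/// CPU operating modes the boot path passes through (simplified).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Real16,
    Protected32,
    Long64,
}

// Architectural control bits on x86-64.
const CR0_PE: u64 = 1 << 0; // protected mode enable
const CR0_PG: u64 = 1 << 31; // paging enable
const CR4_PAE: u64 = 1 << 5; // physical address extension
const EFER_LME: u64 = 1 << 8; // long mode enable

/// Hypothetical snapshot of the registers that decide the mode.
#[derive(Debug, Clone, Copy, Default)]
struct Regs {
    cr0: u64,
    cr4: u64,
    efer: u64,
}

/// Classify the mode implied by the register bits (a simplification:
/// real hardware also consults the code segment descriptor).
fn mode(r: Regs) -> Mode {
    if r.cr0 & CR0_PE == 0 {
        return Mode::Real16;
    }
    let long = r.cr0 & CR0_PG != 0 && r.cr4 & CR4_PAE != 0 && r.efer & EFER_LME != 0;
    if long {
        Mode::Long64
    } else {
        Mode::Protected32
    }
}

/// The usual order of steps from reset to long mode, recorded as
/// (description, register snapshot) pairs. CR3 setup is omitted.
fn boot_steps() -> Vec<(&'static str, Regs)> {
    let mut r = Regs::default();
    let mut steps = vec![("reset", r)];
    r.cr0 |= CR0_PE;
    steps.push(("set CR0.PE", r));
    r.cr4 |= CR4_PAE;
    steps.push(("set CR4.PAE", r));
    r.efer |= EFER_LME;
    steps.push(("set EFER.LME", r));
    r.cr0 |= CR0_PG;
    steps.push(("set CR0.PG", r));
    steps
}

fn main() {
    for (what, regs) in boot_steps() {
        println!("{what:<14} -> {:?}", mode(regs));
    }
}
```

In this model the CPU stays in 32-bit protected mode until the final step: long mode only becomes active once paging is turned on with PAE and LME already set, which is why a bootloader has to build page tables before it can reach 64-bit code.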
So anyway, this is the README on GitHub if you want to go look at some of those details. The first thing I wanted to talk about with phbl is the distribution of the code itself. This is 93.8% Rust and 6.2% assembly, and this has been pretty static for at least a month or two. It doesn't tend to change, because it's relatively small: this project is 2,300 or 2,400 lines of Rust code to bring this up. That's pretty nice, because it's pretty tiny, and that's really important given that it's the first thing that's loaded. If you needed to load a huge binary, that would take a very long time, all those kinds of things.

Another thing that I really like about phbl, and a thing that's near and dear to my heart: I got started doing operating system stuff in college, because I really wanted to do OS things, and so me and my friends worked on an operating system in D, because D was the cool new language. If you're curious, one of the reasons why I got into Rust is because I used D in like 2008, and that was fine, but Rust is a lot better. At the time, 64-bit stuff was just coming along, and I would argue with professors who said it was stupid and no one was going to use it because pointers are too big and that wastes too much memory. This kind of code, the x86 boot-up process, as I said earlier, if you don't have to deal with it, don't. But we all get traumatized and then trauma-bond with things, so I love this process even so. And this small amount of code means it's also pretty accessible, so if you've ever been curious about how this works, I think looking at the source code of phbl is a really great way to learn about how that boot process works.

The other thing people always wonder when talking about this is, okay, how much
unsafe do you need, right? How much unsafe do you think is in this? Well, I did some math. I decided not to use a tool like cargo-geiger, which tries to analyze it, since unsafe can be a block, right, and there can be more than one line. I decided to just check how many times the word unsafe is used in the source code, and for a small project like this I feel like that's pretty fair. There are some tests in here, because Rust makes it so easy to write tests that we can have tests for an x86 bootloader, so I didn't try to strip out test code or anything. I just said, okay, how many times is unsafe mentioned? And that's 118 in the total tree as of this morning, when I put this slide together. As I said earlier, it's 2,300 lines of code, so this is about 5% unsafe. When you read comments online, or from people who are new to Rust, they assume, oh, it's nice if you're writing a higher-level thing, maybe you need no unsafe, but once you start doing "real manly computer things" you're going to need to use a lot of unsafe. That's just empirically not true. You'll see this 5% number again later, actually, which is kind of interesting.

Another really cool thing about phbl and the Rust toolchain: my coworker Dan, who works on this, is really, really interested in Rust's guarantees in this space, so we actually have tests that we run under Miri to make sure there's no undefined behavior. This has caught certain patterns that we've used in the past, because you're porting some code that does dirty, terrible things that technically work, and so when you're trying to port that to Rust, initially you write it the same way you would write it in C, and then you end up finding, oh yeah, that's undefined behavior. So it's really neat that not only do we have tests for a boot loader, but also that Rust tooling is good enough that we can
check for undefined behavior in a boot loader. I think that's really neat.

And then finally, if you've been following or hearing about the pointer provenance work: unsafe code is kind of a big question mark right now. I mean, we have some idea of what is good in unsafe and what's not. A really big topic in that area is pointer provenance, and this is basically the idea that your pointer (I guess we're kind of in a church, so: comes from God, Providence? No, not that way), that pointers are derived from other pointers, and what stuff are you allowed to do with them? Dan has tried really hard to follow all of the experimental pointer provenance rules and provide feedback upstream on whether that model works when you're doing this kind of super low-level stuff. So that's also been interesting, to change some code patterns to make sure the compiler understands pointer provenance.

Okay, so I wanted to show you a little bit of code from phbl, because I think it's neat if you're curious what this looks like and haven't dealt with Rust at this kind of level before. I decided to pick the IDT, the interrupt descriptor table. Basically this is the code that gets called when an interrupt happens: you have to set up a table of function pointers, and when an interrupt happens it invokes one of them. Is this the wrong thing? No, this is the right thing, I just scrolled down a little bit.

Another thing I really like, to pump up my coworkers a little more, is this first comment: "derived from the rxv64 operating system." Dan in particular has been around Unix for a really long time, and so it's kind of neat: rxv64 is sort of a learning kernel that was made at MIT, and so there's this connection to older Unixes and that tradition. Anyway, I'm going to make this a
little bit bigger. I'm not going to go over everything about this code, but I just wanted to show you what a little bit of low-level Rust with inline assembly and stuff looks like. You've got classic bit structs setting up what the descriptor looks like for every entry in the IDT. You have all these different things that are named totally great, like mbz1 (I forget what that even means at all), but this is the stuff that the CPU expects to see.

And because this is Rust, we can still have structs. If you were doing this in C you would also write a struct, probably. I've also seen people do it manually with pointer offsets and stuff, but I think more reasonable people would write a struct to make this work. But we also just get methods, which is kind of cool, and you can organize some of your code that way. So we get methods to return a totally empty descriptor that has nothing put into it, or to create one given a thunk (that's what it's called in this case, but it's just a function pointer, basically): hey, give me a descriptor that has this function pointer as its thing. I'm going to skip over these methods, but we set up where the offsets are and all that kind of stuff.

And finally, this is a frame for when you get called into an interrupt: you've got to save all the registers and stuff, so we have all this code, and then some fun macros to generate the stubs. All of these things need assembly preambles and postambles (that's terrible; "prescript" and "postscript," let's go with that; I don't like those names either). The point is that you can use macros to make some of that code a little better and then generate all the individual interrupts that you need to handle. And then some fun inline assembly
that is that said pre- and postamble: pushing all the registers you need, calling the actual function you put in, and then popping them. And then some other little code that I'm not really going to go over, but we've got some backtrace code in here. And then finally, initializing it: we can actually check, by storing an atomic bool, whether this has already been initialized or not, so we can guarantee we've only done it once.

So anyway, this isn't necessarily the most super exciting code, but if you've never dealt with Rust at this level, I wanted to point out how normal this is. I mean, some of the variable names are pretty concise, because that's what they're named in the specification and you want to follow the spec, but this is manageable for people who have not done operating systems work before. So I would really encourage you, if you haven't done this kind of code, to start learning about this stuff, because I just think it's neat. Anyway, that's one example of source code from phbl I wanted to share with you.

Okay, so the next section up. Not only do we boot the OS, but server hardware has more than one computer on it. That's one of the weird things: you're not just running stuff for the VMs, you also need management hardware. On a traditional server there's a thing called a BMC, and we have a thing we call the service processor. It's totally not a BMC, but it does all the same things. The reason there's hate on BMCs is that people usually put full OSes in there, and that's a large amount of attack surface, and there's also a lot of history and people piling crap on top of crap. One of my coworkers said, "Have you ever implemented SCSI in JavaScript before?" and I said no, and he said, "Good." This is partly why we're getting rid of these things. So anyway, the idea is that we wanted to take those same behaviors, but we needed a stripped-down
replacement for it that did exactly what we needed. The other thing that Hubris is used for is a root of trust. This is a thing I didn't appreciate until I worked for Cloudflare, right before I worked for Oxide. If you have a server that's in a rack in some place in the world that you haven't gone to, you want to verify that nobody's tampered with your software. A root of trust is this special piece of hardware that's able to verify that the code that you put on the rack is actually the code that you wanted and that it's running the same thing, and then there's this process called attestation that lets you thread that the whole way up through all these things.

So Hubris is this operating system that we wrote in Rust, which is named Hubris because, of course, you're writing a new real-time operating system in pure Rust, so there's a little bit of hubris going on. And then Humility is the debugger, because you need to be humble when you're debugging things. "Real-time OS" is a weird, complicated term, so I just like to say "an OS for embedded systems," but it's got preemptive multitasking and full memory isolation and a messaging system, in about 2,000 lines of Rust. I checked, and in the kernel itself (there are drivers in a real system, so it's more than that, but purely for the kernel) there are 103 invocations of unsafe. So again, that's about 5% unsafe for a kernel. I think it's also pretty cool that we're able to keep almost everything in safe Rust.

One of the really interesting design decisions that we made for Hubris, the reason we chose it over some existing OSes that exist in Rust, and also what allows us to do some of the things that we do, is that Hubris is a little weird, in that we realized that if you could pre-allocate everything at build time, then you don't
need things like dynamic memory allocation. If you never have the ability to create a new task, then you don't need to have a dynamic list of running tasks, which means you don't need to dynamically allocate memory anymore. So the way Hubris works is that you build individual tasks and compile them as separate programs, and you compile the kernel, and then we assemble them all into one image in a flat memory space. That works really well for what we're doing, but obviously won't work for every system, so there are trade-offs. That's why it's great there's such a number of different operating systems in the Rust space. And then finally, another fun little thing is that there's no C code at all, which is also very cool, just because it's again demonstrating you can use Rust for this kind of work, just pure Rust, and it's not a problem.

Humility is this debugger that we built that pairs with it, and it uses a ton of DWARF and other things to be able to get really interesting stuff. When you build that Hubris image, you get all the debug info stored in DWARF in that image, and Humility can take that image and a running Hubris system and debug it that way. So you don't need to put any of the debug information on the actual chip itself; you can look at it totally on the host. There's a whole bunch of other stuff going on there that I'm not going to get into in detail, but there's a thing called Idol that we're going to talk about in a moment, which is an RPC system. (Idolatry is the full name, for IDL, right? Cliff really likes puns, it turns out.) It's a way of defining RPC mechanisms, and then the debugger can invoke functions in the OS remotely, which is kind of neat.

Another thing that's very interesting and Rust-specific is that because all of
our tasks are running in their own separate namespace, there are actually system calls in Hubris that let you do borrowing from other tasks. If you don't want to copy memory from one task to another (because that could slow a lot of stuff down, and in microkernels that's often a big problem), you can literally say to another task, "Hey, this region of memory, I want to allow you to borrow it," and then you can use a regular Rust reference. So this `transaction.caller.borrow` is basically, in this current task, giving me a reference to memory that's in another task's namespace, and we can know that that's safe due to Rust's rules. So we're replicating both immutable and mutable borrows, but at a slightly higher level, as part of the operating system's API, and that lets you not do as much copying. You can see a little later that `borrow.write_at` allows you to do a write into that.

I also wanted to briefly show Humility output. `humility tasks` is one of my favorite things to do with Humility, because it's basically top for our real-time OS. This is an example of some output with all the individual tasks that are running on a little demo thing, and what generation they are: basically, if a task dies for some reason, it gets restarted by the supervisor. It's very Erlang-inspired. You can see the ping task has died 14,000 times since this was running. Its job is actually to be killed, and the pong task was killed by another task, and so you get all this output. We built all this rich infrastructure to be able to debug these systems, since this is some of the lowest-level and most important stuff in the system, and we want to make sure that it's really, really robust. You can even get stack traces of, okay, that was killed, where was that specifically? You can see line numbers and stack traces and all that, and so
investing the time in building a rich debugger has really, really helped us when developing the system.

Okay, so, is that all the time I have left? Okay, cool, 10 extra minutes, excellent, because I was like, this seems really low. I'm going to show you some actual code for this too now, because I was a little worried that I was going over. All right, cool.

So I want to show you slightly more than just an out-of-context screenshot of what a task in Hubris looks like. This is that pong task that was being killed by the supervisor. We have this user library that includes a lot of stuff, so most of the tasks just include it in general, and then we want the pong task to be able to talk to the LED task. The idea of this pong task is that it asks the kernel, "Hey, please ping me every so often," and then when it receives a message from the kernel it tells the LED driver to flip an LED. That's just the basic hello world; it's always LEDs blinking. So we need a connection to that LED task, and since all these are separately compiled programs, we need to insert a reference to the other task, but we don't know that until build time. So we've got this totally-not-magic macro that makes that happen, that says, okay, I want to talk to the user LEDs task, and it gives me access to it via this variable.

Then we're going to wait a period of time, and we can grab a Rust object for that task, which gives us a reference to it, and then we make a system call to set a timer that waits for this particular interval, and we say, please mask out just the timer bits whenever we get something. And then finally, it's very common in systems like Hubris, or if you know Erlang, that you're in a loop, immediately waiting for messages. That's the style of program, so the rest of this task sits in a
loop, and it has a receive system call where it's waiting for, okay, did I get any timer notifications, or have I gotten a message from another task? We check to see who sent us the message, and if it's not the kernel, then we reply to it, hence "pong" in the name. And then finally, if we got it from the kernel, we know that it's the thing that needs to flip the LED, and so here we call into that other task and say, "Hey, LED task, please toggle your LED," and keep track of how many of those there are. This is a very, very simple and not useful driver, but it gives you an idea of what it's like to code in the system, and we get to take advantage of all of the nice stuff about Rust to make sure that this is really robust.

You'll notice that there isn't a single (I'm going to double-check that's actually true before I say that it is) there isn't a single unsafe in this whole thing, right? When you're designing this kind of system, you tend to have the driver that's dealing with the hardware have unsafe in it, and then it exposes an interface that other tasks call into. Ultimately you need to use unsafe code to flip the right register or whatever to make the LED turn on or off, but that's isolated in a specific driver, and that means that other, higher-level drivers don't need to use any unsafe code. It's the same ability as how unsafe in regular Rust lets you build a safe abstraction on top of it: we can do that at a higher level, where tasks contain unsafe and provide a safe interface, and then other tasks don't need to use unsafe directly. So that's an example of that.

And then briefly I also wanted to show you Idol. There are newer tasks; pong was one of the first ones that was made
because it's basically a hello-world kind of thing, but for the more recent tasks we've been writing, we decided that maintaining handwritten bindings to the APIs that tasks provide each other is a lot of work, and why not make computers do work instead of humans? So we made an IDL called Idol, and it allows you to define a specific interface and then derive Rust code for both calling into it and implementing it. This is an example of an SPI device where maybe you want to exchange some stuff. Leases are that borrowing thing that we were talking about earlier. You can then generate, using build.rs (because everybody uses build.rs to generate stuff), kind of a type that you implement to make this thing work.

And then finally I want to show you a more real-world example. We have a single supervisor task in the system, and you can call it whatever you want; we call it Jefe. This is the Jefe interface, which allows you to get state or set state for an individual task and do all these other things. This is a more realistic example of writing this IDL, and we're really big fans of that. I will talk more about IDLs a little later.

Okay, so I'm not going to show you the actual LED driver, because I don't have enough time. I thought I was going to have too much time; now I think I'm not going to have enough.

As I said a little bit before, here are the advantages that Rust gives us specifically in these low-level systems tasks. Encapsulating unsafe is really, really useful. As we saw in phbl, there's not as much encapsulation, but it's still useful to know that these are the lines of code where things can go wrong. And in something like Hubris, being able to encapsulate individual tasks means that one task going down doesn't bring down the whole system, and it also makes it much easier just
to find, okay, something has gone wrong, and it's isolated to this specific part of the system. That's been really helpful for tracking down some relatively nasty bugs.

Another advantage that Rust gives us here is the really rich command-line interface and tooling ecosystem. Humility would not be possible without clap, for example. There's a Humility REPL that I implemented, and I'm able to reuse the same clap code that parses the command-line interface to provide a REPL that lets you type the same commands inside. If I had to reimplement that all myself, it would take a lot longer and it would be a lot less robust. In general (I'll mention this a little later too), we really love building tools to help us do our job better, and the fact that Rust has so many packages that are built around building really robust CLIs has been a really big boon to us building things in production, because tooling is very, very important culturally at Oxide.

It's really interesting that ownership and borrowing gave us this way to think about IPC. As I briefly mentioned, a lot of times in message-passing systems you're slow because you have to copy data from one thing to another, because you think about things being immutable. As boats said, functional programming was a great thing that showed us that we didn't necessarily need mutability, and Rust is a fantastic trick to show us that we can still have mutability anyway. So taking that lesson learned from Rust, of one writer or multiple readers, we're able to put that in the OS at a higher level, and that's been really nice for giving efficiency in a paradigm that's traditionally considered to be not very efficient.

Another thing, and I phrase this very specifically: Rust lets us keep the binary size
under control. It doesn't mean that it's automatically small binaries. We do have to think about this, and we're on 32-bit ARM so we don't have as tight constraints as some other people do, but Rust at least gives us the tools to deal with binary size and keep things underneath certain sizes. So that's also very helpful. If we used a language with a big runtime, for example, we'd run out of space; it's just not feasible. So it's really important.

And then just reliability in general. If you've written a bunch of Rust code in the past, you know that "if it compiles, it works" is not true, but it feels like it's true. What's actually true is that Rust code tends to be very reliable, and that really, really matters when you're talking about the thing that's controlling the rest of your system.

Okay, at a slightly higher level, and I'm going to talk about this a little less because it's what I've worked on a little less: the control plane. We named this Omicron in 2019. Oops. So we're trying to call it just "the control plane" now, but the repo is still called Omicron anyway, and we're trying to pick a better name at this point. If you've never thought about what a control plane does, I'm not going to read all of this, but basically the idea is that the control plane is what handles the heart of the rest of the rack. It manages virtual machines, it manages provisioning, it manages permissions between things, it manages metrics and monitoring, and it's the biggest single project that we have inside of Oxide. So it also does a lot of different stuff. Basically what it does from a programmer's kind of level is, as I mentioned earlier, the primary way you interact with a rack is through an API, so it has to be providing that API to people and then making that stuff happen
behind the scenes. It also has to do stuff for you if you're the person who's in charge of maintaining the rack, not just for your users: turning a sled on or turning a sled off and all that kind of stuff. And then dealing with all that other kind of stuff, like remote access. If a customer has a problem that we need to help debug, then, if they give us access, being able to get in there, or being able to say "please give me a crash dump of what happened" if part of the control plane broke, and that kind of thing. So it's got a lot of responsibilities. And then it's got to deal with fault management and updating stuff in the system. There's a lot, right?

So, as you might imagine, it is also by far our largest repository: 300,000 lines of Rust code. I laughed when I saw this earlier: 55,000 lines of JSON. I think that's a lot of config stuff and tests, but I don't know why there's 55,000 lines of JSON. 300,000 lines of Rust is not a particularly small thing, but also, I just showed you a bunch of paragraphs of text. It does a lot, and I kind of think that maybe 300,000 lines is not that much for doing all of that stuff. I don't know if that's actually real or not. And if you're wondering, the Cargo.lock is big: it's 12,000 lines, a 250-kilobyte Cargo.lock.

But again, in terms of what Rust gives us: some of this is obviously crates that we've written that we depend on, but also being able to rely on the ecosystem to manage a lot of these tasks. It takes a while to write 300,000 lines of code, but if we had to write everything ourselves, well, I said earlier I thought it was a little small. Being able to rely on the rich ecosystem that Rust and the Rust community give us is really, really nice.

Also, the control plane has to deal with integrating with ClickHouse for metrics and integrating with CockroachDB to manage state, and then you can simulate the hardware versus running on actual hardware, and there's dealing with networking, so there's just a ton of stuff going on here. Rust's breadth is what allows us to use a single language to deal with all of this stuff. I think that's a thing that's a little underappreciated about Rust. People talk about it being good for low-level stuff, but I think it's also getting pretty good for higher-level stuff. The reason why JavaScript ended up taking over on the server was because you could run the same code in the client as on the server (asterisk, obviously), but the point is you could do most things in one language. I'll talk a little bit about front-end web stuff in a minute, but with Rust, I think a thing that's underappreciated in a production sense is having one language that can be used in such broad contexts, from the lowest level of the system to stuff that's a little higher level like this.

What's the next... okay, excellent. The other thing about Omicron, and don't tell anyone, but this is where we have some C code in the system. Some things are just too big to rewrite. We're using illumos as the operating system that drives the stuff, and bhyve for the virtualization. While we have been rewriting basically
everything and writing tons of code and throwing a lot of stuff out, it turns out there's some stuff that is still just too big to straight-up rewrite, and we're not meaningfully rewriting illumos in Rust yet. At least, anyway. I still think we will. Bryan doesn't. We'll see; in 10 years, when we IPO and I don't have anything to do, I'm going to make that happen. We'll see how it goes. But it's important to note that there is a certain amount of pragmatism when you're building a system like this, as much as I like to joke. Also, it's true that a lot of people have written really good stuff in C, like CockroachDB. I think ClickHouse is; Cockroach isn't, it's Go mostly, but the point is it's not Rust. So even though we named the company after Rust, integrating with other things is another superpower that Rust brings. It's a really good way to interact with other systems, and you sometimes need to do that as well. It's important, when you're trying to serve customers, to do what's best for them, and rewriting a whole Unix OS in Rust would be fun but it would not help our customers, so it doesn't make sense. So I say "pragmatism" while I'm also saying we threw out everything and rewrote everything. There's a balance to be had there, even if we're a little further along on the rewrite-all-the-things spectrum.

Okay, one more high-level thing before I wrap this up. We also built our own web server situation. This is called Dropshot, and the reason that we did that was that at the time there were no other things that produced OpenAPI. As you can tell from the whole Idol thing and these other things, we really like definition languages between components in the system. This is a project I'm working on, and this is a very simple endpoint to return a resource called parts. So you've got a part and a part ID, and it returns it. And so this is called
part_view, because that's just the convention that I happened to pick. This looks like most other web frameworks: you get a request context, and then whatever is in the curly braces part of the path, and you return an HTTP response or an error. And then this code is very basic: connect to a database, get that particular parameter, and then look it up. I hid all the Diesel away in a models part_view function, but this fetches it from the database and returns it.

What's really cool about this is that I can ask the web server, "hey, give me the OpenAPI definition for every endpoint that you have." So instead of writing OpenAPI by hand and then trying to generate a web service from it, we write the web service code we want and then ask it what that output would be. That means it's much easier to keep the source of truth accurate. I have a CI test in here that regenerates the schema and makes sure that it's the same one that's checked in, so that way I know that I'm not lying to my clients. And then on the front-end side we have TypeScript that's able to take that OpenAPI and generate a client from it. So this is my Remix code (soon to be React Router, I guess), and this api.methods.partView is generated from this part_view code here. So I'm able to have fully typed stuff the whole way up through the stack into my front end. We do not use Wasm currently, and I'll talk about that in a second, but the point is there's this really nice workflow where we're able to use Rust on the server side and then have JavaScript on the front end, TypeScript in this case, and keep it typed the whole way through. So we still gain a lot of the advantages that way.

An area where we're just straight up not using Rust, as I mentioned, is the front end. This is the console; if you don't want to interact with the API, you can use this. But this is all written in TypeScript. The web stack is much more mature these days, and with Wasm, everyone seems to be paying attention to server-side Wasm, so we're not using Rust and Wasm on the front end yet, and this is an area where we don't really plan to. Again, it's about pragmatism. Wasm on the client still has a lot of binary size issues. The frameworks are very good, in my understanding, but they're also still very new; TypeScript is more mature than basically every front-end Wasm framework. But I'm still excited to see those develop, and maybe someday in the future we'll switch to that. And finally, people tend to go very deep at Oxide, but front-end people tend to know TypeScript already, so it kind of helps.

All right, really quickly, because I have like 20 seconds: some problems with Rust that we've had. Build systems: Cargo is really great until it isn't. We've now built three different types of build systems on top of Cargo. Omicron has its own little build system on top of Cargo, and Hubris has its own build system on top of Cargo. I don't really know what the answer is there. I think there are some good possible solutions, but for now this is definitely a really big problem. And build times
are especially bad on Omicron. 300,000 lines of Rust code compiles really slowly no matter how we try. We're still working on some things.

The last thing I want to talk about briefly is async. Async is not bad, but it is also not perfect. I hate when people say async is bad in Rust; it's actually pretty good. But I would be lying to you if I told you there was no pain in async whatsoever. Cancellation in particular is a really big issue for us. Being able to know if a task has been cancelled in the control plane is a really big problem, and the biggest thing is that it's not even a difficult one, it's an emotional one, because that "if it compiles, it works" sort of thing kind of goes away in Rust async land. It's like, oh, this thing got cancelled and we didn't know about it. So it makes it feel like the promise Rust gives to you was a little broken. I think that a lot of people need to write a little less async and keep it to the outside, but I don't really have time to talk about that right now.

This is my last slide. Common themes here: we need to move up and down the stack, and Rust is fantastic for that. It can do the lowest-level things as well as basically the highest-level things. Types are really good, and typed interfaces are even better. Every time we've invested in putting a typed interface between parts of the system, it has paid off, and it's been really good. Building tools is fantastic, and Rust is a great way to build custom tooling for what you're doing. And the community is really awesome. I wish I could talk about productivity, but I am past time, so thank you so much again for having me here, and yeah, we can chat later, I'm sure. [Applause]

Thanks, Steve. You can stay there, because we do have time for questions. What was that? We have time for questions. Oh, excellent, we have time for questions. Okay, cool. Does anyone have a question? Here, Jonathan.

Hi, yeah, thank you for the great talk. My beady eye saw some interesting
APIs up on the screen. I saw you using SyncUnsafeCell, so I looked it up. I thought, I can't believe that's in core, I want that, I didn't know they'd added it. They haven't added it; it's nightly. Yes. So can you talk a little bit about how you ride the release train for Rust?

So we try to stick to stable for literally everything. However, Hubris in particular is still using nightly. I don't remember if the SyncUnsafeCell was in phbl or if it was in Hubris, but Hubris is on nightly specifically, and that's because there are just a couple of things you need to have that don't exist on stable. In particular the big load-bearing one, and this is in FCP right now, is asm_const for inline assembly. We use that, and we could get rid of it, but it's basically almost stabilized, and that'll be the last thing. One of the things that we've done, though, because again we built this build system on top of Cargo, is that I actually implemented an allow list of nightly features to make sure that we don't add more. At some point we decided, okay, we're getting close to the point where we could build this stuff on stable, so let's just cut out the ones that are unnecessary. I added a couple of polyfills to get rid of some APIs that were just convenient but had not been stabilized yet, and then it was just making sure that we never added more. So Hubris in particular is very, very close, and I think once asm_const lands, the last two or three flags are things that we could get rid of, so there's a desire to do that. I'm not 100% sure off the top of my head if phbl is equally as close, but we do stay on stable Rust and only go to nightly if we absolutely have to. But it is definitely true, and that's a good point, that sometimes when you're doing embedded things, some of that stuff is... you know, the day that inline assembly was stabilized was such a great day for the embedded Rust ecosystem. So we're almost there.

Thank you. Anyone else have a
question? Got a couple. I'm going to go from the back to the front, if that's okay. So I've got here, here, and I'll go here.

I had one, yeah. Having a whole ecosystem in Rust in your company, I guess you host your crates yourself, like some version of crates.io or some open source project. What do you use for this?

crates.io, by default. And for the stuff that we don't, we do have some git dependencies. Git dependencies in Cargo are kind of a pain; they're not super ideal. But again, because everything is open source, there's not a reason to not just put it on crates.io by default. Although I will say that occasionally we'll start projects that are closed source and then eventually open them, and so sometimes things have lived on as git dependencies a little longer than they had to. But in general, yeah, we haven't investigated running our own crates.io or doing anything like that.

Okay, I had a question here. Florian, I think.

You mentioned that it is very important for you to know exactly what code you're executing, but you also have a lot of dependencies, including external dependencies. How do you deal with, for example, supply chain attacks?

Yeah, so we do run cargo audit in some places and rely on that. It's such a politician's answer, but what I meant by that was that the code we intended to run is the code that runs. It is true that if we build a thing and we depend on a dependency, or there's a dependency of a dependency, that's code that we didn't write. But the situation here is more about validating that the firmware is the firmware that we gave you, and that something didn't get changed there, rather than a strong guarantee that nobody has snuck something in on a tiny dependency. I definitely think that if we ever start selling into more regulated industries, maybe this will become a bigger thing,
but our first customer was a US government agency, and they're a science lab, so they care a little less about that kind of stuff. As we start moving into things that are more regulated, I'm assuming we're going to have to be a little more intense about that, but for right now it's mostly the standard stuff everybody does: cargo audit and cross your fingers. But usually things are pretty good. So yeah, it's a good question.

Good, I had one here.

So you do a lot of bringing up the server from scratch, so I imagine there's a lot in the initialization process. One of the problems that I've seen with embedded is when you hit things that need to happen really early on in the initialization process, typically before Rust even gets initialized. Have you had to deal with that, and how have you gotten around it? Has it just been asm and things like that?

Yeah, well, when you're writing code like this, there's not really any "before Rust gets initialized," because you're not using the standard library, so there's no pre-code there. But it is true that system bring-up is hard, and Rust cannot solve that. Again, I'll mention Dan from phbl. A funny thing about throwing out things like UEFI is that when you get the next generation of processors, you have to write the bring-up code again. He specifically was working on bringing up the next generation of the stuff, and there was a week or two where he was like, "I don't know why this thing is returning zero when it shouldn't be." So Rust can't save you from those kinds of low-level, bit-banging system bring-up problems, but at least it can keep you from accidentally putting some of the wrong bits in the wrong place. It's true that bring-up is just fundamentally hard, and there's not a whole lot that programming language tools can do to make that a ton better. So yeah.

Okay, anyone else? There's one more here.

You had a lot of bangs
after productivity, so I wonder if you want to talk about that.

Thanks for letting me have two more minutes. So, I feel like Rust is an incredibly productive language, and it's hard to communicate that to new people without sounding like you're a zealot. Obviously I've been using Rust for like 12 years, and I'm not saying it takes literally 12 years, but I think that we're really bad at thinking about the total cost of ownership of code. When I think about the Ruby code that I wrote, it feels like I wrote it faster, but I'm not accounting for all the debugging time that's spent after it's done, fixing it later. With the Rust code, it takes longer to get to that initial state, but then I'm spending much, much less time on the back end dealing with problems. So if you think of it in a holistic way, I feel like Rust is actually a very productive language, but it requires you to get over that hump. People say you're fighting with the borrow checker all the time. I don't really fight with the borrow checker anymore; I make mistakes, the borrow checker corrects me, I fix it, and then I move on with my life. As we've gotten more and more experienced Rust programmers, not just me personally, I think there's a really interesting challenge in talking about this problem with people who have not used Rust or people who are just starting, because they feel like iteration times are so slow and it takes forever to get going with Rust. But if there's a library, I feel like I'm roughly as productive with Rust as I am with Ruby, if not more, when I think about the whole amount of the work that I'm doing. I haven't really figured out how to talk about that without purely sounding like a zealot. But yeah, I feel like Rust is actually very, very productive, even though many people don't.

I think if there's anywhere where you can sound like a
zealot, this is the place.

That's fair, yeah.

Are there any other questions? We have time for one more. If not, I have one. Steve, you mentioned that one of your first clients was a US government agency. Can you tell us a little bit more about who buys Oxide computers? Where are they being used, where are they being implemented?

So it took us like four years to get to the point of being able to sell one, and we sold the first one in November this past year, and that was to the Department of Energy. They're doing science-y stuff, not bombs-y stuff. The second customer I can't talk about, other than to say it's a financial institution. And then the third one: I was telling some people a couple of days ago that if you stalked our GitHub really closely, you might see people from this company commenting on our public GitHub issues, and so I can't say anything about it. But then the person tweeted "we're so excited to have Oxide," and Bryan retweeted it, so Shopify is the third customer. So there are lots of tech companies that want to run their own hardware. Governments, I think, are very natural, given that the kinds of people who are buying hardware and still running in a data center often care about things like privacy or data sovereignty. I know over here you all respect that a lot more than we do on my side of the ocean, so I'm expecting, once we're able to sell in Europe, that there are probably a lot of organizations. In my understanding, German hospital systems are required to keep medical records on site at the hospital, and France doesn't want secrets to get off French soil, and all those kinds of things. So there are a lot of things like that where I think people especially find the stuff that we're doing interesting. But also, sometimes it's just "my AWS bill is too high" and buying a server is cheaper in the long run, and that's where people like
Shopify, or other organizations that are not necessarily privacy or security focused, are interested. Yeah.

Thank you. I think that's all the time we have for today, so thank you so much, Steve. Thanks. [Music]