Thanks. So, title slide. All right, yeah, I'm Andrew Kelley. I am the president and lead software developer of the Zig Software Foundation, and thanks for coming to my talk. So, first thing... where am I pointing here? Pressing the go button... not working... no, we're experiencing technical difficulties... oh, okay, I'll press it hard and long. There we go. All right. Yeah, I just want to tell you a little bit about my backstory. I'm sure it's pretty familiar to a lot of you. I became interested in games at a very young age, but my parents gave me a rule that I could only be on the computer, or TV, or video games, for one hour a day. It didn't matter what it was, just one hour a day, and then I was kicked off. So, needless to say, I became very sneaky. My first game was Sonic the Hedgehog 2 for Sega Genesis, so this always has a nostalgic place in my heart. But my favorite games were the ones where you could make your own stuff. Does anyone remember Tony Hawk's Pro Skater 2? They had that level editor, right? This thing was so cool: you could make your own levels, you could put the start points in there, you could even make your own gaps, name them, give them points, and then you could play local multiplayer with your friends and play horse and tag and stuff. It was super cool. So when I discovered programming, it's like the ultimate game, right? Make your own game: that's ultimate power. So yeah, then I got into Visual Basic 6. I still love this so much, right? You open up the program and, just like any other contemporary program, it gives you a new document, Form1, just like how it would look. It's just begging you to drag a button on there and make it do something. So as I progressed, I learned other languages: Perl, Python, C++, C. And yeah, I learned them in the wrong order, just like everybody does. And then, every time I looked at my own code, even from just a week ago, I
would always think it looked like garbage. I'd be like: yeah, that code I wrote a week ago? It's trash. I've learned so much in one week, right? That was my experience for like ten years. Every time I looked at my old code, I thought: I know how I can do it better. But then it stopped. After about ten years of experience, I'd write code, then go back and look at my old code and think: yeah, fair enough. If I did it now I'd do it pretty much the same way; I don't really see any improvements I can make. And part of that whole process was learning that, you know, the object-oriented programming they teach us isn't quite right, and you learn your own way to go. But I still plateaued. At some point I hit this plateau. But then something happened where it started happening again, and I'm in this phase right now where I'm looking at code I wrote four months ago and I'm seeing mistakes again. I got past the plateau, and for me that was learning about data-oriented programming. To get here, I watched a bunch of talks on YouTube; that's a pretty famous one. In fact, let me just take a little moment here: I think I'm going to do a talk on data-oriented design. I think I'd better. [Applause] Yeah, that feels right. That feels right. Okay, anyway. So yeah, in order to get here, I watched some talks, I attended Handmade Seattle three times, talked to a bunch of people, and then I read the Data-Oriented Design book by Richard Fabian. But it still took me a long time to get it; it just didn't click for a long time. So my goal for this talk is: if you're where I was for this ten-year period of my life, I want to help you get over it with me. If you have the same learning model as me, maybe I can help you fast-forward, right? Okay, so let's do this. ~ Introduction Segment Removed ~ What's a
computer, right? The big picture. If I run `lscpu` on this laptop that I'm presenting with... oh wait a minute, we're doing it totally differently than I thought. Just kidding. If we run `lscpu` on my main laptop, it tells us a little bit about it: 8 cores with hyper-threading, and I've got this amount of L1 cache, L2 cache, L3 cache. What does that mean? So here it is in kind of graph form. This is, at a high level, what the hell a computer is, from the perspective of memory. And if you notice, L1 has a very small amount of memory, L2 has a little bit more, L3 a little bit more, and then main memory is huge. So there are trade-offs here: L1 is real fast, L2 is a little slower, L3 is a little slower, and main memory is a lot slower. Just try to keep that picture in your head when I go on to this next slide. So here we have, yeah, thanks to ithare.com for this chart. Later, just go look up this chart, because it's super interesting and I'm only going to highlight a couple of points here. What I want to point out is where those L1, L2, and L3 are on this chart. If you look over... sorry, wrong button... this one. If you look over here, every time we go down, it's an order of magnitude, right? L1: really fast. L2: order of magnitude. L3: order of magnitude. Main RAM: order of magnitude. And the other interesting point is that there are so many things a CPU can do that are way faster than reading from main memory. In particular, notice that the fastest thing you can do is math. Math is faster than reading from memory. So you can draw a pretty interesting conclusion from this, which is: should you memoize the results of doing math? Maybe not. You might actually be slowing yourself down by not repeating the math. And that includes multiplication; multiplication, right there, is faster than even an L1 read. And furthermore, since
we're focusing on memory here, I want to highlight one more thing: a kernel call is one of the slowest possible things on here, and that's something that might happen if you call malloc, if you heap-allocate. And to tie back to the talk on the JSON parser, that correlates, right? Because they found out that in the first round it was malloc doing kernel calls that was taking the performance down to this order of magnitude. Okay, so what's the takeaway here? The CPU is fast, but main memory is slow. Okay, now we understand. But how do we apply that? What do we actually need to do? We need to do more stuff on the CPU, and we need to do less stuff with memory. We'll drill down into that more, but let me help you out with one more big-picture mental model. Whenever you do any memory access, you're always going to go through a cache line. Always. So the question is: are we going to have to evict a cache line in order to do the job? Let's say that I'm storing something to memory. Each cache line is typically about 64 bytes; it could be 32, could be 128, but it's usually 64, and that's the granularity that you have. So if you access something from one part of a 64-byte chunk, and then you access something from the same 64-byte chunk, that's good: you're not going to evict a cache line. But if you access something really far away, you might require another cache line to get involved, and you increase the chances that you evict one. And the whole point here is: just don't have cache misses. That's the whole point, right? So now, you know, my talk title is applying data-oriented design in practice. I want to help you do it. We all know we want stuff to go faster, we want high-quality code; what do we actually do? So here's one strategy, among many, that you can use to
apply data-oriented design. Okay: think of your pet project, the thing you're working on. Think about where you have a lot of objects in memory of the same kind, like a struct. Think about the struct that you have the most of in memory, and think about how you can make it smaller. That's the whole trick. That's all we're going to talk about today. So, part two: I'm going to do a short lecture on how structs are laid out in memory, and this part's interactive, so I want people to shout out. Don't feel like you have to be a smarty-pants know-it-all; don't worry, just go for it. Every type in programming, in C, Zig, Nim, whatever, has a natural alignment and a size. Let's just go with some examples; we do example-based learning here. A u32, a 32-bit integer: what is the natural alignment? Four, good. And the size? Four. Okay, good, everyone's on the same page. Let's go a little different. A bool: natural alignment is? And size is? All right, I had some different answers. The alignment is actually one. You can just load a bool even if it's not aligned; you can just load one byte, no problem. One alignment, one size. All right, next challenge: a union with a u32 and a bool. Natural alignment is? Four, I'm hearing fours. And size is? All right, I'm hearing all correct answers, yeah: four. Okay, now instead of a union, let's go struct. Natural alignment is? Oh, I'm hearing some different answers. Size is? I'm hearing eight. So the alignment is the alignment of the biggest field, which is the u32, and for the size, here's how it works: you take each field and add its size. We go from zero to four, okay, and we put the bool right there, so now we're at five. But now we're at the end, so we have to round up to the struct's alignment; we have to add three more bytes, boom boom boom, to get back to a multiple of four. So the answer ends up being eight. Okay, all right, let's keep going. Now we have a struct with u32, u64, u32. What's the alignment? Eight, right. And the size? Yeah, you had to do
math on that one: 24, right. Because the algorithm is: we start at zero, we go four, but then we have to be aligned, so we have to pad up to eight. Then we put the 8-byte one in there and we're at 16, and then we put the 4-byte one in there and we get to 20. But now we're not aligned yet; we have to go four more to get back to 8-byte alignment, so now we're at 24. So, just by moving these two fields around, putting the two u32s in a row: now what are we looking at? What's the alignment? Same alignment. And the size? Yeah, we went back down to 16. Now, maybe your language is going to do this for you; some languages do. But that's not my point. My point is: on the computer, this is how it's going to work. If the language does it for you, great, but this is what the language is doing for you. Okay, I have one more. Let's put another bool on there. Now where are we at? Alignment's the same, right? Eight. How about the size? Oh, I heard different numbers. Okay, we're back up to 24. So this is wild, right? Because a bool, in information theory, is one bit of information, but we went from 16 to 24. We paid the cost of 64 bits just to add one bit to the struct. And I remember, from Mike Acton's talk on data-oriented design, he was so mad about this boolean that was in a struct in, like, Ogre 3D or something. I remember watching this and thinking: yeah, he was so mad, and he did this script that printed out how many bits were being wasted, or something, and I was like: okay, okay, I get it, but where the hell am I supposed to put that state? Because I need to know if it's true or false. What am I supposed to do? So I'm going to try to answer that question. And so here we go into the next section. Here are some strategies that you can use in practice to take a struct and represent the same information, so that according to information theory you're not losing any bits, but you're reducing the memory footprint of your struct.
So, we'll get back to the bool one in a minute, but here's another example: a struct with some pointers. If we're running on a 64-bit CPU, we're looking at 32 bytes. If we're running on a 32-bit CPU, we're actually only looking at 16 bytes. So, interesting observation: even though 64-bit computers can have more RAM, by default they also use more RAM, because each pointer is double the size. But there's something you can do: just use integers. If we don't heap-allocate these objects, if we have them in an array list, for example, and we use indexes instead of pointers to refer to them, we've halved the size of the struct on 64-bit CPUs. So: use indexes instead of pointers. That's a trick you can use to get a smaller memory footprint. Now, sorry, I want to make one more point. We also reduced the alignment: the alignment went from eight to four in this case. So if you wanted to add just a 32-bit integer as a fifth field, you're not wasting anything, because if we tried to add a u32 onto the pointer struct we'd be paying for four bytes of padding; here, with 4-byte alignment, we're not going to pay for those four bytes of padding. So that's another benefit. But I have a warning, which is: watch out for type safety. In the pointer example, you're going to get compile errors if you use the wrong type of thing: you pass an A where you should have passed a B, compile error. If they're all integers with no types, you've made debugging harder, and you've made some possibly interesting bugs possible. So you have to watch out for that. Maybe your language has distinct integer types; that will probably help. Zig doesn't have them; maybe we should add them. I think Odin has them, so there you go, use Odin for this use case. I also would say: write this down, go search the internet for "Handles Are the Better Pointers".
There's this blog post, "Handles Are the Better Pointers", by floooh, Andre Weissflog. It's really good. So I'm just telling you: here's something you can do. I'm not telling you the details of how to use integers instead of pointers; this blog post will tell you how. It's the how, instead of the what. Okay, so there's that one. Our original size was 32 bytes, and we got down to 16 with this example. What else? Okay, here's an example where we're going back to that boolean. We have a pointer, we have two 32-bit ints, and we have that bool. If we look at the size of this, we're looking at 24 bytes. This is the same story: 24 bytes on 64-bit CPUs, 16 bytes on 32-bit CPUs, lots of wasted bits. I did a little bit of math here so you can get a feel for it: if you had 100 of these, that'd be 2.4 kilobytes. So this is where I was thinking: what can we do? I need to know if the monster's alive. I can't just delete that field, because I have to know what the answer is. So the trick is: you can store that information somewhere else. By having two arrays instead of one, you have all the alive monsters in one array and all the dead monsters in another array. Now that boolean, according to information theory, is still there: it's which array the monster is in. But according to your memory footprint, it's gone. So we went from 24 bytes on 64-bit CPUs to 16. And if we do the other trick that I just mentioned and go to indexes, we're down to 12. So here we go, this is another trick: we went in half. We were at 24, now we're down to 12. Instead of taking up 2.4 kilobytes for 100 monsters, we're taking up 1.2 kilobytes for 100 monsters.
There's another benefit to this, too: if you had a loop where the first thing you do is say "if the monster is dead, skip them", which you can imagine doing in your game, now you're just looping over only the alive monsters. Before, in memory, you were touching that flag: you were doing a memory load of `alive` to see if they were alive, and that could evict a cache line. But if we only look at the alive monsters, we're not paying for that check. There's no branch, there's no load, and that's going to result in fewer cache misses. So: store booleans out of band. That's the answer to the question, the part that I did not understand when I first watched Mike Acton's talk. Okay, here's another one. I'll give you a second to study the struct layout here. We're going to stick with the monster, but I'm just going to keep swapping out the fields for different ideas. So again we have a pointer, and now we have an enum that tells what kind of monster it is. We've got five monsters here, or five animals here, and yes, humans are monsters. So in this example I did 10,000 monsters, and I did a little calculation: 160 kilobytes for all this stuff. So let's look at how this is laid out in memory. In memory, this is called array of structs: at element zero we have all the fields of the struct, and then before we get to element one, we need element one to be aligned, so we had to insert seven bytes of padding to get back to 8-byte alignment, so that element one could be properly aligned. And then same thing for element two: at the end, we had to have more padding so that element two could be properly aligned. So every single element is paying seven bytes of padding. But there's a really simple trick: instead of having one array where the struct is the element, you just have multiple arrays, each with one field as the element. This is called struct of arrays. And if your programming language lets you use a MultiArrayList, for example, that's a five-character fix:
same API. And now the elements are right after each other: we have pointer, pointer, pointer, pointer; those are all 8-byte aligned, no padding needed. Then we have enum, enum, enum, enum; those are all one byte, and remember, one-byte things don't need any alignment. So we just eliminated the padding by putting them in different arrays. If we go back to our example, this is the one-line fix: we just use a different data structure, and we went from 160 kilobytes for 10,000 monsters to 91 kilobytes. That's no joke; that's a big savings just from eliminating the padding. So: eliminate padding with struct of arrays. That's a trick. What else can we do? All right, here's another one. In this case we've got a monster, we've got some HP, some x-y coordinates, and we have an array of them. Now we have this thing where each monster can hold, like, four things, and maybe we use zero for empty, or something like that. So I ran the calculation: if there were 10,000 monsters, this would be 366 kilobytes, including the array capacity overhead, the part where you over-allocate, all that stuff. It'll become clear in a moment why we're now including that in the calculation. Now let's make a hypothetical observation. Let's say that, for our use case, we just happen to notice that 90% of monsters are not holding anything; all of their items-held slots are empty. If we make this observation, we can exploit it. So now we take out that field, and we just store it out of band, kind of the same thing we did with the boolean, but now we're using a table. As long as we can use the monster's index as the key of the hash map, we can just put that data as the value, and now it's sparse. And if we do the math: I did a whole bunch of scripts to calculate these numbers, and there'll be a gist link in the slides at the end, so you can check my work if you want. Just go download it later; you can play with it.
But anyway, I did the math with 10,000 monsters: we went from 366 to 198 kilobytes, including the overhead from these data structures. So the savings are there, and that's with only 10% of monsters holding something. I also want to point out that we could have done different things. Maybe most monsters are only holding one thing; then you would put that one item in the struct, and you'd have the special case for when they're holding, like, four. You can really choose to store the data according to what makes sense statistically for your observed data. Okay, so: store sparse data in hash maps. That's that one. I've got one more for you, and this is kind of the cool thing I'm bringing to the table here, so please take the time to understand this a little bit, because I think this is one of the cool things that I've done. I won't make you do too much hard stuff, but this one's interesting. So I'm going to give you 15 seconds to shout out: how many bytes is this struct? What have you got? 28? Any other guesses? I'll tell you, that's not it; you might be forgetting that the union is tagged. I heard 32? Okay, that's right. The Human struct is 20, the Bee struct is one, and there's a tagged union there, so there's a tag, which is just an enum, and then there's the payload: 32 bytes. Now, this is using a strategy where we use the same size for every monster. We could put these all in an array, and each element would take up exactly 32 bytes. That has convenient properties. But we are paying: if we have a lot of bees, we're paying for all those humans, even for the humans that are not bees. It's a funny sentence, but it's true. Okay, so let's take a different strategy; let's try organizing our data a different way.
Here I've actually taken an object-oriented approach, or I guess you could just call it polymorphism; to me, those two things mean kind of the same thing. We've reorganized the data so that there's a base struct (that's tag, x, and y), and then each specialization of it adds stuff. The Bee adds color, and the Human adds hat, shoes, shirt, pants, and has_braces. This is actually an improvement: if we do this, we're looking at 24 bytes, down from 32. We have used less memory, in terms of how much memory a set of objects takes; we have reduced the memory footprint through an object-oriented approach. But can we do better? Take special note, right now, of all the state that we need to represent, because on the next slide it's less obvious how all the state is represented, but that is the point. Maybe focus on has_braces, for example, and maybe the color of the bee: yellow, black, red. So here we go. This is what I call the encoding approach. In this example there is a tag, and there is some common data, so we get back to the property that there's a common amount of data that takes the same size for every monster, and then there's sparse data stored externally. We're kind of putting all of the lessons together for this one. You can see that for bees there's no longer a color enum; the color for bees is now extracted into the encoding tags. Likewise, we have four encodings for humans: we've chosen to represent naked humans differently from clothed humans, and we've chosen to represent humans with braces differently from humans without braces. So has_braces has disappeared into the encoding tag, and the bee color has disappeared into the encoding tag. You can see how the out-of-band theme is coming back. And here's how this works for humans. Let's look at human_clothed: how do we know all the information for a clothed human? In this case, the monster is in a struct of
arrays, a MultiArrayList, so we're not paying for padding between the tag and the common struct. For a clothed human, we're going to repurpose the... oh, I don't need a pointer, I did it in the slide, there we go. The extra_index field can be repurposed depending on the tag. So if the tag is human_braces_clothed or human_clothed, we know we have to look at the extra_index field and then go look up the clothes struct inside this extra array. So that's how this works. And then let's figure out the size. If you want to encode a bee, a bee is 13 bytes, because you need the tag and the common struct, but that's it. We could have put the color in extra_index, but we didn't even use it; we just put the color in the tag. So we even have spare room for extra data about the bee. This is not a fully loaded, fully efficient data structure; this is not optimized for memory footprint, this is optimized for fitting on the slide, let's be honest. So here's what this comes out to. Now, in order to give you a number, I have to add more assumptions, because we made how much size things take depend on the actual data that you have in practice. We have to make some assumptions in order to report how we did. Which is good, because what we're learning is that by taking in statistical information about your actual data, you can create encodings that work better for you. So here we have to assume an equal distribution of humans and bees, and we have to assume some kind of distribution of naked and clothed humans, and I feel like a realistic distribution of naked and clothed humans is half and half; I feel like that matches the world. So let's do that. That comes out to 17 bytes per monster. We went down from 32 to 17, almost broke it in half again, by switching to an encoding approach. And yeah, I think I made this point already, but: choose encodings according to your actual distributions.
If half-and-half naked and clothed is not what you have, pick something else. Maybe most humans are just walking around with a hat and nothing else, in which case you would want a human_with_hat encoding, and then you could just put the hat in extra_index, and then you've got 13 bytes for humans with hats. Done. Okay, so that's the encoding approach, and we're going to come back to that in the next sections of the talk. So those are the five strategies that I want to share with you, and that's it, that's the take-home knowledge. But I do have a case study. I did these things in the Zig compiler, and I just want to report my findings, so you know this is not just me at home going "oh, maybe I can move the data around": I tried it for real, in a real project, and I will share with you the results. I will say, this guy is pretty cute, but since we're talking about performance, we've got to go fast. So this is the pipeline of the Zig compiler. It's pretty standard. On the very left-hand side we have source code, which is what you type into the editor; on the right-hand side we have machine code, which is what the CPU understands. This stuff is logic, so these are implementations of doing stuff, and this stuff is data: source code is data, machine code is data. But we can't control what the source code format is, or what the machine code format is. Machine code is specified by the hardware; the source code is specified by the language specification. But everything else is compiler implementation details: we can pick whatever data we want, whatever data layout we want, for all of this. So all those principles that I just talked about, we can apply them here. That's the core pipeline of the compiler, so if this stuff actually works, we would expect to see good results. I also just want to take advantage of the fact that I'm
bothering to document this stuff, just to explain that everything to the left of this line is done on a per-file basis. It doesn't matter what flags you pass, doesn't matter anything: the same source code results in the same ZIR data. And on the right-hand side of that, we start to have to get more involved in the compiler stuff: it's per-function, and when we get into code generation, we have to do something different for ARM versus x86; you get the idea. I also want to foreshadow that this part here is embarrassingly parallel; we'll come back to that point. And now let's hone in on the tokens. Let's go look at how we can apply these memory-footprint-reduction strategies to a real-world project, where our goal is to tokenize source code and output a list of tokens that can then be used for parsing, for a compiler. So I'm going to throw a struct at you here. We're not doing exercises now, so don't feel bad if you don't get it exactly; I'm not going to ask you to calculate these. This is more high-level at this point; I'm just kind of reporting. So this is how it used to be. You might recognize this: this is the tagged-union approach, where everything's the same size. You can see there's an enum, there's some padding. This is me, right now, looking at my old code and being like: oh, now I see the mistakes again. So, like: there's the enum; is that a boolean there? Probably. Why are we using 64-bit sizes? Why are we storing the line and column? We can just calculate those. I'm almost delighted with how bad my old code looks to me again, finally. So yeah, to say it more succinctly: we can compute line and column lazily, so get them out of there. Another point: let's make a reasonable limitation, that we only want to be able to compile 4-gigabyte source files or smaller. If we make this limitation, we
can make a better, faster compiler. So let's use smaller integers to store those positions. Another point: do we need to store the end position of a token in a compiler? We can just re-tokenize it; we can do a little bit of math again and avoid storing some memory. Also, in a compiler, lots of tokens are things like an asterisk, a slash, or the `and` keyword. We don't even need to re-tokenize those: we just know the end position based on the fact that it's the `and` keyword. It's obviously three bytes. Finally: why are we parsing integer literals and string literals during tokenizing? We can just do that later. It'll be the same cost, but now we can make our tokens way smaller by not storing all this extra data. So here are the results: I went through all my own strategies, took my own lessons, and we went down from 64 bytes per token to five. It's just: where is it in the file, and what is it. That is what a Zig token is now. So yeah, it's much smaller, and we have a lot of tokens: every identifier in your source code, every asterisk, every slash, that's a token. So every single one of those takes up five bytes instead of 64 when the compiler is doing its work. So that's that one. We're going to look at the AST next, and then I'll show you a benchmark, and then we'll go to the next one. Okay, I'm going to show you another struct again. This is how it used to be; actually, this is how it still is in the compiler that we ship today, so this has not been fixed yet, so to speak. This is again the tagged-union strategy, for the AST nodes. You can see I've got a boolean in there, you can see I've got an enum in there. What are the line and column doing there? They just keep coming back, like I was so paranoid I was going to forget what the line and column of stuff was. But you can just calculate it lazily.
So yeah, once again: stop memoizing stuff I can calculate. Also, I was storing links to which tokens each AST node points to, and all these things; it's actually a tree structure, so you only need to store one. That's less of a data-oriented-design thing and more of me realizing, again, that I could calculate something rather than memoizing it. That's what this bullet point is, again. And then, okay, the encoding strategy comes back now. So here we go, I'll put up the new one. I wrote a little hacky script that ran the whole standard library tests for both cases and just counted up how many of each node there were, divided by the total, so these numbers are actually pretty accurate: this went from 120 bytes to 15.6 bytes on average. And it's an average because the nodes now take up a different amount of size for each kind. So again, this is not exactly what it is; this is optimized to fit on a slide. But it is the encoding strategy. You can see that there's a main token. All of these are stored with struct of arrays, so we're not paying for padding or anything like that. There's a left-hand side and a right-hand side that are repurposed depending on the tag, and the point is that we have multiple encodings: a variable declaration is encoded in three different ways, an `if` is encoded in two different ways, a `while` is encoded in two different ways, and there's an arbitrary number of these encodings, depending on what, statistically, we found needed to be represented in an efficient way. Okay, so I showed you these two, and I actually did collect some performance stats for this. Did it help? Yes. After I did just these two changes, and didn't change anything else yet, I measured the difference in just parsing a whole bunch of files. These are the stats that I took.
wall-clock time is the big one, right? If you get 22% faster wall-clock time, that's something I'm quite happy with. So that's good, but we're not done yet; there are still three more stages in the compiler pipeline. I will talk about this one. I'm low on time, so I won't talk about this one or this one; suffice to say the same strategies work. Let's go look at the ZIR next, the next phase of the pipeline. This one takes in a syntax tree and outputs an IR that has no type analysis done yet. So here's what we had before. It's basically polymorphism, object-oriented programming, that whole thing: each node has a different size depending on which one it is, pointers to refer to each other, and one canonical tag. It's not the encoding strategy; there's one canonical way to represent everything, which is nice from a maintainability perspective but not as ideal for memory footprint. So here are some observations. Not every instruction needs a source location; some can be inferred from some other context. Source locations can be u32 indexes into the tokens array or the AST nodes, so that drops the size of that field in half. Also, before, it was some canonical source-location thing; now that we're doing encodings, the encoding tells you the meaning of the source location. The encoding can say that in this case the source location is a token index, and in this case it's a node index. And then there's the pointer-to-index thing again, so references to other instructions are now half as big. And finally, references can encode simple values. If you're just talking about, say, literally the value true or false, we don't need to create a whole instruction
to represent that; the reference to true or the reference to false in the IR format can just have an enum tag dedicated to it. I'm going to gloss over that because I don't want to run out of time. Anyway, suffice to say, before, it was an average of 54.0 bytes each. Oh, and sorry, one more observation: we're going to use the encoding strategy, obviously; that's what I keep saying. So we go from the polymorphism strategy for representing this in memory to the encoding strategy, from 54 bytes on average down to 20.3, again on average. Download the code at the end and check my work if you want; I wrote some scripts that ran against the whole standard library to figure out these averages. So this is the encoding strategy. I had to simplify it a lot to fit it on the slide, but it's the same deal: you encode multiple different ways to express everything. Here we can see strong and weak; that's a boolean flag that's been folded into the encoding tag. What else can I point out here? I don't think I need to point anything else out. It's the encoding strategy, the same thing I gave you three other examples of. We went from 54.0 to 20.3. What I do want to point out, though, is that with each of these improvements we cut the size of each object in half or smaller through these techniques; the amount of memory this thing uses is drastically reduced. So, did it work? Does using less memory result in fewer cache misses? Well, empirically, yeah. This one gave me a 39% reduction in wall-clock time, which is ridiculous, and that's on top of the number I just reported for doing it for the other two. I guess I had more of these objects in memory than the other ones. So it works: in a real-world project, if you do the tricks, you will get
faster code. Now, I know I foreshadowed this embarrassingly parallel thing here, so I had one more thing to report. I went ahead and implemented a thread pool; or sorry, King implemented a thread pool, which I just copied from him, thank you for that. And then I made it distribute the work of doing this embarrassingly parallel part on the thread pool. This is on my Dell Inspiron; it's a pretty beefy laptop, a good laptop. When I did this, just for this part of the pipeline, it came out to 8.9 million lines of code per second, so that's pretty fast. And as a bonus, the ZIR, the output of this part of the pipeline, is now only four arrays: a tag array, a common-data array, an extra array, and a string table. It's arrays; that's it. It's stupidly easy to save and load that from disk. There's a syscall on POSIX called writev (and its counterpart readv) where you just give arrays to the OS and it does the whole thing at once; it's one syscall to save these to the cache or to load them from the cache. So the number I gave you here includes saving the data into the cache; we're interacting with the file system to avoid this work next time. Okay, caveat though: I'm only talking about this orange part, and these other parts of the compiler pipeline are no joke. I know; is that Walter I see there? Yeah, I know you're thinking that right now. So I do want to highlight these areas over here, because that number I gave you is going to go down a lot. If I can keep it above a million, I would love that, but obviously if these parts of the pipeline are harder, that number is going to go down. That number is just for the orange part; I want to be perfectly clear. But let me just zoom in and tell you where we're at and what's going on here. I made this slide so that you can
kind of see the idea. The part I gave you that stat for is done, but we're still working on the back end. We're at about 35% of the behavior tests passing for the self-hosted compiler. Every time I've been reporting on these improvements, I'm talking about improvements that have been made in the self-hosted compiler, which is not what you get if you download Zig today; if you download Zig today, you get the slow one. So if you are interested in this and you haven't tried it yet, I actually kind of recommend waiting: maybe wait till these numbers go up to, I don't know, 90 or so, wait till we have a release where we ship this new code as the default, and then give it a try. I just wanted to be very frank with the progress report here, and I think, yeah, let's move on from here. So let's summarize. This is what a computer looks like from a memory perspective: add the CPU cache to your mental model of computers, and remember that the CPU is fast and memory is slow. Identify where you have a lot of the same thing in memory, and make the size of each thing smaller. There's a bunch of handy tricks you can use to reduce the size of things: use indexes instead of pointers (watch your type safety), store booleans out of band, eliminate padding with struct-of-arrays, store sparse data in hash maps, and use encodings instead of polymorphism. And as you can see with the Zig compiler, it's worth it; these tricks have actual, real benefits. So that's it, that's my talk. Thanks for listening.