[Music] [Applause] uh oh great you can hear me um all right let's go ahead and get started so uh good afternoon everyone thanks for being here my name's Christopher domis I wanted to talk to you a little bit about a project I've been working on recently uh that I call the mafas skater but before we get into any technical details I wanted to give you a little bit of background information on this project so this whole thing started when I was bored one weekend and I was browsing a site called re maath so if you haven't been to this GitHub place um it's absolutely fascinating just tons and tons of awesome articles I love reading things here so check that out as soon as you can if you haven't seen it already so I was browsing through some things on re maath looking for something uh interesting and I found something that caught my eye an article by a guy named Steven Dolan so Dolan's article Ed out with this humorous observation about the x86 architecture Dolan says it is well known that the x86 instruction set is Baroque over complicated and redundantly redundant we showed just how much fluff it has by demonstrating that it remains tur incomplete when reduced to just one instruction so that really caught my attention right away I thought I've been using x86 for for a decade and I didn't know one of these instructions in this architecture was T and complete by itself what what instruction is this Dolan what magical powerful instruction is so awesome that it is touring complete on its own it turns out it's the move instruction the simplest instruction in the entire architecture moving data from one place to another is apparently touring complete so what does that mean if if move is Turing complete well sort of at a high level it means that any code that we write uh could be written as a set of moves instead and absolutely nothing else just a bunch of moves and and that just boggled my mind at first I thought really I mean when you think of all the incredibly complicated things we do with our code like arithmetic and comparisons and jumps and function calls and exception handling to then tell me that move can do all of these things on its own um that seems impossible but if move is turning complete then then that's the way it is so the more I thought about that the cooler it seemed suddenly I kind of realized that' be pretty hard to reverse engineer wouldn't it because we'd go from something like this where we've got all these nice instructions that tell us exactly what this code is doing and just a matter of seconds I can look at this and see that we're initializing uh a variable to zero then we're pushing a string out of the stack and printing it out we're removing an argument from the stack we're incrementing our local variable by one and comparing it to 100 and if it's not 100 yet we're repeating 100 times this whole thing in a loop so just by looking at the pneumonics in this in a couple of seconds I can reverse engineer it and understand exactly what this code is doing but if move was turned complete if we rewrote this whole thing as a series of moves we'd go from from this where we've got all these nice Clues into into this just a whole bunch of incomprehensible data transfers from one location to another we would lose all the clues that we use for reverse engineering software and that seemed like it would be devastating for reverse engineering efforts so I thought that was really cool and I wanted to pursue it more but in order to use this idea we need to understand a little bit more about what a turing machine is so if you're not familiar with this this concept the turing machine sort of at a at a high academic level is really just this five tupal where you've got this finite set of States q a distinguished starting state in Q a finite set of symbols Sigma and one of those is a blank signal and then you've got this transition table Delta that Maps um the cross product of Q and sigma to the cross products of Q Sigma on either left or right what that really gives you is the state transition diagram a way of moving from one state uh to another and that's exactly how Dolan approached this problem in his article he uh designed his own little turing machine and showed how you could move from one state to another using only the move instruction and through his abstract turing machine by showing that it could work with just move he proved that move is Turing complete so that's really really cool but that's really purely academic because as soon as you try to actually apply this concept what you're going to find is that this idea of this abstract State transition diagram just doesn't really map well to the processors that we use every day meaning that the ideas from this article although they're really fascinating um just aren't that useful in practice and Dolan acknowledges this he sort of um concludes this article with a tongue and cheek observation he says removing all but the move instruction from future iterations of the x86 architecture would have many advantages the instruction format would be greatly simplified the expensive decode unit would become much cheaper and silicon currently used for complex functional units could be repurposed as even more cache as long as someone else implements the compiler so I was absolutely fascinated by this article and really amused with his approach so I read that last piece and I thought you know what I'll take you on Dolan I can I can implement the compiler for you challenge accepted so I set out to implement Dolan's move only compiler but it was a little bit daunting to start on this thing uh because like I said his ideas are so far removed from any practical application that they're a little bit hard to use but there were a few key ideas from his article that we can pretty quickly adapt his first key idea was that move can check for equality we don't need a comparison instruction because move can do it for us and it's just this simple for example if we have two variables X and Y and we want to see if these variables are equal to each other all we have to do is move zero into the memory pointed to by X move one into the memory pointed to by Y and then read back from the memory pointed to by X so imagine X and Y are both three both of these are equal to each other then this is move zero into memory address 3 move one into memory address 3 overwriting the zero and then read back for memory address 3 the last thing we wrote to memory address 3 was a one R becomes one indicating that the these two things are equal to each other but now suppose they're not equal suppose X was 2 and Y was three this is move zero into memory address 2 move one into memory address 3 and read back from memory address two well the last thing we wrote to two was zero so R becomes zero indicating that these things are not equal to each other so with just three move instructions we can actually check values um for equality another key idea from Dolan's paper is that there's really only one code path if all you've got is a bunch of moves and no branches everything executes all the time but Doan observed that if you design this thing correctly a block of move instructions can actually either have an effect on the system state or have no effect on the system State all depending on the initial state of the system so we'll show a little bit more of how we can use that later on Dolan's turning machine required a single jump instruction to loop back to the beginning of his program that sounds kind of like it's cheating if we're trying to prove that move is Turing complete we can't just start tossing in Jump instructions but I'll show later that this is kind of incidental we can actually get away from this jump in instruction but it's a useful starting point for our design um but what that would look like is we're going to have a whole bunch of move instructions in a row followed by at the very end a jump back to the beginning so somehow this whole thing's going to execute over and over and over again in a loop and it's going to magically do computation for us if we can make this work um another key idea from his paper his turing machine um required an invalid memory address for halting if you imagine we've just got these moves executing in a loop forever we need some way to stop the program so Dolan said assume my turing machine has some invalid address and when I try to access that address the program stops so that was another useful idea that we can adapt so we've got these these basic ideas of how to use move for some useful stuff uh but it's a little bit challenging to get started with this but I thought maybe if we can build on these primitive touring machine operations that Dolan came up with we could maybe adapt those for some higher level logic we could also change his turing machine his thing just works on abstract symbols if we want to actually apply this we need something that will work on real data so we're we'll adapt this for real data and we'll start adding new operations we can gradually build up um operations for if else arithmetic logic jumps Loops everything else our program needs to do we can add these piece by piece using move instructions and maybe if we do all those things we can actually get this a little closer to something we could use in practice so let's look at how we would actually build up some of these operations let's look at one of the most critical ones for programming if how do we build an if statement using only move instructions so we want to design something like this if X and Y equal to each other then assign 100 to the value X so pretty simple at this level but we've got a big catch if we want to just use move instructions namely we have no branches going back a slide if we don't have any branches I don't have any way to skip this x equals 100 If X and Y are not equal all the paths are going to execute all the time no matter what so the solution I came up with for this is that we can sort of force a path to operate on dummy data if we don't want its results and real data if we do want its results so that would look something like this we're we're going to have a real copy of our data this is our real System state sitting in this copy of the data and we're going to have a scratched copy of our data this is our fake System state that just gets discarded and never really used for anything important then we're going to have a selector basically a pointer that will let us select between the real System state and the fake System state and what we'll do is we'll evaluate the first part of this if statement we'll see if X and Y are equal to each other if they are then we're going to load our selector with a pointer to the real data um so that when we execute xal 100 that 100 is going to get into the real copy of the system State on the other hand if X and Y are not equal to each other we're going to load up our selector with a pointer to the fake data to the scratch System state our xal 100 is still going to execute but now that 100 is going to be stored to the scratch State instead of the real State effectively discarding the results so that's how we can sort of implement if statements um using just move instructions and see that's going to look a little bit something like this uh we've got an array of two pointers one pointer points to our dummy data one pointer points to the real data then we use our equality check to select either between the real pointer or the fake pointer we dereference that pointer and we assign 100 to whatever it points to so without any actual branches we've just effectively implemented this if statement uh using this pointer selection method so in assembly that would look something a little bit more like this we're going to have a real copy of both of our variables and a fake copy of both of our variables and then we're going to have these selection arrays that let us choose between the real copy um and the fake copy and the move instructions themselves to implement that if statement are going to look like this uh at first we're just going to do the equality check that we already talked about to determine if X and Y are equal to each other if so eax gets set to one thing if not it gets set to something else then we use eax as an index into our pointer selection table to select a pointer either to the real data or to the fake data then we de reference that pointer and store 100 to whatever it points to effectively storing 100 to x if x and y were equal storing to some scratch data if they weren't so um really we can Implement any if statement then just by adding these selector functions which are really just pointer arrays to all of our variables to let us switch back and forth between real and fake data now if you're paying close attention you might have noticed that there's a little bit of a problem with this equality check I showed we're sort of reading this variable X and then trying to assign zero into whatever um memory address that is you can't really write to any arbitrary memory address that's a good way to say faults so um we can get around this if we limit ourselves to by- siiz data and just create a scratch 256 element array for this equality check so I'm going to skip the details of that for brevity but if you're really curious as to how we get around the siga issue with these equality checks you can check out those slides later so we've got we've got a way to do if statements we're getting there um with a few simple extensions it's not too hard to imagine how to do if else statements or if else if else statements or inquality checks those are all pretty easy extensions to our basic if statement thing if we have to write all of this in just moves this is going to get really really tedious in a hurry so we build up some simple assembly macros um using nasm syntax macros uh so that we can have some high level functionality that's going to expand to move so this is a macro for checking two values for equality using only move instructions this is for checking two values for in quality um and this is a macro for setting up those selector arrays to check or select between real and fake data um so we're getting further uh but what we really want beyond just ifs is a way to do loops in brand es but we can extend that if El idea in order to make this work so here's how we can do arbitrary branches with only move instructions um essentially every time we have a conditional jump that we want to perform we're going to check should that Branch be taken if it should then store the address that we're trying to Branch to we can do that with a move instruction and then turn execution off and when I say turn execution off I mean simply switch all your pointer over to dummy data so that anything that executes will not have an effect on the real System state on the other hand if the branch is not taken simply leave execution on and work on real data then on each basic operation not every move instruction but each primitive operation you're trying to perform you're going to check is execution on if execution is on just run the operation on real data if execution is off you need to check should I turn execution on in other words is the current address the stored Branch Target if it is turn execution on and run the operation on real data if it's not turn execution off and run it on the dummy data so that's really really complicated and a lot to take in at first but it's pretty simple simple with an illustration so this is what our program looks like a whole bunch of moves followed by a jump back to the beginning these are just going to execute in a loop and let's say I wanted to Branch from this location in my program to this location in my program using only move instructions it's actually not too bad um at this location in the program we're going to store the Target that we're trying to Branch to I'm trying to Branch to address onec so we're going to store that somewhere at this location in the program now I'm going to turn execution off and by turning it off I mean we're going to switch all our pointers over to dummy data so that any memory rights that occur don't affect the real System state so I turn execution off by switching all those pointers over then I allow execution to continue so all these moves happen but they're all going to be affecting dummy data instead of real data each of these uh basic blocks is going to check if it's the branch Target or not if they're not the branch Target they just operate on the dummy data having no real effect so we just continue execution each block checking if it's the branch Target until finally we hit one c 1 C sees that it's the branch Target so it switches execution on in other words it switches all the pointers over to real data instead of fake data so that now any of the subsequent moves will actually have an effect on the system state so this effectively implements branching just by switching execution on and off whenever you want to take a branch so we're getting a little bit further but we've still got a long ways to go if you want to do anything useful we've got to be able to do computation we need to be able to do arithmetic arithmetic actually turns out to be surprisingly easy with move instructions especially given that we've already constrained ourselves to one byte values so we can actually do basic arithmetic with simple lookup tables um which will allow us to use move instructions for this so that would look something like this let's say we wanted to implement the increment function with move instructions only well we can make a lookup table for that so what this will allow us to do is if we want to figure out what is three incremented by one I go to element three and my lookup table so this is 0 1 2 3 so I find that element three or three increment by one is simply the value four so an assembly that looks something like this to build up our table and something like this as our actual increment instruction so really really simple one move instruction performs increment it simply uh goes to the table looks up an element in the table reads it out in order to increment that value so um increment couldn't be easier uh with move instructions same with decrement we can build up a decrement table so if I want to figure out what is 2 minus 1 I go to element two in my table this is 0 1 2 and I find out two decremented is one so same basic thing we can build up a decrement table with assembly macros and that's our decrement instruction as as a move instead of a deck so we're getting a little bit closer um we've got some really really basic arithmetic now we still need logic gates we need to be able to say things like and or not um stuff like that we can do that with lookup tables too so this is a little bit easier to visualize and see let's say we wanted to um figure out what is the logical and of one and zero uh without using an and instruction just using lookup tables uh well we can do that with this uh with this table I go to array one in the table element zero to figure out the logical and of one and zero is zero what about the logical or of zero or one I go to array zero in my or lookup table element one to figure out the logical or of those two values is one so we can pretty easily build up these tables these are the logical or tables as assembly macros logical and tables as assembly macros logical not um all expanding to simple move instructions we need a way to stop our program um if we're just executing these Loops or executing these moves in a continuous loop we've got to have some way to stop this whole thing right now our program Loops forever fortunately Dolan figured this out for us he said assume we've got some invalid memory address and when we access it the program stops so that should sound kind of familiar that's basically just null when we access null our program stops um It's s fults it's not the cleanest way to stop but it stops so we'll use null as a convenient way to just stop our program execution so all we really got to do is try to access um the null pointer in order to Halt execution obviously we don't want to access null and halt our program every single time we want to be able to conditionally do that so we can do that with lookup tables as well this is a way to um conditionally look up either a real pointer to real data or the null pointer to to zero and dreference that to conditionally halt the system so we're really getting there we've got all these building blocks now as as assembly macros that expand to move and structure ction so we've actually got some high level logic going on here we've got macros for equality inequality not and or we have ways to get real and fake data ways to increment decrement and turn execution on and off we've built all the macros we need for some high level logic um at this point and if you build up enough of these macros um programming this way actually becomes almost doable in assembly but if we want to be really useful I'd like to be able to go beyond programming with assembly macros so C compiler is a lofty goal here um so I want to start with something similar so I promised a compiler for this talk but I didn't actually say what we'd be compiling so we're going to talk or start off with a really really simple programming language um which some people call something else I'm going to call this brain yucky for the purpose of this talk so brain yuck is a minimalistic esoteric programming language it's really simple it's got eight basic instructions and only two registers an instruction pointer and a data pointer um so these are the eight brain yucky instructions we're interested in and brain you can increment or decrement your data pointer you can increment or decrement the bite pointed to by the data pointer you can read a bite in you can print a bite out you can Branch forward and you can Branch backwards so eight really simple instructions I'm adding one more to allow our programs to Halt so just a little bit of um introduction to brain yucky if you haven't seen it before this is what printing out 1 two 3 4 would look like in brain yucky all of your memory starts in with uh zeros and brain yucky we want to print out 1 two 3 four asky one is hex 31 so we need to get one of our zeros up to the value 31 in order to print this out so I increment our current data cell the cell point2 by the data pointer 31 times in order to get it up to asky one and then I print out that one I increment it again to get it up to 32 print out the two etc etc to print out 1 two 3 4 another common Paradigm in brain yucky is this little uh chunk here this is how we set data cells to zero in brain yucky what this says is check the current data cell if it's zero um then Branch over here if it's not zero then decrement it check it again if it's zero now continue if it's not zero Branch back so all this does is it sits in a loop decrementing the data cell one by one until it reaches zero so that's how we set a value to zero in brain yucky code so you can actually build up some real programs with this brain yucky tur in complete meaning we can write anything in brain yucky that we would in any other language so this is hello world and brain yucky they do some complicated Loops to get their data cells right up around the asky values that they're going to need then they start printing things out so this prints out the capital H uh shifts over a data cell decrements it down to a lowercase e prints out the e um then it starts incrementing the E F G H I J K L prints out two L's increments it n o um l m n o prints out the O etc etc to print out the whole hello world program we can get more complicated this is a Fibonacci number generator and brain yucky this is 99 bottles of beer on the wall and brain yucky you can really program anything in this um the thing is this is even worse than the moves ever worse so why on Earth would we try to add this to our to our design if it's this ridiculous um well with the building blocks that we've already built up the things we've already talked about it turns out it's actually really really easy to uh Implement these brain yucky operations with just the move instruction which means that if I can get the code that I want into brain yucky then I can easily convert it into only move operations and going into this I actually knew that there existed a pretty decent basic to brain yucky compiler already so by chaining together these compilers I could actually go from brain or from basic into only move instructions so let's look at how a brain yucky to move compiler would actually work it's going to start out by um simply using some moves to read in the current brain yucky instruction then we're going to figure out what that instruction is using our equality macro again this equality macro expanding to a bunch of moves we're going to see is the instruction output is it input um is it incrementing or decrementing the data pointer etc etc so we're going to figure out what that brain yucky instruction is so this is what the increment brain yucky instruction would look like as a set of moves first we're going to check if execution is on or not if execution is on and we're dealing with the increment instruction then select the real data pointer if neither of those or if either of those things are false go over to the scratch data for execution read in a value from the data cell into Al um into Al and then use our increment table to figure out what Al incremented by one is and then store it back into either the real data if these conditions were true or the fake data if these conditions were false so that's sort of what all our instructions are going to look like check if this is the right instruction load up your selector with either a real or a fake data pointer and then carry out the instruction decrement is pretty much the same increment and decrement the data pointer are really pretty basic um halting the system is pretty easy we already sort of saw how to do that by D referencing the null data pointer input and output get a little bit more complicated and brain yucky so we start out sort of the same way we always did check if this is the right instruction set up the system State accordingly um but you'll notice down here I've got something that's not a move instruction at the end of the day I need to ask the uh operating system to do this IO for me um to to actually um output something to the council or read a value in from the council so that kind of felt like cheating but it was an easy way to get started I'll actually show how we can fix that later on but for now we're using the in instruction to do this comma is reading data in these are the move instructions for reading data in branching gets a little bit more complicated but it's still not all that bad so all we're doing is checking are we on the branch instruction and is execution on and do we want to Branch if all those things are true then we turn execution off we switch over to those fake data pointers and continue execution branching backwards pretty much the same thing so we've got all these um brain yucky instructions and we've now got ways to translate those into only move instructions and collectively that's that's what I called the mafus skater a way to translate brain yucky into only uh move instruction so let's look at a a really basic um example of using macation so I've got this brain yucky program this is a rot 13 Cipher written in um brain yucky what the mcor lets us do is it lets us take that code and um compile it into only move instructions so I just gave the compiler the mafas skater our rot 13 brain yucky Cipher and I had it output a set of assembly for that Cipher so then we're going to use nasm to link this thing and we're going to you or nasm to assemble this thing and we'll use LD to link this thing so finally I've got a rot 13 program um written in only move instructions or mostly move instructions so let's see what this looks like um so there are our move instructions for our rot 133 Cipher you can tell it it got a little bigger than it than it might have been um by other techniques uh so it's mostly move instructions we've got that jump at the end that I talked about uh we've got those in 80s that I'll that I talked about so I'll fix those in a little bit but we can see now um it can actually do computation for us which is kind of cool it's just a whole lot of move instructions executing in a continuous loop doing computation I think that's that's actually kind of awesome that this actually works but there's something that really really bothered me about this even though that in 80 instruction and that jump instruction at the end don't really violate the Ching completeness of this and I don't think they violate the spirit of of the project um it really bothered me that I had two non-m move instructions in this Loop or uh in this uh in this program so I wrecked my brain for a while trying to come up with with how to fix that problem so that we could really only have move instructions um and eventually I came up with a way to do it it wasn't too hard to get rid of that inad instruction so before our program starts to execute um in our environment setup we can uh sort of get rid of this 80 with MMO so sort of in our start function before our real program begins we'll just set up the environment to um call mmap to map standard in and standard out into the processes memory space allowing us to access um the iio streams through move instructions so we can get rid of 80 that way that was pretty easy that jump at the end that was a little bit more problematic we need to somehow perform a jump using only move instructions and in a lot of architectures you can load the instruction pointer directly with a move instruction you can't do that in x86 uh so we've got to get a little bit more creative so the way I came up with um for this was to set my series of moves to be their own exception Handler then at the very end of the uh series of moves we're going to trigger a very specific exception so we're already using the Segal exception to Halt the program I need to find a different an exception so I decided to use the uh illegal um instruction exception Handler to uh to do this so in our start function before our program really begins executing we'll call us the Sig action um uh function in order to set up our program as its own exception Handler we need to set this no defer flag allowing our exception Handler to um recursively trigger exceptions so we'll call that as well to set that up um finally the key piece to all of this is replacing that jump instruction with an illegal move and when I say a legal move I don't mean um just a move that's going to fault I don't mean something like accessing the null pointer that's actually a memory access violation rather than an illegal move instruction as far as I I could think um I could only think of one illegal move instruction in all of x86 which is trying to load the code segment register with the move instruction they want you to load the code segment registers other ways there's uh if you try to do this this instruction does encode but it's their only invalid uh move instruction so replacing that jump with this invalid move instruction will cause this exception to trigger causing the program to jump back lost my mic uh to the beginning to uh handle its own exception finally since we're going to be calling this exception Handler recursively uh we need this to not overflow the stack so we'll reload the stack pointer um every time this this Loop executes so that pretty much gets us to uh our our new version of the MAF skater where we can use only move instructions but we've got a big problem if we have to write all of our code in brain yucky because that's way worse than just dealing with the assembler so that's where um this other compiler comes into play this basic to brain yucky compiler allows us to write our code in basic and then feed that into the moscat to come up with move only uh code so let's look at a demo of uh how that kind of thing uh might work so uh what I've got here if I can find it so I wrote this little uh little basic program here 99 bottles of beer on the wall not a whole lot to it just written in basic we initialize a variable to 99 we start a a loop we call our first set of lyrics call our second set of lyrics call the first set again print out take one down pass it around decrement our bottles of beer repeat the lyrics and continue until we reach no more bottles of beer not a whole lot to that lyrics One checks how many bottles of beer we have left it prints one thing if we have none it prints something else if we have some lyric two same thing prints one thing if there are bottles of beer left print something else if there aren't and finally there's this little done chunk at the end which we might get to if there's if there's time today um but what we can do is we can use this BF basic compiler to take our basic code and translate it into brain yucky code so now when I open up our newly created brain yucky file you can see our compiler just created this 15,000 instruction brain yucky program to implement 99 bottles of beer on the the wall so all we've got to do at this point is give that um to our mafas skater in order to build a move only version of uh of this uh of this program so the mcer just took that brain yucky code outputed assembly we use nasm to uh assemble that and LD to link the whole thing um together so now if we look at this we got our move only code time it really is only move instructions uh for the entire program we got rid of those pesky in80s um we got rid of that jump at the end uh I think this will finish dumping sometime before the end of my presentation but it's it's a sizable amount of move instructions let me see if I will give it a few more seconds all right so you can see [Applause] uh so you can see now at the end there we got rid of that jump instruction it's just an illegal move instruction now that's going to trigger an exception causing this whole thing to reexecute uh forever um in a loop so we can go ahead and run this thing and when we do that nothing will happen at first because it's really really really slow but it does work um it just takes a little while Let's uh give it a second we can find out how many bottles of beer are left after we take one down 98 so uh it seems impractically slow this seems totally unusable and it probably is totally unusable but I think it's really really cool um because it's not just it's not just a giant print F right it's not just printing all the lyrics at once it's actually doing um arithmetic and comparison and branches and function calls all with only the move instruction that's that's really really neat that move can do everything um for us and I wanted to really uh highlight um that fact that we can do really Advanced things with only the move instruction so here's a a factorization program um using only the move instruction you know factorization is not an easy thing it's factoring large numbers using only data transfers of course it's segals at the end to stop the program um we can find prime numbers using only move inst instuction so again doing really complicated math and arithmetic um just by moving data from from one place to another in fact here's a a complete um CSS decryption Cipher written with only move instructions I don't know why you would want that but but you could get that uh what else so uh here's a complete role playing game that I compiled into only move instructions I even I even mafc my mafas skater so I've got a program written in only move instructions that will compile other programs into only move instructions um that one's never going to finish dumping so we're just going to kill that as well so so the point is um this is kind of neat we we can do anything with move instruction so now that I had sort of gotten the concept prove out proven out and knew that I could could do this I I wanted to readdress my original question of of is this a real anti-reverse engineering technique how would an experienced reverse engineer approach this so I thought I'm an experienced reverse engineer what would I do if I sat down and dropped something into Ida and just saw thousands and thousands of move instructions and nothing else so uh truthfully I thought if if I saw this I'd go find something else to do because I reverse engineer because I think it's but this does this doesn't look like fun to me but more realistically I think I think thinking about how we would reverse engineer such a thing brings up some interesting points so if we had written that 99 bottles of beer on the wall program in um C and compiled it it would look something like this for our our main routine which just calls some of the lyrics the first set of lyrics look like this the second set of lyrics look like this if you didn't use any functions your 99 bottles of beer on the wall program would look like this and see so really clear concise easy to understand and interpret control flow graph here um so if you ran this through a a different aisc tool you might wind up with with something like this a lot of officiation tools work by trying to add complexity making things very very complicated in order to make them difficult to reverse engineer but if you have enough experience even this isn't really all that bad you can look at it enough and figure out which nodes matter which nodes don't matter which groups of nodes are performing more basic operations and you can sort of reduce this into something a little bit easier to understand so this problem is still kind of approachable like this but um in contrast this is what the mopc uh 99 bottles of beer on the wall program looks like like it's it's a line in fact uh this is the rot 13 Cipher that we saw earlier this is the towers of hanoy this is a mandal br fractal uh this is the Lost role playing game you might notice a pattern here um pretty much everything looks the same not only do all the instructions look the same but the control flow grass are all identical so um also when you try to actually look at this in Ida Ida really really doesn't like 100,000 move instructions in a single block it crashes every single time you try to look at anything that you've mosc so it's really kind of interesting that this works completely opposite of other appication tools instead of adding complexity it sort of removes all the complexity and somehow removing complexity makes things really really hard to reverse engineer because every single program ends up looking exactly the same on top of that there's really no way to reduce this there's no way to simplify the problem because there's no dead code here there's only code that sometimes works on fake data and sometimes works on on real data um there is one thing you could do to try to approach this a good reverse engineer would eventually see that you've got these same repeated blocks of move instructions and they would be able to figure out what each block of move instructions is doing and eventually reverse engineer the program back into the original brain yucky code and uh work with the brain yucky code instead of the 100,000 move instructions it's actually pretty easy to thwart that approach so I haven't actually done this yet but it's not too hard to imagine um most of the move instructions inside of these basic blocks are independent of one another it's not hard to shuffle those things around in order to make no two blocks look uh look alike it's also easy to add junk instructions to these blocks to add diversity a lot of the blocks are independent of each other meaning you can inter leave these different move blocks you can even rearrange the different blocks effectively um making if you wanted to you could have this thing repute the entire program every single time you compile um which is going to be especially effective in the move world where everything looks the same already you'd compile it once and it would look like this compile again it would look like this again would like look like this um so that that's a lot to sift through and that's really going to make it awfully hard to reconstruct the original brain yucky code um from these programs but even without this uh there um once in a while when I was I was making this thing I i' make a mistake I'd have a bug in my code I'd actually have to dive into GDB and start trying to decipher what was going on and even though I I myself had just written these move instructions 15 seconds ago I could not understand understand what was going on when I tried to watch the program run because every single instruction looks exactly the same you just cannot keep track of what's going on when all you see is is move instructions so it was actually uh pretty effective um I found it's not too hard to imagine some extensions to this idea it's pretty easy to imagine uh substituting other instructions for moves so for example a pair of exors an add sub pair an Andor pair a push pop pair all those can do basically the same thing as a move uh so you could have an exor visator instead of a mafos skater if you so desired um I sort of on a roll here and I was having a lot of fun with this so I wondered where else I could take it um earlier on I said that we were limiting ourselves to one bite data which is a pretty big limitation so I wondered how hard would it be to actually um go beyond one by data how could I do 32-bit operations um with mfos skated code so I built a 32-bit arithmetic logic unit for doing 32-bit math here the challenge with doing 32-bit math with move instructions is you can't just use lookup tables like we did with 8bit math you have enough memory for it what you can do is Cascade a whole bunch of 8bit operations in sort of a ripple carry fashion in order to implement 32bit logic so these are the macros for 32-bit addition expands out to the set of moves that's actually not too bad that's not a whole lot of moves for 32-bit addition 32-bit subtraction is similar um the shifts get a little bit more complicated this the left shift as macros this the left shift as moves the right Shi shift as macros this is that expanded to moves multiplication is where things start to get pretty hairy so this is 32-bit multiplication as a series of nasm macros which expands to this fairly monstrous set of move instructions uh division though in modulo that's where it gets bad this is 32-bit division in modulo as nasza macros this is that expanded to move instructions it's 7,000 move instructions to implement division in modulo but it can be done because move is turning complete so um so our limitations with this approach really aren't in what we can do we can do anything I think it's pretty obvious our limitation is is speed the thing just runs way too slow to be useful and really if you think about how it works the reason for our horrible speed is essentially that uh the way we have to implement jumps with only moves jumping switches between dummy data and real data so if you can imagine jumping forward just a little bit if I wanted to jump from here down to here I switch data off or I switch execution off here I switch over to fake data so all these moves execute but they operate on fake data so it's just a bunch of wasted computation but that's not a whole lot of wasted computation if we've just got a short jump forward it's a short jump backwards that causes a bigger problem if I want to jump from here to here I have to turn execution off and my entire program has to re-execute on dummy Data before I can reach my short jump backward branch and the problem with that is in brain yucky short jumps back short jumps backward happen an awful lot so this is how again we set data to zero in brain yucky this is simply looping over the same memory cell over and over decrementing it by one each time so if you imagine if you had a cell set to 200 if you just wanted to set that cell to zero your entire program has to reexecute 200 times in order to make that happen so that's going to make things uh really really slow and totally unusable just to highlight that fact if we uh check on how our uh 99 bottles of beer program is doing it's still it's still trending away and my laptop's not too happy about it down to 60 bottles of beer um we'll get there but this is pretty timeconsuming for something that should have taken well under a second to to run so not the fastest thing in the world not especially usable in its current state but but I sort of in this mentality I've I've gone this far I might as well keep going so uh over the last few weeks I built what I call mcor uh 2.0 which is actually a complete C compiler that will uh compile C code into only move instruction so I did this by retargeting the LCC compiler that's the little C compiler um I didn't really have any experience with retargeting compilers didn't know what I was doing GCC and llvm looked really really complicated l l LCC looked almost doable so that's why I I picked LCC for this it was an interesting little experiment we needed a lot of registers just in order to do the the memory transfers in order to shuffle things around so I had to reserve most of the registers for myself um and left LCC with only ESI and EDI to work with effectively turning x86 into a two register architecture had to make my own calling convention none of the existing ones work worked very well for mcad code built an emulated stack through only move instructions integrated my 32-bit ALU um changed the way we do dummy selectors instead of having duplicate copies of all our variables we've now just got one big scratch space for the program to operate on when executions off I implemented all 102 Intermediate Language instructions LCC needed with only move instructions and overall this actually went way more smoothly than I ever could have imagined um especially given that I I had no idea what I was doing in fact I think the only bad part about this this whole thing was LCC uses gccc for a couple of pieces namely LCC relies on gcc's assembler and GCC likes things in AT&T syntax which I just find abhorent so I had to write an awful lot of assembly um looking something like that which which just felt gross and made me miserable by the end of the day but other than dealing with AT&T syntax uh this this wasn't too bad so I wanted to quickly demo the uh the new version of the MAF skater which works on uh code so uh I really just got this working um pretty much last night so I made this little nibbles program um it's a really ugly short simple uh version of the nibbles game in C it uses en curses for uh for the video or for the for the graphics now we can use LCC to compile this thing so I'm asking LCC to compile it using my new move x86 backend so it'll build that dump a bunch of uh debug information um now we can check our uh program that it created we can see our move only version of of nibbles uh there's a couple if you can happen to have really really good eyes you might notice that there's one non- move instruction in there every once in a while it still does external function calls uh through a jump if equals instruction and a comparison instruction so I've got a way to fix that I don't have that implemented yet but anytime you want to call into the standard C libraries it's got to it's got to use those two instructions but beyond that it's really nothing but moves um doing all of our work uh so when we run this thing we've got a version of nibbles now running with really only move instructions and you'll notice it's actually uh fairly fast this uh new version of the code runs blazingly fast um which I was fairly surprised about uh but it's it's actually got a delay in here to make it um a little bit more playable but uh so the new version of the mcor the C compiler is uh actually produces fairly usable opos skated code so as far as I know I I think this is the first single instruction C compiler um I think it's the first single instruction compiler for any language that actually uses a real instruction not a synthetic instruction which is kind of cool um I'm still sort of in the mindset of where do I go with this now so I've been trying to think of ideas for mcor 3.0 a friend of mine pointed me to this uh presentation showing that the x86 mmu is turning complete what that lets you do it lets you do computation without any instructions not with one instruction with no instructions and that's actually than to some guys that are uh here right now I think Julie and Sergey uh so my goal for 2016 would be to have a no instruction C compiler building off of what we've already got here but we'll see how how that goes so wrapping everything up um I really don't know if this is a legitimate antire solution or not I haven't really spent a lot of time actually trying to reverse engineer this stuff I really only made this because I thought it would be funny but it turned out to be a really really fun project it really interesting to see that move can do anything we could possibly imagine just a fun thing to work on all around so if you're at all interested in this um I've got version one of the MF skater that's the uh brain yucky to move compiler is on GitHub now you can check that out to see how it works I'll get um the C compiler up as soon as I can I want to get the code cleaned up because it's really ugly and embarrassing right now and I still have to get a permission from my employer to uh post that there as well but I'd like to get that up as soon as possible I was talking to a guy from um kasperski the other day he said if you're going to do this you really need to make a crack me for it you can't give a presentation on an anti-reverse engineering technique and then not give people anything to actually reverse engineer so the other night I made the simplest possible crackme I could imagine it's a 15 line C program it's basically just a a string compare uh but I compiled it with the mafa skater so it's just a bunch of move instructions now so if anybody wants to take try their hand at that um that's on GitHub as well if anyone has ideas or feedback on this approach I would absolutely love to discuss it hunt me down at the conference or I just signed up for Twitter uh the other day uh so that I could actually talk to people here so it's exor eax eax eax if you want to um talk about um any of this so I've got a tiny bit of time left if there are any questions and while we're doing that we can see uh how our our bottles of beer are are doing it's uh it's getting there tends to go a little bit faster as it gets uh further along but I don't know if we'll quite make it uh to the end before before the end of the presentation so are there any any questions people had yeah uh question is what is the typical size of our uh binaries um large is the answer so there's about a megabyte of lookup tables for anything no matter how simple the program it's going to be at least a megabyte for your your lookup tables um and then it's probably uh 100 to a th000 uh times increase in the actual program size beyond that so um large if size is a concern this is not the officiation tool for you yeah a that's really depressing that it did not finish the bottles of beer I don't know what the issue was there but uh the question was how did we uh skip over a call uh with the move uh the M skater so for external calls there's really no way the only way to get well that's not true um a way to get to external calls like in the standard C library would be to use the call instruction um so that's basically what we have in the C compiler right now um you can do that through exception handling um the same way that I cause caused it to jump back to the beginning at the end so that'll be the next addition to the C compiler is using move instructions only for um calls um yeah um I did think about targeting arm um that seemed easier so I didn't want to do that um all right so I'll call it quits there if you got any more uh questions feel free to hunt me down sometime I'd love to talk a little bit more about this I'm really sad that the demo didn't quite get down to zero bottles of beer on the wall but I guess uh 98 is is good enough for a day so thanks everyone um