
So I'm going to talk about a project I did last year, I think. I wanted to play video games and learn Japanese at the same time, so I started reverse engineering a game you probably know. It's about 20 years old and it's a pretty famous RPG: Final Fantasy 7. The thing is, it actually looks more like this: awesome graphics and all these weird symbols. So yeah, we're going to look at that.

Here are the tools we're going to use. The first is a powerful debugger. I didn't make it; it's just useful, if a little cluttered.
We're going to do some disassembly with radare2, which is a command-line disassembler that's free and really sweet. And the last tool of the day is called Rikaichan, which is kind of like an online dictionary. It's an add-on for Firefox, and when you click on a word it pops up a definition of the word you're looking at.

So what's the objective of this project? Let's take the text out of the game and pipe it into a web page, so we can use Rikaichan to look up words while playing. It's different from just Google-translating, because you can click on a word and directly know what it means, as opposed to translating everything and losing the meaning of each individual word, which is not great if you're trying to learn Japanese.

Actually, I'm not the first one to come up with this idea. I found inspiration in this project, which is pretty similar, but the guy did it on a Linux PC using Wine, and then it was the same idea with Firefox. I wanted to do it for FF7, and I picked the PlayStation version running in an emulator, because the guy had already done the work on Windows, so that's not as fun, and it was just an interesting platform to explore.

So the goals for today's presentation: first of all, to have fun, because we're not going to change the world with this, but if we can have some fun while doing it, that's cool. I'm going to try to show you a few neat things and maybe give you some Japanese pointers. And the most important thing for me is this: I had this project, I set myself the goal of solving it, and it wasn't that hard. What I really want to share is that if you come up with an idea, you can work on it and power through it, and eventually it happens, and it's not as hard as it seems at first. So I just want to say: if you have an idea, go for it and work on it. That's it.

So the plan: we're going to find the text in memory, figure out how to get it out, and then pipe it into a web page or something so we can use Rikaichan as a dictionary.

So where's the text? This is the beginning of the game. On the left, the guy is asking, "I don't think I've heard your name. What's your name?" Then you have a screen that asks what name you want, and then the dialogue on the right says, "Oh, I'm blah blah blah." So what can we do? Information security people like to fill buffers with A's, so we can say "I'm AAAA". That's cool. We do that, and because it's running in an emulator, we can take a screenshot, I mean a snapshot, of the memory. Then we do the same with B's: "BBB".

I don't know if you see where I'm going with that, but the next step is to diff the two snapshots we took and see what changed. If you look at that, we can see in red, on top, B4 B4 B4 B4, and on the bottom, B5 B5 B5 B5, which is encouraging, I guess: we changed one value. There is more than one location that changed, and this one is actually more interesting because there's more to it: it mimics what we saw on the last slide, with the name of the character, then a line break, the name again, and some more Japanese.
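The snapshot-diffing step can be sketched in a few lines of Python. This is a minimal illustration, assuming the two emulator snapshots have been saved as raw byte dumps of equal size; the function name and the toy data are made up for the example, and only the B4/B5 values come from the slide.

```python
def diff_runs(before: bytes, after: bytes):
    """Return (offset, old_bytes, new_bytes) for each contiguous run of differing bytes."""
    if len(before) != len(after):
        raise ValueError("snapshots must have the same size")
    runs = []
    start = None  # offset where the current run of differences began
    for i, (a, b) in enumerate(zip(before, after)):
        if a != b:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, before[start:i], after[start:i]))
            start = None
    if start is not None:  # a run that reaches the end of the dump
        runs.append((start, before[start:], after[start:]))
    return runs


# Toy data standing in for two memory snapshots: the name typed as A's, then as B's.
snap_a = bytes([0x00, 0xB4, 0xB4, 0xB4, 0xB4, 0xFF, 0x12])
snap_b = bytes([0x00, 0xB5, 0xB5, 0xB5, 0xB5, 0xFF, 0x12])

for offset, old, new in diff_runs(snap_a, snap_b):
    print(f"0x{offset:06x}: {old.hex(' ')} -> {new.hex(' ')}")
```

Grouping the differences into runs, rather than listing single bytes, makes it easier to spot the candidate that carries the whole sentence and not only the name.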
So this one is actually the one that the text displayed on screen is coming from.

What do we do next? We put a breakpoint on it. Most commonly, people know breakpoints that trigger when a piece of code is executed, but you can also set memory breakpoints, so that when a memory location is accessed, the breakpoint triggers. That's what we want here: to locate the code that's responsible for reading this data. So we create a memory read breakpoint on the location we found and see which code is actually accessing the data.

Here's the funky debugger again. If you look past the cluttered interface, there are the memory breakpoints and the address range, and so we find a location in the code that's accessing that address. When we run that in real time, what happens is that the breakpoint triggers on every frame of the game, but not for all the speech, only some of it, and we've seen there is a weird encoding going on. So what we most likely have is a function in the draw loop of the game that runs on every frame and renders the text from that weird encoding.

So that's cool, but let's have a look at this function in radare to have something a little more stable. How do we do that? If you don't know, radare is free software, and it actually handles PlayStation binaries out of the box, which was really surprising to me, but I was like, oh, that's neat. If you ever put a PlayStation game in your PC, you'd see something like this. It's a little cut off, but at the bottom, that's actually the binary that's the game, so you can load that into radare. If you're not familiar with radare, it's kind of like the vi of the disassembly world, where you have a lot of weird command-line stuff. You can run auto-analysis to find the functions and whatnot, and something that was useful to me is that you can ask it to display pseudo-code instead of actual MIPS assembly, which is helpful if you don't know anything about MIPS, so you don't have to learn a whole new set of mnemonics and whatnot.

So we can have a look at the function we were looking at in the debugger. This is where the breakpoint triggers, and it's accessing the memory address held in s0. We can see that s0 is assigned at the top of the function, which might be useful later on. If we look a little further down the function, we can see more processing.

Okay, so there's something that's really hard to see here, at the bottom, that I wanted to share with you. I've left this version in MIPS assembly. There's just a cute little basic block, and we can see that s0 is incremented, but that's after the jump, and it's an unconditional jump. So I was like, what the hell, is it a glitch or something? Actually no, but it took me a while to learn about that. Does anyone know what this is about? Mm-hmm, exactly: it's a branch delay slot. It's hard to see, but the idea is that some architectures want to optimize their pipeline, and the jump takes more than one cycle to execute, so the instruction that sits right after the jump is executed before the jump takes effect. So that's not a bug; that's exactly the way MIPS works. That was an interesting tidbit I discovered about that architecture.
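To make the delay slot concrete, here is a toy interpreter in Python. It is not MIPS and not the game's code: the three instruction tuples and the program are invented for illustration. It only shows that, with a delay slot, the instruction placed right after a jump still runs before control reaches the jump target.

```python
def run(program, use_delay_slot=True):
    """Execute a toy program and return its registers.

    Instructions are tuples (a made-up mini language):
      ("addi", reg, imm)  adds imm to reg
      ("j", index)        jumps to program[index]
      ("halt",)           stops execution
    """
    regs = {"s0": 0}
    pc = 0
    pending = None  # jump target waiting for the delay-slot instruction to finish
    while True:
        op = program[pc]
        next_pc = pc + 1
        if pending is not None:
            # This instruction is the delay slot: run it, then go to the target.
            next_pc = pending
            pending = None
        if op[0] == "halt":
            return regs
        if op[0] == "addi":
            regs[op[1]] += op[2]
        elif op[0] == "j":
            if use_delay_slot:
                pending = op[1]   # takes effect after the next instruction
            else:
                next_pc = op[1]   # takes effect immediately
        pc = next_pc


program = [
    ("j", 3),              # 0: unconditional jump to the halt
    ("addi", "s0", 1),     # 1: delay slot, sits after the jump
    ("addi", "s0", 100),   # 2: skipped in both modes
    ("halt",),             # 3
]

print(run(program)["s0"])                        # 1: the delay slot ran
print(run(program, use_delay_slot=False)["s0"])  # 0: it was skipped
```

Read this way, an increment of s0 that appears after an unconditional jump in the listing is still part of the loop body.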
So we see that s0 is incremented, and that's more toward the bottom of the function. In other places we increase s0 as well and do some stuff, and then if the conditions are right we jump back to the beginning of the function. What that means is that we're iterating through the whole buffer we had, with some processing going on. If we look at some of the tests, there are magic values that the bytes are compared to, and if we look back at the memory dump we had, we can see that E7 and FF are in that dump. So it's pretty convincing that we're looking at the right stuff and that those values have a meaning. It turns out E7 would be a line break and FF would be a terminating character, so that the game knows it needs to stop processing the string.

Let's go back to the plan. We wanted to find the text in memory; we did it. But there's a new issue that showed up, which is: what the hell is this encoding? We could do a bunch of things to figure it out. If we type BBB, the B4 we had turns into B5, so we could iterate and see that as the letters go up, the value keeps increasing. We could also just modify the memory location and see what happens. What I actually wound up doing was, I googled it. It turns out this video game is super popular and it's been 20 years, so a lot of people have been working on reverse engineering it. For instance, I found this, which is not quite useful because it's the character map for the English version, but it still gives you an idea of what's going on. For instance, we can see that E7 would be a line break and so on. There's much more information on the website about the structure of the game, so that was neat to find.

If we think about it, what's the font in the game? It's either something in the BIOS of the console that provides printing capabilities to all games, or it could be a custom font, and a custom font would be glyphs in memory somewhere. It actually turns out it's the latter: there are sprite sheets somewhere that contain the font, and we need to find them. So let's look through the assets and see where the sprite sheets are.

If we dig more into the game, there are BIN files that contain a bunch of stuff. We can run binwalk on them, which kind of works. binwalk is a tool people use for reverse engineering firmware and such. It knows a lot of signatures, so it's able to parse binaries and say, "I've found a zip signature here, here's a Linux kernel", or something. Here we can see a lot of gzip-compressed data.
Unfortunately, binwalk is a little confused and doesn't do a perfect job at extracting the stuff. By digging more, it turns out we find this structure: in some of the files there is first a field that says "here's an entry in the file that's this big", and then the entry itself is gzipped. So you can carve that out, knowing how big it is, then gunzip it, and you get something like this, which I think begins with a TIM header.
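The carving step can be sketched as follows. The talk only says that a size field precedes each gzipped entry; the 4-byte little-endian width of that field, the back-to-back layout of the entries, and the helper names are assumptions made for this example, which runs on a synthetic archive and not on a real game file.

```python
import gzip
import struct


def carve_entries(blob: bytes):
    """Yield decompressed entries from a blob laid out as [u32 size][gzip data]...

    The layout is an assumption for illustration, not a documented format.
    """
    offset = 0
    while offset + 4 <= len(blob):
        (size,) = struct.unpack_from("<I", blob, offset)
        offset += 4
        chunk = blob[offset:offset + size]
        # Stop instead of guessing if the data does not match the assumed layout.
        if size == 0 or len(chunk) < size or chunk[:2] != b"\x1f\x8b":
            break
        yield gzip.decompress(chunk)
        offset += size


def pack(entries):
    """Build a synthetic archive in the assumed layout (test data only)."""
    compressed = [gzip.compress(e) for e in entries]
    return b"".join(struct.pack("<I", len(c)) + c for c in compressed)


archive = pack([b"first entry", b"second entry"])
for index, data in enumerate(carve_entries(archive)):
    print(index, data)
```

Checking the two gzip magic bytes before decompressing keeps the loop from running off into unrelated data when a file does not follow this structure.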
TIM is an image format on the PlayStation. I think the PlayStation shipped with an SDK that could handle it. It's basically an image format, and we don't really want to implement our own TIM decoding, but it turns out people have done it before, so there's this awesome tim2png that you can find online.

So what do we find? Here's some cool stuff extracted from the game. On the left side there are a lot of sprites. The colors are wrong, I guess because the color palette wasn't picked properly, but we can still recognize a lot of elements from the game. What we really care about is the right side, where we actually have three sheets of characters: the top one, this bottom one, and that third one. That's what we're looking for in terms of fonts.

If we look deeper into these sprite sheets, we can notice that all the characters are on a grid; they're all the same size. It's a fixed-width font, which is fine for the Japanese characters, because they're always meant to be the same size, but it looks a little funky with the English characters, because it doesn't handle kerning, where you'd draw some letters closer together to make them look better.

Anyway, as I was saying before about the values in the encoding: everything below E7 is actually just a lookup in the first sheet, so the top-left character would be encoded as something like 1, and then 2, 3, 4, and so on. E7 is a line break. And then at the end there are these special values, FA to FE, that trigger a lookup in the other sprite sheets for the kanji, which are the more complicated characters. Here's just some code for processing the encoding; I'm going to move on to the rest of the content.
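The encoding rules just described can be sketched as a small decoder. This is an illustration, not the code shown on the slide: the mapping of FA..FE to sheet numbers 1..5, the zero-based index into each sheet, the placeholder output for values that are not covered in the talk, and the toy character tables are all assumptions.

```python
LINE_BREAK = 0xE7
TERMINATOR = 0xFF
# Assumed mapping: each prefix byte selects one of the extra sheets.
SHEET_PREFIXES = {0xFA: 1, 0xFB: 2, 0xFC: 3, 0xFD: 4, 0xFE: 5}


def lookup(sheets, sheet, index):
    """Return the character at `index` in `sheet`, or a visible placeholder."""
    table = sheets.get(sheet, "")
    return table[index] if index < len(table) else f"<{sheet}:{index:02X}>"


def decode(buf, sheets):
    """Decode one string from `buf`, stopping at the terminator byte."""
    out = []
    i = 0
    while i < len(buf):
        b = buf[i]
        i += 1
        if b == TERMINATOR:
            break
        if b == LINE_BREAK:
            out.append("\n")
        elif b < LINE_BREAK:
            out.append(lookup(sheets, 0, b))          # first sheet
        elif b in SHEET_PREFIXES and i < len(buf):
            out.append(lookup(sheets, SHEET_PREFIXES[b], buf[i]))
            i += 1                                    # consume the index byte
        else:
            out.append(f"<{b:02X}>")                  # value not covered here
    return "".join(out)


# Toy tables; the real ones would come from the sprite sheets, in reading order.
SHEETS = {0: "あいう", 1: "日本語"}

sample = bytes([0x00, 0x01, 0xE7, 0xFA, 0x02, 0xFF, 0x00])
print(decode(sample, SHEETS))
```

Emitting placeholders for unknown values, instead of raising, keeps the text flowing during play while still showing which codes are missing from the tables.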
So the plan was finding the text in memory and figuring out the encoding, but now we want to get the text out of the game. How do we do that? We find a debugger that we can use programmatically, set a breakpoint at the top of the function we were looking at, read the memory location, and get the text out. So we're going to instrument that chunk of code.

So how? The PlayStation emulator I was using is open source and has a built-in debugger, as we saw. As ever, there's no documentation for it, but if we take a look in the source code, there's an explanation of the protocol, which is just client-server communication where you send commands, things like dumping memory or setting a breakpoint, and so on. So we can program the thing. It's pretty easy: I reimplemented a simple client that sends the commands with the data and so on. Here's the first try: on the left you can see some debugging output, and it's printing in a console some of the text that's on screen.
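A client of that kind could look roughly like the sketch below. The wire format is assumed, not taken from the emulator: one ASCII command per line, one reply line per command, and a command name that is purely hypothetical. A stand-in server running in a thread replaces the emulator so the example is self-contained.

```python
import socket
import threading


class DebuggerClient:
    """Tiny line-oriented client; the protocol details here are assumptions."""

    def __init__(self, sock):
        self.sock = sock
        self.reader = sock.makefile("r", encoding="ascii", newline="\n")

    def command(self, line):
        """Send one command line and return the single reply line."""
        self.sock.sendall(line.encode("ascii") + b"\r\n")
        return self.reader.readline().strip()

    def close(self):
        self.reader.close()
        self.sock.close()


def fake_debugger(sock):
    """Stand-in server: answers every command line with 'OK <command>'."""
    with sock, sock.makefile("r", encoding="ascii", newline="\n") as lines:
        for line in lines:
            sock.sendall(f"OK {line.strip()}\r\n".encode("ascii"))


# A connected socket pair replaces the real TCP connection to the emulator.
client_sock, server_sock = socket.socketpair()
threading.Thread(target=fake_debugger, args=(server_sock,), daemon=True).start()

client = DebuggerClient(client_sock)
# Hypothetical command name and address, for illustration only.
print(client.command("hypothetical-read-breakpoint 0x1234"))
client.close()
```

With a real debugger server, the socket would come from `socket.create_connection((host, port))`, and the command strings would be whatever the emulator's source documents.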
There's still some issue. As I said, we're looking for something like five sprite sheets in the game and we've only seen three of them, so some are missing. Oops, never mind, that's coming next, sorry.

The other issue is that there are a lot of characters. Japanese has two alphabet-like sets plus all these special characters, so I think in total it's more than 2,000 characters that we're looking at, and we don't really want to enter them by hand. Eventually we want to convert the text into something like Unicode and pipe it into a website, and we can't just use the sprites, so we want to convert them into something more useful, and we don't want to do that by hand. So we can run OCR on the data. Because the font is fixed-width, you can split the sheet into tiny chunks and then run a tool that recognizes them. It's meant for handwriting or something, and this is the kind of output you get. It's a little faulty sometimes, but it saved a lot of time, because I really didn't want to type 2,000 characters by hand and try to recognize which is which.
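Splitting a fixed-width sheet into tiles is simple arithmetic, sketched below. The sheet and tile dimensions in the example are invented; the real values would be read off the extracted images. Each box is a (left, top, right, bottom) tuple that could be handed to an image library's crop function before running OCR on the tile.

```python
def tile_boxes(sheet_w, sheet_h, tile_w, tile_h):
    """Return (left, top, right, bottom) boxes for every tile, in reading order.

    Reading order (left to right, then top to bottom) means the position in
    the returned list is also the glyph's index within the sheet.
    """
    cols = sheet_w // tile_w
    rows = sheet_h // tile_h
    return [
        (c * tile_w, r * tile_h, (c + 1) * tile_w, (r + 1) * tile_h)
        for r in range(rows)
        for c in range(cols)
    ]


# Made-up dimensions: a 48x24 sheet of 12x12 glyphs is 4 columns by 2 rows.
boxes = tile_boxes(48, 24, 12, 12)
print(len(boxes))   # 8
print(boxes[5])     # (12, 12, 24, 24): second row, second column
```

Keeping the tiles in reading order matters because the OCR results can then be written straight into a lookup table indexed by glyph position.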
As I was saying just before, something is also missing: we've seen three sprite sheets, but there are a few more. So where are those? I looked for a while and couldn't find them, and I was kind of frustrated. So I thought, well, the game knows, so we're just going to dump them. The PlayStation has VRAM, which is basically just the graphics memory, and if the game is printing the characters, they must be in VRAM at some point, so we can just dump the VRAM. And that's what you get. It's a little blurry here, but we can see some of the things we were looking at before, and on the left side are the sprite sheets that were missing. Here's more progress, with all the characters implemented.

So now we've got everything, and we're ready to send the data into a web browser and use the online dictionary. So let's build it. Here's a crappy little diagram of how I built it: we have a WebSocket server that talks with the Python code, Firefox connects to the WebSocket server and gets the data, and then Rikaichan translates that for us. The code, I'm just going to skip it, and let's try a demo. Hopefully everything is going to work.

This is some of my debug view, this is the game, and here's Firefox. It's not the prettiest interface, but that's it. We connect the WebSocket; you can see at the bottom it says the connection was established. If we say hi to the guy, we can see here on the left that the text is showing up through the WebSocket, and Rikaichan is running, so we can ask, what does this one mean? And so, thank you.

So yeah, we can keep playing. There are still a few glitches, where some characters are misrecognized and so on, but it's pretty good, and it's nice if you want to learn how you say a given word or something like that. It does the job. So yeah, that's kind of the idea. I think we're running out of time, so do you have any questions?

To be honest, I made it and then I was asked to type