[Kuba Tyszko], like many of us, has been hacking things from a young age. An early attempt at hacking around with grandpa’s tractor might have been swiftly quashed by his father, but this was likely not the last such incident. With a more recent interest in cracking encrypted applications, [Kuba] gives us some insight into the tools at your disposal for reading out the encrypted secrets of applications that have something worth hiding. (Slides here, PDF.)
There may be all sorts of reasons for such applications to have an encrypted portion, but those reasons are not really the focus here. One such application that [Kuba] describes was a pre-trained machine-learning model written in the R scripting language. If you’re not familiar with R, it is commonly used for ‘data science’ type tasks and has a big fan base. It’s worth checking out. Anyway, the application binary took two command line arguments: the first was the encrypted blob of the model, and the second was the path to the test data set for model verification.
The first thing [Kuba] suggests is to disable network access, just in case the application wants to ‘dial home.’ We don’t want that. The application was intended for Linux, so the first port of call was to see what libraries it was linked against using the ldd command. This indicated that it was linked against OpenSSL, so that was a likely candidate for encryption support. Next up, running objdump gave some clues as to the various components of the binary. It was determined that it was doing something with 256-bit AES encryption. Now, after applying a little experience (or educated guesswork, if you prefer), the likely scenario is that the binary yanks the decryption key from somewhere within itself, reads the encrypted blob file, and passes both over to libssl. Then the plaintext R script is passed off to the R runtime, the model executes against the test data, and results are collated.
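As a rough illustration of that first reconnaissance step, here is a minimal Python sketch that runs ldd and picks out libraries whose names suggest OpenSSL. The function names and the list of name hints are our own assumptions, not something from the talk, and the parsing simply assumes the usual "name => path (address)" layout of ldd output. Note that the ldd man page itself warns against running it on untrusted binaries, so treat this as something to do inside the same sandbox as everything else.

```python
import subprocess

# Name prefixes we treat as a hint that OpenSSL is in use (our assumption).
CRYPTO_HINTS = ("libssl", "libcrypto")


def crypto_libraries(ldd_output):
    """Return the linked library names that look like OpenSSL components."""
    found = []
    for line in ldd_output.splitlines():
        # Typical line: "\tlibssl.so.3 => /lib/libssl.so.3 (0x00007f...)"
        name = line.strip().split(" ", 1)[0]
        if name.startswith(CRYPTO_HINTS):
            found.append(name)
    return found


def linked_crypto(binary_path):
    """Run ldd on binary_path and report any OpenSSL-looking libraries."""
    result = subprocess.run(
        ["ldd", binary_path], capture_output=True, text=True, check=True
    )
    return crypto_libraries(result.stdout)
```

Calling linked_crypto() on the target binary would return something like a list of libssl and libcrypto entries if the binary is dynamically linked against OpenSSL, which is the cue to look more closely at the decryption path.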
[Kuba]’s first attack method was to grab the OpenSSL source code and drop some strategic printf() calls into the target functions. Next, using the LD_PRELOAD ‘trick’, the standard system OpenSSL library was substituted with the ‘fake’ version containing the trojan printfs. The result of this was the decryption function gleefully sending the plaintext R script directly to the terminal. No need to even locate the key!
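The instrumented library itself is C, but the substitution step is just an environment variable. A minimal sketch of launching a target with a preloaded library might look like the following; the helper names are hypothetical, and the library path and command are whatever you built and whatever the target expects.

```python
import os
import subprocess


def preload_env(library_path, base_env=None):
    """Return a copy of the environment with library_path first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD")
    # The dynamic linker accepts a colon-separated list; keep any existing entries.
    env["LD_PRELOAD"] = f"{library_path}:{existing}" if existing else library_path
    return env


def run_with_preload(command, library_path):
    """Run command (an argument list) with the instrumented library preloaded."""
    return subprocess.run(command, env=preload_env(library_path))
```

Because the dynamic linker resolves symbols from preloaded objects first, the target ends up calling the instrumented functions instead of the system ones, provided it is dynamically linked against the library in question.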
Next [Kuba] outlines the ‘easy way’, which is to freeze the binary, just like we could with a whole machine in years gone by, by having it read from a FIFO instead of a file but never placing data on the other end. Then, with the read() call blocked, the binary is frozen, and hopefully, the key is in memory already. Next, we use gcore to create a core dump of the running application, which only requires knowledge of the process ID. Since the binary has already accessed the key and decrypted the secret model data, which is held in memory, the plaintext contents will be in the core file, and easily visible by just opening it as a text file! After a bit of searching around, the R script code was visible. No special libraries are needed, just a handful of standard Linux commands.
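The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: target_cmd is a hypothetical argument list that already names the FIFO as one of the target's inputs (the talk summary does not say which argument was used), the fixed sleep is a crude stand-in for knowing when the target has stalled, and gcore may need elevated privileges depending on the system's ptrace settings. The printable_strings() helper is our own stand-in for "opening it as a text file".

```python
import os
import re
import subprocess
import time


def freeze_and_dump(target_cmd, fifo_path, core_prefix, settle_seconds=2.0):
    """Start target_cmd, let it stall on the FIFO, then dump it with gcore."""
    os.mkfifo(fifo_path)  # no writer ever connects, so the reader stalls
    proc = subprocess.Popen(target_cmd)
    try:
        time.sleep(settle_seconds)  # crude wait for the target to reach the FIFO
        # gcore writes its dump to "<core_prefix>.<pid>"
        subprocess.run(["gcore", "-o", core_prefix, str(proc.pid)], check=True)
    finally:
        proc.kill()
        proc.wait()
        os.unlink(fifo_path)
    return f"{core_prefix}.{proc.pid}"


def printable_strings(data, min_len=6):
    """Return runs of printable ASCII (plus tabs) at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]
```

Feeding the bytes of the resulting core file to printable_strings() and filtering for R-looking lines is then a matter of patience. Core files can be large, so a real tool would probably read in chunks rather than load the whole dump at once.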
With the shoe on the other foot, how can you protect your application against such a simple hacking process? Roll your own crypto? That is a dangerous proposition and the consensus is to not do this. Preventing the FIFO attack could be as simple as using stat() to check that the file presented is an actual regular file. You could also statically link certain critical libraries to prevent the LD_PRELOAD attack, if that is possible. [Kuba] also suggests that the application could inspect any loaded shared objects using callbacks, to verify the libraries are the expected ones.
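To make the stat() idea concrete, here is a minimal sketch in Python, whose os.stat() wraps the same system call. The function name is our own, and a real application written in C would do the equivalent with stat() and the S_ISREG macro.

```python
import os
import stat


def is_regular_file(path):
    """Return True only if path refers to a regular file (not a FIFO, device, etc.)."""
    # os.stat() inspects the inode without opening the file, so it does not
    # block on a FIFO that has no writer.
    return stat.S_ISREG(os.stat(path).st_mode)
```

A check like this only raises the bar. There is still a window between the check and the later open in which the path could be swapped, so it is best seen as one mitigation among several rather than a complete fix.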
The only way to be sure (and you can never be 100% sure) is to enumerate all the possible attack methods and mitigate each one accordingly. There is no hack-proof method; you just have to make it as hard as possible.