Note
This article took some time to get published. I wrote it in November but I had to put off finishing it until now. Sorry for the delay.
So the hack.lu 2013 CTF is over and it was a blast again. Thanks to everyone for their participation and especially for the feedback. One thing I noticed when reading the writeups out there is, that no writeup seems to have gone the intended way of solving my "What's wrong with this?" challenge. I am really amazed to see the solutions you guys found but I also want to present the way it was intended, so here goes a writeup of my own challenge.
The Idea
The idea for the challenge came to me when reading Looking Inside the (Drop) Box. The paper successfully deobfuscates the Dropbox source code, which as you may know is written in Python. The thing with an interpreted language is that it will always stay interpreted and so always needs an interpreter that translates its bytecode into assembly that can actually be executed. A big problem (or feature, it depends) is that it's really hard to hide your source code.
So companies producing commercial software and want to keep their source code confidential while still distributing it to clients need to develop techniques of hiding their source code. Of course, this is nothing more than plain old obfuscation. I don't want to go into details here, but you should really read that paper if you are interested in this topic - it is very detailed and even provides source code so you can validate some of their findings.
Anyway, this paper triggered my idea for a challenge that modifies the interpreter in such a way that the usual tools cannot easily recover the source code from the modules. If you read the paper, you will find that they use lots of advanced techniques such as alteration of the bytecode, a different magic number and lots more. But for my first attempt on playing with the interpreter, I wanted to keep things simple, so I decided to focus on the bytecode alteration.
With that knowledge in mind, let's first get started on solving it and afterwards I will explain how I built it and what problems I faced.
The Challenge
The challenge description gives you this:
We managed to get this package of the robots servers. We managed to determine that it is some kind of compiled bytecode. But something is wrong with it. Our usual analysis failed - so we have to hand this over to you pros. We only know this: The program takes one parameter and it responds with "Yup" if you have found the secret code, with "Nope" else. We expect it should be obvious how to execute it.
Here is the challenge: https://ctf.fluxfingers.net/static/downloads/whats_wrong/hello.tar.gz
This text gives us not much to get started, but it already hints that this is bytecode and it seems to have been tinkered with in some way. Additionally, we learn that the program needs one parameter and we know the response to look for. Now let's download the file and extract it:
hello
├── _bisect.so
├── _codecs_cn.so
├── _codecs_hk.so
├── _codecs_iso2022.so
├── _codecs_jp.so
├── _codecs_kr.so
├── _codecs_tw.so
├── _collections.so
├── _ctypes.so
├── _functools.so
├── _hashlib.so
├── _heapq.so
├── _io.so
├── _locale.so
├── _multibytecodec.so
├── _multiprocessing.so
├── _random.so
├── _socket.so
├── _ssl.so
├── _struct.so
├── array.so
├── binascii.so
├── bz2.so
├── cPickle.so
├── cStringIO.so
├── datetime.so
├── fcntl.so
├── grp.so
├── hello
├── itertools.so
├── libbz2.so.1.0
├── libcrypto.so.1.0.0
├── libncursesw.so.5
├── library.zip
├── libreadline.so.6
├── libssl.so.1.0.0
├── libz.so.1
├── math.so
├── mmap.so
├── operator.so
├── py
├── pyexpat.so
├── readline.so
├── select.so
├── strop.so
├── termios.so
├── time.so
├── unicodedata.so
└── zlib.so
0 directories, 49 files
Thie first thing you notice is that there are a lot of .so-files. These are compiled C extensions so this is obviously CPython. The archive and the folder were named hello, so let's execute that first.
$ ./hello test
Nope
And that's all we see. Now let's see which files are no .so-files:
hello
├── hello
├── library.zip
├── py
0 directories, 49 files
Notice the py-file? Well, it's executable, so let's see what we get:
$ ./py
Python 2.7.5+ (2.7:1a65bb15dedf+, Sep 24 2013, 01:47:00)
[GCC 4.8.1 20130725 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(MyConsole)
>>
Okay, so this is Python 2.7 (Notice the +? It means it is build from the development tree. Nothing important just a nice thing to keep in mind). So now we know that this all is most likely a python program. But the hello file is no python file is it?
$ file hello
hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=1d17b89e08aec60eabacc54e414e35128ee9ec26, stripped
And some of you may already know how you translate a bytecode language into an executable: You zip the bytecode files and prepend an executable header that extracts it in-memory and can then run it from there, so let's take a look:
$ unzip -l hello
Archive: hello
Length Date Time Name
--------- ---------- ----- ----
3021 2013-09-24 01:48 __main__.pyc
3160 2013-09-24 01:48 __main__py__.pyc
907 2013-09-24 01:50 __main__hello__.pyc
...
266 2013-09-24 01:48 xml/parsers/expat.pyc
40661 2013-09-24 01:48 xmlrpclib.pyc
40045 2013-09-24 01:48 zipfile.pyc
--------- -------
2221257 259 files
I have shortened the output, but you should already see the most interesting part: All files are .pyc (python bytecode) files and one has main and hello in it, so let's just run it:
$ python2 __main__hello__.pyc
Traceback (most recent call last):
File "__main__hello__.py", line 2, in <module>
File "hashlib.py", line 63, in <module>
__all__ = __always_supported + ('new', 'algorithms')
TypeError: unsupported operand type(s) for %: 'tuple' and 'tuple'
You probably already figured this can't work as I already told you the bytecode is modified. And this error message doesn't help at all: It's totally weird. I initially planned to include a modified magic number, which would have yielded a better error message but in the end I went without it. So there you go: Weird error message. Now let's try our usual tool for de-obfuscation: uncompyle2:
$ uncompyle2 __main__hello__.pyc
# 2013.11.08 16:40:11 CET
#Embedded file name: __main__hello__.py
--- This code section failed: ---
0 LOAD_CONST -1
3 LOAD_CONST None
6 IMPORT_NAME 'sys'
9 STORE_NAME 'sys'
12 LOAD_CONST -1
15 LOAD_CONST ('sha256',)
18 IMPORT_NAME 'hashlib'
21 IMPORT_FROM 'sha256'
24 STORE_NAME 'sha256'
27 ROT_TWO None
...
129 IMPORT_STAR None
Syntax error at or near `ROT_TWO' token at offset 27
### Can't uncompyle __main__hello__.pyc
Traceback (most recent call last):
File "uncompyle2/__init__.py", line 197, in main
uncompyle_file(infile, outstream, showasm, showast, deob)
File "uncompyle2/__init__.py", line 130, in uncompyle_file
uncompyle(version, co, outstream, showasm, showast, deob)
File "uncompyle2/__init__.py", line 98, in uncompyle
ast = walker.build_ast(tokens, customize)
File "uncompyle2/Walker.py", line 1436, in build_ast
raise ParserError(e, tokens)
ParserError: --- This code section failed: ---
0 LOAD_CONST -1
3 LOAD_CONST None
6 IMPORT_NAME 'sys'
9 STORE_NAME 'sys'
...
129 IMPORT_STAR None
Syntax error at or near `ROT_TWO' token at offset 27
# decompiled 0 files: 0 okay, 1 failed, 0 verify failed
# 2013.11.08 16:40:11 CET
Remember what I said before about always needing an interpreter? This program cannot run without an interpreter and in this situation it is the modified one. It has to be present as it executes these files. Remember py?
$ ../py __main__hello__.pyc test
Traceback (most recent call last):
File "<string>", line 6, in <module>
File "__main__.py", line 126, in <module>
File "__main__py__.py", line 60, in <module>
File "__main__hello__.pyc", line 1
SyntaxError: Non-ASCII character '\xf3' in file __main__hello__.pyc on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details
That does not work, because python expects a .py file but we give it .pyc. It will only work if it is imported, so we can do that from our interpreter (py).
Note
I have no idea why it worked above with python2 but not here, I would assume the above must also fail, but it doesn't.
Python 2.7.5+ (2.7:1a65bb15dedf+, Sep 24 2013, 01:47:00)
[GCC 4.8.1 20130725 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(MyConsole)
>>> import __main__hello__
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "__main__hello__.py", line 19, in <module>
IndexError: list index out of range
[0.02s]
>>> import sys
>>> sys.argv.append('test')
>>> import __main__hello__
Nope
Okay that seems to work, so we can run this program even if it is extracted. Now let's combine our approaches: Try to use our py interpreter together with uncompyle2 against the __main__hello__.pyc (note that I needed to make slight path alterations to get this to run, I leave it up to you to figure this out):
$ ./hello/py uncompyle2/scripts/uncompyle2 hello/hello-dir/__main__hello__.pyc
# 2013.11.08 16:55:50 CET
#Embedded file name: __main__hello__.py
import sys
from hashlib import sha256
import dis, multiprocessing, UserList
def encrypt_string(s):
new_str = []
for index, c in enumerate(s):
if index == 0:
new_str.append(rot_chr(c, 10))
else:
new_str.append(rot_chr(c, ord(new_str[index - 1])))
return ''.join(new_str)
def rot_chr(c, amount):
return chr((ord(c) - 33 + amount) % 94 + 33)
SECRET = 'w*0;CNU[\gwPWk}3:PWk"#&:ABu/:Hi,M'
if encrypt_string(sys.argv[1]) == SECRET:
print 'Yup'
else:
print 'Nope'
+++ okay decompyling hello/hello-dir/__main__hello__.pyc
# decompiled 1 files: 1 okay, 0 failed, 0 verify failed
# 2013.11.08 16:55:50 CET
Wohoo, we have source code. The rest is easy, reverse the algorithm and find out that the debofuscated secret is modified_in7erpreters_are_3vil!!!. So that's it. Now let's look at some problems and tweaks in the challenge.
Creation of the Challenge
Two parts of the challenge are very important to make it work the way it did:
- Modfiying the interpreter in such a way that it does not break
- Build the challenge in such a way that all necessary resources are created along with it.
Modifying the Interperter
The modification of the interpreter is actually pretty easy if you know where you have to look. You will easily find Include/opcode.h in the CPython source code. If you modify only this, the challenge will work, but reversing won't work, because opcodes are defined in two places, the other one being Lib/opcode.py.
In both files I just went a ahead and tweaked the opcodes a little bit, the pattern being mostly rotating blocks of numbers. In the end the total diff is very small but the result is an interpeter that produces source code that cannot be run on the usual Python interpreter.
diff -r 1a65bb15dedf Include/opcode.h
--- a/Include/opcode.h Thu Sep 05 00:26:15 2013 +0200
+++ b/Include/opcode.h Fri Nov 08 17:03:20 2013 +0100
@@ -7,33 +7,33 @@
/* Instruction opcodes for compiled code */
-#define STOP_CODE 0
-#define POP_TOP 1
-#define ROT_TWO 2
-#define ROT_THREE 3
-#define DUP_TOP 4
-#define ROT_FOUR 5
-#define NOP 9
+#define STOP_CODE 5
+#define POP_TOP 2
+#define ROT_TWO 3
+#define ROT_THREE 7
+#define DUP_TOP 1
+#define ROT_FOUR 0
+#define NOP 6
-#define UNARY_POSITIVE 10
-#define UNARY_NEGATIVE 11
-#define UNARY_NOT 12
-#define UNARY_CONVERT 13
+#define UNARY_POSITIVE 11
+#define UNARY_NEGATIVE 12
+#define UNARY_NOT 13
+#define UNARY_CONVERT 14
#define UNARY_INVERT 15
-#define BINARY_POWER 19
+#define BINARY_POWER 18
-#define BINARY_MULTIPLY 20
-#define BINARY_DIVIDE 21
-#define BINARY_MODULO 22
-#define BINARY_ADD 23
-#define BINARY_SUBTRACT 24
-#define BINARY_SUBSCR 25
-#define BINARY_FLOOR_DIVIDE 26
-#define BINARY_TRUE_DIVIDE 27
-#define INPLACE_FLOOR_DIVIDE 28
-#define INPLACE_TRUE_DIVIDE 29
+#define BINARY_MULTIPLY 19
+#define BINARY_DIVIDE 20
+#define BINARY_MODULO 21
+#define BINARY_ADD 22
+#define BINARY_SUBTRACT 23
+#define BINARY_SUBSCR 24
+#define BINARY_FLOOR_DIVIDE 25
+#define BINARY_TRUE_DIVIDE 26
+#define INPLACE_FLOOR_DIVIDE 27
+#define INPLACE_TRUE_DIVIDE 28
#define SLICE 30
/* Also uses 31-33 */
@@ -44,43 +44,43 @@
#define DELETE_SLICE 50
/* Also uses 51-53 */
-#define STORE_MAP 54
-#define INPLACE_ADD 55
-#define INPLACE_SUBTRACT 56
-#define INPLACE_MULTIPLY 57
-#define INPLACE_DIVIDE 58
-#define INPLACE_MODULO 59
-#define STORE_SUBSCR 60
-#define DELETE_SUBSCR 61
+#define STORE_MAP 55
+#define INPLACE_ADD 56
+#define INPLACE_SUBTRACT 57
+#define INPLACE_MULTIPLY 58
+#define INPLACE_DIVIDE 59
+#define INPLACE_MODULO 54
+#define STORE_SUBSCR 61
+#define DELETE_SUBSCR 62
-#define BINARY_LSHIFT 62
-#define BINARY_RSHIFT 63
-#define BINARY_AND 64
-#define BINARY_XOR 65
-#define BINARY_OR 66
-#define INPLACE_POWER 67
-#define GET_ITER 68
+#define BINARY_LSHIFT 63
+#define BINARY_RSHIFT 64
+#define BINARY_AND 65
+#define BINARY_XOR 66
+#define BINARY_OR 67
+#define INPLACE_POWER 68
+#define GET_ITER 69
-#define PRINT_EXPR 70
-#define PRINT_ITEM 71
-#define PRINT_NEWLINE 72
-#define PRINT_ITEM_TO 73
-#define PRINT_NEWLINE_TO 74
-#define INPLACE_LSHIFT 75
-#define INPLACE_RSHIFT 76
-#define INPLACE_AND 77
-#define INPLACE_XOR 78
-#define INPLACE_OR 79
-#define BREAK_LOOP 80
-#define WITH_CLEANUP 81
-#define LOAD_LOCALS 82
-#define RETURN_VALUE 83
-#define IMPORT_STAR 84
-#define EXEC_STMT 85
-#define YIELD_VALUE 86
-#define POP_BLOCK 87
-#define END_FINALLY 88
-#define BUILD_CLASS 89
+#define PRINT_EXPR 71
+#define PRINT_ITEM 72
+#define PRINT_NEWLINE 73
+#define PRINT_ITEM_TO 74
+#define PRINT_NEWLINE_TO 75
+#define INPLACE_LSHIFT 76
+#define INPLACE_RSHIFT 77
+#define INPLACE_AND 78
+#define INPLACE_XOR 79
+#define INPLACE_OR 70
+#define BREAK_LOOP 81
+#define WITH_CLEANUP 82
+#define LOAD_LOCALS 83
+#define RETURN_VALUE 84
+#define IMPORT_STAR 85
+#define EXEC_STMT 86
+#define YIELD_VALUE 87
+#define POP_BLOCK 88
+#define END_FINALLY 89
+#define BUILD_CLASS 80
#define HAVE_ARGUMENT 90 /* Opcodes from here have an argument: */
diff -r 1a65bb15dedf Lib/opcode.py
--- a/Lib/opcode.py Thu Sep 05 00:26:15 2013 +0200
+++ b/Lib/opcode.py Fri Nov 08 17:03:20 2013 +0100
@@ -43,32 +43,32 @@
# Instruction opcodes for compiled code
# Blank lines correspond to available opcodes
-def_op('STOP_CODE', 0)
-def_op('POP_TOP', 1)
-def_op('ROT_TWO', 2)
-def_op('ROT_THREE', 3)
-def_op('DUP_TOP', 4)
-def_op('ROT_FOUR', 5)
+def_op('STOP_CODE', 5)
+def_op('POP_TOP', 2)
+def_op('ROT_TWO', 3)
+def_op('ROT_THREE', 7)
+def_op('DUP_TOP', 1)
+def_op('ROT_FOUR', 0)
-def_op('NOP', 9)
-def_op('UNARY_POSITIVE', 10)
-def_op('UNARY_NEGATIVE', 11)
-def_op('UNARY_NOT', 12)
-def_op('UNARY_CONVERT', 13)
+def_op('NOP', 6)
+def_op('UNARY_POSITIVE', 11)
+def_op('UNARY_NEGATIVE', 12)
+def_op('UNARY_NOT', 13)
+def_op('UNARY_CONVERT', 14)
def_op('UNARY_INVERT', 15)
-def_op('BINARY_POWER', 19)
-def_op('BINARY_MULTIPLY', 20)
-def_op('BINARY_DIVIDE', 21)
-def_op('BINARY_MODULO', 22)
-def_op('BINARY_ADD', 23)
-def_op('BINARY_SUBTRACT', 24)
-def_op('BINARY_SUBSCR', 25)
-def_op('BINARY_FLOOR_DIVIDE', 26)
-def_op('BINARY_TRUE_DIVIDE', 27)
-def_op('INPLACE_FLOOR_DIVIDE', 28)
-def_op('INPLACE_TRUE_DIVIDE', 29)
+def_op('BINARY_POWER', 18)
+def_op('BINARY_MULTIPLY', 19)
+def_op('BINARY_DIVIDE', 20)
+def_op('BINARY_MODULO', 21)
+def_op('BINARY_ADD', 22)
+def_op('BINARY_SUBTRACT', 23)
+def_op('BINARY_SUBSCR', 24)
+def_op('BINARY_FLOOR_DIVIDE', 25)
+def_op('BINARY_TRUE_DIVIDE', 26)
+def_op('INPLACE_FLOOR_DIVIDE', 27)
+def_op('INPLACE_TRUE_DIVIDE', 28)
def_op('SLICE+0', 30)
def_op('SLICE+1', 31)
def_op('SLICE+2', 32)
@@ -84,42 +84,42 @@
def_op('DELETE_SLICE+2', 52)
def_op('DELETE_SLICE+3', 53)
-def_op('STORE_MAP', 54)
-def_op('INPLACE_ADD', 55)
-def_op('INPLACE_SUBTRACT', 56)
-def_op('INPLACE_MULTIPLY', 57)
-def_op('INPLACE_DIVIDE', 58)
-def_op('INPLACE_MODULO', 59)
-def_op('STORE_SUBSCR', 60)
-def_op('DELETE_SUBSCR', 61)
-def_op('BINARY_LSHIFT', 62)
-def_op('BINARY_RSHIFT', 63)
-def_op('BINARY_AND', 64)
-def_op('BINARY_XOR', 65)
-def_op('BINARY_OR', 66)
-def_op('INPLACE_POWER', 67)
-def_op('GET_ITER', 68)
+def_op('STORE_MAP', 55)
+def_op('INPLACE_ADD', 56)
+def_op('INPLACE_SUBTRACT', 57)
+def_op('INPLACE_MULTIPLY', 58)
+def_op('INPLACE_DIVIDE', 59)
+def_op('INPLACE_MODULO', 54)
+def_op('STORE_SUBSCR', 61)
+def_op('DELETE_SUBSCR', 62)
+def_op('BINARY_LSHIFT', 63)
+def_op('BINARY_RSHIFT', 64)
+def_op('BINARY_AND', 65)
+def_op('BINARY_XOR', 66)
+def_op('BINARY_OR', 67)
+def_op('INPLACE_POWER', 68)
+def_op('GET_ITER', 69)
-def_op('PRINT_EXPR', 70)
-def_op('PRINT_ITEM', 71)
-def_op('PRINT_NEWLINE', 72)
-def_op('PRINT_ITEM_TO', 73)
-def_op('PRINT_NEWLINE_TO', 74)
-def_op('INPLACE_LSHIFT', 75)
-def_op('INPLACE_RSHIFT', 76)
-def_op('INPLACE_AND', 77)
-def_op('INPLACE_XOR', 78)
-def_op('INPLACE_OR', 79)
-def_op('BREAK_LOOP', 80)
-def_op('WITH_CLEANUP', 81)
-def_op('LOAD_LOCALS', 82)
-def_op('RETURN_VALUE', 83)
-def_op('IMPORT_STAR', 84)
-def_op('EXEC_STMT', 85)
-def_op('YIELD_VALUE', 86)
-def_op('POP_BLOCK', 87)
-def_op('END_FINALLY', 88)
-def_op('BUILD_CLASS', 89)
+def_op('PRINT_EXPR', 71)
+def_op('PRINT_ITEM', 72)
+def_op('PRINT_NEWLINE', 73)
+def_op('PRINT_ITEM_TO', 74)
+def_op('PRINT_NEWLINE_TO', 75)
+def_op('INPLACE_LSHIFT', 76)
+def_op('INPLACE_RSHIFT', 77)
+def_op('INPLACE_AND', 78)
+def_op('INPLACE_XOR', 79)
+def_op('INPLACE_OR', 70)
+def_op('BREAK_LOOP', 81)
+def_op('WITH_CLEANUP', 82)
+def_op('LOAD_LOCALS', 83)
+def_op('RETURN_VALUE', 84)
+def_op('IMPORT_STAR', 85)
+def_op('EXEC_STMT', 86)
+def_op('YIELD_VALUE', 87)
+def_op('POP_BLOCK', 88)
+def_op('END_FINALLY', 89)
+def_op('BUILD_CLASS', 80)
HAVE_ARGUMENT = 90 # Opcodes from here have an argument:
And that's how easily you can modify the Python interpreter.
Conclusion
This was a pretty long article, escpecially with a lot of code to read. However, I believe it is worth being shown in this great detail. We could see that it is very easy to modify the interpreter but also not too hard to get the source code back from it. I also believe that it is not possible to hide the source code at all while using Python in this way. The interpreter needs at least the byte code and from that you will always be able to recover most of the source code given the interpreter itself is not changed in such a way that it works completely different.
If you have any questions please let me know in the comments.