hack.lu 2013 - "What's wrong with this?" Writeup

Sat 11 January 2014 by Javex

Note

This article took some time to get published. I wrote it in November but I had to put off finishing it until now. Sorry for the delay.

So the hack.lu 2013 CTF is over and it was a blast again. Thanks to everyone for their participation and especially for the feedback. One thing I noticed when reading the writeups out there is, that no writeup seems to have gone the intended way of solving my "What's wrong with this?" challenge. I am really amazed to see the solutions you guys found but I also want to present the way it was intended, so here goes a writeup of my own challenge.

The Idea

The idea for the challenge came to me when reading Looking Inside the (Drop) Box. The paper successfully deobfuscates the Dropbox source code, which as you may know is written in Python. The thing with an interpreted language is that it will always stay interpreted and so always needs an interpreter that translates its bytecode into assembly that can actually be executed. A big problem (or feature, it depends) is that it's really hard to hide your source code.

So companies producing commercial software and want to keep their source code confidential while still distributing it to clients need to develop techniques of hiding their source code. Of course, this is nothing more than plain old obfuscation. I don't want to go into details here, but you should really read that paper if you are interested in this topic - it is very detailed and even provides source code so you can validate some of their findings.

Anyway, this paper triggered my idea for a challenge that modifies the interpreter in such a way that the usual tools cannot easily recover the source code from the modules. If you read the paper, you will find that they use lots of advanced techniques such as alteration of the bytecode, a different magic number and lots more. But for my first attempt on playing with the interpreter, I wanted to keep things simple, so I decided to focus on the bytecode alteration.

With that knowledge in mind, let's first get started on solving it and afterwards I will explain how I built it and what problems I faced.

The Challenge

The challenge description gives you this:

We managed to get this package of the robots servers. We managed to determine that it is some kind of compiled bytecode. But something is wrong with it. Our usual analysis failed - so we have to hand this over to you pros. We only know this: The program takes one parameter and it responds with "Yup" if you have found the secret code, with "Nope" else. We expect it should be obvious how to execute it.

Here is the challenge: https://ctf.fluxfingers.net/static/downloads/whats_wrong/hello.tar.gz

This text gives us not much to get started, but it already hints that this is bytecode and it seems to have been tinkered with in some way. Additionally, we learn that the program needs one parameter and we know the response to look for. Now let's download the file and extract it:

hello
├── _bisect.so
├── _codecs_cn.so
├── _codecs_hk.so
├── _codecs_iso2022.so
├── _codecs_jp.so
├── _codecs_kr.so
├── _codecs_tw.so
├── _collections.so
├── _ctypes.so
├── _functools.so
├── _hashlib.so
├── _heapq.so
├── _io.so
├── _locale.so
├── _multibytecodec.so
├── _multiprocessing.so
├── _random.so
├── _socket.so
├── _ssl.so
├── _struct.so
├── array.so
├── binascii.so
├── bz2.so
├── cPickle.so
├── cStringIO.so
├── datetime.so
├── fcntl.so
├── grp.so
├── hello
├── itertools.so
├── libbz2.so.1.0
├── libcrypto.so.1.0.0
├── libncursesw.so.5
├── library.zip
├── libreadline.so.6
├── libssl.so.1.0.0
├── libz.so.1
├── math.so
├── mmap.so
├── operator.so
├── py
├── pyexpat.so
├── readline.so
├── select.so
├── strop.so
├── termios.so
├── time.so
├── unicodedata.so
└── zlib.so

0 directories, 49 files

Thie first thing you notice is that there are a lot of .so-files. These are compiled C extensions so this is obviously CPython. The archive and the folder were named hello, so let's execute that first.

$ ./hello test
Nope

And that's all we see. Now let's see which files are no .so-files:

hello
├── hello
├── library.zip
├── py

0 directories, 49 files

Notice the py-file? Well, it's executable, so let's see what we get:

$ ./py
Python 2.7.5+ (2.7:1a65bb15dedf+, Sep 24 2013, 01:47:00)
[GCC 4.8.1 20130725 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(MyConsole)
>>

Okay, so this is Python 2.7 (Notice the +? It means it is build from the development tree. Nothing important just a nice thing to keep in mind). So now we know that this all is most likely a python program. But the hello file is no python file is it?

$ file hello
hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=1d17b89e08aec60eabacc54e414e35128ee9ec26, stripped

And some of you may already know how you translate a bytecode language into an executable: You zip the bytecode files and prepend an executable header that extracts it in-memory and can then run it from there, so let's take a look:

$ unzip -l hello
Archive:  hello
  Length      Date    Time    Name
---------  ---------- -----   ----
     3021  2013-09-24 01:48   __main__.pyc
     3160  2013-09-24 01:48   __main__py__.pyc
      907  2013-09-24 01:50   __main__hello__.pyc
      ...
      266  2013-09-24 01:48   xml/parsers/expat.pyc
    40661  2013-09-24 01:48   xmlrpclib.pyc
    40045  2013-09-24 01:48   zipfile.pyc
---------                     -------
  2221257                     259 files

I have shortened the output, but you should already see the most interesting part: All files are .pyc (python bytecode) files and one has main and hello in it, so let's just run it:

$ python2 __main__hello__.pyc
Traceback (most recent call last):
  File "__main__hello__.py", line 2, in <module>
  File "hashlib.py", line 63, in <module>
    __all__ = __always_supported + ('new', 'algorithms')
TypeError: unsupported operand type(s) for %: 'tuple' and 'tuple'

You probably already figured this can't work as I already told you the bytecode is modified. And this error message doesn't help at all: It's totally weird. I initially planned to include a modified magic number, which would have yielded a better error message but in the end I went without it. So there you go: Weird error message. Now let's try our usual tool for de-obfuscation: uncompyle2:

$ uncompyle2 __main__hello__.pyc
# 2013.11.08 16:40:11 CET
#Embedded file name: __main__hello__.py
--- This code section failed: ---

0   LOAD_CONST        -1
3   LOAD_CONST        None
6   IMPORT_NAME       'sys'
9   STORE_NAME        'sys'

12  LOAD_CONST        -1
15  LOAD_CONST        ('sha256',)
18  IMPORT_NAME       'hashlib'
21  IMPORT_FROM       'sha256'
24  STORE_NAME        'sha256'
27  ROT_TWO           None

...
129 IMPORT_STAR       None


Syntax error at or near `ROT_TWO' token at offset 27
### Can't uncompyle __main__hello__.pyc
Traceback (most recent call last):
  File "uncompyle2/__init__.py", line 197, in main
    uncompyle_file(infile, outstream, showasm, showast, deob)
  File "uncompyle2/__init__.py", line 130, in uncompyle_file
    uncompyle(version, co, outstream, showasm, showast, deob)
  File "uncompyle2/__init__.py", line 98, in uncompyle
    ast = walker.build_ast(tokens, customize)
  File "uncompyle2/Walker.py", line 1436, in build_ast
    raise ParserError(e, tokens)
ParserError: --- This code section failed: ---

0   LOAD_CONST        -1
3   LOAD_CONST        None
6   IMPORT_NAME       'sys'
9   STORE_NAME        'sys'

...
129 IMPORT_STAR       None

Syntax error at or near `ROT_TWO' token at offset 27
# decompiled 0 files: 0 okay, 1 failed, 0 verify failed
# 2013.11.08 16:40:11 CET

Remember what I said before about always needing an interpreter? This program cannot run without an interpreter and in this situation it is the modified one. It has to be present as it executes these files. Remember py?

$ ../py __main__hello__.pyc test
Traceback (most recent call last):
  File "<string>", line 6, in <module>
  File "__main__.py", line 126, in <module>
  File "__main__py__.py", line 60, in <module>
  File "__main__hello__.pyc", line 1
SyntaxError: Non-ASCII character '\xf3' in file __main__hello__.pyc on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

That does not work, because python expects a .py file but we give it .pyc. It will only work if it is imported, so we can do that from our interpreter (py).

Note

I have no idea why it worked above with python2 but not here, I would assume the above must also fail, but it doesn't.

Python 2.7.5+ (2.7:1a65bb15dedf+, Sep 24 2013, 01:47:00)
[GCC 4.8.1 20130725 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(MyConsole)
>>> import __main__hello__
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "__main__hello__.py", line 19, in <module>
IndexError: list index out of range
[0.02s]
>>> import sys
>>> sys.argv.append('test')
>>> import __main__hello__
Nope

Okay that seems to work, so we can run this program even if it is extracted. Now let's combine our approaches: Try to use our py interpreter together with uncompyle2 against the __main__hello__.pyc (note that I needed to make slight path alterations to get this to run, I leave it up to you to figure this out):

$ ./hello/py uncompyle2/scripts/uncompyle2 hello/hello-dir/__main__hello__.pyc
# 2013.11.08 16:55:50 CET
#Embedded file name: __main__hello__.py
import sys
from hashlib import sha256
import dis, multiprocessing, UserList

def encrypt_string(s):
    new_str = []
    for index, c in enumerate(s):
        if index == 0:
            new_str.append(rot_chr(c, 10))
        else:
            new_str.append(rot_chr(c, ord(new_str[index - 1])))

    return ''.join(new_str)


def rot_chr(c, amount):
    return chr((ord(c) - 33 + amount) % 94 + 33)


SECRET = 'w*0;CNU[\gwPWk}3:PWk"#&:ABu/:Hi,M'
if encrypt_string(sys.argv[1]) == SECRET:
    print 'Yup'
else:
    print 'Nope'
+++ okay decompyling hello/hello-dir/__main__hello__.pyc
# decompiled 1 files: 1 okay, 0 failed, 0 verify failed
# 2013.11.08 16:55:50 CET

Wohoo, we have source code. The rest is easy, reverse the algorithm and find out that the debofuscated secret is modified_in7erpreters_are_3vil!!!. So that's it. Now let's look at some problems and tweaks in the challenge.

Creation of the Challenge

Two parts of the challenge are very important to make it work the way it did:

  1. Modfiying the interpreter in such a way that it does not break
  2. Build the challenge in such a way that all necessary resources are created along with it.

Modifying the Interperter

The modification of the interpreter is actually pretty easy if you know where you have to look. You will easily find Include/opcode.h in the CPython source code. If you modify only this, the challenge will work, but reversing won't work, because opcodes are defined in two places, the other one being Lib/opcode.py.

In both files I just went a ahead and tweaked the opcodes a little bit, the pattern being mostly rotating blocks of numbers. In the end the total diff is very small but the result is an interpeter that produces source code that cannot be run on the usual Python interpreter.

diff -r 1a65bb15dedf Include/opcode.h
--- a/Include/opcode.h      Thu Sep 05 00:26:15 2013 +0200
+++ b/Include/opcode.h      Fri Nov 08 17:03:20 2013 +0100
@@ -7,33 +7,33 @@

 /* Instruction opcodes for compiled code */

-#define STOP_CODE  0
-#define POP_TOP            1
-#define ROT_TWO            2
-#define ROT_THREE  3
-#define DUP_TOP            4
-#define ROT_FOUR   5
-#define NOP                9
+#define STOP_CODE  5
+#define POP_TOP            2
+#define ROT_TWO            3
+#define ROT_THREE  7
+#define DUP_TOP            1
+#define ROT_FOUR   0
+#define NOP                6

-#define UNARY_POSITIVE     10
-#define UNARY_NEGATIVE     11
-#define UNARY_NOT  12
-#define UNARY_CONVERT      13
+#define UNARY_POSITIVE     11
+#define UNARY_NEGATIVE     12
+#define UNARY_NOT  13
+#define UNARY_CONVERT      14

 #define UNARY_INVERT       15

-#define BINARY_POWER       19
+#define BINARY_POWER       18

-#define BINARY_MULTIPLY    20
-#define BINARY_DIVIDE      21
-#define BINARY_MODULO      22
-#define BINARY_ADD 23
-#define BINARY_SUBTRACT    24
-#define BINARY_SUBSCR      25
-#define BINARY_FLOOR_DIVIDE 26
-#define BINARY_TRUE_DIVIDE 27
-#define INPLACE_FLOOR_DIVIDE 28
-#define INPLACE_TRUE_DIVIDE 29
+#define BINARY_MULTIPLY    19
+#define BINARY_DIVIDE      20
+#define BINARY_MODULO      21
+#define BINARY_ADD 22
+#define BINARY_SUBTRACT    23
+#define BINARY_SUBSCR      24
+#define BINARY_FLOOR_DIVIDE 25
+#define BINARY_TRUE_DIVIDE 26
+#define INPLACE_FLOOR_DIVIDE 27
+#define INPLACE_TRUE_DIVIDE 28

 #define SLICE              30
 /* Also uses 31-33 */
@@ -44,43 +44,43 @@
 #define DELETE_SLICE       50
 /* Also uses 51-53 */

-#define STORE_MAP  54
-#define INPLACE_ADD        55
-#define INPLACE_SUBTRACT   56
-#define INPLACE_MULTIPLY   57
-#define INPLACE_DIVIDE     58
-#define INPLACE_MODULO     59
-#define STORE_SUBSCR       60
-#define DELETE_SUBSCR      61
+#define STORE_MAP  55
+#define INPLACE_ADD        56
+#define INPLACE_SUBTRACT   57
+#define INPLACE_MULTIPLY   58
+#define INPLACE_DIVIDE     59
+#define INPLACE_MODULO     54
+#define STORE_SUBSCR       61
+#define DELETE_SUBSCR      62

-#define BINARY_LSHIFT      62
-#define BINARY_RSHIFT      63
-#define BINARY_AND 64
-#define BINARY_XOR 65
-#define BINARY_OR  66
-#define INPLACE_POWER      67
-#define GET_ITER   68
+#define BINARY_LSHIFT      63
+#define BINARY_RSHIFT      64
+#define BINARY_AND 65
+#define BINARY_XOR 66
+#define BINARY_OR  67
+#define INPLACE_POWER      68
+#define GET_ITER   69

-#define PRINT_EXPR 70
-#define PRINT_ITEM 71
-#define PRINT_NEWLINE      72
-#define PRINT_ITEM_TO   73
-#define PRINT_NEWLINE_TO 74
-#define INPLACE_LSHIFT     75
-#define INPLACE_RSHIFT     76
-#define INPLACE_AND        77
-#define INPLACE_XOR        78
-#define INPLACE_OR 79
-#define BREAK_LOOP 80
-#define WITH_CLEANUP    81
-#define LOAD_LOCALS        82
-#define RETURN_VALUE       83
-#define IMPORT_STAR        84
-#define EXEC_STMT  85
-#define YIELD_VALUE        86
-#define POP_BLOCK  87
-#define END_FINALLY        88
-#define BUILD_CLASS        89
+#define PRINT_EXPR 71
+#define PRINT_ITEM 72
+#define PRINT_NEWLINE      73
+#define PRINT_ITEM_TO   74
+#define PRINT_NEWLINE_TO 75
+#define INPLACE_LSHIFT     76
+#define INPLACE_RSHIFT     77
+#define INPLACE_AND        78
+#define INPLACE_XOR        79
+#define INPLACE_OR 70
+#define BREAK_LOOP 81
+#define WITH_CLEANUP    82
+#define LOAD_LOCALS        83
+#define RETURN_VALUE       84
+#define IMPORT_STAR        85
+#define EXEC_STMT  86
+#define YIELD_VALUE        87
+#define POP_BLOCK  88
+#define END_FINALLY        89
+#define BUILD_CLASS        80

 #define HAVE_ARGUMENT      90      /* Opcodes from here have an argument: */

diff -r 1a65bb15dedf Lib/opcode.py
--- a/Lib/opcode.py Thu Sep 05 00:26:15 2013 +0200
+++ b/Lib/opcode.py Fri Nov 08 17:03:20 2013 +0100
@@ -43,32 +43,32 @@
 # Instruction opcodes for compiled code
 # Blank lines correspond to available opcodes

-def_op('STOP_CODE', 0)
-def_op('POP_TOP', 1)
-def_op('ROT_TWO', 2)
-def_op('ROT_THREE', 3)
-def_op('DUP_TOP', 4)
-def_op('ROT_FOUR', 5)
+def_op('STOP_CODE', 5)
+def_op('POP_TOP', 2)
+def_op('ROT_TWO', 3)
+def_op('ROT_THREE', 7)
+def_op('DUP_TOP', 1)
+def_op('ROT_FOUR', 0)

-def_op('NOP', 9)
-def_op('UNARY_POSITIVE', 10)
-def_op('UNARY_NEGATIVE', 11)
-def_op('UNARY_NOT', 12)
-def_op('UNARY_CONVERT', 13)
+def_op('NOP', 6)
+def_op('UNARY_POSITIVE', 11)
+def_op('UNARY_NEGATIVE', 12)
+def_op('UNARY_NOT', 13)
+def_op('UNARY_CONVERT', 14)

 def_op('UNARY_INVERT', 15)

-def_op('BINARY_POWER', 19)
-def_op('BINARY_MULTIPLY', 20)
-def_op('BINARY_DIVIDE', 21)
-def_op('BINARY_MODULO', 22)
-def_op('BINARY_ADD', 23)
-def_op('BINARY_SUBTRACT', 24)
-def_op('BINARY_SUBSCR', 25)
-def_op('BINARY_FLOOR_DIVIDE', 26)
-def_op('BINARY_TRUE_DIVIDE', 27)
-def_op('INPLACE_FLOOR_DIVIDE', 28)
-def_op('INPLACE_TRUE_DIVIDE', 29)
+def_op('BINARY_POWER', 18)
+def_op('BINARY_MULTIPLY', 19)
+def_op('BINARY_DIVIDE', 20)
+def_op('BINARY_MODULO', 21)
+def_op('BINARY_ADD', 22)
+def_op('BINARY_SUBTRACT', 23)
+def_op('BINARY_SUBSCR', 24)
+def_op('BINARY_FLOOR_DIVIDE', 25)
+def_op('BINARY_TRUE_DIVIDE', 26)
+def_op('INPLACE_FLOOR_DIVIDE', 27)
+def_op('INPLACE_TRUE_DIVIDE', 28)
 def_op('SLICE+0', 30)
 def_op('SLICE+1', 31)
 def_op('SLICE+2', 32)
@@ -84,42 +84,42 @@
 def_op('DELETE_SLICE+2', 52)
 def_op('DELETE_SLICE+3', 53)

-def_op('STORE_MAP', 54)
-def_op('INPLACE_ADD', 55)
-def_op('INPLACE_SUBTRACT', 56)
-def_op('INPLACE_MULTIPLY', 57)
-def_op('INPLACE_DIVIDE', 58)
-def_op('INPLACE_MODULO', 59)
-def_op('STORE_SUBSCR', 60)
-def_op('DELETE_SUBSCR', 61)
-def_op('BINARY_LSHIFT', 62)
-def_op('BINARY_RSHIFT', 63)
-def_op('BINARY_AND', 64)
-def_op('BINARY_XOR', 65)
-def_op('BINARY_OR', 66)
-def_op('INPLACE_POWER', 67)
-def_op('GET_ITER', 68)
+def_op('STORE_MAP', 55)
+def_op('INPLACE_ADD', 56)
+def_op('INPLACE_SUBTRACT', 57)
+def_op('INPLACE_MULTIPLY', 58)
+def_op('INPLACE_DIVIDE', 59)
+def_op('INPLACE_MODULO', 54)
+def_op('STORE_SUBSCR', 61)
+def_op('DELETE_SUBSCR', 62)
+def_op('BINARY_LSHIFT', 63)
+def_op('BINARY_RSHIFT', 64)
+def_op('BINARY_AND', 65)
+def_op('BINARY_XOR', 66)
+def_op('BINARY_OR', 67)
+def_op('INPLACE_POWER', 68)
+def_op('GET_ITER', 69)

-def_op('PRINT_EXPR', 70)
-def_op('PRINT_ITEM', 71)
-def_op('PRINT_NEWLINE', 72)
-def_op('PRINT_ITEM_TO', 73)
-def_op('PRINT_NEWLINE_TO', 74)
-def_op('INPLACE_LSHIFT', 75)
-def_op('INPLACE_RSHIFT', 76)
-def_op('INPLACE_AND', 77)
-def_op('INPLACE_XOR', 78)
-def_op('INPLACE_OR', 79)
-def_op('BREAK_LOOP', 80)
-def_op('WITH_CLEANUP', 81)
-def_op('LOAD_LOCALS', 82)
-def_op('RETURN_VALUE', 83)
-def_op('IMPORT_STAR', 84)
-def_op('EXEC_STMT', 85)
-def_op('YIELD_VALUE', 86)
-def_op('POP_BLOCK', 87)
-def_op('END_FINALLY', 88)
-def_op('BUILD_CLASS', 89)
+def_op('PRINT_EXPR', 71)
+def_op('PRINT_ITEM', 72)
+def_op('PRINT_NEWLINE', 73)
+def_op('PRINT_ITEM_TO', 74)
+def_op('PRINT_NEWLINE_TO', 75)
+def_op('INPLACE_LSHIFT', 76)
+def_op('INPLACE_RSHIFT', 77)
+def_op('INPLACE_AND', 78)
+def_op('INPLACE_XOR', 79)
+def_op('INPLACE_OR', 70)
+def_op('BREAK_LOOP', 81)
+def_op('WITH_CLEANUP', 82)
+def_op('LOAD_LOCALS', 83)
+def_op('RETURN_VALUE', 84)
+def_op('IMPORT_STAR', 85)
+def_op('EXEC_STMT', 86)
+def_op('YIELD_VALUE', 87)
+def_op('POP_BLOCK', 88)
+def_op('END_FINALLY', 89)
+def_op('BUILD_CLASS', 80)

 HAVE_ARGUMENT = 90              # Opcodes from here have an argument:

And that's how easily you can modify the Python interpreter.

Conclusion

This was a pretty long article, escpecially with a lot of code to read. However, I believe it is worth being shown in this great detail. We could see that it is very easy to modify the interpreter but also not too hard to get the source code back from it. I also believe that it is not possible to hide the source code at all while using Python in this way. The interpreter needs at least the byte code and from that you will always be able to recover most of the source code given the interpreter itself is not changed in such a way that it works completely different.

If you have any questions please let me know in the comments.


Comments