Source: Immunity Blog

Immunity Blog Solving Sokohashv2.0 full of Angr on Ekoparty 2016

In memory of Pocho, the Panther. RIP buddy.Last Ekoparty I got the chance to try my luck solving one of Core Security challenges. For a while I have been hearing great things about angr so I wanted to get to the solution using it.Angr is a Python binary analysis framework that provides a quite clean API to a powerful symbolic executor (built around vex intermediate code) and a solving back-end (provided by a z3 wrapper called claripy). I have to admit that it took me a while to solve the challenge (I did not win :( ), I have only my own very limited intellect and lack of familiarity with the tool to blame. But what I lack in intelligence I have in perseverance so let us solve this with, as always, maximum effort.The Sokohash V2 challenge imitates the rules of the known Sokoban puzzle game (https://en.wikipedia.org/wiki/Sokoban). Through a Rogue-like interface you get four "boxes" (x,y,z,w) that the player has to move to their corresponding correct positions. Sokohash interface - it is a bit off because I ran it in Wine :PUnlike Sokoban there are no limits to the number of steps or pushes the player can make. In Sokohash, the game assigns a hash to each position on the board, then on each movement it takes the hashes of each box and calculates a global hash with the current state of the board. The objective of the challenge is to have that global hash match the winning hash:C03922D0206DC3A33016010D6C66936E953ABAB9000010AE805CE8463CBE9A2DAs I want to get a sense of what angr can do for me, I'll keep my binary analysis to a minimum. I basically have the following needs:Identify were the global hash is calculatedIdentify were the hash was stored.Identify the function input and verify how it was related to the boxes coordinates/hashes.Model the hash calculation with angr.First we need to locate the function that calculates the hash. This is a simple step, we take a look at the strings with IDA and "Calculating hash..." takes us to the right spot.After that, It was pretty straightforward to locate the function input by putting a breakpoint at the correct place, in my case 0x40101a (*). After the canary the function copies 32 bytes from stack and a uses them as local data. Just by looking at the process screen is evident that those 32 bytes correspond to the hashes of each of the boxes, 8 bytes each, the position of the player seems to be irrelevant. (*) I run the Win32 binary with Wine and debugged with GDB. Why? Because I was feeling wild and reckless.Then we look for the place were the calculated hash is stored, this is also easy by using IDA/GDB. So now we know what the input is and where the output is stored. We might as well spin angr and see what it says. We are going to tell angr to find a path between start_addr and to_find. As the input values that will cause the function to walk that path will evidently be not unique (there are barely any code constraints between the start and end) we will also need to tell angr that we expect the input values to produce the winning hash by setting specific logical constraints on the resulting path.As the output hash is not stored in memory in a linear fashion I just made a helper Python function help set up the constraints on the memory output.import angrimport sysimport simuveximport structfrom itertools import combinations, productWIN_HASH = "C03922D0206DC3A33016010D6C66936E953ABAB9000010AE805CE8463CBE9A2D".decode("hex")def do_nothing(state): passdef do_repmovsd(state): # angr does not like rep movsd # we do it by hand buffer = state.memory.load(state.regs.esi, 8 * 4) state.memory.store(state.regs.edi, buffer)def get_hash_map(init_addr): # Helper function to get each address were a byte of # of the calculated hash is going to be stored in the # proper order addr = init_addr hash_map = [] for i in xrange(0, len(WIN_HASH), 2): pair = WIN_HASH[i:i+2] hash_map.append((addr, ord(pair[1]))) hash_map.append((addr+1, ord(pair[0]))) addr += 8 return hash_mapdef main(): proj = angr.Project('sokohashv2.0.exe', use_sim_procedures=True, load_options={"auto_load_libs": False}) # ADDRS main = 0x401013 # The start of our path, just after the canary to_find = 0x0040123E # The end, just before the security check hash_addr = 0x04216C0 # The address were the output hash will be stored # HOOKS - We avoid any function call that does not alter # our result func_hooks = [0x0040102C, 0x0401033] for addr in func_hooks: proj.hook(addr, do_nothing, length=6) func_hooks = [0x401215, 0x40121E, 0x401239, 0x40123C] for addr in func_hooks: proj.hook(addr, do_nothing, length=2) # We need to hook the rep movsd because the symbolic execution # fails to resolve it. We model it by hand proj.hook(0x0401028, do_repmovsd, length=2) proj.hook(0x0401253, do_nothing, length=5) proj.hook(0x040103E, do_nothing, length=5) proj.hook(0x0401225, do_nothing, length=5) proj.hook(0x0401243, do_nothing, length=5) # initial state init = proj.factory.blank_state(addr=main) # we set some registers to get a context closer to real world init.regs.ebp = init.regs.esp + 0x78 buffer = init.memory.load(init.regs.ebp + 0x8, 0x20) pg = proj.factory.path_group(init, threads=8) pg.explore(find=to_find) path = pg.found[0] found = path.state # We print the expected hash for verification conds = [] expected = [] hash_map = get_hash_map(hash_addr) for addr, value in hash_map: memory = found.memory.load(addr, 1, endness=proj.arch.memory_endness) # Here we declare that each byte in the output hash must be # part of the winning hash conds.append((memory == value)) expected.append((hex(addr), hex(value))) print "Expected is '%s'\n\n" % expected found.add_constraints(init.se.And(*conds)) # We print the resulting hash for verification result = [] hash_map = get_hash_map(hash_addr) for addr, value in hash_map: buf_ptr = found.memory.load(addr, 1) possible = found.se.any_int(buf_ptr) result.append((hex(addr), "0x%x" % possible)) print "Result is '%s'\n\n" % result # Print solutions possible = found.se.any_n_int(buffer, 1) # We ask for the first solution for i, f in enumerate(possible): out = "%x" % f if len(out) < (0x20*2): continue names = ["x","y","z","w"] values = [] for j in xrange(0, len(out), 16): value = out[j:j+16] unpk_value = struct.unpack("<Q", value.decode("hex"))[0] values.append((names[j//16], "%.16x" % unpk_value)) print "\tSolution %d: %s" % (i, values)if __name__ == '__main__': #angr.path_group.l.setLevel('DEBUG') main()If we run it we get:Bingo! But if we check the solution on the actual game we won't get the winning hash. We can think that maybe there are more than one solution and just ask the solver for more of them by using the line:...possible = found.se.any_n_int(buffer, 10) # Give me 10 solutions if possible...With lots of solutions checking each one will be very time consuming and we may never arrive at the correct one. As we narrow the output using constraints maybe we also need to narrow the input. To do this we need to ask ourselves the question: What is the relationship between the position hash and the box coordinates? Turns out that by looking at the caller of "x_calc_hash" we can see that the coordinates are used to read data from memory. If we take a look at the memory region where the data is read from it is full of hashes, turns out that each hash correspond to a board position. Then it will be ideal to find were those hashes are stored, dump them and then using them to add the proper constraints. We do that finding the first and last hash and then finding their memory address with GDB.We can easily map each hash to a coordinate using python but first we need the x,y coordinate pairs. The board is 27x17 but we also need to consider there are positions on the board that cannot be reached (the portal, the letter in "CORE", inside letters). To get an accurate coordinate map we need to exclude these positions from it, for example, using the function below:def get_valid_coords(): var = """# ## O ## x ## w ## * ## ## #### ###### ### ### # ## ## ## ## ## ### ## ## ## ## ## ### ## ## ##### #### ### ## ## ## ## ## ### # ## ## ## ## ## ## ## #### ## ## ### ## ## ## yz ## O #/ ## #""" valid = [] invalid = (list(product(range(7,12),[9,10])) + list(product([7,8],[17,18])) ) x = 1 for line in var.splitlines(): line = line.strip() line = line[1:len(line)-1] y = 1 for i in line: if i not in ["O", "#", "/"]: if (x,y) not in invalid: valid.append((x,y)) y += 1 x += 1 return validAfter we just need to load our memory dump and read the hashes, I will also store the dump in symbolic memory in case angr accesses it. def do_memset(state): addr = 0x417490 with open("matrix.bin","rb") as f: content = f.read() for i in content: state.memory.store(addr, state.se.BVV(ord(i), 8 * 1)) addr += 1 start_off = 0x41d450 - addr end_off = 0x41e0c8 - addr coords = [] for i in xrange(start_off, end_off+8, 8): coords.append(struct.unpack("<Q", content[i:i+8])[0]) return coords After this we just map hashes to coordinates: coords = do_memset(init) coord_dict = {} count = 0 for i in get_valid_coords():

Read full article »
Est. Annual Revenue
$5.0-25M
Est. Employees
25-100
Dave Aitel's photo - CEO of Immunity

CEO

Dave Aitel

CEO Approval Rating

70/100

Read more