**2013/10/22**

This normally quiet page saw 6000 visits on the weekend after the Heartbleed OpenSSL bug announcement (April 12-13th 2014). Although the method described here could conceivably work to retrieve private keys via Hearbleed there are better approaches when working with a limited data leak. Rubin Xu's post presents a good summary via Coppersmith's attack.

Given a core dump from an application that has read an OpenSSL RSA private key, how easy is it to find and recover that private key? It turns out that it's not as hard as it might appear with a little python and some ELF-parsing magic courtesy of Eli Bendersky's pyelftools.

The steps were:

- Create a data structure to represent the process address space and populate it from the core file
- Iterate over all of the addresses in the address space
to find OpenSSL
`BIGNUM`

s - Triage these
`BIGNUM`

s and recover the RSA private key

First off, I needed a suitable core dump from some application: I
generated one from `ssh-keysign(8)`

using `gcore(1)`

as root. This came
out at around 25MB. An ELF core file is composed of segments that reflect
the virtual address space of the process when the core was dumped. You
can see these sections using `readelf(1)`

or `objdump(1)`

. We are
interested in `load1`

, `load2`

, ... which are distinguished by a `p_type`

of `PT_LOAD`

. The `VMA`

column is the Virtual Memory Address, and ```
File
off
```

gives the byte offset into the core file for the data:

martin@linux:~$ objdump -h core | head -20 core: file format elf64-x86-64 Sections: Idx Name Size VMA LMA File off Algn 0 note0 00000558 0000000000000000 0000000000000000 000003f8 2**0 CONTENTS, READONLY 1 .reg/7449 000000d8 0000000000000000 0000000000000000 0000047c 2**2 CONTENTS 2 .reg 000000d8 0000000000000000 0000000000000000 0000047c 2**2 CONTENTS 3 .auxv 00000130 0000000000000000 0000000000000000 0000060c 2**3 CONTENTS 4 .reg2/7449 00000200 0000000000000000 0000000000000000 00000750 2**2 CONTENTS 5 .reg2 00000200 0000000000000000 0000000000000000 00000750 2**2 CONTENTS 6 load1 00000000 0000000000400000 0000000000000000 00001000 2**12 ALLOC, READONLY, CODE 7 load2 00001000 0000000000600000 0000000000000000 00001000 2**12 martin@linux:~$

For 64 bit architectures this means that we will necessarily have
a sparse array of data. I modelled this using an array of (non-sparse)
contiguous `Block`

s of data. A `Block`

is a simple object with `start`

and `end`

addresses and its associated `data`

. Squashing adjacent blocks
together reduced the number of them from 46 down to 9 which was good
for performance.

Here's the `Core`

factory method:

@classmethod def from_file(cls, path): core = Core() with open(path, 'rb') as fd: core.elffile = ELFFile(fd) # Add a memory block for each segment found for i in xrange(core.elffile.num_segments()): segment = core.elffile.get_segment(i) header = segment.header if header.p_type == 'PT_NOTE': next elif header.p_type == 'PT_LOAD': if header.p_filesz > 0: block = Block.from_segment(segment) core.blocks.append(block) else: raise RuntimeError('Unknown p_type %s' % header.p_type) # Sort memory blocks in place by start address core.blocks.sort(key=lambda block: block.start) # Squash adjacent blocks for i in xrange(len(core.blocks) - 2, -1, -1): if self.blocks[i].end + 1 == self.blocks[i+1].start: self.blocks[i].end = self.blocks[i+1].end self.blocks[i].size += self.blocks[i+1].size self.blocks[i].data += self.blocks[i+1].data del self.blocks[i+1] return core

I wrote an iterator `each_addr`

to return addresses in order, using a
small helper `next_addr`

to return the next (possibly non-contiguous)
address, and a `peek`

function to return the appropriate slice of data.

`BIGNUM`

An OpenSSL `RSA`

key is just a struct containing its parameters as OpenSSL
`BIGNUM`

s and some other stuff we don't care too much about. A `BIGNUM`

looks like this:

typedef struct bignum_st BIGNUM; struct bignum_st { BN_ULONG *d; /* Pointer to an array of 'BN_BITS2' bit chunks. */ int top; /* Index of last used d +1. */ /* The next are internal book keeping for bn_expand. */ int dmax; /* Size of the d array. */ int neg; /* one if the number is negative */ int flags; };

I modelled this using python's `ctypes`

:

class Bn(Structure): """Class to represent OpenSSL BIGNUM.""" _fields_ = [ ('d', POINTER(c_ulong)), ('top', c_int), ('dmax', c_int), ('neg', c_int), ('flags', c_int), ] LIBCRYPTO = cdll.LoadLibrary('libcrypto.so') bn2dec = LIBCRYPTO.BN_bn2dec bn2dec.restype = c_char_p def to_long(self): return long(Bn.bn2dec(byref(self))) def __str__(self): return str(self.to_long())

`BN_ULONG`

is an `unsigned long`

(or `unsigned long long`

) so 8 bytes on my system.
An `int`

is 4 bytes. Using this information I was able to write a second iterator with a
small test for `BIGNUM`

ness (please forgive the excessive
nesting!). Python's `struct.unpack`

was used to do the data translation (beware
endianism if you try this snippet since it assumes little-endian):

def each_bn(self): """Iterator to return all possible BIGNUMs in the address space.""" struct_len = 8 + 4*4 # 64 bit pointer + 4 x 32 bit ints (no padding) for addr in self.core.each_addr(): if addr % 8 == 0: # 64 bit aligned struct_data = self.core.peek(addr, struct_len) if len(struct_data) == struct_len: top = unpack('i', struct_data[8:12])[0] # int dmax = unpack('i', struct_data[12:16])[0] # int if ((top > 5) and (dmax > 5) and (dmax - top < 2) and (dmax - top >= 0) and (dmax < 1000000)): neg = unpack('i', struct_data[16:20])[0] # int d = unpack('Q', struct_data[0:8])[0] # 64 bit pointer if (neg == 0 or neg == 1) and self.core.contains_addr(d): data_len = dmax * 8 # dmax unsigned long longs data = self.core.peek(d, data_len) if len(data) == data_len: flags = unpack('i', struct_data[20:24])[0] # int data_buf = create_string_buffer(data) bn = Bn(d=cast(pointer(data_buf), POINTER(c_ulong)), top=c_int(top), dmax=c_int(dmax), neg=c_int(neg), flags=c_int(flags)) yield bn

I've done a little work to make this efficient but it could be tightened
further (the upper bound on `dmax`

for one thing). Since the structs
are 8 byte aligned we only need to test one in every eight bytes. If
`peek`

returned enough consecutive data then it is unpacked and the
values checked for sanity. The final check is for `d`

, which is a
pointer to another location which contains the actual value of the
`BIGNUM`

. Since checking this pointer is relatively expensive this is
done last (with only 9 blocks to check binary search was not a significant
speed improvement). If all of those tests pass then this might be a real
`BIGNUM`

so it is `yield`

ed to the triage routine.

RSA uses a public/private key pair, each containing some parameters. Skipping
the details (how parameters `d`

, `e`

are chosen, other values that are
stored in the private key for performance reasons, ...) the scheme works
approximately as follows:

`p`

and`q`

are large primes`n = p⋅q`

, the public modulus`φ(n) = φ(p)⋅φ(q) = (p − 1)⋅(q − 1)`

, Euler's totient function`d`

is the decryption exponent`e`

is the encryption exponent`d⋅e ≡ 1 (mod φ(n))`

, by construction- The public key contains
`n`

and`e`

- The private key contains
`n`

and`d`

- To encrypt message
`m`

:`c = m^e (mod n)`

- To decrypt ciphertext
`c`

:`m = c^d (mod n)`

This gives us a straightforward algorithm:

- We have
`n`

and`e`

from the public key - Any value that divides
`n`

is either`p`

or`q`

- From
`p`

,`q`

we can obtain`φ(n)`

- Seek
`d`

in the remaining values by testing the congruence`d⋅e ≡ 1 (mod φ(n))`

Using just python's native arithmetic this ran in 30 seconds on my 25MB test core file (O(n) in both time and space). This makes it a very feasible operation.

- This was considerably easier (and faster, both to write and to run) than I had imagined it to be
- That said, it is quite fiddly and strongly tied to implementation details (and even compiler specifics if we consider struct padding/packing options)
- Protect your core files!