2013/10/22
This normally quiet page saw 6000 visits on the weekend after the Heartbleed OpenSSL bug announcement (April 12-13th 2014). Although the method described here could conceivably work to retrieve private keys via Hearbleed there are better approaches when working with a limited data leak. Rubin Xu's post presents a good summary via Coppersmith's attack.
Given a core dump from an application that has read an OpenSSL RSA private key, how easy is it to find and recover that private key? It turns out that it's not as hard as it might appear with a little python and some ELF-parsing magic courtesy of Eli Bendersky's pyelftools.
The steps were:
BIGNUM
sBIGNUM
s and recover the RSA private keyFirst off, I needed a suitable core dump from some application: I
generated one from ssh-keysign(8)
using gcore(1)
as root. This came
out at around 25MB. An ELF core file is composed of segments that reflect
the virtual address space of the process when the core was dumped. You
can see these sections using readelf(1)
or objdump(1)
. We are
interested in load1
, load2
, ... which are distinguished by a p_type
of PT_LOAD
. The VMA
column is the Virtual Memory Address, and File
off
gives the byte offset into the core file for the data:
martin@linux:~$ objdump -h core | head -20 core: file format elf64-x86-64 Sections: Idx Name Size VMA LMA File off Algn 0 note0 00000558 0000000000000000 0000000000000000 000003f8 2**0 CONTENTS, READONLY 1 .reg/7449 000000d8 0000000000000000 0000000000000000 0000047c 2**2 CONTENTS 2 .reg 000000d8 0000000000000000 0000000000000000 0000047c 2**2 CONTENTS 3 .auxv 00000130 0000000000000000 0000000000000000 0000060c 2**3 CONTENTS 4 .reg2/7449 00000200 0000000000000000 0000000000000000 00000750 2**2 CONTENTS 5 .reg2 00000200 0000000000000000 0000000000000000 00000750 2**2 CONTENTS 6 load1 00000000 0000000000400000 0000000000000000 00001000 2**12 ALLOC, READONLY, CODE 7 load2 00001000 0000000000600000 0000000000000000 00001000 2**12 martin@linux:~$
For 64 bit architectures this means that we will necessarily have
a sparse array of data. I modelled this using an array of (non-sparse)
contiguous Block
s of data. A Block
is a simple object with start
and end
addresses and its associated data
. Squashing adjacent blocks
together reduced the number of them from 46 down to 9 which was good
for performance.
Here's the Core
factory method:
@classmethod def from_file(cls, path): core = Core() with open(path, 'rb') as fd: core.elffile = ELFFile(fd) # Add a memory block for each segment found for i in xrange(core.elffile.num_segments()): segment = core.elffile.get_segment(i) header = segment.header if header.p_type == 'PT_NOTE': next elif header.p_type == 'PT_LOAD': if header.p_filesz > 0: block = Block.from_segment(segment) core.blocks.append(block) else: raise RuntimeError('Unknown p_type %s' % header.p_type) # Sort memory blocks in place by start address core.blocks.sort(key=lambda block: block.start) # Squash adjacent blocks for i in xrange(len(core.blocks) - 2, -1, -1): if self.blocks[i].end + 1 == self.blocks[i+1].start: self.blocks[i].end = self.blocks[i+1].end self.blocks[i].size += self.blocks[i+1].size self.blocks[i].data += self.blocks[i+1].data del self.blocks[i+1] return core
I wrote an iterator each_addr
to return addresses in order, using a
small helper next_addr
to return the next (possibly non-contiguous)
address, and a peek
function to return the appropriate slice of data.
BIGNUM
An OpenSSL RSA
key is just a struct containing its parameters as OpenSSL
BIGNUM
s and some other stuff we don't care too much about. A BIGNUM
looks like this:
typedef struct bignum_st BIGNUM; struct bignum_st { BN_ULONG *d; /* Pointer to an array of 'BN_BITS2' bit chunks. */ int top; /* Index of last used d +1. */ /* The next are internal book keeping for bn_expand. */ int dmax; /* Size of the d array. */ int neg; /* one if the number is negative */ int flags; };
I modelled this using python's ctypes
:
class Bn(Structure): """Class to represent OpenSSL BIGNUM.""" _fields_ = [ ('d', POINTER(c_ulong)), ('top', c_int), ('dmax', c_int), ('neg', c_int), ('flags', c_int), ] LIBCRYPTO = cdll.LoadLibrary('libcrypto.so') bn2dec = LIBCRYPTO.BN_bn2dec bn2dec.restype = c_char_p def to_long(self): return long(Bn.bn2dec(byref(self))) def __str__(self): return str(self.to_long())
BN_ULONG
is an unsigned long
(or unsigned long long
) so 8 bytes on my system.
An int
is 4 bytes. Using this information I was able to write a second iterator with a
small test for BIGNUM
ness (please forgive the excessive
nesting!). Python's struct.unpack
was used to do the data translation (beware
endianism if you try this snippet since it assumes little-endian):
def each_bn(self): """Iterator to return all possible BIGNUMs in the address space.""" struct_len = 8 + 4*4 # 64 bit pointer + 4 x 32 bit ints (no padding) for addr in self.core.each_addr(): if addr % 8 == 0: # 64 bit aligned struct_data = self.core.peek(addr, struct_len) if len(struct_data) == struct_len: top = unpack('i', struct_data[8:12])[0] # int dmax = unpack('i', struct_data[12:16])[0] # int if ((top > 5) and (dmax > 5) and (dmax - top < 2) and (dmax - top >= 0) and (dmax < 1000000)): neg = unpack('i', struct_data[16:20])[0] # int d = unpack('Q', struct_data[0:8])[0] # 64 bit pointer if (neg == 0 or neg == 1) and self.core.contains_addr(d): data_len = dmax * 8 # dmax unsigned long longs data = self.core.peek(d, data_len) if len(data) == data_len: flags = unpack('i', struct_data[20:24])[0] # int data_buf = create_string_buffer(data) bn = Bn(d=cast(pointer(data_buf), POINTER(c_ulong)), top=c_int(top), dmax=c_int(dmax), neg=c_int(neg), flags=c_int(flags)) yield bn
I've done a little work to make this efficient but it could be tightened
further (the upper bound on dmax
for one thing). Since the structs
are 8 byte aligned we only need to test one in every eight bytes. If
peek
returned enough consecutive data then it is unpacked and the
values checked for sanity. The final check is for d
, which is a
pointer to another location which contains the actual value of the
BIGNUM
. Since checking this pointer is relatively expensive this is
done last (with only 9 blocks to check binary search was not a significant
speed improvement). If all of those tests pass then this might be a real
BIGNUM
so it is yield
ed to the triage routine.
RSA uses a public/private key pair, each containing some parameters. Skipping
the details (how parameters d
, e
are chosen, other values that are
stored in the private key for performance reasons, ...) the scheme works
approximately as follows:
p
and q
are large primesn = p⋅q
, the public modulusφ(n) = φ(p)⋅φ(q) = (p − 1)⋅(q − 1)
, Euler's totient functiond
is the decryption exponente
is the encryption exponentd⋅e ≡ 1 (mod φ(n))
, by constructionn
and e
n
and d
m
: c = m^e (mod n)
c
: m = c^d (mod n)
This gives us a straightforward algorithm:
n
and e
from the public keyn
is either p
or q
p
, q
we can obtain φ(n)
d
in the remaining values by testing the congruence d⋅e ≡ 1 (mod φ(n))
Using just python's native arithmetic this ran in 30 seconds on my 25MB test core file (O(n) in both time and space). This makes it a very feasible operation.