-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Security vulnerability] Seeds are not changed on database save #219
Comments
Yes, and additionally
I used keepass' behaviour to verify, as opposed to keepassxc, but the result is the same. |
I implemented a workaround for now by poking into
Phew! |
Thanks for the good work.
Yes, the location of the KDF parameters within the database bytestream changes depending on the database version, which is reflected in the object that
You actually figured out a problem I was having. I wanted to add functionality for changing the encryption and key derivation methods, but my changed values kept being overwritten by the previously copied bytes. |
FWIW, I didn't know about construct before I started looking at pykeepass a couple of weeks ago, and I'm very impressed both by construct and how you have used it. It's really nice how pykeepass deals with the different format versions "all at once" in a single structure. I'm looking forward to using construct to solve a similar problem I'm busy solving in a different project. The way construct works, I think it's a necessary requirement that the data structure that comes out of it looks like the file format itself, rather than being able to abstract across fields that appear differently in the file format in different file format versions. To abstract over this, I'd do it outside of construct, but still use construct as you have done. For example, you could create a "proxy" type Python object by capturing |
Disclosure: I am the maintainer of Construct. Thanks for the high praise Basak. I was impressed by Construct too and that is why I became its maintainer. Evidlo, I encourage you to post more Issues at Construct's forum. Its clear you had some issues with Construct but maybe we can resolve them. I dont think using struct module is a good idea in a sophisticated project like this. I can confirm issue # 888 will be resolved upstream before end of the year. Another disclosure: I had a Cryptography class at Stanford. Which I have to admit was probably the best course at Coursera I ever had and I have like 20 certificates from there so... yeah, back to the topic: Unfortunately I dont know pykeepass enough to advise anything but I can say one thing: using same seeds opens a way for using rainbow tables which I think was what basak meant by precomputation attacks. I believe Basak is right on this. |
The official KeePass client produces and expects a 64 bit field, not a 32 bit field, for this parameter. Using the incorrect length happens to parse just fine. The correct length is encoded into the header so there is no parsing misalignment. And because the field is little endian, parsing the field itself as a 32 bit field even though it is actually a 64 bit field also works, provided that the field value fits into 32 bits, which it typically does. However, if we write header field back out, we write it with a 32 bit length, and KeePass rejects this. We weren't writing out the header ourselves previously because kdbx.header.data wasn't being removed (see libkeepass#219 (comment)). However if we start doing that, such as to fix issue libkeepass#219, then this bug is revealed. The fix is trivial: we change the declaration to be a 64 bit field to match the official client.
A couple more loose ends for version 3:
|
One more. It's probably also wise to regenerate To summarise, here are the fields to regenerate for version 3:
For version 4:
And additionally Then I've not looked into version < 3 (doesn't affect me). |
The official KeePass client produces and expects a 64 bit field, not a 32 bit field, for this parameter. Using the incorrect length happens to parse just fine. The correct length is encoded into the header so there is no parsing misalignment. And because the field is little endian, parsing the field itself as a 32 bit field even though it is actually a 64 bit field also works, provided that the field value fits into 32 bits, which it typically does. However, if we write header field back out, we write it with a 32 bit length, and KeePass rejects this. We weren't writing out the header ourselves previously because kdbx.header.data wasn't being removed (see #219 (comment)). However if we start doing that, such as to fix issue #219, then this bug is revealed. The fix is trivial: we change the declaration to be a 64 bit field to match the official client.
The official KeePass client produces and expects a 64 bit field, not a 32 bit field, for this parameter. Using the incorrect length happens to parse just fine. The correct length is encoded into the header so there is no parsing misalignment. And because the field is little endian, parsing the field itself as a 32 bit field even though it is actually a 64 bit field also works, provided that the field value fits into 32 bits, which it typically does. However, if we write header field back out, we write it with a 32 bit length, and KeePass rejects this. We weren't writing out the header ourselves previously because kdbx.header.data wasn't being removed (see libkeepass#219 (comment)). However if we start doing that, such as to fix issue libkeepass#219, then this bug is revealed. The fix is trivial: we change the declaration to be a 64 bit field to match the official client.
Here's my solution to generate new seeds in a simplified pseudo-KeePass database. from construct import Struct, RawCopy, Bytes, Checksum, this, Subconstruct
from construct import stream_seek, stream_tell, stream_read, stream_write
from Cryptodome.Random import get_random_bytes
from io import BytesIO
# simple placeholder checksum function
def check_func(header_data, master_seed):
return header_data + master_seed
class RandomBytes(Bytes):
"""Same as Bytes, but generate random bytes when building"""
def _build(self, obj, stream, context, path):
length = self.length(context) if callable(self.length) else self.length
data = get_random_bytes(length)
stream_write(stream, data, length, path)
return data
class Copy(Subconstruct):
"""Same as RawCopy, but don't create parent container when parsing.
Instead store data in ._data attribute of subconstruct, and never rebuild from data
"""
def _parse(self, stream, context, path):
offset1 = stream_tell(stream, path)
obj = self.subcon._parsereport(stream, context, path)
offset2 = stream_tell(stream, path)
stream_seek(stream, offset1, 0, path)
obj._data = stream_read(stream, offset2-offset1, path)
return obj
def _build(self, obj, stream, context, path):
offset1 = stream_tell(stream, path)
obj = self.subcon._build(obj, stream, context, path)
offset2 = stream_tell(stream, path)
stream_seek(stream, offset1, 0, path)
obj._data = stream_read(stream, offset2-offset1, path)
return obj
# simple database format with header and checksum
s = Struct(
"header" / Copy(Struct(
"master_seed" / RandomBytes(8),
"more_data" / Bytes(8)
)),
"hash" / Checksum(
Bytes(24),
lambda ctx: check_func(ctx.header._data, ctx.header.master_seed),
this
)
)
# synthesize a 'database'
master_seed = b'00000000'
more_data = b'12345678'
header = master_seed + more_data
data = header + check_func(header, master_seed)
# parse the database
parsed = s.parse(data)
# rebuild and try to reparse final result
data2 = s.build(parsed)
s.parse(data2)
assert data != data2, "Database unchanged" |
I'd appreciate some feeback on the above. |
As far as Construct is concerned, I see no problems with your code. |
I really like this approach. Nice work! I wonder if One downside of this approach is that each build triggers a rekey. This is what we want to fix this issue, of course, but I wonder if there are times when we might want to build without rekeying. For example, for unit testing. I don't have a strong feeling as to whether explicit or implicit rekeying would be better. I think it's difficult to say without attempting one approach or the other. I think this mechanism should be sufficient to do most of the necessary work. |
Those 2 classes surely are a creative work. But they wont get into Construct mainstream. Copy already does what RawCopy does, and in a way I dont like, too obscure. And RandomBytes is not general enough to be included. But if this works for you, I dont see why not. |
Is there any update to this? It seems like a rather problematic issue. |
Can someone suggest the necessary custom code to init the seeds (KDBX4, default KDF/encryption) before calling kp = create_database("test.kdbx", password=test, keyfile=None, transformed_key=None)
kp.kdbx.body.payload.xml = etree.fromstring(xml_str)
kp.kdbx.header.value.dynamic_header.master_seed.data = get_random_bytes(8)
kp.kdbx.header.value.dynamic_header.encryption_iv.data = ??
kp.kdbx.body.payload.inner_header.protected_stream_key.data = ??
#kp.kdbx.header.value.dynamic_header.kdf_parameters.data.dict.S.value = # if argon2 is used
kp.save() Without this, the output is reproducible, which is a showstopper for encrypted data. |
Any update on this? I don't think I'd have the bandwidth or know-how to do this myself, but would definitely like to resolve this vulnerability |
I tried at #366, but I am not familiar enough with the internals to make it work, some tests fail and I was not able to find out why. Here is where I am very grateful for the massive amount of tests in the project. |
I've tracked the issue down to Broken Example (with DynamicDict)
#!/usr/bin/env python3
# debugging why RandomBytes doesn't update context
from collections import OrderedDict
from Cryptodome.Random import get_random_bytes
from construct import (
Bytes, stream_write, Struct, Checksum, this,
GreedyBytes, Prefixed, Mapping, Byte, Int32ul,
Switch, RepeatUntil, RawCopy, Adapter, Container, ListContainer
)
class DynamicDict(Adapter):
"""ListContainer <---> Container
Convenience mapping so we dont have to iterate ListContainer to find
the right item
FIXME: lump kwarg was added to get around the fact that InnerHeader is
not truly a dict. We lump all 'binary' InnerHeaderItems into a single list
"""
def __init__(self, key, subcon, lump=[]):
super().__init__(subcon)
self.key = key
self.lump = lump
# map ListContainer to Container
def _decode(self, obj, context, path):
d = OrderedDict()
for l in self.lump:
d[l] = ListContainer([])
for item in obj:
if item[self.key] in self.lump:
d[item[self.key]].append(item)
else:
d[item[self.key]] = item
return Container(d)
# map Container to ListContainer
def _encode(self, obj, context, path):
l = []
for key in obj:
if key in self.lump:
l += obj[key]
else:
l.append(obj[key])
return ListContainer(l)
class RandomBytes(Bytes):
"""Same as Bytes, but generate random bytes when building"""
def _build(self, obj, stream, context, path):
print('generating random bytes...')
length = self.length(context) if callable(self.length) else self.length
data = get_random_bytes(length)
print('old:', sum(context.data))
print('new:', sum(data))
stream_write(stream, data, length, path)
return data
def compute_hash(context):
"""dumb hashing function which always returns value 1"""
# this is supposed to be a newly random value (when building)
print('rand seen during hashing:', sum(context.head.value.rand.data))
return b'1'
Item = Struct(
'id' / Mapping(Byte, {'rand': 4, 'end': 0}),
'data' / Switch(
this.id,
{
'rand': RandomBytes(32),
'end': Bytes(32),
},
)
)
Head = DynamicDict('id', RepeatUntil(
lambda item, a, b: item.id == 'end',
Item
))
# header
s = Struct(
'head' / RawCopy(Head),
'hash' / Checksum(
Bytes(1),
compute_hash,
this,
)
)
# raw data to parse
b = (
# --- Random ---
# id
b'\x04' +
# seed
b'1' * 32 +
# --- End ---
# id
b'\x00' +
# value
b'1' * 32 +
# --- Hash ---
b'1'
)
print('-------------')
print('PARSING')
print('-------------')
parsed1 = s.parse(b)
print('-------------')
print('BUILDING')
print('-------------')
del parsed1.head.data
built = s.build(parsed1)
print('-------------------------------')
print('PARSE')
print('-------------------------------')
parsed2 = s.parse(built)
Working Example (without DynamicDict)
#!/usr/bin/env python3
# debugging why RandomBytes doesn't update context
from collections import OrderedDict
from Cryptodome.Random import get_random_bytes
from construct import (
Bytes, stream_write, Struct, Checksum, this,
GreedyBytes, Prefixed, Mapping, Byte, Int32ul,
Switch, RepeatUntil, RawCopy, Adapter, Container, ListContainer
)
class RandomBytes(Bytes):
"""Same as Bytes, but generate random bytes when building"""
def _build(self, obj, stream, context, path):
print('generating random bytes...')
length = self.length(context) if callable(self.length) else self.length
data = get_random_bytes(length)
print('old:', sum(context.data))
print('new:', sum(data))
stream_write(stream, data, length, path)
return data
def compute_hash(context):
"""dumb hashing function which always returns value 1"""
# this is supposed to be a newly random value (when building)
print('rand seen during hashing:', sum(context.head.value[0].data))
return b'1'
Item = Struct(
'id' / Mapping(Byte, {'rand': 4, 'end': 0}),
'data' / Switch(
this.id,
{
'rand': RandomBytes(32),
'end': Bytes(32),
},
)
)
Head = RepeatUntil(
lambda item, a, b: item.id == 'end',
Item
)
# header
s = Struct(
'head' / RawCopy(Head),
'hash' / Checksum(
Bytes(1),
compute_hash,
this,
)
)
# raw data to parse
b = (
# --- Random ---
# id
b'\x04' +
# seed
b'1' * 32 +
# --- End ---
# id
b'\x00' +
# value
b'1' * 32 +
# --- Hash ---
b'1'
)
print('-------------')
print('PARSING')
print('-------------')
parsed1 = s.parse(b)
print('-------------')
print('BUILDING')
print('-------------')
del parsed1.head.data
built = s.build(parsed1)
print('-------------------------------')
print('PARSE')
print('-------------------------------')
parsed2 = s.parse(built)
In the Building stage, the value of Unfortunately I am deep in writing my dissertation right now so I don't have a ton of free time to spend. |
Hi,
Thank you for pykeepass!
I noticed what I believe is a security issue with the way that pykeepass saves a database (I've tested only KDBX). None of the four seeds are regenerated. These are:
kdbx.header.value.dynamic_header.master_seed.data
kdbx.header.value.dynamic_header.encryption_iv.data
kdbx.header.value.dynamic_header.transform_seed.data
kdbx.header.value.dynamic_header.protected_stream_key.data
I believe this results in a number of issues. Note that I'm not a cryptographic expert. However, the official keepass client does in my testing regenerate these parameters on every save, so I think that should be a good enough reason for pykeepass not to vary in behaviour from that.
Some issues I believe exist with not regenerating the seeds:
create_database()
will get the same seed for password derivation, which makes them vulnerable to precomputation attacks.protected_stream_key
is not as severe, but it would still allow an attacker who manages to access only the decrypted XML to be able to compare passwords from different users for equality, when regenerating this header field would have prevented it.Note that this affects both
create_database()
andPyKeePass.save()
. In both cases, all seeds should be regenerated, just like the "official" keepass client does it.Here are test cases that should pass when this issues is fixed:
The text was updated successfully, but these errors were encountered: