mockturtle

2010-02-15 06:26:04 UTC

Dear all,

I am searching for results about statistical characteristics of hash

functions (I will be more precise in a moment) and I was wondering if

you have some reference to suggest. Even few googling keywords would

be fine; I tried with some obvious choices, but the resulting signal/

noise ratio is very low (e.g., the first page of results of "MD5

statistical" has links to statistical data "signed" with MD5). Note

that since I would not use the hash function against a "malicious"

adversary, but just as a double check of the correctness of a result

(let me skip all the details here, it would be too long), even "old"

and "fragile" functions could be fine, as long as some results are

known.

Let me state more precisely what I mean with "statistical

characteristics". Some questions I am interested to find an answer

are

1) Is the hash function surjective? If it is not, which fraction of

the codomine is covered? [I expect a good hash function to be

surjective or "almost" surjective, but a quantitative result would be

useful]

2) Suppose the message length fixed. How the anti-image of a hash

value is "distributed" over the domain? I expect it to be "spread

out" uniformly all over the domain; for example I expect that if S is

a subset of the domain with |S| elements and h = H(x) is the b-bit

hash of x, then the intersection of S with H^-1(h) has

"approximately" (in some sense to be specified) |S|/2^b elements.

3) What is the minimum Hamming distance between two elements of

H^-1(h)?

I agree that these questions could look a bit vague, but any result

along these lines could be of interest to me.

Thank you for any help

I am searching for results about statistical characteristics of hash

functions (I will be more precise in a moment) and I was wondering if

you have some reference to suggest. Even few googling keywords would

be fine; I tried with some obvious choices, but the resulting signal/

noise ratio is very low (e.g., the first page of results of "MD5

statistical" has links to statistical data "signed" with MD5). Note

that since I would not use the hash function against a "malicious"

adversary, but just as a double check of the correctness of a result

(let me skip all the details here, it would be too long), even "old"

and "fragile" functions could be fine, as long as some results are

known.

Let me state more precisely what I mean with "statistical

characteristics". Some questions I am interested to find an answer

are

1) Is the hash function surjective? If it is not, which fraction of

the codomine is covered? [I expect a good hash function to be

surjective or "almost" surjective, but a quantitative result would be

useful]

2) Suppose the message length fixed. How the anti-image of a hash

value is "distributed" over the domain? I expect it to be "spread

out" uniformly all over the domain; for example I expect that if S is

a subset of the domain with |S| elements and h = H(x) is the b-bit

hash of x, then the intersection of S with H^-1(h) has

"approximately" (in some sense to be specified) |S|/2^b elements.

3) What is the minimum Hamming distance between two elements of

H^-1(h)?

I agree that these questions could look a bit vague, but any result

along these lines could be of interest to me.

Thank you for any help