2010-02-15 06:26:04 UTC
I am searching for results about statistical characteristics of hash
functions (I will be more precise in a moment) and I was wondering if
you have some reference to suggest. Even few googling keywords would
be fine; I tried with some obvious choices, but the resulting signal/
noise ratio is very low (e.g., the first page of results of "MD5
statistical" has links to statistical data "signed" with MD5). Note
that since I would not use the hash function against a "malicious"
adversary, but just as a double check of the correctness of a result
(let me skip all the details here, it would be too long), even "old"
and "fragile" functions could be fine, as long as some results are
Let me state more precisely what I mean with "statistical
characteristics". Some questions I am interested to find an answer
1) Is the hash function surjective? If it is not, which fraction of
the codomine is covered? [I expect a good hash function to be
surjective or "almost" surjective, but a quantitative result would be
2) Suppose the message length fixed. How the anti-image of a hash
value is "distributed" over the domain? I expect it to be "spread
out" uniformly all over the domain; for example I expect that if S is
a subset of the domain with |S| elements and h = H(x) is the b-bit
hash of x, then the intersection of S with H^-1(h) has
"approximately" (in some sense to be specified) |S|/2^b elements.
3) What is the minimum Hamming distance between two elements of
I agree that these questions could look a bit vague, but any result
along these lines could be of interest to me.
Thank you for any help