D. J. Bernstein

2004-09-07 03:26:27 UTC

Here's a simple function that hashes 512 bits, in[0] through in[15],

down to 256 bits, out[0] through out[7], using out[8] through out[15] as

temporaries:

void slash(uint32 out[16],const uint32 in[16])

{

int i, r;

uint32 sum = 0, x = in[15];

for (i = 0;i < 16;++i) out[i] = in[i];

for (r = 0;r < 32;++r) {

sum += 0x9e3779b9;

for (i = 0;i < 16;++i)

x = out[i] += ((x + sum) ^ ((x << 7) | (x >> 25)));

}

for (i = 0;i < 8;++i) out[i] = in[i] ^ out[i + 8];

}

I'll pay $1000 to the first person to find a collision in this function,

or an input whose output begins with 128 bits of 0's.

This function takes a few thousand cycles on typical CPUs, like the

SHA-256 compression function. It has 512 very simple rounds, sweeping

through each of the 16 input words 32 times, rather than 64 rounds of a

more complicated operation. A differential can keep slightly more than

128 of the first 256 rounds inactive but disintegrates after that.

Some ways to vary SLASH without destroying its spirit: a slightly wider

(less serial) computation would take better advantage of the parallelism

of typical CPUs; a slightly less regular order of word accesses would

provide faster avalanche in the initial rounds.

---D. J. Bernstein, Associate Professor, Department of Mathematics,

Statistics, and Computer Science, University of Illinois at Chicago

down to 256 bits, out[0] through out[7], using out[8] through out[15] as

temporaries:

void slash(uint32 out[16],const uint32 in[16])

{

int i, r;

uint32 sum = 0, x = in[15];

for (i = 0;i < 16;++i) out[i] = in[i];

for (r = 0;r < 32;++r) {

sum += 0x9e3779b9;

for (i = 0;i < 16;++i)

x = out[i] += ((x + sum) ^ ((x << 7) | (x >> 25)));

}

for (i = 0;i < 8;++i) out[i] = in[i] ^ out[i + 8];

}

I'll pay $1000 to the first person to find a collision in this function,

or an input whose output begins with 128 bits of 0's.

This function takes a few thousand cycles on typical CPUs, like the

SHA-256 compression function. It has 512 very simple rounds, sweeping

through each of the 16 input words 32 times, rather than 64 rounds of a

more complicated operation. A differential can keep slightly more than

128 of the first 256 rounds inactive but disintegrates after that.

Some ways to vary SLASH without destroying its spirit: a slightly wider

(less serial) computation would take better advantage of the parallelism

of typical CPUs; a slightly less regular order of word accesses would

provide faster avalanche in the initial rounds.

---D. J. Bernstein, Associate Professor, Department of Mathematics,

Statistics, and Computer Science, University of Illinois at Chicago