A New Kind of Science: The NKS Forum > Pure NKS > Hashing
Author
janos

CT

Registered: Nov 2004
Posts: 23

Hashing

On page 622 the Book says that hashing requires a rough uniform distribution of values to be able to retrieve the data.

Here is a small program with Mathematica 5.1 which does NOT show this uniform distribution. Any good explanation ?

In[75]:=
MapIndexed[{First[#2],
Hash[#1]} & ,
ToExpression /@
(StringJoin["aaaa", "\$",
ToString[#1]] & ) /@
Sort[Table[Random[
Integer, {9, 10}],
{i, 1, 20}]]]
Out[75]=
{{1, 1584071780},
{2, 1584071780},
{3, 1584071780},
{4, 1584071780},
{5, 1584071780},
{6, 1584071780},
{7, 1584071780},
{8, 1584071780},
{9, 1584071780},
{10, 1149946229},
{11, 1149946229},
{12, 1149946229},
{13, 1149946229},
{14, 1149946229},
{15, 1149946229},
{16, 1149946229},
{17, 1149946229},
{18, 1149946229},
{19, 1149946229},
{20, 1149946229}}

__________________
--------------------------------------
sunflower oil, and not only bought it, but
spilled it too."
Bulgakov: Master and Margarita

Report this post to a moderator | IP: Logged

01-05-2005 03:18 PM
Aaron Turner

Registered: Jan 2005
Posts: 1

Perhaps I'm missing something here, but you've merely hashed two strings repeatedly - you expect a uniform distribution?

Report this post to a moderator | IP: Logged

01-09-2005 01:23 AM
janos

CT

Registered: Nov 2004
Posts: 23

Yes, I expect two values "close" to each other. For example if I has aaaa8 and aaaa9 the the distribution of hashing values are differ just in 1.

In[2]:=
MapIndexed[{First[#2],
Hash[#1]} & ,
ToExpression /@
(StringJoin["aaaa", "\$",
ToString[#1]] & ) /@
Sort[Table[Random[
Integer, {8, 9}],
{i, 1, 20}]]]
Out[2]=
{{1, 1584071779},
{2, 1584071779},
{3, 1584071779},
{4, 1584071779},
{5, 1584071779},
{6, 1584071779},
{7, 1584071779},
{8, 1584071779},
{9, 1584071779},
{10, 1584071779},
{11, 1584071779},
{12, 1584071780},
{13, 1584071780},
{14, 1584071780},
{15, 1584071780},
{16, 1584071780},
{17, 1584071780},
{18, 1584071780},
{19, 1584071780},
{20, 1584071780}}

So the question is why there is the jump for aaaa9 and aaaa10 ? The big jump puts the hash value in a very far region of the memory and to retrieve it might require paging of memory - inbetween - in and out, making Hashing slow, ie defeating its original purpose.

__________________
--------------------------------------