S
imple hash function and hash table algorithm
with an example of application for strings
Hi there, If you need to use and try a good and simple hash function algorithm for any purpose especially for string query. I hope that your answer be in this article.
Hash function data structure consists of:
Chaining node
Table of nodes
Hash function
Chaining node:
Node is a container for the value (a word for example), and it links with another node forming a chain of nodes.
To define a node in C for example:
typedef struct node
{
char word[word-length];
struct node *next;
}
node;
where struct node is the name of the node in chain, we shortcut the struct node into simply node in the last line above. Struct is the datatype of the node.
char word is the variable that carries the value (a word for example) that we will search for up-next. word-length is an integer (46 for example), and it represents the maximum number of characters that word can carry.
Hash Table (table of nodes) Definition:
Hash table is simply a table of nodes, we can define a list containing certain number of pointers for nodes.
to define a hash table in C for example:
node *table[Number];
we can see that table is a list of node pointers, and Number is the number of pointers in the table list. We may specify that Number = 26 as the number of alphabets, but in this example for high speed processing Number will equal 18278.
Hash Function Definition:
Each node carries a value, points to another node. In addition to that it has a unique key (or index) that specifies the certain value inside each node.
If some nodes share the same key, we must chain them together.
if we make a wide range for the hash key, we get a rapid result from the hash function.
to define a hash function in C for example:
int hash(const char *word)
{
// Key is the same whether the first character is upper case or lower case
unsigned int key = 0, First_Letter_Index = 0, Second_Letter_Index = 0, Third_Letter_Index = 0;
// Calculate the length of the word that needs to be inserted in the hash table
int Word_Lengh = strlen(word);
if (Word_Lengh == 1)
{
if (65 <= word[0] && word[0] <= 90)
{
Third_Letter_Index = (word[0] - 65 + 1) % Number;
}
else if (97 <= word[0] && word[0] <= 122)
{
Third_Letter_Index = (word[0] - 97 + 1) % Number;
}
}
if (Word_Lengh == 2)
{
if (65 <= word[0] && word[0] <= 90)
{
Second_Letter_Index = (word[0] - 65 + 1) % Number;
}
else if (97 <= word[0] && word[0] <= 122)
{
Second_Letter_Index = (word[0] - 97 + 1) % Number;
}
/********************************************************/
if (65 <= word[1] && word[1] <= 90)
{
Third_Letter_Index = (word[1] - 65 + 1) % Number;
}
else if (97 <= word[1] && word[1] <= 122)
{
Third_Letter_Index = (word[1] - 97 + 1) % Number;
}
}
if (Word_Lengh >= 3)
{
if (65 <= word[0] && word[0] <= 90)
{
First_Letter_Index = (word[0] - 65 + 1) % Number;
}
else if (97 <= word[0] && word[0] <= 122)
{
First_Letter_Index = (word[0] - 97 + 1) % Number;
}
/********************************************************/
if (65 <= word[1] && word[1] <= 90)
{
Second_Letter_Index = (word[1] - 65 + 1) % Number;
}
else if (97 <= word[1] && word[1] <= 122)
{
Second_Letter_Index = (word[1] - 97 + 1) % Number;
}
/********************************************************/
if (65 <= word[2] && word[2] <= 90)
{
Third_Letter_Index = (word[2] - 65 + 1) % Number;
}
else if (97 <= word[2] && word[2] <= 122)
{
Third_Letter_Index = (word[2] - 97 + 1) % Number;
}
}
key = (First_Letter_Index * 26 * 26) + (Second_Letter_Index * 26) + (Third_Letter_Index * 1);
return (key - 1);
}
At this point our hash function data structure is completely defined. In the following we will discuss how to operate this all.
We would better make an init function to set the initial pointers in each node in the table to null.
for (int i = 0; i < Number; i++)
{
// Set the initial pointers in the has table to null
table[i] = NULL;
}
To insert a new value into your data structure, you should declare a new node object (allocate memory for that) in which you will insert the new value, derive a certain key for that value, and then insert the node into the hash table according to that certain key.
bool insert(char *value)
{
// Make a new node and copy the word inside it
node *NewNode = malloc(sizeof(node));
if (NewNode == NULL)
{
return false;
}
strcpy(NewNode->word, value);
NewNode->next = NULL;
// Calculate the hash key
// Key is the same whether the first character is upper case or lower case
unsigned int Key;
Key = hash(value);
// Check if chain[key] is empty
if (table[Key] == NULL)
{
table[Key] = NewNode;
}
// Collision
else
{
// Those two lines must be in order, otherwise you lose your chain
NewNode->next = table[Key];
table[Key] = NewNode;
}
return true;
}
To search for a certain value to make sure whether or not it exists inside your data structure, you should derive the key for that value, create a temporary node containing that value, and then compare the temporary node to each node inside the chain that starts with table[key]. If the temporary node value matches any node value inside the chain, then it means that the value exists. Otherwise not!
bool check(const char *word)
{
unsigned int Key;
Key = hash(word);
node *temp = table[Key];
// Calculate the length of the word that needs to be searched in the hash table
while (temp)
{
if (strcasecmp(word, temp->word) == 0)
{
// The word matches the data
return true;
}
temp = temp->next;
}
// Otherwise
return false;
}
After you finish using the hash function data structure, we would better free all allocated memory. In order to do that we can define a specific unload function in which we write the following few lines of codes.
node *FreeNode;
for (int i = 0; i < Number; i++)
{
// Free list, by using a while loop and a temporary variable to point
// to the next node before freeing the current one
while (table[i] != NULL)
{
FreeNode = table[i]->next;
free(table[i]);
table[i] = FreeNode;
}
}
Would you try updating and deleting a value in your own?
Not to forget that we will call some libraries like those in top of our C code file.
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <stdlib.h>
Also we shall not forget defining Number (number of nodes for the hash table)
#define Number 18278
If you wonder how I made this specific number, I just made a 26th counting system for strings of letters only.
If I add ([1-26] * 26^2) + ([1-26] * 26^1) + ([1-26] * 26^0) where [1-26] are the indices of letters in alphabets, so the maximum possible Number should be 18277. I added one extra place in the hash table for strings beginning with non-letter characters. You can modify the hash function if you query for other ASCII characters, so that you can calculate the key for table[18278] chain.
That equation is made for a word depending on it's first three letters. What will happen if I make the hash function for a word depending on it's first four/five letters?
You can find the source code in my repository at github:
https://github.com/kobros-tech/Hashing-Algorithm-for-C
That is everything for now. If you have a question or a suggestion, I welcome your comments and emails.
Comments
Post a Comment