Simple hash function and hash table algorithm with an example of application for strings

S

imple hash function and hash table algorithm

with an example of application for strings

Hi there, If you need to use and try a good and simple hash function algorithm for any purpose especially for string query. I hope that your answer be in this article.

Hash function data structure consists of:

Chaining node
Table of nodes
Hash function

Chaining node:

Node is a container for the value (a word for example), and it links with another node forming a chain of nodes.

To define a node in C for example:

typedef struct node
{
char word[word-length];
struct node *next;
}
node;

where struct node is the name of the node in chain, we shortcut the struct node into simply node in the last line above. Struct is the datatype of the node.

char word is the variable that carries the value (a word for example) that we will search for up-next. word-length is an integer (46 for example), and it represents the maximum number of characters that word can carry.

Hash Table (table of nodes) Definition:

Hash table is simply a table of nodes, we can define a list containing certain number of pointers for nodes.

to define a hash table in C for example:

node *table[Number];

we can see that table is a list of node pointers, and Number is the number of pointers in the table list. We may specify that Number = 26 as the number of alphabets, but in this example for high speed processing Number will equal 18278.

Hash Function Definition:

Each node carries a value, points to another node. In addition to that it has a unique key (or index) that specifies the certain value inside each node.

If some nodes share the same key, we must chain them together.

if we make a wide range for the hash key, we get a rapid result from the hash function.

to define a hash function in C for example:

int hash(const char *word)
{
    // Key is the same whether the first character is upper case or lower case
    unsigned int key = 0, First_Letter_Index = 0, Second_Letter_Index = 0, Third_Letter_Index = 0;

    // Calculate the length of the word that needs to be inserted in the hash table
    int Word_Lengh = strlen(word);

    if (Word_Lengh == 1)
    {
        if (65 <= word[0] && word[0] <= 90)
        {
            Third_Letter_Index = (word[0] - 65 + 1) % Number;
        }
        else if (97 <= word[0] && word[0] <= 122)
        {
            Third_Letter_Index = (word[0] - 97 + 1) % Number;
        }
    }
    if (Word_Lengh == 2)
    {
        if (65 <= word[0] && word[0] <= 90)
        {
            Second_Letter_Index = (word[0] - 65 + 1) % Number;
        }
        else if (97 <= word[0] && word[0] <= 122)
        {
            Second_Letter_Index = (word[0] - 97 + 1) % Number;
        }
        /********************************************************/
        if (65 <= word[1] && word[1] <= 90)
        {
            Third_Letter_Index = (word[1] - 65 + 1) % Number;
        }
        else if (97 <= word[1] && word[1] <= 122)
        {
            Third_Letter_Index = (word[1] - 97 + 1) % Number;
        }
    }
    if (Word_Lengh >= 3)
    {
        if (65 <= word[0] && word[0] <= 90)
        {
            First_Letter_Index = (word[0] - 65 + 1) % Number;
        }
        else if (97 <= word[0] && word[0] <= 122)
        {
            First_Letter_Index = (word[0] - 97 + 1) % Number;
        }
        /********************************************************/
        if (65 <= word[1] && word[1] <= 90)
        {
            Second_Letter_Index = (word[1] - 65 + 1) % Number;
        }
        else if (97 <= word[1] && word[1] <= 122)
        {
            Second_Letter_Index = (word[1] - 97 + 1) % Number;
        }
        /********************************************************/
        if (65 <= word[2] && word[2] <= 90)
        {
            Third_Letter_Index = (word[2] - 65 + 1) % Number;
        }
        else if (97 <= word[2] && word[2] <= 122)
        {
            Third_Letter_Index = (word[2] - 97 + 1) % Number;
        }
    }

    key = (First_Letter_Index * 26 * 26) + (Second_Letter_Index * 26) + (Third_Letter_Index * 1);
    return (key - 1);
}

At this point our hash function data structure is completely defined. In the following we will discuss how to operate this all.

We would better make an init function to set the initial pointers in each node in the table to null.

for (int i = 0; i < Number; i++)
    {
        // Set the initial pointers in the has table to null
        table[i] = NULL;
    }

To insert a new value into your data structure, you should declare a new node object (allocate memory for that) in which you will insert the new value, derive a certain key for that value, and then insert the node into the hash table according to that certain key.

bool insert(char *value)
{

    // Make a new node and copy the word inside it
    node *NewNode = malloc(sizeof(node));
    if (NewNode == NULL)
    {
        return false;
    }
    strcpy(NewNode->word, value);
    NewNode->next = NULL;

    // Calculate the hash key
    // Key is the same whether the first character is upper case or lower case
    unsigned int Key;
    Key = hash(value);

    // Check if chain[key] is empty
    if (table[Key] == NULL)
    {
        table[Key] = NewNode;
    }
    // Collision
    else
    {
        // Those two lines must be in order, otherwise you lose your chain
        NewNode->next = table[Key];
        table[Key] = NewNode;
    }
    return true;
}

To search for a certain value to make sure whether or not it exists inside your data structure, you should derive the key for that value, create a temporary node containing that value, and then compare the temporary node to each node inside the chain that starts with table[key]. If the temporary node value matches any node value inside the chain, then it means that the value exists. Otherwise not!

bool check(const char *word)
{
    unsigned int Key;
    Key = hash(word);

    node *temp = table[Key];

    // Calculate the length of the word that needs to be searched in the hash table
    while (temp)
    {
        if (strcasecmp(word, temp->word) == 0)
        {
            // The word matches the data
            return true;
        }
        temp = temp->next;
    }
    // Otherwise
    return false;

}

After you finish using the hash function data structure, we would better free all allocated memory. In order to do that we can define a specific unload function in which we write the following few lines of codes.

node *FreeNode;
    for (int i = 0; i < Number; i++)
    {
        // Free list, by using a while loop and a temporary variable to point
        // to the next node before freeing the current one
        while (table[i] != NULL)
        {
            FreeNode = table[i]->next;
            free(table[i]);
            table[i] = FreeNode;
        }

    }

Would you try updating and deleting a value in your own?

Not to forget that we will call some libraries like those in top of our C code file.

#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>
#include <stdlib.h>

Also we shall not forget defining Number (number of nodes for the hash table)

#define Number 18278

If you wonder how I made this specific number, I just made a 26th counting system for strings of letters only.

If I add ([1-26] * 26^2) + ([1-26] * 26^1) + ([1-26] * 26^0) where [1-26] are the indices of letters in alphabets, so the maximum possible Number should be 18277. I added one extra place in the hash table for strings beginning with non-letter characters. You can modify the hash function if you query for other ASCII characters, so that you can calculate the key for table[18278] chain.

That equation is made for a word depending on it's first three letters. What will happen if I make the hash function for a word depending on it's first four/five letters?

You can find the source code in my repository at github:

https://github.com/kobros-tech/Hashing-Algorithm-for-C

That is everything for now. If you have a question or a suggestion, I welcome your comments and emails.

Use CS50 library in my local machine offline to run codes in C language

M ake your PC ready to run codes in C language How to use CS50 library in your local machine offline Here are three videos presented by someone, they will guide you to make your PC ready to run C files. How to Download and Install Visual Studio Code ( VS Code ) on Windows 10 How to Download and Install C Cpp Toolset ( gcc g++ gdb ) in Windows 10 using mingw-w64 and msys2 How to Set up Visual Studio Code for C and C++ Programming After watching the above videos and following the steps in them, you can apply the following steps in order to use CS50 library for implementing codes written in C language in your local machine offline. Download the zip file from Github Release, https://github.com/cs50/libcs50/releases Unzip it, locate to libcs50/src/, you can get cs50.h and cs50.c Copy cs50.h and cs50.c in the Workspace Create and save a C file which uses cs50 libraries in the Workspace. We can call it hello.c, hello.c should be with cs50.h and cs50.c in the same folde...

Enlighten Mind

Search This Blog

Categories