Hashing And Its Different Techniques

SIDDHARTH NAHAR
6 min readDec 24, 2020

Co-written by :-SIDDHARTH NAHAR, SIDDHI MUNDADA, SRUSHTI NAIK,DAANISH SHAIKH, VISHWESH MEHER, TEJAS MURKYA,

Heads up, if you've come here searching for different techniques to make hash brown burgers then you should probably turn back, sorry to disappoint you.

Fig 1. Hashing illustration created by Sofi Salazar.

This is a blog about the hashing cryptography technique which we use to secure our passwords and other private stuff online and offline.

Before we move onto the the encryption techniques let us know what the hashing data structure is.

Hashing is an important Data Structure which is designed to use a special function called the Hash function which is used to map a given value with a particular key for faster access of elements. The efficiency of mapping depends of the efficiency of the hash function used.” (Source:- GeeksForGeeks).

Hashing provides constant time search, insert and delete operations on average. This is why hashing is one of the most used data structure, example problems are, distinct elements, counting frequencies of items, finding duplicates, etc. And also with the help of hashing, you can perform the insertion, deletion, and searching operation in O (1) time. (HASHING IS FAST).

Fig 2. Just for fun.

Basics Of Hashing.

As the name loosely suggests, hashing is a process for chopping up and mixing information. In a typical data operation, a string of characters will be transformed into a shorter string of fixed length which represents the original character string. This reduced-length value is also known as a key.

Hashing can be defined as a way to store data into some data structure (generally Hash Table is used) in the form of key-value pairs. For each data you will assign some key and based on that key the insertion, deletion, and searching of your data will be performed.

The key, which is used to identify the data, is given as an input to the hashing function. The hash code, which is an integer, is then mapped to the fixed size we have.

Hashing Function: The Core of Hashing Algorithm

Just like there’s no hash browns without potatoes, there’s no hashing algorithm without a hashing function.

Fig 3. A simple Hash function (too many hash jokes sorry ;))

Let’s put the jokes aside for a moment and concentrate on the crux of the matter. A hash function is a mathematical function that converts an input value into a compressed numerical value — a hash or hash value. Basically, it’s a processing unit that takes in data of arbitrary length and gives you the output of a fixed length — the hash value.

Hash Function: Hash functions or hashing algorithms are the mathematical procedures used in computing hash values from base input numbers and character strings. They may be highly complex, and can produce a hash value that’s almost impossible to derive from the original input data without knowing the applied hash function. For this reason, the keys used in public key encryption may be derived from such algorithms.
A good hash function should have following properties -
1) Efficiently computable.
2) Should uniformly distribute the keys (Each table position equally likely for each key)

Fig 4. A Hash table

Hash Table: An array that stores pointer to records corresponding to a given phone number. An entry in the hash table is NIL if no existing phone number has hash function value equal to the index for the entry.

Popular Hashing Algorithms

Here are some relatively simple hash functions/algorithms that have been used:

Fig 5. The MD5 Hashing Algorithm (picture credits)

1)MD5: An MD5 hash function encodes a string of information and encodes it into a 128-bit fingerprint. MD5 is often used as a checksum to verify data integrity. However, due to its age, MD5 is also known to suffer from extensive hash collision vulnerabilities, but it’s still one of the most widely used algorithms in the world.

2)Collision Handling: Since a hash function gets us a small number for a big key, there is possibility that two keys result in same value. The situation where a newly inserted key maps to an already occupied slot in hash table is called collision and must be handled using some collision handling technique.

Whenever a collision occurs in a hash table, then that collision can be resolved using two techniques:

1. Chaining — The idea is to make each cell of hash table point to a linked list of records that have same hash function value. Chaining is simple, but requires additional memory outside the table.

2. Open Addressing -In open addressing, all elements are stored in the hash table itself. Each table entry contains either a record or NIL. When searching for an element, we one by one examine table slots until the desired element is found or it is clear that the element is not in the table.

Fig 6. Collision resolution techniques (picture credits)

3)The Division-remainder Method: This is a relatively basic hashing algorithm in which the total number of items in a data array is estimated. This figure is then divided into each data value to give a quotient (which is disregarded) and a remainder — which is taken as the hash value. Clearly, with a large number of figures, many duplicate remainders emerge, leading to the possibility of multiple collisions. So search algorithms for such data sets must take this flaw into account.

4) Folding Function: It involves the procedure of splitting keys into two or more parts and then combining the pieces and form the hash addresses.

Example-: To map the key 26456715 to a range between 0 and 9999, you can: split the number into two as 2645 and 6715, and then add these two to obtain 9360 as the hash value.

5)Secure Hash Algorithm-2 (SHA-2): SHA-2, developed by the National Security Agency (NSA), is a cryptographic hash function. SHA-2 includes significant changes from its predecessor, SHA-1. The SHA-2 family consists of six hash functions with digests (hash values) that are 224, 256, 384 or 512 bits: SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, SHA-512/256.

6)CRC32: A cyclic redundancy check (CRC) is an error-detecting code often used for the detection of accidental changes to data. Encoding the same data string using CRC32 will always result in the same hash output, thus CRC32 is sometimes used as a hash algorithm for file integrity checks. These days, CRC32 is rarely used outside of Zip files

There are several well-known hash functions used in cryptography. These include the message-digest hash functions MD2, MD4, used for hashing digital signatures into a shorter value called a message-digest, and the Secure Hash Algorithm (SHA), a standard algorithm, that makes a larger (60-bit) message digest and is similar to MD4. A hash function that works well for database storage and retrieval, however, might not work as for cryptographic or error-checking purposes.

Shall We Wrap Up?

In conclusion, hash is a useful tool which verifies that the data is copied correctly between two resources. It can also check if the data are identical without opening and comparing them.We can use it primarily for retrieval of items in a database as it makes it quicker to find the item using a short hash key than to locate it using the original value.

It ensures that the messages during transmission are not tamper with and thus plays a vital role in the data security system. It is also used for encryption and decryption of digital signatures using hash codes.

--

--

SIDDHARTH NAHAR

Electronics undergrad with a passion for Computer science and ML