The search time of this algorithm is independent of the size of elements. In binary and sequential searching a number of comparisons are required to find out a value. In searching best algorithms are those that minimize these comparisons. In hashing there is no unnecessary searching. In one single attempt the value may be retrieved from the array. When the value is retrieved in one chance, this means there is some relation between the value and the location where the value is stored. The values themselves can serve as indices to the array.
Let us consider that a company with an inventory file where each part has a part number of 5 digits and the part numbers are to be stored in an array. Suppose the company has 100 such parts. If we set the part numbers as the array index then for 100 part numbers we need an array of huge size where most of the memory cells will be left vacant. Here we will use the last two digits of the part numbers as the array index, so the lower bound of such array will be 0 and upper bound will be 99.
A function that converts the part number into an array index is called hash function. If HAS is the hash function and VALUE is the part number, then HAS (VALUE) is called the hash of key and is the index where the value should be set in the array. If K is the value returned by hash function then K is called the hash key of VALUE. The hash function of the above example is HAS (VALUE) =VALUE % 100.This function can produce a number between 0 and 99 and this will be the index value.
So if the part number is 11111 then its position in the array is 11th index. For part number 78988, the number will be set at 88th index.
This method has one flaw if hash key of two different part numbers becomes same. Suppose the numbers are 78988 and 23188.Here in both the cases hash key is 88.If 78988 is set in the 88th position of the array then we cannot set 23188 in the same location. Such a situation is called hash collision or hash clash.
There are two basic methods of dealing with a hash collision. The first process is known as rehashing where another hash function is used on the hash key to find the next available empty space in the array where the value will be positioned. The second method is called chaining where a separate linked list is created and all the values with same hash key are set.
No comments:
Post a Comment