Skip to content

Instantly share code, notes, and snippets.

@maheensaleh
Last active September 28, 2020 17:37
Show Gist options
  • Save maheensaleh/b2093051402729831a2bc52b0a1d88c3 to your computer and use it in GitHub Desktop.
Save maheensaleh/b2093051402729831a2bc52b0a1d88c3 to your computer and use it in GitHub Desktop.

Indexing:

Indexing is helpful to search records in a database in an efficient manner with reduced access time.( search is performed with the help of that field-value on which an index is created/defined )

Some basic concepts related to indexing are explained below:

Key Terminologies:

  • Search Key : The attribute through which the records are searched.
  • Record File : The file in which the complete reocrd/data is stored.
  • Index File : The file in which the indices are mantained.

Types of Indices:

There are basically two types of indices :

1 - Ordered Indices 2 - Hash Indices

1- Ordered Indices :

In ordered indices, the search-key values are always in sorted order in the index file. Whereas, these search-key values might and might not be in sorted order in the original record file. This gives us the following types of Ordered Indices :

a ) Clustering Indices or Primary Indices :

Clustering indices have following properties :

  • The search-key values in the record files is also in sorted order.
  • Generally, search-key is the primary key but not necessary.

Also called Primary Index but that doesn;t mean k seacrch-key is always a PK, it can be a non PK.

b ) Non Clustering Indices or Secondary Indices :

Non Clustering Index has following properties:

  • The search key values in the stored record files is not in sorted order.

  • Both candidate and non-candidate keys can be taken as search-key.

Concept of Dense and Sparse Index:

We have two techniques associated with ordered indices:

1 - Dense Index 2 - Sparse Index

- Dense Index:

It has following properties :

  • For every unique search-key value present in the records file, an index is created in the index file.

  • There are two methods by which pointers to records can be stored in the index file.:

  • Method 01 : The index file may store Index pointer to only the first record occurence of every search-key value in the index file.

  • Method 02 : Index file may store pointer to every record occurence of every search key values in the index file.

  • For Clustering Index: Dense index can be defined by any of the above two methods.

  • For Non Clustering Index: Dense index must be defined by method # 2.

    WHY ? : Because for non clustering, the search-key values in the record file are not ordered. i.e all records with a particular search key value will not be stored consecutively in the records file, they will be scattered through out the file and hence we have to scan all the records to find the desired one. Hence pointer to every record must be stored in the index file i.e method 02

- Sparse Index:

It has following properties :

  • Index is created for some ( not all ) search-key values present in the records file.

  • For Clustering Index: Sparse index can be defined for clustering index.

  • For Non Clustering Index: Sparse index cannot be defined.( WHY? : reason same as above )

Multilevel index :

The sole purpose of indexing is to reduce the reduce the time required to get a value from a large data. But for a very very large data, the index file itself becomes large which is undesireable.

To encounter this issue, an another index is created on the index itself. thus called multilevel indexing.

Note : The size of index file must be such that it can be easily kept in the main memory without consuming a very significant amount of space.

2- Hash Indices:

Indices are made with the help of a hashing function.

Key Terminologies :

Hash File Organization : Organizes the records of a record file by a hasing technique.

Hash Index Organization : Organizes the indices of an index file by a hasing technique. Bucket : A unit of storage that stores multiple records. Hash Function : A function that maps a search-key value to a bucket.

Note A bucket may store index entries for the search key values of the records along with the respective pointers OR A bucket may store the actual records of the of their respective search-key values.

Search in Hash Indices:

Search in hash indices has following steps:

  • Say we want to search a record with a search-key value of K,
  • We put this K in the hash function whihc tells us that in which bucket, K can be found. Let that buckte be B.
  • The bucket B may also contain records other than the one we are searching for.
  • Hence we have to scan the enire bucket B to find record with the desired search-key value i.e K

Bucket Overflow:

  • If a bucket is full and more data is to be inserted in it, then bucket overlfow is said to occur.

  • In case of overlfow for a bucket B,an additional bucket is provided to *B called the overlfow buckets.

  • If overflow occurs, again then another overlfow bucket is provided to B.

  • Overflow bucketof B are linked together by a linked list

  • To search a value in this bucket B, it is necessary to search it in all its overflow buckets.

Note : This technique is called Closed Addressing or Closed Hashing or Overflow Chaining

Skew Condition:

The condition in which some buckets are assigned more data than the other buckets. This can occur as a result of :

  • Incorrect choice of hash fucntion which maps most of the search key values to the same hash fucntion.

  • Most of the records might have same search key values.

Hence the hash function must be such that it assigns equal data to all buckets. This is the ideal case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment