Indexing is a way to optimise the performance of a database by minimising the number of disk accesses required when a query is processed. It is a data structure technique which is used to quickly locate and access the data in a database.
Indexes are created using a few database columns.
- The first column is the Search key that contains a copy of the primary key or candidate key of the table. These values are stored in sorted order so that the corresponding data can be accessed quickly.
Note: The data may or may not be stored in sorted order. - The second column is the Data Reference or Pointer which contains a set of pointers holding the address of the disk block where that particular key value can be found.
Please note that the indexes are created on Disk, not in main memory. It is always better to create indexes that are sorted on the basis of keys so that binary search can be applied for searching. It doesn’t make much sense to create unsorted indexes since access time will be 0(n).
Types of Indexing
Primary Index
Primary Index is a sorted file which is of fixed length size with two fields. The first field is the primary key and second field, is pointed to that specific data block. In the primary index, there is always one to one relationship between the entries in the index table.
The primary Indexing in DBMS is also further divided into two types.
- Dense Index
- Sparse Index
Dense Index
- For every search key value in the data file, there is an index record.
- This record contains the search key and also a reference to the first data record with that search key value.
Sparse Index
- The index record appears only for a few items in the data file. Each item points to a block as shown.
- To locate a record, we find the index record with the largest search key value less than or equal to the search key value we are looking for.
- We start at that record pointed to by the index record, and proceed along with the pointers in the file (that is, sequentially) until we find the desired record.
Below is an database index Example of Sparse Index
Secondary Indexing
- In secondary indexing, our main file is unsorted.
- Secondary indexing can be done on key as well as non key attribute.
- A non clustered or secondary index just tells us where the data lies, i.e. it gives us a list of virtual pointers or references to the location where the data is actually stored.
- We can have only dense ordering in the secondary indexing as sparse ordering is not possible because data is not sorted in the main file.
Let’s understand secondary indexing with a database index example:
In a bank account database, data is stored sequentially by acc_no; you may want to find all accounts in of a specific branch of ABC bank.
Here, you can have a secondary index in DBMS for every search-key. Index record is a record point to a bucket that contains pointers to all the records with their specific search-key value.
Clustered Indexing
- Clustering indexing is defined on a sorted data file. The data file is sorted on a non-key field like age of a student or salary of an employee etc. In real world, there are cases where we want to search on the basis of non unique key. For such cases, the index is created on non-primary key columns which may not be unique for each record.
- In such cases, in order to identify the records faster, we will group two or more columns together to get the unique values and create index out of them. This method is known as the clustered indexing
- There will be only one entry for each unique value of non key attribute in the index table.
- Basically, records with similar characteristics are grouped together and indexes are created for these groups. For example, students studying in each semester are grouped together. i.e. 1st Semester students, 2nd semester students, 3rd semester students etc are grouped.
Multilevel Indexing
Multilevel Indexing in database is created when a primary index does not fit in memory. In this type of indexing method, we can create indexes for indexes in a sparse manner.
B-Trees
Indexing in dbms works on B-Trees internally. B-tree is a data structure that store data in its node in sorted order. We can represent sample B-tree as follows.
B-tree stores data such that each node contains keys in ascending order. Each of these keys has two references to another two child nodes. Te left side child node keys are less than the current keys and the right side child node keys are more than the current keys. If a single node has “n” number of keys, then it can have maximum “n+1” child nodes.
B-tree index is the widely used data structures for tree based indexing in DBMS. It is a multilevel format of tree based indexing in DBMS technique which has balanced binary search trees. All leaf nodes of the B tree signify actual data pointers. Time for insertion, deletion and searching in B-Trees is O(log n) which is equivalent to the height of the tree.
How is indexing done in DBMS using B-Trees
When B-tree comes to the database indexing, this data structure gets a little complicated by not just having a key, but also having a value associated with the key. This value is a reference to the actual data record.
Assume the following three-column table needs to be stored on a database.
First, the database creates a unique random index (or primary key) for each of the given records and converts the relevant rows into a byte stream. Then, it stores each of the keys and record byte streams on a B+tree. Here, the random index used as the key for indexing. The resulting B+tree could be represented as follows.
Here you can see that all records are stored in the leaf nodes of the B+tree and index used as the key to creating a B+tree. No records are stored on non-leaf nodes. Each of the leaf nodes has reference to the next record in the tree. A database can perform a binary search by using the index or sequential search by searching through every element by only travelling through the leaf nodes.
If no indexing is used, then the database reads each of these records to find the given record. When indexing is enabled, the database creates three B-trees for each of the columns in the table as follows. Here the key is the B-tree key used for indexing. The index is the reference to the actual data record.
When indexing is used first, the database searches a given key in correspondence to B-tree and gets the index in O(log(n)) time. Then, it performs another search in B+tree by using the already found index in O(log(n)) time and gets the record.
Advantages of Indexing
- It helps you to reduce the total number of I/O operations needed to retrieve that data, so you don’t need to access a row in the database from an index structure.
- Offers faster search and retrieval of data to users.
Disadvantages of Indexing
- To perform the indexing database management system, you need a primary key on the table with a unique value.
- You can’t perform any other indexes in Database on the Indexed data.
- SQL Indexing Decrease performance in INSERT, DELETE, and UPDATE query.
That was all about indexing in DBMS. I’ve tried to cover everything you need to know about indexing in detail. Feel free to reach out to me in case you have any doubts :)