Home >>Distributed DBMS Tutorial >DDBMS - Query Optimization in Centralized Systems
The optimal access path is determined once the alternative access paths are derived for processing a relational algebra expression. We will discuss query optimization in a centralized system in this chapter, while we will research query optimization in a distributed system in the next chapter.
Query processing is performed in a centralized system with the following goal:
The SQL query is scanned initially. Then it is evaluated to look for syntactic errors and data type correctness. If the query passes this step, it decomposes the query into smaller blocks of queries. Each block is then converted to the equivalent expression in relational algebra.
Steps for Query Optimization
Three steps are involved in query optimization, namely generation of query tree, generation of plan, and generation of query plan code.
Step 1 − Query Tree Generation
A query tree is a structure of tree data that represents a relational expression of algebra. The query tables are represented as leaf nodes. As the internal nodes, the relational algebra operations are represented. As a whole, the root represents the query.
An internal node is executed during execution whenever its operand tables are available. The node is then replaced by the table for the result. Until the root node is executed and replaced by the result table, this method continues for all internal nodes.
Let us consider, for instance , the following schemas
EMPLOYEE
EmpID | EName | Salary | DeptNo | DateOfJoining |
---|
DEPARTMENT
DNo | DName | Location |
---|
Example 1
Let us consider the query as the following.
$$\pi_{EmpID} (\sigma_{EName = \small "ArunKumar"} {(EMPLOYEE)})$$
The corresponding query tree will be −
Example 2
Let us consider another query involving a join.
$\pi_{EName, Salary} (\sigma_{DName = \small "Marketing"} {(DEPARTMENT)}) \bowtie_{DNo=DeptNo}{(EMPLOYEE)}$
Following is the query tree for the above query.
Step 2 − Query Plan Generation
A query plan is made after the query tree is created. A query plan is an extended tree of queries that provides access paths for all query tree operations. Access paths define how you can perform the relational operations in the tree. A selection operation, for instance, may have an access path that provides information about how to use the B+ tree index for selection.
In addition, the query plan also states how to move the intermediate tables from one operator to the next, how to use temporary tables, and how to pipeline / combine operations.
Step 3− Code Generation
The final step in query optimization is code generation. It is the executable form of the query, the form of which is based on the type of the operating system underlying it. Once the query code is created, it is run by the Execution Manager and the results are produced.
Exhaustive search and heuristics-based algorithms are often used among the approaches for query optimization.
Exhaustive Search Optimization
In these techniques, all possible query plans are initially created for a query and the best plan is then chosen. While these techniques have the best solution, due to the large solution space, they have an exponential time and space complexity. Dynamic programming techniques, for instance.
Heuristic Based Optimization
For query optimization, heuristic based optimization uses rule-based optimization approaches. These algorithms have a complexity of polynomial time and space, which is lower than the exponential complexity of exhaustive algorithms based on search. These algorithms do not, however, actually generate the best plan for the query.
Some of the common rules for heuristic are −
Perform prior to entering operations, pick and project operations. This is achieved by moving the operations of pick and project down the question tree. This limits the number of tuples which can be joined.
Before the other operations, execute the most restrictive select / project operations first.
Avoid cross-product operations because they result in intermediate tables of very large sizes