Home >>Distributed DBMS Tutorial >DDBMS - Query Optimization in Distributed Systems
Processing a query in a distributed database system consists of optimization at both the global and local levels. The query reaches the client or control site of the database system. Here, the user is validated, the query is checked, translated, and globally optimized.
You may represent the architecture as –
As follows, the process of mapping global queries to local ones can be realized
For example, let us consider that according to the area, the following project schema is horizontally fragmented, the cities being New Delhi, Kolkata and Hyderabad.
PROJECT
PId | City | Department | Status |
---|
Suppose there is a query for all projects whose status is "Ongoing" to retrieve details.
The global query will be &inus;
$$\sigma_{status} = {\small "ongoing"}^{(PROJECT)}$$
Query in New Delhi’s server will be −
$$\sigma_{status} = {\small "ongoing"}^{({NewD}_-{PROJECT})}$$
Query in Kolkata’s server will be −
$$\sigma_{status} = {\small "ongoing"}^{({Kol}_-{PROJECT})}$$
Query in Hyderabad’s server will be −
$$\sigma_{status} = {\small "ongoing"}^{({Hyd}_-{PROJECT})}$$
In order to get the overall result, we need to union the results of the three queries as follows −
$\sigma_{status} = {\small "ongoing"}^{({NewD}_-{PROJECT})} \cup \sigma_{status} = {\small "ongoing"}^{({kol}_-{PROJECT})} \cup \sigma_{status} = {\small "ongoing"}^{({Hyd}_-{PROJECT})}$
Distributed query optimization involves a large number of query trees to be evaluated, each of which produces the necessary query results. This is largely due to the existence of large amounts of repeated and fragmented information. The goal, therefore, is to find an optimal solution rather than the best solution.
The main problems for optimizing distributed queries are −
Optimal Utilization of Resources in the Distributed System
In order to perform the operations pertaining to a query, a distributed system has a variety of database servers on different sites. The methods for efficient use of resources are below.
Operation Shipping: The operation is carried out on the site where the data is processed and not on the site of the client during operation shipping. The findings are then moved to the client site. This is ideal for operations where operands are accessible at the same place. Example: Operations of Select and Project.
Data Shipping-The data fragments are moved to the database server during data shipping, where the operations are performed. This is used in systems in which the operands are distributed at different sites. In systems where the communication costs are minimal, and local processors are much slower than the client server, this is also acceptable.
Hybrid Shipping- This is a combination of shipping of data and activities. Here, data fragments are moved where the procedure runs to the high-speed processors. The findings are then sent to the client site.
Query Trading
The control / client site for a distributed query is called the buyer and the sites where the local queries are performed are called sellers in the query trading algorithm for distributed database systems. A variety of alternatives for selecting sellers and reconstructing the global results are formulated by the buyer. The buyer's aim is to achieve the optimal cost.
The algorithm begins with the buyer assigning the seller sites to sub-queries. From local optimised query plans proposed by the sellers in conjunction with the communication costs for reconstructing the final result, the optimal plan is generated. The query is performed until the global optimal plan is formulated.
Reduction of Solution Space of the Query
Generally, the optimal solution requires reducing the space of the solution such that the cost of query and data transfer is reduced. This, like heuristics in centralized systems, can be done by a set of heuristic rules.
Some of the rules are below: