![]() |
VOOZH | about |
A single query can be executed in many ways. Query optimization helps choose the most efficient plan by comparing different execution methods to find the one with the lowest cost.
Importance: The goal of query optimization is to reduce the system resources required to fulfill a query, and ultimately provide the user with the correct result set faster.
To optimize a query, we use equivalence rules to rewrite it into simpler, equivalent relational algebra expressions. Below are some of the ways to optimize queries:
1. Conjunctive selection operations can be written as a sequence of individual selections. This is called a sigma-cascade.
Explanation: Applying condition intersection is expensive. Instead, filter out tuples satisfying condition (inner selection) and then apply condition (outer selection) to the then resulting fewer tuples. This leaves us with less tuples to process the second time. This can be extended for two or more intersecting selections. Since we are breaking a single condition into a series of selections or cascades, it is called a "cascade".
2. Selection is commutative.
Explanation: condition is commutative in nature. This means, it does not matter whether we apply first or first. In practice, it is better and more optimal to apply that selection first which yields a fewer number of tuples. This saves time on our outer selection.
3. All following projections can be omitted, only the first projection is required. This is called a pi-cascade.
Explanation: A cascade or a series of projections is meaningless. This is because in the end, we are only selecting those columns which are specified in the last, or the outermost projection. Hence, it is better to collapse all the projections into just one i.e. the outermost projection.
4. Selections on Cartesian Products can be re-written as Theta Joins.
5. Theta Joins are commutative.
Explanation: Theta Joins are commutative, and the query processing time depends to some extent which table is used as the outer loop and which one is used as the inner loop during the join process (based on the indexing structures and blocks).
6. Join operations are associative.
7. Selection operation can be distributed.
8. Projection distributes over the Theta Join.
9. Union and Intersection are commutative.
Explanation: Union and intersection are both distributive; we can enclose any tables in parentheses according to requirement and ease of access.
10. Union and Intersection are associative.
Explanation: Union and intersection are both distributive; we can enclose any tables in parentheses according to requirement and ease of access.
11. Selection operation distributes over the union, intersection, and difference operations.
Explanation: In set difference, we know that only those tuples are shown which belong to table E1 and do not belong to table E2. So, applying a selection condition on the entire set difference is equivalent to applying the selection condition on the individual tables and then applying set difference. This will reduce the number of comparisons in the set difference step.
12. Projection operation distributes over the union operation.
Explanation: Applying individual projections before computing the union of E1 and E2 is more optimal than the left expression, i.e. applying projection after the union step.
A set of equivalence rules is said to be minimal if no rule can be derived from any combination of the others. A query is said to be optimal when it is minimal.
Examples: Assume the following tables:
instructor(ID, name, dept_name, salary)
teaches(ID, course_id, sec_id, semester, year)
course(course_id, title, dept_name, credits)
Query 1: Find the names of all instructors in the Music department, along with the titles of the courses that they teach
Here, dept_name is a field of only the instructor table. Hence, we can select out the Music instructors before joining the tables, hence reducing query time.
Optimized Query: Using rule 7a, and Performing the selection as early as possible reduces the size of the relation to be joined.
Query 2: Find the names of all instructors in the CSE department who have taught a course in 2009, along with the titles of the courses that they taught
Optimized Query: We can perform an "early selection", hence the optimized query becomes: