public class CombinerOptimizer
extends MROpPlanVisitor
Optimizes map-reduce plans to use the combiner where possible.
Algebraic functions and distinct in the nested plan of a foreach are partially
computed in the map and combine phases.
New foreach statements with the initial and intermediate forms of the algebraic
functions are added to the map and combine plans, respectively.
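The split of an algebraic function into initial, intermediate and final forms can be illustrated with a minimal, self-contained sketch. It does not use Pig's classes: the class `AlgebraicAvgSketch`, the `Partial` holder and the method names are hypothetical, and a single group key is assumed. It computes an average by carrying (sum, count) pairs from the map side, through a combine step, to the reduce side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Pig's API.
final class AlgebraicAvgSketch {

    // Partial result passed between phases: a running sum and count.
    static final class Partial {
        final long sum;
        final long count;

        Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // Initial form (map phase): turn one input value into a partial result.
    static Partial initial(long value) {
        return new Partial(value, 1);
    }

    // Intermediate form (combine phase): merge partials into one partial.
    static Partial intermediate(List<Partial> partials) {
        long sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    // Final form (reduce phase): merge the remaining partials and produce the average.
    static double finalForm(List<Partial> partials) {
        Partial total = intermediate(partials);
        return (double) total.sum / total.count;
    }

    // Simulates one group key whose values are spread over several map splits.
    static double runWithCombiner(List<List<Long>> mapSplits) {
        List<Partial> combined = new ArrayList<>();
        for (List<Long> split : mapSplits) {
            List<Partial> mapOutput = new ArrayList<>();
            for (long value : split) {
                mapOutput.add(initial(value));
            }
            // One combined record per split reaches the reducer instead of one per value.
            combined.add(intermediate(mapOutput));
        }
        return finalForm(combined);
    }

    public static void main(String[] args) {
        List<List<Long>> splits = Arrays.asList(
                Arrays.asList(1L, 2L, 3L),
                Arrays.asList(4L, 5L));
        System.out.println(runWithCombiner(splits));
    }
}
```

In this sketch the reducer receives two partial records rather than five raw values; that reduction in shuffled data is the benefit the combiner is meant to provide.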
If the bag portion of the group-by result is projected, or a non-algebraic
expression/UDF takes the bag as input, the combiner is not used. In such
cases the combiner is likely to degrade performance, because the combine
stage does not reduce the data size enough to offset the cost of the
additional (de)serialization it introduces.
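The rule above can be sketched as a simple predicate. This is a simplified illustration, not the optimizer's actual logic: the class `CombinerEligibilitySketch` and the `Kind` categories are hypothetical stand-ins for the expressions found in the foreach that follows a group-by.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Pig's API.
final class CombinerEligibilitySketch {

    // Assumed, simplified categories of expressions in the foreach after a group-by.
    enum Kind {
        GROUP_KEY,            // projection of the group key
        ALGEBRAIC_ON_BAG,     // algebraic function applied to the bag
        DISTINCT_ON_BAG,      // distinct in the nested plan
        BAG_PROJECTION,       // the bag itself is projected
        NON_ALGEBRAIC_ON_BAG  // non-algebraic expression/UDF taking the bag as input
    }

    // Returns false as soon as any expression needs the full bag,
    // since a combine step could not shrink the data in that case.
    static boolean canUseCombiner(List<Kind> foreachExpressions) {
        for (Kind kind : foreachExpressions) {
            if (kind == Kind.BAG_PROJECTION || kind == Kind.NON_ALGEBRAIC_ON_BAG) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canUseCombiner(
                Arrays.asList(Kind.GROUP_KEY, Kind.ALGEBRAIC_ON_BAG)));
        System.out.println(canUseCombiner(
                Arrays.asList(Kind.GROUP_KEY, Kind.BAG_PROJECTION)));
    }
}
```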
Major areas for enhancement:
1. use of the combiner in cogroup
2. queries with order-by, limit or sort in a nested foreach after group-by
3. cases where group-by is followed by a filter that has an algebraic expression