Building Query Compilers
(Under Construction)
[expected time to completion: 5 years]

Guido Moerkotte

October 31, 2024

Contents

I Basics

1 Introduction
  1.1 General Remarks
  1.2 DBMS Architecture
  1.3 Interpretation versus Compilation
  1.4 Requirements for a Query Compiler
  1.5 Search Space
  1.6 Generation versus Transformation
  1.7 Focus
  1.8 Organization of the Book

2 Textbook Query Optimization
  2.1 Example Query and Outline
  2.2 Algebra
  2.3 Canonical Translation
  2.4 Logical Query Optimization
  2.5 Physical Query Optimization
  2.6 Discussion

3 Join Ordering
  3.1 Queries Considered
    3.1.1 Query Graph
    3.1.2 Join Tree
    3.1.3 Simple Cost Functions
    3.1.4 Classification of Join Ordering Problems
    3.1.5 Search Space Sizes
    3.1.6 Problem Complexity
  3.2 Deterministic Algorithms
    3.2.1 Heuristics
    3.2.2 Determining the Optimal Join Order in Polynomial Time
    3.2.3 The Maximum-Value-Precedence Algorithm
    3.2.4 Dynamic Programming
    3.2.5 Memoization
    3.2.6 Join Ordering by Generating Permutations
    3.2.7 A Dynamic Programming based Heuristics for Chain Queries
    3.2.8 Transformation-Based Approaches
  3.3 Probabilistic Algorithms
    3.3.1 Generating Random Left-Deep Join Trees with Cross Products
    3.3.2 Generating Random Join Trees with Cross Products
    3.3.3 Generating Random Join Trees without Cross Products
    3.3.4 Quick Pick
    3.3.5 Iterative Improvement
    3.3.6 Simulated Annealing
    3.3.7 Tabu Search
    3.3.8 Genetic Algorithms
  3.4 Hybrid Algorithms
    3.4.1 Two Phase Optimization
    3.4.2 AB-Algorithm
    3.4.3 Toured Simulated Annealing
    3.4.4 GOO-II
    3.4.5 Iterative Dynamic Programming
  3.5 Ordering Order-Preserving Joins
  3.6 Characterizing Search Spaces
    3.6.1 Complexity Thresholds
  3.7 Discussion
  3.8 Bibliography

4 Database Items, Building Blocks, and Access Paths
  4.1 Disk Drive
  4.2 Database Buffer
  4.3 Physical Database Organization
  4.4 Slotted Page and Tuple Identifier (TID)
  4.5 Physical Record Layouts
  4.6 Physical Algebra (Iterator Concept)
  4.7 Simple Scan
  4.8 Scan and Attribute Access
  4.9 Temporal Relations
  4.10 Table Functions
  4.11 Indexes
  4.12 Single Index Access Path
    4.12.1 Simple Key, No Data Attributes
    4.12.2 Complex Keys and Data Attributes
  4.13 Multi Index Access Path
  4.14 Indexes and Joins
  4.15 Remarks on Access Path Generation
  4.16 Counting the Number of Accesses
    4.16.1 Counting the Number of Direct Accesses
    4.16.2 Counting the Number of Sequential Accesses
    4.16.3 Pointers into the Literature
  4.17 Disk Drive Costs for N Uniform Accesses
    4.17.1 Number of Qualifying Cylinders, Tracks, and Sectors
    4.17.2 Command Costs
    4.17.3 Seek Costs
    4.17.4 Settle Costs
    4.17.5 Rotational Delay Costs
    4.17.6 Head Switch Costs
    4.17.7 Discussion
  4.18 Concluding Remarks
  4.19 Bibliography

II Foundations

5 Logic, Null, and Boolean Expressions
  5.1 Two-Valued Logic
  5.2 Null Values
    5.2.1 Functions and Operators
    5.2.2 Comparison Operators
  5.3 Three-Valued Logic
  5.4 Preparation of Boolean Expressions
  5.5 Equivalence Classes based on Equality
  5.6 Nullability Inference
  5.7 Bibliography

6 Functional Dependencies
  6.1 Functional Dependencies
  6.2 Functional Dependencies in the presence of NULL values
  6.3 Deriving Functional Dependencies over algebraic operators
  6.4 Bibliography

7 An Algebra for Sets, Bags, and Sequences
  7.1 Sets, Bags, and Sequences
    7.1.1 Sets
    7.1.2 Duplicate Data: Bags
    7.1.3 Explicit Duplicate Control
    7.1.4 Ordered Data: Sequences
  7.2 Aggregation Functions
  7.3 Operators
    7.3.1 Preliminaries
    7.3.2 Signatures
    7.3.3 Projection
    7.3.4 Selection
    7.3.5 Map
    7.3.6 Unary Grouping
    7.3.7 Unnest Operators
    7.3.8 Flatten Operator
    7.3.9 Join Operators
    7.3.10 Groupjoin
    7.3.11 Min/Max Operators
    7.3.12 Other Dependent Operators
  7.4 Linearity of Algebraic Operators
    7.4.1 Linearity of Algebraic Operators
    7.4.2 Exploiting Linearity
  7.5 Representations
    7.5.1 Three Different Representations
    7.5.2 Conversion between Representations
    7.5.3 Conversion between Bulk Types
    7.5.4 Adjusting the Algebra
    7.5.5 Partial Preaggregation
  7.6 A Note on Equivalences
  7.7 Simple Reorderability
    7.7.1 Unary Operators
    7.7.2 Push-Down/Pull-Up of Unary into/from Binary Operators
    7.7.3 Binary Operators
  7.8 Predicate Detachment and Attachment
  7.9 Basic Equivalences for D-Join
  7.10 Equivalences for Outerjoins
    7.10.1 Outerjoin Simplification
    7.10.2 Generalized Outerjoin
  7.11 Equivalences for Unary Grouping
    7.11.1 An Elementary Fact about Grouping
    7.11.2 Join
    7.11.3 Left Outerjoin
    7.11.4 Left Outerjoin with Default
    7.11.5 Full Outerjoin
    7.11.6 D-Join
    7.11.7 Groupjoin
    7.11.8 Intersection and Difference
  7.12 Eliminating Redundant Joins
  7.13 Semijoin and Antijoin Reducer
  7.14 Outerjoin Simplification
  7.15 Correct and Complete Exploration of the Core Search Space
    7.15.1 The Core Search Space
    7.15.2 Exploration
    7.15.3 More Issues
  7.16 Logical Algebra for Sequences
    7.16.1 Introduction
    7.16.2 Algebraic Operators
    7.16.3 Equivalences
    7.16.4 Bibliography
  7.17 Literature
  7.18 ToDo

8 Declarative Query Representation
  8.1 Calculus Representations
  8.2 Datalog
  8.3 Tableaux Representation
  8.4 Monoid Comprehension
  8.5 Expressiveness
  8.6 Bibliography

9 Translation and Lifting
  9.1 Query Language to Calculus
  9.2 Query Language to Algebra
  9.3 Calculus to Algebra
  9.4 Algebra to Calculus
  9.5 Bibliography

10 Query Equivalence, Containment, Minimization, and Factorization
  10.1 Set Semantics
    10.1.1 Conjunctive Queries
    10.1.2 ... with Inequalities
    10.1.3 ... with Negation
    10.1.4 ... under Constraints
    10.1.5 ... with Aggregation
  10.2 Bag Semantics
    10.2.1 Conjunctive Queries
  10.3 Sequences
    10.3.1 Path Expressions
  10.4 Minimization
  10.5 Detecting common subexpressions
    10.5.1 Simple Expressions
    10.5.2 Algebraic Expressions
  10.6 Bibliography

III Rewrite Techniques

11 Simple Rewrites
  11.1 Simple Adjustments
    11.1.1 Rewriting Simple Expressions
    11.1.2 Normal forms for queries with disjunction
  11.2 Deriving new predicates
    11.2.1 Collecting conjunctive predicates
    11.2.2 Equality
    11.2.3 Inequality
    11.2.4 Aggregation
    11.2.5 ToDo
  11.3 Predicate Push-Down and Pull-Up
  11.4 Eliminating Redundant Joins
  11.5 Distinct Pull-Up and Push-Down
  11.6 Set-Valued Attributes
    11.6.1 Introduction
    11.6.2 Preliminaries
    11.6.3 Query Rewrite
  11.7 Bibliography

12 View Merging
  12.1 View Resolution
  12.2 Simple View Merging
  12.3 Predicate Move Around (Predicate pull-up and push-down)
  12.4 Complex View Merging
    12.4.1 Views with Distinct
    12.4.2 Views with Group-By and Aggregation
    12.4.3 Views in IN predicates
    12.4.4 Final Remarks
  12.5 Bibliography

13 Quantifier treatment
  13.1 Pseudo-Quantifiers
  13.2 Existential quantifier
  13.3 Universal quantifier
  13.4 Bibliography

14 Unnesting Nested Queries

15 Optimizing Queries with Materialized Views
  15.1 Conjunctive Views
  15.2 Views with Grouping and Aggregation
  15.3 Views with Disjunction
  15.4 Bibliography

16 Semantic Query Rewrite
  16.1 Constraints and their impact on query optimization
  16.2 Semantic Query Rewrite
  16.3 Exploiting Uniqueness in Query Optimization
  16.4 Bibliography

IV Plan Generation

17 Current Search Space and Its Limits
  17.1 Plans with Outer Joins, Semijoins and Antijoins
  17.2 Expensive Predicates and Functions
  17.3 Techniques to Reduce the Search Space
  17.4 Bibliography

18 Dynamic Programming-Based Plan Generation
  18.1 Introduction
  18.2 Hypergraphs
  18.3 CCPs: Csg-Cmp-Pairs for Hypergraphs
  18.4 Neighborhood
  18.5 The CCP Enumerator BuEnumCcpHyp
    18.5.1 BuEnumCcpHyp
    18.5.2 EnumerateCsgRec
    18.5.3 EmitCsg
    18.5.4 EnumerateCmpRec
    18.5.5 EmitCsgCmp
    18.5.6 Neighborhood Calculation
  18.6 DPhyp
  18.7 Adding Selections
  18.8 Adding Maps
  18.9 Adding Grouping

19 Optimizing Queries with Disjunctions
  19.1 Introduction
  19.2 Using Disjunctive or Conjunctive Normal Forms
  19.3 Bypass Plans
  19.4 Implementation remarks
  19.5 Other plan generators/query optimizers
  19.6 Bibliography

20 Generating Plans for the Full Algebra

21 Generating DAG-structured Plans

22 Simplifying the Query Graph
  22.1 Introduction
  22.2 On Optimizing Join Queries
  22.3 Graph Simplification Algorithm
    22.3.1 Simplifying the Query Graph
    22.3.2 The Full Algorithm
    22.3.3 Join Ordering Criterion
    22.3.4 Theoretical Foundation
  22.4 The Time/Quality Trade-Off

23 Deriving and Dealing with Interesting Orderings and Groupings
  23.1 Introduction
  23.2 Problem Definition
    23.2.1 Ordering
    23.2.2 Grouping
    23.2.3 Functional Dependencies
    23.2.4 Algebraic Operators
    23.2.5 Plan Generation
  23.3 Overview
  23.4 Detailed Algorithm
    23.4.1 Overview
    23.4.2 Determining the Input
    23.4.3 Constructing the NFSM
    23.4.4 Constructing the DFSM
    23.4.5 Precomputing Values
    23.4.6 During Plan Generation
    23.4.7 Reducing the Size of the NFSM
    23.4.8 Complex Ordering Requirements
  23.5 Experimental Results
  23.6 Total Impact
  23.7 Influence of Groupings
  23.8 Annotated Bibliography

24 Cardinality and Cost Estimation
  24.1 Introduction
  24.2 A First Approach
    24.2.1 Top-Most Cost Formula (Overall Costs)
    24.2.2 Summation of Operator Costs
    24.2.3 CPU Cost
    24.2.4 Abbreviations
    24.2.5 I/O Costs
    24.2.6 Cardinality Estimates
  24.3 The Simple Profile: A First Logical Profile and its Propagation
    24.3.1 The Logical Profile
    24.3.2 Assumptions
    24.3.3 Profile Propagation for Selection
    24.3.4 Profile Propagation for Join
    24.3.5 Profile Propagation for Projection
    24.3.6 Profile Propagation for Division
    24.3.7 Remarks
  24.4 Approximation of a Set of Values
    24.4.1 Approximations and Error Metrics
    24.4.2 Example Applications
  24.5 Approximation with Linear Models
    24.5.1 Linear Models
    24.5.2 Example Applications
    24.5.3 Linear Models Under l2
    24.5.4 Linear Models Under l∞
    24.5.5 Linear Models Under lq
    24.5.6 Non-Linear Models under lq
    24.5.7 Multidimensional Models under lq
  24.6 Traditional Histograms
    24.6.1 Bucketization
    24.6.2 Heuristics to Determine Bucket Boundaries
  24.7 More on Q
    24.7.1 Properties of the Q-Error
    24.7.2 Properties of Estimation Functions
    24.7.3 θ,q-Acceptability
    24.7.4 Testing θ,q-Acceptability for Buckets
    24.7.5 From Buckets To Histograms
    24.7.6 Q-Compression
  24.8 One Dimensional Synopses
    24.8.1 Four Level Tree and Variants
    24.8.2 Q-Histograms (Type I)
    24.8.3 Q-Histogram (Type II)
  24.9 Sketches For Counting The Number of Distinct Values
    24.9.1 Linear Counting
    24.9.2 DvByKMinVal
    24.9.3 Logarithmic Counting
    24.9.4 SuperLogLog Counting
    24.9.5 HyperLogLog Counting
    24.9.6 DvByMinAvg
    24.9.7 DvByKMinAvg
    24.9.8 Pointers to the Literature
  24.10 Multidimensional Synopsis
    24.10.1 Introductory Example
    24.10.2 Solving the Introductory Problem without 2-Dimensional Synopsis
    24.10.3 Statistical Views
    24.10.4 Regular Partitioning: equi-width
    24.10.5 Equi-Depth Histogram
    24.10.6 2-Dimensional Synopsis based on SVD
    24.10.7 PHASED
    24.10.8 MHIST
    24.10.9 GENHIST
    24.10.10 HiRed
    24.10.11 VI Histograms
    24.10.12 Grid Trees
    24.10.13 More
  24.11 Iterative Selectivity Combination
  24.12 Maximum Entropy
  24.13 Selected Issues
    24.13.1 Exploiting and Augmenting Existing DBMS Data Structures
    24.13.2 Sampling
    24.13.3 Query Feedback
    24.13.4 Combining Data Summaries with Sampling
    24.13.5 Wavelets
    24.13.6 Selectivity of String-Valued Attributes
  24.14 Cost Functions
    24.14.1 Disk-based Joins
    24.14.2 Main Memory Joins
    24.14.3 Additional Pointers to the Literature

V Implementation

25 Architecture of a Query Compiler
  25.1 Compilation process
  25.2 Architecture
  25.3 Control Blocks
  25.4 Memory Management
  25.5 Tracing and Plan Visualization
  25.6 Driver
  25.7 Bibliography

26 Internal Representations
  26.1 Requirements
  26.2 Algebraic Representations
    26.2.1 Graph Representations
    26.2.2 Query Graph
    26.2.3 Operator Graph
  26.3 Query Graph Model (QGM)
  26.4 Classification of Predicates
  26.5 Treatment of Distinct
  26.6 Query Analysis and Materialization of Analysis Results
  26.7 Query and Plan Properties
  26.8 Conversion to the Internal Representation
    26.8.1 Preprocessing
    26.8.2 Translation into the Internal Representation
  26.9 Bibliography

27 Details on the Phases of Query Compilation
  27.1 Parsing
  27.2 Semantic Analysis, Normalization, Factorization, Constant Folding, and Translation
  27.3 Normalization
  27.4 Factorization
  27.5 Constant Folding
  27.6 Semantic analysis
  27.7 Translation
  27.8 Rewrite I
  27.9 Plan Generation
  27.10 Rewrite II
  27.11 Code generation
  27.12 Bibliography

28 Hard-Wired Algorithms
  28.1 Hard-wired Dynamic Programming
    28.1.1 Introduction
    28.1.2 A plan generator for bushy trees
    28.1.3 A plan generator for bushy trees and expensive selections
    28.1.4 A plan generator for bushy trees, expensive selections and functions
  28.2 Bibliography

29 Rule-Based Algorithms
  29.1 Rule-based Dynamic Programming
  29.2 Rule-based Memoization
  29.3 Bibliography

30 Example Query Compiler
  30.1 Research Prototypes
    30.1.1 AQUA and COLA
    30.1.2 Black Dahlia II
    30.1.3 Epoq
    30.1.4 Ereq
    30.1.5 Exodus/Volcano/Cascade
    30.1.6 Freytag's rule-based System R emulation
    30.1.7 Genesis
    30.1.8 GOMbgo
    30.1.9 Gral
    30.1.10 Lambda-DB
    30.1.11 Lanzelotte in short
    30.1.12 Opt++
    30.1.13 Postgres
    30.1.14 Sciore & Sieg
    30.1.15 Secondo
    30.1.16 Squiral
    30.1.17 System R and System R*
    30.1.18 Starburst and DB2
    30.1.19 Straube's optimizer
    30.1.20 Other Query Optimizers
  30.2 Commercial Query Compilers
    30.2.1 The DB2 Query Compiler
    30.2.2 The Oracle Query Compiler
    30.2.3 The SQL Server Query Compiler

VI Selected Topics

31 Generating Plans for Top-N-Queries?
  31.1 Motivation and Introduction
  31.2 Optimizing for the First Tuple
  31.3 Optimizing for the First N Tuples

32 Recursive Queries

33 Issues Introduced by OQL
  33.1 Type-Based Rewriting and Pointer Chasing Elimination
  33.2 Class Hierarchies
  33.3 Cardinalities and Cost Functions

34 Issues Introduced by XPath
  34.1 A Naive XPath-Interpreter and its Problems
  34.2 Dynamic Programming and Memoization
  34.3 Naive Translation of XPath to Algebra
  34.4 Pushing Duplicate Elimination
  34.5 Avoiding Duplicate Work
  34.6 Avoiding Duplicate Generation
  34.7 Index Usage and Materialized Views
  34.8 Cardinalities and Costs
  34.9 Bibliography

35 Issues Introduced by XQuery
  35.1 Reordering in Ordered Context
  35.2 Result Construction
  35.3 Unnesting Nested XQueries
  35.4 Cardinalities and Cost Functions
  35.5 Bibliography

36 Outlook

A Query Languages?
  A.1 Designing a query language
  A.2 SQL
  A.3 OQL
  A.4 XPath
  A.5 XQuery
  A.6 Datalog

B Query Execution Engine (?)
C Glossary of Rewrite and Optimization Techniques

D Useful Formulas

Bibliography

Index

E ToDo

List of Figures

1.1 DBMS architecture
1.2 Query interpreter
1.3 Simple query interpreter
1.4 Query compiler
1.5 Query compiler architecture
1.6 Execution plan
1.7 Potential and actual search space
1.8 Generation vs. transformation
2.1 Relational algebra
2.2 Equivalences for the relational algebra
2.3 (Simplified) Canonical translation of SQL to algebra
2.4 Text book query optimization
2.5 Logical query optimization
2.6 Different join trees
2.7 Plans for example query (Part I)
2.8 Plans for example query (Part II)
2.9 Physical query optimization
2.10 Plan for example query after physical query optimization
3.1 Query graph for example query of Section 2.1
3.2 Query graph shapes
3.3 Illustrations for the IKKBZ Algorithm
3.4 A query graph, its directed join graph, some spanning trees and join trees
3.5 A query graph, its directed join tree, a spanning tree and its problem
3.6 Search space with sharing under optimality principle
3.7 Algorithm DPsize
3.8 Algorithm DPsub
3.9 Size of the search space for different graph structures
3.10 Algorithm DPccp
3.11 Enumeration Example for DPccp
3.12 Sample graph to illustrate EnumerateCsgRec
3.13 Call sequence for Figure 3.12
3.14 Example of rule transformations (RS-1)
3.15 Encoding Trees
3.16 Paths
3.17 Tree-merge
3.18 Algorithm UnrankDecomposition
3.19 Leaf-insertion
3.20 A query graph, its tree, and its standard decomposition graph
3.21 Algorithm Adorn
3.22 A query graph, a join tree, and its encoding
3.23 Pseudo code for IDP-1
3.24 Pseudocode for IDP-2
3.25 Subroutine applicable-predicates
3.26 Subroutine construct-bushy-tree
3.27 Subroutine extract-plan and its subroutine
3.28 Impact of selectivity on the search space
3.29 Impact of relation sizes on the search space
3.30 Impact of parameters on the performance of heuristics
3.31 Impact of selectivities on probabilistic procedures
4.1 Disk drive assembly
4.2 Disk drive read request processing
4.3 Time to read 100 MB from disk (depending on the number of 8 KB blocks read at once)
4.4 Time needed to read n random pages
4.5 Break-even point in fraction of total pages depending on page size
4.6 Physical organization of a relational database
4.7 Slotted pages and TIDs
4.8 Various physical record layouts
4.9 Clustered vs. non-clustered index
4.10 Illustration of seek cost estimate
5.1 Truth tables for two-valued logic
5.2 Laws for two-valued logic
5.3 Comparison functions in the presence of NULL values
5.4 Truth tables for three-valued logic
5.5 True-/false-interpretation and Negation
7.1 Laws for Set Operations
7.2 Laws for Bag Operations
7.3 Decomposition of aggregate functions
7.4 Example for map and group operators
7.5 Three possible representations of a bag
7.6 Example for outerjoin reorderability (for strict q)
7.7 Example for outerjoin reorderability (for non-strict q′)
7.8 Example for outerjoin reorderability (for partially non-strict q′)
7.9 Example for outerjoin associativity for strict q
7.10 Example for outerjoin associativity for non-strict q′
7.11 Example for outerjoin l-asscom for strict q
7.12 Example for grouping and join
7.13 Extended example for grouping and join
7.14 Example for Eqv. 7.113
7.15 Example relations
7.16 Join results
7.17 Left- and right-hand sides
7.18 Transformation rules for assoc, l-asscom, and r-asscom
7.19 Core search space example
7.20 The complete search space
7.21 Algorithm DPsube
7.22 Calculating TES for simple operator trees
7.23 Example showing the incompleteness of CD-A
7.24 Calculating conflict rules for simple operator trees
7.25 Example showing the incompleteness of CD-B
7.26 Conflict detection for unary and binary operators
7.27 Example for Map Operator
7.28 Examples for unary grouping and the groupjoin
11.1 Simplification rules for boolean expressions
11.2 Axioms for equality
11.3 (untitled)
11.4 Axioms for inequality
18.1 Sample hypergraph
18.2 Trace of algorithm for Figure ??
18.3 Pseudocode for calcNeighborhood
19.1 DNF plans
19.2 CNF plans
19.3 Bypass plans
22.1 Runtimes for Different Query Graphs
22.2 Exemplary Simplification Steps for a Star Query
22.3 Pseudo-Code for a Single Simplification Step
22.4 The Full Optimization Algorithm
22.5 The Effect of Simplification Steps for a Star Query with 20 Relations
22.6 The Effect of Simplification Steps for a Grid Query with 20 Relations
23.1 Propagation of orderings and groupings
23.2 Possible FSM for orderings
23.3 Possible FSM for groupings
23.4 Combined FSM for orderings and groupings
23.5 Possible DFSM for Figure 23.4
23.6 Preparation steps of the algorithm
23.7 Initial NFSM for sample query
23.8 NFSM after adding DFD edges
23.9 NFSM after pruning artificial states
23.10 Final NFSM
23.11 Resulting DFSM
23.12 contains Matrix
23.13 transition Matrix
23.14 Plan generation for different join graphs, Simmen's algorithm (left) vs. our algorithm (middle)
23.15 Memory consumption in KB for Figure 23.14
23.16 Time requirements for the preparation step
23.17 Space requirements for the preparation step
24.1 Overview of operations for cardinality and cost estimations
24.2 Sample for range query result estimation under CVA and ESA
24.3 Calculating the lower bound DG⊥
24.4 Calculating the estimate for DG
24.5 Example frequency density and cumulated frequency
24.6 Cumulated frequency and its approximation
24.7 Q-error and plan optimality
24.8 Algorithm for best linear approximation under l∞
24.9 Algorithm finding best linear approximation under lq
24.10 Sample data sets
24.11 Q-compression, logb-based
24.12 Binary Q-compression
24.13 FLT example 1
24.14 FLT example 2
24.15 Car database example
24.16 Linear Counting
24.17 Algorithm DvByKMinVal
24.18 Algorithm LogarithmicCounting
24.19 Algorithm PCSA
24.20 Filling M for LogLogCounting, SuperLogLogCounting, and HyperLogLogCounting
24.21 SuperLogLog Counting
24.22 Calculation of α̃
24.23 HyperLogLog Counting
24.24 DvByMinAvg
24.25 DvByKMinAvg
24.26 Example for Equi-Depth Tree
24.27 Sample B+-Tree
25.1 The compilation process
25.2 Class Architecture of the Query Compiler
25.3 Control Block Structure
27.1 Expression
27.2 Expression hierarchy
27.3 Expression
27.4 Query 1
27.5 Internal representation
27.6 An algebraic operator tree with a d-join
27.7 Algebra
28.1 A sample execution plan
28.2 Different join operator trees
28.3 Bottom up plan generation
28.4 A Dynamic Programming Optimization Algorithm
30.1 Example of an Epoq architecture
30.2 Exodus optimizer generator
30.3 Organization of the optimization
30.4 Flow of the optimization
30.5 Architecture of GOMrbo
30.6 a) Architecture of the Gral optimizer; b) operator hierarchy by costs
30.7 The Squiral architecture
30.8 Starburst optimizer
30.9 Straube's optimizer
33.1 Algebraic representation of a query
33.2 A join replacing pointer chasing
33.3 A Sample Class Hierarchy
33.4 Implementation of Extents

Preface

Goals

Primary goals:
• the book covers many query languages (at least SQL, OQL, XQuery (XPath))
• techniques should be presented in a form as query-language-independent as possible
• the book covers all stages of the query compilation process
• the book completely covers fundamental issues
• the book gives implementation details and tricks

Secondary goals:
• the book is thin
• the book is not totally unreadable
• the book separates concepts from implementation techniques

Organizing the material is not easy: the same topic pops up
• at different stages of the query compilation process and
• for different query languages.

Acknowledgements

Introduced me to query optimization: Günther von Bültzingsloewen, Peter Lockemann. First paper coauthor: Stefan Karl. Coworkers: Alfons Kemper, Klaus Peithner, Michael Steinbrunn, Donald Kossmann, Carsten Gerlhof, Jens Claussen, Sophie Cluet, Vassilis Christophides, Georg Gottlob, V.S. Subramanian, Sven Helmer, Birgitta König-Ries, Wolfgang Scheufele, Carl-Christian Kanne, Thomas Neumann, Norman May, Matthias Brantner, Robin Aly.

Discussions: Umesh Dayal, Dave Maier, Gail Mitchell, Stan Zdonik, Tamer Özsu, Arne Rosenthal, Don Chamberlin, Bruce Lindsay, Guy Lohman, Mike Carey, Bennet Vance, Laura Haas, Mohan, CM Park, Yannis Ioannidis, Götz Graefe, Serge Abiteboul, Claude Delobel, Patrick Valduriez, Dana Florescu, Jerome Simeon, Mary Fernandez, Christoph Koch, Adam Bosworth, Joe Hellerstein, Paul Larson, Hennie Steenhagen, Harald Schöning, Bernhard Seeger.

Encouragement: Anand Deshpande.

Manuscript: Simone Seeger, and many others to be inserted.

Part I
Basics

Chapter 1
Introduction

1.1 General Remarks

Query languages like SQL or OQL are declarative. That is, they specify what the user wants to retrieve and not how to retrieve it. It is the task of the query compiler to generate a query evaluation plan (evaluation plan for short, or execution plan or simply plan) for a given query. The query evaluation plan (QEP) essentially is an operator tree with physical algebraic operators as nodes. It is evaluated by the runtime system. Figure 1.6 shows a detailed execution plan ready to be interpreted by the runtime system. Figure 28.1 shows an abstraction of a query plan often used to explain algorithms or optimization techniques.

The book tries to demystify query optimization and query optimizers. By means of the multi-lingual query optimizer BD II, the most important aspects of query optimizers and their implementation are discussed. We concentrate not only on the query optimizer core (Rewrite I, Plan Generator, Rewrite II) of the query compilation process but touch on all issues from parsing to code generation and quality assurance. We start by giving a two-module overview of a database management system. One of these modules covers the functionality of the query compiler. The query compiler itself involves several submodules. For each submodule we discuss the features relevant for query compilation.
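The physical algebra and its iterator concept are treated in Section 4.6. As a first impression of what "an operator tree with physical algebraic operators as nodes" means in code, the following C++ sketch shows the classic open/next/close interface together with a scan and a selection operator. It is purely illustrative; all names and types here are ours, not those of BD II.

#include <cstddef>
#include <optional>
#include <vector>

using Tuple = std::vector<int>;   // simplification: a tuple of integer attributes

// Every physical operator implements the iterator interface.
struct Operator {
  virtual void open() = 0;                  // prepare for producing tuples
  virtual std::optional<Tuple> next() = 0;  // produce the next tuple, if any
  virtual void close() = 0;                 // release resources
  virtual ~Operator() = default;
};

// Scan: produces the tuples of a stored relation one by one.
struct Scan : Operator {
  const std::vector<Tuple>& rel;
  std::size_t pos = 0;
  explicit Scan(const std::vector<Tuple>& r) : rel(r) {}
  void open() override { pos = 0; }
  std::optional<Tuple> next() override {
    return pos < rel.size() ? std::optional<Tuple>(rel[pos++]) : std::nullopt;
  }
  void close() override {}
};

// Select: filters the tuples produced by its input operator.
struct Select : Operator {
  Operator& input;
  bool (*pred)(const Tuple&);
  Select(Operator& in, bool (*p)(const Tuple&)) : input(in), pred(p) {}
  void open() override { input.open(); }
  std::optional<Tuple> next() override {
    while (auto t = input.next())
      if (pred(*t)) return t;
    return std::nullopt;
  }
  void close() override { input.close(); }
};

The runtime system evaluates such a plan by calling open on the root operator, pulling tuples via next until none is left, and finally calling close; each operator drives its children in the same way.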
1.2 DBMS Architecture

Any database management system (DBMS) can be divided into two major parts: the compile time system (CTS) and the runtime system (RTS). User commands enter the compile time system, which translates them into executables that are then interpreted by the runtime system (Fig. 1.1).

[Figure 1.1: DBMS architecture (user command (e.g. query) → CTS → execution plan → RTS → result)]

The input to the CTS are statements of several kinds, for example connect to a database (starts a session), disconnect from a database, create a database, drop a database, add/drop a schema, perform schema changes (add relations, object types, constraints, . . . ), add/drop indexes, run statistics commands, manually modify the DBMS statistics, begin a transaction, end a transaction, add/drop a view, update database items (e.g. tuples, relations, objects), change authorizations, and state a query. Within the book, we will only be concerned with the tiny last item.

1.3 Interpretation versus Compilation

There are two essential approaches to process a query: interpretation and compilation.

The path of a query through a query interpreter is illustrated in Figure 1.2. Query interpretation translates the query string into some internal representation that is mostly calculus-based. Optionally, some rewrite on this representation takes place. Typical steps during this rewrite phase are unnesting nested queries, pushing selections down, and introducing index structures. After that, the query is interpreted.

[Figure 1.2: Query interpreter (query → rewrite → calculus → interpretation → result)]

A simple query interpreter is sketched in Figure 1.3. The first function, interprete, takes a simple SQL block and extracts the different clauses, initializes the result R, and calls eval. Then, eval recursively evaluates the query by first producing the cross product of the entries in the from clause. After all of them have been processed, the predicate is applied, and for those tuples where the where predicate evaluates to true, a result tuple is constructed and added to the result set R.

interprete(SQLBlock x) {
  /* possible rewrites go here */
  s := x.select();
  f := x.from();
  w := x.where();
  R := ∅;    /* result */
  t := [];   /* empty tuple */
  eval(s, f, w, t, R);
  return R;
}

eval(s, f, w, t, R) {
  if (f.empty()) {
    if (w(t))
      R += s(t);
  } else {
    foreach (t′ ∈ first(f))
      eval(s, tail(f), w, t ◦ t′, R);
  }
}

Figure 1.3: Simple query interpreter

Obviously, the sketched interpreter is far from being efficient. A much better approach has been described by Wong and Youssefi [931, 962].

Let us now discuss the compilation approach. The different steps are summarized in Figure 1.4. First, the query is rewritten. Again, unnesting nested queries is a main technique for performance gains. Other rewrites will be discussed in Part ??. After the rewrite, the plan generation takes place. Here, an optimal plan is constructed. Whereas typically rewrite takes place on a calculus-based representation of the query, plan generation constructs an algebraic expression containing well-known operators like selection and join. Sometimes, after plan generation, the generated plan is refined: some polishing takes place. Then, code is generated that can be interpreted by the runtime system.

[Figure 1.4: Query compiler (query → rewrite → calculus → plan generation/translation → algebra → rewrite/transformation → code generation → execution plan)]
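Before moving on: for readers who prefer compilable code over pseudocode, the interpreter of Figure 1.3 can be rendered in C++ roughly as follows. This is a sketch under simplifying assumptions (tuples are vectors of integers, the select and where clauses are passed in as functions, the from clause is a list of materialized relations); none of these names are from the book.

#include <cstddef>
#include <functional>
#include <vector>

using Tuple = std::vector<int>;                          // simplification: integer attributes
using Relation = std::vector<Tuple>;
using Predicate = std::function<bool(const Tuple&)>;     // the where clause w
using Projection = std::function<Tuple(const Tuple&)>;   // the select clause s

// Recursively enumerate the cross product of the relations in f (from index i on),
// test the predicate on each complete tuple, and project qualifying tuples into R.
void eval(const Projection& s, const std::vector<Relation>& f, std::size_t i,
          const Predicate& w, Tuple& t, Relation& R) {
  if (i == f.size()) {
    if (w(t)) R.push_back(s(t));
    return;
  }
  for (const Tuple& u : f[i]) {
    t.insert(t.end(), u.begin(), u.end());   // t ◦ t′: concatenate the tuples
    eval(s, f, i + 1, w, t, R);
    t.resize(t.size() - u.size());           // undo the concatenation
  }
}

Relation interprete(const Projection& s, const std::vector<Relation>& f,
                    const Predicate& w) {
  Relation R;   // result
  Tuple t;      // empty tuple
  eval(s, f, 0, w, t, R);
  return R;
}

Because the predicate is only applied once a complete cross-product tuple has been assembled, the running time is proportional to the size of the full cross product; this is exactly the inefficiency the text refers to, since no selection is pushed down and no index is used.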
More specifically, the query execution engine (a part of the runtime system) interprets the query execution plan. Let us illustrate this. The following query is Query 1 of the now obsolete TPC-D benchmark [878].

SELECT RETURNFLAG, LINESTATUS,
       SUM(QUANTITY) AS SUM_QTY,
       SUM(EXTENDEDPRICE) AS SUM_EXTPR,
       SUM(EXTENDEDPRICE * (1 - DISCOUNT)),
       SUM(EXTENDEDPRICE * (1 - DISCOUNT) * (1 + TAX)),
       AVG(QUANTITY),
       AVG(EXTENDEDPRICE),
       AVG(DISCOUNT),
       COUNT(*)
FROM LINEITEM
WHERE SHIPDATE <= DATE '1998-12-01'
GROUP BY RETURNFLAG, LINESTATUS
ORDER BY RETURNFLAG, LINESTATUS

The CTS translates this query into a query execution plan. Part of the plan is shown in Fig. 1.6. One rarely sees a query execution plan; this is the reason why I included one. But note that the form of query execution plans differs from DBMS to DBMS, since it is (unfortunately) not standardized the way SQL is. Most DBMSs can give the user abstract representations of query plans, and it is worth the time to look at the plans generated by some commercial DBMSs. I do not expect the reader to understand the plan in all details; some of these details will become clear later. Anyway, this plan is given to the RTS, which then interprets it. Part of the result of the interpretation might look like this:

RETURNFLAG  LINESTATUS  SUM_QTY  SUM_EXTPR       ...
A           F           3773034  5319329289.68   ...
N           F           100245   141459686.10    ...
N           O           7464940  10518546073.98  ...
R           F           3779140  5328886172.99   ...

This should look familiar to you.

The above query plan is very simple. It contains only a few algebraic operators. Usually, more algebraic operators are present, and the plan is given in a more abstract form that cannot be directly executed by the runtime system. Fig. 2.10 gives an example of such an abstract, more complex operator tree. We will work with representations closer to this one.

A typical query compiler architecture is shown in Figure 1.5.

[Figure 1.5: Query compiler architecture (query → parsing → abstract syntax tree → NFST → internal representation → rewrite I → plan generation → rewrite II → code generation → execution plan)]

The first component is the parser. It produces an abstract syntax tree. This is not always the case, but this intermediate representation very much simplifies the task of the following component. The NFST component performs several tasks. The first step is normalization, which mainly deals with introducing new variables for subexpressions. Factorization and semantic analysis are also performed during NFST. Last, the abstract syntax tree is translated into the internal representation. All these steps can typically be performed during a single pass through the query representation. Semantic analysis requires looking up schema definitions. This can be expensive and, hence, the number of lookups should be minimized. After NFST, the core optimization steps rewrite I and plan generation take place. Rewrite II does some polishing before code generation. These modules directly correspond to the phases in Figure 1.4. They are typically further divided into submodules handling subphases. The most prominent example is the preparation phase that takes place just before the actual plan generation. In our figures, we think of preparation as being part of the plan generation.

1.4 Requirements for a Query Compiler

Here are the main requirements for a query compiler:

1. Correctness
2. Completeness
1.4 Requirements for a Query Compiler

Here are the main requirements for a query compiler:

1. Correctness
2. Completeness
3. Generation of an optimal plan (or, at the very least, avoidance of the worst plans)
4. Efficiency: generate the plan fast, do not waste memory
5. Graceful degradation
6. Robustness

First of all, the query compiler must produce correct query evaluation plans. That is, the result of the query evaluation plan must be the result of the query as given by the specification of the query language. It must also cover the complete query language. The next issue is that an optimal query plan must (should) be generated. However, this is not always easy. That is why some database researchers say that one must at least avoid the worst plan. Talking about the quality of a plan requires us to fix the optimization goal. Several goals are reasonable: we can maximize throughput, minimize response time, minimize resource consumption (both memory and CPU), and so on. A good query compiler supports two optimization goals: minimize resource consumption and minimize the time to produce the first tuple. Obviously, both goals cannot be achieved at the same time. Hence, the query compiler must be instructed about the optimization goal.

Irrespective of the optimization goal, the query compiler should produce the query evaluation plan fast. It does not make sense to take 10 seconds to optimize a query whose execution time is below a second. This sounds reasonable but is not trivial to achieve. As we will see, the number of query execution plans that are equivalent to a given query, i.e. produce the same result as the query, can be very large. Sometimes, very large even means that not all plans can be considered. Taking the wrong approach to plan generation will then result in no plan at all. This is the contrary of graceful degradation. Expressed positively, graceful degradation means that in case of limited resources, a plan is generated that may not be optimal, but is also not too far away from the optimal plan.
Last, typical software quality criteria should be met. We only mentioned robustness in our list, but others like maintainability must be met as well.

1.5 Search Space

For a given query, there typically exists a large number of plans that are equivalent to the query. Not all of these plans are accessible. Only those plans that can be generated by known optimization techniques (mainly algebraic equivalences) can potentially be generated. Since this number may still be too large, many query compilers restrict their search space further. We call the search space explored by a query optimizer the actual search space. The potential search space is the set of all plans that are known to be equivalent to the given query by applying the state of the art of query optimization techniques. The whole set of plans equivalent to a given query is typically unknown: we cannot be sure that all optimization techniques have been discovered yet. Figure 1.7 illustrates the situation.

[Figure 1.7: Potential and actual search space — the actual search space within the potential search space, within the set of all equivalent plans]

Note that we run into problems if the actual search space is not a subset of the equivalent plans. Then the query compiler produces wrong results. As we will see in several chapters of this book, some optimization techniques have been proposed that produce plans that are not equivalent to the original query.

1.6 Generation versus Transformation

Two different approaches to plan generation can be distinguished:

• The transformation-based approach transforms one query execution plan into another equivalent one. This can, for example, happen by applying an algebraic equivalence to a query execution plan in order to yield a better plan.

• The generic or synthetic approach produces a query execution plan by assembling building blocks and adding one algebraic operator after the other, until a complete query execution plan has been produced. Note that in this approach the internal representation can be executed only after all building blocks and algebraic operators have been introduced. Before that, no (complete) plan exists.

For an illustration see Figure 1.8.

[Figure 1.8: Generation vs. transformation — a) generative approach, b) transformational approach]

A very important issue is how to explore the search space. Several well-known approaches exist: A*, branch-and-bound, greedy algorithms, hill-climbing, dynamic programming, and memoization [209, 530, 531, 682]. These form the basis for most of the plan generation algorithms.

1.7 Focus

In this book, we consider only the compilation of queries. We leave out many special aspects like query optimization for multi-media database systems or multidatabase systems. These are just two omissions. We further do not consider the translation of update statements, which — especially in the presence of triggers — can become quite complex. Furthermore, we assume the reader to be familiar with the fundamentals of database systems [264, 484, 650, 709, 816] and their implementation [403, 316]. Especially, knowledge of query execution engines is required [347].

Last, the book presents a very personal view on query optimization. To see other views on the same topic, I strongly recommend reading the literature cited in this book and the references found therein. Good starting points are overview articles, PhD theses, and books, e.g. [902, 322, 445, 446, 468, 543, 610, 613, 662, 830, 850, 886, 887].
1.8 Organization of the Book

The first part of the book is an introduction to the topic. It should give an idea about the breadth and depth of query optimization. We first recapitulate query optimization the way it is described in numerous textbooks on database systems. There should be nothing really new here, except for some pitfalls we will point out. Chapter 3 is devoted to the join ordering problem. This has several reasons. First of all, at least one of the algorithms presented in this chapter forms the core of every plan generator. The second reason is that this problem allows us to discuss some issues like search space sizes and problem complexities. The third reason is that we do not have to delve into details. We can stick to very simple (you might call them unrealistic) cost functions and do not have to concern ourselves with details of the runtime system and the like. Expressed positively, we can concentrate on some algorithmic aspects of the problem. In Chapter 4 we do the opposite. The reader will not find any advanced algorithms in this chapter, but plenty of details on disks and cost functions. The goal of the rest of the book is then to bring these issues together, broaden the scope of the chapters, and treat problems not even touched by them. The main issue not touched is query rewrite.

Chapter 2

Textbook Query Optimization

Almost every introductory textbook on database systems contains a section on query optimization (or at least query processing) [264, 484, 650, 709, 816]. Also, the two existing books on implementing database systems contain a section on query optimization [403, 316]. In this chapter we give an excerpt¹ of these sections and subsequently discuss the problems with the described approach. The bottom line will be that these descriptions of query optimization capture the essence of it, but contain pitfalls that need to be pointed out and gaps to be filled.

¹ We do not claim to be fair to the above mentioned sections.

2.1 Example Query and Outline

We use the following relations for our example query:

    Student(SNo, SName, SAge, SYear)
    Attend(ASNo, ALNo, AGrade)
    Lecture(LNo, LTitle, LPNo)
    Professor(PNo, PName)

The attributes belonging to the keys of the relations have been underlined. With the following query we ask for all students attending a lecture by a professor called "Larson".

    select distinct s.SName
    from Student s, Attend a, Lecture l, Professor p
    where s.SNo = a.ASNo and a.ALNo = l.LNo and
          l.LPNo = p.PNo and p.PName = 'Larson'

The outline of the rest of the chapter is as follows. A query is typically translated into an algebraic expression. Hence, we first review the relational algebra and then discuss the translation process. Thereafter, we present the two phases of textbook query optimization: logical and physical query optimization. A brief discussion follows.

2.2 Algebra

Let us briefly recall the standard definition of the most important algebraic operators. Their inputs are relations, that is, sets of tuples. Sets do not contain duplicates. The attributes of the tuples are assumed to be simple (non-decomposable) values. The most common algebraic operators are defined in Fig. 2.1. Although the common set operations union (∪), intersection (∩), and set-difference (\) belong to the relational algebra, we did not list them. Remember that ∪ and ∩ are both commutative and associative, while \ is neither. Further, for ∪ and ∩, two distributivity laws hold.
However, since these operations are not used in this section, we refer to Figure 7.1 in Section 7.1.1.

Before we can understand Figure 2.1, we must clarify some terms and notations. For us, a tuple is a mapping from a set of attribute names (or attributes for short) to their corresponding values. These values are taken from certain domains. An actual tuple is written in brackets, containing a comma-separated list of entries of the form attribute name, colon, attribute value, as in [name: "Anton", age: 2]. If we have two tuples with different attribute names, they can be concatenated, i.e. we can take the union of their attributes. Tuple concatenation is denoted by '◦'. For example,

    [name: "Anton", age: 2] ◦ [toy: "digger"]

results in [name: "Anton", age: 2, toy: "digger"].

Let A and A′ be two sets of attributes where A′ ⊆ A holds. Further, let t be a tuple with schema A. Then, we can project t on the attributes in A′ (written as t.A′). The resulting tuple contains only the attributes in A′; the others are discarded. For example, if t is the tuple [name: "Anton", age: 2, toy: "digger"] and A′ = {name, age}, then t.A′ is the tuple [name: "Anton", age: 2].

A relation is a set of tuples with the same attributes. The schema of a relation is this set of attributes. For a relation R, it is sometimes denoted by sch(R), the schema of R. We denote it by A(R) and extend this notation to any algebraic expression producing a set of tuples. That is, A(e) for an algebraic expression e is the set of attributes the resulting relation defines.

Consider the predicate age = 2, where age is an attribute name. Then, age behaves like a free variable that must be bound to some value before the predicate can be evaluated. This motivates us to often use the terms attribute and variable synonymously. In the above predicate, we would call age a free variable. The set of free variables of an expression e is denoted by F(e).

Sometimes it is useful to work with sequences of attributes in comparison predicates. Let A = ⟨a1, ..., ak⟩ and B = ⟨b1, ..., bk⟩ be two attribute sequences. Then, for any comparison operator θ ∈ {=, ≤, <, ≥, >, ≠}, the expression A θ B abbreviates a1 θ b1 ∧ a2 θ b2 ∧ ... ∧ ak θ bk.

Often, a natural join is defined. Consider two relations R1 and R2. Define Ai := A(Ri) for i ∈ {1, 2}, and A := A1 ∩ A2. Assume that A is non-empty and A = ⟨a1, ..., an⟩. Then the natural join is defined as

    R1 ⋈ R2 := Π_{A1 ∪ A2}(R1 ⋈_p ρ_{A:A′}(R2))

where ρ_{A:A′} renames the attributes ai in A to a′i in A′, and the predicate p has the form A = A′, i.e. a1 = a′1 ∧ ... ∧ an = a′n.

    σ_p(R)    := {r | r ∈ R, p(r)}
    Π_A(R)    := {r.A | r ∈ R}
    R1 × R2   := {r1 ◦ r2 | r1 ∈ R1, r2 ∈ R2}
    R1 ⋈_p R2 := σ_p(R1 × R2)

Figure 2.1: Relational algebra
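The definitions of Figure 2.1 translate almost literally into code. The following Python sketch is merely illustrative; representing a tuple as a frozenset of attribute/value pairs is an assumption made here so that tuple concatenation becomes a set union (attribute names are assumed unique, as in the text below).

    # Set-based relational algebra following Figure 2.1.
    # A tuple is a frozenset of (attribute, value) pairs; a relation is a set of tuples.

    def sigma(p, R):                        # selection sigma_p(R)
        return {r for r in R if p(dict(r))}

    def pi(A, R):                           # projection Pi_A(R)
        return {frozenset((a, v) for a, v in r if a in A) for r in R}

    def cross(R1, R2):                      # cross product; | is tuple concatenation
        return {r1 | r2 for r1 in R1 for r2 in R2}

    def join(p, R1, R2):                    # join, defined via selection and cross product
        return sigma(p, cross(R1, R2))

    R = {frozenset({("R.a", 1), ("R.b", 2)}), frozenset({("R.a", 3), ("R.b", 4)})}
    S = {frozenset({("S.a", 1)})}
    print(join(lambda t: t["R.a"] == t["S.a"], R, S))
    # {frozenset({('R.a', 1), ('R.b', 2), ('S.a', 1)})}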
For our algebraic operators, several equivalences hold. They are given in Figure 2.2. For them to hold, we typically require that the relations involved have disjoint attribute sets. That is, we assume—even for the rest of the book—that attribute names are unique. This is often achieved by using the notation R.a for a relation R, or v.a for a variable ranging over tuples with an attribute a. Another possibility is to use the renaming operator ρ. Some equivalences are not always valid. Their validity depends on whether certain conditions are satisfied or not. For example, Eqv. 2.4 requires F(p) ⊆ A. That is, all attribute names occurring in p must be contained in the attribute set A the projection retains: otherwise, we could not evaluate p after the projection has been applied. Although all conditions in Fig. 2.2 are of this flavor, we will see throughout the course of the book that more complex conditions exist.

    σ_{p1 ∧ ... ∧ pk}(R) ≡ σ_{p1}(...(σ_{pk}(R))...)                          (2.1)
    σ_{p1}(σ_{p2}(R)) ≡ σ_{p2}(σ_{p1}(R))                                     (2.2)
    Π_{A1}(Π_{A2}(...(Π_{Ak}(R))...)) ≡ Π_{A1}(R)   if Ai ⊆ Aj for i < j      (2.3)
    Π_A(σ_p(R)) ≡ σ_p(Π_A(R))                       if F(p) ⊆ A               (2.4)
    (R1 × R2) × R3 ≡ R1 × (R2 × R3)                                           (2.5)
    (R1 ⋈_{p1,2} R2) ⋈_{p2,3} R3 ≡ R1 ⋈_{p1,2} (R2 ⋈_{p2,3} R3)
        if F(p1,2) ⊆ A(R1) ∪ A(R2) and F(p2,3) ⊆ A(R2) ∪ A(R3)                (2.6)
    R1 × R2 ≡ R2 × R1                                                         (2.7)
    R1 ⋈_p R2 ≡ R2 ⋈_p R1                                                     (2.8)
    σ_p(R1 × R2) ≡ σ_p(R1) × R2                     if F(p) ⊆ A(R1)           (2.9)
    σ_p(R1 ⋈_q R2) ≡ σ_p(R1) ⋈_q R2                 if F(p) ⊆ A(R1)           (2.10)
    Π_A(R1 × R2) ≡ Π_{A1}(R1) × Π_{A2}(R2)
        if A = A1 ∪ A2, Ai ⊆ A(Ri)                                            (2.11)
    Π_A(R1 ⋈_p R2) ≡ Π_{A1}(R1) ⋈_p Π_{A2}(R2)
        if F(p) ⊆ A, A = A1 ∪ A2, and Ai ⊆ A(Ri)                              (2.12)
    σ_p(R1 θ R2) ≡ σ_p(R1) θ σ_p(R2)                where θ is any of ∪, ∩, \ (2.13)
    Π_A(R1 ∪ R2) ≡ Π_A(R1) ∪ Π_A(R2)                                          (2.14)
    σ_p(R1 × R2) ≡ R1 ⋈_p R2                                                  (2.15)

Figure 2.2: Equivalences for the relational algebra

2.3 Canonical Translation

The next question is how to translate a given SQL query into the algebra. Since the relational algebra works on sets and not bags (multisets), we can only translate SQL queries that contain a distinct. Further, we restrict ourselves to flat queries not containing any subquery. Since negation, disjunction, aggregation, and quantifiers pose further problems, we neglect them. Further, we do not allow group by, order by, union, intersection, and except in our query. Last, we allow only attributes in the select clause, and no more complex expressions. Thus, the generic SQL query pattern we can translate into the algebra looks as follows:

    select distinct a1, a2, ..., am
    from R1 c1, R2 c2, ..., Rn cn
    where p

Here, the Ri are relation names and the ci are correlation names. The ai in the select clause are attribute names (or expressions of the form ci.ai) taken from the relations in the from clause. The predicate p is assumed to be a conjunction of comparisons between attributes, or between attributes and constants. The translation process then follows the procedure described in Figure 2.3. First, we construct an expression that produces the cross product of the entries found in the from clause. The result is

    ((...((R1 × R2) × R3)...) × Rn).

Next, we add a selection with the where predicate:

    σ_p((...((R1 × R2) × R3)...) × Rn).

Last, we project on the attributes found in the select clause:

    Π_{a1,...,am}(σ_p((...((R1 × R2) × R3)...) × Rn)).

    1. Let R1, ..., Rk be the entries in the from clause of the query. Construct the expression

           F := R1                              if k = 1
           F := ((...(R1 × R2) × ...) × Rk)     else

    2. The where clause is optional in SQL. Therefore, we distinguish the cases that there is
       no where clause and that the where clause exists and contains a predicate p. Construct
       the expression

           W := F        if there is no where clause
           W := σ_p(F)   if the where clause contains p

    3. Let s be the content of the select distinct clause. For the canonical translation, it must
       be either '*' or a list a1, ..., an of attribute names. Construct the expression

           S := W                    if s = '*'
           S := Π_{a1,...,an}(W)     if s = a1, ..., an

    4. Return S.

Figure 2.3: (Simplified) canonical translation of SQL to algebra
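Reusing the toy operators sigma, pi, and cross from the sketch in Section 2.2, the four steps of Figure 2.3 become a short function. Again, encoding a query as (select list, from list, where predicate) is an assumption made here for illustration only.

    # Sketch of the canonical translation of Figure 2.3.

    from functools import reduce

    def canonical_translation(select_list, from_list, where_p=None):
        # Step 1: left-deep cross product over the from clause entries
        F = reduce(cross, from_list[1:], from_list[0])
        # Step 2: add a selection if there is a where clause
        W = sigma(where_p, F) if where_p is not None else F
        # Step 3: project unless the select clause is '*'
        S = W if select_list == '*' else pi(set(select_list), W)
        # Step 4:
        return S

    # select distinct R.b from R, S where R.a = S.a   (R, S as in the previous sketch)
    print(canonical_translation(['R.b'], [R, S], lambda t: t["R.a"] == t["S.a"]))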
For our example query

    select distinct s.SName
    from Student s, Attend a, Lecture l, Professor p
    where s.SNo = a.ASNo and a.ALNo = l.LNo and
          l.LPNo = p.PNo and p.PName = 'Larson'

the result of the translation is

    Π_{s.SName}(σ_p(((Student[s] × Attend[a]) × Lecture[l]) × Professor[p]))

where p equals s.SNo = a.ASNo ∧ a.ALNo = l.LNo ∧ l.LPNo = p.PNo ∧ p.PName = 'Larson'. Note that we used the notation R[r] to say that a relation named R has the correlation name r. During the course of the book we will be more precise about the semantics of this notation, and it will deviate from the one suggested here. We will take r as a variable successively bound to the elements (tuples) in R. However, for the purpose of this chapter it is sufficient to think of it as associating a correlation name with a relation. The query is represented graphically in Figure 2.7 (top).

    1. translate the query into its canonical algebraic expression
    2. perform logical query optimization
    3. perform physical query optimization

Figure 2.4: Textbook query optimization

2.4 Logical Query Optimization

Textbook query optimization takes place in two separate phases. The first phase is called logical query optimization, the second physical query optimization. Figure 2.4 lists these steps together with the translation step. In this section we discuss logical query optimization. The foundation for this step is formed by the set of algebraic equivalences (see Figure 2.2). The set of algebraic equivalences spans the potential search space for this step. Given an initial algebraic expression—resulting from the translation of the given query—the algebraic equivalences can be used to derive all algebraic expressions that are equivalent to it. This set of all equivalent algebraic expressions can be derived by applying the equivalences first to the initial expression and then to all derived expressions, until no new expression is derivable. Thereby, the algebraic equivalences can be applied in both directions: from left to right and from right to left. Care has to be taken that the conditions attached to the equivalences are obeyed. Of course, whenever we find a new algebraic equivalence that could not be derived from those already known, adding this equivalence increases our potential search space. On the one hand, this has the advantage that in a larger search space we may find better plans. On the other hand, it increases the already large search space, which might cause problems for its exploration. Nevertheless, finding new equivalences is a well-established sport among database researchers.

One remark on better plans. Plans can only be compared if costs can be attached to them via some cost function. This is what happens in most industrial-strength query optimizers. However, at the level of logical algebraic expressions, attaching precise costs is not possible: too many implementation details are missing. These are added to the plan during the next phase, called physical query optimization. As a consequence, we are left with plans without costs. The only thing we can do is to heuristically judge the effectiveness of applying an equivalence from left to right or in the opposite direction. As always with heuristics, the hope is that they work for most queries. However, it is typically very easy to find counterexamples where the heuristics do not result in the best plan possible. (Again, best with respect to some metric.) This finding can be generalized: any query optimization that takes place in more than a single phase risks missing the best plan. This is an important observation, and we will come back to this issue more than once.

After these words of warning, let us continue to discuss textbook query optimization. Logical query optimization requires the organization of all equivalences into groups. Further, the equivalences are directed. That is, it is fixed whether they are applied in a left-to-right or right-to-left manner. A directed equivalence is called a rewrite rule. The groups of rewrite rules are then successively applied to the initial algebraic expression. Figure 2.5 describes the different steps performed during logical query optimization. Associated with each step is a set of rewrite rules that are applied to the input expression to yield a result expression. The numbers correspond to the equivalences in Figure 2.2. A small arrow indicates the direction in which the equivalences are applied.

    1. break up conjunctive selection predicates   (Eqv. 2.1: →)
    2. push down selections                        (Eqv. 2.2: →), (Eqv. 2.9: →)
    3. introduce joins                             (Eqv. 2.15: →)
    4. determine join order                        (Eqv. 2.8, Eqv. 2.6, Eqv. 2.5, Eqv. 2.7)
    5. introduce and push down projections         (Eqv. 2.3: ←), (Eqv. 2.4: →),
                                                   (Eqv. 2.11: →), (Eqv. 2.12: →)

Figure 2.5: Logical query optimization

The first step breaks up conjunctive selection predicates. The motivation behind this step is that selections with simple predicates can be moved around more easily. The rewrite rule used in this step is Equivalence 2.1, applied from left to right. For our example query, Step 1 results in

    Π_{s.SName}(
      σ_{s.SNo=a.ASNo}(
        σ_{a.ALNo=l.LNo}(
          σ_{l.LPNo=p.PNo}(
            σ_{p.PName='Larson'}(
              ((Student[s] × Attend[a]) × Lecture[l]) × Professor[p])))))

The query is represented graphically in Figure 2.7 (middle).

Step 2 pushes selections down the operator tree. The motivation here is to reduce the number of tuples as early as possible, so that subsequent (expensive) operators have smaller inputs. Applying this step to our example query yields:

    Π_{s.SName}(
      σ_{l.LPNo=p.PNo}(
        σ_{a.ALNo=l.LNo}(
          σ_{s.SNo=a.ASNo}(Student[s] × Attend[a])
          × Lecture[l])
        × σ_{p.PName='Larson'}(Professor[p])))
This finding can be generalized: any query optimization that takes place in more than a single phase risks missing the best plan. This is an important observation and we will come back to this issue more than once. After these words of warning let us continue to discuss textbook query 2.4. LOGICAL QUERY OPTIMIZATION 21 1. break up conjunctive selection predicates (Eqv. 2.1: →) 2. push down selections (Eqv. 2.2: →), (Eqv. 2.9: →) 3. introduce joins (Eqv. 2.15: →) 4. determine join order Eqv. 2.8, Eqv. 2.6, Eqv. 2.5, Eqv. 2.7 5. introduce and push down projections (Eqv. 2.3: ←), (Eqv. 2.4: →), (Eqv. 2.11: →), (Eqv. 2.12: →) Figure 2.5: Logical query optimization optimization. Logical query optimization requires the organization of all equivalences into groups. Further, the equivalences are directed. That is, it is fixed whether they are applied in a left to right or right to left manner. A directed equivalence is called rewrite rule. The groups of rewrite rules are then successively applied to the initial algebraic expression. Figure 2.5 describes the different steps performed during logical query optimization. Associated with each step is a set of rewrite rules that are applied to the input expression to yield a result expression. The numbers correspond to the equivalences in Figure 2.2. A small arrow indicates the direction in which the equivalences are applied. The first step breaks up conjunctive selection predicates. The motivation behind this step is that selections with simple predicates can be moved around easier. The rewrite rule used in this step is Equivalence 2.1 applied from left to right. For our example query Step 1 results in Πs.SN ame ( σs.SN o=a.ASN o ( σa.ALN o=l.LN o ( σl.LP N o=p.P N o ( σp.P N ame=‘Larson′ ( ((Student[s] A Attend[a]) A Lecture[l]) A Prof essor[p]))))) The query is represented graphically in Figure 2.7 (middle). Step 2 pushes selections down the operator tree. The motivation here is to reduce the number of tuples as early as possible such that subsequent (expensive) operators have smaller input. Applying this step to our example query yields: Πs.SN ame ( 22 CHAPTER 2. TEXTBOOK QUERY OPTIMIZATION σl.LP N o=p.P N o ( σa.ALN o=l.LN o ( σs.SN o=a.ASN o (Student[s] A Attend[a]) ALecture[l]) A(σp.P N ame=‘Larson′ (Prof essor[p])))) The query is represented graphically in Figure 2.7 (bottom). Excursion In general, we might encounter problems when pushing down selections. It may be the case that the order of the cross products is not well-suited for pushing selections down. If this is the case, we must consider reordering cross products during this step (Eqv. 2.7 and 2.5). To illustrate this point consider the following example query. select distinct s.SName from Student s, Lecture l, Attend a where s.SNo = a.ASNo and a.ALNo = l.LNo and l.LTitle = ‘Databases I’ After translation and Steps 1 and 2 the algebraic expression looks like Πs.SN ame ( σs.SN o=a.ASN o ( σa.ALN o=l.LN o ( (Student[s] A σl.LT itle=‘Databases I ′ (Lecture[l])) A Attend[a]))). Neither of σs.SN o=a.ASN o and σa.ALN o=l.LN o can be pushed down further. Only after reordering the cross products such as in Πs.SN ame ( σs.SN o=a.ASN o ( σa.ALN o=l.LN o ( (Student[s] A Attend[a]) A σl.LT itle=‘Databases I ′ (Lecture[l])))) can σs.SN o=a.ASN o be pushed down: Πs.SN ame ( σa.ALN o=l.LN o ( σs.SN o=a.ASN o (Student[s] A Attend[a]) Aσl.LT itle=‘Databases I ′ (Lecture[l]))) This is the reason why in some textbooks reorder cross products before selections are pushed down [264]. 
After this small excursion, let us resume rewriting our main example query. The next step to be applied is converting cross products into join operations (Step 3). The motivation behind this step is that the evaluation of cross products is very expensive and results in huge intermediate results: for every tuple in the left input, an output tuple must be produced for every tuple in the right input. A join operation can be implemented much more efficiently. Applying Equivalence 2.15 from left to right to our example query results in

    Π_{s.SName}(
      ((Student[s] ⋈_{s.SNo=a.ASNo} Attend[a])
       ⋈_{a.ALNo=l.LNo} Lecture[l])
      ⋈_{l.LPNo=p.PNo} σ_{p.PName='Larson'}(Professor[p]))

The query is represented graphically in Figure 2.8 (top).

The next step is really tricky and involved: we have to find an optimal order for evaluating the joins. The join's associativity and commutativity gives us plenty of alternative (equivalent) evaluation plans. For our rather simple query, Figure 2.6 lists some of the possible join orders, where we left out the join predicates and used the single-letter correlation names to denote the relations to be joined. Only p abbreviates the more complex expression σ_{p.PName='Larson'}(Professor[p]). The edges show how plans can be derived from other plans by applying commutativity (c) or associativity (a). Unfortunately, we cannot ignore the problem of finding a good join order. It has been shown that the order in which joins are evaluated has an enormous influence on the total evaluation cost of a query. Thus, it is an important problem. On the other hand, the problem is really tough. Most join ordering problems turn out to be NP-hard. As a consequence, many different heuristics and cost-based algorithms have been invented. They are discussed in depth in Chapter 3. There we will also find examples showing how important (in terms of costs) the right choice of the join order is.

To continue with our example query, we use a very simple heuristics: among all possible joins, select first the one that produces the smallest intermediate result. This can be motivated as follows. In our current algebraic expression, the first join to be executed is Student[s] ⋈_{s.SNo=a.ASNo} Attend[a]. All students and their attendances of some lecture are considered. The result, and hence the input to the next join, will be very big. On the other hand, if there is only one professor named Larson, the output of σ_{p.PName='Larson'}(Professor[p]) is a single tuple. Joining this single tuple with the relation Lecture results in an output containing one tuple for every lecture taught by Larson. For a large university, this will be a small subset of all lectures. Continuing this line, we get the following algebraic expression:

    Π_{s.SName}(
      ((σ_{p.PName='Larson'}(Professor[p])
        ⋈_{p.PNo=l.LPNo} Lecture[l])
       ⋈_{l.LNo=a.ALNo} Attend[a])
      ⋈_{a.ASNo=s.SNo} Student[s])

The query is represented graphically in Figure 2.8 (middle).
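The heuristics "among all possible joins, select first the one that produces the smallest intermediate result" can be sketched as a greedy loop over cardinality estimates. The cardinalities and selectivities below are invented for illustration; in a real system they would come from the catalog.

    # Greedy join ordering: repeatedly combine the two plans whose join
    # (or cross product) has the smallest estimated output cardinality.

    from itertools import combinations

    def greedy_join_order(cards, sel):
        # plans: list of (tree, set of relations, estimated cardinality)
        plans = [(r, {r}, n) for r, n in cards.items()]
        while len(plans) > 1:
            best = None
            for p1, p2 in combinations(plans, 2):
                f = 1.0                               # combined selectivity
                for r1 in p1[1]:
                    for r2 in p2[1]:
                        f *= sel.get(frozenset({r1, r2}), 1.0)
                card = f * p1[2] * p2[2]
                if best is None or card < best[2]:
                    best = ((p1[0], p2[0]), p1[1] | p2[1], card, p1, p2)
            plans = [p for p in plans if p is not best[3] and p is not best[4]]
            plans.append(best[:3])
        return plans[0]

    cards = {'Student': 10000, 'Attend': 50000, 'Lecture': 2000, 'sel_Professor': 1}
    sel = {frozenset({'Student', 'Attend'}): 1e-4,
           frozenset({'Attend', 'Lecture'}): 5e-4,
           frozenset({'Lecture', 'sel_Professor'}): 5e-4}
    tree, rels, card = greedy_join_order(cards, sel)
    print(tree)   # joins sel_Professor (sigma over Professor) with Lecture first

Note that this greedy loop also considers cross products (selectivity 1) and may pick one if it is cheapest; and, being a heuristics, it may of course miss the optimal order.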
The last step minimizes intermediate results by projecting out irrelevant attributes. An attribute is irrelevant if it is not used further up the operator tree. When pushing down projections, we only apply them just before a pipeline breaker [347]. The reason is that for pipelined operators like selection, eliminating superfluous attributes does not gain much. The only pipeline breaker occurring in our plan is the join operator. Hence, before a join is applied, we project on the attributes that are needed further up. The result is

    Π_{s.SName}(
      Π_{a.ASNo}(
        Π_{l.LNo}(
          Π_{p.PNo}(σ_{p.PName='Larson'}(Professor[p]))
          ⋈_{p.PNo=l.LPNo} Π_{l.LPNo,l.LNo}(Lecture[l]))
        ⋈_{l.LNo=a.ALNo} Π_{a.ALNo,a.ASNo}(Attend[a]))
      ⋈_{a.ASNo=s.SNo} Π_{s.SNo,s.SName}(Student[s]))

This expression is represented graphically in Figure 2.8 (bottom).

2.5 Physical Query Optimization

Physical query optimization adds more information to the logical query evaluation plan. First, there exist many different ways to access the data stored in a database. One possibility is to scan a relation to find the relevant tuples. Another alternative is to use an index to access only the relevant parts. If an unclustered index is used, it might be beneficial to sort the tuple identifiers (TIDs²) to turn otherwise random disk accesses into sequential accesses. Since there is a multitude of possibilities to access data, this topic is discussed in depth in Chapter 4.

Second, the algebraic operators used in the logical plan may have different alternative implementations. The most prominent example is the join operator, which has many different implementations: simple nested-loop join, blockwise nested-loop join, blockwise nested-loop join with in-memory hash table, index nested-loop join, hybrid hash join, sort-merge join, bandwidth join, special spatial joins, set joins, and structural joins. Most of these join implementations can be applied only in certain situations. Most algorithms only implement equi-joins, where the join predicate is a conjunction of simple equalities. Further, all the implementations differ in cost and robustness. But other operators like grouping may also have alternative implementations. Typically, sort-based and hash-based alternatives exist for these operators.

Third, some operators require certain properties of their input streams. For example, a sort-merge join requires its inputs to be sorted on the join attributes occurring in the equalities of the join predicate. These attributes are called join attributes. The sortedness property can be enforced by a sort operator. The sort operator is thus called an enforcer, since it makes sure that the required property holds. As we will see, properties and enforcers play a crucial role during plan generation.

If common subexpressions are detected at the algebraic level, it might be beneficial to compute them only once and store the result. To do so, a tmp operator must be introduced. Later on, we will see more of these operators, which materialize (partial) intermediate results in order to avoid performing the same computation more than once. An alternative is to allow QEPs that are DAGs and not merely trees (see Section ??).

Physical query optimization is concerned with all the issues mentioned above. Its outline is given in Figure 2.9. Let us demonstrate it for our small example query. Assume that there exists an index on the name of the professors. Then, instead of scanning the whole Professor relation, it is beneficial to use the index to retrieve only those professors named Larson. Further, since a sort-merge join is very robust and not the slowest alternative, we choose it as the implementation for all our join operations.

² Sometimes TIDs are called RIDs (Row Identifiers).
This requires that we sort the inputs of each join operator on the join attributes. Since sorting is a pipeline breaker, we introduce it between the projections and the joins. The resulting plan is

    Π_{s.SName}(
      Sort_{a.ASNo}(Π_{a.ASNo}(
        Sort_{l.LNo}(Π_{l.LNo}(
          Sort_{p.PNo}(Π_{p.PNo}(IdxScan_{p.PName='Larson'}(Professor[p])))
          ⋈^{smj}_{p.PNo=l.LPNo}
          Sort_{l.LPNo}(Π_{l.LPNo,l.LNo}(Lecture[l])))
        ⋈^{smj}_{l.LNo=a.ALNo}
        Sort_{a.ALNo}(Π_{a.ALNo,a.ASNo}(Attend[a]))))
      ⋈^{smj}_{a.ASNo=s.SNo}
      Sort_{s.SNo}(Π_{s.SNo,s.SName}(Student[s])))

where we annotated the joins with smj to indicate that they are sort-merge joins. The sort operator has the attributes on which to sort as a subscript. We cheated a little with the notation of the index scan. The index is a physical entity stored in the database. An index scan typically allows us to retrieve the TIDs of the tuples satisfying the predicate. If this is the case, another access to the relation itself is necessary to fetch the relevant attributes (p.PNo in our case) from the qualifying tuples of the relation. This issue is rectified in Chapter 4. The plan is shown as an operator graph in Figure 2.10.

2.6 Discussion

This chapter left open many interesting issues. We took it for granted that the representation of a query is an algebraic expression or operator tree. Is this really true? We have been very vague about ordering joins and cross products. We only considered queries of the form select distinct. How can we assure correct duplicate treatment for select all? We separated query optimization into two distinct phases: logical and physical query optimization. Any separation into different phases results in the danger of not producing an optimal plan. Logical query optimization turned out to be a little difficult: pushing selections down and reordering joins are mutually interdependent. How can we integrate these steps into a single one and thereby avoid the problem mentioned? Further, our logical query optimization was not cost-based and cannot be: too much information is still missing from the plan to associate precise costs with a logical algebraic expression. How can we integrate the phases? How can we determine the costs of a plan? We covered only a small fraction of SQL. We did not discuss disjunction, negation, union, intersection, except, aggregate functions, group-by, order-by, quantifiers, outer joins, and nested queries. Furthermore, how about other query languages like OQL, XPath, XQuery? Further, enhancements like materialized views exist nowadays in many commercial systems. How can we exploit them beneficially? Can we exploit semantic information? Is our exploitation of index structures complete? What happens if we encounter NULL values? Many questions and open issues remain. The rest of the book is about filling these gaps.

[Figure 2.6: Different join trees — 32 join trees over s, a, l, p, connected by edges indicating applications of commutativity (c) and associativity (a)]
[Figure 2.7: Plans for example query (Part I) — the canonical translation, the plan after breaking up the selection predicate, and the plan after pushing down the selections]

[Figure 2.8: Plans for example query (Part II) — the plan after introducing joins, the plan after reordering the joins, and the plan after introducing and pushing down projections]

    1. introduce index accesses
    2. choose implementations for the algebraic operators
    3. introduce physical operators (sort, tmp)

Figure 2.9: Physical query optimization

[Figure 2.10: Plan for example query after physical query optimization]

Chapter 3

Join Ordering

The problem of join ordering is a very restricted and — at the same time — a very complex one. We have touched upon this issue while discussing logical query optimization in Chapter 2. Join ordering is performed in Step 4 of Figure 2.5. In this chapter, we simplify the problem of join ordering by not considering duplicates, disjunctions, quantifiers, grouping, aggregation, or nested queries. Expressed positively, we concentrate on conjunctive queries with simple and cheap join predicates. What this exactly means will become clear in the next section. Subsequent sections discuss different algorithms for solving the join ordering problem. Finally, we take a look at the structure of the search space. This is important if different join ordering algorithms are compared via benchmarks. If the wrong parameters are chosen, benchmark results can be misleading. The algorithms of this chapter form the core of every plan generator.

3.1 Queries Considered

A conjunctive query is one whose where clause contains a (complex) predicate which in turn is a conjunction of (simple) predicates. Hence, a conjunctive query involves only and, and no or or not operations. A simple predicate is of the form e1 θ e2, where θ ∈ {=, ≠, <, >, ≤, ≥} is a comparison operator and the ei are simple expressions in attribute names, possibly containing some simple and cheap arithmetic operators. By cheap we mean that it is not worth applying extra optimization techniques. In this chapter, we restrict simple predicates even further to the form A = B for attributes A and B. A and B must also belong to different relations, such that every simple predicate in this chapter is a join predicate. There are two reasons for this restriction. First, the most efficient join algorithms rely on the fact that the join predicate is of the form A = B. Such joins are called equi-joins. Any other join is called a non-equi-join.
Second, in relational systems, joins on a foreign key attribute of one relation and the key attribute of the other relation are very common. Other joins are rare.

A base relation is a relation that is stored (explicitly) in the database. For the rest of the chapter, let Ri (1 ≤ i ≤ n) be n relations. These relations can be base relations but do not necessarily have to be. They could also be base relations to which predicates have already been applied, e.g. as a result of applying the first three steps of logical query optimization. Summarizing, the queries we consider can be expressed in SQL as

    select distinct *
    from R1, ..., Rn
    where p

where p is a conjunction of simple join predicates with attributes from exactly two relations. The latter restriction is not really necessary for the algorithms presented in this chapter, but it simplifies the exposition.

3.1.1 Query Graph

A query graph is a convenient representation of a query. It is an undirected graph with nodes R1, ..., Rn. For every simple predicate in the conjunction p whose attributes belong to the relations Ri and Rj, we add an edge between Ri and Rj. This edge is labeled by the simple predicate. From now on, we denote the join predicate connecting Ri and Rj by p_{i,j}. In general, p_{i,j} can be a conjunction of simple join predicates connecting Ri and Rj. If query graphs are used for more than join ordering, selections need to be represented. This is done by a self-edge on the relation to which the selection applies. For the example query of Section 2.1, Figure 3.1 contains the corresponding query graph.

    Student --[s.SNo = a.ASNo]-- Attend --[a.ALNo = l.LNo]-- Lecture --[l.LPNo = p.PNo]-- Professor
                                                    (self-edge on Professor: p.PName = 'Larson')

Figure 3.1: Query graph for example query of Section 2.1

Query graphs can have many different shapes. The shapes that play a certain role in query optimization and in the evaluation of join ordering algorithms are shown in Fig. 3.2. The query graph classes relevant for this chapter are chain queries, star queries, tree queries, cyclic queries, and clique queries. Note that these classes are not disjoint and that some classes are subsets of other classes. In this chapter, we only treat connected query graphs. These can be evaluated without cross products.

[Figure 3.2: Query graph shapes — chain queries, star queries, tree queries, cycle queries, cyclic queries, grid queries, and clique queries]

Excursion. In general, the query graph is a hypergraph [888], as the following example shows:

    select *
    from R1, R2, R3, R4
    where f(R1.a, R2.a, R3.a) = g(R2.b, R3.b, R4.b)
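Building the query graph from the predicate list is straightforward; the sketch below also checks connectedness, which — by the remark above — tells us whether the query can be evaluated without cross products. Encoding predicates as (relation, relation, label) triples is an assumption made here for illustration.

    # Build a query graph from simple join predicates and test connectedness.

    def query_graph(relations, predicates):
        edges = {r: set() for r in relations}
        for r1, r2, _label in predicates:
            edges[r1].add(r2)
            edges[r2].add(r1)
        return edges

    def connected(edges):
        seen, todo = set(), [next(iter(edges))]
        while todo:
            r = todo.pop()
            if r not in seen:
                seen.add(r)
                todo.extend(edges[r])
        return seen == set(edges)

    G = query_graph(['Student', 'Attend', 'Lecture', 'Professor'],
                    [('Student', 'Attend', 's.SNo = a.ASNo'),
                     ('Attend', 'Lecture', 'a.ALNo = l.LNo'),
                     ('Lecture', 'Professor', 'l.LPNo = p.PNo')])
    print(connected(G))     # True -- a chain query, no cross products needed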
3.1.2 Join Tree

A join tree is an algebraic expression in relation names and join operators. Sometimes, cross products are allowed, too. A cross product is the same as a join operator with true as its join predicate. A join tree has its name from its graph representation. There, a join tree is a binary tree whose leaf nodes are the relations and whose inner nodes are joins (and possibly cross products). The edges represent the input/output relationship. Examples of join trees have been shown in Figure 2.6.

Join trees fall into different classes. The most important classes are left-deep trees, right-deep trees, zig-zag trees, and bushy trees. Left-deep trees are join trees where every join has one of the relations Ri as its right input. Right-deep trees are defined analogously. In zig-zag trees, at least one input of every join is a relation Ri. The class of zig-zag trees contains both the left-deep and the right-deep trees. For bushy trees, no restriction applies. Hence, the class of bushy trees contains all of the above three classes. The roots of these notions date back to the paper by Selinger et al. [784], where the search space of the query optimizer was restricted to left-deep trees. There are two main reasons for this restriction. First, only one intermediate result is generated at any time during query evaluation. Second, the number of left-deep trees is far smaller than the number of, e.g., bushy trees. The other classes were added later by other researchers, whenever they found better join trees in them. The different classes are illustrated in Figure 2.6. From left to right, the columns contain left-deep, zig-zag, right-deep, and bushy trees.

Left-deep trees directly correspond to an ordering (i.e. a permutation) of the relations. For example, the left-deep tree ((((R2 ⋈ R3) ⋈ R1) ⋈ R4) ⋈ R5) directly corresponds to the permutation R2, R3, R1, R4, R5. It should be clear that there is a one-to-one correspondence between permutations and left-deep join trees. We will also use the term sequence of relations synonymously. The notion of join ordering goes back to the times when only left-deep trees were considered and, hence, producing an optimal join tree was equivalent to optimally ordering the joins, i.e. determining a permutation with lowest cost. Left-deep, right-deep, and zig-zag trees can be subsumed under the general term linear trees. Sometimes, the term linear tree is used synonymously for left-deep trees. We will not do so. Join trees are sometimes called operator trees or query evaluation plans. Although this is not totally wrong, these terms have a slightly different connotation. Operator trees typically contain more than only join operators. Query evaluation plans (QEPs or plans for short) typically have more information from physical query optimization associated with them.

3.1.3 Simple Cost Functions

In order to judge the quality of join trees, we need a cost function that associates a certain positive cost with each join tree. Then, the task of join ordering is to find, among all equivalent join trees, the join tree with lowest associated costs. One part of any cost function are cardinality estimates. They are based on the cardinalities of the relations, i.e. the number of tuples contained in them. For a given relation Ri, we denote its cardinality by |Ri|. Then, the cardinality of intermediate results must be estimated. This is done by introducing the notion of join selectivity. Let p_{i,j} be a join predicate between relations Ri and Rj. The selectivity f_{i,j} of p_{i,j} is then defined as

    f_{i,j} = |Ri ⋈_{p_{i,j}} Rj| / (|Ri| · |Rj|)

This is the number of tuples in the join's result divided by the number of tuples in the Cartesian product of Ri and Rj. If f_{i,j} is 0.1, then only 10% of all tuples in the Cartesian product survive the predicate p_{i,j}. Note that the selectivity is always a number between 0 and 1 and that f_{i,j} = f_{j,i}. We use an f and not an s, since the selectivity of a predicate is often called its filter factor. Besides the relations' cardinalities, the selectivities of the join predicates p_{i,j} are assumed to be given as input to the join ordering algorithm. Therefore, we can compute the output cardinality of a join Ri ⋈_{p_{i,j}} Rj as

    |Ri ⋈_{p_{i,j}} Rj| = f_{i,j} · |Ri| · |Rj|

From this it becomes clear that if there is no join predicate for two relations Ri and Rj, we can assume a join predicate true and associate a selectivity of 1 with it. The output cardinality is then the cardinality of the cross product of Ri and Rj.
The output cardinality is then the cardinality of the cross product 35 3.1. QUERIES CONSIDERED between Ri and Rj . We also define fi,i = 1 for all 1 ≤ i ≤ n. This allows us to keep subsequent formulas simple. We now need to extend our cardinality estimation to join trees. This can be done by recursively applying the above formula. Consider a join tree T joining two join trees T1 and T2 , i.e. T = T1 B T2 . Then, the result cardinality |T | can be calculated as follows. If T is a leaf Ri , then |T | := |Ri |. Otherwise, Y |T | = ( fi,j ) |T1 | |T2 |. Ri ∈T1 ,Rj ∈T2 Note that this formula assumes that the selectivities are independent of each other. Assuming independence is common but may be very misleading. More on this issue can be found in Chapter ??. Nevertheless, we assume independence and stick to the above formula. For sequences of joins we can give a simple cardinality definition. Let s = R1 , . . . , Rn be a sequence of relations. Then |s| = n Y k=1 k Y |Rk |( fi,k ). i=1 Given the above, a query graph alone is not really sufficient for the specification of a join ordering problem: cardinalities and selectivities are missing. On the other hand, from a complete list of cardinalities and selectivities we can derive the query graph. Obviously, the following defines a chain query with query graph R1 − − − R2 − − − R3 : |R1 | = 10 |R2 | = 100 |R3 | = 1000 f1,2 = 0.1 f2,3 = 0.2 In all examples, we assume for all i and j for which fi,j is not given that there is no join predicate and hence fi,j = 1. We now come to cost functions. The first cost function we consider is called Cout . For a join tree T , Cout (T ) is the sum of all output cardinalities of all joins in T . Recursively, we can define Cout as  0 if T is a single relation Cout (T ) = |T | + Cout (T1 ) + Cout (T2 ) if T = T1 B T2 From a theoretial point of view, Cout has many interesting properties: it is symmetric, it has the ASI property, and it can be applied to an expression of the logical algebra. From a practical point of view, however, it is rarely applied (yet). In real cost functions, the cardinalities only serve as input to more complex formulas capturing the costs of a join implementation. Since real cost functions 36 CHAPTER 3. JOIN ORDERING are too complex for this section, we stick to simple cost functions proposed by Krishnamurthy, Boral, and Zaniolo [520]. They argue that these cost functions are appropriate for main memory database systems. For the three different join implementations nested loop join (nlj), hash join (hj), and sort merge join (smj), they give the following cost functions: Cnlj (e1 Bp e2 ) = |e1 ||e2 | Chj (e1 Bp e2 ) = h|e1 | Csmj (e1 Bp e2 ) = |e1 |log(|e1 |) + |e2 |log(|e2 |) where ei are join trees and h is the average length of the collision chain in the hash table. We will assume h = 1.2. All these cost functions are defined for a single join operator. The cost of a join tree is defined as the sum of the costs of all joins it contains. We use the symbols Cx to also denote the costs of not only a single join but the costs of the whole tree. Hence, for sequences s of relations, we have Cnlj (s) = Chj (s) = Csmj (s) = n X i=2 n X i=2 n X i=2 |s1 , . . . , si−1 | ∗ |si | 1.2|s1 , . . . , si−1 | |s1 , . . . , si−1 | log(|s1 , . . . , si−1 |) + n X i=2 |si | log(|si |) Some notes on the cost functions are in order. First, note that these cost functions are even for main memory a little incomplete. For example, constant factors are missing. 
Some notes on the cost functions are in order. First, note that these cost functions are, even for main memory, a little incomplete. For example, constant factors are missing. Second, the cost functions are mainly devised for left-deep trees. This becomes apparent when looking at the costs of hash joins. It is assumed that the right input is already stored in an appropriate hash table. Obviously, this can only hold for base relations, giving rise to left-deep trees. Third, C_hj and C_smj do not work for cross products. However, we can extend these cost functions by defining the cost of a cross product to be equal to its output cardinality, which happens to be its cost under C_nlj. Fourth, in reality, more complex cost functions are used, and other parameters like the width of the tuples—i.e. the number of bytes needed to store them—also play an important role. Fifth, the above cost functions assume that the same join algorithm is chosen throughout the whole plan. In practice, this will not be true.

For the above chain query, we compute the costs of different join trees. The last join tree contains a cross product.

    Join Tree        | C_out | C_nlj   | C_hj  | C_smj
    -----------------+-------+---------+-------+----------
    R1 ⋈ R2          |   100 |    1000 |    12 |    697.61
    R2 ⋈ R3          | 20000 |  100000 |   120 |  10630.26
    R1 × R3          | 10000 |   10000 | 10000 |  10000.00
    (R1 ⋈ R2) ⋈ R3   | 20100 |  101000 |   132 |  11327.86
    (R2 ⋈ R3) ⋈ R1   | 40000 |  300000 | 24120 |  32595.00
    (R1 × R3) ⋈ R2   | 30000 | 1010000 | 22000 | 143542.00

For the calculation of C_out, note that |R1 ⋈ R2 ⋈ R3| = 20000 is included in each of the last three lines of its column. For the nested-loop cost function, the costs are calculated as follows:

    C_nlj((R1 ⋈ R2) ⋈ R3) =   1000 +   100 · 1000 =  101000
    C_nlj((R2 ⋈ R3) ⋈ R1) = 100000 + 20000 ·   10 =  300000
    C_nlj((R1 × R3) ⋈ R2) =  10000 + 10000 ·  100 = 1010000

The reader should verify the other costs. Several observations can be made from the above numbers:

• The costs of different join trees differ vastly under every cost function. Hence, it is worth spending some time to find a cheap join order.

• The costs of the same join tree differ under the different cost functions.

• The cheapest join tree is (R1 ⋈ R2) ⋈ R3 under all four cost functions.

• Join trees with cross products are expensive. Thus, an often-used heuristics is not to consider join trees that contain unnecessary cross products. (If and only if the query graph consists of several unconnected components, cross products are necessary. In other words: if the query graph is connected, no cross products are necessary.)

• The join order matters even for join trees without cross products.

We would like to emphasize that the join order is also relevant under other cost functions. Avoiding cross products is not always beneficial, as the following query specification shows:

    |R1| = 1000
    |R2| = 2
    |R3| = 2
    f_{1,2} = 0.1
    f_{1,3} = 0.1

For C_out we have the costs

    Join Tree       | C_out
    ----------------+------
    R1 ⋈ R2         |  200
    R2 × R3         |    4
    R1 ⋈ R3         |  200
    (R1 ⋈ R2) ⋈ R3  |  240
    (R2 × R3) ⋈ R1  |   44
    (R1 ⋈ R3) ⋈ R2  |  240

Note that although the absolute numbers are quite small, the ratio of the best and the second-best join tree is quite large. The reader is advised to find more examples and to apply other cost functions.
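The numbers in the tables above are easy to reproduce. The following sketch computes the prefix cardinalities and the four cost functions for left-deep sequences. Base-2 logarithms are assumed for C_smj, since they reproduce the printed values up to rounding; the cross-product extension of C_hj and C_smj is omitted for brevity.

    # Cardinality estimation and the simple cost functions for a sequence s.

    from math import log2, prod

    def costs(seq, cards, sel):
        f = lambda r1, r2: sel.get(frozenset({r1, r2}), 1.0)
        size = {1: cards[seq[0]]}              # size[i] = |s1, ..., si|
        for i in range(2, len(seq) + 1):
            r = seq[i - 1]
            size[i] = size[i - 1] * cards[r] * prod(f(r, x) for x in seq[:i - 1])
        n = len(seq)
        c_out = sum(size[i] for i in range(2, n + 1))
        c_nlj = sum(size[i - 1] * cards[seq[i - 1]] for i in range(2, n + 1))
        c_hj  = sum(1.2 * size[i - 1] for i in range(2, n + 1))
        c_smj = sum(size[i - 1] * log2(size[i - 1]) for i in range(2, n + 1)) \
              + sum(cards[r] * log2(cards[r]) for r in seq[1:])
        return c_out, c_nlj, c_hj, c_smj

    cards = {'R1': 10, 'R2': 100, 'R3': 1000}
    sel = {frozenset({'R1', 'R2'}): 0.1, frozenset({'R2', 'R3'}): 0.2}
    print(costs(('R1', 'R2', 'R3'), cards, sel))
    # (20100.0, 101000.0, 132.0, ~11327.8) -- the (R1 join R2) join R3 row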
The following example illustrates that a bushy tree can be superior to any linear tree. Let us use the following query specification:

    |R1| = 10
    |R2| = 20
    |R3| = 20
    |R4| = 10
    f_{1,2} = 0.01
    f_{2,3} = 0.5
    f_{3,4} = 0.01

If we do not consider cross products, we have for the symmetric (see below) cost function C_out the following join trees and costs:

    Join Tree               | C_out
    ------------------------+------
    R1 ⋈ R2                 |    2
    R2 ⋈ R3                 |  200
    R3 ⋈ R4                 |    2
    ((R1 ⋈ R2) ⋈ R3) ⋈ R4   |   24
    ((R2 ⋈ R3) ⋈ R1) ⋈ R4   |  222
    (R1 ⋈ R2) ⋈ (R3 ⋈ R4)   |    6

Note that all other linear join trees fall into one of these classes, due to the symmetry of the cost function and of the join ordering problem. Again, the reader is advised to find more examples and to apply other cost functions.

If we want to annotate a join operator by its implementation—which is necessary for the correct computation of costs—we write ⋈^{impl} for an implementation impl. For example, ⋈^{smj} is a sort-merge join, and the according cost function C_smj is used to compute its costs.

Two properties of cost functions have some impact on the join ordering problem. The first is symmetry. A cost function C_impl is called symmetric if C_impl(R1 ⋈^{impl} R2) = C_impl(R2 ⋈^{impl} R1) for all relations R1 and R2. For symmetric cost functions, it does not make sense to consider commutativity. Hence, if we want to restrict ourselves to linear join trees, it suffices to consider left-deep trees only. Note that C_out, C_nlj, and C_smj are symmetric, while C_hj is not. The other property is the adjacent sequence interchange (ASI) property. Informally, the ASI property states that there exists a rank function such that the order of two subsequences is optimal if they are ordered according to the rank function. The ASI property is formally defined in Section 3.2.2. Only for tree queries and cost functions with the ASI property is a polynomial algorithm to find an optimal join order known. Our cost functions C_out and C_hj have the ASI property, C_smj does not. Summarizing the properties of our cost functions, we see that the classification is orthogonal:

             | symmetric    | ¬ symmetric
    ---------+--------------+-------------
    ASI      | C_out, C_nlj | C_hj
    ¬ ASI    | C_smj        | (see text)

For the missing case of a non-symmetric cost function not having the ASI property, we can use the cost function of the hybrid hash join [238, 677].

We turn to another topic that is not really well researched yet. The goal is to cut down the number of cost functions that have to be considered for optimization and to possibly allow for simpler cost functions, which saves time during plan generation. Unfortunately, we have to restrict ourselves to left-deep join trees. Let s denote a sequence or permutation of a given set of joins. We define an equivalence relation on cost functions.

Definition 3.1.1 Let C and C′ be two cost functions. Then

    C ≡ C′ :⇔ (∀s : C(s) minimal ⇔ C′(s) minimal)

Here, s is a join sequence. Obviously, ≡ is an equivalence relation. Now we can define the ΣIR property.

Definition 3.1.2 A cost function C is ΣIR :⇔ C ≡ C_out.

That is, ΣIR is the set of all cost functions that are equivalent to C_out. Let us consider a very simple example. The last element of the sum in C_out is the size of the final join (all relations are joined). This is not the case for the following cost function:

    C′_out(s) := Σ_{i=2}^{n-1} |s1, ..., si|

Obviously, C′_out is ΣIR. The next observation shows that we can construct quite complex ΣIR cost functions:

Observation 3.1.3 Let C1 and C2 be two ΣIR cost functions. For non-decreasing functions f1 : ℝ → ℝ and f2 : ℝ × ℝ → ℝ and constants c ∈ ℝ and d ∈ ℝ⁺, we have that

    C1 + c
    C1 · d
    f1 ∘ C1
    f2 ∘ (C1, C2)

are ΣIR.
Here, ∘ denotes function composition and (·, ·) function pairing. There are of course many more possibilities of constructing ΣIR functions. For the cost functions C_hj, C_smj, and C_nlj, we now investigate which of them have the ΣIR property.

Let us consider C_hj first. From

    C_hj(s) = Σ_{i=2}^{n} 1.2 · |s1, ..., s_{i-1}|
            = 1.2 · |s1| + 1.2 · Σ_{i=2}^{n-1} |s1, ..., si|
            = 1.2 · |s1| + 1.2 · C′_out(s)

and Observation 3.1.3, we conclude that C_hj is ΣIR for a fixed relation to be joined first. If we can optimize C_out in polynomial time, then we can also optimize C_out for a fixed starting relation. Indeed, by trying each relation as a starting relation, we can find the optimal join tree in polynomial time. An algorithm that computes the optimal solution for an arbitrary relation to be joined first can be found in Section 3.2.2.

Now, consider C_smj. Since

    Σ_{i=2}^{n} |s1, ..., s_{i-1}| log(|s1, ..., s_{i-1}|)

is minimal if and only if

    Σ_{i=2}^{n} |s1, ..., s_{i-1}|

is minimal, and

    Σ_{i=2}^{n} |s_i| log(|s_i|)

is independent of the order of the relations within s — that is, constant — we conclude that C_smj is ΣIR.

Last, we have that C_nlj is not ΣIR. To see this, consider the following counterexample with three relations R1, R2, and R3 of sizes 10, 10, and 100, respectively. The selectivities are f_{1,2} = 9/10, f_{2,3} = 1/10, and f_{1,3} = 1/10. Now,

    |R1 R2| = 90
    |R1 R3| = 100
    |R2 R3| = 100

and

    C_nlj(R1 R2 R3) = 10 · 10  + 90 · 100 = 9100
    C_nlj(R1 R3 R2) = 10 · 100 + 100 · 10 = 2000
    C_nlj(R2 R3 R1) = 10 · 100 + 100 · 10 = 2000

We see that R1 R2 R3 has the smallest sum of intermediate result sizes but produces the highest cost. Hence, C_nlj is not ΣIR.
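Using the costs sketch from Section 3.1.3, the counterexample can be checked mechanically (the sum of intermediate result sizes is the C_out value computed there):

    cards = {'R1': 10, 'R2': 10, 'R3': 100}
    sel = {frozenset({'R1', 'R2'}): 0.9,
           frozenset({'R1', 'R3'}): 0.1,
           frozenset({'R2', 'R3'}): 0.1}
    for seq in (('R1', 'R2', 'R3'), ('R1', 'R3', 'R2'), ('R2', 'R3', 'R1')):
        c_out, c_nlj, _, _ = costs(seq, cards, sel)
        print(seq, 'sum of intermediate sizes:', c_out, ' C_nlj:', c_nlj)
    # R1,R2,R3 minimizes the sum (180) but has the largest C_nlj (9100).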
3.1.4 Classification of Join Ordering Problems

After having discussed the different classes of query graphs, join trees, and cost functions, we can classify join ordering problems. To define a certain join ordering problem, we have to pick one entry from every class:

  Query Graph Classes × Possible Join Tree Classes × Cost Function Classes

The query graph classes considered are chain, star, tree, and cyclic. For the join tree classes, we distinguish between the different join tree shapes, i.e. whether they are left-deep, zig-zag, or bushy trees. We left out the right-deep trees, since they do not differ in their behavior from left-deep trees. We also have to take into account whether cross products are considered or not. For cost functions, we use a simple classification: we only distinguish between those that have the ASI property and those that do not. This leaves us with 4 ∗ 3 ∗ 2 ∗ 2 = 48 different join ordering problems. For these, we will first review search space sizes and complexity. Then, we discuss several algorithms for join ordering. Last, we give some insight into cost distributions over the search space and how this might influence the benchmarking of different join ordering algorithms.

3.1.5 Search Space Sizes

Since search space sizes are easier to count if cross products are allowed, we consider them first. Then we turn to search spaces where cross products are not considered.

Join Trees with Cross Products

We consider the number of join trees for a query graph with n relations. When cross products are allowed, the number of left-deep and right-deep join trees is n!: by allowing cross products, the query graph does not restrict the search space in any way, and any of the n! permutations of the n relations corresponds to a valid left-deep join tree. This is true independently of the query graph.

Similarly, the number of zig-zag trees can be counted independently of the query graph. First note that for joining n relations, we need n − 1 join operators. From any left-deep tree, we derive zig-zag trees by using the join's commutativity, i.e. by exchanging the left and right inputs of some of its joins. The bottommost join is excluded: exchanging its arguments yields a tree that is already derived from another left-deep tree. Hence, from any left-deep tree for n relations, we can derive 2^{n−2} zig-zag trees, and there exists a total of 2^{n−2} n! zig-zag trees. Again, this number is independent of the query graph.

The number of bushy trees can be computed as follows. First, we need the number of binary trees. For n leaf nodes, the number of binary trees is given by C(n − 1), where C(n) is defined by the recurrence

  C(n) = \sum_{k=0}^{n-1} C(k) C(n-k-1)

with C(0) = 1. The numbers C(n) are called the Catalan numbers (see [209]). They can also be computed by the closed formula

  C(n) = \frac{1}{n+1} \binom{2n}{n}.

The Catalan numbers grow in the order of Θ(4^n / n^{3/2}). Knowing the number of binary trees with n leaves, we still have to attach the n relations to the leaves in all possible ways. For a given binary tree, this can be done in n! ways. Hence, the total number of bushy trees is n! C(n − 1). This can be simplified as follows (see also [307, 532, 867]):

  n! C(n-1) = n! \frac{1}{n} \binom{2(n-1)}{n-1}
            = \frac{n!}{n} \cdot \frac{(2n-2)!}{(n-1)!\,((2n-2)-(n-1))!}
            = \frac{(2n-2)!}{(n-1)!}
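The recurrence and the closed formula are easy to cross-check. The following C sketch (ours) computes C(n) via the recurrence and the resulting number of bushy trees n! C(n − 1):

#include <stdio.h>

/* A sketch (ours): Catalan numbers via the recurrence from the text, and
 * the resulting number n! * C(n-1) of bushy trees for n relations. */
int main(void)
{
    double C[16], fact = 1.0;
    C[0] = 1.0;
    for (int n = 1; n < 16; ++n) {
        C[n] = 0.0;
        for (int k = 0; k < n; ++k)
            C[n] += C[k] * C[n - 1 - k];
    }
    for (int n = 1; n <= 10; ++n) {
        fact *= n; /* n! */
        printf("n=%2d  C(n-1)=%10.0f  bushy=%14.0f\n",
               n, C[n - 1], fact * C[n - 1]);
    }
    return 0;
}

For n = 4, it prints C(3) = 5 and 24 ∗ 5 = 120 bushy trees, in accordance with the tables below.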
Chain Queries, Left-Deep Join Trees, No Cartesian Product

We now derive the function that calculates the number of left-deep join trees with no cross products for a chain query of n relations. That is, the query graph is R1 – R2 – · · · – Rn−1 – Rn. Let us denote the number of join trees by f(n). Obviously, for n = 0 there is only one (the empty) join tree, and for n = 1 there is also only one join tree (no join). For larger n, consider the join trees for R1 – · · · – Rn−1 in which relation Rn−1 occurs as the k-th relation from the bottom, where k ranges from 1 to n − 1. From such a join tree, we can derive join trees for all n relations by adding relation Rn at any position following Rn−1. There are n − k such positions. Only for k = 1 can we additionally add Rn below Rn−1. Hence, for k = 1 we gain n join trees.

How many join trees with Rn−1 at position k are there? For k = 1, Rn−1 must be the first relation to be joined. Since we do not consider cross products, it must be joined with Rn−2. The next relation must be Rn−3, and so on. Hence, there is exactly one such join tree. For k = 2, the first relation must be Rn−2, which is then joined with Rn−1. Then Rn−3, . . . , R1 must follow in this order. Again, there is exactly one such join tree. For higher k, for Rn−1 to occur safely at position k (no cross products), the k − 1 relations Rn−2, . . . , Rn−k must occur before Rn−1. There are exactly f(k − 1) join trees for these k − 1 relations. On each such join tree, we just have to add Rn−1 on top of it to yield a join tree with Rn−1 at position k.

Now we can compute f(n) as

  f(n) = n + \sum_{k=2}^{n-1} f(k-1)(n-k)    for n > 1.

Solving this recurrence gives us f(n) = 2^{n−1}. The proof is by induction. The case n = 1 is trivial. The induction step for n > 1, provided by Thomas Neumann, goes as follows:

  f(n) = n + \sum_{k=2}^{n-1} f(k-1)(n-k)
       = n + \sum_{k=0}^{n-3} f(k+1)(n-k-2)
       = n + \sum_{k=0}^{n-3} 2^k (n-k-2)
       = n + \sum_{k=1}^{n-2} k\,2^{n-k-2}
       = n + \sum_{k=1}^{n-2} 2^{n-k-2} + \sum_{k=2}^{n-2} (k-1)\,2^{n-k-2}
       = n + \sum_{i=1}^{n-2} \sum_{j=i}^{n-2} 2^{n-j-2}
       = n + \sum_{i=1}^{n-2} \sum_{j=0}^{n-i-2} 2^j
       = n + \sum_{i=1}^{n-2} (2^{n-i-1} - 1)
       = n + \sum_{i=1}^{n-2} 2^i - (n-2)
       = n + (2^{n-1} - 2) - (n-2)
       = 2^{n-1}

Chain Queries, Zig-Zag Join Trees, No Cartesian Product

All possible zig-zag trees can be derived from a left-deep tree by exchanging the left and right arguments of a subset of the joins. Since for the first join these alternatives are already considered within the set of left-deep trees, we are left with n − 2 joins. Hence, the number of zig-zag trees for n relations in a chain query is 2^{n−2} ∗ 2^{n−1} = 2^{2n−3}.

Chain Queries, Bushy Join Trees, No Cartesian Product

We can compute the number of bushy trees with no cross products for a chain query in the following way. Let us denote this number by f(n). Again, let us assume that the chain query has the form R1 – R2 – · · · – Rn−1 – Rn. For n = 0, we only have the empty join tree. For n = 1, we have one join tree. For n = 2, we have two join trees. For more relations, every subtree of the join tree must contain a subchain in order to avoid cross products. Further, the subchain can occur as the left or the right argument of the join. Hence, we can compute f(n) as

  f(n) = \sum_{k=1}^{n-1} 2 f(k) f(n-k)

This is equal to 2^{n−1} C(n − 1), where the C(n) are the Catalan numbers. EX

Star Queries, No Cartesian Product

The first join has to connect the center relation R0 with any of the other relations. The remaining relations can follow in any order. Since R0 can be the left or the right input of the first join, there are 2 ∗ (n − 1)! possible left-deep join trees for star queries with no Cartesian product. The number of zig-zag join trees is derived by exchanging the arguments of all but the first join in any left-deep join tree. We cannot consider the first join, since we did so already when counting left-deep join trees. Hence, the total number of zig-zag join trees is 2 ∗ (n − 1)! ∗ 2^{n−2} = 2^{n−1} ∗ (n − 1)!. For star queries, all bushy join trees with no Cartesian product are zig-zag trees; no other shapes are possible.

Remarks

The numbers for star queries are also upper bounds for tree queries. For clique queries, no join tree contains a cross product. Hence, all join trees are valid join trees, and the search space size is the same as the corresponding search space for join trees with cross products.

To give the reader a feeling for the numbers, the following tables contain the potential search space sizes for some n.

  Join trees without cross products
                  chain query                            star query
  n    left-deep    zig-zag     bushy              left-deep    zig-zag/bushy
       2^{n-1}      2^{2n-3}    2^{n-1} C(n-1)     2 (n-1)!     2^{n-1} (n-1)!
  1    1            1           1                  1            1
  2    2            2           2                  2            2
  3    4            8           8                  4            8
  4    8            32          40                 12           48
  5    16           128         224                48           384
  6    32           512         1344               240          3840
  7    64           2048        8448               1440         46080
  8    128          8192        54912              10080        645120
  9    256          32768       366080             80640        10321920
  10   512          131072      2489344            725760       185794560

  Join trees with cross products / clique queries
  n    left-deep    zig-zag       bushy
       n!           2^{n-2} n!    n! C(n-1)
  1    1            1             1
  2    2            2             2
  3    6            12            12
  4    24           96            120
  5    120          960           1680
  6    720          11520         30240
  7    5040         161280        665280
  8    40320        2580480       17297280
  9    362880       46448640      518918400
  10   3628800      928972800     17643225600
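The chain-query numbers in the first table can be validated directly. The following C sketch (ours) evaluates the recurrence for f(n) derived above and compares it with the closed form 2^{n−1}:

#include <stdio.h>

/* A sketch (ours): evaluates f(n) = n + sum_{k=2}^{n-1} f(k-1)*(n-k),
 * the number of left-deep trees without cross products for a chain
 * query, and checks it against the closed form 2^{n-1}. */
int main(void)
{
    double f[21];
    f[0] = 1;
    f[1] = 1;
    for (int n = 2; n <= 20; ++n) {
        f[n] = n;
        for (int k = 2; k <= n - 1; ++k)
            f[n] += f[k - 1] * (n - k);
        printf("f(%2d) = %8.0f   2^%d = %8.0f\n",
               n, f[n], n - 1, (double)(1u << (n - 1)));
    }
    return 0;
}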
Note that in Figure 2.6 only 32 join trees are listed, whereas the number of bushy trees for chain queries with four relations is 40. The missing eight cases are those zig-zag trees which are symmetric (i.e. derived by applying commutativity to all occurring joins) to the ones contained in the second column. From these numbers, it becomes immediately clear why, historically, the search space of query optimizers was restricted to left-deep trees and why cross products for connected query graphs were not considered.

3.1.6 Problem Complexity

The complexity of the join ordering problem depends on several parameters: the shape of the query graph, the class of join trees to be considered, whether cross products are considered or not, and whether the cost function has the ASI property or not. Complexity results are not known for all combinations. What is known is summarized in the following table.

  Query graph         Join tree    Cross products   Cost function                 Complexity
  general             left-deep    no               ASI                           NP-hard
  tree/star/chain     left-deep    no               one join method (ASI)         P
  star                left-deep    no               two join methods (NLJ+SMJ)    NP-hard
  general/tree/star   left-deep    yes              ASI                           NP-hard
  chain               left-deep    yes              —                             open
  general             bushy        no               ASI                           NP-hard
  tree                bushy        no               —                             open
  star                bushy        no               ASI                           P
  chain               bushy        no               any                           P
  general             bushy        yes              ASI                           NP-hard
  tree/star/chain     bushy        yes              ASI                           NP-hard

Ibaraki and Kameda were the first to show that the problem of deriving optimal left-deep trees for cyclic queries is NP-hard for a cost function for an n-way nested-loop join implementation [438]. The proof was repeated for the cost function Cout, which has the ASI property [194, 877]. In both proofs, the clique problem was used for the reduction [320]. Cout was also used in the other proofs of NP-hardness results.

The second line of the table goes back to the same paper. Ibaraki and Kameda also described an algorithm to solve the join ordering problem for tree queries, producing optimal left-deep trees for a special cost function for a nested-loop n-way join algorithm. Their algorithm was based on the observation that their cost function has the ASI property. For this case, they could derive an algorithm from an algorithm for a sequencing problem for job scheduling designed by Monma and Sidney [627], who, in turn, used an earlier result by Lawler [537]. The algorithm of Ibaraki and Kameda was slightly generalized by Krishnamurthy, Boral, and Zaniolo, who were also able to sketch a more efficient algorithm. It improves the time bound from O(n² log n) to O(n²). The disadvantage of both approaches is that with every relation, a fixed (i.e. join-tree independent) join implementation must be associated before the optimization starts. Hence, they only produce optimal trees if there is only one join implementation available, or if one is able to guess the optimal join method beforehand. This might not be the case. The polynomial algorithm, which we term IKKBZ, is described in Section 3.2.2.

For star queries, Ganguly investigated the problem of generating optimal left-deep trees if no cross products but two different cost functions (one for nested-loop join, the other for sort-merge join) are allowed. It turned out that this problem is NP-hard [312].

The next line is due to Cluet and Moerkotte [194]. They showed by reduction from 3DM that taking into account cross products results in an NP-hard problem even for star queries. Remember that star queries are tree queries, which in turn are general query graphs.

The problem for general bushy trees follows from a result by Scheufele and Moerkotte [768]. They showed that building optimal bushy trees for cross products only (i.e. all selectivities equal one) is already NP-hard.
This result also explains the last two lines of the table. By noting that for star queries, all bushy trees that do not contain a cross product are left-deep trees, the corresponding problem can be solved by the IKKBZ algorithm for left-deep trees. Ono and Lohman showed that for chain queries, dynamic programming considers only a polynomial number of bushy trees if no cross products are considered [653]. This is discussed in Section 3.2.4.

The table is rather incomplete; many open problems exist. For example, if we have chain queries and consider cross products: is the problem NP-hard or in P? Some results for this problem have been presented [768], but it is still open (see Section 3.2.7). Also open is the case where we produce optimal bushy trees with no cross products for tree queries. Yet another example of an open problem is whether we can drop the ASI property and still derive a polynomial algorithm for a tree query. This is especially important, since the cost function for a sort-merge algorithm does not have the ASI property. Good summaries of complexity results for different join ordering problems can be found in the theses of Scheufele [766] and Hamalainen [394].

Given that join ordering is an inherently complex problem with no polynomial algorithm in sight, one might wonder whether good polynomial approximation algorithms exist. Chances are that even this is not the case: Chatterji, Evani, Ganguly, and Yemmanuru showed that three different optimization problems, all asking for linear join trees, are not approximable [145].

3.2 Deterministic Algorithms

3.2.1 Heuristics

We now present some simple heuristic solutions to the problem of join ordering. These heuristics produce only left-deep trees. Since left-deep trees are equivalent to permutations, these heuristics order the joins according to some criterion.

The core algorithm for the heuristics discussed here is the greedy algorithm (for an introduction, see [209]). In greedy algorithms, a weight is associated with each entity. In our case, weights are associated with the relations. A typical weight function is the cardinality of the relation (|R|). Given a weight function weight, a greedy join ordering algorithm works as follows:

GreedyJoinOrdering-1({R1, . . . , Rn}, (*weight)(Relation))
Input: a set of relations to be joined and a weight function
Output: a join order
S = ϵ; // initialize S to the empty sequence
R = {R1, . . . , Rn}; // let R be the set of all relations
while (!empty(R)) {
  Let k be such that: weight(Rk) = min_{Ri ∈ R}(weight(Ri));
  R \= Rk; // eliminate Rk from R
  S ◦= Rk; // append Rk to S
}
return S

This algorithm takes cross products into account. If we are only interested in left-deep join trees with no cross products, we have to require that Rk is connected to one of the relations contained in S in case S ≠ ϵ. Note that a more efficient implementation would sort the relations according to their weight.
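As an illustration, the following C sketch (ours) implements GreedyJoinOrdering-1 with the weight function weight(R) = |R| on the cardinalities of our four-relation example:

#include <stdio.h>

/* A sketch (ours): GreedyJoinOrdering-1 with weight(R) = |R|.
 * Cross products are permitted, as in the pseudocode above. */
int main(void)
{
    double card[4] = {10, 20, 20, 10}; /* |R1|, ..., |R4| */
    int used[4] = {0}, order[4];

    for (int step = 0; step < 4; ++step) {
        int k = -1;
        for (int i = 0; i < 4; ++i)       /* pick the minimal-weight */
            if (!used[i] && (k < 0 || card[i] < card[k]))
                k = i;                    /* remaining relation      */
        used[k] = 1;
        order[step] = k;
    }
    for (int step = 0; step < 4; ++step)
        printf("R%d ", order[step] + 1);  /* prints: R1 R4 R2 R3 */
    printf("\n");
    return 0;
}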
Not all heuristics can be implemented with a greedy algorithm as simple as the one above. An often-used heuristic is to take as the next relation the one that produces the smallest (next) intermediate result. This cannot be determined from the relation alone: one must take into account the sequence S already processed, since only then are the selectivities of all predicates connecting relations in S with the new relation deducible. We must take the product of these selectivities and the cardinality of the new relation in order to get an estimate of the intermediate result's cardinality. As a consequence, the weights become relative to S. In other words, the weight function now has two parameters: a sequence of relations already joined and the relation whose relative weight is to be computed. Here is the next algorithm:

GreedyJoinOrdering-2({R1, . . . , Rn}, (*weight)(Sequence of Relations, Relation))
Input: a set of relations to be joined and a weight function
Output: a join order
S = ϵ; // initialize S to the empty sequence
R = {R1, . . . , Rn}; // let R be the set of all relations
while (!empty(R)) {
  Let k be such that: weight(S, Rk) = min_{Ri ∈ R}(weight(S, Ri));
  R \= Rk; // eliminate Rk from R
  S ◦= Rk; // append Rk to S
}
return S

Note that for this algorithm, sorting the relations up front is no longer possible, since the weights change as S grows. GreedyJoinOrdering-2 can be improved by taking every relation as the starting one:

GreedyJoinOrdering-3({R1, . . . , Rn}, (*weight)(Sequence of Relations, Relation))
Input: a set of relations to be joined and a weight function
Output: a join order
Solutions = ∅;
for (i = 1; i ≤ n; ++i) {
  S = Ri; // initialize S to a singleton sequence
  R = {R1, . . . , Rn} \ {Ri}; // let R be the set of all other relations
  while (!empty(R)) {
    Let k be such that: weight(S, Rk) = min_{Rj ∈ R}(weight(S, Rj));
    R \= Rk; // eliminate Rk from R
    S ◦= Rk; // append Rk to S
  }
  Solutions += S;
}
return cheapest in Solutions

In addition to the relative weight function mentioned before, another often-used relative weight function is the product of the selectivities connecting relations in S with the new relation. This heuristic is sometimes called MinSel.

The above two algorithms generate linear join trees. Fegaras and Lohman independently proposed heuristics to generate bushy join trees [276, 277, 566]; Fegaras named his Greedy Operator Ordering (GOO). The idea is as follows. A set of join trees Trees is initialized such that it contains all the relations to be joined. The algorithm then investigates all pairs of trees contained in Trees. Among all of these, it joins the two trees that result either in the smallest cost of performing this join (Lohman [566]) or in the smallest intermediate result (Fegaras [276, 277]). The two trees are then eliminated from Trees, and the new join tree joining them is added. The algorithm looks as follows:

GOO({R1, . . . , Rn}, (*weight)(T1, T2))
Input: a set of relations to be joined and a weight function
Output: a join tree
Trees := {R1, . . . , Rn};
while (|Trees| != 1) {
  find Ti, Tj ∈ Trees such that i ≠ j and
    weight(Ti, Tj) is minimal among all pairs of trees in Trees;
  Trees −= Ti;
  Trees −= Tj;
  Trees += Ti B Tj;
}
return the tree contained in Trees;

Our GOO variant differs slightly from the one proposed by Fegaras: he uses arrays, explicitly handles the forming of the join predicates, and materializes intermediate result sizes. Hence, his algorithm is a little more elaborate, but we assume that the reader is able to fill in the gaps. Further, Fegaras proposes the weight function weight(Ti, Tj) = |Ti B Tj|, whereas Lohman proposes the weight function weight(Ti, Tj) = cost(Ti B Tj).

None of our algorithms so far considers different join implementations. An explicit consideration of commutativity for non-symmetric cost functions could also help to produce better join trees. The reader is asked to work out the details of these extensions. EX
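The following C sketch (ours) implements the GOO loop with Fegaras' weight function weight(Ti, Tj) = |Ti B Tj| on the four-relation example query; each join tree is represented only by the bitvector of relations it contains and its estimated cardinality:

#include <stdio.h>

/* A sketch (ours): GOO with weight(Ti, Tj) = |Ti B Tj|. */
enum { N = 4 };

int main(void)
{
    double card[N] = {10, 20, 20, 10};
    double sel[N][N] = {
        {1.0, 0.01, 1.0, 1.0},
        {0.01, 1.0, 0.5, 1.0},
        {1.0, 0.5, 1.0, 0.01},
        {1.0, 1.0, 0.01, 1.0},
    };
    unsigned set[N]; /* relations contained in each tree */
    int m = N;       /* number of trees currently in Trees */
    for (int i = 0; i < N; ++i)
        set[i] = 1u << i;

    while (m > 1) {
        int bi = 0, bj = 1;
        double best = -1.0;
        for (int i = 0; i < m; ++i)           /* find the pair whose  */
            for (int j = i + 1; j < m; ++j) { /* join result is least */
                double f = 1.0;
                for (int a = 0; a < N; ++a)
                    for (int b = 0; b < N; ++b)
                        if ((set[i] >> a & 1) && (set[j] >> b & 1))
                            f *= sel[a][b];
                double sz = f * card[i] * card[j];
                if (best < 0.0 || sz < best) {
                    best = sz; bi = i; bj = j;
                }
            }
        set[bi] |= set[bj];    /* replace Ti by Ti B Tj ... */
        card[bi] = best;
        set[bj] = set[m - 1];  /* ... and remove Tj */
        card[bj] = card[m - 1];
        --m;
        printf("join -> relations %x, size %g\n", set[bi], card[bi]);
    }
    return 0;
}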
In general, the heuristics do not produce the optimal plan. EX The reader is advised to find examples where the heuristics are far off the best possible plan. EX

3.2.2 Determining the Optimal Join Order in Polynomial Time

Since the general problem of join ordering is NP-hard, we cannot expect to find a polynomial solution for it. However, for special cases, we can expect to find solutions that work in polynomial time. These solutions can also be used as heuristics for the general case, either to find a not-that-bad join tree or to determine an upper bound for the costs that is then fed into a search procedure to prune the search space.

The most general case for which a polynomial solution is known is characterized by the following features:

• the query graph must be acyclic,
• no cross products are considered,
• the search space is restricted to left-deep trees, and
• the cost function must have the ASI property.

The algorithm was presented by Ibaraki and Kameda [438]. Later, Krishnamurthy, Boral, and Zaniolo presented it again for some other cost functions (still having the ASI property) [520]. They also observed that the upper bound O(n² log n) of the original algorithm could be improved to O(n²). In any case, the algorithm is based on an algorithm discovered by Monma and Sidney for job scheduling [537, 627]. Let us call the (unimproved) algorithm the IKKBZ-Algorithm.

The IKKBZ-Algorithm considers only join operations that have a cost function of the form

  cost(Ri B Rj) = |Ri| ∗ hj(|Rj|),

where each Rj can have its own cost function hj. We denote the set of the hj by H and parameterize cost functions with it. Example instantiations are

• hj ≡ 1.2 for main-memory hash-based joins, and
• hj ≡ id for nested-loop joins,

where id is the identity function. Let us denote by ni the cardinality of the relation Ri (ni := |Ri|). Then, hi(ni) represents the costs per input tuple to be joined with Ri.

The algorithm works as follows. For every relation Rk, it computes the optimal join order under the assumption that Rk is the first relation in the join sequence. The resulting subproblems then resemble a job-scheduling problem that can be solved by the Monma-Sidney-Algorithm [627].

In order to present this algorithm, we need the notion of a precedence graph. A precedence graph is formed by taking a node in the (undirected) query graph and making this node the root node of a (directed) tree where the edges point away from the selected root node. Hence, for acyclic, connected query graphs (those we consider in this section), a precedence graph is a tree. We construct the precedence graph of a query graph G = (V, E) as follows:

• Make some relation Rk ∈ V the root node of the precedence graph.
• As long as not all relations are included in the precedence graph: choose a relation Ri ∈ V such that (Rj, Ri) ∈ E is an edge in the query graph, Rj is already contained in the (partial) precedence graph constructed so far, and Ri is not. Add Ri and the edge Rj → Ri to the precedence graph.

A sequence S = v1, . . . , vk of nodes conforms to a precedence graph G = (V, E) if the following conditions are satisfied:

1. for all i (2 ≤ i ≤ k) there exists a j (1 ≤ j < i) with (vj, vi) ∈ E, and
2. there is no edge (vi, vj) ∈ E for i > j.

For non-empty sequences U and V in a precedence graph, we write U → V if, according to the precedence graph, U must occur before V. This requires U and V to be disjoint.
More precisely, U → V means that there can only be paths from nodes in U to nodes in V, and that at least one such path exists.

Consider the following query graph:

  [Figure: a query graph on the relations R1, . . . , R6]

For this query graph, we can derive a precedence graph rooted at each of the relations:

  [Figure: the six precedence graphs derived from the query graph, one rooted at each Ri]

The IKKBZ-Algorithm takes a single precedence graph and produces a new one that is totally ordered. From this total order, it is very easy to construct a corresponding join tree:

  [Figure: a totally ordered precedence graph (left) as generated by the IKKBZ-Algorithm and the corresponding left-deep join tree (right)]

Define

  R_{1,2,...,k} := R1 B R2 B · · · B Rk
  n_{1,2,...,k} := |R_{1,2,...,k}|

For a given precedence graph, let Ri be a relation and let R_i be the set of relations from which there exists a path to Ri. Then, in any join tree adhering to the precedence graph, all relations in R_i, and only those, will be joined before Ri. Hence, we can define

  si = \prod_{Rj ∈ R_i} f_{i,j}

for i > 1. Note that for any i, only one j with f_{i,j} ≠ 1 exists in the product, since Ri has exactly one neighbor in the query graph among its predecessors. If the precedence graph is a chain, then the following holds:

  n_{1,2,...,k+1} = n_{1,2,...,k} ∗ s_{k+1} ∗ n_{k+1}

We define s1 = 1. Then we have n_{1,2} = s2 (n1 n2) = (s1 s2)(n1 n2) and, in general,

  n_{1,2,...,k} = \prod_{i=1}^{k} (si ni).

We call the si selectivities, although they depend on the precedence graph.

The costs for a totally ordered precedence graph G can thus be computed as follows:

  Cost_H(G) = \sum_{i=2}^{n} [ n_{1,2,...,i-1} ∗ hi(ni) ]
            = \sum_{i=2}^{n} [ ( \prod_{j=1}^{i-1} sj nj ) ∗ hi(ni) ]

If we define hi(ni) = si ni, then Cost_H ≡ Cout. The factor si ni determines by how much the input relation to be joined with Ri changes its cardinality after the join has been performed. If si ni is less than one, we call the join decreasing; if it is larger than one, we call the join increasing. This distinction plays an important role in the heuristic discussed in Section 3.2.3.

The cost function can also be defined recursively.

Definition 3.2.1 Define the cost function C_H as follows:

  C_H(ϵ) = 0
  C_H(Rj) = 0 if Rj is the root
  C_H(Rj) = hj(nj) else
  C_H(S1 S2) = C_H(S1) + T(S1) ∗ C_H(S2)

where

  T(ϵ) = 1
  T(S) = \prod_{Ri ∈ S} (si ni)

It is easy to prove by induction that C_H is well-defined and that C_H(G) = Cost_H(G). EX

Definition 3.2.2 Let A and B be two sequences and V and U two non-empty sequences. We say that a cost function C has the adjacent sequence interchange property (ASI property) if and only if there exist a function T and a rank function, defined for sequences S as

  rank(S) = (T(S) − 1) / C(S),

such that for non-empty sequences S = AUVB the following holds:

  C(AUVB) ≤ C(AVUB) ≺≻ rank(U) ≤ rank(V)    (3.1)

if AUVB and AVUB satisfy the precedence constraints imposed by a given precedence graph.

Lemma 3.2.3 The cost function C_H defined in Definition 3.2.1 has the ASI property.

The proof is very simple. Using the definition of C_H, we have

  C_H(AUVB) = C_H(A) + T(A) C_H(U) + T(A) T(U) C_H(V) + T(A) T(U) T(V) C_H(B)

and, hence,

  C_H(AUVB) − C_H(AVUB) = T(A) [C_H(V)(T(U) − 1) − C_H(U)(T(V) − 1)]
                        = T(A) C_H(U) C_H(V) [rank(U) − rank(V)]

The proposition follows. ∎
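Since T and C_H are defined recursively, ranks can be computed bottom-up. The following C sketch (ours) does this for Cout, i.e. hi(ni) = si ni; seq_concat implements the rule C_H(S1 S2) = C_H(S1) + T(S1) C_H(S2) and thus also yields the T, C, and rank values of the compound relations formed during normalization (see below). The si and ni values are made up for illustration:

#include <stdio.h>

/* A sketch (ours): bottom-up rank computation under C_out,
 * i.e. h_i(n_i) = s_i * n_i. A sequence is summarized by its
 * T and C values (Definition 3.2.1). */
typedef struct { double T, C; } Seq;

Seq seq_single(double s, double n) /* a single non-root relation */
{
    Seq r = { s * n, s * n }; /* T(R) = s*n, C(R) = h(n) = s*n */
    return r;
}

Seq seq_concat(Seq a, Seq b) /* C(S1 S2) = C(S1) + T(S1) * C(S2) */
{
    Seq r = { a.T * b.T, a.C + a.T * b.C };
    return r;
}

double rank(Seq s) { return (s.T - 1.0) / s.C; } /* rank = (T-1)/C */

int main(void)
{
    Seq r2 = seq_single(0.5, 100); /* s2 = 0.5, n2 = 100 */
    Seq r3 = seq_single(0.1, 200); /* s3 = 0.1, n3 = 200 */
    printf("rank(R2)   = %f\n", rank(r2));           /* 0.980 */
    printf("rank(R3)   = %f\n", rank(r3));           /* 0.950 */
    /* the compound relation R2,3 as formed by normalization: */
    printf("rank(R2R3) = %f\n", rank(seq_concat(r2, r3)));
    return 0;
}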
Definition 3.2.4 Let M = {A1, . . . , An} be a set of node sequences in a given precedence graph. Then, M is called a module if for all sequences B that do not overlap with the sequences in M, one of the following conditions holds:

• B → Ai ∀ 1 ≤ i ≤ n,
• Ai → B ∀ 1 ≤ i ≤ n, or
• B ̸→ Ai and Ai ̸→ B ∀ 1 ≤ i ≤ n.

Lemma 3.2.5 Let C be any cost function with the ASI property and {A, B} a module. If A → B and additionally rank(B) ≤ rank(A), then we find an optimal sequence among those in which B directly follows A.

Proof Every optimal permutation must have the form (U, A, V, B, W), since A → B. Assume V ≠ ϵ. If rank(V) ≤ rank(A), then we can exchange V and A without increasing the costs. If rank(A) ≤ rank(V), we have rank(B) ≤ rank(V) due to the transitivity of ≤. Hence, we can exchange B and V without increasing the costs. Both exchanges produce legal sequences obeying the precedence graph, since {A, B} is a module. ∎

If the precedence graph demands A → B but rank(B) ≤ rank(A), we speak of contradictory sequences A and B. Since the lemma shows that no non-empty subsequence can occur between A and B, we combine A and B into a new single node replacing A and B. This node represents a compound relation comprising all relations in A and B. Its cardinality is computed by multiplying the cardinalities of all relations occurring in A and B, and its selectivity s is the product of all the selectivities si of the relations Ri contained in A and B. The continued application of this step until no contradictory sequence remains is called normalization. The opposite step, replacing a compound node by the sequence of relations it was derived from, is called denormalization.

We can now present the algorithm IKKBZ.

IKKBZ(G)
Input: an acyclic query graph G for relations R1, . . . , Rn
Output: the best left-deep tree
R = ∅;
for (i = 1; i ≤ n; ++i) {
  Let Gi be the precedence graph derived from G and rooted at Ri;
  T = IKKBZ-Sub(Gi);
  R += T;
}
return best of R;

IKKBZ-Sub(Gi)
Input: a precedence graph Gi for relations R1, . . . , Rn rooted at some Ri
Output: the optimal left-deep tree under Gi
while (Gi is not a chain) {
  let r be the root of a subtree in Gi whose subtrees are chains;
  IKKBZ-Normalize(r);
  merge the chains under r according to the rank function in ascending order;
}
IKKBZ-Denormalize(Gi);
return Gi;

IKKBZ-Normalize(r)
Input: the root r of a subtree T of a precedence graph G = (V, E)
Output: a normalized subchain
while (∃ r′, c ∈ V, r →∗ r′, (r′, c) ∈ E: rank(r′) > rank(c)) {
  replace r′ and c by a compound relation r″ that represents r′c;
}

We do not give the details of IKKBZ-Denormalize, as it is trivial.

  [Figure 3.3: Illustrations for the IKKBZ Algorithm (parts A–H; relations annotated with their cardinalities, edges with selectivities, nodes with ranks)]

Let us illustrate the algorithm IKKBZ-Sub by a simple example, using the cost function Cout. Figure 3.3 A) shows a query graph. The relations are annotated with their sizes and the edges with the join selectivities. Choosing R1 as the root of the precedence graph results in B). There, the nodes are annotated with the ranks of the relations. R4 is the root of a subtree all of whose subtrees are chains. Hence, we normalize it. For R5, there is nothing to do. The ranks of R6 and R7 are contradictory. We form a compound relation R6,7 and calculate its cardinality, selectivity, and rank.
The latter is shown in C). Merging the two subchains under R4 results in D). Now R1 is the root of a subtree with only chains underneath. Normalization detects that the ranks of R4 and R5 are contradictory. E) shows the tree after introducing the compound relation R4,5. Now R4,5 and R6,7 have contradictory ranks, and we replace them by the compound relation R4,5,6,7, as shown in F). Merging the chains under R1 gives G). Since this is a chain, we leave the loop and denormalize. The final result is shown in H).

  [Figure 3.4: A query graph, its directed join graph, some spanning trees, and join trees (parts I–V)]

We can use the IKKBZ-Algorithm to derive a heuristic for cyclic queries as well, i.e. for general query graphs: in a first step, we determine a minimal spanning tree of the query graph; it is then used as the input query graph for the IKKBZ-Algorithm. Let us call this the IKKBZ-based heuristics.

3.2.3 The Maximum-Value-Precedence Algorithm

Lee, Shih, and Chen proposed a very interesting heuristic for the join ordering problem [539]. They use a weighted directed join graph (WDJG) to represent queries. Within this graph, every join tree corresponds to a spanning tree.

Consider a conjunctive query with join predicates P. For a join predicate p ∈ P, we denote by R(p) the relations whose attributes are mentioned in p.

Definition 3.2.6 The directed join graph of a conjunctive query with join predicates P is a triple G = (V, Ep, Ev), where the nodes V are the join predicates and Ep and Ev are sets of directed edges defined as follows. For any two nodes u, v ∈ V, if R(u) ∩ R(v) ≠ ∅, then (u, v) ∈ Ep and (v, u) ∈ Ep. If R(u) ∩ R(v) = ∅, then (u, v) ∈ Ev and (v, u) ∈ Ev. The edges in Ep are called physical edges, those in Ev virtual edges.

Note that in G, for every two nodes u, v, there is an edge (u, v) that is either physical or virtual. Hence, G is a clique.

Let us see how we can derive a join tree from a spanning tree of a directed join graph. Figure 3.4 I) gives a simple query graph Q corresponding to a chain, and Part II) presents Q's directed join graph. Physical edges are drawn as solid arrows, virtual edges as dotted arrows. Let us first consider the spanning tree shown in Part III a). It says that we first execute R1 Bp1,2 R2. The next join predicate to evaluate is p2,3. Obviously, it does not make much sense to execute R2 Bp2,3 R3, since R1 and R2 have already been joined. Hence, we replace R2 in the second join by the result of the first join. This results in the join tree (R1 Bp1,2 R2) Bp2,3 R3. For the same reason, we proceed by joining this result with R4. The final join tree is shown in Part III b).

Part IV a) shows another spanning tree. The two joins R1 Bp1,2 R2 and R3 Bp3,4 R4 can be executed independently and do not influence each other. Next, we have to consider p2,3. Both R2 and R3 have already been joined. Hence, the last join processes both intermediate results. The final join tree is shown in Part IV b). The spanning tree shown in Part V a) results in the same join tree shown in Part V b). Hence, two different spanning trees can result in the same join tree. However, the spanning tree in Part IV a) is more specific in that it demands R1 Bp1,2 R2 to be executed before R3 Bp3,4 R4.

Next, take a look at Figure 3.5.
Parts I), II), and III a) of Figure 3.5 show a query graph, its directed join graph, and a spanning tree. To build a join tree from this spanning tree, we proceed as follows. We have to execute R2 Bp2,3 R3 and R3 Bp3,4 R4 first. In which way we do so is not really fixed by the spanning tree, so let us do both in parallel. Next is p1,2. The only dependency the spanning tree gives us is that it should be executed after p3,4. Since there is no common relation between those two, we perform R1 Bp1,2 R2. Last is p4,5. Since we find p3,4 below it, we use the intermediate result produced by p3,4 as a replacement for R4. The result is shown in Part III b). It has three loose ends; additional joins are required to tie the partial results together. Obviously, this is not what we want.

A spanning tree that avoids this problem of additional joins is called effective. It can be shown that a spanning tree T = (V, E) is effective if it satisfies the following conditions [539]:

1. T is a binary tree,
2. for all inner nodes v and nodes u with (u, v) ∈ E, it holds that R∗(T(u)) ∩ R(v) ≠ ∅, and
3. for all nodes v, u1, u2 with u1 ≠ u2, (u1, v) ∈ E, and (u2, v) ∈ E, one of the following two conditions holds:
   (a) ((R∗(T(u1)) ∩ R(v)) ∩ (R∗(T(u2)) ∩ R(v))) = ∅, or
   (b) (R∗(T(u1)) ∩ R(v) = R(v)) ∨ (R∗(T(u2)) ∩ R(v) = R(v)).

Here, we denote by T(v) the partial tree rooted at v and by R∗(T′) = ∪_{v∈T′} R(v) the set of all relations in subtree T′.

  [Figure 3.5: A query graph, its directed join graph, a spanning tree, and its problem]

We see that the spanning tree in Figure 3.5 III a) is ineffective since, for example, R(p2,3) ∩ R(p4,5) = ∅. The spanning tree in Figure 3.4 IV a) is also ineffective. During the algorithm, we will take care, by checking the above conditions, that only effective spanning trees are generated.

We now assign weights to the edges of the directed join graph. For two nodes u, v ∈ V, define u ⊓ v := R(u) ∩ R(v). For simplicity, we assume that every predicate involves exactly two relations. Then for all u, v ∈ V, u ⊓ v contains a single relation. Let v ∈ V be a node with R(v) = {Ri, Rj}. We abbreviate Ri Bv Rj by Bv. Using these notations, we can attach weights to the edges to define the weighted directed join graph.

Definition 3.2.7 Let G = (V, Ep, Ev) be a directed join graph for a conjunctive query with join predicates P. The weighted directed join graph is derived from G by attaching a weight to each edge as follows:

• Let (u, v) ∈ Ep be a physical edge. The weight w_{u,v} of (u, v) is defined as

  w_{u,v} = |Bu| / |u ⊓ v|.

• For virtual edges (u, v) ∈ Ev, we define w_{u,v} = 1.

(Lee, Shih, and Chen actually attach two weights to each edge: one additional weight for the size of the tuples (in bytes) [539].)

The weights of physical edges are equal to the si of the precedence graph used in the IKKBZ-Algorithm (Section 3.2.2). To see this, assume R(u) = {R1, R2} and R(v) = {R2, R3}. Then

  w_{u,v} = |Bu| / |u ⊓ v|
          = |R1 Bu R2| / |R2|
          = f1,2 |R1| |R2| / |R2|
          = f1,2 |R1|

Hence, if the join R1 Bu R2 is executed before the join R2 Bv R3, the input size to the latter join changes by a factor of w_{u,v}. This way, the influence of a join on another join is captured by the weights. Since nodes connected by a virtual edge do not influence each other, a weight of 1 is appropriate.
Additionally, we assign weights to the nodes of the directed join graph. The weight of a node reflects the change in cardinality to be expected when certain other joins have been executed before. These other joins are specified by a (partial) spanning tree S. Given S, we denote by B^S_{pi,j} the result of the join Bpi,j if all joins preceding pi,j in S have been executed. Then the weight attached to node pi,j is defined as

  w(pi,j, S) = |B^S_{pi,j}| / |Ri Bpi,j Rj|.

For empty sequences ϵ, we define w(pi,j, ϵ) = |Ri Bpi,j Rj|. Similarly, we define the cost of a node pi,j depending on the joins preceding it in some given spanning tree S. We denote these costs by cost(pi,j, S). The actual cost function can be one of those we have introduced so far or any other one. In fact, if we have a choice of several join implementations, we can take the minimum over all their cost functions. This then chooses the most effective join implementation.

The maximum value precedence algorithm works in two phases. In the first phase, it searches for edges with a weight smaller than one. Among these, the one with the biggest impact is chosen and added to the spanning tree. In other words, in this phase, the costs of expensive joins are minimized by making sure that (size-)decreasing joins are executed first. The second phase adds edges such that the intermediate result sizes increase as little as possible.

MVP(G)
Input: a weighted directed join graph G = (V, Ep, Ev)
Output: an effective spanning tree
Q1.insert(V); /* priority queue with largest node weights w(·) first */
Q2 = ∅; /* priority queue with smallest node weights w(·) first */
G′ = (V′, E′) with V′ = V and E′ = Ep; /* working graph */
S = (VS, ES) with VS = V and ES = ∅; /* resulting effective spanning tree */
while (!Q1.empty() && |ES| < |V| − 1) { /* Phase I */
  v = Q1.head();
  among all (u, v) ∈ E′ with wu,v < 1, such that
    S′ = (V, E′S) with E′S = ES ∪ {(u, v)} is acyclic and effective,
    select one that maximizes cost(Bv, S) − cost(Bv, S′);
  if (no such edge exists) {
    Q1.remove(v);
    Q2.insert(v);
    continue;
  }
  MvpUpdate((u, v));
  recompute w(·) for v and its ancestors; /* rearranges Q1 */
}
while (!Q2.empty() && |ES| < |V| − 1) { /* Phase II */
  v = Q2.head();
  among all (u, v), (v, u) ∈ E′, denoted by (x, y) henceforth, such that
    S′ = (V, E′S) with E′S = ES ∪ {(x, y)} is acyclic and effective,
    select the one that minimizes cost(Bv, S′) − cost(Bv, S);
  MvpUpdate((x, y));
  recompute w(·) for y and its ancestors; /* rearranges Q2 */
}
return S;

MvpUpdate((u, v))
Input: an edge to be added to S
Output: side-effects on S and G′
ES ∪= {(u, v)};
E′ \= {(u, v), (v, u)};
E′ \= {(u, w) | (u, w) ∈ E′}; /* (1) */
E′ ∪= {(v, w) | (u, w) ∈ Ep, (v, w) ∈ Ev}; /* (3) */
if (v has two inflowing edges in S) { /* (2) */
  E′ \= {(w, v) | (w, v) ∈ E′};
}
if (v has one outflowing edge in S) { /* (1) in paper, but not needed */
  E′ \= {(v, w) | (v, w) ∈ E′};
}

Note that in order to test for the effectiveness of a spanning tree in the algorithm, we just have to check the conditions for the node the selected edge leads to. MvpUpdate first adds the selected edge to the spanning tree. It then eliminates edges that no longer need to be considered for building an effective spanning tree. Since (u, v) has been added, neither (u, v) nor (v, u) has to be considered any longer. Also, since effective spanning trees are binary trees, (1) every node must have only one parent node and (2) at most two child nodes.
The edges leading to a violation of these conditions are eliminated by MvpUpdate in the lines commented with the corresponding numbers. The line commented with (3) handles the following situation: we have physical edges u → v and u → w, and a virtual edge from v to w in G. This means that u and w have common relations, but v and w do not. Hence, the result of performing v on the result of u will have a common relation with w. Thus, we add a (physical) edge v → w.

3.2.4 Dynamic Programming

Consider the two join trees

  (((R1 B R2) B R3) B R4) B R5   and   (((R3 B R1) B R2) B R4) B R5.

If we know that ((R1 B R2) B R3) is cheaper than ((R3 B R1) B R2), we know that the first join tree is cheaper than the second. Hence, we can avoid generating the second alternative and still do not miss the optimal join tree. The general principle behind this is the optimality principle (see [208]). For the join ordering problem, it can be stated as follows.¹

  Let T be an optimal join tree for relations R1, . . . , Rn. Then, every subtree S of T must be an optimal join tree for the relations it contains.

To see why this holds, assume that the optimal join tree T for relations R1, . . . , Rn contains a subtree S which is not optimal. That is, there exists another join tree S′ for the relations contained in S with strictly lower costs. Denote by T′ the join tree derived by replacing S in T by S′. Since S′ contains the same relations as S, T′ is a join tree for the relations R1, . . . , Rn. The costs of the join operators in T and T′ that are not contained in S and S′ are the same. Then, since the total cost of a join tree is the sum of the costs of its join operators and S′ has lower costs than S, T′ has lower costs than T. This contradicts the optimality of T.

¹ The optimality principle does not hold in the presence of properties.

The idea of dynamic programming applied to the generation of optimal join trees is to generate optimal join trees for subsets of {R1, . . . , Rn} in a bottom-up fashion. First, optimal join trees for subsets of size one, i.e. single relations, are generated. From these, optimal join trees of size two, three, and so on, up to n, are generated.

Let us first consider generating optimal left-deep trees. There, join trees for subsets of size k are generated from subsets of size k − 1 by adding a new join operator whose left argument is a join tree for k − 1 relations and whose right argument is a single relation. Exchanging left and right gives us the procedure for generating right-deep trees. If we want to generate zig-zag trees, since our cost function may be asymmetric, we have to consider both alternatives and take the cheaper one. We capture this in a procedure CreateJoinTree that takes two join trees as arguments and generates the above-mentioned alternatives. In case we want to consider different implementations for the join, we have to perform the above steps for all of them and return the cheapest alternative.
Summarizing, the pseudocode for CreateJoinTree looks as follows:

CreateJoinTree(T1, T2)
Input: two (optimal) join trees T1 and T2; for linear trees, we assume that T2 is a single relation
Output: an (optimal) join tree for joining T1 and T2
BestTree = NULL;
for all implementations impl do {
  if (!RightDeepOnly) {
    Tree = T1 Bimpl T2;
    if (BestTree == NULL || cost(BestTree) > cost(Tree)) {
      BestTree = Tree;
    }
  }
  if (!LeftDeepOnly) {
    Tree = T2 Bimpl T1;
    if (BestTree == NULL || cost(BestTree) > cost(Tree)) {
      BestTree = Tree;
    }
  }
}
return BestTree;

The boolean variables RightDeepOnly and LeftDeepOnly are used to restrict the search space to right-deep trees and left-deep trees, respectively. If both are false, zig-zag trees are generated. However, CreateJoinTree also generates bushy trees if neither of the input trees is a single relation. In the case of linear trees, T2 will be the single relation in all of our algorithms.

CreateJoinTree should not copy T1 or T2. Instead, the newly generated join trees should share T1 and T2 by using pointers. Further, the join trees do not really need to be generated, except for the final (best) join tree: the cost functions should be implemented such that they can be evaluated given only the left and right argument of the join.
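As an illustration of this last remark, the following C sketch (ours, with a deliberately simplified cost model) reduces CreateJoinTree to pure cost bookkeeping for a single asymmetric join implementation, a hash join with cost Chj(T1 B T2) = 1.2 |T1|; it evaluates both commuted variants and keeps the cheaper one without materializing any join trees:

#include <stdio.h>

/* A sketch (ours): CreateJoinTree as cost bookkeeping only, for one
 * asymmetric implementation: a hash join building on its left input,
 * C_hj(T1 B T2) = 1.2 * |T1|. */
typedef struct {
    double cost;     /* total cost of the plan */
    double card;     /* cardinality of its result */
    int leftIsBuild; /* which commuted variant won */
} Plan;

Plan create_join_tree(Plan t1, Plan t2, double cardOut)
{
    double costLR = t1.cost + t2.cost + 1.2 * t1.card; /* T1 B T2 */
    double costRL = t1.cost + t2.cost + 1.2 * t2.card; /* T2 B T1 */
    Plan p;
    p.card = cardOut;
    p.leftIsBuild = (costLR <= costRL);
    p.cost = p.leftIsBuild ? costLR : costRL;
    return p;
}

int main(void)
{
    Plan t1 = { 2.4, 2.0, 1 };  /* an intermediate result of size 2 */
    Plan t2 = { 0.0, 20.0, 1 }; /* a base relation of size 20 */
    Plan p = create_join_tree(t1, t2, 20.0);
    printf("cost %.1f, build on %s input\n",
           p.cost, p.leftIsBuild ? "left" : "right");
    return 0;
}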
Using CreateJoinTree, we are now ready to present our first dynamic programming algorithm in pseudocode.

DP-Linear-1({R1, . . . , Rn})
Input: a set of relations to be joined
Output: an optimal left-deep (right-deep, zig-zag) join tree
for (i = 1; i <= n; ++i) {
  BestTree({Ri}) = Ri;
}
for (i = 1; i < n; ++i) {
  for all S ⊆ {R1, . . . , Rn}, |S| = i do {
    for all Rj ∈ {R1, . . . , Rn}, Rj ∉ S do {
      if (NoCrossProducts && !connected({Rj}, S)) {
        continue;
      }
      CurrTree = CreateJoinTree(BestTree(S), Rj);
      S′ = S ∪ {Rj};
      if (BestTree(S′) == NULL || cost(BestTree(S′)) > cost(CurrTree)) {
        BestTree(S′) = CurrTree;
      }
    }
  }
}
return BestTree({R1, . . . , Rn});

NoCrossProducts is a boolean variable indicating whether cross products should be investigated. Of course, if the join graph is not connected, there must be a cross product; but for DP-Linear-1 and the subsequent algorithms, we assume that it is connected. The boolean function connected returns true if there is a join predicate between one of the relations in its first argument and one of the relations in its second.

The variable BestTree keeps track of the best join trees generated for every subset of the relations {R1, . . . , Rn}. How this is done may depend on several parameters. The alternatives are to use a hash table or an array of size 2^n (−1). Another issue is how to represent the sets of relations. Typically, bitvector representations are used. Then, testing for membership, computing differences and a set's complement, adding elements, and unioning are cheap. Yet another issue is the order in which join trees are generated. The procedure DP-Linear-1 takes the approach of generating the join trees for subsets of size 1, 2, . . . , n. To do so, it must be able to access the subsets of {R1, . . . , Rn}, or their respective join trees, by their size. One possibility is to chain all the join trees for subsets of a given size k (1 ≤ k ≤ n) and to use an array of size n to keep pointers to the starts of these lists. In this case, the set of relations a join tree contains is attached to the join tree, in order to be able to perform the test Rj ∉ S. One way to do this is to embed a bitvector into each join tree node.

  [Figure 3.6: Search space with sharing under optimality principle (the subset lattice over {R1, R2, R3, R4})]

Figure 3.6 illustrates how the procedure DP-Linear-1 works. In its first loop, it initializes the bottom row of join trees of size one. Then it computes the join trees joining exactly two relations. This is indicated by the next group of join trees. Since the figure leaves out commutativity, only one alternative join tree for every subset of size two is shown. This changes for subsets of size three. There, three alternative join trees are generated. Only the best join tree is retained. This is indicated by the ovals that encircle three join trees. Only this best join tree of size three is used to generate the final best join tree.

The short clarification after the algorithm already adumbrated that the order in which join trees are generated is not compulsory. The only necessary condition is the following: Let S be a subset of {R1, . . . , Rn}. Then, before a join tree for S can be generated, the join trees for all relevant subsets of S must already be available. EX Note that this formulation is general enough to also capture the generation of bushy trees. It is, however, a little vague due to its reference to "relevance". For the different join tree classes, this term can be given a precise semantics.

Let us take a look at an alternative order of join tree generation. Assume that sets of relations are represented as bitvectors. A bitvector is nothing more than a base-two integer. Successive increments of an integer/bitvector lead to different subsets, and the above condition is satisfied. We illustrate this by a small example. Assume that we have three relations R1, R2, R3. The i-th bit from the right in a three-bit integer indicates the presence of Ri, for 1 ≤ i ≤ 3:

  000  {}
  001  {R1}
  010  {R2}
  011  {R1, R2}
  100  {R3}
  101  {R1, R3}
  110  {R2, R3}
  111  {R1, R2, R3}

This observation leads to another formulation of our dynamic programming algorithm. For this algorithm, it is very convenient to use an array of size 2^n to represent BestTree(S) for subsets S of {R1, . . . , Rn}.

DP-Linear-2({R1, . . . , Rn})
Input: a set of relations to be joined
Output: an optimal left-deep (right-deep, zig-zag) join tree
for (i = 1; i <= n; ++i) {
  BestTree(1 << (i − 1)) = Ri;
}
for (S = 1; S < 2^n; ++S) {
  if (BestTree(S) != NULL) continue;
  for all i ∈ S do {
    S′ = S \ {i};
    CurrTree = CreateJoinTree(BestTree(S′), Ri);
    if (BestTree(S) == NULL || cost(BestTree(S)) > cost(CurrTree)) {
      BestTree(S) = CurrTree;
    }
  }
}
return BestTree(2^n − 1);

DP-Linear-2 differs from DP-Linear-1 not only in the order in which join trees are generated. Another difference is that it takes cross products into account.

From DP-Linear-2, it is easy to derive an algorithm that explores the space of bushy trees.

DP-Bushy({R1, . . . , Rn})
Input: a set of relations to be joined
Output: an optimal bushy join tree
for (i = 1; i <= n; ++i) {
  BestTree(1 << (i − 1)) = Ri;
}
for (S = 1; S < 2^n; ++S) {
  if (BestTree(S) != NULL) continue;
  for all S1 ⊂ S, S1 ≠ ∅ do {
    S2 = S \ S1;
    CurrTree = CreateJoinTree(BestTree(S1), BestTree(S2));
    if (BestTree(S) == NULL || cost(BestTree(S)) > cost(CurrTree)) {
      BestTree(S) = CurrTree;
    }
  }
}
return BestTree(2^n − 1);

This algorithm also takes cross products into account. The critical part is the generation of all subsets of S. Fortunately, Vance and Maier [898] provide a code fragment with which subset bitvector representations can be generated very efficiently.
In C, this fragment looks as follows:

S1 = S & -S;
do {
  /* do something with subset S1 */
  S1 = S & (S1 - S);
} while (S1 != S);

S represents the input set. S1 iterates through all non-empty proper subsets of S; S itself and the empty set are not considered. Analogously, all supersets can be generated as follows:

S1 = ~S & -~S;
/* do something with first superset S1 */
while (S1) {
  S1 = ~S & (S1 - ~S);
  /* do something with superset S1 */
}

S represents the input set. S1 iterates through all supersets of S, including S itself.

Excursion (ToDo): exploiting orderings devastates the optimality principle. Example: . . .
Excursion (ToDo): pruning. . . .

Number of Entries to be Stored in the Dynamic Programming Table

If dynamic programming uses a static hash table, determining its size in advance is necessary, as the search space sizes differ vastly for different query graphs. In general, an entry must exist for every connected subgraph of the query graph. Chains require far fewer entries than cliques. It would be helpful to have a small routine solving the following problem: given a query graph, how many connected subgraphs are there? Unfortunately, this problem is #P-hard, as Sutner, Satyanarayana, and Suffel showed [856]. They build on results by Valiant [896] and Lichtenstein [555]. (For a definition of #P-hard, see the book by Lewis and Papadimitriou [553] or the original paper by Valiant [895].)

However, for specific cases, these numbers can be given. If cross products are considered, the number of join trees stored in the dynamic programming table is 2^n − 1: one for each non-empty subset of relations. If we do not consider cross products, the number of entries in the dynamic programming table corresponds to the number of connected subgraphs of the query graph. For connected query graphs, we denote this number by #csg. For chains, cycles, stars, and cliques with n nodes, we have

  #csg_chain(n)  = n(n + 1)/2          (3.2)
  #csg_cycle(n)  = n² − n + 1          (3.3)
  #csg_star(n)   = 2^{n−1} + n − 1     (3.4)
  #csg_clique(n) = 2^n − 1             (3.5)

These equations can be derived by summing, over all k ≥ 1, the following counts, where k gives the size of the connected subgraph:

  #csg_chain(n, k)  = n − k + 1
  #csg_cycle(n, k)  = 1 if n = k, and n otherwise
  #csg_star(n, k)   = n if k = 1, and \binom{n-1}{k-1} if k > 1
  #csg_clique(n, k) = \binom{n}{k}
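Although counting connected subgraphs is #P-hard in general, for the small n arising here one can simply enumerate. The following C sketch (ours) counts #csg for a chain of five relations by testing every non-empty subset for connectedness with a bitvector flood fill, reproducing equation (3.2):

#include <stdio.h>

/* A sketch (ours): #csg of a small query graph by subset enumeration.
 * adj[v] is the bitvector of neighbors of relation v. */
enum { N = 5 };

static unsigned adj[N];

static int connected(unsigned S)
{
    unsigned reach = S & (~S + 1); /* lowest bit as seed */
    for (unsigned prev = 0; prev != reach; ) {
        prev = reach;
        for (int v = 0; v < N; ++v)
            if (reach >> v & 1)
                reach |= adj[v] & S;
    }
    return reach == S;
}

int main(void)
{
    for (int v = 0; v + 1 < N; ++v) { /* chain R1 - ... - R5 */
        adj[v] |= 1u << (v + 1);
        adj[v + 1] |= 1u << v;
    }
    int csg = 0;
    for (unsigned S = 1; S < (1u << N); ++S)
        if (connected(S))
            ++csg;
    printf("#csg = %d\n", csg); /* chain, n = 5: n(n+1)/2 = 15 */
    return 0;
}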
Number of Join Trees Investigated

The number of join trees investigated by dynamic programming was extensively studied by Ono and Lohman [652, 653]. In order to estimate these numbers, we assume that CreateJoinTree produces a single join tree and hence counts as one, although it may evaluate the costs for several join alternatives. We further do not count the initial join trees containing only a single relation.

Join Trees with Cartesian Product

For the analysis of dynamic programming variants that do consider cross products, the notion of a join-pair is helpful. Let S1 and S2 be subsets of the nodes (relations) of the query graph. We say (S1, S2) is a join-pair if and only if S1 and S2 are disjoint. If (S1, S2) is a join-pair, then so is (S2, S1). Further, if T1 is a join tree for the relations in S1 and T2 is one for those in S2, then we can construct two valid join trees T1 B T2 and T2 B T1, where the joins may be cross products. Hence, the number of join-pairs coincides with the search space a dynamic programming algorithm explores. In fact, the number of join-pairs is the minimum number of join trees any dynamic programming algorithm that considers cross products has to investigate.

If CreateJoinTree considers commutativity of joins, the number of calls to it is precisely expressed by the count of non-symmetric join-pairs. In other implementations, CreateJoinTree might be called for all join-pairs and, thus, may not consider commutativity. The two formulas below only count non-symmetric join-pairs.

The numbers of linear and bushy join trees with Cartesian product are easiest to determine, as they are independent of the query graph. For linear join trees, the number of join trees investigated by dynamic programming is equal to the number of non-symmetric join-pairs, which is

  n 2^{n-1} − n(n + 1)/2.

If cross products are considered, dynamic programming investigates the following number of bushy trees, which again equals the number of non-symmetric join-pairs:

  (3^n − 2^{n+1} + 1)/2.

Join Trees without Cross Products

In this paragraph, we assume that the query graph is connected. For the analysis of dynamic programming variants that do not consider cross products, it is helpful to have the notion of a csg-cmp-pair. Let S1 and S2 be subsets of the nodes (relations) of the query graph. We say (S1, S2) is a csg-cmp-pair if and only if

1. S1 induces a connected subgraph of the query graph,
2. S2 induces a connected subgraph of the query graph,
3. S1 and S2 are disjoint, and
4. there exists at least one edge connecting a node in S1 to a node in S2.

If (S1, S2) is a csg-cmp-pair, then so is (S2, S1). Further, if T1 is a join tree for the relations in S1 and T2 is one for those in S2, then we can construct two valid join trees T1 B T2 and T2 B T1. Hence, the number of csg-cmp-pairs coincides with the search space a dynamic programming algorithm explores. In fact, the number of csg-cmp-pairs is the minimum number of join trees any dynamic programming algorithm that does not consider cross products has to investigate.

If CreateJoinTree considers commutativity of joins, the number of calls to it is precisely expressed by the count of non-symmetric csg-cmp-pairs. In other implementations, CreateJoinTree might be called for all csg-cmp-pairs and, thus, may not consider commutativity. Let us denote the number of non-symmetric csg-cmp-pairs by #ccp. Then

  #ccp_chain(n)  = (n³ − n)/6
  #ccp_cycle(n)  = (n³ − 2n² + n)/2
  #ccp_star(n)   = (n − 1) 2^{n−2}
  #ccp_clique(n) = (3^n − 2^{n+1} + 1)/2

These numbers have to be multiplied by two if we want to count all csg-cmp-pairs.

If we do not consider composite inners, that is, if we restrict ourselves to left-deep join trees, then dynamic programming makes the following number of calls to CreateJoinTree for chain queries [653]: (n − 1)².

The following table presents some results for the above formulas.

       without cross products                              with cross products (any query graph)
       chain                       star
  n    linear      bushy           linear           linear                    bushy
       (n−1)²      (n³−n)/6        (n−1) 2^{n−2}    n 2^{n−1} − n(n+1)/2      (3^n − 2^{n+1} + 1)/2
  2    1           1               1                1                         1
  3    4           4               4                6                         6
  4    9           10              12               22                        25
  5    16          20              32               65                        90
  6    25          35              80               171                       301
  7    36          56              192              420                       966
  8    49          84              448              988                       3025
  9    64          120             1024             2259                      9330
  10   81          165             2304             5065                      28501

Compare this table with the actual sizes of the search spaces in Section 3.1.5. The dynamic programming algorithms can be implemented very efficiently and often form the core of commercial plan generators.
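The #ccp formulas can be cross-checked by brute force in the same spirit as the #csg computation above. The following C sketch (ours) enumerates all ordered csg-cmp-pairs of a five-relation chain; halving the result yields #ccp_chain(5) = 20:

#include <stdio.h>

/* A sketch (ours): brute-force count of csg-cmp-pairs; the ordered
 * count halved gives the non-symmetric #ccp. */
enum { N = 5 };

static unsigned adj[N];

static int connected(unsigned S)
{
    unsigned reach = S & (~S + 1);
    for (unsigned prev = 0; prev != reach; ) {
        prev = reach;
        for (int v = 0; v < N; ++v)
            if (reach >> v & 1)
                reach |= adj[v] & S;
    }
    return reach == S;
}

static int linked(unsigned S1, unsigned S2) /* edge between S1 and S2? */
{
    for (int v = 0; v < N; ++v)
        if ((S1 >> v & 1) && (adj[v] & S2))
            return 1;
    return 0;
}

int main(void)
{
    for (int v = 0; v + 1 < N; ++v) { /* chain R1 - ... - R5 */
        adj[v] |= 1u << (v + 1);
        adj[v + 1] |= 1u << v;
    }
    long ccp = 0;
    for (unsigned S1 = 1; S1 < (1u << N); ++S1)
        for (unsigned S2 = 1; S2 < (1u << N); ++S2)
            if (!(S1 & S2) && connected(S1) && connected(S2)
                && linked(S1, S2))
                ++ccp;
    printf("ordered pairs: %ld, #ccp = %ld\n", ccp, ccp / 2); /* 40, 20 */
    return 0;
}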
However, the dynamic programming algorithms have the disadvantage that no plan is generated if they run out of time or space, since the search space they have to explore is too big. One possible remedy goes as follows. Assume that a dynamic programming algorithm is stopped in the middle of its way through its actual search space. Further assume that the largest plans generated so far involve k relations. Then the cheapest of the plans with k relations is completed by applying some heuristics (e.g. MinSel). The completed plan is then returned. In Section 3.4.5, we will see two alternative solutions. Another solution is presented in [488].

Generating Bushy Trees without Cross Products

We now discuss dynamic programming algorithms to generate bushy trees without cross products. For this section, we assume that the query graph is connected. We will present three algorithms. The first algorithm (DPsize) generates its plans in increasing size of subplans and, hence, is a generalization of DP-Linear-1. The second algorithm (DPsub) generates its plans by considering subsets of relations, as does DP-Linear-2. An analysis of these two algorithms reveals that both are far away from the lower bound presented in the previous sections. Thus, a third algorithm (DPccp), which reaches this lower bound, is presented. The results of this section are taken from [618, 616].

Size-based enumeration: DPsize

In general, dynamic programming generates solutions for a larger problem in a bottom-up fashion by combining solutions for smaller problems. Taking this description literally, we can construct optimal plans of size s by joining plans p1 and p2 of sizes s1 and s2 with s1 + s2 = s. We just have to take care that (1) the sets of relations contained in p1 and p2 do not overlap, and (2) there is a join predicate connecting a relation in p1 with a relation in p2. After this remark, we are prepared to understand the pseudocode for algorithm DPsize (see Fig. 3.7).

DPsize
Input: a connected query graph with relations R = {R0, . . . , Rn−1}
Output: an optimal bushy join tree without cross products
for all Ri ∈ R {
  BestPlan({Ri}) = Ri;
}
for all 1 < s ≤ n ascending { // size of plan
  for all 1 ≤ s1 ≤ s/2 { // size of left/right subplan
    s2 = s − s1; // size of right/left subplan
    for all S1 ⊂ R in BestPlan with |S1| = s1,
            S2 ⊂ R in BestPlan with |S2| = s2 {
      ++InnerCounter;
      if (S1 ∩ S2 ≠ ∅) continue;
      if not (S1 connected to S2) continue;
      ++CsgCmpPairCounter;
      p1 = BestPlan(S1);
      p2 = BestPlan(S2);
      CurrPlan = CreateJoinTree(p1, p2);
      if (cost(BestPlan(S1 ∪ S2)) > cost(CurrPlan)) {
        BestPlan(S1 ∪ S2) = CurrPlan;
      }
    }
  }
}
OnoLohmanCounter = CsgCmpPairCounter / 2;
return BestPlan({R0, . . . , Rn−1});

Figure 3.7: Algorithm DPsize

A table BestPlan associates with each set of relations the best plan found so far. The algorithm starts by initializing this table with plans of size one, i.e. single relations. After that, it constructs plans of increasing size (loop over s). Thereby, the first size considered is two, since plans of size one have already been constructed. Every plan joining s relations can be constructed by joining a plan containing s1 relations with a plan containing s2 relations, where s1, s2 > 0 and s1 + s2 = s must hold. Thus, the pseudocode loops over s1 and sets s2 accordingly. Since for every possible size there exist many plans, two more loops are necessary in order to loop over the plans of sizes s1 and s2.
The algorithm DPsize can be made more efficient in case s1 = s2. The algorithm as stated cycles through all plans p1 joining s1 relations. For each such plan, all plans p2 of size s2 are tested. Assume that plans of equal size are represented as a linked list. If s1 = s2, then it is possible to iterate through the list to retrieve all plans p1. For p2, we consider only the plans succeeding p1 in the list. Thus, the complexity can be decreased from P(s1) * P(s2) to P(s1) * P(s2)/2, where P(si) denotes the number of plans of size si. The following formulas are valid only for the variant of DPsize where this optimization has been incorporated (see [616] for details).

If the counter InnerCounter is initialized with zero at the beginning of algorithm DPsize, then we can derive its value after termination of DPsize analytically. Since this value depends on the query graph, we have to distinguish several cases. For chain, cycle, star, and clique queries, we denote by $I^{chain}_{DPsize}$, $I^{cycle}_{DPsize}$, $I^{star}_{DPsize}$, and $I^{clique}_{DPsize}$ the value of InnerCounter after termination of algorithm DPsize.

For chain queries, we then have:

$I^{chain}_{DPsize}(n) = \frac{1}{48}(5n^4 + 6n^3 - 14n^2 - 12n)$ for n even,
$I^{chain}_{DPsize}(n) = \frac{1}{48}(5n^4 + 6n^3 - 14n^2 - 6n + 9)$ for n odd.

For cycle queries, we have:

$I^{cycle}_{DPsize}(n) = \frac{1}{4}(n^4 - n^3 - n^2)$ for n even,
$I^{cycle}_{DPsize}(n) = \frac{1}{4}(n^4 - n^3 - n^2 + n)$ for n odd.

For star queries, we have:

$I^{star}_{DPsize}(n) = 2^{2n-4} - \frac{1}{4}\binom{2(n-1)}{n-1} + q(n)$ for n even,
$I^{star}_{DPsize}(n) = 2^{2n-4} - \frac{1}{4}\binom{2(n-1)}{n-1} + \frac{1}{4}\binom{n-1}{(n-1)/2} + q(n)$ for n odd,

with $q(n) = n2^{n-1} - 5 \cdot 2^{n-3} + \frac{1}{2}(n^2 - 5n + 4)$.

For clique queries, we have:

$I^{clique}_{DPsize}(n) = 2^{2n-2} - 5 \cdot 2^{n-2} + \frac{1}{4}\binom{2n}{n} - \frac{1}{4}\binom{n}{n/2} + 1$ for n even,
$I^{clique}_{DPsize}(n) = 2^{2n-2} - 5 \cdot 2^{n-2} + \frac{1}{4}\binom{2n}{n} + 1$ for n odd.

Note that $\binom{2n}{n}$ is in the order of $\Theta(4^n/\sqrt{n})$. Proofs of the above formulas as well as implementation details for algorithm DPsize can be found in [616].

DPsub
Input: a connected query graph with relations R = {R0, ..., Rn-1}
Output: an optimal bushy join tree

for all Ri ∈ R {
  BestPlan({Ri}) = Ri;
}
for 1 ≤ i ≤ 2^n - 1 ascending {
  S = {Rj ∈ R | (⌊i/2^j⌋ mod 2) = 1};
  if not (connected S) continue;     // *
  for all S1 ⊂ S, S1 ≠ ∅ {
    ++InnerCounter;
    S2 = S \ S1;
    if (S2 = ∅) continue;
    if not (connected S1) continue;
    if not (connected S2) continue;
    if not (S1 connected to S2) continue;
    ++CsgCmpPairCounter;
    p1 = BestPlan(S1);
    p2 = BestPlan(S2);
    CurrPlan = CreateJoinTree(p1, p2);
    if (cost(BestPlan(S)) > cost(CurrPlan)) {
      BestPlan(S) = CurrPlan;
    }
  }
}
OnoLohmanCounter = CsgCmpPairCounter / 2;
return BestPlan({R0, ..., Rn-1});

Figure 3.8: Algorithm DPsub

Subset-Driven Enumeration: DPsub

Fig. 3.8 presents the pseudocode for the algorithm DPsub. The algorithm first initializes the table BestPlan with all possible plans containing a single relation. Then the main loop starts. It iterates over all possible non-empty subsets of {R0, ..., Rn-1} and constructs the best possible plan for each of them.
The enumeration makes use of a bitvector representation of sets: the integer i induces the current subset S via its binary representation. Taken as bitvectors, the integers in the range from 1 to 2^n - 1 exactly represent the set of all non-empty subsets of {R0, ..., Rn-1}, including the set itself. Further, by starting with 1 and incrementing by 1, the enumeration order is valid for dynamic programming: for every set, all its subsets are generated before the set itself. This enumeration is very fast, since increment by one is a very fast operation. However, the relations contained in S may not induce a connected subgraph of the query graph. Therefore, we must test for connectedness. The goal of the next loop over all subsets of S is to find the best plan joining all the relations in S. Therefore, S1 ranges over all non-empty, strict subsets of S. This can be done very efficiently by applying the code snippet of Vance and Maier [897, 898]. Then, the subset of relations contained in S but not in S1 is assigned to S2. Clearly, S1 and S2 are disjoint. Hence, only the connectedness tests have to be performed. Since we want to avoid cross products, S1 and S2 must both induce connected subgraphs of the query graph, and there must be a join predicate between a relation in S1 and one in S2. If these conditions are fulfilled, we can construct a plan CurrPlan by joining the plans associated with S1 and S2. If BestPlan does not contain a plan for the relations in S, or the one it contains is more expensive than CurrPlan, we register CurrPlan with BestPlan.

For chain, cycle, star, and clique queries, we denote by $I^{chain}_{DPsub}$, $I^{cycle}_{DPsub}$, $I^{star}_{DPsub}$, and $I^{clique}_{DPsub}$ the value of InnerCounter after termination of algorithm DPsub.

For chains, we have

$I^{chain}_{DPsub}(n) = 2^{n+2} - n^2 - 3n - 4$   (3.6)

For cycles, we have

$I^{cycle}_{DPsub}(n) = n2^n + 2^n - 2n^2 - 2$   (3.7)

For stars, we have

$I^{star}_{DPsub}(n) = 2 \cdot 3^{n-1} - 2^n$   (3.8)

For cliques, we have

$I^{clique}_{DPsub}(n) = 3^n - 2^{n+1} + 1$   (3.9)

The number of failures for the additional connectedness check (marked by * in Fig. 3.8) can easily be calculated as $2^n - \#csg(n) - 1$.

Sample numbers. Fig. 3.9 contains tables with values produced by our formulas for query graph sizes between 2 and 20. For the different kinds of query graphs, it shows the number of csg-cmp-pairs (#ccp) and the values of the inner counter after termination of DPsize and DPsub. Looking at these numbers, we observe the following:

- For chain and cycle queries, DPsize soon becomes much faster than DPsub.
- For star and clique queries, DPsub soon becomes much faster than DPsize.
- Except for clique queries, the number of csg-cmp-pairs is orders of magnitude smaller than the value of InnerCounter for all DP variants.

            Chain                                     Cycle
  n     #ccp/2      DPsub      DPsize            #ccp/2        DPsub      DPsize
  2          1          2           1                 1            2           1
  5         20         84          73                40          140         120
 10        165       3962        1135               405        11062        2225
 15        560     130798        5628              1470       523836       11760
 20       1330    4193840       17545              3610     22019294       37900

            Star                                      Clique
  n     #ccp/2      DPsub        DPsize          #ccp/2        DPsub        DPsize
  2          1          2             1               1            2             1
  5         32        130           110              90          180           280
 10       2304      38342         57888           28501        57002        306991
 15     114688    9533170      57305929         7141686     14283372     307173877
 20    4980736 2323474358   59892991338      1742343625   3484687250  309338182241

Figure 3.9: Size of the search space for different graph structures

From the latter observation, we can conclude that in almost all cases the tests performed by both algorithms in their innermost loop fail.
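To make the subset-driven enumeration concrete, the following standalone C++ sketch (same toy data and cost model as in the DPsize sketch above; all names ours) renders DPsub's loops. The inner loop uses the subset-enumeration idiom attributed to Vance and Maier in the text: S1 = S & (S1 - 1) steps through all non-empty proper subsets of S at constant cost per step.

#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

struct Plan { double card = 0, cost = -1; };

static int n = 4;
static std::vector<double> card = {10, 20, 30, 40};
static std::vector<uint32_t> adj = {2, 1 | 4, 2 | 8, 4};  // chain R0-R1-R2-R3
static double sel = 0.1;

static bool connectedSet(uint32_t S) {
  uint32_t reach = S & (0u - S);
  for (uint32_t prev = 0; prev != reach; ) {
    prev = reach;
    for (int i = 0; i < n; ++i)
      if (reach & (1u << i)) reach |= adj[i] & S;
  }
  return reach == S;
}
static bool connectedTo(uint32_t A, uint32_t B) {
  uint32_t nb = 0;
  for (int i = 0; i < n; ++i) if (A & (1u << i)) nb |= adj[i];
  return (nb & B) != 0;
}

int main() {
  std::unordered_map<uint32_t, Plan> best;
  for (int i = 0; i < n; ++i) best[1u << i] = {card[i], 0};
  for (uint32_t S = 1; S < (1u << n); ++S) {    // ascending: subsets first
    if (!connectedSet(S)) continue;             // the check marked * in Fig. 3.8
    for (uint32_t S1 = S & (S - 1); S1; S1 = S & (S1 - 1)) {
      uint32_t S2 = S & ~S1;                    // never empty for proper S1
      if (!connectedSet(S1) || !connectedSet(S2)) continue;
      if (!connectedTo(S1, S2)) continue;       // no cross products
      Plan p1 = best[S1], p2 = best[S2];
      double c   = p1.card * p2.card * sel;     // one join edge on a chain
      double cst = p1.cost + p2.cost + c;
      Plan& b = best[S];
      if (b.cost < 0 || cst < b.cost) b = {c, cst};
    }
  }
  printf("best cost: %f\n", best[(1u << n) - 1].cost);
}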
Both algorithms are far away from the theoretical lower bound given by #ccp. This conclusion motivates us to derive a new algorithm whose InnerCounter value is equal to the number of csg-cmp-pairs.

Csg-cmp-pair enumeration-based algorithm: DPccp

The algorithm DPsub solves the join ordering problem for a given subset S of relations by considering all pairs of disjoint subproblems that have already been solved. Since the enumeration of subsets is very fast, this is a very efficient strategy if the search space is dense, e.g., for clique queries. However, if the search space is sparse, e.g., for chain queries, DPsub considers many subproblems that are not connected and, therefore, not relevant for the solution; i.e., the tests in the innermost loop fail for the majority of cases. The main idea of our algorithm DPccp is that it considers only pairs of connected subproblems. More precisely, the algorithm considers exactly the csg-cmp-pairs of a graph.

Thus, our goal is to enumerate all csg-cmp-pairs (S1, S2) efficiently. Clearly, we want to enumerate every pair once and only once. Further, the enumeration must be performed in an order valid for dynamic programming: whenever a pair (S1, S2) is generated, all non-empty subsets of S1 and S2 must have been generated before as a component of a pair. The last requirement is that the overhead for generating a single csg-cmp-pair must be constant or at most linear. This condition is necessary in order to beat DPsize and DPsub.

DPccp
Input: a connected query graph with relations R = {R0, ..., Rn-1}
Output: an optimal bushy join tree

for all Ri ∈ R {
  BestPlan({Ri}) = Ri;
}
for all csg-cmp-pairs (S1, S2), S = S1 ∪ S2 {
  ++InnerCounter;
  ++OnoLohmanCounter;
  p1 = BestPlan(S1);
  p2 = BestPlan(S2);
  CurrPlan = CreateJoinTree(p1, p2);
  if (cost(BestPlan(S)) > cost(CurrPlan)) {
    BestPlan(S) = CurrPlan;
  }
  CurrPlan = CreateJoinTree(p2, p1);
  if (cost(BestPlan(S)) > cost(CurrPlan)) {
    BestPlan(S) = CurrPlan;
  }
}
CsgCmpPairCounter = 2 * OnoLohmanCounter;
return BestPlan({R0, ..., Rn-1});

Figure 3.10: Algorithm DPccp

[Figure 3.11: Enumeration Example for DPccp (graph drawing not reproduced)]

If we meet all these requirements, the algorithm DPccp is easily specified: iterate over all csg-cmp-pairs (S1, S2) and consider joining the best plans associated with them. Figure 3.10 shows the pseudocode. The first steps of an example enumeration are shown in Figure 3.11: thick lines mark the connected subsets while thin lines mark possible join edges. Note that the algorithm explicitly exploits join commutativity. This is due to our enumeration algorithm developed below: if (S1, S2) is a csg-cmp-pair, then either (S1, S2) or (S2, S1) will be generated, but never both. An alternative is to modify CreateJoinTree to take care of commutativity.

We proceed as follows. First, we discuss an algorithm enumerating non-empty connected subsets S1 of {R0, ..., Rn-1}. Then, we show how to enumerate the complements S2 such that (S1, S2) is a csg-cmp-pair.

Let us start the exposition by fixing some notation. Let G = (V, E) be an undirected graph. For a node v ∈ V, define the neighborhood IN(v) of v as IN(v) := {v' | (v, v') ∈ E}. For a subset S ⊆ V of V, we define the neighborhood of S as IN(S) := ∪_{v ∈ S} IN(v) \ S. The neighborhood of a set of nodes thus consists of all nodes reachable by a single edge.
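With bitmask sets, as in the earlier sketches, the neighborhood is a short helper (names ours):

#include <cstdint>
#include <vector>

// IN(S): the union of the neighbour masks of all nodes in S, minus S itself
uint32_t neighbourhood(uint32_t S, const std::vector<uint32_t>& adj) {
  uint32_t N = 0;
  for (size_t i = 0; i < adj.size(); ++i)
    if (S & (1u << i)) N |= adj[i];
  return N & ~S;
}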
Note that for all S, S' ⊂ V we have IN(S ∪ S') = (IN(S) ∪ IN(S')) \ (S ∪ S'). This allows for an efficient bottom-up calculation of neighborhoods.

The following statement gives a hint on how to construct an enumeration procedure for connected subsets: let S be a connected subset of an undirected graph G and S' be any subset of IN(S); then S ∪ S' is connected. As a consequence, a connected subset can be enlarged by adding any subset of its neighborhood.

We could generate all connected subsets as follows. For every node vi ∈ V, we perform the following enumeration steps: first, we emit {vi} as a connected subset; then, we expand {vi} by calling a routine that extends a given connected set to bigger connected sets. Let the routine be called with some connected set S. It then calculates the neighborhood IN(S). For every non-empty subset N ⊆ IN(S), it emits S' = S ∪ N as a further connected subset and recursively calls itself with S'. The problem with this routine is that it produces duplicates.

This is the point where breadth-first numbering comes into play. Let V = {v0, ..., vn-1}, where the indices are consistent with a breadth-first numbering produced by a breadth-first search starting at node v0 [209]. The idea is to use the numbering to define an enumeration order: in order to avoid duplicates, the algorithm enumerates connected subgraphs for every node vi, but restricts them to contain no vj with j < i. Using the definition Bi = {vj | j ≤ i}, the pseudocode looks as follows:

EnumerateCsg
Input: a connected query graph G = (V, E)
Precondition: nodes in V are numbered according to a breadth-first search
Output: emits all subsets of V inducing a connected subgraph of G

for all i ∈ [n-1, ..., 0] descending {
  emit {vi};
  EnumerateCsgRec(G, {vi}, Bi);
}

EnumerateCsgRec(G, S, X)
N = IN(S) \ X;
for all S' ⊆ N, S' ≠ ∅, enumerate subsets first {
  emit (S ∪ S');
}
for all S' ⊆ N, S' ≠ ∅, enumerate subsets first {
  EnumerateCsgRec(G, (S ∪ S'), (X ∪ N));
}

Let us consider an example. Figure 3.12 contains a query graph whose nodes are numbered in a breadth-first fashion. The calls to EnumerateCsgRec are contained in the table in Figure 3.13. In this table, S and X are the arguments of EnumerateCsgRec, and N is the local variable after its initialization. The column emit/S contains the connected subset emitted, which then becomes the argument of the recursive call to EnumerateCsgRec (labelled by →). Since listing all calls would be too lengthy, only a subset of the calls is shown.

[Figure 3.12: Sample graph to illustrate EnumerateCsgRec (graph drawing on nodes R0, ..., R4 not reproduced)]

EnumerateCsgRec
     S          X              N           emit/S
    {4}      {0,1,2,3,4}        ∅
    {3}      {0,1,2,3}         {4}         {3,4}
    {2}      {0,1,2}           {3,4}       {2,3}
                                           {2,4}
                                           {2,3,4}
    {1}      {0,1}             {4}         {1,4}
  → {1,4}    {0,1,4}           {2,3}       {1,2,4}
                                           {1,3,4}
                                           {1,2,3,4}
    {0}      {0}               {1,2,3}     {0,1}
                                           {0,2}
                                           {0,3}
                                           {0,1,2}
                                           {0,1,3}
                                           {0,2,3}
                                           {0,1,2,3}
  → {0,1}    {0,1,2,3}         {4}         {0,1,4}
  → {0,2}    {0,1,2,3}         {4}         {0,2,4}

Figure 3.13: Call sequence for Figure 3.12
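The following C++ sketch mirrors EnumerateCsg and EnumerateCsgRec (names ours). The subset loop uses the idiom sub = (sub - N) & N, which walks through the non-empty subsets of N in increasing order, so smaller subsets are emitted before they are extended. The adjacency masks encode the edges we read off the call sequence of Figure 3.13; since the figure itself is not reproduced, treat the concrete graph as our assumption.

#include <cstdint>
#include <cstdio>
#include <vector>

// neighbour masks for R0..R4: edges 0-1, 0-2, 0-3, 2-3, 1-4, 2-4, 3-4
static std::vector<uint32_t> adj = {0x0E, 0x11, 0x19, 0x15, 0x0E};

static uint32_t neighbourhood(uint32_t S) {
  uint32_t N = 0;
  for (size_t i = 0; i < adj.size(); ++i)
    if (S & (1u << i)) N |= adj[i];
  return N & ~S;
}

static void enumerateCsgRec(uint32_t S, uint32_t X) {
  uint32_t N = neighbourhood(S) & ~X;
  for (uint32_t sub = (0u - N) & N; sub; sub = (sub - N) & N)
    printf("emit %x\n", (unsigned)(S | sub));     // emit S ∪ S'
  for (uint32_t sub = (0u - N) & N; sub; sub = (sub - N) & N)
    enumerateCsgRec(S | sub, X | N);              // then extend
}

int main() {
  int n = (int)adj.size();
  for (int i = n - 1; i >= 0; --i) {              // descending, as above
    printf("emit %x\n", (unsigned)(1u << i));     // emit {vi}
    enumerateCsgRec(1u << i, (1u << (i + 1)) - 1);  // X = B_i
  }
}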
Generating the connected subsets is an important first step, but clearly not sufficient: we have to generate all csg-cmp-pairs. The basic idea is as follows. Algorithm EnumerateCsg is used to create the first component S1 of every csg-cmp-pair. Then, for each such S1, we generate all its complement components S2. This can be done by calling EnumerateCsgRec with the correct parameters. Remember that we have to generate every csg-cmp-pair once and only once.

To achieve this, we use a technique similar to the one used for connected subsets, using the breadth-first numbering to define an enumeration order: we consider only those sets S2 in the complement of S1 (with (S1, S2) being a csg-cmp-pair) that contain only vj with j larger than any i with vi ∈ S1. This avoids the generation of duplicates.

We need some definitions to state the actual algorithm. Let S1 ⊆ V be a non-empty subset of V. Then we define min(S1) := min({i | vi ∈ S1}). This is used to extract the starting node from which S1 was constructed. Let W ⊂ V be a non-empty subset of V. Then we define Bi(W) := {vj | vj ∈ W, j ≤ i}. Using this notation, the algorithm to construct all S2 for a given S1 such that (S1, S2) is a csg-cmp-pair looks as follows:

EnumerateCmp
Input: a connected query graph G = (V, E), a connected subset S1
Precondition: nodes in V are numbered according to a breadth-first search
Output: emits all complements S2 for S1 such that (S1, S2) is a csg-cmp-pair

X = B_min(S1) ∪ S1;
N = IN(S1) \ X;
for all vi ∈ N by descending i {
  emit {vi};
  EnumerateCsgRec(G, {vi}, X ∪ (Bi ∩ N));
}

Algorithm EnumerateCmp considers all neighbors of S1. First, they are used to determine those S2 that contain only a single node. Then, for each neighbor of S1, it recursively calls EnumerateCsgRec to create those S2 that contain more than a single node. Note that both the nodes concerning the enumeration of S1 (B_min(S1) ∪ S1) and the nodes concerning the enumeration of S2 (N) have to be considered in the exclusion set in order to guarantee a correct enumeration; otherwise, the combined algorithm would emit (commutative) duplicates.

Let us consider an example for algorithm EnumerateCmp. The underlying graph is again the one shown in Fig. 3.12. Assume EnumerateCmp is called with S1 = {R1}. In the first statement, the set {R0, R1} is assigned to X. Then the neighborhood is calculated. This results in N = {R0, R4} \ {R0, R1} = {R4}. Hence, {R4} is emitted and, together with {R1}, it forms the csg-cmp-pair ({R1}, {R4}). Then the recursive call to EnumerateCsgRec follows with arguments G, {R4}, and {R0, R1, R4}. EnumerateCsgRec then generates the connected sets {R2, R4}, {R3, R4}, and {R2, R3, R4}, giving three more csg-cmp-pairs.
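EnumerateCmp translates just as directly; the sketch below builds on the helpers of the previous sketch (names ours). Called with S1 = {R1} on the same graph, it emits {R4} and then, via enumerateCsgRec, {R2, R4}, {R3, R4}, and {R2, R3, R4}, exactly as in the example above.

// emits all complements S2 such that (S1, S2) is a csg-cmp-pair;
// uses adj, neighbourhood(), and enumerateCsgRec() from the sketch above
static void enumerateCmp(uint32_t S1, int n) {
  uint32_t minBit = S1 & (0u - S1);              // lowest bit = v_min(S1)
  uint32_t X = (minBit | (minBit - 1)) | S1;     // B_min(S1) ∪ S1
  uint32_t N = neighbourhood(S1) & ~X;           // IN(S1) \ X
  for (int i = n - 1; i >= 0; --i) {             // vi ∈ N by descending i
    uint32_t v = 1u << i;
    if (!(N & v)) continue;
    printf("emit %x\n", (unsigned)v);            // single-node complement
    enumerateCsgRec(v, X | ((v | (v - 1)) & N)); // X ∪ (B_i ∩ N)
  }
}

int main() {
  enumerateCmp(0x02, 5);   // S1 = {R1}: emits {R4}, {R2,R4}, {R3,R4}, {R2,R3,R4}
}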
3.2.5 Memoization

Whereas dynamic programming constructs join trees iteratively from small trees to larger trees, i.e., works bottom-up, memoization works recursively. For a given set of relations S, it produces the best join tree for S by recursively calling itself for every subset S1 of S and considering all join trees between S1 and its complement S2. The best alternative is memoized (hence the name). The reason is that two (even different) (sub-)sets of all relations may very well have common subsets. For example, {R1, R2, R3, R4, R5} and {R2, R3, R4, R5, R6} have the common subset {R2, R3, R4, R5}. In order to avoid duplicate work, memoization is essential.

In the following variant of memoization, we explore the search space of all bushy trees and consider cross products. We split the functionality across two functions. The first one initializes the BestTree data structure with single-relation join trees for the Ri and then calls the second one. The second one is the core memoization procedure, which calls itself recursively.

MemoizationJoinOrdering(R)
Input: a set of relations R = {R1, ..., Rn}
Output: an optimal join tree for R

for (i = 1; i <= n; ++i) {
  BestTree({Ri}) = Ri;
}
return MemoizationJoinOrderingSub(R);

MemoizationJoinOrderingSub(S)
Input: a (sub-)set of relations S
Output: an optimal join tree for S

if (NULL == BestTree(S)) {
  for all S1 ⊂ S, S1 ≠ ∅ do {
    S2 = S \ S1;
    CurrTree = CreateJoinTree(MemoizationJoinOrderingSub(S1),
                              MemoizationJoinOrderingSub(S2));
    if (BestTree(S) == NULL || cost(BestTree(S)) > cost(CurrTree)) {
      BestTree(S) = CurrTree;
    }
  }
}
return BestTree(S);

Again, pruning techniques can help to speed up plan generation [798].

3.2.6 Join Ordering by Generating Permutations

For any cost function, we can directly generate permutations. Generating all permutations is clearly too expensive for more than a couple of relations. However, we can safely neglect some of them. Consider the join sequence R1 R2 R3 R4. If we know that R1 R3 R2 is cheaper than R1 R2 R3, we do not have to consider R1 R2 R3 R4. The idea of the following algorithm is to construct permutations by successively adding relations. Thereby, an extended sequence is only explored if exchanging the last two relations does not result in a cheaper sequence.

ConstructPermutations(Query Specification)
Input: a query specification for relations {R1, ..., Rn}
Output: an optimal left-deep tree

BestPermutation = NULL;
Prefix = ε;
Rest = {R1, ..., Rn};
ConstructPermutationsSub(Prefix, Rest);
return BestPermutation;

ConstructPermutationsSub(Prefix, Rest)
Input: a prefix of a permutation and the relations to be added (Rest)
Output: none, but side-effect on BestPermutation

if (Rest == ∅) {
  if (BestPermutation == NULL || cost(Prefix) < cost(BestPermutation)) {
    BestPermutation = Prefix;
  }
  return;
}
foreach (Ri, Rj ∈ Rest) {
  if (cost(Prefix ◦ ⟨Ri, Rj⟩) ≤ cost(Prefix ◦ ⟨Rj, Ri⟩)) {
    ConstructPermutationsSub(Prefix ◦ ⟨Ri⟩, Rest \ {Ri});
  }
  if (cost(Prefix ◦ ⟨Rj, Ri⟩) ≤ cost(Prefix ◦ ⟨Ri, Rj⟩)) {
    ConstructPermutationsSub(Prefix ◦ ⟨Rj⟩, Rest \ {Rj});
  }
}
return;

The algorithm can be made more efficient if the foreach loop considers only a single relation and performs the swap test with this relation and the last relation occurring in Prefix. The algorithm has two main advantages over dynamic programming and memoization. The first advantage is that it needs only linear space, as opposed to exponential space for the two alternatives mentioned. The other main advantage over dynamic programming is that it generates join trees early, whereas with dynamic programming we only generate a plan after the whole search space has been explored. Thus, if the query contains too many joins, that is, if the search space cannot be fully explored in reasonable time and space, dynamic programming will not generate any plan at all. If stopped, ConstructPermutations will not necessarily have computed the best plan, but at least some plans have been investigated. This allows us to stop it after some time limit has been exceeded. The time limit itself can be fixed, like 100 ms, or variable, like 5% of the execution time of the best plan found so far.

The predicates in the if statements can be evaluated more efficiently if a (local) ranking function is available. Further speed-up can be achieved if, additionally, the idea of memoization is applied (of course, this jeopardizes the small memory footprint).
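For completeness, here is a runnable C++ rendering of ConstructPermutations with the cost function passed in as a parameter (all names and the toy cost model are ours). Note that the pseudocode above leaves the case of a single remaining relation implicit; the sketch completes the permutation explicitly in that case.

#include <functional>
#include <vector>

using Seq = std::vector<int>;
using CostFn = std::function<double(const Seq&)>;

static void constructPermutationsSub(Seq& prefix, const Seq& rest,
                                     const CostFn& cost,
                                     Seq& best, double& bestCost) {
  if (rest.empty()) {
    double c = cost(prefix);
    if (best.empty() || c < bestCost) { best = prefix; bestCost = c; }
    return;
  }
  if (rest.size() == 1) {                 // only one way to finish
    prefix.push_back(rest[0]);
    constructPermutationsSub(prefix, Seq(), cost, best, bestCost);
    prefix.pop_back();
    return;
  }
  auto extended = [&](int a, int b) {     // cost(prefix ◦ <a, b>)
    Seq s = prefix; s.push_back(a); s.push_back(b);
    return cost(s);
  };
  for (size_t i = 0; i < rest.size(); ++i)
    for (size_t j = i + 1; j < rest.size(); ++j) {
      int Ri = rest[i], Rj = rest[j];
      if (extended(Ri, Rj) <= extended(Rj, Ri)) {   // Ri may come first
        Seq r = rest; r.erase(r.begin() + i);
        prefix.push_back(Ri);
        constructPermutationsSub(prefix, r, cost, best, bestCost);
        prefix.pop_back();
      }
      if (extended(Rj, Ri) <= extended(Ri, Rj)) {   // Rj may come first
        Seq r = rest; r.erase(r.begin() + j);
        prefix.push_back(Rj);
        constructPermutationsSub(prefix, r, cost, best, bestCost);
        prefix.pop_back();
      }
    }
}

int main() {
  std::vector<double> card = {10, 100, 1000, 20};  // made-up cardinalities
  CostFn costFn = [&](const Seq& s) {   // toy C_out: sum of intermediate
    double sum = 0, inter = 1;          // sizes, selectivities ignored
    for (int r : s) { inter *= card[r]; sum += inter; }
    return sum;
  };
  Seq prefix, best;
  double bestCost = 0;
  constructPermutationsSub(prefix, {0, 1, 2, 3}, costFn, best, bestCost);
  // best now holds the cheapest permutation under the toy cost
}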
The following variant might be interesting if one is willing to go from linear to quadratic space consumption. The original algorithm is then started n times, once with each relation as the starting relation. The n different instantiations have to run in an interleaved fashion. This variant reduces the dependency on the starting relation.

3.2.7 A Dynamic Programming based Heuristics for Chain Queries

In Section 3.1.6, we saw that the complexity of producing optimal left-deep trees possibly containing cross products for chain queries is an open problem. However, the case does not seem to be hopeless. In fact, Scheufele and Moerkotte present two algorithms [766, 768] for this problem. For one algorithm, it can be proven that it has polynomial runtime; for the other, it can be proven that it produces the optimal join tree. However, for neither of them could both properties be proven so far.

Basic Definitions and Lemmata

An instance of the join-ordering problem for chain queries (or a chain query for short) is fully described by the following parameters. First, n relations R1, ..., Rn are given. The size of relation Ri (1 ≤ i ≤ n) is denoted by |Ri| or n_{Ri}. Second, the query graph G on the set of relations R1, ..., Rn must be a chain. That is, its edges are {(Ri, Ri+1) | 1 ≤ i < n}:

R1 - R2 - ... - Rn

For every edge (Ri, Ri+1), there is an associated selectivity $f_{i,i+1} = |R_i \Join R_{i+1}| / |R_i \times R_{i+1}|$. We define all other selectivities $f_{i,j} = 1$ for |i - j| ≠ 1. They correspond to cross products.

In this section, we consider only left-deep processing trees, but we allow them to contain cross products. Hence, any permutation is a valid join tree. There is a unique correspondence not only between left-deep join trees and permutations, but also between consecutive parts of a permutation and segments of a left-deep tree. Furthermore, if a segment of a left-deep tree does not contain cross products, it uniquely corresponds to a consecutive part of the chain in the query graph. In this case, we also speak of (sub)chains or connected (sub)sequences. We say that two relations Ri and Rj are connected if they are adjacent in G; more generally, two sequences s and t are connected if there exist relations Ri in s and Rj in t such that Ri and Rj are connected. A sequence of relations s is connected if, for all subsequences s1 and s2 satisfying s = s1 s2, s1 is connected to s2.

Given a chain query, we ask for a permutation s = r1 ... rn of the n relations (i.e., there is a permutation π such that ri = R_{π(i)} for 1 ≤ i ≤ n) that produces minimal costs under the cost function Cout.

Remember that the dynamic programming approach considers $n2^{n-1} - n(n+1)/2$ alternatives for left-deep processing trees with cross products, independently of the query graph and the cost function. The question arises whether it is possible to lower the complexity in the case of simple chain queries.

The IKKBZ algorithm solves the join ordering problem for tree queries by decomposing the problem into polynomially many subproblems that are subject to tree-like precedence constraints. The precedence constraints ensure that the cost functions of the subproblems have the ASI property. The remaining problem is to optimize the constrained subproblems under the simpler cost function. Unfortunately, this approach does not work in our case, since no such decomposition seems to exist. Let us introduce some notions used by the algorithms.
We have to generalize the rank used in the IKKBZ algorithm to relativized ranks. We start by relativizing the cost function. The costs of a sequence s relative to a sequence u are defined as

$C_u(\epsilon) := 0$
$C_u(R_i) := 0$  if $u = \epsilon$
$C_u(R_i) := (\prod_{R_j \in u} f_{j,i})\, n_i$  if $u \neq \epsilon$
$C_u(s_1 s_2) := C_u(s_1) + T_u(s_1) \cdot C_{u s_1}(s_2)$

where the relativized size is $T_u(\epsilon) := 1$ and $T_u(s) := \prod_{R_i \in s} f_i n_i$, with $f_i$ denoting the product of all selectivities $f_{j,i}$ between $R_i$ and the relations $R_j$ occurring before $R_i$ in the combined sequence $us$. The rank of a non-empty sequence s relative to a sequence u is then defined as $rank_u(s) := (T_u(s) - 1)/C_u(s)$. For a single relation $R_i$ and non-empty u, we have $C_u(R_i) = T_u(R_i)$ and, hence, $rank_u(R_i) = f(T_u(R_i))$ for the monotonically increasing function $f(x) = (x-1)/x$. The argument to the function f(x) is (for the computation of the size of a single relation Ri) $f_i |R_i|$. But this is exactly the factor by which the next intermediate result will increase (or decrease). Since we sum up intermediate results, this is an essential number. Furthermore, it follows from the monotonicity of f(x) that $rank_u(R_i) \leq rank_u(R_j)$ if and only if $f_i |R_i| \leq f_j |R_j|$, where $f_j$ is the product of all selectivities between $R_j$ and the relations in u.

Example 1 (cont'd): For the query given in Example 1, the optimal sequence R1 R3 R2 gives rise to the following ranks:

$rank_{R_1}(R_2) = \frac{T_{R_1}(R_2) - 1}{C_{R_1}(R_2)} = \frac{100 \cdot 0.9 - 1}{100 \cdot 0.9} \approx 0.9889$
$rank_{R_1}(R_3) = \frac{T_{R_1}(R_3) - 1}{C_{R_1}(R_3)} = \frac{10 \cdot 1.0 - 1}{10 \cdot 1.0} = 0.9$
$rank_{R_1 R_3}(R_2) = \frac{T_{R_1 R_3}(R_2) - 1}{C_{R_1 R_3}(R_2)} = \frac{100 \cdot 0.9 \cdot 0.9 - 1}{100 \cdot 0.9 \cdot 0.9} \approx 0.9877$

Hence, within the optimal sequence, the relation with the smallest rank (here R3, since $rank_{R_1}(R_3) < rank_{R_1}(R_2)$) is preferred. As the next lemma shows, this is no accident.

Using the rank function, the following lemma can be proved.

Lemma 3.2.9 For the sequences

S = r1 ⋯ r_{k-1} r_k r_{k+1} r_{k+2} ⋯ r_n
S' = r1 ⋯ r_{k-1} r_{k+1} r_k r_{k+2} ⋯ r_n

the following holds:

C(S) ≤ C(S')  ⇔  rank_u(r_k) ≤ rank_u(r_{k+1})

where u = r1 ⋯ r_{k-1}. Equality only holds if it holds on both sides.

Example 1 (cont'd): Since the relations in Example 1 are ordered by ascending ranks, Lemma 3.2.9 states that, whenever we exchange two adjacent relations, the costs cannot decrease. In fact, we observe that C(R1 R3 R2) ≤ C(R1 R2 R3).

An analogous lemma still holds for two unconnected subchains:

Lemma 3.2.10 Let u, x, and y be three subchains where x and y are not interconnected. Then we have:

C(uxy) ≤ C(uyx)  ⇔  rank_u(x) ≤ rank_u(y)

Equality only holds if it holds on both sides.

Next, we define the notion of a contradictory chain, which is essential to the algorithms. The subsequent lemmata allow us to cut down the search space to be explored by any optimization algorithm.

Definition 3.2.11 (contradictory pair of subchains) Let u, x, y be non-empty sequences. We call (x, y) a contradictory pair of subchains if and only if

$C_u(xy) \leq C_u(yx) \;\wedge\; rank_u(x) > rank_{ux}(y)$

A special case occurs when x and y are single relations. Then the above condition simplifies to

$rank_{ux}(y) < rank_u(x) \leq rank_u(y)$

To explain the intuition behind the definition of contradictory subchains, we need another example.

Example 2: Suppose a chain query involving R1, R2, R3 is given. The relation sizes are |R1| = 1, |R2| = |R3| = 10, and the selectivities are $f_{1,2} = 0.5$ and $f_{2,3} = 0.2$. Consider the sequences R1 R2 R3 and R1 R3 R2, which differ in the order of the last two relations. We have

$rank_{R_1}(R_2) = 0.8$     $rank_{R_1 R_2}(R_3) = 0.5$
$rank_{R_1}(R_3) = 0.9$     $rank_{R_1 R_3}(R_2) = 0.0$

and

C(R1 R2 R3) = 15
C(R1 R3 R2) = 20

Hence,

$rank_{R_1}(R_2) > rank_{R_1 R_2}(R_3)$
$rank_{R_1}(R_3) > rank_{R_1 R_3}(R_2)$
C(R1 R2 R3) < C(R1 R3 R2)

and (R2, R3) is a contradictory pair within R1 R2 R3. Now the use of the term contradictory becomes clear: the costs do not behave as could be expected from the ranks.
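The numbers of Example 2 are easy to recompute; the following snippet (ours) evaluates the relativized sizes, costs, and ranks and reproduces the four ranks and the two costs given above.

#include <cstdio>

// Example 2: chain R1 - R2 - R3 with |R1| = 1, |R2| = |R3| = 10,
// f12 = 0.5, f23 = 0.2. T is the growth factor of the intermediate
// result; for single relations after a non-empty prefix, C = T and
// rank = (T - 1)/T.
int main() {
  double n2 = 10, n3 = 10, f12 = 0.5, f23 = 0.2;

  double T_R1_R2   = f12 * n2;          // = 5
  double T_R1R2_R3 = f23 * n3;          // = 2
  double T_R1_R3   = 1.0 * n3;          // = 10 (cross product, f13 = 1)
  double T_R1R3_R2 = f12 * f23 * n2;    // = 1

  // C(R1 R2 R3) = T(R2 after R1) + that size times T(R3 after R1 R2);
  // analogously for R1 R3 R2
  double C_123 = T_R1_R2 + T_R1_R2 * T_R1R2_R3;   // = 15
  double C_132 = T_R1_R3 + T_R1_R3 * T_R1R3_R2;   // = 20

  printf("rank_R1(R2)   = %.1f\n", (T_R1_R2 - 1) / T_R1_R2);      // 0.8
  printf("rank_R1R2(R3) = %.1f\n", (T_R1R2_R3 - 1) / T_R1R2_R3);  // 0.5
  printf("rank_R1(R3)   = %.1f\n", (T_R1_R3 - 1) / T_R1_R3);      // 0.9
  printf("rank_R1R3(R2) = %.1f\n", (T_R1R3_R2 - 1) / T_R1R3_R2);  // 0.0
  printf("C(R1R2R3) = %.0f, C(R1R3R2) = %.0f\n", C_123, C_132);   // 15, 20
}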
The next (obvious) lemma states that contradictory chains are necessarily connected.

Lemma 3.2.12 If there is no connection between two subchains x and y, then they cannot build a contradictory pair (x, y).

The next lemma states that between a contradictory pair of relations, there cannot be any other relation not connected to them without increasing cost.

Lemma 3.2.13 Let S = usvtw be a sequence. If there is no connection between relations in s and v and between relations in v and t, and $rank_u(s) \geq rank_{us}(t)$, then there exists a sequence S' of no higher cost in which s immediately precedes t.

Example 3: Consider five relations R1, ..., R5. The relation sizes are |R1| = 1, |R2| = |R3| = |R4| = 8, and |R5| = 2. The selectivities are $f_{1,2} = \frac{1}{2}$, $f_{2,3} = \frac{1}{4}$, $f_{3,4} = \frac{1}{8}$, and $f_{4,5} = \frac{1}{2}$. Relation R5 is not connected to relations R2 and R3. Further, within the sequence R1 R2 R5 R3 R4, relations R2 and R3 have contradictory ranks: $rank_{R_1}(R_2) = \frac{4-1}{4} = \frac{3}{4}$ and $rank_{R_1 R_2 R_5}(R_3) = \frac{2-1}{2} = \frac{1}{2}$. Hence, at least one of R1 R5 R2 R3 R4 and R1 R2 R3 R5 R4 must be of no greater cost than R1 R2 R5 R3 R4. This is indeed the case:

C(R1 R2 R3 R5 R4) = 4 + 8 + 16 + 8 = 36
C(R1 R2 R5 R3 R4) = 4 + 8 + 16 + 8 = 36
C(R1 R5 R2 R3 R4) = 2 + 8 + 16 + 8 = 34

The next lemma shows that if there exist two rank-sorted chains of single relations, then their costs as well as their ranks are necessarily equal.

Lemma 3.2.14 Let S = x1 ⋯ xn and S' = y1 ⋯ yn be two different rank-sorted chains containing exactly the relations R1, ..., Rn, i.e.,

$rank_{x_1 \cdots x_{i-1}}(x_i) \leq rank_{x_1 \cdots x_i}(x_{i+1})$ for all 1 ≤ i < n,
$rank_{y_1 \cdots y_{i-1}}(y_i) \leq rank_{y_1 \cdots y_i}(y_{i+1})$ for all 1 ≤ i < n.

Then S and S' have equal costs and, furthermore,

$rank_{x_1 \cdots x_{i-1}}(x_i) = rank_{y_1 \cdots y_{i-1}}(y_i)$ for all 1 < i ≤ n.

One could conjecture that the following generalization of Lemma 3.2.14 is true, although no one has proved it so far.

Conjecture 3.2.1 Let S = x1 ⋯ xn and S' = y1 ⋯ ym be two different rank-sorted chains for the relations R1, ..., Rn, where the xi and yi are subsequences such that

$rank_{x_1 \cdots x_{i-1}}(x_i) \leq rank_{x_1 \cdots x_i}(x_{i+1})$ for all 1 ≤ i < n,
$rank_{y_1 \cdots y_{i-1}}(y_i) \leq rank_{y_1 \cdots y_i}(y_{i+1})$ for all 1 ≤ i < m,

and the subsequences xi and yj are all optimal (with respect to the fixed prefixes x1 ... x_{i-1} and y1 ... y_{j-1}). Then S and S' have equal costs.

Consider the problem of merging two optimal unconnected chains. If we knew that the ranks of relations in an optimal chain were always sorted in ascending order, we could use the classical merge procedure to combine the two chains. The resulting chain would also be rank-sorted in ascending order and, according to Lemma 3.2.14, it would be optimal. Unfortunately, this does not work, since there are optimal chains whose ranks are not sorted in ascending order: those containing sequences with contradictory ranks. Now, as shown in Lemma 3.2.13, between contradictory pairs of relations there cannot be any other relation not connected to them. Hence, in the merging process, we have to take care that we do not merge a contradictory pair of relations with a relation not connected to the pair. In order to achieve this, we apply the same trick as in the IKKBZ algorithm: we tie the relations of a contradictory subchain together by building a compound relation. Assume that we tie together the relations r1, ..., rn to a new relation r_{1,...,n}. Then we define the size of r_{1,...,n} as

$|r_{1,...,n}| = |r_1 \Join \ldots \Join r_n|$
Further, if some ri (1 ≤ i ≤ n) has a connection to some rk ∉ {r1, ..., rn}, then we define the selectivity factor between rk and the compound relation as $f_{r_{1,...,n}, r_k} = f_{i,k}$.

If we tie together contradictory pairs, the resulting chain of compound relations still need not be rank-sorted with respect to the compound relations. To overcome this, we iterate the process of tying contradictory pairs of compound relations together until the sequence of compound relations is rank-sorted, which will eventually be the case. That is, we apply the normalization used in the IKKBZ algorithm. However, we have to reformulate it for relativized costs and ranks:

Normalize(p, s)
while (there exist subsequences u, v (u ≠ ε) and compound relations x, y
       such that s = uxyv and C_{pu}(xy) ≤ C_{pu}(yx)
       and rank_{pu}(x) > rank_{pux}(y)) {
  replace xy by a compound relation (x, y);
}
return (p, s);

The compound relations in the result of the procedure Normalize are called contradictory chains. A maximal contradictory subchain is a contradictory subchain that cannot be made longer by further tying steps. Resolving the ties introduced in the procedure Normalize is called de-normalization. It works the same way as in the IKKBZ algorithm.

The cost, size, and rank functions can be extended to sequences containing compound relations in a straightforward way: we define the cost of a sequence containing compound relations to be identical to the cost of the corresponding de-normalized sequence. The size and rank functions are defined analogously.

The following simple observation is central to the algorithms: every chain can be decomposed into a sequence of adjacent maximal contradictory subchains. For convenience, we often speak of chains instead of subchains and of contradictory chains instead of maximal contradictory subchains; the meaning should be clear from the context. Further, we note that the decomposition into adjacent maximal contradictory subchains is not unique. For example, consider an optimal subchain r1 r2 r3 and a sequence u of preceding relations. If $rank_u(r_1) > rank_{ur_1}(r_2) > rank_{ur_1r_2}(r_3)$, one can easily show that both (r1, (r2, r3)) and ((r1, r2), r3) are contradictory subchains. Nevertheless, this ambiguity is not important, since in the following we are only interested in contradictory subchains that are optimal. In this case, the condition $C_u(xy) \leq C_u(yx)$ is certainly true and can therefore be neglected. One can show that for optimal subchains the nondeterministically defined normalization process is well-defined: if S is optimal, Normalize(P, S) will always terminate with a unique "flat" decomposition of S into maximal contradictory subchains (flat means that we remove all but the outermost parentheses, e.g., (R1R2)(((R5R4)R3)R6) becomes (R1R2)(R5R4R3R6)).

The next two lemmata and the conjecture show a possible way to overcome the problem that, if we consider cross products, we have an unconstrained ordering problem, and the idea of Monma and Sidney as exploited in the IKKBZ algorithm is no longer applicable. The next lemma is a direct consequence of the normalization procedure.

Lemma 3.2.15 Let S = s1 ... sm be an optimal chain consisting of the maximal contradictory subchains s1, ..., sm (as determined by the function Normalize). Then

$rank(s_1) \leq rank_{s_1}(s_2) \leq rank_{s_1 s_2}(s_3) \leq \cdots \leq rank_{s_1 \ldots s_{m-1}}(s_m)$;

in other words, the (maximal) contradictory subchains in an optimal chain are always sorted by ascending ranks.
The next result shows how to build an optimal sequence from two optimal non-interconnected sequences.

Lemma 3.2.16 Let x and y be two optimal sequences of relations where x and y are not interconnected. Then the sequence obtained by merging the maximal contradictory subchains in x and y (as obtained by Normalize) according to their ascending ranks is optimal.

Merging two sequences in the way described in Lemma 3.2.16 is a fundamental process. We henceforth refer to it by simply saying that we merge by the ranks.

We strongly conjecture that the following generalization of Lemma 3.2.14 is true, although it is yet unproven. It uses the notion of optimal recursively decomposable subchains defined in the next subsection.

Conjecture 3.2.2 Consider two sequences S and T containing exactly the relations R1, ..., Rn. Let S = s1 ... sk and T = t1 ... tl be such that each of the maximal contradictory subchains si, i = 1, ..., k, and tj, j = 1, ..., l, is optimal recursively decomposable. Then S and T have equal costs.

The first algorithm

We first use a slightly modified cost function C', which additionally respects the size of the first relation in the sequence; C and C' relate via

$C'_u(s) = C(s) + n_R$  if $u = \epsilon$ and $s = Rs'$,
$C'_u(s) = C_u(s)$  otherwise.

This cost function can be treated more elegantly than C. The new rank function is now defined as $rank_u(s) := (T_u(s) - 1)/C'_u(s)$. Note that the rank function is now defined even if u = ε and s is a single relation. The size function remains unchanged. At the end of this subsection, we describe how our results can be adapted to the original cost function C.

The rank of a contradictory chain depends on the relative position of the relations that are directly connected to it. For example, the rank of the contradictory subchain (R5 R3 R4 R2) depends on the position of the neighbouring relations R1 and R6 relative to (R5 R3 R4 R2), that is, on whether they appear before or after the sequence (R5 R3 R4 R2). Therefore, we introduce the following fundamental definitions:

Definition 3.2.17 (neighbourhood) We call the set of relations that are directly connected to a subchain (with respect to the query graph G) the complete neighbourhood of that subchain. A neighbourhood is a subset of the complete neighbourhood. The complement of a neighbourhood u of a subchain s is defined as v \ u, where v denotes the complete neighbourhood of s.

Note that the neighbourhood of a subchain s within a larger chain us is uniquely determined by the subsequence u of relations preceding it. For convenience, we will often use sequences of preceding relations to specify neighbourhoods. We henceforth denote by [s]_u a pair consisting of a connected sequence s and a neighbourhood u.

Definition 3.2.18 (contradictory subchain, extent) A contradictory subchain [s]_u is inductively defined as follows.

1. For a single relation s, [s]_u is a contradictory subchain.
2. There is a decomposition s = vw such that (v, w) is a contradictory pair with respect to the preceding subsequence u and both [v]_u and [w]_{uv} are contradictory subchains themselves.

The extent of a contradictory chain [s]_u is defined as the pair consisting of the neighbourhood u and the set of relations occurring in s. Since contradictory subchains are connected, the set of occurring relations always has the form {Ri, R_{i+1}, ..., R_{i+l}} for some 1 ≤ i ≤ n, 0 ≤ l ≤ n - i.
An optimal contradictory subchain for a given extent is a contradictory subchain with lowest cost among all contradictory subchains of that extent. The number of different extents of contradictory subchains for a chain query of n relations is 2n² - 2n + 1.

Each contradictory chain can be completely recursively decomposed into adjacent pairs of connected subchains. Subchains with this property are defined next (similar types of decompositions occur in [435, 799]).

Definition 3.2.19 ((optimal) recursively decomposable subchain) A recursively decomposable subchain [s]_u is inductively defined as follows.

1. If s is a single relation, then [s]_u is recursively decomposable.
2. There is a decomposition s = vw such that v is connected to w and both [v]_u and [w]_{uv} are recursively decomposable subchains.

The extent of a recursively decomposable chain is defined in the same way as for contradictory chains. Note that every contradictory subchain is recursively decomposable. Consequently, the set of all contradictory subchains for a certain extent is a subset of all recursively decomposable subchains of the same extent.

Example 4: Consider the sequence of relations s = R2 R4 R3 R6 R5 R1. Using parentheses to indicate the recursive decompositions, we have the following two possibilities:

(((R2 (R4 R3)) (R6 R5)) R1)
((R2 ((R4 R3) (R6 R5))) R1)

The extent of the recursively decomposable subchain R4 R3 R6 R5 of s is ({R2}, {R3, R4, R5, R6}).

The number of different recursively decomposable chains involving the relations R1, ..., Rn is r_n, where r_n denotes the n-th Schröder number [799]. Hence, the total number of recursively decomposable subchains over all extents is $r_n + 4r_{n-1} + 4\sum_{i=1}^{n-2}(n-i)\,r_i$. It can be shown that

$r_n \approx \frac{C(3 + \sqrt{8})^n}{n^{3/2}}$, where $C = \frac{1}{2}\sqrt{\frac{3\sqrt{2}-4}{\pi}}$.

Using Stirling's formula for n!, it is easy to show that $\lim_{n \to \infty} r_n/n! = 0$. Thus, the probability of a random permutation being recursively decomposable tends to zero for large n.

An optimal recursively decomposable subchain for a given extent is a recursively decomposable subchain with lowest cost among all recursively decomposable subchains of that extent. There is an obvious dynamic programming algorithm to compute optimal recursively decomposable subchains. It is not hard to see that Bellman's optimality principle [608, 209] holds: every optimal recursively decomposable subchain can be decomposed into smaller optimal recursively decomposable subchains.

Example 5: In order to compute an optimal recursively decomposable subchain for the extent ({R2, R7}, {R3, R4, R5, R6}), the algorithm makes use of optimal recursively decomposable subchains for the extents

({R2}, {R3})               ({R7}, {R4, R5, R6})       ({R2, R4}, {R3})
({R2}, {R3, R4})           ({R7}, {R5, R6})           ({R2, R5}, {R3, R4})
({R2}, {R3, R4, R5})       ({R7}, {R6})               ({R2, R6}, {R3, R4, R5})
({R7, R3}, {R4, R5, R6})   ({R7, R4}, {R5, R6})       ({R5, R7}, {R6})

which have been computed in earlier steps. (Splitting extents induces a partial order on the set of extents.)

A similar dynamic programming algorithm can be used to determine optimal contradictory subchains.

Let E be the set of all possible extents. We define the following partial order P = (E, ≺) on E: for all extents e1, e2 ∈ E, we have e1 ≺ e2 if and only if e1 can be obtained by splitting the extent e2. For example, ({R7}, {R5, R6}) ≺ ({R2, R7}, {R3, R4, R5, R6}).
The set of maximal extents M then corresponds to a set of incomparable elements (an antichain) in P such that for every extent e enumerated so far, there is an extent e' ∈ M with e ≺ e'.

Now, since every optimal join sequence has a representation as a sequence of contradictory subchains, we only have to determine this representation. Consider a contradictory subchain c in an optimal join sequence s. What can we say about c? Obviously, c has to be optimal with respect to the neighbourhood defined by the relations preceding c in s. Unfortunately, identifying contradictory subchains that are optimal sequences seems to be as hard as the whole problem of optimizing chain queries. Therefore, we content ourselves with the following weaker condition, which may lead to multiple representations. Nevertheless, it seems to be the strongest condition for which all subchains satisfying it can be computed in polynomial time. The condition says that s should be optimal both with respect to all contradictory chains of the same extent as s and with respect to all recursively decomposable subchains of the same extent. So far, it is not clear whether these conditions lead to multiple representations. Therefore, we have no choice but to enumerate all possible representations and select the one with minimal costs.

Next, we describe the first algorithm.

Algorithm Chain-I':

1. Use dynamic programming to determine all optimal contradictory subchains. This step can be made faster by keeping track of the set M of all maximal extents (with respect to the partial order induced by splitting extents).
2. Determine all optimal recursively decomposable subchains for all extents included in some maximal extent in M.
3. Compare the results from steps 1 and 2 and retain only matching subchains.
4. Sort the contradictory subchains according to their ranks.
5. Eliminate contradictory subchains that cannot be part of a solution.
6. Use backtracking to enumerate all sequences of rank-ordered optimal contradictory subchains and keep track of the sequence with lowest cost.

In step 5 of the algorithm, we eliminate contradictory subchains that cannot contribute to a solution. Note that the contradictory subchains in an optimal sequence are characterized by the following two conditions:

1. The extents of all contradictory subchains in the representation build a partition of the set of all relations.
2. The neighbourhoods of all contradictory subchains are consistent with the relations occurring at earlier and later positions in the sequence.

Note that any contradictory subchain occurring in the optimal sequence (except at the first and last positions) necessarily has matching contradictory subchains preceding and succeeding it in the list. In fact, every contradictory subchain X occurring in the optimal join sequence must satisfy the following two conditions:

1. For every relation R in the neighbourhood of X, there exists a contradictory subchain Y at an earlier position in the list which itself meets condition 1, such that R occurs in Y and Y can be followed by X.
2. For every relation R in the complementary neighbourhood of X, there exists a contradictory subchain Y at a later position in the list which itself meets condition 2, such that R occurs in the neighbourhood of Y and X can be followed by Y.
Using these two conditions, we can eliminate "useless" contradictory chains from the rank-ordered list by performing a reachability algorithm for each of the DAGs defined by conditions 1 and 2. In the last step of our algorithm, backtracking is used to enumerate all representations. Suppose that at some step of the algorithm we have determined an initial sequence of contradictory subchains and have a rank-sorted list of the remaining candidate contradictory subchains. In addition to the two conditions mentioned above, another reachability algorithm can be applied to determine the set of relations reachable from the list (with respect to the given prefix). With this information, all branches that do not lead to a complete join sequence can be pruned.

Let us analyze the worst-case time complexity of the algorithm. The two dynamic programming steps both iterate over O(n²) different extents, and each extent gives rise to O(n) splittings. Moreover, for each extent one normalization is necessary, which requires linear time (cost, size, and rank can be computed in constant time using recurrences). Therefore, the complexity of the two dynamic programming steps is O(n⁴). Sorting O(n²) contradictory chains can be done in time O(n² log n). The step eliminating all "useless" contradictory subchains consists of two stages of a reachability algorithm, which has complexity O(n⁴). If Conjecture 3.2.2 is true, the backtracking step requires linear time, and the total complexity of the algorithm is O(n⁴). Otherwise, if Conjecture 3.2.2 is false, the algorithm might exhibit exponential worst-case time complexity.

We now describe how to reduce the problem for our original cost function C to the problem for the modified cost function C'. One difficulty with the original cost function is that the ranks are defined only for subsequences of at least two relations. Hence, for determining the first relation in our solution, we do not have sufficient information. An obvious solution to this problem is to try every relation as the starting relation, process each of the two resulting chain queries separately, and choose the chain with minimum costs. The complexity then increases by about a factor of n. This first approach is not very efficient, since the dynamic programming computations overlap considerably; e.g., if we perform dynamic programming on the two overlapping chains R1 R2 R3 R4 R5 R6 and R2 R3 R4 R5 R6 R7, everything is computed twice for the intersecting chain R2 R3 R4 R5 R6. The key observation is that we can perform the dynamic programming calculations before we consider a particular starting relation. Hence, the final algorithm can be sketched as follows:

Algorithm CHAIN-I:

1. Compute all optimal contradictory chains by dynamic programming (corresponds to steps 1-4 of Algorithm Chain-I').
2. For each starting relation Ri, perform the following steps:
   (a) Let L1 be the result of applying steps 5 and 6 of Algorithm Chain-I' to all contradictory subchains whose extent (N, M) satisfies Ri ∈ N and M ⊆ {R1, ..., Ri}.
   (b) Let L2 be the result of applying steps 5 and 6 of Algorithm Chain-I' to all contradictory subchains whose extent (N, M) satisfies Ri ∈ N and M ⊆ {Ri, ..., Rn}.
   (c) For all (l1, l2) ∈ L1 × L2, perform the following steps:
       i. Let L be the result of merging l1 and l2 according to their ranks.
       ii. Use Ri L to update the current-best join ordering.
Suppose that Conjecture 3.2.2 is true and we can replace the backtracking part by a search for the first solution. Then the complexity of step 1 is O(n⁴), whereas the complexity of step 2 amounts to $\sum_{i=1}^{n}(O(i^2) + O((n-i)^2) + O(n)) = O(n^3)$. Hence, the total complexity would be O(n⁴) in the worst case. Of course, if our conjecture is false, the necessary backtracking step might lead to an exponential worst-case complexity.

The second algorithm

The second algorithm is much simpler than the first one but proves to be less efficient in practice. Since the new algorithm is very similar to some parts of the old one, we just point out the differences between the two algorithms. The new version of the algorithm works as follows.

Algorithm CHAIN-II':

1. Use dynamic programming to compute an optimal recursively decomposable chain for the whole set of relations {R1, ..., Rn}.
2. Normalize the resulting chain.
3. Reorder the contradictory subchains according to their ranks.
4. De-normalize the sequence.

Step 1 is identical to step 2 of our first algorithm. Note that Lemma 3.2.15 cannot be applied to the sequence in step 2, since an optimal recursively decomposable chain is not necessarily an optimal chain. Therefore, the question arises whether step 3 really makes sense. One can show that the partial order defined by the precedence relation among the contradictory subchains has the property that all elements along paths in the partial order are sorted by rank. By computing a greedy topological ordering (greedy with respect to the ranks), we obtain a sequence as requested in step 3.

Let us briefly analyze the worst-case time complexity of the second algorithm. The first step requires time O(n⁴), whereas the second step requires time O(n²). The third step has complexity O(n log n). Hence, the total complexity is O(n⁴).

Algorithm CHAIN-II' is based on the cost function C'. We can modify the algorithm for the original cost function C as follows.

Algorithm CHAIN-II:

1. Compute all optimal recursively decomposable chains by dynamic programming (corresponds to step 1 of Algorithm CHAIN-II').
2. For each starting relation Ri, perform the following steps:
   (a) Let L1 be the result of applying steps 2 and 3 of Algorithm CHAIN-II' to all optimal recursively decomposable subchains whose extent (N, M) satisfies Ri ∈ N and M ⊆ {R1, ..., Ri}.
   (b) Let L2 be the result of applying steps 2 and 3 of Algorithm CHAIN-II' to all optimal recursively decomposable subchains whose extent (N, M) satisfies Ri ∈ N and M ⊆ {Ri, ..., Rn}.
   (c) Let L be the result of merging L1 and L2 according to their ranks.
   (d) De-normalize L.
   (e) Use Ri L to update the current-best join ordering.

The complexity of step 1 is O(n⁴), whereas the complexity of step 2 amounts to $\sum_{i=1}^{n}(O(i^2) + O((n-i)^2) + O(n)) = O(n^3)$. Hence, the time complexity of Algorithm CHAIN-II is O(n⁴).

Summarizing, we are now left with one algorithm that produces the optimal result but whose worst-case runtime behavior is unknown, and one algorithm with polynomial runtime whose result has not been proven optimal. Due to this lack of hard facts, Moerkotte and Scheufele ran about 700,000 experiments with random queries of sizes up to 30 relations, and fewer experiments with random queries of up to 300 relations, to compare the results of the two algorithms. For n ≤ 15, they additionally compared the results with a standard dynamic programming algorithm. Their findings can be summarized as follows.
- All algorithms yielded identical results.
- Backtracking always led to exactly one sequence of contradictory chains.
- In the overwhelming majority of cases, the first algorithm proved to be faster than the second.

Whereas the run time of the second algorithm is mainly determined by the number of relations in the query, the run time of the first also heavily depends on the number of existing optimal contradictory subchains. In the worst case, the first algorithm is slightly inferior to the second. Additionally, Hamalainen reports on an independent implementation of the second algorithm [394]. He could not find an example where the second algorithm did not produce the optimal result either. We encourage the reader to prove that it produces the optimal result.

3.2.8 Transformation-Based Approaches

The idea of transformation-based algorithms can be described as follows. Starting from an arbitrary join tree, equivalences (such as commutativity and associativity) are applied to it to derive a set of new join trees. For each of the new join trees, the equivalences are again applied to derive even more join trees. This procedure is repeated until no new join tree can be derived. It exhaustively enumerates the set of all bushy trees. Furthermore, before an equivalence is applied, it is difficult to see whether the resulting join tree has already been produced or not (see also Figure 2.6). Thus, this procedure is highly inefficient and plays no role in practice. Nevertheless, we give the pseudocode for it, since it forms the basis for several of the following algorithms. We split the exhaustive transformation approach into two algorithms: one that applies all equivalences to a given join tree (ApplyTransformations) and one that does the loop (ExhaustiveTransformation).

A transformation is applied in a directed way. Thus, we reformulate commutativity and associativity as rewrite rules, using ; to indicate the direction. The following table summarizes all rules commonly used in transformation-based and randomized join ordering algorithms. The first three are directly derived from the commutativity and associativity laws for the join. The other rules are shortcuts used under special circumstances. For example, left associativity may turn a left-deep tree into a bushy tree. When only left-deep trees are to be considered, we need a replacement for left associativity. This replacement is called left join exchange.

R1 ⋈ R2           ; R2 ⋈ R1            Commutativity
(R1 ⋈ R2) ⋈ R3    ; R1 ⋈ (R2 ⋈ R3)     Right Associativity
R1 ⋈ (R2 ⋈ R3)    ; (R1 ⋈ R2) ⋈ R3     Left Associativity
(R1 ⋈ R2) ⋈ R3    ; (R1 ⋈ R3) ⋈ R2     Left Join Exchange
R1 ⋈ (R2 ⋈ R3)    ; R2 ⋈ (R1 ⋈ R3)     Right Join Exchange

Two more rules are often used to transform left-deep trees. The first operation (swap) exchanges two arbitrary relations in a left-deep tree. The second operation (3Cycle) performs a cyclic rotation of three arbitrary relations in a left-deep tree. To account for different join methods, a rule called join method exchange is introduced.

The first rule set (RS-0) we use contains the commutativity rule and both associativity rules. Applying associativity can lead to cross products. If we do not want to consider cross products, we apply either of the two associativity rules only if the resulting expression does not contain a cross product.
It is easy to extend ApplyTransformations to cover this by extending the if conditions with

and (ConsiderCrossProducts || connected(·))

where the argument of connected is the result of applying the transformation.

ExhaustiveTransformation({R1, ..., Rn})
Input: a set of relations
Output: an optimal join tree

Let T be an arbitrary join tree for all relations
Done = ∅;     // contains all trees processed
ToDo = {T};   // contains all trees to be processed
while (!empty(ToDo)) {
  Let T be an arbitrary tree in ToDo
  ToDo \= T;
  Done ∪= T;
  Trees = ApplyTransformations(T);
  for all T ∈ Trees do {
    if (T ∉ ToDo ∪ Done) {
      ToDo += T;
    }
  }
}
return cheapest tree found in Done;

ApplyTransformations(T)
Input: a join tree
Output: all trees derivable by associativity and commutativity

Trees = ∅;
Subtrees = all subtrees of T rooted at inner nodes
for all S ∈ Subtrees do {
  if (S is of the form S1 ⋈ S2) {
    Trees += S2 ⋈ S1;
  }
  if (S is of the form (S1 ⋈ S2) ⋈ S3) {
    Trees += S1 ⋈ (S2 ⋈ S3);
  }
  if (S is of the form S1 ⋈ (S2 ⋈ S3)) {
    Trees += (S1 ⋈ S2) ⋈ S3;
  }
}
return Trees;

Besides the problems mentioned above, this algorithm also has the problem that sharing subtrees is a non-trivial task. In fact, we assume that ApplyTransformations produces modified copies of T. To see how ExhaustiveTransformation works, consider again Figure 2.6. Assume that the top-left join tree is the initial join tree. From this join tree, ApplyTransformations produces all trees reachable by some edge. All of these are then added to ToDo. The next call to ApplyTransformations with any of the produced join trees will have the initial join tree contained in Trees. The complete set of join trees visited after this step is determined from the initial join tree by following at most two edges.

Let us reformulate the algorithm such that it uses a data structure similar to that of dynamic programming or memoization in order to avoid duplicate work. For any subset of relations, dynamic programming remembers the best join tree. This does not quite suffice for the transformation-based approach. Instead, we have to keep all join trees generated so far, including those differing in the order of the arguments or in the join operator. However, subtrees can be shared. This is done by keeping pointers into the data structure (see below). So the difference between dynamic programming and the transformation-based approach becomes smaller. The main remaining difference is that dynamic programming considers only these best join trees, while with the transformation-based approach we have to keep all considered join trees, since other (more beneficial) join trees might be generated from them.

The data structure used for remembering trees is often called the MEMO structure. For every subset of relations to be joined (except the empty set), a class exists in the MEMO structure. Each class contains all the join trees that join exactly the relations describing the class. Here is an example for join trees containing three relations:

{R1, R2, R3}    {R1, R2} ⋈ R3, R3 ⋈ {R1, R2}, {R1, R3} ⋈ R2,
                R2 ⋈ {R1, R3}, {R2, R3} ⋈ R1, R1 ⋈ {R2, R3}
{R2, R3}        {R2} ⋈ {R3}, {R3} ⋈ {R2}
{R1, R3}        {R1} ⋈ {R3}, {R3} ⋈ {R1}
{R1, R2}        {R1} ⋈ {R2}, {R2} ⋈ {R1}
{R3}            R3
{R2}            R2
{R1}            R1

Here, we used the set notation {...} as an argument of a join to denote a reference to the class of join trees joining the relations contained in it.
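In code, the MEMO structure can be represented, for example, as follows (a sketch under our own naming; the text prescribes no concrete layout). Join trees reference their children by class, i.e., by the bitmask of the relation set, which gives exactly the subtree sharing described above. Adding a tree to a class amounts to a duplicate check against the class's existing trees followed by a push_back.

#include <cstdint>
#include <unordered_map>
#include <vector>

struct MemoTree {          // one join tree within a class
  uint32_t left, right;    // child classes as bitmasks; 0 marks a leaf
  bool explored;           // already processed by ApplyTransformations2?
  unsigned enabledRules;   // one bit per transformation (used below)
};

struct MemoClass {
  std::vector<MemoTree> trees;   // all alternatives joining this set
};

using Memo = std::unordered_map<uint32_t, MemoClass>;  // set -> class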
We reformulate our transformation-based algorithm such that it fills in and uses the MEMO structure [683]. In a first step, the MEMO structure is initialized by creating an arbitrary join tree for the class {R1, . . . , Rn} and then walking down this join tree, creating an entry for every join encountered. Then, we call ExploreClass on the root class comprising all relations to be joined. ExploreClass applies ApplyTransformations2 to every member of the class it is called upon, and ApplyTransformations2 applies all rules to generate alternatives.

ExhaustiveTransformation2(Query Graph G)
Input: a query specification for relations {R1, . . . , Rn}
Output: an optimal join tree
initialize MEMO structure
ExploreClass({R1, . . . , Rn})
return best of class {R1, . . . , Rn}

ExploreClass(C)
Input: a class C ⊆ {R1, . . . , Rn}
Output: none, but has side-effects on the MEMO structure
while (not all join trees in C have been explored) {
  choose an unexplored join tree T in C
  ApplyTransformations2(T)
  mark T as explored
}
return

ApplyTransformations2(T)
Input: a join tree of a class C
Output: none, but has side-effects on the MEMO structure
ExploreClass(left-child(T));
ExploreClass(right-child(T));
foreach transformation t and class member of child classes {
  foreach T′ resulting from applying t to T {
    if (T′ not in MEMO structure) {
      add T′ to class C of MEMO structure
    }
  }
}
return

ApplyTransformations2 uses a set of transformations to be applied. We now discuss the effect of different transformation sets on the complexity of the algorithm. Applying ExhaustiveTransformation2 with a rule set consisting of commutativity and left and right associativity generates 4^n − 3^(n+1) + 2^(n+2) − n − 2 duplicates for n relations. Contrast this with the number of join trees contained in a completely filled MEMO structure: 3^n − 2^(n+1) + n + 1. (The difference from the corresponding number for dynamic programming is due to the fact that we have to keep the alternatives generated by commutativity and that join trees for single relations are counted.) This clearly shows the problem.

The problem of generating the same join tree several times was considered by Pellenkoft, Galindo-Legaria, and Kersten [683, 684, 685]. The solution lies in parameterizing ExhaustiveTransformation2 with an appropriate set of transformations. The basic idea is to remember, for every join operator, which rules are applicable to it. For example, after applying commutativity to a join operator, we disable commutativity for the result. For acyclic queries, the following rule set guarantees that all bushy join trees are generated, but no duplicates [685]. Thereby, cross products are not considered; that is, a rule is only applicable if it does not result in a cross product. This restricts the applicability of the algorithm to connected queries. We use Ci to denote some class of the MEMO structure. We call the following rule set RS-1:

T1: Commutativity. C1 ⋈0 C2 ⇝ C2 ⋈1 C1.
    Disable all transformations T1, T2, and T3 for ⋈1.
T2: Right Associativity. (C1 ⋈0 C2) ⋈1 C3 ⇝ C1 ⋈2 (C2 ⋈3 C3).
    Disable transformations T2 and T3 for ⋈2 and enable all rules for ⋈3.
T3: Left Associativity. C1 ⋈0 (C2 ⋈1 C3) ⇝ (C1 ⋈2 C2) ⋈3 C3.
    Disable transformations T2 and T3 for ⋈3 and enable all rules for ⋈2.
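The per-operator bookkeeping that RS-1 requires can be sketched as follows (a hypothetical Python fragment of our own; the bit conventions mirror the RS-1 rules above):

    # Rule bits: index 0 = T1 (commutativity), 1 = T2 (right assoc.), 2 = T3 (left assoc.)
    ALL, NONE = (True, True, True), (False, False, False)

    class Join:
        def __init__(self, left, right, bits=ALL):
            self.left, self.right, self.bits = left, right, bits

    def commute(j):
        # T1: C1 join C2 ~> C2 join C1; disable T1, T2, T3 on the new operator
        if j.bits[0]:
            return Join(j.right, j.left, NONE)

    def assoc_right(j):
        # T2: (C1 join C2) join C3 ~> C1 join (C2 join C3); the new top join
        # disables T2 and T3, the new inner join enables all rules
        if j.bits[1] and isinstance(j.left, Join):
            return Join(j.left.left,
                        Join(j.left.right, j.right, ALL),
                        (True, False, False))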
[Figure 3.14: Example of rule transformations (RS-1). The figure shows the MEMO structure for the chain query R1 − R2 − R3 − R4: for every class, the join tree chosen during initialization and the join trees added in the numbered transformation steps 1-10, each join annotated with its rule-applicability bits.]

In order to be able to follow these rules, the procedure ApplyTransformations2 has to be enhanced so that it keeps track of the application history of the rules for every join operator. The additional memory requirement is negligible, since a single bit per rule suffices.

As an example, let us consider the chain query R1 − R2 − R3 − R4. Figure 3.14 shows the MEMO structure. The first column gives the sets of relations identifying each class. We leave out the single-relation classes, assuming that {Ri} has Ri as its only join tree, which is marked as explored. The second column shows the initialization with an arbitrarily chosen join tree. The third column is the one filled by the ApplyTransformations2 procedure. We apply the rule set RS-1, which consists of three transformations. Each join is annotated with three bits, where the i-th bit indicates whether Ti is applicable (1) or not (0).

After initializing the MEMO structure, ExhaustiveTransformation2 calls ExploreClass for {R1, R2, R3, R4}. The only (unexplored) join tree is {R1, R2} ⋈111 {R3, R4}, which becomes the argument of ApplyTransformations2. Next, ExploreClass is called on {R1, R2} and {R3, R4}. In both cases, T1 is the only applicable rule, and the results are shown in the third column under steps 1 and 2. Now we have to apply all transformations to {R1, R2} ⋈111 {R3, R4}. Commutativity (T1) gives us {R3, R4} ⋈000 {R1, R2} (step 3). For right associativity, we have two elements in class {R1, R2}. Substituting them and applying T2 gives

1. (R1 ⋈ R2) ⋈ {R3, R4} ⇝ R1 ⋈100 (R2 ⋈111 {R3, R4})
2. (R2 ⋈ R1) ⋈ {R3, R4} ⇝ R2 ⋈ (R1 × {R3, R4})

The latter contains a cross product. This leaves us with the former as the result of step 4. The right argument of the topmost join is R2 ⋈111 {R3, R4}. Since we do not find it in class {R2, R3, R4}, we add it (also step 4). T3 is next:

1. {R1, R2} ⋈ (R3 ⋈ R4) ⇝ ({R1, R2} ⋈111 R3) ⋈100 R4
2. {R1, R2} ⋈ (R4 ⋈ R3) ⇝ ({R1, R2} × R4) ⋈ R3

The latter contains a cross product. This leaves us with the former as the result of step 5. We also add {R1, R2} ⋈111 R3 to class {R1, R2, R3} (step 5).

Now that {R1, R2} ⋈111 {R3, R4} is completely explored, we turn to {R3, R4} ⋈000 {R1, R2}, but all transformations are disabled here. R1 ⋈100 {R2, R3, R4} is next. First, {R2, R3, R4} has to be explored. Its only entry is R2 ⋈111 {R3, R4}. Remember that {R3, R4} is already explored. T2 is not applicable. The other two transformations give us

T1: {R3, R4} ⋈000 R2
T3: (R2 ⋈000 R3) ⋈100 R4 and (R2 × R4) ⋈ R3

Those join trees not exhibiting a cross product are added to the MEMO structure under step 6. Applying commutativity to {R2, R3} ⋈100 R4 gives step 7. Commutativity is the only rule enabled for R1 ⋈100 {R2, R3, R4}; its application results in step 8. {R1, R2, R3} ⋈100 R4 is next.
It is simple to explore the class {R1, R2, R3} with its only entry {R1, R2} ⋈111 R3:

T1: R3 ⋈000 {R1, R2}
T2: R1 ⋈100 (R2 ⋈111 R3) and R2 ⋈100 (R1 × R3)

The latter contains a cross product and is discarded. Commutativity can still be applied to R1 ⋈100 (R2 ⋈111 R3). All the new entries are numbered 9. Commutativity is the only rule enabled for {R1, R2, R3} ⋈100 R4; its application results in step 10. □

The next two rule sets were originally intended for generating all bushy/left-deep trees for a clique query [684]. They can, however, also be used to generate all bushy trees when cross products are considered. The rule set RS-2 for bushy trees is:

T1: Commutativity. C1 ⋈0 C2 ⇝ C2 ⋈1 C1.
    Disable all transformations T1, T2, T3, and T4 for ⋈1.
T2: Right Associativity. (C1 ⋈0 C2) ⋈1 C3 ⇝ C1 ⋈2 (C2 ⋈3 C3).
    Disable transformations T2, T3, and T4 for ⋈2.
T3: Left Associativity. C1 ⋈0 (C2 ⋈1 C3) ⇝ (C1 ⋈2 C2) ⋈3 C3.
    Disable transformations T2, T3, and T4 for ⋈3.
T4: Exchange. (C1 ⋈0 C2) ⋈1 (C3 ⋈2 C4) ⇝ (C1 ⋈3 C3) ⋈4 (C2 ⋈5 C4).
    Disable all transformations T1, T2, T3, and T4 for ⋈4.

If we initialize the MEMO structure with left-deep trees, we can strip down the above rule set to commutativity and left associativity. The reason is an observation made by Shapiro et al.: from a left-deep join tree, we can generate all bushy trees with only these two rules [798].

If we want to consider only left-deep trees, the following rule set RS-3 is appropriate:

T1: Commutativity. R1 ⋈0 R2 ⇝ R2 ⋈1 R1.
    Here, the Ri are restricted to classes with exactly one relation. T1 is disabled for ⋈1.
T2: Right Join Exchange. (C1 ⋈0 C2) ⋈1 C3 ⇝ (C1 ⋈2 C3) ⋈3 C2.
    Disable T2 for ⋈3.

3.3 Probabilistic Algorithms

3.3.1 Generating Random Left-Deep Join Trees with Cross Products

The basic idea of the algorithms in this and the following sections is to generate a set of randomly chosen join trees, evaluate their costs, and return the best one. The problem with this approach lies in the random generation of join trees: every join tree has to be generated with equal probability. Although there are some advocates of the pure random approach [303, 304, 306, 302], typically a random join tree or a set of random join trees is used by subsequent algorithms like iterative improvement and simulated annealing.

Obviously, if we do not consider cross products, the problem is really hard, since the query graph plays an important role. So let us start with the simplest case, where random join trees are generated that may contain cross products even for connected query graphs. Then, any join tree is a valid join tree.

The general idea behind all these algorithms is the following. Assume that the number of join trees in the considered search space is known to be N. Then, instead of generating a random join tree directly, a bijective mapping from the interval of non-negative integers [0, N[ to the join trees in the search space is established. A random join tree can then be generated by (1) generating a random number in [0, N[ and (2) mapping the number to the join tree. The problem of bijectively mapping an interval of non-negative integers to the elements of a set is usually called unranking; the opposite mapping is called ranking. Obviously, the crux in our case is the efficiency of unranking.

We start with generating random left-deep join trees for n relations. This problem is identical to generating random permutations.
That is, we look for a fast unranking algorithm that maps the non-negative integers in [0, n![ to permutations. Let us consider permutations of the numbers {0, . . . , n − 1}. A mapping between these numbers and relations is easily established, e.g. via an array.

The traditional approach to ranking/unranking of permutations is to first define an ordering on the permutations and then find ranking and unranking algorithms relative to that ordering. For the lexicographic order, such algorithms require O(n²) time [556, 725]. More sophisticated algorithms separate ranking/unranking into two phases. For ranking, first the inversion vector of the permutation is established; then ranking takes place on the inversion vector. Unranking works in the opposite direction. The inversion vector of a permutation π = π_0, . . . , π_(n−1) is defined to be the sequence v = v_0, . . . , v_(n−1), where v_i is equal to the number of entries π_j with π_j > π_i and j < i. Inversion vectors uniquely determine a permutation [876]. However, naive algorithms following this approach again require O(n²) time; better algorithms require O(n log n). Using an elaborate data structure, Dietz's algorithm requires O((n log n)/(log log n)) [242]. Other orders, like the Steinhaus-Johnson-Trotter order, have been exploited for ranking/unranking but do not yield any run-time advantage over the above-mentioned algorithms (see [519, 725]).

Since it is not important for our problem that any order constraints be satisfied by the ranking/unranking functions, we use the fastest known algorithm, established by Myrvold and Ruskey [636]. It runs in O(n), which is easily seen to be a lower bound.

The algorithm is based on the standard algorithm to generate random permutations [224, 251, 630]. An array π is initialized such that π[i] = i for 0 ≤ i ≤ n − 1. Then, the loop

for (k = n − 1; k ≥ 0; −−k) swap(π[k], π[random(k)]);

is executed, where swap exchanges two elements and random(k) generates a random number in [0, k]. This algorithm randomly picks any of the possible permutations. Assume the random numbers produced by the algorithm are r_(n−1), . . . , r_0, where 0 ≤ r_i ≤ i. Obviously, there are exactly n(n − 1)(n − 2) · · · 1 = n! such sequences, and there is a one-to-one correspondence between these sequences and the set of all permutations.

We can thus unrank r ∈ [0, n![ by turning it into a unique sequence of values r_(n−1), . . . , r_0. Note that after executing the swap with r_(n−1), every value in [0, n[ is possible at position π[n−1]; further, π[n−1] is never touched again. Hence, we can unrank r as follows. We first set r_(n−1) = r mod n and perform the swap. Then, we define r′ = ⌊r/n⌋ and iteratively unrank r′ to construct a permutation of n − 1 elements. The following algorithm realizes this idea.

Unrank(n, r) {
  Input: the number n of elements to be permuted and the rank r of the permutation to be constructed
  Output: a permutation π
  for (i = 0; i < n; ++i) π[i] = i;
  Unrank-Sub(n, r, π);
  return π;
}

Unrank-Sub(n, r, π) {
  for (i = n; i > 0; −−i) {
    swap(π[i − 1], π[r mod i]);
    r = ⌊r/i⌋;
  }
}
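A direct transcription of Unrank/Unrank-Sub into runnable Python (a sketch; variable names are ours) looks as follows:

    def unrank_permutation(n, r):
        """Myrvold/Ruskey unranking: maps r in [0, n!) to a
        permutation of 0..n-1 in O(n)."""
        pi = list(range(n))
        for i in range(n, 0, -1):
            # r mod i plays the role of the random swap partner in the generator
            j = r % i
            pi[i - 1], pi[j] = pi[j], pi[i - 1]
            r //= i
        return pi

    # All 3! = 6 ranks yield 6 distinct permutations:
    perms = {tuple(unrank_permutation(3, r)) for r in range(6)}
    assert len(perms) == 6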
3.3.2 Generating Random Join Trees with Cross Products

Next, we want to randomly construct bushy plans possibly containing cross products. This is done in several steps:

1. Generate a random number b in [0, C(n − 1)[.
2. Unrank b to obtain a bushy tree with n − 1 inner nodes.
3. Generate a random number p in [0, n![.
4. Unrank p to obtain a permutation.
5. Attach the relations in order p from left to right as leaf nodes to the binary tree obtained in step 2.

The only step that we still have to discuss is step 2. It is a little involved, and we can only try to convey the general idea. For details, the reader is referred to the literature [556, 557, 558].

Consider Figure 3.15. It contains all 14 possible trees with four inner nodes, ordered according to the rank we will consider. While unranking, we do not generate the trees directly, but an encoding of the tree instead. This encoding works as follows. Any binary tree corresponds to a word in a Dyck language with one pair of parentheses; the alphabet hence consists of Σ = {'(', ')'}. For join trees with n inner nodes, we use Dyck words of length 2n whose parenthesization is correct, that is, every '(' has a subsequent ')'. From a given join tree, we obtain the Dyck word by a preorder traversal: whenever we encounter an inner node, we encode this with a '('; all but the last leaf node are encoded by a ')'. Concatenating all these encodings gives us a Dyck word of length 2n. Figure 3.15 lists, for each tree, its rank, its Dyck word, the same word with '(' written as '1' and ')' as '0', and the encoding generated by the unranking algorithm: the positions (indices in the bit-string) of the '1's.

rank  Dyck word  bit-string  positions of the 1s
 0    (((())))   11110000    1, 2, 3, 4
 1    ((()()))   11101000    1, 2, 3, 5
 2    ((())())   11100100    1, 2, 3, 6
 3    ((()))()   11100010    1, 2, 3, 7
 4    (()(()))   11011000    1, 2, 4, 5
 5    (()()())   11010100    1, 2, 4, 6
 6    (()())()   11010010    1, 2, 4, 7
 7    (())(())   11001100    1, 2, 5, 6
 8    (())()()   11001010    1, 2, 5, 7
 9    ()((()))   10111000    1, 3, 4, 5
10    ()(()())   10110100    1, 3, 4, 6
11    ()(())()   10110010    1, 3, 4, 7
12    ()()(())   10101100    1, 3, 5, 6
13    ()()()()   10101010    1, 3, 5, 7

Figure 3.15: Encoding trees

In order to do the unranking, we need to do some counting. Therefore, we map Dyck words to paths in a triangular grid; for n = 4, this grid is shown in Figure 3.16. We always start at (0, 0), which means that we have not opened any parenthesis. When we are at (i, j), opening a parenthesis corresponds to going to (i + 1, j + 1) and closing a parenthesis to going to (i + 1, j − 1). We have thus established a bijective mapping between Dyck words and paths in the grid, so counting Dyck words corresponds to counting paths.

[Figure 3.16: Paths. The triangular grid for n = 4, with each node (i, j) annotated by p(i, j) and the downward edges annotated with their rank intervals [0, 0], [1, 4[, [4, 9[, and [9, 14[.]

The number of different paths from (0, 0) to (i, j) can be computed by

p(i, j) = (j + 1)/(i + 1) · binom(i + 1, (i + j)/2 + 1)

These numbers are the Ballot numbers [131]. The number of paths from (i, j) to (2n, 0) can then be computed as (see [557, 558])

q(i, j) = p(2n − i, j)

Note the special case q(0, 0) = p(2n, 0) = C(n). In Figure 3.16, we annotated the nodes (i, j) with p(i, j). These numbers can be used to assign (sub-)intervals to paths (Dyck words, trees). For example, if we are at (4, 4), there exists only a single path to (2n, 0); hence, the path that travels the edge (4, 4) → (5, 3) has rank 0. From (3, 3) there are four paths to (2n, 0), one of which we have already considered.
This leaves us with three paths that travel the edge (3, 3) → (4, 2); the paths in this part are assigned ranks in the interval [1, 4[. Figure 3.16 shows the intervals near the edges.

For unranking, we can now proceed as follows. Assume we have a rank r. We keep opening parentheses (going from (i, j) to (i + 1, j + 1)) as long as the number of paths from the node reached this way still exceeds r. If it does not, we subtract this number of paths from r and close a parenthesis instead (go from (i, j) to (i + 1, j − 1)). From there, we proceed iteratively: we again open parentheses as long as possible and close one otherwise. Remembering the positions of the parentheses opened along our way results in the required encoding. The following algorithm finalizes these ideas.

UnrankTree(n, r)
Input: a number of inner nodes n and a rank r ∈ [0, C(n)[
Output: encoding of the inner nodes of a tree
lNoParOpen = 0;
lNoParClose = 0;
i = 1; // current position in the Dyck word
j = 0; // current position in the encoding array
while (j < n) {
  k = q(lNoParOpen + lNoParClose + 1, lNoParOpen − lNoParClose + 1);
  if (k ≤ r) {
    r −= k;
    ++lNoParClose;
  } else {
    aTreeEncoding[j++] = i;
    ++lNoParOpen;
  }
  ++i;
}

Given an array with the encoding of a tree, it is easy to construct the tree from it. The following procedure does so.

TreeEncoding2Tree(n, aEncoding) {
Input: the number n of internal nodes of the tree and its encoding
Output: root node of the result tree
root = new Node; /* root of the result tree */
curr = root;     /* current internal node whose subtrees are to be created */
i = 1;           /* pointer to the current entry in the encoding */
child = 0;       /* next child whose subtree is to be created: 0 = left, 1 = right */
while (i < n) {
  lDiff = aEncoding[i] − aEncoding[i − 1];
  for (k = 1; k < lDiff; ++k) {
    if (child == 0) {
      curr->addLeftLeaf();
      child = 1;
    } else {
      curr->addRightLeaf();
      while (curr->right() != 0) {
        curr = curr->parent();
      }
      child = 1;
    }
  }
  if (child == 0) {
    curr->left(new Node(curr)); // curr becomes the parent of the new node
    curr = curr->left();
    ++i;
    child = 0;
  } else {
    curr->right(new Node(curr));
    curr = curr->right();
    ++i;
    child = 0;
  }
}
while (curr != 0) {
  curr->addLeftLeaf();  // addLeftLeaf adds a leaf if no left child exists
  curr->addRightLeaf(); // analogous
  curr = curr->parent();
}
return root;
}
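The counting and unranking steps can be sketched in runnable Python as follows (our own rendering; math.comb requires Python 3.8+):

    from math import comb

    def p(i, j):
        """Number of grid paths from (0, 0) to (i, j) (the Ballot numbers)."""
        if j < 0 or j > i or (i + j) % 2:
            return 0
        return (j + 1) * comb(i + 1, (i + j) // 2 + 1) // (i + 1)

    def unrank_tree_encoding(n, r):
        """Map a rank r in [0, C(n)) to the '(' positions of a Dyck word of length 2n."""
        def q(i, j):          # paths from (i, j) to (2n, 0)
            return p(2 * n - i, j)
        opened = closed = 0
        encoding = []
        i = 1                 # current position in the Dyck word
        while len(encoding) < n:
            k = q(opened + closed + 1, opened - closed + 1)  # trees if we open at i
            if k <= r:
                r -= k        # skip all words that open a parenthesis here
                closed += 1
            else:
                encoding.append(i)
                opened += 1
            i += 1
        return encoding

    # The first and last of the 14 trees with n = 4 from Figure 3.15:
    assert unrank_tree_encoding(4, 0) == [1, 2, 3, 4]
    assert unrank_tree_encoding(4, 13) == [1, 3, 5, 7]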
3.3.3 Generating Random Join Trees without Cross Products

A general solution for randomly generating join trees without cross products is not known. However, if we restrict ourselves to acyclic queries, we can apply an algorithm developed by Galindo-Legaria, Pellenkoft, and Kersten [304, 303, 306]. For this algorithm to work, we have to assume that the query graph is connected and acyclic. For the rest of this section, we assume that G = (V, E) is the query graph and |V| = n, i.e. n relations are to be joined, and no join tree contains a cross product.

With every node in a join tree, we associate a level: the root has level 0, its children have level 1, and so on. We further use lower-case letters for relations. For a given query graph G, we denote by T_G the set of join trees for G. Let T_G^v(k) ⊆ T_G be the subset of join trees in which the leaf node (i.e. relation) v occurs at level k. Some trivial observations follow. If the query graph consists of a single node (n = 1), then |T_G| = |T_G^v(0)| = 1. If n > 1, the top node in a join tree is a join and not a relation; hence |T_G^v(0)| = 0. Obviously, the maximum level that can occur in any join tree is n − 1; hence |T_G^v(k)| = 0 if k ≥ n. Since the level at which a leaf node v occurs in a join tree is unique, we have T_G = ∪_{k=0}^{n−1} T_G^v(k) and T_G^v(i) ∩ T_G^v(j) = ∅ for i ≠ j. This gives us |T_G| = Σ_{k=0}^{n−1} |T_G^v(k)|.

The algorithm generates an unordered tree with n leaf nodes. If we wish to have a random ordered tree, we have to pick one of the 2^(n−1) possibilities to order the (n − 1) joins within the tree.

We proceed as follows. We start with some notation for lists, discuss how two lists can be merged, describe how a specific merge can be specified, and count the number of possible merges. This is important, since join trees will be described as lists of trees: given a leaf node v, we traverse the path from the root to v and collect the subtrees that branch off into a list of trees. After these remarks, we develop the algorithm in several steps. First, we consider two operations with which we can construct new join trees: leaf-insertion introduces a new leaf node into a given tree, and tree-merging merges two join trees. Since we do not want to generate cross products in this section, we have to apply these operations carefully; therefore, we need a description of how to generate all valid join trees for a given query graph. The central data structure for this purpose is the standard decomposition graph (SDG). Hence, in the second step, we define SDGs and introduce an algorithm that derives an SDG from a given query graph. In the third step, we start counting. The fourth and final step consists of the unranking algorithm. We do not discuss the ranking algorithm; it can be found in [306].

[Figure 3.17: Tree-merge. Three different merges of two join trees R = (L_R, v) and S = (L_S, v) on their common leaf v, specified by (R, S, [1, 1, 0]), (R, S, [2, 0, 0]), and (R, S, [0, 2, 0]).]

We use the Prolog notation | to separate the first element of a list from its tail; for example, the list ⟨a|t⟩ has a as its first element and t as its tail. Assume that P is a property of elements. A list L′ is the projection of a list L on P if L′ contains exactly the elements of L satisfying the property P, with their order retained. A list L is a merge of two disjoint lists L1 and L2 if L contains all elements from L1 and L2 and both are projections of L.

A merge of a list L1 with a list L2, whose respective lengths are l1 and l2, can be described by an array α = [α_0, . . . , α_l2] of non-negative integers whose sum is equal to l1. The non-negative integer α_(i−1) gives the number of elements of L1 that precede the i-th element of L2 in the merged list. We obtain the merged list L by first taking α_0 elements from L1, followed by an element from L2, then α_1 elements from L1 and the next element of L2, and so on; finally follow the last α_l2 elements of L1. Figure 3.17 illustrates possible merges.

Compare list merges to the problem of non-negative (weak) integer composition [?]. There, we ask for the number of compositions of a non-negative integer n into k non-negative integers α_i with Σ_{i=1}^{k} α_i = n. The answer is binom(n + k − 1, k − 1) [829]. Since we have to decompose l1 into l2 + 1 non-negative integers, the number of possible merges is M(l1, l2) = binom(l1 + l2, l2). The observation M(l1, l2) = M(l1 − 1, l2) + M(l1, l2 − 1) allows us to construct an array of size n × n in O(n²) that materializes the values for M. This array will allow us to rank list merges in O(l1 + l2).
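Both the materialization of M via the recurrence and the application of a merge specification α are easily made concrete; the following is a small Python sketch of our own (merge_counts and apply_merge are hypothetical names):

    def merge_counts(n):
        """Table M[l1][l2] = number of merges of lists of lengths l1 and l2,
        computed via M(l1, l2) = M(l1 - 1, l2) + M(l1, l2 - 1)."""
        M = [[1] * (n + 1) for _ in range(n + 1)]
        for l1 in range(1, n + 1):
            for l2 in range(1, n + 1):
                M[l1][l2] = M[l1 - 1][l2] + M[l1][l2 - 1]
        return M

    def apply_merge(L1, L2, alpha):
        """Merge L1 and L2: alpha[i] elements of L1 precede the (i+1)-th element of L2."""
        out, pos = [], 0
        for i, a in enumerate(alpha[:-1]):
            out.extend(L1[pos:pos + a]); pos += a
            out.append(L2[i])
        out.extend(L1[pos:])
        return out

    M = merge_counts(8)
    assert M[4][4] == 70                                   # binom(8, 4)
    assert apply_merge("abc", "XY", [1, 1, 1]) == list("aXbYc")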
The idea for establishing a bijection between [1, M(l1, l2)] and the possible αs is a general one and is used for all subsequent algorithms of this section. Assume that we want to rank the elements of some set S, and S = ∪_{i=0}^{n} S_i is partitioned into disjoint sets S_i. If we want to rank x ∈ S_k, we first find the local rank of x within S_k. The rank of x is then defined as

rank(x) = Σ_{i=0}^{k−1} |S_i| + local-rank(x, S_k)

To unrank some number r ∈ [1, N], we first find

k = min_j ( r ≤ Σ_{i=0}^{j} |S_i| )

Then, we proceed by unranking with the new local rank

r′ = r − Σ_{i=0}^{k−1} |S_i|

within S_k.

Accordingly, we partition the set of all possible merges into subsets, each determined by α_0. For example, the set of possible merges of two lists L1 and L2 with lengths l1 = l2 = 4 is partitioned into subsets with α_0 = j for 0 ≤ j ≤ 4; the partition with α_0 = l1 − k contains M(k, l2 − 1) elements. To unrank a number r ∈ [1, M(l1, l2)], we first determine the partition by computing k = min_j ( r ≤ Σ_{i=0}^{j} M(i, l2 − 1) ). Then α_0 = l1 − k. With the new rank r′ = r − Σ_{i=0}^{k−1} M(i, l2 − 1), we start all over, iteratively. The following table gives the numbers for our example and can be used to follow the unranking algorithm; the algorithm itself can be found in Figure 3.18.

k   α_0   (k, l2 − 1)   M(k, l2 − 1)   rank interval
0    4      (0, 3)           1           [1, 1]
1    3      (1, 3)           4           [2, 5]
2    2      (2, 3)          10           [6, 15]
3    1      (3, 3)          20           [16, 35]
4    0      (4, 3)          35           [36, 70]

UnrankDecomposition(r, l1, l2)
Input: a rank r, two list sizes l1 and l2
Output: a merge specification α
for (i = 0; i ≤ l2; ++i) {
  alpha[i] = 0;
}
i = k = 0;
while (l1 > 0 && l2 > 0) {
  m = M(k, l2 − 1);
  if (r ≤ m) {
    alpha[i++] = l1 − k;
    l1 = k;
    k = 0;
    −−l2;
  } else {
    r −= m;
    ++k;
  }
}
alpha[i] = l1;
return alpha;

Figure 3.18: Algorithm UnrankDecomposition

We now turn to the anchored list representation of join trees.

Definition 3.3.1 Let T be a join tree and v a leaf of T. The anchored list representation L of T is constructed as follows:

• If T consists of the single leaf node v, then L = ⟨⟩.
• If T = (T1 ⋈ T2) and, without loss of generality, v occurs in T2, then L = ⟨T1|L2⟩, where L2 is the anchored list representation of T2.

We then write T = (L, v).

Observe that if T = (L, v) ∈ T_G, then T ∈ T_G^v(k) ⇔ |L| = k.

[Figure 3.19: Leaf-insertion. A new leaf v is inserted into a tree T at levels 1, 2, or 3, corresponding to the insertion pairs (T, 1), (T, 2), and (T, 3).]

The operation leaf-insertion is illustrated in Figure 3.19: a new leaf v is inserted into the tree at level k. Formally, it is defined as follows.

[Figure 3.20: A query graph, its tree rooted at e, and its standard decomposition graph. The SDG nodes are adorned with their count arrays, e.g. [0, 5, 5, 5, 3] at the root +e, [0, 0, 2, 3] at ∗c, and [0, 1, 1], [0, 1], and [1] further down.]

Definition 3.3.2 Let G = (V, E) be a query graph and T a join tree of G. Let v ∈ V be such that G′ = G|_(V\{v}) is connected, (v, w) ∈ E, and 1 ≤ k < n. Further let

T  = (⟨T1, . . . , T_(k−1), v, T_(k+1), . . . , T_n⟩, w)    (3.10)
T′ = (⟨T1, . . . , T_(k−1), T_(k+1), . . . , T_n⟩, w)       (3.11)

Then we call (T′, k) an insertion pair on v and say that T is decomposed into (or constructed from) the pair (T′, k) on v.

Observe that leaf-insertion defines a bijective mapping between T_G^v(k) and the insertion pairs (T′, k) on v, where T′ is an element of the disjoint union ∪_{i=k−1}^{n−2} T_G′^w(i).

The operation tree-merging is illustrated in Figure 3.17. Two trees R = (L_R, w) and S = (L_S, w) on a common leaf w are merged by merging their anchored list representations.
Definition 3.3.3 Let G = (V, E) be a query graph, w ∈ V, T = (L, w) a join tree of G, and V1, V2 ⊆ V such that G1 = G|_V1 and G2 = G|_V2 are connected, V1 ∪ V2 = V, and V1 ∩ V2 = {w}. For i = 1, 2:

• define the property P_i to be "every leaf of the subtree is in V_i",
• let L_i be the projection of L on P_i, and
• T_i = (L_i, w).

Let α be the integer composition such that L is the result of merging L1 and L2 on α. Then we call (T1, T2, α) a merge triplet. We say that T is decomposed into (constructed from) (T1, T2, α) on V1 and V2.

Observe that the tree-merging operation defines a bijective mapping between T_G^w(k) and merge triplets (T1, T2, α), where T1 ∈ T_G1^w(i), T2 ∈ T_G2^w(k−i), and α specifies a merge of two lists of sizes i and k − i. Further, the number of these merges (i.e. the number of possibilities for α) is binom(i + (k − i), k − i) = binom(k, i).

A standard decomposition graph of a query graph describes the possible constructions of join trees. It is not unique (for n > 1), but any one can be used to construct all possible unordered join trees. For each of our two operations, it has one kind of inner node: a unary node labeled +_v stands for leaf-insertion of v, and a binary node labeled ∗_w stands for tree-merging its subtrees, whose only common leaf is w.

The standard decomposition graph of a query graph G = (V, E) is constructed in three steps:

1. pick an arbitrary node r ∈ V as the root node;
2. transform G into a tree G′ by directing all edges away from r;
3. call QG2SDG(G′, r), with

QG2SDG(G′, v)
Input: a query tree G′ = (V, E) and a node v
Output: a standard decomposition (sub)tree for v
Let {w1, . . . , wn} be the children of v;
switch (n) {
  case 0: label v with "v";
  case 1: label v as "+_v";
          QG2SDG(G′, w1);
  otherwise: label v as "∗_v";
          create new nodes l, r with label +_v;
          E \= {(v, w_i) | 1 ≤ i ≤ n};
          E ∪= {(v, l), (v, r), (l, w1)} ∪ {(r, w_i) | 2 ≤ i ≤ n};
          QG2SDG(G′, l);
          QG2SDG(G′, r);
}
return G′;

Note that QG2SDG transforms the original graph G′ into its SDG by side-effects; thereby, the n-ary tree is transformed into a binary tree, similar to the procedure described by Knuth [504, Chap. 2.3.2]. Figure 3.20 shows a query graph G, its tree G′ rooted at e, and its standard decomposition graph.

For efficient access to the number of join trees in some partition T_G^v(k) in the unranking algorithm, we materialize these numbers in the count array. The semantics of a count array [c_0, c_1, . . . , c_n] of a node u with label ◦_v (◦ ∈ {+, ∗}) of the SDG is that u can construct c_i different trees in which leaf v is at level i. The total number of trees for a query can then be computed by summing up all the c_i in the count array of the root node of the decomposition tree.

To compute the count and an additional summands adornment of a node labeled +_v, we use the following lemma.

Lemma 3.3.4 Let G = (V, E) be a query graph with n nodes, v ∈ V such that G′ = G|_(V\{v}) is connected, (v, w) ∈ E, and 1 ≤ k < n. Then

|T_G^v(k)| = Σ_{i ≥ k−1} |T_G′^w(i)|

This lemma follows from the observation made after the definition of the leaf-insertion operation.

The sets T_G′^w(i) used in the summands of Lemma 3.3.4 directly correspond to subsets T_G^v(k),i (k − 1 ≤ i ≤ n − 2), defined such that T ∈ T_G^v(k),i if

1. T ∈ T_G^v(k),
2. the insertion pair on v of T is (T′, k), and
3. T′ ∈ T_G′^w(i).

Further, |T_G^v(k),i| = |T_G′^w(i)|. For efficiency, we materialize the summands in an array of arrays called summands.
To compute the count and summands adornments of a node labeled ∗_v, we use the following lemma.

Lemma 3.3.5 Let G = (V, E) be a query graph, v ∈ V, T = (L, v) a join tree of G, and V1, V2 ⊆ V such that G1 = G|_V1 and G2 = G|_V2 are connected, V1 ∪ V2 = V, and V1 ∩ V2 = {v}. Then

|T_G^v(k)| = Σ_i binom(k, i) |T_G1^v(i)| |T_G2^v(k−i)|

This lemma follows from the observation made after the definition of the tree-merge operation.

The sets T_Gj^v(i) used in the summands of Lemma 3.3.5 directly correspond to subsets T_G^v(k),i (0 ≤ i ≤ k), defined such that T ∈ T_G^v(k),i if

1. T ∈ T_G^v(k),
2. the merge triplet on V1 and V2 of T is (T1, T2, α), and
3. T1 ∈ T_G1^v(i).

Further, |T_G^v(k),i| = binom(k, i) |T_G1^v(i)| |T_G2^v(k−i)|.

Before we come to the algorithm for computing the adornments count and summands, let us make one observation that follows directly from the above two lemmata. Assume a node v whose count array is [c_0, . . . , c_m] and whose summands array is s = [s^0, . . . , s^m] with s^i = [s^i_0, . . . , s^i_m]; then c_i = Σ_{j=0}^{m} s^i_j holds.

Figure 3.21 contains the algorithm that adorns the SDG's nodes with count and summands; it has worst-case complexity O(n³). Figure 3.20 shows the count adornments for the SDG. Looking at the count array of the root node, we see that the total number of join trees for our example query graph is 18.

The algorithm UnrankLocalTreeNoCross, called by UnrankTreeNoCross, adorns the standard decomposition graph with insert-at and merge-using annotations. These can then be used to extract the join tree.

Adorn(v)
Input: a node v of the SDG
Output: v and the nodes below it are adorned with count and summands
Let {w1, . . . , wn} be the children of v;
switch (n) {
  case 0: count(v) := [1]; // no summands for v
  case 1:
    Adorn(w1);
    assume count(w1) = [c¹_0, . . . , c¹_m1];
    count(v) = [0, c_1, . . . , c_(m1+1)] where c_k = Σ_{i=k−1}^{m1} c¹_i;
    summands(v) = [s^0, . . . , s^(m1+1)] where s^k = [s^k_0, . . . , s^k_m1] and
      s^k_i = c¹_i if 0 < k and k − 1 ≤ i, and s^k_i = 0 otherwise;
  case 2:
    Adorn(w1);
    Adorn(w2);
    assume count(w1) = [c¹_0, . . . , c¹_m1];
    assume count(w2) = [c²_0, . . . , c²_m2];
    count(v) = [c_0, . . . , c_(m1+m2)] where
      c_k = Σ_{i=0}^{m1} binom(k, i) c¹_i c²_(k−i); // c²_i = 0 for i ∉ {0, . . . , m2}
    summands(v) = [s^0, . . . , s^(m1+m2)] where s^k = [s^k_0, . . . , s^k_m1] and
      s^k_i = binom(k, i) c¹_i c²_(k−i) if 0 ≤ k − i ≤ m2, and s^k_i = 0 otherwise;
}

Figure 3.21: Algorithm Adorn

UnrankTreeNoCross(r, v)
Input: a rank r and the root v of the SDG
Output: adorned SDG
let count(v) = [x_0, . . . , x_m];
k := min_j ( r ≤ Σ_{i=0}^{j} x_i ); // efficiency: binary search on materialized sums
r′ := r − Σ_{i=0}^{k−1} x_i;
UnrankLocalTreeNoCross(v, r′, k);

The following table shows the intervals associated with the partitions T_G^e(k) for the standard decomposition graph in Figure 3.20:

Partition   Interval
T_G^e(1)    [1, 5]
T_G^e(2)    [6, 10]
T_G^e(3)    [11, 15]
T_G^e(4)    [16, 18]

The unranking procedure makes use of unranking decompositions and unranking triplets. For the latter, given X, Y, Z, we need to assign each member of {(x, y, z) | 1 ≤ x ≤ X, 1 ≤ y ≤ Y, 1 ≤ z ≤ Z} a unique number in [1, XYZ] and base an unranking algorithm on this assignment. We leave this as a simple exercise to the reader and call the function UnrankTriplet(r, X, Y, Z). Here, r is a rank, and X, Y, and Z are the upper bounds for the numbers in the triplets.
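One possible solution to this exercise is a mixed-radix decomposition; the following Python sketch (our own, one of several possible conventions) yields a bijection:

    def unrank_triplet(r, X, Y, Z):
        """Map r in [1, X*Y*Z] to a unique triplet (x, y, z) with
        1 <= x <= X, 1 <= y <= Y, 1 <= z <= Z."""
        r -= 1                        # switch to 0-based ranks
        x, r = r % X + 1, r // X
        y, z = r % Y + 1, r // Y + 1
        return x, y, z

    # All X*Y*Z ranks yield distinct triplets:
    assert len({unrank_triplet(r, 2, 3, 4) for r in range(1, 25)}) == 24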
The code for unranking looks as follows:

UnrankLocalTreeNoCross(v, r, k)
Input: an SDG node v, a rank r, a number k identifying a partition
Output: adornments of the SDG as a side-effect
Let {w1, . . . , wn} be the children of v
switch (n) {
  case 0:
    assert(r = 1 && k = 0);
    // no additional adornment for v
  case 1:
    let count(v) = [c_0, . . . , c_n];
    let summands(v) = [s^0, . . . , s^n];
    assert(k ≤ n && r ≤ c_k);
    k1 = min_j ( r ≤ Σ_{i=0}^{j} s^k_i );
    r1 = r − Σ_{i=0}^{k1−1} s^k_i;
    insert-at(v) = k;
    UnrankLocalTreeNoCross(w1, r1, k1);
  case 2:
    let count(v) = [c_0, . . . , c_n];
    let summands(v) = [s^0, . . . , s^n];
    let count(w1) = [c¹_0, . . . , c¹_n1];
    let count(w2) = [c²_0, . . . , c²_n2];
    assert(k ≤ n && r ≤ c_k);
    k1 = min_j ( r ≤ Σ_{i=0}^{j} s^k_i );
    q = r − Σ_{i=0}^{k1−1} s^k_i;
    k2 = k − k1;
    (r1, r2, a) = UnrankTriplet(q, c¹_k1, c²_k2, binom(k, k1));
    α = UnrankDecomposition(a, k1, k2);
    merge-using(v) = α;
    UnrankLocalTreeNoCross(w1, r1, k1);
    UnrankLocalTreeNoCross(w2, r2, k2);
}

3.3.4 Quick Pick

The QuickPick algorithm of Waas and Pellenkoft [904, 905] does not generate random join trees in the strong sense, but it comes close, and it is far easier to implement and more broadly applicable. The idea is to repeatedly select a random edge of the query graph and to extend the join tree under construction by the corresponding join.

QuickPick(Query Graph G)
Input: a query graph G = ({R1, . . . , Rn}, E)
Output: a bushy join tree
BestTreeFound = any join tree
while (stopping criterion not fulfilled) {
  E′ = E;
  Trees = {R1, . . . , Rn};
  while (|Trees| > 1) {
    choose e ∈ E′;
    E′ −= e;
    if (e connects two relations in different subtrees T1, T2 ∈ Trees) {
      Trees −= T1;
      Trees −= T2;
      Trees += CreateJoinTree(T1, T2);
    }
  }
  Tree = the single tree contained in Trees;
  if (cost(Tree) < cost(BestTreeFound)) {
    BestTreeFound = Tree;
  }
}
return BestTreeFound

3.3.5 Iterative Improvement

Swami and Gupta [860], Swami [859], and Ioannidis and Kang [452] applied the idea of iterative improvement to join ordering. The idea is to start from a random plan and then to apply randomly selected transformations from a rule set, accepting each one only if it improves the current join tree, until no further improvement is possible.

IterativeImprovementBase(Query Graph G)
Input: a query graph G = ({R1, . . . , Rn}, E)
Output: a join tree
do {
  JoinTree = random join tree
  JoinTree = IterativeImprovement(JoinTree)
  if (cost(JoinTree) < cost(BestTree)) {
    BestTree = JoinTree;
  }
} while (time limit not exceeded)
return BestTree

IterativeImprovement(JoinTree)
Input: a join tree
Output: improved join tree
do {
  JoinTree′ = randomly apply a transformation to JoinTree;
  if (cost(JoinTree′) < cost(JoinTree)) {
    JoinTree = JoinTree′;
  }
} while (local minimum not reached)
return JoinTree

The number of variants of iterative improvement is large. The first parameter is the rule set used. To restrict the search to left-deep trees, a rule set consisting of swap and 3Cycle is appropriate [860]. If we consider bushy trees, a complete set consisting of commutativity, associativity, left join exchange, and right join exchange makes sense. This rule set (proposed by Ioannidis and Kang) is appropriate to explore the whole space of bushy join trees. A second parameter is how to determine whether the local minimum has been reached. Considering all possible neighbor states of a join tree is expensive; therefore, sometimes only a subset of size k is considered, where, for example, k can be limited to the number of edges in the query graph [860].
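The inner loop of QuickPick can be sketched in Python as follows (our own sketch: the stopping criterion is simplified to a fixed number of rounds, and CreateJoinTree is reduced to building a nested pair):

    import random

    def quick_pick(relations, edges, cost, rounds=100):
        """QuickPick sketch: repeatedly consume random query-graph edges and
        join the two subtrees their endpoints belong to; keep the cheapest
        result. edges is a list of relation pairs, cost a function on trees."""
        best = None
        for _ in range(rounds):
            trees = {r: r for r in relations}   # relation -> root of its subtree
            remaining = edges[:]
            random.shuffle(remaining)           # random edge order, no repeats
            for u, v in remaining:
                tu, tv = trees[u], trees[v]
                if tu is not tv:                # endpoints in different subtrees
                    joined = (tu, tv)
                    for r, t in trees.items():
                        if t is tu or t is tv:
                            trees[r] = joined
            tree = next(iter(trees.values()))
            if best is None or cost(tree) < cost(best):
                best = tree
        return best

For a connected query graph, every round consumes enough edges to leave a single bushy tree, so the function always returns a complete plan.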
3.3.6 Simulated Annealing

Iterative improvement suffers from the drawback that it only applies a move if it improves the current plan, which means that one often gets stuck in a local minimum. Simulated annealing tries to avoid this problem by also allowing moves that result in more expensive plans [457, 452, 860]. However, not every such plan is accepted: only those whose cost increase does not exceed a certain limit are considered, and over time this limit decreases. This general idea is cast into the notions of a temperature and of the probability of performing a selected transformation. A generic formulation of simulated annealing looks as follows:

SimulatedAnnealing(Query Graph G)
Input: a query graph G = ({R1, . . . , Rn}, E)
Output: a join tree
BestTreeSoFar = random tree;
Tree = BestTreeSoFar;
do {
  do {
    Tree′ = apply random transformation to Tree;
    if (cost(Tree′) < cost(Tree)) {
      Tree = Tree′;
    } else {
      with probability e^(−(cost(Tree′)−cost(Tree))/temperature)
        Tree = Tree′;
    }
    if (cost(Tree) < cost(BestTreeSoFar)) {
      BestTreeSoFar = Tree;
    }
  } while (equilibrium not reached)
  reduce temperature;
} while (not frozen)
return BestTreeSoFar

Besides the rule set used, the initial temperature, the temperature reduction, and the definitions of equilibrium and frozen determine the algorithm's behavior. For each of them, several alternatives have been proposed in the literature. The starting temperature can be calculated as follows: determine the standard deviation σ of costs by sampling and multiply it by a constant value (20 in [860]). Alternatives are to set the starting temperature to twice the cost of the first randomly selected join tree [452], or to choose it such that at least 40% of all possible transformations are accepted [834]. For temperature reduction, we can apply the formula temp *= 0.975 [452] or max(0.5, e^(−λt/σ)) [860]. The equilibrium is defined to be reached if, for example, the cost distribution of the generated solutions is sufficiently stable [860], the number of iterations is sixteen times the number of relations in the query [452], or the number of iterations equals the number of relations in the query [834]. We can establish frozenness if the difference between the maximum and minimum costs among all accepted join trees at the current temperature equals the maximum change in cost of any accepted move at the current temperature [860], if the current solution could not be improved in four outer-loop iterations and the temperature has fallen below one [452], or if the current solution could not be improved in five outer-loop iterations and less than two percent of the generated moves were accepted [834]. Considering that databases are used in mission-critical applications: would you bet your business on these numbers?
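The acceptance test at the heart of the loop, together with one of the quoted cooling schedules, can be written down in a few lines of Python (a minimal sketch of our own, assuming costs are plain floats):

    import math, random

    def accept(old_cost, new_cost, temperature):
        """Metropolis acceptance test used in the simulated annealing loop."""
        if new_cost < old_cost:
            return True
        return random.random() < math.exp(-(new_cost - old_cost) / temperature)

    # One of the quoted schedules: start at twice the cost of the initial
    # tree and multiply the temperature by 0.975 per outer iteration.
    temperature = 2 * 1000.0
    for _ in range(10):
        temperature *= 0.975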
3.3.7 Tabu Search

Morzy, Matyasiak, and Salza applied tabu search to join ordering [629]. The general idea is that, among all neighbors reachable via the transformations, only the cheapest is considered, even if its costs are higher than the costs of the current join tree. In order to avoid running into cycles, a tabu set is maintained: it contains the join trees generated last, and the algorithm is not allowed to visit them again. This way, it can escape local minima, since eventually all nodes in the valley of a local minimum will be in the tabu set. The stopping condition could be that there was no improvement over the current best solution during a given number of iterations, or that the set of neighbors minus the tabu set is empty (line (*)). Tabu search looks as follows:

TabuSearch(Query Graph G)
Input: a query graph G = ({R1, . . . , Rn}, E)
Output: a join tree
Tree = random join tree;
BestTreeSoFar = Tree;
TabuSet = ∅;
do {
  Neighbors = all trees generated by applying a transformation to Tree;
  Tree = cheapest in Neighbors \ TabuSet;  (*)
  if (cost(Tree) < cost(BestTreeSoFar)) {
    BestTreeSoFar = Tree;
  }
  if (|TabuSet| > limit) remove the oldest tree from TabuSet;
  TabuSet += Tree;
} while (stopping condition not satisfied);
return BestTreeSoFar;

3.3.8 Genetic Algorithms

Genetic algorithms are inspired by evolution: only the fittest survive [333]. They work with a population that evolves from generation to generation. Successors are generated by crossover and mutation. Further, a subset of the current population (the fittest) is propagated to the next generation (selection). The first generation is produced by a random generation process.

The problem is how to represent each individual in a population. The following analogies are used:

• chromosome ←→ string
• gene ←→ character

In order to solve an optimization problem with genetic algorithms, an encoding is needed, as well as specifications for selection, crossover, and mutation. Genetic algorithms for join ordering have been considered in [74, 834]. We first introduce alternative encodings, then come to the selection process, and finally discuss crossover and mutation.

Encodings We distinguish ordered list and ordinal number encodings. Both encodings are used for left-deep and bushy trees. In all cases, we assume that the relations R1, . . . , Rn are to be joined and use the index i to denote Ri. A small sketch of the ordinal number encoding follows after this list.

1. Ordered List Encoding

   (a) left-deep trees: A left-deep join tree is encoded by a permutation of 1, . . . , n. For instance, (((R1 ⋈ R4) ⋈ R2) ⋈ R3) is encoded as "1423".

   (b) bushy trees: Bennet, Ferris, and Ioannidis proposed the following encoding scheme [74, 75]. A bushy join tree without Cartesian products is encoded as an ordered list of the edges in the join graph. To this end, we number the edges in the join graph. Then the join tree is encoded in a bottom-up, left-to-right manner; see Figure 3.22 for an example.

2. Ordinal Number Encoding

   (a) left-deep trees: A join tree is encoded by using a list of relations that is shortened whenever a join has been encoded. We start with the list L = ⟨R1, . . . , Rn⟩. Within L, we find the index of the first relation to be joined. Let this relation be Ri; it is the i-th relation in L. Hence, the first character in the chromosome string is i. We eliminate Ri from L. For every subsequently joined relation, we again determine its index in L, remove it from L, and append the index to the chromosome string. For instance, starting with ⟨R1, R2, R3, R4⟩, the left-deep join tree (((R1 ⋈ R4) ⋈ R2) ⋈ R3) is encoded as "1311".

   (b) bushy trees: Again, we start with the list L = ⟨R1, . . . , Rn⟩ and encode a bushy join tree in a bottom-up, left-to-right manner. Let Ri ⋈ Rj be the first join in the join tree under this ordering. We look up the positions of Ri and Rj in L and add them to the encoding. Next, we eliminate Ri and Rj from L and push the composite R_{i,j} to the front of L. We then proceed with the other joins by again selecting the next join, which now can be between relations and/or subtrees.
We determine their positions within L, add these positions to the encoding, remove them from L, and insert the new composite relation into L such that it directly follows the composite relations already present. For instance, starting with the list ⟨R1, R2, R3, R4⟩, the bushy join tree ((R1 ⋈ R2) ⋈ (R3 ⋈ R4)) is encoded as "12 23 12".

The encoding is completed by adding join methods.
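For the left-deep ordinal number encoding, the encode/decode pair is short enough to state completely; the following Python sketch (our own; the function names are hypothetical) reproduces the example from the text:

    def encode_left_deep(relations, join_order):
        """Ordinal number encoding of a left-deep tree: for each relation in
        join order, record its 1-based index in the shrinking list of
        remaining relations."""
        remaining, chromosome = list(relations), []
        for rel in join_order:
            chromosome.append(remaining.index(rel) + 1)
            remaining.remove(rel)
        return chromosome

    def decode_left_deep(relations, chromosome):
        remaining, order = list(relations), []
        for idx in chromosome:
            order.append(remaining.pop(idx - 1))
        return order

    # The example from the text: (((R1 join R4) join R2) join R3) <-> "1311"
    rels = ["R1", "R2", "R3", "R4"]
    assert encode_left_deep(rels, ["R1", "R4", "R2", "R3"]) == [1, 3, 1, 1]
    assert decode_left_deep(rels, [1, 3, 1, 1]) == ["R1", "R4", "R2", "R3"]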
Crossover A crossover generates a new solution from two individuals by combining two partial solutions. Obviously, its definition depends on the encoding. Two kinds of crossovers are distinguished: the subsequence exchange and the subset exchange.

[Figure 3.22: A query graph with numbered edges, a join tree, and its ordered list encoding "1243".]

The subsequence exchange for the ordered list encoding works as follows. Assume two individuals with chromosomes u1v1w1 and u2v2w2. From these, we generate u1v1′w1 and u2v2′w2, where vi′ is a permutation of the relations in vi such that the order of their appearance is the same as in u_(3−i)v_(3−i)w_(3−i). In order to adapt the subsequence exchange operator to the ordinal number encoding, we have to require that the vi are of equal length (|v1| = |v2|) and occur at the same offset (|u1| = |u2|). We then simply swap the vi; that is, we generate u1v2w1 and u2v1w2.

The subset exchange is defined only for the ordered list encoding. Within the two chromosomes, we find two subsequences of equal length comprising the same set of relations. These sequences are then simply exchanged.

Mutation A mutation randomly alters a character in the encoding. If duplicates must not occur, as in the ordered list encoding, swapping two characters is a perfect mutation.

Selection The probability of a join tree's survival is determined by its rank in the population. That is, we calculate the costs of the join trees encoded by each member of the population. Then we sort the population according to these costs and assign probabilities to each individual such that the best solution in the population has the highest probability to survive, and so on. After probabilities have been assigned, we randomly select members of the population taking these probabilities into account: the higher the probability of a member, the higher its chance to survive.

Algorithm The genetic algorithm then works as follows. First, we create a random population of a given size (say 128). We apply crossover and mutation with given rates, for example such that 65% of all members of a population participate in crossover and 5% of all members of a population are subject to random mutation. Then we apply selection until we again have a population of the given size. We stop after we have not seen an improvement within the population for a fixed number of iterations (say 30).

3.4 Hybrid Algorithms

All the algorithms we have seen so far can be combined to yield new approaches to join ordering. Some of the numerous possibilities have been described in the literature; we present them here.

3.4.1 Two Phase Optimization

Two phase optimization combines iterative improvement with simulated annealing [452]. For a number of randomly generated initial trees, iterative improvement is used to find a local minimum. Then simulated annealing is started to find a better plan in the neighborhood of these local minima. The initial temperature of simulated annealing can be lower than in its original variant.

3.4.2 AB-Algorithm

The AB-algorithm was developed by Swami and Iyer [861, 862]. It builds on the IKKBZ algorithm and resolves its limitations. First, if the query graph is cyclic, a spanning tree is selected. Second, two different cost functions for joins (join methods) are supported by the AB-algorithm: nested loop join and sort merge join. In order to make the sort merge join's cost model fit the ASI property, it is simplified. Third, join methods are assigned randomly before IKKBZ is called. Afterwards, an iterative improvement phase follows. The algorithm can be formulated as follows:

AB(Query Graph G)
Input: a query graph G = ({R1, . . . , Rn}, E)
Output: a left-deep join tree
while (number of iterations ≤ n²) {
  if (G is cyclic) take a spanning tree of G
  randomly attach a join method to each relation
  JoinTree = result of IKKBZ
  while (number of iterations ≤ n²) {
    apply iterative improvement to JoinTree
  }
}
return best tree found

3.4.3 Toured Simulated Annealing

Lanzelotte, Valduriez, and Zaït introduced toured simulated annealing as a search strategy useful in distributed databases, where the search space is even larger than in centralized systems [532]. The basic idea is that simulated annealing is called n times with different initial join trees, where n is the number of relations to be joined. Each join sequence in the set Solutions produced by GreedyJoinOrdering-3 is used to start an independent run of simulated annealing. As a result, the starting temperature can be decreased to 0.1 times the cost of the initial plan.

3.4.4 GOO-II

GOO-II appends an iterative improvement step to the GOO algorithm.

3.4.5 Iterative Dynamic Programming

Iterative dynamic programming combines heuristics with dynamic programming in order to overcome the deficiencies of both. It comes in two variants [515, 806]. The first variant, IDP-1 (see Figure 3.23), first creates all join trees that contain up to k relations, where k is a parameter of the algorithm. After this step, it selects the cheapest join tree comprising k relations, replaces it by a new compound relation, and starts all over again. The iteration stops when only one compound relation, representing a join tree for all relations, remains in the ToDo list.

The second variant, IDP-2 (see Figure 3.24), works the other way round. It first applies a greedy heuristic to build join trees of size up to k and then applies dynamic programming to the larger of the two subtrees joined last in order to improve it. The optimized outcome of the greedy step is then encapsulated in a new compound relation, which replaces its constituent relations in the ToDo list. The algorithm iterates until only one entry remains in the ToDo list.

Obviously, several other variants can be derived from these two basic ones. A systematic investigation of the basic algorithms and their variants is given by Kossmann and Stocker [515]. It turns out that the most promising variants are derived from IDP-1.

3.5 Ordering Order-Preserving Joins

This section covers an algorithm for ordering order-preserving joins [615]. This is important for XQuery and other languages that require order preservation. XQuery specifies that the result of a query is a sequence. If no unordered or order by instruction is given, the order of the output sequence is determined by the order of the input sequences given in the for clauses of the query.
If there are several entries in a for clause or several for clauses, order-preserving join operators [183] can be a natural component for the evaluation of such a query. The order-preserving join operator is used in several algebras, in the context of

• semi-structured data and XML (e.g. SAL [70], XAL [293]),
• OLAP [820], and
• time series data [544].

We give a polynomial algorithm that produces bushy trees for a sequence of order-preserving joins and selections. These trees may contain cross products even if the join graph is connected. However, we apply selections as early as possible, so the algorithm produces the optimal plan among those that push selections down. The cost function is a parameter of the algorithm, and we do not need to restrict ourselves to cost functions having the ASI property. Further, we need no restriction on the join graph, i.e. the algorithm produces the optimal plan even if the join graph is cyclic.

IDP-1({R1, . . . , Rn}, k)
Input: a set of relations to be joined, maximum block size k
Output: a join tree
for (i = 1; i ≤ n; ++i) {
  BestTree({Ri}) = Ri;
}
ToDo = {R1, . . . , Rn};
while (|ToDo| > 1) {
  k = min(k, |ToDo|);
  for (i = 2; i < k; ++i) {
    for all S ⊆ ToDo, |S| = i do {
      for all O ⊂ S do {
        BestTree(S) = CreateJoinTree(BestTree(S \ O), BestTree(O));
      }
    }
  }
  find V ⊂ ToDo, |V| = k with
    cost(BestTree(V)) = min{cost(BestTree(W)) | W ⊂ ToDo, |W| = k};
  generate a new symbol T;
  BestTree({T}) = BestTree(V);
  ToDo = (ToDo \ V) ∪ {T};
  for all O ⊂ V do delete(BestTree(O));
}
return BestTree({R1, . . . , Rn});

Figure 3.23: Pseudocode for IDP-1

IDP-2({R1, . . . , Rn}, k)
Input: a set of relations to be joined, maximum block size k
Output: a join tree
for (i = 1; i ≤ n; ++i) {
  BestTree({Ri}) = Ri;
}
ToDo = {R1, . . . , Rn};
while (|ToDo| > 1) {
  // apply a greedy algorithm to select a good building block
  B = ∅;
  for all v ∈ ToDo do {
    B += BestTree({v});
  }
  do {
    find L, R ∈ B with
      cost(CreateJoinTree(L, R)) = min{cost(CreateJoinTree(L′, R′)) | L′, R′ ∈ B};
    P = CreateJoinTree(L, R);
    B = (B \ {L, R}) ∪ {P};
  } while (P involves no more than k relations and |B| > 1);
  // reoptimize the bigger of L and R,
  // selected in the last iteration of the greedy loop
  if (L involves more tables than R) {
    ReOpRels = relations involved in L;
  } else {
    ReOpRels = relations involved in R;
  }
  P = DP-Bushy(ReOpRels);
  generate a new symbol T;
  BestTree({T}) = P;
  ToDo = (ToDo \ ReOpRels) ∪ {T};
  for all O ⊂ ReOpRels do delete(BestTree(O));
}
return BestTree({R1, . . . , Rn});

Figure 3.24: Pseudocode for IDP-2

Before defining the order-preserving join, we need some preliminaries. The above algebras work on sequences of sets of variable bindings, i.e. sequences of unordered tuples where every attribute corresponds to a variable. (See Chapter 7.16 for a general discussion.) Single tuples are constructed using the standard [·] brackets. Concatenation of tuples and functions is denoted by ◦. The set of attributes defined for an expression e is denoted by A(e), and the set of free variables of an expression e by F(e). For a sequence e, we use α(e) to denote its first element; we identify single-element sequences with elements. The function τ retrieves the tail of a sequence, and ⊕ concatenates two sequences. We denote the empty sequence by ϵ. We define the algebraic operators recursively on their input sequences.
The order-preserving join operator is defined as the concatenation of an order-preserving selection and an order-preserving cross product. For unary operators, if the input sequence is empty, the output sequence is also empty. For binary operators, the output sequence is empty whenever the left operand represents an empty sequence.

The order-preserving join operator is based on the definition of an order-preserving cross product operator, defined as

e1 ×̂ e2 := (α(e1) Â e2) ⊕ (τ(e1) ×̂ e2)

where

e1 Â e2 := ϵ                              if e2 = ϵ
e1 Â e2 := (e1 ◦ α(e2)) ⊕ (e1 Â τ(e2))    otherwise

We are now prepared to define the join operation on ordered sequences:

e1 ⋈̂_p e2 := σ̂_p(e1 ×̂ e2)

where the order-preserving selection is defined as

σ̂_p(e) := ϵ                       if e = ϵ
σ̂_p(e) := α(e) ⊕ σ̂_p(τ(e))        if p(α(e))
σ̂_p(e) := σ̂_p(τ(e))               otherwise

As usual, selections can be reordered and pushed inside order-preserving joins. Besides, the latter are associative. The following equivalences formalize this:

σ̂_p1(σ̂_p2(e))        = σ̂_p2(σ̂_p1(e))
σ̂_p1(e1 ⋈̂_p2 e2)     = σ̂_p1(e1) ⋈̂_p2 e2        if F(p1) ⊆ A(e1)
σ̂_p1(e1 ⋈̂_p2 e2)     = e1 ⋈̂_p2 σ̂_p1(e2)        if F(p1) ⊆ A(e2)
e1 ⋈̂_p1 (e2 ⋈̂_p2 e3) = (e1 ⋈̂_p1 e2) ⋈̂_p2 e3    if F(p_i) ⊆ A(e_i) ∪ A(e_(i+1))

While being associative, the order-preserving join is not commutative, as the following example illustrates. Given the two tuple sequences R1 = ⟨[a : 1], [a : 2]⟩ and R2 = ⟨[b : 1], [b : 2]⟩, we have

R1 ⋈̂_true R2 = ⟨[a : 1, b : 1], [a : 1, b : 2], [a : 2, b : 1], [a : 2, b : 2]⟩
R2 ⋈̂_true R1 = ⟨[a : 1, b : 1], [a : 2, b : 1], [a : 1, b : 2], [a : 2, b : 2]⟩
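The sequence semantics just defined translate directly into code. The following Python sketch (our own; tuples are modeled as dicts, predicates as functions, and the names oselect, ocross, and ojoin are hypothetical) mirrors σ̂, ×̂, and ⋈̂ and reproduces the non-commutativity example:

    def oselect(p, e):
        """Order-preserving selection: keeps qualifying tuples in input order."""
        return [t for t in e if p(t)]

    def ocross(e1, e2):
        """Order-preserving cross product: the order of e1 dominates that of e2."""
        return [{**t1, **t2} for t1 in e1 for t2 in e2]

    def ojoin(p, e1, e2):
        return oselect(p, ocross(e1, e2))

    R1 = [{"a": 1}, {"a": 2}]
    R2 = [{"b": 1}, {"b": 2}]
    # Not commutative: both argument orders yield differently ordered results.
    assert ojoin(lambda t: True, R1, R2) != ojoin(lambda t: True, R2, R1)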
applicable-predicates(R, P)
01 B = ∅
02 foreach p ∈ P
03   IF (F(p) ⊆ A(R))
04     B += p
05 return B

Figure 3.25: Subroutine applicable-predicates

The algorithm is inspired by the dynamic programming algorithm for finding optimal parenthesized expressions for matrix-chain multiplication [208]. The differences are that we have to encapsulate the cost function and deal with selections. We give a detailed example application of the algorithm below. This example illustrates (1) the optimization potential, (2) that cross products can be favorable, (3) how to plug a cost function into the algorithm, and (4) the algorithm itself.

The algorithm itself is broken up into several subroutines. The first is applicable-predicates (see Fig. 3.25). Given a sequence of relations Ri, . . . , Rj and a set of predicates, it retrieves those predicates applicable to the result of the join of the relations. Since joins and selections can be reordered freely, the only condition for a predicate to be applicable is that all its free variables are bound by the given relations. The second subroutine is the most important and most intricate. It fills several arrays with values in a bottom-up manner. The third subroutine then builds the query evaluation plan using the data in the arrays.

The subroutine construct-bushy-tree takes as input a sequence R1, . . . , Rn of relations to be joined and a set P of predicates to be applied. For every possible subsequence Ri, . . . , Rj, the algorithm finds the best plan to join these relations. To this end, it determines some k such that the cheapest plan joins the intermediate results for Ri, . . . , Rk and Rk+1, . . . , Rj by its topmost join. For this it is assumed that for all k the best plans for joining Ri, . . . , Rk and Rk+1, . . . , Rj are known. Instead of directly storing the best plan, we remember (1) the costs of the best plan for Ri, . . . , Rj for all 1 ≤ i ≤ j ≤ n and (2) the k where the split takes place. More specifically, the array c[i, j] contains the costs of the best plan for joining Ri, . . . , Rj, and the array t[i, j] contains the k such that this best plan joins Ri, . . . , Rk and Rk+1, . . . , Rj with its topmost join. For every sequence Ri, . . . , Rj, we also remember the set of predicates that can be applied to it, excluding those that have been applied earlier. These applicable predicates are contained in p[i, j].

Still, we are not done. All cost functions we know use some kind of statistics on the argument relation(s) in order to compute the costs of some operation. Since we want to be generic with respect to the cost function, we encapsulate the computation of statistics and costs within functions S0, C0, S1, and C1. The function S0 retrieves statistics for base relations. The function C0 computes the costs of retrieving (part of) a base relation. Both functions take a set of applicable predicates as an additional argument. The function S1 computes the statistics for intermediate relations. Since the result of joining some relations Ri, . . . , Rj may occur in many different plans, we compute it only once and store it in the array s. C1 computes the costs of joining two relations and applying a set of predicates. Below, we show what concrete (simple) cost and statistics functions can look like.

construct-bushy-tree(R, P)
01 n = |R|
02 for i = 1 to n
03   B = applicable-predicates(Ri, P)
04   P = P \ B
05   p[i, i] = B
06   s[i, i] = S0(Ri, B)
07   c[i, i] = C0(Ri, B)
08 for l = 2 to n
09   for i = 1 to n − l + 1
10     j = i + l − 1
11     B = applicable-predicates(Ri...j, P)
12     P = P \ B
13     p[i, j] = B
14     s[i, j] = S1(s[i, j − 1], s[j, j], B)
15     c[i, j] = ∞
16     for k = i to j − 1
17       q = c[i, k] + c[k + 1, j] + C1(s[i, k], s[k + 1, j], B)
18       IF (q < c[i, j])
19         c[i, j] = q
20         t[i, j] = k

Figure 3.26: Subroutine construct-bushy-tree

extract-plan(R, t, p)
01 return extract-subplan(R, t, p, 1, |R|)

extract-subplan(R, t, p, i, j)
01 IF (j > i)
02   X = extract-subplan(R, t, p, i, t[i, j])
03   Y = extract-subplan(R, t, p, t[i, j] + 1, j)
04   return X ⋈̂p[i,j] Y
05 ELSE
06   return σ̂p[i,i](Ri)

Figure 3.27: Subroutine extract-plan and its subroutine

Given the above, the algorithm (see Fig. 3.26) fills the arrays in a bottom-up manner by first computing, for every base relation, the applicable predicates, the statistics of the result of applying the predicates to the base relation, and the costs for computing these intermediate results, i.e. for retrieving the relevant part of the base relation and applying the predicates (lines 02-07). Note that this is not really trivial if there are several index structures that can be applied. Then computing C0 involves considering different access paths.
Since this issue is orthogonal to join ordering, we do not elaborate on it. After we have the costs and statistics for sequences of length one, we compute the same information for sequences of length two, three, and so on until n (loop starting at line 08). For every length, we iterate over all subsequences of that length (loop starting at line 09). We compute the applicable predicates and the statistics. In order to determine the minimal costs, we have to consider every possible split point. This is done by iterating the split point k from i to j − 1 (line 16). For every k, we compute the cost and remember the k that resulted in the lowest costs (lines 17-20).

The last subroutine takes the relations, the split points (t), and the applicable predicates (p) as its input and extracts the plan. The whole plan is extracted by calling extract-plan. This is done by instructing extract-subplan to retrieve the plan for all relations. This subroutine first determines whether the plan for a base relation or that of an intermediate result is to be constructed. In both cases, we did a little cheating here to keep things simple. The plan we construct for base relations does not take the above-mentioned index structures into account but simply applies a selection to a base relation instead. Obviously, this can easily be corrected. We also give the join operator the whole set of predicates that can be applied. That is, we do not distinguish between join predicates and other predicates that are better suited for a selection subsequently applied to a join. Again, this can easily be corrected.

Let us have a quick look at the complexity of the algorithm. Given n relations with m attributes in total and p predicates, we can implement applicable-predicates in O(pm) by using a bit vector representation for attributes and free variables and computing the attributes for each sequence Ri, . . . , Rj once upfront. The latter takes O(n²m). The complexity of the routine construct-bushy-tree is determined by the three nested loops. We assume that S1 and C1 can be computed in O(p), which is quite reasonable. Then, we have O(n³p) for the innermost loop, O(n²) calls to applicable-predicates, which amounts to O(n²pm), and O(n²p) for the calls of S1. Extracting the plan is linear in n. Hence, the total runtime of the algorithm is O(n²(n + m)p).

In order to illustrate the algorithm, we need to fix the functions S0, S1, C0, and C1. We use the simple cost function Cout. As a consequence, the array s simply stores cardinalities, and S0 has to extract the cardinality of a given base relation and multiply it by the selectivities of the applicable predicates. S1 multiplies the input cardinalities with the selectivities of the applicable predicates. We set C0 to zero and C1 to S1. The former is justified by the fact that every relation must be accessed exactly once and, hence, the access costs are equal in all plans. Summarizing, we define

  S0(R, B)    := |R| · ∏_{p∈B} f(p)
  S1(x, y, B) := x · y · ∏_{p∈B} f(p)
  C0(R, B)    := 0
  C1(x, y, B) := S1(x, y, B)

where B is a set of applicable predicates and, for a single predicate p, f(p) returns its selectivity.

We illustrate the algorithm by an example consisting of four relations R1, . . . , R4 with cardinalities |R1| = 200, |R2| = 1, |R3| = 1, |R4| = 20. Besides, we have three predicates pi,j with F(pi,j) ⊆ A(Ri) ∪ A(Rj). They are p1,2, p3,4, and p1,4 with selectivities 1/2, 1/10, and 1/5. Let us first consider an example plan and its costs.
The plan

  ((R1 ⋈̂p1,2 R2) ⋈̂true R3) ⋈̂p1,4∧p3,4 R4

has the costs 240 = 100 + 100 + 40. For our simple cost function, the algorithm construct-bushy-tree fills the array s with the initial values

  s[i, i]: 200, 1, 1, 20

After initialization, the array c contains 0 everywhere on its diagonal, and the array p contains empty sets. For l = 2, the algorithm produces the following values:

  l  i  j  k  s[i,j]  q    current c[i,j]  current t[i,j]
  2  1  2  1  100     100  100             1
  2  2  3  2  1       1    1               2
  2  3  4  3  2       2    2               3

For l = 3, the algorithm produces the following values (note that s[1,3] = 100, since the selectivity of p1,2 is already contained in s[1,2]):

  l  i  j  k  s[i,j]  q    current c[i,j]  current t[i,j]
  3  1  3  1  100     101  101             1
  3  1  3  2  100     200  101             1
  3  2  4  2  2       4    4               2
  3  2  4  3  2       3    3               3

For l = 4, the algorithm produces the following values:

  l  i  j  k  s[1,4]  q    current c[1,4]  current t[1,4]
  4  1  4  1  40      43   43              1
  4  1  4  2  40      142  43              1
  4  1  4  3  40      141  43              1

where for each k the value of q (denoted by qk) is determined as follows:

  q1 = c[1, 1] + c[2, 4] + 40 = 0 + 3 + 40 = 43
  q2 = c[1, 2] + c[3, 4] + 40 = 100 + 2 + 40 = 142
  q3 = c[1, 3] + c[4, 4] + 40 = 101 + 0 + 40 = 141

Collecting all the above t[i, j] values leaves us with the following array as input for extract-plan:

  i\j  2  3  4
  1    1  1  1
  2       2  3
  3          3

The function extract-plan merely calls extract-subplan. For the latter, we give the call hierarchy and the results produced:

  extract-subplan(. . ., 1, 4)
    extract-subplan(. . ., 1, 1)
    extract-subplan(. . ., 2, 4)
      extract-subplan(. . ., 2, 3)
        extract-subplan(. . ., 2, 2)
        extract-subplan(. . ., 3, 3)
        return (R2 ⋈̂true R3)
      extract-subplan(. . ., 4, 4)
      return ((R2 ⋈̂true R3) ⋈̂p3,4 R4)
    return (R1 ⋈̂p1,2∧p1,4 ((R2 ⋈̂true R3) ⋈̂p3,4 R4))

The total cost of this plan is c[1, 4] = 43.
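The numbers above can be checked mechanically. The following sketch (ours) implements the recurrence for Cout in C++; instead of the incremental predicate bookkeeping of construct-bushy-tree, it computes s[i][j] directly from the base cardinalities and all predicates contained in the range, which is what the example values reflect.

    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 4;
        double card[n + 1] = {0, 200, 1, 1, 20};     // |R_1| .. |R_4|
        struct Pred { int a, b; double f; };         // predicate p_{a,b}
        std::vector<Pred> preds = {{1, 2, 0.5}, {3, 4, 0.1}, {1, 4, 0.2}};

        double s[n + 1][n + 1] = {}, c[n + 1][n + 1] = {};
        int t[n + 1][n + 1] = {};

        // s[i][j]: size of the join of R_i..R_j with all predicates whose
        // relations lie inside the range (selections are pushed down)
        for (int i = 1; i <= n; ++i)
            for (int j = i; j <= n; ++j) {
                s[i][j] = 1;
                for (int r = i; r <= j; ++r) s[i][j] *= card[r];
                for (const Pred& p : preds)
                    if (i <= p.a && p.b <= j) s[i][j] *= p.f;
            }

        for (int l = 2; l <= n; ++l)                 // subsequence length
            for (int i = 1; i + l - 1 <= n; ++i) {
                int j = i + l - 1;
                c[i][j] = 1e300;
                for (int k = i; k < j; ++k) {        // split of topmost join
                    // C_out: the topmost join contributes its output size
                    double q = c[i][k] + c[k + 1][j] + s[i][j];
                    if (q < c[i][j]) { c[i][j] = q; t[i][j] = k; }
                }
            }

        std::printf("c[1][4] = %g, t[1][4] = %d\n", c[1][n], t[1][n]);
        // prints: c[1][4] = 43, t[1][4] = 1
        return 0;
    }

The extracted split points reproduce the plan above: t[1,4] = 1 splits off R1, t[2,4] = 3 splits off R4, and t[2,3] = 2 yields the cross product of R2 and R3.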
3.6 Characterizing Search Spaces

3.6.1 Complexity Thresholds

The complexity results presented in Section 3.1.6 show that most classes of join ordering problems are NP-hard. However, it is quite clear that some instances of the join ordering problem are simpler than others. For example, consider a query graph which is a clique in n relations R1, . . . , Rn. Further assume that each Ri has cardinality 2^i and that all join selectivities are 1/2 (i.e. fi,j = 1/2 for all 1 ≤ i, j ≤ n, i ≠ j). This problem is easy to optimize although the query graph is a clique. In this section we present some ideas on how the complexity of instances of the join ordering problem is influenced by certain parameters.

How can we judge the complexity of a single instance of a join ordering problem? Using standard complexity theory, for a single problem instance we can trivially derive an algorithm that works in Θ(1). Hence, we must define other complexity measures. Consider our introductory join ordering problem. A simple greedy algorithm that orders relations according to their cardinality produces an optimal solution for it. Hence, one possibility to define the problem complexity would be how far a solution produced by typical heuristics for join ordering deviates from the optimal solution. Another possibility is to use randomized algorithms like iterative improvement or simulated annealing and see how far the plans generated by them deviate from the optimal plan. These approaches have the problem that the results may depend on the chosen algorithm. This can be avoided by using the following approach. For each join ordering problem instance, we compute the fraction of good plans compared to all plans. For this, we need a measure of "good". Typical examples would be to say a plan is "good" if it does not deviate by more than 10% or by more than a factor of two from the optimal plan.

If these investigations were readily available, there would be certain obvious benefits [511]:

1. The designer of an optimizer can classify queries such that heuristics are applied where they guarantee success; cases where they are bound to fail can be avoided. Furthermore, taking into account the vastly different run times of the different join ordering heuristics and probabilistic optimization procedures, the designer of an optimizer can choose the method that achieves a satisfactory result with the least effort.

2. The developer of search procedures and heuristics can use this knowledge to design methods solving hard problems (as exemplified for graph coloring problems [430]).

3. The investigator of different join ordering techniques is able to (1) consciously design challenging benchmarks and (2) evaluate existing benchmarks according to their degree of challenge.

The kind of investigation presented in this section first started in the context of artificial intelligence, where a paper by Cheeseman, Kanefsky, and Taylor [162] spurred a whole new branch of research in which measures to judge the complexity of problem instances were investigated for many different NP-complete problems like satisfiability [162, 212, 326, 609], graph coloring [162], Hamiltonian circuits [162], traveling salesman [162], and constraint satisfaction [928].

We only present a small fraction of all possible investigations. The restrictions are that we do not consider all parameters that possibly influence the problem complexity, we only consider left-deep trees, and we restrict ourselves to the cost function Chj. The join graphs are randomly generated: starting with a circle, we randomly add edges until a clique is reached. The reader is advised to carry out his or her own experiments. For this, the following pointer into the literature might be useful. Lanzelotte and Valduriez provide an object-oriented design for search strategies [530]. This allows easy modification and even the exchange of the plan generator's search strategy.

Search Space Analysis

The goal of this section is to determine the influence of the parameters on the search space of left-deep join trees. More specifically, we are interested in how a variation of the parameters changes the percentage of good solutions among all solutions. The quality of a solution is measured by the factor by which its cost deviates from that of the optimal permutation. For this, all permutations have to be generated and evaluated. The results of this experiment are shown in Figures 3.28 and 3.29. Each single curve accumulates the percentage of all permutations deviating by less than a certain factor (given as the label) from the optimum. The accumulated percentages are given on the y-axis, the connectivity on the x-axis. The connectivity is given by the number of edges in the join graph. The curves within the figures are organized as follows. Figure 3.28 (3.29) shows varying mean selectivity values (relation sizes) and variances, where the mean selectivity values (relation sizes) increase from top to bottom and the variances increase from left to right. Note that the more curves are visible and the lower their y-values, the harder the problem.
We observe the following:

• all curves exhibit a minimum at a certain connectivity,
• this minimum moves to the right with increasing mean values,
• increasing the variance has no impact on the minimum connectivity, and
• problems become less difficult with increasing mean values.

These findings can be explained as follows. With increasing connectivity, the join ordering problem becomes more complex up to a certain point and then less complex again. To see this, consider the following special though illustrative case. Assume an almost equal distribution of the costs of all alternatives between the worst-case and the optimal costs, equal relation sizes, and equal selectivities. Then the optimization potential worst case/optimum is 1 for connectivity 0 and for cliques. In between, there exists a connectivity exhibiting the maximum optimization potential. This connectivity corresponds to the minimum connectivity of Figures 3.28 and 3.29.

There is another factor which influences the complexity of a single problem instance. Consider joining n relations. The problem becomes less complex if, after joining i < n relations, the intermediate result becomes so small that the accumulated costs of the subsequent n − i joins are small compared to the costs of joining the first i relations. Then the ordering of the remaining n − i relations does not have a big influence on the total costs. This is the case for very small relations, small selectivities, or high connectivities. The larger the selectivities and relation sizes are, the more relations have to be joined to reach this critical size of the intermediate result. If the connectivity is enlarged, this critical size is reached earlier. Since the number of selectivities involved in the first few joins is small regardless of the connectivity, there is a lower limit to the number of relations that have to be joined before the critical intermediate result size is reached. If the connectivity is larger, this point is reached earlier, but the lower limit persists: the number of selectivities involved in the joins remains small for the first couple of relations, independent of their connectivity. These lines of argument explain the subsequent findings, too.

The reader should be aware of the fact that the number of relations joined is quite small (10) in our experiments. Further, as observed by several researchers, if the number of joins increases, the number of "good" plans decreases [302, 858]. That is, increasing the number of relations makes the join ordering problem more difficult.

[Figure 3.28: Impact of selectivity on the search space]

[Figure 3.29: Impact of relation sizes on the search space]

Heuristics

For analyzing the influence of the parameters on the performance of heuristics, we give the figures for four different heuristics. The first two are very simple. The minSel heuristic selects first those relations whose incident join edges exhibit the minimal selectivity. The recMinRel heuristic chooses first those relations which result in the smallest intermediate relation. We also analyzed the two advanced heuristics IKKBZ and RDC. The IKKBZ heuristic [520] is based on an optimal join ordering procedure [438, 520] which is applied to the minimal spanning tree of the join graph, where the edges are labeled by the selectivities. The family of RDC heuristics is based on the relational difference calculus as developed in [418].
Since our goal is not to benchmark different heuristics in order to determine the best one, we have chosen the simplest variant of the family of RDC-based heuristics. Here, the relations are ordered according to a certain weight whose actual computation is, for the purposes of this section, of no interest.

The results of the experiments are presented in Figure 3.30. At first glance, these figures look less regular than those presented so far. This might be due to the non-stable behavior of the heuristics. Nevertheless, we can extract the following observations. Many curves exhibit a peak at a certain connectivity. Here, the heuristics perform worst. The peak connectivity depends on the selectivity size but is not as regular as in the previous curves. Further, higher selectivities flatten the curves, that is, the heuristics perform better at higher selectivities.

[Figure 3.30: Impact of parameters on the performance of heuristics]

Probabilistic Optimization Procedures

Figure 3.31 shows four pictures corresponding to simulated annealing (SA), iterative improvement (II), iterative improvement applied to the outcome of the IKKBZ heuristic (IKKBZ/II), and iterative improvement applied to the outcome of the RDC heuristic (RDC/II) [418]. The patterns shown in Figure 3.31 are very regular. All curves exhibit a peak at a certain connectivity. The peak connectivities typically coincide with the minimum connectivity of the search space analysis. Higher selectivities result in flatter curves; the probabilistic procedures perform better. These findings are absolutely coherent with the search space analysis. This is not surprising, since the probabilistic procedures systematically investigate (although with some random influence) a certain part of the search space.

Given a join ordering problem, we can describe its potential search space as a graph. The set of nodes consists of the set of join trees. For every two join trees a and b, we add an edge (a, b) if b can be reached from a by one of the transformation rules used in the probabilistic procedure. Further, with every node we can associate the cost of its corresponding join tree. Having in mind that the probabilistic algorithms are always in danger of getting stuck in a local minimum, the following two properties of the search space are of interest:

1. the cost distribution of local minima, and
2. the connection cost of low local minima.

Of course, if all local minima have about the same cost, we do not have to worry; otherwise we do. It would be very interesting to know the percentage of local minima that are close to the global minimum. Concerning the second property, we first have to define the connection cost. Let a and b be two nodes and P be the set of all paths from a to b. The connection cost of a and b is then defined as min_{p∈P} max_{s∈p, s≠a, s≠b} cost(s). Now, if the connection costs are high, we know that in order to travel from one local minimum to another, there is at least one node we have to pass which has high costs. Obviously, this is bad for our probabilistic procedures. Ioannidis and Kang [453] call a search graph that is favorable with respect to these two properties a well. Unfortunately, investigating these two properties of real search spaces is rather difficult. However, Ioannidis and Kang, later supported by Zhang, succeeded in characterizing cost wells in random graphs [453, 454]. They also conclude that the search space comprising bushy trees is better with respect to our two properties than the one for left-deep trees.
[Figure 3.31: Impact of selectivities on probabilistic procedures]

3.7 Discussion

Choose one of dynamic programming, memoization, or permutations as the core of your plan generation algorithm and extend it with the rest of the book. ToDo

3.8 Bibliography

ToDo: Oezsu, Meechan [663, 664]

Chapter 4

Database Items, Building Blocks, and Access Paths

In this chapter we go down to the storage layer and discuss leaf nodes of query execution plans and plan fragments. We briefly recap some notions, but reading a book on database implementation might be helpful [403, 316]. Although alternative storage technologies exist and are being developed [764], databases are mostly stored on disks. Thus, we start out by introducing a simple disk model to capture I/O costs. Then, we say some words about database buffers, physical data organization, slotted pages and tuple identifiers (TIDs), physical record layout, physical algebra, and the iterator concept. These are the basic notions needed in order to start with the main purpose of this chapter: giving an overview of the possibilities available to structure the low-level parts of a physical query evaluation plan. In order to calculate the I/O costs of these plan fragments, a more sophisticated cost model for several kinds of disk accesses is introduced.

4.1 Disk Drive

[Figure 4.1: Disk drive assembly — a. side view, b. top view, showing the platters, spindle, tracks, sectors, cylinders, heads, arm assembly, and arm pivot]

Figure 4.1 shows a top and a side view of a typical disk. A disk consists of several platters that rotate around the spindle at a fixed speed. The platters are coated with a magnetic material on at least one of their surfaces. All coated sides are organized into the same pattern of concentric circles. One concentric circle is called a track. All the tracks residing exactly underneath and above each other form a cylinder. We assume that there is only one read/write head for every coated surface. (This assumption is valid for most but not all disks.) All tracks of a cylinder can be accessed with only minor adjustments at the same time by their respective heads. By moving the arm around the arm pivot, other cylinders can be accessed. Each track is partitioned into sectors. Sectors have a disk-specific (almost) fixed capacity of 512 B. The read and write granularity is a sector. Read and write accesses take place while the sector passes under the head.

The top view of Figure 4.1 shows that the outer sectors are longer than the inner sectors. The highest density (e.g. in bits per centimeter) at which bits can be separated is fixed for a given disk. For storing 512 B, this results in a minimum sector length, which is used for the tracks of the innermost cylinder. Thus, since sectors on outer tracks are longer, storage capacity would be wasted there. To overcome this problem, disks have a varying number of sectors per track. (This is where the picture lies.) Therefore, the cylinders are organized into zones. Every zone contains a fixed number of consecutive cylinders, each having a fixed number of sectors per track. Between zones, the number of sectors per track varies. Outer zones have more sectors per track than inner zones. Since the platters rotate with a fixed angular speed, sectors of outer cylinders can be read faster than sectors of inner cylinders. As a consequence, the throughput for reading and writing outer cylinders is higher than for inner cylinders.
Assume that we sequentially read all the sectors of all tracks of some consecutive cylinders. After reading all sectors of some track, we must proceed to the next track. If it is contained in the same cylinder, then we (simply) use another head: a head switch occurs. Due to calibration, this takes some time. Thus, if all sectors started at the same angular position, we would arrive too late to read the first sector of the next track and would have to wait. To avoid this, the angular start positions of the sectors of tracks in the same cylinder are skewed such that this track skew compensates for the head switch time. If the next track is contained in another cylinder, the heads have to switch to the next cylinder. Again, this takes time, and we would miss the first sector if all sectors of a surface started at the same angular positions. Cylinder skew is used such that the time needed for this switch does not make us miss the start of the next sector. In general, skewing works in only one direction.

A sector can be addressed by a triple containing its cylinder, head (surface), and sector number. This triple is called the physical address of a sector. However, disks are accessed using logical addresses. These are called logical block numbers (LBN) and are consecutive numbers starting with zero. The disk internally maps LBNs to physical addresses. This mapping is captured in the following table:

  cylinder  track  LBN       number of sectors per track
  0         0      0         573
  0         1      573       573
  ...       ...    ...       ...
  0         5      2865      573
  1         0      3438      573
  ...       ...    ...       ...
  15041     0      35841845  253
  ...       ...    ...       ...

However, this ideal view is disturbed by the phenomenon of bad blocks. A bad block is one with a defect: it cannot be read or written. After a block with a certain LBN is detected to be bad, it is assigned to another sector. The above mapping changes. In order to be able to redirect LBNs, extra space on the disk must exist. Hence, some cylinders, tracks, and sectors are reserved for this purpose. They may be scattered all over the platters. Redirected blocks cause hiccups during sequential reads.
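For the regular case (no redirected blocks), the mapping can be sketched in code as follows; the zone description, the parameter values, and all names are our assumptions, not taken from a real disk.

    #include <vector>

    struct Zone     { long first_lbn; int first_cyl, spt; };  // spt: sectors per track
    struct PhysAddr { int cylinder, head, sector; };

    // map an LBN to its physical address on a zoned disk (no bad blocks);
    // zones are sorted by first_lbn, tpc is the number of tracks per cylinder
    PhysAddr fromLBN(long lbn, const std::vector<Zone>& zones, int tpc) {
        int z = 0;
        while (z + 1 < (int)zones.size() && lbn >= zones[z + 1].first_lbn) ++z;
        long rel   = lbn - zones[z].first_lbn;   // sector offset within the zone
        long track = rel / zones[z].spt;         // track offset within the zone
        return { zones[z].first_cyl + (int)(track / tpc),  // cylinder
                 (int)(track % tpc),                       // head (surface)
                 (int)(rel % zones[z].spt) };              // sector within track
    }

For the table above (six tracks per cylinder, first zone with 573 sectors per track), LBN 3438 maps to cylinder 1, head 0, sector 0.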
Building (see e.g. [646]) and modeling (see e.g. [586, 750, 811, 812, 880, 926]) disk drives is challenging. Whereas the former is not really important when building a query compiler, the latter is, as we have to attach costs to query evaluation plans. These costs reflect the amount of time we occupy the resource disk. Since disks are relatively slow, they may become the bottleneck of a database server. Modeling and minimizing disk access (time) is thus an important topic.

Consider the case where we want to read a block from a SCSI disk. Simplified, the following actions take place and take their time (see also Fig. 4.2):

1. The host sends the SCSI command.
2. The disk controller decodes the command and calculates the physical address.
3. During the seek, the disk drive's arm is positioned such that the corresponding head is correctly placed over the cylinder where the requested block resides. This step consists of several phases.
   (a) The disk controller accelerates the arm.
   (b) For long seeks, the arm moves with maximum velocity (coast).
   (c) The disk controller slows down the arm.
   (d) The disk arm settles on the desired location. The settle times differ for read and write requests. For reads, an aggressive strategy is used. If, after all, it turns out that the block could not be read correctly, we can just discard it. For writing, a more conservative strategy is in order.
4. The disk has to wait until the sector where the requested block resides comes under the head (rotational latency).
5. The disk reads the sector and transfers the data to the host.
6. Finally, it sends a status message.

[Figure 4.2: Disk drive read request processing — the read service times of requests to different disks on the same SCSI bus overlap]

Note that the transfers for different read requests are interleaved. This is possible since the capacity of the SCSI bus is higher than the read throughput of the disk. Also note that we did not mention the operating system delay and congestion on the SCSI bus. Disk drives apply several strategies to accelerate the above-mentioned round-trip time and access patterns like sequential read. Among them are caching, read-ahead, and command queuing. (ToDo: discuss interleaving?)

The seek and rotational latency times highly depend on the head's position on the platter surface. Let us consider seek time. A good approximation of the seek time where d cylinders have to be travelled is given by

  seektime(d) = c1 + c2 √d    if d ≤ c0
  seektime(d) = c3 + c4 d     if d > c0

where the constants ci are disk-specific. The constant c0 indicates the maximum number of cylinders where no coast takes place: seeking over a distance of more than c0 cylinders results in a phase where the disk arm moves with maximum velocity.

For disk accesses, the database system must be able to estimate the time they take to be executed. First of all, we need the parameters of the disk. It is not too easy to get hold of them, but we can make use of several tools to extract them from a given disk [243, 311, 866, 771, 938, 939]. However, then we have a big problem: when calculating I/O costs, the query compiler has no idea where the head will be when the query evaluation plan emits a certain read (or write) command. Thus, we have to find another solution. In the following, we will discuss a rather simplistic cost model that will serve us to get a feeling for disk behavior. Later, we develop a more realistic model (Section 4.17).

The solution is rather trivial: we sum up all command sending and interpreting times as well as the times for positioning (seek and rotational latency), which form by far the major part. Let us call the result latency time. Then, we assume an average latency time. This, of course, may result in large errors for a single request. However, on average, the error can be as "low" as 35% [750]. The next parameter is the sustained read rate. The disk is assumed to be able to deliver a certain amount of bytes per second while reading data stored consecutively. Of course, considering multi-zone disks, we know that this is oversimplified, but we are still in our simplistic model. Analogously, we have a sustained write rate. For simplicity, we will assume that this is the same as the sustained read rate. Last, the capacity is of some interest. A hypothetical disk (inspired by disks available in 2004) then has the following parameters:

  Model 2004
  Parameter             Value     Abbreviated Name
  capacity              180 GB    Dcap
  average latency time  5 ms      Dlat
  sustained read rate   100 MB/s  Dsrr
  sustained write rate  100 MB/s  Dswr

The time a disk needs to read and transfer n bytes is then approximated by Dlat + n/Dsrr.
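As a sketch in code (the seek constants c0 to c4 are invented placeholders, since they are disk-specific; the rest uses the Model 2004 parameters):

    #include <cmath>

    const double Dlat = 0.005;   // average latency time: 5 ms
    const double Dsrr = 100e6;   // sustained read rate: 100 MB/s

    // two-phase seek-time approximation (result in ms); the constants
    // c0..c4 are disk-specific, the values below are invented
    double seektime(double d) {
        const double c0 = 383, c1 = 0.8, c2 = 0.2, c3 = 3.0, c4 = 0.01;
        return d <= c0 ? c1 + c2 * std::sqrt(d) : c3 + c4 * d;
    }

    // time (in s) to read n bytes in c chunks, one positioning per chunk
    double readTime(double n, int c) { return c * Dlat + n / Dsrr; }

    // readTime(100e6, 1)     -> about 1.005 s (one sequential 100 MB read)
    // readTime(100e6, 12800) -> about 65 s    (12800 random 8 KB pages)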
Again, this is overly simplistic: (1) due to head switches and cylinder switches, long reads have lower throughput than short reads, and (2) multiple zones are not modelled correctly. However, let us use this very simplistic model to get some feeling for disk costs.

Database management system developers distinguish between sequential I/O and random I/O. For sequential I/O, there is only one positioning at the beginning, and then we can assume that data is read with the sustained read rate. For random I/O, one positioning for every unit of transfer (typically a page of, say, 8 KB) is assumed. Let us illustrate the effect of positioning by a small example. Assume that we want to read 100 MB of data stored consecutively on a disk. A sequential read takes 5 ms plus 1 s. If we read in blocks of 8 KB, where each block requires positioning, then reading 100 MB takes 65 s.

Assume that we have a relation of about 100 MB in size, stored on a disk, and we want to read it. Does it take 1 s or 65 s? If the blocks on which it is stored are randomly scattered on disk and we access them in a random order, 65 s is a good approximation. So let us assume that it is stored on consecutive blocks and that we read in chunks of 8 KB. Then

• other applications,
• other transactions, and
• other read operations of the same query evaluation plan

could move the head away from our reading position. (Congestion on the SCSI bus may also be a problem.) Again, we could be left with 65 s. Reading the whole relation with one read request is a possibility but may pose problems for the buffer manager. Fortunately, we can read in chunks much smaller than 100 MB. Consider Figure 4.3. If we read in chunks of 100 8 KB blocks, we are already pretty close to one second (within a factor of two).

[Figure 4.3: Time to read 100 MB from disk (depending on the number of 8 KB blocks read at once)]

Note that the interleaving of actions does not necessarily have a negative impact. This depends on the point of view, i.e. what we want to optimize. If we want to optimize the response time of a single query, then obviously the impact of concurrent actions is negative. If, however, we want to optimize resource (here: disk) usage, concurrent actions might help. ToDo?

There are two important things to learn here. First, sequential reads are much faster than random reads. Second, the runtime system should secure sequential reads. The latter point can be generalized: the runtime system of a database management system has, as far as query execution is concerned, two equally important tasks:

• allow for efficient query evaluation plans, and
• allow for smooth, simple, and robust cost functions.

Typical measures on the database side are

• a carefully chosen physical layout on disk (e.g. cylinder- or track-aligned extents [772, 773, 770], clustering),
• disk scheduling and multi-page requests [228, 458, 781, 782, 789, 807, 838, 930, 937],
• (asynchronous) prefetching,
• piggy-back scans,
• buffering (e.g. multiple buffers, replacement strategies from [71] to [600]), and, last but not least,
• efficient and robust algorithms for algebraic operators [347].

Let us take yet another look at it. 100 MB can be stored on 12800 8 KB pages. Figure 4.4 shows the time to read n random pages.

[Figure 4.4: Time needed to read n random pages]

In our simplistic cost model, reading 200 pages randomly costs about the same as reading 100 MB sequentially. That is, reading 1/64th of 100 MB randomly takes as long as reading the 100 MB sequentially.
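This claim can be checked numerically. The following sketch uses the break-even formula derived immediately below, with the Model 2004 parameters:

    // break-even point: number of random page reads costing as much as one
    // sequential read of d consecutive bytes
    // (a: positioning time, s: sustained read rate, p: page size)
    double breakEven(double a, double s, double d, double p) {
        return (a * s + d) / (a * s + p);
    }
    // Model 2004: a*s = 0.005 * 100e6 = 500 KB, so for d = 100 MB and
    // p = 8 KB, breakEven(0.005, 100e6, 100e6, 8192) is roughly 198 pages.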
Let us denote by a the positioning time, s the sustained read rate, p the page size, and d some amount of consecutively stored bytes. Let us calculate the break-even point:

  n · (a + p/s) = a + d/s
  n = (a + d/s)/(a + p/s) = (as + d)/(as + p)

a and s are disk parameters and, hence, fixed. For a fixed d, the break-even point depends on the page size. This is illustrated in Figure 4.5. The x-axis is the page size p in multiples of 1 KB and the y-axis is (d/p)/n for d = 100 MB.

[Figure 4.5: Break-even point in fraction of total pages depending on page size]

For sequential reads, the page size does not matter. (Be aware that our simplistic model heavily underestimates sequential reads.) For random reads, as long as a single page is read, it hardly matters either: reading a single page of 1 KB lasts 5.0097656 ms; for an 8 KB page, the number is 5.0781250 ms. From all this, we could draw the conclusion that the larger the page the better. However, this is only true for the disk, not, e.g., for the buffer or the SCSI bus. If we need to access only 500 B of a page, then the larger the page, the higher the fraction that is wasted. This is not as severe as it sounds. Other queries or transactions might need other parts of the page during a single stay in the buffer. Let us call the fraction of the page that is read by some transaction during a stay in the buffer the utilization. Obviously, the higher the utilization, the better our usage of the main memory in which the buffer resides. For smaller pages, the utilization is typically higher than for larger pages. The frequency with which pages are used is another factor [367, 368].

Excursion. Consider the root page of a B-tree. It is accessed quite frequently and most of its parts will be used, no matter how large it is. Hence, utilization is always good. Thus, the larger the root page of a B-tree the better. On the other hand, consider a leaf page of a B-tree that is much bigger than main memory. During a single stay of it, only a small fraction of the page will be used. That is, smaller leaf pages are typically better. By converting everything to money instead of time, Gray and Graefe [367] as well as Lomet [568] come to the conclusion that a page size between 8 and 16 KB was a good choice at the end of the last century.

For the less simplistic model of disk access costs developed in Section 4.17, we need to describe a disk drive by a set of parameters. These parameters are summarized in Table 4.1.

Let us close this section by giving upper bounds on seek time and rotational latency. Qyang proved the following theorem, which gives a tight upper bound on disk seek time if several cylinders of a consecutive range of cylinders have to be visited [705].

Theorem 4.1.1 (Qyang) If the disk arm has to travel over a region of C cylinders, it is positioned on the first of the C cylinders, and it has to stop at s − 1 of them, then sDseek(C/s) is an upper bound for the seek time.

The time required for s consecutive sectors in a track of zone i to pass by the head is

  Drot(s, i) = s DZscan(i) = s Drot / DZspt(i)    (4.1)

A trivial upper bound for the rotational delay is a full rotation.
  Dcyl          total number of cylinders
  Dtrack        total number of tracks
  Dsector       total number of sectors
  Dtpc          number of tracks per cylinder (= number of surfaces)

  Dcmd          command interpretation time
  Drot          time for a full rotation
  Drdsettle     settle time for read
  Dwrsettle     settle time for write
  Dhdswitch     time for a head switch

  DZone         total number of zones
  DZcyl(i)      number of cylinders in zone i
  DZspt(i)      number of sectors per track in zone i
  DZspc(i)      number of sectors per cylinder in zone i (= Dtpc DZspt(i))
  DZscan(i)     time to scan a sector in zone i (= Drot/DZspt(i))

  Davgseek      average seek costs
  Dc0 ... Dc4   parameters of the seek cost function

  Dseek(d)      cost of a seek of d cylinders:
                Dseek(d) = Dc1 + Dc2 √d   if d ≤ Dc0
                Dseek(d) = Dc3 + Dc4 d    if d > Dc0
  Drot(s, i)    rotation cost for s sectors of zone i (= s DZscan(i))

Table 4.1: Disk drive parameters and elementary cost functions

4.2 Database Buffer

The database buffer

1. is a finite piece of memory,
2. typically supports a limited number of different page sizes (mostly one or two),
3. is often fragmented into several buffer pools,
4. each having a replacement strategy (typically enhanced by hints).

Given the page identifier, the buffer frame is found by a hashtable lookup. Accesses to the hash table and the buffer frame need to be synchronized. Before accessing a page in the buffer, it must be fixed. These points account for the fact that the costs of accessing a page in the buffer are greater than zero.

4.3 Physical Database Organization

We call everything that is stored in the database and relevant for answering queries a database item. (We exclude meta data.) In a relational system, a database item can be a relation, a fragment of a relation (if the relation is horizontally or vertically fragmented), a segment, an index, a materialized view, or an index on a materialized view. In object-oriented databases, a database item can be the extent of a class, a named object, an index, and so forth. In XML databases, a database item can be a named document, a collection of documents, or an index. Access operations to database items form the leaves of query evaluation plans.

The physical algebra implemented in the query execution engine of a runtime system allows access to database items. Since most database items consist of several data items (tuples, objects, documents), these access operations produce a stream of data items. This kind of collection-valued access operation is called a scan. Consider the simple query

  select *
  from   Student

This query is valid only if the database item (relation) Student exists. It could be accessible via a relation scan operation rscan(Student). However, in reality we have to consider the physical organization of the database. Figure 4.6 gives an overview of how relations can be stored in a relational database system.

[Figure 4.6: Physical organization of a relational database — a Partition contains Segments, a Segment consists of Pages, a Page stores Records; a Relation is fragmented into Fragments, a Fragment is mapped to Segments, and a Record represents a Tuple]

Physical database items are found on the left-hand side of the figure, logical database items on its right-hand side. A fraction of a physical disk is a partition. It can be an operating system file or a raw partition. A partition is organized into several segments.
A segment consists of several pages. The pages within a segment are typically accessible via a non-negative integer in [0, n[, where n is the number of pages of the segment. (This might not be true: alternatively, the pages of a partition can be consecutively numbered.) Iterative access to all pages of a segment is typically possible. The access is called a scan. As there are several types of segments (e.g. data segments, index segments), several kinds of scans exist. Within a page, physical records are stored. Each physical record represents a (part of a) tuple of a fragment of a relation. Fragments are mapped to segments, and relations are partitioned into fragments.

In the simplest and most common organization, every relation has only one fragment with a one-to-one mapping to segments, and for every tuple there exists exactly one record representing only this tuple. Hence, both of the relationships mapped and represented are one-to-one. However, this organization does not scale well. A relation could be larger than a disk. Even if a large relation of, say, 180 GB fits on a disk, scanning it takes half an hour (Model 2004). Horizontal partitioning and allocation of the fragments on several disks reduce the scan time by allowing for parallelism. Vertical partitioning is another means of reducing I/O [206]. Here, a tuple is represented by several physical records, each one containing a subset of the tuple's attributes. Since the relationship mapped is N:M, tuples from different relations can be stored in the same segment. Furthermore, in distributed database systems some fragments might be stored redundantly at different locations to improve access times [136, 514, 706, 665]. Some systems support clustering of tuples of different relations. For example, department tuples can be clustered with employee tuples such that the employees belonging to a department are close together and close to their department tuple. Such an organization speeds up join processing.

To estimate costs, we need a model of a segment. We assume an extent-based implementation. That is, a segment consists of several extents. (Extents are not shown in Fig. 4.6. They can be included between Partitions and Segments.) Each extent occupies consecutive sectors on disk. For simplicity, we assume that whole cylinders belong to a segment. Then, we can model segments as follows. Each segment consists of a sequence of extents. Each extent is stored on consecutive cylinders. Cylinders are exclusively assigned to a segment. We then describe each extent j as a pair (Fj, Lj), where Fj is the first and Lj the last cylinder of a consecutive sequence of cylinders. A segment can then be described by a sequence of such pairs. We assume that these pairs are sorted in ascending order. In such a description, an extent may include a zone boundary. Since cost functions depend on the zone, we break up cylinder ranges that are not contained in a single zone. The result can be described by a sequence of triples (Fi, Li, zi), where Fi and Li mark a range of consecutive cylinders in a zone zi. Although the zi's can be inferred from the cylinder numbers, we include them for clarity. Also of interest are the total number of sectors in a segment and the number of cylinders Scpe(i) in an extent i. Summarizing, we describe a segment by the parameters given in Table 4.2.
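In code, this extent-based description might look as follows (a sketch; the names are ours, and the fields mirror the parameters of Table 4.2 below):

    #include <vector>

    struct Extent  { int first_cyl, last_cyl, zone; };   // (F_i, L_i, z_i)
    struct Segment { std::vector<Extent> extents; };     // ascending, zone-pure

    int cylindersInExtent(const Extent& e) {             // Scpe(i)
        return e.last_cyl - e.first_cyl + 1;
    }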
  Sext       number of extents in the segment
  Ssec       total number of sectors in the segment
             (= Σ_{i=1}^{Sext} Scpe(i) DZspc(Szone(i)))
  Sfirst(i)  first cylinder in extent i
  Slast(i)   last cylinder in extent i
  Scpe(i)    number of cylinders in extent i (= Slast(i) − Sfirst(i) + 1)
  Szone(i)   zone of extent i

Table 4.2: Segment parameters

4.4 Slotted Page and Tuple Identifier (TID)

Let us briefly review slotted pages and the concept of tuple identifiers (TIDs) (see Figure 4.7) [42, 41, 569, 845]. Sometimes, the term record identifier or row identifier (RID) is used in the literature. A TID consists of (at least) two parts. The first part identifies a page, the second part a slot on a slotted page. The slot contains (among other things, e.g. the record's size) a relative pointer to the actual record. This way, the record can be moved within the page without invalidating its TID. When a record grows beyond the available space, it is moved to another page and leaves a forward pointer (again consisting of a page and a slot identifier) in its original position. This happened to the TID [273, 1] in Figure 4.7. If the record has to be moved again, the forward pointer is adjusted. This way, at most two page accesses are needed to retrieve a record, given its TID. For evaluating the costs of record accesses, we will assume that the fraction of moved records is known.

[Figure 4.7: Slotted pages and TIDs]

4.5 Physical Record Layouts

A physical record represents a tuple, object, or some other logical entity or fraction thereof. In case it represents a tuple, it consists of several fields, each representing the value of an attribute. These values can be integers, floating point numbers, or strings. In case of object-oriented or object-relational systems, the values can also be of a complex type. Tuple identifiers are also possible as attribute values [731]. This can, for example, speed up join processing. In any case, we can distinguish between types whose values all exhibit the same fixed length and those whose values may vary in length.

[Figure 4.8: Various physical record layouts]

In a physical record, the values of fixed-length attributes are concatenated, and the offset from the beginning of the record to the value of some selected attribute can be inferred from the types of the values preceding it. This differs for values of varying length. Here, several encodings are possible. Some simple ones are depicted in Figure 4.8. The topmost record encodes varying-length values as a sequence of pairs of the form [size, value]. This encoding has the disadvantage that access to an attribute of varying length is linear in the number of those preceding it. This disadvantage is avoided in the solution presented in the middle. Instead of storing the sizes of the individual values, there is an array containing relative offsets into the physical record. They point to the start of the values. The lengths of the values can be inferred from these offsets and, in case of the last value, from the total length of the physical record, which is typically stored in its slot. Access to a value of varying size is now simplified to an indirect memory access plus some length calculations. Although this might be cheaper than the first solution, there is still a non-negligible cost associated with an attribute access.
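A sketch of field access in this offset-array layout; the exact layout details (offset width, placement) are our assumptions, not the book's:

    #include <cstdint>
    #include <string_view>

    // rec: start of the record, rec_size: total record size (from the slot),
    // offsets: relative offsets of the n_var variable-length values
    std::string_view varField(const uint8_t* rec, uint16_t rec_size,
                              const uint16_t* offsets, int n_var, int i) {
        uint16_t begin = offsets[i];
        // the length is the distance to the next offset, or, for the last
        // value, to the end of the record
        uint16_t end = (i + 1 < n_var) ? offsets[i + 1] : rec_size;
        return { reinterpret_cast<const char*>(rec) + begin,
                 static_cast<size_t>(end - begin) };
    }

As the text says, the access is one indirect memory lookup plus a length calculation, independent of how many variable-length values precede the requested one.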
The third physical record layout in Figure 4.8 can be used to represent compressed attribute values and even compressed length information for parts of varying size. Note that if fixed-size fields are compressed, their length becomes varying. Access to an attribute now means decompressing length/offset information and decompressing the value itself. The former is quite cheap: it boils down to an indirect memory access with some offset taken from an array [921]. The cost of the latter depends on the compression scheme used. It should be clear that accessing an attribute value now is even more expensive. To make the costs of an attribute access explicit was the sole purpose of this small section.

Remark. Westmann et al. discuss an efficient implementation of compression and evaluate its performance [921]. Yiannis and Zobel report on experiments with several compression techniques used to speed up the sort operator. For some of them, the CPU usage is twice as large [959].

4.6 Physical Algebra (Iterator Concept)

Physical algebraic operators are mostly implemented as iterators. This means that they support the interface operations open, next, and close. With open, the stream of items (e.g. tuples) is initialized. With next, the next item of the stream is fetched. When no more items are available, i.e. next returns false, close can be called to clean things up. The iterator concept is explained in many textbooks (e.g. [316, 403, 484]) and in the query processing survey by Graefe [347]. This basic iterator concept has been extended to better cope with nested evaluation by Westmann in his thesis [919], Westmann et al. [921], and Graefe [351]. The two main issues are the separation of storage allocation and initialization, and batched processing. The former splits open into resource allocation, initialization of the operator, and initialization of the iterator.

4.7 Simple Scan

Let us come back to the scan operations. A logical operation for scanning relations (which could be called rscan) is rarely supported by relational database management systems. Instead, they provide (physical) scans on segments. Since a (data) segment is sometimes called a file, the correct plan for the above query is often denoted by fscan(Student). Several assumptions must hold for this: the Student relation is not fragmented, it is stored in a single segment, the name of this segment is the same as the relation name, and no tuples from other relations are stored in this segment. Until otherwise stated, we will assume that relations are not partitioned, are stored in a single segment, and that the segment can be inferred from the relation's name. Instead of fscan(Student), we could then simply use Student to denote leaf nodes in a query execution plan.
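A scan like fscan(Student) is an iterator in the sense of Section 4.6. The following is a minimal sketch of the interface (names are ours; as mentioned above, real implementations additionally separate resource allocation from initialization and support batched processing):

    #include <optional>

    template <typename Item>
    struct Iterator {
        virtual ~Iterator() = default;
        virtual void open() = 0;                  // initialize the item stream
        virtual std::optional<Item> next() = 0;   // next item; empty at the end
        virtual void close() = 0;                 // release resources
    };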
If we want to use a variable that is successively bound to each tuple of a relation, the query

  select *
  from   Student

can be expressed as Student[s] instead of Student. In this notation, the output stream contains tuples having a single attribute s bound to a tuple. Physically, s will not hold the whole tuple but, for example, a pointer into the buffer where the tuple can be found. An alternative is a pointer to a slot of a slotted page contained in the buffer.

A simple scan is an example of a building block. In general, a building block is something that is used as a bottommost operator in a query evaluation plan. Hence, every leaf node of a query evaluation plan is a building block or a part thereof. This is not really a sharp definition, but it is sometimes useful to describe the behavior of a query compiler: after their determination, it will leave building blocks untouched even if reorderings were hypothetically possible. Although a building block can be more than a leaf node (scan) of a query evaluation plan, it will never include more than a single database item. As soon as more database items are involved, we use the notion of access path, a term which will become more precise later on when we discuss index usage. The disk access costs for a simple scan can be derived from the considerations in Section 4.1 and Section 4.17.

4.8 Scan and Attribute Access

Strictly speaking, a plan like σage>30(Student[s]) is invalid, since the tuple stream produced by Student[s] contains tuples with a single attribute s. We have a choice: either we assume that attribute access takes place implicitly, or we make it explicit. Whether this makes sense or not depends on the database management system for which we generate plans. Let us discuss the advantages of explicit attribute retrieval. Assume s.age retrieves the age of a student. Then we can write σs.age>30(Student[s]), where there is some non-negligible cost for s.age. The expression σs.age>30∧s.age<40(Student[s]) evaluates s.age twice. This is a bad idea. Instead, we would like to retrieve it once and reuse it later. This purpose is well served by the map operator (χ). It adds new attributes to a given tuple and is defined as

  χ_{a1:e1,...,an:en}(e) := {t ◦ [a1 : c1, . . . , an : cn] | t ∈ e, ci = ei(t) ∀ (1 ≤ i ≤ n)}

where ◦ denotes tuple concatenation and the ai must not be in A(e). (Remember that A(e) is the set of attributes produced by e.) Every input tuple t is extended by new attributes ai, whose values are computed by evaluating the expressions ei, in which free variables (attributes) are bound to the attributes (variables) provided by t. The above problem can now be solved by σage>30∧age<40(χage:s.age(Student[s])).

In general, it is beneficial to load attributes as late as possible. The latest point at which all attributes must be read from the page is typically just before a pipeline breaker. (The page on which the physical record resides must be fixed until all attributes are loaded. Hence, an earlier point in time might be preferable.) To see why this is useful, consider the simple query

  select name
  from   Student
  where  age > 30

The plan

  Πn(χ_{n:s.name}(σ_{a>30}(χ_{a:s.age}(Student[s]))))

makes use of this feature, while

  Πn(σ_{a>30}(χ_{n:s.name,a:s.age}(Student[s])))

does not. In the first plan, the name attribute is only accessed for those students with age over 30. Hence, the first plan should be cheaper to evaluate.
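To illustrate, here is a minimal sketch of χ over a materialized input sequence; representing tuples as attribute-value maps with integer values is our simplification:

    #include <functional>
    #include <map>
    #include <string>
    #include <vector>

    using Tuple = std::map<std::string, int>;

    // χ_{a:e}: extend every input tuple by the new attribute a with value e(t)
    std::vector<Tuple> chi(std::vector<Tuple> in, const std::string& a,
                           const std::function<int(const Tuple&)>& e) {
        for (Tuple& t : in)
            t[a] = e(t);
        return in;
    }

In χage:s.age(Student[s]), the function passed as e performs the potentially expensive attribute access exactly once per tuple; both comparisons of the subsequent selection then reuse the computed value.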
If the database management system does not support this selective access mechanism, we often find the scan enhanced by a list of attributes that is projected and included in the resulting tuple stream.

In order to avoid copying attributes from their storage representation to some main memory representation, some database management systems support the evaluation of some predicates directly on the storage representation. These are boolean expressions consisting of simple predicates of the form Aθc for an attribute A, a comparison operator θ, and a constant c. Instead of a constant, c could also be the value of some attribute or an expression thereof, given that it can be evaluated before the access to A. Predicates evaluable on the disk representation are called SARGable, where SARG is an acronym for search argument. Note that SARGable predicates may also be good for index lookups. Then they are called index SARGable. In case they cannot be evaluated by an index, they are called data SARGable [784, 863, 322].

Since relation or segment scans can evaluate predicates, we have to extend our notation for scans. Let I be a database item like a relation or segment. Then, I[v; p] scans I, binds each item in I successively to v, and returns only those items for which p holds. I[v; p] is equivalent to σp(I[v]), but cheaper to evaluate. If p is a conjunction of predicates, the conjuncts should be ordered such that the attribute access cost reductions described above are reflected (for details see Chapter ??). Syntactically, we express this by separating the predicates by a comma, as in Student[s; age > 30, name like '%m%']. If we want to make a distinction between SARGable and non-SARGable predicates, we write I[v; ps; pr], with ps being the SARGable predicate and pr a non-SARGable predicate. Additional extensions like a projection list are also possible.

4.9 Temporal Relations

Scanning a temporary relation or segment also makes sense. Whenever the result of some (partial) query evaluation plan is used more than once, it might be worthwhile to materialize it in some temporary relation. For this purpose, a tmp operator evaluates its argument expression and stores the result relation in a temporary segment. Consider the following example query:

  select e.name, d.name
  from   Emp e, Dept d
  where  e.age > 30 and e.age < 40 and e.dno = d.dno

It can be evaluated by

  Dept[d] ⋈nl_{e.dno=d.dno} σ_{e.age>30∧e.age<40}(Emp[e]).

Since the inner (right) argument of the nested-loop join is evaluated several times (once for each department), materialization may pay off. The plan then looks like

  Dept[d] ⋈nl_{e.dno=d.dno} Tmp(σ_{e.age>30∧e.age<40}(Emp[e])).

If we choose to factorize and materialize a common subexpression, the query evaluation plan becomes a DAG. Alternatively, we could write a small "program" that has some statements materializing some expressions which are then used later on. The last expression in a program determines its result. For our example, the program looks as follows:

  1. Rtmp = σ_{e.age>30∧e.age<40}(Emp[e]);
  2. Dept[d] ⋈nl_{e.dno=d.dno} Rtmp[e]

The disk costs of writing and reading temporary relations can be calculated using the considerations of Section 4.1.

4.10 Table Functions

A table function is a function that returns a relation [576]. An example is Primes(int from, int to), which returns all primes between from and to, e.g. via a sieve method. It can be used in any place where a relation name can occur. The query

  select *
  from   TABLE(Primes(1,100)) as p

returns all primes between 1 and 100. The attribute names of the resulting relation are specified in the declaration of the table function. Let us assume that for Primes a single attribute prime is specified. Note that table functions may take parameters. This does not pose any problems, as long as we know that Primes is a table function and we translate the above query into Primes(1, 100)[p]. Although this looks exactly like a table scan, the implementation and cost calculations are different. Consider the following query, where we extract the years in which we expect a special celebration of Anton's birthday.
4.10 Table Functions

A table function is a function that returns a relation [576]. An example is Primes(int from, int to), which returns all primes between from and to, e.g. via a sieve method. It can be used in any place where a relation name can occur. The query

select *
from TABLE(Primes(1,100)) as p

returns all primes between 1 and 100. The attribute names of the resulting relation are specified in the declaration of the table function. Let us assume that for Primes a single attribute prime is specified. Note that table functions may take parameters. This does not pose any problems as long as we know that Primes is a table function and we translate the above query into Primes(1,100)[p]. Although this looks exactly like a table scan, the implementation and cost calculations are different. Consider the following query, where we extract the years in which we expect a special celebration of Anton's birthday:

select *
from Friends f, TABLE(Primes(CURRENT YEAR, EXTRACT(YEAR FROM f.birthday) + 100)) as p
where f.name = 'Anton'

The result of the table function depends on our friend Anton. Hence, a regular join is no solution. Instead, we have to introduce a new kind of join, the d-join, where the d stands for dependent. It is defined as

R⟨S⟩ := {t ◦ s | t ∈ R, s ∈ S(t)}

The above query can now be evaluated as

χ_{b:EXTRACT(YEAR FROM f.birthday)+100}(σ_{f.name='Anton'}(Friends[f])) ⟨Primes(c, b)[p]⟩

where we assume that some global entity c holds the value of CURRENT YEAR.

Let us do the above query for all friends. We just have to drop the where clause. Obviously, this results in many redundant computations of primes. At the SQL level, using the birthday of the youngest friend is beneficial:

select *
from Friends f, TABLE(Primes(CURRENT YEAR, (select max(birthday) from Friends) + 100)) as p
where p.prime ≥ f.birthday

At the algebraic level, this kind of optimization will be considered in Section ??. Things can get even more involved if table functions can consume and produce relations, i.e. if arguments and results can be relations.

Little can be said about the disk costs of table functions. They can be zero if the function is implemented such that it does not access any disks (files stored there), but they can also be very expensive if large files are scanned each time the function is called. One possibility is to let the database administrator specify the numbers the query optimizer needs. However, since parameters are involved, this is not an easy task. Another possibility is to measure the table function's behavior whenever it is executed and learn about its resource consumption.
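The d-join is easy to state operationally. The following Python sketch pairs it with a toy Primes table function; the tuple representation and helper names are assumptions for illustration only.

```python
def d_join(outer, inner_fn):
    """Dependent join R<S>: for every tuple t of the outer, evaluate the
    parameterized inner S(t) and concatenate t with each result tuple."""
    for t in outer:
        for s in inner_fn(t):
            yield {**t, **s}

def primes(lo, hi):
    # table function: naive primality test over [lo, hi]
    for n in range(max(lo, 2), hi + 1):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            yield {"prime": n}

friends = [{"name": "Anton", "birthyear": 1980}]
current_year = 2024
plan = d_join(
    ({**f, "b": f["birthyear"] + 100} for f in friends if f["name"] == "Anton"),
    lambda t: primes(current_year, t["b"]),
)
print(list(plan)[:5])
```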
4.11 Indexes

There exists a plethora of different index structures. In the context of relational database management systems, the most versatile and robust index is the B-tree or variants/improvements thereof (e.g. [?]). It is implemented in almost every commercial database management system. Some systems support hash indexes (e.g. [?]). Other data models or specific applications need specialized indexes. There exist special index structures for indexing path expressions in object-oriented databases (e.g. [?]) and XML databases (e.g. [?]). Special-purpose indexes include join indexes (e.g. [401, 892]), multi-dimensional indexes (e.g. [?]), variant (projection) indexes [651], small materialized aggregates [614], bitmap indexes [?], and temporal indexes (e.g. [?]). We cannot discuss all indexes and their exploitation for efficient query evaluation; this would fill more than a single book. Instead, we concentrate on B-tree indexes.

In general, a B-tree can be used to index several relations. We only discuss cases where B-trees index a single relation. The search key (or key for short) of an index is the sequence of attributes of the indexed relation over which the index is defined. A key is a simple key if it consists of a single attribute; otherwise, it is a complex key. Each entry in a leaf page of the B-tree consists of a key value and a sequence of tuple identifiers (typically sorted by increasing page number). Every tuple with a TID in this list satisfies the condition that its indexed attributes' values are equal to the key values. If for every sequence of key values there is at most one such tuple, we have a unique index; otherwise, a non-unique index. The leaf entries may contain values from additional (non-key) attributes. Then we call the index attribute data added and the additional attributes data attributes. If the index contains all attributes of the indexed relation—in its key or data attributes—storing the relation is no longer necessary. The result is an index-only relation. In this case, the concept of tuple identifiers is normally no longer used, since tuples can now be moved frequently, e.g. due to a leaf page split. This has two consequences. First, the data part no longer contains the TID. Second, other indexes on the index-only relation cannot have tuple identifiers in their data part either. Instead, they use the key of the index-only relation to uniquely reference a tuple. For this to work, we must have a unique index.

B-trees can be either clustered or non-clustered indexes. In a clustered index, the tuple identifiers in the lists of the leaf pages are ordered according to their page numbers. Otherwise, it is a non-clustered index. (Of course, any degree of clusteredness may occur and has to be taken into account in cost calculations.) Figure 4.9 illustrates this. Range queries result in sequential access for clustered indexes and in random access for non-clustered indexes.

[Figure 4.9: Clustered vs. non-clustered index]

4.12 Single Index Access Path

4.12.1 Simple Key, No Data Attributes

Consider the exact-match query

select name
from Emp
where eno = 1077

If there exists a unique index on the key attribute eno, we can first access the index to retrieve the TID of the employee tuple satisfying eno = 1077. Another page access yields the tuple itself, which constitutes the result of the query. Let Emp_{eno} be the index on eno; then we can descend the B-tree using 1077 as the search key. A predicate that can be used to descend the B-tree or, in general, to govern the search within an index structure, is called an index SARGable predicate.

For the example query, the index scan, denoted as Emp_{eno}[x; eno = 1077], retrieves a single leaf node entry with attributes eno and TID. Similar to the regular scan, we assume x to be a variable holding a pointer to this index entry. We use the notations x.eno and x.TID to access these attributes. To dereference the TID, we use the map (χ) operator and a dereference function deref (or ∗ for short), which turns a TID into a pointer into the buffer area. This of course requires the page to be loaded if it is not in the buffer yet. The complete plan for the query is

Π_{name}(χ_{e:∗(x.TID), name:e.name}(Emp_{eno}[x; eno = 1077]))

where we computed several new attributes with one χ operator. Note that they depend on previously computed attributes; hence, the order of evaluation does matter. We can make the dependency of the map operator more explicit by applying a d-join. Denote by □ an operator that returns a single empty tuple. Then

Π_{name}(Emp_{eno}[x; eno = 1077] ⟨χ_{e:∗(x.TID), name:e.name}(□)⟩)

is equivalent to the former plan. Joins and indexes will be discussed in Section 4.14.
A range query like

select name
from Emp
where age ≥ 25 and age ≤ 35

specifies a range for the indexed attribute. It is evaluated by an index scan with start and stop conditions. In our case, the start condition is age ≥ 25 and the stop condition is age ≤ 35. The start condition is used to retrieve the first tuple satisfying it by searching within the B-tree. In our case, 25 is used to descend from the root to the leaf page containing the key 25. Then, all records within the page whose keys are not smaller than 25 are searched. Since entries in B-tree pages are sorted on key values, this is very efficient. If we are done with the leaf page that contains 25 and the stop condition is still satisfied, we proceed to the next leaf page. This is possible since the leaf pages of B-trees tend to be chained. Then all records of the next leaf page are scanned, and so on, until the stop condition is violated. The complete plan then is

Π_{name}(χ_{e:∗(x.TID), name:e.name}(Emp_{age}[x; 25 ≤ age; age ≤ 35]))

If the index on age is non-clustered, this plan results in random I/O. We can turn random I/O into sequential I/O by sorting the result of the index scan on its TID attribute before dereferencing it. (This might not be necessary if Emp fits into main memory; then, preferably, asynchronous I/O should be used.) This results in the following plan:

Π_{name}(χ_{e:∗(TID), name:e.name}(Sort_{TID}(Emp_{age}[x; 25 ≤ age; age ≤ 35; TID])))

Here, we explicitly included the TID attribute of the index in the projection list.

Consider a similar query which demands the output to be sorted:

select name, age
from Emp
where age ≥ 25 and age ≤ 35
order by age

Since an index scan on a B-tree outputs its result ordered on the indexed attribute, the following plan produces the perfect result:

Π_{name,age}(χ_{e:∗(x.TID), name:e.name}(Emp_{age}[x; 25 ≤ age; age ≤ 35]))

On a clustered index this is most probably the best plan. On a non-clustered index, random I/O disturbs the picture. We avoid it by sorting the result of the index scan on the TID attribute and, after accessing the tuples, restoring the order on age, as in the following plan:

Π_{name,age}(Sort_{age}(χ_{e:∗(TID), name:e.name}(Sort_{TID}(Emp_{age}[x; 25 ≤ age; age ≤ 35; TID]))))

An alternative to this plan is not to sort on the original indexed attribute (age in our example), but to introduce a new attribute that holds the rank in the sequence produced by the index scan. This leads to the plan

Π_{name,age}(Sort_{rank}(χ_{e:∗(TID), name:e.name}(Sort_{TID}(χ_{rank:counter++}(Emp_{age}[x; 25 ≤ age; age ≤ 35; TID])))))

This alternative might turn out to be more efficient, since sorting on an attribute with a dense domain can be implemented efficiently. (We admit that in the above example this is not worth considering.) There is another important application of this technique: XQuery often demands output in document order. If this order is destroyed during processing, it must be restored at the latest when the output is produced [592]. Depending on the implementation (i.e. the representation of document nodes or their identifiers), this might turn out to be a very expensive operation.

The fact that index scans on B-trees return their result ordered on the indexed attributes is also very useful if a merge join on the same attributes (or a prefix thereof, see Chapter 23 for further details) occurs. An example follows later on.

Some predicates are not index SARGable but can still be evaluated with the index, as in the following query:

select name
from Emp
where age ≥ 25 and age ≤ 35 and age ≠ 30

The predicate age ≠ 30 is an example of a residual predicate. We can once more extend the index scan and compile the query into

Π_{name}(χ_{t:x.TID, e:∗t, name:e.name}(Emp_{age}[x; 25 ≤ age; age ≤ 35; age ≠ 30]))
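Here is a minimal sketch of an index range scan with start and stop conditions, a residual predicate, and the TID sort discussed above. The sorted (key, TID) list stands in for the chained leaf level of a B-tree, and fetch is a stand-in for TID dereferencing; all names are illustrative assumptions.

```python
import bisect

def index_scan(leaf_entries, start_key, stop_key, residual=lambda k: True):
    """Scan a sorted list of (key, tid) pairs from the start condition to
    the stop condition, applying a residual predicate on the key."""
    i = bisect.bisect_left(leaf_entries, (start_key,))  # descend to start key
    while i < len(leaf_entries) and leaf_entries[i][0] <= stop_key:
        key, tid = leaf_entries[i]
        if residual(key):
            yield key, tid
        i += 1

def fetch(tid):
    # dereference a TID; in a real system this pins the page in the buffer
    return {"tid": tid}

# Emp_age[x; 25 <= age; age <= 35; age != 30], then Sort_TID before fetching
entries = [(20, 9), (25, 4), (30, 7), (33, 1), (40, 2)]
hits = list(index_scan(entries, 25, 35, residual=lambda age: age != 30))
tuples = [fetch(tid) for _, tid in sorted(hits, key=lambda e: e[1])]  # sequential I/O
print(hits, tuples)
```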
Some index scan implementations allow exclusive bounds for the start and stop conditions. With them, the query

select name
from Emp
where age > 25 and age < 35

can be evaluated using

Π_{name}(χ_{t:x.TID, e:∗t, name:e.name}(Emp_{age}[x; 25 < age; age < 35]))

If this is not the case, two residual predicates must be used, as in

Π_{name}(χ_{t:x.TID, e:∗t, name:e.name}(Emp_{age}[x; 25 ≤ age; age ≤ 35; age ≠ 25, age ≠ 35]))

Especially for predicates on strings, this might be expensive.

Start and stop conditions are optional. To evaluate

select name
from Emp
where age ≥ 60

we use age ≥ 60 as the start condition to find the leaf page containing the key 60. From there on, we scan all the leaf pages "to the right". If we have no start condition, as in

select name
from Emp
where age ≤ 20

we descend the B-tree to the "leftmost" page, i.e. the page containing the smallest key value, and then proceed scanning leaf pages until the stop condition age ≤ 20 is violated.

Having neither a start nor a stop condition is also quite useful: the query

select count(*)
from Emp

can be evaluated by counting the entries in the leaf pages of a B-tree. Since a B-tree typically occupies far fewer pages than the original relation, we have a viable alternative to a relation scan. The same applies to the aggregate functions sum and avg. The aggregate functions min and max can be evaluated much more efficiently by descending to the leftmost or rightmost leaf page of the B-tree. This can be used to answer queries like

select min/max(salary)
from Emp

much more efficiently than by a relation scan. Consider the query

select name
from Emp
where salary = (select max(salary) from Emp)

It can be evaluated by first computing the maximum salary and then retrieving the employees earning this salary. This requires two descents into the B-tree, while obviously one is sufficient. Depending on the implementation of the index (scan), we might be able to perform this optimization.

Further, the result of an index scan, whether it uses start and/or stop conditions or not, is always sorted on the key. This property can be useful for queries without predicates. If we have neither a start nor a stop condition, the resulting scan is called a full index scan. As an example, consider the query

select salary
from Emp
order by salary

which is perfectly answered by the full index scan Emp_{salary}.

So far, we have only seen indexes on numerical attributes. The query

select name, salary
from Emp
where name ≥ 'Maaa'

gives rise to the start condition 'Maaa' ≤ name. From the query

select name, salary
from Emp
where name like 'M%'

we can deduce the start condition 'M' ≤ name.

To express all the different alternatives of index usage, we need a powerful (and runtime-system-dependent) index scan expression. Let us first summarize what we can specify for an index scan:

1. the name of the variable for index entries (or pointers to them),
2. the start condition,
3. the stop condition,
4. a residual predicate, and
5. a projection list.

A projection list has entries of the form a : x.b for attribute names a and b, with x being the name of the variable for the index entry. a : x.a is also allowed and often abbreviated as a. We also often summarize start and stop conditions into a single expression, as in 25 ≤ age ≤ 35. For a full index specification, we list all items in the subscript of the index name, separated by semicolons. Still, we need some extensions to express queries with aggregation.
Let a and b be attribute names. Then we allow entries of the form b : aggr(a) in the projection list and start/stop conditions of the form min/max(a). The latter tell us to minimize/maximize the value of the indexed attribute a. Only a complete enumeration gives us the full details. On the other hand, extracting start and stop conditions and residual predicates from a given boolean expression is rather simple. Hence, we often summarize these three under a single predicate. This is especially useful when talking about index scans in general. If we have a full index scan, we leave out the predicate. We use a star '*' as an abbreviated projection list that projects all attributes of the index. (So far, these are the key attribute and the TID.) If the projection list is empty, we assume that only the variable/attribute holding the pointer to the index entry is projected.

Using this notation, we can express some plan fragments. These fragments are complete plans for the above queries, except that the final projection is not present. As an example, consider the following fragment:

χ_{e:∗TID, name:e.name}(Emp_{salary}[x; TID, salary])

All the plan fragments seen so far are examples of access paths. An access path is a plan fragment with building blocks concerning a single database item. Hence, every building block is an access path. The above plans touch two database items: a relation and an index on some attribute of that relation. If we say that an index concerns the relation it indexes, such a fragment is an access path. For relational systems, the most general case of an access path uses several indexes to retrieve the tuples of a single relation. We will see examples of such more complex access paths in the following section. An access to the original relation is not always necessary. A query that can be answered solely by accessing indexes is called an index-only query.

A query with an in predicate like

select name
from Emp
where age in {28, 29, 31, 32}

can be evaluated by taking the minimum and the maximum of the values on the right-hand side of in as the start and stop conditions. We further need to construct a residual predicate. The residual predicate can be represented either as age = 28 ∨ age = 29 ∨ age = 31 ∨ age = 32 or as age ≠ 30.

An alternative is to use a d-join. Consider the example query

select name
from Emp
where salary in {1111, 11111, 111111}

Here, the numbers are far apart and separate index accesses might make sense. Therefore, let us create a temporary relation Sal equal to {[s : 1111], [s : 11111], [s : 111111]}. Using it, the access path becomes

Sal[S] ⟨χ_{e:∗TID, name:e.name}(Emp_{salary}[x; salary = S.s; TID])⟩

Some B-tree implementations allow efficient searches for multiple ranges and implement gap skipping [34, 35, 170, 322, 323, 476, 545]. Gap skipping, sometimes also called zig-zag skipping, continues the search for keys in a new key range from the latest position visited. The implementation details vary, but the main idea is that after one range has been completely scanned, the current (leaf) page is checked for its highest key. If this key is not smaller than the lower bound of the next range, the search continues in the current page. If it is smaller than the lower bound of the next range, alternative implementations are described in the literature. The simplest is to start a new search from the root for the lower bound. Another alternative uses parent pointers to go up a page as long as the highest key of the current page is smaller than the lower bound of the next range. If this is no longer the case, the search continues downwards again. Gap skipping gives even more opportunities for index scans and allows efficient implementations of various index nested-loop join strategies.
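A sketch of the idea behind gap skipping follows, under the simplifying assumption that the leaf level is a single sorted key sequence; re-positioning via bisect stands in for staying in the current page, restarting from the root, or following parent pointers.

```python
import bisect

def gap_skipping_scan(keys, ranges):
    """Scan a sorted key sequence for several disjoint, ascending ranges.
    After finishing one range, re-position directly at the lower bound of
    the next range instead of scanning the gap in between."""
    pos = 0
    for lo, hi in ranges:
        # "descend" to the first key >= lo, starting from the latest position
        pos = bisect.bisect_left(keys, lo, pos)
        while pos < len(keys) and keys[pos] <= hi:
            yield keys[pos]
            pos += 1

# salary in {1111, 11111, 111111} as three singleton ranges
salaries = [900, 1111, 2000, 5000, 11111, 90000, 111111, 200000]
print(list(gap_skipping_scan(salaries,
                             [(1111, 1111), (11111, 11111), (111111, 111111)])))
```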
4.12.2 Complex Keys and Data Attributes

In general, an index can have a complex key comprised of key attributes k_1, ..., k_n and data attributes d_1, ..., d_m. One possibility is to use a full index scan on such an index. Having more attributes in the index makes it more probable that queries are index-only.

Besides a full index scan, the index can be descended to directly search for the desired tuple(s). Let us take a closer look at this possibility. If the search predicate is of the form

k_1 = c_1 ∧ k_2 = c_2 ∧ ... ∧ k_j = c_j

for some constants c_i and some j ≤ n, we can generate the start and stop condition k_1 = c_1 ∧ ... ∧ k_j = c_j. This simple approach is only possible if the search predicate specifies values for a prefix of the search key attributes, i.e. for all key attributes from the first up to the j-th, with no key attribute unspecified in between. Predicates concerning key attributes after the first unspecified key attribute, as well as predicates on the additional data attributes, only allow for residual predicates. This restriction often does not apply to multi-dimensional index structures, whose discussion is beyond the scope of this book.

With ranges, things become more complex and highly dependent on the capabilities of the B-tree implementation. Consider a query predicate restricting key values as follows:

k_1 = c_1 ∧ k_2 ≥ c_2 ∧ k_3 = c_3

Obviously, we can generate the start condition k_1 = c_1 ∧ k_2 ≥ c_2 and the stop condition k_1 = c_1. Here, we neglected the condition on k_3, which becomes a residual predicate. However, with some care we can extend the start condition to k_1 = c_1 ∧ k_2 ≥ c_2 ∧ k_3 = c_3: we only have to keep k_3 = c_3 as a residual predicate, since for k_2 values larger than c_2, values different from c_3 can occur for k_3.

If closed ranges are specified for a prefix of the key attributes, as in

a_1 ≤ k_1 ≤ b_1 ∧ ... ∧ a_j ≤ k_j ≤ b_j

we can generate the start key k_1 = a_1 ∧ ... ∧ k_j = a_j, the stop key k_1 = b_1 ∧ ... ∧ k_j = b_j, and a_2 ≤ k_2 ≤ b_2 ∧ ... ∧ a_j ≤ k_j ≤ b_j as the residual predicate. If for some search key attribute k_j the lower bound a_j is not specified, the start condition cannot contain k_j or any k_{j+i}. If for some search key attribute k_j the upper bound b_j is not specified, the stop condition cannot contain k_j or any k_{j+i}.

Two further enhancements of the B-tree functionality possibly allow for alternative start/stop conditions:

• The B-tree implementation allows specifying the order (ascending or descending) for each key attribute individually.
• The B-tree implementation supports forward and backward scans (as implemented, e.g., in Rdb [34]).

So far, we are only able to exploit query predicates that specify value ranges for a prefix of the key attributes. Consider querying a person on his/her height and hair color: haircolor = 'blond' and height between 180 and 190. If we have an index on (sex, haircolor, height), this index cannot be used by means of the techniques described so far. However, since there are only the two values male and female available for sex, we can rewrite the query predicate to

(sex = 'm' and haircolor = 'blond' and height between 180 and 190) or
(sex = 'f' and haircolor = 'blond' and height between 180 and 190)

and use two accesses to the index. This approach works fine for attributes with a small domain and is described by Antoshenkov [35]. (See also the discussion of gap skipping above.) Since the possible values of key attributes may not be known to the query optimizer, Antoshenkov goes one step further and shifts the construction of search ranges to index scan time. The index is provided with a complex boolean expression, which is refined (rewritten) as soon as search key values become known. Search ranges are then generated dynamically, and gap skipping is applied to skip the intervals between the qualifying ranges during the index scan.
4.13 Multi Index Access Path

We wish to buy a used digital camera and state the following query:

select *
from Camera
where megapixel > 5 and distortion < 0.05
  and noise < 0.01 and zoomMin < 35 and zoomMax > 105

We assume that there is an index on every attribute used in the where clause. Since the predicates are conjunctively connected, we can use a technique called index and-ing. Every index scan returns a set (list) of tuple identifiers, and these sets/lists are then intersected. This operation is also called And merge [562]. Using index and-ing, a possible plan is

((((Camera_{megapixel}[c; megapixel > 5; TID]
 ∩ Camera_{distortion}[c; distortion < 0.05; TID])
 ∩ Camera_{noise}[c; noise < 0.01; TID])
 ∩ Camera_{zoomMin}[c; zoomMin < 35; TID])
 ∩ Camera_{zoomMax}[c; zoomMax > 105; TID])

This results in a set of tuple identifiers that only needs to be dereferenced to access the qualifying Camera tuples and produce the final result.

Since the costs of the expression clearly depend on the costs of the index scans and the sizes of the intermediate TID sets, two questions arise:

• In which order do we intersect the TID sets resulting from the index scans?
• Do we really apply all indexes before dereferencing the tuple identifiers?

The answer to the latter question is clearly "no" if the next index scan is more expensive than accessing the records in the current TID list. It can be shown that the indexes in the cascade of intersections should be ordered by increasing (f_i − 1)/c_i, where f_i is the selectivity of index i and c_i its access cost. Further, we can stop intersecting as soon as accessing the original tuples in the base relation becomes cheaper than intersecting with another index and subsequently accessing the base relation.

Index or-ing is used to process disjunctive predicates. Here, we take the union of the TID sets to produce a set of TIDs containing references to all qualifying tuples. Note that duplicates must be eliminated during the processing of the union. This operation is also called Or merge [562]. Consider the query

select *
from Emp
where yearsOfEmployment ≥ 30 or age ≥ 65

This query can be answered by constructing a TID set using the expression

Emp_{yearsOfEmployment}[c; yearsOfEmployment ≥ 30; TID] ∪ Emp_{age}[c; age ≥ 65; TID]

and then dereferencing the list of tuple identifiers. Again, the index accesses can be ordered for better performance.

Given a general boolean expression in and and or, constructing the optimal access path using index and-ing and or-ing is a challenging task that will be discussed in Chapter ??. This task becomes even more challenging if some simple predicates occur more than once in the complex boolean expression and factorization has to be taken into account. This issue was first discussed by Chaudhuri, Ganesan, and Sarawagi [149]. We will come back to it in Chapter ??.
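The following sketch shows index and-ing and or-ing on TID sets, including the (f_i − 1)/c_i ordering mentioned above; the selectivities, costs, and TID sets are made-up illustrative values.

```python
def and_merge(index_scans):
    """Index and-ing: intersect TID sets, cheapest/most selective first.
    Each scan is a triple (selectivity f, access cost c, tid_set); ordering
    by increasing (f - 1) / c is the rule cited in the text."""
    ordered = sorted(index_scans, key=lambda s: (s[0] - 1.0) / s[1])
    tids = ordered[0][2]
    for _, _, ts in ordered[1:]:
        tids = tids & ts          # And merge
        if not tids:
            break                 # nothing left to dereference
    return tids

def or_merge(tid_sets):
    """Index or-ing: union of TID sets; set semantics remove duplicates."""
    out = set()
    for ts in tid_sets:
        out |= ts                 # Or merge
    return out

scans = [(0.5, 1.0, {1, 2, 3, 4}), (0.1, 2.0, {2, 4, 9}), (0.4, 1.0, {2, 4, 5})]
print(and_merge(scans), or_merge([{1, 2}, {2, 3}]))
```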
The names index and-ing and or-ing become clear if bitmap indexes are considered: then the bitwise and and or operations can be used to efficiently compute the intersection and union.

There are even more possibilities to work with TID sets. Consider the query

select *
from Emp
where yearsOfEmployment ≠ 10 and age ≥ 65

This query can be evaluated by scanning the index on age and then eliminating all employees with yearsOfEmployment = 10:

Emp_{age}[c; age ≥ 65; TID] \ Emp_{yearsOfEmployment}[c; yearsOfEmployment = 10; TID]

Let us call the application of set difference to index scan results index differencing. Some predicates might not be very restrictive in the sense that more than half of the index has to be scanned. By negating such predicates and using index differencing, we can make sure that at most half of the index needs to be scanned. As an example, consider the query

select *
from Emp
where yearsOfEmployment ≤ 5 and age ≤ 65

Assume that most of our employees are younger than 65. Then

Emp_{yearsOfEmployment}[c; yearsOfEmployment ≤ 5; TID] \ Emp_{age}[c; age > 65; TID]

could be more efficient than

Emp_{yearsOfEmployment}[c; yearsOfEmployment ≤ 5; TID] ∩ Emp_{age}[c; age ≤ 65; TID]

4.14 Indexes and Joins

There are two issues when discussing indexes and joins. The first is that indexes can be used to speed up join processing. The second is that index accesses can be expressed as joins. We discuss both issues, starting with the latter.

In our examples, we used the map operator to (implicitly) access the relation by dereferencing the tuple identifiers. We can make this access explicit by exchanging the map operator for a d-join or even a join. Then, for example,

χ_{e:∗TID, name:e.name}(Emp_{age}[x; 25 ≤ age ≤ 35; TID])

becomes

Emp_{age}[x; 25 ≤ age ≤ 35; TID] ⟨χ_{e:∗TID, name:e.name}(□)⟩

where □ returns a single empty tuple. Assume that every tuple contains a virtual attribute TID containing its TID. This attribute does not have to be stored explicitly but can be derived. Then we have the following alternative access path for the join (ignoring projections):

Emp_{age}[x; 25 ≤ age ≤ 35] ⋈_{x.TID=e.TID} Emp[e]

For the join operator, the pointer-based join implementation developed in the context of object-oriented databases may be the most efficient way to evaluate this access path [804]. Obviously, sorting the result of the index scan on the tuple identifiers can speed up processing since it turns random into sequential I/O. However, this destroys the order on the key, which might itself be useful later on during query processing or required by the query. (Restoring the order may be cheaper than a typical sort, since the tuples can be numbered before the first sort on tuple identifiers, and this dense numbering leads to efficient sort algorithms.) Sorting the tuple identifiers was proposed by, e.g., Yao [957] and, in the context of RDB/V1, by Makinouchi, Tezuka, Kitakami, and Adachi [578]. The different variants (whether and where to sort, join order) can now be transparently determined by the plan generator: no special treatment is necessary. Further, the join predicate need not be on the tuple identifiers only, but can also be on key attributes. This often allows joins with relations (or indexes) other than the indexed relation before the relation itself is accessed.
Rosenthal and Reiner proposed using joins to represent access paths with indexes [739]. This approach is very elegant, since no special treatment for index processing is required. However, if there are many relations and indexes, the search space might become very large, as every index increases the number of joins to be performed. This is why Mohan, Haderle, Wang, and Cheng abandoned this approach and sketched a heuristic that determines an access path in case multiple indexes exist on a single table [625].

The query

select name, age
from Emp
where name like 'R%' and age between 40 and 50

is an index-only query (assuming indexes on name and age) and can be translated to

Π_{name,age}(
  Emp_{age}[a; 40 ≤ age ≤ 50; TIDa, age]
  ⋈_{TIDa=TIDn}
  Emp_{name}[n; name ≥ 'R'; name like 'R%'; TIDn, name])

Let us now discuss the first issue, namely that indexes can speed up join processing. The query

select *
from Emp e, Dept d
where e.name = 'Maier' and e.dno = d.dno

can be directly translated to

σ_{e.name='Maier'}(Emp[e]) ⋈_{e.dno=d.dno} Dept[d]

If there are indexes on Emp.name and Dept.dno, we can replace σ_{e.name='Maier'}(Emp[e]) by an index scan, as we have seen previously:

χ_{e:∗(x.TID), A(Emp):e.∗}(Emp_{name}[x; name = 'Maier'])

Here, A(Emp) : e.∗ abbreviates access to all attributes of Emp. This especially includes dno : e.dno. (Strictly speaking, we do not have to access the name attribute, since its value is already known.) As we have also seen, an alternative is to use a d-join instead:

Emp_{name}[x; name = 'Maier'] ⟨χ_{t:∗(x.TID), A(Emp):t.∗}(□)⟩

Let us abbreviate Emp_{name}[x; name = 'Maier'] by E_i and χ_{t:∗(x.TID), A(Emp):t.∗}(□) by E_a. Now, for every binding of dno produced by E_a, we can use the index on Dept.dno to access the corresponding department tuple:

E_i ⟨E_a⟩ ⟨Dept_{dno}[y; y.dno = dno]⟩

Note that the inner expression Dept_{dno}[y; y.dno = dno] contains the free variable dno, which is bound by E_a. Dereferencing the TID of the department results in the following algebraic modelling of a complete index nested-loop join:

E_i ⟨E_a⟩ ⟨Dept_{dno}[y; y.dno = dno; dTID : y.TID]⟩ ⟨χ_{u:∗dTID, A(Dept):u.∗}(□)⟩

Let us abbreviate Dept_{dno}[y; y.dno = dno; dTID : y.TID] by D_i and χ_{u:∗dTID, A(Dept):u.∗}(□) by D_a. Fully abbreviated, the expression becomes

E_i ⟨E_a⟩ ⟨D_i⟩ ⟨D_a⟩

Several optimizations can be applied to this expression. Sorting the outer of a d-join is useful in several circumstances, since it may

• turn random I/O into sequential I/O and/or
• avoid reading the same page twice.

In our example expression,

• we can sort the result of the expression E_i on TID in order to turn random I/O into sequential I/O, if there are many employees named "Maier";
• we can sort the result of the expression E_i⟨E_a⟩ on dno for several reasons:
  – if there are duplicates for dno, i.e. there are many employees named "Maier" in the same department, this guarantees that no index page (of the index Dept.dno) has to be read more than once;
  – if additionally Dept.dno is a clustered index or Dept is an index-only table contained in Dept.dno, then large parts of the random I/O can be turned into sequential I/O;
  – if the result of the inner is materialized (see below), then only one result needs to be stored; note that sorting is not necessary: grouping would suffice to avoid duplicate work;
• we can sort the result of the expression E_i⟨E_a⟩⟨D_i⟩ on dTID, for the same reasons as those given above for sorting the result of E_i on TID.

The reader is advised to explicitly write down the alternatives. Another exercise is to give plan alternatives for the different cases of DB2's Hybrid Join [322], which can be decomposed into primitives like relation scan, index scan, d-join, sorting, TID dereferencing, and access to a unique index (see below).
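The chain E_i⟨E_a⟩⟨D_i⟩⟨D_a⟩ can be mimicked with nested d-joins. The following Python sketch uses toy dictionaries as stand-ins for the indexes and base relations; all names and data are illustrative.

```python
def d_join(outer, inner_fn):
    # dependent join: evaluate the inner once per outer tuple
    for t in outer:
        for s in inner_fn(t):
            yield {**t, **s}

emp_name_idx = {"Maier": [101, 205]}            # name -> TIDs (E_i)
emp_table = {101: {"dno": 7}, 205: {"dno": 3}}  # TID -> tuple  (E_a dereferences here)
dept_dno_idx = {7: [900], 3: [901]}             # dno -> TIDs   (D_i)
dept_table = {900: {"dname": "R&D"}, 901: {"dname": "Sales"}}

e_i = ({"TID": tid} for tid in emp_name_idx["Maier"])
e_ia = d_join(e_i, lambda t: [emp_table[t["TID"]]])                             # E_i <E_a>
e_iad = d_join(e_ia, lambda t: ({"dTID": d} for d in dept_dno_idx[t["dno"]]))   # ... <D_i>
plan = d_join(e_iad, lambda t: [dept_table[t["dTID"]]])                         # ... <D_a>
print(list(plan))
```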
Let us take a closer look at materializing the result of the inner of the d-join. IBM's DB2 for MVS considers temping the inner (i.e. creating a temporary relation for it) if it is an index access [322]. Graefe provides a general discussion of the subject [351].

Let us start with the above example. Typically, many employees work in a single department, and possibly several of them are called "Maier". For every one of them, we can be sure that there exists at most one department. Let us assume that referential integrity has been specified. Then, there exists exactly one department for every employee. We have to find a way to rewrite the expression

E_i ⟨E_a⟩ ⟨Dept_{dno}[y; y.dno = dno; dTID : y.TID]⟩

such that the mapping dno → dTID is explicitly materialized (or, as one could also say, cached). For this purpose, Hellerstein and Naughton introduced a modified version of the map operator that materializes its result [414]. Let us denote this operator by χ^{mat}. The advantage of using this operator is that it is quite general and can be used for different purposes (see e.g. [103], Chap. ??, Chap. ??).

Since the map operator extends a given input tuple by some attribute values, which must be computed by an expression, we need an expression for the access to a unique index. For our example, we write IdxAcc^{Dept}_{dno}[y; y.dno = dno] to express the lookup of a single (unique) entry in the index on Dept.dno. We assume that the result is a (pointer to the) index entry containing the key attributes and all data attributes, including the TID of some tuple. If we are interested in only one of these attributes, a further attribute access (dereference) is necessary. Now we can rewrite the above expression to

E_i ⟨E_a⟩ ⟨χ^{mat}_{dTID:(IdxAcc^{Dept}_{dno}[y; y.dno = dno]).TID}(□)⟩

If we further assume that the outer (E_i⟨E_a⟩) is sorted on dno, then it suffices to remember only the TID for the latest dno. We define the map operator χ^{mat,1} to do exactly this. A more efficient plan could thus be

Sort_{dno}(E_i ⟨E_a⟩) ⟨χ^{mat,1}_{dTID:(IdxAcc^{Dept}_{dno}[y; y.dno = dno]).TID}(□)⟩

where, strictly speaking, sorting is not necessary: grouping would suffice.
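A sketch of the two materializing map variants follows; the cache policies (full cache vs. last binding only) follow the text, while the attribute names and data are illustrative assumptions.

```python
def chi_mat(compute):
    """Materializing map chi^mat: cache the computed attribute value for
    every distinct argument binding seen so far."""
    cache = {}
    def apply(t, key):
        if key not in cache:
            cache[key] = compute(key)
        return {**t, "dTID": cache[key]}
    return apply

def chi_mat_1(compute):
    """chi^{mat,1}: remember only the value for the latest binding;
    sufficient when the outer is sorted (or grouped) on the argument."""
    last = [None, None]
    def apply(t, key):
        if last[0] != key:
            last[0], last[1] = key, compute(key)
        return {**t, "dTID": last[1]}
    return apply

dept_dno_unique_idx = {3: 901, 7: 900}   # unique index: dno -> TID
lookup = chi_mat_1(lambda dno: dept_dno_unique_idx[dno])
outer = [{"emp": 205, "dno": 3}, {"emp": 300, "dno": 3}, {"emp": 101, "dno": 7}]
print([lookup(t, t["dno"]) for t in outer])  # outer is sorted on dno
```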
Consider a general expression of the form e_1⟨e_2⟩. The free variables used in e_2 must be a subset of the variables (attributes) produced by e_1, i.e. F(e_2) ⊆ A(e_1). Even if e_1 does not contain duplicates, the projection of e_1 on F(e_2) may contain duplicates. If so, materialization could pay off. However, in general, for every binding of the variables F(e_2), the expression e_2 may produce several tuples. This means that using χ^{mat} is not sufficient. Consider the query

select *
from Emp e, Wine w
where e.yearOfBirth = w.year

If we have no indexes, we can answer this query by a simple join, where we only have to decide the join method and which of the relations becomes the outer and which the inner. Assume we have wines from only a few years. (Alternatively, some selection could have been applied.) Then it might make sense to consider the following alternative:

Wine[w] ⟨σ_{e.yearOfBirth=w.year}(Emp[e])⟩

However, the relation Emp is scanned once for each Wine tuple. Hence, it might make sense to materialize the result of the inner for every year value of Wine if there are only a few distinct year values. In other words, if there are many duplicates for the year attribute of Wine, materialization may pay off, since then we have to scan Emp only once for each distinct year value of Wine.

Caching the result of the inner in the case that every binding of its free variables possibly produces many tuples requires a new operator. Let us call this operator memox and denote it by M [351, 103]. For each binding of the free variables of its only argument, it remembers the set of result tuples produced by its argument expression and does not evaluate the argument again if the result is already cached. Using memox, the above plan becomes

Wine[w] ⟨M(σ_{e.yearOfBirth=w.year}(Emp[e]))⟩

It should be clear that for more complex inners, the memox operator can be applied to all branches, giving rise to numerous caching strategies. Analogously to the materializing map operator, we can restrict the materialization to the result for a single binding of the free variables if the outer is sorted (or grouped) on the free variables:

Sort_{w.year}(Wine[w]) ⟨M^{1}(σ_{e.yearOfBirth=w.year}(Emp[e]))⟩

Things can become even more efficient if there is an index on Emp.yearOfBirth:

Sort_{w.year}(Wine[w]) ⟨M^{1}(Emp_{yearOfBirth}[x; x.yearOfBirth = w.year] ⟨χ_{e:∗(x.TID), A(Emp):e.∗}(□)⟩)⟩
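A sketch of memox follows; caching whole result sets per binding of the free variables is what distinguishes it from χ^{mat}. The relations and attribute names are illustrative assumptions.

```python
def memox(inner_fn):
    """Memox operator M: cache the full result set of the inner per
    binding of its free variables."""
    cache = {}
    def evaluate(binding):
        if binding not in cache:
            cache[binding] = list(inner_fn(binding))
        return cache[binding]
    return evaluate

emp = [{"name": "Maier", "yearOfBirth": 1970},
       {"name": "Schmidt", "yearOfBirth": 1975}]
wine = [{"wname": "Riesling", "year": 1970}, {"wname": "Merlot", "year": 1970}]

inner = memox(lambda year: (e for e in emp if e["yearOfBirth"] == year))
# Wine[w] <M(sigma_{e.yearOfBirth = w.year}(Emp[e]))>: Emp is evaluated once
# per distinct year value, not once per Wine tuple.
result = [{**w, **e} for w in wine for e in inner(w["year"])]
print(result)
```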
So far we have seen different operators that materialize values: Tmp, M, and χ^{mat}, the latter in two variants. As an exercise, the reader is advised to work out the differences between them.

Assume we have indexes on both Emp.yearOfBirth and Wine.year. Besides the possibility of using either Emp or Wine as the outer, we now also have the possibility to perform a join on the indexes before accessing the actual Emp and Wine tuples. Since index scans produce their output ordered on the key attributes, a simple merge join suffices:

Emp_{yearOfBirth}[x] ⋈^{merge}_{x.yearOfBirth=y.year} Wine_{year}[y]

This example makes clear that the order provided by an index scan can be used to speed up join processing. After evaluating this plan fragment, we have to access the actual Emp and Wine tuples. We can consider zero, one, or two sorts on their respective tuple identifiers. If the join is sufficiently selective, one of these alternatives may prove more efficient than the ones we have considered so far.

4.15 Remarks on Access Path Generation

A last kind of optimization we briefly want to mention is sideways information passing. Consider a simple join between two relations: R ⋈_{R.a=S.b} S. If we decide to perform a sort-merge join or a hash join, we can implement it by first sorting/partitioning R before looking at S. While doing so, we can remember the minimum and maximum value of R.a and use these as a restriction on S, so that fewer tuples of S have to be sorted/partitioned. If we perform a blockwise nested-loop join instead, then after the first scan of S we know the minimum and maximum value of S.b and can use these to restrict R. If the number of distinct values of R.a is small, we could also decide to remember all these values and perform a semi-join before the actual join. Algebraically, this could be expressed as

R ⋈_{R.a=S.b} (S ⋉_{S.b=R.a} Π_{R.a}(R))

An alternative is to use a bitmap to represent the projection of R on a. The semi-join technique should be well known from distributed database systems. In deductive database systems, this kind of optimization often carries the attribute magic. We will discuss this issue more deeply in Chapter ??.
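A sketch of sideways information passing for a hash join follows, assuming in-memory toy relations: the min/max of R.a observed during the build phase filters S before probing.

```python
def hash_join_with_sip(r, s):
    """Hash join R ⋈_{R.a = S.b} S with sideways information passing:
    while building the hash table on R, remember min/max of R.a and use
    them to filter S before probing."""
    table = {}
    lo, hi = float("inf"), float("-inf")
    for t in r:                       # build phase on R
        table.setdefault(t["a"], []).append(t)
        lo, hi = min(lo, t["a"]), max(hi, t["a"])
    out = []
    for s_t in s:                     # probe phase on S
        if lo <= s_t["b"] <= hi:      # SIP: skip tuples outside [min, max]
            for r_t in table.get(s_t["b"], []):
                out.append({**r_t, **s_t})
    return out

r = [{"a": 5}, {"a": 7}]
s = [{"b": 1}, {"b": 5}, {"b": 99}]
print(hash_join_with_sip(r, s))
```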
The following problem is not discussed in this book. Assume that we have fully partitioned a relation vertically into a set of files which are chronologically ordered. Then, attribute a_i of the j-th tuple can be found at the j-th position of the i-th file. This organization is called a partitioned transposed file [56]. (Compare this with variant (projection) indexes [651] and small materialized aggregates [614].) The problem is to find an access strategy to all the attributes required by the query, given a collection of restrictions on some of the relation's attributes. This problem has been discussed in depth by Batory [56]. Full vertical partitioning is also used as the organizing principle of Monet [?]. Lately, it has also gained some interest in the US [?].

4.16 Counting the Number of Accesses

4.16.1 Counting the Number of Direct Accesses

After an index scan, we have a set of (distinct) tuple identifiers for which we have to access the original tuples. The question we would like to answer is: how many pages do we have to read? Let R be the relation for which we have to retrieve the tuples. Then we use the following abbreviations:

N = |R|    number of tuples in the relation R
m = ||R||  number of pages on which tuples of R are stored
B = N/m    number of tuples per page (blocking factor)
k          number of (distinct) TIDs for which tuples have to be retrieved

We assume that the tuples are uniformly distributed among the m pages. Then each page stores B = N/m tuples; B is called the blocking factor.

Let us consider some borderline cases. If k > N − N/m or m = 1, then all pages are accessed. If k = 1, then exactly one page is accessed.

The answer to the general question will be expressed in terms of buckets (pages in the above case) and items contained therein (tuples in the above case). Later on, we will also use extents, cylinders, or tracks as buckets and tracks or sectors/blocks as items. We assume that a bucket contains n items, that the total number of items is N, and that the number of requested items is k. The above question can then be reformulated as: how many buckets contain at least one of the k requested items, i.e. how many buckets qualify?

We start out by investigating the case where the items are uniformly distributed among the buckets. Two subcases will be distinguished: (1) k distinct items are requested, and (2) k non-distinct items are requested. We then discuss the case where the items are non-uniformly distributed. In any case, the underlying access model is random access: for example, given a tuple identifier, we can directly access the page storing the tuple. Other access models are possible. The one we will subsequently investigate is sequential access, where the buckets have to be scanned sequentially in order to find the requested items. After that, we are prepared to develop a model for disk access costs.

Throughout this section, we will further assume that each of the \binom{N}{k} possible k-sets of items is requested with the same probability 1/\binom{N}{k}. (A k-set is a set with cardinality k.) We often make use of established equalities for binomial coefficients; for convenience, the most frequently used ones are listed in Appendix D.

Selecting k distinct items

Our first theorem was discovered independently by Waters [913] and Yao [954]. We formulate it in terms of buckets containing items. We say a bucket qualifies if it contains at least one of the k items we are looking for.

Theorem 4.16.1 (Waters/Yao) Consider m buckets with n items each, so that there is a total of N = nm items. If we randomly select k distinct items from all items, then the expected number of qualifying buckets is

\overline{Y}_n^{N,m}(k) = m · Y_n^N(k)    (4.2)

where Y_n^N(k) is the probability that a given bucket contains at least one of the k items:

Y_n^N(k) = 1 − p  if k ≤ N − n,  and  Y_n^N(k) = 1  if k > N − n

Here p is the probability that the bucket contains none of the k items. The following equivalent expressions can be used to calculate p:

p = \binom{N−n}{k} / \binom{N}{k}    (4.3)
  = \prod_{i=0}^{k−1} (N−n−i)/(N−i)    (4.4)
  = \prod_{i=0}^{n−1} (N−k−i)/(N−i)    (4.5)

The second expression (4.4) is due to Yao, the third (4.5) is due to Waters. Palvia and March proved both formulas to be equal [669] (see also [39]). The fraction m = N/n may not be an integer. For these cases, it is advisable to have a Gamma-function-based implementation of binomial coefficients at hand (see [702] for details). Depending on k and n, either the expression of Yao or that of Waters is faster to compute. After the proof of the above formulas and the discussion of some special cases, we will give several approximations for p.

Proof. The total number of ways to pick the k items from all N items is \binom{N}{k}. The number of ways to pick k items from all items not contained in a fixed bucket is \binom{N−n}{k}. Hence, the probability p that a bucket does not qualify is p = \binom{N−n}{k} / \binom{N}{k}. Using this result, we calculate

p = \binom{N−n}{k} / \binom{N}{k}
  = ((N−n)! / (k! ((N−n)−k)!)) · (k! (N−k)! / N!)
  = \prod_{i=0}^{k−1} (N−n−i)/(N−i)

which proves the second expression. The third follows from

p = ((N−n)! / (k! ((N−n)−k)!)) · (k! (N−k)! / N!)
  = ((N−n)! / N!) · ((N−k)! / ((N−k)−n)!)
  = \prod_{i=0}^{n−1} (N−k−i)/(N−i)    □

Let us list some special cases:

if n = 1:  Y_n^N(k) = k/N
if n = N:  Y_n^N(k) = 1
if k = 0:  Y_n^N(k) = 0
if k = 1:  Y_n^N(k) = B/N = (N/m)/N = 1/m
if k = N:  Y_n^N(k) = 1

We examine a slight generalization of the first case in more detail. Let N items be distributed over N buckets such that every bucket contains exactly one item, and let us be interested in a subset of m buckets (1 ≤ m ≤ N). If we pick k items, then the expected number of qualifying buckets within the subset of size m is

m · Y_1^N(k) = m · (k/N)    (4.6)

To see why the two sides are equal, we perform the following calculation:

Y_1^N(k) = 1 − \binom{N−1}{k} / \binom{N}{k}
         = 1 − ((N−1)! / (k!((N−1)−k)!)) · (k!(N−k)! / N!)
         = 1 − (N−k)/N
         = k/N

Since the computation of Y_n^N(k) can be quite expensive, several approximations have been developed. The first one was given by Waters [912, 913]:

p ≈ (1 − k/N)^n

This approximation (also described elsewhere [317, 669]) turns out to be pretty good. However, below we will see even better approximations.
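Both product forms of the theorem are cheap to evaluate exactly. A minimal Python sketch that picks whichever product has fewer factors:

```python
def yao_p(N, n, k):
    """Probability that a bucket of n items contains none of k distinct
    items drawn from N; uses Yao's product (k factors) or Waters's
    product (n factors), whichever is shorter."""
    if k > N - n:
        return 0.0
    p = 1.0
    if k <= n:   # Yao's form, Eq. (4.4)
        for i in range(k):
            p *= (N - n - i) / (N - i)
    else:        # Waters's form, Eq. (4.5)
        for i in range(n):
            p *= (N - k - i) / (N - i)
    return p

def yao(N, m, k):
    """Expected number of qualifying buckets, Eq. (4.2), with n = N/m."""
    n = N // m
    return m * (1.0 - yao_p(N, n, k))

# 1000 tuples on 100 pages (blocking factor 10), 25 random distinct TIDs:
print(yao(1000, 100, 25))
```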
For \overline{Y}_n^{N,m}(k), Whang, Wiederhold, and Sagalowicz gave the following approximation for faster calculation [925]:

\overline{Y}_n^{N,m}(k) ≈ m · [ (1 − (1 − 1/m)^k)
  + (1/(m^2 n)) · (k(k−1)/2) · (1 − 1/m)^{k−1}
  + (1.5/(m^3 n^4)) · (k(k−1)(2k−1)/6) · (1 − 1/m)^{k−1} ]

A rough estimate is presented by Bernstein, Goodman, Wong, Reeve, and Rothnie [78]:

\overline{Y}_n^{N,m}(k) ≈ k          if k < m/2
                        ≈ (k+m)/3   if m/2 ≤ k < 2m
                        ≈ m          if 2m ≤ k

An interesting and useful result was derived by Dihr and Saharia [241]. They give two formulas and show that they are lower and upper bounds for Waters and Yao's p:

p_{lower} = (1 − k/(N − (n−1)/2))^n
p_{upper} = ((1 − k/N) · (1 − k/(N − n + 1)))^{n/2}

for n = N/m. Dihr and Saharia claim that the maximal difference resulting from the use of the lower and the upper bound in computing the number of page accesses is 0.224—far less than a single page access.

Selecting k non-distinct items

So far, we assumed that we retrieve k distinct items. We can ask the same question for k non-distinct items. This question demands a different urn model: in urn model terminology, the former case is a model without replacement, while the latter is a model with replacement. (Deeper insight into urn models is given by Drmota, Gardy, and Gittenberger [248].)

Before presenting a theorem discovered by Cheung [174], we repeat a result from basic combinatorics. We know that the number of subsets of size k of a set with N elements is \binom{N}{k}. The following lemma gives the number of k-multisets, i.e. multisets with k elements (see, e.g., [829]). The number of k-multisets taken from a set S with |S| elements is denoted by ((|S| choose k)).

Lemma 4.16.2 Let S be a set with |S| = N elements. Then the number of multisets with cardinality k containing only elements from S is

((N choose k)) = \binom{N+k−1}{k}

For a proof, we just note that there is a bijection between the k-multisets and the k-subsets of an (N+k−1)-set: we can go from a multiset to a set via f({x_1 ≤ ... ≤ x_k}) = {x_1+0 < x_2+1 < ... < x_k+(k−1)} and from a set to a multiset via g({x_1 < ... < x_k}) = {x_1−0, x_2−1, ..., x_k−(k−1)}.

Theorem 4.16.3 (Cheung) Consider m buckets with n items each, so that there is a total of N = nm items. If we randomly select k not necessarily distinct items from all items, then the expected number of qualifying buckets is

\overline{Cheung}_n^{N,m}(k) = m · Cheung_n^N(k)    (4.7)

where

Cheung_n^N(k) = 1 − p̃    (4.8)

with the following equivalent expressions for p̃:

p̃ = \binom{N−n+k−1}{k} / \binom{N+k−1}{k}    (4.9)
  = \prod_{i=0}^{k−1} (N−n+i)/(N+i)    (4.10)
  = \prod_{i=0}^{n−1} (N−1−i)/(N−1+k−i)    (4.11)

Eq. (4.9) follows from the observation that the probability that some bucket does not contain any of the k possibly duplicate items is \binom{N−n+k−1}{k} / \binom{N+k−1}{k}. Eq. (4.10) follows from

p̃ = ((N−n+k−1)! / (k!((N−n+k−1)−k)!)) · (k!((N+k−1)−k)! / (N+k−1)!)
  = ((N−n−1+k)! / (N−n−1)!) · ((N−1)! / (N−1+k)!)
  = \prod_{i=0}^{k−1} (N−n+i)/(N+i)

Eq. (4.11) follows from

p̃ = ((N−1)! / (N−n−1)!) · ((N+k−1−n)! / (N+k−1)!)
  = \prod_{i=0}^{n−1} (N−1−i)/(N−1+k−i)    □

Cardenas discovered a formula that can be used to approximate p̃ [125]:

p̃ ≈ (1 − n/N)^k
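The same kind of sketch for the replacement model, together with Cardenas's approximation for comparison:

```python
def cheung_p(N, n, k):
    """Probability that a bucket of n items receives none of k draws with
    replacement from N items, Eq. (4.11): n factors, independent of k."""
    p = 1.0
    for i in range(n):
        p *= (N - 1 - i) / (N - 1 + k - i)
    return p

def cheung(N, m, k):
    """Expected number of qualifying buckets under the replacement model."""
    n = N // m
    return m * (1.0 - cheung_p(N, n, k))

def cardenas(N, m, k):
    # Cardenas's approximation of p-tilde
    n = N / m
    return m * (1.0 - (1.0 - n / N) ** k)

print(cheung(1000, 100, 25), cardenas(1000, 100, 25))
```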
As Cheung pointed out, we can use the theorem to derive the number of distinct items contained in a k-multiset.

Corollary 4.16.4 Let S be a k-multiset containing elements from an N-set T. If the elements of T occur in S with equal probability, then the number of distinct items contained in S is

D(N, k) = Nk/(N + k − 1)    (4.12)

To see this, we apply the theorem to the special case where every bucket contains exactly one item (n = 1). In this case, p̃ = (N−1)/(N−1+k), and the number of qualifying buckets is N(1 − (N−1)/(N−1+k)) = Nk/(N+k−1). □

Another way to obtain this formula is the following. There are \binom{N}{l} possibilities to pick l different elements out of the N elements of T. In order to build a k-multiset with exactly these l different elements, we must additionally choose the remaining k − l elements from the l elements. Thus, there are \binom{N}{l} ((l choose k−l)) possibilities to build such a k-multiset. The total number of k-multisets is ((N choose k)). Thus we may conclude that

D(N, k) = \sum_{l=1}^{min(N,k)} l \binom{N}{l} ((l choose k−l)) / ((N choose k))

which can be simplified to the above.

A useful application of this formula is to calculate the size of a projection [174]. Another use is that calculating the number of distinct values contained in a multiset allows us to shift from the model with replacement to a model without replacement. However, there is a difference between

\overline{Y}_n^{N,m}(D(N, k))  and  \overline{Cheung}_n^{N,m}(k)

even when computing \overline{Y} with Eq. (4.5). Nonetheless, for n ≥ 5 the error is less than two percent. One of the problems when calculating the left-hand side is that the number of distinct items is not necessarily an integer. To solve this problem, we can implement all our formulas using the Gamma function. But even then, a small difference remains.

The approximation given in Theorem 4.16.3 is not too accurate. A better approximation can be calculated from the probability distribution. Denote by p(D(N, k) = j) the probability that the number of distinct values equals j if we randomly select k items with replacement from N given items. Then

p(D(N, k) = j) = \binom{N}{j} \sum_{l=0}^{j} (−1)^l \binom{j}{l} ((j−l)/N)^k

and thus

D(N, k) = \sum_{j=1}^{min(N,k)} j \binom{N}{j} \sum_{l=0}^{j} (−1)^l \binom{j}{l} ((j−l)/N)^k

This formula is quite expensive to calculate. We can derive a very good approximation by the following reasoning. We draw k elements from the set T with |T| = N elements. Every element from T can be drawn at most k times. We produce N buckets, one for each element of T, and insert k copies of the corresponding element into each bucket. A sequence of draws from T with duplicates can then be represented by a sequence of draws without duplicates, by mapping the i-th occurrence of an element to the i-th copy in the corresponding bucket. Applying the formula of Waters and Yao to count the number of buckets (and hence elements of T) hit yields

D(N, k) ≈ \overline{Y}_k^{Nk,N}(k)

Since this approximation is quite accurate and we already know how to evaluate the formula efficiently, this is our method of choice.
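A sketch of both estimates for the number of distinct values; the second is the Waters/Yao-based approximation preferred in the text (N buckets of k copies each, from which k distinct copies are drawn):

```python
def distinct_expectation(N, k):
    """Expected number of distinct values in a k-multiset over N values,
    Eq. (4.12)."""
    return N * k / (N + k - 1)

def distinct_via_yao(N, k):
    """Waters/Yao-based approximation: apply Yao's product with a total of
    N*k items and buckets of size k."""
    p = 1.0
    for i in range(k):
        p *= (N * k - k - i) / (N * k - i)
    return N * (1.0 - p)

print(distinct_expectation(365, 50), distinct_via_yao(365, 50))
```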
Non-Uniform Distribution of Items

In the previous sections, we assumed that (1) every page contains the same number of records and (2) every record is accessed with the same probability. We now turn to relaxing the first assumption. Christodoulakis models the distribution by m numbers n_i (for 1 ≤ i ≤ m), where n_i equals the number of records in bucket i [177]. Luk proposes a Zipfian record distribution [571]. However, Ijbema and Blanken argue that Waters and Yao's formula is still better, as Luk's formula yields values that are too low [440]. They all arrive at the same general formula presented below. Vander Zander, Taylor, and Bitton [968] discuss the problem of correlated attributes, which results in a certain clusteredness. Zahorjan, Bell, and Sevcik discuss the setting where every item is assigned its own access probability [967]; that is, they relax the second assumption. We will come back to these issues in Section ??.

We still assume that every item is accessed with the same probability, but we relax the first assumption. The following formula, derived by Christodoulakis [177], Luk [571], and Ijbema and Blanken [440], is a simple application of Waters's and Yao's formula to the more general case.

Theorem 4.16.5 (Yao/Waters/Christodoulakis) Assume a set of m buckets, where bucket j contains n_j > 0 items (1 ≤ j ≤ m) and the total number of items is N = \sum_{j=1}^{m} n_j. If we look up k distinct items, then the probability that bucket j qualifies is

W_{n_j}^N(k, j) = 1 − \binom{N−n_j}{k} / \binom{N}{k}  (= Y_{n_j}^N(k))    (4.13)

and the expected number of qualifying buckets is

\overline{W}_{n_j}^{N,m}(k) := \sum_{j=1}^{m} W_{n_j}^N(k, j)    (4.14)

Note that the product formulation in Eq. (4.5) of Theorem 4.16.1 results in a more efficient computation. We note this in the following corollary.

Corollary 4.16.6 Assume a set of m buckets, where bucket j contains n_j > 0 items and the total number of items is N = \sum_{j=1}^{m} n_j. If we look up k distinct items, then the expected number of qualifying buckets is

\overline{W}_{n_j}^{N,m}(k) = \sum_{j=1}^{m} (1 − p_j)    (4.15)

with

p_j = \prod_{i=0}^{n_j−1} (N−k−i)/(N−i)  if k ≤ N − n_j,  and  p_j = 0  if N − n_j < k ≤ N    (4.16)

If we compute the p_j after sorting the n_j in ascending order, we can use the fact that

p_{j+1} = p_j · \prod_{i=n_j}^{n_{j+1}−1} (N−k−i)/(N−i)

We can also use the theorem to calculate the number of qualifying buckets in case the distribution is given by a histogram.

Corollary 4.16.7 For 1 ≤ i ≤ L, let there be l_i buckets containing n_i items each. Then the total number of buckets is m = \sum_{i=1}^{L} l_i, and the total number of items in all buckets is N = \sum_{i=1}^{L} l_i n_i. For k randomly selected items, the expected number of qualifying buckets is

\overline{W}_{n_j}^{N,m}(k) = \sum_{i=1}^{L} l_i Y_{n_i}^N(k)    (4.17)
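A sketch of Corollary 4.16.6 that sorts the bucket sizes and extends the product incrementally from one bucket size to the next; the example bucket sizes are illustrative:

```python
def qualifying_buckets(bucket_sizes, k):
    """Expected number of qualifying buckets for k distinct items when
    bucket j holds n_j items (Eq. 4.15/4.16); the n_j are sorted so each
    p_j is obtained incrementally from the previous one."""
    N = sum(bucket_sizes)
    expected, p, covered = 0.0, 1.0, 0
    for n in sorted(bucket_sizes):
        if k > N - n:
            expected += 1.0           # bucket qualifies for sure
            continue
        for i in range(covered, n):   # extend the product from n_{j-1} to n_j
            p *= (N - k - i) / (N - i)
        covered = max(covered, n)
        expected += 1.0 - p
    return expected

print(qualifying_buckets([5, 5, 10, 20, 60], 8))
```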
Last in this section, let us calculate the probability distribution for the number of qualifying items within a bucket. The probability that exactly x ≤ n_j items in bucket j qualify can be calculated as follows. The number of ways to select x items in bucket j is \binom{n_j}{x}. The number of ways to draw the remaining k − x items from the other buckets is \binom{N−n_j}{k−x}. The total number of ways to distribute the k items over the buckets is \binom{N}{k}. This shows the following:

Theorem 4.16.8 Assume a set of m buckets, where bucket j contains n_j > 0 items and the total number of items is N = \sum_{j=1}^{m} n_j. If we look up k distinct items, the probability that exactly x items in bucket j qualify is

X_{n_j}^N(k, x) = \binom{n_j}{x} \binom{N−n_j}{k−x} / \binom{N}{k}    (4.18)

Further, the expected number of qualifying items in bucket j is

\overline{X}_{n_j}^{N,m}(k) = \sum_{x=0}^{min(k,n_j)} x · X_{n_j}^N(k, x)    (4.19)

In standard statistics books, the probability distribution X_{n_j}^N(k, x) is called the hypergeometric distribution.

Let us consider the case where all n_j are equal to n. Then we can calculate the average number of qualifying items in a bucket. With y := min(k, n), we have

\overline{X}_n^{N,m}(k) = \sum_{x=0}^{y} x X_n^N(k, x)
 = \sum_{x=1}^{y} x \binom{n}{x} \binom{N−n}{k−x} / \binom{N}{k}
 = (1/\binom{N}{k}) \sum_{x=1}^{y} n \binom{n−1}{x−1} \binom{N−n}{k−x}
 = (n/\binom{N}{k}) \sum_{x=0}^{y−1} \binom{n−1}{x} \binom{N−n}{(k−1)−x}
 = (n/\binom{N}{k}) \binom{(n−1)+(N−n)}{k−1}
 = n \binom{N−1}{k−1} / \binom{N}{k}
 = nk/N
 = k/m

Let us consider the even more special case where every bucket contains a single item, that is, N = m and n_j = 1. The probability that a bucket contains a qualifying item reduces to

X_1^N(k, 1) = \binom{1}{1} \binom{N−1}{k−1} / \binom{N}{k} = k/N  (= k/m)

Since x can then only be zero or one, the average number of qualifying items in a bucket is also k/N.

The formulas presented in this section can be used to estimate the number of block/page accesses in the case of random direct accesses. As we will see next, other kinds of accesses occur and need different estimates.

4.16.2 Counting the Number of Sequential Accesses

Vector of Bits

When estimating seek costs, we need to calculate the probability distribution of the distance between two subsequent qualifying cylinders. We model the situation as a bitvector of length B with b bits set to 1. Then B corresponds to the number of cylinders, and a 1 indicates that a cylinder qualifies.

Theorem 4.16.9 Assume a bitvector of length B within which b ones are uniformly distributed; the remaining B − b bits are zero. Then the probability distribution of the number j of zeros (1) between two consecutive ones, (2) before the first one, and (3) after the last one is given by

B_b^B(j) = \binom{B−j−1}{b−1} / \binom{B}{b}    (4.20)

A more general theorem (see Theorem 4.16.13) was first presented by Yao [955]. The above formulation is due to Christodoulakis [180].

To see why the formula holds, consider the total number of bitvectors having a one in position i followed by j zeros followed by a one. This number is \binom{B−j−2}{b−2}. We can choose B − j − 1 positions for i. The total number of bitvectors is \binom{B}{b}, and each bitvector contains b − 1 sequences consisting of a one, followed by a run of zeros, followed by a one. Hence,

B_b^B(j) = (B−j−1) \binom{B−j−2}{b−2} / ((b−1) \binom{B}{b}) = \binom{B−j−1}{b−1} / \binom{B}{b}

Part (1) of the theorem follows. To prove part (2), we count the number of bitvectors that start with j zeros before the first one: since there are B − j − 1 positions left for the remaining b − 1 ones, the number of these bitvectors is \binom{B−j−1}{b−1}, and part (2) follows. Part (3) follows by symmetry.

We can derive a less expensive way to evaluate B_b^B(j) as follows. For j = 0, we have B_b^B(0) = b/B. For j > 0,

B_b^B(j) = \binom{B−j−1}{b−1} / \binom{B}{b}
 = ((B−j−1)! / ((b−1)!((B−j−1)−(b−1))!)) · (b!(B−b)! / B!)
 = b · ((B−j−1)! / (B−j−b)!) · ((B−b)! / B!)
 = (b/(B−j)) \prod_{i=0}^{j−1} (1 − b/(B−i))

This formula is useful when B_b^B(j) occurs in sums over j, because we can compute the product incrementally.
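A sketch that evaluates the distribution with the incremental product and checks the expected gap against the closed form (B − b)/(b + 1) stated in Corollary 4.16.10 below:

```python
def gap_distribution(B, b):
    """Theorem 4.16.9: probability of j zeros between consecutive ones in
    a bitvector of length B with b ones, computed incrementally."""
    probs = [b / B]                   # j = 0
    prod = 1.0
    for j in range(1, B - b + 1):
        prod *= 1.0 - b / (B - (j - 1))
        probs.append(b / (B - j) * prod)
    return probs

dist = gap_distribution(100, 10)
expected_gap = sum(j * p for j, p in enumerate(dist))
print(expected_gap, (100 - 10) / (10 + 1))   # both ~8.18
```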
Corollary 4.16.10 Using the terminology of Theorem 4.16.9, the expected value of the number of zeros (1) before the first one, (2) between two successive ones, and (3) after the last one is

    \overline{B}_b^B = \sum_{j=0}^{B-b} j \, B_b^B(j) = \frac{B-b}{b+1}.                           (4.21)

Let us calculate:

    \sum_{j=0}^{B-b} j \binom{B-j-1}{b-1}
        = \sum_{j=0}^{B-b} (B - (B-j)) \binom{B-j-1}{b-1}
        = B \sum_{j=0}^{B-b} \binom{B-j-1}{b-1} - \sum_{j=0}^{B-b} (B-j) \binom{B-j-1}{b-1}
        = B \sum_{j=0}^{B-b} \binom{b-1+j}{b-1} - b \sum_{j=0}^{B-b} \binom{B-j}{b}
        = B \sum_{j=0}^{B-b} \binom{b-1+j}{j} - b \sum_{j=0}^{B-b} \binom{b+j}{b}
        = B \binom{(b-1)+(B-b)+1}{(b-1)+1} - b \binom{b+(B-b)+1}{b+1}
        = B \binom{B}{b} - b \binom{B+1}{b+1}.

Dividing by \binom{B}{b} and using

    B - b \frac{B+1}{b+1} = \frac{B(b+1) - (Bb+b)}{b+1} = \frac{B-b}{b+1},

the claim follows.

Corollary 4.16.11 Using the terminology of Theorem 4.16.9, the expected total number of bits from the first bit to the last one, both included, is

    B^{tot}(B, b) = \frac{Bb + b}{b+1}.                                                            (4.22)

To see this, we subtract from B the expected number of zeros between the last one and the last bit:

    B - \frac{B-b}{b+1} = \frac{B(b+1) - (B-b)}{b+1} = \frac{Bb + b}{b+1}.

An early approximation of this formula was discovered by Kollias [507].

Corollary 4.16.12 Using the terminology of Theorem 4.16.9, the expected number of bits from the first one to the last one, both included, is

    B^{1\text{-}span}(B, b) = \frac{Bb - B + 2b}{b+1}.                                             (4.23)

We have two possibilities to argue here. The first subtracts from B the expected number of zeros at the beginning and at the end:

    B^{1\text{-}span}(B, b) = B - 2 \frac{B-b}{b+1} = \frac{Bb + B - 2B + 2b}{b+1} = \frac{Bb - B + 2b}{b+1}.

The other possibility is to add the expected number of zeros between the first and the last one to the number of ones:

    B^{1\text{-}span}(B, b) = (b-1) \overline{B}_b^B + b = (b-1) \frac{B-b}{b+1} + \frac{b(b+1)}{b+1}
                            = \frac{Bb - b^2 - B + b + b^2 + b}{b+1} = \frac{Bb - B + 2b}{b+1}.

(EX or Cor? The number of bits from the first bit to the last one, both included . . . The distance between the first and the last one . . .)

Let us have a look at some possible applications of these formulas. If we look up one record in an array of B records and we search sequentially, how many array entries do we have to examine on average if the search is successful? In [584] we find these formulas used for the following scenario. Let a file consist of B consecutive cylinders. We search for k different keys, all of which occur in the file. These k keys are distributed over b different cylinders. Of course, we can stop as soon as we have found the last key. What is the expected total distance the disk head has to travel if it is placed on the first cylinder of the file at the beginning of the search?

Another interpretation of these formulas can be found in [429, 585]. Assume we have an array consisting of B different entries. We sequentially go through all entries of the array until we have found all the records for b different keys. We assume that the B entries in the array and the b keys are sorted, and that all b keys occur in the array. On average, how many comparisons do we need to find all keys?
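The three expected values of Corollaries 4.16.10 to 4.16.12 are one-liners in code. The sketch below (names are mine) also checks the identity that the leading gap plus the 1-span equals the total span, i.e., \overline{B}_b^B + B^{1-span} = B^{tot}.

    def b_bar(B, b):
        # Eq. 4.21: expected zeros before the first / between / after the last one.
        return (B - b) / (b + 1)

    def b_tot(B, b):
        # Eq. 4.22: expected bits from the first bit to the last one.
        return (B * b + b) / (b + 1)

    def b_1span(B, b):
        # Eq. 4.23: expected bits from the first one to the last one.
        return (B * b - B + 2 * b) / (b + 1)

    # Consistency check: leading gap + 1-span = total span.
    B, b = 1000, 10
    assert abs(b_bar(B, b) + b_1span(B, b) - b_tot(B, b)) < 1e-9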
Vector of Buckets. A more general scenario is as follows. Consider a sequence of m buckets containing n_i items each. Yao [955] developed the following theorem.

Theorem 4.16.13 (Yao) Consider a sequence of m buckets. For 1 <= i <= m, let n_i be the number of items in bucket i. Then there is a total of N = \sum_{i=1}^{m} n_i items. Let t_i = \sum_{l=1}^{i} n_l be the number of items in the first i buckets. If the buckets are searched sequentially, then the probability that exactly j buckets have to be examined until k distinct items have been found is

    C_{n_i}^{N,m}(k, j) = \frac{\binom{t_j}{k} - \binom{t_{j-1}}{k}}{\binom{N}{k}}.                (4.24)

Thus, the expected number of buckets that need to be examined in order to retrieve k distinct items is

    \overline{C}_{n_i}^{N,m}(k) = \sum_{j=1}^{m} j \, C_{n_i}^{N,m}(k, j) = m - \frac{\sum_{j=1}^{m} \binom{t_{j-1}}{k}}{\binom{N}{k}}.                                                            (4.25)

Applications of this formula can be found in [177, 180, 584, 586, 880]. Manolopoulos and Kollias describe the analogue for the replacement model [584].

Lang, Driscoll, and Jou discovered a general theorem that allows us to estimate the expected number of block accesses for sequential search.

Theorem 4.16.14 (Lang/Driscoll/Jou) Consider a sequence of N items. For a batched search of k items, the expected number of accessed items is

    A(N, k) = N - \sum_{i=1}^{N-1} \mathrm{Prob}[Y \le i],                                         (4.26)

where Y is a random variable for the position of the last item in the sequence that occurs among the k items searched. (proof?)

With the help of this theorem, it is quite easy to derive the average number of sequential accesses for many different models. (Cor or EX?)
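Eq. 4.25 is cheap to evaluate with a single pass over the bucket sizes, as the following sketch shows (names are mine; math.comb(t, k) is 0 while t < k, so prefixes shorter than k items contribute nothing).

    from math import comb

    def expected_buckets_examined(bucket_sizes, k):
        # Eq. 4.25: expected number of buckets a sequential search
        # must examine until k distinct items have been found.
        N = sum(bucket_sizes)
        m = len(bucket_sizes)
        acc = 0
        t = 0                      # t_{j-1}: items in the first j-1 buckets
        for n in bucket_sizes:
            acc += comb(t, k)
            t += n
        return m - acc / comb(N, k)

    # Searching 3 distinct items in four buckets of 25 items each
    # yields roughly 3.45 examined buckets.
    print(expected_buckets_examined([25, 25, 25, 25], 3))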
4.16.3 Pointers into the Literature

Segments containing records can be organized in different ways. Records can be placed randomly in the segment, they can be ordered according to some key, or the segment can be organized as a tree. Accordingly, the segment is called random, sequential, or tree-structured. From a segment, records are to be retrieved for a given bag of k keys. The general question then is: how many pages do we have to access? The answer depends on whether we assume the replacement or the non-replacement model. Six cases occur. For sequential and tree-structured segments, it also makes sense to distinguish between successful, partially (un-)successful, and (totally) unsuccessful searches. These notions capture the different possibilities where records are found for all, some, or none of the k keys. The following table provides some entry points into the literature. It is roughly organized around the above categories. (Remember that we discussed the random file organization at length in Section 4.16.1.)

                      non-replacement                             replacement
    random            [174, 177, 571, 687, 925, 954]              [125, 177, 669, 687]
    sequential        [63, 177, 527, 586, 669, 668, 810, 955]     [177, 527, 586, 810]
    tree-structured   [527, 526, 586, 668, 692]                   [527, 526, 586, 810]

4.17 Disk Drive Costs for N Uniform Accesses

The goal of this section is to derive estimates for the costs (time) of retrieving N cache-missed sectors of a segment S from disk. We assume that the N sectors are read in their physical order on disk. This can be enforced by the DBMS, by the operating system's disk scheduling policy (SCAN policy), or by the disk drive controller. Remembering the description of disk drives, the total costs can be described as

    C_{disk} = C_{cmd} + C_{seek} + C_{settle} + C_{rot} + C_{headswitch}.                         (4.27)

For brevity, we omitted the parameter N and the parameters describing the segment and the disk drive on which the segment resides. Subsequently, we devote a (sometimes tiny) section to each summand. Before that, we have to calculate the number of qualifying cylinders, tracks, and sectors. These numbers will be used later on.

4.17.1 Number of Qualifying Cylinders, Tracks, and Sectors

If N sectors are to be retrieved, we have to find the number of cylinders qualifying in an extent i. Let S_sec denote the total number of sectors our segment contains and S_cpe(i) = L_i - F_i + 1 the number of cylinders of the extent. If the N sectors we want to retrieve are uniformly distributed among the S_sec sectors of the segment, the number of cylinders that qualify in (F_i, L_i, z_i) is S_cpe(i) times one minus the probability that a given cylinder does not qualify. The probability that a cylinder does not qualify can be computed by dividing the number of possibilities to choose the N sectors from the sectors outside the cylinder by the number of possibilities to choose the N sectors from all S_sec sectors of the segment. Hence, the number of qualifying cylinders in the considered extent is

    Q_c(i) = S_{cpe}(i) \, Y_{D_{Zspc}(i)}^{S_{sec}}(N) = S_{cpe}(i) \left(1 - \binom{S_{sec} - D_{Zspc}(i)}{N} / \binom{S_{sec}}{N}\right).                                                        (4.28)

We could also have used Theorem 4.16.13. Let us also calculate the number of qualifying tracks in a partition i. It can be calculated as S_cpe(i) D_tpc (1 - Prob(a track does not qualify)). The probability that a track does not qualify can be computed by dividing the number of ways to pick N sectors from the sectors not belonging to the track by the number of ways to pick N sectors from all sectors:

    Q_t(i) = S_{cpe}(i) \, D_{tpc} \, Y_{D_{Zspt}(i)}^{S_{sec}}(N) = S_{cpe}(i) \, D_{tpc} \left(1 - \binom{S_{sec} - D_{Zspt}(i)}{N} / \binom{S_{sec}}{N}\right).                                  (4.29)

Just for fun, we calculate the number of qualifying sectors of an extent in zone i. It can be approximated by

    Q_s(i) = S_{cpe}(i) \, D_{Zspc}(i) \, \frac{N}{S_{sec}}.                                       (4.30)

Since all S_cpe(i) cylinders are in the same zone, they have the same number of sectors per track, and we could also use Waters/Yao to approximate the number of qualifying cylinders by

    Q_c(i) = \overline{Y}_{D_{Zspc}(S_{zone}(i))}^{\,S_{cpe}(i) D_{Zspc}(S_{zone}(i)), \, S_{cpe}(i)}(Q_s(i)).                                                                                     (4.31)

This is a good approximation as long as Q_s(i) is not too small (e.g., > 4).
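A sketch of these estimators in Python follows. The parameter names are shortened versions of the book's (spc for D_Zspc(i), spt for D_Zspt(i), tpc for D_tpc); the function names are mine.

    from math import comb

    def y_frac(S_sec, s, N):
        # Probability that a container covering s of the S_sec sectors
        # receives at least one of the N accessed sectors (cf. Eq. 4.28).
        if N > S_sec - s:
            return 1.0
        return 1.0 - comb(S_sec - s, N) / comb(S_sec, N)

    def q_cylinders(S_cpe, spc, S_sec, N):
        # Eq. 4.28: expected qualifying cylinders of an extent with
        # S_cpe cylinders and spc sectors per cylinder.
        return S_cpe * y_frac(S_sec, spc, N)

    def q_tracks(S_cpe, tpc, spt, S_sec, N):
        # Eq. 4.29: expected qualifying tracks (tpc tracks per cylinder,
        # spt sectors per track).
        return S_cpe * tpc * y_frac(S_sec, spt, N)

    def q_sectors(S_cpe, spc, S_sec, N):
        # Eq. 4.30: expected qualifying sectors.
        return S_cpe * spc * N / S_sec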
4.17.2 Command Costs

The command costs C_cmd are easy to compute. Every read of a sector requires the execution of a command. Hence,

    C_{cmd} = N \, D_{cmd}

estimates the total command costs.

4.17.3 Seek Costs

We present several alternative ways to estimate seek costs, starting with an upper bound based on Theorem 4.1.1. The first cylinder we have to visit requires a random seek with cost D_avgseek. (This does not really give an upper bound; for a true upper bound we would have to use D_seek(D_cyl - 1).) After that, we have to visit the remaining Q_c(i) - 1 qualifying cylinders. The segment spans a total of S_last(S_ext) - S_first(1) + 1 cylinders. Let us assume that the first qualifying cylinder is the first cylinder and the last qualifying cylinder is the last cylinder of the segment. Then applying Theorem 4.1.1 gives us the upper bound

    C_{seek}(i) \le (Q_c(i) - 1) \, D_{seek}\!\left(\frac{S_{last}(S_{ext}) - S_{first}(1) + 1}{Q_c(i) - 1}\right)

after we have found the first qualifying cylinder.

We can be a little more precise by splitting the seek costs into two components. The first component, C_seekgap, expresses the costs of finding the first qualifying cylinder and of jumping from the last qualifying cylinder of extent i to the first qualifying cylinder of extent i + 1. The second component, C_seekext(i), captures the seek costs within an extent i. Figure 4.10 illustrates the situation. The total seek costs then are

    C_{seek} = C_{seekgap} + \sum_{i=1}^{S_{ext}} C_{seekext}(i).

Since there is no estimate in the literature for C_seekgap, we have to calculate it ourselves. After we have done so, we present several alternatives to calculate C_seekext(i).

[Figure 4.10: Illustration of seek cost estimate. The upper path illustrates C_seekgap; the lower braces, each spanning the S_cpe cylinders of one extent with inner span Ξ, indicate those parts for which C_seekext is responsible.]

The average seek cost for reaching the first qualifying cylinder is D_avgseek. How far are we now within the first extent? We use Corollary 4.16.10 to derive that the expected number of non-qualifying cylinders preceding the first qualifying one in some extent i is

    \overline{B}_{Q_c(i)}^{S_{cpe}(i)} = \frac{S_{cpe}(i) - Q_c(i)}{Q_c(i) + 1}.

The same holds for the expected number of non-qualifying cylinders following the last qualifying cylinder. Hence, for every gap between the last and the first qualifying cylinders of two extents i and i + 1, the disk arm has to travel the distance

    \Delta_{gap}(i) := \overline{B}_{Q_c(i)}^{S_{cpe}(i)} + S_{first}(i+1) - S_{last}(i) - 1 + \overline{B}_{Q_c(i+1)}^{S_{cpe}(i+1)}.

Using this, we get

    C_{seekgap} = D_{avgseek} + \sum_{i=1}^{S_{ext}-1} D_{seek}(\Delta_{gap}(i)).

Let us turn to C_seekext(i). We first need the expected number of cylinders between the first and the last qualifying cylinder, both included, in extent i. It can be calculated using Corollary 4.16.12:

    \Xi(i) = B^{1\text{-}span}(S_{cpe}(i), Q_c(i)).

Hence, Ξ(i) is the minimal span of an extent that contains all qualifying cylinders. Using Ξ(i) and Theorem 4.1.1, we can derive an upper bound for C_seekext(i):

    C_{seekext}(i) \le (Q_c(i) - 1) \, D_{seek}\!\left(\frac{\Xi(i)}{Q_c(i) - 1}\right).            (4.32)

Alternatively, we could formulate this as

    C_{seekext}(i) \le (Q_c(i) - 1) \, D_{seek}(\overline{B}_{Q_c(i)}^{S_{cpe}(i)})                 (4.33)

by applying Corollary 4.16.10. A seemingly more precise estimate for the expected seek cost within the qualifying cylinders of an extent is derived by using Theorem 4.16.9:

    C_{seekext}(i) = Q_c(i) \sum_{j=0}^{S_{cpe}(i) - Q_c(i)} D_{seek}(j+1) \, B_{Q_c(i)}^{S_{cpe}(i)}(j).                                                                                          (4.34)

There are many more estimates for seek times. Older ones rely on a linear disk model but also consider different disk scan policies. A good entry point is the work by Teorey and Pinkerton [871, 872].
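Under the assumption that Q_c(i) has already been rounded to an integer, Eq. 4.34 and the gap distance can be sketched as follows (names are mine; d_seek stands for the drive's seek-time function D_seek).

    from math import comb

    def b_dist(B, b, j):
        # B_b^B(j) of Eq. 4.20.
        return comb(B - j - 1, b - 1) / comb(B, b)

    def c_seekext(S_cpe, Q_c, d_seek):
        # Eq. 4.34: expected seek cost within one extent; d_seek maps
        # a cylinder distance to a seek time. Q_c is assumed integral.
        return Q_c * sum(d_seek(j + 1) * b_dist(S_cpe, Q_c, j)
                         for j in range(S_cpe - Q_c + 1))

    def delta_gap(scpe_i, qc_i, scpe_next, qc_next, s_first_next, s_last_i):
        # Expected arm travel between the last qualifying cylinder of
        # extent i and the first qualifying cylinder of extent i+1.
        lead = lambda B, b: (B - b) / (b + 1)   # Corollary 4.16.10
        return (lead(scpe_i, qc_i) + s_first_next - s_last_i - 1
                + lead(scpe_next, qc_next))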
4.17.4 Settle Costs

The average settle cost is easy to calculate. For every qualifying cylinder, one head settlement takes place:

    C_{settle}(i) = Q_c(i) \, D_{rdsettle}.                                                        (4.35)

4.17.5 Rotational Delay Costs

Let us turn to the rotational delay. For some given track in zone i, we want to read the qualifying sectors contained in it. On average, we would expect the read head to become ready to read in the middle of some sector of the track. If so, we have to wait half a sector scan time, (1/2) D_Zscan(S_zone(i)), before the first whole sector passes under the read head. However, due to track and cylinder skew, this event does not occur after a head switch or a cylinder switch. Instead of being overly precise here, we ignore this half-sector pass time and assume that we are always at the beginning of a sector. This is also justified by the fact that we model the head switch time explicitly.

Assume that the head is ready to read at the beginning of some sector of some track. Then, in front of us is a bitvector of qualifying and non-qualifying sectors. (The bitvector is cyclic, but that does not matter.) We can use Corollary 4.16.11 to estimate the total number of qualifying and non-qualifying sectors that have to pass under the head until all qualifying sectors have been seen. The total rotational delay for the tracks of zone i is

    C_{rot}(i) = Q_t(i) \, D_{Zscan}(S_{zone}(i)) \, B^{tot}(D_{Zspt}(S_{zone}(i)), Q_{spt}(i)),   (4.36)

where Q_{spt}(i) = \overline{W}_1^{S_{sec}, D_{Zspt}(S_{zone}(i))}(N) = D_{Zspt}(S_{zone}(i)) \frac{N}{S_{sec}} is the expected number of qualifying sectors per track in extent i. In case Q_spt(i) < 1, we set Q_spt(i) := 1.

A more precise model is derived as follows. We sum up, for all j, the product of (1) the probability that exactly j sectors in a track qualify and (2) the expected number of sectors that have to pass under the head if j sectors qualify. This gives us the expected number of sectors that have to pass the head in order to read all qualifying sectors. We only need to multiply this number by the time to scan a single sector and by the number of qualifying tracks. We can estimate (1) using Theorem 4.16.8; for (2) we again use Corollary 4.16.11:

    C_{rot}(i) = S_{cpe}(i) \, D_{tpc} \, D_{Zscan}(S_{zone}(i)) \sum_{j=1}^{\min(N, D_{Zspt}(S_{zone}(i)))} X_{D_{Zspt}(S_{zone}(i))}^{S_{sec}}(N, j) \; B^{tot}(D_{Zspt}(S_{zone}(i)), j).      (4.37)

Another approach is taken by Triantafillou, Christodoulakis, and Georgiadis [880]. They split the total rotational delay into two components. The first component (C_rotpass) measures the time needed to skip non-qualifying sectors; the second (C_rotread) measures the time for scanning and transferring the qualifying sectors to the host.

Let us deal with the first component. Assume that j sectors of a track in extent i qualify. The expected position on a track where the head becomes ready to read is the middle between two qualifying sectors. Since the expected number of sectors between two qualifying sectors is D_Zspt(S_zone(i))/j, the expected number of sectors scanned before the first qualifying sector comes under the head is D_Zspt(S_zone(i))/(2j). The expected positions of the j qualifying sectors on the track are such that the number of non-qualifying sectors between two successive qualifying sectors is the same. Hence, after having read a qualifying sector, (D_Zspt(S_zone(i)) - j)/j non-qualifying sectors must pass by until the next qualifying sector shows up. The total number of non-qualifying sectors to be passed if j sectors qualify in a track of zone i is

    N_s(j, i) = \frac{D_{Zspt}(S_{zone}(i))}{2j} + (j-1) \, \frac{D_{Zspt}(S_{zone}(i)) - j}{j}.   (4.38)

Using again Theorem 4.16.8, the expected rotational delay for the non-qualifying sectors then is

    C_{rotpass}(i) = S_{cpe}(i) \, D_{tpc} \, D_{Zscan}(S_{zone}(i)) \sum_{j=1}^{\min(N, D_{Zspt}(S_{zone}(i)))} X_{D_{Zspt}(S_{zone}(i))}^{S_{sec}}(N, j) \; N_s(j, i).                          (4.39)

We have to sum up this number over all extents and then add the time needed to scan the N sectors. Hence,

    C_{rot} = \sum_{i=1}^{S_{ext}} \left( C_{rotpass}(i) + C_{rotread}(i) \right),

where the total transfer cost for the qualifying sectors of an extent can be estimated as

    C_{rotread}(i) = Q_s(i) \, D_{Zscan}(S_{zone}(i)).

4.17.6 Head Switch Costs

The average head switch cost is equal to the expected number of head switches times the average cost of a single head switch. The expected number of head switches equals the number of qualifying tracks minus the number of qualifying cylinders, since no head switch occurs for the first track of each cylinder. Summarizing,

    C_{headswitch} = \sum_{i=1}^{S_{ext}} (Q_t(i) - Q_c(i)) \, D_{hdswitch},                       (4.40)

where Q_t(i) is the expected number of tracks qualifying in extent i.
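The following skeleton (entirely my naming; the representation of an extent as a dict is an assumption for illustration) shows how the five summands of Eq. 4.27 fit together once per-extent estimates for Q_c(i) and Q_t(i) and the seek and rotational estimators are available.

    def c_disk(extents, N, D_cmd, D_rdsettle, D_hdswitch, c_seek, c_rot):
        # Skeleton of Eq. 4.27 for N uniform accesses. Each extent is a
        # dict carrying its expected qualifying cylinders 'Qc' and
        # tracks 'Qt'; c_seek and c_rot are estimators such as the
        # sketches developed above.
        c_cmd = N * D_cmd                                          # Sec. 4.17.2
        c_settle = sum(e["Qc"] * D_rdsettle for e in extents)      # Eq. 4.35
        c_hs = sum((e["Qt"] - e["Qc"]) * D_hdswitch
                   for e in extents)                               # Eq. 4.40
        return c_cmd + c_seek(extents) + c_settle + c_rot(extents) + c_hs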
4.17.7 Discussion

The disk drive cost model we derived depends on many parameters. The first group of parameters concerns the disk drive itself. These parameters can (and must) be extracted from disk drives by (micro-)benchmarking techniques [311, 866, 605, 771]. The second group of parameters concerns the layout of a segment on disk. The database system is responsible for providing these parameters. The closer the system is to the disk, the easier these parameters are to extract. Building a runtime system atop the operating system's file system is obviously a bad idea from the cost model perspective. If instead the storage manager of the runtime system implements cylinder-aligned extents (or at least track-aligned extents) using a raw I/O interface, the cost model becomes easier to develop and much more precise. Again, providing reliable cost models is one of the most important tasks of the runtime system.

We have neglected many problems in our disk access model: partially filled cylinders, pages larger than a block, the disk drive's cache, remapping of bad blocks, non-uniformly distributed accesses, clusteredness, and so on. Whereas the first two items are easy to fix, the rest is not. In general, database systems ignore the disk drive cache. The justifying argument is that the database buffer is much larger than the disk drive's cache and, hence, it is very unlikely that we read a page that is not in the database buffer but in the disk cache. However, this argument falls short for non-random accesses. Nevertheless, we will ignore the issue in this book. The interested reader is referred to Shriver's thesis for disk cache modeling [811].

Remapping of bad sectors to other sectors really prevents the development of a precise cost model for disk accesses. Modeling disk drives then becomes a nightmare, since a clean partitioning of the disk into zones is no longer possible: some sectors, tracks, and even cylinders are reserved for the remapping. So even if no remapping has taken place (which is very unlikely), having homogeneous zones of hundreds of cylinders is a dream that will never come true. The result is that we do not have dozens of homogeneous zones, but hundreds (if not thousands) of zones of medium homogeneity. These should be reduced to a model of dozens of homogeneous zones such that the error does not become too large. The remaining issues will be discussed later in the book.

There is even more to say about our cost model. A very practical issue arises if the number of qualifying cylinders is small. Then, for some extent i, the expected number of qualifying cylinders could be, say, Q_c(i) = 0.38. For some of our formulas this is a big problem. As a consequence, special cases for small N, small Q_c, and small Q_t have to be developed and implemented.

Another issue is the performance of the cost model itself. The query compiler might evaluate the cost model's formulas thousands or millions of times. Hence, they should be fast to evaluate.

So far, we can adequately model the costs of N disk accesses. Some questions remain. For example, how do we derive the number N of pages we have to access? Do we really need to fetch all N pages from disk, or will we find some of them in the buffer? If yes, how many? Further, CPU costs are also an important issue. Deriving a cost model for CPU costs is even more tedious than modeling disk drive costs. The only choice available is to benchmark all parts of a system and then derive a cost model using the extracted parameters. To give examples of parameters to be extracted: we need the CPU costs for accessing a page present in the buffer, for accessing a page absent from the buffer, for a next call of an algebraic operator, for executing an integer addition, and so on. Again, this cannot be done without tools [47, 244, 412, 463, 686].

The bottom line is that a cost model does not have to be accurate, but must lead to correct decisions.
In that sense, it must be accurate at the break-even points between plan alternatives. Let us illustrate this point by means of our motivating example. If we know that the index returns a single tuple, it is quite likely that the sequential scan is much more expensive. The same might be true for 2, 3, 4, and 5 tuples. Hence, an accurate model for small N is not really necessary. However, as we come close to the costs of a sequential scan, both the cost model for the sequential scan and the one for the index-based access must be correct, since the product of their errors is the factor by which a bad choice is off the best choice. This is a crucial point, since it is easy to underestimate sequential access costs by a factor of 2-3 and to overestimate random access costs by a factor of 2-5.

4.18 Concluding Remarks

ToDo:
• Learned: open.
• Cost: I/O costs for non-uniform accesses: open; CPU costs: nothing done yet.
• Wrong cardinality estimates: open; leads to dynamic query optimization.

4.19 Bibliography

ToDo:
• CPU costs for B-tree search within inner and leaf pages [523]
• Index/relations: only joins between building blocks [739]
• RDB/V1: predicate push down (views), 2-phase optimization (local: traditional, global: sharing of tables), five categories for predicates, nested-loops evaluation for nested correlated subqueries, use of transitivity of equality, conjunctive normal form, use of the min/max value of the join column to reduce the join cardinality by adding another selection to the other relation (min(a) <= b <= max(a) for join predicate a = b).
• k accesses to a unique index: how many page faults if the buffer has size b? [752]
• buffer management: [257]
• buffer management: [809]
• buffer management: [528]
• buffer management: [753, 754]
• buffer management: [109]
• buffer management: [176]
• structured, semi-structured, and unstructured data: [337] (cited in Dono76)
• B-trees and their improvements [213]
• vertical partitioning: [261, 587, 56]
• horizontal and vertical partitioning: [135]
• set-oriented disk access to large complex objects [918, 917]; assembly operator: [483]
• large objects: [89, 126, 542]

Part II

Foundations

Chapter 5

Logic, Null, and Boolean Expressions

5.1 Two-Valued Logic

The Boolean algebra with its operations not (¬), and (∧), and or (∨) is well-known. The truth tables for these operations are given in Figure 5.1. Figure 5.2 summarizes well-known laws for two-valued logic.

      ¬ |               ∧     | true   false        ∨     | true   false
      true  | false      true  | true   false        true  | true   true
      false | true       false | false  false        false | true   false

Figure 5.1: Truth tables for two-valued logic

Figure 5.2: Laws for two-valued logic

    Commutativity:
      p1 ∨ p2 ≡ p2 ∨ p1                          p1 ∧ p2 ≡ p2 ∧ p1
      ∃e1 ∃e2 p ≡ ∃e2 ∃e1 p                      ∀e1 ∀e2 p ≡ ∀e2 ∀e1 p
    Associativity:
      (p1 ∨ p2) ∨ p3 ≡ p1 ∨ (p2 ∨ p3)            (p1 ∧ p2) ∧ p3 ≡ p1 ∧ (p2 ∧ p3)
    Distributivity:
      p1 ∨ (p2 ∧ p3) ≡ (p1 ∨ p2) ∧ (p1 ∨ p3)     p1 ∧ (p2 ∨ p3) ≡ (p1 ∧ p2) ∨ (p1 ∧ p3)
      ∃e (p1 ∨ p2) ≡ (∃e p1) ∨ (∃e p2)           ∀e (p1 ∧ p2) ≡ (∀e p1) ∧ (∀e p2)
    Idempotency:
      p ∨ p ≡ p                                  p ∧ p ≡ p
      p ∨ ¬p ≡ true (∗)                          p ∧ ¬p ≡ false (∗)
      p1 ∨ (p1 ∧ p2) ≡ p1                        p1 ∧ (p1 ∨ p2) ≡ p1
      p ∨ false ≡ p                              p ∧ true ≡ p
      p ∨ true ≡ true                            p ∧ false ≡ false
    De Morgan:
      ¬(p1 ∨ p2) ≡ ¬(p1) ∧ ¬(p2)                 ¬(p1 ∧ p2) ≡ ¬(p1) ∨ ¬(p2)
    Negation of quantifiers:
      ¬(∀e p) ≡ ∃e (¬p)                          ¬(∃e p) ≡ ∀e (¬p)
      ¬(t1 θ t2) ≡ t1 θ̄ t2
    Elimination of negation:
      ¬(¬(p)) ≡ p
    Conditioned distributivity (F(p1) ∩ A(e) = ∅):
      p1 ∨ (∀e p2) ≡ ∀e (p1 ∨ p2)
      p1 ∨ (∃e p2) ≡ ∃e (p1 ∨ p2) if e ≠ {},  p1 if e = {}
      p1 ∧ (∀e p2) ≡ ∀e (p1 ∧ p2) if e ≠ {},  p1 if e = {}
      p1 ∧ (∃e p2) ≡ ∃e (p1 ∧ p2)

5.2 Null Values

Many database management systems (in particular all SQL-based relational systems, but also object-oriented databases) support a special NULL value. It is used to express semantic concepts like undefined, unknown, or not applicable. Although there exist proposals for supporting different NULL values for these different semantic concepts [?, ?], database management systems typically support only one NULL value. This NULL value is a special value, distinguishable from all other values of a domain. That is, all domains are extended by this very special value. This necessitates the definition of operations, functions, and comparison operators for the case that some argument is NULL.

5.2.1 Functions and Operators

If any of the arguments of a function or an operator is NULL, the result of the operator is typically also NULL. For example, in SQL every arithmetic operator and function is defined that way. Thus, we have, for example, 0 ∗ NULL = NULL, although 0 could also be considered a reasonable result.
5.2.2 Comparison Operators

NULL values stored in base tables thus typically propagate up through operators and function calls. Consequently, comparison operators must deal with NULL values as input. Without NULL values as input, all comparison operators yield either true or false. In the presence of NULL values, a third truth value called unknown (⊥) is possible. How these unknown values are handled by the standard logical operators is the topic of the next section. Here, we concentrate on the definition of the comparison operators. Since there will be many of them with different semantics, we need some specific notation. First, we assume that =v is the standard value-based comparison operator for a given domain (e.g., integer or varchar) that cannot take NULL values as arguments.

The standard comparison operator =v can be extended in two ways to handle NULL values as input. The first extension is denoted by = and has the same semantics as equality in SQL. It returns any of true, false, ⊥ and is defined in Figure 5.3. Note that it evaluates to ⊥ whenever at least one of its arguments is NULL. If none of the arguments is NULL, it behaves like the regular value comparison operator =v.

The second operator is ≐, also defined in Figure 5.3. If both inputs are NULL, it returns true. If only one input is NULL, it returns false. If no input is NULL, it returns the result of the regular value comparison operator. Note that ≐ never returns ⊥. In SQL predicates, ≐ is called is not distinct from. Also in SQL, ≐ is used for grouping and duplicate elimination.

      =          | y is null | y not null         ≐          | y is null | y not null
      x is null  |    ⊥      |    ⊥               x is null  |   true    |   false
      x not null |    ⊥      |  x =v y            x not null |   false   |  x =v y

Figure 5.3: Comparison functions in the presence of NULL values

Using ¬, we can define abbreviations for inequality:

    x ≠ y := ¬(x = y),
    x ≠̇ y := ¬(x ≐ y).

For the other comparison operators θ ∈ {≤, ≥, <, >}, we treat their negation by defining an operator θ̄ that maps θ to its negated counterpart:

    ≤̄ := >,    ≥̄ := <,    <̄ := ≥,    >̄ := ≤.

Their semantics is defined analogously to that of equality: if at least one of their arguments is NULL, they return ⊥; otherwise, they return the result of their regular value comparison counterparts.
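A minimal sketch of the two equality operators of Figure 5.3, modeling NULL as Python's None (the encoding choices and names are mine, not the book's):

    UNKNOWN = 'unknown'   # models the truth value ⊥

    def eq_sql(x, y):
        # '=' of Figure 5.3: any NULL argument yields unknown,
        # otherwise the value comparison decides.
        if x is None or y is None:
            return UNKNOWN
        return x == y

    def eq_dot(x, y):
        # The two-valued variant (SQL's 'is not distinct from'):
        # two NULLs compare as true, exactly one NULL as false.
        if x is None or y is None:
            return x is None and y is None
        return x == y

    assert eq_sql(None, 3) is UNKNOWN and eq_dot(None, 3) is False
    assert eq_dot(None, None) is True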
5.3 Three-Valued Logic

As we have seen, comparison operators can have three possible outcomes: true, false, and ⊥. A logic dealing with these three values is called three-valued logic. In this section, we review the operators of three-valued logic and give some useful laws. Figure 5.4 shows the extended truth tables for the standard operations ¬, ∧, and ∨.

      ¬ |               ∧     | true   false  ⊥           ∨     | true   false  ⊥
      true  | false      true  | true   false  ⊥           true  | true   true   true
      false | true       false | false  false  false       false | true   false  ⊥
      ⊥     | ⊥          ⊥     | ⊥      false  ⊥           ⊥     | true   ⊥      ⊥

Figure 5.4: Truth tables for three-valued logic

For completeness, we also define implication and exclusive or:

    a ⇒ b := ¬a ∨ b
    a ∨̇ b := (a ∨ b) ∧ ¬(a ∧ b)

While three-valued logic correctly captures the uncertainty caused by NULL values, and a result of ⊥ can be reported back to the user as a boolean NULL, it is often necessary to convert a three-valued result into a two-valued one. Obviously, this can be done by converting ⊥ to either true or false. This is called true-interpreted or false-interpreted ⊥. Two operators ⌈·⌉⊥ and ⌊·⌋⊥ perform this conversion:

      x     | ⌈x⌉⊥  | ⌊x⌋⊥
      true  | true  | true
      false | false | false
      ⊥     | true  | false

An example of false-interpreted ⊥ values are where clauses in SQL: a given tuple qualifies only if the predicate evaluates to true. An example of true-interpreted ⊥ values are SQL check conditions: a constraint violation occurs only if the predicate in the check condition returns false.

In the following, we need to say when two expressions in three-valued logic are equivalent (≡). This is defined to hold if and only if, for all assignments of the variables occurring on the left-hand side and the right-hand side to constants/null values/truth values including ⊥, the evaluation of the right-hand side yields the same value (true, false, ⊥) as the evaluation of the left-hand side.

Any database management system has a choice: either
• work all the way with NULL values and three-valued logic, or
• convert expressions in three-valued logic to two-valued logic.

For the former, see the exercises. For the latter, we need to push ⌊·⌋⊥ and ⌈·⌉⊥ down and resolve them at the bottom of our expressions. Pushing ⌊·⌋⊥ and ⌈·⌉⊥ down ∧ and ∨ is rather easy:

    ⌈p1 ∧ p2⌉⊥ ≡ ⌈p1⌉⊥ ∧ ⌈p2⌉⊥                                                                     (5.1)
    ⌊p1 ∧ p2⌋⊥ ≡ ⌊p1⌋⊥ ∧ ⌊p2⌋⊥                                                                     (5.2)
    ⌈p1 ∨ p2⌉⊥ ≡ ⌈p1⌉⊥ ∨ ⌈p2⌉⊥                                                                     (5.3)
    ⌊p1 ∨ p2⌋⊥ ≡ ⌊p1⌋⊥ ∨ ⌊p2⌋⊥                                                                     (5.4)

However, we must be very careful with negation. A complete account of the situation is given in Figure 5.5.

      x     |  ¬x   | ⌊x⌋⊥  | ¬⌊x⌋⊥ | ⌊¬x⌋⊥ | ¬⌊¬x⌋⊥ | ⌈x⌉⊥  | ¬⌈x⌉⊥ | ⌈¬x⌉⊥ | ¬⌈¬x⌉⊥
      true  | false | true  | false | false | true   | true  | false | false | true
      false | true  | false | true  | true  | false  | false | true  | true  | false
      ⊥     | ⊥     | false | true  | false | true   | true  | false | true  | false

Figure 5.5: True-/false-interpretation and negation

From there, we see that

    ⌈¬x⌉⊥ ≡ ¬⌊x⌋⊥                                                                                  (5.5)
    ⌊¬x⌋⊥ ≡ ¬⌈x⌉⊥                                                                                  (5.6)
    ⌈x⌉⊥ ≡ ¬⌊¬x⌋⊥                                                                                  (5.7)
    ⌊x⌋⊥ ≡ ¬⌈¬x⌉⊥                                                                                  (5.8)

Using these equivalences, we can push ⌊·⌋⊥ and ⌈·⌉⊥ down until we meet some built-in predicate or boolean function. For built-in comparison operators, we can combine the false-/true-interpretation with the operator, yielding two additional comparison operators. As an example, consider the equality operator. For it, we define two new equality operators, each combining = with one possible interpretation of unknown:

    e1 =− e2 := ⌊e1 = e2⌋⊥
    e1 =+ e2 := ⌈e1 = e2⌉⊥

Analogously, we define for any comparison operator θ ∈ {≤, ≥, <, >, ≠} two operators θ− and θ+. For operators that never yield unknown, we can eliminate the interpretation.
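The three-valued operators of Figure 5.4 and the two interpretation operators can be sketched as follows (the encoding of ⊥ as a string constant and all names are my choice); the final loop spot-checks Eq. 5.5.

    UNK = 'unknown'   # the truth value ⊥; True/False are the other two

    def not3(x):
        return UNK if x is UNK else (not x)

    def and3(x, y):
        if x is False or y is False:   # false dominates (Figure 5.4)
            return False
        if x is UNK or y is UNK:
            return UNK
        return True

    def or3(x, y):
        if x is True or y is True:     # true dominates (Figure 5.4)
            return True
        if x is UNK or y is UNK:
            return UNK
        return False

    def true_interp(x):                # ⌈x⌉⊥: unknown becomes true
        return True if x is UNK else x

    def false_interp(x):               # ⌊x⌋⊥: unknown becomes false
        return False if x is UNK else x

    # Spot-check Eq. 5.5: ⌈¬x⌉⊥ ≡ ¬⌊x⌋⊥ for all three truth values.
    for v in (True, False, UNK):
        assert true_interp(not3(v)) == (not false_interp(v))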
These operators include, for example, exists, match, is distinct from, and is null. Thus, if we have an expression b that is guaranteed to evaluate only to true or false (and never to ⊥), then we have

    ⌈b⌉⊥ = b,
    ⌊b⌋⊥ = b.

If we take a careful look at Figure 5.2, we see that all equivalences except those marked by (∗) also hold for three-valued logic.

Exercise 1. Find a 2-bit encoding of the values true, false, and ⊥ which requires only one machine instruction to implement each of ∧, ∨, and ¬.

Exercise 2. Manually build a truth table for x1 ⇒ x2 and x1 ∨̇ x2 in three-valued logic. Then check whether the right-hand sides of their definitions are equivalent to your truth tables.

Exercise 3. Look for ways to move ⌈·⌉⊥ and ⌊·⌋⊥ down x1 ⇒ x2 and x1 ∨̇ x2.

5.4 Preparation of Boolean Expressions

Before any further optimization can take place, boolean expressions need to be preprocessed. The most important steps are pushnot and pushunk. Before we come to these steps, let us consider another trick that is commonly found. It is partial evaluation:

pareval: If one term of a conjunction yields false, then the other (not yet evaluated) terms are not evaluated. This is typically represented by cascading select operators, which can then be pushed down independently. Partial evaluation can also be applied to disjunctions if one factor evaluates to true.

Now consider ¬(a ∧ b). Due to the negation, we cannot apply pareval without precaution. Further, as we saw previously, negation swaps ⌊·⌋⊥ and ⌈·⌉⊥. Thus, if not handled very carefully, negation leads to all kinds of problems. Therefore, the first step in the preparation of a predicate is to push negation down. After this step, we can push ⌊·⌋⊥ and ⌈·⌉⊥ down to convert three-valued logic expressions to two-valued logic expressions. Thus, we perform, in this order:

1. pushnot: push negation down,
2. pushunk: push ⌊·⌋⊥ and ⌈·⌉⊥ down.

More on simple rewrites can be found in Chapter ??.

5.5 Equivalence Classes Based on Equality

The traditional equality operator =v is reflexive, symmetric, and transitive. Hence, it induces equivalence classes of expressions which, in the context of predicate evaluation, must evaluate to the same value. It is good practice in database systems to collect these equivalence classes. As a prerequisite, we need the notion of an expression e' occurring conjunctively in some predicate (boolean expression) e.

A substitution is a partial mapping from subexpressions of some expression e to other expressions. Here, we are interested in ground substitutions, where expressions are mapped to constants. A substitution is denoted by [e1/e'1, ..., ek/e'k], where each ei is mapped to e'i. A substitution can be applied to some other expression and replaces each occurrence of ei by e'i. If p is some boolean expression, the application of a substitution to p is denoted by p[e1/e'1, ..., ek/e'k].

We define that e occurs conjunctively in p if and only if p[e/false] is equivalent to false. This can be checked by applying the simplification rules (under idempotency in Figure 5.2). If p[e/false] simplifies to false, e occurs conjunctively in p.

Now we can build the equivalence classes induced by p by finding all expressions of the form e1 = e2 that occur conjunctively in p. Then, if, for example, two attributes A and B are in the same equivalence class, we can replace an occurrence of A by B.
To see why this procedure can be very helpful, consider two relations R(A, B) and S(C), and assume that we have to evaluate the following algebraic expression:

    R ⋈_{A=C ∨ B=C} S

In this case, efficient implementations of the join operator, like hash joins, are not applicable. In order to avoid a nested-loop evaluation of this expression, we rewrite it as follows:

    (R ⋈_{A=C} S) ∪ (R ⋈_{B=C ∧ A≠C} S)

Note that the two arguments of the union operator are disjoint. Thus, we do not have a problem with duplicates. Now we can apply a hash join on either side. However, we have to evaluate A ≠ C as a residual predicate after the second join. Noting that B = C occurs conjunctively in B = C ∧ A ≠ C, we can replace C by B in A ≠ C. Then, we have

    B = C ∧ A ≠ C ≡ B = C ∧ A ≠ B.                                                                 (5.9)

Applying this equivalence to the above algebraic expression yields

    (R ⋈_{A=C} S) ∪ (R ⋈_{B=C ∧ A≠B} S).

Now, A ≠ B is a selection predicate (sometimes called a restriction, as it involves two attributes of the same relation), and we can rewrite our expression to

    (R ⋈_{A=C} S) ∪ (σ_{A≠B}(R) ⋈_{B=C} S),

which is more efficient, since we eliminate R tuples before the join.

Now let us see what happens in the presence of NULL values and three-valued logic. At the core of our reasoning was Equivalence 5.9. For illustration purposes, we assume that each of the attributes A, B, and C can have the values 3, 7, and NULL. The following table contains all cases where the equivalence does not hold. (The other cases are left to the reader.)

      A    | B | C    | LHS   | RHS
      3    | 3 | NULL | ⊥     | false
      7    | 7 | NULL | ⊥     | false
      NULL | 3 | 3    | false | ⊥
      NULL | 7 | 7    | false | ⊥

Thus, the equivalence, which is valid in two-valued logic, does not hold in the presence of NULL values and three-valued logic. Predicates are typically true- or false-interpreted. If the predicate is true-interpreted, then the left-hand side and the right-hand side of 5.9 clearly differ. However, if the predicate is false-interpreted, we have

    ⌊B = C ∧ A ≠ C⌋⊥ ≡ ⌊B = C ∧ A ≠ B⌋⊥                                                            (5.10)

or, after pushing ⌊·⌋⊥ down,

    B =− C ∧ A ≠− C ≡ B =− C ∧ A ≠− B.                                                             (5.11)

To perform this kind of optimization, we must be able to build equivalence classes in the presence of NULL values. First note that =, since it might return ⊥, is not even reflexive and hence does not induce equivalence classes. Thus, the only way to produce equivalence classes is by exploiting conjunctively occurring =− expressions. We define that an expression e occurs conjunctively in p if

1. p[e/false] ≡ false and
2. p[e/⊥] ≡ false.

Then we can collect all conjunctively occurring =− expressions and build equivalence classes based on them. Note that this requires that we have pushed down ⌈·⌉⊥ and ⌊·⌋⊥, as otherwise there are no =− expressions.
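Collecting conjunctively occurring =− predicates into equivalence classes is typically implemented with a union-find structure. A minimal sketch, with class and method names of my choosing:

    class EquivClasses:
        # Union-find over expressions known to be equal via
        # conjunctively occurring =- predicates.
        def __init__(self):
            self.parent = {}

        def find(self, e):
            self.parent.setdefault(e, e)
            while self.parent[e] != e:
                self.parent[e] = self.parent[self.parent[e]]  # path halving
                e = self.parent[e]
            return e

        def add_equality(self, e1, e2):
            self.parent[self.find(e1)] = self.find(e2)

        def equivalent(self, e1, e2):
            return self.find(e1) == self.find(e2)

    ec = EquivClasses()
    ec.add_equality('B', 'C')          # from the conjunct B =- C
    assert ec.equivalent('C', 'B')     # licenses replacing C by B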
5.6 Nullability Inference

5.7 Bibliography

NULL values: [90, 546, 547, 747, 748]

Chapter 6

Functional Dependencies

In many query results, attribute values are not independent of each other but exhibit certain dependencies. Keeping track of these dependencies is very useful for many optimizations. For example, in the following query

    select c.id, n.name
    from customers c, nations n
    where c.nid = n.id
    order by c.id, n.name

the order by clause can be simplified to c.id without affecting the result: c.id is the key of customers and thus determines c.nid; c.nid is joined with n.id, which is the key of nations and determines n.name; thus, transitively, c.id determines n.name. These functional dependencies between attributes have been studied primarily in the context of database design, but many optimization steps like order optimization (Chapter 23) and query unnesting (Chapter 14) profit greatly from known functional dependencies. In the following, we first study functional dependencies when all attributes are non-NULL, then extend this to attributes with NULL values, and finally discuss how functional dependencies are affected by relational operators.

6.1 Functional Dependencies

As illustrated by the previous example, a functional dependency describes how attribute values depend on other attribute values. More formally, a relation R (with A1, A2 ⊆ A(R)) satisfies a functional dependency f : A1 → A2 if and only if the following condition holds:

    ∀ t1, t2 ([t1 ∈ R ∧ t2 ∈ R ∧ t1.A1 = t2.A1] ⇒ [t1.A2 = t2.A2]).

For base relations, functional dependencies can be derived from the schema, in particular from key constraints and check conditions [680]. For intermediate results, additional functional dependencies can be induced by algebraic operators, as we will see below.

Once some functional dependencies are known to hold, further functional dependencies can be derived by using Armstrong's axioms [?] (assuming A1, A2, A3 ⊆ A(R)):

1. A2 ⊆ A1 ⇒ A1 → A2
2. A1 → A2 ⇒ (A1 ∪ A3) → (A2 ∪ A3)
3. A1 → A2 ∧ A2 → A3 ⇒ A1 → A3

The Armstrong axioms are sound and complete, i.e., it is possible to derive all valid functional dependencies by applying these three axioms. For practical purposes it is often convenient to include three additional rules, which can be derived from the original axioms:

4. A1 → A2 ∧ A1 → A3 ⇒ A1 → (A2 ∪ A3)
5. A1 → (A2 ∪ A3) ⇒ A1 → A2 ∧ A1 → A3
6. A1 → A2 ∧ (A2 ∪ A4) → A3 ⇒ (A1 ∪ A4) → A3

Given a set of functional dependencies F, we denote by F+ the closure of F, i.e., the set of all functional dependencies that can be derived from F by using the inference rules shown above.

Closely related to the concept of functional dependencies is the concept of keys: given a relation R and an attribute set A ⊆ A(R), A is a super key of R if A → A(R) holds in R. Further, A is a key of R if, in addition, the following condition holds:

    ∀ A' (A' ⊂ A ⇒ ¬(A' → A(R))).
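A standard way to exploit these rules in a query compiler is to compute the closure of an attribute set by a fixpoint iteration. The sketch below (representation and names are mine) replays the introductory example, encoding the join-induced dependency c.nid → n.id as an ordinary FD, as the chapter's informal argument suggests:

    def closure(attrs, fds):
        # Attribute closure under a list of FDs, each a pair
        # (lhs, rhs) of frozensets; a direct fixpoint over the
        # inference rules above.
        result = set(attrs)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in fds:
                if lhs <= result and not rhs <= result:
                    result |= rhs
                    changed = True
        return result

    def is_superkey(attrs, relation_attrs, fds):
        return closure(attrs, fds) >= set(relation_attrs)

    # The introductory example: c.id transitively determines n.name.
    fds = [(frozenset({'c.id'}),  frozenset({'c.nid'})),
           (frozenset({'c.nid'}), frozenset({'n.id'})),
           (frozenset({'n.id'}),  frozenset({'n.name'}))]
    assert 'n.name' in closure({'c.id'}, fds)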
6.2 Functional Dependencies in the Presence of NULL Values

In the presence of NULL values, a relation R (with A1, A2 ⊆ A(R)) satisfies a functional dependency f : A1 → A2 if and only if the following condition holds:

    ∀ t1, t2 ([t1 ∈ R ∧ t2 ∈ R ∧ t1.A1 ≐ t2.A1] ⇒ [t1.A2 ≐ t2.A2]).

XXX explain why, discuss lax dependencies

6.3 Deriving Functional Dependencies over Algebraic Operators

XXX dependency graphs

6.4 Bibliography

Chapter 7

An Algebra for Sets, Bags, and Sequences

This chapter summarizes a logical algebra that is sufficient to express queries written in SQL, OQL, and XPath/XQuery. The algebra is based upon substantial work by many people [68, 70, 189, 191, 196, 487, 488, 544, 820]. The most prominent features of the algebra are:

• All operators are polymorphic and can deal with (almost) any kind of complex arguments.
• The operators take arbitrarily complex expressions as subscripts. This includes algebraic expressions. The advantage is that nested queries can be expressed directly as nested algebraic expressions, and unnesting possibilities can be represented at the algebraic level, which allows rigorous correctness proofs.
• The algebra is redundant, since some special cases of the operators can be implemented more efficiently.

This chapter is organized as follows. First, we prepare some background material by discussing sets, bags, and sequences, as well as aggregation functions. Then we are ready to present the algebraic operators. This is done in two steps: first we introduce their signatures, and then their semantics. (ToDo)

7.1 Sets, Bags, and Sequences

7.1.1 Sets

A set contains elements drawn from some domain D. In our case, the domain will often be tuples, and we only consider finite sets. The set operations we are interested in are union (∪s), intersection (∩s), and difference (\s). If the domain consists of tuples, we assume that both arguments have the same schema, i.e., the attributes and their domains are the same in both arguments. Otherwise, the expression is not well-typed. In any case, set union and intersection are commutative and associative. Set difference is neither. Expressions containing the empty set can be simplified. Last but not least, some distributivity laws hold. These and other laws for set operations (see Figure 7.1) should be well-known.

Figure 7.1: Laws for set operations

    X ∪s ∅ = X
    X ∪s X = X (idempotency)
    X ∪s Y = Y ∪s X (commutativity)
    (X ∪s Y) ∪s Z = X ∪s (Y ∪s Z) (associativity)
    X ∩s ∅ = ∅
    X ∩s X = X (idempotency)
    X ∩s Y = Y ∩s X (commutativity)
    (X ∩s Y) ∩s Z = X ∩s (Y ∩s Z) (associativity)
    X \s ∅ = X
    ∅ \s X = ∅
    X \s X = ∅
    X \s Y ≠ Y \s X (wrong)
    (X \s Y) \s Z ≠ X \s (Y \s Z) (wrong)
    X ∩s Y = X \s (X \s Y)
    X ∪s (Y ∩s Z) = (X ∪s Y) ∩s (X ∪s Z) (distributivity)
    X ∩s (Y ∪s Z) = (X ∩s Y) ∪s (X ∩s Z) (distributivity)
    (X ∪s Y) \s Z = (X \s Z) ∪s (Y \s Z) (distributivity)
    (X ∩s Y) \s Z = (X \s Z) ∩s (Y \s Z) (distributivity)
    X \s (Y ∪s Z) = (X \s Y) ∩s (X \s Z)
    X \s (Y ∩s Z) = (X \s Y) ∪s (X \s Z)

A set of elements from a domain D can be seen as a function from D to {0, 1}. For a given set S, this function is called the characteristic function of S. It can be defined as

    χ_S(s) = 0 if s ∉ S,   1 if s ∈ S.

Obviously, there is a bijection between characteristic functions and sets. That is, sets can be characterized by their characteristic functions, and the set operations can be expressed in terms of operations on characteristic functions.

In the presence of null values, we have to be a little careful when evaluating an expression like x ∈ S. Assume x is null and S contains some element y which is also null. Then we would like x ∈ S to hold, with x being equal to y. Thus, we must use '≐'. Set equality can be expressed as equality of characteristic functions. The subset relationship A ⊆ B can be expressed as χ_A(x) ≤ χ_B(x) for all x. The cardinality |S| of a set S is defined as |S| = Σ_x χ_S(x). Because we deal with finite sets only, cardinality is well-defined. A singleton set is a set containing only one element, i.e., a set whose cardinality equals 1.

As we have seen in Chapter 2, algebraic equivalences that reorder algebraic operators form the fundamental basis of query optimization. One could discuss the reorderability of each pair of operators, resulting in n² investigations for an algebra with n operators. In order to simplify this tedious task, we introduce a general argument covering most of the cases. The observation will be that the set-linearity of set operators easily implies their reorderability. A unary function f from sets to sets is called set-linear (or homomorph) if and only if the following two conditions hold for all sets X and Y:

    f(∅) = ∅,
    f(X ∪s Y) = f(X) ∪s f(Y).
An n-ary mapping from sets to a set is called set-linear in its i-th argument if and only if for all sets X1, ..., Xn and X'i the following conditions hold:

    f(X1, ..., Xi−1, ∅, Xi+1, ..., Xn) = ∅,
    f(X1, ..., Xi−1, Xi ∪s X'i, Xi+1, ..., Xn) = f(X1, ..., Xi−1, Xi, Xi+1, ..., Xn) ∪s f(X1, ..., Xi−1, X'i, Xi+1, ..., Xn).

It is called set-linear if it is set-linear in all its arguments. For a binary function or operator where we can distinguish between the left and the right argument, we call it left (right) set-linear if it is set-linear in its first (second) argument. Note that if an equivalence with linear mappings on both sides has to be proven, it suffices to prove it for singleton sets, i.e., sets with one element only.

Using the commutativity of set union and set intersection as well as the observations above, we see that for a non-empty set X

    (∅ ∪s X) ≠ ∅,
    (∅ ∩s X) = ∅,
    (∅ \s X) = ∅,
    (X \s ∅) ≠ ∅,

and for arbitrary sets X, Y, and Z

    (X ∪s Y) ∪s Z = (X ∪s Z) ∪s (Y ∪s Z),
    (X ∪s Y) ∩s Z = (X ∩s Z) ∪s (Y ∩s Z),
    (X ∪s Y) \s Z = (X \s Z) ∪s (Y \s Z),
    X \s (Y ∪s Z) ≠ (X \s Y) ∪s (X \s Z).

We can conclude that set union is neither left nor right set-linear, set intersection is set-linear, and set difference is left set-linear but not right set-linear.

7.1.2 Duplicate Data: Bags

A bag or multiset can contain an element more than once; it cannot contain an element less than zero times. A typical bag is {a, b, b}_b, for which we also write {a¹, b²}_b. Another example is {a, b}_b. The latter bag does not contain any duplicates; hence, it could also be considered a set. We only consider finite bags.

For a given bag B, the characteristic function for bags maps every element of a domain D to the set of non-negative integers IN_0. The characteristic function gives the number of occurrences of each element in the bag. The number of occurrences of some element x in a bag B is χ_B(x), and we call this the multiplicity of x. We often denote the multiplicity of an element by a superscript, as in {x⁷⁷}_b, where the element x has multiplicity 77. Again, there is a bijection between bags and their characteristic functions.

Figure 7.2: Laws for bag operations

    X ∪b ∅b = X
    X ∪b X ≠ X (wrong)
    X ∪b Y = Y ∪b X (commutativity)
    (X ∪b Y) ∪b Z = X ∪b (Y ∪b Z) (associativity)
    X ∩b ∅b = ∅b
    X ∩b X = X (idempotency)
    X ∩b Y = Y ∩b X (commutativity)
    (X ∩b Y) ∩b Z = X ∩b (Y ∩b Z) (associativity)
    X \b ∅b = X
    ∅b \b X = ∅b
    X \b X = ∅b
    X \b Y ≠ Y \b X (wrong)
    (X \b Y) \b Z ≠ X \b (Y \b Z) (wrong)
    X ∩b Y = X \b (X \b Y)
    X ∪b (Y ∩b Z) = (X ∪b Y) ∩b (X ∪b Z) (distributivity)
    X ∩b (Y ∪b Z) ≠ (X ∩b Y) ∪b (X ∩b Z) (wrong)
    (X ∪b Y) \b Z ≠ (X \b Z) ∪b (Y \b Z) (wrong)
    (X ∩b Y) \b Z = (X \b Z) ∩b (Y \b Z) (distributivity)
    X \b (Y ∪b Z) ≠ (X \b Y) ∩b (X \b Z) (wrong)
    X \b (Y ∩b Z) ≠ (X \b Y) ∪b (X \b Z) (wrong)

We use ∈ to denote bag membership. Given a bag B and its characteristic function χ_B, we have x ∈ B ⟺ χ_B(x) > 0. If we use ∈ within a bag constructor, as in {x | x ∈ B}_b, x iterates over all elements of B. This means that if some element has multiplicity m, then x iterates over m duplicates of this element. In order to determine the multiplicity of an element in a given bag, we must have an equality defined on the items in the bag. Here, we have to use ≐, which reflects the semantics of SQL. Thus, in {null, null, null}_b the multiplicity of null is 3. It would be bad to have three nulls with multiplicity 1 each.
Equality on bags is defined as equality of their characteristic functions. Subbag relationships can be defined using the characteristic function: A ⊆ B can be defined as χ_A(x) ≤ χ_B(x) for all x. The cardinality |B| of a bag B is defined as |B| = Σ_x χ_B(x). Because we deal with finite bags only, cardinality is well-defined. A bag B containing a single element is one whose characteristic function equals 0 for all but one element x. A singleton bag is one whose cardinality equals 1.

The bag union X ∪b Y of two bags is defined such that the number of occurrences of some element in the union is the sum of its occurrences in X and Y. The number of occurrences of some element in the bag intersection X ∩b Y is the minimum of the numbers of its occurrences in X and Y. In the bag difference X \b Y, the number of occurrences of some element is the difference (−̇) of its occurrences in X and Y, where a −̇ b is defined as max(0, a − b). Using characteristic functions, we can define

    χ_{X ∪b Y}(z) = χ_X(z) + χ_Y(z),
    χ_{X ∩b Y}(z) = min(χ_X(z), χ_Y(z)),
    χ_{X \b Y}(z) = χ_X(z) −̇ χ_Y(z).

The laws for sets do not necessarily hold for bags (see Figure 7.2). Bag union and bag intersection are both commutative and associative; bag difference is neither. Let us take a closer look at the different distributivity laws. Denote by LHS the left-hand side of an equivalence and by RHS its right-hand side.

Let us first prove X ∪b (Y ∩b Z) = (X ∪b Y) ∩b (X ∪b Z). Since for all x we have

    χ_LHS(x) = χ_X(x) + min(χ_Y(x), χ_Z(x)) = min(χ_X(x) + χ_Y(x), χ_X(x) + χ_Z(x)) = χ_RHS(x),

the claim follows.

For the bags X = {1⁵}_b, Y = {1³}_b, and Z = {1³}_b, we get X ∩b (Y ∪b Z) = {1⁵}_b ∩b {1⁶}_b = {1⁵}_b, but (X ∩b Y) ∪b (X ∩b Z) = {1³}_b ∪b {1³}_b = {1⁶}_b.

For the bags X = {1⁵}_b, Y = {1³}_b, and Z = {1²}_b, we calculate (X ∪b Y) \b Z = {1⁸}_b \b {1²}_b = {1⁶}_b, but (X \b Z) ∪b (Y \b Z) = {1³}_b ∪b {1¹}_b = {1⁴}_b.

Consider (X ∩b Y) \b Z = (X \b Z) ∩b (Y \b Z). This holds, since

    χ_LHS(x) = min(χ_X(x), χ_Y(x)) −̇ χ_Z(x) = min(χ_X(x) −̇ χ_Z(x), χ_Y(x) −̇ χ_Z(x)) = χ_RHS(x).

For the bags X = {1²}_b, Y = {1¹}_b, and Z = {1¹}_b, we calculate X \b (Y ∪b Z) = {1²}_b \b {1²}_b = ∅b, but (X \b Y) ∩b (X \b Z) = {1¹}_b ∩b {1¹}_b = {1¹}_b, and X \b (Y ∩b Z) = {1²}_b \b {1¹}_b = {1¹}_b, but (X \b Y) ∪b (X \b Z) = {1¹}_b ∪b {1¹}_b = {1²}_b.

Remark. Our definition of bag union is not the usual one. The standard set-theoretic definition of the bag union operator ∪max is such that χ_{X ∪max Y}(x) = max(χ_X(x), χ_Y(x)) holds [21, 221]. With this definition, the laws for sets carry over to bags. We decided to use the non-standard definition, since it is the semantics of bag union in SQL and other query languages. Dayal, Goodman, and Katz [221] and Albert [21] also investigate the non-standard bag union in their papers, although under a different name; for example, Albert calls it bag concatenation. As a side remark, it is interesting to note that Albert showed that bag concatenation cannot be expressed using ∪max, ∩b, and \b [21]. Thus, any query language featuring ∪b is strictly more expressive, since ∪max can be expressed using \b and ∪b via the equivalence

    X ∪max Y ≡ (X \b Y) ∪b Y.

Two other laws involving ∪max are

    X ∪max Y ≡ (X ∪b Y) \b (X ∩b Y),
    X ∩b Y ≡ (X ∪b Y) \b (X ∪max Y).

We introduce linearity for bags in Section 7.4.
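Python's collections.Counter happens to implement exactly these characteristic-function semantics, which makes it convenient for experimenting with the laws of Figure 7.2 and the ∪max equivalences (the variable names are mine):

    from collections import Counter

    # Counter operators match the definitions above: + is bag union,
    # & is intersection (min), - is bag difference (monus), and
    # | is the set-style union (max).
    X, Y = Counter({1: 5}), Counter({1: 3})

    assert X + Y == Counter({1: 8})   # chi = chi_X + chi_Y
    assert X & Y == Counter({1: 3})   # chi = min(chi_X, chi_Y)
    assert X - Y == Counter({1: 2})   # chi = chi_X monus chi_Y
    assert X | Y == Counter({1: 5})   # chi = max(chi_X, chi_Y)

    # The three laws relating the two unions:
    assert (X | Y) == (X - Y) + Y
    assert (X | Y) == (X + Y) - (X & Y)
    assert (X & Y) == (X + Y) - (X | Y)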
7.1.3 Explicit Duplicate Control

Having every operation twice, once for bags and once for sets, is quite inconvenient. Fortunately, for some operations we only need the bag variant. We can get rid of some set operations as follows. Every set can be seen as a bag whose characteristic function never exceeds one. Let Ī(S) turn a set S into a bag with identical characteristic function. The partial function Ī⁻¹(B) turns a bag into a set if the bag's characteristic function does not exceed one; otherwise, Ī⁻¹ is undefined. Let X and Y be two sets. For the intersection, we then have

    Ī⁻¹(Ī(X) ∩b Ī(Y)) = X ∩s Y.

That is, for any two sets X and Y, bag intersection and set intersection coincide. Thus, we only need one intersection operation, which is defined on bags and which we will denote by ∩.

The above observation gives rise to the notion of set-faithfulness. We call a unary function f on sets set-faithful if and only if

    Ī⁻¹(f(Ī(X))) = f(X)

holds for all sets X. Analogously, we call a binary function g set-faithful if and only if

    Ī⁻¹(g(Ī(X), Ī(Y))) = g(X, Y)

holds for all sets X and Y.

\b and ∩b are set-faithful. Hence, we can (and often will) simply use \ and ∩ to denote bag difference and intersection. If the arguments happen to be sets, the resulting bag does not contain any duplicates, i.e., it is a set.

Note that ∪b is not set-faithful. One possibility would be to carefully distinguish between ∪b and ∪s. However, this does not solve our problem for query processing. A relation can be a set (e.g., if a primary key is defined) or a bag. Assume we have two relations (or intermediate results) R1, which is a set, and R2, which is a bag. Obviously, R1 ∪s R2 is not valid, since R2 is a bag. By treating sets as special bags, R1 ∪b R2 is valid. However, we cannot control duplicates in the result as demanded by SQL, where there is a fundamental difference between union all and union distinct. We could thus use two different union operators. Both take bags as input, but one preserves duplicates, as does bag union, and the other eliminates duplicates. Let us denote the former by ∪ and the latter by ∪d.

To go from a bag to a set, we have to eliminate duplicates. Let us denote by Π^D the duplicate elimination operation. For a given bag B, we then have

    χ_{Π^D(B)}(z) = min(1, χ_B(z)).

Using Π^D, we can define ∪d as

    R1 ∪d R2 := Π^D(R1 ∪ R2).

However, the right-hand side is our preferred way to take care of duplicate handling: we will always use the bag operator, denoted by ∪, and then, if necessary, eliminate duplicates explicitly.

Summarizing, instead of working with sets and bags, we can work with bags only by identifying every set S with the bag Ī(S). To keep track of (possible) duplicates, we can annotate every bag with a property indicating whether it contains duplicates or not. If at some place a set is required and we cannot infer that the bag in that place is duplicate-free, we can use Π^D as an enforcer of the set property. Note that for every set S we have Π^D(S) = S. Hence, Π^D does no harm except for the resources it consumes. Reasoning about whether a given expression produces duplicates or not is very important. Below, we will indicate on the fly how this reasoning about duplicates can be performed.

7.1.4 Ordered Data: Sequences

A sequence is ordered and may contain duplicates. An example sequence is ⟨a, b, b, c, b⟩. The length of a sequence is the number of elements it contains.
For any sequence S, the length of the sequence is denoted by |S|. The above sequence has length five. The empty sequence (ϵ) contains zero elements and has length zero. As we consider only finite sequences, a sequence of length n ≥ 0 has a characteristic function χ from the interval [0, n[ to a domain D. Outside [0, n[, χ is undefined (⊥).

Let S be a sequence. Then α(S) gives us the first element of the sequence, i.e., α(S) = χ_S(0). For our example sequence, α(⟨a, b, b, c, b⟩) = a. The rest or tail of a sequence S of length n is denoted by τ(S) and contains all but the first element of the sequence. That is, χ_{τ(S)}(i) = χ_S(i + 1). For our example sequence, τ(⟨a, b, b, c, b⟩) = ⟨b, b, c, b⟩.

Concatenation of two sequences is denoted by ⊕. The characteristic function of the concatenation of two sequences S and T is

    χ_{S ⊕ T}(i) = χ_S(i) if i < |S|,   χ_T(i − |S|) if i ≥ |S|.

As an example, ⟨a, b, b, c, b⟩ ⊕ ⟨a, b, c⟩ = ⟨a, b, b, c, b, a, b, c⟩.

We can easily go from a sequence to a bag by simply forgetting the order. To convert a bag into a sequence, we typically have to apply a sort operator. In reality, however, bags are often represented as (ordered) streams, i.e., they are sequences. This is due to the fact that most physical algebras are implemented using the iterator concept introduced in Section 4.6.

Analogously to set and bag linearity, we can introduce sequence-linearity of unary and n-ary functions on sequences. In the definition, we only have to exchange the set union operator for concatenation. A unary function f from sequences to sequences is called sequence-linear if and only if the following two conditions hold for all sequences X and Y:

    f(ϵ) = ϵ,
    f(X ⊕ Y) = f(X) ⊕ f(Y).

An n-ary mapping from sequences to a sequence is called sequence-linear in its i-th argument if and only if for all sequences X1, ..., Xn and X'i the following conditions hold:

    f(X1, ..., Xi−1, ϵ, Xi+1, ..., Xn) = ϵ,
    f(X1, ..., Xi−1, Xi ⊕ X'i, Xi+1, ..., Xn) = f(X1, ..., Xi−1, Xi, Xi+1, ..., Xn) ⊕ f(X1, ..., Xi−1, X'i, Xi+1, ..., Xn).

It is called sequence-linear if it is sequence-linear in all its arguments. For a binary function or operator where we can distinguish between the left and the right argument, we call it left (right) sequence-linear if it is sequence-linear in its first (second) argument. Note that if an equivalence with linear mappings on both sides has to be proven, it suffices to prove it for singleton sequences, i.e., sequences with one element only.

7.2 Aggregation Functions

SQL and other query languages support at least five aggregation functions: min, max, count, sum, and avg. In addition, SQL allows to specify whether duplicates are removed before the aggregate is computed or whether they are also fed into the aggregation function. For example, we may specify sum(distinct a) or sum(all a) for some attribute a. The term sum(a) is equivalent to sum(all a). From this follows that aggregation functions can be applied to sets or bags. Other query languages (OQL and XQuery) also allow lists as arguments to aggregation functions; additionally, OQL allows arrays. Hence, aggregation functions should be defined for any bulk type.

Most query languages provide a special null value. In SQL it is called NULL. Initially, OQL did not have a special null value. Fortunately, it was introduced in version 3.0, where it is called UNKNOWN.
7.2 Aggregation Functions

SQL and other query languages support at least five aggregation functions. These are min, max, count, sum, and avg. In addition, SQL allows to qualify whether duplicates are removed before computing the aggregate or whether they are also considered by the aggregation function. For example, we may specify sum(distinct a) or sum(all a) for some attribute a. The term sum(a) is equivalent to sum(all a). From this follows that aggregation functions can be applied to sets or bags. Other query languages (OQL and XQuery) also allow lists as arguments to aggregation functions. Additionally, OQL allows arrays. Hence, aggregation functions should be defined for any bulk type.

Most query languages provide a special null value. In SQL it is called NULL. Initially, OQL did not have a special null value. Fortunately, it was introduced in version 3.0. There, the null value is called UNKNOWN. So far, XQuery has no null value. Instead, the inventors of XQuery tried hard to let the empty sequence play a dual role: that of an empty sequence and that of a null value. Of course, this leads to awkward complications. We will use '-', ⊥, or NULL to represent a null value. From this variance, the reader can already imagine its importance.

Typically, aggregation functions can safely ignore null values. The only exception is count(*), where all input elements are counted. If, for some attribute a, we want to count only the values with a ≠ ⊥, we often write count^{NN}(a) to emphasize this fact. The corresponding SQL function is count(a).

Let x be a single value and {x} a bag containing x only once. Since

    min({x}) = x,
    max({x}) = x,
    sum({x}) = x,
    avg({x}) = x,

these aggregation functions behave like the identity if we identify single elements with singleton bags. If we identify a single value with a bag containing this single value once, we see that

    min(min(X)) = min(X),
    max(max(X)) = max(X),
    sum(sum(X)) = sum(X),
    avg(avg(X)) = avg(X),

that is, these aggregation functions are idempotent.

Let N denote either a numeral data type (e.g., integer or float) or a tuple type [a1 : τ1, ..., an : τn] where each τi is a numeral data type. Further, let N contain the null value. A scalar aggregation function agg is a function with signature agg : {τ}b → N.

A scalar aggregation function agg : {τ}b → N is called decomposable if there exist functions

    agg¹ : {τ}b → N′,
    agg² : {N′}b → N

with

    agg(Z) = agg²({agg¹(X), agg¹(Y)}b)

for all non-empty X and Y with Z = X ∪ Y. This condition assures that agg(Z) can be computed on arbitrary subsets (-lists, -bags) of Z independently and that the partial results can be aggregated to yield the correct total result. If the condition holds, we say that agg is decomposable with inner agg¹ and outer agg². In the following, we also write agg^I for the inner and agg^O for the outer aggregation function.

A decomposable scalar aggregation function agg : {τ}b → N is called reversible if for agg^O there exists a function (agg^O)⁻¹ : N′, N′ → N′ with

    agg(X) = γ((agg^O)⁻¹(agg^I(Z), agg^I(Y)))

for all X, Y, and Z with Z = X ∪ Y. This condition assures that we can compute agg(X) for a subset (-list, -bag) X of Z by "subtracting" its aggregated complement Y from the aggregated total by means of (agg^O)⁻¹. The fact that scalar aggregation functions can be decomposable and reversible is the basic observation upon which the efficient evaluation of aggregation functions builds.

As an example, consider the scalar aggregation function avg : {[a : float]}b → float, averaging the values of the attribute a of a bag of tuples with the single attribute a. It is reversible with

    agg^I     : {[a : float]}b → [sum : float, count : float],
    agg^O     : [sum : float, count : float], [sum : float, count : float] → [sum : float, count : float],
    (agg^O)⁻¹ : [sum : float, count : float], [sum : float, count : float] → [sum : float, count : float],
    γ         : [sum : float, count : float] → float,

where

    agg^I(X) = [sum : sum(X.a), count : |X|],
    agg^O([sum : s1, count : c1], [sum : s2, count : c2]) = [sum : s1 + s2, count : c1 + c2],
    (agg^O)⁻¹([sum : s1, count : c1], [sum : s2, count : c2]) = [sum : s1 − s2, count : c1 − c2],
    γ([sum : s, count : c]) = s/c.

Here, sum(X.a) denotes the sum of all values of attribute a of the tuples in X, and |X| denotes the cardinality of X. Note that agg^I(∅) = [sum : 0, count : 0], and that γ([sum : 0, count : 0]) is undefined, as is avg(∅).
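This example translates directly into code. The following C++ sketch (ours; the struct and function names are invented for illustration) implements agg^I, agg^O, (agg^O)⁻¹, and γ for avg and checks both the decomposition and the reversal:

    #include <iostream>
    #include <vector>

    // avg is decomposable and reversible via (sum, count) pairs.
    struct SC { double sum; double count; };

    SC aggI(const std::vector<double>& x) {        // agg^I: bag -> [sum, count]
        SC r{0, 0};
        for (double v : x) { r.sum += v; r.count += 1; }
        return r;
    }
    SC aggO(SC a, SC b)    { return {a.sum + b.sum, a.count + b.count}; }  // combine
    SC aggOInv(SC a, SC b) { return {a.sum - b.sum, a.count - b.count}; }  // "subtract"
    double gamma(SC a)     { return a.sum / a.count; }                     // finalize

    int main() {
        std::vector<double> X{1, 2}, Y{3, 4, 5};
        std::vector<double> Z{1, 2, 3, 4, 5};                  // Z = X union Y
        std::cout << gamma(aggO(aggI(X), aggI(Y))) << "\n";    // avg(Z) = 3, decomposed
        std::cout << gamma(aggOInv(aggI(Z), aggI(Y))) << "\n"; // avg(X) = 1.5, reversed
    }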
In statistics, the variance of a bag of numbers is often needed. For a bag B with n = |B|, it is defined as

    s² = (1/(n−1)) Σ_{x∈B} (x − x̄)²,

where x̄ is the average of the values in B, i.e., x̄ = (1/n) Σ_{x∈B} x. As an exercise, the reader should show that the variance is decomposable and reversible.
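For readers who want to check their solution to this exercise: one possible answer (a sketch of ours, under the standard rewriting s² = (Σx² − (Σx)²/n)/(n−1); all names are invented) uses [n, sum, sumsq] as the inner aggregate and componentwise addition and subtraction as the outer combiner and its inverse:

    #include <iostream>
    #include <vector>

    // Partial state for the variance: it is decomposable (and reversible)
    // with inner aggregate [n, sum, sumsq] and componentwise +/- as combiner.
    struct V { double n, sum, sumsq; };

    V inner(const std::vector<double>& b) {
        V r{0, 0, 0};
        for (double x : b) { r.n += 1; r.sum += x; r.sumsq += x * x; }
        return r;
    }
    V combine(V a, V b) { return {a.n + b.n, a.sum + b.sum, a.sumsq + b.sumsq}; }
    V reverse(V z, V y) { return {z.n - y.n, z.sum - y.sum, z.sumsq - y.sumsq}; }
    double finalize(V v) {           // s^2 = (sumsq - sum^2/n) / (n - 1)
        return (v.sumsq - v.sum * v.sum / v.n) / (v.n - 1);
    }

    int main() {
        std::vector<double> X{1, 2, 3}, Y{4, 5};
        V z = combine(inner(X), inner(Y));                    // variance of X u Y
        std::cout << finalize(z) << "\n";                     // 2.5
        std::cout << finalize(reverse(z, inner(Y))) << "\n";  // variance of X = 1
    }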
Not all aggregation functions are decomposable and reversible. For instance, min and max are decomposable but not reversible. If an aggregation function is applied to a bag that first has to be converted to a set, decomposability is jeopardized for sum and count. That is, in SQL, sum(distinct) and count(distinct) are not decomposable.

Let us look at the decomposition of our five aggregation functions. We can decompose them as follows:

    min(X ∪ Y)   = min(min(X), min(Y)),
    max(X ∪ Y)   = max(max(X), max(Y)),
    count(X ∪ Y) = sum(count(X), count(Y)),
    sum(X ∪ Y)   = sum(sum(X), sum(Y)).

The treatment of avg is slightly more complicated, as we have already seen above. In the presence of null values, avg is defined as avg(X) = sum(X)/count^{NN}(X). Hence, we can decompose it on the basis of

    avg(X ∪ Y) = sum(sum(X), sum(Y)) / (count^{NN}(X) + count^{NN}(Y)).

In a typical query compiler, every occurrence of avg(e) is replaced by sum(e)/count^{NN}(e) during the NFST phase. Thus, during subsequent phases of the query compiler, we can safely ignore the intricacies of average¹. Figure 7.3 summarizes these findings.

¹ These are nicely described in a book by Savage [762].

    agg       | agg¹             | agg²
    ----------+------------------+---------
    min       | min              | min
    max       | max              | max
    count(∗)  | count(∗)         | sum
    count(a)  | count(a)         | sum
    sum       | sum              | sum
    avg       | sum, count^{NN}  | sum, sum

    Figure 7.3: Decomposition of aggregation functions

We now extend the notion of decomposability to aggregation vectors. An aggregation vector is an expression of the form

    (b1 : agg1(a1), ..., bk : aggk(ak)),

where the ai and bi are attribute names and the aggi are aggregation functions. Often, we will leave out the enclosing parentheses and simply write b1 : agg1(a1), ..., bk : aggk(ak). We use ◦ to denote the concatenation of two aggregation vectors.

Let F = (b1 : agg1(a1), ..., bk : aggk(ak)) be an aggregation vector and let all aggregation functions aggi be decomposable into agg¹i and agg²i. Then we say that F is decomposable into F¹ and F², where

    F¹ := (b′1 : agg¹1(a1), ..., b′k : agg¹k(ak)),
    F² := (b1 : agg²1(b′1), ..., bk : agg²k(b′k)).

Note that in all cases, if F is decomposable into F¹ and F², then F¹ is decomposable into F^{1,1} and F^{1,2}, and F² is decomposable into F^{2,1} and F^{2,2}. Further, we have F^{1,1} = F¹, F^{1,2} = F², F^{2,1} = F², and F^{2,2} = F².

Let e1 and e2 be arbitrary expressions. We say that an aggregation vector F is splittable into F1 and F2 with respect to e1 and e2 if F = F1 ◦ F2, F(F1) ∩ A(e2) = ∅, and F(F2) ∩ A(e1) = ∅. Assume that F contains an aggregation function aggi applied to some attribute ai. If ai ∈ A(e1), then aggi(ai) clearly belongs to F1; if ai ∈ A(e2), then aggi(ai) belongs to F2. There are other cases where F is splittable. Consider, for example, sum(a1 + a2) for ai ∈ A(ei). Since sum(a1 + a2) = sum(a1) + sum(a2), this does not hinder splittability. The same holds for subtraction.

The correct handling of duplicates, i.e., bags, is essential for the correctness of the query compiler and requires some care. We therefore classify our aggregation functions into those which are sensitive to duplicates and those which are not. An aggregation function is called duplicate agnostic if the multiplicities of the elements in the bag do not influence its result; it is called duplicate sensitive otherwise. For our aggregation functions we have:

• min, max, sum(distinct), count(distinct), and avg(distinct) are duplicate agnostic, and
• sum, count, and avg are duplicate sensitive.

Yan and Larson used the term Class C aggregation functions for duplicate sensitive and Class D for duplicate agnostic aggregation functions [946].

Finally, note that for all aggregation functions except count(∗), we have agg({a}) = a for arbitrary elements a. Thus, if we are sure that we deal with only one tuple, we can apply the following rewrite. Let ai and bi be attributes. Then, for F = (b1 : agg1(a1), ..., bm : aggm(am)), we define F̂ = (b1 : a1, ..., bm : am).
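The distinction is easy to observe experimentally. A minimal C++ illustration (ours):

    #include <algorithm>
    #include <iostream>
    #include <numeric>
    #include <set>
    #include <vector>

    int main() {
        std::vector<int> bag{1, 1, 2, 3, 3, 3};
        std::set<int> asSet(bag.begin(), bag.end());   // duplicates removed

        // min is duplicate agnostic: multiplicities do not affect the result.
        std::cout << *std::min_element(bag.begin(), bag.end()) << " == "
                  << *asSet.begin() << "\n";           // 1 == 1

        // sum is duplicate sensitive: eliminating duplicates changes the result.
        std::cout << std::accumulate(bag.begin(), bag.end(), 0) << " != "
                  << std::accumulate(asSet.begin(), asSet.end(), 0) << "\n"; // 13 != 6
    }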
7.3 Operators

The bag operators as well as other typical operators like selection and join are well-known. As we will see, the only difference from the usual definitions is that ours are extended to express nested queries. In order to enable this, we allow the subscripts (predicates, expressions) of these operators to contain algebraic expressions. In this section, we define all our operators on bags. Besides duplicate elimination, only projection will have explicit control over duplicates.

Sometimes, the left outerjoin needs some additional tuning. The standard definition of the left outerjoin demands that if some tuple from its left argument does not have a join partner in its right argument, the attributes from the right argument are given null values. We extend the left outerjoin such that values other than null can be given to attributes of the right-hand side. Similarly, the full outerjoin will be extended to carry two superscripts for this kind of defaults.

The d-join operation is used for performing a join between two bag-valued items, where the second one is dependent on the first one. One use is to express queries with table functions (see Sec. 4.10). Another is to access index structures (see Sec. 4.14). The d-join can also be used to unnest nested queries. It is often equivalent to a join between two bags with a membership predicate [803]. In some cases, it corresponds to an unnest operation.

The map operator χ ([487]) is well-known from the functional programming language context. A special case of it, which adds derived information in the form of an additional attribute with an according value (e.g., by object-base lookup or by method calls) to each tuple of a bag, has been proposed in [486, 487]. Later, this special case was given the name materialization operator [94].

The unnest operator is known from NF² [765, 746]. It will come in two different flavors, allowing us to perform unnesting not only on nested relations but also on attributes whose value is a bag of elements which are not tuples. The reverse operator is the nest operator, which can be generalized to a grouping operator. In our algebra, there exist two grouping operators: a unary grouping operator and a binary one (called groupjoin). The unary grouping operator groups one bag of tuples according to a grouping condition. Further, it can apply an arbitrary expression to each newly formed group. The groupjoin adds a group to each element of its first argument bag. This group is formed from the second argument. The groupjoin exploits the fact that in the object-oriented context, objects can have bag-valued attributes. As we will see, this is useful both for unnesting nested queries and for producing nested results. We will even use nesting as a useful tool for processing SQL queries.

7.3.1 Preliminaries

As already mentioned, our algebraic operators not only deal with standard relations but are polymorphic in the general sense. In order to fix the domains of the operators, we need some technical abbreviations and notations. Let us introduce these first.

Since our operators are polymorphic, we need variables for types. We use τ, possibly with a subscript, to denote types. To express that an expression e is of type τ, we write e :: τ. Starting from concrete names for types and from type variables, we can build type expressions the standard way by using type constructors to build tuple types ([·]), set types ({·}s), bag types ({·}b), and sequence types (⟨·⟩). For two type expressions t1 and t2, we denote by t1 ≤ t2 that t1 is a subtype of t2. It is important to note that this subtype relationship is not based on the sub-/superclass hierarchy found in most object-oriented models. Instead, it simply denotes substitutability: type t1 provides at least all the attributes and member functions that t2 provides [124].

Most of our algebraic operators are tuned to work on bags of tuples. The most important information here is the set of attributes A(e) an expression e provides or produces. The function A(e) is defined as follows:

    A(e) = {a1, ..., an} if e :: {[a1 : τ1, ..., an : τn]}s, e :: {[a1 : τ1, ..., an : τn]}b, e :: ⟨[a1 : τ1, ..., an : τn]⟩, or e :: [a1 : τ1, ..., an : τn].

Given a set of attributes A, we are sometimes interested in the attributes provided by an expression e which are not in A. For this complement we use the notation Ā(e), which is defined as A(e) \ A.

Often, we are not only interested in the set of attributes an expression provides, but also in the set of free variables occurring in it. We use F(e) to denote the set of all free variables (attributes) of an expression e.

Since the subscripts of our algebraic operators can contain arbitrary expressions, they may contain variables or even free variables. These variables need to be bound before the subscript expression can be evaluated; the bindings are taken from the argument(s) of the operator. In order to do so, we need a specified binding mechanism. The λ-notation is such a mechanism and can be used, e.g., in case of ambiguities. For our purpose, it suffices to stick to the following conventions.

• For an expression e with free variables F(e) = {a1, ..., an} and a tuple t with F(e) ⊆ A(t), we define e(t) := e[a1 ← t.a1, ..., an ← t.an].² Similarly, we define e(t1, ..., tn) for more than a single tuple. This way, we can use an expression as a function. Note that the attribute names of the ti have to be distinct to avoid name conflicts.

• For an expression e with only one free variable x, we define e(t) = e[x ← t].

² e[v1 ← e1, ..., vn ← en] denotes the substitution of the variables vi by the expressions ei within the expression e.

This mechanism is very much like the standard binding for the relational algebra. Consider, for example, a selection σ_{a=3}(R). We assume that a, the free variable of the subscript expression a = 3, is bound to the value of attribute a of the tuples of relation R. To express this binding explicitly, we would write (a = 3)(t) for a tuple t ∈ R. Since a is an attribute of R and hence of t, by our convention a is replaced by t.a, the value of attribute a of tuple t.
Since we want to avoid name conflicts right away, we assume that all variable/attribute names used in a query are distinct. This can be achieved in a renaming step. Typically, renaming takes place during the NFST phase.

The application of a function f to arguments ei is denoted either by the regular notation (e.g., f(e1, ..., en)) or by the dot notation (e.g., e1.f(e2, ..., en)). The dot notation is used for type-associated methods occurring in the object-oriented context.

Last, we introduce the heavily overloaded symbol ◦. It denotes function concatenation and (as a special case) tuple concatenation, as well as the concatenation of two tuple types to yield a tuple type containing the union of the attributes of the argument tuple types.

Sometimes it is useful to be able to produce a bag containing only a single tuple with no attributes. This is done by the singleton scan operator, denoted by □. Thus, □ ≡ {[]}b.

Very often, we are given some database item which is a bag of other items. To bind these to variables or, equivalently, to embed the items into tuples, we use the notation e[x] for an expression e and a variable/attribute name x. For bag-valued expressions e, e[x] is defined as e[x] = {[x : y] | y ∈ e}b. For sequence-valued expressions e, we define e[a] = ϵ if e is empty and e[a] = ⟨[a : α(e)]⟩ ⊕ τ(e)[a] otherwise. By id we denote the identity function.

7.3.2 Signatures

We are now ready to define the signatures of the operators of our algebra. Their semantics is defined in a subsequent step. Remember that we consider all operators as being polymorphic. Hence, their signatures are polymorphic and contain type variables, denoted by τ, often with an index. As mentioned before, we define all operators on bags. Let us start by typing our bag operators:

    ∪   : {τ}b, {τ}b → {τ}b,
    ∩   : {τ}b, {τ}b → {τ}b,
    \   : {τ}b, {τ}b → {τ}b,
    Π^D : {τ}b → {τ}b.

The unary operators we use have the following signatures, where B denotes the type boolean:

    Π_A         : {τ}b → {τ′}b          if τ ≤ τ′ = [a1 : τ1, ..., an : τn], A = {a1, ..., an},
    Π^D_A       : {τ}b → {τ′}b          if τ ≤ τ′ = [a1 : τ1, ..., an : τn], A = {a1, ..., an},
    σ_p         : {τ}b → {τ}b           if p : τ → B,
    χ_f         : {τ1}b → {τ2}b         if f : τ1 → τ2,
    χ_{a:f}     : {τ1}b → {τ1 ◦ [a : τ2]}b      if f : τ1 → τ2,
    Γ_{θG;g:f}  : {τ1 ◦ τ2}b → {τ1 ◦ [g : τ′]}b if τi ≤ [], f : {τ2}b → τ′, G = A(τ1),
    ν_{G;g}     : {τ1 ◦ τ2}b → {τ1 ◦ [g : {τ2}b]}b if τi ≤ [], G = A(τ1),
    µ_g         : {τ}b → {τ′}b          if τ = [a1 : τ1, ..., an : τn, g : {τ0}b], τ0 ≤ [], τ′ = [a1 : τ1, ..., an : τn] ◦ τ0,
    µ_{a:g}     : {τ}b → {τ′}b          if τ = [a1 : τ1, ..., an : τn, g : {τ0}b], τ′ = [a1 : τ1, ..., an : τn] ◦ [a : τ0].

One special operator is needed to translate OQL, which exhibits an explicit flatten operator to unnest bags of bags. The according algebraic operator is defined easily:

    flatten : {{τ}b}b → {τ}b.

The following is a list of signatures of some binary operators.
    A   : {τ1}b, {τ2}b → {τ1 ◦ τ2}b,
    B_q : {τ1}b, {τ2}b → {τ1 ◦ τ2}b     if τi ≤ [], q : τ1, τ2 → B,
    N_q : {τ1}b, {τ2}b → {τ1}b          if τi ≤ [], q : τ1, τ2 → B,
    T_q : {τ1}b, {τ2}b → {τ1}b          if τi ≤ [], q : τ1, τ2 → B,
    E_q : {τ1}b, {τ2}b → {τ1 ◦ τ2⁺}b    if τi ≤ [], q : τ1, τ2 → B,
    K_q : {τ1}b, {τ2}b → {τ1⁺ ◦ τ2⁺}b   if τi ≤ [], q : τ1, τ2 → B,
    C   : {τ1}b, {τ2}b → {τ1 ◦ τ2}b     if τi ≤ [],
    Z_{A1θA2;g:f} : {τ1}b, {τ2}b → {τ1 ◦ [g : τ′]}b if τi ≤ [], f : {τ2}b → τ′, Ai ⊆ A(τi) for i = 1, 2.

Here, τ⁺ indicates that the attributes of τ may be null-padded. Using special min/max operators to retrieve the element(s) whose value becomes minimal/maximal often results in more efficient plans:

    max_{m;g;a;f} : {τ}b → [m : τa, g : τf] if τ ≤ [a : τa], f : {τ}b → τf.

7.3.3 Projection

Let A = {a1, ..., an} be a set of attributes. We define two projection operators:

    Π_A(e)   := {[a1 : x.a1, ..., an : x.an] | x ∈ e}b,
    Π^D_A(e) := Π^D(Π_A(e)).

The result of Π^D_A is always duplicate-free. In concordance with the characteristic function of bags, we use ≐ to determine whether two elements are equal or not. Thus, Π^D_A is defined such that the characteristic function of Π^D_A(e) yields min(1, χ_{Π_A(e)}(x)) for all x ∈ Π_A(e). Thus, it is set-faithful. Typically, the result of Π_A is not duplicate-free, even if its input is duplicate-free. Thus, we need explicit duplicate control here. One exception occurs in the presence of functional dependencies: if A → A(e) holds and e is duplicate-free, then Π_A(e) is duplicate-free.

Sometimes, we want to eliminate a single attribute or a set of attributes A. This is denoted by

    Π̄_A(e)   := Π_{A(e)\A}(e),
    Π̄^D_A(e) := Π^D_{A(e)\A}(e).

    R1 = {[a1 : 1], [a1 : 2], [a1 : 3]}b
    R2 = {[a2 : 1, b2 : 2], [a2 : 1, b2 : 3], [a2 : 2, b2 : 4], [a2 : 2, b2 : 5]}b

    e1 := R1
    e2 := R2
    e3 := Γ_{a2;g:id}(e2)
       =  {[a2 : 1, g : {[a2 : 1, b2 : 2], [a2 : 1, b2 : 3]}b],
           [a2 : 2, g : {[a2 : 2, b2 : 4], [a2 : 2, b2 : 5]}b]}
    e4 := χ_{g:σ_{a1=a2}(e2)}(e1)
       =  {[a1 : 1, g : {[a2 : 1, b2 : 2], [a2 : 1, b2 : 3]}b],
           [a1 : 2, g : {[a2 : 2, b2 : 4], [a2 : 2, b2 : 5]}b],
           [a1 : 3, g : ∅b]}
    e5 := e1 Z_{a1=a2;g:id} e2
       =  {[a1 : 1, g : {[a2 : 1, b2 : 2], [a2 : 1, b2 : 3]}b],
           [a1 : 2, g : {[a2 : 2, b2 : 4], [a2 : 2, b2 : 5]}b],
           [a1 : 3, g : ∅b]}
    e6 := e1 E_{a1=a2} e3
       =  {[a1 : 1, a2 : 1, g : {[a2 : 1, b2 : 2], [a2 : 1, b2 : 3]}b],
           [a1 : 2, a2 : 2, g : {[a2 : 2, b2 : 4], [a2 : 2, b2 : 5]}b],
           [a1 : 3, a2 : ⊥, g : ⊥]}

    Figure 7.4: Examples for map, group, groupjoin, and outerjoin operators

7.3.4 Selection

Note that in the following definition there is no restriction on the selection predicate. It may contain path expressions, method calls, nested algebraic operators, etc.:

    σ_p(e) := {x | x ∈ e, p(x)}b.

The output of a selection is duplicate-free if its input is duplicate-free. As the selection is set-faithful, we do not need an additional set-selection. An example of a selection together with a map operator (discussed next) can be found in Fig. 7.4 (expression e4).
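A toy implementation makes the difference between Π_A and Π^D_A tangible. In the following C++ sketch (ours; tuples are modeled as attribute-value maps), projecting {[a : 1, b : 2], [a : 1, b : 3]} onto a yields two tuples without duplicate elimination and one with it:

    #include <iostream>
    #include <map>
    #include <set>
    #include <string>
    #include <vector>

    using Tuple = std::map<std::string, int>;  // attribute -> value
    using Bag   = std::vector<Tuple>;

    // Pi_A: keep only the attributes in A; duplicates may appear.
    Bag project(const Bag& e, const std::set<std::string>& A) {
        Bag r;
        for (const Tuple& t : e) {
            Tuple p;
            for (const std::string& a : A) p[a] = t.at(a);
            r.push_back(p);
        }
        return r;
    }
    // Pi^D_A = Pi^D(Pi_A(e)): project, then eliminate duplicates.
    Bag projectD(const Bag& e, const std::set<std::string>& A) {
        std::set<Tuple> seen;                 // Tuple has operator< via std::map
        Bag r;
        for (const Tuple& t : project(e, A))
            if (seen.insert(t).second) r.push_back(t);
        return r;
    }

    int main() {
        Bag R{{{"a", 1}, {"b", 2}}, {{"a", 1}, {"b", 3}}};
        std::cout << project(R, {"a"}).size() << "\n";   // 2: Pi_a keeps duplicates
        std::cout << projectD(R, {"a"}).size() << "\n";  // 1: Pi^D_a eliminates them
    }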
7.3.5 Map

The map operator is of fundamental importance to the algebra. It comes in two flavors. The first one extends a given input tuple by an attribute and assigns a value to this new attribute. This variant is also called the materialize operator [486, 94]. The second one produces an output element for each input element by applying a function to it. This corresponds to the standard map as defined in, e.g., [487]. The latter is able to express the former. The two variants of the map operator are defined as follows:

    χ_{a:e2}(e1) := {y ◦ [a : e2(y)] | y ∈ e1}b,
    χ_{e2}(e1)   := {e2(x) | x ∈ e1}b.

We can generalize the extending variant to calculate values for several attributes. Given an attribute assignment vector a1 : e1, ..., ak : ek, we define

    χ_{a1:e1,...,ak:ek}(e) := χ_{ak:ek}(... χ_{a1:e1}(e) ...).

If we demand that ai ∉ A(e), then the ai are new attributes. In this case, the materialize operator and its special single-attribute case χ_{a:e} are called extending, because they extend a given input tuple by new attributes while not modifying the values of the input attributes. Many equivalences only hold for this specialization of the map operator which, at the same time, is the predominant variant used. In fact, it is sufficient for SQL. An example of an extending map operator can be found in Fig. 7.4 (expression e4). Note that in the object-oriented and object-relational context, the map operator obviates the need for a relational projection.

Sometimes the map operator is equivalent to a renaming. In this case, we use ρ instead of χ. Let A = {a1, ..., an} and B = {b1, ..., bn} be two sets of n attributes each. We then define

    ρ_{A←B}(e) := Π̄_A(χ_{b1:a1,...,bn:an}(e)).

The result of the extending variant of the map operator is duplicate-free if and only if its input is. Thus, the extending map operator is set-faithful.

7.3.6 Unary Grouping

Two grouping operators are contained in our algebra. The first one, discussed here and called (unary) grouping, is defined on a bag; its subscript indicates (i) the grouping criteria and (ii) a new attribute name together with a function used to calculate its value:

    Γ_{θG;g:f}(e) := {y ◦ [g : x] | y ∈ Π^D_G(e), x = f({z | z ∈ e, z.G θ y.G}b)}s

for a set of grouping attributes G, an attribute g, and a function f. The comparison operator θ must be a null-extended comparison operator like ≐. Note that the result is a set, but f is applied to a bag. An example of the grouping operator can be found in Fig. 7.4 (expression e3).

The grouping criterion may be defined on several attributes. Then, G and θ represent sequences of attributes and comparators. In case all comparators equal ≐, we abbreviate Γ_{≐G;g:f} by Γ_{G;g:f}.

We can extend the above definition to calculate several new attribute values:

    Γ_{θG;b1:f1,...,bk:fk}(e) := {y ◦ [b1 : x1, ..., bk : xk] | y ∈ Π^D_G(e), xi = fi({z | z ∈ e, z.G θ y.G}b)}s.

We also introduce two variants of the grouping operator, which can be used to abbreviate small expressions. Let F = b1 : e1, ..., bk : ek with F(ei) = {g} for all i = 1, ..., k. Then we define

    Γ_{G;F}(e) := Π̄_g(χ_F(Γ_{G;g:id}(e))).

Here, the free attribute g is implicit. If we wish to make it explicit, we write Γ_{G;g;F} instead of simply Γ_{G;F}. Note that g plays the same role as partition in OQL ([133, p. 114]).

Let us also introduce an SQL-notation-based variant. Let F be an aggregation vector of the form

    F = b1 : agg1(a1), ..., bk : aggk(ak)

for attributes ai. Then we define Fg as

    Fg = b1 : agg1(g.a1), ..., bk : aggk(g.ak)

and introduce the abbreviation

    Γ_{G;F}(e) := Γ_{G;g;Fg}(e).

This is the version we have to use for SQL.

The traditional nest operator ν [765], which nests a relation e on a set of attributes G ⊂ A(e), can be defined as an abbreviation of the grouping operator:

    ν_{G;g}(e) := Γ_{G;g:Π_{Ḡ}}(e),

where Ḡ abbreviates A(e) \ G. The results of Γ and ν are always duplicate-free. Thus, these operators are set-faithful.
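A hash-based evaluation of unary grouping can look as follows. This C++ sketch (ours) fixes f to count(*) for brevity and reproduces the grouping of R2 from Figure 7.4:

    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    using Tuple = std::map<std::string, int>;
    using Bag   = std::vector<Tuple>;

    // Gamma_{G; g:f}: group e by the attributes in G and store f(group) in g.
    // Here f is fixed to the cardinality of the group (count(*)).
    Bag groupBy(const Bag& e, const std::vector<std::string>& G,
                const std::string& g) {
        std::map<std::vector<int>, int> groups;   // grouping key -> |group|
        for (const Tuple& t : e) {
            std::vector<int> key;
            for (const std::string& a : G) key.push_back(t.at(a));
            groups[key] += 1;
        }
        Bag r;                                    // one output tuple per group
        for (auto& [key, cnt] : groups) {
            Tuple t;
            for (size_t i = 0; i < G.size(); ++i) t[G[i]] = key[i];
            t[g] = cnt;
            r.push_back(t);
        }
        return r;
    }

    int main() {
        Bag R2{{{"a2",1},{"b2",2}}, {{"a2",1},{"b2",3}},
               {{"a2",2},{"b2",4}}, {{"a2",2},{"b2",5}}};
        for (const Tuple& t : groupBy(R2, {"a2"}, "m"))
            std::cout << "a2=" << t.at("a2") << " m=" << t.at("m") << "\n";
    }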
7.3.7 Unnest Operators

The unnest operator comes in two flavors. The first one is responsible for unnesting a bag of tuples on an attribute being a set/bag/sequence of tuples itself. The second one unnests bags of tuples on an attribute which is not a bulk of tuples but a bulk of something else, e.g., integers. The according definitions are

    µ_g(e)     := {y.[A(y) \ {g}] ◦ x | y ∈ e, x ∈ y.g}b,
    µ_{a:g}(e) := {y.[A(y) \ {g}] ◦ [a : x] | y ∈ e, x ∈ y.g}b.

If the bag-valued attribute is not stored explicitly but derived by the evaluation of an expression, we use the unnest map operator instead:

    Υ_{e2}(e1)   := Π̄_g(µ_g(χ_{g:e2}(e1))),
    Υ_{a:e2}(e1) := Π̄_g(µ_{a:g}(χ_{g:e2}(e1))).

The motivation for the unnest map operator is that it saves the explicit materialization of the result of evaluating the expression e2.

The results of µ_g(e) and µ_{a:g}(e) are duplicate-free if and only if the following two conditions hold:

1. The input e is duplicate-free.
2. For each tuple t ∈ e, t.g is duplicate-free.

Hence, explicit duplicate control is in order for the unnest operator. The same holds for the unnest map operator.

7.3.8 Flatten Operator

The flatten operator flattens a bag of bags by unioning the elements of the bags contained in the outer bag:

    flatten(e) := {y | x ∈ e, y ∈ x}b.

The flatten operator's result is duplicate-free if and only if the bags contained in e are duplicate-free and have pairwise empty intersections. Thus, explicit duplicate control is very much in order.

7.3.9 Join Operators

The algebra features many different join operators. The first five (join, semijoin, antijoin, left outerjoin, and full outerjoin) are rather standard:

    e1 A e2   := {y ◦ x | y ∈ e1, x ∈ e2}b,
    e1 B_p e2 := {y ◦ x | y ∈ e1, x ∈ e2, p(y, x)}b,
    e1 N_p e2 := {y | y ∈ e1, ∃ x ∈ e2 p(y, x)}b,
    e1 T_p e2 := {y | y ∈ e1, ¬∃ x ∈ e2 p(y, x)}b,
    e1 E_p e2 := (e1 B_p e2) ∪ ((e1 T_p e2) A {⊥_{A(e2)}}),
    e1 K_p e2 := (e1 B_p e2) ∪ ((e1 T_p e2) A {⊥_{A(e2)}}) ∪ ({⊥_{A(e1)}} A (e2 T_p e1)).

An example of the left outerjoin can be found in Fig. 7.4 (expression e6). More examples of join, left outerjoin, and full outerjoin can be found in Fig. 7.6 for the predicate q_{ij} := (b_i = b_j) and in Fig. 7.7 for the predicate q′_{ij} := (b_i ≐ b_j). Regular joins were already present in Codd's original proposal of a relational algebra [196]. Outerjoins were invented by Lacroix and Pirotte [524].

The next join operator is called dependency join (d-join for short) and is denoted by C. It is a join between two bags, where the evaluation of the second one may depend on the first one. The filled triangle indicates the direction in which information has to flow in order to evaluate the d-join. The d-join is used to translate from clauses containing table functions with parameters (see Sec. 4.10 for an example) and lateral derived tables into the algebra. Whenever possible, d-joins are rewritten into standard joins. The definition of the d-join is

    e1 C e2 := {y ◦ x | y ∈ e1, x ∈ e2(y)}b.

The result of a d-join is duplicate-free if e1 is duplicate-free and if e2(t1) is duplicate-free for each t1 ∈ e1. Example applications of the d-join can be found in Sec. 4.10 and Sec. 4.14.

For the left outerjoin and the full outerjoin, we need variants which allow us to set some attribute values to constants other than null for tuples without a join partner. Let D_i = d_{i1} : c_{i1}, ..., d_{ik} : c_{ik} (i = 1, 2) be two vectors assigning constants c_{ij} to attributes d_{ij}. We then define

    e1 E^{D2}_p e2     := (e1 B_p e2) ∪ ((e1 T_p e2) A {⊥_{A(e2)\A(D2)} ◦ [D2]}),
    e1 K^{D1;D2}_p e2  := (e1 B_p e2) ∪ ((e1 T_p e2) A {⊥_{A(e2)\A(D2)} ◦ [D2]})
                                      ∪ ((e2 T_p e1) A {⊥_{A(e1)\A(D1)} ◦ [D1]}).

If one of D1 or D2 is empty, we use − to denote this.
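The default-padding variant is easily prototyped. The following C++ sketch (ours) implements a nested-loop left outerjoin whose unmatched left tuples are padded with a caller-supplied default tuple; passing a tuple of null markers yields the standard E:

    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    using Tuple = std::map<std::string, int>;
    using Bag   = std::vector<Tuple>;
    const int NULLV = -1;  // stand-in for the null value

    // Left outerjoin e1 E_{a1=a2} e2 by nested loops; unmatched tuples from e1
    // are padded with `def` for the right-hand attributes (the D2 variant;
    // def[a] = NULLV for all a gives the standard left outerjoin).
    Bag leftOuter(const Bag& e1, const Bag& e2, const std::string& a1,
                  const std::string& a2, Tuple def) {
        Bag r;
        for (const Tuple& t1 : e1) {
            bool matched = false;
            for (const Tuple& t2 : e2)
                if (t1.at(a1) == t2.at(a2)) {
                    Tuple t = t1;
                    t.insert(t2.begin(), t2.end());  // tuple concatenation
                    r.push_back(t);
                    matched = true;
                }
            if (!matched) {
                Tuple t = t1;
                t.insert(def.begin(), def.end());    // null- or default-padding
                r.push_back(t);
            }
        }
        return r;
    }

    int main() {
        Bag R1{{{"a1",1}}, {{"a1",3}}};
        Bag R2{{{"a2",1},{"b2",2}}, {{"a2",1},{"b2",3}}};
        for (const Tuple& t : leftOuter(R1, R2, "a1", "a2",
                                        {{"a2",NULLV},{"b2",0}}))  // default b2 = 0
            std::cout << t.at("a1") << " " << t.at("a2") << " " << t.at("b2") << "\n";
    }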
As can already be seen from the definitions, this set of join operators is highly redundant. As is well-known, the (regular) join can be expressed as a sequence of selection and cross product:

    e1 B_q e2 ≡ σ_q(e1 A e2).

For an expression e2 and a predicate q, define the predicate p as p := (σ_q(e2) ≠ ∅). Therewith, the semijoin can be expressed as a selection:

    e1 N_q e2 ≡ σ_p(e1).

If we define p as p := (σ_q(e2) = ∅), then the antijoin can be expressed as

    e1 T_q e2 ≡ σ_p(e1).

The outerjoins were already defined using these three operators, which in turn can be expressed using only selection and cross product. We observe that

• the results of cross product, (regular) join, left outerjoin, and full outerjoin are duplicate-free if and only if both of their inputs are duplicate-free, and
• the results of semijoin and antijoin are duplicate-free if and only if their left input is duplicate-free.

Thus, it follows that these operators are set-faithful.

7.3.10 Groupjoin

The second grouping operator, called groupjoin or binary grouping, is defined on two input bags. It is more than 20 years old, but there is still no common name for it. It was first introduced by von Bültzingsloewen [901, 902] under the name outer aggregation. Nakano calls the same operator general aggregate formation [637], since unary grouping is called aggregate formation by Klug [502]. Steenhagen, Apers, Blanken, and de By call a variant of the groupjoin nest-join [831]. The groupjoin is quite versatile, and we strongly believe that no DBMS can do without it. For example, it has been successfully applied to the problem of unnesting nested queries in the context of SQL [92, 104, 105, 637, 901, 902], OQL [189, 190, 191], and XQuery [597]. Chatziantoniou, Akinde, Johnson, and Kim apply the groupjoin to efficiently evaluate data warehouse queries which feature a cube-by or group-by grouping sets clause [146]. They call the groupjoin MD-Join.

The groupjoin is defined as follows:

    e1 Z_{A1θA2;g:f} e2 := {y ◦ [g : G] | y ∈ e1, G = f({x | x ∈ e2, y.A1 θ x.A2}b)}b.

Thus, each tuple t1 in e1 is extended by a new attribute g, whose value is the result of applying the function f to the bag of all tuples from e2 which join with t1 on A1 θ A2. An example of the groupjoin can be found in Fig. 7.4 (expression e5).

In fact, we do not have to rely on a comparison-based predicate. We can generalize the groupjoin to an arbitrary join predicate:

    e1 Z_{q;g:f} e2 := {y ◦ [g : G] | y ∈ e1, G = f({x | x ∈ e2, q(x, y)}b)}b.

Similar to unary grouping, we will use Z_{q;g;F} to abbreviate Π̄_g(χ_F(e1 Z_{q;g:id} e2)), and Z_{q;F} to abbreviate Z_{q;g;F}. In both cases, F must be an aggregation vector with F(F) = {g}. An SQL-notation variant of the groupjoin is defined as

    e1 Z_{q;F} e2 := e1 Z_{q;Fg} e2,

where the requirements for F and Fg are the same as for unary grouping.

Since the reader is most likely not familiar with the groupjoin, let us give some remarks and pointers on its implementation. Obviously, implementation techniques for the equijoin and the nest operator can be used if θ stands for equality. For the other cases, implementations based on sorting seem promising. One could also consider implementation techniques for non-equi joins, e.g., those developed for the band-width join [239]. An alternative is to use θ-tables, which were developed for efficient aggregate processing [192]. Implementation techniques for the groupjoin have also been discussed in [146, 598].
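As a starting point, here is a simple hash-based equi-groupjoin in C++ (our sketch; f is fixed to count, which corresponds to the SQL-notation variant with F = (g : count(*))):

    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    using Tuple = std::map<std::string, int>;
    using Bag   = std::vector<Tuple>;

    // Groupjoin e1 Z_{a1=a2; g:f} e2: extend every t1 in e1 by g = f of the bag
    // of all e2-tuples joining with t1. With f = count, g = |group|; note that
    // tuples of e1 without a join partner survive with g = f(empty bag).
    Bag groupjoinCount(const Bag& e1, const Bag& e2,
                       const std::string& a1, const std::string& a2) {
        std::map<int, int> counts;                  // e2 side: a2 value -> count
        for (const Tuple& t2 : e2) counts[t2.at(a2)] += 1;
        Bag r;
        for (const Tuple& t1 : e1) {
            Tuple t = t1;
            auto it = counts.find(t1.at(a1));
            t["g"] = (it == counts.end()) ? 0 : it->second;  // f(empty bag) = 0
            r.push_back(t);
        }
        return r;
    }

    int main() {
        Bag R1{{{"a1",1}}, {{"a1",2}}, {{"a1",3}}};
        Bag R2{{{"a2",1},{"b2",2}}, {{"a2",1},{"b2",3}},
               {{"a2",2},{"b2",4}}, {{"a2",2},{"b2",5}}};
        for (const Tuple& t : groupjoinCount(R1, R2, "a1", "a2"))
            std::cout << "a1=" << t.at("a1") << " g=" << t.at("g") << "\n";
        // a1=1 g=2, a1=2 g=2, a1=3 g=0 -- cf. e5 in Figure 7.4, with f = count
    }

Note how the tuple [a1 : 3] survives with g = 0; a join followed by unary grouping would lose it.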
Note that the groupjoin produces a duplicate-free result if and only if its left input is duplicate-free. It is thus set-faithful.

7.3.11 Min/Max Operators

The max operator has a very specific use that will be explained in the sequel. The following definition is a generalization of the Max operator as defined in [189]; defining a min operator is left to the reader.

    Max_{m;g;a;f}(e) := [m : max({x.a | x ∈ e}b), g : f({x | x ∈ e, x.a = m}b)]

The max operator successively performs three tasks. First, it calculates the maximum m of all values of the attribute a ∈ A(e) in e. Second, it uses this maximum m to select exactly those tuples t from e with t.a = m, i.e., those whose a-value is maximal. Third, these maximizing tuples are collected into a bag, and the result of applying the function f to this bag is stored as the value of the attribute g. In a real implementation, at least the first two phases will be merged. Thus, max requires only a single scan over e.

The sole purpose of these two operators is the efficient evaluation of expressions which demand to select a maximizing or minimizing element:

    f(σ_{a=agg(χ_a(e2))}(e1)) ≡ agg_{m;g;a;f}(e1).g                       (7.1)
        if Π_a(e1) = ρ_{b←a}(Π_b(e2)),
    χ_{g:f(σ_{a=m}(e2))}(χ_{m:agg(e1)}(e)) ≡ χ_{agg_{m;g;a;f}(e1)}(e)     (7.2)
        if Π_a(e1) = ρ_{b←a}(Π_b(e2)),

where agg can stand for min or max. Clearly, in case e1 = e2, the conditions are fulfilled. This can be very useful also in the nested case.

7.3.12 Other Dependent Operators

Similar to the d-join, we can introduce a d-semijoin, a d-antijoin, and so forth. Before we do so, let us first make an important observation about the d-join. Let e1 and e2 be two expressions and define J := F(e2) ∩ A(e1). Then

    e1 C e2 ≡ e1 B_{J=J′} (ρ_{J←J′}(Π^D_J(e1) C e2)).   (7.3)

Thus, we can evaluate the d-join by first evaluating it for all distinct attribute combinations contained in Π^D_J(e1) and then joining the result with e1. This saves redundant evaluations of the expression e2 for identical attribute combinations. The motivation for the memox operator (M) followed exactly this reasoning (see Sec. 4.14). The expression ρ_{J←J′}(Π^D_J(e1) C e2) will be used quite frequently. Thus, we abbreviate it by ê2(e1), or even by ê2 if e1 is clear from the context.

So far, the d-join has no selection predicate. We can simply add one by defining

    e1 C_q e2 := e1 C σ_q(e2).

Now we can give alternative expressions for the d-join:

    e1 C_q e2 ≡ e1 C σ_q(e2)
              ≡ e1 B_{J=J′} ρ_{J←J′}(Π^D_J(e1) C σ_q(e2))
              ≡ e1 B_{J=J′} ρ_{J←J′}(Π^D_J(e1) C_q e2)
              ≡ e1 B_{J=J′∧q} ê2
              ≡ e1 B_{q̂} ê2,

where J = F(e2) ∩ A(e1) and q̂ := ((J = J′) ∧ q). As for the d-join, the filled triangle points in the direction of the information flow for all subsequently defined dependent operators.

Let us start with the d-semijoin and the d-antijoin. They can be defined via selections:

    e1 O_q e2 := σ_{σ_q(e2)≠∅}(e1),
    e1 U_q e2 := σ_{σ_q(e2)=∅}(e1).

We observe that

    e1 O_q e2 ≡ e1 N_{q̂} ê2,
    e1 U_q e2 ≡ e1 T_{q̂} ê2,

where J = F(e2) ∩ A(e1) and q̂ = ((J = J′) ∧ q). The results of the d-semijoin and the d-antijoin are duplicate-free if and only if their left argument is.
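The effect of Eqv. 7.3, namely evaluating e2 only once per distinct binding, is the essence of memoization. A C++ sketch (ours; a binding is reduced to a single integer for brevity, and e2 is modeled as a function of the binding):

    #include <functional>
    #include <iostream>
    #include <map>
    #include <vector>

    // d-join with memoization: evaluate the dependent expression e2 only once
    // per distinct binding of J = F(e2) n A(e1), as suggested by Eqv. 7.3.
    std::vector<std::pair<int,int>>
    dJoinMemo(const std::vector<int>& e1,
              std::function<std::vector<int>(int)> e2) {
        std::map<int, std::vector<int>> memo;           // distinct binding -> e2
        std::vector<std::pair<int,int>> r;
        for (int t1 : e1) {
            auto it = memo.find(t1);
            if (it == memo.end())                       // evaluate e2 only here
                it = memo.emplace(t1, e2(t1)).first;
            for (int t2 : it->second) r.push_back({t1, t2});
        }
        return r;
    }

    int main() {
        // e2(y) = all squares up to y; e1 contains the binding 2 twice,
        // but e2 is evaluated for it only once.
        auto e2 = [](int y) {
            std::vector<int> v;
            for (int i = 1; i * i <= y; ++i) v.push_back(i * i);
            return v;
        };
        for (auto [a, b] : dJoinMemo({2, 4, 2}, e2))
            std::cout << a << "," << b << " ";          // 2,1 4,1 4,4 2,1
        std::cout << "\n";
    }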
We define the left outer d-join analogously to the left outerjoin:

    e1 F_q e2 := (e1 C_q e2) ∪ ((e1 U_q e2) A {⊥_{A(e2)}}).

Let us expand this definition. With E_{⊥2} := {⊥_{A(e2)}}, J = F(e2) ∩ A(e1), and q̂ = ((J = J′) ∧ q), we then have

    e1 F_q e2 ≡ (e1 C_q e2) ∪ ((e1 U_q e2) A E_{⊥2})
              ≡ (e1 B_{q̂} ê2) ∪ ((e1 T_{q̂} ê2) A E_{⊥2})
              ≡ e1 E_{q̂} ê2.

The result of a left outer d-join is duplicate-free if its left input and ê2 are.

Defining a full outer d-join does not make much sense. The third part of the expression

    (e1 C_q e2) ∪ ((e1 U_q e2) A {⊥_{A(e2)}}) ∪ ((e2 T_q e1) A {⊥_{A(e1)}})

is not even evaluable, since e2 can only be evaluated in the context of bindings derived from e1. One might be tempted to use ê2 such that the problematic part becomes (ê2 T_q e1) A {⊥_{A(e1)}}. However, we abandon this possibility.

The situation is less complicated for the dependent groupjoin. We can define it as

    e1 [_{q;g:f} e2 := e1 Z_{q̂;g:f} ê2.   (7.4)

We leave it as an exercise to the reader to show that

    e1 [_{q;g:f} e2 ≡ e1 E^{g:f(∅)}_{J=J′} Γ_{J′;g:f}(ρ_{J←J′}(Π^D_J(e1) C_q e2)),   (7.5)

where J = F(e2) ∩ A(e1). The result of the dependent groupjoin is duplicate-free if and only if its left input is duplicate-free.

7.4 Linearity of Algebraic Operators

7.4.1 Linearity of Algebraic Operators

The notion of linearity was first used by von Bültzingsloewen to simplify proofs of algebraic equivalences [903]. Since it saves a lot of work, we loosely follow his approach. Let us carry over the definition of linearity from sets to bags. A unary function f from bags to bags is called strongly linear if and only if the following two conditions hold for all bags X and Y:

    f(∅b) = ∅b,
    f(X ∪b Y) = f(X) ∪b f(Y).

An n-ary mapping from bags to a bag is called strongly linear in its i-th argument if and only if for all bags X1, ..., Xn and X′i the following two conditions hold:

    f(X1, ..., Xi−1, ∅b, Xi+1, ..., Xn) = ∅b,
    f(X1, ..., Xi−1, Xi ∪b X′i, Xi+1, ..., Xn) = f(X1, ..., Xi−1, Xi, Xi+1, ..., Xn) ∪b f(X1, ..., Xi−1, X′i, Xi+1, ..., Xn).

It is called strongly linear if it is strongly linear in all its arguments. For a binary function or operator where we can distinguish between the left and the right argument, we call it strongly left (right) linear if it is strongly linear in its first (second) argument.

Using the commutativity of bag union and bag intersection as well as the observations that in general

    (∅b ∪b X) ≠ ∅b, (∅b ∩b X) = ∅b, (∅b \b X) = ∅b, (X \b ∅b) ≠ ∅b

and

    (X ∪b Y) ∪b Z ≠ (X ∪b Z) ∪b (Y ∪b Z),
    (X ∪b Y) ∩b Z ≠ (X ∩b Z) ∪b (Y ∩b Z),
    (X ∪b Y) \b Z ≠ (X \b Z) ∪b (Y \b Z),
    X \b (Y ∪b Z) ≠ (X \b Y) ∪b (X \b Z),

we conclude that bag union, bag intersection, and bag difference are each neither strongly left nor strongly right linear.
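Both conditions of strong linearity can be checked mechanically on the toy bag type used before. In the following C++ sketch (ours), a selection satisfies both conditions, whereas duplicate elimination violates the second one on overlapping bags; Π^D is only weakly linear, in the sense defined next:

    #include <iostream>
    #include <map>

    using Bag = std::map<int, int>;  // value -> multiplicity

    Bag bagUnion(const Bag& x, const Bag& y) {
        Bag r = x;
        for (auto& [v, m] : y) r[v] += m;
        return r;
    }
    // sigma_{v >= 2}: a selection, expected to be strongly linear.
    Bag sel(const Bag& x) {
        Bag r;
        for (auto& [v, m] : x) if (v >= 2) r[v] = m;
        return r;
    }
    // Pi^D: duplicate elimination, which manipulates multiplicities and
    // therefore violates f(X u Y) = f(X) u f(Y) on overlapping bags.
    Bag dupElim(const Bag& x) {
        Bag r;
        for (auto& [v, m] : x) r[v] = 1;
        return r;
    }

    int main() {
        Bag X{{1, 1}, {2, 1}}, Y{{2, 1}, {3, 1}};   // X and Y overlap in 2
        std::cout << (sel(bagUnion(X, Y)) == bagUnion(sel(X), sel(Y)))
                  << "\n";                          // 1: condition holds
        std::cout << (dupElim(bagUnion(X, Y)) ==
                      bagUnion(dupElim(X), dupElim(Y))) << "\n";  // 0: it fails
    }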
We can relax the definition of strong linearity by additionally assuming that the two unioned bags are disjoint. A unary function f from bags to bags is called weakly linear if and only if the following two conditions hold for all bags X and Y with X ∩b Y = ∅b:

    f(∅b) = ∅b,
    f(X ∪b Y) = f(X) ∪b f(Y).

An n-ary mapping from bags to a bag is called weakly linear in its i-th argument if and only if for all bags X1, ..., Xn and X′i with Xi ∩b X′i = ∅b the following two conditions hold:

    f(X1, ..., Xi−1, ∅b, Xi+1, ..., Xn) = ∅b,
    f(X1, ..., Xi−1, Xi ∪b X′i, Xi+1, ..., Xn) = f(X1, ..., Xi−1, Xi, Xi+1, ..., Xn) ∪b f(X1, ..., Xi−1, X′i, Xi+1, ..., Xn).

It is called weakly linear if it is weakly linear in all its arguments. For a binary function or operator where we can distinguish between the left and the right argument, we call it weakly left (right) linear if it is weakly linear in its first (second) argument.

Using the commutativity of bag union and bag intersection as well as the observations that for X ∩b Y = ∅b

    (∅b ∪b X) ≠ ∅b, (∅b ∩b X) = ∅b, (∅b \b X) = ∅b, (X \b ∅b) ≠ ∅b

and

    (X ∪b Y) ∪b Z ≠ (X ∪b Z) ∪b (Y ∪b Z),
    (X ∪b Y) ∩b Z = (X ∩b Z) ∪b (Y ∩b Z),
    (X ∪b Y) \b Z = (X \b Z) ∪b (Y \b Z),
    Z \b (X ∪b Y) ≠ (Z \b X) ∪b (Z \b Y),

we conclude that bag union is neither weakly left nor weakly right linear, bag intersection is weakly linear, and bag difference is weakly left but not weakly right linear.

For the whole algebra, Table 7.1 summarizes the linearity properties of our algebraic operators. A '+' denotes strong linearity, a '◦' weak linearity, and a '-' neither of them.

    unary operator | linearity        binary operator | left lin. | right lin.
    Π^D            | ◦                ∪               | -         | -
    Π_A            | +                ∩               | ◦         | ◦
    Π^D_A          | ◦                \               | ◦         | -
    σ_p            | +                A               | +         | +
    χ_{a:e}        | +                B_p             | +         | +
    χ_f            | +                N_p             | +         | -
    Γ_{θG;F}       | -                T_p             | +         | -
    ν_{G;g}        | -                E_p             | +         | -
    µ_g            | +                K_p             | -         | -
    µ_{a:g}        | +                Z_{p;F}         | +         | -
    Υ_f            | +                C               | +         | does not apply
    Υ_{a:f}        | +
    flatten        | +

    Table 7.1: Linearity of algebraic operators

Let us take a closer look at the gap between weak and strong linearity. For some bag B, define the unary function f on bags by

    χ_{f(B)}(x) = 3 if x ∈ B,
    χ_{f(B)}(x) = 0 if x ∉ B.

Then f is weakly linear but not strongly linear. The problem is that f manipulates the multiplicities of the elements. We can make the difference between weak and strong linearity explicit. Recall that the only difference between the two definitions was the disjointness required for weak linearity. Consequently, we now consider the special case of bags containing a single element multiple times. We say that a unary function f is duplicate faithful if and only if

    f({x^m}b) = ∪_{i=1}^{m} f({x}b)

holds for all x. Then, a unary function is strongly linear if and only if it is weakly linear and duplicate faithful. The same holds for n-ary functions if we extend the property of duplicate faithfulness to multiple arguments.

To see that the semijoin is not even weakly right linear, consider the following example:

    {[a : 1]}b = {[a : 1]}b N_{a=b} {[b : 1, c : 1], [b : 1, c : 2]}b
               = {[a : 1]}b N_{a=b} ({[b : 1, c : 1]}b ∪ {[b : 1, c : 2]}b)
               ≠ ({[a : 1]}b N_{a=b} {[b : 1, c : 1]}b) ∪ ({[a : 1]}b N_{a=b} {[b : 1, c : 2]}b)
               = {[a : 1]²}b.

This is the reason why some equivalences valid for sets no longer hold for bags. For example,

    Π_{A(e1)}(e1 B_{q12} e2) ≡ e1 N_{q12} e2

holds for sets but not for bags. If we eliminate duplicates explicitly, we still have

    Π^D_{A(e1)}(e1 B_{q12} e2) ≡ Π^D_{A(e1)}(e1 N_{q12} e2).   (7.6)

Similarly, we have

    Π^D_{A(e1)}(e1 E_{q12} e2) ≡ Π^D_{A(e1)}(e1),   (7.7)
    Π^D_{A(e1)}(e1 K_{q12} e2) ≡ Π^D_{A(e1)}(e1).   (7.8)

Let us now present some sample proofs of linearity. All proofs are by induction on the number of distinct elements contained in the argument bags.

χ_f is strongly linear:

    χ_f(∅b) = ∅b,
    χ_f({x^m}b) = ∪_{i=1}^{m} χ_f({x}b),
    χ_f(e1 ∪ e2) = {f(x) | x ∈ e1 ∪ e2}b
                 = {f(x) | x ∈ e1}b ∪ {f(x) | x ∈ e2}b
                 = χ_f(e1) ∪ χ_f(e2).
E is strongly left linear:

    ∅b E_q e2 = ∅b,
    {x^m}b E_q e2 = ∪_{i=1}^{m} ({x}b E_q e2),
    (e′1 ∪ e″1) E_q e2 = ((e′1 ∪ e″1) B_q e2) ∪ (((e′1 ∪ e″1) T_q e2) A {⊥_{A(e2)}}b)
                      = (e′1 B_q e2) ∪ (e″1 B_q e2) ∪ ((e′1 T_q e2) A {⊥_{A(e2)}}b) ∪ ((e″1 T_q e2) A {⊥_{A(e2)}}b)
                      = (e′1 E_q e2) ∪ (e″1 E_q e2).

Here, we exploited the left linearity of join and antijoin. Since e1 E_q ∅b = ∅b holds if and only if e1 = ∅b, E is not even weakly right linear.

C is strongly left linear:

    ∅b C e2 = ∅b,
    {x^m}b C e2 = ∪_{i=1}^{m} ({x}b C e2),
    (e′1 ∪ e″1) C e2 = {y ◦ x | y ∈ e′1 ∪ e″1, x ∈ e2(y)}b
                    = {y ◦ x | y ∈ e′1, x ∈ e2(y)}b ∪ {y ◦ x | y ∈ e″1, x ∈ e2(y)}b
                    = (e′1 C e2) ∪ (e″1 C e2).

Note that the notion of linearity cannot be applied to the second (inner) argument of the d-join since, in general, it cannot be evaluated independently of the first argument.

Γ_{G;g:f} is not linear. Consider the following counterexample:

    Γ_{a;g:id}({[a : 1, b : 1], [a : 1, b : 2]}b) = {[a : 1, g : {[a : 1, b : 1], [a : 1, b : 2]}b]}b
        ≠ {[a : 1, g : {[a : 1, b : 1]}b]}b ∪ {[a : 1, g : {[a : 1, b : 2]}b]}b
        = Γ_{a;g:id}({[a : 1, b : 1]}b) ∪ Γ_{a;g:id}({[a : 1, b : 2]}b).

µ_g is strongly linear:

    µ_g(∅b) = ∅b,
    µ_g({x^m}b) = ∪_{i=1}^{m} µ_g({x}b),
    µ_g(e1 ∪ e2) = {x.[ḡ] ◦ y | x ∈ e1 ∪ e2, y ∈ x.g}b
                 = {x.[ḡ] ◦ y | x ∈ e1, y ∈ x.g}b ∪ {x.[ḡ] ◦ y | x ∈ e2, y ∈ x.g}b
                 = µ_g(e1) ∪ µ_g(e2),

where ḡ abbreviates A(x) \ {g}. µ_{a:g} is also strongly linear; this is shown analogously.

flatten is strongly linear:

    flatten(∅b) = ∅b,
    flatten({x^m}b) = ∪_{i=1}^{m} flatten({x}b),
    flatten(e1 ∪ e2) = {x | y ∈ e1 ∪ e2, x ∈ y}b
                     = {x | y ∈ e1, x ∈ y}b ∪ {x | y ∈ e2, x ∈ y}b
                     = flatten(e1) ∪ flatten(e2).

Note that the notion of linearity does not apply to the max operator, since it does not return a bag.

7.4.2 Exploiting Linearity

The concatenation of two weakly (strongly) linear mappings is again a weakly (strongly) linear mapping. Assume f and g to be weakly (strongly) linear mappings. Then

    f(g(∅b)) = ∅b,
    f(g({x^m}b)) = ∪_{i=1}^{m} f(g({x}b)),
    f(g(X ∪ Y)) = f(g(X) ∪ g(Y)) = f(g(X)) ∪ f(g(Y)),

where the second line only applies to strongly linear mappings, and the bags X and Y are disjoint in the case of weakly linear mappings.

From the linearity considerations of the previous subsection, it is easy to derive reorderability laws. Let f : {τ1^f} → {τ2^f} and g : {τ1^g} → {τ2^g} be two strongly linear mappings. If f(g({x}b)) = g(f({x}b)) holds for all singleton bags {x}b, then

    f(g(e)) = g(f(e)).   (7.9)

This is proven by induction on the number of distinct elements contained in the bag e. If e is empty, the statement follows directly from the linearity of f and g. For the induction step, let e = e1 ∪ e2. Then

    f(g(e)) = f(g(e1 ∪ e2))
            = f(g(e1)) ∪ f(g(e2))
            = g(f(e1)) ∪ g(f(e2))     (by the induction hypothesis)
            = g(f(e1 ∪ e2))
            = g(f(e)).   □

For strongly linear algebraic operators working on bags of tuples, we can replace the semantic condition f(g({x}b)) = g(f({x}b)) by a syntactic criterion. The main issue here is to formalize that two operations do not interfere in their consumer/producer/modifier relationships on attributes. Let us first get rid of modifications. Only a few algebraic operators are capable of modifying an attribute's value. One of them is the map operator. By a proper renaming of attributes, we can assume without loss of generality that no operator modifies an existing attribute's value. For the map operator (say χ_{a:e2}(e1)), this means that it only introduces new attributes (thus a ∉ A(e1)).
This renaming step is an essential part of the NFST phase.

This leaves us with checking the consumer/producer relationships. Consider, for example, the algebraic equivalence

    σ_p(e1 B_{q12} e2) ≡ (σ_p(e1)) B_{q12} e2.

It is well-typed if and only if the predicate p does not access any attributes of e2, i.e., F(p) ∩ A(e2) = ∅.

For most of our operators, we can be sure that f(g(e)) = g(f(e)) holds for singleton bags e if and only if the following two conditions hold:

1. g does not access attributes produced by f, and
2. f does not access attributes produced by g.

We can formalize this as follows. We denote by P the set of produced attributes and by D the set of destroyed attributes (e.g., those projected away). Given a unary operator f and an expression e, and a binary operator ◦ and expressions e1 and e2, we define

    P(f) := A(f(e)) \ A(e),
    D(f) := A(e) \ A(f(e)),
    P(◦) := A(e1 ◦ e2) \ (A(e1) ∪ A(e2)),
    D(◦) := (A(e1) ∪ A(e2)) \ A(e1 ◦ e2).

A special case concerns the produced attributes of outerjoins. Since attribute values are assigned when null-padding a tuple without a join partner, we have to add the attributes of the preserved side(s) to the set of produced attributes. Table 7.2 shows the sets of produced and deleted attributes for selected algebraic operators.

    unary operator | produced | deleted     binary operator | produced        | deleted
    Π^D            | ∅        | ∅           ∪               | ∅               | ∅
    Π_A            | ∅        | Ā           ∩               | ∅               | ∅
    Π^D_A          | ∅        | Ā           \               | ∅               | A(e2)
    σ_p            | ∅        | ∅           A               | ∅               | ∅
    χ_{a:e}        | {a}      | ∅           B_q             | ∅               | ∅
    Γ_{θG;F}       | A(F)     | Ḡ           N_q             | ∅               | A(e2)
    ν_{G;g}        | {g}      | Ḡ           T_q             | ∅               | A(e2)
    µ_g            | A(g)     | {g}         E_q             | A(e2)           | ∅
    µ_{a:g}        | {a}      | {g}         K_q             | A(e1) ∪ A(e2)   | ∅
    Υ_{a:f}        | {a}      | ∅           Z_{q;g:F}       | {g}             | ∅

    Table 7.2: Produced and deleted attributes of algebraic operators

Using this notation, the condition f(g({x}b)) = g(f({x}b)) is satisfied for any two of our unary operators f and g if the conditions

    F(f) ∩ P(g) = P(f) ∩ F(g) = ∅,
    F(f) ∩ D(g) = D(f) ∩ F(g) = ∅

hold. This statement is valid because we excluded attribute modifications.
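The syntactic criterion is cheap to implement. A C++ sketch (ours; the attribute sets are supplied by the caller, e.g., from Table 7.2):

    #include <algorithm>
    #include <iostream>
    #include <set>
    #include <string>

    using Attrs = std::set<std::string>;

    bool disjoint(const Attrs& x, const Attrs& y) {
        return std::none_of(x.begin(), x.end(),
                            [&](const std::string& a) { return y.count(a) > 0; });
    }

    // Syntactic reorderability check for two unary operators f and g, given
    // their free (F), produced (P), and deleted (D) attribute sets:
    // f and g may be reordered if neither consumes what the other produces
    // or deletes.
    bool reorderable(const Attrs& Ff, const Attrs& Pf, const Attrs& Df,
                     const Attrs& Fg, const Attrs& Pg, const Attrs& Dg) {
        return disjoint(Ff, Pg) && disjoint(Pf, Fg) &&
               disjoint(Ff, Dg) && disjoint(Df, Fg);
    }

    int main() {
        // sigma_{b=3} vs. chi_{a:b+1}: the selection does not read the produced
        // attribute a, so the two operators may be reordered.
        std::cout << reorderable({"b"}, {}, {}, {"b"}, {"a"}, {}) << "\n";  // 1
        // sigma_{a=3} vs. chi_{a:b+1}: the selection reads a, no reordering.
        std::cout << reorderable({"a"}, {}, {}, {"b"}, {"a"}, {}) << "\n";  // 0
    }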
7.5 Representations

7.5.1 Three Different Representations

In this section, we discuss different representations for sets, bags, and sequences. Let us start with bags. Fig. 7.5 shows three possible representations of a bag R of tuples over the attributes A and B.

    standard:     multiplicity-based:     tid-based:
    A B           A B m                   A B i
    1 1           1 1 2                   1 1 1
    1 1           2 2 1                   1 1 2
    2 2                                   2 2 3

    Figure 7.5: Three possible representations of a bag

The first representation (left) is the usual one; we call it the standard representation. Here, duplicates are represented by duplicating tuples. The order of appearance of the tuples is immaterial: the tuples could be stored in any order and still yield the same bag. The second representation (middle) contains every tuple only once; the multiplicity of a tuple is given explicitly in the special attribute m. Hence, we call it the multiplicity-based representation. Again, the order of tuples is immaterial. The third representation (right) adds a surrogate or tuple identifier i to each tuple. Again, the attribute i is special and not visible to the user. Klausner and Goodman say that i is a hidden attribute [497, 498]. In the example, it is a non-negative integer. We call this representation tid-based. If we use it for bags, the order of tuples is again immaterial; the only property we need is the uniqueness of the TID attribute i.

Observe that although the represented bag contains duplicates, the multiplicity-based and the tid-based representations are duplicate-free, i.e., they are sets. For a bag e in multiplicity-based representation, we even assume that Π̄_m(e) is duplicate-free. This assumption will be relaxed later on. Note that if there are many duplicates, the multiplicity-based representation requires far less storage than the other representations. Otherwise, the storage overhead of the multiplicity-based and tid-based representations is negligible if the original tuples are not too small.

For a set, all three representations are valid, and all of them are duplicate-free. Note that in the multiplicity-based representation, m = 1 holds for all tuples.

Sequences are a little trickier. Here, the order is important. We must thus assume that all representations imply some implicit order. This implicit order can be caused, for example, by a certain storage order, a list representation, or a tuple stream. For the rest of this section, we assume that all representations of bulk types are based on streams of tuples. Thus, they have an implicit order, which in the case of sets or bags might not be relevant. A typical example of a relevant implicit order is a document scan in XQuery, where the nodes in the resulting stream are in document order. Consider now the tid-based representation. We require that i reflects the order of the tuples in the sequence: for any two tuples t1 and t2, t1 occurs before t2 in the sequence if and only if t1.i < t2.i.

Consider the sequence ⟨[a : 1], [a : 2], [a : 1]⟩. Clearly, the multiplicity-based representation ⟨[a : 1, m : 2], [a : 2, m : 1]⟩ loses the order. One way to remedy this situation is to keep not only the multiplicity of an element but its positions. This results in ⟨[a : 1]^{1,3}, [a : 2]^{2}⟩, or in ⟨[a : 1, p : {1, 3}s], [a : 2, p : {2}s]⟩ if we represent the set of positions at which a tuple occurs in an extra attribute p. We call this a position-based representation. It is duplicate-free, and in case of multiple duplicates in the original sequence, it saves some memory.

7.5.2 Conversion between Representations

Assume that a bag e is given in standard representation and that we wish to convert it into the multiplicity-based representation. Then Γ_{A(e);p;m:|p|}(e) does the job. For the other direction, we need a special unnest operator which, for a given attribute m holding non-negative integers, produces m copies of each tuple. Since it resembles an unnest operator, we denote it by µ_m. It is defined as

    µ_m(e) := Π̄_m({t^{t.m} | t ∈ e}b),

where t^{t.m} denotes the tuple t taken with multiplicity t.m. Then, µ_m(e) converts a bag e from the multiplicity-based to the standard representation.

To go from the standard representation of a bag to the tid-based representation, we apply a special tid-operator TID_i, which produces a distinct TID (number) for every input tuple and stores it in the attribute i. Assuming some global variable c initialized with 0, TID_i(e) could be defined as χ_{i:++c}(e), mixing algebra and C++ code. The reverse direction is easily specified by Π̄_i.

Let us now turn to sets. Converting a set e in standard representation into the multiplicity-based representation is performed by χ_{m:1}(e). Converting a set e in multiplicity-based representation into the standard representation simplifies to Π̄_m(e). The other conversions for sets are the same as for bags.

Given a sequence e in standard representation with implicit ordering, we can apply the above TID-operator to convert e into the tid-based representation with TID_i(e).
The reverse conversion is not simply Π̄_i(e), except if we are sure that e is already sorted on i. Otherwise, since sorting on i restores the order, we can convert a sequence e in tid-based representation via Π̄_i(Sort_i(e)), where we again must assume an implicit ordering afterwards.

We can construct a position-based representation of a sequence given in tid-based representation with Γ_{A(e)\{i};p:Π_i}(e). The opposite direction is specified by Sort_i(µ_{i:p}(e)). As an exercise, the reader should design an algorithm which calculates the combination Sort_i ◦ µ_{i:p} efficiently.

7.5.3 Conversion between Bulk Types

Given a set e in some representation, the same representation is a valid bag representation. Hence, this conversion is a no-op. The opposite direction is also simple. Given a bag e in standard representation, Π^D(e) produces a set in standard representation. This is expensive. Given a bag e in tid-based representation, Π^D_{A(e)\{i}}(e) produces a set in standard representation. This is expensive, too. Given a bag e in multiplicity-based representation, Π̄_m(e) produces a set in standard representation. This is cheap.

Going from a sequence to a bag is simple. If the sequence is given in standard representation with implicit order or in tid-based representation, the conversion to a bag is the identity function. If a sequence e is given in position-based representation with the attribute p containing the set of positions, then Π̄_p(χ_{m:|p|}(e)) converts it into a bag in multiplicity-based representation. From there, we can go anywhere. Obviously, going from a bag to a sequence requires explicit sorting.

7.5.4 Adjusting the Algebra

Implicitly, we have defined our algebra on the standard representation. This is not coercive. Consider two bags e1 and e2 in multiplicity-based representation with multiplicity attributes m1 and m2, respectively. We can define a special counting cross product by

    e1 A_{m12:m1∗m2} e2 := Π̄_{m1,m2}({t1 ◦ t2 ◦ [m12 : t1.m1 ∗ t2.m2] | ti ∈ ei}b).

Doing the same exercise for the regular join operator results in the so-called counting join [944]. Luckily, it is not necessary to introduce a special counting cross product, as can be seen from

    e1 A_{m12:m1∗m2} e2 ≡ Π̄_{m1,m2}(χ_{m12:m1∗m2}(e1 A e2)),

which can be generalized to

    Π̄_{{mi}}(χ_{m:∏i mi}(e1 A ... A en))

if we want to take the counting cross product of n bags ei. Similarly, we can handle a cross product of sequences in position-based representation, which is left as an exercise to the reader.

Let us turn to projection. Consider the bag

    {[a : 1, b : 2, m : 3], [a : 2, b : 3, m : 4], [a : 1, b : 4, m : 2]}b

in multiplicity-based representation. Applying a projection on a (retaining the multiplicity attribute m) carelessly results in

    {[a : 1, m : 3], [a : 2, m : 4], [a : 1, m : 2]}b,

which is no longer a multiplicity-based representation, since it is not a set anymore: it contains duplicates. We use this representation as an alternative, fourth representation and call it the multiplicity-based representation with duplicates. First, note that in terms of conversions and algebraic operators like the join, the duplicates do not cause problems; we merely lose some compression of the data. Second, observe that applying Γ_{a;m:sum(m)} fixes the problem, i.e., it turns a bag in multiplicity-based representation with duplicates into the regular, duplicate-free multiplicity-based representation.
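A counting join over the multiplicity-based representation can be prototyped in a few lines. The following C++ sketch (ours; single values stand in for whole tuples) also shows µ_m, the conversion back to the standard representation:

    #include <iostream>
    #include <map>
    #include <vector>

    // Bags in multiplicity-based representation: value -> multiplicity m.
    using MBag = std::map<int, long>;

    // Counting (equi)join on the multiplicity-based representation: matching
    // values join, and the multiplicities are multiplied (m12 = m1 * m2).
    MBag countingJoin(const MBag& e1, const MBag& e2) {
        MBag r;
        for (auto& [v, m1] : e1) {
            auto it = e2.find(v);
            if (it != e2.end()) r[v] = m1 * it->second;
        }
        return r;
    }

    // mu_m: back to standard representation by expanding multiplicities.
    std::vector<int> expand(const MBag& e) {
        std::vector<int> r;
        for (auto& [v, m] : e)
            for (long i = 0; i < m; ++i) r.push_back(v);
        return r;
    }

    int main() {
        MBag e1{{1, 2}, {2, 1}}, e2{{1, 3}};
        for (auto& [v, m] : countingJoin(e1, e2))
            std::cout << v << "^" << m << " ";                         // 1^6
        std::cout << "\n" << expand(countingJoin(e1, e2)).size() << "\n";  // 6
    }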
7.5.5 Partial Preaggregation

To illustrate partial preaggregation (or partial pregrouping), we use a hash-based implementation of the grouping operator as an example. Assume that it has limited buffer space and can keep k groups. If the buffer overflows, some group is ejected from the buffer to make room for a new group. We denote such an implementation of the grouping operator by Γ^{pre(k)} and call it partial preaggregation or partial pregrouping.

We illustrate it by applying Γ^{pre(1)}_{a;g;m:|g|} to the bag

    B := {[a : 1], [a : 1], [a : 2], [a : 1], [a : 1]}b.

The result is

    B_pre := {[a : 1, m : 2], [a : 2, m : 1], [a : 1, m : 2]}b,

where we assume that the implicit order in which the pregrouping operator sees the tuples is from left to right. Applying Γ_{a;m:sum(m)} then yields

    B_m := {[a : 1, m : 4], [a : 2, m : 1]}b,

the regular, duplicate-free multiplicity-based representation. This observation also holds for sets of grouping attributes, as in

    Γ_{G;m:count(∗)}(e) ≡ Γ_{G;m:sum(m′)}(Γ^{pre(k)}_{G;m′:count(∗)}(e))   (7.10)

for any k ≥ 0, where we used the SQL-notation-based variant of grouping. Recall that aggregation functions and vectors can be decomposable. Then it is easy to generalize the above equivalence. Let F be an aggregation vector decomposable into F¹ and F². Then

    Γ_{G;F}(e) ≡ Γ_{G;F²}(Γ^{pre(k)}_{G;F¹}(e))   (7.11)

holds. If a grouping operator is pushed into a join or any other binary operator while some outer grouping remains (see Sec. 7.11), then the inner grouping can be replaced by a pregrouping. General partial pregrouping/preaggregation is discussed in several papers [425, 534]; they also derive the expected number of output tuples of partial pregrouping.
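The example computation can be replayed in code. The following C++ sketch (ours) implements Γ^{pre(1)} with a buffer of a single group and merges the partial groups with a final grouping:

    #include <iostream>
    #include <map>
    #include <vector>

    // Gamma^{pre(1)}_{a; m:|g|}: partial preaggregation with a buffer of k = 1
    // groups. On "overflow" (a new grouping value arrives), the current group
    // is emitted; a final Gamma_{a; m:sum(m)} merges the partial groups.
    std::vector<std::pair<int,int>> preAgg1(const std::vector<int>& e) {
        std::vector<std::pair<int,int>> out;   // (a, partial count m)
        bool have = false; int cur = 0, cnt = 0;
        for (int a : e) {
            if (have && a != cur) { out.push_back({cur, cnt}); cnt = 0; }  // eject
            have = true; cur = a; ++cnt;
        }
        if (have) out.push_back({cur, cnt});
        return out;
    }

    int main() {
        std::vector<int> B{1, 1, 2, 1, 1};
        std::map<int, int> fin;                         // Gamma_{a; m:sum(m)}
        for (auto [a, m] : preAgg1(B)) {
            std::cout << "[a:" << a << ", m:" << m << "] ";  // [1,2] [2,1] [1,2]
            fin[a] += m;
        }
        std::cout << "\n";
        for (auto [a, m] : fin) std::cout << "[a:" << a << ", m:" << m << "] ";
        std::cout << "\n";                              // [1,4] [2,1]
    }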
As can be seen from the definition of the grouping operator, it is a generalization of duplicate elimination. If we apply a grouping with an empty aggregation vector, it is equivalent to a duplicate elimination. In other words,

Π^D_A(e) ≡ Γ_{A;()}(e)    (7.12)

holds for any set of attributes A with A ⊆ A(e). As a consequence, we have to ask ourselves whether there exists a property generalizing idempotency that holds for grouping. Indeed, there is one. Let F be an aggregation vector decomposable into F^1 and F^2, and let G and G^+ be two sets of grouping attributes with G ⊆ G^+. Then

Γ_{G;F}(e) ≡ Γ_{G;F^2}(Γ_{G^+;F^1}(e))    (7.13)

holds, since we can first group at a finer granularity and then combine the finer groups into the groups derived from grouping by G. We can go a step further in the presence of functional dependencies. Assume the functional dependency G → G′ holds for two sets of grouping attributes G and G′. Then the equivalence

Γ_{G;F}(e) ≡ Π_{G∪A(F)}(Γ_{G∪G′;F}(e))    (7.14)

holds, since the groups and their contents are the same in both cases. This equivalence can also be found under the name simplify group-by in a paper by Tsois and Sellis [882]. A slightly more general version for an arbitrary function f also holds:

Γ_{G;g:f}(e) ≡ Π_{G∪{g}}(Γ_{G∪G′;g:f}(e)).    (7.15)

Eqv. 7.14 can be simplified if, in addition to G → G′, G ⊆ G′ holds:

Γ_{G;F}(e) ≡ Π_{G∪A(F)}(Γ_{G′;F}(e)).    (7.16)

Table 7.3: Reorderability of unary operators

While introducing linearity, we have already discussed the usefulness of linearity for reordering unary operators. Table 7.3 shows a '+' if f(g(x)) ≡ g(f(x)) holds for two unary operators f and g; if this does not hold, the according entry contains a '-'. Thereby, we have to take care that the consumer/producer relationship is not disturbed. For example, χ_{a:e} and σ_p can only be reordered if a ∉ F(p). Since this should be clear by now, we will not always mention it explicitly anymore. The cases marked by '(-)' involve the unnest or unnest map operator, and they need an additional condition to be reorderable. If we want to reorder µ_g with a duplicate-eliminating projection on some input e, we must require that t.g is duplicate-free for all t ∈ e. For Υ_f, we must require that f(t) is duplicate-free for all t ∈ e.

In general, reordering a selection with a grouping Γ_{θG;g:f} is wrong. If, however, all comparison operators are based on equality, we can reorder it with a selection σ_p as in

σ_p(Γ_{G;g:f}(e)) ≡ Γ_{G;g:f}(σ_p(e)).    (7.17)

Of course, this requires that g ∉ F(p) and (F(p) ∩ A(e)) ⊆ G.

A real annoyance is the fact that we cannot reorder a map with a grouping operator. But this situation can be remedied; then, the map as well as the selection become reorderable with all operators needed in the context of SQL. Consider the expression χ_{a:e2}(Γ_{G;g;F}(e1)). It is valid only if F(e2) ⊆ G ∪ A(F). If F(e2) ∩ A(F) ≠ ∅, there is no hope of changing the order of the map and grouping operators. Thus, assume that F(e2) ⊆ G. The expression Γ_{G;g;F}(χ_{a:e2}(e1)) does not contain attribute a in its result. However, since F(e2) ⊆ G, we observe that G → a if e2 contains only deterministic functions, which we assume. Thus, we have

χ_{a:e2}(Γ_{G;g;F}(e1)) ≡ Γ_{G∪{a};g;F}(χ_{a:e2}(e1))    (7.18)

and

χ_{a:e2}(Γ_{G;F}(e1)) ≡ Γ_{G∪{a};F}(χ_{a:e2}(e1))    (7.19)

if F(e2) ⊆ G.
Whereas the expression e2 is evaluated once per group on the left-hand side, it is evaluated once per item in e1 on the right-hand side. This is unfortunate if we want to apply the equation from left to right. Remember that A(g) = A(e1) in Γ_{G;g;F}(e1). Thus, A(e2) ⊆ G ⊆ A(g), and we can add the calculation of e2 to F. For this, we need the function pick, which picks an arbitrary element out of a bag. This is deterministic, since all tuples in a group g have the same values for all attributes contained in G. Then, we have

χ_{a:e2}(Γ_{G;g;F}(e1)) ≡ Γ_{G;g;F◦(a:e2(pick(g)))}(e1)    (7.20)

if F(e2) ⊆ G. In our SQL-notation variant of Γ, this reads

χ_{a:e2}(Γ_{G;F}(e1)) ≡ Γ_{G;g;F◦(a:e2(pick(g)))}(e1)    (7.21)

if F(e2) ⊆ G.
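The following minimal Python sketch illustrates the pick-based rewrite of Eqvs. 7.20/7.21: the per-group expression e2 is evaluated once per group on a representative tuple. The helper names (gamma, pick) are ours, not the book's.

```python
from itertools import groupby
from operator import itemgetter

def gamma(tuples, G, aggs):
    """Group by attributes G; aggs maps a result attribute to f(group)."""
    rows = sorted(tuples, key=itemgetter(*G))
    out = []
    for key, grp in groupby(rows, key=itemgetter(*G)):
        g = list(grp)                          # the group as a bag
        key = (key,) if len(G) == 1 else key
        t = dict(zip(G, key))
        t.update({b: f(g) for b, f in aggs.items()})
        out.append(t)
    return out

e1 = [{"G": 1, "x": 10}, {"G": 1, "x": 20}, {"G": 2, "x": 5}]
pick = lambda g: g[0]      # deterministic on the G-attributes of a group
res = gamma(e1, ["G"],
            {"s": lambda g: sum(t["x"] for t in g),        # the original F
             "a": lambda g: pick(g)["G"] * 100})           # e2 on pick(g)
print(res)   # [{'G': 1, 's': 30, 'a': 100}, {'G': 2, 's': 5, 'a': 200}]
```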
7.7.2 Push-Down/Pull-Up of Unary into/from Binary Operators

In this section, we consider pushing down (pulling up) unary operators into (from) the arguments of binary operators. Thus, we are interested in equivalences of the form f(e1 ◦ e2) ≡ f(e1) ◦ e2 and f(e1 ◦ e2) ≡ e1 ◦ f(e2). First, let us see how linearity helps in this context. Let f be a unary, strongly linear mapping and ◦ a binary mapping that is strongly linear in its left argument. If for all expressions e1 and e2 and for all xi ∈ ei we have f({x1}_b ◦ {x2}_b) = f({x1}_b) ◦ {x2}_b, then f(e1 ◦ e2) ≡ f(e1) ◦ e2. If e1 is empty, f(e1 ◦ e2) and f(e1) ◦ e2 are also empty. If e1 is a singleton bag, the claim follows from the prerequisite. For the induction step with e1 = e1′ ∪ e1″, we observe that

f(e1 ◦ e2) = f((e1′ ◦ e2) ∪ (e1″ ◦ e2))
           = f(e1′ ◦ e2) ∪ f(e1″ ◦ e2)
           =_{I.H.} (f(e1′) ◦ e2) ∪ (f(e1″) ◦ e2)
           = f(e1) ◦ e2.

Our prerequisite required f({x1}_b ◦ {x2}_b) = f({x1}_b) ◦ {x2}_b. If instead the stronger f({x1^m}_b ◦ {x2^n}_b) = f({x1^m}_b) ◦ {x2^n}_b holds and ◦ is weakly left linear, then it still suffices to push f down into the left argument of ◦. This follows from the above proof together with the additional condition e1′ ∩ e1″ = ∅; the induction is then on the number of distinct elements in e1.

Table 7.4: Left/right push-down

Table 7.4 summarizes the validity of pushing a unary operator down into the left or right argument of a binary operator. Again, some restrictions apply. First, we restrict ourselves to the map operator in its extending form χ_{a:e}. Other critical cases are marked by ◦; they include duplicate elimination and outerjoins.

We open our discussion with duplicate elimination. Since duplicate elimination is weakly but not strongly linear, it is not surprising that we need additional conditions to push it down a binary operator. We have

Π^D(e1 ∪ e2)      ≡ Π^D(e1) ∪ e2        if dupfree(e2) ∧ (e1 ∩ e2) = ∅_b,
Π^D(e1 A e2)      ≡ Π^D(e1) A e2        if dupfree(e2),
Π^D(e1 B_{q12} e2) ≡ Π^D(e1) B_{q12} e2  if dupfree(e2),
Π^D(e1 C e2)      ≡ Π^D(e1) C e2        if ∀ t1 ∈ e1: dupfree(e2(t1)),
Π^D(e1 E_{q12} e2) ≡ Π^D(e1) E_{q12} e2  if dupfree(e2),
Π^D(e1 K_{q12} e2) ≡ Π^D(e1) K_{q12} e2  if dupfree(e2),

where dupfree(e) denotes the fact that e is duplicate-free.

Let us now take a closer look at the case where we try to push a map operator down into the right-hand side of a left outerjoin. Consider the expression χ_{a2:f2}(e1 E_{q12} e2), where F(f2) ∩ A(e1) = ∅. The question is to which value f2(⊥_{A(e2)}) evaluates. If it evaluates to null, then we do not have any problem, since outerjoins append nulls. If it does not, the value for a2 will differ in e1 E_{q12} χ_{a2:f2}(e2). We thus say that an expression or function f rejects null values on a set of attributes A if f(⊥_A) = null. With conditions attached, the equivalences read as follows:

χ_{a2:f2}(e1 E_{q12} e2) ≡ e1 E_{q12} χ_{a2:f2}(e2)  if f2 rejects null values on A(e2),
χ_{a1:f1}(e1 K_{q12} e2) ≡ χ_{a1:f1}(e1) K_{q12} e2  if f1 rejects null values on A(e1),
χ_{a2:f2}(e1 K_{q12} e2) ≡ e1 K_{q12} χ_{a2:f2}(e2)  if f2 rejects null values on A(e2).

The reorderability properties of the grouping operator and its special case, the nest operator, are of some concern because they are not linear. However, reordering is not hopeless. Let us consider the semijoin first. Let p be a selection predicate and G a set of grouping attributes. If for some expression e the condition F(p) ∩ A(e) ⊆ G holds, we know that we can exchange the order of grouping and selection: Γ_{G;F}(σ_p(e)) ≡ σ_p(Γ_{G;F}(e)). From the definition of the semijoin and the above equivalence with p = (σ_q(e2) ≠ ∅), it follows that

Γ_{G;F}(e1 N_q e2) ≡ Γ_{G;F}(σ_{σ_q(e2)≠∅}(e1)) ≡ σ_{σ_q(e2)≠∅}(Γ_{G;F}(e1)) ≡ Γ_{G;F}(e1) N_q e2    (7.22)

if (F(q) ∩ A(e1)) ⊆ G. Analogously, we can derive

Γ_{G;F}(e1 T_q e2) ≡ Γ_{G;F}(e1) T_q e2    (7.23)

if (F(q) ∩ A(e1)) ⊆ G. Pushing a grouping down into one of the arguments of a join, outerjoin, d-join, or groupjoin is a little more complex. Thus, we devote a whole section to this problem (see Sec. 7.11).

Since the bag operators ∪, ∩, and \ require the same schema for both arguments, it is quite natural to ask for a simultaneous push-down into both arguments. Thus, we are interested in equivalences of the form f(e1 ◦ e2) ≡ f(e1) ◦ f(e2) for the bag operators above. Table 7.5 summarizes the valid instances of this equivalence pattern.

Table 7.5: Simultaneous push-down

Again, we must restrict the map operator to its extending form χ_{a:e}, and again problems occur for duplicate elimination and for grouping as well as its special case nest. Consider duplicate elimination first. Evaluating the expression Π^D(e1 ∪ e2) will never result in any duplicates. Π^D(e1) ∪ Π^D(e2), however, might contain duplicates: e.g., if e1 = e2 = {[a:1]}_b, the result is {[a:1]^2}_b. Since this is the only problem, we immediately conclude that

Π^D(e1 ∪ e2) ≡ Π^D(Π^D(e1) ∪ Π^D(e2)),    (7.24)
Π^D_A(e1 ∪ e2) ≡ Π^D_A(Π^D_A(e1) ∪ Π^D_A(e2)).    (7.25)

We now turn our attention to the grouping operator. Let e1 and e2 be two expressions with A(e1) = A(e2). Further, let G ⊆ A(e1) be a set of grouping attributes and F an aggregation vector. If (Π_G(e1) ∩ Π_G(e2)) = ∅, then

Γ_{G;F}(e1 ∪ e2) ≡ Γ_{G;F}(e1) ∪ Γ_{G;F}(e2).    (7.26)

If (Π_G(e1) ∩ Π_G(e2)) ≠ ∅, and F is decomposable into F^1 and F^2, then

Γ_{G;F}(e1 ∪ e2) ≡ Γ_{G;F^2}(Γ_{G;F^1}(e1) ∪ Γ_{G;F^1}(e2)).    (7.27)

Of course, this equivalence also holds if (Π_G(e1) ∩ Π_G(e2)) = ∅. The cases of pushing grouping down an intersection or difference are discussed in Sec. 7.11.8.

7.7.3 Binary Operators

Reordering binary operators is the core operation of any plan generator. We have already devoted a whole chapter to the problem of finding the optimal join order (Chap. 3).
The search space for join ordering is huge, since the join is commutative and associative. Thus, there was no restriction on the valid join trees besides the syntactic constraints resulting from the consumer/producer relationship. In this section, we investigate the commutativity and associativity of our binary operators.

Commutativity is the easiest. It is obvious that ∪, ∩, A, B, and K are commutative, while the other binary operators are not. Let us denote the fact that a binary operator ◦ is commutative by comm(◦).

In traditional mathematics, a binary operator ◦ is called associative if (a ◦ b) ◦ c = a ◦ (b ◦ c). Since we have to reorder many different operators, which possibly carry subscripts, we consider equivalences of the form

(e1 ◦^a_{12} e2) ◦^b_{23} e3 ≡ e1 ◦^a_{12} (e2 ◦^b_{23} e3)

for not necessarily distinct operators ◦^a and ◦^b. The subscripts in this equivalence have the following meaning. For operators not carrying a predicate or other expressions, they are immaterial and can be ignored. If an operator has an expression e as a subscript, then ij (for 1 ≤ i, j ≤ 3, i ≠ j) indicates that F(e) ∩ A(e_k) = ∅ for the k with 1 ≤ k ≤ 3 and k ∉ {i, j}. This ensures that the equivalence is correctly typed on both sides of the equivalence sign. If for two operators ◦^a and ◦^b the above equivalence holds, then we denote this by assoc(◦^a, ◦^b). As we will see, assoc is not symmetric. Thus, we have to be very careful about the order of the operators, which is tied to the syntactic pattern of the equivalence above. In order not to make a mistake, one has to remember two things. First, the operators appear in assoc in the same order as on the left-hand side of the equivalence. Second, the equivalence has left association on its left-hand side and, consequently, right association on its right-hand side. If both operators are commutative, then the assoc property is symmetric, i.e.,

assoc(◦^a, ◦^b), comm(◦^a), comm(◦^b) ≻ assoc(◦^b, ◦^a),
assoc(◦^b, ◦^a), comm(◦^a), comm(◦^b) ≻ assoc(◦^a, ◦^b),

as can be seen from

(e1 ◦^a_{12} e2) ◦^b_{23} e3 ≡ e1 ◦^a_{12} (e2 ◦^b_{23} e3)    assoc(◦^a, ◦^b)
                         ≡ (e2 ◦^b_{23} e3) ◦^a_{12} e1    comm(◦^a)
                         ≡ (e3 ◦^b_{23} e2) ◦^a_{12} e1    comm(◦^b)
                         ≡ e3 ◦^b_{23} (e2 ◦^a_{12} e1)    assoc(◦^b, ◦^a)
                         ≡ (e2 ◦^a_{12} e1) ◦^b_{23} e3    comm(◦^b)
                         ≡ (e1 ◦^a_{12} e2) ◦^b_{23} e3    comm(◦^a).

Assume we wish to prove associativity for two binary operators ◦^a and ◦^b, where ◦^a is strongly right linear and ◦^b is strongly left linear. Further assume that for all elements t1 ∈ e1, t2 ∈ e2, and t3 ∈ e3

{t1}_b ◦^a_{12} ({t2}_b ◦^b_{23} {t3}_b) = ({t1}_b ◦^a_{12} {t2}_b) ◦^b_{23} {t3}_b

holds, where the subscript ij in ◦_{ij} indicates that any subscript of ◦_{ij} does not access attributes from e_k if k ≠ i and k ≠ j. Then, we can easily prove associativity by induction on the number of elements in the bag e2. If e2 is empty, then (e1 ◦^a_{12} e2) ◦^b_{23} e3 and e1 ◦^a_{12} (e2 ◦^b_{23} e3) are also empty. For singleton bags, we apply the prerequisite above, and for e2 = e2′ ∪ e2″,

(e1 ◦^a_{12} e2) ◦^b_{23} e3 ≡ (e1 ◦^a_{12} (e2′ ∪ e2″)) ◦^b_{23} e3
                         ≡ ((e1 ◦^a_{12} e2′) ∪ (e1 ◦^a_{12} e2″)) ◦^b_{23} e3
                         ≡ ((e1 ◦^a_{12} e2′) ◦^b_{23} e3) ∪ ((e1 ◦^a_{12} e2″) ◦^b_{23} e3)
                         ≡_{I.H.} (e1 ◦^a_{12} (e2′ ◦^b_{23} e3)) ∪ (e1 ◦^a_{12} (e2″ ◦^b_{23} e3))
                         ≡ e1 ◦^a_{12} ((e2′ ◦^b_{23} e3) ∪ (e2″ ◦^b_{23} e3))
                         ≡ e1 ◦^a_{12} (e2 ◦^b_{23} e3)

provides the induction step.
Table 7.6 summarizes the associativities that hold. Be careful to determine assoc(◦^a, ◦^b) from this table by looking up the row with ◦^a and the column with ◦^b.

Table 7.6: The assoc-property for binary operators

Almost all '+' entries' proofs benefit from the strong linearity of both operators. Some of the exceptions benefit from the fact that semi- and antijoin can be expressed as selections, and we already know how to push down/pull up selections. The final set of exceptions deals with outerjoins; here, we also find most of the asymmetries. Since the reader might not be familiar with the d-join, reordering the d-join is discussed in Sec. 7.9. Since reordering outerjoins is complicated, the discussion is deferred to Sec. 7.10.

Now imagine that the operators ◦^a and ◦^b access other attributes than in the associativity pattern above. For example, let ◦^b possibly access e1 and e3 but not e2. Then, the associativity pattern becomes

(e1 ◦^a_{12} e2) ◦^b_{13} e3 ≡ e1 ◦^a_{12} (e2 ◦^b_{13} e3).

Obviously, the right-hand side is ill-typed. However, we can rewrite the pattern to

(e1 ◦^a_{12} e2) ◦^b_{13} e3 ≡ (e1 ◦^b_{13} e3) ◦^a_{12} e2

because then both sides are well-typed. Let us call instances of this pattern the left asscom property and denote by l-asscom(◦^a, ◦^b) the fact that the according equivalence holds. Analogously, we can define a right asscom property (r-asscom):

e1 ◦^a_{13} (e2 ◦^b_{23} e3) ≡ e2 ◦^b_{23} (e1 ◦^a_{13} e3).

First note that l-asscom and r-asscom are symmetric properties, i.e.,

l-asscom(◦^a, ◦^b) ≺≻ l-asscom(◦^b, ◦^a),
r-asscom(◦^a, ◦^b) ≺≻ r-asscom(◦^b, ◦^a).

Then, the calculation

(e1 ◦^a_{12} e2) ◦^b_{23} e3 ≡ (e2 ◦^a_{12} e1) ◦^b_{23} e3    if comm(◦^a_{12})
                         ≡ (e2 ◦^b_{23} e3) ◦^a_{12} e1    if l-asscom(◦^a_{12}, ◦^b_{23})
                         ≡ e1 ◦^a_{12} (e2 ◦^b_{23} e3)    if comm(◦^a_{12})
                         ≡ (e1 ◦^a_{12} e2) ◦^b_{23} e3    if assoc(◦^a_{12}, ◦^b_{23})

implies that

comm(◦^a_{12}), assoc(◦^a_{12}, ◦^b_{23}) ≻ l-asscom(◦^a_{12}, ◦^b_{23}),
comm(◦^a_{12}), l-asscom(◦^a_{12}, ◦^b_{23}) ≻ assoc(◦^a_{12}, ◦^b_{23}).

Thus, the l-asscom property is implied by associativity and commutativity, which explains its name. We leave it to the reader to show that

comm(◦^b_{23}), assoc(◦^a_{12}, ◦^b_{23}) ≻ r-asscom(◦^a_{12}, ◦^b_{23}),
comm(◦^b_{23}), r-asscom(◦^a_{12}, ◦^b_{23}) ≻ assoc(◦^a_{12}, ◦^b_{23}).

The important question is whether there are instances of l-/r-asscom that do not follow from the commutativity and associativity properties. The answer is yes, as the following investigation shows. Assume e ◦^a e′ can be expressed as a selection σ_{p(◦^a,e′)}(e). Then

(e1 ◦^a_{12} e2) ◦^b_{13} e3 ≡ σ_{p(◦^a_{12},e2)}(e1) ◦^b_{13} e3     by assumption
                         ≡ σ_{p(◦^a_{12},e2)}(e1 ◦^b_{13} e3)     if l-pushable(σ, ◦^b_{13})
                         ≡ (e1 ◦^b_{13} e3) ◦^a_{12} e2           by assumption.

Thus, isLikeSelection(◦^a), l-pushable(◦^a, ◦^b) ≻ l-asscom(◦^a, ◦^b). We leave the symmetric case for r-asscom to the reader. Another important exception is l-asscom(E, Z), which follows from the fact that both operators are strongly left linear. Assume that

({t1}_b ◦^a_{12} {t2}_b) ◦^b_{13} {t3}_b ≡ ({t1}_b ◦^b_{13} {t3}_b) ◦^a_{12} {t2}_b

holds for all ti and that ◦^a and ◦^b are strongly left linear. Then l-asscom(◦^a, ◦^b) holds. The proof is by induction on the number of elements contained in e1.
First observe that if e1 is empty, then (e1 ◦^a_{12} e2) ◦^b_{13} e3 and (e1 ◦^b_{13} e3) ◦^a_{12} e2 are also empty. Let e1′ and e1″ be two bags such that e1 = e1′ ∪ e1″. The induction step looks like this:

(e1 ◦^a_{12} e2) ◦^b_{13} e3 ≡ ((e1′ ∪ e1″) ◦^a_{12} e2) ◦^b_{13} e3
                         ≡ ((e1′ ◦^a_{12} e2) ∪ (e1″ ◦^a_{12} e2)) ◦^b_{13} e3
                         ≡ ((e1′ ◦^a_{12} e2) ◦^b_{13} e3) ∪ ((e1″ ◦^a_{12} e2) ◦^b_{13} e3)
                         ≡_{I.H.} ((e1′ ◦^b_{13} e3) ◦^a_{12} e2) ∪ ((e1″ ◦^b_{13} e3) ◦^a_{12} e2)
                         ≡ ((e1′ ◦^b_{13} e3) ∪ (e1″ ◦^b_{13} e3)) ◦^a_{12} e2
                         ≡ ((e1′ ∪ e1″) ◦^b_{13} e3) ◦^a_{12} e2
                         ≡ (e1 ◦^b_{13} e3) ◦^a_{12} e2.

Table 7.7 summarizes the l-/r-asscom properties for all pairs of operators. Most of the entries follow from the properties mentioned above. Some equivalences for the d-join and the groupjoin, especially in conjunction with outerjoins, need dedicated proofs. This is a good sign, since, thanks to l-/r-asscom, reorderings become possible that were not possible with commutativity and associativity alone.

Table 7.7: The l-/r-asscom property for binary operators

Distributivity laws play a minor role in query compilers, but they are very useful to prove equivalences. We consider left and right distributivity (l-/r-dist):

e1 ◦^b (e2 ◦^a e3) ≡ (e1 ◦^b e2) ◦^a (e1 ◦^b e3)    l-dist,
(e1 ◦^a e2) ◦^b e3 ≡ (e1 ◦^b e3) ◦^a (e2 ◦^b e3)    r-dist.

With these definitions, it is easy to show that

comm(◦^b), r-dist(◦^a, ◦^b) ≻ l-dist(◦^a, ◦^b),
comm(◦^b), l-dist(◦^a, ◦^b) ≻ r-dist(◦^a, ◦^b).

Table 7.8 summarizes the distributivity laws for ◦^a ∈ {∪, ∩, \}.

Table 7.8: The l-/r-dist property for binary operators
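Properties like those collected in Tables 7.6 and 7.7 can be validated on small concrete inputs. The following Python sketch is a brute-force checker on example bags (lists of dicts); all helper names are ours, and this is an illustration, not the book's code.

```python
NULL = None

def join(e1, e2, q):
    return [t1 | t2 for t1 in e1 for t2 in e2 if q(t1 | t2)]

def louter(e1, e2, q, a2):            # left outerjoin; a2 = attributes of e2
    out = []
    for t1 in e1:
        matches = [t1 | t2 for t2 in e2 if q(t1 | t2)]
        out.extend(matches or [t1 | {a: NULL for a in a2}])
    return out

def same_bag(x, y):
    norm = lambda e: sorted((sorted(t.items(), key=str) for t in e), key=str)
    return norm(x) == norm(y)

qab = lambda t: t.get("a") == t.get("b")
qbc = lambda t: t.get("b") == t.get("c")

# assoc(B, B) holds on this instance ...
S1, S2, S3 = [{"a": 1}], [{"b": 1}], [{"c": 1}]
print(same_bag(join(join(S1, S2, qab), S3, qbc),
               join(S1, join(S2, S3, qbc), qab)))              # True

# ... while assoc(E, B) fails already for empty e2 and e3:
R1, R2, R3 = [{"a": 1}], [], []
print(same_bag(join(louter(R1, R2, qab, ["b"]), R3, qbc),
               louter(R1, join(R2, R3, qbc), qab, ["b", "c"])))  # False
```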
7.8 Predicate Detachment and Attachment

In most cases, an operator with a conjunctive selection predicate allows us to move a part of it to a newly introduced selection operator. We call this process predicate detachment. Predicate attachment denotes the opposite rewrite. Let q and qi be join predicates and p and pi be selection predicates. Further, we require that F(pi) ∩ A(e_{3-i}) = ∅ for i = 1, 2. Then, the following equivalences hold:

σ_{q∧p}(e) ≡ σ_q(σ_p(e)),    (7.28)
e1 B_{q∧p1} e2 ≡ σ_{p1}(e1) B_q e2,    (7.29)
e1 B_{q∧p2} e2 ≡ e1 B_q σ_{p2}(e2),    (7.30)
e1 N_{q∧p1} e2 ≡ σ_{p1}(e1) N_q e2,    (7.31)
e1 N_{q∧p2} e2 ≡ e1 N_q σ_{p2}(e2),    (7.32)
e1 T_{q∧p2} e2 ≡ e1 T_q σ_{p2}(e2),    (7.33)
e1 E_{q∧p2} e2 ≡ e1 E_q σ_{p2}(e2),    (7.34)
e1 Z_{q∧p2;g:f} e2 ≡ e1 Z_{q;g:f} σ_{p2}(e2).    (7.35)

There is no possibility to move a part of a conjunctive predicate that only accesses attributes from one side into or out of a full outerjoin. In case of a disjunction, we have a nice equivalence for the antijoin:

e1 T_{q1∨q2} e2 ≡ (e1 T_{q1} e2) T_{q2} e2,    (7.36)
e1 T_{q∨p1} e2 ≡ σ_{¬p1}(e1) T_q e2,    (7.37)

where the latter equivalence holds if e2 ≠ ∅.

Assume that the whole predicate of a binary operator references only attributes from its left or its right argument. Then, some simplifications/rewrites are possible:

e1 B_{p1} e2 ≡ σ_{p1}(e1) A e2,    (7.38)
e1 N_{p1} e2 ≡ σ_{p1}(e1) N_{true} e2 ≡ σ_{e2≠∅}(σ_{p1}(e1)),    (7.39)
e1 N_{p2} e2 ≡ e1 N_{true} σ_{p2}(e2) ≡ σ_{σ_{p2}(e2)≠∅}(e1),    (7.40)
e1 T_{p1} e2 ≡ σ_{(e2=∅)∨(¬p1)}(e1),    (7.41)
e1 T_{p2} e2 ≡ e1 T_{true} σ_{p2}(e2) ≡ σ_{σ_{p2}(e2)=∅}(e1),    (7.42)
e1 E_{p1} e2 ≡ (σ_{p1}(e1) A e2) ∪ ((σ_{(e2=∅)∨(¬p1)}(e1)) A {⊥_{A(e2)}}),    (7.43)
e1 E_{p2} e2 ≡ e1 E_{true} σ_{p2}(e2)    (7.44)
            ≡ (e1 A σ_{p2}(e2)) ∪ ((σ_{σ_{p2}(e2)=∅}(e1)) A {⊥_{A(e2)}}),    (7.45)
e1 K_{p1} e2 ≡ (σ_{p1}(e1) A e2) ∪ (σ_{(e2=∅)∨(¬p1)}(e1) A {⊥_{A(e2)}}) ∪ ((σ_{σ_{p1}(e1)=∅}(e2)) A {⊥_{A(e1)}}),    (7.46)
e1 Z_{p2;g:f} e2 ≡ e1 Z_{true;g:f◦σ_{p2}} e2    (7.47)
               ≡ χ_{g:f(σ_{p2}(e2))}(e1)    (7.48)
               ≡ e1 A {[g : f(σ_{p2}(e2))]},    (7.49)

where we left out symmetric cases, which are possible due to commutativity.

Let us consider the semijoin. If F(p1) ∩ A(e2) = ∅, then

e1 N_{p1} e2 = σ_{p1}(e1)   if e2 ≠ ∅,
e1 N_{p1} e2 = ∅            if e2 = ∅,

which can be summarized to σ_{e2≠∅}(σ_{p1}(e1)). If F(p2) ∩ A(e1) = ∅, then

e1 N_{p2} e2 = e1   if σ_{p2}(e2) ≠ ∅,
e1 N_{p2} e2 = ∅    if σ_{p2}(e2) = ∅,

which can be summarized to σ_{σ_{p2}(e2)≠∅}(e1). Let us consider the antijoin. If F(p1) ∩ A(e2) = ∅, then

e1 T_{p1} e2 = e1            if e2 = ∅,
e1 T_{p1} e2 = σ_{¬p1}(e1)   if e2 ≠ ∅,

which can be summarized to σ_{(e2=∅)∨(¬p1)}(e1). If F(p2) ∩ A(e1) = ∅, then

e1 T_{p2} e2 = e1   if σ_{p2}(e2) = ∅,
e1 T_{p2} e2 = ∅    if σ_{p2}(e2) ≠ ∅,

which can be summarized to σ_{σ_{p2}(e2)=∅}(e1).

For the semi- and the antijoin, we have the expression e2 in the subscript of some operator on the left-hand side of the equivalences. This could mean nested-loop evaluation. However, since the evaluation of e2 is independent of e1, we can easily apply unnesting techniques and evaluate e2 only once, and only as far as necessary to evaluate the expressions for the semi- and antijoin.

To consider the different cases for the left and full outerjoin, it is convenient to define E_{⊥i} = {⊥_{A(ei)}} for i = 1, 2 and given expressions ei. If F(p1) ∩ A(e2) = ∅, we can reason as follows for the left outerjoin:

e1 E_{p1} e2 ≡ (e1 B_{p1} e2) ∪ ((e1 T_{p1} e2) A E_{⊥2})
            ≡ (σ_{p1}(e1) A e2) ∪ ((σ_{(e2=∅)∨(¬p1)}(e1)) A E_{⊥2}).

If F(p2) ∩ A(e1) = ∅, we have

e1 E_{p2} e2 ≡ (e1 B_{p2} e2) ∪ ((e1 T_{p2} e2) A E_{⊥2})
            ≡ (e1 A σ_{p2}(e2)) ∪ ((e1 T_{true} σ_{p2}(e2)) A E_{⊥2})
            ≡ e1 E_{true} σ_{p2}(e2)

or, alternatively,

e1 E_{p2} e2 ≡ (e1 B_{p2} e2) ∪ ((e1 T_{p2} e2) A E_{⊥2})
            ≡ (e1 A σ_{p2}(e2)) ∪ ((σ_{σ_{p2}(e2)=∅}(e1)) A E_{⊥2}).

Next, we consider the full outerjoin. Assume F(p1) ∩ A(e2) = ∅. Then

e1 K_{p1} e2 ≡ (e1 B_{p1} e2) ∪ ((e1 T_{p1} e2) A E_{⊥2}) ∪ ((e2 T_{p1} e1) A E_{⊥1})
            ≡ (σ_{p1}(e1) A e2) ∪ (σ_{(e2=∅)∨(¬p1)}(e1) A E_{⊥2}) ∪ ((σ_{σ_{p1}(e1)=∅}(e2)) A E_{⊥1}).

Finally, let us consider the groupjoin. If F(q) ∩ A(e2) = ∅, there is not much we can do. If F(q) ∩ A(e1) = ∅, the expression can be considerably simplified. This is due to the fact that now every item in e1 has the same members of e2 in its group. In other words, in the result of e1 Z_{true;g:id} e2, all tuples have the same value for g.
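The following tiny Python sketch illustrates Eqv. 7.40: if the semijoin predicate accesses only e2, the semijoin reduces to a selection on e1 whose condition, σ_{p2}(e2) ≠ ∅, is evaluated once rather than per tuple of e1. Names are illustrative.

```python
def semijoin(e1, e2, q):
    """Naive nested-loop semijoin over bags of dict-tuples."""
    return [t1 for t1 in e1 if any(q(t1, t2) for t2 in e2)]

e1 = [{"a": 1}, {"a": 2}]
e2 = [{"b": 7}, {"b": 9}]
p2 = lambda t2: t2["b"] > 8                      # accesses only e2

lhs = semijoin(e1, e2, lambda t1, t2: p2(t2))    # e1 semijoin_{p2} e2
rhs = e1 if any(p2(t2) for t2 in e2) else []     # sigma_{sigma_p2(e2) != {}}(e1)
print(lhs == rhs)                                # True
```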
7.9 Basic Equivalences for the D-Join

From the definition of the d-join, it follows that

e1 C e2 ≡ ∪_{t1∈e1} ({t1}_b A e2(t1)).    (7.50)

Assume e2 depends on some attributes provided by a tuple t1. Then

e2(t1) C e3 ≡ (e2 C e3)(t1).    (7.51)

That is, it is immaterial where we feed in the bindings contained in t1. Of course, this only holds as long as F(e3) ∩ A(t1) = ∅. Let us prove that C is associative:

(e1 C e2) C e3 ≡ (∪_{t1∈e1}({t1}_b A e2(t1))) C e3
             ≡ ∪_{t1∈e1}(({t1}_b A e2(t1)) C e3)
             ≡ ∪_{t1∈e1}({t1}_b A (e2(t1) C e3))
             ≡ ∪_{t1∈e1}({t1}_b A (e2 C e3)(t1))
             ≡ e1 C (e2 C e3).

In a sense, the d-join is strongly right linear. If e2(t1) = ∅ for all t1 ∈ e1, then e1 C e2 = ∅. Assume e2 has the form e2′ ∪ e2″. Then

e1 C e2 = e1 C (e2′ ∪ e2″)
        = e1 B_{J=J′} (ê2′ ∪ ê2″)
        = (e1 B_{J=J′} ê2′) ∪ (e1 B_{J=J′} ê2″)
        = (e1 C e2′) ∪ (e1 C e2″)

for J = A(e1) ∩ F(e2). The d-join and the unnest operator are closely related:

e1 C e2 ≡ µ_g(χ_{g:e2}(e1)).    (7.52)

Between flatten and the d-join, there also exists a correspondence:

flatten(χ_{e2}(e1)) ≡ Π_{A(e2)}(e1 C e2).    (7.53)

Sometimes a d-join can be expressed as a cross product or a join:

e1 C e2 ≡ e1 A e2    if F(e2) ∩ A(e1) = ∅,    (7.54)
e1 C σ_q(e2) ≡ e1 B_q e2    if F(e2) ∩ A(e1) = ∅.    (7.55)

Denote by e↓ the fact that some expression e is defined, i.e., returns a valid result. Then, we call a function f extending if and only if

∀ x, y: (f(x) ◦ y)↓ ⟹ f(x ◦ y) = f(x) ◦ y,

and we call it restricting if and only if

∀ x, y: f(x)↓, (x ◦ y)↓ ⟹ f(x ◦ y) = f(x).

Let us give an example of a function that is neither extending nor restricting: it returns a tuple with a single attribute c, whose value is bound to the number of attributes of its input tuple.

Unnesting of operations buried in the d-join can be performed by applying the following equivalences:

e C f(σ_{A=A′}(ρ_{A←A′}(e))) ≡ µ_g(Γ_{A;g:f}(e))    if A ⊆ A(e),    (7.56)
e1 C (e2 B e3) ≡ (e1 C e2) B e3    if F(e3) ∩ A(e1) = ∅,    (7.57)
e1 C χ_f(e2) ≡ χ_f(e1 C e2)    if f is extending,    (7.58)
Π_A(e1 C χ_f(e2)) ≡ Π_A(χ_f(e1 C e2))    if A ⊆ A(χ_f(e2)) and f is restricting.    (7.59)

We have to be careful if we exchange a d-join with a join: the dependency can move. In the following equivalences, we provide the dependencies explicitly in parentheses.

e1 C_{p12} (e2(e1) B_{p23} e3(e1)) ≡ (e1 C_{p12} e2(e1)) C_{p23} e3(e1),    (7.60)
e1 C_{p12} (e2 B_{p23} e3(e1)) ≡ (e1 B_{p12} e2) C_{p23} e3(e1).    (7.61)

In the first equivalence, the join between e2 and e3 on the left-hand side must be turned into a dependent join on the right-hand side. In the second equivalence, the dependent join between e1 and e2 becomes a regular join between e1 and e2 on the right-hand side, and the regular join between e2 and e3 on the left-hand side becomes a dependent join on the right-hand side.
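To see Eqvs. 7.50 and 7.52 at work, here is a minimal Python sketch of the d-join as a flat-map and of its unnest-after-map formulation; the dependent expression e2 is modeled as a function of a tuple. All names are illustrative.

```python
def d_join(e1, e2_of):
    """Eqv. 7.50: for each t1 in e1, evaluate e2(t1) and concatenate."""
    return [t1 | t2 for t1 in e1 for t2 in e2_of(t1)]

def chi(e, a, f):                    # map operator in its extending form
    return [t | {a: f(t)} for t in e]

def mu(e, g):                        # unnest attribute g (a bag of tuples)
    return [{k: v for k, v in t.items() if k != g} | s
            for t in e for s in t[g]]

e1 = [{"n": 2}, {"n": 0}]
e2 = lambda t: [{"i": i} for i in range(t["n"])]   # depends on t.n

print(d_join(e1, e2))                 # [{'n':2,'i':0}, {'n':2,'i':1}]
print(mu(chi(e1, "g", e2), "g"))      # the same bag (Eqv. 7.52)
```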
7.10 Equivalences for Outerjoins

Outerjoins are a little brittle; long papers have already been written on this subject. In this section, we summarize the most important findings that are useful in the context of query optimization. For a full account of outerjoins, the reader is referred to the literature [736, 298, 308].

The occurrence of an outerjoin can have several reasons. First, outerjoins are part of the SQL-2 specification. Second, outerjoins can be introduced during query rewrite. For example, unnesting nested queries or hierarchical views may result in outerjoins. Sometimes, it is also possible to rewrite universal quantifiers to outerjoins [886, 219].

Before reading any further, the reader should get acquainted with outerjoins by checking whether there is a mistake in Figs. 7.6, 7.7, or 7.8. There, we calculate for three relations Ri their joins, left outerjoins, and full outerjoins for three different sets of predicates. The first set of predicates does not apply any special comparisons with respect to null values. All predicates in this set are denoted by q_{ij} (1 ≤ i, j ≤ 3) and defined as q_{ij} := (b_i = b_j). The second set of predicates uses the special comparison '≐'. Remember that this dotted equality returns true in the additional case that both arguments are null. The predicates of the second set are denoted by q_{ij}′ and defined as q_{ij}′ := (b_i ≐ b_j). The third set of predicates consists of q_{12′} := (b_1 = b_2 ∨ b_2 ≐ null) and q_{2′3} := (b_2 = b_3 ∨ b_2 ≐ null).

Figure 7.6: Example for outerjoin reorderability (for strict q)
Figure 7.7: Example for outerjoin reorderability (for non-strict q′)
Figure 7.8: Example for outerjoin reorderability (for partially non-strict q′)

Note that in Fig. 7.8 there is no difference between e2 E_{q_{2′3}} e3 and e2 K_{q_{2′3}} e3. Why? The main purpose of this section is to derive equivalences among expressions containing outerjoins. Let us start with the observation that the full outerjoin is commutative, but the left outerjoin is not.
Less simple is the next item on our list: associativity. As a simple start, consider

(e1 E_{q12} e2) B_{q23} e3 ≢ e1 E_{q12} (e2 B_{q23} e3).

If we let e2 and e3 be empty bags, then the left-hand side evaluates to the empty bag, but the right-hand side simplifies to e1 A {⊥_{A(e2)∪A(e3)}}. Thus, ¬assoc(E, B). By taking a look at

(e1 K_{q12} e2) B_{q23} e3 ≢ e1 K_{q12} (e2 B_{q23} e3),

again with e2 and e3 yielding the empty bag, we see that ¬assoc(K, B). Imagine e1 and e2 yield empty bags. The left-hand side of

(e1 B_{q12} e2) K_{q23} e3 ≢ e1 B_{q12} (e2 K_{q23} e3)

then evaluates to {⊥_{A(e1)∪A(e2)}} A e3. Since the right-hand side gives the empty bag, we have ¬assoc(B, K). Last in this sequence, we consider

(e1 E_{q12} e2) K_{q23} e3 ≢ e1 E_{q12} (e2 K_{q23} e3).

Assume again that e1 and e2 evaluate to the empty bag. Then the right-hand side does the same, whereas the left-hand side results in the familiar {⊥_{A(e1)∪A(e2)}} A e3. Consequently, ¬assoc(E, K). Summarizing, we have ¬assoc(E, B), ¬assoc(K, B), ¬assoc(B, K), and ¬assoc(E, K). These negative results are also confirmed by our example (see Fig. 7.9).

Figure 7.9: Example for outerjoin associativity for strict q
Figure 7.10: Example for outerjoin associativity for non-strict q′
Figure 7.11: Example for outerjoin l-asscom for strict q

This leaves us to check each of assoc(E, E), assoc(B, E), assoc(K, E), and assoc(K, K), apart from the already known assoc(B, B). Fig. 7.9 shows that for this particular example all four properties hold. Let us start with assoc(E, E). To illustrate one problem that occurs in the context of associativity, consider three relations R1, R2, and R3, where R1 contains a single tuple with attribute a, R2 contains a single tuple with attributes (b, c) whose c-value is null, and R3 contains a single tuple with attribute d:

R1 = {[a: a]},  R2 = {[b: b, c: -]},  R3 = {[d: d]}.

The results of different left outerjoin applications are:

e1 E_{a=b} e2:                               {[a: a, b: -, c: -]},
e2 E_{c=d ∨ c≐null} e3:                      {[b: b, c: -, d: d]},
(e1 E_{a=b} e2) E_{c=d ∨ c≐null} e3:         {[a: a, b: -, c: -, d: d]},
e1 E_{a=b} (e2 E_{c=d ∨ c≐null} e3):         {[a: a, b: -, c: -, d: -]}.

Hence, in general (e1 E_{q12} e2) E_{q23} e3 ≠ e1 E_{q12} (e2 E_{q23} e3). The problem is that the predicate q23 does not reject null values, where a predicate rejects null values for a set of attributes A if it evaluates to false or undefined on every tuple in which all attributes in A are null. That is, q rejects null values if and only if q(⊥_A) ≠ true. We also say that a predicate is strict or strong if it rejects null values. For our example predicates, the following holds: all q_{ij} reject null values on any A(e_i). The predicates q_{12′} and q_{2′3} do not reject null values on A(e2), but they do on A(e1) or A(e3), respectively. The predicates q_{ij}′ reject null values neither on A(e_i) nor on A(e_j).

In order to understand why this is the core of the problem, let us investigate it more thoroughly. Define E_{⊥i} := {⊥_{A(ei)}} for i = 1, 2, 3 and E_{⊥ij} := {⊥_{A(ei)∪A(ej)}} for i, j = 1, 2, 3. Further, let q12 and q23 be join predicates such that F(q12) ∩ A(e3) = ∅ and F(q23) ∩ A(e1) = ∅.
For the left-hand side of associativity, we have

(e1 E_{q12} e2) E_{q23} e3
  ≡ ((e1 B_{q12} e2) ∪ ((e1 T_{q12} e2) A E_{⊥2})) E_{q23} e3
  ≡ (((e1 B_{q12} e2) ∪ ((e1 T_{q12} e2) A E_{⊥2})) B_{q23} e3)
    ∪ ((((e1 B_{q12} e2) ∪ ((e1 T_{q12} e2) A E_{⊥2})) T_{q23} e3) A E_{⊥3})
  ≡ ((e1 B_{q12} e2) B_{q23} e3)
    ∪ (((e1 T_{q12} e2) A E_{⊥2}) B_{q23} e3)
    ∪ (((e1 B_{q12} e2) T_{q23} e3) A E_{⊥3})
    ∪ ((((e1 T_{q12} e2) A E_{⊥2}) T_{q23} e3) A E_{⊥3})
  ≡ (e1 B_{q12} (e2 B_{q23} e3))
    ∪ ((e1 T_{q12} e2) A (E_{⊥2} B_{q23} e3))
    ∪ (e1 B_{q12} ((e2 T_{q23} e3) A E_{⊥3}))
    ∪ ((e1 T_{q12} e2) A (E_{⊥2} T_{q23} e3) A E_{⊥3})
  ≡ (e1 B_{q12} ((e2 B_{q23} e3) ∪ ((e2 T_{q23} e3) A E_{⊥3})))
    ∪ ((e1 T_{q12} e2) A ((E_{⊥2} B_{q23} e3) ∪ ((E_{⊥2} T_{q23} e3) A E_{⊥3})))
  ≡ (e1 B_{q12} (e2 E_{q23} e3))
    ∪ ((e1 T_{q12} e2) A (E_{⊥2} E_{q23} e3)).

The right part of the cross product on the right-hand side of the union, (E_{⊥2} E_{q23} e3), does look suspicious. Note that if q23 rejects nulls on A(e2), this part simplifies to E_{⊥23}. To confirm our suspicion, we take a look at the other side of associativity:

e1 E_{q12} (e2 E_{q23} e3)
  ≡ (e1 B_{q12} (e2 E_{q23} e3)) ∪ ((e1 T_{q12} (e2 E_{q23} e3)) A E_{⊥23})
  ≡ (e1 B_{q12} (e2 E_{q23} e3)) ∪ ((e1 T_{q12} e2) A E_{⊥23}).

The last step is valid, since e2 E_{q23} e3 preserves e2 and F(q12) ∩ A(e3) = ∅. Thus, the left outerjoin is associative if and only if

(e1 T_{q12} e2) A (E_{⊥2} E_{q23} e3) ≡ (e1 T_{q12} e2) A E_{⊥23}.

But this holds if q23 rejects nulls on A(e2). Thus, without any further effort, we have just proven the second of the following equivalences:

(e1 B_{q12} e2) E_{q23} e3 ≡ e1 B_{q12} (e2 E_{q23} e3),    (7.62)
(e1 E_{q12} e2) E_{q23} e3 ≡ e1 E_{q12} (e2 E_{q23} e3)    if q23 rejects nulls on A(e2),    (7.63)
(e1 K_{q12} e2) E_{q23} e3 ≡ e1 K_{q12} (e2 E_{q23} e3)    if q23 rejects nulls on A(e2),    (7.64)
(e1 K_{q12} e2) K_{q23} e3 ≡ e1 K_{q12} (e2 K_{q23} e3)    if q12 and q23 reject nulls on A(e2).    (7.65)

As an exercise, the reader should prove the remaining equivalences. This is necessary since the proofs of Galindo-Legaria [297] are valid for sets only.

Let us now come to l-asscom. Fig. 7.11 shows that for our example l-asscom(K, B) does not hold. Bearing the symmetry of l-asscom in mind, the equivalences

(e1 E_{q12} e2) B_{q13} e3 ≡ (e1 B_{q13} e3) E_{q12} e2,    (7.66)
(e1 E_{q12} e2) E_{q13} e3 ≡ (e1 E_{q13} e3) E_{q12} e2,    (7.67)
(e1 E_{q12} e2) K_{q13} e3 ≡ (e1 K_{q13} e3) E_{q12} e2    if q12 rejects nulls on A(e1),    (7.68)
(e1 K_{q12} e2) K_{q13} e3 ≡ (e1 K_{q13} e3) K_{q12} e2    if q12 and q13 reject nulls on A(e1)    (7.69)

cover all combinations of B, E, and K except the well-known case for regular joins. Since comm(B) and assoc(B, E) hold without restrictions, Eqv. 7.66 holds without restrictions. Since E is strongly left linear and the consumer/producer relationship is not disturbed — q12 does not access attributes from e3 and q13 does not access attributes from e2 — Eqv. 7.67 holds without restrictions. From Eqv. 7.64 and the fact that the full outerjoin is commutative, Eqv. 7.68 follows; some care is just needed to see how the necessary restriction for associativity carries over to the l-asscom equivalence. Similarly, Eqv. 7.69 follows from Eqv. 7.65. In all cases, the necessity of the restrictions is due to the fact that commutativity and l-asscom imply associativity.

The r-asscom property is handled quickly. The only valid equivalence we have is

e1 K_{q13} (e2 K_{q23} e3) ≡ e2 K_{q23} (e1 K_{q13} e3),    (7.70)

which follows directly from comm(K) and assoc(K, K) if q13 and q23 are both strict on A(e3).
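Since almost every condition in this section is of the form "q rejects nulls on A", here is a minimal Python sketch of that test: evaluate the predicate on the all-null tuple over A and check that the result is not true (unknown comparisons yield None here). All names are illustrative.

```python
NULL = None

def eq(x, y):                 # SQL '=': unknown if an input is null
    return None if x is NULL or y is NULL else x == y

def dotted_eq(x, y):          # the dotted equality: null = null is true
    return True if x is NULL and y is NULL else (x == y)

def rejects_nulls(q, attrs):
    bottom = {a: NULL for a in attrs}    # the tuple ⊥_A
    return q(bottom) is not True

q23  = lambda t: eq(t["b2"], t["b3"])            # a strict predicate
q23n = lambda t: dotted_eq(t["b2"], t["b3"])     # a non-strict predicate

print(rejects_nulls(q23,  ["b2", "b3"]))   # True: Eqv. 7.63 applies
print(rejects_nulls(q23n, ["b2", "b3"]))   # False: associativity may fail
```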
7.10.1 Outerjoin Simplification

Sometimes, a left or full outerjoin can be turned into a regular or one-sided outerjoin if it is followed by a unary or binary algebraic operator with a strict predicate. These simplifications are important and should be applied before plan generation. In [308, 297], we find the first two of the following equivalences:

e1 B_p (e2 E_q e3) ≡ e1 B_p (e2 B_q e3),    (7.71)
e1 N_p (e2 E_q e3) ≡ e1 N_p (e2 B_q e3),    (7.72)
e1 T_p (e2 E_q e3) ≡ e1 T_p (e2 B_q e3),    (7.73)
e1 E_p (e2 E_q e3) ≡ e1 E_p (e2 B_q e3),    (7.74)
e1 Z_p (e2 E_q e3) ≡ e1 Z_p (e2 B_q e3).    (7.75)

These equivalences hold under the condition that p rejects nulls on A(e3). They can be proven using the semijoin reducer equivalences 7.197–7.201. Similarly, the equivalences

e1 B_p (e2 K_q e3) ≡ e1 B_p (e2 E_q e3),    (7.76)
e1 N_p (e2 K_q e3) ≡ e1 N_p (e2 E_q e3),    (7.77)
e1 T_p (e2 K_q e3) ≡ e1 T_p (e2 E_q e3),    (7.78)
e1 E_p (e2 K_q e3) ≡ e1 E_p (e2 E_q e3),    (7.79)
e1 Z_p (e2 K_q e3) ≡ e1 Z_p (e2 E_q e3)    (7.80)

hold if p rejects nulls on A(e2). Commutativity of the full outerjoin gives symmetric equivalences if p rejects nulls on A(e3). Further, we can rewrite an outerjoin to a regular join whenever null-padded tuples are eliminated by some selection predicate. Equivalences that allow us to do so, and some further ones, are given next:

σ_{p1}(e1 E_q e2) ≡ σ_{p1}(e1) E_q e2,    (7.81)
e1 E_{q∧p2} e2 ≡ e1 E_q σ_{p2}(e2),    (7.82)
σ_p(e1 E_q e2) ≡ σ_p(e1 B_q e2)    if p rejects nulls on A(e2),    (7.83)
σ_p(e1 K_q e2) ≡ σ_p(e1 E_q e2)    if p rejects nulls on A(e1),    (7.84)
σ_p(e1 K_q e2) ≡ σ_p(e1 H_q e2)    if p rejects nulls on A(e2).    (7.85)

We can extend the last two equivalences to outerjoins with default values:

σ_p(e1 E^{D2}_q e2) ≡ σ_p(e1 B_q e2)    if ¬p(D2),    (7.86)
σ_p(e1 K^{D1,D2}_q e2) ≡ σ_p(e1 E_q e2)    if ¬p(D1),    (7.87)
σ_p(e1 K^{D1,D2}_q e2) ≡ σ_p(e1 H_q e2)    if ¬p(D2).    (7.88)

7.10.2 Generalized Outerjoin

As pointed out by Galindo-Legaria and Rosenthal [736, 298, 308], the different outerjoins can be defined using the outer union operator, which in turn was introduced by Codd [200]. Let e1 and e2 be two relations and A1 and A2 their corresponding attribute sets. The outer union is then defined by padding the union of the relations with null values to the schema A1 ∪ A2:

e1 ∪⁺ e2 := (e1 A {⊥_{A2\A1}}) ∪ (e2 A {⊥_{A1\A2}}).    (7.89)

Given this definition of the outer union operator, we can define the outerjoin operations as follows:

e1 E_q e2 := (e1 B_q e2) ∪⁺ (e1 \ Π_{A1}(e1 B_q e2)),    (7.90)
e1 K_q e2 := (e1 B_q e2) ∪⁺ (e1 \ Π_{A1}(e1 B_q e2)) ∪⁺ (e2 \ Π_{A2}(e1 B_q e2)).    (7.91)

The expression e1 E_{q12} (e2 B_{q23} e3) cannot be reordered with the equivalences given so far. In order to allow reorderability of this expression, the generalized outerjoin was introduced by Dayal [220]. Here, we follow Rosenthal and Galindo-Legaria [736]. The generalized left outerjoin preserves the attributes of a subset A ⊆ A(e1) only. It is defined as

e1 E^A_q e2 := (e1 B_q e2) ∪⁺ (Π_A(e1) \ Π_A(e1 B_q e2)).    (7.92)

However, we prefer a slightly different definition based on the antijoin:

e1 E^A_q e2 := (e1 B_q e2) ∪ (Π_A(e1 T_q e2) A {⊥_{(A(e1)\A)∪A(e2)}}_b).    (7.93)

This definition is equivalent to the one above. The generalized left outerjoin allows us to reorder left outerjoins and joins as well as full outerjoins and joins, but only in the context of sets. The equivalences

e1 E_{q12} (e2 B_{q23} e3) ≡ (e1 E_{q12} e2) E^{A(e1)}_{q23} e3    if q23 rejects nulls on A(e2),    (7.94)
e1 K_{q12} (e2 B_{q23} e3) ≡ (e1 K_{q12} e2) E^{A(e1)}_{q23} e3    if q23 rejects nulls on A(e2)    (7.95)

only hold for sets. The following is a counterexample for bags.
Define R1 := {[a1: 1, b1: 1]}_b, R2 := {[a2: 1, b2: 1], [a2: 2, b2: 1]}_b, and R3 := ∅_b with schema (a3: int, b3: int). Evaluating

R1 E_{b1=b2} (R2 B_{b2=b3} R3)

yields {[a1: 1, b1: 1, a2: -, b2: -, a3: -, b3: -]}_b. Evaluating R1 E_{b1=b2} R2 yields {[a1: 1, b1: 1, a2: 1, b2: 1], [a1: 1, b1: 1, a2: 2, b2: 1]}_b. Thus,

(R1 E_{q12} R2) E^{A(R1)}_{q23} R3

evaluates to {[a1: 1, b1: 1, a2: -, b2: -, a3: -, b3: -]^2}_b.

We have only discussed the basic equivalences for reordering algebraic expressions containing outerjoins. General frameworks for dealing with these expressions in toto are presented in [85, 86, 297, 298, 308]. In particular, the generalized left outerjoin can be further generalized to preserve disjoint sets of attributes in order to derive more equivalences [297, 308], which also only hold in the context of sets. Bhargava, Goel, and Iyer propose the modified generalized outer join (MGOJ), a variant of the generalized outerjoin that correctly deals with bags [85]. ToDo

7.11 Equivalences for Unary Grouping

7.11.1 An Elementary Fact about Grouping

Let us first resume the discussion of the properties of the grouping operator, which we started in Sec. 7.7.1. Assume that the functional dependencies H → G and G → H hold. Then, it should not make any difference whether we group by H or by G. The only problem we have to solve is that H might not contain all attributes of G (or vice versa). However, since H → G, any attribute g ∈ (G \ H) has only one possible value per group if we group by H. Thus, we can simply copy this value. We do so by adding a new aggregation function cpf(g), which copies the value of g from the first tuple seen for a group. This is deterministic, since all tuples in a group have the same value for g (as H → G). Thus, to make sure that all values of G are extracted if we group by H, we extend a given aggregation vector F as follows. Assume (G \ H) = {g1, ..., gk}. Then, we define F ◦ (G \ H) as F ◦ (g1: cpf(g1), ..., gk: cpf(gk)). Using this definition, we can state the equivalence

Γ_{G;F}(e) ≡ Π_C(Γ_{H;F◦(G\H)}(e)),    (7.96)

which holds if H → G and C = G ∪ A(F). This equivalence allows us to determine some set H with H → G such that grouping on H might become cheaper than grouping on G. In particular, it allows us to minimize the number of grouping attributes. This trick can be applied to all equivalences in this section.

Let G be a set of grouping attributes and assume that G → TID(e). Then, every group consists of only one tuple, i.e., Π^D_G(e) = Π_G(e), and we can replace a grouping by a map:

Γ_{G;F}(e) ≡ Π_C(χ_{F̂}(e))    (7.97)

if F = (b1: agg1(a1), ..., bm: aggm(am)), F̂ = (b1: a1, ..., bm: am), and C = G ∪ A(F). Note that using F instead of F̂ also works. Tsois and Sellis call this equivalence remove-group-by [882].
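The following minimal Python sketch illustrates remove-group-by (Eqv. 7.97): if the grouping attributes form a key (G → TID(e)), every group is a singleton, and the grouping can be replaced by a per-tuple map. We use the variant that applies each aggregate to the singleton group; all names are ours.

```python
def group(e, G, F):
    """Grouping Gamma_{G;F}; F maps a result attribute to (agg, input attr)."""
    groups = {}
    for t in e:
        groups.setdefault(tuple(t[g] for g in G), []).append(t)
    return [dict(zip(G, k)) | {b: agg(t[a] for t in grp)
                               for b, (agg, a) in F.items()}
            for k, grp in groups.items()]

def chi_map(e, G, F):
    """The right-hand side of Eqv. 7.97: a map over single tuples."""
    return [{**{g: t[g] for g in G},
             **{b: agg([t[a]]) for b, (agg, a) in F.items()}} for t in e]

e = [{"k": 1, "x": 10}, {"k": 2, "x": 20}]      # k is a key: k -> TID(e)
F = {"s": (sum, "x")}
print(group(e, ["k"], F) == chi_map(e, ["k"], F))    # True
```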
7.11.2 Join

Let us now come to some more complex cases concerning the reorganization of expressions containing grouping and join. Traditionally, the grouping operator, if specified in some SQL query, is performed after the evaluation of all join operations. However, pushing down grouping can substantially reduce the input size of the joins and, thus, can be highly beneficial. Before we give some equivalences, let us look at some example relations, their joins, and the result of applying some grouping operators. Fig. 7.12 presents two relations R1 and R2 and the result of their join (e3) in the top row. The next row shows the result of applying a grouping operator to each of these items (e4 to e6). The last row contains the results of joining a grouped result with one original relation, and the result of joining the two grouped results given in e4 and e5.

Figure 7.12: Example for grouping and join, where F = (c: count(*), b1: sum(a1), b2: sum(a2)), F1 = (c1: count(*), b1′: sum(a1)), F2 = (c2: count(*), b2′: sum(a2))

Let us assume that our original expression

Γ_{g1,g2; c:count(*), b1:sum(a1), b2:sum(a2)}(R1 B_{j1=j2} R2)

is equivalent to the original query we have to evaluate. The result of this expression is given as e6 in Fig. 7.12. The question is whether any of e7, e8, or e9 can be used to provide the same result. Let us start with e7. We have sum(c1) = 4 and sum(b1′) = 16, which is perfect. However, sum(a2) = 14, whereas according to e6 we should have 22. How can we fix that? Note that the difference is 8, which is exactly the value we see in the last row of e7. Further, the count in c1 is 2. It indicates that for g1 = 1 and j1 = 2 two tuples were grouped. Each of these tuples joins with the tuple (1, 2, 8) of R2. This is what we missed. This point is illustrated in the example in Fig. 7.13, where we added one more tuple to e2, marked by a '*'. All tuples to which this tuple contributes are also marked by a '*' later on. The reader should carefully follow the stars.

Figure 7.13: Extended example for grouping and join

Let us return to e7 in Fig. 7.12. We have to apply some correction for the fact that (1, 2, 8) of R2 has two (= c1) join partners. Thus, we calculate sum(c1 * a2) in e7 instead of the plain sum(a2) and get the correct result 22.

Let us turn to e8 in Fig. 7.12. There, we calculate sum(a1) = 16, sum(b2′) = 22, and sum(c2) = 4, which are all correct results. However, in e8 of Fig. 7.13, the tuples (1, 2, 4) and (1, 2, 8) of R1 now find two join partners in R2, since we added the tuple marked by the asterisk. We should expect problems! And there are some, since sum(a1) in e8 gives 14 but should result in 28, according to e6. Again, c2 holds the number of join partners the R1 tuples find in R2. Thus, calculating sum(a1 * c2) = 28 solves the problem.

Turning our attention to e9 of Fig. 7.12, we see that sum(b1′ * c2) = 16 and sum(b2′ * c1) = 22 give the correct results (compare these values to b1 and b2 of e6). However, sum(c1) = sum(c2) = 3, but c of e6 indicates that 4 is the correct result. The reader might guess that we have to take sum(c1 * c2) = 4, which is indeed what has to be calculated.

These observations give rise to the following definition. Let F = (b1: agg1(a1), ..., bm: aggm(am)) be an aggregation vector. We define F ⊗ c for some attribute c, which will typically contain the result of some count(*), as

F ⊗ c := (b1: agg1′(e1), ..., bm: aggm′(em))

with

aggi′(ei) = aggi(ei)                       if aggi is duplicate agnostic,
aggi′(ei) = aggi(ei * c)                   if aggi is the duplicate sensitive sum,
aggi′(ei) = sum(c)                         if aggi(ei) = count(*),
aggi′(ei) = sum(ei = NULL ? 0 : c)         if aggi(ei) = count(ei), ei ≠ '*'.

The goal now is to exploit the decomposability of aggregation functions and this definition to push a grouping down a join.
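To make the F ⊗ c correction concrete, here is a hedged Python sketch of eager aggregation in the spirit of Eqv. 7.104 below: the left join input is pre-grouped with c1 = count(*), and the duplicate-sensitive sum over the other side is corrected by multiplying with c1 (the sum(c1 * a2) of e7). All names and the dict-based encoding are ours.

```python
from collections import defaultdict

R1 = [{"g1": 1, "j1": 1, "a1": 2}, {"g1": 1, "j1": 2, "a1": 4},
      {"g1": 1, "j1": 2, "a1": 8}]
R2 = [{"g2": 1, "j2": 1, "a2": 2}, {"g2": 1, "j2": 1, "a2": 4},
      {"g2": 1, "j2": 2, "a2": 8}]

# direct evaluation: group after the join
direct = defaultdict(lambda: [0, 0, 0])          # c, sum(a1), sum(a2)
for t1 in R1:
    for t2 in R2:
        if t1["j1"] == t2["j2"]:
            s = direct[(t1["g1"], t2["g2"])]
            s[0] += 1; s[1] += t1["a1"]; s[2] += t2["a2"]

# eager: pre-aggregate R1 on (g1, j1) with c1 = count(*), b1 = sum(a1)
pre = defaultdict(lambda: [0, 0])
for t in R1:
    p = pre[(t["g1"], t["j1"])]
    p[0] += 1; p[1] += t["a1"]

eager = defaultdict(lambda: [0, 0, 0])
for (g1, j1), (c1, b1) in pre.items():
    for t2 in R2:
        if j1 == t2["j2"]:
            s = eager[(g1, t2["g2"])]
            s[0] += c1; s[1] += b1; s[2] += c1 * t2["a2"]   # F (x) c1

print(dict(direct) == dict(eager))    # True
```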
Let us start with an equivalence that Rosenthal, Rich, and Scholl [742, 729] used to speed up join processing. They noted that an ordinary join can be calculated by unnesting the result of a join of two nested relations. Let q be a join predicate and e1 and e2 two expressions to be joined. Denote by J_i = F(q) ∩ A(e_i) the join attributes from e_i for i = 1, 2. Then

e1 B_q e2 ≡ µ_{g2}(µ_{g1}(Γ_{J1;g1:Π_{A(e1)\J1}}(e1) B_q Γ_{J2;g2:Π_{A(e2)\J2}}(e2))).    (7.98)

We need the projections Π_{A(ei)\Ji} to prevent the duplication of attributes in the result. J1 and J2 are the minimal sets of grouping attributes we can use, but nothing hinders us from using larger grouping sets. Let G1^+ and G2^+ be two sets of attributes with J_i ⊆ G_i^+ for i = 1, 2. Then

e1 B_q e2 ≡ µ_{g2}(µ_{g1}(Γ_{G1^+;g1:Π_{A(e1)\G1^+}}(e1) B_q Γ_{G2^+;g2:Π_{A(e2)\G2^+}}(e2)))    (7.99)

holds. Unnesting two nested attributes g1 and g2 in a row, as done in the above equivalences, is like generating the cross product of the items contained in g1 and g2. Under the above assumptions, we can thus state the following two equivalences:

e1 B_q e2 ≡ Π_{\overline{g1,g2}}(µ_g(χ_{g:g1 A g2}(Γ_{J1;g1:Π_{A(e1)\J1}}(e1) B_q Γ_{J2;g2:Π_{A(e2)\J2}}(e2)))),    (7.100)
e1 B_q e2 ≡ Π_{\overline{g1,g2}}(µ_g(χ_{g:g1 A g2}(Γ_{G1^+;g1:Π_{A(e1)\G1^+}}(e1) B_q Γ_{G2^+;g2:Π_{A(e2)\G2^+}}(e2)))).    (7.101)

In the next step, we want to get rid of g. Assume we apply some aggregation function agg(g.ai) to g of the latter equivalence, where ai is an attribute of g1, i.e., ai ∈ A(g1) or, by definition, ai ∈ A(e1) \ G1^+. It should be clear that

agg(g.ai) = sum(g1.ai) * |g2|       if agg = sum,
agg(g.ai) = count(g1.ai) * |g2|     if agg = count,
agg(g.ai) = min(g1.ai)              if agg = min,
agg(g.ai) = max(g1.ai)              if agg = max,

and |g| = |g1| * |g2|. Analogously, we can exchange the roles of 1 and 2 in case ai ∈ A(g2). Now, we are prepared to add an additional grouping operator Γ_{G^+;g;F} to both sides of Eqv. 7.101. Therefore, we assume that J1 ⊆ G^+ and J2 ⊆ G^+. Further, we define G_i^+ as G^+ ∩ A(e_i) for i = 1, 2.
This results in

Γ_{G^+;g;F}(e1 B_q e2) ≡ Γ_{G^+;g;F}(Π_{\overline{g1,g2}}(µ_g(χ_{g:g1 A g2}(Γ_{G1^+;g1:Π_{A(e1)\G1^+}}(e1) B_q Γ_{G2^+;g2:Π_{A(e2)\G2^+}}(e2))))).

We know that g, g1, and g2 cannot be part of the left-hand side. This means that they cannot occur in G^+ or A(F). Thus, we can eliminate the projection, which gives us

Γ_{G^+;g;F}(e1 B_q e2) ≡ Γ_{G^+;g;F}(µ_g(χ_{g:g1 A g2}(Γ_{G1^+;g1:Π_{A(e1)\G1^+}}(e1) B_q Γ_{G2^+;g2:Π_{A(e2)\G2^+}}(e2)))).

Now, note that the outer grouping on the right-hand side undoes the unnesting that immediately precedes it. We could thus be tempted to rewrite the right-hand side to something like

χ_F(χ_{g:g1 A g2}(Γ_{G1^+;g1:Π_{A(e1)\G1^+}}(e1) B_q Γ_{G2^+;g2:Π_{A(e2)\G2^+}}(e2))).

In order to verify this, we have to take a close look at

E := Γ_{G1^+;g1:Π_{A(e1)\G1^+}}(e1) B_q Γ_{G2^+;g2:Π_{A(e2)\G2^+}}(e2).

We must make sure that E produces only a single tuple for each group constructed by Γ_{G^+;g;F}. From the definition of Γ, we see that neither Π_{G1^+}(Γ_{G1^+;g1:Π_{A(e1)\G1^+}}(e1)) nor Π_{G2^+}(Γ_{G2^+;g2:Π_{A(e2)\G2^+}}(e2)) contains duplicates. Since G^+ = G1^+ ∪ G2^+, the claim follows.

In the next step, we eliminate χ_{g:g1 A g2}. As we have seen, we might need the cardinalities of g1 and g2 if we have to deal with duplicate sensitive aggregation functions. We can calculate them using a map operator. Let us further assume that F can be split into F1 and F2 such that F = F1 ◦ F2 and the only free variable of F_i is g_i. Then we can rewrite the equivalence to

Γ_{G^+;g;F}(e1 B_q e2) ≡ χ_{F2⊗c1}(χ_{F1⊗c2}(χ_{c1:|g1|}(χ_{c2:|g2|}(Γ_{G1^+;g1:Π_{A(e1)\G1^+}}(e1) B_q Γ_{G2^+;g2:Π_{A(e2)\G2^+}}(e2))))).

Next, we should start moving the different map operators inwards. The only problem occurs because F1 ⊗ c2 and F2 ⊗ c1 need elements of both parts of the join. Let F1 be decomposable into F1^1 and F1^2, and F2 be decomposable into F2^1 and F2^2. Then, we have

Γ_{G^+;g;F}(e1 B_q e2) ≡ χ_{F2^2⊗c1}(χ_{F1^2⊗c2}(χ_{F1^1}(χ_{F2^1}(χ_{c1:|g1|}(χ_{c2:|g2|}(Γ_{G1^+;g1:Π_{A(e1)\G1^+}}(e1) B_q Γ_{G2^+;g2:Π_{A(e2)\G2^+}}(e2))))))).

Pushing down the last four χ operators yields

Γ_{G^+;g;F}(e1 B_q e2) ≡ χ_{F2^2⊗c1}(χ_{F1^2⊗c2}(χ_{F1^1}(χ_{c1:|g1|}(Γ_{G1^+;g1:Π_{A(e1)\G1^+}}(e1))) B_q χ_{F2^1}(χ_{c2:|g2|}(Γ_{G2^+;g2:Π_{A(e2)\G2^+}}(e2))))),

which can now easily be rewritten to the following equivalence by observing that g1 and g2 are not needed outside the join:

Γ_{G^+;g;F}(e1 B_q e2) ≡ χ_{F1^2⊗c2, F2^2⊗c1}(Γ_{G1^+;g1;F1^1◦(c1:|g1|)}(e1) B_q Γ_{G2^+;g2;F2^1◦(c2:|g2|)}(e2)).    (7.102)

In our SQL-notation based variant of Γ, this equivalence reads

Γ_{G^+;F}(e1 B_q e2) ≡ χ_{F1^2⊗c2, F2^2⊗c1}(Γ_{G1^+;F1^1◦(c1:count(*))}(e1) B_q Γ_{G2^+;F2^1◦(c2:count(*))}(e2)),    (7.103)

and it holds if F is splittable into F1 and F2 such that F(F_i) ⊆ A(e_i) and F = F1 ◦ F2, and each F_i is splittable and decomposable into F_i^1 and F_i^2.

Consider the expression Γ_{G;g;F}(e1 B_q e2). We denote the set of join attributes of q from e_i by J_i = F(q) ∩ A(e_i) for i = 1, 2, and the set of all join attributes by J = J1 ∪ J2. If J ⊆ G, we have the above case. Assume J ⊄ G. Define G^+ = G ∪ J, G_i = G ∩ A(e_i), and G_i^+ = G_i ∪ J_i for i = 1, 2. Let F be an aggregation vector splittable into F1 and F2 such that F(F_i) ⊆ A(e_i) and F = F1 ◦ F2. Further, let F_i be decomposable into F_i^1 and F_i^2. Then Eqvs. 7.13 and 7.103, together with the properties of aggregation functions and vectors discussed in Sec. 7.2, give us the following derivation.
Γ_{G;F}(e1 B_q e2)
  ≡ Γ_{G;F1^2,F2^2}(Γ_{G^+;F1^1,F2^1}(e1 B_q e2))
  ≡ Γ_{G;F1^2,F2^2}(χ_{F1^{1,2}⊗c2, F2^{1,2}⊗c1}(Γ_{G1^+;F1^{1,1}◦(c1:count(*))}(e1) B_q Γ_{G2^+;F2^{1,1}◦(c2:count(*))}(e2)))
  ≡ Γ_{G;F1^2⊗c2, F2^2⊗c1}(Γ_{G1^+;F1^1◦(c1:count(*))}(e1) B_q Γ_{G2^+;F2^1◦(c2:count(*))}(e2)),

where C = G ∪ A(F). It should be obvious that the expression on the right-hand side can be cleaned up in case it contains several count(*), or in case no count(*) is needed because all aggregation functions in some F_i are duplicate agnostic.

By now, the reader should be prepared to understand and prove the equivalences provided by Yan and Larson, which we present next. Let e1 and e2 be two expressions and q a join predicate for them. Define for i = 1, 2 the sets of join attributes J_i = F(q) ∩ A(e_i). Let F1 = (b1: agg1(a1), ..., bk: aggk(ak)) and F2 = (b_{k+1}: agg_{k+1}(a_{k+1}), ..., bm: aggm(am)) be two aggregation vectors. Define A1 = {a1, ..., ak}, A2 = {a_{k+1}, ..., am}, F = F1 ◦ F2, A = A1 ∪ A2, and B = {b1, ..., bm}. Let G be a set of grouping attributes and define G_i = G ∩ A(e_i). We denote by G_i^+ the union of the grouping and join attributes of e_i, that is, G_i^+ = G_i ∪ J_i.

Eager/Lazy Groupby-Count

The following equivalence corresponds to the main theorem of Yan and Larson [947]. It states that

Γ_{G;F}(e1 B_q e2) ≡ Γ_{G;(F2⊗c1)◦F1^2}(Γ_{G1^+;F1^1◦(c1:count(*))}(e1) B_q e2)    (7.104)

holds if F is splittable and F1 is decomposable into F1^1 and F1^2. The proof can be found in [947]. From Eqv. 7.104, several other equivalences can be derived easily. First, since the join is commutative,

Γ_{G;F}(e1 B_q e2) ≡ Γ_{G;(F1⊗c2)◦F2^2}(e1 B_q Γ_{G2^+;F2^1◦(c2:count(*))}(e2))    (7.105)

holds if F is splittable and F2 is decomposable into F2^1 and F2^2.

Eager/Lazy Group-by

If F2 is empty, that is F2 = (), then Eqv. 7.104 simplifies to

Γ_{G;F}(e1 B_q e2) ≡ Γ_{G;F1^2}(Γ_{G1^+;F1^1}(e1) B_q e2).    (7.106)

This equivalence holds if F1 is decomposable into F1^1 and F1^2. If F1 is empty, then Eqv. 7.105 simplifies to

Γ_{G;F}(e1 B_q e2) ≡ Γ_{G;F2^2}(e1 B_q Γ_{G2^+;F2^1}(e2)).    (7.107)

This equivalence holds if F2 is decomposable into F2^1 and F2^2.

Eager/Lazy Count

If F1 = (), then Eqv. 7.104 simplifies to

Γ_{G;F}(e1 B_q e2) ≡ Γ_{G;F2⊗c1}(Γ_{G1^+;c1:count(*)}(e1) B_q e2).    (7.108)

If F2 = (), then Eqv. 7.105 simplifies to

Γ_{G;F}(e1 B_q e2) ≡ Γ_{G;F1⊗c2}(e1 B_q Γ_{G2^+;c2:count(*)}(e2)).    (7.109)

Double Eager/Lazy

For the next equivalence, assume F2 = (). Then

Γ_{G;F}(e1 B_q e2) ≡_{Eqv. 7.106} Γ_{G;F1^2}(Γ_{G1^+;F1^1}(e1) B_q e2)
                  ≡_{Eqv. 7.109} Γ_{G;F1^2⊗c2}(Γ_{G1^+;F1^1}(e1) B_q Γ_{G2^+;c2:count(*)}(e2)).

Thus,

Γ_{G;F}(e1 B_q e2) ≡ Γ_{G;F1^2⊗c2}(Γ_{G1^+;F1^1}(e1) B_q Γ_{G2^+;c2:count(*)}(e2))    (7.110)

if F1 is decomposable into F1^1 and F1^2. If F1 is empty, then, due to the commutativity of the join, the equivalence

Γ_{G;F}(e1 B_q e2) ≡ Γ_{G;F2^2⊗c1}(Γ_{G1^+;c1:count(*)}(e1) B_q Γ_{G2^+;F2^1}(e2))    (7.111)

holds if F2 is decomposable into F2^1 and F2^2.

Eager/Lazy Split

Applying Eqv. 7.104 and then Eqv. 7.105 results in the equivalence

Γ_{G;F}(e1 B_q e2) ≡ Γ_{G;(F1^2⊗c2)◦(F2^2⊗c1)}(Γ_{G1^+;F1^1◦(c1:count(*))}(e1) B_q Γ_{G2^+;F2^1◦(c2:count(*))}(e2)),    (7.112)

which holds if F is splittable into F1 and F2, F1 is decomposable into F1^1 and F1^2, and F2 is decomposable into F2^1 and F2^2.
Eliminating the top grouping Historically, the first equivalence that reordered grouping and join was derived by Yan and Larson [946]. Opposed to the equivalences above, it has no final grouping on the right-hand side. Grouping is simply pushed down into the right path. Before we present the equivalence, we need some specialization of our notation. Let G be a set of grouping attributes, F = (b1 : agg1 (a1 ), . . . , bk : aggk (ak )) an aggregation vector, A = {a1 , . . . , ak } the set of aggregated attributes, B = {b1 , . . . , bk } the set of attributes containing the results of the aggregations, G = G1 ∪ G2 the grouping attributes, Gi = G ∩ A(ei ), q a join predicate, Ji = F(q) ∩ A(ei ), and G+ 1 = G1 ∪ J1 . The following equivalence demands that aggregation functions are only applied to the attributes of e1 . That is, A ∩ F(e2 ) = ∅. The equivalence ΓG;F (e1 Bq e2 ) ≡ ΠG,B (ΓG+ ;F (e1 ) Bq ΠG+ (e2 )) 1 2 (7.113) holds if and only if the following two functional dependencies hold in e1 Bq e2 : FD 1 (G1 , G2 ) → G+ 1 and FD 2 (G+ 1 , G2 ) → TID(e2 ). 274 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES R1 R2good g1 j1 a1 g2 j2 1 1 2 1 1 1 2 4 2 2 2 8 1 eg12 := R1 1j1 =j2 R2good g1 j1 a1 g2 j2 1 1 2 1 1 1 2 4 2 2 1 2 8 2 2 bad b e12 := R1 1j1 =j2 R2 g1 j1 a1 g2 j2 1 1 2 1 1 1 2 4 1 2 1 2 8 1 2 eu12 := R1 1j1 =j2 R2ugly g1 j1 a1 g2 j2 k2 1 1 2 1 1 1 1 1 2 2 1 2 1 1 2 2 1 3 G1 := Γg1 ,j1 ;F (R1 ) g1 j1 c1 b1 1 1 1 2 2 2 12 1 G1 := Γg1 ,j1 ;F (R1 ) g1 j1 c1 b1 1 1 1 2 1 2 2 12 R2bad g2 j2 1 1 1 2 R2ugly g2 j2 k2 1 1 1 2 1 2 2 1 3 E1g := Γg1 ,g2 ;F (eg12 ) g1 g2 c1 b1 1 1 1 2 1 2 2 12 E1b := Γg1 ,g2 ;F (eb12 ) g1 g2 c1 b1 1 1 3 14 E1u := Γg1 ,g2 ;F (eu12 ) g1 g2 c1 b1 1 1 1 2 1 2 2 4 E2g := G1 1j1 =j2 R2good g1 j1 c1 b1 g2 j2 1 1 1 2 1 1 1 2 2 12 2 2 E2b := G1 1j1 =j2 R2bad g1 j1 c1 b1 g2 j2 1 1 1 2 1 1 1 2 2 12 1 2 ugly u E2 := G1 1j1 =j2 R2 G1 := Γg1 ,j1 ;F (R1 ) g1 j1 c1 b1 g2 j2 k2 g1 j1 c1 b1 1 1 1 2 1 1 1 1 1 1 2 1 1 1 2 2 1 2 1 2 2 12 1 1 1 2 2 1 3 where F = [c1 : count(∗), b1 : sum(a1 )]. Figure 7.14: Example for Eqv. 7.113 For FD 2, we use an artificial attribute TID for expression e2 . It can be present explicitly or just in the mind of the query compiler. Its purpose is to uniquely identify a tuple in e2 . A consequence of FD 2 is that e2 cannot contain duplicates (without the TID attribute!). This point is illustrated in the example below. Further, since it does not contain duplicates (again, if TID is ignored), we might assume that e2 has a key (either an artificially generated one using the tid operator or, in case it is a base relation, a user-specified primary key). Then, we can replace the TID on the right-hand side by the key. Further note that the functional dependencies can be simplified. We did 7.11. EQUIVALENCES FOR UNARY GROUPING 275 not do so since we wanted to state them in the same way as Yan and Larson did. As an exercise, the reader should perform the simplification. The purpose of the functional dependencies can be sketched as follows. FD 1 ensures that each group on the left-hand side corresponds to one group on the right-hand side. That is, the grouping by G+ 1 is not finer grained than the grouping by G. FD 2 ensures that each row in the left argument of the join on the right-hand side contributes at most one row to the overall result of the right-hand side. This is illustrated by the following examples. Fig 7.14 contains a relation R1 , which we use for expression e1 , and three relations R2good , R2bad , and R2ugly , which we use for expression e2 . 
All of them are depicted in the top row of Fig. 7.14. The next three rows contain the evaluations of the left-hand side of Eqv. 7.113, divided into two steps. The first step (left column) calculates the join between R1 and each of the possibilities for e2 . The second step groups the result of the join (right column). The last three columns evaluate the right-hand side of Eqv. 7.113. Again, the calculation is separated into two steps. The first step does the grouping, the second step the join. We leave the execution of the final projection to the reader. For this example, the functional dependencies read as follows: FD 1 (g1 , g2 ) → g1 , j1 and FD 2 (g1 , j1 , g2 ) → tid(e2 ). In case of R2good , we observe that both functional dependencies hold. We further observe that the left-hand side and the right-hand side of Eqv. 7.113 give the same result. In case of R2bad , we observe that FD 1 is violated and FD 2 is satisfied. The results of the left-hand side and the right-hand side of Eqv. 7.113 differ. In case of R2ugly , we added an explicit key column (k2 ), which can serve as its TID. We observe that FD 1 is satisfied, but FD 2 is violated. Again, the results of the left-hand side and the right-hand side of Eqv. 7.113 differ. As an exercise, the reader should apply Eqvs. 7.104 to 7.110 to the examples. Yan and Larson [946] also give two extended equivalences, which allow an additional projection, either duplicate preserving or eliminating on top of the right-hand side of Eqv. 7.113. Obviously, this transformation is valid for all equivalences. With C ⊆ G ∪ A(F ), we therefore get ΠC (ΓG;F (e1 Bq e2 )) ≡ ΠC (ΓG+ ;F (e1 ) Bq e2 ), 1 D ΠD C (ΓG;F (e1 Bq e2 )) ≡ ΠC (ΓG+ ;F (e1 ) Bq e2 ), 1 (7.114) (7.115) which hold if FD 1 and FD 2 hold. After having seen the problems which can occur if we skip the top-level grouping, let us now prove the following equivalences, which result from Eqvs. 7.104 to 7.112 by eliminating the top-level grouping. Let C = G ∪ A(F ). Without the necessary conditions, which will be discussed afterwards, we have the following 276 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES equivalences: ΓG;F (e1 Bq e2 ) ≡ ΠC (χ(F\ c2 (ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) Bq e2 )), (7.116) ⊗c )◦F 2 1 1 1 1 ΓG;F (e1 Bq e2 ) ≡ ΠC (χ(F\ c2 (e1 Bq ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 ))), (7.117) ⊗c )◦F 1 2 2 2 2 ΓG;F (e1 Bq e2 ) ≡ ΠC (ΓG+ ;F (e1 ) Bq e2 ) (7.118) 1 ΓG;F (e1 Bq e2 ) ≡ ΠC (e1 Bq ΓG+ ;F (e2 )) (7.119) 2 (ΓG+ ;c1 :count(∗) (e1 ) Bq e2 )), ΓG;F (e1 Bq e2 ) ≡ ΠC (χF\ ⊗c 2 1 (7.120) 1 (e1 Bq ΓG+ ;c2 :count(∗) (e2 ))), ΓG;F (e1 Bq e2 ) ≡ ΠC (χF\ ⊗c 1 2 (7.121) 2 ΓG;F (e1 Bq e2 ) ≡ ΠC (χ \ (ΓG+ ;F 1 (e1 ) Bq ΓG+ ;c2 :count(∗) (e2 ))), (7.122) 2 F1 ⊗c2 1 F2 ⊗c1 1 2 1 ΓG;F (e1 Bq e2 ) ≡ ΠC (χ \ (ΓG+ ;c1 :count(∗) (e1 ) Bq ΓG+ ;F 1 (e2 ))), (7.123) 2 ΓG;F (e1 Bq e2 ) ≡ ΠC (χ \ 2 2 F1 ⊗c2 ◦F\ 2 ⊗c1 2 2 ( ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) Bq ΓG+ ;F 1 ◦(c2 :count(∗)) (e2(7.124) ))). 1 1 2 2 We can prove Eqv. 7.117 by eliminating the top-most grouping operator on the right-hand side of Eqv. 7.105 via an application of Eqv. 7.16 followed by an application of remove-group-by (Eqv. 7.97): ΓG;F (e1 Bq e2 ) ≡7.105 ΓG;(F1 ⊗c2 )◦F22 (e1 Bq ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 )) 2 ≡7.16 ≡7.97 2 ΠC (ΓG1 ,G+ ;(F1 ⊗c2 )◦F 2 (e1 Bq ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 ))) 2 ΠC (χ 2 2 2 2( \ (F1 ⊗c 2 )◦F2 e1 Bq ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 ))) 2 2 Let us now come to the conditions attached to the equivalences. 
For our discussion, we denote by I the join with its (grouped) arguments, i.e., I = e1 Bq ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 )). 2 2 The precondition of Eqv. 7.16 requires G → G1 , G+ 2 to hold. Thus, a grouping + by G1 , G2 is not finer grained than a grouping by G. We still have to make sure that the precondition required by Eqv. 7.97 holds. In our context, the precondition is that G1 , G+ (I) is 2 → TID(I) holds in I, or, equivalently, ΠG1 ,G+ 2 duplicate-free. Clearly, ΠG+ (ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 )) is duplicate-free. It follows 2 2 2 that ΠG1 ,G+ (I) is duplicate-free if G1 , G+ 2 → TID(e1 ) holds in I. Summarizing, 2 Eqv. 7.117 holds if G → G+ and G1 , G+ 2 → TID(e1 ) both hold in I. Eqv. 7.117 can be further simplified. If F1 is empty, some simplification yields Eqv.7.119: ΓG;F (e1 Bq e2 ) ≡7.117 ≡ ΠC (χ \ 2 (F1 ⊗c 2 )◦F2 (e1 Bq ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 ))) ΠC (e1 Bq ΓG+ ;F (e2 )). 2 2 2 7.11. EQUIVALENCES FOR UNARY GROUPING 277 By symmetry, Eqvs. 7.116 and 7.118 hold. Since Eqvs. 7.120 and 7.121 are also simplifications of Eqvs. 7.116 and 7.117, they can be proven similarily. Let us turn to Eqv. 7.124. Since ΠG+ (ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 )) 1 1 1 and ΠG+ (ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 )) 2 2 2 are duplicate-free, Eqv. 7.124 holds if G → G+ . Eqvs. 7.122 and 7.123 follow by simplifications from Eqv.7.124, if F1 or F2 is empty. 7.11.3 Left Outerjoin Consider a left outerjoin followed by a grouping operator. The goal is to push down the grouping operator into the arguments of the outerjoin. In order to do so, we will first mirror (and apply) Eqv. 7.104. The definition of the left outerjoin gives us ΓG;F (e1 Eq e2 ) ≡ ΓG;F ((e1 Bq e2 ) ∪ ((e1 Tq e2 ) A {⊥A(e2 ) })). As before, we define Ji = F(q)∩A(ei ), J = J1 ∪J2 , Gi = G∩A(ei ), G+ i = Gi ∪Ji , G+ = G ∪ J. We further demand that F is a splittable and decomposable aggregation vector. Define C = G ∪ A(F ). We also define abbreviations for some subexpressions: Ej = (e1 Bq e2 ), Ea = ((e1 Tq e2 ) A {⊥A(e2 ) }), E⊥ = {⊥A(e2 ) }. We could consider two cases to push down the grouping operator into the arguments of the outerjoin. Case 1 requires ΠG (Ej ) ∩ ΠG (Ea ) = ∅, and case 2 requires ΠG (Ej )∩ΠG (Ea ) ̸= ∅. The former condition is fulfilled, e.g., if G → J1 . Then, we can apply Eqv. 7.26 in case 1 and Eqv. 7.27 in case 2. Since Eqv. 7.27 also holds if ΠG (Ej ) ∩ ΠG (Ea ) = ∅, it suffices to apply it. As an exercise, the reader should consider case 1 explicitly. Eqv. 7.27 gives us ΓG;F1 ,F2 ((e1 Bq e2 ) ∪ ((e1 Tq e2 ) A E⊥ )) ≡ ΓG;F12 ,F22 (ΓG;F11 ,F21 (e1 Bq e2 ) ∪ ΓG;F11 ,F21 ((e1 Tq e2 ) A E⊥ )) where we expanded F to F1 , F2 . Applying Eqv. 7.104 to the left branch of the union gives us ΓG;F11 ,F21 (e1 Bq e2 ) ≡ ΠC (ΓG;(F 1 ⊗c1 )◦F 1,2 (ΓG+ ;F 1,1 ◦(c1 :count(∗)) (e1 ) Bq e2 )) 2 1 1 1 Applying Eqv. 7.104 and then Eqv. 7.23 to the right branch of the union gives us: ΓG;F11 ,F21 ((e1 Tq e2 ) A E⊥ ) ≡ ΠC (ΓG;(F 1 ⊗c1 )◦F 1,2 (ΓG+ ;F 1,1 ◦(c1 :count(∗)) (e1 Tq e2 ) A E⊥ )) 2 1 1 1 ≡ ΠC (ΓG;(F 1 ⊗c1 )◦F 1,2 ((ΓG+ ;F 1,1 ◦(c1 :count(∗)) (e1 ) Tq e2 ) A E⊥ )) 2 1 1 1 278 CHAPTER 7. 
AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES Putting these two things together yields ΓG;F (e1 Eq e2 ) ≡ ΓG;F12 ,F22 ( ΓG;(F 1 ⊗c1 )◦F 1,2 (ΓG+ ;F 1,1 ◦(c1 :count(∗)) (e1 ) Bq e2 ) 1 1 1 2 ∪ ΓG;(F 1 ⊗c1 )◦F 1,2 ((ΓG+ ;F 1,1 ◦(c1 :count(∗)) (e1 ) Tq e2 ) A E⊥ )) 1 1 1 2 ≡ ΓG;(F2 ⊗c1 )◦F12 ( (ΓG+ ;F 1,1 ◦(c1 :count(∗)) (e1 ) Bq e2 ) 1 1 ∪ ((ΓG+ ;F 1,1 ◦(c1 :count(∗)) (e1 ) Tq e2 ) A E⊥ )) 1 1 ≡ ΓG;(F2 ⊗c1 )◦F12 (ΓG+ ;F 1,1 ◦(c1 :count(∗)) (e1 ) Eq e2 ) 1 1 where in the first step we could omit the ΠC due to the subsequent grouping. The second step pulls the two ΓG;(F 1 ⊗c1 )◦F 1,2 operators out of the two union 2 1 branches and merges them with the outer ΓG;F12 ,F22 . This is possible due to the properties of the aggregation vectors involved and the fact that both group on the same set G of grouping attributes. Eager/Lazy Groupby-Count Summarizing, we have the equivalence ΓG;F (e1 Eq e2 ) ≡ ΓG;(F2 ⊗c1 )◦F12 (ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) Eq e2 ), 1 1 (7.125) which holds if F is splittable into F1 and F2 with respect to e1 and e2 , and Fi is decomposable into Fi1 and Fi2 . The companion of Eqv. 7.105 is F 1 (∅),c2 :1 ΓG;F (e1 Eq e2 ) ≡ ΓG;(F1 ⊗c2 )◦F22 (e1 Eq 2 ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 )). (7.126) 2 2 To prove it, we start with ΓG;F (e1 Eq e2 ) ≡ ΓG;F ((e1 Bq e2 ) ∪ ((e1 Tq e2 ) A E⊥ )) ≡ ΓG;F12 ,F22 (ΓG;F11 ,F21 (e1 Bq e2 ) ∪ ΓG;F11 ,F21 ((e1 Tq e2 ) A E⊥ )), where E⊥ = {⊥A(e2 ) }. Applying Eqv. 7.105 to the left argument of the union results in ΓG;F11 ,F21 (e1 Bq e2 ) ≡ ΓG;(F11 ⊗c2 )◦F22 (e1 Bq ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 )). 2 2 Applying Eqv. 7.105 to the right argument of the union yields ΓG;F11 ,F21 ((e1 Tq e2 ) A E⊥ )) ≡ ΓG;(F11 ⊗c2 )◦F22 ((e1 Tq e2 ) A ΓG+ ;F 1 ◦(c2 :count(∗)) (E⊥ )) 2 2 ≡ ΓG;(F11 ⊗c2 )◦F22 ((e1 Tq ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 )) 2 2 AΓG+ ;F 1 ◦(c2 :count(∗)) (E⊥ )) 2 2 ≡ ΓG;(F11 ⊗c2 )◦F22 ((e1 Tq ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 )) 2 2 AΠG+ ∪A(F )∪{c2 } (χF21 (∅),c2 :1 (E⊥ ))), 2 and the claim follows. 279 7.11. EQUIVALENCES FOR UNARY GROUPING Eager/Lazy Group-by If F2 = (), Eqv. 7.125 simplifies to ΓG;F (e1 Eq e2 ) ≡ ΓG;F12 (ΓG+ ;F 1 (e1 ) Eq e2 ) 1 (7.127) 1 This equivalence holds if F1 is decomposable into F11 and F12 . If F1 = (), Eqv. 7.126 simplifies to F 1 (∅) ΓG;F (e1 Eq e2 ) ≡ ΓG;F22 (e1 Eq 2 ΓG+ ;F 1 (e2 )), (7.128) ΓG;F (e1 Eq e2 ) ≡ ΓG;(F2 ⊗c1 ) (ΓG+ ;(c1 :count(∗)) (e1 ) Eq e2 ). (7.129) 2 2 which holds if F2 is decomposable. Eager/Lazy Count If F1 = (), Eqv. 7.125 simplifies to 1 This equivalence holds if F2 is decomposable into F21 and F22 . If F2 = (), Eqv. 7.126 simplifies to ΓG;F (e1 Eq e2 ) ≡ ΓG;(F1 ⊗c2 ) (e1 Ecq2 :1 ΓG+ ;c2 :count(∗) (e2 )). 2 (7.130) Double Eager/Lazy For the next equivalence assume F2 = (). We would like to derive an equivalence similar to Eqv. 7.110. Here it is: ΓG;F (e1 Eq e2 ) ≡ ΓG;(F12 ⊗c2 ) (ΓG+ ;F 1 (e1 ) Ecq2 :1 ΓG+ ;c2 :count(∗) (e2 )), 1 1 2 (7.131) which holds if F1 is decomposable into F11 and F12 . If F1 = () and F2 is decomposable into F21 and F22 , the equivalence F 1 (∅) ΓG;F (e1 Eq e2 ) ≡ ΓG;(F22 ⊗c1 ) (ΓG+ ;c1 :count(∗) (e1 ) Eq 2 1 ΓG+ ;F 1 (e2 )) 2 2 (7.132) holds. Eager/Lazy Split The companion of Eqv. 7.112 for the left outerjoin is ΓG;F (e1 Eq e2 ) ≡ ΓG;(F12 ⊗c2 )◦(F22 ⊗c1 ) ( (7.133) F21 (∅),c2 :1 ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) Eq 1 1 ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 )), 2 2 which holds if F1 is decomposable into F11 and F12 , and F2 is decomposable into F21 and F22 . 280 CHAPTER 7. 
AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES Eliminating the top grouping We can eliminate the top grouping in the above equivalences for the outerjoin by the same arguments as for the join. The resulting equivalences are ΓG;F (e1 Eq e2 ) ≡ ΠC (χ(F\ c2 (ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) Eq e2 )), ⊗c )◦F 2 1 1 1 1 F 1 (∅),c2 :1 2 ΓG;F (e1 Eq e2 ) ≡ ΠC (χ(F\ c2 (e1 Eq ⊗c )◦F 1 2 (7.135) ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 ))), 2 2 2 ΓG;F (e1 Eq e2 ) ≡ ΠC (ΓG+ ;F (e1 ) Eq e2 ), (7.136) 1 ΓG;F (e1 Eq e2 ) ≡ ΠC (e1 EFq (∅) ΓG+ ;F (e2 )), (7.137) 2 (ΓG+ ;(c1 :count(∗)) (e1 ) Eq e2 )), ΓG;F (e1 Eq e2 ) ≡ ΠC (χF\ ⊗c 2 1 (7.138) 1 ΓG;F (e1 Eq e2 ) ≡ ΠC (χF\ (e1 Ecq2 :1 ΓG+ ;c2 :count(∗) (e2 ))), ⊗c 1 2 (7.139) 2 ΓG;F (e1 Eq e2 ) ≡ ΠC (χ \ (ΓG+ ;F 1 (e1 ) Ecq2 :1 ΓG+ ;c2 :count(∗) (e2 ))), 2 F1 ⊗c2 1 1 2 F 1 (∅) ΓG;F (e1 Eq e2 ) ≡ ΠC (χ \ (ΓG+ ;c1 :count(∗) (e1 ) Eq 2 2 ΓG;F (e1 Eq e2 ) ≡ ΠC (χ F2 ⊗c1 1 ( \ 2 2 G;F\ 1 ⊗c2 ◦F2 ⊗c1 F 1 (∅),c2 :1 ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) Eq 2 1 (7.134) 1 ΓG+ ;F 1 (e2 )), 2 2 (7.140) (7.141) ΓG+ ;F 1 ◦(c2 :count(∗))(7.142) (e2 ))). 2 2 These equivalences hold if in addition to the according conditions concerning splittability, decomposability, and emptyness, the following functional dependencies hold: Eqv. 7.134 Eqv. 7.135 Eqv. 7.136 Eqv. 7.137 Eqv. 7.138 Eqv. 7.139 Eqv. 7.140 Eqv. 7.141 Eqv. 7.142 7.11.4 + G → G1 , G+ 2 , G1 , G2 → TID(e1 ), + G → G1 , G+ 2 , G1 , G2 → TID(e1 ), + G → G1 , G2 , G1 , G+ 2 → TID(e1 ), + G → G1 , G2 , G1 , G+ 2 → TID(e1 ), + G → G1 , G+ , G , G 1 2 2 → TID(e1 ), + G → G1 , G2 , G1 , G+ 2 → TID(e1 ), + G → G1 , G2 , G → G1 , G+ 2, G → G1 , G+ 2. Left Outerjoin with Default Main Equivalences Let us next consider the outerjoin with default. For a set of attributes {d1 , . . . , dl } ⊆ A(e2 ) of e2 , constants c1 , . . . , cl and a vector D = d1 : c1 , . . . dl : cl , we now consider the expression ΓG;F (e1 ED q e2 ). If we take a close look at the proof of Eqv. 7.125 and think of E⊥ as being defined as E⊥ := (⊥A(e2 )\A(D) A {D}), 281 7.11. EQUIVALENCES FOR UNARY GROUPING we see that the proof remains valid. Thus, we have the following equivalences: ΓG;F (e1 ED q e2 ) ≡ ΓG;(F2 ⊗c1 )◦F12 (ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) Eq e2 ), D,F21 (∅),c2 :1 ΓG;F (e1 ED q e2 ) ≡ ΓG;(F1 ⊗c2 )◦F22 (e1 Eq ΓG;F (e1 ED q e2 ) ≡ (7.143) 1 1 ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 )), (7.144) 2 2 ΓG;F12 (ΓG+ ;F 1 (e1 ED q e2 )) 1 1 (7.145) if F2 is empty, D,F21 (∅) ΓG;F (e1 ED q e2 ) ≡ ΓG;F22 (e1 Eq ΓG+ ;F 1 (e2 )) 2 (7.146) 2 if F1 is empty, ΓG;F (e1 ED q e2 ) ≡ ΓG;(F2 ⊗c1 ) (ΓG+ ;(c1 :count(∗)) (e1 ) ED q e2 ) (7.147) 2 :1 ≡ ΓG;(F1 ⊗c2 ) (e1 ED,c ΓG+ ;c2 :count(∗) (e2 )) q (7.148) 2 :1 ≡ ΓG;(F1 ⊗c2 ) (ΓG+ ;F 1 (e1 ) ED,c ΓG+ ;c2 :count(∗) (e2 )) q (7.149) 1 if F1 is empty, ΓG;F (e1 ED q e2 ) 2 if F2 is empty, ΓG;F (e1 ED q e2 ) 1 1 2 if F2 is empty, D,F21 (∅) ΓG;F (e1 ED q e2 ) ≡ ΓG;(F22 ⊗c1 ) (ΓG+ ;c1 :count(∗) (e1 ) Eq 1 ΓG+ ;F 1 (e2 )) 2 2 (7.150) if F1 is empty, ΓG;F (e1 ED q e2 ) ≡ ΓG;(F1 ⊗c2 )◦(F2 ⊗c1 ) ( (7.151) D,F21 (∅),c2 :1 ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) Eq 1 1 ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 )). 2 2 These equivalences hold under the same conditions as their corresponding equivalences for the outerjoin with no default. Eliminating the top grouping This can be performed analogously to the left outerjoin without default. 7.11.5 Full Outerjoin The next expression we consider is ΓG;F (e1 Kq e2 ). In order to deal with this expression, we will need the full outerjoin with defaults for both sides. 
Define E1⊥ = {⊥A(e1 ) } and let us start by observing ΓG;F (e1 Kq e2 ) ≡ ΓG;F ((e1 Eq e2 ) ∪ ((e2 Tq e1 ) A E1⊥ )) ≡ ΓG;F 2 (ΓG;F 1 (e1 Eq e2 ) ∪ ΓG;F 1 ((e2 Tq e1 ) A E1⊥ )). Applying Eqv. 7.125 to the left-hand side of the union results in ΓG;F 1 (e1 Eq e2 ) ≡ ΓG;(F 1 ⊗c1 )◦F 1,2 (ΓG+ ;F 1,1 ◦(c1 :count(∗)) (e1 ) Eq e2 ) 2 ≡ 1 1 1 F 1,1 (∅),c2 :1 ΓG;(F 1 ⊗c2 )◦F 1,2 (e1 Eq 2 ΓG+ ;F 1,1 ◦(c2 :count(∗)) (e2 )). 1 2 2 2 282 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES Applying Eqvs. 7.104 and 7.23 to the right-hand side of the union yields ΓG;F 1 ((e2 Tq e1 ) A E1⊥ ) ≡ ΓG;(F 1 ⊗c2 )◦F 1,2 (ΓG+ ;F 1,1 ◦(c2 :count(∗)) (e2 Tq e1 ) A E1⊥ ) 2 1 2 2 ≡ ΓG;(F 1 ⊗c2 )◦F 1,2 ((ΓG+ ;F 1,1 ◦(c2 :count(∗)) (e2 ) Tq e1 ) A E1⊥ ). 2 2 2 1 Putting these things together, we have ΓG;F (e1 Kq e2 ) ≡ ΓG;F 2 ( F 1,1 (∅),c2 :1 ΓG;(F 1 ⊗c2 )◦F 1,2 (e1 Eq 2 2 1 ΓG+ ;F 1,1 ◦(c2 :count(∗)) (e2 )) 2 ∪ 2 (ΓG;(F 1 ⊗c2 )◦F 1,2 ((ΓG+ ;F 1,1 ◦(c2 :count(∗)) (e2 ) Tq e1 ) A E1⊥ ))) 1 2 2 2 ≡ ΓG;(F1 ⊗c2 )◦F22 ( F 1,1 (∅),c2 :1 (e1 Eq 2 ∪ ΓG+ ;F 1,1 ◦(c2 :count(∗)) (e2 )) 2 2 ((ΓG+ ;F 1,1 ◦(c2 :count(∗)) (e2 ) Tq e1 ) A E1⊥ )) 2 2 −;F21,1 (∅),c2 :1 ≡ ΓG;(F1 ⊗c2 )◦F22 (e1 Kq ΓG+ ;F 1,1 ◦(c2 :count(∗)) (e2 )). 2 2 Eager/Lazy Groupby-Count Due to the commutativity of the full outerjoin, we thus have F 1 (∅),c1 :1;− ΓG;F (e1 Kq e2 ) ≡ ΓG;(F2 ⊗c1 )◦F12 (ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 )Kq 1 1 1 e2 ) (7.152) if F is splittable and F1 is decomposable into F11 and F12 . If F is splittable and F2 is decomposable into F21 and F22 , −;F21 (∅),c2 :1 ΓG;F (e1 Kq e2 ) ≡ ΓG;(F1 ⊗c2 )◦F22 (e1 Kq ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 )) (7.153) 2 2 holds. Eager/Lazy Group-by If F2 is empty, then Eqv. 7.152 simplifies to F 1 (∅);− ΓG;F (e1 Kq e2 ) ≡ ΓG;F12 (ΓG+ ;F 1 (e1 ) Kq 1 1 1 e2 ). (7.154) This equivalence holds if F1 is decomposable into F11 and F12 . If F1 is empty, then Eqv. 7.153 simplifies to −;F21 (∅) ΓG;F (e1 Kq e2 ) ≡ ΓG;F22 (e1 Kq ΓG+ ;F 1 (e2 )). 2 2 This equivalence holds if F2 is decomposable into F21 and F22 . (7.155) 283 7.11. EQUIVALENCES FOR UNARY GROUPING Eager/Lazy Count If F1 is empty, then Eqv. 7.152 simplifies to ΓG;F (e1 Kq e2 ) ≡ ΓG;(F2 ⊗c1 ) (ΓG+ ;(c1 :count(∗)) (e1 ) Kcq1 :1;− e2 ). (7.156) 1 If F2 is empty, then Eqv. 7.153 simplifies to 2 :1 ΓG;F (e1 Kq e2 ) ≡ ΓG;(F1 ⊗c2 ) (e1 K−;c ΓG+ ;(c2 :count(∗)) (e2 )). q (7.157) 2 Double Eager/Lazy If F2 is empty, the equivalence F 1 (∅);c2 :1 ΓG;F (e1 Kq e2 ) ≡ ΓG;(F12 ⊗c2 ) (ΓG+ ;F 1 (e1 ) Kq 1 ΓG+ ;(c2 :count(∗)) (e2 )) 2 (7.158) 1 2 holds if F1 is decomposable into F1 and F1 . If F1 is empty, the equivalence 1 1 c :1;F21 (∅) ΓG;F (e1 Kq e2 ) ≡ ΓG;(F22 ⊗c1 ) (ΓG+ ;(c1 :count(∗)) (e1 ) Kq1 1 ΓG+ ;F 1 (e2 )) 2 2 (7.159) holds if F2 is decomposable into F21 and F22 . Proof: If F2 is empty, then F 1 (∅);− ΓG;F (e1 Kq e2 ) ≡Eqv. 7.154 ΓG;F12 (ΓG+ ;F 1 (e1 ) Kq 1 1 1 e2 ) F 1 (∅);c2 :1 ≡Eqv. 7.157 ΓG;(F12 ⊗c2 ) (ΓG+ ;F 1 (e1 ) Kq 1 1 1 ΓG+ ;(c2 :count(∗)) (e2 )). 2 If F1 is empty, then −;F21 (∅) ΓG;F (e1 Kq e2 ) ≡Eqv. 7.155 ΓG;F22 (e1 Kq ≡Eqv. 7.156 ΓG+ ;F 1 (e2 )) 2 2 c :1;F21 (∅) ΓG;(F22 ⊗c1 ) (ΓG+ ;(c1 :count(∗)) (e1 ) Kq1 ΓG+ ;F 1 (e2 )). 2 1 2 2 Eager/Lazy Split If F is splittable and decomposable, then ΓG;F (e1 Kq e2 ) ≡ ΓG;(F12 ⊗c2 )◦(F22 ⊗c1 ) ( (7.160) F 1,1 (∅),c1 :1;F21,1 (∅),c2 :1 ΓG+ ;F 1,1 ◦(c1 :count(∗)) (e1 ) Kq 1 1 1 ΓG+ ;F 1,1 ◦(c2 :count(∗)) (e2 )). 2 2 Proof: ΓG;F (e1 Kq e2 ) F 1 (∅),c1 :1;− ≡Eqv. 7.152 ΓG;(F2 ⊗c1 )◦F12 (ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) Kq 1 1 1 F 1 (∅),c1 :1;F21 (∅),c2 :1 ≡Eqv. 
7.153 ΓG;(F12 ⊗c2 )◦(F22 ⊗c1 ) (ΓG+ ;F 1,1 ◦(c1 :count(∗)) (e1 ) Kq 1 1 1 e2 )) ΓG+ ;F 1,1 ◦(c2 :count(∗)) (e2 )) 2 2 2 284 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES Eliminating the top grouping Under the same conditions under which their counterparts are valid, the following equivalences hold for the full outerjoin: F 1 (∅),c1 :1;− 1 ΓG;F (e1 Kq e2 ) ≡ ΠC (χ(F\ c2 (ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) Kq ⊗c )◦F 2 1 1 1 1 −;F21 (∅),c2 :1 ΓG;F (e1 Kq e2 ) ≡ ΠC (χ(F\ c2 (e1 Kq ⊗c )◦F 1 2 ΓG+ ;F 1 ◦(c2 :count(∗)) (e(7.162) 2 ))), 2 2 (7.161) e2 )), 2 ΠC (ΓG+ ;F (e1 ) KFq (∅);− e2 ), 1 (7.163) ΓG;F (e1 Kq e2 ) ≡ ΠC (e1 Kq−;F (∅) ΓG+ ;F (e2 )), (7.164) ΓG;F (e1 Kq e2 ) ≡ 2 (ΓG+ ;(c1 :count(∗)) (e1 ) Kcq1 :1;− e2 )), ΓG;F (e1 Kq e2 ) ≡ ΠC (χG;F\ ⊗c 2 1 1 2 :1 ΓG;F (e1 Kq e2 ) ≡ ΠC (χF\ (e1 K−;c ΓG+ ;(c2 :count(∗)) (e2 ))), q ⊗c 1 2 2 F 1 (∅);c2 :1 ΓG;F (e1 Kq e2 ) ≡ ΠC (χ \ (ΓG+ ;F 1 (e1 ) Kq 1 2 F1 ⊗c2 ΓG;F (e1 Kq e2 ) ≡ ΠC (χ ΓG;F (e1 Kq e2 ) ≡ ΠC (χ 1 1 (7.165) (7.166) ΓG+ ;(c2 :count(∗)) (e(7.167) 2 ))), 2 c :1;F 1 (∅) (ΓG+ ;(c1 :count(∗)) (e1 ) Kq1 2 ΓG+ ;F 1(7.168) (e2 ))), \ 2 G;F2 ⊗c1 1 2 2 ( \ 2 2 G;F\ 1 ⊗c2 ◦F2 ⊗c1 ΓG+ ;F 1,1 ◦(c1 :count(∗)) (e1 ) 1 1 F 1,1 (∅),c1 :1;F21,1 (∅),c2 :1 Kq 1 ΓG+ ;F 1,1 ◦(c2 :count(∗)) (e2 ))). (7.169) 2 7.11.6 2 D-Join Next, let us turn to the d-join. The outline of this subsection mirrors the one for regular joins. Indeed, all equivalences that hold for regular joins will also hold for d-joins. Eager/Lazy Groupby-Count The equivalence ΓG;F (e1 Cq e2 ) ≡ ΓG;(F2 ⊗c1 )◦F12 (ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) Cq e2 ) 1 1 (7.170) holds if F1 is splittable into F1 and F2 , and F1 is decomposable into F11 and F12 . The equivalence ΓG;F (e1 Cq e2 ) ≡ ΓG;(F1 ⊗c2 )◦F22 (e1 Cq ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 )) 2 2 (7.171) holds if F2 is splittable into F1 and F2 , and F2 is decomposable into F21 and F22 . Eager/Lazy Group-by If F2 is empty, that is F2 = (), Eqv. 7.170 simplifies to ΓG;F (e1 Cq e2 ) ≡ ΓG;F12 (ΓG+ ;F 1 (e1 ) Cq e2 ). 1 1 (7.172) 285 7.11. EQUIVALENCES FOR UNARY GROUPING This equivalence holds if F1 is splittable and decomposable into F11 and F12 . If F1 is empty, Eqv. 7.171 simplifies to ΓG;F (e1 Cq e2 ) ≡ ΓG;F22 (e1 Cq ΓG+ ;F 1 (e2 )). 2 (7.173) 2 This equivalence holds if F2 is splittable and decomposable into F21 and F22 . Eager/Lazy Count If F1 is empty, Eqv. 7.170 simplifies to ΓG;F (e1 Cq e2 ) ≡ ΓG;(F2 ⊗c1 ) (ΓG+ ;c1 :count(∗) (e1 ) Cq e2 ). 1 (7.174) If F2 is empty, then Eqv. 7.171 simplifies to ΓG;F (e1 Cq e2 ) ≡ ΓG;(F1 ⊗c2 ) (e1 Cq ΓG+ ;c2 :count(∗) (e2 )). 2 (7.175) Double Eager/Lazy If F2 is empty ΓG;F (e1 Cq e2 ) ≡ ΓG;(F12 ⊗c2 ) (ΓG+ ;F 1 (e1 ) Cq ΓG+ ;c2 :count(∗) (e2 )), 1 1 2 (7.176) if F1 is splittable and decomposable into F11 and F12 . If F1 is empty ΓG;F (e1 Cq e2 ) ≡ ΓG;(F22 ⊗c1 ) (ΓG+ ;c1 :count(∗) (e1 ) Cq ΓG+ ;F 1 (e2 )) 1 2 2 (7.177) holds if F2 is splittable decomposable into F21 and F22 . Eager/Lazy Split Applying Eqv. 7.170 and then Eqv. 7.171 results in the equivalence ΓG;F (e1 Cq e2 ) ≡ ΓG;(F12 ⊗c2 )◦(F22 ⊗c1 ) ( (7.178) ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) Cq ΓG+ ;F 1 ◦(c2 :count(∗)) (e2 )), 1 1 2 2 which holds if F is splittable into F1 and F2 , F1 is decomposable into F11 and F12 , and F2 is decomposable into F21 and F22 . Eliminating the top grouping The top grouping can be eliminated under the conditions for the regular joins. 286 CHAPTER 7. 
AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES 7.11.7 Groupjoin Simple Facts about the Groupjoin Last in this section, we consider the groupjoin and thus the expressions of the form ΓG;F (e1 Zq;F̂ e2 ). Before we start, we discuss some equivalences for the groupjoin. Since σ and χ are linear and Z is linear in its left argument, it is easy to show that σp (e1 ZG1 θG2 ;g:f e2 ) ≡ σp (e1 ) ZG1 θG2 ;g:f e2 , χa:e (e1 ZG1 θG2 ;g:f e2 ) ≡ χa:e (e1 ) ZG1 θG2 ;g:f e2 . (7.179) (7.180) Then, we note that unary grouping can be expressed with the help of the groupjoin. ΓθG;f (e) ≡ ΠC (ρA(e1 )′ ←A(e1 ) (ρA(e1 )←A(e1 )′ (ΠD G (e)) ZG′ θG;f e)), ΓθG;g;F (e) ≡ ΠC (ρA(e1 )′ ←A(e1 ) (ρA(e1 )←A(e1 )′ (ΠD G (e)) ZG′ θG;g;F e)), ΓθG;F (e) ≡ ΠC (ρA(e1 )′ ←A(e1 ) (ρA(e1 )←A(e1 )′ (ΠD G (e)) ZG′ θG;F e)), where C on the right-hand side of an equivalence contains all attributes provided in the result of the left-hand side of the equivalence. The groupjoin itself can be expressed with the help of unary grouping and a left outerjoin: e1 ZG1 θG2 ;f e2 ≡ ΠC (e1 EG1 =G2 ΓθG2 ;f (e2 )), f (∅) (7.181) e1 ZG1 θG2 ;g;F e2 ≡ ΠC (e1 EG1 =G2 ΓθG2 ;g;F (e2 )), F (∅) (7.182) e1 ZG1 θG2 ;F e2 ≡ ΠC (e1 EG1 =G2 ΓθG2 ;F (e2 )), F (∅) (7.183) where C = G∪A(F ). We need to attach a small correction to these equivalences. Consider for example Eqv. 7.183. It only holds if F (∅) = F ({⊥A(e2 ) }). This is true in SQL-92 for min, max, sum, count(a), but not count(*). More precisely, count(*) yields 0 if the input is the empty set, and 1 if it is applied to some nulltuple. Thus, the right-hand side yields 0 for empty groups, whereas it should produce 1. Obviously, this problem can easily be fixed in the left outerjoin by using the correct default value of 1 for all attributes containing the result of a count(*). Hence, we define count(∗)(∅) := 1 in the context of default values for outerjoins. Thus, the above equivalences now read f ({⊥A(e ) }) e1 ZG1 θG2 ;f e2 ≡ ΠC (e1 EG1 =G2 2 ΓθG2 ;f (e2 )), (7.184) e1 ZG1 θG2 ;g;F e2 ≡ ΠC (e1 EG1 =G2 2 ΓθG2 ;g;F (e2 )), (7.185) F ({⊥A(e ) }) e1 ZG1 θG2 ;F e2 ≡ F ({⊥A(e ) }) ΠC (e1 EG1 =G2 2 ΓθG2 ;F (e2 )). (7.186) Apart from this detail, these equivalences follow directly from the definition of the groupjoin. 287 7.11. EQUIVALENCES FOR UNARY GROUPING For the regular join, we can apply a selection to get rid of tuples not finding a join partner by counting the number of join partners. This leads to the following equivalences: ΠC (e1 BG1 =G2 ΓθG2 ;g;F (e2 )) ≡ σc2 >0 (e1 ZG1 θG2 ;g;F ◦(c2 :|g|) e2 ), ΠC (e1 BG1 =G2 ΓθG2 ;F (e2 )) ≡ σc2 >0 (e1 ZG1 θG2 ;F ◦(c2 :count(∗)) e2 ), ΠC (e1 BG1 =G2 ΓG2 ;g;F (e2 )) ≡ σc2 >0 (e1 ZG1 =G2 ;g;F ◦(c2 :|g|) e2 ), ΠC (e1 BG1 =G2 ΓG2 ;F (e2 )) ≡ σc2 >0 (e1 ZG1 =G2 ;F ◦(c2 :count(∗)) e2 ). Pushing Grouping into the Groupjoin The general assumptions for the next three equivalences are as follows. Let F and F be two aggregation vectors. Let G be a set of grouping attributes such that G ⊆ A(e1 ) ∪ A(F ). Let J1 and J2 be non-empty sets of attributes with J1 ⊆ A(e1 ) and J2 ⊆ A(F ). Define G1 = G ∩ A(e1 ) and G+ 1 = G1 ∪ J1 . Assume F is splittable into F1 and F2 , and F1 is decomposable into F11 and F12 . Then ΓG;F (e1 ZJ1 θJ2 ;F e2 ) ≡ ΓG;(F2 ⊗c1 )◦F12 (ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) ZJ1 θJ2 ;F e2 )). 1 1 (7.187) Note that F2 can only use attributes from F . Before we state the proof, note that two simplifications are derivable: If F2 is empty, then ΓG;F (e1 ZJ1 θJ2 ;F e2 ) ≡ ΓG;F12 (ΓG+ ;F 1 (e1 ) ZJ1 θJ2 ;F e2 )) 1 1 (7.188) holds if F1 is decomposable into F11 and F12 . 
If F1 is empty, then ΓG;F (e1 ZJ1 θJ2 ;F e2 ) ≡ ΓG;(F2 ⊗c1 ) (ΓG+ ;(c1 :count(∗)) (e1 ) ZJ1 θJ2 ;F e2 )) (7.189) 1 holds if F2 is decomposable into F21 and F22 . Trying to push the outer unary grouping into the right argument of the groupjoin does not make sense, since the right-hand side of a groupjoin will already be grouped by the groupjoin itself and a double grouping is not beneficial. However, it could be done. Proof of Eqv. 7.187: F (∅) ΓG;F (e1 ZJ1 θJ2 ;F e2 ) ≡7.183 ΓG;F (e1 EJ1 =J2 ΓθJ2 ,F (e2 )) F (∅) ≡7.143 ΓG;(F2 ⊗c1 )◦F12 (ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) EJ1 =J2 ΓθJ2 ,F (e2 ))) 1 1 ≡7.183 ΓG;(F2 ⊗c1 )◦F12 (ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) ZJ1 θJ2 ;F e2 )) 1 1 2 Eliminating the top grouping Since ΠG+ (ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 )) is duplicate-free, we can apply Eqv. 7.97 to 1 1 1 Eqv. 7.187 if G → G+ holds. With C = G ∪ A(F ), this gives us ΓG;F (e1 ZJ1 θJ2 ;F e2 ) ≡ ΠC (χF\ c2 (ΓG+ ;F 1 ◦(c1 :count(∗)) (e1 ) ZJ1 θJ2 ;F e2 )). ⊗c ◦F 2 1 1 1 1 (7.190) 288 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES Simplifications result in the following equivalences, which also hold if G → G+ holds: ΓG;F (e1 ZJ1 θJ2 ;F e2 ) ≡ ΠC (ΓG+ ;F (e1 ) ZJ1 θJ2 ;F e2 ), (7.191) 1 (ΓG+ ;(c1 :count(∗)) (e1 ) ZJ1 θJ2 ;F e2 )).(7.192) ΓG;F (e1 ZJ1 θJ2 ;F e2 ) ≡ ΠC (χF\ ⊗c 2 1 1 The first equivalence additionally needs that F2 is empty, the second that F1 is empty. Important Operator Conversions We now introduce two equivalences which allow us to replace a sequence of a grouping operator and a left-outerjoin/join by a single groupjoin [621]. For i = 1, 2, let ei be algebraic expressions and J1 = J2 be a join predicate, such that for the join attributes Ji ⊆ A(ei ) holds. For a set of grouping attributes G, define Gi = G ∩ A(ei ) and G+ i = Gi ∪ Ji . Further, let F be a splittable and decomposable aggregation vector with F(F ) ⊆ A(e2 ). We denote by C the set of attributes occurring in the result, i.e., C = G ∪ A(F ). Then, the equivalence ΓG;F (e1 EJ1 =J2 e2 ) ≡ ΠC (e1 ZJ1 =J2 ;F e2 ) (7.193) holds under the conditions that + 1. G → G+ 2 and G1 , G2 → TID(e1 ) hold in e1 EJ1 =J2 e2 , 2. J2 → G+ 2 holds in e2 , 3. F(F ) ⊆ A(e2 ), and 4. F (∅) = F ({⊥A(e2 ) }). We discuss these conditions to provide the intuition behind them. The two conditions under 1. stem from the main theorem of Yan and Larson in [946]. They assure that a grouping can be pushed into a regular join. In our context, the condition G1 , G+ 2 → TID(e1 ) assures that no two tuples from e1 belong to the same group. This is necessary since the groupjoin on the right-hand side provides exactly one output tuple for each input tuple of e1 . The condition + G → G+ 2 implies that grouping by G2 is not finer grained than grouping by G, which would lead to problems. In case the second condition (J2 → G+ 2 ) is not fulfilled, we would have more groups on the left-hand side than on the right-hand side of our equivalence, which would violate it. This is easy to see if we add to G an evil attribute from e2 , which is not functionally determined by J2 . The importance of the functional dependencies is illustrated in the examples below. The third condition (F(F ) ⊆ A(e2 )) can actually be relaxed if we maintain a final map operator (see Eqvs. 7.117 and 7.135). The fourth condition follows from the discussion of Eqv. 7.183. Eqv. 7.193 is important since it allows us to replace a unary grouping and a left outerjoin by a groupjoin. This is very beneficial in several scenarios. 289 7.11. 
EQUIVALENCES FOR UNARY GROUPING R1 a 1 R2 a 1 1 R3 a b 1 1 1 2 c 1 1 S d 8 9 e 1 2 Figure 7.15: Example relations m1 : R1 Ea=c S a c d e 1 1 8 1 1 1 9 2 m2 : R2 Ea=c S a c d e 1 1 8 1 1 1 9 2 1 1 8 1 1 1 9 2 m3 : R3 Eb=e S a b c d e 1 1 1 8 1 1 2 1 9 2 Figure 7.16: Join results Consider just the one where all these operators have a hash-based implementation in a main-memory setting. Then, the left-hand side requires to build two hash tables, whereas the right-hand side requires to build only one. Further, no intermediate result tuples for the outerjoin have to be built. The second equivalence replaces a sequence of a join and a grouping by a groupjoin. Given the notations of the previous subsection, the equivalence ΓG;F (e1 BJ1 =J2 e2 ) ≡ ΠC (σc2 >0 (e1 ZJ1 =J2 ;F ◦(c2 :count(∗)) e2 )) (7.194) holds under the conditions that + 1. G → G+ 2 and G1 , G2 → TID(e1 ) hold in e1 BJ1 =J2 e2 2. J2 → G+ 2 holds in e2 , and 3. F(F ) ⊆ A(e2 ). The intuition behind these conditions is the same as for the previous equivalence. The fourth condition could be omitted, since empty groups are eliminated by the selection σc2 >0 . Eqv. 7.194 is beneficial under similar circumstances as Eqv. 7.193. Before we come to the proofs, let us have a look at some examples. Fig. 7.15 contains some relations. The results of some outerjoins (Ri Eq S) with two different join predicates are given in Fig. 7.16. Since all tuples in some Ri always find a join partner, the results of the outerjoins are the same as the corresponding join results. We are now interested in the functional dependencies occurring in the conditions of our main equivalences. Therefore, we discuss four example instances of Eqv. 7.194, where at most one of the functional dependencies is violated: 290 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES r1 : R1 Za=c;sum(d) S a sum(d) 1 17 l1 : Γa;sum(d) (R1 Ea=c S) a sum(d) 1 17 l2 : Γa,e;sum(d) (R1 Ea=c S) a e sum(d) 1 1 8 1 2 9 r2 : R1 Za=c;sum(d) S a sum(d) 1 17 r3 : R2 Za=c;sum(d) S a sum(d) 1 17 1 17 r4 : R3 Zb=e;sum(d) S a b sum(d) 1 1 8 1 2 9 l3 : Γa;sum(d) (R2 Ea=c S) a sum(d) 1 34 l4 : Γa;sum(d) (R3 Eb=e S) a sum(d) 1 17 Figure 7.17: Left- and right-hand sides 1 2 3 4 G → G+ 2 + + + - G1 , G+ 2 → TID(e1 ) + + + J2 → G+ 2 + + + The according instances of the left-hand and right-hand side of Eqv. 7.194 are: 1 2 3 3 LHS Γa;sum(d) (R1 Ea=c S) Γa,e;sum(d) (R1 Ea=c S) Γa;sum(d) (R2 Ea=c S) Γa;sum(d) (R3 Eb=e S) RHS R1 Za=c;sum(d) S R1 Za=c;sum(d) S R2 Za=c;sum(d) S R3 Zb=e;sum(d) S The functional dependencies have to be checked on the join results given in Fig. 7.16. In order to help the reader to check the functional dependencies, we provide the following table holding the main attribute sets occurring in our main equivalences: 1 2 3 4 G {a} {a, e} {a} {a} G1 {a} {a} {a} {a} G2 ∅ {e} ∅ ∅ J2 {c} {c} {c} {e} G+ 2 {c} {c, e} {c} {e} Taking a look at Fig. 7.17, we see that both sides of the equivalence give the same result only if none of the functional dependencies is violated. 291 7.12. ELIMINATING REDUNDANT JOINS Proof of Eqv. 7.193 We now give the proof of Eqv. 7.193. We start with the right-hand side and transform it until we get the left-hand side: ΠC (e1 ZJ1 =J2 ;F e2 ) F (∅) ≡7.183 ΠC (e1 EJ1 =J2 ΓJ2 ;F (e2 )) ≡7.14 ≡7.137 F (∅) ΠC (e1 EJ1 =J2 ΓG+ ;F (e2 )) 2 ΓG;F (e1 EJ1 =J2 e2 )). The preconditions follow from collecting the preconditions of the different equivalences applied. 2 Proof of Eqv. 7.194 Eqv. 7.194 follows directly from Eqv. 7.193. 
An alternative is to modify the above proof by using Eqv. 7.187 instead of Eqv. 7.183 and Eqv. 7.117 instead of Eqv. 7.137. Remark Often, we encounter expression of the form ΓG;F (e1 ) ZJ1 =J2 e2 . If G = J1 , the hash table for the grouping can be reused by the groupjoin. Similarily, if G ⊇ J1 , any sorting produced to perform a sort-based grouping can be reused for a a sort-based groupjoin. 7.11.8 Intersection and Difference There is not much we can do in terms of pushing a unary grouping operator down an intersection or set difference. We can only change an explicit bag representation into a multiplicity-based bag representation. This gives us the following two equivalences: ΓG;F (e1 ∩ e2 ) ≡ ΓG;(F ⊗m) (χm:min(c1 ,c2 ) ( (7.195) ΓG;F (e1 \ e2 ) ≡ ΓG;(F ⊗m) (χm:c1 −̇c2 ( (7.196) E1 BA(e1 )=A(e2 )′ ρA(e2 )←A(e2 )′ (E2 ))), 2 :0 E1 EcA(e ′ 1 )=A(e2 ) ρA(e2 )←A(e2 )′ (E2 ))), where Ei is defined as ΓA(ei );ci :count(∗) (ei ) for i = 1, 2. 7.12 Eliminating Redundant Joins Since the join and outerjoin operations are very expensive, it makes sense to investigate possibilities to elimininate redundant joins and outerjoins. These often occur if queries use views [87, 658, 659], if queries are generated by frontend tools [327], if data is integrated from different source [165], or if objects are instantiated using relational views [88, 538]. One possibility studied intensively is to use tableaux techniques to detect and eliminate redundant joins [17, 18, 292 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES ?, 470, 471, 577, 759]. Chandra and Merlin show that finding the minimal conjunctive query for a given one is NP-hard [144]. The techniques discussed in the above papers apply to set semantics and not to bags. This deficiency was later remedied [159, 160, 221, 497]. Chaudhuri and Vardi as well as Ioannids and Ramakrishnan showed independently that the NP-hardness result still holds under bag semantics [159, 160, 449, 450]. Other work on join elimination occurs in the context of semantic query optimization. Early work here is by King [496]. The implementation of inner join elimination techniques in DBMSs is described by Cheng et al. for DB2 [172] and by Ghazal, Bhashyam, and Crolotte for Teradata [327]. They also describe the left-outer join elimination technique implemented in Teradata [328]. In this brief subsection, we present some simple algebraic equivalences that allow us to remove redundant joins and outerjoins. To derive an algorithm to eliminate all unnecessary joins is rather complicated and builds upon query containment and query equivalence. These will be discussed in depth in Chapter. 10. Clearly, it is beneficial to eliminate joins with relations, whose attributes are not needed to evaluate the query. Consider a simple algebraic expression containing a single join and another expression where the join has been eliminated: ΠA(e1 ) (e1 BA1 =A2 e2 ) ≡ e1 where Ai ⊆ A(ei ). This equivalence only holds if 1. for every tuple in e1 at most one join partner in e2 exists and 2. for every tuple in e1 at least one join partner in e2 exists. The first condition is easily satisfied if A2 is a (super-) key of e2 . The second condition demands that ΠA1 (e1 ) ⊆ ρA2 ←A1 (e2 ). However, this does not truely suffice. Additionally, we must have that all attributes in A1 are not null. If there is a referential integrity constraint e1 .A1 → e2 .A2 and the foreignkey attributes A1 are nullable, the join can still be eliminated if we add a not-null predicate on the foreign-key attributes [172]: . 
(e1 ) ΠA(e1 ) (e1 BA1 =A2 e2 ) ≡ σ¬(A1 =⊥) If we work with sets, things simplify a lot. Then, ΠD A(e1 ) (e1 BA1 =A2 e2 ) ≡ e1 holds whenever the second condition is fulfilled. Outerjoins are also easier. The equivalence ΠA(e1 ) (e1 EA1 =A2 e2 ) ≡ e1 holds if the first condition holds. Note that the above equivalences do not demand e1 and e2 to be different. Thus they can also be used to eliminate redundant self-(outer)-joins. Other pointers to the literature on join elimination are [164, 466, 855]. 7.13. SEMIJOIN AND ANTIJOIN REDUCER 7.13 293 Semijoin and Antijoin Reducer In the context of distributed and cluster-based database systems, it is important to reduce the amount of data shipped around [136, 264, 514, 666]. Introducing semijoin reducer is a common technique to achieve this. The according equivalences are: e1 Bq e2 ≡ e1 Bq (e2 Nq e1 ) (7.197) e1 Nq e2 ≡ e1 Nq (e2 Nq e1 ) (7.198) e1 Eq e2 ≡ e1 Eq (e2 Nq e1 ) (7.200) e1 Tq e2 ≡ e1 Tq (e2 Nq e1 ) e1 Zq;g:e e2 ≡ e1 Zq;g:e (e2 Nq e1 ) (7.199) (7.201) Assume we are given two relations R1 and R2 on two computers (stations) C1 and C2 and wish to calculate the join R1 Bp12 R2 . Let Ji := A(Ri ) ∩ F(p12 ) be the join attributes of the relations Ri . And define eJ2 := ΠD J2 (R2 ). Semijoin reduction then works by sending the projection of the join attributes, i.e., eJ2 of relation R2 to the computer C1 where the relation R1 resides. There, the semijoin R1 Np12 eJ2 is calculated. Then, we could send over the result to computer C2 and perform the join. However, sometimes it is beneficial to also reduce relation R2 , e.g., if the join is calculated at some computer C3 . Then, we can send the result of eJ21 := ΠD J1 (R1 Np12 eJ2 to computer C2 and use it to reduce R2 . However, if less than half of the values qualify, it is better to send the antijoin’s result, i.e., eA21 := ΠD J1 (R1 Tp12 eJ2 and use R2 Tp12 eA21 to reduce R2 . Integrating semijoin reducers into old-style plan generators has been described by Stocker, Kossmann, Braumandl, and Kemper [836]. 7.14 Outerjoin Simplification 7.15 Correct and Complete Exploration of the Core Search Space 7.15.1 The Core Search Space The core search space for a given operator tree, normally derived from the input query, is spanned by the transformations derived from the commutativity, associativity, l-asscom and r-asscom properties of the operators occurring in the input tree. Except for commutativity, these transformations are shown in Fig. 7.18 for some arbitrary binary operators ◦a and ◦b with their according predicates. Note the syntactic constraints on the left and remember that commutativity does not have these syntactic constraints. These syntactic constraints have one interesting consequence. Let us call a predicate p of some binary operator ◦ degenerate, if if does not reference relations from at least one argument side of ◦. Then, we can observe that the syntactic constraints for non-degenerate predicates imply that either associativity or l-asscom can be applied for left nesting but not both, and either associativity or r-asscom can be applied for right-nesting but not both. 294 CHAPTER 7. 
AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES ◦bpb assoc (◦a , ◦b ) F(pb ) ∩ A(e1 ) = ∅ e1 ◦bpb e2 e3 ◦apa e3 ◦apa e1 ◦bpb ≡ e1 e2 ◦apa r-asscom (◦a , ◦b ) F(pb ) ∩ A(e1 ) = ∅ e1 ◦bpb F(pa ) ∩ A(e3 ) = ∅ F(pa ) ∩ A(e2 ) = ∅ ≡ e2 l-asscom (◦a , ◦b ) F(pb ) ∩ A(e2 ) = ∅ e3 ◦apa F(pa ) ∩ A(e3 ) = ∅ ◦apa e1 e3 ◦bpb ◦bpb e2 e2 e3 ≡ e2 ◦apa e1 e3 Figure 7.18: Transformation rules for assoc, l-asscom, and r-asscom Fig. 7.19 shows an example of the seach space for an expression (e1 ◦a12 e2 )◦b13 e3 , where the subscripts of the operators indicate which arguments are referenced in their predicate. We observe that any expression in this search space can be reached by a sequence of at most two applications of commutativity, at most one application of associativity, l-asscom, or r-asscom, finally followed by at most two applications of commutativity. The total number of applications of commutativity can be restricted to 2. The case (e1 ◦a12 e2 ) ◦b23 e3 is left to the reader. The last observation only holds if there are no degenerate predictates and no cross products in the original plan. Fig. 7.20 shows all possible plans for two binary operators ◦a and ◦b . One can think of them as cross products. The plans are generated by applying assoc, l-asscom, r-asscom, and commutativity rewrites. Assume that the initial plan is the one in row 1 and column 3. The other plans in the first row are generated by using all rewrites but commutativity. The second row shows the plans derived from the plan above them by applying commutativity to the lower operator. The third row applies commutativity to the top operator of the plan above it in the first row. The fourth row applies commutativity to both operators. Thus, all plans in a column below a plan in the first row can be generated by at most two applications of commutativity. Of course, there are more possibilities to transform one plan into another. In order to indicate them, let us denote the matrix of plans by P . The 7.15. CORRECT AND COMPLETE EXPLORATION OF THE CORE SEARCH SPACE295 assoc (oa , ob ) comm (ob ) (e2 oa12 e1 ) ob13 e3 comm (oa ) comm (oa ) comm (ob ) (e1 oa12 e2 ) ob13 e3 l-asscomm (oa , ob ) comm (oa ) (e1 ob13 e3 ) oa12 e2 e3 ob13 (e1 oa12 e2 ) r-asscomm (oa , ob ) e2 oa12 (e1 ob13 e3 ) comm (ob ) comm (ob ) (e3 ob13 e1 ) oa12 e2 e3 ob13 (e2 oa12 e1 ) comm (oa ) e2 oa12 (e3 ob13 e1 ) assoc (ob , oa ) Figure 7.19: Core search space example application of transformations other than commutativity gives: P [2, i] ←→ P [3, i + 1] P [3, i] ←→ P [2, i + 1] P [4, i] ←→ P [4, i + 1] P [1, 1] ←→ P [4, 6] P [2, 1] ←→ P [3, 6] P [3, 1] ←→ P [3, 6] It is easy to see, that we need more than one of assoc, l-asscom, or r-asscom to get from P [1, 3] to, e.g., P [1, 1]. 7.15.2 Exploration How does the plan generator explore this search space? Remember the join ordering algorithms from Chapter 3, especially DPsub, DPsize, and DPccp, which are all based on dynamic programming. We extend the simple algorithm DPsub 296 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES 1 = r-assom(2) e2 e2 ◦b 2 = assoc(3) ◦a e1 ◦a ◦b e1 e3 e2 e3 ◦b ◦a e1 ◦a ◦b 3 ◦a ◦b e3 4 = l-asscom (3) ◦a e2 ◦b e1 e2 e1 e3 ◦b ◦a ◦a e3 ◦b e2 5 = assoc(4) e1 e1 ◦b 6 = r-asscom(5) ◦a e3 ◦a ◦b e3 e2 e1 e2 ◦b ◦a e3 ◦a ◦b e3 e1 e3 e2 e2 e1 e3 e1 e2 e3 e2 e1 ◦b ◦a ◦b ◦a ◦b ◦a ◦a e2 ◦b e1 e3 e2 e3 ◦b ◦a ◦a e3 e1 e2 ◦b e3 e2 e1 e1 e3 e3 ◦a e2 ◦b e1 e2 e1 e3 ◦b ◦a ◦a e2 e1 e2 ◦b e3 e1 ◦a e1 ◦b e3 e2 e1 e2 ◦b ◦a ◦a e1 e2 e3 Figure 7.20: The complete search space to one called DPsube. 
The resulting code is shown in Fig. ??. As input it takes the set of n relations R = {R0 , . . . , Rn−1 } and the set of operators O containing n − 1 operators which DPsube has to use in order to build a plan. First, it constructs plan for single relations. Then, it enumerates all subsets S of relations by decoding an integer, which represents a bitvector. For each set of relations S, DPsube then enumerates all subsets S1 of S and their complements S2 . Both of them must be non-empty. For each pair (S1 , S2 ), all operators ◦ in O are then tested for applicability via a call to applicable. If the operator is applicable, then the best plans p1 for S1 and p2 for S2 are recalled from the dynamic programming table BestPlan and combined into the plan p1 ◦ p2 for S. The costs of this plan are then calculated and it is possibly added to the DP-table. Since this piece of code is straight forward, we did not detail on it. Note that only if an operator is applicable then DPsube also considers p2 ◦ p1 if ◦ is commutative. The rest of the section deals with different implementations of applicable. Two implementations of applicable are described in the literature. Each of them uses a set of relations as a short-hand representation of possible reordering conflicts. The first set is called EEL, and is presented by Rao, Lindsay, Lohman, Pirahesh, and Simmen [717, 716]. The second set is called TES, and is presented by Moerkotte and Neumann [620, 619]. The first approach is limited to B, T, and E. Both approaches generate invalid plans, i.e., plans which are not equivalent to the input operator tree. Thus, we will present an alternative test. The main properties are that it will be correct and complete. An implementation of applicable is correct, if only valid plans are generated. It is complete, if all valid plans are generated. ◦b e2 e1 e3 e3 7.15. CORRECT AND COMPLETE EXPLORATION OF THE CORE SEARCH SPACE297 Algorithm DPsube a set of relations R = {R0 , . . . , Rn−1 } a set of operators O with associated conflict descriptors Output: an optimal bushy operator tree Input: for all Ri ∈ R BestPlan({Ri }) = Ri ; for 1 ≤ i < 2n − 1 ascending S = {Rj ∈ R|(⌊i/2j ⌋ mod 2) = 1} if (|S| = 1) continue for all S1 ⊂ S, S1 ̸= ∅ do S2 = S \ S1 ; for all ◦ ∈ O do if (applicable(◦, S1 , S2 )) build and handle the plans BestPlan(S1 ) ◦ BestPlan(S2 ) if (◦ is commutative) build and handle the plans BestPlan(S2 ) ◦ BestPlan(S1 ) return BestPlan(R); Figure 7.21: Algorithm DPsube Preliminaries In order to open our approach for new algebraic operators, we use a table driven approach. We use four tables which contain the properties of the algebraic operators. These contain the information of Tables 7.6 and 7.7 together with the information about the commutativity of the operators. Thus, extending our approach only requires to extend these tables. We develop our final approach in three steps. At each step, we present a complete bundle consisting of three components: 1. a representation for conflicts 2. a conflict detection (CD) algorithm, which detects the conflicts from an initial operator tree and produces a conflict represention for this operator, and 3. the implementation of applicable, which uses the conflict representation for an operator and then determines whether the operator can be applied in a given context. Each of the subsequently discussed bundles is correct, but only the last one is complete. The main idea in the following (and in the literature cited above) is to extend the consumer/producer constraints. 
Therefore, we first introduce syntactic eligibility sets (SES for short), which are attached to operators and contain the 298 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES set of relations that must be present before the operator can be applied. Sometimes, SES is called NEL. For every operator ◦, SES(◦) is thus a set of relations. Then, a plan of the form plan(S1 )◦plan(S2 ) is only considered if the test SES(◦) ⊆ S1 ∪S2 succeeds. Hence, SES checks for a consumer/producer relationships. Some operators like the groupjoin or map operator introduce new attributes. These are treated as if they belong to a new artificial relation. This new relation is present in the set of accessible relations after the groupjoin or map operator has been applied. We assume that an initial operator tree is given and refer to it as the operator tree. We need some notation. For a set of attributes A, we denote by REL(A) the set of relations to which these attributes belong. We abbreviate REL(F(e)) by FT (e). Let ◦ be an operator in the initial operator tree. We denote by left(◦) (right(◦)) its left (right) descendants. STO(◦) denotes the operators contained in the operator subtree rooted at ◦. REL(◦) denotes the set of relations contained in the subtree rooted at ◦. The syntactic eligibility set (SES) is used to express the syntactic constraints: all referenced attributes/relations must be present before an expression can be evaluated. First of all, it contains the relations referenced by a predicate. Further, as we also deal with table functions and dependent join operators as well as groupjoins, we need the following extensions. Let R be a relation, T a table-valued function call, ◦p any of our binary or unary operators except a groupjoin, and gj ∈ {Z, [}. Then, we define: SES(R) = {R} SES(T ) = {T } [ SES(◦p ) = R∈FT (p) SES(gjp;a1 :e1 ,...,an :en ) = SES(R) ∩ REL(◦p ) [ R∈FT (p)∪FT (ei ) SES(R) ∩ REL(gj) All conflict representations have a component TES which contains a set of tables. We always initialize TES with SES as calculated above. Further, we assume that our conflict representation has two accessors tesl and tesr which return tesl(◦) := TES(◦) ∩ REL(left(◦)) tesr(◦) := TES(◦) ∩ REL(right(◦)) This distinction is necessary, since we want to consider commutativity explicitly and prevent in those cases where commutativity does not hold, that operators which occurred on the left-hand side of an operator move to its right-hand side or vice versa. All our implementations of applicable conjunctively include the tests tesl ⊆ S1 , and tesr ⊆ S2 . 7.15. CORRECT AND COMPLETE EXPLORATION OF THE CORE SEARCH SPACE299 ◦b ◦a ◦a assoc −−→ e3 e1 ◦b e1 e2 e2 e3 ◦a l-asscom −−→ ◦b e2 ¬l-asscom(◦a , ◦b ) TES(◦b ) ∪ = REL(e2 ) e2 ¬assoc(◦b , ◦a ) TES(◦b ) ∪ = REL(e2 ) e1 e3 e3 ◦b ◦a assoc −−→ ◦a e1 e2 ◦b e3 e1 r-asscom −−→ e1 ◦a ◦b ¬assoc(◦a , ◦b ) TES(◦b ) ∪ = REL(e1 ) ¬r-asscom(◦a , ◦b ) TES(◦b ) ∪ = REL(e1 ) e3 e2 Figure 7.22: Calculating TES for simple operator trees Approach CD-A Let us first consider a simple operator tree with only two operators. Take a look at the upper half of Fig. 7.22. There, it illustrates the application of associativity and l-asscom to some plan. In the case that associativity does not hold, we add REL(e1 ) to TES(◦b ). This prevents the plan on the right-hand side of the arrow marked with assoc. It does not, however, prevent the plan on the right-hand side of the arrow marked with l-asscom. 
Similarily, adding REL(e2 ) to TES(◦b ) does prevent the plan resulting from l-asscom but not the plan resulting from applying associativity. The lower part of Fig. 7.22 shows the actions needed if an operator is nested in the right argument. Again, we can precisely prevent the invalid plans. The only problem we now have to solve is that a conflicting operator is deeper down the tree. This is possible since in general the ei are trees themselves. Some reordering could possibly move a conflicting operator up to the top of an argument subtree. We thus have to calculate the total eligibility sets bottom-up. In a first step, for every operator ◦ in a given operator tree SES(◦) is calculated. Then, TES(◦) is initialized to SES(◦). After that, the following procedure is applied bottom-up to every operator ◦apa in the operator tree: CD-A(◦bpb ) // operator ◦b and its predicate pb for ∀ ◦a ∈ STO(left(◦b )) if ¬assoc(◦a , ◦b ) then TES(◦b ) ∪ = REL(left(◦a )) 300 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES if ¬l-asscom(◦a , ◦b ) then TES(◦b ) ∪ = REL(right(◦a )) for ∀ ◦a ∈ STO(right(◦b )) if ¬assoc(◦b , ◦a ) then TES(◦b ) ∪ = REL(right(◦a )) if ¬r-asscom(◦a , ◦b ) then TES(◦b ) ∪ = REL(left(◦a )) If we do not have degenerate predicates and cross products among the operators in the initial operator tree, we can safely use TES instead of REL. The conflict representation comprises the TES for every operator. The definition of applicable is applicable(◦, S1 , S2 ) := tesl(◦) ⊆ S1 ∧ tesr(◦) ⊆ S2 . Let us now see why applicable is correct. We have to show that it prevents the generation of bad plans. Take the ¬ assoc case with nesting on the left. Let the original operator tree contain (e1 ◦a12 e2 ) ◦b23 e3 . Define the set of tables R2 := FT (◦b23 ) ∩ REL(left(◦b23 )) and R3 := FT (◦b23 ) ∩ REL(right(◦b23 )). Then SES(◦b23 ) = R2 ∪ R3 . Further, since ¬assoc(◦a12 , ◦b23 ), we have TES(◦b23 ) ⊇ SES(◦b23 ) ∪ REL(e1 ). Note that we used ⊇ and not equality since due to other conflicts, TES(◦b ) could be larger. Next, we observe that tesl(◦b23 ) ⊇ (SES(◦b23 ) ∪ REL(e1 )) ∩ REL(left(◦b23 )) = REL(e1 ) ∪ R2 tesr(◦b23 ) ⊇ (SES(◦b23 ) ∪ REL(e1 )) ∩ REL(right(◦b23 )) = R3 Let S1 , S2 be a pair of two arbitrary subsets of relations generated by DPsube. Then, the call applicable(◦b , S1 ,S2 ) checks tesl(◦b23 ) ⊆ S1 and tesr(◦b23 ) ⊆ S2 , and fails if S1 ̸⊇ REL(e1 ). Thus, neither e2 ◦b23 e3 nor e3 ◦b23 e2 will be generated and, hence, e1 ◦a12 (e2 ◦b23 e3 ) will not be generated. Similarily, if ¬l-asscom(◦a , ◦b ), tesl(◦b ) will contain REL(e2 ) and the test prevents the generation of e1 ◦b e3 . The remaining two cases can be checked analogously. From this discussion, it follows that DPsube generates only valid plans. However, it does not generate all valid plans. It is thus incomplete, as we can see from the example shown in Fig. 7.23. Since ¬assoc(N, E), TES(E) contains R1 . Thus, neither Plan 1 nor Plan 3 or any of those derived from applying join commutativity to them will be generated. Approach CD-B In order to avoid this problem, we need the more flexible mechanism of conflict rules. A conflict rule is simply a pair of sets of tables denoted by T1 → T2 . With every operator node ◦ in the operator tree, we associate a set of conflict 7.15. 
CORRECT AND COMPLETE EXPLORATION OF THE CORE SEARCH SPACE301 E2,3 B0,2 R3 B0,2 R2 N0,1 R0 N0,1 N0,1 R1 R0 E2,3 R1 R2 initial plan R1 B0,2 R3 R0 E R2 Plan 1 R3 Plan 3 Figure 7.23: Example showing the incompleteness of CD-A ◦b ◦a ◦a assoc e3 −−→ e1 ◦b e1 e2 ¬assoc(◦a , ◦b ) REL(e2 ) → REL(e1 ) CR(◦b ) + = e2 e3 ◦a l-asscom −−→ ◦b ¬l-asscom(◦a , ◦b ) REL(e1 ) → REL(e2 ) e2 CR(◦b ) + = e2 ¬assoc(◦b , ◦a ) CR(◦b ) + = REL(e1 ) → REL(e2 ) e1 e3 e3 ◦b ◦a assoc ◦a −−→ e1 e2 ◦b e3 e1 r-asscom −−→ e1 ◦a ◦b e3 e2 ¬r-asscom(◦a , ◦b ) CR(◦b ) + = REL(e2 ) → REL(e1 ) Figure 7.24: Calculating conflict rules for simple operator trees rules. Thus, our conflict representation now associates with every operator a TES and a set of conflict rules. Before we introduce their construction, let us illustrate their role in applicable(S1 , S2 ). A conflict rule T1 → T2 is obeyed for S1 and S2 , if with S = S1 ∪ S2 the following condition holds: T1 ∩ S ̸= ∅ =⇒ T2 ⊆ S. Thus, if T1 contains a single relation from S, then S must contain all relations in T2 . Keeping this in mind, it is easy to see that the invalid plans are indeed prevented by the rules shown in Fig. 7.24 if they are obeyed. As before, we just need to generalize it to arbitrary trees: 302 CHAPTER 7. AN ALGEBRA FOR SETS, BAGS, AND SEQUENCES B0,1 R0 B1,2 N1,3 B1,2 R1 R3 R0 R2 initial plan R2 B0,1 N1,3 R1 R3 valid plan prevented Figure 7.25: Example showing the incompleteness of CD-B CD-B(◦bpb ) // operator ◦b and its predicate pb for ∀ ◦a ∈ STO(left(◦b )) if ¬assoc(◦a , ◦b ) then CR(◦b ) + = REL(right(◦a )) → REL(left(◦a )) if ¬l-asscom(◦a , ◦b ) then CR(◦b ) + = REL(left(◦a )) → REL(right(◦a )) for ∀ ◦a ∈ STO(right(◦b )) if ¬assoc(◦b , ◦a ) then CR(◦b ) + = REL(left(◦a )) → REL(right(◦a )) if ¬r-asscom(◦a , ◦b ) then CR(◦b ) + = REL(right(◦a )) → REL(left(◦a )) The test applicable(◦, S1 , S2 ) checks two conditions: 1. tesl ⊆ S1 ∧ tesr ⊆ S2 must hold, and 2. all rules in the rule set of ◦ must be obeyed. Again, this implementation of applicable is correct but not complete, as the example in Fig. 7.25 shows. Since assoc(B, N) and l-asscom(B, N), the only conflict occurs due to r-asscom(B, N). Thus, REL({R3 }) → REL({R1 , R2 }) ∈ CR(B0,1 ) The latter rule prevents the plan on the right-hand side of Fig. 7.25. Note that it is overly careful since R2 ̸∈ FT (N1,3 ). In fact, r-asscom would never be applied in this example, since B0,1 accesses table R1 and applying r-asscom would thus destroy the consumer/producer relationship already checked by SES(B0,1 ). Approach CD-C The approach CD-C differs from CD-B only by the calculation of the conflict rules. The conflict representation and the procedure for applicable remain the same. The idea is now to learn from the above example and include only those relations under operator ◦a , which occur in the predicate. However, we have to be careful to include special cases for degenerate predicates and cross products. CD-C(◦bpb ) // operator ◦b and its predicate pb 7.15. 
Approach CD-C

The approach CD-C differs from CD-B only in the calculation of the conflict rules; the conflict representation and the procedure applicable remain the same. The idea is to learn from the above example and to include on the right-hand side of a rule only those relations below ◦a that occur in its predicate, i.e., in FT(◦a). However, we have to be careful to include special cases for degenerate predicates and cross products:

  CD-C(◦b, pb)  // operator ◦b and its predicate pb
    for all ◦a ∈ STO(left(◦b))
      if ¬assoc(◦a, ◦b) then
        if REL(left(◦a)) ∩ FT(◦a) ≠ ∅
        then CR(◦b) += REL(right(◦a)) → REL(left(◦a)) ∩ FT(◦a)
        else CR(◦b) += REL(right(◦a)) → REL(left(◦a))
      if ¬l-asscom(◦a, ◦b) then
        if REL(right(◦a)) ∩ FT(◦a) ≠ ∅
        then CR(◦b) += REL(left(◦a)) → REL(right(◦a)) ∩ FT(◦a)
        else CR(◦b) += REL(left(◦a)) → REL(right(◦a))
    for all ◦a ∈ STO(right(◦b))
      if ¬assoc(◦b, ◦a) then
        if REL(right(◦a)) ∩ FT(◦a) ≠ ∅
        then CR(◦b) += REL(left(◦a)) → REL(right(◦a)) ∩ FT(◦a)
        else CR(◦b) += REL(left(◦a)) → REL(right(◦a))
      if ¬r-asscom(◦a, ◦b) then
        if REL(left(◦a)) ∩ FT(◦a) ≠ ∅
        then CR(◦b) += REL(right(◦a)) → REL(left(◦a)) ∩ FT(◦a)
        else CR(◦b) += REL(right(◦a)) → REL(left(◦a))

Rule Simplification

A large TES makes the search space to be explored by the plan generator smaller and thus leads to higher efficiency, at least if an advanced plan generator like DPhyp is used. Further, reducing the number of rules slightly decreases plan generation time. Thus, laws like

  R1 → R2, R1 → R3  ≡  R1 → R2 ∪ R3
  R1 → R2, R3 → R2  ≡  R1 ∪ R3 → R2

can be used to rearrange the rule set for efficient evaluation. However, we are much more interested in eliminating rules altogether by adding their right-hand sides to the TES. For some operator ◦, consider a conflict rule R1 → R2. If R1 ∩ TES(◦) ≠ ∅, then we can add R2 to TES(◦), due to the existential quantifier on the left-hand side of a rule in the definition of obeyed. Further, if R2 ⊆ TES(◦), we can safely eliminate the rule. Applying these rearrangements is often possible, since both REL(left(◦a)) ∩ FT(◦a) and REL(right(◦a)) ∩ FT(◦a) will typically be non-empty.
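The two elimination steps lend themselves to a direct implementation. The following sketch (again ours, on top of the bitmask representation from above) absorbs rules into the TES and drops subsumed rules until a fixpoint is reached:

  #include <cstdint>
  #include <vector>

  using RelSet = uint64_t;
  struct ConflictRule { RelSet t1, t2; };

  // Simplify CR(o): absorb rules into tes where possible, drop subsumed rules.
  void simplifyRules(RelSet& tes, std::vector<ConflictRule>& rules) {
    bool changed = true;
    while (changed) {                 // absorbing one rule may enable more absorptions
      changed = false;
      std::vector<ConflictRule> remaining;
      for (const ConflictRule& r : rules) {
        if ((r.t1 & tes) != 0) {      // T1 ∩ TES ≠ ∅: add T2 to TES, drop the rule
          if ((r.t2 & ~tes) != 0) { tes |= r.t2; changed = true; }
        } else if ((r.t2 & ~tes) == 0) {
          // T2 ⊆ TES: the rule can never fail, so it is eliminated
        } else {
          remaining.push_back(r);
        }
      }
      rules.swap(remaining);
    }
  }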
7.15.3 More Issues

Unary Operators

Not all unary operators are freely reorderable (see Table 7.3). Fortunately, handling conflicts for unary operators is quite simple. We associate a new artificial relation with every unary operator (that needs one). For a given unary operator ◦a, denote this relation by AREL(◦a). Then, whenever a conflict with a unary operator ◦b above ◦a occurs, we add AREL(◦a) to TES(◦b). The result is that the two will never be reordered. This is captured in the following code fragment:

  CD-C (unary-unary conflicts)
    for each unary operator ◦b
      for each unary operator ◦a ∈ STO(◦b)
        if ¬reorderable(◦a, ◦b) then TES(◦b) += AREL(◦a)

Mixing Unary and Binary Operators

A unary operator may be left- or right-pushable into some binary operator (see Table 7.4). In the given original operator tree, a unary operator may occur above or below a binary operator. Accordingly, we extend CD-C by two more cases.

[Figure 7.26: Conflict detection for unary and binary operators: (a) a binary operator ◦a below a unary operator ◦b, with ◦b drawn as ◦b/σ/B and ◦a as ◦a/E; (b) and (c) a unary operator ◦a below the left or right input of a binary operator ◦b.]

Let us first consider the case where a binary operator ◦a can be found somewhere below a unary operator ◦b. This is illustrated in Fig. 7.26 a. (Do not be confused by the two dotted lines; they will be used later on. Just imagine a single line connecting ◦b with ◦a.) If ◦b is left- and right-pushable into ◦a, we do not have any conflict. If ◦b is neither left- nor right-pushable into ◦a, any valid plan must contain ◦b above ◦a. This is achieved by extending the TES of ◦b by all relations below ◦a. Consider the case where ◦b is not right-pushable. Then we must prevent any plan in which ◦b occurs in the right subtree of ◦a. This is achieved by adding a conflict rule to ◦b which says that if any relation from ◦a's right subtree occurs in the current plan to which we want to add ◦b, then the plan must also contain all relations from ◦a's left subtree. The other case is symmetric. We summarize these ideas in the following extension of CD-C:

  CD-C (unary above binary)
    for all unary operators ◦b in the original operator tree
      for all binary operators ◦a ∈ STO(◦b)
        if ¬left-pushable(◦b, ◦a) ∧ right-pushable(◦b, ◦a) then
          CR(◦b) += REL(left(◦a)) → REL(right(◦a))
        if left-pushable(◦b, ◦a) ∧ ¬right-pushable(◦b, ◦a) then
          CR(◦b) += REL(right(◦a)) → REL(left(◦a))
        if ¬left-pushable(◦b, ◦a) ∧ ¬right-pushable(◦b, ◦a) then
          TES(◦b) += REL(◦a)

Now we consider the case where a unary operator ◦a can be found somewhere below a binary operator ◦b (see Fig. 7.26 b, c). In this case, if ◦a cannot be pulled up, we prevent any reordering by adding the artificial relation AREL(◦a) to the TES of ◦b:

  CD-C (unary below binary)
    for all binary operators ◦b in the original operator tree
      for all unary operators ◦a ∈ STO(left(◦b))
        if ¬left-pushable(◦a, ◦b) then TES(◦b) += AREL(◦a)
      for all unary operators ◦a ∈ STO(right(◦b))
        if ¬right-pushable(◦a, ◦b) then TES(◦b) += AREL(◦a)

A selection operator can be changed into a join if its predicate references two or more relations. In this case, a conflict between the resulting join and some other binary operator might occur. We can handle these potential conflicts as follows. Consider Fig. 7.26 a. By ◦b/σ/B we denote our selection that can be turned into a join; by ◦a/E we denote a binary operator below the selection. That ◦a may be a left outerjoin is used in the subsequent example. The figure shows the trick we perform: we pretend that a selection that can be turned into a join has two arguments, a left and a right subtree, both of which point to the (only) child node of the selection. Thus, the left outerjoin is once the left child of the selection/join and once its right child. Then the usual CD-C procedure can be run in addition to the above conflict handling. Let us do so for the example. If we treat the left outerjoin as the left child of the selection/join, we derive from ¬assoc(E, B)

  CR(◦b) += REL(right(◦a)) → REL(left(◦a)),

possibly with ∩ FT(◦a) applied to the right-hand side. In the other case, we get, due to ¬r-asscom(B, E),

  CR(◦b) += REL(right(◦a)) → REL(left(◦a)),

again possibly with ∩ FT(◦a) applied to the right-hand side. In any case, both conflicts result in the same conflict rule. Further, both are subsumed by the above conflict handling for the unary/binary operator mix. The reader should validate that this is the case for all operators of our algebra. However, since we want to be purely table-driven, we simply add these (redundant) conflict rules and rely on rule simplification.

Cross Products and Degenerate Predicates

Cross products and degenerate predicates require more care, as the comparison between Fig. 7.19 and Fig. 7.20 shows. They lack the syntactic constraints due to attribute accesses, which highly restrict the number of syntactically valid plans. Consider an example like (R1 × R2) B1,3 (R3 N3,4 R4). So far, nothing prevents DPsube from considering invalid plans like R1 B1,3 (R3 N3,4 (R2 × R4)). Note that in order to prevent this plan, we would have to detect conflicts on the "other side" of the plan: in our example, we would need to consider conflicts between operators in the left and the right subtree of B1,3.
Since cross products and degenerate predicates should be rare in real queries, it suffices to produce correct plans; we have no ambition to explore the complete search space in these cases. Thus, we just want to make sure that even in these abnormal cases the plan generator still produces a correct plan. In order to do so, we proceed as follows. We extend the conflict representation by two bitvectors representing the left and the right relations of an operator; let us call them relLeft and relRight. Then, we extend the applicable test and check that at least one relation from relLeft occurs in the left subplan and at least one relation from relRight occurs in the right subplan. That is, in the test applicable(◦, S1, S2), we conjunctively check

  (relLeft ∩ S1 ≠ ∅) ∧ (relRight ∩ S2 ≠ ∅).

This results in a correct test but, as experiments have shown, about a third of the valid search space is not explored if cross products are present in the initial operator tree. However, note that if the initial plan contains neither cross products nor degenerate predicates, this test always succeeds, so in that case the whole core search space is still explored. Further, a larger portion of the core search space is explored compared to the approach by Rao et al. [716, 717]: there, two separate runs of the plan generator for the two arguments of a cross product hinder any reordering of operators across cross products.

There is a second issue concerning cross products. In some rare cases, they might be beneficial to introduce even if the initial plan does not demand them. In this case, we can proceed as proposed by Rao et al. [716, 717]. For each relation R, a companion set is calculated which contains all relations that are connected to R only by inner join predicates. Within a companion set, all join orders and all introductions of cross products are valid. A small sketch of this computation follows.
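Computing companion sets amounts to finding the connected components of the subgraph of the query graph induced by the inner-join edges. The following sketch is our own illustration (names and representation are ours); it assumes the query graph is given as a list of edges flagged as inner joins and that relation sets fit into a 64-bit mask:

  #include <cstdint>
  #include <numeric>
  #include <vector>

  struct JoinEdge { int r1, r2; bool innerJoin; };  // innerJoin: a non-degenerate inner join predicate

  // companion[i] = bitmask of all relations connected to relation i via inner joins only
  std::vector<uint64_t> companionSets(int n, const std::vector<JoinEdge>& edges) {
    std::vector<int> parent(n);
    std::iota(parent.begin(), parent.end(), 0);
    // union-find with path halving; ranks are omitted, which is fine for query-sized n
    auto find = [&](int x) {
      while (parent[x] != x) x = parent[x] = parent[parent[x]];
      return x;
    };
    for (const JoinEdge& e : edges)
      if (e.innerJoin) parent[find(e.r1)] = find(e.r2);
    std::vector<uint64_t> classSet(n, 0), companion(n, 0);
    for (int i = 0; i < n; ++i) classSet[find(i)] |= uint64_t(1) << i;
    for (int i = 0; i < n; ++i) companion[i] = classSet[find(i)];
    return companion;
  }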
Other Plan Generators

It is rather simple to incorporate our test into algorithms other than DPsube (e.g., DPsize, DPccp, TDxxx). However, the result is not necessarily efficient. An efficient approach is discussed in Chapter ??, where we generalize DPccp to cover hypergraphs. To see why this is appropriate, observe that (tesl(◦), tesr(◦)) is a hyperedge.

Beyond the Core Search Space

At the beginning, we talked about the core search space. Why core? Because there are more equivalences that we would like the plan generator to consider. First of all, early grouping can significantly improve performance. Then, some equivalences with operator conversions (e.g., Eqvs. ??, 7.94, 7.95, 7.193, and 7.194) are also important. These cases require some special treatment, which is discussed in Chapter ??.

7.16 Logical Algebra for Sequences

7.16.1 Introduction

The algebra (NAL) we use here extends the SAL algebra [70] developed by Beeri and Tzaban. SAL is the order-preserving counterpart of the algebra used in [189, 191] and in this book. SAL and NAL work on sequences of sets of variable bindings, i.e., sequences of unordered tuples where every attribute corresponds to a variable. We allow nested tuples, i.e., the value of an attribute may be a sequence of tuples. Single tuples are constructed by using the standard [·] brackets. The concatenation of tuples and functions is denoted by ◦. The set of attributes defined for an expression e is denoted by A(e). The set of free variables of an expression e is denoted by F(e). The projection of a tuple on a set of attributes A is denoted by |A.

For an expression e1 possibly containing free variables and a tuple e2, we denote by e1(e2) the result of evaluating e1, where the bindings of the free variables are taken from the variable bindings provided by e2. Of course, this requires F(e1) ⊆ A(e2). For a set of attributes A, we define the tuple constructor ⊥A such that it returns a tuple with the attributes in A initialized to NULL. For sequences e, we use α(e) to denote the first element of a sequence; we identify single-element sequences and elements. The function τ retrieves the tail of a sequence, and ⊕ concatenates two sequences. We denote the empty sequence by ϵ. As a first application, we construct from a sequence of non-tuple values e a sequence of tuples denoted by e[a]. It is empty if e is empty; otherwise, e[a] = [a : α(e)] ⊕ τ(e)[a]. By id we denote the identity function. In order to avoid special cases during the translation of XQuery into the algebra, we use the special algebraic operator □̂ that returns a singleton sequence consisting of the empty tuple, i.e., a tuple with no attributes.

We will only define order-preserving algebraic operators. For their unordered counterparts, see [191]. Typically, when translating a more complex XQuery into our algebra, a mixture of order-preserving and non-order-preserving operators will occur. In order to keep the section readable, we only employ the order-preserving operators and use the same notation for them that has been used in [189, 191], SAL [70], and this book. Again, our algebra allows the nesting of algebraic expressions. For example, within the selection predicate of a select operator we allow the occurrence of further nested algebraic expressions; hence, a join within a selection predicate is possible. This simplifies the translation of nested XQuery expressions into the algebra. However, nested algebraic expressions force a nested-loop evaluation strategy. Thus, the goal of this section is to remove nested algebraic expressions. As a result, we perform unnesting of nested queries not at the source level but at the algebraic level. This approach is more versatile and less error-prone.

7.16.2 Algebraic Operators

We define the algebraic operators recursively on their input sequences. For unary operators, if the input sequence is empty, the output sequence is also empty. For binary operators, the output sequence is empty whenever the left operand represents an empty sequence.

The order-preserving selection operator is defined as

  σ̂p(e) := ϵ                      if e = ϵ
           α(e) ⊕ σ̂p(τ(e))        if p(α(e))
           σ̂p(τ(e))               otherwise

For a list of attribute names A, we define the projection operator as

  Π̂A(e) := ϵ                      if e = ϵ
           α(e)|A ⊕ Π̂A(τ(e))      otherwise

We also define a duplicate-eliminating projection Π̂D_A. Besides the projection, its semantics is similar to that of the distinct-values function of XQuery: it does not preserve order. However, we require it to be deterministic and idempotent. Sometimes we just want to eliminate some attributes; when we want to eliminate the set of attributes A, we denote this by Π̂_Ā. We use Π̂ also for renaming attributes; then we write Π̂A′:A. The attributes in A are renamed to those in A′; attributes other than those in A remain untouched.

The map operator is defined as follows:

  χ̂a:e2(e1) := ϵ                                         if e1 = ϵ
               α(e1) ◦ [a : e2(α(e1))] ⊕ χ̂a:e2(τ(e1))    otherwise

It extends a given input tuple t1 ∈ e1 by a new attribute a whose value is computed by evaluating e2(t1). For an example, see Figure 7.27.
  e1 := R1      e2 := R2       e3 := χ̂a:σ̂a1=a2(e2)(e1)
  a1            a2   b         a1   a
  1             1    2         1    ⟨[1, 2], [1, 3]⟩
  2             1    3         2    ⟨[2, 4], [2, 5]⟩
  3             2    4         3    ⟨⟩
                2    5

  Figure 7.27: Example for the map operator

We define the cross product of two tuple sequences as

  e1 ×̂ e2 := ϵ                             if e1 = ϵ
             (α(e1) Â e2) ⊕ (τ(e1) ×̂ e2)   otherwise

where

  e1 Â e2 := ϵ                             if e2 = ϵ
             (e1 ◦ α(e2)) ⊕ (e1 Â τ(e2))   otherwise

We are now prepared to define the join operator on ordered sequences:

  e1 B̂p e2 := σ̂p(e1 ×̂ e2)

We define the semijoin as

  e1 N̂p e2 := α(e1) ⊕ (τ(e1) N̂p e2)   if ∃x ∈ e2 : p(α(e1) ◦ x)
              τ(e1) N̂p e2             otherwise

and the antijoin as

  e1 T̂p e2 := α(e1) ⊕ (τ(e1) T̂p e2)   if ¬∃x ∈ e2 : p(α(e1) ◦ x)
              τ(e1) T̂p e2             otherwise

The left outerjoin, which will play an essential role in unnesting, is defined as

  e1 Ê^{g:e}_p e2 := (α(e1) B̂p e2) ⊕ (τ(e1) Ê^{g:e}_p e2)                    if (α(e1) B̂p e2) ≠ ϵ
                     (α(e1) ◦ ⊥_{A(e2)\{g}} ◦ [g : e]) ⊕ (τ(e1) Ê^{g:e}_p e2)  otherwise

where g ∈ A(e2). Our definition deviates slightly from the standard left outerjoin operator, as we want to use it in conjunction with grouping and (aggregate) functions. Consider the relations R1 and R2 in Figure 7.28. If we want to join R1 (via a left outerjoin) with e3, which is grouped on a2, we need to be able to handle empty groups (as for the tuple with a1 = 3 in e1 in the example). In the definition of the left outerjoin with default, the expression e defines the value given to the attribute g for all those elements in e1 that do not find a join partner in e2. In our example, we would specify Ê^{g=0}.

We define the dependency join (d-join for short) as

  e1 <̂e2>̂ := ϵ                                      if e1 = ϵ
              (α(e1) Â e2(α(e1))) ⊕ (τ(e1) <̂e2>̂)    otherwise

Let θ ∈ {=, ≤, <, ≥, >, ≠} be a comparison operator on atomic values. The grouping operator, which produces a sequence-valued new attribute containing "the group", is defined by using a groupjoin:

  Γ̂_{θA;g:f}(e) := Π̂_{A:A′}(Π̂D_{A′:A}(Π̂_A(e)) Ẑ_{A′θA;g:f} e)

where the groupjoin operator (sometimes called nest-join [831]) is defined as

  e1 Ẑ_{A1θA2;g:f} e2 := ϵ                                                    if e1 = ϵ
                         α(e1) ◦ [g : G(α(e1))] ⊕ (τ(e1) Ẑ_{A1θA2;g:f} e2)    otherwise

Here, G(x) := f(σ̂_{x|A1 θ A2}(e2)), and the function f assigns a meaningful value to empty groups. See also Figure 7.28 for an example. The unary grouping operator processes a single relation and obviously groups only on those values that are present. The groupjoin works on two relations and uses the left-hand one to determine the groups. This will become important for the correctness of the unnesting procedure.

Given a tuple with a sequence-valued attribute, we can unnest it using the unnest operator defined as

  µ̂g(e) := ϵ                                           if e = ϵ
           (α(e)|_{A(e)\{g}} ×̂ α(e).g) ⊕ µ̂g(τ(e))      otherwise

where e.g retrieves the sequence of tuples of attribute g. In case g is empty, it returns the tuple ⊥_{A(e.g)}. (In our example in Figure 7.28, µ̂g(e4) = e2.)

We define the unnest map operator as follows:

  Υ̂_{a:e2}(e1) := µ̂g(χ̂_{g:e2[a]}(e1))

This operator is mainly used for evaluating XPath expressions. Since this is a very complex issue [338, 340, 416], we do not delve into optimizing XPath evaluation but instead take an XPath expression occurring in a query as it is and use it in the place of e2. Optimized translation of XPath is orthogonal to our unnesting approach and not covered here; the interested reader is referred to [416, 417].
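To make the recursive definitions concrete, here is a small illustrative rendering of the order-preserving selection and map operators; it is our own, much simplified sketch (atomic integer attribute values, sequences materialized as vectors), not an implementation of the full algebra:

  #include <functional>
  #include <map>
  #include <string>
  #include <vector>

  using Tuple = std::map<std::string, int>;  // toy tuples: attribute -> value
  using Seq   = std::vector<Tuple>;          // an ordered sequence of tuples

  // order-preserving selection: qualifying tuples are kept in input order
  Seq sigma(const Seq& e, const std::function<bool(const Tuple&)>& p) {
    Seq out;
    for (const Tuple& t : e)   // iterative form of the recursive definition
      if (p(t)) out.push_back(t);
    return out;
  }

  // map operator: extend each tuple by an attribute a computed from the tuple
  Seq chi(const Seq& e, const std::string& a,
          const std::function<int(const Tuple&)>& f) {
    Seq out;
    for (Tuple t : e) { t[a] = f(t); out.push_back(t); }
    return out;
  }

Note that both operators traverse the input sequence front to back, which is exactly what preserves the order demanded by the definitions.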
7.16.3 Equivalences

To acquaint the reader with ordered sequences, we state some familiar equivalences that still hold:

  σ̂p1(σ̂p2(e)) = σ̂p2(σ̂p1(e))
  σ̂p(e1 ×̂ e2) = σ̂p(e1) ×̂ e2
  σ̂p(e1 ×̂ e2) = e1 ×̂ σ̂p(e2)
  σ̂p1(e1 B̂p2 e2) = σ̂p1(e1) B̂p2 e2
  σ̂p1(e1 B̂p2 e2) = e1 B̂p2 σ̂p1(e2)
  σ̂p1(e1 N̂p2 e2) = σ̂p1(e1) N̂p2 e2
  σ̂p1(e1 Ê^{g:e}_{p2} e2) = σ̂p1(e1) Ê^{g:e}_{p2} e2
  e1 ×̂ (e2 ×̂ e3) = (e1 ×̂ e2) ×̂ e3
  e1 B̂p1 (e2 B̂p2 e3) = (e1 B̂p1 e2) B̂p2 e3
  σ̂p(e1 ×̂ e2) = e1 B̂p e2
  e1 <̂e2>̂ = e1 ×̂ e2

where the usual restrictions on the free variables of the predicates apply, and the last equivalence requires that e2 does not depend on e1.

[...] ≤, ≥, ≠. Any explicit use of equality can be eliminated as follows. For any literal of the form X = c, any occurrence of X is replaced by c and the equality literal is dropped from the query clause. For any literal of the form X = Y, any occurrence of Y is replaced by X and the equality literal is dropped from the query clause. This procedure is not possible for the other comparison operators <, >, ≤, ≥, ≠, which we call inequality operators. An inequality is any literal using an inequality operator.

Containment and minimization for conjunctive queries without inequalities are NP-complete problems. First note that a tableau directly corresponds to a conjunctive query whose body literals all have a common predicate. From this and the NP-completeness result for tableau containment, which in turn follows from first-order subsumption [65, 319], it follows that containment of conjunctive queries is NP-complete. Chandra and Merlin proved that minimization is NP-complete [143]. The complexity of checking the equivalence of conjunctive queries is related to graph isomorphism.

The procedure for checking query containment builds upon mappings from queries to queries. (In fact, Chandra and Merlin mapped natural models, which is essentially the same.) These mappings have two different names: homomorphism and containment mapping. Let q1 and q2 be the two queries for which we want to check containment. Assume the qi are of the form

  q1 : r1 :− l1, ..., lk
  q2 : r2 :− l1′, ..., lm′

Let V(qi) be the set of variables occurring in qi, and C(qi) the set of constants occurring in qi. Further, let h be a substitution h : V(q2) → (V(q1) ∪ C(q1)). We call h a containment mapping from q2 to q1 if and only if the following conditions are fulfilled:

1. h(r2) = r1 for the head literals, and
2. for all i (1 ≤ i ≤ m) there exists a j (1 ≤ j ≤ k) such that h(li′) = lj.

The latter condition states that for each body literal li′ of q2 there is a body literal lj of q1 such that h(li′) = lj. Note that this does not imply that h is injective or surjective. The following theorem connects containment mappings with the containment problem:

Theorem 10.1.1 Let q1 and q2 be two conjunctive queries. Then q1 ⊆ q2 if and only if there is a containment mapping h from q2 to q1.

Consider the following example:

  q1 : p(X1, X2) :− q(X2, X1), q(X1, X3)
  q2 : p(Y1, Y2) :− q(Y2, Y1), q(Y3, Y1), q(Y1, Y4)

Consider h with h(Y1) = X1, h(Y2) = X2, h(Y3) = X2, and h(Y4) = X3. Then

  l:     p(Y1, Y2)   q(Y2, Y1)   q(Y3, Y1)   q(Y1, Y4)
  h(l):  p(X1, X2)   q(X2, X1)   q(X2, X1)   q(X1, X3)

and, hence, q1 ⊆ q2.

A query q is minimal if it contains the minimal possible number of body literals. More formally, q is minimal if, for any query q′ with q ≡ q′, the number of body literals in q′ is greater than or equal to the number of body literals in q.
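Before turning to minimization, here is a brute-force sketch of the containment test of Theorem 10.1.1. It is our own illustration (names and representation are ours): it enumerates all substitutions from V(q2) into V(q1) ∪ C(q1), which is exponential and acceptable only for tiny queries, but it makes the two conditions explicit:

  #include <cctype>
  #include <map>
  #include <string>
  #include <vector>

  // Toy representation: identifiers starting with an upper-case letter are
  // variables, everything else is a constant.
  struct Literal { std::string pred; std::vector<std::string> args; };
  struct Query   { Literal head; std::vector<Literal> body; };

  static bool isVar(const std::string& s) {
    return !s.empty() && std::isupper(static_cast<unsigned char>(s[0]));
  }
  static Literal apply(const Literal& l, const std::map<std::string, std::string>& h) {
    Literal r = l;
    for (std::string& a : r.args) if (h.count(a)) a = h.at(a);
    return r;
  }
  static bool equalLit(const Literal& a, const Literal& b) {
    return a.pred == b.pred && a.args == b.args;
  }

  // check both conditions for a fully specified substitution h
  static bool isContainmentMapping(const Query& q1, const Query& q2,
                                   const std::map<std::string, std::string>& h) {
    if (!equalLit(apply(q2.head, h), q1.head)) return false;  // h(r2) = r1
    for (const Literal& l2 : q2.body) {                       // every h(l') occurs in q1's body
      bool found = false;
      for (const Literal& l1 : q1.body) found = found || equalLit(apply(l2, h), l1);
      if (!found) return false;
    }
    return true;
  }

  static bool search(const Query& q1, const Query& q2,
                     const std::vector<std::string>& vars, size_t i,
                     const std::vector<std::string>& targets,
                     std::map<std::string, std::string>& h) {
    if (i == vars.size()) return isContainmentMapping(q1, q2, h);
    for (const std::string& t : targets) {
      h[vars[i]] = t;
      if (search(q1, q2, vars, i + 1, targets, h)) return true;
    }
    return false;
  }

  // q1 ⊆ q2 iff a containment mapping from q2 to q1 exists (Theorem 10.1.1)
  bool contained(const Query& q1, const Query& q2) {
    std::vector<std::string> vars, targets;
    auto addUnique = [](std::vector<std::string>& v, const std::string& s) {
      for (const std::string& x : v) if (x == s) return;
      v.push_back(s);
    };
    for (const Literal& l : q2.body)
      for (const std::string& a : l.args) if (isVar(a)) addUnique(vars, a);
    for (const std::string& a : q2.head.args) if (isVar(a)) addUnique(vars, a);
    for (const Literal& l : q1.body)
      for (const std::string& a : l.args) addUnique(targets, a);  // V(q1) ∪ C(q1)
    for (const std::string& a : q1.head.args) addUnique(targets, a);
    std::map<std::string, std::string> h;
    return search(q1, q2, vars, 0, targets, h);
  }

Running contained on the example above finds exactly the mapping h given there.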
The following theorem shows that our initial thoughts on minimization are correct for conjunctive queries.

Theorem 10.1.2 Let q be a conjunctive query. Then there is a minimal query q′ equivalent to q such that q′ results from q by deleting zero or more body literals.

This suggests a simple procedure for minimizing a given query q: for every body literal, check whether some containment mapping exists under which it is subsumed by some other body literal. Note that this containment mapping must not rename head variables.

Let q and q′ be two conjunctive queries. If q can be derived from q′ solely by reordering body literals and renaming variables, then q and q′ are called isomorphic. Minimal queries are unique up to isomorphism. Obviously, minimizing conjunctive queries is also NP-complete.

Let us now come to unions of conjunctive queries. Let Q = Q1 ∪ ... ∪ Qk and Q′ = Q1′ ∪ ... ∪ Ql′ be two unions of conjunctive queries Qi and Qj′ with a common head predicate. A containment mapping h from Q to Q′ maps each Qi to some Qj′ such that h(Qi) ⊆ Qj′. Sagiv and Yannakakis showed the following theorem [759].

Theorem 10.1.3 Let Q = Q1 ∪ ... ∪ Qk and Q′ = Q1′ ∪ ... ∪ Ql′ be two unions of conjunctive queries Qi and Qj′ with a common head predicate. Then Q ⊆ Q′ if and only if there is a containment mapping from Q to Q′.

This theorem gives us a corollary which allows us to minimize unions of conjunctive queries by pairwise containment checks [759] (see also [887]).

Corollary 10.1.4 Let Q = Q1 ∪ ... ∪ Qk be a union of conjunctive queries Qi with a common head predicate. Then there exists a subset R of Q such that

1. R ≡ Q,
2. there is no R′ ⊂ R with R′ ≡ Q, and
3. if Qm is any union of conjunctive queries equivalent to Q, then there is a containment mapping from Qm to R, but none from Qm to any proper subset R′ of R.

This corollary implies that we can minimize a query that is a union of conjunctive queries by eliminating those conjunctive queries Qi that are contained in some Qj.

The problems of containment, equivalence, and minimization of conjunctive queries are most difficult if all body literals have a common predicate p. This is quite an unrealistic assumption, as typical conjunctive queries do not exclusively self-join one and the same relation. A first question is thus whether there exist special cases that admit polynomial algorithms for containment checking. Another strand of work is devoted to more complex queries. As it turns out, the results there become less pleasant and more restricted.

10.1.2 ... with Inequalities

We now turn to conjunctive queries with inequalities in their body. For this section, we assume that the domain is totally ordered and dense; that is, for all x and y with x < y, there exists a z with x < z < y. In this context, we have the following theorem:

Theorem 10.1.5 Assume the two conjunctive queries q1 and q2 are of the form

  q1 : p1 :− l1, ..., lk, e1, ..., el
  q2 : p2 :− l1′, ..., lm′, e1′, ..., en′

where the pi are the head literals, the li and li′ are ordinary subgoals, and the ei and ei′ are inequalities. Let h be a containment mapping from q2 to q1, where both queries are restricted to their ordinary literals. If additionally, for all i = 1, ..., n, we have

  e1, ..., el =⇒ h(ei′),

then q1 ⊆ q2.

This result is due to Klug [503], who used the following procedure to reason about inequalities using comparison operators in {=, <, ≤}.
Given a set of inequalities L, a directed graph G is defined whose nodes are the variables and constants in L. For every x < y or x ≤ y in L, the edge (x, y) is added to G, and for all constants c and c′ in L with c < c′, we add the edge (c, c′). Edges are labeled with the corresponding comparison operator; for an equality predicate, edges in both directions are added. Given the graph G, we conclude that x ≤ y holds if there is a path from x to y, and that x < y holds if additionally at least one edge on such a path is labeled with <. An alternative is to use the procedure presented in Section 11.2.3 to solve the inequality inference problem; it also allows for the comparison operator ≠. To see why a dense domain is important, consider the domain of integers. From 1 < x < 3 we can easily conclude that x = 2, a fact we can derive neither from the procedure above nor from the axioms and inference procedure presented in Section 11.2.3.

Unfortunately, the other direction of Theorem ?? is wrong, as the following example shows:

  q1 : p(X1, X2) :− q(X1, X2), r(X3, X4), r(X4, X3)
  q2 : p(Y1, Y2) :− q(Y1, Y2), r(Y3, Y4), Y3 ≤ Y4

Obviously, Y3 ≤ Y4 cannot be implied by the (non-existent) inequalities of q1. However, for q1 to yield a non-empty result, we must have r(a, b) and r(b, a) for some a and b. We also have a ≤ b or b ≤ a. In the former case, we can choose Y3 = a and Y4 = b, and in the latter Y3 = b and Y4 = a, to satisfy r(Y3, Y4) and Y3 ≤ Y4.

Klug provides an alternative method to solve the containment problem. It builds upon canonical models. He shows that query containment holds if and only if the containment test succeeds for all canonical models [503]. Klug does not give an explicit algorithm for constructing these canonical models, but such algorithms can be found in the literature [?]. He also gives two simple subclasses of inequality queries, where constants are allowed only on the left-hand or only on the right-hand side; for these subclasses, the implication of Theorem ?? becomes an equivalence. Although the theorem is stated in terms of conjunctive queries with inequalities, it holds for arbitrary additional predicates. Assume two queries of the following form:

  q1 : p1 :− l1, ..., lk, P
  q2 : p2 :− l1′, ..., lm′, P′

where P and P′ are arbitrary formulas. If there is a containment mapping h from q2 to q1, where both queries are restricted to their ordinary literals, and P =⇒ h(P′), then q1 ⊆ q2.

10.1.3 ... with Negation

The first incarnation of negation we consider is set difference. Here, Sagiv and Yannakakis were the first to derive some results [759]. See also [201].

10.1.4 ... under Constraints

Constraints: [470, 471, 439]; negation and constraints: [270, 271, 272, 273]

10.1.5 ... with Aggregation

[202, 203]

10.2 Bag Semantics

10.2.1 Conjunctive Queries

• definition of bag-containment and bag-equivalence [221, 497]
  – characterizations [159, 450, 449] (no proofs in [160])
  – complexity results [159]
• definition of bag-set containment and bag-set equivalence [159]

10.3 Sequences

10.3.1 Path Expressions

We use XPath constructs and their short-hands to denote XPath sublanguages:

• branching ('[]')
• wild cards ('*')
• descendant axis ('//')
• disjunction ('|'): only binary or branching (or-branching)

Otherwise, XPath only contains the child axis and node name tests. These sublanguages are represented as tree patterns. Query containment has been studied for several subclasses:

• XP[],∗,// is coNP-complete [606]. Suppose we have to answer p ⊆ p′.
Then:

  – If p ∈ XP[],// and p′ ∈ XP[],∗,//, then query containment is coNP-complete.
  – It is in PTIME if the number of '//' occurrences is bounded by some constant d, which then gives the degree of the polynomial describing the time complexity.
  – It remains coNP-complete if p contains no '*' and p′ contains at most two '*'s.
  – It remains coNP-complete if p contains at most 5 branches and p′ contains at most 3 branches.

• XP[],∗ is in PTIME (this follows from work on acyclic conjunctive queries [951] and was also noted by Wood [934]).
• XP[],// is in PTIME [28].
• XP∗,// is in PTIME (these patterns are related to a fragment of regular expressions [607]).
• XPor is in PTIME.
• XP[],or is coNP-complete.
• XP| is coNP-complete [606].
• [645] showed that XP[],∗,//,| is coNP-complete for infinite alphabets and in PSPACE for finite alphabets.
• XP//,| is PSPACE-complete.
• XP[],∗,// with variable bindings and equality tests is Π2p-hard [235].

A PTIME algorithm for the fragment XP// can be found in [118]. Florescu, Levy, and Suciu showed that for a language quite similar to XP[],// containment is NP-complete if it is evaluated on a graph-based data model instead of a tree-based one [290]. Calvanese et al. also consider a graph-based data model and more expressive queries [122]. [645] also contains work on languages with variable bindings under different semantics. More results can be found in [235]; query containment in the presence of DTDs is treated in [935]. Schwentick gives a very good overview of complexity results for containment checking [779]. We should repeat his table here.

10.4 Minimization

Minimization: [522]

10.5 Detecting Common Subexpressions

[283, 393, 391]

10.5.1 Simple Expressions

Simple Non-Expensive Expressions

Simple Expensive Expressions

10.5.2 Algebraic Expressions

10.6 Bibliography

In a pair of papers, Aho, Sagiv, and Ullman [17, 18] study equivalence, containment, and minimization problems for tableaux. More specifically, they introduce a restricted variant of relational expressions containing projection, natural join, and selection with predicates that only compare attributes with constants. They further assume the existence of a universal relation; that is, every relation R is the projection of the universal relation on A(R). These restricted conjunctive queries can then be expressed with tableaux. The authors study the tableau equivalence, containment, and minimization problems, also in the presence of functional dependencies. The investigated problems are all NP-complete. Since the practical usefulness is limited, we do not give the concrete results of this pair of papers. [158, 161] contain (complexity) results for deciding query equivalence in the case of recursive and nonrecursive datalog.

View selection problem (just pointers): [468, 464, 465], [167]

• conjunctive queries: equivalence and minimization are NP-complete [143, 18]; in [18] tableaux are used
• polynomial algorithms for equivalence and minimization of simple tableaux: [18, 17]
• union of elementary differences: Π2p-complete; remark in [759] with a pointer to the thesis of Sagiv [756]
• acyclic conjunctive queries: PTIME [951]
• equivalence for (σ, B, π, ∪) and for (σ, B, π, ∪, \): Π2P-complete [759]
• recursive datalog: [100]

Part III Rewrite Techniques

Chapter 11 Simple Rewrites

11.1 Simple Adjustments

11.1.1 Rewriting Simple Expressions

Constant Folding. Constant subexpressions are evaluated and the result replaces the subexpression. For example, an expression 1/100 is replaced by 0.01.
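A minimal sketch of constant folding over an expression tree follows; the representation is our own toy version, not any particular system's:

  #include <memory>

  struct Expr {
    enum Kind { Const, Var, Add, Mul, Div } kind;
    double value = 0;                   // payload for Const nodes
    std::unique_ptr<Expr> left, right;  // children of binary operators
  };

  // Fold constant subexpressions bottom-up; returns true iff e is now a constant.
  bool fold(Expr& e) {
    if (e.kind == Expr::Const) return true;
    if (e.kind == Expr::Var)   return false;
    bool l = fold(*e.left), r = fold(*e.right);
    if (!(l && r)) return false;        // at least one side stays symbolic
    switch (e.kind) {
      case Expr::Add: e.value = e.left->value + e.right->value; break;
      case Expr::Mul: e.value = e.left->value * e.right->value; break;
      case Expr::Div: e.value = e.left->value / e.right->value; break; // 1/100 -> 0.01
      default: break;
    }
    e.kind = Expr::Const;               // replace the subtree by the computed constant
    e.left.reset(); e.right.reset();
    return true;
  }

Applied to a tree for 1/100, fold collapses it to a single Const node with value 0.01.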
Expressions like a − 10 = 50 can also be rewritten (here to a = 60); however, this latter kind of rewrite is rarely performed by commercial systems.

Eliminate BETWEEN. A predicate of the form Y BETWEEN X AND Z is replaced by X <= Y AND Y <= Z. This step not only eliminates syntactic sugar but also enables transitivity reasoning to derive new predicates (see Section 11.2).

Eliminate IN. A predicate of the form x IN (c1, ..., cn) is rewritten to x = c1 OR ... OR x = cn. This eliminates one form of the IN predicate and enables multikey index access. Another possibility is to use a table function that produces a table with one column whose values are exactly those in the IN-list; from there on, regular optimization takes place. This possibility is also investigated when several comparisons of a column with constants are disjunctively connected.

Eliminating LIKE. A predicate of the form a LIKE 'Guy' can only be rewritten to a = 'Guy' if a is of type varchar. This is due to the different white-space padding rules for LIKE and =.

Start and stop conditions derived from LIKE predicates. A predicate of the form a LIKE 'bla%' gives rise to a start condition a >= 'bla', which can enable subsequent index usage. A stop condition of the form a < 'blb' can also be derived, completing a range predicate for an index scan. Start and stop conditions can only be derived if there is no leading '%' in the pattern.

Pushing NOT operations down and eliminating them. NOT operations need to be pushed downwards for correctness reasons. Attention has to be paid to the IS NOT NULL and IS NULL predicates. XXX complete set of rules go into some table.

Merge AND, OR, and other associative operations. While parsing, AND and OR operations are binary. For simpler processing, they are often made n-ary in the internal representation. Therefore, (p AND (q AND r)) is rewritten to (AND p q r). In general, associatively nested operations should be merged. Examples of other associative operations are + and ∗.

Normalized argument order for commutative operations. ToDo: enabling factorization and constant folding: move constants to the left; speed up the evaluation of equality.

Eliminate − and /.

  x − y ⇝ x + (−y)
  x / y ⇝ x ∗ (1/y)

Adjust join predicates. A = B + C becomes A − C = B if A and B are from one relation and C is from another.

Simplifying boolean expressions. The usual simplification rules for boolean expressions can be applied, for example, whenever a contradiction can be derived. A sketch combining NOT push-down with such rules follows.
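The following sketch pushes NOT operations down a toy predicate tree via De Morgan's laws and eliminates them at the comparisons by flipping the comparison operator. It is our own illustration; the NULL-related special cases (IS NULL, IS NOT NULL) mentioned above are deliberately omitted:

  #include <memory>

  struct Pred {
    enum Kind { Cmp, And, Or, Not } kind;
    int cmpOp = 0;  // toy encoding: 0: =, 1: <>, 2: <, 3: >=, 4: >, 5: <=
    std::unique_ptr<Pred> left, right, child;
  };

  // Push NOTs down to the comparisons and eliminate them there.
  void pushNotDown(Pred& p, bool negate = false) {
    switch (p.kind) {
      case Pred::Not: {
        std::unique_ptr<Pred> c = std::move(p.child);
        p = std::move(*c);           // replace the NOT node by its child
        pushNotDown(p, !negate);     // and flip the current polarity
        break;
      }
      case Pred::And:
      case Pred::Or:
        if (negate)                  // De Morgan: NOT(a AND b) = NOT a OR NOT b, etc.
          p.kind = (p.kind == Pred::And) ? Pred::Or : Pred::And;
        pushNotDown(*p.left, negate);
        pushNotDown(*p.right, negate);
        break;
      case Pred::Cmp:
        if (negate) p.cmpOp ^= 1;    // pairs (=,<>), (<,>=), (>,<=) differ in the low bit
        break;
    }
  }

Under two-valued logic, the result is an equivalent NOT-free predicate; under SQL's three-valued logic, this is exactly why the NULL cases need the extra care noted above.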
Eliminating ANY, SOME, and ALL. ANY and SOME operators in conjunction with a comparison operator are rewritten into a disjunction of comparison predicates. For example, a > ANY (c1, c2) is rewritten to a > c1 OR a > c2. Correspondingly, an ALL operator with a constant list is rewritten into a conjunction of comparisons; for example, a > ALL (c1, c2) is rewritten to a > c1 AND a > c2. If a subquery occurs, then the ANY or SOME expression is rewritten to a correlated subquery in an EXISTS predicate. Consider the query a > ANY (SELECT b FROM ... WHERE p). It is rewritten to EXISTS (SELECT ... FROM ... WHERE p AND a > b). Correspondingly, a subquery within an ALL operator is rewritten into a NOT EXISTS subquery. For example, a > ALL (SELECT b FROM ... WHERE p) is rewritten into NOT EXISTS (SELECT b FROM ... WHERE p AND a <= b).

• CASE <==> UNION

11.1.2 Normal forms for queries with disjunction

Another step of the NFST component, or the first step of the rewriting component, can be the transformation of boolean expressions found in where clauses in order to account for NULL values. Pushing NOT operators inside the boolean expression allows us to use two-valued logic instead of three-valued logic. If we miss this step, we can get wrong results. Another possible step is the subsequent transformation of the boolean expressions in where clauses into disjunctive normal form (DNF) or conjunctive normal form (CNF). This step is not always necessary and really depends on which plan generation approach is taken. Hence, this step could take place as late as in a preparatory step for plan generation. It is (obviously) only necessary if the query contains disjunctions. We discuss plan generation for queries with disjunctions in Section ??.

11.2 Deriving new predicates

Given a set of conjunctive predicates, it is often possible to derive new predicates which might be helpful during plan generation. This section discusses ways to infer new predicates.

11.2.1 Collecting conjunctive predicates

A query predicate may not only contain the and connector, but also or and not. For the inference rules in this section, we need base predicates that occur conjunctively. We say that a (base) predicate q occurs conjunctively in a (complex) predicate p if p[q ← false] can be simplified to false. That is, if we replace every occurrence of q by false, the simplification rules in Figure 11.1 simplify p[q ← false] to false. These simplification rules can be used to implement a simple member function occursConjunctively that determines whether a predicate occurs conjunctively in another predicate. Together with a member function or visitor collectBasePredicates, we can compute the set of conjunctively occurring predicates. This set forms the basis for the following subsections.

  NOT true  → false        p AND true  → p           p OR true  → true
  NOT false → true         p AND false → false       p OR false → p

  Figure 11.1: Simplification rules for boolean expressions

11.2.2 Equality

Equality is a reflexive, symmetric, and transitive binary relationship (see Fig. 11.2); such a relation is called an equivalence relation. Hence, a set of conjunctively occurring equality predicates implicitly partitions the set of terms (IUs) into disjoint equivalence classes.

  x = x
  x = y =⇒ y = x
  x = y ∧ y = z =⇒ x = z

  Figure 11.2: Axioms for equality

Constants: Let X be an equivalence class of equal expressions and Y the set of equality predicates that contributed to X. If X contains an expression bound to a constant c, then in the query predicate we replace every x = y in Y by x = c and y = c and subsequently eliminate redundant expressions. This can turn a join predicate into two selections, as in

  σx=c(e1 Bx=y e2) ≡ σx=c(e1) × σy=c(e2).

In [208], an abstract data structure is presented that helps to compute the equivalence classes fast and also allows for a fast check whether two terms (IUs) are in the same equivalence class. Since we are often interested in whether a given IU is equal to a constant, or, more specifically, equal to another IU bound to a constant, we have to modify these algorithms such that the IU bound to a constant, if it exists, becomes the representative of its equivalence class.
For the member functions addEqualityPredicate, getEqualityRepresentative, and isInSameEqualityClass, we need an attribute _equalityRepresentative in class IU that is initialized such that it points to the IU itself. Another member, _equalityClassRank, is initialized to 0. The code for the member functions is given in Figure 11.3. By calling addEqualityPredicate for all conjunctively occurring equality predicates, we build the equivalence classes.

11.2.3 Inequality

Table 11.1 gives a set of axioms used to derive new predicates from a set of conjunctively occurring inequalities S (see [887]).

  A1: X ≤ X
  A2: X < Y ⇒ X ≤ Y
  A3: X < Y ⇒ X ≠ Y
  A4: X ≤ Y ∧ X ≠ Y ⇒ X < Y
  A5: X ≠ Y ⇒ Y ≠ X
  A6: X < Y ∧ Y < Z ⇒ X < Z
  A7: X ≤ Y ∧ Y ≤ Z ⇒ X ≤ Z
  A8: X ≤ Z ∧ Z ≤ Y ∧ X ≤ W ∧ W ≤ Y ∧ W ≠ Z ⇒ X ≠ Y

  Table 11.1: Axioms for inequality

These axioms have to be applied until no more predicates can be derived. The following algorithm [887] performs this task efficiently:

1. Convert each X < Y into X ≠ Y and X ≤ Y.
2. Compute the transitive closure of ≤.
3. Apply axiom A8 until no more new predicates can be derived.
4. Reconstruct < by using axiom A4.

Step 3 can be performed as follows. For any two IUs X and Y, we find those IUs Z with X ≤ Z ≤ Y. Then we check whether any two such Z's are related by ≠. Here, it suffices to check the original ≠ pairs in S and those derived in step 1.
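An illustrative implementation of the four steps might look as follows; the representation (variables numbered 0..n−1, relations as boolean matrices) is our own and is chosen for clarity, not efficiency:

  #include <vector>

  // leq[x][y] encodes X <= Y, neq[x][y] encodes X != Y.
  struct IneqSolver {
    int n;
    std::vector<std::vector<bool>> leq, neq;

    explicit IneqSolver(int n_)
      : n(n_), leq(n_, std::vector<bool>(n_, false)),
        neq(n_, std::vector<bool>(n_, false)) {
      for (int i = 0; i < n; ++i) leq[i][i] = true;           // axiom A1
    }
    // step 1: X < Y is split into X <= Y and X != Y
    void addLess(int x, int y) { leq[x][y] = true; neq[x][y] = neq[y][x] = true; }
    void addLeq(int x, int y)  { leq[x][y] = true; }
    void addNeq(int x, int y)  { neq[x][y] = neq[y][x] = true; }

    void close() {
      for (int k = 0; k < n; ++k)                             // step 2: closure of <=
        for (int i = 0; i < n; ++i)
          for (int j = 0; j < n; ++j)
            if (leq[i][k] && leq[k][j]) leq[i][j] = true;
      bool changed = true;                                    // step 3: A8 to fixpoint
      while (changed) {
        changed = false;
        for (int x = 0; x < n; ++x)
          for (int y = 0; y < n; ++y)
            if (!neq[x][y])
              for (int z = 0; z < n && !neq[x][y]; ++z)
                for (int w = 0; w < n; ++w)
                  if (leq[x][z] && leq[z][y] && leq[x][w] && leq[w][y] && neq[w][z]) {
                    neq[x][y] = neq[y][x] = true;
                    changed = true;
                    break;
                  }
      }
    }
    bool leqHolds(int x, int y)  const { return leq[x][y]; }
    bool lessHolds(int x, int y) const { return leq[x][y] && neq[x][y]; }  // step 4: A4
  };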
11.2.4 Aggregation

Let R1, ..., Rn be relations or views, A1, ..., Am attributes thereof, pw and ph predicates, and a1, ..., al expressions of the form fj(Bj) for aggregate functions fj and attributes Bj. For a query block of the form

  select   A1, ..., Ak, a1, ..., al
  from     R1, ..., Rn
  where    pw
  group by A1, ..., Am
  having   ph

we consider the derivation of new predicates [551]. Obviously, the following predicates are true:

  min(B) ≤ B
  max(B) ≥ B
  max(B) ≥ min(B)
  min(B) ≤ avg(B)
  avg(B) ≤ max(B)

If pw conjunctively contains a predicate B θ c for some constant c, we can further infer

  min(B) θ c   if θ ∈ {>, ≥},
  max(B) θ c   if θ ∈ {<, ≤},
  avg(B) θ c   if θ ∈ {<, ≤, >, ≥}.

These predicates can then be used to derive further predicates. The original and the derived predicates are useful when the query block is embedded in another query block, since we are allowed to add them conjunctively to the embedding query block (see Section 12.3).

If we know restrictions on the aggregates from some embedding query block, we might be able to add predicates to pw. The following table contains the known restriction on an aggregate in the left column and the predicate we can infer in the right column:

  max(B) ≥ c  ⇝  B ≥ c   if no other aggregation occurs
  max(B) > c  ⇝  B > c   if no other aggregation occurs
  min(B) ≤ c  ⇝  B ≤ c   if no other aggregation occurs
  min(B) < c  ⇝  B < c   if no other aggregation occurs

Note that the aggregation occurring in the left column must be the only aggregation found in the query block; that is, l = 1 and ph contains no aggregation other than a1. To see why this is necessary, consider the following query:

  select deptNo, max(salary), min(salary)
  from Employee
  group by deptNo

Even if we know that max(salary) > 100000, the above query block is not equivalent to

  select deptNo, max(salary), min(salary)
  from Employee
  where salary > 100000
  group by deptNo

Neither is

  select deptNo, max(salary)
  from Employee
  group by deptNo
  having avg(salary) > 50000

equivalent to

  select deptNo, max(salary)
  from Employee
  where salary > 100000
  group by deptNo
  having avg(salary) > 50000

even if we know that max(salary) > 100000.

11.2.5 ToDo

[579]

11.3 Predicate Push-Down and Pull-Up

11.4 Eliminating Redundant Joins

11.5 Distinct Pull-Up and Push-Down

11.6 Set-Valued Attributes

In this section, we investigate the effect of query rewriting on joins involving set-valued attributes in object-relational database management systems. We show that by unnesting set-valued attributes (that are stored in an internal nested representation) prior to the actual set-containment or intersection join, we can improve the performance of query evaluation by an order of magnitude. By giving example query evaluation plans, we show the increased possibilities for the query optimizer. This section is based on [423].

11.6.1 Introduction

The growing importance of object-relational database systems (ORDBMSs) [841] has kindled a renewed interest in the efficient processing of set-valued attributes. One particular problem in this area is the joining of two relations on set-valued attributes [315, 420, 713]. Recent studies have shown that finding optimal join algorithms with set-containment predicates is very hard [121]. Nevertheless, a certain level of efficiency for joins on set-valued attributes is indispensable in practice. Obviously, brute-force evaluation via a nested-loop join is not going to be very efficient. An alternative is the introduction of special operators on the physical level of a DBMS [420, 713]. The integration of new algorithms and data structures on the physical level is problematic, however. On the one hand, this approach will surely result in tremendous speed-ups; on the other hand, this efficiency is purchased dearly: it is very costly to implement and integrate new algorithms robustly and reliably.

We consider an alternative approach that supports set-containment and non-empty-intersection join queries by compiling these join predicates away. The main idea is to unnest the set-valued attributes prior to the join. Thereby, we assume a nested internal representation [712]; this is also the underlying representation for the specific join algorithms proposed so far [420, 713]. Whereas [713] concentrates on set-containment joins, we also consider joins based on non-empty intersections. Ramasamy et al. also present a query rewrite for containment queries in [713], but on an unnested external representation, which (as shown there) exhibits very poor performance; further, the special case of empty sets was not dealt with.

The goal of this section is to show that by rewriting queries we can compile away the original set-containment or intersection join. As our experiments with DB2 show, the rewrite results in speed-up factors that grow linearly in the size of the input relations, as compared to quadratic growth for brute-force nested-loop evaluation. The advantage of this approach, as compared to [420, 713], is that no new join algorithms have to be added to the database system.

11.6.2 Preliminaries

In this section, we give an overview of the definition of the set type. Due to the deferral of set types to SQL-4 [291], we use a syntax similar to that of Informix (http://www.informix.com/documentation/). A possible example declaration of a table with a set-valued attribute is:

create table ngrams (
  setID   integer not null primary key,
  content set(char(3) not null)
);

setID is the key of the relation, whereas content stores the actual set.
The components of a set can be any built-in or user-defined type. In our case, we used sets of char(3), because we wanted to store 3-grams (see also Section ??). We further assume that on set-valued attributes the standard set operations and comparison operators are available.

Our rewriting method is based on unnesting the internal nested representation. The following view, defining the unnested version of the above table, keeps our presentation more concise:

create view view_ngrams(setID, d, card) as (
  (select ngrams.setID, d.value, count(ngrams.content)
   from ngrams, table(unnest(ngrams.content)) d)
  union all
  (select ngrams.setID, NULL, 0
   from ngrams
   where count(ngrams.content) = 0)
);

Here, setID identifies the corresponding set, d takes on the different values in content, and card is the cardinality of the set. We also need unnest, a table function that returns a set in the form of a relation. As unnest returns an empty relation for an empty set, we have to consider this special case in the second subquery of the union statement, inserting a tuple containing a dummy value.

11.6.3 Query Rewrite

We are now ready to describe the queries we used to compare the nested and the unnested approach. We concentrate on joins based on subset-equal and non-empty-intersection predicates, because these are the difficult cases, as shown in [121]. We have skipped joins involving predicates based on equality, because the efficient evaluation of these predicates is much simpler and can be done in a straightforward fashion (see [420]).

Checking the Subset-Equal Relation

Here is a query template for a join based on a subset-equal predicate:

select n_1.setID, n_2.setID
from ngrams n_1, ngrams n_2
where is_subseteq(n_1.content, n_2.content) <> 0;

(The comparison with 0 is only needed for DB2, which does not understand the type bool.) This query can be rewritten as follows. The basic idea is to join the unnested versions of the table on the set elements, group the resulting tuples by their set identifiers, count the number of joining elements for every pair of set identifiers, and compare this number with the original cardinalities. The filter predicate vn1.card <= vn2.card discards some sets that cannot be in the result of the set-containment join. We also consider the case of empty sets in the second part of the query. Summarizing, the rewritten query is:

(select vn1.setID, vn2.setID
 from view_ngrams vn1, view_ngrams vn2
 where vn1.d = vn2.d and vn1.card <= vn2.card
 group by vn1.setID, vn1.card, vn2.setID, vn2.card
 having count(*) = vn1.card)
union all
(select vn1.setID, vn2.setID
 from view_ngrams vn1, view_ngrams vn2
 where vn1.card = 0);

Checking Non-Empty Intersection

Our query template for joins based on non-empty intersections looks as follows:

select n_1.setID, n_2.setID
from ngrams n_1, ngrams n_2
where intersects(n_1.content, n_2.content) <> 0;

The formulation of the rewritten query is much simpler than in the containment case above. Due to our view definition, not much rewriting is necessary; we just have to take care of empty sets again, although this time in a different, simpler way:

select distinct vn1.setID, vn2.setID
from view_ngrams vn1, view_ngrams vn2
where vn1.d = vn2.d and vn1.card > 0;
11.7 Bibliography

This section is based on the investigations by Helmer and Moerkotte [423]. There, we also find a performance evaluation indicating that the rewrites, depending on the relation sizes, result in speed-up factors between 5 and 50 even for moderately sized relations. Nevertheless, it is argued there that support for set-valued attributes must be built into the DBMS. A viable alternative to the rewrites presented here is the usage of special join algorithms for join predicates involving set-valued attributes [315, 419, 420, 581, 601, 602, 713]. Nevertheless, as has been shown by Cai, Chakaravarthy, Kaushik, and Naughton, dealing with set-valued attributes in joins is a theoretically (and of course practically) difficult issue [121]. Last, to efficiently support simple selection predicates on set-valued attributes, special index structures should be incorporated into the DBMS [421, 422, 424].

IU* IU::getEqualityRepresentative() {
  // path compression: let every IU on the path point directly to the root
  if (this != _equalityRepresentative) {
    _equalityRepresentative = _equalityRepresentative->getEqualityRepresentative();
  }
  return _equalityRepresentative;
}

void IU::addEqualityClassUnderThis(IU* aIU) {
  IU* lRepresentativeThis = this->getEqualityRepresentative();
  IU* lRepresentativeArg  = aIU->getEqualityRepresentative();
  lRepresentativeArg->_equalityRepresentative = lRepresentativeThis;
  if (lRepresentativeArg->_equalityClassRank >= lRepresentativeThis->_equalityClassRank) {
    lRepresentativeThis->_equalityClassRank = lRepresentativeArg->_equalityClassRank + 1;
  }
}

void IU::addEqualityPredicate(Compositing* p) {
  IU* lLeft  = p->leftIU;
  IU* lRight = p->rightIU;
  if (p->isEqualityPredicate() &&
      lLeft->getEqualityRepresentative() != lRight->getEqualityRepresentative()) {
    if (lLeft->isBoundToConstant()) {
      // an IU bound to a constant becomes the representative of its class
      lLeft->addEqualityClassUnderThis(lRight);
    } else if (lRight->isBoundToConstant()) {
      lRight->addEqualityClassUnderThis(lLeft);
    } else if (lLeft->_equalityClassRank > lRight->_equalityClassRank) {
      lLeft->addEqualityClassUnderThis(lRight);
    } else {
      lRight->addEqualityClassUnderThis(lLeft);
    }
  }
}

Figure 11.3: Maintaining equality equivalence classes (union by rank with path compression)

[...]

but view resolution with a subsequent push-down of the predicate e.salary > 150000 will result in

select e.eno, e.name
from ((select e1.eno, e1.name, e1.salary, e1.dno
       from Emp1[e1]
       where e1.salary > 150000)
      union all
      (select e2.eno, e2.name, e2.salary, e2.dno
       from Emp2[e2]
       where e2.salary > 150000))

Note that we did not eliminate unneeded columns/attributes. Further note that we can now exploit possible indexes on Emp1.salary and Emp2.salary. If union instead of union all had been used in the view definition, the rewritten query would also contain union, requiring a duplicate elimination.

Here is another example where pushing a predicate down results in much more efficient plans. Given the view

define view EmpStat as
  select e.dno, min(e.salary) minSal, max(e.salary) maxSal, avg(e.salary) avgSal
  from Emp[e]
  group by e.dno

the query

select *
from EmpStat[e]
where e.dno = 10

can be rewritten to

select e.dno, min(e.salary) minSal, max(e.salary) maxSal, avg(e.salary) avgSal
from Emp[e]
where e.dno = 10
group by e.dno

which can be further simplified to

select e.dno, min(e.salary) minSal, max(e.salary) maxSal, avg(e.salary) avgSal
from Emp[e]
where e.dno = 10

12.4 Complex View Merging

12.4.1 Views with Distinct

XXX TODO views with distinct
12.4.2 Views with Group-By and Aggregation

Consider the following view with a group-by clause and aggregation:

create view AvgSalary as
  select e.dno, avg(e.salary) as avgSalary
  from Emp[e]
  group by e.dno

The following query uses this view:

select d.name, s.avgSalary
from Dept[d], AvgSalary[s]
where d.location = 'Paris' and d.dno = s.dno

Using the view definition, this query can be rewritten to

select d.name, avg(e.salary) as avgSalary
from Dept[d], Emp[e]
where d.location = 'Paris' and d.dno = e.dno
group by d.ROWID, d.name

where d.ROWID is either a key attribute like d.dno or a unique row identifier of the tuples in Dept. Of course, this transformation is not valid in general. The primary condition here is that we have a key/foreign-key join; more specifically, d.dno must be the key of the Dept table or a unique attribute.

Applying simple view resolution instead results in:

select d.name, s.avgSalary
from Dept[d], (select e.dno, avg(salary) as avgSalary
               from Emp[e]
               group by e.dno) [s]
where d.location = 'Paris' and d.dno = s.dno

This query can then be unnested using the techniques of Section ??.

Sometimes strange results occur. Consider, for example, the view

define view EmpStat as
  select e.dno, min(e.salary) minSal, max(e.salary) maxSal, avg(e.salary) avgSal
  from Emp[e]
  group by e.dno

If the user issues the query

select avg(minSal), avg(maxSal), avg(avgSal)
from EmpStat

view merging results in

select avg(min(e.salary)), avg(max(e.salary)), avg(avg(e.salary))
from Emp[e]
group by e.dno

This is perfectly o.k.; you just need to think twice about it. The resulting plan will contain two group operations: XXX Plan

12.4.3 Views in IN Predicates

Consider a view that contains the minimum salary for each department:

create view MinSalary as
  select e.dno, min(e.salary) as minSalary
  from Emp[e]
  group by e.dno

and a query asking for all those employees of Parisian departments earning the minimum salary, together with their salaries:

select e.name, e.salary
from Emp[e], Dept[d]
where e.dno = d.dno
  and d.location = 'Paris'
  and (e.dno, e.salary) in MinSalary

This query can be rewritten to:

select e.name, e.salary
from Emp[e], Dept[d], Emp[e2]
where e.dno = d.dno
  and d.location = 'Paris'
  and e.dno = e2.dno
group by e.ROWID, d.ROWID, e.name, e.salary
having e.salary = min(e2.salary)

Note that the employee relation occurs twice. Avoiding the second scan of the employee relation can be done as follows:

12.4.4 Final Remarks

Not all views can be merged. If, for example, a rownum function that numbers the rows of a table is used in a view definition for a result column, then the view cannot be merged. Unmerged views remain as nested subqueries with two alternative evaluation strategies: either they are evaluated as nested queries, that is, for every row produced by some outer producer the view is evaluated, or the view is materialized into a temporary table. Whatever is more efficient must be chosen by the plan generator. However, techniques for deriving additional predicates and subsequent techniques such as predicate move-around (predicate pull-down and push-down) are still applicable.

12.5 Bibliography

Chapter 13 Quantifier Treatment

13.1 Pseudo-Quantifiers

Again, the key to rewriting subqueries with an ANY or ALL predicate is to apply aggregate functions [314]. A predicate of the form

  < ANY (select ... from ... where ...)

can be transformed into the equivalent predicate

  < (select max(...) from ... where ...)
Analogously, a predicate of the form

  < ALL (select ... from ... where ...)

can be transformed into the equivalent predicate

  < (select min(...) from ... where ...)

In the above rewrite rules, the comparison operator < can be replaced by ≤. If the comparison operator is > or ≥, then the rules are flipped: for example, a predicate of the form > ANY becomes > (select min ...), and > ALL becomes > (select max ...). After these rewrites have been applied, the Type A or Type JA unnesting techniques can be applied, depending on the details of the inner query block.

13.2 Existential Quantifier

Existential quantifiers can be seen as special aggregate functions, and query blocks exhibiting an existential quantifier can be unnested accordingly [220]. For example, an independent existential subquery can be treated the same way as a Type A query. Nested existential quantifiers with a correlation predicate can be unnested using a semijoin. Other approaches rewrite (existential) quantifiers using the aggregate function count [314]. Consider the partial query pattern

  ... where exists (select ... from ... where ...)

It is equivalent to

  ... where 0 < (select count(...) from ... where ...)

A not exists as in

  ... where not exists (select ... from ... where ...)

is equivalent to

  ... where 0 = (select count(...) from ... where ...)

After these rewrites have been applied, the Type A or Type JA unnesting techniques can be applied, depending on the details of the inner query block.

13.3 Universal Quantifier

Universal quantification is a little more complex. An overview is provided in [184]. Here is the prototypical OQL query pattern upon which our discussion of universal quantifiers nested within a query block is based:

  Q ≡ select e1
      from e1 in E1
      where for all e2 in (select e2 from e2 in E2 where p): q

where p (called the range predicate) and q (called the quantifier predicate) are predicates in a subset of the variables {e1, e2}. This query pattern is denoted by Q. In order to emphasize the (non-)occurrence of variables in a predicate p, we write p(e1, ..., en) if p depends on the variables e1, ..., en. Using this convention, we can list all possible cases of variable occurrence. Since both e1 and e2 may or may not occur in p or q, we have to consider 16 cases (see Table 13.1).

  Case  1: p(),       q()          Case  9: p(e2),     q()
  Case  2: p(),       q(e1)        Case 10: p(e2),     q(e1)
  Case  3: p(),       q(e2)        Case 11: p(e2),     q(e2)
  Case  4: p(),       q(e1, e2)    Case 12: p(e2),     q(e1, e2)
  Case  5: p(e1),     q()          Case 13: p(e1, e2), q()
  Case  6: p(e1),     q(e1)        Case 14: p(e1, e2), q(e1)
  Case  7: p(e1),     q(e2)        Case 15: p(e1, e2), q(e2)
  Case  8: p(e1),     q(e1, e2)    Case 16: p(e1, e2), q(e1, e2)

  Table 13.1: Classification scheme according to the variable bindings

All cases but 12, 15, and 16 are rather trivial. Class 12 queries can be unnested by replacing the universal quantifier by a division, set difference, anti-semijoin, or counting. Class 15 queries are treated by set difference, anti-semijoin, or grouping with count aggregation. For Class 16 queries, the alternatives are set difference, anti-semijoin, and grouping with count aggregation. In all cases, special care has to be taken regarding NULL values. For details, see [184].

Class 12

Let us first consider an example of a Class 12 query:

  select al.name
  from al in Airline
  where for all ap in (select ap
                       from ap in Airport
                       where apctry = 'USA'):
        ap in al.lounges

Define U ≡ πap(σapctry='USA'(Airport[ap, apctry])).
Then the three alternative algebraic expressions equivalent to this query are:

• plan with division:

  if U = ∅
  then Airline[name]
  else µ_{ap:lounges}(Airline[name, lounges]) ÷ U

• plan with set difference:

  Airline[name] \ π_name(Airline[name, lounges] ⋉_{ap∉lounges} U)

• plan with anti-semijoin:

  π_name(Airline[name, lounges] ▷_{ap∉lounges} U)

This plan is only valid if the projected attributes of Airline form a superkey. The plan with the anti-semijoin is typically the most efficient.

In general, the plan with division is [637, 355]:

  if_{σ_{p(e2)}(E2[e2]) ≠ ∅}((E1[e1] ⋈_{q(e1,e2)} E2[e2]) ÷ σ_{p(e2)}(E2[e2]), E1[e1])

In case the selection σ_{p(e2)}(E2[e2]) yields at least one tuple or object, we can apply the predicate p to the dividend, as in

  if_{σ_{p(e2)}(E2[e2]) ≠ ∅}((E1[e1] ⋈_{q(e1,e2)} σ_{p(e2)}(E2[e2])) ÷ σ_{p(e2)}(E2[e2]), E1[e1]).

If the quantifier predicate q(e1, e2) is of the form e2 ∈ e1.SetAttribute, then the join can be replaced by an unnest operator:

  if_{σ_{p(e2)}(E2[e2]) ≠ ∅}(µ_{e2:SetAttribute}(E1[e1, SetAttribute]) ÷ σ_{p(e2)}(E2[e2]), E1[e1])

Using set difference, the translation is

  E1[e1] \ π_{e1}((E1[e1] × σ_{p(e2)}(E2[e2])) \ (E1[e1] ⋈_{q(e1,e2)} σ_{p(e2)}(E2[e2])))

which can be optimized to

  E1[e1] \ (E1[e1] ⋉_{¬q(e1,e2)} σ_{p(e2)}(E2[e2]))

This plan is mentioned in [830], however using a regular join instead of a semijoin. The anti-semijoin can be employed to eliminate the set difference, yielding the following plan:

  E1[e1] ▷_{¬q(e1,e2)} σ_{p(e2)}(E2[e2])

This plan is in many cases the most efficient plan. However, the correctness of this plan depends on the uniqueness of e1, i.e., the attribute(s) e1 must be a (super) key of E1. This is especially fulfilled in the object-oriented context if e1 consists of or contains the object identifier. We do not present the plans based on group and count operations (see [184]).

Class 15 Here is an example query of Class 15:

  select al.name
  from al in Airline
  where for all f in (select f
                      from f in Flight
                      where al = f.carrier):
        f.to.apctry != "Libya"

The quantifier's range formula σ_{p(e1,e2)}(E2[e2]) is obviously not closed: it contains the free variable e1. According to the reduction algorithm of Codd [199], the division plan is

  (E1[e1] ⋈_{¬p(e1,e2)∨q(e2)} E2[e2]) ÷ E2[e2].

The plan with set difference is

  E1[e1] \ π_{e1}((E1[e1] ⋈_{p(e1,e2)} E2[e2]) \ (E1[e1] ⋈_{p(e1,e2)} σ_{q(e2)}(E2[e2])))

and the most efficient plan, using the antijoin, is

  E1[e1] ▷_{p(e1,e2)} σ_{¬q(e2)}(E2[e2]).

Class 16 Here is an example Class 16 query:

  select al.name
  from al in Airline
  where for all ap in (select ap
                       from ap in Airport
                       where apctry = alctry):
        ap in al.lounges

The range predicate again depends on the outer level variable e1. A valid division plan looks similar to the one for Class 15. A plan with set difference is

  E1[e1] \ π_{e1}((E1[e1] ⋈_{p(e1,e2)} E2[e2]) \ (E1[e1] ⋈_{p(e1,e2)∧q(e1,e2)} E2[e2])).

This plan can first be refined by replacing the set difference of the two join expressions by a semijoin, resulting in

  E1[e1] \ (E1[e1] ⋉_{p(e1,e2)∧¬q(e1,e2)} E2[e2])

Finally, the remaining set difference is transformed into an anti-semijoin, which also covers the semijoin:

  E1[e1] ▷_{p(e1,e2)∧¬q(e1,e2)} E2[e2].

Again, the uniqueness constraint on e1 is required for this most efficient plan to be valid. For all discussed classes, problems with NULL values might occur. In that case, the plans have to be refined [184].
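To make the discussed plan alternatives concrete, the following Python sketch (purely illustrative: toy data and invented values, with set comprehensions standing in for the algebra operators) evaluates the division plan and the anti-semijoin plan for the Class 12 airline example and checks that they agree:

# A toy evaluation of two Class 12 plans for:
#   select al.name from al in Airline
#   where for all ap in (select ap from ap in Airport
#                        where apctry = 'USA'): ap in al.lounges
# Hypothetical data; 'lounges' is a set-valued attribute.
airports = [("JFK", "USA"), ("LAX", "USA"), ("CDG", "F")]
airlines = [("AA", {"JFK", "LAX"}), ("BA", {"JFK"}), ("NoLounge", set())]

# U = pi_ap(sigma_{apctry='USA'}(Airport))
U = {ap for (ap, ctry) in airports if ctry == "USA"}

# Plan with division: if U is empty, every airline qualifies;
# otherwise unnest lounges and divide by U (U <= lounges).
if not U:
    division = {name for (name, lounges) in airlines}
else:
    division = {name for (name, lounges) in airlines if U <= lounges}

# Plan with anti-semijoin: drop airlines for which some USA
# airport ap is missing from their lounges (ap not in lounges).
antijoin = {name for (name, lounges) in airlines
            if not any(ap not in lounges for ap in U)}

assert division == antijoin == {"AA"}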
13.4 Bibliography

[467] [220] [184] [715, 708]

Chapter 14

Unnesting Nested Queries

Chapter 15

Optimizing Queries with Materialized Views

15.1 Conjunctive Views

15.2 Views with Grouping and Aggregation

15.3 Views with Disjunction

15.4 Bibliography

Materialized views with aggregates: [828]; materialized views with disjunction: [11]; SQL Server: [334]; other: [12, 151, 152, 163, 549, 843, 881, 948], [139, 143, 166, 154, 283, 482, 535, 676, 707, 785]; some more, including maintenance etc.: [10, 15, 53, 95, 151, 157, 205, 383, 398], [434, 481, 548, 704, 749, 266, 828], [843, 852, 851, 973, 949], [6, 252, 253, 405]. Overview: [390], [550]. Performance evaluation: [93]. Stacked views: [225]. Recursion: [255]. With patterns (integration): [707], [254, 256], [234].

Chapter 16

Semantic Query Rewrite

16.1 Constraints and Their Impact on Query Optimization

Using constraints: [332, 374]

16.2 Semantic Query Rewrite

Semantic query rewrite exploits knowledge (semantic information) about the content of the object base. This knowledge is typically specified by the user. We already saw one example of user-supplied information: inverse relationships. As we saw, inverse relationships can be exploited for more efficient query evaluation. Another important piece of information is knowledge about keys. In conjunction with type inference, this information can be used during query rewrite to speed up query execution. A typical example is the following query:

  select distinct *
  from Professor p1, Professor p2
  where p1.university.name = p2.university.name

By type inference, we can conclude that the expressions p1.university and p2.university are of type University. If we further knew that the names of universities are unique, that is, name is a candidate key for universities, then the query could be simplified to

  select distinct *
  from Professor p1, Professor p2
  where p1.university = p2.university

Evaluating this query no longer necessitates accessing the universities to retrieve their names. Some systems consider even more general knowledge in the form of equivalences holding over user-defined functions [1, 289]. These equivalences are then used to rewrite the query. Thereby, alternatives are generated, all of which are subsequently optimized.

Semantic query optimization: [139]

16.3 Exploiting Uniqueness in Query Optimization

[681]

16.4 Bibliography

[82] [73] [943]. Foreign functions, semantic rules, rewrite: [154]. Conjunctive queries, branch minimization: [743].

Part IV

Plan Generation

Chapter 17

Current Search Space and Its Limits

17.1 Plans with Outer Joins, Semijoins and Antijoins

Outer join reordering [299, 298, 736, 308], outer join/antijoin plan generation [717], semijoin reducer [836].

17.2 Expensive Predicates and Functions

17.3 Techniques to Reduce the Search Space

• join single row tables first
• push down SARGable predicates
• for large join queries, do not apply transitivity of equality to derive new predicates, and disable cross products and possibly bushy trees

17.4 Bibliography

Chapter 18

Dynamic Programming-Based Plan Generation

18.1 Introduction

So far, we treated predicates that reference a single relation as selection predicates and predicates that reference two relations as join predicates.
In general, a predicate can reference more than two relations. In this case, it can be treated as a join predicate. Consider for example the query

  select *
  from R, S, T
  where R.A = S.B and S.C = T.D and R.E + S.F = T.G

A query graph as defined in Section ?? does not suffice to capture these predicates. What is needed are hypergraphs.

There exists a second reason why hypergraphs are needed. In Section 7.15, we introduced several conflict handling mechanisms, which allow for the correct enumeration of the core search space. Every operator ◦ within some operator tree has a set of relations TES associated with it. This set of relations is a subset of all the relations in the leaf nodes below the operator subtree rooted at ◦. Hence, some relations occur on ◦'s left side, others on its right side. Thus, we split TES into TES_left and TES_right. Then, the pair (TES_left, TES_right) is a hyperedge. Algorithm DPsube was used to calculate the best plan bottom-up. The applicability test, among other things, assured that only connected components and connected complements thereof were formed. The test often fails. This is similar to the manner in which the tests of DPsub and DPsize (see Section ??) failed for regular graphs. There, this fact led us to the development of DPccp, which enumerates CCPs for regular graphs quite efficiently. The first goal of this section is to build an equally efficient enumerator for CCPs for hypergraphs. Then, this basic algorithm is extended such that it is able to deal with more operators than those handled in the core search space.

Figure 18.1: Sample hypergraph (nodes R1, ..., R6; simple edges R1-R2, R2-R3, R4-R5, R5-R6; one hyperedge ({R1, R2, R3}, {R4, R5, R6}))

18.2 Hypergraphs

Let us start with the definition of hypergraphs.

Definition 18.2.1 (hypergraph) A hypergraph is a pair H = (V, E) such that

1. V is a non-empty set of nodes and
2. E is a set of hyperedges, where a hyperedge is an unordered pair (u, v) of non-empty subsets of V (u ⊂ V and v ⊂ V) with the additional condition that u ∩ v = ∅.

We call any non-empty subset of V a hypernode. We assume that the nodes in V are totally ordered via an (arbitrary) relation ≺. The ordering on nodes is important for our algorithm. A hyperedge (u, v) is simple if |u| = |v| = 1. A hypergraph is simple if all its hyperedges are simple. Note that a simple hypergraph is the same as an ordinary undirected graph.

In our context, the nodes of hypergraphs are relations and the edges are abstractions of join predicates. Consider, for example, a join predicate of the form R1.a + R2.b + R3.c = R4.d + R5.e + R6.f. This predicate will result in a hyperedge ({R1, R2, R3}, {R4, R5, R6}). Fig. 18.1 contains an example of a hypergraph. The set V of nodes is V = {R1, ..., R6}. Concerning the node ordering, we assume that Ri ≺ Rj ⟺ i < j. There are the simple edges ({R1}, {R2}), ({R2}, {R3}), ({R4}, {R5}), and ({R5}, {R6}). The hyperedge from above is the only true hyperedge in the hypergraph. Note that it is possible to rewrite the above complex join predicate. For example, it is equivalent to R1.a + R2.b = R4.d + R5.e + R6.f − R3.c. This leads to a hyperedge ({R1, R2}, {R3, R4, R5, R6}). If the query optimizer is capable of performing this kind of algebraic transformation, all derived hyperedges are added to the hypergraph, at least conceptually. We will come back to this issue in Section ??.

To decompose a join ordering problem represented as a hypergraph into smaller problems, we need the notion of subgraph.
More specifically, we only deal with node-induced subgraphs.

Definition 18.2.2 (subgraph) Let H = (V, E) be a hypergraph and V′ ⊆ V a subset of nodes. The node-induced subgraph G|_{V′} of G is defined as G|_{V′} = (V′, E′) with E′ = {(u, v) | (u, v) ∈ E, u ⊆ V′, v ⊆ V′}. The node ordering on V′ is the restriction of the node ordering of V.

As we are interested in connected subgraphs, we give

Definition 18.2.3 (connected) Let H = (V, E) be a hypergraph. H is connected if |V| = 1 or if there exists a partitioning V′, V′′ of V and a hyperedge (u, v) ∈ E such that u ⊆ V′, v ⊆ V′′, and both G|_{V′} and G|_{V′′} are connected.

If H = (V, E) is a hypergraph and V′ ⊆ V is a subset of the nodes such that the node-induced subgraph G|_{V′} is connected, then we call V′ a connected subgraph or csg for short. The number of connected subgraphs is important for dynamic programming: it directly corresponds to the number of entries in the dynamic programming table. If a node set V′′ ⊆ (V \ V′) induces a connected subgraph G|_{V′′}, we call V′′ a connected complement of V′ or cmp for short.

For the purpose of this chapter, we assume that all hypergraphs are connected. This way, we can make sure that no (additional) cross products are needed. This condition can easily be assured by adding according hyperedges: for every pair of connected components, we can add a hyperedge whose hypernodes contain exactly the relations of the connected components. By considering these hyperedges as cross product operators, cross products can be handled by our conflict detectors, as we saw in Section 7.15.

18.3 CCPs: Csg-Cmp-Pairs for Hypergraphs

With these notations, we can move closer to the heart of dynamic programming by defining a csg-cmp-pair, or ccp for short.

Definition 18.3.1 (csg-cmp-pair, ccp) Let H = (V, E) be a hypergraph and S1, S2 two subsets of V such that S1 ⊆ V and S2 ⊆ (V \ S1) are a connected subgraph and a connected complement. If there further exists a hyperedge (u, v) ∈ E such that u ⊆ S1 and v ⊆ S2, we call (S1, S2) a csg-cmp-pair.

Note that if (S1, S2) is a csg-cmp-pair, then (S2, S1) is one as well. Out of these two possibilities, only one will be enumerated by our subsequent algorithm. More specifically, we will restrict the enumeration of csg-cmp-pairs to those (S1, S2) which satisfy the condition that min(S1) ≺ min(S2), where min(S) = s such that s ∈ S and ∀s′ ∈ S : s ≠ s′ ⟹ s ≺ s′. Since this restriction will hold for all csg-cmp-pairs enumerated by our procedure, we are sure that no duplicate csg-cmp-pairs are calculated. As a consequence, we have to take some care in order to ensure that our dynamic programming procedure is complete: if the binary operator we apply is commutative, the procedure to build a plan for S1 ∪ S2 from plans for S1 and S2 has to take commutativity into account. However, this is not really a challenge.

Obviously, in order to be correct, any dynamic programming algorithm has to consider all csg-cmp-pairs [618]. Further, only these have to be considered. Thus, the minimal number of cost function calls of any dynamic programming algorithm is exactly the number of csg-cmp-pairs for a given hypergraph. Note that the number of connected subgraphs is far smaller than the number of csg-cmp-pairs. The problem now is to enumerate the csg-cmp-pairs efficiently and in an order acceptable for dynamic programming.
The latter can be expressed more specifically: before enumerating a csg-cmp-pair (S1, S2), all csg-cmp-pairs (S1′, S2′) with S1′ ⊆ S1 and S2′ ⊆ S2 have to be enumerated.

18.4 Neighborhood

The main idea to generate csg-cmp-pairs is to incrementally expand connected subgraphs by considering new nodes in the neighborhood of a subgraph. Informally, the neighborhood N(S) under an exclusion set X consists of all nodes reachable from S that are not in X. We derive an exact definition below.

When choosing subsets of the neighborhood for inclusion, we have to treat a hypernode as a single instance: either all of its nodes are inside an enumerated subset or none of them. Since we want to use the fast subset enumeration procedure introduced by Vance and Maier [898], we must have a single bit representing a hypernode and also single bits for relations occurring in simple edges. Since these may overlap, we are constrained to choose one unique representative of every hypernode occurring in a hyperedge. We choose the node that is minimal with respect to ≺. Accordingly, we define:

  min(S) = {s | s ∈ S, ∀s′ ∈ S : s ≠ s′ ⟹ s ≺ s′}

Note that if S is empty, then min(S) is also empty. Otherwise, it contains a single element. Hence, if S is a singleton set, then min(S) equals the only element contained in S. For our hypergraph in Fig. 18.1 and with S = {R4, R5, R6}, we have min(S) = {R4}.

Let S be a current set, which we want to expand by adding further relations. Consider a hyperedge (u, v) with u ⊆ S. Then, we will add min(v) to the neighborhood of S. However, we have to make sure that the missing elements of v, i.e. v \ min(v), are also contained in any set emitted. We thus define

  \overline{min}(S) = S \ min(S)

For our hypergraph in Fig. 18.1 and with S = {R4, R5, R6}, we have \overline{min}(S) = {R5, R6}.

We define the set of non-subsumed hyperedges as the minimal subset E↓ of E such that for all (u, v) ∈ E there exists a hyperedge (u′, v′) ∈ E↓ with u′ ⊆ u and v′ ⊆ v. Additionally, we make sure that none of the nodes of a hypernode are contained in a set X, which is to be excluded from neighborhood considerations. We thus define a set containing the interesting hypernodes for given sets S and X. We do so in two steps. First, we collect the potentially interesting hypernodes into a set E↓′(S, X) and then minimize this set to eliminate subsumed hypernodes. This step then results in E↓(S, X), with which the algorithm will work.

  E↓′(S, X) = {v | (u, v) ∈ E, u ⊆ S, v ∩ S = ∅, v ∩ X = ∅}

Define E↓(S, X) to be the minimal set of hypernodes such that for all v ∈ E↓′(S, X) there exists a hypernode v′ in E↓(S, X) such that v′ ⊆ v. Note that apart from the connectedness, we test exactly the conditions given in Def. 18.3.1. For our hypergraph in Fig. 18.1 and with X = S = {R1, R2, R3}, we have E↓(S, X) = {{R4, R5, R6}}.

We are now ready to define the neighborhood of a hypernode S, given a set of excluded nodes X:

  IN(S, X) = ∪_{v ∈ E↓(S,X)} min(v)    (18.1)

For our hypergraph in Fig. 18.1 and with X = S = {R1, R2, R3}, we have IN(S, X) = {R4}. Assuming a bit vector representation of sets, the neighborhood can be efficiently calculated bottom-up.
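For illustration, the following Python sketch (not the book's implementation) computes min, E↓(S, X), and IN(S, X) for the hypergraph of Fig. 18.1, using frozensets where a real implementation would use bit vectors:

from itertools import chain

V = ["R1", "R2", "R3", "R4", "R5", "R6"]
order = {v: i for i, v in enumerate(V)}  # the total order on nodes

def fs(*xs): return frozenset(xs)

# Edges of Fig. 18.1: four simple edges plus one true hyperedge.
E = [(fs("R1"), fs("R2")), (fs("R2"), fs("R3")),
     (fs("R4"), fs("R5")), (fs("R5"), fs("R6")),
     (fs("R1", "R2", "R3"), fs("R4", "R5", "R6"))]

def min_node(S):  # min(S): the representative w.r.t. the node order
    return min(S, key=order.__getitem__)

def neighborhood(S, X):
    # E'↓(S, X): hypernodes v reachable from S, disjoint from S and X;
    # edges are unordered, so both directions are considered.
    cand = [v for (u, v) in chain(E, ((v, u) for (u, v) in E))
            if u <= S and not v & S and not v & X]
    # E↓(S, X): drop subsumed hypernodes (keep the minimal ones only).
    mins = [v for v in cand if not any(w < v for w in cand)]
    # IN(S, X): one representative per surviving hypernode (Eq. 18.1).
    return {min_node(v) for v in mins}

S = X = fs("R1", "R2", "R3")
assert neighborhood(S, X) == {"R4"}  # as computed in the text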
18.5 The CCP Enumerator BuEnumCcpHyp

Before starting with the algorithm description, we give a high-level overview of the general principles used in the algorithm:

1. The algorithm constructs ccps by enumerating connected subgraphs from an increasing part of the query graph;
2. both the primary connected subgraphs and their connected complements are created by recursive graph traversals;
3. during traversal, some nodes are forbidden in order to avoid creating duplicates. More precisely, when a function performs a recursive call, it forbids all nodes it will investigate itself;
4. connected subgraphs are increased by following edges to neighboring nodes. For this purpose, hyperedges are interpreted as n : 1 edges, leading from n nodes of one side to one (specific) canonical node of the other side (cf. Eq. 18.1).

Summarizing the above, the algorithm traverses the graph in a fixed order and recursively produces larger connected subgraphs. The main challenge relative to DPccp is the traversal of hyperedges. First, the "starting" side of the edge can require multiple nodes, which complicates the neighborhood computation; in particular, the neighborhood can no longer be computed as a simple bottom-up union of local neighborhoods. Second, the "ending" side of the edge can lead to multiple nodes at once, which disrupts the recursive growth of components. Consider a set S1, which we want to extend by a hyperedge (u, w). Even if u ⊆ S1, there is no guarantee that S1 ∪ w will be connected. To overcome these problems, the algorithm picks a representative end node. In our example, it picks the 1 in the n : 1 of item 4 (see also Eq. 18.1). With it, it starts the recursive growth and exploits the DP table to check whether a valid constellation has been reached, i.e., whether the constructed hypernode induces a connected subgraph. This exploitation builds on the fact that our DP strategies enumerate subsets before supersets.

We are now prepared to discuss the details of the algorithm. We give the implementation of our join ordering algorithm for hypergraphs by means of pseudocode for the member functions of a class BuEnumCcpHyp. This allows us to minimize the number of parameters by assuming that this class contains references to the query hypergraph (G = (V, E)) and to the dynamic programming table (DpTable). The whole algorithm is distributed over five subroutines. The top-level routine BuEnumCcpHyp initializes the DpTable with access plans for single relations and then calls EmitCsg and EnumerateCsgRec for each set containing exactly one relation. In a real implementation, the DpTable should be initialized before calling BuEnumCcpHyp. The member function EnumerateCsgRec is responsible for enumerating connected subgraphs. It does so by calculating the neighborhood and iterating over each of its subsets. For each such subset S1, it calls EmitCsg. This member function is responsible for finding suitable complements. It does so by calling EnumerateCmpRec, which recursively enumerates the complements S2 for the connected subgraph S1 found before. The pair (S1, S2) is a csg-cmp-pair. For every such pair, EmitCsgCmp is called. Its main responsibility is to consider a plan built up from the plans for S1 and S2. The following subsections discuss these five member functions in detail. We illustrate them with the example hypergraph shown in Fig. 18.1. The corresponding traversal steps are shown in Fig. 18.2; we will illustrate them during the description of the algorithm.
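The loops of the form "for each N ⊆ IN(S, X)" in the following pseudocode are typically implemented with the fast subset enumeration of Vance and Maier [898] mentioned in Section 18.4. A small Python sketch of that bit trick (illustrative only):

def nonempty_subsets(mask):
    # Enumerate all non-empty subsets of the bitmask 'mask' in
    # increasing numeric order (Vance/Maier-style bit trick).
    s = (-mask) & mask
    while s:
        yield s
        s = (s - mask) & mask

# Example: a neighborhood consisting of the nodes at bit positions
# 3 and 4 yields the three subsets {3}, {4}, and {3, 4}.
print([bin(s) for s in nonempty_subsets(0b11000)])
# ['0b1000', '0b10000', '0b11000']

The increasing numeric order matters: it guarantees that subsets are produced before their supersets, which is exactly the enumeration order required by the DP table lookups below.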
18.5.1 BuEnumCcpHyp

The pseudocode for BuEnumCcpHyp looks as follows:

BuEnumCcpHyp()
  for each v ∈ V // initialize DpTable
    DpTable[{v}] = plan for v
  for each v ∈ V descending according to ≺
    EmitCsg({v}) // process singleton sets
    EnumerateCsgRec({v}, B_v) // expand singleton sets
  return DpTable[V]

In the first loop, it initializes the dynamic programming table with plans for single relations. In the second loop, it calls, for every node in the query graph in decreasing order (according to ≺), the two subroutines EmitCsg and EnumerateCsgRec.

Figure 18.2: Trace of the algorithm for Figure 18.1 (legend: connected subgraph, connected complement, forbidden node, non-forbidden node)

In Fig. 18.2, we find the call stack of our algorithm. The calls generated by BuEnumCcpHyp correspond to those with stack-depth zero, where the stack-depth is indicated in the second column from the left. For convenience, we not only give the parameters but also the neighborhood IN. The algorithm calls EmitCsg({v}) for single nodes v ∈ V to generate all csg-cmp-pairs ({v}, S2) via calls to EnumerateCmpRec and EmitCsgCmp, where v ≺ min(S2) holds. This condition implies that every csg-cmp-pair is generated only once and that no symmetric pairs are generated. In Fig. 18.2, this corresponds to single vertex graphs, e.g., steps 1 and 2. The calls to EnumerateCsgRec extend the initial set {v} to larger sets S1, for which connected subsets of their complements S2 are then found such that (S1, S2) results in a csg-cmp-pair. In Fig. 18.2, this is shown in step 2, for example, where EnumerateCsgRec starts with R5 and expands it to {R5, R6} in step 4 (step 3 being the construction of the complement). To avoid duplicates during enumeration, all nodes that are ordered before v according to ≺ are prohibited during the recursive expansion [618]. Formally, we define this set as B_v = {w | w ≺ v} ∪ {v}.

18.5.2 EnumerateCsgRec

The general purpose of EnumerateCsgRec is to extend a given set S1, which induces a connected subgraph of G, to a larger set with the same property. It does so by considering each non-empty, proper subset of the neighborhood of S1. For each of these subsets N, it checks whether S1 ∪ N is a connected component. This is done by a lookup into the DpTable. If this test succeeds, a new connected component has been found and is further processed by a call EmitCsg(S1 ∪ N). Then, in a second step, for all these subsets N of the neighborhood, we call EnumerateCsgRec such that S1 ∪ N can be further extended recursively.
The reason why we first call EmitCsg and then EnumerateCsgRec is that, in order to have an enumeration sequence valid for dynamic programming, smaller sets must be generated first. Summarizing, the code looks as follows:

EnumerateCsgRec(S1, X)
  for each N ⊆ IN(S1, X) : N ≠ ∅
    if DpTable[S1 ∪ N] ≠ ∅
      EmitCsg(S1 ∪ N)
  for each N ⊆ IN(S1, X) : N ≠ ∅
    EnumerateCsgRec(S1 ∪ N, X ∪ IN(S1, X))

Take a look at step 12. This call was generated by BuEnumCcpHyp on S1 = {R2}. The neighborhood consists only of {R3}, since R1 is in X (R4, R5, R6 are not in X either, but they are not reachable). EnumerateCsgRec first calls EmitCsg, which will create the joinable complement (step 13). It then tests {R2, R3} for connectedness. The according DpTable entry was generated in step 13. Hence, this test succeeds, and {R2, R3} is further processed by a recursive call to EnumerateCsgRec (step 14). Now the expansion stops, since the neighborhood of {R2, R3} is empty, because R1 ∈ X.

18.5.3 EmitCsg

EmitCsg takes as an argument a non-empty, proper subset S1 of V which induces a connected subgraph. It is then responsible for generating the seeds for all S2 such that (S1, S2) becomes a csg-cmp-pair. Not surprisingly, the seeds are taken from the neighborhood of S1. All nodes that are ordered before the smallest element in S1 (captured by the set B_{min(S1)}) are removed from the neighborhood to avoid duplicate enumerations [618]. Since the neighborhood also contains min(v) for hyperedges (u, v) with |v| > 1, it is not guaranteed that S1 is connected to v. To avoid the generation of false csg-cmp-pairs, EmitCsg checks for connectedness. However, each single neighbor might be extended to a valid complement S2 of S1. Hence, no such test is necessary before calling EnumerateCmpRec, which performs this extension. The pseudocode looks as follows:

EmitCsg(S1)
  X = S1 ∪ B_{min(S1)}
  N = IN(S1, X)
  for each v ∈ N descending according to ≺
    S2 = {v}
    if ∃(u, v) ∈ E : u ⊆ S1 ∧ v ⊆ S2
      EmitCsgCmp(S1, S2)
    EnumerateCmpRec(S1, S2, X ∪ B_v(N))

where B_v(W) = {w | w ∈ W, w ⪯ v} is defined as in Section 3.2.4 for DPccp.

Take a look at step 20. The current set S1 is S1 = {R1, R2, R3}, and the neighborhood is IN = {R4}. As there is no hyperedge connecting these two sets, there is no call to EmitCsgCmp. However, the set {R4} can be extended to a valid complement, namely {R4, R5, R6}. Properly extending the seeds of complements is the task of the call to EnumerateCmpRec in step 21.

18.5.4 EnumerateCmpRec

EnumerateCmpRec has three parameters. The first parameter S1 is only used to pass it to EmitCsgCmp. The second parameter is a set S2 which is connected and must be extended until a valid csg-cmp-pair is reached. Therefore, it considers the neighborhood of S2. For every non-empty, proper subset N of the neighborhood, it checks whether S2 ∪ N induces a connected subset and is connected to S1. If so, we have a valid csg-cmp-pair (S1, S2 ∪ N) and can start plan construction (done in EmitCsgCmp). Irrespective of the outcome of the test, we recursively try to extend S2 such that this test becomes successful. Overall, EnumerateCmpRec behaves very much like EnumerateCsgRec.
Its pseudocode looks as follows:

EnumerateCmpRec(S1, S2, X)
  for each N ⊆ IN(S2, X) : N ≠ ∅
    if DpTable[S2 ∪ N] ≠ ∅ ∧ ∃(u, v) ∈ E : u ⊆ S1 ∧ v ⊆ S2 ∪ N
      EmitCsgCmp(S1, S2 ∪ N)
  X = X ∪ IN(S2, X)
  for each N ⊆ IN(S2, X) : N ≠ ∅
    EnumerateCmpRec(S1, S2 ∪ N, X)

Take a look at step 21 again. The parameters are S1 = {R1, R2, R3} and S2 = {R4}. The neighborhood consists of the single relation R5. The set {R4, R5} induces a connected subgraph. It was inserted into the DpTable in step 6. However, there is no hyperedge connecting it to S1. Hence, there is no call to EmitCsgCmp. Next is the recursive call in step 22 with S2 changed to {R4, R5}. Its neighborhood is {R6}. The set {R4, R5, R6} induces a connected subgraph. The according test via a lookup into the DpTable succeeds, since the according entry was generated in step 7. The second part of the test also succeeds, as our only true hyperedge connects this set with S1. Hence, the call to EmitCsgCmp in step 23 takes place and generates the plans containing all relations.

18.5.5 EmitCsgCmp

The procedure EmitCsgCmp(S1, S2) is called for every S1 and S2 such that (S1, S2) forms a csg-cmp-pair. It is the (call back) interface for BuEnumCcpHyp. Its only task is to call BuildPlan, which then builds the optimal plan(s) for (S1, S2).

18.5.6 Neighborhood Calculation

The formulation of the neighborhood we used is only one possibility. In fact, any neighborhood satisfying the following condition will do. Let G = (V, E) be a hypergraph not containing any subsumed edges. For some set S, for which we want to calculate the neighborhood, define the set of reachable hypernodes as

  W(S, X) := {w | (u, w) ∈ E, u ⊆ S, w ∩ (S ∪ X) = ∅},

where X contains the forbidden nodes. Then, any set of nodes N that contains exactly one element of every hypernode in W(S, X) can serve as the neighborhood. Further, in order to make BuEnumCcpHyp as efficient as DPccp for simple graphs, it is convenient to materialize the simple neighborhood for every plan class contained in the DpTable and to calculate it bottom-up. Figure 18.3 contains one possible implementation of the neighborhood calculation.

calcNeighborhood(S, X)
  N := ∅
  if isConnected(S)
    N = simpleNeighborhood(S) \ X
  else
    for each s ∈ S
      N ∪= simpleNeighborhood(s)
  F = (S ∪ X ∪ N) // forbidden since in X or already handled
  for each (u, v) ∈ E
    if u ⊆ S
      if v ∩ F = ∅
        N += min(v)
        F ∪= N
    if v ⊆ S
      if u ∩ F = ∅
        N += min(u)
        F ∪= N

Figure 18.3: Pseudocode for calcNeighborhood
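A useful cross-check for an implementation of these five member functions is a brute-force enumeration of Definition 18.3.1. The following Python sketch (illustrative only; it is restricted to simple graphs and makes no efficiency claims) enumerates all csg-cmp-pairs of the chain R1 - R2 - R3 and can serve as a test oracle for BuEnumCcpHyp:

from itertools import chain, combinations

# Brute-force oracle for Definition 18.3.1 on the chain R1 - R2 - R3.
# Nodes are numbered according to the order ≺.
V = {1, 2, 3}
adj = {1: {2}, 2: {1, 3}, 3: {2}}  # simple query graph

def connected(S):
    # Check that the node-induced subgraph on S is connected.
    if not S:
        return False
    seen, todo = set(), [min(S)]
    while todo:
        v = todo.pop()
        if v not in seen:
            seen.add(v)
            todo += list((adj[v] & S) - seen)
    return seen == S

def subsets(xs):
    xs = sorted(xs)
    return chain.from_iterable(combinations(xs, k)
                               for k in range(1, len(xs) + 1))

ccps = []
for s1 in map(set, subsets(V)):
    for s2 in map(set, subsets(V - s1)):
        joinable = any(w in s2 for v in s1 for w in adj[v])
        if (connected(s1) and connected(s2) and joinable
                and min(s1) < min(s2)):  # suppress symmetric duplicates
            ccps.append((s1, s2))

# The chain has ({1},{2}), ({2},{3}), ({1},{2,3}), ({1,2},{3}).
assert len(ccps) == 4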
18.6 DPhyp

18.7 Adding Selections

18.8 Adding Maps

18.9 Adding Grouping

Chapter 19

Optimizing Queries with Disjunctions

19.1 Introduction

Simple rewrites as indicated in Section ?? for IN and OR predicates that boil down to comparisons of a column with a set of constants can eliminate disjunction from the plan or push it into a multirange index access. Another possibility that can be used for disjunctions on single columns is to use a DISJOINT UNION of plans. This is a special form of UNION where conditions ensure that no phantom duplicates are produced. The DISJOINT UNION operator merely concatenates the result tables without any further overhead like duplicate elimination. For example, a predicate of the form x = c1 or y = c2, where x and y are columns of the same table, results in the two predicates

1. x = c1
2. x <> c1 and y = c2

Obviously, no row can satisfy both conditions. Hence, the query

  select * from R where x = c1 or y = c2

can be safely rewritten to

  (select * from R where x = c1)
  DISJOINT UNION
  (select * from R where x <> c1 and y = c2)

In case there are indexes on x and y, efficient plans do exist. If they don't, the table R needs to be scanned twice. This problem is avoided by using bypass plans. DISJOINT UNIONs can also be used for join predicates. Consider the following example query:

  select * from R, S where R.a = S.a or R.b = S.a

This query can be rewritten to

  (select * from R, S where R.a = S.a)
  DISJOINT UNION
  (select * from R, S where R.a <> S.a and R.b = S.a)

The general condition here is that all equality predicates have one side identical. Note that both tables are scanned and joined twice. Bypass plans will eliminate this problem. Let us consider a more complex example:

  select * from R, S where R.a = S.a and h.b IN (c1, c2)

XXX

19.2 Using Disjunctive or Conjunctive Normal Forms

19.3 Bypass Plans

All the above approaches rely on conjunctive normal forms. However, in the presence of disjunctions, this does not necessarily yield good plans. Using a disjunctive normal form does not always solve the problem either, and this approach has its own problems with duplicates. This is why bypass plans were developed [489, 835, 186]. The idea is to provide selection and join operators with two different output streams: one for the qualifying tuples and one for the non-qualifying tuples. We cannot go into the details of this approach and only illustrate it by means of examples.

Let us first consider a query with no join and a selection predicate of the form a ∧ (b ∨ c). This selection predicate is already in conjunctive normal form. The disjunctive normal form is (a ∧ b) ∨ (a ∧ c). We first consider some DNF-based plans (Fig. 19.1). These plans generate duplicates if a tuple qualifies for both paths. Hence, some duplicate elimination procedure is needed. Note that these duplicates have nothing to do with the duplicates generated by queries: even if the query does not specify distinct, the duplicates generated here must be eliminated. If there are duplicates, which is quite likely, then the condition a is evaluated twice for those tuples qualifying for both conjuncts (Plans a and b).

Figure 19.1: DNF plans

Figure 19.2 presents two CNF plans. CNF plans never produce duplicates. The evaluation of the boolean factors can stop as soon as some predicate evaluates to true. Again, some (expensive) predicates might be evaluated more than once in CNF plans.

Figure 19.2: CNF plans

Figure 19.3 shows some bypass plans. Note the different output streams. It should be obvious that a bypass plan can be more efficient than both a CNF and a DNF plan. It is possible to extend the idea of bypass plans to join operators. However, this, and the algorithm to generate bypass plans, are beyond our scope here (see [489, 835, 186]).

Figure 19.3: Bypass plans
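The two output streams are easy to picture in code. The following Python sketch (an illustration only, not the bypass plan generation algorithm of [489, 835, 186]) evaluates a ∧ (b ∨ c) with a bypass selection on b; the predicates a, b, c are arbitrary stand-ins:

# Bypass evaluation of the selection a(t) and (b(t) or c(t)):
# a bypass selection on b routes tuples so that c is evaluated
# only where b failed, and no duplicate elimination is needed.
a = lambda t: t % 2 == 0   # toy predicates over integer tuples
b = lambda t: t > 10
c = lambda t: t % 3 == 0

def bypass_select(pred, tuples):
    # One input, two disjoint output streams (+ and -).
    plus, minus = [], []
    for t in tuples:
        (plus if pred(t) else minus).append(t)
    return plus, minus

tuples = range(1, 30)
plus, minus = bypass_select(b, tuples)
# + stream: b holds, so (b or c) holds; only a remains to be checked.
res_plus = [t for t in plus if a(t)]
# - stream: b failed, so both c and a must still hold.
res_minus = [t for t in minus if c(t) and a(t)]
# The streams are disjoint by construction: plain concatenation suffices.
result = res_plus + res_minus
assert sorted(result) == [t for t in tuples if a(t) and (b(t) or c(t))]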
19.4 Implementation Remarks

The internal representation of execution plans during plan generation typically differs from that used in Rewrite I. The reason is that many plans have to be generated, and space efficiency is a major concern. As in the query representation discussed earlier, the physical algebraic operators can be organized into a hierarchy. Besides their arguments, they possibly contain backpointers to the original query representation (e.g. for predicates). Sharing is a must for plan generation; hence, subplans are heavily shared. The plan nodes are enhanced by so-called property vectors. These contain information about the plan:

• logical information
  - the set of relations joined
  - the set of predicates applied so far
  - the set of IUs computed so far
  - order information
• physical information
  - costs
  - cardinality information

For fast processing, the first three set-valued items in the logical information block are represented as bit-vectors. However, the problem is that an upper bound on the size of these bit-vectors is not reasonable. Hence, they are of varying size. It is recommendable to have a plan node factory that generates plan nodes of different lengths such that the bit-vectors are included in the plan node. A special interpreter class then knows the offsets and lengths of the different bit-vectors and supplies the operations needed to deal with them. This bit-vector interpreter can be attached to the plan generator's control block as indicated in Fig. 25.3.

19.5 Other Plan Generators/Query Optimizers

There are plenty of other query optimizers described in the literature. Some of my personal favorites not mentioned so far are the Blackboard query optimizer [488], the Epoq optimizer [612, 611], the Genesis optimizer [57, 62], the Gral query optimizer [67], the Lanzelotte query optimizer [529, 530, 531], the Orion optimizer [50, 51, 493], the Postgres optimizer [469, 400, 398, 399], the Prima optimizer [404, 402], the Probe optimizer [223, 222, 655], and the Straube optimizer [850, 889, 847, 848, 846, 849]. Highly recommended is a description of the DB2 query optimizer(s) [322]. Also interesting to read is the first proposal for a rule-based query optimizer, called Squirel [823], and other proposals for rule-based query optimizers [295, 780, 486, 485, 580].

19.6 Bibliography

Disjunctive queries:
P. Ciaccia and M. Scalas: Optimization Strategies for Relational Queries. IEEE Transactions on Software Engineering 15(10), pp. 1217-1235, 1989.
Kristofer Vorwerk, G. N. Paulley: On Implicate Discovery and Query Optimization. International Database Engineering and Applications Symposium (IDEAS'02).
Jack Minker, Rita G. Minker: Optimization of Boolean Expressions - Historical Developments. IEEE Annals of the History of Computing 2(3), pp. 227-238, 1980.
Chaudhuri, SIGMOD 2003: [149]. Conjunctive queries, branch minimization: [743]. Also Boolean difference calculus (?): [824].

Chapter 20

Generating Plans for the Full Algebra

Chapter 21

Generating DAG-structured Plans

@misc{roy-optimization,
  author = "Prasan Roy",
  title  = "Optimization of DAG-Structured Query Evaluation Plans",
  url    = "citeseer.nj.nec.com/roy98optimization.html"
}

Chapter 22

Simplifying the Query Graph

[This chapter was written by Thomas Neumann]

22.1 Introduction

As we have seen in Chapter 3, computing the optimal join order for large queries is a very hard problem.
Most hand-written queries join just a few (<15) relations, but in general join queries can become quite large: some systems like SAP R/3 store their data in thousands of relations, and subsequently generate large join queries. Other examples include data warehousing, where a fact table is joined with a large number of dimension tables, forming a star join, and databases that make heavy use of views to simplify query formulation (where the views then implicitly add joins). Existing database management systems have difficulties optimizing very large join queries, falling back to heuristics when they cannot solve them exactly anymore. This is unfortunate, as it does not offer a smooth transition. Ideally, one would optimize a query as much as possible under given time constraints.

When optimizing join queries, the optimal join order is usually determined using some variant of dynamic programming (DP). However, finding the optimal join order is NP-hard in general, which means that large join queries become intractable at some point. On the other hand, the complexity of the problem depends heavily upon the structure of the query (see Chapter 3): some queries can be optimized exactly even for a large number of relations, while other queries quickly become too difficult. As computing the optimal join order becomes intractable at some point, the standard technique of handling large join queries resorts to some heuristics. Some commercial database systems first try to solve the problem exactly using DP, and then fall back to greedy heuristics when they run out of memory. As we have seen in Chapter 3, a wide range of heuristics has been proposed in the literature. Most of them integrate some kind of greedy processing in the optimization process, greedily building execution plan fragments that seem plausible. The inherent problem of this approach is that it is quite likely to greedily make a decision that one would regret once more information about the complete execution plan is available. For example, greedily deciding which two relations should be joined first is very hard, as it depends on all other joins involved in the query.

Here, we follow a different approach presented in [639]: if a query is too complex to optimize exactly, we simplify it using a greedy heuristic until it becomes tractable using DP. The simplification step does not build execution plans but modifies the join graph of the query to make it more restrictive, ruling out join orders that it considers unlikely. In a way this is the opposite of the standard greedy plan building techniques: instead of greedily choosing joins (which is very hard), we choose joins that must be avoided. The great advantage of this approach is that we can start with the 'easy' decisions (i.e., the relatively obvious ones) using the heuristic and then leave the hard execution plan building to the DP algorithm once the problem is simplified enough. The resulting optimization algorithm adapts naturally to the query complexity and the given time budget, simplifying the query just as much as needed and then optimizing the simplified query exactly.

22.2 On Optimizing Join Queries

Optimizing the join order is one of the most important steps of query optimization, as changes in the join order can affect the query execution times by orders of magnitude. Unfortunately, computing the optimal join order is NP-hard in general, and the standard technique of using dynamic programming fails if the query is large enough.
Still, there are large differences in problem complexity even for queries of the same size. When disregarding cross products, the join predicates included in the query induce a query graph, and the structure of that query graph determines the complexity of the problem. Clique queries, for example, where there is a join predicate between any two relations involved in the query, are the worst-case scenario for join ordering. Here, any combination of relations is joinable, all joins affect each other through redundant join edges, and both the space complexity and the runtime complexity of the best known algorithm increase exponentially (in the order of O(2^n) and O(4^n), where n is the number of relations). For clique queries there is little hope of ever finding a good algorithm, but fortunately large clique queries never occur in practice. Chain queries on the other hand, where relations are joined sequentially, are quite common in practice and much easier to optimize: any join tree without cross products must only consist of relations that are neighboring in the chain, i.e., that form a subchain. As there are fewer than n^2 subchains of a chain of length n, and we can join a subchain only with fewer than n other (neighboring) subchains, we get a space complexity of O(n^2) and a time complexity of O(n^3). Other graph structures lie between these two extremes. Star queries, which are common in data warehouse applications where dimension tables are joined to a central fact table, have a space complexity of O(2^n) and a time complexity of O(n 2^n).

The practical impact of these complexity differences can be seen in Figure 22.1. It shows the optimization time using DPhyp and the setup discussed in [639]. One observation here is that while small queries (<10 relations) can be optimized quickly regardless of the graph structure, larger queries soon become too expensive for everything except chain and cycle queries. Clique queries are particularly bad, of course, but even the data warehousing star queries are too complex relatively soon. For really large queries (e.g., 50 relations), finding the optimal solution using DP is out of the question for most query types.

Figure 22.1: Runtimes for Different Query Graphs (optimization time in ms over the number of relations, for chains, cycles, stars, grids, and cliques)

Now, the basic idea of graph simplification stems from the fact that some graphs are easier to solve than others: if the problem is too difficult to solve exactly, we change the query graph to make it easier to solve. We will look at this simplification strategy in the next section.

22.3 Graph Simplification Algorithm

After examining the impact of the query graph structure on optimization time, we now study an algorithm to simplify the query graph as much as needed to allow for a dynamic programming solution. We first discuss the simplification itself, then how this can be used to simplify a query graph as much as needed, and then one edge selection heuristic (which is orthogonal to the main simplification algorithm). Finally, we show that the approach is plausible by proving optimality for star queries and certain classes of cost functions. Throughout this section we assume that the query has been brought into proper query (hyper-)graph form. In particular, we assume that all non-inner joins have been expressed as hyperedges, as suggested in [619]. This allows us to reason about graph structures alone without violating proper query semantics.
This allows us to reason about graph structures alone without violating proper query semantics. 384 CHAPTER 22. SIMPLIFYING THE QUERY GRAPH graph joins graph joins R0 R1 R0 R1 R3 R2 R0 B R1 R0 B R2 R0 B R3 original R0 R1 R3 R2 R0 B R1 {R0 , R1 } B R2 R0 B R3 1st step R0 R1 R3 R2 R0 B R1 {R0 , R1 } B R2 {R0 , R1 } B R3 2nd step R3 R2 R0 B R1 {R0 , R1 } B R2 {R0 , R1 , R2 } B R3 3rd step Figure 22.2: Exemplary Simplification Steps for a Star Query 22.3.1 Simplifying the Query Graph When a query graph is too complex to solve exactly, we perform a simplification step to reduce its complexity. Note that with simplification we mean a simplification of the underlying optimization problem. The graph itself becomes more complex, at least for a human. This is illustrated in Figure 22.2. The original query is a star query with three satellite relations. The number of possible join trees (ignoring commutativity) is 3! = 6, as any linear join order is valid. To reduce the search space we look for decisions that are relatively easy. For example if R0 B R1 is very selective and R0 B R2 greatly increases the cardinality, it is probably a good idea to join R1 first (for the real criterion see Section 22.3.3). We thus modify the join R0 B R2 into {R0 , R1 } B R2 . This describes that we can join a join tree containing R0 and R1 with a join tree contain R2 , and forms a hyperedge in the query graph. The search space shrinks to 3 possible trees, as now R1 is required to come before R2 . R3 can still be joined arbitrary, either before R1 , between R1 and R2 , or after R2 . We can reduce the search space to two trees by requiring R1 to be joined before R3 (2. step), and finally to just one valid tree by ordering the join with R2 before the join with R3 (3. step). At this point the optimization problem is trivial to solve, but the solution could be poor due to the heuristical join ordering. In the actual algorithm we therefore simplify just as much as needed to be able to solve the optimization problem, and we perform these simplification first where we are most certain about the correct join ordering. The pseudo-code for a single simplification step is shown in Figure 22.3. It examines all pairs of joins, and checks if they are neighboring in the query graph, i.e., they touch via common relations. The condition is somewhat complex, as the query graph contains hyperedges and not just regular join edges. It checks if B2 could occur in a subtree of B1 and if B2 need not come before B1 (otherwise ordering has no effect). If they are neighboring, we compute the 22.3. GRAPH SIMPLIFICATION ALGORITHM 385 SimplifyGraph(G = (V, E)) j1 = ∅, j2 = ∅, M = −∞ // Find the most beneficial simplification for each S1L B1 S1R ∈ E for each S2L B2 S2R ∈ E, B1 ̸= B2 // Does B1 neighbor B2 ? if ((S2L ⊆ S1L ∨ S2R ⊆ S1L ) ∧ (S2L ∪ S2R ̸⊆ S1L ))∨ ((S2L ⊆ S1R ∨ S2R ⊆ S1R ) ∧ (S2L ∪ S2R ̸⊆ S1R )) b =orderingBenefit(S1L B1 S1R ,S2L B2 S2R ) if b > M ∧ (B2 could be ordered before B1 ) j1 = S1L B1 S1R , j2 = S2L B2 S2R , M = b // No further simplification possible? if j1 = ∅ return G // Order j2 = S2L B2 S2R before j1 = S1L B1 S1R if (S2L ⊆ S1L ∨ S2R ⊆ S1L ) ∧ (S2L ∪ S2R ̸⊆ S1L )) return (V, E \ {j1 } ∪ {(S1L ∪ S2L ∪ S2R ) B1 S1R }) else return (V, E \ {j1 } ∪ {S1L B1 (S1R ∪ S2L ∪ S2R )}) Figure 22.3: Pseudo-Code for a Single Simplification Step expected benefit of ordering B2 before B1 . 
The implementation of orderingBenefit is orthogonal to the simplification itself; it should predict how likely it is that ⋈2 must come before ⋈1 (see Section 22.3.3). We restrict ourselves to ordering neighboring joins, as it is hard to make useful predictions about arbitrary unrelated joins. Note that through a series of simplification steps the join neighborhoods increase, such that the algorithm can ultimately order all joins if needed. The algorithm remembers the join pair (j1, j2) with the maximum estimated benefit and modifies the query graph such that j2 must come before j1. This creates a hyperedge in the query graph, as now j1 'requires' all relations involved in j2 to guarantee the ordering, effectively shrinking the search space.

A detail of the pseudocode not discussed yet is the condition '⋈2 could be ordered before ⋈1' in the first loop. So far we have assumed that it is indeed possible to order ⋈2 before ⋈1, but this might not be the case: first, the query might contain non-inner joins, which are not freely reorderable. Second, if the query is cyclic, a series of simplification steps could lead to a contradiction, demanding (transitively) that ⋈1 must come before ⋈2 and ⋈2 before ⋈1. To avoid this, we maintain a partial ordering of joins as a directed graph, deriving the initial one from the original query hypergraph and then ordering the joins as indicated by the simplification steps. The condition '⋈2 could be ordered before ⋈1' is effectively a check whether an edge ⋈2 → ⋈1 would create a cycle in this graph.

In general, the performance of SimplifyGraph can be improved significantly by maintaining proper data structures. As we will see in the next section, the algorithm is applied repeatedly to simplify a query graph, so it pays off to remember already computed results. We therefore materialize all neighbors of a join and update the neighbors when a join is modified. Further, we remember the estimated benefit for each neighbor and keep all joins in a priority queue ordered by the maximum benefit they can get from ordering a neighbor. This eliminates the two nested loops in the algorithm and greatly improves runtime.

22.3.2 The Full Algorithm

The full algorithm works in two phases: first, it repeatedly performs simplification steps until the problem complexity has decreased enough, and then it runs a dynamic programming algorithm to find the optimal solution for the simplified graph. To check if the graph has been simplified enough, we can reuse an insight from the DPccp algorithm [618]: the complexity of the dynamic programming algorithm depends on the number of connected subgraphs in the query graph. More precisely, the number of connected subgraphs is identical to the size of the DP table after the optimization is finished. We can therefore simplify the query graph until the number of connected subgraphs has decreased sufficiently. Counting the number of connected subgraphs is not that trivial, but an algorithm follows naturally from graph-based query optimization: the DPhyp algorithm [619] solves the join ordering problem by enumerating all connected subgraphs of the query graph and joining them with all connected subgraphs that are disjoint from but connected to the first subgraph. By simply eliminating the enumeration of the second subgraph, we get an algorithm that produces all connected subgraphs.
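That counting procedure is compact. The following Python sketch (an illustration for simple graphs, not the implementation of [639]) counts connected subgraphs DPccp-style and stops as soon as a budget is exceeded:

def count_connected_subgraphs(adj, budget):
    # Count the connected subgraphs of a simple query graph,
    # stopping once the count exceeds 'budget'.
    # adj[v] = set of neighbors of node v; nodes are 0..n-1.
    n = len(adj)
    count = 0

    def neighbors(S, X):
        return sorted(set().union(*(adj[v] for v in S)) - S - X)

    def expand(S, X):
        nonlocal count
        N = neighbors(S, X)
        for m in range(1, 1 << len(N)):
            if count > budget:
                return
            count += 1  # S plus any non-empty subset of N is a new csg
        for m in range(1, 1 << len(N)):
            sub = {N[i] for i in range(len(N)) if m >> i & 1}
            expand(S | sub, X | set(N))
            if count > budget:
                return

    for v in range(n - 1, -1, -1):
        count += 1  # the singleton {v}
        expand({v}, set(range(v + 1)))
        if count > budget:
            break
    return count

# A chain 0-1-2-3 has 4 + 3 + 2 + 1 = 10 connected subgraphs.
chain = [{1}, {0, 2}, {1, 3}, {2}]
assert count_connected_subgraphs(chain, budget=100) == 10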
Note that the algorithm does not have to fill a DP table, as we are only interested in the number of connected subgraphs, and we can stop as soon as we have enumerated more than our maximum number of connected subgraphs. However, enumerating 10,000 subgraphs in a query graph with 100 relations takes roughly 5 ms. This means that while checking the problem complexity is not that expensive, we cannot afford to check it after each simplification step, as there may be thousands of simplification steps.

The full algorithm therefore operates as depicted in Figure 22.4. It is invoked by giving a query graph G and a maximum complexity budget b. It first generates all possible simplifications Ḡ by applying the SimplifyGraph step repeatedly. The complexity of these graphs decreases monotonically, as each simplification step adds more restrictions. Then, it performs a binary search over the list of graphs and computes the complexity just for the currently examined graph. The graph with the least number of simplification steps that has a complexity ≤ b is stored in Gb. Note that Gb could be equal to G, i.e., the original problem, if the graph is simple enough. After the binary search, the optimal solution for Gb is computed by using DPhyp [619].

GraphSimplificationOptimizer(G = (V, E), b)
  // Input: a query graph G and a complexity budget b
  // Output: the best plan found under the budget b
  // Compute all possible simplification steps
  Ḡ = a list of query graphs, G′ = G
  do
    append G′ to Ḡ
    G = G′, G′ = SimplifyGraph(G)
  while G ≠ G′
  // Use binary search to find the necessary simplifications
  l = 0, r = |Ḡ|, v = r, Gb = Ḡ[r − 1]
  while l < r
    m = ⌊(l + r)/2⌋
    c = #connected subgraphs in Ḡ[m] (count at most b + 1)
    if c > b
      l = m + 1
    else
      r = m
    if c < v
      v = c, Gb = Ḡ[m]
  // Solve the simplified graph
  return DPhyp(Gb)

Figure 22.4: The Full Optimization Algorithm

Again, the pseudocode is simplified. In particular, it is not advisable to really materialize all query graphs in Ḡ, as this becomes noticeably expensive for queries with more than 50 relations. Instead, we just remember the two joins (j1, j2) selected for merging by the SimplifyGraph step. Then we materialize the graphs examined by the binary search by replaying the merge steps based upon these (j1, j2) values relative to the last materialized graph. Using these techniques, the full algorithm (including the final DPhyp call) takes less than one second for a star query with 50 relations and a complexity budget of 10,000 in the experiments of [639]. Note that we can even avoid generating all possible merge steps: by using techniques for unbounded search (e.g., [76]), we can generate merging steps as required by the search strategy. This does not change the asymptotic complexity, but it is more efficient if most queries require few or no simplification steps (which is probably the case in practice).

22.3.3 Join Ordering Criterion

So far we have assumed that the simplification algorithm can somehow estimate the benefit of ordering ⋈2 before ⋈1. In principle this is orthogonal to the simplification algorithm, and different kinds of ordering criteria could be used.
The experiments in [639] used the following estimation function, which compares the relative costs of the two join orders and gave very good results:

  orderingBenefit(X ⋈1 R1, X ⋈2 R2) = C((X ⋈1 R1) ⋈2 R2) / C((X ⋈2 R2) ⋈1 R1)

The rationale here is that if joining first R2 and then R1 is orders of magnitude cheaper than first joining R1 and then R2, it is very likely that the join with R2 will come before the join with R1 in the optimal solution, regardless of the other relations involved. As the simplification algorithm performs the orderings with the highest expected benefit first, it first enforces orderings where the cost differences are particularly large (and thus safe).

Note that the criterion shown above is oversimplified. First, computing the cost function C is not trivial, as we are only comparing joins and do not have complete execution plans yet. In particular, information about physical properties of the input is missing, which is required by some cost functions. One way to avoid this is to use the C_out cost function for the benefit estimation. The advantage of C_out is that it can be used without physical information, and further, optimizations based upon C_out are usually not that bad, as minimizing intermediate results is a plausible goal. Using the real cost function would be more attractive, but for some cost functions we can only use the real cost function in the final DP phase, as only then is physical information available.

The second problem is that we are not really comparing X ⋈1 R1 with X ⋈2 R2, but S1L ⋈1 S1R with S2L ⋈2 S2R, where ⋈1 and ⋈2 are neighboring hyperedges in the query graph. There are multiple cases that can occur; here we assume that S2L ⊆ S1L, the other cases being analogous. We define |S|⋈ as the output cardinality of joining the relations in S:

  |S|⋈ = (∏_{R ∈ S} |R|) · (∏_{⋈i ∈ E|S} sel(⋈i)),

where E|S denotes the join edges between relations in S. Then the joins S1L ⋈1 S1R and S2L ⋈2 S2R can be interpreted as X ⋈1 R1 and X ⋈2 R2 with |X| = max(|S1L|⋈, |S2L|⋈), |R1| = |S1R|⋈, and |R2| = |S2R|⋈. Note that we do not have to compute the costs of joining the relations in Si, as we are only interested in comparing the relative performance of ⋈1 and ⋈2. Note further that the accuracy of the prediction will increase over time, as the Si grow and at some point contain all relations that will come before a join. Therefore it is important to make the 'safe' orderings early, when the uncertainty is higher, and to perform the more unclear orderings later, when more is known about the input.
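A minimal sketch of this benefit estimation under the C_out cost function (illustrative only; all cardinalities and selectivities are assumed to come from the cardinality estimator, and the numbers below are invented):

def c_out(card_x, card_r, sel):
    # C_out of a single join: its cost is the size of its output.
    return card_x * card_r * sel

def ordering_benefit(card_x, card_r1, sel1, card_r2, sel2):
    # Compare C((X ⋈1 R1) ⋈2 R2) with C((X ⋈2 R2) ⋈1 R1) under C_out.
    # With independent predicates both orders produce the same final
    # result size, so the difference lies in the intermediate result.
    inter1 = c_out(card_x, card_r1, sel1)          # X ⋈1 R1
    cost1 = inter1 + c_out(inter1, card_r2, sel2)  # ... then ⋈2 R2
    inter2 = c_out(card_x, card_r2, sel2)          # X ⋈2 R2
    cost2 = inter2 + c_out(inter2, card_r1, sel1)  # ... then ⋈1 R1
    return cost1 / cost2  # >> 1 means: order ⋈2 before ⋈1

# R1 is very selective, R2 explodes the cardinality: the benefit of
# putting ⋈2 first is far below 1, i.e., ⋈1 should come first.
print(ordering_benefit(card_x=1000, card_r1=100, sel1=0.0001,
                       card_r2=100, sel2=0.1))  # ~0.011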
But relative order preserving is more general than ASI: for example, a simple sort-merge-join cost function

CSM(R1 ⋈ R2) = C(R1) + C(R2) + |R1| log |R1| + |R2| log |R2|

does not satisfy the ASI property, but is relative order preserving. As queries we consider star queries of the form Q = (V = {R0, ..., Rn}, E = {R0 ⋈1 R1, ..., R0 ⋈n Rn}) (which can be guaranteed by renaming relations), and require independence between join predicates and a relative order preserving cost function C. W.l.o.g. we assume that the cost function is symmetric, as we can always construct a symmetric cost function by using min(C(Ri ⋈ Rj), C(Rj ⋈ Ri)).

Star queries have two distinct properties: First, all query plans are linear, with R0 involved in the first join. Thus, as our cost function is symmetric, we can restrict ourselves to plans of the form (R0 ⋈ Rπ(1)) ... ⋈ Rπ(n), where π defines a permutation of [1, n]. Second, given a non-empty join tree T and a relation Ri ∉ T, T′ = T ⋈ Ri is a valid join tree with

|T′| = |T| |Ri| (|R0 ⋈i Ri| / (|R0| |Ri|)).

Thus, any (new) relation can be joined to an existing join tree, and the selectivity of the join is unaffected by the relations already contained in the tree (due to the independence of join predicates). Note that while this holds for star queries, it does not hold in general. For example, clique queries also allow for an arbitrary join order, but the selectivities are affected by previously joined relations. Using these observations, we now show the optimality for star queries:

Lemma 22.3.1 Given a query Q = (V, E), a relative order preserving cost function C and four relations R0, Ri, Rj, Rk ∈ V (i ≠ j ≠ k ≠ 0). Then C(R0 ⋈i Ri ⋈j Rj) ≥ C(R0 ⋈j Rj ⋈i Ri) implies C(R0 ⋈i Ri ⋈j Rj ⋈k Rk) ≥ C(R0 ⋈j Rj ⋈i Ri ⋈k Rk).

Proof Follows directly from the fact that (R0 ⋈i Ri ⋈j Rj) ≡ (R0 ⋈j Rj ⋈i Ri). The join ⋈k gets the same input in both cases and thus causes the same costs. This lemma holds even for non-star queries and arbitrary (monotonic) cost functions.

Lemma 22.3.3 Given a query Q = (V, E), a relative order preserving cost function C and four relations R0, Ri, Rj, Rk ∈ V (i ≠ j ≠ k ≠ 0). Then C(R0 ⋈i Ri ⋈j Rj) ≥ C(R0 ⋈j Rj ⋈i Ri) implies C(R0 ⋈k Rk ⋈i Ri ⋈j Rj) ≥ C(R0 ⋈k Rk ⋈j Rj ⋈i Ri).

Proof Follows from the definition of relative order preserving cost functions.

Corollary 1 Given a query Q = (V, E), a relative order preserving cost function C, three relations R0, Ri, Rj ∈ V (i ≠ j ≠ 0), and two join sequences S1, S2 of relations in V such that R0 S1 ⋈i Ri ⋈j Rj S2 forms a valid join tree. Then C(R0 ⋈i Ri ⋈j Rj) ≥ C(R0 ⋈j Rj ⋈i Ri) implies C(R0 S1 ⋈i Ri ⋈j Rj S2) ≥ C(R0 S1 ⋈j Rj ⋈i Ri S2).

Proof Follows from Lemmas 22.3.1 and 22.3.3. Both assume nothing about Rk except independence, thus ⋈k Rk could be a sequence of joins.

Theorem 1 Given a star query Q = (V, E) and a relative order preserving cost function C. Then for any optimal join tree T and any pair of relations Ri, Rj adjacent in T (i.e., T has the form R0 S1 ⋈i Ri ⋈j Rj S2) the following condition holds: Either C(R0 ⋈i Ri ⋈j Rj) ≤ C(R0 ⋈j Rj ⋈i Ri), or T′ = R0 S1 ⋈j Rj ⋈i Ri S2 is optimal, too.

Proof By contradiction. We assume that C(R0 ⋈i Ri ⋈j Rj) > C(R0 ⋈j Rj ⋈i Ri) and that T′ is not optimal. By Corollary 1 we can deduce that C(R0 ⋈i Ri ⋈j Rj) > C(R0 ⋈j Rj ⋈i Ri) implies C(T′) = C(R0 S1 ⋈j Rj ⋈i Ri S2) ≤ C(R0 S1 ⋈i Ri ⋈j Rj S2) = C(T). This is a contradiction to the assumption that T′ is not optimal.

This theorem is a strong indication that our simplification algorithm is plausible, as we know that one of the optimal solutions will satisfy the ordering constraints used by the algorithm.
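The defining implication is easy to probe numerically for a concrete cost function. The following self-contained Python sketch checks it for Cout, i.e., the sum of intermediate result sizes, on random instances with independent join predicates. It merely illustrates the definition on one cost function; it is no substitute for a proof.

import random

def cout(r0, rels):
    """Cout of the left-deep plan R0 x rels[0] x rels[1] ...
    rels is a sequence of (cardinality, selectivity) pairs; join
    predicates are independent, so selectivities simply multiply."""
    card, cost = r0, 0.0
    for n, sel in rels:
        card *= n * sel          # output cardinality of this join
        cost += card             # Cout sums intermediate result sizes
    return cost

random.seed(42)
for _ in range(10000):
    r0 = random.randint(1, 10000)
    j1, j2, j3 = [(random.randint(1, 10000), random.random())
                  for _ in range(3)]
    if cout(r0, [j1, j2]) >= cout(r0, [j2, j1]):
        # the premise holds; the conclusion must then hold as well
        assert cout(r0, [j3, j1, j2]) >= cout(r0, [j3, j2, j1])
print("no counterexample found")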
Unfortunately, the authors of [639] were only able to prove the optimality of the full algorithm by restricting the cost function some more (perhaps unnecessarily): A cost function C is fully relative order preserving if it is relative order preserving and the following condition holds for arbitrary relations R0, ..., R3 and arbitrary joins ⋈1, ⋈2, ⋈3 with independent join predicates:

C(R0 ⋈1 R1 ⋈2 R2) ≥ C(R0 ⋈2 R2 ⋈1 R1) ⇒ C(R0 ⋈1 R1 ⋈3 R3 ⋈2 R2) ≥ C(R0 ⋈2 R2 ⋈3 R3 ⋈1 R1)

Again, this property is satisfied by all ASI cost functions. Using this definition, we can show the optimality as follows.

Lemma 22.3.7 Given a query Q = (V, E), a fully relative order preserving cost function C, three relations R0, Ri, Rj ∈ V (i ≠ j ≠ 0), and three join sequences S1, S2, S3 of relations in V such that R0 S1 ⋈i Ri S2 ⋈j Rj S3 forms a valid join tree. Then C(R0 ⋈i Ri ⋈j Rj) ≥ C(R0 ⋈j Rj ⋈i Ri) implies C(R0 S1 ⋈i Ri S2 ⋈j Rj S3) ≥ C(R0 S1 ⋈j Rj S2 ⋈i Ri S3).

Proof Follows from Corollary 1 and the definition of fully relative order preserving cost functions.

Theorem 2 Given a star query Q = (V, E) and a fully relative order preserving cost function C. Applying the GraphSimplificationOptimizer algorithm leads to the optimal execution plan.

Proof As Q is a star query, any linear join order is valid; thus, join ordering is done purely based upon costs. The algorithm repeatedly orders the two joins with the largest quotient, which is guaranteed to be ≥ 1 due to the lack of join ordering constraints. Lemma 22.3.7 shows that joins can be ordered relative to each other regardless of other relations; thus, if the algorithm orders ⋈i before ⋈j, there exists an optimal solution with ⋈i before ⋈j (analogous to Theorem 1). The algorithm simplifies the graph until the joins are in a total order, which uniquely describes one optimal execution plan.

22.4 The Time/Quality Trade-Off

One particularly interesting property of the simplification algorithm is that it offers a direct trade-off between optimization time and result quality. We therefore repeat some experimental results from [639] here that illustrate this trade-off.

[Figure 22.5: The Effect of Simplification Steps for a Star Query with 20 Relations. Plots of the number of connected subgraphs, the optimization time (ms), and the scaled costs against the number of simplification steps.]

Clearly, each simplification step decreases the search space, i.e., the number of connected subgraphs. Ideally, the optimization time goes down analogously, and, unfortunately, the costs will go up if the heuristic makes mistakes. Figure 22.5 shows how the number of connected subgraphs, the optimization time, and the scaled costs (relative to the optimal solution) change during the simplification of a star query with 20 relations. As predicted, the search space shrinks monotonically with simplification. It does not shrink strictly monotonically, as the simplification algorithm sometimes adds restrictions that are already implied by other restrictions, but this is not an issue for the full algorithm due to the binary search. The optimization time follows the search space size, although there are some local peaks.
Apparently, these are caused by the higher costs of hyperedges for the DPhyp algorithm relative to normal edges. The scaled costs are constantly 1 here, i.e., the algorithm produces the optimal solution regardless of the number of simplification steps. This is due to the theoretical properties of the ordering heuristic (see Section 22.3.4), which in this case is optimal.

For grid queries the general situation is similar, as shown in Figure 22.6. Search space and optimization time decrease similarly to star queries; the costs, however, increase over time. Initially, the heuristic performs only the relatively safe orderings, which do not cause any increase in costs, but at some point it makes a mistake in ordering and causes the costs to increase step-wise. Fortunately, this happens when the search space has already been reduced a lot, which means that for simpler queries there is a reasonable hope that the heuristic will never reach the point where it starts making mistakes.

[Figure 22.6: The Effect of Simplification Steps for a Grid Query with 20 Relations. Plots of the number of connected subgraphs, the optimization time (ms), and the scaled costs against the number of simplification steps.]

Chapter 23

Deriving and Dealing with Interesting Orderings and Groupings

[This chapter was written by Thomas Neumann and Guido Moerkotte]

23.1 Introduction

The most expensive operations (e.g., join, grouping, duplicate elimination) during query evaluation can be performed more efficiently if the input is ordered or grouped in a certain way. Therefore, it is crucial for query optimization to recognize cases where the input of an operator satisfies the ordering or grouping requirements needed for a more efficient evaluation. Since a plan generator typically considers millions of different plans, and, hence, operators, this recognition easily becomes a performance bottleneck for plan generation, often leading to heuristic solutions.

The importance of exploiting available orderings has already been recognized in the seminal work of Selinger et al. [784]. They presented the concept of interesting orderings and showed how redundant sort operations could be avoided by reusing available orderings, rendering sort-based operators like sort-merge join much more interesting. Along these lines, it is beneficial to reuse available grouping properties, for example for hash-based operators. While heuristic techniques to avoid redundant group-by operators have been given [155], for a long time groupings have not been treated as thoroughly as orderings. One reason might be that while orderings and groupings are related (every ordering is also a grouping), groupings behave somewhat differently. For example, a tuple stream grouped on the attributes {a, b} need not be grouped on the attribute {a}. This is different from orderings, where a tuple stream ordered on the attributes (a, b) is also ordered on the attribute (a). Since no simple prefix (or subset) test exists for groupings, optimizing groupings even in a heuristic way is much more difficult than optimizing orderings. Still, it is desirable to combine order optimization and the optimization of groupings, as the problems are related and treated similarly during plan generation. Recently, some work in this direction has been published [911]. However, this only covers a special case of grouping.
Instead, in this chapter we follow the approach presented by Neumann and Moerkotte [644, 643]. Other existing frameworks usually consider only order optimization, and experimental results have shown that the costs for order optimization can have a large impact on the total costs of query optimization [644]. Therefore, some care is needed when adding groupings to order optimization, as a slowdown of plan generation would be unacceptable.

In this chapter, we present a framework to efficiently reason about orderings and groupings. It can be used for the plan generator described in Chapter ??, but is actually an independent component that could be used in any kind of plan generator. Experimental results show that it efficiently handles orderings and groupings at the same time, with no additional costs during plan generation and only modest one-time costs. Actually, the operations needed for both ordering and grouping optimization during plan generation can be performed in O(1), basically allowing the plan generator to exploit groupings for free.

23.2 Problem Definition

The order manager component used by the plan generator combines order optimization and the handling of groupings in one consistent set of algorithms and data structures. In this section, we give a more formal definition of the problem and the scope of the framework. First, we define the operations of ordering and grouping (Sections 23.2.1 and 23.2.2). Then, we briefly discuss functional dependencies (Section 23.2.3) and how they interact with algebraic operators (Section 23.2.4). Finally, we explain how the component is actually used during plan generation (Section 23.2.5).

23.2.1 Ordering

During plan generation, many operators require or produce certain orderings. To avoid redundant sorting, it is necessary to keep track of the orderings a certain plan satisfies. The orderings that are relevant for query optimization are called interesting orders [784]. The set of interesting orders for a given query consists of

1. all orderings required by an operator of the physical algebra that may be used in a query execution plan for the given query, and

2. all orderings produced by an operator of the physical algebra that may be used in a query execution plan for the given query. This includes the final ordering requested by the given query, if specified.

The interesting orders are logical orderings. This means that they specify a condition a tuple stream must meet to satisfy the given ordering. In contrast, the physical ordering of a tuple stream is the actual succession of tuples in the stream. Note that while a tuple stream has only one physical ordering, it can satisfy multiple logical orderings. For example, the stream of tuples ((1, 1), (2, 2)) with schema (a, b) has one physical ordering (the actual stream), but satisfies the logical orderings (a), (b), (a, b) and (b, a).

Some operators, like sort, actually influence the physical ordering of a tuple stream. Others, like select, only influence the logical ordering. For example, a sort[a] produces a tuple stream satisfying the ordering (a) by actually changing the physical order of tuples. After applying select[a=b] to this tuple stream, the result satisfies the logical orderings (a), (b), (a, b), (b, a), although the physical ordering did not change. Deduction of logical orderings can be described by using the well-known notion of functional dependency (FD) [818].
In general, the influence of a given algebraic operator on a set of logical orderings can be described by a set of functional dependencies.

We now formalize the problem. Let R = (t1, ..., tr) be a stream (ordered sequence) of tuples in attributes A1, ..., An. Then R satisfies the logical ordering o = (Ao1, ..., Aom) (1 ≤ oi ≤ n) if and only if for all 1 ≤ i < j ≤ r the following condition holds:

(ti.Ao1 ≤ tj.Ao1) ∧ ∀ 1 < k ≤ m: ((∃ 1 ≤ l < k: ti.Aol < tj.Aol) ∨ ((ti.Aok−1 = tj.Aok−1) ∧ (ti.Aok ≤ tj.Aok)))

Next, we need to define the inference mechanism. Given a logical ordering o = (Ao1, ..., Aom) of a tuple stream R, then R obviously satisfies any logical ordering that is a prefix of o, including o itself.

Let R be a tuple stream satisfying both the logical ordering o = (A1, ..., An) and the functional dependency f = B1, ..., Bk → Bk+1 with Bi ∈ {A1, ..., An}.¹ Then R also satisfies any logical ordering derived from o as follows: add Bk+1 to o at any position such that all of B1, ..., Bk occurred before this position in o. For example, consider a tuple stream satisfying the ordering (a, b); after inducing the functional dependency a, b → c, the tuple stream also satisfies the ordering (a, b, c), but not the ordering (a, c, b). Let O′ be the set of all logical orderings that can be constructed this way from o and f after prefix closure. Then we use the following notation: o ⊢f O′.

Let e be the equation Ai = Aj. Then o ⊢e O′, where O′ is the prefix closure of the union of the following three sets: the first set is O1 defined by o ⊢Ai→Aj O1, the second is O2 defined by o ⊢Aj→Ai O2, and the third is the set of logical orderings derived from o where a possible occurrence of Ai is replaced by Aj or vice versa. For example, consider a tuple stream satisfying the ordering (a); after inducing the equation a = b, the tuple stream also satisfies the orderings (a, b), (b) and (b, a).

Let e be an equation of the form A = const. Then O′ (o ⊢e O′) is derived from o by inserting A at any position in o. This is equivalent to o ⊢∅→A O′. For example, consider a tuple stream satisfying the ordering (a, b); after inducing the equation c = const, the tuple stream also satisfies the orderings (c, a, b), (a, c, b) and (a, b, c).

¹ Any functional dependency which is not in this form can be normalized into a set of FDs of this form.

Let O be a set of logical orderings and F a set of functional dependencies (and possibly equations). We define the sets of inferred logical orderings Ωi(O, F) as follows:

Ω0(O, F) := O
Ωi(O, F) := Ωi−1(O, F) ∪ ⋃f∈F, o∈Ωi−1(O,F) O′ with o ⊢f O′

Let Ω(O, F) be the prefix closure of ⋃i≥0 Ωi(O, F). We write o ⊢F o′ if and only if o′ ∈ Ω({o}, F).
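The inference mechanism translates almost literally into code. The following Python sketch (ours) computes Ω(O, F) for orderings, assuming all functional dependencies are given in the normalized form B1, ..., Bk → Bk+1; equations would first be normalized as described above.

def prefix_closure(orderings):
    """All non-empty prefixes of the given orderings (as tuples)."""
    return {o[:i] for o in orderings for i in range(1, len(o) + 1)}

def apply_fd(o, fd):
    """o |-_f O': insert the right-hand side of fd = (lhs, rhs) into the
    ordering o at every position behind all attributes of lhs.
    Orderings without repeated attributes are assumed."""
    lhs, rhs = fd
    if rhs in o:
        return set()
    result, seen = set(), set()
    for i, a in enumerate(o):
        seen.add(a)
        if set(lhs) <= seen:
            result.add(o[:i + 1] + (rhs,) + o[i + 1:])
    return result

def omega(orderings, fds):
    """Fixpoint computation of Omega(O, F) including prefix closure."""
    current = prefix_closure(orderings)
    while True:
        new = {d for o in current for fd in fds for d in apply_fd(o, fd)}
        new = prefix_closure(current | new)
        if new == current:
            return current
        current = new

# Example from the text: (a, b) with b -> c yields (a, b, c), not (a, c, b).
print(omega({("a", "b")}, [(("b",), "c")]))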
23.2.2 Grouping

It was shown in [911] that, similar to order optimization, it is beneficial to keep track of the groupings satisfied by a certain plan. Traditionally, group-by operators are either applied after the rest of the query has been processed or are scheduled using some heuristics [155]. However, the plan generator could take advantage of grouping properties, e.g., by avoiding re-hashing, if such information were easily available. Analogous to order optimization, we call this grouping optimization and define that the set of interesting groupings for a given query consists of

1. all groupings required by an operator of the physical algebra that may be used in a query execution plan for the given query, and

2. all groupings produced by an operator of the physical algebra that may be used in a query execution plan for the given query. This includes the grouping specified by the group-by clause of the query, if any exists.

These groupings are similar to logical orderings, as they specify a condition a tuple stream must meet to satisfy a given grouping. Likewise, functional dependencies can be used to infer new groupings. More formally, a tuple stream R = (t1, ..., tr) in attributes A1, ..., An satisfies the grouping g = {Ag1, ..., Agm} (1 ≤ gi ≤ n) if and only if for all 1 ≤ i < j < k ≤ r the following condition holds:

(∀ 1 ≤ l ≤ m: ti.Agl = tk.Agl) ⇒ (∀ 1 ≤ l ≤ m: ti.Agl = tj.Agl)

Two remarks are in order here. First, note that a grouping is a set of attributes and not, as orderings, a sequence of attributes. Second, note that given two groupings g and g′ ⊂ g and a tuple stream R satisfying the grouping g, R need not satisfy the grouping g′. For example, the tuple stream ((1, 2), (2, 3), (1, 4)) with the schema (a, b) is grouped by {a, b}, but not by {a}. This is different from orderings, where a tuple stream satisfying an ordering o also satisfies all orderings that are a prefix of o.

New groupings can be inferred by functional dependencies as follows: Let R be a tuple stream satisfying both the grouping g = {A1, ..., An} and the functional dependency f = B1, ..., Bk → Bk+1 with {B1, ..., Bk} ⊆ {A1, ..., An}. Then R also satisfies the grouping g′ = {A1, ..., An} ∪ {Bk+1}. Let G′ be the set of all groupings that can be constructed this way from g and f. Then we use the following notation: g ⊢f G′. For example, {a, b} ⊢a,b→c {a, b, c}.

Let e be the equation Ai = Aj. Then g ⊢e G′, where G′ is the union of the following three sets: the first set is G1 defined by g ⊢Ai→Aj G1, the second is G2 defined by g ⊢Aj→Ai G2, and the third is the set of groupings derived from g where a possible occurrence of Ai is replaced by Aj or vice versa. For example, {a, b} ⊢b=c {a, c}.

Let e be an equation of the form A = const. Then g ⊢e G′ is defined by g ⊢∅→A G′. For example, {a, b} ⊢c=const {a, b, c}.

Let G be a set of groupings and F a set of functional dependencies (and possibly equations). We define the sets of inferred groupings Ωi(G, F) as follows:

Ω0(G, F) := G
Ωi(G, F) := Ωi−1(G, F) ∪ ⋃f∈F, g∈Ωi−1(G,F) G′ with g ⊢f G′

Let Ω(G, F) be ⋃i≥0 Ωi(G, F). We write g ⊢F g′ if and only if g′ ∈ Ω({g}, F).

23.2.3 Functional Dependencies

The reasoning about orderings and groupings assumes that the set of functional dependencies is known. The process of gathering the relevant functional dependencies is described in detail in [818, 819]. Predominantly, there are four sources of functional dependencies:

1. key constraints,
2. join predicates [references constraints],
3. filter predicates,
4. simple expressions.

However, the algorithm makes no assumptions about the functional dependencies. If for some reason an operator induces another kind of functional dependency (e.g., when using TID-based optimizations [588]), this can be handled in the same way. The only important fact is that we provide the set of functional dependencies as input to the algorithm.
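Once the functional dependencies have been gathered from these sources, grouping inference is even simpler than ordering inference, since no attribute positions are involved. A minimal Python sketch mirroring the definition of Ω(G, F) from Section 23.2.2 (again assuming normalized FDs):

def omega_groupings(groupings, fds):
    """Fixpoint computation of Omega(G, F): a grouping is a frozenset,
    an FD is a pair (lhs, rhs) with lhs a collection of attributes."""
    current = set(groupings)
    while True:
        new = {g | {rhs}
               for g in current
               for lhs, rhs in fds
               if set(lhs) <= g}
        if new <= current:
            return current
        current |= new

# Example from the text: {a, b} with a, b -> c yields {a, b, c}.
print(omega_groupings({frozenset("ab")}, [(("a", "b"), "c")]))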
23.2.4 Algebraic Operators

To illustrate the propagation of orderings and groupings during query optimization, we give some rules for concrete (physical) operators in Figure 23.1. As a shorthand, we use the following notation:

O(R)  set of logical orderings and groupings satisfied by the physical ordering of the relation R
O(S)  inferred set of logical orderings and groupings satisfied by the tuple stream S
x↓    {y | y ∈ x}, i.e., the set of attributes occurring in the sequence x

operator                     | requires                     | produces
-----------------------------+------------------------------+--------------------
scan(R)                      |                              | O(R)
indexscan(Idx)               |                              | O(Idx)
map(S, a = f(b))             |                              | Ω(O(S), {b → a})
select(S, a = b)             |                              | Ω(O(S), {a = b})
bnl-join(S1, S2)             |                              | O(S1)
indexnl-join(S1, S2)         |                              | O(S1)
djoin(S1, S2)                |                              | O(S1)
sort(S, a1, ..., an)         |                              | (a1, ..., an)
group-by(S, a1, ..., an)     |                              | {a1, ..., an}
hash(S, a1, ..., an)         |                              | {a1, ..., an}
sort-merge(S1, S2, ⃗a = ⃗b)  | ⃗a ∈ O(S1) ∧ ⃗b ∈ O(S2)     | Ω(O(S1), ⃗a = ⃗b)
hash-join(S1, S2, ⃗a = ⃗b)   | ⃗a↓ ∈ O(S1) ∧ ⃗b↓ ∈ O(S2)   | Ω(O(S1), ⃗a = ⃗b)

Figure 23.1: Propagation of orderings and groupings

Note that these rules somewhat depend on the actual implementation of the operators; e.g., a blockwise nested loop join might actually destroy the ordering if the blocks are stored in hash tables. The rules are also simplified: For example, a group-by will probably compute some aggregate functions, inducing new functional dependencies. Furthermore, additional information can be derived from schema information: If the right-hand side of a dependent join (index nested loop joins are similar) produces at most one tuple, and the left-hand side is grouped on the free attributes of the right-hand side (e.g., if they do not contain duplicates), the output is also grouped on the attributes of the right-hand side. This situation is common, especially for index nested loop joins, and is detected automatically if the corresponding functional dependencies are considered. Therefore, it is important that all operators consider all functional dependencies they induce.

23.2.5 Plan Generation

To exploit available logical orderings and groupings, the plan generator needs access to the combined order optimization and grouping component, which we describe as an abstract data type (ADT). An instance of this abstract data type OrderingGrouping represents a set of logical orderings and groupings, and wherever necessary, an instance is embedded into a plan node. The main operations the abstract data type OrderingGrouping must provide are:

1. a constructor for a given logical ordering or grouping,
2. a membership test containsOrdering(LogicalOrdering), which tests whether the set contains the logical ordering given as parameter,
3. a membership test containsGrouping(Grouping), which tests whether the set contains the grouping given as parameter, and
4. an inference operation infer(set): given a set of functional dependencies and equations, it computes the new set of logical orderings and groupings a tuple stream satisfies.

These operations can be implemented by using the formalism described before: containsOrdering tests for o ∈ O, containsGrouping tests for g ∈ G, and infer(F) calculates Ω(O, F) and Ω(G, F), respectively.
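As an interface, the ADT is deliberately small. The following Python sketch captures it; the state representation is left abstract here, and Section 23.4 instantiates it with a single integer:

from abc import ABC, abstractmethod

class OrderingGrouping(ABC):
    """ADT for the set of logical orderings and groupings a plan satisfies."""

    @abstractmethod
    def containsOrdering(self, logical_ordering) -> bool:
        """Test whether the represented set contains the given logical ordering."""

    @abstractmethod
    def containsGrouping(self, grouping) -> bool:
        """Test whether the represented set contains the given grouping."""

    @abstractmethod
    def infer(self, fds) -> "OrderingGrouping":
        """Compute the state reached after applying a set of functional
        dependencies and equations."""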
Note that the intuitive approach to explicitly maintain the set of all logical orderings and groupings is not useful in practice. For example, if a sort operator sorts a tuple stream on (a, b), the result is compatible with the logical orderings {(a, b), (a)}. After a selection operator with selection predicate x = const is applied, the set of logical orderings changes to {(x, a, b), (a, x, b), (a, b, x), (x, a), (a, x), (x)}. Since the size of the set increases quadratically with every additional selection predicate of the form v = const, a naive representation as a set of logical orderings is problematic. This led Simmen et al. to introduce a more concise representation, which is discussed in the related work section. Note that Simmen's technique is not easily applicable to groupings, and no algorithm was proposed to efficiently maintain the set of available groupings. The order optimization component described here closes this gap by supporting both orderings and groupings. The problem of quadratic growth is avoided by only implicitly representing the set.

23.3 Overview

As we have seen, explicit maintenance of the set of logical orderings and groupings can be very expensive. However, the ADT OrderingGrouping required for plan generation does not need to offer access to this set: It only allows testing whether a given interesting order or grouping is in the set, and it changes the set according to new functional dependencies. Hence, it is not required to explicitly represent this set; an implicit representation is sufficient as long as the ADT operations can be implemented on top of it. In other words, we need not be able to reconstruct the set of logical orderings and groupings from the state of the ADT. This gives us room for optimizations.

The initial idea (see [644]) was to represent sets of logical orderings as states of a finite state machine (FSM). Roughly, a state of the FSM represents a current physical ordering and the set of logical orderings that can be inferred from it given a set of functional dependencies. The edges (transitions) in the FSM are labeled by sets of functional dependencies. They lead from one state to another if the target state of the edge represents the set of logical orderings that can be derived from the orderings the edge's source node represents by applying the set of functional dependencies the edge is labeled with. We have to use sets of functional dependencies, since a single algebraic operator may introduce more than one functional dependency.

[Figure 23.2: Possible FSM for orderings. States (a), (a, b), (a, b, c), (a, b, d), (a, b, c, d), (a, b, d, c), with ϵ edges to prefixes and transitions labeled {b → d}.]

Let us illustrate the idea by a simple example and then discuss some problems. In Figure 23.2, an FSM for the interesting order (a, b, c) and its prefixes (remember that we need prefix closure) and the set of functional dependencies {b → d} is given. When a physical ordering satisfies (a, b, c), it also satisfies its prefixes (a, b) and (a). This is indicated by the ϵ transitions. The functional dependency b → d allows us to derive the logical orderings (a, b, c, d) and (a, b, d, c). This is handled by assuming that the physical ordering changes to either (a, b, c, d) or (a, b, d, c). Hence, these states have to be added to the FSM. We further add the transitions induced by {b → d}. Note that the resulting FSM is a non-deterministic finite state machine (NFSM).

Assume we have an NFSM as above. Then (while ignoring groupings) the state of the ADT is a state of the NFSM, and the operations of the ADT can easily be mapped to the FSM. Testing for a logical ordering can be performed by checking if the node with the ordering is reachable from the current state by following ϵ edges.
If the set must be changed because of a functional dependency, the state is changed by following the edge labeled with the functional dependency. Of course, the non-determinism is in our way.

While remembering only the active state of the NFSM avoids the problem of maintaining a set of orderings, the NFSM is not really useful from a practical point of view, since the transitions are non-deterministic. Nevertheless, the NFSM can be considered as a special non-deterministic finite automaton (NFA), which consumes the functional dependencies and "recognizes" the possible physical orderings. Further, an NFA can be converted into a deterministic finite automaton (DFA), which can be handled efficiently. Remember that the construction is based on the power set of the NFA's states; that is, the states of the DFA are sets of states of the NFA [553]. We do not take the deviation over the finite automaton but instead lift the construction of deterministic finite automata from non-deterministic ones to finite state machines. Since this is not a traditional conversion, we give a proof of this step in Section ??.

Yet another problem is that the conversion from an NFSM to a deterministic FSM (DFSM) can be expensive for large NFSMs. Therefore, reducing the size of the NFSM is another problem we look at. We introduce techniques for reducing the set of functional dependencies that have to be considered and further techniques to prune the NFSM in Section 23.4.7.

The idea of a combined framework for orderings and groupings was presented in [643]. Here, the main point is to construct a similar FSM for groupings and integrate it into the FSM for orderings, thus handling orderings and groupings at the same time. An example of this is shown in Figure 23.3, which gives the FSM for the grouping {a, b, c} and the functional dependency b → d. We represent states for orderings as rounded boxes and states for groupings as rectangles.

[Figure 23.3: Possible FSM for groupings. Grouping state {a, b, c} with a {b → d} transition to {a, b, c, d}.]

Note that although the FSM for groupings has a start node similar to the FSM for orderings, it is much smaller. This is due to the fact that groupings are only compatible with themselves; no nodes for prefixes are required. However, the FSM is still non-deterministic: given the functional dependency b → d, the grouping {a, b, c} is compatible both with {a, b, c, d} and with {a, b, c} itself; therefore, there exists an (implicit) edge from each grouping to itself.

The FSM for groupings is integrated into the FSM for orderings by adding ϵ edges from each ordering to the grouping with the same attributes; this is due to the fact that every ordering is also a grouping.

[Figure 23.4: Combined FSM for orderings and groupings. The ordering FSM of Figure 23.2 extended by grouping states, with ϵ edges from each ordering to the grouping with the same attributes and {b → d} transitions between groupings.]

Note that although the ordering (a, b, c, d) also implies the grouping {a, b, c}, no edge is required for this, since there exists an ϵ edge to (a, b, c) and from there to {a, b, c}.
After constructing a combined FSM as described above, the full ADT supporting both orderings and groupings can easily be mapped to the FSM: The state of the ADT is a state of the FSM, and testing for a logical ordering or grouping can be performed by checking if the node with the ordering or grouping is reachable from the current state by following ϵ edges (as we will see, this can be precomputed to yield the O(1) time bound for the ADT operations). If the state of the ADT must be changed because of functional dependencies, the state in the FSM is changed by following the edge labeled with the functional dependency. However, the non-determinism of this transition is a problem. Therefore, for practical purposes, the NFSM must be converted into a DFSM. The resulting DFSM is shown in Figure 23.5.

[Figure 23.5: Possible DFSM for Figure 23.4. The state {(a), (a, b), (a, b, c), {a, b}} leads via {b → d} to the state {(a), (a, b), (a, b, c), (a, b, d), (a, b, c, d), (a, b, d, c), {a, b, d}}.]

Note that although in this simple example the DFSM is very small, the conversion could lead to exponential growth. Therefore, additional pruning techniques for groupings are presented in Section 23.4.7. However, the inclusion of groupings is not critical for the conversion, as the grouping part of the NFSM is nearly independent of the ordering part. In Section 23.5 we look at the size increase due to groupings. The memory consumption usually increases by a factor of two, which is the minimum expected increase, since every ordering is a grouping.

Some operators, like sort, change the physical ordering. In the NFSM, this is handled by changing the state to the node corresponding to the new physical ordering. Implied by its construction, in the DFSM this new physical ordering typically occurs in several nodes. For example, (a, b, c) occurs in both nodes of the DFSM in Figure 23.5. It is, therefore, not obvious which node to choose. We will take care of this problem during the construction of the NFSM (see Section 23.4.3).

23.4 Detailed Algorithm

23.4.1 Overview

Our approach consists of two phases. The first phase is the preparation step taking place before the actual plan generation starts. The output of this phase is the set of precomputed values used to implement the ADT. The ADT is then used during the second phase, in which the actual plan generation takes place. The first phase is performed exactly once and is quite involved. Most of this section covers the first phase; only Section 23.4.6 deals with the ADT implementation.

Figure 23.6 gives an overview of the preparation phase. It is divided into four major steps, which are discussed in the following subsections. Section 23.4.2 briefly reviews how the input to the first phase is determined and, more importantly, what it looks like. Section 23.4.3 describes in detail the construction of the NFSM from the input. The conversion from the NFSM to the DFSM is only briefly sketched in Section 23.4.4; for details, see [553]. From the DFSM, some values are precomputed which are then used for the efficient implementation of the ADT. The precomputation is described in Section 23.4.5, while their utilization and the ADT implementation are the topic of Section 23.4.6. Section 23.4.7 contains some important techniques to reduce the size of the NFSM. They are applied in Steps 2 (b), 2 (c) and 2 (e). During the discussion, we illustrate the different steps by a simple running example. More complex examples can be found in Section 23.5.
1. Determine the input
   (a) Determine interesting orders
   (b) Determine interesting groupings
   (c) Determine the set of functional dependencies
2. Construct the NFSM
   (a) Construct states of the NFSM
   (b) Filter functional dependencies
   (c) Build filters for orderings and groupings
   (d) Add edges to the NFSM
   (e) Prune the NFSM
   (f) Add an artificial start state and edges
3. Convert the NFSM into a DFSM
4. Precompute values
   (a) Precompute the compatibility matrix
   (b) Precompute the transition table

Figure 23.6: Preparation steps of the algorithm

23.4.2 Determining the Input

Since the preparation step is performed immediately before plan generation, it is assumed that the query optimizer has already determined which indices are applicable and which algebraic operators can possibly be used to construct the query execution plan. Before constructing the NFSM, the set of interesting orders, the set of interesting groupings and the sets of functional dependencies for each algebraic operator are determined. We denote the set of sets of functional dependencies by F.

It is important for the correctness of our algorithms that we note which of the interesting orders are (1) produced by some algebraic operator or (2) only tested for. Note that the interesting orders which satisfy (1) may additionally be tested for as well. We denote the orderings under (1) by OP, those under (2) by OT. The total set of interesting orders is defined as OI = OP ∪ OT. The orders produced are treated slightly differently in the following steps. The groupings are classified similarly: We denote the groupings produced by some algebraic operator by GP, and those just tested for by GT. The total set of interesting groupings is defined as GI = GP ∪ GT. More information on how to extract interesting groupings can be found in [911]. Furthermore, for a sample query the extraction of both interesting orders and groupings is illustrated in Section 23.5.

To illustrate the subsequent steps, we assume that the set of sets of functional dependencies F = {{b → c}, {b → d}}, the interesting groupings GI = {{b}} ∪ {{b, c}} and the interesting orders OI = {(b), (a, b)} ∪ {(a, b, c)} have been extracted from the query. We assume that those in OT = {(a, b, c)} and GT = {{b, c}} are tested for but not produced by any operator, whereas those in OP = {(b), (a, b)} and GP = {{b}} may be produced by some algebraic operators.

23.4.3 Constructing the NFSM

An NFSM is a tuple (Σ, Q, D, q0), where

• Σ is the input alphabet,
• Q is the set of possible states,
• D ⊆ Q × (Σ ∪ {ϵ}) × Q is the transition relation, and
• q0 is the initial state.

Coarsely, Σ consists of the functional dependencies, Q of the relevant orderings and groupings, and D describes how the orderings or groupings change under a given functional dependency. Some refinements are needed to provide efficient ADT operations. The details of the construction are described now.

For the order optimization part, the states are partitioned as Q = QI ∪ QA ∪ {q0}, where q0 is an artificial state to initialize the ADT, QI is the set of states corresponding to interesting orderings, and QA is a set of artificial states only required for the algorithm itself; QA is described later. Furthermore, the set QI is partitioned into QIP and QIT, representing the orderings in OP and OT, respectively. To support groupings, we add to QIP states corresponding to the groupings in GP and to QIT states corresponding to the groupings in GT.
The initial NFSM contains the states QI of interesting groupings and orderings. For the example, this initial construction, not including the start state q0, is shown in Figure 23.7. The states representing groupings are drawn as rectangles and the states representing orderings are drawn with rounded corners.

[Figure 23.7: Initial NFSM for sample query. Ordering states (b), (a, b), (a, b, c) and grouping states {b}, {b, c}.]

When considering functional dependencies, additional groupings and orderings can occur. These are not directly relevant for the query, but have to be represented by states to handle transitive changes. Since they have no direct connection to the query, these states are called artificial states. Starting with the initial states QI, the artificial states are constructed by considering functional dependencies:

QA = (Ω(OI, F) \ OI) ∪ (Ω(GI, F) \ GI)

In our example, this creates the states (b, c) and (a), as (b, c) can be inferred from (b) when considering {b → c}, and (a) can be inferred from (a, b), since (a) is a prefix of (a, b). The result is shown in Figure 23.8 (ignore the edges for now).

Sometimes the ADT has to be explicitly initialized with a certain ordering or grouping (e.g., after a sort). To support this, artificial edges are added later on. These point to the requested ordering or grouping (states in QIP) and are labeled with the state that they lead to. Therefore, the input alphabet Σ consists of the sets of functional dependencies and the produced orderings and groupings:

Σ = F ∪ QIP ∪ {ϵ}

In our example, Σ = {{b → c}, {b → d}, (b), (a, b), {b}}. Accordingly, the domain of the transition relation D is

D ⊆ ((Q \ {q0}) × (F ∪ {ϵ}) × (Q \ {q0})) ∪ ({q0} × QIP × QIP).

The edges are formed by the functional dependencies and the artificial edges. Furthermore, ϵ edges exist between orderings and the corresponding groupings, as orderings are a special case of groupings:

D_FD = {(q, f, q′) | q, q′ ∈ Q \ {q0}, f ∈ F ∪ {ϵ}, q ⊢f q′}
D_A  = {(q0, q, q) | q ∈ QIP}
D_OG = {(o, ϵ, g) | o ∈ Ω(OI, F), g ∈ Ω(GI, F), o ≡ g}
D    = D_FD ∪ D_A ∪ D_OG

Here, o ≡ g means that the grouping g consists of exactly the attributes occurring in the ordering o.

First, the edges corresponding to functional dependencies are added (D_FD). In our example, this results in the NFSM shown in Figure 23.8. Note that the functional dependency b → d has been pruned, since d does not occur in any interesting order or grouping.

[Figure 23.8: NFSM after adding D_FD edges, including the artificial states (a) and (b, c) and the {b → c} transitions.]

The NFSM can be further simplified by pruning the artificial state (b, c), which cannot lead to a new interesting order. The result is shown in Figure 23.9. A detailed description of these pruning techniques can be found in Section 23.4.7.

[Figure 23.9: NFSM after pruning artificial states.]

The artificial start state q0 has emanating edges incident to all states representing interesting orders in OP and interesting groupings in GP (D_A). Also, the states representing orderings have ϵ edges to their corresponding grouping states (D_OG), as every ordering is also a grouping. The final NFSM for the example is shown in Figure 23.10. Note that the states representing (a, b, c) and {b, c} are not linked to q0 by an artificial edge, since they are only tested for, as they are in QIT.

[Figure 23.10: Final NFSM. Start state q0 with artificial edges to (b), (a, b) and {b}; ϵ edges from orderings to their prefixes and groupings; {b → c} transitions leading to (a, b, c) and {b, c}.]
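The construction can be sketched compactly. In the following Python sketch (ours), states are tuples (orderings) and frozensets (groupings), and FDs are normalized triples. The pruning of Section 23.4.7 is omitted, so the sketch over-approximates: it also creates states, such as (b, c) or the grouping {a, b}, that the full algorithm would prune or never create.

def build_nfsm(interesting_orders, interesting_groupings, fds, produced):
    """Construct a simplified NFSM as (states, edges).

    interesting_orders:    set of tuples (O_I)
    interesting_groupings: set of frozensets (G_I)
    fds:                   list of (label, lhs, rhs), normalized FDs
    produced:              states in O_P and G_P (targets of artificial edges)
    """
    states, edges = set(), set()
    todo = list(interesting_orders | interesting_groupings)
    while todo:
        q = todo.pop()
        if q in states:
            continue
        states.add(q)
        if isinstance(q, tuple):
            if len(q) > 1:                       # epsilon edge to the prefix
                edges.add((q, "eps", q[:-1]))
                todo.append(q[:-1])
            edges.add((q, "eps", frozenset(q)))  # ordering -> grouping (D_OG)
            todo.append(frozenset(q))
        for label, lhs, rhs in fds:              # FD edges (D_FD)
            if isinstance(q, tuple) and set(lhs) <= set(q) and rhs not in q:
                for i in range(len(q)):          # insert rhs behind all of lhs
                    if set(lhs) <= set(q[:i + 1]):
                        q2 = q[:i + 1] + (rhs,) + q[i + 1:]
                        edges.add((q, label, q2))
                        todo.append(q2)
            elif isinstance(q, frozenset) and set(lhs) <= q and rhs not in q:
                edges.add((q, label, q | {rhs}))
                todo.append(q | {rhs})
    for q in produced:                           # artificial start edges (D_A)
        edges.add(("q0", q, q))
    return states | {"q0"}, edges

# Running example; {b -> d} is assumed to be filtered out beforehand.
states, edges = build_nfsm(
    {("b",), ("a", "b"), ("a", "b", "c")},
    {frozenset("b"), frozenset("bc")},
    [("b->c", ("b",), "c")],
    [("b",), ("a", "b"), frozenset("b")])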
23.4.4 Constructing the DFSM

The construction of the DFSM from the NFSM follows the standard power set construction that is used to translate an NFA into a DFA [553]. A formal description and a proof of correctness are given in Section ??. It is important to note that this construction preserves the start state and the artificial edges. The resulting DFSM for the example is shown in Figure 23.11.

[Figure 23.11: Resulting DFSM. States 1: {b}; 2: (b), {b}; 3: (a), (a, b); 4: {b}, {b, c}; 5: (b), {b}, {b, c}; 6: (a), (a, b), (a, b, c); with {b → c} transitions 1 → 4, 2 → 5, 3 → 6 and artificial edges from q0.]

23.4.5 Precomputing Values

To allow for an efficient precomputation of values, every occurrence of an interesting order, interesting grouping or set of functional dependencies is replaced by an integer. This allows comparisons in constant time (equivalent entries are mapped to the same integer). Further, the DFSM is represented by an adjacency matrix.

The precomputation step itself computes two matrices. The first matrix denotes whether an NFSM state in QI is active, i.e., whether an interesting order or an interesting grouping is contained in a specific DFSM state. This matrix can be represented as a compact bit vector, allowing tests in O(1). For our running example, it is given (in a more readable form) in Figure 23.12. The second matrix contains the transition table for the DFSM relation D. Using it, edges in the DFSM can be followed in O(1). For the example, the transition matrix is given in Figure 23.13.

state | 1: (a) | 2: (a, b) | 3: (a, b, c) | 4: (b) | 5: {b} | 6: {b, c}
  1   |   0    |    0      |     0        |   0    |   1    |    0
  2   |   0    |    0      |     0        |   1    |   1    |    0
  3   |   1    |    1      |     0        |   0    |   0    |    0
  4   |   0    |    0      |     0        |   0    |   1    |    1
  5   |   0    |    0      |     0        |   1    |   1    |    1
  6   |   1    |    1      |     1        |   0    |   0    |    0

Figure 23.12: contains matrix

state | 1: {b → c} | 2: (a, b) | 3: (b) | 4: {b}
 q0   |     -      |     3     |   2    |   1
  1   |     4      |     -     |   -    |   -
  2   |     5      |     -     |   -    |   -
  3   |     6      |     -     |   -    |   -
  4   |     4      |     -     |   -    |   -
  5   |     5      |     -     |   -    |   -
  6   |     6      |     -     |   -    |   -

Figure 23.13: transition matrix

23.4.6 During Plan Generation

During plan generation, larger plans are constructed by adding algebraic operators to existing (sub-)plans. Each subplan contains the available orderings and groupings in the form of the corresponding DFSM state. Hence, the state of the DFSM, a simple integer, is the state of our ADT OrderingGrouping.

When applying an operator to subplans, the ordering and grouping requirements are tested by checking whether the DFSM state of the subplan contains the required ordering or grouping of the operator. This is done by a simple lookup in the contains matrix. If the operator introduces a new set of functional dependencies, the new state of the ADT is computed by following the corresponding edge in the DFSM. This is performed by a quick lookup in the transition matrix.

For "atomic" subplans like table or index scans, the ordering and grouping is determined explicitly by the operator. The state of the DFSM is determined by a lookup in the transition matrix with start state q0 and the edge annotated with the produced ordering or grouping. For sort and group-by operators, the state of the DFSM is determined as before by following the artificial edge for the produced ordering or grouping and then reapplying the set of functional dependencies that currently hold. In the example, a sort on (b) results in a subplan with ordering/grouping state 2 (i.e., state 2 of the DFSM is active), which satisfies the ordering (b) and the grouping {b}. After applying an operator which induces b → c, the ordering/grouping state changes to state 5, which also satisfies {b, c}.
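Hard-coding the two matrices of Figures 23.12 and 23.13 makes the O(1) claim tangible. The following Python sketch is ours; a real implementation would use bit vectors and integer-indexed arrays instead of dictionaries:

# Precomputed structures for the running example.
CONTAINS = {  # DFSM state -> set of satisfied orderings/groupings
    1: {frozenset("b")},
    2: {("b",), frozenset("b")},
    3: {("a",), ("a", "b")},
    4: {frozenset("b"), frozenset("bc")},
    5: {("b",), frozenset("b"), frozenset("bc")},
    6: {("a",), ("a", "b"), ("a", "b", "c")},
}

TRANSITION = {  # (state, label) -> state; labels are FDs or produced properties
    ("q0", ("a", "b")): 3, ("q0", ("b",)): 2, ("q0", frozenset("b")): 1,
    (1, "b->c"): 4, (2, "b->c"): 5, (3, "b->c"): 6,
    (4, "b->c"): 4, (5, "b->c"): 5, (6, "b->c"): 6,
}

class OrderingGroupingState:
    def __init__(self, state="q0"):
        self.state = state
    def produce(self, prop):                 # e.g. after a sort or hash
        return OrderingGroupingState(TRANSITION[(self.state, prop)])
    def infer(self, fd):                     # apply a functional dependency
        # missing entries keep the state (a simplification of this sketch)
        return OrderingGroupingState(TRANSITION.get((self.state, fd), self.state))
    def containsOrdering(self, o):
        return o in CONTAINS.get(self.state, set())
    containsGrouping = containsOrdering      # same lookup for groupings

# sort on (b) yields state 2; applying b -> c leads to state 5,
# which also satisfies the grouping {b, c}.
s = OrderingGroupingState().produce(("b",)).infer("b->c")
print(s.state, s.containsGrouping(frozenset("bc")))   # 5 True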
23.4.7 Reducing the Size of the NFSM

Reducing the size of the NFSM is important for two reasons: First, it reduces the amount of work needed during the preparation step, especially the conversion from NFSM to DFSM. Even more important is that a reduced NFSM results in a smaller DFSM. This is crucial for plan generation, since it reduces the search space: Plans can only be compared and pruned if they have comparable orderings and a comparable set of functional dependencies (see [818, 819] for details). Reducing the size of the DFSM removes information that is not relevant for plan generation and, therefore, allows a more aggressive pruning of plans.

At first, the functional dependencies are pruned: functional dependencies which can never lead to a new interesting order or grouping are removed. For convenience, we extend the definition of Ω(O, F) and define Ω(O, ϵ) := Ω(O, ∅). Then the set of prunable functional dependencies FP can be described by

ΩN(o, f) := Ω({o}, {f}) \ Ω({o}, ϵ)
FP := {f ∈ F | ∀o ∈ OI ∪ GI : (Ω(ΩN(o, f), F) \ Ω({o}, ϵ)) ∩ (OI ∪ GI) = ∅}

Pruning functional dependencies is especially useful, since it also prunes the artificial states that would otherwise be created because of these dependencies. In the example, this removed the functional dependency b → d, since d does not appear in any interesting order or grouping. This step also removes the artificial states containing d.

The artificial states are required to build the NFSM, but they are not visible outside the NFSM. Therefore, they can be pruned and merged without affecting plan generation. Two heuristics are used to reduce the set of artificial states:

1. All artificial nodes that behave exactly the same (that is, their edges lead to the same states given the same input) are merged, and

2. all edges to artificial states that can reach states in QI only through ϵ edges are replaced with corresponding edges to the states in QI.

More formally, the following pairs of states can be merged:

{(o1, o2) | o1, o2 ∈ QA ∧ ∀f ∈ F : (Ω({o1}, {f}) \ Ω({o1}, ϵ)) = (Ω({o2}, {f}) \ Ω({o2}, ϵ))}

The following states can be replaced with the next state reachable by an ϵ edge:

{o | o ∈ QA ∧ ∀f ∈ F : Ω(Ω({o}, ϵ), {f}) \ {o} = Ω(Ω({o}, ϵ) \ {o}, {f})}

In the example, this removed the state (b, c), which was artificial and only led to the state (b).
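The definition of FP can be transcribed directly, here restricted to orderings (groupings are handled analogously) and reusing omega() and apply_fd() from the sketch in Section 23.2.1. Note that an implementation would rather apply a cheap attribute-level test first; in the running example, b → d is already discarded because d occurs in no interesting order or grouping.

def prunable_fds(interesting_orders, fds):
    """Sketch of F_P for orderings: FDs whose application can never
    uncover a new interesting order."""
    prunable = []
    for fd in fds:
        def leads_to_interesting(o):
            base = omega({o}, [])               # Omega({o}, eps)
            derived = set()
            for p in base:                      # Omega_N(o, f)
                derived |= apply_fd(p, fd)
            derived -= base
            if not derived:
                return False
            reach = omega(derived, fds)         # Omega(Omega_N(o, f), F)
            return bool((reach - base) & set(interesting_orders))
        if not any(leads_to_interesting(o) for o in interesting_orders):
            prunable.append(fd)
    return prunable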
These techniques reduce the size of the NFSM, but still most states are artificial states, i.e., they are only created because they can be reached by considering functional dependencies when a certain ordering or grouping is available. Many of these states are not relevant for the actual query processing. For example, given a set of interesting orders which consists only of the single ordering (a) and a set of functional dependencies which consists only of a → b, the NFSM will contain (among others) two states: (a) and (a, b). The state (a, b) is created since it can be reached from (a) by considering the functional dependency; however, it is irrelevant for plan generation, since (a, b) is not an interesting order and is never created nor tested for. Actually, in this example, the whole functional dependency would be pruned (since b never occurs in an interesting order), but the problem remains for combinations of interesting orders: Given the interesting orders (a), (b) and (c) and the functional dependencies {a → b, b → a, b → c, c → b}, the NFSM will contain states for all permutations of a, b and c. But these states are completely useless, since all interesting orders consist only of a single attribute and, therefore, only the first entry of an ordering is ever tested.

Ideally, the NFSM should only contain states which are relevant for the query; since this is difficult to ensure, a heuristic can be used which greatly reduces the size of the NFSM and still guarantees that all relevant states are available: When considering a functional dependency of the form a → b and an ordering o1, o2, ..., on with oi = a for some i (1 ≤ i ≤ n), the attribute b can be inserted at any position j with i < j ≤ n + 1 (for the special case of a condition a = b, j = i is also possible). So, an entry of an ordering can only affect entries to the right of its own position. This means that it is unnecessary to consider those parts of an ordering which lie beyond the length of the longest interesting order; since that part cannot influence any entries relevant for plan generation, it can be omitted. Therefore, the orderings created by functional dependencies can be cut off after the maximum length of the interesting orders, which results in fewer possible combinations and a smaller NFSM.

The space of possible orderings can be limited further by taking the prefix of the ordering into account: before inserting an entry b into an ordering o1, o2, ..., on at position i, check whether there actually is an interesting order with the prefix o1, o2, ..., oi−1, b, and do not insert if no interesting order is found. Also, limit the new ordering to the length of the longest matching interesting order; further attributes will never be used. If functional dependencies of the form a = b occur, they might influence the prefix of the ordering, and the simple test described above is not sufficient. Therefore, a representative is chosen for each equivalence class created by these dependencies, and for the prefix test the attributes are replaced by their representatives. Since the set of interesting orders with a prefix of o1, ..., on is a superset of the set for the prefix o1, ..., on, on+1, this heuristic can be implemented very efficiently by iterating over i and reducing the set as needed.
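One possible transcription of these two rules is sketched below (ours); equivalence classes arising from equations a = b are ignored for brevity. When applying a functional dependency lhs → rhs to an ordering o, an insertion position is only kept if some interesting order matches the resulting prefix, and the result is cut off at the length of the longest matching interesting order.

def insert_bounded(o, lhs, rhs, interesting_orders):
    """Insert rhs into the ordering o (a tuple) behind all attributes of
    lhs, keeping only insertion positions whose prefix matches some
    interesting order, and truncating to the longest matching one."""
    results, seen = [], set()
    for i in range(len(o)):
        seen.add(o[i])
        if not set(lhs) <= seen:
            continue
        prefix = o[:i + 1] + (rhs,)
        matching = [io for io in interesting_orders
                    if io[:len(prefix)] == prefix[:len(io)]]
        if not matching:
            continue  # this position can never lead to an interesting order
        limit = max(len(io) for io in matching)
        results.append((prefix + o[i + 1:])[:limit])
    return results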
When considering the grouping {a}, the functional dependency a → c can be ignored, as it can only produce the attribute c, which does not occur in an interesting grouping. However, the functional dependency a → d should be added, since transitively the attribute b can be produced, which does occur in an interesting grouping. Since there are no ϵ edges between groupings, i.e. groupings are not compatible with each other, a grouping can only be relevant for the query if it is a subset of an interesting ordering (as further attributes could be added by functional dependencies). However, a simple subset test is not sufficient, as equations of the form a = b are also supported; these can effectively rename attributes, resulting in a slightly more complicated test: In Step 2.3 (see Figure 23.6) the equivalence classes induced by the equations in F are determined and for each class a representative is chosen (a and a1 . . . an are attributes occuring in the GI ): E(a, 0) = {a} E(a, n) = E(a, n − 1) ∪ {a′ | ((a = a′ ) ∈ F) ∨ ((a′ = a) ∈ F)} E(a) = E(a, |F|) e(a) = a representative choosen from E(A) e({a1 . . . an }) = {e(a1 ) . . . e(an )}. Using these equivalence classes, a mapped set of interesting groupings is produced that will be used to test if a grouping is relevant: GE = {e(g) | g ∈ GI } I ′ Now a grouping g can be pruned if ̸ ∃g ′ ∈ GE I : e(g) ⊆ g . For example, given the interesting grouping {a} and the equations a = b, b = c, the grouping {d} can be pruned, as it will never lead to an interesting grouping; however, the groupings {b} and {c} have to be kept, as they could change to an interesting grouping later on. 412CHAPTER 23. DERIVING AND DEALING WITH INTERESTING ORDERINGS AND GROUPIN Note that although they appear to test similar conditions, the first pruning technique (using r(a)) is not dominated by the second one (using e(a)). Consider e.g. the interesting grouping {a}, the equation a = b and the functional dependency a → b. Using only the second technique, the grouping {a, b} would be created, although it is not relevant. 23.4.8 Complex Ordering Requirements Specifying the ordering requirements of an operator can be surprisingly difficult. Consider the following SQL query: select * from S s, R r where r.a=s.a and r.b=s.b and r.c=s.c and r.d=s.d When answering this query using a sort-merge join, the operator has to request a certain odering. But there are many orderings that could be used: The intuitive ordering would be abcd, but adcb or any other premutation could have been used as well. This is problematic, as checking for an exponential number of possibilities is not acceptable in general. Note that this problem is not specific to our approach, the same is true, e.g., for Simmen’s approach. The problem can be solved by defining a total ordering between the attributes, such that a canonical ordering can be constructed. We give some rules how to derive such an ordering below, but it can happen that such an ordering is unavailable (or rather the construction rules are ambiguous). Given, for example, two indices, one on abcd and one on adcb, both orderings would be a reasonable choice. If this happens, the operators have two choices: Either they accept all reasonable orderings (which could still be an exponential number, but most likely only a few orderings remaing) or they limit themselves to one ordering, which could induce unnecessary sort operators. 
The attribute ordering can be derived by using the following heuristic rules:

1. Only attributes that occur in sets without a natural ordering (i.e., complex join predicates or grouping attributes) have to be ordered.

2. Orderings that are given (e.g., indices, user-requested orderings, etc.) order some attributes.

3. Small orderings should be considered first. If an operator requires an ordering with the attributes abc, and another operator requires an ordering with the attributes bc, the attributes b and c should come before a.

4. The attributes should be ordered according to equivalence classes. If a is ordered before b, all orderings in E(a) should be ordered before all orderings in E(b).

5. Attributes should be ordered according to the functional dependencies, i.e., if a → b, a should come before b. Note that a = b suggests no ordering between a and b.

6. The remaining unordered attributes can be ordered in an arbitrary way.

The rules must check whether they create contradictions. If this happens, the contradicting ordering must be omitted, resulting in potentially superfluous sort operators. Note that in some cases these sort operators are simply unavoidable: If for the example query one index on R exists with the ordering abcd and one index on S with the ordering dcba, the heuristic rules detect a contradiction and choose one of the orderings. This results in a sort operator before the (sort-merge) join, but this sort could not have been avoided anyway.
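The rules can be implemented as a simple ranking scheme. The following sketch (ours) covers only rules 2, 3 and 6; rules 4 and 5 would add precedence constraints between equivalence classes and along functional dependencies, e.g., via a topological sort.

def canonical_attribute_order(attribute_sets, given_orderings):
    """Derive a total order on attributes: attributes from given
    orderings first, in their given order (rule 2); then attributes of
    small unordered sets before those of larger ones (rule 3); remaining
    ties broken arbitrarily, here by name (rule 6)."""
    rank = {}
    for o in given_orderings:                  # rule 2
        for a in o:
            rank.setdefault(a, len(rank))
    for s in sorted(attribute_sets, key=len):  # rule 3
        for a in sorted(s):                    # rule 6
            rank.setdefault(a, len(rank))
    return sorted(rank, key=rank.get)

# For the sort-merge example: the join attributes {a, b, c, d} with no
# given ordering yield the canonical ordering abcd.
print(canonical_attribute_order([{"a", "b", "c", "d"}], []))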
23.5 Experimental Results

The framework described in this chapter solves two problems: first, it provides an efficient representation for reasoning about orderings, and second, it allows keeping track of orderings and groupings at the same time. Since these topics are treated separately in the related work, the experimental results are split into two sections: in Section 23.6, the framework is compared to another published framework while only considering orderings, and in Section 23.7, the influence of groupings is evaluated.

23.6 Total Impact

We now consider how order processing influences the time needed for plan generation. Therefore, we implemented both our algorithm and the algorithm proposed by Simmen et al. [818, 819] and integrated them into a bottom-up plan generator based on [567]. To get a fair comparison, we tuned Simmen's algorithm as much as possible. The most important measure was to cache results in order to eliminate repeated calls to the very expensive reduce operation. Second, since Simmen's algorithm requires dynamic memory, we implemented a specially tailored memory management; this alone gave us a speedup by a factor of three. We further tuned the algorithm by thoroughly profiling it until no more improvements were possible. For each order optimization framework, the plan generator was recompiled to allow for as many compiler optimizations as possible. We also carefully verified that in all cases both order optimization algorithms produced the same optimal plan.

We first measured the plan generation times and memory usage for TPC-R Query 8. A detailed discussion of this query follows in Section 23.7; here we ignored the grouping properties in order to allow a comparison with Simmen's algorithm. Since order optimization is tightly integrated with plan generation, it is impossible to measure exactly the time spent just for order optimization during plan generation. Hence, we decided to measure the impact of order optimization on the total plan generation time. This has the advantage that we can also (for the first time) quantify the impact order optimization has on plan generation time. This is important since one could argue that we are optimizing a problem with no significant impact on plan generation time, hence solving a non-problem. As we will see, this is definitely not the case.

In the subsequent tables, we denote by t (ms) the total execution time for plan generation measured in milliseconds, by #Plans the total number of subplans generated, by t/plan the average time (in microseconds) needed to introduce one plan operator, i.e., the time to produce a single subplan, and by Memory the total memory (in KB) consumed by the order optimization algorithms. The result of the experiment for Query 8 is summarized in the following table:

                t (ms)   #Plans   t/plan (µs)   Memory (KB)
Simmen             262   200536          1.31           329
Our algorithm       52   123954          0.42           136

From these numbers it becomes obvious that order optimization has a significant influence on the total plan generation time. It may come as a surprise that fewer plans need to be generated by our approach. This is due to the fact that the (reduced) FSM only contains the information relevant for the query, resulting in fewer states. With Simmen's approach, the plan generator can only discard plans if the ordering is the same and the set of functional dependencies is equal (respectively a subset); it does not recognize that the additional information is not relevant for the query.

In order to show the influence of the query on the possible gains of our algorithm, we generated queries with 5-10 relations and a varying number of join predicates, that is, edges in the join graph. We always started from a chain query and then randomly added some edges. For small queries we averaged the results of 100 queries, for large queries those of 10 queries. The results of the experiment can be found in Fig. 23.14. In the second column, we denote the number of edges in terms of the number of relations (n) given in the first column. The next six columns contain (1) the total time needed for plan generation (in ms), (2) the number of (sub-)plans generated, and (3) the time needed to generate a subplan (in µs), i.e., to add a single plan operator, for Simmen's algorithm (columns 3-5) and our algorithm (columns 6-8). The total plan generation time includes building the DFSM when our algorithm is used. The last three columns contain the improvement factors for these three measures achieved by our algorithm. More specifically, column % x contains the result of dividing the x column entry of Simmen's algorithm by the corresponding x column entry of our algorithm.

              Simmen                       Our algorithm
 n  #Edges   t (ms)   #Plans  t/plan    t (ms)   #Plans  t/plan     % t  % #Plans  % t/plan
 5  n-1           2     1541    1.29         1     1274    0.78    2.00      1.21      1.65
 6  n-1           9     7692    1.17         2     5994    0.33    4.50      1.28      3.55
 7  n-1          45    36195    1.24        12    26980    0.44    3.75      1.34      2.82
 8  n-1         289   164192    1.76        74   116562    0.63    3.91      1.41      2.79
 9  n-1        1741   734092    2.37       390   493594    0.79    4.46      1.49      3.00
10  n-1       11920  3284381    3.62      1984  2071035    0.95    6.01      1.59      3.81
 5  n             4     3060    1.30         1     2051    0.48    4.00      1.49      2.71
 6  n            21    14733    1.42         4     9213    0.43    5.25      1.60      3.30
 7  n            98    64686    1.51        20    39734    0.50    4.90      1.63      3.02
 8  n           583   272101    2.14        95   149451    0.63    6.14      1.82      3.40
 9  n          4132  1204958    3.42       504   666087    0.75    8.20      1.81      4.56
10  n         26764  4928984    5.42      2024  2465646    0.82   13.22      2.00      6.61
 5  n+1          12     5974    2.00         1     3016    0.33   12.00      1.98      6.06
 6  n+1          69    26819    2.57         6    12759    0.47   11.50      2.10      5.47
 7  n+1         370   119358    3.09        28    54121    0.51   13.21      2.21      6.06
 8  n+1        2613   509895    5.12       145   208351    0.69   18.02      2.45      7.42
 9  n+1       27765  2097842   13.23       631   827910    0.76   44.00      2.53     17.41
10  n+1      202832  7779662   26.07      3021  3400945    0.88   67.14      2.29     29.62

Figure 23.14: Plan generation for different join graphs, Simmen's algorithm (left) vs. our algorithm (middle)
Note that we are able to keep the plan generation time below one second in most cases and below three seconds in the worst case, whereas when Simmen's algorithm is applied, plan generation time can be as high as 200 seconds. This observation leads to two important conclusions:

1. Order optimization has a significant impact on total plan generation time.
2. By using our algorithm, significant performance gains are possible.

For completeness, we also give the memory consumption during plan generation for the two order optimization algorithms (see Fig. 23.15). For our approach, we also give the sizes of the DFSM, which are included in the total memory consumption. All memory sizes are in KB. As one can see, our approach consumes about half as much memory as Simmen's algorithm.

 n  #Edges  Simmen  Our Algorithm  DFSM
 5  n-1         14             10     2
 6  n-1         44             28     2
 7  n-1        123             77     2
 8  n-1        383            241     3
 9  n-1       1092            668     3
10  n-1       3307           1972     4
 5  n           27             12     2
 6  n           68             36     2
 7  n          238             98     3
 8  n          688            317     3
 9  n         1854            855     4
10  n         5294           2266     4
 5  n+1         53             15     2
 6  n+1        146             49     3
 7  n+1        404            118     3
 8  n+1       1247            346     4
 9  n+1       2641           1051     4
10  n+1       8736           3003     5

Figure 23.15: Memory consumption in KB for Figure 23.14

23.7 Influence of Groupings

Integrating groupings into the order optimization framework allows the plan generator to easily exploit groupings and, thus, produce better plans. However, order optimization itself might become prohibitively expensive when considering groupings. Therefore, we evaluated the costs of including groupings for different queries. Since adding support for groupings has no effect on the runtime behavior of the plan generator (all operations are still one table lookup), we measured the runtime and the memory consumption of the preparation step both with and without considering groupings. When considering groupings, we treated each interesting ordering also as an interesting grouping, i.e., we assumed that a grouping-based (e.g., hash-based) operator was always available as an alternative. Since this is the worst-case scenario, it should give an upper bound for the additional costs. All experiments were performed on a 2.4 GHz Pentium IV, using gcc 3.3.1.
To examine the impact for real queries, we chose a more complex query from the well-known TPC-R benchmark ([879], Query 8):

select o_year,
       sum(case when nation = '[NATION]'
                then volume else 0 end) / sum(volume) as mkt_share
from (select extract(year from o_orderdate) as o_year,
             l_extendedprice * (1 - l_discount) as volume,
             n2.n_name as nation
      from part, supplier, lineitem, orders, customer,
           nation n1, nation n2, region
      where p_partkey = l_partkey
        and s_suppkey = l_suppkey
        and l_orderkey = o_orderkey
        and o_custkey = c_custkey
        and c_nationkey = n1.n_nationkey
        and n1.n_regionkey = r_regionkey
        and r_name = '[REGION]'
        and s_nationkey = n2.n_nationkey
        and o_orderdate between date '1995-01-01' and date '1996-12-31'
        and p_type = '[TYPE]') as all_nations
group by o_year
order by o_year;

When considering this query, all attributes used in joins, group-by, and order-by clauses are added to the set of interesting orderings. Since hash-based solutions are possible, they are also added to the set of interesting groupings. This results in the sets

O_I^P = {(o_year), (o_partkey), (p_partkey), (l_partkey), (l_suppkey), (l_orderkey), (o_orderkey), (o_custkey), (c_custkey), (c_nationkey), (n1.n_nationkey), (n2.n_nationkey), (n_regionkey), (r_regionkey), (s_suppkey), (s_nationkey)}
O_I^T = ∅
G_I^P = {{o_year}, {o_partkey}, {p_partkey}, {l_partkey}, {l_suppkey}, {l_orderkey}, {o_orderkey}, {o_custkey}, {c_custkey}, {c_nationkey}, {n1.n_nationkey}, {n2.n_nationkey}, {n_regionkey}, {r_regionkey}, {s_suppkey}, {s_nationkey}}
G_I^T = ∅

Note that here O_I^T and G_I^T are empty, as we assumed that each ordering and grouping would be produced if beneficial. For example, we might assume that it makes no sense to intentionally group by o_year: if a tuple stream is already grouped by o_year, it makes sense to exploit this; however, instead of just grouping by o_year, it could make sense to sort by o_year, as this is required anyway (although here it only makes sense if the sort operator performs early aggregation). In this case, {o_year} would move from G_I^P to G_I^T, as it would only be tested for, but not produced.

The set of functional dependencies (and equations) contains all join conditions and constant conditions:

F = {p_partkey = l_partkey, ∅ → p_type, o_custkey = c_custkey, ∅ → r_name, c_nationkey = n1.n_nationkey, s_nationkey = n2.n_nationkey, l_orderkey = o_orderkey, s_suppkey = l_suppkey, n1.n_regionkey = r_regionkey}

To measure the influence of groupings, the preparation step was executed twice: once with the data as given above and once with G_I^P = ∅ (i.e., groupings were ignored). The space and time requirements are shown below:

                 With Groups   Without Groups
Duration [ms]            0.6              0.3
DFSM [nodes]              63               32
Memory [KB]                5                2

[Figure 23.16: Time requirements for the preparation step; the plot shows the duration (in ms) over the number of relations (4-11) for o+g and o with n−1, n, and n+1 additional edges.]

Here, time and space requirements both increase by a factor of two. Since all interesting orderings are also treated as interesting groupings, a factor of about two was to be expected. While Query 8 is one of the more complex TPC-R queries, it is not overly complex when looking at order optimization. It contains 16 interesting orderings/groupings and 8 functional dependencies, but they cannot be combined in many reasonable ways, resulting in a comparatively small DFSM.
In order to get more complex examples, we produced randomized queries with 5-10 relations and a varying number of join predicates. We always started from a chain query and then randomly added additional edges to the join graph. The results are shown for n−1, n, and n+1 additional edges. In the case of 10 relations, this means that the join graph consisted of 18, 19, and 20 edges, respectively. The time and space requirements for the preparation step are shown in Figure 23.16 and Figure 23.17, respectively. For each number of relations, the requirements for the combined framework (o+g) and the framework ignoring groupings (o) are shown. The numbers in parentheses (n−1, n, and n+1) are the numbers of additional edges in the join graph. As with Query 8, the time and space requirements roughly increase by a factor of two when adding groupings. This is a very positive result, given that a factor of two can be estimated as a lower bound (since every interesting ordering is also an interesting grouping here). Furthermore, the absolute time and space requirements are very low (a few ms and a few KB), encouraging the inclusion of groupings in the order optimization framework.

[Figure 23.17: Space requirements for the preparation step; the plot shows the memory consumption (in KB) over the number of relations (4-11) for o+g and o with n−1, n, and n+1 additional edges.]

23.8 Annotated Bibliography

Very few papers exist on order optimization. While the problem of optimizing interesting orders was already introduced by Selinger et al. [784], later papers usually concentrated on exploiting, pushing down, or combining orders, not on the abstract handling of orders during query optimization. Papers by Simmen, Shekita, and Malkemus [818, 819] introduced a framework based on functional dependencies for reasoning about orderings. Since theirs is the only work which really concentrates on the abstract handling of orders, and our approach is similar in its usage of functional dependencies, we describe their approach in some more detail. For a plan node, they keep just a single (physical) ordering. Additionally, they associate all the applicable functional dependencies with a plan node. Hence, the lower-bound space requirement for this representation is essentially Ω(n), where n is the number of functional dependencies derived from the query. Note that the set of functional dependencies is still (typically) much smaller than the set of all logical orderings.

In order to compute the function containsOrdering, Simmen et al. apply a reduction algorithm to both the ordering associated with a plan node and the ordering given as an argument to containsOrdering. Their reduction roughly does the opposite of deducing more orderings using functional dependencies. Let us briefly illustrate the reduction by an example. Assume the physical ordering a tuple stream satisfies is (a), and the required ordering is (a, b, c). Further assume that there are two functional dependencies available: a → b and a, b → c. The reduction algorithm is performed on both orderings. Since (a) is already minimal, nothing changes. Let us now reduce (a, b, c). We apply the second functional dependency first. Using a, b → c, the reduction algorithm yields (a, b), because c appears in (a, b, c) after a and b; hence, c is removed. In general, every occurrence of an attribute on the right-hand side of a functional dependency is removed if all attributes of the left-hand side of the functional dependency precede the occurrence. Reduction of (a, b) by a → b then yields (a). After both orderings are reduced, the algorithm tests whether the reduced required ordering is a prefix of the reduced physical ordering.
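The following is a minimal sketch of such a reduce step and the prefix test (our own illustration of the algorithm just described, not Simmen's original code). It scans the ordering back to front and removes an attribute whenever the left-hand side of some functional dependency completely precedes it; note that this greedy application order happens to resolve the example above, but it does not fix the general confluence problem discussed next.

#include <algorithm>
#include <set>
#include <string>
#include <vector>

struct FD {                    // functional dependency lhs -> rhs
  std::set<std::string> lhs;
  std::string rhs;
};

// One reduction pass over an ordering, scanning back to front.
std::vector<std::string> reduce(std::vector<std::string> ord,
                                const std::vector<FD>& fds) {
  for (int i = (int)ord.size() - 1; i >= 0; --i) {
    std::set<std::string> prefix(ord.begin(), ord.begin() + i);
    for (const FD& fd : fds) {
      if (fd.rhs == ord[i] &&
          std::includes(prefix.begin(), prefix.end(),
                        fd.lhs.begin(), fd.lhs.end())) {
        ord.erase(ord.begin() + i);  // all lhs attributes precede ord[i]
        break;
      }
    }
  }
  return ord;
}

// containsOrdering: reduce both orderings, then apply the prefix test.
bool containsOrdering(std::vector<std::string> physical,
                      std::vector<std::string> required,
                      const std::vector<FD>& fds) {
  physical = reduce(physical, fds);
  required = reduce(required, fds);
  return required.size() <= physical.size() &&
         std::equal(required.begin(), required.end(), physical.begin());
}

For the example, reduce({a,b,c}, {a → b, {a,b} → c}) yields (a), which is a prefix of the reduced physical ordering (a), so containsOrdering returns true.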
Note that if we applied a → b first, then (a, b, c) would reduce to (a, c), and no further reduction would be possible. Hence, the rewrite system induced by their reduction process is not confluent. This problem is not mentioned by Simmen et al., but it can have the effect that containsOrdering returns false where it should return true. The result is that some orderings remain unexploited; this could be avoided by maintaining a minimal set of functional dependencies, but the computation costs would probably be prohibitive. This problem does not occur with our approach.

On the complexity side, every functional dependency has to be considered by the reduction algorithm at least once. Hence, the lower time bound is Ω(n). In case all functional dependencies are introduced by a single plan node and all of them have to be inserted into the set of functional dependencies associated with that plan node, the lower bound for inferNewLogicalOrderings is also Ω(n). Overall, Simmen et al. proposed an important framework for order optimization utilizing functional dependencies and nice algorithms to handle orderings during plan generation, but the space and time requirements are unfortunate, since plan generation might generate millions of subplans.

Also note that the reduction algorithm is not applicable to groupings (which, of course, was never intended by Simmen): given the grouping {a, b, c} and the functional dependencies a → b and b → c, the grouping would be reduced to {a, c} or to {a}, depending on the order in which the reductions are performed. This problem does not occur with orderings, as the attributes are sorted and can be reduced back to front.

A recent paper by Wang and Cherniack [911] presented the idea of combining order optimization with the optimization of groupings. Based upon Simmen's framework, they annotate each attribute in an ordering with the information whether it is actually ordered by or grouped by. For a single attribute a, they write O_a^O(R) to denote that R is ordered by a, O_a^G(R) to denote that R is grouped by a, and O_{a^O → b^G}(R) to denote that R is first ordered by a and then grouped by b (within blocks of the same a value). Before checking whether a required ordering or grouping is satisfied by a given plan, they use some inference rules to get all orderings and groupings satisfied by the plan. Basically, this is Simmen's reduction algorithm with two extra transformations for groupings. In their paper, the check itself is just written as ∈; however, at least one reduction on the required ordering would be needed for this to work (and even that would not be trivial, as the stated transformations on groupings are ambiguous). The promised details in the cited technical report are currently not available, as the report has not appeared yet. Also note that, as explained above, the reduction approach is fundamentally not suited for groupings. In Wang's and Cherniack's paper, this problem does not occur, as they only look at a very specialized kind of grouping.
As stated in their Axiom 3.6, they assume that a grouping O_{a^G → b^G}(R) is first grouped by a and then (within each block of tuples with the same a value) grouped by b. However, this is a very strong condition that is usually not satisfied by a hash-based grouping operator. Therefore, their work is not general enough to capture the full functionality offered by a state-of-the-art query execution engine.

In this chapter, we followed [644, 643].

Chapter 24

Cardinality and Cost Estimation

24.1 Introduction

The plan generator relies on a cost function to evaluate the different plans and to determine the cheapest one. This chapter is concerned with the development of cost functions. The main input to cost functions are cardinalities. For example, assume a scan of a relation which also applies a selection predicate. Clearly, the cost of scanning the relation depends on the physical layout of the relation on disk. Further, the CPU cost for evaluating the predicate depends on the number of tuples in the relation. Note that the cardinality of a relation is independent of its physical layout.

In general, the cost of an algebraic operator is estimated by using a profile of the database. The profile must be small, e.g., a couple of kilobytes per relation (given today's cost of main memory, it may also be reasonable to use a couple of megabytes). We distinguish between the logical and the physical profile. For each database item and its constituents, there exist specialized logical and physical profiles. They exist for relations, indices, attributes, and sets of attributes. Consider a relation R. Its cardinality |R| belongs to its logical profile, whereas the number of pages ||R|| it occupies belongs to its physical profile. In Chapter 4, we saw more advanced physical profiles.

The DBMS must be capable of performing several operations to derive profiles and to deal with them. Fig. 24.1 gives an overview; it roughly follows the approach of Mannino et al. [583, 582]. The first operation is the build operation, which takes as input a specification of the profiles to be built (because there are many different alternatives, as we will see) and the database. From that, it builds the according profiles for all database items at all the different granularities. When updates arrive, the profiles must be updated. This can either be done by a complete recalculation or by an incremental update operation on the profiles themselves. The latter is reflected in the operation update. Unfortunately, not all profiles support an update operation. Within this book, we will not be too concerned with building and updating profiles. At the end of this chapter, we will provide some references (see [210] for an overview).

[Figure 24.1: Overview of operations for cardinality and cost estimation; it connects profile specification, build, update, profile propagation, cardinality estimation, and cost estimation, each consuming and producing logical and physical profiles for calculus or algebraic expressions.]

The main operations this chapter deals with are among the remaining ones. The first of them is cardinality estimation.
Given an algebraic expression or a calculus expression together with a logical profile of the database, we estimate the output/result cardinality of the expression. Why do we say algebraic or calculus expression? Remember that plan generators generate plans for plan classes. Each plan class corresponds to a set of equivalent plans. They all produce the same result and, hence, the same number of output tuples. Thus, in theory, one arbitrary representative of the class of equivalent algebraic expressions should suffice to calculate the logical profile, as a logical profile depends only on the outcome. On the other hand, the plan class more directly corresponds to a calculus expression. Hence, estimating the result cardinality of a calculus expression is a viable alternative. In the literature, most papers deal with the first approach, while only a few deal with the latter (e.g., [227]).

The second operation we will be concerned with is cost estimation. Given logical and physical profiles for all inputs and an algebraic operator (tree), this operation calculates the actual costs. Chapter 4 contains a detailed discussion of disk access cost calculation; hence, this part is considered done for building blocks and access paths.

The third major task is profile propagation. Given a logical or physical profile and an expression, we must be able to calculate the profile of the result, since this may be the input to other expressions and thus be needed for further cardinality estimates. The estimation of a physical profile occurs mostly in cases where operators write to disk. Given Chapter 4, this task is easy enough to be left to the reader.

Since we follow the algebraic approach, we must be able to calculate the output cardinality of every operator occurring in the algebra. This task is vastly simplified by the observations contained in Table 24.1:

|χ_{a:e2}(e1)|   = |e1|
|Γ_{g;f}(e)|     = |Π^D_g(e)|
|e1 Ж_{g;f} e2|  = |e1|
|e1 ▷ e2|        = |e1| − |e1 ⋉ e2|
|e1 ⟕ e2|        = |e1 ⋈ e2| + |e1 ▷ e2|
|e1 ⟗ e2|        = |e1 ⋈ e2| + |e1 ▷ e2| + |e2 ▷ e1|
|e1 ⋈ e2|        = |Π^D_{A(e1) ∪ A(e2)}(e1 ⋈_b e2)|    (set vs. bag join)
|Sort(e)|        = |e|
|Tmp(e)|         = |e|
|e1 × e2|        = |e1| ∗ |e2|
|Π_A(e)|         = |e|                                 (bag semantics)
|e1 ∪_s e2|      = |Π^D_{A(e1)}(e1 ∪_b e2)|            (bag vs. set semantics)
|e1 ∩_s e2|      = |e1 ⋈ e2|                           (equijoin over all attributes)
|e1 \_s e2|      = |e1| − |e1 ∩_s e2|
|e1 ∪_b e2|      = |e1| + |e2|                         (bag semantics)
|Π^D_{α∪β}(R)|   = |Π^D_α(R)|                          if there is an FD α → β

Table 24.1: Observations on cardinalities of different algebraic operators

This shows that we can go a long way if we are able to estimate the output cardinality of duplicate-eliminating projections, selections, (bag) joins, and semijoins. For a certain class of profiles, Richard shows that a profile consisting 'only' of the sizes of all duplicate-eliminating projections on all subsets of attributes of all relations is a complete profile under certain assumptions [730]. Since the set of subsets of a set of attributes can be quite large, Richard exploits functional dependencies to reduce this set, using the fact that |Π^D_{α∪β}(R)| = |Π^D_α(R)| if there exists a functional dependency α → β.

A major differentiator for logical attribute profiles is the kind of domain of the attribute. We distinguish between categorical attributes (e.g., color), discrete ordered domains (e.g., integer attributes, decimals, strings), and continuous ordered domains (e.g., float). Categorical domains may be ordered or unordered; in the first case, they are called ordinal, in the latter, nominal.
We will be mainly concerned with integer attributes. Strings are special, and we discuss some approaches in Sec. 24.13.6. Continuous domains are also special: the probability of occurrence of any particular value of a continuous domain in a finite set is zero. The techniques developed in this section can often easily be adapted to continuous domains, even if we do not mention this explicitly.

24.2 A First Approach

The first approach to cost and cardinality estimation integrated into a dynamic programming-based plan generator was presented by Selinger et al. [784]. We will use it as the basis for this section.

24.2.1 Top-Most Cost Formula (Overall Costs)

Their top-most cost formula states that the total cost of a query evaluation plan equals the weighted sum of the I/O and CPU costs:

C = C_{I/O} + w ∗ C_{cpu}    (24.1)

where w is a weight which can be adapted to different situations. If, for example, the system is CPU bound, we should increase w, and if it is I/O bound, we should decrease w. However, it is not totally clear what we are optimizing under this cost formula. One interpretation could be the following: assume w = 0.5; then we could interpret the total costs as response time under the assumption that fifty percent of the CPU time can be executed in parallel with I/O. Accordingly, we find other top-most cost formulas. For example, the weight is sometimes dropped [407]:

C = C_{I/O} + C_{cpu}    (24.2)

Under the above interpretation, this would mean that concurrency is totally absent. The opposite, total concurrency between I/O and CPU, can also be found [170]:

C = max(C_{I/O}, C_{cpu})    (24.3)

In these green days, an alternative is to calculate the power consumption during query execution. Therefore, we convert CPU time to Watts consumed by the CPU and disk time to Watts consumed by the disks and simply add up these numbers to get an estimate of the power consumption of the plan.

24.2.2 Summation of Operator Costs

Given a query evaluation plan, the task is to calculate its I/O and CPU costs. This can be done by calculating the costs for each operator (op) occurring in the query evaluation plan (QEP) and adding up the according costs:

C_{I/O}  = Σ_{op ∈ QEP} C_{I/O}(op)
C_{cpu} = Σ_{op ∈ QEP} C_{cpu}(op)

However, these formulas sometimes raise a problem. For example, the nested-loop join method requires multiple evaluations of its inner part.
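As a small illustration of the summation and of Equations 24.1-24.3, consider the following sketch (our own; the names PlanNode, Costs, and sumCosts are not from the original model). It sums the per-operator I/O and CPU costs over a plan tree and combines them according to the three top-most formulas:

#include <algorithm>
#include <memory>
#include <vector>

struct PlanNode {
  double cIO  = 0.0;   // I/O cost of this operator alone
  double cCPU = 0.0;   // CPU cost of this operator alone
  std::vector<std::unique_ptr<PlanNode>> children;
};

struct Costs { double io = 0.0, cpu = 0.0; };

// Sum C_IO(op) and C_cpu(op) over all operators of the plan.
// (Nested-loop joins need special, recursive treatment; see below.)
Costs sumCosts(const PlanNode& op) {
  Costs c{op.cIO, op.cCPU};
  for (const auto& child : op.children) {
    Costs cc = sumCosts(*child);
    c.io  += cc.io;
    c.cpu += cc.cpu;
  }
  return c;
}

// The three top-most cost formulas:
double costWeighted(const Costs& c, double w) { return c.io + w * c.cpu; } // (24.1)
double costSum(const Costs& c) { return c.io + c.cpu; }                    // (24.2)
double costMax(const Costs& c) { return std::max(c.io, c.cpu); }           // (24.3)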
Clearly, the selection predicate is called n times if n is the input cardinality. The selection predicate itself consists of several calls to comparison functions, Boolean operators, arithmetic operators and the like. The CPU cost of each of these operators can easily be determined (by counting CPU cycles or measurements), and thus the total CPU cost of a selection operator can be determined easily. Other operators are also straightforward. A problem only arises if functions are called whose CPU costs can not easily be determined since they depend on their parameters. A typical example are string comparisons, where the CPU costs depend on the length of the string. Another example are user-defined functions. The framework presented in [410] can be used for all more complex functions. Another possibility is to use simplifying assumptions. The functions we are talking about are executed on a per tuple basis. As there are typically many tuples, using the average execution time for cost calculations is not a bad idea. 24.2.4 Abbreviations We need some abbreviations to state our cost formulas. A first bunch of them is summarized in Table 24.2. There, we assume that an index is always a B + tree. 24.2.5 I/O Costs Selinger et al. measure I/O costs in the number of pages read. Let us first discuss the different possible access paths to a single relation. Clearly, for a simple scan of a relation R, ||R|| pages have to be read. The next access path is composed of an access to a non-clustered index I to retrieve the tuple identifiers of those tuples that satisfy a predicate p followed by an access to the base relation R. Let F (p) be the fraction of tuples satisfying a certain predicate p. F (p) is called 428 CHAPTER 24. CARDINALITY AND COST ESTIMATION R,S,T I A,B,C DA dA minA maxA |R| ||R|| ||A||B ||A(R)||B ||I|| H(I) relations index attributes or sets of attributes ΠD A (R) |DA | min ΠD A (R) for an attribute A of R max ΠD A (R) for an attribute A of R number of tuples of R number of pages on which R is stored average length of a value of attribute A of R (in bytes) average length of a tuple in bytes number of leave pages of an index depth of the index I minus 1 Table 24.2: Notational conventions the selectivity of p. It is the main focus of the next subsection. Selinger et al. distinguish two cases. In the first case, all pages containing qualifying tuples fit into main memory. For this case, they estimate the number of pages accessed by H(T ) + F (p) ∗ (||I|| + ||R||). EXC Note that with the help of Chapter 4, we can already do better. In the second case, where the pages containing qualifying tuples do not fit into main memory, they give the estimate H(T ) + F (p) ∗ (||I|| + |R|) for the number of pages read. In case of a clustered index, they estimate the number of pages read by H(T ) + F (p) ∗ (||I|| + ||R||). Next, we have to discuss the costs of different join methods. Selinger et al. propose cost formulas for the simple nested loop join (1nl ) and the sort merge join (1sm ). Since summing up the costs of all operators in a tree results in some problems for nested loop joins, they adhere to a recursive computation of total costs. Let e1 and e2 be two algebraic expressions. Then they estimate the cost of the simple nested loop join as follows: CI/O (e1 1nl e2 ) = CI/O (e1 ) + |e1 | ∗ CI/O (e2 ) where |e1 | denotes the number of tuples produced by the expression e1 . As the cost calculation for the sort merge join is not convincing, we follow our own very simple approach here (see also Sec. 
As their cost calculation for the sort-merge join is not convincing, we follow our own very simple approach here (see also Sec. 24.14). We split the costs into the sort costs and the merge costs. Given today's memory sizes, it is not unlikely that we need only a single merge phase. Hence, the I/O cost for sorting consists of writing and reading the result of e_i if it needs to be sorted. This can be estimated as

C_{I/O}(sort(e_i)) = C_{I/O}(e_i) + 2 ∗ ⌈1.2 ∗ ||A(e_i)||_B ∗ |e_i| / pagesize⌉

where pagesize is the page size in bytes. The factor 1.2 is called the universal fudge factor; in the above case, it takes care of the storage overhead incurred by using slotted pages. If we assume that the merge phase of the sort-merge join can be performed in main memory, no additional I/O costs occur and we are done.

Clearly, in the light of Chapter 4, counting the number of pages read is not sufficient, as the discrepancy between random and sequential I/O is tremendous. Thus, better cost functions should use a more elaborate I/O cost model along the lines of Chapter 4. In any case, note that the calculation of the I/O or CPU costs of any operator highly depends on its input and output cardinalities.

24.2.6 Cardinality Estimates

Given a predicate p, we want to estimate its selectivity, which is defined as the fraction of qualifying tuples. If p is a selection predicate applied to a relation R, the selectivity of p is defined as

s(p) = |σ_p(R)| / |R|.

If we know the selectivity of p, we can easily calculate the result size of a selection:

|σ_p(R)| = s(p) ∗ |R|.

Similarly for joins. Given a join predicate p and two relations R and S, we define the selectivity of p as

s(p) = |R ⋈_p S| / |R × S| = |R ⋈_p S| / (|R| ∗ |S|)

and can calculate the result size of a join by

|R ⋈_p S| = s(p) ∗ |R| ∗ |S|.

The idea of the approach of Selinger et al. is to calculate the result cardinality for a plan class by the following procedure. First, the sizes of all relations represented by the plan class are multiplied; this is the size of their cross product. In a second step, they take a look at the predicate p applied to the relations in the plan class. For p, they calculate a selectivity estimate s(p) and multiply it with the result of the first step. This then gives the result. Hence, if a plan class represents the algebraic expression σ_p(×_{i=1}^{n} R_i), the cardinality estimate is

s(p) ∗ Π_{i=1}^{n} |R_i|.

Since p can be a complex predicate involving Boolean operators, these have to be dealt with as well. Table 24.3 summarizes the proposed selectivity estimation. A and B denote attributes, c, c1, c2 denote constants, L denotes a list of values, and Q denotes a subquery. In System R, the number of distinct values of an attribute (d_A, d_B) is only known if there exists an according index on the attribute.

predicate     s(p)                           comment
not(p1)       1 − s(p1)
p1 ∧ p2       s(p1) ∗ s(p2)                  independence
p1 ∨ p2       s(p1) + s(p2) − s(p1)s(p2)
A = c         1/d_A                          if d_A is known, uniformity
              1/10                           else
A = B         1/max(d_A, d_B)                if d_A and d_B are known, uniformity
              1/d_X                          if only d_X, X ∈ {A, B}, is known
              1/10                           else
A > c         (max_A − c)/(max_A − min_A)    if min and max are known, uniformity
              1/3                            else
c1 ≤ A ≤ c2   (c2 − c1)/(max_A − min_A)      if min and max are known, uniformity
              1/4                            else
A IN L        min(1/2, s(A = c) ∗ |L|)
A IN Q        |Q|/|X|                        X is the cross product of all relations in Q's from clause

Table 24.3: Selectivity estimation as proposed by Selinger et al. [784]
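A rough transcription of Table 24.3 into code looks as follows (our own helper; the struct AttrStats with its optional fields is an assumption used for illustration, not System R code):

#include <algorithm>
#include <optional>

struct AttrStats {
  std::optional<double> d;         // number of distinct values, if known
  std::optional<double> min, max;  // value bounds, if known
};

double selEq(const AttrStats& a) {                          // A = c
  return a.d ? 1.0 / *a.d : 1.0 / 10.0;
}

double selEqAttr(const AttrStats& a, const AttrStats& b) {  // A = B
  if (a.d && b.d) return 1.0 / std::max(*a.d, *b.d);
  if (a.d) return 1.0 / *a.d;
  if (b.d) return 1.0 / *b.d;
  return 1.0 / 10.0;
}

double selGt(const AttrStats& a, double c) {                // A > c
  if (a.min && a.max && *a.max > *a.min)
    return (*a.max - c) / (*a.max - *a.min);
  return 1.0 / 3.0;
}

double selRange(const AttrStats& a, double c1, double c2) { // c1 <= A <= c2
  if (a.min && a.max && *a.max > *a.min)
    return (c2 - c1) / (*a.max - *a.min);
  return 1.0 / 4.0;
}

double selNot(double s1)             { return 1.0 - s1; }          // not(p1)
double selAnd(double s1, double s2)  { return s1 * s2; }           // p1 and p2
double selOr(double s1, double s2)   { return s1 + s2 - s1 * s2; } // p1 or p2

double selInList(const AttrStats& a, double listSize) {     // A IN L
  return std::min(0.5, selEq(a) * listSize);
}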
Let us give some rationale for the selectivity estimation of A IN Q for an attribute A and a subquery Q. Assume that A is an attribute of relation R and the subquery Q is of the form select B from S .... Further assume that Π_A(R) ⊆ Π_B(S), i.e., referential integrity holds. Clearly, if all tuples of S are in the result of Q, the selectivity is equal to 1. If the output cardinality of Q is restricted by a factor s′ = |Q|/|S|, then we may assume that the number of distinct values in Q's result is restricted by the same factor. Hence, the selectivity factor of the total predicate is also s′. Selinger et al. continue as follows: "With a little optimism, we can extend this reasoning to include subqueries which are joins and subqueries in which column [B] is replaced by an arithmetic expression involving column names. This leads to the formula given above."

Discussion. Taking a broad view of the above model, we see that

• the estimates for CPU and I/O times are quite rough,
• the approach is not complete; in particular, projection and semijoin are not treated, and
• profile propagation is not discussed.

Further, the uniformity and independence assumptions are applied, which has been shown to be quite inaccurate in many cases. More specifically, applying these and other assumptions often leads to an overestimate of the real result cardinalities [177, 179].

How bad is it in terms of plan generation if we under- or overestimate the cardinalities of intermediate results? As Ioannidis and Christodoulakis pointed out, errors propagate multiplicatively through joins [451]. Assume we want to join eight relations R1, ..., R8 and that the cardinality estimates of the Ri are each a factor of 5 off. Then the cardinality estimate of R1 ⋈ R2 ⋈ R3 will be a factor of 125 off. Clearly, this can affect the subsequent join ordering. If we were only a factor of 2 off, the cardinality estimate of R1 ⋈ R2 ⋈ R3 would be only a factor of eight off. This shows that minimizing the multiplicative error is a serious goal.

The effect of misestimating cardinalities on plan quality has not been thoroughly investigated. There exists a study by Kumar and Stonebraker which concludes that it does not matter [521]; however, we do not trust this conclusion. Swami and Schiefer give a query and its profiles for which bad cardinality estimates lead to a very bad plan [864]. A very impressive example query is presented in [884]: the plan produced for the query under cardinality estimation errors runs 40 minutes, while the plan produced with better cardinality estimates takes less than 2 seconds. Later, we will give two further examples showing that good cardinality estimation is vital for the generation of good plans. Hence, we are very sure that accurate estimation is vital for plan generation. We suggest that the reader find examples, using the simple C_out cost function, where wrong cardinality estimates lead to bad plans.

24.3 The Simple Profile: A First Logical Profile and its Propagation

We call a logical profile complete if it allows us to perform cardinality estimation and logical profile propagation for all algebraic operators. In this section, we present an almost complete logical profile and describe the procedure of profile propagation. The main components are the cumulated frequency, i.e., the number of tuples, and the number of distinct values for each attribute of a relation. It is easy to see that we cannot do without either of them. Further, an upper and a lower bound for the values of an attribute are needed. Again, we will see that we cannot do without them. Hence, the following profile is minimal.
24.3.1 The Logical Profile

For every attribute A of a relation, we define its logical profile as a four-tuple

b_A = [l_A, u_A, f_A, d_A]

where l_A is a lower and u_A an upper bound for the values of A. Further, f_A is the cumulated frequency, i.e., the number of tuples with an A value within the bounds, and d_A is the number of distinct values occurring as A's values within the given bounds. For the purpose of this section, we can define

l_A = min(Π_A(R))
u_A = max(Π_A(R))
f_A = |R|
d_A = |Π^D_A(R)|

If the attribute A is implicit from the context or does not matter, we may omit it.

24.3.2 Assumptions

The first two assumptions we make are:

1. All attribute values are uniformly distributed, and
2. the values of all attributes are drawn independently.

Other assumptions will follow. Often in the formulas developed below, we talk about the universe (U) or domain of an attribute. This is the potential set of values from which the attribute takes its values. In the case of integer attributes, it is easy to see that the domain of attribute A is [l_A, u_A]. The size of the domain, denoted by n_A, then is n_A = u_A − l_A + 1. For real values, the size of the domain is (almost) infinite; thus, only some of the formulas given below carry over to attributes whose type is real. Please do not confuse the universe/domain of an attribute A with the active domain of the attribute, which contains the actual values D_A = Π^D_A(R) occurring for A in relation R.

Let us take a closer look at the assumptions. The uniform distribution assumption means that every value occurs about the same number of times. However, this cannot mean that every value of the domain does so, since d_A may be much smaller than n_A. Hence, we refine the uniform distribution assumption (UDA) by assuming that every distinct value occurs about f_A/d_A times.

The second assumption is called the attribute value independence assumption (AVI), or simply independence assumption. Assume we have two predicates p1 and p2 and wish to calculate the selectivity of p1 ∧ p2. Independence tells us that we can do so by multiplying the selectivities of p1 and p2. Thus, under independence, sel(p1 ∧ p2) = sel(p1) ∗ sel(p2).
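The profile and the two assumptions can be captured in a few lines (our own struct; the book only defines the four-tuple abstractly):

// The logical profile b_A = [l_A, u_A, f_A, d_A].
struct Profile {
  double l, u;  // lower/upper bound of the attribute values
  double f;     // cumulated frequency (number of tuples)
  double d;     // number of distinct values
};

// Under UDA, every distinct value occurs about f/d times, so an
// exact-match predicate A = c qualifies about f_A / d_A tuples.
double freqEqualsC(const Profile& a) { return a.f / a.d; }

// Under AVI, the selectivities of conjuncts multiply.
double selConjunction(double sel1, double sel2) { return sel1 * sel2; }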
We still need another assumption: the equal spread assumption (ESA), also called the uniform spread assumption (USA) [210, 700]. It is used to answer the question where the occurring distinct values lie in the domain. The equal spread assumption states that they occur at equal distance. Let us elaborate a little on this. For integers, we know the number n_A of possible values from which A can be drawn: n_A = u_A − l_A + 1. Let us assume that we have only a few distinct values, that is, d_A ≪ n_A. This is not strictly necessary but is good for our intuition. We can now define the spread between two values occurring in A. Let D_A = Π^D_A(R) = {x_1, ..., x_{d_A}} with x_i < x_{i+1} be the sorted set of values occurring for attribute A, also known as the active domain. Then we can define the spread as

∆_i = x_{i+1} − x_i.

The equal spread assumption (ESA) states that ∆_i = ∆_j for all 1 ≤ i, j < d_A; denote this value by ∆_A. There are three subtypes of the equal spread assumption, depending on whether we assume that the lower and upper bounds l_A and u_A belong to D_A. Type I assumes l_A, u_A ∈ D_A; then ∆_A becomes (u_A − l_A)/(d_A − 1). In case of type II, where l_A ∈ D_A and u_A ∉ D_A, we have ∆_A = (u_A − l_A)/d_A. For type III, where l_A ∉ D_A and u_A ∉ D_A, we get ∆_A = (u_A − l_A)/(d_A + 1). As an example, take l_A = 1, u_A = 13, and d_A = 3. Then, for the three types, we get the different values 12/2 = 6, 12/3 = 4, and 12/4 = 3. It should be clear that the difference is small if d_A is sufficiently large. If d_A is small, we can store the frequency of each value explicitly; otherwise, it is large, and it does not matter which type we use. In the case of integers, the above numbers may be non-integral. Thus, we prefer to define in this case

∆_A = ⌊(u_A − l_A + 1)/d_A⌋.

An alternative to the uniform distribution assumption and the equal spread assumption is the continuous-value assumption (CVA). Here, we assume that all values in the (discrete and finite) domain occur with frequency f_A/n_A.

Different assumptions can lead to different estimates. To see this, we first fix some notation, then provide estimation procedures under the continuous-value assumption and under the equal spread assumption, and finally present an example. Assume we are given a relation R and one of its attributes A. The set of possible values for attribute A, as implied by its type, is called the universe and abbreviated by U_A. The set of actually occurring values is the active domain D_A, which we already saw. The total number of tuples in R is typically called its cardinality and denoted by |R|; however, in this chapter we prefer to call this value the cumulated frequency and denote it by f_A. Remember that we denote the minimum of D_A by l_A and the maximum by u_A. For attribute A, we consider range queries and try to estimate their result cardinality. Thus, we are interested in queries Q of the form

select count(*) from R where lq ≤ A ≤ uq.

We denote the result of this range query by f_q. We describe frequency densities of some attribute A by sets of points (x_i, f_i), where x_i is a domain value and f_i is the frequency of that value. Thus, the frequency density is the result of the query

select A, count(*) from R group by A.

Here is our example of a frequency density: (1, 7), (5, 4), (7, 2), (8, 1). Thus, the integer value 1 occurs 7 times, and the value 7 occurs 2 times.

To estimate the result cardinality of a range query Q with bounds lq and uq under the continuous-value assumption, we use the formula

f̂_q(cva) := ((uq − lq + 1)/(u_A − l_A + 1)) ∗ f_A.

Let us first recall the spread under the equal spread assumption. For integer values, we defined

∆_A := ⌊(u_A − l_A + 1)/d_A⌋.

Using this definition, we provide an estimate f̂_q(esa) by applying the following formula:

f̂_q(esa) := ⌊(uq − lq + 1)/∆_A⌋ ∗ f_A/d_A.

Note that if the active domain is dense, i.e., all possible values within [l_A, u_A] occur in the database, the estimates under CVA and ESA coincide. Fig. 24.2 shows the results for 28 different range queries, specified by their lower bound (lq) and upper bound (uq), for the frequency density given above. The true cumulated frequency within the given query range is given in the column f_q. The estimates determined under CVA and ESA are presented as well, together with a column indicating the better assumption for that particular query (ties are left blank). As we can see, in most cases ESA wins. However, experiments by Wang and Sevcik [906] came to the conclusion that the opposite is true and that CVA is superior to ESA (we can follow this claim at least for some of their data sets). Since estimates under CVA are easier to calculate and easily extendible to continuous domains, we prefer them.

no  lq  uq  fq   f̂q(cva)  f̂q(esa)  winner
 1   1   2   7     3.5      3.5
 2   1   3   7     5.25     3.5     cva
 3   1   4   7     7        7
 4   1   5  11     8.75     7       cva
 5   1   6  11    10.5     10.5
 6   1   7  13    12.25    10.5     cva
 7   1   8  14    14       14
 8   2   3   0     3.5      3.5
 9   2   4   0     5.25     3.5     esa
10   2   5   4     7        7
11   2   6   4     8.75     7       esa
12   2   7   6    10.5     10.5
13   2   8   7    12.25    10.5     esa
14   3   4   0     3.5      3.5
15   3   5   4     5.25     3.5     esa
16   3   6   4     7        7
17   3   7   6     8.75     7       esa
18   3   8   7    10.5     10.5
19   4   5   4     3.5      3.5
20   4   6   4     5.25     3.5     esa
21   4   7   6     7        7
22   4   8   7     8.75     7       esa
23   5   6   4     3.5      3.5
24   5   7   6     5.25     3.5     cva
25   5   8   7     7        7
26   6   7   2     3.5      3.5
27   6   8   3     5.25     3.5     esa
28   7   8   3     3.5      3.5

Figure 24.2: Sample of range query result estimation under CVA and ESA
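The two estimators are easily implemented; the following small sketch (our own) reproduces the rows of Fig. 24.2 for the example profile l_A = 1, u_A = 8, f_A = 14, d_A = 4:

#include <cmath>
#include <cstdio>

// Range-query estimate under the continuous-value assumption.
double estimateCVA(int lq, int uq, int lA, int uA, double fA) {
  return double(uq - lq + 1) / double(uA - lA + 1) * fA;
}

// Range-query estimate under the equal spread assumption.
double estimateESA(int lq, int uq, int lA, int uA, double fA, double dA) {
  int spread = (uA - lA + 1) / int(dA);  // Delta_A, floored
  return std::floor(double(uq - lq + 1) / spread) * (fA / dA);
}

int main() {
  const int lA = 1, uA = 8; const double fA = 14, dA = 4;
  // Query no. 2 of Fig. 24.2: [1,3] gives cva 5.25 and esa 3.5
  // (the true value is fq = 7).
  std::printf("cva=%.2f esa=%.2f\n",
              estimateCVA(1, 3, lA, uA, fA),
              estimateESA(1, 3, lA, uA, fA, dA));
}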
Given the above assumptions (and one more to come), the task is to establish the operations cardinality estimation and logical profile propagation. The latter implies that we can calculate the logical profile of all attributes of any result relation established by applying some algebraic operator. Assume we have solved this task. Then it is clear that the cumulated frequency fA , which equals |R| in this section, solves the task of cardinality estimation. Hence, we will not mention the cardinality estimation task explicitly any more. The use of the cumulated frequency fA instead of the seemingly simpler cardinality notation |R| is motivated by the fact that a single attribute will have multiple (small, piecewise) profiles if histograms are applied. To make the formulas of this section readily available for histogram use is the main motivation for using the cumulated frequency. 24.3.3 Profile Propagation for Selection We start with the selection operation. Let R be a relation and A, C ∈ A(R) be two attributes of R. We are given the profiles bA = [lA , uA , fA , dA ] and ′ , u′ , f ′ , d′ ] bC = [lC , uC , fC , dC ] and have to calculate the profiles b′A = [lA A A A ′ ′ ′ ′ ′ and bC = [lC , uC , fC , dC ] for σp(A) (R) for various selection predicates p(A) in attribute A. We assume that the attribute values of A and C are uniformly distributed and that A and C are independent. If a selection predicate uses 24.3. THE SIMPLE PROFILE: A FIRST LOGICAL PROFILE AND ITS PROPAGATION435 no 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 lq 1 1 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 4 4 4 4 5 5 5 6 6 7 uq 2 3 4 5 6 7 8 3 4 5 6 7 8 4 5 6 7 8 5 6 7 8 6 7 8 7 8 8 fq 7 7 7 11 11 13 14 0 0 4 4 6 7 0 4 4 6 7 4 4 6 7 4 6 7 2 3 3 fˆq (cva) 3.5 5.25 7 8.75 10.5 12.25 14 3.5 5.25 7 8.75 10.5 12.25 3.5 5.25 7 8.75 10.5 3.5 5.25 7 8.75 3.5 5.25 7 3.5 5.25 3.5 fˆq (esa) 3.5 3.5 7 7 10.5 10.5 14 3.5 3.5 7 7 10.5 10.5 3.5 3.5 7 7 10.5 3.5 3.5 7 7 3.5 3.5 7 3.5 3.5 3.5 winner cva cva cva esa esa esa esa esa esa esa cva esa Figure 24.2: Sample for range query result estimation under CVA and ESA. two attributes A and B, we again need to give the profile propagation for all attributes C, which are different from them. Exact match queries The first case we consider is σA=c for a constant c. ′ = c, u′ = c. Further, Clearly, lA A d′A =  1 if c ∈ ΠA (R) 0 else We cannot be sure whether the first or second case occurs. Since no reasonable cardinality estimation should ever return zero, we always assume c ∈ ΠA (R). More generally, we assume that all constants in a query are contained in the database in the according attributes. As every distinct value occurs about fA /dA times, we conclude that fA′ = 436 CHAPTER 24. CARDINALITY AND COST ESTIMATION fA /dA . A special case occurs if A is the key. Then, we can immediately conclude that fA′ = 1. Let us now consider another attribute C ∈ A(R), C ̸= A. Since fC′ = fA′ , we only need to establish d′C . For the lack of any further knowledge, we keep ′ = l and u′ = u . To derive the number of the lower and upper bounds, i.e. lC C C C distinct values remaining for attribute B, we can use the formula by Yao/Waters (see Sec. 4.16.1) Denote by s(p) = |σA=c (R)|/|R| = fA′ /fA the fraction of tuples that survives the selection with predicate p ≡ A = c. Fix a distinct value for C. Using the uniform distribution assumption, it occurs in fC /dC tuples of R.  Then, for this value we have fA −ff C′ /dC possibilities to chose fA′ tuples without A  it. 
The total number of possibilities to chose fA′ tuples is ffA′ . Thus, we may A conclude that d′C = dC YffCA/dC (fA′ ) Alternatively, we could use d′C = dC ∗ (1 − (1 − s(p))fC /dC ) or any other good approximation (see Section 4.16.1). Range queries Let us now turn to range queries, i.e. selection predicates of the form c1 ≤ A ≤ c2 , where lA ≤ c1 < c2 ≤ uA . In all of them, the lower ′ = c and u′ = c . Using the and upper bounds are given by the range, i.e. lA 1 2 A System R approach, we can estimate fA′ = d′A = c2 − c1 ∗ fA uA − lA c2 − c1 ∗ dA uA − lA This estimate is good for real values. We could also rewrite the above estimate for the number of distinct values ′ dA to c2 − c1 d′A = ∆A As soon as we have estimated the number of distinct values in a given range, we can easily derive the cumulated frequency, as every distinct value occurs as often as it did in R. Thus fA′ = fA ∗ (d′A /dA ). For another attribute C, C ̸= A, the profile propagation is the same as in the case for A = c. We only need to define s(p) = |σc1 ≤A≤c2 (R)|/|R|. Equality-based correlation The next case we consider is a predicate of the form A = B. If uA < lB or uB < lA , the result is empty. If lA ̸= lB or uA ̸= uB , we first apply a selection with predicate max(lA , lB ) ≤ A ≤ min(uA , uB ) and max(lA , lB ) ≤ B ≤ min(uA , uB ). So assume w.l.o.g. that lA = lB and uA = uB . Note that fA = fB . Denote this number by f . Define n to be the number of values in the domain of attributes A and B. For integers, this number is 24.3. THE SIMPLE PROFILE: A FIRST LOGICAL PROFILE AND ITS PROPAGATION437 n = uA − lA + 1. To refer to the elements of the domain, we assume that it is {x1 , . . . , xn } with xi < xi+1 . Let x be a value in the domain. Then we say that R has a hole at x in attribute A, if x ̸∈ ΠA (R). Consider a value x in the domain. The probability of not having a hole at x in A is  n−1 dA d −1 p(x ∈ A) = An  = n d A In general, we have fA′ = fB′ = n X fA p(xi = A)p(xi = B|xi = A) (24.4) i=1 where fA = f /dA is the average frequency of a distinct value in ΠA (R), p(xi = A) = dA /n is the probability that a tuple has xi as its value for attribute A, and p(xi = B|xi = A) is the conditional probability that a tuple has xi in its B value if it is known that it has an A value xi . Let us first consider the special case where ΠA (R) ⊆ ΠB (R). Then p(xi = B|xi = A) becomes 1/dB . Hence, fA′ = fB′ = n X f dA 1 i=1 dA n dB = f /dB For ΠB (R) ⊆ ΠA (R), we get fA′ = fB′ = f /dA . Summarizing these cases, we may conclude that f fA′ = fB′ = max(dA , dB ) which is the formula applied in System R if indices exist on A and B. Clearly, we can calculate an upper bound on the number of distinct values as d′A = d′B = min(dA , dB ). Let us estimate the cumulated frequency after the selection if none of the above conditions hold and independence of A and B holds. Then, the conditional probability p(xi = B|xi = A) becomes p(xi = B) = 1/n. Thus fA′ = fB′ = n X f dA 1 i=1 dA n n = f n If A and B are independent and uniformly distributed, the number of distinct values d′A = d′B can be estimated as follows. According to Section 4.16.1, we can estimate the number of distinct values in ΠAB (R) as D(n∗n, |R|), where |R| = fA = fB . Since out of the n ∗ n possible pairs of values only n are of the form (xi , xi ), only n/(n ∗ n) = 1/n tuples are of the qualifying form. Using this factor, we derive D(n ∗ n, fA ) d′A = d′B = n 438 CHAPTER 24. CARDINALITY AND COST ESTIMATION In case of ΠA (R) ⊆ ΠB (R), only dA such pairs out of dA ∗ dB exist. 
Inequality-based correlation. As a last exercise, let us calculate the profile for selections of the form σ_{A≤B}(R). For simplicity, we assume l_A = l_B and u_A = u_B. Thus, l′_A = l′_B = l_A and u′_A = u′_B = u_A. To calculate the cumulated frequency of the result under independence of A and B, we apply the type I equal spread assumption with ∆_A = (u_A − l_A)/(d_A − 1). Hence, we assume that x_i = l_A + (i − 1)∆_A. With f̄_A = f_A/d_A, this gives us

f′_A = Σ_{i=1}^{d_A} f̄_A p(x_i ≤ B | x_i = A)
     = Σ_{i=1}^{d_A} f̄_A p(x_i ≤ B)
     = f̄_A Σ_{i=1}^{d_A} (x_i − l_B)/(u_B − l_B)
     = (f̄_A/(u_B − l_B)) ∗ ((Σ_{i=1}^{d_A} x_i) − d_A l_B)
     = f̄_A (d_A/(u_B − l_B)) (l_A − l_B + ((d_A − 1)/2) ∆_A)
     = f̄_A (d_A/(u_B − l_B)) ((d_A − 1)/2) ((u_A − l_A)/(d_A − 1))
     = f̄_A (d_A/2) ((u_A − l_A)/(u_B − l_B))
     = f̄_A d_A / 2
     = f_A / 2

As an exercise, the reader may verify that f′_A = (d_A − 1) f_A / (2 d_A) under the type II equal spread assumption. As an additional exercise, the reader should derive d′_A and d′_B. We conjecture that d′_A = D(n_A, f′_A) or d′_A = d_A ∗ Y^{f_A}_{f_A/d_A}(f′_A).

The following observation is crucial: even if in the original relations the values of A and B are uniformly distributed, which typically is not the case, the distribution of the values of A and B after the selection with A ≤ B is nonuniform. For example,

p(x_i ≤ B) = (x_i − l_B)/(u_B − l_B)

for l_B ≤ x_i ≤ u_B. Table 24.4 summarizes our findings about profile propagation for selections.

predicate      f′                                    d′                                     comment
A = c          f′_A = f_A/d_A                        d′_A = 1
c1 ≤ A ≤ c2    f′_A = ((c2−c1)/(u_A−l_A)) ∗ f_A      d′_A = ((c2−c1)/(u_A−l_A)) ∗ d_A
               f′_A = d′_A ∗ (f_A/d_A)               d′_A = (c2−c1)/∆_A
A = B          f′_A = f_A/max(d_A, d_B)              d′_A = d_A ∗ Y^{f_A}_{f_A/d_A}(f′_A)   Π_A(R) ⊆ Π_B(R) or Π_A(R) ⊇ Π_B(R)
A = B          f′_A = f′_B = f_A/n                   d′_A = d_A ∗ Y^{f_A}_{f_A/d_A}(f′_A)   else
A ≤ B          f′_A = f′_B = f_A/2                   d′_A = d_A ∗ Y^{f_A}_{f_A/d_A}(f′_A)
p(A)           f′_C = f′_A                           d′_C = d_C ∗ Y^{f_A}_{f_C/d_C}(f′_A)   C ∉ F(p)

Table 24.4: Profile propagation for selection

Open ranges and functions. There are plenty of other cases of selection predicates which we have not discussed. Let us briefly mention a few of them. Clearly, we have

|σ_{A≠c}(R)| = |R| − |σ_{A=c}(R)|.

[Figure 24.3: Calculating the lower bound D⊥_G]

Let us start by repeating the definition of the Kronecker product of two matrices A = (a_{i,j}) and B = (b_{i,j}) of dimensions n × m and n′ × m′. The result A ⊗ B is a matrix of dimension nn′ × mm′. The general definition is

A ⊗ B = ( a_{1,1}B  a_{1,2}B  ...  a_{1,m}B
          a_{2,1}B  a_{2,2}B  ...  a_{2,m}B
          ...       ...       ...  ...
          a_{n,1}B  a_{n,2}B  ...  a_{n,m}B )

The estimate cannot be calculated easily. First, we calculate the Kronecker product f_G = f_1 ⊗ ... ⊗ f_n of all frequency vectors. Note that to every value combination v ∈ Π^D_{A_1}(R) × ... × Π^D_{A_n}(R) there corresponds exactly one component of f_G, which contains its probability of occurrence. With this observation, it is easy to derive the following theorem, in which we denote by f_{G,i} the i-th component of f_G and by M its length, i.e., M = Π_{i=1}^{n} d_i. Further, remember that N = |R|.
Theorem 24.3.3 (estimate) Let the following assumptions hold:

1. The data distributions of the individual attributes in G are independent.
2. For each value combination v_i, its occurrence is the result of an independent Bernoulli trial with success (occurrence) probability f_{G,i}.
3. The occurrences of the individual possible value combinations are independent.

Then the expected number of distinct values D_G is

E[D_G] = M − Σ_{i=1}^{M} (1 − f_{G,i})^N.

EstimateNumberOfDistinctValues(f_1, ..., f_n)
/* input: frequency vectors f_i of lengths d_i */
/* step 1: calculate f_G = f_1 ⊗ ... ⊗ f_n */
f_G = f_1;
for (i = 2; i ≤ n; ++i) {
  f_old = f_G;
  f_G = ϵ;  // empty vector
  for (j = 1; j ≤ |f_old|; ++j)
    for (k = 1; k ≤ d_i; ++k)
      f_G = push_back(f_G, f_old[j] × f_i[k]);  // append a value to a vector
}
/* step 2: compute the expected number of distinct value combinations */
S = 0;
for (j = 1; j ≤ M; ++j)  // M = length(f_G)
  S += (1 − f_G[j])^N;
return M − S;

Figure 24.4: Calculating the estimate for D_G

The algorithm for computing the estimate is given in Fig. 24.4. In its first, most expensive phase, it constructs the Kronecker product; then the simple calculations according to the theorem follow. A more efficient implementation would calculate the Kronecker product only implicitly. Further, the frequency vectors may not be completely known, but only partially via some histogram. As was also shown by Yu et al., end-biased histograms (coming soon) are optimal under the following error metric: let D̂_{G,hist} be the estimate derived for a histogram; the error function they consider is E_abs = |D̂_G − D̂_{G,hist}|.

24.3.6 Profile Propagation for Division

As a starting point, we use an observation made by Merrett and Otoo [603]. Assume we are given two sets X and Y, both subsets of a finite domain D with |D| = n elements. Then |X| < |Y| implies that X ⊉ Y. Otherwise, we can calculate the probability of X ⊇ Y as

p(X ⊇ Y) = \binom{|X|}{|Y|} / \binom{n}{|Y|}.

Now let R and S be two relations with A(R) = {A, B} and A(S) = {B}. A value a ∈ Π^D_A(R) is contained in the result of R ÷ S if and only if Π_B(σ_{A=a}(R)) ⊇
24.3.6 Profile Propagation for Division

As a starting point, we use an observation made by Merrett and Otoo [603]. Assume we are given two sets $X$ and $Y$, both subsets of a finite domain $D$ with $|D| = n$ elements. Then $|X| < |Y|$ implies that $X \not\supseteq Y$. Otherwise, we can calculate the probability of $X \supseteq Y$ as
\[
  p(X \supseteq Y) = \binom{|X|}{|Y|} \Big/ \binom{n}{|Y|}.
\]
Now let $R$ and $S$ be two relations with $A(R) = \{A, B\}$ and $A(S) = \{B\}$. A value $a \in \Pi^D_A(R)$ is contained in the result of $R \div S$ if and only if $\Pi_B(\sigma_{A=a}(R)) \supseteq S$. Hence, for any such $a$, with $\bar{f}_A = f_A/d_A$ and $n_B$ equal to the size of the common domain of $R.B$ and $S.B$, we can calculate the survival probability as
\[
  \binom{\bar{f}_A}{|S|} \Big/ \binom{n_B}{|S|},
\]
provided that $\bar{f}_A \ge |S|$ and $R$ is a set. Denote by $f'_A$ and $d'_A$ the cumulated frequency and the number of distinct values of attribute $A$ in the result of $R \div S$. Then we have the estimate
\[
  f'_A = d'_A = d_A \binom{\bar{f}_A}{|S|} \Big/ \binom{n_B}{|S|}
\]
in case $R$ is a set. If $R$ is a bag, we must be prepared to see duplicates in $\sigma_{A=a}(R)$. In this case, we can adjust the above formula to
\[
  f'_A = d'_A = d_A \binom{x_A}{|S|} \Big/ \binom{n_B}{|S|},
\]
where $x_A = D(n_B, \bar{f}_A)$ estimates the number of distinct $B$ values in $\sigma_{A=a}(R)$.

If there is some variance among the number of distinct $B$ values associated with the $a \in \Pi^D_A(R)$, the estimate will be rough. To cure this, we need better information. Define for each $a \in \Pi^D_A(R)$ the number $h_a$ as the number of distinct $b$ values occurring for it, i.e., $h_a = |\Pi^D_B(\sigma_{A=a}(R))|$. Then we could estimate $f'_A$ and $d'_A$ as follows:
\[
  f'_A = d'_A = \sum_{a \in \Pi^D_A(R)} \binom{h_a}{|S|} \Big/ \binom{n_B}{|S|}.
\]
Keeping $h_a$ for every possible $a$ may not be practical. However, if the number of distinct values in $H = \{h_a \mid a \in \Pi^D_A(R)\}$ is small, we can keep the number of distinct $a$ values for each possible $h_a$. Assume $H = \{h_1, \ldots, h_k\}$ and define $g_i = |\{a \in \Pi^D_A(R) \mid h_a = h_i\}|$. Then we have the estimate
\[
  f'_A = d'_A = \sum_{i=1,\ h_i \ge |S|}^{k} g_i \binom{h_i}{|S|} \Big/ \binom{n_B}{|S|}.
\]

24.3.7 Remarks

NULL Values. Our profile is not really complete for attributes that can have NULL values. To deal with these, we need to extend our profile by the frequency $d^\perp_A$ with which NULL occurs in attribute $A$ of some relation. It is straightforward to extend the above profile to deal with this additional count.

Uniformity is not sufficient. As we have seen, even if all attributes are uniformly distributed, which is rarely the case in practice, the result of algebraic operators may no longer be uniformly distributed. As a consequence, we need to be concerned with the approximation of the true distribution of values.

Sets of Attributes. Note that nothing prevents us from using the formulas developed above for selections and joins if $A$ and $B$ are attribute sets instead of single attributes. We just have to know or calculate $d_A$ for sets of attributes $A$.

24.4 Approximation of a Set of Values

24.4.1 Approximations and Error Metrics

Assume we have a set of values $x = \{x_1, \ldots, x_n\}$. The task we want to tackle is to approximate this set of values by a single value. The left two columns of Table 24.6 show the names and definitions of some possible approximations.

  Name          | Definition                                              | Error minimized
  --------------+---------------------------------------------------------+---------------------------------------------------
  median (x~)   | x_{(n+1)/2} (n odd); (x_{n/2} + x_{n/2+1})/2 (n even)   | E_1 = sum_{i=1}^{n} |x_i - x^|
  mean (x-)     | (1/n) sum_{i=1}^{n} x_i                                 | E_2 = sqrt( sum_{i=1}^{n} (x_i - x^)^2 )
  middle        | (max(x) + min(x))/2                                     | E_inf = max_{i=1}^{n} |x_i - x^|
  q-value       | sqrt( max(x) * min(x) )                                 | E_q = max_{i=1}^{n} max{ x_i/x^, x^/x_i }

Table 24.6: Approximations of a set of numbers by a single number

Whereas mean and median are well known, the other two may not be. The middle is defined as the value exactly between the minimum and maximum of $x$. Hence, the distance from the middle to either extreme is the same. The q-value needs a further restriction: the values in $x$ must be larger than zero. For our purposes, this restriction is not bad, since execution costs are typically larger than zero, and frequencies are mostly larger than zero if they are not exactly zero. The latter case needs some special attention if we use something like the q-value, which we could also term the geometric or multiplicative middle.

Let us take a look at a simple example. Assume $X = \{1, 2, 9\}$. Then we can easily calculate the approximations summarized in the following table:

  median | mean | middle | q-value
  -------+------+--------+--------
    2    |  4   |   5    |   3

Which of these approximations is the best one? The answer depends on the error function we wish to minimize. Therefore, the rightmost column of Table 24.6 shows error functions, each of which is minimized by the approximation defined in the same line. The variable $\hat{x}$ denotes the estimate whose error is to be calculated. For $E_2$ there exist plenty of equivalent formulations, where we think of two error measures as being equivalent if and only if they result in the same minimum. Some important alternatives are $(1/n) \sum (x_i - \hat{x})^2$, $(1/(n-1)) \sum (x_i - \hat{x})^2$ (empirical variance), and simply $\sum (x_i - \hat{x})^2$.
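The approximations and error functions of Table 24.6 are easily checked numerically. The following Python sketch recomputes both for the example set $X = \{1, 2, 9\}$; note that the q-value and $E_q$ require strictly positive inputs:

    import math

    def approximations(xs):
        s, n = sorted(xs), len(xs)
        median = s[n // 2] if n % 2 == 1 else (s[n // 2 - 1] + s[n // 2]) / 2
        mean   = sum(s) / n
        middle = (s[0] + s[-1]) / 2
        qvalue = math.sqrt(s[0] * s[-1])          # needs all x_i > 0
        return {"median": median, "mean": mean, "middle": middle, "q-value": qvalue}

    def errors(xs, xhat):
        e1   = sum(abs(x - xhat) for x in xs)
        e2   = math.sqrt(sum((x - xhat) ** 2 for x in xs))
        einf = max(abs(x - xhat) for x in xs)
        eq   = max(max(x / xhat, xhat / x) for x in xs)   # x_i, xhat > 0
        return e1, e2, einf, eq

    for name, xhat in approximations([1, 2, 9]).items():
        print(name, xhat, errors([1, 2, 9], xhat))
    # median 2, mean 4, middle 5, q-value 3 -- matching the table above
    # and the error table below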
A nice property of half of the approximations is that they give us error bounds; these are $E_\infty$ and $E_q$. Define the spread $s$ of $x$ as $s = \max(x) - \min(x)$. Then, given the middle $m$ of $x$, we have for every $x_i \in x$ that $m - s/2 \le x_i \le m + s/2$. Thus, we have a symmetric, additive error bound for all elements of $x$. Define the geometric spread as $s_g = \sqrt{\max(x)/\min(x)}$. Then we have a symmetric, multiplicative error bound for all elements $x_i$ of $x$, given by $q/s_g \le x_i \le s_g\, q$, if $q$ is the geometric middle (q-value). The following table shows the possible errors for all approximations of our example set $X = \{1, 2, 9\}$:

          | value | E_1 | E_2 | E_inf | E_q
  --------+-------+-----+-----+-------+-----
  median  |   2   |  8  | 7.1 |   7   | 4.5
  mean    |   4   | 10  | 6.2 |   5   | 4
  middle  |   5   | 11  | 6.4 |   4   | 5
  q-value |   3   |  9  | 6.4 |   6   | 3

Which of these error metrics and, hence, which approximation is the best? Obviously, this depends on the application. In the query compiler context, $E_1$ plays no role that we are aware of. $E_2$ plays a predominant role, as it is used to approximate the values in a given histogram bucket. This has not come about by sharp reasoning about the best possibility but merely by the existence of a huge body of literature in this area. Currently, the other two error metrics, $E_\infty$ and $E_q$, play minor roles. But this will change.

24.4.2 Example Applications

Let us discuss some example applications relevant to building a query compiler. Assume we have to come up with the execution time (CPU usage) of some function. This could be a simple arithmetic function built into our system, a hash function executed for a hash join, the CPU time used to dereference a TID if the according page is in memory, the CPU time needed to search a B-tree page residing in the buffer, or the CPU time needed to load a page from secondary storage into the buffer. Careful as we are, we measure the function's execution time several times. Almost certainly, the numbers will not be the same for every execution, except maybe for the simplest functions. To come up with a single number, we need to approximate the set of numbers derived from our measurements. If the function is going to be executed many times within a query execution plan (in which it occurs), we need to cost the average case, and the mean is the approximation of choice. We will see more applications in Section 24.5.2.

24.5 Approximation with Linear Models

24.5.1 Linear Models

In this section, we want to approximate a given set of points $(x_i, y_i)$, $1 \le i \le m$, by a linear combination $\hat{f}$ of given functions $\Phi_j$, $1 \le j \le n$. The general assumption is that $m > n$. We define the estimation function $\hat{f}$ as
\[
  \hat{f}(x) := \sum_{j=1}^{n} c_j \Phi_j(x)
\]
for coefficients $c_j \in \mathbb{R}$. The estimates $\hat{y}_i$ for $y_i$ are then derived from $\hat{f}$ by
\[
  \hat{y}_i := \hat{f}(x_i) = \sum_{j=1}^{n} c_j \Phi_j(x_i).
\]
Note that the functions $\Phi_j$ are not necessarily linear functions. For example, we could use polynomials $\Phi_j(x) = x^{j-1}$. Further, there is no need for $x$ to be a single number; it could as well be a vector $\vec{x}$.

It is convenient to state our approximation problem in terms of vectors and matrices. Let $(x_i, y_i)$ be the points we want to approximate and $\Phi_j$, $1 \le j \le n$, be some functions. We define the design matrix $A \in \mathbb{R}^{m \times n}$, $A = (a_{i,j})$, by $a_{i,j} = \Phi_j(x_i)$ or, equivalently, as the explicit matrix
\[
  A = \begin{pmatrix}
  \Phi_1(x_1) & \Phi_2(x_1) & \Phi_3(x_1) & \ldots & \Phi_n(x_1) \\
  \Phi_1(x_2) & \Phi_2(x_2) & \Phi_3(x_2) & \ldots & \Phi_n(x_2) \\
  \vdots & & & & \vdots \\
  \Phi_1(x_m) & \Phi_2(x_m) & \Phi_3(x_m) & \ldots & \Phi_n(x_m)
  \end{pmatrix} \tag{24.5}
\]
Assume we wish to approximate the points by a polynomial of degree $n - 1$. Then $\Phi_i(x) = x^{i-1}$ and the design matrix becomes
\[
  A = \begin{pmatrix}
  1 & x_1 & x_1^2 & \ldots & x_1^{n-1} \\
  1 & x_2 & x_2^2 & \ldots & x_2^{n-1} \\
  \vdots & & & & \vdots \\
  1 & x_m & x_m^2 & \ldots & x_m^{n-1}
  \end{pmatrix}.
\]
In the simplest case, where we want to use a linear function $\hat{f}(x) = c_1 + c_2 x$ to approximate the points, the design matrix becomes
\[
  A = \begin{pmatrix}
  1 & x_1 \\
  1 & x_2 \\
  \vdots & \vdots \\
  1 & x_m
  \end{pmatrix}.
\]
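In code, a design matrix is just the pointwise evaluation of the basis functions. A small numpy sketch (the basis functions here are placeholders):

    import numpy as np

    def design_matrix(xs, phis):
        """A[i, j] = Phi_j(x_i); m = len(xs) rows, n = len(phis) columns."""
        return np.array([[phi(x) for phi in phis] for x in xs], dtype=float)

    xs = [1.0, 2.0, 3.0]
    # linear case: Phi_1 = 1, Phi_2 = x
    A_lin = design_matrix(xs, [lambda x: 1.0, lambda x: x])
    # polynomial case of degree n-1; equivalent to np.vander(xs, n, increasing=True)
    A_poly = design_matrix(xs, [lambda x: 1.0, lambda x: x, lambda x: x * x])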
As an example, consider the three points $(1, 20)$, $(2, 10)$, $(3, 60)$. The design matrix becomes
\[
  A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{pmatrix}. \tag{24.6}
\]
For every column vector $\vec{c} = (c_1, c_2)^T$, $A\vec{c}$ gives the result of $\hat{f}$ for all points. Clearly, $\vec{c}$ should be determined such that the deviation of $A\vec{c}$ from $\vec{y} = (y_1, \ldots, y_m)^T$ becomes minimal. The deviation could be zero, that is, $A\vec{c} = \vec{y}$. However, remember our assumption that $m > n$. This means that we have more equations than variables. Thus, we have an overdetermined system of equations, and it is quite unlikely that a solution to this system exists. This motivates our goal to find an approximation that is as good as possible. Next, we formalize this goal.

Often-used measures for deviations or distances of two vectors are based on norms.

Definition 24.5.1 (norm) Let $S$ be a linear space. Then a function $||x|| : S \to \mathbb{R}$ is called a norm if and only if it has the following three properties:

1. $||x|| > 0$ unless $x = 0$

2. $||\lambda x|| = |\lambda|\, ||x||$

3. $||x + y|| \le ||x|| + ||y||$

Various norms, called p-norms, can be found in the literature. Let $x \in \mathbb{R}^n$ and $p \ge 1$, where $p = \infty$ is possible. Then
\[
  ||x||_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{\frac{1}{p}}.
\]
The most important norms are the $l_1$, $l_2$, and $l_\infty$ norms:
\begin{eqnarray*}
  ||x||_1 & = & |x_1| + \ldots + |x_n| \\
  ||x||_2 & = & \sqrt{(x_1)^2 + \ldots + (x_n)^2} \\
  ||x||_\infty & = & \max_{i=1}^{n} |x_i|
\end{eqnarray*}
Using these norms, we can define distance functions $d_1$, $d_2$, and $d_\infty$. For two vectors $x$ and $y$ in $\mathbb{R}^n$, we define
\begin{eqnarray*}
  d_1(x, y) & = & ||x - y||_1 \\
  d_2(x, y) & = & ||x - y||_2 \\
  d_\infty(x, y) & = & ||x - y||_\infty
\end{eqnarray*}
It should be clear that these give rise to the error measures $E_1$, $E_2$, and $E_\infty$, which we used in Sec. 24.4. The only missing error function is $E_q$. We immediately fill this gap, and start with the one-dimensional case.

Definition 24.5.2 (Q-paranorm in $\mathbb{R}$) Define for $x \in \mathbb{R}$
\[
  ||x||_Q = \begin{cases} \infty & \text{if } x \le 0 \\ 1/x & \text{if } 0 < x \le 1 \\ x & \text{if } 1 \le x \end{cases}
\]
$||\cdot||_Q$ is called Q-paranorm. Note that for $x > 0$, $||x||_Q = \max(x, 1/x)$.

The multivariate case is a straightforward extension using the maximum over all components:

Definition 24.5.3 (Q-paranorm in $\mathbb{R}^n$) For $x \in \mathbb{R}^n$, define
\[
  ||x||_Q = \max_{i=1}^{n} ||x_i||_Q.
\]
We denote this paranorm by $l_q$.

Definition 24.5.4 (paranorm) Let $S$ be a linear space. Then a function $||x|| : S \to \mathbb{R}$ is called a paranorm if and only if the following two properties hold:

1. $||x|| \ge 0$

2. $||x + y|| \le ||x|| + ||y||$

The Q-paranorm is a paranorm, hence the name. The only missing part is the distance function, stated next. Let $x$ and $y$ be two vectors in $\mathbb{R}^n$, where $y = (y_1, \ldots, y_n)^T$ with $y_i > 0$. Then we define
\[
  d_q(x, y) = ||x/y||_Q,
\]
where we define $x/y$ for two column vectors $x, y \in \mathbb{R}^n$ componentwise: $x/y = (x_1/y_1, \ldots, x_n/y_n)^T$.

Between norms there exist some inequalities. For all vectors $x \in \mathbb{R}^n$, we have
\begin{eqnarray*}
  ||x||_2 & \le & ||x||_1 \ \le\ \sqrt{n}\, ||x||_2 \\
  ||x||_\infty & \le & ||x||_2 \ \le\ \sqrt{n}\, ||x||_\infty \\
  ||x||_\infty & \le & ||x||_1 \ \le\ n\, ||x||_\infty
\end{eqnarray*}
For $l_q$, no such inequality exists, as $||x||_Q$ approaches infinity as $x$ approaches zero.

We can now formally state the approximation problem. Let $A \in \mathbb{R}^{m \times n}$ be the design matrix, $(x_i, y_i)$, $1 \le i \le m$, a set of points, and $\vec{y} = (y_1, \ldots, y_m)^T$. The goal is to find a vector $\vec{a}^* \in \mathbb{R}^n$ minimizing $d(A\vec{a}, \vec{y})$. That is, we look for $\vec{a}^* \in \mathbb{R}^n$ such that
\[
  d(A\vec{a}^*, \vec{y}) = \min_{\vec{a} \in \mathbb{R}^n} d(A\vec{a}, \vec{y}). \tag{24.7}
\]
$\vec{a}^*$ is then called the solution of the approximation problem, or best approximation.

For different distance functions $d$, we get different problems. For $l_1$ the problem is called quantile regression. We will not deal with it here, since we do not know of any application of it in the database context.
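The four distance functions are straightforward to implement. The following numpy sketch mirrors the definitions above; $d_q$ returns $\infty$ whenever a component of $x/y$ is non-positive:

    import numpy as np

    def d1(x, y):   return float(np.sum(np.abs(x - y)))
    def d2(x, y):   return float(np.sqrt(np.sum((x - y) ** 2)))
    def dinf(x, y): return float(np.max(np.abs(x - y)))

    def dq(x, y):
        """d_q(x, y) = ||x / y||_Q; assumes y > 0 componentwise."""
        r = x / y
        if np.any(r <= 0):
            return float("inf")
        return float(np.max(np.maximum(r, 1.0 / r)))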
The solutions to the problems for $l_2$, $l_\infty$, and $l_q$ are discussed in subsequent sections, after we have given some example applications of what needs to be approximated in a DBMS. Before we proceed, let us give the solutions for approximating the points $(1, 20)$, $(2, 10)$, $(3, 60)$ by a linear function $\alpha + \beta x$. The following table shows the values of $x$ and $y$ and the estimates for $y$ produced by the best approximations $\hat{f}_{l_2}$, $\hat{f}_{l_\infty}$, $\hat{f}_{l_q}$, which minimize $l_2$, $l_\infty$, and $l_q$, respectively. Additionally, we give the $\alpha$ and $\beta$ of the best approximations as well as their quality measured in $l_2$, $l_\infty$, and $l_q$.

      x |  y  | f^_{l2} | f^_{linf} | f^_{lq}
  ------+-----+---------+-----------+--------
      1 |  20 |   10    |     5     |   10
      2 |  10 |   30    |    25     |   20
      3 |  60 |   50    |    45     |   30
  ------+-----+---------+-----------+--------
  alpha |     |  -10    |   -15     |    0
  beta  |     |   20    |    20     |   10
  l2    |     | 14.1421 |    15     | 19.1485
  linf  |     |   20    |    15     |   30
  lq    |     |    3    |     4     |    2

Let us repeat some general insights into approximation problems as defined above. Thereby, we follow the exposition of Watson [914]. We start with theorems on the existence of a solution. The following two theorems only apply to norms; that is, they do not apply to $l_q$. However, as we will see later, solutions under $l_q$ exist.

Theorem 24.5.5 (Existence 1) Let $M$ denote a compact set in a normed linear space. Then to each point $g$ of the space there exists a point of $M$ closest to $g$.

Compactness is a sufficient but not a necessary condition.

Theorem 24.5.6 (Existence 2) Let $M$ be a finite-dimensional subspace of a normed linear space $S$. Then there exists a best approximation in $M$ to any point of $S$.

The next point to consider is the uniqueness of a solution. Proving the uniqueness of a solution is easy if the norm is strictly convex.

Definition 24.5.7 ((strictly) convex) Let $f(x)$ be a function on the elements $x$ of a linear space $S$. Then $f(x)$ is convex if
\[
  f(\lambda x_1 + (1 - \lambda) x_2) \le \lambda f(x_1) + (1 - \lambda) f(x_2)
\]
for all $x_1, x_2 \in S$ and $0 \le \lambda \le 1$. If $0 < \lambda < 1$ implies strict inequality above, $f(x)$ is called strictly convex.

It is easy to show that all $l_p$ norms for $p \ne \infty$ are strictly convex and that $l_\infty$ and $l_q$ are convex, but not strictly convex. For strictly convex norms, it is easy to show that a solution is unique.

Theorem 24.5.8 In a strictly convex normed linear space $S$, a finite-dimensional subspace $M$ contains a unique best approximation to any point of $S$.

Although $l_\infty$ and $l_q$ are not strictly convex, under certain circumstances a unique best approximation exists for them. This is discussed in subsequent sections.

Considering the above, one might conjecture that $l_2$ approximation is much simpler than $l_\infty$ or $l_q$ approximation. This is indeed the case. We will not repeat all the findings from approximation theory and the algorithms developed there. There are plenty of excellent textbooks on this matter. We highly recommend the excellent book by Golub and Van Loan [336], which discusses $l_2$ approximation and several algorithms to solve it (e.g., QR factorization and SVD). Other good books to refresh one's knowledge of matrix algebra are [389, 408, 432, 777]. Überhuber wrote another good book discussing $l_2$ approximation, QR factorization, and SVD [885]. In the context of statistics, many different regression models exist to approximate a given set of data. An excellent overview is provided by [?]. Another good read, not only in this context, is the book by Press, Teukolsky, Vetterling, and Flannery [702]. Before reading these books, it might be helpful to repeat some linear algebra and some basics of matrices. An excellent book for doing so was written by Schmidt and Trenkler [777].
The only book we know of that discusses approximation under $l_\infty$ is the one by Watson, already cited above [914]. Approximation under $l_q$ is not discussed in any textbook. Hence, we must refer to the original articles [617, 623]. In any case, since the mathematics is quite involved at times, we give explicit algorithms only for approximation by a linear function. For all other cases, we refer to the literature.

24.5.2 Example Applications

In this section, we give some examples of approximation problems occurring in the database context. As we will see, different problems demand different norms. Additionally, we sketch how to use the approximations. The details are left to the reader as an exercise.

Disk seek times. There exist small benchmarks which measure the disk seek time for travelling $n$ cylinders (see Sec. 4.1). To cover for random errors, many measurements are taken. The task is to find the parameters $c_i$, $0 \le i \le 4$, of the disk seek time formula from Sec. 4.1:
\[
  \mathrm{seektime}(d) = \begin{cases} c_1 + c_2 \sqrt{d} & d \le c_0 \\ c_3 + c_4 d & d > c_0 \end{cases}
\]
Since many seeks occur during the processing of a single query, $l_2$ is the appropriate norm. On the surface, we seem to have a problem using a simple linear model. However, we can approximate the parts $c_1 + c_2 \sqrt{d}$ and $c_3 + c_4 d$ separately for several distinct values of $c_0$, either by trying a full range of values for $c_0$ or by a binary search. The solution for $c_0$ we then favor is the one for which the maximum of the errors on both parts becomes minimal. A second problem is the occurrence of $\sqrt{d}$, since this does not look linear. However, choosing $\Phi_1 = 1$ and $\Phi_2(x) = \sqrt{x}$ will work fine.
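A Python sketch of the fit for the first part of the formula; the measurements are made up, and the split point $c_0$ is assumed to be fixed (in practice, one would repeat the fit for several candidate $c_0$ as described above):

    import numpy as np

    # made-up measurements: seek distance (cylinders) and seek time (ms), d <= c0
    d = np.array([1.0, 4.0, 9.0, 16.0, 25.0])
    t = np.array([1.1, 1.9, 3.2, 4.1, 5.2])

    # basis Phi_1 = 1, Phi_2(x) = sqrt(x); design matrix for c1 + c2*sqrt(d)
    A = np.column_stack([np.ones_like(d), np.sqrt(d)])
    (c1, c2), *_ = np.linalg.lstsq(A, t, rcond=None)   # l2 solution
    # the linear part c3 + c4*d for d > c0 is fitted analogously with Phi_2(x) = x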
Another method is to transform a set of points $(x_i, y_i)$ with two (injective) transformation functions $t_x$ and $t_y$ into the set of points $(t_x(x_i), t_y(y_i))$. Then this set is approximated, and the result is transformed back. When using this approach, special attention has to be paid to the norm, as it can change due to the transformation. We will see examples of this later on in Sec. 24.5.6.

Functions sensitive to parameter size. Another example is approximating the execution time of a hash function on string values. As its calculation depends on the length of the input string, measurements can be taken for various lengths. Using $l_2$ as the norm is perfect, because the hash function is typically executed many times during a hash join or for hash teams [352].

Approximation of frequency densities and distributions. We start by demonstrating the use of approximating functions for cardinality estimation. Then we look at the choice of error metrics for estimating the result sizes of selections and at the influence of cardinality estimation errors on joins.

Let $R$ be a relation and $A$ one of its attributes. Let $(x_i, f_i)$ denote the frequency $f_i$ with which the value $x_i$ occurs in attribute $A$. Typically, only those values $x_i$ are recorded and approximated for which $f_i \ne 0$. We further assume that the $x_i$ are sorted, i.e., $x_i < x_{i+1}$. Using the methods to come, we can approximate this set of points by a function $\hat{f}(x)$. To calculate the output cardinality of a selection $\sigma_{A=c}(R)$, we can simply return $\hat{f}(c)$ as an estimate. Hence, it is a good choice to use $l_q$ (see below for strong arguments). To calculate the result cardinality of a range query of the form $\sigma_{c_1 \le A \le c_2}(R)$, we distinguish several cases. First, assume that the domain of $A$ is discrete and the number of values between $c_1$ and $c_2$ is small. Then we can calculate the result quite efficiently by
\[
  \sum_{c_1 \le x \le c_2} \hat{f}(x)
\]
if the active domain of the attribute under consideration is dense, which we assume in this subsection. In Section ??, we present estimation formulas without this assumption. If the number of values between $c_1$ and $c_2$ is too large for an explicit summation, we can apply speed-up techniques if the function $\hat{f}$ has a simple form. For example, if $\hat{f}$ is a linear function $\hat{f}(x) = \alpha + \beta x$, the above sum can be calculated very efficiently.
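For instance, assuming a dense, integer-valued domain with step 1, the sum collapses into a closed form that needs no loop at all:
\[
  \sum_{x = c_1}^{c_2} (\alpha + \beta x)
  = (c_2 - c_1 + 1)\,\alpha + \beta \sum_{x = c_1}^{c_2} x
  = (c_2 - c_1 + 1) \left( \alpha + \beta\, \frac{c_1 + c_2}{2} \right).
\]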
This means that if we want to minimize error propagation, we have to minimize the multiplicative error Eq for estimating the cardinalities of selections based on equality. This finding can obviously be generalized to any kind of selections. Thus, for cardinality estimations for selections (and joins, or cardinality estimation in general) the q-error is the error metrics of choice. Error bounds guaranteeing plan optimality. Let us give another strong argument for minimizing the multiplicative error Eq . Let us consider again the join expression given in 24.8. Further, denote by fi the correct selectivity of σAi =ci and by fˆi some estimate. If the plan generator uses the correct cardinalities, it produces the optimal plan. Given the estimates, it might produce another plan. The question is, how far can the cardinality estimates deviate from the true cardinalities such that the opimal plan still remains the same. More formally, denote by P the optimal plan under the correct cardinalities f and by P̂ the optimal plan under the estimates fˆ. Then, we can restate the above question to whether there exists a condition on fˆ such that if this condition holds then P̂ = P. The nice truth is that such conditions exist and they involve the Q paranorm. In the simplest case, let us assume that the expression given in 24.8 is used to evaluate a star query under an ASI cost function without considering cross products. From Sec. 3.2.2, we can conclude that the optimal join order for star queries starts with the center relation and orders the satellite relations according to their rank. This holds if the a symmetric cost function is used like Cout . The rank of a relation R is defined as rank(Ri ) = (T (Ri ) − 1)/C(Ri ), where C(S) are some fixed per tuple costs and T (Ri ) = f0,i fi |R|, if f0,i is the 24.5. APPROXIMATION WITH LINEAR MODELS 459 join selectivity of the join of Ri with the center relation R0 . Thus, P̂ = P if f and fˆ result in the same ordering of the relations. Since f (x) = (x − 1)/c is monotonically increasing for constants c, we can conclude that the ordering is indeed the same as long as for all i ̸= j we have f0,i fi |Ri | < f0,j fj |Rj | ⇐⇒ f0,i fˆi |Ri | < f0,j fˆj |Rj | which is equivalent to fˆi ri fi ri < 1 ⇐⇒ <1 fj rj fˆj rj for ri = f0,i |Ri |. We now show that if fi || ||Q < min i̸=j fˆi s || fi ri ||Q fj rj (24.9) for all i, then P = P̂ . This condition implies the much weaker condition that for all i ̸= j fˆj fˆi fi ri || ||Q || ||Q < || ||Q (24.10) fi fj fj rj To show the claim, it suffices to show that (fi ri )/(fj rj ) < 1 implies (fˆi ri )/(fˆj rj ) < 1. This follows from fˆi ri fˆj rj = fˆi fj fi ri fi fˆj fj rj = ( fi ri fˆi fj )/(|| ||Q ) fi fˆj fj rj ≤ (|| (∗) fj fˆi fi ri ||Q || ||Q )/(|| ||Q ) fi fj rj fˆj < 1 where (*) follows from (fi ri )/(fj rj ) < 1. Thus, we have shown that if the q-error is limited as in condition 24.9, the produced plan is still optimal. From cardinality estimation error bounds to cost error bounds. If there are cardinality estimation errors, the plan generator can accidentally produce the wrong plan. This plan may be suboptimal under the true cardinalities but is optimal under the estimated cardinalities. The question is, how bad is the plan? To clarify this, assume that the optimal plan under the true cardinalities is P . The optimal plan under the estimated cardinalities is P̂ . Then, we are interested in the factor by which the P̂ is worse than P . 
The following theorem answers this question [622]:

Theorem 24.5.9 Let $C = C_{SMJ}$ or $C = C_{GHJ}$ be the cost function of the sort-merge or the Grace hash join. For a given query in $n$ relations, let $P$ be the optimal plan under the true cardinalities, $\hat{P}$ the optimal plan under the estimated cardinalities, $C(P)$ the true costs under $C$ of the optimal plan, and $C(\hat{P})$ the true costs under $C$ of the plan produced under the estimated cardinalities. Then
\[
  C(\hat{P}) \le q^4\, C(P),
\]
where $q$ is defined as
\[
  q = \max_{x \subseteq X} \| \hat{s}_x / s_x \|_Q,
\]
with $X$ being the set of relations to be joined, and $s_x$ ($\hat{s}_x$) the true (estimated) size of the join of the relations in $x$. That is, $q$ is the maximum estimation error taken over all intermediate results.

Figure 24.7: Q-error and plan optimality (chain query vs. star query)

This bound is rather tight, as demonstrated by the example shown in Fig. 24.7 (taken from [622]). This figure shows, for a chain and a star query with four relations, the quotient $\mathrm{cost}(\hat{P})/\mathrm{cost}(P)$ for increasing q-errors. For the star query, we see that this ratio reaches about 11.11, which is about $2^{3.46}$. Thus, a bound of the form $q^3 C(P)$ would fail.

24.5.3 Linear Models Under $l_2$

Now that we know that the solution to our problem exists and is unique, we continue by characterizing it. Let $S$ be a linear space (say $\mathbb{R}^2$) and $f \in S$ some point. Further, denote by $G$ some linear subspace of $S$ (say a straight line). For any $g \in G$, we can define the residual vector $g - f$. Using residuals, we can characterize the unique solution quite easily and intuitively: exactly that vector $g^* \in G$ is closest to $f$ whose residual $f - g^*$ is orthogonal to $G$. Remember that two vectors are orthogonal if and only if their scalar product is zero. Now we can characterize the solution of our approximation problem under $l_2$.

Theorem 24.5.10 (Characterization) Let $S$ be a linear space and $G$ a subspace. An element $g^* \in G$ is the best approximation of a point $f \in S$ if and only if $\langle g^* - f, g \rangle = 0$ holds for all $g \in G$; that is, if the error is orthogonal to all elements of $G$.

Since we are used to solving equations for $x$, we rewrite our problem as $A\vec{x} = b$. That is, the vector $\vec{x}$ replaces the coefficient vector $\vec{c}$. Using Theorem 24.5.10, $A\vec{x}^* - b$ must be orthogonal to the range of $A$. The range of a matrix $A \in \mathbb{R}^{m \times n}$ is defined as $R(A) = \{Ax \mid x \in \mathbb{R}^n\}$. Let $a_i$ be the $i$-th column vector of $A$ and $\vec{x} = (x_1, \ldots, x_n)^T$. Then the best approximation can be found by solving the following system of linear equations, called the (Gauß) normal equations:
\begin{eqnarray*}
  \langle a_1, a_1 \rangle x_1 + \langle a_2, a_1 \rangle x_2 + \ldots + \langle a_n, a_1 \rangle x_n & = & \langle b, a_1 \rangle \\
  \langle a_1, a_2 \rangle x_1 + \langle a_2, a_2 \rangle x_2 + \ldots + \langle a_n, a_2 \rangle x_n & = & \langle b, a_2 \rangle \\
  \vdots \qquad\qquad & & \quad \vdots \\
  \langle a_1, a_n \rangle x_1 + \langle a_2, a_n \rangle x_2 + \ldots + \langle a_n, a_n \rangle x_n & = & \langle b, a_n \rangle
\end{eqnarray*}
or, using matrix notation,
\[
  A^T A \vec{x} = A^T \vec{b}. \tag{24.11}
\]
This system of linear equations can be solved by many different approaches. Fast and numerically stable approaches are QR decomposition and singular value decomposition (SVD). Both leave the conditioning of the problem unchanged. QR decomposition can only be applied if the matrix has full rank (see below). Otherwise, one has to make do with variants of QR decomposition or with SVD. Hence, we will briefly discuss SVD. We will not give any algorithms; the interested reader is referred to [336]. Before we proceed with SVD, let us repeat some basics on matrices.
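Before turning to these basics, here is a quick numerical check of the normal equations (24.11) on our running example, both by solving $A^T A \vec{x} = A^T \vec{b}$ directly and by numpy's QR/SVD-based least-squares routine:

    import numpy as np

    A = np.array([[1., 1.],
                  [1., 2.],
                  [1., 3.]])
    b = np.array([20., 10., 60.])

    x_normal = np.linalg.solve(A.T @ A, A.T @ b)     # -> [-10., 20.]
    x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)  # same solution, more stable
    # both give f(x) = -10 + 20x, i.e., the estimates 10, 30, 50 from the table above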
A special matrix is the identity matrix $I \in \mathbb{R}^{n \times n}$ with $I = (\delta_{i,j})_{i,j}$, $1 \le i, j \le n$. Matrices can have plenty of properties. Here are some of them.

Definition 24.5.11 (rank) The rank of a matrix $A$, denoted by $\mathrm{rank}(A)$, is the dimension of the subspace $R(A)$.

Definition 24.5.12 (full rank) A matrix $A \in \mathbb{R}^{m \times n}$, $m > n$, has full rank if its rank is $n$.

Definition 24.5.13 (symmetric) A matrix $A \in \mathbb{R}^{n \times n}$ is symmetric if and only if $A^T = A$.

Note that for all matrices $A \in \mathbb{R}^{m \times n}$, both $AA^T$ and $A^T A$ are symmetric.

Definition 24.5.14 (idempotent) A matrix $A \in \mathbb{R}^{n \times n}$ is idempotent if and only if $AA = A$.

Definition 24.5.15 (inverse) A matrix $A^{-1} \in \mathbb{R}^{n \times n}$ is the inverse of a matrix $A \in \mathbb{R}^{n \times n}$ if and only if $A^{-1}A = AA^{-1} = I$. A matrix for which the (uniquely determined) inverse exists is called regular.

Definition 24.5.16 (orthogonal) A matrix $A \in \mathbb{R}^{n \times n}$ is orthogonal if and only if $AA^T = A^T A = I$.

Let us use a simple, operational, recursive definition of the determinant.

Definition 24.5.17 (determinant) Let $A \in \mathbb{R}^{n \times n}$ be a matrix. We define the determinant of $A$ as $\det(A) = a_{1,1}$ if $n = 1$. Otherwise, we define
\[
  \det(A) = \sum_{j=1}^{n} (-1)^{i+j} a_{i,j} \det(A_{i,j})
\]
(for an arbitrary row $i$), where $A_{i,j} \in \mathbb{R}^{(n-1) \times (n-1)}$ results from $A$ by eliminating the $i$-th row and the $j$-th column.

Definition 24.5.18 (characteristic polynomial) Let $A \in \mathbb{R}^{n \times n}$ be a matrix. The characteristic polynomial is defined as
\[
  P_n(z; A) := \det(A - zI) = \begin{vmatrix}
  (a_{1,1} - z) & a_{1,2} & \ldots & a_{1,n} \\
  a_{2,1} & (a_{2,2} - z) & \cdots & a_{2,n} \\
  \vdots & \vdots & \ddots & \vdots \\
  a_{n,1} & a_{n,2} & \cdots & (a_{n,n} - z)
  \end{vmatrix}
\]

Definition 24.5.19 (eigenvalue) Let $A \in \mathbb{R}^{n \times n}$ be a matrix and $P_n(z; A)$ its characteristic polynomial. Any root $\lambda_i$ of $P_n(z; A)$, i.e., any $\lambda_i$ with $P_n(\lambda_i; A) = 0$, is called an eigenvalue. The set of eigenvalues is denoted by $\lambda(A) := \{\lambda_1, \ldots, \lambda_k\}$ and is called the spectrum of $A$.

Definition 24.5.20 (similar) Two matrices $A, B \in \mathbb{R}^{n \times n}$ are similar if and only if there exists a regular matrix $X \in \mathbb{R}^{n \times n}$ such that $B = X^{-1}AX$.

Two similar matrices have the same eigenvalues, as can be seen from the following theorem.

Theorem 24.5.21 Let $A, B \in \mathbb{R}^{n \times n}$ be two similar matrices. Then they have the same characteristic polynomial.

Definition 24.5.22 (generalized inverse) A matrix $A^- \in \mathbb{R}^{n \times m}$ is a generalized inverse, or g-inverse, of a matrix $A \in \mathbb{R}^{m \times n}$ if $AA^-A = A$ holds.

Every matrix and, hence, every vector has a g-inverse. For regular matrices, the g-inverse and the inverse coincide. In general, the g-inverse is not uniquely determined. Adding some additional properties makes it unique.

Definition 24.5.23 (Moore-Penrose inverse) A matrix $A^+ \in \mathbb{R}^{n \times m}$ is the Moore-Penrose inverse of a matrix $A \in \mathbb{R}^{m \times n}$ if the following conditions hold:

1. $AA^+A = A$

2. $A^+AA^+ = A^+$

3. $(A^+A)^T = A^+A$

4. $(AA^+)^T = AA^+$

For every matrix and, hence, every vector there exists a uniquely determined Moore-Penrose inverse. In case $A$ is regular, $A^+ = A^{-1}$ holds. If $A$ is symmetric, then $A^+A = AA^+$. If $A$ is symmetric and idempotent, then $A^+ = A$. Further, all of $A^+A$, $AA^+$, $I - A^+A$, and $I - AA^+$ are idempotent. Here are some equalities holding for the Moore-Penrose inverse:
\begin{eqnarray}
  (A^+)^+ & = & A \tag{24.12} \\
  (A^T)^+ & = & (A^+)^T \tag{24.13} \\
  (A^T A)^+ & = & A^+ (A^T)^+ \tag{24.14} \\
  (A A^T)^+ & = & (A^T)^+ A^+ \tag{24.15} \\
  A^T A A^+ & = & A^T \tag{24.16} \\
  A^+ A A^T & = & A^T \tag{24.17}
\end{eqnarray}
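numpy computes the Moore-Penrose inverse via the SVD discussed next. A small sketch verifying the four defining conditions on the design matrix of our running example:

    import numpy as np

    A = np.array([[1., 1.],
                  [1., 2.],
                  [1., 3.]])
    Ap = np.linalg.pinv(A)   # Moore-Penrose inverse (computed via SVD)

    assert np.allclose(A @ Ap @ A, A)          # condition 1
    assert np.allclose(Ap @ A @ Ap, Ap)        # condition 2
    assert np.allclose((Ap @ A).T, Ap @ A)     # condition 3
    assert np.allclose((A @ Ap).T, A @ Ap)     # condition 4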
The following theorem states the existence of a decomposition of any matrix into orthogonal factors and a diagonal matrix.

Theorem 24.5.24 (singular value decomposition) Let $A \in \mathbb{R}^{m \times n}$ be a matrix. Then there exist an orthogonal matrix $U \in \mathbb{R}^{m \times m}$ and an orthogonal matrix $V \in \mathbb{R}^{n \times n}$ such that
\[
  U^T A V = S,
\]
where $S \in \mathbb{R}^{m \times n}$ is of the form $S = \mathrm{diag}(s_1, \ldots, s_k)$ with $k = \min(m, n)$ and, further,
\[
  s_1 \ge s_2 \ge \ldots \ge s_r > s_{r+1} = \ldots = s_k = 0
\]
holds, where $r = \mathrm{rank}(A)$.

For a proof and algorithms to calculate the SVD of an arbitrary matrix, see the book by Golub and Van Loan [336]. Another proof can be found in the book by Harville [408]. The diagonal elements $s_i$ of $S$, which is orthogonally equivalent to $A$, are called singular values. From
\[
  S^T S = (U^T A V)^T (U^T A V) = V^T A^T U U^T A V = V^{-1} A^T A V
\]
it follows that $S^T S$ and $A^T A$ are similar. Since $S^T S = \mathrm{diag}(s_1^2, \ldots, s_r^2, 0, \ldots, 0)$ and similar matrices have the same spectrum, it follows that
\[
  s_i = \sqrt{\lambda_i} \quad \text{for } \lambda_i \in \lambda(A^T A),\ 1 \le i \le n.
\]
Define $S^{-1} = \mathrm{diag}(1/s_1, \ldots, 1/s_r, 0, \ldots, 0)$ and $A^+ = V S^{-1} U^T$. From
\[
  A A^+ A = (U S V^T)(V S^{-1} U^T)(U S V^T) = U S S^{-1} S V^T = U S V^T = A
\]
and
\[
  A^+ A A^+ = (V S^{-1} U^T)(U S V^T)(V S^{-1} U^T) = V S^{-1} S S^{-1} U^T = V S^{-1} U^T = A^+
\]
we see that $A^+ = V S^{-1} U^T$ is a g-inverse of $A$. The reader is advised to check the remaining conditions of the Moore-Penrose inverse.

Remember that we have to solve $A^T A \vec{x} = A^T \vec{b}$ for $\vec{x}$ in order to find the best approximation for our set of data points. Set $\vec{x} = A^+ \vec{b}$. Then
\[
  A^T A \vec{x} = A^T A A^+ \vec{b} = A^T \vec{b},
\]
where we used Eqn. 24.16. Hence, the Moore-Penrose inverse solves our problem. (The Greville algorithm to calculate the Moore-Penrose inverse directly is described in [777].) Moreover, the solution can be obtained easily from the singular value decomposition.

Approximation by a linear function. Assume we are given $m$ points $(x_i, y_i)$, $1 \le i \le m$, and wish to approximate them by a linear function $f(x) = \alpha + \beta x$. The design matrix $A$ and the vectors $\vec{x}$ and $b$ then are
\[
  A = \begin{pmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_m \end{pmatrix}, \quad
  \vec{x} = \begin{pmatrix} \alpha \\ \beta \end{pmatrix}, \quad
  b = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix}.
\]
The resulting system of normal equations
\[
  \begin{pmatrix} m & \sum_{i=1}^m x_i \\ \sum_{i=1}^m x_i & \sum_{i=1}^m x_i^2 \end{pmatrix}
  \begin{pmatrix} \alpha \\ \beta \end{pmatrix}
  = \begin{pmatrix} \sum_{i=1}^m y_i \\ \sum_{i=1}^m x_i y_i \end{pmatrix}
\]
has the solution
\[
  \alpha = \frac{\sum_{i=1}^m x_i^2 \sum_{i=1}^m y_i - \sum_{i=1}^m x_i \sum_{i=1}^m x_i y_i}{m \sum_{i=1}^m x_i^2 - \left( \sum_{i=1}^m x_i \right)^2},
  \qquad
  \beta = \frac{m \sum_{i=1}^m x_i y_i - \sum_{i=1}^m x_i \sum_{i=1}^m y_i}{m \sum_{i=1}^m x_i^2 - \left( \sum_{i=1}^m x_i \right)^2}.
\]
Note that these are very nice formulas: as new points arrive or are deleted, only the sums have to be updated and the quotients recalculated. There is no need to look at the other points again.
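The update property suggests a tiny incremental fitter. A sketch (with no numerical safeguards, e.g., against $m \sum x_i^2 = (\sum x_i)^2$):

    class IncrementalLine:
        """Closed-form l2 line fit from running sums; insert/delete in O(1)."""
        def __init__(self):
            self.m = 0
            self.sx = self.sy = self.sxx = self.sxy = 0.0

        def insert(self, x, y):
            self.m += 1
            self.sx += x; self.sy += y; self.sxx += x * x; self.sxy += x * y

        def delete(self, x, y):
            self.m -= 1
            self.sx -= x; self.sy -= y; self.sxx -= x * x; self.sxy -= x * y

        def coefficients(self):
            den = self.m * self.sxx - self.sx * self.sx
            alpha = (self.sxx * self.sy - self.sx * self.sxy) / den
            beta  = (self.m * self.sxy - self.sx * self.sy) / den
            return alpha, beta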
24.5.4 Linear Models Under $l_\infty$

Let $A \in \mathbb{R}^{m \times n}$ be a matrix, where $m > n$, and $b \in \mathbb{R}^m$ a vector. The problem we solve in this section is to find $\vec{a} \in \mathbb{R}^n$ minimizing
\[
  \| r(\vec{a}) \|_\infty, \tag{24.18}
\]
where
\[
  r(\vec{a}) = \vec{b} - A\vec{a}. \tag{24.19}
\]
The components of the vector $r(\vec{a})$ are denoted by $r_i(\vec{a})$. As pointed out earlier, $l_\infty$ is a convex norm. Hence, a solution exists. Since $l_\infty$ is not strictly convex, the uniqueness of the solution is not guaranteed. We solve problem 24.18 by following the approach proposed by Watson [914]. We start by characterizing the solution, continue with the conditions under which uniqueness holds, make some more observations, and finally derive an algorithm for the case $n = 2$, i.e., we find a best approximation by a linear function. Although only few applications of $l_\infty$ exist in databases, it is very useful for finding a best approximation under $l_q$ if we want to approximate by a function $e^{\beta + \alpha x}$ (see Sec. 24.5.6).

Assume we have a best solution $\vec{a}$. Then, for some indices $i$, $r_i(\vec{a})$ attains the maximum, i.e., $|r_i(\vec{a})| = \|r(\vec{a})\|_\infty$; otherwise, a better solution would exist. We denote the set of indices where the maximum is attained by $\bar{I}(\vec{a})$, and by $\theta_i(\vec{a})$ the sign of $r_i(\vec{a})$. Thus, $r_i(\vec{a}) = \theta_i(\vec{a}) \|r(\vec{a})\|_\infty$ for all $i \in \bar{I}(\vec{a})$. The following theorem gives a characterization of the solution.

Theorem 24.5.25 A vector $\vec{a} \in \mathbb{R}^n$ solves problem 24.18 if and only if there exist a subset $I$ of $\bar{I}$ with $|I| \le n + 1$ and a vector $\vec{\lambda} \in \mathbb{R}^m$ such that

1. $\lambda_i = 0$ for all $i \notin I$,

2. $\lambda_i \theta_i \ge 0$ for all $i \in I$, and

3. $A^T \vec{\lambda} = \vec{0}$.

The set $I$ in the theorem is called an extremal subset of a solution $\vec{a}$. There are two important corollaries to this theorem.

Corollary 24.5.26 Let $\vec{a}$ solve problem 24.18. Then $\vec{a}$ solves an $l_\infty$ approximation problem in $\mathbb{R}^{n+1}$ obtained by restricting the components of $r(\vec{a})$ to some particular $n + 1$ components. If $A$ has rank $t$, then the components of $r(\vec{a})$ may be restricted to a particular $t + 1$ components.

Corollary 24.5.27 Let $\vec{a}$ solve problem 24.18 and let $I$ be chosen according to Theorem 24.5.25 such that $\lambda_i \ne 0$ for all $i \in I$. Further, let $\vec{d}$ be another solution of 24.18. Then
\[
  r_i(\vec{d}) = r_i(\vec{a}) \quad \text{for all } i \in I.
\]
Hence, not surprisingly, any two solutions have the same residuals at components where the maximum is attained. The theorem and its first corollary state that we need at most $t + 1$ components for a matrix $A$ of rank $t$. The next theorem shows that at least $t + 1$ indices exist where the maximum is attained.

Theorem 24.5.28 If $A$ has rank $t$, a solution $\vec{a}$ to problem 24.18 exists for which $|\bar{I}(\vec{a})| \ge t + 1$.

Thus, any submatrix of $A$ consisting of a subset of the rows of $A$ which correspond to the indices contained in $\bar{I}(\vec{a})$ must have rank $t$ for some solution $\vec{a}$ to problem 24.18. The above theorems and corollaries indicate that the clue to uniqueness is the rank of subsets of rows of $A$. The following definition captures this intuition.

Definition 24.5.29 (Haar condition) A matrix $A \in \mathbb{R}^{m \times n}$, where $m \ge n$, satisfies the Haar condition if and only if every submatrix consisting of $n$ rows of $A$ is nonsingular.

Finally, we can derive uniqueness for those $A$ which satisfy the Haar condition:

Theorem 24.5.30 If $A$ satisfies the Haar condition, the solution to problem 24.18 is unique.

Obviously, we need to know whether the Haar condition holds for a matrix $A$. Remember that we want to approximate a set of points by a linear combination of functions $\Phi_j$, $1 \le j \le n$. From the points $(x_i, y_i)$, $1 \le i \le m$, and the $\Phi_j$, the design matrix $A$ is derived as shown in Equation 24.5. If the $\Phi_j$ form a Chebyshev set, the design matrix fulfills the Haar condition.

Definition 24.5.31 (Chebyshev set) Let $X$ be a closed interval of $\mathbb{R}$. A set of continuous functions $\Phi_1(x), \ldots, \Phi_n(x)$, $\Phi_i : X \to \mathbb{R}$, is called a Chebyshev set if every non-trivial linear combination of these functions has at most $n - 1$ zeros in $X$.

It is well known that the set of polynomials $\Phi_j = x^{j-1}$, $1 \le j \le n$, forms a Chebyshev set on any interval $X$. From now on, we assume that our $x_i$ are ordered, that is, $x_1 < \ldots < x_m$, and we define $X = [x_1, x_m]$. We also assume that the matrix $A$ of problem 24.18 is defined as given in Equation 24.5, where the $\Phi_j$ are continuous functions from $X$ to $\mathbb{R}$. We still need some more knowledge in order to build an algorithm. The next definition will help to derive a solution for subsets $I$ of $\{1, \ldots, m\}$ with $|I| = n + 1$.
Definition 24.5.32 (alternating set) Let $\vec{a}$ be a vector in $\mathbb{R}^n$. We say that $r(\vec{a})$ alternates $s$ times if there exist points $x_{i_1}, \ldots, x_{i_s} \in \{x_1, \ldots, x_m\}$ such that $r_{i_k}(\vec{a}) = -r_{i_{k+1}}(\vec{a})$ for $1 \le k < s$. The set $\{x_{i_1}, \ldots, x_{i_s}\}$ is called an alternating set for $\vec{a}$.

Theorem 24.5.33 Let $(x_i, y_i)$, $1 \le i \le m$, be an ordered set of points with $x_i < x_{i+1}$ for $1 \le i < m$. Define $X = [x_1, x_m]$. Further, let $\Phi_j$, $1 \le j \le n$, be a Chebyshev set on $X$. Define $A = (a_{i,j})$, where $1 \le i \le m$, $1 \le j \le n$, and $a_{i,j} = \Phi_j(x_i)$. Then a vector $\vec{a} \in \mathbb{R}^n$ solves problem 24.18 if and only if there exists an alternating set with $n + 1$ points for $\vec{a}$.

Consider again the example where we want to approximate the three points $(1, 20)$, $(2, 10)$, and $(3, 60)$ by a linear function. We saw that the solution to our problem is $\hat{f}_{l_\infty}(x) = -15 + 20x$. The following table gives the points, the value of $\hat{f}_{l_\infty}$, and the residuals, including their signs:

  x |  y | f^_{linf} | r_i
  --+----+-----------+-----
  1 | 20 |     5     | +15
  2 | 10 |    25     | -15
  3 | 60 |    45     | +15

As Theorem 24.5.33 predicts, the signs of the residuals alternate. The proof of Theorem 24.5.33 uses the following lemma (see [914]).

Lemma 24.5.34 Let $(x_i, y_i)$, $1 \le i \le m$, be an ordered set of points with $x_i < x_{i+1}$ for $1 \le i < m$. Define $X = [x_1, x_m]$. Further, let $\Phi_j$, $1 \le j \le n$, be a Chebyshev set on $X$. Define the $n \times n$ determinant
\[
  \Delta(x_1, \ldots, x_n) = \det \begin{pmatrix}
  \Phi_1(x_1) & \ldots & \Phi_n(x_1) \\
  \vdots & \ddots & \vdots \\
  \Phi_1(x_n) & \ldots & \Phi_n(x_n)
  \end{pmatrix} \tag{24.20}
\]
and, for $n + 1$ ordered points $x_1 < \ldots < x_{n+1}$, let $\Delta_i = \Delta(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_{n+1})$. Then $\mathrm{sign}(\Delta_i) = \mathrm{sign}(\Delta_{i+1})$ for all $1 \le i \le n$.

Let us take a closer look at Theorem 24.5.33 in the special case $m = 3$, $n = 2$, i.e., we have exactly three points $(x_{i_1}, y_{i_1})$, $(x_{i_2}, y_{i_2})$, and $(x_{i_3}, y_{i_3})$. We find the best linear approximation $\hat{f}(x) = \alpha + \beta x$ under $l_\infty$ by solving the following equations:
\begin{eqnarray*}
  y_{i_1} - (\alpha + \beta x_{i_1}) & = & -\lambda \\
  y_{i_2} - (\alpha + \beta x_{i_2}) & = & +\lambda \\
  y_{i_3} - (\alpha + \beta x_{i_3}) & = & -\lambda
\end{eqnarray*}
where $\lambda$ represents the (signed) value of $\|r(\vec{a})\|_\infty$ for the solution $\vec{a}$ to be found. Solving these equations results in
\begin{eqnarray*}
  \lambda & = & \frac{y_{i_2} - y_{i_1}}{2} - \frac{(y_{i_3} - y_{i_1})(x_{i_2} - x_{i_1})}{2 (x_{i_3} - x_{i_1})} \\
  \beta & = & \frac{y_{i_2} - y_{i_1}}{x_{i_2} - x_{i_1}} - \frac{2\lambda}{x_{i_2} - x_{i_1}} \\
  \alpha & = & y_{i_1} + \lambda - \beta x_{i_1}
\end{eqnarray*}
The algorithm to find the best approximation under $l_\infty$ starts with three arbitrary indices $i_1$, $i_2$, $i_3$ with $x_{i_1} < x_{i_2} < x_{i_3}$. Next, it derives $\alpha$, $\beta$, and $\lambda$ using the solutions to the equations above. Then the algorithm tries to find new indices $j_1$, $j_2$, $j_3$ by exchanging one of the $i_j$ with some index $k$ such that $|\lambda|$ is increased. Obviously, we use a $k$ that maximizes the deviation from the best approximation $\hat{f}$ for $i_1, i_2, i_3$, i.e.,
\[
  |y_k - \hat{f}(x_k)| = \max_{i=1,\ldots,m} |y_i - \hat{f}(x_i)|.
\]
Depending on the position of $x_k$ relative to $x_{i_1}, x_{i_2}, x_{i_3}$ and the signs of the residuals, we determine the $i_j$ to be exchanged with $k$:

* $x_k < x_{i_1}$: if $\mathrm{sign}(y_k - \hat{f}_k) = \mathrm{sign}(y_{i_1} - \hat{f}_{i_1})$ then $j_1 = k$, $j_2 = i_2$, $j_3 = i_3$; else $j_1 = k$, $j_2 = i_1$, $j_3 = i_2$.

* $x_{i_1} < x_k < x_{i_2}$: if $\mathrm{sign}(y_k - \hat{f}_k) = \mathrm{sign}(y_{i_1} - \hat{f}_{i_1})$ then $j_1 = k$, $j_2 = i_2$, $j_3 = i_3$; else $j_1 = i_1$, $j_2 = k$, $j_3 = i_3$.

* $x_{i_2} < x_k < x_{i_3}$: if $\mathrm{sign}(y_k - \hat{f}_k) = \mathrm{sign}(y_{i_2} - \hat{f}_{i_2})$ then $j_1 = i_1$, $j_2 = k$, $j_3 = i_3$; else $j_1 = i_1$, $j_2 = i_2$, $j_3 = k$.

* $x_k > x_{i_3}$: if $\mathrm{sign}(y_k - \hat{f}_k) = \mathrm{sign}(y_{i_3} - \hat{f}_{i_3})$ then $j_1 = i_1$, $j_2 = i_2$, $j_3 = k$; else $j_1 = i_2$, $j_2 = i_3$, $j_3 = k$.

The above rules are called exchange rules. In general, they state that if $k$ falls between two indices, the one whose residual has the same sign as $r_k$ is replaced by $k$. If $k$ is smaller than the smallest (larger than the largest) index, we consider two cases: if the smallest (largest) index has the same residual sign as $k$, we exchange it with $k$; otherwise, we exchange it with the largest (smallest) index. Stated this way, we can use the exchange rules for cases where $n > 2$.
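The closed-form three-point solution, which is the core of step 2 of the algorithm stated below, is easily coded. The following Python sketch solves the alternating system above and reproduces the example of Theorem 24.5.33:

    def linf_three_point(p1, p2, p3):
        """Solve y_i - (alpha + beta*x_i) = -lambda, +lambda, -lambda."""
        (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
        lam = (y2 - y1) / 2 - (y3 - y1) * (x2 - x1) / (2 * (x3 - x1))
        beta = (y2 - y1 - 2 * lam) / (x2 - x1)
        alpha = y1 + lam - beta * x1
        return alpha, beta, abs(lam)

    print(linf_three_point((1, 20), (2, 10), (3, 60)))
    # -> (-15.0, 20.0, 15.0): the line -15 + 20x with maximal deviation 15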
Algorithm 24.8 summarizes the above considerations:

  BestLinearApproximationUnderChebyshevNorm
  1. Choose arbitrary i_1, i_2, i_3 with x_{i_1} < x_{i_2} < x_{i_3} (e.g., equi-distant i_j).
  2. Calculate the solution of the system of equations. This gives us an
     approximation function f^(x) = alpha + beta x and lambda.
  3. Find an x_k for which the deviation of f^ from the given data is maximized.
     Call this maximal deviation lambda_max.
  4. If lambda_max - |lambda| > epsilon for some small epsilon, then apply the
     exchange rule using x_k and go to step 2. (The epsilon is mainly needed to
     cope with rounding problems of floating-point numbers.)
  5. Return alpha, beta, lambda.

Figure 24.8: Algorithm for best linear approximation under $l_\infty$

In case $n > 2$, the above algorithm remains applicable. We just have to use the general exchange rule and provide a routine solving the following system of equations for the $x_i$ and $\lambda$:
\begin{eqnarray*}
  a_{1,1} x_1 + a_{1,2} x_2 + \ldots + a_{1,n} x_n & = & -\lambda \\
  a_{2,1} x_1 + a_{2,2} x_2 + \ldots + a_{2,n} x_n & = & +\lambda \\
  & \vdots & \\
  a_{n+1,1} x_1 + a_{n+1,2} x_2 + \ldots + a_{n+1,n} x_n & = & (-1)^{n+1} \lambda
\end{eqnarray*}

24.5.5 Linear Models Under $l_q$

Let $(x_i, y_i)$, $1 \le i \le m$, be a set of points with $y_i > 0$, which we again want to approximate by a linear combination of a given set of functions $\Phi_j$, $1 \le j \le n$. This time, we measure the deviation by applying $l_q$. That is, we want to find coefficients $a_j$ such that the function
\[
  \hat{f}(x) = \sum_{j=1}^{n} a_j \Phi_j(x)
\]
minimizes
\[
  \max_{i=1,\ldots,m} \max \left\{ \frac{\hat{f}(x_i)}{y_i}, \frac{y_i}{\hat{f}(x_i)} \right\}.
\]
Let $\vec{a}$ and $\vec{b}$ be two vectors in $\mathbb{R}^n$ with $b_i > 0$. Then we define $\vec{a}/\vec{b} = (a_1/b_1, \ldots, a_n/b_n)^T$. Let $A \in \mathbb{R}^{m \times n}$ be a matrix, where $m > n$, and let $\vec{b} = (b_1, \ldots, b_m)^T$ be a vector in $\mathbb{R}^m$ with $b_i > 0$. Then we can state the problem as:
\[
  \text{find } \vec{a} \in \mathbb{R}^n \text{ that minimizes } \|A\vec{a}/\vec{b}\|_Q \tag{24.21}
\]
under the constraint that $\alpha_i \vec{a} > 0$, $1 \le i \le m$, for all row vectors $\alpha_i$ of $A$. Alternatively, we can modify $A$ by "dividing" it by $\vec{b}$. We need some notation to do so. Let $\vec{b} = (b_1, \ldots, b_m)^T$ be a vector in $\mathbb{R}^m$. Define $\mathrm{diag}(\vec{b})$ to be the $m \times m$ diagonal matrix which contains the $b_i$ on its diagonal and is zero outside the diagonal. For vectors $\vec{b}$ with $b_i > 0$, we can define $\vec{b}^{-1} = (1/b_1, \ldots, 1/b_m)^T$. Using these notations, we can define
\[
  A' = \mathrm{diag}(\vec{b}^{-1}) A.
\]
In the special case of univariate polynomial approximation with $\hat{f}(x) = a_1 + a_2 x + \ldots + a_n x^{n-1}$, the matrix $A'$ has the form
\[
  A' = \begin{pmatrix}
  1/y_1 & x_1/y_1 & \ldots & x_1^{n-1}/y_1 \\
  1/y_2 & x_2/y_2 & \ldots & x_2^{n-1}/y_2 \\
  \vdots & \vdots & \ldots & \vdots \\
  1/y_m & x_m/y_m & \ldots & x_m^{n-1}/y_m
  \end{pmatrix} \tag{24.22}
\]
Keeping the trick with $A'$ in mind, it is easy to see that problem 24.21 can be solved if we can solve the general problem:
\[
  \text{find } \vec{a} \in \mathbb{R}^n \text{ that minimizes } \|A\vec{a}\|_Q. \tag{24.23}
\]
The following proposition ensures that a solution to this general problem exists. Further, since $\|A\vec{a}\|_Q$ is convex, the minimum is a global one.

Proposition 24.5.1 Let $A \in \mathbb{R}^{m,n}$ such that $R(A) \cap \mathbb{R}^m_{>0} \ne \emptyset$. Then $\|A \cdot \|_Q$ attains its minimum.

Recall that $l_q$ is subadditive and convex. Further, it is lower semi-continuous (see also [732, p. 52]). However, it is not strictly convex. Hence, as with $l_\infty$, we expect uniqueness to hold only under certain conditions. We need some more notation. Let $A \in \mathbb{R}^{m,n}$. We denote by $R(A) = \{A\vec{a} \mid \vec{a} \in \mathbb{R}^n\}$ the range of $A$ and by $N(A) = \{\vec{a} \in \mathbb{R}^n \mid A\vec{a} = 0\}$ the nullspace of $A$. Problem (24.23) can be rewritten as the following constrained minimization problem:
\[
  \min_{(\vec{a}, q) \in \mathbb{R}^n \times \mathbb{R}} q \quad \text{subject to} \quad \frac{1}{q} \le A\vec{a} \le q \ \text{ and } \ q \ge 1.
\]
\tag{24.24}

The Lagrangian of (24.24) is given by
\[
  L(\vec{a}, q, \lambda^+, \lambda^-, \mu) := q - (\lambda^+)^T (q - A\vec{a}) - (\lambda^-)^T \left( A\vec{a} - \frac{1}{q} \right) - \mu(q - 1).
\]
Assume that $R(A) \cap \mathbb{R}^m_{>0} \ne \emptyset$. Then the set $\{(\vec{a}, q) : \frac{1}{q} \le A\vec{a} \le q \text{ and } q \ge 1\}$ is non-empty and closed, and there exists $(\vec{a}, q)$ for which we have strict inequality in all conditions. Then the following Karush-Kuhn-Tucker conditions are necessary and sufficient for $(\hat{a}, \hat{q})$ to be a minimizer of (24.24), see, e.g., [825, p. 62]: there exist $\hat{\lambda}^+, \hat{\lambda}^- \in \mathbb{R}^m_{\ge 0}$ and $\hat{\mu} \ge 0$ such that
\[
  \nabla_{\vec{a}} L(\hat{a}, \hat{q}, \hat{\lambda}^+, \hat{\lambda}^-, \hat{\mu}) = A^T \hat{\lambda}^+ - A^T \hat{\lambda}^- = 0 \tag{24.25}
\]
\[
  \frac{\partial}{\partial q} L(\hat{a}, \hat{q}, \hat{\lambda}^+, \hat{\lambda}^-, \hat{\mu}) = 1 - \sum_{i=1}^m \hat{\lambda}_i^+ - \frac{1}{\hat{q}^2} \sum_{i=1}^m \hat{\lambda}_i^- - \hat{\mu} = 0 \tag{24.26}
\]
and, for $i = 1, \ldots, m$,
\[
  \hat{\lambda}_i^+ \left( \hat{q} - (A\hat{a})_i \right) = 0, \qquad
  \hat{\lambda}_i^- \left( (A\hat{a})_i - \frac{1}{\hat{q}} \right) = 0, \tag{24.27}
\]
\[
  \hat{\mu}(\hat{q} - 1) = 0. \tag{24.28}
\]
Assume that $1_m \notin R(A)$, where $1_m$ is the vector with all components equal to 1. Then $\hat{q} > 1$ and consequently $\hat{\mu} = 0$. Furthermore, it is clear that not both $\hat{\lambda}_i^+$ and $\hat{\lambda}_i^-$ can be positive, because the conditions $\hat{q} = (A\hat{a})_i$ and $\frac{1}{\hat{q}} = (A\hat{a})_i$ cannot be fulfilled at the same time, since $\hat{q} > 1$. Setting $\hat{\lambda} := \hat{\lambda}^+ - \hat{\lambda}^-$, we can summarize our findings (24.25)--(24.28) in the following theorem.

Theorem 24.5.35 Let $A \in \mathbb{R}^{m,n}$ such that $R(A) \cap \mathbb{R}^m_{>0} \ne \emptyset$ and $1_m \notin R(A)$. Then $(\hat{a}, \hat{q})$ solves (24.24) if and only if there exists $\hat{\lambda} \in \mathbb{R}^m$ such that

i) $A^T \hat{\lambda} = 0$.

ii) $\hat{q} = \hat{q} \sum_{\hat{\lambda}_i > 0} \hat{\lambda}_i - \frac{1}{\hat{q}} \sum_{\hat{\lambda}_i < 0} \hat{\lambda}_i$.

iii) $\hat{\lambda}_i = 0$ if $\frac{1}{\hat{q}} < (A\hat{a})_i < \hat{q}$.

iv) If $\hat{\lambda}_i > 0$ then $(A\hat{a})_i = \hat{q}$, and if $\hat{\lambda}_i < 0$ then $(A\hat{a})_i = 1/\hat{q}$.

Remark. We see that $1 < \hat{q} = (A\hat{a})_i$ implies $\mathrm{sign}((A\hat{a})_i - 1) = 1$ and that $1 > 1/\hat{q} = (A\hat{a})_i$ implies $\mathrm{sign}((A\hat{a})_i - 1) = -1$; whence $\hat{\lambda}_i ((A\hat{a})_i - 1) \ge 0$. For our approximation problem (24.21), this means that the residual $\hat{f}(x_i) - b_i$ fulfills $\hat{\lambda}_i (\hat{f}(x_i) - b_i) \ge 0$.

Under certain conditions, problem (24.23) has a unique solution which can be simply characterized. Let us start with some straightforward considerations in this direction. If $N(A) \ne \{\vec{0}\}$, then for any minimizer $\hat{a}$ of $\|A \cdot\|_Q$, the vector $\hat{a} + \beta$, $\beta \in N(A)$, is also a minimizer. In particular, we have $N(A) \ne \{\vec{0}\}$ if

* $m < n$, or

* $m \ge n$ and $A$ is not of full rank, i.e., $\mathrm{rank}(A) < n$.

In these cases, we cannot have a unique minimizer. Note further that if $1_m \in R(A)$, then the minimum of $\|A \cdot\|_Q$ is 1 and the set of minimizers is given by $A^+ 1_m + N(A)$, where $A^+$ denotes the Moore-Penrose inverse of $A$. Of course, this can easily be checked using the methods of Sec. 24.5.3. In the following, we restrict our attention to the case $m > n$ and $\mathrm{rank}(A) = n$. The following proposition considers $(n+1, n)$-matrices.

Proposition 24.5.2 Let $A \in \mathbb{R}^{n+1,n}$ such that $R(A) \cap \mathbb{R}^{n+1}_{>0} \ne \emptyset$, $1_{n+1} \notin R(A)$, and $\mathrm{rank}(A) = n$. Then $\|A \cdot\|_Q$ has a unique minimizer if and only if the Lagrange multipliers $\hat{\lambda}_i$, $i = 1, \ldots, n + 1$, are not zero.

By $\mathrm{spark}(A)$ we denote the smallest number of rows of $A$ which are linearly dependent. In other words, any $\mathrm{spark}(A) - 1$ rows of $A$ are linearly independent. For the 'spark' notation, we refer to [27].

Examples.

1. For the matrix
\[
  A := \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 1 \end{pmatrix}
\]
we obtain $\mathrm{rank}(A) = 3$ and $\mathrm{spark}(A) = 3$.

2. The $(m, n)$-matrix $A$ in (24.22) is the product of the diagonal matrix $\mathrm{diag}(1/b_i)_{i=1}^m$ with positive diagonal entries and a Vandermonde matrix. Hence, it can easily be seen that $\mathrm{spark}(A) = n + 1$.

If an $(m, n)$-matrix $A$ has $\mathrm{spark}(A) = n + 1$, then $A$ fulfills the Haar condition.
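$\mathrm{spark}(A)$ can be computed by brute force over row subsets (exponential in $m$, but fine for small examples). A Python sketch checking the first example:

    import numpy as np
    from itertools import combinations

    def spark(A, tol=1e-10):
        """Smallest number of linearly dependent rows of A."""
        m = A.shape[0]
        for k in range(1, m + 1):
            for rows in combinations(range(m), k):
                if np.linalg.matrix_rank(A[list(rows), :], tol=tol) < k:
                    return k
        return m + 1   # all rows independent

    A = np.array([[1., 0., 0.],
                  [0., 1., 0.],
                  [0., 0., 1.],
                  [1., 0., 1.]])
    print(spark(A))   # -> 3, as claimed in the example above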
Proposition 24.5.2 can be reformulated as follows:

Corollary 24.5.36 Let $A \in \mathbb{R}^{n+1,n}$ such that $R(A) \cap \mathbb{R}^{n+1}_{>0} \ne \emptyset$ and $1_{n+1} \notin R(A)$. Then $\|A \cdot\|_Q$ has a unique minimizer if and only if $\mathrm{spark}(A) = n + 1$.

The result can be generalized by the following theorem.

Theorem 24.5.37 Let $A \in \mathbb{R}^{m,n}$ such that $R(A) \cap \mathbb{R}^m_{>0} \ne \emptyset$. Suppose that $\mathrm{spark}(A) = n + 1$. Then $\|A \cdot\|_Q$ has a unique minimizer which is determined by $n + 1$ rows of $A$; i.e., there exists an index set $J \subset \{1, \ldots, m\}$ of cardinality $|J| = n + 1$ such that $\|A \cdot\|_Q$ and $\|A|_J \cdot\|_Q$ have the same minimum and the same minimizer. Here, $A|_J$ denotes the restriction of $A$ to the rows contained in the index set $J$. We call such an index set $J$ an extremal set.

Of course, the condition $\mathrm{spark}(A) = n + 1$ is not necessary for $\|A \cdot\|_Q$ to have a unique minimizer, as the following example shows.

Example. One can construct two $(4, 2)$-matrices $A$ with $\mathrm{spark}(A) = 2$ such that the minimum of $\|A \cdot\|_Q$ is $\hat{q} = 2$ in both cases, but in the first problem the minimizer is uniquely determined as $\hat{a} = (\frac{1}{2}, 2)^T$, while in the second case the whole line $c(\frac{1}{2}, 1)^T + (1 - c)(\frac{1}{2}, 2)^T$, $c \in [0, 1]$, minimizes the functional. For $(\frac{1}{2}, 1)^T$ we have $\mathrm{sign}(\hat{\lambda}_1, \hat{\lambda}_2, \hat{\lambda}_3, \hat{\lambda}_4) = (-1, 0, 1, -1)$, while the pattern is $(0, 1, 1, -1)$ for $(\frac{1}{2}, 2)^T$ and $(0, 0, 1, -1)$ within the line bounded by these points.

By Theorem 24.5.37, a method for finding the minimizer of $\|A \cdot\|_Q$ would be to compute the unique minimizers of the $\binom{m}{n+1}$ subproblems $\|A|_J \cdot\|_Q$ for all index sets $J$ of cardinality $n + 1$ and to take the largest minimum $\hat{q}$ and the corresponding $\hat{a}$ as the minimizer of the original problem. For our line problem, there exist $\binom{m}{3} = O(m^3)$ of these subproblems. In the following, we give another algorithm which is also based on Theorem 24.5.37 but ensures that the value $\hat{q}$ increases with each new choice of the subset $J$. Since there is only a finite number of such subsets, we must reach a stage where no further increase is possible and $J$ is an extremal set. In normed spaces, such methods are known as ascent methods, see [914].

In this section, we suggest a detailed algorithm for minimizing $\|A \cdot\|_Q$, where we restrict our attention to the line problem
\[
  \max_{i=1,\ldots,m} \max \left\{ \frac{\beta + \alpha x_i}{b_i}, \frac{b_i}{\beta + \alpha x_i} \right\}, \tag{24.29}
\]
i.e., to the matrix $A$ in (24.22) with $n = 2$.

Corollary 24.5.38 Let $(x_i, b_i)$, $i = 1, 2, 3$, be given points with pairwise distinct $x_i \in \mathbb{R}$ and positive $b_i$, $i = 1, 2, 3$. Then the minimum $\hat{q}$ and the minimizer $\hat{a} \in \mathbb{R}^2$ of (24.29) are given by $\hat{q} = \|\hat{q}_1\|_Q$ and
\[
  \begin{pmatrix} \hat{\beta} \\ \hat{\alpha} \end{pmatrix}
  = \frac{1}{x_2 - x_1} \begin{pmatrix} x_2 & -x_1 \\ -1 & 1 \end{pmatrix}
  \begin{pmatrix} b_1 \hat{q}_1 \\ b_2 \hat{q}_2 \end{pmatrix},
\]
where
\[
  \hat{q}_1 := \begin{cases}
  \sqrt{\frac{r_2}{1 - r_1}} & \text{if } r_1 < 0 \text{ and } r_2 > 0, \\
  \sqrt{\frac{1 - r_2}{r_1}} & \text{if } r_1 > 0 \text{ and } r_2 < 0, \\
  \sqrt{\frac{1}{r_1 + r_2}} & \text{if } r_1 > 0 \text{ and } r_2 > 0,
  \end{cases}
  \qquad
  \hat{q}_2 := \begin{cases}
  1/\hat{q}_1 & \text{if } r_2/r_1 < 0, \\
  \hat{q}_1 & \text{if } r_2/r_1 > 0,
  \end{cases} \tag{24.30}
\]
and
\[
  r_1 := \frac{b_1 (x_2 - x_3)}{b_3 (x_2 - x_1)}, \qquad r_2 := \frac{b_2 (x_3 - x_1)}{b_3 (x_2 - x_1)}.
\]
Remark. If the points are ordered, i.e., $x_1 < x_2 < x_3$ (or, alternatively, in descending order), then either $A\hat{a} = (\hat{q}, 1/\hat{q}, \hat{q})^T$ or $A\hat{a} = (1/\hat{q}, \hat{q}, 1/\hat{q})^T$. This means that $\hat{\lambda}$ in Theorem 24.5.35 has alternating signs. In other words, the points $f(x_1), f(x_3)$ lie above $b_1, b_3$ and $f(x_2)$ lies below $b_2$, or conversely. Later, we will show that the alternating sign condition holds for general best polynomial approximation with respect to the Q-paranorm.
Corollary 24.5.38 is the basis of Algorithm 24.9, which finds the optimal line with respect to three points in each step and chooses the next three points if the minimum corresponding to their line becomes larger.

Proposition 24.5.3 The algorithm computes the line $f(x) = \hat{\beta} + \hat{\alpha}x$ which minimizes (24.29).

Remark. Alternatively, one can deal with ordered points $b_1 < b_2 < b_3$, which restricts the effort in (24.30) to $\hat{q}_1 = \sqrt{r_2/(1 - r_1)}$ but requires an ascending ordering of the points $x_{i_1}, x_{i_2}, x_j$ in each step of the algorithm.

  Algorithm. (Best line approximation with respect to l_q)
  Input: (x_i, b_i), i = 1, ..., m, of pairwise distinct points x_i in R and b_i > 0.
  Set i_1 := 1, i_2 := 2, and stopsignal := -1.
  While stopsignal = -1 do
  1. For i = 1, ..., m; i != i_1, i_2 compute
       r_{1,i} := b_{i_1}(x_{i_2} - x_i) / (b_i (x_{i_2} - x_{i_1})),
       r_{2,i} := b_{i_2}(x_i - x_{i_1}) / (b_i (x_{i_2} - x_{i_1})).
  2. Compute a^_j = max_i { ||x^_1(r_{1,i}, r_{2,i})||_Q } by (24.30). Let j != i_1, i_2
     be an index where the maximum is attained, and let x^_1 = x^_1(r_{1,j}, r_{2,j})
     and x^_2 = x^_2(r_{1,j}, r_{2,j}).
  3. Compute a := max_i { ||r_{1,i} x^_1 + r_{2,i} x^_2||_Q }.
     Let k be an index where the maximum is attained.
  4. If a <= a^_j then stopsignal := 1, a^ := a^_j, and
       (beta^, alpha^)^T = 1/(x_{i_2} - x_{i_1}) * ( x_{i_2}  -x_{i_1} ; -1  1 ) (b_{i_1} x^_1, b_{i_2} x^_2)^T;
     otherwise set i_1 := j and i_2 := k and return to 1.

Figure 24.9: Algorithm finding the best linear approximation under $l_q$.

Finally, we want to generalize the remark on the signs of the Lagrange multipliers given after Corollary 24.5.38. Remember that the set of polynomials $\Phi_i(x) = x^{i-1}$, $i = 1, \ldots, n$, forms a Chebyshev set (see Def. 24.5.31). Applying again Lemma 24.5.34, one can easily prove the following result.

Theorem 24.5.39 Let $\Phi_i : I \to \mathbb{R}$, $i = 1, \ldots, n$, be a Chebyshev set and let $x_1 < \ldots < x_{n+1}$ be points in $I$. Then, for $\Phi := (\Phi_j(x_i))_{i,j=1}^{n+1,n}$, the Lagrange multipliers $\hat{\lambda}_i$, $i = 1, \ldots, n + 1$, corresponding to the minimizer of $\|\Phi \cdot\|_Q$ have alternating signs.

For our polynomial approximation problem $\mathrm{argmin}_{\vec{a} \in \mathbb{R}^n} \|A\vec{a}\|_Q$ with $A \in \mathbb{R}^{n+1,n}$ defined by (24.22) and ordered points $x_1 < \ldots < x_{n+1}$, we see that $A = \mathrm{diag}(1/b_i)_{i=1}^{n+1}\, \Phi$, where $\Phi$ is the matrix belonging to the Chebyshev set $\Phi_i(x) = x^{i-1}$. Since the $b_i$ are positive, we obtain immediately that the Lagrange multipliers $\hat{\lambda}_i$ have alternating signs. Again, this means that $\hat{f}(x_i) - b_i$ has alternating signs.

Using Theorem 24.5.39, we can not only simplify Algorithm 24.9 but also use Algorithm 24.8 even for approximation by arbitrary Chebyshev sets of size $n > 2$. We only have to provide a routine which solves the following system of equations:
\begin{eqnarray*}
  a_{1,1} x_1 + a_{1,2} x_2 + \ldots + a_{1,n} x_n & = & \lambda^{+1} \\
  a_{2,1} x_1 + a_{2,2} x_2 + \ldots + a_{2,n} x_n & = & \lambda^{-1} \\
  & \vdots & \\
  a_{n+1,1} x_1 + a_{n+1,2} x_2 + \ldots + a_{n+1,n} x_n & = & \lambda^{(-1)^n}
\end{eqnarray*}
Let us illustrate this for $n = 2$. In this case, we can write
\begin{eqnarray*}
  \frac{1}{\lambda}(\alpha + \beta x_1) & = & y_1 \\
  \lambda(\alpha + \beta x_2) & = & y_2 \\
  \frac{1}{\lambda}(\alpha + \beta x_3) & = & y_3
\end{eqnarray*}
If we number the above equations from 1 to 3, then we may conclude that
\begin{eqnarray*}
  3 & \Longrightarrow & \alpha = \lambda y_3 - \beta x_3 \quad (*) \\
  1, (*) & \Longrightarrow & \lambda y_1 = \lambda y_3 - \beta x_3 + \beta x_1 \\
  & \Longrightarrow & (y_3 - y_1)\lambda = (x_3 - x_1)\beta \\
  & \Longrightarrow & \lambda = \frac{x_3 - x_1}{y_3 - y_1}\, \beta = q_{13}\, \beta \quad (**) \\
  2, (*), (**) & \Longrightarrow & y_2 = q_{13}\beta (q_{13} y_3 \beta - \beta x_3 + \beta x_2) \\
  & \Longrightarrow & \beta^2 q_{13} (q_{13} y_3 - x_3 + x_2) = y_2 \\
  & \Longrightarrow & \beta^2 = y_2\, q_{13}^{-1}\, g^{-1} \\
  & \Longrightarrow & \beta = \sqrt{y_2\, q_{13}^{-1}\, g^{-1}},
\end{eqnarray*}
where
\[
  q_{13} := \frac{x_3 - x_1}{y_3 - y_1}, \qquad g := q_{13} y_3 - x_3 + x_2.
\]
Caution is necessary if $\beta = 0$. Then:
\[
  \beta = 0, \qquad \alpha = \lambda y_1, \qquad \lambda = \sqrt{y_2/y_1}.
\]
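For $n = 2$, the derivation above translates directly into code. The following Python sketch assumes the alternating residual pattern derived above, $y_3 \ne y_1$, and $\beta \ne 0$ (see the caution above), and reproduces the $\hat{f}_{l_q}$ solution of our running example:

    import math

    def lq_three_point(x1, y1, x2, y2, x3, y3):
        """Solve (alpha + beta*x_i)/y_i = lambda, 1/lambda, lambda (alternating);
        assumes y3 != y1 and beta != 0 (see the caution above)."""
        q13 = (x3 - x1) / (y3 - y1)
        g = q13 * y3 - x3 + x2
        beta = math.sqrt(y2 / (q13 * g))
        lam = q13 * beta
        alpha = lam * y3 - beta * x3
        return alpha, beta, max(lam, 1.0 / lam)   # last value: the q-error

    print(lq_three_point(1, 20, 2, 10, 3, 60))
    # -> (0.0, 10.0, 2.0): the line 10x of our running example with q-error 2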
24.5.6 Non-Linear Models Under $l_q$

In general, there is a lot to say about this subject, and we refer the reader to the literature. However, let us consider two simple problems, which we can solve using algorithms we already know:

1. find the best approximation using $e^{p(x)}$ under $l_q$, and

2. find the best approximation using $\ln(p(x))$ under $l_\infty$,

where $p(x)$ is a linear combination of a set of Chebyshev functions.

Let us start with the first problem. That is, we ask for an exponential function
\[
  \hat{f} = e^{\sum_{j=1}^{n} \alpha_j \Phi_j}
\]
which best approximates under $l_q$ a given set of points $(x_i, y_i)$, $i = 1, \ldots, m$, with pairwise distinct $x_i \in \mathbb{R}^d$ and $y_i > 0$, $1 \le i \le m$. Note that $\hat{f} > 0$ by definition. Since the $\ln$ function is strictly monotonically increasing, this is equivalent to minimizing
\begin{eqnarray*}
  \ln \left( \max_{i=1,\ldots,m} \max \left\{ \frac{\hat{f}(x_i)}{y_i}, \frac{y_i}{\hat{f}(x_i)} \right\} \right)
  & = & \max_{i=1,\ldots,m} \max \{ \ln y_i - \ln \hat{f}(x_i),\ \ln \hat{f}(x_i) - \ln y_i \} \\
  & = & \max_{i=1,\ldots,m} \left| \ln y_i - \sum_{j=1}^{n} \alpha_j \Phi_j(x_i) \right| \\
  & = & \| (\ln y_i)_{i=1}^{m} - \Phi\, \alpha \|_\infty.
\end{eqnarray*}
Thus, it remains to find the best function $\sum_{j=1}^{n} \alpha_j \Phi_j$ with respect to the $l_\infty$ norm.

It is now easy to see that we can solve the second problem as follows. Let $(x_i, y_i)$ be the data we want to approximate by a function of the form $\ln(p(x))$ while minimizing the Chebyshev norm. We can do so by finding the best approximation of $(x_i, e^{y_i})$ under $l_q$.

24.5.7 Multidimensional Models Under $l_q$

In this section, we show that it is possible to find the best approximation under $l_q$ in the multidimensional setting. The idea is to reduce the problem to second-order cone programming (SOCP). In general, SOCP can be used to solve problems of the form
\[
  \min_{x \in \mathbb{R}^s} \langle c, x \rangle \quad \text{subject to} \quad Mx + b \in K,
\]
where $c \in \mathbb{R}^s$, $b \in \mathbb{R}^t$, $M \in \mathbb{R}^{t,s}$, and $K$ is a convex cone in $\mathbb{R}^t$. For details on SOCP, we refer to [564]. For us, it is important that software packages like MOSEK solve SOCP problems quite efficiently. We now show how our problem can be reduced to SOCP. Thereby, we follow the approach of Setzer et al. [792].

Assume our set of $d$-dimensional points is $X = \{x_1, \ldots, x_m\} \subset \mathbb{R}^d$. For each point $x_i$, we have a measurement (e.g., its frequency) $f_i > 0$. Further, we want to find a linear model in the functions $\Phi_j$, $1 \le j \le n$. Then the model can be represented by a matrix $A := (\Phi_j(x_i)/f_i)_{i,j=1}^{m,n}$. We assume that $n < m$ and that $A$ has full rank. The problem of finding a best approximation under $l_q$ can then be formulated as
\[
  \hat{\alpha} = \mathrm{argmin}_{\alpha \in \mathbb{R}^n} \| A\alpha \|_Q.
\]
This is equivalent to the constrained problem
\[
  \min_{u \in \mathbb{R}^m, \alpha \in \mathbb{R}^n} \| u \|_Q \quad \text{subject to} \quad A\alpha = u, \tag{24.31}
\]
which in turn can be rewritten as
\[
  \min_{a \in \mathbb{R}, u \in \mathbb{R}^m, \alpha \in \mathbb{R}^n} a \quad \text{subject to} \quad A\alpha = u,\ 1 \le a,\ \frac{1}{a} \le u \le a. \tag{24.32}
\]
The first two constraints and $u \le a$ are already cone constraints. The remaining constraints $1 \le a u_i$ can be rewritten as
\[
  \begin{pmatrix} \sqrt{2} & 0 \\ 0 & \sqrt{2} \\ 1 & 1 \\ 1 & 1 \end{pmatrix}
  \begin{pmatrix} u_i \\ a \end{pmatrix}
  + \begin{pmatrix} 0 \\ 0 \\ \sqrt{2} \\ -\sqrt{2} \end{pmatrix} \in \mathcal{L}^4_r,
\]
because the following inequalities are equivalent:
\begin{eqnarray*}
  (\sqrt{2} u_i)^2 + (\sqrt{2} a)^2 & \le & 2 (u_i + a + \sqrt{2})(u_i + a - \sqrt{2}) \\
  u_i^2 + a^2 & \le & (u_i + a)^2 - 2 \\
  1 & \le & u_i\, a
\end{eqnarray*}
An alternative formulation of the last constraint can be found in [792].

Figure 24.10: Sample data sets

24.6 Traditional Histograms

In the simple profile of Section 24.3, we approximated the frequency density of the whole domain by two numbers: the cumulated frequency $f_A$ and the number of distinct values $d_A$. The idea behind histograms is that a piecewise approximation of the active domain may result in better estimates. Therefore, the active domain $[l_A, u_A]$ is partitioned into subsets called buckets.
Traditionally, for each bucket B_i, the cumulated frequency f_i⁺ of the values falling within the bucket as well as the number of distinct values d_i⁺ of A within the bucket is stored. A histogram then consists of a sequence of buckets.

24.6.1 Bucketization

Assume we wish to partition the active domain D_A (= Π_A(R)) of some attribute A of some relation R into β buckets B_i (1 ≤ i ≤ β). Then, each bucket contains a subset of the values of the universe of A, that is, B_i ⊆ [l_A, u_A]. Not every kind of subset is used in practice. Since it is too memory-consuming to store the values in each bucket explicitly, buckets always comprise subintervals of the active domain. Further, these are typically non-overlapping, that is, B_i ∩ B_j = ∅ for i ≠ j.

Such a partitioning of the active domain can be achieved in two steps. In a first step, we fix a set of bucket boundaries b_i ∈ [l_A, u_A] such that l_A = b_0 ≤ b_1 ≤ … ≤ b_β = u_A. In order to decrease the search space, the b_i are typically chosen from the active domain, that is, b_i ∈ D_A. In a second step, we use these values as bucket boundaries to determine the buckets. Here, there are several alternatives. Let us first consider the case of an integer-valued attribute A. If we use closed intervals for buckets, we can define a bucket as comprising the values in [b_{i−1} + 1, b_i]. Note that [b_i, b_{i+1}] does not work, since it overlaps with [b_{i+1}, b_{i+2}]. We could also build a bucket [b_{i−1}, b_i − 1], but with proper choices of the b_i, these two are equivalent. A non-equivalent alternative is to use half-open intervals. In this case, we can define a bucket as [b_i, b_{i+1}[. Another issue is whether the buckets completely cover the active domain, that is, whether ∪_{i=1}^β B_i = [l_A, u_A] holds or not. In the latter case, we can define buckets comprising closed intervals as [b_i, b_{i+1}] if we do not define buckets for [b_{i−1}, b_i] and [b_{i+1}, b_{i+2}]. Thus, our histogram (the set of buckets we define) contains holes. This is typically only the case if no value from D_A falls into the hole. Summarizing, we have the following three alternatives for attributes with a discrete, ordered domain:

1. closed-interval histogram without holes
2. closed-interval histogram with holes
3. half-open-interval histogram without holes

Note that, independent of the kind of histogram constructed, all the values b_i must be stored. The literature is sparse on investigations of whether holes are good or not. Most papers do not even specify which kind of histogram they are talking about. As an exception, Wang and Sevcik consider 1) and 2) [906]. They propose to treat these two possibilities as alternatives during the construction of the histogram. For a continuous domain, e.g., floating point numbers, alternative 1) is not (easily) possible. Thus, in this case, we can only choose between 2) and 3). If most range queries use closed ranges, 1) and 2) should be the preferred options. To see this, note that we must add or subtract the frequencies of the boundaries to convert a half-open or open interval into a closed interval or vice versa. Further, even if we have a good approximation of the exact frequencies of single values, we still face the problem of values not occurring in the active domain if it is not dense. Nonetheless, the subsequent discussion applies with minor modifications to all three alternatives.

24.6.2 Heuristics to Determine Bucket Boundaries

In the sequel, we discuss some heuristics to determine the bucket boundaries.
All these algorithms have a parameter β for the number of buckets to construct. Subsequently, we assume that we are given a set of d value/frequency pairs X = {(x_i, f_i) | 1 ≤ i ≤ d}. We assume that the x_i are sorted in increasing order, i.e., x_i < x_{i+1} for 1 ≤ i < d. By F⁺ or n, we denote the cumulated frequency F⁺ = n = Σ_{i=1}^d f_i. The bucket boundaries will be denoted by b_0,…,b_β, where b_0 = x_1 and b_β = x_d. Of course, we assume that β ≪ d.

Equi-Width Histograms

Kooi was the first to propose histograms for selectivity estimation [512]. The first type of histograms he proposed were equi-width histograms. In an equi-width histogram, the bucket boundaries are determined by b_i = x_1 + iδ, where δ = (x_d − x_1)/β.

Equi-Depth Histograms

Kooi also proposed the alternative of equi-depth histograms [512]. There, all buckets have about the same cumulated frequency. This can be achieved by a single scan through X, starting a new bucket as soon as the current bucket's cumulated frequency exceeds F⁺/β. (A small sketch of this construction is given below.)

Another interesting approach to equi-depth histograms has been described by Piatetsky-Shapiro and Connell [688]. There, buckets can overlap. The construction is quite simple, though expensive. First, from X we construct a bag Y of cardinality n such that each value x_i occurs exactly f_i times in Y. Then, Y is sorted by increasing values. A parameter S is used to determine the number of values to be stored: S determines the distance N = (n − 1)/S in the sorted vector Y between two stored values. From the sorted vector we then pick every value at a position 1 + iN for i = 0,…,S. Hence, we store S + 1 values. If (n − 1)/S is not an integer, the distance between the last two elements can be smaller. This approach is called distribution steps, but it could also be termed quantiles. If some values occur very frequently (more often than N times), then they are stored more than once. Thus, the buckets overlap. Besides the values, nothing else is stored. Piatetsky-Shapiro and Connell also give the corresponding selectivity estimation formulas [688], which we do not repeat here.

The Heuristics Zoo

Besides the classic equi-width and equi-depth heuristics to find bucket boundaries, a whole zoo of heuristics has been proposed. Fortunately, this zoo has been classified by Poosala, Ioannidis, Haas, and Shekita [700]. Before we begin, let us recall some basic definitions. For every x_i except the last one, we define the spread s_i as s_i = x_{i+1} − x_i and the area a_i as a_i = f_i · s_i. The motivation behind these definitions lies in the two major problems we face when approximating a given data distribution X:

1. largely varying f_i and
2. largely varying s_i.

Any of the x_i, f_i, s_i, or a_i can serve as the parameter of a heuristic. We denote which one of these a heuristic applies by V (value), F (frequency), S (spread), or A (area), respectively. The first partitioning heuristic is Equi-Sum. It gives rise to Equi-Sum(V), Equi-Sum(F), Equi-Sum(S), and Equi-Sum(A) and tries to balance the sum of its parameter such that all buckets exhibit about the same value for the sum. Thus, Equi-Sum(V) corresponds to equi-width histograms (here, we count the number of x_i falling into a bucket) and Equi-Sum(F) corresponds to equi-depth histograms.
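As a concrete illustration of the equi-depth heuristic (Equi-Sum(F)), the following C++ sketch computes bucket boundaries in a single scan, closing a bucket as soon as the running cumulated frequency exceeds F⁺/β. The type and function names (ValueFreq, equiDepthBoundaries) are our own and not part of the book's pseudocode; a real implementation would additionally record f_i⁺ and d_i⁺ per bucket.

#include <vector>
#include <cstdint>

struct ValueFreq { double x; uint64_t f; };  // one (value, frequency) pair

// Returns the bucket boundaries b_0,...,b_beta, chosen from the active
// domain. Assumes X is sorted by value and non-empty.
std::vector<double> equiDepthBoundaries(const std::vector<ValueFreq>& X,
                                        unsigned beta) {
  uint64_t Fplus = 0;
  for (const auto& p : X) Fplus += p.f;
  const double target = static_cast<double>(Fplus) / beta;

  std::vector<double> b;
  b.push_back(X.front().x);          // b_0 = x_1
  double cum = 0;                    // cumulated frequency of current bucket
  for (const auto& p : X) {
    cum += p.f;
    if (cum > target && b.size() < beta) {  // close the current bucket
      b.push_back(p.x);
      cum = 0;
    }
  }
  b.push_back(X.back().x);           // b_beta = x_d
  return b;
}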
The next group of heuristics are the max-diff histograms. They come in the flavors max-diff(F), max-diff(S), and max-diff(A) and put bucket boundaries between those β − 1 pairs of neighbored values x_i and x_{i+1} where the β − 1 highest values of the differences |f_{i+1} − f_i|, |s_{i+1} − s_i|, or |a_{i+1} − a_i|, respectively, are found.

End-biased histograms store those values together with their frequencies that exhibit the β_1 lowest and β_2 highest frequencies, with β = β_1 + β_2. The remaining values are put into a single bucket. High-biased histograms only store the β highest values and use a single bucket for the remaining values. Compressed histograms additionally use a regular histogram (e.g., equi-depth) to approximate the frequency distribution of the remaining values.

Note that none of these histograms comes with a guarantee regarding the maximal possible q-error for range queries. Histograms that exhibit this property will be discussed in Sec. ??. But before that, we need to prepare ourselves a little more.

24.7 More on Q

24.7.1 Properties of the Q-Error

Definition of the Q-Error. Let f ≥ 0 be a number and f̂ ≥ 0 be an estimate of f. Then, we define the q-error of f̂ as ||f̂/f||_Q, where ||x||_Q := max(x, 1/x). If ||f̂/f||_Q ≤ q for some value q ≥ 1, we say that the estimate is q-acceptable.

Let R be a relation and A be one of its attributes. Let Π^D_A(R) = {x_1,…,x_d} with x_i ≤ x_{i+1}. Denote by f_i the frequency of x_i and by f⁺(c_1, c_2) := Σ_{c_1 ≤ x_i < c_2} f_i the cumulated frequency of the range [c_1, c_2[.

Scaling. Let x and x′ be two positive numbers with (1/a)x′ ≤ x ≤ b·x′ for some a, b > 0. Then, it is easy to see that ||x/x′||_Q ≤ max(a, b) holds.

Sums. For 1 ≤ i ≤ n, let f_i be true values and f̂_i be estimates with ||f̂_i/f_i||_Q ≤ q for all 1 ≤ i ≤ n. Then

(1/q) Σ_{i=1}^n f_i ≤ Σ_{i=1}^n f̂_i ≤ q Σ_{i=1}^n f_i

holds, i.e., ||Σ_{i=1}^n f̂_i / Σ_{i=1}^n f_i||_Q ≤ q.

Products. For 1 ≤ i ≤ n, let f_i be true values and f̂_i be estimates with ||f̂_i/f_i||_Q ≤ q_i for all 1 ≤ i ≤ n. Then

(Π_{i=1}^n 1/q_i) Π_{i=1}^n f_i ≤ Π_{i=1}^n f̂_i ≤ (Π_{i=1}^n q_i) Π_{i=1}^n f_i

holds, i.e., ||Π_{i=1}^n f̂_i / Π_{i=1}^n f_i||_Q ≤ Π_{i=1}^n q_i. Note that division behaves like multiplication.

Differences. Assume we are given a total value t > 0 which is the sum of a large value l > 0 and a small value s > 0. Thus, t = l + s and s ≤ l. The latter implies s ≤ t/2 ≤ l. If we know t and an estimate l̂ of l, we can derive an estimate ŝ = t − l̂. This kind of estimation is not a good idea, as the following example shows. Assume t = 100, l = 90, s = 10, and l̂ = 99. Although ||l̂/l||_Q = 1.1, we have ŝ = t − l̂ = 1 and, thus, ||ŝ/s||_Q = 10. The situation is different if we use an estimate ŝ of (the smaller) s to derive an estimate l̂ of (the larger) l, as the following theorem shows.

Theorem 24.7.1 Let t, l, s > 0 be three numbers such that t = l + s and s ≤ l. Let ŝ be an estimate of s with ||ŝ/s||_Q ≤ q for some q ≥ 1. Define the estimate l̂ of l as l̂ = max(t/2, t − ŝ). Then ||l̂/l||_Q ≤ min(2, q).

Proof: First, observe that if q = 1 the theorem trivially holds. Second, observe that ||l̂/l||_Q ≤ 2 always holds, since both l and l̂ lie in the interval [t/2, t]. Thus, if q = ||ŝ/s||_Q ≥ 2, then ||l̂/l||_Q ≤ q. Finally, we have to show that ||l̂/l||_Q ≤ q for 1 < q < 2. Due to t = l + s and s ≤ l, we have t/2 ≤ l = t − s (*), and thus t ≤ 2(t − s). Further, remember our assumption q = ||ŝ/s||_Q < 2. According to the definition of l̂, we have to distinguish two cases.

Case 1: Assume t − ŝ < t/2, i.e., l̂ = t/2. From t − ŝ < t/2 we conclude ŝ > t/2, and since ŝ ≤ qs, this gives qs > t/2 and hence s/t > 1/(2q) (**). Thus,

||l̂/l||_Q = ||(t/2)/(t − s)||_Q =(*) 2(t − s)/t = 2(1 − s/t) ≤(**) 2(1 − 1/(2q)) = 2 − 1/q ≤ q,

where the equality marked (*) holds because t/(2(t − s)) ≤ 1 by (*). To see why the last inequality holds, we first observe that 2 − 1/q is strictly increasing in q.
Further, remember that 1 ≤ q always holds, and observe that (a) q − (2 − 1/q) is strictly increasing in q and (b) it attains its minimum on the interval [1, 2] at q = 1, where q − (2 − 1/q) = 1 − (2 − 1) = 0. Thus, the difference q − (2 − 1/q) is always greater than or equal to zero. Since q − (2 − 1/q) ≥ 0 is equivalent to q ≥ 2 − 1/q, this finishes Case 1.

Case 2: We have to show that (1/q)l ≤ t − ŝ ≤ ql under the assumptions t/2 ≤ t − ŝ and 1 < q < 2. We start by showing t − ŝ ≤ ql. Since ŝ ≥ (1/q)s, we have

t − ŝ ≤ ql
⟺ t − ŝ ≤ q(t − s)
⟸ t − (1/q)s ≤ q(t − s)
⟺ t − (1/q)s ≤ qt − qs
⟺ qs − (1/q)s ≤ qt − t
⟺ (q − 1/q)s ≤ (q − 1)t
⟺ (q − 1/q)/(q − 1) ≤ t/s.

We now observe that

(q − 1/q)/(q − 1) = (q − 1 + 1 − 1/q)/(q − 1) = 1 + (1 − 1/q)/(q − 1) ≤ 2 ≤ t/s.

To see that (1 − 1/q)/(q − 1) ≤ 1, consider

(1 − 1/q)/(q − 1) ≤ 1 ⟺ 1 − 1/q ≤ q − 1 ⟺ 2 ≤ q + 1/q.

The function f(x) = x + 1/x is monotonically increasing on x ∈ [1, 2], our interval of interest. Thus, since 1 + 1/1 = 2, the claim follows.

We now show that (1/q)l ≤ t − ŝ under the assumptions t/2 ≤ t − ŝ and 1 < q < 2. If (1/q)(t − s) ≤ t/2, then (1/q)(t − s) ≤ t/2 ≤ t − ŝ and we are done. Thus, assume t/2 ≤ (1/q)(t − s). Since ŝ ≤ qs, consider

(1/q)l ≤ t − ŝ
⟺ (1/q)(t − s) ≤ t − ŝ
⟸ (1/q)(t − s) ≤ t − qs
⟺ (1/q)t − (1/q)s ≤ t − qs
⟺ qs − (1/q)s ≤ t − (1/q)t
⟺ (q − 1/q)s ≤ (1 − 1/q)t
⟺ (q − 1/q)/(1 − 1/q) ≤ t/s.   (*)

Observe that

(q − 1/q)/(1 − 1/q) = (q² − 1)/(q − 1) = (q + 1)(q − 1)/(q − 1) = q + 1.   (**)

Summarizing (*) and (**), it suffices to show q + 1 ≤ t/s. From our assumption t/2 ≤ (1/q)(t − s), we can derive

t/2 ≤ (1/q)(t − s)
⟺ qt ≤ 2t − 2s
⟺ q ≤ 2(t − s)/t
⟺ q ≤ 2(1 − s/t)
⟺ q ≤ 2 − 2s/t
⟺ q + 1 ≤ 3 − 2s/t.

Now, it suffices to show that 3 − 2s/t ≤ t/s. The following inequalities are equivalent:

3 − 2s/t ≤ t/s
3ts − 2s² ≤ t²
0 ≤ t² − 3ts + 2s²
0 ≤ t² − 2ts + s² − ts + s²
0 ≤ (t − s)² − s(t − s)
0 ≤ (t − s) − s
0 ≤ t − 2s.

Since the latter holds, we are done with the case t/2 ≤ (1/q)(t − s) and, thus, with Case 2. □

Theorem 24.7.2 Let t, l, s > 0 be three numbers such that t = l + s and s ≤ l. Let t̂ be an estimate for t with ||t̂/t||_Q ≤ q for some q ≥ 1. Define the estimate l̂ for l as l̂ = max(t̂/2, t̂ − s). Then ||l̂/l||_Q ≤ 2q.

Proof:
Case 1. l̂ = t̂/2. Define q* := ||l̂/l||_Q = ||(t̂/2)/(t − s)||_Q.
Case 1.1: t̂/2 ≥ t − s. Then q* = (t̂/2)/(t − s) ≤ (t̂/2)/(t/2) ≤ q.
Case 1.2: t − s ≥ t̂/2. Then q* = (t − s)/(t̂/2) ≤ t/(t̂/2) ≤ 2q.
Case 2. l̂ = t̂ − s with t̂ − s ≥ t̂/2. Define q* := ||l̂/l||_Q = ||(t̂ − s)/(t − s)||_Q.
Case 2.1: t̂ − s ≥ t − s. Then q* = (t̂ − s)/(t − s) ≤ (t̂ − t/2)/(t − t/2) = (t̂ − t/2)/(t/2) ≤ 2q − 1.
Case 2.2: t − s ≥ t̂ − s. Then q* = (t − s)/(t̂ − s) ≤ (t − s)/(t̂/2) ≤ t/(t̂/2) ≤ 2q. □

Note that if we make sure that we overestimate t, i.e., t̂ ≥ t, then only Case 2.1 applies and l̂ is quite precise.

Theorem 24.7.3 Let t, l, s > 0 be three numbers such that t = l + s and s ≤ l. Let t̂ be an estimate for t with ||t̂/t||_Q ≤ q_t for some q_t ≥ 1. Let ŝ be an estimate for s with ||ŝ/s||_Q ≤ q_s for some q_s ≥ 1. Define l̂ := max(t̂/2, t̂ − ŝ). Additionally, assume that (1/q_t)t − q_s s > 0. Then ||l̂/l||_Q ≤ max(2q_t, q_t² q_s). [ToDo]

Proof:
Case 1. Consider the case where t̂/2 > t̂ − ŝ and, thus, l̂ = t̂/2. The first condition implies ŝ > t̂/2. Also, by our preconditions, t/2 ≤ t − s. Define q* := ||l̂/l||_Q = ||(t̂/2)/(t − s)||_Q.
Case 1.1: Assume l̂ ≥ l. Then q* = (t̂/2)/(t − s) ≤ (q_t t/2)/(t/2) ≤ q_t.
Case 1.2: Assume l̂ < l. Then q* = (t − s)/(t̂/2) ≤ (t − s)/((1/q_t)t/2) = q_t (t − s)/(t/2) ≤ 2q_t.
Case 2. Consider the case where t̂/2 ≤ t̂ − ŝ and, thus, l̂ = t̂ − ŝ.
Define q* := ||l̂/l||_Q = ||(t̂ − ŝ)/(t − s)||_Q.

Case 2.1: Assume l̂ ≥ l. Then

q* = (t̂ − ŝ)/(t − s) ≤ q_t · t/(t − s) − (1/q_s) · s/(t − s) ≤ 2q_t − (1/q_s) · s/(t − s) ≤ 2q_t.

Also, we have

q* = (t̂ − ŝ)/(t − s) ≤ q_t (t − (1/q_t)ŝ)/(t − s) ≤ q_t (t − ŝ)/(t − s) ≤ q_t q_s.

This is not too bad, since q_t will be close to 1 in our applications. Also,

q* = (t̂ − ŝ)/(t − s) ≤ (t − ŝ + (q_t − 1)t)/(t − s) ≤ q_s + (q_t − 1)t/(t − s) ≤ q_s + 2(q_t − 1),

which, again, is not too bad, since q_t will be close to 1 in our applications.

Case 2.2: Assume l̂ < l. Then

q* = (t − s)/(t̂ − ŝ) ≤ (t − s)/((1/q_t)t − ŝ) = q_t (t − s)/(t − q_t ŝ) ≤ q_t² q_s,

where the last step holds if t − q_t q_s s ≥ t/2. If q_t q_s < 2, then

q* = (t − s)/(t̂ − ŝ) ≤ (t − s)/((1/q_t)t − ŝ) ≤ (t/2)/((1/q_t)t − q_s (t/2)) = t/((2/q_t)t − q_s t) = 1/((2/q_t) − q_s) = q_t/(2 − q_t q_s),

which is an unsatisfactory bound. □

24.7.2 Properties of Estimation Functions

Let R be a relation and A one of its attributes. We assume that Π^D_A(R) = {x_1,…,x_d}, where d := |Π^D_A(R)| and x_i ≤ x_j for all 1 ≤ i ≤ j ≤ d. We only treat range queries, since exact match queries are simpler than range queries and distinct value queries are similar.

An estimation function f̂⁺ is called monotonic on [l, u] if and only if for all l ≤ c_1 ≤ c′_1 ≤ c′_2 ≤ c_2 ≤ u

f̂⁺(c′_1, c′_2) ≤ f̂⁺(c_1, c_2)

holds. An estimation function f̂⁺ is called additive on [l, u] if and only if for all l = c_1 ≤ … ≤ c_k = u

f̂⁺(c_1, c_k) = Σ_{i=1}^{k−1} f̂⁺(c_i, c_{i+1})

holds. Note that every additive estimation function is monotonic.

Assume we have an additive linear estimation function f̂⁺(x, y) = αx + βy + γ. Then, we must have for all x, y, z with x ≤ y ≤ z:

αx + βz + γ = (αx + βy + γ) + (αy + βz + γ) ⟺ 0 = αy + βy + γ.

This can only be achieved if γ = 0 and α = −β. Thus, every linear and additive estimation function is of the form f̂⁺(x, y) = α(y − x).

We typically wish the bucket's estimation function to be precise for the whole bucket. Thus, we demand that ||f̂⁺(lb, ub)/f⁺(lb, ub)||_Q ≤ q for some error bound q. With f⁺ := f⁺(lb, ub), we have

(1/q)f⁺ ≤ f̂⁺(lb, ub) ≤ qf⁺
(1/q)f⁺ ≤ α(ub − lb) ≤ qf⁺
(1/q) · f⁺/(ub − lb) ≤ α ≤ q · f⁺/(ub − lb),

and the above holds if

||α/(f⁺/(ub − lb))||_Q = ||α(ub − lb)/f⁺||_Q ≤ q.

This clearly holds if we use the usual estimation function

f̂⁺_avg(x, y) = f⁺ · (y − x)/(ub − lb),

but we have the possibility to choose α within certain bounds.

24.7.3 θ,q-Acceptability

One problem occurs if the cardinality estimate for some query is f̂ ≥ 1 while the true cardinality is zero. This happens since we should never return an estimate of zero: an estimate of zero may lead to query simplifications which are wrong or to reorderings which are not appropriate. To solve this dilemma, there is only a single solution: during query optimization, we execute building blocks and even access paths until the first tuple has been delivered. From there on, we know for sure whether the result will be empty or not. If a tuple is delivered, we buffer it, since we want to avoid its recalculation at runtime. The overhead of this method should therefore be low. Now, assume that we are willing to buffer more tuples (say 1000). Then, if there are fewer than 1000 qualifying tuples, we know the exact answer after fetching them. If we have to halt the evaluation of the building block because the buffer is full, we know that there will be ≥ 1000 qualifying tuples. Let us denote by θ_buf the number of tuples we are willing to buffer.
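To make this buffering idea concrete, here is a minimal C++ sketch. The Iterator/Tuple interface is a hypothetical stand-in for the physical algebra's iterator concept; probe either exhausts the building block (then the buffered size is the exact cardinality) or stops after θ_buf tuples (then θ_buf is a lower bound).

#include <vector>
#include <optional>

struct Tuple { /* attribute values */ };

struct Iterator {                            // hypothetical iterator interface
  virtual std::optional<Tuple> next() = 0;   // nullopt = input exhausted
  virtual ~Iterator() = default;
};

struct ProbeResult {
  std::vector<Tuple> buffer;  // buffered tuples, reused at runtime
  bool exact;                 // true iff buffer.size() is the exact count
};

ProbeResult probe(Iterator& it, std::size_t theta_buf) {
  ProbeResult r{{}, true};
  while (r.buffer.size() < theta_buf) {
    auto t = it.next();
    if (!t) return r;         // exhausted: exact cardinality known
    r.buffer.push_back(*t);
  }
  r.exact = false;            // at least theta_buf qualifying tuples
  return r;
}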
Since we interleave query optimization and query execution, this can be considered a small step in the direction of adaptive query optimization [232]. However, before we can evaluate a building block or access path, we have to determine an optimal one, which in turn requires cardinality estimates!

Before we proceed, note that cardinality estimates may be imprecise as long as they do not badly influence the decisions of the query optimizer. That is, as long as the query optimizer produces the best plan, any estimate is acceptable. Take, for example, the decision whether to exploit an index or not. Assume an index is better than a scan if less than 10% of the tuples qualify (this is a typical value [591, 367]). If the relation has 10000 tuples, the threshold is at 1000 tuples. Now assume that for a given range query both the estimate and the true value do not exceed 500. Then, no matter what the estimate is, we should use the index. Note that the q-error can be as large as 500 (e.g., the estimate is 1 and the true value is 500), and still it does not have any bad influence on our decision. The important point is that the estimate has to be precise around 1000. For a given relation and one of its indices, we denote by θ_idx the number of tuples that, if exceeded, makes a table scan more efficient than the index scan.

Let us now combine these two things. Assume we want to have a maximal q-error of q. Define θ = min(θ_buf − 1, (1/q)θ_idx). Assume that f̂ is an estimate for the true cardinality f. Further assume that if f̂ or f exceeds θ, then ||f̂/f||_Q ≤ q. Now let us go through the optimizer. In a first step, we define our building blocks and access paths, which requires deciding on index usage. Clearly, the estimate will be precise above (1/q)θ_idx, which includes the critical part. After evaluating a building block or access path, we have precise cardinality estimates if fewer than θ_buf tuples are retrieved. Otherwise, our estimate will obey the given q-error. Thus, we are as precise as necessary under all circumstances.

These simple observations motivate us to introduce the notion of θ,q-acceptability. Let f ≥ 0 be a number and f̂ ≥ 0 be an estimate of f. Let q ≥ 1 and θ ≥ 1 be numbers. We say that f̂ is θ,q-acceptable if

1. f ≤ θ ∧ f̂ ≤ θ or
2. ||f̂/f||_Q ≤ q.

Let R be a relation and A be one of its attributes. Let Π^D_A(R) = {x_1,…,x_d} with x_i ≤ x_{i+1}. Denote by f_i the frequency of x_i and by f⁺(c_1, c_2) := Σ_{c_1 ≤ x_i < c_2} f_i the cumulated frequency of the range [c_1, c_2[. We call a bucket's estimation function f̂⁺ θ,q-acceptable if, for all ranges within the bucket, the estimate f̂⁺(x_i, x_j) is θ,q-acceptable for f⁺(x_i, x_j).

24.7.4 Testing θ,q-Acceptability

Testing whether a bucket's estimation function is θ,q-acceptable naively requires a number of tests quadratic in the number of distinct values within the bucket. The number of tests can be reduced as follows. For a given i, let i′ be the index such that neither f⁺(x_i, x_{i′}) nor f̂⁺(x_i, x_{i′}) exceeds θ, but f⁺(x_i, x_{i′+1}) > θ or f̂⁺(x_i, x_{i′+1}) > θ. This index i′ can be found by binary search. For a given L, assume that for all l with 1 ≤ l ≤ L

• ||f̂⁺(x_i, x_{i′+l})/f⁺(x_i, x_{i′+l})||_Q ≤ q and
• f⁺(x_i, x_{i′+L}) ≥ kθ and
• f̂⁺(x_i, x_{i′+L}) ≥ kθ.

That is, we stop after L tests. Then, we will show that the bucket is θ,(q + 1/k)-acceptable. Consider the range query [x_i, x_j[. If f̂⁺(x_i, x_j) ≤ kθ, then it is θ,q-acceptable for f⁺(x_i, x_j). Otherwise, we can find indices i_1,…,i_m such that

• x_i = x_{i_1} and
• x_j = x_{i_m}.

Also, we can achieve that (a)

• f⁺(x_{i_j}, x_{i_{j+1}}) ≥ kθ for all j < m − 1 and
• f⁺(x_{i_{m−1}}, x_{i_m}) < θ,

or (b)

• f̂⁺(x_{i_j}, x_{i_{j+1}}) ≥ kθ for all j < m − 1 and
• f̂⁺(x_{i_{m−1}}, x_{i_m}) < θ.

In the worst case, we have m = 3.

Case 1.
f + (xi , xj ) ≤ fˆ+ (xi , xj ) imples || fˆ+ (xi , xj ) ||Q = f + (xi , xj ) = ≤ fˆ+ (xi , xj ) f + (xi , xj ) fˆ+ (xi , xi ) + fˆ+ (xil ,xj f + (xi1 , xil−1 ) + f + (xil−1 , xil ) 1 l−1 qf + (xi1 , xil−1 ) + θ f + (xi1 , xil−1 ) + 1 qf + (xi1 , xil−1 ) + θ f + (xi1 , xil−1 ) θ ≤ q+ + f (xi1 , xil−1 ) θ ≤ q+ kθ 1 ≤ q+ k ≤ Case2. fˆ+ (xi , xj ) < f + (xi , xj ) implies || fˆ+ (xi , xj ) ||Q = f + (xi , xj ) f + (xi , xj ) fˆ+ (xi , xj ) = f + (xi1 , xil−1 ) + f + (xil , xj ) fˆ+ (xi , xi ) + fˆ+ (xi , xi ) ≤ f + (xi1 , xil−1 ) + θ fˆ+ (xi , xi ) + 1 1 1 l−1 l−1 l l−1 f + (x i1 , xil−1 ) + θ + ˆ f (xi1 , xil−1 ) θ ≤ q+ + ˆ f (xi , xi ) ≤ 1 l−1 1 ≤ q+ k Summarizing, we are able to trade in accuracy for performance when testing the θ, q-acceptability of some bucket. A Cheap Pretest for Dense Buckets If the domain of the attribute is discrete and every domain value within the bucket has a frequency larger than zero, the bucket is dense. This is always the case if dictionaries are used as in systems like Blink or Hana. In this case, θ,q-acceptability is implied by either of the following conditions: 1. The cumulated frequency of the bucket is less than or equal to θ or 2. maxi fi / mini fi ≤ q 2 . 493 24.7. MORE ON Q The first condition also holds for non-dense buckets. The last condition only holds if we use our flexibility concerning the α in our approximation function. + , we need to exchange it against If we use fˆavg qf ≥ max fi ∧ (1/q)f ≤ min fi , i i where f is the average frequency of the bucket. If this cheap pretest fails, we need to apply the quadratic test or the subtest. 24.7.5 From Buckets To Histograms As usual, let R be some relation and A be some of its attributes with ΠD A = {x1 , . . . , xd }, where d := |ΠD (R)| and x ≤ x for 1 ≤ i ≤ j ≤ d. i j A In general, θ,q-acceptability does not carry over from buckets to histograms. That is, even though all buckets maybe theta, q-acceptable, the histogram must not be. Consider a histogram in which each bucket has the true cumulated frequency θ and the estimate for each bucket is 1. Then, the estimate for a range query comprising n buckets is n and the true value is nθ. Clearly, the histogram is not θ,q-acceptable if q < θ. Theorem 24.7.4 Let H be a histogram. Consider two neighbored buckets B1 and B2 spanning the intervals [bi , bi+1 [ for i = 0, 1. Let k ≥ 2 be a number. q If both buckets B1 and B2 are θ,q-acceptable then the histogram is kθ,q + k−1 acceptable. Proof: Assume we have two buckets B1 = [b0 , b1 [ and B2 = [b1 , b2 ] and a range query asking for the cumulated frequency in [c1 , c2 [ with b0 ≤ c1 ≤ b1 ≤ c2 ≤ b2 . For each bucket Bi , we denote by fi+ (x, y) = X fi x≤xi θ∨ fˆ1 > θ)∧(f2 > θ∨ fˆ2 > θ). It follows that ||fˆ/f ||Q ≤ q. Case 3. We now assume that neither the condition of Case 1 nor the condition of Case 2 holds. Thus, ¬(f ≤ kθ ∧ fˆ ≤ kθ) ∧ ¬((f1 > θ ∨ fˆ1 > θ) ∧ (f2 > θ ∨ fˆ2 > θ)), which is equivalent to (f > kθ ∨ fˆ > kθ) ∧ ((f1 ≤ θ ∧ fˆ1 ≤ θ) ∨ (f2 ≤ θ ∧ fˆ2 ≤ θ)). We consider four subcases, where we denote by q ∗ the q-error of fˆ, i.e., fˆ fˆ1 + fˆ2 q ∗ := || ||Q = || ||Q . f f1 + f2 Case 3.1 Assume f > kθ, f1 ≤ θ, fˆ1 ≤ θ. From this, it follows that kθ < f = f1 + f2 ≤ θ + f2 and thus (k − 1)θ < f2 and, since k ≥ 2 || fˆ2 ||Q ≤ q. f2 Case 3.1.1 Assume fˆ1 + fˆ2 ≥ f1 + f2 . A simple calculation gives us q∗ = = fˆ1 + fˆ2 f1 + f2 fˆ1 f1 + f2 θ fˆ2 < + kθ f2 1 ≤ q+ k + fˆ2 f1 + f2 495 24.7. MORE ON Q Case 3.1.2 Assume f1 + f2 > fˆ1 + fˆ2 . 
A simple calculation gives us q∗ = f1 + f2 fˆ1 + fˆ2 ≤ θ + q fˆ2 fˆ2 ≤ q+ θ fˆ2 θ (1/q)f2 θ ≤ q+ (1/q)(k − 1)θ q ≤ q+ k−1 ≤ q+ Case 3.2 Assume f > kθ, f2 ≤ θ, fˆ2 ≤ θ. This implies kθ < f = f1 + f2 ≤ f1 + θ and thus (k − 1)θ < f1 and, since k ≥ 2 || fˆ1 ||Q ≤ q. f1 Case 3.2.1 Assume fˆ1 + fˆ2 ≥ f1 + f2 . Then, fˆ1 + fˆ2 > kθ. A simple calculation gives us q∗ = = fˆ1 + fˆ2 f1 + f2 fˆ1 f1 + f2 fˆ1 θ ≤ + f1 kθ 1 ≤ q+ k + fˆ2 f1 + f2 496 CHAPTER 24. CARDINALITY AND COST ESTIMATION ‘ Case 3.2.2 Assume f1 + f2 > fˆ1 + fˆ2 . A simple calculation gives us q∗ = f1 + f2 fˆ1 + fˆ2 ≤ q fˆ1 + θ fˆ2 ≤ q+ θ fˆ2 θ (1/q)f2 θ ≤ q+ (1/q)(k − 1)θ q ≤ q+ k−1 ≤ q+ Case 3.3 Assume fˆ > kθ, f1 ≤ θ, fˆ1 ≤ θ. From this, it follows that kθ < fˆ = fˆ1 + fˆ2 ≤ θ + fˆ2 and thus (k − 1)θ < fˆ2 and, since k ≥ 2 || fˆ2 ||Q ≤ q. f2 Case 3.3.1 Assume fˆ1 + fˆ2 ≥ f1 + f2 . A simple calculation gives us q∗ = ≤ ≤ ≤ fˆ1 + fˆ2 f1 + f2 fˆ1 + fˆ2 f1 + f2 f1 + f2 θ fˆ2 + f2 f2 θ +q (1/q)fˆ2 θ +q (1/q)(k − 1)θ q ≤ q+ k−1 ≤ 497 24.7. MORE ON Q Case 3.3.2 Assume f1 + f2 > fˆ1 + fˆ2 . A simple calculation gives us q∗ = f1 + f2 fˆ1 + fˆ2 f1 f2 + ˆ ˆ ˆ f1 + f2 f1 + fˆ2 θ f2 ≤ + kθ fˆ2 1 ≤ q+ k ≤ Case 3.4 Assume fˆ > kθ, f2 ≤ θ, fˆ2 ≤ θ. From this, it follows that kθ < fˆ = fˆ1 + fˆ2 ≤ fˆ1 + θ and thus (k − 1)θ < fˆ1 and, since k ≥ 2 || fˆ1 ||Q ≤ q. f1 Case 3.4.1 Assume fˆ1 + fˆ2 ≥ f1 + f2 . q∗ = ≤ ≤ ≤ ≤ ≤ fˆ1 + fˆ2 f1 + f2 fˆ1 + fˆ2 f1 + f2 f1 + f2 fˆ1 θ + f1 f1 θ q+ (1/q)fˆ1 θ q+ (1/q)(k − 1)θ q q+ k−1 Case 3.4.2 Assume f1 + f2 > fˆ1 + fˆ2 . A simple calculation gives us q∗ = f1 + f2 fˆ1 + fˆ2 f1 f2 + ˆ ˆ ˆ f1 + f2 f1 + fˆ2 f1 θ ≤ + fˆ1 kθ 1 ≤ q+ k ≤ 2 498 CHAPTER 24. CARDINALITY AND COST ESTIMATION Theorem 24.7.5 Let H be a histogram. Consider n ≥ 3 consecutive buckets Bi in H spanning the intervals [bi , bi+1 [ for i = 0, . . . , n. Let k ≥ 3 be a number. If every estimate for a range query spanning a whole bucket is q-acceptable and 2q every bucket Bi is θ,q-acceptable then the histogram is kθ,q + k−2 -acceptable. Proof: Assume a query interval [c1 , c2 [ spanning the n buckets of H. That is b0 ≤ c1 ≤ b1 and bn−1 ≤ c2 ≤ bn . We introduce the following abbreviations: f1 := f + (c1 , b1 ) f2 := f + (b1 , bn−1 ) f3 := f + (bn−1 , c2 ) f := f1 + f2 + f3 ˆ f1 := fˆ1+ (c1 , b1 ) fˆ2 := fˆ2+ (b1 , bn−1 ) fˆ3 := fˆ+ (bn−1 , c2 ) 3 fˆ := fˆ1+ + fˆ2+ + fˆ3+ By assumption, we have ||fˆ2+ /f2+ ||Q ≤ q. We distinguish several cases. Case 1. If f ≤ kθ and fˆ ≤ kθ, then the estimate is kθ, q-acceptable. Case 2. If (f1 > θ ∨ fˆ1 > θ) ∧ (f3 > θ ∨ fˆ3 > θ), the estimate is q-acceptable. Case 3. We now assume that neither the condition of Case 1 nor the condition of Case 2 holds. Thus ¬(f ≤ kθ ∧ fˆ ≤ kθ) ∧ ¬((f1 > θ ∨ fˆ1 > θ) ∧ (f3 > θ ∨ fˆ3 > θ)), which is equivalent to (f > kθ ∨ fˆ > kθ) ∧ ((f1 ≤ θ ∧ fˆ1 ≤ θ) ∨ (f3 ≤ θ ∧ fˆ3 ≤ θ)). We denote by q ∗ the q-error of fˆ, i.e., fˆ fˆ1 + fˆ2 + fˆ3 q ∗ := || ||Q = || ||Q . f f1 + f2 + f3 Case 3.1 Assume f1 ≤ θ and fˆ1 ≤ θ and f3 ≤ θ and fˆ3 ≤ θ. Case 3.1.1 Assume f > kθ. From f = f1 + f2 + f3 > kθ and f1 ≤ θ and f3 ≤ θ, we get f2 > (k − 2)θ and q fˆ2 > (k − 2)θ. 499 24.7. MORE ON Q If f ≤ fˆ we get q∗ = = = fˆ f fˆ1 + fˆ2 + fˆ3 f1 + f2 + f3 fˆ1 + fˆ3 f1 + f2 + f3 2θ ≤ +q kθ 2 ≤ q+ k + fˆ2 f1 + f2 + f3 If fˆ ≤ f we get q∗ = f fˆ = f1 + f2 + f3 fˆ1 + fˆ2 + fˆ3 ≤ 2θ + f2 fˆ2 ≤ q+ 2θ fˆ2 2θ (1/q)(k − 2)θ 2q ≤ q+ k−2 ≤ q+ Case 3.1.2 Assume fˆ > kθ. From fˆ = fˆ1 + fˆ2 + fˆ3 > kθ and fˆ1 ≤ θ and fˆ3 ≤ θ, we get fˆ2 > (k − 2)θ and If f ≤ fˆ we get qf2 > (k − 2)θ. 
q∗ = = fˆ f fˆ1 + fˆ2 + fˆ3 f1 + f2 + f3 2θ ≤ q+ f2 2θ ≤ q+ (1/q)(k − 2)θ 2q ≤ q+ k−2 500 CHAPTER 24. CARDINALITY AND COST ESTIMATION If fˆ ≤ f we get q∗ = = f fˆ f1 + f2 + f3 fˆ1 + fˆ2 + fˆ3 f2 + 2θ ˆ f1 + fˆ2 + fˆ3 2θ ≤ q+ kθ 2 ≤ q+ k ≤ Case 3.2 Assume f1 ≤ θ and fˆ1 ≤ θ and f3 > θ ∨ fˆ3 > θ. Case 3.2.1 Assume f > kθ. From f = f1 + f2 + f3 > kθ and f1 ≤ θ, we get f2 + f3 > (k − 1)θ and q(fˆ2 + fˆ3 ) > (k − 1)θ ˆ ˆ + f3 since || ff22 +f ||Q ≤ q. 3 ˆ If f ≤ f we get q∗ = = fˆ f fˆ1 + fˆ2 + fˆ3 f1 + f2 + f3 θ ≤ q+ kθ 1 ≤ q+ k If fˆ ≤ f we get q∗ = = f fˆ f1 + f2 + f3 fˆ1 + fˆ2 + fˆ3 θ ˆ f2 + fˆ3 θ ≤ q+ (1/q)(k − 1)θ q ≤ q+ k−1 ≤ q+ 501 24.7. MORE ON Q Case 3.2.2 Assume fˆ > kθ. From fˆ = fˆ1 + fˆ2 + fˆ3 > kθ and fˆ1 ≤ θ, we get fˆ2 + fˆ3 > (k − 1)θ and ˆ fˆ3 since || ff22 + +f3 ||Q ≤ q. q(f2 + f3 ) > (k − 1)θ. If f ≤ fˆ we get q∗ = = If fˆ ≤ f we get fˆ f fˆ1 + fˆ2 + fˆ3 f1 + f2 + f3 θ ≤ q+ f2 + f3 θ ≤ q+ (1/q)(k − 1)θ q ≤ q+ k−1 q∗ = = f fˆ f1 + f2 + f3 fˆ1 + fˆ2 + fˆ3 θ ˆ f1 + fˆ2 + fˆ3 θ ≤ q+ kθ 1 ≤ q+ k ≤ q+ Case 3.3 Assume f1 > θ ∨ fˆ1 > θ and f3 ≤ θ and fˆ3 ≤ θ. Case 3.3.1 Assume f > kθ. By Symmetry. Case 3.3.2 Assume fˆ > kθ. By Symmetry. 2 + ˆ In case the estimates for a whole bucket are precise, e.g., if we use favg , we can refine the bounds. Corollary 24.7.6 Let H be a histogram. Consider n ≥ 3 consecutive buckets Bi in H spanning the intervals [bi , bi+1 [ for i = 0, . . . , n. Let k ≥ 3 be a number. If every estimate for a range query spanning a whole bucket is 1-acceptable and every bucket Bi is θ,q-acceptable then the histogram is kθ,q ′ -acceptable, where 2 q ′ := k−2 q + 1. To see that the corollary holds, simply reconsider the above proof. Let us mention that for k ≥ 3, we never saw a q-error larger than q + 1/k. 502 CHAPTER 24. CARDINALITY AND COST ESTIMATION qcompressb(x, b) return (0 == x) ? 0 : ⌈logb (x)⌉ + 1⌉ qdecompressb(y, b) return (0 == y) ? 0 : by−1+0.5 qcompressbase(x, k) // x is the largest number to be compressed // k is the number of bits used to store a compressed value return x1/((1< 0, let x be some number in the interval [b2l , b2(l+1) ]. If we approximate x by b2l+1 then ||b2l+1 /x||Q ≤ b. Let xmax be the largest number to be compressed. If xmax ≤ b2(k+1) for some k is the maximal occurring number, we can approximate any x in [1, xmax ] with ⌈log2 (k)⌉ bits obeying a maximal q-error of b. We can extend q-compression to allow for the compression of 0 2 as in the code in Fig. 24.11. √ There, we use the base b instead of b as above. Thus, the error is at most b. Let us consider a concrete example. Let b = 1.1. 9 Assume we use 8 bits to store a number. Then, since 1.1254 ≈ 32.6 √ ∗ 10 , we can approximate even huge numbers with a small q-error of at most 1.1 = 1.0488. Other examples are given in Table 24.7. There exists a small disadvantage of q-compression with a general base. Though calculating the logarithm is quite cheap, since typically machine instructions to do so exist, calculating the power during decompression is quite expensive. On our machine, compression takes roughly 54 ns whereas decompression takes 158 ns. This is bad since in the context of cardinality estimation, decompression is used far more often than compression. Thus, we introduce an alternative called binary q-compression. Binary Q-Compression The idea of binary q-compression is simple. Let x be the number we want to compress. If we take the base b = 2 then ⌈log2 (x)⌉ = k, where k is the index of the highest bit set. 
This calculation can be done by √ a rather efficient machine instruction. This gives us a maximum q-error of 2. We can go below this, by remembering not only the highest bit set, but the k highest bits set. Additionally, we store the position of them (their shift) in s bits. The pseudocode is given in Fig. 24.12, where we extended the scheme to allow for the compression of zero. So far, this resembles a special floating point 503 24.7. MORE ON Q #Bits 4 4 4 5 5 5 6 6 6 7 7 8 Base 2.5 2.6 2.7 1.7 1.8 1.9 1.2 1.3 1.4 1.1 1.2 1.1 Largest compressable number 372529 645099 1094189 8193465 45517159 230466617 81140 11600797 1147990282 164239 9480625727 32639389743 q-Error 1.58 1.61 1.64 1.30 1.34 1.38 1.10 1.14 1.18 1.05 1.10 1.05 Table 24.7: Examples for q-compression representation with only positiv mantissa and exponent. p The q-middle of 2n and 2n−1 − 1 is 2n ∗ (2n+1 − 1). This is the estimate we should return for n. We do not want to compute the square root during decompression, since this is too expensive. A little calculation helps. √ p 2n ∗ (2n+1 − 1) ≈ 2n ∗ 2n+1 √ = 22n ∗ 2 √ = 2 ∗ 2n √ = 2n + ( 2 − 1) ∗ 2n √ The second part can be calculated by a constant ( 2 − 1) shifted by n to the left. The pseudocode in Fig. 24.12 gives the calculation of this√constant C in C. The best theoretical q-error achievable with storing k bits is 1 + 21−k . With our fast approximation, we get pretty close as the following table shows. The observed maximal q-error column was obtained experimentally. The deviation from the observed maximal q-error to the theoretical maximal q-error is due to the fact that only a small portion of the digits of C are used. Further, compression (2.7 ns) and decompression (2.8 ns) are fast. 504 CHAPTER 24. CARDINALITY AND COST ESTIMATION qcompress2(x, k, s) if 2s > x then bits = x shift = 0 else shift = index-of-highest-bit-set(x) - k + 1; bits = (x >> shift) return (bits << shift) | shift qdecompress2(y, k, s) shift = y & (2s − 1) bits = y >> shift x = bits << shift – assume C = (int) ((sqrt((double) 2.0) - 1.0) * 4 * (1 << 30)) x |= (C >> (32 - shift)) return x Figure 24.12: Binary Q-compression k 1 2 3 4 5 6 7 8 9 10 11 12 max q-error observed 1.5 1.25 1.13 1.07 1.036 1.018 1.0091 1.0045 1.0023 1.0011 1.00056 1.00027 √ max q-error theoretical ( 1 + 21−k ) 1.41 1.22 1.12 1.06 1.03 1.016 1.0078 1.0039 1.00195 1.00098 1.00048 1.00024 Incremental Updates It might come as a surprise that q-compressed numbers can be incrementally updated. Already in 1978, Morris observed this fact [628]. Later, Flajolet analyzed the probabilistic counting method thoroughly [284]. The main idea is rather simple. For binary q-compressed numbers, the incrementing procedure is defined as follows: RandomIncrement(int& c) // c: the counter 505 24.8. ONE DIMENSIONAL SYNOPSES τ1,1 : 100 – τ̂1,1 : 100 τ1,2 : 52 m1,2 : 33 τ̂1,2 : 52 τ1,4 : 30 m1,4 : 18 τ̂1,4 : 30 τ1,8 : 12 m1,8 : 6 τ̂1,8 : 12 τ2,8 : 18 – τ̂2,8 : 18 τ2,2 : 48 – τ̂2,2 : 48 τ2,4 : 22 – τ̂2,4 : 22 τ3,8 : 16 m3,8 : 11 τ̂3,8 : 16 τ4,8 : 6 – τ̂4,8 : 6 τ3,4 : 20 m3,4 : 13 τ̂3,4 : 20 τ5,8 : 6 m5,8 : 5 τ̂5,8 : 7 τ6,8 : 14 – τ̂6,8 : 13 τ4,4 : 28 – τ̂4,4 : 28 τ7,8 : 13 m7,8 : 7 τ̂7,8 : 13 Figure 24.13: FLT example 1 let δ be a binary random variable which takes value 1 with probability 2−c and value 0 with probability 1 − 2−c . c += δ To go to an arbitrary base, we have to modify the random variable δ such that it takes the value 1 with probability a−c and 0 with probability 1 − a−c . 
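A minimal C++ sketch of this probabilistic incrementing for an arbitrary base a > 1 follows; the function name and the choice of random number generator are ours, not the book's.

#include <random>
#include <cmath>

std::mt19937_64 rng{42};  // any decent pseudo-random source will do

// Increment the q-compressed counter c with probability a^(-c),
// i.e., delta = 1 with probability a^(-c) and delta = 0 otherwise.
void randomIncrement(int& c, double a) {
  std::uniform_real_distribution<double> dist(0.0, 1.0);
  if (dist(rng) < std::pow(a, -static_cast<double>(c)))
    ++c;
}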
24.8 One Dimensional Synopses 24.8.1 Four Level Tree and Variants The Original Four Level Tree Four level trees were introduced by Buccafurri, Pontieri, Rosaci, and Sacca [117]. Later, Buccafurri, Lax, Sacca, Pontieri, and Rosaci discussed three, five, and N-Level level trees [115, 116]. A concise description can also be found in [210]. The basic idea is to divide a bucket into eight subbuckets (called bucklets) of equal width. Consider the following sample bucket [116]: xi 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 fi 7 5 18 0 6 10 0 6 0 6 9 5 13 0 8 7 This bucket is divided into 8 bucklets of width 16/8 = 2. Every bucklet τi,8 summarizes the values in bucket i, 1 ≤ i ≤ 8. The next higher level of the four τ8,8 : 15 – τ̂8,8 : 15 506 CHAPTER 24. CARDINALITY AND COST ESTIMATION level tree contains four values τi,4 (1 ≤ i ≤ 4) summing the frequencies in the i-th quarter of the bucket. Thus, τi,4 = τ2i−1,8 + τ2i,8 for 1 ≤ i ≤ 4. The third level of the four level tree defines the values τi,2 for i = 1, 2 summing up the frequencies in each half of the bucket. The last level, τ1,1 contains the sum of all frequencies fi in the bucket. This scheme is illustrated in Fig. 24.13 and formally defined as τi,2k := τ2i−1,2k+1 + τ2i,2k+1 for k = 0, . . . , 3. The four level tree in Fig. 24.13 is compressed into 64 bits as follows. τ1,1 is stored in the first 32 bits. Next, the τj,2k for k > 0 are only stored if j is odd. For even j = 2i, τ2i,2k+1 can be calculated given τi,2k : τ2i,2k+1 := τi,2k − τ2i−1,2k+1 for k = 1, . . . , 3. Further, since 7 numbers have to be compressed into 32 bits, only an approximation thereof is stored. The number of bits bk used to store the approximation of some τ2i−1,2k+1 decreases from top to bottom: k bk 0 32 1 6 2 5 3 4 The intention is that if we make a mistake at a higher level, all lower levels are affected. Thus, we want to be precise at higher levels. Instead of storing τ2i−1,2k+1 directly, the ratio τ2i−1,2k+1 /τi,2k is approximated using bk bits: m2i−1,2k+1 := round( τ2i−1,2k+1 bk (2 − 1)). τi,2k (24.33) The 7 mi,j values are stored in the second 32 bits: m1,2 33 100001 m1,4 18 10010 m3,4 13 01101 m1,8 6 0110 m3,8 11 1011 m5,8 5 0101 m7,8 7 0111 The number of zeros and ones in the last line is 1 ∗ 6 + 2 ∗ 5 + 4 ∗ 4 = 32. From m2i−1,2k , we can restore an estimate for τ̂2i,22 k by calculating τ̂2i,22 k := round( m2i−1,2k ∗ τ̂i,2k ). 2bk − 1 (24.34) This recursion is possible, since we store τ1,1 explicitly. The τ̂ are also given in Fig. 24.13. Now, consider the example in Fig. 24.14. It shows the four level tree for a frequency density where the eight bucklets have the following cumulated frequencies: i fi+ 1 1.000.000 2 100.000 3 10.000 4 1000 5 100 6 10 7 1 8 10.000 507 24.8. ONE DIMENSIONAL SYNOPSES τ1,1 : 1121111 – τ̂1,1 : 1121111 τ1,2 : 1111000 m1,2 : 62 τ̂1,2 : 1103316 τ2,2 : 10111 – τ̂2,2 : 17795 τ1,4 : 1100000 m1,4 : 31 τ̂1,4 : 1103316 τ1,8 : 1000000 m1,8 : 14 τ̂1,8 : 1029762 τ2,8 : 100000 – τ̂2,8 : 73554 τ3,4 : 110 m3,4 : 0 τ̂3,4 : 0 τ2,4 : 11000 – τ̂2,4 : 0 τ3,8 : 10000 m3,8 : 14 τ̂3,8 : 0 τ4,8 : 1000 – τ̂4,8 : 0 τ5,8 : 100 m5,8 : 14 τ̂5,8 : 0 τ4,4 : 10001 – τ̂4,4 : 17795 τ6,8 : 10 – τ̂6,8 : 0 Figure 24.14: FLT example 2 As we can see, the error for last bucketlet 8,8 is quite large. The reason is that we substract an estimate of larger number from a smaller number, which is not a good idea (see Sec. 24.7.1). Although, the four level tree is an excellent idea, it has two major problems: 1. 
Whenever the fraction in Formula 24.33 is smaller than 1/2^{b_k+1}, rounding takes place towards zero.
2. The (approximated) left child's τ is always subtracted from the parent's τ to obtain the right child's τ. This results in uncontrollable errors if the right child's τ is smaller than the left child's τ (see Sec. 24.7.1).

Thus, we will modify the four level tree.

Variants of the Four Level Tree

Exploiting the techniques of (binary) q-compression, we can easily come up with several variants of the four level tree. All variants we discuss here use 7 indicator bits to remember whether the left or the right child node contains the smaller τ_{i,j}. The variant FLT2 stores τ_{1,1} in 11 bits using binary q-compression. For the other τ_{i,j}, the original compression scheme is used, but at level 2, 8 instead of 6 bits are used, at level 3, 7 instead of 5 bits, and at level 4, 6 instead of 4 bits. The variant qFLT also stores τ_{1,1} in 11 bits using binary q-compression. However, instead of deriving the other τ_{i,j} from estimates of their parents, it stores these values directly in q-compressed form. The number of bits used at each level is the same as in FLT2. The base is derived from the estimate τ̂_{1,1} of τ_{1,1}: at level i, the minimal base for the number ⌈τ̂_{1,1}/2^{i−1}⌉ is chosen.

24.8.2 Q-Histograms (Type I)

So far, every bucket contains the same kind of information. Loosening this restriction leads to heterogeneous histograms, which contain different kinds of buckets. This flexibility has a few disadvantages: higher CPU and memory consumption and, with every bucket type considered, increased histogram construction costs. Besides the bucket boundaries and the bucket contents, a bucket header has to be stored. Since this is typically only a single byte per bucket, the increased flexibility gained by using different bucket types by far outweighs the price of this byte, leading to more compact and precise histograms.

Simple Bucket Types

We now briefly summarize some possible bucket types.

standard bucket: A standard bucket contains the cumulated frequency and the number of distinct values.

standard bucket with boundary frequency: Microsoft SQL Server stores for every bucket the frequency of the lower bucket boundary [226, p. 208].

poly2dim, exppoly2dim: Assume (x_i, f_i) is the frequency density of our attribute A. For given bucket boundaries b_1, b_2, we can approximate the set RGE := {(x_i, x_j, f⁺(x_i, x_j)) | b_1 ≤ x_i ≤ x_j ≤ b_2} by a 2-dimensional polynomial. It is advantageous to use only low degrees (one or two) and to find the best approximation under l_q. Approximation by e^p for a polynomial p leads to another alternative.

poly1dim, exppoly1dim

Histogram Construction by Dynamic Programming

A Heuristic for Histogram Construction

24.8.3 Q-Histogram (Type II)

24.9 Sketches For Counting The Number Of Distinct Values

Histograms are used to overcome deficiencies of the uniform distribution assumption. So far, the only way we discussed to combine selectivities derived for several predicates on different attributes relies on the attribute value independence assumption. In this section, we discuss an approach to avoid the attribute value independence assumption. This can be done by providing precise estimates of the number of distinct values in a set of attributes (sometimes called a column group).
Although the number of distinct values for a set of attributes is of interest at several points in cardinality estimation (see our simple profile), we show how to provide selectivity estimates under the uniform distribution assumption but without relying on the attribute value independence assumption. The number of distinct values of a column group can be calculated by sorting or hashing, but this is often far too expensive. Thus, we provide a set of techniques based on sketches. Sketches are small summaries of the data, typically calculated online, i.e., with a single scan over the data. We concentrate on sketches for the count distinct case. A general introduction and an overview is contained in [210], and a recent evaluation of different sketches can be found in [406]. The problem we look at in this section is to approximately answer a query of the form

select count(distinct A1,…,An)
from R

Standard techniques like hash tables or red-black trees can be used to collect the distinct values contained in the attributes A_1,…,A_n of R. However, their space consumption is linear in the number of distinct values. Sketches require far less space.

Before we delve into the details of the algorithms, let us recall an observation made by Ilyas, Markl, Haas, Brown, and Aboulnaga [442]: sometimes assuming uniformity is not as bad as assuming attribute value independence. In this case, keeping the number of distinct values for a set of attributes helps to make estimates more precise. Consider their example of a car database repeated in Fig. 24.15. Assume that we want to provide an estimate for

select count(*)
from Car
where Make = Honda and Model = Accord

Denote by p_1 (p_2) the first (second) predicate in the query. The selectivities are s(p_1) = 1/7 and s(p_2) = 1/8. Assuming AVI yields ŝ(p_1 ∧ p_2) = s(p_1) · s(p_2) = 1/56. The true selectivity is 1/10. The number of distinct values in the attribute group (Make, Model) is calculated by the following query

select count(distinct Make, Model)
from Car

and results in #DV = 9. Assuming that all distinct values occur equally often (uniformity assumption) results in the selectivity estimate ŝ(p_1 ∧ p_2) = 1/9, which is much better.

Car:
ID  Make    Model
1   Honda   Accord
2   Honda   Civic
3   Toyota  Camry
4   Nissan  Sentra
5   Toyota  Corolla
6   BMW     323
7   Mazda   323
8   Saab    95i
9   Ford    F150
10  Mazda   323

Figure 24.15: Car database example

Throughout the rest of this section, we assume that we want to produce an estimate d̂ for the number of distinct values d of a multiset (bag) X = {x_1,…,x_d}.

24.9.1 Linear Counting

The first algorithm we look at is Linear Counting by Astrahan, Schkolnick, and Whang [40]. Later, Whang, Vander-Zanden, and Taylor analyzed the algorithm and proposed a slight improvement [924]. Further, they showed analytically and experimentally that linear counting still yields good results even if the fill factor goes up to 5–12. The algorithm is rather simple. Initialize a bitvector B of length b to contain only zeros. Then, for each element x of the given multiset X, set B[h(x)] to one, where h is a hash function. Finally, the number z of zeros in B is counted, and the resulting estimate for the number of distinct values in X is d̂ = −b ln(z/b) (or, equivalently, d̂ = b ln(b/z)). If the number of ones in the bitvector is rather small, this number itself can be used as the estimate. The pseudocode of LinearCounting is given in Fig. 24.16.
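For illustration, the following C++ sketch mirrors the LinearCounting pseudocode of Fig. 24.16; the template and function names are ours, and the hash function is supplied by the caller.

#include <vector>
#include <cmath>
#include <functional>

template <typename T>
double linearCounting(const std::vector<T>& X, std::size_t b,
                      const std::function<std::size_t(const T&)>& h) {
  std::vector<bool> B(b, false);
  for (const T& x : X) B[h(x) % b] = true;      // mark hash positions

  std::size_t z = 0;                            // number of zero bits
  for (bool bit : B) if (!bit) ++z;
  if (z == 0) z = 1;                            // guard against ln(inf)

  std::size_t ones = b - z;                     // number of one bits
  if (static_cast<double>(ones) < std::sqrt(static_cast<double>(b)))
    return static_cast<double>(ones);           // few ones: use them directly
  return static_cast<double>(b) * std::log(static_cast<double>(b) / z);
}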
LinearCounting(X, h, B)
// X: bag of elements
// h: hash function into [0, b − 1]
// B: bitvector of length b
initialize B with zeros
for all x ∈ X do B[h(x)] := 1
z := 0
for all i ∈ [0, b[ do
  if 0 = B[i] then z := z + 1
if 0 = z then z := 1
o := b − z // number of ones in the bitvector
if o < √b then return o else return b · ln(b/z)

Figure 24.16: Linear Counting

A problem occurs if the bitvector becomes full, i.e., all bits are set. In this case, Whang et al. propose to run the algorithm a second time with another hash function. Further, they show how to keep the probability of running into a full bitvector a second time low. However, it is better to integrate LinearCounting into other algorithms that are capable of counting large numbers of distinct values in far less than linear space.

24.9.2 DvByKMinVal

Assume the hash function h hashes the elements of our set X into the interval [0, 1[. Further, let H = {h_i | h_i = h(x_i), x_i ∈ X} and assume that 0 ≤ h_i ≤ h_{i+1} < 1. If the hash function spreads out the elements evenly, we expect an average distance of δ = 1/(d + 1) ≈ 1/d between two neighbored hash values. For some given k, consider h_k, i.e., the k-th smallest value in H. This value can easily be calculated by exploiting a heap that keeps the lowest k distinct values while scanning X. Clearly, we expect the value of h_k to be around kδ. Thus, δ = h_k/k. If we plug this into the former equation, we get h_k/k = 1/d̂ and hence d̂ = k/h_k.

DvByKMinVal(X, h)
// input: a bag X, a hash function h : X → [0, 1]
// output: an estimate d̂ for the number of distinct values in X
using, e.g., a heap, calculate the k-th minimal value h_k in {h(x) | x ∈ X}
d̂ := (k − 1)/h_k
return d̂

Figure 24.17: Algorithm DvByKMinVal

This very simple algorithm (see Fig. 24.17), which we call DvByKMinVal, was developed and analyzed by Bar-Yossef, Jayram, Kumar, Sivakumar, and Trevisan [52]. Later, Beyer, Haas, Reinwald, Sismanis, and Gemulla showed that the estimator k/h_k is biased and that an unbiased estimator is d̂ = (k − 1)/h_k [84]. As the hash function, they recommend using the golden-ratio multiplicative hashing method [505].

24.9.3 Logarithmic Counting

In a series of papers, Flajolet and Martin introduced three different probabilistic counting algorithms [286, 287, 288]. For these algorithms, the hash function must map the elements of X to bit patterns of a fixed length (say 32 or 64 bits). Let us start with the simplest one, called LogarithmicCounting. The idea behind this algorithm is the observation that the probability of the first bit in a bit pattern produced by the hash function being zero is 1/2. The probability that the first two bits are zero is 1/4, and so on. The algorithm calculates the smallest index R such that among the bitvectors h(x_i) the R-th bit is never set. The estimate then is roughly 2^R. However, this estimate is biased; a factor 1/ϕ corrects this. Fig. 24.18 shows the full algorithm including ϕ.

LogarithmicCounting(X, b)
// X: bag of elements
// h: hash function
// b: length of bitvector
// constant ϕ = 0.7735162909
// indices for bitvectors start with 0
let B be a bitvector of length b, with all bits set to zero
for each x ∈ X do B |= lowest-bit-set(h(x))
R := index-of-lowest-zero-bit in B
return (1/ϕ) · 2^R

Figure 24.18: Algorithm LogarithmicCounting

LogarithmicCounting produces rather rough estimates.
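The following C++ sketch illustrates this single-bitvector estimator for 32-bit hash values; the helper names are ours, and a production implementation would find the lowest set bit with a machine instruction rather than a loop.

#include <cstdint>
#include <vector>
#include <cmath>

// Position of the lowest set bit (0-based); 32 for input 0.
static int lowestBitPos(uint32_t v) {
  if (v == 0) return 32;
  int r = 0;
  while ((v & 1u) == 0) { v >>= 1; ++r; }
  return r;
}

template <typename Hash>  // Hash: maps an element to uint32_t
double logarithmicCounting(const std::vector<uint64_t>& X, Hash h) {
  const double phi = 0.7735162909;   // unbiasing constant from the text
  uint32_t B = 0;                    // records which lowest-bit ranks occurred
  for (uint64_t x : X) {
    int r = lowestBitPos(h(x));
    if (r < 32) B |= (1u << r);
  }
  int R = 0;                         // index of lowest zero bit in B
  while (R < 32 && (B & (1u << R))) ++R;
  return (1.0 / phi) * std::pow(2.0, R);
}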
This is remedied by a first alternative called Multiple Probabilistic Counting. The idea is to calculate m estimates with m independent hash functions and to average them. However, using m different hash functions is expensive, and it may prove difficult to find them [24]. As a variant, Flajolet and Martin suggest using several predetermined permutations and only one hash function [287]. However, both alternatives are still quite expensive. Hence, we do not detail this algorithm.

The third variant, Probabilistic Counting with Stochastic Averaging (PCSA), also averages several estimates, but it does so without applying multiple hash functions. The idea is to split the bitvector of the hashed value into two parts. The first k bits give an index into an array of bitvectors of size b − k. Then, for every input value x_i, only the bitvector determined by the first k bits of h(x_i) is manipulated by remembering its lowest set bit. Instead of one R, we now have several R_j for 1 ≤ j ≤ 2^k. These are averaged and the resulting estimate is produced. Fig. 24.19 shows the full pseudocode, where we also integrated the unbiasing presented in [287]. The standard deviation of PCSA is 0.78/√m.

PCSA(X, h, b, k)
// X: bag of elements
// h: hash function
// b: length of bitvector produced by h
// k: length of prefix used to index the array
// constant ϕ = 0.7735162909
// constant ψ = 1 + 0.31/m
// indices for bitvectors start with zero
m := 2^k
let B be an array of size m containing bitvectors of size b − k
for each x ∈ X do
  // i, r = split h(x) into k and b − k bits:
  i := h(x) & ((1 << k) − 1)
  r := h(x) >> k
  B[i] |= lowest-bit-set(r)
for each B[j], 1 ≤ j ≤ m, do
  R_j := index-of-lowest-zero-bit(B[j])
S := Σ_j R_j
return (m/(ϕ · ψ)) · 2^{S/m}

Figure 24.19: Algorithm PCSA

24.9.4 SuperLogLog Counting

Durand and Flajolet introduced two more space-efficient probabilistic counting algorithms called LogLogCounting and SuperLogLogCounting [250]. The core idea of both algorithms is to remember the maximum over all indices i such that i corresponds to the lowest set bit of some hashed x ∈ X. Note that this requires far less space: less than a byte suffices. As before, not only one such maximum is retained but m = 2^k of them. Fig. 24.20 shows how to fill an array M of m such maxima. The filled array M is the basis for LogLogCounting and SuperLogLogCounting; the algorithms differ only in how they produce their estimate d̂ from M.

In order to produce the LogLogCounting estimate, the maxima in M are averaged. Raising 2 to the power of this average and multiplying it by m yields the estimate. Again, it is biased. To unbias it, a constant α_m, for a given m, is used. Summarizing, the LogLogCounting estimate is calculated as follows:

LogLogCounting(M)
α_m := (Γ(−1/m) · (1 − 2^{1/m})/ln 2)^{−m} // Γ is the gamma function
d̂_loglog := α_m · m · 2^{(1/m) Σ_j M[j]}
return d̂_loglog

The standard deviation of LogLogCounting is 1.3/√m.

There are three major improvements in SuperLogLogCounting. First, for small numbers LogLogCounting yields bad estimates; thus, SuperLogLogCounting includes LinearCounting for this case. Second, instead of averaging all maxima in M, only a fraction of size ⌊0.7m⌋ is averaged, where the highest 30% are left out; that is, we average only the ⌊0.7m⌋ smallest maxima in M. Third, some correction for hash collisions is performed. The integration of LinearCounting uses M as its 'bitmap'. First, the number of zeros z is determined by counting the entries in M which are zero.
The estimate is then produced as in LinearCounting (see Fig. 24.16). As the details of SuperLogLogCounting cannot be found in [250], we refer the reader to Durand's thesis [249]. We review SuperLogLogCounting here to allow the reader to implement it.

First, as already mentioned, the average of the ⌊0.7m⌋ smallest maxima in M is calculated. Let us call this value a_partial. This eliminates bad accidental outliers. Then, three estimates are calculated. The first is d̂_linc, produced by LinearCounting. The second is d̂_loglog, produced by LogLogCounting; this estimate is only used to produce the next estimate via some function α̃. The third is d̂_supll. The estimate produced by SuperLogLog is then calculated as shown in Fig. 24.21.

FillM(X, h, M)
// X: bag of elements
// h: hash function
// M: array of integers of size m = 2^k
// k: length of prefix used to index the array M of maxima
// indices for bitvectors start with one
initialize M with 0
for each x ∈ X do
  y := h(x)
  if 0 = y
  then M[0] := max(M[0], 33 − k) // if the length of a hash value is 32 bits
  else
    i := y & ((1 << k) − 1)
    j := idx-lowest-bit-set(y >> k)
    M[i] := max(M[i], j)

Figure 24.20: Filling M for LogLogCounting, SuperLogLogCounting, and HyperLogLogCounting

SuperLogLog(M)
d̂_linc := LinearCounting(M)
d̂_loglog := LogLogCounting(M)
d̂_supll := α̃(d̂_loglog) · m · 2^{a_partial}
L := 10m/5
case
  when d̂_supll < L ∧ d̂_linc < L do N := d̂_linc
  when d̂_supll > L ∧ d̂_linc > L do N := d̂_supll
  else N := (d̂_linc + d̂_supll)/2
esac
H := 2^32 // if 32 bits is the length of a hash value
return −H ln(1 − N/H) // correction of hash collisions

Figure 24.21: SuperLogLog Counting

The only missing piece is the calculation of the unbiasing function α̃, given in Fig. 24.22. As one can see, a polynomial of degree 4 is evaluated if k exceeds 3; the coefficients differ for different k and are also given in the figure. The standard deviation of SuperLogLogCounting is 1.05/√m. For hashing strings, Flajolet and co-workers suggest the hash function proposed by Lum, Yuen, and Dodd [572].

α̃(x)
// x is the estimate produced by LogLogCounting
// remember: k is the number of bits used for indexing M
κ := ⌊ln(x/m)/ln 2 + 1.48⌋ + 1 − ln(x/m)/ln 2
if k < 4
then r := 0.74
else r := c_4 κ⁴ + c_3 κ³ + c_2 κ² + c_1 κ + c_0
return r

Coefficients c_i:
k        c_4           c_3            c_2          c_1             c_0
4        0.003497      −0.03555       0.1999       −0.4812         1.139000
5        0.00324250    −0.0346687     0.19794194   −0.47555735320  1.140732
6        0.0031390489  −0.0343776755  0.197295     −0.4730536      1.141759
7        0.0030924632  −0.0342657653  0.197045     −0.4718622      1.142318
8        0.0030709     −0.034219      0.19694      −0.47129        1.142600
[9, 12]  0.0030517     −0.034180      0.19685      −0.47077        1.142870
> 12     0.0030504     −0.034177      0.19685      −0.47073        1.142880

Figure 24.22: Calculation of α̃

24.9.5 HyperLogLog Counting

The algorithm HyperLogLog Counting, developed by Flajolet, Fusy, Gandouet, and Meunier, uses the same procedure FillM but, instead of the geometric mean used by LogLogCounting and its successors, relies on the harmonic mean [285]. Its pseudocode is given in Fig. 24.23. Again, for few entries in M, the linear counting estimate is returned. For large ranges, a correction for hash collisions is performed. The unbiasing factor α_m depends on m and is calculated as follows: α_16 = 0.673, α_32 = 0.697, α_64 = 0.709, and α_m = 0.7213/(1 + 1.079/m) for m ≥ 128.

HyperLogLog(X, h, m)
// X: bag of elements
// h: hash function into {0, 1}^32
// m: number of entries in the array M, m = 2^l for some l
FillM(X, h, M)
E := α_m m² (Σ_{i=0}^{m−1} 2^{−M[i]})^{−1} // 'raw' estimate
if E < (5/2)m
then
  V := number of empty entries in M
  E* := (V = 0) ? E : m log(m/V)
else if E ≤ (1/30) · 2^32
then E* := E
else E* := −2^32 log(1 − E/2^32)
return E*

Figure 24.23: HyperLogLog Counting

24.9.6 DvByMinAvg

Whereas DvByKMinVal calculates the k-th smallest value, Lumbroso proposed to calculate m minima and average them [573].
This is done by splitting the values of the bag X into m partitions using the first l bits of the hash values. The remaining bits are then used to calculate the minima. The code of DvByMinAvg is shown in Fig. 24.24. The average of the minima contained in M is the basis of the estimate. As before, linear counting is used to estimate small numbers of distinct values. For the medium range, Lumbroso showed that the expected value of the estimate d̂ of the algorithm is (see Theorem 4 in [573])

E(d̂) ≈ d/(1 − e^{−λ}),

where d is the true number of distinct values in X and λ = d/m. In order to correct this bias, we set y = d̂/m and solve

y = λ/(1 − e^{−λ})

for λ. Let us denote this inverse function by f^{−1}. The best quadratic approximation under l_q is

f^{−1}(x) ≈ −0.0329046x² + 1.34703x − 0.932685,

with a maximal q-error of 1.0035.

DvByMinAvg(X, h, m)
// X: bag of elements
// h: hash function into [0, 1]
// m: number of entries in the array M, m = 2^l for some l
// calculate m minima
for all x ∈ X do
  a := h(x)
  i := ⌊am⌋
  M[i] := min(M[i], am − ⌊am⌋)
od
d̂ := m(m − 1)/(M[0] + … + M[m − 1])
V := number of empty entries in M
if V ≤ 0.86m
then E* := m log(m/V)
else if V < m
then E* := m · f^{−1}(d̂/m)
else E* := d̂
return E*

Figure 24.24: DvByMinAvg

24.9.7 DvByKMinAvg

Giroire proposed an algorithm we call DvByKMinAvg [331]. Although older than the approach by Lumbroso, DvByKMinAvg is most easily understood as a combination of DvByKMinVal and DvByMinAvg. As can be seen in Fig. 24.25, we maintain an array M of buckets. Each bucket holds the k minimal values assigned to it, where k is a parameter pragmatically chosen to be 3 [331]. This combines relatively low overhead with relatively high precision. After the array M has been filled with the minimal values, the actual estimate is calculated in two steps. First, the sum of the negative logarithms of the k-th minimal values is calculated; in the algorithm, we denote by M^k[i] the k-th smallest value in bucket i. Then, the actual estimate is calculated from this sum. The estimate given in the algorithm corresponds to the logarithm family of estimators; Giroire presented two more, namely the inverse family and the square root family [331].

DvByKMinAvg(X, h, m)
// X: bag of elements
// h: hash function into [0, 1]
// m: number of entries in the array M of buckets, m = 2^l for some l
// every bucket in M holds the k smallest values assigned to it
// calculate m times the k smallest values
for all x ∈ X do
  if (i − 1)/m ≤ h(x) ≤ i/m then actualize the k minima of bucket i with h(x)
od
s := Σ_{i=1}^m ln(M^k[i])
d̂ := m · (Γ(k − 1/m)/Γ(k))^{−m} · e^{−s/m}
return d̂

Figure 24.25: DvByKMinAvg

24.9.8 Pointers to the Literature

A general introduction and an overview is contained in [210]. A recent evaluation of different sketches can be found in [406]. Gelenbe and Gardy discuss a direct estimation approach to estimate the size of a projection [325] (see also the simple profile). In another paper, they do so in the presence of functional dependencies [324], an issue also investigated by Richard [730]. Oracle's approach to counting distinct values is described in [137]. Beyer, Haas, Reinwald, Sismanis, and Gemulla show how to augment DvByKMinVal sketches with counters such that sketches for different bags can be combined and estimates for unions, intersections, and differences of these bags can be derived [84].
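As an illustration of why DvByKMinVal sketches combine so nicely, the following C++ sketch (our own naming; the counter augmentation of [84] is omitted) merges two k-minimum-value synopses into a synopsis of the bag union and derives the unbiased estimate (k − 1)/h_k.

#include <vector>
#include <algorithm>
#include <iterator>

// Each synopsis holds the k smallest distinct hash values, sorted ascending.
std::vector<double> mergeKMV(const std::vector<double>& a,
                             const std::vector<double>& b,
                             std::size_t k) {
  std::vector<double> u;
  u.reserve(a.size() + b.size());
  std::merge(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(u));
  u.erase(std::unique(u.begin(), u.end()), u.end()); // drop duplicate hashes
  if (u.size() > k) u.resize(k);                     // keep k smallest only
  return u;
}

// Unbiased estimate (k - 1)/h_k; assumes the sketch is full, i.e., at
// least k distinct values were seen (otherwise the size itself is exact).
double kmvEstimate(const std::vector<double>& sketch) {
  return (sketch.size() - 1) / sketch.back();
}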
24.9.7 DvByKMinAvg

Giroire proposed an algorithm we call DvByKMinAvg [331]. Although older than the approach by Lumbroso, DvByKMinAvg is most easily understood as a combination of DvByKMin and DvByMinAvg. As can be seen in Fig. 24.25, we maintain an array M of buckets. Each bucket holds the k minimal values assigned to it, where k is a parameter pragmatically chosen to be 3 [331]. This combines relatively low overhead with relatively high precision. After the array M has been filled with the minimal values, the actual estimate is calculated in two steps. First, the sum of the negative logarithms of the k-th minimal values is calculated. In the algorithm, we denote by M^k[i] the k-th smallest value in bucket i. Then, the actual estimate is calculated from this sum. The estimate found in the algorithm corresponds to the logarithm family algorithm. Giroire presented two more estimators, namely the inverse family algorithm and the square root family algorithm [331].

DvByKMinAvg(X, h, m)
X: bag of elements
h: hash function to [0, 1]
m: number of entries in the array M of buckets, m = 2^l for some l
every bucket in M holds the k smallest values assigned to it
// calculate the k smallest values for each of the m buckets
for all x ∈ X do
  let i be the bucket with (i − 1)/m ≤ h(x) < i/m
  update the k minima of bucket i with h(x)
od
s := Σ_{i=1}^{m} − ln(M^k[i])
d̂ := m · (Γ(k − 1/m)/Γ(k))^{−m} · e^{s/m}
return d̂

Figure 24.25: DvByKMinAvg

24.9.8 Pointers to the Literature

A general introduction and an overview is contained in [210]. A recent evaluation of different sketches can be found in [406]. Gelenbe and Gardy discuss a direct estimation approach to estimate the size of a projection [325] (see also the simple profile). In another paper, they do so in the presence of functional dependencies [324], an issue also investigated by Richard [730]. Oracle's approach to counting distinct values is described in [137]. Beyer, Haas, Reinwald, Sismanis, and Gemulla show how to augment DvByKMinVal sketches with counters such that sketches for different bags can be combined and estimates for unions, intersections, and differences of these bags can be derived [84]. It is left as an exercise to the reader to show that any of the algorithms presented here can be used to efficiently estimate the number of distinct elements of disjoint unions of bags. This issue is important since large relations are often partitioned.

24.10 Multidimensional Synopsis

In the header line we cheated a little: we only discuss 2-dimensional synopses. This has the advantage that, on the one hand, the case is already sufficiently complex while, on the other hand, 2-dimensional figures are still drawable. Nonetheless, most of the approaches presented here can be lifted to more than two dimensions.

24.10.1 Introductory Example

To see why correlations happen, consider a very simple example with just one relation named Orders, which contains the orders a sample company processes. We are only interested in two attributes: orderdate (od) and shipdate (sd). Assume every day 10 orders arrive: 5 are shipped the same day, 4 are shipped the next day, and 1 is shipped the day after. Our database contains the orders for days 1 to 9. Thus, the cardinality of Orders is 90. The orders that are not yet shipped contain a NULL value in shipdate. Hence, there exist 6 tuples with null values in shipdate. Define the frequency matrix F as F(i, j) := |σ_{od=i ∧ sd=j}(Orders)|. Then, for our example we get the frequency matrix

sd\od  1  2  3  4  5  6  7  8  9
1      5  0  0  0  0  0  0  0  0
2      4  5  0  0  0  0  0  0  0
3      1  4  5  0  0  0  0  0  0
4      0  1  4  5  0  0  0  0  0
5      0  0  1  4  5  0  0  0  0
6      0  0  0  1  4  5  0  0  0
7      0  0  0  0  1  4  5  0  0
8      0  0  0  0  0  1  4  5  0
9      0  0  0  0  0  0  1  4  5

This frequency matrix is highly correlated. Let us look at the consequences.

Example 1: Assume we have a query σ_{od≤4 ∧ sd≤4}(Orders) and we wish to estimate the result cardinality using the independence assumption. Since the selectivity of od ≤ 4 is 40/84 and the selectivity of sd ≤ 4 is 34/84, the total selectivity under independence is 40/84 · 34/84 ≈ 0.19, which gives an estimate of 0.19 · 84 ≈ 16 for the result cardinality. The true result cardinality is 34.

Example 2: Assume we have a query σ_{od≤4 ∧ sd≥6}(Orders) and we wish to estimate the result cardinality using the independence assumption. Since the selectivity of od ≤ 4 is 40/84 and the selectivity of sd ≥ 6 is 40/84, the total selectivity is 40/84 · 40/84 ≈ 0.23, which gives an estimate of 0.23 · 84 ≈ 19 for the result cardinality. The true result cardinality is 1.

Two-dimensional synopses are meant to avoid these inaccuracies.
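The two examples are easy to verify mechanically. The following small C++ program builds the frequency matrix from the shipping pattern of the example and compares the independence-based estimates with the true cardinalities; the matrix layout and variable names are of course just one way to write this down.

#include <cstdio>

int main() {
  int F[9][9] = {};                        // F[sd-1][od-1]
  for (int od = 1; od <= 9; ++od) {        // 5 same day, 4 next day, 1 after
    F[od - 1][od - 1] = 5;
    if (od + 1 <= 9) F[od][od - 1] = 4;
    if (od + 2 <= 9) F[od + 1][od - 1] = 1;
  }
  double n = 84;                           // tuples with non-null shipdate

  // Example 1: od <= 4 and sd <= 4
  int truth1 = 0;
  for (int sd = 1; sd <= 4; ++sd)
    for (int od = 1; od <= 4; ++od) truth1 += F[sd - 1][od - 1];
  std::printf("independence: %.0f, truth: %d\n", (40 / n) * (34 / n) * n, truth1);

  // Example 2: od <= 4 and sd >= 6
  int truth2 = 0;
  for (int sd = 6; sd <= 9; ++sd)
    for (int od = 1; od <= 4; ++od) truth2 += F[sd - 1][od - 1];
  std::printf("independence: %.0f, truth: %d\n", (40 / n) * (40 / n) * n, truth2);
  return 0;  // prints 16 vs. 34 and 19 vs. 1
}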
24.10.2 Solving the Introductory Problem without 2-Dimensional Synopses

For the example above, a special solution exploiting one-dimensional histograms exists. Instead of building a two-dimensional histogram on the attributes od and sd, we build a one-dimensional histogram on the difference (sd − od). The exact histogram for the introductory example looks as follows:

(sd − od)   0   1   2
frequency  45  32   7

To see why this is a useful statistic for estimating the result sizes of our example queries, consider the general case of a conjunction of two range predicates

(c1 ≤ A ≤ c2) ∧ (d1 ≤ B ≤ d2)    (24.35)

on attributes A and B. This predicate implies

A − B ≤ c2 − d1
B − A ≤ d2 − c1

which is equivalent to

A − B ≥ c1 − d2
A − B ≤ c2 − d1

and thus

(c1 − d2) ≤ (A − B) ≤ (c2 − d1).    (24.36)

Using the one-dimensional histogram, we can derive an estimate for the selectivity of Eq. 24.36. Call this selectivity s(Eq. 24.36). Additionally, denote by s(c1 ≤ A ≤ c2) and s(d1 ≤ B ≤ d2) the selectivities of the two range predicates. Under the independence assumption, we would calculate the selectivity of Eq. 24.35 as s(Eq. 24.35) = s(c1 ≤ A ≤ c2) · s(d1 ≤ B ≤ d2), which results in the problems illustrated by the introductory example. Instead, let us take the minimum of the two selectivities and multiply it with the selectivity of the predicate in Eq. 24.36. Thus, the estimate for the conjunct in Eq. 24.35 becomes

s(Eq. 24.35) = min(s(c1 ≤ A ≤ c2), s(d1 ≤ B ≤ d2)) · s(Eq. 24.36).

Let us see how this works for our example queries. In order to determine s(od ≤ 4 ∧ sd ≤ 4), we have to determine the selectivities of the single predicates, which are s(od ≤ 4) = 40/84 and s(sd ≤ 4) = 34/84. Instantiating Eq. 24.36 with c1 = d1 = 0 and c2 = d2 = 4 gives us −4 ≤ (sd − od) ≤ 4. Thus, all tuples qualify and the selectivity of this predicate is 1. Hence, we derive the cardinality estimate min(40/84, 34/84) · 1 · 84 = 34, which is accidentally perfect. Now consider the predicate (od ≤ 4 ∧ sd ≥ 6). The selectivity of (sd ≥ 6) is 40/84. The selectivity of 2 ≤ (sd − od) ≤ 4 is 7/84. Thus, the cardinality estimate is min(40/84, 40/84) · 7/84 · 84 = 3, which is much closer to the truth than the estimate produced under independence.
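Here is a minimal C++ sketch of this estimator, hard-wired to the three-bucket (sd − od) histogram of the running example; the function signature and the bucket encoding are illustrative choices.

#include <algorithm>
#include <cstdio>

// Combine the two one-sided range selectivities with the selectivity of
// the implied difference range (Eq. 24.36) using the minimum rule.
double estimate(double s_A, double s_B, int diff_lo, int diff_hi) {
  const int freq[3] = {45, 32, 7};   // frequencies of (sd - od) in {0, 1, 2}
  const double card = 84;            // tuples with non-null shipdate
  double s_diff = 0;
  for (int d = 0; d < 3; ++d)
    if (diff_lo <= d && d <= diff_hi) s_diff += freq[d] / card;
  return std::min(s_A, s_B) * s_diff * card;
}

int main() {
  // od <= 4 and sd <= 4: difference range [-4, 4], all tuples qualify
  std::printf("%.0f\n", estimate(40.0 / 84, 34.0 / 84, -4, 4));  // 34
  // od <= 4 and sd >= 6: difference range [2, 4]
  std::printf("%.0f\n", estimate(40.0 / 84, 40.0 / 84, 2, 4));   // 3
  return 0;
}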
24.10.3 Statistical Views

The above histogram is easily created, since both attributes come from the same relation. In reality, things can be a little more complex. Consider for example the following query against the TPC-H schema:

SELECT count(*)
FROM Lineitem l, Orders o
WHERE o.orderdate >= 1995.03.01
AND l.shipdate <= 1995.03.07
AND l.orderno = o.orderno

Here, the two date attributes come from different relations. The solution to this problem is rather simple: define a statistical view. Although the exact syntax may differ, it is simply a view definition as in

CREATE STATISTICAL VIEW statview_lo_date AS
SELECT l.shipdate - o.orderdate
FROM Lineitem l, Orders o
WHERE l.orderno = o.orderno

together with some specification of what kind of synopsis should be created on the projected attributes. If more than a single attribute is projected, any of the following multi-dimensional synopses can be used. One major advantage of this approach is that it covers the correlations introduced by the join predicate.

24.10.4 Regular Partitioning: equi-width [603]

sd\od   [1, 3]  [4, 6]  [7, 9]
[1, 3]  24      0       0
[4, 6]  6       24      0
[7, 9]  0       6       24

24.10.5 Equi-Depth Histogram [634]

od ∈ [1, 3]: 30   sd ∈ [1, 2]: 14   sd ∈ [3, 5]: 16
od ∈ [4, 6]: 30   sd ∈ [4, 5]: 14   sd ∈ [6, 8]: 16
od ∈ [7, 9]: 24   sd ∈ [7, 8]: 14   sd ∈ [9, 9]: 10

Figure 24.26: Example for Equi-Depth Tree

24.10.6 2-Dimensional Synopsis based on SVD

24.10.7 PHASED

24.10.8 MHIST

24.10.9 GENHIST

24.10.10 HiRed

24.10.11 VI Histograms

24.10.12 Grid Trees

24.10.13 More

STHoles to organize query feedback.

24.11 Iterative Selectivity Combination

In this approach, independence is assumed and selectivities are simply multiplied. There is only one minor complication. Consider the query

select *
from R, S, T
where R.A = S.B and S.B = T.C

The query compiler uses transitivity to derive more predicates in order to increase the search space and to make it more independent of the actual query formulation chosen by the user (Sec. 11.2.2). Thus, the query is rewritten to

select *
from R, S, T
where R.A = S.B and S.B = T.C and R.A = T.C

and all of {R.A, S.B, T.C} are within the same equivalence class. All three equality predicates have an associated selectivity. However, after two of the predicates have been applied and their selectivities have been multiplied, the third predicate is implied by the other two and, accordingly, its selectivity must not be applied. This can easily be prevented by using a union-find data structure [209] associated with each plan class. It contains only those variables that are contained in equivalence classes of cardinality greater than two. Initially, each of these variables is in its own class. Then, whenever an equality predicate is about to be applied, we check whether the two variables on its left and right are already in the same equivalence class. If so, we ignore the predicate. Otherwise, we apply the predicate and union the two equivalence classes of the variables.

There remains only one open question. Assume the plan generator has generated the partial plan R ⋈_{R.A=S.B} S. Then, there are two predicates left to join T: S.B = T.C and R.A = T.C. For this case, where several predicates can be applied, Swami and Schiefer [864] showed that the following rule (called LS) is the correct way to do it: "Given a choice of join selectivities for a single equivalence class, always pick the largest join selectivity." Thus, to make things more efficient, we sort the equality predicates whose variables belong to equivalence classes with more than two elements by decreasing selectivity. Then, we can proceed as indicated above.
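A minimal C++ sketch of this mechanism follows. The union-find structure and the driver loop are generic; the variable ids and the assumption that the predicate list is already sorted by decreasing selectivity (rule LS) are illustrative.

#include <numeric>
#include <vector>

struct UnionFind {
  std::vector<int> parent;
  explicit UnionFind(int n) : parent(n) {
    std::iota(parent.begin(), parent.end(), 0);
  }
  int find(int x) { return parent[x] == x ? x : parent[x] = find(parent[x]); }
  bool unite(int x, int y) {          // false if already in the same class
    x = find(x); y = find(y);
    if (x == y) return false;
    parent[x] = y;
    return true;
  }
};

struct EqPredicate { int left, right; double selectivity; };

// Multiply only the selectivities of non-implied equality predicates.
// preds is assumed sorted by decreasing selectivity.
double combine(const std::vector<EqPredicate>& preds, int numVars) {
  UnionFind uf(numVars);
  double sel = 1.0;
  for (const auto& p : preds)
    if (uf.unite(p.left, p.right))    // skip predicates implied by transitivity
      sel *= p.selectivity;
  return sel;
}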
24.12 Maximum Entropy

[589, 590] (useless) theoretical discussions: [480, 721, 722]

24.13 Selected Issues

24.13.1 Exploiting and Augmenting Existing DBMS Data Structures

Index Structures

Assume we have an attribute A of some relation R and a B+- or B*-tree on A. The goal of this subsection is to show that a simple procedure can produce a cardinality estimate for a given range query A ∈ Iq, where Iq = [a, b] is some query interval. Let us start with some notation. We assign a level to every node in the B+-tree, increasing level numbers from the leaf nodes up to the root, starting with level 0 for the leaf nodes (see Fig. 24.27). By N, we denote an arbitrary node in the B+-tree. An arbitrary node at level l is denoted by N[l]. By N[l].I[j] we denote the j-th interval within which all tuples of the j-th subtree fall, and by N[l].S[j] we denote the corresponding child (subtree root). For an interval I = [a, b], we denote by len(I) := b − a its length.

Figure 24.27: Sample B+-Tree (levels 0 to 2, numbered from the leaves upwards)

Roughly, there are two alternatives. In the first alternative, we maintain as little extra information in or about the B-tree as possible. In the second alternative, we maintain counters N[l].C that remember the number of elements in each subtree. The resulting enhanced B+-tree is called a ranked tree [32]. Reading out these counters for a given query and producing an estimate is rather simple, and we will not detail it here. However, maintaining these counters may not always be affordable in a transactional system. Thus, we concentrate on the first approach.

The leaf nodes store either tuples (e.g., for an index-only table) or tuple identifiers (TIDs). In either case, a certain number of tuples is stored in or referenced by every leaf node. For indices on non-key attributes and for values with high frequency, overflow pages may exist. We ignore this fact by simply assuming that we are given the minimum and the maximum number of (referenced) tuples in a leaf page (possibly including overflow pages). We denote these numbers by min[0] and max[0]. Similarly, we denote the minimum and maximum fanout at level l > 0 by min[l] and max[l]. For fixed-length keys, B+-trees guarantee that these two numbers are at most a factor of two apart, i.e., ||max[i]/min[i]||_Q ≤ 2, and that these numbers are the same at all internal nodes except the root node. Further, these numbers can be derived from the sizes of the nodes, keys, and page pointers. For B+-trees on attributes with domains of variable size (e.g., varchar), these numbers have to be maintained explicitly or estimates have to be produced. The same is true for the min[0] and max[0] values of the leaf nodes.

Let us first assume that the min[i] and max[i] values are given. This results in pseudo-ranked trees [32]. For the true number of tuples f[0] in a leaf node, min[0] ≤ f[0] ≤ max[0] holds. For an arbitrary node N[1] at level 1, the number of tuples f[1] in any of its subtrees satisfies min[0] · min[1] ≤ f[1] ≤ max[0] · max[1]. In general, for an arbitrary non-root node at level l, the number of tuples f[l] stored in its subtree satisfies

∏_{i=0}^{l} min[i] ≤ f[l] ≤ ∏_{i=0}^{l} max[i].

Denote by MIN[l] the first product and by MAX[l] the second. Then, the most accurate estimate we can return is q-middle(MIN[l], MAX[l]), with a maximal q-error of √(2^l) if ||max[i]/min[i]||_Q ≤ 2 holds at all levels including the leaf node level. Given a node N[l, k] at an arbitrary level l > 0 with J child nodes, we can estimate its contribution to a range query with query interval Iq as

Σ_{j=1}^{J} (len(Iq ∩ N[l, k].I[j]) / len(N[l, k].I[j])) · q-middle(MIN[l], MAX[l]).

This procedure can be applied to the root node only. However, it may be beneficial to descend into child nodes to obtain better estimates. This is especially true for those child nodes that are not fully contained in the query interval. Thus, two questions arise: (1) into which nodes to descend and (2) when to stop. Several traversal strategies have been defined (see [36]).

This is not too bad, but there are certain problems. As indicated above, variable-length keys and overflow pages due to high skew cause difficulties. One possibility to overcome the former problem is to explicitly maintain the minimal and maximal fanout for each level. If this is too expensive, we could maintain the number of nodes n[l] at every level l, use it to calculate the average fanout at level l as n[l − 1]/n[l], and use this number instead of the minimal and maximal fanout. However, we then lose all error bounds. Consider the latter problem. The simplest solution is to maintain the number of leaf nodes explicitly and to derive an average number avg[0] of tuples per leaf node, which is then used instead of min[0] and max[0]. Obviously, we lose precision, which can only be restored by maintaining explicit cardinality counters.
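The per-node estimate is easy to code. The sketch below estimates the contribution of a single node; the node layout is an assumption, and q-middle is taken as the geometric mean, which minimizes the maximal q-error for an interval of possible values.

#include <algorithm>
#include <cmath>
#include <vector>

struct Node {
  std::vector<double> lo, hi;   // child intervals N[l].I[j] = [lo[j], hi[j]]
};

double q_middle(double mn, double mx) { return std::sqrt(mn * mx); }

// MINl, MAXl: precomputed products of the min/max fanouts up to level l.
// [a, b] is the query interval I_q.
double contribution(const Node& n, double a, double b,
                    double MINl, double MAXl) {
  double est = 0;
  for (std::size_t j = 0; j < n.lo.size(); ++j) {
    double lo = std::max(a, n.lo[j]), hi = std::min(b, n.hi[j]);
    if (lo >= hi) continue;                         // no overlap with I_q
    double frac = (hi - lo) / (n.hi[j] - n.lo[j]);  // len(Iq ∩ I[j]) / len(I[j])
    est += frac * q_middle(MINl, MAXl);
  }
  return est;
}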
Dictionaries

Introduction Many main-memory database management systems designed for OLAP are column stores. Further, they often use ordered dictionaries to facilitate compression of columns. Three commercial systems following these lines are HANA [?], DB2 BLU [?], and SQL Server [?].

Let A be an attribute of some relation R. Assume that the active domain is D_A = {x1, …, xn} with xi < xj if i < j. A dictionary then comprises two mappings: (1) a mapping from i to xi and (2) a mapping from xi to i. We call the i dictionary indexes and the xi dictionary values. No matter whether the domain of A is discrete or continuous, the dictionary indexes are positive integers. In a column store, the column for A then contains the (compressed) dictionary indexes of the original values. If the dictionary is ordered, a range query

Q := σ_{lq ≤ A ≤ uq}(R)    (24.37)

with values lq and uq can be mapped to a range query on dictionary indexes. Depending on the use of ≤ vs. <, the lower and upper query bounds are mapped to lower (lidx) and upper (uidx) bounds on dictionary indexes as follows:

lq ≤ A  →  lidx := min({i | xi ≥ lq})
lq < A  →  lidx := min({i | xi > lq})
A ≤ uq  →  uidx := max({i | xi ≤ uq})
A < uq  →  uidx := max({i | xi < uq})

Any range query (open, half-open, or closed) is then mapped to the closed range query

Qidx := σ_{lidx ≤ A ≤ uidx}(R).    (24.38)

The mapping itself can be carried out rather efficiently by a binary search within the dictionary. Since Q and Qidx are equivalent, estimation problems can now be solved on Qidx. This task is simplified by the very structure of a dictionary.

Distinct Values Since the dictionary is typically dense, that is, it stores no values outside the active domain, the number of distinct values of A in Q can be calculated exactly:

|Π^D_A(σ_{lidx ≤ A ≤ uidx}(R))| = uidx − lidx + 1.    (24.39)

Cardinality Assume that every dictionary value xi occurs with frequency fi. Then, we have

|Π_A(σ_{lidx ≤ A ≤ uidx}(R))| = Σ_{i=lidx}^{uidx} fi.    (24.40)

This requires that the fi (4 bytes) are stored for every dictionary entry. At the expense of CPU time, we can use q-compression on the fi to diminish memory consumption to one byte per dictionary entry. Thereby, we can still be very precise, since, e.g., 1.1^255 ≈ 36 · 10^9.

If I := (uidx − lidx + 1) is small, the above summation yields acceptable performance. Assume that we are willing to add 2δ frequencies. (If δ = 50, this means we are willing to add 100 frequencies.) If I > 2δ, we have to rely on alternative mechanisms. We have several options. Among the most obvious are:

• build a tree-like structure with fanout δ and height ⌈log_δ(n)⌉, where n is the number of dictionary entries, or
• build some kind of histogram on the dictionary indexes, where, within every bucket, we have to be precise only for ranges comprising more than δ values (see Sec. 24.8.3).
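A minimal C++ sketch of the dictionary-based estimation follows: the index bounds are found by binary search, the distinct value count is exact by Eq. 24.39, and the cardinality of Eq. 24.40 is answered from precomputed prefix sums over the fi. The prefix-sum array is an assumption of this sketch, one concrete instance of the tree-like options above.

#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

struct Dictionary {
  std::vector<int64_t> values;   // x_1 < x_2 < ..., values[0] holds x_1
  std::vector<int64_t> prefix;   // prefix[i] = f_1 + ... + f_i, prefix[0] = 0

  // closed value bounds [lq, uq] -> 1-based index bounds [lidx, uidx]
  std::pair<int64_t, int64_t> toIndexRange(int64_t lq, int64_t uq) const {
    auto lo = std::lower_bound(values.begin(), values.end(), lq); // min{i | x_i >= lq}
    auto hi = std::upper_bound(values.begin(), values.end(), uq); // one past max{i | x_i <= uq}
    return { lo - values.begin() + 1, hi - values.begin() };
  }
  int64_t distinct(int64_t lidx, int64_t uidx) const {   // Eq. 24.39
    return uidx - lidx + 1;
  }
  int64_t cardinality(int64_t lidx, int64_t uidx) const { // Eq. 24.40
    return prefix[uidx] - prefix[lidx - 1];
  }
};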
24.13.2 Sampling

[210]

24.13.3 Query Feedback

24.13.4 Combining Data Summaries with Sampling

24.13.5 Wavelets

24.13.6 Selectivity of String-Valued Attributes

24.14 Cost Functions

24.14.1 Disk-based Joins

24.14.2 Main Memory Joins

24.14.3 Additional Pointers to the Literature

[963] bit-valued attributes, top-k queries.
[227] cardinality estimation at the calculus level.
[177] estimating block transfers and join sizes.
[178] parametric: Pearson Type 2 and 7 for symmetric, unimodal distributions.
[274] inverted files, multiple regression for Zipf distributions.
[275] also gives a detailed cost model for inverted file retrieval.
[479] models relations as arrays of bits; defines a similarity function between bitmaps, clusters homogeneous rectangles using a pyramidal scheme.
[583] overview article summarizing the eighties up to 1986.
[318] uses generating functions.
[54] overview article (New Jersey data reduction report), several techniques.
[447] overview article, several techniques.
[699] application of histograms to load balancing for parallel joins.
[698] M-dimensional histograms (MHIST).
[461] optimal histograms with quality guarantees: minimum error under given space, minimal space under given maximal error; the construction algorithm is O(n^2) if n is the number of distinct values contained in the histogram.
[313] cost model for parallel query optimization.
[513] application of SVD to time series data.
[508] rough piecewise approximation with linear functions.
[96] kernel estimators.
[448] applies histograms to approximate query answers.
[516] optimal histograms for hierarchical range queries (OLAP).
[245] splits and merges buckets to capture changes in variance (dynamic v-optimal (DVO) histograms).
[110, 111] STHoles: multidimensional histograms constructed from query feedback.
[460] constructs many histograms at once to meet global storage bounds; this allows giving more memory to histograms for more skewed (less easily approximatable) attributes.
[376] histograms for data streams.
[374] discovery and application of check constraints.
[332] exploiting soft constraints.
[377] fast algorithm for histogram construction for hierarchical range queries (OLAP).
[874] dynamic multidimensional histograms (for data streams).
[117] 4-byte encoding of a 4-level tree to allow refined estimates within a histogram bucket (very nice idea).
[509] automatic tuning of data synopses.
[115] uses N-level tree histograms (again bit encodings) to estimate range queries.
[442, 443, 441] automatic relationship discovery: correlations and soft functional dependencies.
[965] HASE: combines synopsis-based with sampling-based selectivity estimation.
[30] fast computation of approximate statistics.
[262] just-in-time statistics (todo).
[642] approximation of CDFs with splines.
Teorey, Das [870]: physical database design.
Spyratos [826]: operational approach, database updates, views.
Yu, Luk, Siu [963]: estimating the number of desired records with respect to a given query.
Piatetsky-Shapiro [688]: distribution steps.
[635]: DDSM.
Architecture of cardinality and cost estimation: parameter systems [730]; [582].

Part V Implementation

Chapter 25 Architecture of a Query Compiler

25.1 Compilation process

25.2 Architecture

Figure 25.1 shows the path of a query through the optimizer. For every step, a single component is responsible. Providing a facade for the components results in the overall architecture (Fig. 25.2). Every component is reentrant and stateless. The information necessary for a component to process a query is passed via references to control blocks. Control blocks are discussed next; then we discuss memory management. Subsequent sections describe the components in some detail.

25.3 Control Blocks

It is very convenient to have a hierarchy of control blocks within the optimizer. Figure 25.3 shows some of the control blocks. For simplification, those blocks concerned with session handling and transaction handling are omitted.
Every routine call within the optimizer has a control block pointer as a parameter. The routines belonging to a specific phase have a pointer to that phase's specific control block as a parameter. For example, the routines in NFST have an NFST_CB pointer as a parameter. We now discuss the purpose of the different control blocks. The global control block governs the behavior of the query compiler. It contains boolean variables indicating which phases to perform and which phases of the compilation process are to be traced. It also contains indicators for the individual phases. For example, for the first rewrite phase it contains indicators for which rules to apply, which rules to trace, and so on. These control indicators are manipulated by the driver, which also allows stepping through the different phases. This is very important for debugging purposes. Besides this overall control of the query compiler's behavior, the global control block also contains a pointer to the schema cache. The schema cache itself allows looking up type names, relations, extensions, indexes, and so on.

Figure 25.1: The compilation process (query → parsing → abstract syntax tree → NFST → internal representation → rewrite I → plan generation → rewrite II → code generation → execution plan)

The query control block contains all the information gathered for the current query so far. It contains the abstract syntax tree after its construction, the analyzed and translated query after NFST has been applied, the rewritten plan after the Rewrite I phase, and so on. It also contains a link to the memory manager that manages memory for this specific query. After the control block for a query is created, the memory manager is initialized. During the destructor call, the memory manager is destroyed and its memory released. Some components need helpers. These are also associated with the control blocks. We discuss them together with the components.

25.4 Memory Management

There are three approaches to memory management in query optimizers. The first approach is to use an automatic garbage collector if the language provides one. This is not necessarily the most efficient approach, but by far the most convenient one. This approach can be imitated by an implementation based on smart pointers; I would not recommend doing so, since the treatment of cycles can be quite tricky and it is inefficient. Another approach is to collect all references to newly created objects and release them after the query has been processed. This approach is easy to implement and very convenient (transparent to the implementor), but inefficient. A better approach is to allocate bigger areas of memory with a memory manager. Factories (in the design-pattern sense) then use these memory chunks to generate objects as necessary. After the query has been processed, the chunks are freed. Here, we consider only memory whose duration lasts for the processing of a single query. In general, there are more kinds of memory whose validity conforms to sessions and transactions.
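A minimal C++ sketch of such a chunk-based memory manager follows. Chunk size, alignment, and the factory helper are assumptions of this sketch; error handling is omitted, and note that objects allocated this way never have their destructors run, which is acceptable for plain optimizer nodes but is a deliberate design choice.

#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

class MemoryManager {
public:
  void* allocate(std::size_t n) {
    n = (n + 15) & ~std::size_t(15);             // align to 16 bytes
    if (free_ < n) {                             // start a new chunk
      std::size_t sz = n > kChunk ? n : kChunk;
      chunks_.push_back(static_cast<char*>(std::malloc(sz)));
      cur_ = chunks_.back();
      free_ = sz;
    }
    void* p = cur_;
    cur_ += n;
    free_ -= n;
    return p;
  }
  ~MemoryManager() {                             // free all per-query memory at once
    for (char* c : chunks_) std::free(c);
  }
private:
  static constexpr std::size_t kChunk = 1 << 20; // 1 MB chunks
  std::vector<char*> chunks_;
  char* cur_ = nullptr;
  std::size_t free_ = 0;
};

// A factory then creates optimizer objects inside the arena:
template <class T, class... Args>
T* create(MemoryManager& mm, Args&&... args) {
  return new (mm.allocate(sizeof(T))) T(static_cast<Args&&>(args)...);
}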
Figure 25.2: Class Architecture of the Query Compiler (the facade class QueryCompiler with its components Scanner, Parser, NFST, Rewrite_I, PlanGenerator, Rewrite_II, and CodeGenerator, each offering a run method on its control block)

Figure 25.3: Control Block Structure (Global_CB, Query_CB, NFST_CB, Rewrite_I_CB, PlanGenerator_CB, Rewrite_II_CB, and CodeGenerator_CB, together with the helpers SchemaCache, MemoryManager, BlockHandler, Factorizer, BitMapHandler, OpCodeMapper, OperatorFactory, and RegisterManager)

25.5 Tracing and Plan Visualization

25.6 Driver

25.7 Bibliography

Chapter 26 Internal Representations

26.1 Requirements

The query representation must provide easy access to information. The overall design goal is to provide methods and functions with semantic meaning, not only syntactic meaning. Relationships that must be accessible include the consumer/producer relationship (occurrence), precedence ordering information, and the equivalence of expressions (transitivity of equality). (See also expr.h for other functions and relationships that are needed.)

A two-level representation is useful: the second level materializes some relationships and functions that are used frequently and are complicated to compute. Another reason for materialization is to avoid too many nested loops. Example: the key check. Given a set of attributes and a set of keys, is the attribute set a superkey? We test for every key whether each of its elements occurs in the set of attributes; this alone requires three nested loops.

A modeling detail: one big struct with a fat case distinction, or a fine-grained class hierarchy? Split into subclasses only where the optimizer actually treats the cases differently.

The information captured by the representation comprises: (1) first-class information (information obvious in the original query plus (standard) semantic analysis); (2) second-class information (derived information); (3) historic information (gathered during query optimization itself), namely modified (original expression, modifier) and copied (original expression, copier); (4) information about the expression itself (e.g.: is it a function call, is it a select); (5) specific representations for specific purposes (optimization algorithms, code generation, semantic analysis), together with the relationships between these representations. The information captured under (1) serves the different parts of the optimizer with syntactic and semantic information.

Garbage collection alternatives: (1) manual; (2) automatic; (3) semi-automatic (collect references, free at the end of the query).

26.2 Algebraic Representations

The relational algebra is covered in [197].

26.2.1 Graph Representations

26.2.2 Query Graph

Also called object graph: [79, 962].

26.2.3 Operator Graph

Used in: [823], [957]. Enhanced to represent physical properties: [739]. With outerjoins: [736], [309]. Graph representation and equivalence to calculus: [661].

26.3 Query Graph Model (QGM)

26.4 Classification of Predicates

Predicates can be classified

• by arity (selection, join, nasty);
• by functor (=, <, …, between, or boolean function);
• by function for keys in an index: start/stop/range/exact/enum range (in-predicate);
• by selection value: simple (col = const) vs. complex (col = expr), cheap vs. expensive;
• by join value: suitability for hash join, sort-merge join, hash-based nested-loop join, etc.;
• correlation predicates.

26.5 Treatment of Distinct

26.6 Query Analysis and Materialization of Analysis Results

Questions: (1) What do we materialize? (2) What do we put into the first-level representation? Examples: properties, as a pointer to a property or, better, inlined properties; unique numbers, either stored in the expression itself or in a separate dictionary structure.
Query analysis (its purpose: determining the optimization algorithm) collects, for example, the number of input relations, the number of predicates, the numbers of existential and universal quantifiers, the numbers of conjunctions and disjunctions, the kind of join graph (star, chain, tree, cyclic), the number of strongly connected components (as an indication of cross products), and the number of false aggregates in the projection list (which imply that grouping is required). (Remark: a typical query optimizer should have at least two algorithms: an exhaustive one for small queries and a heuristic one for large queries.)

For blocks, an indicator is useful that states whether they should produce a null-tuple in case they do not produce any tuple. This is convenient for some rewrite rules. Another possibility is an if-statement in the algebra.

26.7 Query and Plan Properties

A number of properties can be attached to execution plans. These properties fall into three classes:

1. logical properties, for example:
(a) the relations contained
(b) the attributes contained
(c) the predicates applied

2. physical properties, for example:
(a) the ordering of the tuples
(b) whether the result is a stream or materialized
(c) materialization in main memory or on secondary storage
(d) access paths to the result
(e) the node where the result resides (in the distributed case)
(f) compression

3. quantitative properties, for example:
(a) the number of elements in the result
(b) the size of the result or of a result element
(c) evaluation costs, broken down into I/O, CPU, and communication costs

Costs have to be calculated and serve as the basis for plan assessment:

total costs /* overall resource consumption */:
  total costs += cpu instructions / instructions per second
  total costs += seek costs · overhead (waiting/CPU)
  total costs += I/O costs · I/O weight
cpu costs /* pure CPU costs */
I/O costs /* secondary storage accesses (waiting for the disk plus CPU for page accesses) */
communication costs:
  com-init /* initialization costs of a communication operation */
  com-exit /* exit costs of a communication operation */
  com-cptu /* costs per transfer unit (e.g., byte) of a communication operation */

A cost structure could be something that contains total/CPU/I/O costs. Additionally, the costs of rescanning would be interesting in case rescanning becomes necessary (buffer problems; an index scan whose page is evicted). Another interesting cost measure is the cost until the first tuple is computed. These are the system-dependent constants; it is best if they are measured.
Hardware parameters: the number of CPU instructions per second; the number of CPU instructions for a block access/transfer (read/write); the number of CPU instructions per transfer for init/send/exit and init/receive/exit; the milliseconds for seek/latency/transfer per n-KB block.

Runtime system costs: the number of CPU instructions for open/next/close of the scan operators under various settings, i.e., with/without predicate and with/without projection (corresponding to the AVM programs); the number of CPU instructions for open/next/close of every algebraic operator; and the number of CPU instructions for functions/operations/predicates/AVM instructions.

Statistics: the first/last physical page of a relation; the number of pages of a relation (to estimate scan costs); measured sequential scan costs (with no/plenty of interference).

Properties of a (partial) plan include:

• the set of quantifiers (quns)
• the set of attributes
• the set of predicates
• the ordering
• boolean properties
• the global set of pipelined quns
• the cost vector
• cardinalities, proven/estimated
• the desired buffer
• keys and functional dependencies
• the number of pages to be read by one fetch
• the set of objects the (possibly partial) plan depends on
• properties for parallel plans
• properties for SMP plans

The following is all rather vague, but it points out that something has to be done in this respect.

Index access: determine the degree of clustering.
• read rate = #pages read / #pages of the relation. A predicate lowers the read rate; a renewed read due to a …; if TIDs are sorted, the fetch ratio has to be recomputed.
• Pages can be grouped, e.g., on one cylinder, and fetched with one prefetch command; the number of seeks has to be estimated.
• cluster ratio: CR = P(read(t) without a page read) = (card − #pagefetches)/card = (card − (#pagefetches − #pages))/card (this is rather questionable).
• cluster factor: CF = P(avoiding an unnecessary page fetch) = #pagefetches/#maxpagefetches = (card − #fetches)/(card − #pagesinrel) (equally questionable).
• For index retrieval on the full key, both factors are set to 100%, since within an index the TIDs are sorted per key entry.

Storing properties under dynamic programming and memoization: costs and other properties that do not depend on the plan can be stored per plan class and need not be stored per plan.

26.8 Conversion to the Internal Representation

26.8.1 Preprocessing

26.8.2 Translation into the Internal Representation

26.9 Bibliography

Chapter 27 Details on the Phases of Query Compilation

27.1 Parsing

Lexical analysis is pretty much the same as for traditional compilers. However, it is convenient to treat keywords as soft. This allows, for example, for relation names like order, which is a keyword in SQL. This can be very convenient for users, since SQL has plenty (several hundred) of keywords. For some keywords like select there is less danger of them being relation names. A solution for group and order is to lex them as a single token together with the following by.

Parsing again is very similar to parsing in compiler construction. For both lexing and parsing, generators can be used to produce these components. The parser specification of SQL is quite lengthy, while the one for OQL is pretty compact. In both cases, a LALR(2) grammar suffices. The outcome of the parser should be an abstract syntax tree.
Again, the data structure for abstract syntax trees (ASTs) as well as the operations to deal with them (allocation, deletion, traversal) can be generated from an according AST specification. During parsing, some of the basic rewriting techniques can already be applied. For example, between can be eliminated. In BD II, there are currently four parsers (for SQL, OQL, NQL (a clean version of XQuery), and XQuery). The driver allows stepping through the query compiler and influencing its overall behavior. For example, several trace levels can be switched on and off from within the driver. Single rewrites can be enabled and disabled. Further, the driver allows switching to a different query language. This is quite convenient for debugging purposes. We used the Cocktail tools to generate the lexer, parser, AST, and NFST components.

27.2 Semantic Analysis, Normalization, Factorization, Constant Folding, and Translation

The NFST component performs (at least) four different tasks:

1. normalization of expressions,
2. factorization of common subexpressions,
3. semantic analysis, and
4. translation into the internal algebra-based query representation.

Although these are different tasks, a single pass over the abstract syntax tree suffices to perform all of them in one step. Consider the following example query:

select e.name, (d.salary / d.budget) * 100
from Employee e, Department d
where e.salary > 100000 and e.dno = d.dno

The internal representation of the expression (d.salary / d.budget) * 100 in the query is shown in Fig. 27.1. It contains two operator nodes for the operations "∗" and "/". At the bottom, we find IU nodes. IU stands for Information Unit. A single IU corresponds to a variable that can be bound to a value. Sample IUs are attributes of a relation or, as we will see, intermediate results. In the query representation, there are three IUs. The first two IUs are bound to the attribute values of the attributes salary and budget; the third IU is bound to the constant 100.

Figure 27.1: Expression (the operator tree ∗(/(IU:salary, IU:budget), 100))

NFST routines can be implemented using a typical compiler generator tool. The component is implemented in a rule-based language. Every rule matches a specific kind of AST node and performs an action. The AST is processed in post order. The hierarchy for organizing the different kinds of expressions is shown in Fig. 27.2.

Figure 27.2: Expression hierarchy (Expression with subclasses Constant, IU, DBItem (Relation, Extent), FunctionCall, Variable, Boolean (And, Or, Not), AttributeAccess, and Aggregate)

Here is a list of useful functions:

• occurrence of expressions in another expression
• for a given expression: compute the set of occurring (consumed, free) IUs
• for a given expression: compute the set of produced IUs
• for a given IU: retrieve the block producing the IU
• determine whether some block returns a single value only
• computation of the transitivity of predicates, especially equality, in order to derive equivalence classes
• determine whether some expression produces a subset of another expression
• constant folding
• merge and/or (e.g., from binary to n-ary) and push not operations
• replace a certain expression by another one
• deep and shallow copy

These functions can be implemented either as member functions of expressions or according to visitor/collector/mutator patterns. For more complex functions (consumer/producer), we recommend the latter.
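A minimal C++ sketch of such an expression hierarchy is shown below. The class and member names are illustrative only; a real optimizer adds types, properties, and the materialized relationships discussed in Chapter 26.

#include <string>
#include <vector>

struct IU;  // an information unit: a variable that can be bound to a value

struct Expression {
  virtual ~Expression() = default;
  // collect the IUs consumed by this expression (its free variables)
  virtual void consumedIUs(std::vector<const IU*>& out) const = 0;
};

struct IU final : Expression {
  std::string name;                      // e.g. an attribute or a temporary
  const Expression* producer = nullptr;  // the expression bound to this IU
  void consumedIUs(std::vector<const IU*>& out) const override {
    out.push_back(this);
  }
};

struct Constant final : Expression {
  std::string value;
  void consumedIUs(std::vector<const IU*>&) const override {}
};

struct FunctionCall final : Expression {
  std::string op;                        // e.g. "*", "/"
  std::vector<const Expression*> args;   // after normalization: IUs only
  void consumedIUs(std::vector<const IU*>& out) const override {
    for (const Expression* a : args) a->consumedIUs(out);
  }
};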
Some of these functions will be called quite frequently, e.g., the consumer/producer, precedence ordering, and equivalence (transitivity of equality) functions. So it might be convenient to compute these relationships only once and then materialize them. Since some transformations in the rewrite phases are quite complex, a recomputation of these materialized functions should remain possible, since their direct maintenance might be too complex.

27.3 Normalization

Fig. 27.3 shows the result after normalization. The idea of normalization is to introduce intermediate IUs such that all operators take only IUs as arguments. This representation is quite useful.

Figure 27.3: Expression after normalization (intermediate IUs introduced for the results of ∗ and /)

27.4 Factorization

Common subexpressions are factorized by replacing them with references to some IU. For the expressions in TPC-D query 1, the result is shown in Fig. 27.4. Factorization is enabled by a factorization component that keeps track of all expressions seen so far and the IUs representing these expressions. Every expression encountered by some NFST routine is passed to the factorization component. The result is a reference to an IU. This IU can be a new IU in case of a new expression, or an existing IU in case of a common subexpression. The factorization component is available to the NFST routines via the NFST control block, which is associated with a factorization component (Fig. 25.3).

Figure 27.4: Query 1 (the factorized aggregate expressions of TPC-D query 1, e.g., the common subexpressions over ExtendedPrice, Discount, and Tax feeding three SUM aggregates)
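A minimal sketch of such a factorization component follows. We assume, for simplicity, that expressions can be keyed by a canonical string; a real implementation would hash the normalized expression nodes directly.

#include <string>
#include <unordered_map>

class Factorizer {
public:
  // Returns the IU id representing the expression; common subexpressions
  // map to the same IU.
  int iuFor(const std::string& canonicalExpr) {
    auto it = table_.find(canonicalExpr);
    if (it != table_.end()) return it->second;  // seen before: reuse its IU
    int iu = nextIU_++;                         // new expression: new IU
    table_.emplace(canonicalExpr, iu);
    return iu;
  }
private:
  std::unordered_map<std::string, int> table_;
  int nextIU_ = 0;
};

// Example: in TPC-D query 1, "l_extendedprice * (1 - l_discount)" occurs in
// several aggregates; every occurrence yields the same IU id.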
27.5 Constant Folding

27.6 Semantic Analysis

The main purpose of semantic analysis is to attach a type to every expression. For simple expressions, it is very similar to traditional semantic analysis in compiler construction. The only difference occurs for references to schema constructs. The schema is persistent, and references to, e.g., relations or named objects have to be looked up there. For performance reasons, it is convenient to have a schema cache in order to cache frequently used references. Another aspect complicating semantic analysis a little is that collection types are frequently used in the database context. Their incorporation is rather straightforward, but the different collection types should be handled with care.

Like programming languages, query languages provide a block structure. Consider for example the SQL query

select a, b, c
from A, B
where d > e and f = g

Consider the semantic analysis of d. Since SQL provides implicit name lookup, we have to check the (previously analyzed) relations A and B for whether they provide an attribute called d. If none of them provides an attribute d, then we must check the next upper SFW-block. If at least one of the relations A or B provides an attribute d, we just check that only one of them does. Otherwise, there would be a disallowed ambiguity.

The blockwise lookup is handled by a block handler. For every newly encountered block (e.g., an SFW block), a new block is opened. All identifiers analyzed within that block are pushed into the list of identifiers for that block. In case the query language allows for implicit name resolution, it might also be convenient to push all the attributes of an analyzed relation into the block's list. The lookup is then performed blockwise. Within every block, we have to check for ambiguities. If the lookup fails, we proceed by looking up the identifier in the schema. The handling of blocks and lookups is performed by the BlockHandler component attached to the control block of the NFST component (Fig. 25.3).

Another departure from standard semantic analysis are the false aggregates provided by SQL. Consider

select avg(age)
from Students

I call avg(age) a false aggregate, since a true aggregate function operates on a collection of values and returns a single value. Here, the situation is different. The attribute age is of type integer. Hence, for the average function with signature avg : {int} → int, semantic analysis would detect a typing error. The consequence is that we have to treat these false aggregates as special cases. This is (mostly) not necessary for query languages like OQL.
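Returning to the blockwise lookup described above, here is a minimal C++ sketch of a block handler. The mapping of identifiers to IU ids, the exception for ambiguities, and the fallback signal for the schema lookup are assumptions of this sketch.

#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

class BlockHandler {
public:
  void openBlock() { blocks_.emplace_back(); }
  void closeBlock() { blocks_.pop_back(); }

  // e.g. all attributes of an analyzed relation are declared in the block
  void declare(const std::string& name, int iu) {
    blocks_.back()[name].push_back(iu);
  }

  std::optional<int> lookup(const std::string& name) const {
    for (auto b = blocks_.rbegin(); b != blocks_.rend(); ++b) {
      auto it = b->find(name);
      if (it == b->end()) continue;          // try the next outer block
      if (it->second.size() > 1)
        throw std::runtime_error("ambiguous identifier: " + name);
      return it->second.front();
    }
    return std::nullopt;                     // caller falls back to the schema
  }
private:
  std::vector<std::unordered_map<std::string, std::vector<int>>> blocks_;
};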
Then, for every resulting tuple, compute the bindings for all the IUs mentioned in the define clause, apply the selection predicate and return all the bindings for the IUs mentioned in the select clause. Although the SFWD-block looks neat, it lacks certain information that must be represented. This information concerns the role of the entries in the from clause and duplicate elimination. Let us start with the latter. There are three views relevant to duplicate processing: 1. the user view: did the user specify distinct? 2. the context view: does the occurrence or elimination of duplicates make 550CHAPTER 27. DETAILS ON THE PHASES OF QUERY COMPILATION a difference for the query result? 3. the processing view: does the block produce duplicates? All this information is attached to a block. This information can then be summarized to one of three values representing • eliminate duplicates • preserve duplicates • don’t care about duplicates (The optimizer can feel free to do whatever is more efficient.) This summary is also attached to every block. Let us illustrate this by a simple example: select distinct ssno from Employee where . . . and exists( select . . . from . . . where ) For the inner block, the user specifies that duplicates are to be preserved. However, duplicates or not does not modify the outcome of exists. Hence, the contextual information indicates that the outcome for the inner block is a don’t care. The processing view can determine whether the block produces duplicates. If for all the entries in the from clause, a key is projected in the select clause, then the query does not produce duplicates. Hence, no special care has to be taken to remove duplicates produced by the outer block if we assume that ssno is the key of Employee. No let us consider the annotations for the arguments in the from clause. The query select distinct e.name from Employee e, Department d where e.dno = d.dno retrieves only Employee attributes. Such a query is most efficiently evaluated by a semi-join. Hence, we can add a semi-join (SJ) annotation to the Department d clause. For queries without a distinct, the result may be wrong (e.g. in case an employee works in several departments) since a typical semi-join just checks for existence. A special semi-join that preserves duplicates should be used. The according annotation is (SJ,PD). Another annotation occurs whenever an outerjoin is used. Outer joins can (in SQL) be part of the from clause. Typically they have to be fully parenthesized since outer joins and regular joins not always commute. But under special circumstances, they commute and hence a list of entries in the from clause suffices [308]. Then, the entry to be preserved (the 27.7. TRANSLATION 551 outer part) should be annotated by (OJ). We use (AJ) as the anti-join annotation, and (DJ) for a d-join. To complete annotation, the case of a regular join can be annotated by (J). If the query language also supports all-quantifications, that translate to divisions, then the annotation (D) should be supported. Since the graphical representation of a query is quite complex, we also use text representations of the result of the NFST phase. 
Consider the following OQL query:

select distinct s.name, s.age, s.supervisor.name, s.supervisor.age
from s in Student
where s.gpa > 8 and s.supervisor.age < 30

The annotated result (without duplicate annotations) of the normalization and factorization steps is

select distinct sn, sa, ssn, ssa
from s in Student (J)
where sg > 8 and ssa < 30
define sn = s.name
       sg = s.gpa
       sa = s.age
       ss = s.supervisor
       ssn = ss.name
       ssa = ss.age

Semantic analysis just adds type information (which we never show). In standard relational query processing, multiple entries in the from clause are translated into a cross product. This is not always possible in object-oriented query processing. Consider the following query:

select distinct s
from s in Student, c in s.courses
where c.name = "Database"

which after normalization yields

select distinct s
from s in Student, c in s.courses
where cn = "Database"
define cn = c.name

The evaluation of c in s.courses depends on s and cannot be carried out if no s is given. Hence, a cross product would not make much sense. To deal with this situation, the d-join has been introduced [189]. It is a binary operator that evaluates, for every input tuple from its left input, its right input and flattens the result. Consider the algebraic expression given in Fig. 27.6.

Figure 27.6: An algebraic operator tree with a d-join:
PROJECT [s]
  SELECT [cn = "Database"]
    EXPAND [cn : c.name]
      D-JOIN [c : s.courses]
        SCAN [s : Student]
REWRITE I Expression Algebraic Operator AlgUnary AlgBinary Join Dup SortUnnestSelectChiProjection AlgIfDivision Elim Union AlgNary AlgScan Group SFWD block AlgSetop Intersection Difference Figure 27.7: Algebra 27.8 Rewrite I 27.9 Plan Generation 27.10 Rewrite II 27.11 Code generation In order to discuss the tasks of code generation, it is necessary to have a little understanding of the interface to the runtime system that interpretes the execution plan. I have chosen AODB as an example runtime system since this is one I know. The interface to AODB is defined by the AODB Virtual Machine (AVM). For simple operations, like arithmetic operations, comparisons and so on, AVM provides assembler-like operations that are interpreted at runtime. Simple AVM operations work on registers. A single register is able to hold the contents of exactly one IU. Additionally, AVM provides physical algebraic operators. These operators take AVM programs (possibly with algebraic operators) as arguments. There is one specialty about AVM programs though. In order to efficiently support factorization of common subexpressions involving arithmetic operations (as needed in aggregations like avg, sum), arithmetic operators in AVM can have two side effects. They are able to store the result of the operation into a register and they are able to add the result of the operation to the contents of another register. This is denoted by the result mode. If the result mode is A, they just add the result to some register, if it is C, they copy (store) the result to some register, if it is B, they do both. This is explored in the code for Query 1 of the TPC-D benchmark (Fig. 1.6). Code generation has the following tasks. First it must map the physical operators in a plan to the operators of the AVM code. This mapping is a straight forward 1:1 mapping. Then, the code for the subscripts of the operators has to be generated. Subscripts are for example the predicate expressions for the selection and join operators. For grouping, several AVM programs have to be 554CHAPTER 27. DETAILS ON THE PHASES OF QUERY COMPILATION generated. First program is the init program. It initializes the registers that will hold the results for the aggregate functions. For example, for an average operation, the register is initalized with 0. The advance program is executed once for every tuple to advance the aggregate computation. For example, for an average operations, the value of some register of the input tuple is added to the result register holding the average. The finalize program performs postprocessing for aggregate functions. For example for the average, it devides the sum by the number of tuples. For hash-based grouping, the last two programs (see Fig.1.6) compute the hash value of the input register set and compare the group-by attributes of the input registers with those of every group in the hash bucket. During the code generation for the subscripts factorization of common subexpression has to take place. Another task is register allocation and deallocation. This task is performed by the register manager. It uses subroutines to determine whether some registers are no longer needed. The register manager must also keep track in which register some IU is stored (if at all). Another component used during code generation is a factory that generates new AVM operations. This factory is associated with a table driven component that maps the operations used in the internal query representation to AVM opcodes. 
27.12 Bibliography Chapter 28 Hard-Wired Algorithms 28.1 Hard-wired Dynamic Programming 28.1.1 Introduction Plan generation is performed block-wise. The goal is to generate a plan for every block. Typically, not all possible plans are generated. For example, the group operator (if necessary for the query) is mostly performed last (see also Sec. ??). This mainly leaves ordering joins and selections as the task of plan generation. A plan is an operator tree whose node consist of physical algebraic operators, e.g. selection, sort-operator, sort-merge and other joins, relation and index scans. The process of plan generation has received a lot of attention. Often, the term query optimization is used synonymous for the plan generation phase. Figure 28.1 shows a plan for the block select from where e.name Employee e, Department d e.dno = d.dno and d.name = “shoe” The bottom level contains two table scans that scan the base tables Employee and Department. Then, a selection operator is applied to restrict the departments to those named “shoe”. A nested-loop join is used to select those employees that work in the selected departments. The projection restricts the output to the name of the employees, as required by the query block. For such a plan, a cost function is used to estimate its cost. The goal of plan generation is to generate the cheapest possible plan. Costing is briefly sketched in Section ??. The foundation of plan generation are algebraic equivalences. For e, e1 , e2 , . . . being algebraic expressions and p, q predicates, here are some example equiva555 556 CHAPTER 28. HARD-WIRED ALGORITHMS Project (e.name) NL−Join (e.dno = d.dno) Select (d.name = "shoe") Table Scan (Employee[e]) Table Scan (Department[d]) Figure 28.1: A sample execution plan lences: σp (σq (e)) ≡ σq (σp (e)) σp (e1 1q e2 ) ≡ (σp (e1 )) 1q e2 if p is applicable to e1 e1 1p e2 ≡ e2 1p e1 (e1 1p e2 ) 1q e3 ≡ e1 1p (e2 1q e3 ) e1 ∪ e2 ≡ e2 ∪ e1 (e1 ∪ e2 ) ∪ e3 ≡ e1 ∪ (e2 ∪ e3 ) e1 ∩ e2 ≡ e2 ∩ e1 (e1 ∩ e2 ) ∩ e3 ≡ e1 ∩ (e2 ∩ e3 ) σp (e1 ∩ e2 ) ≡ σp (e1 ) ∩ e2 For more equivalences and conditions that ought to be attached to the equivalences see the appendix ??. Note that commutativity and associativity of the join operator allow an arbitrary ordering. Since the join operator is the most expensive operation, ordering joins is the most prominent problem in plan generation. These equivalences are of course independent of the actual implementation of the algebraic operators. The total number of plans equivalent to the original query block is called the potential search space. However, not always is the total search space considered. The set of plans equivalent to the original query considered by the plan generator is the actual search space. Since the System R plan generator [784], certain restrictions are applied. The most prominent are: • Generate only plans where selections are pushed down as far as possible. • Do not consider cross products if not absolutely necessary. • Generate only left-deep trees. 557 28.1. HARD-WIRED DYNAMIC PROGRAMMING B B B B R1 R4 B R3 R2 R1 left−deep tree B R2 R3 R4 bushy tree Figure 28.2: Different join operator trees • If the query block contains a grouping operation, the group operator is performed last. Some comments are in order. Cross products are only necessary, if the query graph is unconnected where a query graph is defined as follows: the nodes are the relations and the edges correspond to the predicates (boolean factors 1 ) found in the where clause. 
More precisely, the query graph is a hypergraph, since a boolean factor may involve more than two relations. A left-deep tree is an operator tree where the right argument of a join operator is always a base relation. A plan in which both arguments of some join operator are derived by other join operators is called a bushy tree. Figure 28.2 gives an example of a left-deep tree and a bushy tree. If we take all the above restrictions together, the problem boils down to ordering the join operators or relations. This problem has been studied extensively. The complexity of finding the best (according to some cost function) ordering (operator tree) was first studied by Ibaraki and Kameda [438]. They proved that the problem of generating optimal left-deep trees with no cross products is NP-hard for a special block-wise nested-loop join cost function. The cost function applied in the proof is quite complex. Later it was shown that even if the cost function is very simple, the problem remains NP-hard [194]. The cost function (Cout) used there just adds up intermediate result sizes. This cost function is interesting in that it is the kernel of many other cost functions and it fulfills the ASI property, of which we know the following: if the cost function fulfills the ASI property and the query graph is acyclic, then the problem can be solved in polynomial time [438, 520]. Ono and Lohman gave examples showing that considering cross products can substantially improve performance [653]. However, generating optimal left-deep trees with cross products even for Cout makes the problem NP-hard [194]. Generating optimal bushy trees is even harder. Even if there is no predicate, that is, only cross products have to be used, the problem is NP-hard [768]. This is surprising since generating left-deep trees with cross products as the only operation is very simple: just sort the relations by increasing size. Given the complexity of the problem, there are only two alternatives to generate plans: either explore the total search space or use heuristics. The former can be quite expensive. This is the reason why the above-mentioned restrictions to the search space have traditionally been applied. The latter approach risks missing good plans. The best-known heuristic is to join next the relation that results in the smallest intermediate result. Estimating the cardinality of such results is discussed in Section ??. Traditionally, selections were pushed down as far as possible. However, for expensive selection predicates (e.g. user-defined predicates, those involving user-defined functions, predicates with subqueries) this does not suffice. For example, if a computer vision application has to compute the percentage of snow coverage for a given set of satellite images, this is not going to be cheap. In fact, it can be more expensive than a join operation. In these cases, pushing the expensive selection down misses good plans. That is why research has lately started to take expensive predicates into account. However, some of the proposed solutions do not guarantee finding the optimal plan. Some approaches and their bugs are discussed in [156, 415, 413, 767, 769]. Although we will subsequently give an algorithm that incorporates correct predicate placement, not all plan generators do so. An alternative (though inferior) approach is to pull up expensive predicates in the Rewrite II phase.
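A small, purely illustrative calculation (all numbers invented) shows why pushing an expensive selection can hurt. Assume the snow-coverage predicate costs one second per image, the image relation contains 10,000 tuples, and a join with a small relation of relevant regions leaves only 100 images. Pushing the selection below the join costs 10,000 · 1 s = 10,000 s for predicate evaluation alone. Evaluating it above the join costs only 100 · 1 s = 100 s; even if the join now processes 10,000 instead of 100 tuples, this extra join cost is far outweighed by the 9,900 s saved on predicate evaluation.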
There are several approaches to explore the search space. The original approach is to use dynamic programming [784]. The dynamic programming algorithm is typically hard-coded. Figure 28.3 illustrates the principle of bottom-up plan generation as applied in dynamic programming. The bottom level consists of the original relations to be joined. The next level consists of all plans that join a subset of cardinality two of the original relations. The next level contains all plans for subsets of cardinality three, and so on.

Figure 28.3: Bottom-up plan generation (the input relations R1, R2, R3 at the bottom; above them, the first set of partial plans R12, R13, R23; at the top, the plans for R123, where alternatives are pruned)

With the advent of new query optimization techniques, new data models, and extensible database systems, researchers were no longer satisfied with the hard-wired approach. Instead, they aimed for rule-based plan generation. There exist two different approaches for rule-based query optimizers. In the first approach, the algebraic equivalences that span the search space are used to transform some initial query plan derived from the query block into alternatives. As search strategies, either exhaustive search is used or some stochastic approach such as simulated annealing, iterative improvement, genetic algorithms, and the like [74, 446, 452, 453, 834, 860, 859, 862]. This is the transformation-based approach. This approach is quite inefficient. Another approach is to generate plans by rules in a bottom-up fashion. This is the generation-based approach. In this approach, either a dynamic programming algorithm [565] is used or memoization [360]. It is convenient to classify the rules used into logical and physical rules. The logical rules directly reflect the algebraic equivalences. The physical rules or implementation rules transform a logical algebraic operator into a physical algebraic operator. For example, a join node becomes a nested-loop join node.

28.1.2 A plan generator for bushy trees

Within the brief discussion in the last subsection, we enumerated plans such that first all 1-relation plans are generated, then all 2-relation plans, and so on. This enumeration order is not the most efficient one. Let us consider the simple problem where we have to generate exactly one best plan for every subset of the n-element set of relations to be joined. The empty subset is not meaningful, leaving the number of subsets to be investigated at 2^n − 1. These subsets are enumerated most efficiently in counting order: we initialize an n-bit counter with 1 and count until we have reached 2^n − 1, the n bits representing the subsets. Note that with this enumeration order, plans are still generated bottom-up. For a given subset R of the relations (encoded as the bit pattern a), we have to generate a plan from subsets of this subset (encoded as the bit pattern s). For example, if we only want to generate left-deep trees, then we must consider 1-element subsets and their complements. If we want to generate bushy trees, all subsets must be considered. We can generate these subsets by a very fast algorithm developed by Vance and Maier [898]:

s = a & -a;
while (s != a) {
  process(s);        // s is a proper, non-empty subset of a
  s = a & (s - a);   // advance to the next subset in counting order
}

The meaning of process(s) depends on the kind of plans we generate. If we concentrate on join ordering, neglecting selection operations (i.e. pushing them down), this step essentially looks up the plans for s and its complement a \ s and then joins the plans found there.
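The following self-contained C++ sketch puts this enumeration to work in a bottom-up dynamic program over subsets. It is a toy: the cost function is Cout-like (it sums intermediate result sizes), predicates and selectivities are ignored (so every join is a cross product), and the function name optimize is invented.

    #include <cstdio>
    #include <cstdint>
    #include <limits>
    #include <vector>

    struct Plan { double cost; double card; };

    // best[a] holds the best plan for the relation subset encoded by the bit
    // pattern a; lookup is a plain array access with a as the index.
    std::vector<Plan> optimize(const std::vector<double>& card) {
      const int n = (int)card.size();
      std::vector<Plan> best(1u << n,
                             {std::numeric_limits<double>::infinity(), 0.0});
      for (int i = 0; i < n; ++i)
        best[1u << i] = {0.0, card[i]};            // single relations cost nothing
      for (uint32_t a = 1; a < (1u << n); ++a) {   // counting order: bottom-up
        if ((a & (a - 1)) == 0) continue;          // skip singletons
        double sz = 1.0;                           // toy cardinality: product
        for (int i = 0; i < n; ++i)
          if (a & (1u << i)) sz *= card[i];
        for (uint32_t s = a & -a; s != a; s = a & (s - a)) {
          double c = best[s].cost + best[a ^ s].cost + sz;  // Cout-like cost
          if (c < best[a].cost) best[a] = {c, sz};
        }
      }
      return best;
    }

    int main() {
      auto best = optimize({100.0, 10.0, 1000.0});
      std::printf("best cost: %.0f\n", best.back().cost);  // prints 1001000
    }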
Lookup is best implemented via an array with s as an index.

28.1.3 A plan generator for bushy trees and expensive selections

Figure 28.4 shows the pseudocode of a dynamic programming algorithm that generates plans with cross products, selections, and joins. It generates optimal bushy trees. Efficient implementation techniques for the algorithm can be found in [898, 769]. As input parameters, the algorithm takes a set of relations R and a set of predicates P. The set of relations for which a selection predicate exists is denoted by RS. We identify relations and predicates that apply to these relations. For all subsets Mk of the relations and subsets Pl of the predicates, an optimal plan is constructed and entered into the table T. The loops range over all Mk and Pl. Thereby, the set Mk is split into two disjoint subsets L and L′, and the set Pl is split into three parts (lines 7 and 8). The first part (V) contains those predicates that apply to relations in L only. The second part (V′) contains those predicates that apply to relations in L′ only. The third part (p) is the conjunction of all the join predicates connecting relations in L and L′ (line 8). Line 9 constructs a plan by joining the two plans found for the pairs [L, V] and [L′, V′] in the table T. If this plan is the cheapest found so far, it is memoized in the table (lines 10-12). Last, the different possibilities of not pushing predicates in Pl are investigated (lines 15-19).

proc Optimal-Bushy-Tree(R, P)
1   for k = 1 to n do
2     for all k-subsets Mk of R do
3       for l = 0 to min(k, m) do
4         for all l-subsets Pl of Mk ∩ RS do
5           best_cost_so_far = ∞;
6           for all subsets L of Mk with 0 < |L| < k do
7             L′ = Mk \ L; V = Pl ∩ L; V′ = Pl ∩ L′;
8             p = ⋀ {pi,j | pi,j ∈ P, Ri ∈ L, Rj ∈ L′};   // p = true might hold
9             T = (T[L, V] ⋈p T[L′, V′]);
10            if Cost(T) < best_cost_so_far then
11              best_cost_so_far = Cost(T);
12              T[Mk, Pl] = T;
13            fi;
14          od;
15          for all R ∈ Pl do
16            T = σR(T[Mk, Pl \ {R}]);
17            if Cost(T) < best_cost_so_far then
18              best_cost_so_far = Cost(T);
19              T[Mk, Pl] = T;
20            fi;
21          od;
22        od;
23      od;
24    od;
25  od;
26  return T[R, RS];

Figure 28.4: A dynamic programming optimization algorithm

Another issue that complicates the application of dynamic programming is certain properties of plans. The most prominent such properties are interesting orders [784, 818, 819]. Take a look at the following query:

select d.no, e.name
from Employee e, Department d
where e.dno = d.dno
order by d.dno

Here, the user requests the result to be ordered on d.dno. Incidentally, this is also a join attribute. During bottom-up plan generation, we might think that a Grace hash join is more efficient than a sort-merge join since the cost of sorting the relations is too high. However, the result has to be sorted anyway, so this sort may pay off. Hence, we have to keep both plans. The approach is the following. In the example, an ordering on d.dno is called an interesting order. In general, any order that is helpful for ordering the output as requested by the user, for a join operator, for a grouping operator, or for duplicate elimination is called an interesting order. The dynamic programming algorithm is then modified such that plans are not pruned if they produce different interesting orders.

28.1.4 A plan generator for bushy trees, expensive selections and functions

28.2 Bibliography
Chapter 29

Rule-Based Algorithms

29.1 Rule-based Dynamic Programming

This section is beyond the scope of this book; the reader is referred to the Starburst papers, especially [388, 541, 540, 565, 567].

29.2 Rule-based Memoization

This section is beyond the scope of this book; the reader is referred to the Volcano and Cascades papers [345, 350, 356, 359, 360]. Both optimizer frameworks derived from the earlier Exodus query optimizer generator [343, 357].

29.3 Bibliography

Chapter 30

Example Query Compiler

30.1 Research Prototypes

30.1.1 AQUA and COLA

30.1.2 Black Dahlia II

30.1.3 Epoq

For the object-oriented data model Encore [969], the query language Equal [802, 801, 803], an object-oriented algebra that allows the creation of objects, was developed. The optimizer Epoq is intended to optimize Equal algebra expressions. A realization of Epoq is still outstanding. However, the architectural approach [612] and the control of alternative generation [611] within this architecture have already been worked out. Mitchell's dissertation [610] gives an overall overview.

The architectural proposal consists of a generic architecture that was made concrete in an example optimizer [610, 611]. The elementary building blocks of the architecture are regions. They consist of a control component and, in turn, regions or transformations. The simplest region is a transformation/rule that converts an algebra expression into an equivalent algebra expression. Each region is itself again regarded as a transformation. Within the architecture, these regions are organized in a hierarchy or a directed acyclic graph. Figure 30.1 shows such an example organization.

Figure 30.1: Example of an Epoq architecture (a global control on top of several regions, each consisting of a control component and transformations or further regions)

Except for the control, regions can be regarded as modules in the sense of Sciore and Sieg [780]. They exhibit very similar parameters and interfaces. However, while with Sciore and Sieg the control strategy of a module must be selected from a fixed set of given control strategies, here it can be specified more freely. Regardless of whether the transformations of a region are again regions or elementary transformations, their application is uniformly determined by the control of the region. The task of this control is to find a sequence of transformations that converts the given query into an equivalent one. Meaning is given to this by a certain goal that is to be reached. For example, this goal can be: optimize a nested query. To reach this goal, two coarse steps are necessary: first the query must be unnested, and next the unnested query must be optimized. One immediately sees that the sequence of transformations the control has to select depends both on the properties of the query itself and on the goal to be fulfilled. Based on this observation, the control is not implemented as a search function; instead, the planning paradigm is chosen for its realization. The control itself is specified by means of a set of rules consisting of a precondition and an action.
Since it is not possible to construct a plan in advance, i.e., a sequence of transformations/regions that reaches the goal in a guaranteed way, the execution of a transformation/region is allowed to fail. In this case, an alternative plan can be generated, which, however, builds on what has been achieved so far. To this end, the rules specifying the control are divided into groups, each group being assigned a common precondition. Each group is then associated with a sequence of actions that are tried in order. If a preceding action fails, the next action in the sequence is applied. If all actions fail, the application of the region fails as well. Each action is itself a sequence of elementary actions. Each of these elementary actions is either the application of an elementary transformation, the invocation of a region, or the recursive invocation of the planner with a newly formulated goal, whose subplan is then built into the action at the appropriate place.

Extending this approach with new regions seems easily possible, since the interface of the regions is standardized. Problems could arise only with the control strategies, as it is not clear whether the rule language used is powerful enough to realize all desirable control strategies. The question whether the individual components of the optimizer, i.e., the regions, can be evaluated is difficult to answer. In favor of it is the fact that each region is invoked in a certain context, namely to reach a certain goal while optimizing a query with equally certain properties. One can therefore judge the success rate of a region across its various applications. Since each region may generate only one alternative, owing to the a-region-is-a-transformation paradigm, it is hard to say to what extent the information gained by this evaluation can be used to improve the regions or the overall optimizer. Since the transformational approach underlies this optimizer as well, the problems already discussed also apply here, as they do for Straube's optimizer. Graceful degradation could be achieved by realizing alternative regions, introducing a goal OptimiereSchnell (optimize fast) that then invokes correspondingly less careful but faster regions. Predictions about the quality (for a given optimization time), however, seem hardly possible.

30.1.4 Ereq

A primary goal of the EREQ project is to define a common architecture for the next generation of database managers. This architecture now includes
• the query language OQL (a la ODMG),
• the logical algebra AQUA (a la Brown), and
• the physical algebra OPA (a la OGI/PSU).
It also includes software to parse OQL into AQUA (a la Bolo) and query optimizers:
• OPT++ (Wisconsin),
• EPOQ (Brown),
• Cascades (PSU/OGI), and
• Reflective Optimizer (OGI).
In order to test this architecture, we hope to conduct a "bakeoff" in which the four query optimizers will participate. The primary goal of this bakeoff is to determine whether optimizers written in different contexts can accommodate the architecture we have defined. Secondarily, we hope to collect enough performance statistics to draw some conclusions about the four optimizers, which have been written using significantly different paradigms.
At present, OGI and PSU are testing their optimizers on the bakeoff queries. Here is the prototype bakeoff optimizer developed at OGI. This set of Web pages is meant to report on the current progress of their effort and to define the bakeoff rules. Please email your suggestions for improvement to Leo Fegaras (fegaras@cse.ogi.edu); Leo will route comments to the appropriate author. http://www.cse.ogi.edu/DISC/projects/ereq/bakeoff/bakeoff.html

30.1.5 Exodus/Volcano/Cascade

Within the Exodus project, an optimizer generator was developed [357]. Figure 30.2 gives an overview of the Exodus optimizer generator.

Figure 30.2: The Exodus optimizer generator (a model description is fed through the optimizer generator and a C compiler to produce the optimizer; at run time, a query passes through syntactic analysis into a query graph, through the optimizer into an evaluation plan, and through the interpreter to the answer)

A model description file contains everything an optimizer needs. Since the Exodus optimizer generator is meant to support different data models, this file first of all contains the definitions of the available operators and methods. Here, operators denote the operators of the logical algebra, and methods those of the physical algebra, i.e., the implementations of the operators. The model description file further contains two classes of rules. Transformations are based on algebraic equivalences and convert one operator tree into another. Implementation rules select a method for a given operator. Both classes of rules have a left-hand side, which must match a part of the current operator graph; a right-hand side, which describes the operator graph after rule application; and a condition that must be satisfied for the rule to be applicable. While the left-hand and right-hand sides of a rule are given as patterns, the condition is described by C code. C routines can also be used for the transformation itself. A concluding section of the model description file then contains the required C routines.

From the model description file, the optimizer generator produces a C program, which is subsequently compiled and linked. The result is the query optimizer, which can be used in the conventional way. A compiling approach was chosen for the rules rather than an interpreting one, since in an experiment conducted beforehand by the authors, rule interpretation had proven too slow.

Rule processing in the generated optimizer maintains a list OPEN holding all applicable rules. A selection mechanism determines the next rule to apply and removes it from OPEN. After its application, the rule applications enabled by it are detected and recorded in OPEN. To implement the selection mechanism, both the cost of the current expression and an estimate of a rule's potential are taken into account. This estimate of the potential is computed as the quotient of the costs of an operator tree before and after rule application, taken over a number of previously performed applications of the rule. With these two figures, the cost of the current operator graph to which the rule is to be applied and the rule's potential, estimates of the cost of the resulting operator graph can be computed. The search strategy is hill climbing.
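A minimal C++ sketch of such an OPEN-list step follows. All types are hypothetical stand-ins, not the code Exodus actually generates; a rule's potential is modeled as the average cost ratio before/after its past applications, so the estimated result cost is the current cost divided by that potential.

    #include <cstdio>
    #include <vector>

    struct Expr { double cost; };        // operator (sub)graph, toy version

    struct Rule {
      double potential;                  // avg. cost(before)/cost(after)
      Expr (*apply)(const Expr&);        // right-hand side construction
    };

    // Hill climbing: pick the applicable (rule, expr) pair with the lowest
    // estimated result cost, i.e., cost(expr) / rule.potential.
    int pickBest(const std::vector<std::pair<Rule, Expr>>& open) {
      int best = 0;
      for (int i = 1; i < (int)open.size(); ++i)
        if (open[i].second.cost / open[i].first.potential <
            open[best].second.cost / open[best].first.potential)
          best = i;
      return best;
    }

    int main() {
      Rule halve{2.0, [](const Expr& e) { return Expr{e.cost / 2.0}; }};
      Rule tweak{1.1, [](const Expr& e) { return Expr{e.cost / 1.1}; }};
      std::vector<std::pair<Rule, Expr>> open{{tweak, {100.0}}, {halve, {100.0}}};
      int i = pickBest(open);
      Expr next = open[i].first.apply(open[i].second);
      std::printf("new cost: %.1f\n", next.cost);   // prints 50.0
      // newly enabled rule applications would now be inserted into OPEN
    }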
The main drawback the authors note about their optimizer generator, which they however claim for all transformation-based rule-based optimizers, is the impossibility of estimating the absolute quality of an operator tree and the potential of an operator tree with respect to future optimizations. Hence, it can never be estimated whether the optimal operator tree has already been reached; only after all alternatives have been generated can the optimal operator tree be selected. The authors further regret that it is not possible to use the A* algorithm as a search function, since estimating the potential, or the distance to the optimal operator graph, is not possible. One should also be at least skeptical about rating individual rules, since rules based on algebraic equivalences are of too fine a granularity for a general rating to be possible. The successful use of exchanging two join operations in one query by no means implies that this exchange will also reduce the costs in the next query. The main reason for skepticism toward this quite appealing idea is that a rule application takes too little information/context into account. If this shortcoming were removed, i.e., if rules were of decidedly coarser granularity, the approach would seem promising. An example would be a rule that orders all join operations according to a given heuristic, i.e., a complex algorithm that incorporates more knowledge into its decisions.

Graefe himself lists some further drawbacks of the Exodus optimizer generator, which then led to the development of the Volcano optimizer generator [359, 360]. Insufficiently supported are
• non-trivial cost models,
• properties,
• heuristics, and
• transformations of subscripts of algebraic operators into algebraic operators.
The last point is essential especially in the area of object bases, for example in order to convert path expressions into a sequence of join operations.

In the Volcano optimizer generator, algebraic expressions are again represented as operator trees. As in the Exodus optimizer generator, the optimizer is again described by a set of transformation and implementation rules. The drawbacks of the transformational approach are thus inherited. A separation into two phases, as found in many optimizers, is not necessary for the Volcano optimizer generator. The developer of the optimizer is free to define the phases himself. The problems that otherwise arise when coupling algebraic with non-algebraic optimization can thus be avoided. Properties are handled in a goal-oriented manner. The properties required in the query (e.g., sort order) are passed to the search function as parameters, so that plans satisfying them are generated in a targeted fashion. When an operator or a method is incorporated, care is taken that the properties not yet satisfied are achieved by that operator or method. The required properties again serve as goal descriptions for the subsequent calls of the search function. These properties also include cost limits, with which the search function then implements a branch-and-bound algorithm.
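The following minimal C++ sketch illustrates such goal-driven search with a cost limit. All types and the candidate enumeration are invented toys, not Volcano's actual interface; a real search function would recurse into subgoals derived from the required properties.

    #include <cstdio>
    #include <optional>
    #include <vector>

    struct Goal { bool needSorted; };   // logical expression + required properties
    struct Plan { double cost; };

    // Toy enumeration of the candidate plans satisfying a goal.
    std::vector<Plan> candidatePlans(const Goal& g) {
      return {Plan{g.needSorted ? 12.0 : 8.0}, Plan{10.0}};
    }

    // Branch and bound: candidates at or above the current limit are pruned
    // immediately; the limit tightens whenever a cheaper plan is found.
    std::optional<Plan> findBestPlan(const Goal& g, double limit) {
      std::optional<Plan> best;
      for (const Plan& p : candidatePlans(g)) {
        if (p.cost >= limit) continue;   // prune against the cost limit
        limit = p.cost;                  // tighten the bound
        best = p;
      }
      return best;
    }

    int main() {
      if (auto p = findBestPlan(Goal{true}, 11.0))
        std::printf("plan cost: %.1f\n", p->cost);   // prints 10.0
    }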
Before a plan for an algebraic expression is generated, a hash table is consulted to check whether a corresponding expression with the required properties already exists. This avoids duplicate work. With both optimizer generators, the demands of graceful degradation, early assessment of alternatives, and evaluability of individual components are not met.

30.1.6 Freytag's rule-based System R emulation

[295] shows how the optimizer of System R [784] can be emulated by means of a rule-based approach. The input consists of a Lisp-like expression of the form (select ...). The projection list consists of attribute specifications; these are also used for the selection predicates and join predicates. The algebra contains operators of both the logical and the physical algebra. Specifically, there are scan, sort, projection, and join operators in one logical and several physical variants.

Figure 30.3: Organization of the optimization (query → generation of the general expression → access plan generation → join order and methods → evaluation plan)

The generation of the evaluation plans is divided into several steps, which are in turn decomposed into sub-steps (see Fig. 30.3). First comes the translation into the logical algebra. Here, scan operators are wrapped around the relations, and selections concerning only one relation are folded into the scan operators. The second step generates access plans by replacing the scan operator with a simple file scan (FSCAN) or, where possible, an index scan (ISCAN). The third step first generates different join orders and then determines the join methods. As in System R, sort-merge and nested-loop joins are distinguished. No statements whatsoever are made about the choice of a search strategy. Rather, the goal is to demonstrate the principal viability of the rule-based approach by modeling the System R optimizer with a rule system.

30.1.7 Genesis

The global goal of the Genesis project [59, 60, 61, 64] was to modularize the entire database software and to achieve increased reusability of database modules. Two subgoals were pursued:
1. standardization of the interfaces, and
2. formulation of the algorithms independently of the DBMS implementation.
Here, we are only interested in how these goals were met in the construction of optimizers [57, 62]. The standardization of the interfaces is achieved by a generalization of query graphs. The algorithms themselves are described by transformations on query graphs. Note that this does not mean that the algorithms are also implemented by transformation rules. Rules are merely used as a means of description in order to understand the nature of the reusability of optimization algorithms. Optimization is divided into two phases, the reduction phase and the join phase. The reduction phase maps query graphs that work on non-reduced sets of data onto ones that work on reduced sets of data. The reduction phase is thus clearly oriented toward the heuristics of pushing down selections and projections. The second phase determines join orders.
The instantiation of the approach described in the papers is thus very conservative in the sense that only classical data models are considered. An application of the methodology to object-oriented or deductive data models is still outstanding. Consequently, only the existing classical optimization approaches can be described sufficiently well with these means. Likewise, the existing classical optimizers can be described with the presented means as compositions of the algorithms also captured in the formalism. The composition itself is described with algebraic term rewriting. New composition rules then also allow describing new optimizers that use other combinations of algorithms.

The formal, implementation-independent description of both the individual optimization algorithms and the composition of an optimizer optimally supports the reusability of existing algorithms. The use of the standardized query graphs is also important here. This point is, however, weakened, since it is also envisaged to use different representations of query graphs [60]. This, of course, calls the reuse of implementations of optimization algorithms into question, since these usually work only on one particular representation of the query graphs. When new optimization approaches are developed, they too can be described in the presented formalism. The same holds for new index structures, since these are also formally described [58, 63]. It is not foreseeable to what extent the standardized query graph will withstand extensions. This is, however, not a problem specific to the Genesis approach; it holds for all optimizers. It is still open whether it will be possible to specify and implement the optimization algorithms in such a way that they work independently of the concrete representation or implementation of the query graphs. The object-oriented approach may be useful here. The question arises, however, whether, upon introduction of a new operator, the existing algorithms can be implemented such that they can ignore it and still do useful work.

The restriction to two optimization phases, the reduction phase and the join phase, is not a limitation, since it, too, was fixed by means of term rewriting rules and can thus easily be changed. Since the descriptions of the optimizer and of the individual algorithms are independent of the actual implementation, the global control of the optimizer and the local controls of the individual algorithms are also decoupled from one another. This is an important requirement for achieving extensibility; it is often violated in rule-based optimizers and thus limits their extensibility. Evaluability, predictability, and early assessment of alternatives are not possible with the presented approach, since the individual algorithms are construed as transformations on the query graph. This drawback, however, does not hold solely for the Genesis approach presented here but generally for all but one optimizer. It is, however, not foreseeable whether this drawback results from the formalism used or merely from its concretization in modeling existing optimizers.
It is quite possible that the formalism, with slight extensions, can also describe other approaches, in particular the generating one. Overall, the Genesis approach is a very useful one. Unfortunately, in contrast to rule-based approaches, it has not found enough resonance. It most probably offers more possibilities for meeting the requirements than have been explored so far.

30.1.8 GOMbgo

Figure 30.4: The optimization process in GOMbgo (a GOMql query is translated and preprocessed into a term representation; rule application, guided by heuristics, the rule base, the ASR schema, and a cost model, yields a list of optimized terms; selection and polishing produce the optimized term, from which the code generator derives the evaluation plan (QEP))

Figure 30.5: Architecture of GOMrbo (rule application on a query combines a heuristic evaluator, a pattern matcher, and a toolbox with condition, transformation, and environment managers; the schema manager contributes types and access support relations; the output is the optimized query and its alternatives)

30.1.9 Gral

Gral is an extensible geometric database system. The optimizer developed for this system, a rule-based optimizer in its purest form, produces an execution plan from a given query in five steps (see Fig. 30.6 a) [67].

Figure 30.6: a) Architecture of the Gral optimizer (normalization, algebraic optimization, extraction of constant and common subexpressions, translation into the executable algebra, non-algebraic optimization); b) operator hierarchy by cost

The query language coincides with the descriptive algebra used. This is a relational algebra extended by geometric operators. As an additional extension, it offers the possibility of binding expressions to variables. An evaluation plan is represented by an expression of the executable algebra. The executable algebra essentially contains various implementations of the descriptive algebra plus scan operations. The separation between descriptive algebra and executable algebra is strict, i.e., no mixed expressions occur (except during the explicit conversion (step 4)).

Steps 1 and 3 are implemented by fixed algorithms. During normalization (step 1), occurrences of variables are replaced by the expressions bound to them. This is necessary in order to fully unlock the optimization potential. Step 3 introduces variables for constant expressions. This corresponds to the unnesting of queries of type N and A (see Chapter ?? and [494]). The treatment of common subexpressions is not yet implemented but is envisaged for step 3.

Steps 2, 4, and 5 are rule-based. A rule description language is used to formulate the rules. The rule descriptions are stored in a file. Within the file, rules are combined into groups (sections). These groups are applied one after another. Thus a single step again decomposes into several smaller steps. For example, step 2 in the OPTEX optimizer for Gral is divided into four sub-steps:
1. Decomposition of selections with complex selection predicates into a sequence of selections with simple selection predicates, and splitting of join operations into a sequence of selections and cross products.
2. The actual IMPROVING step (see below).
3. Subexpressions consisting of a selection and an immediately following cross product are converted into join operations.
4. Determination of an order among the join operations and cross products. Here, cross products are executed last, and small relations are joined first.

Each group is assigned one of three search strategies implemented in Gral.

STANDARD Applies all rules of a group until no rule is applicable anymore. No precautions are taken to prevent infinite loops; the rules must therefore be formulated accordingly. This strategy can be used for steps 2 and 5.

IMPROVING This strategy supports algebraic optimization in the descriptive algebra (step 2). The goal is to attain a good ordering of the algebraic operators. To this end, a partial order of the algebraic operators according to their costs is defined (see Fig. 30.6 b) for an example). The IMPROVING strategy then tries to establish the order so defined in a given expression. It is first applied recursively to all subexpressions of an expression. Transformation rules are then applied whenever they increase the coherence of the operator sequence in the expression with the operator cost hierarchy. This corresponds to a bubble sort on the expression. Expressions with the smallest number of runs are preferred, where a run is a sequence of operators within the expression to be optimized whose operators are ordered according to the operator cost hierarchy.

TRANSLATION Rule groups with this strategy are applied during the translation from the descriptive algebra into the executable algebra (step 4). Each rule describes the translation of a single descriptive operator into an expression of the executable algebra, i.e., one that must not contain any descriptive operators. The translation is local. For parameters, e.g., selection and join predicates, rules can be given that span a search space for the reorganization of the parameter. With these, one can, for example, generate all permutations of a conjunction. The search strategy for parameter determination is exhaustive and takes care that no cycles occur. A choice can be made by means of the valuation entry in the rules, which can, for example, represent costs. Accordingly, rules with the smallest valuation are preferred. Every representation generated for a parameter is translated.

The syntax of a rule is

specification
definition
RULE pattern → result1 valuation1 if condition1
             ···
             → resultn valuationn if conditionn

where

specification is of the form SPEC spec1, ..., specn, the speci being range specifications such as opi in <OpSet>.

definition defines variables (e.g., for attribute sequences). Gral features different sorts of variables for attributes, operations, relations, etc.

pattern is a pattern in the form of an expression that may contain variables and constants. The expression can be an expression of the descriptive algebra or of the executable algebra.
conditioni is a condition, a general boolean expression. Special predicates such as ExistsIndex (does an index exist for a relation?) are provided by Gral.

resulti is again an expression, describing the result of the rule.

valuationi is an arithmetic expression returning a numeric value. It can be drawn upon by a selection strategy (Gral supports several): the rule with the smallest valuation is preferred.

A rule is evaluated in the standard way. Let E be the expression to which the rule is to be applied:

if ∃ substitution σ and subexpression E′ of E with E′ = pattern σ
   and ∃ j with conditionj and ∀ 1 ≤ i < j: ¬conditioni
then replace E′ in E by resultj σ

The Gral optimizer is a purely rule-based optimizer following the transformational approach. Accordingly, all of the previously identified drawbacks of that approach apply. In detail, the following points are to be criticized:

• There is no early assessment of the alternatives.
• The search strategies are hard-wired and not particularly sophisticated.
• Incorporating highly specialized algorithms that represent particular optimization techniques is difficult, if not impossible.
• Determining the join order according to a more complex algorithm is not possible.
• Since the translation into the executable algebra is local and no annotations are admitted, existing sort orders can hardly be exploited.

Only one alternative of the algebraic optimization is handed over to the physical optimization. This can lead to cases in which the optimizer can never find the optimum. Even though this is in general not always achievable, this property should not be inherent. On the positive side, for IMPROVING and TRANSLATION the effort for pattern matching can presumably be kept low.

30.1.10 Lambda-DB

http://lambda.uta.edu/lambda-DB/manual/overview.html

30.1.11 Lanzelotte

In short:

Query Language The Lanzelotte optimizer assumes no particular query language. The starting point of the considerations are so-called request graphs (query graphs). Details can be found in my write-up. One paper ([529]) shows how to get from a rule language (RDL) to query graphs.

Internal Representation The internal representation of a query is the class connection graph. It contains the database objects (extensions) referenced in the query, as seen from the physical schema, and the relationships between these extensions that are significant for the query (joins, attribute paths, selections).

Query Execution Plans QEPs are represented as (deep) processing trees.

Architecture The Lanzelotte optimizer is rule-based.

Transformation versus generation Lanzelotte offers rules for both flavors. She distinguishes enumerative search (generation), randomized search (transformation), and genetic search (transformation).

Control/Search Strategy Lanzelotte tries to abstract from the details of the strategies used and presents an extensible optimization that hides the details. The strategy actually used at a given point in time is determined by "assertions".
(Not much is said about this in the papers; perhaps she also means the condition parts of the rules.)

Cost Model Quite similar to the one we use. She also uses notions like card(C), size(C), ndist(Ai), fan(Ai), share(Ai). Details can be found in my write-up.

30.1.12 Opt++

Wisconsin.

30.1.13 Postgres

Postgres is not an object base system but falls into the class of extended relational systems [844]. The essential extensions are
• computable attributes, which are formulated as Quel queries [842],
• operations [840],
• abstract data types [839], and
• rules [843].
These points, however, shall not concern us here. The optimization techniques developed there, in particular the materialization of computable attributes, are described in the literature [469, 400, 398, 399]. Our interest is directed rather at a more recent publication proposing a scheme for determining the order of selections and join operations [415]. It is briefly presented in the following.

First, however, some preliminary remarks. If a selection is delayed, i.e., executed after a join although this would not be necessary, it may happen that the selection predicate has to be evaluated on more tuples. It cannot happen, however, that it has to be evaluated on more distinct values. On the contrary, the number of argument values is in general reduced by a join. Thus, if the already computed values of the selection predicate are cached, the number of evaluations of the selection predicate after a join at least does not grow; an evaluation is then replaced by a lookup. Since we only consider expensive selection predicates here, a lookup is very cheap compared to an evaluation, and the cost of the lookup can even be neglected. There remains the problem of the size of the cache. If the input is sorted on the arguments of the selection predicate, the size of the cache can under certain circumstances be reduced to 1. It becomes entirely unnecessary if an indirect representation of the join result is used. A possible indirect representation is shown in Figure ??, where the left one of the depicted relations is assumed to contain the arguments of the selection predicate under consideration.

For every selection predicate p(a1, ..., an) with arguments ai, let cp denote the cost of its evaluation on one tuple. This cost is composed of CPU and I/O costs (see [415]). A plan is a tree whose leaves contain scan nodes and whose inner nodes are labeled with selection and join predicates. A stream in a plan is a path from a leaf to the root. The central idea is now not to distinguish between selection and join predicates but to treat them alike. For this, it is assumed that all these predicates work on the cross product of all relations of the query under consideration. This requires an adaptation of the costs. Let a1, ..., an be the relations of the query under consideration and p a predicate over the relations a1, ..., ak. Then the global cost of p is defined as

C(p) = cp / (|ak+1| · ... · |an|)

The global cost captures the cost of evaluating the predicate over the whole query, where, of course, the relations that do not influence the predicate have to be factored out.
For illustration, assume that p is a selection predicate on only one relation a1. Applying p directly to a1 incurs the cost cp · |a1|. In the unified model, every predicate is assumed to be evaluated on the cross product of all relations involved in the query, incurring the cost C(p) · |a1| · |a2| · ... · |an|, which, however, equals cp · |a1|. Of course, this is only correct when a cache for the values of the selection predicates is used. Note further that the selectivity s(p) of a predicate p is independent of its position within a stream. The global rank of a predicate p is defined as

rank(p) = s(p) / C(p)

Note that the predicates within a stream cannot be reordered arbitrarily, since we must guarantee that the arguments used by a predicate are actually available. In [415], one further restriction is imposed: the join order must not be touched. It is thus assumed that an optimal join order has already been determined and only the pure selection predicates may be moved. If one first considers only the reordering of the predicates on a single stream, the reorderability restrictions yield the sequencing problem with precedence constraints, for which an algorithm with running time O(n log n) (n being the stream length) is known to compute an optimal solution [627]. The procedure proposed in [415] applies this algorithm to every stream until no further improvement can be achieved. The result is a polynomial algorithm that guarantees the optimal solution, though only under the restriction that the costs of the joins are linear. This brings us to one of the drawbacks of the procedure: the costs of join operations are at times not linear but even quadratic. A further drawback lies in the assumption that the optimal join order has already been determined, for the latter depends essentially on where the selections are placed. Usually, the determination of the optimal join order assumes that all selection predicates are pushed down as far as possible; this, however, is now no longer the case. It is therefore necessary to integrate selection predicate migration into the determination of the join order; only then can one hope for good results. The integration with a dynamic programming approach is problematic, since solutions are discarded there that may lead to the optimal solution when a selection predicate is not pushed all the way down [415]. A partial solution is also hinted at there: if the rank of a selection predicate is greater than the rank of every plan of a set of joins, then in an optimal tree the selection predicate is placed above all these join operations. A corresponding algorithm, however, if it generates only left-deep trees for example, has a worst-case complexity of O(n^4 · n!).

30.1.14 Sciore & Sieg

The main idea of Sciore and Sieg is to organize the rule set into modules and to assign to each module its own search strategy, cost computation, and rule set. Modules can call other modules explicitly or implicitly pass their output set on to the next module.
30.1.15 Secondo

Güting.

30.1.16 Squiral

The first approach to a rule-based optimizer, Squiral, can be traced back to the year 1975 [823]. Note that this paper is four years older than the perhaps most frequently cited paper on the System R optimizer [784], which, however, is not rule-based but hard-wired. Figure 30.7 gives an overview of the architecture of Squiral.

Figure 30.7: The Squiral architecture (a query is parsed into an operator graph; graph transformations driven by transformation rules yield an optimized operator graph; operator construction maps it to base procedures, i.e., cooperative concurrent programs, which the database machine executes to produce the result)

After syntactic analysis, an operator graph is available. In Squiral, this is initially restricted to an operator tree. For the treatment of common subexpressions, the creation of temporary relations corresponding to the common subexpressions is proposed; these temporary relations then replace the common subexpressions. This makes it possible to restrict attention to operator trees. The operator tree is then transformed into an optimized operator tree. To this end, rules corresponding to the algebraic equivalences are used. The application of these transformation rules is controlled purely heuristically. The heuristics themselves are encoded in the transformation application rules. One of these rules says, for example, that projections are only pushed down if the operation over which the projection is to be pushed next is not a join operation. Besides the standard rules enabling the exchange of relational operators, there are rules that allow converting relational expressions into complex boolean expressions, which then serve as selection predicates. This is the first proposal to use not only primitive selection predicates in the form of literals but also more complex expressions with boolean connectives. The optimization of these expressions, however, is not pursued further.

The essential task of operator construction is the selection of the actual implementations of the operators in the operator graph under optimal exploitation of given sort orders. This phase of the optimization, too, is not cost-based in Squiral. It is realized by two passes over the operator graph. The first pass computes, bottom-up, the sort orders that are available at no extra cost, because, for example, relations are already sorted and existing sort orders are not destroyed by operators. In the second pass, top-down, re-sorting is only introduced where none of the sort orders computed in the first pass allows an efficient implementation of the operator to be converted. Both passes are specified by rule sets. It is remarkable that the number of rules, 32 for the upward pass and 34 for the downward pass, by far exceeds the number of rules for the transformation phase (7 rules in total). The complexity of the rules is also considerably higher. Both phases of interest to us, operator graph transformation and operator construction, are specified by rules. However, no search process is needed in either phase, since the rules list all cases very precisely and thus describe a unique decision tree.
An even more minute case analysis for the generation of execution plans in the operator construction phase can only be found in Yao [957]; those cases also have the advantage of being backed by cost calculations. Since the rules encode the heuristics of their application in their premises and no separate search function for applying the rules exists, extensibility is very difficult. The absence of any cost assessment makes an evaluation of the alternatives impossible. It is therefore also hard to rate the individual components of the optimizer, namely the rules, all the more so as the transformational approach was chosen. The demands of predictability and graceful degradation are not addressed in this approach either.

30.1.17 System R and System R∗

30.1.18 Starburst and DB2

Starburst [267, 387] is based on an extensible relational data model. As in System R and System R*, query processing is divided into the two steps of query translation and query execution [388]. We are interested in the first step, query translation. Figure 30.8 gives an overview.

Figure 30.8: The Starburst optimizer (query → parsing → query transformation → plan optimization → plan refinement → evaluation plan)

After standard parsing, the query is available in the internal representation QGM (Query Graph Model). QGM is modeled on Starburst's query language Hydrogen (similar to SQL). The most important building block of QGM is the select operator. It contains a projection list and the query predicate in graph form. The nodes are labeled and reference (stored) relations or further QGM operators. A label is either a quantifier (∀, ∃) or the set-former label (F). Nodes labeled with F contribute to producing the result of an operator, while the quantifier labels contribute to restricting it. The edges are labeled with the predicates; predicates concerning only one relation thus give rise to self-loops. Further operators are insert, update, intersection, union, and group-by. In addition, the QGM representation of a query is enriched with schema information and statistical data. It thus also serves as the collecting point for all information concerning the query. The QGM representation serves as the starting point for query transformation (Fig. 30.8). Query transformation generates, for a given QGM representation, several equivalent QGM representations. Apart from the representational differences between QGM and Hydrogen, query transformation can be regarded as a variant of source-level transformations. It is implemented rule-based, with C as the rule language. A rule consists of two parts, a condition and an action, each described by a C procedure. This obviates the implementation of a general rule interpreter with pattern matching. Rules can be combined into groups. The current optimizer comprises three classes of rules:
1. migration of predicates,
2. migration of projections, and
3. merging of operations.
Three different search strategies are available for the execution of the rules:
1. sequential,
2. priority-controlled, and
3. random, according to a given distribution.
The subgraphs of the QGM representation to which rules are applicable can be determined either by a depth-first or a breadth-first search.
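A condition/action rule of this kind can be pictured as a pair of procedures, as in the following minimal C++ sketch. The types are hypothetical stand-ins; Starburst's actual QGM interface is far richer.

    #include <cstdio>
    #include <vector>

    // Toy stand-in for a QGM node.
    struct QgmNode {
      bool hasPushablePredicate;
      int predicateLevel;
    };

    struct Rule {
      bool (*condition)(const QgmNode&);   // may the rule fire on this node?
      void (*action)(QgmNode&);            // rewrite the node in place
    };

    // Sequential strategy: visit the nodes (e.g., in depth-first order) and
    // fire every rule whose condition holds.
    void applySequentially(std::vector<QgmNode>& nodes,
                           const std::vector<Rule>& rules) {
      for (QgmNode& n : nodes)
        for (const Rule& r : rules)
          if (r.condition(n)) r.action(n);
    }

    int main() {
      Rule pushPredicate{
        [](const QgmNode& n) { return n.hasPushablePredicate; },
        [](QgmNode& n) { n.predicateLevel -= 1; n.hasPushablePredicate = false; }
      };
      std::vector<QgmNode> nodes{{true, 2}, {false, 1}};
      applySequentially(nodes, {pushPredicate});
      std::printf("levels: %d %d\n",
                  nodes[0].predicateLevel, nodes[1].predicateLevel);  // 1 1
    }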
If several alternative QGM representations exist (which is usually the case), a Choose operator [361] is used to assemble the different QGMs into one QGM. The subsequent phase then selects one of these alternative QGMs based on cost. This is not mandatory; the selection can also take place as late as evaluation time. This procedure is justified by the fact that no costs can be computed for QGMs, and hence no assessment of a QGM can take place. As the authors themselves remark, this circumstance is very unfortunate, since no alternatives can be discarded. They therefore announce investigations into merging the transformation (step 2) with plan optimization (step 3). To have some control over the behavior of the transformation, this step can be given a "budget", upon whose exhaustion the step is terminated. How exactly the "budget" works is unfortunately not explained.

The plan optimization step (see Fig. 30.8) can be compared to optimization as considered so far. It works rule-based but uses the generating rather than the transformational approach [567]. From basic operations, called LOLEPOPs (LOw-LEvel Plan OPerators), (alternative) evaluation plans are built by means of (grammar-like) rules called STARs (strategy alternative rules). LOLEPOPs stem from the relational algebra enriched by SCAN, SORT, and similar physical operators. An evaluation plan is then an expression of nested function calls, the functions corresponding to the LOLEPOPs. A STAR defines a named, parameterized object that corresponds to a nonterminal symbol. It consists of a set of alternative definitions, each consisting of a condition for its applicability and the definition of a plan. The generated plan can reference LOLEPOPs (corresponding to terminal symbols) and STARs. A root STAR corresponds to the start symbol of the grammar. STARs resemble the rules that are employed in Genesis, not only for the optimizer but for the whole DBMS, to obtain alternative implementations [59, 57, 61, 60, 62]. To assemble generated alternatives for a plan, and to prevent these alternatives from multiplying the number of plans in which they occur, a glue mechanism is employed. It has the Choose operator as its root, below which hang alternatives that, for example, produce a stream with certain properties (sort order, location). Of these alternatives, only those with the lowest costs for equal properties are considered [541]. The costs always refer only to the partial plan achieved so far. An evaluation plan is built bottom-up. The set of applicable STARs is kept in a ToDo list, which is a sorted list. Different search strategies can thus be implemented by using different sort orders for the ToDo list [541]. One advantage of the STAR approach is the avoidance of pattern matching; this makes it possible to interpret the STARs [541].
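The following minimal C++ sketch renders a STAR as a set of guarded alternatives together with the glue step that keeps only the cheapest plan per property value. All names are invented; Starburst's actual STAR machinery (parameters, ToDo list, LOLEPOP terms) is not modeled.

    #include <cstdio>
    #include <functional>
    #include <map>
    #include <vector>

    struct PlanProps { bool sorted; };
    struct Plan { double cost; PlanProps props; };

    struct Star {
      struct Alternative {
        std::function<bool()> condition;   // applicability test
        std::function<Plan()> makePlan;    // plan definition
      };
      std::vector<Alternative> alternatives;

      // Expand the STAR and apply the glue: per property value, keep only
      // the cheapest plan (these become the children of a Choose operator).
      std::vector<Plan> expand() const {
        std::map<bool, Plan> best;         // keyed by the 'sorted' property
        for (const auto& alt : alternatives)
          if (alt.condition()) {
            Plan p = alt.makePlan();
            auto it = best.find(p.props.sorted);
            if (it == best.end() || p.cost < it->second.cost)
              best[p.props.sorted] = p;
          }
        std::vector<Plan> plans;
        for (const auto& [prop, plan] : best) plans.push_back(plan);
        return plans;
      }
    };

    int main() {
      Star scan{{
        {[] { return true; },  [] { return Plan{10.0, {false}}; }},  // plain scan
        {[] { return true; },  [] { return Plan{14.0, {true}}; }},   // index scan
        {[] { return false; }, [] { return Plan{1.0, {true}}; }},    // inapplicable
      }};
      for (const Plan& p : scan.expand())
        std::printf("cost %.1f, sorted %d\n", p.cost, p.props.sorted);
    }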
Assessing extensibility is very difficult. On the one hand, this is an extensible optimizer, since both LOLEPOPs and STARs can be added. The glue mechanism can likewise be specified without touching the implementation. The problem is merely the complexity of these changes. This approach may therefore perhaps be characterized as conditionally extensible. Separating optimization into several phases raises the problems associated with such a split. As mentioned above, the authors announce further investigations to enable a merging of the phases. Since no alternatives are discarded, it is potentially possible to compute the optimal execution plan. It is hard to see, however, how graceful performance degradation could be realized. The same holds for the evaluability of the individual components (the STARs). More on Starburst can be found in [689, 690].

30.1.19 Straube's Optimizer

In his dissertation, Straube presents the optimizer he developed [850]. The results of this work went into a series of publications [889, 847, 848, 846, 849]. The structure of the optimizer is sketched in Figure 30.9.

Figure 30.9: Straube's optimizer: declarative query → translation into the calculus → calculus expression → translation from calculus into algebra → type checking → type-consistent, normalized object algebra expression → algebra optimization → optimized algebra expression → plan generation → alternative execution plans → execution plan

A query is first translated into the object calculus and from there into the object algebra. There, a type check is performed first. After that, the actual optimization begins. It consists of two phases: algebraic optimization and the generation of the execution plan. The first phase, algebraic optimization, follows the transformation paradigm: the algebraic expressions are transformed into equivalent algebraic expressions by means of rules. Straube essentially restricts himself to formulating the rules. For processing the rules, he merely proposes using the Exodus optimizer generator [357] or the query rewriter of Starburst [409]. The second phase, the generation of execution plans, is not rule-based. It rests on a so-called execution plan template, which is comparable to an AND/OR tree implicitly containing all possible execution plans. To generate a concrete execution plan, the search space spanned by the execution template is searched exhaustively. The cheapest execution plan is then executed.

Since a rule-based approach was chosen for the first phase and the use of the Exodus optimizer generator or the Starburst query rewriter is proposed, we refer to the corresponding sections for an assessment of this phase. The second phase is fully hard-coded and hence poorly extensible. An early assessment of the alternatives is not ruled out, but is not performed. Exhaustive search, of course, also precludes graceful performance degradation of the optimizer. The two-phase structure further burdens the chosen approach: it is difficult to see what a cross-phase control that at least potentially guarantees optimality would look like. Evaluating individual components of the optimizer is not possible.
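Straube's second phase searches the space spanned by the plan template exhaustively. A minimal sketch of such a search over an AND/OR tree follows; the node representation and the cost model are hypothetical and far simpler than a real execution plan template.

#include <algorithm>
#include <limits>
#include <vector>

// Hypothetical AND/OR plan template: an OR node lists alternative
// implementations, an AND node lists subplans that must all be built.
struct TemplateNode {
  enum Kind { OR, AND, LEAF } kind;
  std::vector<TemplateNode*> children;
  double leafCost = 0.0;  // cost contribution of a LEAF
};

// Exhaustive search: the cost of an OR node is the minimum over its
// alternatives, the cost of an AND node the sum over its parts.
inline double cheapestCost(const TemplateNode* n) {
  switch (n->kind) {
    case TemplateNode::LEAF:
      return n->leafCost;
    case TemplateNode::AND: {
      double sum = 0.0;
      for (const auto* c : n->children) sum += cheapestCost(c);
      return sum;
    }
    case TemplateNode::OR: {
      double best = std::numeric_limits<double>::infinity();
      for (const auto* c : n->children) best = std::min(best, cheapestCost(c));
      return best;
    }
  }
  return 0.0;
}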
30.1.20 Other Query Optimizers

Besides the optimizers mentioned in the preceding sections, there is a whole range of others that will not be presented individually here. Mention should still be made of the systems Probe [223, 222, 655] and Prima [404, 402]. In the Prima system, the focus of optimization lies on the dynamic assembly of molecules. It remains to be investigated whether an assembly operator (see [483]) would be of use here.

Particularly noteworthy is a paper presenting optimization opportunities for the database programming language FAD [894]. This work constitutes a first step towards an optimizer for a general-purpose programming language. An essential point is that updating operations are optimized as well. The optimizer is divided into two modules (RWR and OPT). RWR is a language module that translates FAD into an internal FAD. Whenever RWR recognizes an expression that can be expressed in the language processable by the optimizer OPT, it is handed over to the optimizer and optimized there. Exhaustive search is proposed as the search strategy for the optimizer.

In the extended O2 context, the decomposition of path expressions was investigated further [187]. The advantages of a typed algebra for these purposes are worked out, and a graphical notation provides an intuitive presentation. The authors pay particular attention to the factorization of common subexpressions. Some of the rewrite rules are taken from [802] and [801] and are applied profitably, for example the replacement of selections by join operators. The works in the Orion context [50, 51, 493], which concentrate on the treatment of path expressions, have also been mentioned already. Here, too, a working optimizer was developed.

As already mentioned, the first rule-based optimizer is due to Smith and Chang [823]. But only the more recent works led to a flourishing of the rule-based approach. Particularly worth mentioning here is the work of Freytag, which helped initiate this flourishing [295]. It shows how the optimizer of System R [784] can be emulated by means of a rule-based approach. The input consists of a Lisp-like expression: (select ). The projection list consists of attribute specifications of the form . These are also used for the selection predicates and join predicates. The algebra comprises operators of both the logical and the physical algebra. In detail, there are scan, sort, projection, and join operators in one logical and several physical variants. The generation of execution plans is divided into several steps, which in turn are decomposed into substeps (see Fig. 30.3). First, the query is translated into the logical algebra. Here, scan operators are built around the relations, and selections concerning only a single relation are folded into the scan operators. The second step generates access plans by replacing the scan operator with a simple file scan (FSCAN) or, where possible, with an index scan (ISCAN). The third step first generates different join orders and then determines the join methods. As in System R, a distinction is made between the sort-merge join and the nested-loop join. No statements whatsoever are made about the choice of a search strategy. Rather, the goal is to demonstrate the principal viability of the rule-based approach by modeling the System R optimizer with a rule system.
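The second of Freytag's steps, access plan generation, can be sketched as a rule that expands a logical scan into its physical alternatives; FSCAN and ISCAN are the operator names used in the text, while the operator representation itself is hypothetical.

#include <string>
#include <vector>

// Hypothetical operator over a mixed logical/physical algebra.
struct Op {
  std::string kind;                  // "SCAN", "FSCAN", "ISCAN(...)", ...
  std::string relation;              // relation being scanned
  std::vector<std::string> indexes;  // indexes available on the relation
};

// A logical SCAN is replaced by a file scan and, where possible, by
// one index scan per available index.
inline std::vector<Op> expandScan(const Op& scan) {
  std::vector<Op> alternatives;
  alternatives.push_back({"FSCAN", scan.relation, {}});
  for (const auto& idx : scan.indexes)
    alternatives.push_back({"ISCAN(" + idx + ")", scan.relation, {}});
  return alternatives;
}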
Note also the aforementioned work by Sciore and Sieg on the modularization of rule-based optimizers [780]. The main idea of Sciore and Sieg is to organize the rule set into modules and to assign each module its own search strategy, cost computation, and rule set. Modules can call other modules explicitly or implicitly pass their output set on to the next module. The first optimizer of the GOM system is also rule-based [486, 485]. Here, the entire rule set was organized into subsets similar to the modules. Control between the subsets is exercised by a heuristic net that indicates in which cases to branch to which further subset of rules. The structuring of optimizer knowledge is also the focus of [580].

In this context of structuring optimizers and reusing individual parts, we explicitly point once more to the work of Batory [60] from the Genesis context (see also Section 30.1.7). The aspect of reusing search functions, which unfortunately receives somewhat short shrift there, is treated in more detail in a paper by Lanzelotte and Valduriez [530]. There, a type hierarchy of existing search functions was designed and their interfaces were unified; the search functions themselves were modularized. Further works from the same group deal with the optimization of object-oriented queries [529, 533], with the treatment of paths in the foreground. A more recent work addresses the optimization of recursive queries in the object-oriented context [531].

Many commercial systems possess a query language and an optimizer. One of the few optimizers that are also described in the literature is that of ObjectStore [654]. Due to the simple query language, which allows only the selection of subsets, and the strict use of C semantics for Boolean expressions, however, most optimization opportunities are ruled out, and the “optimizer” is therefore very simple.

30.2 Commercial Query Compiler

30.2.1 The DB2 Query Compiler

30.2.2 The Oracle Query Compiler

Oracle still provides two modes for its optimizer. Depending on the user-specified optimizer mode, a query is optimized either by the rule-based optimizer (RBO) or by the cost-based optimizer (CBO). The RBO is a heuristic optimizer that resembles the simple optimizer of Chapter 2. Here we concentrate on the more powerful CBO. The user can also determine whether the optimizer should optimize for throughput or for response time. The CBO supports, among other things:

• nested-loop join, nested-loop outer join, index nested-loop join, sort-merge join, sort-merge outer join, hash join, hash outer join, cartesian join, full outer join, cluster join, anti-joins, semi-joins; bitmap indexes are used for star queries
• sort group-by
• bitmap indexes, bitmap join indexes
• index skip scans
• partitioned tables and indexes
• index-organized tables
• reverse key indexes
• function-based indexes
• SAMPLE clause in the SELECT statement
• parallel query and parallel DML
• star transformations and star joins
• query rewrite with materialized views
• cost: considers CPU, I/O, memory
• access paths: table scan, fast full index scan, index scan, ROWID scans (access a row by its ROWID), cluster scans, hash scans (the former two with prefetching)
Index scans come in several variants:

– index unique scan (UNIQUE or PRIMARY KEY constraints)
– index range scan (one or more leading columns of the key)
– index range scan descending
– index skip scan (> 1 leading key values not given)
– index full scan, index fast full scan
– index joins (joins indexes with a hash join; resembles index anding)
– bitmap joins (index anding/oring)
– cluster scan: for an indexed cluster, to retrieve rows with the same cluster id
– hash scan: to locate rows in a hash cluster

CBO: parsed query → [query transformer] → [estimator] → [plan generator] (1-16).

After the parser, the query consists of nested query blocks. Simple rewrites:

• eliminate BETWEEN
• eliminate x in (c1 . . . cn) (also uses an IN-LIST iterator as outer table constructor in a d-join or nested-loop-join-like operation)

Query transformer:

• view merging
• predicate pushing
• subquery unnesting
• query rewrite using materialized views (cost-based)

The remaining subplans for nested query blocks are ordered in an efficient manner.

Plan generator:

• choose access path, join order (upper limit on the number of permutations considered), join method
• generate a subplan for every block in a bottom-up fashion
• (> 1 for still nested queries and unmerged views)
• stop generating more plans if there already exists a cheap plan
• starting plan: tables ordered by their effective cardinality
• normally considers only left-deep (zig-zag) trees
• single-row joins are placed first (based on unique and key constraints)
• join statement with outer join: the table with the outer join operator must come after the other table in the condition in the join order; the optimizer does not consider join orders that violate this rule
• NOT IN (SELECT . . . ) becomes an anti-join that is executed as a nested-loop join by default, unless hints are given and various conditions are met which allow the transformation of the NOT IN uncorrelated subquery into a sort-merge or hash anti-join
• EXISTS (SELECT . . . ) becomes a semi-join, executed as index nested loops if there is an index; otherwise a nested-loop join is used by default for EXISTS and IN subqueries that cannot be merged with the containing query, unless a hint specifies otherwise and conditions are met that allow the transformation of the subquery into a sort-merge or hash semi-join
• star query detection

Cost:

• takes unique/key constraints into consideration
• low/high values and uniform distribution
• host variables: guess a small selectivity value to favor index access
• histograms
• common subexpression optimization
• complex view merging
• push-join predicate
• bitmap access paths for tables with only B-tree indexes
• subquery unnesting
• index joins

Further:

• Oracle allows user hints in SQL statements to influence the optimizer; for example, join methods can be given explicitly.

Parameters:

• HASH AREA SIZE
• SORT AREA SIZE
• DB FILE MULTIBLOCK READ COUNT (number of prefetched pages)

Statistics:

• table statistics: number of rows, number of blocks, average row length
• column statistics: number of distinct values, number of nulls, data distribution
• index statistics: number of keys (from column statistics?), number of leaf blocks, levels, clustering factor (degree of collocation of index blocks and data blocks, 3-17)
• system statistics: I/O performance and utilization, CPU performance and utilization
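The cost bullets above rely on distinct-value counts and a uniform distribution between the low and high values. A minimal illustration of such textbook estimates follows; the statistics record is a hypothetical stand-in, not Oracle's dictionary format.

// Hypothetical per-column statistics.
struct ColumnStats {
  double numRows;      // table cardinality
  double numDistinct;  // number of distinct values
  double low, high;    // low/high value of the column
};

// Equality selectivity under uniformity: 1/NDV.
inline double eqSelectivity(const ColumnStats& s) {
  return s.numDistinct > 0 ? 1.0 / s.numDistinct : 1.0;
}

// Range selectivity for "col < c": covered fraction of [low, high].
inline double ltSelectivity(const ColumnStats& s, double c) {
  if (c <= s.low) return 0.0;
  if (c > s.high || s.high <= s.low) return 1.0;
  return (c - s.low) / (s.high - s.low);
}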
Generating statistics:

• estimation based on random data sampling (row sampling, block sampling)
• exact computation
• user-defined statistics collection methods

Histograms:

• height-based histograms (approximately equal number of values per bucket)
• value-based histograms, used when the number of distinct values ≤ the number of buckets

Further features:

• support of index-only queries
• index-organized tables
• bitmap indexes (also for null values, x <> const)
• conversion of B-tree result RID lists to bitmaps for further bitmap anding
• bitmaps and count
• bitmap join index
• cluster tables (cluster rows of different tables on the same block)
• hash clusters
• hint USE CONCAT: OR ::= UNION ALL
• hint STAR TRANSFORMATION: see Oracle9i Database Concepts
• NOT IN ::= anti-join
• EXISTS ::= special join preserving duplicates and adding no phantom duplicates (semi-join) (5-27)
• continue 5-35

30.2.3 The SQL Server Query Compiler

Part VI Selected Topics

Chapter 31 Generating Plans for Top-N-Queries?

31.1 Motivation and Introduction

Motivation:

• first by user (ordered)
• optimize for n rows (user/cursor)
• exists(subquery): optimize for 1 row
• having count(*) <= n

31.2 Optimizing for the First Tuple

31.3 Optimizing for the First N Tuples

• nl-join instead of sm/hash join
• index access over table scan
• disable prefetching

[127, 128, 129] [150, 246] [268, 269, 433] [554] [379] (also contains inverted list algorithms under frequent updates) [554]

Chapter 32 Recursive Queries

Chapter 33 Issues Introduced by OQL

33.1 Type-Based Rewriting and Pointer Chasing Elimination

The first rewrite technique especially tailored for the object-oriented context is type-based rewriting. Consider the query

select distinct sn, ssn, ssa
from s in Student
where sg > 8 and ssa < 30
define sn = s.name
       sg = s.gpa
       ss = s.supervisor
       ssn = ss.name
       ssa = ss.age

Figure 33.1: Algebraic representation of a query (PROJECT [sn, sa, ssn, ssa] over SELECT [sg > 8 and ssa < 30] over EXPAND [sn:s.name, sg:s.gpa, ss:s.supervisor] over EXPAND [ssn:ss.name, ssa:ss.age] over SCAN [s:student])

The algebraic expression in Fig. 33.1 implies a scan of all students and a subsequent dereferencing of the supervisor attribute in order to access the supervisors. If not all supervisors fit into main memory, this may result in many page accesses. Further, if there exists an index on the supervisor's age, and the selection condition ssa < 30 is highly selective, the index should be applied in order to retrieve only those supervisors required for answering the query. Type-based rewriting enables this kind of optimization. For any expression of a certain type with an associated extent, the extent is introduced in the from clause. For our query this results in

select distinct sn, pn, pa
from s in Student, p in Professor
where sg > 8 and pa < 30 and ss = p
define sn = s.name
       sg = s.gpa
       ss = s.supervisor
       pn = ss.name
       pa = ss.age

As a side-effect, the attribute traversal from students via supervisor to professors is replaced by a join. Now, join ordering allows for several new plans that could not be investigated otherwise.
For example, we could exploit the above-mentioned index to retrieve the young professors and join them with the students having a gpa greater than 8. The corresponding plan is given in Fig. 33.2.

Figure 33.2: A join replacing pointer chasing (PROJECT [sn, pn, pa] on top of JOIN [ss = p]; the left input is SELECT [sg > 8] over EXPAND [sn:s.name, sg:s.gpa, ss:s.supervisor] over Student [s], the right input is SELECT [pa < 30] over EXPAND [pn:p.name, pa:p.age] over Professor [p])

Turning implicit joins or pointer chasing into explicit joins which can be freely reordered is an original query optimization technique for object-oriented queries. Note that the plan generation component is still allowed to turn the explicit join into an implicit join again. Consider the query

select distinct p
from p in Professor
where p.room.number = 209

Straightforward evaluation of this query would scan all professors. For every professor, the room relationship would be traversed to find the room where the professor resides. Last, the room's number would be retrieved and tested to be 209. Using the inverse relationship, the query could as well be rewritten to

select distinct r.occupiedBy
from r in Room
where r.number = 209

The evaluation of this query can be much more efficient, especially if there exists an index on the room number. Rewriting queries by exploiting inverse relationships is another rewrite technique to be applied during Rewrite Phase I.

33.2 Class Hierarchies

Another set of equivalences known from the relational context involves the UNION operator (∪) and plays a vital role in dealing with class/extent hierarchies. Consider the simple class hierarchy given in Figure 33.3.

Figure 33.3: A sample class hierarchy (CEO is a subclass of Manager, which is a subclass of Employee; Employee has the attributes name: string, salary: int, and boss: Manager; Manager refines boss to CEO)

Obviously, for the user, it must appear that the extent of Employee contains all Managers. However, the system has different alternatives to implement extents. Most OBMSs organize an object base into areas or volumes. Each area or volume is then further organized into several files. A file is a logical grouping of objects not necessarily consisting of subsequent physical pages on disk. Files don't share pages. The simplest possible implementation to scan all objects belonging to a certain extent is to perform an area scan and select those objects belonging to the extent in question. Obviously, this is far too expensive. Therefore, some more sophisticated possibilities to realize extents and scans over them are needed. The different possible implementations can be classified along two dimensions. The first dimension distinguishes between logical and physical extents, the second distinguishes between strict and (non-strict) extents.

Logical vs. Physical Extents. An extent can be realized as a collection of object identifiers. A scan over the extent is then implemented by a scan over all the object identifiers contained in the collection. Subsequently, the object identifiers are dereferenced to yield the objects themselves. This approach leads to logical extents. Another possibility is to implement extent membership by physical containment. The best alternative is to store all objects of an extent in a file. This results in physical extents. A scan over a physical extent is then implemented by a file scan.

Extents vs. Strict Extents. A strict extent contains the objects (or their OIDs) of a class excluding those of its subclasses. A non-strict extent contains the objects of a class and all objects of its subclasses. Given a class C, any strict extent of a subclass C′ of C is called a subextent of C. Obviously, the two classifications are orthogonal. Applying them both results in the four possibilities presented graphically in Fig. 33.4.

Figure 33.4: Implementation of extents. The two dimensions combine as follows: logical/excluding: C: {id1}, C1: {id2}, C2: {id3}; logical/including: C: {id1}, C1: {id1, id2}, C2: {id1, id3}; physical/excluding: C: ob1, C1: ob2, C2: ob3; physical/including: C: ob1, C1: ob1, ob2, C2: ob1, ob3.
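Under strict extents, a scan of the (non-strict) extent of a class becomes a union over the strict extents of the class and all of its subclasses, e.g., Employee[x] ∪ Manager[x] ∪ CEO[x] for the hierarchy of Figure 33.3. A minimal sketch of this plan construction, with hypothetical algebra nodes:

#include <memory>
#include <string>
#include <vector>

// Hypothetical algebra nodes.
struct AlgOp {
  std::string kind;    // "SCAN" or "UNION"
  std::string extent;  // strict extent scanned by a SCAN
  std::vector<std::unique_ptr<AlgOp>> children;
};

// Build UNION(SCAN(C), SCAN(C1), ..., SCAN(Cn)) over the strict
// extent of a class and the strict extents of its subclasses.
inline std::unique_ptr<AlgOp> scanNonStrict(
    const std::string& cls, const std::vector<std::string>& subclasses) {
  auto scan = [](const std::string& e) {
    auto s = std::make_unique<AlgOp>();
    s->kind = "SCAN";
    s->extent = e;
    return s;
  };
  auto u = std::make_unique<AlgOp>();
  u->kind = "UNION";
  u->children.push_back(scan(cls));
  for (const auto& sub : subclasses) u->children.push_back(scan(sub));
  return u;
}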
[195] strongly argues that strict extents are the method of choice. The reason is that only this way the query optimizer can exploit differences among extents. For example, there might be an index on the age of Manager but not for Employee. This difference can only be exploited for a query including a restriction on age if we have strict extents. However, strict extents result in initial query plans including UNION operators. Consider the query

select e
from e in Employee
where e.salary > 100.000

The initial plan is

σsa>100.000(χsa:x.salary((Employee[x] ∪ Manager[x]) ∪ CEO[x]))

Hence, algebraic equivalences are needed to reorder UNION operators with other algebraic operators. The most important equivalences are

e1 ∪ e2 ≡ e2 ∪ e1                                      (33.1)
e1 ∪ (e2 ∪ e3) ≡ (e1 ∪ e2) ∪ e3                        (33.2)
σp(e1 ∪ e2) ≡ σp(e1) ∪ σp(e2)                          (33.3)
χa:e(e1 ∪ e2) ≡ χa:e(e1) ∪ χa:e(e2)                    (33.4)
(e1 ∪ e2) ⋈p e3 ≡ (e1 ⋈p e3) ∪ (e2 ⋈p e3)              (33.5)

Equivalences containing the UNION operator sometimes involve tricky typing constraints. These go beyond the current chapter, and the reader is referred to [613].
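As an illustration, here is a minimal sketch of applying equivalence (33.3), pushing a selection past a union so that each branch can use its own access path (e.g., the index on Manager's age); the expression representation is hypothetical.

#include <memory>
#include <string>
#include <vector>

// Hypothetical expression nodes.
struct Expr {
  std::string kind;  // "UNION", "SELECT", "LEAF"
  std::string pred;  // predicate of a SELECT
  std::vector<std::unique_ptr<Expr>> children;
};

// sigma_p(e1 ∪ e2) == sigma_p(e1) ∪ sigma_p(e2): rewrite a selection
// over a union into a union of selections (equivalence 33.3).
inline std::unique_ptr<Expr> pushSelectPastUnion(std::unique_ptr<Expr> sel) {
  if (sel->kind != "SELECT" || sel->children.size() != 1 ||
      sel->children[0]->kind != "UNION")
    return sel;  // rule not applicable; return the expression unchanged
  auto uni = std::move(sel->children[0]);
  for (auto& branch : uni->children) {
    auto s = std::make_unique<Expr>();
    s->kind = "SELECT";
    s->pred = sel->pred;
    s->children.push_back(std::move(branch));
    branch = std::move(s);
  }
  return uni;
}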
33.3 Cardinalities and Cost Functions

Chapter 34 Issues Introduced by XPath

34.1 A Naive XPath-Interpreter and its Problems

34.2 Dynamic Programming and Memoization [338, 340, 339]

34.3 Naive Translation of XPath to Algebra

34.4 Pushing Duplicate Elimination

34.5 Avoiding Duplicate Work

34.6 Avoiding Duplicate Generation [416]

34.7 Index Usage and Materialized Views [48]

34.8 Cardinalities and Costs

34.9 Bibliography

Chapter 35 Issues Introduced by XQuery

35.1 Reordering in Ordered Context

35.2 Result Construction [281, 282] [797]

35.3 Unnesting Nested XQueries. Unnesting with error: [673] [593, 595, 594, 596]

35.4 Cardinalities and Cost Functions. Cardinality: [168, 941, 942, 761] [7]; XPathLearner: [560]; Polyzotis et al. (XSKETCH): [695, 696, 693], [697]

35.5 Bibliography [599] [883] [233]. Numbering: [280]. Timber [459], TAX algebra [462], physical algebra of Timber [674]. Structural joins [20, 827]. SAL: [70], TAX: [462], XAL: [293].

• XML statistics for the hidden web: [8]
• XPath selectivity for internet-scale applications: [7]
• StatiX: [294]
• IMAX: incremental statistics [711]
• metrics for XML document collections: [499]
• output size of the containment join: [909]
• Bloom histogram: [910]

Views and XML: [2]. Quilt: [141]. Timber: [459]. Monet: [774]. Natix: NoK: [970], correlated XPath: [971]. Wood: [933, 934, 935]. Path-based approach to storage (XRel): [961]. Grust: [370, 372, 371, 373, 873]. Liefke: loop fusion etc.: [559]. Benchmarking: XMach-1: [98], MBench: [751], XBench: [667, 952, 953], XMark: [775], XOO7: [108]. Rewriting: [237, 363, 364] [236, 428]. Incremental schema validation: [102, 672]. Franklin (filtering): [240].

Chapter 36 Outlook

What we did not talk about: multiple query optimization, semantic query optimization, special techniques for optimization in OBMSs, multi-media databases, object-relational databases, spatial databases, temporal databases, and query optimization for parallel and distributed database systems.

Multi query optimization? [787]
Parametric/dynamic/adaptive query optimization? [33, 34, 35, 29, 361, 354] [455, 456, 474, 890] [45]
Parallel database systems?
Distributed database systems? [514]
Recursive queries?
Multi database systems?
Temporal database systems?
Spatial database systems?
Translation of triggers and updates?
Online queries (streams)?
Approximate answers? [330]

Appendix A Query Languages?

A.1 Designing a query language

Requirements and design principles for object-oriented query languages: [427] [83]

A.2 SQL

A.3 OQL

A.4 XPath

A.5 XQuery

A.6 Datalog

Appendix B Query Execution Engine (?)

• Overview books: [403, 316]
• Overview: Graefe [347, 348]
• Implementation of division [344, 353, 355]
• Implementation of division and set-containment joins [708]
• Hash vs. sort: [349, 358]
• Heap-filter merge join: [346]
• Hash-Teams

Appendix C Glossary of Rewrite and Optimization Techniques

trivopt Trivial evaluations, e.g., those for contradictory predicates, are carried out immediately. This is an optimization technique often performed already at the source level.

pareval If one term of a conjunction evaluates to false, the remaining terms are not evaluated anymore. This falls out automatically when selections are executed one after another.

pushnot If a predicate has the form ¬(p1 ∧ p2), pareval is not applicable. Therefore, negations are pushed inward; on ¬p1 ∨ ¬p2, pareval applies again. Pushing negations inward is also indispensable for correctness in the context of NULL values. This is an optimization technique often performed already at the source level.
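Picking up the pushnot entry directly above: a minimal sketch of pushing a negation inward via De Morgan's laws over a hypothetical Boolean expression tree; for brevity, only a negation at the root is rewritten, while a full implementation would also recurse into AND/OR children.

#include <memory>
#include <string>

// Hypothetical Boolean expression nodes; ATOM leaves carry no children.
struct BExpr {
  std::string op;                      // "NOT", "AND", "OR", or "ATOM"
  std::unique_ptr<BExpr> left, right;  // NOT uses only `left`
};

// pushnot: NOT(p1 AND p2) becomes NOT(p1) OR NOT(p2), and dually,
// so that pareval-style short-circuiting applies again.
std::unique_ptr<BExpr> pushNot(std::unique_ptr<BExpr> e) {
  if (e->op == "NOT" && (e->left->op == "AND" || e->left->op == "OR")) {
    auto inner = std::move(e->left);
    auto mkNot = [](std::unique_ptr<BExpr> c) {
      auto n = std::make_unique<BExpr>();
      n->op = "NOT";
      n->left = std::move(c);
      return n;
    };
    auto r = std::make_unique<BExpr>();
    r->op = (inner->op == "AND") ? "OR" : "AND";
    r->left = pushNot(mkNot(std::move(inner->left)));
    r->right = pushNot(mkNot(std::move(inner->right)));
    return r;
  }
  return e;
}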
bxp Generalizing the problems addressed in pareval and pushnot leads to the optimization of general Boolean predicates.

trans By exploiting the transitivity of comparison operations, new selection predicates can be derived and constants can be propagated. This optimization technique enlarges the search space and is likewise performed at the source level. Some systems skip this step if very many relations are to be joined, in order not to enlarge the search space even further [322, 323].

selpush Selections are performed as early as possible. This technique does not always lead to optimal execution plans and thus constitutes a heuristic. This optimization technique restricts the search space.

projpush The technique for handling projections is not quite as simple as that for selections. One must distinguish whether the projection eliminates duplicates or not. Depending on this, it makes sense to move the projection to the root of the operator graph or towards the leaves. Projection reduces the memory requirements of intermediate results, since the tuples contain fewer attributes. If the projection eliminates duplicates, the number of tuples may be reduced as well. Duplicate elimination as such, however, is a very expensive operation. It is usually implemented by sorting, although for large data volumes there are better alternatives; hash-based methods are also suitable for duplicate elimination. This optimization technique restricts the search space.

grouppush Pushing a grouping operation past a join can lead to better plans.

crossjoin A cross product followed by a selection is converted into a join whenever possible. This optimization technique restricts the search space, since plans with cross products are avoided.

nocross Cross products are avoided whenever possible or, if this is not possible, performed as late as possible. This technique reduces the search space but does not always lead to optimal execution plans.

semjoin A join operation can be replaced by a semijoin if only the attributes of one relation are used subsequently.

joinor The evaluation order of join operations is critical. Therefore, a number of methods have been developed to determine the optimal or a near-optimal order of join operations. Often, the search space is restricted to lists of join operations. The motivation is to shrink the search space and to produce only a single intermediate relation at a time. This method no longer guarantees an optimal result.

joinpush Tables that are guaranteed to produce a single tuple are always pushed to be joined first. This reduces the search space. The single-tuple condition can be evaluated by determining whether all key attributes of a relation are fully qualified [322, 323].

elimredjoin Eliminate redundant join operations. See Sections. . . XXX

indnest A direct evaluation of nested queries proceeds via nested loops: a subquery is evaluated for every binding produced by the outer query. This requires quadratic effort and is therefore very inefficient. If the inner query can be evaluated independently of the outer query, it is pulled out and evaluated separately. Further optimizations of nested queries are possible.

unnest Unnesting of queries [189, 191, 314, 494, 500, 501, 689, 830, 832, 833].

compop It is often useful to combine several operations into one more complex operation. For example, two selections executed in sequence can be replaced by one selection with a more complex predicate. Combining joins, selections, and projections can be worthwhile as well.

comsubexpr Common subexpressions are evaluated only once. This covers techniques that prevent reading the same data from secondary storage repeatedly as well as techniques that materialize intermediate results of subexpressions. The latter should only be applied if evaluating the subexpression k times is more expensive than evaluating it once plus producing the materialized result and reading it k times, where k is the number of occurrences in the plan.
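The cost test stated in the comsubexpr entry can be written down directly; the cost names are illustrative and not tied to a particular cost model.

// Materialize a subexpression occurring k times only if one
// evaluation plus writing the result and reading it k times is
// cheaper than k independent evaluations.
inline bool shouldMaterialize(double evalCost, double writeCost,
                              double readCost, int k) {
  return evalCost + writeCost + k * readCost < k * evalCost;
}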
dynminmax Dynamically obtained minima and maxima of attribute values can be used to generate additional restrictions. This technique also works very well for uncorrelated queries: min and max values are used to derive additional restrictions for the query [500, 322, 323].

pma Predicate move-around moves predicates between queries and subqueries. Mostly they are duplicated in order to yield as many restrictions in a block as possible [551]. As a special case, predicates will be pushed into view definitions if they have to be materialized temporarily [322, 323].

exproj For subqueries with exists, prune unnecessary entries in the select clause. The intention behind this is that attributes projected unnecessarily might influence the optimizer's decision on the optimal access path [322, 323].

vm View merging expands the view definition within the query such that it can be optimized together with the query. Duplicate accesses to the view are resolved by different copies of the view's definition in order to facilitate unnesting [322, 323, 689].

inConstSet2Or A predicate of the form x ∈ {a1, . . . , an} is transformed into a sequence of disjunctions x = a1 ∨ . . . ∨ x = an if the ai are constants, in order to allow index or-ing (TID-list operations or bitvector operations) [322, 323].

like1 If the like predicate does not start with %, then a prefix index can be used.

like2 The pattern is analyzed to see whether a range of values can be extracted such that the pattern does not have to be evaluated on all tuples. The result is either a pretest or an index access [322, 323]. (A sketch is given below.)

like3 Special indexes supporting like predicates are introduced.

sort Existing sort orders can be exploited by several operators. If no sort order exists, it can be worthwhile to establish one [818], e.g., for successive joins or for joins followed by grouping. The grouping attributes can be permuted to bring them in line with a given sort order [322, 323].

aps Access paths are employed whenever this is profitable. For example, the query select count(*) from R; can be evaluated efficiently by an index scan [171].

tmpidx Sometimes it can be useful to create temporary access paths.

optimpl In general, several implementations exist for an algebraic operator. The implementation that is cheapest for the operator in the case at hand should always be selected. The representation of the intermediate result also matters. For example, relations can be represented explicitly or implicitly, where the latter representation contains only pointers to tuples or surrogates of the tuples. Taken further, this technique leads to the TID-list-based operators.

setpipe An algebraic expression can be evaluated either set-oriented or in an overlapped fashion (pipelining). The latter avoids the generation of large intermediate results.

tmplay Temporarily changing the layout of an object can be quite worthwhile if the costs incurred by this change are more than compensated by the benefit of using this layout repeatedly. A typical example is pointer swizzling.
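Picking up the like2 entry above: a minimal sketch of deriving a range from a like pattern without a leading wildcard; the successor computation is deliberately simplified and ignores byte overflow and escape characters.

#include <optional>
#include <string>
#include <utility>

// For a pattern with a non-empty literal prefix, return the range
// [prefix, successor(prefix)) usable as a pretest or for an index
// range scan; return nothing for a leading wildcard.
inline std::optional<std::pair<std::string, std::string>>
likeRange(const std::string& pattern) {
  std::string prefix;
  for (char c : pattern) {
    if (c == '%' || c == '_') break;  // a wildcard ends the literal prefix
    prefix.push_back(c);
  }
  if (prefix.empty()) return std::nullopt;
  std::string upper = prefix;
  upper.back()++;  // simplified successor; ignores 0xFF overflow
  return std::make_pair(prefix, upper);
}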
matSubQ If a query is not unnested, then for every argument combination passed to the subquery, the result is materialized in order to avoid duplicate computation of the same subquery expression for the same argument combination [322, 323]. This technique is favorable for detachment [845, 931, 962].

AggrJoin Joins with non-equi join predicates based on ≤ or < can be processed more efficiently than by a cross product with a subsequent selection [193].

ClassHier Class hierarchies involve the computation of queries over a union of extents (if implemented that way). Pushing algebraic operations past unions often allows for more efficient plans [195].

AggrIDX Use an index to determine aggregate values like min/max/avg/count.

rid/tidsort When several tuples qualify during an index scan, the resulting TIDs can be sorted in order to guarantee sequential access to the base relation.

multIDX Perform operations like union and disjunction on the outcome of an index scan.

multIDXsplit If two ranges are queried within the same query ([1-10], [20-30]), consider multIDX or use a single scan through the index [1-30] with an additional qualification predicate.

multIDXor Queries with more conditions on indexed attributes can be evaluated by more complex combinations of index scans and TID-list/bitvector operations (A = 5 and (B = 3 or B = 4)).

scanDirChange During multiple sequential scans of a relation (e.g., for a blockwise nested-loop join), the direction of the scan can be changed in order to reuse as many of the pages in the buffer as possible.

lock The optimizer should choose the correct locks to set on tables. For example, if a whole table is scanned, a table lock should be set.

expFunMat Expensive functions can be cached during query evaluation in order to avoid their repeated evaluation for the same arguments [414].

expFunFil Easier-to-evaluate predicates that are implied by more expensive predicates can serve as filters in order to avoid the evaluation of the expensive predicate on all tuples.

stop Stop evaluation after the first tuple qualifies. This is good for existential subqueries, universal subqueries (disqualify), semi-joins for distinct results, and the like.

expensive projections Evaluate expensive projections 1. at the end, since the fewest distinct values occur there, or 2. push them down if a cache for function results can thereby be avoided. In the OO context this is problematic: objects must be available as a whole for functions/methods, so a simple strategy is not possible.

distinct/sorting select distinct a,b,c ... order by a,b can also be sorted on a,b,c. This does not hurt the requested order, but simplifies duplicate elimination: only one sort is necessary.

index access
• by key
• by key range
• by dashed key range (set of keys/key ranges)
• index anding/oring

alternative operator implementations E.g., for the join: nlj, bnlj, hj, grace-hash, hybrid-hash, smj, diag-join, star-join.

distpd Push down or pull up distinct.

aggregate with distinct select a, agg(distinct b) ... group by a ===> sort on a,b; duplicate elimination; group by a, sum(b). Alternative: agg(distinct *) is implemented such that it uses a hash table to eliminate duplicates; this is only good if the number of groups is small and the number of distinct values in each group is small.

XXX - use keys, inclusion dependencies, fds etc. (all user specified and derived); propagate keys over joins as fds; for a function call, the derived IU is functionally dependent on the arguments of the function call if the function is deterministic; keys can be represented as sets of IUs or as bitvectors (given a numbering of IUs); if the numbering is imprecise, bitvectors can be used as filters (like for signatures).
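As the closing note suggests, key and functional-dependency reasoning becomes cheap if sets of IUs are represented as bitvectors under a fixed numbering; a minimal sketch follows, with the caveat from the note that an imprecise numbering degrades the bitvector to a signature-like filter.

#include <cstdint>

// A set of IUs as a bitvector under a fixed numbering (up to 64 IUs).
using IuSet = std::uint64_t;

inline IuSet singleton(unsigned iu) { return IuSet{1} << iu; }  // requires iu < 64

// Subset test, e.g. "is this key contained in the available IUs?";
// with an imprecise numbering a positive answer is only a filter hit
// and must be rechecked against the exact representation.
inline bool subsetOf(IuSet key, IuSet available) {
  return (key & ~available) == 0;
}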
Appendix D Useful Formulas

The following identities can be found in the book by Graham, Knuth, and Patashnik [362]. We use the following definition of binomial coefficients:

\binom{n}{k} = \begin{cases} \frac{n!}{k!\,(n-k)!} & \text{if } 0 \le k \le n \\ 0 & \text{else} \end{cases}    (D.1)

We start with some simple identities.

\binom{n}{k} = \binom{n}{n-k}    (D.2)
\binom{n}{k} = \frac{n}{k} \binom{n-1}{k-1}    (D.3)
k \binom{n}{k} = n \binom{n-1}{k-1}    (D.4)
(n-k) \binom{n}{k} = n \binom{n-1}{k}    (D.5)
(n-k) \binom{n}{k} = n \binom{n-1}{n-k-1}    (D.6)
\binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}    (D.7)
\binom{r}{m} \binom{m}{k} = \binom{r}{k} \binom{r-k}{m-k}    (D.8)

The following identities are good for sums of binomial coefficients.

\sum_{k=0}^{n} \binom{n}{k} = 2^n    (D.9)
\sum_{k=0}^{n} \binom{k}{m} = \binom{n+1}{m+1}    (D.10)
\sum_{k=0}^{n} \binom{m+k}{k} = \binom{m+n+1}{m+1} = \binom{m+n+1}{n}    (D.11)
\sum_{k=0}^{n} \binom{m-n+k}{k} = \binom{m+1}{n}    (D.12)

From Identities D.2 and D.11 it follows that

\sum_{k=0}^{m} \binom{k+r}{r} = \binom{m+r+1}{r+1}    (D.13)

For sums of products, we have

\sum_{k=0}^{n} \binom{r}{m+k} \binom{s}{n-k} = \binom{r+s}{m+n}    (D.14)
\sum_{k=0}^{n} \binom{l-k}{m} \binom{q+k}{n} = \binom{l+q+1}{m+n+1}    (D.15)
\sum_{k=0}^{n} \binom{l}{m+k} \binom{s}{n+k} = \binom{l+s}{l-m+n}    (D.16)

Last,

\sum_{k=0}^{n} k \binom{n}{k} = n 2^{n-1}

Bibliography

[1] K. Aberer and G. Fischer. Semantic query optimization for methods in object-oriented database systems. In Proc. IEEE Conference on Data Engineering, pages 70–79, 1995.

[2] S. Abiteboul. On Views and XML. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 1–9, 1999.

[3] S. Abiteboul, C. Beeri, M. Gyssens, and D. Van Gucht. An introduction to the completeness of languages for complex objects and nested relations. In S. Abiteboul, P.C. Fischer, and H.-J. Schek, editors, Nested Relations and Complex Objects in Databases, pages 117–138. Lecture Notes in Computer Science 361, Springer, 1987.

[4] S. Abiteboul and N. Bidoit. Non first normal form relations: An algebra allowing restructuring. Journal of Computer Science and Systems, 33(3):361, 1986.

[5] S. Abiteboul, S. Cluet, V. Christophides, T. Milo, G. Moerkotte, and J. Simeon. Querying documents in object databases. International Journal on Digital Libraries, 1(1):5–19, April 1997.

[6] S. Abiteboul and O. Duschka. Complexity of answering queries using materialized views. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 254–263, 1998.

[7] A. Aboulnaga, A. Alameldeen, and J. Naughton. Estimating the selectivity of XML path expressions for internet scale applications. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 591–600, 2001.

[8] A. Aboulnaga and J. Naughton. Building XML statistics for the hidden web. In Int. Conference on Information and Knowledge Management (CIKM), pages 358–365, 2003.

[9] W. Abu-Sufah, D. J. Kuch, and D. H. Lawrie. On the performance enhancement of paging systems through program analysis and transformations. IEEE Trans. on Computers, C-50(5):341–355, 1981.

[10] B. Adelberg, H. Garcia-Molina, and J. Widom. The STRIP rule system for efficiently maintaining derived data. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 147–158, 1997.

[11] F. Afrati, M. Gergatsoulis, and T. Kavalieros. Answering queries using materialized views with disjunctions. In Proc. Int. Conf. on Database Theory (ICDT), pages 435–452, 1999.

[12] F. Afrati, C. Li, and J. Ullman.
Generating efficient plans for queries using views. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 319–330, 2001. [13] F. Afrati and C. Papadimitriou. The parallel complexity of simple chain queries. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 210–?, 1987. [14] V. Aggelis and S. Cosmadakis. Optimization of nested sql queries by tableau equivalence. In Int. Workshop on Database Programming Languages, pages 31–42, 1999. [15] D. Agrawal, A. Abbadi, A. Singh, and T. Yurek. Efficient view maintenance at data warehouses. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 417–427, 1997. [16] A. Aho, P. Denning, and J. Ullman. Principles of optimal page replacement. Journal of the ACM, 18(1):80–93, 1971. [17] A. Aho, Y. Sagiv, and J. Ullman. Efficient optimization of a class of relational expressions. ACM Trans. on Database Systems, 4(4):435–454, 1979. [18] A. Aho, Y. Sagiv, and J. Ullman. Equivalence of relational expressions. SIAM J. on Computing, 8(2):218–246, 1979. [19] A. V. Aho, C. Beeri, and J. D. Ullman. The theory of joins in relational databases. ACM Trans. on Database Systems, 4(3):297–314, 1979. [20] S. Al-Khalifa, H. Jagadish, N. Koudas, J. Patel, D. Srivastava, and Y. Wu. Structural joins: A primitive for efficient XML query pattern matching. In Proc. IEEE Conference on Data Engineering, pages 141–152, 2002. [21] J. Albert. Algebraic properties of bag data types. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 211–219, 1991. [22] F. E. Allen and J. Cocke. A catalogue of optimizing transformations. In R. Rustin, editor, Design and optimization of compilers, pages 1–30. Prentice Hall, 1971. [23] N. Alon, P. Gibbons, Y. Matias, and M. Szegedy. Tracking join and selfjoin sizes in limited storage. J. Comput System Sciences, 35(4):391–432, 2002. [24] N. Alon, Y. Matias, and M. Szegedy. The space complexity of approximating the frequency moments. J. of Computer and System Sciences, 58(1):137–147, 1999. BIBLIOGRAPHY 627 [25] J. Alsabbagh and V. Rahavan. Analysis of common subexpression exploitation models in multiple-query processing. In Proc. IEEE Conference on Data Engineering, pages 488–497, 1994. [26] P. Alsberg. Space and time savings through large database compression and dynamic restructuring. In Proc IEEE 63,8, Aug. 1975. [27] D. Donoho amd M. Elad. Optimally sparse representation in general (non-orthogonal) dictionaries via ℓ1 minimization. Proc. of the National Academy of Sciences, 100(5):2197–2202, 2003. [28] S. Amer-Yahia, S. Cho, L. Lakshmanan, and D. Srivastava. Efficient algorithms for minimizing tree pattern queries. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 497–508, 2001. [29] L. Amsaleg, M. Franklin, A. Tomasic, and T. Urhan. Scrambling query plans to cope with unexpected delay. In 4th Int. Conference on Parallel and Distributed Information Systems (PDIS), Palm Beach, Fl, 1996. [30] C. Zuzarte an X. Yu. Fast approximate computation of statistics on views. In Proc. of the ACM SIGMOD Conf. on Management of Data, page 724, 2006. [31] O. Anfindsen. A study of access path selection in DB2. Technical report, Norwegian Telecommunication Administration and University of Oslo, Norway, Oct. 1989. [32] G. Antoshenkov. Random sampling from pseudo-ranked b+ -trees. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 375–382, 1992. [33] G. Antoshenkov. Dynamic query optimization in RDB/VMS. In Proc. IEEE Conference on Data Engineering, pages 538–547, Vienna, Apr. 1993. [34] G. Antoshenkov. 
Query processing in DEC Rdb: Major issues and future challenges. IEEE Data Engineering Bulletin, 16:42–52, Dec. 1993. [35] G. Antoshenkov. Dynamic optimization of index scans restricted by booleans. In Proc. IEEE Conference on Data Engineering, pages 430– 440, 1996. [36] P. Aoki. Algorithms for index-assisted selectivity estimation. Technical Report UCB/CSD-98-1021, University of California, Berkeley, Oct 1998. [37] P.M.G. Apers, A.R. Hevner, and S.B. Yao. Optimization algorithms for distributed queries. IEEE Trans. on Software Eng., 9(1):57–68, 1983. [38] P.M.G. Apers, A.R. Hevner, and S.B. Yao. Optimization algorithms for distributed queries. IEEE Trans. on Software Eng., 9(1):57–68, 1983. [39] R. Ashenhurst. Acm forum. Communications of the ACM, 20(8):609–612, 1977. 628 BIBLIOGRAPHY [40] M. Astrahan, M. Schkolnick, and K. Whang. Counting unique values of an attribute without sorting. Information Systems, 12(1):11–15, 1987. [41] M. M. Astrahan and D. D. Chamberlin. Implementation of a structured English query language. Communications of the ACM, 18(10):580–588, 1975. [42] M.M. Astrahan, M.W. Blasgen, D.D. Chamberlin, K.P. Eswaran, J.N. Gray, P.P. Griffiths, W.F. King, R.A. Lorie, P.R. Mc Jones, J.W. Mehl, G.R. Putzolu, I.L. Traiger, B.W. Wade, and V. Watson. System R: relational approach to database management. ACM Transactions on Database Systems, 1(2):97–137, June 1976. [43] R. Avnur and J. Hellerstein. Eddies: Continiously adaptive query optimization. In Proc. of the ACM SIGMOD Conf. on Management of Data, 2000. [44] B. Babcock and S. Chaudhuri. Towards a robust query optimizer: A principled and practical approach. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 119–130, 2005. [45] S. Babu, P. Bizarro, and D. DeWitt. Proactive re-optimization. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 107–118, 2005. [46] L. Baekgaard and L. Mark. Incremental computation of nested relational query expressions. ACM Trans. on Database Systems, 20(2):111–148, 1995. [47] T. Baer. Iperfex: A hardware performance monitor for Linux/IA32 systems. perform internet search for this or similar tools. [48] A. Balmin, F. Özcan, K. Beyer, R. Cochrane, and H. Pirahesh. A framework for using materialized XPath views in XML query processing. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 60–71, 2004. [49] F. Bancilhon and K. Koshafian. A calculus for complex objects. In ACM Symp. on Principles of Database Systems, pages 53–59, 1986. [50] J. Banerjee, W. Kim, and K.-C. Kim. Queries in object-oriented databases. MCC Technical Report DB-188-87, MCC, Austin, TX 78759, June 1987. [51] J. Banerjee, W. Kim, and K.-C. Kim. Queries in object-oriented databases. In Proc. IEEE Conference on Data Engineering, pages 31–38, 1988. [52] Z. Bar-Yossef, T. Jayram, R. Kumar, D. Sivikumar, and L. Trevisan. Counting distinct elements in a data stream. In 6th Int. Workshop RANDOM 2002, pages 1–10, 2002. BIBLIOGRAPHY 629 [53] E. Baralis, S. Paraboschi, and E. Teniente. Materialized views selection in a multidimensional database. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 156–165, 1997. [54] D. Barbar‘a, W. DuMouchel, C. Faloutsos, P. Haas, J. Hellerstein, Y. Ioannidis, H. Jagadish, T. Johnson, R. Ng, V. Poosala, K. Ross, and K. Sevcik. The New Jersey data reduction report. IEEE Data Engineering Bulletin, 20(4):3–45, 1997. [55] M. Bassiouni. Data compression in scientific and statistical databases. IEEE Trans. on Software Eng., 11(10):1047–1058, 1985. [56] D. Batory. 
On searching transposed files. ACM Trans. on Database Systems, 4(4):531–544, 1979. [57] D. Batory. Extensible cost models and query optimization in Genesis. IEEE Database Engineering, 9(4), Nov 1986. [58] D. S. Batory. Modeling the storage architecture of commercial database systems. ACM Trans. on Database Systems, 10(4):463–528, Dec. 1985. [59] D. S. Batory. A molecular database systems technology. Tech. Report TR-87-23, University of Austin, 1987. [60] D. S. Batory. Building blocks of database management systems. Technical Report TR-87-23, University of Texas, Austin, TX, Feb. 1988. [61] D. S. Batory. Concepts for a database system compiler. In Proc. of the 17nth ACM SIGMOD, pages 184–192, 1988. [62] D. S. Batory. On the reusability of query optimization algorithms. Information Sciences, 49:177–202, 1989. [63] D. S. Batory and C. Gotlieb. A unifying model of physical databases. ACM Trans. on Database Systems, 7(4):509–539, Dec. 1982. [64] D. S. Batory, T. Y. Leung, and T. E. Wise. Implementation concepts for an extensible data model and data language. ACM Trans. on Database Systems, 13(3):231–262, Sep 1988. [65] L. Baxter. TheComplexity of Unification. PhD thesis, University of Waterloo, 1976. [66] L. Becker and R. H. Güting. Rule-based optimization and query processing in an extensible geometric database system. ACM Trans. on Database Systems (to appear), 1991. [67] L. Becker and R. H. Güting. Rule-based optimization and query processing in an extensible geometric database system. ACM Trans. on Database Systems, 17(2):247–303, June 1992. 630 BIBLIOGRAPHY [68] C. Beeri and Y. Kornatzky. Algebraic optimization of object-oriented query languages. In Proc. Int. Conf. on Database Theory (ICDT), pages 72–88, 1990. [69] C. Beeri and Y. Kornatzky. Algebraic optimization of object-oriented query languages. Theoretical Computer Science, 116(1):59–94, 1993. [70] C. Beeri and Y. Tzaban. SAL: An algebra for semistructured data and XML. In ACM SIGMOD Workshop on the Web and Databases (WebDB), 1999. [71] L. A. Belady. A study of replacement algorithms for virtual storage computers. IBM Systems Journal, 5(2):78–101, 1966. [72] S. Bellamkonda, R. Ahmed, A. Witkowski, A. Amor, M. Zait, and C.-C. Lin. Enhanced subquery optimization in Oracle. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 1366–1377, 2009. [73] D. Beneventano, S. Bergamaschi, and C. Sartori. Description logic for semantic query optimization in object-oriented database systems. ACM Trans. on Database Systems, 28(1):1–50, 2003. [74] K. Bennett, M. Ferris, and Y. Ioannidis. A genetic algorithm for database query optimization. Technical Report Tech. Report 1004, University of Wisconsin, 1990. [75] K. Bennett, M. Ferris, and Y. Ioannidis. A genetic algorithm for database query optimization. In Proc. 4th Int. Conf. on Genetic Algorithms, pages 400–407, 1991. [76] J.L̃. Bentley and A. C.-C. Yao. An almost optimal algorithm for unbounded searching. Inf. Proc. Lett., 5(3):82–87, 1976. [77] A. Bernasconi and B. Codenetti. Measures of boolean function complexity based on harmonic analysis. In M. Bonuccelli, P. Crescenzi, and R. Petreschi, editors, Algorithms and Complexity (2nd. Italien Converence), pages 63–72, Rome, Feb. 1994. Springer (Lecture Notes in Computer Science 778). [78] P. Bernstein, E. Wong, C. Reeve, and J. Rothnie. Query processing in a system for distributed databases (sdd-1). ACM Trans. on Database Systems, 6(4):603–625, 1981. [79] P. A. Bernstein and D. M. W. Chiu. Using semi-joins to solve relational queries. 
Journal of the ACM, 28(1):25–40, 1981. [80] P. A. Bernstein and N. Goodman. The power of inequality semijoin. Information Systems, 6(4):255–265, 1981. [81] P. A. Bernstein and N. Goodman. The power of natural semijoin. SIAM J. Comp., 10(4):751–771, 1981. BIBLIOGRAPHY 631 [82] E. Bertino and D. Musto. Query optimization by using knowledge about data semantics. Data & Knowledge Engineering, 9(2):121–155, 1992. [83] E. Bertino, M. Negri, G. Pelagatti, and L. Sbattella. Object-oriented query languages: The notion and the issues. IEEE Trans. on Knowledge and Data Eng., 4(3):223–237, June 1992. [84] K. Beyer, P. Haas, B. Reinwald, Y. Sismanis, and R. Gemulla. On synopses for distinct-value estimation under multiset operations. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 199–210, 2007. [85] G. Bhargava, P. Goel, and B. Iyer. Hypergraph based reorderings of outer join queries with complex predicates. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 304–315, 1995. [86] G. Bhargava, P. Goel, and B. Iyer. No regression algorithm for the enumeration of projections in SQL queries with joins and outer joins. In IBM Centre for Advanced Studies Conference (CASCOM), 1995. [87] G. Bhargava, P. Goel, and B. Iyer. Simplification of outer joins. In IBM Centre for Advanced Studies Conference (CASCOM), 1995. [88] G. Bhargava, P. Goel, and B. Iyer. Efficient processing of outer joins and aggregate functions. In Proc. IEEE Conference on Data Engineering, pages 441–449, 1996. [89] A. Biliris. An efficient database storage structure for large dynamic objects. In Proc. IEEE Conference on Data Engineering, pages 301–308, 1992. [90] J. Biskup. A formal approach to null values in database relations. In Advances in Database Theory, 1981. [91] D. Bitton and D. DeWitt. Duplicate record elimination in large data files. ACM Trans. on Database Systems, 8(2):255–265, 1983. [92] S. Bitzer. Design and implementation of a query unnesting module in natix. Master’s thesis, University of Mannheim, 2007. [93] J. Blakeley and N. Martin. Join index, materialized view, and hybrid hash-join: a performance analysis. In Proc. IEEE Conference on Data Engineering, pages 256–236, 1990. [94] J. Blakeley, W. McKenna, and G. Graefe. Experiences building the Open OODB query optimizer. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 287–295, 1993. [95] J. A. Blakeley, P. A. Larson, and F. W. Tompa. Efficiently updating materialized views. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 61–71, Washington, D.C., 1986. 632 BIBLIOGRAPHY [96] B. Blohsfeld, D. Korus, and B. Seeger. A comparison of selectivity estimators for range queries on metric attributes. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 239–250, 1999. [97] P. Bodorik and J.S. Riordon. Distributed query processing optimization objectives. In Proc. IEEE Conference on Data Engineering, pages 320– 329, 1988. [98] T. Böhme and E. Rahm. Xmach-1: A benchmark for XML data management. In BTW, pages 264–273, 2001. [99] A. Bolour. Optimal retrieval for small range queries. SIAM J. of Comput., 10(4):721–741, 1981. [100] P. Bonatti. On the decidability of containment of recursive datalog queries - preliminary report. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 297–306, 2004. [101] P. Boncz, A. Wilschut, and M. Kersten. Flattening an object algebra to provide performance. In Proc. IEEE Conference on Data Engineering, pages 568–577, 1998. [102] B. Bouchou, M. Halfeld, and F. Alves. 
Updates and incremental validation of XML documents. In Int. Workshop on Database Programming Languages, pages 216–232, 2003. [103] M. Brantner, S. Helmer, C.-C. Kanne, and G. Moerkotte. Full-fledged algebraic XPath processing in Natix. In Proc. IEEE Conference on Data Engineering, pages 705–716, 2005. [104] M. Brantner, N. May, and G. Moerkotte. Unnesting SQL queries in the presence of disjunction. Technical Report TR-2006-013, University of Mannheim, 2006. [105] M. Brantner, N. May, and G. Moerkotte. Unnesting scalar SQL queries in the presence of disjunction. In Proc. IEEE Conference on Data Engineering, 2007. 46-55. [106] K. Bratbergsengen and K. Norvag. Improved and optimized partitioning techniques in database query procesing. In Advances in Databases, 15th British National Conference on Databases, pages 69–83, 1997. [107] Y. Breitbart and A. Reiter. Algorithms for fast evaluation of boolean expressions. Acta Informatica, 4:107–116, 1975. [108] S. Bressan, M. Lee, Y. Li, Z. Lacroix, and U. Nambiar. The XOO7 XML Management System Benchmark. Technical Report TR21/00, National University of Singapore, 2001. [109] K. Brown, M. Carey, and M. Livny. Goal-oriented buffer management revisited. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 353–364, Montreal, Canada, Jun 1996. BIBLIOGRAPHY 633 [110] N. Bruno, S. Chaudhuri, and L. Gravano. STHoles: a multidimensional workload-aware histogram. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 211–222, 2001. [111] N. Bruno, S. Chaudhuri, and L. Gravano. STHoles: a multidimensional workload-aware histogram. Technical Report MSR-TR-2001-36, Microsoft Research, 2001. [112] N. Bruno, C. Galindo-Legaria, and M. Joshi. Polynomial heuristics for query optimization. In Proc. IEEE Conference on Data Engineering, pages 589–600, 2010. [113] F. Bry. Logical rewritings for improving the evaluation of quantified queries. In 2nd. Symp. on Mathematical Fundamentals of Database Systems, pages 100–116, June 1989, Visegrad, Hungary, 1989. [114] F. Bry. Towards an efficient evaluation of general queries: Quantifiers and disjunction processing revisited. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 193–204, 1989. [115] F. Buccafurri and G. Lax. Fast range query estimation by n-level tree histograms. Data & Knowledge Engineering, 51:257–275, 2004. [116] F. Buccafurri, G. Lax, D. Sacca, L. Pontieri, and D. Rosaci. Enhancing histograms by tree-like bucket indices. The VLDB Journal, 17:1041–1061, 2008. [117] F. Buccafurri, L. Pontieri, D. Rosaci, and D. Sacca. Improving range query estimation on histograms. In Proc. IEEE Conference on Data Engineering, pages 628–638, 2002. [118] P. Buneman, S. Davidson, W. Fan, C. Hara, and W. Tan. Keys for XML. In WWW Conference, pages 201–210, 2001. [119] P. Buneman, L. Libkin, D. Suciu, V. Tannen, and L. Wong. Comprehension syntax. SIGMOD Record, 23(1):87–96, 1994. [120] L. Cabibbo and R. Torlone. A framework for the investigation of aggregate functions in database queries. In Proc. Int. Conf. on Database Theory (ICDT), pages 383–397, 1999. [121] J.-Y. Cai, V. Chakaravarthy, R. Kaushik, and J. Naughton. On the complexity of join predicates. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 207–214, 2001. [122] D. Calvanese, G. DeGiacomo, M. Lenzerini, and M. Vardi. View-based query answering and query containment over semistructured data. In Int. Workshop on Database Programming Languages, pages 40–61, 2001. [123] B. Cao and A. Badia. 
[123] B. Cao and A. Badia. A nested relational approach to processing SQL subqueries. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 191–202, 2005.
[124] L. Cardelli and P. Wegner. On understanding types, data abstraction, and polymorphism. ACM Computing Surveys, 17(4):471–522, 1985.
[125] A. F. Cardenas. Analysis and performance of inverted data base structures. Communications of the ACM, 18(5):253–263, 1975.
[126] M. Carey, D. DeWitt, J. Richardson, and E. Shekita. Object and file management in the EXODUS extensible database system. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 91–100, 1986.
[127] M. Carey and D. Kossmann. On saying “enough already!” in SQL. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 219–230, 1997.
[128] M. Carey and D. Kossmann. Processing top N and bottom N queries. IEEE Data Engineering Bulletin, 20(3):12–19, 1997.
[129] M. Carey and D. Kossmann. Reducing the braking distance of an SQL query engine. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 158–169, 1998.
[130] J. Carlis. HAS: A relational algebra operator, or divide is not to conquer. In Proc. IEEE Conference on Data Engineering, pages 254–261, 1986.
[131] L. Carlitz, D. Roselle, and R. Scoville. Some remarks on ballot-type sequences of positive integers. Journal of Combinatorial Theory, 11:258–271, 1971.
[132] C. R. Carlson and R. S. Kaplan. A generalized access path model and its application to a relational database system. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 143–154, 1976.
[133] R. Cattell, D. Barry, M. Berler, J. Eastman, D. Jordan, C. Russell, O. Schadow, T. Stanienda, and F. Velez, editors. The Object Database Standard: ODMG 3.0. Morgan Kaufmann, 1999. Release 3.0.
[134] S. Ceri and G. Gottlob. Translating SQL into relational algebra: Optimization, semantics and equivalence of SQL queries. IEEE Trans. on Software Eng., 11(4):324–345, Apr 1985.
[135] S. Ceri and G. Pelagatti. Correctness of query execution strategies in distributed databases. ACM Trans. on Database Systems, 8(4):577–607, Dec. 1983.
[136] S. Ceri and G. Pelagatti. Distributed Databases: Principles and Systems. McGraw-Hill, 1985.
[137] S. Chakkappen, T. Cruanes, B. Dageville, L. Jiang, U. Shaft, H. Su, and M. Zait. Efficient statistics gathering for large databases in Oracle 11g. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 1053–1064, 2008.
[138] S. Chakravarthy. Divide and conquer: a basis for augmenting a conventional query optimizer with multiple query processing capabilities. In Proc. IEEE Conference on Data Engineering, 1991.
[139] U. S. Chakravarthy, J. Grant, and J. Minker. Logic-based approach to semantic query optimization. ACM Trans. on Database Systems, 15(2):162–207, 1990.
[140] U. S. Chakravarthy and J. Minker. Multiple query processing in deductive databases using query graphs. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages ?–?, 1986.
[141] D. Chamberlin, J. Robie, and D. Florescu. Quilt: An XML query language for heterogeneous data sources. In ACM SIGMOD Workshop on the Web and Databases (WebDB), 2000.
[142] C. Chan and B. Ooi. Efficient scheduling of page accesses in index-based join processing. IEEE Trans. on Knowledge and Data Eng., 9(6):1005–1011, 1997.
[143] A. Chandra and P. Merlin. Optimal implementation of conjunctive queries in relational data bases. In Proc. ACM SIGACT Symp. on the Theory of Computing, pages 77–90, 1977.
[144] A. K. Chandra and P. M. Merlin. Optimal implementation of conjunctive queries in relational data bases. In Proc. 9th ACM Symposium on Theory of Computing, pages 77–90, 1977.
[145] S. Chatterji, S. Evani, S. Ganguly, and M. Yemmanuru. On the complexity of approximate query optimization. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 282–292, 2002.
[146] D. Chatziantoniou, M. Akinde, T. Johnson, and S. Kim. The MD-Join: An Operator for Complex OLAP. In Proc. IEEE Conference on Data Engineering, pages 524–533, 2001.
[147] D. Chatziantoniou and K. Ross. Querying multiple features in relational databases. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 295–306, 1996.
[148] D. Chatziantoniou and K. Ross. Groupwise processing of relational queries. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 476–485, 1997.
[149] S. Chaudhuri, P. Ganesan, and S. Sarawagi. Factorizing complex predicates in queries to exploit indexes. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 361–372, 2003.
[150] S. Chaudhuri and L. Gravano. Evaluating top-k selection queries. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 397–410, 1999.
[151] S. Chaudhuri, R. Krishnamurthy, S. Potamianos, and K. Shim. Optimizing queries with materialized views. In Proc. IEEE Conference on Data Engineering, pages 190–200, 1995.
[152] S. Chaudhuri, R. Krishnamurthy, S. Potamianos, and K. Shim. Optimizing Queries with Materialized Views, pages 77–92. MIT Press, 1999.
[153] S. Chaudhuri, V. Narasayya, and R. Ramamurthy. Estimating progress of long running SQL queries. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 803–814, 2004.
[154] S. Chaudhuri and K. Shim. Query optimization in the presence of foreign functions. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 529–542, 1993.
[155] S. Chaudhuri and K. Shim. Including group-by in query optimization. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 354–366, 1994.
[156] S. Chaudhuri and K. Shim. Optimization of queries with user-defined predicates. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 87–98, 1996.
[157] S. Chaudhuri and K. Shim. Optimizing queries with aggregate views. In Proc. of the Int. Conf. on Extending Database Technology (EDBT), pages 167–182, 1996.
[158] S. Chaudhuri and M. Vardi. On the equivalence of recursive and nonrecursive datalog programs. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 55–66, 1992.
[159] S. Chaudhuri and M. Vardi. Optimization of real conjunctive queries. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 59–70, 1993.
[160] S. Chaudhuri and M. Vardi. Optimization of real conjunctive queries. Technical Report HPL-93-26, HP Software Technology Laboratory, 1993.
[161] S. Chaudhuri and M. Vardi. On the complexity of equivalence between recursive and nonrecursive datalog programs. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 55–66, 1994.
[162] P. Cheeseman, B. Kanefsky, and W. Taylor. Where the really hard problems are. In Int. Joint Conf. on Artificial Intelligence (IJCAI), pages 331–337, 1991.
[163] C. Chekuri and A. Rajaraman. Conjunctive query containment revisited. In Proc. Int. Conf. on Database Theory (ICDT), pages 56–70, 1997.
[164] C. Chekuri and A. Rajaraman. Conjunctive query containment revisited. Theoretical Computer Science, 239:211–229, 2000.
[165] A.L.P. Chen. Outerjoin optimization in multidatabase systems. In Proc. 2nd Int. Symp. on Databases in Parallel and Distributed Systems, pages 211–218, 1990.
[166] C. Chen and N. Roussopoulos. The implementation and performance evaluation of the ADMS query optimizer: Integrating query result caching and matching. In Proc. of the Int. Conf. on Extending Database Technology (EDBT), pages 323–336, 1994.
[167] Z. Chen, J. Gehrke, and F. Korn. Query optimization in compressed database systems. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 271–282, 2001.
[168] Z. Chen, H. V. Jagadish, F. Korn, N. Koudas, S. Muthukrishnan, R. Ng, and D. Srivastava. Counting twig matches in a tree. In Proc. IEEE Conference on Data Engineering, pages 595–604, 2001.
[169] Z. Chen and V. Narasayya. Efficient computation of multiple group by queries. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 263–274, 2005.
[170] J. Cheng, D. Haderle, R. Hedges, B. Iyer, T. Messenger, C. Mohan, and Y. Wang. An efficient hybrid join algorithm: A DB2 prototype. In Proc. IEEE Conference on Data Engineering, pages 171–180, 1991.
[171] J. Cheng, C. Loosley, A. Shibamiya, and P. Worthington. IBM DB2 Performance: design, implementation, and tuning. IBM Sys. J., 23(2):189–210, 1984.
[172] Q. Cheng, J. Gryz, F. Koo, T. Y. Cliff Leung, L. Liu, X. Quian, and B. Schiefer. Implementation of two semantic query optimization techniques in DB2 Universal Database. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 687–698, 1999.
[173] M. Cherniack and S. Zdonik. Rule languages and internal algebras for rule-based optimizers. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 401–412, 1996.
[174] T.-Y. Cheung. Estimating block accesses and number of records in file management. Communications of the ACM, 25(7):484–487, 1982.
[175] D. M. Chiu and Y. C. Ho. A methodology for interpreting tree queries into optimal semi-join expressions. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 169–178, 1980.
[176] H.-T. Chou and D. DeWitt. An evaluation of buffer management strategies for relational database systems. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 127–141, 1985.
[177] S. Christodoulakis. Estimating block transfers and join sizes. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 40–54, 1983.
[178] S. Christodoulakis. Estimating record selectivities. Information Systems, 8(2):105–115, 1983.
[179] S. Christodoulakis. Implications of certain assumptions in database performance evaluation. ACM Trans. on Database Systems, 9(2):163–186, 1984.
[180] S. Christodoulakis. Analysis of retrieval performance for records and objects using optical disk technology. ACM Trans. on Database Systems, 12(2):137–169, 1987.
[181] V. Christophides, S. Cluet, and G. Moerkotte. Evaluating queries with generalized path expressions. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 413–422, 1996.
[182] C. Clarke, G. Cormack, and F. Burkowski. An algebra for structured text search and a framework for its implementation. The Computer Journal, 38(1):43–56, 1995.
[183] J. Claussen, A. Kemper, and D. Kossmann. Order-preserving hash joins: Sorting (almost) for free. Technical Report MIP-9810, University of Passau, 1998.
[184] J. Claussen, A. Kemper, G. Moerkotte, and K. Peithner. Optimizing queries with universal quantification in object-oriented and object-relational databases. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 286–295, 1997.
[185] J. Claussen, A. Kemper, G. Moerkotte, and K. Peithner. Optimizing queries with universal quantification in object-oriented and object-relational databases. Technical Report MIP–9706, University of Passau, Fak. f. Mathematik u. Informatik, Mar 1997.
[186] J. Claussen, A. Kemper, G. Moerkotte, K. Peithner, and M. Steinbrunn. Optimization and evaluation of disjunctive queries. IEEE Trans. on Knowledge and Data Eng., 12(2):238–260, 2000.
[187] S. Cluet and C. Delobel. A general framework for the optimization of object-oriented queries. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 383–392, 1992.
[188] S. Cluet, C. Delobel, C. Lecluse, and P. Richard. Reloop, an algebra based query language for an object-oriented database system. In Proc. Int. Conf. on Deductive and Object-Oriented Databases (DOOD), 1989.
[189] S. Cluet and G. Moerkotte. Nested queries in object bases. In Proc. Int. Workshop on Database Programming Languages, pages 226–242, 1993.
[190] S. Cluet and G. Moerkotte. Classification and optimization of nested queries in object bases. In BDA, pages 331–349, 1994.
[191] S. Cluet and G. Moerkotte. Classification and optimization of nested queries in object bases. Technical Report 95-6, RWTH Aachen, 1995.
[192] S. Cluet and G. Moerkotte. Efficient evaluation of aggregates on bulk types. Technical Report 95-5, RWTH-Aachen, 1995.
[193] S. Cluet and G. Moerkotte. Efficient evaluation of aggregates on bulk types. In Proc. Int. Workshop on Database Programming Languages, 1995.
[194] S. Cluet and G. Moerkotte. On the complexity of generating optimal left-deep processing trees with cross products. In Proc. Int. Conf. on Database Theory (ICDT), pages 54–67, 1995.
[195] S. Cluet and G. Moerkotte. Query optimization techniques exploiting class hierarchies. Technical Report 95-7, RWTH-Aachen, 1995.
[196] E. Codd. A relational model of data for large shared data banks. Communications of the ACM, 13(6):377–387, 1970.
[197] E. Codd. Database Systems - Courant Computer Science Symposium. Prentice Hall, 1972.
[198] E. F. Codd. A database sublanguage founded on the relational calculus. In Proc. ACM-SIGFIDET Workshop on Data Description, Access, and Control, pages 35–68, San Diego, Calif., 1971. ACM.
[199] E. F. Codd. Relational completeness of data base sublanguages. In Courant Computer Science Symposia No. 6: Data Base Systems, pages 67–101, New York, 1972. Prentice Hall.
[200] E. F. Codd. Extending the relational database model to capture more meaning. ACM Trans. on Database Systems, 4(4):397–434, Dec. 1979.
[201] S. Cohen, W. Nutt, and Y. Sagiv. Equivalences among aggregate queries with negation. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 215–226, 2001.
[202] S. Cohen, W. Nutt, and Y. Sagiv. Containment of aggregate queries (extended version). Technical report, Hebrew University of Jerusalem, 2002. available at www.cs.huji.ac.il/~sarina/papers/agg-containment-long.ps.
[203] S. Cohen, W. Nutt, and Y. Sagiv. Containment of aggregate queries. In Proc. Int. Conf. on Database Theory (ICDT), pages 111–125, 2003.
[204] L. Colby. A recursive algebra and query optimization for nested relational algebra. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 273–283, 1989.
[205] L. Colby, A. Kawaguchi, D. Lieuwen, I. Mumick, and K. Ross. Supporting multiple view maintenance policies. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 405–416, 1997.
[206] G. Copeland and S. Khoshafian. A decomposition storage model. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 268–279, Austin, TX, 1985.
[207] G. Cormack. Data compression on a database system. Communications of the ACM, 28(12):1336–1342, 1985.
[208] T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms. MIT Press, 1990.
[209] T. Cormen, C. Leiserson, R. Rivest, and C. Stein. Introduction to Algorithms. MIT Press, 2001. 2nd Edition.
[210] G. Cormode, M. Garofalakis, P. Haas, and C. Jermaine. Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches. NOW Press, 2012.
[211] D. Cornell and P. Yu. Integration of buffer management and query optimization in relational database environments. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 247–255, 1989.
[212] J. Crawford and L. Auton. Experimental results on the crossover point in satisfiability problems. In Proc. National Conference on Artificial Intelligence, pages 21–27, 1993.
[213] K. Culik, T. Ottmann, and D. Wood. Dense multiway trees. ACM Trans. on Database Systems, 6(3):486–512, 1981.
[214] C. Cunningham, G. Graefe, and C. Galindo-Legaria. Pivot and unpivot: Optimization and execution strategies in an RDBMS. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 998–1009, 2004.
[215] M. Dadashzadeh. An improved division operator for relational algebra. Information Systems, 14(5):431–437, 1989.
[216] D. Daniels. Query compilation in a distributed database system. Technical Report RJ 3432, IBM Research Laboratory, San Jose, CA, 1982.
[217] D. Das and D. Batory. Prairie: A rule specification framework for query optimizers. In Proc. IEEE Conference on Data Engineering, pages 201–210, 1995.
[218] C. J. Date. The outer join. In Proc. of the Int. Conf. on Databases, Cambridge, England, 1983.
[219] U. Dayal. Processing queries with quantifiers: A horticultural approach. In ACM Symp. on Principles of Database Systems, pages 125–136, 1983.
[220] U. Dayal. Of nests and trees: A unified approach to processing queries that contain nested subqueries, aggregates, and quantifiers. In VLDB, pages 197–208, 1987.
[221] U. Dayal, N. Goodman, and R.H. Katz. An extended relational algebra with control over duplicate elimination. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 117–123, 1982.
[222] U. Dayal, F. Manola, A. Buchman, U. Chakravarthy, D. Goldhirsch, S. Heiler, J. Orenstein, and A. Rosenthal. Simplifying complex objects: The PROBE approach to modelling and querying them. In H.J. Schek and G. Schlageter (eds.) Proc. BTW, pages 17–37, 1987.
[223] U. Dayal and J. Smith. PROBE: A knowledge-oriented database management system. In Proc. Islamorada Workshop on Large Scale Knowledge Base and Reasoning Systems, 1985.
[224] G. de Balbine. Note on random permutations. Mathematics of Computation, 21:710–712, 1967.
[225] D. DeHaan, P.-A. Larson, and J. Zhou. Stacked indexed views in Microsoft SQL Server. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 179–190, 2005.
[226] K. Delaney. Inside Microsoft SQL Server 2005: Query Tuning and Optimization. Microsoft Press, 2008.
[227] R. Demolombe. Estimation of the number of tuples satisfying a query expressed in predicate calculus language. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 55–63, 1980.
[228] P. Denning. Effects of scheduling on file memory operations. In Proc. AFIPS, pages 9–21, 1967.
[229] N. Derrett and M.-C. Shan. Rule-based query optimization in IRIS. Technical report, Hewlett-Packard Laboratories, 1501 Page Mill Road, Palo Alto, CA 94303, 1990.
[230] B. Desai. Performance of a composite attribute and join index. IEEE Trans. on Software Eng., 15(2):142–152, Feb. 1989.
[231] A. Deshpande, M. Garofalakis, and R. Rastogi. Independence is good: Dependency-based histogram synopses for high-dimensional data. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 199–210, 2001.
[232] A. Deshpande, Z. Ives, and V. Raman. Adaptive Query Optimization. NOW, 2007.
[233] A. Deutsch, M. Fernandez, D. Florescu, A. Levy, D. Maier, and D. Suciu. Querying XML data. IEEE Data Engineering Bulletin, 22(3):10–18, 1999.
[234] A. Deutsch, B. Ludäscher, and A. Nash. Rewriting queries using views with access patterns under integrity constraints. In Proc. Int. Conf. on Database Theory (ICDT), pages 352–367, 2005.
[235] A. Deutsch and V. Tannen. Containment and integrity constraints for XPath. In Int. Workshop on Knowledge Representation meets Databases (KRDB), 2001.
[236] A. Deutsch and V. Tannen. Optimization properties for classes of conjunctive regular path queries. In Int. Workshop on Database Programming Languages, pages 21–39, 2001.
[237] A. Deutsch and V. Tannen. Reformulation of XML queries and constraints. In Proc. Int. Conf. on Database Theory (ICDT), pages 225–241, 2003.
[238] D. DeWitt, R. Katz, F. Olken, L. Shapiro, M. Stonebraker, and D. Wood. Implementation techniques for main memory database systems. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 151–1??, 1984.
[239] D. DeWitt, J. Naughton, and D. Schneider. An evaluation of non-equijoin algorithms. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 443–452, 1991.
[240] Y. Diao, M. Altinel, M. Franklin, H. Zhang, and P. Fischer. Path sharing and predicate evaluation for high-performance XML filtering. ACM Trans. on Database Systems, 28(4):467–516, 2003.
[241] G. Diehr and A. Saharia. Estimating block accesses in database organizations. IEEE Trans. on Knowledge and Data Engineering, 6(3):497–499, 1994.
[242] P. Dietz. Optimal algorithms for list indexing and subset ranking. In Workshop on Algorithms and Data Structures (LNCS 382), pages 39–46, 1989.
[243] Z. Dimitrijevic, R. Rangaswami, E. Chang, D. Watson, and A. Acharya. Diskbench: User-level disk feature extraction tool. Technical report, University of California, Santa Barbara, 2004.
[244] J. Dongarra, K. London, S. Moore, P. Mucci, and D. Terpstra. Using PAPI for hardware performance monitoring on Linux systems. perform internet search for this or similar tools.
[245] D. Donjerkovic, Y. Ioannidis, and R. Ramakrishnan. Dynamic histograms: Capturing evolving data sets. In Proc. IEEE Conference on Data Engineering, page 86, 2000.
[246] D. Donjerkovic and R. Ramakrishnan. Probabilistic optimization of top N queries. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 411–422, 1999.
[247] J. Donovan. Database system approach to management of decision support. ACM Trans. on Database Systems, 1(4):344–368, 1976.
[248] M. Drmota, D. Gardy, and B. Gittenberger. A unified presentation of some urn models. Algorithmica, 29:120–147, 2001.
[249] M. Durand. Combinatoire analytique et algorithmique des ensembles de données. PhD thesis, Ecole Polytechnique, 2004.
[250] M. Durand and P. Flajolet. Loglog counting of large cardinalities. In Algorithms - ESA 2003, Annual European Symposium, pages 605–617. Springer LNCS 2832, 2003.
[251] R. Durstenfeld. Algorithm 235: Random permutation. Communications of the ACM, 7(7):420, 1964.
[252] O. Duschka. Query Planning and Optimization in Information Integration. PhD thesis, Stanford University, 1997.
[253] O. Duschka and M. Genesereth. Answering recursive queries using views. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 109–116, 1997.
[254] O. Duschka and M. Genesereth. Query planning with disjunctive sources. In AAAI Workshop on AI and Information Integration, 1998.
[255] O. Duschka and M. Genesereth. Answering recursive queries using views. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 109–116, 1997.
[256] O. Duschka and A. Levy. Recursive plans for information gathering. In Int. Joint Conf. on Artificial Intelligence (IJCAI), 1997.
[257] W. Effelsberg and T. Härder. Principles of database buffer management. ACM Trans. on Database Systems, 9(4):560–595, 1984.
[258] S. Eggers, F. Olken, and A. Shoshani. A compression technique for large statistical data bases. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 424–434, 1981.
[259] S. Eggers and A. Shoshani. Efficient access of compressed data. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 205–211, 1980.
[260] J. F. Egler. A procedure for converting logic table conditions into an efficient sequence of test instructions. Communications of the ACM, 6(9):510–514, 1963.
[261] M. Eisner and D. Severance. Mathematical techniques for efficient record segmentation in large shared databases. Journal of the ACM, 23(4):619–635, 1976.
[262] A. El-Helw, I. Ilyas, W. Lau, V. Markl, and C. Zuzarte. Collecting and maintaining just-in-time statistics. In Proc. IEEE Conference on Data Engineering, pages 516–525, 2007.
[263] M. Elhemali, C. Galindo-Legaria, T. Grabs, and M. Joshi. Execution strategies for SQL subqueries. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 993–1003, 2007.
[264] R. Elmasri and S. Navathe. Fundamentals of Database Systems. Addison-Wesley, 2000. 3rd Edition.
[265] G. Lohman et al. Optimization of nested queries in a distributed relational database. In Proc. Int. Conf. on Very Large Data Bases (VLDB), 1984.
[266] N. Roussopoulos et al. The Maryland ADMS project: Views R Us. IEEE Data Engineering Bulletin, 18(2), 1995.
[267] P. Schwarz et al. Extensibility in the Starburst database system. In Proc. Int. Workshop on Object-Oriented Database Systems, 1986.
[268] R. Fagin. Combining fuzzy information from multiple systems. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 216–226, 1996.
[269] R. Fagin, A. Lotem, and M. Naor. Optimal aggregation algorithms for middleware. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 102–113, 2001.
[270] C. Farré, E. Teniente, and T. Urpí. Query containment checking as a view updating problem. In Int. Conf. on Database and Expert Systems Applications (DEXA), pages 310–321, 1998.
[271] C. Farré, E. Teniente, and T. Urpí. The constructive method for query containment checking. In Int. Conf. on Database and Expert Systems Applications (DEXA), pages 583–593, 1999.
[272] C. Farré, E. Teniente, and T. Urpí. Query containment with negated IDB predicates. In ADBIS, pages 583–593, 2003.
[273] C. Farré, E. Teniente, and T. Urpí. Checking query containment with the CQC method. Data & Knowledge Engineering, 53:163–223, 2005.
[274] J. Fedorowicz. Database evaluation using multiple regression techniques. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 70–76, 1984.
[275] J. Fedorowicz. Database performance evaluation in an indexed file environment. ACM Trans. on Database Systems, 12(1):85–110, 1987.
[276] L. Fegaras. Optimizing large OODB queries. In Proc. Int. Conf. on Deductive and Object-Oriented Databases (DOOD), pages 421–422, 1997.
[277] L. Fegaras. A new heuristic for optimizing large queries. In DEXA, pages 726–735, 1998.
[278] L. Fegaras and D. Maier. Towards an effective calculus for object query languages. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 47–58, 1995.
[279] L. Fegaras and D. Maier. Optimizing object queries using an effective calculus. ACM Trans. on Database Systems, 25(4):457–516, 2000.
[280] T. Fiebig and G. Moerkotte. Evaluating Queries on Structure with eXtended Access Support Relations. In WebDB 2000, 2000.
[281] T. Fiebig and G. Moerkotte. Algebraic XML construction in Natix. In Proc. Int. Conf. on Web Information Systems Engineering (WISE), pages 212–221, 2001.
[282] T. Fiebig and G. Moerkotte. Algebraic XML construction and its optimization in Natix. World Wide Web Journal, 4(3):167–187, 2002.
[283] S. Finkelstein. Common expression analysis in database applications. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 235–245, 1982.
[284] P. Flajolet. Approximate counting: A detailed analysis. BIT, 25(1):113–134, 1985.
[285] P. Flajolet, E. Fusy, O. Gandouet, and F. Meunier. HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm. In Conf. on Analysis of Algorithms (AofA). Discrete Mathematics and Theoretical Computer Science, pages 127–146, 2007.
[286] P. Flajolet and G. Martin. Probabilistic counting. In Annual Symposium on Foundations of Computer Science (FOCS), pages 76–82, 1983.
[287] P. Flajolet and G. Martin. Probabilistic counting algorithms for data base applications. Rapports de Recherche 313, INRIA Rocquencourt, 1984.
[288] P. Flajolet and G. Martin. Probabilistic counting algorithms for data base applications. J. Comput. Syst. Sci., 31(2):182–209, 1985.
[289] D. Florescu. Espaces de Recherche pour l’Optimisation de Requêtes Objet (Search Spaces for Query Optimization). PhD thesis, Université de Paris VI, 1996. in French.
[290] D. Florescu, A. Levy, and D. Suciu. Query containment for conjunctive queries with regular expressions. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 139–148, 1998.
[291] P. Fortier. SQL-3, Implementing the SQL Foundation Standard. McGraw Hill, 1999.
[292] F. Fotouhi and S. Pramanik. Optimal secondary storage access sequence for performing relational join. IEEE Trans. on Knowledge and Data Eng., 1(3):318–328, 1989.
[293] F. Frasincar, G.-J. Houben, and C. Pau. XAL: An algebra for XML query optimization. In Australasian Database Conference (ADC), 2002.
[294] J. Freire, J. Haritsa, M. Ramanath, P. Roy, and J. Simeon. StatiX: making XML count. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 181–191, 2002.
[295] J. C. Freytag. A rule-based view of query optimization. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 173–180, 1987.
[296] J. C. Freytag and N. Goodman. On the translation of relational queries into iterative programs. ACM Trans. on Database Systems, 14(1):1–27, 1989.
[297] C. Galindo-Legaria. Outerjoin Simplification and Reordering for Query Optimization. PhD thesis, Harvard University, 1992.
[298] C. Galindo-Legaria. Outerjoins as disjunctions. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 348–358, 1994.
[299] C. Galindo-Legaria. Outerjoins as disjunctions. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 348–358, 1994.
[300] C. Galindo-Legaria and M. Joshi. Orthogonal optimization of subqueries and aggregation. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 571–581, 2001.
[301] C. Galindo-Legaria, M. Joshi, F. Waas, and M.-C. Wu. Statistics on views. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 952–962, 2003.
[302] C. Galindo-Legaria, A. Pellenkoft, and M. Kersten. Cost distribution of search spaces in query optimization. Technical Report CS-R9432, CWI, Amsterdam, NL, 1994.
[303] C. Galindo-Legaria, A. Pellenkoft, and M. Kersten. Fast, randomized join-order selection — why use transformations? In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 85–95, 1994.
[304] C. Galindo-Legaria, A. Pellenkoft, and M. Kersten. Fast, randomized join-order selection — why use transformations? Technical Report CS-R9416, CWI, Amsterdam, NL, 1994.
[305] C. Galindo-Legaria, A. Pellenkoft, and M. Kersten. The impact of catalogs and join algorithms on probabilistic query optimization. Technical Report CS-R9459, CWI, Amsterdam, NL, 1994.
[306] C. Galindo-Legaria, A. Pellenkoft, and M. Kersten. Uniformly-distributed random generation of join orders. Technical Report CS-R9431, CWI, Amsterdam, NL, 1994.
[307] C. Galindo-Legaria, A. Pellenkoft, and M. Kersten. Cost distribution of search spaces in query optimization. In Proc. Int. Conf. on Database Theory (ICDT), pages 280–293, 1995.
[308] C. Galindo-Legaria and A. Rosenthal. Outerjoin simplification and reordering for query optimization. ACM Trans. on Database Systems, 22(1):43–73, March 1997.
[309] C. Galindo-Legaria, A. Rosenthal, and E. Kortright. Expressions, graphs, and algebraic identities for joins, 1991. working paper.
[310] S. Ganapathy and V. Rajaraman. Information theory applied to the conversion of decision tables to computer programs. Communications of the ACM, 16:532–539, 1973.
[311] S. Ghandeharizadeh, J. Stone, and R. Zimmermann. Techniques to quantify SCSI-2 disk subsystem specifications for multimedia. Technical Report 95-610, University of Southern California, 1995.
[312] S. Ganguly. On the complexity of finding optimal join order sequence for star queries without cross products. personal correspondence, 2000.
[313] S. Ganguly, A. Goel, and A. Silberschatz. Efficient and accurate cost model for parallel query optimization. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 172–181, 1996.
[314] R. Ganski and H. Wong. Optimization of nested SQL queries revisited. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 23–33, 1987.
[315] G. Garani and R. Johnson. Joining nested relations and subrelations. Information Systems, 25(4):287–307, 2000.
[316] H. Garcia-Molina, J. Ullman, and J. Widom. Database System Implementation. Prentice Hall, 2000.
[317] D. Gardy and L. Nemirovski. Urn models and Yao’s formula. In Proc. Int. Conf. on Database Theory (ICDT), pages 100–112, 1999.
[318] D. Gardy and C. Puech. On the effect of join operations on relation sizes. ACM Trans. on Database Systems, 14(4):574–603, 1989.
[319] M. Garey and D. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, 1979.
[320] M. R. Garey and D. S. Johnson. Computers and Intractability: a Guide to the Theory of NP-Completeness. Freeman, San Francisco, 1979.
[321] M. Garofalakis and P. Gibbons. Wavelet synopses with error guarantees. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 476–487, 2002.
[322] P. Gassner, G. Lohman, and K. Schiefer. Query optimization in the IBM DB2 family. IEEE Data Engineering Bulletin, 16:4–18, Dec. 1993.
[323] P. Gassner, G. Lohman, K. Schiefer, and Y. Wang. Query optimization in the IBM DB2 family. Technical Report RJ 9734, IBM, 1994.
[324] E. Gelenbe and D. Gardy. On the size of projections: I. Information Processing Letters, 14:1, 1982.
[325] E. Gelenbe and D. Gardy. The size of projections of relations satisfying a functional dependency. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 325–333, 1982.
[326] I. Gent and T. Walsh. Towards an understanding of hill-climbing procedures for SAT. In Proc. National Conference on Artificial Intelligence, pages 28–33, 1993.
[327] A. Ghazal, R. Bhashyam, and A. Crolotte. Block optimization in the Teradata RDBMS. In Int. Conf. on Database and Expert Systems Applications (DEXA), pages 782–791, 2003.
[328] A. Ghazal, A. Crolotte, and R. Bhashyam. Outer join elimination in the Teradata RDBMS. In Int. Conf. on Database and Expert Systems Applications (DEXA), pages 730–740, 2004.
[329] L. Giakoumakis and C. Galindo-Legaria. Testing SQL Server’s query optimizer: Challenges, techniques and experiences. IEEE Data Engineering Bulletin, 31(1):36–43, 2008.
[330] P. Gibbons and Y. Matias. New sampling-based summary statistics for improving approximate query answers. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 331–342, 1998.
[331] F. Giroire. Order statistics and estimating cardinalities of massive data sets. Discrete Applied Mathematics, 157:406–427, 2009.
[332] P. Godfrey, J. Gryz, and C. Zuzarte. Exploiting constraint-like data characterizations in query optimization. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 582–592, 2001.
[333] D. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989.
[334] J. Goldstein and P. Larson. Optimizing queries using materialized views: A practical, scalable solution. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 331–342, 2001.
[335] J. Goldstein, R. Ramakrishnan, and U. Shaft. Compressing relations and indexes. In Proc. IEEE Conference on Data Engineering, 1998. to appear.
[336] G. Golub and C. van Loan. Matrix Computations. The Johns Hopkins University Press, 1996. Third Edition.
[337] G. Gorry and S. Morton. A framework for management information systems. Sloan Management Review, 13(1):55–70, 1971.
[338] G. Gottlob, C. Koch, and R. Pichler. Efficient algorithms for processing XPath queries. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 95–106, 2002.
[339] G. Gottlob, C. Koch, and R. Pichler. XPath processing in a nutshell. SIGMOD Record, 2003.
[340] G. Gottlob, C. Koch, and R. Pichler. XPath query evaluation: Improving time and space efficiency. In Proc. IEEE Conference on Data Engineering, page to appear, 2003.
[341] M. Gouda and U. Dayal. Optimal semijoin schedules for query processing in local distributed database systems. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 164–175, 1981.
[342] P. Goyal. Coding methods for text string search on compressed databases. Information Systems, 8(3):231–233, 1983.
[343] G. Graefe. Software modularization with the EXODUS optimizer generator. IEEE Database Engineering, 9(4):37–45, 1986.
[344] G. Graefe. Relational division: Four algorithms and their performance. In Proc. IEEE Conference on Data Engineering, pages 94–101, 1989.
[345] G. Graefe. Encapsulation of parallelism in the Volcano query processing system. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages ?–?, 1990.
[346] G. Graefe. Heap-filter merge join: A new algorithm for joining medium-size inputs. IEEE Trans. on Software Eng., 17(9):979–982, 1991.
[347] G. Graefe. Query evaluation techniques for large databases. ACM Computing Surveys, 25(2), June 1993.
[348] G. Graefe. Query evaluation techniques for large databases. Shortened version: [347], July 1993.
[349] G. Graefe. Sort-merge-join: An idea whose time has(h) passed? In Proc. IEEE Conference on Data Engineering, pages 406–417, 1994.
[350] G. Graefe. The Cascades framework for query optimization. IEEE Data Engineering Bulletin, 18(3):19–29, Sept 1995.
[351] G. Graefe. Executing nested queries. In BTW, pages 58–77, 2003.
[352] G. Graefe, R. Bunker, and S. Cooper. Hash joins and hash teams in Microsoft SQL Server. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 86–97, 1998.
[353] G. Graefe and R. Cole. Fast algorithms for universal quantification in large databases. Internal report, Portland State University and University of Colorado at Boulder, 19??
[354] G. Graefe and R. Cole. Dynamic query evaluation plans. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages ?–?, 1994.
[355] G. Graefe and R. Cole. Fast algorithms for universal quantification in large databases. ACM Trans. on Database Systems, ?(?):?–?, ? 1995?
[356] G. Graefe, R. Cole, D. Davison, W. McKenna, and R. Wolniewicz. Extensible query optimization and parallel execution in Volcano. In Dagstuhl Query Processing Workshop, pages 337–380, 1991.
[357] G. Graefe and D. DeWitt. The EXODUS optimizer generator. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 160–172, 1987.
[358] G. Graefe, A. Linville, and L. Shapiro. Sort versus hash revisited. IEEE Trans. on Knowledge and Data Eng., 6(6):934–944, Dec. 1994.
[359] G. Graefe and W. McKenna. The Volcano optimizer generator. Tech. Report 563, University of Colorado, Boulder, 1991.
[360] G. Graefe and W. McKenna. Extensibility and search efficiency in the Volcano optimizer generator. In Proc. IEEE Conference on Data Engineering, pages 209–218, 1993.
[361] G. Graefe and K. Ward. Dynamic query evaluation plans. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 358–366, 1989.
[362] R. Graham, D. Knuth, and O. Patashnik. Concrete Mathematics. Addison-Wesley, 2002.
[363] G. Grahne and A. Thomo. Algebraic rewritings for optimizing regular path queries. In Proc. Int. Conf. on Database Theory (ICDT), pages 301–315, 2001.
[364] G. Grahne and A. Thomo. New rewritings and optimizations for regular path queries. In Proc. Int. Conf. on Database Theory (ICDT), pages 242–258, 2003.
[365] J. Grant, J. Gryz, J. Minker, and L. Raschid. Semantic query optimization for object databases. In Proc. IEEE Conference on Data Engineering, pages 444–453, 1997.
[366] J. Gray, editor. The Benchmark Handbook. Morgan Kaufmann Publishers, San Mateo, CA, 1991.
[367] J. Gray and G. Graefe. The five-minute rule ten years later, and other computer storage rules of thumb. ACM SIGMOD Record, 26(4):63–68, 1997.
[368] J. Gray and F. Putzolu. The 5 minute rule for trading memory for disk accesses and the 10 byte rule for trading memory for CPU time. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 395–398, 1987.
[369] P. Grefen and R. de By. A multi-set extended relational algebra – a formal approach to a practical issue. In Proc. IEEE Conference on Data Engineering, pages 80–88, 1994.
[370] T. Grust. Accelerating XPath location steps. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 109–120, 2002.
[371] T. Grust and M. Van Keulen. Tree awareness for relational database kernels: Staircase join. In Intelligent Search on XML Data, pages 231–245, 2003.
[372] T. Grust, M. Van Keulen, and J. Teubner. Staircase join: Teach a relational DBMS to watch its (axis) steps. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 524–535, 2003.
[373] T. Grust, M. Van Keulen, and J. Teubner. Accelerating XPath evaluation in any RDBMS. ACM Trans. on Database Systems, 29(1):91–131, 2004.
[374] J. Gryz, B. Schiefer, J. Zheng, and C. Zuzarte. Discovery and application of check constraints in DB2. In Proc. IEEE Conference on Data Engineering, pages 551–556, 2001.
[375] E. Gudes and A. Reiter. On evaluating Boolean expressions. Software Practice and Experience, 3:345–350, 1973.
[376] S. Guha, N. Koudas, and K. Shim. Data-streams and histograms. In Annual ACM Symposium on Theory of Computing (STOC), pages 471–475, 2001.
[377] S. Guha, N. Koudas, and D. Srivastava. Fast algorithms for hierarchical range histogram construction. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 180–187, 2002.
[378] H. Gunadhi and A. Segev. Query processing algorithms for temporal intersection joins. In Proc. IEEE Conference on Data Engineering, pages 336–344, 1991.
[379] L. Guo, K. Beyer, J. Shanmugasundaram, and E. Shekita. Efficient inverted lists and query algorithms for structured value ranking in update-intense relational databases. In Proc. IEEE Conference on Data Engineering, pages 298–309, 2005.
[380] M. Guo, S. Y. W. Su, and H. Lam. An association algebra for processing object-oriented databases. In Proc. IEEE Conference on Data Engineering, pages ?–?, 1991.
[381] A. Gupta, V. Harinarayan, and D. Quass. Aggregate-query processing in data warehousing environments. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 358–369, 1995.
[382] A. Gupta, V. Harinarayan, and D. Quass. Generalized projections: A powerful approach to aggregation. Technical Report, 1995.
[383] A. Gupta and I. Mumick. Maintenance of materialized views: problems, techniques and applications. IEEE Data Engineering Bulletin, 18(2):3–19, 1995.
[384] R. Güting, R. Zicari, and D. Choy. An algebra for structured office documents. ACM Trans. on Information Systems, 7(4):123–157, 1989.
[385] R. H. Güting. Geo-relational algebra: A model and query language for geometric database systems. In J. W. Schmidt, S. Ceri, and M. Missikoff, editors, Proc. of the Intl. Conf. on Extending Database Technology, pages 506–527, Venice, Italy, Mar 1988. Springer-Verlag, Lecture Notes in Computer Science No. 303.
[386] R. H. Güting. Second-order signature: A tool for specifying data models, query processing, and optimization. Informatik-Bericht 12/1992, ETH Zürich, 1992.
[387] L. Haas, W. Chang, G. Lohman, J. McPherson, P. Wilms, G. Lapis, B. Lindsay, H. Pirahesh, M. Carey, and E. Shekita. Starburst mid-flight: As the dust clears. IEEE Trans. on Knowledge and Data Eng., 2(1):143–160, 1990.
[388] L. Haas, J. Freytag, G. Lohman, and H. Pirahesh. Extensible query processing in Starburst. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 377–388, 1989.
[389] A. Hadi. Matrix Algebra as a Tool. Duxbury Press, 1996.
[390] A. Halevy. Answering queries using views: A survey. The VLDB Journal, 10(4):270–294, Dec. 2001.
[391] P. A. V. Hall. Common subexpression identification in general algebraic systems. Tech. Rep. UKSC 0060, IBM UK Scientific Center, Peterlee, England, 1974.
[392] P. A. V. Hall. Optimization of single expressions in a relational database system. IBM J. Res. Devel., 20(3):244–257, 1976.
[393] P. A. V. Hall and S. Todd. Factorization of algebraic expressions. Tech. Report UKSC 0055, IBM UK Scientific Center, Peterlee, England, 1974.
[394] C. Hamalainen. Complexity of query optimisation and evaluation. Master’s thesis, Griffith University, Queensland, Australia, 2002.
[395] M. Hammer and B. Niamir. A heuristic approach to attribute partitioning. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 93–101, 1979.
[396] J. Han. Smallest-first query evaluation for database systems. In Australian Database Conference, pages ?–?, Christchurch, New Zealand, Jan. 1994.
[397] M. Z. Hanani. An optimal evaluation of Boolean expressions in an online query system. Communications of the ACM, 20(5):344–347, 1977.
[398] E. Hanson. A performance analysis of view materialization strategies. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 440–453, 1987.
[399] E. Hanson. Processing queries against database procedures. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages ?–?, 1988.
[400] E.N. Hanson, M. Chaabouni, C.-H. Kim, and Y.-W. Wang. A predicate matching algorithm for database rule systems. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 271–?, 1990.
[401] T. Härder. Implementing a generalized access path structure for a relational database system. ACM Trans. on Database Systems, 3(3):285–298, 1978.
[402] T. Härder, B. Mitschang, and H. Schöning. Query processing for complex objects. Data and Knowledge Engineering, 7(3):181–200, 1992.
[403] T. Härder and E. Rahm. Datenbanksysteme. Springer, 1999.
[404] T. Härder, H. Schöning, and A. Sikeler. Parallelism in processing queries on complex objects. In International Symposium on Databases in Parallel and Distributed Systems, Austin, TX, August 1988.
[405] V. Harinarayan and A. Gupta. Generalized projections: a powerful query optimization technique. Technical Report STAN-CS-TN-94-14, Stanford University, 1994.
[406] H. Harmouch and F. Naumann. Cardinality estimation: An experimental survey. Proc. of the VLDB Endowment (PVLDB), 11(4):499–512, 2017.
[407] E. Harris and K. Ramamohanarao. Join algorithm costs revisited. The VLDB Journal, 5(1):?–?, Jan 1996.
[408] D. Harville. Matrix Algebra from a Statistician’s Perspective. Springer, 2008.
[409] W. Hasan and H. Pirahesh. Query rewrite optimization in Starburst. Research Report RJ6367, IBM, 1988.
[410] Z. He, B. Lee, and R. Snapp. Self-tuning cost modeling of user-defined functions in an object-relational DBMS. ACM Trans. on Database Systems, 30(3):812–853, 2005.
[411] M. Heimel, M. Kiefer, and V. Markl. Self-tuning, GPU-accelerated kernel density models for multidimensional selectivity estimation. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 1477–1492, 2015.
[412] Heller. Rabbit: A performance counter library for Intel/AMD processors and Linux. perform internet search for this or similar tools.
[413] J. Hellerstein. Practical predicate placement. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 325–335, 1994.
[414] J. Hellerstein and J. Naughton. Query execution techniques for caching expensive methods. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 423–434, 1996.
[415] J. Hellerstein and M. Stonebraker. Predicate migration: Optimizing queries with expensive predicates. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 267–277, 1993.
[416] S. Helmer, C.-C. Kanne, and G. Moerkotte. Optimized translation of XPath expressions into algebraic expressions parameterized by programs containing navigational primitives. In Proc. Int. Conf. on Web Information Systems Engineering (WISE), pages 215–224, 2002.
[417] S. Helmer, C.-C. Kanne, and G. Moerkotte. Optimized translation of XPath expressions into algebraic expressions parameterized by programs containing navigational primitives. Technical Report 11, University of Mannheim, 2002.
[418] S. Helmer, B. König-Ries, and G. Moerkotte. The relational difference calculus and applications. Technical report, Universität Karlsruhe, 1993. (unpublished manuscript).
[419] S. Helmer and G. Moerkotte. Evaluation of main memory join algorithms for joins with set comparison join predicates. Technical Report 13/96, University of Mannheim, Mannheim, Germany, 1996.
[420] S. Helmer and G. Moerkotte. Evaluation of main memory join algorithms for joins with set comparison join predicates. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 386–395, 1997.
[421] S. Helmer and G. Moerkotte. Index structures for databases containing data items with set-valued attributes. Technical Report 2/97, University of Mannheim, 1997.
[422] S. Helmer and G. Moerkotte. A study of four index structures for set-valued attributes of low cardinality. Technical Report 02/99, University of Mannheim, 1999.
[423] S. Helmer and G. Moerkotte. Compiling away set containment and intersection joins. Technical Report 4, University of Mannheim, 2002.
[424] S. Helmer and G. Moerkotte. A performance study of four index structures for set-valued attributes of low cardinality. VLDB Journal, 12(3):244–261, 2003.
[425] S. Helmer, T. Neumann, and G. Moerkotte. Early grouping gets the skew. Technical Report 9, University of Mannheim, 2002.
[426] M. Henderson and R. Lawrence. An evaluation of multi-way joins for relational database systems. In ICEIS, pages 37–50, 2013.
[427] A. Heuer and M. H. Scholl. Principles of object-oriented query languages. In Proc. der GI-Fachtagung Datenbanksysteme für Büro, Technik und Wissenschaft (BTW). Springer, 1991.
[428] J. Hidders and P. Michiels. Avoiding unnecessary ordering operations in XPath. In Int. Workshop on Database Programming Languages, pages 54–70, 2003.
[429] D. Hirschberg. On the complexity of searching a set of vectors. SIAM J. Computing, 9(1):126–129, 1980.
[430] T. Hogg and C. Williams. Solving the really hard problems with cooperative search. In Proc. National Conference on Artificial Intelligence, pages 231–236, 1993.
[431] H.-C. Liu and K. Ramamohanarao. Algebraic equivalences among nested relational expressions. In CIKM, pages 234–243, 1994.
[432] R. Horn and C. Johnson. Matrix Analysis. Cambridge University Press, 2007.
[433] V. Hristidis, N. Koudas, and Y. Papakonstantinou. PREFER: A system for the efficient execution of multi-parametric ranked queries. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages ?–?, 2001.
[434] N. Huyn. Multiple-view self-maintenance in data warehousing environments. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 26–35, 1997.
[435] F. Hwang and G. Chang. Enumerating consecutive and nested partitions for graphs. Technical Report DIMACS Technical Report 93-15, Rutgers University, 1993.
[436] H.-Y. Hwang and Y.-T. Yu. An analytical method for estimating and interpreting query time. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 347–358, 1987.
[437] L. Hyafil and R. Rivest. Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1):15–17, 1976.
[438] T. Ibaraki and T. Kameda. Optimal nesting for computing n-relational joins. ACM Trans. on Database Systems, 9(3):482–502, 1984.
[439] O. Ibarra and J. Su. On the containment and equivalence of database queries with linear constraints. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 32–43, 1997.
[440] A. IJbema and H. Blanken. Estimating bucket accesses: A practical approach. In Proc. IEEE Conference on Data Engineering, pages 30–37, 1986.
[441] I. Ilyas, V. Markl, P. Haas, P. Brown, and A. Aboulnaga. Automatic relationship discovery in self-managing database systems. In Proc. Int. Conf. on Automatic Computing (ICAC), pages 340–341, 2004.
[442] I. Ilyas, V. Markl, P. Haas, P. Brown, and A. Aboulnaga. CORDS: Automatic discovery of correlations and soft functional dependencies. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 647–658, 2004.
[443] I. Ilyas, V. Markl, P. Haas, P. Brown, and A. Aboulnaga. CORDS: Automatic generation of correlation statistics in DB2. In Proc. Int. Conf. on Very Large Data Bases (VLDB), 2004.
[444] I. Ilyas, J. Rao, G. Lohman, D. Gao, and E. Lin. Estimating compilation time of a query optimizer. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 373–384, 2003.
[445] Y. Ioannidis. Query optimization. ACM Computing Surveys, 28(1):121–123, 1996.
[446] Y. Ioannidis. A. Tucker (ed.): The Computer Science and Engineering Handbook, chapter Query Optimization, pages 1038–1057. CRC Press, 1997.
[447] Y. Ioannidis. The history of histograms (abridged). In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 19–30, 2003.
[448] Y. Ioannidis and V. Poosala. Histogram-based approximation of set-valued query answers. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 174–185, 1999.
[449] Y. Ioannidis and R. Ramakrishnan. Generalized containment of conjunctive queries. Technical report, U. Wisconsin, Madison, 1992.
[450] Y. Ioannidis and R. Ramakrishnan. Containment of conjunctive queries: Beyond relations and sets. ACM Trans. on Database Systems, 20(3):288–324, 1995.
[451] Y. E. Ioannidis and S. Christodoulakis. On the propagation of errors in the size of join results. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 268–277, 1991.
[452] Y. E. Ioannidis and Y. C. Kang. Randomized algorithms for optimizing large join queries. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 312–321, 1990.
[453] Y. E. Ioannidis and Y. C. Kang. Left-deep vs. bushy trees: An analysis of strategy spaces and its implications for query optimization. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 168–177, 1991.
[454] Y. E. Ioannidis, Y. C. Kang, and T. Zhang. Cost wells in random graphs. personal communication, Dec. 1996.
[455] Y. E. Ioannidis, R. T. Ng, K. Shim, and T. K. Sellis. Parametric query optimization. Tech. report, University of Wisconsin, Madison, 1992.
[456] Y. E. Ioannidis, R. T. Ng, K. Shim, and T. K. Sellis. Parametric query optimization. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 103–114, 1992.
[457] Y. E. Ioannidis and E. Wong. Query optimization by simulated annealing. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 9–22, 1987.
[458] D. Jacobson and J. Wilkes. Disk scheduling algorithms based on rotational position. Technical Report HPL-CSP-91-7, Hewlett-Packard Laboratories, 1991.
[459] H. V. Jagadish, S. Al-Khalifa, A. Chapman, L.V.S. Lakshmanan, A. Nierman, S. Paparizos, J. Patel, D. Srivastava, N. Wiwatwattana, Y. Wu, and C. Yu. TIMBER: A Native XML Database. VLDB Journal, 2003. to appear.
[460] H. V. Jagadish, H. Jin, B. C. Ooi, and K.-L. Tan. Global optimization of histograms. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 223–234, 2001.
[461] H. V. Jagadish, N. Koudas, S. Muthukrishnan, V. Poosala, K. C. Sevcik, and T. Suel. Optimal histograms with quality guarantees. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 275–286, 1998.
[462] H. V. Jagadish, L. V. S. Lakshmanan, D. Srivastava, and K. Thompson. TAX: A tree algebra for XML. In Proc. Int. Workshop on Database Programming Languages, pages 149–164, 2001.
[463] C. Janssen. The visual profiler. perform internet search for this or similar tools.
[464] M. Jarke. Common subexpression isolation in multiple query optimization. In W. Kim, D. Reiner, and D. Batory, editors, Topics in Information Systems. Query Processing in Database Systems, pages 191–205, 1985.
[465] M. Jarke. Common subexpression isolation in multiple query optimization. In Query Processing in Database Systems, W. Kim, D. Reiner, D. Batory (Eds.), pages 191–205, 1985.
[466] M. Jarke, J. Clifford, and Y. Vassiliou. An optimizing PROLOG front-end to a relational query system. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 296–306, 1984.
[467] M. Jarke and J. Koch. Range nesting: A fast method to evaluate quantified queries. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 196–206, 1983.
[468] M. Jarke and J. Koch. Query optimization in database systems. ACM Computing Surveys, pages 111–152, Jun 1984.
[469] A. Jhingran. A performance study of query optimization algorithms on a database system supporting procedures. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 88–99, 1988.
[470] D. Johnson and A. Klug. Testing containment of conjunctive queries under functional and inclusion dependencies. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 164–168, 1982.
[471] D. Johnson and A. Klug. Testing containment of conjunctive queries under functional and inclusion dependencies. J. Comp. Sys. Sci., 28(1):167–189, 1984.
[472] D. S. Johnson and A. Klug. Optimizing conjunctive queries that contain untyped variables. SIAM J. Comput., 12(4):616–640, 1983.
[473] B. Jonsson, M. Franklin, and D. Srivastava. Interaction of query evaluation and buffer management for information retrieval. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 118–129, 1998.
[474] N. Kabra and D. DeWitt. Efficient mid-query re-optimization of suboptimal query execution plans. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 106–117, 1998.
[475] J. Kahn, G. Kalai, and N. Linial. The influence of variables on Boolean functions. In IEEE ???, pages 68–80, 1988.
[476] M. Kamath and K. Ramamritham. Bucket skip merge join: A scalable algorithm for join processing in very large databases using indexes. Technical Report 20, University of Massachusetts at Amherst, Amherst, MA, 1996.
[477] Y. Kambayashi. Processing cyclic queries. In W. Kim, D. Reiner, and D. Batory, editors, Query Processing in Database Systems, pages 62–78, 1985.
[478] Y. Kambayashi and M. Yoshikawa. Query processing utilizing dependencies and horizontal decomposition. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 55–68, 1983.
[479] N. Kamel and R. King. A model of data distribution based on texture analysis. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 319–325, 1985.
[480] R. Kaushik, C. Ré, and D. Suciu. General database statistics using entropy maximization. In Proc. Int. Workshop on Database Programming Languages, pages 84–99, 2009.
[481] A. Kawaguchi, D. Lieuwen, I. Mumick, and K. Ross. Implementing incremental view maintenance in nested data models. In Proc. Int. Workshop on Database Programming Languages, 1997.
[482] A. Keller and J. Basu. A predicate-based caching scheme for client-server database architectures. In PDIS, pages 229–238, 1994.
[483] T. Keller, G. Graefe, and D. Maier. Efficient assembly of complex objects. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 148–157, 1991.
[484] A. Kemper and A. Eickler. Datenbanksysteme. Oldenbourg, 2001. 4th Edition.
[485] A. Kemper and G. Moerkotte. Advanced query processing in object bases: A comprehensive approach to access support, query transformation and evaluation. Technical Report 27/90, University of Karlsruhe, 1990.
[486] A. Kemper and G. Moerkotte. Advanced query processing in object bases using access support relations. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 294–305, 1990.
[487] A. Kemper and G. Moerkotte. Query optimization in object bases: Exploiting relational techniques. In Proc. Dagstuhl Workshop on Query Optimization (J.-C. Freytag, D. Maier and G. Vossen (eds.)). Morgan Kaufmann, 1993.
[488] A. Kemper, G. Moerkotte, and K. Peithner. A blackboard architecture for query optimization in object bases. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 543–554, 1993.
[489] A. Kemper, G. Moerkotte, K. Peithner, and M. Steinbrunn. Optimizing disjunctive queries with expensive predicates. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 336–347, 1994.
[490] A. Kemper, G. Moerkotte, and M. Steinbrunn. Optimierung Boolescher Ausdrücke in Objektbanken. In Grundlagen von Datenbanken (Eds. U. Lipeck, R. Manthey), pages 91–95, 1992.
[491] A. Kemper, G. Moerkotte, and M. Steinbrunn. Optimization of Boolean expressions in object bases. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 79–90, 1992.
[492] W. Kiessling. On semantic reefs and efficient processing of correlation queries with aggregates. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 241–250, 1985.
[493] K. C. Kim, W. Kim, D. Woelk, and A. Dale. Acyclic query processing in object-oriented databases. In Proc. of the Entity Relationship Conf., 1988.
[494] W. Kim. On optimizing an SQL-like nested query. ACM Trans. on Database Systems, 7(3):443–469, Sep. 1982.
[495] J. J. King. Exploring the use of domain knowledge for query processing efficiency. Technical Report STAN-CS-79-781, Computer Science Department, Stanford University, 1979.
[496] J. J. King. QUIST: A system for semantic query optimization in relational databases. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 510–517, 1981.
[497] A. Klausner. Multirelations in Relational Databases. PhD thesis, Harvard University, Cambridge, 1986.
[498] A. Klausner and N. Goodman. Multirelations – semantics and languages. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 251–258, 1985.
[499] M. Klettke, L. Schneider, and A. Heuer. Metrics for XML Document Collections. In EDBT Workshop XML-Based Data Management (XMLDM), pages 15–28, 2002.
Calculating constraints on relational expressions. ACM Trans. on Database Systems, 5(3):260–290, 1980.
[501] A. Klug. Access paths in the “ABE” statistical query facility. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 161–173, 1982.
[502] A. Klug. Equivalence of relational algebra and relational calculus query languages having aggregate functions. Journal of the ACM, 29(3):699–717, 1982.
[503] A. Klug. On conjunctive queries containing inequalities. Journal of the ACM, 35(1):146–160, 1988. Written 1982 and published posthumously.
[504] D. Knuth. The Art of Computer Programming, Volume 1: Fundamental Algorithms. Addison-Wesley, 1997.
[505] D. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching. Addison-Wesley, 2000.
[506] J. Koch. Relationale Anfragen: Zerlegung und Optimierung. Informatik-Fachberichte 101. Springer-Verlag, 1985.
[507] J. Kollias. An estimate for seek time for batched searching of random or index sequential structured files. The Computer Journal, 21(2):132–133, 1978.
[508] A. König and G. Weikum. Combining histograms and parametric curve fitting for feedback-driven query result-size estimation. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 423–434, 1999.
[509] A. König and G. Weikum. Automatic tuning of data synopses. Information Systems, 28:85–109, 2003.
[510] B. König-Ries, S. Helmer, and G. Moerkotte. An experimental study on the complexity of left-deep join ordering problems for cyclic queries. Working Draft, 1994.
[511] B. König-Ries, S. Helmer, and G. Moerkotte. An experimental study on the complexity of left-deep join ordering problems for cyclic queries. Technical Report 95-4, RWTH-Aachen, 1995.
[512] R. Kooi. The Optimization of Queries in Relational Databases. PhD thesis, Case Western Reserve University, 1980.
[513] F. Korn, H. V. Jagadish, and C. Faloutsos. Efficiently supporting ad hoc queries in large datasets of time sequences. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 289–300, 1997.
[514] D. Kossmann. The state of the art in distributed query processing. ACM Computing Surveys, 32(4):422–469, 2000.
[515] D. Kossmann and K. Stocker. Iterative dynamic programming: a new class of query optimization algorithms. ACM Trans. on Database Systems, 25(1):43–82, 2000.
[516] N. Koudas, S. Muthukrishnan, and D. Srivastava. Optimal histograms for hierarchical range queries. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 196–204, 2000.
[517] W. Kowarschick. Semantic optimization: What are disjunctive residues useful for? SIGMOD Record, 21(3):26–32, September 1992.
[518] R. Krauthgamer, A. Mehta, V. Raman, and A. Rudra. Greedy list intersection. In Proc. IEEE Conference on Data Engineering, pages 1033–1042, 2008.
[519] D. Kreher and D. Stinson. Combinatorial Algorithms: Generation, Enumeration, and Search. CRC Press, 1999.
[520] R. Krishnamurthy, H. Boral, and C. Zaniolo. Optimization of nonrecursive queries. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 128–137, 1986.
[521] A. Kumar and M. Stonebraker. The effect of join selectivities on optimal nesting order. SIGMOD Record, 16(1):28–41, 1987.
[522] I. Kunen and D. Suciu. A scalable algorithm for query minimization. ask Dan for more information, year.
[523] S. Kwan and H. Strong. Index path length evaluation for the Research Storage System of System R. Technical Report RJ2736, IBM Research Laboratory, San Jose, 1980.
[524] M. Lacroix and A. Pirotte. Generalized joins.
SIGMOD Record, 8(3):14–15, 1976.
[525] L. V. S. Lakshmanan and R. Missaoui. Pushing semantics inside recursion: A general framework for semantic optimization of recursive queries. In Proc. IEEE Conference on Data Engineering, pages 211–220, 1995.
[526] S. Lang and Y. Manolopoulos. Efficient expressions for completely and partly unsuccessful batched search of tree-structured files. IEEE Trans. on Software Eng., 16(12):1433–1435, 1990.
[527] S.-D. Lang, J. Driscoll, and J. Jou. A unified analysis of batched searching of sequential and tree-structured files. ACM Trans. on Database Systems, 14(4):604–618, 1989.
[528] T. Lang, C. Wood, and I. Fernandez. Database buffer paging in virtual storage systems. ACM Trans. on Database Systems, 2(4):339–351, 1977.
[529] R. Lanzelotte and J.-P. Cheiney. Adapting relational optimisation technology for deductive and object-oriented declarative database languages. In Proc. Int. Workshop on Database Programming Languages, pages 322–336, 1991.
[530] R. Lanzelotte and P. Valduriez. Extending the search strategy in a query optimizer. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 363–373, 1991.
[531] R. Lanzelotte, P. Valduriez, and M. Zaït. Optimization of object-oriented recursive queries using cost-controlled strategies. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 256–265, 1992.
[532] R. Lanzelotte, P. Valduriez, and M. Zaït. On the effectiveness of optimization search strategies for parallel execution. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 493–504, 1993.
[533] R. Lanzelotte, P. Valduriez, M. Ziane, and J.-P. Cheiney. Optimization of nonrecursive queries in OODBs. In Proc. Int. Conf. on Deductive and Object-Oriented Databases (DOOD), pages 1–21, 1991.
[534] P.-Å. Larson. Data reduction by partial preaggregation. In Proc. IEEE Conference on Data Engineering, pages 706–715, 2002.
[535] P.-Å. Larson and H. Yang. Computing queries from derived relations. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 259–269, 1985.
[536] Y.-N. Law, H. Wang, and C. Zaniolo. Query languages and data models for database sequences and data streams. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 492–503, 2004.
[537] E. Lawler. Sequencing jobs to minimize total weighted completion time subject to precedence constraints. Ann. Discrete Math., 2:75–90, 1978.
[538] B. S. Lee and G. Wiederhold. Outer joins and filters for instantiating objects from relational databases through views. IEEE Trans. on Knowledge and Data Engineering, 6(1):108–119, 1994.
[539] C. Lee, C.-S. Shih, and Y.-H. Chen. Optimizing large join queries using a graph-based approach. IEEE Trans. on Knowledge and Data Eng., 13(2):298–315, 2001.
[540] M. K. Lee, J. C. Freytag, and G. M. Lohman. Implementing an interpreter for functional rules in a query optimizer. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 218–239, 1988.
[541] M. K. Lee, J. C. Freytag, and G. M. Lohman. Implementing an optimizer for functional rules in a query optimizer. Technical Report RJ 6125, IBM Almaden Research Center, San Jose, CA, 1988.
[542] T. Lehman and B. Lindsay. The Starburst long field manager. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 375–383, 1989.
[543] K. Lehnert. Regelbasierte Beschreibung von Optimierungsverfahren für relationale Datenbankanfragesprachen. PhD thesis, Technische Universität München, December 1988.
[544] A. Lerner and D. Shasha. AQuery: query language for ordered data, optimization techniques, and experiments. In Proc. Int.
Conf. on Very Large Data Bases (VLDB), pages 345–356, 2003.
[545] H. Leslie, R. Jain, D. Birdsall, and H. Yaghmai. Efficient search of multi-dimensional B-trees. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 710–719, 1995.
[546] M. Levene and G. Loizou. Correction to “Null values in nested relational databases” by M. Roth, H. Korth, and A. Silberschatz. Acta Informatica, 28(6):603–605, 1991.
[547] M. Levene and G. Loizou. A fully precise null extended nested relational algebra. Fundamenta Informaticae, 19(3/4):303–342, 1993.
[548] A. Levy, A. Mendelzon, and Y. Sagiv. Answering queries using views. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages ?–?, 1995.
[549] A. Levy, A. Mendelzon, Y. Sagiv, and D. Srivastava. Answering Queries Using Views, pages 93–106. MIT Press, 1999.
[550] A. Levy, A. Mendelzon, D. Srivastava, and Y. Sagiv. Answering queries using views. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 95–104, 1995.
[551] A. Levy, I. Mumick, and Y. Sagiv. Query optimization by predicate move-around. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 96–107, 1994.
[552] A. Y. Levy and I. S. Mumick. Reasoning with aggregation constraints. In P. Apers, M. Bouzeghoub, and G. Gardarin, editors, Proc. European Conf. on Extending Database Technology (EDBT), Lecture Notes in Computer Science, pages 514–534. Springer, March 1996.
[553] H. Lewis and C. Papadimitriou. Elements of the Theory of Computation. Prentice Hall, 1981.
[554] C. Li, K. Chang, I. Ilyas, and S. Song. RankSQL: Query algebra and optimization for relational top-k queries. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 131–142, 2005.
[555] D. Lichtenstein. Planar formulae and their uses. SIAM J. Comp., 11(2):329–343, 1982.
[556] J. Liebehenschel. Ranking and unranking of lexicographically ordered words: An average-case analysis. J. of Automata, Languages, and Combinatorics, 2:227–268, 1997.
[557] J. Liebehenschel. Lexicographical generation of a generalized Dyck language. Technical Report 5/98, University of Frankfurt, 1998.
[558] J. Liebehenschel. Lexikographische Generierung, Ranking und Unranking kombinatorischer Objekte: Eine Average-Case-Analyse. PhD thesis, University of Frankfurt, 2000.
[559] H. Liefke. Horizontal query optimization on ordered semistructured data. In ACM SIGMOD Workshop on the Web and Databases (WebDB), 1999.
[560] L. Lim, M. Wang, S. Padmanabhan, J. Vitter, and R. Parr. XPathLearner: An on-line self-tuning Markov histogram for XML path selectivity estimation. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 442–453, 2002.
[561] J. Lin and M. Ozsoyoglu. Processing OODB queries by O-algebra. In Int. Conference on Information and Knowledge Management (CIKM), pages 134–142, 1996.
[562] J. W. S. Liu. Algorithms for parsing search queries in systems with inverted file organization. ACM Trans. on Database Systems, 1(4):299–316, 1976.
[563] M.-L. Lo and C. Ravishankar. Towards eliminating random I/O in hash joins. In Proc. IEEE Conference on Data Engineering, pages 422–429, 1996.
[564] M. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret. Applications of second-order cone programming. Linear Algebra and its Applications, 284:192–228, 1998.
[565] G. Lohman. Grammar-like functional rules for representing query optimization alternatives. Research Report RJ 5992, IBM, 1987.
[566] G. Lohman. Heuristic method for joining relational database tables.
IBM Technical Disclosure Bulletin, 30(9):8–10, 1988.
[567] G. M. Lohman. Grammar-like functional rules for representing query optimization alternatives. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 18–27, 1988.
[568] D. Lomet. B-tree page size when caching is considered. ACM SIGMOD Record, 27(3):28–32, 1998.
[569] R. Lorie. XRM – an extended (N-ary) relational model. Technical Report 320-2096, IBM Cambridge Scientific Center, 1974.
[570] H. Lu and K.-L. Tan. On sort-merge algorithms for band joins. IEEE Trans. on Knowledge and Data Eng., 7(3):508–510, June 1995.
[571] W. S. Luk. On estimating block accesses in database organizations. Communications of the ACM, 26(11):945–947, 1983.
[572] V. Lum, P. Yuen, and M. Dodd. Key-to-address transform techniques. Communications of the ACM, 14:228–239, 1971.
[573] J. Lumbroso. An optimal cardinality estimation algorithm based on order statistics and its full analysis. In Conf. on Analysis of Algorithms (AofA), Discrete Mathematics and Theoretical Computer Science, pages 491–506, 2010.
[574] G. Luo, J. Naughton, C. Ellmann, and M. Watzke. Toward a progress indicator for database queries. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 791–802, 2004.
[575] G. Luo, J. Naughton, C. Ellmann, and M. Watzke. Increasing the accuracy and coverage of SQL progress indicators. In Proc. IEEE Conference on Data Engineering, pages 853–864, 2005.
[576] D. Maier and D. S. Warren. Incorporating computed relations in relational databases. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 176–187, 1981.
[577] M. Majster-Cederbaum. Elimination of redundant operations in relational queries with general selection operators. Computing, 34(4):303–323, 1984.
[578] A. Makinouchi, M. Tezuka, H. Kitakami, and S. Adachi. The optimization strategy for query evaluation in RDB/V1. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 518–529, 1981.
[579] T. Malkemus, S. Padmanabhan, and B. Bhattacharjee. Predicate derivation and monotonicity detection in DB2 UDB. In Proc. IEEE Conference on Data Engineering, pages ?–?, 2005.
[580] C. V. Malley and S. B. Zdonik. A knowledge-based approach to query optimization. In Proc. Int. Conf. on Expert Database Systems, pages 329–344, 1987.
[581] N. Mamoulis. Efficient processing of joins on set-valued attributes. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 157–168, 2003.
[582] M. Mannino and A. Rivera. An extensible model of selectivity estimation. Information Sciences, 49:225–247, 1989.
[583] M. V. Mannino, P. Chu, and T. Sager. Statistical profile estimation in database systems. ACM Computing Surveys, 20(3):191–221, 1988.
[584] Y. Manolopoulos and J. Kollias. Estimating disk head movement in batched searching. BIT, 28:27–36, 1988.
[585] Y. Manolopoulos, J. Kollias, and M. Hatzopoulos. Sequential vs. binary batched search. The Computer Journal, 29(4):368–372, 1986.
[586] Y. Manolopoulos and J. Kollias. Performance of a two-headed disk system when serving database queries under the scan policy. ACM Trans. on Database Systems, 14(3):425–442, 1989.
[587] S. March and D. Severance. The determination of efficient record segmentation and blocking factors for shared data files. ACM Trans. on Database Systems, 2(3):279–296, 1977.
[588] R. Marek and E. Rahm. TID hash joins. In Int. Conference on Information and Knowledge Management (CIKM), pages 42–49, 1994.
[589] V. Markl, P. Haas, M. Kutsch, N. Megiddo, U. Srivastava, and T. Tran.
Consistent selectivity estimation via maximum entropy. The VLDB Journal, 16:55–76, 2007.
[590] V. Markl, N. Megiddo, M. Kutsch, T. Tran, P. Haas, and U. Srivastava. Consistently estimating the selectivity of conjuncts of predicates. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 373–384, 2005.
[591] N. May, M. Brantner, A. Böhm, C.-C. Kanne, and G. Moerkotte. Index vs. navigation in XPath evaluation. In Int. XML Database Symp. (XSym), pages 16–30, 2006.
[592] N. May, S. Helmer, C.-C. Kanne, and G. Moerkotte. XQuery processing in Natix with an emphasis on join ordering. In Int. Workshop on XQuery Implementation, Experience and Perspectives (XIME-P), pages 49–54, 2004.
[593] N. May, S. Helmer, and G. Moerkotte. Nested queries and quantifiers in an ordered context. Technical report, University of Mannheim, 2003.
[594] N. May, S. Helmer, and G. Moerkotte. Quantifiers in XQuery. In Proc. Int. Conf. on Web Information Systems Engineering (WISE), pages 313–316, 2003.
[595] N. May, S. Helmer, and G. Moerkotte. Three cases for query decorrelation in XQuery. In Int. XML Database Symp. (XSym), pages 70–84, 2003.
[596] N. May, S. Helmer, and G. Moerkotte. Nested queries and quantifiers in an ordered context. In Proc. IEEE Conference on Data Engineering, pages 239–250, 2004.
[597] N. May, S. Helmer, and G. Moerkotte. Strategies for query unnesting in XML databases. ACM Trans. on Database Systems, 31(3):968–1013, 2006.
[598] N. May and G. Moerkotte. Main memory implementations for binary grouping. In Int. XML Database Symp. (XSym), pages 162–176, 2005.
[599] J. McHugh and J. Widom. Query optimization for XML. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 315–326, 1999.
[600] N. Megiddo and D. Modha. Outperforming LRU with an adaptive replacement cache algorithm. IEEE Computer, 37(4):58–65, 2004.
[601] S. Melnik and H. Garcia-Molina. Divide-and-conquer algorithm for computing set containment joins. In Proc. of the Int. Conf. on Extending Database Technology (EDBT), pages 427–444, 2002.
[602] S. Melnik and H. Garcia-Molina. Adaptive algorithms for set containment joins. ACM Trans. on Database Systems, 28(1):56–99, 2003.
[603] T. Merrett and E. Otoo. Distribution models of relations. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 418–425, 1979.
[604] T. H. Merrett, Y. Kambayashi, and H. Yasuura. Scheduling of page-fetches in join operations. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 488–498, 1981.
[605] R. Van Meter. Observing the effects of multi-zone disks. In USENIX Annual Technical Conference, 1997.
[606] G. Miklau and D. Suciu. Containment and equivalence of XPath expressions. Journal of the ACM, 51(1):2–45, 2004.
[607] T. Milo and D. Suciu. Index structures for path expressions. In Proc. Int. Conf. on Database Theory (ICDT), pages 277–295, 1999.
[608] M. Minoux. Mathematical Programming: Theory and Algorithms. Wiley, 1986.
[609] D. Mitchell, B. Selman, and H. Levesque. Hard and easy distributions of SAT problems. In Proc. National Conference on Artificial Intelligence, pages 459–465, 1992.
[610] G. Mitchell. Extensible Query Processing in an Object-Oriented Database. PhD thesis, Brown University, Providence, RI 02912, 1993.
[611] G. Mitchell, U. Dayal, and S. Zdonik. Control of an extensible query optimizer: A planning-based approach. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages ?–?, 1993.
[612] G. Mitchell, S. Zdonik, and U. Dayal. An architecture for query processing in persistent object stores.
In Proc. of the Hawaiian Conf. on Computer and System Sciences, pages 787–798, 1992.
[613] G. Mitchell, S. Zdonik, and U. Dayal. Optimization of object-oriented queries: Problems and applications. In A. Dogac, M. T. Özsu, A. Biliris, and T. Sellis, editors, Object-Oriented Database Systems, NATO ASI Series F: Computer and Systems Sciences, Vol. 130, pages 119–146. Springer, 1994.
[614] G. Moerkotte. Small materialized aggregates: A light weight index structure for data warehousing. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 476–487, 1998.
[615] G. Moerkotte. Constructing optimal bushy trees possibly containing cross products for order preserving joins is in P. Technical Report 12, University of Mannheim, 2003.
[616] G. Moerkotte. DP-counter analytics. Technical Report 2, University of Mannheim, 2006.
[617] G. Moerkotte. Best approximation under a convex paranorm. Technical Report MA-08-07, University of Mannheim, 2008.
[618] G. Moerkotte and T. Neumann. Analysis of two existing and one new dynamic programming algorithm for the generation of optimal bushy trees without cross products. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 930–941, 2006.
[619] G. Moerkotte and T. Neumann. Dynamic programming strikes back. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 539–552, 2008.
[620] G. Moerkotte and T. Neumann. Faster join enumeration for complex queries. In Proc. IEEE Conference on Data Engineering, pages 1430–1432, 2008.
[621] G. Moerkotte and T. Neumann. Accelerating queries with group-by and join by groupjoin. In Proc. of the VLDB Endowment (PVLDB), pages 843–851, 2011.
[622] G. Moerkotte, T. Neumann, and G. Steidl. Preventing bad plans by bounding the impact of cardinality estimation errors. Proc. of the VLDB Endowment (PVLDB), 2(1):982–993, 2009.
[623] G. Moerkotte and G. Steidl. Best approximation with respect to a quotient functional. Technical Report X, University of Mannheim, 2008.
[624] C. Mohan. Interactions between query optimization and concurrency control. In Int. Workshop on RIDE, 1992.
[625] C. Mohan, D. Haderle, Y. Wang, and J. Cheng. Single table access using multiple indexes: Optimization, execution, and concurrency control techniques. In Proc. of the Int. Conf. on Extending Database Technology (EDBT), pages 29–43, 1990.
[626] J. Monk and R. Bonnett, editors. Handbook of Boolean Algebras. North Holland, 1989.
[627] C. Monma and J. Sidney. Sequencing with series-parallel precedence constraints. Math. Oper. Res., 4:215–224, 1979.
[628] R. Morris. Counting large numbers of events in small registers. Communications of the ACM, 21(10):840–842, 1978.
[629] T. Morzy, M. Matyasiak, and S. Salza. Tabu search optimization of large join queries. In Proc. of the Int. Conf. on Extending Database Technology (EDBT), pages 309–322, 1994.
[630] L. Moses and R. Oakland. Tables of Random Permutations. Stanford University Press, 1963.
[631] I. Mumick, S. Finkelstein, H. Pirahesh, and R. Ramakrishnan. Magic is relevant. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 247–258, 1990.
[632] I. Mumick and H. Pirahesh. Implementation of magic sets in a relational database system. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 103–114, 1994.
[633] I. Mumick, H. Pirahesh, and R. Ramakrishnan. The magic of duplicates and aggregates. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 264–277, 1990.
[634] M. Muralikrishna and D. J. DeWitt.
Equi-depth histograms for estimating selectivity factors for multi-dimensional queries. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 28–36, 1988.
[635] B. Muthuswamy and L. Kerschberg. A detailed statistical model for relational query optimization. In ACM Annual Conference – The range of computing: mid-80’s perspective, pages 439–447, 1985.
[636] W. Myrvold and F. Ruskey. Ranking and unranking permutations in linear time. Information Processing Letters, 79(6):281–284, 2001.
[637] R. Nakano. Translation with optimization from relational calculus to relational algebra having aggregate functions. ACM Trans. on Database Systems, 15(4):518–557, 1990.
[638] K. Seppi, J. Barnes, and C. Morris. A Bayesian approach to database query optimization. ORSA J. on Computing, 5(4):410–419, 1993.
[639] T. Neumann. Query simplification: graceful degradation for join-order optimization. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 403–414, 2009.
[640] T. Neumann and C. Galindo-Legaria. Taking the edge off cardinality estimation errors using incremental execution. In Proc. der GI-Fachtagung Datenbanksysteme für Büro, Technik und Wissenschaft (BTW), pages 73–92, 2013.
[641] T. Neumann and A. Kemper. Unnesting arbitrary queries. In BTW, pages 383–402, 2015.
[642] T. Neumann and S. Michel. Smooth interpolation histograms with error guarantees. In British National Conference on Databases (BNCOD), pages ?–?, 2008.
[643] T. Neumann and G. Moerkotte. A combined framework for grouping and order optimization. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 960–971, 2004.
[644] T. Neumann and G. Moerkotte. An efficient framework for order optimization. In Proc. IEEE Conference on Data Engineering, pages 461–472, 2004.
[645] F. Neven and T. Schwentick. XPath containment in the presence of disjunction, DTDs, and variables. In Proc. Int. Conf. on Database Theory (ICDT), pages 315–329, 2003.
[646] S. Ng. Advances in disk technology: Performance issues. IEEE Computer, 31(5):75–81, 1998.
[647] W. Ng and C. Ravishankar. Relational database compression using augmented vector quantization. In Proc. IEEE Conference on Data Engineering, pages 540–549, 1995.
[648] S. Nigam and K. Davis. A semantic query optimization algorithm for object-oriented databases. In Second International Workshop on Constraint Database Systems, pages 329–344, 1997.
[649] E. Omiecinski. Heuristics for join processing using nonclustered indexes. IEEE Trans. on Software Eng., 15(1):18–25, February 1989.
[650] P. O’Neil. Database Principles, Programming, Performance. Morgan Kaufmann, 1994.
[651] P. O’Neil and D. Quass. Improved query performance with variant indexes. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 38–49, 1997.
[652] K. Ono and G. Lohman. Extensible enumeration of feasible joins for relational query optimization. Technical Report RJ 6625, IBM Almaden Research Center, 1988.
[653] K. Ono and G. Lohman. Measuring the complexity of join enumeration in query optimization. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 314–325, 1990.
[654] J. A. Orenstein, S. Haradhvala, B. Margulies, and D. Sakahara. Query processing in the ObjectStore database system. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 403–412, 1992.
[655] J. A. Orenstein and F. A. Manola. PROBE spatial data modeling and query processing in an image database application. IEEE Trans. on Software Eng., 14(5):611–629, 1988.
[656] M. Ortega-Binderberger, K. Chakrabarti, and S. Mehrotra.
An approach to integrating query refinement in SQL. In Proc. of the Int. Conf. on Extending Database Technology (EDBT), pages 15–33, 2002.
[657] S. Osborn. Identity, equality and query optimization. In Proc. OODB, 1989.
[658] N. Ott. On the problem of removing redundant join operations. Technical Report TR 80.01.002, IBM Scientific Center, Heidelberg, 1980.
[659] N. Ott and K. Horlaender. Removing redundant joins in queries involving views. Technical Report TR-82.03.003, IBM Scientific Center, Heidelberg, 1982.
[660] G. Ozsoyoglu, V. Matos, and Z. M. Ozsoyoglu. Query processing techniques in the Summary-Table-by-Example database query language. ACM Trans. on Database Systems, 14(4):526–573, 1989.
[661] G. Ozsoyoglu and H. Wang. A relational calculus with set operators, its safety and equivalent graphical languages. IEEE Trans. on Software Eng., SE-15(9):1038–1052, 1989.
[662] T. Özsu and J. Blakeley. Query processing in object-oriented database systems. In W. Kim, editor, Modern Database Systems, pages 146–174. Addison-Wesley, 1995.
[663] T. Özsu and D. Meechan. Finding heuristics for processing selection queries in relational database systems. Information Systems, 15(3):359–373, 1990.
[664] T. Özsu and D. Meechan. Join processing heuristics in relational database systems. Information Systems, 15(4):429–444, 1990.
[665] T. Özsu and P. Valduriez. Principles of Distributed Database Systems. Prentice-Hall, 1999.
[666] T. Özsu and P. Valduriez. Principles of Distributed Database Systems. Springer, 2011.
[667] T. Özsu and B. Yao. Evaluation of DBMSs using XBench benchmark. Technical Report CS-2003-24, University of Waterloo, 2003.
[668] P. Palvia. Expressions for batched searching of sequential and hierarchical files. ACM Trans. on Database Systems, 10(1):97–106, 1985.
[669] P. Palvia and S. March. Approximating block accesses in database organizations. Information Processing Letters, 19:75–79, 1984.
[670] S. Papadimitriou, H. Kitagawa, P. Gibbons, and C. Faloutsos. LOCI: Fast outlier detection using the local correlation integral. In Proc. IEEE Conference on Data Engineering, pages 315–326, 2003.
[671] V. Papadimos and D. Maier. Mutant query plans. Information & Software Technology, 44(4):197–206, 2002.
[672] Y. Papakonstantinou and V. Vianu. Incremental validation of XML documents. In Proc. Int. Conf. on Database Theory (ICDT), pages 47–63, 2003.
[673] S. Paparizos, S. Al-Khalifa, H. V. Jagadish, L. V. S. Lakshmanan, A. Nierman, D. Srivastava, and Y. Wu. Grouping in XML. In EDBT Workshops, pages 128–147, 2002.
[674] S. Paparizos, S. Al-Khalifa, H. V. Jagadish, A. Nierman, and Y. Wu. A physical algebra for XML. Technical report, University of Michigan, 2002.
[675] J. Paredaens and D. Van Gucht. Converting nested algebra expressions into flat algebra expressions. ACM Trans. on Database Systems, 17(1):65–93, March 1992.
[676] C.-S. Park, M. Kim, and Y.-J. Lee. Rewriting OLAP queries using materialized views and dimension hierarchies in data warehouses. In Proc. IEEE Conference on Data Engineering, pages 515–523, 2001.
[677] J. Patel, M. Carey, and M. Vernon. Accurate modeling of the hybrid hash join algorithm. In Proc. ACM SIGMETRICS Conf. on Measurement and Modeling of Computer Systems, pages 56–66, 1994.
[678] R. Patterson, G. Gibson, E. Ginting, D. Stodolsky, and J. Zelenka. Informed prefetching and caching. Technical Report CMU-CS-95-134, Carnegie Mellon University, 1995.
[679] R. Patterson, G. Gibson, and M. Satyanarayanan. A status report on research in transparent informed prefetching.
Technical Report CMU-CS-93-113, Carnegie Mellon University, 1993.
[680] G. Paulley. Exploiting Functional Dependence in Query Optimization. PhD thesis, University of Waterloo, 2000.
[681] G. Paulley and P.-A. Larson. Exploiting uniqueness in query optimization. In Proc. IEEE Conference on Data Engineering, pages 68–79, 1994.
[682] J. Pearl. Heuristics: Intelligent Search Strategies for Computer Problem Solving. Addison-Wesley, 1984.
[683] A. Pellenkoft, C. Galindo-Legaria, and M. Kersten. Complexity of transformation-based optimizers and duplicate-free generation of alternatives. Technical Report CS-R9639, CWI, 1996.
[684] A. Pellenkoft, C. Galindo-Legaria, and M. Kersten. The complexity of transformation-based join enumeration. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 306–315, 1997.
[685] A. Pellenkoft, C. Galindo-Legaria, and M. Kersten. Duplicate-free generation of alternatives in transformation-based optimizers. In Proceedings of the International Conference on Database Systems for Advanced Applications (DASFAA), pages 117–124, 1997.
[686] M. Pettersson. Linux x86 performance monitoring counters driver. Perform an internet search for this or similar tools.
[687] M. Pezarro. A note on estimating hit ratios for direct-access storage devices. The Computer Journal, 19(3):271–272, 1976.
[688] G. Piatetsky-Shapiro and C. Connell. Accurate estimation of the number of tuples satisfying a condition. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 256–276, 1984.
[689] H. Pirahesh, J. Hellerstein, and W. Hasan. Extensible/rule-based query rewrite optimization in Starburst. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 39–48, 1992.
[690] H. Pirahesh, T. Leung, and W. Hasan. A rule engine for query transformation in Starburst and IBM DB2 C/S DBMS. In Proc. IEEE Conference on Data Engineering, pages 391–400, 1997.
[691] A. Pirotte. Fundamental and secondary issues in the design of nonprocedural relational languages. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 239–250, 1979.
[692] M. Piwowarski. Comments on batched searching of sequential and tree-structured files. ACM Trans. on Database Systems, 10(2):285–287, 1985.
[693] N. Polyzotis and M. Garofalakis. XSKETCH synopsis for XML. In Hellenic Data Management Symposium 02, 2002.
[694] S. L. Pollack. Conversion of limited entry decision tables to computer programs. Communications of the ACM, 8(11):677–682, 1965.
[695] N. Polyzotis and M. Garofalakis. Statistical synopses for graph-structured XML databases. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 358–369, 2002.
[696] N. Polyzotis and M. Garofalakis. Structure and value synopsis for XML data graphs. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 466–477, 2002.
[697] N. Polyzotis, M. Garofalakis, and Y. Ioannidis. Selectivity estimation for XML twigs. In Proc. IEEE Conference on Data Engineering, pages 264–275, 2002.
[698] V. Poosala and Y. Ioannidis. Selectivity estimation without the attribute value independence assumption. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 486–495, 1997.
[699] V. Poosala and Y. Ioannidis. Estimation of query-result distribution and its application to parallel-join load balancing. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 448–459, 1996.
[700] V. Poosala, Y. Ioannidis, P. Haas, and E. Shekita. Improved histograms for selectivity estimates of range predicates. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 294–305, 1996.
[701] S. Pramanik and D. Ittner. Use of graph-theoretic models for optimal relational database accesses to perform joins. ACM Trans. on Database Systems, 10(1):57–74, 1985.
[702] W. Press, S. Teukolsky, W. Vetterling, and B. Flannery. Numerical Recipes. Cambridge University Press, 2007. Third Edition.
[703] X. Qian. Query folding. In Proc. IEEE Conference on Data Engineering, pages 48–55, 1996.
[704] D. Quass and J. Widom. On-line warehouse view maintenance. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 393–404, 1997.
[705] Y.-J. Qyang. A tight upper bound for the lumped disk seek time for the SCAN disk scheduling policy. Information Processing Letters, 54:355–358, 1995.
[706] E. Rahm. Mehrrechner-Datenbanksysteme: Grundlagen der verteilten und parallelen Datenbankverwaltung. Addison-Wesley, 1994.
[707] A. Rajaraman, Y. Sagiv, and J. D. Ullman. Answering queries using templates with binding patterns. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), 1995.
[708] R. Rantzau, L. Shapiro, B. Mitschang, and Q. Wang. Algorithms and applications for universal quantification in relational databases. Information Systems, 28(1-2):3–32, 2003.
[709] R. Ramakrishnan and J. Gehrke. Database Management Systems. McGraw Hill, 2000. 2nd Edition.
[710] K. Ramamohanarao, J. Lloyd, and J. Thom. Partial-match retrieval using hashing descriptors. ACM Trans. on Database Systems, 8(4):552–576, 1983.
[711] M. Ramanath, L. Zhang, J. Freire, and J. Haritsa. IMAX: Incremental maintenance of schema-based XML statistics. In Proc. IEEE Conference on Data Engineering, pages 273–284, 2005.
[712] K. Ramasamy, J. Naughton, and D. Maier. High performance implementation techniques for set-valued attributes. Technical report, University of Wisconsin, Wisconsin, 2000.
[713] K. Ramasamy, J. Patel, J. Naughton, and R. Kaushik. Set containment joins: The good, the bad, and the ugly. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 351–362, 2000.
[714] S. Ramaswamy and P. Kanellakis. OODB indexing by class division. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 139–150, 1995.
[715] R. Rantzau, L. Shapiro, B. Mitschang, and Q. Wang. Universal quantification in relational databases: A classification of data and algorithms. In Proc. of the Int. Conf. on Extending Database Technology (EDBT), pages 445–463, 2002.
[716] J. Rao, B. Lindsay, G. Lohman, H. Pirahesh, and D. Simmen. Using EELs: A practical approach to outerjoin and antijoin reordering. Technical Report RJ 10203, IBM, 2000.
[717] J. Rao, B. Lindsay, G. Lohman, H. Pirahesh, and D. Simmen. Using EELs: A practical approach to outerjoin and antijoin reordering. In Proc. IEEE Conference on Data Engineering, pages 595–606, 2001.
[718] J. Rao and K. Ross. Reusing invariants: A new strategy for correlated queries. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 37–48, Seattle, WA, 1998.
[719] S. Rao, A. Badia, and D. Van Gucht. Providing better support for a class of decision support queries. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 217–227, 1996.
[720] G. Ray, J. Haritsa, and S. Seshadri. Database compression: A performance enhancement tool. In COMAD, 1995.
[721] C. Ré and D. Suciu. Understanding cardinality estimation using entropy maximization. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 53–64, 2010.
[722] C. Ré and D. Suciu. Understanding cardinality estimation using entropy maximization. ACM Trans.
on Database Systems, 37(1):6, 2012.
[723] D. Reiner and A. Rosenthal. Strategy spaces and abstract target machines for query optimization. Database Engineering, 5(3):56–60, September 1982.
[724] D. Reiner and A. Rosenthal. Querying relational views of networks. In W. Kim, D. Reiner, and D. Batory, editors, Query Processing in Database Systems, pages 109–124, 1985.
[725] E. Reingold, J. Nievergelt, and N. Deo. Combinatorial Algorithms: Theory and Practice. Prentice Hall, 1977.
[726] L. T. Reinwald and R. M. Soland. Conversion of limited entry decision tables to optimal computer programs I: minimum average processing time. Journal of the ACM, 13(3):339–358, 1966.
[727] F. Reiss and T. Kanungo. A characterization of the sensitivity of query optimization to storage access cost parameters. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 385–396, 2003.
[728] A. Reiter, A. Clute, and J. Tenenbaum. Representation and execution of searches over large tree-structured data bases. In Proc. IFIP Congress, Booklet TA-3, pages 134–144, 1971.
[729] C. Rich, A. Rosenthal, and M. Scholl. Reducing duplicate work in relational join(s): A unified approach. In CISMOD, pages 87–102, 1993.
[730] P. Richard. Evaluation of the size of a query expressed in relational algebra. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 155–163, 1981.
[731] R. Van De Riet, A. Wasserman, M. Kersten, and W. De Jonge. High-level programming features for improving the efficiency of a relational database system. ACM Trans. on Database Systems, 6(3):464–485, 1981.
[732] R. Rockafellar. Convex Analysis. Princeton University Press, 1970.
[733] D. J. Rosenkrantz and H. B. Hunt. Processing conjunctive predicates and queries. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 64–74, 1980.
[734] A. Rosenthal. Note on the expected size of a join. SIGMOD Record, 11(4):19–25, 1981.
[735] A. Rosenthal and U. S. Chakravarthy. Anatomy of a modular multiple query optimizer. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 230–239, 1988.
[736] A. Rosenthal and C. Galindo-Legaria. Query graphs, implementing trees, and freely-reorderable outerjoins. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 291–299, 1990.
[737] A. Rosenthal, S. Heiler, U. Dayal, and F. Manola. Traversal recursion: a practical approach to supporting recursive applications. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 166–167, 1986.
[738] A. Rosenthal and P. Helman. Understanding and extending transformation-based optimizers. IEEE Data Engineering, 9(4):44–51, 1986.
[739] A. Rosenthal and D. Reiner. An architecture for query optimization. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 246–255, 1982.
[740] A. Rosenthal and D. Reiner. Extending the algebraic framework of query processing to handle outerjoins. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 334–343, 1984.
[741] A. Rosenthal and D. Reiner. Querying relational views of networks. In W. Kim, D. Reiner, and D. Batory, editors, Query Processing in Database Systems, New York, 1984. Springer.
[742] A. Rosenthal, C. Rich, and M. Scholl. Reducing duplicate work in relational join(s): a modular approach using nested relations. Technical report, ETH Zürich, 1991.
[743] K. Ross. Conjunctive selection conditions in main memory. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 108–120, 2002.
[744] K. Ross, D. Srivastava, and D. Chatziantoniou. Complex aggregation at multiple granularities. In Proc.
of the Int. Conf. on Extending Database Technology (EDBT), pages 263–278, 1998.
[745] M. Roth and S. Horn. Database compression. SIGMOD Record, 22(3):31–39, 1993.
[746] M. Roth, H. Korth, and A. Silberschatz. Extended algebra and calculus for nested relational databases. ACM Trans. on Database Systems, 13(4):389–417, 1988. See also [868].
[747] M. Roth, H. Korth, and A. Silberschatz. Null values in nested relational databases. Acta Informatica, 26(7):615–642, 1989.
[748] M. Roth, H. Korth, and A. Silberschatz. Addendum to null values in nested relational databases. Acta Informatica, 28(6):607–610, 1991.
[749] N. Roussopoulos. View indexing in relational databases. ACM Trans. on Database Systems, 7(2):258–290, 1982.
[750] C. Ruemmler and J. Wilkes. An introduction to disk drive modeling. IEEE Computer, 27(3):17–29, 1994.
[751] K. Runapongsa, J. Patel, H. Jagadish, and S. Al-Khalifa. The Michigan benchmark. Technical report, University of Michigan, 2002.
[752] G. Sacco. Index access with a finite buffer. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 301–309, 1987.
[753] G. Sacco and M. Schkolnick. A technique for managing the buffer pool in a relational system using the hot set model. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 257–262, 1982.
[754] G. Sacco and M. Schkolnick. Buffer management in relational database systems. ACM Trans. on Database Systems, 11(4):473–498, 1986.
[755] G. M. Sacco. Fragmentation: A technique for efficient query processing. ACM Trans. on Database Systems, 11(2):?–?, June 1986.
[756] Y. Sagiv. Optimization of Queries in Relational Databases. PhD thesis, Princeton University, 1978.
[757] Y. Sagiv. Optimization of Queries in Relational Databases. UMI Research Press, Ann Arbor, Michigan, 1981.
[758] Y. Sagiv. Quadratic algorithms for minimizing joins in restricted relational expressions. SIAM J. Comput., 12(2):321–346, 1983.
[759] Y. Sagiv and M. Yannakakis. Equivalence among expressions with the union and difference operators. Journal of the ACM, 27(4):633–655, 1980.
[760] V. Sarathy, L. Saxton, and D. Van Gucht. Algebraic foundation and optimization for object based query languages. In Proc. IEEE Conference on Data Engineering, pages 113–133, 1993.
[761] C. Sartiani. A general framework for estimating XML query cardinality. In Int. Workshop on Database Programming Languages, pages 257–277, 2003.
[762] S. Savage. The Flaw of Averages. John Wiley & Sons, 2009.
[763] F. Scarcello, G. Greco, and N. Leone. Weighted hypertree decomposition and optimal query plans. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 210–221, 2004.
[764] J. Scheible. A survey of storage options. IEEE Computer, 35(12):42–46, 2002.
[765] H.-J. Schek and M. Scholl. The relational model with relation-valued attributes. Information Systems, 11(2):137–147, 1986.
[766] W. Scheufele. Algebraic Query Optimization in Database Systems. PhD thesis, Universität Mannheim, 1999.
[767] W. Scheufele and G. Moerkotte. Optimal ordering of selections and joins in acyclic queries with expensive predicates. Technical Report 96-3, RWTH-Aachen, 1996.
[768] W. Scheufele and G. Moerkotte. On the complexity of generating optimal plans with cross products. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 238–248, 1997.
[769] W. Scheufele and G. Moerkotte. Efficient dynamic programming algorithms for ordering expensive joins and selections. In Proc. of the Int. Conf.
on Extending Database Technology (EDBT), pages 201–215, 1998.
[770] J. Schindler, A. Ailamaki, and G. Ganger. Lachesis: Robust database storage management based on device-specific performance characteristics. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 706–717, 2003.
[771] J. Schindler and G. Ganger. Automated disk drive characterization. Technical Report CMU-CS-99-176, Carnegie Mellon University, 1999.
[772] J. Schindler, J. Griffin, C. Lumb, and G. Ganger. Track-aligned extents: Matching access patterns to disk drive characteristics. Technical Report CMU-CS-01-119, Carnegie Mellon University, 2001.
[773] J. Schindler, J. Griffin, C. Lumb, and G. Ganger. Track-aligned extents: Matching access patterns to disk drive characteristics. In Conf. on File and Storage Technology (FAST), pages 259–274, 2002.
[774] A. Schmidt, M. Kersten, M. Windhouwer, and F. Waas. Efficient relational storage and retrieval of XML documents. In ACM SIGMOD Workshop on the Web and Databases (WebDB), 2000.
[775] A. Schmidt, F. Waas, M. Kersten, D. Florescu, I. Manolescu, M. Carey, and R. Busse. The XML Benchmark Project. Technical Report INS-R0103, CWI, Amsterdam, 2001.
[776] J. W. Schmidt. Some high level language constructs for data of type relation. ACM Trans. on Database Systems, 2(3):247–261, 1977.
[777] K. Schmidt and G. Trenkler. Moderne Matrix Algebra. Springer, 2006. Second Edition.
[778] M. Scholl. Theoretical foundation of algebraic optimization utilizing unnormalized relations. In Proc. Int. Conf. on Database Theory (ICDT), pages ?–?, 1986.
[779] T. Schwentick. XPath query containment. ACM SIGMOD Record, 33(1):101–109, 2004.
[780] E. Sciore and J. Sieg. A modular query optimizer generator. In Proc. IEEE Conference on Data Engineering, pages 146–153, 1990.
[781] B. Seeger. An analysis of schedules for performing multi-page requests. Information Systems, 21(4):387–407, 1996.
[782] B. Seeger, P.-A. Larson, and R. McFadyen. Reading a set of disk pages. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 592–603, 1993.
[783] A. Segev. Optimization of join operations in horizontally partitioned database systems. ACM Trans. on Database Systems, 11(1):48–80, 1986.
[784] P. Selinger, M. Astrahan, D. Chamberlin, R. Lorie, and T. Price. Access path selection in a relational database management system. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 23–34, 1979.
[785] T. Sellis. Intelligent caching and indexing techniques for relational database systems. Information Systems, 13(2):175–185, 1988.
[786] T. Sellis. Intelligent caching and indexing techniques for relational database systems. Information Systems, 13(2):175–186, 1988.
[787] T. Sellis. Multiple-query optimization. ACM Trans. on Database Systems, 13(1):23–52, 1988.
[788] T. K. Sellis. Global query optimization. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 191–205, 1986.
[789] M. Seltzer, P. Chen, and J. Ousterhout. Disk scheduling revisited. In USENIX, pages 313–323, 1990.
[790] V. Sengar and J. Haritsa. PLASTIC: Reducing query optimization overheads through plan recycling. In Proc. of the ACM SIGMOD Conf. on Management of Data, page 676, 2003.
[791] P. Seshadri, J. Hellerstein, H. Pirahesh, T. Leung, R. Ramakrishnan, D. Srivastava, P. Stuckey, and S. Sudarshan. Cost-based optimization for magic: Algebra and implementation. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 435–446, 1996.
[792] S. Setzer, G. Steidl, T. Teuber, and G. Moerkotte.
Approximation related to quotient functionals. Journal of Approximation Theory, 162(3):545–558, 2010.
[793] K. Sevcik. Data base system performance prediction using an analytical model. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 182–198, 1981.
[794] D. Severance. A practitioner’s guide to data base compression. Information Systems, 8(1):51–62, 1983.
[795] D. Severance and G. Lohman. Differential files: their application to the maintenance of large databases. ACM Trans. on Database Systems, 1(3):256–267, September 1976.
[796] M. C. Shan. Optimal plan search in a rule-based query optimizer. In J. W. Schmidt, S. Ceri, and M. Missikoff, editors, Proc. of the Intl. Conf. on Extending Database Technology, pages 92–112, Venice, Italy, March 1988. Springer-Verlag, Lecture Notes in Computer Science No. 303.
[797] J. Shanmugasundaram, E. J. Shekita, R. Barr, M. J. Carey, B. G. Lindsay, H. Pirahesh, and B. Reinwald. Efficiently publishing relational data as XML documents. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 65–76, 2000.
[798] L. Shapiro, D. Maier, P. Benninghoff, K. Billings, Y. Fan, K. Hatwal, Q. Wang, Y. Zhang, H.-M. Wu, and B. Vance. Exploiting upper and lower bounds in top-down query optimization. In IDEAS, pages 20–33, 2001.
[799] L. Shapiro and A. Stephens. Bootstrap percolation, the Schröder numbers, and the n-kings problem. SIAM J. Discr. Math., 4(2):275–280, 1991.
[800] G. M. Shaw and S. B. Zdonik. Object-oriented queries: Equivalence and optimization. In 1st Int. Conf. on Deductive and Object-Oriented Databases, pages 264–278, 1989.
[801] G. M. Shaw and S. B. Zdonik. A query algebra for object-oriented databases. Technical Report CS-89-19, Department of Computer Science, Brown University, 1989.
[802] G. M. Shaw and S. B. Zdonik. An object-oriented query algebra. In 2nd Int. Workshop on Database Programming Languages, pages 111–119, 1989.
[803] G. M. Shaw and S. B. Zdonik. A query algebra for object-oriented databases. In Proc. IEEE Conference on Data Engineering, pages 154–162, 1990.
[804] E. Shekita and M. Carey. A performance evaluation of pointer-based joins. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 300–311, 1990.
[805] E. Shekita, K.-L. Tan, and H. Young. Multi-join optimization for symmetric multiprocessors. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 479–492, 1993.
[806] E. Shekita, H. Young, and K.-L. Tan. Multi-join optimization for symmetric multiprocessors. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 479–492, 1993.
[807] P. Shenoy and H. Vin. Cello: A disk scheduling framework for next generation operating systems. In Proc. ACM SIGMETRICS Conf. on Measurement and Modeling of Computer Systems, pages 44–55, 1998.
[808] S. T. Shenoy and Z. M. Ozsoyoglu. A system for semantic query optimization. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 181–195, 1987.
[809] S. Sherman and R. Brice. Performance of a database manager in a virtual memory system. ACM Trans. on Database Systems, 1(4):317–343, 1976.
[810] B. Shneiderman and V. Goodman. Batched searching of sequential and tree structured files. ACM Trans. on Database Systems, 1(3):208–222, 1976.
[811] E. Shriver. Performance Modeling for Realistic Storage Devices. PhD thesis, New York University, 1997.
[812] E. Shriver, A. Merchant, and J. Wilkes. An analytical behavior model for disk drives with readahead caches and request reordering. In Proc. ACM SIGMETRICS Conf.
on Measurement and Modeling of Computer Systems, pages 182–191, 1998.
[813] A. Shrufi and T. Topaloglou. Query processing for knowledge bases using join indices. In Int. Conference on Information and Knowledge Management (CIKM), 1995.
[814] K. Shwayder. Conversion of limited entry decision tables to computer programs — a proposed modification to Pollack’s algorithm. Communications of the ACM, 14(2):69–73, 1971.
[815] M. Siegel, E. Sciore, and S. Salveter. A method for automatic rule derivation to support semantic query optimization. ACM Trans. on Database Systems, 17(4):563–600, 1992.
[816] A. Silberschatz, H. Korth, and S. Sudarshan. Database System Concepts. McGraw Hill, 1997. 3rd Edition.
[817] D. Simmen, C. Leung, and H. Pirahesh. Exploitation of uniqueness properties for the optimization of SQL queries using a 1-tuple condition. Research Report RJ 10008 (89098), IBM Almaden Research Division, February 1996.
[818] D. Simmen, E. Shekita, and T. Malkemus. Fundamental techniques for order optimization. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 57–67, 1996.
[819] D. Simmen, E. Shekita, and T. Malkemus. Fundamental techniques for order optimization. In Proc. of the Int. Conf. on Extending Database Technology (EDBT), pages 625–628, 1996.
[820] G. Slivinskas, C. Jensen, and R. Snodgrass. Bringing order to query optimization. SIGMOD Record, 31(2):5–14, 2002.
[821] D. Smith and M. Genesereth. Ordering conjunctive queries. Artificial Intelligence, 26:171–215, 1985.
[822] J. A. Smith. Sequentiality and prefetching in database systems. ACM Trans. on Database Systems, 3(3):223–247, 1978.
[823] J. M. Smith and P. Y.-T. Chang. Optimizing the performance of a relational algebra database interface. Communications of the ACM, 18(10):568–579, 1975.
[824] R. Sosic, J. Gu, and R. Johnson. The Unison algorithm: Fast evaluation of Boolean expressions. ACM Transactions on Design Automation of Electronic Systems (TODAES), 1:456–477, 1996.
[825] P. Spellucci. Numerische Verfahren der Nichtlinearen Optimierung. Birkhäuser, 1993.
[826] N. Spyratos. An operational approach to data bases. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 212–220, 1982.
[827] D. Srivastava, S. Al-Khalifa, H. V. Jagadish, N. Koudas, J. Patel, and Y. Wu. Structural joins: A primitive for efficient XML query pattern matching. In Proc. IEEE Conference on Data Engineering, 2002.
[828] D. Srivastava, S. Dar, H. V. Jagadish, and A. Levy. Answering queries with aggregation using views. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 318–329, 1996.
[829] R. Stanley. Enumerative Combinatorics, Volume I, volume 49 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, 1997.
[830] H. Steenhagen. Optimization of Object Query Languages. PhD thesis, University of Twente, 1995.
[831] H. Steenhagen, P. Apers, and H. Blanken. Optimization of nested queries in a complex object model. In Proc. of the Int. Conf. on Extending Database Technology (EDBT), pages 337–350, 1994.
[832] H. Steenhagen, P. Apers, H. Blanken, and R. de By. From nested-loop to join queries in OODB. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 618–629, 1994.
[833] H. Steenhagen, R. de By, and H. Blanken. Translating OSQL queries into efficient set expressions. In Proc. of the Int. Conf. on Extending Database Technology (EDBT), pages 183–197, 1996.
[834] M. Steinbrunn, G. Moerkotte, and A. Kemper. Heuristic and randomized optimization for the join ordering problem.
The VLDB Journal, 6(3):191–208, August 1997.
[835] M. Steinbrunn, K. Peithner, G. Moerkotte, and A. Kemper. Bypassing joins in disjunctive queries. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 228–238, 1995.
[836] K. Stocker, D. Kossmann, R. Braumandl, and A. Kemper. Integrating semi-join reducers into state-of-the-art query processors. In Proc. IEEE Conference on Data Engineering, pages 575–584, 2001.
[837] L. Stockmeyer and C. Wong. On the number of comparisons to find the intersection of two relations. Technical report, IBM Watson Research Center, 1978.
[838] H. Stone and S. Fuller. On the near-optimality of the shortest-latency-time-first drum scheduling discipline. Communications of the ACM, 16(6):352–353, 1973.
[839] M. Stonebraker. Inclusion of new types in relational database systems. In Proc. IEEE Conference on Data Engineering, pages ?–?, 1986.
[840] M. Stonebraker, J. Anton, and E. Hanson. Extending a database system with procedures. ACM Trans. on Database Systems, 12(3):350–376, September 1987.
[841] M. Stonebraker and P. Brown. Object-Relational DBMSs, Tracking the Next Great Wave. Morgan Kaufmann, 1999.
[842] M. Stonebraker et al. QUEL as a data type. In Proc. of the ACM SIGMOD Conf. on Management of Data, Boston, MA, June 1984.
[843] M. Stonebraker, A. Jhingran, J. Goh, and S. Potamianos. On rules, procedures, caching and views in data base systems. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 281–290, 1990.
[844] M. Stonebraker and L. A. Rowe. The design of POSTGRES. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 340–355, 1986.
[845] M. Stonebraker, E. Wong, P. Kreps, and G. Held. The design and implementation of INGRES. ACM Trans. on Database Systems, 1(3):189–222, 1976.
[846] D. Straube and T. Özsu. Access plan generation for an object algebra. Technical Report TR 90-20, Department of Computing Science, University of Alberta, June 1990.
[847] D. Straube and T. Özsu. Queries and query processing in object-oriented database systems. Technical report, Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada, 1990.
[848] D. Straube and T. Özsu. Queries and query processing in object-oriented database systems. ACM Trans. on Information Systems, 8(4):387–430, 1990.
[849] D. Straube and T. Özsu. Execution plan generation for an object-oriented data model. In Proc. Int. Conf. on Deductive and Object-Oriented Databases (DOOD), pages 43–67, 1991.
[850] D. D. Straube. Queries and Query Processing in Object-Oriented Database Systems. PhD thesis, The University of Alberta, Edmonton, Alberta, Canada, December 1990.
[851] S. Subramanian and S. Venkataraman. Cost-based optimization of decision support queries using transient views. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 319–330, Seattle, WA, 1998.
[852] D. Suciu. Query decomposition and view maintenance for query languages for unconstrained data. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 227–238, 1996.
[853] N. Südkamp and V. Linnemann. Elimination of views and redundant variables in an SQL-like database language for extended NF2 structures. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 302–313, 1990.
[854] W. Sun and C. T. Yu. Automatic knowledge acquisition and maintenance for semantic query optimization. IEEE Trans. on Knowledge and Data Engineering, 1(3):362–375, 1989.
[855] W. Sun and C. T. Yu. Semantic query optimization for tree and chain queries. IEEE Trans. on Knowledge and Data Engineering, 6(1):136–151, 1994.
[856] K. Sutner, A. Satyanarayana, and C. Suffel. The complexity of the residual node connectedness reliability problem. SIAM J. Comp., 20(1):149–155, 1991.
[857] P. Svensson. On search performance for conjunctive queries in compressed, fully transposed ordered files. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 155–163, 1979.
[858] A. Swami. Optimization of Large Join Queries. PhD thesis, Stanford University, 1989. Technical Report STAN-CS-89-1262.
[859] A. Swami. Optimization of large join queries: Combining heuristics and combinatorial techniques. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 367–376, 1989.
[860] A. Swami and A. Gupta. Optimization of large join queries. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 8–17, 1988.
[861] A. Swami and B. Iyer. A polynomial time algorithm for optimizing join queries. Technical Report RJ 8812, IBM Almaden Research Center, 1992.
[862] A. Swami and B. Iyer. A polynomial time algorithm for optimizing join queries. In Proc. IEEE Conference on Data Engineering, pages 345–354, 1993.
[863] A. Swami and B. Schiefer. Estimating page fetches for index scans with finite LRU buffers. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 173–184, 1994.
[864] A. Swami and B. Schiefer. On the estimation of join result sizes. In Proc. of the Int. Conf. on Extending Database Technology (EDBT), pages 287–300, 1994.
[865] M. Switakowski, P. Boncz, and M. Zukowski. From cooperative scans to predictive buffer management. Proc. of the VLDB Endowment (PVLDB), 5(12):1759–1770, 2012.
[866] N. Talagala, R. Arpaci-Dusseau, and D. Patterson. Microbenchmark-based extraction of local and global disk characteristics. Technical Report UCB-CSD-99-1063, University of California, Berkeley, 2000.
[867] K.-L. Tan and H. Lu. A note on the strategy space of multiway join query optimization problem in parallel systems. SIGMOD Record, 20(4):81–82, 1991.
[868] A. Tansel and L. Garnett. On Roth, Korth, and Silberschatz’s extended algebra and calculus for nested relational databases. ACM Trans. on Database Systems, 17(2):374–383, 1992.
[869] Y. C. Tay. On the optimality of strategies for multiple joins. Journal of the ACM, 40(5):1067–1086, 1993.
[870] T. Teorey and K. Das. Application of an analytical model to evaluate storage structures. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 9–19, 1976.
[871] T. Teorey and T. Pinkerton. A comparative analysis of disk scheduling policies. In Proc. of the AFIPS Fall Joint Computer Conference, pages 1–11, 1972.
[872] T. Teorey and T. Pinkerton. A comparative analysis of disk scheduling policies. Communications of the ACM, 15(3):177–184, 1972.
[873] J. Teubner, T. Grust, and M. Van Keulen. Bridging the gap between relational and native XML storage with staircase join. Grundlagen von Datenbanken, pages 85–89, 2003.
[874] N. Thaper, S. Guha, P. Indyk, and N. Koudas. Dynamic multidimensional histograms. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 428–439, 2002.
[875] H. To, K. Chiang, and C. Shahabi. Entropy-based histograms for selectivity estimation. In Int. Conference on Information and Knowledge Management (CIKM), pages 1939–1948, 2013.
[876] C. Tompkins. Machine attacks on problems whose variables are permutations. Numerical Analysis (Proc. of Symposia in Applied Mathematics), 6, 1956.
[877] R. Topor. Join-ordering is NP-complete. Draft, personal communication, 1998.
[878] Transaction Processing Performance Council (TPC). TPC Benchmark D. http://www.tpc.org, 1995.
[879] Transaction Processing Performance Council, 777 N. First Street, Suite 600, San Jose, CA, USA. TPC Benchmark R, Revision 1.2.0, 1999. http://www.tpc.org.
[880] P. Triantafillou, S. Christodoulakis, and C. Georgiadis. A comprehensive analytical performance model for disk devices under random workloads. IEEE Trans. on Knowledge and Data Eng., 14(1):140–155, 2002.
[881] O. Tsatalos, M. Solomon, and Y. Ioannidis. The GMAP: A versatile tool for physical data independence. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 367–378, 1994.
[882] A. Tsois and T. Sellis. The generalized pre-grouping transformation: Aggregate-query optimization in the presence of dependencies. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 644–655, 2003.
[883] K. Tufte and D. Maier. Aggregation and accumulation of XML data. IEEE Data Engineering Bulletin, 24(2):34–39, 2001.
[884] K. Tzoumas, A. Deshpande, and C. Jensen. Efficiently adapting graphical models for selectivity estimation. The VLDB Journal, 22:3–27, 2013.
[885] C. Überhuber. Computer Numerik 2. Springer, 1995.
[886] J. D. Ullman. Database and Knowledge Base Systems, volume 1. Computer Science Press, 1989.
[887] J. D. Ullman. Database and Knowledge Base Systems, volume 2. Computer Science Press, 1989.
[888] J. D. Ullman. Database and Knowledge Base Systems. Computer Science Press, 1989.
[889] D. Straube and T. Özsu. Query transformation rules for an object algebra. Technical Report TR 89-23, Department of Computing Science, University of Alberta, Sept. 1989.
[890] T. Urhan, M. Franklin, and L. Amsaleg. Cost based query scrambling for initial delays. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 130–141, 1998.
[891] M. Uysal, G. Alvarez, and A. Merchant. A modular analytical throughput model for modern disk arrays. In MASCOTS, pages 183–192, 2001.
[892] P. Valduriez. Join indices. ACM Trans. on Database Systems, 12(2):218–246, 1987.
[893] P. Valduriez and H. Boral. Evaluation of recursive queries using join indices. In Proc. Int. Conf. on Expert Database Systems (EDS), pages 197–208, 1986.
[894] P. Valduriez and S. Danforth. Query optimization in database programming languages. In Proc. Int. Conf. on Deductive and Object-Oriented Databases (DOOD), pages 516–534, 1989.
[895] L. Valiant. The complexity of computing the permanent. Theoretical Comp. Science, 8:189–201, 1979.
[896] L. Valiant. The complexity of enumeration and reliability problems. SIAM J. Comp., 8(3):410–421, 1979.
[897] B. Vance. Join-order Optimization with Cartesian Products. PhD thesis, Oregon Graduate Institute of Science and Technology, 1998.
[898] B. Vance and D. Maier. Rapid bushy join-order optimization with Cartesian products. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 35–46, 1996.
[899] S. L. Vandenberg and D. DeWitt. An algebra for complex objects with arrays and identity. Internal report, Computer Sciences Department, University of Wisconsin, Madison, WI 53706, USA, 1990.
[900] S. L. Vandenberg and D. DeWitt. Algebraic support for complex objects with arrays, identity, and inheritance. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 158–167, 1991.
[901] G. von Bültzingsloewen. Optimizing SQL queries for parallel execution. ACM SIGMOD Record, 1989.
[902] G. von Bültzingsloewen. Optimierung von SQL-Anfragen für parallele Bearbeitung (Optimization of SQL queries for parallel processing). PhD thesis, University of Karlsruhe, 1990. In German.
[903] G. von Bültzingsloewen. SQL-Anfragen: Optimierung für parallele Bearbeitung (SQL queries: optimization for parallel processing). FZI-Berichte Informatik. Springer, 1991.
[904] F. Waas and A. Pellenkoft. Probabilistic bottom-up join order selection – breaking the curse of NP-completeness. Technical Report INS-R9906, CWI, 1999.
[905] F. Waas and A. Pellenkoft. Join order selection – good enough is easy. In BNCOD, pages 51–67, 2000.
[906] H. Wang and K. Sevcik. Histograms based on the minimum description length principle. The VLDB Journal, 17:419–442, 2008.
[907] J. Wang, J. Li, and G. Butler. Implementing the PostgreSQL query optimizer within the OPT++ framework. In Asia-Pacific Software Engineering Conference (APSEC), pages 262–272, 2003.
[908] J. Wang, M. Maher, and R. Topor. Rewriting unions of general conjunctive queries using views. In Proc. of the Int. Conf. on Extending Database Technology (EDBT), pages 52–69, 2002.
[909] W. Wang, H. Jiang, H. Lu, and J. Yu. Containment join size estimation: Models and methods. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 145–156, 2003.
[910] W. Wang, H. Jiang, H. Lu, and J. Yu. Bloom histogram: Path selectivity estimation for XML data with updates. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 240–251, 2004.
[911] X. Wang and M. Cherniack. Avoiding ordering and grouping in query processing. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 826–837, 2003.
[912] S. Waters. File design fallacies. The Computer Journal, 15(1):1–4, 1972.
[913] S. Waters. Hit ratio. The Computer Journal, 19(1):21–24, 1976.
[914] G. Watson. Approximation Theory and Numerical Methods. Addison-Wesley, 1980.
[915] H. Wedekind and G. Zörntlein. Prefetching in realtime database applications. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 215–226, 1986.
[916] M. Wedekind. On the selection of access paths in a database system. In J. Klimbie and K. Koffeman, editors, IFIP Working Conference Data Base Management, pages 385–397, Amsterdam, 1974. North-Holland.
[917] G. Weikum. Set-oriented disk access to large complex objects. In Proc. IEEE Conference on Data Engineering, pages 426–433, 1989.
[918] G. Weikum, B. Neumann, and H.-B. Paul. Konzeption und Realisierung einer mengenorientierten Seitenschnittstelle zum effizienten Zugriff auf komplexe Objekte (Design and realization of a set-oriented page interface for efficient access to complex objects). In Proc. der GI-Fachtagung Datenbanksysteme für Büro, Technik und Wissenschaft (BTW), pages 212–230, 1987.
[919] T. Westmann. Effiziente Laufzeitsysteme für Datenlager (Efficient runtime systems for data warehouses). PhD thesis, University of Mannheim, 2000.
[920] T. Westmann, D. Kossmann, S. Helmer, and G. Moerkotte. The implementation and performance of compressed databases. Technical Report 03/98, University of Mannheim, 1998.
[921] T. Westmann, D. Kossmann, S. Helmer, and G. Moerkotte. The implementation and performance of compressed databases. SIGMOD Record, 29(3):55–67, 2000.
[922] T. Westmann and G. Moerkotte. Variations on grouping and aggregations. Technical Report 11/99, University of Mannheim, 1999.
[923] K.-Y. Whang, A. Malhotra, G. Sockut, and L. Burns. Supporting universal quantification in a two-dimensional database query language. In Proc. IEEE Conference on Data Engineering, pages 68–75, 1990.
[924] K.-Y. Whang, B. Vander-Zanden, and H. Taylor. A linear-time probabilistic counting algorithm for database applications. ACM Trans. on Database Systems, 15(2):208–229, 1990.
[925] K.-Y. Whang, G. Wiederhold, and D. Sagalowicz. Estimating block accesses in database organizations: A closed noniterative formula. Communications of the ACM, 26(11):940–944, 1983.
[926] N. Wilhelm. A general model for the performance of disk systems. Journal of the ACM, 24(1):14–31, 1977.
[927] D. E. Willard. Efficient processing of relational calculus queries using range query theory. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 164–175, 1984.
[928] C. Williams and T. Hogg. Using deep structure to locate hard problems. In Proc. National Conference on Artificial Intelligence, pages 472–477, 1992.
[929] J. Wolf, R. Iyer, K. Pattipati, and J. Turek. Optimal buffer partitioning for the nested block join algorithm. In Proc. IEEE Conference on Data Engineering, pages 510–519, 1991.
[930] C. Wong. Minimizing expected head movement in one-dimensional and two-dimensional mass storage systems. ACM Computing Surveys, 12(2):167–177, 1980.
[931] F. Wong and K. Youssefi. Decomposition – a strategy for query processing. ACM Trans. on Database Systems, 1(3):223–241, 1976.
[932] H. Wong and J. Li. Transposition algorithms on very large compressed databases. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 304–311, 1986.
[933] P. Wood. On the equivalence of XML patterns. In Proc. Int. Conf. on Computational Logic (CL 2000), 2000.
[934] P. Wood. Minimizing simple XPath expressions. In Int. Workshop on Database Programming Languages, pages 13–18, 2001.
[935] P. Wood. Containment for XPath fragments under DTD constraints. In Proc. Int. Conf. on Database Theory (ICDT), pages 300–314, 2003.
[936] W. A. Woods. Procedural semantics for question-answering systems. In FJCC (AFIPS Vol. 33 Part I), pages 457–471, 1968.
[937] B. Worthington, G. Ganger, Y. Patt, and J. Wilkes. Scheduling algorithms for modern disk drives. In Proc. ACM SIGMETRICS Conf. on Measurement and Modeling of Computer Systems, pages 241–251, 1994.
[938] B. Worthington, G. Ganger, Y. Patt, and J. Wilkes. On-line extraction of SCSI disk drive parameters. In Proc. ACM SIGMETRICS Conf. on Measurement and Modeling of Computer Systems, pages 146–156, 1995.
[939] B. Worthington, G. Ganger, Y. Patt, and J. Wilkes. On-line extraction of SCSI disk drive parameters. Technical Report CSE-TR-323-96, University of Michigan, 1996.
[940] M.-C. Wu. Query optimization for selections using bitmaps. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 227–238, 1999.
[941] Y. Wu, J. Patel, and H. V. Jagadish. Estimating answer sizes for XML queries. In Proc. of the Int. Conf. on Extending Database Technology (EDBT), pages 590–608, 2002.
[942] Y. Wu, J. Patel, and H. V. Jagadish. Estimating answer sizes for XML queries. Information Systems, 28(1-2):33–59, 2003.
[943] Z. Xie. Optimization of object queries containing encapsulated methods. In Proc. 2nd Int. Conf. on Information and Knowledge Management, pages 451–460, 1993.
[944] Z. Xie and J. Han. Join index hierarchies for supporting efficient navigation in object-oriented databases. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 522–533, 1994.
[945] G. D. Xu. Search control in semantic query optimization. Technical Report 83-09, COINS, University of Massachusetts, Amherst, MA, 1983.
[946] W. Yan and P.-A. Larson. Performing group-by before join. In Proc. IEEE Conference on Data Engineering, pages 89–100, 1994.
[947] W. Yan and P.-A. Larson. Eager aggregation and lazy aggregation. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 345–357, 1995.
[948] H. Yang and P.-A. Larson. Query transformation for PSJ-queries. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 245–254, 1987.
[949] J. Yang, K. Karlapalem, and Q. Li. Algorithms for materialized view design in data warehousing environment. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 136–145, 1997.
[950] Q. Yang. Computation of chain queries in distributed database systems. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 348–355, 1994.
[951] M. Yannakakis. Algorithms for acyclic database schemes. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 82–94, 1981.
[952] B. Yao and T. Özsu. XBench – A Family of Benchmarks for XML DBMSs. Technical Report CS-2002-39, University of Waterloo, 2002.
[953] B. Yao, T. Özsu, and N. Khandelwal. XBench benchmark and performance testing of XML DBMSs. In Proc. IEEE Conference on Data Engineering, pages 621–632, 2004.
[954] S. B. Yao. Approximating block accesses in database organizations. Communications of the ACM, 20(4):260–261, 1977.
[955] S. B. Yao. An attribute based model for database access cost analysis. ACM Trans. on Database Systems, 2(1):45–67, 1977.
[956] S. B. Yao and D. DeJong. Evaluation of database access paths. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 66–77, 1978.
[957] S. B. Yao. Optimization of query evaluation algorithms. ACM Trans. on Database Systems, 4(2):133–155, 1979.
[958] S. B. Yao, A. R. Hevner, and H. Young-Myers. Analysis of database system architectures using benchmarks. IEEE Trans. on Software Eng., SE-13(6):709–725, 1987.
[959] J. Yiannis and J. Zobel. Compression techniques for fast external sorting. The VLDB Journal, 16(2):269–291, 2007.
[960] Y. Yoo and S. Lafortune. An intelligent search method for query optimization by semijoins. IEEE Trans. on Knowledge and Data Eng., 1(2):226–237, June 1989.
[961] M. Yoshikawa, T. Amagasa, T. Shimura, and S. Uemura. XRel: A path-based approach to storage and retrieval of XML documents using relational databases. ACM Transactions on Internet Technology, 1(1):110–141, June 2001.
[962] K. Youssefi and E. Wong. Query processing in a relational database management system. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 409–417, 1979.
[963] C. T. Yu, W. S. Luk, and M. K. Siu. On the estimation of the number of desired records with respect to a given query. ACM Trans. on Database Systems, 3(1):41–56, 1978.
[964] L. Yu and S. L. Osborn. An evaluation framework for algebraic object-oriented query models. In Proc. IEEE Conference on Data Engineering, 1991.
[965] X. Yu, N. Koudas, and C. Zuzarte. HASE: a hybrid approach to selectivity estimation for conjunctive predicates. In Proc. of the Int. Conf. on Extending Database Technology (EDBT), pages 460–477, 2006.
[966] X. Yu, C. Zuzarte, and K. Sevcik. Towards estimating the number of distinct value combinations for a set of attributes. In CIKM, pages 656–663, 2005.
[967] J. Zahorjan, B. Bell, and K. Sevcik. Estimating block transfers when record access probabilities are non-uniform. Information Processing Letters, 16(5):249–252, 1983.
[968] B. T. Vander Zanden, H. M. Taylor, and D. Bitton. Estimating block accesses when attributes are correlated. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 119–127, 1986.
[969] S. Zdonik and G. Mitchell. ENCORE: An object-oriented approach to database modelling and querying. IEEE Data Engineering Bulletin, 14(2):53–57, June 1991.
[970] N. Zhang, V. Kacholia, and T. Özsu. A succinct physical storage scheme for efficient evaluation of path queries in XML. In Proc. IEEE Conference on Data Engineering, pages 54–65, 2004.
[971] N. Zhang and T. Özsu. Optimizing correlated path expressions in XML languages. Technical Report CS-2002-36, University of Waterloo, 2002.
[972] Y. Zhao, P. Deshpande, J. Naughton, and A. Shukla. Simultaneous optimization and evaluation of multiple dimensional queries. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 271–282, 1998.
[973] Y. Zhuge, H. Garcia-Molina, J. Hammer, and J. Widom. View maintenance in a warehouse environment. In Proc. of the ACM SIGMOD Conf. on Management of Data, 1995.

Appendix E: ToDo

• size of a query in relational algebra: [730]
• [908]
• Integrating Buffer Issues into Query Optimization: [211, 473]
• Integrating concurrency control issues into query optimization: [624, 625]
• [101]
• where do we put "counting page accesses"?
• control, A*, ballooning: [611, 610]
• Bypass Plans
• Properties (rather complete list, partial ordering, plan-independent properties: store them somewhere else (dpstructure or memostructure))
• describe prep-phase of the plan generator
• reuse plans: [790]
• estimating query compilation time: [444]
• cost model [793]
• sensitivity of QO to storage access cost parameters [727] (and of join selectivities on join order: [521] [the paper is not to be taken seriously])
• magic sets and semi-join reducers [79, 81, 80, 175, 341, 633, 631, 633, 632, 791, 836, 960]
• join indexes and clustering tuples of different relations with 1:n relationships [230, 401, 892, 893, 813]
• B-Trees with space-filling curves (Bayer)
• Prefetching [915]
• feedback to the optimizer [508]
• compression [26, 55, 167, 207, 259, 258, 335, 342] [647, 720, 745, 794, 795, 857, 920, 932]
• semantic QO (SQO): [1, 82, 172, 332, 365, 495, 496, 517, 525] [648, 656, 657, 681, 808, 815, 817, 945] [552]
• join processing with nonclustered indexes: [649]
• join + buffer: [929]
• removal/elimination of redundant joins [659, 853]
• benchmark(ing): Gray book: [366]; papers: [98, 108, 667, 751, 775, 958, 952, 953]
• dynamic QO: [671] [29, 890] [43] [474]
• unnesting: [675, 718, 641]
• prefetching: [679, 678, 822, 915]
• Starburst: [689, 690]
• BXP: [107, 260, 310, 375, 397, 437, 491, 694, 726, 814, 821, 824, 490]; BXP complexity: [77]; BXP variable influence: [475]
• joins: [701]
• query folding: [703]
• quantification: [114, 113, 185, 186, 715, 708, 923] [467]
• outerjoins: [85, 88, 218, 308, 299, 298, 717, 736]
• partial match + hashing: [710]
• OODB indexing by class division: [193, 714]
• decision support [719]
• tree-structured databases: [728]
• Rosenthal: [739, 723, 740, 741, 724, 735, 738, 737]
• conj. queries: [733]
• aggregation/(generalized projection): [120, 193, 300, 744] [381, 382, 405]
• do nest/unnest to optimize duplicate work: [742]: $e_1 \Join_{A_1=A_2} e_2 \equiv \mu_g(e_1 \Join_{A_1=A_2} \Gamma_{g;=A_2;id}(e_2))$ (spelled out after this list)
• join size: [734]
• fragmentation: [755]
• eqv: [17, 18]
• alg eqvs union/difference: [759] [908]
• other Sagiv: [757, 758]
• Bayesian approach to QO: [638]
• cache query plans: [790]
• joins for horizontal fragmentation: [783]
• partitioning: [56, 106, 395, 478, 651]
• MQO: [25, 140, 138, 788, 787, 972]
• indexing + caching: [786]
• rule-based QO: [796, 66, 67, 295]
• rule-based IRIS: [229]
• cost: [837] [891]
• search space: [867], join ordering: [869]
• access path: [132, 916, 956, 99]
• efficient aggregation: [296] [922]
• misc: [927] [9] [13]
• access paths: bitmaps [940]
• dist. DB: [37, 38, 97, 216, 950]; Donald's state of the art: [514]
• [147, 148]
• eqv: bags [21, 221]
• eqvs old: [22]
• DB2: Norwegian analysis: [31]
• nested: [46]
• Genesis/Prairie/Batory: [57, 61, 60, 62, 217]
• eqvs OO: [68, 69]
• dupelim: [91]
• (generalized) division: [130, 215, 355, 344]
• early aggregation
• chunk-wise processing [231, 351]
• temporal intersection join: [378]
• 2nd-order signatures: Güting: [386]
• classics: [392]
• smallest first: [396]
• Hwang/Yu: [436]
• Kambayashi: [477]
• Koch [506], Lehnert [543]
• I/O cost reduction for (hash) joins: [563, 604]
• dist nest: [265]
• band join: [570]
• Donovan (TODS 76,1,4) Decision Support: [247]
• whenever we materialize something (sort, hash join, etc.), compute the min/max of some attributes and use these as additional selection predicates (see the sketch after this list)
• determine the optimal page access sequence and buffer size to access pairs (x, y) of pages where the join partners of one relation lie on x and those of the other on y (Fotouhi, Pramanik [292]; Merrett, Kambayashi, Yasuura [604]; Omiecinski [649]; Pramanik, Ittner [701]; Chan, Ooi [142])
• buffer mgmt: [865]
• Scarcello, Greco, Leone: [763]
• SIGMOD 2005:
  – proactive reoptimization [45]
  – robust query optimizer [44]
  – stacked indexed views [225]
  – NF2-approach to processing nested SQL queries [123]
  – efficient computation of multiple group-by queries [169]
• LOCI: [670]
• Wavelet synopses: [321]
• Progress Indicators: [153, 574, 575]
• PostgreSQL experience: [907]
• Chaudhuri SIGMOD 05: [44]
• Cesar on testing SQL Server [329]
• Bruno, Galindo-Legaria, Joshi [112], which is like GOO with two additional techniques: pushing partial plans down and pulling partial plans up whenever a new join is added
• incremental evaluation to justify θ, q-acceptability [640]
• Execution Strategies for SQL subqueries by Cesar [263]
• PIVOT, UNPIVOT: optimization/execution in SQL Server by Cesar [214]
• statistical views by Cesar [301]
• multiway joins [426]
• [518]: heuristics to order index ANDing
• [875]: entropy-based histograms for selectivity estimation
• [884]: efficiently adapting graphical models for selectivity estimation
• [411]: multidimensional selectivity estimation via kernel density
• [?]: effective and complete discovery of order dependencies via set-based axiomatization
• section joinorder/top-down: pit, pruning [798]
• chapter plan generation
• chapter unnesting: Oracle: coalescing [72]
• benchmarking query optimizers: [?]
• Herodotou, Borisov, Babu: Query Optimization Techniques for Partitioned Tables
• Al-Kateb, Sinclair, Au, Ballinger: Hybrid Row-Column Partitioning in Teradata
• Antova, El-Helw, Soliman, Gu, Petropoulos, Waas: Optimizing Queries over Partitioned Tables in MPP Systems
• Chen, Yi: Two-Level Sampling for Join Size Estimation [?]
• check: joinorder chapter: ccp: Fig. 3: numbers for #ccp for chains for n = 20; formula: n^n -> 2^n
• cardinality estimation: Shekelyan [?]
• cardinality estimation: Kyuseok Shim [?]
• heuristics for join ordering [805]
• predicate inference [854]
• merge min/max subqueries if possible using tableau equivalence [14]
• Union-All-Duplicate Operator (UAD) [16]
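
For the nest/unnest item above, the equivalence is spelled out here; this is a reconstruction of the garbled original, assuming the notation used in the rest of this book: $\Join$ for the join, $\Gamma$ for unary grouping, and $\mu$ for unnest.

\[
e_1 \Join_{A_1 = A_2} e_2 \;\equiv\; \mu_g\bigl(e_1 \Join_{A_1 = A_2} \Gamma_{g;\,=A_2;\,id}(e_2)\bigr)
\]

The right-hand side first groups $e_2$ by $A_2$, collecting each group into a nested attribute $g$; the join then meets every distinct $A_2$ value only once, and the final unnest $\mu_g$ restores the flat result. Work per join partner in $e_2$ is thus performed once per distinct value instead of once per tuple.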
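The min/max item above can be made concrete with a small sketch. The following Python is purely illustrative: the class name, row representation, and driver data are invented for this sketch and are not from this book. It assumes the tracked attribute is totally ordered; the idea is that any operator that materializes an intermediate result anyway (a sort run, the build side of a hash join) can track min/max at negligible cost and hand the resulting range predicate to other operators as an extra filter.

    # Hypothetical sketch: track min/max of an attribute while materializing
    # an intermediate result, then expose the bounds as an implied predicate.

    class MaterializeWithBounds:
        def __init__(self, rows, tracked_attr):
            self.rows = []
            self.tracked_attr = tracked_attr
            self.lo = None  # running minimum of tracked_attr
            self.hi = None  # running maximum of tracked_attr
            for row in rows:  # the materialization pass we pay for anyway
                v = row[tracked_attr]
                self.lo = v if self.lo is None else min(self.lo, v)
                self.hi = v if self.hi is None else max(self.hi, v)
                self.rows.append(row)

        def implied_predicate(self):
            # Derived selection: tracked_attr BETWEEN lo AND hi.
            return lambda row: self.lo <= row[self.tracked_attr] <= self.hi

    # Usage: filter the probe side of a join with the build side's bounds,
    # discarding probe tuples outside [lo, hi] before the join itself.
    build = MaterializeWithBounds([{"a": 5}, {"a": 9}], "a")
    probe = [{"a": 1}, {"a": 7}, {"a": 12}]
    pred = build.implied_predicate()
    print([t for t in probe if pred(t)])  # -> [{'a': 7}]

The design point is that the bounds come for free during materialization; whether the derived predicate pays off depends on how selective [lo, hi] is on the other input.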