Matters Computational
Ideas, Algorithms, Source Code

Jörg Arndt

Contents

Preface

I Low level algorithms

1 Bit wizardry
  1.1 Trivia
  1.2 Operations on individual bits
  1.3 Operations on low bits or blocks of a word
  1.4 Extraction of ones, zeros, or blocks near transitions
  1.5 Computing the index of a single set bit
  1.6 Operations on high bits or blocks of a word
  1.7 Functions related to the base-2 logarithm
  1.8 Counting the bits and blocks of a word
  1.9 Words as bitsets
  1.10 Index of the i-th set bit
  1.11 Avoiding branches
  1.12 Bit-wise rotation of a word
  1.13 Binary necklaces ‡
  1.14 Reversing the bits of a word
  1.15 Bit-wise zip
  1.16 Gray code and parity
  1.17 Bit sequency ‡
  1.18 Powers of the Gray code ‡
  1.19 Invertible transforms on words ‡
  1.20 Scanning for zero bytes
  1.21 Inverse and square root modulo 2^n
  1.22 Radix −2 (minus two) representation
  1.23 A sparse signed binary representation
  1.24 Generating bit combinations
  1.25 Generating bit subsets of a given word
  1.26 Binary words in lexicographic order for subsets
  1.27 Fibonacci words ‡
  1.28 Binary words and parentheses strings ‡
  1.29 Permutations via primitives ‡
  1.30 CPU instructions often missed
  1.31 Some space filling curves ‡

2 Permutations and their operations
  2.1 Basic definitions and operations
  2.2 Representation as disjoint cycles
  2.3 Compositions of permutations
  2.4 In-place methods to apply permutations to data
  2.5 Random permutations
  2.6 The revbin permutation
  2.7 The radix permutation
  2.8 In-place matrix transposition
  2.9 Rotation by triple reversal
  2.10 The zip permutation
  2.11 The XOR permutation
  2.12 The Gray permutation
  2.13 The reversed Gray permutation

3 Sorting and searching
  3.1 Sorting algorithms
  3.2 Binary search
  3.3 Variants of sorting methods
  3.4 Searching in unsorted arrays
  3.5 Determination of equivalence classes

4 Data structures
  4.1 Stack (LIFO)
  4.2 Ring buffer
  4.3 Queue (FIFO)
  4.4 Deque (double-ended queue)
  4.5 Heap and priority queue
  4.6 Bit-array
  4.7 Left-right array

II Combinatorial generation

5 Conventions and considerations
  5.1 Representations and orders
  5.2 Ranking, unranking, and counting
  5.3 Characteristics of the algorithms
  5.4 Optimization techniques
  5.5 Implementations, demo-programs, and timings

6 Combinations
  6.1 Binomial coefficients
  6.2 Lexicographic and co-lexicographic order
  6.3 Order by prefix shifts (cool-lex)
  6.4 Minimal-change order
  6.5 The Eades-McKay strong minimal-change order
  6.6 Two-close orderings via endo/enup moves
  6.7 Recursive generation of certain orderings

7 Compositions
  7.1 Co-lexicographic order
  7.2 Co-lexicographic order for compositions into exactly k parts
  7.3 Compositions and combinations
  7.4 Minimal-change orders

8 Subsets
  8.1 Lexicographic order
  8.2 Minimal-change order
  8.3 Ordering with De Bruijn sequences
  8.4 Shifts-order for subsets
  8.5 k-subsets where k lies in a given range

9 Mixed radix numbers
  9.1 Counting (lexicographic) order
  9.2 Minimal-change (Gray code) order
  9.3 gslex order
  9.4 endo order
  9.5 Gray code for endo order
  9.6 Fixed sum of digits

10 Permutations
  10.1 Factorial representations of permutations
  10.2 Lexicographic order
  10.3 Co-lexicographic order
  10.4 An order from reversing prefixes
  10.5 Minimal-change order (Heap’s algorithm)
  10.6 Lipski’s Minimal-change orders
  10.7 Strong minimal-change order (Trotter’s algorithm)
  10.8 Star-transposition order
  10.9 Minimal-change orders from factorial numbers
  10.10 Derangement order
  10.11 Orders where the smallest element always moves right
  10.12 Single track orders

11 Permutations with special properties
  11.1 The number of certain permutations
  11.2 Permutations with distance restrictions
  11.3 Self-inverse permutations (involutions)
  11.4 Cyclic permutations

12 k-permutations
  12.1 Lexicographic order
  12.2 Minimal-change order

13 Multisets
  13.1 Subsets of a multiset
  13.2 Permutations of a multiset

14 Gray codes for strings with restrictions
  14.1 List recursions
  14.2 Fibonacci words
  14.3 Generalized Fibonacci words
  14.4 Run-length limited (RLL) words
  14.5 Digit x followed by at least x zeros
  14.6 Generalized Pell words
  14.7 Sparse signed binary words
  14.8 Strings with no two consecutive nonzero digits
  14.9 Strings with no two consecutive zeros
  14.10 Binary strings without substrings 1x1 or 1xy1 ‡

15 Parentheses strings
  15.1 Co-lexicographic order
  15.2 Gray code via restricted growth strings
  15.3 Order by prefix shifts (cool-lex)
  15.4 Catalan numbers
  15.5 Increment-i RGS, k-ary Dyck words, and k-ary trees

16 Integer partitions
  16.1 Solution of a generalized problem
  16.2 Iterative algorithm
  16.3 Partitions into m parts
  16.4 The number of integer partitions

17 Set partitions
  17.1 Recursive generation
  17.2 The number of set partitions: Stirling set numbers and Bell numbers
  17.3 Restricted growth strings

18 Necklaces and Lyndon words
  18.1 Generating all necklaces
  18.2 Lex-min De Bruijn sequence from necklaces
  18.3 The number of binary necklaces
  18.4 Sums of roots of unity that are zero ‡

19 Hadamard and conference matrices
  19.1 Hadamard matrices via LFSR
  19.2 Hadamard matrices via conference matrices
  19.3 Conference matrices via finite fields

20 Searching paths in directed graphs ‡
  20.1 Representation of digraphs
  20.2 Searching full paths
  20.3 Conditional search
  20.4 Edge sorting and lucky paths
  20.5 Gray codes for Lyndon words

III Fast transforms

21 The Fourier transform
  21.1 The discrete Fourier transform
  21.2 Radix-2 FFT algorithms
  21.3 Saving trigonometric computations
  21.4 Higher radix FFT algorithms
  21.5 Split-radix algorithm
  21.6 Symmetries of the Fourier transform
  21.7 Inverse FFT for free
  21.8 Real-valued Fourier transforms
  21.9 Multi-dimensional Fourier transforms
  21.10 The matrix Fourier algorithm (MFA)

22 Convolution, correlation, and more FFT algorithms
  22.1 Convolution
  22.2 Correlation
  22.3 Correlation, convolution, and circulant matrices ‡
  22.4 Weighted Fourier transforms and convolutions
  22.5 Convolution using the MFA
  22.6 The z-transform (ZT)
  22.7 Prime length FFTs

23 The Walsh transform and its relatives
  23.1 Transform with Walsh-Kronecker basis
  23.2 Eigenvectors of the Walsh transform ‡
  23.3 The Kronecker product
  23.4 Higher radix Walsh transforms
  23.5 Localized Walsh transforms
  23.6 Transform with Walsh-Paley basis
  23.7 Sequency-ordered Walsh transforms
  23.8 XOR (dyadic) convolution
  23.9 Slant transform
  23.10 Arithmetic transform
  23.11 Reed-Muller transform
  23.12 The OR-convolution and the AND-convolution
  23.13 The MAX-convolution ‡
  23.14 Weighted arithmetic transform and subset convolution

24 The Haar transform
  24.1 The ‘standard’ Haar transform
  24.2 In-place Haar transform
  24.3 Non-normalized Haar transforms
  24.4 Transposed Haar transforms ‡
  24.5 The reversed Haar transform ‡
  24.6 Relations between Walsh and Haar transforms
  24.7 Prefix transform and prefix convolution
  24.8 Nonstandard splitting schemes ‡

25 The Hartley transform
  25.1 Definition and symmetries
  25.2 Radix-2 FHT algorithms
  25.3 Complex FFT by FHT
  25.4 Complex FFT by complex FHT and vice versa
  25.5 Real FFT by FHT and vice versa
  25.6 Higher radix FHT algorithms
  25.7 Convolution via FHT
  25.8 Localized FHT algorithms
  25.9 2-dimensional FHTs
  25.10 Automatic generation of transform code
  25.11 Eigenvectors of the Fourier and Hartley transform ‡

26 Number theoretic transforms (NTTs)
  26.1 Prime moduli for NTTs
  26.2 Implementation of NTTs
  26.3 Convolution with NTTs

27 Fast wavelet transforms
  27.1 Wavelet filters
  27.2 Implementation
  27.3 Moment conditions

IV Fast arithmetic

28 Fast multiplication and exponentiation
  28.1 Splitting schemes for multiplication
  28.2 Fast multiplication via FFT
  28.3 Radix/precision considerations with FFT multiplication
  28.4 The sum-of-digits test
  28.5 Binary exponentiation

29 Root extraction
  29.1 Division, square root and cube root
  29.2 Root extraction for rationals
  29.3 Divisionless iterations for the inverse a-th root
  29.4 Initial approximations for iterations
  29.5 Some applications of the matrix square root
  29.6 Goldschmidt’s algorithm
  29.7 Products for the a-th root ‡
  29.8 Divisionless iterations for polynomial roots

30 Iterations for the inversion of a function
  30.1 Iterations and their rate of convergence
  30.2 Schröder’s formula
  30.3 Householder’s formula
  30.4 Dealing with multiple roots
  30.5 More iterations
  30.6 Convergence improvement by the delta squared process

31 The AGM, elliptic integrals, and algorithms for computing π
  31.1 The arithmetic-geometric mean (AGM)
  31.2 The elliptic integrals K and E
  31.3 Theta functions, eta functions, and singular values
  31.4 AGM-type algorithms for hypergeometric functions
  31.5 Computation of π

32 Logarithm and exponential function
  32.1 Logarithm
  32.2 Exponential function
  32.3 Logarithm and exponential function of power series
  32.4 Simultaneous computation of logarithms of small primes
  32.5 Arctangent relations for π ‡

33 Computing the elementary functions with limited resources
  33.1 Shift-and-add algorithms for log_b(x) and b^x
  33.2 CORDIC algorithms

34 Numerical evaluation of power series
  34.1 The binary splitting algorithm for rational series
  34.2 Rectangular schemes for evaluation of power series
  34.3 The magic sumalt algorithm for alternating series

35 Recurrences and Chebyshev polynomials
  35.1 Recurrences
  35.2 Chebyshev polynomials

36 Hypergeometric series
  36.1 Definition and basic operations
  36.2 Transformations of hypergeometric series
  36.3 Examples: elementary functions
  36.4 Transformations for elliptic integrals ‡
  36.5 The function x^x ‡

37 Cyclotomic polynomials, product forms, and continued fractions
  37.1 Cyclotomic polynomials, Möbius inversion, Lambert series
  37.2 Conversion of power series to infinite products
  37.3 Continued fractions

38 Synthetic Iterations ‡
  38.1 A variation of the iteration for the inverse
  38.2 An iteration related to the Thue constant
  38.3 An iteration related to the Golay-Rudin-Shapiro sequence
  38.4 Iteration related to the ruler function
  38.5 An iteration related to the period-doubling sequence
  38.6 An iteration from substitution rules with sign
  38.7 Iterations related to the sum of digits
  38.8 Iterations related to the binary Gray code
  38.9 A function encoding the Hilbert curve
  38.10 Sparse power series
  38.11 An iteration related to the Fibonacci numbers
  38.12 Iterations related to the Pell numbers

V Algorithms for finite fields

39 Modular arithmetic and some number theory
  39.1 Implementation of the arithmetic operations
  39.2 Modular reduction with structured primes
  39.3 The sieve of Eratosthenes
  39.4 The Chinese Remainder Theorem (CRT)
  39.5 The order of an element
  39.6 Prime modulus: the field Z/pZ = F_p = GF(p)
  39.7 Composite modulus: the ring Z/mZ
  39.8 Quadratic residues
  39.9 Computation of a square root modulo m
  39.10 The Rabin-Miller test for compositeness
  39.11 Proving primality
  39.12 Complex modulus: the field GF(p^2)
  39.13 Solving the Pell equation
  39.14 Multiplication of hypercomplex numbers ‡

40 Binary polynomials
  40.1 The basic arithmetical operations
  40.2 Multiplying binary polynomials of high degree
  40.3 Modular arithmetic with binary polynomials
  40.4 Irreducible polynomials
  40.5 Primitive polynomials
  40.6 The number of irreducible and primitive polynomials
  40.7 Transformations that preserve irreducibility
  40.8 Self-reciprocal polynomials
  40.9 Irreducible and primitive polynomials of special forms ‡
  40.10 Generating irreducible polynomials from Lyndon words
  40.11 Irreducible and cyclotomic polynomials ‡
  40.12 Factorization of binary polynomials

41 Shift registers
  41.1 Linear feedback shift registers (LFSR)
  41.2 Galois and Fibonacci setup
  41.3 Error detection by hashing: the CRC
  41.4 Generating all revbin pairs
  41.5 The number of m-sequences and De Bruijn sequences
  41.6 Auto-correlation of m-sequences
  41.7 Feedback carry shift registers (FCSR)
  41.8 Linear hybrid cellular automata (LHCA)
  41.9 Additive linear hybrid cellular automata

42 Binary finite fields: GF(2^n)
  42.1 Arithmetic and basic properties
  42.2 Minimal polynomials
  42.3 Fast computation of the trace vector
  42.4 Solving quadratic equations
  42.5 Representation by matrices ‡
  42.6 Representation by normal bases
  42.7 Conversion between normal and polynomial representation
  42.8 Optimal normal bases (ONB)
  42.9 Gaussian normal bases

A The electronic version of the book
B Machine used for benchmarking
C The GP language

Bibliography
Index

Preface

This is a book for the computationalist, whether a working programmer or anyone interested in methods of computation. The focus is on material that does not usually appear in textbooks on algorithms.

Where necessary the underlying ideas are explained and the algorithms are given formally. It is assumed that the reader is able to understand the given source code; it is considered part of the text. We use the C++ programming language for low-level algorithms. However, only a minimal set of features beyond plain C is used, most importantly classes and templates. For material where technicalities in the C++ code would obscure the underlying ideas we use either pseudocode or, with arithmetical algorithms, the GP language. Appendix C gives an introduction to GP.

Example computations are often given with an algorithm; these are usually made with the demo programs referred to. Most of the listings and figures in this book were created with these programs. A recurring topic is practical efficiency of the implementations.
Various optimization techniques are described and the actual performance of many given implementations is indicated.

The accompanying software, the FXT [21] and the hfloat [22] libraries, are written for POSIX compliant platforms such as the Linux and BSD operating systems. The license is the GNU General Public License (GPL), version 3 or later, see http://www.gnu.org/licenses/gpl.html.

Individual chapters are self-contained where possible and references to related material are given where needed. The symbol ‘ ‡ ’ marks sections that can be skipped at first reading. These typically contain excursions or more advanced material.

Each item in the bibliography is followed by a list of page numbers where citations occur. With papers that are available for free download the respective URL is given. Note that the URL may point to a preprint which can differ from the final version of the paper.

An electronic version of this book is available online, see appendix A. Given the amount of material treated there must be errors in this book. Corrections and suggestions for improvement are appreciated, the preferred way of communication is electronic mail. A list of errata is online at http://www.jjj.de/fxt/#fxtbook.

Many people helped to improve this book. It is my pleasure to thank them all, particularly helpful were Igal Aharonovich, Max Alekseyev, Marcus Blackburn, Nathan Bullock, Dominique Delande, Mike Engber, Torsten Finke, Sean Furlong, Almaz Gaifullin, Pedro Gimeno, Alexander Glyzov, R. W. Gosper, Andreas Grünbacher, Lance Gurney, Markus Gyger, Christoph Haenel, Tony Hardie-Bick, Laszlo Hars, Thomas Harte, Stephen Hartke, Christian Hey, Jeff Hurchalla, Derek M. Jones, Gideon Klimer, Richard B. Kreckel, Mike Kundmann, Gál László, Dirk Lattermann, Avery Lee, Brent Lehman, Marc Lehmann, Paul C. Leopardi, John Lien, Mirko Liss, Robert C.
Long, Fred Lunnon, Johannes Middeke, Doug Moore, Fábio Moreira, Andrew Morris, David Nalepa, Samuel Neves, Matthew Oliver, Miroslaw Osys, Christoph Pacher, Krisztián Paczári, Scott Paine, Yves Paradis, Gunther Piez, André Piotrowski, David García Quintas, Andreas Raseghi, Tony Reix, Johan Rönnblom, Uwe Schmelich, Thomas Schraitle, Clive Scott, Mukund Sivaraman, Michal Staruch, Ralf Stephan, Mikko Tommila, Sebastiano Vigna, Michael Roby Wetherfield, Jim White, Vinnie Winkler, John Youngquist, Rui Zhang, and Paul Zimmermann.

Special thanks go to Edith Parzefall and Michael Somos for independently proofreading the whole text (the remaining errors are mine), and to Neil Sloane for creating the On-Line Encyclopedia of Integer Sequences [312].

jj
Nürnberg, Germany, June 2010

“Why make things difficult, when it is possible to make them cryptic and totally illogical, with just a little bit more effort?”
— Aksel Peter Jørgensen

Part I
Low level algorithms

Chapter 1
Bit wizardry

We give low-level functions for binary words, such as isolation of the lowest set bit or counting all set bits. Sometimes the term ‘one’ is used for a set bit and ‘zero’ for an unset bit. Where it cannot cause confusion, the term ‘bit’ is used for a set bit (as in “counting the bits of a word”).

The C-type unsigned long is abbreviated as ulong as defined in [FXT: fxttypes.h]. It is assumed that BITS_PER_LONG reflects the size of an unsigned long. It is defined in [FXT: bits/bitsperlong.h] and usually equals the machine word size: 32 on 32-bit architectures, and 64 on 64-bit machines. Further, the quantity BYTES_PER_LONG reflects the number of bytes in a machine word: it equals BITS_PER_LONG divided by eight. For some functions it is assumed that long and ulong have the same number of bits.
Many functions will only work on machines that use two’s complement, which is used by all of the current general purpose computers (the only machines using one’s complement appear to be some successors of the UNIVAC system, see [358, entry “UNIVAC 1100/2200 series”]).

The examples of assembler code are for the x86 and the AMD64 architecture. They should be simple enough to be understood by readers who know assembler for any CPU.

1.1 Trivia

1.1.1 Little endian versus big endian

The order in which the bytes of an integer are stored in memory can start with the least significant byte (little endian machine) or with the most significant byte (big endian machine). The hexadecimal number 0x0D0C0B0A will be stored in the following manner if memory addresses grow from left to right:

    adr:   z     z+1   z+2   z+3
    mem:   0D    0C    0B    0A     // big endian
    mem:   0A    0B    0C    0D     // little endian

The difference becomes visible when you cast pointers. Let V be the 32-bit integer with the value above. Then the result of char c = *(char *)(&V); will be 0x0A (value modulo 256) on a little endian machine but 0x0D (value divided by 2^24) on a big endian machine. Though friends of big endian sometimes refer to little endian as ‘wrong endian’, the desired result of the shown pointer cast is much more often the modulo operation.

Whenever words are serialized into bytes, as with transfer over a network or to a disk, one will need two code versions, one for big endian and one for little endian machines. The C-type union (with words and bytes) may also require separate treatment for big and little endian architectures.

1.1.2 Size of pointer is not size of int

If programming for a 32-bit architecture (where the size of int and long coincide), casting pointers to integers (and back) will usually work. The same code will fail on 64-bit machines. If you have to cast pointers to an integer type, cast them to a sufficiently big type. For portable code it is better to avoid casting pointers to integer types.
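The two remarks above can be illustrated with a small sketch. It assumes a C++11 compiler that provides the (optional, but commonly available) type std::uintptr_t; all function names are hypothetical and not part of FXT. The byte order of the machine is detected by inspecting the first byte in memory, a 32-bit word is written most significant byte first using shifts only (so this particular routine does not depend on the byte order of the machine), and a pointer is converted to an integer type that is wide enough for it.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration, not part of FXT.

// True if the least significant byte is stored at the lowest address.
static inline bool is_little_endian()
{
    const std::uint32_t v = 0x0D0C0B0AUL;
    unsigned char b[4];
    std::memcpy(b, &v, 4);  // copy the bytes as they lie in memory
    return ( b[0] == 0x0A );
}

// Write v to p[0..3], most significant byte first.
// Only shifts are used, so the result is the same on any machine.
static inline void store_be32(std::uint32_t v, unsigned char *p)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)(v);
}

// Pointer to integer and back via a sufficiently big type.
static inline std::uintptr_t ptr_to_int(const void *p)
{
    return reinterpret_cast<std::uintptr_t>(p);
}

static inline const void * int_to_ptr(std::uintptr_t u)
{
    return reinterpret_cast<const void *>(u);
}

static_assert( sizeof(std::uintptr_t) >= sizeof(void *),
               "integer type too small to hold a pointer" );
```

Whether shift-based serialization is fast enough for a given application has to be measured; the sketch only shows that the result is well defined.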
1.1.3 Shifts and division

With two’s complement arithmetic, division and multiplication by a power of 2 is a right and left shift, respectively. This is true for unsigned types and for multiplication (left shift) with signed types. Division with signed types rounds toward zero, as one would expect, but right shift is a division (by a power of 2) that rounds to −∞:

    int a = -1;
    int c = a >> 1;  // c == -1
    int d = a / 2;   // d == 0

The compiler still uses a shift instruction for the division, but with a ‘fix’ for negative values:

      9:test.cc  @ int foo(int a)
     10:test.cc  @ {
    285 0003 8B442410   movl 16(%esp),%eax   // move argument to %eax
     11:test.cc  @     int s = a >> 1;
    289 0007 89C1       movl %eax,%ecx
    290 0009 D1F9       sarl $1,%ecx
     12:test.cc  @     int d = a / 2;
    293 000b 89C2       movl %eax,%edx
    294 000d C1EA1F     shrl $31,%edx        // fix: %edx=(%edx<0?1:0)
    295 0010 01D0       addl %edx,%eax       // fix: add one if a<0
    296 0012 D1F8       sarl $1,%eax

For unsigned types the shift would suffice. One more reason to use unsigned types whenever possible.

The assembler listing was generated from C code via the following commands:

    # create assembler code:
    c++ -S -fverbose-asm -g -O2 test.cc -o test.s
    # create asm interlaced with source lines:
    as -alhnd test.s > test.lst

There are two types of right shifts: a logical and an arithmetical shift. The logical version (shrl in the above fragment) always fills the higher bits with zeros, corresponding to division of unsigned types. The arithmetical shift (sarl in the above fragment) fills in ones or zeros, according to the most significant bit of the original word.

Computing remainders modulo a power of 2 with unsigned types is equivalent to a bit-and:

    ulong a = b % 32;  // == b & (32-1)

All of the above is done by the compiler’s optimization wherever possible.

Division by (compile time) constants can be replaced by multiplications and shifts. The compiler does it for you.
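The following sketch spells out three of the replacements described above in portable C++, so that they can be checked against the plain operators. The function names are hypothetical (not part of FXT), 32-bit types from <cstdint> are assumed, and the signed variant additionally assumes two's complement and an arithmetic right shift for negative values (implementation-defined before C++20). The multiplier used for the division by 10 is ceil(2^35/10) = 0xcccccccd, with the upper half of the 64-bit product shifted right by a further 3 bits.

```cpp
#include <cstdint>

// Hypothetical demo helpers, not part of FXT.

// Unsigned remainder modulo 2^5 as a bit-and.
static inline std::uint32_t mod32(std::uint32_t b)
{
    return b & (32-1);
}

// Unsigned division by 10 via multiplication and shift:
// floor( a * 0xcccccccd / 2^35 ) == a / 10 for all 32-bit a.
static inline std::uint32_t div10(std::uint32_t a)
{
    const std::uint64_t p = (std::uint64_t)a * 0xcccccccdULL;
    return (std::uint32_t)( p >> 35 );
}

// Signed division by 2 (rounding toward zero) from an arithmetic shift
// plus the sign fix.  Assumes that >> on a negative value is an
// arithmetic shift, which is implementation-defined before C++20.
static inline std::int32_t div2_fixed(std::int32_t a)
{
    const std::int32_t fix = (std::int32_t)( (std::uint32_t)a >> 31 );  // one if a<0
    return (a + fix) >> 1;
}
```

In practice there is no reason to write such code by hand; the sketch is only meant to make the transformations testable.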
A division by the constant 10 is compiled to:

      5:test.cc  @ ulong foo(ulong a)
      6:test.cc  @ {
      7:test.cc  @     ulong b = a / 10;
    290 0000 8B442404   movl 4(%esp),%eax
    291 0004 F7250000   mull .LC33           // value == 0xcccccccd
    292 000a 89D0       movl %edx,%eax
    293 000c C1E803     shrl $3,%eax

Therefore it is sometimes reasonable to have separate code branches with explicit special values. Similar optimizations can be used for the modulo operation if the modulus is a compile time constant. For example, using modulus 10,000:

      8:test.cc  @ ulong foo(ulong a)
      9:test.cc  @ {
     53 0000 8B4C2404   movl 4(%esp),%ecx
     10:test.cc  @     ulong b = a % 10000;
     57 0004 89C8       movl %ecx,%eax
     58 0006 F7250000   mull .LC0            // value == 0xd1b71759
     59 000c 89D0       movl %edx,%eax
     60 000e C1E80D     shrl $13,%eax
     61 0011 69C01027   imull $10000,%eax,%eax
     62 0017 29C1       subl %eax,%ecx
     63 0019 89C8       movl %ecx,%eax

Algorithms to replace divisions by a constant with multiplications and shifts are given in [168], see also [346].

Note that the C standard leaves the behavior of a right shift of a signed integer as ‘implementation-defined’. The described behavior (that a negative value remains negative after right shift) is the default behavior of many commonly used C compilers.

1.1.4 A pitfall (two’s complement)

    c=................  -c=................   c=     0  -c=     0   <--=
    c=...............1  -c=1111111111111111   c=     1  -c=    -1
    c=..............1.  -c=111111111111111.   c=     2  -c=    -2
    c=..............11  -c=11111111111111.1   c=     3  -c=    -3
    c=.............1..  -c=11111111111111..   c=     4  -c=    -4
    c=.............1.1  -c=1111111111111.11   c=     5  -c=    -5
    c=.............11.  -c=1111111111111.1.   c=     6  -c=    -6
    [--snip--]
    c=.1111111111111.1  -c=1.............11   c= 32765  -c=-32765
    c=.11111111111111.  -c=1.............1.   c= 32766  -c=-32766
    c=.111111111111111  -c=1..............1   c= 32767  -c=-32767
    c=1...............  -c=1...............   c=-32768  -c=-32768   <--=
    c=1..............1  -c=.111111111111111   c=-32767  -c= 32767
    c=1.............1.  -c=.11111111111111.   c=-32766  -c= 32766
    c=1.............11  -c=.1111111111111.1   c=-32765  -c= 32765
    c=1............1..  -c=.1111111111111..   c=-32764  -c= 32764
    c=1............1.1  -c=.111111111111.11   c=-32763  -c= 32763
    c=1............11.  -c=.111111111111.1.   c=-32762  -c= 32762
    [--snip--]
    c=1111111111111..1  -c=.............111   c=    -7  -c=     7
    c=1111111111111.1.  -c=.............11.   c=    -6  -c=     6
    c=1111111111111.11  -c=.............1.1   c=    -5  -c=     5
    c=11111111111111..  -c=.............1..   c=    -4  -c=     4
    c=11111111111111.1  -c=..............11   c=    -3  -c=     3
    c=111111111111111.  -c=..............1.   c=    -2  -c=     2
    c=1111111111111111  -c=...............1   c=    -1  -c=     1

Figure 1.1-A: With two’s complement there is one nonzero value that is its own negative.

In two’s complement zero is not the only number that is equal to its negative. The value with just the highest bit set (the most negative value) also has this property. Figure 1.1-A (the output of [FXT: bits/gotcha-demo.cc]) shows the situation for words of 16 bits. This is why innocent looking code like the following can simply fail:

    if ( x<0 )  x = -x;
    // assume x positive here (WRONG!)

1.1.5 Another pitfall (shifts in the C-language)

A shift by more than BITS_PER_LONG−1 is undefined by the C-standard. Therefore the following function can fail if k is zero:

    static inline ulong first_comb(ulong k)
    // Return the first combination of (i.e. smallest word with) k bits,
    // i.e. 00..001111..1 (k low bits set)
    {
        ulong t = ~0UL >> ( BITS_PER_LONG - k );
        return t;
    }

Compilers usually emit just a shift instruction which on certain CPUs does not give zero if the shift is equal to or greater than BITS_PER_LONG. This is why the line

    if ( k==0 )  t = 0;  // shift with BITS_PER_LONG is undefined

has to be inserted just before the return statement.

1.1.6 Shortcuts

Test whether at least one of a and b equals zero with

    if ( !(a && b) )

This works for both signed and unsigned integers.
Check whether both are zero with

    if ( (a|b)==0 )

This obviously generalizes for several variables as

    if ( (a|b|c|..|z)==0 )

Test whether exactly one of two variables is zero using

    if ( (!a) ^ (!b) )

1.1.7 Average without overflow

A routine for the computation of the average (x + y)/2 of two arguments x and y is [FXT: bits/average.h]

    static inline ulong average(ulong x, ulong y)
    // Return floor( (x+y)/2 )
    // Use: x+y == ((x&y)<<1) + (x^y)
    // that is: sum == carries + sum_without_carries
    {
        return (x & y) + ((x ^ y) >> 1);
    }

The function gives the correct value even if (x + y) does not fit into a machine word. If it is known that x ≥ y, then we can use the simpler statement return y+(x-y)/2. The following version rounds toward infinity (that is, it returns the ceiling of the average):

    static inline ulong ceil_average(ulong x, ulong y)
    // Use: x+y == ((x|y)<<1) - (x^y)
    // ceil_average(x,y) == average(x,y) + ((x^y)&1)
    {
        return (x | y) - ((x ^ y) >> 1);
    }

1.1.8 Toggling between values

To toggle an integer x between two values a and b, use:

    pre-calculate:  t = a ^ b;
    toggle:         x ^= t;  // a <--> b

The equivalent trick for floating-point types is

    pre-calculate:  t = a + b;
    toggle:         x = t - x;

Here an overflow could occur with a and b in the allowed range if both are close to overflow.
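As a quick check of the two preceding sections, here is a sketch that restates the averaging routines for a fixed 64-bit type and adds a naive variant for comparison. The type alias u64, the function naive_average() and the struct toggler are assumptions of this example and do not appear in FXT.

```cpp
#include <cstdint>

typedef std::uint64_t u64;  // stand-in for ulong, an assumption of this sketch

// floor( (x+y)/2 ), correct even if x+y does not fit into a word.
static inline u64 average(u64 x, u64 y)
{
    return (x & y) + ((x ^ y) >> 1);
}

// ceil( (x+y)/2 ), correct even if x+y does not fit into a word.
static inline u64 ceil_average(u64 x, u64 y)
{
    return (x | y) - ((x ^ y) >> 1);
}

// Naive version for comparison: the sum wraps around modulo 2^64.
static inline u64 naive_average(u64 x, u64 y)
{
    return (x + y) / 2;
}

// Toggle between two fixed values a and b using one pre-computed XOR mask.
struct toggler
{
    u64 t;  // == a ^ b
    toggler(u64 a, u64 b) : t(a ^ b) { }
    u64 operator() (u64 x) const  { return x ^ t; }  // a <--> b
};
```

With both arguments equal to the largest representable value the naive version returns only about half of the true average, while average() returns the argument itself. Note that the toggler only gives a meaningful result if its argument is one of the two values a and b.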
1.1.9 Next or previous even or odd value

Compute the next or previous even or odd value via [FXT: bits/evenodd.h]:

    static inline ulong next_even(ulong x)  { return x+2-(x&1); }
    static inline ulong prev_even(ulong x)  { return x-2+(x&1); }

    static inline ulong next_odd(ulong x)  { return x+1+(x&1); }
    static inline ulong prev_odd(ulong x)  { return x-1-(x&1); }

The following functions return the unmodified argument if it has the required property, else the nearest such value:

    static inline ulong next0_even(ulong x)  { return x+(x&1); }
    static inline ulong prev0_even(ulong x)  { return x-(x&1); }

    static inline ulong next0_odd(ulong x)  { return x+1-(x&1); }
    static inline ulong prev0_odd(ulong x)  { return x-1+(x&1); }

Pedro Gimeno gives [priv. comm.] the following optimized versions:

    static inline ulong next_even(ulong x)  { return (x|1)+1; }
    static inline ulong prev_even(ulong x)  { return (x-1)&~1; }

    static inline ulong next_odd(ulong x)  { return (x+1)|1; }
    static inline ulong prev_odd(ulong x)  { return (x&~1)-1; }

    static inline ulong next0_even(ulong x)  { return (x+1)&~1; }
    static inline ulong prev0_even(ulong x)  { return x&~1; }

    static inline ulong next0_odd(ulong x)  { return x|1; }
    static inline ulong prev0_odd(ulong x)  { return (x-1)|1; }

1.1.10 Integer versus float multiplication

The floating-point multiplier gives the highest bits of the product. Integer multiplication gives the result modulo 2^b where b is the number of bits of the integer type used. As an example we square the number 111111111 using a 32-bit integer type and floating-point types with 24-bit and 53-bit mantissa (significand):

    a = 111111111  // assignment
    a*a == 12345678987654321  // true result
    a*a == 1653732529  // result with 32-bit integer multiplication
    (a*a)%(2**32) == 1653732529  // ... which is modulo (2**bits_per_int)
    a*a == 1.2345679481405440e+16  // result with float multiplication (24 bit mantissa)
    a*a == 1.2345678987654320e+16  // result with float multiplication (53 bit mantissa)

1.1.11 Double precision float to signed integer conversion

Conversion of double precision floats that have a 53-bit mantissa to signed integers via [11, p.52-53]

    #define DOUBLE2INT(i, d)  { double t = ((d) + 6755399441055744.0); i = *((int *)(&t)); }
    double x = 123.0;
    int i;
    DOUBLE2INT(i, x);

can be a faster alternative to

    double x = 123.0;
    int i = x;

The constant used is 6755399441055744 = 2^52 + 2^51. The method is machine dependent as it relies on the binary representation of the floating-point mantissa. Here it is assumed that the floating-point number has a 53-bit mantissa with the most significant bit (that is always one with normalized numbers) omitted, and that the address of the number points to the mantissa.

1.1.12 Optimization considerations

Never assume that some code is the ‘fastest possible’. There is always another trick that can still improve performance. Many factors can have an influence on performance, like the number of CPU registers or cost of branches. Code that performs well on one machine might perform badly on another.

The old trick to swap variables without using a temporary is pretty much out of fashion today:

                 //  a=0, b=0    a=0, b=1    a=1, b=0    a=1, b=1
    a ^= b;      //    0   0       1   1       1   0       0   1
    b ^= a;      //    0   0       1   0       1   1       0   1
    a ^= b;      //    0   0       1   0       0   1       1   1
    // equivalent to:  tmp = a;  a = b;  b = tmp;

However, under some conditions (like extreme register pressure) it may be the way to go. Note that if both operands are identical (memory locations) then the result is zero.

The only way to find out which version of a function is faster is to actually do benchmarking (timing). The performance does depend on the sequence of instructions surrounding the machine code, assuming that all of these low-level functions get inlined.
Studying the generated CPU instructions helps to understand what happens, but can never replace benchmarking. This means that benchmarks for just the isolated routine can at best give a rough indication. Test your application using different versions of the routine in question.

Never ever delete the unoptimized version of some code fragment when introducing a streamlined one. Keep the original in the source. If something nasty happens (think of low level software failures when porting to a different platform), you will be very grateful for the chance to temporarily resort to the slow but correct version.

Study the optimization recommendations for your CPU (like [11] and [12] for the AMD64, see also [144]). You can also learn a lot from the documentation for other architectures.

Proper documentation is an absolute must for optimized code. Always assume that nobody will understand the code without comments. You may not be able to understand uncommented code written by yourself after enough time has passed.

1.2 Operations on individual bits

1.2.1 Testing, setting, and deleting bits

The following functions should be self-explanatory. Following the spirit of the C language there is no check whether the indices used are out of bounds. That is, if any index is greater than or equal to BITS_PER_LONG, the result is undefined [FXT: bits/bittest.h]:

    static inline ulong test_bit(ulong a, ulong i)
    // Return zero if bit[i] is zero,
    // else return one-bit word with bit[i] set.
    {
        return (a & (1UL << i));
    }

The following version returns either zero or one:

    static inline bool test_bit01(ulong a, ulong i)
    // Return whether bit[i] is set.
    {
        return ( 0 != test_bit(a, i) );
    }

Functions for setting, clearing, and changing a bit are:

    static inline ulong set_bit(ulong a, ulong i)
    // Return a with bit[i] set.
    {
        return (a | (1UL << i));
    }

    static inline ulong clear_bit(ulong a, ulong i)
    // Return a with bit[i] cleared.
    {
        return (a & ~(1UL << i));
    }

    static inline ulong change_bit(ulong a, ulong i)
    // Return a with bit[i] changed.
    {
        return (a ^ (1UL << i));
    }

1.2.2 Copying a bit

To copy a bit from one position to another, we generate a one if the bits at the two positions differ. Then an XOR changes the target bit if needed [FXT: bits/bitcopy.h]:

    static inline ulong copy_bit(ulong a, ulong isrc, ulong idst)
    // Copy bit at [isrc] to position [idst].
    // Return the modified word.
    {
        ulong x = ((a>>isrc) ^ (a>>idst)) & 1;  // one if bits differ
        a ^= (x<<idst);
        return a;
    }

[...]

        ((a>>k1) ^ (a>>k2)) & 1;  // one if bits differ
        a ^= (x<<

[...]

        >> 1 ) + 1;

The sequence of returned values for x = 0, 1, . . . is the highest power of 2 that divides x + 1, entry A006519 in [312] (see also entry A001511):

     x:   == x           lowest_zero(x)
     0:   == ........    .......1
     1:   == .......1    ......1.
     2:   == ......1.    .......1
     3:   == ......11    .....1..
     4:   == .....1..    .......1
     5:   == .....1.1    ......1.
     6:   == .....11.    .......1
     7:   == .....111    ....1...
     8:   == ....1...    .......1
     9:   == ....1..1    ......1.
    10:   == ....1.1.    .......1

The lowest set bit in a word can be cleared by

    static inline ulong clear_lowest_one(ulong x)
    // Return word where the lowest bit set in x is cleared.
    // Return 0 for input == 0.
    {
        return x & (x-1);
    }

The lowest unset bit can be set by

    static inline ulong set_lowest_zero(ulong x)
    // Return word where the lowest unset bit in x is set.
    // Return ~0 for input == ~0.
    {
        return x | (x+1);
    }

1.3.2 Computing the index of the lowest one

We compute the index (position) of the lowest bit with an assembler instruction if available [FXT: bits/bitasm-amd64.h]:

    static inline ulong asm_bsf(ulong x)
    // Bit Scan Forward
    {
        asm ("bsfq %0, %0" : "=r" (x) : "0" (x));
        return x;
    }

Without the assembler instruction an algorithm that involves O(log_2(BITS_PER_LONG)) operations can be used. The function can be implemented as follows (suggested by Nathan Bullock [priv. comm.], 64-bit version) [FXT: bits/bitlow.h]:

    static inline ulong lowest_one_idx(ulong x)
    // Return index of lowest bit set.
    // Examples:
    //    ***1 --> 0
    //    **10 --> 1
    //    *100 --> 2
    // Return 0 (also) if no bit is set.
    {
        ulong r = 0;
        x &= -x;  // isolate lowest bit
        if ( x & 0xffffffff00000000UL )  r += 32;
        if ( x & 0xffff0000ffff0000UL )  r += 16;
        if ( x & 0xff00ff00ff00ff00UL )  r += 8;
        if ( x & 0xf0f0f0f0f0f0f0f0UL )  r += 4;
        if ( x & 0xccccccccccccccccUL )  r += 2;
        if ( x & 0xaaaaaaaaaaaaaaaaUL )  r += 1;
        return r;
    }

The function returns zero for two inputs, one and zero. If a special value for the input zero is needed, a statement as the following should be added as the first line of the function:

    if ( 1>=x )  return x-1;  // 0 if 1, ~0 if 0

The following function returns the parity of the index of the lowest set bit in a binary word

    static inline ulong lowest_one_idx_parity(ulong x)
    {
        x &= -x;  // isolate lowest bit
        return 0 != (x & 0xaaaaaaaaaaaaaaaaUL);
    }

The sequence of values for x = 0, 1, 2, . . . is

    0010001010100010001000101010001010100010101000100010001010100010...

This is the complement of the period-doubling sequence, entry A035263 in [312]. See section 38.5.1 on page 735 for the connection to the towers of Hanoi puzzle.
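The branching index computation and the parity sequence can be cross-checked with a short sketch. It uses the fixed-width type std::uint64_t in place of ulong so that the 64-bit masks are valid on any platform; the alias u64 and the functions lowest_one_idx_ref() and parity_sequence() are hypothetical helpers introduced only for this example.

```cpp
#include <cstdint>
#include <string>

typedef std::uint64_t u64;  // stand-in for a 64-bit ulong (assumption of this sketch)

// Index of the lowest set bit via masks, as in the text (returns 0 for x==0).
static inline u64 lowest_one_idx(u64 x)
{
    u64 r = 0;
    x &= -x;  // isolate lowest bit
    if ( x & 0xffffffff00000000ULL )  r += 32;
    if ( x & 0xffff0000ffff0000ULL )  r += 16;
    if ( x & 0xff00ff00ff00ff00ULL )  r += 8;
    if ( x & 0xf0f0f0f0f0f0f0f0ULL )  r += 4;
    if ( x & 0xccccccccccccccccULL )  r += 2;
    if ( x & 0xaaaaaaaaaaaaaaaaULL )  r += 1;
    return r;
}

// Slow reference version: shift right until the lowest bit is set.
static inline u64 lowest_one_idx_ref(u64 x)
{
    if ( x==0 )  return 0;
    u64 r = 0;
    while ( (x & 1)==0 )  { x >>= 1;  ++r; }
    return r;
}

// Parities of the index of the lowest set bit for x = 0, 1, ..., n-1,
// as a string of the characters '0' and '1'.
static inline std::string parity_sequence(u64 n)
{
    std::string s;
    for (u64 x=0; x<n; ++x)
    {
        const u64 l = x & -x;  // isolate lowest bit
        s += ( (l & 0xaaaaaaaaaaaaaaaaULL) ? '1' : '0' );
    }
    return s;
}
```

Calling parity_sequence(16) should reproduce the first 16 digits of the sequence shown above.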
1.3.3 Isolating blocks of zeros or ones at the low end

Isolate the burst of low ones as follows [FXT: bits/bitlow.h]:

    static inline ulong low_ones(ulong x)
    // Return word where all the (low end) ones are set.
    // Example:  01011011 --> 00000011
    // Return 0 if lowest bit is zero:
    //       10110110 --> 0
    {
        x = ~x;
        x &= -x;
        --x;
        return x;
    }

The isolation of the low zeros is slightly cheaper:

    static inline ulong low_zeros(ulong x)
    // Return word where all the (low end) zeros are set.
    // Example:  01011000 --> 00000111
    // Return 0 if all bits are set.
    {
        x &= -x;
        --x;
        return x;
    }

The lowest block of ones (which may have zeros to the right of it) can be isolated by

    static inline ulong lowest_block(ulong x)
    // Isolate lowest block of ones.
    // e.g.:
    // x   = *****011100
    // l   = 00000000100
    // y   = *****100000
    // x^y = 00000111100
    // ret = 00000011100
    {
        ulong l = x & -x;  // lowest bit
        ulong y = x + l;
        x ^= y;
        return x & (x>>1);
    }

1.3.4 Creating a transition at the lowest one

Use the following routines to set a rising or falling edge at the position of the lowest set bit [FXT: bits/bitlow-edge.h]:

    static inline ulong lowest_one_10edge(ulong x)
    // Return word where all bits from (including) the
    // lowest set bit to most significant bit are set.
    // Return 0 if no bit is set.
    // Example:  00110100 --> 11111100
    {
        return ( x | -x );
    }

    static inline ulong lowest_one_01edge(ulong x)
    // Return word where all bits from (including) the
    // lowest set bit to the least significant are set.
    // Return 0 if no bit is set.
    // Example:  00110100 --> 00000111
    {
        if ( 0==x )  return 0;
        return x^(x-1);
    }

1.3.5 Isolating the lowest run of matching bits

Let x = ∗0W and y = ∗1W; the following function computes W:

    static inline ulong low_match(ulong x, ulong y)
    {
        x ^= y;   // bit-wise difference
        x &= -x;  // lowest bit that differs in both words
        x -= 1;   // mask that covers equal bits at low end
        x &= y;   // isolate matching bits
        return x;
    }

1.4 Extraction of ones, zeros, or blocks near transitions

We give functions for the creation or extraction of bit-blocks and the isolation of values near transitions. A transition is a place where adjacent bits have different values. A block is a group of adjacent bits of the same value.

1.4.1 Creating blocks of ones

The following functions are given in [FXT: bits/bitblock.h].

    static inline ulong bit_block(ulong p, ulong n)
    // Return word with length-n bit block starting at bit p set.
    // Both p and n are effectively taken modulo BITS_PER_LONG.
    {
        ulong x = (1UL<<n) - 1;
        return x << p;
    }

[...]

        >>(BITS_PER_LONG-p));
    }

1.4.2 Finding isolated ones or zeros

The following functions are given in [FXT: bits/bit-isolate.h]:

    static inline ulong single_ones(ulong x)
    // Return word with only the isolated ones of x set.
    {
        return x & ~( (x<<1) | (x>>1) );
    }

We can assume a word is embedded in zeros or ignore the bits outside the word:

    static inline ulong single_zeros_xi(ulong x)
    // Return word with only the isolated zeros of x set.
    {
        return single_ones( ~x );  // ignore outside values
    }

    static inline ulong single_zeros(ulong x)
    // Return word with only the isolated zeros of x set.
    {
        return ~x & ( (x<<1) & (x>>1) );  // assume outside values == 0
    }

    static inline ulong single_values(ulong x)
    // Return word where only the isolated ones and zeros of x are set.
    {
        return (x ^ (x<<1)) & (x ^ (x>>1));  // assume outside values == 0
    }

    static inline ulong single_values_xi(ulong x)
    // Return word where only the isolated ones and zeros of x are set.
    {
        return single_ones(x) | single_zeros_xi(x);  // ignore outside values
    }

1.4.3 Isolating single ones or zeros at the word boundary

    static inline ulong border_ones(ulong x)
    // Return word where only those ones of x are set that lie next to a zero.
    {
        return x & ~( (x<<1) & (x>>1) );
    }

    static inline ulong border_values(ulong x)
    // Return word where those bits of x are set that lie on a transition.
    {
        return (x ^ (x<<1)) | (x ^ (x>>1));
    }

1.4.4 Isolating transitions

    static inline ulong high_border_ones(ulong x)
    // Return word where only those ones of x are set
    // that lie right to (i.e. in the next lower bin of) a zero.
    {
        return x & ( x ^ (x>>1) );
    }

    static inline ulong low_border_ones(ulong x)
    // Return word where only those ones of x are set
    // that lie left to (i.e. in the next higher bin of) a zero.
    {
        return x & ( x ^ (x<<1) );
    }

1.4.5 Isolating ones or zeros at block boundaries

    static inline ulong block_border_ones(ulong x)
    // Return word where only those ones of x are set
    // that are at the border of a block of at least 2 bits.
    {
        return x & ( (x<<1) ^ (x>>1) );
    }

    static inline ulong low_block_border_ones(ulong x)
    // Return word where only those bits of x are set
    // that are at left of a border of a block of at least 2 bits.
    {
        ulong t = x & ( (x<<1) ^ (x>>1) );  // block_border_ones()
        return t & (x>>1);
    }

    static inline ulong high_block_border_ones(ulong x)
    // Return word where only those bits of x are set
    // that are at right of a border of a block of at least 2 bits.
    {
        ulong t = x & ( (x<<1) ^ (x>>1) );  // block_border_ones()
        return t & (x<<1);
    }

    static inline ulong block_ones(ulong x)
    // Return word where only those bits of x are set
    // that are part of a block of at least 2 bits.
    {
        return x & ( (x<<1) | (x>>1) );
    }

1.5 Computing the index of a single set bit

In the function lowest_one_idx() given in section 1.3.2 on page 9 we first isolated the lowest one of a word x by setting x&=-x. At this point, x contains just one set bit (or x==0). The following lines in the routine compute the index of the only bit set. This section gives some alternative techniques to compute the index of the one in a single-bit word.

1.5.1 Cohen’s trick

    modulus m=11
        k  =   0  1  2  3  4  5  6  7
     mt[k] =   0  0  1  8  2  4  9  7
    Lowest bit == 0:   x= .......1  =   1    x % m=  1  ==> lookup = 0
    Lowest bit == 1:   x= ......1.  =   2    x % m=  2  ==> lookup = 1
    Lowest bit == 2:   x= .....1..  =   4    x % m=  4  ==> lookup = 2
    Lowest bit == 3:   x= ....1...  =   8    x % m=  8  ==> lookup = 3
    Lowest bit == 4:   x= ...1....  =  16    x % m=  5  ==> lookup = 4
    Lowest bit == 5:   x= ..1.....  =  32    x % m= 10  ==> lookup = 5
    Lowest bit == 6:   x= .1......  =  64    x % m=  9  ==> lookup = 6
    Lowest bit == 7:   x= 1.......  = 128    x % m=  7  ==> lookup = 7

Figure 1.5-A: Determination of the position of a single bit with 8-bit words.

A nice trick is presented in [110]: for N-bit words find a number m such that all powers of 2 are different modulo m. That is, the (multiplicative) order of 2 modulo m must be greater than or equal to N. We use a table mt[] of size m that contains the power of 2: mt[(2**j) mod m] = j for j ≥ 0. To look up the index of a one-bit-word x it is reduced modulo m and mt[x] is returned.

We demonstrate the method for N = 8 where m = 11 is the smallest number with the required property.
The setup routine for the table is

    const ulong m = 11;  // the modulus
    ulong mt[m+1];
    static void mt_setup()
    {
        mt[0] = 0;  // special value for the zero word
        ulong t = 1;
        for (ulong i=1; i<m; ++i)
        {
            mt[t] = i-1;
            t *= 2;
            if ( t>=m )  t -= m;  // modular reduction
        }
    }

The entry in mt[0] will be accessed when the input is the zero word. We can use any value to be returned for input zero. Here we simply use zero to always have the same return value as with lowest_one_idx().

The index can be computed by

    static inline ulong m_lowest_one_idx(ulong x)
    {
        x &= -x;  // isolate lowest bit
        x %= m;   // power of 2 modulo m
        return mt[x];  // lookup
    }

The code is given in the program [FXT: bits/modular-lookup-demo.cc], the output with N = 8 (edited for size) is shown in figure 1.5-A. The following moduli m(N) can be used for N-bit words:

    N:    4    8   16   32   64   128   256   512   1024
    m:    5   11   19   37   67   131   269   523   1061

The modulus m(N) is the smallest prime greater than N such that 2 is a primitive root modulo m(N).

    db=...1.111  (De Bruijn sequence)
         k  =   0 1 2 3 4 5 6 7
     dbt[k] =   0 1 2 4 7 3 6 5
    Lowest bit == 0:   x = .......1   db * x = ...1.111   shifted = ........ == 0  ==> lookup = 0
    Lowest bit == 1:   x = ......1.   db * x = ..1.111.   shifted = .......1 == 1  ==> lookup = 1
    Lowest bit == 2:   x = .....1..   db * x = .1.111..   shifted = ......1. == 2  ==> lookup = 2
    Lowest bit == 3:   x = ....1...   db * x = 1.111...   shifted = .....1.1 == 5  ==> lookup = 3
    Lowest bit == 4:   x = ...1....   db * x = .111....   shifted = ......11 == 3  ==> lookup = 4
    Lowest bit == 5:   x = ..1.....   db * x = 111.....   shifted = .....111 == 7  ==> lookup = 5
    Lowest bit == 6:   x = .1......   db * x = 11......   shifted = .....11. == 6  ==> lookup = 6
    Lowest bit == 7:   x = 1.......   db * x = 1.......   shifted = .....1.. == 4  ==> lookup = 7

Figure 1.5-B: Computing the position of the single set bit in 8-bit words with a De Bruijn sequence.

1.5.2 Using De Bruijn sequences

The following method (given in [228]) is even more elegant.
It uses binary De Bruijn sequences of size N. A binary De Bruijn sequence of length 2^N contains all binary words of length N, see section 41.1 on page 864. These are the sequences for 32 and 64 bit, as binary words:

    #if BITS_PER_LONG == 32
    const ulong db = 0x4653ADFUL;
    // == 00000100011001010011101011011111
    const ulong s = 32-5;
    #else
    const ulong db = 0x218A392CD3D5DBFUL;
    // == 0000001000011000101000111001001011001101001111010101110110111111
    const ulong s = 64-6;
    #endif

Let w_i be the i-th sub-word from the left (high end). We create a table such that the entry with index w_i points to i:

    ulong dbt[BITS_PER_LONG];
    static void dbt_setup()
    {
        for (ulong i=0; i<BITS_PER_LONG; ++i)  dbt[ (db<<i)>>s ] = i;
    }

The computation of the index involves a multiplication and a table lookup:

    static inline ulong db_lowest_one_idx(ulong x)
    {
        x &= -x;  // isolate lowest bit
        x *= db;  // multiplication by a power of 2 is a shift
        x >>= s;  // use log_2(BITS_PER_LONG) highest bits
        return dbt[x];  // lookup
    }

The used sequences must start with at least log_2(N) − 1 zeros because in the line x *= db the word x is shifted (not rotated). The code is given in the demo [FXT: bits/debruijn-lookup-demo.cc], the output with N = 8 (edited for size, dots denote zeros) is shown in figure 1.5-B.

1.5.3 Using floating-point numbers

Floating-point numbers are normalized so that the highest bit in the mantissa is set. Therefore if we convert an integer into a float, the position of the highest set bit can be read off the exponent. By isolating the lowest bit before that operation, the index can be found with the same trick. However, the conversion between integers and floats is usually slow. Further, the technique is highly machine dependent.

1.6 Operations on high bits or blocks of a word

For functions operating on the highest bit there is no method as trivial as shown for the lower end of the word. With a bit-reverse CPU-instruction available life would be significantly easier.
However, almost no CPU seems to have it.

1.6.1 Isolating the highest one and finding its index

    ................1111....1111.111 = 0xf0f7 == word
    ................1............... = highest_one
    ................1111111111111111 = highest_one_01edge
    11111111111111111............... = highest_one_10edge
                                  15 = highest_one_idx
    ................................ = low_zeros
    .............................111 = low_ones
    ...............................1 = lowest_one
    ...............................1 = lowest_one_01edge
    11111111111111111111111111111111 = lowest_one_10edge
                                   0 = lowest_one_idx
    .............................111 = lowest_block
    ................1111....1111.11. = clear_lowest_one
    ............................1... = lowest_zero
    ................1111....11111111 = set_lowest_zero
    ................................ = high_ones
    1111111111111111................ = high_zeros
    1............................... = highest_zero
    1...............1111....1111.111 = set_highest_zero

    1111111111111111....1111....1... = 0xffff0f08 == word
    1............................... = highest_one
    11111111111111111111111111111111 = highest_one_01edge
    1............................... = highest_one_10edge
                                  31 = highest_one_idx
    .............................111 = low_zeros
    ................................ = low_ones
    ............................1... = lowest_one
    ............................1111 = lowest_one_01edge
    11111111111111111111111111111... = lowest_one_10edge
                                   3 = lowest_one_idx
    ............................1... = lowest_block
    1111111111111111....1111........ = clear_lowest_one
    ...............................1 = lowest_zero
    1111111111111111....1111....1..1 = set_lowest_zero
    1111111111111111................ = high_ones
    ................................ = high_zeros
    ................1............... = highest_zero
    11111111111111111...1111....1... = set_highest_zero

Figure 1.6-A: Operations on the highest and lowest bits (and blocks) of a binary word for two different 32-bit input words. Dots denote zeros.

Isolation of the highest set bit is easy if a bit-scan instruction is available [FXT: bits/bitasm-i386.h]:

    static inline ulong asm_bsr(ulong x)
    // Bit Scan Reverse
    {
        asm ("bsrl %0, %0" : "=r" (x) : "0" (x));
        return x;
    }

Without a bit-scan instruction, we use the auxiliary function [FXT: bits/bithigh-edge.h]

    static inline ulong highest_one_01edge(ulong x)
    // Return word where all bits from (including) the
    // highest set bit to bit 0 are set.
    // Return 0 if no bit is set.
    {
        x |= x>>1;
        x |= x>>2;
        x |= x>>4;
        x |= x>>8;
        x |= x>>16;
    #if BITS_PER_LONG >= 64
        x |= x>>32;
    #endif
        return x;
    }

The resulting code is [FXT: bits/bithigh.h]

    static inline ulong highest_one(ulong x)
    // Return word where only the highest bit in x is set.
    // Return 0 if no bit is set.
    {
    #if defined BITS_USE_ASM
        if ( 0==x )  return 0;
        x = asm_bsr(x);
        return 1UL<<x;
    #else
        x = highest_one_01edge(x);
        return x ^ (x>>1);
    #endif  // BITS_USE_ASM
    }

To determine the index of the highest set bit, use

    static inline ulong highest_one_idx(ulong x)
    // Return index of highest bit set.
    // Return 0 if no bit is set.
    {
    #if defined BITS_USE_ASM
        return asm_bsr(x);
    #else  // BITS_USE_ASM
        if ( 0==x )  return 0;

        ulong r = 0;
    #if BITS_PER_LONG >= 64
        if ( x & 0xffffffff00000000UL )  { x >>= 32;  r += 32; }
    #endif
        if ( x & 0xffff0000UL )  { x >>= 16;  r += 16; }
        if ( x & 0x0000ff00UL )  { x >>= 8;  r += 8; }
        if ( x & 0x000000f0UL )  { x >>= 4;  r += 4; }
        if ( x & 0x0000000cUL )  { x >>= 2;  r += 2; }
        if ( x & 0x00000002UL )  { r += 1; }
        return r;
    #endif  // BITS_USE_ASM
    }

The branches in the non-assembler part of the routine can be avoided by a technique given in [215, rel.96, sect.7.1.3] (version for 64-bit words):

    static inline ulong highest_one_idx(ulong x)
    {
    #define MU0 0x5555555555555555UL  // MU0 == ((-1UL)/3UL) == ...01010101_2
    #define MU1 0x3333333333333333UL  // MU1 == ((-1UL)/5UL) == ...00110011_2
    #define MU2 0x0f0f0f0f0f0f0f0fUL  // MU2 == ((-1UL)/17UL) == ...00001111_2
    #define MU3 0x00ff00ff00ff00ffUL  // MU3 == ((-1UL)/257UL) == (8 ones)
    #define MU4 0x0000ffff0000ffffUL  // MU4 == ((-1UL)/65537UL) == (16 ones)
    #define MU5 0x00000000ffffffffUL  // MU5 == ((-1UL)/4294967297UL) == (32 ones)
        ulong r = ld_neq(x, x & MU0)
            + (ld_neq(x, x & MU1) << 1)
            + (ld_neq(x, x & MU2) << 2)
            + (ld_neq(x, x & MU3) << 3)
            + (ld_neq(x, x & MU4) << 4)
            + (ld_neq(x, x & MU5) << 5);
        return r;
    }

The auxiliary function ld_neq() is given in [FXT: bits/bitldeq.h]:

    static inline bool ld_neq(ulong x, ulong y)
    // Return whether floor(log2(x))!=floor(log2(y))
    { return ( (x^y) > (x&y) ); }

The following version for 64-bit words provided by Sebastiano Vigna [priv. comm.]
is an implementation of Brodal’s algorithm [215, alg.B, sect.7.1.3]:

    static inline ulong highest_one_idx(ulong x)
    {
        if ( x == 0 )  return 0;
        ulong r = 0;
        if ( x & 0xffffffff00000000UL )  { x >>= 32;  r += 32; }
        if ( x & 0xffff0000UL )  { x >>= 16;  r += 16; }
        x |= (x << 16);
        x |= (x << 32);
        const ulong y = x & 0xff00f0f0ccccaaaaUL;
        const ulong z = 0x8000800080008000UL;
        ulong t = z & ( y | (( y | z ) - ( x ^ y )));
        t |= (t << 15);
        t |= (t << 30);
        t |= (t << 60);
        return r + ( t >> 60 );
    }

1.6.2 Isolating the highest block of ones or zeros

Isolate the left block of zeros with the function

    static inline ulong high_zeros(ulong x)
    // Return word where all the (high end) zeros are set.
    // e.g.: 00011001 --> 11100000
    // Returns 0 if highest bit is set:
    //       11011001 --> 00000000
    {
        x |= x>>1;
        x |= x>>2;
        x |= x>>4;
        x |= x>>8;
        x |= x>>16;
    #if BITS_PER_LONG >= 64
        x |= x>>32;
    #endif
        return ~x;
    }

The left block of ones can be isolated using arithmetical right shifts:

    static inline ulong high_ones(ulong x)
    // Return word where all the (high end) ones are set.
    // e.g.  11001011 --> 11000000
    // Returns 0 if highest bit is zero:
    //       01110110 --> 00000000
    {
        long y = (long)x;
        y &= y>>1;
        y &= y>>2;
        y &= y>>4;
        y &= y>>8;
        y &= y>>16;
    #if BITS_PER_LONG >= 64
        y &= y>>32;
    #endif
        return (ulong)y;
    }

If arithmetical shifts are more expensive than unsigned shifts, use

    static inline ulong high_ones(ulong x)  { return high_zeros( ~x ); }

A demonstration of selected functions operating on the highest or lowest bit (or block) of binary words is given in [FXT: bits/bithilo-demo.cc]. Part of its output is shown in figure 1.6-A.

1.7 Functions related to the base-2 logarithm

The following functions are given in [FXT: bits/bit2pow.h].
A function that returns ⌊log2(x)⌋ can be implemented using the obvious algorithm:

    static inline ulong ld(ulong x)
    // Return floor(log2(x)),
    // i.e. return k so that 2^k <= x < 2^(k+1)
    // If x==0, then 0 is returned (!)
    {
        ulong k = 0;
        while ( x>>=1 )  { ++k; }
        return k;
    }

The result is the same as returned by highest_one_idx():

    static inline ulong ld(ulong x)  { return highest_one_idx(x); }

The bit-wise algorithm can be faster if the average result is known to be small.

Use the function one_bit_q() to determine whether its argument is a power of 2:

    static inline bool one_bit_q(ulong x)
    // Return whether x \in {1,2,4,8,16,...}
    {
        ulong m = x-1;
        return (((x^m)>>1) == m);
    }

The following function does the same except that it returns true also for the zero argument:

    static inline bool is_pow_of_2(ulong x)
    // Return whether x == 0(!) or x == 2**k
    { return !(x & (x-1)); }

With FFTs where the length of the transform is often restricted to power of 2 the following functions are useful:

    static inline ulong next_pow_of_2(ulong x)
    // Return x if x=2**k
    // else return 2**ceil(log_2(x))
    // Exception: returns 0 for x==0
    {
        if ( is_pow_of_2(x) )  return x;
        x |= x >> 1;
        x |= x >> 2;
        x |= x >> 4;
        x |= x >> 8;
        x |= x >> 16;
    #if BITS_PER_LONG == 64
        x |= x >> 32;
    #endif
        return x + 1;
    }

    static inline ulong next_exp_of_2(ulong x)
    // Return k if x=2**k else return k+1.
    // Exception: returns 0 for x==0.
    {
        if ( x <= 1 )  return 0;
        return ld(x-1) + 1;
    }

The following version should be faster if inline assembler is used for ld():

    static inline ulong next_pow_of_2(ulong x)
    {
        if ( is_pow_of_2(x) )  return x;
        ulong n = 1UL<<ld(x);
        return n<<1;
    }

1.8 Counting the bits and blocks of a word

The bits of a 64-bit word can be counted as follows:

    static inline ulong bit_count(ulong x)
    // Return number of bits set.
    {
        x = (0x5555555555555555UL & x) + (0x5555555555555555UL & (x>> 1));  // 0-2 in 2 bits
        x = (0x3333333333333333UL & x) + (0x3333333333333333UL & (x>> 2));  // 0-4 in 4 bits
        x = (0x0f0f0f0f0f0f0f0fUL & x) + (0x0f0f0f0f0f0f0f0fUL & (x>> 4));  // 0-8 in 8 bits
        x = (0x00ff00ff00ff00ffUL & x) + (0x00ff00ff00ff00ffUL & (x>> 8));  // 0-16 in 16 bits
        x = (0x0000ffff0000ffffUL & x) + (0x0000ffff0000ffffUL & (x>>16));  // 0-32 in 32 bits
        x = (0x00000000ffffffffUL & x) + (0x00000000ffffffffUL & (x>>32));  // 0-64 in 64 bits
        return x;
    }

The underlying idea is to do a search via bit masks. The code can be improved to either

        x = ((x>>1) & 0x5555555555555555UL) + (x & 0x5555555555555555UL);  // 0-2 in 2 bits
        x = ((x>>2) & 0x3333333333333333UL) + (x & 0x3333333333333333UL);  // 0-4 in 4 bits
        x = ((x>>4) + x) & 0x0f0f0f0f0f0f0f0fUL;  // 0-8 in 8 bits
        x += x>> 8;  // 0-16 in 8 bits
        x += x>>16;  // 0-32 in 8 bits
        x += x>>32;  // 0-64 in 8 bits
        return x & 0xff;

or (taken from [10])

        x -= (x>>1) & 0x5555555555555555UL;  // 0-2 in 2 bits
        x = ((x>>2) & 0x3333333333333333UL) + (x & 0x3333333333333333UL);  // 0-4 in 4 bits
        x = ((x>>4) + x) & 0x0f0f0f0f0f0f0f0fUL;  // 0-8 in 8 bits
        x *= 0x0101010101010101UL;
        return x>>56;

Which of the latter two versions is faster mainly depends on the speed of integer multiplication.

The following code for 32-bit words (given by Johan Rönnblom [priv. comm.]) may be advantageous if loading constants is expensive. Note that some constants are in octal notation:

    static inline uint CountBits32(uint a)
    {
        uint mask = 011111111111UL;
        a = (a - ((a&~mask)>>1)) - ((a>>2)&mask);
        a += a>>3;
        a = (a & 070707) + ((a>>18) & 070707);
        a *= 010101;
        return ((a>>12) & 0x3f);
    }

If the table tab[] holds the bit-counts of the numbers 0...255, then the bits can be counted as follows:

    ulong bit_count(ulong x)
    {
        unsigned char ct = 0;
        ct += tab[ x & 0xff ];  x >>= 8;
        ct += tab[ x & 0xff ];  x >>= 8;
        [--snip--]  /* BYTES_PER_LONG times */
        ct += tab[ x & 0xff ];
        return ct;
    }

However, while table driven methods tend to excel in synthetic benchmarks, they can be very slow if they cause cache misses.

We give a method to count the bits of a word of a special form:

    static inline ulong bit_count_01(ulong x)
    // Return number of bits in a word
    // for words of the special form 00...0001...11
    {
        ulong ct = 0;
        ulong a;
    #if BITS_PER_LONG == 64
        a = (x & (1UL<<32)) >> (32-5);  // test bit 32
        x >>= a;  ct += a;
    #endif
        a = (x & (1UL<<16)) >> (16-4);  // test bit 16
        x >>= a;  ct += a;

        a = (x & (1UL<<8)) >> (8-3);  // test bit 8
        x >>= a;  ct += a;

        a = (x & (1UL<<4)) >> (4-2);  // test bit 4
        x >>= a;  ct += a;

        a = (x & (1UL<<2)) >> (2-1);  // test bit 2
        x >>= a;  ct += a;

        a = (x & (1UL<<1)) >> (1-0);  // test bit 1
        x >>= a;  ct += a;

        ct += x & 1;  // test bit 0

        return ct;
    }

All branches are avoided, thereby the code may be useful on a planet with pink air, for further details see [301].

1.8.1 Sparse counting

If the (average input) word is known to have only a few bits set, the following sparse count variant can be advantageous:

    static inline ulong bit_count_sparse(ulong x)
    // Return number of bits set.
    {
        ulong n = 0;
        while ( x )  { ++n;  x &= (x-1); }
        return n;
    }

The loop will execute once for each set bit.
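As a quick cross-check, the sparse loop can be compared with the multiplication-based count shown earlier. The following is only a sketch for 64-bit words: the type name u64 and the function names bit_count_sparse64(), bit_count_mul64() and counts_agree() are chosen here for illustration and are not part of FXT.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint64_t u64;  // assumption: 64-bit words

// Sparse count: the loop body runs once per set bit.
static inline u64 bit_count_sparse64(u64 x)
{
    u64 n = 0;
    while ( x )  { ++n;  x &= (x-1); }  // x &= (x-1) clears the lowest set bit
    return n;
}

// Mask-based count (multiplication variant), used as reference.
static inline u64 bit_count_mul64(u64 x)
{
    x -= (x>>1) & 0x5555555555555555ULL;                                   // 0-2 in 2 bits
    x = ((x>>2) & 0x3333333333333333ULL) + (x & 0x3333333333333333ULL);   // 0-4 in 4 bits
    x = ((x>>4) + x) & 0x0f0f0f0f0f0f0f0fULL;                              // 0-8 in 8 bits
    x *= 0x0101010101010101ULL;  // sum of all bytes ends up in the top byte
    return x>>56;
}

// Return whether both routines agree for the word x.
static inline bool counts_agree(u64 x)
{
    return bit_count_sparse64(x) == bit_count_mul64(x);
}
```

For a word with k set bits the sparse routine needs k iterations, so it only pays off when k is small compared to the word length.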
Partial unrolling of the loop should be an improvement for most cases:

        ulong n = 0;
        do
        {
            n += (x!=0);  x &= (x-1);
            n += (x!=0);  x &= (x-1);
            n += (x!=0);  x &= (x-1);
            n += (x!=0);  x &= (x-1);
        }
        while ( x );
        return n;

If the number of bits is close to the maximum, use the given routine with the complement:

    static inline ulong bit_count_dense(ulong x)
    // Return number of bits set.
    // The loop (of bit_count_sparse()) will execute once for
    // each unset bit (i.e. zero) of x.
    {
        return BITS_PER_LONG - bit_count_sparse( ~x );
    }

If the number of ones is guaranteed to be less than 16, then the following routine (suggested by Gunther Piez [priv. comm.]) can be used:

    static inline ulong bit_count_15(ulong x)
    // Return number of set bits, must have at most 15 set bits.
    {
        x -= (x>>1) & 0x5555555555555555UL;  // 0-2 in 2 bits
        x = ((x>>2) & 0x3333333333333333UL) + (x & 0x3333333333333333UL);  // 0-4 in 4 bits
        x *= 0x1111111111111111UL;
        return x>>60;
    }

A routine for words with no more than 3 set bits is

    static inline ulong bit_count_3(ulong x)
    {
        x -= (x>>1) & 0x5555555555555555UL;  // 0-2 in 2 bits
        x *= 0x5555555555555555UL;
        return x>>62;
    }

1.8.2 Counting blocks

Compute the number of bit-blocks in a binary word with the following function:

    static inline ulong bit_block_count(ulong x)
    // Return number of bit blocks.
    // E.g.:
    // ..1..11111...111. -> 3
    // ...1..11111...111 -> 3
    // ......1.....1.1.. -> 3
    // .........111.1111 -> 2
    {
        return (x & 1) + bit_count( (x^(x>>1)) ) / 2;
    }

Similarly, the number of blocks with two or more bits can be counted by first removing all isolated ones, that is, by keeping only those bits that have at least one set neighbor:

    static inline ulong bit_block_ge2_count(ulong x)
    // Return number of bit blocks with at least 2 bits.
    // E.g.:
    // ..1..11111...111. -> 2
    // ...1..11111...111 -> 2
    // ......1.....1.1.. -> 0
    // .........111.1111 -> 2
    {
        return bit_block_count( x & ( (x<<1) | (x>>1) ) );
    }

1.8.3 GCC built-in functions ‡

Newer versions of the C compiler of the GNU Compiler Collection (GCC [146], starting with version 3.4) include a function __builtin_popcountl(ulong) that counts the bits of an unsigned long integer. The following list is taken from [147]:

    int __builtin_ffs (unsigned int x)
        Returns one plus the index of the least significant 1-bit of x,
        or if x is zero, returns zero.
    int __builtin_clz (unsigned int x)
        Returns the number of leading 0-bits in x, starting at the most
        significant bit position. If x is 0, the result is undefined.
    int __builtin_ctz (unsigned int x)
        Returns the number of trailing 0-bits in x, starting at the least
        significant bit position. If x is 0, the result is undefined.
    int __builtin_popcount (unsigned int x)
        Returns the number of 1-bits in x.
    int __builtin_parity (unsigned int x)
        Returns the parity of x, i.e. the number of 1-bits in x modulo 2.

The names of the corresponding versions for arguments of type unsigned long are obtained by adding ‘l’ (ell) to the names, for the type unsigned long long append ‘ll’. Two more useful built-ins are:

    void __builtin_prefetch (const void *addr, ...)
        Prefetch memory location addr
    long __builtin_expect (long exp, long c)
        Function to provide the compiler with branch prediction information.

1.8.4 Counting the bits of many words ‡

    x[ 0]=11111111  a0=11111111  a1=........  a2=........  a3=........  a4=........
    x[ 1]=11111111  a0=........  a1=11111111  a2=........  a3=........  a4=........
    x[ 2]=11111111  a0=11111111  a1=11111111  a2=........  a3=........  a4=........
    x[ 3]=11111111  a0=........  a1=........  a2=11111111  a3=........  a4=........
    x[ 4]=11111111  a0=11111111  a1=........  a2=11111111  a3=........  a4=........
    x[ 5]=11111111  a0=........  a1=11111111  a2=11111111  a3=........  a4=........
    x[ 6]=11111111  a0=11111111  a1=11111111  a2=11111111  a3=........  a4=........
    x[ 7]=11111111  a0=........  a1=........  a2=........  a3=11111111  a4=........
    x[ 8]=11111111  a0=11111111  a1=........  a2=........  a3=11111111  a4=........
    x[ 9]=11111111  a0=........  a1=11111111  a2=........  a3=11111111  a4=........
    x[10]=11111111  a0=11111111  a1=11111111  a2=........  a3=11111111  a4=........
    x[11]=11111111  a0=........  a1=........  a2=11111111  a3=11111111  a4=........
    x[12]=11111111  a0=11111111  a1=........  a2=11111111  a3=11111111  a4=........
    x[13]=11111111  a0=........  a1=11111111  a2=11111111  a3=11111111  a4=........
    x[14]=11111111  a0=11111111  a1=11111111  a2=11111111  a3=11111111  a4=........
    x[15]=11111111  a0=........  a1=........  a2=........  a3=........  a4=11111111
    x[16]=11111111  a0=11111111  a1=........  a2=........  a3=........  a4=11111111

Figure 1.8-A: Counting the bits of an array (where all bits are set) via vertical addition.

For counting the bits in a long array the technique of vertical addition can be useful. For ordinary addition the following relation holds:

    a + b == (a^b) + ((a&b)<<1)

The carry term (a&b) is propagated to the left. We now replace this ‘horizontal’ propagation by a ‘vertical’ one, that is, propagation into another word. An implementation of this idea is [FXT: bits/bitcount-vdemo.cc]:

    ulong bit_count_leq31(const ulong *x, ulong n)
    // Return sum(j=0, n-1, bit_count(x[j]) )
    // Must have n<=31
    {
        ulong a0=0, a1=0, a2=0, a3=0, a4=0;
        //    1,    3,    7,   15,   31,  <--= max n
        for (ulong k=0; k>d) & 1 ) ); }

The function uses the precomputed array [FXT: bits/tinyfactors.cc]:

    extern const ulong tiny_factors_tab[] =
    {
        0x0UL,   // x = 0:            ( bits: ........)
        0x2UL,   // x = 1:  1         ( bits: ......1.)
        0x6UL,   // x = 2:  1 2       ( bits: .....11.)
        0xaUL,   // x = 3:  1 3       ( bits: ....1.1.)
        0x16UL,  // x = 4:  1 2 4     ( bits: ...1.11.)
        0x22UL,  // x = 5:  1 5       ( bits: ..1...1.)
        0x4eUL,  // x = 6:  1 2 3 6   ( bits: .1..111.)
        0x82UL,  // x = 7:  1 7       ( bits: 1.....1.)
        0x116UL,  // x = 8:  1 2 4 8
        0x20aUL,  // x = 9:  1 3 9
        [--snip--]
        0x20000002UL,  // x = 29:  1 29
        0x4000846eUL,  // x = 30:  1 2 3 5 6 10 15 30
        0x80000002UL,  // x = 31:  1 31
    #if ( BITS_PER_LONG > 32 )
        0x100010116UL,  // x = 32:  1 2 4 8 16 32
        0x20000080aUL,  // x = 33:  1 3 11 33
        [--snip--]
        0x2000000000000002UL,  // x = 61:  1 61
        0x4000000080000006UL,  // x = 62:  1 2 31 62
        0x800000000020028aUL   // x = 63:  1 3 7 9 21 63
    #endif  // ( BITS_PER_LONG > 32 )
    };

Bit-arrays of arbitrary size are discussed in section 4.6 on page 164.

1.10 Index of the i-th set bit

To determine the index of the i-th set bit, we use a technique similar to the method for counting the bits of a word. Only the 64-bit version is shown [FXT: bits/ith-one-idx.h]:

    static inline ulong ith_one_idx(ulong x, ulong i)
    // Return index of the i-th set bit of x where 0 <= i < bit_count(x).
    {
        ulong x2 = x - ((x>>1) & 0x5555555555555555UL);  // 0-2 in 2 bits
        ulong x4 = ((x2>>2) & 0x3333333333333333UL) +
            (x2 & 0x3333333333333333UL);  // 0-4 in 4 bits
        ulong x8 = ((x4>>4) + x4) & 0x0f0f0f0f0f0f0f0fUL;  // 0-8 in 8 bits
        ulong ct = (x8 * 0x0101010101010101UL) >> 56;  // bit count

        ++i;
        if ( ct < i )  return ~0UL;  // less than i bits set

        ulong x16 = (0x00ff00ff00ff00ffUL & x8) + (0x00ff00ff00ff00ffUL & (x8>>8));  // 0-16
        ulong x32 = (0x0000ffff0000ffffUL & x16) + (0x0000ffff0000ffffUL & (x16>>16));  // 0-32

        ulong w, s = 0;

        w = x32 & 0xffffffffUL;
        if ( w < i )  { s += 32;  i -= w; }

        x16 >>= s;
        w = x16 & 0xffff;
        if ( w < i )  { s += 16;  i -= w; }

        x8 >>= s;
        w = x8 & 0xff;
        if ( w < i )  { s += 8;  i -= w; }

        x4 >>= s;
        w = x4 & 0xf;
        if ( w < i )  { s += 4;  i -= w; }

        x2 >>= s;
        w = x2 & 3;
        if ( w < i )  { s += 2;  i -= w; }

        x >>= s;
        s += ( (x&1) != i );

        return s;
    }

1.11 Avoiding branches

Branches are expensive operations with many CPUs, especially if the CPU pipeline is very long.
A useful trick is to replace

    if ( (x<0) || (x>m) ) { ... }

where x might be a signed integer, by

    if ( (unsigned)x > m ) { ... }

The obvious code to test whether a point (x, y) lies outside a square box of size m is

    if ( (x<0) || (x>m) || (y<0) || (y>m) ) { ... }

If m is one less than a power of 2 (that is, m = 2^k − 1, so that any set bit above the lowest k bits means the coordinate is out of range), it is better to use

    if ( ( (ulong)x | (ulong)y ) > (unsigned)m ) { ... }

The following functions are given in [FXT: bits/branchless.h]. This function returns max(0, x). That is, zero is returned for negative input, else the unmodified input:

    static inline long max0(long x)
    {
        return x & ~(x >> (BITS_PER_LONG-1));
    }

There is no restriction on the input range. The trick used is that with negative x the arithmetic shift will give a word of all ones which is then complemented, so that the AND-operation clears all bits. Note that this function will only work if the compiler emits an arithmetic right shift, see section 1.1.3 on page 3. The following routine computes min(0, x):

    static inline long min0(long x)
    // Return min(0, x), i.e. return zero for positive input
    {
        return x & (x >> (BITS_PER_LONG-1));
    }

The following upos_*() functions only work for a limited range. The highest bit must not be set as it is used to emulate the carry flag. Branchless computation of the absolute difference |a − b|:

    static inline ulong upos_abs_diff(ulong a, ulong b)
    {
        long d1 = b - a;
        long d2 = (d1 & (d1>>(BITS_PER_LONG-1)))<<1;
        return d1 - d2;  // == (b - d) - (a + d);
    }

The following routine sorts two values:

    static inline void upos_sort2(ulong &a, ulong &b)
    // Set {a, b} := {min(a, b), max(a,b)}
    // Both a and b must not have the most significant bit set
    {
        long d = b - a;
        d &= (d>>(BITS_PER_LONG-1));
        a += d;
        b -= d;
    }

Johan Rönnblom gives [priv. comm.]
the following versions for signed integer minimum, maximum, and absolute value, that can be advantageous for CPUs where immediates are expensive:

    #define B1 (BITS_PER_LONG-1)  // bits of signed int minus one
    #define MINI(x,y) (((x) & (((int)((x)-(y)))>>B1)) + ((y) & ~(((int)((x)-(y)))>>B1)))
    #define MAXI(x,y) (((x) & ~(((int)((x)-(y)))>>B1)) + ((y) & (((int)((x)-(y))>>B1))))
    #define ABSI(x) (((x) & ~(((int)(x))>>B1)) - ((x) & (((int)(x))>>B1)))

Your compiler may be smarter than you thought

The machine code generated for

    x = x & ~(x >> (BITS_PER_LONG-1));  // max0()

is

    35:  48 99          cqto
    37:  48 83 c4 08    add    $0x8,%rsp   // stack adjustment
    3b:  48 f7 d2       not    %rdx
    3e:  48 21 d0       and    %rdx,%rax

The variable x resides in the register rAX both at start and end of the function. The compiler uses a special (AMD64) instruction cqto. Quoting [13]:

    Copies the sign bit in the rAX register to all bits of the rDX register. The effect of this instruction is to convert a signed word, doubleword, or quadword in the rAX register into a signed doubleword, quadword, or double-quadword in the rDX:rAX registers. This action helps avoid overflow problems in signed number arithmetic.

Now the equivalent

    x = ( x<0 ? 0 : x );  // max0() "simple minded"

is compiled to:

    35:  ba 00 00 00 00    mov    $0x0,%edx   // note %edx is %rdx
    3a:  48 85 c0          test   %rax,%rax
    3d:  48 0f 48 c2       cmovs  %rdx,%rax

A conditional move (cmovs) instruction is used here. That is, the optimized version is (on my machine) actually worse than the straightforward equivalent.
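That both forms of max0() compute the same result can be checked directly. The sketch below assumes an arithmetic right shift for negative values (the usual behavior, but implementation-defined before C++20); the names max0_shift(), max0_plain() and max0_agree() are chosen here for illustration only.

```cpp
#include <cassert>
#include <climits>

// Branchless form: relies on an arithmetic right shift of negative values.
static inline long max0_shift(long x)
{
    const int top = (int)(sizeof(long)*CHAR_BIT) - 1;  // index of the sign bit
    return x & ~(x >> top);  // (x >> top) is all ones for negative x, else zero
}

// Plain form: compilers commonly translate this into a conditional move.
static inline long max0_plain(long x)  { return ( x<0 ? 0 : x ); }

// Return whether both forms agree on a few representative values.
static inline bool max0_agree()
{
    const long v[] = { LONG_MIN, -5, -1, 0, 1, 42, LONG_MAX };
    for (long x : v)
    {
        if ( max0_shift(x) != max0_plain(x) )  return false;
    }
    return true;
}
```

Which form is faster has to be measured (or read off the generated machine code) for the compiler and CPU at hand.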
A second example is a function to adjust a given value when it lies outside a given range [FXT: bits/branchless.h]:

    static inline long clip_range(long x, long mi, long ma)
    // Code equivalent to (for mi<=ma):
    //   if ( x<mi )  x = mi;
    //   else if ( x>ma )  x = ma;
    {
        x -= mi;
        x = clip_range0(x, ma-mi);
        x += mi;
        return x;
    }

The auxiliary function used involves one branch:

    static inline long clip_range0(long x, long m)
    // Code equivalent (for m>0) to:
    //   if ( x<0 )  x = 0;
    //   else if ( x>m )  x = m;
    //   return x;
    {
        if ( (ulong)x > (ulong)m )  x = m & ~(x >> (BITS_PER_LONG-1));
        return x;
    }

The generated machine code is

     0:  48 89 f8       mov    %rdi,%rax
     3:  48 29 f2       sub    %rsi,%rdx
     6:  31 c9          xor    %ecx,%ecx
     8:  48 29 f0       sub    %rsi,%rax
     b:  78 0a          js     17 <_Z2CLlll+0x17>   // the branch
     d:  48 39 d0       cmp    %rdx,%rax
    10:  48 89 d1       mov    %rdx,%rcx
    13:  48 0f 4e c8    cmovle %rax,%rcx
    17:  48 8d 04 0e    lea    (%rsi,%rcx,1),%rax

Now we replace the code by

    static inline long clip_range(long x, long mi, long ma)
    {
        x -= mi;
        if ( x<0 )  x = 0;
    //    else  // commented out to make (compiled) function really branchless
        {
            ma -= mi;
            if ( x>ma )  x = ma;
        }
        x += mi;
        return x;
    }

Then the compiler generates branchless code:

     0:  48 89 f8          mov    %rdi,%rax
     3:  b9 00 00 00 00    mov    $0x0,%ecx
     8:  48 29 f0          sub    %rsi,%rax
     b:  48 0f 48 c1       cmovs  %rcx,%rax
     f:  48 29 f2          sub    %rsi,%rdx
    12:  48 39 d0          cmp    %rdx,%rax
    15:  48 0f 4f c2       cmovg  %rdx,%rax
    19:  48 01 f0          add    %rsi,%rax

Still, with CPUs that do not have a conditional move instruction (or some branchless equivalent of it) the techniques shown in this section can be useful.

1.12 Bit-wise rotation of a word

Neither C nor C++ has a statement for bit-wise rotation of a binary word (which may be considered a missing feature). The operation can be emulated via [FXT: bits/bitrotate.h]:

    static inline ulong bit_rotate_left(ulong x, ulong r)
    // Return word rotated r bits to the left
    // (i.e. toward the most significant bit)
    {
        return (x<<r) | (x>>(BITS_PER_LONG-r));
    }

As already mentioned, GCC emits exactly the CPU instruction that is meant here, even with non-constant argument r. Explicit use of the corresponding assembler instruction should not do any harm:

    static inline ulong bit_rotate_right(ulong x, ulong r)
    // Return word rotated r bits to the right
    // (i.e. toward the least significant bit)
    {
    #if defined BITS_USE_ASM  // use x86 asm code
        return asm_ror(x, r);
    #else
        return (x>>r) | (x<<(BITS_PER_LONG-r));
    #endif
    }

Here we use an assembler instruction when available [FXT: bits/bitasm-amd64.h]:

    static inline ulong asm_ror(ulong x, ulong r)
    {
        asm ("rorq %%cl, %0" : "=r" (x) : "0" (x), "c" (r));
        return x;
    }

Rotation using only a part of the word length can be implemented as

    static inline ulong bit_rotate_left(ulong x, ulong r, ulong ldn)
    // Return ldn-bit word rotated r bits to the left
    // (i.e. toward the most significant bit)
    // Must have 0 <= r <= ldn
    {
        ulong m = ~0UL >> ( BITS_PER_LONG - ldn );
        x &= m;
        x = (x<<r) | (x>>(ldn-r));
        x &= m;
        return x;
    }

and

    static inline ulong bit_rotate_right(ulong x, ulong r, ulong ldn)
    // Return ldn-bit word rotated r bits to the right
    // (i.e. toward the least significant bit)
    // Must have 0 <= r <= ldn
    {
        ulong m = ~0UL >> ( BITS_PER_LONG - ldn );
        x &= m;
        x = (x>>r) | (x<<(ldn-r));
        x &= m;
        return x;
    }

Finally, the functions

    static inline ulong bit_rotate_sgn(ulong x, long r, ulong ldn)
    // Positive r --> shift away from element zero
    {
        if ( r > 0 )  return bit_rotate_left(x, (ulong)r, ldn);
        else          return bit_rotate_right(x, (ulong)-r, ldn);
    }

and (full-word version)

    static inline ulong bit_rotate_sgn(ulong x, long r)
    // Positive r --> shift away from element zero
    {
        if ( r > 0 )  return bit_rotate_left(x, (ulong)r);
        else          return bit_rotate_right(x, (ulong)-r);
    }

are sometimes convenient.
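Note that the full-word emulation shifts by BITS_PER_LONG when r is zero, which is undefined behavior in C and C++. A common way around this (not taken from FXT) is to mask both shift counts. The following is a minimal sketch for 64-bit words; the names u64, rotl64() and rotr64() are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint64_t u64;  // assumption: 64-bit words

// Rotate left by r bits; any r is allowed, only r modulo 64 matters.
static inline u64 rotl64(u64 x, unsigned r)
{
    // Both shift counts are reduced to the range 0..63,
    // so no shift by the full word length can occur.
    return (x << (r & 63)) | (x >> ((64 - r) & 63));
}

// Rotate right by r bits; any r is allowed, only r modulo 64 matters.
static inline u64 rotr64(u64 x, unsigned r)
{
    return (x >> (r & 63)) | (x << ((64 - r) & 63));
}
```

With r equal to zero both shift counts become zero and the word is returned unchanged. Current compilers typically recognize this pattern as well and emit a single rotate instruction, though this should be verified for the compiler at hand.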
1.13 Binary necklaces ‡

We give several functions related to cyclic rotations of binary words and a class to generate binary necklaces.

1.13.1 Cyclic matching, minimum, and maximum

The following function determines whether there is a cyclic right shift of its second argument so that it matches the first argument. It is given in [FXT: bits/bitcyclic-match.h]:

    static inline ulong bit_cyclic_match(ulong x, ulong y)
    // Return r if x==rotate_right(y, r) else return ~0UL.
    // In other words: return
    //   how often the right arg must be rotated right (to match the left)
    // or, equivalently:
    //   how often the left arg must be rotated left (to match the right)
    {
        ulong r = 0;
        do
        {
            if ( x==y )  return r;
            y = bit_rotate_right(y, 1);
        }
        while ( ++r < BITS_PER_LONG );

        return ~0UL;
    }

The functions shown work on the full length of the words, equivalents for the sub-word of the lowest ldn bits are given in the respective files. Just one example:

    static inline ulong bit_cyclic_match(ulong x, ulong y, ulong ldn)
    // Return r if x==rotate_right(y, r, ldn) else return ~0UL
    // (using ldn-bit words)
    {
        ulong r = 0;
        do
        {
            if ( x==y )  return r;
            y = bit_rotate_right(y, 1, ldn);
        }
        while ( ++r < ldn );

        return ~0UL;
    }

The minimum among all cyclic shifts of a word can be computed via the following function given in [FXT: bits/bitcyclic-minmax.h]:

    static inline ulong bit_cyclic_min(ulong x)
    // Return minimum of all rotations of x
    {
        ulong r = 1;
        ulong m = x;
        do
        {
            x = bit_rotate_right(x, 1);
            if ( x>d) are zero if and only if the word has period d.
So we can use the following function body:

        ulong sl = BITS_PER_LONG-ldn;
        for (ulong s=1; s<ldn; ++s)
        {
            ++sl;
            if ( 0==( (x^(x>>s)) << sl ) )  return s;
        }
        return ldn;

Testing for periods that are not divisors of the word length can be avoided as follows:

        ulong f = tiny_factors_tab[ldn];
        ulong sl = BITS_PER_LONG-ldn;
        for (ulong s=1; s<ldn; ++s)
        {
            ++sl;
            f >>= 1;
            if ( 0==(f&1) )  continue;
            if ( 0==( (x^(x>>s)) << sl ) )  return s;
        }
        return ldn;

The table of tiny factors used is shown in section 1.9.2 on page 24. The version for ldn==BITS_PER_LONG can be optimized similarly:

    static inline ulong bit_cyclic_period(ulong x)
    // Return minimal positive bit-rotation that transforms x into itself.
    // (same as bit_cyclic_period(x, BITS_PER_LONG) )
    //
    // The returned value is a divisor of the word length,
    // i.e. 1,2,4,8,...,BITS_PER_LONG.
    {
        ulong r = 1;
        do
        {
            ulong y = bit_rotate_right(x, r);
            if ( x==y )  return r;
            r <<= 1;
        }
        while ( r < BITS_PER_LONG );

        return r;  // == BITS_PER_LONG
    }

1.13.3 Generating all binary necklaces

We can generate all necklaces by the FKM algorithm given in section 18.1.1 on page 371. Here we specialize the method for binary words.
The words generated are the cyclic maxima [FXT: class bit_necklace in bits/bit-necklace.h]:

    class bit_necklace
    {
    public:
        ulong a_;    // necklace
        ulong j_;    // period of the necklace
        ulong n2_;   // bit representing n: n2==2**(n-1)
        ulong j2_;   // bit representing j: j2==2**(j-1)
        ulong n_;    // number of bits in words
        ulong mm_;   // mask of n ones
        ulong tfb_;  // for fast factor lookup

    public:
        bit_necklace(ulong n)  { init(n); }
        ~bit_necklace()  { ; }

        void init(ulong n)
        {
            if ( 0==n )  n = 1;  // avoid hang
            if ( n>=BITS_PER_LONG )  n = BITS_PER_LONG;
            n_ = n;

            n2_ = 1UL<<(n-1);
            mm_ = (~0UL) >> (BITS_PER_LONG-n);
            tfb_ = tiny_factors_tab[n] >> 1;
            tfb_ |= n2_;  // needed for n==BITS_PER_LONG
            first();
        }

        void first()
        {
            a_ = 0;
            j_ = 1;
            j2_ = 1;
        }

        ulong data()  const  { return a_; }
        ulong period()  const  { return j_; }

The method for computing the successor is

        ulong next()
        // Create next necklace.
        // Return the period, zero when current necklace is last.
        {
            if ( a_==mm_ )  { first();  return 0; }

            do
            {
                // next lines compute index of highest zero, same result as
                // j_ = highest_zero_idx( a_ ^ (~mm_) );
                // but the direct computation is faster:
                j_ = n_ - 1;
                ulong jb = 1UL << j_;
                while ( 0!=(a_ & jb) )  { --j_;  jb>>=1; }

                j2_ = 1UL << j_;
                ++j_;
                a_ |= j2_;
                a_ = bit_copy_periodic(a_, j_, n_);
            }
            while ( 0==(tfb_ & j2_) );  // necklaces only

            return j_;
        }

It uses the following function for periodic copying [FXT: bits/bitperiodic.h]:

    static inline ulong bit_copy_periodic(ulong a, ulong p, ulong ldn)
    // Return word that consists of the lowest p bits of a repeated
    // in the lowest ldn bits (higher bits are zero).
    // E.g.: if p==3, ldn=7 and a=*****xyz (8-bit), the return 0zxyzxyz.
    // Must have p>0 and ldn>0.
    {
        a &= ( ~0UL >> (BITS_PER_LONG-p) );
        for (ulong s=p; s<ldn; s<<=1)  { a |= (a<<s); }
        a &= ( ~0UL >> (BITS_PER_LONG-ldn) );
        return a;
    }

Finally, we can easily detect whether a necklace is a Lyndon word:

        ulong is_lyndon_word()  const  { return (j2_ & n2_); }

        ulong next_lyn()
        // Create next Lyndon word.
        // Return the period (==n), zero when current necklace is last.
        {
            if ( a_==mm_ )  { first();  return 0; }
            do  { next(); }  while ( !is_lyndon_word() );
            return n_;
        }
    };

About 54 million necklaces per second are generated (with n = 32), corresponding to a rate of 112 M/s for pre-necklaces [FXT: bits/bit-necklace-demo.cc].

1.13.4 Computing the cyclic distance

A function to compute the cyclic distance between two words [FXT: bits/bitcyclic-dist.h] is:

    static inline ulong bit_cyclic_dist(ulong a, ulong b)
    // Return minimal bitcount of (t ^ b)
    // where t runs through the cyclic rotations of a.
    {
        ulong d = ~0UL;
        ulong t = a;
        do
        {
            ulong z = t ^ b;
            ulong e = bit_count( z );
            if ( e < d )  d = e;
            t = bit_rotate_right(t, 1);
        }
        while ( t!=a );
        return d;
    }

If the arguments are cyclic shifts of each other, then zero is returned. A version for partial words is

    static inline ulong bit_cyclic_dist(ulong a, ulong b, ulong ldn)
    {
        ulong d = ~0UL;
        const ulong m = (~0UL>>(BITS_PER_LONG-ldn));
        b &= m;
        a &= m;
        ulong t = a;
        do
        {
            ulong z = t ^ b;
            ulong e = bit_count( z );
            if ( e < d )  d = e;
            t = bit_rotate_right(t, 1, ldn);
        }
        while ( t!=a );
        return d;
    }

1.13.5 Cyclic XOR and its inverse

The functions [FXT: bits/bitcyclic-xor.h]

    static inline ulong bit_cyclic_rxor(ulong x)
    {
        return x ^ bit_rotate_right(x, 1);
    }

and

    static inline ulong bit_cyclic_lxor(ulong x)
    {
        return x ^ bit_rotate_left(x, 1);
    }

return a word whose number of set bits is even. A word and its complement produce the same result.
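Both properties are easy to verify: rotation preserves the number of set bits, so the XOR of a word with its rotation has even parity, and complementing the word complements both operands of the XOR, which cancels. The sketch below checks this for 64-bit words; all names in it (u64, rotr1(), rotl1(), cyclic_rxor(), cyclic_lxor(), parity64(), cyclic_xor_props()) are chosen for this example and are not the FXT identifiers.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint64_t u64;  // assumption: 64-bit words

static inline u64 rotr1(u64 x)  { return (x>>1) | (x<<63); }  // rotate right by one
static inline u64 rotl1(u64 x)  { return (x<<1) | (x>>63); }  // rotate left by one

static inline u64 cyclic_rxor(u64 x)  { return x ^ rotr1(x); }
static inline u64 cyclic_lxor(u64 x)  { return x ^ rotl1(x); }

// Parity of x: 1 if the number of set bits is odd, else 0.
static inline unsigned parity64(u64 x)
{
    x ^= x>>32;  x ^= x>>16;  x ^= x>>8;
    x ^= x>>4;   x ^= x>>2;   x ^= x>>1;
    return (unsigned)(x & 1);
}

// Return whether both properties hold for the word x:
// the results have even parity, and x and ~x give the same result.
static inline bool cyclic_xor_props(u64 x)
{
    return  parity64( cyclic_rxor(x) ) == 0
        &&  parity64( cyclic_lxor(x) ) == 0
        &&  cyclic_rxor(~x) == cyclic_rxor(x)
        &&  cyclic_lxor(~x) == cyclic_lxor(x);
}
```

Since two different inputs map to each output, the functions are not injective, which is why an inverse can only be determined up to complementation.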
The inverse functions need no rotation at all: the inverse of bit_cyclic_rxor() is the inverse Gray code (see section 1.16 on page 41):

    static inline ulong bit_cyclic_inv_rxor(ulong x)
    // Return v so that bit_cyclic_rxor(v) == x.
    {
        return inverse_gray_code(x);
    }

The argument x must have an even number of set bits. If this is the case, the lowest bit of the result is zero. The complement of the returned value is also an inverse of bit_cyclic_rxor().

The inverse of bit_cyclic_lxor() is the inverse reversed Gray code (see section 1.16.6 on page 45):

    static inline ulong bit_cyclic_inv_lxor(ulong x)
    // Return v so that bit_cyclic_lxor(v) == x.
    {
        return inverse_rev_gray_code(x);
    }

We do not need to mask out the lowest bit because for valid arguments (that have an even number of set bits) the high bits of the result are zero. This function can be used to solve the quadratic equation $v^2 + v = x$ in the finite field $\mathrm{GF}(2^n)$ when normal bases are used, see section 42.6.2 on page 903.

1.14 Reversing the bits of a word

The bits of a binary word can efficiently be reversed by a sequence of steps that reverse the order of certain blocks. For 16-bit words, we need $4 = \log_2(16)$ such steps [FXT: bits/revbin-steps-demo.cc]:

    [ 0 1 2 3 4 5 6 7 8 9 a b c d e f ]
    [ 1 0 3 2 5 4 7 6 9 8 b a d c f e ]  <--= pairs swapped
    [ 3 2 1 0 7 6 5 4 b a 9 8 f e d c ]  <--= groups of 2 swapped
    [ 7 6 5 4 3 2 1 0 f e d c b a 9 8 ]  <--= groups of 4 swapped
    [ f e d c b a 9 8 7 6 5 4 3 2 1 0 ]  <--= groups of 8 swapped

1.14.1 Swapping adjacent bit blocks

We need a couple of auxiliary functions given in [FXT: bits/bitswap.h]. Pairs of adjacent bits can be swapped via

    static inline ulong bit_swap_1(ulong x)
    // Return x with neighbor bits swapped.
    {
    #if BITS_PER_LONG == 32
        ulong m = 0x55555555UL;
    #else
    #if BITS_PER_LONG == 64
        ulong m = 0x5555555555555555UL;
    #endif
    #endif
        return ((x & m) << 1) | ((x & (~m)) >> 1);
    }

The 64-bit branch is omitted in the following examples.
Adjacent groups of 2 bits are swapped by

    static inline ulong bit_swap_2(ulong x)
    // Return x with groups of 2 bits swapped.
    {
        ulong m = 0x33333333UL;
        return ((x & m) << 2) | ((x & (~m)) >> 2);
    }

Similarly,

    static inline ulong bit_swap_4(ulong x)
    // Return x with groups of 4 bits swapped.
    {
        ulong m = 0x0f0f0f0fUL;
        return ((x & m) << 4) | ((x & (~m)) >> 4);
    }

and

    static inline ulong bit_swap_8(ulong x)
    // Return x with groups of 8 bits swapped.
    {
        ulong m = 0x00ff00ffUL;
        return ((x & m) << 8) | ((x & (~m)) >> 8);
    }

When swapping half-words (here for 32-bit architectures)

    static inline ulong bit_swap_16(ulong x)
    // Return x with groups of 16 bits swapped.
    {
        ulong m = 0x0000ffffUL;
        return ((x & m) << 16) | ((x & (m<<16)) >> 16);
    }

we could also use the bit-rotate function from section 1.12 on page 27, or

        return (x << 16) | (x >> 16);

The GCC compiler recognizes that the whole operation is equivalent to a (left or right) word rotation and indeed emits just a single rotate instruction.

1.14.2 Bit-reversing binary words

The following is a function to reverse the bits of a binary word [FXT: bits/revbin.h]:

    static inline ulong revbin(ulong x)
    // Return x with reversed bit order.
    {
        x = bit_swap_1(x);
        x = bit_swap_2(x);
        x = bit_swap_4(x);
        x = bit_swap_8(x);
        x = bit_swap_16(x);
    #if BITS_PER_LONG >= 64
        x = bit_swap_32(x);
    #endif
        return x;
    }

The steps after bit_swap_4() correspond to a byte-reverse operation. This operation is just one assembler instruction for many CPUs. The inline assembler with GCC for AMD64 CPUs is given in [FXT: bits/bitasm-amd64.h]:

    static inline ulong asm_bswap(ulong x)
    {
        asm ("bswap %0" : "=r" (x) : "0" (x));
        return x;
    }

We use it for byte reversal if available:

    static inline ulong bswap(ulong x)
    // Return word with reversed byte order.
    {
    #ifdef BITS_USE_ASM
        x = asm_bswap(x);
    #else
        x = bit_swap_8(x);
        x = bit_swap_16(x);
    #if BITS_PER_LONG >= 64
        x = bit_swap_32(x);
    #endif
    #endif  // def BITS_USE_ASM
        return x;
    }

The function actually used for bit reversal is good for both 32-bit and 64-bit words:

    static inline ulong revbin(ulong x)
    {
        x = bit_swap_1(x);
        x = bit_swap_2(x);
        x = bit_swap_4(x);
        x = bswap(x);
        return x;
    }

The masks can be generated in the process:

    static inline ulong revbin(ulong x)
    {
        ulong s = BITS_PER_LONG >> 1;
        ulong m = ~0UL >> s;
        while ( s )
        {
            x = ( (x & m) << s ) ^ ( (x & (~m)) >> s );
            s >>= 1;
            m ^= (m<<s);
        }
        return x;
    }

The above function will not always beat the obvious, bit-wise algorithm:

    static inline ulong revbin(ulong x)
    {
        ulong r = 0, ldn = BITS_PER_LONG;
        while ( ldn-- != 0 )
        {
            r <<= 1;
            r += (x&1);
            x >>= 1;
        }
        return r;
    }

Therefore the function

    static inline ulong revbin(ulong x, ulong ldn)
    // Return word with the ldn least significant bits
    // (i.e. bit_0 ... bit_{ldn-1}) of x reversed,
    // the other bits are set to zero.
    {
        return revbin(x) >> (BITS_PER_LONG-ldn);
    }

should only be used if ldn is not too small; otherwise it should be replaced by the trivial algorithm.

We can use table lookups so that, for example, eight bits are reversed at a time using a 256-byte table.
The routine for full words is

    unsigned char revbin_tab[256];  // reversed 8-bit words

    ulong revbin_t(ulong x)
    {
        ulong r = 0;
        for (ulong k=0; k<BYTES_PER_LONG; ++k)
        {
            r <<= 8;
            r |= revbin_tab[ x & 255 ];
            x >>= 8;
        }
        return r;
    }

The routine can be optimized by unrolling to avoid all branches:

    static inline ulong revbin_t(ulong x)
    {
        ulong r = revbin_tab[ x & 255 ];              x >>= 8;
        r <<= 8;  r |= revbin_tab[ x & 255 ];  x >>= 8;
        r <<= 8;  r |= revbin_tab[ x & 255 ];  x >>= 8;
    #if BYTES_PER_LONG > 4
        r <<= 8;  r |= revbin_tab[ x & 255 ];  x >>= 8;
        r <<= 8;  r |= revbin_tab[ x & 255 ];  x >>= 8;
        r <<= 8;  r |= revbin_tab[ x & 255 ];  x >>= 8;
        r <<= 8;  r |= revbin_tab[ x & 255 ];  x >>= 8;
    #endif
        r <<= 8;  r |= revbin_tab[ x ];
        return r;
    }

However, reversing the first $2^{30}$ binary words with this routine takes (on a 64-bit machine) longer than with the routine using the bit_swap_NN() calls, see [FXT: bits/revbin-tab-demo.cc].

1.14.3 Generating the bit-reversed words in order

If the bit-reversed words have to be generated in the (reversed) counting order, there is a significantly cheaper way to do the update [FXT: bits/revbin-upd.h]:

    static inline ulong revbin_upd(ulong r, ulong h)
    // Let n=2**ldn and h=n/2.
    // Then, with r == revbin(x, ldn) at entry, return revbin(x+1, ldn)
    // Note: routine will hang if called with r the all-ones word
    {
        while ( !((r^=h)&h) )  h >>= 1;
        return r;
    }

Now assume we want to generate the bit-reversed words of all $N = 2^n - 1$ words less than $2^n$. The total number of branches with the while-loop can be estimated by observing that for half of the updates just one bit changes, two bits change for a quarter, three bits change for one eighth of all updates, and so on.
So the loop executes less than $2N$ times:

$$ N \left( \frac{1}{2} + \frac{2}{4} + \frac{3}{8} + \frac{4}{16} + \cdots + \frac{\log_2(N)}{N} \right) \;=\; N \sum_{j=1}^{\log_2(N)} \frac{j}{2^j} \;<\; 2N \qquad (1.14\text{-}1) $$

For large values of N the following method can be significantly faster if a fast routine is available for the computation of the least significant bit in a word. The underlying observation is that for a fixed word of size n there are just n different patterns of bit-changes with incrementing. We generate a lookup table of the bit-reversed patterns, utab[], an array of BITS_PER_LONG elements:

    static inline void make_revbin_upd_tab(ulong ldn)
    // Initialize lookup table used by revbin_tupd()
    {
        utab[0] = 1UL<<(ldn-1);
        for (ulong k=1; k<ldn; ++k)  utab[k] = utab[k-1] | (utab[k-1]>>1);
    }

The change patterns for n = 5 start as

    pattern    reversed pattern
    ....1      1....
    ...11      11...
    ....1      1....
    ..111      111..
    ....1      1....
    ...11      11...
    ....1      1....
    .1111      1111.
    ....1      1....
    ...11      11...

The pattern with x set bits is used for the update of k to k + 1 when the lowest zero of k is at position x − 1:

                reversed    used when the lowest
                pattern     zero of k is at index:
    utab[0]=    1....       0
    utab[1]=    11...       1
    utab[2]=    111..       2
    utab[3]=    1111.       3
    utab[4]=    11111       4

The update routine can now be implemented as

    static inline ulong revbin_tupd(ulong r, ulong k)
    // Let r==revbin(k, ldn) then
    // return revbin(k+1, ldn).
    // NOTE 1: need to call make_revbin_upd_tab(ldn) before usage
    //         where ldn=log_2(n)
    // NOTE 2: different argument structure than revbin_upd()
    {
        k = lowest_one_idx(~k);  // lowest zero idx
        r ^= utab[k];
        return r;
    }

The revbin-update routines are used for the revbin permutation described in section 2.6.
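Both update methods can be compared against a bit-at-a-time reference in a few lines. The following is a minimal sketch, assuming 64-bit words; the helper names revbin_naive(), make_tab(), lowest_zero_idx(), and check_updates() are introduced here for illustration (FXT uses a global utab[] and lowest_one_idx() instead).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

typedef std::uint64_t word;

// Reference: reverse the ldn lowest bits of x, one bit at a time.
static word revbin_naive(word x, unsigned ldn)
{
    word r = 0;
    for (unsigned i = 0; i < ldn; ++i)  { r = (r << 1) | (x & 1);  x >>= 1; }
    return r;
}

// Bit-wise update as in revbin_upd(): h must be 2**(ldn-1).
static word revbin_upd(word r, word h)
{
    while ( !((r ^= h) & h) )  h >>= 1;
    return r;
}

// Table for the lookup update: tab[j] has the j+1 highest of the ldn bits set.
static std::vector<word> make_tab(unsigned ldn)
{
    assert( ldn >= 1 && ldn < 64 );
    std::vector<word> tab(ldn);
    tab[0] = word(1) << (ldn - 1);
    for (unsigned k = 1; k < ldn; ++k)  tab[k] = tab[k-1] | (tab[k-1] >> 1);
    return tab;
}

// Index of the lowest zero bit of k (k must not be the all-ones word).
static unsigned lowest_zero_idx(word k)
{
    unsigned i = 0;
    while ( k & 1 )  { k >>= 1;  ++i; }
    return i;
}

// Return true if both update methods agree with the reference
// for all transitions k -> k+1 with k+1 < 2**ldn.
static bool check_updates(unsigned ldn)
{
    const word n = word(1) << ldn;
    const std::vector<word> tab = make_tab(ldn);
    word r1 = 0, r2 = 0;  // both start as revbin(0, ldn)
    for (word k = 0; k + 1 < n; ++k)
    {
        r1 = revbin_upd(r1, n >> 1);       // bit-wise update
        r2 ^= tab[ lowest_zero_idx(k) ];   // table update
        if ( r1 != revbin_naive(k + 1, ldn) )  return false;
        if ( r2 != r1 )  return false;
    }
    return true;
}
```

The loop stops before the all-ones word is updated, because revbin_upd() would not terminate there.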
                             30 bits   16 bits   8 bits
    Update, bit-wise           1.00      1.00     1.00    revbin_upd()
    Update, table              0.99      1.08     1.15    revbin_tupd()
    Full, masks                0.74      0.81     0.86    revbin()
    Full, 8-bit table          1.77      1.94     2.06    revbin_t()
    Full32, 8-bit table        0.83      0.90     0.96    revbin_t_le32()
    Full16, 8-bit table         -        0.54     0.58    revbin_t_le16()
    Full, generated masks      2.97      3.25     3.45    [page 35]
    Full, bit-wise             8.76      5.77     2.50    [page 35]

Figure 1.14-A: Relative performance of the revbin-update and (full) revbin routines. The timing of the bit-wise update routine is normalized to 1. Values in each column should be compared, smaller values correspond to faster routines. A column labeled "N bits" gives the timing for reversing the N least significant bits of a word.

The relative performance of the different revbin routines is shown in figure 1.14-A. As a surprise, the full-word revbin function is consistently faster than both of the update routines, mainly because the machine used (see appendix B on page 922) has a byte swap instruction. As the performance of table lookups is highly machine dependent, your results can be very different.

1.14.4 Alternative techniques for in-order generation

The following loop, due to Brent Lehmann [priv. comm.], also generates the bit-reversed words in succession:

    ulong n = 32;  // a power of 2
    ulong p = 0, s = 0, n2 = 2*n;
    do
    {
        // here: s is the bit-reversed word
        p += 2;
        s ^= n - (n / (p&-p));
    }
    while ( p<n2 );

The division can be replaced by a shift if a fast bit-scan function is available:

    do
    {
        p += 1;
        s ^= n - (n >> (lowest_one_idx(p)+1));
    }
    while ( p<n );

A recursive algorithm for the generation of the bit-reversed words in order is

    ulong N;
    void revbin_rec(ulong f, ulong n)
    {
        // visit( f )
        for (ulong m=N>>1; m>n; m>>=1)  revbin_rec(f+m, m);
    }

Call revbin_rec(0, 0) to generate all N-bit bit-reversed words.

A technique to generate all revbin pairs in a pseudo random order is given in section 41.4 on page 873.

1.15 Bit-wise zip

The bit-wise zip (bit-zip) operation moves the bits in the lower half to even indices and the bits in the upper half to odd indices.
For example, with 8-bit words the permutation of bits is

    [ a b c d A B C D ]  |-->  [ a A b B c C d D ]

A straightforward implementation is

    ulong bit_zip(ulong a, ulong b)
    {
        ulong x = 0;
        ulong m = 1, s = 0;
        for (ulong k=0; k<(BITS_PER_LONG/2); ++k)
        {
            x |= (a & m) << s;
            ++s;
            x |= (b & m) << s;
            m <<= 1;
        }
        return x;
    }

Its inverse (bit-unzip) moves even indexed bits to the lower half-word and odd indexed bits to the upper half-word:

    void bit_unzip(ulong x, ulong &a, ulong &b)
    {
        a = 0;  b = 0;
        ulong m = 1, s = 0;
        for (ulong k=0; k<(BITS_PER_LONG/2); ++k)
        {
            a |= (x & m) >> s;
            ++s;
            m <<= 1;
            b |= (x & m) >> s;
            m <<= 1;
        }
    }

For a faster implementation we will use the butterfly_*() functions which are defined in [FXT: bits/bitbutterfly.h] (64-bit version):

    static inline ulong butterfly_4(ulong x)
    // Swap in each block of 16 bits the two central blocks of 4 bits.
    {
        const ulong ml = 0x0f000f000f000f00UL;
        const ulong s = 4;
        const ulong mr = ml >> s;
        const ulong t = ((x & ml) >> s ) | ((x & mr) << s );
        x = (x & ~(ml | mr)) | t;
        return x;
    }

The following version of the function may look more elegant but is actually slower:

    static inline ulong butterfly_4(ulong x)
    {
        const ulong m = 0x0ff00ff00ff00ff0UL;
        ulong c = x & m;
        c ^= (c<<4) ^ (c>>4);
        c &= m;
        return x ^ c;
    }

The optimized versions of the bit-zip and bit-unzip routines are [FXT: bits/bitzip.h]:

    static inline ulong bit_zip(ulong x)
    {
    #if BITS_PER_LONG == 64
        x = butterfly_16(x);
    #endif
        x = butterfly_8(x);
        x = butterfly_4(x);
        x = butterfly_2(x);
        x = butterfly_1(x);
        return x;
    }

and

    static inline ulong bit_unzip(ulong x)
    {
        x = butterfly_1(x);
        x = butterfly_2(x);
        x = butterfly_4(x);
        x = butterfly_8(x);
    #if BITS_PER_LONG == 64
        x = butterfly_16(x);
    #endif
        return x;
    }

Laszlo Hars suggests [priv. comm.] the following routine (version for 32-bit words), which can be obtained by making the compile-time constants explicit:

    static inline uint32 bit_zip(uint32 x)
    {
        x = ((x & 0x0000ff00) << 8) | ((x >> 8) & 0x0000ff00) | (x & 0xff0000ff);
        x = ((x & 0x00f000f0) << 4) | ((x >> 4) & 0x00f000f0) | (x & 0xf00ff00f);
        x = ((x & 0x0c0c0c0c) << 2) | ((x >> 2) & 0x0c0c0c0c) | (x & 0xc3c3c3c3);
        x = ((x & 0x22222222) << 1) | ((x >> 1) & 0x22222222) | (x & 0x99999999);
        return x;
    }

A bit-zip version for words whose upper half is zero is (64-bit version)

    static inline ulong bit_zip0(ulong x)
    // Return word with lower half bits in even indices.
    {
        x = (x | (x<<16)) & 0x0000ffff0000ffffUL;
        x = (x | (x<<8))  & 0x00ff00ff00ff00ffUL;
        x = (x | (x<<4))  & 0x0f0f0f0f0f0f0f0fUL;
        x = (x | (x<<2))  & 0x3333333333333333UL;
        x = (x | (x<<1))  & 0x5555555555555555UL;
        return x;
    }

Its inverse is

    static inline ulong bit_unzip0(ulong x)
    // Bits at odd positions must be zero.
    {
        x = (x | (x>>1))  & 0x3333333333333333UL;
        x = (x | (x>>2))  & 0x0f0f0f0f0f0f0f0fUL;
        x = (x | (x>>4))  & 0x00ff00ff00ff00ffUL;
        x = (x | (x>>8))  & 0x0000ffff0000ffffUL;
        x = (x | (x>>16)) & 0x00000000ffffffffUL;
        return x;
    }

The simple structure of the routines suggests trying the following versions of bit-zip and its inverse:

    static inline ulong bit_zip(ulong x)
    {
        ulong y = (x >> 32);
        x &= 0xffffffffUL;
        x = (x | (x<<16)) & 0x0000ffff0000ffffUL;
        y = (y | (y<<16)) & 0x0000ffff0000ffffUL;
        x = (x | (x<<8))  & 0x00ff00ff00ff00ffUL;
        y = (y | (y<<8))  & 0x00ff00ff00ff00ffUL;
        x = (x | (x<<4))  & 0x0f0f0f0f0f0f0f0fUL;
        y = (y | (y<<4))  & 0x0f0f0f0f0f0f0f0fUL;
        x = (x | (x<<2))  & 0x3333333333333333UL;
        y = (y | (y<<2))  & 0x3333333333333333UL;
        x = (x | (x<<1))  & 0x5555555555555555UL;
        y = (y | (y<<1))  & 0x5555555555555555UL;
        x |= (y<<1);
        return x;
    }

    static inline ulong bit_unzip(ulong x)
    {
        ulong y = (x >> 1) & 0x5555555555555555UL;
        x &= 0x5555555555555555UL;
        x = (x | (x>>1))  & 0x3333333333333333UL;
        y = (y | (y>>1))  & 0x3333333333333333UL;
        x = (x | (x>>2))  & 0x0f0f0f0f0f0f0f0fUL;
        y = (y | (y>>2))  & 0x0f0f0f0f0f0f0f0fUL;
        x = (x | (x>>4))  & 0x00ff00ff00ff00ffUL;
        y = (y | (y>>4))  & 0x00ff00ff00ff00ffUL;
        x = (x | (x>>8))  & 0x0000ffff0000ffffUL;
        y = (y | (y>>8))  & 0x0000ffff0000ffffUL;
        x = (x | (x>>16)) & 0x00000000ffffffffUL;
        y = (y | (y>>16)) & 0x00000000ffffffffUL;
        x |= (y<<32);
        return x;
    }

As the statements involving the variables x and y are independent, the CPU-internal parallelism can be used. However, these versions turn out to be slightly slower than those given before.
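The mask-and-shift routines above can be checked against a bit-by-bit zip. The sketch below assumes 64-bit words and restates the steps of bit_zip0() and bit_unzip0(); the names zip_naive(), zip0(), unzip0(), zip(), and unzip() are chosen for this illustration and are not the FXT identifiers.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint64_t word;

// Bit-by-bit reference: bit k of lo goes to index 2k, bit k of hi to index 2k+1.
static word zip_naive(word lo, word hi)
{
    word x = 0;
    for (unsigned k = 0; k < 32; ++k)
    {
        x |= ((lo >> k) & 1) << (2*k);
        x |= ((hi >> k) & 1) << (2*k + 1);
    }
    return x;
}

// Spread the lower 32 bits to the even indices (upper half must be zero).
static word zip0(word x)
{
    x = (x | (x << 16)) & 0x0000ffff0000ffffULL;
    x = (x | (x <<  8)) & 0x00ff00ff00ff00ffULL;
    x = (x | (x <<  4)) & 0x0f0f0f0f0f0f0f0fULL;
    x = (x | (x <<  2)) & 0x3333333333333333ULL;
    x = (x | (x <<  1)) & 0x5555555555555555ULL;
    return x;
}

// Collect the bits at even indices in the lower half (odd bits must be zero).
static word unzip0(word x)
{
    x = (x | (x >>  1)) & 0x3333333333333333ULL;
    x = (x | (x >>  2)) & 0x0f0f0f0f0f0f0f0fULL;
    x = (x | (x >>  4)) & 0x00ff00ff00ff00ffULL;
    x = (x | (x >>  8)) & 0x0000ffff0000ffffULL;
    x = (x | (x >> 16)) & 0x00000000ffffffffULL;
    return x;
}

// Full zip: lower half to even indices, upper half to odd indices.
static word zip(word x)
{
    return zip0(x & 0xffffffffULL) | (zip0(x >> 32) << 1);
}

// Full unzip: even indices to the lower half, odd indices to the upper half.
static word unzip(word x)
{
    const word m = 0x5555555555555555ULL;
    return unzip0(x & m) | (unzip0((x >> 1) & m) << 32);
}
```

Each step of zip0() doubles the spacing of half of the remaining blocks, so five steps suffice for a 32-bit half-word.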
The following function moves the bits of the lower half-word of x into the even positions of lo and the bits of the upper half-word into the even positions of hi (two versions given):

    #define BPLH  (BITS_PER_LONG/2)

    static inline void bit_zip2(ulong x, ulong &lo, ulong &hi)
    {
    #if 1
        x = bit_zip(x);
        lo = x & 0x5555555555555555UL;
        hi = (x>>1) & 0x5555555555555555UL;
    #else
        hi = bit_zip0( x >> BPLH );
        lo = bit_zip0( (x << BPLH) >> (BPLH) );
    #endif
    }

The inverse function is

    static inline ulong bit_unzip2(ulong lo, ulong hi)
    // Inverse of bit_zip2(x, lo, hi).
    {
    #if 1
        return bit_unzip( (hi<<1) | lo );
    #else
        return bit_unzip0(lo) | (bit_unzip0(hi) << BPLH);
    #endif
    }

Functions that zip/unzip the bits of the lower half of two words are

    static inline ulong bit_zip2(ulong x, ulong y)
    // 2-word version:
    // only the lower half of x and y are merged
    {
        return bit_zip( (y<<BPLH) + x );
    }

and

    static inline void bit_unzip2(ulong t, ulong &x, ulong &y)
    // 2-word version:
    // only the lower half of x and y are filled
    {
        t = bit_unzip(t);
        y = t >> BPLH;
        x = t & 0x00000000ffffffffUL;
    }

1.16 Gray code and parity

     k:   bin(k)    g(k)      g^-1(k)   g(2*k)    g(2*k+1)
     0:   .......   .......   .......   .......   ......1
     1:   ......1   ......1   ......1   .....11   .....1.
     2:   .....1.   .....11   .....11   ....11.   ....111
     3:   .....11   .....1.   .....1.   ....1.1   ....1..
     4:   ....1..   ....11.   ....111   ...11..   ...11.1
     5:   ....1.1   ....111   ....11.   ...1111   ...111.
     6:   ....11.   ....1.1   ....1..   ...1.1.   ...1.11
     7:   ....111   ....1..   ....1.1   ...1..1   ...1...
     8:   ...1...   ...11..   ...1111   ..11...   ..11..1
     9:   ...1..1   ...11.1   ...111.   ..11.11   ..11.1.
    10:   ...1.1.   ...1111   ...11..   ..1111.   ..11111
    11:   ...1.11   ...111.   ...11.1   ..111.1   ..111..
    12:   ...11..   ...1.1.   ...1...   ..1.1..   ..1.1.1
    13:   ...11.1   ...1.11   ...1..1   ..1.111   ..1.11.
    14:   ...111.   ...1..1   ...1.11   ..1..1.   ..1..11
    15:   ...1111   ...1...   ...1.1.   ..1...1   ..1....
    16:   ..1....   ..11...   ..11111   .11....   .11...1
    17:   ..1...1   ..11..1   ..1111.   .11..11   .11..1.
    18:   ..1..1.   ..11.11   ..111..   .11.11.   .11.111
    19:   ..1..11   ..11.1.   ..111.1   .11.1.1   .11.1..
    20:   ..1.1..   ..1111.   ..11...   .1111..   .1111.1
    21:   ..1.1.1   ..11111   ..11..1   .111111   .11111.
    22:   ..1.11.   ..111.1   ..11.11   .111.1.   .111.11
    23:   ..1.111   ..111..   ..11.1.   .111..1   .111...
    24:   ..11...   ..1.1..   ..1....   .1.1...   .1.1..1
    25:   ..11..1   ..1.1.1   ..1...1   .1.1.11   .1.1.1.
    26:   ..11.1.   ..1.111   ..1..11   .1.111.   .1.1111
    27:   ..11.11   ..1.11.   ..1..1.   .1.11.1   .1.11..
    28:   ..111..   ..1..1.   ..1.111   .1..1..   .1..1.1
    29:   ..111.1   ..1..11   ..1.11.   .1..111   .1..11.
    30:   ..1111.   ..1...1   ..1.1..   .1...1.   .1...11
    31:   ..11111   ..1....   ..1.1.1   .1....1   .1.....

Figure 1.16-A: Binary words, their Gray code, inverse Gray code, and Gray codes of even and odd values (from left to right).

The Gray code of a binary word can easily be computed by [FXT: bits/graycode.h]

    static inline ulong gray_code(ulong x)  { return x ^ (x>>1); }

Gray codes of consecutive values differ in one bit. Gray codes of values that differ by a power of 2 differ in two bits. Gray codes of even/odd values have an even/odd number of bits set, respectively. This is demonstrated in [FXT: bits/gray-demo.cc], whose output is given in figure 1.16-A.

To produce a random value with an even/odd number of bits set, set the lowest bit of a random number to 0/1, respectively, and return its Gray code.

Computing the inverse Gray code is slightly more expensive. As the Gray code is the bit-wise difference modulo 2, we can compute the inverse as bit-wise sums modulo 2:

    static inline ulong inverse_gray_code(ulong x)
    {
        // VERSION 1 (integration modulo 2):
        ulong h=1, r=0;
        do
        {
            if ( x & 1 )  r^=h;
            x >>= 1;
            h = (h<<1)+1;
        }
        while ( x!=0 );
        return r;
    }

For n-bit words (with n a power of 2), n-fold application of the Gray code gives back the original word. Using the symbol G for the Gray code (operator), we have $G^n = \mathrm{id}$, so $G^{n-1} \circ G = \mathrm{id} = G^{-1} \circ G$. That is, applying the Gray code computation n − 1 times gives the inverse Gray code.
Thus we can simplify to

        // VERSION 2 (apply graycode BITS_PER_LONG-1 times):
        ulong r = BITS_PER_LONG;
        while ( --r )  x ^= x>>1;
        return x;

Applying the Gray code twice is identical to x^=x>>2;, applying it four times is x^=x>>4;, and the idea holds for all powers of 2. This leads to the most efficient way to compute the inverse Gray code:

        // VERSION 3 (use: gray ** BITSPERLONG == id):
        x ^= x>>1;   // gray ** 1
        x ^= x>>2;   // gray ** 2
        x ^= x>>4;   // gray ** 4
        x ^= x>>8;   // gray ** 8
        x ^= x>>16;  // gray ** 16
        // here: x = gray**31(input)
        // note: the statements can be reordered at will
    #if BITS_PER_LONG >= 64
        x ^= x>>32;  // for 64bit words
    #endif
        return x;

1.16.1 The parity of a binary word

The parity of a word is its bit-count modulo 2. The lowest bit of the inverse Gray code of a word contains the parity of the word. So we can compute the parity as [FXT: bits/parity.h]:

    static inline ulong parity(ulong x)
    // Return 0 if the number of set bits is even, else 1
    {
        return inverse_gray_code(x) & 1;
    }

Each bit of the inverse Gray code contains the parity of the partial input left from it (including itself).

Be warned that the parity flag of many CPUs is the complement of the above. With the x86-architecture the parity bit also only takes into account the lowest byte.
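The relation between the Gray code, its inverse, and the parity can be tried out without any machine-specific code. The following is a minimal sketch for 64-bit words; parity_naive() is a helper introduced here purely as a reference and is not taken from FXT.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint64_t word;

// Gray code: bit-wise difference modulo 2 of adjacent bits.
static word gray_code(word x)  { return x ^ (x >> 1); }

// Inverse Gray code via the powers G, G^2, G^4, ..., G^32 (64-bit words).
static word inverse_gray_code(word x)
{
    x ^= x >> 1;
    x ^= x >> 2;
    x ^= x >> 4;
    x ^= x >> 8;
    x ^= x >> 16;
    x ^= x >> 32;
    return x;
}

// Parity: the lowest bit of the inverse Gray code.
static unsigned parity(word x)
{
    return static_cast<unsigned>( inverse_gray_code(x) & 1 );
}

// Reference parity: toggle once per set bit.
static unsigned parity_naive(word x)
{
    unsigned p = 0;
    while ( x )  { p ^= 1;  x &= x - 1; }
    return p;
}
```

Bit i of inverse_gray_code(x) is the XOR of all bits of x at indices i and above, which is why bit 0 holds the parity of the whole word.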
The following routine computes the parity of a full word [FXT: bits/bitasm-i386.h]:

    static inline ulong asm_parity(ulong x)
    {
        x ^= (x>>16);
        x ^= (x>>8);
        asm ("addl $0, %0 \n"
             "setnp %%al \n"
             "movzx %%al, %0"
             : "=r" (x) : "0" (x) : "eax");
        return x;
    }

The equivalent code for the AMD64 CPU is [FXT: bits/bitasm-amd64.h]:

    static inline ulong asm_parity(ulong x)
    {
        x ^= (x>>32);
        x ^= (x>>16);
        x ^= (x>>8);
        asm ("addq $0, %0 \n"
             "setnp %%al \n"
             "movzx %%al, %0"
             : "=r" (x) : "0" (x) : "rax");
        return x;
    }

1.16.2 Byte-wise Gray code and parity

A byte-wise Gray code can be computed using (32-bit version)

    static inline ulong byte_gray_code(ulong x)
    // Return the Gray code of bytes in parallel
    {
        return x ^ ((x & 0xfefefefe)>>1);
    }

Its inverse is

    static inline ulong byte_inverse_gray_code(ulong x)
    // Return the inverse Gray code of bytes in parallel
    {
        x ^= ((x & 0xfefefefeUL)>>1);
        x ^= ((x & 0xfcfcfcfcUL)>>2);
        x ^= ((x & 0xf0f0f0f0UL)>>4);
        return x;
    }

And the parities of all bytes can be computed as

    static inline ulong byte_parity(ulong x)
    // Return the parities of bytes in parallel
    {
        return byte_inverse_gray_code(x) & 0x01010101UL;
    }

1.16.3 Incrementing (counting) in Gray code

     k:   g(k)      g(2*k)    g(k)   p   diff   p   set
     0:   .......   .......   ...... .   ...... .   {}
     1:   ......1   .....11   .....1 1   .....+ 1   {0}
     2:   .....11   ....11.   ....11 .   ....+1 .   {0, 1}
     3:   .....1.   ....1.1   ....1. 1   ....1- 1   {1}
     4:   ....11.   ...11..   ...11. .   ...+1. .   {1, 2}
     5:   ....111   ...1111   ...111 1   ...11+ 1   {0, 1, 2}
     6:   ....1.1   ...1.1.   ...1.1 .   ...1-1 .   {0, 2}
     7:   ....1..   ...1..1   ...1.. 1   ...1.- 1   {2}
     8:   ...11..   ..11...   ..11.. .   ..+1.. .   {2, 3}
     9:   ...11.1   ..11.11   ..11.1 1   ..11.+ 1   {0, 2, 3}
    10:   ...1111   ..1111.   ..1111 .   ..11+1 .   {0, 1, 2, 3}
    11:   ...111.   ..111.1   ..111. 1   ..111- 1   {1, 2, 3}
    12:   ...1.1.   ..1.1..   ..1.1. .   ..1-1. .   {1, 3}
    13:   ...1.11   ..1.111   ..1.11 1   ..1.1+ 1   {0, 1, 3}
    14:   ...1..1   ..1..1.   ..1..1 .   ..1.-1 .   {0, 3}
    15:   ...1...   ..1...1   ..1... 1   ..1..- 1   {3}
    16:   ..11...   .11....   .11... .   .+1... .   {3, 4}
    17:   ..11..1   .11..11   .11..1 1   .11..+ 1   {0, 3, 4}

Figure 1.16-B: The Gray code equals the Gray code of doubled value shifted to the right once. Equivalently, we can separate the lowest bit which equals the parity of the other bits. The last column shows that the changes with each increment always happen one position left of the rightmost bit.

Let g(k) be the Gray code of a number k. We are interested in efficiently generating g(k + 1). We can implement a fast Gray counter if we use a spare bit to keep track of the parity of the Gray code word, see figure 1.16-B. The following routine does this [FXT: bits/nextgray.h]:

    static inline ulong next_gray2(ulong x)
    // With input x==gray_code(2*k) the return is gray_code(2*k+2).
    // Let x1 be the word x shifted right once
    // and i1 its inverse Gray code.
    // Let r1 be the return r shifted right once.
    // Then r1 = gray_code(i1+1).
    // That is, we have a Gray code counter.
    // The argument must have an even number of bits.
    {
        x ^= 1;
        x ^= (lowest_one(x) << 1);
        return x;
    }

Start with x=0, increment with x=next_gray2(x) and use the words g=x>>1:

    ulong x = 0;
    for (ulong k=0; k<n2; ++k)
    {
        ulong g = x>>1;
        x = next_gray2(x);
        // here: g == gray_code(k);
    }

This is shown in [FXT: bits/bit-nextgray-demo.cc]. To start at an arbitrary (Gray code) value g, compute

    x = (g<<1) ^ parity(g)

Then use the statement x=next_gray2(x) for later increments.

If working with a set whose elements are the set bits in the Gray code, the parity is the set size k modulo 2. Compute the increment as follows:

1. If k is even, then goto step 2, else goto step 3.
2. If the first element is zero, then remove it, else prepend the element zero.
3. If the first element equals the second minus one, then remove the second element, else insert at the second position the element equal to the first element plus one.
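The three steps above can be sketched with a sorted vector holding the elements of the set. This is an illustration under the assumption that the set is stored in ascending order; the names gray_set_next(), bits_of(), and check_gray_set() are hypothetical helpers, and the vector operations at the front are not loopless (the text discusses working at the end of an array instead).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Advance the set (ascending order) from the set bits of gray_code(k)
// to the set bits of gray_code(k+1).
static void gray_set_next(std::vector<unsigned> &s)
{
    if ( s.size() % 2 == 0 )  // even size: toggle the element zero
    {
        if ( !s.empty() && s[0] == 0 )  s.erase( s.begin() );
        else                            s.insert( s.begin(), 0u );
    }
    else  // odd size: toggle the element (first element plus one)
    {
        if ( s.size() >= 2 && s[1] == s[0] + 1 )  s.erase( s.begin() + 1 );
        else                                      s.insert( s.begin() + 1, s[0] + 1 );
    }
}

// Indices of the set bits of x in ascending order.
static std::vector<unsigned> bits_of(std::uint64_t x)
{
    std::vector<unsigned> s;
    for (unsigned i = 0; x != 0; ++i, x >>= 1)
        if ( x & 1 )  s.push_back(i);
    return s;
}

// Return true if n successive increments, starting from the empty set,
// reproduce the set bits of gray_code(1), ..., gray_code(n).
static bool check_gray_set(std::uint64_t n)
{
    std::vector<unsigned> s;  // set for gray_code(0)
    for (std::uint64_t k = 1; k <= n; ++k)
    {
        gray_set_next(s);
        if ( s != bits_of( k ^ (k >> 1) ) )  return false;
    }
    return true;
}
```

A set with a single element has no second element, so the odd case then always inserts the first element plus one.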
A method to decrement is obtained by simply swapping the actions for even and odd parity.

When working with an array that contains the elements of the set, it is more convenient to do the described operations at the end of the array. This leads to the (loopless) algorithm for subsets in minimal-change order given in section 8.2.2 on page 206. Properties of the Gray code are discussed in [127].

1.16.4 The Thue-Morse sequence

The sequence of parities of the binary words

    011010011001011010010110011010011001011001101001...

is called the Thue-Morse sequence (entry A010060 in [312]). It appears in various seemingly unrelated contexts, see [8] and section 38.1 on page 726. The sequence can be generated with [FXT: class thue_morse in bits/thue-morse.h]:

    class thue_morse
    // Thue-Morse sequence
    {
    public:
        ulong k_;
        ulong tm_;

    public:
        thue_morse(ulong k=0)  { init(k); }
        ~thue_morse()  { ; }

        ulong init(ulong k=0)
        {
            k_ = k;
            tm_ = parity(k_);
            return tm_;
        }

        ulong data()  { return tm_; }

        ulong next()
        {
            ulong x = k_ ^ (k_ + 1);
            ++k_;
            x ^= x>>1;  // highest bit that changed with increment
            x &= 0x5555555555555555UL;  // 64-bit version
            tm_ ^= ( x!=0 );  // change if highest changed bit was at even index
            return tm_;
        }
    };

The rate of generation is about 366 M/s (6 cycles per update) [FXT: bits/thue-morse-demo.cc].

1.16.5 The Golay-Rudin-Shapiro sequence ‡

The function [FXT: bits/grsnegative.h]

    static inline ulong grs_negative_q(ulong x)  { return parity( x & (x>>1) ); }

returns +1 for indices where the Golay-Rudin-Shapiro sequence (or GRS sequence, entry A020985 in [312]) has the value −1. The algorithm is to count the bit-pairs modulo 2. The pairs may overlap: the sequence [1111] contains the three bit-pairs [11..], [.11.], and [..11]. The function returns +1 for x in the sequence

    3, 6, 11, 12, 13, 15, 19, 22, 24, 25, 26, 30, 35, 38, 43, 44, 45, 47, 48, 49, 50, 52, 53, ...

This is entry A022155 in [312], see also section 38.3 on page 731. The sequence can be computed by starting with two ones, and appending the left half and the negated right half of the values so far in each step, see figure 1.16-C.

    ++
    +++-
    +++- ++-+
    +++- ++-+ +++- --+-
    +++- ++-+ +++- --+- +++- ++-+ ---+ ++-+ ...
       ^   ^     ^ ^^ ^    ^   ^
       3,  6,   11,12,13,15, 19, 22, ...

Figure 1.16-C: A construction for the Golay-Rudin-Shapiro (GRS) sequence.

To compute the successor in the GRS sequence, use

    static inline ulong grs_next(ulong k, ulong g)
    // With g == grs_negative_q(k), compute grs_negative_q(k+1).
    {
        const ulong cm = 0x5555555555555554UL;  // 64-bit version
        ulong h = ~k;  h &= -h;  // == lowest_zero(k);
        g ^= ( ((h&cm) ^ ((k>>1)&h)) !=0 );
        return g;
    }

With incrementing k, the lowest run of ones of k is replaced by a one at the lowest zero of k. If the length of the lowest run is odd and ≥ 2 then a change of parity happens. This is the case if the lowest zero of k is at one of the positions

    bin 0101 0101 0101 0100 == hex 5 5 5 4 == cm

If the position of the lowest zero is adjacent to the next block of ones, another change of parity will occur. The element of the GRS sequence changes if exactly one of the parity changes takes place.

The update function can be used as shown in [FXT: bits/grs-next-demo.cc]:

    ulong n = 65;  // Generate this many values of the sequence.
    ulong k0 = 0;  // Start point of the sequence.
    ulong g = grs_negative_q(k0);
    for (ulong k=k0; k<k0+n; ++k)
    {
        // Do something with g here.
        g = grs_next(k, g);
    }

1.16.6 The reversed Gray code

The functions for the reversed Gray code are identical to those for the Gray code up to the direction of the shift operations (C-language operator '>>' replaced by '<<').
So computing the reversed Gray code is as easy as [FXT: bits/revgraycode.h]:

    static inline ulong rev_gray_code(ulong x)  { return x ^ (x<<1); }

Its inverse is

    static inline ulong inverse_rev_gray_code(ulong x)
    {
        // use: rev_gray ** BITSPERLONG == id:
        x ^= x<<1;   // rev_gray ** 1
        x ^= x<<2;   // rev_gray ** 2
        x ^= x<<4;   // rev_gray ** 4
        x ^= x<<8;   // rev_gray ** 8
        x ^= x<<16;  // rev_gray ** 16
        // here: x = rev_gray**31(input)
        // note: the statements can be reordered at will
    #if BITS_PER_LONG >= 64
        x ^= x<<32;  // for 64bit words
    #endif
        return x;
    }

    ----------------------------------------------------------
    111.1111....1111................ = 0xef0f0000 == word
    1..11...1...1...1............... = gray_code
    ..11...1...1...1................ = rev_gray_code
    1.11.1.11111.1.11111111111111111 = inverse_gray_code
    1.1..1.1.....1.1................ = inverse_rev_gray_code
    ----------------------------------------------------------
    ...1....1111....1111111111111111 = 0x10f0ffff == word
    ...11...1...1...1............... = gray_code
    ..11...1...1...1...............1 = rev_gray_code
    ...11111.1.11111.1.1.1.1.1.1.1.1 = inverse_gray_code
    1111.....1.1.....1.1.1.1.1.1.1.1 = inverse_rev_gray_code
    ----------------------------------------------------------
    ......1......................... = 0x2000000 == word
    ......11........................ = gray_code
    .....11......................... = rev_gray_code
    ......11111111111111111111111111 = inverse_gray_code
    1111111......................... = inverse_rev_gray_code
    ----------------------------------------------------------
    111111.1111111111111111111111111 = 0xfdffffff == word
    1.....11........................ = gray_code
    .....11........................1 = rev_gray_code
    1.1.1..1.1.1.1.1.1.1.1.1.1.1.1.1 = inverse_gray_code
    1.1.1.11.1.1.1.1.1.1.1.1.1.1.1.1 = inverse_rev_gray_code
    ----------------------------------------------------------

Figure 1.16-D: Examples of the Gray code, reversed Gray code, and their inverses with 32-bit words.

Some examples with 32-bit words are shown in figure 1.16-D. Let $G$ and $E$ denote the Gray code and reversed Gray code of a word $X$, respectively. Write $G^{-1}$ and $E^{-1}$ for their inverses. Then $E$ preserves the lowest bit of $X$, while $G$ preserves the highest. Also $E$ preserves the lowest set bit of $X$, while $G$ preserves the highest. Further, $E^{-1}$ contains at each bit the parity of all bits of $X$ right from it, including the bit itself. In particular, the word parity can be found in the highest bit of $E^{-1}$.

Let $\overline{X}$ denote the complement of $X$, $p$ the parity of $X$, and let $S$ be the right shift by one of $G^{-1}$. Then we have

$$ G^{-1} \;\mathrm{XOR}\; E^{-1} = \begin{cases} X & \text{if } p = 0 \\ \overline{X} & \text{otherwise} \end{cases} \qquad (1.16\text{-}1a) $$

$$ S \;\mathrm{XOR}\; E^{-1} = \begin{cases} 0 & \text{if } p = 0 \\ \overline{0} & \text{otherwise} \end{cases} \qquad (1.16\text{-}1b) $$

We note that taking the reversed Gray code of a binary word corresponds to multiplication with the binary polynomial $x + 1$ and the inverse reversed Gray code is a method for fast exact division by $x + 1$, see section 40.1.6 on page 826. The inverse reversed Gray code can be used to solve the reduced quadratic equation for binary normal bases, see section 42.6.2 on page 903.

1.17 Bit sequency ‡

The sequency of a binary word is the number of zero-one transitions in the word. A function to determine the sequency is [FXT: bits/bitsequency.h]:

    static inline ulong bit_sequency(ulong x)  { return bit_count( gray_code(x) ); }

    seq=  0        1        2        3        4        5        6
          ......   .....1   ....1.   ...1.1   ..1.1.   .1.1.1   1.1.1.
                   ....11   ...11.   ..11.1   .11.1.   11.1.1
                   ...111   ...1..   ..1..1   .1..1.   1..1.1
                   ..1111   ..111.   ..1.11   .1.11.   1.11.1
                   .11111   ..11..   .111.1   .1.1..   1.1..1
                   111111   ..1...   .11..1   111.1.   1.1.11
                            .1111.   .11.11   11..1.
                            .111..   .1...1   11.11.
                            .11...   .1..11   11.1..
                            .1....   .1.111   1...1.
                            11111.   1111.1   1..11.
                            1111..   111..1   1..1..
                            111...   111.11   1.111.
                            11....   11...1   1.11..
                            1.....   11..11   1.1...
                                     11.111
                                     1....1
                                     1...11
                                     1..111
                                     1.1111

Figure 1.17-A: 6-bit words of prescribed sequency as generated by next_sequency().

The function assumes that all bits to the left of the word are zero and all bits to the right are equal to the lowest bit, see figure 1.17-A. For example, the sequency of the 8-bit word [00011111] is one. To take the lowest bit into account, add it to the sequency (then all sequencies are even).

The minimal binary word with given sequency can be computed as follows:

    static inline ulong first_sequency(ulong k)
    // Return the first (i.e. smallest) word with sequency k,
    // e.g. 00..00010101010 (seq 8)
    // e.g. 00..00101010101 (seq 9)
    // Must have: 0 <= k <= BITS_PER_LONG
    {
        return inverse_gray_code( first_comb(k) );
    }

A faster version is (32-bit branch only):

        if ( k==0 )  return 0;
        const ulong m = 0xaaaaaaaaUL;
        return m >> (BITS_PER_LONG-k);

The maximal binary word with given sequency can be computed via

    static inline ulong last_sequency(ulong k)
    // Return the last (i.e. biggest) word with sequency k.
    {
        return inverse_gray_code( last_comb(k) );
    }

The functions first_comb(k) and last_comb(k) return a word with k bits set at the low and high end, respectively (see section 1.24 on page 62).

For the generation of all words with a given sequency, starting with the smallest, we use a function that computes the next word with the same sequency:

    static inline ulong next_sequency(ulong x)
    {
        x = gray_code(x);
        x = next_colex_comb(x);
        x = inverse_gray_code(x);
        return x;
    }

The inverse function, returning the previous word with the same sequency, is

    static inline ulong prev_sequency(ulong x)
    {
        x = gray_code(x);
        x = prev_colex_comb(x);
        x = inverse_gray_code(x);
        return x;
    }

The list of all 6-bit words ordered by sequency is shown in figure 1.17-A. It was created with the program [FXT: bits/bitsequency-demo.cc].
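The definition of the sequency and the shortcut for the smallest word with given sequency can be tried out as follows. This is a minimal sketch for 64-bit words: the 64-bit mask 0xaaaaaaaaaaaaaaaa is an assumption extending the 32-bit branch shown above, and the names popcount64() and sequency() are chosen for this illustration.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint64_t word;

// Portable bit count: each iteration clears the lowest set bit.
static unsigned popcount64(word x)
{
    unsigned c = 0;
    while ( x )  { x &= x - 1;  ++c; }
    return c;
}

// Sequency: number of set bits in the Gray code of x.
static unsigned sequency(word x)
{
    return popcount64( x ^ (x >> 1) );
}

// Word of the form ...10101010 cut down to its k highest bits,
// moved to the low end; must have 0 <= k <= 64.
static word first_sequency(unsigned k)
{
    assert( k <= 64 );
    if ( k == 0 )  return 0;
    const word m = 0xaaaaaaaaaaaaaaaaULL;  // assumed 64-bit analogue of 0xaaaaaaaa
    return m >> (64 - k);
}
```

For k = 8 the result is the word 10101010, whose Gray code 11111111 has eight set bits.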
The sequency of a word can be complemented as follows (32-bit version):

    static inline ulong complement_sequency(ulong x)
    // Return word whose sequency is BITS_PER_LONG - s
    // where s is the sequency of x
    {
        return x ^ 0xaaaaaaaaUL;
    }

1.18 Powers of the Gray code ‡

    G^0=id:      1....... .1...... ..1..... ...1.... ....1... .....1.. ......1. .......1
    G^1=G:       11...... .11..... ..11.... ...11... ....11.. .....11. ......11 .......1
    G^2:         1.1..... .1.1.... ..1.1... ...1.1.. ....1.1. .....1.1 ......1. .......1
    G^3:         1111.... .1111... ..1111.. ...1111. ....1111 .....111 ......11 .......1
    G^4:         1...1... .1...1.. ..1...1. ...1...1 ....1... .....1.. ......1. .......1
    G^5:         11..11.. .11..11. ..11..11 ...11..1 ....11.. .....11. ......11 .......1
    G^6:         1.1.1.1. .1.1.1.1 ..1.1.1. ...1.1.1 ....1.1. .....1.1 ......1. .......1
    G^7=G^(-1):  11111111 .1111111 ..111111 ...11111 ....1111 .....111 ......11 .......1

    E^0=id:      1....... .1...... ..1..... ...1.... ....1... .....1.. ......1. .......1
    E^1=E:       1....... 11...... .11..... ..11.... ...11... ....11.. .....11. ......11
    E^2:         1....... .1...... 1.1..... .1.1.... ..1.1... ...1.1.. ....1.1. .....1.1
    E^3:         1....... 11...... 111..... 1111.... .1111... ..1111.. ...1111. ....1111
    E^4:         1....... .1...... ..1..... ...1.... 1...1... .1...1.. ..1...1. ...1...1
    E^5:         1....... 11...... .11..... ..11.... 1..11... 11..11.. .11..11. ..11..11
    E^6:         1....... .1...... 1.1..... .1.1.... 1.1.1... .1.1.1.. 1.1.1.1. .1.1.1.1
    E^7=E^(-1):  1....... 11...... 111..... 1111.... 11111... 111111.. 1111111. 11111111

Figure 1.18-A: Powers of the matrices for the Gray code (top) and the reversed Gray code (bottom). Each line lists the eight rows of one matrix.

The Gray code is a bit-wise linear transform of a binary word. The 2^k-th power of the Gray code of x can be computed as x ^ (x>>s) where s = 2^k. The e-th power can be computed by composing the powers that correspond to the bits set in the exponent.
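The reason can be written out in a few lines. The following derivation is a sketch in notation introduced only here: S denotes the right shift by one bit, so that the Gray code is G = id + S over GF(2).

```latex
\begin{align*}
  G       &= \mathrm{id} + S \\
  G^{2}   &= \mathrm{id} + 2\,S + S^{2} \;=\; \mathrm{id} + S^{2}
             && \text{(because } 2 = 0 \text{ over GF(2))} \\
  G^{2^k} &= \mathrm{id} + S^{2^k}
             && \text{(by repeated squaring)} \\
  G^{5}   &= G^{4}\,G^{1} \;=\; (\mathrm{id} + S^{4})\,(\mathrm{id} + S)
             && \text{(since } 5 = 4 + 1 \text{)}
\end{align*}
```

For N-bit words we have S^N = 0, so G^N = id whenever N is a power of 2.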
This motivates [FXT: bits/graypower.h]:

    static inline ulong gray_pow(ulong x, ulong e)
    // Return (gray_code**e)(x)
    // gray_pow(x, 1) == gray_code(x)
    // gray_pow(x, BITS_PER_LONG-1) == inverse_gray_code(x)
    {
        e &= (BITS_PER_LONG-1);  // modulo BITS_PER_LONG
        ulong s = 1;
        while ( e )
        {
            if ( e & 1 )  x ^= x >> s;  // gray ** s
            s <<= 1;
            e >>= 1;
        }
        return x;
    }

The Gray code g = [g0, g1, ..., g7] of an 8-bit binary word x = [x0, x1, ..., x7] can be expressed as a matrix multiplication over GF(2) (dots for zeros):

      g    =       G         x

    [g0]     [ 11...... ]  [x0]
    [g1]     [ .11..... ]  [x1]
    [g2]     [ ..11.... ]  [x2]
    [g3]  =  [ ...11... ]  [x3]
    [g4]     [ ....11.. ]  [x4]
    [g5]     [ .....11. ]  [x5]
    [g6]     [ ......11 ]  [x6]
    [g7]     [ .......1 ]  [x7]

The powers of the Gray code correspond to multiplication with powers of the matrix G, shown in figure 1.18-A (top). The powers of the inverse Gray code for N-bit words (where N is a power of 2) can be computed by the relation G^e G^{N-e} = G^N = id.

    static inline ulong inverse_gray_pow(ulong x, ulong e)
    // Return (inverse_gray_code**(e))(x)
    // == (gray_code**(-e))(x)
    // inverse_gray_pow(x, 1) == inverse_gray_code(x)
    // inverse_gray_pow(x, BITS_PER_LONG-1) == gray_code(x)
    {
        return gray_pow(x, -e);
    }

The matrices corresponding to the powers of the reversed Gray code are shown in figure 1.18-A (bottom).
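The relation G^e G^{N-e} = id for the Gray code powers can be checked numerically. The following is a minimal sketch for N = 32; the type name `word`, the constant `BITS`, and the helpers `gray_pow_naive()` and `check_gray_powers()` are assumptions introduced for this illustration only.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t word;     // assumption: 32-bit words
static const unsigned BITS = 32;

static inline word gray_code(word x) { return x ^ (x >> 1); }

static inline word gray_pow(word x, word e)
{
    e &= (BITS-1);  // modulo BITS
    word s = 1;
    while ( e )
    {
        if ( e & 1 )  x ^= x >> s;  // gray ** s
        s <<= 1;
        e >>= 1;
    }
    return x;
}

static inline word inverse_gray_pow(word x, word e)
{
    return gray_pow(x, 0u - e);  // exponent -e is taken modulo BITS
}

// Hypothetical reference: apply the Gray code e times.
static word gray_pow_naive(word x, unsigned e)
{
    while ( e-- )  x = gray_code(x);
    return x;
}

// Hypothetical check: fast and naive powers agree, and the inverse undoes the power.
static bool check_gray_powers(word x)
{
    for (unsigned e = 0; e < BITS; ++e)
    {
        if ( gray_pow(x, e) != gray_pow_naive(x, e) )  return false;
        if ( inverse_gray_pow( gray_pow(x, e), e ) != x )  return false;
    }
    return true;
}
```

The fast routine needs at most log2(N) shift-and-XOR steps, while the naive reference needs e steps.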
For the reversed Gray code we just have to reverse the shift operator in the functions:

    static inline ulong rev_gray_pow(ulong x, ulong e)
    // Return (rev_gray_code**e)(x)
    {
        e &= (BITS_PER_LONG-1);  // modulo BITS_PER_LONG
        ulong s = 1;
        while ( e )
        {
            if ( e & 1 )  x ^= x << s;  // rev_gray ** s
            s <<= 1;
            e >>= 1;
        }
        return x;
    }

The inverse function is

    static inline ulong inverse_rev_gray_pow(ulong x, ulong e)
    // Return (inverse_rev_gray_code**(e))(x)
    {
        return rev_gray_pow(x, -e);
    }

1.19 Invertible transforms on words ‡

The functions presented in this section are invertible transforms on binary words. The names are chosen as ‘some code’, emphasizing the result of the transforms, similar to the convention used with the name ‘Gray code’. The functions are given in [FXT: bits/bittransforms.h].

In the transform (blue code)

    static inline ulong blue_code(ulong a)
    {
        ulong s = BITS_PER_LONG >> 1;
        ulong m = ~0UL << s;
        do
        {
            a ^= ( (a&m) >> s );
            s >>= 1;
            m ^= (m>>s);
        }
        while ( s );
        return a;
    }

the masks ‘m’ are (32-bit binary)

    1111111111111111................
    11111111........11111111........
    1111....1111....1111....1111....
    11..11..11..11..11..11..11..11..
    1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.

The same masks are used in the yellow code

    static inline ulong yellow_code(ulong a)
    {
        ulong s = BITS_PER_LONG >> 1;
        ulong m = ~0UL >> s;
        do
        {
            a ^= ( (a&m) << s );
            s >>= 1;
            m ^= (m<<s);
        }
        while ( s );
        return a;
    }

[...]

    static inline ulong red_code(ulong a)
    {
        ulong s = BITS_PER_LONG >> 1;
        ulong m = ~0UL >> s;
        do
        {
            ulong u = a & m;
            ulong v = a ^ u;
            a = v ^ (u<<s);
            a ^= (v>>s);
            s >>= 1;
            m ^= (m<<s);
        }
        while ( s );
        return a;
    }

[...]

    static inline ulong green_code(ulong a)
    {
        ulong s = BITS_PER_LONG >> 1;
        ulong m = ~0UL << s;
        do
        {
            ulong u = a & m;
            ulong v = a ^ u;
            a = v ^ (u>>s);
            a ^= (v<<s);
            s >>= 1;
            m ^= (m>>s);
        }
        while ( s );
        return a;
    }

use the masks

    ................1111111111111111
    ........11111111........11111111
    ....1111....1111....1111....1111
    ..11..11..11..11..11..11..11..11
    .1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1

          red                                    green
     0:   ................................   0   ................................   0
     1:   1...............................   1   11111111111111111111111111111111  32
     2:   11..............................   2   .1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1  16
     3:   .1..............................   1   1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.  16
     4:   1.1.............................   2   ..11..11..11..11..11..11..11..11  16
     5:   ..1.............................   1   11..11..11..11..11..11..11..11..  16
     6:   .11.............................   2   .11..11..11..11..11..11..11..11.  16
     7:   111.............................   3   1..11..11..11..11..11..11..11..1  16
     8:   1111............................   4   ...1...1...1...1...1...1...1...1   8
     9:   .111............................   3   111.111.111.111.111.111.111.111.  24
    10:   ..11............................   2   .1...1...1...1...1...1...1...1..   8
    11:   1.11............................   3   1.111.111.111.111.111.111.111.11  24
    12:   .1.1............................   2   ..1...1...1...1...1...1...1...1.   8
    13:   11.1............................   3   11.111.111.111.111.111.111.111.1  24
    14:   1..1............................   2   .111.111.111.111.111.111.111.111  24
    15:   ...1............................   1   1...1...1...1...1...1...1...1...   8
    16:   1...1...........................   2   ....1111....1111....1111....1111  16
    17:   ....1...........................   1   1111....1111....1111....1111....  16
    18:   .1..1...........................   2   .1.11.1..1.11.1..1.11.1..1.11.1.  16
    19:   11..1...........................   3   1.1..1.11.1..1.11.1..1.11.1..1.1  16
    20:   ..1.1...........................   2   ..1111....1111....1111....1111..  16
    21:   1.1.1...........................   3   11....1111....1111....1111....11  16
    22:   111.1...........................   4   .11.1..1.11.1..1.11.1..1.11.1..1  16
    23:   .11.1...........................   3   1..1.11.1..1.11.1..1.11.1..1.11.  16
    24:   .1111...........................   4   ...1111....1111....1111....1111.  16
    25:   11111...........................   5   111....1111....1111....1111....1  16
    26:   1.111...........................   4   .1..1.11.1..1.11.1..1.11.1..1.11  16
    27:   ..111...........................   3   1.11.1..1.11.1..1.11.1..1.11.1..  16
    28:   11.11...........................   4   ..1.11.1..1.11.1..1.11.1..1.11.1  16
    29:   .1.11...........................   3   11.1..1.11.1..1.11.1..1.11.1..1.  16
    30:   ...11...........................   2   .1111....1111....1111....1111...  16
    31:   1..11...........................   3   1....1111....1111....1111....111  16

Figure 1.19-B: Red and green transforms of the binary words 0, 1, ..., 31. The number after each word is its bit count.

The transforms of the binary words up to 31 are shown in figure 1.19-B, which was created with the program [FXT: bits/bittransforms-red-demo.cc]. The red code can also be computed by the statement

    revbin( blue_code( x ) );

and the green code by

    blue_code( revbin( x ) );

          i    r    B    Y    R    E
     i    i    r    B    Y    R    E
     r    r    i    R*   E*   B*   Y*
     B    B    E*   i    R*   Y*   r*
     Y    Y    R*   E*   i    r*   B*
     R    R    Y*   r*   B*   E    i
     E    E    B*   Y*   r*   i    R

Figure 1.19-C: Multiplication table for the transforms.

1.19.1 Relations between the transforms

We write B for the blue code (transform), Y for the yellow code and r for bit-reversal (the revbin-function). We have the following relations between B and Y:

    B = Y r Y = r Y r    (1.19-1a)
    Y = B r B = r B r    (1.19-1b)
    r = Y B Y = B Y B    (1.19-1c)

As said, B and Y are self-inverse:

    B^{-1} = B,  B B = id    (1.19-2a)
    Y^{-1} = Y,  Y Y = id    (1.19-2b)

We write R for the red code, and E for the green code. The red code and the green code are not involutions (square roots of identity) but third roots of identity:

    R R R = id,  R^{-1} = R R = E    (1.19-3a)
    E E E = id,  E^{-1} = E E = R    (1.19-3b)
    R E = E R = id                   (1.19-3c)

Figure 1.19-C shows the multiplication table. The R in the third column of the second row says that r B = R. The letter i is used for identity (id). An asterisk says that x y ≠ y x.
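Several of these relations can be checked numerically. The sketch below is for 32-bit words; the type name `word`, the loop-based `revbin()`, and the helper `check_relations()` are assumptions made for illustration. The red and green codes are computed through the blue code and bit-reversal, and a product such as r B is read as "apply B first, then r".

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t word;  // assumption: 32-bit words

static inline word blue_code(word a)
{
    word s = 16;
    word m = 0xffffffffu << s;
    do
    {
        a ^= ( (a&m) >> s );
        s >>= 1;
        m ^= (m>>s);
    }
    while ( s );
    return a;
}

static inline word yellow_code(word a)
{
    word s = 16;
    word m = 0xffffffffu >> s;
    do
    {
        a ^= ( (a&m) << s );
        s >>= 1;
        m ^= (m<<s);
    }
    while ( s );
    return a;
}

// Simple (slow) bit-reversal of a full 32-bit word.
static inline word revbin(word x)
{
    word r = 0;
    for (int i = 0; i < 32; ++i)  { r = (r << 1) | (x & 1);  x >>= 1; }
    return r;
}

static inline word red_code(word x)   { return revbin( blue_code(x) ); }  // R = r B
static inline word green_code(word x) { return blue_code( revbin(x) ); }  // E = B r

// Hypothetical check of a few relations for one argument x.
static bool check_relations(word x)
{
    if ( blue_code( blue_code(x) ) != x )  return false;      // B B = id
    if ( yellow_code( yellow_code(x) ) != x )  return false;  // Y Y = id
    if ( yellow_code(x) != revbin( blue_code( revbin(x) ) ) )  return false;  // Y = r B r
    if ( red_code( red_code( red_code(x) ) ) != x )  return false;  // R R R = id
    if ( green_code( red_code(x) ) != x )  return false;      // E R = id
    if ( red_code( red_code(x) ) != green_code(x) )  return false;  // R R = E
    return true;
}
```

The values red_code(1) and green_code(1) reproduce the second row of figure 1.19-B (a single one at the left end, and the word of all ones).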
By construction we have

    R = r B    (1.19-4a)
    E = r Y    (1.19-4b)

Relations between R and E are:

    R = E r E = r E r    (1.19-5a)
    E = R r R = r R r    (1.19-5b)
    R = R E R            (1.19-5c)
    E = E R E            (1.19-5d)

For the bit-reversal we have

    r = Y R = R B = B E = E Y    (1.19-6)

Some products for the transforms are

    B = R Y = Y E = R B R = E B E    (1.19-7a)
    Y = E B = B R = R Y R = E Y E    (1.19-7b)
    R = B Y = B E B = Y E Y          (1.19-7c)
    E = Y B = B R B = Y R Y          (1.19-7d)

Some triple products that give the identical transform are

    id = B Y E = R Y B    (1.19-8a)
    id = E B Y = B R Y    (1.19-8b)
    id = Y E B = Y B R    (1.19-8c)

1.19.2 Relations to Gray code and reversed Gray code

Write g for the Gray code, then:

    g B g B = id              (1.19-9a)
    g B g = B                 (1.19-9b)
    g^{-1} B g^{-1} = B       (1.19-9c)
    g B = B g^{-1}            (1.19-9d)

Let S_k be the operator that rotates a word by k bits (bit 0 is moved to position k), then

    Y S_{+1} Y = g         (1.19-10a)
    Y S_{-1} Y = g^{-1}    (1.19-10b)
    Y S_k Y = g^k          (1.19-10c)

Shift in the sequency domain is bit-wise derivative in time domain. Relation 1.19-10c, together with an algorithm to generate the cycle leaders of the Gray permutation (section 2.12.1 on page 128) gives a curious method to generate the binary necklaces whose length is a power of 2, described in section 18.1.6 on page 376.

Let e be the operator for the reversed Gray code, then

    B S_{+1} B = e^{-1}    (1.19-11a)
    B S_{-1} B = e         (1.19-11b)
    B S_k B = e^{-k}       (1.19-11c)

1.19.3 Fixed points of the blue code ‡

     0 = ...... : .......... =   0       16 = .1.... : .1...1.... = 272
     1 = .....1 : .........1 =   1       17 = .1...1 : .1.11.1... = 360
     2 = ....1. : .......11. =   6       18 = .1..1. : .1.....1.. = 260
     3 = ....11 : .......111 =   7       19 = .1..11 : .1.11111.. = 380
     4 = ...1.. : .....1.1.. =  20       20 = .1.1.. : .1...1.11. = 278
     5 = ...1.1 : .....1..1. =  18       21 = .1.1.1 : .1.11.111. = 366
     6 = ...11. : .....1.1.1 =  21       22 = .1.11. : .1......1. = 258
     7 = ...111 : .....1..11 =  19       23 = .1.111 : .1.1111.1. = 378
     8 = ..1... : ...1111... = 120       24 = .11... : .1...1...1 = 273
     9 = ..1..1 : ...11.11.. = 108       25 = .11..1 : .1.11.1..1 = 361
    10 = ..1.1. : ...111111. = 126       26 = .11.1. : .1.....1.1 = 261
    11 = ..1.11 : ...11.1.1. = 106       27 = .11.11 : .1.11111.1 = 381
    12 = ..11.. : ...1111..1 = 121       28 = .111.. : .1...1.111 = 279
    13 = ..11.1 : ...11.11.1 = 109       29 = .111.1 : .1.11.1111 = 367
    14 = ..111. : ...1111111 = 127       30 = .1111. : .1......11 = 259
    15 = ..1111 : ...11.1.11 = 107       31 = .11111 : .1.1111.11 = 379

Figure 1.19-D: The first fixed points of the blue code. The highest bit of all fixed points lies at an even index. There are 2^{n/2} fixed points with highest bit at index n.

The sequence of fixed points of the blue code is (entry A118666 in [312])

    0, 1, 6, 7, 18, 19, 20, 21, 106, 107, 108, 109, 120, 121, 126, 127, 258, 259, ...

If f is a fixed point, then f XOR 1 is also a fixed point. Further, 2 (f XOR (2 f)) is a fixed point. These facts can be cast into a function that returns a unique fixed point for each argument [FXT: bits/bluefixed-points.h]:

    static inline ulong blue_fixed_point(ulong s)
    {
        if ( 0==s )  return 0;
        ulong f = 1;
        while ( s>1 )
        {
            f ^= (f<<1);
            f <<= 1;
            f |= (s&1);
            s >>= 1;
        }
        return f;
    }

The output for the first few arguments is shown in figure 1.19-D. Note that the fixed points are not in ascending order. The list was created by the program [FXT: bits/bittransforms-blue-fp-demo.cc].

Now write f(x) for the binary polynomial corresponding to f (see chapter 40 on page 822). If f(x) is a fixed point (that is, B f(x) = f(x + 1) = f(x)), then both (x^2 + x) f(x) and 1 + (x^2 + x) f(x) are fixed points. The function blue_fixed_point() repeatedly multiplies by x^2 + x and adds one if the corresponding bit of the argument is set.
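That the generated words are indeed fixed points can be verified directly. The following sketch uses 32-bit words; the type name `word` and the helper `check_fixed_points()` are assumptions for illustration, and the argument range is kept small so that the fixed points fit into a word.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t word;  // assumption: 32-bit words

static inline word blue_code(word a)
{
    word s = 16;
    word m = 0xffffffffu << s;
    do
    {
        a ^= ( (a&m) >> s );
        s >>= 1;
        m ^= (m>>s);
    }
    while ( s );
    return a;
}

static inline word blue_fixed_point(word s)
{
    if ( 0==s )  return 0;
    word f = 1;
    while ( s>1 )
    {
        f ^= (f<<1);   // multiply polynomial by x+1
        f <<= 1;       // multiply by x
        f |= (s&1);    // add one if bit of argument is set
        s >>= 1;
    }
    return f;
}

// Hypothetical check: blue_code() leaves blue_fixed_point(s) unchanged
// for all s < n.  Must have n <= 2**15 so that the results fit into 32 bits.
static bool check_fixed_points(word n)
{
    for (word s = 0; s < n; ++s)
    {
        const word f = blue_fixed_point(s);
        if ( blue_code(f) != f )  return false;
    }
    return true;
}
```

The values for the arguments 2, 5 and 16 agree with the entries 6, 18 and 272 of figure 1.19-D.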
For the inverse function, we exploit that polynomial division by x + 1 can be done with the inverse reversed Gray code (see section 1.16.6 on page 45) if the polynomial is divisible by x + 1:

    static inline ulong blue_fixed_point_idx(ulong f)
    // Inverse of blue_fixed_point()
    {
        ulong s = 1;
        while ( f )
        {
            s <<= 1;
            s ^= (f & 1);
            f >>= 1;
            f = inverse_rev_gray_code(f);  // == bitpol_div(f, 3);
        }
        return s >> 1;
    }

1.19.4 More transforms by symbolic powering

The idea of powering a transform (as with the Gray code, see section 1.18 on page 48) can be applied to the ‘color’-transforms as exemplified for the blue code:

    static inline ulong blue_xcode(ulong a, ulong x)
    {
        x &= (BITS_PER_LONG-1);  // modulo BITS_PER_LONG
        ulong s = BITS_PER_LONG >> 1;
        ulong m = ~0UL << s;
        while ( s )
        {
            if ( x & 1 )  a ^= ( (a&m) >> s );
            x >>= 1;
            s >>= 1;
            m ^= (m>>s);
        }
        return a;
    }

The result is not the power of the blue code which would be pretty boring as B B = id. The transforms (and the equivalents for Y, R and E, see [FXT: bits/bitxtransforms.h]) are more interesting: all relations between the transforms are still valid, if the symbolic exponent is identical with all terms in the relation. For example, we had B B = id, now B^x B^x = id is true for all x. Similarly, E E = R now has to be E^x E^x = R^x. That is, we have BITS_PER_LONG different versions of our four transforms that share their properties with the ‘simple’ versions. Among them are BITS_PER_LONG transforms B^x and Y^x that are involutions and E^x and R^x that are third roots of the identity: E^x E^x E^x = R^x R^x R^x = id.

While not powers of the simple versions, we still have B^0 = Y^0 = R^0 = E^0 = id. Further, let e be the ‘exponent’ of all ones and Z be any of the transforms, then Z^e = Z. Writing ‘+’ for the XOR operation, we have Z^x Z^y = Z^{x+y} and so Z^x Z^y = Z whenever x + y = e.
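These properties can be checked for the blue code with a few lines. The sketch below uses 32-bit words, so the symbolic exponent has five relevant bits and e = 31; the type name `word` and the helper `check_xcode()` are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t word;  // assumption: 32-bit words

static inline word blue_code(word a)
{
    word s = 16;
    word m = 0xffffffffu << s;
    do
    {
        a ^= ( (a&m) >> s );
        s >>= 1;
        m ^= (m>>s);
    }
    while ( s );
    return a;
}

static inline word blue_xcode(word a, word x)
{
    x &= 31;  // modulo 32
    word s = 16;
    word m = 0xffffffffu << s;
    while ( s )
    {
        if ( x & 1 )  a ^= ( (a&m) >> s );
        x >>= 1;
        s >>= 1;
        m ^= (m>>s);
    }
    return a;
}

// Hypothetical check for one word a:
//   B^0 = id,  B^e = B for e = 31,  B^x B^y = B^(x XOR y)
static bool check_xcode(word a)
{
    if ( blue_xcode(a, 0) != a )  return false;
    if ( blue_xcode(a, 31) != blue_code(a) )  return false;
    for (word x = 0; x < 32; ++x)
        for (word y = 0; y < 32; ++y)
            if ( blue_xcode( blue_xcode(a, y), x ) != blue_xcode(a, x ^ y) )
                return false;
    return true;
}
```

The case x = y of the last relation gives B^x B^x = B^0 = id, that is, every B^x is an involution.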
1.19.5 The building blocks of the transforms

Consider the following transforms on 2-bit words where addition is bit-wise (that is, XOR):

\begin{align}
  \mathrm{id}_2\, v &= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} a \\ b \end{bmatrix} \tag{1.19-12a} \\
  r_2\, v &= \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} b \\ a \end{bmatrix} \tag{1.19-12b} \\
  B_2\, v &= \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} a+b \\ b \end{bmatrix} \tag{1.19-12c} \\
  Y_2\, v &= \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} a \\ a+b \end{bmatrix} \tag{1.19-12d} \\
  R_2\, v &= \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} b \\ a+b \end{bmatrix} \tag{1.19-12e} \\
  E_2\, v &= \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} a+b \\ a \end{bmatrix} \tag{1.19-12f}
\end{align}

It can easily be verified that for these the same relations hold as for id, r, B, Y, R, E. In fact the ‘color-transforms’, bit-reversal, and identity are the transforms obtained as repeated Kronecker-products of the matrices (see section 23.3 on page 462). The transforms are linear over GF(2):

    Z(α a + β b) = α Z(a) + β Z(b)    (1.19-13)

The corresponding version of the bit-reversal is [FXT: bits/revbin.h]:

    static inline ulong xrevbin(ulong a, ulong x)
    {
        x &= (BITS_PER_LONG-1);  // modulo BITS_PER_LONG
        ulong s = BITS_PER_LONG >> 1;
        ulong m = ~0UL >> s;
        while ( s )
        {
            if ( x & 1 )  a = ( (a & m) << s ) ^ ( (a & (~m)) >> s );
            x >>= 1;
            s >>= 1;
            m ^= (m<<s);
        }
        return a;
    }

[...]

    // ==> loop is executed prop. log_2(BITS_PER_LONG) times
    // precision is 3, 6, 12, 24, 48, 96, ... bits (or better)
    {
        if ( 0==(x&1) )  return 0;  // not invertible
        ulong i = x;  // correct to three bits at least
        ulong p;
        do
        {
            p = i * x;
            i *= (2UL - p);
        }
        while ( p!=1 );
        return i;
    }

Let m be the modulus (a power of 2), then the computed value i is the inverse of x modulo m: i ≡ x^{-1} mod m. It can be used for the exact division: to compute the quotient a/x for a number a that is known to be divisible by x, simply multiply by i. This works because a = b x (a is divisible by x), so a i ≡ b x i ≡ b mod m.

1.21.2 Exact division by C = 2^k ± 1

We use the following relation where Y = 1 − C:

\[
  \frac{A}{C} \;=\; \frac{A}{1-Y} \;=\;
  A\,(1+Y)\,(1+Y^{2})\,(1+Y^{4})\,(1+Y^{8}) \cdots (1+Y^{2^{n}})
  \mod Y^{2^{n+1}}
  \tag{1.21-1}
\]

The relation can be used for efficient exact division over Z by C = 2^k ± 1.
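Relation 1.21-1 can be tried out directly for 32-bit words. The sketch below is an illustration under stated assumptions and not a routine from FXT: it works for every odd C (then Y = 1 − C is even, so Y^32 vanishes modulo 2^32 and five factors suffice), and the names `word` and `div_exact()` are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t word;  // assumption: 32-bit words, arithmetic modulo 2**32

// Hypothetical helper: return a/c modulo 2**32 for odd c.
// If a is divisible by c and the quotient fits into a word,
// then the result is the exact quotient.
static inline word div_exact(word a, word c)
{
    word y = 1u - c;   // Y = 1 - C, an even number
    word r = a;
    for (int j = 0; j < 5; ++j)  // factors (1+Y), (1+Y^2), ..., (1+Y^16)
    {
        r *= (1u + y);
        y *= y;        // Y^(2^j) --> Y^(2^(j+1))
    }
    return r;
}
```

With a = 1 the routine returns the inverse of c modulo 2^32. No division instruction is used, only multiplications and additions.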
For C = 2^k + 1 use

\[
  \frac{A}{C} \;=\;
  A\,(1-2^{k})\,(1+2^{k\,2})\,(1+2^{k\,4})\,(1+2^{k\,8}) \cdots (1+2^{k\,2^{u}})
  \mod 2^{N}
  \tag{1.21-2}
\]

where k 2^u ≥ N. For C = 2^k − 1 use (A/C = −A/(−C))

\[
  \frac{A}{C} \;=\;
  -A\,(1+2^{k})\,(1+2^{k\,2})\,(1+2^{k\,4})\,(1+2^{k\,8}) \cdots (1+2^{k\,2^{u}})
  \mod 2^{N}
  \tag{1.21-3}
\]

The equivalent method for exact division by polynomials (over GF(2)) is given in section 40.1.6 on page 826.

1.21.3 Computation of the square root

       x = ...............................1 = 1
     inv = ...............................1
    sqrt = ...............................1
       x = 11111111111111111111111111111111 = -1
     inv = 11111111111111111111111111111111
       x = ..............................1. = 2
       x = 1111111111111111111111111111111. = -2
       x = ..............................11 = 3
     inv = 1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.11
       x = 111111111111111111111111111111.1 = -3
     inv = .1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1
       x = .............................1.. = 4
    sqrt = ..............................1.
       x = 111111111111111111111111111111.. = -4
       x = .............................1.1 = 5
     inv = 11..11..11..11..11..11..11..11.1
       x = 11111111111111111111111111111.11 = -5
     inv = ..11..11..11..11..11..11..11..11
       x = .............................11. = 6
       x = 11111111111111111111111111111.1. = -6
       x = .............................111 = 7
     inv = 1.11.11.11.11.11.11.11.11.11.111
       x = 11111111111111111111111111111..1 = -7
     inv = .1..1..1..1..1..1..1..1..1..1..1
    sqrt = 1..111..1..11...11......1.11.1.1
       x = ............................1... = 8
       x = 11111111111111111111111111111... = -8
       x = ............................1..1 = 9
     inv = ..111...111...111...111...111..1
    sqrt = 111111111111111111111111111111.1

Figure 1.21-A: Examples of the inverse and square root modulo 2^n of x where −9 ≤ x ≤ +9. Where no inverse or square root is given, it does not exist.

With the inverse square root we choose the start value to match ⌊d/2⌋ + 1 as that guarantees four bits of initial precision. Moreover, we control which of the two possible values of the inverse square root is computed.
The argument modulo 8 has to be equal to 1.

    static inline ulong invsqrt2adic(ulong d)
    // Return inverse square root modulo 2**BITS_PER_LONG
    // Must have: d==1 mod 8
    // The number of correct bits is doubled with each step
    // ==> loop is executed prop. log_2(BITS_PER_LONG) times
    // precision is 4, 8, 16, 32, 64, ... bits (or better)
    {
        if ( 1 != (d&7) )  return 0;  // no inverse sqrt
        // start value: if d == ****10001 ==> x := ****1001
        ulong x = (d >> 1) | 1;
        ulong p, y;
        do
        {
            y = x;
            p = (3 - d * y * y);
            x = (y * p) >> 1;
        }
        while ( x!=y );
        return x;
    }

The square root is computed as d · 1/√d:

    static inline ulong sqrt2adic(ulong d)
    // Return square root modulo 2**BITS_PER_LONG
    // Must have: d==1 mod 8 or d==4 mod 32, d==16 mod 128
    // ... d==4**k mod 2**(2*k+3)
    // Result undefined if condition does not hold
    {
        if ( 0==d )  return 0;
        ulong s = 0;
        while ( 0==(d&1) )  { d >>= 1;  ++s; }
        d *= invsqrt2adic(d);
        d <<= (s>>1);
        return d;
    }

Note that the square root modulo 2^n is something completely different from the integer square root in general. If the argument d is a perfect square, then the result is ±√d. The output of the program [FXT: bits/bit2adic-demo.cc] is shown in figure 1.21-A. For further information see [213, ex.31, p.213], [135, chap.6, p.126], and also [208].

1.22 Radix −2 (minus two) representation

The radix −2 representation of a number n is

\[
  n \;=\; \sum_{k=0}^{\infty} t_k \, (-2)^k
  \tag{1.22-1}
\]

where the t_k are zero or one. For integers n the sum is terminating: the highest nonzero t_k is at most two positions beyond the highest bit of the binary representation of the absolute value of n (with two’s complement).

1.22.1 Conversion from binary

     k:   bin(k)    m=bin2neg(k)  g=gray(m)  dec(g)
     0:   .......   .......       .......      0   <= 0
     1:   ......1   ......1       ......1      1   <= 1
     2:   .....1.   ....11.       ....1.1      5
     3:   .....11   ....111       ....1..      4
     4:   ....1..   ....1..       ....11.      2
     5:   ....1.1   ....1.1       ....111      3   <= 5
     6:   ....11.   ..11.1.       ..1.111     19
     7:   ....111   ..11.11       ..1.11.     18
     8:   ...1...   ..11...       ..1.1..     20
     9:   ...1..1   ..11..1       ..1.1.1     21
    10:   ...1.1.   ..1111.       ..1...1     17
    11:   ...1.11   ..11111       ..1....     16
    12:   ...11..   ..111..       ..1..1.     14
    13:   ...11.1   ..111.1       ..1..11     15
    14:   ...111.   ..1..1.       ..11.11      7
    15:   ...1111   ..1..11       ..11.1.      6
    16:   ..1....   ..1....       ..11...      8
    17:   ..1...1   ..1...1       ..11..1      9
    18:   ..1..1.   ..1.11.       ..111.1     13
    19:   ..1..11   ..1.111       ..111..     12
    20:   ..1.1..   ..1.1..       ..1111.     10
    21:   ..1.1.1   ..1.1.1       ..11111     11   <= 21
    22:   ..1.11.   11.1.1.       1.11111     75
    23:   ..1.111   11.1.11       1.1111.     74
    24:   ..11...   11.1...       1.111..     76
    25:   ..11..1   11.1..1       1.111.1     77
    26:   ..11.1.   11.111.       1.11..1     73
    27:   ..11.11   11.1111       1.11...     72
    28:   ..111..   11.11..       1.11.1.     70
    29:   ..111.1   11.11.1       1.11.11     71
    30:   ..1111.   11...1.       1.1..11     79
    31:   ..11111   11...11       1.1..1.     78

Figure 1.22-A: Radix −2 representations and their Gray codes. Lines ending in ‘<=N’ indicate that all values ≤ N occur in the last column up to that point.

A surprisingly simple algorithm to compute the coefficients t_k of the radix −2 representation of a binary number is [39, item 128] [FXT: bits/negbin.h]:

    static inline ulong bin2neg(ulong x)
    // binary --> radix(-2)
    {
        const ulong m = 0xaaaaaaaaUL;  // 32 bit version
        x += m;
        x ^= m;
        return x;
    }

An example:

    14 --> ..1..1. == 16 - 2 == (-2)^4 + (-2)^1

The inverse routine executes the inverse of the two steps in reversed order:

    static inline ulong neg2bin(ulong x)
    // radix(-2) --> binary
    // inverse of bin2neg()
    {
        const ulong m = 0xaaaaaaaaUL;  // 32-bit version
        x ^= m;
        x -= m;
        return x;
    }

Figure 1.22-A shows the output of the program [FXT: bits/negbin-demo.cc]. The sequence of Gray codes of the radix −2 representation is a Gray code for the numbers in the range 0, ..., k for the following values of k (entry A002450 in [312]):

    k = 1, 5, 21, 85, 341, 1365, 5461, 21845, 87381, 349525, 1398101, ..., (4^n − 1)/3

1.22.2 Fixed points of the conversion ‡

      0: ...........       64: ....1......
      1: ..........1       65: ....1.....1
      4: ........1..       68: ....1...1..
      5: ........1.1       69: ....1...1.1
     16: ......1....       80: ....1.1....
     17: ......1...1       81: ....1.1...1
     20: ......1.1..       84: ....1.1.1..
     21: ......1.1.1       85: ....1.1.1.1

    256: ..1........      320: ..1.1......
    257: ..1.......1      321: ..1.1.....1
    260: ..1.....1..      324: ..1.1...1..
    261: ..1.....1.1      325: ..1.1...1.1
    272: ..1...1....      336: ..1.1.1....
    273: ..1...1...1      337: ..1.1.1...1
    276: ..1...1.1..      340: ..1.1.1.1..
    277: ..1...1.1.1      341: ..1.1.1.1.1

Figure 1.22-B: The fixed points of the conversion and their binary representations (dots denote zeros).

The sequence of fixed points of the conversion starts as

    0, 1, 4, 5, 16, 17, 20, 21, 64, 65, 68, 69, 80, 81, 84, 85, 256, ...

The binary representations have ones only at even positions (see figure 1.22-B). This is the Moser – De Bruijn sequence, entry A000695 in [312]. The generating function of the sequence is

\[
  \frac{1}{1-x} \sum_{j=0}^{\infty} \frac{4^{j}\, x^{2^{j}}}{1 + x^{2^{j}}}
  \;=\; x + 4\,x^2 + 5\,x^3 + 16\,x^4 + 17\,x^5 + 20\,x^6 + 21\,x^7 + 64\,x^8 + 65\,x^9 + \ldots
  \tag{1.22-2}
\]

The sequence also appears as exponents in the power series (see also section 38.10.1 on page 750)

\[
  \prod_{k=0}^{\infty} \left( 1 + x^{4^{k}} \right)
  \;=\; 1 + x + x^4 + x^5 + x^{16} + x^{17} + x^{20} + x^{21} + x^{64} + x^{65} + x^{68} + \ldots
  \tag{1.22-3}
\]

The k-th fixed point is computed by moving each bit of the binary representation of k from its index x ≥ 0 to position 2x:

    static inline ulong negbin_fixed_point(ulong k)
    {
        return bit_zip0(k);
    }

The bit-zip function is given in section 1.15 on page 39.

The sequence of radix −2 representations of 0, 1, 2, ..., interpreted as binary numbers, is entry A005351 in [312]:

    0,1,6,7,4,5,26,27,24,25,30,31,28,29,18,19,16,17,22,23,20,21,106,107,104,105,110,111, ...

The corresponding sequence for the negative numbers −1, −2, −3, ... is entry A005352:

    3,2,13,12,15,14,9,8,11,10,53,52,55,54,49,48,51,50,61,60,63,62,57,56,59,58,37,36,39,38, ...
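The first terms of both sequences can be reproduced with the conversion routines. The following is a minimal sketch for 32-bit words; the type name `word` and the helper `negbin_value()`, which evaluates the digits of a radix −2 word by summing powers of −2, are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t word;  // assumption: 32-bit words

static inline word bin2neg(word x)
// binary --> radix(-2)
{
    const word m = 0xaaaaaaaau;
    x += m;
    x ^= m;
    return x;
}

static inline word neg2bin(word x)
// radix(-2) --> binary
{
    const word m = 0xaaaaaaaau;
    x ^= m;
    x -= m;
    return x;
}

// Hypothetical helper: value of the radix(-2) word t,
// computed as the sum of (-2)^k over the set bits k of t.
static long long negbin_value(word t)
{
    long long v = 0;
    long long w = 1;  // (-2)^k
    for (int k = 0; k < 32; ++k)
    {
        if ( (t >> k) & 1 )  v += w;
        w *= -2;
    }
    return v;
}
```

Negative arguments are passed in two's complement form, for example bin2neg( word(-3) ) for −3.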
More information about ‘non-standard’ representations of numbers can be found in [213].

1.22.3 Generating negbin words in order

    ................................................................
    ......................111111111111111111111111111111111111111111
    ......................11111111111111111111111111111111..........
    ......1111111111111111................1111111111111111..........
    ......11111111........11111111........11111111........11111111..
    ..1111....1111....1111....1111....1111....1111....1111....1111..
    ..11..11..11..11..11..11..11..11..11..11..11..11..11..11..11..11
    .1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1

    ................................................................
    ...........................................111111111111111111111
    ...........................................111111111111111111111
    ...........11111111111111111111111111111111.....................
    ...........1111111111111111................1111111111111111.....
    ...11111111........11111111........11111111........11111111.....
    ...1111....1111....1111....1111....1111....1111....1111....1111.
    .11..11..11..11..11..11..11..11..11..11..11..11..11..11..11..11.
    .1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1

Figure 1.22-C: Radix −2 representations of the numbers 0 ... +63 (top) and 0 ... −63 (bottom).

A radix −2 representation can be incremented by the function [FXT: bits/negbin.h] (32-bit versions in what follows):

    static inline ulong next_negbin(ulong x)
    // With x the radix(-2) representation of n
    // return radix(-2) representation of n+1.
    {
        const ulong m = 0xaaaaaaaaUL;
        x ^= m;
        ++x;
        x ^= m;
        return x;
    }

A version without constants is

        ulong s = x << 1;
        ulong y = x ^ s;
        y += 1;
        s ^= y;
        return s;

Decrementing can be done via

    static inline ulong prev_negbin(ulong x)
    // With x the radix(-2) representation of n
    // return radix(-2) representation of n-1.
    {
        const ulong m = 0xaaaaaaaaUL;
        x ^= m;
        --x;
        x ^= m;
        return x;
    }

or via

        const ulong m = 0x55555555UL;
        x ^= m;
        ++x;
        x ^= m;
        return x;

The functions are quite fast, about 730 million words per second are generated (3 cycles per increment or decrement). Figure 1.22-C shows the generated words in forward (top) and backward (bottom) order. It was created with the program [FXT: bits/negbin2-demo.cc].

1.23 A sparse signed binary representation

     0:   .......   .......    0 =
     1:   ......1   ......P    1 = +1
     2:   .....1.   .....P.    2 = +2
     3:   .....11   ....P.M    3 = +4 -1
     4:   ....1..   ....P..    4 = +4
     5:   ....1.1   ....P.P    5 = +4 +1
     6:   ....11.   ...P.M.    6 = +8 -2
     7:   ....111   ...P..M    7 = +8 -1
     8:   ...1...   ...P...    8 = +8
     9:   ...1..1   ...P..P    9 = +8 +1
    10:   ...1.1.   ...P.P.   10 = +8 +2
    11:   ...1.11   ..P.M.M   11 = +16 -4 -1
    12:   ...11..   ..P.M..   12 = +16 -4
    13:   ...11.1   ..P.M.P   13 = +16 -4 +1
    14:   ...111.   ..P..M.   14 = +16 -2
    15:   ...1111   ..P...M   15 = +16 -1
    16:   ..1....   ..P....   16 = +16
    17:   ..1...1   ..P...P   17 = +16 +1
    18:   ..1..1.   ..P..P.   18 = +16 +2
    19:   ..1..11   ..P.P.M   19 = +16 +4 -1
    20:   ..1.1..   ..P.P..   20 = +16 +4
    21:   ..1.1.1   ..P.P.P   21 = +16 +4 +1
    22:   ..1.11.   .P.M.M.   22 = +32 -8 -2
    23:   ..1.111   .P.M..M   23 = +32 -8 -1
    24:   ..11...   .P.M...   24 = +32 -8
    25:   ..11..1   .P.M..P   25 = +32 -8 +1
    26:   ..11.1.   .P.M.P.   26 = +32 -8 +2
    27:   ..11.11   .P..M.M   27 = +32 -4 -1
    28:   ..111..   .P..M..   28 = +32 -4
    29:   ..111.1   .P..M.P   29 = +32 -4 +1
    30:   ..1111.   .P...M.   30 = +32 -2
    31:   ..11111   .P....M   31 = +32 -1
    32:   .1.....   .P.....   32 = +32

Figure 1.23-A: Sparse signed binary representations (nonadjacent form, NAF). The symbols ‘P’ and ‘M’ are respectively used for +1 and −1, dots denote zeros.

     0:   ........   ........    0 =
     1:   .......1   .......P    1 = +1
     2:   ......1.   ......P.    2 = +2
     4:   .....1..   .....P..    4 = +4
     5:   .....1.1   .....P.P    5 = +4 +1
     8:   ....1...   ....P...    8 = +8
     9:   ....1..1   ....P..P    9 = +8 +1
    10:   ....1.1.   ....P.P.   10 = +8 +2
    16:   ...1....   ...P....   16 = +16
    17:   ...1...1   ...P...P   17 = +16 +1
    18:   ...1..1.   ...P..P.   18 = +16 +2
    20:   ...1.1..   ...P.P..   20 = +16 +4
    21:   ...1.1.1   ...P.P.P   21 = +16 +4 +1
    32:   ..1.....   ..P.....   32 = +32
    33:   ..1....1   ..P....P   33 = +32 +1
    34:   ..1...1.   ..P...P.   34 = +32 +2
    36:   ..1..1..   ..P..P..   36 = +32 +4
    37:   ..1..1.1   ..P..P.P   37 = +32 +4 +1
    40:   ..1.1...   ..P.P...   40 = +32 +8
    41:   ..1.1..1   ..P.P..P   41 = +32 +8 +1
    42:   ..1.1.1.   ..P.P.P.   42 = +32 +8 +2
    64:   .1......   .P......   64 = +64

Figure 1.23-B: The numbers whose negative part in the NAF representation is zero.

An algorithm to compute a representation of a number x as

\[
  x \;=\; \sum_{k=0}^{\infty} s_k \cdot 2^k
  \qquad \text{where} \quad s_k \in \{-1, 0, +1\}
  \tag{1.23-1}
\]

such that two consecutive digits s_k, s_{k+1} are never simultaneously nonzero is given in [275]. Figure 1.23-A gives the representation of several small numbers. It is the output of [FXT: bits/bin2naf-demo.cc].

We can convert the binary representation of x into a pair of binary numbers that correspond to the positive and negative digits [FXT: bits/bin2naf.h]:

    static inline void bin2naf(ulong x, ulong &np, ulong &nm)
    // Compute (nonadjacent form, NAF) signed binary representation of x:
    // the unique representation of x as
    // x=\sum_{k}{d_k*2^k} where d_j \in {-1,0,+1}
    // and no two adjacent digits d_j, d_{j+1} are both nonzero.
    // np has bits j set where d_j==+1
    // nm has bits j set where d_j==-1
    // We have: x = np - nm
    {
        ulong xh = x >> 1;  // x/2
        ulong x3 = x + xh;  // 3*x/2
        ulong c = xh ^ x3;
        np = x3 & c;
        nm = xh & c;
    }

Converting back to binary is trivial:

    static inline ulong naf2bin(ulong np, ulong nm) { return ( np - nm ); }

The representation is one example of a nonadjacent form (NAF). A method for the computation of certain nonadjacent forms (w-NAF) is given in [255]. A Gray code for the signed binary words is described in section 14.7 on page 315.

If a binary word contains no consecutive ones, then the negative part of the NAF representation is zero. The sequence of values is [0, 1, 2, 4, 5, 8, 9, 10, 16, . .
.], entry A003714 in [312], see figure 1.23-B. The numbers are called the Fibbinary numbers.

1.24 Generating bit combinations

1.24.1 Co-lexicographic (colex) order

Given a binary word with k bits set the following routine computes the binary word that is the next combination of k bits in co-lexicographic order. In the co-lexicographic order the reversed sets are sorted, see figure 1.24-A. The method to determine the successor is to determine the lowest block of ones and move its highest bit one position up. Then the rest of the block is moved to the low end of the word [FXT: bits/bitcombcolex.h]:

    static inline ulong next_colex_comb(ulong x)
    {
        ulong r = x & -x;  // lowest set bit
        x += r;            // replace lowest block by a one left to it

        if ( 0==x )  return 0;  // input was last combination

        ulong z = x & -x;  // first zero beyond lowest block
        z -= r;            // lowest block (cf. lowest_block())

        while ( 0==(z&1) )  { z >>= 1; }  // move block to low end of word
        return x | (z>>1);  // need one bit less of low block
    }

One could replace the while-loop by a bit scan and shift combination. The combinations \binom{32}{20} are generated at a rate of about 142 million per second. The rate is about 120 M/s for the combinations \binom{32}{12}, the rate with \binom{60}{7} is 70 M/s, and with \binom{60}{53} it is 160 M/s.

           word      set            set (reversed)
     1:   ...111  =  { 0, 1, 2 }  =  { 2, 1, 0 }
     2:   ..1.11  =  { 0, 1, 3 }  =  { 3, 1, 0 }
     3:   ..11.1  =  { 0, 2, 3 }  =  { 3, 2, 0 }
     4:   ..111.  =  { 1, 2, 3 }  =  { 3, 2, 1 }
     5:   .1..11  =  { 0, 1, 4 }  =  { 4, 1, 0 }
     6:   .1.1.1  =  { 0, 2, 4 }  =  { 4, 2, 0 }
     7:   .1.11.  =  { 1, 2, 4 }  =  { 4, 2, 1 }
     8:   .11..1  =  { 0, 3, 4 }  =  { 4, 3, 0 }
     9:   .11.1.  =  { 1, 3, 4 }  =  { 4, 3, 1 }
    10:   .111..  =  { 2, 3, 4 }  =  { 4, 3, 2 }
    11:   1...11  =  { 0, 1, 5 }  =  { 5, 1, 0 }
    12:   1..1.1  =  { 0, 2, 5 }  =  { 5, 2, 0 }
    13:   1..11.  =  { 1, 2, 5 }  =  { 5, 2, 1 }
    14:   1.1..1  =  { 0, 3, 5 }  =  { 5, 3, 0 }
    15:   1.1.1.  =  { 1, 3, 5 }  =  { 5, 3, 1 }
    16:   1.11..  =  { 2, 3, 5 }  =  { 5, 3, 2 }
    17:   11...1  =  { 0, 4, 5 }  =  { 5, 4, 0 }
    18:   11..1.  =  { 1, 4, 5 }  =  { 5, 4, 1 }
    19:   11.1..  =  { 2, 4, 5 }  =  { 5, 4, 2 }
    20:   111...  =  { 3, 4, 5 }  =  { 5, 4, 3 }

Figure 1.24-A: Combinations \binom{6}{3} in co-lexicographic order. The reversed sets are sorted.

A variant of the method which involves a division appears in [39, item 175]. The routine given here is due to Doug Moore and Glenn Rhoads.

The following routine computes the predecessor of a combination:

    static inline ulong prev_colex_comb(ulong x)
    // Inverse of next_colex_comb()
    {
        x = next_colex_comb( ~x );
        if ( 0!=x )  x = ~x;
        return x;
    }

The first and last combination can be computed via

    static inline ulong first_comb(ulong k)
    // Return the first combination of (i.e. smallest word with) k bits,
    // i.e. 00..001111..1 (k low bits set)
    // Must have: 0 <= k <= BITS_PER_LONG
    {
        if ( k==0 )  return 0;  // shift with BITS_PER_LONG is undefined
        return ~0UL >> ( BITS_PER_LONG - k );
    }

and

    static inline ulong last_comb(ulong k, ulong n=BITS_PER_LONG)
    // return the last combination of (biggest n-bit word with) k bits
    // i.e. 1111..100..00 (k high bits set)
    // Must have: 0 <= k <= n <= BITS_PER_LONG
    {
        return first_comb(k) << (n - k);
    }

The if-statement in first_comb() is needed because a shift by more than BITS_PER_LONG−1 is undefined by the C-standard, see section 1.1.5 on page 4.
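The routines of this section can be exercised together as follows. This is a sketch for 32-bit words under stated assumptions: the type name `word` and the helper `count_combs()` are made up for illustration, and last_comb() here also guards the case k = 0, where the shift count would otherwise equal the word length.

```cpp
#include <cassert>
#include <cstdint>

typedef std::uint32_t word;  // assumption: 32-bit words

static inline word next_colex_comb(word x)
{
    word r = x & (0u - x);  // lowest set bit
    x += r;                 // replace lowest block by a one left to it
    if ( 0==x )  return 0;  // input was last combination
    word z = x & (0u - x);  // first zero beyond lowest block
    z -= r;                 // lowest block
    while ( 0==(z&1) )  { z >>= 1; }  // move block to low end of word
    return x | (z>>1);      // need one bit less of low block
}

static inline word prev_colex_comb(word x)
{
    x = next_colex_comb( ~x );
    if ( 0!=x )  x = ~x;
    return x;
}

static inline word first_comb(unsigned k)
{
    if ( k==0 )  return 0;  // avoid the undefined shift by 32
    return 0xffffffffu >> ( 32 - k );
}

static inline word last_comb(unsigned k, unsigned n = 32)
{
    if ( k==0 )  return 0;  // avoid the undefined shift by 32 for n==32
    return first_comb(k) << (n - k);
}

// Hypothetical helper: count the combinations (n choose k) and check
// that prev_colex_comb() inverts each step.  Must have: 1 <= k <= n < 32
static unsigned count_combs(unsigned n, unsigned k)
{
    const word last = last_comb(k, n);
    word g = first_comb(k);
    unsigned ct = 1;
    while ( g != last )
    {
        const word h = next_colex_comb(g);
        assert( prev_colex_comb(h) == g );
        g = h;
        ++ct;
    }
    return ct;
}
```

The predecessor of the first combination is reported as zero, just as the successor of the last combination of a full word.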
The listing in figure 1.24-A can be created with the program [FXT: bits/bitcombcolex-demo.cc]:

    ulong n = 6,  k = 3;
    ulong last = last_comb(k, n);
    ulong g = first_comb(k);
    ulong gg = 0;
    do
    {
        // visit combination given as word g
        gg = g;
        g = next_colex_comb(g);
    }
    while ( gg!=last );

1.24.2 Lexicographic (lex) order

           lex (5, 3)                 colex (5, 2)
           word  =  set               word  =  set
      1:  ..111  =  { 0, 1, 2 }      ...11  =  { 0, 1 }
      2:  .1.11  =  { 0, 1, 3 }      ..1.1  =  { 0, 2 }
      3:  1..11  =  { 0, 1, 4 }      ..11.  =  { 1, 2 }
      4:  .11.1  =  { 0, 2, 3 }      .1..1  =  { 0, 3 }
      5:  1.1.1  =  { 0, 2, 4 }      .1.1.  =  { 1, 3 }
      6:  11..1  =  { 0, 3, 4 }      .11..  =  { 2, 3 }
      7:  .111.  =  { 1, 2, 3 }      1...1  =  { 0, 4 }
      8:  1.11.  =  { 1, 2, 4 }      1..1.  =  { 1, 4 }
      9:  11.1.  =  { 1, 3, 4 }      1.1..  =  { 2, 4 }
     10:  111..  =  { 2, 3, 4 }      11...  =  { 3, 4 }

Figure 1.24-B: Combinations $\binom{5}{3}$ in lexicographic order (left). The sets are sorted. The binary words with lex order are the bit-reversed complements of the words with colex order (right).

The binary words corresponding to combinations $\binom{n}{k}$ in lexicographic order are the bit-reversed complements of the words for the combinations $\binom{n}{n-k}$ in co-lexicographic order, see figure 1.24-B. A more precise term for the order is subset-lex (for sets written with elements in increasing order). The sequence is identical to the delta-set-colex order backwards.

The program [FXT: bits/bitcomblex-demo.cc] shows how to compute the subset-lex sequence efficiently:

    ulong n = 5,  k = 3;
    ulong x = first_comb(n-k);        // first colex (n choose n-k)
    const ulong m = first_comb(n);    // aux mask
    const ulong l = last_comb(k, n);  // last colex
    ulong ct = 0;
    ulong y;
    do
    {
        y = revbin(~x, n) & m;  // lex order
        // visit combination given as word y
        x = next_colex_comb(x);
    }
    while ( y != l );

The bit-reversal routine revbin() is shown in section 1.14 on page 33.
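The relation between lex and colex words can be tried out with the following self-contained sketch. It is not the FXT demo: revbin_low() is a simple loop standing in for revbin(x, n), and lex_combs() is a hypothetical helper that collects the words in a vector.

```cpp
#include <cassert>
#include <vector>

typedef unsigned long ulong;

// Colex successor, as given in section 1.24.1.
static inline ulong next_colex_comb(ulong x)
{
    ulong r = x & -x;
    x += r;
    if ( 0==x )  return 0;
    ulong z = x & -x;
    z -= r;
    while ( 0==(z&1) )  { z >>= 1; }
    return  x | (z>>1);
}

// Reverse the n low bits of x, higher bits are dropped.
// Simple loop, an assumption standing in for revbin(x, n).
static inline ulong revbin_low(ulong x, ulong n)
{
    ulong r = 0;
    for (ulong j=0; j<n; ++j)  { r = (r<<1) | (x&1);  x >>= 1; }
    return  r;
}

// Return all k-bit combinations of n-bit words in lex order.
// Must have: 0 <= k <= n < number of bits in ulong
static std::vector<ulong> lex_combs(ulong n, ulong k)
{
    assert( (k <= n) && (n < 8*sizeof(ulong)) );
    const ulong m = (1UL << n) - 1;              // mask: n low bits
    const ulong l = ((1UL << k) - 1) << (n-k);   // last word: k high bits set
    ulong x = (1UL << (n-k)) - 1;                // first colex word, (n choose n-k)
    std::vector<ulong> v;
    ulong y;
    do
    {
        y = revbin_low(~x, n) & m;  // bit-reversed complement
        v.push_back(y);
        x = next_colex_comb(x);
    }
    while ( y != l );
    return  v;
}
```

With n = 5 and k = 3 the returned words are those of the left column of figure 1.24-B.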
Section 6.2.1 on page 177 and section 6.2.2 give iterative algorithms for combinations (represented by arrays) in lex and colex order, respectively.

1.24.3 Shifts-order

            k = 1      k = 2      k = 3      k = 4
      1:   1....      11...      111..      1111.
      2:   .1...      .11..      .111.      .1111
      3:   ..1..      ..11.      ..111      111.1
      4:   ...1.      ...11      11.1.      11.11
      5:   ....1      1.1..      .11.1      1.111
      6:              .1.1.      11..1
      7:              ..1.1      1.11.
      8:              1..1.      .1.11
      9:              .1..1      1.1.1
     10:              1...1      1..11

Figure 1.24-C: Combinations $\binom{5}{k}$, for k = 1, 2, 3, 4 in shifts-order.

Figure 1.24-C shows combinations in shifts-order. The order for combinations $\binom{n}{k}$ is obtained from the shifts-order for subsets (section 8.4 on page 208) by discarding all subsets whose number of elements is not equal to k and reversing the list order. The first combination is $[1^k 0^{n-k}]$ and the successor is computed as follows (see figure 1.24-D):

      1:  1111...              19:  11..1.1   < S
      2:  .1111..              20:  11...11   < S-2
      3:  ..1111.              21:  1.111..   < S-2
      4:  ...1111              22:  .1.111.
      5:  111.1..   < S        23:  ..1.111
      6:  .111.1.              24:  1.11.1.   < S
      7:  ..111.1              25:  .1.11.1
      8:  111..1.   < S        26:  1.11..1   < S
      9:  .111..1              27:  1.1.11.   < S-2
     10:  111...1   < S        28:  .1.1.11
     11:  11.11..   < S-2      29:  1.1.1.1   < S
     12:  .11.11.              30:  1.1..11   < S-2
     13:  ..11.11              31:  1..111.   < S-2
     14:  11.1.1.   < S        32:  .1..111
     15:  .11.1.1              33:  1..11.1   < S
     16:  11.1..1   < S        34:  1..1.11   < S-2
     17:  11..11.   < S-2      35:  1...111   < S-2
     18:  .11..11

Figure 1.24-D: Updates with combinations $\binom{7}{4}$: simple split 'S', split second 'S-2', easy case unmarked.

1. Easy case: if the rightmost one is not in position zero (least significant bit), then shift the word to the right and return the combination.
2. Finished?: if the combination is the last one ($[0^n]$, $[0^{n-1} 1]$, $[1 0^{n-k} 1^{k-1}]$), then return zero.
3. Shift back: shift the word to the left such that the leftmost one is in the leftmost position (this can be a no-op).
4. Simple split: if the rightmost one is not the least significant bit, then move it one position to the right and return the combination.
5. Split second block: move the rightmost bit of the second block (from the right) of ones one position to the right and attach the lowest block of ones and return the combination.

An implementation is given in [FXT: bits/bitcombshifts.h]:

    class bit_comb_shifts
    {
    public:
        ulong x_;  // the combination
        ulong s_;  // how far shifted to the right
        ulong n_, k_;  // combinations (n choose k)
        ulong last_;   // last combination

    public:
        bit_comb_shifts(ulong n, ulong k)
        {
            n_ = n;  k_ = k;
            first();
        }

        ulong first(ulong n, ulong k)
        {
            s_ = 0;
            x_ = last_comb(k, n);

            if ( k>1 )  last_ = first_comb(k-1) | (1UL<<(n_-1));  // [10000111]
            else        last_ = k;  // [000001] or [000000]

            return  x_;
        }

        ulong first()  { return first(n_, k_); }

        ulong next()
        {
            if ( 0==(x_&1) )  // easy case:
            {
                ++s_;
                x_ >>= 1;
                return  x_;
            }
            else  // splitting cases:
            {
                if ( x_ == last_ )  return 0;  // combination was last

                x_ <<= s_;  s_ = 0;  // shift back to the left
                ulong b = x_ & -x_;  // lowest bit

                if ( b!=1UL )  // simple split
                {
                    x_ -= (b>>1);  // move rightmost bit to the right
                    return  x_;
                }
                else  // split second block and attach first
                {
                    ulong t = low_ones(x_);  // block of ones at lower end
                    x_ ^= t;  // remove block
                    ulong b2 = x_ & -x_;  // (second) lowest bit

                    b2 >>= 1;
                    x_ -= b2;  // move bit to the right

                    // attach block:
                    do  { t<<=1; }  while ( 0==(t&x_) );
                    x_ |= (t>>1);
                    return  x_;
                }
            }
        }
    };

The combinations $\binom{32}{20}$ are generated at a rate of about 150 M/s, for the combinations $\binom{32}{12}$ the rate is about 220 M/s [FXT: bits/bitcombshifts-demo.cc]. The rate with the combinations $\binom{60}{7}$ is 415 M/s and with $\binom{60}{53}$ it is 110 M/s. The generation is very fast for the sparse case.

1.24.4 Minimal-change order ‡

The following routine is due to Doug Moore [FXT: bits/bitcombminchange.h]:

    static inline ulong igc_next_minchange_comb(ulong x)
    // Return the inverse Gray code of the next combination in minimal-change order.
    // Input must be the inverse Gray code of the current combination.
    {
        ulong g = rev_gray_code(x);
        ulong i = 2;
        ulong cb;  // ==candidate bits;

        do
        {
            ulong y = (x & ~(i-1)) + i;
            ulong j = lowest_one(y) << 1;
            ulong h = !!(y & j);
            cb = ((j-h) ^ g) & (j-i);
            i = j;
        }
        while ( 0==cb );

        return  x + lowest_one(cb);
    }

It can be used as suggested by the routine

    static inline ulong next_minchange_comb(ulong x, ulong last)
    // Not efficient, just to explain the usage of igc_next_minchange_comb()
    // Must have: last==igc_last_comb(k, n)
    {
        x = inverse_gray_code(x);
        if ( x==last )  return 0;
        x = igc_next_minchange_comb(x);
        return  gray_code(x);
    }

The auxiliary function igc_last_comb() is (32-bit version only)

    static inline ulong igc_last_comb(ulong k, ulong n)
    // Return the (inverse Gray code of the) last combination
    // as in igc_next_minchange_comb()
    {
        if ( 0==k )  return 0;

        const ulong f = 0xaaaaaaaaUL >> (BITS_PER_LONG-k);  // == first_sequency(k);
        const ulong c = ~0UL >> (BITS_PER_LONG-n);  // == first_comb(n);
        return  c ^ (f>>1);
        // =^= (by Doug Moore)  return ((1UL< [...]
    }

[...]

1.26: Binary words in lexicographic order for subsets

            [...] >>= 1;
            x ^= x0;
            return  x;
        }
        else  // lowest bit at word end
        {
            x ^= x0;      // clear lowest bit
            x0 = x & -x;  // new lowest bit ...
            x0 >>= 1;  x -= x0;  // ... is moved one to the right
            return  x;
        }
    }

The bit-reversed representation was chosen because the isolation of the lowest bit is often cheaper than the same operation on the highest bit. Starting with a one-bit word at position n − 1, we generate the $2^n$ subsets of the word of n ones.
The function is used as follows [FXT: bits/bitlex-demo.cc]:

    ulong n = 4;  // n-bit binary words
    ulong x = 1UL<<(n-1);  // first subset
    do
    {
        // visit word x
    }
    while ( (x=next_lexrev(x)) );

The following function goes backward:

    static inline ulong prev_lexrev(ulong x)
    // Return previous word in subset-lex order.
    {
        ulong x0 = x & -x;  // lowest bit
        if ( x & (x0<<1) )  // easy case: next higher bit is set
        {
            x ^= x0;  // clear lowest bit
            return  x;
        }
        else
        {
            x += x0;  // move lowest bit to the left
            x |= 1;   // set rightmost bit
            return  x;
        }
    }

The sequence of all n-bit words is generated by $2^n$ calls to prev_lexrev(), starting with zero. The words corresponding to subsets of the 6-element set are shown in figure 1.26-B. The sequence [1, 3, 2, 5, 7, 6, 4, 9, ...] in the right column is entry A108918 in [312].

The rate of generation using next() is about 274 million per second and about 253 million per second with prev(). An equivalent routine for arrays is given in section 8.1.2 on page 203. The routines are useful for a special version of fast Walsh transforms described in section 23.5.3 on page 472.
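The backward routine can be exercised on its own. The sketch below repeats prev_lexrev() from the text and adds a hypothetical helper, lexrev_from_zero(), that collects the words reached from zero. The first terms should agree with the sequence quoted above.

```cpp
#include <vector>

typedef unsigned long ulong;

// Predecessor in subset-lex order, as given in the text.
static inline ulong prev_lexrev(ulong x)
{
    ulong x0 = x & -x;  // lowest bit
    if ( x & (x0<<1) )  // easy case: next higher bit is set
    {
        x ^= x0;  // clear lowest bit
        return  x;
    }
    else
    {
        x += x0;  // move lowest bit to the left
        x |= 1;   // set rightmost bit
        return  x;
    }
}

// Hypothetical helper: the first `count` words reached from zero
// by repeated calls to prev_lexrev().
static std::vector<ulong> lexrev_from_zero(ulong count)
{
    std::vector<ulong> v;
    ulong x = 0;
    for (ulong j=0; j<count; ++j)
    {
        x = prev_lexrev(x);
        v.push_back(x);
    }
    return  v;
}
```

Calling lexrev_from_zero(15) visits every nonzero 4-bit word exactly once, so the words of length n appear as a prefix of the sequence for every longer word length.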
1.26.2 Conversion between binary and lex-ordered words

A little contemplation on the structure of the binary words in lexicographic order leads to the routine that allows random access to the k-th lex-rev word (unrank algorithm) [FXT: bits/bitlex.h]:

    static inline ulong negidx2lexrev(ulong k)
    {
        ulong z = 0;
        ulong h = highest_one(k);
        while ( k )
        {
            while ( 0==(h&k) )  h >>= 1;
            z ^= h;
            ++k;
            k &= h - 1;
        }
        return  z;
    }

Let the inverse function be T(x), then we have T(0) = 0 and, with h(x) being the highest power of 2 not greater than x,

$$ T(x) = h(x) - 1 + \begin{cases} T(x - h(x)) & \text{if } x - h(x) \neq 0 \\ h(x) & \text{otherwise} \end{cases} \qquad (1.26\text{-}1) $$

The ranking algorithm starts with the lowest bit:

    static inline ulong lexrev2negidx(ulong x)
    {
        if ( 0==x )  return 0;
        ulong h = x & -x;  // lowest bit
        ulong r = (h-1);
        while ( x^=h )
        {
            r += (h-1);
            h = x & -x;  // next higher bit
        }
        r += h;  // highest bit
        return  r;
    }

1.26.3 Minimal decompositions into terms $2^k − 1$ ‡

    ....1  1      ....1 =  1  =  1
    ...11  2      ...1. =  2  =  1 + 1
    ...1.  1      ...11 =  3  =  3
    ..1.1  2      ..1.. =  4  =  3 + 1
    ..111  3      ..1.1 =  5  =  3 + 1 + 1
    ..11.  2      ..11. =  6  =  3 + 3
    ..1..  1      ..111 =  7  =  7
    .1..1  2      .1... =  8  =  7 + 1
    .1.11  3      .1..1 =  9  =  7 + 1 + 1
    .1.1.  2      .1.1. = 10  =  7 + 3
    .11.1  3      .1.11 = 11  =  7 + 3 + 1
    .1111  4      .11.. = 12  =  7 + 3 + 1 + 1
    .111.  3      .11.1 = 13  =  7 + 3 + 3
    .11..  2      .111. = 14  =  7 + 7
    .1...  1      .1111 = 15  =  15
    1...1  2      1.... = 16  =  15 + 1
    1..11  3      1...1 = 17  =  15 + 1 + 1
    1..1.  2      1..1. = 18  =  15 + 3
    1.1.1  3      1..11 = 19  =  15 + 3 + 1
    1.111  4      1.1.. = 20  =  15 + 3 + 1 + 1
    1.11.  3      1.1.1 = 21  =  15 + 3 + 3
    1.1..  2      1.11. = 22  =  15 + 7
    11..1  3      1.111 = 23  =  15 + 7 + 1
    11.11  4      11... = 24  =  15 + 7 + 1 + 1
    11.1.  3      11..1 = 25  =  15 + 7 + 3
    111.1  4      11.1. = 26  =  15 + 7 + 3 + 1
    11111  5      11.11 = 27  =  15 + 7 + 3 + 1 + 1
    1111.  4      111.. = 28  =  15 + 7 + 3 + 3
    111..  3      111.1 = 29  =  15 + 7 + 7
    11...  2      1111. = 30  =  15 + 15
    1....  1      11111 = 31  =  31

Figure 1.26-C: Binary words in subset-lex order and their bit counts (left columns). The least number of terms of the form $2^k − 1$ needed in the sum $x = \sum_k (2^k − 1)$ (right columns) equals the bit count.

The least number of terms needed in the sum $x = \sum_k (2^k − 1)$ equals the number of bits of the lex-word as shown in figure 1.26-C. The number can be computed as

    c = bit_count( negidx2lexrev( x ) );

Alternatively, we can subtract the greatest integer of the form $2^k − 1$ until x is zero and count the number of subtractions. The sequence of these numbers is entry A100661 in [312]:

    1,2,1,2,3,2,1,2,3,2,3,4,3,2,1,2,3,2,3,4,3,2,3,4,3,4,5,4,3,2,1,2,3,2,3,...

The following function can be used to compute the sequence:

    void S(ulong f, ulong n)  // A100661
    {
        static int s = 0;
        ++s;
        cout << s << ",";
        for (ulong m=1; m [...]

[...]

    0 --> 0
    1 --> 110
    ------------
    0:  (#=2)   1
    1:  (#=4)   110
    2:  (#=8)   1101100
    3:  (#=16)  110110011011000
    4:  (#=32)  1101100110110001101100110110000
    5:  (#=64)  110110011011000110110011011000011011001101100011011001101100000

Figure 1.26-E: String substitution with rules {0 ↦ 0, 1 ↦ 110}.

The following function generates the bit-reversed binary words in reversed lexicographic order:

    void C(ulong f, ulong n, ulong w)
    {
        for (ulong m=1; m [...]

[...] 2    (1.26-5)

1.27 Fibonacci words ‡

A Fibonacci word is a word that does not contain two successive ones.
Whether a given binary word is a Fibonacci word can be tested with the function [FXT: bits/fibrep.h]

    static inline bool is_fibrep(ulong f)
    {
        return  ( 0==(f&(f>>1)) );
    }

The following functions convert between the binary and the Fibonacci representation:

    static inline ulong bin2fibrep(ulong b)
    // Return Fibonacci representation of b
    // Limitation: the first Fibonacci number greater
    //  than b must be representable as ulong.
    // 32 bit:  b < 2971215073=F(47)  [F(48)=4807526976 > 2^32]
    // 64 bit:  b < 12200160415121876738=F(93)  [F(94) > 2^64]
    {
        ulong f0=1, f1=1, s=1;
        while ( f1<=b )  { ulong t = f0+f1;  f0=f1;  f1=t;  s<<=1; }
        ulong f = 0;
        while ( b )
        {
            s >>= 1;
            if ( b>=f0 )  { b -= f0;  f^=s; }
            { ulong t = f1-f0;  f1=f0;  f0=t; }
        }
        return  f;
    }

    static inline ulong fibrep2bin(ulong f)
    // Return binary representation of f
    // Inverse of bin2fibrep().
    {
        ulong f0=1, f1=1;
        ulong b = 0;
        while ( f )
        {
            if ( f&1 )  b += f1;
            { ulong t=f0+f1;  f0=f1;  f1=t; }
            f >>= 1;
        }
        return  b;
    }

1.27.1 Lexicographic order

      0:  ........    11:  ...1.1..    22:  .1.....1    33:  .1.1.1.1    44:  1..1..1.
      1:  .......1    12:  ...1.1.1    23:  .1....1.    34:  1.......    45:  1..1.1..
      2:  ......1.    13:  ..1.....    24:  .1...1..    35:  1......1    46:  1..1.1.1
      3:  .....1..    14:  ..1....1    25:  .1...1.1    36:  1.....1.    47:  1.1.....
      4:  .....1.1    15:  ..1...1.    26:  .1..1...    37:  1....1..    48:  1.1....1
      5:  ....1...    16:  ..1..1..    27:  .1..1..1    38:  1....1.1    49:  1.1...1.
      6:  ....1..1    17:  ..1..1.1    28:  .1..1.1.    39:  1...1...    50:  1.1..1..
      7:  ....1.1.    18:  ..1.1...    29:  .1.1....    40:  1...1..1    51:  1.1..1.1
      8:  ...1....    19:  ..1.1..1    30:  .1.1...1    41:  1...1.1.    52:  1.1.1...
      9:  ...1...1    20:  ..1.1.1.    31:  .1.1..1.    42:  1..1....    53:  1.1.1..1
     10:  ...1..1.    21:  .1......    32:  .1.1.1..    43:  1..1...1    54:  1.1.1.1.

Figure 1.27-A: All 55 Fibonacci words with 8 bits in lexicographic order.

The 8-bit Fibonacci words are shown in figure 1.27-A.
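The numbering in figure 1.27-A can be checked with the conversion routines: word number j in the figure should equal bin2fibrep(j). The following sketch repeats the three functions from the text in compact form and adds a round-trip check. The helper fibrep_roundtrip_ok() is hypothetical and only serves this illustration.

```cpp
typedef unsigned long ulong;

// The three routines below are the ones given in the text.
static inline bool is_fibrep(ulong f)  { return ( 0==(f&(f>>1)) ); }

static inline ulong bin2fibrep(ulong b)
{
    ulong f0=1, f1=1, s=1;
    while ( f1<=b )  { ulong t = f0+f1;  f0=f1;  f1=t;  s<<=1; }
    ulong f = 0;
    while ( b )
    {
        s >>= 1;
        if ( b>=f0 )  { b -= f0;  f^=s; }
        { ulong t = f1-f0;  f1=f0;  f0=t; }
    }
    return  f;
}

static inline ulong fibrep2bin(ulong f)
{
    ulong f0=1, f1=1;
    ulong b = 0;
    while ( f )
    {
        if ( f&1 )  b += f1;
        { ulong t=f0+f1;  f0=f1;  f1=t; }
        f >>= 1;
    }
    return  b;
}

// Hypothetical check: for all b < limit the Fibonacci representation
// has no two adjacent ones and converts back to b.
static bool fibrep_roundtrip_ok(ulong limit)
{
    for (ulong b=0; b<limit; ++b)
    {
        const ulong f = bin2fibrep(b);
        if ( !is_fibrep(f) )  return false;
        if ( fibrep2bin(f) != b )  return false;
    }
    return true;
}
```

For example, bin2fibrep(54) gives the word 1.1.1.1. (the last entry of the figure), because 54 = 34 + 13 + 5 + 2.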
To generate all Fibonacci words in lexicographic order, use the function [FXT: bits/fibrep.h]

    static inline ulong next_fibrep(ulong x)
    // With x the Fibonacci representation of n
    // return Fibonacci representation of n+1.
    {
        // 2 examples:         //  ex. 1             //  ex.2
        //                     // x == [*]0 010101   // x == [*]0 01010
        ulong y = x | (x>>1);  // y == [*]? 011111   // y == [*]? 01111
        ulong z = y + 1;       // z == [*]? 100000   // z == [*]? 10000
        z = z & -z;            // z == [0]0 100000   // z == [0]0 10000
        x ^= z;                // x == [*]0 110101   // x == [*]0 11010
        x &= ~(z-1);           // x == [*]0 100000   // x == [*]0 10000

        return  x;
    }

The routine can be used to generate all n-bit words as shown in [FXT: bits/fibrep2-demo.cc]:

    const ulong f = 1UL << n;
    ulong t = 0;
    do
    {
        // visit(t)
        t = next_fibrep(t);
    }
    while ( t!=f );

The reversed order can be generated via

    ulong f = 1UL << n;
    do
    {
        f = prev_fibrep(f);
        // visit(f)
    }
    while ( f );

which uses the function (64-bit version)

    static inline ulong prev_fibrep(ulong x)
    // With x the Fibonacci representation of n
    // return Fibonacci representation of n-1.
    {
        // 2 examples:                   //  ex. 1             //  ex.2
        //                               // x == [*]0 100000   // x == [*]0 10000
        ulong y = x & -x;                // y == [0]0 100000   // y == [0]0 10000
        x ^= y;                          // x == [*]0 000000   // x == [*]0 00000
        ulong m = 0x5555555555555555UL;  // m == ...01010101
        if ( m & y )  m >>= 1;           // m == ...01010101   // m == ...0101010
        m &= (y-1);                      // m == [0]0 010101   // m == [0]0 01010
        x ^= m;                          // x == [*]0 010101   // x == [*]0 01010
        return  x;
    }

The forward version generates about 180 million words per second, the backward version about 170 million words per second.
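As a small plausibility check of the forward routine, the following sketch counts the n-bit Fibonacci words by stepping from zero until the word with only bit n set is reached. The helper count_fibwords() is hypothetical. The count should be the Fibonacci number F(n+2), for example 55 for n = 8 as in figure 1.27-A.

```cpp
#include <cassert>

typedef unsigned long ulong;

// Successor in lexicographic order, as given in the text.
static inline ulong next_fibrep(ulong x)
{
    ulong y = x | (x>>1);
    ulong z = y + 1;
    z = z & -z;
    x ^= z;
    x &= ~(z-1);
    return  x;
}

// Hypothetical helper: count the n-bit Fibonacci words.
// On the way, assert that every visited word has no two adjacent ones
// and that the words are strictly increasing.
// Must have: n < number of bits in ulong
static ulong count_fibwords(ulong n)
{
    assert( n < 8*sizeof(ulong) );
    const ulong f = 1UL << n;  // first word that needs more than n bits
    ulong ct = 0;
    ulong t = 0;
    do
    {
        assert( 0==(t&(t>>1)) );  // t is a Fibonacci word
        ++ct;
        const ulong u = next_fibrep(t);
        assert( u > t );
        t = u;
    }
    while ( t!=f );
    return  ct;
}
```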
1.27.2 Gray code order ‡

A Gray code for the binary Fibonacci words (shown in figure 1.27-B) can be derived from the Gray code of the radix −2 representations (see section 1.22 on page 58) of binary words whose difference is of the form

      1   ................1
      3   ...............11
      5   ..............1.1
      9   .............1..1
     19   ............1..11
     37   ...........1..1.1
     73   ..........1..1..1
    147   .........1..1..11
    293   ........1..1..1.1

The algorithm is to try these values as increments starting from the least, same as for the minimal-change combination described in section 1.24.4 on page 66. The next valid word is encountered if it is a valid Fibonacci word, that is, if it does not contain two consecutive set bits. The implementation is [FXT: class bit_fibgray in bits/bitfibgray.h]:

    class bit_fibgray
    // Fibonacci Gray code with binary words.
    {
    public:
        ulong x_;   // current Fibonacci word
        ulong k_;   // aux
        ulong fw_, lw_;  // first and last Fibonacci word in Gray code
        ulong mw_;  // max(fw_, lw_)
        ulong n_;   // Number of bits

    public:
        bit_fibgray(ulong n)
        {
            n_ = n;
            fw_ = 0;
            for (ulong m=(1UL<<(n-1));  m!=0;  m>>=3)  fw_ |= m;
            lw_ = fw_ >> 1;
            if ( 0==(n&1) )  { ulong t=fw_; fw_=lw_; lw_=t; }  // swap first/last
            mw_ = ( lw_>fw_ ? lw_ : fw_ );
            x_ = fw_;
            k_ = inverse_gray_code(fw_);
            k_ = neg2bin(k_);
        }

        ~bit_fibgray()  {;}

        ulong next()
        // Return next word in Gray code.
        // Return ~0 if current word is the last one.
        {
            if ( x_ == lw_ )  return ~0UL;
            ulong s = n_;  // shift
            while ( 1 )
            {
                --s;
                ulong c = 1 | (mw_ >> s);  // possible difference for negbin word
                ulong i = k_ - c;
                ulong x = bin2neg(i);
                x ^= (x>>1);

                if ( 0==(x&(x>>1)) )  // is_fibrep(x)
                {
                    k_ = i;  x_ = x;
                    return  x;
                }
            }
        }
    };

      j:  k(j)        k(j)-k(j-1)  x=bin2neg(k)  gray(x)
      1:  ....11...1  ..........   ...111...1    ...1..1..1  =  27
      2:  ....11....  .........1   ...111....    ...1..1...  =  26
      3:  ....1.1111  .........1   ...111..11    ...1..1.1.  =  28
      4:  ....1.11..  ........11   ...11111..    ...1....1.  =  23
      5:  ....1.1.11  .........1   ...1111111    ...1......  =  21
      6:  ....1.1.1.  .........1   ...111111.    ...1.....1  =  22
      7:  ....1.1..1  .........1   ...1111..1    ...1...1.1  =  25
      8:  ....1.1...  .........1   ...1111...    ...1...1..  =  24
      9:  ....1...11  .......1.1   ...11..111    ...1.1.1..  =  32
     10:  ....1...1.  .........1   ...11..11.    ...1.1.1.1  =  33
     11:  ....1....1  .........1   ...11....1    ...1.1...1  =  30
     12:  ....1.....  .........1   ...11.....    ...1.1....  =  29
     13:  .....11111  .........1   ...11...11    ...1.1..1.  =  31
     14:  ......11..  .....1..11   .....111..    .....1..1.  =  10
     15:  ......1.11  .........1   .....11111    .....1....  =   8
     16:  ......1.1.  .........1   .....1111.    .....1...1  =   9
     17:  ......1..1  .........1   .....11..1    .....1.1.1  =  12
     18:  ......1...  .........1   .....11...    .....1.1..  =  11
     19:  ........11  .......1.1   .......111    .......1..  =   3
     20:  ........1.  .........1   .......11.    .......1.1  =   4
     21:  .........1  .........1   .........1    .........1  =   1
     22:  ..........  .........1   ..........    ..........  =   0
     23:  1111111111  .........1   ........11    ........1.  =   2
     24:  11111111..  ........11   ......11..    ......1.1.  =   7
     25:  1111111.11  .........1   ......1111    ......1...  =   5
     26:  1111111.1.  .........1   ......111.    ......1..1  =   6
     27:  111111...1  ......1..1   ....11...1    ....1.1..1  =  19
     28:  111111....  .........1   ....11....    ....1.1...  =  18
     29:  11111.1111  .........1   ....11..11    ....1.1.1.  =  20
     30:  11111.11..  ........11   ....1111..    ....1...1.  =  15
     31:  11111.1.11  .........1   ....111111    ....1.....  =  13
     32:  11111.1.1.  .........1   ....11111.    ....1....1  =  14
     33:  11111.1..1  .........1   ....111..1    ....1..1.1  =  17
     34:  11111.1...  .........1   ....111...    ....1..1..  =  16

Figure 1.27-B: Gray code for the binary Fibonacci words (rightmost column).

About 130 million words per second are generated. The program [FXT: bits/bitfibgray-demo.cc] shows how to use the class, figure 1.27-B was created with it. Section 14.2 on page 305 gives a recursive algorithm for Fibonacci words in Gray code order.

1.28 Binary words and parentheses strings ‡

      0:  ....  P   [empty string]
      1:  ...1  P   ()
      2:  ..1.
      3:  ..11  P   (())
      4:  .1..
      5:  .1.1  P   ()()
      6:  .11.
      7:  .111  P   ((()))
      8:  1...
      9:  1..1
     10:  1.1.
     11:  1.11  P   (()())
     12:  11..
     13:  11.1  P   ()(())
     14:  111.
     15:  1111  P   (((())))

    .....   [empty string]
    ....1   ()
    ...11   (())
    ..1.1   ()()
    ..111   ((()))
    .1.11   (()())
    .11.1   ()(())
    .1111   (((())))
    1..11   (())()
    1.1.1   ()()()
    1.111   ((()()))
    11.11   (()(()))
    111.1   ()((()))
    11111   ((((()))))

Figure 1.28-A: Left (first table): some of the 4-bit binary words can be interpreted as a string of parentheses (marked with 'P'). Right (second table): all 5-bit words that correspond to well-formed parentheses strings.

A subset of the binary words can be interpreted as a (well formed) string of parentheses. The 4-bit binary words that have this property are marked with a 'P' in figure 1.28-A (left) [FXT: bits/parenworddemo.cc]. The strings are constructed by scanning the word from the low end and printing a '(' with each one and a ')' with each zero. To find out when to terminate, one adds up +1 for each opening parenthesis and −1 for a closing parenthesis. After the ones in the binary word have been scanned, s closing parentheses have to be added where s is the value of the sum [FXT: bits/parenwords.h]:

    static inline void parenword2str(ulong x, char *str)
    {
        int s = 0;
        ulong j = 0;
        for (j=0; x!=0; ++j)
        {
            s += ( x&1 ? +1 : -1 );
            str[j] = ")("[x&1];
            x >>= 1;
        }
        while ( s-- > 0 )  str[j++] = ')';  // finish string
        str[j] = 0;  // terminate string
    }

The 5-bit binary words that are valid 'paren words' together with the corresponding strings are shown in figure 1.28-A (right). Note that the lower bits in the word (right end) correspond to the beginning of the string (left end). If a negative value for the sums occurs at any time of the computation, the word is not a paren word. A function to determine whether a word is a paren word is

    static inline bool is_parenword(ulong x)
    {
        int s = 0;
        for (ulong j=0; x!=0; ++j)
        {
            s += ( x&1 ? +1 : -1 );
            if ( s<0 )  break;  // invalid word
            x >>= 1;
        }
        return  (s>=0);
    }

The sequence

    1, 3, 5, 7, 11, 13, 15, 19, 21, 23, 27, 29, 31, 39, 43, 45, 47, 51, 53, 55, 59, 61, 63, ...

of nonzero integers x so that is_parenword(x) returns true is entry A036991 in [312].

If we fix the number of paren pairs, then the following functions generate the least and biggest valid paren words. The first paren word is a block of n ones at the low end:

    static inline ulong first_parenword(ulong n)
    // Return least binary word corresponding to n pairs of parens
    // Example, n=5:  .....11111  ((((()))))
    {
        return  first_comb(n);
    }

The last paren word is the word with a sequence of n blocks '01' at the low end:

    static inline ulong last_parenword(ulong n)
    // Return biggest binary word corresponding to n pairs of parens.
    // Must have: 1 <= n <= BITS_PER_LONG/2.
    // Example, n=5:  .1.1.1.1.1  ()()()()()
    {
        return  0x5555555555555555UL >> (BITS_PER_LONG-2*n);
    }

    ......11111 = ((((()))))      ...1..11.11 = (()(())())      ..1...1.111 = ((()()))()
    .....1.1111 = (((()())))      ...1..111.1 = ()((())())      ..1...11.11 = (()(()))()
    .....11.111 = ((()(())))      ...1.1..111 = ((())()())      ..1...111.1 = ()((()))()
    .....111.11 = (()((())))      ...1.1.1.11 = (()()()())      ..1..1..111 = ((())())()
    .....1111.1 = ()(((())))      ...1.1.11.1 = ()(()()())      ..1..1.1.11 = (()()())()
    ....1..1111 = (((())()))      ...1.11..11 = (())(()())      ..1..1.11.1 = ()(()())()
    ....1.1.111 = ((()()()))      ...1.11.1.1 = ()()(()())      ..1..11..11 = (())(())()
    ....1.11.11 = (()(()()))      ...11...111 = ((()))(())      ..1..11.1.1 = ()()(())()
    ....1.111.1 = ()((()()))      ...11..1.11 = (()())(())      ..1.1...111 = ((()))()()
    ....11..111 = ((())(()))      ...11..11.1 = ()(())(())      ..1.1..1.11 = (()())()()
    ....11.1.11 = (()()(()))      ...11.1..11 = (())()(())      ..1.1..11.1 = ()(())()()
    ....11.11.1 = ()(()(()))      ...11.1.1.1 = ()()()(())      ..1.1.1..11 = (())()()()
    ....111..11 = (())((()))      ..1....1111 = (((())))()      ..1.1.1.1.1 = ()()()()()
    ....111.1.1 = ()()((()))
    ...1...1111 = (((()))())
    ...1..1.111 = ((()())())

Figure 1.28-B: The 42 binary words corresponding to all valid pairings of 5 parentheses, in colex order (read column by column).

The sequence of all binary words corresponding to n pairs of parens in colex order can be generated with the following (slightly cryptic) function:

    static inline ulong next_parenword(ulong x)
    // Next (colex order) binary word that is a paren word.
    {
        if ( x & 2 )  // Easy case, move highest bit of lowest block to the left:
        {
            ulong b = lowest_zero(x);
            x ^= b;
            x ^= (b>>1);
            return  x;
        }
        else  // Gather all low "01"s and split lowest nontrivial block:
        {
            if ( 0==(x & (x>>1)) )  return 0;
            ulong w = 0;  // word where the bits are assembled
            ulong s = 0;  // shift for lowest block
            ulong i = 1;  // == lowest_one(x)
            do  // collect low "01"s:
            {
                x ^= i;
                w <<= 1;
                w |= 1;
                ++s;
                i <<= 2;  // == lowest_one(x);
            }
            while ( 0==(x&(i<<1)) );

            ulong z = x ^ (x+i);  // lowest block
            x ^= z;
            z &= (z>>1);
            z &= (z>>1);
            w ^= (z>>s);
            x |= w;
            return  x;
        }
    }

The program [FXT: bits/parenword-colex-demo.cc] shows how to create a list of binary words corresponding to n pairs of parens (code slightly shortened):

    ulong n = 4;  // Number of paren pairs
    ulong pn = 2*n+1;
    char *str = new char[2*n+1];  str[2*n] = 0;  // strings have 2*n characters
    ulong x = first_parenword(n);
    while ( x )
    {
        print_bin(" ", x, pn);
        parenword2str(x, str);
        cout << " = " << str << endl;
        x = next_parenword(x);
    }

Its output with n = 5 is shown in figure 1.28-B. The 1,767,263,190 paren words for n = 19 are generated at a rate of about 169 million words per second. Chapter 15 on page 323 gives a different formulation of the algorithm.

Knuth [215, ex.23, sect.7.1.3] gives a very elegant routine for generating the next paren word, the comments are MMIX instructions:

    static inline ulong next_parenword(ulong x)
    {
        const ulong m0 = -1UL/3;
        ulong t = x ^ m0;               // XOR t, x, m0;
        if ( (t&x)==0 )  return 0;      // current is last
        ulong u = (t-1) ^ t;            // SUBU u, t, 1;  XOR u, t, u;
        ulong v = x | u;                // OR v, x, u;
        ulong y = bit_count( u & m0 );  // SADD y, u, m0;
        ulong w = v + 1;                // ADDU w, v, 1;
        t = v & ~w;                     // ANDN t, v, w;
        y = t >> y;                     // SRU y, t, y;
        y += w;                         // ADDU y, w, y;
        return  y;
    }

The routine is slower, however: about 81 million words per second are generated. A bit-count instruction in hardware would speed it up significantly.
Treating the case of easy update separately as in the other version, we get a rate of about 137 million words per second.

1.29 Permutations via primitives ‡

We give two methods to specify permutations of the bits of a binary word via one or more control words. The methods are suggestions for machine instructions that can serve as primitives for permutations of the bits of a word.

1.29.1 A restricted method

    ................1111111111111111
    ........11111111........11111111
    ....1111....1111....1111....1111
    ..11..11..11..11..11..11..11..11
    .1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1

    ................1...............    bits 15 ...
    ........1...............1.......    bits 7 ...
    ....1.......1.......1.......1...    bits 3 11 ...
    ..1...1...1...1...1...1...1...1.    bits 1 5 9 13 ...
    .1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1    bits 0 2 4 6 8 10 12 14 ...

Figure 1.29-A: Mask with primitives for permuting bits with 32-bit words (top), and words with ones at the highest bit of each block (bottom).

We can specify a subset of all permutations by selecting bit-blocks of the masks as shown for 32-bit words in figure 1.29-A (top). Subsets of the blocks of the masks can be determined with the bits of a word by considering the highest bit of each block (bottom of the figure). We use all bits of a word (except for the highest bit) to select the blocks where the bits defined by the block and those left to it should be swapped. An implementation of the implied algorithm is given in [FXT: bits/bitperm1-demo.cc]. Arrays are used to give more readable code:

    void perm1(uchar *a, ulong ldn, const uchar *x)
    // Permute a[] according to the 'control word' x[].
    // The length of a[] must be 2**ldn.
    {
        long n = 1L< [...] >0; s/=2)
        {
            for (long k=0; k [...]

[...] 2) specify all N! permutations as we can choose between only $2^{N-1}$ control words. Now set the word length to $N := 2^n$. The reachable permutations are those where the intervals $[k \cdot 2^j, \ldots, (k+1) \cdot 2^j - 1]$ contain all numbers $[p \cdot 2^j, \ldots, (p+1) \cdot 2^j - 1]$ for all $j \le n$ and $0 \le k < 2^{n-j}$, choosing p for each interval arbitrarily ($0 \le p < 2^{n-j}$). For example, the lower half of the permuted array must contain a permutation of either the lower or the upper half (j = n − 1) and each pair $a_{2y}, a_{2y+1}$ must contain two elements 2z, 2z + 1 (j = 1). The bit-reversal is computed with a control word where all bits are set. Alas, the (important!) zip permutation (bit-zip, see section 1.15 on page 38) is unreachable. A machine instruction could choose between the two routines via the highest bit in the control word.

1.29.2 A general method

All permutations of $N = 2^n$ elements can be specified with n control words of N bits. Assume we have a machine instruction that collects bits according to a control word. An eight bit example:

    a = abcdefgh    input data
    x = ..1.11.1    control word (dots for zeros)
          cefh      bits of a where x has a one
        abdg        bits of a where x has a zero
        abdgcefh    result, bits separated according to x

We need n such instructions that work on all length-$2^k$ sub-words for 1 ≤ k ≤ n. For example, the instruction working on half words of a 16-bit word would work as

    a = abcdefgh ABCDEFGH    input data
    x = ..1.11.1 1111....    control word (dots for zeros)
            cefh ABCD        bits of a where x has a one
        abdg         EFGH    bits of a where x has a zero
        abdgcefh EFGHABCD    result, bits separated according to x

Note the bits of the different sub-words are not mixed. Now all permutations can be reached if the control word for the $2^k$-bit sub-words has exactly $2^{k-1}$ bits set in all ranges $[j \cdot 2^k, \ldots, (j+1) \cdot 2^k - 1]$.

A control word together with the specification of the instruction used defines the action taken. The following leads to a swap of adjacent bit pairs

    1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.    k= 1  (2-bit sub-words)

while this

    1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.1.    k= 5  (32 bit sub-words)

results in gathering the even and odd indexed bits in the halfwords.
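The effect of such an instruction can be modeled on character strings, which avoids any commitment to a bit order. The sketch below is an illustration only: separate() is a hypothetical function, not an FXT routine, and it uses one character per bit exactly as in the symbolic examples above (written without the separating spaces).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical model of the bit-separating instruction on strings.
// a:   data, one character per bit
// x:   control word, '1' marks a one, any other character (e.g. '.') a zero
// len: length of the sub-words, must divide a.size()
// In each sub-word the characters selected by zeros come first, followed
// by those selected by ones; the relative order within both groups is kept.
static std::string separate(const std::string &a, const std::string &x, std::size_t len)
{
    assert( a.size() == x.size() );
    assert( len != 0 && a.size() % len == 0 );
    std::string r;
    for (std::size_t base = 0;  base < a.size();  base += len)
    {
        std::string ones;  // characters of this sub-word where x has a one
        for (std::size_t j = base;  j < base + len;  ++j)
        {
            if ( x[j] == '1' )  ones += a[j];
            else                r += a[j];
        }
        r += ones;
    }
    return  r;
}
```

With this model, separate("abcdefgh", "..1.11.1", 8) reproduces the eight bit example, and a sub-word length of 2 with the control word 1.1.1.1. swaps the two elements of each pair.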
A complete set of permutation primitives for 16-bit words and their effect on a symbolic array of bits (split into groups of four elements for readability) is 11111111........ 1111....1111.... 11..11..11..11.. 1.1.1.1.1.1.1.1. k= 4 k= 3 k= 2 k= 1 0123 4567 89ab cdef ==> 89ab cdef 0123 4567 ==> cdef 89ab 4567 0123 ==> efcd ab89 6745 2301 ==> fedc ba98 7654 3210 The top primitive leads to a swap of the left and right half of the bits, the next to a swap of the halves of the half words and so on. The computed permutation is array reversal. Note that we use array notation (least index left) here. The resulting permutation depends on the order in which the primitives are used. When starting with full words we get: 0123 4567 89ab cdef 1.1. 1.1. 1.1. 1.1. k= 4 ==> 1357 9bdf 0246 8ace 1.1. 1.1. 1.1. 1.1. k= 3 ==> 37bf 159d 26ae 048c 1.1. 1.1. 1.1. 1.1. k= 2 ==> 7f3b 5d19 6e2a 4c08 1.1. 1.1. 1.1. 1.1. k= 1 ==> f7b3 d591 e6a2 c480 The result is different when starting with 2-bit sub-words: 0123 4567 89ab cdef 1.1. 1.1. 1.1. 1.1. k= 1 ==> 1032 5476 98ba dcfe 1.1. 1.1. 1.1. 1.1. k= 2 ==> 0213 4657 8a9b cedf 1.1. 1.1. 1.1. 1.1. k= 3 ==> 2367 0145 abef 89cd 1.1. 1.1. 1.1. 1.1. k= 4 ==> 3715 bf9d 2604 ae8c  2z There are z possibilities to have z bits set in a 2z-bit word. There are 2n−k length-2k sub-words in a 2n -bit word so the number of valid control words for that step is  2k 2n−k 2k−1 The product of the number of valid words in all steps gives the number of permutations: n (2 )! = n  k 2 Y 2 k=1 n−k 2k−1 1.30 CPU instructions often missed 1.30.1 Essential (1.29-1) • Bit-shift and bit-rotate instructions that work properly for shifts greater than or equal to the word length: the shift instruction should zero the word, the rotate instruction should take the shift modulo word length. The C-language standards leave the results for these operations undefined and compilers simply emit the corresponding assembler instructions. 
  The resulting CPU dependent behavior is both a source of errors and makes certain optimizations impossible.
• A bit-reverse instruction. A fast byte-swap mitigates the problem, see section 1.14 on page 33.
• Instructions that return the index of highest or lowest set bit in a word. They must execute fast.
• Fast conversion from integer to float and double (both directions).
• A fused multiply-add instruction for floats.
• Instructions for the multiplication of complex floating-point numbers, computing A · C − B · D and A · D + B · C from A, B, C, and D.
• A sum-diff instruction, computing A + B and A − B from A and B. This can serve as a primitive for fast orthogonal transforms.
• An instruction to swap registers. Even better, a conditional version of that.

1.30.2 Nice to have

• A parity bit for the complete machine word. The parity of a word is the number of bits modulo 2, not the complement of it. Even better, an instruction for the inverse Gray code, see section 1.16 on page 41.
• A bit-count instruction, see section 1.8 on page 18. This would also give the parity at bit zero.
• An instruction for computing the index of the i-th set bit of a word, see section 1.10 on page 25. This would be useful even if execution takes a dozen cycles.
• A random number generator, LHCAs (see section 41.8 on page 878) may be candidates. At the very least: a decent entropy source.
• A conditional version of more than just the move instruction, possibly as an instruction prefix.
• A bit-zip and a bit-unzip instruction, see section 1.15 on page 38. Note this is polynomial squaring over GF(2).
• Primitives for permutations of bits, see section 1.29.2 on page 81. A bit-gather and a bit-scatter instruction for sub-words of all sizes a power of 2 would allow for arbitrary permutations (see [FXT: bits/bitgather.h] and [FXT: bits/bitseparate.h] for versions working on complete words).
• Multiplication corresponding to XOR as addition.
  This is the multiplication without carries used for polynomials over GF(2), see section 40.1 on page 822.

1.31 Some space filling curves ‡

1.31.1 The Hilbert curve

A rendering of the Hilbert curve (named after David Hilbert [182]) is shown in figure 1.31-A. An efficient algorithm to compute the direction of the n-th move of the Hilbert curve is based on the parity of the number of threes in the radix-4 representation of n (see section 38.9.1 on page 748).

Let dx and dy correspond to the moves at step n in the Hilbert curve. Then dx, dy ∈ {−1, 0, +1} and exactly one of them is zero. So for both p := dx + dy and m := dx − dy we have p, m ∈ {−1, +1}.

The following function computes p and returns 0, 1 if p = −1, +1, respectively [FXT: bits/hilbert.h]:

    static inline ulong hilbert_p(ulong t)
    // Let dx,dy be the horizontal,vertical move
    // with step t of the Hilbert curve.
    // Return zero if (dx+dy)==-1, else one (then: (dx+dy)==+1).
    // Algorithm: count number of threes in radix 4
    {
        ulong d = (t & 0x5555555555555555UL) & ((t & 0xaaaaaaaaaaaaaaaaUL) >> 1);
        return parity( d );
    }

If 1 is returned the step is to the right or upwards. The function can be slightly optimized as follows (64-bit version only):

    static inline ulong hilbert_p(ulong t)
    {
        t &= ((t & 0xaaaaaaaaaaaaaaaaUL) >> 1);
        t ^= t>>2;
        t ^= t>>4;
        t ^= t>>8;
        t ^= t>>16;
        t ^= t>>32;
        return t & 1;
    }

Figure 1.31-A: The first 255 segments of the Hilbert curve.

    dx+dy: ++-+++-+++----++++-+++-+++----++++-+++-+++----+---+---+---++++-
    dx-dy: +----+++-+++-+++-++++---+---+----++++---+---+----++++---+---+--
      dir: >^<^^>v>^>vv<v>>^>v>>^<^>^<<v<^^^>v>>^<^>^<<v<^<<v>vv<^<v<^^>^<
     turn: 0--+0++--++0+--0-++-0--++--0-++00++-0--++--0-++-0--+0++--++0+--

Figure 1.31-B: Moves and turns of the Hilbert curve.

The corresponding value for m can be computed as:

    static inline ulong hilbert_m(ulong t)
    // Let dx,dy be the horizontal,vertical move
    // with step t of the Hilbert curve.
    // Return zero if (dx-dy)==-1, else one (then: (dx-dy)==+1).
    {
        return hilbert_p( -t );
    }

If the values for p and m are equal the step is in horizontal direction. It remains to merge the values of p and m into a 2-bit value d that encodes the direction of the move:

    static inline ulong hilbert_dir(ulong t)
    // Return d encoding the following move with the Hilbert curve.
    //
    // d \in {0,1,2,3} as follows:
    // d : direction
    // 0 : right (+x: dx=+1, dy= 0)
    // 1 : down  (-y: dx= 0, dy=-1)
    // 2 : up    (+y: dx= 0, dy=+1)
    // 3 : left  (-x: dx=-1, dy= 0)
    {
        ulong p = hilbert_p(t);
        ulong m = hilbert_m(t);
        ulong d = p ^ (m<<1);
        return d;
    }

To print the value of d symbolically, we can print the value of (">v^<")[d].

The sequence of moves can also be generated by the string substitution process shown in figure 1.31-C.

    Start: A
    Rules:
      A --> D>A^A<C
      B --> C<BvB>D
      C --> BvC<C^A
      D --> A^D>DvB
      > --> >
      < --> <
      ^ --> ^
      v --> v
    -------------
    0:    (#=1)    A
    1:    (#=7)    D>A^A<C
    2:    (#=31)   A^D>DvB>D>A^A<C^D>A^A<C<BvC<C^A
    3:    (#=127)  D>A^A<C^A^D>DvB>A^D>DvBvC<BvB>D>A^D>DvB>D>A^A<C^D>A^A<C<BvC<C^A^A^D>DvB>D>A^A ...

Figure 1.31-C: Moves of the Hilbert curve by a string substitution process, the symbols ‘A’, ‘B’, ‘C’, and ‘D’ are ignored when drawing the curve.

The turn u between steps can be computed as

    static inline int hilbert_turn(ulong t)
    // Return the turn (left or right) with the steps
    // t and t-1 of the Hilbert curve.
    // Returned value is
    //  0 for no turn
    // +1 for right turn
    // -1 for left turn
    {
        ulong d1 = hilbert_dir(t);
        ulong d2 = hilbert_dir(t-1);
        d1 ^= (d1>>1);
        d2 ^= (d2>>1);
        ulong u = d1 - d2;
        // at this point, symbolically: cout << ("+.-0+.-")[ u + 3 ];
        if ( 0==u )  return 0;
        if ( (long)u<0 )  u += 4;
        return (1==u ? +1 : -1);
    }

To print the value of u symbolically, we can print ("-0+")[u+1];.

The values of p and m, followed by the direction and turn of the Hilbert curve are shown in figure 1.31-B. The list was created with the program [FXT: bits/hilbert-moves-demo.cc].
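The bit trick in hilbert_p() relies on one fact only: the masked AND leaves a one at the low bit of every radix-4 digit that equals 3, so its parity is the parity of the number of threes. The sketch below checks this against a digit-by-digit count. It is an illustration under the assumption of 64-bit words; the function names are invented here and the code does not claim anything about which move direction a given parity corresponds to.

```cpp
#include <cassert>
#include <cstdint>

// Reference: parity of the number of digits equal to 3 in the
// radix-4 expansion of t, computed one digit at a time.
static inline unsigned threes_parity_naive(uint64_t t)
{
    unsigned p = 0;
    while ( t )
    {
        if ( (t & 3) == 3 )  p ^= 1;
        t >>= 2;
    }
    return p;
}

// Bit-parallel version, same steps as the optimized routine in the text:
// after the first line a one remains at the low bit of each digit 3,
// the shifts then fold all even-indexed bits into bit zero.
static inline unsigned threes_parity_fast(uint64_t t)
{
    t &= ((t & 0xaaaaaaaaaaaaaaaaULL) >> 1);
    t ^= t>>2;
    t ^= t>>4;
    t ^= t>>8;
    t ^= t>>16;
    t ^= t>>32;
    return (unsigned)(t & 1);
}

// Hypothetical self-check: both versions agree for all t < n.
static inline void check_threes_parity(uint64_t n)
{
    for (uint64_t t = 0; t < n; ++t)
        assert( threes_parity_fast(t) == threes_parity_naive(t) );
}
```

Note that no shift by one is needed in the folding step, because only bits at even indices can be set after the masking.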
Figure 1.31-A was created with the program [FXT: bits/hilbert-texpic-demo.cc].

The computation of a function whose series coefficients are ±1 and ±i according to the Hilbert curve is described in section 38.9 on page 747.

A finite state machine (FSM) for the conversion from a 1-dimensional coordinate (linear coordinate of the curve) to the pair of coordinates x and y of the Hilbert curve is described in [39, item 115]. At each step two bits of input are processed. The array htab[] serves as lookup table for the next state and two bits of the result. The FSM has an internal state of two bits [FXT: bits/lin2hilbert.cc]:

    void lin2hilbert(ulong t, ulong &x, ulong &y)
    // Transform linear coordinate t to Hilbert x and y
    {
        ulong xv = 0, yv = 0;
        ulong c01 = (0<<2);  // (2<<2) for transposed output (swapped x, y)
        for (ulong i=0; i<(BITS_PER_LONG/2); ++i)
        {
            ulong abi = t >> (BITS_PER_LONG-2);
            t <<= 2;

            ulong st = htab[ (c01<<2) | abi ];
            c01 = st & 3;

            yv <<= 1;
            yv |= ((st>>2) & 1);
            xv <<= 1;
            xv |= (st>>3);
        }
        x = xv;
        y = yv;
    }

        OLD         NEW              NEW         OLD
      C C A B     X Y C C          C C X Y     A B C C
      0 1 I I     I I 0 1          0 1 I I     I I 0 1
      0 0 0 0     0 0 1 0          0 0 0 0     0 0 1 0
      0 0 0 1     0 1 0 0          0 0 0 1     0 1 0 0
      0 0 1 0     1 1 0 0          0 0 1 0     1 1 0 1
      0 0 1 1     1 0 0 1          0 0 1 1     1 0 0 0
      0 1 0 0     1 1 1 1          0 1 0 0     1 0 0 1
      0 1 0 1     0 1 0 1          0 1 0 1     0 1 0 1
      0 1 1 0     0 0 0 1          0 1 1 0     1 1 0 0
      0 1 1 1     1 0 0 0          0 1 1 1     0 0 1 1
      1 0 0 0     0 0 0 0          1 0 0 0     0 0 0 0
      1 0 0 1     1 0 1 0          1 0 0 1     1 1 1 1
      1 0 1 0     1 1 1 0          1 0 1 0     0 1 1 0
      1 0 1 1     0 1 1 1          1 0 1 1     1 0 1 0
      1 1 0 0     1 1 0 1          1 1 0 0     1 0 1 1
      1 1 0 1     1 0 1 1          1 1 0 1     1 1 1 0
      1 1 1 0     0 0 1 1          1 1 1 0     0 1 1 1
      1 1 1 1     0 1 1 0          1 1 1 1     0 0 0 1

Figure 1.31-D: The original table from [39] for the finite state machine for the 2-dimensional Hilbert curve (left). All sixteen 4-bit words appear in both the ‘OLD’ and the ‘NEW’ column. So the algorithm is invertible. Swap the columns and sort numerically to obtain the two columns at the right, the table for the inverse function.

The table used is defined (see figure 1.31-D) as

    static const ulong htab[] = {
    #define HT(xi,yi,c0,c1) ((xi<<3)+(yi<<2)+(c0<<1)+(c1))
        // index == HT(c0,c1,ai,bi)
        HT( 0, 0, 1, 0 ),
        HT( 0, 1, 0, 0 ),
        HT( 1, 1, 0, 0 ),
        HT( 1, 0, 0, 1 ),
      [--snip--]
        HT( 0, 0, 1, 1 ),
        HT( 0, 1, 1, 0 )
    };

As indicated in the code, the table maps every four bits c0,c1,ai,bi to four bits xi,yi,c0,c1. The table for the inverse function (again, see figure 1.31-D) is

    static const ulong ihtab[] = {
    #define IHT(ai,bi,c0,c1) ((ai<<3)+(bi<<2)+(c0<<1)+(c1))
        // index == HT(c0,c1,xi,yi)
        IHT( 0, 0, 1, 0 ),
        IHT( 0, 1, 0, 0 ),
        IHT( 1, 1, 0, 1 ),
        IHT( 1, 0, 0, 0 ),
      [--snip--]
        IHT( 0, 1, 1, 1 ),
        IHT( 0, 0, 0, 1 )
    };

The words have to be processed backwards:

    ulong hilbert2lin(ulong x, ulong y)
    // Transform Hilbert x and y to linear coordinate t
    {
        ulong t = 0;
        ulong c01 = 0;
        for (ulong i=0; i<(BITS_PER_LONG/2); ++i)
        {
            t <<= 2;
            ulong xi = x >> (BITS_PER_LONG/2-1);
            xi &= 1;
            ulong yi = y >> (BITS_PER_LONG/2-1);
            yi &= 1;
            ulong xyi = (xi<<1) | yi;
            x <<= 1;
            y <<= 1;
            ulong st = ihtab[ (c01<<2) | xyi ];
            c01 = st & 3;
            t |= (st>>2);
        }
        return t;
    }

1.31.2 The Z-order

Figure 1.31-E: The first 255 segments of the Z-order curve.

A 2-dimensional space-filling curve in Z-order traverses all points in each quadrant before it enters the next. Figure 1.31-E shows a rendering of the Z-order curve, created with the program [FXT: bits/zorder-texpic-demo.cc]. The conversion from a linear parameter to a pair of coordinates is done by separating the bits at the even and odd indices [FXT: bits/zorder.h]:

    static inline void lin2zorder(ulong t, ulong &x, ulong &y)  { bit_unzip2(t, x, y); }

The routine bit_unzip2() is described in section 1.15 on page 38.
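For readers without section 1.15 at hand, the separation of even and odd indexed bits can be sketched with simple loops. These are slow stand-ins written for this illustration, not the FXT routines bit_unzip2() and bit_zip2(); the names carry a _sketch suffix for that reason, and the choice that x receives the bits at even indices is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for bit_unzip2(): x receives the bits of t at
// even indices, y those at odd indices (assumed convention).
static inline void unzip2_sketch(uint64_t t, uint64_t &x, uint64_t &y)
{
    x = 0;  y = 0;
    for (unsigned i = 0; i < 32; ++i)
    {
        x |= ((t >> (2*i))   & 1) << i;
        y |= ((t >> (2*i+1)) & 1) << i;
    }
}

// Hypothetical stand-in for bit_zip2(): interleave the low 32 bits
// of x (to even indices) and y (to odd indices).
static inline uint64_t zip2_sketch(uint64_t x, uint64_t y)
{
    assert( (x >> 32) == 0 && (y >> 32) == 0 );
    uint64_t t = 0;
    for (unsigned i = 0; i < 32; ++i)
    {
        t |= ((x >> i) & 1) << (2*i);
        t |= ((y >> i) & 1) << (2*i+1);
    }
    return t;
}
```

For t = 13 (binary 1101) the bits at even indices give x = 3 and those at odd indices give y = 2; zipping (3, 2) returns 13 again.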
The inverse is

    static inline ulong zorder2lin(ulong x, ulong y)  { return bit_zip2(x, y); }

The next pair can be computed with the following (constant amortized time) routine:

    static inline void zorder_next(ulong &x, ulong &y)
    {
        ulong b = 1;
        do
        {
            x ^= b;  b &= ~x;
            y ^= b;  b &= ~y;
            b <<= 1;
        }
        while ( b );
    }

The previous pair is computed similarly:

    static inline void zorder_prev(ulong &x, ulong &y)
    {
        ulong b = 1;
        do
        {
            x ^= b;  b &= x;
            y ^= b;  b &= y;
            b <<= 1;
        }
        while ( b );
    }

The routines are written in a way that generalizes easily to more dimensions:

    static inline void zorder3d_next(ulong &x, ulong &y, ulong &z)
    {
        ulong b = 1;
        do
        {
            x ^= b;  b &= ~x;
            y ^= b;  b &= ~y;
            z ^= b;  b &= ~z;
            b <<= 1;
        }
        while ( b );
    }

    static inline void zorder3d_prev(ulong &x, ulong &y, ulong &z)
    {
        ulong b = 1;
        do
        {
            x ^= b;  b &= x;
            y ^= b;  b &= y;
            z ^= b;  b &= z;
            b <<= 1;
        }
        while ( b );
    }

Unlike with the Hilbert curve there are steps where the curve advances more than one unit.

1.31.3 Curves via paper-folding sequences

The paper-folding sequence, entry A014577 in [312], starts as [FXT: bits/bit-paper-fold-demo.cc]:

    11011001110010011101100011001001110110011100100011011000110010011 ...

The k-th element (k > 0) is one if k = 2^t · (4u + 1), entry A091072 in [312]:

    1, 2, 4, 5, 8, 9, 10, 13, 16, 17, 18, 20, 21, 25, 26, 29, 32, 33, ...

The k-th element of the paper-folding sequence can be computed by testing the value of the bit left to the lowest (that is, rightmost) one in the binary expansion of k [FXT: bits/bit-paper-fold.h]:

    static inline bool bit_paper_fold(ulong k)
    {
        ulong h = k & -k;  // == lowest_one(k)
        k &= (h<<1);
        return ( k==0 );
    }

About 550 million values per second are generated. We use bool as return type to indicate that only zero or one is returned. The value can be used as an integer of arbitrary type, there is no need for a cast.
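The two descriptions of the paper-folding sequence (the bit test and the form k = 2^t · (4u + 1)) can be compared directly. The following sketch does so under the assumption of 64-bit words; the function names are chosen for this illustration and are not part of FXT.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bit test as in the text: the element is one if the bit to the left
// of the lowest set bit of k is zero.
static inline bool paper_fold_sketch(uint64_t k)
{
    const uint64_t h = k & (~k + 1);  // lowest set bit of k
    return ( (k & (h << 1)) == 0 );
}

// Characterization k = 2^t * (4u+1): strip the power of two and
// test whether the odd part is 1 modulo 4.
static inline bool paper_fold_by_form(uint64_t k)
{
    assert( k > 0 );
    while ( (k & 1) == 0 )  k >>= 1;
    return ( (k & 3) == 1 );
}

// First n elements (k = 1, 2, ..., n) as a string of '0' and '1'.
static inline std::string paper_fold_prefix(unsigned n)
{
    std::string s;
    for (uint64_t k = 1; k <= n; ++k)  s += ( paper_fold_sketch(k) ? '1' : '0' );
    return s;
}
```

The first sixteen elements come out as 1101100111001001, in agreement with the start of the sequence given above.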
Figure 1.31-F: The first 1024 segments of the dragon curve (two different renderings).

1.31.3.1 The dragon curve

Another name for the sequence is dragon curve sequence, because a space filling curve known as dragon curve (or Heighway dragon) can be generated if we interpret a one as ‘turn left’ and a zero as ‘turn right’. The top of figure 1.31-F shows the first 1024 segments of the curve (created with [FXT: bits/dragon-curve-texpic-demo.cc]). As some points are visited twice we draw the turns with cut off corners, for the (left) turn A → B → C:

          C                  C
          |                  |
          |    drawn as      |
          |                 /
    A --- B            A --/B

The code is given in [FXT: aux0/tex-line.cc].

The first few moves of the curve can be found by repeatedly folding a strip of paper. Always pick up the right side and fold to the left. Unfold the paper and adjust all corners to be 90 degrees. This gives the first few segments of the dragon curve.

When all angles are replaced by diagonals between the midpoints of the lines

          C                    C
          |                   /
          |    drawn as      /
          |                 /
    A --- B            A       B

then the curve appears as shown at the bottom of figure 1.31-F.

    Start: 0
    Rules:
      0 --> 01
      1 --> 21
      2 --> 23
      3 --> 03
    -------------
    0:  0
    1:  01
    2:  0121
    3:  01212321
    4:  0121232123032321
    5:  01212321230323212303010323032321
    6:  0121232123032321230301032303232123030103012101032303010323032321
        +^-^-v-^-v+v-v-^-v+v+^+v-v+v-v-^-v+v+^+v+^-^+^+v-v+v+^+v-v+v-v-^

Figure 1.31-G: Moves of the dragon curve generated by a string substitution process.

The net rotation of the dragon-curve after k steps, as multiple of the right angle, can be computed by counting the ones in the Gray code of k. Take the result modulo 4 to ignore multiples of 360 degree [FXT: bits/bit-paper-fold.h]:

    static inline ulong bit_dragon_rot(ulong k)
    { return bit_count( k ^ (k>>1) ) & 3; }

The sequence of rotations is entry A005811 in [312]:

    seq   = 0 1 2 1 2 3 2 1 2 3 4 3 2 3 2 1 2 3 4 3 4 5 4 3 2 3 4 3 2 3 2 1 2 3 ...
    mod 4 = 0 1 2 1 2 3 2 1 2 3 0 3 2 3 2 1 2 3 0 3 0 1 0 3 2 3 0 3 2 3 2 1 2 3 ...
    move  = + ^ - ^ - v - ^ - v + v - v - ^ - v + v + ^ + v - v + v - v - ^ - v ...

The sequence of moves (as symbols, last row) can be computed with [FXT: bits/dragon-curve-moves-demo.cc].

A function related to the paper-folding sequence is described in section 38.8.3 on page 744.

1.31.3.2 The alternate paper-folding sequence

If the strip of paper is folded alternately from the left and right, then another paper-folding sequence is obtained. It is entry A106665 in [312] and it starts as [FXT: bits/bit-paper-fold-alt-demo.cc]:

    10011100100011011001110110001100100111001000110010011101100011011 ...

Compute the sequence via [FXT: bits/bit-paper-fold.h]

    static inline bool bit_paper_fold_alt(ulong k)
    {
        ulong h = k & -k;  // == lowest_one(k)
        h <<= 1;
        ulong t = h & (k ^ 0xaaaaaaaaUL);
        return ( t!=0 );  // 32-bit version
    }

About 413 million values per second are generated.

Figure 1.31-H: The first 512 segments of the curve from the alternate paper-folding sequence.

    Start: 0
    Rules:
      0 --> 01
      1 --> 03
      2 --> 23
      3 --> 21
    -------------
    0:  0
    1:  01
    2:  0103
    3:  01030121
    4:  0103012101032303
    5:  01030121010323030103012123210121
    6:  0103012101032303010301212321012101030121010323032321230301032303
        +^+v+^-^+^+v-v+v+^+v+^-^-v-^+^-^+^+v+^-^+^+v-v+v-v-^-v+v+^+v-v+v

Figure 1.31-I: Moves of the alternate curve generated by a string substitution process.

    Start: L
    Rules:
      L --> L+R+L-R
      R --> L+R-L-R
      + --> +
      - --> -
    -------------
    0:    (#=1)    L
    1:    (#=7)    L+R+L-R
    2:    (#=31)   L+R+L-R+L+R-L-R+L+R+L-R-L+R-L-R
    3:    (#=127)  L+R+L-R+L+R-L-R+L+R+L-R-L+R-L-R+L+R+L-R+L+R-L-R-L+R+L-R-L+R-L-R+L+R+L-R+L+R-L-R+L+ ...

    Start: L
    Rules:
      L --> R+L+R-L
      R --> R+L-R-L
      + --> +
      - --> -
    -------------
    0:    (#=1)    L
    1:    (#=7)    R+L+R-L
    2:    (#=31)   R+L-R-L+R+L+R-L+R+L-R-L-R+L+R-L
    3:    (#=127)  R+L-R-L+R+L+R-L-R+L-R-L-R+L+R-L+R+L-R-L+R+L+R-L+R+L-R-L-R+L+R-L+R+L-R-L+R+L+R-L-R+ ...

Figure 1.31-J: Moves and turns of the dragon curve (top) and alternate dragon curve (bottom).

By interpreting the sequence of zeros and ones as turns we again obtain triangular space-filling curves shown in figure 1.31-H. The orientations can be computed as

    static inline ulong bit_paper_fold_alt_rot(ulong k)
    // Return total rotation (as multiple of the right angle)
    // after k steps in the alternate paper-folding curve.
    // k=      0, 1, 2, 3, 4, 5, ...
    // seq(k)= 0, 1, 0, 3, 0, 1, 2, 1, 0, 1, 0, 3, 2, 3, 0, ...
    // move =  +  ^  +  v  +  ^  -  ^  +  ^  +  v  -  v  +
    // (+==right, -==left, ^==up, v==down).
    // Algorithm: count the ones in (w ^ gray_code(k)).
    {
        const ulong w = 0xaaaaaaaaUL;  // 32-bit version
        return bit_count( w ^ (k ^ (k>>1)) ) & 3;  // modulo 4
    }

Figure 1.31-J shows a different string substitution process for the generation of the rotations (symbols ‘+’ and ‘-’) for the paper-folding sequences, both symbols ‘L’ and ‘R’ are interpreted as a unit move in the current direction.

If the constant in the routine is replaced by a parameter w, then its bits determine whether a left or a right fold was made at each step:

    static inline bool bit_paper_fold_general(ulong k, ulong w)
    {
        ulong h = k & -k;  // == lowest_one(k)
        h <<= 1;
        ulong t = h & (k^w);
        return ( t!=0 );
    }

1.31.4 Terdragon and hexdragon

The terdragon curve turns to the left or right by 120 degrees depending on the sequence

    0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, ...

Figure 1.31-K: The first 729 segments of the terdragon (two different renderings).

Figure 1.31-L: The first 729 segments of the hexdragon.
    Start: 0
    Rules:
      0 --> 010
      1 --> 011
    -------------
    0:    (#=1)    0
    1:    (#=3)    010
    2:    (#=9)    010011010
    3:    (#=27)   010011010010011011010011010
    4:    (#=81)   010011010010011011010011010010011010010011011010011011010011010010011011010011010

    Start: F
    Rules:
      F --> F+F-F
      + --> +
      - --> -
    -------------
    0:    (#=1)    F
    1:    (#=5)    F+F-F
    2:    (#=17)   F+F-F+F+F-F-F+F-F
    3:    (#=53)   F+F-F+F+F-F-F+F-F+F+F-F+F+F-F-F+F-F-F+F-F+F+F-F-F+F-F
    4:    (#=161)  F+F-F+F+F-F-F+F-F+F+F-F+F+F-F-F+F-F-F+F-F+F+F-F-F+F-F+F+F-F+F+F-F-F+F-F+F+F-F+F+F- ...

Figure 1.31-M: Turns of the terdragon curve, generated by string substitution (top), alternative process for the moves and turns (bottom, identify ‘+’ with ‘0’ and ‘-’ with ‘1’).

    Start: F
    Rules:
      F --> F+L+F-L-F
      + --> +
      - --> -
      L --> L
    -------------
    0:    (#=1)    F
    1:    (#=9)    F+L+F-L-F
    2:    (#=33)   F+L+F-L-F+L+F+L+F-L-F-L-F+L+F-L-F
    3:    (#=105)  F+L+F-L-F+L+F+L+F-L-F-L-F+L+F-L-F+L+F+L+F-L-F+L+F+L+F-L-F-L-F+L+F-L-F-L-F+L+F-L-F+ ...

Figure 1.31-N: String substitution process for the hexdragon.

The sequence is entry A080846 in [312], it can be generated via the string substitution with rules 0 ↦ 010 and 1 ↦ 011, see figure 1.31-M. A fast method to compute the sequence is based on radix-3 counting: let C1(k) be the number of ones in the radix-3 expansion of k, the sequence is one if C1(k + 1) < C1(k) [FXT: bits/bit-dragon3.h]:

    static inline bool bit_dragon3_turn(ulong &x)
    // Increment the radix-3 word x and
    // return whether the number of ones in x is decreased.
    {
        ulong s = 0;
        while ( (x & 3) == 2 )  { x >>= 2;  ++s; }  // scan over nines
        // if ( (x & 3) == 0 ) ==> incremented word will have one more 1
        // if ( (x & 3) == 1 ) ==> incremented word will have one less 1
        bool tr = ( (x & 3) != 0 );  // incremented word will have one less 1
        ++x;  // increment next digit
        x <<= (s<<1);  // shift back
        return tr;
    }

About 220 million values per second are generated.
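That the counting routine and the string substitution 0 ↦ 010, 1 ↦ 011 give the same sequence can be checked with a few lines of code. The sketch below assumes 64-bit words with two bits per radix-3 digit, as in the routine above; the function names are invented for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Same steps as the counting routine in the text: increment the
// radix-3 word x (two bits per digit) and return whether the number
// of ones among its digits decreases.
static inline bool dragon3_turn_sketch(uint64_t &x)
{
    uint64_t s = 0;
    while ( (x & 3) == 2 )  { x >>= 2;  ++s; }  // skip trailing digits 2
    assert( (x & 3) != 3 );  // 3 is not a valid radix-3 digit
    const bool tr = ( (x & 3) != 0 );
    ++x;               // increment next digit
    x <<= (2 * s);     // shift back, skipped digits become zero
    return tr;
}

// First n turns obtained by counting, as a string of '0' and '1'.
static inline std::string terdragon_by_counting(unsigned n)
{
    std::string s;
    uint64_t x = 0;
    for (unsigned i = 0; i < n; ++i)  s += ( dragon3_turn_sketch(x) ? '1' : '0' );
    return s;
}

// The word after the given number of substitution steps, start '0'.
static inline std::string terdragon_by_substitution(unsigned levels)
{
    std::string s = "0";
    for (unsigned l = 0; l < levels; ++l)
    {
        std::string t;
        for (char c : s)  t += ( c == '0' ? "010" : "011" );
        s = t;
    }
    return s;
}
```

The agreement is no accident: a step whose counter ends in digit 0 or 1 gives turn 0 or 1, and a step whose counter ends in digit 2 repeats the turn for the counter with that digit removed, which is exactly the structure of the substitution.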
Two renderings of the first 729 segments of the curve are shown in figure 1.31-K (created with [FXT: bits/dragon3-texpic-demo.cc]).

If we replace each turn by 120 degrees (followed by a line) by two turns by 60 degrees (each followed by a line) we obtain what may be called a hexdragon, shown in figure 1.31-L (created with [FXT: bits/dragonhex-texpic-demo.cc]). A string substitution process for the hexdragon is shown in figure 1.31-N.

1.31.5 Dragon curves based on radix-R counting

Another dragon curve can be generated based on radix-5 counting (we will call the curve R5-dragon) [FXT: bits/bit-dragon-r5.h]:

    static inline bool bit_dragon_r5_turn(ulong &x)
    // Increment the radix-5 word x and
    // return (tr) whether the lowest nonzero digit
    // of the incremented word is > 2.
    {
        ulong s = 0;
        while ( (x & 7) == 4 )  { x >>= 3;  ++s; }  // scan over nines
        bool tr = ( (x & 7) >= 2 );  // whether digit will be > 2
        ++x;  // increment next digit
        x <<= (3*s);  // shift back
        return tr;
    }

About 310 million values per second are generated. The turns are by 90 degrees. Two renderings of the R5-dragon are shown in figure 1.31-O (created with [FXT: bits/dragon-r5-texpic-demo.cc]). The sequence of returned values (entry A175337 in [312]) can be computed via the string substitution shown in figure 1.31-R (top).

Figure 1.31-O: The first 625 segments of the R5-dragon (two different renderings).

Based on radix-7 counting we can generate a curve that will be called the R7-dragon, the turns are by 120 degrees [FXT: bits/bit-dragon-r7.h]:

    static inline bool bit_dragon_r7_turn(ulong &x)
    // Increment the radix-7 word x and
    // return (tr) whether the lowest nonzero digit
    // of the incremented word is either 2, 3, or 6.
    {
        ulong s = 0;
        while ( (x & 7) == 6 )  { x >>= 3;  ++s; }  // scan over nines
        ++x;  // increment next digit
        bool tr = ( x & 2 );  // whether digit is either 2, 3, or 6
        x <<= (3*s);  // shift back
        return tr;
    }

Figure 1.31-P: The first 2401 segments of the R7-dragon (two different renderings).

Figure 1.31-Q: The first 2401 segments of the second R7-dragon (two different renderings).

    Start: 0
    Rules:
      0 --> 00110
      1 --> 00111
    -------------
    0:    (#=1)    0
    1:    (#=5)    00110
    2:    (#=25)   0011000110001110011100110
    3:    (#=125)  00110001100011100111001100011000110001110011100110001100011000111001110011100 \
                   110001100011100111001110011000110001110011100110

    Start: 0
    Rules:
      0 --> 0100110
      1 --> 0110110
    -------------
    0:    (#=1)    0
    1:    (#=7)    0100110
    2:    (#=49)   0100110011011001001100100110011011001101100100110
    3:    (#=343)  010011001101100100110010011001101100110110010011001001100110110011011001001 ...

    Start: 0
    Rules:
      0 --> 0++--00
      + --> 0++--0+
      - --> 0++--0-
    -------------
    0:    (#=1)    0
    1:    (#=7)    0++--00
    2:    (#=49)   0++--000++--0+0++--0+0++--0-0++--0-0++--000++--00
    3:    (#=343)  0++--000++--0+0++--0+0++--0-0++--0-0++--000++--000++--000++--0+0++--0+0++-- ...

Figure 1.31-R: Turns of the R5-dragon (top), the R7-dragon (middle), and the second R7-dragon (bottom), generated by string substitution.

Two renderings of the R7-dragon are shown in figure 1.31-P (created with [FXT: bits/dragon-r7-texpic-demo.cc]). The sequence of returned values (entry A176405 in [312]) can be computed via the string substitution shown in figure 1.31-R (middle).

Turns for another curve based on radix-7 counting (shown in figure 1.31-Q, created with [FXT: bits/dragon-r7-2-texpic-demo.cc]) can be computed as follows:

    static inline int bit_dragon_r7_2_turn(ulong &x)
    // Increment the radix-7 word x and
    // return (tr) according to the lowest nonzero digit d
    // of the incremented word:
    // d==[1,2,3,4,5,6] ==> rt:=[0,+1,+1,-1,-1,0]
    // (tr * 120deg) is the turn with the second R7-dragon.
    {
        ulong s = 0;
        while ( (x & 7) == 6 )  { x >>= 3;  ++s; }  // scan over nines
        ++x;  // increment next digit
        int tr = 2 - ( (0x2f58 >> (2*(x&7)) ) & 3 );
        x <<= (3*s);  // shift back
        return tr;
    }

The sequence of turns can be generated by the string substitution shown in figure 1.31-R (bottom), it is entry A176416 in [312].

    Start: F
    Rules:
      F --> F+F+F-F-F
      + --> +
      - --> -
    -------------
    0:    (#=1)    F
    1:    (#=9)    F+F+F-F-F
    2:    (#=49)   F+F+F-F-F+F+F+F-F-F+F+F+F-F-F-F+F+F-F-F-F+F+F-F-F
    3:    (#=249)  F+F+F-F-F+F+F+F-F-F+F+F+F-F-F-F+F+F-F-F-F+F+F-F-F+F+F+F-F-F+F+F+F-F-F+F+F+F-F-F-F+ ...

    Start: F
    Rules:
      F --> F+F-F-F+F+F-F
      + --> +
      - --> -
    -------------
    0:    (#=1)    F
    1:    (#=13)   F+F-F-F+F+F-F
    2:    (#=97)   F+F-F-F+F+F-F+F+F-F-F+F+F-F-F+F-F-F+F+F-F-F+F-F-F+F+F-F+F+F-F-F+F+F-F+F+F-F-F+F+F- ...

    Start: F
    Rules:
      F --> F0F+F+F-F-F0F
      + --> +
      - --> -
      0 --> 0
    -------------
    0:    (#=1)    F
    1:    (#=13)   F0F+F+F-F-F0F
    2:    (#=97)   F0F+F+F-F-F0F0F0F+F+F-F-F0F+F0F+F+F-F-F0F+F0F+F+F-F-F0F-F0F+F+F-F-F0F-F0F+F+F-F-F0 ...

Figure 1.31-S: String substitution processes for the turns (symbols ‘+’ and ‘-’) and moves (symbol ‘F’ is a unit move in the current direction) of the R5-dragon (top), the R7-dragon (middle), and the second R7-dragon (bottom).

Two curves respectively based on radix-9 and radix-13 counting are shown in figure 1.31-T. The corresponding routines are given in [FXT: bits/bit-dragon-r9.h]

    static inline bool bit_dragon_r9_turn(ulong &x)
    // Increment the radix-9 word x and
    // return (tr) whether the lowest nonzero digit
    // of the incremented word is either 2, 3, 5, or 8.
    // tr determines whether to turn left or right (by 120 degrees)
    // with the R9-dragon fractal.
    // The sequence tr is the fixed point
    // of the morphism 0 |--> 011010010, 1 |--> 011010011.
    // Also fixed point of morphism (identify + with 0 and - with 1)
    // F |--> F+F-F-F+F-F+F+F-F, + |--> +, - |--> -
    // Also fixed point of morphism
    // F |--> G+G-G, G |--> F-F+F, + |--> +, - |--> -
    {
        ulong s = 0;
        while ( (x & 15) == 8 )  { x >>= 4;  ++s; }  // scan over nines
        ++x;  // increment next digit
        bool tr = ( (0x12c >> (x&15)) & 1 );  // whether digit is either 2, 3, 5, or 8
        x <<= (4*s);  // shift back
        return tr;
    }

and [FXT: bits/bit-dragon-r13.h]

    static inline bool bit_dragon_r13_turn(ulong &x)
    // Increment the radix-13 word x and
    // return (tr) whether the lowest nonzero digit
    // of the incremented word is either 3, 6, 8, 9, 11, or 12.
    // tr determines whether to turn left or right (by 90 degrees)
    // with the R13-dragon fractal.
    // The sequence tr is the fixed point
    // of the morphism 0 |--> 0010010110110, 1 |--> 0010010110111.
    // Also fixed point of morphism (identify + with 0 and - with 1)
    // F |--> F+F+F-F+F+F-F+F-F-F+F-F-F, + |--> +, - |--> -
    {
        ulong s = 0;
        while ( (x & 15) == 12 )  { x >>= 4;  ++s; }  // scan over nines
        ++x;  // increment next digit
        bool tr = ( (0x1b48 >> (x&15)) & 1 );  // whether digit is either 3, 6, 8, 9, 11, or 12
        x <<= (4*s);  // shift back
        return tr;
    }

Figure 1.31-T: The R9-dragon (top) and the R13-dragon (bottom).

Chapter 2

Permutations and their operations

We study permutations together with the operations on them, like composition and inversion. We further discuss the decomposition of permutations into cycles and give methods for generating random permutations, cyclic permutations, involutions, and derangements. In-place algorithms for applying several special permutations like the revbin permutation, the Gray permutation, and matrix transposition are given.
Algorithms for the generation of all permutations of a given number of objects and bijections between permutations and mixed radix numbers in factorial base are given in chapter 10. 2.1 Basic definitions and operations A permutation of n elements can be represented by an array X = [x0 , x1 , . . . , xn−1 ]. When the permutation X is applied to F = [f0 , f1 , . . . , fn−1 ], then the element at position k is moved to position xk . A routine for the operation is [FXT: perm/permapply.h]: 1 2 3 4 5 6 7 template void apply_permutation(const ulong *x, const Type *f, Type * restrict g, ulong n) // Apply the permutation x[] to the array f[], // i.e. set g[x[k]] <-- f[k] for all k { for (ulong k=0; k k for all k < n − 1: 1 2 3 4 5 6 7 8 9 10 11 12 13 bool is_connected(const ulong *f, ulong n) { if ( n<=1 ) return true; ulong m = 0; // maximum for (ulong k=0; km ) m = fk; if ( m<=k ) return false; } return true; } To check whether an array is a valid permutation, we need to verify that each index in the valid range appears exactly once. The bit-array described in section 4.6 on page 164 allows doing the job without modifying the input: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 bool is_valid_permutation(const ulong *f, ulong n, bitarray *bp/*=0*/) // Return whether all values 0...n-1 appear exactly once, // i.e. whether f represents a permutation of [0,1,...,n-1]. { // check whether any element is out of range: for (ulong k=0; k=n ) return false; // check whether values are unique: bitarray *tp = bp; if ( 0==bp ) tp = new bitarray(n); tp->clear_all(); ulong k; for (k=0; ktest_set(f[k]) ) break; delete tp; (k==n); } The complement of a permutation is computed by replacing every element v by n − 1 − v [FXT: perm/permcomplement.h]: 1 2 3 4 5 6 inline void make_complement(const ulong *f, ulong *g, ulong n) // Set (as permutation) g to the complement of f. // Can have f==g. { for (ulong k=0; k inline void reverse(Type *f, ulong n) // Reverse order of array f. 
{ for (ulong k=0, i=n-1; k 2 --> 4 ) ( 3 --> 6 --> 5 ) The cycles do ‘wrap around’, for example, the final 4 of the fist cycle goes to position 1, the first element of the cycle. The inverse permutation is found by reversing every arrow in each cycle: ( 1 <-- 2 <-- 4 ) ( 3 <-- 6 <-- 5 ) Equivalently, we can reverse the order of the elements in each cycle: ( 4 --> 2 --> 1 ) ( 5 --> 6 --> 3 ) If we begin each cycle with its smallest element, the inverse permutation is written as ( 1 --> 4 --> 2 ) ( 3 --> 5 --> 6 ) This form is obtained by reversing all elements except the first in each cycle of the (forward) permutation. The last three sets of cycles all describe the same permutation, it is [ 0, 4, 1, 5, 2, 6, 3, 7 ] Permutation: [ 0 2 4 6 1 3 5 7 ] Inverse: [ 0 4 1 5 2 6 3 7 ] Cycles: (0) #=1 (1, 2, 4) #=3 (3, 6, 5) #=3 (7) #=1 Code: template inline void foo_perm_8(Type *f) { { Type t=f[1]; f[1]=f[4]; f[4]=f[2]; { Type t=f[3]; f[3]=f[5]; f[5]=f[6]; } f[2]=t; } f[6]=t; } Figure 2.2-A: A permutation of 8 elements, its inverse, its cycles, and code for the permutation. The cycles form of a permutation can be printed with [FXT: perm/printcycles.cc]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 void print_cycles(const ulong *f, ulong n, bitarray *tb/*=0*/) // Print cycle form of the permutation in f[]. 
// Examples (first permutations of 4 elements in lex order): // array form cycle form // 0: [ 0 1 2 3 ] (0) (1) (2) (3) // 1: [ 0 1 3 2 ] (0) (1) (2, 3) // 2: [ 0 2 1 3 ] (0) (1, 2) (3) // 3: [ 0 2 3 1 ] (0) (1, 2, 3) // 4: [ 0 3 1 2 ] (0) (1, 3, 2) // 5: [ 0 3 2 1 ] (0) (1, 3) (2) // 6: [ 1 0 2 3 ] (0, 1) (2) (3) // 7: [ 1 0 3 2 ] (0, 1) (2, 3) // 8: [ 1 2 0 3 ] (0, 1, 2) (3) { bitarray *b = tb; 2.3: Compositions of permutations 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 105 if ( tb==0 ) b = new bitarray(n); b->clear_all(); for (ulong k=0; ktest(k) ) continue; // already processed cout << "("; ulong i = k; // next in cycle const char *cm = ""; do { cout << cm << i; cm = ", "; b->set(i); } while ( (i=f[i]) != k ); // until we meet cycle leader again cout << ") "; } if ( tb==0 ) delete b; } The bit-array (see section 4.6 on page 164 for the implementation) is used to keep track of the elements already processed. The routine can be modified to generate code for applying a given permutation to an array. The program [FXT: perm/cycles-demo.cc] prints cycles and code for a permutation, see figure 2.2-A. 2.2.1 Cyclic permutations A permutation consisting of exactly one cycle is called cyclic. Whether a given permutation has this property can be tested with [FXT: perm/permq.cc]: 1 2 3 4 5 6 7 8 9 bool is_cyclic(const ulong *f, ulong n) // Return whether permutation is exactly one cycle. { if ( n<=1 ) return true; ulong k = 0, e = 0; do { e=f[e]; ++k; } while ( e!=0 ); return (k==n); } The method used is to follow the cycle starting at position zero and counting how long it is. If the length found equals the array length, then the permutation is cyclic. There are (n − 1)! cyclic permutations of n elements. 2.2.2 Sign and parity of a permutation Every permutation can be written as a composition of transpositions (cycles of length 2). This number of transpositions is not unique, but modulo 2 it is unique. 
The sign of a permutation is defined to be +1 if this number is even and −1 if it is odd. The minimal number of transpositions whose composition gives a cycle of length l is l − 1. So the minimal number of transpositions for a permutation consisting of k cycles, where the length of the j-th cycle is l_j, equals $\sum_{j=1}^{k} (l_j - 1) = \left( \sum_{j=1}^{k} l_j \right) - k$. The transposition count modulo 2 is called the parity of a permutation.

2.3 Compositions of permutations

We can apply several permutations to an array, one by one. The resulting permutation is called the composition of the applied permutations. The operation of composition is not commutative: in general f · g ≠ g · f for f ≠ g. We note that the permutations of n elements form a group (of n! elements); the group operation is composition.

2.3.1 The inverse of a permutation

A permutation f is the inverse of the permutation g if it undoes its effect: f · g = id. A test whether two permutations f and g are mutual inverses is

    bool is_inverse(const ulong *f, const ulong *g, ulong n)
    // Return whether f[] is the inverse of g[]
    {
        for (ulong k=0; k<n; ++k)  if ( f[g[k]] != k )  return false;
        return true;
    }

    void make_inverse(ulong *f, ulong n, bitarray *bp/*=0*/)
    // Set (as permutation) f to its own inverse.
    // In-place version using a bit-array for tagging.
    {
        bitarray *tp = bp;
        if ( 0==bp )  tp = new bitarray(n);  // tags
        tp->clear_all();

        for (ulong k=0; k<n; ++k)
        {
            if ( tp->test_clear(k) )  continue;  // already processed
            tp->set(k);

            // invert a cycle:
            ulong i = k;
            ulong g = f[i];  // next index
            while ( 0==(tp->test_set(g)) )
            {
                ulong t = f[g];
                f[g] = i;
                i = g;
                g = t;
            }
            f[g] = i;
        }

        if ( 0==bp )  delete tp;
    }

The extra array of tag-bits can be avoided by using the highest bit of each word as a tag-bit. The scheme would fail if any word of the permutation array had the highest bit set. However, on byte-addressable machines such an array will not fit into memory (for word sizes of 16 or more bits).
To keep the code similar to the version using the bit-array, we define

    static const ulong s1 = 1UL << (BITS_PER_LONG - 1);  // highest bit is tag-bit
    static const ulong s0 = ~s1;  // all bits but tag-bit
    static inline void SET(ulong *f, ulong k)    { f[k&s0] |= s1; }
    static inline void CLEAR(ulong *f, ulong k)  { f[k&s0] &= s0; }
    static inline bool TEST(ulong *f, ulong k)   { return (0!=(f[k&s0]&s1)); }

We have to mask out the tag-bit when using the index variable k. The routine can be implemented as

    void make_inverse(ulong *f, ulong n)
    // Set (as permutation) f to its own inverse.
    // In-place version using highest bits of array as tag-bits.
    {
        for (ulong k=0; k

    void make_square(ulong *f, ulong n, bitarray *bp/*=0*/)
    // Set (as permutation) f to its own square.
    // In-place version using a bit-array for tagging.
    {
        bitarray *tp = bp;
        if ( 0==bp )  tp = new bitarray(n);  // tags
        tp->clear_all();

        for (ulong k=0; k<n; ++k)
        {
            if ( tp->test_clear(k) )  continue;  // already processed
            tp->set(k);

            // square a cycle:
            ulong i = k;
            ulong t = f[i];  // save
            ulong g = f[i];  // next index
            while ( 0==(tp->test_set(g)) )
            {
                f[i] = f[g];
                i = g;
                g = f[g];
            }
            f[i] = t;
        }

        if ( 0==bp )  delete tp;
    }

2.3.3 Composing and powering permutations

The composition of two permutations can be computed as

    void compose(const ulong *f, const ulong *g, ulong * restrict h, ulong n)
    // Set (as permutation) h = f * g
    {
        for (ulong k=0; k

        ulong x = e>0 ? e : -e;

        if ( is_pow_of_2(x) )  // special case x==2^n
        {
            make_square(f, g, n);
            while ( x>2 )
            {
                make_square(g, n);
                x /= 2;
            }
        }
        else
        {
            ulong *tt = t;
            if ( 0==t )  { tt = new ulong[n]; }
            acopy(f, tt, n);

            int firstq = 1;
            while ( 1 )
            {
                if ( x&1 )  // odd
                {
                    if ( firstq )  // avoid multiplication by 1
                    {
                        acopy(tt, g, n);
                        firstq = 0;
                    }
                    else  compose(tt, g, n);

                    if ( x==1 )  goto dort;
                }

                make_square(tt, n);
                x /= 2;
            }
          dort:
            if ( 0==t )  delete [] tt;
        }

        if ( e<0 )  make_inverse(g, n);
    }

The routine involves O(n log(n)) operations.
By extracting the cycles of the permutation, computing their e-th powers, and copying them back, we could reduce the complexity to only O(n). The e-th power of a cycle is a cyclic shift by e positions, as described in section 2.9 on page 123.

2.4 In-place methods to apply permutations to data

We repeat the routine for applying a permutation [FXT: perm/permapply.h]:

    template <typename Type>
    void apply_permutation(const ulong *x, const Type *f, Type * restrict g, ulong n)
    // Apply the permutation x[] to the array f[],
    // i.e. set g[x[k]] <-- f[k] for all k
    {
        for (ulong k=0; k<n; ++k)  g[x[k]] = f[k];
    }

    template <typename Type>
    void apply_permutation(const ulong *x, Type * restrict f, ulong n, bitarray *bp=0)
    {
        bitarray *tp = bp;
        if ( 0==bp )  tp = new bitarray(n);  // tags
        tp->clear_all();

        for (ulong k=0; k<n; ++k)
        {
            if ( tp->test_clear(k) )  continue;  // already processed
            tp->set(k);

            // --- do cycle: ---
            ulong i = k;  // start of cycle
            Type t = f[i];
            ulong g = x[i];
            while ( 0==(tp->test_set(g)) )  // cf. gray_permute()
            {
                Type tt = f[g];
                f[g] = t;
                t = tt;
                g = x[g];
            }
            f[g] = t;
            // --- end (do cycle) ---
        }

        if ( 0==bp )  delete tp;
    }

To apply the inverse of a permutation without inverting the permutation itself, use

    template <typename Type>
    void apply_inverse_permutation(const ulong *x, const Type *f, Type * restrict g, ulong n)
    {
        for (ulong k=0; k<n; ++k)  g[k] = f[x[k]];
    }

    template <typename Type>
    void apply_inverse_permutation(const ulong *x, Type * restrict f, ulong n, bitarray *bp=0)
    {
        bitarray *tp = bp;
        if ( 0==bp )  tp = new bitarray(n);  // tags
        tp->clear_all();

        for (ulong k=0; k<n; ++k)
        {
            if ( tp->test_clear(k) )  continue;  // already processed
            tp->set(k);

            // --- do cycle: ---
            ulong i = k;  // start of cycle
            Type t = f[i];
            ulong g = x[i];
            while ( 0==(tp->test_set(g)) )  // cf.
            // inverse_gray_permute()
            {
                f[i] = f[g];
                i = g;
                g = x[i];
            }
            f[i] = t;
            // --- end (do cycle) ---
        }

        if ( 0==bp )  delete tp;
    }

A permutation of n elements can be given as a function X(k) (where 0 ≤ X(k) < n for 0 ≤ k < n, and X(i) ≠ X(j) for i ≠ j). The permutation given as function X can be applied to an array f via [FXT: perm/permapplyfunc.h]:

    template <typename Type>
    void apply_permutation(ulong (*x)(ulong), const Type *f, Type * restrict g, ulong n)
    // Set g[x(k)] <-- f[k] for all k
    {
        for (ulong k=0; k<n; ++k)  g[x(k)] = f[k];
    }

    template <typename Type>
    void apply_inverse_permutation(ulong (*x)(ulong), const Type *f, Type * restrict g, ulong n)
    {
        for (ulong k=0; k<n; ++k)  g[k] = f[x(k)];
    }

For the in-place version, the cycle-processing part of the array-based routine,

    [--snip--]
            ulong i = k;  // start of cycle
            Type t = f[i];
            ulong g = x[i];
            while ( 0==(tp->test_set(g)) )  // cf. gray_permute()
            {
                Type tt = f[g];
                f[g] = t;
                t = tt;
                g = x[g];
            }
            f[g] = t;
    [--snip--]

must be modified by replacing all occurrences of ‘x[i]’ with ‘x(i)’:

    void apply_permutation(ulong (*x)(ulong), Type *f, ulong n, bitarray *bp=0)
    [--snip--]
            ulong i = k;  // start of cycle
            Type t = f[i];
            ulong g = x(i);  // <--=
            while ( 0==(tp->test_set(g)) )  // cf. gray_permute()
            {
                Type tt = f[g];
                f[g] = t;
                t = tt;
                g = x(g);  // <--=
            }
            f[g] = t;
    [--snip--]

2.5 Random permutations

The following routine randomly permutes an array with arbitrary elements [FXT: perm/permrand.h]:

    template <typename Type>
    void random_permute(Type *f, ulong n)
    {
        for (ulong k=n; k>1; --k)
        {
            const ulong i = rand_idx(k);
            swap2(f[k-1], f[i]);
        }
    }

An alternative version for the loop is:

        for (ulong k=1; k<n; ++k)
        {
            const ulong i = rand_idx(k+1);
            swap2(f[k], f[i]);
        }

The auxiliary routine rand_idx() is

    inline ulong rand_idx(ulong m)
    // Return random number in the range [0, 1, ..., m-1].
    // Must have m>0.
    {
        if ( m==1 )  return 0;  // could also use % 1
        ulong x = (ulong)rand();
        x ^= x>>16;  // avoid using low bits of rand() alone
        return  x % m;
    }

A random permutation is computed by applying the function to the identical permutation:

    void random_permutation(ulong *f, ulong n)
    // Create a random permutation
    {
        for (ulong k=0; k<n; ++k)  f[k] = k;
        random_permute(f, n);
    }

A random element t of a list L_1, L_2, ..., L_n can be selected as follows:

    1. Set t = L_1 and k = 1.
    2. Set k = k + 1. If k > n return t.
    3. With probability 1/k set t = L_k.
    4. Go to step 2.
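The selection steps above can be sketched in C++ as follows. This is only an illustration under stated assumptions: the name `random_element` and the use of the standard `<random>` facilities are choices made here and are not taken from FXT. The routine makes a single pass and never needs the list length in advance.

```cpp
#include <cstddef>
#include <iterator>
#include <random>

// Hypothetical helper (not part of FXT): return an iterator to a random
// element of [first, last), or last if the range is empty.
template <typename Iter, typename Rng>
Iter random_element(Iter first, Iter last, Rng &rng)
{
    if (first == last) return last;      // empty list
    Iter t = first;                      // step 1: t = L_1, k = 1
    std::size_t k = 1;
    for (Iter it = std::next(first); it != last; ++it)
    {
        ++k;                             // step 2: next element exists
        // step 3: with probability 1/k replace the current choice
        std::uniform_int_distribution<std::size_t> pick(0, k - 1);
        if (pick(rng) == 0) t = it;
    }
    return t;                            // no more elements: return t
}
```

A call could look like `auto it = random_element(v.begin(), v.end(), rng);` with `rng` a `std::mt19937`. Element j survives with probability (1/j)·(j/(j+1))·…·((n−1)/n) = 1/n, so every element is equally likely.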
Note that one does not need to know n, the number of elements in the list, in advance: replace the second statement in step 2 by “If there are no more elements, return t”.

2.5.1 Random cyclic permutation

A routine to apply a random cyclic permutation (as defined in section 2.2.1 on page 105) to an array is [FXT: perm/permrand-cyclic.h]

    template <typename Type>
    void random_permute_cyclic(Type *f, ulong n)
    // Permute the elements of f by a random cyclic permutation.
    {
        for (ulong k=n-1; k>0; --k)
        {
            const ulong i = rand_idx(k);
            swap2(f[k], f[i]);
        }
    }

The method is called Sattolo’s algorithm, see [296], and also [171] and [362]. It can be described as a method to arrange people in a cycle: Assume there are n people in a room. Let the first person choose a successor out of the remaining persons not yet chosen. Then let the person just chosen make the next choice of a successor. Repeat until everyone has been chosen. Finally, let the first person be the successor of the last person chosen.

The cycle representation of a random cyclic permutation can be computed by applying a random permutation to all elements (of the identical permutation) except for the first element.

2.5.2 Random prefix of a permutation

A length-m prefix of a random permutation of n elements is computed by the following routine that uses just O(m) operations [FXT: perm/permrand-pref.h]:

    template <typename Type>
    void random_permute_pref(Type *f, ulong n, ulong m)
    // Set the first m elements to a prefix of a random permutation.
    // Same as: set the first m elements of f to a random permutation
    // of a random selection of all n elements.
    // Must have m<=n-1.
    // Same as random_permute() if m>=n-1.
    {
        if ( m>n-1 )  m = n-1;  // m>n is not admissible
        for (ulong k=0,j=n; k<m; ++k,--j)
        {
            const ulong i = k + rand_idx(j);
            swap2(f[k], f[i]);
        }
    }

    template <typename Type>
    void random_permute_parity(Type *f, ulong n, bool par)
    // Randomly permute the elements of f, such that the
    // parity of the permutation equals par.
    // I.e.
    // the minimal number of transpositions of the
    // permutation is even if par==0, else odd.
    // Note: with n<=1 there is no odd permutation.
    {
        if ( (par==1) && (n<2) )  return;  // not admissible

        bool pr = 0;  // identity has even parity
        for (ulong k=1; k<n; ++k)
        {
            const ulong i = rand_idx(k+1);
            swap2(f[k], f[i]);
            pr ^= ( k != i );  // a proper swap changes the parity
        }

        if ( par != pr )  swap2(f[0], f[1]);  // correct the parity
    }

    template <typename Type>
    void random_ord01_permutation(Type *f, ulong n)
    // Random permutation such that elements 0 and 1 are in order.
    {
        random_permutation(f, n);
        ulong t = 0;
        while ( f[t]>1 )  ++t;
        if ( f[t]==0 )  return;  // already in correct order
        f[t] = 0;
        do  { ++t; }  while ( f[t]!=0 );
        f[t] = 1;
    }

The routine generates half of all the permutations but not their reversals. The following routine fixes the relative order of the m smallest elements:

    template <typename Type>
    void random_ordm_permutation(Type *f, ulong n, ulong m)
    // Random permutation such that the m smallest elements are in order.
    // Must have m<=n.
    {
        random_permutation(f, n);
        for (ulong t=0,j=0; j<m; ++t)  if ( f[t]<m )  { f[t] = j;  ++j; }
    }

    template <typename Type>
    void random_lastm_permutation(Type *f, ulong n, ulong m)
    // Random permutation such that 0 appears as last of the m smallest elements.
    // Must have m<=n.
    {
        random_permutation(f, n);
        if ( m<=1 )  return;
        ulong p0=0, pl=0;  // position of 0, and last (in m smallest elements)
        for (ulong t=0, j=0; j

    template <typename Type>
    inline ulong random_cycle(Type *f, ulong cl, ulong *r, ulong nr)
    // Permute a random set of elements (whose positions are given in
    // r[0], ..., r[nr-1]) by a random cycle of length cl.
    // Must have nr >= cl and cl != 0.
    {
        if ( cl==1 )  // just remove a random position from r[]
        {
            const ulong i = rand_idx(nr);
            --nr;  swap2( r[nr], r[i] );  // remove position from set
        }
        else  // cl >= 2
        {
            const ulong i0 = rand_idx(nr);
            const ulong k0 = r[i0];  // position of cycle leader
            const Type f0 = f[k0];   // cycle leader
            --nr;  swap2( r[nr], r[i0] );  // remove position from set
            --cl;

            ulong kp = k0;  // position of predecessor in cycle
            do  // create cycle
            {
                const ulong i = rand_idx(nr);
                const ulong k = r[i];  // random available position
                f[kp] = f[k];  // move element
                --nr;  swap2( r[nr], r[i] );  // remove position from set
                kp = k;  // update predecessor
            }
            while ( --cl );

            f[kp] = f0;  // close cycle
        }

        return nr;
    }

To permute according to a cycle type, we call the routine according to the elements of an array c[] that specifies how many cycles of each length are required:

    template <typename Type>
    inline void random_permute_cycle_type(Type *f, ulong n, const ulong *c, ulong *tr=0)
    // Permute the elements of f by a random permutation of prescribed cycle type.
    // The permutation will have c[k] cycles of length k+1.
    // Must have s <= n where s := sum(k=0, n-1, (k+1)*c[k]).
    // If s < n then the permutation will have n-s fixed points.
    {
        ulong *r = tr;
        if ( tr==0 )  r = new ulong[n];
        for (ulong k=0; k

With a random number t where 0 ≤ t < 1: if t > R(n) then a 2-cycle is created, else a fixed point. Here R(n) = I(n − 1)/I(n) is the probability of choosing a fixed point, where I(n) is the number of involutions of n elements. The quantities I(n) cannot be used with fixed precision arithmetic because an overflow would occur for large n.
Instead, we update R(n) via

    $R(n+1) = \frac{1}{1 + n \, R(n)}$    (2.5-2)

The recurrence is numerically stable [FXT: perm/permrand-self-inverse.h]:

    inline void next_involution_branch_ratio(double &rat, double &n1)
    {
        n1 += 1.0;
        rat = 1.0/( 1.0 + n1*rat );
    }

The following routine initializes the array of values R(n):

    inline void init_involution_branch_ratios(double *b, ulong n)
    {
        b[0] = 1.0;
        double rat = 0.5,  n1 = 1.0;
        for (ulong k=1; k<n; ++k)
        {
            b[k] = rat;
            next_involution_branch_ratio(rat, n1);
        }
    }

    template <typename Type>
    inline void random_permute_self_inverse(Type *f, ulong n,
                                            ulong *tr=0, double *tb=0, bool bi=false)
    // Permute the elements of f by a random self-inverse permutation (an involution).
    // Set bi:=true to signal that the branch probabilities in tb[]
    // have been precomputed (via init_involution_branch_ratios()).
    {
        ulong *r = tr;
        if ( tr==0 )  r = new ulong[n];
        for (ulong k=0; k<n; ++k)  r[k] = k;
        ulong nr = n;  // number of elements available
        // available positions are r[0], ..., r[nr-1]

        double *b = tb;
        if ( tb==0 )  { b = new double[n];  bi = false; }
        if ( !bi )  init_involution_branch_ratios(b, n);

        while ( nr>=2 )
        {
            const ulong x1 = nr-1;
            const ulong r1 = r[x1];  // available position
            --nr;  // no swap needed if x1==last

            const double rat = b[nr];  // probability to choose fixed point

            const double t = rnd01();  // 0 <= t < 1
            if ( t > rat )  // 2-cycle
            {
                const ulong x2 = rand_idx(nr);
                const ulong r2 = r[x2];  // random available position != r1
                --nr;  swap2(r[x2], r[nr]);
                swap2( f[r1], f[r2] );
            }
            // else  // fixed point, nothing to do
        }

        if ( tr==0 )  delete [] r;
        if ( tb==0 )  delete [] b;
    }

The auxiliary function rnd01() returns a random number t where 0 ≤ t < 1 [FXT: aux0/randf.cc].

2.5.7 Random derangement

In each step of the routine for a random permutation without fixed points (a derangement) we join two cycles and decide whether to close the resulting cycle. The probability of closing is B(n) = (n − 1) D(n − 2)/D(n) where D(n) is the number of derangements of n elements. This can be seen by dividing relation 11.1-12a on page 280 by D(n):

    $1 = \frac{(n-1) \, D(n-1)}{D(n)} + \frac{(n-1) \, D(n-2)}{D(n)}$    (2.5-3)

The probability B(n) is close to 1/n for large n.
Already for n > 30 the relative error (for B(n) versus 1/n) is less than $10^{-32}$, so B(n) is indistinguishable from 1/n with floating-point types where the mantissa has at most 106 bits. We compute a table of just 32 values B(n) [FXT: perm/permrand-derange.h]:

    // number of precomputed branch ratios:
    #define NUM_PBR  32  // OK for up to 106-bit mantissa

    inline void init_derange_branch_ratios(double *b)
    {
        b[0] = 0.0;  b[1] = 1.0;
        double dn0 = 1.0,  dn1 = 0.0,  n1 = 1.0;
        for (ulong k=2; k

    template <typename Type>
    inline void random_derange(Type *f, ulong n,
                               ulong *tr=0, double *tb=0, bool bi=false)
    // Permute the elements of f by a random derangement.
    // Set bi:=true to signal that the branch probabilities in tb[]
    // have been precomputed (via init_derange_branch_ratios()).
    // Must have n > 1.
    {
        ulong *r = tr;
        if ( tr==0 )  r = new ulong[n];
        for (ulong k=0; k<n; ++k)  r[k] = k;
        ulong nr = n;  // number of elements available
        // available positions are r[0], ..., r[nr-1]

        double *b = tb;
        if ( tb==0 )  { b = new double[NUM_PBR];  bi = false; }
        if ( !bi )  init_derange_branch_ratios(b);

        while ( nr>=2 )
        {
            const ulong x1 = nr-1;  // last element
            const ulong r1 = r[x1];

            const ulong x2 = rand_idx(nr-1);  // random element !=last
            const ulong r2 = r[x2];

            swap2( f[r1], f[r2] );  // join cycles containing f[r1] and f[r2]

            // remove r[x1]=r1 from set:
            --nr;  // swap2(r[x1], r[nr]);  // swap not needed if x1==last

            const double rat = derange_branch_ratio(b, nr);
            const double t = rnd01();  // 0 <= t < 1
            if ( t < rat )  // close cycle
            {
                // remove r[x2]=r2 from set:
                --nr;  swap2(r[x2], r[nr]);
            }
            // else cycle stays open
        }

        if ( tr==0 )  delete [] r;
        if ( tb==0 )  delete [] b;
    }

The method is (essentially) given in [245]. A generalization for permutations with all cycles of length ≥ m is given in [24].

2.5.8 Random connected permutation

A random connected (indecomposable) permutation can be computed via the rejection method: create a random permutation; if it is not connected, repeat.
An implementation is [FXT: perm/permrandconnected.h]

    inline void random_connected_permutation(ulong *f, ulong n)
    {
        for (ulong k=0; k

            // i = 0 ==> [1,2,0]
            // i = 1 ==> [2,1,0]
            // i = 2 ==> [2,0,1]
            return;
        }

        do
        {
            for (ulong k=0; k

            if r>x then swap(a[x], a[r])
        }
    }

The condition r>x before the swap() statement makes sure that the swapping is not undone later when the loop variable x has the value of the present r.

2.6.1 Computation using revbin-update

The key ingredient for a fast permutation routine is the observation that we only need to update the bit-reversed values: given $\tilde{x}$ we can compute $\widetilde{x+1}$ efficiently as described in section 1.14.3 on page 36. A faster routine will be of the form

    procedure revbin_permute(a[], n)
    // a[0..n-1] input,result
    {
        if n<=2 return
        r := 0  // the reversed 0
        for x:=1 to n-1
        {
            r := revbin_upd(r, n/2)
            if r>x then swap(a[x], a[r])
        }
    }

About (n − √n)/2 swap() statements are executed with the revbin permutation of n elements. That is, almost every element is moved for large n, as there are only a few numbers with symmetric bit patterns:

      n       2 · # swaps      # symm. pairs
      2       0                2
      4       2                2
      8       4                4
      16      12               4
      32      24               8
      64      56               8
      2^10    992              32
      2^20    0.999 · 2^20     2^10
      ∞       n − √n           √n

The sequence is entry A045687 in [312]: 0, 2, 4, 12, 24, 56, 112, 238, 480, 992, 1980, 4032, 8064, 16242, 32512, 65280, ...

2.6.2 Exploiting the symmetries of the permutation

Symmetry can be used for further optimization: if for even x < n/2 there is a swap for the pair (x, x̃), then there is also a swap for the pair (n − 1 − x, n − 1 − x̃). As x < n/2 and x̃ < n/2, one has n − 1 − x > n/2 and n − 1 − x̃ > n/2. That is, the swaps are independent.
A routine that uses these observations is

    procedure revbin_permute(a[], n)
    {
        if n<=2 return
        nh := n/2
        r := 0  // the reversed 0
        x := 1
        while x<nh
        {
            // x odd:
            r := r + nh
            swap(a[x], a[r])
            x := x + 1

            // x even:
            r := revbin_upd(r, n/2)
            if r>x then
            {
                swap(a[x], a[r])
                swap(a[n-1-x], a[n-1-r])
            }
            x := x + 1
        }
    }

The code above can be used to derive an optimized version for zero padded data (used with linear convolution, see section 22.1.4 on page 443):

    procedure revbin_permute0(a[], n)
    {
        if n<=2 return
        nh := n/2
        r := 0  // the reversed 0
        x := 1
        while x<nh
        {
            // x odd:
            r := r + nh
            a[r] := a[x]
            a[x] := 0
            x := x + 1

            // x even:
            r := revbin_upd(r, n/2)
            if r>x then swap(a[x], a[r])
            // Omit swap of a[n-1-x] and a[n-1-r] as both are zero
            x := x + 1
        }
    }

We can carry the scheme further, distinguishing whether x mod 4 = 0, 1, 2, or 3, as done in the implementation [FXT: perm/revbinpermute.h]. The following parameters determine how much of the symmetry is used and which version of the revbin-update routine is chosen:

    #define RBP_SYMM  4   // amount of symmetry used: 1, 2, 4 (default is 4)
    #define FAST_REVBIN   // define if using revbin(x, ldn) is faster than updating

We further define a macro to swap elements:

    #define idx_swap(k, r)  { ulong kx=(k), rx=(r);  swap2(f[kx], f[rx]); }

The main routine uses unrolled versions of the revbin permutation for small values of n. These are given in [FXT: perm/shortrevbinpermute.h]. For example, the unrolled routine for n = 16 is

    template <typename Type>
    inline void revbin_permute_16(Type *f)
    {
        swap2(f[1], f[8]);
        swap2(f[2], f[4]);
        swap2(f[3], f[12]);
        swap2(f[5], f[10]);
        swap2(f[7], f[14]);
        swap2(f[11], f[13]);
    }

The code was generated with the program [FXT: perm/cycles-demo.cc], see section 2.2 on page 104.
The routine revbin_permute_leq_64(f,n), which is called for n ≤ 64, selects the correct routine for the parameter n:

    template <typename Type>
    void revbin_permute(Type *f, ulong n)
    {
        if ( n<=64 )
        {
            revbin_permute_leq_64(f, n);
            return;
        }
    [--snip--]

In what follows we set RBP_SYMM to 4, define FAST_REVBIN, and omit the corresponding preprocessor statements. Some auxiliary constants have to be computed:

        const ulong ldn = ld(n);
        const ulong nh = (n>>1);
        const ulong n1 = n - 1;      // = 11111111
        const ulong nx1 = nh - 2;    // = 01111110
        const ulong nx2 = n1 - nx1;  // = 10000001

The main loop is

        ulong k = 0,  r = 0;
        while ( k < (n/RBP_SYMM) )  // n>=16, n/2>=8, n/4>=4
        {
            // ----- k%4 == 0:
            if ( r>k )
            {
                idx_swap(k, r);          // <nh, <nh  11
                idx_swap(n1^k, n1^r);    // >nh, >nh  00
                idx_swap(nx1^k, nx1^r);  // <nh, <nh  11
                idx_swap(nx2^k, nx2^r);  // >nh, >nh  00
            }

            ++k;
            r ^= nh;

            // ----- k%4 == 1:
            if ( r>k )
            {
                idx_swap(k, r);          // <nh, >nh  10
                idx_swap(n1^k, n1^r);    // >nh, <nh  01
            }

            ++k;
            r = revbin(k, ldn);

            // ----- k%4 == 2:
            if ( r>k )
            {
                idx_swap(k, r);          // <nh, <nh  11
                idx_swap(n1^k, n1^r);    // >nh, >nh  00
            }

            ++k;
            r ^= nh;

            // ----- k%4 == 3:
            if ( r>k )
            {
                idx_swap(k, r);          // <nh, >nh  10
                idx_swap(nx1^k, nx1^r);  // <nh, >nh  10
            }

            ++k;
            r = revbin(k, ldn);
        }
    }  // end of the routine

For large n the routine takes about six times longer than a simple array reversal. Much of the time is spent waiting for memory, which suggests that further optimizations would best be attempted with special machine instructions to bypass the cache or with non-temporal writes.

A specialized implementation optimized for zero padded data is given in [FXT: perm/revbinpermute0.h]. Some memory accesses can be avoided for that case. For example, revbin-pairs with both indices greater than n/2 need no processing at all.
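As a reference point for the optimized routine, the following is a minimal, unoptimized sketch of the revbin permutation. The names `revbin_simple`, `revbin_permute_simple`, and `count_revbin_swaps` are made up for this illustration and do not appear in FXT; the bit reversal is recomputed for every index instead of being updated.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Reverse the lowest ldn bits of x (plain loop, not optimized).
inline std::size_t revbin_simple(std::size_t x, unsigned ldn)
{
    std::size_t r = 0;
    for (unsigned i = 0; i < ldn; ++i) { r = (r << 1) | (x & 1);  x >>= 1; }
    return r;
}

// Revbin permutation of a[]; a.size() is assumed to be a power of 2.
template <typename T>
void revbin_permute_simple(std::vector<T> &a)
{
    const std::size_t n = a.size();
    unsigned ldn = 0;
    while ((std::size_t(1) << ldn) < n) ++ldn;
    for (std::size_t x = 0; x < n; ++x)
    {
        const std::size_t r = revbin_simple(x, ldn);
        if (r > x) std::swap(a[x], a[r]);  // r>x: swap each pair only once
    }
}

// Number of swaps executed for n = 2^ldn elements.
inline std::size_t count_revbin_swaps(unsigned ldn)
{
    const std::size_t n = std::size_t(1) << ldn;
    std::size_t c = 0;
    for (std::size_t x = 0; x < n; ++x)
        if (revbin_simple(x, ldn) > x) ++c;
    return c;
}
```

Such a sketch is mainly useful for testing a faster routine against it: applying it twice restores the array, and the swap count can be compared with the (n − √n)/2 estimate for even ldn.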
2.6.3 A pitfall

When working with separate arrays for the real and imaginary parts of complex data, one could remove half of the bookkeeping as follows:

    procedure revbin_permute(a[], b[], n)
    {
        if n<=2 return
        r := 0  // the reversed 0
        for x:=1 to n-1
        {
            r := revbin_upd(r, n/2)  // inline me
            if r>x then
            {
                swap(a[x], a[r])
                swap(b[x], b[r])
            }
        }
    }

If both the real and the imaginary part fit into level-1 cache, the method can lead to a speedup. However, for large arrays the routine can be much slower than two separate calls of the simple method: with FFTs the real and imaginary element for the same index typically lie apart in memory by a power of 2, leading to a high percentage of cache misses with large arrays.

2.7 The radix permutation

The radix permutation is the generalization of the revbin permutation to arbitrary radices. Pairs of elements are swapped when their indices, written in radix r, are reversed. For example, in radix 10 and n = 1000 the elements with indices 123 and 321 will be swapped. The radix permutation is self-inverse.

Code for the radix r permutation of the array f[ ] is given in [FXT: perm/radixpermute.h]. The routine must be called with n a perfect power of the radix r. Radix r = 2 gives the revbin permutation.

    extern ulong radix_permute_nt[];  // == 9, 90, 900, ...  for r=10
    extern ulong radix_permute_kt[];  // == 1, 10, 100, ...  for r=10
    #define NT  radix_permute_nt
    #define KT  radix_permute_kt
    template <typename Type>
    void radix_permute(Type *f, ulong n, ulong r)
    {
        ulong x = 0;
        NT[0] = r-1;  // NT[] == 9, 90, 900, ...  for r=10
        KT[0] = 1;    // KT[] == 1, 10, 100, ...  for r=10
        while ( 1 )
        {
            ulong z = KT[x] * r;
            if ( z>n )  break;
            ++x;
            KT[x] = z;
            NT[x] = NT[x-1] * r;
        }
        // here: n == r**x

        for (ulong i=0, j=0;  i < n-1;  i++)
        {
            if ( i

    template <typename Type>
    void transpose(const Type * restrict f, Type * restrict g, ulong nr, ulong nc)
    // Transpose nr x nc matrix f[] into an nc x nr matrix g[].
    {
        for (ulong r=0; r<nr; r++)
            for (ulong c=0; c<nc; c++)
                g[c*nr+r] = f[r*nc+c];
    }

    template <typename Type>
    void transpose(Type *f, ulong nr, ulong nc, bitarray *ba=0)
    // In-place transposition of an nr X nc array
    // that lies in contiguous memory.
    {
        if ( 1>=nr )  return;
        if ( 1>=nc )  return;

        if ( nr==nc )  transpose_square(f, nr);
        else
        {
            const ulong n1 = nr * nc - 1;

            bitarray *tba = 0;
            if ( 0==ba )  tba = new bitarray(n1);
            else          tba = ba;
            tba->clear_all();

            // SRC(k) is the index of the element that moves to position k,
            // that is, (k*nc) modulo n1.
            for (ulong k=1;  k<n1;  k=tba->next_clear(++k) )  // 0 and n1 are fixed points
            {
                ulong ks = SRC(k);
                ulong kd = k;
                tba->set(kd);
                Type t = f[kd];
                while ( ks != k )
                {
                    f[kd] = f[ks];
                    kd = ks;
                    tba->set(kd);
                    ks = SRC(ks);
                }
                f[kd] = t;
            }

            if ( 0==ba )  delete tba;
        }
    }

One should take care of possible overflows in the calculation of i · nc. If n is a power of 2 (and so are both nr and nc), the multiplications modulo n − 1 are cyclic shifts. Thus any overflow can be avoided and the computation is also significantly cheaper. An implementation is given in [FXT: aux2/transpose2.h].
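The remark about cyclic shifts can be illustrated with a small sketch. It assumes nr = 2^ldr and nc = 2^ldc with nr, nc ≥ 2 and at most 64 index bits; the function names are hypothetical and not taken from aux2/transpose2.h. Because 2^L ≡ 1 modulo 2^L − 1 (with L = ldr + ldc), multiplying an index by nc modulo n − 1 is a left rotation of its L-bit representation by ldc positions, and no wide product is ever formed.

```cpp
#include <cstdint>

// Rotate the lowest ld bits of k left by s positions.
// Assumes 0 <= s < ld <= 64 and k < 2^ld.
inline std::uint64_t rotate_left_bits(std::uint64_t k, unsigned s, unsigned ld)
{
    const std::uint64_t mask = (ld >= 64) ? ~std::uint64_t(0)
                                          : ((std::uint64_t(1) << ld) - 1);
    if (s == 0) return k & mask;
    return ((k << s) | (k >> (ld - s))) & mask;
}

// (k * nc) mod (n - 1) for nr = 2^ldr, nc = 2^ldc, n = nr*nc and 0 <= k < n-1,
// computed without any multiplication (hence without overflow).
inline std::uint64_t src_index_pow2(std::uint64_t k, unsigned ldr, unsigned ldc)
{
    return rotate_left_bits(k, ldc, ldr + ldc);
}
```

For a 4 × 8 matrix (ldr = 2, ldc = 3, n − 1 = 31) the index 5 = 00101 rotates to 01001 = 9, which agrees with 5 · 8 mod 31 = 9.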
2.9 Rotation by triple reversal

To rotate a length-n array by s positions without using any temporary memory, reverse three times as in the following routine [FXT: perm/rotate.h]:

    template <typename Type>
    void rotate_left(Type *f, ulong n, ulong s)
    // Rotate towards element #0
    // Shift is taken modulo n
    {
        if ( s>=n )
        {
            if (n<2)  return;
            s %= n;
        }
        if ( s==0 )  return;

        reverse(f,   s);
        reverse(f+s, n-s);
        reverse(f,   n);
    }

    Rotate left by 3 positions:
      [ 1 2 3 4 5 6 7 8 ]   original array
      [ 3 2 1 4 5 6 7 8 ]   reverse first 3 elements
      [ 3 2 1 8 7 6 5 4 ]   reverse last 8-3=5 elements
      [ 4 5 6 7 8 1 2 3 ]   reverse whole array

    Rotate right by 3 positions:
      [ 1 2 3 4 5 6 7 8 ]   original array
      [ 5 4 3 2 1 6 7 8 ]   reverse first 8-3=5 elements
      [ 5 4 3 2 1 8 7 6 ]   reverse last 3 elements
      [ 6 7 8 1 2 3 4 5 ]   reverse whole array

Figure 2.9-A: Rotation of a length-8 array by 3 positions to the left (top) and right (bottom).

We will call this trick the triple reversal technique. For example, left-rotating an 8-element array by 3 positions is achieved by the steps shown in figure 2.9-A (top).
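The triple reversal can be tried out with standard library calls only. The following sketch assumes a `std::vector` as container and uses `std::reverse` in place of the FXT routine reverse(); the name `rotate_left_by_reversal` is chosen here for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Rotate f left by s positions (shift taken modulo the length)
// using three reversals and no temporary array.
template <typename T>
void rotate_left_by_reversal(std::vector<T> &f, std::size_t s)
{
    const std::size_t n = f.size();
    if (n < 2) return;
    s %= n;
    if (s == 0) return;
    const auto mid = f.begin() + static_cast<std::ptrdiff_t>(s);
    std::reverse(f.begin(), mid);      // reverse first s elements
    std::reverse(mid, f.end());        // reverse last n-s elements
    std::reverse(f.begin(), f.end());  // reverse whole array
}
```

The standard library offers `std::rotate` for the same task; the sketch is only meant to make the three steps of the technique explicit.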
A right rotation of an n-element array by s positions is identical to a left rotation by n − s positions (bottom of figure 2.9-A):

    template <typename Type>
    void rotate_right(Type *f, ulong n, ulong s)
    // Rotate away from element #0
    // Shift is taken modulo n
    {
        if ( s>=n )
        {
            if (n<2)  return;
            s %= n;
        }
        if ( s==0 )  return;

        reverse(f,     n-s);
        reverse(f+n-s, s);
        reverse(f,     n);
    }

We could also execute the (self-inverse) steps of the left-shift routine in reversed order:

        reverse(f,   n);
        reverse(f+s, n-s);
        reverse(f,   s);

                v v v v v     v v v v       <--= want to swap these blocks
    [ 0 1 2 3 4 a b c d e 7 8 w x y z N N ]  original array
    [ 0 1 2 3 4 e d c b a 7 8 w x y z N N ]  reverse first block
    [ 0 1 2 3 4 e d c b a 8 7 w x y z N N ]  reverse range between blocks
    [ 0 1 2 3 4 e d c b a 8 7 z y x w N N ]  reverse second block
    [ 0 1 2 3 4 w x y z 7 8 a b c d e N N ]  reverse whole range
                ^ ^ ^ ^     ^ ^ ^ ^ ^       <--= the swapped blocks

Figure 2.9-B: Swapping the blocks [a b c d e] and [w x y z] via 4 reversals.

The triple reversal trick can also be used to swap two blocks in an array: first reverse the three ranges (first block, range between blocks, last block), then reverse the range that consists of all three. We will call this trick the quadruple reversal technique. The corresponding code is given in [FXT: perm/swapblocks.h]:

    template <typename Type>
    void swap_blocks(Type *f, ulong x1, ulong n1, ulong x2, ulong n2)
    // Swap the blocks starting at indices x1 and x2
    // n1 and n2 are the block lengths
    {
        if ( x1>x2 )  { swap2(x1,x2);  swap2(n1,n2); }
        f += x1;
        x2 -= x1;
        ulong n = x2 + n2;
        reverse(f, n1);
        reverse(f+n1, n-n1-n2);
        reverse(f+x2, n2);
        reverse(f, n);
    }

The elements before x1 and after x2+n2 are not accessed. An example is shown in figure 2.9-B. The listing was created with the program [FXT: perm/swap-blocks-demo.cc].
A routine to undo the effect of swap_blocks(f, x1, n1, x2, n2) can be obtained by reversing the order of the steps:

    template <typename Type>
    void inverse_swap_blocks(Type *f, ulong x1, ulong n1, ulong x2, ulong n2)
    {
        if ( x1>x2 )  { swap2(x1,x2);  swap2(n1,n2); }
        f += x1;
        x2 -= x1;
        ulong n = x2 + n2;
        reverse(f, n);
        reverse(f+x2, n2);
        reverse(f+n1, n-n1-n2);
        reverse(f, n1);
    }

An alternative method is to call swap_blocks(f, x1, n2, x2+n2-n1, n1).

2.10 The zip permutation

Figure 2.10-A: Permutation matrices of the zip permutation (left) and its inverse (right).

The zip permutation moves the elements from the lower half to the even indices and the elements from the upper half to the odd indices. Symbolically,

    [ a b c d A B C D ]  |-->  [ a A b B c C d D ]

The size of the array must be even.
A routine for the permutation is [FXT: perm/zip.h]

    template <typename Type>
    void zip(const Type * restrict f, Type * restrict g, ulong n)
    {
        ulong nh = n/2;
        for (ulong k=0, k2=0;  k<nh;  ++k, k2+=2)  g[k2] = f[k];
        for (ulong k=nh, k2=1;  k<n;  ++k, k2+=2)  g[k2] = f[k];
    }

    template <typename Type>
    void unzip(const Type * restrict f, Type * restrict g, ulong n)
    {
        ulong nh = n/2;
        for (ulong k=0, k2=0;  k<nh;  ++k, k2+=2)  g[k] = f[k2];
        for (ulong k=nh, k2=1;  k<n;  ++k, k2+=2)  g[k] = f[k2];
    }

The in-place version of the zip permutation for arrays whose size is a power of 2 is

    template <typename Type>
    void zip(Type *f, ulong n)
    {
        ulong nh = n/2;
        revbin_permute(f, nh);
        revbin_permute(f+nh, nh);
        revbin_permute(f, n);
    }

The in-place version for the unzip permutation for arrays whose size is a power of 2 is

    template <typename Type>
    void unzip(Type *f, ulong n)
    {
        ulong nh = n/2;
        revbin_permute(f, n);
        revbin_permute(f, nh);
        revbin_permute(f+nh, nh);
    }

If the type Complex consists of two doubles lying contiguous in memory, then we can optimize the procedures as follows:

    void zip(double *f, long n)
    {
        revbin_permute(f, n);
        revbin_permute((Complex *)f, n/2);
    }

    void unzip(double *f, long n)
    {
        revbin_permute((Complex *)f, n/2);
        revbin_permute(f, n);
    }

For arrays whose size n is not a power of 2 the in-place zip permutation can be computed by transposing the data as a 2 × n/2 matrix:

    transpose(f, 2, n/2);  // =^= zip(f, n)

The routines for in-place transposition are given in section 2.8 on page 122. The inverse is computed by transposing the data as an n/2 × 2 matrix:

    transpose(f, n/2, 2);  // =^= unzip(f, n)

While the above mentioned technique is usually not a gain for doing a transposition, it may be used to speed up the revbin permutation itself.
2.11 The XOR permutation

Figure 2.11-A: Permutation matrices of the XOR permutation for length 8 with parameter x = 0 . . . 7. Compare to the table for the dyadic convolution shown in figure 23.8-A on page 481.

The XOR permutation (with parameter x) swaps the element at index k with the element at index x XOR k (see figure 2.11-A). The implementation is easy [FXT: perm/xorpermute.h]:

    template <typename Type>
    void xor_permute(Type *f, ulong n, ulong x)
    {
        if ( 0==x )  return;
        for (ulong k=0; k<n; ++k)
        {
            ulong r = k^x;
            if ( r>k )  swap2(f[r], f[k]);
        }
    }

The XOR permutation is clearly self-inverse. The array length n must be divisible by the smallest power of 2 that is greater than x. For example, n must be even if x = 1 and n must be divisible by 4 if x = 2 or x = 3. With n a power of 2 and x < n one is on the safe side.

The XOR permutation contains a few other permutations as important special cases (for simplicity assume that the array length n is a power of 2): If the third argument x equals n − 1, the permutation is the reversal. With x = 1 neighboring even and odd indexed elements are swapped. With x = n/2 the upper and the lower half of the array are swapped.

We have

    $X_a X_b = X_b X_a = X_c$  where  c = a XOR b    (2.11-1)

For the special case a = b the relation expresses the self-inverse property, as X_0 is the identity.
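The special cases and relation (2.11-1) are easy to check numerically. The sketch below assumes a `std::vector` whose length is a power of 2 and a parameter x smaller than the length; the name `xor_permute_sketch` is chosen here and is not the FXT routine.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// XOR permutation with parameter x: swap f[k] with f[k XOR x].
// Assumes f.size() is a power of 2 and x < f.size().
template <typename T>
void xor_permute_sketch(std::vector<T> &f, std::size_t x)
{
    const std::size_t n = f.size();
    for (std::size_t k = 0; k < n; ++k)
    {
        const std::size_t r = k ^ x;
        if (r > k) std::swap(f[k], f[r]);  // r>k: handle each pair once
    }
}
```

Applied to the identity array of length 8, the parameters 3 and then 5 give the same result as the single parameter 6 = 3 XOR 5, the parameter 7 reverses the array, and the parameter 4 swaps the two halves.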
The XOR permutation occurs in relations between other permutations where we will use the symbol X_a, the subscript a denoting the third argument in the given routine.

2.12 The Gray permutation

Figure 2.12-A: Permutation matrices of the Gray permutation (left) and its inverse (right).

The Gray permutation reorders (length-2^n) arrays according to the binary Gray code described in section 1.16 on page 41. A routine for the permutation is [FXT: perm/graypermute.h]:

    template <typename Type>
    inline void gray_permute(const Type *f, Type * restrict g, ulong n)
    // Put Gray permutation of f[] to g[], i.e. g[gray_code(k)] == f[k]
    {
        for (ulong k=0; k<n; ++k)  g[gray_code(k)] = f[k];
    }

    template <typename Type>
    inline void inverse_gray_permute(const Type *f, Type * restrict g, ulong n)
    // Put inverse Gray permutation of f[] to g[], i.e. g[k] == f[gray_code(k)]
    // (same as: g[inverse_gray_code(k)] == f[k])
    {
        for (ulong k=0; k<n; ++k)  g[k] = f[gray_code(k)];
    }

    template <typename Type>
    void gray_permute(Type *f, ulong n)
    {
        ulong z = 1;   // mask for cycle maxima
        ulong v = 0;   // ~z
        ulong cl = 1;  // cycle length
        for (ulong ldm=1, m=2; m

    template <typename Type>
    void inverse_gray_permute(Type *f, ulong n)
    {
    [--snip--]
            // --- do cycle: ---
            ulong i = z | b.next();  // start of cycle
            Type t = f[i];           // save start value
            ulong g = gray_code(i);  // next in cycle
            for (ulong k=cl-1; k!=0; --k)
            {
                f[i] = f[g];
                i = g;
                g = gray_code(i);
            }
            f[i] = t;
            // --- end (do cycle) ---
    [--snip--]
    }

The Gray permutation is used with certain Walsh transforms, see section 23.7 on page 474.

2.12.3 Performance of the routines

We use the convention that the time for an array reversal is 1.0. The operation is completely cache-friendly and therefore fast.
A simple benchmark gives for 16 MB arrays:

    arg 1: 21 == ldn  [Using 2**ldn elements]  default=21
    arg 2: 10 == rep  [Number of repetitions]  default=10
    Memsize = 16384 kiloByte == 2097152 doubles
                     reverse(f,n);  dt= 0.0103524   MB/s= 1546   rel= 1
              revbin_permute(f,n);  dt= 0.0674235   MB/s=  237   rel= 6.51282
             revbin_permute0(f,n);  dt= 0.061507    MB/s=  260   rel= 5.94131
                gray_permute(f,n);  dt= 0.0155019   MB/s= 1032   rel= 1.49742
        inverse_gray_permute(f,n);  dt= 0.0150641   MB/s= 1062   rel= 1.45512

The revbin permutation takes about 6.5 units, because its memory access pattern is very problematic with respect to cache usage. The Gray permutation needs only 1.50 units. The difference gets bigger for machines whose memory is slow relative to the CPU.

The relative speeds are quite different for small arrays. With 16 kB (2048 doubles) we obtain

    arg 1: 11 == ldn  [Using 2**ldn elements]  default=21
    arg 2: 100000 == rep  [Number of repetitions]  default=512
    Memsize = 16 kiloByte == 2048 doubles
                     reverse(f,n);  dt=1.88726e-06  MB/s= 8279   rel= 1
              revbin_permute(f,n);  dt=3.22166e-06  MB/s= 4850   rel= 1.70706
             revbin_permute0(f,n);  dt=2.69212e-06  MB/s= 5804   rel= 1.42647
                gray_permute(f,n);  dt=4.75155e-06  MB/s= 3288   rel= 2.51769
        inverse_gray_permute(f,n);  dt=3.69237e-06  MB/s= 4232   rel= 1.95647

Due to the small size, the cache problems are gone.

2.13 The reversed Gray permutation

The reversed Gray permutation of a length-n array is computed by permuting the elements in the way that the Gray permutation would permute the upper half of an array of length 2n. The array size n must be a power of 2. An implementation is [FXT: perm/grayrevpermute.h]:

    template <typename Type>
    inline void gray_rev_permute(const Type *f, Type * restrict g, ulong n)
    // gray_rev_permute() =^=
    // { reverse(); gray_permute(); }
    {
        for (ulong k=0, m=n-1; k<n; ++k, --m)  g[gray_code(m)] = f[k];
    }

[...]

    template <typename Type>
    void gray_rev_permute(Type *f, ulong n)
    // n must be a power of 2, n<=2**(BITS_PER_LONG-2)
    {
        f -= n;  // note!

        ulong z = 1;  // mask for cycle maxima
        ulong v = 0;  // ~z
        ulong cl = 1;  // cycle length
        ulong ldm, m;
        for (ldm=1, m=2; m<=n; ++ldm, m<<=1)
        {
            z <<= 1;
            v <<= 1;
            if ( is_pow_of_2(ldm) )
            {
                ++z;
                cl <<= 1;
            }
            else  ++v;
        }

        ulong tv = v, tu = 0;  // cf. bitsubset.h
        do
        {
            tu = (tu-tv) & tv;
            ulong i = z | tu;  // start of cycle

            // --- do cycle: ---
            ulong g = gray_code(i);
            Type t = f[i];
            for (ulong k=cl-1; k!=0; --k)
            {
                Type tt = f[g];
                f[g] = t;
                t = tt;
                g = gray_code(g);
            }
            f[g] = t;
            // --- end (do cycle) ---
        }
        while ( tu );
    }

The routine for the inverse permutation again differs only in the way the cycles are processed:

    template <typename Type>
    void inverse_gray_rev_permute(Type *f, ulong n)
    {
    [--snip--]
        // --- do cycle: ---
        Type t = f[i];  // save start value
        ulong g = gray_code(i);  // next in cycle
        for (ulong k=cl-1; k!=0; --k)
        {
            f[i] = f[g];
            i = g;
            g = gray_code(i);
        }
        f[i] = t;
        // --- end (do cycle) ---
    [--snip--]
    }

Let G denote the Gray permutation, \overline{G} the reversed Gray permutation, r the reversal, h the swap of the upper and lower halves, and X_a the XOR permutation (with parameter a) from section 2.11 on page 127. We have

    \overline{G} = G r = h G                                        (2.13-1a)
    \overline{G}^{-1} = r G^{-1} = G^{-1} h                         (2.13-1b)
    \overline{G}^{-1} G = G^{-1} \overline{G} = r = X_{n-1}         (2.13-1c)
    \overline{G} G^{-1} = G \overline{G}^{-1} = h = X_{n/2}         (2.13-1d)

Chapter 3

Sorting and searching

We give various sorting algorithms and some practical variants of them, like sorting index arrays and pointer sorting. Searching methods both for sorted and for unsorted arrays are described. Finally we give methods for the determination of equivalence classes.

3.1 Sorting algorithms

We give sorting algorithms like selection sort, quicksort, merge sort, counting sort and radix sort. A massive amount of literature exists about the topic so we will not explore the details. Very readable texts are [115] and [306], while in-depth information can be found in [214].
3.1.1 Selection sort

    [ n o w s o r t m e ]
    [ e o w s o r t m n ]
    [   m w s o r t o n ]
    [     n s o r t o w ]
    [       o o r t s w ]
    [         o r t s w ]
    [           r t s w ]
    [             s t w ]
    [               t w ]
    [                 w ]
    [ e m n o o r s t w ]

Figure 3.1-A: Sorting the string ‘nowsortme’ with the selection sort algorithm.

There are several algorithms for sorting that have complexity O(n^2) where n is the size of the array to be sorted. Here we use selection sort, where the idea is to find the minimum of the array, swap it with the first element, and repeat for all elements but the first. A demonstration of the algorithm is shown in figure 3.1-A; this is the output of [FXT: sort/selection-sort-demo.cc]. The implementation is straightforward [FXT: sort/sort.h]:

    template <typename Type>
    void selection_sort(Type *f, ulong n)
    // Sort f[] (ascending order).
    // Algorithm is O(n*n), use for short arrays only.
    {
        for (ulong i=0; i<n; ++i)
        {
            Type v = f[i];
            ulong m = i;  // position of minimum
            ulong j = n;
            while ( --j > i )  // search (index of) minimum
            {
                if ( f[j]<v )
                {
                    m = j;
                    v = f[m];
                }
            }
            swap2(f[i], f[m]);
        }
    }

A verification routine is:

    template <typename Type>
    bool is_sorted(const Type *f, ulong n)
    // Return whether the sequence f[0], f[1], ..., f[n-1] is ascending.
    {
        for (ulong k=1; k<n; ++k)  if ( f[k-1] > f[k] )  return false;
        return true;
    }

A test for descending order is

    template <typename Type>
    bool is_falling(const Type *f, ulong n)
    // Return whether the sequence f[0], f[1], ..., f[n-1] is descending.
    {
        for (ulong k=1; k<n; ++k)  if ( f[k-1] < f[k] )  return false;
        return true;
    }

3.1.2 Quicksort

[...]

    template <typename Type>
    ulong partition(Type *f, ulong n)
    {
        // Avoid worst case with already sorted input:
        const Type v = median3(f[0], f[n/2], f[n-1]);

        ulong i = 0UL - 1;
        ulong j = n;
        while ( 1 )
        {
            do { ++i; }  while ( f[i]<v );
            do { --j; }  while ( f[j]>v );

            if ( i<j )  swap2(f[i], f[j]);
            else        return j;
        }
    }

The function median3() used here is:

    template <typename Type>
    static inline Type median3(const Type &x, const Type &y, const Type &z)
    // Return median of the input values
    { return  x<y ? (y<z ? y : (x<z ? z : x)) : (z<y ? y : (z<x ? z : x)); }

The quicksort routine is:

    template <typename Type>
    void quick_sort(Type *f, ulong n)
    {
        if ( n<=1 )  return;

        ulong p = partition(f, n);
        ulong ln = p + 1;
        ulong rn = n - ln;
        quick_sort(f, ln);     // f[0] ... f[ln-1]  left
        quick_sort(f+ln, rn);  // f[ln] ... f[n-1]  right
    }

The actual implementation uses two optimizations: Firstly, if the number of elements to be sorted is less than a certain threshold, selection sort is used. Secondly, the recursive call is made only for the smaller of the two sub-arrays (the larger one is handled by iteration), so the stack depth is bounded by ⌈log_2(n)⌉.

    template <typename Type>
    void quick_sort(Type *f, ulong n)
    {
     start:
        if ( n<8 )  // parameter: threshold for nonrecursive algorithm
        {
            selection_sort(f, n);
            return;
        }

        ulong p = partition(f, n);
        ulong ln = p + 1;
        ulong rn = n - ln;

        if ( ln>rn )  // recursion for shorter sub-array
        {
            quick_sort(f+ln, rn);  // f[ln] ... f[n-1]  right
            n = ln;
        }
        else
        {
            quick_sort(f, ln);  // f[0] ... f[ln-1]  left
            n = rn;
            f += ln;
        }

        goto start;
    }

The quicksort algorithm will be quadratic with certain inputs. A clever method to construct such inputs is described in [247]. The heapsort algorithm is in-place and O(n log(n)) (also in the worst case). It is described in section 3.1.5 on page 141. Inputs that lead to quadratic time for the quicksort algorithm with median-of-3 partitioning are described in [257]. The paper suggests using quicksort, but detecting problematic behavior at runtime and switching to heapsort if needed. The corresponding algorithm is called introsort (for introspective sorting).

3.1.3 Counting sort and radix sort

We want to sort an n-element array F of (unsigned) 8-bit values. A sorting algorithm which involves only 2 passes through the data proceeds as follows:

1. Allocate an array C of 256 integers and set all its elements to zero.
2. Count: for k = 0, 1, ..., n − 1 increment C[F[k]]. Now C[x] contains the number of bytes in F with the value x.
3. Set r = 0. For j = 0, 1, ..., 255 set k = C[j], then set the elements F[r], F[r + 1], ..., F[r + k − 1] to j, and add k to r.

For large values of n this method is significantly faster than any other sorting algorithm.
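Steps 1 to 3 translate almost literally into code. The following is a minimal sketch under the assumption that the data is held as unsigned char; the function name counting_sort_bytes() is chosen here for illustration and is not taken from FXT.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counting sort for an array of 8-bit values (steps 1 to 3 of the text).
void counting_sort_bytes(unsigned char *f, std::size_t n)
{
    // Step 1: array of 256 counters, all zero.
    std::vector<std::size_t> c(256, 0);

    // Step 2: count how often each value occurs.
    for (std::size_t k=0; k<n; ++k)  ++c[ f[k] ];

    // Step 3: overwrite f[] with c[j] copies of j, for j = 0, 1, ..., 255.
    std::size_t r = 0;
    for (unsigned j=0; j<256; ++j)
    {
        for (std::size_t k=c[j]; k!=0; --k)  f[r++] = (unsigned char)j;
    }
    assert( r==n );  // every element has been written exactly once
}
```

The data is read once (counting) and written once (filling), independent of the initial order of the elements.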
Note that no comparisons are made between the elements of F. Instead they are counted; the algorithm is the counting sort algorithm.

It might seem that the idea applies only to very special cases, but with a little care it can be used in more general situations. We modify the method so that it can also sort (unsigned) integers whose range of values would make the plain method impractical: the array is sorted with respect to a subrange of the bits in each word. We need an array G that has as many elements as F:

1. Choose any consecutive run of b bits, these will be represented by a bit mask m. Allocate an array C of 2^b integers and set all its elements to zero.
2. Let M be a function that maps the 2^b values of interest (the bits masked out by m) to the range 0, 1, ..., 2^b − 1.
3. Count: for k = 0, 1, ..., n − 1 increment C[M(F[k])]. Now C[x] contains how many values of M(F[.]) equal x.
4. Cumulate: for j = 1, 2, ..., 2^b − 1 (second to last) add C[j − 1] to C[j]. Now C[x] contains the number of values M(F[.]) less than or equal to x.
5. Copy: for k = n − 1, ..., 2, 1, 0 (last to first), do as follows: set x := M(F[k]), decrement C[x], set i := C[x], and set G[i] := F[k].

A crucial property of the algorithm is that it is stable: if two (or more) elements compare equal (with respect to a certain bit-mask m), then the relative order between these elements is preserved.

    Input                Counting sort wrt. two lowest bits
                         m = ......11
    0:  11111.11<        0:  ....1...
    1:  ....1...         1:  ..1111..
    2:  ...1.1.1         2:  .111....
    3:  ..1...1.         3:  ...1.1.1
    4:  ..1.1111<        4:  .1..1..1
    5:  ..1111..         5:  ..1...1.
    6:  .1..1..1         6:  .1.1.11.
    7:  .1.1.11.         7:  11111.11<
    8:  .11...11<        8:  ..1.1111<
    9:  .111....         9:  .11...11<

The relative order of the three words ending with two set bits (marked with ‘<’) is preserved.
A routine that verifies whether an array is sorted with respect to a bit range specified by the variables b0 and m is [FXT: sort/radixsort.cc]:

    bool is_counting_sorted(const ulong *f, ulong n, ulong b0, ulong m)
    // Whether f[] is sorted wrt. bits b0,...,b0+z-1
    // where z is the number of bits set in m.
    // m must contain a single run of bits starting at bit zero.
    {
        m <<= b0;
        for (ulong k=1; k<n; ++k)
        {
            ulong xm = (f[k-1] & m ) >> b0;
            ulong xp = (f[k] & m ) >> b0;
            if ( xm>xp )  return false;
        }
        return true;
    }

The function M is the combination of a mask-out and a shift operation. A routine that sorts according to b0 and m is:

    void counting_sort_core(const ulong * restrict f, ulong n, ulong * restrict g, ulong b0, ulong m)
    // Write to g[] the array f[] sorted wrt. bits b0,...,b0+z-1
    // where z is the number of bits set in m.
    // m must contain a single run of bits starting at bit zero.
    {
        ulong nb = m + 1;
        m <<= b0;
        ALLOCA(ulong, cv, nb);
        for (ulong k=0; k<nb; ++k)  cv[k] = 0;

        // --- count:
        for (ulong k=0; k<n; ++k)
        {
            ulong fk = f[k];
            ulong x = (fk & m) >> b0;
            ++cv[ x ];
        }

        // --- cumulative sums:
        for (ulong k=1; k<nb; ++k)  cv[k] += cv[k-1];

        // --- reorder:
        ulong k = n;
        while ( k-- )  // backwards ==> stable sort
        {
            ulong fk = f[k];
            ulong x = (fk & m) >> b0;
            --cv[x];
            ulong i = cv[x];
            g[i] = fk;
        }
    }

    Input       Stage 1       Stage 2       Stage 3
                m = ....11    m = ..11..    m = 11....
                        vv          vv        vv
    111.11      ..1...        11....        ..1...
    ..1...      1111..        1...1.        ..1..1
    .1.1.1      11....        1...11        .1.1.1
    1...1.      .1.1.1        .1.1.1        .1.11.
    1.1111      ..1..1        .1.11.        1...1.
    1111..      1...1.        ..1...        1...11
    ..1..1      .1.11.        ..1..1        1.1111
    .1.11.      111.11        111.11        11....
    1...11      1.1111        1111..        111.11
    11....      1...11        1.1111        1111..

Figure 3.1-B: Radix sort of 10 six-bit values when using two-bit masks.

Now we can apply counting sort to a set of bit masks that cover the whole range. Figure 3.1-B shows an example with 10 six-bit values and 3 two-bit masks, starting from the least significant bits. This is the output of the program [FXT: sort/radixsort-demo.cc].
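The stages of figure 3.1-B can be reproduced with a few lines. The following is a sketch, not the FXT routine: the names counting_pass() and radix_sort_small() are hypothetical, std::vector replaces the raw arrays, and the mask is given as a bit count instead of a mask word.

```cpp
#include <cassert>
#include <vector>

typedef unsigned long ulong;

// One stable counting sort pass:
// write to g[] the array f[] sorted wrt. the nb bits starting at bit b0.
// g must have the same size as f.
void counting_pass(const std::vector<ulong> &f, std::vector<ulong> &g,
                   ulong b0, ulong nb)
{
    assert( g.size()==f.size() );
    const ulong m = (1UL << nb) - 1;  // mask of nb low bits
    std::vector<ulong> cv(m+1, 0);

    // count:
    for (ulong k=0; k<f.size(); ++k)  ++cv[ (f[k] >> b0) & m ];

    // cumulative sums:
    for (ulong j=1; j<=m; ++j)  cv[j] += cv[j-1];

    // reorder, backwards to make the pass stable:
    for (ulong k=f.size(); k!=0; --k)
    {
        ulong fk = f[k-1];
        ulong x = (fk >> b0) & m;
        g[ --cv[x] ] = fk;
    }
}

// Sort values that fit into tnb bits, using nb bits per pass,
// starting with the least significant bits.
std::vector<ulong> radix_sort_small(std::vector<ulong> f, ulong tnb, ulong nb)
{
    assert( nb!=0 && nb<=16 );
    std::vector<ulong> g(f.size());
    for (ulong b0=0; b0<tnb; b0+=nb)
    {
        counting_pass(f, g, b0, nb);
        f.swap(g);  // result of this pass is the input of the next
    }
    return f;
}
```

With the ten input words of the figure (59, 8, 21, 34, 47, 60, 9, 22, 35, 48 in decimal), tnb = 6 and nb = 2, the intermediate arrays after each pass correspond to the columns Stage 1 to Stage 3. The stability of each pass is what makes the final result sorted with respect to all bits.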
The following routine uses 8-bit masks to sort unsigned integers [FXT: sort/radixsort.cc]:

    void radix_sort(ulong *f, ulong n)
    {
        ulong nb = 8;  // Number of bits sorted with each step
        ulong tnb = BITS_PER_LONG;  // Total number of bits

        ulong *fi = f;
        ulong *g = new ulong[n];

        ulong m = (1UL<<nb) - 1;
        [...]

3.1.4 Merge sort

[...]

    [ n o w s o r t m e A D B A C D 5 4 3 2 1 ]
    [ n o o s w                               ]
    [           A e m r t                     ]
    [ A e m n o o r s t w                     ]
    [                     A B C D D           ]
    [                               1 2 3 4 5 ]
    [                     1 2 3 4 5 A B C D D ]
    [ A e m n o o r s t w ]
                        [ 1 2 3 4 5 A B C D D ]
    [ 1 2 3 4 5 A A B C D D e m n o o r s t w ]

Figure 3.1-C: Sorting with the merge sort algorithm.

    template <typename Type>
    void merge(Type * const restrict f, ulong na, ulong nb, Type * const restrict t)
    // Merge the (sorted) arrays
    //   A[] := f[0], f[1], ..., f[na-1]  and  B[] := f[na], f[na+1], ..., f[na+nb-1]
    // into t[] := t[0], t[1], ..., t[na+nb-1]  such that t[] is sorted.
    // Must have: na>0 and nb>0
    {
        const Type * const A = f;
        const Type * const B = f + na;
        ulong nt = na + nb;
        Type ta = A[--na],  tb = B[--nb];

        while ( true )
        {
            if ( ta > tb )  // copy ta
            {
                t[--nt] = ta;
                if ( na==0 )  // A[] empty?
                {
                    for (ulong j=0; j<=nb; ++j)  t[j] = B[j];  // copy rest of B[]
                    return;
                }

                ta = A[--na];  // read next element of A[]
            }
            else  // copy tb
            {
                t[--nt] = tb;
                if ( nb==0 )  // B[] empty?
                {
                    for (ulong j=0; j<=na; ++j)  t[j] = A[j];  // copy rest of A[]
                    return;
                }

                tb = B[--nb];  // read next element of B[]
            }
        }
    }

Two branches are involved: the unavoidable branch for the comparison of the elements, and the test whether the array from which an element has just been removed is empty.
We could sort by merging adjacent blocks of growing size as follows:

    [ h g f e d c b a ]  // input
    [ g h e f c d a b ]  // merge pairs
    [ e f g h a b c d ]  // merge adjacent runs of two
    [ a b c d e f g h ]  // merge adjacent runs of four

For a more localized memory access, we use a depth first recursion (compare with the binsplit recursion in section 34.1.1.1 on page 651):

    template <typename Type>
    void merge_sort_rec(Type *f, ulong n, Type *t)
    {
        if ( n<8 )
        {
            selection_sort(f, n);
            return;
        }

        const ulong na = n>>1;
        const ulong nb = n - na;

        // PRINT f[0], f[1], ..., f[na-1]
        merge_sort_rec(f, na, t);
        // PRINT f[na], f[na+1], ..., f[na+nb-1]
        merge_sort_rec(f+na, nb, t);

        merge(f, na, nb, t);
        for (ulong j=0; j<n; ++j)  f[j] = t[j];  // copy back
    }

The routine called by the user is:

    template <typename Type>
    void merge_sort(Type *f, ulong n, Type *tmp=0)
    {
        Type *t = tmp;
        if ( tmp==0 )  t = new Type[n];
        merge_sort_rec(f, n, t);
        if ( tmp==0 )  delete [] t;
    }

Optimized algorithm

    F: [ n o w s o r t m e A D B A C D 5 4 3 2 1 ]
    F: [ n o o s w                               ]
    F: [           A e m r t                     ]
    T: [ A e m n o o r s t w                     ]
    F: [                     A B C D D           ]
    F: [                               1 2 3 4 5 ]
    T: [                     1 2 3 4 5 A B C D D ]
    F: [ 1 2 3 4 5 A A B C D D e m n o o r s t w ]

Figure 3.1-D: Sorting with the 4-way merge sort algorithm.

The copying from T to F in the recursive routine can be avoided by a 4-way splitting scheme. We sort the left two quarters and merge them into T, then we sort the right two quarters and merge them into T + na. Then we merge T and T + na into F. Figure 3.1-D shows an example where only one recursive step is involved. It was generated with the program [FXT: sort/merge-sort4-demo.cc].
The recursive routine is [FXT: sort/merge-sort.h]

    template <typename Type>
    void merge_sort_rec4(Type *f, ulong n, Type *t)
    {
        if ( n<8 )  // threshold must be at least 8
        {
            selection_sort(f, n);
            return;
        }

        // left and right half:
        const ulong na = n>>1;
        const ulong nb = n - na;

        // left quarters:
        const ulong na1 = na>>1;
        const ulong na2 = na - na1;
        merge_sort_rec4(f, na1, t);
        merge_sort_rec4(f+na1, na2, t);

        // right quarters:
        const ulong nb1 = nb>>1;
        const ulong nb2 = nb - nb1;
        merge_sort_rec4(f+na, nb1, t);
        merge_sort_rec4(f+na+nb1, nb2, t);

        // merge quarters (F-->T):
        merge(f, na1, na2, t);
        merge(f+na, nb1, nb2, t+na);

        // merge halves (T-->F):
        merge(t, na, nb, f);
    }

The routine called by the user is merge_sort4().

3.1.5 Heapsort

The heapsort algorithm has complexity O(n log(n)). It uses the heap data structure introduced in section 4.5.2 on page 160. A heap can be sorted by swapping the first (and biggest) element with the last and restoring the heap property for the array of size n − 1. Repeat until there is nothing more to sort [FXT: sort/heapsort.h]:

    template <typename Type>
    void heap_sort(Type *x, ulong n)
    {
        build_heap(x, n);
        Type *p = x - 1;
        for (ulong k=n; k>1; --k)
        {
            swap2(p[1], p[k]);  // move largest to end of array
            --n;                // remaining array has one element less
            heapify(p, n, 1);   // restore heap-property
        }
    }

Sorting into descending order is not any harder:

    template <typename Type>
    void heap_sort_descending(Type *x, ulong n)
    // Sort x[] into descending order.
    {
        build_heap(x, n);
        Type *p = x - 1;
        for (ulong k=n; k>1; --k)
        {
            ++p;  --n;          // remaining array has one element less
            heapify(p, n, 1);   // restore heap-property
        }
    }

A program that demonstrates the algorithm is [FXT: sort/heapsort-demo.cc].

3.2 Binary search

Searching for an element in a sorted array can be done in O(log(n)) operations.
The binary search algorithm uses repeated subdivision of the data [FXT: sort/bsearch.h]:

    template <typename Type>
    ulong bsearch(const Type *f, ulong n, const Type v)
    // Return index of first element in f[] that equals v
    // Return n if there is no such element.
    // f[] must be sorted in ascending order.
    // Must have n!=0
    {
        ulong nlo=0, nhi=n-1;
        while ( nlo != nhi )
        {
            ulong t = (nhi+nlo)/2;

            if ( f[t] < v )  nlo = t + 1;
            else             nhi = t;
        }

        if ( f[nhi]==v )  return nhi;
        else              return n;
    }

Only simple modifications are needed to search, for example, for the first element greater than or equal to a given value:

    template <typename Type>
    ulong bsearch_geq(const Type *f, ulong n, const Type v)
    {
        ulong nlo=0, nhi=n-1;
        while ( nlo != nhi )
        {
            ulong t = (nhi+nlo)/2;

            if ( f[t] < v )  nlo = t + 1;
            else             nhi = t;
        }

        if ( f[nhi]>=v )  return nhi;
        else              return n;
    }

For very large arrays the algorithm can be improved by selecting the new index t different from the midpoint (nhi+nlo)/2, depending on the value sought and the distribution of the values in the array. As a simple example consider an array of floating-point numbers that are equally distributed in the interval [min(v), max(v)]. If the sought value equals v, one starts with the relation

    \frac{n - \min(n)}{\max(n) - \min(n)} = \frac{v - \min(v)}{\max(v) - \min(v)}        (3.2-1)

where n denotes an index and min(n), max(n) denote the minimal and maximal index of the current interval. Solving for n gives the linear interpolation formula

    n = \min(n) + \frac{\max(n) - \min(n)}{\max(v) - \min(v)} \, (v - \min(v))        (3.2-2)

The corresponding interpolation binary search algorithm would select the new subdivision index t according to the given relation. One could even use quadratic interpolation schemes for the selection of t. For the majority of practical applications the midpoint version of the binary search will be good enough.
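A sketch of the interpolation variant for arrays of double, using relation 3.2-2 to pick the subdivision index. The name interpolation_search() is an assumption made here (no such FXT routine is implied), and in contrast to bsearch() the routine returns the index of some element equal to v, not necessarily the first one.

```cpp
#include <cassert>
#include <cstddef>

// Return the index of an element of f[] that equals v,
// return n if there is no such element.
// f[] must be sorted in ascending order.
std::size_t interpolation_search(const double *f, std::size_t n, double v)
{
    if ( n==0 )  return n;
    std::size_t lo = 0, hi = n - 1;

    // Invariant: if v occurs in f[], it occurs in f[lo], ..., f[hi].
    while ( (lo<hi) && (f[lo]<v) && (v<f[hi]) )
    {
        // Linear interpolation as in (3.2-2); here 0 < q < 1.
        double q = (v - f[lo]) / (f[hi] - f[lo]);
        std::size_t t = lo + (std::size_t)( q * (double)(hi - lo) );
        assert( (lo<=t) && (t<=hi) );

        if ( f[t] < v )       lo = t + 1;  // t!=hi because f[hi] > v
        else if ( f[t] > v )  hi = t - 1;  // t!=lo because f[lo] < v
        else                  return t;
    }

    if ( f[lo]==v )  return lo;
    if ( f[hi]==v )  return hi;
    return n;
}
```

For equally distributed values the interpolated index lands close to the sought element, so fewer steps are needed than with the midpoint rule. For strongly clustered values the interval may shrink by only one element per step, which is one reason why the midpoint version is the safer default.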
Approximate matches are found by the following routine [FXT: sort/bsearchapprox.h]:

    template <typename Type>
    ulong bsearch_approx(const Type *f, ulong n, const Type v, Type da)
    // Return index of first element x in f[] for which |(x-v)| <= da
    // Return n if there is no such element.
    // f[] must be sorted in ascending order.
    // da must be positive.
    //
    // Makes sense only with inexact types (float or double).
    // Must have n!=0
    {
        ulong k = bsearch_geq(f, n, v-da);
        if ( k [...]

3.3 Variants of sorting methods

[...]

    template <typename Type>
    void idx_selection_sort(const Type *f, ulong n, ulong *x)
    // Sort x[] so that the sequence f[x[0]], f[x[1]], ... f[x[n-1]] is ascending.
    // Algorithm is O(n*n), use for short arrays only.
    {
        for (ulong i=0; i<n; ++i)
        {
            Type v = f[x[i]];
            ulong m = i;  // position of minimum
            ulong j = n;
            while ( --j > i )  // search (index of) minimum
            {
                if ( f[x[j]]<v )
                {
                    m = j;
                    v = f[x[m]];
                }
            }
            swap2(x[i], x[m]);
        }
    }

The verification routine is:

    template <typename Type>
    bool is_idx_sorted(const Type *f, ulong n, const ulong *x)
    // Return whether the sequence f[x[0]], f[x[1]], ... f[x[n-1]] is in ascending order.
    {
        for (ulong k=1; k<n; ++k)  if ( f[x[k-1]] > f[x[k]] )  return false;
        return true;
    }

The transformation of the partition() routine is straightforward:

    template <typename Type>
    ulong idx_partition(const Type *f, ulong n, ulong *x)
    // rearrange index array, so that for some index p
    // max(f[x[0]] ... f[x[p]]) <= min(f[x[p+1]] ... f[x[n-1]])
    {
        // Avoid worst case with already sorted input:
        const Type v = median3(f[x[0]], f[x[n/2]], f[x[n-1]]);

        ulong i = 0UL - 1;
        ulong j = n;
        while ( 1 )
        {
            do ++i;  while ( f[x[i]]<v );
            do --j;  while ( f[x[j]]>v );

            if ( i<j )  swap2(x[i], x[j]);
            else        return j;
        }
    }

The quicksort routine for index arrays is:

    template <typename Type>
    void idx_quick_sort(const Type *f, ulong n, ulong *x)
    // Sort x[] so that the sequence f[x[0]], f[x[1]], ... f[x[n-1]] is ascending.
    {
     start:
        if ( n<8 )  // parameter: threshold for nonrecursive algorithm
        {
            idx_selection_sort(f, n, x);
            return;
        }

        ulong p = idx_partition(f, n, x);
        ulong ln = p + 1;
        ulong rn = n - ln;

        if ( ln>rn )  // recursion for shorter sub-array
        {
            idx_quick_sort(f, rn, x+ln);  // f[x[ln]] ... f[x[n-1]]  right
            n = ln;
        }
        else
        {
            idx_quick_sort(f, ln, x);  // f[x[0]] ... f[x[ln-1]]  left
            n = rn;
            x += ln;
        }

        goto start;
    }

Note that the index-sort routines work perfectly for non-contiguous data. The index-analogues of the binary search algorithms are again straightforward, they are given in [FXT: sort/bsearchidx.h].

The sorting routines do not change the array f, the actual data is not modified. To bring f into sorted order, apply the inverse permutation of x to f (see section 2.4 on page 109):

    apply_inverse_permutation(x, f, n);

To copy f in sorted order into g, use:

    apply_inverse_permutation(x, f, n, g);

    Input:               After sort_by_key(f, n, key, 1):
    f[]   key[]          f[]   key[]
     A     0              A     0
     B     1              E     1
     C     1              C     1
     D     3              B     1
     E     1              D     3
     F     3              F     3
     E     3              E     3
     G     7              G     7

Figure 3.3-A: Sorting an array according to an array of keys.

The array x can be used for sorting by keys, see figure 3.3-A. The routine is [FXT: sort/sortbykey.h]:

    template <typename Type1, typename Type2>
    void sort_by_key(Type1 *f, ulong n, Type2 *key, bool skq=true)
    // Sort f[] according to key[] in ascending order:
    //   f[k] precedes f[j] if key[k] [...]

[...]

    template <typename Type>
    void ptr_selection_sort(/*const Type *f,*/ ulong n, const Type **x)
    // Sort x[] so that the sequence *x[0], *x[1], ..., *x[n-1] is ascending.
    {
        for (ulong i=0; i<n; ++i)
        {
            Type v = *x[i];
            ulong m = i;  // position of minimum
            ulong j = n;
            while ( --j > i )  // search (index of) minimum
            {
                if ( *x[j]<v )
                {
                    m = j;
                    v = *x[m];
                }
            }
            swap2(x[i], x[m]);
        }
    }

The verification routine is:

    template <typename Type>
    bool is_ptr_sorted(/*const Type *f,*/ ulong n, Type const*const*x)
    // Return whether the sequence *x[0], *x[1], ..., *x[n-1] is ascending.
    {
        for (ulong k=1; k<n; ++k)  if ( *x[k-1] > *x[k] )  return false;
        return true;
    }

The pointer versions of the search routines are given in [FXT: sort/bsearchptr.h].

3.3.3 Sorting by a supplied comparison function

The routines in [FXT: sort/sortfunc.h] are similar to the C-quicksort qsort that is part of the standard library. A comparison function cmp has to be supplied by the caller. This allows, for example, sorting compound data types with respect to some key contained within them.
Citing the manual page for qsort:

    The comparison function must return an integer less than, equal to, or greater than zero if the first argument is considered to be respectively less than, equal to, or greater than the second. If two members compare as equal, their order in the sorted array is undefined.

As a prototypical example we give the selection sort routine:

    template <typename Type>
    void selection_sort(Type *f, ulong n, int (*cmp)(const Type &, const Type &))
    // Sort f[] (ascending order) with respect to comparison function cmp().
    {
        for (ulong i=0; i<n; ++i)
        {
            Type v = f[i];
            ulong m = i;  // position of minimum
            ulong j = n;
            while ( --j > i )  // search (index of) minimum
            {
                if ( cmp(f[j],v) < 0 )
                {
                    m = j;
                    v = f[m];
                }
            }
            swap2(f[i], f[m]);
        }
    }

The other routines are rather straightforward translations of the (plain) sort analogues. Replace the comparison operations involving elements of the array as follows:

    (a < b)     cmp(a,b) < 0
    (a > b)     cmp(a,b) > 0
    (a == b)    cmp(a,b) == 0
    (a <= b)    cmp(a,b) <= 0
    (a >= b)    cmp(a,b) >= 0

The verification routine is

    template <typename Type>
    bool is_sorted(const Type *f, ulong n, int (*cmp)(const Type &, const Type &))
    // Return whether the sequence f[0], f[1], ..., f[n-1]
    // is sorted in ascending order with respect to comparison function cmp().
    {
        for (ulong k=1; k<n; ++k)  if ( cmp(f[k-1], f[k]) > 0 )  return false;
        return true;
    }

The numerous calls to cmp() do have a negative impact on the performance. With C++ you can provide a comparison ‘function’ for a class by overloading the comparison operators <, >, <=, >=, and == and use the plain sort version. Then the comparisons can be inlined and the performance should be fine.

3.3.3.1 Sorting complex numbers

You want to sort complex numbers? Fine with me, but don’t tell your local mathematician. To see the mathematical problem, we ask whether i is less than or greater than zero. Assuming i > 0 it follows that i · i > 0 (we multiplied with a positive value) which is −1 > 0 and that is false. So, is i < 0?
Then i · i > 0 (multiplication with a negative value, as assumed), thereby −1 > 0. Oops! The lesson is that there is no way to impose an order on the complex numbers that would justify the usage of the symbols ‘<’ and ‘>’ consistent with the rules to manipulate inequalities.

Nevertheless we can invent a relation for sorting: arranging (sorting) the complex numbers according to their absolute value (modulus) leaves infinitely many numbers in one ‘bucket’, namely all those that have the same distance from zero. However, one could use the modulus as the major ordering parameter and the argument (angle) as the minor. Or the real part as the major and the imaginary part as the minor. The latter is realized in

    static inline int
    cmp_complex(const Complex &f, const Complex &g)
    {
        const double fr = f.real(),  gr = g.real();
        if ( fr!=gr )  return (fr>gr ? +1 : -1);

        const double fi = f.imag(),  gi = g.imag();
        if ( fi!=gi )  return (fi>gi ? +1 : -1);

        return 0;
    }

This function, when used as comparison with the following routine, can indeed be the practical tool you had in mind:

    void complex_sort(Complex *f, ulong n)
    // major order wrt. real part
    // minor order wrt. imag part
    {
        quick_sort(f, n, cmp_complex);
    }

3.3.3.2 Index and pointer sorting

The index sorting routines that use a supplied comparison function are given in [FXT: sort/sortidxfunc.h]:

    template <typename Type>
    void idx_selection_sort(const Type *f, ulong n, ulong *x,
                            int (*cmp)(const Type &, const Type &))
    // Sort x[] so that the sequence f[x[0]], f[x[1]], ... f[x[n-1]]
    // is ascending with respect to comparison function cmp().
    {
        for (ulong i=0; i<n; ++i)
        {
            Type v = f[x[i]];
            ulong m = i;  // position of minimum
            ulong j = n;
            while ( --j > i )  // search (index of) minimum
            {
                if ( cmp(f[x[j]], v) < 0 )
                {
                    m = j;
                    v = f[x[m]];
                }
            }
            swap2(x[i], x[m]);
        }
    }

The verification routine is:

    template <typename Type>
    bool is_idx_sorted(const Type *f, ulong n, const ulong *x,
                       int (*cmp)(const Type &, const Type &))
    // Return whether the sequence f[x[0]], f[x[1]], ... f[x[n-1]] is ascending
    // with respect to comparison function cmp().
    {
        for (ulong k=1; k<n; ++k)  if ( cmp(f[x[k-1]], f[x[k]]) > 0 )  return false;
        return true;
    }

The pointer sorting versions are given in [FXT: sort/sortptrfunc.h]

    template <typename Type>
    void ptr_selection_sort(/*const Type *f,*/ ulong n, const Type **x,
                            int (*cmp)(const Type &, const Type &))
    // Sort x[] so that the sequence *x[0], *x[1], ..., *x[n-1]
    // is ascending with respect to comparison function cmp().
    {
        for (ulong i=0; i<n; ++i)
        {
            Type v = *x[i];
            ulong m = i;  // position of minimum
            ulong j = n;
            while ( --j > i )  // search (index of) minimum
            {
                if ( cmp(*x[j],v)<0 )
                {
                    m = j;
                    v = *x[m];
                }
            }
            swap2(x[i], x[m]);
        }
    }

The verification routine is:

    template <typename Type>
    bool is_ptr_sorted(/*const Type *f,*/ ulong n, Type const*const*x,
                       int (*cmp)(const Type &, const Type &))
    // Return whether the sequence *x[0], *x[1], ..., *x[n-1]
    // is ascending with respect to comparison function cmp().
    {
        for (ulong k=1; k<n; ++k)  if ( cmp(*x[k-1], *x[k]) > 0 )  return false;
        return true;
    }

The corresponding versions of the binary search algorithm are given in [FXT: sort/bsearchidxfunc.h] and [FXT: sort/bsearchptrfunc.h].
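As a usage sketch for index sorting with a supplied comparison function, the following sorts an index array for compound records by a key field and leaves the records themselves untouched. The struct Record and the function cmp_key() are hypothetical names for this example; the two routines are simplified re-implementations (std::swap instead of swap2, a forward search for the minimum), not the FXT code.

```cpp
#include <cassert>
#include <utility>

typedef unsigned long ulong;

// Hypothetical compound data type, sorted by its member 'key'.
struct Record { int key; char tag; };

// Comparison function with the qsort convention: negative, zero, or positive.
int cmp_key(const Record &a, const Record &b)
{
    return (a.key > b.key) - (a.key < b.key);
}

// Sort x[] so that f[x[0]], f[x[1]], ..., f[x[n-1]] is ascending wrt. cmp().
template <typename Type>
void idx_selection_sort(const Type *f, ulong n, ulong *x,
                        int (*cmp)(const Type &, const Type &))
{
    assert( cmp!=nullptr );
    for (ulong i=0; i<n; ++i)
    {
        ulong m = i;  // position of minimum
        for (ulong j=i+1; j<n; ++j)
        {
            if ( cmp(f[x[j]], f[x[m]]) < 0 )  m = j;
        }
        std::swap(x[i], x[m]);
    }
}

// Return whether f[x[0]], f[x[1]], ..., f[x[n-1]] is ascending wrt. cmp().
template <typename Type>
bool is_idx_sorted(const Type *f, ulong n, const ulong *x,
                   int (*cmp)(const Type &, const Type &))
{
    for (ulong k=1; k<n; ++k)
    {
        if ( cmp(f[x[k-1]], f[x[k]]) > 0 )  return false;
    }
    return true;
}
```

Only the index array x[] is rearranged; copying or permuting the (possibly large) records is deferred until the sorted order is actually needed.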
3.4 Searching in unsorted arrays

To find the first occurrence of a certain value in an unsorted array use the routine [FXT: sort/usearch.h]

    template <typename Type>
    inline ulong first_geq_idx(const Type *f, ulong n, Type v)
    // Return index of first element == v
    // Return n if all !=v
    {
        ulong k = 0;
        while ( (k<n) && (f[k]!=v) )  ++k;
        return k;
    }

[...]

    template <typename Type>
    inline ulong first_eq_idx(/* NOT const */ Type *f, ulong n, Type v)
    {
        Type s = f[n-1];
        f[n-1] = v;  // sentinel to guarantee that the search stops
        ulong k = 0;
        while ( f[k]!=v )  ++k;
        f[n-1] = s;  // restore value
        if ( (k==n-1) && (v!=s) )  ++k;
        return k;
    }

There is only one branch in the inner loop, this can give a significant speedup. However, the technique is only applicable if writing to the array ‘f[]’ is allowed.

Another way to optimize the search is partial unrolling of the loop:

    template <typename Type>
    inline ulong first_eq_idx_large(const Type *f, ulong n, Type v)
    {
        ulong k;
        for (k=0; k<(n&3); ++k)  if ( f[k]==v )  return k;

        while ( k!=n )  // 4-fold unrolled
        {
            Type t0 = f[k],  t1 = f[k+1],  t2 = f[k+2],  t3 = f[k+3];
            bool qa = ( (t0==v) | (t1==v) );  // note bit-wise OR to avoid branch
            bool qb = ( (t2==v) | (t3==v) );
            if ( qa | qb )  // element v found
            {
                while ( 1 )  { if ( f[k]==v )  return k;  else ++k; }
            }
            k += 4;
        }

        return n;
    }

The search requires only two branches with every four elements. By using two variables qa and qb better usage of the CPU internal parallelism is attempted. Depending on the data type and CPU architecture 8-fold unrolling may give a speedup.

3.5 Determination of equivalence classes

Let S be a set and C := S × S the set of all ordered pairs (x, y) with x, y ∈ S. A binary relation R on S is a subset of C. An equivalence relation is a binary relation with the following properties:

• reflexive: x ≡ x ∀x.
• symmetric: x ≡ y ⇐⇒ y ≡ x ∀x, y.
• transitive: x ≡ y, y ≡ z =⇒ x ≡ z ∀x, y, z.

Here we wrote x ≡ y for (x, y) ∈ R where x, y ∈ S.
We want to determine the equivalence classes: an equivalence relation partitions a set into 1 ≤ q ≤ n subsets E_1, E_2, ..., E_q so that x ≡ y whenever both x and y are in the same subset but x ≢ y if x and y are in different subsets.

For example, the usual equality relation is an equivalence relation: with a set of (different) numbers each number is in its own class. With the equivalence relation that x ≡ y whenever x − y is a multiple of some fixed integer m > 0 and the set Z of all integers we obtain m subsets and x ≡ y if and only if x ≡ y mod m.

3.5.1 Algorithm for decomposition into equivalence classes

Let S be a set of n elements, represented as a vector. On termination of the following algorithm Q_k = j if j is the least index such that S_j ≡ S_k (note that we consider the elements of S to be in a fixed but arbitrary order here):

1. Put each element in its own equivalence class: Q_k := k for all 0 ≤ k < n
2. Set k := 1 (index of the second element).
3. (Search for an equivalent element:)
   (a) Set j := 0.
   (b) If S_k ≡ S_j set Q_k = Q_j and goto step 4.
   (c) Set j := j + 1 and goto step 3b
4. Set k := k + 1 and if k < n goto step 3, else terminate.

The algorithm needs n − 1 equivalence tests when all elements are in the same equivalence class and n (n − 1)/2 equivalence tests when each element is alone in its own equivalence class.

In the following implementation the equivalence relation must be supplied as a function equiv_q() that returns true when its arguments are equivalent [FXT: sort/equivclasses.h]:

    template <typename Type>
    void equivalence_classes(const Type *s, ulong n, bool (*equiv_q)(Type,Type), ulong *q)
    // Given an equivalence relation '==' (as function equiv_q())
    // and a set s[] with n elements,
    // write to q[k] the index j of the first element s[j] such that s[k]==s[j].
    {
        for (ulong k=0; k<n; ++k)  q[k] = k;  // each in own class
        for (ulong k=1; k<n; ++k)
        {
            ulong j = 0;
            while ( ! equiv_q(s[j], s[k]) )  ++j;
            q[k] = q[j];
        }
    }

[...] > 0. We still put two numbers a and b into the same class if a − b is an integer multiple of m.
Finally, the modulus m = 0 leads to the equivalence relation ‘equality’.

3.5.2.2 Binary necklaces

Consider the set S of n-bit binary words with the equivalence relation in which two words x and y are equivalent if and only if there is a cyclic shift h_k(x) by 0 ≤ k < n positions such that h_k(x) = y. The equivalence relation is supplied as the function [FXT: sort/equivclass-necklaces-demo.cc]:

    static ulong nb;  // number of bits
    bool n_equiv_q(ulong x, ulong y)  // necklaces
    {
        ulong d = bit_cyclic_dist(x, y, nb);
        return (0==d);
    }

The function bit_cyclic_dist() is given in section 1.13.4 on page 32. For n = 4 we find the following list of equivalence classes:

     0:  ....                    [#=1]
     1:  1... .1.. ...1 ..1.     [#=4]
     3:  1..1 11.. ..11 .11.     [#=4]
     5:  .1.1 1.1.               [#=2]
     7:  11.1 111. 1.11 .111     [#=4]
    15:  1111                    [#=1]
    # of equivalence classes = 6

These correspond to the binary necklaces of length 4. One usually chooses the cyclic minima (or maxima) among equivalent words as representatives of the classes.

3.5.2.3 Unlabeled binary necklaces

We use the same set, but the equivalence relation now identifies two words x and y when there is a cyclic shift h_k(x) by 0 ≤ k < n positions so that either h_k(x) = y or h_k(x) = \overline{y}, where \overline{y} is the complement of y:

    static ulong mm;  // mask to complement
    bool nu_equiv_q(ulong x, ulong y)  // unlabeled necklaces
    {
        ulong d = bit_cyclic_dist(x, y, nb);
        if ( 0!=d )  d = bit_cyclic_dist(mm^x, y, nb);
        return (0==d);
    }

With n = 4 we find

    0:  1111 ....                                   [#=2]
    1:  111. 11.1 1.11 1... .111 ...1 ..1. .1..     [#=8]
    3:  .11. 1..1 11.. ..11                         [#=4]
    5:  .1.1 1.1.                                   [#=2]
    # of equivalence classes = 4

These correspond to the unlabeled binary necklaces of length 4.

3.5.2.4 Binary bracelets

The binary bracelets are obtained by identifying two words that are identical up to rotation and possible reversal.
The corresponding comparison function is

    bool b_equiv_q(ulong x, ulong y)  // bracelets
    {
        ulong d = bit_cyclic_dist(x, y, b);
        if ( 0!=d )  d = bit_cyclic_dist(revbin(x,b), y, b);
        return (0==d);
    }

There are six binary bracelets of length 4:

     0:  ....                  [#=1]
     1:  1... .1.. ...1 ..1.   [#=4]
     3:  1..1 11.. ..11 .11.   [#=4]
     5:  .1.1 1.1.             [#=2]
     7:  11.1 111. 1.11 .111   [#=4]
    15:  1111                  [#=1]

The unlabeled binary bracelets are obtained by additionally allowing for bit-wise complementation:

    bool bu_equiv_q(ulong x, ulong y)  // unlabeled bracelets
    {
        ulong d = bit_cyclic_dist(x, y, b);
        x ^= mm;
        if ( 0!=d )  d = bit_cyclic_dist(x, y, b);
        x = revbin(x,b);
        if ( 0!=d )  d = bit_cyclic_dist(x, y, b);
        x ^= mm;
        if ( 0!=d )  d = bit_cyclic_dist(x, y, b);
        return (0==d);
    }

There are four unlabeled binary bracelets of length 4:

    0:  1111 ....                                  [#=2]
    1:  111. 11.1 1.11 1... .111 ...1 ..1. .1..    [#=8]
    3:  .11. 1..1 11.. ..11                        [#=4]
    5:  .1.1 1.1.                                  [#=2]

The shown functions are given in [FXT: sort/equivclass-bracelets-demo.cc] which can be used to produce listings of the equivalence classes. The sequences of numbers of labeled and unlabeled necklaces and bracelets are shown in figure 3.5-A.

      n:        N         B       N/U       B/U
    [312]#  A000031   A000029   A000013   A000011
      1:        2         2         1         1
      2:        3         3         2         2
      3:        4         4         2         2
      4:        6         6         4         4
      5:        8         8         4         4
      6:       14        13         8         8
      7:       20        18        10         9
      8:       36        30        20        18
      9:       60        46        30        23
     10:      108        78        56        44
     11:      188       126        94        63
     12:      352       224       180       122
     13:      632       380       316       190
     14:     1182       687       596       362
     15:     2192      1224      1096       612

Figure 3.5-A: The number of binary necklaces ‘N’, bracelets ‘B’, unlabeled necklaces ‘N/U’, and unlabeled bracelets ‘B/U’. The second row gives the sequence number in [312].

3.5.2.5 Binary words with reversal and complement

Consider the set S of n-bit binary words and the equivalence relation identifying two words x and y whenever they are mutual complements or bit-wise reversals.

3 classes with 3-bit words:
    0:  111 ...
    1:  ..1 .11 1.. 11.
    2:  1.1 .1.
6 classes with 4-bit words:
    0:  1111 ....
    1:  111. 1... .111 ...1
    2:  ..1. .1.. 1.11 11.1
    3:  11.. ..11
    5:  1.1. .1.1
    6:  .11. 1..1

10 classes with 5-bit words:
     0:  11111 .....
     1:  1111. 1.... .1111 ....1
     2:  1.111 111.1 .1... ...1.
     3:  111.. ...11 ..111 11...
     4:  ..1.. 11.11
     5:  11.1. 1.1.. ..1.1 .1.11
     6:  ..11. .11.. 11..1 1..11
     9:  .11.1 1.11. .1..1 1..1.
    10:  .1.1. 1.1.1
    14:  1...1 .111.

Figure 3.5-B: Equivalence classes of binary words where words are identified if either their reversals or complements are equal.

For example, the equivalence classes with 3-, 4- and 5-bit words are shown in figure 3.5-B. The sequence of numbers of equivalence classes for word-sizes n is (entry A005418 in [312])

    n:  1, 2, 3, 4,  5,  6,  7,  8,   9,  10,  11,   12,   13,   14,   15,    16, ...
    #:  1, 2, 3, 6, 10, 20, 36, 72, 136, 272, 528, 1056, 2080, 4160, 8256, 16512, ...

The equivalence classes can be computed with the program [FXT: sort/equivclass-bitstring-demo.cc].

We have chosen examples where the resulting equivalence classes can be verified by inspection. For example, we could create the subsets of equivalent necklaces by simply rotating a given word and marking the words visited so far. Such an approach, however, is not possible if the equivalence relation does not have an obvious structure.

3.5.3 The number of equivalence relations for a set of n elements

We write B(n) for the number of possible partitionings (and thereby equivalence relations) of the set {1, 2, ..., n}. These are called Bell numbers. The sequence of Bell numbers is entry A000110 in [312]; it starts as (n ≥ 1):

    1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975, 678570, 4213597, ...

They can be computed easily as indicated in the following table:

    0:  [ 1]
    1:  [ 1,  2]
    2:  [ 2,  3,  5]
    3:  [ 5,  7, 10,  15]
    4:  [15, 20, 27,  37,  52]
    5:  [52, 67, 87, 114, 151, 203]
    n:  [B(n), ... ]

The first element in each row is the last element of the previous row, the remaining elements are the sum of their left and upper left neighbors. As GP code:

    N=7;  v=w=b=vector(N);  v[1]=1;
    { for(n=1,N-1,
        b[n] = v[1];
        print(n-1, ": ", v); \\ print row
        w[1] = v[n];
        for(k=2,n+1, w[k]=w[k-1]+v[k-1]);
        v=w;
      );
    }

An implementation in C++ is given in [FXT: comb/bell-number-demo.cc]. An alternative way to compute the Bell numbers is shown in section 17.2 on page 358.

Chapter 4

Data structures

We give implementations of selected data structures like stack, ring buffer, queue, double-ended queue (deque), bit-array, heap and priority queue.

4.1 Stack (LIFO)

    push( 1)   1  -  -  -                 #=1
    push( 2)   1  2  -  -                 #=2
    push( 3)   1  2  3  -                 #=3
    push( 4)   1  2  3  4                 #=4
    push( 5)   1  2  3  4  5  -  -  -     #=5
    push( 6)   1  2  3  4  5  6  -  -     #=6
    push( 7)   1  2  3  4  5  6  7  -     #=7
    pop== 7    1  2  3  4  5  6  -  -     #=6
    pop== 6    1  2  3  4  5  -  -  -     #=5
    push( 8)   1  2  3  4  5  8  -  -     #=6
    pop== 8    1  2  3  4  5  -  -  -     #=5
    pop== 5    1  2  3  4  -  -  -  -     #=4
    push( 9)   1  2  3  4  9  -  -  -     #=5
    pop== 9    1  2  3  4  -  -  -  -     #=4
    pop== 4    1  2  3  -  -  -  -  -     #=3
    push(10)   1  2  3 10  -  -  -  -     #=4
    pop==10    1  2  3  -  -  -  -  -     #=3
    pop== 3    1  2  -  -  -  -  -  -     #=2
    push(11)   1  2 11  -  -  -  -  -     #=3
    pop==11    1  2  -  -  -  -  -  -     #=2
    pop== 2    1  -  -  -  -  -  -  -     #=1
    push(12)   1 12  -  -  -  -  -  -     #=2
    pop==12    1  -  -  -  -  -  -  -     #=1
    pop== 1    -  -  -  -  -  -  -  -     #=0
    push(13)  13  -  -  -  -  -  -  -     #=1
    pop==13    -  -  -  -  -  -  -  -     #=0
    pop== 0    -  -  -  -  -  -  -  -     #=0   (stack was empty)
    push(14)  14  -  -  -  -  -  -  -     #=1
    pop==14    -  -  -  -  -  -  -  -     #=0
    pop== 0    -  -  -  -  -  -  -  -     #=0   (stack was empty)
    push(15)  15  -  -  -  -  -  -  -     #=1

Figure 4.1-A: Inserting and retrieving elements with a stack.

A stack (or LIFO, for last-in, first-out) is a data structure that supports the operations: push() to save an entry, pop() to retrieve and remove the entry that was entered last, and peek() to retrieve the element that was entered last without removing it. The method poke() modifies the last entry.
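The four operations can be sketched in a few lines before looking at the full class. The following is a minimal illustration only, assuming std::vector as storage; the name mini_stack is hypothetical and not part of FXT, and unlike the FXT class its push() returns the new number of entries and never fails.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical minimal stack (the name mini_stack is ours, not FXT's).
// pop(), peek() and poke() return the number of entries seen on entry,
// so a return value of zero means "stack was empty".
template <typename Type>
class mini_stack
{
    std::vector<Type> x_;  // entries, top of stack is x_.back()
public:
    std::size_t num() const { return x_.size(); }

    std::size_t push(const Type &z)  // save an entry, return new number of entries
    { x_.push_back(z);  return x_.size(); }

    std::size_t pop(Type &z)  // retrieve and remove the entry entered last
    {
        const std::size_t ret = x_.size();  // number of entries before removal
        if ( ret != 0 ) { z = x_.back();  x_.pop_back(); }
        return ret;
    }

    std::size_t peek(Type &z) const  // retrieve the last entry without removing it
    {
        if ( !x_.empty() )  z = x_.back();
        return x_.size();
    }

    std::size_t poke(const Type &z)  // modify the last entry
    {
        if ( !x_.empty() )  x_.back() = z;
        return x_.size();
    }
};
```

With this sketch, a pop() on an empty stack returns zero and leaves its argument untouched, which is the convention shown in the rows marked (stack was empty) of figure 4.1-A.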
An implementation with the option to let the stack grow when necessary is [FXT: class stack in ds/stack.h]:

    template <typename Type>
    class stack
    {
    public:
        Type *x_;   // data
        ulong s_;   // size
        ulong p_;   // stack pointer (position of next write), top entry @ p-1
        ulong gq_;  // grow gq elements if necessary, 0 for "never grow"

    public:
        stack(ulong n, ulong growq=0)
        {
            s_ = n;
            x_ = new Type[s_];
            p_ = 0;  // stack is empty
            gq_ = growq;
        }

        ~stack()  { delete [] x_; }

        ulong num()  const  { return p_; }  // Return number of entries.

Insertion and retrieval from the top of the stack are implemented as follows:

        ulong push(Type z)
        // Add element z on top of stack.
        // Return size of stack, zero on stack overflow.
        // If gq_ is nonzero the stack grows if needed.
        {
            if ( p_ >= s_ )
            {
                if ( 0==gq_ )  return 0;  // overflow
                grow();
            }
            x_[p_] = z;
            ++p_;
            return s_;
        }

        ulong pop(Type &z)
        // Retrieve top entry and remove it.
        // Return number of entries before removing element.
        // If empty return zero and leave z undefined.
        {
            ulong ret = p_;
            if ( 0!=p_ )  { --p_;  z = x_[p_]; }
            return ret;
        }

        ulong poke(Type z)
        // Modify top entry.
        // Return number of entries.
        // If empty return zero and do nothing.
        {
            if ( 0!=p_ )  x_[p_-1] = z;
            return p_;
        }

        ulong peek(Type &z)
        // Read top entry, without removing it.
        // Return number of entries.
        // If empty return zero and leave z undefined.
        {
            if ( 0!=p_ )  z = x_[p_-1];
            return p_;
        }

The growth routine is implemented as

    private:
        void grow()
        {
            ulong ns = s_ + gq_;  // new size
            x_ = ReAlloc(x_, ns, s_);
            s_ = ns;
        }
    };

Here we use the function ReAlloc() that imports the C function realloc().

    % man realloc

    #include <stdlib.h>
    void *realloc(void *ptr, size_t size);

    realloc() changes the size of the memory block pointed to by ptr to size bytes.
    The contents will be unchanged to the minimum of the old and new sizes; newly allocated memory will be uninitialized. If ptr is NULL, the call is equivalent to malloc(size); if size is equal to zero, the call is equivalent to free(ptr). Unless ptr is NULL, it must have been returned by an earlier call to malloc(), calloc() or realloc().

A program that shows the working of the stack is [FXT: ds/stack-demo.cc]. An example output where the initial size is 4 and the growth-feature enabled (in increments of 4 elements) is shown in figure 4.1-A.

4.2 Ring buffer

A ring buffer is an array together with read and write operations that wrap around. That is, when the last position of the array is reached, writing continues at the beginning of the array, thereby erasing the oldest entries. The read operation starts at the oldest entry in the array.

                 array x[]    x[] ordered    n  wpos  fpos
    insert(A)    A            A              1    1     0
    insert(B)    A B          A B            2    2     0
    insert(C)    A B C        A B C          3    3     0
    insert(D)    A B C D      A B C D        4    0     0
    insert(E)    E B C D      B C D E        4    1     1
    insert(F)    E F C D      C D E F        4    2     2
    insert(G)    E F G D      D E F G        4    3     3
    insert(H)    E F G H      E F G H        4    0     0
    insert(I)    I F G H      F G H I        4    1     1
    insert(J)    I J G H      G H I J        4    2     2

Figure 4.2-A: Writing to a ring buffer.

Figure 4.2-A shows the contents of a length-4 ring buffer after insertion of the symbols ‘A’, ‘B’, ..., ‘J’. The listing was created with the program [FXT: ds/ringbuffer-demo.cc].
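The index arithmetic behind figure 4.2-A can be sketched as follows. This is a small illustration under our own assumptions (a fixed size of four characters and the hypothetical name mini_ring), not the FXT class itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical fixed-size ring of characters (illustration only).
struct mini_ring
{
    static constexpr std::size_t s = 4;  // allocated size
    char x[s];                           // data
    std::size_t n = 0;                   // current number of entries
    std::size_t wpos = 0;                // next position to write
    std::size_t fpos = 0;                // position of the oldest entry

    void insert(char z)
    {
        x[wpos] = z;
        if ( ++wpos >= s )  wpos = 0;  // wrap around
        if ( n < s )  ++n;             // buffer not yet full
        else  fpos = wpos;             // oldest entry was just overwritten
    }

    std::string ordered() const  // entries from oldest to newest
    {
        std::string r;
        for (std::size_t k=0; k<n; ++k)  r += x[(fpos + k) % s];
        return r;
    }
};
```

Inserting the symbols 'A' to 'J' in turn should leave n, wpos and fpos at the values of the last row of the figure, with ordered() giving the four most recent symbols.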
The implementation used is [FXT: class ringbuffer in ds/ringbuffer.h]:

    template <typename Type>
    class ringbuffer
    {
    public:
        Type *x_;     // data (ring buffer)
        ulong s_;     // allocated size (# of elements)
        ulong n_;     // current number of entries in buffer
        ulong wpos_;  // next position to write in buffer
        ulong fpos_;  // first position to read in buffer

    public:
        ringbuffer(ulong n)
        {
            s_ = n;
            x_ = new Type[s_];
            n_ = 0;
            wpos_ = 0;
            fpos_ = 0;
        }

        ~ringbuffer()  { delete [] x_; }

        ulong num()  const  { return n_; }

If an entry is inserted, it is written to index wpos:

        void insert(const Type &z)
        {
            x_[wpos_] = z;
            if ( ++wpos_>=s_ )  wpos_ = 0;
            if ( n_ < s_ )  ++n_;
            else  fpos_ = wpos_;
        }

        ulong read(ulong k, Type &z)  const
        // Read entry k (that is, [(fpos_ + k)%s_]).
        // Return 0 if k>=n, else return k+1.
        {
            if ( k>=n_ )  return 0;
            ulong j = fpos_ + k;
            if ( j>=s_ )  j -= s_;
            z = x_[j];
            return k + 1;
        }
    };

Ring buffers are, for example, useful for logging purposes, if only a certain number of lines can be saved. To do so, enhance the ringbuffer class so that it uses an additional array of (fixed width) strings. The message to log is copied into the array and the pointer set accordingly. A read returns the pointer to the string.

4.3 Queue (FIFO)

A queue (or FIFO for first-in, first-out) is a data structure that supports the following operations: push() saves an entry, pop() retrieves (and removes) the entry that was entered least recently, and peek() retrieves the least recently entered element without removing it.
                array x[]                     n  rpos  wpos
    push( 1)    1  -  -  -                    1    0     1
    push( 2)    1  2  -  -                    2    0     2
    push( 3)    1  2  3  -                    3    0     3
    push( 4)    1  2  3  4                    4    0     0
    push( 5)    1  2  3  4  5  -  -  -        5    0     5
    push( 6)    1  2  3  4  5  6  -  -        6    0     6
    push( 7)    1  2  3  4  5  6  7  -        7    0     7
    pop== 1     -  2  3  4  5  6  7  -        6    1     7
    pop== 2     -  -  3  4  5  6  7  -        5    2     7
    push( 8)    -  -  3  4  5  6  7  8        6    2     0
    pop== 3     -  -  -  4  5  6  7  8        5    3     0
    pop== 4     -  -  -  -  5  6  7  8        4    4     0
    push( 9)    9  -  -  -  5  6  7  8        5    4     1
    pop== 5     9  -  -  -  -  6  7  8        4    5     1
    pop== 6     9  -  -  -  -  -  7  8        3    6     1
    push(10)    9 10  -  -  -  -  7  8        4    6     2
    pop== 7     9 10  -  -  -  -  -  8        3    7     2
    pop== 8     9 10  -  -  -  -  -  -        2    0     2
    push(11)    9 10 11  -  -  -  -  -        3    0     3
    pop== 9     - 10 11  -  -  -  -  -        2    1     3
    pop==10     -  - 11  -  -  -  -  -        1    2     3
    push(12)    -  - 11 12  -  -  -  -        2    2     4
    pop==11     -  -  - 12  -  -  -  -        1    3     4
    pop==12     -  -  -  -  -  -  -  -        0    4     4
    push(13)    -  -  -  - 13  -  -  -        1    4     5
    pop==13     -  -  -  -  -  -  -  -        0    5     5
    pop== 0     -  -  -  -  -  -  -  -        0    5     5   (queue was empty)
    push(14)    -  -  -  -  - 14  -  -        1    5     6
    pop==14     -  -  -  -  -  -  -  -        0    6     6
    pop== 0     -  -  -  -  -  -  -  -        0    6     6   (queue was empty)
    push(15)    -  -  -  -  -  - 15  -        1    6     7

Figure 4.3-A: Inserting and retrieving elements with a queue.

We describe a queue with an optional feature of growing when necessary. Figure 4.3-A shows the data for a queue where the initial size is four and the growth-feature enabled (in steps of four elements). The listing was created with the program [FXT: ds/queue-demo.cc].
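When a full circular queue grows, the entries should first be brought into linear order, otherwise the free space would appear in the middle of the sequence. The sketch below illustrates this step, assuming std::vector and std::rotate in place of the FXT routines; the name mini_queue is hypothetical and growth is always enabled here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical growable FIFO on top of std::vector (illustration only).
struct mini_queue
{
    std::vector<int> x;    // circular storage
    std::size_t n = 0;     // current number of entries
    std::size_t rpos = 0;  // next position to read
    std::size_t wpos = 0;  // next position to write
    std::size_t gq;        // growth increment, assumed nonzero in this sketch

    mini_queue(std::size_t size, std::size_t growq) : x(size), gq(growq) { }

    void grow()  // only called when the queue is full
    {
        // Rotate so that the oldest entry sits at index 0 ...
        std::rotate(x.begin(), x.begin() + rpos, x.end());
        const std::size_t s = x.size();
        x.resize(s + gq);  // ... then append the free space at the end.
        rpos = 0;
        wpos = s;  // next write goes to the first new slot
    }

    void push(int z)
    {
        if ( n >= x.size() )  grow();
        x[wpos] = z;
        if ( ++wpos >= x.size() )  wpos = 0;
        ++n;
    }

    bool pop(int &z)  // return false if the queue was empty
    {
        if ( n == 0 )  return false;
        z = x[rpos];
        if ( ++rpos >= x.size() )  rpos = 0;
        --n;
        return true;
    }
};
```

The rotation costs time proportional to the current size, so growth is not a constant-time operation; choosing a larger increment makes it rarer.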
The implementation is [FXT: class queue in ds/queue.h]:

    template <typename Type>
    class queue
    {
    public:
        Type *x_;     // pointer to data
        ulong s_;     // allocated size (# of elements)
        ulong n_;     // current number of entries in buffer
        ulong wpos_;  // next position to write in buffer
        ulong rpos_;  // next position to read in buffer
        ulong gq_;    // grow gq elements if necessary, 0 for "never grow"

    public:
        explicit queue(ulong n, ulong growq=0)
        {
            s_ = n;
            x_ = new Type[s_];
            n_ = 0;
            wpos_ = 0;
            rpos_ = 0;
            gq_ = growq;
        }

        ~queue()  { delete [] x_; }

        ulong num()  const  { return n_; }

The method push() writes to x[wpos], peek() and pop() read from x[rpos]:

        ulong push(const Type &z)
        // Return number of entries.
        // Zero is returned on failure
        // (i.e. space exhausted and 0==gq_)
        {
            if ( n_ >= s_ )
            {
                if ( 0==gq_ )  return 0;  // growing disabled
                grow();
            }

            x_[wpos_] = z;
            ++wpos_;
            if ( wpos_>=s_ )  wpos_ = 0;

            ++n_;
            return n_;
        }

        ulong peek(Type &z)
        // Return number of entries.
        // If zero is returned the value of z is undefined.
        {
            z = x_[rpos_];
            return n_;
        }

        ulong pop(Type &z)
        // Return number of entries before pop
        // i.e. zero is returned if queue was empty.
        // If zero is returned the value of z is undefined.
        {
            ulong ret = n_;
            if ( 0!=n_ )
            {
                z = x_[rpos_];
                ++rpos_;
                if ( rpos_ >= s_ )  rpos_ = 0;
                --n_;
            }
            return ret;
        }

The growing feature is implemented as follows:

    private:
        void grow()
        {
            ulong ns = s_ + gq_;  // new size
            // move read-position to zero:
            rotate_left(x_, s_, rpos_);
            x_ = ReAlloc(x_, ns, s_);
            wpos_ = s_;
            rpos_ = 0;
            s_ = ns;
        }
    };

4.4 Deque (double-ended queue)

A deque (for double-ended queue) combines the data structures stack and queue: insertion and deletion in time O(1) is possible both at the first and the last position.
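Constant-time operations at both ends rest on stepping an index in either direction around the ring. The two helpers below are a sketch with hypothetical names (ring_prev, ring_next); they show the wrap-around test with an unsigned index, where decrementing zero gives the largest unsigned value, written as -1UL.

```cpp
#include <cassert>

typedef unsigned long ulong;

// Step an index backwards on a ring of s slots (hypothetical helper).
// For pos==0 the decrement wraps to the largest ulong value (-1UL),
// which is detected and mapped to the last slot s-1.
inline ulong ring_prev(ulong pos, ulong s)
{
    --pos;
    if ( pos == -1UL )  pos = s - 1;
    return pos;
}

// Step an index forwards on a ring of s slots (hypothetical helper).
inline ulong ring_next(ulong pos, ulong s)
{
    ++pos;
    if ( pos >= s )  pos = 0;
    return pos;
}
```

Both helpers avoid the modulo operation, at the price of one comparison per step.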
An implementation with the option to let the deque grow when necessary is [FXT: class deque in ds/deque.h]:

    template <typename Type>
    class deque
    {
    public:
        Type *x_;     // data (ring buffer)
        ulong s_;     // allocated size (# of elements)
        ulong n_;     // current number of entries in buffer
        ulong fpos_;  // position of first element in buffer
        // insert_first() will write to (fpos-1)%s
        ulong lpos_;  // position of last element in buffer plus one
        // insert_last() will write to lpos, n==(lpos-fpos) (mod s)
        // entries are at [fpos, ..., lpos-1] (range may be empty)

        ulong gq_;    // grow gq elements if necessary, 0 for "never grow"

    public:
        explicit deque(ulong n, ulong growq=0)
        {
            s_ = n;
            x_ = new Type[s_];
            n_ = 0;
            fpos_ = 0;
            lpos_ = 0;
            gq_ = growq;
        }

        ~deque()  { delete [] x_; }

        ulong num()  const  { return n_; }

The insertion at the front and end are implemented as

        ulong insert_first(const Type &z)
        // Return number of entries after insertion.
        // Zero is returned on failure
        // (i.e. space exhausted and 0==gq_)
        {
            if ( n_ >= s_ )
            {
                if ( 0==gq_ )  return 0;  // growing disabled
                grow();
            }

            --fpos_;
            if ( fpos_ == -1UL )  fpos_ = s_ - 1;
            x_[fpos_] = z;
            ++n_;
            return n_;
        }

        ulong insert_last(const Type &z)
        // Return number of entries after insertion.
        // Zero is returned on failure
        // (i.e. space exhausted and 0==gq_)
        {
            if ( n_ >= s_ )
            {
                if ( 0==gq_ )  return 0;  // growing disabled
                grow();
            }

            x_[lpos_] = z;
            ++lpos_;
            if ( lpos_>=s_ )  lpos_ = 0;
            ++n_;
            return n_;
        }

The extraction methods are

        ulong extract_first(Type & z)
        // Return number of elements before extract.
        // Return 0 if extract on empty deque was attempted.
        {
            if ( 0==n_ )  return 0;
            z = x_[fpos_];
            ++fpos_;
            if ( fpos_ >= s_ )  fpos_ = 0;
            --n_;
            return n_ + 1;
        }

        ulong extract_last(Type & z)
        // Return number of elements before extract.
        // Return 0 if extract on empty deque was attempted.
        {
            if ( 0==n_ )  return 0;
            --lpos_;
            if ( lpos_ == -1UL )  lpos_ = s_ - 1;
            z = x_[lpos_];
            --n_;
            return n_ + 1;
        }

We can read at the front, end, or an arbitrary index, without changing any data:

        ulong read_first(Type & z)  const
        // Read (but don't remove) first entry.
        // Return number of elements (i.e. on error return zero).
        {
            if ( 0==n_ )  return 0;
            z = x_[fpos_];
            return n_;
        }

        ulong read_last(Type & z)  const
        // Read (but don't remove) last entry.
        // Return number of elements (i.e. on error return zero).
        {
            return read(n_-1, z);  // ok for n_==0
        }

        ulong read(ulong k, Type & z)  const
        // Read entry k (that is, [(fpos_ + k)%s_]).
        // Return 0 if k>=n_ else return k+1
        {
            if ( k>=n_ )  return 0;
            ulong j = fpos_ + k;
            if ( j>=s_ )  j -= s_;
            z = x_[j];
            return k + 1;
        }

    private:
        void grow()
        {
            ulong ns = s_ + gq_;  // new size
            // Move read-position to zero:
            rotate_left(x_, s_, fpos_);
            x_ = ReAlloc(x_, ns, s_);
            fpos_ = 0;
            lpos_ = n_;
            s_ = ns;
        }
    };

    insert_first( 1)        1
    insert_last(51)         1 51
    insert_first( 2)        2 1 51
    insert_last(52)         2 1 51 52
    insert_first( 3)        3 2 1 51 52
    insert_last(53)         3 2 1 51 52 53
    extract_first()= 3      2 1 51 52 53
    extract_last()= 53      2 1 51 52
    insert_first( 4)        4 2 1 51 52
    insert_last(54)         4 2 1 51 52 54
    extract_first()= 4      2 1 51 52 54
    extract_last()= 54      2 1 51 52
    extract_first()= 2      1 51 52
    extract_last()= 52      1 51
    extract_first()= 1      51
    extract_last()= 51
    insert_first( 5)        5
    insert_last(55)         5 55
    extract_first()= 5      55
    extract_last()= 55
    extract_first()=        (deque is empty)
    extract_last()=         (deque is empty)
    insert_first( 7)        7
    insert_last(57)         7 57

Figure 4.4-A: Inserting and retrieving elements with a deque.
Its working is shown in figure 4.4-A which was created with the program [FXT: ds/deque-demo.cc].

4.5 Heap and priority queue

4.5.1 Indexing scheme for binary trees

                              1:[...1]
                2:[..1.]                    3:[..11]
         4:[.1..]      5:[.1.1]      6:[.11.]      7:[.111]
    8:[1...]  9:[1..1]

Figure 4.5-A: Indexing a binary tree: the left child of node k is node 2k, the right child is node 2k + 1.

A one-based index array with n elements can be identified with a binary tree as shown in figure 4.5-A. Node 1 is the root node. The left child of node k is node 2k and the right child is node 2k + 1. The parent of node k is node ⌊k/2⌋. We require that consecutive array indices 1, 2, ..., n are used. Therefore all nodes k where k ≤ ⌊n/2⌋ have at least one child.

4.5.2 The binary heap

A binary heap is a binary tree of the form just described, where both children are less than or equal to their parent. Figure 4.5-B shows an example of a heap with nine elements.

                     95
             91              84
         79      91      80      78
       76  71

    as array:  [ 95, 91, 84, 79, 91, 80, 78, 76, 71]

Figure 4.5-B: A heap with nine elements, the left or right child is never greater than the parent.

The following function determines whether a given array is a heap [FXT: ds/heap.h]:

    template <typename Type>
    ulong test_heap(const Type *x, ulong n)
    // Return 0 if x[] has heap property
    // else index of node found to be greater than its parent.
    {
        const Type *p = x - 1;  // make one-based
        for (ulong k=n; k>1; --k)
        {
            ulong t = (k>>1);  // parent(k)
            if ( p[t]<p[k] )  return k;
        }
        return 0;
    }

[...]

    template <typename Type>
    void heapify(Type *z, ulong n, ulong k)
    // Data expected in z[1,2,...,n].
    {
        ulong m = k;  // index of max of k, left(k), and right(k)

        const ulong l = (k<<1);  // left(k);
        if ( (l <= n) && (z[l] > z[k]) )  m = l;  // left child (exists and) greater than k

        const ulong r = (k<<1) + 1;  // right(k);
        if ( (r <= n) && (z[r] > z[m]) )  m = r;  // right child (ex. and) greater than max(k,l)

        if ( m != k )  // need to swap
        {
            swap2(z[k], z[m]);
            heapify(z, n, m);
        }
    }

To reorder an array into a heap, we restore the heap property from the bottom up:

    template <typename Type>
    void build_heap(Type *x, ulong n)
    // Reorder data to a heap.
    // Data expected in x[0,1,...,n-1].
    {
        Type *z = x - 1;  // make one-based
        ulong j = (n>>1);  // max index such that node has at least one child
        while ( j > 0 )
        {
            heapify(z, n, j);
            --j;
        }
    }

The routine has complexity O(n). Let the height of node k be the maximal number of swaps that can happen with heapify(k). There are fewer than n/2 elements of height 1, n/4 of height 2, n/8 of height 3, and so on. Let W(n) be the maximal number of swaps with n elements, we have

\[ W(n) \; < \; 1 \cdot \frac{n}{2} + 2 \cdot \frac{n}{4} + 3 \cdot \frac{n}{8} + \ldots + \log_2(n) \cdot 1 \; < \; 2\,n \qquad (4.5\text{-}1) \]

So the complexity is indeed linear.

A new element can be inserted into a heap in O(log n) time by appending it and moving it towards the root as necessary:

    template <typename Type>
    bool heap_insert(Type *x, ulong n, ulong s, Type t)
    // With x[] a heap of current size n
    // and max size s (i.e. space for s elements allocated),
    // insert t and restore heap-property.
    // Return true if successful, else (i.e. if space exhausted) false.
    {
        if ( n >= s )  return false;  // no free slot left
        ++n;
        Type *x1 = x - 1;  // make one-based
        ulong j = n;
        while ( j > 1 )  // move towards root as needed
        {
            ulong k = (j>>1);  // k==parent(j)
            if ( x1[k] >= t )  break;
            x1[j] = x1[k];
            j = k;
        }
        x1[j] = t;
        return true;
    }

Similarly, the maximal element can be removed in time O(log n):

    template <typename Type>
    Type heap_extract_max(Type *x, ulong n)
    // Return maximal element of heap and restore heap structure.
    // Return value is undefined for 0==n.
    {
        Type m = x[0];
        if ( 0 != n )
        {
            Type *x1 = x - 1;
            x1[1] = x1[n];
            --n;
            heapify(x1, n, 1);
        }
        return m;
    }

4.5.3 Priority queue

A priority queue is a data structure that supports insertion of an element and extraction of its maximal element, both in time O(log(n)). A priority queue can be used to schedule an event for a certain time and return the next pending event.

We use a binary heap to implement a priority queue. Two modifications seem appropriate: Firstly, replace extract_max() by extract_next(), leaving it as a compile time option whether to extract the minimal or the maximal element. We need to change the comparison operators at a few strategic places so that the heap is built either with its maximal or its minimal element first [FXT: class priority_queue in ds/priorityqueue.h]:

    #if 1
    // next() is the one with the smallest key
    // i.e. extract_next() is extract_min()
    #define _CMP_ <
    #define _CMPEQ_ <=
    #else
    // next() is the one with the biggest key
    // i.e. extract_next() is extract_max()
    #define _CMP_ >
    #define _CMPEQ_ >=
    #endif

Secondly, augment the elements by an event description that can be freely defined:

    template <typename Type1, typename Type2>
    class priority_queue
    {
    public:
        Type1 *t1_;  // time: t1[1..s] one-based array!
        Type2 *e1_;  // events: e1[1..s] one-based array!
        ulong s_;    // allocated size (# of elements)
        ulong n_;    // current number of events
        ulong gq_;   // grow gq elements if necessary, 0 for "never grow"

    public:
        priority_queue(ulong n, ulong growq=0)
        {
            s_ = n;
            t1_ = new Type1[s_] - 1;
            e1_ = new Type2[s_] - 1;

            n_ = 0;
            gq_ = growq;
        }

        ~priority_queue()
        {
            delete [] (t1_+1);
            delete [] (e1_+1);
        }
    [--snip--]

The extraction and insertion operations are

        bool extract_next(Type1 &t, Type2 &e)
        {
            if ( n_ == 0 )  return false;
            t = t1_[1];  e = e1_[1];
            t1_[1] = t1_[n_];  e1_[1] = e1_[n_];
            --n_;
            heapify(1);
            return true;
        }

        bool insert(const Type1 &t, const Type2 &e)
        // Insert event e at time t.
        // Return true if successful, else false (space exhausted and growth disabled).
        {
            if ( n_ >= s_ )
            {
                if ( 0==gq_ )  return false;  // growing disabled
                grow();
            }

            ++n_;
            ulong j = n_;
            while ( j > 1 )
            {
                ulong k = (j>>1);  // k==parent(j)
                if ( t1_[k] _CMPEQ_ t )  break;
                t1_[j] = t1_[k];  e1_[j] = e1_[k];
                j = k;
            }
            t1_[j] = t;
            e1_[j] = e;
            return true;
        }

        void reschedule_next(Type1 t)
        {
            t1_[1] = t;
            heapify(1);
        }

The member function reschedule_next() is more efficient than the sequence extract_next(); insert();, as it calls heapify() only once.

The heapify() function is tail-recursive, so we make it iterative:

    private:
        void heapify(ulong k)
        {
            ulong m = k;

        hstart:
            ulong l = (k<<1);  // left(k);
            ulong r = l + 1;   // right(k);
            if ( (l <= n_) && (t1_[l] _CMP_ t1_[k]) )  m = l;
            if ( (r <= n_) && (t1_[r] _CMP_ t1_[m]) )  m = r;

            if ( m != k )
            {
                swap2(t1_[k], t1_[m]);  swap2(e1_[k], e1_[m]);
    //            heapify(m);
                k = m;
                goto hstart;  // tail recursion
            }
        }

The second argument of the constructor determines the number of elements added in case of growth, it is disabled (equals zero) by default.
    private:
        void grow()
        {
            ulong ns = s_ + gq_;  // new size
            t1_ = ReAlloc(t1_+1, ns, s_) - 1;
            e1_ = ReAlloc(e1_+1, ns, s_) - 1;
            s_ = ns;
        }
    };

The ReAlloc() routine is described in section 4.1 on page 153.

    Inserting into priority_queue:      Extracting from priority_queue:
    # : event @ time                    # : event @ time
    0:  A @ 0.840188                    9:  F @ 0.197551
    1:  B @ 0.394383                    8:  I @ 0.277775
    2:  C @ 0.783099                    7:  G @ 0.335223
    3:  D @ 0.79844                     6:  B @ 0.394383
    4:  E @ 0.911647                    5:  J @ 0.55397
    5:  F @ 0.197551                    4:  H @ 0.76823
    6:  G @ 0.335223                    3:  C @ 0.783099
    7:  H @ 0.76823                     2:  D @ 0.79844
    8:  I @ 0.277775                    1:  A @ 0.840188
    9:  J @ 0.55397                     0:  E @ 0.911647

Figure 4.5-C: Insertion of events labeled ‘A’, ‘B’, ..., ‘J’ scheduled for random times into a priority queue (left) and subsequent extraction (right).

The program [FXT: ds/priorityqueue-demo.cc] inserts events at random times 0 ≤ t < 1, then extracts all of them. It gives the output shown in figure 4.5-C. A more typical usage would intermix the insertions and extractions.

4.6 Bit-array

The use of bit-arrays should be obvious: an array of tag values (like ‘seen’ versus ‘unseen’) where all standard data types would be a waste of space. Besides reading and writing individual bits one should implement a convenient search for the next set (or cleared) bit.

The class [FXT: class bitarray in ds/bitarray.h] is used, for example, for lists of small primes [FXT: mod/primes.cc], for in-place transposition routines [FXT: aux2/transpose.h] (see section 2.8 on page 122) and several operations on permutations (see section 2.4 on page 109).

    class bitarray
    // Bit-array class mostly for use as memory saving array of Boolean values.
    // Valid index is 0...nb_-1 (as usual in C arrays).
    {
    public:
        ulong *f_;    // bit bucket
        ulong n_;     // number of bits
        ulong nfw_;   // number of words where all bits are used, may be zero
        ulong mp_;    // mask for partially used word if there is one, else zero
        // (ones are at the positions of the _unused_ bits)
        bool myfq_;   // whether f[] was allocated by class
    [--snip--]

The constructor allocates memory by default. If the second argument is nonzero, it must point to an accessible memory range:

        bitarray(ulong nbits, ulong *f=0)
        // nbits must be nonzero
        {
            ulong nw = ctor_core(nbits);
            if ( f!=0 )
            {
                f_ = (ulong *)f;
                myfq_ = false;
            }
            else
            {
                f_ = new ulong[nw];
                myfq_ = true;
            }
        }

The public methods are

        // operations on bit n:
        ulong test(ulong n)  const    // Test whether n-th bit set
        void set(ulong n)             // Set n-th bit
        void clear(ulong n)           // Clear n-th bit
        void change(ulong n)          // Toggle n-th bit
        ulong test_set(ulong n)       // Test whether n-th bit is set and set it
        ulong test_clear(ulong n)     // Test whether n-th bit is set and clear it
        ulong test_change(ulong n)    // Test whether n-th bit is set and toggle it

        // Operations on all bits:
        void clear_all()              // Clear all bits
        void set_all()                // Set all bits
        int all_set_q()  const;       // Return whether all bits are set
        int all_clear_q()  const;     // Return whether all bits are clear

        // Scanning the array:
        // Note: the given index n is included in the search
        ulong next_set_idx(ulong n)  const    // Return index of next set or value beyond end
        ulong next_clear_idx(ulong n)  const  // Return index of next clear or value beyond end

Combined operations like ‘test-and-set-bit’, ‘test-and-clear-bit’, ‘test-and-change-bit’ are often needed in applications that use bit-arrays. This is why modern CPUs often have instructions implementing these operations.

The class does not supply overloading of the array-index operator [ ] because the writing variant would cause a performance penalty.
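To make the combined and scanning operations concrete, here is a minimal sketch of a bit-array with test_set() and next_set_idx(). It is an illustration under our own assumptions (std::vector storage, no bounds checks, the hypothetical name mini_bitarray, and the value n returned when no set bit is found), not the FXT class.

```cpp
#include <cassert>
#include <climits>
#include <vector>

typedef unsigned long ulong;
static const ulong BITS = sizeof(ulong) * CHAR_BIT;  // bits per word

// Hypothetical minimal bit-array (illustration only, no bounds checks).
struct mini_bitarray
{
    std::vector<ulong> f;  // bit bucket, all bits initially clear
    ulong n;               // number of bits

    explicit mini_bitarray(ulong nbits) : f((nbits + BITS - 1) / BITS), n(nbits) { }

    bool test(ulong k) const { return (f[k / BITS] >> (k % BITS)) & 1UL; }

    bool test_set(ulong k)  // return the old value of bit k, then set it
    {
        const ulong bm = 1UL << (k % BITS);  // mask with only bit k of its word
        ulong &w = f[k / BITS];
        const bool t = (w & bm) != 0;
        w |= bm;
        return t;
    }

    ulong next_set_idx(ulong k) const  // index of first set bit at or after k, n if none
    {
        while ( k < n )
        {
            // Drop the bits below k; if nothing remains, skip to the next word.
            const ulong w = f[k / BITS] >> (k % BITS);
            if ( w == 0 ) { k = (k / BITS + 1) * BITS;  continue; }
            if ( w & 1UL )  return k;
            ++k;
        }
        return n;
    }
};
```

Skipping whole zero words keeps the scan cheap for sparse arrays; the bit-by-bit step inside a nonzero word could be replaced by a lowest-set-bit computation.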
One might want to add ‘sparse’-versions of the scan functions (next_set_idx() and next_clear_idx()) for large bit-arrays with only few bits set or unset.

On the AMD64 architecture the corresponding CPU instructions are used [FXT: bits/bitasm-amd64.h]:

    static inline ulong asm_bts(ulong *f, ulong i)
    // Bit Test and Set
    {
        ulong ret;
        asm ( "btsq %2, %1 \n"
              "sbbq %0, %0"
              : "=r" (ret)
              : "m" (*f), "r" (i) );
        return ret;
    }

If no specialized CPU instructions are available, the following two macros are used:

    #define DIVMOD(n, d, bm) \
    ulong d = n / BITS_PER_LONG; \
    ulong bm = 1UL << (n % BITS_PER_LONG);

    #define DIVMOD_TEST(n, d, bm) \
    ulong d = n / BITS_PER_LONG; \
    ulong bm = 1UL << (n % BITS_PER_LONG); \
    ulong t = bm & f_[d];

The macro BITS_USE_ASM determines whether the CPU instruction is available:

        ulong test_set(ulong n)
        // Test whether n-th bit is set and set it.
        {
    #ifdef BITS_USE_ASM
            return asm_bts(f_, n);
    #else
            DIVMOD_TEST(n, d, bm);
            f_[d] |= bm;
            return t;
    #endif
        }

Performance is still good in that case as the modulo operation and division by BITS_PER_LONG (a power of 2) are replaced with cheap (bit-and and shift) operations. On the machine described in appendix B on page 922 both versions give practically identical performance.

The way out-of-bounds accesses are handled can be defined at the beginning of the header file:

    #define CHECK 0  // define to disable check of out of bounds access
    //#define CHECK 1  // define to handle out of bounds access
    //#define CHECK 2  // define to fail with out of bounds access

4.7 Left-right array

The left-right array (or LR-array) keeps track of a range of indices 0, ..., n − 1. Every index can have two states, free or set.
The LR-array implements the following operations in time O (log n): marking the k-th free index as set; marking the k-th set index as free; for the i-th (absolute) index, finding how many indices of the same type (free or set) are left (or right) to it (including or excluding i). The implementation is given as [FXT: class left right array in ds/left-right-array.h]: 1 2 3 4 5 6 7 class left_right_array { public: ulong *fl_; // Free indices Left (including current element) in bsearch interval bool *tg_; // tags: tg[i]==true if and only if index i is free ulong n_; // total number of indices ulong f_; // number of free indices The arrays used have n elements: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 public: left_right_array(ulong n) { n_ = n; fl_ = new ulong[n_]; tg_ = new bool[n_]; free_all(); } ~left_right_array() { delete [] fl_; delete [] tg_; } ulong num_free() const { return f_; } ulong num_set() const { return n_ - f_; } The initialization routine free_all() of the array fl[] uses a variation of the binary search algorithm described in section 3.2 on page 141: 4.7: Left-right array 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 167 private: void init_rec(ulong i0, ulong i1) // Set elements of fl[0,...,n-2] according to empty array a[]. // The element fl[n-1] needs to be set to 1 afterwards. // Work is O(n). { if ( (i1-i0)!=0 ) { ulong t = (i1+i0)/2; init_rec(i0, t); init_rec(t+1, i1); } fl_[i1] = i1-i0+1; } public: void free_all() // Mark all indices as free. { f_ = n_; for (ulong j=0; j= num_free() ) return ~0UL; ulong i0 = 0, i1 = n_-1; while ( 1 ) { ulong t = (i1+i0)/2; if ( (fl_[t] == k+1) && (tg_[t]) ) return t; if ( fl_[t] > k ) // left: { i1 = t; } else // right: { i0 = t+1; k-=fl_[t]; } } } Usually one would have an extra array where one actually does write to the position returned above. Then the data of the LR-array has to be modified accordingly. 
The following method does this:

        ulong get_free_idx_chg(ulong k)
        // Return the k-th ( 0 <= k < num_free() ) free index.
        // Return ~0UL if k is out of bounds.
        // Change the arrays fl[] and tg[] reflecting
        // that index i will be set afterwards.
        // Work is O(log(n)).
        {
            if ( k >= num_free() )  return ~0UL;

            --f_;

            ulong i0 = 0,  i1 = n_-1;
            while ( 1 )
            {
                ulong t = (i1+i0)/2;

                if ( (fl_[t] == k+1) && (tg_[t]) )
                {
                    --fl_[t];
                    tg_[t] = false;
                    return t;
                }

                if ( fl_[t] > k )  // left:
                {
                    --fl_[t];
                    i1 = t;
                }
                else  // right:
                {
                    i0 = t+1;  k-=fl_[t];
                }
            }
        }

    fl[]= 1 2 3 1 5 1 2 1 1
     a[]= * * * * * * * * *
    ------- first: -------
    fl[]= 0 1 2 1 4 1 2 1 1
     a[]= 1 * * * * * * * *
    ------- last: -------
    fl[]= 0 1 2 1 4 1 2 1 0
     a[]= 1 * * * * * * * 2
    ------- first: -------
    fl[]= 0 0 1 1 3 1 2 1 0
     a[]= 1 3 * * * * * * 2
    ------- last: -------
    fl[]= 0 0 1 1 3 1 2 0 0
     a[]= 1 3 * * * * * 4 2
    ------- first: -------
    fl[]= 0 0 0 1 2 1 2 0 0
     a[]= 1 3 5 * * * * 4 2
    ------- last: -------
    fl[]= 0 0 0 1 2 1 1 0 0
     a[]= 1 3 5 * * * 6 4 2
    ------- first: -------
    fl[]= 0 0 0 0 1 1 1 0 0
     a[]= 1 3 5 7 * * 6 4 2
    ------- last: -------
    fl[]= 0 0 0 0 1 0 0 0 0
     a[]= 1 3 5 7 * 8 6 4 2
    ------- first: -------
    fl[]= 0 0 0 0 0 0 0 0 0
     a[]= 1 3 5 7 9 8 6 4 2

Figure 4.7-A: Alternately setting the first and last free position in an LR-array. Asterisks denote free positions, indices i where tg[i] is true.
For example, the following program alternately sets the first and last free position until no free position is left [FXT: ds/left-right-array-demo.cc]:

    ulong n = 9;
    ulong *A = new ulong[n];
    left_right_array LR(n);
    LR.free_all();

    // PRINT
    for (ulong e=0; e [...]

    [...]
        if ( k >= num_set() )  return ~0UL;

        ++f_;

        ulong i0 = 0,  i1 = n_-1;
        while ( 1 )
        {
            ulong t = (i1+i0)/2;
            // how many elements to the left are set:
            ulong slt = t-i0+1 - fl_[t];

            if ( (slt == k+1) && (tg_[t]==false) )
            {
                ++fl_[t];
                tg_[t] = true;
                return t;
            }

            if ( slt > k )  // left:
            {
                ++fl_[t];
                i1 = t;
            }
            else   // right:
            {
                i0 = t+1;  k-=slt;
            }
        }
    }

The following method returns the number of free indices left of i (and excluding i):

    ulong num_FLE(ulong i)  const
    // Return number of Free indices Left of (absolute) index i (Excluding i).
    // Work is O(log(n)).
    {
        if ( i >= n_ )  { return ~0UL; }  // out of bounds

        ulong i0 = 0,  i1 = n_-1;
        ulong ns = i;  // number of set element left to i (including i)
        while ( 1 )
        {
            if ( i0==i1 )  break;

            ulong t = (i1+i0)/2;
            if ( i<=t )  // left:
            {
                i1 = t;
            }
            else   // right:
            {
                ns -= fl_[t];
                i0 = t+1;
            }
        }

        return i-ns;
    }

Based on it are methods to determine the number of free/set indices to the left/right, including/excluding the given index. We omit the out-of-bounds clauses in the following:

    ulong num_FLI(ulong i)  const
    // Return number of Free indices Left of (absolute) index i (Including i).
    { return num_FLE(i) + tg_[i]; }

    ulong num_FRE(ulong i)  const
    // Return number of Free indices Right of (absolute) index i (Excluding i).
    { return num_free() - num_FLI(i); }

    ulong num_FRI(ulong i)  const
    // Return number of Free indices Right of (absolute) index i (Including i).
    { return num_free() - num_FLE(i); }

    ulong num_SLE(ulong i)  const
    // Return number of Set indices Left of (absolute) index i (Excluding i).
    { return i - num_FLE(i); }

    ulong num_SLI(ulong i)  const
    // Return number of Set indices Left of (absolute) index i (Including i).
    { return i - num_FLE(i) + !tg_[i]; }

    ulong num_SRE(ulong i)  const
    // Return number of Set indices Right of (absolute) index i (Excluding i).
    { return num_set() - num_SLI(i); }

    ulong num_SRI(ulong i)  const
    // Return number of Set indices Right of (absolute) index i (Including i).
    { return num_set() - i + num_FLE(i); }

These can be used for the fast conversion between permutations and inversion tables, see section 10.1.1.1 on page 235.

Part II
Combinatorial generation

Chapter 5
Conventions and considerations

We give algorithms for the generation of all combinatorial objects of certain types such as combinations, compositions, subsets, permutations, integer partitions, set partitions, restricted growth strings and necklaces. Finally, we give some constructions for Hadamard and conference matrices. Several (more esoteric) combinatorial objects that are found via searching in directed graphs are presented in chapter 20.

These routines are useful in situations where an exhaustive search over all configurations of a certain kind is needed. Combinatorial algorithms are also fundamental to many programming problems and they can simply be fun!

5.1 Representations and orders

For a set of n elements we will take either {0, 1, ..., n − 1} or {1, 2, ..., n}. Our convention for the set notation is to start with the smallest element. Often there is more than one useful way to represent a combinatorial object. For example the subset {1, 4, 6} of the set {0, 1, 2, 3, 4, 5, 6} can also be written as a delta set [0100101]. Some sources use the term bit string.
We often write dots instead of zeros for readability: [.1..1.1]. Note that in the delta set we put the first element to the left side (array notation); this is in contrast to the usual way of printing binary numbers, where the least significant bit (bit number zero) is shown on the right side.

For most objects we will give an algorithm for generation in lexicographic (or simply lex) order. In lexicographic order a string $X = [x_0, x_1, \ldots]$ precedes the string $Y = [y_0, y_1, \ldots]$ if for the smallest index $k$ where the strings differ we have $x_k < y_k$. Further, the string X precedes X.W (the concatenation of X with W) for any nonempty string W. The co-lexicographic (or simply colex) order is obtained by sorting with respect to the reversed strings. The order sometimes depends on the representation that is used; for an example see figure 8.1-A on page 202.

In a minimal-change order the amount of change between successive objects is the least possible. Such an order is also called a (combinatorial) Gray code. There is in general more than one such order. Often we can impose even stricter conditions, for example (with permutations) that the changes are between adjacent positions. The corresponding order is a strong minimal-change order. A very readable survey of Gray codes is given in [343], see also [298].

5.2 Ranking, unranking, and counting

For a particular ordering of combinatorial objects (say, lexicographic order for permutations) we can ask which position in the list a given object has. An algorithm for finding the position is called a ranking algorithm. A method to determine the object, given its position, is called an unranking algorithm.

Given both ranking and unranking methods, one can compute the successor of a given object by computing its rank r and unranking r + 1. While this method is usually slow, the idea can be used to find more efficient algorithms for computing the successor.
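The successor-by-ranking idea can be sketched for permutations in lexicographic order. The function names below are hypothetical (they are not FXT routines), the rank is 0-based, and the quadratic-time ranking is chosen for brevity, not for speed.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

typedef unsigned long ulong;

// Rank (0-based) of the permutation p of {0,...,n-1} in lexicographic order.
ulong perm_rank_lex(const std::vector<ulong> &p)
{
    const ulong n = p.size();
    ulong r = 0;
    for (ulong i=0; i<n; ++i)
    {
        ulong c = 0;  // number of elements right of position i that are smaller than p[i]
        for (ulong j=i+1; j<n; ++j)  c += ( p[j] < p[i] );
        r = r * (n-i) + c;  // Horner scheme: digit c has weight (n-1-i)!
    }
    return r;
}

// Permutation of {0,...,n-1} with lexicographic rank r ( 0 <= r < n! ).
std::vector<ulong> perm_unrank_lex(ulong r, ulong n)
{
    std::vector<ulong> c(n);  // digits of r in the factorial number system
    for (ulong i=n; i>0; --i)  { c[i-1] = r % (n-i+1);  r /= (n-i+1); }

    std::vector<ulong> rest(n), p(n);
    for (ulong j=0; j<n; ++j)  rest[j] = j;  // unused elements in sorted order
    for (ulong i=0; i<n; ++i)
    {
        p[i] = rest[c[i]];  // take the c[i]-th smallest unused element
        rest.erase( rest.begin() + static_cast<long>(c[i]) );
    }
    return p;
}

// Successor via rank and unrank; return false if p is the last permutation.
bool perm_next_via_rank(std::vector<ulong> &p)
{
    const ulong n = p.size();
    ulong nf = 1;  // n!
    for (ulong j=2; j<=n; ++j)  nf *= j;
    const ulong r = perm_rank_lex(p) + 1;
    if ( r >= nf )  return false;
    p = perm_unrank_lex(r, n);
    return true;
}
```

Each successor costs O(n^2) operations here, which illustrates why the method is slow compared to a dedicated successor routine such as std::next_permutation.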
In addition the idea often suggests interesting orderings for combinatorial objects.

We sometimes give ranking or unranking methods for numbers in special forms such as factorial representations for permutations. Ranking and unranking methods are implicit in generation algorithms based on mixed radix counting given in section 10.9 on page 258.

A simple but surprisingly powerful way to discover isomorphisms (one-to-one correspondences) between combinatorial objects is counting them. If the sequences of numbers of two kinds of objects are identical, chances are good of finding a conversion routine between the corresponding objects. For example, there are $2^{n-1}$ permutations of n elements such that no element lies more than one position to the right of its original position. With this observation an algorithm for generating these permutations via binary counting can be found, see section 11.2 on page 282.

The representation of combinatorial objects as restricted growth strings (as shown in section 15.2 on page 325) follows from the same idea. The resulting generation methods can be very fast and flexible.

The number of objects of a given size can often be given by an explicit expression (for example, the number of parentheses strings of n pairs is the Catalan number $C_n = \binom{2n}{n}/(n+1)$, see section 15.4 on page 331). The ordinary generating function (OGF) for a combinatorial object has a power series whose coefficients count the objects: for the Catalan numbers we have the OGF

$$C(x) = \frac{1 - \sqrt{1-4x}}{2x} = \sum_{n=0}^{\infty} C_n\, x^n \tag{5.2-1}$$

Generating functions can often be given even though no explicit expression for the number of the objects is known. The generating functions sometimes can be used to observe nontrivial identities, for example, that the number of partitions into distinct parts equals the number of partitions into odd parts, given as relation 16.4-23 on page 348.
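The identity for partitions can be observed numerically by expanding both generating functions as truncated power series. The sketch below assumes the standard product forms, $\prod_{k\ge 1}(1+x^k)$ for distinct parts and $\prod_{k \text{ odd}} 1/(1-x^k)$ for odd parts; the function names are hypothetical.

```cpp
#include <cassert>
#include <vector>

typedef unsigned long ulong;

// Coefficients of prod_{k=1..N} (1 + x^k) up to x^N:
// entry n is the number of partitions of n into distinct parts.
std::vector<ulong> count_distinct_parts(ulong N)
{
    std::vector<ulong> c(N+1, 0);
    c[0] = 1;
    for (ulong k=1; k<=N; ++k)            // multiply by (1 + x^k)
        for (ulong n=N; n>=k; --n)  c[n] += c[n-k];  // downwards: each part used at most once
    return c;
}

// Coefficients of prod_{k odd, k<=N} 1/(1 - x^k) up to x^N:
// entry n is the number of partitions of n into odd parts.
std::vector<ulong> count_odd_parts(ulong N)
{
    std::vector<ulong> c(N+1, 0);
    c[0] = 1;
    for (ulong k=1; k<=N; k+=2)           // multiply by 1/(1 - x^k)
        for (ulong n=k; n<=N; ++n)  c[n] += c[n-k];  // upwards: parts may repeat
    return c;
}
```

Comparing the two coefficient vectors for, say, N = 30 shows that they agree term by term. This only checks the identity for the computed range and is not a proof.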
An exponential generating function (EGF) for a type of object where there are $E_n$ objects of size $n$ has the power series of the form (see, for example, relation 11.1-7 on page 279)

$$\sum_{n=0}^{\infty} E_n\, \frac{x^n}{n!} \tag{5.2-2}$$

An excellent introduction to generating functions is given in [166], for in-depth information see [167, vol.2, chap.21, p.1021], [143], and [319].

5.3 Characteristics of the algorithms

In almost all cases we produce the combinatorial objects one by one. Let n be the size of the object. The successor (with respect to the specified order) is computed from the object itself and additional data of a size less than a constant multiple of n.

Let B be the total number of combinatorial objects under consideration. Sometimes the cost of a successor computation is O(n). Then the total cost for generating all objects is O(n · B).

If the successor computation takes a fixed number of operations (independent of the object size), then we say the algorithm is O(1). If so, there can be no loop in the implementation; we say the algorithm is loopless. Then the total cost for all objects is c · B for some constant c, independent of the object size. A loopless algorithm can only exist if the amount of change between successive objects is bounded by a constant that does not depend on the object size. Natural candidates for loopless algorithms are Gray codes.

In many cases the cost of computing all objects is also c · B while the computation of the successor does involve a loop. As an example consider incrementing in binary using arrays: in half of the cases just the lowest bit changes, for half of the remaining cases just two bits change, and so on. The total cost is $B \cdot (1 + \frac{1}{2}(1 + \frac{1}{2}(\cdots))) = 2 \cdot B$, independent of the number of bits used. So the total cost is as in the loopless case while the successor computation can be expensive in some cases. Algorithms with this characteristic are said to be constant amortized time (or CAT).
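The binary counting example can be made concrete by counting the digit changes directly. This is a minimal sketch with hypothetical function names; the digits are stored in an array with the least significant digit first.

```cpp
#include <cassert>
#include <vector>

typedef unsigned long ulong;

// Increment the binary number in b[] (b[0] is the least significant digit).
// Return the number of digits changed, or 0 if b[] is all ones (no successor).
ulong next_binary(std::vector<ulong> &b)
{
    ulong j = 0;
    while ( (j<b.size()) && (b[j]==1) )  ++j;  // scan over the trailing ones
    if ( j==b.size() )  return 0;               // current word is the last one
    for (ulong i=0; i<j; ++i)  b[i] = 0;        // clear the trailing ones
    b[j] = 1;                                   // set the lowest zero
    return j + 1;
}

// Total number of digit changes when stepping through all n-digit binary words.
ulong total_changes(ulong n)
{
    std::vector<ulong> b(n, 0);
    ulong total = 0, c;
    while ( (c = next_binary(b)) != 0 )  total += c;
    return total;
}
```

For n digits there are B = 2^n words and B − 1 successor computations; the total number of digit changes is 2^(n+1) − n − 2, which stays below 2 · B although a single increment can change up to n digits.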
Often CAT algorithms are faster than loopless algorithms, typically if their structure is simpler.

5.4 Optimization techniques

Let x be an array of n elements. The loop

    ulong k = 0;
    while ( (k [...]

    [...] m) )  { /*...*/ }

by the following single test where unsigned integers are used:

    if ( x>m )  { /*...*/ }

Use a do-while construct instead of a while-do loop whenever possible because the latter also tests the loop condition at entry. Even if the do-while version causes some additional work, the gain from avoiding a branch may outweigh it. Note that in the C language the for-loop also tests the condition at loop entry.

When computing the next object there may be special cases where the update is easy. If the percentage of these 'easy cases' is not too small, an extra branch in the update routine should be created. The performance gain is very visible in most cases (section 10.4 on page 245) and can be dramatic (section 10.5 on page 248).

Recursive routines can be quite elegant and versatile, see, for example, section 6.4 on page 182 and section 13.2.1 on page 297. However, expect only about half the speed of a good iterative implementation of the same algorithm. The notation for list recursions is given in section 14.1 on page 304.

Address generation can be simpler if arrays are used instead of pointers. This technique is useful for many permutation generators, see chapter 10 on page 232. Change the pointer declarations to array declarations in the corresponding class as follows:

    //ulong *p_;    // permutation data (pointer version)
    ulong p_[32];   // permutation data (array version)

Here we assume that nobody would attempt to compute all permutations of 31 or more elements ($31! \approx 8.22 \cdot 10^{33}$, taking about $1.3 \cdot 10^{18}$ years to finish).
To use arrays uncomment (in the corresponding header files) a line like

    #define PERM_REV2_FIXARRAYS  // use arrays instead of pointers (speedup)

This will also disable the statements to allocate and free memory with the pointers. Whether the use of arrays tends to give a speedup is noted in the comment. The performance gain can be spectacular, see section 7.1 on page 194.

5.5 Implementations, demo-programs, and timings

Most combinatorial generators are implemented as C++ classes. The first object in the given order is created by the method first(). The method to compute the successor is usually next(). If a method for the computation of the predecessor is given, then it is called prev() and a method last() to compute the last element in the list is given.

The current combinatorial object can be accessed through the method data(). To make all data of a class accessible the data is declared public. This way the need for various get_something() methods is avoided. To minimize the danger of accidental modification of class data the variable names end with an underscore. For example, the class for the generation of combinations in lexicographic order starts as

    class combination_lex
    {
    public:
        ulong *x_;  // combination: k elements 0<=x[j] [...]

    [...] 2^30/5.90 == 181,990,139 per second

    // with SUBSET_GRAY_DELTA_MAX_ARRAY_LEN defined:
    time ./bin 30
    arg 1: 30 == n  [Size of the set]  default=5
    arg 2: 0 == cq  [Whether to start with full set]  default=0
    ./bin 30  5.84s user 0.01s system 99% cpu 5.853 total
     ==> 2^30/5.84 == 183,859,901 per second

For your own measurements simply uncomment the line

    //#define TIMING  // uncomment to disable printing

near the top of the demo-program.

The rate of generation for a certain object is occasionally given as 123 M/s, meaning that 123 million objects are generated per second.

If a generator routine is used in an application, one must do the benchmarking with the application.
Choosing the optimal ordering and type of representation (for example, delta sets versus sets) for the given task is crucial for good performance. Further optimization will very likely involve the surrounding code rather than the generator alone.

Chapter 6
Combinations

We give algorithms to generate all subsets of the n-element set that contain k elements. For brevity we sometimes refer to the $\binom{n}{k}$ combinations of k out of n elements as "the combinations $\binom{n}{k}$".

6.1 Binomial coefficients

     n\k    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
      0:    1
      1:    1    1
      2:    1    2    1
      3:    1    3    3    1
      4:    1    4    6    4    1
      5:    1    5   10   10    5    1
      6:    1    6   15   20   15    6    1
      7:    1    7   21   35   35   21    7    1
      8:    1    8   28   56   70   56   28    8    1
      9:    1    9   36   84  126  126   84   36    9    1
     10:    1   10   45  120  210  252  210  120   45   10    1
     11:    1   11   55  165  330  462  462  330  165   55   11    1
     12:    1   12   66  220  495  792  924  792  495  220   66   12    1
     13:    1   13   78  286  715 1287 1716 1716 1287  715  286   78   13    1
     14:    1   14   91  364 1001 2002 3003 3432 3003 2002 1001  364   91   14    1
     15:    1   15  105  455 1365 3003 5005 6435 6435 5005 3003 1365  455  105   15    1

Figure 6.1-A: The binomial coefficients $\binom{n}{k}$ for 0 ≤ n, k ≤ 15.

The number of ways to choose k elements from a set of n elements equals the binomial coefficient ('n choose k', or 'k out of n'):

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!} = \frac{n\,(n-1)\,(n-2) \cdots (n-k+1)}{k\,(k-1)\,(k-2) \cdots 1} = \frac{\prod_{j=1}^{k} (n-j+1)}{k!} = \frac{n^{\underline{k}}}{k^{\underline{k}}} \tag{6.1-1}$$

The last equality uses the falling factorial notation $a^{\underline{b}} := a\,(a-1)\,(a-2) \cdots (a-b+1)$. Equivalently, a set of n elements has $\binom{n}{k}$ subsets of exactly k elements. These subsets are called the k-subsets (where k is fixed) or k-combinations of an n-set (a set with n elements).
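Relation 6.1-1 can be checked for small arguments by counting the k-subsets directly. The sketch below is a brute-force check with hypothetical function names; the product formula is evaluated naively, so the numerator overflows for moderately large n and k.

```cpp
#include <cassert>

typedef unsigned long ulong;

// Count the k-subsets of an n-set by brute force over all n-bit words.
// Assumes that n is less than the number of bits in a ulong.
ulong count_k_subsets(ulong n, ulong k)
{
    ulong ct = 0;
    for (ulong w=0; w < (1UL<<n); ++w)
    {
        ulong bits = 0;  // number of elements in the subset encoded by w
        for (ulong v=w; v!=0; v>>=1)  bits += (v & 1);
        ct += ( bits==k );
    }
    return ct;
}

// Falling factorial of n divided by k!, as in relation 6.1-1.
// Naive evaluation: the numerator overflows early, for small arguments only.
ulong binomial_by_definition(ulong n, ulong k)
{
    if ( k>n )  return 0;
    ulong num = 1, den = 1;
    for (ulong j=1; j<=k; ++j)  { num *= (n-j+1);  den *= j; }
    return num / den;
}
```

For example, both functions give 20 for n = 6 and k = 3, the number of rows in the tables of 3-subsets of a 6-set.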
To avoid overflow during the computation of the binomial coefficient, use the form

$$\binom{n}{k} = \frac{(n-k+1)^{\overline{k}}}{1^{\overline{k}}} = \frac{n-k+1}{1} \cdot \frac{n-k+2}{2} \cdot \frac{n-k+3}{3} \cdots \frac{n}{k} \tag{6.1-2}$$

An implementation is given in [FXT: aux0/binomial.h]:

    inline ulong binomial(ulong n, ulong k)
    {
        if ( k>n )  return 0;
        if ( (k==0) || (k==n) )  return 1;
        if ( 2*k > n )  k = n-k;  // use symmetry

        ulong b = n - k + 1;
        ulong f = b;
        for (ulong j=2; j<=k; ++j)
        {
            ++f;
            b *= f;
            b /= j;
        }
        return b;
    }

The table of the first binomial coefficients is shown in figure 6.1-A. This table is called Pascal's triangle; it was generated with the program [FXT: comb/binomial-demo.cc]. Observe that

$$\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k} \tag{6.1-3}$$

That is, each entry is the sum of its upper and left upper neighbor. The generating function for the k-combinations of an n-set is

$$(1+x)^n = \sum_{k=0}^{n} \binom{n}{k}\, x^k \tag{6.1-4}$$

6.2 Lexicographic and co-lexicographic order

          lexicographic                 co-lexicographic
          set          delta set        set          delta set    set reversed
      1:  { 0, 1, 2 }  111...       1:  { 0, 1, 2 }  111...       { 2, 1, 0 }
      2:  { 0, 1, 3 }  11.1..       2:  { 0, 1, 3 }  11.1..       { 3, 1, 0 }
      3:  { 0, 1, 4 }  11..1.       3:  { 0, 2, 3 }  1.11..       { 3, 2, 0 }
      4:  { 0, 1, 5 }  11...1       4:  { 1, 2, 3 }  .111..       { 3, 2, 1 }
      5:  { 0, 2, 3 }  1.11..       5:  { 0, 1, 4 }  11..1.       { 4, 1, 0 }
      6:  { 0, 2, 4 }  1.1.1.       6:  { 0, 2, 4 }  1.1.1.       { 4, 2, 0 }
      7:  { 0, 2, 5 }  1.1..1       7:  { 1, 2, 4 }  .11.1.       { 4, 2, 1 }
      8:  { 0, 3, 4 }  1..11.       8:  { 0, 3, 4 }  1..11.       { 4, 3, 0 }
      9:  { 0, 3, 5 }  1..1.1       9:  { 1, 3, 4 }  .1.11.       { 4, 3, 1 }
     10:  { 0, 4, 5 }  1...11      10:  { 2, 3, 4 }  ..111.       { 4, 3, 2 }
     11:  { 1, 2, 3 }  .111..      11:  { 0, 1, 5 }  11...1       { 5, 1, 0 }
     12:  { 1, 2, 4 }  .11.1.      12:  { 0, 2, 5 }  1.1..1       { 5, 2, 0 }
     13:  { 1, 2, 5 }  .11..1      13:  { 1, 2, 5 }  .11..1       { 5, 2, 1 }
     14:  { 1, 3, 4 }  .1.11.      14:  { 0, 3, 5 }  1..1.1       { 5, 3, 0 }
     15:  { 1, 3, 5 }  .1.1.1      15:  { 1, 3, 5 }  .1.1.1       { 5, 3, 1 }
     16:  { 1, 4, 5 }  .1..11      16:  { 2, 3, 5 }  ..11.1       { 5, 3, 2 }
     17:  { 2, 3, 4 }  ..111.      17:  { 0, 4, 5 }  1...11       { 5, 4, 0 }
     18:  { 2, 3, 5 }  ..11.1      18:  { 1, 4, 5 }  .1..11       { 5, 4, 1 }
     19:  { 2, 4, 5 }  ..1.11      19:  { 2, 4, 5 }  ..1.11       { 5, 4, 2 }
     20:  { 3, 4, 5 }  ...111      20:  { 3, 4, 5 }  ...111       { 5, 4, 3 }

Figure 6.2-A: All combinations $\binom{6}{3}$ in lexicographic order (left) and co-lexicographic order (right).

The combinations of three elements out of six in lexicographic (or simply lex) order are shown in figure 6.2-A (left). The sequence is such that the sets are ordered lexicographically. Note that for the delta sets the element zero is printed first whereas with binary words (section 1.24 on page 62) the least significant bit (bit zero) is printed last. The sequence for co-lexicographic (or colex) order is such that the sets, when written reversed, are ordered lexicographically.

6.2.1 Lexicographic order

The following implementation generates the combinations in lexicographic order as sets [FXT: class combination_lex in comb/combination-lex.h]:

    class combination_lex
    {
    public:
        ulong *x_;  // combination: k elements 0<=x[j] [...]

    [...] =2, k>=1 (i.e. s!=0 and t!=0)
        {
            s_ = n - k;
            t_ = k;
            n_ = s_ + t_;
            b_ = new ulong[n_];
            first();
        }
      [--snip--]
        void first()
        {
            for (ulong j=0; j [...]

    [...] =n_-1 )  return false;
            else
            {
                b_[x] = 0;  ++x;
                b_[y] = 1;  ++y;  // X(s,t)
                if ( b_[x]==0 )
                {
                    b_[x] = 1;
                    b_[0] = 0;  // Y(s,t)
                    if ( y>1 )  x = 1;  // Z(s,t)
                    y = 0;
                }
                return true;
            }
        }
    }
      [--snip--]

The combinations $\binom{32}{20}$ and $\binom{32}{12}$ are generated at a rate of about 200 M/s.

6.4 Minimal-change order

          Gray code                     complemented Gray code
      1:  { 0, 1, 2 }  111...       1:  { 3, 4, 5 }  ...111
      2:  { 0, 2, 3 }  1.11..       2:  { 1, 4, 5 }  .1..11
      3:  { 1, 2, 3 }  .111..       3:  { 0, 4, 5 }  1...11
      4:  { 0, 1, 3 }  11.1..       4:  { 2, 4, 5 }  ..1.11
      5:  { 0, 3, 4 }  1..11.       5:  { 1, 2, 5 }  .11..1
      6:  { 1, 3, 4 }  .1.11.       6:  { 0, 2, 5 }  1.1..1
      7:  { 2, 3, 4 }  ..111.       7:  { 0, 1, 5 }  11...1
      8:  { 0, 2, 4 }  1.1.1.       8:  { 1, 3, 5 }  .1.1.1
      9:  { 1, 2, 4 }  .11.1.       9:  { 0, 3, 5 }  1..1.1
     10:  { 0, 1, 4 }  11..1.      10:  { 2, 3, 5 }  ..11.1
     11:  { 0, 4, 5 }  1...11      11:  { 1, 2, 3 }  .111..
     12:  { 1, 4, 5 }  .1..11      12:  { 0, 2, 3 }  1.11..
     13:  { 2, 4, 5 }  ..1.11      13:  { 0, 1, 3 }  11.1..
     14:  { 3, 4, 5 }  ...111      14:  { 0, 1, 2 }  111...
     15:  { 0, 3, 5 }  1..1.1      15:  { 1, 2, 4 }  .11.1.
     16:  { 1, 3, 5 }  .1.1.1      16:  { 0, 2, 4 }  1.1.1.
     17:  { 2, 3, 5 }  ..11.1      17:  { 0, 1, 4 }  11..1.
     18:  { 0, 2, 5 }  1.1..1      18:  { 1, 3, 4 }  .1.11.
     19:  { 1, 2, 5 }  .11..1      19:  { 0, 3, 4 }  1..11.
     20:  { 0, 1, 5 }  11...1      20:  { 2, 3, 4 }  ..111.

Figure 6.4-A: Combinations $\binom{6}{3}$ in Gray order (left) and complemented Gray order (right).

The combinations of three elements out of six in a minimal-change order (a Gray code) are shown in figure 6.4-A (left). With each transition exactly one element changes its position. We use a recursion for the list C(n, k) of combinations $\binom{n}{k}$ (notation as in relation 14.1-1 on page 304):

$$C(n,k) = \begin{matrix} [C(n-1,k)] \\ [(n) \,.\, C^R(n-1,k-1)] \end{matrix} = \begin{matrix} [0 \,.\, C(n-1,k)] \\ [1 \,.\, C^R(n-1,k-1)] \end{matrix} \tag{6.4-1}$$

The first equality is for the set representation, the second for the delta-set representation. An implementation is given in [FXT: comb/combination-gray-rec-demo.cc]:

    ulong *x;  // elements in combination at x[1] ... x[k]

    void comb_gray(ulong n, ulong k, bool z)
    {
        if ( k==n )
        {
            for (ulong j=1; j<=k; ++j)  x[j] = j;
            visit();
            return;
        }

        if ( z )  // forward:
        {
            comb_gray(n-1, k, z);
            if ( k>0 )  { x[k] = n;  comb_gray(n-1, k-1, !z); }
        }
        else      // backward:
        {
            if ( k>0 )  { x[k] = n;  comb_gray(n-1, k-1, !z); }
            comb_gray(n-1, k, z);
        }
    }

The recursion can be partly unfolded as follows

$$C(n,k) = \begin{matrix} [C(n-2,k)] \\ [(n-1) \,.\, C^R(n-2,k-1)] \\ [(n) \,.\, C^R(n-1,k-1)] \end{matrix} = \begin{matrix} [0\,0 \,.\, C(n-2,k)] \\ [0\,1 \,.\, C^R(n-2,k-1)] \\ [1 \,.\, C^R(n-1,k-1)] \end{matrix} \tag{6.4-2}$$

A recursion for the complemented order is

$$C'(n,k) = \begin{matrix} [(n) \,.\, C'(n-1,k-1)] \\ [C'^R(n-1,k)] \end{matrix} = \begin{matrix} [1 \,.\, C'(n-1,k-1)] \\ [0 \,.\, C'^R(n-1,k)] \end{matrix} \tag{6.4-3}$$

    void comb_gray_compl(ulong n, ulong k, bool z)
    {
      [--snip--]
        if ( z )  // forward:
        {
            if ( k>0 )  { x[k] = n;  comb_gray_compl(n-1, k-1, z); }
            comb_gray_compl(n-1, k, !z);
        }
        else      // backward:
        {
            comb_gray_compl(n-1, k, !z);
            if ( k>0 )  { x[k] = n;  comb_gray_compl(n-1, k-1, z); }
        }
    }

A very efficient (revolving door) algorithm to generate the sets for the Gray code is given in [269]. An implementation following [215, alg.R, sect.7.2.1.3] is [FXT: class combination_revdoor in comb/combination-revdoor.h]. Usage of the class is shown in [FXT: comb/combination-revdoor-demo.cc]. The routine generates the combinations $\binom{32}{20}$ at a rate of about 115 M/s, the combinations $\binom{32}{12}$ are generated at a rate of 181 M/s.

An implementation geared for good performance for small values of k is given in [223], a C++ adaptation is [FXT: comb/combination-lam-demo.cc]. The combinations $\binom{32}{12}$ are generated at a rate of 190 M/s and the combinations $\binom{64}{7}$ at a rate of 250 M/s. The routine is limited to values k ≥ 2.

6.5 The Eades-McKay strong minimal-change order

In any Gray code order for combinations just one element is moved between successive combinations. When an element is moved across any other, there is more than one change on the set representation. If i elements are crossed, then i + 1 entries in the set change:

    set               delta set
    { 0, 1, 2, 3 }    1111..
    { 1, 2, 3, 4 }    .1111.

A strong minimal-change order is a Gray code where only one entry in the set representation is changed per step. That is, only zeros in the delta set representation are crossed; the moves are called homogeneous. One such order is the Eades-McKay sequence described in [134]. The Eades-McKay sequence for the combinations $\binom{7}{3}$ is shown in figure 6.5-A (left).
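Whether a given list of combinations is a strong minimal-change order can be tested by counting the entries that differ between successive set representations. The following sketch uses hypothetical function names and assumes that every combination is stored as a sorted array of the same length k.

```cpp
#include <cassert>
#include <vector>

typedef unsigned long ulong;

// Number of positions where the (sorted) set representations a[] and b[] differ.
ulong set_entries_changed(const std::vector<ulong> &a, const std::vector<ulong> &b)
{
    ulong ct = 0;
    for (ulong j=0; j<a.size(); ++j)  ct += ( a[j] != b[j] );
    return ct;
}

// Whether exactly one entry of the set representation changes with each step.
bool is_strong_gray(const std::vector< std::vector<ulong> > &L)
{
    for (ulong i=1; i<L.size(); ++i)
        if ( set_entries_changed(L[i-1], L[i]) != 1 )  return false;
    return true;
}
```

For the pair { 0, 1, 2, 3 } and { 1, 2, 3, 4 } the count is 4: the element 0 crosses i = 3 other elements, so i + 1 = 4 entries change. The step from { 0, 1, 2 } to { 0, 2, 3 } in the Gray code of figure 6.4-A changes two entries, so that order is not strong.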
6.5.1 Recursive generation

The Eades-McKay order can be generated with the program [FXT: comb/combination-emk-rec-demo.cc]:

    ulong *rv;  // elements in combination at rv[1] ... rv[k]

    void comb_emk(ulong n, ulong k, bool z)
    {
        if ( k==n )
        {
            for (ulong j=1; j<=k; ++j)  rv[j] = j;
            visit();
            return;
        }

        if ( z )  // forward:
        {
            if ( (n>=2) && (k>=2) )  { rv[k] = n; rv[k-1] = n-1;  comb_emk(n-2, k-2, z); }
            if ( (n>=2) && (k>=1) )  { rv[k] = n;  comb_emk(n-2, k-1, !z); }
            if ( (n>=1) )  { comb_emk(n-1, k, z); }
        }
        else      // backward:
        {
            if ( (n>=1) )  { comb_emk(n-1, k, z); }
            if ( (n>=2) && (k>=1) )  { rv[k] = n;  comb_emk(n-2, k-1, !z); }
            if ( (n>=2) && (k>=2) )  { rv[k] = n; rv[k-1] = n-1;  comb_emk(n-2, k-2, z); }
        }
    }

          Eades-McKay                   complemented Eades-McKay
      1:  { 4, 5, 6 }  ....111      1:  { 4, 5, 6 }  ....111
      2:  { 3, 5, 6 }  ...1.11      2:  { 3, 5, 6 }  ...1.11
      3:  { 2, 5, 6 }  ..1..11      3:  { 2, 5, 6 }  ..1..11
      4:  { 1, 5, 6 }  .1...11      4:  { 1, 5, 6 }  .1...11
      5:  { 0, 5, 6 }  1....11      5:  { 0, 5, 6 }  1....11
      6:  { 0, 1, 6 }  11....1      6:  { 0, 4, 6 }  1...1.1
      7:  { 0, 2, 6 }  1.1...1      7:  { 1, 4, 6 }  .1..1.1
      8:  { 1, 2, 6 }  .11...1      8:  { 2, 4, 6 }  ..1.1.1
      9:  { 1, 3, 6 }  .1.1..1      9:  { 3, 4, 6 }  ...11.1
     10:  { 0, 3, 6 }  1..1..1     10:  { 2, 3, 6 }  ..11..1
     11:  { 2, 3, 6 }  ..11..1     11:  { 1, 3, 6 }  .1.1..1
     12:  { 2, 4, 6 }  ..1.1.1     12:  { 0, 3, 6 }  1..1..1
     13:  { 1, 4, 6 }  .1..1.1     13:  { 0, 2, 6 }  1.1...1
     14:  { 0, 4, 6 }  1...1.1     14:  { 1, 2, 6 }  .11...1
     15:  { 3, 4, 6 }  ...11.1     15:  { 0, 1, 6 }  11....1
     16:  { 3, 4, 5 }  ...111.     16:  { 0, 1, 5 }  11...1.
     17:  { 2, 4, 5 }  ..1.11.     17:  { 0, 2, 5 }  1.1..1.
     18:  { 1, 4, 5 }  .1..11.     18:  { 1, 2, 5 }  .11..1.
     19:  { 0, 4, 5 }  1...11.     19:  { 2, 3, 5 }  ..11.1.
     20:  { 0, 1, 5 }  11...1.     20:  { 1, 3, 5 }  .1.1.1.
     21:  { 0, 2, 5 }  1.1..1.     21:  { 0, 3, 5 }  1..1.1.
     22:  { 1, 2, 5 }  .11..1.     22:  { 0, 4, 5 }  1...11.
     23:  { 1, 3, 5 }  .1.1.1.     23:  { 1, 4, 5 }  .1..11.
     24:  { 0, 3, 5 }  1..1.1.     24:  { 2, 4, 5 }  ..1.11.
     25:  { 2, 3, 5 }  ..11.1.     25:  { 3, 4, 5 }  ...111.
     26:  { 2, 3, 4 }  ..111..     26:  { 2, 3, 4 }  ..111..
     27:  { 1, 3, 4 }  .1.11..     27:  { 1, 3, 4 }  .1.11..
     28:  { 0, 3, 4 }  1..11..     28:  { 0, 3, 4 }  1..11..
     29:  { 0, 1, 4 }  11..1..     29:  { 0, 2, 4 }  1.1.1..
     30:  { 0, 2, 4 }  1.1.1..     30:  { 1, 2, 4 }  .11.1..
     31:  { 1, 2, 4 }  .11.1..     31:  { 0, 1, 4 }  11..1..
     32:  { 1, 2, 3 }  .111...     32:  { 0, 1, 3 }  11.1...
     33:  { 0, 2, 3 }  1.11...     33:  { 0, 2, 3 }  1.11...
     34:  { 0, 1, 3 }  11.1...     34:  { 1, 2, 3 }  .111...
     35:  { 0, 1, 2 }  111....     35:  { 0, 1, 2 }  111....

Figure 6.5-A: Combinations in Eades-McKay order (left) and complemented Eades-McKay order (right).

The combinations $\binom{32}{20}$ are generated at a rate of about 44 million per second, the combinations $\binom{32}{12}$ at a rate of 34 million per second.

The underlying recursion for the list E(n, k) of combinations $\binom{n}{k}$ is (notation as in relation 14.1-1 on page 304)

$$E(n,k) = \begin{matrix} [(n) \,.\, (n-1) \,.\, E(n-2,k-2)] \\ [(n) \,.\, E^R(n-2,k-1)] \\ [E(n-1,k)] \end{matrix} = \begin{matrix} [1\,1 \,.\, E(n-2,k-2)] \\ [1\,0 \,.\, E^R(n-2,k-1)] \\ [0 \,.\, E(n-1,k)] \end{matrix} \tag{6.5-1}$$

Again, the first equality is for the set representation, the second for the delta-set representation. Counting the elements on both sides gives the relation

$$\binom{n}{k} = \binom{n-2}{k-2} + \binom{n-2}{k-1} + \binom{n-1}{k} \tag{6.5-2}$$

which is an easy consequence of relation 6.1-3 on page 177. A recursion for the complemented sequence (with respect to the delta sets) is

$$E'(n,k) = \begin{matrix} [(n) \,.\, E'(n-1,k-1)] \\ [(n-1) \,.\, E'^R(n-2,k-1)] \\ [E'(n-2,k)] \end{matrix} = \begin{matrix} [1 \,.\, E'(n-1,k-1)] \\ [0\,1 \,.\, E'^R(n-2,k-1)] \\ [0\,0 \,.\, E'(n-2,k)] \end{matrix} \tag{6.5-3}$$

Counting on both sides gives

$$\binom{n}{k} = \binom{n-2}{k} + \binom{n-2}{k-1} + \binom{n-1}{k-1} \tag{6.5-4}$$

The condition for the recursion end has to be modified:

    void comb_emk_compl(ulong n, ulong k, bool z)
    {
        if ( (k==0) || (k==n) )
        {
            for (ulong j=1; j<=k; ++j)  rv[j] = j;
            ++ct;
            visit();
            return;
        }

        if ( z )  // forward:
        {
            if ( (n>=1) && (k>=1) )  { rv[k] = n;  comb_emk_compl(n-1, k-1, z); }      // 1
            if ( (n>=2) && (k>=1) )  { rv[k] = n-1;  comb_emk_compl(n-2, k-1, !z); }  // 01
            if ( (n>=2) )  { comb_emk_compl(n-2, k-0, z); }                             // 00
        }
        else      // backward:
        {
            if ( (n>=2) )  { comb_emk_compl(n-2, k-0, z); }                             // 00
            if ( (n>=2) && (k>=1) )  { rv[k] = n-1;  comb_emk_compl(n-2, k-1, !z); }  // 01
            if ( (n>=1) && (k>=1) )  { rv[k] = n;  comb_emk_compl(n-1, k-1, z); }      // 1
        }
    }

The complemented sequence is not a strong Gray code.

6.5.2 Iterative generation via modulo moves

An iterative algorithm for the Eades-McKay sequence is given in [FXT: class combination_emk in comb/combination-emk.h]:

    class combination_emk
    {
    public:
        ulong *x_;  // combination: k elements 0<=x[j] [...]

    [...] m )  u = 0; }
                else  { --u;  if ( u>m )  u = m; }
                u += sj;

                if ( u != a_[j] )  // next position != start position
                {
                    x_[j] = u;
                    s_[j+1] = u+1;
                    return j;
                }
            }
            a_[j] = x_[j];
        }
        return k_;  // current combination is last
    }
    };

The combinations $\binom{32}{20}$ are generated at a rate of about 60 million per second, the combinations $\binom{32}{12}$ at a rate of 85 million per second [FXT: comb/combination-emk-demo.cc].

6.5.3 Alternative order via modulo moves

A slight modification of the successor computation gives an ordering where the first and last combination differ by a single transposition (though not a homogeneous one), see figure 6.5-B.
The generator is given in [FXT: class combination_mod in comb/combination-mod.h]:

    class combination_mod
    {
      [--snip--]
        ulong next()
        {
          [--snip--]
            // modulo moves:
            // if ( 0==(j&1) )  // gives EMK
            if ( 0!=(j&1) )  // mod
          [--snip--]

The rate of generation is identical with the EMK order [FXT: comb/combination-mod-demo.cc].

          mod      EMK               mod      EMK
      1:  111....  111....       1:  1111...  1111...
      2:  11....1  11.1...       2:  111.1..  111...1
      3:  11...1.  11..1..       3:  111..1.  111..1.
      4:  11..1..  11...1.       4:  111...1  111.1..
      5:  11.1...  11....1       5:  11...11  11.11..
      6:  1.11...  1....11       6:  11..1.1  11.1..1
      7:  1.1...1  1...1.1       7:  11..11.  11.1.1.
      8:  1.1..1.  1...11.       8:  11.1.1.  11..11.
      9:  1.1.1..  1..1.1.       9:  11.1..1  11..1.1
     10:  1..11..  1..1..1      10:  11.11..  11...11
     11:  1..1..1  1..11..      11:  1.111..  1...111
     12:  1..1.1.  1.1.1..      12:  1.11.1.  1..1.11
     13:  1...11.  1.1..1.      13:  1.11..1  1..11.1
     14:  1...1.1  1.1...1      14:  1.1..11  1..111.
     15:  1....11  1.11...      15:  1.1.1.1  1.1.11.
     16:  ....111  .111...      16:  1.1.11.  1.1.1.1
     17:  ...1.11  .11.1..      17:  1..111.  1.1..11
     18:  ...11.1  .11..1.      18:  1..11.1  1.11..1
     19:  ...111.  .11...1      19:  1..1.11  1.11.1.
     20:  ..1.11.  .1...11      20:  1...111  1.111..
     21:  ..1.1.1  .1..1.1      21:  ...1111  .1111..
     22:  ..1..11  .1..11.      22:  ..1.111  .111..1
     23:  ..11..1  .1.1.1.      23:  ..11.11  .111.1.
     24:  ..11.1.  .1.1..1      24:  ..111.1  .11.11.
     25:  ..111..  .1.11..      25:  ..1111.  .11.1.1
     26:  .1.11..  ..111..      26:  .1.111.  .11..11
     27:  .1.1..1  ..11.1.      27:  .1.11.1  .1..111
     28:  .1.1.1.  ..11..1      28:  .1.1.11  .1.1.11
     29:  .1..11.  ..1..11      29:  .1..111  .1.11.1
     30:  .1..1.1  ..1.1.1      30:  .11..11  .1.111.
     31:  .1...11  ..1.11.      31:  .11.1.1  ..1111.
     32:  .11...1  ...111.      32:  .11.11.  ..111.1
     33:  .11..1.  ...11.1      33:  .111.1.  ..11.11
     34:  .11.1..  ...1.11      34:  .111..1  ..1.111
     35:  .111...  ....111      35:  .1111..  ...1111

Figure 6.5-B: All combinations $\binom{7}{3}$ (left) and $\binom{7}{4}$ (right) in mod order and EMK order.

6.6 Two-close orderings via endo/enup moves

6.6.1 The endo and enup orderings for numbers

The endo order of the set {0, 1, 2, ..., m} is obtained by writing all odd numbers of the set in increasing order followed by all even numbers in decreasing order: {1, 3, 5, ..., 6, 4, 2, 0}. The term endo stands for 'Even Numbers DOwn, odd numbers up'.

     m   endo sequence            m   enup sequence
     1:  1 0                      1:  0 1
     2:  1 2 0                    2:  0 2 1
     3:  1 3 2 0                  3:  0 2 3 1
     4:  1 3 4 2 0                4:  0 2 4 3 1
     5:  1 3 5 4 2 0              5:  0 2 4 5 3 1
     6:  1 3 5 6 4 2 0            6:  0 2 4 6 5 3 1
     7:  1 3 5 7 6 4 2 0          7:  0 2 4 6 7 5 3 1
     8:  1 3 5 7 8 6 4 2 0        8:  0 2 4 6 8 7 5 3 1
     9:  1 3 5 7 9 8 6 4 2 0      9:  0 2 4 6 8 9 7 5 3 1

Figure 6.6-A: The endo (left) and enup (right) orderings with maximal value m.

A routine for generating the successor in endo order with maximal value m is [FXT: comb/endo-enup.h]:

    inline ulong next_endo(ulong x, ulong m)
    // Return next number in endo order
    {
        if ( x & 1 )  // x odd
        {
            x += 2;
            if ( x>m )  x = m - (m&1);  // == max even <= m
        }
        else  // x even
        {
            x = ( x==0 ? 1 : x-2 );
        }
        return x;
    }

The sequences for the first few m are shown in figure 6.6-A. The routine computes one for the input zero.

An ordering starting with the even numbers in increasing order will be called enup (for 'Even Numbers UP, odd numbers down'). The computation of the successor can be implemented as

    static inline ulong next_enup(ulong x, ulong m)
    {
        if ( x & 1 )  // x odd
        {
            x = ( x==1 ? 0 : x-2 );
        }
        else  // x even
        {
            x += 2;
            if ( x>m )  x = m - !(m&1);  // max odd <=m
        }
        return x;
    }

The orderings are reversals of each other, so we define:

    static inline ulong prev_endo(ulong x, ulong m)  { return next_enup(x, m); }
    static inline ulong prev_enup(ulong x, ulong m)  { return next_endo(x, m); }

A function that returns the x-th number in enup order with maximal digit m is

    static inline ulong enup_num(ulong x, ulong m)
    {
        ulong r = 2*x;
        if ( r>m )  r = 2*m+1 - r;
        return r;
    }

The function will only work if x ≤ m.
For example, with m = 5:

    x:  0 1 2 3 4 5
    r:  0 2 4 5 3 1

The inverse function is

    static inline ulong enup_idx(ulong x, ulong m)
    {
        const ulong b = x & 1;
        x >>= 1;
        return ( b ? m-x : x );
    }

The function to map into endo order is

    static inline ulong endo_num(ulong x, ulong m)
    {
        // return enup_num(m-x, m);
        x = m - x;
        ulong r = 2*x;
        if ( r>m )  r = 2*m+1 - r;
        return r;
    }

For example, with m = 5:

    x:  0 1 2 3 4 5
    r:  1 3 5 4 2 0

Its inverse is

    static inline ulong endo_idx(ulong x, ulong m)
    {
        const ulong b = x & 1;
        x >>= 1;
        return ( b ? x : m-x );
    }

6.6.2 The endo and enup orderings for combinations

Two strong minimal-change orderings for combinations can be obtained via moves in enup and endo order. Figure 6.6-B shows an ordering where the moves to the right are on even positions (enup order, left). If the moves to the right are on odd positions (endo order), then Chase’s sequence is obtained (right). Both orderings are two-close: an element in the delta set moves by at most two positions (and the move is homogeneous, that is, no other element is crossed).

An implementation of an iterative algorithm for the computation of the combinations in enup order is [FXT: class combination_enup in comb/combination-enup.h].
1 2 class combination_enup { 6.6: Two-close orderings via endo/enup moves enup moves 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47: 48: 49: 50: 51: 52: 53: 54: 55: 56: { 0, 1, 2 } { 0, 1, 4 } { 0, 1, 6 } { 0, 1, 7 } { 0, 1, 5 } { 0, 1, 3 } { 0, 2, 3 } { 0, 2, 4 } { 0, 2, 6 } { 0, 2, 7 } { 0, 2, 5 } { 0, 4, 5 } { 0, 4, 6 } { 0, 4, 7 } { 0, 6, 7 } { 0, 5, 7 } { 0, 5, 6 } { 0, 3, 6 } { 0, 3, 7 } { 0, 3, 5 } { 0, 3, 4 } { 2, 3, 4 } { 2, 3, 6 } { 2, 3, 7 } { 2, 3, 5 } { 2, 4, 5 } { 2, 4, 6 } { 2, 4, 7 } { 2, 6, 7 } { 2, 5, 7 } { 2, 5, 6 } { 4, 5, 6 } { 4, 5, 7 } { 4, 6, 7 } { 5, 6, 7 } { 3, 6, 7 } { 3, 5, 7 } { 3, 5, 6 } { 3, 4, 6 } { 3, 4, 7 } { 3, 4, 5 } { 1, 4, 5 } { 1, 4, 6 } { 1, 4, 7 } { 1, 6, 7 } { 1, 5, 7 } { 1, 5, 6 } { 1, 3, 6 } { 1, 3, 7 } { 1, 3, 5 } { 1, 3, 4 } { 1, 2, 4 } { 1, 2, 6 } { 1, 2, 7 } { 1, 2, 5 } { 1, 2, 3 } 189 endo moves 111..... 1: { 0, 1, 2 } 111..... 11..1... 2: { 0, 1, 3 } 11.1.... 11....1. 3: { 0, 1, 5 } 11...1.. 11.....1 4: { 0, 1, 7 } 11.....1 11...1.. 5: { 0, 1, 6 } 11....1. 11.1.... 6: { 0, 1, 4 } 11..1... 1.11.... 7: { 0, 3, 4 } 1..11... 1.1.1... 8: { 0, 3, 5 } 1..1.1.. 1.1...1. 9: { 0, 3, 7 } 1..1...1 1.1....1 10: { 0, 3, 6 } 1..1..1. 1.1..1.. 11: { 0, 5, 6 } 1....11. 1...11.. 12: { 0, 5, 7 } 1....1.1 1...1.1. 13: { 0, 6, 7 } 1.....11 1...1..1 14: { 0, 4, 7 } 1...1..1 1.....11 15: { 0, 4, 6 } 1...1.1. 1....1.1 16: { 0, 4, 5 } 1...11.. 1....11. 17: { 0, 2, 5 } 1.1..1.. 1..1..1. 18: { 0, 2, 7 } 1.1....1 1..1...1 19: { 0, 2, 6 } 1.1...1. 1..1.1.. 20: { 0, 2, 4 } 1.1.1... 1..11... 21: { 0, 2, 3 } 1.11.... ..111... 22: { 1, 2, 3 } .111.... ..11..1. 23: { 1, 2, 5 } .11..1.. ..11...1 24: { 1, 2, 7 } .11....1 ..11.1.. 25: { 1, 2, 6 } .11...1. ..1.11.. 26: { 1, 2, 4 } .11.1... ..1.1.1. 27: { 1, 3, 4 } .1.11... ..1.1..1 28: { 1, 3, 5 } .1.1.1.. ..1...11 29: { 1, 3, 7 } .1.1...1 ..1..1.1 30: { 1, 3, 6 } .1.1..1. 
..1..11. 31: { 1, 5, 6 } .1...11. ....111. 32: { 1, 5, 7 } .1...1.1 ....11.1 33: { 1, 6, 7 } .1....11 ....1.11 34: { 1, 4, 7 } .1..1..1 .....111 35: { 1, 4, 6 } .1..1.1. ...1..11 36: { 1, 4, 5 } .1..11.. ...1.1.1 37: { 3, 4, 5 } ...111.. ...1.11. 38: { 3, 4, 7 } ...11..1 ...11.1. 39: { 3, 4, 6 } ...11.1. ...11..1 40: { 3, 5, 6 } ...1.11. ...111.. 41: { 3, 5, 7 } ...1.1.1 .1..11.. 42: { 3, 6, 7 } ...1..11 .1..1.1. 43: { 5, 6, 7 } .....111 .1..1..1 44: { 4, 6, 7 } ....1.11 .1....11 45: { 4, 5, 7 } ....11.1 .1...1.1 46: { 4, 5, 6 } ....111. .1...11. 47: { 2, 5, 6 } ..1..11. .1.1..1. 48: { 2, 5, 7 } ..1..1.1 .1.1...1 49: { 2, 6, 7 } ..1...11 .1.1.1.. 50: { 2, 4, 7 } ..1.1..1 .1.11... 51: { 2, 4, 6 } ..1.1.1. .11.1... 52: { 2, 4, 5 } ..1.11.. .11...1. 53: { 2, 3, 5 } ..11.1.. .11....1 54: { 2, 3, 7 } ..11...1 .11..1.. 55: { 2, 3, 6 } ..11..1. .111.... 56: { 2, 3, 4 } ..111...  Figure 6.6-B: Combinations 83 via enup moves (left) and via endo moves (Chase’s sequence, right). 190 Chapter 6: Combinations 3 4 5 6 7 public: ulong *x_; // combination: k elements 0<=x[j]=2) && (k>=2) ) if ( (n>=2) && (k>=1) ) if ( (n>=1) ) } else // backward: { if ( (n>=1) ) if ( (n>=2) && (k>=1) ) if ( (n>=2) && (k>=2) ) } { rv[k] = n; rv[k-1] = n-1; comb_enup(n-2, k-2, z); } { rv[k] = n; comb_enup(n-2, k-1, z); } { comb_enup(n-1, k, !z); } { comb_enup(n-1, k, !z); } { rv[k] = n; comb_enup(n-2, k-1, z); } { rv[k] = n; rv[k-1] = n-1; comb_enup(n-2, k-2, z); } } A recursion for the complemented sequence (with respect to the delta sets) is R 0 U (n, k) = R [(n) . U 0 (n − 1, k − 1) ] [1 . U 0 (n − 1, k − 1)] 0 [(n − 1) . U (n − 2, k − 1)] = [0 1 . U 0 (n − 2, k − 1)] [U 0 (n − 2, k) ] [0 0 . 
U 0 (n − 2, k) ] (6.6-2) The condition for the recursion end has to be modified: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 void comb_enup_compl(ulong n, ulong k, bool z) { if ( (k==0) || (k==n) ) { visit(); return; } if ( z ) // forward: { if ( (n>=1) && (k>=1) ) if ( (n>=2) && (k>=1) ) if ( (n>=2) ) } else // backward: { if ( (n>=2) ) if ( (n>=2) && (k>=1) ) if ( (n>=1) && (k>=1) ) } { rv[k] = n; comb_enup_compl(n-1, k-1, !z); } { rv[k] = n-1; comb_enup_compl(n-2, k-1, z); } { comb_enup_compl(n-2, k-0, z); } // 1 // 01 // 00 { comb_enup_compl(n-2, k-0, z); } { rv[k] = n-1; comb_enup_compl(n-2, k-1, z); } { rv[k] = n; comb_enup_compl(n-1, k-1, !z); } // 00 // 01 // 1 } An algorithm for Chase’s sequence that generates delta sets is described in [215, alg.C, sect.7.2.1.3], an The routine implementation is given in [FXT: class combination chase in comb/combination-chase.h].   32 generates about 80 million combinations per second for both 32 20 and 12 [FXT: comb/combinationchase-demo.cc]. 6.7 Recursive generation of certain orderings We give a simple recursive routine to generate the orders shown in figure 6.7-A. The combinations are generated as sets [FXT: class comb rec in comb/combination-rec.h]: 1 2 class comb_rec { 192 Chapter 6: Combinations 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: lexicographic Gray code compl. enup 111.... 11.1... 11..1.. 11...1. 11....1 1.11... 1.1.1.. 1.1..1. 1.1...1 1..11.. 1..1.1. 1..1..1 1...11. 1...1.1 1....11 .111... .11.1.. .11..1. .11...1 .1.11.. .1.1.1. .1.1..1 .1..11. .1..1.1 .1...11 ..111.. ..11.1. ..11..1 ..1.11. ..1.1.1 ..1..11 ...111. ...11.1 ...1.11 ....111 1....11 1...11. 1...1.1 1..11.. 1..1.1. 1..1..1 1.11... 1.1.1.. 1.1..1. 1.1...1 111.... 11.1... 11..1.. 11...1. 11....1 .1...11 .1..11. .1..1.1 .1.11.. .1.1.1. .1.1..1 .111... .11.1.. .11..1. .11...1 ..1..11 ..1.11. ..1.1.1 ..111.. ..11.1. ..11..1 ...1.11 ...111. 
...11.1 ....111 1....11 1...1.1 1...11. 1..11.. 1..1.1. 1..1..1 1.1...1 1.1..1. 1.1.1.. 1.11... 111.... 11.1... 11..1.. 11...1. 11....1 .11...1 .11..1. .11.1.. .111... .1.11.. .1.1.1. .1.1..1 .1..1.1 .1..11. .1...11 ..1..11 ..1.1.1 ..1.11. ..111.. ..11.1. ..11..1 ...11.1 ...111. ...1.11 ....111 compl. Eades-McKay 111.... 11.1... 11..1.. 11...1. 11....1 1.1...1 1.1..1. 1.1.1.. 1.11... 1..11.. 1..1.1. 1..1..1 1...1.1 1...11. 1....11 .1...11 .1..1.1 .1..11. .1.11.. .1.1.1. .1.1..1 .11...1 .11..1. .11.1.. .111... ..111.. ..11.1. ..11..1 ..1.1.1 ..1.11. ..1..11 ...1.11 ...11.1 ...111. ....111  Figure 6.7-A: All combinations 73 in lexicographic, minimal-change, complemented enup, and complemented Eades-McKay order (from left to right). 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 public: ulong n_, k_; // (n choose k) ulong *rv_; // combination: k elements 0<=x[j] lexicographic order // 1 ==> Gray code // 2 ==> complemented enup order // 3 ==> complemented Eades-McKay sequence ulong nq_; // whether to reverse order [--snip--] void (*visit_)(const comb_rec &); // function to call with each combination [--snip--] void generate(void (*visit)(const comb_rec &), ulong rq, ulong nq=0) { visit_ = visit; rq_ = rq; nq_ = nq; ct_ = 0; rct_ = 0; next_rec(0); } The recursion function is given in [FXT: comb/combination-rec.cc]: 1 2 3 4 5 6 7 8 9 void comb_rec::next_rec(ulong d) { ulong r = k_ - d; // number of elements remaining if ( 0==r ) visit_(*this); else { ulong rv1 = rv_[d-1]; // left neighbor bool q; switch ( rq_ ) 6.7: Recursive generation of certain orderings 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 { case 0: q = 1; break; case 1: q = !(d&1); break; case 2: q = rv1&1; break; case 3: q = (d^rv1)&1; break; default: q = 1; } q ^= nq_; // reversed order if 193 // // // // 0 ==> lexicographic order 1 ==> Gray code 2 ==> complemented enup order 3 ==> complemented Eades-McKay sequence nq == true if ( q ) // forward: for (ulong x=rv1+1; x<=n_-r; ++x) { rv_[d] = 
x; next_rec(d+1); } else // backward: for (ulong x=n_-r; (long)x>=(long)rv1+1; --x) { rv_[d] = x; next_rec(d+1); } } } Figure 6.7-A was created  with the program [FXT: comb/combination-rec-demo.cc]. The routine generates  32 the combinations 32 20 at a rate of about 35 million objects per second. The combinations 12 are generated at a rate of 64 million objects per second. 194 Chapter 7: Compositions Chapter 7 Compositions The compositions of n into at most k parts are the ordered tuples (x0 , x1 , . . . , xk−1 ) where x0 + x1 + . . . + xk−1 = n and 0 ≤ xi ≤ n. Order matters: one 4-composition of 7 is (0, 1, 5, 1), different ones are (5, 0, 1, 1) and (0, 5, 1, 1). The compositions of n into at most k parts are also called ‘k-compositions of n’. To obtain the compositions of n into exactly k parts (where k ≤ n) generate the compositions of n − k into k parts and add one to each position. 7.1 Co-lexicographic order 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: composition chg [ 3 . . . . ] 4 [ 2 1 . . . ] 1 [ 1 2 . . . ] 1 [ . 3 . . . ] 1 [ 2 . 1 . . ] 2 [ 1 1 1 . . ] 1 [ . 2 1 . . ] 1 [ 1 . 2 . . ] 2 [ . 1 2 . . ] 1 [ . . 3 . . ] 2 [ 2 . . 1 . ] 3 [ 1 1 . 1 . ] 1 [ . 2 . 1 . ] 1 [ 1 . 1 1 . ] 2 [ . 1 1 1 . ] 1 [ . . 2 1 . ] 2 [ 1 . . 2 . ] 3 [ . 1 . 2 . ] 1 [ . . 1 2 . ] 2 [ . . . 3 . ] 3 [ 2 . . . 1 ] 4 [ 1 1 . . 1 ] 1 [ . 2 . . 1 ] 1 [ 1 . 1 . 1 ] 2 [ . 1 1 . 1 ] 1 [ . . 2 . 1 ] 2 [ 1 . . 1 1 ] 3 [ . 1 . 1 1 ] 1 [ . . 1 1 1 ] 2 [ . . . 2 1 ] 3 [ 1 . . . 2 ] 4 [ . 1 . . 2 ] 1 [ . . 1 . 2 ] 2 [ . . . 1 2 ] 3 [ . . . . 3 ] 4 combination 111.... 11.1... 1.11... .111... 11..1.. 1.1.1.. .11.1.. 1..11.. .1.11.. ..111.. 11...1. 1.1..1. .11..1. 1..1.1. .1.1.1. ..11.1. 1...11. .1..11. ..1.11. ...111. 
11....1 1.1...1 .11...1 1..1..1 .1.1..1 ..11..1 1...1.1 .1..1.1 ..1.1.1 ...11.1 1....11 .1...11 ..1..11 ...1.11 ....111 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: composition chg [ 7 . . ] 2 [ 6 1 . ] 1 [ 5 2 . ] 1 [ 4 3 . ] 1 [ 3 4 . ] 1 [ 2 5 . ] 1 [ 1 6 . ] 1 [ . 7 . ] 1 [ 6 . 1 ] 2 [ 5 1 1 ] 1 [ 4 2 1 ] 1 [ 3 3 1 ] 1 [ 2 4 1 ] 1 [ 1 5 1 ] 1 [ . 6 1 ] 1 [ 5 . 2 ] 2 [ 4 1 2 ] 1 [ 3 2 2 ] 1 [ 2 3 2 ] 1 [ 1 4 2 ] 1 [ . 5 2 ] 1 [ 4 . 3 ] 2 [ 3 1 3 ] 1 [ 2 2 3 ] 1 [ 1 3 3 ] 1 [ . 4 3 ] 1 [ 3 . 4 ] 2 [ 2 1 4 ] 1 [ 1 2 4 ] 1 [ . 3 4 ] 1 [ 2 . 5 ] 2 [ 1 1 5 ] 1 [ . 2 5 ] 1 [ 1 . 6 ] 2 [ . 1 6 ] 1 [ . . 7 ] 2 combination 1111111.. 111111.1. 11111.11. 1111.111. 111.1111. 11.11111. 1.111111. .1111111. 111111..1 11111.1.1 1111.11.1 111.111.1 11.1111.1 1.11111.1 .111111.1 11111..11 1111.1.11 111.11.11 11.111.11 1.1111.11 .11111.11 1111..111 111.1.111 11.11.111 1.111.111 .1111.111 111..1111 11.1.1111 1.11.1111 .111.1111 11..11111 1.1.11111 .11.11111 1..111111 .1.111111 ..1111111 Figure 7.1-A: The compositions of 3 into 5 parts in co-lexicographic order, positions of the rightmost change, and delta sets of the corresponding combinations (left); and the corresponding data for compositions of 7 into 3 parts (right). Dots denote zeros. 7.1: Co-lexicographic order 195 The compositions in co-lexicographic (colex) order are shown in figure 7.1-A. 
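The successor rule that can be read off figure 7.1-A is: find the first nonzero part, move one unit of it one position to the right, and put the rest of that part back into position zero. The following is a minimal sketch of that rule, not the FXT class; the names compositions_colex() and delta_set() are made up for this illustration, and delta_set() assumes (as the figure suggests) that the parts are written as runs of ones separated by single zeros.

```cpp
#include <cassert>
#include <string>
#include <vector>

using ulong = unsigned long;

// All compositions of n into at most k parts (k >= 1) in co-lexicographic order.
static std::vector< std::vector<ulong> > compositions_colex(ulong n, ulong k)
{
    std::vector< std::vector<ulong> > all;
    std::vector<ulong> x(k, 0);
    x[0] = n;  // first composition: all in first position
    while ( true )
    {
        all.push_back(x);
        ulong j = 0;
        while ( j<k && x[j]==0 )  ++j;  // first nonzero position
        if ( j+1>=k )  break;  // all of n is in the last position (or n==0): done
        const ulong v = x[j];
        x[j] = 0;
        x[0] = v - 1;  // remainder goes back to position zero
        ++x[j+1];      // one unit moves to the right
    }
    return all;
}

// Delta set of the corresponding combination: x[0] ones, a dot, x[1] ones, a dot, ...
static std::string delta_set(const std::vector<ulong> &x)
{
    std::string s;
    for (std::size_t j=0; j<x.size(); ++j)
    {
        s.append(x[j], '1');
        if ( j+1 < x.size() )  s += '.';
    }
    return s;
}
```

With n = 3 and k = 5 the list has 35 entries, matching the left half of the figure; the position j+1 touched last is the value shown in the column ‘chg’.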
The generator is implemented as [FXT: class composition colex in comb/composition-colex.h]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 class composition_colex { public: ulong n_, k_; // composition of n into k parts ulong *x_; // data (k elements) [--snip--] void first() { x_[0] = n_; // all in first position for (ulong k=1; k=k { n_ = n; k_ = k; nk1_ = n - k + 1; // must be >= 1 if ( (long)nk1_ < 1 ) nk1_ = 1; // avoid hang with invalid pair n,k x_ = new ulong[k_ + 1]; x_[k] = 0; // not one first(); } [--snip--] The variable nk1_ is the maximal entry in the compositions: 1 2 3 4 5 6 7 8 9 10 11 void first() { x_[0] = nk1_; // all in first position for (ulong k=1; k0 { ulong k = N-K+1; for (ulong z=0; z=N ) return; 8.4: Shifts-order for subsets 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: .....1 ....1. ...1.. ..1... .1.... 1..... 1....1 .1...1 1...1. 1...11 ..1..1 .1..1. 1..1.. 1..1.1 .1..11 1..11. 1 1 1 1 1 1 2 2 2 3 2 2 2 3 3 3 209 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 1..111 ...1.1 ..1.1. .1.1.. 1.1... 1.1..1 .1.1.1 1.1.1. 1.1.11 ..1.11 .1.11. 1.11.. 1.11.1 .1.111 1.111. 1.1111 4 2 2 2 2 3 3 3 4 3 3 3 4 4 4 5 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47: 48: ....11 ...11. ..11.. .11... 11.... 11...1 .11..1 11..1. 11..11 ..11.1 .11.1. 11.1.. 11.1.1 .11.11 11.11. 11.111 2 2 2 2 2 3 3 3 4 3 3 3 4 4 4 5 49: 50: 51: 52: 53: 54: 55: 56: 57: 58: 59: 60: 61: 62: 63: ...111 ..111. .111.. 111... 111..1 .111.1 111.1. 111.11 ..1111 .1111. 1111.. 1111.1 .11111 11111. 111111 3 3 3 3 4 4 4 5 4 4 4 5 5 5 6 Figure 8.4-A: Nonempty subsets of a 6-bit binary word where all linear shifts of a word appear in succession (shifts-order). All shifts are left shifts. 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: .....1 ....1. ...1.. ..1... .1.... 1..... 1....1 1...11 1...1. .1...1 .1..11 1..11. 1..111 1..1.1 1..1.. .1..1. 
1 1 1 1 1 1 2 3 2 2 3 3 4 3 2 2 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: ..1..1 ..1.11 .1.11. 1.11.. 1.11.1 1.1111 1.111. .1.111 .1.1.1 1.1.1. 1.1.11 1.1..1 1.1... .1.1.. ..1.1. ...1.1 2 3 3 3 4 5 4 4 3 3 4 3 2 2 2 2 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47: 48: ...111 ..111. .111.. 111... 111..1 111.11 111.1. .111.1 .11111 11111. 111111 1111.1 1111.. .1111. ..1111 ..11.1 3 3 3 3 4 5 4 4 5 5 6 5 4 4 4 3 49: 50: 51: 52: 53: 54: 55: 56: 57: 58: 59: 60: 61: 62: 63: .11.1. 11.1.. 11.1.1 11.111 11.11. .11.11 .11..1 11..1. 11..11 11...1 11.... .11... ..11.. ...11. ....11 3 3 4 5 4 4 3 3 4 3 2 2 2 2 2 Figure 8.4-B: Nonempty subsets of a 6-bit binary word where all linear shifts of a word appear in succession and transitions that are not shifts switch just one bit (minimal-change shifts-order). 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: .......1 ......1. .....1.. ....1... ...1.... ..1..... .1...... 1....... 1......1 .1.....1 1.....1. ..1....1 .1....1. 1....1.. 1....1.1 ...1...1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 3 2 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: ..1...1. .1...1.. 1...1... 1...1..1 .1...1.1 1...1.1. ....1..1 ...1..1. ..1..1.. .1..1... 1..1.... 1..1...1 .1..1..1 1..1..1. ..1..1.1 .1..1.1. 2 2 2 3 3 3 2 2 2 2 2 3 3 3 3 3 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47: 48: 1..1.1.. 1..1.1.1 .....1.1 ....1.1. ...1.1.. ..1.1... .1.1.... 1.1..... 1.1....1 .1.1...1 1.1...1. ..1.1..1 .1.1..1. 1.1..1.. 1.1..1.1 ...1.1.1 3 4 2 2 2 2 2 2 3 3 3 3 3 3 4 3 49: 50: 51: 52: 53: 54: ..1.1.1. .1.1.1.. 1.1.1... 1.1.1..1 .1.1.1.1 1.1.1.1. 3 3 3 4 4 4 Figure 8.4-C: Nonzero Fibonacci words in an order where all shifts appear in succession. 7 8 9 10 visit(x); A(2*x); A(2*x+1); } The function visit() prints the binary expansion of its argument. The initial call is A(1). 
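The recursion can be tried out with the following sketch. It assumes that A() takes the word x as its only argument and that N = 2^n is a global constant in the original; here N and an output list are passed explicitly instead, and the wrapper shifts_order() is a hypothetical helper.

```cpp
#include <cassert>
#include <vector>

using ulong = unsigned long;

// Shifts-order recursion; out.push_back(x) stands in for visit(x).
static void A(ulong x, ulong N, std::vector<ulong> &out)
{
    if ( x>=N )  return;   // word does not fit into n bits
    out.push_back(x);
    A(2*x, N, out);        // left shift
    A(2*x+1, N, out);      // left shift, then set the lowest bit
}

// Hypothetical wrapper: all nonempty subsets of an n-bit word in shifts-order.
static std::vector<ulong> shifts_order(ulong n)
{
    std::vector<ulong> out;
    A(1, 1UL<<n, out);
    return out;
}
```

For n = 6 the list starts with 1, 2, 4, 8, 16, 32, 33, 17, 34, 35, which are the first ten words of figure 8.4-A read as binary numbers.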
The transitions that are not shifts change just one bit if the following pair of functions is used for the recursion (minimal-change shifts-order shown in figure 8.4-B): 1 2 3 4 5 6 void F(ulong x) { if ( x>=N ) visit(x); F(2*x); G(2*x+1); return; 210 7 8 9 10 11 12 13 14 15 Chapter 8: Subsets } void G(ulong x) { if ( x>=N ) F(2*x+1); G(2*x); visit(x); } return; The initial call is F(1), the reversed order can be generated via G(1). A simple variation can be used to generate the Fibonacci words in a shifts-order shown in figure 8.4-C. With transitions that are not shifts more than one bit is changed in general. The function used is [FXT: comb/shift-subsets-demo.cc]: 1 2 3 4 5 6 7 void B(ulong x) { if ( x>=N ) visit(x); B(2*x); B(4*x+1); } return; A bit-level algorithm for combinations in shifts-order is given in section 1.24.3 on page 64. 8.5 k-subsets where k lies in a given range We give algorithms for generating all k-subsets of the n-set where k lies in the range kmin ≤ k ≤ kmax . If kmin = 0 and kmax = n, we generate all subsets. If kmin = kmax = k, we get the k-combinations of n. 8.5.1 Recursive algorithm A generator for all k-subsets where k lies in a prescribed range is [FXT: class ksubset rec in comb/ksubset-rec.h]. The used algorithm can generate the subsets in 16 different orders. Figure 8.5A shows the lexicographic orders, figure 8.5-B shows three Gray codes. The constructor has just one argument, the number of elements of the set whose subsets are generated: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 class ksubset_rec // k-subsets where kmin<=k<=kmax in various orders. // Recursive CAT algorithm. 
{ public: long n_; // subsets of a n-element set long kmin_, kmax_; // k-subsets where kmin<=k<=kma long *rv_; // record of visits in graph (list of elements in subset) ulong ct_; // count subsets ulong rct_; // count recursions (==work) ulong rq_; // condition that determines the order ulong pq_; // condition that determines the (printing) order ulong nq_; // whether to reverse order // function to call with each combination: void (*visit_)(const ksubset_rec &, long); public: ksubset_rec(ulong n) { n_ = n; rv_ = new long[n_+1]; ++rv_; rv_[-1] = -1UL; } ~ksubset_rec() { --rv_; delete [] rv_; } One has to supply the interval for k (variables kmin and kmax) and a function that will be called with each subset. The argument rq determines which of the sixteen different orderings is chosen, the order 8.5: k-subsets where k lies in a given range 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: order #0: 11.... ...... 111... ..P... 11.1.. ..MP.. 11..1. ...MP. 11...1 ....MP 1.1... .MP..M 1.11.. ...P.. 1.1.1. ...MP. 1.1..1 ....MP 1..1.. ..MP.M 1..11. ....P. 1..1.1 ....MP 1...1. ...MPM 1...11 .....P 1....1 ....M. .11... MPP..M .111.. ...P.. .11.1. ...MP. .11..1 ....MP .1.1.. ..MP.M .1.11. ....P. .1.1.1 ....MP .1..1. ...MPM .1..11 .....P .1...1 ....M. ..11.. .MPP.M ..111. ....P. ..11.1 ....MP ..1.1. ...MPM ..1.11 .....P ..1..1 ....M. ...11. ..MPPM ...111 .....P ...1.1 ....M. ....11 ...MP. { 0, 1 } { 0, 1, 2 } { 0, 1, 3 } { 0, 1, 4 } { 0, 1, 5 } { 0, 2 } { 0, 2, 3 } { 0, 2, 4 } { 0, 2, 5 } { 0, 3 } { 0, 3, 4 } { 0, 3, 5 } { 0, 4 } { 0, 4, 5 } { 0, 5 } { 1, 2 } { 1, 2, 3 } { 1, 2, 4 } { 1, 2, 5 } { 1, 3 } { 1, 3, 4 } { 1, 3, 5 } { 1, 4 } { 1, 4, 5 } { 1, 5 } { 2, 3 } { 2, 3, 4 } { 2, 3, 5 } { 2, 4 } { 2, 4, 5 } { 2, 5 } { 3, 4 } { 3, 4, 5 } { 3, 5 } { 4, 5 } 211 order #8: 111... ...... 11.1.. ..MP.. 11..1. ...MP. 11...1 ....MP 11.... .....M 1.11.. .MPP.. 1.1.1. ...MP. 1.1..1 ....MP 1.1... 
.....M 1..11. ..MPP. 1..1.1 ....MP 1..1.. .....M 1...11 ...MPP 1...1. .....M 1....1 ....MP .111.. MPPP.M .11.1. ...MP. .11..1 ....MP .11... .....M .1.11. ..MPP. .1.1.1 ....MP .1.1.. .....M .1..11 ...MPP .1..1. .....M .1...1 ....MP ..111. .MPPPM ..11.1 ....MP ..11.. .....M ..1.11 ...MPP ..1.1. .....M ..1..1 ....MP ...111 ..MPP. ...11. .....M ...1.1 ....MP ....11 ...MP. { 0, 1, 2 } { 0, 1, 3 } { 0, 1, 4 } { 0, 1, 5 } { 0, 1 } { 0, 2, 3 } { 0, 2, 4 } { 0, 2, 5 } { 0, 2 } { 0, 3, 4 } { 0, 3, 5 } { 0, 3 } { 0, 4, 5 } { 0, 4 } { 0, 5 } { 1, 2, 3 } { 1, 2, 4 } { 1, 2, 5 } { 1, 2 } { 1, 3, 4 } { 1, 3, 5 } { 1, 3 } { 1, 4, 5 } { 1, 4 } { 1, 5 } { 2, 3, 4 } { 2, 3, 5 } { 2, 3 } { 2, 4, 5 } { 2, 4 } { 2, 5 } { 3, 4, 5 } { 3, 4 } { 3, 5 } { 4, 5 } Figure 8.5-A: The k-subsets (where 2 ≤ k ≤ 3) of a 6-element set. Lexicographic order for sets (left) and reversed lexicographic order for delta sets (right). can be reversed with nonzero nq. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 void generate(void (*visit)(const ksubset_rec &, long), long kmin, long kmax, ulong rq, ulong nq=0) { ct_ = 0; rct_ = 0; kmin_ = kmin; kmax_ = kmax; if ( kmin_ > kmax_ ) swap2(kmin_, kmax_); if ( kmax_ > n_ ) kmax_ = n_; if ( kmin_ > n_ ) kmin_ = n_; visit_ = visit; rq_ = rq % 4; pq_ = (rq>>2) % 4; nq_ = nq; next_rec(0); } private: void next_rec(long d); }; The recursive routine itself is given in [FXT: comb/ksubset-rec.cc]: 1 2 3 4 void ksubset_rec::next_rec(long d) { if ( d>kmax_ ) return; 212 Chapter 8: Subsets 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: order #6: 1....1 ...... 1...11 ....P. 1...1. .....M 1..1.. ...PM. 1..11. ....P. 1..1.1 ....MP 1.1..1 ..PM.. 1.1.1. ....PM 1.11.. ...PM. 1.1... ...M.. 11.... .PM... 111... ..P... 11.1.. ..MP.. 11..1. ...MP. 11...1 ....MP .11..1 M.P... .11.1. ....PM .111.. ...PM. .11... ...M.. .1.1.. ..MP.. .1.11. ....P. .1.1.1 ....MP .1..11 ...MP. .1..1. 
.....M .1...1 ....MP ..1..1 .MP... ..1.11 ....P. ..1.1. .....M ..11.. ...PM. ..111. ....P. ..11.1 ....MP ...111 ..M.P. ...11. .....M ...1.1 ....MP ....11 ...MP. order #7: 11.... ...... 111... ..P... 11.1.. ..MP.. 11..1. ...MP. 11...1 ....MP 1.1..1 .MP... 1.1.1. ....PM 1.11.. ...PM. 1.1... ...M.. 1..1.. ..MP.. 1..11. ....P. 1..1.1 ....MP 1...11 ...MP. 1...1. .....M 1....1 ....MP .1...1 MP.... .1..11 ....P. .1..1. .....M .1.1.. ...PM. .1.11. ....P. .1.1.1 ....MP .11..1 ..PM.. .11.1. ....PM .111.. ...PM. .11... ...M.. ..11.. .M.P.. ..111. ....P. ..11.1 ....MP ..1.11 ...MP. ..1.1. .....M ..1..1 ....MP ...1.1 ..MP.. ...111 ....P. ...11. .....M ....11 ...M.P order #10: 1....1 ...... 1...1. ....PM 1...11 .....P 1..11. ...P.M 1..1.1 ....MP 1..1.. .....M 1.1... ..PM.. 1.1..1 .....P 1.1.1. ....PM 1.11.. ...PM. 111... .P.M.. 11.1.. ..MP.. 11..1. ...MP. 11...1 ....MP 11.... .....M .11... M.P... .11..1 .....P .11.1. ....PM .111.. ...PM. .1.11. ..M.P. .1.1.1 ....MP .1.1.. .....M .1..1. ...MP. .1..11 .....P .1...1 ....M. ..1..1 .MP... ..1.1. ....PM ..1.11 .....P ..111. ...P.M ..11.1 ....MP ..11.. .....M ...11. ..M.P. ...111 .....P ...1.1 ....M. ....11 ...MP. Figure 8.5-B: Three minimal-change orders of the k-subsets (where 2 ≤ k ≤ 3) of a 6-element set. 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: order #7: ...... ...... 1..... P..... 11.... .P.... 111... ..P... 1111.. ...P.. 11111. ....P. 111111 .....P 1111.1 ....M. 111.11 ...MP. 111.1. .....M 111..1 ....MP 11.1.1 ..MP.. 11.111 ....P. 11.11. .....M 11.1.. ....M. 11..1. ...MP. 11..11 .....P 11...1 ....M. 1.1..1 .MP... 1.1.1. ....PM 1.1.11 .....P 1.11.1 ...PM. 1.1111 ....P. 1.111. .....M 1.11.. ....M. 1.1... ...M.. 1..1.. ..MP.. 1..11. ....P. 1..111 .....P 1..1.1 ....M. 1...11 ...MP. 1...1. 
.....M 0 0 1 0 1 2 0 1 2 3 0 1 2 3 4 0 1 2 3 4 5 0 1 2 3 5 0 1 2 4 5 0 1 2 4 0 1 2 5 0 1 3 5 0 1 3 4 5 0 1 3 4 0 1 3 0 1 4 0 1 4 5 0 1 5 0 2 5 0 2 4 0 2 4 5 0 2 3 5 0 2 3 4 5 0 2 3 4 0 2 3 0 2 0 3 0 3 4 0 3 4 5 0 3 5 0 4 5 0 4 32: 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47: 48: 49: 50: 51: 52: 53: 54: 55: 56: 57: 58: 59: 60: 61: 62: 63: 1....1 .1...1 .1..11 .1..1. .1.1.. .1.11. .1.111 .1.1.1 .11..1 .11.1. .11.11 .111.1 .11111 .1111. .111.. .11... .1.... ..1... ..11.. ..111. ..1111 ..11.1 ..1.11 ..1.1. ..1..1 ...1.1 ...111 ...11. ...1.. ....1. ....11 .....1 ....MP MP.... ....P. .....M ...PM. ....P. .....P ....M. ..PM.. ....PM .....P ...PM. ....P. .....M ....M. ...M.. ..M... .MP... ...P.. ....P. .....P ....M. ...MP. .....M ....MP ..MP.. ....P. .....M ....M. ...MP. .....P ....M. 0 5 1 5 1 4 5 1 4 1 3 1 3 4 1 3 4 5 1 3 5 1 2 5 1 2 4 1 2 4 5 1 2 3 5 1 2 3 4 5 1 2 3 4 1 2 3 1 2 1 2 2 3 2 3 4 2 3 4 5 2 3 5 2 4 5 2 4 2 5 3 5 3 4 5 3 4 3 4 4 5 5 Figure 8.5-C: With kmin = 0 and order number seven at each transition either one element is added or removed, or one element moves to an adjacent position. 8.5: k-subsets where k lies in a given range 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 213 ++rct_; // measure computational work long rv1 = rv_[d-1]; // left neighbor bool q; switch ( rq_ % 4 ) { case 0: q = 1; break; case 1: q = !(d&1); break; case 2: q = rv1&1; break; case 3: q = (d^rv1)&1; break; } if ( nq_ ) q = !q; long x0 = rv1 + 1; long rx = n_ - (kmin_ - d); long x1 = min2( n_-1, rx ); #define PCOND(x) if ( (pq_==x) && (d>=kmin_) ) { visit_(*this, d); ++ct_; } PCOND(0); if ( q ) // forward: { PCOND(1); for (long x=x0; x<=x1; ++x) { rv_[d] = x; next_rec(d+1); } PCOND(2); } else // backward: { PCOND(2); for (long x=x1; x>=x0; --x) { rv_[d] = x; next_rec(d+1); } PCOND(1); } PCOND(3); #undef PCOND } About 50 million subsets per second are generated [FXT: comb/ksubset-rec-demo.cc]. 
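To see the core of the recursion without the sixteen orderings, the following sketch keeps only the lexicographic case (order #0 of figure 8.5-A): each subset is visited before its extensions and the loop always runs forward. It is a simplification under the assumption 0 ≤ kmin ≤ kmax ≤ n, not the FXT routine; the names Subset, ksubsets_rec() and ksubsets_lex() are hypothetical.

```cpp
#include <cassert>
#include <vector>

using Subset = std::vector<long>;

// Recursion over the depth d == number of elements chosen so far.
static void ksubsets_rec(long n, long kmin, long kmax, long d,
                         Subset &rv, std::vector<Subset> &out)
{
    if ( d>=kmin )  out.push_back( Subset(rv.begin(), rv.begin()+d) );  // visit
    if ( d>=kmax )  return;  // subset cannot grow further

    const long x0 = ( d==0 ? 0 : rv[d-1] + 1 );  // right of the left neighbor
    long x1 = n - (kmin - d);                    // leave room to reach kmin elements
    if ( x1 > n-1 )  x1 = n - 1;

    for (long x=x0; x<=x1; ++x)
    {
        rv[d] = x;
        ksubsets_rec(n, kmin, kmax, d+1, rv, out);
    }
}

// All k-subsets of {0, ..., n-1} with kmin <= k <= kmax, lexicographic order for sets.
static std::vector<Subset> ksubsets_lex(long n, long kmin, long kmax)
{
    Subset rv( kmax>0 ? kmax : 1, 0 );
    std::vector<Subset> out;
    ksubsets_rec(n, kmin, kmax, 0, rv, out);
    return out;
}
```

For n = 6, kmin = 2, kmax = 3 this produces the 35 sets listed for order #0, and with kmin = 0 and kmax = n all 2^n subsets are generated.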
8.5.2 Iterative algorithm for a minimal-change order

           delta set   diff     set
      1:   ...11       .....    { 4, 5 }
      2:   ..11.       ..P.M    { 3, 4 }
      3:   ..111       ....P    { 3, 4, 5 }
      4:   ..1.1       ...M.    { 3, 5 }
      5:   .11..       .P..M    { 2, 3 }
      6:   .11.1       ....P    { 2, 3, 5 }
      7:   .1111       ...P.    { 2, 3, 4, 5 }
      8:   .111.       ....M    { 2, 3, 4 }
      9:   .1.1.       ..M..    { 2, 4 }
     10:   .1.11       ....P    { 2, 4, 5 }
     11:   .1..1       ...M.    { 2, 5 }
     12:   11...       P...M    { 1, 2 }
     13:   11..1       ....P    { 1, 2, 5 }
     14:   11.11       ...P.    { 1, 2, 4, 5 }
     15:   11.1.       ....M    { 1, 2, 4 }
     16:   1111.       ..P..    { 1, 2, 3, 4 }
     17:   111.1       ...MP    { 1, 2, 3, 5 }
     18:   111..       ....M    { 1, 2, 3 }
     19:   1.1..       .M...    { 1, 3 }
     20:   1.1.1       ....P    { 1, 3, 5 }
     21:   1.111       ...P.    { 1, 3, 4, 5 }
     22:   1.11.       ....M    { 1, 3, 4 }
     23:   1..1.       ..M..    { 1, 4 }
     24:   1..11       ....P    { 1, 4, 5 }
     25:   1...1       ...M.    { 1, 5 }

Figure 8.5-D: The (25) k-subsets where 2 ≤ k ≤ 4 of a 5-element set in a minimal-change order.

A generator for subsets in Gray code order is [FXT: class ksubset_gray in comb/ksubset-gray.h]:

    class ksubset_gray
    {
    public:
        ulong n_;   // k-subsets of {1, 2, ..., n}
        ulong kmin_, kmax_;  // kmin <= k <= kmax
        ulong k_;   // k elements in current set
        ulong *S_;  // set in S[1,2,...,k] with elements \in {1,2,...,n}
        ulong j_;   // aux

    public:
        ksubset_gray(ulong n, ulong kmin, ulong kmax)
        {
            n_ = (n>0 ? n : 1);
            // Must have 1<=kmin<=kmax<=n
            kmin_ = kmin;
            kmax_ = kmax;
            if ( kmax_ < kmin_ )  swap2(kmin_, kmax_);
            if ( kmin_==0 )  kmin_ = 1;

            S_ = new ulong[kmax_+1];
            S_[0] = 0;  // sentinel: != 1
            first();
        }

        ~ksubset_gray()  { delete [] S_; }
        const ulong *data()  const  { return S_+1; }
        ulong num()  const  { return k_; }

        ulong last()
        {
            S_[1] = 1;  k_ = kmin_;
            if ( kmin_==1 )  { j_ = 1; }
            else
            {
                for (ulong i=2; i<=kmin_; ++i)  { S_[i] = n_ - kmin_ + i; }
                j_ = 2;
            }
            return k_;
        }

        ulong first()
        {
            k_ = kmin_;
            for (ulong i=1; i<=kmin_; ++i)  { S_[i] = n_ - kmin_ + i; }
            j_ = 1;
            return k_;
        }

        bool is_first()  const  { return ( S_[1] == n_ - kmin_ + 1 ); }

        bool is_last()  const
        {
            if ( S_[1] != 1 )  return 0;
            if ( kmin_<=1 )  return (k_==1);
            return (S_[2]==n_-kmin_+2);
        }
        [--snip--]

The routines for computing the next or previous subset are adapted from a routine to compute the successor given in [192]. The computation is split into two auxiliary functions:

    private:
        void prev_even()
        {
            ulong &n=n_, &kmin=kmin_, &kmax=kmax_, &j=j_;
            if ( S_[j-1] == S_[j]-1 )  // can touch sentinel S[0]
            {
                S_[j-1] = S_[j];
                if ( j > kmin )
                {
                    if ( S_[kmin] == n )  { j = j-2; }  else  { j = j-1; }
                }
                else
                {
                    S_[j] = n - kmin + j;
                    if ( S_[j-1]==S_[j]-1 )  { j = j-2; }
                }
            }
            else
            {
                S_[j] = S_[j] - 1;
                if ( j < kmax )
                {
                    S_[j+1] = S_[j] + 1;
                    if ( j >= kmin-1 )  { j = j+1; }  else  { j = j+2; }
                }
            }
        }

        void prev_odd()
        {
            ulong &n=n_, &kmin=kmin_, &kmax=kmax_, &j=j_;
            if ( S_[j] == n )  { j = j-1; }
            else
            {
                if ( j < kmax )
                {
                    S_[j+1] = n;
                    j = j+1;
                }
                else
                {
                    S_[j] = S_[j]+1;
                    if ( S_[kmin]==n )  { j = j-1; }
                }
            }
        }
        [--snip--]

The next() and prev() functions use these routines. Note that calls to next() and prev() cannot be mixed.
        ulong prev()
        {
            if ( is_first() )  { last();  return 0; }
            if ( j_&1 )  prev_odd();
            else         prev_even();
            if ( j_1 ) // use mm as radix for all digits: for (ulong k=0; km1_[j] ) // =^= if ( (dj>m1_[j]) || ((long)dj<0) )
                {
                    i_[j] = -ij;  // flip direction
                }
                else
                {  // can update
                    a_[j] = dj;  // update digit
                    dm_ = ij;    // save for dir()
                    j_ = j;      // save for pos()
                    return true;
                }
                ++j;
            }
            return false;
        }
        [--snip--]

Note the if-clause: it is an optimized expression equivalent to the one given as comment. The following methods are often useful:

        ulong pos()  const  { return j_; }   // position of last change
        int   dir()  const  { return dm_; }  // direction of last change

The routine for the computation of the predecessor is obtained by changing the plus sign in the statement ulong dj = a_[j] + ij; to a minus sign. The rate of generation is about 128 M/s for radix 2, 243 M/s for radix 4, and 304 M/s for radix 8 [FXT: comb/mixedradix-gray-demo.cc].

9.2.2 Loopless algorithm

A loopless algorithm for the computation of the successor, taken from [215, alg.H, sect.7.2.1.1], is given in [FXT: comb/mixedradix-gray2.h]:

    class mixedradix_gray2
    {
    public:
        ulong *a_;   // digits
        ulong *m1_;  // radix minus one (’nines’)
        ulong *f_;   // focus pointer
        ulong *d_;   // direction
        ulong n_;    // number of digits
        ulong j_;    // position of last change
        int dm_;     // direction of last move
        [--snip--]

        void first()
        {
            for (ulong k=0; k=n_ )  { first();  return false; }

            const ulong dj = d_[j];
            const ulong aj = a_[j] + dj;
            a_[j] = aj;

            dm_ = (int)dj;  // save for dir()
            j_ = j;         // save for pos()

            if ( aj+dj > m1_[j] )  // was last move?
            {
                d_[j] = -dj;      // change direction
                f_[j] = f_[j+1];  // lookup next position
                f_[j+1] = j + 1;
            }
            return true;
        }

The rate of generation is about 120 M/s for radix 2, 194 M/s for radix 4, and 264 M/s for radix 8 [FXT: comb/mixedradix-gray2-demo.cc].

9.2.3 Modular Gray code order

           M=[ 2 3 4 ]   j        M=[ 4 3 2 ]   j
      0:   [ . . . ]              [ . . . ]
      1:   [ 1 . . ]     0        [ 1 . . ]     0
      2:   [ 1 1 . ]     1        [ 2 . . ]     0
      3:   [ . 1 . ]     0        [ 3 . . ]     0
      4:   [ . 2 . ]     1        [ 3 1 . ]     1
      5:   [ 1 2 . ]     0        [ . 1 . ]     0
      6:   [ 1 2 1 ]     2        [ 1 1 . ]     0
      7:   [ . 2 1 ]     0        [ 2 1 . ]     0
      8:   [ . . 1 ]     1        [ 2 2 . ]     1
      9:   [ 1 . 1 ]     0        [ 3 2 . ]     0
     10:   [ 1 1 1 ]     1        [ . 2 . ]     0
     11:   [ . 1 1 ]     0        [ 1 2 . ]     0
     12:   [ . 1 2 ]     2        [ 1 2 1 ]     2
     13:   [ 1 1 2 ]     0        [ 2 2 1 ]     0
     14:   [ 1 2 2 ]     1        [ 3 2 1 ]     0
     15:   [ . 2 2 ]     0        [ . 2 1 ]     0
     16:   [ . . 2 ]     1        [ . . 1 ]     1
     17:   [ 1 . 2 ]     0        [ 1 . 1 ]     0
     18:   [ 1 . 3 ]     2        [ 2 . 1 ]     0
     19:   [ . . 3 ]     0        [ 3 . 1 ]     0
     20:   [ . 1 3 ]     1        [ 3 1 1 ]     1
     21:   [ 1 1 3 ]     0        [ . 1 1 ]     0
     22:   [ 1 2 3 ]     1        [ 1 1 1 ]     0
     23:   [ . 2 3 ]     0        [ 2 1 1 ]     0

Figure 9.2-B: Mixed radix numbers in modular Gray code order, dots denote zeros. The radix vectors are M = [2, 3, 4] (left) and M = [4, 3, 2] (right). The columns ‘j’ give the position of last change.

Figure 9.2-B shows the modular Gray code order for 3-digit mixed radix numbers with radix vectors M = [2, 3, 4] (left) and M = [4, 3, 2] (right). Each transition changes a single digit, either k → k+1 or, if k is maximal, k → 0.
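One way to realize these transitions is sketched below, under the assumption that a plain mixed radix counter decides which digit moves: the position where the counter's carry stops is the digit that is incremented modulo its radix. The function name modular_gray() and the use of std::vector are choices made for this illustration only; the radix vector here holds the radices themselves, not the ‘nines’.

```cpp
#include <cassert>
#include <vector>

using ulong = unsigned long;

// All mixed radix words for radix vector m in modular Gray code order.
static std::vector< std::vector<ulong> > modular_gray(const std::vector<ulong> &m)
{
    const std::size_t n = m.size();
    std::vector<ulong> a(n, 0);  // Gray code digits
    std::vector<ulong> c(n, 0);  // plain mixed radix counter
    std::vector< std::vector<ulong> > all;
    while ( true )
    {
        all.push_back(a);
        std::size_t j = 0;
        while ( j<n && c[j]==m[j]-1 )  { c[j] = 0;  ++j; }  // carry in the counter
        if ( j==n )  break;  // counter overflow: current word is the last
        ++c[j];
        a[j] = ( a[j] + 1 ) % m[j];  // k -> k+1, or k -> 0 if k is maximal
    }
    return all;
}
```

For M = [2, 3, 4] the output agrees with the left half of figure 9.2-B, and the index j at which the carry stops is the value in the column ‘j’.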
The mixed radix modular Gray code can be generated as follows [FXT: class mixedradix_modular_gray2 in comb/mixedradix-modular-gray2.h]:

    class mixedradix_modular_gray2
    {
    public:
        ulong *a_;   // digits
        ulong *m1_;  // radix minus one ('nines')
        ulong *x_;   // count changes of digit
        ulong n_;    // number of digits
        ulong j_;    // position of last change

    public:
        mixedradix_modular_gray2(ulong n, ulong mm, const ulong *m=0)
        {
            n_ = n;
            a_ = new ulong[n_];
            m1_ = new ulong[n_+1];  // incl. sentinel at m1[n]
            x_ = new ulong[n_+1];  // incl. sentinel at x[n] (!= m1[n])
            mixedradix_init(n_, mm, m, m1_);
            first();
        }
      [--snip--]

The computation of the successor works in constant amortized time:

        bool next()
        {
            ulong j = 0;
            while ( x_[j] == m1_[j] )  // can touch sentinels
            {
                x_[j] = 0;
                ++j;
            }
            ++x_[j];

            if ( j==n_ )  { first();  return false; }  // current is last

            j_ = j;  // save position of change

            // increment:
            ulong aj = a_[j] + 1;
            if ( aj>m1_[j] )  aj = 0;
            a_[j] = aj;

            return true;
        }
      [--snip--]

The rate of generation is about 151 M/s for radix 2, 254 M/s for radix 4, and 267 M/s for radix 8 [FXT: comb/mixedradix-modular-gray2-demo.cc].

The loopless implementation [FXT: class mixedradix_modular_gray in comb/mixedradix-modular-gray.h] was taken from [215, ex.77, sect.7.2.1.1]. The rate of generation is about 169 M/s with radix 2, 197 M/s with radix 4, and 256 M/s with radix 8 [FXT: comb/mixedradix-modular-gray-demo.cc].

9.3 gslex order

          M=[ 2 3 4 ]    x        M=[ 4 3 2 ]    x
     0:   [ 1 . . ]      1        [ 1 . . ]      1
     1:   [ 1 1 . ]      3        [ 2 . . ]      2
     2:   [ . 1 . ]      2        [ 3 . . ]      3
     3:   [ 1 2 . ]      5        [ 1 1 . ]      5
     4:   [ . 2 . ]      4        [ 2 1 . ]      6
     5:   [ 1 . 1 ]      7        [ 3 1 . ]      7
     6:   [ 1 1 1 ]      9        [ . 1 . ]      4
     7:   [ . 1 1 ]      8        [ 1 2 . ]      9
     8:   [ 1 2 1 ]     11        [ 2 2 . ]     10
     9:   [ . 2 1 ]     10        [ 3 2 . ]     11
    10:   [ . . 1 ]      6        [ . 2 . ]      8
    11:   [ 1 . 2 ]     13        [ 1 . 1 ]     13
    12:   [ 1 1 2 ]     15        [ 2 . 1 ]     14
    13:   [ . 1 2 ]     14        [ 3 . 1 ]     15
    14:   [ 1 2 2 ]     17        [ 1 1 1 ]     17
    15:   [ . 2 2 ]     16        [ 2 1 1 ]     18
    16:   [ . . 2 ]     12        [ 3 1 1 ]     19
    17:   [ 1 . 3 ]     19        [ . 1 1 ]     16
    18:   [ 1 1 3 ]     21        [ 1 2 1 ]     21
    19:   [ . 1 3 ]     20        [ 2 2 1 ]     22
    20:   [ 1 2 3 ]     23        [ 3 2 1 ]     23
    21:   [ . 2 3 ]     22        [ . 2 1 ]     20
    22:   [ . . 3 ]     18        [ . . 1 ]     12
    23:   [ . . . ]      0        [ . . . ]      0

Figure 9.3-A: Mixed radix numbers in gslex (generalized subset lex) order, dots denote zeros. The radix vectors are M = [2, 3, 4] (left) and M = [4, 3, 2] (right). Successive words differ in at most three positions. Columns ‘x’ give the values.

The algorithm for the generation of subsets in lexicographic order in set representation given in section 8.1.2 on page 203 can be generalized for mixed radix numbers. Figure 9.3-A shows the 3-digit mixed radix numbers for base M = [2, 3, 4] (left) and M = [4, 3, 2] (right). Note that zero is the last word in this order. For lack of a better name we call the order gslex (for generalized subset-lex) order.
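The following sketch reproduces the order of figure 9.3-A. It is a reconstruction from the transitions visible in the figure, not the FXT class: the name gslex_words and the update rule as written here are assumptions, and every radix is assumed to be at least 2. The rule looks at the lowest nonzero digit e: if it is not maximal it is incremented (and digit 0 is set to 1 when e > 0); if it is maximal and the next digit is zero, the word restarts with digit 0 and digit e+1 set to 1; otherwise digit e is cleared.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of FXT): list all mixed radix words for the
// radix vector m in gslex order. Assumes m is nonempty and every m[k] >= 2.
std::vector<std::vector<unsigned>> gslex_words(const std::vector<unsigned>& m)
{
    const std::size_t n = m.size();
    assert(n >= 1);
    for (unsigned r : m) assert(r >= 2);

    std::vector<unsigned> a(n + 1, 0);
    a[n] = 1;  // sentinel: nonzero, stops the scan for the lowest nonzero digit
    a[0] = 1;  // first word is [ 1 . . ... ]

    std::vector<std::vector<unsigned>> words;
    while (true) {
        words.emplace_back(a.begin(), a.begin() + n);

        std::size_t e = 0;
        while (a[e] == 0) ++e;          // lowest nonzero digit (may be the sentinel)
        if (e == n) break;              // the all-zero word is the last one

        if (a[e] + 1 < m[e]) {          // digit e is not maximal: increment it
            ++a[e];
            if (e != 0) a[0] = 1;
        } else if (a[e + 1] == 0) {     // maximal, next digit zero: open a new block
            a[e] = 0;
            a[0] = 1;
            a[e + 1] = 1;
        } else {                        // maximal, next digit nonzero: clear digit e
            a[e] = 0;
        }
    }
    return words;
}
```

Computing the value of each word for M = [2, 3, 4] (weights 1, 2, 6) should give the sequence in column ‘x’ of the figure, ending with zero.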
A generator for the gslex order is [FXT: class mixedradix gslex in comb/mixedradix-gslex.h]: 1 2 3 4 5 6 class mixedradix_gslex { public: ulong n_; // n-digit numbers ulong *a_; // digits ulong *m1_; // m1[k] == radix-1 at position k 9.3: gslex order 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 225 public: mixedradix_gslex(ulong n, ulong mm, const ulong *m=0) { n_ = n; a_ = new ulong[n_ + 1]; a_[n_] = 1; // sentinel m1_ = new ulong[n_]; mixedradix_init(n_, mm, m, m1_); first(); } [--snip--] void first() { for (ulong k=0; k 1, else 1: 1 2 3 4 5 6 7 8 9 10 class mixedradix_endo { public: ulong *a_; // digits, sentinel a[n] ulong *m1_; // radix (minus one) for each digit ulong *le_; // last positive digit in endo order, sentinel le[n] ulong n_; // Number of digits ulong j_; // position of last change mixedradix_endo(const ulong *m, ulong n, ulong mm=0) 9.4: endo order 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 227 M=[ 5 6 ] [ . . ] [ 1 . ] [ 3 . ] [ 4 . ] [ 2 . ] [ . 1 ] [ 1 1 ] [ 3 1 ] [ 4 1 ] [ 2 1 ] [ . 3 ] [ 1 3 ] [ 3 3 ] [ 4 3 ] [ 2 3 ] x 0 1 3 4 2 5 6 8 9 7 15 16 18 19 17 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: [ . 5 ] [ 1 5 ] [ 3 5 ] [ 4 5 ] [ 2 5 ] [ . 4 ] [ 1 4 ] [ 3 4 ] [ 4 4 ] [ 2 4 ] [ . 2 ] [ 1 2 ] [ 3 2 ] [ 4 2 ] [ 2 2 ] x 25 26 28 29 27 20 21 23 24 22 10 11 13 14 12 Figure 9.4-A: Mixed radix numbers in endo order, dots denote zeros. The radix vector is M = [5, 6]. Columns ‘x’ give the values. 
11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 { n_ = n; a_ = new ulong[n_+1]; a_[n_] = 1; // sentinel: m1_ = new ulong[n_]; != 0 mixedradix_init(n_, mm, m, m1_); le_ = new ulong[n_+1]; le_[n_] = 0; // sentinel: for (ulong k=0; ka[n] a_[n_+1] = 0; // sentinel ==0 m1_[n_+1] = 1; // sentinel >0 mixedradix_init(n_, mm, m, m1_); ulong s = 0; for (ulong i=0; i sm_ ) return false; // too big ulong i = 0; ulong s = s_; while ( s ) { const ulong m1 = m1_[i]; if ( s >= m1 ) { a_[i] = m1; s -= m1; } else { a_[i] = s; break; } ++i; } while ( ++i= n_ ) return false; // current is last s += (a_[j] - 1); a_[j] = 0; ++a_[j+1]; // increment next digit ulong i = 0; do // set prefix to lex-first string { const ulong m1 = m1_[i]; if ( s >= m1 ) { a_[i] = m1; s -= m1; } else { a_[i] = s; s = 0; } ++i; } while ( s ); return true; } [--snip--] }; 232 Chapter 10: Permutations Chapter 10 Permutations We present algorithms for the generation of all permutations in various orders such as lexicographic and minimal-change order. Several methods to convert permutations to and from mixed radix numbers with factorial base are described. Algorithms for application, inversion, and composition of permutations and for the generation of random permutations are given in chapter 2. 10.1 Factorial representations of permutations The factorial number system corresponds to the mixed radix bases M = [2, 3, 4, . . .] (rising factorial base) or M = [. . . , 4, 3, 2] (falling factorial base). A factorial number with (n − 1)-digits can have n! different values. We develop different methods to convert factorial numbers to permutations and vice versa. 10.1.1 The Lehmer code (inversion table) Each permutation of n elements can be converted to a unique (n − 1)-digit factorial number A = [a0 , a1 , . . . 
, an−2 ] in the falling factorial base: for each index k (except the last) count the number of elements with indices to the right of k that are less than the current element [FXT: comb/fact2perm.cc]: 1 2 3 4 5 6 7 8 9 10 11 12 13 void perm2ffact(const ulong *x, ulong n, ulong *fc) // Convert permutation in x[0,...,n-1] into // the (n-1) digit falling factorial representation in fc[0,...,n-2]. // We have: fc[0]=0; --k) { ulong i = fc[k]; if ( i ) rotate_left1(x+k, i+1); } } 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: ffact [ . . . ] [ 1 . . ] [ 2 . . ] [ 3 . . ] [ . 1 . ] [ 1 1 . ] [ 2 1 . ] [ 3 1 . ] [ . 2 . ] [ 1 2 . ] [ 2 2 . ] [ 3 2 . ] [ . . 1 ] [ 1 . 1 ] [ 2 . 1 ] [ 3 . 1 ] [ . 1 1 ] [ 1 1 1 ] [ 2 1 1 ] [ 3 1 1 ] [ . 2 1 ] [ 1 2 1 ] [ 2 2 1 ] [ 3 2 1 ] permutation [ . 1 2 3 ] [ 1 . 2 3 ] [ 2 . 1 3 ] [ 3 . 1 2 ] [ . 2 1 3 ] [ 1 2 . 3 ] [ 2 1 . 3 ] [ 3 1 . 2 ] [ . 3 1 2 ] [ 1 3 . 2 ] [ 2 3 . 1 ] [ 3 2 . 1 ] [ . 1 3 2 ] [ 1 . 3 2 ] [ 2 . 3 1 ] [ 3 . 2 1 ] [ . 2 3 1 ] [ 1 2 3 . ] [ 2 1 3 . ] [ 3 1 2 . ] [ . 3 2 1 ] [ 1 3 2 . ] [ 2 3 1 . ] [ 3 2 1 . ] rev.compl.perm. [ . 1 2 3 ] [ . 1 3 2 ] [ . 2 3 1 ] [ 1 2 3 . ] [ . 2 1 3 ] [ . 3 1 2 ] [ . 3 2 1 ] [ 1 3 2 . ] [ 1 2 . 3 ] [ 1 3 . 2 ] [ 2 3 . 1 ] [ 2 3 1 . ] [ 1 . 2 3 ] [ 1 . 3 2 ] [ 2 . 3 1 ] [ 2 1 3 . ] [ 2 . 1 3 ] [ 3 . 1 2 ] [ 3 . 2 1 ] [ 3 1 2 . ] [ 2 1 . 3 ] [ 3 1 . 2 ] [ 3 2 . 1 ] [ 3 2 1 . ] rfact [ . . . ] [ . . 1 ] [ . . 2 ] [ . . 3 ] [ . 1 . ] [ . 1 1 ] [ . 1 2 ] [ . 1 3 ] [ . 2 . ] [ . 2 1 ] [ . 2 2 ] [ . 2 3 ] [ 1 . . ] [ 1 . 1 ] [ 1 . 2 ] [ 1 . 3 ] [ 1 1 . ] [ 1 1 1 ] [ 1 1 2 ] [ 1 1 3 ] [ 1 2 . ] [ 1 2 1 ] [ 1 2 2 ] [ 1 2 3 ] Figure 10.1-A: Numbers in falling factorial base and permutations so that the number is the Lehmer code of it (left columns). Dots denote zeros. The rising factorial representation of the reversed and complemented permutation equals the reversed Lehmer code (right columns). 
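The Lehmer code and its inverse can be sketched in a few lines. The version below is a straightforward quadratic illustration and not the FXT routine: the names lehmer_code and perm_from_lehmer and the use of std::vector and std::rotate are assumptions. The inverse starts from the identity and, for each position k from left to right, moves the element that is fc[k] places to the right into position k by a rotation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: Lehmer code (falling factorial digits) of permutation x.
// Digit k counts the elements right of position k that are less than x[k].
std::vector<std::size_t> lehmer_code(const std::vector<std::size_t>& x)
{
    const std::size_t n = x.size();
    std::vector<std::size_t> fc(n > 0 ? n - 1 : 0, 0);
    for (std::size_t k = 0; k + 1 < n; ++k)
        for (std::size_t j = k + 1; j < n; ++j)
            if (x[j] < x[k]) ++fc[k];
    return fc;
}

// Hypothetical helper: permutation of fc.size()+1 elements with Lehmer code fc.
std::vector<std::size_t> perm_from_lehmer(const std::vector<std::size_t>& fc)
{
    const std::size_t n = fc.size() + 1;
    std::vector<std::size_t> x(n);
    for (std::size_t i = 0; i < n; ++i) x[i] = i;  // identity
    for (std::size_t k = 0; k < fc.size(); ++k) {
        const std::size_t i = fc[k];
        assert(i < n - k);  // digit k has radix n-k
        // Rotate x[k..k+i] right by one: x[k+i] moves to position k.
        std::rotate(x.begin() + k, x.begin() + k + i, x.begin() + k + i + 1);
    }
    return x;
}
```

For example, the permutation [ 3 1 . 2 ] has the code [ 3 1 . ], as in row 7 of figure 10.1-A, and perm_from_lehmer recovers the permutation from that code.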
A similar method can compute a representation in the rising factorial base. We count the number of elements to the left of k that are greater than the element at k (the number of left inversions at k): 1 2 3 4 5 6 7 8 9 void perm2rfact(const ulong *x, ulong n, ulong *fc) // Convert permutation in x[0,...,n-1] into // the (n-1) digit rising factorial representation in fc[0,...,n-2]. // We have: fc[0]<2, fc[1]<3, ..., fc[n-2]xk ) ++i; } } The inverse routine is 1 2 3 4 5 6 7 8 9 10 void rfact2perm(const ulong *fc, ulong n, ulong *x) { for (ulong k=0; k x[b] = a; We obtain the routines 1 2 3 4 5 6 7 void ffact2invperm(const ulong *fc, ulong n, ulong *x, left_right_array &LR) { LR.free_all(); for (ulong k=0; kf[j] { ulong ct = 0; for (ulong k=1; kset_all(); for (ulong k=0; knum_SLE( f[k] ); LR->get_set_idx_chg( i ); ct += i; } if ( tLR==0 ) return ct; delete LR; } 10.1.2 A representation via reversals ‡ Replacing the rotations in the computation of a permutation from its Lehmer code by reversals gives a different one-to-one relation between factorial numbers and permutations. The routine for the falling factorial base is [FXT: comb/fact2perm-rev.cc]: 1 2 3 4 5 6 7 8 9 void perm2ffact_rev(const ulong *x, ulong n, ulong *fc) { ALLOCA(ulong, ti, n); // inverse permutation for (ulong k=0; k1; --k, --len) { ulong i = fc[k]; 10.1: Factorial representations of permutations ffact 0: [ . . . ] 1: [ 1 . . ] 2: [ 2 . . ] 3: [ 3 . . ] 4: [ . 1 . ] 5: [ 1 1 . ] 6: [ 2 1 . ] 7: [ 3 1 . ] 8: [ . 2 . ] 9: [ 1 2 . ] 10: [ 2 2 . ] 11: [ 3 2 . ] 12: [ . . 1 ] 13: [ 1 . 1 ] 14: [ 2 . 1 ] 15: [ 3 . 1 ] 16: [ . 1 1 ] 17: [ 1 1 1 ] 18: [ 2 1 1 ] 19: [ 3 1 1 ] 20: [ . 2 1 ] 21: [ 1 2 1 ] 22: [ 2 2 1 ] 23: [ 3 2 1 ] permutation [ . 1 2 3 ] [ 1 2 3 . ] [ 2 3 . 1 ] [ 3 . 1 2 ] [ . 2 3 1 ] [ 1 3 . 2 ] [ 2 . 1 3 ] [ 3 1 2 . ] [ . 3 1 2 ] [ 1 . 2 3 ] [ 2 1 3 . ] [ 3 2 . 1 ] [ . 1 3 2 ] [ 1 2 . 3 ] [ 2 3 1 . ] [ 3 . 2 1 ] [ . 2 1 3 ] [ 1 3 2 . ] [ 2 . 3 1 ] [ 3 1 . 2 ] [ . 3 2 1 ] [ 1 . 
3 2 ] [ 2 1 . 3 ] [ 3 2 1 . ] inv. perm. [ . 1 2 3 ] [ 3 . 1 2 ] [ 2 3 . 1 ] [ 1 2 3 . ] [ . 3 1 2 ] [ 2 . 3 1 ] [ 1 2 . 3 ] [ 3 1 2 . ] [ . 2 3 1 ] [ 1 . 2 3 ] [ 3 1 . 2 ] [ 2 3 1 . ] [ . 1 3 2 ] [ 2 . 1 3 ] [ 3 2 . 1 ] [ 1 3 2 . ] [ . 2 1 3 ] [ 3 . 2 1 ] [ 1 3 . 2 ] [ 2 1 3 . ] [ . 3 2 1 ] [ 1 . 3 2 ] [ 2 1 . 3 ] [ 3 2 1 . ] 239 rfact 0: [ . . . ] 1: [ 1 . . ] 2: [ . 1 . ] 3: [ 1 1 . ] 4: [ . 2 . ] 5: [ 1 2 . ] 6: [ . . 1 ] 7: [ 1 . 1 ] 8: [ . 1 1 ] 9: [ 1 1 1 ] 10: [ . 2 1 ] 11: [ 1 2 1 ] 12: [ . . 2 ] 13: [ 1 . 2 ] 14: [ . 1 2 ] 15: [ 1 1 2 ] 16: [ . 2 2 ] 17: [ 1 2 2 ] 18: [ . . 3 ] 19: [ 1 . 3 ] 20: [ . 1 3 ] 21: [ 1 1 3 ] 22: [ . 2 3 ] 23: [ 1 2 3 ] permutation [ . 1 2 3 ] [ . 1 3 2 ] [ . 2 3 1 ] [ . 2 1 3 ] [ . 3 1 2 ] [ . 3 2 1 ] [ 1 2 3 . ] [ 1 2 . 3 ] [ 1 3 . 2 ] [ 1 3 2 . ] [ 1 . 2 3 ] [ 1 . 3 2 ] [ 2 3 . 1 ] [ 2 3 1 . ] [ 2 . 1 3 ] [ 2 . 3 1 ] [ 2 1 3 . ] [ 2 1 . 3 ] [ 3 . 1 2 ] [ 3 . 2 1 ] [ 3 1 2 . ] [ 3 1 . 2 ] [ 3 2 . 1 ] [ 3 2 1 . ] inv. perm. [ . 1 2 3 ] [ . 1 3 2 ] [ . 3 1 2 ] [ . 2 1 3 ] [ . 2 3 1 ] [ . 3 2 1 ] [ 3 . 1 2 ] [ 2 . 1 3 ] [ 2 . 3 1 ] [ 3 . 2 1 ] [ 1 . 2 3 ] [ 1 . 3 2 ] [ 2 3 . 1 ] [ 3 2 . 1 ] [ 1 2 . 3 ] [ 1 3 . 2 ] [ 3 1 . 2 ] [ 2 1 . 3 ] [ 1 2 3 . ] [ 1 3 2 . ] [ 3 1 2 . ] [ 2 1 3 . ] [ 2 3 1 . ] [ 3 2 1 . ] Figure 10.1-D: Falling (left) and rising (right) factorial numbers and permutations via rotation code. 7 8 9 rotate_left(x+n-len, len, i); } } Figure 10.1-D shows the permutations of 4 elements corresponding to the falling and rising factorial numbers in lexicographic order [FXT: comb/fact2perm-rot-demo.cc]. The second half of the inverse permutations is the reversed permutations in the first half in reversed order. The columns of the inverse permutations with the falling factorials are cyclic shifts of each other, see section 10.12 on page 271 for more orderings with this property. 
The routines to compute the factorial representation of a given permutation are 1 2 3 4 5 6 7 8 9 10 11 void perm2ffact_rot(const ulong *x, ulong n, ulong *fc) { ALLOCA(ulong, t, n); for (ulong k=0; k= k fc[k] = tk - k; ulong j = ti[k]; // location of element k, j>=k ti[tk] = j; t[j] = tk; } } void perm2rfact_swp(const ulong *x, ulong n, ulong *fc) { ALLOCA(ulong, t, n); for (ulong k=0; k=k fc[n-2-k] = j - k; ulong tk = t[k]; // >=k ti[tk] = j; t[j] = tk; } } Their inverses also have linear complexity, and no additional memory is needed. The routine for falling base is 1 2 3 4 5 6 7 8 9 void ffact2perm_swp(const ulong *fc, ulong n, ulong *x) { for (ulong k=0; k p_[i+1] ); if ( (long)i<0 ) return false; // last sequence is falling seq. // find rightmost element p[j] less than p[i]: ulong j = n1; while ( p_[i] > p_[j] ) { --j; } swap2(p_[i], p_[j]); // Here the elements p[i+1], ..., p[n-1] are a falling sequence. // Reverse order to the right: ulong r = n1; ulong s = i + 1; while ( r > s ) { swap2(p_[r], p_[s]); --r; ++s; } return true; } Using the class is no black magic [FXT: comb/perm-lex-demo.cc]: ulong n = 4; perm_lex P(n); do { // visit permutation } while ( P.next() ); The routine generates about 130 million permutations per second. A faster algorithm is obtained by modifying the update operation for the co-lexicographic order (section 10.3) on the right end of the permutations [FXT: comb/perm-lex2.h]. The rate of generation is about 180 M/s when arrays are used and about 305 M/s with pointers [FXT: comb/perm-lex2-demo.cc]. The routine for computing the successor can easily be adapted for permutations of a multiset, see section 13.2.2 on page 298. 10.3 Co-lexicographic order Figure 10.3-A shows the permutations of 4 elements in co-lexicographic (colex) order. 
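Read from right to left, the words in co-lexicographic order are in lexicographic order. A minimal sketch of this relation, using the standard library routine std::next_permutation on reverse iterators instead of the FXT class (the function name colex_permutations is an assumption for illustration):

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: all permutations of {0, ..., n-1} in colex order,
// obtained as the lexicographic order of the reversed words.
std::vector<std::vector<unsigned>> colex_permutations(unsigned n)
{
    std::vector<unsigned> p(n);
    std::iota(p.rbegin(), p.rend(), 0u);  // p = [n-1, ..., 1, 0]

    std::vector<std::vector<unsigned>> out;
    do {
        out.push_back(p);
    } while (std::next_permutation(p.rbegin(), p.rend()));
    return out;
}
```

For n = 4 the list starts with [ 3 2 1 . ], [ 2 3 1 . ], [ 3 1 2 . ] and ends with [ . 1 2 3 ].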
An algorithm for the generation is implemented in [FXT: class perm colex in comb/perm-colex.h]: 244 Chapter 10: Permutations 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: permutation [ 3 2 1 . ] [ 2 3 1 . ] [ 3 1 2 . ] [ 1 3 2 . ] [ 2 1 3 . ] [ 1 2 3 . ] [ 3 2 . 1 ] [ 2 3 . 1 ] [ 3 . 2 1 ] [ . 3 2 1 ] [ 2 . 3 1 ] [ . 2 3 1 ] [ 3 1 . 2 ] [ 1 3 . 2 ] [ 3 . 1 2 ] [ . 3 1 2 ] [ 1 . 3 2 ] [ . 1 3 2 ] [ 2 1 . 3 ] [ 1 2 . 3 ] [ 2 . 1 3 ] [ . 2 1 3 ] [ 1 . 2 3 ] [ . 1 2 3 ] rfact [ . . . ] [ 1 . . ] [ . 1 . ] [ 1 1 . ] [ . 2 . ] [ 1 2 . ] [ . . 1 ] [ 1 . 1 ] [ . 1 1 ] [ 1 1 1 ] [ . 2 1 ] [ 1 2 1 ] [ . . 2 ] [ 1 . 2 ] [ . 1 2 ] [ 1 1 2 ] [ . 2 2 ] [ 1 2 2 ] [ . . 3 ] [ 1 . 3 ] [ . 1 3 ] [ 1 1 3 ] [ . 2 3 ] [ 1 2 3 ] inv. perm. [ 3 2 1 . ] [ 3 2 . 1 ] [ 3 1 2 . ] [ 3 . 2 1 ] [ 3 1 . 2 ] [ 3 . 1 2 ] [ 2 3 1 . ] [ 2 3 . 1 ] [ 1 3 2 . ] [ . 3 2 1 ] [ 1 3 . 2 ] [ . 3 1 2 ] [ 2 1 3 . ] [ 2 . 3 1 ] [ 1 2 3 . ] [ . 2 3 1 ] [ 1 . 3 2 ] [ . 1 3 2 ] [ 2 1 . 3 ] [ 2 . 1 3 ] [ 1 2 . 3 ] [ . 2 1 3 ] [ 1 . 2 3 ] [ . 1 2 3 ] Figure 10.3-A: The permutations of 4 elements in co-lexicographic order. Dots denote zeros. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 class perm_colex { public: ulong *d_; // mixed radix digits with radix = [2, 3, 4, ...] ulong *x_; // permutation ulong n_; // permutations of n elements public: perm_colex(ulong n) // Must have n>=2 { n_ = n; d_ = new ulong[n_]; d_[n-1] = 0; // sentinel x_ = new ulong[n_]; first(); } [--snip--] void first() { for (ulong k=0; k=0; --j) rotate_right(p_, j+2, d_[j]); } Compare to the method of section 10.1.3 on page 238. 
10.4.2 Optimizing the update routine

We optimize the update routine by observing that 5 out of 6 updates are the swaps

    (0,1) (0,2) (0,1) (0,2) (0,1)

We use a counter ct_ and modify the methods first() and next() accordingly [FXT: class perm_rev2 in comb/perm-rev2.h]:

    class perm_rev2
    {
        perm_rev2(ulong n)
        {
            n_ = n;
            const ulong s = ( n_<3 ? 3 : n_ );
            p_ = new ulong[s+1];
            d_ = new ulong[s];
            first();
        }
      [--snip--]

        ulong next()
        // Return index of last element with reversal.
        // Return n with last permutation.
        {
            if ( ct_!=0 )  // easy case(s)
            {
                --ct_;
                const ulong e = 1 + (ct_ & 1);
                swap2(p_[0], p_[e]);
                return e;
            }
            else
            {
                ct_ = 5;  // reset counter
                ulong j = 2;  // note: start with 2
                while ( d_[j]==j+1 )  { d_[j]=0;  ++j; }  // can touch sentinel
                ++d_[j];
                reverse(p_, j+2);  // update permutation
                return j + 1;
            }
        }
      [--snip--]

The speedup is remarkable: about 275 million permutations per second are generated (about 8.5 cycles per update) [FXT: comb/perm-rev2-demo.cc]. If arrays are used instead of pointers, the rate drops to about 200 M/s.

10.5 Minimal-change order (Heap’s algorithm)

          permutation    swap      digits       rfact(perm)   inv. perm.
     0:   [ . 1 2 3 ]    (0, 0)    [ . . . ]    [ . . . ]     [ . 1 2 3 ]
     1:   [ 1 . 2 3 ]    (1, 0)    [ 1 . . ]    [ 1 . . ]     [ 1 . 2 3 ]
     2:   [ 2 . 1 3 ]    (2, 0)    [ . 1 . ]    [ 1 1 . ]     [ 1 2 . 3 ]
     3:   [ . 2 1 3 ]    (1, 0)    [ 1 1 . ]    [ . 1 . ]     [ . 2 1 3 ]
     4:   [ 1 2 . 3 ]    (2, 0)    [ . 2 . ]    [ . 2 . ]     [ 2 . 1 3 ]
     5:   [ 2 1 . 3 ]    (1, 0)    [ 1 2 . ]    [ 1 2 . ]     [ 2 1 . 3 ]
     6:   [ 3 1 . 2 ]    (3, 0)    [ . . 1 ]    [ 1 2 1 ]     [ 2 1 3 . ]
     7:   [ 1 3 . 2 ]    (1, 0)    [ 1 . 1 ]    [ . 2 1 ]     [ 2 . 3 1 ]
     8:   [ . 3 1 2 ]    (2, 0)    [ . 1 1 ]    [ . 1 1 ]     [ . 2 3 1 ]
     9:   [ 3 . 1 2 ]    (1, 0)    [ 1 1 1 ]    [ 1 1 1 ]     [ 1 2 3 . ]
    10:   [ 1 . 3 2 ]    (2, 0)    [ . 2 1 ]    [ 1 . 1 ]     [ 1 . 3 2 ]
    11:   [ . 1 3 2 ]    (1, 0)    [ 1 2 1 ]    [ . . 1 ]     [ . 1 3 2 ]
    12:   [ . 2 3 1 ]    (3, 1)    [ . . 2 ]    [ . . 2 ]     [ . 3 1 2 ]
    13:   [ 2 . 3 1 ]    (1, 0)    [ 1 . 2 ]    [ 1 . 2 ]     [ 1 3 . 2 ]
    14:   [ 3 . 2 1 ]    (2, 0)    [ . 1 2 ]    [ 1 1 2 ]     [ 1 3 2 . ]
    15:   [ . 3 2 1 ]    (1, 0)    [ 1 1 2 ]    [ . 1 2 ]     [ . 3 2 1 ]
    16:   [ 2 3 . 1 ]    (2, 0)    [ . 2 2 ]    [ . 2 2 ]     [ 2 3 . 1 ]
    17:   [ 3 2 . 1 ]    (1, 0)    [ 1 2 2 ]    [ 1 2 2 ]     [ 2 3 1 . ]
    18:   [ 3 2 1 . ]    (3, 2)    [ . . 3 ]    [ 1 2 3 ]     [ 3 2 1 . ]
    19:   [ 2 3 1 . ]    (1, 0)    [ 1 . 3 ]    [ . 2 3 ]     [ 3 2 . 1 ]
    20:   [ 1 3 2 . ]    (2, 0)    [ . 1 3 ]    [ . 1 3 ]     [ 3 . 2 1 ]
    21:   [ 3 1 2 . ]    (1, 0)    [ 1 1 3 ]    [ 1 1 3 ]     [ 3 1 2 . ]
    22:   [ 2 1 3 . ]    (2, 0)    [ . 2 3 ]    [ 1 . 3 ]     [ 3 1 . 2 ]
    23:   [ 1 2 3 . ]    (1, 0)    [ 1 2 3 ]    [ . . 3 ]     [ 3 . 1 2 ]

Figure 10.5-A: The permutations of 4 elements in a minimal-change order. Dots denote zeros.

Figure 10.5-A shows the permutations of 4 elements in a minimal-change order: just 2 elements are swapped with each update. The column labeled digits shows the mixed radix numbers with rising factorial base in counting order. Let j be the position of the rightmost change of the mixed radix string R. Then the swap is (j + 1, x) where x = 0 if j is odd, and x = R_j − 1 if j is even. The sequence of values j + 1 starts

    1, 2, 1, 2, 1, 3, 1, 2, 1, 2, 1, 3, 1, 2, 1, 2, 1, 3, 1, 2, 1, 2, 1, 4, 1, 2, 1, ...

The n-th value (starting with n = 1) is the largest z such that z! divides n (entry A055881 in [312]).

The list of rising factorial representations of the permutations (column labeled rfact(perm) in figure 10.5-A) is a Gray code only for permutations of up to four elements.
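The swap rule and the divisibility property can be checked with a short stand-alone sketch. It is not the FXT class: the names largest_factorial_divisor, HeapSwap, and heap_swaps are hypothetical, and std::vector replaces the raw arrays. The routine counts through the rising factorial numbers and records the swap (j+1, x) for each increment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: largest z such that z! divides n (n >= 1).
unsigned long largest_factorial_divisor(unsigned long n)
{
    assert(n >= 1);
    unsigned long z = 1;
    while (n % (z + 1) == 0) { ++z; n /= z; }  // divide by 2, 3, 4, ... while possible
    return z;
}

struct HeapSwap { std::size_t hi, lo; };  // swap positions hi > lo

// Hypothetical helper: the n!-1 swaps that generate all permutations of n
// elements in the minimal-change order described above.
std::vector<HeapSwap> heap_swaps(std::size_t n)
{
    std::vector<std::size_t> d(n, 0);  // d[j] has radix j+2; d[n-1] is never incremented
    std::vector<HeapSwap> out;
    while (true) {
        std::size_t j = 0;
        while (j + 1 < n && d[j] == j + 1) { d[j] = 0; ++j; }  // carry
        if (j + 1 >= n) break;                   // counter exhausted: last permutation
        const std::size_t k = j + 1;
        const std::size_t x = (k & 1) ? d[j] : 0;  // j even: old digit, j odd: zero
        out.push_back({k, x});
        ++d[j];
    }
    return out;
}
```

For n = 4 the recorded swaps should be those of the column labeled swap in figure 10.5-A (without the initial entry), and the larger index of the i-th swap should equal largest_factorial_divisor(i).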
An implementation of the algorithm (given in [178]) is [FXT: class perm_heap in comb/perm-heap.h]:

    class perm_heap
    {
    public:
        ulong *d_;  // mixed radix digits with radix = [2, 3, 4, ..., n-1, (sentinel=-1)]
        ulong *p_;  // permutation
        ulong n_;   // permutations of n elements
        ulong sw1_, sw2_;  // indices of swapped elements
      [--snip--]

The computation of the successor is simple:

        bool next()
        {
            // increment mixed radix number:
            ulong j = 0;
            while ( d_[j]==j+1 )  { d_[j]=0;  ++j; }  // can touch sentinel

            // j==n-1 for last permutation:
            if ( j==n_-1 )  return false;

            ulong k = j+1;
            ulong x = ( k&1 ? d_[j] : 0 );
            swap2(p_[k], p_[x]);  // omit statement to just compute swaps
            sw1_ = k;  sw2_ = x;

            ++d_[j];
            return true;
        }
      [--snip--]

About 133 million permutations are generated per second. Often one will only use the indices of the swapped elements to update the visited configurations:

        void get_swap(ulong &s1, ulong &s2) const  { s1=sw1_; s2=sw2_; }

Then the statement swap2(p_[k], p_[x]); in the update routine can be omitted, which leads to a rate of 215 M/s. Figure 10.5-A shows the permutations of 4 elements. It was created with the program [FXT: comb/perm-heap-demo.cc].

10.5.1 Optimized implementation

The algorithm can be optimized by treating 5 out of 6 cases separately, those where the first or second digit in the mixed radix number changes [FXT: class perm_heap2 in comb/perm-heap2.h]:

    class perm_heap2
    {
    public:
        ulong *d_;  // mixed radix digits with radix = [2, 3, 4, 5, ..., n-1, (sentinel=-1)]
        ulong *p_;  // permutation
        ulong n_;   // permutations of n elements
        ulong sw1_, sw2_;  // indices of swapped elements
        ulong ct_;  // count 5,4,3,2,1,(0); nonzero ==> easy cases
      [--snip--]

The counter is set to 5 in the method first(). The update routine is

        ulong next()
        // Return the larger index of the swapped pair.
        // Return n with last permutation.
        {
            if ( ct_!=0 )  // easy cases
            {
                --ct_;
                sw1_ = 1 + (ct_ & 1);  // == 1,2,1,2,1
                sw2_ = 0;
                swap2(p_[sw1_], p_[sw2_]);
                return sw1_;
            }
            else
            {
                ct_ = 5;  // reset counter

                // increment mixed radix number:
                ulong j = 2;
                while ( d_[j]==j+1 )  { d_[j]=0;  ++j; }  // can touch sentinel

                // j==n-1 for last permutation:
                if ( j==n_-1 )  return n_;

                ulong k = j+1;
                ulong x = ( k&1 ? d_[j] : 0 );
                swap2(p_[k], p_[x]);
                sw1_ = k;  sw2_ = x;

                ++d_[j];
                return k;
            }
        }

Usage of the class is shown in [FXT: comb/perm-heap2-demo.cc]:

        do { /* visit permutation */ } while ( P.next()!=n );

The rate of generation is about 280 M/s (7.85 cycles per update), and 460 M/s (4.78 cycles per update) with fixed arrays.

If only the swaps are of interest, we can simply omit all statements involving the permutation array p_[]. The implementation is [FXT: class perm_heap2_swaps in comb/perm-heap2-swaps.h], usage of the class is shown in [FXT: comb/perm-heap2-swaps-demo.cc].

Heap’s algorithm and the optimization idea were taken from the excellent survey [305], which gives several permutation algorithms and implementations in pseudocode.

10.6 Lipski’s Minimal-change orders

Several algorithms similar to Heap’s method are given in Lipski’s paper [235].

10.6.1 Variants of Heap’s algorithm

Four orderings for the permutations of five elements are shown in figure 10.6-A. The leftmost order is Heap’s order.
The implementation is given in [FXT: class perm gray lipski in comb/perm-graylipski.h], the variable r determines the order that is generated: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 class perm_gray_lipski { [--snip--] ulong r_; // order (0<=r<4): [--snip--] bool next() { // increment mixed radix number: ulong j = 0; while ( d_[j]==j+1 ) { d_[j]=0; ++j; } if ( j [2, 3] --> [3, 2] -----------------P=[2, 3] --> [1, 2, 3] --> [2, 1, 3] --> [2, 3, 1] P=[3, 2] --> [3, 2, 1] --> [3, 1, 2] --> [1, 3, 2] -----------------P=[1, 2, 3] --> [0, 1, 2, 3] --> [1, 0, 2, 3] --> [1, 2, 0, 3] --> [1, 2, 3, 0] P=[2, 1, 3] --> [2, 1, 3, 0] --> [2, 1, 0, 3] --> [2, 0, 1, 3] --> [0, 2, 1, 3] P=[2, 3, 1] --> [0, 2, 3, 1] --> [2, 0, 3, 1] --> [2, 3, 0, 1] --> [2, 3, 1, 0] P=[3, 2, 1] --> [3, 2, 1, 0] --> [3, 2, 0, 1] --> [3, 0, 2, 1] --> [0, 3, 2, 1] perm(4)== [0, 1, 2, 3] [1, 0, 2, 3] [1, 2, 0, 3] [1, 2, 3, 0] [2, 1, 3, 0] [2, 1, 0, 3] [2, 0, 1, 3] [0, 2, 1, 3] [0, 2, 3, 1] [2, 0, 3, 1] [2, 3, 0, 1] [2, 3, 1, 0] [3, 2, 1, 0] [3, 2, 0, 1] [3, 0, 2, 1] [0, 3, 2, 1] [0, 3, 1, 2] [3, 0, 1, 2] [3, 1, 0, 2] [3, 1, 2, 0] [1, 3, 2, 0] [1, 3, 0, 2] [1, 0, 3, 2] [0, 1, 3, 2] P=[3, 1, 2] --> [0, 3, 1, 2] --> [3, 0, 1, 2] --> [3, 1, 0, 2] --> [3, 1, 2, 0] P=[1, 3, 2] --> [1, 3, 2, 0] --> [1, 3, 0, 2] --> [1, 0, 3, 2] --> [0, 1, 3, 2] Figure 10.7-B: Trotter’s construction as an interleaving process. 254 Chapter 10: Permutations 10.7 Strong minimal-change order (Trotter’s algorithm) Figure 10.7-A shows the permutations of 4 elements in a strong minimal-change order : just two elements are swapped with each update and these are adjacent. In the sequence of the inverse permutations the swapped pair always consists of elements x and x + 1. Also the first and last permutation differ by an adjacent transposition (of the last two elements). The ordering can be obtained by an interleaving process shown in figure 10.7-B. 
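The interleaving process of figure 10.7-B can be written as a short recursive-style construction. This is a sketch for illustration and not the FXT class: the name trotter_by_interleaving and the representation with std::vector are assumptions. Each permutation of m elements is shifted up by one, and the new smallest element 0 is inserted at every position, sweeping left to right for even-indexed and right to left for odd-indexed permutations.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Perm = std::vector<unsigned>;

// Hypothetical helper: all permutations of {0, ..., n-1} (n >= 1) in the
// strong minimal-change order, built by interleaving.
std::vector<Perm> trotter_by_interleaving(unsigned n)
{
    assert(n >= 1);
    std::vector<Perm> list{Perm{0}};         // the single permutation of one element
    for (unsigned m = 1; m < n; ++m) {       // list holds permutations of m elements
        std::vector<Perm> next;
        for (std::size_t i = 0; i < list.size(); ++i) {
            Perm base = list[i];
            for (unsigned& v : base) ++v;    // make room for the new smallest element
            for (unsigned s = 0; s <= m; ++s) {
                const unsigned pos = (i % 2 == 0) ? s : m - s;  // alternate sweep direction
                Perm q = base;
                q.insert(q.begin() + pos, 0u);
                next.push_back(q);
            }
        }
        list = std::move(next);
    }
    return list;
}
```

For n = 4 the result should be the list perm(4) of figure 10.7-B, starting with [0, 1, 2, 3], [1, 0, 2, 3] and ending with [0, 1, 3, 2].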
The first half of the permutations in this order are the reversals of the second half: the relative order of the two smallest elements is changed only with the transition just after the first half and reversal changes the order of these two elements. Mutually reversed permutations lie n!/2 positions apart. A computer program to generate all permutations in the shown order was given 1962 by H. F. Trotter [334], see also [193] and [137]. We compute both the permutation and its inverse [FXT: class perm trotter in comb/perm-trotter.h]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 class perm_trotter { public: ulong n_; // number of elements to permute ulong *x_; // permutation of {0, 1, ..., n-1} ulong *xi_; // inverse permutation ulong *d_; // auxiliary: directions ulong sw1_, sw2_; // indices of elements swapped most recently public: perm_trotter(ulong n) { n_ = n; x_ = new ulong[n_+2]; xi_ = new ulong[n_]; d_ = new ulong[n_]; ulong sen = 0; // sentinel value minimal x_[0] = x_[n_+1] = sen; ++x_; first(); } [--snip--] Sentinel elements are put at the lower and the higher end of the array for the permutation. For each element we store a direction-flag = ±1 in an array d_[]. Initially all are set to +1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 void fl_swaps() // Auxiliary routine for first() and last(). // Set sw1, sw2 to swaps between first and last permutation. { sw1_ = ( n_==0 ? 0 : n_ - 1 ); sw2_ = ( n_<2 ? 0 : n_ - 2 ); } void first() { for (ulong i=0; i 0,1 == S2 S2 = 01 --> 01,20,12 == S3 S3 = 012012 --> 012012,301301,230230,123123 == S4 S4 = (S3-0),(S3-1),(S3-2),(S3-3) modulo 4 S5 = (S4-0),(S4-1),(S4-2),(S4-3),(S4-4) modulo 5 == 012012301301230230123123,401401240240124124012012,340340134134013013401401, \ 234234023023402402340340,123123412412341341234234 Figure 10.8-B: Construction of the first column of the list of permutations, also sequence of positions of element zero in the inverse permutations. 
The sequence of positions swapped with the first position, entry A123400 in [312], starts as 1,2,1,2,1,3,2,1,2,1,2,3,1,2,1,2,1,3,2,1,2,1,2,4,3,1,3,1,3,2,1,3,1,3,1,2,3,1,3,1,3,2,1, ... The sequence of positions of the element zero is entry A159880, it starts as 0,1,2,0,1,2,3,0,1,3,0,1,2,3,0,2,3,0,1,2,3,1,2,3,4,0,1,4,0,1,2,4,0,2,4,0,1,2,4,1,2,4,0, ... It can be constructed as shown in figure 10.8-B. The sequence can be generated via the permutations described in section 10.4 on page 245. Thus we can compute the inverse permutations as shown in figure 10.8-C. The listing was created with the program [FXT: comb/perm-star-inv-demo.cc]: 1 2 3 ulong n = 4; perm_rev2 P(n); P.first(); const ulong *r = P.data(); 258 Chapter 10: Permutations 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: inv. star-p. [ . 1 2 3 ] [ 1 . 2 3 ] [ 1 2 . 3 ] [ . 2 1 3 ] [ 2 . 1 3 ] [ 2 1 . 3 ] [ 2 1 3 . ] [ . 1 3 2 ] [ 1 . 3 2 ] [ 1 2 3 . ] [ . 2 3 1 ] [ 2 . 3 1 ] [ 2 3 . 1 ] [ 2 3 1 . ] [ . 3 1 2 ] [ 1 3 . 2 ] [ 1 3 2 . ] [ . 3 2 1 ] [ 3 . 2 1 ] [ 3 2 . 1 ] [ 3 2 1 . ] [ 3 . 1 2 ] [ 3 1 . 2 ] [ 3 1 2 . ] swap (0, 1) (1, 2) (2, 0) (0, 1) (1, 2) (2, 3) (3, 0) (0, 1) (1, 3) (3, 0) (0, 1) (1, 2) (2, 3) (3, 0) (0, 2) (2, 3) (3, 0) (0, 1) (1, 2) (2, 3) (3, 1) (1, 2) (2, 3) perm-rev [ . 1 2 3 ] [ 1 . 2 3 ] [ 2 . 1 3 ] [ . 2 1 3 ] [ 1 2 . 3 ] [ 2 1 . 3 ] [ 3 . 1 2 ] [ . 3 1 2 ] [ 1 3 . 2 ] [ 3 1 . 2 ] [ . 1 3 2 ] [ 1 . 3 2 ] [ 2 3 . 1 ] [ 3 2 . 1 ] [ . 2 3 1 ] [ 2 . 3 1 ] [ 3 . 2 1 ] [ . 3 2 1 ] [ 1 2 3 . ] [ 2 1 3 . ] [ 3 1 2 . ] [ 1 3 2 . ] [ 2 3 1 . ] [ 3 2 1 . ] Figure 10.8-C: The inverse permutations of 4 elements with star-transposition order (left). The swaps are determined by the first element of the permutations generated via reversals (right). 
4 5 6 7 8 9 10 11 12 13 14 15 ulong *x = new ulong[n]; for (ulong k=0; kfirst(); for (ulong k=0; knext() ) { first(); return false; } const ulong j = mrg_->pos(); // position of changed digit const int d = mrg_->dir(); // direction of change // swap: const ulong x1 = j; // element j const ulong i1 = ix_[x1]; // position of j const ulong i2 = i1 + d; // neighbor const ulong x2 = x_[i2]; // position of neighbor x_[i1] = x2; x_[i2] = x1; // swap2(x_[i1], x_[i2]); ix_[x1] = i2; ix_[x2] = i1; // swap2(ix_[x1], ix_[x2]); sw1_=i1; sw2_=i2; return true; } The class uses the loopless algorithm for the computation of the mixed radix Gray code, so it is loopless itself. An alternative (CAT) algorithm is implemented in [FXT: class perm gray ffact in comb/permgray-ffact.h], we give just the routine for the successor: 260 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 Chapter 10: Permutations private: void swap(ulong j, ulong im) // used with next() and prev() { const ulong x1 = j; // element j const ulong i1 = ix_[x1]; // position of j const ulong i2 = i1 + im; // neighbor const ulong x2 = x_[i2]; // position of neighbor x_[i1] = x2; x_[i2] = x1; // swap2(x_[i1], x_[i2]); ix_[x1] = i2; ix_[x2] = i1; // swap2(ix_[x1], ix_[x2]); sw1_=i1; sw2_=i2; } public: bool next() { ulong j = 0; ulong m1 = n_ - 1; // nine in falling factorial base ulong ij; while ( (ij=i_[j]) ) { ulong im = i_[j]; ulong dj = d_[j] + im; if ( dj>m1 ) // =^= if ( (dj>m1) || ((long)dj<0) ) { i_[j] = -ij; } else { d_[j] = dj; swap(j, im); return true; } --m1; ++j; } return false; } To compute the predecessor (method prev()), we only need to modify one statement as follows: ulong im = i_[j]; ulong im = -i_[j]; // next() // prev() The loopless routine computes about 80 million permutations per second, the CAT version about 160 million per second [FXT: comb/perm-gray-ffact-demo.cc]. Both are slower than the implementation given in section 10.7.1 on page 255. 
10.9.2 Permutations with rising factorial numbers Figure 10.9-B shows a Gray code for permutations based on the Gray code for numbers in rising factorial base. The ordering coincides with Heap’s algorithm (see section 10.5 on page 248) for up to four elements. A recursive construction for the order is shown in figure 10.9-C. The figure was created with the program [FXT: comb/perm-gray-rfact-demo.cc]. A constant amortized time (CAT) algorithm for generating the permutations is [FXT: class perm gray rfact in comb/perm-gray-rfact.h] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 class perm_gray_rfact { public: mixedradix_gray *M_; // loopless routine ulong n_; // number of elements to permute ulong *x_; // current permutation (of {0, 1, ..., n-1}) ulong *ix_; // inverse permutation ulong sw1_, sw2_; // indices of elements swapped most recently public: perm_gray_rfact(ulong n) { n_ = n; x_ = new ulong[n_]; ix_ = new ulong[n_]; M_ = new mixedradix_gray(n_-1, 1); // rising factorial base 10.9: Minimal-change orders from factorial numbers 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: permutation [ . 1 2 3 ] [ 1 . 2 3 ] [ 2 . 1 3 ] [ . 2 1 3 ] [ 1 2 . 3 ] [ 2 1 . 3 ] [ 3 1 . 2 ] [ 1 3 . 2 ] [ . 3 1 2 ] [ 3 . 1 2 ] [ 1 . 3 2 ] [ . 1 3 2 ] [ . 2 3 1 ] [ 2 . 3 1 ] [ 3 . 2 1 ] [ . 3 2 1 ] [ 2 3 . 1 ] [ 3 2 . 1 ] [ 3 2 1 . ] [ 2 3 1 . ] [ 1 3 2 . ] [ 3 1 2 . ] [ 2 1 3 . ] [ 1 2 3 . ] rfact [ . . . ] [ 1 . . ] [ 1 1 . ] [ . 1 . ] [ . 2 . ] [ 1 2 . ] [ 1 2 1 ] [ . 2 1 ] [ . 1 1 ] [ 1 1 1 ] [ 1 . 1 ] [ . . 1 ] [ . . 2 ] [ 1 . 2 ] [ 1 1 2 ] [ . 1 2 ] [ . 2 2 ] [ 1 2 2 ] [ 1 2 3 ] [ . 2 3 ] [ . 1 3 ] [ 1 1 3 ] [ 1 . 3 ] [ . . 3 ] 261 pos dir 0 1 0 1 0 2 0 1 0 1 0 2 0 1 0 1 0 2 0 1 0 1 0 +1 +1 -1 +1 +1 +1 -1 -1 +1 -1 -1 +1 +1 +1 -1 +1 +1 +1 -1 -1 +1 -1 -1 inverse perm. [ . 1 2 3 ] [ 1 . 2 3 ] [ 1 2 . 3 ] [ . 2 1 3 ] [ 2 . 1 3 ] [ 2 1 . 3 ] [ 2 1 3 . ] [ 2 . 3 1 ] [ . 2 3 1 ] [ 1 2 3 . ] [ 1 . 3 2 ] [ . 1 3 2 ] [ . 3 1 2 ] [ 1 3 . 2 ] [ 1 3 2 . ] [ . 
3 2 1 ] [ 2 3 . 1 ] [ 2 3 1 . ] [ 3 2 1 . ] [ 3 2 . 1 ] [ 3 . 2 1 ] [ 3 1 2 . ] [ 3 1 . 2 ] [ 3 . 1 2 ] Figure 10.9-B: Permutations in minimal-change order (left) and Gray code for mixed radix numbers with rising factorial base. For even n the first and last permutations are cyclic shifts of each other. append 3: 012 3 102 3 201 3 021 3 120 3 210 3 perm(2)= 01 10 append 2: 01 2 10 2 reverse and swap (2,1) 20 1 02 1 reverse and swap (1,0) 12 0 21 0 ==> perm(3) 012 102 201 021 120 210 reverse and swap (3,2): 310 2 130 2 031 2 301 2 103 2 013 2 reverse and swap (2,1): 023 1 203 1 302 1 032 1 230 1 320 1 reverse and swap (1,0): 321 0 231 0 132 0 312 0 213 0 123 0 Figure 10.9-C: Recursive construction of the permutations. ==> perm(4): 0123 1023 2013 0213 1203 2103 3102 1302 0312 3012 1032 0132 0231 2031 3021 0321 2301 3201 3210 2310 1320 3120 2130 1230 262 17 18 19 20 21 22 23 24 25 Chapter 10: Permutations first(); } [--snip--] void first() { M_->first(); for (ulong k=0; k 0, and the smallest element greater than x1 for d < 0: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 bool next() { // Compute next mixed radix number in Gray code order: if ( false == M_->next() ) { first(); return false; } ulong j = M_->pos(); // position of changed digit if ( j<=1 ) // easy cases: swap == (0,j+1) { const ulong i2 = j+1; // i1 == 0 const ulong x1 = x_[0], x2 = x_[i2]; x_[0] = x2; x_[i2] = x1; // swap2(x_[i1], x_[i2]); ix_[x1] = i2; ix_[x2] = 0; // swap2(ix_[x1], ix_[x2]); sw1_=0; sw2_=i2; return true; } else { ulong i1 = j+1, i2 = i1; ulong x1 = x_[i1], x2; int d = M_->dir(); // direction of change if ( d>0 ) { x2 = 0; for (ulong t=0; t= x2) ) { i2=t; x2=xt; } } } else { x2 = n_; for (ulong t=0; t x1) && (xt <= x2) ) { i2=t; x2=xt; } } } x_[i1] = x2; ix_[x1] = i2; x_[i2] = x1; ix_[x2] = i1; // swap2(x_[i1], x_[i2]); // swap2(ix_[x1], ix_[x2]); sw1_=i2; sw2_=i1; return true; } } There is a slightly 
more efficient algorithm to compute the successor using the inverse permutations:

    bool next()
    {
        [--snip--]  /* easy cases as before */
        else
        {
            ulong i1 = j+1,  i2 = i1;
            ulong x1 = x_[i1], x2;
            int d = M_->dir();  // direction of change
            if ( d>0 )  // via the inverse permutation: greatest element < x1 lying left of i1
            {
                for (x2=x1-1; ; --x2)  if ( (i2=ix_[x2]) < i1 )  break;
            }
            else  // via the inverse permutation: smallest element > x1 lying left of i1
            {
                for (x2=x1+1; ; ++x2)  if ( (i2=ix_[x2]) < i1 )  break;
            }
            [--snip--]  /* swaps as before */
        }
    }

The method is chosen by defining SUCC_BY_INV in the file [FXT: comb/perm-gray-rfact.h]. About 80 million permutations per second are generated, compared with about 71 million for the first method.

10.9.3 Permutations with permuted factorial numbers

           permutation      swap      xfact       pos  dir    inv.perm.
     0:   [ . 1 2 3 4 ]              [ . . . . ]             [ . 1 2 3 4 ]
     1:   [ 1 . 2 3 4 ]    (0, 1)    [ 1 . . . ]   0   +1    [ 1 . 2 3 4 ]
     2:   [ 2 . 1 3 4 ]    (0, 2)    [ 1 1 . . ]   1   +1    [ 1 2 . 3 4 ]
     3:   [ . 2 1 3 4 ]    (0, 1)    [ . 1 . . ]   0   -1    [ . 2 1 3 4 ]
     4:   [ 1 2 . 3 4 ]    (0, 2)    [ . 2 . . ]   1   +1    [ 2 . 1 3 4 ]
     5:   [ 2 1 . 3 4 ]    (0, 1)    [ 1 2 . . ]   0   +1    [ 2 1 . 3 4 ]
     6:   [ 2 1 . 4 3 ]    (3, 4)    [ 1 2 1 . ]   2   +1    [ 2 1 . 4 3 ]
     7:   [ 1 2 . 4 3 ]    (0, 1)    [ . 2 1 . ]   0   -1    [ 2 . 1 4 3 ]
    [--snip--]
    91:   [ 3 4 2 1 . ]    (0, 1)    [ . 2 4 3 ]   0   -1    [ 4 3 2 . 1 ]
    92:   [ 2 4 3 1 . ]    (0, 2)    [ . 1 4 3 ]   1   -1    [ 4 3 . 2 1 ]
    93:   [ 4 2 3 1 . ]    (0, 1)    [ 1 1 4 3 ]   0   +1    [ 4 3 1 2 . ]
    94:   [ 3 2 4 1 . ]    (0, 2)    [ 1 . 4 3 ]   1   -1    [ 4 3 1 . 2 ]
    95:   [ 2 3 4 1 . ]    (0, 1)    [ . . 4 3 ]   0   -1    [ 4 3 . 1 2 ]
    96:   [ 2 3 4 . 1 ]    (3, 4)    [ . . 3 3 ]   2   -1    [ 3 4 . 1 2 ]
    97:   [ 3 2 4 . 1 ]    (0, 1)    [ 1 . 3 3 ]   0   +1    [ 3 4 1 . 2 ]
    [--snip--]
   106:   [ 3 1 4 . 2 ]    (0, 2)    [ 1 . 2 3 ]   1   -1    [ 3 1 4 . 2 ]
   107:   [ 1 3 4 . 2 ]    (0, 1)    [ . . 2 3 ]   0   -1    [ 3 . 4 1 2 ]
   108:   [ 1 2 4 . 3 ]    (1, 4)    [ . . 1 3 ]   2   -1    [ 3 . 1 4 2 ]
   109:   [ 2 1 4 . 3 ]    (0, 1)    [ 1 . 1 3 ]   0   +1    [ 3 1 . 4 2 ]
   110:   [ 4 1 2 . 3 ]    (0, 2)    [ 1 1 1 3 ]   1   +1    [ 3 1 2 4 . ]
   111:   [ 1 4 2 . 3 ]    (0, 1)    [ . 1 1 3 ]   0   -1    [ 3 . 2 4 1 ]
   112:   [ 2 4 1 . 3 ]    (0, 2)    [ . 2 1 3 ]   1   +1    [ 3 2 . 4 1 ]
   113:   [ 4 2 1 . 3 ]    (0, 1)    [ 1 2 1 3 ]   0   +1    [ 3 2 1 4 . ]
   114:   [ 3 2 1 . 4 ]    (0, 4)    [ 1 2 . 3 ]   2   -1    [ 3 2 1 . 4 ]
   115:   [ 2 3 1 . 4 ]    (0, 1)    [ . 2 . 3 ]   0   -1    [ 3 2 . 1 4 ]
   116:   [ 1 3 2 . 4 ]    (0, 2)    [ . 1 . 3 ]   1   -1    [ 3 . 2 1 4 ]
   117:   [ 3 1 2 . 4 ]    (0, 1)    [ 1 1 . 3 ]   0   +1    [ 3 1 2 . 4 ]
   118:   [ 2 1 3 . 4 ]    (0, 2)    [ 1 . . 3 ]   1   -1    [ 3 1 . 2 4 ]
   119:   [ 1 2 3 . 4 ]    (0, 1)    [ . . . 3 ]   0   -1    [ 3 . 1 2 4 ]

Figure 10.9-D: Permutations with mixed radix numbers with radix vector [2, 3, 5, 4].

The rising and falling factorial numbers are special cases of factorial numbers with permuted digits. We give a method to compute the Gray code for permutations from the Gray code for permuted (falling) factorial numbers. A permutation of the radices determines how often a digit at any position is changed: the leftmost changes most often, the rightmost least often. The permutations corresponding to the mixed radix numbers with radix vector [2, 3, 5, 4], that is, the rising factorial base with the last two radices swapped, are shown in figure 10.9-D [FXT: comb/perm-gray-rot1-demo.cc].

The desired property of this ordering is that the last permutation is as close as possible to a cyclic shift by one position of the first. For even n, the last permutation of the Gray code with the rising factorial base is a cyclic shift by one of the first (see figure 10.9-B). For odd n no such Gray code exists: the total number of transpositions in any Gray code is n! − 1, which is odd for all n > 1, but for odd n the cyclic rotation by one corresponds to an even number of transpositions. The best we can get is that the first e elements are cyclically shifted by one position, where e ≤ n is the greatest possible even number.
For example,

              n=6:                n=7:
    first    [ 0 1 2 3 4 5 ]     [ 0 1 2 3 4 5 6 ]
    last     [ 1 2 3 4 5 0 ]     [ 1 2 3 4 5 0 6 ]

We use this ordering to show the general method [FXT: class perm_gray_rot1 in comb/perm-gray-rot1.h]:

    class perm_gray_rot1
    {
    public:
        mixedradix_gray *M_;  // Gray code for factorial numbers
        ulong n_;     // number of elements to permute
        ulong *x_;    // current permutation (of {0, 1, ..., n-1})
        ulong *ix_;   // inverse permutation
        ulong sw1_, sw2_;  // indices of elements swapped most recently

    public:
        perm_gray_rot1(ulong n)
        // Must have: n>=1
        {
            n_ = (n ? n : 1);  // at least one
            x_ = new ulong[n_];
            ix_ = new ulong[n_];
            M_ = new mixedradix_gray(n_-1, 1);  // rising factorial base

            // apply permutation of radix vector with mixed radix number:
            if ( (n_ >= 3) && (n & 1) )  // odd n>=3
            {
                ulong *m1 = M_->m1_;
                swap2(m1[n_-2], m1[n_-3]);  // swap last two factorial nines
            }

            first();
        }
        [--snip--]

The permutation applied here can be replaced by any other permutation of the radices; the following update routine will still work:

    bool next()
    {
        // Compute next mixed radix number in Gray code order:
        if ( false == M_->next() )  { first();  return false; }

        const ulong j = M_->pos();     // position of changed digit
        const ulong i1 = M_->m1_[j];   // valid for any permutation of factorial radices
        const ulong x1 = x_[i1];
        ulong i2 = i1,  x2;

        const int d = M_->dir();  // direction of change
        if ( d>0 )  // in the inverse permutation search first smaller element left:
        {
            for (x2=x1-1; ; --x2)  if ( (i2=ix_[x2]) < i1 )  break;
        }
        else  // in the inverse permutation search first greater element left:
        {
            for (x2=x1+1; ; ++x2)  if ( (i2=ix_[x2]) < i1 )  break;
        }

        x_[i1] = x2;  x_[i2] = x1;    // swap2(x_[i1], x_[i2]);
        ix_[x1] = i2;  ix_[x2] = i1;  // swap2(ix_[x1], ix_[x2]);
        sw1_=i2;  sw2_=i1;
        return true;
    }
    [--snip--]

Note that instead of taking j + 1 as the position of
the element to move, we take the value of the nine at the position j. The special ordering shown here can be used to construct a Gray code with the single track property, see section 10.12.2 on page 274.

10.10 Derangement order

In a derangement order for permutations two successive permutations have no element at the same position, as shown in figure 10.10-A. The listing was created with the program [FXT: comb/perm-derange-demo.cc]. There is no derangement order for n = 3.

           permutation     inverse perm.
     0:    [ . 1 2 3 ]     [ . 1 2 3 ]
     1:    [ 3 . 1 2 ]     [ 1 2 3 . ]
     2:    [ 1 2 3 . ]     [ 3 . 1 2 ]
     3:    [ 2 3 . 1 ]     [ 2 3 . 1 ]
     4:    [ 1 . 2 3 ]     [ 1 . 2 3 ]
     5:    [ 3 1 . 2 ]     [ 2 1 3 . ]
     6:    [ . 2 3 1 ]     [ . 3 1 2 ]
     7:    [ 2 3 1 . ]     [ 3 2 . 1 ]
     8:    [ 1 2 . 3 ]     [ 2 . 1 3 ]
     9:    [ 3 1 2 . ]     [ 3 1 2 . ]
    10:    [ 2 . 3 1 ]     [ 1 3 . 2 ]
    11:    [ . 3 1 2 ]     [ . 2 3 1 ]
    12:    [ 2 1 . 3 ]     [ 2 1 . 3 ]
    13:    [ 3 2 1 . ]     [ 3 2 1 . ]
    14:    [ 1 . 3 2 ]     [ 1 . 3 2 ]
    15:    [ . 3 2 1 ]     [ . 3 2 1 ]
    16:    [ 2 . 1 3 ]     [ 1 2 . 3 ]
    17:    [ 3 2 . 1 ]     [ 2 3 1 . ]
    18:    [ . 1 3 2 ]     [ . 1 3 2 ]
    19:    [ 1 3 2 . ]     [ 3 . 2 1 ]
    20:    [ . 2 1 3 ]     [ . 2 1 3 ]
    21:    [ 3 . 2 1 ]     [ 1 3 2 . ]
    22:    [ 2 1 3 . ]     [ 3 1 . 2 ]
    23:    [ 1 3 . 2 ]     [ 2 . 3 1 ]

Figure 10.10-A: The permutations of 4 elements in derangement order.

An implementation of the underlying algorithm (given in [298, p.611]) is [FXT: class perm_derange in comb/perm-derange.h]:

    class perm_derange
    {
    public:
        ulong n_;    // number of elements
        ulong *x_;   // current permutation
        ulong ctm_;  // counter modulo n
        perm_trotter* T_;

    public:
        perm_derange(ulong n)
        // Must have: n>=4
        // n=2: trivial,  n=3: no solution exists,  n>=4: ok
        {
            n_ = n;
            x_ = new ulong[n_];
            T_ = new perm_trotter(n_-1);
            first();
        }
        [--snip--]

The routine to update the permutation is

    bool next()
    {
        ++ctm_;
        if ( ctm_>=n_ )  // every n steps: need next perm_trotter
        {
            ctm_ = 0;
            if ( !
T_->next() ) return false; // current permutation is last const ulong *t = T_->data(); for (ulong k=0; k=2 { n_ = n; d_ = new ulong[n_]; d_[n-1] = 1; // sentinel (must be nonzero) x_ = new ulong[n_]; 268 Chapter 10: Permutations -----------------P=[3] --> [2, 3] --> [3, 2] -----------------P=[2, 3] --> [1, 2, 3] --> [2, 1, 3] --> [2, 3, 1] P=[3, 2] --> [1, 3, 2] --> [3, 1, 2] --> [3, 2, 1] -----------------P=[1, 2, 3] --> [0, 1, 2, 3] --> [1, 0, 2, 3] --> [1, 2, 0, 3] --> [1, 2, 3, 0] P=[2, 1, 3] --> [0, 2, 1, 3] --> [2, 0, 1, 3] --> [2, 1, 0, 3] --> [2, 1, 3, 0] P=[2, 3, 1] --> [0, 2, 3, 1] --> [2, 0, 3, 1] --> [2, 3, 0, 1] --> [2, 3, 1, 0] P=[1, 3, 2] --> [0, 1, 3, 2] --> [1, 0, 3, 2] --> [1, 3, 0, 2] --> [1, 3, 2, 0] P=[3, 1, 2] --> [0, 3, 1, 2] --> [3, 0, 1, 2] --> [3, 1, 0, 2] --> [3, 1, 2, 0] perm(4)== [0, 1, 2, 3] [1, 0, 2, 3] [1, 2, 0, 3] [1, 2, 3, 0] [0, 2, 1, 3] [2, 0, 1, 3] [2, 1, 0, 3] [2, 1, 3, 0] [0, 2, 3, 1] [2, 0, 3, 1] [2, 3, 0, 1] [2, 3, 1, 0] [0, 1, 3, 2] [1, 0, 3, 2] [1, 3, 0, 2] [1, 3, 2, 0] [0, 3, 1, 2] [3, 0, 1, 2] [3, 1, 0, 2] [3, 1, 2, 0] [0, 3, 2, 1] [3, 0, 2, 1] [3, 2, 0, 1] [3, 2, 1, 0] P=[3, 2, 1] --> [0, 3, 2, 1] --> [3, 0, 2, 1] --> [3, 2, 0, 1] --> [3, 2, 1, 0] Figure 10.11-A: Interleaving process to generate all permutations by right moves. 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: permutation [ . 1 2 3 ] [ 1 . 2 3 ] [ 1 2 . 3 ] [ 1 2 3 . ] [ . 2 1 3 ] [ 2 . 1 3 ] [ 2 1 . 3 ] [ 2 1 3 . ] [ . 2 3 1 ] [ 2 . 3 1 ] [ 2 3 . 1 ] [ 2 3 1 . ] [ . 1 3 2 ] [ 1 . 3 2 ] [ 1 3 . 2 ] [ 1 3 2 . ] [ . 3 1 2 ] [ 3 . 1 2 ] [ 3 1 . 2 ] [ 3 1 2 . ] [ . 3 2 1 ] [ 3 . 2 1 ] [ 3 2 . 1 ] [ 3 2 1 . ] ffact [ . . . ] [ 1 . . ] [ 2 . . ] [ 3 . . ] [ . 1 . ] [ 1 1 . ] [ 2 1 . ] [ 3 1 . ] [ . 2 . ] [ 1 2 . ] [ 2 2 . ] [ 3 2 . ] [ . . 1 ] [ 1 . 1 ] [ 2 . 1 ] [ 3 . 1 ] [ . 1 1 ] [ 1 1 1 ] [ 2 1 1 ] [ 3 1 1 ] [ . 2 1 ] [ 1 2 1 ] [ 2 2 1 ] [ 3 2 1 ] inv. perm. [ . 1 2 3 ] [ 1 . 2 3 ] [ 2 . 1 3 ] [ 3 . 1 2 ] [ . 
2 1 3 ] [ 1 2 . 3 ] [ 2 1 . 3 ] [ 3 1 . 2 ] [ . 3 1 2 ] [ 1 3 . 2 ] [ 2 3 . 1 ] [ 3 2 . 1 ] [ . 1 3 2 ] [ 1 . 3 2 ] [ 2 . 3 1 ] [ 3 . 2 1 ] [ . 2 3 1 ] [ 1 2 3 . ] [ 2 1 3 . ] [ 3 1 2 . ] [ . 3 2 1 ] [ 1 3 2 . ] [ 2 3 1 . ] [ 3 2 1 . ] Figure 10.11-B: All permutations of 4 elements and falling factorial numbers used to update the permutations. Dots denote zeros. 10.11: Orders where the smallest element always moves right 17 18 19 20 21 22 23 24 25 26 27 269 first(); } [--snip--] void first() { for (ulong k=0; k= 2 { n_ = n; p_ = new ulong[n_]; ip_ = new ulong[n_]; first(); } [--snip--] The computation of the successor is 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 bool next() { ulong e1 = 0, u = n_ - 1; do { const ulong i1 = ip_[e1]; const ulong i2 = (i1==u ? e1 : i1+1 ); const ulong e2 = p_[i2]; p_[i1] = e2; p_[i2] = e1; ip_[e1] = i2; ip_[e2] = i1; if ( (p_[e1]!=e1) || (p_[u]!=u) ) ++e1; --u; } return true; 10.12: Single track orders 17 18 19 20 21 271 while ( u > e1 ); return false; } [--snip--] The rate of generation is about 180 M/s [FXT: comb/perm-ives-demo.cc]. Using arrays instead of pointers increases the rate to about 190 M/s. As the easy case with the update (when just the first element is moved) occurs so often it is natural to create an extra branch for it. If the define for PERM_IVES_OPT is made before the class definition, a counter is created: 1 2 3 4 5 6 7 8 class perm_ives { [--snip--] #ifdef PERM_IVES_OPT ulong ctm_; // aux: counter for easy case ulong ctm0_; // aux: start value of ctm == n*(n-1)-1 #endif [--snip--] If the counter is nonzero, the following update can be used: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 bool next() { if ( ctm_-- ) // easy case { const ulong i1 = ip_[0]; // e1 == 0 const ulong i2 = (i1==n_-1 ? 
0 : i1+1); const ulong e2 = p_[i2]; p_[i1] = e2; p_[i2] = 0; ip_[0] = i2; ip_[e2] = i1; return true; } ctm_ = ctm0_; [--snip--] } // rest as before If arrays are used, a minimal speedup is achieved (rate 192 M/s), if pointers are used, the effect is a notable slowdown (rate 163 M/s). The greatest speedup comes from a modification of a condition in the loop: if ( (p_[e1]^e1) | (p_[u]^u) ) return true; // same as: if ( (p_[e1]!=e1) || (p_[u]!=u) ) return true; The rate is increased to almost 194 M/s. This optimization is activated by defining PERM_IVES_OPT2. 10.12 Single track orders Figure 10.12-A shows a single track order for the permutations of four elements. Each column in the list of permutations is a cyclic shift of the first column. A recursive construction for the ordering is shown in figure 10.12-B. Figure 10.12-A was created with the program [FXT: comb/perm-st-demo.cc] which uses [FXT: class perm st in comb/perm-st.h]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 class perm_st { public: ulong *d_; // mixed radix digits with radix = [2, 3, 4, ..., n-1, (sentinel=-1)] ulong *p_; // permutation ulong *pi_; // inverse permutation ulong n_; // permutations of n elements public: perm_st(ulong n) { n_ = n; d_ = new ulong[n_]; p_ = new ulong[n_]; pi_ = new ulong[n_]; 272 Chapter 10: Permutations permutation [ . 2 3 1 ] [ . 3 2 1 ] [ . 3 1 2 ] [ . 2 1 3 ] [ . 1 2 3 ] [ . 1 3 2 ] [ 1 . 2 3 ] [ 1 . 3 2 ] [ 2 . 3 1 ] [ 3 . 2 1 ] [ 3 . 1 2 ] [ 2 . 1 3 ] [ 3 1 . 2 ] [ 2 1 . 3 ] [ 1 2 . 3 ] [ 1 3 . 2 ] [ 2 3 . 1 ] [ 3 2 . 1 ] [ 2 3 1 . ] [ 3 2 1 . ] [ 3 1 2 . ] [ 2 1 3 . ] [ 1 2 3 . ] [ 1 3 2 . ] 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: [ . . . ] [ 1 . . ] [ . 1 . ] [ 1 1 . ] [ . 2 . ] [ 1 2 . ] [ . . 1 ] [ 1 . 1 ] [ . 1 1 ] [ 1 1 1 ] [ . 2 1 ] [ 1 2 1 ] [ . . 2 ] [ 1 . 2 ] [ . 1 2 ] [ 1 1 2 ] [ . 2 2 ] [ 1 2 2 ] [ . . 3 ] [ 1 . 3 ] [ . 1 3 ] [ 1 1 3 ] [ . 2 3 ] [ 1 2 3 ] inv. perm. [ . 3 1 2 ] [ . 3 2 1 ] [ . 2 3 1 ] [ . 2 1 3 ] [ . 
1 2 3 ] [ . 1 3 2 ] [ 1 . 2 3 ] [ 1 . 3 2 ] [ 1 3 . 2 ] [ 1 3 2 . ] [ 1 2 3 . ] [ 1 2 . 3 ] [ 2 1 3 . ] [ 2 1 . 3 ] [ 2 . 1 3 ] [ 2 . 3 1 ] [ 2 3 . 1 ] [ 2 3 1 . ] [ 3 2 . 1 ] [ 3 2 1 . ] [ 3 1 2 . ] [ 3 1 . 2 ] [ 3 . 1 2 ] [ 3 . 2 1 ] Figure 10.12-A: Permutations of 4 elements in single track order. Dots denote zeros. 23 32 <--= permutations of 2 elements 11 23 32 <--= concatenate rows and prepend new element 112332 321123 233211 <--= shift <--= shift <--= shift 0 2 4 000000 112332 321123 233211 <--= concatenate rows and prepend new element 000000 112332 321123 233211 233211 000000 112332 321123 321123 233211 000000 112332 112332 321123 233211 000000 <--= shift 0 <--= shift 6 <--= shift 12 <--= shift 18 Figure 10.12-B: Construction of the single track order for permutations of 4 elements. 16 17 18 19 d_[n-1] = -1UL; first(); // sentinel } [--snip--] The first permutation is in enup order (see section 6.6.1 on page 186): 1 2 3 4 5 6 7 8 9 10 11 12 13 14 const ulong *data() const { return p_; } const ulong *invdata() const { return pi_; } void first() { for (ulong k=0; k=2 { n_ = (n>=2 ? n : 2); G = new perm_gray_rot1(n-1); x_ = new ulong[n_]; ix_ = new ulong[n_]; first(); } [--snip--] void first() { G->first(); for (ulong j=0; jnext(); if ( q ) // normal update (in underlying permutation of n-1 elements) { ulong i1, i2; // positions of swaps G->get_swap(i1, i2); // rotate positions according to sct: i1 += sct_; if ( i1>=n_ ) i1-=n_; i2 += sct_; if ( i2>=n_ ) i2-=n_; swap_positions(i1, i2); return true; } The infrequent case happens when the last underlying permutation is encountered: 1 2 3 4 5 6 7 8 9 10 11 12 else // goto next cyclic shift (once in (n-1)! 
updates, n-1 times in total)
        {
            G->first();  // restart underlying permutations
            --sct_;      // adjust cyclic shift
            swap_elements(0, n_-1);
            if ( 0==(n_&1) )  // n even
                if ( n_>=4 )  swap_elements(n_-2, n_-1);  // one extra transposition
            return ( 0!=sct_ );
        }
    }

Chapter 11

Permutations with special properties

11.1 The number of certain permutations

We give expressions for the number of permutations with special properties, such as involutions, derangements, permutations with prescribed cycle types, and permutations with distance restrictions.

11.1.1 Permutations with m cycles: Stirling cycle numbers

     n:    total   m=     1       2       3      4      5     6    7   8  9
     1:        1          1
     2:        2          1       1
     3:        6          2       3       1
     4:       24          6      11       6      1
     5:      120         24      50      35     10      1
     6:      720        120     274     225     85     15     1
     7:     5040        720    1764    1624    735    175    21    1
     8:    40320       5040   13068   13132   6769   1960   322   28   1
     9:   362880      40320  109584  118124  67284  22449  4536  546  36  1

Figure 11.1-A: Stirling numbers of the first kind s(n, m) (Stirling cycle numbers).

The number of permutations of n elements into m cycles is given by the (unsigned) Stirling numbers of the first kind (or Stirling cycle numbers) s(n, m). The first few are shown in figure 11.1-A which was created with the program [FXT: comb/stirling1-demo.cc]. We have s(1, 1) = 1 and

\[ s(n, m) = s(n-1, m-1) + (n-1)\, s(n-1, m) \tag{11.1-1} \]

See entry A008275 in [312] and [1, p.824]. Many identities involving the Stirling numbers are given in [166, pp.243-253]. We note just a few, writing S(n, k) for the Stirling set numbers (see section 17.2 on page 358):

\[ x^n = \sum_{k=0}^{n} S(n, k)\, x^{\underline{k}} = \sum_{k=0}^{n} S(n, k)\, (-1)^{n-k}\, x^{\overline{k}} \tag{11.1-2a} \]

where $x^{\underline{k}} = x\,(x-1)\,(x-2) \cdots (x-k+1)$ and $x^{\overline{k}} = x\,(x+1)\,(x+2) \cdots (x+k-1)$.
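Relation 11.1-1 translates directly into a table computation. The following is a minimal sketch, not the FXT program mentioned above; the function name `stirling_cycle_table` and the convention s(0, 0) = 1 are assumptions made for this illustration. With 64-bit integers the rows stay exact for n ≤ 20, because the entries of row n sum to n!.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of FXT): table of unsigned Stirling numbers
// of the first kind, s[n][m] == number of permutations of n elements with
// exactly m cycles, for 0 <= m <= n <= nmax.
std::vector<std::vector<std::uint64_t>> stirling_cycle_table(unsigned nmax)
{
    std::vector<std::vector<std::uint64_t>> s(
        nmax + 1, std::vector<std::uint64_t>(nmax + 1, 0));
    s[0][0] = 1;  // assumed convention: the empty permutation has zero cycles
    for (unsigned n = 1; n <= nmax; ++n)
        for (unsigned m = 1; m <= n; ++m)
            s[n][m] = s[n-1][m-1] + (n - 1) * s[n-1][m];  // relation 11.1-1
    return s;
}
```

Calling `stirling_cycle_table(9)` should reproduce the rows of figure 11.1-A, for example the entry 118124 for n = 9 and m = 3.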
Also

\[ x^{\underline{n}} = \sum_{k=0}^{n} s(n, k)\, (-1)^{n-k}\, x^k \tag{11.1-2b} \]

\[ x^{\overline{n}} = \sum_{k=0}^{n} s(n, k)\, x^k \tag{11.1-2c} \]

With $D := \frac{d}{dz}$ and $\vartheta := z\,\frac{d}{dz}$, we have the operator identities [166, p.296]

\[ \vartheta^n = \sum_{k=0}^{n} S(n, k)\, z^k D^k \tag{11.1-3a} \]

\[ z^n D^n = \sum_{k=0}^{n} s(n, k)\, (-1)^{n-k}\, \vartheta^k \tag{11.1-3b} \]

11.1.2 Permutations with prescribed cycle type

A permutation of n elements is of type C = [c_1, c_2, c_3, ..., c_n] if it has c_1 fixed points, c_2 cycles of length 2, c_3 cycles of length 3, and so on. The number Z_{n,C} of permutations of n elements with type C equals [62, p.80]

\[ Z_{n,C} = \frac{n!}{c_1!\, c_2!\, c_3! \cdots c_n!\; 1^{c_1}\, 2^{c_2}\, 3^{c_3} \cdots n^{c_n}} = \frac{n!}{\prod_{k=1}^{n} \left( c_k!\, k^{c_k} \right)} \tag{11.1-4} \]

We necessarily have n = 1 c_1 + 2 c_2 + ... + n c_n, that is, the c_j correspond to an integer partition of n.

The exponential generating function exp(L(z)) where

\[ L(z) = \sum_{k=1}^{\infty} \frac{t_k\, z^k}{k} \tag{11.1-5a} \]

gives detailed information about all cycle types:

\[ \exp(L(z)) = \sum_{n=0}^{\infty} \left[ \sum_{C} Z_{n,C} \prod_{k} t_k^{c_k} \right] \frac{z^n}{n!} \tag{11.1-5b} \]

That is, the exponent of t_k indicates how many cycles of length k are present in the given cycle type:

    ? n=8;R=O(z^(n+1));
    ? L=sum(k=1,n,eval(Str("t"k))*z^k/k)+R
    t1*z + 1/2*t2*z^2 + 1/3*t3*z^3 + 1/4*t4*z^4 + [...] + 1/8*t8*z^8 + O(z^9)
    ? serlaplace(exp(L))
    1
    + t1 *z
    + (t1^2 + t2) *z^2
    + (t1^3 + 3*t2*t1 + 2*t3) *z^3
    + (t1^4 + 6*t2*t1^2 + 8*t3*t1 + 3*t2^2 + 6*t4) *z^4
    + (t1^5 + 10*t2*t1^3 + 20*t3*t1^2 + 15*t1*t2^2 + 30*t1*t4 + 20*t3*t2 + 24*t5) *z^5
    + (t1^6 + 15*t2*t1^4 + 40*t3*t1^3 + [...] + 15*t2^3 + 90*t4*t2 + 40*t3^2 + 120*t6) *z^6
    + (t1^7 + 21*t2*t1^5 + 70*t3*t1^4 + [...] + 504*t5*t2 + 420*t4*t3 + 720*t7) *z^7
    + (t1^8 + 28*t2*t1^6 + 112*t3*t1^5 + [...] + 2688*t5*t3 + 1260*t4^2 + 5040*t8) *z^8
    + O(z^9)

Relation 11.1-5a is obtained by replacing t_k by (k − 1)! t_k in relation 17.2-7a on page 359 (for the EGF for set partitions of given type), which takes the order of the elements in each cycle into account.
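The count of permutations of a given cycle type can be evaluated directly from the product form. The sketch below is an illustration only; the function name `cycle_type_count` and the zero-based storage of the type vector (entry k−1 holds c_k) are assumptions, and the plain 64-bit arithmetic limits it to n ≤ 20.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of FXT): number of permutations with cycle
// type c, where c[k-1] is the number of cycles of length k.
// Evaluates n! / prod_k ( c_k! * k^(c_k) ) with n = sum_k k*c_k.
std::uint64_t cycle_type_count(const std::vector<unsigned> &c)
{
    const unsigned u = static_cast<unsigned>(c.size());

    unsigned n = 0;
    for (unsigned k = 1; k <= u; ++k)  n += k * c[k-1];

    std::uint64_t num = 1;  // n!
    for (unsigned j = 2; j <= n; ++j)  num *= j;

    std::uint64_t den = 1;  // product of c_k! * k^(c_k)
    for (unsigned k = 1; k <= u; ++k)
        for (unsigned j = 1; j <= c[k-1]; ++j)
            den *= static_cast<std::uint64_t>(j) * k;  // j runs over c_k!, k over k^(c_k)

    return num / den;  // the quotient is always an integer
}
```

For n = 4 the type with two 2-cycles gives `cycle_type_count({0, 2, 0, 0}) == 3`, matching the term 3*t2^2 in the coefficient of z^4 of the series shown above.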
11.1.3 Prefix conditions

Some types of permutations can be generated efficiently by a routine that produces the lexicographically ordered list of permutations subject to conditions for all prefixes. The implementation (following [215, alg.X, sect.7.2.1.2]) is [FXT: class perm_restrpref in comb/perm-restrpref.h]. The condition has to be supplied (as a function pointer) at creation of a class instance. The program [FXT: comb/perm-restrpref-demo.cc] demonstrates the usage. It can be used to generate all involutions, up-down permutations, connected permutations, or derangements, see figure 11.1-B.

      involutions       up-down           connected         derangements
       1:  1 2 3 4       1:  1 3 2 4       1:  2 3 4 1       1:  2 1 4 3
       2:  1 2 4 3       2:  1 4 2 3       2:  2 4 1 3       2:  2 3 4 1
       3:  1 3 2 4       3:  2 3 1 4       3:  2 4 3 1       3:  2 4 1 3
       4:  1 4 3 2       4:  2 4 1 3       4:  3 1 4 2       4:  3 1 4 2
       5:  2 1 3 4       5:  3 4 1 2       5:  3 2 4 1       5:  3 4 1 2
       6:  2 1 4 3      #perm = 5          6:  3 4 1 2       6:  3 4 2 1
       7:  3 2 1 4                         7:  3 4 2 1       7:  4 1 2 3
       8:  3 4 1 2                         8:  4 1 2 3       8:  4 3 1 2
       9:  4 2 3 1                         9:  4 1 3 2       9:  4 3 2 1
      10:  4 3 2 1                        10:  4 2 1 3      #perm = 9
      #perm = 10                          11:  4 2 3 1
                                          12:  4 3 1 2
                                          13:  4 3 2 1
                                          #perm = 13

Figure 11.1-B: Examples of permutations subject to conditions on prefixes. From left to right: involutions, up-down permutations, connected permutations, and derangements.

11.1.3.1 Involutions

The sequence of numbers of involutions (self-inverse permutations), I(n), starts as (n ≥ 1)

    1, 2, 4, 10, 26, 76, 232, 764, 2620, 9496, 35696, 140152, 568504, 2390480, ...

This is sequence A000085 in [312]. The first element in an involution can be a fixed point or a 2-cycle with any of the n − 1 other elements, so

\[ I(n) = I(n-1) + (n-1)\, I(n-2) \tag{11.1-6} \]

    N=20; v=vector(N); v[1]=1; v[2]=2;
    for(n=3,N,v[n]=v[n-1]+(n-1)*v[n-2]); v
    \\ == [1, 2, 4, 10, 26, 76, ... ]

Let h_n(x) be the polynomial such that the coefficient of x^k gives the number of involutions of n elements with k fixed points.
The polynomials can be computed recursively via $h_{n+1} = h_n' + x\, h_n$ (starting with $h_0 = 1$). We have h_n(1) = I(n):

    ? h=1;for(k=1,8,h=(deriv(h)+x*h);print(subst(h,x,1),": ",h))
    1: x
    2: x^2 + 1
    4: x^3 + 3*x
    10: x^4 + 6*x^2 + 3
    26: x^5 + 10*x^3 + 15*x
    76: x^6 + 15*x^4 + 45*x^2 + 15
    232: x^7 + 21*x^5 + 105*x^3 + 105*x
    764: x^8 + 28*x^6 + 210*x^4 + 420*x^2 + 105

The exponential generating function (EGF) is

\[ \sum_{k=0}^{\infty} \frac{I(k)\, x^k}{k!} = \exp\left( x + x^2/2 \right) \tag{11.1-7} \]

We further have (set t_1 = t, t_2 = 1, and t_k = 0 for k ≥ 3 in 11.1-5a)

\[ \sum_{k=0}^{\infty} \frac{h_k(t)\, x^k}{k!} = \exp\left( t\, x + x^2/2 \right) \tag{11.1-8} \]

The EGF for the number of permutations whose m-th power is the identity is [359, p.85]:

\[ \exp\left( \sum_{d \mid m} x^d / d \right) \tag{11.1-9} \]

The special case m = 2 gives relation 11.1-7.

The condition function for involutions is

    bool cond_inv(const ulong *a, ulong k)
    {
        ulong ak = a[k];
        // a[k] points back into the prefix, so the 2-cycle must close:
        if ( (ak<=k) && (a[ak]!=k) )  return false;
        return true;
    }

The recurrence 11.1-6 can be generalized for permutations where only cycles of certain lengths are allowed. Set t_k = 1 if cycles of length k are allowed, else set t_k = 0. The recurrence relation for P_T(n), the number of permutations corresponding to the vector T = [t_1, t_2, ..., t_u], is (by relation 11.1-1)

\[ P_T(n) = \sum_{k=1}^{u} t_k\, F(n-1, k-1)\, P_T(n-k) \tag{11.1-10a} \]

where

\[ F(n-1, e) := (n-1)\,(n-2)\,(n-3) \cdots (n-e) \quad \text{and} \quad F(n-1, 0) := 1 \tag{11.1-10b} \]

Here F(n − 1, e) is a product of e factors, the number of ways to choose and arrange the e other elements in the cycle containing the first element. Initialize by setting P_T(0) = 1 and P_T(n) = 0 for n < 0. For example, if only cycles of length 1 or 3 are allowed (t_1 = t_3 = 1, else t_k = 0), the recurrence is

\[ P(n) = P(n-1) + (n-1)\,(n-2)\, P(n-3) \tag{11.1-11} \]

The sequence of numbers of these permutations (whose order divides 3) is entry A001470 in [312]:

    1, 1, 1, 3, 9, 21, 81, 351, 1233, 5769, 31041, 142011, 776601, 4874013, ...
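The generalized recurrence 11.1-10a can be checked numerically with a few lines of code. This is a sketch under stated assumptions: the function name `count_by_cycle_lengths` and the encoding of the vector T as a `std::vector<bool>` indexed by cycle length (index 0 unused) are choices made for this illustration, and 64-bit integers restrict it to n ≤ 20.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of FXT): P[n] == number of permutations of
// n elements in which every cycle length k satisfies allowed[k] == true.
// allowed[0] is ignored; lengths beyond allowed.size()-1 count as forbidden.
std::vector<std::uint64_t>
count_by_cycle_lengths(const std::vector<bool> &allowed, unsigned nmax)
{
    std::vector<std::uint64_t> P(nmax + 1, 0);
    P[0] = 1;
    for (unsigned n = 1; n <= nmax; ++n)
    {
        std::uint64_t F = 1;  // F(n-1, k-1): empty product for k == 1
        for (unsigned k = 1; k <= n; ++k)
        {
            if (k < allowed.size() && allowed[k])  P[n] += F * P[n-k];
            F *= (n - k);  // now F == (n-1)(n-2)...(n-k), used for cycle length k+1
        }
    }
    return P;
}
```

With `allowed = {false, true, true}` the values P[1], P[2], ... are the involution numbers 1, 2, 4, 10, 26, ..., and with `allowed = {false, true, false, true}` the result starts 1, 1, 1, 3, 9, 21, 81, 351 (beginning at n = 0), in agreement with relation 11.1-11.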
11.1.3.2 Derangements

A permutation is a derangement if a_k ≠ k for all k:

    bool cond_derange(const ulong *a, ulong k)
    { return ( a[k] != k ); }

The sequence D(n) of the number of derangements starts as (n ≥ 1)

    0, 1, 2, 9, 44, 265, 1854, 14833, 133496, 1334961, 14684570, 176214841, ...

This is sequence A000166 in [312], the subfactorial numbers. Compute D(n) using either of

\[ D(n) = (n-1)\, \left[ D(n-1) + D(n-2) \right] \tag{11.1-12a} \]

\[ D(n) = n\, D(n-1) + (-1)^n \tag{11.1-12b} \]

\[ D(n) = \sum_{k=0}^{n} (-1)^k\, \frac{n!}{k!} = n! \sum_{k=0}^{n} \frac{(-1)^{n-k}}{(n-k)!} \tag{11.1-12c} \]

\[ D(n) = \left\lfloor (n! + 1)/e \right\rfloor \quad \text{for } n \geq 1, \text{ where } e = \exp(1) \tag{11.1-12d} \]

We use the recurrence 11.1-12a:

    N=20; v=vector(N); v[1]=0; v[2]=1;
    for(n=3,N,v[n]=(n-1)*(v[n-1]+v[n-2])); v
    \\ == [0, 1, 2, 9, 44, 265, 1854, 14833, ... ]

The exponential generating function can be found by setting t_1 = 0 and t_k = 1 for k ≠ 1 in relation 11.1-5a: we have L(z) = log(1/(1 − z)) − z and

\[ \sum_{n=0}^{\infty} \frac{D(n)\, z^n}{n!} = \exp L(z) = \frac{\exp(-z)}{1-z} \tag{11.1-13} \]

The number of derangements with prescribed first element is K(n) := D(n)/(n − 1). The sequence of values K(n) for n ≥ 2, entry A000255 in [312], starts as

    1, 1, 3, 11, 53, 309, 2119, 16687, 148329, 1468457, 16019531, 190899411, ...

Writing a(n) := K(n + 2) (the indexing of A000255, with a(0) = a(1) = 1), we have a(n) = n a(n − 1) + (n − 1) a(n − 2). The number K(n) also counts the permutations of n − 1 elements with no occurrence of [x, x + 1], see figure 11.1-C. The condition used is

    bool cond_xx1(const ulong *a, ulong k)
    {
        if ( k==1 )  return true;
        return ( a[k-1] != a[k]-1 );  // must look backward
    }

Note that the routine is for the permutations of the elements 1, 2, ..., n in a one-based array.
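As a cross-check of relations 11.1-12a and 11.1-12b, the two recurrences can be run side by side. The following is only a sketch; the function names `derangement_numbers` and `derangement_number_alt` and the start values D(0) = 1, D(1) = 0 are assumptions of this illustration. All values fit into 64 bits for n ≤ 20.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of FXT): D[n] for 0 <= n <= nmax,
// using D(n) = (n-1) * (D(n-1) + D(n-2)), relation 11.1-12a.
std::vector<std::uint64_t> derangement_numbers(unsigned nmax)
{
    std::vector<std::uint64_t> D(nmax + 1, 0);
    D[0] = 1;  // assumed start values: D(0) = 1, D(1) = 0
    for (unsigned n = 2; n <= nmax; ++n)
        D[n] = (n - 1) * (D[n-1] + D[n-2]);
    return D;
}

// Hypothetical helper: the same number via D(n) = n*D(n-1) + (-1)^n,
// relation 11.1-12b.
std::uint64_t derangement_number_alt(unsigned n)
{
    std::uint64_t d = 1;  // D(0)
    for (unsigned j = 1; j <= n; ++j)
        d = (j & 1) ? (j * d - 1) : (j * d + 1);  // odd j: subtract one, even j: add one
    return d;
}
```

Both routines should agree for every n in range, and `derangement_numbers(5)[5] / 4` gives 11, the number of rows in each list of figure 11.1-C.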
11.1: The number of certain permutations no [x, x+1] 1: 1 3 2 4 2: 1 4 3 2 3: 2 1 4 3 4: 2 4 1 3 5: 2 4 3 1 6: 3 1 4 2 7: 3 2 1 4 8: 3 2 4 1 9: 4 1 3 2 10: 4 2 1 3 11: 4 3 2 1 281 derangements with p(1)=2 1: 2 1 4 5 3 2: 2 1 5 3 4 3: 2 3 1 5 4 4: 2 3 4 5 1 5: 2 3 5 1 4 6: 2 4 1 5 3 7: 2 4 5 1 3 8: 2 4 5 3 1 9: 2 5 1 3 4 10: 2 5 4 1 3 11: 2 5 4 3 1 Figure 11.1-C: Permutations of 4 elements with no occurrence of [x, x + 1] (left) and derangements of 5 elements starting with 2. 11.1.3.3 Connected permutations The connected (or indecomposable) permutations satisfy, for k = 0, 1, . . . , n − 2, the inequality of sets {a0 , a1 , . . . , ak } 6= {0, 1, . . . , k} (11.1-14) That is, there is no prefix of length < n which is a permutation of itself. The condition function is 1 2 3 4 5 6 7 8 ulong N; // set to n in main() bool cond_indecomp(const ulong *a, ulong k) // indecomposable condition: {a1,...,ak} != {1,...,k} for all kk ) return true; return false; } The sequence of numbers C(n) of indecomposable permutations starts as (n ≥ 1) 1, 1, 3, 13, 71, 461, 3447, 29093, 273343, 2829325, 31998903, 392743957, ... This is entry A003319 in [312]. Compute C(n) using C(n) = n! − n−1 X k! C(n − k) (11.1-15) k=1 N=20; v=vector(N); for(n=1,N,v[n]=n!-sum(k=1,n-1,k!*v[n-k])); v \\ == [1, 1, 3, 13, 71, 461, 3447, ... ] The ordinary generating function can be given as ∞ X C(n) z n = 1 − P∞ 1 k k=0 k! z n=1 = z + z 2 + 3 z 3 + 13 z 4 + 71 z 5 + . . . (11.1-16) The following recursion (and a Gray code for the connected permutations) is given in [205]: C(n) = n−1 X (n − k) (k − 1)! C(n − k) (11.1-17) k=1 11.1.3.4 Alternating permutations The alternating permutations (or up-down permutations) satisfy a0 < a1 > a2 < a3 > . . .. The condition function is 1 2 3 4 5 6 7 bool cond_updown(const ulong *a, ulong k) // up-down condition: a1 < a2 > a3 < a4 > ... 
{ if ( k<2 ) return true; if ( (k%2) ) return ( a[k]a[k-1] ); } 282 Chapter 11: Permutations with special properties The sequence A(n) of the number of alternating permutations starts as (n ≥ 1) 1, 1, 2, 5, 16, 61, 272, 1385, 7936, 50521, 353792, 2702765, 22368256, ... It is sequence A000111 in [312], the sequence of the Euler numbers. The list can be computed using the relation  n−1  1 X n−1 A(n) = A(k) A(n − 1 − k) (11.1-18) 2 k k=0 N=20; v=vector(N+1); v[0+1]=1; v[1+1]=1; v[2+1]=1; \\ start with zero: v[x] == A(x-1) for(n=3,N,v[n+1]=1/2*sum(k=0,n-1,binomial(n-1,k)*v[k+1]*v[n-1-k+1])); v \\ == [1, 1, 1, 2, 5, 16, 61, 272, ... ] An exponential generating function is 1 + sin(z) cos(z) = ∞ X A(k) z k k=0 (11.1-19) k! ? serlaplace((1+sin(z))/cos(z)) 1 + z + z^2 + 2*z^3 + 5*z^4 + 16*z^5 + 61*z^6 + 272*z^7 + 1385*z^8 + 7936*z^9 + ... 11.2 Permutations with distance restrictions We present constructions for Gray codes for permutations with certain restrictions. These are computed from Gray codes of mixed radix numbers with factorial base. We write p(k) for the position of the element k in a given permutation. 11.2.1 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: Permutations where p(k) ≤ k + 1 ffact . 3 . . . 2 . . . 1 . . . 1 . 1 . . . 1 . . . . . . 1 . . . 2 . 1 . 2 . 1 . 1 . 1 . . . 1 . . 1 2 . . 1 2 . . . 3 . . . 4 . . . perm [ 0 4 1 2 3 ] [ 0 3 1 2 4 ] [ 0 2 1 3 4 ] [ 0 2 1 4 3 ] [ 0 1 2 4 3 ] [ 0 1 2 3 4 ] [ 0 1 3 2 4 ] [ 0 1 4 2 3 ] [ 1 0 4 2 3 ] [ 1 0 3 2 4 ] [ 1 0 2 3 4 ] [ 1 0 2 4 3 ] [ 2 0 1 4 3 ] [ 2 0 1 3 4 ] [ 3 0 1 2 4 ] [ 4 0 1 2 3 ] inv. perm [ 0 2 3 4 1 ] [ 0 2 3 1 4 ] [ 0 2 1 3 4 ] [ 0 2 1 4 3 ] [ 0 1 2 4 3 ] [ 0 1 2 3 4 ] [ 0 1 3 2 4 ] [ 0 1 3 4 2 ] [ 1 0 3 4 2 ] [ 1 0 3 2 4 ] [ 1 0 2 3 4 ] [ 1 0 2 4 3 ] [ 1 2 0 4 3 ] [ 1 2 0 3 4 ] [ 1 2 3 0 4 ] [ 1 2 3 4 0 ] ffact(inv) . 1 1 1 . 1 1 . . 1 . . . 1 . 1 . . . 1 . . . . . . 1 . . . 1 1 1 . 1 1 1 . 1 . 1 . . . 1 . . 1 1 1 . 1 1 1 . . 1 1 1 . 
1 1 1 1 Figure 11.2-A: Gray code for the permutations of 5 elements where no element lies more than one place to the right of its position in the identical permutation. Let M (n) be the number of permutations of n elements where no element can move more than one place to the right. We have M (n) = 2n−1 , see entry A000079 in [312]. A Gray code for these permutations is shown in figure 11.2-A which was created with the program [FXT: comb/perm-right1-gray-demo.cc]. M (n) also counts the permutations that start as a rising sequence (ending in the maximal element) and end as a falling sequence. The list in the leftmost column of figure 11.2-A can be generated by the recursion 1 2 3 4 5 6 void Y_rec(ulong d, bool z) { if ( d>=n ) visit(); else { if ( z ) // forward: 11.2: Permutations with distance restrictions 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 283 { // words 0, 10, 200, 3000, 40000, ... ulong k = 0; do { ff[d] = k; Y_rec(d+k+1, !z); } while ( ++k <= (n-d) ); } else // backward: { // words ..., 40000, 3000, 200, 10, 0 ulong k = n-d+1; do { --k; ff[d] = k; Y_rec(d+k+1, !z); } while ( k != 0 ); } } } The array ff (of length n) must be initialized with zeros and the initial call is Y_rec(0, true);. About 85 million words per second are generated. In the inverse permutations (where no element is more than one place left of its original position) the swaps are adjacent and their position is determined by the ruler function. Therefore the inverse permutations can be generated using [FXT: class ruler func in comb/ruler-func.h] which is described in section 8.2.3 on page 207. 11.2.2 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: Permutations where k − 1 ≤ p(k) ≤ k + 1 ffact 1 . . 1 . . 1 . . 1 . 1 1 . . . . 1 1 . . . . . 1 . . . 1 . 1 . 1 . 1 . 1 . 1 . . . 1 . 1 . . 1 . . 1 . . 1 . . 1 . . . . . 1 . 1 . . . . . 1 . . . . . . . 
perm [ 1 0 2 4 3 5 6 ] [ 1 0 2 4 3 6 5 ] [ 1 0 2 3 4 6 5 ] [ 1 0 2 3 4 5 6 ] [ 1 0 2 3 5 4 6 ] [ 1 0 3 2 5 4 6 ] [ 1 0 3 2 4 5 6 ] [ 1 0 3 2 4 6 5 ] [ 0 1 3 2 4 6 5 ] [ 0 1 3 2 4 5 6 ] [ 0 1 3 2 5 4 6 ] [ 0 1 2 3 5 4 6 ] [ 0 1 2 3 4 5 6 ] 14: 15: 16: 17: 18: 19: 20: 21: ffact . . . . . 1 . . . 1 . 1 . . . 1 . . . 1 . 1 . . . 1 . 1 . 1 . 1 . . . 1 . 1 . . . . . 1 . . 1 . perm [ 0 1 2 3 4 6 5 ] [ 0 1 2 4 3 6 5 ] [ 0 1 2 4 3 5 6 ] [ 0 2 1 4 3 5 6 ] [ 0 2 1 4 3 6 5 ] [ 0 2 1 3 4 6 5 ] [ 0 2 1 3 4 5 6 ] [ 0 2 1 3 5 4 6 ] Figure 11.2-B: Gray code for the permutations of 7 elements where no element lies more than one place away from its position in the identical permutation. The permutations are self-inverse. Let F (n) the number of permutations of n elements where no element can move more than one place to the left. Then F (n) is the (n + 1)-st Fibonacci number. A Gray code for these permutations is shown in figure 11.2-B which was created with the program [FXT: comb/perm-dist1-gray-demo.cc]. 11.2.3 Permutations where k − 1 ≤ p(k) ≤ k + d A Gray code for the permutations where no element lies more than one place to the left or d places to the right of its original position can be generated using the Gray codes for binary words with at most d consecutive ones given in section 14.3 on page 307. Figure 11.2-C shows the permutations of 6 elements with d = 2. It was created with the program [FXT: comb/perm-l1r2-gray-demo.cc]. The array shown leftmost in figure 11.2-C can be generated via the recursion 1 2 3 4 5 6 void Y_rec(ulong d, bool z) { if ( d>=n ) visit(); else { const ulong w = n-d; 284 Chapter 11: Permutations with special properties 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: ffact 1 1 . . 1 1 1 . . . 1 1 . 1 . 1 1 . 1 1 1 . . 1 1 1 . . 1 . 1 . . . . 1 . . . 1 1 . 1 . 1 1 . 1 . . 1 . 1 1 . . . 1 1 . . . 1 . . . . 1 . 1 . . . . 1 . . . . . . . . 1 . . . . 1 1 . 1 . 1 1 . 1 . 1 . . 1 . . . . 1 . . 1 . 1 1 . 1 . 1 1 . . 
perm [ 1 2 0 3 5 4 ] [ 1 2 0 3 4 5 ] [ 1 2 0 4 3 5 ] [ 1 2 0 4 5 3 ] [ 1 0 2 4 5 3 ] [ 1 0 2 4 3 5 ] [ 1 0 2 3 4 5 ] [ 1 0 2 3 5 4 ] [ 1 0 3 2 5 4 ] [ 1 0 3 2 4 5 ] [ 1 0 3 4 2 5 ] [ 0 1 3 4 2 5 ] [ 0 1 3 2 4 5 ] [ 0 1 3 2 5 4 ] [ 0 1 2 3 5 4 ] [ 0 1 2 3 4 5 ] [ 0 1 2 4 3 5 ] [ 0 1 2 4 5 3 ] [ 0 2 1 4 5 3 ] [ 0 2 1 4 3 5 ] [ 0 2 1 3 4 5 ] [ 0 2 1 3 5 4 ] [ 0 2 3 1 5 4 ] [ 0 2 3 1 4 5 ] inv. perm [ 2 0 1 3 5 4 ] [ 2 0 1 3 4 5 ] [ 2 0 1 4 3 5 ] [ 2 0 1 5 3 4 ] [ 1 0 2 5 3 4 ] [ 1 0 2 4 3 5 ] [ 1 0 2 3 4 5 ] [ 1 0 2 3 5 4 ] [ 1 0 3 2 5 4 ] [ 1 0 3 2 4 5 ] [ 1 0 4 2 3 5 ] [ 0 1 4 2 3 5 ] [ 0 1 3 2 4 5 ] [ 0 1 3 2 5 4 ] [ 0 1 2 3 5 4 ] [ 0 1 2 3 4 5 ] [ 0 1 2 4 3 5 ] [ 0 1 2 5 3 4 ] [ 0 2 1 5 3 4 ] [ 0 2 1 4 3 5 ] [ 0 2 1 3 4 5 ] [ 0 2 1 3 5 4 ] [ 0 3 1 2 5 4 ] [ 0 3 1 2 4 5 ] ffact(inv) 2 . . . 1 2 . . . . 2 . . 1 . 2 . . 2 . 1 . . 2 . 1 . . 1 . 1 . . . . 1 . . . 1 1 . 1 . 1 1 . 1 . . 1 . 2 . . . . 2 . . . . 1 . . . . 1 . 1 . . . . 1 . . . . . . . . 1 . . . . 2 . . 1 . 2 . . 1 . 1 . . 1 . . . . 1 . . 1 . 2 . . 1 . 2 . . . Figure 11.2-C: Gray code for the permutations of 6 elements where no element lies more than one place to the left or two places to the right of its position in the identical permutation. 7 8 9 10 11 12 13 14 15 16 17 18 19 20 if ( z ) { if ( w>1 ) { ff[d]=1; ff[d+1]=1; ff[d+2]=0; Y_rec(d+3, !z); } ff[d]=1; ff[d+1]=0; Y_rec(d+2, !z); ff[d]=0; Y_rec(d+1, !z); } else { ff[d]=0; Y_rec(d+1, !z); ff[d]=1; ff[d+1]=0; Y_rec(d+2, !z); if ( w>1 ) { ff[d]=1; ff[d+1]=1; ff[d+2]=0; Y_rec(d+3, !z); } } } } If the two lines starting with if ( w>1 ) are omitted, the Fibonacci words are computed. About 100 million words per second are generated. 11.3 Self-inverse permutations (involutions) 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: [ . 1 2 3 4 ] [ . 1 2 4 3 ] [ . 1 4 3 2 ] [ . 4 2 3 1 ] [ 4 1 2 3 . ] [ . 1 3 2 4 ] [ . 4 3 2 1 ] [ 4 1 3 2 . ] [ . 3 2 1 4 ] [ . 3 4 1 2 ] [ 4 3 2 1 . ] [ 3 1 2 . 4 ] [ 3 1 4 . 
2 ] 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: [ 3 4 2 . 1 ] [ . 2 1 3 4 ] [ . 2 1 4 3 ] [ 4 2 1 3 . ] [ 3 2 1 . 4 ] [ 2 1 . 3 4 ] [ 2 1 . 4 3 ] [ 2 4 . 3 1 ] [ 2 3 . 1 4 ] [ 1 . 2 3 4 ] [ 1 . 2 4 3 ] [ 1 . 4 3 2 ] [ 1 . 3 2 4 ] Figure 11.3-A: All self-inverse permutations of 5 elements. An involution is a self-inverse permutation (see section 2.3.1 on page 106). The involutions of 5 elements are shown in figure 11.3-A. To generate all involutions, use [FXT: class perm involution in comb/perm- 11.4: Cyclic permutations 285 involution.h]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 class perm_involution { public: ulong *p_; // self-inverse permutation in 0, 1, ..., n-1 ulong n_; // number of elements to permute public: perm_involution(ulong n) { n_ = n; p_ = new ulong[n_]; first(); } ~perm_involution() { delete [] p_; } void first() { for (ulong i=0; i=0 ) { if ( p_[ip]==ip ) { p_[j] = ip; p_[ip] = j; // swap2(p_[j], p_[ip]); return true; } } } return false; // current permutation is last } [--snip--] }; The rate of generation is about 50 million per second [FXT: comb/perm-involution-demo.cc]. 11.4 Cyclic permutations Cyclic permutations consist of exactly one cycle of full length, see section 2.2.1 on page 105. 11.4.1 Recursive algorithm for cyclic permutations A simple recursive algorithm for the generation of all (not only cyclic!) permutations of n elements can be described as follows: Put each of the n elements of the array to the first position and generate all permutations of the remaining n − 1 elements. If n = 1, print the permutation. The generated order is shown in figure 11.4-A, it corresponds to the alternative (swaps) factorial representation with falling base, given in section 10.1.4 on page 239. 
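The recursion just described can be sketched in a few lines. This is only an illustration under stated assumptions, not the FXT class: the names `perm_rec_sketch` and `all_perms` are hypothetical, and "putting element k to the first position" is done here by a swap that is undone after the recursive call. For n = 4 the sketch produces the order of the left column of figure 11.4-A.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

typedef unsigned long ulong;

// Hypothetical helper (not part of FXT):
// append all arrangements of x[d], ..., x[n-1] to out,
// keeping the prefix x[0], ..., x[d-1] fixed.
static void perm_rec_sketch(std::vector<ulong> &x, std::size_t d,
                            std::vector< std::vector<ulong> > &out)
{
    const std::size_t n = x.size();
    if ( d+1 >= n )  { out.push_back(x);  return; }  // nothing left to permute
    for (std::size_t k=d; k<n; ++k)
    {
        std::swap(x[d], x[k]);          // put element k to position d
        perm_rec_sketch(x, d+1, out);   // permute the remaining elements
        std::swap(x[d], x[k]);          // undo the swap
    }
}

// Hypothetical wrapper: all permutations of 0, 1, ..., n-1.
std::vector< std::vector<ulong> > all_perms(std::size_t n)
{
    std::vector<ulong> x(n);
    for (std::size_t k=0; k<n; ++k)  x[k] = k;
    std::vector< std::vector<ulong> > out;
    perm_rec_sketch(x, 0, out);
    return out;
}
```

Collecting the permutations in a vector keeps the sketch easy to test; a production routine would call a visit function instead.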
The algorithm is implemented in [FXT: class perm rec in comb/perm-rec.h]: 1 2 3 4 5 6 7 8 9 10 11 12 class perm_rec { public: ulong *x_; // permutation ulong n_; // number of elements void (*visit_)(const perm_lex_rec &); public: perm_rec(ulong n) { n_ = n; x_ = new ulong[n_]; // function to call with each permutation 286 Chapter 11: Permutations with special properties permutation [ . 1 2 3 ] [ . 1 3 2 ] [ . 2 1 3 ] [ . 2 3 1 ] [ . 3 2 1 ] [ . 3 1 2 ] [ 1 . 2 3 ] [ 1 . 3 2 ] [ 1 2 . 3 ] [ 1 2 3 . ] [ 1 3 2 . ] [ 1 3 . 2 ] [ 2 1 . 3 ] [ 2 1 3 . ] [ 2 . 1 3 ] [ 2 . 3 1 ] [ 2 3 . 1 ] [ 2 3 1 . ] [ 3 1 2 . ] [ 3 1 . 2 ] [ 3 2 1 . ] [ 3 2 . 1 ] [ 3 . 2 1 ] [ 3 . 1 2 ] 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: inverse [ . 1 2 3 ] [ . 1 3 2 ] [ . 2 1 3 ] [ . 3 1 2 ] [ . 3 2 1 ] [ . 2 3 1 ] [ 1 . 2 3 ] [ 1 . 3 2 ] [ 2 . 1 3 ] [ 3 . 1 2 ] [ 3 . 2 1 ] [ 2 . 3 1 ] [ 2 1 . 3 ] [ 3 1 . 2 ] [ 1 2 . 3 ] [ 1 3 . 2 ] [ 2 3 . 1 ] [ 3 2 . 1 ] [ 3 1 2 . ] [ 2 1 3 . ] [ 3 2 1 . ] [ 2 3 1 . ] [ 1 3 2 . ] [ 1 2 3 . ] ffact-swp [ . . . ] [ . . 1 ] [ . 1 . ] [ . 1 1 ] [ . 2 . ] [ . 2 1 ] [ 1 . . ] [ 1 . 1 ] [ 1 1 . ] [ 1 1 1 ] [ 1 2 . ] [ 1 2 1 ] [ 2 . . ] [ 2 . 1 ] [ 2 1 . ] [ 2 1 1 ] [ 2 2 . ] [ 2 2 1 ] [ 3 . . ] [ 3 . 1 ] [ 3 1 . ] [ 3 1 1 ] [ 3 2 . ] [ 3 2 1 ] Figure 11.4-A: All permutations of 4 elements (left) and their inverses (middle), and their (swaps) representations as mixed radix numbers with falling factorial base. Permutations with common prefixes appear in succession. Dots denote zeros. 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 } ~perm_rec() { delete [] x_; } void init() { for (ulong k=0; k=(long)d; --x) the permutations would appear in reversed order. 
Changing the loop in the function next_rec() to 11.4: Cyclic permutations 287 for (ulong k=d; kdata(); for (ulong k=0; k1; --k) { ulong z = n_-3-(k-2); // 0, ..., n-3 ulong i = fc[z]; swap2(ix_[k], ix_[i]); } if ( n_>1 ) swap2(ix_[0], ix_[1]); make_inverse(ix_, x_, n_); } public: void first() { M_->first(); 11.4: Cyclic permutations 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 289 setup(); } bool next() { if ( false == M_->next() ) ulong j = M_->pos(); { first(); return false; } if ( j && (x_[0]==n_-1) ) // once in 2*n cases { setup(); // work proportional to n // only 3 elements are interchanged } else // easy case { int d = M_->dir(); ulong x2 = (M_->data())[j]; ulong x1 = x2 - d, x3 = n_-1; ulong i1 = x_[x1], i2 = x_[x2], i3 = x_[x3]; swap2(x_[x1], x_[x2]); swap2(x_[x1], x_[x3]); swap2(ix_[i1], ix_[i2]); swap2(ix_[i2], ix_[i3]); } return true; } [--snip--] The listing in figure 11.4-C was created with the program [FXT: comb/cyclic-perm-demo.cc]. About 58 million permutations per second are generated. 11.4.3 Cyclic permutations from factorial numbers falling fact. permutation [ . . . ] [ 1 2 3 4 0 ] [ 1 . . ] [ 4 2 3 0 1 ] [ 2 . . ] [ 1 4 3 0 2 ] [ 3 . . ] [ 1 2 4 0 3 ] [ . 1 . ] [ 3 2 4 1 0 ] [ 1 1 . ] [ 3 2 0 4 1 ] [ 2 1 . ] [ 3 4 0 1 2 ] [ 3 1 . ] [ 4 2 0 1 3 ] [ . 2 . ] [ 1 3 4 2 0 ] [ 1 2 . ] [ 4 3 0 2 1 ] [ 2 2 . ] [ 1 3 0 4 2 ] [ 3 2 . ] [ 1 4 0 2 3 ] [ . . 1 ] [ 2 3 1 4 0 ] [ 1 . 1 ] [ 2 3 4 0 1 ] [ 2 . 1 ] [ 4 3 1 0 2 ] [ 3 . 1 ] [ 2 4 1 0 3 ] [ . 1 1 ] [ 2 4 3 1 0 ] [ 1 1 1 ] [ 2 0 3 4 1 ] [ 2 1 1 ] [ 4 0 3 1 2 ] [ 3 1 1 ] [ 2 0 4 1 3 ] [ . 
2 1 ] [ 3 4 1 2 0 ] [ 1 2 1 ] [ 3 0 4 2 1 ] [ 2 2 1 ] [ 3 0 1 4 2 ] [ 3 2 1 ] [ 4 0 1 2 3 ] cycle (0, 1, 2, 3, 4) (0, 4, 1, 2, 3) (0, 1, 4, 2, 3) (0, 1, 2, 4, 3) (0, 3, 1, 2, 4) (0, 3, 4, 1, 2) (0, 3, 1, 4, 2) (0, 4, 3, 1, 2) (0, 1, 3, 2, 4) (0, 4, 1, 3, 2) (0, 1, 3, 4, 2) (0, 1, 4, 3, 2) (0, 2, 1, 3, 4) (0, 2, 4, 1, 3) (0, 4, 2, 1, 3) (0, 2, 1, 4, 3) (0, 2, 3, 1, 4) (0, 2, 3, 4, 1) (0, 4, 2, 3, 1) (0, 2, 4, 3, 1) (0, 3, 2, 1, 4) (0, 3, 2, 4, 1) (0, 3, 4, 2, 1) (0, 4, 3, 2, 1) inv.perm. [ 4 0 1 2 3 ] [ 3 4 1 2 0 ] [ 3 0 4 2 1 ] [ 3 0 1 4 2 ] [ 4 3 1 0 2 ] [ 2 4 1 0 3 ] [ 2 3 4 0 1 ] [ 2 3 1 4 0 ] [ 4 0 3 1 2 ] [ 2 4 3 1 0 ] [ 2 0 4 1 3 ] [ 2 0 3 4 1 ] [ 4 2 0 1 3 ] [ 3 4 0 1 2 ] [ 3 2 4 1 0 ] [ 3 2 0 4 1 ] [ 4 3 0 2 1 ] [ 1 4 0 2 3 ] [ 1 3 4 2 0 ] [ 1 3 0 4 2 ] [ 4 2 3 0 1 ] [ 1 4 3 0 2 ] [ 1 2 4 0 3 ] [ 1 2 3 4 0 ] Figure 11.4-D: Numbers in falling factorial base and the corresponding cyclic permutations. The cyclic permutations of n elements can be computed from length-(n − 2) factorial numbers. We give routines for both falling and rising base [FXT: comb/fact2cyclic.cc]: 1 2 3 4 void ffact2cyclic(const ulong *fc, ulong n, ulong *x) // Generate cyclic permutation in x[] // from the (n-2) digit factorial number in fc[0,...,n-3]. // Falling radices: [n-1, ..., 3, 2] 290 Chapter 11: Permutations with special properties rising fact. [ . . . ] [ 1 . . ] [ . 1 . ] [ 1 1 . ] [ . 2 . ] [ 1 2 . ] [ . . 1 ] [ 1 . 1 ] [ . 1 1 ] [ 1 1 1 ] [ . 2 1 ] [ 1 2 1 ] [ . . 2 ] [ 1 . 2 ] [ . 1 2 ] [ 1 1 2 ] [ . 2 2 ] [ 1 2 2 ] [ . . 3 ] [ 1 . 3 ] [ . 1 3 ] [ 1 1 3 ] [ . 
2 3 ] [ 1 2 3 ]

    permutation       cycle               inv.perm.
    [ 1 2 3 4 0 ]     (0, 1, 2, 3, 4)     [ 4 0 1 2 3 ]
    [ 2 3 1 4 0 ]     (0, 2, 1, 3, 4)     [ 4 2 0 1 3 ]
    [ 3 2 4 1 0 ]     (0, 3, 1, 2, 4)     [ 4 3 1 0 2 ]
    [ 2 4 3 1 0 ]     (0, 2, 3, 1, 4)     [ 4 3 0 2 1 ]
    [ 1 3 4 2 0 ]     (0, 1, 3, 2, 4)     [ 4 0 3 1 2 ]
    [ 3 4 1 2 0 ]     (0, 3, 2, 1, 4)     [ 4 2 3 0 1 ]
    [ 4 2 3 0 1 ]     (0, 4, 1, 2, 3)     [ 3 4 1 2 0 ]
    [ 2 3 4 0 1 ]     (0, 2, 4, 1, 3)     [ 3 4 0 1 2 ]
    [ 3 2 0 4 1 ]     (0, 3, 4, 1, 2)     [ 2 4 1 0 3 ]
    [ 2 0 3 4 1 ]     (0, 2, 3, 4, 1)     [ 1 4 0 2 3 ]
    [ 4 3 0 2 1 ]     (0, 4, 1, 3, 2)     [ 2 4 3 1 0 ]
    [ 3 0 4 2 1 ]     (0, 3, 2, 4, 1)     [ 1 4 3 0 2 ]
    [ 1 4 3 0 2 ]     (0, 1, 4, 2, 3)     [ 3 0 4 2 1 ]
    [ 4 3 1 0 2 ]     (0, 4, 2, 1, 3)     [ 3 2 4 1 0 ]
    [ 3 4 0 1 2 ]     (0, 3, 1, 4, 2)     [ 2 3 4 0 1 ]
    [ 4 0 3 1 2 ]     (0, 4, 2, 3, 1)     [ 1 3 4 2 0 ]
    [ 1 3 0 4 2 ]     (0, 1, 3, 4, 2)     [ 2 0 4 1 3 ]
    [ 3 0 1 4 2 ]     (0, 3, 4, 2, 1)     [ 1 2 4 0 3 ]
    [ 1 2 4 0 3 ]     (0, 1, 2, 4, 3)     [ 3 0 1 4 2 ]
    [ 2 4 1 0 3 ]     (0, 2, 1, 4, 3)     [ 3 2 0 4 1 ]
    [ 4 2 0 1 3 ]     (0, 4, 3, 1, 2)     [ 2 3 1 4 0 ]
    [ 2 0 4 1 3 ]     (0, 2, 4, 3, 1)     [ 1 3 0 4 2 ]
    [ 1 4 0 2 3 ]     (0, 1, 4, 3, 2)     [ 2 0 3 4 1 ]
    [ 4 0 1 2 3 ]     (0, 4, 3, 2, 1)     [ 1 2 3 4 0 ]

Figure 11.4-E: Numbers in rising factorial base and corresponding cyclic permutations.

    {
        for (ulong k=0; k<n; ++k)  x[k] = k;  // start with the identity
        for (ulong k=n-1; k>1; --k)
        {
            ulong z = n-1-k;  // 0, ..., n-3
            ulong i = fc[z];
            swap2(x[k], x[i]);
        }
        if ( n>1 )  swap2(x[0], x[1]);
    }

    void rfact2cyclic(const ulong *fc, ulong n, ulong *x)
    // Rising radices: [2, 3, ..., n-1]
    {
        for (ulong k=0; k<n; ++k)  x[k] = k;  // start with the identity
        for (ulong k=n-1; k>1; --k)
        {
            ulong i = fc[k-2];  // k-2 == n-3, ..., 0
            swap2(x[k], x[i]);
        }
        if ( n>1 )  swap2(x[0], x[1]);
    }

The cyclic permutations of 5 elements are shown in figures 11.4-D (falling base) and 11.4-E (rising base). The listings were created with the program [FXT: comb/fact2cyclic-demo.cc].

The cycle representation could be computed by applying the transformations in (all) permutations to all but the first element.
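As a hedged illustration of this remark, the following sketch fixes the element 0, runs through all arrangements of the remaining elements with `std::next_permutation`, and converts each cycle (0, c1, ..., c(n-1)) into array form. The function names `cycle2perm`, `is_cyclic`, and `cyclic_perms` are assumptions made for this example and are not taken from FXT. The convention x[c[i]] = c[i+1] matches the figures, where the cycle (0, 1, 2, 3, 4) corresponds to the permutation [ 1 2 3 4 0 ].

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned long ulong;

// Convert the cycle (c[0], c[1], ..., c[n-1]) into array form: x[c[i]] = c[i+1].
std::vector<ulong> cycle2perm(const std::vector<ulong> &c)
{
    const std::size_t n = c.size();
    std::vector<ulong> x(n);
    for (std::size_t i=0; i<n; ++i)  x[ c[i] ] = c[ (i+1) % n ];
    return x;
}

// Return whether the permutation x consists of exactly one cycle of full length.
bool is_cyclic(const std::vector<ulong> &x)
{
    const std::size_t n = x.size();
    if ( n==0 )  return false;
    std::size_t len = 1;
    for (ulong e=x[0]; (e!=0) && (len<=n); e=x[e])  ++len;  // follow the cycle through 0
    return ( len==n );
}

// All (n-1)! cyclic permutations of n elements, in array form:
// keep c[0]==0 fixed and permute c[1], ..., c[n-1].
std::vector< std::vector<ulong> > cyclic_perms(std::size_t n)
{
    std::vector< std::vector<ulong> > out;
    if ( n==0 )  return out;
    std::vector<ulong> c(n);
    for (std::size_t i=0; i<n; ++i)  c[i] = i;
    do  { out.push_back( cycle2perm(c) ); }
    while ( std::next_permutation(c.begin()+1, c.end()) );
    return out;
}
```

The order obtained this way is the lexicographic order of the cycles, which differs from the orders in figures 11.4-D and 11.4-E.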
That is, we can generate all cyclic permutations in cycle form by permuting all elements but the first with any permutation algorithm.

Chapter 12

k-permutations

          ffact. num.       permutation
     0:   [ . . . . . ]     [ . 1 ][ 2 3 4 5 ]
     1:   [ 1 . . . . ]     [ 1 . ][ 2 3 4 5 ]
     2:   [ 2 . . . . ]     [ 2 . ][ 1 3 4 5 ]
     3:   [ 3 . . . . ]     [ 3 . ][ 1 2 4 5 ]
     4:   [ 4 . . . . ]     [ 4 . ][ 1 2 3 5 ]
     5:   [ 5 . . . . ]     [ 5 . ][ 1 2 3 4 ]
     6:   [ . 1 . . . ]     [ . 2 ][ 1 3 4 5 ]
     7:   [ 1 1 . . . ]     [ 1 2 ][ . 3 4 5 ]
     8:   [ 2 1 . . . ]     [ 2 1 ][ . 3 4 5 ]
     9:   [ 3 1 . . . ]     [ 3 1 ][ . 2 4 5 ]
    10:   [ 4 1 . . . ]     [ 4 1 ][ . 2 3 5 ]
    11:   [ 5 1 . . . ]     [ 5 1 ][ . 2 3 4 ]
    12:   [ . 2 . . . ]     [ . 3 ][ 1 2 4 5 ]
    13:   [ 1 2 . . . ]     [ 1 3 ][ . 2 4 5 ]
    14:   [ 2 2 . . . ]     [ 2 3 ][ . 1 4 5 ]
    15:   [ 3 2 . . . ]     [ 3 2 ][ . 1 4 5 ]
    16:   [ 4 2 . . . ]     [ 4 2 ][ . 1 3 5 ]
    17:   [ 5 2 . . . ]     [ 5 2 ][ . 1 3 4 ]
    18:   [ . 3 . . . ]     [ . 4 ][ 1 2 3 5 ]
    19:   [ 1 3 . . . ]     [ 1 4 ][ . 2 3 5 ]
    20:   [ 2 3 . . . ]     [ 2 4 ][ . 1 3 5 ]
    21:   [ 3 3 . . . ]     [ 3 4 ][ . 1 2 5 ]
    22:   [ 4 3 . . . ]     [ 4 3 ][ . 1 2 5 ]
    23:   [ 5 3 . . . ]     [ 5 3 ][ . 1 2 4 ]
    24:   [ . 4 . . . ]     [ . 5 ][ 1 2 3 4 ]
    25:   [ 1 4 . . . ]     [ 1 5 ][ . 2 3 4 ]
    26:   [ 2 4 . . . ]     [ 2 5 ][ . 1 3 4 ]
    27:   [ 3 4 . . . ]     [ 3 5 ][ . 1 2 4 ]
    28:   [ 4 4 . . . ]     [ 4 5 ][ . 1 2 3 ]
    29:   [ 5 4 . . . ]     [ 5 4 ][ . 1 2 3 ]

Figure 12.0-A: The falling factorial numbers with n−1 digits where only k leading digits can be nonzero correspond to the k-permutations of n elements (here n = 6 and k = 2).

The length-k prefixes of the permutations of n elements are called k-permutations. The 2-permutations of 6 elements are shown in figure 12.0-A. We have n choices for the first element, n − 1 for the second, and so on. Therefore the number of the k-permutations of n elements is

\[ n\,(n-1)\,(n-2) \cdots (n-k+1) \;=\; n^{\underline{k}} \;=\; \binom{n}{k}\, k! \tag{12.0-1} \]

The second equality shows that the k-permutations could be generated by listing all k-subsets of the n-set (combinations $\binom{n}{k}$), each in k! orderings.
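A small numerical check of relation 12.0-1 may help here. The sketch below is an illustration only; the helper names `falling_factorial`, `factorial`, and `binomial` are assumptions and not FXT routines, and no overflow protection is attempted. It computes both sides of the relation; for n = 6 and k = 2 both give 30, the number of rows in figure 12.0-A.

```cpp
#include <cassert>

typedef unsigned long ulong;

// Falling factorial power: n (n-1) (n-2) ... (n-k+1), an empty product for k==0.
ulong falling_factorial(ulong n, ulong k)
{
    ulong f = 1;
    for (ulong j=0; j<k; ++j)  f *= (n - j);
    return f;
}

// k! (small arguments only, no overflow check).
ulong factorial(ulong k)
{
    ulong f = 1;
    for (ulong j=2; j<=k; ++j)  f *= j;
    return f;
}

// Binomial coefficient; after step j the value is binomial(n-k+j, j),
// so every division is exact.
ulong binomial(ulong n, ulong k)
{
    if ( k>n )  return 0;
    ulong b = 1;
    for (ulong j=1; j<=k; ++j)  b = b * (n - k + j) / j;
    return b;
}
```

For example, `falling_factorial(6, 2)` and `binomial(6, 2) * factorial(2)` agree.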
The expression as falling factorial power shows that the k-permutations correspond to the falling factorial numbers where only the first k digits can be nonzero: the permutations in figure 12.0-A are obtained by converting the left column (as inversion table) into a permutation (by the routine ffact2perm() described in section 10.1.1 on page 232). This is done in the program [FXT: comb/ffact2kperm-demo.cc] which was used to create the figure. 292 Chapter 12: k-permutations 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: permutation [ . 1 ][ 2 3 4 5 ] [ . 2 ][ 1 3 4 5 ] [ . 3 ][ 1 2 4 5 ] [ . 4 ][ 1 2 3 5 ] [ . 5 ][ 1 2 3 4 ] [ 1 . ][ 5 2 3 4 ] [ 1 2 ][ 5 . 3 4 ] [ 1 3 ][ 5 . 2 4 ] [ 1 4 ][ 5 . 2 3 ] [ 1 5 ][ 4 . 2 3 ] [ 2 . ][ 4 5 1 3 ] [ 2 1 ][ 4 5 . 3 ] [ 2 3 ][ 4 5 . 1 ] [ 2 4 ][ 3 5 . 1 ] [ 2 5 ][ 3 4 . 1 ] [ 3 . ][ 2 4 5 1 ] [ 3 1 ][ 2 4 5 . ] [ 3 2 ][ 1 4 5 . ] [ 3 4 ][ 1 2 5 . ] [ 3 5 ][ 1 2 4 . ] [ 4 . ][ 1 2 3 5 ] [ 4 1 ][ . 2 3 5 ] [ 4 2 ][ . 1 3 5 ] [ 4 3 ][ . 1 2 5 ] [ 4 5 ][ . 1 2 3 ] [ 5 . ][ 4 1 2 3 ] [ 5 1 ][ 4 . 2 3 ] [ 5 2 ][ 4 . 1 3 ] [ 5 3 ][ 4 . 1 2 ] [ 5 4 ][ 3 . 1 2 ] ffact [ . . . . . ] [ . 1 . . . ] [ . 2 . . . ] [ . 3 . . . ] [ . 4 . . . ] [ 1 . . . . ] [ 1 1 . . . ] [ 1 2 . . . ] [ 1 3 . . . ] [ 1 4 . . . ] [ 2 . . . . ] [ 2 1 . . . ] [ 2 2 . . . ] [ 2 3 . . . ] [ 2 4 . . . ] [ 3 . . . . ] [ 3 1 . . . ] [ 3 2 . . . ] [ 3 3 . . . ] [ 3 4 . . . ] [ 4 . . . . ] [ 4 1 . . . ] [ 4 2 . . . ] [ 4 3 . . . ] [ 4 4 . . . ] [ 5 . . . . ] [ 5 1 . . . ] [ 5 2 . . . ] [ 5 3 . . . ] [ 5 4 . . . ] inv. perm. [ . 1 2 3 4 5 ] [ . 2 1 3 4 5 ] [ . 2 3 1 4 5 ] [ . 2 3 4 1 5 ] [ . 2 3 4 5 1 ] [ 1 . 3 4 5 2 ] [ 3 . 1 4 5 2 ] [ 3 . 4 1 5 2 ] [ 3 . 4 5 1 2 ] [ 3 . 4 5 2 1 ] [ 1 4 . 5 2 3 ] [ 4 1 . 5 2 3 ] [ 4 5 . 1 2 3 ] [ 4 5 . 2 1 3 ] [ 4 5 . 2 3 1 ] [ 1 5 2 . 3 4 ] [ 5 1 2 . 3 4 ] [ 5 2 1 . 3 4 ] [ 5 2 3 . 1 4 ] [ 5 2 3 . 4 1 ] [ 1 2 3 4 . 5 ] [ 2 1 3 4 . 5 ] [ 2 3 1 4 . 5 ] [ 2 3 4 1 . 
5 ] [ 2 3 4 5 . 1 ] [ 1 3 4 5 2 . ] [ 3 1 4 5 2 . ] [ 3 4 1 5 2 . ] [ 3 4 5 1 2 . ] [ 3 4 5 2 1 . ] Figure 12.1-A: The 2-permutations of 6 elements in lexicographic order (left), the corresponding numbers in falling factorial basis (middle), and the inverse permutations (right). 12.1 Lexicographic order For the generation of k-permutations in lexicographic order we use mixed radix numbers to determine the position of the leftmost change which is restricted to the first k elements. We also store the inverse permutation to simplify the update routine [FXT: comb/kperm-lex.h]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 class kperm_lex { public: ulong *p_; // permutation ulong *ip_; // inverse permutation ulong *d_; // falling factorial number ulong n_; // total number of elements ulong k_; // permutations of k elements ulong u_; // sort up to position u+1 public: kperm_lex(ulong n) { n_ = n; k_ = n; p_ = new ulong[n_]; ip_ = new ulong[n_]; d_ = new ulong[n_+1]; d_[0] = 0; // sentinel ++d_; // nota bene first(k_); } ~kperm_lex() { delete [] p_; delete [] ip_; --d_; delete [] d_; 12.2: Minimal-change order 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 293 } void first(ulong k) { k_ = k; u_ = n_ - 1; if ( k_ < u_ ) u_ = k_; // == min(k, n-1) for (ulong i=0; i=k_ ) } return false; } return false; The rate of generation grows slightly with n and does not depend on k. For example, the rate is about 160 M/s (for k = n = 12) and 190 M/s (for k = 4 and n = 100) [FXT: comb/kperm-gray-demo.cc]. 295 Chapter 13 Multisets A multiset (or bag) is a collection of elements where elements can be repeated and order does not matter. 
13.1 Subsets of a multiset

    n == 630
    primes    = [ 2 3 5 7 ]
    exponents = [ 1 2 1 1 ]

            d    auxiliary products       exponents     change @
     1:     1    [   1   1  1 1 1 ]       [ . . . . ]      4
     2:     2    [   2   1  1 1 1 ]       [ 1 . . . ]      0
     3:     3    [   3   3  1 1 1 ]       [ . 1 . . ]      1
     4:     6    [   6   3  1 1 1 ]       [ 1 1 . . ]      0
     5:     9    [   9   9  1 1 1 ]       [ . 2 . . ]      1
     6:    18    [  18   9  1 1 1 ]       [ 1 2 . . ]      0
     7:     5    [   5   5  5 1 1 ]       [ . . 1 . ]      2
     8:    10    [  10   5  5 1 1 ]       [ 1 . 1 . ]      0
     9:    15    [  15  15  5 1 1 ]       [ . 1 1 . ]      1
    10:    30    [  30  15  5 1 1 ]       [ 1 1 1 . ]      0
    11:    45    [  45  45  5 1 1 ]       [ . 2 1 . ]      1
    12:    90    [  90  45  5 1 1 ]       [ 1 2 1 . ]      0
    13:     7    [   7   7  7 7 1 ]       [ . . . 1 ]      3
    14:    14    [  14   7  7 7 1 ]       [ 1 . . 1 ]      0
    15:    21    [  21  21  7 7 1 ]       [ . 1 . 1 ]      1
    16:    42    [  42  21  7 7 1 ]       [ 1 1 . 1 ]      0
    17:    63    [  63  63  7 7 1 ]       [ . 2 . 1 ]      1
    18:   126    [ 126  63  7 7 1 ]       [ 1 2 . 1 ]      0
    19:    35    [  35  35 35 7 1 ]       [ . . 1 1 ]      2
    20:    70    [  70  35 35 7 1 ]       [ 1 . 1 1 ]      0
    21:   105    [ 105 105 35 7 1 ]       [ . 1 1 1 ]      1
    22:   210    [ 210 105 35 7 1 ]       [ 1 1 1 1 ]      0
    23:   315    [ 315 315 35 7 1 ]       [ . 2 1 1 ]      1
    24:   630    [ 630 315 35 7 1 ]       [ 1 2 1 1 ]      0

Figure 13.1-A: Divisors of $630 = 2^1 \cdot 3^2 \cdot 5^1 \cdot 7^1$ generated as subsets of the multiset of exponents.

The subsets of a set of n elements can be identified with the n-bit binary words. The subsets of a multiset can be computed as mixed radix numbers: if the j-th element is repeated $r_j$ times, then the radix of digit j has to be $r_j + 1$. Therefore all methods of chapter 9 on page 217 can be applied.

As an example, all divisors of a number x whose factorization $x = p_0^{e_0} \cdot p_1^{e_1} \cdots p_{n-1}^{e_{n-1}}$ is known can be computed via the length-n mixed radix numbers with radices $[e_0 + 1, e_1 + 1, \ldots, e_{n-1} + 1]$. The implementation [FXT: class divisors in mod/divisors.h] generates the subsets of the multiset of exponents in counting order (figure 13.1-A shows the data for x = 630). An auxiliary array T of products is updated with each step: if the changed digit (at position j) became 1, then set $t := T_{j+1} \cdot p_j$, else set $t := T_j \cdot p_j$. Set $T_i = t$ for all $0 \le i \le j$.
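The update rule can be sketched as follows. This is a minimal illustration under stated assumptions and not the FXT class divisors: the function name `divisors_sketch` and the use of `std::vector` are choices made for this example. The digits d[j] form the mixed radix counter with radices e[j] + 1, and T has one extra element T[n] = 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned long ulong;

// Hypothetical sketch: list all divisors of x = p[0]^e[0] * ... * p[n-1]^e[n-1]
// in mixed radix counting order of the exponent vectors.
std::vector<ulong> divisors_sketch(const std::vector<ulong> &p,
                                   const std::vector<ulong> &e)
{
    assert( p.size() == e.size() );
    const std::size_t n = p.size();
    std::vector<ulong> d(n, 0);     // mixed radix digits, radix of digit j is e[j]+1
    std::vector<ulong> T(n+1, 1);   // auxiliary products, T[n] == 1 is never changed
    std::vector<ulong> out;
    out.push_back( T[0] );          // first divisor: 1
    while ( true )
    {
        std::size_t j = 0;          // position of the changed (incremented) digit
        while ( (j<n) && (d[j]==e[j]) )  { d[j] = 0;  ++j; }  // carry
        if ( j==n )  break;         // counter wrapped around: all divisors listed
        ++d[j];
        const ulong t = ( d[j]==1 ? T[j+1] : T[j] ) * p[j];
        for (std::size_t i=0; i<=j; ++i)  T[i] = t;
        out.push_back( T[0] );      // current divisor
    }
    return out;
}
```

Called with the primes [2, 3, 5, 7] and exponents [1, 2, 1, 1], the sketch lists the 24 divisors of 630 in the order of the column d of figure 13.1-A.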
A sentinel element Tn = 1 avoids unnecessary code. Figure 13.1-A was created with the program [FXT: mod/divisors-demo.cc]. The computation of all products of k out of n given factors is described in section 6.2.2 on page 178. 296 Chapter 13: Multisets Subsets with prescribed number of elements The k-subsets (or combinations) of a multiset are the subsets with k elements. They are one-to-one with the mixed radix numbers where the sum of digits equals k, see section 9.6 on page 229. 13.2 Permutations of a multiset 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: (2, 2, 1) [ . . 1 1 2 ] [ . . 1 2 1 ] [ . . 2 1 1 ] [ . 1 . 1 2 ] [ . 1 . 2 1 ] [ . 1 1 . 2 ] [ . 1 1 2 . ] [ . 1 2 . 1 ] [ . 1 2 1 . ] [ . 2 . 1 1 ] [ . 2 1 . 1 ] [ . 2 1 1 . ] [ 1 . . 1 2 ] [ 1 . . 2 1 ] [ 1 . 1 . 2 ] [ 1 . 1 2 . ] [ 1 . 2 . 1 ] [ 1 . 2 1 . ] [ 1 1 . . 2 ] [ 1 1 . 2 . ] [ 1 1 2 . . ] [ 1 2 . . 1 ] [ 1 2 . 1 . ] [ 1 2 1 . . ] [ 2 . . 1 1 ] [ 2 . 1 . 1 ] [ 2 . 1 1 . ] [ 2 1 . . 1 ] [ 2 1 . 1 . ] [ 2 1 1 . . ] 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: (6, 2) [ . . . . . . 1 1 ] [ . . . . . 1 . 1 ] [ . . . . . 1 1 . ] [ . . . . 1 . . 1 ] [ . . . . 1 . 1 . ] [ . . . . 1 1 . . ] [ . . . 1 . . . 1 ] [ . . . 1 . . 1 . ] [ . . . 1 . 1 . . ] [ . . . 1 1 . . . ] [ . . 1 . . . . 1 ] [ . . 1 . . . 1 . ] [ . . 1 . . 1 . . ] [ . . 1 . 1 . . . ] [ . . 1 1 . . . . ] [ . 1 . . . . . 1 ] [ . 1 . . . . 1 . ] [ . 1 . . . 1 . . ] [ . 1 . . 1 . . . ] [ . 1 . 1 . . . . ] [ . 1 1 . . . . . ] [ 1 . . . . . . 1 ] [ 1 . . . . . 1 . ] [ 1 . . . . 1 . . ] [ 1 . . . 1 . . . ] [ 1 . . 1 . . . . ] [ 1 . 1 . . . . . ] [ 1 1 . . . . . . ] 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: (1, 1, 1, 1) [ . 1 2 3 ] [ . 1 3 2 ] [ . 2 1 3 ] [ . 2 3 1 ] [ . 3 1 2 ] [ . 3 2 1 ] [ 1 . 2 3 ] [ 1 . 3 2 ] [ 1 2 . 3 ] [ 1 2 3 . ] [ 1 3 . 2 ] [ 1 3 2 . ] [ 2 . 1 3 ] [ 2 . 
3 1 ] [ 2 1 . 3 ] [ 2 1 3 . ] [ 2 3 . 1 ] [ 2 3 1 . ] [ 3 . 1 2 ] [ 3 . 2 1 ] [ 3 1 . 2 ] [ 3 1 2 . ] [ 3 2 . 1 ] [ 3 2 1 . ]

Figure 13.2-A: Permutations of multisets in lexicographic order: the multiset (2, 2, 1) (left), (6, 2) (combinations $\binom{6+2}{2}$, middle), and (1, 1, 1, 1) (permutations of four elements, right). Dots denote zeros.

We write $(r_0, r_1, \ldots, r_{k-1})$ for a multiset with $r_0$ elements of the first sort, $r_1$ of the second sort, ..., $r_{k-1}$ elements of the k-th sort. The total number of elements is $n = \sum_{j=0}^{k-1} r_j$. For the elements of the j-th sort we always use the number j. The number of permutations $P(r_0, r_1, \ldots, r_{k-1})$ of the multiset $(r_0, r_1, \ldots, r_{k-1})$ is a multinomial coefficient:

\[ P(r_0, r_1, \ldots, r_{k-1}) \;=\; \binom{n}{r_0,\, r_1,\, r_2,\, \ldots,\, r_{k-1}} \;=\; \frac{n!}{r_0!\; r_1!\; r_2! \cdots r_{k-1}!} \tag{13.2-1a} \]

\[ =\; \binom{n}{r_0} \binom{n-r_0}{r_1} \binom{n-r_0-r_1}{r_2} \cdots \binom{r_{k-3}+r_{k-2}+r_{k-1}}{r_{k-3}} \binom{r_{k-2}+r_{k-1}}{r_{k-2}} \binom{r_{k-1}}{r_{k-1}} \tag{13.2-1b} \]

\[ =\; \binom{r_0}{r_0} \binom{r_0+r_1}{r_1} \binom{r_0+r_1+r_2}{r_2} \binom{r_0+r_1+r_2+r_3}{r_3} \cdots \binom{n}{r_{k-1}} \tag{13.2-1c} \]

Relation 13.2-1a is obtained by observing that among the n! ways to arrange all n elements, the r0! permutations of the elements of the first sort, the r1! of the second, and so on, lead to identical permutations.

13.2.1 Recursive generation

Let [r0, r1, r2, ..., rk−1] denote the list of all permutations of the multiset (r0, r1, r2, ..., rk−1). We use the recursion

    [r0, r1, r2, ..., rk−1]  =  r0 . [r0 − 1, r1, r2, ..., rk−1]
                                r1 . [r0, r1 − 1, r2, ..., rk−1]
                                r2 . [r0, r1, r2 − 1, ..., rk−1]
                                ...                                   (13.2-2)
                                rk−1 . [r0 , r1 , r2 , . . .
, rk−1 − 1] The following routine generates all multiset permutations in lexicographic order when called with argument zero [FXT: comb/mset-perm-lex-rec-demo.cc]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 ulong n; ulong *ms; ulong k; ulong *r; // number of objects // multiset data in ms[0], ..., ms[n-1] // number of different sorts of objects // number of elements ’0’ in r[0], ’1’ in r[1], ..., ’k-1’ in r[k-1] void mset_perm_rec(ulong d) { if ( d>=n ) visit(); else { for (ulong j=0; j=n_ ) { ++ct_; visit_( *this ); } else { for (ulong jf=k_, j=nn_[jf]; j by >= in the scanning loops: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 bool next() { // find rightmost pair with ms[i] < ms[i+1]: const ulong n1 = n_ - 1; ulong i = n1; do { --i; } while ( ms_[i] >= ms_[i+1] ); // can touch sentinel if ( (long)i<0 ) return false; // last sequence is falling seq. // find rightmost element p[j] less than p[i]: ulong j = n1; while ( ms_[i] >= ms_[j] ) { --j; } swap2(ms_[i], ms_[j]); // Here the elements ms[i+1], ..., ms[n-1] are a falling sequence. // Reverse order to the right: ulong r = n1; ulong s = i + 1; while ( r > s ) { swap2(ms_[r], ms_[s]); --r; ++s; } return true; } } Usage of the class is shown in [FXT: comb/mset-perm-lex-demo.cc]: ulong ct = 0; do { // visit } while ( P.next() ); The permutations of 12 elements are generated at a rate of about 127 million per second, the combinations 30 15 at about 60 million per second, and the permutations of (2, 2, 2, 3, 3, 3) at about 93 million per second. 13.2.3 Order by prefix shifts (cool-lex) An ordering in which each transition involves a cyclic shift of a prefix is described in [360]. Figure 13.2-B shows examples of the ordering that were generated with the program [FXT: comb/mset-permpref-demo.cc]. 
The implementation is [FXT: comb/mset-perm-pref.h]: 1 2 3 4 5 6 7 8 9 10 class mset_perm_pref { public: ulong k_; // number of different sorts of objects ulong *r_; // number of elements ’0’ in r[0], ’1’ in r[1], ..., ’k-1’ in r[k-1] ulong n_; // number of objects ulong *ms_; // multiset data in ms[0], ..., ms[n-1], sentinel at [n] public: mset_perm_pref(const ulong *r, ulong k) 300 Chapter 13: Multisets 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: (2, 2, 1) [ . 2 1 1 . ] [ 2 . 1 1 . ] [ 1 2 . 1 . ] [ . 1 2 1 . ] [ 1 . 2 1 . ] [ 2 1 . 1 . ] [ . 2 1 . 1 ] [ 2 . 1 . 1 ] [ . 2 . 1 1 ] [ . . 2 1 1 ] [ 2 . . 1 1 ] [ 1 2 . . 1 ] [ . 1 2 . 1 ] [ 1 . 2 . 1 ] [ . 1 . 2 1 ] [ . . 1 2 1 ] [ 1 . . 2 1 ] [ 2 1 . . 1 ] [ 1 2 1 . . ] [ 1 1 2 . . ] [ . 1 1 2 . ] [ 1 . 1 2 . ] [ 1 1 . 2 . ] [ . 1 1 . 2 ] [ 1 . 1 . 2 ] [ . 1 . 1 2 ] [ . . 1 1 2 ] [ 1 . . 1 2 ] [ 1 1 . . 2 ] [ 2 1 1 . . ] 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: (6, 2) [ . 1 1 . . . . . ] [ 1 . 1 . . . . . ] [ . 1 . 1 . . . . ] [ . . 1 1 . . . . ] [ 1 . . 1 . . . . ] [ . 1 . . 1 . . . ] [ . . 1 . 1 . . . ] [ . . . 1 1 . . . ] [ 1 . . . 1 . . . ] [ . 1 . . . 1 . . ] [ . . 1 . . 1 . . ] [ . . . 1 . 1 . . ] [ . . . . 1 1 . . ] [ 1 . . . . 1 . . ] [ . 1 . . . . 1 . ] [ . . 1 . . . 1 . ] [ . . . 1 . . 1 . ] [ . . . . 1 . 1 . ] [ . . . . . 1 1 . ] [ 1 . . . . . 1 . ] [ . 1 . . . . . 1 ] [ . . 1 . . . . 1 ] [ . . . 1 . . . 1 ] [ . . . . 1 . . 1 ] [ . . . . . 1 . 1 ] [ . . . . . . 1 1 ] [ 1 . . . . . . 1 ] [ 1 1 . . . . . . ] 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: (1, 1, 1, 1) [ . 3 2 1 ] [ 3 . 2 1 ] [ 2 3 . 1 ] [ . 2 3 1 ] [ 2 . 3 1 ] [ 3 2 . 1 ] [ 1 3 2 . ] [ 3 1 2 . ] [ . 3 1 2 ] [ 3 . 1 2 ] [ 1 3 . 2 ] [ . 1 3 2 ] [ 1 . 3 2 ] [ 3 1 . 2 ] [ 2 3 1 . ] [ 1 2 3 . ] [ 2 1 3 . ] [ . 2 1 3 ] [ 2 . 1 3 ] [ 1 2 . 3 ] [ . 1 2 3 ] [ 1 . 2 3 ] [ 2 1 . 
3 ] [ 3 2 1 . ] Figure 13.2-B: Permutations of multisets in ‘cool-lex’ order: the multiset (2, 2, 1) (left), (6, 2) (combi nations 6+2 2 , middle), and (1, 1, 1, 1) (permutations of four elements, right). Dots denote zeros. 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 { k_ = k; r_ = new ulong[k]; for (ulong j=0; j= ms_[i+1] ); // can touch sentinel ++i; // here: i == length of longest non-increasing prefix if ( i >= n_-1 ) { 13.2: Permutations of a multiset 12 13 14 15 16 17 18 19 20 21 22 23 24 25 rotate_right1(ms_, n_); if ( i==n_ ) return 0; return n_; 301 // was last } else { // compare last of prefix with element 2 positions right: i += ( ms_[i+1] <= ms_[i-1] ); ++i; rotate_right1(ms_, i); return i; } } }; The rate of generation is about 68 M/s for the permutations of 12 elements, 46 M/s for the combinations 30 15 , and 62 M/s for the permutations of (2, 2, 2, 3, 3, 3). The equivalent order for combinations is given in section 6.3 on page 180. As suggested in the paper, the length of the next longest non-increasing prefix can be computed with just one comparison, we store it in a variable ln_. Usage of the fast update is enabled via the line #define MSET_PERM_PREF_LEN near the top of the file [FXT: comb/mset-perm-pref.h]. The initialization has to be modified as follows: 1 2 3 4 5 6 7 8 void first() { [--snip--] // as before #ifdef MSET_PERM_PREF_LEN ln_ = 1; if ( k_ == 1 ) ln_ = n_; #endif } // only one type of object The computation of the successor can be implemented as 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 ulong next() // Return length of rotated prefix, zero with last permutation. { const ulong i = ln_; ulong nr; // number of elements rotated if ( i >= n_-1 ) { nr = n_; rotate_right1(ms_, nr); if ( i==n_ ) return 0; // was last } else { nr = ln_ + 1 + ( ms_[i+1] <= ms_[i-1] ); rotate_right1(ms_, nr); } const bool cmp = ( ms_[0] < ms_[1] ); ln_ = ( cmp ? 
1 : ln_ + 1 ); return nr; } The rate of generation is improved to about 71 M/s for the permutations of 12 elements, 62 M/s for the  combinations 30 15 , and 69 M/s for the permutations of (2, 2, 2, 3, 3, 3). 13.2.4 Minimal-change order An algorithm for the generation of a Gray code for the permutations of a multiset is given by Fred Lunnon [priv. comm.], figure 13.2-C shows examples of the ordering. It is a generalization of Trotter’s order for permutations described in section 10.7 on page 254. The implementation is [FXT: class mset perm gray in comb/mset-perm-gray.h]: 1 2 3 4 5 class mset_perm_gray { public: ulong *ms_; // permuted elements (Lunnon’s R_[]) ulong *P_; // permutation 302 Chapter 13: Multisets 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: (2, 2, 1) [ . . 2 2 3 ] [ 2 . . 2 3 ] [ . 2 . 2 3 ] [ . 2 2 . 3 ] [ 2 . 2 . 3 ] [ 2 2 . . 3 ] [ 2 2 3 . . ] [ 2 2 . 3 . ] [ 2 . 2 3 . ] [ . 2 2 3 . ] [ . 3 2 2 . ] [ 3 . 2 2 . ] [ 3 2 . 2 . ] [ 3 2 2 . . ] [ 3 2 . . 2 ] [ 3 . 2 . 2 ] [ . 3 2 . 2 ] [ . 3 . 2 2 ] [ 3 . . 2 2 ] [ . . 3 2 2 ] [ . . 2 3 2 ] [ 2 . . 3 2 ] [ . 2 . 3 2 ] [ . 2 3 . 2 ] [ 2 . 3 . 2 ] [ 2 3 . . 2 ] [ 2 3 2 . . ] [ 2 3 . 2 . ] [ 2 . 3 2 . ] [ . 2 3 2 . ] (2, 0) (0, 1) (3, 2) (1, 0) (2, 1) (4, 2) (2, 3) (1, 2) (0, 1) (3, 1) (1, 0) (2, 1) (3, 2) (2, 4) (1, 2) (0, 1) (2, 3) (1, 0) (0, 2) (2, 3) (2, 0) (0, 1) (3, 2) (1, 0) (2, 1) (4, 2) (2, 3) (1, 2) (0, 1) 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: (6, 2) [ . . . . . . 2 2 ] [ 2 . . . . . . 2 ] [ . 2 . . . . . 2 ] [ . . 2 . . . . 2 ] [ . . . 2 . . . 2 ] [ . . . . 2 . . 2 ] [ . . . . . 2 . 2 ] [ . . . . . 2 2 . ] [ 2 . . . . . 2 . ] [ . 2 . . . . 2 . ] [ . . 2 . . . 2 . ] [ . . . 2 . . 2 . ] [ . . . . 2 . 2 . ] [ . . . . 2 2 . . ] [ 2 . . . . 2 . . ] [ . 2 . . . 2 . . ] [ . . 2 . . 2 . . ] [ . . . 2 . 2 . . ] [ . . . 2 2 . . . ] [ 2 . . . 2 . . . ] [ . 2 . . 2 . . . ] [ . . 
2 . 2 . . . ] [ . . 2 2 . . . . ] [ 2 . . 2 . . . . ] [ . 2 . 2 . . . . ] [ . 2 2 . . . . . ] [ 2 . 2 . . . . . ] [ 2 2 . . . . . . ] 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: (1, 1, 1, 1) [ . 2 3 4 ] [ 2 . 3 4 ] [ 2 3 . 4 ] [ 2 3 4 . ] [ 3 2 4 . ] [ 3 2 . 4 ] [ 3 . 2 4 ] [ . 3 2 4 ] [ . 3 4 2 ] [ 3 . 4 2 ] [ 3 4 . 2 ] [ 3 4 2 . ] [ 4 3 2 . ] [ 4 3 . 2 ] [ 4 . 3 2 ] [ . 4 3 2 ] [ . 4 2 3 ] [ 4 . 2 3 ] [ 4 2 . 3 ] [ 4 2 3 . ] [ 2 4 3 . ] [ 2 4 . 3 ] [ 2 . 4 3 ] [ . 2 4 3 ] Figure 13.2-C: Gray  code for permutations of multisets: the multiset (2, 2, 1) (left, with swaps), (6, 2) (combinations 6+2 2 , middle), and (1, 1, 1, 1) (permutations of four elements, right). Dots denote ones. 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 ulong *Q_; // inverse permutation ulong *D_; // direction ulong k_; // number of different sorts of objects ulong n_; // number of objects ulong sw1_, sw2_; // positions swapped with last update ulong *r_; // number of elements ’1’ in r[0], ’2’ in r[1], ..., ’k’ in r[k-1] public: mset_perm_gray(const ulong *r, ulong k) { k_ = k; r_ = new ulong[k_]; for (ulong j=0; j= ms_[i] ) { D_[j] = -d; // blocked at j; reverse drift d pre-emptively // next element at j, neighbor at i: j = Q_[P_[j]+1]; d = D_[j]; i = j+d; if ( ms_[j-1] != ms_[j] ) l = j; else { if ( (long)d < 0) i = l-1; } // save left end of run in l } if ( j > n_ ) return false; // current permutation is last // restore left end at head of run // shift run of equal rank from i-d,i-2d,...,l to i,i-d,...,l+d if ( (long)d < 0 ) l = j; ulong e = D_[i], p = P_[i]; // save neighbor drift e and identifier p for (ulong k=i; k!=l; k-=d) { P_[k] = P_[k-d]; Q_[P_[k]] = k; D_[k] = -1UL; // reset drifts of run tail elements } sw1_ = i - 1; sw2_ = l - 1; swap2(ms_[i], ms_[l]); D_[l] = e; P_[l] = p; return D_[i] = d; Q_[p] = l; // save positions swapped // restore drifts of head and neighbor // wrap neighbor around to other 
end true; } };

The rate of generation is roughly 40 M/s [FXT: comb/mset-perm-gray-demo.cc].

Chapter 14

Gray codes for strings with restrictions

We give constructions for Gray codes for strings with certain restrictions, such as forbidding two successive zeros or nonzero digits. The constraints considered are such that the number of strings of a given type satisfies a linear recursion with constant coefficients.

14.1 List recursions

    W(n) ==
        111111111111111111111.............
        22222222..........................
        .............1111111111111111.....
        11111.............222222..........
        22........111111..........111111..
        ...1111.....22.....1111.....22....
        1...22...11....11...22...11....11.

        [120 W(n-3)]   + rev([10 W(n-2)])   + [00 W(n-2)]
        11111111         1111111111111        .............
        22222222         .............        .............
        ........         .....11111111        11111111.....
        11111...         ..........222        222..........
        22......         ..111111.....        .....111111..
        ...1111.         ....22.....11        11.....22....
        1...22..         .11....11...2        2...11....11.

Figure 14.1-A: Computing a Gray code by a sublist recursion.

The algorithms are given as list recursions. For example, write $W(n)$ for the list of n-digit words (of a certain type), write $W^R(n)$ for the reversed list, and $[x \,.\, W(n)]$ for the list with the word x prepended to each word. The recursion for a Gray code is

\[ W(n) \;=\; \begin{array}{l} [0\,0 \;.\; W(n-2)] \\ [1\,0 \;.\; W^R(n-2)] \\ [1\,2\,0 \;.\; W(n-3)] \end{array} \tag{14.1-1} \]

A relation like this always implies another version, which is obtained by reversing the order of the sublists on the right side and additionally reversing each sublist:

\[ W^R(n) \;=\; \begin{array}{l} [1\,2\,0 \;.\; W^R(n-3)] \\ [1\,0 \;.\; W(n-2)] \\ [0\,0 \;.\; W^R(n-2)] \end{array} \tag{14.1-2} \]

The construction is illustrated in figure 14.1-A.
An implementation of the algorithm is [FXT: comb/fibalt-gray-demo.cc]:

    void X_rec(ulong d, bool z)
    {
        if ( d>=n )
        {
            if ( d<=n+1 )  // avoid duplicates
            {
                visit();
            }
        }
        else
        {
            if ( z )
            {
                rv[d]=0;  rv[d+1]=0;  X_rec(d+2, z);
                rv[d]=1;  rv[d+1]=0;  X_rec(d+2, ! z);
                rv[d]=1;  rv[d+1]=2;  rv[d+2]=0;  X_rec(d+3, z);
            }
            else
            {
                rv[d]=1;  rv[d+1]=2;  rv[d+2]=0;  X_rec(d+3, z);
                rv[d]=1;  rv[d+1]=0;  X_rec(d+2, ! z);
                rv[d]=0;  rv[d+1]=0;  X_rec(d+2, z);
            }
        }
    }

The initial call is X_rec(0, 0). The parameter z determines whether the list is generated in forward or backward order. No optimizations are made as these tend to obscure the idea. Here we could omit one statement rv[d]=1; in both branches, replace the arguments z and !z in the recursive calls by constants, or create an iterative version.

The number w(n) of words W(n) is determined by (some initial values and) a recursion. Counting the size of the lists on both sides of the recursion relation gives a relation for w(n). Relation 14.1-1 leads to the recursion

\[
w(n) \;=\; 2\, w(n-2) + w(n-3)
\tag{14.1-3}
\]

We can typically set w(0) = 1, as there is one empty list and it satisfies all conditions. The numbers w(n) are in fact the Fibonacci numbers.

14.2 Fibonacci words

[Figure 14.2-A: The first 34 Fibonacci words in counting order (left) and Gray codes through the first 34, 21, and 13 Fibonacci words (right). Dots are used for zeros.]

A recursive routine to generate the Fibonacci words (binary words not containing two consecutive ones) can be given as follows:

    ulong n;    // number of bits in words
    ulong *rv;  // bits of the word

    void fib_rec(ulong d)
    {
        if ( d>=n )  visit();
        else
        {
            rv[d]=0;  fib_rec(d+1);
            rv[d]=1;  rv[d+1]=0;  fib_rec(d+2);
        }
    }

We allocate one extra element (a sentinel) to reduce the number of if-statements in the code:

    int main()
    {
        n = 7;
        rv = new ulong[n+1];  // incl. sentinel rv[n]
        fib_rec(0);
        return 0;
    }

The output (assuming visit() simply prints the array) is given in the left of figure 14.2-A.
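For experimenting without the FXT demo framework, the counting-order routine can be wrapped so that it collects the words instead of printing them. The following is a minimal sketch: the struct name `FibWords`, its member `out`, and the choice to store each word as a string with dots for zeros are assumptions made here for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Collects the n-bit Fibonacci words (no two consecutive ones) in counting order.
struct FibWords
{
    unsigned long n;                  // number of bits in words
    std::vector<unsigned long> rv;    // bits of the word, plus sentinel rv[n]
    std::vector<std::string> out;     // visited words, '.' for 0 and '1' for 1

    explicit FibWords(unsigned long nbits)
        : n(nbits), rv(nbits+1, 0)
    { rec(0); }

    // Stand-in for visit(): record the first n bits as a string.
    void visit()
    {
        std::string s;
        for (unsigned long k=0; k<n; ++k)  s += ( rv[k] ? '1' : '.' );
        out.push_back(s);
    }

    // Same recursion as fib_rec(): either a zero, or a one followed by a zero.
    void rec(unsigned long d)
    {
        if ( d>=n )  { visit();  return; }
        rv[d]=0;  rec(d+1);
        rv[d]=1;  rv[d+1]=0;  rec(d+2);   // may write the sentinel rv[n]
    }
};
```

Constructing `FibWords f(7)` fills `f.out` with the 34 words of length 7, starting with the all-zero word and ending with the word 1.1.1.1. The sentinel is what allows the unconditional write to rv[d+1] when d = n−1.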
A simple modification of the routine generates a Gray code through the Fibonacci words [FXT: comb/fibgray-rec-demo.cc]:

    void fib_rec(ulong d, bool z)
    {
        if ( d>=n )  visit();
        else
        {
            z = !z;  // change direction for Gray code
            if ( z )
            {
                rv[d]=0;  fib_rec(d+1, z);
                rv[d]=1;  rv[d+1]=0;  fib_rec(d+2, z);
            }
            else
            {
                rv[d]=1;  rv[d+1]=0;  fib_rec(d+2, z);
                rv[d]=0;  fib_rec(d+1, z);
            }
        }
    }

The variable z controls the direction in the recursion; it is changed unconditionally with each step. The if-else blocks can be merged into

    rv[d]=!z;  rv[d+1]= z;  fib_rec(d+1+!z, z);
    rv[d]= z;  rv[d+1]=!z;  fib_rec(d+1+ z, z);

In the n-bit Fibonacci Gray code the numbers of ones in the first and last, second and second-last, etc. tracks are equal. Therefore the sequence of reversed words is also a Fibonacci Gray code.

The algorithm needs constant amortized time and about 70 million objects are generated per second. A bit-level algorithm is given in section 1.27.2 on page 76.

The algorithm for the list of the length-n Fibonacci words F(n) can be given as a recursion:

\[
F(n) \;=\; \begin{array}{l}
[1\,0 \,.\, F^R(n-2)] \\
[0 \,.\, F^R(n-1)]
\end{array}
\tag{14.2-1}
\]

The generation can be sped up by merging two steps:

\[
F(n) \;=\; \begin{array}{l}
[1\,0\,0 \,.\, F(n-3)] \\
[1\,0\,1\,0 \,.\, F(n-4)] \\
[0\,0 \,.\, F(n-2)] \\
[0\,1\,0 \,.\, F(n-3)]
\end{array}
\tag{14.2-2}
\]

14.3 Generalized Fibonacci words

[Figure 14.3-A: The 7-bit binary words with at most 2 consecutive ones in lexicographic (top) and minimal-change (bottom) order. Dots denote zeros.]

[Figure 14.3-B: Recursive structure for the 7-bit binary words with at most 2 consecutive ones.]

We generalize the Fibonacci words by allowing up to a fixed maximum number r of consecutive ones in a binary word. The Fibonacci words correspond to r = 1. Figure 14.3-A shows the 7-bit words with r = 2. The method to generate a Gray code for these words is a generalization of the recursion for the Fibonacci words. Write L_r(n) for the list of n-bit words with at most r consecutive ones, then the recursive structure for the Gray code is

\[
L_r(n) \;=\; \begin{array}{l}
[0 \,.\, L_r^R(n-1)] \\
[1\,0 \,.\, L_r^R(n-2)] \\
[1\,1\,0 \,.\, L_r^R(n-3)] \\
[\;\vdots\;] \\
[1^{r-2}\,0 \,.\, L_r^R(n-1-r+2)] \\
[1^{r-1}\,0 \,.\, L_r^R(n-1-r+1)] \\
[1^{r}\,0 \,.\, L_r^R(n-1-r)]
\end{array}
\tag{14.3-1}
\]

Figure 14.3-B shows the structure for L_2(7), corresponding to the three lowest sublists on the right side of the equation. An implementation is [FXT: comb/maxrep-gray-demo.cc]:

    ulong n;    // number of bits in words
    ulong *rv;  // bits of the word
    long mr;    // maximum number of consecutive ones

    void maxrep_rec(ulong d, bool z)
    {
        if ( d>=n )  visit();
        else
        {
            z = !z;

            long km = mr;
            if ( d+km > n )  km = n - d;

            if ( z )
            {
                // words: 0, 10, 110, 1110, ...
                for (long k=0; k<=km; ++k)
                {
                    rv[d+k] = 0;
                    maxrep_rec(d+1+k, z);
                    rv[d+k] = 1;
                }
            }
            else
            {
                // words: ... 1110, 110, 10, 0
                for (long k=0; k<km; ++k)  rv[d+k] = 1;
                for (long k=km; k>=0; --k)
                {
                    rv[d+k] = 0;
                    maxrep_rec(d+1+k, z);
                }
            }
        }
    }

[Figure 14.3-C: Gray codes of the 5-bit binary words with at most r consecutive ones, shown in columns for r = 5, 4, 3, 2, 1. The leftmost column is the complement of the Gray code of all binary words, the rightmost column is the Gray code for the Fibonacci words.]

Figure 14.3-C shows the 5-bit Gray codes for r ∈ {1, 2, 3, 4, 5}. Observe that all sequences are subsequences of the leftmost column.

    n:    0  1  2  3   4   5   6    7    8    9   10    11    12    13     14     15
    r=1:  1  2  3  5   8  13  21   34   55   89  144   233   377   610    987   1597
    r=2:  1  2  4  7  13  24  44   81  149  274  504   927  1705  3136   5768  10609
    r=3:  1  2  4  8  15  29  56  108  208  401  773  1490  2872  5536  10671  20569
    r=4:  1  2  4  8  16  31  61  120  236  464  912  1793  3525  6930  13624  26784
    r=5:  1  2  4  8  16  32  63  125  248  492  976  1936  3840  7617  15109  29970

Figure 14.3-D: Number of length-n binary words with at most r consecutive ones.

Let w_r(n) be the number of n-bit words with at most r consecutive ones, that is, the length of the list L_r(n). Taking the length of the lists on both sides of relation 14.3-1 gives the recursion

\[
w_r(n) \;=\; \sum_{j=0}^{r} w_r(n-1-j)
\tag{14.3-2}
\]

where we set w_r(n) = 2^n for 0 ≤ n ≤ r. The sequences for r ≤ 5 start as shown in figure 14.3-D. The sequences are the following entries in [312]: r = 1 is entry A000045 (the Fibonacci numbers), r = 2 is A000073, r = 3 is A000078, r = 4 is A001591, and r = 5 is A001592. The variant of the Fibonacci sequence where each number is the sum of its k predecessors is also called the Fibonacci k-step sequence. The generating function for w_r(n) is

\[
\sum_{n=0}^{\infty} w_r(n)\, x^n \;=\; \frac{\sum_{k=0}^{r} x^k}{1 - \sum_{k=1}^{r+1} x^k}
\tag{14.3-3}
\]

Alternative Gray code for words without substrings 111 (r = 2) ‡

[Figure 14.3-E: The 7-bit binary words with at most 2 consecutive ones in a minimal-change order.]

The list recursion for the Gray code for binary words without substrings 111 is the special case r = 2 of relation 14.3-1 on page 307:

\[
L_2(n) \;=\; \begin{array}{l}
[1\,1\,0 \,.\, L_2^R(n-3)] \\
[1\,0 \,.\, L_2^R(n-2)] \\
[0 \,.\, L_2^R(n-1)]
\end{array}
\tag{14.3-4}
\]

A different Gray code is generated by the recursion

\[
L'_2(n) \;=\; \begin{array}{l}
[1\,0 \,.\, L'_2(n-2)] \\
[1\,1\,0 \,.\, L'^{R}_2(n-3)] \\
[0 \,.\, L'_2(n-1)]
\end{array}
\tag{14.3-5}
\]

The ordering is shown in figure 14.3-E. It was created with the program [FXT: comb/no111-gray-demo.cc].

Alternative Gray code for words without substrings 1111 (r = 3) ‡

[Figure 14.3-F: The 7-bit binary words with at most 3 consecutive ones in a minimal-change order.]

A list recursion for an alternative Gray code for binary words without substrings 1111 (r = 3) is

\[
L'_3(n) \;=\; \begin{array}{l}
[1\,1\,0 \,.\, L'^{R}_3(n-3)] \\
[0 \,.\, L'^{R}_3(n-1)] \\
[1\,1\,1\,0 \,.\, L'^{R}_3(n-4)] \\
[1\,0 \,.\, L'^{R}_3(n-2)]
\end{array}
\tag{14.3-6}
\]

The ordering is shown in figure 14.3-F. It was created with the program [FXT: comb/no1111-gray-demo.cc].

For all odd r ≥ 3 a Gray code is generated by a list recursion where the prefixes with an even number of ones are followed by those with an odd number of ones. For example, with r = 5 the recursion is (each prefix of length j is followed by a list of words of length n − j)

\[
L'_5(n) \;=\; \begin{array}{l}
[1\,1\,1\,1\,0 \,.\, L'^{R}_5(n-5)] \\
[1\,1\,0 \,.\, L'^{R}_5(n-3)] \\
[0 \,.\, L'^{R}_5(n-1)] \\
[1\,1\,1\,1\,1\,0 \,.\, L'^{R}_5(n-6)] \\
[1\,1\,1\,0 \,.\, L'^{R}_5(n-4)] \\
[1\,0 \,.\, L'^{R}_5(n-2)]
\end{array}
\tag{14.3-7}
\]

14.4 Run-length limited (RLL) words

[Figure 14.4-A: Lex order for RLL(2) words (left) corresponds to Gray code for Fibonacci words (right).]

Words with conditions on the minimum and maximum number of repetitions of a value are called run-length limited (RLL) words. Here we consider only binary words where the number of both consecutive zeros and ones is at most r where r ≥ 2. We call such words that start with a zero RLL(r) words.

RLL(r) words of length n correspond to generalized Fibonacci words (with at most r − 1 consecutive ones) of length n − 1: the k-th digit (k ≥ 1) of the Fibonacci word is one if the k-th digit of the RLL word is unchanged, that is, equal to the preceding digit. The list of RLL(2) words in lexicographic order is shown in figure 14.4-A; note that the corresponding Fibonacci words are in minimal-change order. The listing was generated by the following recursion [FXT: comb/rll-rec-demo.cc]:

    ulong n;  // number of bits in words

    void rll_rec(ulong d, bool z)
    {
        if ( d>=n )  visit();
        else
        {
            if ( z==0 )
            {
                rv[d]=0;  rv[d+1]=1;  rll_rec(d+2, 1);
                rv[d]=1;  rll_rec(d+1, 1);
            }
            else  // z==1
            {
                rv[d]=0;  rll_rec(d+1, 0);
                rv[d]=1;  rv[d+1]=0;  rll_rec(d+2, 0);
            }
        }
    }

[Figure 14.4-B: Order for RLL(2) words (left) corresponding to lex order for Fibonacci words (right). A middle column gives the number of changed bits for each transition.]

The variable z records whether the last bit was a one. By swapping the lines in the branch for z = 1 we obtain an ordering which corresponds to the (reversed) lexicographic order of the Fibonacci words shown in figure 14.4-B. The average number of changed bits between successive elements tends to 1 + 1/√5 ≈ 1.44721 for n → ∞. The order is not a Gray code for the RLL words; the maximum number of changed bits among all transitions for n ≤ 30 is

    n:    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 ...
    max:  1  1  1  2  3  3  3  4  5  5  5  6  7  7  7  8  9  9  9 10 11 11 11 12 13 13 13 14 15 15 ...

14.5 Digit x followed by at least x zeros

[Figure 14.5-A: Gray code for the length-6 words with maximal digit 3 where a digit x is followed by at least x zeros. Dots denote zeros.]

Figure 14.5-A shows a Gray code for the length-6 words with maximal digit 3 where a digit x is followed by at least x zeros. For the Gray code list Z_r(n) of the length-n words with maximal digit r we have

\[
Z_r(n) \;=\; \begin{array}{l}
[0 \,.\, Z_r^R(n-1)] \\
[1\,0 \,.\, Z_r^R(n-2)] \\
[2\,0\,0 \,.\, Z_r^R(n-3)] \\
[3\,0\,0\,0 \,.\, Z_r^R(n-4)] \\
[\;\vdots\;] \\
[r\,0^r \,.\, Z_r^R(n-r-1)]
\end{array}
\tag{14.5-1}
\]

An implementation is [FXT: comb/gexz-gray-demo.cc]:

    ulong n;    // number of digits in words
    ulong *rv;  // digits of the word
    ulong mr;   // radix== mr+1

    void gexz_rec(ulong d, bool z)
    {
        if ( d>=n )  visit();
        else
        {
            if ( z )
            {
                // words 0, 10, 200, 3000, 40000, ...
                ulong k = 0;
                do
                {
                    rv[d]=k;
                    for (ulong j=1; j<=k; ++j)  rv[d+j] = 0;
                    gexz_rec(d+k+1, !z);
                }
                while ( ++k <= mr );
            }
            else
            {
                // words ..., 40000, 3000, 200, 10, 0
                ulong k = mr + 1;
                do
                {
                    --k;
                    rv[d]=k;
                    for (ulong j=1; j<=k; ++j)  rv[d+j] = 0;
                    gexz_rec(d+k+1, !z);
                }
                while ( k != 0 );
            }
        }
    }

    n:    0  1   2   3   4   5    6    7    8     9    10    11    12     13     14     15
    r=1:  1  2   3   5   8  13   21   34   55    89   144   233   377    610    987   1597
    r=2:  1  3   5   9  17  31   57  105  193   355   653  1201  2209   4063   7473  13745
    r=3:  1  4   7  13  25  49   94  181  349   673  1297  2500  4819   9289  17905  34513
    r=4:  1  5   9  17  33  65  129  253  497   977  1921  3777  7425  14597  28697  56417
    r=5:  1  6  11  21  41  81  161  321  636  1261  2501  4961  9841  19521  38721  76806

Figure 14.5-B: Number of radix-(r + 1), length-n words where a digit x is followed by at least x zeros.

Let z_r(n) be the number of length-n words in Z_r(n), then

\[
z_r(n) \;=\; \sum_{j=1}^{r+1} z_r(n-j)
\tag{14.5-2}
\]

where we set z_r(n) = 1 for n ≤ 0. The sequences for r ≤ 5 start as shown in figure 14.5-B.
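The count given by relation 14.5-2 can be cross-checked against the words actually produced by the recursion. The sketch below re-implements the recursion in a self-contained form that stores the visited words; the names `GexzGray`, `z_count`, and `is_gray`, the merged single loop over the digit, and the initial direction `z = true` are assumptions of this sketch rather than details of the FXT demo.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Word = std::vector<unsigned long>;

// Generates the length-n words with maximal digit mr where a digit x
// is followed by at least x zeros, in the order of the recursion 14.5-1.
struct GexzGray
{
    unsigned long n, mr;
    Word rv;                 // digits; extra room because a block may overshoot n
    std::vector<Word> out;   // visited words (first n digits)

    GexzGray(unsigned long len, unsigned long maxdigit)
        : n(len), mr(maxdigit), rv(len + maxdigit + 1, 0)
    { rec(0, true); }

    void rec(unsigned long d, bool z)
    {
        if ( d>=n )
        {
            out.emplace_back(rv.begin(), rv.begin() + static_cast<std::ptrdiff_t>(n));
            return;
        }
        for (unsigned long i=0; i<=mr; ++i)
        {
            const unsigned long k = ( z ? i : mr - i );  // ascending or descending digit
            rv[d] = k;
            for (unsigned long j=1; j<=k; ++j)  rv[d+j] = 0;  // the k forced zeros
            rec(d+k+1, !z);
        }
    }
};

// z_r(n) by relation 14.5-2, with z_r(n) = 1 for n <= 0.
unsigned long z_count(long n, long r)
{
    if ( n<=0 )  return 1;
    unsigned long s = 0;
    for (long j=1; j<=r+1; ++j)  s += z_count(n-j, r);
    return s;
}

// True if successive words differ in exactly one position.
bool is_gray(const std::vector<Word> &w)
{
    for (std::size_t j=1; j<w.size(); ++j)
    {
        int diff = 0;
        for (std::size_t k=0; k<w[j].size(); ++k)  diff += ( w[j][k] != w[j-1][k] );
        if ( diff != 1 )  return false;
    }
    return true;
}
```

For example, `GexzGray g(6, 3)` yields 94 words, which agrees with `z_count(6, 3)` and with the entry for r = 3, n = 6 in figure 14.5-B. The plain recursive `z_count` takes exponential time and is meant only for small arguments.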
The sequences are the following entries in [312]: r = 1 is entry A000045 (the Fibonacci numbers), r = 2 is A000213, r = 3 is A000288, r = 4 is A000322, and r = 5 is A000383.

14.6 Generalized Pell words

14.6.1 Gray code for Pell words

[Figure 14.6-A: Start and end of the lists of 5-digit Pell words in counting order (top) and Gray code order (bottom). The lowest row is the least significant digit, dots denote zeros.]

A Gray code of the Pell words (ternary words without the substrings "21" and "22") can be computed as follows:

    ulong n;    // number of digits in words
    ulong *rv;  // digits of the word
    bool zq;    // order: 0==>Lex, 1==>Gray

    void pell_rec(ulong d, bool z)
    {
        if ( d>=n )  visit();
        else
        {
            if ( 0==z )
            {
                rv[d]=0;  pell_rec(d+1, z);
                rv[d]=1;  pell_rec(d+1, zq^z);
                rv[d]=2;  rv[d+1]=0;  pell_rec(d+2, z);
            }
            else
            {
                rv[d]=2;  rv[d+1]=0;  pell_rec(d+2, z);
                rv[d]=1;  pell_rec(d+1, zq^z);
                rv[d]=0;  pell_rec(d+1, z);
            }
        }
    }

The global Boolean variable zq controls whether the counting order or the Gray code is generated. The code is given in [FXT: comb/pellgray-rec-demo.cc].
Both orderings are shown in figure 14.6-A. About 110 million words per second are generated. The computation of a function whose power series coefficients are related to the Pell Gray code is described in section 38.12.3 on page 760.

14.6.2 Gray code for generalized Pell words

[Figure 14.6-B: Gray code for 4-digit radix-4 strings with no substring 3x with x ≠ 0.]

A generalization of the Pell words are the radix-(r + 1) strings where the substring rx with x ≠ 0 is forbidden (for example, with r = 9 a nine can only be followed by a zero). Let P_r(n) be the list of length-n words in Gray code order. The list can be generated by the recursion

\[
P_r(n) \;=\; \begin{array}{l}
[0 \,.\, P_r(n-1)] \\
[1 \,.\, P_r^R(n-1)] \\
[2 \,.\, P_r(n-1)] \\
[3 \,.\, P_r^R(n-1)] \\
[\;\vdots\;] \\
[(r-1) \,.\, P_r^R(n-1)] \\
[(r)\,0 \,.\, P_r(n-2)]
\end{array}
\tag{14.6-1a}
\]

if r is even, and by the recursion

\[
P_r(n) \;=\; \begin{array}{l}
[0 \,.\, P_r^R(n-1)] \\
[1 \,.\, P_r(n-1)] \\
[2 \,.\, P_r^R(n-1)] \\
[3 \,.\, P_r(n-1)] \\
[\;\vdots\;] \\
[(r-1) \,.\, P_r^R(n-1)] \\
[(r)\,0 \,.\, P_r^R(n-2)]
\end{array}
\tag{14.6-1b}
\]

if r is odd. In both cases the direction of the sublists alternates from one leading digit to the next. Figure 14.6-B shows a Gray code for the 4-digit strings with r = 3.
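The two parity cases can be folded into a single rule: the sublist for the leading digit k is reversed exactly if k + r is odd, and the final sublist with prefix r 0 is reversed exactly if r is odd. The following sketch builds the list explicitly under that reading of the recursion and checks the minimal-change property. The function names, the use of characters '0' to '9' for digits (so r ≤ 9), and the handling of n = 1 (where the prefix r 0 is truncated to the single digit r) are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using List = std::vector<std::string>;

// P_r(n) as an explicit list of strings, for 1 <= r <= 9 and small n.
List pell_list(int n, int r)
{
    if ( n<=0 )  return { "" };
    List out;
    const List sub = pell_list(n-1, r);
    for (int k=0; k<r; ++k)   // sublists [k . P(n-1)] or [k . P^R(n-1)]
    {
        List s = sub;
        if ( (k + r) & 1 )  std::reverse(s.begin(), s.end());
        for (const std::string &w : s)  out.push_back(std::string(1, char('0'+k)) + w);
    }
    const std::string pr(1, char('0'+r));
    if ( n==1 )  out.push_back(pr);   // prefix "r0" truncated to one digit
    else                              // sublist [r 0 . P(n-2)], reversed if r is odd
    {
        List s = pell_list(n-2, r);
        if ( r & 1 )  std::reverse(s.begin(), s.end());
        for (const std::string &w : s)  out.push_back(pr + "0" + w);
    }
    return out;
}

// True if successive words differ in exactly one position.
bool is_gray(const List &w)
{
    for (std::size_t j=1; j<w.size(); ++j)
    {
        int diff = 0;
        for (std::size_t k=0; k<w[j].size(); ++k)  diff += ( w[j][k] != w[j-1][k] );
        if ( diff != 1 )  return false;
    }
    return true;
}
```

For r = 2 and n = 2 this gives the list 00, 01, 02, 12, 11, 10, 20, and for r = 1 it gives a Gray code for the Fibonacci words. Because every sublist is recomputed and copied, the sketch is suitable only for checking small cases.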
An implementation of the algorithm is [FXT: comb/pellgen-gray-demo.cc]:

    ulong n;    // number of digits in words
    ulong *rv;  // digits of the word (radix r+1)
    long r;     // Forbidden substrings are [r, x] where x!=0

    void pellgen_rec(ulong d, bool z)
    {
        if ( d>=n )  visit();
        else
        {
            const bool p = r & 1;  // parity of r
            rv[d] = 0;
            if ( z )
            {
                for (long k=0; k<r; ++k)  { rv[d] = k;  pellgen_rec(d+1, z ^ p ^ (k&1)); }
                { rv[d] = r;  rv[d+1] = 0;  pellgen_rec(d+2, p ^ z); }
            }
            else
            {
                { rv[d] = r;  rv[d+1] = 0;  pellgen_rec(d+2, p ^ z); }
                for (long k=r-1; k>=0; --k)  { rv[d] = k;  pellgen_rec(d+1, z ^ p ^ (k&1)); }
            }
        }
    }

With r = 1 we again get the Gray code for Fibonacci words.

    n:    0  1   2    3    4     5      6       7       8        9        10        11
    r=1:  1  2   3    5    8    13     21      34      55       89       144       233
    r=2:  1  3   7   17   41    99    239     577    1393     3363      8119     19601
    r=3:  1  4  13   43  142   469   1549    5116   16897    55807    184318    608761
    r=4:  1  5  21   89  377  1597   6765   28657  121393   514229   2178309   9227465
    r=5:  1  6  31  161  836  4341  22541  117046  607771  3155901  16387276  85092281

Figure 14.6-C: Number of length-n, radix-(r + 1) words with no substring r x with x ≠ 0.

Counting the words of P_r(n) on both sides of relations 14.6-1a and 14.6-1b we find for their number p_r(n)

\[
p_r(n) \;=\; r\, p_r(n-1) + p_r(n-2)
\tag{14.6-2}
\]

where p_r(0) = 1 and p_r(1) = r + 1. For r ≤ 5 the sequences start as shown in figure 14.6-C. The sequences are the following entries in [312]: r = 1: A000045; r = 2: A001333; r = 3: A003688; r = 4: A015448; r = 5: A015449. The generating function for p_r(n) is

\[
\sum_{n=0}^{\infty} p_r(n)\, x^n \;=\; \frac{1+x}{1 - r\,x - x^2}
\tag{14.6-3}
\]

14.7 Sparse signed binary words

[Figure 14.7-A: A Gray code through the 85 sparse 6-bit signed binary words. Dots are used for zeros, the symbols 'P' and 'M' denote +1 and −1, respectively.]

Figure 14.7-A shows a minimal-change order (Gray code) for the sparse signed binary words (nonadjacent form (NAF), see section 1.23 on page 61). Note that we allow a digit to switch between +1 and −1. If all words with any positive digit ('P') are omitted, we obtain the Gray code for Fibonacci words given in section 14.2 on page 305.

A recursive routine for the generation of the Gray code is given in [FXT: comb/naf-gray-rec-demo.cc]:

    ulong n;  // number of digits of the string
    int *rv;  // the string

    void sb_rec(ulong d, bool z)
    {
        if ( d>=n )  visit();
        else
        {
            if ( 0==z )
            {
                rv[d]=0;  sb_rec(d+1, 1);
                rv[d]=-1;  rv[d+1]=0;  sb_rec(d+2, 1);
                rv[d]=+1;  rv[d+1]=0;  sb_rec(d+2, 0);
            }
            else
            {
                rv[d]=+1;  rv[d+1]=0;  sb_rec(d+2, 1);
                rv[d]=-1;  rv[d+1]=0;  sb_rec(d+2, 0);
                rv[d]=0;  sb_rec(d+1, 0);
            }
        }
    }

About 120 million words per second are generated.

Let S(n) be the number of n-digit sparse signed binary numbers (of both signs) and P(n) be the number of positive n-digit sparse signed binary numbers, then

    n:     0  1  2   3   4   5   6    7    8    9    10    11    12     13     14     15     16
    S(n):  1  3  5  11  21  43  85  171  341  683  1365  2731  5461  10923  21845  43691  87381
    P(n):  1  2  3   6  11  22  43   86  171  342   683  1366  2731   5462  10923  21846  43691

The sequences of values S(n) and P(n) are respectively entries A001045 and A005578 in [312]. We have (with e := n mod 2)

\[
\begin{aligned}
S(n) &= \frac{2^{n+2} - 1 + 2e}{3} \;=\; 2\,S(n-1) - 1 + 2e && \text{(14.7-1a)} \\
     &= S(n-1) + 2\,S(n-2) \;=\; 3\,S(n-2) + 2\,S(n-3) \;=\; 2\,P(n) - 1 && \text{(14.7-1b)} \\
P(n) &= \frac{2^{n+1} + 1 + e}{3} \;=\; 2\,P(n-1) - 1 + e \;=\; S(n-1) + e && \text{(14.7-1c)} \\
     &= P(n-1) + S(n-2) \;=\; P(n-2) + S(n-2) + S(n-3) && \text{(14.7-1d)} \\
     &= S(n-2) + S(n-3) + S(n-4) + \ldots + S(2) + S(1) + 3 && \text{(14.7-1e)} \\
     &= 2\,P(n-1) + P(n-2) - 2\,P(n-3) && \text{(14.7-1f)}
\end{aligned}
\]

Almost Gray code for positive words ‡

[Figure 14.7-B: An ordering of the 86 sparse 7-bit positive signed binary words that is almost a Gray code. The transitions that are not minimal are marked with '><'. Dots denote zeros.]

If we start with the following routine that calls sb_rec() only after a one has been inserted, we get an ordering of the positive numbers:

    void pos_rec(ulong d, bool z)
    {
        if ( d>=n )  visit();
        else
        {
            if ( 0==z )
            {
                rv[d]=0;  pos_rec(d+1, 1);
                rv[d]=+1;  rv[d+1]=0;  sb_rec(d+2, 1);
            }
            else
            {
                rv[d]=+1;  rv[d+1]=0;  sb_rec(d+2, 0);
                rv[d]=0;  pos_rec(d+1, 0);
            }
        }
    }

The ordering with n-digit words is a Gray code, except for n − 4 transitions.
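The closed forms for S(n) and P(n) in relations 14.7-1a and 14.7-1c can be compared with the words that the recursion of sb_rec() actually generates. The sketch below is a self-contained re-implementation that stores the words; the names `NafGray`, `S_closed`, `P_closed`, `count_nonneg`, and `is_gray` are hypothetical, the initial direction `z = false` is an assumption, and a word is counted for P(n) if it is zero or its first nonzero digit (at the lowest index) is +1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Word = std::vector<int>;

// Generates the n-digit sparse signed binary words (digits -1, 0, +1,
// no two adjacent nonzero digits) with the recursion of sb_rec().
struct NafGray
{
    unsigned long n;
    Word rv;                 // digits, plus one sentinel rv[n]
    std::vector<Word> out;   // visited words

    explicit NafGray(unsigned long len) : n(len), rv(len+1, 0)  { rec(0, false); }

    void rec(unsigned long d, bool z)
    {
        if ( d>=n )
        {
            out.emplace_back(rv.begin(), rv.begin() + static_cast<std::ptrdiff_t>(n));
            return;
        }
        if ( !z )
        {
            rv[d]=0;   rec(d+1, true);
            rv[d]=-1;  rv[d+1]=0;  rec(d+2, true);
            rv[d]=+1;  rv[d+1]=0;  rec(d+2, false);
        }
        else
        {
            rv[d]=+1;  rv[d+1]=0;  rec(d+2, true);
            rv[d]=-1;  rv[d+1]=0;  rec(d+2, false);
            rv[d]=0;   rec(d+1, false);
        }
    }
};

// Closed forms from relations 14.7-1a and 14.7-1c, with e = n mod 2.
unsigned long S_closed(unsigned long n)  { return ( (1UL << (n+2)) - 1 + 2*(n & 1) ) / 3; }
unsigned long P_closed(unsigned long n)  { return ( (1UL << (n+1)) + 1 + (n & 1) ) / 3; }

// Number of words that are zero or whose first nonzero digit is +1.
unsigned long count_nonneg(const std::vector<Word> &w)
{
    unsigned long ct = 0;
    for (const Word &x : w)
    {
        int lead = 0;
        for (int dgt : x)  if ( dgt != 0 )  { lead = dgt;  break; }
        ct += ( lead >= 0 );
    }
    return ct;
}

// True if successive words differ in exactly one digit
// (a switch between +1 and -1 counts as one change).
bool is_gray(const std::vector<Word> &w)
{
    for (std::size_t j=1; j<w.size(); ++j)
    {
        int diff = 0;
        for (std::size_t k=0; k<w[j].size(); ++k)  diff += ( w[j][k] != w[j-1][k] );
        if ( diff != 1 )  return false;
    }
    return true;
}
```

For n = 6 the generator produces 85 words, of which 43 are counted by `count_nonneg`, in agreement with S(6) and P(6) in the table above.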
An ordering with only about n/2 non-Gray transitions is generated by the more complicated recursion [FXT: comb/naf-pos-rec-demo.cc]:

    void pos_AAA(ulong d, bool z)
    {
        if ( d>=n )  visit();
        else
        {
            if ( 0==z )
            {
                rv[d]=+1;  rv[d+1]=0;  sb_rec(d+2, 0);  // 0
                rv[d]=0;  pos_AAA(d+1, 1);  // 1
            }
            else
            {
                rv[d]=0;  pos_BBB(d+1, 0);  // 0
                rv[d]=+1;  rv[d+1]=0;  sb_rec(d+2, 1);  // 1
            }
        }
    }

    void pos_BBB(ulong d, bool z)
    {
        if ( d>=n )  visit();
        else
        {
            if ( 0==z )
            {
                rv[d]=+1;  rv[d+1]=0;  sb_rec(d+2, 1);  // 1
                rv[d]=0;  pos_BBB(d+1, 1);  // 1
            }
            else
            {
                rv[d]=0;  pos_AAA(d+1, 0);  // 0
                rv[d]=+1;  rv[d+1]=0;  sb_rec(d+2, 0);  // 0
            }
        }
    }

The initial call is pos_AAA(0,0). The result for n = 7 is shown in figure 14.7-B. We list the number N of non-Gray transitions and the number X of digit changes in excess of a Gray code for n ≤ 30:

    n:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
    N:  0  0  0  0  1  2  2  2  3  4  4  4  5  6  6  6  7  8  8  8  9 10 10 10 11 12 12 12 13 14
    X:  0  0  0  0  1  3  4  4  5  7  8  8  9 11 12 12 13 15 16 16 17 19 20 20 21 23 24 24 25 27

14.8 Strings with no two consecutive nonzero digits

[Figure 14.8-A: Gray code for the length-5 radix-4 strings with no two consecutive nonzero digits. The listing has 97 strings, from .3..3 to 3..3. (dots denote zeros).]

A Gray code for the length-n strings with radix (r + 1) and no two consecutive nonzero digits is generated by the following recursion for the list D_r(n):

\[
D_r(n) \;=\; \begin{array}{l}
[0 \,.\, D_r^R(n-1)] \\
[1\,0 \,.\, D_r^R(n-2)] \\
[2\,0 \,.\, D_r(n-2)] \\
[3\,0 \,.\, D_r^R(n-2)] \\
[4\,0 \,.\, D_r(n-2)] \\
[5\,0 \,.\, D_r^R(n-2)] \\
[\;\vdots\;]
\end{array}
\tag{14.8-1}
\]

An implementation is [FXT: comb/ntnz-gray-demo.cc]:

    ulong n;    // length of strings
    ulong *rv;  // digits of strings
    ulong mr;   // max digit

    void ntnz_rec(ulong d, bool z)
    {
        if ( d>=n )  visit();
        else
        {
            if ( 0==z )
            {
                rv[d]=0;  ntnz_rec(d+1, 1);
                for (ulong t=1; t<=mr; ++t)  { rv[d]=t;  rv[d+1]=0;  ntnz_rec(d+2, t&1); }
            }
            else
            {
                for (ulong t=mr; t>0; --t)  { rv[d]=t;  rv[d+1]=0;  ntnz_rec(d+2, !(t&1)); }
                rv[d]=0;  ntnz_rec(d+1, 0);
            }
        }
    }

Figure 14.8-A shows the Gray code for length-5, radix-4 (r = 3) strings. Setting r = 2, replacing 1 with −1, and 2 with +1, gives the Gray code for the sparse binary words (figure 14.7-A on page 315). With r = 1 we get the Gray code for the Fibonacci words.

    n:    0  1   2   3   4    5    6     7     8      9     10      11      12       13       14
    r=1:  1  2   3   5   8   13   21    34    55     89    144     233     377      610      987
    r=2:  1  3   5  11  21   43   85   171   341    683   1365    2731    5461    10923    21845
    r=3:  1  4   7  19  40   97  217   508  1159   2683   6160   14209   32689    75316   173383
    r=4:  1  5   9  29  65  181  441  1165  2929   7589  19305   49661  126881   325525   833049
    r=5:  1  6  11  41  96  301  781  2286  6191  17621  48576  136681  379561  1062966  2960771

Figure 14.8-B: Number of radix-(r + 1), length-n words with no two consecutive nonzero digits.
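The entries of figure 14.8-B can be reproduced without any recursion by brute force: enumerate all (r + 1)^n strings and count those with no two adjacent nonzero digits. The following sketch does this with an odometer-style counter; the function name `count_ntnz` is hypothetical and the method is a plain exhaustive check, not the Gray code algorithm of this section.

```cpp
#include <cassert>
#include <vector>

// Count the length-n strings over the digits 0..r that contain
// no two consecutive nonzero digits, by testing all (r+1)^n strings.
unsigned long count_ntnz(unsigned n, unsigned r)
{
    std::vector<unsigned> a(n, 0);   // current string, all zeros at the start
    unsigned long ct = 0;
    while ( true )
    {
        // test the restriction on the current string
        bool ok = true;
        for (unsigned k=1; k<n; ++k)
            if ( a[k-1]!=0 && a[k]!=0 )  { ok = false;  break; }
        if ( ok )  ++ct;

        // advance to the next string (mixed-radix increment)
        unsigned j = 0;
        while ( j<n && a[j]==r )  { a[j] = 0;  ++j; }
        if ( j==n )  break;          // wrapped around: all strings seen
        ++a[j];
    }
    return ct;
}
```

For instance, `count_ntnz(5, 3)` gives 97, the entry for r = 3 and n = 5. The running time grows as (r + 1)^n, so the check is practical only for the small values shown in the table.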
Counting the elements on both sides of relation 14.8-1 we find that for the number dr (n) of strings in the list Dr (n) we have dr (n) = dr (n − 1) + r dr (n − 2) (14.8-2) where dr (0) = 1 and dr (1) = r +1. The sequences of these numbers start as shown in figure 14.8-B. These are the following entries in [312]: r = 1: A000045; r = 2: A001045; r = 3: A006130; r = 4: A006131; r = 5: A015440; r = 6: A015441; r = 7: A015442; r = 8: A015443. The generating function for dr (n) is ∞ X n=0 14.9 dr (n) xn = 1+rx 1 − x − r x2 (14.8-3) Strings with no two consecutive zeros ............111111111111111222222222222222333333333333333 111122223333333322221111......111122223333333322221111... 321..123321..123321..123321123321..123321..123321..123321 ................11111111111111111111112222222222222222222222 11111111222222222222222211111111............1111111122222222 222111....111222222111....111222222111111222222111....111222 21..12211221..1221..12211221..1221..1221..1221..12211221..12 Figure 14.9-A: Gray codes for strings with no two consecutive zeros: length-3 radix-4 (left) and length-4 radix-3 (right). Dots denote zeros. Gray codes for strings with no two consecutive zeros are shown in figure 14.9-A. The recursion for the 14.9: Strings with no two consecutive zeros 319 list Zr (n) with radix (r + 1) is [0 1 . ZrR (n − 2)] [0 2 . Zr (n − 2) ] [0 3 . ZrR (n − 2)] [0 4 . Zr (n − 2) ] [0 5 . ZrR (n − 2)] .. [ . ] [0 1 . Zr (n − 2) ] [0 2 . ZrR (n − 2)] [0 3 . Zr (n − 2) ] [0 4 . ZrR (n − 2)] [0 5 . Zr (n − 2) ] .. [ . ] Zr (n) = [0 r . ZrR (n − 2)] for r even, [1 . ZrR (n − 1) ] [2 . Zr (n − 1) ] [3 . ZrR (n − 1) ] [4 . Zr (n − 1) ] .. [ . ] Zr (n) = [0 r . ZrR (n − 2)] for r odd. [1 . Zr (n − 1) ] [2 . ZrR (n − 1) ] [3 . Zr (n − 1) ] [4 . ZrR (n − 1) ] .. [ . ] [r . ZrR (n − 1) ] [r . 
Zr (n − 1) ]   (14.9-1)

An implementation is given in [FXT: comb/ntz-gray-demo.cc]:

    ulong n;    // number of digits in words
    ulong *rv;  // digits of the word (radix r+1)
    long r;     // Forbidden substrings are [r, x] where x!=0

    void ntz_rec(ulong d, bool z)
    {
        if ( d>=n )  visit();
        else
        {
            bool w = 0;          // r-parity: w depends on z ...
            if ( r&1 )  w = !z;  // ... if r odd

            if ( z )
            {
                // words 0X:
                rv[d] = 0;
                if ( d+2<=n )
                {
                    for (long k=1; k<=r; ++k, w=!w)  { rv[d+1]=k;  ntz_rec(d+2, w); }
                }
                else
                {
                    ntz_rec(d+1, w);
                    w = !w;
                }

                w ^= (r&1);  // r-parity: change direction if r odd

                // words X:
                for (long k=1; k<=r; ++k, w=!w)  { rv[d]=k;  ntz_rec(d+1, w); }
            }
            else
            {
                // words X:
                for (long k=r; k>=1; --k, w=!w)  { rv[d]=k;  ntz_rec(d+1, w); }

                w ^= (r&1);  // r-parity: change direction if r odd

                // words 0X:
                rv[d] = 0;
                if ( d+2<=n )
                {
                    for (long k=r; k>=1; --k, w=!w)  { rv[d+1]=k;  ntz_rec(d+2, w); }
                }
                else
                {
                    ntz_rec(d+1, w);
                    w = !w;
                }
            }
        }
    }

With r = 1 we obtain the complement of the minimal-change list of Fibonacci words.

  n:  0  1   2    3     4     5      6       7        8        9        10         11
r=1:  1  2   3    5     8    13     21      34       55       89       144        233
r=2:  1  3   8   22    60   164    448    1224     3344     9136     24960      68192
r=3:  1  4  15   57   216   819   3105   11772    44631   169209    641520    2432187
r=4:  1  5  24  116   560  2704  13056   63040   304384  1469696   7096320   34264064
r=5:  1  6  35  205  1200  7025  41125  240750  1409375  8250625  48300000  282753125

Figure 14.9-B: Number of radix-(r + 1), length-n words with no two consecutive zeros.

Let $z_r(n)$ be the number of words in the list $Z_r(n)$. A word either starts with a nonzero digit, followed by any valid word of length n − 1, or with a zero and a nonzero digit, followed by any valid word of length n − 2. Therefore

\[
z_r(n) \;=\; r\,z_r(n-1) + r\,z_r(n-2)
\tag{14.9-2}
\]

where $z_r(0) = 1$ and $z_r(1) = r + 1$. The sequences for r ≤ 5 start as shown in figure 14.9-B. These (for r ≤ 4) are the following entries in [312]: r = 1: A000045; r = 2: A028859; r = 3: A125145; r = 4: A086347.
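Relation 14.9-2 can be checked against a direct count for small parameters. The sketch below is only an illustration; the function names `ntz_count()` and `ntz_count_brute()` are hypothetical. The first function iterates $z_r(n) = r\,z_r(n-1) + r\,z_r(n-2)$ with $z_r(0)=1$ and $z_r(1)=r+1$, the second runs through all $(r+1)^n$ words and counts those without two consecutive zeros.

```cpp
#include <cassert>
#include <vector>

typedef unsigned long ulong;

// z_r(n) via the recurrence z_r(n) = r*z_r(n-1) + r*z_r(n-2),
// with z_r(0) = 1 and z_r(1) = r+1.
ulong ntz_count(ulong r, ulong n)
{
    assert( r>=1 );
    ulong a = 1, b = r + 1;   // z_r(0), z_r(1)
    if ( n==0 )  return a;
    for (ulong j=2; j<=n; ++j)  { ulong c = r * (a + b);  a = b;  b = c; }
    return b;
}

// Brute force: run through all (r+1)^n words (as a radix-(r+1) counter)
// and count those that do not contain two consecutive zeros.
ulong ntz_count_brute(ulong r, ulong n)
{
    std::vector<ulong> w(n, 0);
    ulong ct = 0;
    while ( true )
    {
        bool ok = true;
        for (ulong j=1; j<n; ++j)
            if ( w[j-1]==0 && w[j]==0 )  { ok = false;  break; }
        if ( ok )  ++ct;

        ulong j = 0;   // increment the counter
        while ( j<n && w[j]==r )  { w[j] = 0;  ++j; }
        if ( j==n )  break;   // counter wrapped around: all words seen
        ++w[j];
    }
    return ct;
}
```

With these definitions `ntz_count(3, 4)` gives 216 and `ntz_count(2, 3)` gives 22, the corresponding entries of figure 14.9-B. The brute-force routine is exponential in n and is intended for cross-checking small cases only.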
The generating function for $z_r(n)$ is

\[
\sum_{n=0}^{\infty} z_r(n)\,x^n \;=\; \frac{1 + x}{1 - r\,x - r\,x^2}
\tag{14.9-3}
\]

14.10 Binary strings without substrings 1x1 or 1xy1 ‡

14.10.1 No substrings 1x1

    ........................................111111111111111111111111
    .........................111111111111111...............111111111
    ...............1111111111.........111111........................
    .........111111......1111........................111111.........
    ......111....11................111............111....11......111
    ....11..1..........11........11..1....11....11..1..........11..1
    ..11.1.....11....11.1..11..11.1.....11.1..11.1.....11....11.1...
    .1.1...1..1.1.1.1.1...1.1.1.1...1..1.1...1.1...1..1.1.1.1.1...1.

    ........................................111111111111111111111111
    .........................111111111111111111111111...............
    ...............1111111111111111.................................
    .........1111111111.......................................111111
    ......11111..........................111111............11111....
    ....111................1111........111....111........111........
    ..111........1111....111..111....111........111....111........11
    .11.....11..11..11..11......11..11.....11.....11..11.....11..11.

Figure 14.10-A: The length-8 binary strings with no substring 1x1 (where x is either 0 or 1): lex order (top) and minimal-change order (bottom). Dots denote zeros.

A Gray code for binary strings with no substring 1x1 is shown in figure 14.10-A. The recursive structure for the list V(n) of the n-bit words is

\[
V(n) \;=\;
\begin{array}{l}
[\,1\,0\,0 \;.\; V(n-3)\,] \\
[\,1\,1\,0\,0 \;.\; V^{R}(n-4)\,] \\
[\,0 \;.\; V(n-1)\,]
\end{array}
\tag{14.10-1}
\]

The implied algorithm can be implemented as [FXT: comb/no1x1-gray-demo.cc]:

    ulong n;    // number of bits in words
    ulong *rv;  // bits of the word

    void no1x1_rec(ulong d, bool z)
    {
        if ( d>=n )  { if ( d<=n+2 )  visit(); }
        else
        {
            if ( z )
            {
                rv[d]=1;  rv[d+1]=0;  rv[d+2]=0;               no1x1_rec(d+3, z);
                rv[d]=1;  rv[d+1]=1;  rv[d+2]=0;  rv[d+3]=0;   no1x1_rec(d+4, !z);
                rv[d]=0;                                       no1x1_rec(d+1, z);
            }
            else
            {
                rv[d]=0;                                       no1x1_rec(d+1, z);
                rv[d]=1;  rv[d+1]=1;  rv[d+2]=0;  rv[d+3]=0;   no1x1_rec(d+4, !z);
                rv[d]=1;  rv[d+1]=0;  rv[d+2]=0;               no1x1_rec(d+3, z);
            }
        }
    }

The sequence of the numbers v(n) of length-n strings starts as

    n:        0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17
    v(n):     1    2    4    6    9   15   25   40   64  104  169  273  441  714 1156 1870 3025 4895

This is entry A006498 in [312]. The recurrence relation is

\[
v(n) \;=\; v(n-1) + v(n-3) + v(n-4)
\tag{14.10-2}
\]

The generating function is

\[
\sum_{n=0}^{\infty} v(n)\,x^n \;=\; \frac{1 + x + 2\,x^2 + x^3}{1 - x - x^3 - x^4}
\tag{14.10-3}
\]

14.10.2 No substrings 1xy1

    ..........................................................................................
    .....................................................................111111111111111111111
    .........................................111111111111111111111111111111111111111111111111.
    .........................1111111111111111111111111111............................111111111
    .................11111111111111..................11111111.................................
    ............11111111.........1111.........................................................
    ........111111.....11............................................11111111.................
    ....111111...11......................11111111................111111....111111........11111
    ..1111...11............1111........1111....1111....1111....1111...11..11...1111....1111...
    .11..11.........11....11..11..11..11..11..11..11..11..11..11..11..........11..11..11..11..

    ........................111111111111111111111111111111111111111111111111111111111111111111
    11111111111111111111111111111111111111111111111111111.....................................
    .........................................111111111111111111111111.........................
    1111111...................................................................................
    ..................................................................................11111111
    ...................1111111111................................................11111111.....
    ...............111111......111111................11111111................111111.....11....
    111........111111...11....11...111111........111111....111111........111111...11..........
    .1111....1111...11............11...1111....1111...11..11...1111....1111...11............11
    11..11..11..11.........11.........11..11..11..11..........11..11..11..11.........11....11.

Figure 14.10-B: The length-10 binary strings with no substring 1xy1 (where x and y are either 0 or 1) in minimal-change order. Dots denote zeros.

Figure 14.10-B shows a Gray code for binary words with no substring 1xy1. The recursion for the list of n-bit words Y(n) is

\[
Y(n) \;=\;
\begin{array}{l}
[\,1\,0\,0\,0 \;.\; Y(n-4)\,] \\
[\,1\,0\,1\,0\,0\,0 \;.\; Y^{R}(n-6)\,] \\
[\,1\,1\,1\,0\,0\,0 \;.\; Y(n-6)\,] \\
[\,1\,1\,0\,0\,0 \;.\; Y^{R}(n-5)\,] \\
[\,0 \;.\; Y(n-1)\,]
\end{array}
\tag{14.10-4}
\]

An implementation is given in [FXT: comb/no1xy1-gray-demo.cc]:

    void Y_rec(long p1, long p2, bool z)
    {
        if ( p1>p2 )  { visit();  return; }

    #define S1(a) rv[p1+0]=a
    #define S2(a,b) S1(a); rv[p1+1]=b;
    #define S3(a,b,c) S2(a,b); rv[p1+2]=c;
    #define S4(a,b,c,d) S3(a,b,c); rv[p1+3]=d;
    #define S5(a,b,c,d,e) S4(a,b,c,d); rv[p1+4]=e;
    #define S6(a,b,c,d,e,f) S5(a,b,c,d,e); rv[p1+5]=f;

        long d = p2 - p1;
        if ( z )
        {
            if ( d >= 0 )  { S4(1,0,0,0);      Y_rec(p1+4, p2, z); }   // 1 0 0 0
            if ( d >= 2 )  { S6(1,0,1,0,0,0);  Y_rec(p1+6, p2, !z); }  // 1 0 1 0 0 0
            if ( d >= 2 )  { S6(1,1,1,0,0,0);  Y_rec(p1+6, p2, z); }   // 1 1 1 0 0 0
            if ( d >= 1 )  { S5(1,1,0,0,0);    Y_rec(p1+5, p2, !z); }  // 1 1 0 0 0
            if ( d >= 0 )  { S1(0);            Y_rec(p1+1, p2, z); }   // 0
        }
        else
        {
            if ( d >= 0 )  { S1(0);            Y_rec(p1+1, p2, z); }   // 0
            if ( d >= 1 )  { S5(1,1,0,0,0);    Y_rec(p1+5, p2, !z); }  // 1 1 0 0 0
            if ( d >= 2 )  { S6(1,1,1,0,0,0);  Y_rec(p1+6, p2, z); }   // 1 1 1 0 0 0
            if ( d >= 2 )  { S6(1,0,1,0,0,0);  Y_rec(p1+6, p2, !z); }  // 1 0 1 0 0 0
            if ( d >= 0 )  { S4(1,0,0,0);      Y_rec(p1+4, p2, z); }   // 1 0 0 0
        }
    }

Note the conditions if ( d >= ? ) that make sure that no string appears repeated. The initial call is Y_rec(0, n-1, 0). The sequence of the numbers y(n) of length-n strings starts as

    n:        0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17
    y(n):     1    2    4    8   12   17   25   41   69  114  180  280  440  705 1137 1825 2905 4610

The generating function is

\[
\sum_{n=0}^{\infty} y(n)\,x^n \;=\; \frac{1 + x + 2\,x^2 + 4\,x^3 + 3\,x^4 + 2\,x^5}{1 - x - x^4 - x^5 - 2\,x^6}
\tag{14.10-5}
\]

14.10.3 Neither substrings 1x1 nor substrings 1xy1

    ............................................................1111111111111111111111111111
    .........................................111111111111111111111111111111.................
    ...........................1111111111111111111111.......................................
    .................1111111111111111.......................................................
    ...........1111111111.............................................................111111
    ........11111............................................111111................11111....
    ......111..............................1111............111....111............111........
    ....111..................1111........111..111........111........111........111..........
    ..111..........1111....111..111....111......111....111............111....111..........11
    .11.......11..11..11..11......11..11..........11..11.......11.......11..11.......11..11.

Figure 14.10-C: A Gray code for the length-10 binary strings with no substring 1x1 or 1xy1.

A recursion for a Gray code of the n-bit binary words Z(n) with no substrings 1x1 or 1xy1 (shown in figure 14.10-C) is

\[
Z(n) \;=\;
\begin{array}{l}
[\,1\,0\,0\,0 \;.\; Z(n-4)\,] \\
[\,1\,1\,0\,0\,0 \;.\; Z^{R}(n-5)\,] \\
[\,0 \;.\; Z(n-1)\,]
\end{array}
\tag{14.10-6}
\]

The sequence of the numbers z(n) of length-n strings starts as

    n:        0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17
    z(n):     1    2    4    6    8   11   17   27   41   60   88  132  200  301  449  669 1001 1502

The sequence is (apart from three leading ones) entry A079972 in [312] where two combinatorial interpretations are given:

    Number of permutations satisfying -k<=p(i)-i<=r and p(i)-i not in I, i=1..n, with k=1, r=4, I={1,2}.
    Number of compositions (ordered partitions) of n into elements of the set {1,4,5}.

The generating function is

\[
\sum_{n=0}^{\infty} z(n)\,x^n \;=\; \frac{1 + x + 2\,x^2 + 2\,x^3 + x^4}{1 - x - x^4 - x^5}
\tag{14.10-7}
\]

Chapter 15

Parentheses strings

We give algorithms to list all well-formed strings of n pairs of parentheses. In the spirit of [211] we use the term paren string for a well-formed string of parentheses. A generalization, the k-ary Dyck words, is described at the end of the section.

If the problem at hand appears to be somewhat esoteric, then see [319, vol.2, exercise 6.19, p.219] for many kinds of objects isomorphic to our paren strings.
Indeed, as of May 2010, 180 kinds of combinatorial objects counted by the Catalan numbers (which may be called Catalan objects) have been identified, see [321] and also [320].

15.1 Co-lexicographic order

     1:  ((((()))))  11111.....        22:  (())(()())  11..11.1..
     2:  (((()())))  1111.1....        23:  ()()(()())  1.1.11.1..
     3:  ((()(())))  111.11....        24:  ((()))(())  111...11..
     4:  (()((())))  11.111....        25:  (()())(())  11.1..11..
     5:  ()(((())))  1.1111....        26:  ()(())(())  1.11..11..
     6:  (((())()))  1111..1...        27:  (())()(())  11..1.11..
     7:  ((()()()))  111.1.1...        28:  ()()()(())  1.1.1.11..
     8:  (()(()()))  11.11.1...        29:  (((())))()  1111....1.
     9:  ()((()()))  1.111.1...        30:  ((()()))()  111.1...1.
    10:  ((())(()))  111..11...        31:  (()(()))()  11.11...1.
    11:  (()()(()))  11.1.11...        32:  ()((()))()  1.111...1.
    12:  ()(()(()))  1.11.11...        33:  ((())())()  111..1..1.
    13:  (())((()))  11..111...        34:  (()()())()  11.1.1..1.
    14:  ()()((()))  1.1.111...        35:  ()(()())()  1.11.1..1.
    15:  (((()))())  1111...1..        36:  (())(())()  11..11..1.
    16:  ((()())())  111.1..1..        37:  ()()(())()  1.1.11..1.
    17:  (()(())())  11.11..1..        38:  ((()))()()  111...1.1.
    18:  ()((())())  1.111..1..        39:  (()())()()  11.1..1.1.
    19:  ((())()())  111..1.1..        40:  ()(())()()  1.11..1.1.
    20:  (()()()())  11.1.1.1..        41:  (())()()()  11..1.1.1.
    21:  ()(()()())  1.11.1.1..        42:  ()()()()()  1.1.1.1.1.

Figure 15.1-A: All (42) valid strings of 5 pairs of parentheses in co-lexicographic order.

An iterative scheme to generate all valid ways to group parentheses can be derived from a modified version of the combinations in co-lexicographic order (see section 6.2.2 on page 178). For n = 5 pairs the possible combinations are shown in figure 15.1-A. This is the output of [FXT: comb/paren-demo.cc].

Consider the sequences to the right of the paren strings as binary words (these are often called (binary) Dyck words). If the leftmost block has more than a single one, then its rightmost one is moved one position to the right.
Otherwise (the leftmost block consists of a single one and) the ones of the longest run of the repeated pattern ‘1.’ at the left are gathered at the left end and the rightmost one in the next block of ones (which contains at least two ones) is moved by one position to the right and the rest of the block is gathered at the left end (see the transitions from #14 to #15 or #37 to #38). The generator is [FXT: class paren in comb/paren.h]: 324 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 Chapter 15: Parentheses strings class paren { public: ulong k_; // Number of paren pairs ulong n_; // ==2*k ulong *x_; // Positions where an opening paren occurs char *str_; // String representation, e.g. "((())())()" public: paren(ulong k) { k_ = (k>1 ? k : 2); // not zero (empty) or one (trivial: "()") n_ = 2 * k_; x_ = new ulong[k_ + 1]; x_[k_] = 999; // sentinel str_ = new char[n_ + 1]; str_[n_] = 0; first(); } ~paren() { delete [] x_; delete [] str_; } void first() { for (ulong i=0; ij; --i,--j) x_[j] = i; for ( ; i; --i) x_[i] = 2*i; x_[0] = 0; } 15.2: Gray code via restricted growth strings 43 44 45 46 47 48 49 325 return 1; } const ulong * data() [--snip--] const { return x_; } The strings are set up on demand only: 1 2 3 4 5 6 7 const char * string() // generate on demand { for (ulong j=0; j j. The predecessor is computed by decrementing the highest digit aj 6= 0 and setting ai = ai−1 + 1 for all i > j. The RGSs for a given n can be generated as follows [FXT: class catalan in comb/catalan.h]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 class catalan // Catalan restricted growth strings (RGS) // By default in near-perfect minimal-change order, i.e. 
// exactly two symbols in paren string change with each step { public: int *as_; // digits of the RGS: as_[k] <= as[k-1] + 1 int *d_; // direction with recursion (+1 or -1) ulong n_; // Number of digits (paren pairs) char *str_; // paren string bool xdr_; // whether to change direction in recursion (==> minimal-change order) int dr0_; // dr0: starting direction in each recursive step: // dr0=+1 ==> start with as[]=[0,0,0,...,0] == "()()()...()" // dr0=-1 ==> start with as[]=[0,1,2,...,n-1] == "((( ... )))" 326 Chapter 15: Parentheses strings 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: [ 0 1 2 3 4 ] [ 0 1 2 3 3 ] [ 0 1 2 3 2 ] [ 0 1 2 3 1 ] [ 0 1 2 3 0 ] [ 0 1 2 2 0 ] [ 0 1 2 2 1 ] [ 0 1 2 2 2 ] [ 0 1 2 2 3 ] [ 0 1 2 1 2 ] [ 0 1 2 1 1 ] [ 0 1 2 1 0 ] [ 0 1 2 0 0 ] [ 0 1 2 0 1 ] [ 0 1 1 0 1 ] [ 0 1 1 0 0 ] [ 0 1 1 1 0 ] [ 0 1 1 1 1 ] [ 0 1 1 1 2 ] [ 0 1 1 2 3 ] [ 0 1 1 2 2 ] [ 0 1 1 2 1 ] [ 0 1 1 2 0 ] [ 0 1 0 1 0 ] [ 0 1 0 1 1 ] [ 0 1 0 1 2 ] [ 0 1 0 0 1 ] [ 0 1 0 0 0 ] [ 0 0 0 0 0 ] [ 0 0 0 0 1 ] [ 0 0 0 1 2 ] [ 0 0 0 1 1 ] [ 0 0 0 1 0 ] [ 0 0 1 2 0 ] [ 0 0 1 2 1 ] [ 0 0 1 2 2 ] [ 0 0 1 2 3 ] [ 0 0 1 1 2 ] [ 0 0 1 1 1 ] [ 0 0 1 1 0 ] [ 0 0 1 0 0 ] [ 0 0 1 0 1 ] [ - - - - - ] [ - - - - - ] [ - - - - - ] [ - - - - - ] [ - - - - - ] [ - - - - + ] [ - - - - + ] [ - - - - + ] [ - - - - + ] [ - - - - - ] [ - - - - - ] [ - - - - - ] [ - - - - + ] [ - - - - + ] [ - - - + - ] [ - - - + - ] [ - - - + + ] [ - - - + + ] [ - - - + + ] [ - - - + - ] [ - - - + - ] [ - - - + - ] [ - - - + - ] [ - - - - + ] [ - - - - + ] [ - - - - + ] [ - - - - - ] [ - - - - - ] [ - - + + + ] [ - - + + + ] [ - - + + - ] [ - - + + - ] [ - - + + - ] [ - - + - + ] [ - - + - + ] [ - - + - + ] [ - - + - + ] [ - - + - - ] [ - - + - - ] [ - - + - - ] [ - - + - + ] [ - - + - + ] ((((())))) (((()()))) (((())())) (((()))()) (((())))() ((()()))() ((()())()) ((()()())) ((()(()))) ((())(())) 
((())()()) ((())())() ((()))()() ((()))(()) (()())(()) (()())()() (()()())() (()()()()) (()()(())) (()((()))) (()(()())) (()(())()) (()(()))() (())(())() (())(()()) (())((())) (())()(()) (())()()() ()()()()() ()()()(()) ()()((())) ()()(()()) ()()(())() ()((()))() ()((())()) ()((()())) ()(((()))) ()(()(())) ()(()()()) ()(()())() ()(())()() ()(())(()) 11111..... 1111.1.... 1111..1... 1111...1.. 1111....1. 111.1...1. 111.1..1.. 111.1.1... 111.11.... 111..11... 111..1.1.. 111..1..1. 111...1.1. 111...11.. 11.1..11.. 11.1..1.1. 11.1.1..1. 11.1.1.1.. 11.1.11... 11.111.... 11.11.1... 11.11..1.. 11.11...1. 11..11..1. 11..11.1.. 11..111... 11..1.11.. 11..1.1.1. 1.1.1.1.1. 1.1.1.11.. 1.1.111... 1.1.11.1.. 1.1.11..1. 1.111...1. 1.111..1.. 1.111.1... 1.1111.... 1.11.11... 1.11.1.1.. 1.11.1..1. 1.11..1.1. 1.11..11.. ((((XA)))) (((()XA))) (((())XA)) (((()))XA) (((XA)))() ((()())AX) ((()()AX)) ((()(AX))) ((()X(A))) ((())(XA)) ((())()XA) ((())XA)() ((()))(AX) ((XA))(()) (()())(XA) (()()AX)() (()()()AX) (()()(AX)) (()(A(X))) (()((XA))) (()(()XA)) (()(())XA) (()X(A))() (())(()AX) (())((AX)) (())(X(A)) (())()(XA) (XA)()()() ()()()(AX) ()()(A(X)) ()()((XA)) ()()(()XA) ()(A(X))() ()((())AX) ()((()AX)) ()(((AX))) ()((X(A))) ()(()(XA)) ()(()()XA) ()(()XA)() ()(())(AX) 2 2 2 2 2 2 2 Figure 15.2-B: Minimal-change order for the paren strings of 5 pairs. From left to right: restricted growth strings, arrays of directions, paren strings, delta sets, and difference strings. If the changes are not adjacent, then the distance of changed positions is given at the right. The order corresponds to dr0=-1. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 public: catalan(ulong n, bool xdr=true, int dr0=+1) { n_ = n; as_ = new int[n_]; d_ = new int[n_]; str_ = new char[2*n_+1]; str_[2*n_] = 0; init(xdr, dr0); } ~catalan() { delete [] as_; delete [] d_; delete [] str_; } void init(bool xdr, int dr0) { dr0_ = ( (dr0>=0) ? 
+1 : -1 ); xdr_ = xdr; ulong n = n_; if ( dr0_>0 ) else for (ulong k=0; k0) ? (as>as_[k-1]+1) : (as<0) ); if ( ovq ) // have to recurse { ulong ns1 = next_rec(k-1); if ( 0==ns1 ) return false; d = ( xdr_ ? -d : dr0_ ); d_[k] = d; as = ( (d>0) ? 0 : as_[k-1]+1 ); } as_[k] = as; return true; } The program [FXT: comb/catalan-demo.cc] demonstrates the usage: ulong n = 4; bool xdr = true; int dr0 = -1; catalan C(n, xdr, dr0); do { /* visit string */ } while ( C.next() ); About 69 million strings per second are generated. Figure 15.2-B shows the minimal-change order for n = 5 and dr0=-1, and figure 15.2-C for dr0=+1. More minimal-change orders 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 0 0 0 0 0 0 0 0 0 1 0 0 0 1 2 0 0 0 1 1 0 0 0 1 0 0 0 1 2 3 0 0 1 2 2 0 0 1 2 1 0 0 1 2 0 0 0 1 1 0 0 0 1 1 1 0 0 1 1 2 0 0 1 0 1 0 0 1 0 0 0 1 2 3 0 0 1 2 3 1 0 1 2 3 2 0 1 2 3 3 0 1 2 3 4 0 1 2 2 3 0 1 2 2 2 1.1.1.1.1. 1.1.1.11.. 1.1.111... 1.1.11.1.. 1.1.11..1. 1.1111.... 1.111.1... 1.111..1.. 1.111...1. 1.11.1..1. 1.11.1.1.. 1.11.11... 1.11..11.. 1.11..1.1. 1111....1. 1111...1.. 1111..1... 1111.1.... 11111..... 111.11.... 111.1.1... 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: 0 1 2 2 1 0 1 2 2 0 0 1 2 1 0 0 1 2 1 1 0 1 2 1 2 0 1 2 0 1 0 1 2 0 0 0 1 1 0 0 0 1 1 0 1 0 1 1 1 2 0 1 1 1 1 0 1 1 1 0 0 1 1 2 0 0 1 1 2 1 0 1 1 2 2 0 1 1 2 3 0 1 0 1 0 0 1 0 1 1 0 1 0 1 2 0 1 0 0 1 0 1 0 0 0 111.1..1.. 111.1...1. 111..1..1. 111..1.1.. 111..11... 111...11.. 111...1.1. 11.1..1.1. 11.1..11.. 11.1.11... 11.1.1.1.. 11.1.1..1. 11.11...1. 11.11..1.. 11.11.1... 11.111.... 11..11..1. 11..11.1.. 11..111... 11..1.11.. 11..1.1.1. Figure 15.2-D: Strings of 5 pairs of parentheses in a Gray code order. 
The Gray code order shown in figure 15.2-D can be generated via a simple recursion:

    ulong n;    // Number of paren pairs
    ulong *rv;  // restricted growth strings

    void next_rec(ulong d, bool z)
    {
        if ( d==n )  visit();
        else
        {
            const long rv1 = rv[d-1];  // left neighbor
            if ( 0==z )
            {
                for (long x=0; x<=rv1+1; ++x)  // forward
                {
                    rv[d] = x;
                    next_rec(d+1, (x&1));
                }
            }
            else
            {
                for (long x=rv1+1; x>=0; --x)  // backward
                {
                    rv[d] = x;
                    next_rec(d+1, !(x&1));
                }
            }
        }
    }

The initial call is next_rec(0, 0). About 81 million strings per second are generated [FXT: comb/paren-gray-rec-demo.cc].

     1:  ()()()()()  1.1.1.1.1.        22:  (()()(()))  11.1.11...
     2:  ()()()(())  1.1.1.11..        23:  (()()())()  11.1.1..1.
     3:  ()()(()())  1.1.11.1..        24:  ((())())()  111..1..1.
     4:  ()()((()))  1.1.111...        25:  ((())(()))  111..11...
     5:  ()()(())()  1.1.11..1.        26:  ((())()())  111..1.1..
     6:  ()(()())()  1.11.1..1.        27:  ((()())())  111.1..1..
     7:  ()(()(()))  1.11.11...        28:  ((()()()))  111.1.1...
     8:  ()(()()())  1.11.1.1..        29:  ((()(())))  111.11....
     9:  ()((())())  1.111..1..        30:  ((()()))()  111.1...1.
    10:  ()((()()))  1.111.1...        31:  (((())))()  1111....1.
    11:  ()(((())))  1.1111....        32:  ((((()))))  11111.....
    12:  ()((()))()  1.111...1.        33:  (((()())))  1111.1....
    13:  ()(())()()  1.11..1.1.        34:  (((())()))  1111..1...
    14:  ()(())(())  1.11..11..        35:  (((()))())  1111...1..
    15:  (()())(())  11.1..11..        36:  ((()))(())  111...11..
    16:  (()())()()  11.1..1.1.        37:  ((()))()()  111...1.1.
    17:  (()(()))()  11.11...1.        38:  (())()()()  11..1.1.1.
    18:  (()((())))  11.111....        39:  (())()(())  11..1.11..
    19:  (()(()()))  11.11.1...        40:  (())(()())  11..11.1..
    20:  (()(())())  11.11..1..        41:  (())((()))  11..111...
    21:  (()()()())  11.1.1.1..        42:  (())(())()  11..11..1.

Figure 15.2-E: Strings of 5 pairs of parentheses in Gray code order as generated by a loopless algorithm.
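The recursion next_rec() given above for figure 15.2-D reads the left neighbor rv[d-1] even for d = 0. Since the first digit of every RGS is zero, the loop bounds only work out if that element behaves like −1. The sketch below makes this explicit; it is an illustration under assumptions, not the FXT demo: the struct name `paren_gray_rec`, the sentinel cell `a[0] = -1`, the storage of digit rv[d] in `a[d+1]`, and the helper `dyck_word()` are hypothetical. The helper uses the position rule $c_j = k\,j - a_j$ (stated in section 15.5) with k = 2.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct paren_gray_rec   // hypothetical wrapper around next_rec()
{
    long n;                                // number of paren pairs
    std::vector<long> a;                   // a[0] is the sentinel, a[d+1] holds rv[d]
    std::vector< std::vector<long> > out;  // RGS in generation order

    explicit paren_gray_rec(long pairs) : n(pairs), a(pairs+1, 0)
    {
        a[0] = -1;       // assumed sentinel: forces rv[0] == 0
        rec(0, false);   // initial call next_rec(0, 0)
    }

    void rec(long d, bool z)
    {
        if ( d==n )
        {
            out.push_back( std::vector<long>(a.begin()+1, a.end()) );
            return;
        }
        const long rv1 = a[d];  // left neighbor rv[d-1]
        if ( !z )  for (long x=0; x<=rv1+1; ++x)  { a[d+1] = x;  rec(d+1,  (x&1)); }  // forward
        else       for (long x=rv1+1; x>=0; --x)  { a[d+1] = x;  rec(d+1, !(x&1)); }  // backward
    }
};

// Binary (Dyck) word of an RGS: the j-th one is at position 2*j - a[j].
std::vector<int> dyck_word(const std::vector<long> &rgs)
{
    std::vector<int> w(2*rgs.size(), 0);
    for (std::size_t j=0; j<rgs.size(); ++j)
    {
        assert( rgs[j]>=0 && (std::size_t)rgs[j]<=j );
        w[ 2*j - (std::size_t)rgs[j] ] = 1;
    }
    return w;
}
```

With `paren_gray_rec G(5);` the list `G.out` holds 42 strings starting with [0 0 0 0 0], and `dyck_word(G.out[1])` gives the word 1.1.1.11.. of the second row of figure 15.2-D. Storing all strings is done only for inspection; a visit callback would avoid the memory use.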
A loopless algorithm (that does not use RGS) given in [329] is implemented in [FXT: class paren_gray in comb/paren-gray.h]. The generated order for five paren pairs is shown in figure 15.2-E. About 80 million strings per second are generated [FXT: comb/paren-gray-demo.cc]. Still more algorithms for the parentheses strings in minimal-change order are given in [90], [337], and [363].

     0:  ....1111  ==  (((())))   ^=  ...11...
     1:  ...1.111  ==  ((()()))   ^=  ....11..
     2:  ...11.11  ==  (()(()))   ^=  .....11.
     3:  ...111.1  ==  ()((()))   ^=  ..11....
     4:  ..1.11.1  ==  ()(()())   ^=  .....11.
     5:  ..1.1.11  ==  (()()())   ^=  ....11..
     6:  ..1..111  ==  ((())())   ^=  .11.....
     7:  .1...111  ==  ((()))()   ^=  ....11..
     8:  .1..1.11  ==  (()())()   ^=  .....11.
     9:  .1..11.1  ==  ()(())()   ^=  ...11...
    10:  .1.1.1.1  ==  ()()()()   ^=  .11.....
    11:  ..11.1.1  ==  ()()(())   ^=  .....11.
    12:  ..11..11  ==  (())(())   ^=  .11.....
    13:  .1.1..11  ==  (())()()

Figure 15.2-F: A strong minimal-change order for the paren strings of 4 pairs.

For even values of n it is possible to generate paren strings in strong minimal-change order where changes occur only in adjacent positions. Figure 15.2-F shows an example for four pairs of parens. The listing was generated with [FXT: graph/graph-parengray-demo.cc] that uses directed graphs and the search algorithms described in chapter 20 on page 391.

     1:  ((((()))))  11111.....        22:  ((())()())  111..1.1..
     2:  ()(((())))  1.1111....        23:  ()(())(())  1.11..11..
     3:  (()((())))  11.111....        24:  (()())(())  11.1..11..
     4:  ((()(())))  111.11....        25:  ()()()(())  1.1.1.11..
     5:  (((()())))  1111.1....        26:  (())()(())  11..1.11..
     6:  ()((()()))  1.111.1...        27:  ((()))(())  111...11..
     7:  (()(()()))  11.11.1...        28:  (((()))())  1111...1..
     8:  ((()()()))  111.1.1...        29:  ()((()))()  1.111...1.
     9:  ()(()(()))  1.11.11...        30:  (()(()))()  11.11...1.
    10:  (()()(()))  11.1.11...        31:  ((()()))()  111.1...1.
    11:  ()()((()))  1.1.111...        32:  ()(()())()  1.11.1..1.
    12:  (())((()))  11..111...        33:  (()()())()  11.1.1..1.
    13:  ((())(()))  111..11...        34:  ()()(())()  1.1.11..1.
    14:  (((())()))  1111..1...        35:  (())(())()  11..11..1.
    15:  ()((())())  1.111..1..        36:  ((())())()  111..1..1.
    16:  (()(())())  11.11..1..        37:  ()(())()()  1.11..1.1.
    17:  ((()())())  111.1..1..        38:  (()())()()  11.1..1.1.
    18:  ()(()()())  1.11.1.1..        39:  ()()()()()  1.1.1.1.1.
    19:  (()()()())  11.1.1.1..        40:  (())()()()  11..1.1.1.
    20:  ()()(()())  1.1.11.1..        41:  ((()))()()  111...1.1.
    21:  (())(()())  11..11.1..        42:  (((())))()  1111....1.

Figure 15.3-A: All strings of 5 pairs of parentheses generated via prefix shifts.

15.3 Order by prefix shifts (cool-lex)

The binary words corresponding to paren strings can be generated in an order where each word differs from its successor by a cyclic shift of a prefix (ignoring the first bit which is always one). Moreover, each transition changes either two or four bits, see figure 15.3-A.

The (loopless) algorithm described in [292] can generate slightly more general objects: strings of t ones and s zeros where the number of zeros in any prefix does not exceed the number of ones. Paren strings correspond to t = s.
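The class of words just described can be made concrete by a direct count. The following sketch (the function names `count_rec()` and `count_prefix_words()` are hypothetical, and the plain recursion is chosen for clarity, not speed) counts the binary words with t ones and s zeros in which no prefix has more zeros than ones. For t = s the result is the number of paren strings of t pairs.

```cpp
#include <cassert>

typedef unsigned long ulong;

// Count the ways to complete a word when `ones` ones and `zeros` zeros are
// still to be placed and the prefix so far has `excess` more ones than zeros.
// A zero may be appended only if excess > 0, so no prefix has more zeros than ones.
ulong count_rec(ulong ones, ulong zeros, ulong excess)
{
    if ( ones==0 && zeros==0 )  return 1;   // one complete word
    ulong ct = 0;
    if ( ones )             ct += count_rec(ones-1, zeros, excess+1);
    if ( zeros && excess )  ct += count_rec(ones, zeros-1, excess-1);
    return ct;
}

// Number of words with t ones and s zeros satisfying the prefix condition.
ulong count_prefix_words(ulong t, ulong s)
{
    assert( t>=s );
    return count_rec(t, s, 0);
}
```

For instance, `count_prefix_words(5, 5)` gives 42, the number of rows in figure 15.3-A, and `count_prefix_words(2, 1)` gives 2 (the words 110 and 101). The running time grows with the count itself, so this is suitable for small parameters only.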
The generator is implemented as follows [FXT: comb/paren-pref.h]:

    class paren_pref
    {
    public:
        const ulong t_, s_;  // t: number of ones, s: number of zeros
        const ulong nq_;     // aux
        ulong x_, y_;        // aux
        ulong *b_;           // array of t ones and s zeros

    public:
        paren_pref(ulong t, ulong s)
        // Must have: t >= s > 0
            : t_(t), s_(s), nq_(s+t-(s==t))
        {
            b_ = new ulong[s_+t_+1];  // element [0] unused
            first();
        }

        ~paren_pref()  { delete [] b_; }

        const ulong * data()  const  { return b_+1; }

        void first()
        {
            for (ulong j=0; j<=t_; ++j)  b_[j] = 1;
            for (ulong j=t_+1; j<=s_+t_; ++j)  b_[j] = 0;
            x_ = y_ = t_;
        }

The method for updating is

        bool next()
        {
            if ( x_ >= nq_ )  return false;
            b_[x_] = 0;
            b_[y_] = 1;
            ++x_;
            ++y_;
            if ( b_[x_] == 0 )
            {
                if ( x_ == 2*y_ - 2 )  ++x_;
                else
                {
                    b_[x_] = 1;
                    b_[2] = 0;
                    x_ = 3;
                    y_ = 2;
                }
            }
            return true;
        }

Note that the array b[] is one-based, as in the cited paper. A zero-based version is used if the line

    #define PAREN_PREF_BASE1  // default on (faster)

near the top of the file is commented out. The rate of generation (with t = s = 18) is impressive: about 268 M/s when using a pointer and about 281 M/s when using an array [FXT: comb/paren-pref-demo.cc].

15.4 Catalan numbers

The number of valid combinations of n parentheses pairs is

\[
C_n \;=\; \frac{\binom{2n}{n}}{n+1}
    \;=\; \frac{\binom{2n+1}{n}}{2n+1}
    \;=\; \frac{\binom{2n}{n-1}}{n}
    \;=\; \binom{2n}{n} - \binom{2n}{n-1}
\tag{15.4-1}
\]

as nicely explained in [166, p.343-346].
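The four expressions in relation 15.4-1 can be compared numerically. The sketch below is an illustration only; the names `binomial()` and `catalan_a()` to `catalan_d()` are hypothetical, and 64-bit arithmetic limits the usable range (the code is meant for n up to about 30).

```cpp
#include <cassert>

typedef unsigned long long u64;

// Binomial coefficient binomial(n,k). After step j the value is
// binomial(n-k+j, j), so each division is exact. Intermediate products
// must fit into 64 bits, which restricts the arguments.
u64 binomial(u64 n, u64 k)
{
    if ( k>n )  return 0;
    u64 b = 1;
    for (u64 j=1; j<=k; ++j)  b = b * (n-k+j) / j;
    return b;
}

// The four expressions of relation 15.4-1, in the order given there.
u64 catalan_a(u64 n)  { return binomial(2*n, n) / (n+1); }
u64 catalan_b(u64 n)  { return binomial(2*n+1, n) / (2*n+1); }
u64 catalan_c(u64 n)  { assert( n>=1 );  return binomial(2*n, n-1) / n; }
u64 catalan_d(u64 n)  { assert( n>=1 );  return binomial(2*n, n) - binomial(2*n, n-1); }
```

For example, all four functions return 42 for n = 5, the number of paren strings of 5 pairs listed in the figures of this chapter.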
These are the Catalan numbers, sequence A000108 in [312]:

     n:  C_n            n:  C_n               n:  C_n
     1:  1             11:  58786            21:  24466267020
     2:  2             12:  208012           22:  91482563640
     3:  5             13:  742900           23:  343059613650
     4:  14            14:  2674440          24:  1289904147324
     5:  42            15:  9694845          25:  4861946401452
     6:  132           16:  35357670         26:  18367353072152
     7:  429           17:  129644790        27:  69533550916004
     8:  1430          18:  477638700        28:  263747951750360
     9:  4862          19:  1767263190       29:  1002242216651368
    10:  16796         20:  6564120420       30:  3814986502092304

The Catalan numbers are generated most easily with the relation

\[
C_{n+1} \;=\; \frac{2\,(2n+1)}{n+2}\; C_n
\tag{15.4-2}
\]

The generating function is

\[
C(x) \;=\; \frac{1 - \sqrt{1 - 4x}}{2x} \;=\; \sum_{n=0}^{\infty} C_n\,x^n
\;=\; 1 + x + 2\,x^2 + 5\,x^3 + 14\,x^4 + 42\,x^5 + \ldots
\tag{15.4-3}
\]

The function C(x) satisfies the equation $[x\,C(x)] = x + [x\,C(x)]^2$ which is equivalent to the following convolution property for the Catalan numbers:

\[
C_n \;=\; \sum_{k=0}^{n-1} C_k\,C_{n-1-k}
\tag{15.4-4}
\]

The quadratic equation has a second solution $(1 + \sqrt{1 - 4x})/(2x) = x^{-1} - 1 - x - 2\,x^2 - 5\,x^3 - 14\,x^4 - \ldots$ which we ignore here.

           RGS          Dyck word       positions
     1:  [ 0 0 0 0 ]   1..1..1..1..   [ 0 3 6 9 ]
     2:  [ 0 0 0 1 ]   1..1..1.1...   [ 0 3 6 8 ]
     3:  [ 0 0 0 2 ]   1..1..11....   [ 0 3 6 7 ]
     4:  [ 0 0 1 0 ]   1..1.1...1..   [ 0 3 5 9 ]
     5:  [ 0 0 1 1 ]   1..1.1..1...   [ 0 3 5 8 ]
     6:  [ 0 0 1 2 ]   1..1.1.1....   [ 0 3 5 7 ]
     7:  [ 0 0 1 3 ]   1..1.11.....   [ 0 3 5 6 ]
     8:  [ 0 0 2 0 ]   1..11....1..   [ 0 3 4 9 ]
     9:  [ 0 0 2 1 ]   1..11...1...   [ 0 3 4 8 ]
    10:  [ 0 0 2 2 ]   1..11..1....   [ 0 3 4 7 ]
    11:  [ 0 0 2 3 ]   1..11.1.....   [ 0 3 4 6 ]
    12:  [ 0 0 2 4 ]   1..111......   [ 0 3 4 5 ]
    13:  [ 0 1 0 0 ]   1.1...1..1..   [ 0 2 6 9 ]
    14:  [ 0 1 0 1 ]   1.1...1.1...   [ 0 2 6 8 ]
    15:  [ 0 1 0 2 ]   1.1...11....   [ 0 2 6 7 ]
    16:  [ 0 1 1 0 ]   1.1..1...1..   [ 0 2 5 9 ]
    17:  [ 0 1 1 1 ]   1.1..1..1...   [ 0 2 5 8 ]
    18:  [ 0 1 1 2 ]   1.1..1.1....   [ 0 2 5 7 ]
    19:  [ 0 1 1 3 ]   1.1..11.....   [ 0 2 5 6 ]
    20:  [ 0 1 2 0 ]   1.1.1....1..   [ 0 2 4 9 ]
    21:  [ 0 1 2 1 ]   1.1.1...1...   [ 0 2 4 8 ]
    22:  [ 0 1 2 2 ]   1.1.1..1....   [ 0 2 4 7 ]
    23:  [ 0 1 2 3 ]   1.1.1.1.....   [ 0 2 4 6 ]
    24:  [ 0 1 2 4 ]   1.1.11......   [ 0 2 4 5 ]
    25:  [ 0 1 3 0 ]   1.11.....1..   [ 0 2 3 9 ]
    26:  [ 0 1 3 1 ]   1.11....1...   [ 0 2 3 8 ]
    27:  [ 0 1 3 2 ]   1.11...1....   [ 0 2 3 7 ]
    28:  [ 0 1 3 3 ]   1.11..1.....   [ 0 2 3 6 ]
    29:  [ 0 1 3 4 ]   1.11.1......   [ 0 2 3 5 ]
    30:  [ 0 1 3 5 ]   1.111.......   [ 0 2 3 4 ]
    31:  [ 0 2 0 0 ]   11....1..1..   [ 0 1 6 9 ]
    32:  [ 0 2 0 1 ]   11....1.1...   [ 0 1 6 8 ]
    33:  [ 0 2 0 2 ]   11....11....   [ 0 1 6 7 ]
    34:  [ 0 2 1 0 ]   11...1...1..   [ 0 1 5 9 ]
    35:  [ 0 2 1 1 ]   11...1..1...   [ 0 1 5 8 ]
    36:  [ 0 2 1 2 ]   11...1.1....   [ 0 1 5 7 ]
    37:  [ 0 2 1 3 ]   11...11.....   [ 0 1 5 6 ]
    38:  [ 0 2 2 0 ]   11..1....1..   [ 0 1 4 9 ]
    39:  [ 0 2 2 1 ]   11..1...1...   [ 0 1 4 8 ]
    40:  [ 0 2 2 2 ]   11..1..1....   [ 0 1 4 7 ]
    41:  [ 0 2 2 3 ]   11..1.1.....   [ 0 1 4 6 ]
    42:  [ 0 2 2 4 ]   11..11......   [ 0 1 4 5 ]
    43:  [ 0 2 3 0 ]   11.1.....1..   [ 0 1 3 9 ]
    44:  [ 0 2 3 1 ]   11.1....1...   [ 0 1 3 8 ]
    45:  [ 0 2 3 2 ]   11.1...1....   [ 0 1 3 7 ]
    46:  [ 0 2 3 3 ]   11.1..1.....   [ 0 1 3 6 ]
    47:  [ 0 2 3 4 ]   11.1.1......   [ 0 1 3 5 ]
    48:  [ 0 2 3 5 ]   11.11.......   [ 0 1 3 4 ]
    49:  [ 0 2 4 0 ]   111......1..   [ 0 1 2 9 ]
    50:  [ 0 2 4 1 ]   111.....1...   [ 0 1 2 8 ]
    51:  [ 0 2 4 2 ]   111....1....   [ 0 1 2 7 ]
    52:  [ 0 2 4 3 ]   111...1.....   [ 0 1 2 6 ]
    53:  [ 0 2 4 4 ]   111..1......   [ 0 1 2 5 ]
    54:  [ 0 2 4 5 ]   111.1.......   [ 0 1 2 4 ]
    55:  [ 0 2 4 6 ]   1111........   [ 0 1 2 3 ]

Figure 15.5-A: The 55 increment-2 restricted growth strings of length 4 (left), the corresponding 3-ary Dyck words (middle), and positions of ones in the Dyck words (right).

15.5 Increment-i RGS, k-ary Dyck words, and k-ary trees

We generalize the restricted growth strings for paren words by allowing increments of at most i: sequences $a_0, a_1, \ldots, a_{n-1}$ where $a_0 = 0$ and $a_k \le a_{k-1} + i$. The case i = 1 corresponds to the RGS for paren words.

A k-ary Dyck word is a binary word where in each prefix the number of zeros is at most k − 1 times the number of ones. The increment-i RGS correspond to k-ary Dyck words where k = i + 1, see figure 15.5-A.
The positions of the ones in the Dyck words are computed as $c_j = k \cdot j - a_j$ (rightmost column).

The length-n increment-i RGS also correspond to k-ary trees with n internal nodes: start at the root, move out by i positions for every one and follow back by one position for every zero.

15.5.1 Generation in lexicographic order

Figure 15.5-A shows the increment-2 restricted growth strings of length 4. The strings can be generated in lexicographic order via [FXT: class dyck_rgs in comb/dyck-rgs.h]:

    class dyck_rgs
    {
    public:
        ulong *s_;  // restricted growth string
        ulong n_;   // Length of strings
        ulong i_;   // s[k] <= s[k-1]+i
    [--snip--]

        ulong next()
        // Return index of first changed element in s[],
        // Return zero if current string is the last
        {
            ulong k = n_;

        start:
            --k;
            if ( k==0 )  return 0;

            ulong sk = s_[k] + 1;
            ulong mp = s_[k-1] + i_;
            if ( sk > mp )  // "carry"
            {
                s_[k] = 0;
                goto start;
            }

            s_[k] = sk;
            return k;
        }
    [--snip--]

The rate of generation is about 168 M/s for i = 1, 194 M/s for i = 2, and 218 M/s with i = 3 [FXT: comb/dyck-rgs-demo.cc].

15.5.2 Gray codes with homogeneous moves

A loopless algorithm for the generation of a Gray code with only homogeneous moves is given in [37]. The RGS used in the algorithm gives the positions (one-based) of the ones in the delta sets, see figure 15.5-B (created with [FXT: comb/dyck-gray-demo.cc]). An implementation is given in [FXT: class dyck_gray in comb/dyck-gray.h].

A Gray code where in addition all transitions are two-close is shown in figure 15.5-C (created with [FXT: comb/dyck-gray2-demo.cc]). Note that the moves are enup-moves, compare to figure 6.6-B on page 189.
The underlying algorithm is described in [338] an implementation is given in [FXT: class dyck gray2 in comb/dyck-gray2.h]: 1 2 3 4 5 class dyck_gray2 { public: ulong m, k; // m ones (and m*(k-1) zeros) bool ptt; // Parity of Total number of Tories (variable ’Odd’ in paper) 334 Chapter 15: Parentheses strings 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47: 48: 49: 50: 51: 52: 53: 54: 55: positions [ 1 4 7 A ] [ 1 4 7 8 ] [ 1 4 7 9 ] [ 1 4 5 9 ] [ 1 4 5 8 ] [ 1 4 5 7 ] [ 1 4 5 6 ] [ 1 4 5 A ] [ 1 4 6 A ] [ 1 4 6 7 ] [ 1 4 6 8 ] [ 1 4 6 9 ] [ 1 2 6 9 ] [ 1 2 6 8 ] [ 1 2 6 7 ] [ 1 2 6 A ] [ 1 2 5 A ] [ 1 2 5 6 ] [ 1 2 5 7 ] [ 1 2 5 8 ] [ 1 2 5 9 ] [ 1 2 4 9 ] [ 1 2 4 8 ] [ 1 2 4 7 ] [ 1 2 4 6 ] [ 1 2 4 5 ] [ 1 2 4 A ] [ 1 2 3 A ] [ 1 2 3 4 ] [ 1 2 3 5 ] [ 1 2 3 6 ] [ 1 2 3 7 ] [ 1 2 3 8 ] [ 1 2 3 9 ] [ 1 2 7 9 ] [ 1 2 7 8 ] [ 1 2 7 A ] [ 1 3 7 A ] [ 1 3 7 8 ] [ 1 3 7 9 ] [ 1 3 4 9 ] [ 1 3 4 8 ] [ 1 3 4 7 ] [ 1 3 4 6 ] [ 1 3 4 5 ] [ 1 3 4 A ] [ 1 3 5 A ] [ 1 3 5 6 ] [ 1 3 5 7 ] [ 1 3 5 8 ] [ 1 3 5 9 ] [ 1 3 6 9 ] [ 1 3 6 8 ] [ 1 3 6 7 ] [ 1 3 6 A ] Dyck word 1..1..1..1.. 1..1..11.... 1..1..1.1... 1..11...1... 1..11..1.... 1..11.1..... 1..111...... 1..11....1.. 1..1.1...1.. 1..1.11..... 1..1.1.1.... 1..1.1..1... 11...1..1... 11...1.1.... 11...11..... 11...1...1.. 11..1....1.. 11..11...... 11..1.1..... 11..1..1.... 11..1...1... 11.1....1... 11.1...1.... 11.1..1..... 11.1.1...... 11.11....... 11.1.....1.. 111......1.. 1111........ 111.1....... 111..1...... 111...1..... 111....1.... 111.....1... 11....1.1... 11....11.... 11....1..1.. 1.1...1..1.. 1.1...11.... 1.1...1.1... 1.11....1... 1.11...1.... 1.11..1..... 1.11.1...... 1.111....... 1.11.....1.. 1.1.1....1.. 1.1.11...... 1.1.1.1..... 1.1.1..1.... 1.1.1...1... 1.1..1..1... 1.1..1.1.... 1.1..11..... 1.1..1...1.. 
direction [ + + + + ] [ + + + + ] [ + + + - ] [ + + + - ] [ + + + - ] [ + + + - ] [ + + + - ] [ + + + + ] [ + + - + ] [ + + - + ] [ + + - + ] [ + + - - ] [ + + - - ] [ + + - - ] [ + + - - ] [ + + - + ] [ + + - + ] [ + + - + ] [ + + - + ] [ + + - + ] [ + + - - ] [ + + - - ] [ + + - - ] [ + + - - ] [ + + - - ] [ + + - - ] [ + + - + ] [ + + - + ] [ + + - + ] [ + + - + ] [ + + - + ] [ + + - + ] [ + + - + ] [ + + - - ] [ + + + - ] [ + + + - ] [ + + + + ] [ + - + + ] [ + - + + ] [ + - + - ] [ + - + - ] [ + - + - ] [ + - + - ] [ + - + - ] [ + - + - ] [ + - + + ] [ + - + + ] [ + - + + ] [ + - + + ] [ + - + + ] [ + - + - ] [ + - - - ] [ + - - - ] [ + - - - ] [ + - - + ] Figure 15.5-B: Gray code for 3-ary Dyck words where all changes are homogeneous. The left column shows the vectors of (one-based) positions, the symbol ‘A’ is used for the number 10. 15.5: Increment-i RGS, k-ary Dyck words, and k-ary trees 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47: 48: 49: 50: 51: 52: 53: 54: 55: positions [ 1 2 3 4 ] [ 1 2 3 6 ] [ 1 2 3 8 ] [ 1 2 3 A ] [ 1 2 3 9 ] [ 1 2 3 7 ] [ 1 2 3 5 ] [ 1 2 4 5 ] [ 1 2 4 6 ] [ 1 2 4 8 ] [ 1 2 4 A ] [ 1 2 4 9 ] [ 1 2 4 7 ] [ 1 2 6 7 ] [ 1 2 6 8 ] [ 1 2 6 A ] [ 1 2 6 9 ] [ 1 2 7 9 ] [ 1 2 7 A ] [ 1 2 7 8 ] [ 1 2 5 8 ] [ 1 2 5 A ] [ 1 2 5 9 ] [ 1 2 5 7 ] [ 1 2 5 6 ] [ 1 4 5 6 ] [ 1 4 5 7 ] [ 1 4 5 9 ] [ 1 4 5 A ] [ 1 4 5 8 ] [ 1 4 7 8 ] [ 1 4 7 A ] [ 1 4 7 9 ] [ 1 4 6 9 ] [ 1 4 6 A ] [ 1 4 6 8 ] [ 1 4 6 7 ] [ 1 3 6 7 ] [ 1 3 6 8 ] [ 1 3 6 A ] [ 1 3 6 9 ] [ 1 3 7 9 ] [ 1 3 7 A ] [ 1 3 7 8 ] [ 1 3 5 8 ] [ 1 3 5 A ] [ 1 3 5 9 ] [ 1 3 5 7 ] [ 1 3 5 6 ] [ 1 3 4 6 ] [ 1 3 4 8 ] [ 1 3 4 A ] [ 1 3 4 9 ] [ 1 3 4 7 ] [ 1 3 4 5 ] Dyck word 1111........ 111..1...... 111....1.... 111......1.. 111.....1... 111...1..... 111.1....... 11.11....... 11.1.1...... 11.1...1.... 11.1.....1.. 11.1....1... 11.1..1..... 11...11..... 11...1.1.... 
11...1...1.. 11...1..1... 11....1.1... 11....1..1.. 11....11.... 11..1..1.... 11..1....1.. 11..1...1... 11..1.1..... 11..11...... 1..111...... 1..11.1..... 1..11...1... 1..11....1.. 1..11..1.... 1..1..11.... 1..1..1..1.. 1..1..1.1... 1..1.1..1... 1..1.1...1.. 1..1.1.1.... 1..1.11..... 1.1..11..... 1.1..1.1.... 1.1..1...1.. 1.1..1..1... 1.1...1.1... 1.1...1..1.. 1.1...11.... 1.1.1..1.... 1.1.1....1.. 1.1.1...1... 1.1.1.1..... 1.1.11...... 1.11.1...... 1.11...1.... 1.11.....1.. 1.11....1... 1.11..1..... 1.111....... [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ [ 335 direction . . . . ] . . . +2 ] . . . +2 ] . . . -2 ] . . . -2 ] . . . -2 ] . . . . ] . . +2 . ] . . +2 +2 ] . . +2 +3 ] . . +2 -3 ] . . +2 -3 ] . . +2 . ] . . +3 . ] . . +3 +2 ] . . +3 -3 ] . . +3 . ] . . -3 . ] . . -3 -1 ] . . -3 . ] . . . . ] . . . -1 ] . . . -1 ] . . . -1 ] . . . . ] . -2 . . ] . -2 . +2 ] . -2 . +3 ] . -2 . -3 ] . -2 . . ] . -2 -2 . ] . -2 -2 -2 ] . -2 -2 . ] . -2 . . ] . -2 . -1 ] . -2 . -1 ] . -2 . . ] . . . . ] . . . +2 ] . . . -3 ] . . . . ] . . -1 . ] . . -1 -1 ] . . -1 . ] . . -1 . ] . . -1 -1 ] . . -1 -1 ] . . -1 -1 ] . . -1 . ] . . . . ] . . . +1 ] . . . -1 ] . . . -1 ] . . . -1 ] . . . . ] Figure 15.5-C: Gray code for 3-ary Dyck words where all changes are both homogeneous and two-close. The left column shows the vectors of (one-based) positions, the symbol ‘A’ is used for the number 10. 
        ulong *c_;  // positions of ones (1-based)
        ulong *e_;  // Ehrlich array (focus pointers)
        bool *p_;   // parity (1-based)
        int *s_;    // directions: whether last/first (==0) or
                    //   rising (>0) or falling (<0); (1-based)

    public:
        dyck_gray2(ulong tk, ulong tm)
        // must have tk>=2, tm>=1
        {
            k = tk;
            m = tm;
            ptt = false;
            c_ = new ulong[m+2];  // sentinels c_[0] (with computing MN) and c_[m+1] (with condition in next())
            e_ = new ulong[m+1];
            p_ = new bool[m+1];   // p_[0] unused
            s_ = new int[m+1];    // s_[0] unused
            first();
        }

        ~dyck_gray2()
    [--snip--]

        void first()
        {
            for (ulong j=0; j<=m; ++j)  e_[j] = j;      // {e_[j] = j for 0 <= j <= m}
            for (ulong j=0; j<=m; ++j)  s_[j] = 0;      // {s_[j] = 0 for 1 <= j <= m}
            for (ulong j=0; j<=m; ++j)  p_[j] = false;  // {p_[j] = 0 for 1 <= j <= m}
            for (ulong j=0; j<=m; ++j)  c_[j] = j;      // first word == [1, 2, 3, ..., m]
            c_[m+1] = 0;  // sentinel, c_[0] is also sentinel
        }

The following comments in curly braces are from the paper:

        ulong next()
        // Return zero if current==last, else
        //   position (!=0) in (zero-based) array c_[]
        //   (the first element never changes).
        {
            ulong i = e_[m];  // The pivot
            if ( i==1 )  return 0;  // current is last

            const ulong MN = c_[i-1] + 1;    // {MN is the minimum value of c_[i]}
            // can touch sentinel c_[0]
            const ulong MX = (i - 1)*k + 1;  // {MX is the maximum value of c_[i]}

            if ( s_[i] == 0 )  // {c_[i] is at its first value}
            {
                p_[i] = ptt;  // {parity of total number of tories}
                s_[i] = +1;   // {c_[i] starts rising unless it starts at max(i)}
                if ( c_[i] == MX )  // {one of these tories is not to c_[i]'s left}
                {
                    p_[i] = 1 - p_[i];
                    s_[i] = -s_[i];
                }
                if ( c_[i+1] == MX+k )  { p_[i] = 1 - p_[i]; }
                // can touch sentinel c_[m+1]==0
            }

            if ( s_[i] > 0 )  // {c_[i] is rising}
            {
                if ( c_[i] == MN )  // {MN is taken and c_[i] can't end there}
                {
                    s_[i] = 2;
                }
                else
                {
                    if ( (c_[i] == MN+1) && (s_[i] == 2) )  // {MN+1 is also taken}
                    {
                        s_[i] = 3;
                    }
                }

                c_[i] += ( 1 + ( ((c_[i] % 2) == p_[i]) && (c_[i] < MX-1) ) );

                if ( c_[i] == MX )  // {one more tory}
                {
                    ptt = 1 - ptt;
                    s_[i] = -s_[i];
                }
            }
            else  // {c_[i] is falling}
            {
                if ( c_[i] == MX )  { ptt = 1 - ptt; }  // {one fewer tory}
                c_[i] -= ( 1 + ( ((c_[i] % 2) != p_[i] ) && (c_[i] > MN+1) ) );
            }

            e_[m] = m;  // {beginning to update Ehrlich array}
            if ( c_[i] + s_[i] == MN-1 )  // {c_[i] is at its last value}
            {
                s_[i] = 0;  // {c_[i] will be at its first value the next time i is the pivot}
                e_[i] = e_[i-1];
                e_[i-1] = i - 1;
            }

            return i - 1;  // position in zero-based array c_[]
        }

        const ulong *data()  const  { return c_+1; }  // zero-based array
    };

15.5.3 The number of increment-i RGS

    n:    1  2   3    4     5      6       7        8         9         10          11
    i=1:  1  2   5   14    42    132     429     1430      4862      16796       58786
    i=2:  1  3  12   55   273   1428    7752    43263    246675    1430715     8414640
    i=3:  1  4  22  140   969   7084   53820   420732   3362260   27343888   225568798
    i=4:  1  5  35  285  2530  23751  231880  2330445  23950355  250543370  2658968130

Figure 15.5-D: The numbers $C_{n,i}$ of increment-i RGS of length n for i ≤ 4 and n ≤ 11.
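The entries of figure 15.5-D can be cross-checked directly from the definition of the increment-i RGS ($a_0 = 0$ and $0 \le a_j \le a_{j-1} + i$). The following is a minimal sketch of such a check by dynamic programming over the last element; the function name count_incr_rgs is made up for this illustration and is not part of FXT.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Count the increment-i RGS of length n:
// a[0] = 0 and 0 <= a[j] <= a[j-1] + i for j >= 1.
// cnt[v] is the number of valid strings of the current length ending in v.
// Hypothetical helper for illustration, not taken from FXT.
std::uint64_t count_incr_rgs(unsigned n, unsigned i)
{
    if ( n==0 )  return 1;  // only the empty string
    std::vector<std::uint64_t> cnt(1, 1);  // length 1: the single string [0]
    for (unsigned len=2; len<=n; ++len)
    {
        // the largest possible last element grows by i with every position
        std::vector<std::uint64_t> nxt(cnt.size()+i, 0);
        for (std::size_t v=0; v<cnt.size(); ++v)
            for (std::size_t w=0; w<=v+i; ++w)  nxt[w] += cnt[v];
        cnt.swap(nxt);
    }
    std::uint64_t s = 0;
    for (std::uint64_t c : cnt)  s += c;
    return s;
}
```

For example, count_incr_rgs(4, 2) counts the strings listed in figure 15.5-A. The sketch only counts; it does not replace the generator class shown above.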
The number $C_{n,i}$ of length-n increment-i strings equals

\[ C_{n,i} = \frac{\binom{(i+1)\,n}{n}}{i\,n + 1} \qquad (15.5-1) \]

A recursion generalizing relation 15.4-2 is

\[ C_{n+1,i} = (i+1)\, \frac{\prod_{k=1}^{i} \left[ (i+1)\,n + k \right]}{\prod_{k=1}^{i} \left[ i\,n + k + 1 \right]} \; C_{n,i} \qquad (15.5-2) \]

The sequences of numbers of length-n strings for i = 1, 2, 3, 4 start as shown in figure 15.5-D. These are respectively the entries A000108, A001764, A002293, A002294 in [312] where combinatorial interpretations are given.

We can express the generating function $C_i(x)$ as a hypergeometric series (see chapter 36 on page 685):

\[ C_i(x) = \sum_{n=0}^{\infty} C_{n,i}\, x^n \qquad (15.5-3a) \]

\[ = F\!\left( \left. \begin{matrix} 1/(i+1),\; 2/(i+1),\; 3/(i+1),\; \ldots,\; (i+1)/(i+1) \\ 2/i,\; 3/i,\; \ldots,\; i/i,\; (i+1)/i \end{matrix} \;\right|\; \frac{(i+1)^{(i+1)}}{i^{i}}\, x \right) \qquad (15.5-3b) \]

Note that the last upper and second last lower parameter cancel. Now let $f_i(x) := x\, C_i(x^i)$, then

\[ f_i(x) - f_i(x)^{i+1} = x \qquad (15.5-4) \]

That is, $f_i(x)$ can be computed as the series reversion of $x - x^{i+1}$. We choose i = 2 as an example:

    ? t1=serreverse(x-x^3+O(x^(17)))
     x + x^3 + 3*x^5 + 12*x^7 + 55*x^9 + 273*x^11 + 1428*x^13 + 7752*x^15 + O(x^17)
    ? t2=hypergeom([1/3,2/3,3/3],[2/2,3/2],3^3/2^2*x)+O(x^17)
     1 + x + 3*x^2 + 12*x^3 + 55*x^4 + 273*x^5 + 1428*x^6 + 7752*x^7 + ... + O(x^17)
    ? f=x*subst(t2,x,x^2);
    ? t1-f
     O(x^17)  \\ f is actually the series reversion of x-x^3
    ? f-f^3
     x + O(x^35)  \\ ... so f - f^3 == id

We further have the following convolution property which generalizes relation 15.4-4:

\[ C_{n,i} = \sum_{j_1 + j_2 + \ldots + j_i + j_{(i+1)} = n-1} C_{j_1,i}\; C_{j_2,i}\; C_{j_3,i} \cdots C_{j_i,i}\; C_{j_{(i+1)},i} \qquad (15.5-5) \]

An explicit expression for the function $C_i(x)$ is

\[ C_i(x) = \exp\!\left( \frac{1}{i+1} \sum_{n=1}^{\infty} \binom{(i+1)\,n}{n} \frac{x^n}{n} \right) \qquad (15.5-6) \]

The expression generalizes a relation given in [227, rel.6] (set i = 1 and take the logarithm on both sides)

\[ \sum_{n=1}^{\infty} \frac{1}{n} \binom{2n}{n}\, x^n = 2 \log\!\left( \frac{1 - \sqrt{1 - 4x}}{2x} \right) \qquad (15.5-7) \]

A curious property of the functions $C_i(x)$ is given in [349, entry "Hypergeometric Function"]:

\[ C_i\!\left( x\,(1-x)^i \right) = \frac{1}{1-x} \qquad (15.5-8) \]

Chapter 16
Integer partitions

     1:  6 == 6* 1 +   0  +   0  +   0  +   0  +   0   ==  1 + 1 + 1 + 1 + 1 + 1
     2:  6 == 4* 1 + 1* 2 +   0  +   0  +   0  +   0   ==  1 + 1 + 1 + 1 + 2
     3:  6 == 2* 1 + 2* 2 +   0  +   0  +   0  +   0   ==  1 + 1 + 2 + 2
     4:  6 ==   0  + 3* 2 +   0  +   0  +   0  +   0   ==  2 + 2 + 2
     5:  6 == 3* 1 +   0  + 1* 3 +   0  +   0  +   0   ==  1 + 1 + 1 + 3
     6:  6 == 1* 1 + 1* 2 + 1* 3 +   0  +   0  +   0   ==  1 + 2 + 3
     7:  6 ==   0  +   0  + 2* 3 +   0  +   0  +   0   ==  3 + 3
     8:  6 == 2* 1 +   0  +   0  + 1* 4 +   0  +   0   ==  1 + 1 + 4
     9:  6 ==   0  + 1* 2 +   0  + 1* 4 +   0  +   0   ==  2 + 4
    10:  6 == 1* 1 +   0  +   0  +   0  + 1* 5 +   0   ==  1 + 5
    11:  6 ==   0  +   0  +   0  +   0  +   0  + 1* 6  ==  6

Figure 16.0-A: All (eleven) integer partitions of 6.

An integer x is the sum of the positive integers less than or equal to itself in various ways. The decompositions into sums of integers are called the integer partitions of the number x. Figure 16.0-A shows all integer partitions of x = 6.

16.1 Solution of a generalized problem

We can solve a more general problem and find all partitions of a number $x$ with respect to a set $V = \{v_0, v_1, \ldots, v_{n-1}\}$ where $v_i > 0$, that is, all decompositions of the form $x = \sum_{k=0}^{n-1} c_k \cdot v_k$ where $c_i \ge 0$. The integer partitions are the special case $V = \{1, 2, 3, \ldots, n\}$ with $n = x$.

To generate the partitions assign to the first bucket $r_0$ an integer multiple of the first element $v_0$: $r_0 = c \cdot v_0$. This has to be done for all $c \ge 0$ for which $r_0 \le x$. Now set $c_0 = c$. If $r_0 = x$, we already found a partition (consisting of $c_0$ only), else (if $r_0 < x$) solve the remaining problem where $x' := x - c_0 \cdot v_0$ and $V' := \{v_1, v_2, \ldots, v_{n-1}\}$.
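The scheme just described translates almost literally into a recursive routine. The following is a minimal sketch under the stated assumption that all values are positive; it is not the FXT class, and the names partitions_into and all_partitions are invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Recursively find all multiplier vectors c[] with
//   x == c[0]*v[0] + c[1]*v[1] + ... + c[n-1]*v[n-1],  c[k] >= 0.
// Level k decides c[k]; every solution is appended to out.
// Assumes v[k] > 0 for all k. Illustrative names, not the FXT interface.
void partitions_into(long x, const std::vector<long> &v, std::size_t k,
                     std::vector<unsigned long> &c,
                     std::vector< std::vector<unsigned long> > &out)
{
    if ( x==0 )  // nothing left: all remaining multipliers stay zero
    {
        out.push_back(c);
        return;
    }
    if ( k==v.size() )  return;  // values exhausted but rest x > 0

    // try every multiple m*v[k] that does not exceed the rest x
    for (long m=0; m*v[k] <= x; ++m)
    {
        c[k] = (unsigned long)m;
        partitions_into(x - m*v[k], v, k+1, c, out);  // remaining problem
    }
    c[k] = 0;  // restore for the caller
}

// Convenience wrapper: all partitions of x with respect to the values v[].
std::vector< std::vector<unsigned long> >
all_partitions(long x, const std::vector<long> &v)
{
    std::vector<unsigned long> c(v.size(), 0);
    std::vector< std::vector<unsigned long> > out;
    partitions_into(x, v, 0, c, out);
    return out;
}
```

With v = [1, 2, 3, 4, 5, 6] and x = 6 the wrapper returns the eleven multiplier vectors of figure 16.0-A (possibly in a different order). Storing all solutions is done here only to keep the sketch short; a generator that visits one partition at a time avoids the memory cost.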
A C++ class for the generation of all partitions is [FXT: class partition gen in comb/partition-gen.h]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 class partition_gen // Integer partitions of x into supplied values pv[0],...,pv[n-1]. // pv[] defaults to [1,2,3,...,x] { public: ulong ct_; // Number of partitions found so far ulong n_; // Number of values ulong i_; // level in iterative search long *pv_; // values into which to partition ulong *pc_; // multipliers for values ulong pci_; // temporary for pc_[i_] long *r_; // rest long ri_; // temporary for r_[i_] long x_; // value to partition partition_gen(ulong x, ulong n=0, const ulong *pv=0) { if ( 0==n ) n = x; n_ = n; pv_ = new long[n_+1]; 340 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 Chapter 16: Integer partitions if ( pv ) for (ulong j=0; j=n_ ) return 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 ulong partition_gen::next_func(ulong i) { start: if ( 0!=i ) { while ( r_[i]>0 ) { pc_[i-1] = 0; r_[i-1] = r_[i]; --i; goto start; // iteration } } else // iteration end { if ( 0!=r_[i] ) { long d = r_[i] / pv_[i]; r_[i] -= d * pv_[i]; pc_[i] = d; } } n_; r_[i_] = ri_; pc_[i_] = pci_; i_ = next_func(i_); for (ulong j=0; j=0 } if ( 0==r_[i] ) { // valid partition found 16.2: Iterative algorithm 26 27 28 29 30 31 32 33 34 35 36 37 341 ++ct_; return i; } ++i; if ( i>=n_ ) return n_; // search finished r_[i] -= pv_[i]; ++pc_[i]; goto start; // iteration } The routines can easily be adapted to the generation of partitions satisfying certain restrictions, for example, partitions into distinct parts (that is, ci ≤ 1). The listing shown in figure 16.0-A can be generated with [FXT: comb/partition-gen-demo.cc]. The 190, 569, 292 partitions of 100 are generated at a rate of about 18 M/s. 
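The count quoted above, and the effect of a restriction such as $c_i \le 1$, can be checked without generating the partitions. The sketch below counts partitions by dynamic programming with a bound on the multiplicity of each part; it is a swapped-in counting technique, not the method of the FXT class, and the name count_partitions is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of partitions of x into parts 1, 2, ..., x where every part
// is used at most cmax times (cmax >= x means no restriction,
// cmax == 1 gives partitions into distinct parts).
// Hypothetical helper for cross-checking, not taken from FXT.
std::uint64_t count_partitions(unsigned x, unsigned cmax)
{
    // w[s]: number of ways to write s using the parts processed so far
    std::vector<std::uint64_t> w(x+1, 0);
    w[0] = 1;
    for (unsigned v=1; v<=x; ++v)
    {
        std::vector<std::uint64_t> nw(x+1, 0);
        for (unsigned s=0; s<=x; ++s)
            for (unsigned c=0; c<=cmax && c*v<=s; ++c)
                nw[s] += w[s - c*v];  // use the part v exactly c times
        w.swap(nw);
    }
    return w[x];
}
```

Calling count_partitions(100, 100) reproduces the number of partitions of 100 stated above, and count_partitions(6, 6) gives the eleven partitions of figure 16.0-A.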
16.2 Iterative algorithm An iterative implementation for the generation of the integer partitions is given in [FXT: class partition in comb/partition.h]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 class partition { public: ulong *c_; // partition: c[1]* 1 + c[2]* 2 + ... + c[n]* n == n ulong *s_; // cumulative sums: s[j+1] = c[1]* 1 + c[2]* 2 + ... + c[j]* j ulong n_; // partitions of n public: partition(ulong n) { n_ = n; c_ = new ulong[n+1]; s_ = new ulong[n+1]; s_[0] = 0; // unused c_[0] = 0; // unused first(); } ~partition() { delete [] c_; delete [] s_; } void first() { c_[1] = n_; for (ulong i=2; i<=n_; i++) s_[1] = 0; for (ulong i=2; i<=n_; i++) } { c_[i] = 0; } { s_[i] = n_; } void last() { for (ulong i=1; i=2 that can be increased: ulong i = 2; while ( s_[i] 1 ) { s_[i] = z; c_[i] = 0; } c_[1] = z; // z* 1 == z // s_[1] unused return true; } The preceding partition can be computed as follows: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 bool prev() { if ( c_[1]==n_ ) return false; // first == n* 1 (c[1]==n) // Find first nonzero coefficient c[i] where i>=2: ulong i = 2; while ( c_[i]==0 ) ++i; --c_[i]; s_[i] += i; ulong z = s_[i]; // Now set c[1], c[2], ..., c[i-1] to the last partition // of z into i-1 parts: while ( --i > 1 ) { ulong q = (z>=i ? z/i : 0); // == z/i; c_[i] = q; s_[i+1] = z; z -= q*i; } c_[1] = z; s_[2] = z; // s_[1] unused return true; } [--snip--] }; Divisions which result in q = 0 are avoided, leading to a small speedup. The program [FXT: comb/partition-demo.cc] demonstrates the usage of the class. About 200 million partitions per second are generated, and about 70 million for the reversed order. 
16.3 Partitions into m parts 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 1 1 1 1 1 1 1 1 1 1 9 1 1 1 1 1 1 1 1 1 2 8 1 1 1 1 1 1 1 1 1 3 7 1 1 1 1 1 1 1 1 1 4 6 1 1 1 1 1 1 1 1 1 5 5 1 1 1 1 1 1 1 1 2 2 7 1 1 1 1 1 1 1 1 2 3 6 1 1 1 1 1 1 1 1 2 4 5 1 1 1 1 1 1 1 1 3 3 5 1 1 1 1 1 1 1 1 3 4 4 1 1 1 1 1 1 1 2 2 2 6 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 1 1 1 1 1 1 1 2 2 3 5 1 1 1 1 1 1 1 2 2 4 4 1 1 1 1 1 1 1 2 3 3 4 1 1 1 1 1 1 1 3 3 3 3 1 1 1 1 1 1 2 2 2 2 5 1 1 1 1 1 1 2 2 2 3 4 1 1 1 1 1 1 2 2 3 3 3 1 1 1 1 1 2 2 2 2 2 4 1 1 1 1 1 2 2 2 2 3 3 1 1 1 1 2 2 2 2 2 2 3 1 1 1 2 2 2 2 2 2 2 2 Figure 16.3-A: The 22 partitions of 19 into 11 parts in lexicographic order. An algorithm for the generation of all partitions of n into m parts is given in [123, vol2, p.106]: 16.3: Partitions into m parts 343 The initial partition contains m−1 units and the element n−m+1. To obtain a new partition from a given one, pass over the elements of the latter from right to left, stopping at the first element f which is less, by at least two units, than the final element [...]. Without altering any element at the left of f , write f + 1 in place of f and every element to the right of f with the exception of the final element, in whose place is written the number which when added to all the other new elements gives the sum n. The process to obtain partitions stops when we reach one in which no part is less than the final part by at least two units. Figure 16.3-A shows the partitions of 19 into 11 parts. The data was generated with the program [FXT: comb/mpartition-demo.cc]. The implementation used is [FXT: class mpartition in comb/mpartition.h]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 class mpartition // Integer partitions of n into m parts { public: ulong *x_; // partition: x[1]+x[2]+...+x[m] = n ulong *s_; // aux: cumulative sums of x[] (s[0]=0) ulong n_; // integer partitions of n (must have n>0) ulong m_; // ... 
into m parts (must have 0= d=2 x^(d*(n*(n+1))/2 - (d-1)*n) * 1/prod(...)

    xxxxxxxxxxxxx    #########xxxx    W######### W     xxxx
    xxxxxxxxx     == #######xx     == W####### W    +  xx
    xxxxxx           #####x           W##### W         x
    xxx              ###              W### W
    x                #                W# W

The sequences of numbers of partitions into an even/odd number of distinct parts are entries A067661 and A067659 in [312], respectively:

    1, 0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 6, 7, 9, 11, 13, 16, 19, 23, 27, 32, 38, 45, ...
    0, 1, 1, 1, 1, 1, 2, 2, 3, 4, 5, 6, 8, 9, 11, 14, 16, 19, 23, 27, 32, 38, 44, ...

The corresponding generating functions are

\[ \frac{\eta_{+}(x) + \eta(x)}{2} = \sum_{n=0}^{\infty} \frac{x^{2n^2+n}}{\prod_{k=1}^{2n} (1 - x^k)} \qquad (16.4-41a) \]

\[ \frac{\eta_{+}(x) - \eta(x)}{2} = \sum_{n=0}^{\infty} \frac{x^{2n^2+3n+1}}{\prod_{k=1}^{2n+1} (1 - x^k)} = \sum_{n=0}^{\infty} \frac{x^{2n^2+n}}{\prod_{k=1}^{2n} (1 - x^k)} \, \frac{x^{2n+1}}{1 - x^{2n+1}} \qquad (16.4-41b) \]

Adding relations 16.4-41a and 16.4-41b gives the second equality in 16.4-31, subtraction gives the second equality in 16.4-16a.

16.4.3 Partitions into square-free parts ‡

We give relations for the ordinary generating functions for partitions into square-free parts. The Möbius function µ is defined in section 37.1.2 on page 705. The sequence of power series coefficients is given at the end of each relation.

Partitions into square-free parts (entry A073576 in [312]):

\[ \prod_{n=1}^{\infty} \frac{1}{1 - \mu(n)^2\, x^n} = \prod_{n=1}^{\infty} \eta\!\left( x^{n^2} \right)^{-\mu(n)} \qquad (16.4-42) \]

    1, 1, 2, 3, 4, 6, 9, 12, 16, 21, 28, 36, 47, 60, 76, 96, 120, 150, ...

Partitions into parts that are not square-free, note the start index on the right side product, (entry A114374):

\[ \prod_{n=1}^{\infty} \frac{1}{1 - (1 - \mu(n)^2)\, x^n} = \prod_{n=2}^{\infty} \eta\!\left( x^{n^2} \right)^{+\mu(n)} \qquad (16.4-43) \]

    1, 0, 0, 0, 1, 0, 0, 0, 2, 1, 0, 0, 3, 1, 0, 0, 5, 2, 2, 0, 7, 3, 2, 0, \
    11, 6, 4, 3, 15, 8, 6, 3, 22, 13, 11, 6, 34, 18, 15, 9, 46, 27, 24, 17, ...

Partitions into distinct square-free parts (entry A087188):

\[ \prod_{n=1}^{\infty} \left( 1 + \mu(n)^2\, x^n \right) = \prod_{n=1}^{\infty} \eta_{+}\!\left( x^{n^2} \right)^{+\mu(n)} \qquad (16.4-44) \]

    1, 1, 1, 2, 1, 2, 3, 3, 4, 4, 5, 6, 6, 8, 9, 10, 13, 14, 16, 18, 20, ...
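Partitions into odd square-free parts, also partitions into parts m such that 2m is square-free (entry A134345):

\[ \prod_{n=1}^{\infty} \frac{1}{1 - \mu(2n-1)^2\, x^{2n-1}} = \prod_{n=1}^{\infty} \frac{1}{1 - \mu(2n)^2\, x^n} \qquad (16.4-45a) \]

\[ = \prod_{n=1}^{\infty} \left[ \frac{\eta\!\left( x^{(2n-1)^2} \right)}{\eta\!\left( x^{2\,(2n-1)^2} \right)} \right]^{-\mu(2n-1)} = \prod_{n=1}^{\infty} \eta_{+}\!\left( x^{(2n-1)^2} \right)^{+\mu(2n-1)} \qquad (16.4-45b) \]

    1, 1, 1, 2, 2, 3, 4, 5, 6, 7, 9, 11, 13, 16, 19, 23, 27, 32, 38, 44, ...

Partitions into distinct odd square-free parts, also partitions into distinct parts m such that 2m is square-free (entry A134337):

\[ \prod_{n=1}^{\infty} \left( 1 + \mu(2n-1)^2\, x^{2n-1} \right) = \prod_{n=1}^{\infty} \left( 1 + \mu(2n)^2\, x^n \right) = \prod_{n=1}^{\infty} \left[ \frac{\eta_{+}\!\left( x^{(2n-1)^2} \right)}{\eta_{+}\!\left( x^{2\,(2n-1)^2} \right)} \right]^{+\mu(2n-1)} \qquad (16.4-46) \]

    1, 1, 0, 1, 1, 1, 1, 1, 2, 1, 1, 2, 2, 2, 2, 3, 4, 3, 4, 5, 5, 6, 6, 7, ...

Partitions into square-free parts $m \not\equiv 0 \bmod p$ where p is prime:

\[ \prod_{n=1}^{\infty} \frac{1}{1 - \mu(p\,n)^2\, x^n} = \prod_{n=1}^{\infty} \prod_{r=1}^{p-1} \left[ \frac{\eta\!\left( x^{(p\,n-r)^2} \right)}{\eta\!\left( x^{p\,(p\,n-r)^2} \right)} \right]^{-\mu(p\,n-r)} \qquad (16.4-47) \]

For example, partitions into square-free parts $m \not\equiv 0 \bmod 3$:

\[ \prod_{n=1}^{\infty} \frac{1}{1 - \mu(3n)^2\, x^n} = \prod_{n=1,\; n \not\equiv 0 \bmod 3}^{\infty} \frac{1}{1 - \mu(n)^2\, x^n} \qquad (16.4-48a) \]

\[ = \prod_{n=1}^{\infty} \left[ \frac{\eta\!\left( x^{(3n-1)^2} \right)}{\eta\!\left( x^{3\,(3n-1)^2} \right)} \right]^{-\mu(3n-1)} \left[ \frac{\eta\!\left( x^{(3n-2)^2} \right)}{\eta\!\left( x^{3\,(3n-2)^2} \right)} \right]^{-\mu(3n-2)} \qquad (16.4-48b) \]

    1, 1, 2, 2, 3, 4, 5, 7, 8, 10, 13, 16, 20, 24, 30, 36, 43, 52, 61, 73, 86, ...

Partitions into distinct square-free parts $m \not\equiv 0 \bmod p$ where p is prime:

\[ \prod_{n=1}^{\infty} \left( 1 + \mu(p\,n)^2\, x^n \right) = \prod_{n=1}^{\infty} \prod_{r=1}^{p-1} \left[ \frac{\eta_{+}\!\left( x^{(p\,n-r)^2} \right)}{\eta_{+}\!\left( x^{p\,(p\,n-r)^2} \right)} \right]^{+\mu(p\,n-r)} \qquad (16.4-49) \]

For example, partitions into distinct square-free parts $m \not\equiv 0 \bmod 3$:

\[ \prod_{n=1}^{\infty} \left( 1 + \mu(3n)^2\, x^n \right) = \prod_{n=1,\; n \not\equiv 0 \bmod 3}^{\infty} \left( 1 + \mu(n)^2\, x^n \right) \qquad (16.4-50a) \]

\[ = \prod_{n=1}^{\infty} \left[ \frac{\eta_{+}\!\left( x^{(3n-1)^2} \right)}{\eta_{+}\!\left( x^{3\,(3n-1)^2} \right)} \right]^{+\mu(3n-1)} \left[ \frac{\eta_{+}\!\left( x^{(3n-2)^2} \right)}{\eta_{+}\!\left( x^{3\,(3n-2)^2} \right)} \right]^{+\mu(3n-2)} \qquad (16.4-50b) \]

    1, 1, 1, 1, 0, 1, 1, 2, 2, 1, 2, 2, 3, 4, 4, 4, 4, 5, 6, 7, 7, 7, 8, 9, 12, 12, ...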
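The coefficient lists given with the relations of this section can be reproduced without the η-products by multiplying out the left side products directly. The following is a minimal sketch of that check; the function names are made up for this illustration, and the parameter p (with p = 0 meaning "no restriction") is a convention of the sketch only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// true if no square d*d with d >= 2 divides n (n >= 1)
bool is_squarefree(unsigned n)
{
    for (unsigned d=2; d*d<=n; ++d)
        if ( n % (d*d) == 0 )  return false;
    return true;
}

// Coefficients of x^0, ..., x^N of the generating function for
// partitions into square-free parts m with m % p != 0
// (p == 0: no such restriction; p == 2: odd square-free parts).
// distinct == true restricts to partitions into distinct parts.
// Hypothetical helper, not part of FXT or GP.
std::vector<std::uint64_t>
squarefree_partitions(unsigned N, bool distinct, unsigned p)
{
    std::vector<std::uint64_t> w(N+1, 0);
    w[0] = 1;
    for (unsigned m=1; m<=N; ++m)
    {
        if ( !is_squarefree(m) )  continue;
        if ( p!=0 && m%p==0 )  continue;
        if ( distinct )  // multiply by (1 + x^m): update downwards
            for (unsigned s=N; s>=m; --s)  w[s] += w[s-m];
        else             // multiply by 1/(1 - x^m): update upwards
            for (unsigned s=m; s<=N; ++s)  w[s] += w[s-m];
    }
    return w;
}
```

For instance, squarefree_partitions(10, false, 0) gives the first coefficients listed for relation 16.4-42, and squarefree_partitions(8, true, 2) those listed for relation 16.4-46.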
16.4.4 Relations involving sums of divisors ‡

The logarithmic generating function (LGF) for objects counted by the sequence $c_n$ has the following form:

\[ \sum_{n=1}^{\infty} \frac{c_n\, x^n}{n} \qquad (16.4-51) \]

The LGF for σ(n), the sum of divisors of n, is connected to the ordinary generating function for the partitions as follows (compare with relation 37.2-15a on page 712):

\[ \sum_{n=1}^{\infty} \frac{\sigma(n)\, x^n}{n} = \log\left( 1/\eta(x) \right) \qquad (16.4-52) \]

We generate the sequence of the σ(n), entry A000203 in [312], using GP:

    ? N=25; L=ceil(sqrt(N))+1; x='x+O('x^N);
    ? s=log(1/eta(x))
     x + 3/2*x^2 + 4/3*x^3 + 7/4*x^4 + 6/5*x^5 + ...
    ? v=Vec(s); vector(#v,j,v[j]*j)
     [1, 3, 4, 7, 6, 12, 8, 15, 13, 18, 12, 28, 14, 24, 24, 31, 18, 39, 20, 42, 32, 36, 24, 60]

Write o(n) for the sum of odd divisors of n (entry A000593). The LGF is related to the partitions into distinct parts:

\[ \sum_{n=1}^{\infty} \frac{o(n)\, x^n}{n} = \log\left( \eta_{+}(x) \right) \qquad (16.4-53) \]

    ? s=log(eta(x^2)/eta(x))
     x + 1/2*x^2 + 4/3*x^3 + 1/4*x^4 + 6/5*x^5 + ...
    ? v=Vec(s); vector(#v,j,v[j]*j)
     [1, 1, 4, 1, 6, 4, 8, 1, 13, 6, 12, 4, 14, 8, 24, 1, 18, 13, 20, 6, 32, 12, 24, 4]

Let s(n) be the sum of square-free divisors of n. The LGF for the sums s(n) is the logarithm of the generating function for the partitions into square-free parts:

\[ \sum_{n=1}^{\infty} \frac{s(n)\, x^n}{n} = \log\left( \prod_{n=1}^{\infty} \eta\!\left( x^{n^2} \right)^{-\mu(n)} \right) \qquad (16.4-54) \]

The sequence of the s(n) is entry A048250 in [312]:

    ? s=log(prod(n=1,L,eta(x^(n^2))^(-moebius(n))))
     x + 3/2*x^2 + 4/3*x^3 + 3/4*x^4 + 6/5*x^5 + ...
    ? v=Vec(s);vector(#v,j,v[j]*j)
     [1, 3, 4, 3, 6, 12, 8, 3, 4, 18, 12, 12, 14, 24, 24, 3, 18, 12, 20, 18, 32, 36, 24, 12]

A divisor d of n is called a unitary divisor if gcd(d, n/d) = 1. Write u(n) for the sum of the unitary divisors of n. We have the following identity, note the exponent −µ(n)/n on the right side:

\[ \sum_{n=1}^{\infty} \frac{u(n)\, x^n}{n} = \log\left( \prod_{n=1}^{\infty} \eta\!\left( x^{n^2} \right)^{-\mu(n)/n} \right) \qquad (16.4-55) \]

The sequence of the u(n) is entry A034448:

    ? s=(log(prod(n=1,L,eta(x^(n^2))^(-moebius(n)/n))))
     x + 3/2*x^2 + 4/3*x^3 + 5/4*x^4 + 6/5*x^5 + ...
    ? v=Vec(s);vector(#v,j,v[j]*j)
     [1, 3, 4, 5, 6, 12, 8, 9, 10, 18, 12, 20, 14, 24, 24, 17, 18, 30, 20, 30, 32, 36, 24, 36]

The sums $\bar{u}(n)$ of the divisors of n that are not unitary have a LGF connected to the partitions into distinct square-free parts (the product starts at n = 2, as in the GP computation below):

\[ \sum_{n=1}^{\infty} \frac{\bar{u}(n)\, x^n}{n} = \log\left( \prod_{n=2}^{\infty} \eta\!\left( x^{n^2} \right)^{+\mu(n)/n} \right) \qquad (16.4-56) \]

The sequence of the sums $\bar{u}(n)$ is entry A048146:

    ? s=log(prod(n=2,L,eta(x^(n^2))^(+moebius(n)/n)))
     1/2*x^4 + 3/4*x^8 + 1/3*x^9 + 2/3*x^12 + 7/8*x^16 + ...
    ? v=Vec(s+'x); v[1]=0; \\ let vector start with 3 zeros
    ? vector(#v,j,v[j]*j)
     [0, 0, 0, 2, 0, 0, 0, 6, 3, 0, 0, 8, 0, 0, 0, 14, 0, 9, 0, 12, 0, 0, 0, 24, 5, 0, 12]

For the sums $\bar{s}(n)$ of the divisors of n that are not square-free we have the LGF

\[ \sum_{n=1}^{\infty} \frac{\bar{s}(n)\, x^n}{n} = \log\left( \prod_{n=2}^{\infty} \eta\!\left( x^{n^2} \right)^{+\mu(n)} \right) \qquad (16.4-57) \]

The sequence of the sums $\bar{s}(n)$ is entry A162296:

    ? s=log(prod(n=2,L,eta(x^(n^2))^(+moebius(n))))
     x^4 + 3/2*x^8 + x^9 + 4/3*x^12 + 7/4*x^16 + ...
    ? v=Vec(s+'x); v[1]=0; \\ let vector start with 3 zeros
    ? vector(#v,j,v[j]*j)
     [0, 0, 0, 4, 0, 0, 0, 12, 9, 0, 0, 16, 0, 0, 0, 28, 0, 27, 0, 24, 0, 0, 0, 48, 25, 0, 36]

Chapter 17
Set partitions

For a set of n elements, say $S_n := \{1, 2, \ldots, n\}$, a set partition is a set $P = \{s_1, s_2, \ldots, s_k\}$ of nonempty subsets $s_i$ of $S_n$ that are pairwise disjoint and whose union equals $S_n$. For example, there are 5 set partitions of the set $S_3 = \{1, 2, 3\}$:

    1:  { {1, 2, 3} }
    2:  { {1, 2}, {3} }
    3:  { {1, 3}, {2} }
    4:  { {1}, {2, 3} }
    5:  { {1}, {2}, {3} }

The following sets are not set partitions of $S_3$:

    { {1, 2, 3}, {1} }  // intersection not empty
    { {1}, {3} }        // union does not contain 2

As the order of elements in a set does not matter we sort them in ascending order. For a set of sets we order the sets in ascending order of the first elements.

The number of set partitions of the n-set is the Bell number $B_n$, see section 17.2 on page 358.

17.1 Recursive generation

We write $Z_n$ for the list of all set partitions of the n-element set $S_n$.
To generate Zn we observe that with a complete list Zn−1 of partitions of the set Sn−1 we can generate the elements of Zn in the following way: For each element (set partition) P ∈ Zn−1 , create set partitions of Sn by appending the element n to the first, second, . . . , last subset, and one more by appending the set {n} as the last subset. For example, the partition {{1, 2}, {3, 4}} ∈ Z4 leads to 3 partitions of S5 : P = { {1, 2}, {3, 4} } --> { {1, 2, 5}, {3, 4} } --> { {1, 2}, {3, 4, 5} } --> { {1, 2}, {3, 4}, {5} } Now we start with the only partition {{1}} of the 1-element set and apply the described step n − 1 times. The construction (given in [261, p.89]) is shown in the left column of figure 17.1-A, the right column shows all set partitions for n = 5. A modified version of the recursive construction generates the set partitions in a minimal-change order. We can generate the ‘incremented’ partitions in two ways, forward (left to right) P = { {1, 2}, {3, 4} } --> { {1, 2, 5}, {3, 4} } --> { {1, 2}, {3, 4, 5} } --> { {1, 2}, {3, 4}, {5} } or backward (right to left) P = { {1, 2}, {3, 4} } --> { {1, 2}, {3, 4}, {5} } --> { {1, 2}, {3, 4, 5} } --> { {1, 2, 5}, {3, 4} } 17.1: Recursive generation 355 -----------------p1={1} --> p={1, 2} --> p={1}, {2} -----------------p1={1, 2} --> p={1, 2, 3} --> p={1, 2}, {3} p1={1}, {2} --> p={1, 3}, {2} --> p={1}, {2, 3} --> p={1}, {2}, {3} -----------------p1={1, 2, 3} --> p={1, 2, 3, 4} --> p={1, 2, 3}, {4} p1={1, 2}, {3} --> p={1, 2, 4}, {3} --> p={1, 2}, {3, 4} --> p={1, 2}, {3}, {4} p1={1, 3}, {2} --> p={1, 3, 4}, {2} --> p={1, 3}, {2, 4} --> p={1, 3}, {2}, {4} p1={1}, {2, 3} --> p={1, 4}, {2, 3} --> p={1}, {2, 3, 4} --> p={1}, {2, 3}, {4} p1={1}, {2}, {3} --> p={1, 4}, {2}, {3} --> p={1}, {2, 4}, {3} --> p={1}, {2}, {3, 4} --> p={1}, {2}, {3}, {4} ------------------ setpart(4) == 1: {1, 2, 3, 4} 2: {1, 2, 3}, {4} 3: {1, 2, 4}, {3} 4: {1, 2}, {3, 4} 5: {1, 2}, {3}, {4} 6: {1, 3, 4}, {2} 7: {1, 3}, {2, 4} 8: {1, 3}, {2}, 
{4} 9: {1, 4}, {2, 3} 10: {1}, {2, 3, 4} 11: {1}, {2, 3}, {4} 12: {1, 4}, {2}, {3} 13: {1}, {2, 4}, {3} 14: {1}, {2}, {3, 4} 15: {1}, {2}, {3}, {4} Figure 17.1-A: Recursive construction of the set partitions of the 4-element set S4 = {1, 2, 3, 4} (left) and the resulting list of all set partitions of 4 elements (right). -----------------P={1} --> {1, 2} --> {1}, {2} -----------------P={1, 2, 3} --> {1, 2, 3, 4} --> {1, 2, 3}, {4} P={1, 2}, {3} --> {1, 2}, {3}, {4} --> {1, 2}, {3, 4} --> {1, 2, 4}, {3} -----------------P={1, 2} --> {1, 2, 3} --> {1, 2}, {3} P={1}, {2} -->{1}, {2}, {3} -->{1}, {2, 3} -->{1, 3}, {2} P={1}, {2}, {3} --> {1, 4}, {2}, {3} --> {1}, {2, 4}, {3} --> {1}, {2}, {3, 4} --> {1}, {2}, {3}, {4} setpart(4)== {1, 2, 3, 4} {1, 2, 3}, {4} {1, 2}, {3}, {4} {1, 2}, {3, 4} {1, 2, 4}, {3} {1, 4}, {2}, {3} {1}, {2, 4}, {3} {1}, {2}, {3, 4} {1}, {2}, {3}, {4} {1}, {2, 3}, {4} {1}, {2, 3, 4} {1, 4}, {2, 3} {1, 3, 4}, {2} {1, 3}, {2, 4} {1, 3}, {2}, {4} P={1}, {2, 3} --> {1}, {2, 3}, {4} --> {1}, {2, 3, 4} --> {1, 4}, {2, 3} P={1, 3}, {2} --> {1, 3, 4}, {2} --> {1, 3}, {2, 4} --> {1, 3}, {2}, {4} Figure 17.1-B: Construction of a Gray code for set partitions as an interleaving process. 356 Chapter 17: Set partitions 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: {1, 2, 3, 4} {1, 2, 3}, {4} {1, 2}, {3}, {4} {1, 2}, {3, 4} {1, 2, 4}, {3} {1, 4}, {2}, {3} {1}, {2, 4}, {3} {1}, {2}, {3, 4} {1}, {2}, {3}, {4} {1}, {2, 3}, {4} {1}, {2, 3, 4} {1, 4}, {2, 3} {1, 3, 4}, {2} {1, 3}, {2, 4} {1, 3}, {2}, {4} 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: {1}, {2}, {3}, {4} {1}, {2}, {3, 4} {1}, {2, 4}, {3} {1, 4}, {2}, {3} {1, 4}, {2, 3} {1}, {2, 3, 4} {1}, {2, 3}, {4} {1, 3}, {2}, {4} {1, 3}, {2, 4} {1, 3, 4}, {2} {1, 2, 3, 4} {1, 2, 3}, {4} {1, 2}, {3}, {4} {1, 2}, {3, 4} {1, 2, 4}, {3} Figure 17.1-C: Set partitions of S4 = {1, 2, 3, 4} in two different minimal-change orders. The resulting process of interleaving elements is shown in figure 17.1-B. 
The method is similar to Trotter’s construction for permutations, see figure 10.7-B on page 253. If we change the direction with every subset that is to be incremented, we get the minimal-change order shown in figure 17.1-C for n = 4. The left column is generated when starting with the forward direction in each step of the recursion, the right when starting with the backward direction. The lists can be computed with [FXT: comb/setpart-demo.cc]. The C++ class [FXT: class setpart in comb/setpart.h] stores the list in an array of signed characters. The stored value is negated Pn if the element is the last in the subset. The work involved with the creation of Zn is proportional to k=1 k Bk where Bk is the k-th Bell number. The parameter xdr of the constructor determines the order in which the partitions are being created: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 class setpart // Set partitions of the set {1,2,3,...,n} // By default in minimal-change order { public: ulong n_; // Number of elements of set (set = {1,2,3,...,n}) int *p_; // p[] contains set partitions of length 1,2,3,...,n int **pp_; // pp[k] points to start of set partition k int *ns_; // ns[k] Number of Sets in set partition k int *as_; // element k attached At Set (0<=as[k]<=k) of set(k-1) int *d_; // direction with recursion (+1 or -1) int *x_; // current set partition (==pp[n]) bool xdr_; // whether to change direction in recursion (==> minimal-change order) int dr0_; // dr0: starting direction in each recursive step: // dr0=+1 ==> start with partition {{1,2,3,...,n}} // dr0=-1 ==> start with partition {{1},{2},{3},...,{n}}} public: setpart(ulong n, bool xdr=true, int dr0=+1) { n_ = n; ulong np = (n_*(n_+1))/2; // == \sum_{k=1}^{n}{k} p_ = new int[np]; pp_ = new int *[n_+1]; pp_[0] = 0; // unused pp_[1] = p_; for (ulong k=2; k<=n_; ++k) pp_[k] = pp_[k-1] + (k-1); ns_ = new int[n_+1]; as_ = new int[n_+1]; d_ = new int[n_+1]; x_ = 
pp_[n_]; init(xdr, dr0); } [--snip--] // destructor bool next() { return next_rec(n_); } 17.1: Recursive generation 40 41 42 43 44 45 46 47 48 49 50 51 52 53 const int* data() 357 const { return x_; } ulong print() const // Print current set partition // Return number of chars printed { return print_p(n_); } ulong print_p(ulong k) const; void print_internal() const; // print internal state protected: [--snip--] }; // internal methods The actual work is done by the methods next_rec() and cp_append() [FXT: comb/setpart.cc]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 int setpart::cp_append(const int *src, int *dst, ulong k, ulong a) // Copy partition in src[0,...,k-2] to dst[0,...,k-1] // append element k at subset a (a>=0) // Return number of sets in created partition. { ulong ct = 0; for (ulong j=0; j 0 ) dst[j] = e; else { if ( a==ct ) { dst[j]=-e; ++dst; dst[j]=-k; } else dst[j] = e; ++ct; } } if ( a>=ct ) { dst[k-1] = -k; ++ct; } return ct; } int setpart::next_rec(ulong k) // Update partition in level k from partition in level k-1 // Return number of sets in created partition { if ( k<=1 ) return 0; // current is last (k<=n) int d = d_[k]; int as = as_[k] + d; bool ovq = ( (d>0) ? (as>ns_[k-1]) : (as<0) ); if ( ovq ) // have to recurse { ulong ns1 = next_rec(k-1); if ( 0==ns1 ) return 0; d = ( xdr_ ? -d : dr0_ ); d_[k] = d; as = ( (d>0) ? 0 : ns_[k-1] ); } as_[k] = as; ulong ns = cp_append(pp_[k-1], pp_[k], k, as); ns_[k] = ns; return ns; } The partitions are represented by an array of integers whose absolute value is ≤ n. A negative value indicates that it is the last of the subset. The set partitions of S4 together with their ‘signed value’ representations are shown in figure 17.1-D. 
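The recursive construction of this section (build $Z_n$ from $Z_{n-1}$ by appending n to each subset in turn and finally as a new subset {n}) can also be sketched with plain containers. The following is a simplified illustration that stores the whole list; it does not use the signed-value array representation of the class above, and the names setpart_t and all_setpartitions are made up for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector< std::vector<int> > setpart_t;  // one set partition: list of subsets

// Build the list Z_n of all set partitions of {1,...,n} from Z_{n-1}:
// for each partition append the new element to the first, second, ...,
// last subset, and finally append it as a new last subset.
// Illustrative sketch only; memory use grows with the Bell numbers.
std::vector<setpart_t> all_setpartitions(int n)
{
    std::vector<setpart_t> z;
    if ( n<=0 )  return z;
    z.push_back( setpart_t(1, std::vector<int>(1, 1)) );  // Z_1 == [ {{1}} ]
    for (int e=2; e<=n; ++e)
    {
        std::vector<setpart_t> zn;
        for (const setpart_t &p : z)
        {
            for (std::size_t j=0; j<p.size(); ++j)
            {
                setpart_t q = p;
                q[j].push_back(e);  // element e joins the j-th subset
                zn.push_back(q);
            }
            setpart_t q = p;
            q.push_back( std::vector<int>(1, e) );  // {e} as new last subset
            zn.push_back(q);
        }
        z.swap(zn);
    }
    return z;
}
```

For n = 4 the list has 15 entries and appears in the order of the right column of figure 17.1-A. The direction-changing variant that yields the minimal-change orders is not covered by this sketch.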
The array as[ ] contains a restricted growth string (RGS) with the condition aj ≤ 1 + maxi m_[k] ); s_[k] += 1UL; ulong mm = m_[k]; mm += (s_[k]>=mm); m_[k+1] = mm; // == max2(m_[k], s_[k]+1) while ( ++k m_[k]); // greater max q |= (sk1 >= p_); // more than p parts } while ( q ); if ( k == 0 ) return false; s_[k] += 1UL; ulong mm = m_[k]; mm += (s_[k]>=mm); m_[k+1] = mm; // == max2(m_[k], s_[k]+1); while ( ++k start with partition {{1,2,3,...,n}} // dr0=-1 ==> start with partition {{1},{2},{3},...,{n}}} { n_ = n; m_ = new ulong[n_+1]; m_[0] = ~0UL; // sentinel m[0] = infinity s_ = new ulong[n_]; d_ = new ulong[n_]; first(dr0); } [--snip--] void first(int dr0) { const ulong n = n_; const ulong dd = (dr0 >= 0 ? +1UL : -1UL); if ( dd==1 ) { for (ulong k=0; k m_[k] ); if ( k == 0 ) // <0 or >max return false; s_[k] += d_[k]; m_[k+1] = max2(m_[k], s_[k]+1); while ( ++k RGS for set partitions public: rgs_maxincr(ulong n, ulong i=1) { n_ = n; m_ = new ulong[n_]; s_ = new ulong[n_]; i_ = i; first(); } ~rgs_maxincr() { delete [] m_; delete [] s_; } void first() { ulong n = n_; for (ulong k=0; k m1+i_ ) // "carry" { s_[k] = 0; goto start; } s_[k] = sk; if ( sk>m1 ) m1 = sk; for (ulong j=k; j 0 = M (k) otherwise and (17.3-5a) (17.3-5b) The function M (k) is maxj mp ) // "carry" { s_[k] = 0; goto start; } s_[k] = sk; if ( sk==mp ) m1 += i_; for (ulong j=k; j mp ) // "carry" { s_[k] = 0; goto start; } s_[k] = sk; return k; } [--snip--] The sequence of the numbers of K-increment RGS of length n is entry A107877 in [312]: n: 0 1 1 1 2 2 3 7 4 37 5 268 6 2496 7 28612 8 391189 9 6230646 10 113521387 The strings of length 4 are shown in figure 17.3-D. They can be generated with the program [FXT: comb/rgs-kincr-demo.cc]. 370 Chapter 18: Necklaces and Lyndon words Chapter 18 Necklaces and Lyndon words A sequence that is minimal among all its cyclic rotations is called a necklace (see section 3.5.2 on page 149 for the definition in terms of equivalence classes). 
Necklaces with k possible values for each element are called k-ary (or k-bead) necklaces. We restrict our attention to binary necklaces: only two values are allowed and we represent them by 0 and 1.

    n=1 (#=2):   .(1) 1(1)
    n=2 (#=3):   ..(1) .1(2) 11(1)
    n=3 (#=4):   ...(1) ..1(3) .11(3) 111(1)
    n=4 (#=6):   ....(1) ...1(4) ..11(4) .1.1(2) .111(4) 1111(1)
    n=5 (#=8):   .....(1) ....1(5) ...11(5) ..1.1(5) ..111(5) .1.11(5) .1111(5) 11111(1)
    n=6 (#=14):  ......(1) .....1(6) ....11(6) ...1.1(6) ...111(6) ..1..1(3) ..1.11(6)
                 ..11.1(6) ..1111(6) .1.1.1(2) .1.111(6) .11.11(3) .11111(6) 111111(1)
    n=7 (#=20):  .......(1) ......1(7) .....11(7) ....1.1(7) ....111(7) ...1..1(7) ...1.11(7)
                 ...11.1(7) ...1111(7) ..1..11(7) ..1.1.1(7) ..1.111(7) ..11.11(7) ..111.1(7)
                 ..11111(7) .1.1.11(7) .1.1111(7) .11.111(7) .111111(7) 1111111(1)
    n=8 (#=36):  ........(1) .......1(8) ......11(8) .....1.1(8) .....111(8) ....1..1(8)
                 ....1.11(8) ....11.1(8) ....1111(8) ...1...1(4) ...1..11(8) ...1.1.1(8)
                 ...1.111(8) ...11..1(8) ...11.11(8) ...111.1(8) ...11111(8) ..1..1.1(8)
                 ..1..111(8) ..1.1.11(8) ..1.11.1(8) ..1.1111(8) ..11..11(4) ..11.1.1(8)
                 ..11.111(8) ..111.11(8) ..1111.1(8) ..111111(8) .1.1.1.1(2) .1.1.111(8)
                 .1.11.11(8) .1.11111(8) .11.1111(8) .111.111(4) .1111111(8) 11111111(1)

Figure 18.0-A: All binary necklaces of lengths up to 8 and their periods (in parentheses). Dots represent zeros.

To find all length-n necklaces we can, for all binary words of length n, test whether a word is equal to its cyclic minimum (see section 1.13 on page 29). The sequences of binary necklaces for n ≤ 8 are shown in figure 18.0-A. As $2^n$ words have to be tested, this approach is inefficient for large n. Luckily there is both a much better algorithm for generating all necklaces and a formula for their number.
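The brute-force test can be sketched as follows. This is only an illustration, not the FXT code: the names rotl_n, cyclic_min and count_necklaces_bruteforce are made up here (FXT offers bit_cyclic_min() for the cyclic minimum), and the word length is assumed to satisfy 1 ≤ n ≤ 62.

```cpp
#include <cstdint>

// Rotate the low n bits of x left by one position (assumes 1 <= n <= 62).
std::uint64_t rotl_n(std::uint64_t x, unsigned n)
{
    const std::uint64_t mask = (std::uint64_t(1) << n) - 1;
    return ((x << 1) | (x >> (n - 1))) & mask;
}

// Smallest value among the n cyclic rotations of the n-bit word x.
std::uint64_t cyclic_min(std::uint64_t x, unsigned n)
{
    std::uint64_t m = x;
    for (unsigned i = 1; i < n; ++i)
    {
        x = rotl_n(x, n);
        if ( x < m )  m = x;
    }
    return m;
}

// Count the n-bit words that equal their cyclic minimum, that is, the necklaces.
// All 2^n words are tested, so this is only practical for small n.
unsigned long count_necklaces_bruteforce(unsigned n)
{
    unsigned long ct = 0;
    for (std::uint64_t x = 0; x < (std::uint64_t(1) << n); ++x)
        if ( x == cyclic_min(x, n) )  ++ct;
    return ct;
}
```

For words of equal length the numerical order coincides with the lexicographic order, so comparing the words as integers suffices.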
Not all necklaces are created equal. Each necklace can be assigned a period that is a divisor of the length. That period is the smallest (nonzero) cyclic shift that transforms the word into itself. The periods are given directly to the right of each necklace in figure 18.0-A. For n prime the only periodic necklaces are those two that contain all ones or zeros. Aperiodic (or equivalently, period equals length) necklaces are called Lyndon words.

For a length-n binary word x the function bit_cyclic_period(x,n) from section 1.13 on page 29 returns the period of the word.

18.1 Generating all necklaces

We give several methods to generate all necklaces of a given size. An efficient algorithm for the generation of bracelets (see section 3.5.2.4 on page 150) is given in [299].

18.1.1 The FKM algorithm

     1: [ . . . . ] j=1 N             1: [ . . . . . . ] j=1 N
     2: [ . . . 1 ] j=4 N L           2: [ . . . . . 1 ] j=6 N L
     3: [ . . . 2 ] j=4 N L           3: [ . . . . 1 . ] j=5
     4: [ . . 1 . ] j=3               4: [ . . . . 1 1 ] j=6 N L
     5: [ . . 1 1 ] j=4 N L           5: [ . . . 1 . . ] j=4
     6: [ . . 1 2 ] j=4 N L           6: [ . . . 1 . 1 ] j=6 N L
     7: [ . . 2 . ] j=3               7: [ . . . 1 1 . ] j=5
     8: [ . . 2 1 ] j=4 N L           8: [ . . . 1 1 1 ] j=6 N L
     9: [ . . 2 2 ] j=4 N L           9: [ . . 1 . . 1 ] j=3 N
    10: [ . 1 . 1 ] j=2 N            10: [ . . 1 . 1 . ] j=5
    11: [ . 1 . 2 ] j=4 N L          11: [ . . 1 . 1 1 ] j=6 N L
    12: [ . 1 1 . ] j=3              12: [ . . 1 1 . . ] j=4
    13: [ . 1 1 1 ] j=4 N L          13: [ . . 1 1 . 1 ] j=6 N L
    14: [ . 1 1 2 ] j=4 N L          14: [ . . 1 1 1 . ] j=5
    15: [ . 1 2 . ] j=3              15: [ . . 1 1 1 1 ] j=6 N L
    16: [ . 1 2 1 ] j=4 N L          16: [ . 1 . 1 . 1 ] j=2 N
    17: [ . 1 2 2 ] j=4 N L          17: [ . 1 . 1 1 . ] j=5
    18: [ . 2 . 2 ] j=2 N            18: [ . 1 . 1 1 1 ] j=6 N L
    19: [ . 2 1 . ] j=3              19: [ . 1 1 . 1 1 ] j=3 N
    20: [ . 2 1 1 ] j=4 N L          20: [ . 1 1 1 . 1 ] j=4
    21: [ . 2 1 2 ] j=4 N L          21: [ . 1 1 1 1 . ] j=5
    22: [ . 2 2 . ] j=3              22: [ . 1 1 1 1 1 ] j=6 N L
    23: [ . 2 2 1 ] j=4 N L          23: [ 1 1 1 1 1 1 ] j=1 N
    24: [ . 2 2 2 ] j=4 N L
    25: [ 1 1 1 1 ] j=1 N            23 (6, 2) pre-necklaces.
    26: [ 1 1 1 2 ] j=4 N L          14 necklaces and 9 Lyndon words.
    27: [ 1 1 2 1 ] j=3
    28: [ 1 1 2 2 ] j=4 N L
    29: [ 1 2 1 2 ] j=2 N
    30: [ 1 2 2 1 ] j=3
    31: [ 1 2 2 2 ] j=4 N L
    32: [ 2 2 2 2 ] j=1 N

    32 (4, 3) pre-necklaces.
    24 necklaces and 18 Lyndon words.

Figure 18.1-A: Ternary length-4 (left) and binary length-6 (right) pre-necklaces as generated by the FKM algorithm. Dots are used for zeros, necklaces are marked with ‘N’, Lyndon words with ‘L’.

The following algorithm for generating all necklaces actually produces pre-necklaces, a subset of which are the necklaces. A pre-necklace is a string that is the prefix of some necklace. The FKM algorithm (for Fredericksen, Kessler, Maiorana) to generate all k-ary length-n pre-necklaces proceeds as follows:

1. Initialize the word $F = [f_1, f_2, \ldots, f_n]$ to all zeros. Set j = 1.
2. (Visit pre-necklace F. If j divides n, then F is a necklace. If j equals n, then F is a Lyndon word.)
3. Find the largest index j so that $f_j < k-1$. If there is no such index (then $F = [k-1, k-1, \ldots, k-1]$, the last necklace), then terminate.
4. Increment $f_j$. Fill the suffix starting at $f_{j+1}$ with copies of $[f_1, \ldots, f_j]$. Goto step 2.
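The four steps can be condensed into a self-contained sketch that only counts what is visited. The struct FkmCounts and the function fkm_count are hypothetical names chosen for this illustration, and k ≥ 2, n ≥ 1 are assumed.

```cpp
#include <vector>

// Hypothetical result type: numbers of pre-necklaces, necklaces, Lyndon words.
struct FkmCounts { unsigned long pre, neck, lyn; };

// Run the FKM steps for k-ary words of length n and count the visited words.
FkmCounts fkm_count(unsigned n, unsigned k)
{
    std::vector<unsigned> f(n + 1, 0);  // one-based, f[0] is unused
    FkmCounts c = {0, 0, 0};
    unsigned j = 1;
    while ( true )
    {
        // Step 2: visit the pre-necklace f[1..n]
        ++c.pre;
        if ( n % j == 0 )  ++c.neck;  // necklace if j divides n
        if ( j == n )      ++c.lyn;   // Lyndon word if j equals n

        // Step 3: find the largest index that can be incremented
        j = n;
        while ( j > 0 && f[j] == k - 1 )  --j;
        if ( j == 0 )  break;  // the word was [k-1, ..., k-1]

        // Step 4: increment, then copy the prefix f[1..j] periodically
        ++f[j];
        for (unsigned t = j + 1; t <= n; ++t)  f[t] = f[t - j];
    }
    return c;
}
```

With (n, k) = (4, 3) and (6, 2) the counts should agree with the totals stated in figure 18.1-A.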
The crucial steps are [FXT: comb/necklace-fkm-demo.cc]:

    for (ulong i=1; i<=n; ++i)  f[i] = 0;  // Initialize to zero
    bool nq = 1;  // whether pre-necklace is a necklace
    bool lq = 0;  // whether pre-necklace is a Lyndon word
    ulong j = 1;
    while ( 1 )
    {
        // Print necklace:
        cout << setw(4) << pct << ":";
        print_vec(" ", f+1, n, true);
        cout << " j=" << j;
        if ( nq )  cout << " N";
        if ( lq )  cout << " L";
        cout << endl;

        // Find largest index where we can increment:
        j = n;
        while ( f[j]==k-1 )  { --j; };
        if ( j==0 )  break;
        ++f[j];

        // Copy periodically:
        for (ulong i=1,t=j+1; t<=n; ++i,++t)  f[t] = f[i];

        nq = ( (n%j)==0 );  // necklace if j divides n
        lq = ( j==n );      // Lyndon word if j equals n
    }

Two example runs are shown in figure 18.1-A. An efficient implementation of the algorithm is [FXT: class necklace in comb/necklace.h]:

    class necklace
    {
    public:
        ulong *a_;   // the string, NOTE: one-based
        ulong *dv_;  // delta sequence of divisors of n
        ulong n_;    // length of strings
        ulong m1_;   // m-ary strings, m1=m-1
        ulong j_;    // period of the word (if necklaces)

    public:
        necklace(ulong m, ulong n)
        {
            n_ = ( n ? n : 1 );      // n at least 1
            m1_ = ( m>1 ? m-1 : 1);  // m at least 2
            a_ = new ulong[n_+1];
            dv_ = new ulong[n_+1];
            for (ulong j=1; j<=n_; ++j)  dv_[j] = ( 0==(n_%j ) );  // divisors
            first();
        }
        [--snip--]

        void first()
        {
            for (ulong j=0; j<=n_; ++j)  a_[j] = 0;
            j_ = 1;
        }
        [--snip--]

The method to compute the next pre-necklace is

        ulong next_pre()  // next pre-necklace
        // return j (zero when finished)
        {
            // Find rightmost digit that can be incremented:
            ulong j = n_;
            while ( a_[j] == m1_ )  { --j; }

            // Increment:
            // if ( 0==j_ )  return 0;  // last
            ++a_[j];

            // Copy periodically:
            for (ulong k=j+1; k<=n_; ++k)  a_[k] = a_[k-j];

            j_ = j;
            return j;
        }

Note that the return for the last word is commented out: omitting the test gives a speedup, and no harm is done by the copying that follows. The array dv is used to determine whether the current pre-necklace is also a necklace (or Lyndon word) via simple lookups:

        bool is_necklace()  const
        {
            return ( 0!=dv_[j_] );  // whether j divides n
        }

        bool is_lyn()  const
        {
            return ( j_==n_ );  // whether j equals n
        }

The methods for the computation of the next necklace or Lyndon word are

        ulong next()  // next necklace
        {
            do
            {
                next_pre();
                if ( 0==j_ )  return 0;
            }
            while ( 0==dv_[j_] );  // until j divides n
            return j_;
        }

        ulong next_lyn()  // next Lyndon word
        {
            do
            {
                next_pre();
                if ( 0==j_ )  return 0;
            }
            while ( j_!=n_ );  // until j equals n
            return j_;  // == n
        }
    };

The rate of generation for pre-necklaces is about 98 M/s for base 2, 140 M/s for base 3, and 180 M/s for base 4 [FXT: comb/necklace-demo.cc].

A specialization of the algorithm for binary necklaces is [FXT: class binary necklace in comb/binary-necklace.h]. The rate of generation for pre-necklaces is about 128 M/s [FXT: comb/binary-necklace-demo.cc]. A version of the algorithm that produces the binary necklaces as bits of a word is given in section 1.13.3 on page 30.
The binary necklaces of length n can be used as cycle leaders in the length-$2^n$ zip permutation (and its inverse) that is discussed in section 2.10 on page 125.

An algorithm for the generation of all irreducible binary polynomials via Lyndon words is described in section 40.10 on page 856.

18.1.2 Binary Lyndon words with length a Mersenne exponent

The length-n binary Lyndon words for n an exponent of a Mersenne prime $M_n = 2^n - 1$ can be generated efficiently as binary expansions of the powers of a primitive root r of $M_n$ until the second word with just one bit is reached. With n = 7, $M_7 = 127$ and the primitive root r = 3 we get the sequence shown in figure 18.1-B. The sequence of minimal primitive roots $r_n$ of the first Mersenne primes $M_n = 2^n - 1$ is entry A096393 in [312]:

       n: r_n
       2: 2
       3: 3
       5: 3
       7: 3
      13: 17
      17: 3
      19: 3
      31: 7
      61: 37
      89: 3
     107: 3
     127: 43
     521: 3
     607: 5   <--= 5 is a primitive root of 2**607-1
    1279: 5

     0 : a= ......1 =   1 == ......1
     1 : a= .....11 =   3 == .....11
     2 : a= ...1..1 =   9 == ...1..1
     3 : a= ..11.11 =  27 == ..11.11
     4 : a= 1.1...1 =  81 == ...11.1
     5 : a= 111.1.. = 116 == ..111.1
     6 : a= 1.1111. =  94 == .1.1111
     7 : a= ..111.. =  28 == ....111
     8 : a= 1.1.1.. =  84 == ..1.1.1
     9 : a= 11111.1 = 125 == .111111
    10 : a= 1111..1 = 121 == ..11111
    11 : a= 11.11.1 = 109 == .11.111
    12 : a= 1..1..1 =  73 == ..1..11
    13 : a= 1.111.. =  92 == ..1.111
    14 : a= ..1.11. =  22 == ...1.11
    15 : a= 1....1. =  66 == ....1.1
    16 : a= 1...111 =  71 == ...1111
    17 : a= 1.1.11. =  86 == .1.1.11
    18 : a= ....1.. =   4 == ......1   <--= sequence restarts
    19 : a= ...11.. =  12 == .....11
    20 : a= .1..1.. =  36 == ...1..1
    21 : a= 11.11.. = 108 == ..11.11
    22 : a= 1...11. =  70 == ...11.1
    23 : a= 1.1..11 =  83 == ..111.1
    24 : a= 1111.1. = 122 == .1.1111
    25 : a= 111.... = 112 == ....111
    [--snip--]

Figure 18.1-B: Generation of all (18) 7-bit Lyndon words as binary representations of the powers modulo 127 of the primitive root 3. The right column gives the cyclic minima.
Dots are used for zeros. 18.1.3 A constant amortized time (CAT) algorithm A constant amortized time (CAT) algorithm to generate all k-ary length-n pre-necklaces is given in [95]. The crucial part of a recursive algorithm [FXT: comb/necklace-cat-demo.cc] is the function 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 ulong K, N; // K-ary pre-necklaces of length N ulong f[N]; void crsms_gen(ulong n, ulong j) { if ( n > N ) visit(j); // pre-necklace in f[1,...,N] else { f[n] = f[n-j]; crsms_gen(n+1, j); for (ulong i=f[n-j]+1; i N ) visit(j); else { if ( -1==x ) { if ( 0==f[n-j] ) { f[n] = 1; xgen(n+1, n, -x); } f[n] = f[n-j]; xgen(n+1, j, +x); } 18.1: Generating all necklaces 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: .......1 ......11 .....111 .....1.1 ....11.1 ....1111 ....1.11 ....1..1 ...11..1 ...11.11 375 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: ...11111 ...111.1 ...1.1.1 ...1.111 ...1..11 ..11.111 <<+1 ..11.1.1 ..1111.1 ..111111 ..111.11 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: ..1.1.11 ..1.1111 ..1.11.1 ..1..111 <<+1 ..1..1.1 .11.1111 <<+2 .1111111 .1.11.11 <<+1 .1.11111 .1.1.111 Figure 18.1-C: The 30 binary 8-bit Lyndon words in an order with few changes between successive words. Transitions where more than one bit changes are marked with a ‘<<’. n : Xn 1: 0 2: 0 3: 0 4: 0 5: 1 6: 1 n : Xn 7: 2 8: 5 9: 11 10: 15 11: 34 12: 54 n: Xn 13: 95 14: 163 15: 290 16: 479 17: 859 18: 1450 n: 19: 20: 21: 22: 23: 24: Xn 2598 4546 8135 14427 26122 46957 n: 25: 26: 27: 28: 29: 30: Xn 85449 155431 284886 522292 963237 1778145 Figure 18.1-D: Excess (with respect to Gray code) of the number of bits changed. 11 12 13 14 15 16 17 else { f[n] = f[n-j]; xgen(n+1, j, +x); if ( 0==f[n-j] ) { f[n] = 1; xgen(n+1, n, -x); } } } } The program [FXT: comb/necklace-gray-demo.cc] computes the binary Lyndon words with the given routine. The ordering has fewer transitions between successive words but is in general not a Gray code (for up to 6-bit words a Gray code is generated). 
Figure 18.1-C shows the output with 8-bit Lyndon words. The first 2bn/2c −1 Lyndon words of length n are in Gray code order. The number Xn of additional transitions of the length-n Lyndon words is, for n ≤ 30, shown in figure 18.1-D. 18.1.5 An order with at most three changes per transition 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: .1111111 .111.111 .11.1111 <<+1 .1.1.111 <<+2 .1.1.1.1 .1.11.11 <<+2 .1.11111 ...11111 ...111.1 ...11..1 ...11.11 ...1..11 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: ...1...1 ...1.1.1 ...1.111 .....111 .....1.1 .......1 ........ ......11 <<+1 ....1.11 ....1..1 ....11.1 ....1111 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: ..1.1111 ..1.11.1 ..1.1.11 <<+1 ..1..1.1 <<+2 ..1..111 ..11.111 ..11.1.1 ..11..11 <<+1 ..111.11 ..1111.1 <<+1 ..111111 11111111 <<+1 Figure 18.1-E: The 30 binary 8-bit necklaces in an order with at most 3 changes per transition. Transitions where more than one bit changes are marked with a ‘<<’. An algorithm to generate necklaces in an order such that at most 3 elements change with each update is given in [352]. The recursion can be given as (corrected and shortened) [FXT: comb/necklace-gray3demo.cc]: 1 2 3 4 5 6 7 long *f; // data in f[1..m], f[0] = 0 long N; // word length int k; // k-ary necklaces, k==sigma in the paper void gen3(int z, int t, int j) { if ( t > N ) { visit(j); } 376 Chapter 18: Necklaces and Lyndon words n : Xn 1: 0 2: 1 3: 2 4: 2 5: 2 6: 4 n : Xn 7: 6 8: 12 9: 20 10: 38 11: 64 12: 116 n: Xn 13: 200 14: 360 15: 628 16: 1128 17: 1998 18: 3606 n: 19: 20: 21: 22: 23: 24: Xn 6462 11722 21234 38754 70770 129970 n: Xn 25: 239008 26: 441370 27: 816604 28: 1515716 29: 2818928 30: 5256628 Figure 18.1-F: Excess (with respect to Gray code) of number of bits changed. 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 else { if ( (z&1)==0 ) // z (number of elements ==(k-1)) is even? 
{ for (int i=f[t-j]; i<=k-1; ++i) { f[t] = i; gen3( z+(i!=k-1), t+1, (i!=f[t-j]?t:j) ); } } else { for (int i=k-1; i>=f[t-j]; --i) { f[t] = i; gen3( z+(i!=k-1), t+1, (i!=f[t-j]?t:j) ); } } } } The variable z counts the number of maximal elements. The output with length-8 binary necklaces is shown in figure 18.1-E. Selecting the necklaces from the reversed list of complemented Gray codes of the n-bit binary words produces the same list. 18.1.6 L= L= L= L= L= L= L= L= L= L= L= L= L= L= L= L= Binary necklaces of length 2n via Gray-cycle leaders ‡ 16 cycles of length= 8 1....... [ 1....... ] 1......1 [ .1111111 ] 1.....1. [ ..1.1.1. ] 1.....11 [ 11.1.1.1 ] 1....1.. [ .1..11.. ] 1....1.1 [ 1.11..11 ] 1....11. [ 111..11. ] 1....111 [ ...11..1 ] 1..1.... [ .111.... ] 1..1...1 [ 1...1111 ] 1..1..1. [ 11.11.1. ] 1..1..11 [ ..1..1.1 ] 1..1.1.. [ 1.1111.. ] 1..1.1.1 [ .1....11 ] 1..1.11. [ ...1.11. ] 1..1.111 [ 111.1..1 ] L= 1..1.11. --> 11.111.1 --> 1.11..11 --> 111.1.1. --> 1..11111 --> 11.1.... --> 1.111... --> 111..1.. [ ...1.11. ] [ ....1.11 ] [ 1....1.1 ] [ 11....1. ] [ .11....1 ] [ 1.11.... ] [ .1.11... ] [ ..1.11.. ] L= 1..1.111 --> 11.111.. --> 1.11..1. --> 111.1.11 --> 1..1111. --> 11.1...1 --> 1.111..1 --> 111..1.1 [ 111.1..1 ] [ 1111.1.. ] [ .1111.1. ] [ ..1111.1 ] [ 1..1111. ] [ .1..1111 ] [ 1.1..111 ] [ 11.1..11 ] Figure 18.1-G: Left: the cycle leaders (minima) L of the Gray permutation with highest bit at index 7 and their bit-wise Reed-Muller transforms Y (L). Right: the last two cycles and the transforms of their elements. The algorithm for the generation of cycle leaders for the Gray permutation given section 2.12.1 on page 128 and relation 1.19-10c on page 53, written as Sk Y x = Y g k x (18.1-1) (Y is the yellow code, the bit-wise Reed-Muller transform) can be used for generating the necklaces of length 2n : The cyclic shifts of Y x are equal to Y g k x for k = 0, . . . , l − 1 where l is the cycle length. 
18.2: Lex-min De Bruijn sequence from necklaces 377 Figure 18.1-G shows the correspondence between cycles of the Gray permutation and cyclic shifts. It was generated with the program [FXT: comb/necklaces-via-gray-leaders-demo.cc]. If no better algorithm for the cycle leaders of the Gray permutation was known, we could generate them as Y −1 (N ) = Y (N ) where N are the necklaces of length 2n . The same idea, together with relation 1.19-11b on page 53, give the relation Sk B x = B e−k x (18.1-2) where B is the blue code and e the reversed Gray code. 18.1.7 Binary necklaces via cyclic shifts and complements ‡ 1: 2: 3: n = 3 ..1 .11 111 1: 2: 3: 4: 5: n = 4 ...1 ..11 .111 1111 .1.1 1: 2: 3: 4: 5: 6: 7: n = 5 ....1 ...11 ..111 .1111 11111 ..1.1 .1.11 n = 6 .....1 ....11 ...111 ..1111 .11111 111111 ..11.1 .11.11 ...1.1 ..1.11 .1.111 .1.1.1 ..1..1 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: n = 7 ......1 .....11 ....111 ...1111 ..11111 .111111 1111111 ..111.1 ...11.1 ..11.11 .11.111 ....1.1 ...1.11 ..1.111 .1.1111 ..1.1.1 .1.1.11 ...1..1 ..1..11 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: n = 8 .......1 ......11 .....111 ....1111 ...11111 ..111111 .1111111 11111111 ..1111.1 ...111.1 ..111.11 .111.111 ....11.1 ...11.11 ..11.111 .11.1111 ..11.1.1 ...11..1 [n=8 cont.] 19: ..11..11 20: .....1.1 21: ....1.11 22: ...1.111 23: ..1.1111 24: .1.11111 25: ..1.11.1 26: .1.11.11 27: ...1.1.1 28: ..1.1.11 29: .1.1.111 30: .1.1.1.1 31: ....1..1 32: ...1..11 33: ..1..111 34: ..1..1.1 35: ...1...1 Figure 18.1-H: Nonzero binary necklaces of lengths n = 3, 4, . . . , 8 as generated by the shift and complement algorithm. A recursive algorithm to generate all nonzero binary necklaces via cyclic shifts and complements of the lowest bit is described in [287]. 
An implementation of the method is given in [FXT: comb/necklacesigma-tau-demo.cc]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 inline ulong sigma(ulong x) { return bit_rotate_left(x, 1, n); } inline ulong tau(ulong x) { return x ^ 1; } void search(ulong y) { visit(y); ulong t = y; while ( 1 ) { t = sigma(t); ulong x = tau(t); if ( (x&1) && (x == bit_cyclic_min(x, n)) ) else break; } } search(x); The initial call is search(1). The generated ordering for lengths n = 3, 4, . . . , 8 is shown in figure 18.1-H. 18.2 Lex-min De Bruijn sequence from necklaces The lexicographically minimal De Bruijn sequence can be obtained from the necklaces in lexicographic order as shown in figure 18.2-A. Let W be a necklace with period p, and define its primitive part P (W ) to be the p rightmost digits of W . Then the lex-min De Bruijn sequence is the concatenation of the primitive parts of the necklaces in lex order. An implementation is [FXT: class debruijn in comb/debruijn.h]: 378 Chapter 18: Necklaces and Lyndon words neckl. period P(neckl.) 0000 0001 0002 0011 0012 0021 0022 0101 0102 0111 0112 0121 0122 0202 0211 0212 0221 0222 1111 1112 1122 1212 1222 2222 1 4 4 4 4 4 4 2 4 4 4 4 4 2 4 4 4 4 1 4 4 2 4 1 0 0001 0002 0011 0012 0021 0022 01 0102 0111 0112 0121 0122 02 0211 0212 0221 0222 1 1112 1122 12 1222 2 0 0001 0002 0011 0012 0021 0022 01 0102 0111 0112 [--snip--] 1122 12 1222 2 == 000010002001100120021002201010201110112012101220202110212022102221111211221212222 Figure 18.2-A: The 3-ary necklaces of length 4 (left) and their primitive parts (right). The concatenation of the primitive parts gives a De Bruijn sequence (bottom). 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 class debruijn : public necklace // Lexicographic minimal De Bruijn sequence. 
{ public: ulong i_; // position of current digit in current string public: debruijn(ulong m, ulong n) : necklace(m, n) { first_string(); } ~debruijn() { ; } ulong first_string() { necklace::first(); i_ = 1; return j_; } ulong next_string() // make new string, return its length { necklace::next(); i_ = (j_ != 0); return j_; } ulong next_digit() // Return current digit and move to next digit. // Return m if previous was last. { if ( i_ == 0 ) return necklace::m1_ + 1; ulong d = a_[ i_ ]; if ( i_ == j_ ) next_string(); else ++i_; return d; } ulong first_digit() { first_string(); return next_digit(); 18.3: The number of binary necklaces 43 44 379 } }; Usage is demonstrated in [FXT: comb/debruijn-demo.cc]: 1 2 3 4 5 6 7 8 9 10 11 ulong m = 3; // m-ary De Bruijn sequence ulong n = 4; // length = m**n debruijn S(m, n); ulong i = S.first_string(); do { cout << " "; for (ulong u=1; u<=i; ++u) cout << S.a_[u]; i = S.next_string(); } while ( i ); // note: one-based array For digit by digit generation, use 1 2 3 4 5 6 7 ulong i = S.first_digit(); do { cout << i; i = S.next_digit(); } while ( i!=m ); A special version for binary necklaces is [FXT: class binary debruijn in comb/binary-debruijn.h]. 18.3 The number of binary necklaces n : Nn 1: 2 2: 3 3: 4 4: 6 5: 8 6: 14 7: 20 8: 36 9: 60 10: 108 n: Nn 11: 188 12: 352 13: 632 14: 1182 15: 2192 16: 4116 17: 7712 18: 14602 19: 27596 20: 52488 n: Nn 21: 99880 22: 190746 23: 364724 24: 699252 25: 1342184 26: 2581428 27: 4971068 28: 9587580 29: 18512792 30: 35792568 n: Nn 31: 69273668 32: 134219796 33: 260301176 34: 505294128 35: 981706832 36: 1908881900 37: 3714566312 38: 7233642930 39: 14096303344 40: 27487816992 Figure 18.3-A: The number of binary necklaces for n ≤ 40. 
     n: L_n       n: L_n         n: L_n            n: L_n
     1: 2        11: 186        21: 99858         31: 69273666
     2: 1        12: 335        22: 190557        32: 134215680
     3: 2        13: 630        23: 364722        33: 260300986
     4: 3        14: 1161       24: 698870        34: 505286415
     5: 6        15: 2182       25: 1342176       35: 981706806
     6: 9        16: 4080       26: 2580795       36: 1908866960
     7: 18       17: 7710       27: 4971008       37: 3714566310
     8: 30       18: 14532      28: 9586395       38: 7233615333
     9: 56       19: 27594      29: 18512790      39: 14096302710
    10: 99       20: 52377      30: 35790267      40: 27487764474

Figure 18.3-B: The number of binary Lyndon words for n ≤ 40.

The number of binary necklaces of length n equals

$$N_n = \frac{1}{n} \sum_{d \mid n} \varphi(d)\, 2^{n/d} = \frac{1}{n} \sum_{j=1}^{n} 2^{\gcd(j,n)} \tag{18.3-1}$$

The values for n ≤ 40 are shown in figure 18.3-A. The sequence is entry A000031 in [312]. The number of Lyndon words (aperiodic necklaces) equals

$$L_n = \frac{1}{n} \sum_{d \mid n} \mu(d)\, 2^{n/d} = \frac{1}{n} \sum_{d \mid n} \mu(n/d)\, 2^{d} \tag{18.3-2}$$

The Möbius function µ is defined in relation 37.1-6 on page 705. The values for n ≤ 40 are given in figure 18.3-B. The sequence is entry A001037 in [312]. Replacing 2 by k in the formulas for $N_n$ and $L_n$ gives expressions for k-ary necklaces and Lyndon words.

For prime n = p we have $L_p = N_p - 2$ and

$$L_p = \frac{2^p - 2}{p} = \frac{1}{p} \sum_{k=1}^{p-1} \binom{p}{k} \tag{18.3-3}$$

The latter form tells us that there are exactly $\binom{p}{k}/p$ Lyndon words with k ones for 1 ≤ k ≤ p − 1. The difference of 2 is due to the necklaces that consist of all zeros or ones.

The number of irreducible binary polynomials (see section 40.6 on page 843) of degree n also equals $L_n$. For the equivalence between necklaces and irreducible polynomials see section 40.10 on page 856.

There are $2^n$ binary words of length n, each having some period d that divides n. A word with period d is a repetition of a cyclic shift of a Lyndon word of length d, and there are d different shifts of that Lyndon word, thereby

$$2^n = \sum_{d \mid n} d\, L_d \tag{18.3-4}$$

Möbius inversion gives relation 18.3-2.
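The two counting formulas are easy to evaluate directly. The following sketch uses trial division for Euler's totient and the Möbius function; the function names euler_phi, moebius_mu, num_necklaces and num_lyndon are hypothetical, and n ≤ 60 is assumed so that all sums fit into 64 bits.

```cpp
#include <cstdint>

// Euler's totient function by trial division.
std::uint64_t euler_phi(std::uint64_t n)
{
    std::uint64_t r = n;
    for (std::uint64_t p = 2; p * p <= n; ++p)
        if ( n % p == 0 )
        {
            while ( n % p == 0 )  n /= p;
            r -= r / p;
        }
    if ( n > 1 )  r -= r / n;  // remaining prime factor
    return r;
}

// Moebius function by trial division.
int moebius_mu(std::uint64_t n)
{
    int m = 1;
    for (std::uint64_t p = 2; p * p <= n; ++p)
        if ( n % p == 0 )
        {
            n /= p;
            if ( n % p == 0 )  return 0;  // n has a square factor
            m = -m;
        }
    if ( n > 1 )  m = -m;  // remaining prime factor
    return m;
}

// Number of binary necklaces: (1/n) * sum_{d|n} phi(d) * 2^(n/d)
std::uint64_t num_necklaces(unsigned n)
{
    std::uint64_t s = 0;
    for (unsigned d = 1; d <= n; ++d)
        if ( n % d == 0 )  s += euler_phi(d) << (n / d);
    return s / n;
}

// Number of binary Lyndon words: (1/n) * sum_{d|n} mu(d) * 2^(n/d)
std::uint64_t num_lyndon(unsigned n)
{
    std::int64_t s = 0;  // signed, because terms with mu(d) = -1 are subtracted
    for (unsigned d = 1; d <= n; ++d)
        if ( n % d == 0 )  s += moebius_mu(d) * (std::int64_t(1) << (n / d));
    return std::uint64_t(s) / n;
}
```

For n = 8 the sums are (256 + 16 + 8 + 8)/8 = 36 necklaces and (256 − 16)/8 = 30 Lyndon words.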
A necklace of length n and period d is the concatenation of n/d copies of a Lyndon word of length d, so

$$N_n = \sum_{d \mid n} L_d \tag{18.3-5}$$

We note the relations (see section 37.2 on page 709)

$$(1 - 2x) = \prod_{k=1}^{\infty} (1 - x^k)^{L_k} \tag{18.3-6a}$$

$$\sum_{k=1}^{\infty} L_k\, x^k = \sum_{k=1}^{\infty} \frac{-\mu(k)}{k} \log\left(1 - 2\,x^k\right) \tag{18.3-6b}$$

Defining

$$\eta_B(x) := \prod_{k=1}^{\infty} \left(1 - B\, x^k\right) \tag{18.3-7a}$$

we have

$$\eta_2(x) = \prod_{k=1}^{\infty} (1 - x^k)^{N_k} \tag{18.3-7b}$$

$$\eta_2(x) = \prod_{k=1}^{\infty} \eta_1(x^k)^{L_k} \tag{18.3-7c}$$

      n:    N_n   z=0    1    2    3    4    5    6    7    8    9   10
      1:      2     1    1
      2:      3     1    1    1
      3:      4     1    1    1    1
      4:      6     1    1    2    1    1
      5:      8     1    1    2    2    1    1
      6:     14     1    1    3    4    3    1    1
      7:     20     1    1    3    5    5    3    1    1
      8:     36     1    1    4    7   10    7    4    1    1
      9:     60     1    1    4   10   14   14   10    4    1    1
     10:    108     1    1    5   12   22   26   22   12    5    1    1
     11:    188     1    1    5   15   30   42   42   30   15    5    1
     12:    352     1    1    6   19   43   66   80   66   43   19    6
     13:    632     1    1    6   22   55   99  132  132   99   55   22
     14:   1182     1    1    7   26   73  143  217  246  217  143   73
     15:   2192     1    1    7   31   91  201  335  429  429  335  201
     16:   4116     1    1    8   35  116  273  504  715  810  715  504
     17:   7712     1    1    8   40  140  364  728 1144 1430 1430 1144
     18:  14602     1    1    9   46  172  476 1038 1768 2438 2704 2438
     19:  27596     1    1    9   51  204  612 1428 2652 3978 4862 4862
     20:  52488     1    1   10   57  245  776 1944 3876 6310 8398 9252

Figure 18.3-C: The number N(n,z) of binary necklaces of length n with z zeros.
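The product relation 18.3-6a, which states that 1 − 2x equals the product of the factors (1 − x^k)^{L_k} over all k ≥ 1, can be checked numerically on truncated power series. The sketch below is an illustration under stated assumptions: the name lyndon_product_coefficients is hypothetical, the values L_k are obtained by solving relation 18.3-4 for L_k, and the truncation order M is assumed to be small (say M ≤ 20).

```cpp
#include <cstdint>
#include <vector>

// Coefficients c[0..M] of prod_{k=1}^{M} (1 - x^k)^{L_k}, truncated after x^M.
// Factors with k > M cannot change these coefficients.
std::vector<std::int64_t> lyndon_product_coefficients(unsigned M)
{
    // L_k from 2^k = sum_{d|k} d*L_d, solved for L_k:
    std::vector<std::int64_t> L(M + 1, 0);
    for (unsigned k = 1; k <= M; ++k)
    {
        std::int64_t s = std::int64_t(1) << k;
        for (unsigned d = 1; d < k; ++d)
            if ( k % d == 0 )  s -= std::int64_t(d) * L[d];
        L[k] = s / k;
    }

    std::vector<std::int64_t> c(M + 1, 0);
    c[0] = 1;  // start with the constant polynomial 1
    for (unsigned k = 1; k <= M; ++k)
        for (std::int64_t e = 0; e < L[k]; ++e)
            for (unsigned d = M; d >= k; --d)  // multiply by (1 - x^k) in place
                c[d] -= c[d - k];
    return c;
}
```

If the relation holds, the returned coefficients are 1, −2, followed by zeros. For M = 2 this is the computation (1 − x)^2 (1 − x^2) = 1 − 2x + 0 x^2 + (higher terms).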
      n:    L_n   z=0    1    2    3    4    5    6    7    8    9   10
      1:      2     1    1
      2:      1     0    1    0
      3:      2     0    1    1    0
      4:      3     0    1    1    1    0
      5:      6     0    1    2    2    1    0
      6:      9     0    1    2    3    2    1    0
      7:     18     0    1    3    5    5    3    1    0
      8:     30     0    1    3    7    8    7    3    1    0
      9:     56     0    1    4    9   14   14    9    4    1    0
     10:     99     0    1    4   12   20   25   20   12    4    1    0
     11:    186     0    1    5   15   30   42   42   30   15    5    1
     12:    335     0    1    5   18   40   66   75   66   40   18    5
     13:    630     0    1    6   22   55   99  132  132   99   55   22
     14:   1161     0    1    6   26   70  143  212  245  212  143   70
     15:   2182     0    1    7   30   91  200  333  429  429  333  200
     16:   4080     0    1    7   35  112  273  497  715  800  715  497
     17:   7710     0    1    8   40  140  364  728 1144 1430 1430 1144
     18:  14532     0    1    8   45  168  476 1026 1768 2424 2700 2424
     19:  27594     0    1    9   51  204  612 1428 2652 3978 4862 4862
     20:  52377     0    1    9   57  240  775 1932 3876 6288 8398 9225

Figure 18.3-D: The number L(n,z) of binary Lyndon words of length n with z zeros.

18.3.1 Binary necklaces with fixed density

Let $N_{(n,n_0)}$ be the number of binary length-n necklaces with exactly $n_0$ zeros (and $n_1 = n - n_0$ ones), the necklaces with fixed density. We have

$$N_{(n,n_0)} = \frac{1}{n} \sum_{j \mid \gcd(n,n_0)} \varphi(j) \binom{n/j}{n_0/j} \tag{18.3-8}$$

Bit-wise complementing gives the symmetry relation $N_{(n,n_0)} = N_{(n,n-n_0)} = N_{(n,n_1)}$. A table of small values is given in figure 18.3-C.

Let $L_{(n,n_0)}$ be the number of binary length-n Lyndon words with exactly $n_0$ zeros (Lyndon words with fixed density), then

$$L_{(n,n_0)} = \frac{1}{n} \sum_{j \mid \gcd(n,n_0)} \mu(j) \binom{n/j}{n_0/j} \tag{18.3-9}$$

The symmetry relation is the same as for $N_{(n,n_0)}$. A table of small values is given in figure 18.3-D.

18.3.2 Binary necklaces with even or odd weight

Summing $N_{(n,k)}$ over all even or odd k ≤ n gives the number of necklaces of even (symbol $E_n$) or odd ($O_n$) weight, respectively. The first few values, the differences $E_n - O_n$, and the sums $E_n + O_n = N_n$:

Neckl.
n: 1 2 3 4 5 En : 1 2 2 4 4 On : 1 1 2 2 4 En − On : 0 1 0 2 0 En + On : 2 3 4 6 8 6 7 8 9 8 10 20 30 6 10 16 30 2 0 4 0 14 20 36 60 10 11 12 13 14 15 16 17 56 94 180 316 596 1096 2068 3856 52 94 172 316 586 1096 2048 3856 4 0 8 0 10 0 20 0 108 188 352 632 1182 2192 4116 7712 The number of Lyndon words of even (en ) and odd (on ) weight can be computed in the same way: Lyn. n : 1 2 en : 0 0 on : 1 1 en − on : −1 −1 e n + on : 1 1 3 4 5 1 1 3 1 2 3 0 −1 0 2 3 6 6 7 8 9 10 11 12 13 14 15 16 17 4 9 14 28 48 93 165 315 576 1091 2032 3855 5 9 16 28 51 93 170 315 585 1091 2048 3855 −1 0 −2 0 −3 0 −5 0 −9 0 −16 0 9 18 30 56 99 186 335 630 1161 2182 4080 7710 The differences between the number of necklaces and Lyndon words are: n: 1 2 3 4 5 6 7 8 9 En − en : 1 2 1 3 1 4 1 6 2 O n − on : 0 0 1 0 1 1 1 0 2 E n − on : 0 1 1 2 1 3 1 4 2 On − en : 1 1 1 1 1 2 1 2 2 18.3.3 10 8 1 5 4 11 12 1 15 1 2 1 10 1 7 13 14 1 20 1 1 1 11 1 10 15 16 5 36 5 0 5 20 5 16 17 1 1 1 1 Necklaces with fixed content Let N(n0 ,n1 ,...,nk−1 ) be the number of k-symbol length-n necklaces with nj occurrences of symbol j, the P number of such necklaces with fixed content, we have (n = j Smat; // class matrix // matrix with integer entries 19.1: Hadamard matrices via LFSR Signed SRS: - + + + - + + - - + - + - - Hadamard matrix H: + + + + + + + + + + + + + + + + + - + + + - + + - - + - + - - + - - + + + - + + - - + - + - + - - - + + + - + + - - + - + + - - - - + + + - + + - - + - + + + - - - - + + + - + + - - + + - + - - - - + + + - + + - - + + + - + - - - - + + + - + + - + - + - + - - - - + + + - + + + - - + - + - - - - + + + - + + + + - - + - + - - - - + + + - + + + + - - + - + - - - - + + + + - + + - - + - + - - - - + + + + + - + + - - + - + - - - - + + + + + - + + - - + - + - - - - + + + + + - + + - - + - + - - - - 385 Signed SRS: - + + - + - Hadamard matrix H: + + + + + + + + + - + + - + - + - - + + - + + - - - + + - + + + - - - + + + - + - - - + + + + - + - - - + + + + - + - - - Signed SRS: - + 
Hadamard matrix H: + + + + + - + + - - + + + - - Figure 19.1-A: Hadamard matrices created with binary shift register sequences (SRS) of maximum length. Only the sign of the entries is given, all entries are ±1. 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 [--snip--] ulong n = 5; ulong N = 1UL << n; [--snip--] // --- create signed SRS: int vec[N-1]; lfsr S(n); for (ulong k=0; k inline void copy_cyclic(const Type *src, Type *dst, ulong n, ulong s) // Copy array src[] to dst[] // starting from position s in src[] // wrap around end of src[] (src[n-1]) // // src[] is assumed to be of length n // dst[] must be length n at least // // Equivalent to: { acopy(src, dst, n); rotate_right(dst, n, s)} { ulong k = 0; while ( s(vec, q, -1); vec[0] = 0; for (ulong k=1; k<(q+1)/2; ++k) vec[(k*k)%q] = +1; [--snip--] // --- create Q x Q conference matrix: Smat C(Q,Q); C.set(0,0, 0); for (ulong c=1; c 2*x^2+1 return ( subst(p, ’x, q) ); } The inverse routine is 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 num2pol(n,q)= \\ Return polynomial for number n. { local(p, mq, k); p = Pol(0,’x); k = 0; while ( 0!=n, mq = n % q; p += mq * (’x)^k; n -= mq; n \= q; k++; ); return( p ); } n The quadratic character of an element z can be determined by computing z (q −1)/2 modulo the field polynomial. The result will be zero for z = 0, else ±1. For our purpose its is better to precompute a table of the quadratic characters for later lookup: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 quadcharvec(fp, q)= \\ Return a table of quadratic characters in GF(q^n) \\ fp is the field polynomial. 
{
    local(n, qn, sv, pl);
    n=poldegree(fp);
    qn=q^n-1;
    sv=vector(qn+1, j, -1);
    sv[1] = 0;
    for (k=1, qn,
        pl = num2pol(k,q);
        pl = Mod(Mod(1,q)*pl, fp);
        sq = pl * pl;
        sq = lift(sq); \\ remove mod
        i = pol2num( sq, q );
        sv[i+1] = +1;
    );
    return( sv );
}

With this table we can compute the quadratic characters of the difference of two elements efficiently:

    getquadchar_v(n1, n2, q, fp, sv)=
    \\ Return the quadratic character of (n2-n1) in GF(q^n)
    \\ Table lookup method
    {
        local(p1, p2, d, nd, sc);
        if ( n1==n2, return(0) );
        p1 = num2pol(n1, q);
        p2 = num2pol(n2, q);
        d = (p2-p1) % fp;
        nd = pol2num(d, q);
        sc = sv[nd+1];
        return( sc );
    }

Now we can construct conference matrices:

    matconference(q, fp, sv)=
    \\ Return a QxQ conference matrix.
    \\ q an odd prime.
    \\ fp an irreducible polynomial modulo q.
    \\ sv table of quadratic characters in GF(q^n)
    \\ where n is the degree of fp.
    {
        local(y, Q, C, n);
        n = poldegree(fp);
        Q=q^n+1;
        if ( sv[2]==sv[Q-1], y=+1, y=-1 ); \\ symmetry
        C = matrix(Q,Q);
        for (k=2, Q, C[1,k]=+1); \\ first row
        for (k=2, Q, C[k,1]=y); \\ first column
        for (r=2, Q,
            for (c=2, Q,
                sc = getquadchar_v(r-2, c-2, q, fp, sv);
                C[r,c] = sc;
            );
        );
        return( C );
    }

    q = 3  fp = x^2 + 1  GF(3^2)
    Table of quadratic characters:
      0 + + + - - + - -
    10x10 conference matrix C:
      0 + + + + + + + + +
      - 0 + + + - - + - -
      - + 0 + - + - - + -
      - + + 0 - - + - - +
      - + - - 0 + + + - -
      - - + - + 0 + - + -
      - - - + + + 0 - - +
      - + - - + - - 0 + +
      - - + - - + - + 0 +
      - - - + - - + + + 0

Figure 19.3-A: A 10 × 10 conference matrix for q = 3 and the field polynomial $f = x^2 + 1$.

To compute a Q × Q conference matrix where $Q = q^n + 1$ we need to find a polynomial of degree n that is irreducible modulo q. With q = 3 and the field polynomial $f = x^2 + 1$ (so n = 2) we get the 10 × 10 conference matrix shown in figure 19.3-A.
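The construction for q = 3 and $f = x^2 + 1$ can be cross-checked without GP. The following is a minimal C++ sketch, not part of FXT: the names gf9_mul(), gf9_sub(), quadchar_table(), conference_matrix() and is_conference() are hypothetical, and the field arithmetic is hard-wired to GF(9) under the assumption that element number k encodes the polynomial c1*x + c0 with k = 3*c1 + c0 (the same numbering as pol2num). It builds the table of quadratic characters, fills the matrix in the same way as matconference(), and tests the defining property $C\,C^T = (Q-1)\,I$.

```cpp
#include <array>
#include <cassert>

// Sketch for GF(9) = GF(3)[x]/(x^2+1) only (assumption: hard-wired field).
// Element number k encodes c1*x + c0 with k = 3*c1 + c0.
constexpr int q = 3;       // the prime
constexpr int NE = 9;      // number of field elements, q^2
constexpr int Q = NE + 1;  // size of the conference matrix

// Product in GF(9), using x^2 = -1.
int gf9_mul(int a, int b)
{
    const int a1 = a / q, a0 = a % q, b1 = b / q, b0 = b % q;
    int c0 = (a0*b0 - a1*b1) % q;
    if ( c0 < 0 )  c0 += q;
    const int c1 = (a1*b0 + a0*b1) % q;
    return q*c1 + c0;
}

// Difference b - a in GF(9): coefficient-wise subtraction modulo 3.
int gf9_sub(int b, int a)
{
    const int c1 = (b / q - a / q + q) % q;
    const int c0 = (b % q - a % q + q) % q;
    return q*c1 + c0;
}

// Quadratic characters: 0 for zero, +1 for nonzero squares, -1 otherwise.
std::array<int, NE> quadchar_table()
{
    std::array<int, NE> sv;
    sv.fill(-1);
    sv[0] = 0;
    for (int k = 1; k < NE; ++k)  sv[ gf9_mul(k, k) ] = +1;
    return sv;
}

typedef std::array<std::array<int, Q>, Q> Matrix;

// Fill the matrix as matconference() does (indices are zero-based here).
Matrix conference_matrix(const std::array<int, NE> &sv)
{
    Matrix C{};  // all entries zero, in particular C[0][0]
    const int y = ( sv[1] == sv[NE-1] ? +1 : -1 );  // same test as sv[2]==sv[Q-1] in GP
    for (int k = 1; k < Q; ++k)  { C[0][k] = +1;  C[k][0] = y; }
    for (int r = 1; r < Q; ++r)
        for (int c = 1; c < Q; ++c)
            C[r][c] = sv[ gf9_sub(c-1, r-1) ];
    return C;
}

// Check C * C^T == (Q-1) * identity.
bool is_conference(const Matrix &C)
{
    for (int r = 0; r < Q; ++r)
        for (int s = 0; s < Q; ++s)
        {
            int dot = 0;
            for (int k = 0; k < Q; ++k)  dot += C[r][k] * C[s][k];
            if ( dot != ( r == s ? Q-1 : 0 ) )  return false;
        }
    return true;
}
```

Calling is_conference(conference_matrix(quadchar_table())) should confirm the property for the matrix of figure 19.3-A. For other fields the two arithmetic helpers would have to be replaced by general polynomial arithmetic modulo the field polynomial.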
A conference matrix for q = 3 and $f = x^3 - x + 1$ is given in figure 19.3-B. Hadamard matrices can be created in the same manner as before, the symmetry criterion being whether $q^n \equiv \pm 1 \bmod 4$. The conference matrices obtained are of size $Q = q^n + 1$ where q is an odd prime. The values Q ≤ 100 are (see sequence A061344 in [312]):

4, 6, 8, 10, 12, 14, 18, 20, 24, 26, 28, 30, 32, 38, 42, 44, 48, 50, 54, 60, 62, 68, 72, 74, 80, 82, 84, 90, 98

Our construction gives no conference matrix for any odd Q, nor for these even values Q ≤ 100:

2, 16, 22, 34, 36, 40, 46, 52, 56, 58, 64, 66, 70, 76, 78, 86, 88, 92, 94, 96, 100

For example, Q = 16 = 15 + 1 = 3 · 5 + 1 does not have the required form, as 15 is not a prime power.

    q = 3  fp = x^3 - x + 1  GF(3^3)
    Table of quadratic characters:
      0 + - - - - + + + + - + + + - + + - - - + - + - - + -
    28x28 conference matrix C:
      [28 × 28 matrix of entries 0, +, -; the entries are damaged in the source text and not reproduced here]

Figure 19.3-B: A 28 × 28 conference matrix for q = 3 and the field polynomial $f = x^3 - x + 1$.

If a conference matrix of size Q exists, then we can create Hadamard matrices of sizes N = Q whenever $q^n \equiv 3 \bmod 4$ and N = 2Q whenever $q^n \equiv 1 \bmod 4$. Further, if Hadamard matrices of sizes N and M exist, then the Kronecker product of those matrices is an (N · M) × (N · M) Hadamard matrix. The values of N = 4k ≤ 2000 such that this construction does not give an N × N Hadamard matrix are:

92, 116, 156, 172, 184, 188, 232, 236, 260, 268, 292, 324, 356, 372, 376, 404, 412, 428, 436, 452, 472, 476, 508, 520, 532, 536, 584, 596, 604, 612, 652, 668, 712, 716, 732, 756, 764, 772, 808, 836, 852, 856, 872, 876, 892, 904, 932, 940, 944, 952, 956, 964, 980, 988, 996, 1004, 1012, 1016, 1028, 1036, 1068, 1072, 1076, 1100, 1108, 1132, 1148, 1168, 1180, 1192, 1196, 1208, 1212, 1220, 1244, 1268, 1276, 1300, 1316, 1336, 1340, 1364, 1372, 1380, 1388, 1396, 1412, 1432, 1436, 1444, 1464, 1476, 1492, 1508, 1528, 1556, 1564, 1588, 1604, 1612, 1616, 1636, 1652, 1672, 1676, 1692, 1704, 1712, 1732, 1740, 1744, 1752, 1772, 1780, 1796, 1804, 1808, 1820, 1828, 1836, 1844, 1852, 1864, 1888, 1892, 1900, 1912, 1916, 1928, 1940, 1948, 1960, 1964, 1972, 1976, 1992

This is sequence A046116 in [312]. It can be computed by starting with a list of all numbers of the form 4k and deleting all values of the form $2^a (q + 1)$ where q is a power of an odd prime, as well as all products of deleted values (for example, 1904 = 28 · 68 is not of the form $2^a (q + 1)$ but is obtained as a Kronecker product). Constructions for Hadamard matrices for numbers of certain forms are known, see [234] and [157].
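The list of sizes not reached by the construction can be recomputed with a short sieve. The following C++ sketch is an illustration only; the function names is_prime_power() and sizes_not_obtained() are hypothetical and not taken from FXT. It assumes exactly the ingredients named in the text: the 2 × 2 Hadamard matrix, the sizes N = q + 1 for prime powers q ≡ 3 mod 4, the sizes N = 2 (q + 1) for prime powers q ≡ 1 mod 4, and closure under Kronecker products.

```cpp
#include <cassert>
#include <set>
#include <vector>

typedef unsigned long ulong;

// Return true if m = p^e for a prime p and e >= 1.
bool is_prime_power(ulong m)
{
    if ( m < 2 )  return false;
    for (ulong p = 2; p*p <= m; ++p)
    {
        if ( m % p == 0 )  // p is the smallest prime factor of m
        {
            while ( m % p == 0 )  m /= p;
            return ( m == 1 );
        }
    }
    return true;  // m is prime
}

// Return all N = 4k <= limit that are not products of the basic sizes.
std::vector<ulong> sizes_not_obtained(ulong limit)
{
    assert( limit >= 4 );

    // Basic sizes: 2, and the sizes obtained from conference matrices.
    std::set<ulong> base;
    base.insert(2);
    for (ulong qq = 3; qq + 1 <= limit; qq += 2)  // odd prime powers
    {
        if ( !is_prime_power(qq) )  continue;
        const ulong N = ( qq % 4 == 3 ? qq + 1 : 2 * (qq + 1) );
        if ( N <= limit )  base.insert(N);
    }

    // Closure under products (Kronecker products of Hadamard matrices).
    std::set<ulong> sizes;
    sizes.insert(1);
    std::vector<ulong> todo(1, 1UL);
    while ( !todo.empty() )
    {
        const ulong s = todo.back();
        todo.pop_back();
        for (ulong b : base)
        {
            const ulong t = s * b;
            if ( t <= limit && sizes.insert(t).second )  todo.push_back(t);
        }
    }

    std::vector<ulong> missing;
    for (ulong N = 4; N <= limit; N += 4)
        if ( sizes.count(N) == 0 )  missing.push_back(N);
    return missing;
}
```

With limit = 2000 the returned vector should start with 92, 116, 156, 172. The sieve is quadratic in the worst case, which is of no concern for limits of this size.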
Whether Hadamard matrices exist for all values N = 4k is an open problem. A readable source about constructions for Hadamard matrices is [316]. Hadamard matrices for all N ≤ 256 are given in [313].

Chapter 20

Searching paths in directed graphs ‡

We describe how certain combinatorial structures can be represented as paths or cycles in a directed graph. As an example consider Gray codes of n-bit binary words: we are looking for sequences of all $2^n$ binary words such that only one bit changes between two successive words. A convenient representation of the search space is that of a graph. The nodes are the binary words, and an edge is drawn between two nodes if their values differ in exactly one bit. Every path that visits each node of that graph exactly once corresponds to a Gray code. If the path is a cycle, a Gray cycle was found.

Depending on the size of the problem, we can

1. try to find at least one object,
2. generate all objects,
3. show that no such object exists.

The method used is usually called backtracking. We will see how to reduce the search space if additional constraints are imposed on the paths. Finally, we show how careful optimization can lead to surprising algorithms for objects of a size where one would hardly expect to obtain a result at all. In fact, Gray cycles through the n-bit binary Lyndon words for all odd n ≤ 37 are determined.

We use graphs solely as a tool for finding combinatorial structures. For algorithms dealing with the properties of graphs see, for example, [220] and [307].

Terminology and conventions

We will use the terms node (instead of vertex) and edge (sometimes called arc). We restrict our attention to directed graphs (or digraphs), as undirected graphs are just a special case of these: an edge in an undirected graph corresponds to two antiparallel edges (think: ‘arrows’) in a directed graph. A length-k path is a sequence of nodes where an edge leads from each node to its successor.
A path is called simple if the nodes are pairwise distinct. We restrict our attention to simple paths of length N where N is the number of nodes of the graph. We use the term full path for a simple path of length N. If in a simple path there is an edge from the last node of the path to the starting node, the path is a cycle (or circuit). A full path that is a cycle is called a Hamiltonian cycle; a graph containing such a cycle is called Hamiltonian.

We allow for loops (edges that start and end at the same node). Graphs that contain loops are called pseudo graphs. The algorithms used will effectively ignore loops. We disallow multigraphs (where multiple edges can start and end at the same two nodes), as these would lead to repeated output of identical objects.

The neighbors of a node are those nodes to which outgoing edges point. Neighbors can be reached with one step. The neighbors of a node are called adjacent to the node. The adjacency matrix of a graph with N nodes is an N × N matrix A where $A_{i,j} = 1$ if there is an edge from node i to node j, else $A_{i,j} = 0$. While this representation is easy to implement (and to modify later), we will not use it, as the memory requirement would be prohibitive for large graphs.

20.1 Representation of digraphs

For our purposes a static implementation of the graph as arrays of nodes and (outgoing) edges will suffice. The container class digraph merely allocates memory for the nodes and edges.
The correct initialization is left to the user [FXT: class digraph in graph/digraph.h]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 class digraph { public: ulong ng_; // number of Nodes of Graph ulong *ep_; // e[ep[k]], ..., e[ep[k+1]-1]: outgoing connections of node k ulong *e_; // outgoing connections (Edges) ulong *vn_; // optional: sorted values for nodes // if vn is used, then node k must correspond to vn[k] public: digraph(ulong ng, ulong ne, ulong *&ep, ulong *&e, bool vnq=false) : ng_(0), ep_(0), e_(0), vn_(0) { ng_ = ng; ep_ = new ulong[ng_+1]; e_ = new ulong[ne]; ep = ep_; e = e_; if ( vnq ) vn_ = new ulong[ng_]; } ~digraph() { delete [] ep_; delete [] e_; if ( vn_ ) delete [] vn_; } [--snip--] void get_edge_idx(ulong p, ulong &fe, ulong &en) const // Setup fe and en so that the nodes reachable from p are // e[fe], e[fe+1], ..., e[en-1]. // Must have: 0<=p1-->3 { dp.mark(0, ns); dp.mark(1, ns); p = 3; } dp.all_cond_paths(pfunc, cfunc_mac, ns, p, maxnp); return 0; } The function used to impose the MAC condition is: 1 2 3 4 5 6 7 8 9 10 11 ulong cf_nb; // number of bits, set in main() bool cfunc_mac(digraph_paths &dp, ulong ns) // Condition: difference of successive delta values (modulo n) == +-1 { // path initialized, we have ns>=2 ulong p = dp.rv_[ns], p1 = dp.rv_[ns-1], p2 = dp.rv_[ns-2]; ulong c = p ^ p1, c1 = p1 ^ p2; if ( c & bit_rotate_left(c1,1,cf_nb) ) return true; if ( c1 & bit_rotate_left(c,1,cf_nb) ) return true; return false; } Chapter 20: Searching paths in directed graphs ‡ 400 We find paths for n ≤ 7 (n = 7 takes about 15 minutes). Whether MAC Gray codes exist for n ≥ 8 is unknown (none is found with a 40 hour search). 20.3.2 Adjacent changes (AC) Gray codes For AC paths we can only discard track-reflected solutions, the canonical paths are those where the delta sequence starts with a value ≤ dn/2e. 
A function to impose the AC condition is 1 2 3 4 5 6 7 8 9 10 11 ulong cf_mt; // mid track < cf_mt, set in main() bool cfunc_ac(digraph_paths &dp, ulong ns) // Condition: difference of successive delta values == +-1 { if ( ns<2 ) return (dp.rv_[1] < cf_mt); // avoid track-reflected solutions ulong p = dp.rv_[ns], p1 = dp.rv_[ns-1], p2 = dp.rv_[ns-2]; ulong c = p ^ p1, c1 = p1 ^ p2; if ( c & (c1<<1) ) return true; if ( c1 & (c<<1) ) return true; return false; } 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: ..... 0 ..1.. 1 .11.. 2 111.. 3 1.1.. 2 1.... 1 1..1. 2 1.11. 3 1111. 4 11.1. 3 11... 2 11..1 3 11.11 4 11111 5 1.111 4 1..11 3 1...1 2 1.1.1 3 111.1 4 .11.1 3 ..1.1 2 ....1 1 ...11 2 ..111 3 .1111 4 .1.11 3 .1..1 2 .1... 1 .1.1. 2 .111. 3 ..11. 2 ...1. 1 0 4 12 28 20 16 18 22 30 26 24 25 27 31 23 19 17 21 29 13 5 1 3 7 15 11 9 8 10 14 6 2 ..1.. .1... 1.... .1... ..1.. ...1. ..1.. .1... ..1.. ...1. ....1 ...1. ..1.. .1... ..1.. ...1. ..1.. .1... 1.... .1... ..1.. ...1. ..1.. .1... ..1.. ...1. ....1 ...1. ..1.. .1... ..1.. [...1. 2 3 4 3 2 1 2 3 2 1 0 1 2 3 2 1 2 3 4 3 2 1 2 3 2 1 0 1 2 3 2 1] 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: ..... 0 ....1 1 ...11 2 ..111 3 .1111 4 .1.11 3 .1..1 2 .11.1 3 ..1.1 2 1.1.1 3 111.1 4 11..1 3 11.11 4 11111 5 1.111 4 1..11 3 1...1 2 1.... 1 1..1. 2 1.11. 3 1111. 4 11.1. 3 11... 2 111.. 3 1.1.. 2 ..1.. 1 .11.. 2 .1... 1 .1.1. 2 .111. 3 ..11. 2 ...1. 1 0 1 3 7 15 11 9 13 5 21 29 25 27 31 23 19 17 16 18 22 30 26 24 28 20 4 12 8 10 14 6 2 ....1 ...1. ..1.. .1... ..1.. ...1. ..1.. .1... 1.... .1... ..1.. ...1. ..1.. .1... ..1.. ...1. ....1 ...1. ..1.. .1... ..1.. ...1. ..1.. .1... 1.... .1... ..1.. ...1. ..1.. .1... ..1.. [...1. 0 1 2 3 2 1 2 3 4 3 2 1 2 3 2 1 0 1 2 3 2 1 2 3 4 3 2 1 2 3 2 1] Figure 20.3-B: Two 5-bit adjacent changes (AC) Gray codes that are cycles. 
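The AC condition can also be verified after the fact on a finished list of words. The C++ sketch below is an illustration with hypothetical names (single_bit_index(), delta_sequence(), is_ac_gray()), independent of the digraph_paths class. It computes the delta sequence of a list of words and tests that successive delta values differ by exactly one; with cyclic set to true, the step from the last word back to the first is included. It does not test that the words are pairwise distinct.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned long ulong;

// Return the index of the single set bit of x,
// or -1 if x does not have exactly one bit set.
int single_bit_index(ulong x)
{
    if ( x == 0 || (x & (x-1)) != 0 )  return -1;
    int i = 0;
    while ( (x >>= 1) != 0 )  ++i;
    return i;
}

// Write the delta sequence of w[] to d[].
// Return false if some step changes more or fewer than one bit.
bool delta_sequence(const std::vector<ulong> &w, bool cyclic, std::vector<int> &d)
{
    d.clear();
    const std::size_t n = w.size();
    const std::size_t steps = ( cyclic ? n : n - 1 );
    for (std::size_t k = 0; k < steps; ++k)
    {
        const int i = single_bit_index( w[k] ^ w[(k+1) % n] );
        if ( i < 0 )  return false;
        d.push_back(i);
    }
    return true;
}

// AC condition: successive delta values differ by exactly one.
bool is_ac_gray(const std::vector<ulong> &w, bool cyclic)
{
    assert( w.size() >= 2 );
    std::vector<int> d;
    if ( !delta_sequence(w, cyclic, d) )  return false;
    const std::size_t m = d.size();
    const std::size_t pairs = ( cyclic ? m : m - 1 );
    for (std::size_t k = 0; k < pairs; ++k)
    {
        const int diff = d[k] - d[(k+1) % m];
        if ( diff != 1 && diff != -1 )  return false;
    }
    return true;
}
```

Applied to the first code of figure 20.3-B (the words 0, 4, 12, 28, 20, ...), the test should succeed both as a path and as a cycle, while the standard binary reflected Gray code fails for n ≥ 3 because its delta sequence contains the step from 0 to 2.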
The program [FXT: graph/graph-acgray-demo.cc] allows searches for AC Gray codes. Two cycles for n = 5 are shown in figure 20.3-B. It turns out that such paths exist for n ≤ 6 (the only path for n = 6 is shown in figure 20.3-C) but there is no AC Gray code for n = 7: time ./bin 7 arg 1: 7 == n [size in bits] default=5 arg 2: 0 == maxnp [ stop after maxnp paths (0: never stop)] n = 7 #pfct = 0 #paths = 0 #cycles = 0 ./bin 7 20.77s user 0.11s system 98% cpu 21.232 total default=0 Nothing is known about the case n ≥ 8. For n = 8 no path is found within 15 days. By inspection of the AC Gray codes for different values of n we find an ad hoc algorithm. The following routine computes the delta sequence for AC Gray codes for n ≤ 6 [FXT: comb/acgray.cc]: 1 2 3 4 void ac_gray_delta(uchar *d, ulong ldn) // Generate a delta sequence for an adjacent-changes (AC) Gray code // of length n=2**ldn where ldn<=6. 20.3: Conditional search 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: ...... 0 ...1.. 1 ...11. 2 ....1. 1 ..1.1. 2 ..111. 3 ..11.. 2 ..11.1 3 ..1111 4 ..1.11 3 ....11 2 ...111 3 ...1.1 2 .....1 1 ..1..1 2 .11..1 3 .1...1 2 .1.1.1 3 .1.111 4 .1..11 3 .11.11 4 .11111 5 .111.1 4 .111.. 3 .1111. 4 .11.1. 3 .1..1. 2 .1.11. 3 .1.1.. 2 .1.... 1 .11... 2 ..1... 1 0 4 6 2 10 14 12 13 15 11 3 7 5 1 9 25 17 21 23 19 27 31 29 28 30 26 18 22 20 16 24 8 401 ...1.. ....1. ...1.. ..1... ...1.. ....1. .....1 ....1. ...1.. ..1... ...1.. ....1. ...1.. ..1... .1.... ..1... ...1.. ....1. ...1.. ..1... ...1.. ....1. .....1 ....1. ...1.. ..1... ...1.. ....1. ...1.. ..1... .1.... 1..... 2 1 2 3 2 1 0 1 2 3 2 1 2 3 4 3 2 1 2 3 2 1 0 1 2 3 2 1 2 3 4 5 32: 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47: 48: 49: 50: 51: 52: 53: 54: 55: 56: 57: 58: 59: 60: 61: 62: 63: 1.1... 2 111... 3 11.... 2 11.1.. 3 11.11. 4 11..1. 3 111.1. 4 11111. 5 1111.. 
4 1111.1 5 111111 6 111.11 5 11..11 4 11.111 5 11.1.1 4 11...1 3 111..1 4 1.1..1 3 1....1 2 1..1.1 3 1..111 4 1...11 3 1.1.11 4 1.1111 5 1.11.1 4 1.11.. 3 1.111. 4 1.1.1. 3 1...1. 2 1..11. 3 1..1.. 2 1..... 1 40 56 48 52 54 50 58 62 60 61 63 59 51 55 53 49 57 41 33 37 39 35 43 47 45 44 46 42 34 38 36 32 .1.... ..1... ...1.. ....1. ...1.. ..1... ...1.. ....1. .....1 ....1. ...1.. ..1... ...1.. ....1. ...1.. ..1... .1.... ..1... ...1.. ....1. ...1.. ..1... ...1.. ....1. .....1 ....1. ...1.. ..1... ...1.. ....1. ...1.. [1..... 4 3 2 1 2 3 2 1 0 1 2 3 2 1 2 3 4 3 2 1 2 3 2 1 0 1 2 3 2 1 2 5] Figure 20.3-C: The (essentially unique) AC Gray code for n = 6. While the path is a cycle in the graph, the AC condition does not hold for the transition from the last to the first word. 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 { if ( ldn<=2 ) // standard Gray code { d[0] = 0; if ( ldn==2 ) { d[1] = 1; d[2] = 0; } return; } ac_gray_delta(d, ldn-1); // recursion ulong n = 1UL<=6 ) { reverse(d, nh-1); for (ulong k=0; k next [xf xe / nn] 0: 0 -> 8 [ 0 0 / 4] 1: 8 -> 10 [ 0 0 / 4] 2: 10 -> 14 [ 0 0 / 4] 3: 14 -> 15 [ 0 0 / 4] 4: 15 -> 7 [ 0 0 / 4] 5: 7 -> 3 [ 0 1 / 4] 6: 3 -> 2 [ 1 2 / 4] 7: 2 -> 6 [ 0 3 / 4] 8: 6 -> 4 [ 0 0 / 4] 9: 4 -> 12 [ 1 3 / 4] 10: 12 -> 13 [ 0 0 / 4] 11: 13 -> 5 [ 0 1 / 4] 12: 5 -> 1 [ 0 3 / 4] 13: 1 -> 9 [ 0 2 / 4] 14: 9 -> 11 [ 0 0 / 4] Path: #non-first-free turns = 2 Figure 20.4-A: A Gray code in the hypercube graph with randomized edge order (left) and the path description (right, see text). If xf equals zero at some step, the first free neighbor was visited. If xf is nonzero, a dead end was reached in the course of the search and there was at least one U-turn. If the path is not the first found, the U-turn might well correspond to a previous path. If there was no U-turn, the number of non-first-free turns is zero (the number is given as the last line of the report). 
If it is zero, we call the path found a lucky path. For each given ordering of the edges and each starting position of the search there is at most one lucky path, and if there is one, it is the first path found.

If the first path is a lucky path, the search effectively ‘falls through’: the number of operations is a constant times the number of edges. That is, if a lucky path exists, it is found almost immediately even for huge graphs.

20.5 Gray codes for Lyndon words

We search for Gray codes for n-bit binary Lyndon words where n is a prime. Here is a Gray code for the 5-bit Lyndon words that is a cycle:

    ....1
    ...11
    .1.11
    .1111
    ..111
    ..1.1

An important application of such Gray codes is the construction of single track Gray codes, which can be obtained by appending rotated versions of the block. The following is a single track Gray code based on the block given. At each stage, the block is rotated by two positions (horizontal format):

    ###### -####- ---### --##-- ------
    --##-- ------ ###### -####- ---###
    -####- ---### --##-- ------ ######
    ------ ###### -####- ---### --##--
    ---### --##-- ------ ###### -####-

The transition count (the number of zero-one transitions) is by construction the same for each track. The all-zero and the all-one words are missing in the Gray code; its length equals $2^n - 2$.

20.5.1 Graph search with edge sorting

Gray codes for the 7-bit binary Lyndon words like those shown in figure 20.5-A can easily be found by a graph search. In fact, all of them can be generated in a short time: for n = 7 there are 395 Gray codes (starting with the word 0000..001) of which 112 are cycles.

The search for such a path for the next prime, n = 11, does not seem to give a result in reasonable time.
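The 5-bit cycle given above can be checked mechanically. The following C++ sketch uses hypothetical helper names (is_lyndon(), lyndon_words(), is_lyndon_gray_cycle()) and plain brute force, so it is only suitable for small n. A word is taken to be a Lyndon word if it is strictly smaller than all of its nontrivial cyclic rotations; the check confirms that a given list is a permutation of all n-bit Lyndon words and that cyclically successive words differ in exactly one bit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

typedef unsigned long ulong;

// True if the n-bit word w is strictly smaller than
// all of its nontrivial cyclic rotations.
bool is_lyndon(ulong w, unsigned n)
{
    ulong r = w;
    for (unsigned k = 1; k < n; ++k)
    {
        r = (r >> 1) | ((r & 1UL) << (n-1));  // rotate right by one
        if ( r <= w )  return false;
    }
    return true;
}

// All n-bit Lyndon words in increasing order (brute force).
std::vector<ulong> lyndon_words(unsigned n)
{
    assert( n >= 2 && n < 8 * sizeof(ulong) );
    std::vector<ulong> v;
    for (ulong w = 0; w < (1UL << n); ++w)
        if ( is_lyndon(w, n) )  v.push_back(w);
    return v;
}

// True if path[] is a Gray cycle through all n-bit Lyndon words.
bool is_lyndon_gray_cycle(std::vector<ulong> path, unsigned n)
{
    const std::size_t m = path.size();
    if ( m == 0 )  return false;
    for (std::size_t k = 0; k < m; ++k)
    {
        const ulong x = path[k] ^ path[(k+1) % m];
        if ( x == 0 || (x & (x-1)) != 0 )  return false;  // not a one-bit change
    }
    std::sort(path.begin(), path.end());
    return ( path == lyndon_words(n) );
}
```

The cycle from the text corresponds to the values 1, 3, 11, 15, 7, 5, so is_lyndon_gray_cycle({1, 3, 11, 15, 7, 5}, 5) should return true.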
Chapter 20: Searching paths in directed graphs ‡ 404 0: 1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: ......1 .....11 ....111 ...1111 ..11111 ..11.11 ..1..11 ..1.111 .11.111 .111111 .1.1111 .1.1.11 ...1.11 ...1..1 ...11.1 ..111.1 ..1.1.1 ....1.1 ......1 ...1..1 ...11.1 ..111.1 ..1.1.1 ..1.111 .11.111 .111111 .1.1111 .1.1.11 ...1.11 ...1111 ..11111 ..11.11 ..1..11 .....11 ....111 ....1.1 ......1 ...1..1 ...11.1 ..111.1 ..11111 .111111 .11.111 ..1.111 ..1.1.1 ....1.1 ....111 ...1111 .1.1111 .1.1.11 ...1.11 ..11.11 ..1..11 .....11 ......1 ...1..1 ...11.1 ....1.1 ..1.1.1 ..111.1 ..11111 ..11.11 ..1..11 ..1.111 .11.111 .111111 .1.1111 .1.1.11 ...1.11 ...1111 ....111 .....11 ......1 ...1..1 ...1.11 .1.1.11 .1.1111 .111111 .11.111 ..1.111 ..1.1.1 ..111.1 ...11.1 ....1.1 ....111 .....11 ..1..11 ..11.11 ..11111 ...1111 ......1 .....11 ....111 ....1.1 ..1.1.1 ..111.1 ...11.1 ...1..1 ...1.11 ...1111 ..11111 ..11.11 ..1..11 ..1.111 .11.111 .111111 .1.1111 .1.1.11 Figure 20.5-A: Various Gray codes through the length-7 binary Lyndon words. The first four are cycles. k : 0 : 1 : 2 : 3 : 4 : 5 : 6 : 7 : 8 : 9 : 10 : 11 : 12 : 13 : 14 : 15 : 16 : 17 : [ node] [ 0] [ 1] [ 3] [ 7] [ 13] [ 17] [ 15] [ 10] [ 16] [ 11] [ 5] [ 14] [ 6] [ 12] [ 8] [ 4] [ 9] [ 2] lyn_dec lyn_bin #rot rot(lyn) 1 ......1 0 ......1 3 .....11 0 .....11 7 ....111 0 ....111 15 ...1111 0 ...1111 31 ..11111 0 ..11111 63 .111111 0 .111111 47 .1.1111 0 .1.1111 23 ..1.111 1 .1.111. 55 .11.111 1 11.111. 27 ..11.11 2 11.11.. 11 ...1.11 2 .1.11.. 43 .1.1.11 2 .1.11.1 13 ...11.1 0 ...11.1 29 ..111.1 0 ..111.1 19 ..1..11 3 ..11..1 9 ...1..1 0 ...1..1 21 ..1.1.1 3 .1.1..1 5 ....1.1 3 .1.1... diff delta ......1 0 .....1. 1 ....1.. 2 ...1... 3 ..1.... 4 .1..... 5 ..1.... 4 ......1 0 1...... 6 .....1. 1 1...... 6 ......1 0 .1..... 5 ..1.... 4 ....1.. 2 ..1.... 4 .1..... 5 ......1 0 Figure 20.5-B: A Gray code through the length-7 binary Lyndon words. 
If we do not insist on a Gray code through the cyclic minima, but allow for arbitrary rotations of the Lyndon words, then more Gray codes exist. For that purpose nodes are declared adjacent if there is any cyclic rotation of the second node’s value that differs in exactly one bit from the first node’s value. The cyclic rotations can be recovered easily after a path is found. This is done in [FXT: graph/graph-lyndongray-demo.cc] whose output is shown in figure 20.5-B. Still, already for n = 11 we do not get a result. As the corresponding graph has 186 nodes and 1954 edges, this is not a surprise.

Now we sort the edges according to the comparison function [FXT: graph/lyndon-cmp.cc]

    int lyndon_cmp0(const ulong &a, const ulong &b)
    {
        int bc = bit_count_cmp(a, b);
        if ( bc ) return -bc; // more bits first
        else
        {
            if ( a==b ) return 0;
            return (a>b ? +1 : -1); // greater numbers last
        }
    }

where bit_count_cmp() is defined in [FXT: bits/bitcount.h]:

    static inline int bit_count_cmp(const ulong &a, const ulong &b)
    {
        ulong ca = bit_count(a);
        ulong cb = bit_count(b);
        return ( ca==cb ? 0 : (ca>cb ? +1 : -1) );
    }

We find a Gray code (which also is a cycle) for n = 11 immediately. The same holds for n = 13, again a cycle. The
The 20.5: Gray codes for Lyndon words k : [ node] 0 : [ 0] 1 : [ 1] 2 : [ 3] 3 : [ 7] 4 : [ 15] 5 : [ 31] 6 : [ 63] 7 : [ 125] 8 : [ 239] 9 : [ 417] 10 : [ 589] 11 : [ 629] 12 : [ 618] 13 : [ 514] 14 : [ 624] 15 : [ 550] 16 : [ 626] 17 : [ 567] 18 : [ 627] 19 : [ 576] 20 : [ 628] 21 : [ 581] 22 : [ 404] 23 : [ 614] 24 : [ 508] 25 : [ 584] [--snip--] 615 : [ 4] 616 : [ 36] 617 : [ 32] 618 : [ 33] 619 : [ 153] 620 : [ 65] 621 : [ 154] 622 : [ 79] 623 : [ 16] 624 : [ 126] 625 : [ 145] 626 : [ 130] 627 : [ 188] 628 : [ 71] 629 : [ 8] 405 lyn_dec 1 3 7 15 31 63 127 255 511 1023 2047 4095 3071 1535 3583 1791 3839 1919 3967 1983 4031 2015 991 3039 1519 2031 lyn_bin #rot ............1 0 ...........11 0 ..........111 0 .........1111 0 ........11111 0 .......111111 0 ......1111111 0 .....11111111 0 ....111111111 0 ...1111111111 0 ..11111111111 0 .111111111111 0 .1.1111111111 0 ..1.111111111 1 .11.111111111 1 ..11.11111111 2 .111.11111111 2 ..111.1111111 3 .1111.1111111 3 ..1111.111111 4 .11111.111111 4 ..11111.11111 5 ...1111.11111 5 .1.1111.11111 5 ..1.1111.1111 6 ..111111.1111 6 rot(lyn) ............1 ...........11 ..........111 .........1111 ........11111 .......111111 ......1111111 .....11111111 ....111111111 ...1111111111 ..11111111111 .111111111111 .1.1111111111 .1.111111111. 11.111111111. 11.11111111.. 11.11111111.1 11.1111111..1 11.1111111.11 11.111111..11 11.111111.111 11.11111..111 11.11111...11 11.11111.1.11 11.1111..1.11 11.1111..1111 diff delta ............1 0 ...........1. 1 ..........1.. 2 .........1... 3 ........1.... 4 .......1..... 5 ......1...... 6 .....1....... 7 ....1........ 8 ...1......... 9 ..1.......... 10 .1........... 11 ..1.......... 10 ............1 0 1............ 12 ...........1. 1 ............1 0 ..........1.. 2 ...........1. 1 .........1... 3 ..........1.. 2 ........1.... 4 ..........1.. 2 .........1... 3 .......1..... 5 ..........1.. 
2 9 73 65 67 323 133 325 161 33 265 305 273 401 145 17 .........1..1 5 ......1..1..1 2 ......1.....1 2 ......1....11 2 ....1.1....11 2 .....1....1.1 8 ....1.1...1.1 2 .....1.1....1 10 .......1....1 10 ....1....1..1 2 ....1..11...1 10 ....1...1...1 10 ....11..1...1 10 .....1..1...1 10 ........1...1 10 ....1..1..... ....1..1..1.. ....1.....1.. ....1....11.. ..1.1....11.. ..1.1.....1.. ..1.1...1.1.. ..1.....1.1.. ..1.......1.. ..1....1..1.. ..1....1..11. ..1....1...1. ..1....11..1. ..1.....1..1. ..1........1. ..1.......... 10 ..........1.. 2 .......1..... 5 .........1... 3 ..1.......... 10 .........1... 3 ........1.... 4 ....1........ 8 ........1.... 4 .......1..... 5 ...........1. 1 ..........1.. 2 ........1.... 4 .......1..... 5 ........1.... 4 Figure 20.5-C: Begin and end of a Gray cycle through the 13-bit binary Lyndon words. graph for n = 13 has 630 nodes and 8,056 edges, so finding a path is quite unexpected. The cycle found starts and ends as shown in figure 20.5-C. For next candidate (n = 17) we do not find a Gray code within many hours of search. No surprise for a graph with 7,710 nodes and 130,828 edges. We try another edge sorting scheme, an ordering based on the binary Gray code [FXT: graph/lyndon-cmp.cc]: 1 2 3 4 5 6 7 int lyndon_cmp2(const ulong &a, const ulong &b) { if ( a==b ) return 0; #define CODE(x) gray_code(x) ulong ta = CODE(a), tb = CODE(b); return ( ta40 d >160 d Figure 20.5-D: Memory and (approximate) time needed for computing Gray codes with n-bit Lyndon words. The number of nodes equals the number of length-n necklaces minus 2. The size of the tag array equals 2n /4 bits or 2n /32 bytes. With edge sorting functions that lead to a lucky path we can discard most of the data used with graph searching. We only need to keep track of whether a node has been visited so far. A tag-array ([FXT: ds/bitarray.h], see section 4.6 on page 164) suffices. With n-bit Lyndon words the amount of tag-bits needed is 2n . 
An implementation of the algorithm is [FXT: class lyndon_gray in graph/lyndon-gray.h]. If only the cyclic minima of the values are tagged, then only $2^n/2$ bits are needed, provided that the access to the single necklace consisting of all ones is treated separately. This variant of the algorithm is activated by uncommenting the line #define ALT_ALGORITM. As the lowest bit in a necklace is always one, we need only $2^n/4$ bits: simply shift the words to the right by one position before testing or writing to the tag array. This can be activated by additionally uncommenting the line #define ALTALT in the file.

When a node is visited, the algorithm creates a table of neighbors and selects the minimum among the free nodes with respect to the edge sorting function used. Then the table of neighbors is discarded to minimize memory usage. If no neighbor is found, the number of nodes visited so far is returned. If this number equals the number of n-bit Lyndon words, then a lucky path was found. With composite n a Gray code for n-bit necklaces (with the exception of the all-ones and the all-zeros word) will be searched.

Four variants of the algorithm have been found so far, corresponding to edge sorting with the 3rd, 5th, 21st, and 29th power of the Gray code. We refer to these functions as comparison functions 0, 1, 2, and 3, respectively. All of these lead to cycles for all primes n ≤ 31. The resources needed with greater values of n are shown in figure 20.5-D. Using a 64-bit machine equipped with more than 4 Gigabyte of RAM, it can be verified that three of the edge sorting functions lead to a Gray cycle also for n = 37; the 3rd power version fails. One of the sorting functions may lead to a Gray code for n = 41.
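The principle of the search with a tag array can be sketched in a few lines of C++. This is not the lyndon_gray class: the names cyclic_min(), gray_code() and greedy_walk() are hypothetical, one tag bit is kept per n-bit word (none of the memory-saving variants is used), and the ordering of candidates by the value of their Gray code is an assumption made for illustration, not one of the four comparison functions mentioned above. The walk starts at the word 1, always moves to the smallest free neighbor, and stops at the first dead end.

```cpp
#include <cassert>
#include <vector>

typedef unsigned long ulong;

// Smallest among all cyclic rotations of the n-bit word w.
ulong cyclic_min(ulong w, unsigned n)
{
    ulong m = w;
    for (unsigned k = 1; k < n; ++k)
    {
        w = (w >> 1) | ((w & 1UL) << (n-1));  // rotate right by one
        if ( w < m )  m = w;
    }
    return m;
}

ulong gray_code(ulong x)  { return x ^ (x >> 1); }

// Greedy walk through the cyclic minima of the n-bit words (n prime),
// excluding the all-zeros and all-ones words.
// The visited nodes are written to path[], their number is returned.
ulong greedy_walk(unsigned n, std::vector<ulong> &path)
{
    assert( n >= 2 && n < 8 * sizeof(ulong) );
    const ulong all = (1UL << n) - 1;
    std::vector<bool> tag(1UL << n, false);  // tag array: one bit per word

    ulong w = 1;  // start at the word 0...01
    tag[w] = true;
    path.clear();
    path.push_back(w);

    while ( true )
    {
        bool found = false;
        ulong best = 0;
        for (unsigned b = 0; b < n; ++b)  // neighbors: change one bit, then normalize
        {
            const ulong c = cyclic_min( w ^ (1UL << b), n );
            if ( c == 0 || c == all )  continue;  // excluded words
            if ( tag[c] )  continue;              // already visited
            // illustrative ordering (assumption): smaller Gray code first
            if ( !found || gray_code(c) < gray_code(best) )  { best = c;  found = true; }
        }
        if ( !found )  break;  // dead end: no free neighbor
        tag[best] = true;
        path.push_back(best);
        w = best;
    }
    return path.size();
}
```

If the returned number equals the number of n-bit Lyndon words, the walk is a lucky path. For n = 5 this ordering visits the nodes 1, 3, 7, 5, 11, 15, that is, all six Lyndon words; no claim is made here about larger n.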
A program to compute the Gray codes is [FXT: graph/lyndon-gray-demo.cc], four arguments can be given: arg 1: 13 == n [ a prime < BITS_PER_LONG ] default=17 arg 2: 1 == wh [printing: 0==>none, 1==>delta seq., 2==>full output] default=1 arg 3: 3 == ncmp [use comparison function (0,1,2,3)] default=2 arg 4: 0 == testall [special: test all odd values <= value] default=0 An example with full output is given in figure 20.5-E. A 64-bit CRC (see section 41.3 on page 868) is computed from the delta sequence (rightmost column) and printed with the last word. For large n one might want to print only the delta sequence, as shown in figure 20.5-F. The CRC is used to determine whether two delta sequences are different. Different sequences sometimes start identically. 20.5: Gray codes for Lyndon words 407 % ./bin 7 2 0 # 7 bits, full output, comparison function 0 n = 7 #lyn = 18 1: ......1 0 ......1 ......1 0 2: ...1..1 0 ...1..1 ...1... 3 3: ..1..11 3 ..11..1 ..1.... 4 4: ..1.111 3 .111..1 .1..... 5 5: .1.1111 2 .1111.1 ....1.. 2 6: .1.1.11 2 .1.11.1 ..1.... 4 7: .11.111 5 11.11.1 1...... 6 8: .111111 2 11111.1 ..1.... 4 9: ..11111 2 11111.. ......1 0 10: ..111.1 2 111.1.. ...1... 3 11: ..1.1.1 2 1.1.1.. .1..... 5 12: ....1.1 2 ..1.1.. 1...... 6 13: ...1.11 1 ..1.11. .....1. 1 14: ..11.11 1 .11.11. .1..... 5 15: ...11.1 2 .11.1.. .....1. 1 16: ...1111 2 .1111.. ...1... 3 17: ....111 2 ..111.. .1..... 5 18: .....11 2 ...11.. ..1.... 4 last = .....11 crc=0b14a5846c41d57f n = 7 #lyn = 18 #= 18 Figure 20.5-E: A Gray code for 7-bit Lyndon words. % ./bin 13 1 2 # 13 bits, delta seq. 
output, comparison function 2 n = 13 #lyn = 630 06B57458354645962546436734A74684A106C0145120825747A745247AC8564567018A7654647484A756A546457CA1ACBC1C 856BA9A64B97456548645659645219425215315BC82BC75BA02926256354267A462475A3ACB9761560C37412583758CA5624 B8C6A6C6A87A9C20CBA4534042014540523129075697651563160204230A7BA31C1485C6105201510490BCA891BA9B1B9AC0 A9A89B898A565B8785745865747845A9546702305A41275315458767465747A8457845470379A8586B0A7698578767976759 A976567686A567656A576B86581305A20AB0ACB0AB53523438235465325247563A432532A372354657643572373624634642 4532397423435235653236423263235234327532342325396926853234232582642436823632346362358423242383242327 523242325323432642324235323423 last = ...........11 crc=568dab04b55aa2fb n = 13 #lyn = 630 #= 630 % ./bin 13 1 3 # 13 bits, delta seq. output, comparison function 3 n = 13 #lyn = 630 06B57458354645962546436735371CA8B1587BA7610635285A0C2484B9713476B689A897AC98768968B9A106326016261050 1424B8979A78987B97898C98921941315313698314281687BCB9469C489C6210205B050A1A7A4568A9BC5CB79AB647B74812 0AB30BC1A131ACB120B0164CA1CABA121ABACA2B0BACAB1845786784989584867646A8456191654694745787545865490137 40201031012104270171216507457B854606C16BC523801365164130164BC7987A09872CBA9A87A20B787AC9B7CBA834C0C1 3C341C1042010C14C01C414587854645A854C95035A6A9570A9756586B9B5969580A0872C3123B0CB316BC6C0B21B2C0C2C0 5301C0530CB1C1530C01CB0BC20CBC0CB1C87565756865A75A65A40898A898B91CA898A8B898A81BC8A9ACA989AB817A9BC1 BA9ABA9CA9AB918A1CACBAC9BCB0BC last = ...........11 crc=745def277b1fbed0 n = 13 #lyn = 630 #= 630 Figure 20.5-F: Delta sequences for two different Gray codes for 13-bit Lyndon words. % ./bin 29 0 0 # 29 bits, output=progress, comparison function 0 n = 29 #lyn = 18512790 ................ 1048576 ( 5.66406 % ) crc=ceabc5f2056be699 ................ 2097152 ( 11.3281 % ) crc=76dd94f1a554b50d ................ 3145728 ( 16.9922 % ) crc=6b39957f1e141f4d ................ 4194304 ( 22.6563 % ) crc=53419af1f1185dc0 ................ 
5242880 ( 28.3203 % ) crc=45d45b193f8ee566
................ 6291456 ( 33.9844 % ) crc=95a24c824f56e196
................ 7340032 ( 39.6484 % ) crc=003ee5af5b248e34
................ 8388608 ( 45.3125 % ) crc=23cb74d3ea0c4587
................ 9437184 ( 50.9766 % ) crc=896fd04c87dd0d43
................ 10485760 ( 56.6406 % ) crc=b00d8c899f0fc791
................ 11534336 ( 62.3047 % ) crc=d148f1b95b23eeab
................ 12582912 ( 67.9688 % ) crc=82971e2ed4863050
................ 13631488 ( 73.6328 % ) crc=f249ad5b4fed252d
................ 14680064 ( 79.2969 % ) crc=909821d0c7246a98
................ 15728640 ( 84.9609 % ) crc=1c5d68e38e55b3ca
................ 16777216 ( 90.625 % ) crc=0e64f82c67c79cf1
................ 17825792 ( 96.2891 % ) crc=62c17b9f3c644396
.......... last = ...........................11 crc=5736fc9365da927e
n = 29 #lyn = 18512790 #= 18512790

Figure 20.5-G: Computation of a Gray code through the 29-bit Lyndon words. Most output is suppressed, only the CRC is printed at certain checkpoints.

For still greater values of n even the delta sequence tends to get huge (for example, with n = 37 the sequence would be approximately 3.7 GB). One can suppress all output except for a progress indication, as shown in figure 20.5-G. Here the CRC checksum is updated only with every (cyclically unadjusted) $2^{16}$-th Lyndon word.

Sometimes a Gray code through the necklaces (except for the all-zeros and all-ones words) is also found for composite n. Comparison functions 0, 1, and 2 lead to Gray codes (which are cycles) for all odd n ≤ 33. Gray cycles are also found with comparison function 3, except for n = 21, 27, and 33. All functions give Gray cycles also for n = 4 and n = 6. The values of n for which no Gray code was found are the even values ≥ 8.
20.5.3 No Gray codes for even n ≥ 8

As the parity of the words in a Gray code sequence alternates between one and zero, the difference between the numbers of words of odd and even weight must be zero or one. If it is one, no Gray cycle can exist because the parity of the first and last word is identical. We use the relations from section 18.3.2 on page 382.

For Lyndon words of odd length there are the same number of words for odd and even weight by symmetry, so a Gray code (and also a Gray cycle) can exist. For even length the sequences of numbers of Lyndon words of odd and even weights start as:

       n:  2, 4, 6,  8, 10,  12,  14,   16,   18,    20,    22,     24,      26, ...
     odd:  1, 2, 5, 16, 51, 170, 585, 2048, 7280, 26214, 95325, 349520, 1290555, ...
    even:  0, 1, 4, 14, 48, 165, 576, 2032, 7252, 26163, 95232, 349350, 1290240, ...
    diff:  1, 1, 1,  2,  3,   5,   9,   16,   28,    51,    93,    170,     315, ...

The last row gives the differences, entry A000048 in [312]. All entries for n ≥ 8 are greater than one, so no Gray code exists.

For the number of necklaces we have, for n = 2, 4, 6, . . .

       n:  2, 4, 6,  8, 10,  12,  14,   16,   18,    20,    22,     24,      26, ...
     odd:  1, 2, 6, 16, 52, 172, 586, 2048, 7286, 26216, 95326, 349536, 1290556, ...
    even:  2, 4, 8, 20, 56, 180, 596, 2068, 7316, 26272, 95420, 349716, 1290872, ...
    diff:  1, 2, 2,  4,  4,   8,  10,   20,   30,    56,    94,    180,     316, ...

The (absolute) difference of both sequences is entry A000013 in [312]. We see that for n ≥ 4 the numbers are greater than one, so no Gray code exists. If we exclude the all-ones and all-zeros words, then the differences are

       n:  2, 4, 6,  8, 10,  12,  14,   16,   18,    20,    22,     24,      26, ...
    diff:  1, 0, 0,  2,  2,   6,   8,   18,   28,    54,    92,    178,     314, ...

And again, no Gray code exists for n ≥ 8. That is, we have found Gray codes, and even cycles, for all computationally feasible sizes where they can exist.
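The counting argument can be checked by brute force for small even n. The following is a minimal sketch, not the method used in the text (which relies on the relations of section 18.3.2): it enumerates all n-bit words, keeps those that are strictly smaller than every nontrivial rotation, and counts them by the parity of their weight. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Rotate the n-bit word x left by one position (assumes 1 <= n <= 31).
static inline std::uint32_t rotate_left_1(std::uint32_t x, unsigned n)
{
    const std::uint32_t mask = (std::uint32_t(1) << n) - 1;
    return ((x << 1) | (x >> (n - 1))) & mask;
}

// A binary word is taken to be a Lyndon word if it is strictly smaller
// than each of its n-1 nontrivial rotations (so periodic words are excluded).
static bool is_lyndon(std::uint32_t x, unsigned n)
{
    std::uint32_t r = x;
    for (unsigned k = 1; k < n; ++k)
    {
        r = rotate_left_1(r, n);
        if (r <= x) return false;
    }
    return true;
}

// Returns {number of odd-weight Lyndon words, number of even-weight ones}.
std::pair<unsigned long, unsigned long> lyndon_parity_counts(unsigned n)
{
    assert(n >= 2 && n <= 24);  // brute force over 2^n words: keep n small
    unsigned long odd = 0, even = 0;
    for (std::uint32_t x = 0; x < (std::uint32_t(1) << n); ++x)
    {
        if (!is_lyndon(x, n)) continue;
        unsigned weight = 0;
        for (std::uint32_t t = x; t != 0; t &= t - 1) ++weight;  // count set bits
        if (weight & 1) ++odd; else ++even;
    }
    return std::make_pair(odd, even);
}
```

Calling lyndon_parity_counts(8) should reproduce the column n = 8 of the first table (16 odd, 14 even), where the difference first exceeds one.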
Part III

Fast transforms

Chapter 21

The Fourier transform

We introduce the discrete Fourier transform and give algorithms for its fast computation. Implementations and optimization considerations for complex and real-valued transforms are given. The fast Fourier transforms are the basis of the algorithms for fast convolution described in chapter 22. These are in turn the core of the fast high precision multiplication routines treated in chapter 28. The number theoretic transforms are treated in chapter 26. Algorithms for Fourier transforms based on fast convolution like Bluestein’s algorithm and Rader’s algorithm are given in chapter 22.

21.1 The discrete Fourier transform

The discrete Fourier transform (DFT) of a complex sequence $a = [a_0, a_1, \ldots, a_{n-1}]$ of length n is the complex sequence $c = [c_0, c_1, \ldots, c_{n-1}]$ defined by

\[ c = \mathcal{F}[a] \tag{21.1-1a} \]
\[ c_k := \frac{1}{\sqrt{n}} \sum_{x=0}^{n-1} a_x\, z^{+x k} \quad \text{where } z = e^{2\pi i/n} \tag{21.1-1b} \]

z is a primitive n-th root of unity: $z^n = 1$ and $z^j \neq 1$ for 0 < j < n.

The inverse discrete Fourier transform is

\[ a = \mathcal{F}^{-1}[c] \tag{21.1-2a} \]
\[ a_x := \frac{1}{\sqrt{n}} \sum_{k=0}^{n-1} c_k\, z^{-x k} \tag{21.1-2b} \]

To see this, consider the element y of the inverse transform of the transform of a:

\[ \left( \mathcal{F}^{-1}\big[\mathcal{F}[a]\big] \right)_y = \frac{1}{\sqrt{n}} \sum_{k=0}^{n-1} \frac{1}{\sqrt{n}} \sum_{x=0}^{n-1} \left( a_x\, z^{x k} \right) z^{-y k} \tag{21.1-3a} \]
\[ = \frac{1}{n} \sum_{x} a_x \sum_{k} \left( z^{x-y} \right)^k \tag{21.1-3b} \]

Now $\sum_k (z^{x-y})^k = n$ for x = y and 0 else. This is because z is an n-th primitive root of unity: with x = y the sum consists of n times $z^0 = 1$, with $x \neq y$ the summands lie on the unit circle (on the vertices of an equilateral polygon with center 0) and add up to 0. Therefore the whole expression is equal to

\[ \frac{1}{n} \sum_{x} a_x\, n\, \delta_{x,y} = a_y \quad \text{where } \delta_{x,y} := \begin{cases} 1 & \text{if } x = y \\ 0 & \text{otherwise} \end{cases} \tag{21.1-4} \]

Here we will call the transform with the plus in the exponent the forward transform. The choice is actually arbitrary, engineers seem to prefer the minus for the forward transform, mathematicians the plus. The sign in the exponent is called the sign of the transform.
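The inversion argument can be checked numerically. The sketch below evaluates the defining sums directly with the 1/sqrt(n) normalization used here and lets one confirm that the transform with sign -1 undoes the transform with sign +1. The names slow_dft() and max_abs_diff() are hypothetical and are not the FXT routines.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<double> Complex;

// Direct O(n^2) evaluation of
//   c_k = (1/sqrt(n)) * sum_x a_x * z^(sign*x*k),  z = exp(2*pi*i/n)
std::vector<Complex> slow_dft(const std::vector<Complex> &a, int sign)
{
    assert(sign == +1 || sign == -1);
    const std::size_t n = a.size();
    std::vector<Complex> c(n);
    if (n == 0) return c;
    const double ph0 = sign * 2.0 * std::acos(-1.0) / double(n);
    const double norm = 1.0 / std::sqrt(double(n));
    for (std::size_t k = 0; k < n; ++k)
    {
        Complex s(0.0, 0.0);
        for (std::size_t x = 0; x < n; ++x)
        {
            // reduce the exponent modulo n to keep the angle small
            const double ph = ph0 * double((x * k) % n);
            s += a[x] * Complex(std::cos(ph), std::sin(ph));
        }
        c[k] = s * norm;
    }
    return c;
}

// Largest absolute difference between corresponding elements.
double max_abs_diff(const std::vector<Complex> &a, const std::vector<Complex> &b)
{
    assert(a.size() == b.size());
    double d = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
    {
        const double t = std::abs(a[i] - b[i]);
        if (t > d) d = t;
    }
    return d;
}
```

With these helpers, slow_dft(slow_dft(a, +1), -1) should agree with a up to rounding errors for any input length.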
The Fourier transform is linear: for $\alpha, \beta \in \mathbb{C}$ we have

\[ \mathcal{F}[\alpha a + \beta b] = \alpha\, \mathcal{F}[a] + \beta\, \mathcal{F}[b] \tag{21.1-5} \]

Further Parseval’s equation holds, the sum of squares of the absolute values is identical for a sequence and its Fourier transform:

\[ \sum_{x=0}^{n-1} |a_x|^2 = \sum_{k=0}^{n-1} |c_k|^2 \tag{21.1-6} \]

A straightforward implementation of the discrete Fourier transform, that is, the computation of n sums each of length n, requires O(n^2) operations [FXT: fft/slowft.cc]:

void slow_ft(Complex *f, long n, int is) { Complex h[n]; const double ph0 = is*2.0*M_PI/n; for (long w=0; w0 ? 2.0*M_PI : -2.0*M_PI ); const ulong n = (1UL<>LX); double ph0 = s2pi/m; for (ulong j=0; j>LX); const double ph0 = -2.0*M_PI/m; // isign for (ulong j=0; j static inline void sumdiff(Type &a, Type &b) // {a, b} <--| {a+b, a-b} { Type t=a-b; a+=b; b=t; }

The routine fft8_dit_core_m1() is an unrolled size-8 DIT FFT (hard-coded for σ = −1) given in [FXT: fft/fft8ditcore.cc]. We further need a version of the routine for the positive sign. It uses a routine fft8_dit_core_p1() for the computation of length-8 DIT FFTs with σ = +1.
The following changes need to be made in the core routine [FXT: fft/cfftdit4.cc]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 void fft_dit4_core_p1(Complex *f, ulong ldn) // Fixed isign = +1 { [--snip--] for (ulong i0=0; i00 ) fft_dit4_core_p1(f, ldn); else fft_dit4_core_m1(f, ldn); } 21.4.5 Radix-4 DIF FFT A routine for the radix-4 DIF FFT is (the C++ equivalent is given in [FXT: fft/fftdif4l.cc]) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 procedure fftdif4(a[], ldn, is) // complex a[0..2**ldn-1] input, result { n := 2**ldn for ldm := ldn to 2 step -2 { m := 2**ldm mr := m/4 for j := 0 to mr-1 { e := exp(is*2*PI*I*j/m) e2 := e * e e3 := e2 * e for r := 0 to n-m step m { u0 := a[r+j] u1 := a[r+j+mr] u2 := a[r+j+mr*2] u3 := a[r+j+mr*3] x := u0 + u2 y := u1 + u3 t0 := x + y // == (u0+u2) + (u1+u3) t2 := x - y // == (u0+u2) - (u1+u3) x := u0 - u2 y := (u1 - u3)*I*is t1 := x + y // == (u0-u2) + (u1-u3)*I*is t3 := x - y // == (u0-u2) - (u1-u3)*I*is t1 := t1 * e t2 := t2 * e2 424 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 Chapter 21: The Fourier transform t3 := t3 * e3 a[r+j] := t0 a[r+j+mr] := t2 a[r+j+mr*2] := t1 a[r+j+mr*3] := t3 // (!) // (!) } } } if is_odd(ldn) then // n not a power of 4 { for r:=0 to n-2 step 2 { {a[r], a[r+1]} := {a[r]+a[r+1], a[r]-a[r+1]} } } revbin_permute(a[],n) } A reasonably optimized implementation, hard-coded for σ = +1, is [FXT: fft/cfftdif4.cc] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 static const ulong RX = 4; static const ulong LX = 2; void fft_dif4_core_p1(Complex *f, ulong ldn) // Auxiliary routine for fft_dif4(). // Radix-4 decimation in frequency FFT. // Output data is in revbin_permuted order. // ldn := base-2 logarithm of the array length. 
// Fixed isign = +1 { const ulong n = (1UL<=(LX<<1); ldm-=LX) { ulong m = (1UL<>LX); const double ph0 = 2.0*M_PI/m; // isign for (ulong j=0; j0 ) fft_dif4_core_p1(f, ldn); else fft_dif4_core_m1(f, ldn); revbin_permute(f, 1UL<0 { for j:=1 to n/2-1 { swap(x[j], x[n-j]) } for j:=1 to n/2-1 { swap(y[j], y[n-j]) } } }

The C++ implementation given in [FXT: fft/fftsplitradix.cc] uses a DIF core as above which is given in [129]. The C++ type complex version of the split-radix FFT given in [FXT: fft/cfftsplitradix.cc] uses a DIF or DIT core, depending on the sign of the transform. Here we just give the DIF version:

void split_radix_dif_fft_core(Complex *f, ulong ldn) // Split-radix decimation in frequency (DIF) FFT. // ldn := base-2 logarithm of the array length. // Fixed isign = +1 // Output data is in revbin_permuted order. { if ( ldn==0 ) return; const ulong n = (1UL<>= 1; // == n>>(k-1) == n, n/2, n/4, ..., 4 const ulong n4 = n2 >> 2; // == n/4, n/8, ..., 1 const double e = s2pi / n2; { // j==0: const ulong j = 0; ulong ix = j; ulong id = (n2<<1); while ( ix static inline void sumdiff3(Type &a, Type b, Type &d) // {a, b, d} <--| {a+b, b, a-b} (used in split-radix FFTs) { d=a-b; a+=b; }

21.6 Symmetries of the Fourier transform

A bit of notation again. Let $\overline{a}$ be the length-n sequence a reversed around the element with index 0:

\[ \overline{a}_0 := a_0 \tag{21.6-1a} \]
\[ \overline{a}_{n/2} := a_{n/2} \quad \text{if } n \text{ even} \tag{21.6-1b} \]
\[ \overline{a}_k := a_{n-k} = a_{-k} \tag{21.6-1c} \]

That is, we consider the indices modulo n and $\overline{a}$ is the sequence a with negated indices. Element zero stays in its place and for even n there is also an element with index n/2 that stays in place.

Example one, length-4: a := [0, 1, 2, 3], then $\overline{a}$ = [0, 3, 2, 1] (0 and 2 stay).
Example two, length-5: a := [0, 1, 2, 3, 4], then $\overline{a}$ = [0, 4, 3, 2, 1] (only 0 stays).
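The reversal around index 0 is easy to get wrong by one position, so a small sketch may help. The routine below swaps element k with element n-k for 0 < k < n/2 and leaves element 0 (and element n/2 for even n) in place. The name reverse_around_zero() is hypothetical; it is only meant to illustrate the definition, not to reproduce any FXT routine.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// In-place reversal around index 0:  f[k] <--> f[n-k]  for 0 < k < n/2.
// Element 0 stays, and for even n the element with index n/2 stays as well.
template <typename Type>
void reverse_around_zero(Type *f, std::size_t n)
{
    assert(f != 0 || n == 0);
    if (n < 3) return;  // nothing to swap for lengths 0, 1, 2
    for (std::size_t k = 1, j = n - 1; k < j; ++k, --j)
        std::swap(f[k], f[j]);
}
```

Applied to [0, 1, 2, 3] this gives [0, 3, 2, 1], and applied to [0, 1, 2, 3, 4] it gives [0, 4, 3, 2, 1], matching the two examples.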
Let $a_S$ and $a_A$ denote the symmetric and antisymmetric parts of the sequence a, respectively:

\[ a_S := \tfrac{1}{2} \left( a + \overline{a} \right) \tag{21.6-2a} \]
\[ a_A := \tfrac{1}{2} \left( a - \overline{a} \right) \tag{21.6-2b} \]

The elements with index 0 (and n/2 for even n) of $a_A$ are zero. We have

\[ a = a_S + a_A \tag{21.6-3a} \]
\[ \overline{a} = a_S - a_A \tag{21.6-3b} \]

Let c + i d be the transform of the sequence a + i b, then

\[ \mathcal{F}\big[ (a_S + a_A) + i\,(b_S + b_A) \big] = (c_S + c_A) + i\,(d_S + d_A) \quad \text{where} \tag{21.6-4a} \]
\[ \mathcal{F}[a_S] = c_S \in \mathbb{R} \tag{21.6-4b} \]
\[ \mathcal{F}[a_A] = i\, d_A \in i\,\mathbb{R} \tag{21.6-4c} \]
\[ \mathcal{F}[i\, b_S] = i\, d_S \in i\,\mathbb{R} \tag{21.6-4d} \]
\[ \mathcal{F}[i\, b_A] = c_A \in \mathbb{R} \tag{21.6-4e} \]

Here we write $a \in \mathbb{R}$ as a short form for a purely real sequence a. Equivalently, we write $a \in i\,\mathbb{R}$ for a purely imaginary sequence.

Thus the transform of a complex symmetric or antisymmetric sequence is symmetric or antisymmetric, respectively:

\[ \mathcal{F}[a_S + i\, b_S] = c_S + i\, d_S \tag{21.6-5a} \]
\[ \mathcal{F}[a_A + i\, b_A] = c_A + i\, d_A \tag{21.6-5b} \]

The real and imaginary parts of the transform of a symmetric sequence correspond to the real and imaginary parts of the original sequence. With an antisymmetric sequence the transform of the real and imaginary parts correspond to the imaginary and real parts of the original sequence.

\[ \mathcal{F}[(a_S + a_A)] = c_S + i\, d_A \tag{21.6-6a} \]
\[ \mathcal{F}[i\,(b_S + b_A)] = c_A + i\, d_S \tag{21.6-6b} \]

If the sequence a is purely real, then we have

\[ \mathcal{F}[a_S] = +\overline{\mathcal{F}[a_S]} \in \mathbb{R} \tag{21.6-7a} \]
\[ \mathcal{F}[a_A] = -\overline{\mathcal{F}[a_A]} \in i\,\mathbb{R} \tag{21.6-7b} \]

That is, the transform of a real symmetric sequence is real and symmetric and the transform of a real antisymmetric sequence is purely imaginary and antisymmetric.
Thus the transform of a general real sequence is the complex conjugate of its reversal:

\[ \mathcal{F}[a] = \overline{\mathcal{F}[a]}^{\,*} \quad \text{for } a \in \mathbb{R} \tag{21.6-8} \]

Similarly, for a purely imaginary sequence $b \in i\,\mathbb{R}$, we have

\[ \mathcal{F}[b_S] = +\overline{\mathcal{F}[b_S]} \in i\,\mathbb{R} \tag{21.6-9a} \]
\[ \mathcal{F}[b_A] = -\overline{\mathcal{F}[b_A]} \in \mathbb{R} \tag{21.6-9b} \]

We compare the results of the Fourier transform and its inverse (the transform with negated sign σ) by symbolically writing the transforms as a complex multiplication with the trigonometric term (using C for cosine, S for sine):

\[ \mathcal{F}[a + i\, b] : \quad (a + i\, b)\,(C + i\, S) = (a\, C - b\, S) + i\,(b\, C + a\, S) \tag{21.6-10a} \]
\[ \mathcal{F}^{-1}[a + i\, b] : \quad (a + i\, b)\,(C - i\, S) = (a\, C + b\, S) + i\,(b\, C - a\, S) \tag{21.6-10b} \]

The terms on the right side can be identified with those in relation 21.6-4a. Changing the sign of the transform leads to a result where the components due to the antisymmetric parts of the input are negated.

Now write F for the Fourier transform and R for the reversal. We have $F^4 = \mathrm{id}$, $F^3 = F^{-1}$, and $F^2 = R$. So the inverse transform can be computed as either

\[ F^{-1} = R\, F = F\, R \tag{21.6-11} \]

21.7 Inverse FFT for free

Some FFT implementations are hard-coded for a fixed sign of the transform. If we cannot easily modify the implementation into the transform with the other sign (the inverse transform), then how can we compute the inverse FFT?

If the implementation uses separate arrays for the real and imaginary parts of the complex sequences to be transformed, as in

    procedure my_fft(ar[], ai[], ldn) // only for is==+1 !
    // real ar[0..2**ldn-1] input, result, real part
    // real ai[0..2**ldn-1] input, result, imaginary part
    {
        // Incredibly complicated code
        // that you cannot see how to modify
        // for is==-1
    }

Then do as follows: with the forward transform being

    my_fft(ar[], ai[], ldn) // forward FFT

compute the inverse transform as

    my_fft(ai[], ar[], ldn) // inverse FFT

Note the swapped real and imaginary parts! The same trick works for a procedure coded for fixed is = −1.
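The swap trick can be tried out with any routine that has a fixed sign. The sketch below uses a direct O(n^2) transform on separate real and imaginary arrays as a stand-in for my_fft(); the stand-in name my_dft_plus() and the use of std::vector are assumptions made for illustration. It is unnormalized, so a forward transform followed by the swapped call returns the input scaled by n.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Stand-in for a transform that is hard-coded for sign +1:
// (ar + i*ai)[k] := sum_x (ar + i*ai)[x] * exp(+2*pi*i*x*k/n), no normalization.
void my_dft_plus(std::vector<double> &ar, std::vector<double> &ai)
{
    assert(ar.size() == ai.size());
    const std::size_t n = ar.size();
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<double> cr(n, 0.0), ci(n, 0.0);
    for (std::size_t k = 0; k < n; ++k)
    {
        for (std::size_t x = 0; x < n; ++x)
        {
            const double ph = two_pi * double((x * k) % n) / double(n);
            const double c = std::cos(ph), s = std::sin(ph);
            cr[k] += ar[x] * c - ai[x] * s;  // real part of the product
            ci[k] += ai[x] * c + ar[x] * s;  // imaginary part of the product
        }
    }
    ar.swap(cr);
    ai.swap(ci);
}
```

Usage under these assumptions: call my_dft_plus(re, im) for the forward transform and my_dft_plus(im, re) for the transform with the other sign, then divide both arrays by n to recover the original data.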
To see why this works, we note that

\[ \mathcal{F}[a + i\, b] = \mathcal{F}[a_S] + i\,\sigma\, \mathcal{F}[a_A] + i\, \mathcal{F}[b_S] + \sigma\, \mathcal{F}[b_A] \tag{21.7-1a} \]
\[ = \mathcal{F}[a_S] + i\, \mathcal{F}[b_S] + i\,\sigma \left( \mathcal{F}[a_A] - i\, \mathcal{F}[b_A] \right) \tag{21.7-1b} \]

For the computation with swapped real and imaginary parts we have

\[ \mathcal{F}[b + i\, a] = \mathcal{F}[b_S] + i\, \mathcal{F}[a_S] + i\,\sigma \left( \mathcal{F}[b_A] - i\, \mathcal{F}[a_A] \right) \tag{21.7-2a} \]

Now the real and imaginary parts are implicitly swapped at the end of the computation, giving

\[ \mathcal{F}[a_S] + i\, \mathcal{F}[b_S] - i\,\sigma \left( \mathcal{F}[a_A] - i\, \mathcal{F}[b_A] \right) = \mathcal{F}^{-1}[a + i\, b] \tag{21.7-2b} \]

When a complex type is used, then the best way to compute the inverse transform may be to reverse the sequence according to the symmetry of the Fourier transform given as relation 21.6-11: the transform with negated sign can be computed by reversing the order of the result (use the routine reverse_0() in [FXT: perm/reverse.h]). The reversal can also happen with the input data before the transform, which is advantageous if the data has to be copied anyway (use copy_reverse_0() in [FXT: aux1/copy.h]). The additional work will usually not matter.

21.8 Real-valued Fourier transforms

The Fourier transform of a purely real sequence $c = \mathcal{F}[a]$ where $a \in \mathbb{R}$ has a symmetric real part ($\mathrm{Re}\,\overline{c} = \mathrm{Re}\, c$, relation 21.6-8) and an antisymmetric imaginary part ($\mathrm{Im}\,\overline{c} = -\mathrm{Im}\, c$). The symmetric and antisymmetric parts of the original sequence correspond to the symmetric (and purely real) and antisymmetric (and purely imaginary) parts of the transform, respectively:

\[ \mathcal{F}[a] = \mathcal{F}[a_S] + i\,\sigma\, \mathcal{F}[a_A] \tag{21.8-1} \]

Simply using a complex FFT for real input is a waste by a factor 2 of memory and CPU cycles. There are several alternatives:

• wrapper routines for complex FFTs (section 21.8.3 on the next page),
• usage of the fast Hartley transform (section 25.5 on page 523),
• special versions of the split-radix algorithm (section 21.8.4 on page 434).

All techniques have in common that they store only half of the complex result to avoid the redundancy due to the symmetries of a complex Fourier transform of purely real input.
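The redundancy mentioned above can be made concrete: for real input the transform satisfies c[n-k] = conj(c[k]), so the values c[0], ..., c[n/2] determine the rest. The following sketch computes only that half by direct summation and rebuilds the full transform from it. The function names are hypothetical, the sums are unnormalized, and the O(n^2) evaluation only serves as an illustration of the symmetry, not as a fast algorithm.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<double> Complex;

// Direct computation of c_k = sum_x a_x * exp(sigma*2*pi*i*x*k/n)
// for k = 0 .. n/2 only (real input a, unnormalized).
std::vector<Complex> real_dft_half(const std::vector<double> &a, int sigma)
{
    assert(!a.empty());
    assert(sigma == +1 || sigma == -1);
    const std::size_t n = a.size();
    const double ph0 = sigma * 2.0 * std::acos(-1.0) / double(n);
    std::vector<Complex> h(n / 2 + 1);
    for (std::size_t k = 0; k <= n / 2; ++k)
        for (std::size_t x = 0; x < n; ++x)
            h[k] += a[x] * std::polar(1.0, ph0 * double((x * k) % n));
    return h;
}

// Rebuild all n values from the stored half using c_{n-k} = conj(c_k).
std::vector<Complex> expand_half(const std::vector<Complex> &h, std::size_t n)
{
    assert(n > 0 && h.size() == n / 2 + 1);
    std::vector<Complex> c(n);
    for (std::size_t k = 0; k <= n / 2 && k < n; ++k) c[k] = h[k];
    for (std::size_t k = n / 2 + 1; k < n; ++k) c[k] = std::conj(h[n - k]);
    return c;
}
```

For even n the entries h[0] and h[n/2] come out purely real (up to rounding), which is why the storage schemes of section 21.8.2 need exactly n real numbers.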
The result of a real to complex FFT (R2CFT) contains the purely real components $c_0$ (the ‘DC-part’ of the input signal) and, in case n is even, $c_{n/2}$ (the Nyquist frequency part). The inverse procedure, the complex to real transform (C2RFT) must be compatible to the ordering of the R2CFT.

21.8.1 Sign of the transforms

The sign of the transform can be chosen arbitrarily to be either +1 or −1. Note that the transform with the ‘other sign’ is not the inverse transform. The R2CFT and its inverse C2RFT must use the same sign.

Some R2CFT and C2RFT implementations are hard-coded for a fixed sign. For the R2CFT with the other sign, negate the imaginary part after the transform. If we have to copy the data before the transform, then we can exploit the relation

\[ \mathcal{F}[\overline{a}] = \mathcal{F}[a_S] - i\,\sigma\, \mathcal{F}[a_A] \tag{21.8-2} \]

That is, copy the real data in reversed order to get the transform with the other sign. This technique does not involve an extra pass and should be virtually for free.

For the complex to real FFTs (C2RFT) we have to negate the imaginary part before the transform to obtain the transform with the other sign.

21.8.2 Data ordering

Let c be the Fourier transform of the purely real sequence, stored in the array a[ ]. All given procedures use one of the following schemes for storing the transformed sequence. A scheme that interleaves real and imaginary parts (‘complex ordering’) is

    a[0]   = Re c_0
    a[1]   = Re c_{n/2}
    a[2]   = Re c_1
    a[3]   = Im c_1
    a[4]   = Re c_2
    a[5]   = Im c_2
      ...
    a[n-2] = Re c_{n/2-1}
    a[n-1] = Im c_{n/2-1}                                   (21.8-3)

Note the absence of the elements Im c_0 and Im c_{n/2} which are always zero.

Some routines store the real parts in the lower half and imaginary parts in the upper half. The data in the lower half will always be ordered as follows:

    a[0]   = Re c_0
    a[1]   = Re c_1
    a[2]   = Re c_2
      ...
    a[n/2] = Re c_{n/2}                                     (21.8-4)

For the imaginary part of the result there are two schemes: The ‘parallel ordering’ is

    a[n/2+1] = Im c_1
    a[n/2+2] = Im c_2
    a[n/2+3] = Im c_3
      ...
    a[n-1]   = Im c_{n/2-1}                                 (21.8-5)

The ‘antiparallel ordering’ is

    a[n/2+1] = Im c_{n/2-1}
    a[n/2+2] = Im c_{n/2-2}
    a[n/2+3] = Im c_{n/2-3}
      ...
    a[n-1]   = Im c_1                                       (21.8-6)

21.8.3 Real-valued Fourier transforms via wrapper routines

A complex length-n FFT can be used to compute a real length-2n FFT. For a real sequence a one feeds the (length-n) complex sequence $f = a^{(even)} + i\, a^{(odd)}$ into a complex FFT. Some post-processing is necessary. This is not the most elegant real FFT available, but it is directly usable to turn complex FFTs into real FFTs.

A C++ implementation of the real to complex FFT (R2CFT) is given in [FXT: realfft/realfftwrap.cc], the sign of the transform is hard-coded to σ = +1:

void wrap_real_complex_fft(double *f, ulong ldn) // Real to complex FFT (R2CFT) { if ( ldn==0 ) return; fht_fft((Complex *)f, ldn-1, +1); // cast const ulong n = 1UL<=2 ) { f[nh] *= 2.0; f[nh+1] *= 2.0; } fht_fft((Complex *)f, ldn-1, -1); // cast }

21.8.4 Real-valued split-radix Fourier transforms

We give pseudocode for the split-radix real to complex FFT and its inverse. The C++ implementations are given in [FXT: realfft/realfftsplitradix.cc]. The code given here follows [130], see also [318] (erratum for page 859 of [318]: at the start of the DO 32 loop replace the obvious assignments by CC1=COS(A), SS1=SIN(A), CC3=COS(A3), SS3=SIN(A3)).
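The wrapper idea of section 21.8.3 can be sketched independently of the FXT routine. The version below is an assumption-laden illustration: it uses a direct O(n^2) complex transform (sign +1, unnormalized) in place of a fast one, returns the values c[0], ..., c[n] in a separate vector instead of one of the in-place orderings above, and all names are hypothetical. With F the transform of f = a_even + i*a_odd, the transforms of the even and odd parts are E[k] = (F[k] + conj(F[n-k]))/2 and O[k] = (F[k] - conj(F[n-k]))/(2i), and the result is c[k] = E[k] + w^k O[k] with w = exp(2*pi*i/(2n)).

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<double> Complex;

// Stand-in for a complex FFT: direct transform, sign +1, no normalization.
std::vector<Complex> complex_dft_plus(const std::vector<Complex> &f)
{
    const std::size_t n = f.size();
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<Complex> F(n);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t x = 0; x < n; ++x)
            F[k] += f[x] * std::polar(1.0, two_pi * double((x * k) % n) / double(n));
    return F;
}

// Transform of the real length-2n sequence a via one complex length-n transform.
// Returns c[0..n]; the remaining values follow from c[2n-k] = conj(c[k]).
std::vector<Complex> real_dft_wrapped(const std::vector<double> &a)
{
    assert(!a.empty() && a.size() % 2 == 0);
    const std::size_t n = a.size() / 2;
    std::vector<Complex> f(n);
    for (std::size_t j = 0; j < n; ++j) f[j] = Complex(a[2 * j], a[2 * j + 1]);
    const std::vector<Complex> F = complex_dft_plus(f);

    const double pi = std::acos(-1.0);
    std::vector<Complex> c(n + 1);
    for (std::size_t k = 0; k < n; ++k)
    {
        const Complex Fk = F[k];
        const Complex Fm = std::conj(F[(n - k) % n]);
        const Complex E = 0.5 * (Fk + Fm);                  // transform of even-indexed part
        const Complex O = Complex(0.0, -0.5) * (Fk - Fm);   // transform of odd-indexed part
        c[k] = E + std::polar(1.0, pi * double(k) / double(n)) * O;
    }
    c[n] = Complex(F[0].real() - F[0].imag(), 0.0);         // E[0] - O[0]
    return c;
}
```

Replacing complex_dft_plus() by a fast routine turns this into a real FFT at roughly half the cost of a complex FFT of length 2n, which is the point of the wrapper approach.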
21.8.5 Real to complex split-radix FFT We give a routine for the split-radix R2CFT algorithm, the sign of the transform is hard-coded to σ = −1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 procedure r2cft_splitradix_dit(x[], ldn) { n := 2**ldn revbin_permute(x[], n); ix := 1; id := 4; do { i0 := ix-1 while i0 void slow_convolution(const Type *f, const Type *g, Type *h, ulong n) // (cyclic) convolution: h[] := f[] (*) g[] // n := array length { for (ulong tau=0; tau void slow_linear_convolution(const Type *f, const Type *g, Type *h, ulong n) // Linear (acyclic) convolution. 444 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 Chapter 22: Convolution, correlation, and more FFT algorithms // n := array length of a[] and b[] // The array h[] must have 2*n elements. { // compute h0 (left half): for (ulong tau=0; tau void slow_correlation(const Type *f, const Type *g, Type * restrict h, ulong n) // Cyclic correlation of f[], g[], both real-valued sequences. // n := array length { for (ulong tau=0; tau=n ) k2 -= n; s += (g[k]*f[k2]); } h[tau] = s; } } The if statement in the inner loop is avoided by the following version: 1 2 3 4 5 6 7 8 for (ulong tau=0; tau void slow_correlation0(const Type *f, const Type *g, Type * restrict h, ulong n) // Linear correlation of f[], g[], both real-valued sequences. // n := array length // Version for zero padded data: // f[k],g[k] == 0 for k=n/2 ... n-1 // n must be >=2 { const ulong nh = n/2; for (ulong tau=0; tau>1); fht_real_complex_fft(f, ldn); fht_real_complex_fft(g, ldn); // real, imag part in lower, upper half const double v = 1.0/n; g[0] *= f[0] * v; g[nh] *= f[nh] * v; for (ulong i=1,j=n-1; iτ The sequences h(0) and h(1) are the left and right half of the linear convolution sequence a ~lin b, defined by relation 22.1-6a on page 443. 
For example, the linear self-convolution of the sequence [1, 1, 1, 1] is the length-8 sequence [h0 ][h1 ] = [1, 2, 3, 4][3, 2, 1, 0], its cyclic self-convolution is [h0 + h1 ] = [4, 4, 4, 4]. The direct (slow) routine for linear convolution can be modified to compute just one of either h(0) or h(1) [FXT: convolution/slowcnvlhalf.h]: 450 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Chapter 22: Convolution, correlation, and more FFT algorithms template void slow_half_convolution(const Type *f, const Type *g, Type *h, ulong n, int h01) // Half cyclic convolution. // Part determined by h01 which must be 0 or 1. // n := array length { if ( 0==h01 ) // compute h0: { for (ulong tau=0; tauτ x≤τ mod n Final division of this element (by V τ ) gives h(0) + V n h(1) as stated. +-| 0: 1: 2: 3: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 0 1 2 3 1 2 3 4 2 3 4 5 3 4 5 6 4 5 6 7 5 6 7 8 6 7 8 9 7 8 9 10 8 9 10 11 9 10 11 12 10 11 12 13 11 12 13 14 12 13 14 15 13 14 15 14 15 015 0- 10- 1- 2- 4: 5: 6: 7: 4 5 6 7 5 6 7 8 6 7 8 9 7 8 9 10 8 9 10 11 9 10 11 12 10 11 12 13 11 12 13 14 12 13 14 15 13 14 15 14 15 015 0- 10- 1- 2- 0123- 1234- 2345- 8: 9: 10: 11: 8 9 10 11 9 10 11 12 10 11 12 13 11 12 13 14 12 13 14 15 13 14 15 14 15 015 0- 10- 1- 2- 0123- 1234- 2345- 4567- 5678- 6- 77- 88- 99- 10- 12: 13: 14: 15: 12 13 14 15 13 14 15 14 15 015 0- 10- 1- 2- 0123- 1234- 4567- 5678- 6- 77- 88- 99- 10- 2345- 3456- 3456- 14 15 3456- 8- 9- 10- 119- 10- 11- 1210- 11- 12- 1311- 12- 13- 14- Figure 22.4-A: Semi-symbolic table for the negacyclic convolution. The products that enter with negative sign are indicated with a postfix minus at the corresponding entry. √ The cases when V n is some root of unity are particularly interesting. 
For V n = ±i = ± −1 we obtain the right-angle convolution: hv = h(0) ∓ i h(1) (22.4-8) 22.5: Convolution using the MFA 451 Choosing V n = −1 leads to the negacyclic convolution (or skew circular convolution): hv = h(0) − h(1) (22.4-9) Cyclic, negacyclic and right-angle convolution can be understood as polynomial products modulo the polynomials z n − 1, z n + 1 and z n ± i, respectively (see [262]). The semi-symbolic table for the negacyclic√convolution is shown in figure 22.4-A. With right-angle convolution the minuses are replaced by i = −1, so the elements in h(1) go to the imaginary part. With real input one effectively separates h(0) and h(1) . Therefore the linear convolution of real sequences can be computed using the complex right-angle convolution. The parts h(0) and h(1) can be computed as sum and difference of the cyclic and the negacyclic convolution. Thus all expressions of the form α h(0) + β h(1) where α, β ∈ C can be computed. The routine for the direct computation has complexity O(n2 ) [FXT: convolution/slowweightedcnvl.h]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 template void slow_weighted_convolution(const Type *f, const Type *g, Type *h, ulong n, Type w) // weighted (cyclic) convolution: h[] := f[] (*)_w g[] // n := array length { for (ulong tau=0; tau= 2 n such that a length-L FFT can be computed (highly composite L, for example a power of 2). As the Fourier transform is the special case z = e±2 π i/n of the ZT, the chirp-ZT algorithm constitutes an FFT algorithm for sequences of arbitrary length. An implementation is [FXT: chirpzt/fftarblen.cc] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 void fft_arblen(Complex *x, ulong n, int is) // Arbitrary length FFT. 
{ const ulong ldnn = 1 + ld( (n << 1) - 1 ); const ulong nn = (1UL<= 2*n Complex *f = new Complex[nn]; acopy(x, f, n); null(f+n, nn-n); Complex *w = new Complex[nn]; make_fft_chirp(w, n, nn, is); multiply(f, n, w); double *dw = (double *)w; for (ulong k=1; k<2*n; k+=2) dw[k] = -dw[k]; // =^= make_fft_chirp(w, n, nn, -is); fft_complex_convolution(w, f, ldnn); if ( n & 1 ) else subtract(f, n, f+n); add(f, n, f+n); make_fft_chirp(w, n, nn, is); multiply(w, n, f); acopy(w, x, n); delete [] w; delete [] f; } // odd n: negacyclic convolution // even n: cyclic convolution 456 Chapter 22: Convolution, correlation, and more FFT algorithms The auxiliary routine make_fft_chirp() is 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 static inline void make_fft_chirp(Complex *w, ulong n, ulong nn, int is) // For k=0..n-1: w[k] := exp( is * k*k * (i*2*PI/n)/2 ) // For k=n..nn-1: w[k] = 0 { double phi = 1.0*is*M_PI/n; // == (i*2*Pi/n)/2 ulong k2 = 0, n2 = 2*n; for (ulong k=0; kn2 ) k2 -= n2; // here: k2 == (k*k) mod 2*n; } null(w+n, nn-n); } where i = sqrt(-1) The computation of a length-n ZT uses three FFTs with length greater than n. The worst case (if only FFTs for n a power of 2 are available) is n = 2p + 1: we need three FFTs of length L = 2p+1 ≈ 2n for the computation of the convolution. So the total work is about 6 times the work of an FFT of length n. It is possible to lower this worst case factor to 3 by using highly composite L slightly greater than n. For multiple computations of z-transforms of the same length one should precompute and store the 2 transform of the sequence z k /2 as it does not change. Therefore the worst case is a factor 2 with highly composite FFTs and 4 if FFTs are available for powers of 2 only. 22.6.3 Fractional Fourier transform by ZT The z-transform with z = eα 2 π i/n is called the fractional Fourier transform in [29]. The term is usually used for the fractional order transform given as relation 25.11-6 on page 533, see also [274, ch.13]. 
For α = ±1 one again obtains the usual Fourier transform. The fractional Fourier transform can be used for the computation of the Fourier transform of sequences with only few nonzero elements and for the exact detection of frequencies that are not integer multiples of the lowest frequency of the DFT. A C++ implementation of the fractional Fourier transform for sequences of arbitrary length is given in [FXT: chirpzt/fftfract.cc]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 void fft_fract(Complex *x, ulong n, double v) // Fractional (fast) Fourier transform. { const ulong ldnn = 1 + ld( (n << 1) - 1 ); const ulong nn = (1UL<= 2*n Complex *f = new Complex[nn]; acopy(x, f, n); null(f+n, nn-n); Complex *w = new Complex[nn]; make_fft_fract_chirp(w, v, n, nn); for (ulong j=0; j=n2 ) np -= n2; } } 22.7 Prime length FFTs For the computation of FFTs for sequences whose length is prime we can exploit the existence of primitive roots. We will be able to express the transform of all but the first element as a cyclic convolution of two sequences whose length is reduced by one. Let p be prime, then an element g exists so that the least positive exponent e so that g e ≡ 1 mod p is e = p − 1. The element g is called a generator (or primitive root) modulo p (see section 39.6 on page 776). Every nonzero element modulo p can be uniquely expressed as a power g e where 0 ≤ e < p − 1. For example, a generator modulo p = 11 is g = 2, its powers are g 0 ≡ 1, g 1 ≡ 2, g 2 ≡ 4, g 3 ≡ 8, g 4 ≡ 5, g 5 ≡ 10 ≡ −1, g 6 ≡ 9, g 7 ≡ 7, g 8 ≡ 3, g 9 ≡ 6, g p−1 ≡ 1 Likewise, we can express any nonzero element as a negative power of g. Let h = g −1 , then with our example h ≡ 6 and h0 ≡ 1, h1 ≡ 6, h2 ≡ 3, h3 ≡ 7, h4 ≡ 9, h5 ≡ 10 ≡ −1, h6 ≡ 5, h7 ≡ 8, h8 ≡ 4, h9 ≡ 2, hp−1 ≡ 1 This is just the reversed sequence of values. Let C be the Fourier transform of length-p sequence A: Ck p−1 X = Ax W σ x k (22.7-1) x=0 where W = exp (2 π i/p) and σ = ±1 is the sign of the transform. 
We split the computation of the Fourier transform into two parts, we compute the first element of the transform as

\[ C_0 = \sum_{x=0}^{p-1} A_x \tag{22.7-2} \]

Now it remains to compute $C_k$ for 1 ≤ k ≤ p − 1:

\[ C_k = A_0 + \sum_{x=1}^{p-1} A_x\, W^{\sigma x k} \tag{22.7-3} \]

Note the lower index of the sum. We write $k \equiv g^e$ and $x \equiv g^{-f}$ (modulo p), so

\[ C_{(g^e)} - A_0 = \sum_{f=0}^{p-2} A_{(g^{-f})}\, W^{\sigma\, (g^{-f})\,(g^e)} = \sum_{f=0}^{p-2} A_{(g^{-f})}\, W^{\sigma\, (g^{e-f})} \tag{22.7-4} \]

The sum is a cyclic convolution of the sequences $W^{*} := W^{(g^w)}$ and $A^{*} := A_{(g^{-w})}$ where 0 ≤ w ≤ p − 2.

The main algorithm (ignoring the constant terms $A_0$ and $C_0$) can be outlined as follows:

1. Compute $A^{*}$ and $W^{*}$ by permuting the sequences A and W.
2. Compute $C^{*}$ as the cyclic convolution of $A^{*}$ and $W^{*}$.
3. Compute C by permuting $C^{*}$.

The method is given in [277], it is called Rader’s algorithm. We implement it in GP:

    ft_rader(a, is=+1)=
    \\ Fourier transform for prime lengths (Rader’s algorithm)
    {
        local(n, a0, c0, g, w);
        local(c, ixp, ixm, pa, pw, t);
        n = length(a);

        \\ constant terms:
        a0 = a[1];
        c0 = sum(j=1, n, a[j]);

        \\ prepare permutations:
        g = znprimroot(n);
        ixp = vector(n, j, lift( g^(j-1) ) );
        g = g^(-1);
        ixm = vector(n, j, lift( g^(j-1) ) );

        \\ permute sequence W:
        w = is*2*I*Pi/n;
        pw = vector(n-1, j, exp(w*ixp[j]) );

        \\ permute sequence A:
        pa = vector(n-1);
        for (j=1, n-1, pa[j]=a[1+ixp[1+n-j]] );

        \\ cyclic convolution of permuted sequences:
        t = cconv(pa, pw);  \\ cyclic convolution

        \\ set C_0, and add A_0 to each C_k:
        c = vector(n);
        c[1] = c0;
        for (k=1, n-1, c[1+k]=t[k]+a0);

        \\ permute to obtain result:
        t = vector(n);
        t[1] = c[1];
        for (k=2, n, t[1+ixp[k-1]]=c[k]);
        return( t );
    }

With a (slow) implementation of the cyclic convolution and DFT we can check whether the method works by comparing the results:

    cconv(a, b)=
    /* Cyclic convolution (direct computation, n^2 operations) */
/* Example: cconv([a,b],[c,d]) ==> [b*d + c*a, a*d + c*b] */ { local(n, f, s, k, k2); n = length(a); f = vector(n); for (tau=0, n-1, \\ tau = k + k2 s0 = 0; k = 0; k2 = tau; while (k<=tau, s0 += (a[k+1]*b[k2+1]); k++; k2--); s1 = 0; k2 = n-1; \\ k=tau+1 while (k>1); for (ulong j=0; j void walsh_wak_dit2(Type *f, ulong ldn) // Transform wrt. to Walsh-Kronecker basis (wak-functions). // Radix-2 decimation in time (DIT) algorithm. { const ulong n = (1UL<>1); for (ulong r=0; r void walsh_wak_dif2(Type *f, ulong ldn) { const ulong n = (1UL<=1; --ldm) { [--snip--] // same block as in DIT routine } } The basis functions are shown in figure 23.1-A. The lowest row is (the signed version of) the Thue-Morse sequence, see section 1.16.4 on page 44. A routine that computes the k-th basis function of the transform is [FXT: walsh/walsh-basis.h]: 1 2 3 4 5 6 7 8 9 10 template void walsh_wak_basis(Type *f, ulong n, ulong k) { for (ulong i=0; i 1:   +Wn/2 +Wn/2 Wn = = W2 ⊗ Wn/2 +Wn/2 −Wn/2 (23.3-13) (23.3-14) To see that this relation is the statement of a fast algorithm, split the (to be transformed) vector x into halves   x0 x = (23.3-15) x1 and write out the matrix-vector product     Wn/2 x0 + Wn/2 x1 Wn/2 (x0 + x1 ) Wn x = = Wn/2 x0 − Wn/2 x1 Wn/2 (x0 − x1 ) (23.3-16) That is, a length-n transform can be computed by two length-n/2 transforms of the sum and difference of the first and second half of x. We define a notation equivalent to the product sign, n O Mk := M1 ⊗ M2 ⊗ M3 ⊗ . . . ⊗ Mn (23.3-17) k=1 where the empty product equals a 1 × 1 matrix with entry 1. If A = B in relation 23.3-11b, then we have (A ⊗ A)−1 = A−1 ⊗ A−1 , (A ⊗ A ⊗ A)−1 = A−1 ⊗ A−1 ⊗ A−1 and so on. That is, n O !−1 A = k=1 n O A−1 (23.3-18) k=1 For the Walsh transform we have log2 (n) Wn = O k=1 W2 (23.3-19) 23.4: Higher radix Walsh transforms 465 and log2 (n) Wn−1 O = W2−1 (23.3-20) k=1 The latter relation isn’t that exciting as W2−1 = W2 for the Walsh transform. 
However, it also holds if the inverse transform is different from the forward transform. Given a fast algorithm for some transform in the form of a Kronecker product, the fast algorithm for the inverse transform is immediate.

The direct sum of two matrices is defined as

\[ A \oplus B := \begin{bmatrix} A & 0 \\ 0 & B \end{bmatrix} \tag{23.3-21} \]

In general $A \oplus B \neq B \oplus A$. As an analogue to the sum sign we have

\[ \bigoplus_{k=1}^{n} A := I_n \otimes A \tag{23.3-22} \]

where $I_n$ is the n×n identity matrix. The matrix $I_n \otimes A$ consists of n copies of A that lie on the diagonal.

The Kronecker product can be used to derive properties of unitary transforms, see [282]. In [236] the properties of the Kronecker product are used to develop all well-known algorithms for computing the Fourier transform.

23.4 Higher radix Walsh transforms

23.4.1 Generated transforms

A generator for short-length Walsh (wak) transforms is given as [FXT: fft/gen-walsh-demo.cc]. It can create code for DIF and DIT transforms. For example, the code for the 4-point DIF transform is

    template <typename Type>
    inline void
    short_walsh_wak_dif_4(Type *f)
    {
        Type t0, t1, t2, t3;
        t0 = f[0];
        t1 = f[1];
        t2 = f[2];
        t3 = f[3];
        sumdiff( t0, t2 );
        sumdiff( t1, t3 );
        sumdiff( t0, t1 );
        sumdiff( t2, t3 );
        f[0] = t0;
        f[1] = t1;
        f[2] = t2;
        f[3] = t3;
    }

To make the code more readable we use the function [FXT: aux0/sumdiff.h]:

    template <typename Type>
    static inline void sumdiff(Type &a, Type &b)
    // {a, b} <--| {a+b, a-b}
    { Type t=a-b; a+=b; b=t; }

We further need a variant that transforms elements which are not contiguous but lie apart by a distance s:

    template <typename Type>
    inline void
    short_walsh_wak_dif_4(Type *f, ulong s)
    {
        Type t0, t1, t2, t3;
        {
            ulong x = 0;
            t0 = f[x];  x += s;
            t1 = f[x];  x += s;
            t2 = f[x];  x += s;
            t3 = f[x];
        }
        sumdiff( t0, t2 );
        sumdiff( t1, t3 );
        sumdiff( t0, t1 );
        sumdiff( t2, t3 );
        {
            ulong x = 0;
            f[x] = t0;  x += s;
            f[x] = t1;  x += s;
            f[x] = t2;  x += s;
            f[x] = t3;
        }
    }

The short-length transforms (DIF and DIT variants) are given in [FXT: walsh/shortwalshwakdif.h] and [FXT: walsh/shortwalshwakdit.h], respectively.

A radix-4 DIF transform using these as ingredients is [FXT: walsh/walshwak4.h]:

    template <typename Type>
    void walsh_wak_dif4(Type *f, ulong ldn)
    // Transform with respect to the Walsh-Kronecker basis (wak-functions).
    // Radix-4 decimation in frequency (DIF) algorithm.
    // Self-inverse.
    {
        const ulong n = (1UL<<ldn);
        [--snip--]
        for (ulong ldm=ldn; ldm>3; ldm-=2)
        {
            ulong m = (1UL<<ldm);
            ulong m4 = (m>>2);
            for (ulong r=0; r<n; r+=m)
            [--snip--]
    }

[...]

    template <typename Type>
    void walsh_wak_dif8(Type *f, ulong ldn)
    // Transform with respect to the Walsh-Kronecker basis (wak-functions).
    // Radix-8 decimation in frequency (DIF) algorithm.
    // Self-inverse.
    {
        const ulong n = (1UL<<ldn);
        [--snip--]
        for (ldm=ldn; ldm>xx; ldm-=3)
        {
            ulong m = (1UL<<ldm);
            ulong m8 = (m>>3);
            for (ulong r=0; r<n; r+=m)
            [--snip--]
    }

[...]

    template <typename Type>
    void walsh_wak_matrix(Type *f, ulong ldn)
    {
        ulong ldc = (ldn>>1);
        ulong ldr = ldn-ldc;  // ldr>=ldc
        ulong nc = (1UL<<ldc);
        ulong nr = (1UL<<ldr);  // nrow >= ncol
        for (ulong r=0; r<nr; ++r)
        [--snip--]
    }

[...]

    template <typename Type>
    void walsh_wak_matrix_1(Type *f, ulong ldn, int is)
    {
        ulong ldc = (ldn>>1);
        ulong ldr = ldn-ldc;  // ldr>=ldc
        if ( is<0 )  swap2(ldr, ldc);  // inverse
        ulong nc = (1UL<<ldc);
        ulong nr = (1UL<<ldr);  // nrow >= ncol
        for (ulong r=0; r<nr; ++r)
        [--snip--]
    }

[...] 13 and the performance becomes more and more memory bound.

In the first region the radix-4 routine is the fastest. The radix-8 routine comes close but, somewhat surprisingly, never wins.

In the second region the matrix version is the best. However, for very large sizes its performance could be better. Note that with odd ldn (not shown) its performance drops significantly due to the more expensive transposition operation. The transposition is clearly the bottleneck. One can use machine-specific optimizations for the transposition to further improve the performance.

In the next section we give an algorithm that avoids the transposition completely and consistently outperforms the matrix algorithm.
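The way the strided 4-point transform replaces two radix-2 stages can be checked with a small self-contained sketch. The names `walsh_wak_dif2_ref`, `short_dif_4`, `walsh_wak_dif4_sketch` and `radix4_matches_radix2` are hypothetical and not part of FXT. As a simplifying assumption, an odd ldn is handled here by one leading radix-2 stage, which need not be how the library treats that case.

```cpp
#include <cassert>
#include <vector>

typedef unsigned long ulong;

template <typename Type>
static inline void sumdiff(Type &a, Type &b)
// {a, b} <--| {a+b, a-b}
{ Type t = a - b;  a += b;  b = t; }

// Plain radix-2 DIF Walsh (wak) transform, used as reference.
template <typename Type>
void walsh_wak_dif2_ref(Type *f, ulong ldn)
{
    const ulong n = (1UL << ldn);
    for (ulong ldm = ldn; ldm >= 1; --ldm)
    {
        const ulong m = (1UL << ldm), mh = (m >> 1);
        for (ulong r = 0; r < n; r += m)
            for (ulong j = 0; j < mh; ++j)  sumdiff(f[r+j], f[r+j+mh]);
    }
}

// 4-point DIF transform on elements that lie apart by a distance s.
template <typename Type>
inline void short_dif_4(Type *f, ulong s)
{
    Type t0 = f[0], t1 = f[s], t2 = f[2*s], t3 = f[3*s];
    sumdiff(t0, t2);  sumdiff(t1, t3);
    sumdiff(t0, t1);  sumdiff(t2, t3);
    f[0] = t0;  f[s] = t1;  f[2*s] = t2;  f[3*s] = t3;
}

// Radix-4 DIF sketch: each pass does the work of two radix-2 stages.
template <typename Type>
void walsh_wak_dif4_sketch(Type *f, ulong ldn)
{
    assert(ldn < 8 * sizeof(ulong));
    const ulong n = (1UL << ldn);
    ulong ldm = ldn;
    if ( ldm & 1 )  // assumption of this sketch: odd ldn gets one radix-2 stage
    {
        const ulong mh = (n >> 1);
        for (ulong j = 0; j < mh; ++j)  sumdiff(f[j], f[j+mh]);
        --ldm;
    }
    for ( ; ldm >= 2; ldm -= 2)
    {
        const ulong m = (1UL << ldm), m4 = (m >> 2);
        for (ulong r = 0; r < n; r += m)
            for (ulong j = 0; j < m4; ++j)  short_dif_4(f + r + j, m4);
    }
}

// Compare both routines on integer test data (exact arithmetic).
bool radix4_matches_radix2(ulong ldn)
{
    const ulong n = (1UL << ldn);
    std::vector<long> a(n), b;
    for (ulong k = 0; k < n; ++k)  a[k] = (long)((k*k + 3*k + 1) % 17) - 8;
    b = a;
    walsh_wak_dif2_ref(a.data(), ldn);
    walsh_wak_dif4_sketch(b.data(), ldn);
    return a == b;
}
```

Integer data is used so that the comparison is exact; with floating-point types the two routines may differ by rounding because the additions occur in a different order.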
23.5 Localized Walsh transforms

A decimation in time (DIT) algorithm combines the halves of the array, then the halves of the halves, the halves of each quarter, and so on. With each step the whole array is accessed, which leads to a drop in performance as soon as the array does not fit into the cache.

23.5.1 The method of localization

We can reorganize the algorithm as follows: combine the halves of the array and postpone further processing of the upper half, then combine the halves of the lower half and again postpone processing of its upper half. Repeat until size 2 is reached. Then use the algorithm at the postponed parts, starting with the smallest (last postponed).

For size 16 the scheme can be sketched as follows:

    hhhhhhhhhhhhhhhh
    hhhhhhhh44444444
    hhhh333344444444
    hh22333344444444

The letters ‘h’ denote places processed before any recursive call. The blocks of twos, threes and fours denote postponed blocks. The Walsh transform is thereby decomposed into a sequence of Haar transforms (see figure 24.6-A on page 508).
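The sketch for size 16 can be reproduced for other sizes with a few lines of code. The function name `localization_scheme` is hypothetical. The labelling rule is an assumption inferred from the size-16 example: the postponed block that starts at index 2^ldm is marked with the digit ldm+1.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef unsigned long ulong;

// Rows of the localization scheme for an array of length 2^ldn.
// Row 0 is all 'h'; each further row postpones the upper half of
// the block that is still being processed.
std::vector<std::string> localization_scheme(ulong ldn)
{
    assert(ldn >= 1 && ldn <= 9);  // single-digit labels only
    const ulong n = (1UL << ldn);
    std::vector<std::string> rows;
    std::string row(n, 'h');
    rows.push_back(row);
    for (ulong ldm = ldn - 1; ldm >= 1; --ldm)
    {
        // postponed block: indices 2^ldm ... 2^(ldm+1)-1
        for (ulong k = (1UL << ldm); k < (2UL << ldm); ++k)
            row[k] = char('0' + ldm + 1);  // assumed label: ldm+1
        rows.push_back(row);
    }
    return rows;
}
```

Calling `localization_scheme(4)` yields the four rows shown for size 16; the postponed blocks are then processed from the smallest (last row) to the largest.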
The algorithm described is most easily implemented via recursion:

    template <typename Type>
    void walsh_wak_loc_dit2(Type *f, ulong ldn)
    {
        if ( ldn<1 )  return;

        // Recursion:
        for (ulong ldm=1; ldm<ldn; ++ldm)  walsh_wak_loc_dit2(f+(1UL<<ldm), ldm);

        for (ulong ldm=1; ldm<=ldn; ++ldm)
        {
            const ulong m = (1UL<<ldm);
            const ulong mh = (m>>1);
            for (ulong t1=0, t2=mh;  t1<mh;  ++t1, ++t2)  sumdiff(f[t1], f[t2]);
        }
    }

[...]

    template <typename Type>
    void walsh_wak_loc_dit2(Type *f, ulong ldn)
    {
        if ( ldn<=13 )  // parameter: (2**13)*sizeof(Type) <= L1-cache
        {
            walsh_wak_dif4(f,ldn);  // note: DIF version, result is the same
            return;
        }

        // Recursion:
        short_walsh_wak_dit_2(f+2);  // ldm==1
        short_walsh_wak_dit_4(f+4);  // ldm==2
        short_walsh_wak_dit_8(f+8);  // ldm==3
        short_walsh_wak_dit_16(f+16);  // ldm==4
        for (ulong ldm=5; ldm<ldn; ++ldm)  walsh_wak_loc_dit2(f+(1UL<<ldm), ldm);

        for (ulong ldm=1; ldm<=ldn; ++ldm)
        {
            const ulong m = (1UL<<ldm);
            const ulong mh = (m>>1);
            for (ulong t1=0, t2=mh;  t1<mh;  ++t1, ++t2)  sumdiff(f[t1], f[t2]);
        }
    }

     8 == ldn;  MemSize ==   2 kB ==    256 doubles;  rep == 976563
      walsh_wak_dif2(f,ldn);         dt= 2.49551    MB/s=  6114   rel= 1
      walsh_wak_dif4(f,ldn);         dt= 1.56806    MB/s=  9731   rel= 0.628352 *
      walsh_wak_dif8(f,ldn);         dt= 1.57419    MB/s=  9693   rel= 0.63081
      walsh_wak_matrix(f,ldn);       dt= 2.28047    MB/s=  6691   rel= 0.91383
      walsh_wak_matrix_1(f,ldn,+1);  dt= 1.94357    MB/s=  7851   rel= 0.778827
    10 == ldn;  MemSize ==   8 kB ==   1024 doubles;  rep == 195313
      walsh_wak_dif2(f,ldn);         dt= 2.26683    MB/s=  6731   rel= 1
      walsh_wak_dif4(f,ldn);         dt= 1.47338    MB/s= 10356   rel= 0.649977 *
      walsh_wak_dif8(f,ldn);         dt= 1.65262    MB/s=  9233   rel= 0.729044
      walsh_wak_matrix(f,ldn);       dt= 1.91859    MB/s=  7953   rel= 0.846378
      walsh_wak_matrix_1(f,ldn,+1);  dt= 1.69215    MB/s=  9017   rel= 0.746485
    12 == ldn;  MemSize ==  32 kB ==   4096 doubles;  rep == 20345
      walsh_wak_dif2(f,ldn);         dt= 1.0884     MB/s=  7010   rel= 1
      walsh_wak_dif4(f,ldn);         dt= 0.723136   MB/s= 10550   rel= 0.664403 *
      walsh_wak_dif8(f,ldn);         dt= 0.790313   MB/s=  9654   rel= 0.726124
      walsh_wak_matrix(f,ldn);       dt= 1.01233    MB/s=  7536   rel= 0.930112
      walsh_wak_matrix_1(f,ldn,+1);  dt= 0.926387   MB/s=  8236   rel= 0.851146
    14 == ldn;  MemSize == 128 kB ==  16384 doubles;  rep == 2180
      walsh_wak_dif2(f,ldn);         dt= 1.17042    MB/s=  3260   rel= 1
      walsh_wak_dif4(f,ldn);         dt= 1.14861    MB/s=  3321   rel= 0.981368
      walsh_wak_dif8(f,ldn);         dt= 1.08501    MB/s=  3516   rel= 0.927026
      walsh_wak_matrix(f,ldn);       dt= 0.669182   MB/s=  5701   rel= 0.571747
      walsh_wak_matrix_1(f,ldn,+1);  dt= 0.552063   MB/s=  6910   rel= 0.471681 *
    16 == ldn;  MemSize == 512 kB ==  65536 doubles;  rep == 477
      walsh_wak_dif2(f,ldn);         dt= 1.40004    MB/s=  2726   rel= 1
      walsh_wak_dif4(f,ldn);         dt= 1.70347    MB/s=  2240   rel= 1.21673
      walsh_wak_dif8(f,ldn);         dt= 1.12997    MB/s=  3377   rel= 0.807095
      walsh_wak_matrix(f,ldn);       dt= 0.801902   MB/s=  4759   rel= 0.572769
      walsh_wak_matrix_1(f,ldn,+1);  dt= 0.628073   MB/s=  6076   rel= 0.448609 *
    18 == ldn;  MemSize ==   2 MB ==  256 K doubles;  rep == 106
      walsh_wak_dif2(f,ldn);         dt= 2.61599    MB/s=  1459   rel= 1
      walsh_wak_dif4(f,ldn);         dt= 2.55153    MB/s=  1496   rel= 0.975359
      walsh_wak_dif8(f,ldn);         dt= 1.9791     MB/s=  1928   rel= 0.756538
      walsh_wak_matrix(f,ldn);       dt= 1.77306    MB/s=  2152   rel= 0.677776
      walsh_wak_matrix_1(f,ldn,+1);  dt= 1.14735    MB/s=  3326   rel= 0.438591 *
    20 == ldn;  MemSize ==   8 MB == 1024 K doubles;  rep == 24
      walsh_wak_dif2(f,ldn);         dt= 2.64158    MB/s=  1454   rel= 1
      walsh_wak_dif4(f,ldn);         dt= 2.8532     MB/s=  1346   rel= 1.08011
      walsh_wak_dif8(f,ldn);         dt= 2.34867    MB/s=  1635   rel= 0.889113
      walsh_wak_matrix(f,ldn);       dt= 1.88431    MB/s=  2038   rel= 0.713327
      walsh_wak_matrix_1(f,ldn,+1);  dt= 1.21084    MB/s=  3171   rel= 0.458376 *
    22 == ldn;  MemSize ==  32 MB == 4096 K doubles;  rep == 5
      walsh_wak_dif2(f,ldn);         dt= 2.43537    MB/s=  1445   rel= 1
      walsh_wak_dif4(f,ldn);         dt= 2.82337    MB/s=  1247   rel= 1.15932
      walsh_wak_dif8(f,ldn);         dt= 2.07422    MB/s=  1697   rel= 0.851708
      walsh_wak_matrix(f,ldn);       dt= 1.99251    MB/s=  1767   rel= 0.818155
      walsh_wak_matrix_1(f,ldn,+1);  dt= 1.22719    MB/s=  2868   rel= 0.503901 *
    24 == ldn;  MemSize == 128 MB == 16384 K doubles;  rep == 1
      walsh_wak_dif2(f,ldn);         dt= 2.10939    MB/s=  1456   rel= 1
      walsh_wak_dif4(f,ldn);         dt= 2.61517    MB/s=  1175   rel= 1.23977
      walsh_wak_dif8(f,ldn);         dt= 2.11508    MB/s=  1452   rel= 1.0027
      walsh_wak_matrix(f,ldn);       dt= 2.16597    MB/s=  1418   rel= 1.02683
      walsh_wak_matrix_1(f,ldn,+1);  dt= 1.28349    MB/s=  2393   rel= 0.608466 *

Figure 23.4-A: Relative speed of different implementations of the Walsh (wak) transform. The transforms were run ‘rep’ times for each measurement. The quantity ‘dt’ gives the elapsed time for rep transforms of the given type. The quantity ‘MB/s’ gives the memory transfer rate as if a radix-2 algorithm were used; it equals ‘Memsize’ times ‘ldn’ divided by the time elapsed for a single transform. The ‘rel’ gives the performance relative to the radix-2 version; smaller values mean better performance. An asterisk marks the fastest routine for each size.

[...]

    template <typename Type>
    void walsh_wak_loc_dif2(Type *f, ulong ldn)
    {
        if ( ldn<=13 )  // parameter: (2**13)*sizeof(Type) <= L1-cache
        {
            walsh_wak_dif4(f,ldn);
            return;
        }

        for (ulong ldm=ldn; ldm>=1; --ldm)
        {
            const ulong m = (1UL<<ldm);
            const ulong mh = (m>>1);
            for (ulong t1=0, t2=mh;  t1<mh;  ++t1, ++t2)  sumdiff(f[t1], f[t2]);
        }
        [--snip--]
    }

[...]

    template <typename Type>
    inline void
    short_walsh_wak_dif_8(Type *f)
    {
        Type t0, t1, t2, t3, t4, t5, t6, t7;
        t0 = f[0];  t1 = f[1];  t2 = f[2];  t3 = f[3];
        t4 = f[4];  t5 = f[5];  t6 = f[6];  t7 = f[7];
        sumdiff( t0, t4 );  sumdiff( t1, t5 );  sumdiff( t2, t6 );  sumdiff( t3, t7 );
        sumdiff( t0, t2 );  sumdiff( t1, t3 );  sumdiff( t4, t6 );  sumdiff( t5, t7 );
        sumdiff( t0, t1 );  sumdiff( t2, t3 );  sumdiff( t4, t5 );  sumdiff( t6, t7 );
        f[0] = t0;  f[1] = t1;  f[2] = t2;  f[3] = t3;
        f[4] = t4;  f[5] = t5;  f[6] = t6;  f[7] = t7;
    }

The strategy used leads to a very favorable memory access pattern that results in excellent performance for large transforms. Figure 23.5-A shows a comparison between the localized transforms and the matrix algorithm. Small sizes are omitted because the localized algorithm has the same speed as the radix-4 algorithm it falls back to. The localized algorithms are the clear winners, even against the matrix algorithm with only one transposition.
For very large transforms the DIF version is slightly faster, as 23.5: Localized Walsh transforms 14 == ldn; MemSize == 128 kB == walsh_wak_matrix(f,ldn); walsh_wak_matrix_1(f,ldn,+1); walsh_wak_loc_dit2(f,ldn); walsh_wak_loc_dif2(f,ldn); 16 == ldn; MemSize == 512 kB == walsh_wak_matrix(f,ldn); walsh_wak_matrix_1(f,ldn,+1); walsh_wak_loc_dit2(f,ldn); walsh_wak_loc_dif2(f,ldn); 18 == ldn; MemSize == 2 MB == walsh_wak_matrix(f,ldn); walsh_wak_matrix_1(f,ldn,+1); walsh_wak_loc_dit2(f,ldn); walsh_wak_loc_dif2(f,ldn); 20 == ldn; MemSize == 8 MB == walsh_wak_matrix(f,ldn); walsh_wak_matrix_1(f,ldn,+1); walsh_wak_loc_dit2(f,ldn); walsh_wak_loc_dif2(f,ldn); 22 == ldn; MemSize == 32 MB == walsh_wak_matrix(f,ldn); walsh_wak_matrix_1(f,ldn,+1); walsh_wak_loc_dit2(f,ldn); walsh_wak_loc_dif2(f,ldn); 24 == ldn; MemSize == 128 MB == walsh_wak_matrix(f,ldn); walsh_wak_matrix_1(f,ldn,+1); walsh_wak_loc_dit2(f,ldn); walsh_wak_loc_dif2(f,ldn); 471 16384 doubles; rep == 2180 dt= 0.672327 MB/s= 5674 dt= 0.555851 MB/s= 6863 dt= 0.498558 MB/s= 7652 dt= 0.533746 MB/s= 7148 65536 doubles; rep == 477 dt= 0.919579 MB/s= 4150 dt= 0.692488 MB/s= 5511 dt= 0.653256 MB/s= 5842 dt= 0.670104 MB/s= 5695 256 K doubles; rep == 106 dt= 2.2111 MB/s= 1726 dt= 1.36827 MB/s= 2789 dt= 0.938006 MB/s= 4068 dt= 0.927804 MB/s= 4113 1024 K doubles; rep == 24 dt= 2.31178 MB/s= 1661 dt= 1.42614 MB/s= 2693 dt= 1.11847 MB/s= 3433 dt= 1.11142 MB/s= 3455 4096 K doubles; rep == 5 dt= 2.00573 MB/s= 1755 dt= 1.23695 MB/s= 2846 dt= 1.16461 MB/s= 3022 dt= 1.16164 MB/s= 3030 16384 K doubles; rep == 1 dt= 2.16536 MB/s= 1419 dt= 1.28455 MB/s= 2392 dt= 1.10769 MB/s= 2773 dt= 1.10601 MB/s= 2778 rel= rel= rel= rel= 1 0.826756 0.741541 * 0.793878 rel= rel= rel= rel= 1 0.753049 0.710386 * 0.728707 rel= rel= rel= rel= 1 0.618819 0.424225 0.419611 * rel= rel= rel= rel= 1 0.616901 0.483811 0.480765 * rel= rel= rel= rel= 1 0.616707 0.580644 0.579162 * rel= rel= rel= rel= 1 0.593226 0.511552 0.510775 * Figure 23.5-A: Speed comparison 
between localized and matrix algorithms for the Walsh transform. it starts with smaller chunks of data and therefore more of the data is in the cache when the larger sub-arrays get accessed. The localized algorithm can easily be implemented for transforms where a radix-2 step is known. Section 25.8 on page 529 gives the fast Hartley transform variant of the localized algorithm. Similar routines with higher radix can be developed. However, a radix-4 version was found to be slower than the given routines. A speedup can be achieved by unrolling and prefetching. We use the C-type double whose size is 8 bytes. Substitute the double loop in the DIF version (that is, the Haar transform) by 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 // machine-specific prefetch instruction: #define PREF(p,o) asm volatile ("prefetchw " #o "(%0) " : : "r" (p) ) ulong ldm; for (ldm=ldn; ldm>=6; --ldm) { const ulong m = (1UL<>1); PREF(f, 0); PREF(f+mh, 0); PREF(f, 64); PREF(f+mh, 64); PREF(f, 128); PREF(f+mh, 128); PREF(f, 192); PREF(f+mh, 192); for (ulong t1=0, t2=mh; t1=1; --ldm) { const ulong m = (1UL<>1); for (ulong t1=0, t2=mh; t1 inline void haar_dif2(Type *f, ulong n) { for (ulong m=n; m>=2; m>>=1) { const ulong mh = (m>>1); for (ulong t1=0, t2=mh; t1 void loc_dif2(Type *f, ulong n) { haar_dif2(f, n); for (ulong z=2; z inline void haar_dit2(Type *f, ulong n) { for (ulong m=1; m<=n; m<<=1) { const ulong mh = (m>>1); for (ulong t1=0, t2=mh; t1 void loc_dit2(Type f, ulong n) { for (ulong z=2, u=1; z void walsh_pal(Type *f, ulong ldn) { const ulong n = 1UL< void walsh_pal_basis(Type *f, ulong n, ulong k) (23.6-1) (23.6-2) 474 Chapter 23: The Walsh transform and its relatives 0: [ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ] ( 0) 1: [ * * * * * * * * * * * * * * * * ] ( 1) 2: [ * * * * * * * * * * * * * * * * ] ( 3) 3: [ * * * * * * * * * * * * * * * * ] ( 2) 4: [ * * * * * * * * * * * * * * * * ] ( 7) 5: [ * * * * * * * * * * * * * * * * ] ( 6) 
6: [ * * * * * * * * * * * * * * * * ] ( 4) 7: [ * * * * * * * * * * * * * * * * ] ( 5) 8: [ * * * * * * * * * * * * * * * * ] (15) 9: [ * * * * * * * * * * * * * * * * ] (14) 10: [ * * * * * * * * * * * * * * * * ] (12) 11: [ * * * * * * * * * * * * * * * * ] (13) 12: [ * * * * * * * * * * * * * * * * ] ( 8) 13: [ * * * * * * * * * * * * * * * * ] ( 9) 14: [ * * * * * * * * * * * * * * * * ] (11) 15: [ * * * * * * * * * * * * * * * * ] (10) 16: [ * * * * * * * * * * * * * * * * ] (31) 17: [ * * * * * * * * * * * * * * * * ] (30) 18: [ * * * * * * * * * * * * * * * * ] (28) 19: [ * * * * * * * * * * * * * * * * ] (29) 20: [ * * * * * * * * * * * * * * * * ] (24) 21: [ * * * * * * * * * * * * * * * * ] (25) 22: [ * * * * * * * * * * * * * * * * ] (27) 23: [ * * * * * * * * * * * * * * * * ] (26) 24: [ * * * * * * * * * * * * * * * * ] (16) 25: [ * * * * * * * * * * * * * * * * ] (17) 26: [ * * * * * * * * * * * * * * * * ] (19) 27: [ * * * * * * * * * * * * * * * * ] (18) 28: [ * * * * * * * * * * * * * * * * ] (23) 29: [ * * * * * * * * * * * * * * * * ] (22) 30: [ * * * * * * * * * * * * * * * * ] (20) 31: [ * * * * * * * * * * * * * * * * ] (21) Figure 23.6-A: Walsh-Paley basis. Asterisks denote the value +1, blank entries denote −1. 
3 4 5 6 7 8 9 10 11 { k = revbin(k, ld(n)); for (ulong i=0; i 23.7: Sequency-ordered Walsh transforms 475 0: [ * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * ] ( 0) 1: [ * * * * * * * * * * * * * * * * ] ( 1) 2: [ * * * * * * * * * * * * * * * * ] ( 2) 3: [ * * * * * * * * * * * * * * * * ] ( 3) 4: [ * * * * * * * * * * * * * * * * ] ( 4) 5: [ * * * * * * * * * * * * * * * * ] ( 5) 6: [ * * * * * * * * * * * * * * * * ] ( 6) 7: [ * * * * * * * * * * * * * * * * ] ( 7) 8: [ * * * * * * * * * * * * * * * * ] ( 8) 9: [ * * * * * * * * * * * * * * * * ] ( 9) 10: [ * * * * * * * * * * * * * * * * ] (10) 11: [ * * * * * * * * * * * * * * * * ] (11) 12: [ * * * * * * * * * * * * * * * * ] (12) 13: [ * * * * * * * * * * * * * * * * ] (13) 14: [ * * * * * * * * * * * * * * * * ] (14) 15: [ * * * * * * * * * * * * * * * * ] (15) 16: [ * * * * * * * * * * * * * * * * ] (16) 17: [ * * * * * * * * * * * * * * * * ] (17) 18: [ * * * * * * * * * * * * * * * * ] (18) 19: [ * * * * * * * * * * * * * * * * ] (19) 20: [ * * * * * * * * * * * * * * * * ] (20) 21: [ * * * * * * * * * * * * * * * * ] (21) 22: [ * * * * * * * * * * * * * * * * ] (22) 23: [ * * * * * * * * * * * * * * * * ] (23) 24: [ * * * * * * * * * * * * * * * * ] (24) 25: [ * * * * * * * * * * * * * * * * ] (25) 26: [ * * * * * * * * * * * * * * * * ] (26) 27: [ * * * * * * * * * * * * * * * * ] (27) 28: [ * * * * * * * * * * * * * * * * ] (28) 29: [ * * * * * * * * * * * * * * * * ] (29) 30: [ * * * * * * * * * * * * * * * * ] (30) 31: [ * * * * * * * * * * * * * * * * ] (31) Figure 23.7-A: The Walsh-Kacmarz basis is sequency-ordered. Asterisks denote +1, and blanks −1. 2 3 4 5 6 7 8 9 10 11 12 13 14 15 void walsh_wal_basis(Type *f, ulong n, ulong k) { k = revbin(k, ld(n)+1); k = gray_code(k); // // =^= // k = revbin(k, ld(n)); // k = rev_gray_code(k); for (ulong i=0; i void walsh_wal_dif2_core(Type *f, ulong ldn) // Core routine for sequency-ordered Walsh transform. 
// Radix-2 decimation in frequency (DIF) algorithm. { const ulong n = (1UL<=2; --ldm) { const ulong m = (1UL<>1); const ulong m4 = (mh>>1); for (ulong r=0; r>1); for (ulong r=0; r inline void walsh_wal(Type *f, ulong ldn) { revbin_permute(f, (1UL< inline void walsh_wal_rev(Type *f, ulong ldn) { revbin_permute(f, (1UL< void walsh_wal_rev_basis(Type *f, ulong n, ulong k) { k = revbin(k, ld(n)); 478 5 6 7 8 9 10 11 12 13 14 15 // // Chapter 23: The Walsh transform and its relatives k = gray_code(k); // =^= k = rev_gray_code(k); k = revbin(k, ld(n)); for (ulong i=0; i void walsh_q1(Type *f, ulong ldn) { ulong n = 1UL << ldn; grs_negate(f, n); walsh_gray(f, ldn); revbin_permute(f, n); } The routine walsh_gray() is given in [FXT: walsh/walshgray.h]: 1 2 template void walsh_gray(Type *f, ulong ldn) 23.7: Sequency-ordered Walsh transforms 479 ldm=4 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ldm=3 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ldm=2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ldm=1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Figure 23.7-D: Data flow for the length-16 Walsh-Gray routine. 
3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 { const ulong n = (1UL<0; --ldm) // dif { const ulong m = (1UL< void walsh_q2(Type *f, ulong ldn) { ulong n = 1UL << ldn; revbin_permute(f, n); grs_negate(f, n); walsh_gray(f, ldn); // =^= // grs_negate(f, n); // revbin_permute(f, n); // walsh_gray(f, ldn); } The transform could be computed by the following statements: 480 Chapter 23: The Walsh transform and its relatives 0: [ * * * * * * * * * * * * * * * * * * * * ] (16) 1: [ * * * * * * * * * * * * * * * * ] (16) 2: [ * * * * * * * * * * * * * * * * ] (16) 3: [ * * * * * * * * * * * * ] (16) 4: [ * * * * * * * * * * * * * * * * ] (16) 5: [ * * * * * * * * * * * * ] (16) 6: [ * * * * * * * * * * * * ] (16) 7: [ * * * * * * * * * * * * * * * * ] (16) 8: [ * * * * * * * * * * * * * * * * ] (16) 9: [ * * * * * * * * * * * * * * * * * * * * ] (16) 10: [ * * * * * * * * * * * * ] (16) 11: [ * * * * * * * * * * * * * * * * ] (16) 12: [ * * * * * * * * * * * * ] (16) 13: [ * * * * * * * * * * * * * * * * ] (16) 14: [ * * * * * * * * * * * * * * * * ] (16) 15: [ * * * * * * * * * * * * ] (16) 16: [ * * * * * * * * * * * * * * * * ] (15) 17: [ * * * * * * * * * * * * ] (15) 18: [ * * * * * * * * * * * * * * * * * * * * ] (15) 19: [ * * * * * * * * * * * * * * * * ] (15) 20: [ * * * * * * * * * * * * ] (15) 21: [ * * * * * * * * * * * * * * * * ] (15) 22: [ * * * * * * * * * * * * * * * * ] (15) 23: [ * * * * * * * * * * * * ] (15) 24: [ * * * * * * * * * * * * ] (15) 25: [ * * * * * * * * * * * * * * * * ] (15) 26: [ * * * * * * * * * * * * * * * * ] (15) 27: [ * * * * * * * * * * * * * * * * * * * * ] (15) 28: [ * * * * * * * * * * * * * * * * ] (15) 29: [ * * * * * * * * * * * * ] (15) 30: [ * * * * * * * * * * * * ] (15) 31: [ * * * * * * * * * * * * * * * * ] (15) Figure 23.7-E: Basis functions for a self-inverse Walsh transform (second form) that has sequencies n/2 and n/2 − 1 only. Asterisks denote the value +1, blank entries denote −1. 
ulong n = 1UL << ldn; revbin_permute(f, n); walsh_q1(f, ldn); revbin_permute(f, n); The basis functions of the transforms can be computed as follows [FXT: walsh/walsh-basis.h]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 template void walsh_q1_basis(Type *f, ulong n, ulong k) { ulong qk = (grs_negative_q(k) ? 1 : 0); k = gray_code(k); k = revbin(k, ld(n)); for (ulong i=0; i void walsh_q2_basis(Type *f, ulong n, ulong k) { ulong qk = (grs_negative_q(k) ? 1 : 0); k = revbin(k, ld(n)); k = gray_code(k); for (ulong i=0; i void dyadic_convolution(Type * restrict f, Type * restrict g, ulong ldn) // Dyadic convolution (XOR-convolution): h[] of f[] and g[]: // h[k] = sum( i XOR j == k, f[i]*g[k] ) // Result is written to g[]. // ldn := base-2 logarithm of the array length { walsh_wak(f, ldn); walsh_wak(g, ldn); const ulong n = (1UL< static inline void fht_mul(Type xi, Type xj, Type &yi, Type &yj, double v) // yi <-- v*( (yi + yj)*xi + (yi - yj)*xj ) == v*( (xi + xj)*yi + (xi - xj)*yj ) // yj <-- v*( (-yi + yj)*xi + (yi + yj)*xj ) == v*( (-xi + xj)*yi + (xi + xj)*yj ) { Type h1p = xi, h1m = xj; Type s1 = h1p + h1m, d1 = h1p - h1m; Type h2p = yi, h2m = yj; yi = (h2p * s1 + h2m * d1) * v; yj = (h2m * s1 - h2p * d1) * v; } 23.9 Slant transform The slant transform can be implemented using a Walsh Transform and just a little pre/post-processing [FXT: walsh/slant.cc]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 void slant(double *f, ulong ldn) { walsh_wak(f, ldn); ulong n = 1UL< void arith_transform_plus(Type *f, ulong ldn) // Arithmetic Transform (positive sign). 
    // Radix-2 decimation in frequency (DIF) algorithm.
    {
        const ulong n = (1UL<<ldn);
        for (ulong ldm=ldn; ldm>=1; --ldm)
        {
            const ulong m = (1UL<<ldm);
            const ulong mh = (m>>1);
            for (ulong r=0; r<n; r+=m)
            {
                ulong t1 = r;
                ulong t2 = r+mh;
                for (ulong j=0; j<mh; ++j, ++t1, ++t2)
                {
                    Type u = f[t1];
                    Type v = f[t2];
                    f[t1] = u;
                    f[t2] = u + v;
                }
            }
        }
    }

    template <typename Type>
    void arith_transform_minus(Type *f, ulong ldn)
    // Arithmetic Transform (negative sign).
    // Radix-2 decimation in frequency (DIF) algorithm.
    // Inverse of arith_transform_plus().
    {
        [--snip--]
        f[t1] = u;
        f[t2] = v - u;
        [--snip--]
    }

[Figure 23.10-A: Basis functions for the transform Y+ (left) and Y− (right). The values are ±1, or 0 (blank entries).]

The length-2 transforms can be written as

\[ Y_2^{+}\, v = \begin{bmatrix} +1 & 0 \\ +1 & +1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} a \\ a+b \end{bmatrix} \tag{23.10-1a} \]

\[ Y_2^{-}\, v = \begin{bmatrix} +1 & 0 \\ -1 & +1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} a \\ b-a \end{bmatrix} \tag{23.10-1b} \]

In Kronecker product notation (see section 23.3 on page 462) the transforms can be written as

\[ Y_n^{+} = \bigotimes_{k=1}^{\log_2(n)} Y_2^{+} \quad\text{where}\quad Y_2^{+} = \begin{bmatrix} +1 & 0 \\ +1 & +1 \end{bmatrix} \tag{23.10-2a} \]

\[ Y_n^{-} = \bigotimes_{k=1}^{\log_2(n)} Y_2^{-} \quad\text{where}\quad Y_2^{-} = \begin{bmatrix} +1 & 0 \\ -1 & +1 \end{bmatrix} \tag{23.10-2b} \]

The k-th element of the arithmetic transform Y+ is

\[ Y^{+}[a]_k = \sum_{i \subseteq k} a_i \tag{23.10-3a} \]

where $i \subseteq k$ means that the bits of i are a subset of the bits of k: $i \subseteq k \iff (i \wedge k) = i$. For the transform Y− we have

\[ Y^{-}[a]_k = (-1)^{p(k)} \sum_{i \subseteq k} (-1)^{p(i)}\, a_i = \sum_{i \subseteq k} (-1)^{p(k-i)}\, a_i \tag{23.10-3b} \]

where p(x) is the parity of x.
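The subset-sum formulas 23.10-3a and 23.10-3b can be checked against a direct in-place computation. The following is a minimal sketch with hypothetical names (`arith_step`, `subset_sum`, `check_arith_formulas`), not the FXT routines. It processes the stages in ascending order of block size, which gives the same result because the Kronecker factors act on different index bits.

```cpp
#include <cassert>
#include <vector>

typedef unsigned long ulong;

// Every butterfly maps (u, v) to (u, v + sign*u):
// sign=+1 gives Y+, sign=-1 gives Y-.
static void arith_step(std::vector<long> &f, ulong ldn, long sign)
{
    const ulong n = (1UL << ldn);
    assert(f.size() == n);
    for (ulong mh = 1; mh < n; mh <<= 1)
        for (ulong r = 0; r < n; r += 2*mh)
            for (ulong j = 0; j < mh; ++j)
                f[r+j+mh] += sign * f[r+j];
}

// Parity of the number of set bits of x.
static inline ulong parity(ulong x)
{
    ulong p = 0;
    while ( x )  { p ^= 1;  x &= x - 1; }
    return p;
}

// Sum of a[i] over all i that are bit-subsets of k.
// With signed_sum, the term a[i] carries the sign (-1)^p(k-i);
// for i a subset of k, k-i equals k XOR i.
static long subset_sum(const std::vector<long> &a, ulong k, bool signed_sum)
{
    long s = 0;
    for (ulong i = 0; i < a.size(); ++i)
        if ( (i & k) == i )
            s += (signed_sum && parity(k ^ i)) ? -a[i] : a[i];
    return s;
}

// True if both formulas hold and Y- undoes Y+ for length 2^ldn.
bool check_arith_formulas(ulong ldn)
{
    const ulong n = (1UL << ldn);
    std::vector<long> a(n);
    for (ulong k = 0; k < n; ++k)  a[k] = (long)((7*k + 3) % 11) - 5;

    std::vector<long> p = a, m = a;
    arith_step(p, ldn, +1);  // Y+
    arith_step(m, ldn, -1);  // Y-
    for (ulong k = 0; k < n; ++k)
    {
        if ( p[k] != subset_sum(a, k, false) )  return false;
        if ( m[k] != subset_sum(a, k, true) )   return false;
    }
    arith_step(p, ldn, -1);  // Y- applied to Y+[a]
    return p == a;
}
```

The brute-force sums cost O(n^2) and serve only as a check; the transforms themselves need n/2 additions per stage.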
23.10.1 Reversed arithmetic transform

[Figure 23.10-B: Basis functions for the transform B+ (left) and B− (right).]

We define the (mutually inverse) reversed arithmetic transforms B+ and B− via

\[ B_n^{+} = \bigotimes_{k=1}^{\log_2(n)} B_2^{+} \quad\text{where}\quad B_2^{+} = \begin{bmatrix} +1 & +1 \\ 0 & +1 \end{bmatrix} \tag{23.10-4a} \]

\[ B_n^{-} = \bigotimes_{k=1}^{\log_2(n)} B_2^{-} \quad\text{where}\quad B_2^{-} = \begin{bmatrix} +1 & -1 \\ 0 & +1 \end{bmatrix} \tag{23.10-4b} \]

The k-th element of the transform B+ is

\[ B^{+}[a]_k = \sum_{\bar{i} \subseteq \bar{k}} a_i = \sum_{k \subseteq i} a_i \tag{23.10-5} \]

where $\bar{k} = n-1-k$ is the complement of k: we have $e \subseteq f \iff \bar{f} \subseteq \bar{e}$.

A routine for the transform B+ is [FXT: walsh/arithtransform.h]:

    template <typename Type>
    void rev_arith_transform_plus(Type *f, ulong ldn)
    {
        [--snip--]
        f[t1] = u + v;
        f[t2] = v;
        [--snip--]
    }

The omitted lines are identical to the routine for Y+. The same transform could be computed by the statements:

    ulong n = 1UL << ldn;
    [--snip--]

    template <typename Type>
    void rev_arith_transform_minus(Type *f, ulong ldn)
    // Inverse of rev_arith_transform_plus().
    {
        [--snip--]
        f[t1] = u - v;
        f[t2] = v;
        [--snip--]
    }

23.10.2 Conversion to and from the Walsh transform ‡

To establish the relation to the Walsh transform recall that its decomposition as a Kronecker product is

\[ W_n = \bigotimes_{k=1}^{\log_2(n)} W_2 \quad\text{where}\quad W_2 = \begin{bmatrix} +1 & +1 \\ +1 & -1 \end{bmatrix} \tag{23.10-6} \]

We have $(W\,Y^{+})\,Y^{-} = W$, and the expression in parentheses is the matrix that converts the arithmetic transform Y− to the Walsh transform. Similarly, $(\tfrac{1}{2}\,Y^{+}\,W)\,W = Y^{+}$ gives the matrix for the conversion from the Walsh transform to the arithmetic transform Y+.
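The relation $(W\,Y^{+})\,Y^{-} = W$ can be tried out numerically. Because a Kronecker power of a product is the product of the Kronecker powers, it suffices to multiply the length-2 kernels and apply the result at every stage. The helper names `Mat2`, `mul`, `kron_apply`, `conversion_kernel` and `check_conversion` are hypothetical; this is a sketch for checking the identity, not an optimized routine.

```cpp
#include <array>
#include <cassert>
#include <vector>

typedef unsigned long ulong;
typedef std::array<long, 4> Mat2;  // row-major: {m00, m01, m10, m11}

// Product of two 2x2 matrices.
Mat2 mul(const Mat2 &a, const Mat2 &b)
{
    return Mat2{{ a[0]*b[0] + a[1]*b[2],  a[0]*b[1] + a[1]*b[3],
                  a[2]*b[0] + a[3]*b[2],  a[2]*b[1] + a[3]*b[3] }};
}

// Apply the ldn-fold Kronecker power of the kernel k to f, in place.
void kron_apply(std::vector<long> &f, ulong ldn, const Mat2 &k)
{
    const ulong n = (1UL << ldn);
    assert(f.size() == n);
    for (ulong mh = 1; mh < n; mh <<= 1)
        for (ulong r = 0; r < n; r += 2*mh)
            for (ulong j = 0; j < mh; ++j)
            {
                const long u = f[r+j], v = f[r+j+mh];
                f[r+j]    = k[0]*u + k[1]*v;
                f[r+j+mh] = k[2]*u + k[3]*v;
            }
}

// Length-2 kernel W2 * Y2+ of the conversion from Y- to Walsh.
Mat2 conversion_kernel()
{
    const Mat2 W2  = {{ 1, 1,  1, -1 }};
    const Mat2 Yp2 = {{ 1, 0,  1,  1 }};
    return mul(W2, Yp2);
}

// True if converting Y-[a] with the kernel above gives W[a].
bool check_conversion(ulong ldn)
{
    const Mat2 W2  = {{ 1, 1,  1, -1 }};
    const Mat2 Ym2 = {{ 1, 0, -1,  1 }};
    const ulong n = (1UL << ldn);
    std::vector<long> a(n);
    for (ulong k = 0; k < n; ++k)  a[k] = (long)((5*k + 2) % 13) - 6;

    std::vector<long> w = a, y = a;
    kron_apply(w, ldn, W2);                   // Walsh transform of a
    kron_apply(y, ldn, Ym2);                  // arithmetic transform Y- of a
    kron_apply(y, ldn, conversion_kernel());  // convert to Walsh
    return w == y;
}
```

The same helper can be used for the other three conversions by swapping the kernels; the two that involve the factor 1/2 would need a division, or a floating-point type, after the integer stages.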
We only need length-2 transforms to obtain the conversions:

\[ (W\,Y^{+})\,Y^{-} = W = \begin{bmatrix} +2 & +1 \\ 0 & -1 \end{bmatrix} Y^{-} \tag{23.10-7a} \]

\[ (W\,Y^{-})\,Y^{+} = W = \begin{bmatrix} 0 & +1 \\ +2 & -1 \end{bmatrix} Y^{+} \tag{23.10-7b} \]

\[ \left(\tfrac{1}{2}\,Y^{-}\,W\right) W = Y^{-} = \frac{1}{2} \begin{bmatrix} +1 & +1 \\ 0 & -2 \end{bmatrix} W \tag{23.10-7c} \]

\[ \left(\tfrac{1}{2}\,Y^{+}\,W\right) W = Y^{+} = \frac{1}{2} \begin{bmatrix} +1 & +1 \\ +2 & 0 \end{bmatrix} W \tag{23.10-7d} \]

The Kronecker product of the given matrices gives the converting transform. For example, using relation 23.10-7a, define

\[ T_n := \bigotimes_{k=1}^{\log_2(n)} \begin{bmatrix} +2 & +1 \\ 0 & -1 \end{bmatrix} \tag{23.10-8} \]

Then $T_n$ converts an arithmetic transform Y− to a Walsh transform: $W_n = T_n\, Y_n^{-}$.

The relations between the arithmetic transform, the Reed-Muller transform, and the Walsh transform are treated in [330].

23.11 Reed-Muller transform

The Reed-Muller transform is obtained from the arithmetic transform by working modulo 2: replace all + and - by XOR. The transform is self-inverse; its basis functions are identical to those of the arithmetic transform Y+, shown in figure 23.10-A on page 484. An implementation is [FXT: walsh/reedmuller.h]:

    template <typename Type>
    void word_reed_muller_dif2(Type *f, ulong ldn)
    // Reed-Muller Transform.
    // Radix-2 decimation in frequency (DIF) algorithm.
    // Self-inverse.
    // Type must have the XOR operator.
    {
        const ulong n = (1UL<<ldn);
        for (ulong ldm=ldn; ldm>=1; --ldm)
        {
            const ulong m = (1UL<<ldm);
            const ulong mh = (m>>1);
            for (ulong r=0; r<n; r+=m)
            {
                ulong t1 = r;
                ulong t2 = r+mh;
                for (ulong j=0; j<mh; ++j, ++t1, ++t2)
                {
                    Type u = f[t1];
                    Type v = f[t2];
                    f[t1] = u;
                    f[t2] = u ^ v;
                }
            }
        }
    }

[...]

    template <typename Type>
    inline void bit_reed_muller(Type *f, ulong ldn)
    {
        word_reed_muller_dif2(f, ldn);
        ulong n = 1UL << ldn;
        for (ulong k=0; k<n; ++k)  [--snip--]
    }

[...]

    Walsh: f[t1] = u + v;   =-->   Reed-Muller: f[t1] = u;
    Walsh: f[t2] = u - v;   =-->   Reed-Muller: f[t2] = u ^ v;

For the decimation in time algorithm, make the very same changes in walsh_wak_dit2(). The replacements for the reversed Reed-Muller transform are:

    Walsh: f[t1] = u + v;   =-->   reversed Reed-Muller: f[t1] = u ^ v;
    Walsh: f[t2] = u - v;   =-->   reversed Reed-Muller: f[t2] = v;

    blue               yellow             red                green
    (rev. Reed-Muller) (Reed-Muller)
    1...............   1111111111111111   1111111111111111   ...............1
    11..............   .1.1.1.1.1.1.1.1   1.1.1.1.1.1.1.1.   ..............11
    1.1.............   ..11..11..11..11   11..11..11..11..   .............1.1
    1111............   ...1...1...1...1   1...1...1...1...   ............1111
    1...1...........   ....1111....1111   1111....1111....   ...........1...1
    11..11..........   .....1.1.....1.1   1.1.....1.1.....   ..........11..11
    1.1.1.1.........   ......11......11   11......11......   .........1.1.1.1
    11111111........   .......1.......1   1.......1.......   ........11111111
    1.......1.......   ........11111111   11111111........   .......1.......1
    11......11......   .........1.1.1.1   1.1.1.1.........   ......11......11
    1.1.....1.1.....   ..........11..11   11..11..........   .....1.1.....1.1
    1111....1111....   ...........1...1   1...1...........   ....1111....1111
    1...1...1...1...   ............1111   1111............   ...1...1...1...1
    11..11..11..11..   .............1.1   1.1.............   ..11..11..11..11
    1.1.1.1.1.1.1.1.   ..............11   11..............   .1.1.1.1.1.1.1.1
    1111111111111111   ...............1   1...............   1111111111111111

Figure 23.11-A: Basis functions of the length-16 blue, yellow, red, and green transforms.

The symbolic powering idea from section 1.19 on page 49 leads to transforms with the following bases (using length-8 arrays and the yellow code):

    1.......   1...1...   1.1.....   1.1.1.1.
    .1......   .1...1..   .1.1....   .1.1.1.1
    ..1.....   ..1...1.   ..1.....   ..1...1.
    ...1....   ...1...1   ...1....   ...1...1
    ....1...   ....1...   ....1.1.   ....1.1.
    .....1..   .....1..   .....1.1   .....1.1
    ......1.   ......1.   ......1.   ......1.
    .......1   .......1   .......1   .......1
      x=0        x=1        x=2        x=3

    11......   11..11..   1111....   11111111
    .1......   .1...1..   .1.1....   .1.1.1.1
    ..11....   ..11..11   ..11....   ..11..11
    ...1....   ...1...1   ...1....   ...1...1
    ....11..   ....11..   ....1111   ....1111
    .....1..   .....1..   .....1.1   .....1.1
    ......11   ......11   ......11   ......11
    .......1   .......1   .......1   .......1
      x=4        x=5        x=6        x=7

The program [FXT: bits/bitxtransforms-demo.cc] gives the matrices for 64-bit words.
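Two properties of the Reed-Muller transform stated in this section, that it is self-inverse and that its basis functions form the pattern of the yellow panel of figure 23.11-A, can be observed with a short sketch. The names `reed_muller_sketch`, `reed_muller_basis_row` and `reed_muller_is_self_inverse` are hypothetical. The sketch obtains a basis function by transforming a unit vector, which is an assumption made for illustration and need not match how the FXT basis routine works.

```cpp
#include <cassert>
#include <string>
#include <vector>

typedef unsigned long ulong;

// Reed-Muller transform: the arithmetic transform Y+ with XOR as addition.
void reed_muller_sketch(std::vector<unsigned> &f, ulong ldn)
{
    const ulong n = (1UL << ldn);
    assert(f.size() == n);
    for (ulong ldm = ldn; ldm >= 1; --ldm)
    {
        const ulong m = (1UL << ldm), mh = (m >> 1);
        for (ulong r = 0; r < n; r += m)
            for (ulong j = 0; j < mh; ++j)
                f[r+j+mh] ^= f[r+j];  // f[t1] = u;  f[t2] = u ^ v;
    }
}

// k-th basis function as a string of '1' and '.',
// computed as the transform of the k-th unit vector.
std::string reed_muller_basis_row(ulong ldn, ulong k)
{
    std::vector<unsigned> f(1UL << ldn, 0);
    f[k] = 1;
    reed_muller_sketch(f, ldn);
    std::string s;
    for (unsigned x : f)  s += (x & 1) ? '1' : '.';
    return s;
}

// True if applying the transform twice restores the data.
bool reed_muller_is_self_inverse(ulong ldn)
{
    const ulong n = (1UL << ldn);
    std::vector<unsigned> a(n);
    for (ulong k = 0; k < n; ++k)  a[k] = (unsigned)((k*2654435761UL) >> 7);
    std::vector<unsigned> b = a;
    reed_muller_sketch(b, ldn);
    reed_muller_sketch(b, ldn);
    return a == b;
}
```

For ldn = 4 the rows returned for k = 0, 1, 5 agree with rows 0, 1 and 5 of the yellow panel.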
A function that computes the k-th basis function of the transform is [FXT: walsh/reedmuller.h]: 1 2 3 4 5 6 7 8 template inline void reed_muller_basis(Type *f, ulong n, ulong k) { for (ulong i=0; i void word_gray(Type *f, ulong n) { for (ulong k=0; k void word_gray_pow(Type *f, ulong n, ulong x) { for (ulong s=1; s>= 1; } } f[k] ^= f[j]; Let e be the reversed Gray code operator, then we have for the reversed Reed-Muller transform B: B S+1 B = e−1 (23.11-3a) B S−1 B = e (23.11-3b) −k B Sk B = e (23.11-3c) E Sk R = ek (23.11-4a) k = Sk (23.11-4b) Further, Ee R The transforms as Kronecker products (all operations are modulo 2): log2 (n) Bn = O  B2 where B2 = k=1 log2 (n) Yn = O  Y2 where Y2 = k=1 log2 (n) Rn = O  R2 where R2 = E2 where E2 = k=1 log2 (n) En = O  k=1 23.12 1 0 1 1  1 1 0 1  0 1 1 1  1 1 1 0  (23.11-5a) (23.11-5b) (23.11-5c) (23.11-5d) The OR-convolution and the AND-convolution Let a and b be sequences of length a power of 2. We define the OR-convolution h of a and b as X hτ = ai bj (23.12-1) i∨j=τ where ∨ denotes bit-wise OR. The symbolic table for the OR-convolution is shown in figure 23.12-A (see figure 22.1-A on page 441 for an explanation of the scheme). The OR-convolution can be computed via 1 2 3 4 5 6 7 8 9 10 11 12 template inline void slow_or_convolution(const Type *f, const Type *g, ulong ldn, Type *h) // Compute the OR-convolution h[] of f[] and g[]: // h[k] = sum(i | j == k, f[i]*g[j]) // Result written to h[]. { const ulong n = 1UL << ldn; for (ulong j=0; j inline void or_convolution(Type * restrict f, Type * restrict g, ulong ldn) { arith_transform_plus(f, ldn); arith_transform_plus(g, ldn); const ulong n = (1UL< inline void slow_and_convolution(const Type *f, const Type *g, ulong ldn, Type *h) // Compute the AND-convolution h[] of f[] and g[]: // h[k] = sum(i & j == k, f[i]*g[j]) // Result written to h[]. 
{ const ulong n = 1UL << ldn; for (ulong j=0; j inline void and_convolution(Type * restrict f, Type * restrict g, ulong ldn) { rev_arith_transform_plus(f, ldn); rev_arith_transform_plus(g, ldn); const ulong n = (1UL< inline void slow_max_convolution(const Type *f, const Type *g, ulong n, Type *h) // Compute the MAX-convolution h[] of f[] and g[]: // h[k] = sum( max(i,j) == k, f[i]*g[j]) // Result written to h[]. { for (ulong j=0; j inline void max_convolution(const Type *f, const Type *g, ulong n, Type *h) { Type sf=0, sg=0; // cumulative sums for (ulong k=0; k void arith_transform_plus(Type *f, ulong ldn, Type w) // Weighted arithmetic transform (positive sign). { if ( w!=(Type)1 ) bit_count_weight(f, ldn, w); arith_transform_plus(f, ldn); } The routine for the multiplications with powers of ω is [FXT: walsh/bitcount-weight.h]: 1 2 3 4 5 6 7 8 9 10 template void bit_count_weight(Type *f, ulong ldn, Type w) // Multiply f[i] by w**bitcount(i). { ALLOCA(Type, pw, ldn+1); // powers of w pw[0] = (Type)1; for (ulong j=1; j<=ldn; ++j) pw[j] = w * pw[j-1]; const ulong n = (1UL< void arith_transform_minus(Type *f, ulong ldn, Type w) // Weighted arithmetic transform (negative sign). // Inverse of (weighted) arith_transform_plus(). { arith_transform_minus(f, ldn); if ( w!=(Type)1 ) bit_count_weight(f, ldn, 1.0/w); } 23.14.2 Subset convolution We want to compute the subset convolution s of the sequences a and b, defined as X sτ = ai bj (23.14-3) i∨j=τ, i∧j=0 The definition is similar to the OR-convolution, but the condition i ∧ j = 0 (no intersecting subsets) makes matters more complicated. Figure 23.14-B shows the symbolic scheme, note that many products ai bj do not appear at all in the subset convolution. The total number of products ai bj is N 3 for N a power of 2. It may seem that computing fewer products (than N 4 , as with the OR-convolution) would allow for a method even  cheaper than O (N log N ), but no such scheme is known. 
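For reference, the subset convolution can be computed directly from definition 23.14-3. The routine below is a hypothetical sketch (the name `slow_subset_convolution_sketch` is not taken from FXT) that also counts the products actually used. For N = 2^ldn there are 3^ldn such products, because every bit position lies in i, in j, or in neither.

```cpp
#include <cassert>
#include <vector>

typedef unsigned long ulong;

// Subset convolution by definition:
//   h[t] = sum of a[i]*b[j] over all i, j with (i|j)==t and (i&j)==0.
// nprod receives the number of products a[i]*b[j] that were used.
std::vector<long>
slow_subset_convolution_sketch(const std::vector<long> &a,
                               const std::vector<long> &b,
                               ulong &nprod)
{
    assert(a.size() == b.size());
    const ulong n = a.size();  // assumed to be a power of 2
    std::vector<long> h(n, 0);
    nprod = 0;
    for (ulong i = 0; i < n; ++i)
        for (ulong j = 0; j < n; ++j)
            if ( (i & j) == 0 )  // subsets must not intersect
            {
                h[i | j] += a[i] * b[j];
                ++nprod;
            }
    return h;
}
```

With both inputs set to all ones, h[t] equals 2 raised to the number of set bits of t, since each set bit of t can come from i or from j. The scan over all pairs costs O(N^2) tests even though only 3^ldn products are formed.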
We develop a method that is O N (log N )2 . Define the weighted OR-convolution h(ω) of a and b as X h(ω)τ = ω c(i∧j) ai bj (23.14-4) i∨j=τ The symbolic table for the convolution with ω = −1 is shown in figure 23.14-C. The positive entries appear where the basis of the Walsh transform is positive, see figure 23.1-A on page 459. We can compute the weighted OR-convolution by definition [FXT: walsh/weighted-or-convolution.h]: template inline void slow_weighted_or_convolution(const Type *f, const Type *g, ulong ldn, 494 Chapter 23: The Walsh transform and its relatives [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 ] [ 1 . 3 . 5 . 7 . 9 . 11 [ 2 3 . . 6 7 . . 10 11 . . 14 15 . . ] [ 3 . . . 7 . . . 11 . . 15 . . . ] [ 4 5 6 7 . . . . 12 13 14 15 . . . . ] [ 5 . 7 . . . . . 13 . . . . . ] [ 6 7 . . . . . . 14 15 . . . . . . ] [ 7 . . . . . . . 15 . . . . . . . ] [ 8 9 10 11 12 13 14 15 . . . . . . . . ] [ 9 . 11 . 15 . 15 . 15 . ] . . . . . . . . . ] [10 11 . . 14 15 . . . . . . . . . . ] [11 . . 15 . . . . . . . . . . . ] . . 13 . . 13 [12 13 14 15 . . . . . . . . . . . . ] [13 . . . . . . . . . . . . . ] [14 15 . 15 . . . . . . . . . . . . . . ] [15 . . . . . . . . . . . . . . ] . Figure 23.14-B: Semi-symbolic scheme for the subset convolution. Dots denote unused products. weighted (w=-1) OR-convolution, positive entries: +-- 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | 0: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1: 1 . 3 . 5 . 7 . 9 . 11 . 13 . 15 . 2: 2 3 . . 6 7 . . 10 11 . . 14 15 . . 3: 3 . . 3 7 . . 7 11 . . 11 15 . . 15 4: 4 5 6 7 . . . . 12 13 14 15 . . . . 5: 5 . 7 . . 5 . 7 13 . 15 . . 13 . 15 6: 6 7 . . . . 6 7 14 15 . . . . 14 15 7: 7 . . 7 . 7 7 . 15 . . 15 . 15 15 . 8: 9: 10: 11: 12: 13: 14: 15: 8 9 10 11 12 13 14 15 9 . 11 . 13 . 15 . 10 11 . . 14 15 . . 11 . . 11 15 . . 15 12 13 14 15 . . . . 13 . 15 . . 13 . 15 14 15 . . . . 14 15 15 . . 15 . 15 15 . . . . . . . . . . 9 . 11 . 13 . 15 . . 10 11 . . 14 15 . 11 11 . . 15 15 . . . . . 12 13 14 15 . 13 . 15 13 . 15 . . . 
14 15 14 15 . . . 15 15 . 15 . . 15 weighted (w=-1) OR-convolution, negative entries: +-- 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | 0: . . . . . . . . . . . . . . . . 1: . 1 . 3 . 5 . 7 . 9 . 11 . 13 . 15 2: . . 2 3 . . 6 7 . . 10 11 . . 14 15 3: . 3 3 . . 7 7 . . 11 11 . . 15 15 . 4: . . . . 4 5 6 7 . . . . 12 13 14 15 5: . 5 . 7 5 . 7 . . 13 . 15 13 . 15 . 6: . . 6 7 6 7 . . . . 14 15 14 15 . . 7: . 7 7 . 7 . . 7 . 15 15 . 15 . . 15 8: 9: 10: 11: 12: 13: 14: 15: . . . . . . . . . 9 . 11 . 13 . 15 . . 10 11 . . 14 15 . 11 11 . . 15 15 . . . . . 12 13 14 15 . 13 . 15 13 . 15 . . . 14 15 14 15 . . . 15 15 . 15 . . 15 8 9 10 11 12 13 14 15 9 . 11 . 13 . 15 . 10 11 . . 14 15 . . 11 . . 11 15 . . 15 12 13 14 15 . . . . 13 . 15 . . 13 . 15 14 15 . . . . 14 15 15 . . 15 . 15 15 . Figure 23.14-C: Semi-symbolic scheme for the weighted OR-convolution with ω = −1, separated into positive (top) and negative (bottom) entries. 23.14: Weighted arithmetic transform and subset convolution 495 Type *h, Type w) // Compute the weighted OR-convolution h[] of f[] and g[]: // h[k] = sum(i | j == k, f[i]*g[j] * (w)**bitcount(i&j)) // Result written to h[]. { ALLOCA(Type, pw, ldn+1); // powers of w pw[0] = (Type)1; for (ulong j=1; j<=ldn; ++j) pw[j] = w * pw[j-1]; const ulong n = 1UL << ldn; for (ulong j=0; j inline void weighted_or_convolution(Type * restrict f, Type * restrict g, ulong ldn, Type w) { arith_transform_plus(f, ldn, w); arith_transform_plus(g, ldn, w); const ulong n = (1UL< inline void subset_convolution(Type *f, Type *g, ulong ldn) // Compute the subset convolution h[] of f[] and g[]: 496 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 Chapter 23: The Walsh transform and its relatives // h[k] = sum( j subset k, f[j]*g[k-j] ) // Type must allow conversion to and from type Complex. // Result written to g[]. 
{ const ulong n = 1UL << ldn; Complex *fc, *gc, *hc; fc = new Complex[n]; gc = new Complex[n]; hc = new Complex[n]; // w^0: copy_cast(f, fc, n); copy_cast(g, gc, n); or_convolution(fc, gc, ldn); acopy(gc, hc, n); // w^1, w^2, ... , w^(L-1): const ulong L = ldn + 1; const Complex w = SinCos( 2*M_PI/(double)L ); Complex wp = 1.0; // powers of w for (ulong j=1; j void haar(Type *f, ulong ldn) { ulong n = (1UL<1; m>>=1) // n, n/2, n/4, n/8, ..., 4, 2 { ulong mh = (m>>1); for (ulong j=0, k=0; j void haar(Type *f, ulong ldn, Type *ws=0) { ulong n = (1UL<1; m>>=1) { v *= s2; ulong mh = (m>>1); for (ulong j=0, k=0; j void inverse_haar(Type *f, ulong ldn, Type *ws=0) { ulong n = (1UL<>1); for (ulong j=0, k=0; j void haar_inplace(Type *f, ulong ldn) { ulong n = 1UL<>1; j void inverse_haar_inplace(Type *f, ulong ldn) { ulong n = 1UL<=2; js>>=1) { for (ulong j=0, t=js>>1; j void haar_permute(Type *f, ulong n) { revbin_permute(f, n); for (ulong m=4; m<=n/2; m*=2) revbin_permute(f+m, m); } The revbin permutations in the loop do not overlap, so the routine for the inverse Haar permutation is obtained by simply swapping the loop with the full-length revbin permutation [FXT: perm/haarpermute.h]: 1 2 3 4 5 6 template void inverse_haar_permute(Type *f, ulong n) { for (ulong m=4; m<=n/2; m*=2) revbin_permute(f+m, m); revbin_permute(f, n); } 24.3: Non-normalized Haar transforms 0: [+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +] 1: [+ + + + + + + + + + + + + + + + - - - - - - - - - - - - - - - -] 2: [+ + + + + + + + - - - - - - - ] 3: [ + + + + + + + + - - - - - - - -] 4: [+ + + + - - - ] 5: [ + + + + - - - ] 6: [ + + + + - - - ] 7: [ + + + + - - - -] 8: [+ + - ] 9: [ + + - ] 10: [ + + - ] 11: [ + + - ] 12: [ + + - ] 13: [ + + - ] 14: [ + + - ] 15: [ + + - -] 16: [+ ] 17: [ + ] 18: [ + ] 19: [ + ] 20: [ + ] 21: [ + ] 22: [ + ] 23: [ + ] 24: [ + ] 25: [ + ] 26: [ + ] 27: [ + ] 28: [ + ] 29: [ + ] 30: [ + ] 31: [ + -] 501 1/sqrt(32) 1/sqrt(32) 1/sqrt(16) 1/sqrt(16) 
1/sqrt(8) 1/sqrt(8) 1/sqrt(8) 1/sqrt(8) 1/sqrt(4) 1/sqrt(4) 1/sqrt(4) 1/sqrt(4) 1/sqrt(4) 1/sqrt(4) 1/sqrt(4) 1/sqrt(4) 1/sqrt(2) 1/sqrt(2) 1/sqrt(2) 1/sqrt(2) 1/sqrt(2) 1/sqrt(2) 1/sqrt(2) 1/sqrt(2) 1/sqrt(2) 1/sqrt(2) 1/sqrt(2) 1/sqrt(2) 1/sqrt(2) 1/sqrt(2) 1/sqrt(2) 1/sqrt(2) Figure 24.2-B: Basis functions of the in-place order Haar transform followed by a revbin permutation. In this ordering those basis functions which are identical up to a shift appear consecutively. Relation 24.2-1a tells us that haar() is equivalent to the sequence of statements haar_inplace(); haar_permute(); and, by relation 24.2-1b, inverse_haar() is equivalent to inverse_haar_permute(); inverse_haar_inplace(); 24.3 Non-normalized Haar transforms Versions of the Haar transform without normalization are given in [FXT: haar/haarnn.h]. The basis functions are the same as for the normalized versions, only the absolute value of the nonzero entries are different. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 template void haar_nn(Type *f, ulong ldn, Type *ws=0) { ulong n = (1UL<1; m>>=1) { ulong mh = (m>>1); for (ulong j=0, k=0; j void inverse_haar_nn(Type *f, ulong ldn, Type *ws=0) { ulong n = (1UL<>1); for (ulong j=0, k=0; j void haar_inplace_nn(Type *f, ulong ldn) { ulong n = 1UL<>1; j void inverse_haar_inplace_nn(Type *f, ulong ldn) { ulong n = 1UL<=2; js>>=1) { for (ulong j=0, t=js>>1; j void transposed_haar_nn(Type *f, ulong ldn, Type *ws=0) { ulong n = (1UL<>1); for (ulong j=0, k=0; j void inverse_transposed_haar_nn(Type *f, ulong ldn, Type *ws=0) { ulong n = (1UL<1; m>>=1) { ulong mh = (m>>1); for (ulong j=0, k=0; j void transposed_haar_inplace_nn(Type *f, ulong ldn) 24.5: The reversed Haar transform ‡ 3 4 5 6 7 8 9 10 11 12 13 14 15 505 { ulong n = 1UL<=2; js>>=1) { for (ulong j=0, t=js>>1; j void inverse_transposed_haar_inplace_nn(Type *f, ulong ldn) { ulong n = 1UL<>1; j void haar_rev_nn(Type *f, ulong ldn) { // const ulong n = (1UL<=1; --ldm) { const ulong m = (1UL<>1); ulong r = 0; // for 
(ulong r=0; r void inverse_haar_rev_nn(Type *f, ulong ldn) { for (ulong ldm=1; ldm<=ldn; ++ldm) { const ulong m = (1UL<>1); ulong r = 0; // for (ulong r=0; r void transposed_haar_rev_nn(Type *f, ulong ldn) { for (ulong ldm=1; ldm<=ldn; ++ldm) { const ulong m = (1UL<>1); ulong r = 0; // for (ulong r=0; r void inverse_transposed_haar_rev_nn(Type *f, ulong ldn) { // const ulong n = (1UL<=1; --ldm) { const ulong m = (1UL<>1); ulong r = 0; // for (ulong r=0; r0; --ldk) { ulong k = 1UL << ldk; for (ulong j=k; j0; --ldk) { 24.6: Relations between Walsh and Haar transforms AAAAAAAAaaaaaaaa AAAAaaaaBBBBbbbb AAaaCCccBBbbCCcc AaDdCcDdBbDdCcDd aaaaaaaaAAAAAAAA bbbbBBBBaaaaAAAA ccCCbbBBccCCaaAA dDcCdDbBdDcCdDaA WH1 AaDdCcDdBbDdCcDd AAaaCCccBBbbCCcc AAAAaaaaBBBBbbbb AAAAAAAAaaaaaaaa 509 WH2T dDcCdDbBdDcCdDaA ccCCbbBBccCCaaAA bbbbBBBBaaaaAAAA aaaaaaaaAAAAAAAA WH1T WH2 Figure 24.6-C: Symbolic scheme of the four versions of the computation of the Walsh transform via Haar transforms. 6 7 8 ulong k = 1UL << ldk; for (ulong j=k; j void prefix_transform(Type *f, ulong ldn) { for (ulong ldm=1; ldm<=ldn; ++ldm) { const ulong mh = 1UL << (ldm-1); for (ulong i=0; i void inverse_prefix_transform(Type *f, ulong ldn) { for (ulong ldm=ldn; ldm>=1; --ldm) { const ulong mh = 1UL << (ldm-1); for (ulong i=0; i inline void slow_prefix_convolution(const Type *f, const Type *g, ulong ldn, Type *h) { const ulong n = 1UL << ldn; for (ulong k=0; k inline void prefix_convolution(Type * restrict f, Type * restrict g, ulong ldn) { prefix_transform(f, ldn); prefix_transform(g, ldn); const ulong n = (1UL<> 1; // next smaller Mersenne number for (ulong j=0,k=f1+1; j inline void hartley_shift_05_v2rec(Type *f, ulong n) { const ulong nh = n/2; if ( n>=4 ) { ulong im=nh/2, jm=3*im; 25.2: Radix-2 FHT algorithms 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 517 Type fi = f[im], fj = f[jm]; double cs = SQRT1_2; f[im] = (fi + fj) * cs; f[jm] = (fi - fj) * cs; if ( n>=8 
) { const Tdouble phi0 = PI/n; Tdouble be = Sin(phi0), al = Sin(0.5*phi0); al *= (2.0*al); Tdouble s = 0.0, c = 1.0; for (ulong i=1, j=n-1, k=nh-1, l=nh+1; i>1); const ulong m4 = (mh>>1); const double phi0 = M_PI/mh; for (ulong r=0; r>1); const ulong m4 = (mh>>1); const double phi0 = M_PI/mh; for (ulong r=0; r=1; --ldm) { const ulong m = (1UL<>1); const ulong m4 = (mh>>1); const double phi0 = M_PI/mh; for (ulong r=0; r0 ) else for (ulong i=1,j=n-1; i static inline void sumdiff05(Type &a, Type &b) // {a, b} <--| {0.5*(a+b), 0.5*(a-b)} { Type t=(a-b)*0.5; a+=b; a*=0.5; b=t; } template static inline void sumdiff05_r(Type &a, Type &b) // {a, b} <--| {0.5*(a+b), 0.5*(b-a)} { Type t=(b-a)*0.5; a+=b; a*=0.5; b=t; }   At the end of the procedure the ordering of the output data c = F a ∈ C is a[0] = Re c0 a[1] = Re c1 a[2] = Re c2 ... a[n/2] = Re cn/2 a[n/2 + 1] = Im cn/2−1 a[n/2 + 2] = Im cn/2−2 a[n/2 + 3] = ... Im cn/2−3 a[n − 1] = Im c1 The inverse procedure is given in [FXT: realfft/realfftbyfht.cc]: (25.5-2) 524 1 2 3 4 5 6 7 8 9 10 Chapter 25: The Hartley transform void fht_complex_real_fft(double *f, ulong ldn, int is/*=+1*/) { const ulong n = (1UL<0 ) else for (ulong i=1,j=n-1; i static inline void sumdiff(Type &a, Type &b) // {a, b} <--| {a+b, a-b} { Type t=a-b; a+=b; b=t; } template static inline void diffsum(Type &a, Type &b) // {a, b} <--| {a-b, a+b} { Type t=a-b; b+=a; a=t; } The input has to be ordered as given in relations 25.5-2 on the previous page. The sign of the transform (variable is) has to be the same as with the forward version. Computation of an FHT using a real-valued FFT proceeds similarly as for complex versions. Let Tr2c be the operator corresponding to the post-processing in fht_real_complex_fft(), and Tc2r correspond to the preprocessing in fht_complex_real_fft(). That is Fc2r = H · Tc2r and Fr2c = Tr2c · H (25.5-3) −1 −1 The operators are mutually inverse: Tr2c = Tc2r and Tc2r = Tr2c . 
Multiplying the relations and using $T_{r2c} \cdot T_{c2r} = T_{c2r} \cdot T_{r2c} = 1$ gives

$$ H = T_{c2r} \cdot F_{r2c} \quad\text{and}\quad H = F_{c2r} \cdot T_{r2c} \tag{25.5-4} $$

25.6 Higher radix FHT algorithms

Higher radix FHT algorithms seem to get complicated due to the structure of the Hartley shift operator. In fact there is a straightforward way to turn any FFT decomposition into an FHT algorithm. For the moment assume that we want to compute a complex FHT and that we want to use a radix-r algorithm. At each step we have r short FHTs and want to combine them into a longer FHT, but it is not obvious how this can be done. In section 25.3 on page 521 we learned how to turn an FHT into an FFT using the T-operator. And we have seen radix-r algorithms for the FFT. The crucial idea is to use the conversion operator T as a wrapper around the FFT-step that combines several short FFTs into a longer one. Turn a radix-r FFT-step into an FHT-step as follows:

1. Convert the r short FHTs into FFTs (use T on the subsequences).
2. Do the radix-r FFT step.
3. Convert the FFT into an FHT (use T on the sequence).

For efficient implementations one obviously wants to combine the computations. With a radix-r step the scheme always accesses 2r elements simultaneously. The symmetry of the trigonometric factors is thereby automatically exploited. Splitting steps for the radix-4 FHT and the split-radix FHT are given in [317].

25.7 Convolution via FHT

The convolution property of the Hartley transform can be stated as

$$ H[a \circledast b] = \frac{1}{2} \left( H[a]\, H[b] - \overline{H[a]}\; \overline{H[b]} + H[a]\, \overline{H[b]} + \overline{H[a]}\, H[b] \right) \tag{25.7-1} $$

where the overline denotes the reversed sequence, $\overline{c}_k = c_{-k}$ (indices modulo n). With $c := H[a]$ and $d := H[b]$ the relation reads, written element-wise:

$$ H[a \circledast b]_k = \frac{1}{2} \left( c_k d_k - \overline{c}_k \overline{d}_k + c_k \overline{d}_k + \overline{c}_k d_k \right) \tag{25.7-2a} $$

$$ \phantom{H[a \circledast b]_k} = \frac{1}{2} \left( c_k (d_k + \overline{d}_k) + \overline{c}_k (d_k - \overline{d}_k) \right) \tag{25.7-2b} $$

$$ \phantom{H[a \circledast b]_k} = \frac{1}{2} \left( d_k (c_k + \overline{c}_k) + \overline{d}_k (c_k - \overline{c}_k) \right) \tag{25.7-2c} $$

The latter two forms reduce the number of multiplications. When turning the relation into an algorithm, one has to keep in mind that both elements $y_k = H[a \circledast b]_k$ and $y_{-k}$ must be computed simultaneously.
For the auto-convolution equation 25.7-2a becomes:

$$ H[a \circledast a]_k = \frac{1}{2} \left( c_k (c_k + \overline{c}_k) + \overline{c}_k (c_k - \overline{c}_k) \right) \tag{25.7-3a} $$

$$ \phantom{H[a \circledast a]_k} = c_k \overline{c}_k + \frac{1}{2} \left( c_k^2 - \overline{c}_k^{\,2} \right) \tag{25.7-3b} $$

25.7.1 Algorithms as pseudocode

The following routine computes the cyclic convolution of two real-valued sequences x[ ] and y[ ] via the FHT. The array length n must be even:

    procedure fht_cyclic_convolution(x[], y[], n)
    // real x[0..n-1] input, modified
    // real y[0..n-1] result
    {
        // transform data:
        fht(x[], n)
        fht(y[], n)

        // convolution in transformed domain:
        j := n-1
        for i:=1 to n/2-1
        {
            xi := x[i]
            xj := x[j]

            yp := y[i] + y[j]  // == y[j] + y[i]
            ym := y[i] - y[j]  // == -(y[j] - y[i])

            y[i] := (xi*yp + xj*ym)/2
            y[j] := (xj*yp - xi*ym)/2

            j := j-1
        }

        y[0] := x[0] * y[0]
        if n>1 then y[n/2] := x[n/2] * y[n/2]

        // transform back:
        fht(y[], n)

        // normalize:
        for i:=0 to n-1
        {
            y[i] := y[i] / n
        }
    }

It is assumed that the procedure fht() does no normalization. A routine for the cyclic auto-convolution is

    procedure cyclic_self_convolution(x[], n)
    // real x[0..n-1] input, result
    {
        // transform data:
        fht(x[], n)

        // convolution in transformed domain:
        j := n-1
        for i:=1 to n/2-1
        {
            ci := x[i]
            cj := x[j]

            t1 := ci*cj                // == cj*ci
            t2 := 1/2*(ci*ci-cj*cj)    // == -1/2*(cj*cj-ci*ci)

            x[i] := t1 + t2
            x[j] := t1 - t2

            j := j-1
        }

        x[0] := x[0] * x[0]
        if n>1 then x[n/2] := x[n/2] * x[n/2]

        // transform back:
        fht(x[], n)

        // normalize:
        for i:=0 to n-1
        {
            x[i] := x[i] / n
        }
    }

For odd n replace the line

    for i:=1 to n/2-1

by

    for i:=1 to (n-1)/2

and omit the line that treats the element with index n/2, that is,

    if n>1 then x[n/2] := x[n/2]*x[n/2]

(respectively the corresponding line for y[n/2]) in both procedures above.
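The pseudocode can be tried out without an FHT implementation at hand. The following is a minimal sketch under stated assumptions: `slow_hartley()` is a hypothetical O(n^2) transform computed by definition (it stands in for fht() and does no normalization), and `hartley_cyclic_convolution()` applies relation 25.7-2b to every index k separately instead of treating the pairs (k, n-k) together. This costs a few redundant operations but works for even and odd n alike.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for fht(): Hartley transform by definition, O(n^2),
// no normalization: h[k] = sum_j a[j] * (cos(2*pi*j*k/n) + sin(2*pi*j*k/n)).
std::vector<double> slow_hartley(const std::vector<double> &a)
{
    const std::size_t n = a.size();
    const double pi = std::acos(-1.0);
    std::vector<double> h(n, 0.0);
    for (std::size_t k = 0; k < n; ++k)
        for (std::size_t j = 0; j < n; ++j)
        {
            const double phi = 2.0 * pi * (double)((j * k) % n) / (double)n;
            h[k] += a[j] * (std::cos(phi) + std::sin(phi));
        }
    return h;
}

// Cyclic convolution of a and b (assumed to have the same length n >= 1).
std::vector<double> hartley_cyclic_convolution(const std::vector<double> &a,
                                               const std::vector<double> &b)
{
    const std::size_t n = a.size();
    const std::vector<double> c = slow_hartley(a);
    const std::vector<double> d = slow_hartley(b);
    std::vector<double> y(n);
    for (std::size_t k = 0; k < n; ++k)
    {
        const std::size_t km = (n - k) % n;  // index -k modulo n
        // relation 25.7-2b:
        y[k] = 0.5 * (c[k] * (d[k] + d[km]) + c[km] * (d[k] - d[km]));
    }
    y = slow_hartley(y);               // transform back
    for (double &v : y) v /= (double)n;  // normalize
    return y;
}
```

Convolving with the sequence [0, 1, 0, ..., 0] must give the cyclic shift of the other argument by one position, which makes for an easy check.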
25.7.2 C++ implementations The FHT based routine for the cyclic convolution of two real sequences is [FXT: convolution/fhtcnvl.cc] 1 2 3 4 5 6 7 void fht_convolution(double * restrict f, double * restrict g, ulong ldn) { fht(f, ldn); fht(g, ldn); fht_convolution_core(f, g, ldn); fht(g, ldn); } The equivalent of the element-wise multiplication is given in [FXT: convolution/fhtcnvlcore.cc]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 void fht_convolution_core(const double * restrict f, double * restrict g, ulong ldn, double v/*=0.0*/) // Auxiliary routine for the computation of convolutions // via Fast Hartley Transforms. // ldn := base-2 logarithm of the array length. // v!=0.0 chooses alternative normalization. { const ulong n = (1UL<0 ) { g[nh] *= (v * f[nh]); v *= 0.5; for (ulong i=1,j=n-1; i static inline void fht_mul(Type xi, Type xj, Type &yi, Type &yj, double v) // yi <-- v*( (yi + yj)*xi + (yi - yj)*xj ) == v*( (xi + xj)*yi + (xi - xj)*yj ) // yj <-- v*( (-yi + yj)*xi + (yi + yj)*xj ) == v*( (-xi + xj)*yi + (xi + xj)*yj ) { Type h1p = xi, h1m = xj; Type s1 = h1p + h1m, d1 = h1p - h1m; Type h2p = yi, h2m = yj; yi = (h2p * s1 + h2m * d1) * v; yj = (h2m * s1 - h2p * d1) * v; } A C++ implementation of the FHT based self-convolution is given in [FXT: convolution/fhtcnvla.cc]. 
It uses the routine [FXT: convolution/fhtcnvlacore.cc] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 void fht_auto_convolution_core(double *f, ulong ldn, double v/*=0.0*/) // v!=0.0 chooses alternative normalization { const ulong n = (1UL<=2 ) { const ulong nh = n/2; f[nh] *= (v * f[nh]); v *= 0.5; for (ulong i=1,j=n-1; i static inline void fht_sqr(Type &xi, Type &xj, double v) // xi <-- v*( 2*xi*xj + xi*xi - xj*xj ) // xj <-- v*( 2*xi*xj - xi*xi + xj*xj ) { Type a = xi, b = xj; Type s1 = (a + b) * (a - b); a *= b; a += a; xi = (a+s1) * v; xj = (a-s1) * v; } 25.7.3 Avoiding the revbin permutations The observation that the revbin permutations can be omitted with FFT-based convolutions (see section 22.1.3 on page 442) applies again [FXT: convolution/fhtcnvlcore.cc]: 1 2 3 4 5 6 7 8 9 10 11 void fht_convolution_revbin_permuted_core(const double * restrict f, double * restrict g, ulong ldn, double v/*=0.0*/) // Same as fht_convolution_core() but with data access in revbin order. { const ulong n = (1UL<=2 ) g[1] *= (v * f[1]); // 1 == revbin(nh) if ( n<4 ) return; v *= 0.5; const ulong nh = (n>>1); ulong r=nh, rm=n-1; // nh == revbin(1), fht_mul(f[r], f[rm], g[r], g[rm], v); n1-1 == revbin(n-1) ulong k=2, km=n-2; while ( k>1); !((r^=m)&m); m>>=1) fht_mul(f[r], f[rm], g[r], g[rm], v); --km; ++k; {;} // k odd: rm += (tr-r); r += nh; fht_mul(f[r], f[rm], g[r], g[rm], v); --km; ++k; } } The optimized version saving three revbin permutations is [FXT: convolution/fhtcnvl.cc]: 1 2 3 4 5 6 7 void fht_convolution(double * restrict f, double * restrict g, ulong ldn) { fht_dif_core(f, ldn); fht_dif_core(g, ldn); fht_convolution_revbin_permuted_core(f, g, ldn); fht_dit_core(g, ldn); } 25.7.4 Negacyclic convolution via FHT Pseudocode for the computation of the negacyclic auto-convolution via FHT: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 procedure negacyclic_self_convolution(x[], n) // real x[0..n-1] input, result { hartley_shift_05(x, n) // preprocess fht(x, n) // transform 
data // convolution in transformed domain: j := n-1 for i:=0 to n/2-1 // here i starts from zero { a := x[i] b := x[j] x[i] := a*b+(a*a-b*b)/2 x[j] := a*b-(a*a-b*b)/2 j := j-1 } fht(x, n) // transform back hartley_shift_05(x, n) // postprocess } C++ implementations for the negacyclic convolution and self-convolution are given in [FXT: convolution/fhtnegacnvl.cc]. The negacyclic convolution is used for the computation of weighted transforms, for example in the MFA-based convolution for real input described in section 22.5.4 on page 453. 25.8: Localized FHT algorithms 25.8 529 Localized FHT algorithms Localized routines for the FHT can be obtained by slight modifications of the corresponding algorithms for the Walsh transform described in section 23.5 on page 468. The decimation in time (DIT) version is [FXT: fht/fhtloc2.h]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 template void fht_loc_dit2_core(Type *f, ulong ldn) { if ( ldn<=13 ) // sizeof(Type)*(2**threshold) <= L1_CACHE_BYTES { fht_dit_core(f, ldn); return; } // Recursion: fht_dit_core_2(f+2); // ldm==1 fht_dit_core_4(f+4); // ldm==2 fht_dit_core_8(f+8); // ldm==3 for (ulong ldm=4; ldm>1); hartley_shift_05(f+mh, mh); for (ulong t1=0, t2=mh; t1 void fht_loc_dif2_core(Type *f, ulong ldn) { if ( ldn<=13 ) // sizeof(Type)*(2**threshold) <= L1_CACHE_BYTES { fht_dif_core(f, ldn); return; } for (ulong ldm=ldn; ldm>=1; --ldm) { const ulong m = (1UL<>1); for (ulong t1=0, t2=mh; t1 inline void fht_dif_core_8(Type *f) { Type g0, f0, f1, g1; sumdiff(f[0], f[4], f0, g0); sumdiff(f[2], f[6], f1, g1); sumdiff(f0, f1); sumdiff(g0, g1); Type s1, c1, s2, c2; sumdiff(f[1], f[5], s1, c1); 530 12 13 14 15 16 17 18 19 20 Chapter 25: The Hartley transform sumdiff(f[3], f[7], s2, c2); sumdiff(s1, s2); sumdiff(f0, s1, f[0], f[1]); sumdiff(f1, s2, f[2], f[3]); c1 *= SQRT2; c2 *= SQRT2; sumdiff(g0, c1, f[4], f[5]); sumdiff(g1, c2, f[6], f[7]); } An additional revbin permutation is needed if the data is required in 
order. The FHT can be computed by either fht_loc_dif2_core(f, ldn); revbin_permute(f, 1UL< inline void fht_dit_core_8(Type *f) // unrolled version for length 8 { { // start initial loop { // fi = 0 gi = 1 Type g0, f0, f1, g1; sumdiff(f[0], f[1], f0, g0); sumdiff(f[2], f[3], f1, g1); sumdiff(f0, f1); sumdiff(g0, g1); Type s1, c1, s2, c2; sumdiff(f[4], f[5], s1, c1); sumdiff(f[6], f[7], s2, c2); sumdiff(s1, s2); sumdiff(f0, s1, f[0], f[4]); sumdiff(f1, s2, f[2], f[6]); c1 *= SQRT2; c2 *= SQRT2; sumdiff(g0, c1, f[1], f[5]); sumdiff(g1, c2, f[3], f[7]); } } // end initial loop } // opcount by generator: #mult=2=0.25/pt #add=22=2.75/pt Generated DIF FHT codes for lengths up to 64 are given in [FXT: fht/shortfhtdifcore.h]. The generated codes can be useful to spot parts of the original code that allow further optimization. Especially repeated trigonometric values and unused symmetries tend to be apparent in the unrolled code. It is a good idea to let the generator count the number of operations (multiplications, additions, loads and stores) of the code it emits. Those numbers can be compared to the corresponding values found in the compiled assembler code. The GCC compiler can produce the assembler code with the original source interlaced. This is a great tool for code optimization. 
The necessary commands are (include and warning flags omitted)

    # create assembler code:
    c++ -S -fverbose-asm -g -O2 test.cc -o test.s
    # create asm interlaced with source lines:
    as -alhnd test.s > test.lst

For example, the generated length-4 DIT FHT core from [FXT: fht/shortfhtditcore.h] is

    template <typename Type>
    inline void fht_dit_core_4(Type *f)
    // unrolled version for length 4
    {
        Type f0, f1, f2, f3;
        sumdiff(f[0], f[1], f0, f1);
        sumdiff(f[2], f[3], f2, f3);
        sumdiff(f0, f2, f[0], f[2]);
        sumdiff(f1, f3, f[1], f[3]);
    }

With Type set to double the generated assembler is, after some editing for readability,

    void fht_dit_core_4(double *f)
    {
      double f0, f1, f2, f3;
      sumdiff(f[0], f[1], f0, f1);
        movlpd  (%rdi), %xmm1      #* f, tmp63
        movlpd  8(%rdi), %xmm0     #, tmp64
      sumdiff(f[2], f[3], f2, f3);
        movlpd  16(%rdi), %xmm2    #, tmp67
        movsd   %xmm1, %xmm3       # tmp63, f0
        subsd   %xmm0, %xmm1       # tmp64, f1
        movsd   %xmm2, %xmm4       # tmp67, f2
        addsd   %xmm0, %xmm3       # tmp64, f0
        movlpd  24(%rdi), %xmm0    #, tmp68
        addsd   %xmm0, %xmm4       # tmp68, f2
        subsd   %xmm0, %xmm2       # tmp68, f3
      sumdiff(f0, f2, f[0], f[2]);
        movsd   %xmm3, %xmm0       # f0, tmp71
        addsd   %xmm4, %xmm0       # f2, tmp71
        subsd   %xmm4, %xmm3       # f2, f0
        movsd   %xmm0, (%rdi)      # tmp71,* f
      sumdiff(f1, f3, f[1], f[3]);
        movsd   %xmm1, %xmm0       # f1, tmp73
        subsd   %xmm2, %xmm1       # f3, f1
        movsd   %xmm3, 16(%rdi)    # f0,
        addsd   %xmm2, %xmm0       # f3, tmp73
        movsd   %xmm1, 24(%rdi)    # f1,
        movsd   %xmm0, 8(%rdi)     # tmp73,
    }

Note that the assembler code is not always in sync with the corresponding source lines, especially with higher levels of optimization.

25.11 Eigenvectors of the Fourier and Hartley transform ‡

Let $a_S := a + \overline{a}$ be the symmetric part of a sequence a, then

$$ F\left[ F[a_S] \right] = a_S \tag{25.11-1} $$

Now let $u_+ := a_S + F[a_S]$ and $u_- := a_S - F[a_S]$, then

$$ F[u_+] = F[a_S] + a_S = a_S + F[a_S] = +1 \cdot u_+ \tag{25.11-2a} $$

$$ F[u_-] = F[a_S] - a_S = -(a_S - F[a_S]) = -1 \cdot u_- \tag{25.11-2b} $$

Both $u_+$ and $u_-$ are symmetric.
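The eigenvector property of u+ and u− can be checked numerically. The sketch below is not FXT code: `slow_unitary_ft()` is a hypothetical O(n^2) Fourier transform by definition, with sign +1 in the exponent and normalization 1/sqrt(n) (the normalization is needed so that applying the transform twice gives the reversed sequence). The function `symmetric_eigen_defect()` builds u = a_S + λ F[a_S] for λ = ±1 and returns the largest absolute deviation of F[u] from λ u, which should vanish up to rounding errors.

```cpp
#include <algorithm>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<double> Complex;

// Hypothetical helper: unitary Fourier transform by definition,
// c[k] = 1/sqrt(n) * sum_j a[j] * exp(+2*pi*i*j*k/n).
std::vector<Complex> slow_unitary_ft(const std::vector<Complex> &a)
{
    const std::size_t n = a.size();
    const double pi = std::acos(-1.0);
    std::vector<Complex> c(n, Complex(0.0, 0.0));
    for (std::size_t k = 0; k < n; ++k)
    {
        for (std::size_t j = 0; j < n; ++j)
        {
            const double phi = 2.0 * pi * (double)((j * k) % n) / (double)n;
            c[k] += a[j] * std::polar(1.0, phi);
        }
        c[k] /= std::sqrt((double)n);
    }
    return c;
}

// Largest |F[u] - lambda*u| for u = a_S + lambda*F[a_S], lambda = +1 or -1.
double symmetric_eigen_defect(const std::vector<Complex> &a, double lambda)
{
    const std::size_t n = a.size();
    std::vector<Complex> as(n);
    for (std::size_t k = 0; k < n; ++k)
        as[k] = a[k] + a[(n - k) % n];  // a_S = a + reversed a
    const std::vector<Complex> fas = slow_unitary_ft(as);
    std::vector<Complex> u(n);
    for (std::size_t k = 0; k < n; ++k)
        u[k] = as[k] + lambda * fas[k];
    const std::vector<Complex> fu = slow_unitary_ft(u);
    double d = 0.0;
    for (std::size_t k = 0; k < n; ++k)
        d = std::max(d, std::abs(fu[k] - lambda * u[k]));
    return d;
}
```

Calling the function with λ = +1 tests u+, with λ = −1 it tests u−.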
For $a_A := a - \overline{a}$, the antisymmetric part of a, we have

$$ F\left[ F[a_A] \right] = -a_A \tag{25.11-3} $$

Therefore with $v_+ := a_A + i\, F[a_A]$ and $v_- := a_A - i\, F[a_A]$:

$$ F[v_+] = F[a_A] - i\, a_A = -i\, (a_A + i\, F[a_A]) = -i \cdot v_+ \tag{25.11-4a} $$

$$ F[v_-] = F[a_A] + i\, a_A = +i\, (a_A - i\, F[a_A]) = +i \cdot v_- \tag{25.11-4b} $$

Both $v_+$ and $v_-$ are antisymmetric. The sequences $u_+$, $u_-$, $v_+$, and $v_-$ are eigenvectors of the Fourier transform, with eigenvalues +1, −1, −i and +i respectively. The eigenvectors are pair-wise orthogonal.

With the definitions $a_S = a + \overline{a}$ and $a_A = a - \overline{a}$ we have $u_+ + u_- = 2\, a_S$ and $v_+ + v_- = 2\, a_A$, so

$$ a = \frac{1}{4} \left( u_+ + u_- + v_+ + v_- \right) \tag{25.11-5} $$

Using this relation we can, for a given sequence, find a transform that is a 'square root' of the Fourier transform: compute $u_+$, $u_-$, $v_+$, and $v_-$, and a transform $F^{\lambda}[a]$ for $\lambda \in \mathbb{R}$ as

$$ F^{\lambda}[a] = \frac{1}{4} \left( (+1)^{\lambda}\, u_+ + (-1)^{\lambda}\, u_- + (-i)^{\lambda}\, v_+ + (+i)^{\lambda}\, v_- \right) \tag{25.11-6} $$

This transform is called the fractional (order) Fourier transform (but see section 22.6.3 on page 456). Then $F^{0}[a]$ is the identity and $F^{1}[a]$ is the usual Fourier transform. The transform $F^{1/2}[a]$ is a transform so that $F^{1/2}\left[ F^{1/2}[a] \right] = F[a]$, that is, a 'square root' of the Fourier transform. The transform $F^{1/2}[a]$ is not unique as the expressions $(\pm 1)^{1/2}$ and $(\pm i)^{1/2}$ are not.

A set of eigenvectors (that is, eigenfunctions) of the continuous Fourier transform is given by

$$ H_n\, \exp(-x^2/2) \tag{25.11-7} $$

where $H_n$ is the n-th Hermite polynomial, see figure 36.3-A on page 696. The corresponding eigenvalues are $i^n$. The functions are the eigenstates of the quantum mechanical harmonic oscillator, see [358, entry "Quantum oscillator"].

The eigenvectors of the Hartley transform are

$$ u_+ := a + H[a] \tag{25.11-8a} $$

$$ u_- := a - H[a] \tag{25.11-8b} $$

The eigenvalues are ±1: we have $H[u_+] = +1 \cdot u_+$ and $H[u_-] = -1 \cdot u_-$.

Let M be the n × n matrix corresponding to the length-n Fourier transform with σ = +1, that is, $M_{r,c} = 1/\sqrt{n}\; \exp(2 \pi i\, r c / n)$.
Then its characteristic polynomial (see relation 42.5-2 on page 899) is

$$ p(x) = (x-1)^{\lfloor (n+4)/4 \rfloor}\, (x+1)^{\lfloor (n+2)/4 \rfloor}\, (x-i)^{\lfloor (n+1)/4 \rfloor}\, (x+i)^{\lfloor (n-1)/4 \rfloor} \tag{25.11-9} $$

We write $p(x) = x^n + c_{n-1}\, x^{n-1} + \ldots + c_1\, x + c_0$. The trace of the matrix M is

$$ \mathrm{Tr}(M) = \frac{1}{\sqrt{n}} \sum_{k=0}^{n-1} \exp\left( 2 \pi i\, k^2 / n \right) \tag{25.11-10} $$

It equals $-c_{n-1}$ (the sum of all roots of p(x)), which is

$$ 1+i, \quad +1, \quad 0, \quad +i \tag{25.11-11} $$

for n mod 4 = 0, 1, 2, 3, respectively. A closed form is $(1 + i^{-n}) / (1 - i)$. The generating function for the sequence is $\left( (1+i) - x \right) / \left( 1 + (-1+i)\, x - i\, x^2 \right)$.

The determinant of M equals $(-1)^n c_0$ (the product of all roots of p(x)), which is

$$ +i, \quad +1, \quad -1, \quad -i, \quad -i, \quad -1, \quad +1, \quad +i \tag{25.11-12} $$

for n mod 8 = 0, 1, 2, ..., 7. The generating function for the sequence is $\left( i + x - x^2 - i\, x^3 \right) / \left( 1 + x^4 \right)$.

Let H be the n × n matrix corresponding to the length-n Hartley transform, that is, $H_{r,c} = 1/\sqrt{n}\, \left( \cos(2 \pi\, r c / n) + \sin(2 \pi\, r c / n) \right)$. On symmetric sequences H acts like M, and on antisymmetric sequences like $-i\, M$, so the eigenvalues +1 and +i of M give the eigenvalue +1 of H, and the eigenvalues −1 and −i of M give the eigenvalue −1. Then its characteristic polynomial is

$$ p(x) = (x-1)^{\lfloor (n+4)/4 \rfloor + \lfloor (n+1)/4 \rfloor}\, (x+1)^{\lfloor (n+2)/4 \rfloor + \lfloor (n-1)/4 \rfloor} \tag{25.11-13} $$

Chapter 26
Number theoretic transforms (NTTs)

We introduce the number theoretic transforms (NTTs). The routines for the fast NTTs are rather straightforward translations of the FFT algorithms. Radix-2 and radix-4 routines are given, and there should be no difficulty in translating any given complex FFT into the equivalent NTT. For the translation of real-valued FFT (or FHT) routines, we need to express sines and cosines in modular arithmetic. This is presented in sections 39.12.6 and 39.12.7.

As no rounding errors occur with the underlying modular arithmetic, the main application of NTTs is the fast computation of exact convolutions.

26.1 Prime moduli for NTTs

We want to implement FFTs in Z/mZ (the ring of integers modulo some integer m) instead of C, the field of complex numbers. These FFTs are called number theoretic transforms (NTTs), mod m FFTs or (if m is a prime) prime modulus transforms.
There is a restriction for the choice of m: for a length-n NTT we need a primitive n-th root of unity. A number r is called an n-th root of unity if $r^n = 1$. It is called a primitive n-th root if $r^k \neq 1$ for all $0 < k < n$ (see section 39.5 on page 774).

In C matters are simple: $e^{\pm 2 \pi i/n}$ is a primitive n-th root of unity for arbitrary n. For example, $e^{2 \pi i/21}$ is a primitive 21st root of unity. Now $r = e^{2 \pi i/3}$ is also a 21st root of unity but not a primitive root, because $r^3 = 1$. A primitive n-th root of 1 in Z/mZ is also called an element of order n. The 'cyclic' property of the elements r of order n lies at the heart of all FFT algorithms: $r^{n+k} = r^k$.

In Z/mZ things are not that simple: for a given modulus m primitive n-th roots of unity do not exist for arbitrary n. They only exist for some maximal order R and its divisors $d_i$: if r is an element of order R, then $r^{R/d_i}$ is a $d_i$-th root of unity because $(r^{R/d_i})^{d_i} = r^R = 1$. Therefore n, the length of the transform, must divide the maximal order R. This is the first condition for NTTs:

$$ n \mid R \tag{26.1-1} $$

The operations needed in FFTs are modular addition, subtraction and multiplication, as described in section 39.1 on page 764. Division is not needed, except for the division by n in the final normalization. Division by n is multiplication by the inverse of n, so n must be invertible in Z/mZ. Therefore n, the length of the transform, must be coprime to the modulus m. This is the second condition for NTTs:

$$ \gcd(n, m) = 1 \tag{26.1-2} $$

We restrict our attention to prime moduli, though NTTs are also possible with composite moduli. If the modulus is a prime p, then Z/pZ is the field $F_p = \mathrm{GF}(p)$: all elements except 0 have inverses and 'division is possible'. Thus the second condition (relation 26.1-2) is trivially fulfilled for all NTT lengths n < p: a prime p is coprime to all positive integers n < p.
Roots of unity are available for the maximal order R = p − 1 and its divisors. Therefore the first condition (relation 26.1-1) is that n divides p − 1. This restricts the choice for p to primes of the form p = v n + 1: for NTTs of length n = 2^k one will use primes like p = 3 · 5 · 2^27 + 1 (31 bits), p = 13 · 2^28 + 1 (32 bits), p = 3 · 29 · 2^56 + 1 (63 bits) or p = 27 · 2^59 + 1 (64 bits).

    arg 1: 62 == wb  [word bits, wb<=63]  default=62
    arg 2: 0.01 == deltab  [results are in the range [wb-deltab, wb]]  default=0.01
      minb = 61.99 = wb-0.01
    arg 3: 44 == minx  [log_2(min(fftlen))]  default=44
    ---- x = 44: ----
    4580495072570638337 = 0x3f91300000000001 = 1 + 2^44 * 83 * 3137  (61.9902 bits)
    4581058022524059649 = 0x3f93300000000001 = 1 + 2^44 * 3 * 11 * 13 * 607  (61.9904 bits)
    4582113553686724609 = 0x3f96f00000000001 = 1 + 2^44 * 3 * 7 * 79 * 157  (61.9907 bits)
    4585702359639785473 = 0x3fa3b00000000001 = 1 + 2^44 * 3^2 * 11 * 2633  (61.9918 bits)
    4587039365779161089 = 0x3fa8700000000001 = 1 + 2^44 * 7 * 193^2  (61.9923 bits)
    4587391209500049409 = 0x3fa9b00000000001 = 1 + 2^44 * 3 * 17 * 5113  (61.9924 bits)
    4588130081313914881 = 0x3fac500000000001 = 1 + 2^44 * 3 * 5 * 17387  (61.9926 bits)
    4589572640569556993 = 0x3fb1700000000001 = 1 + 2^44 * 11 * 37 * 641  (61.9931 bits)
    [--snip--]
    4610999923171655681 = 0x3ffd900000000001 = 1 + 2^44 * 5 * 19 * 31 * 89  (61.9998 bits)
    4611105476287922177 = 0x3ffdf00000000001 = 1 + 2^44 * 262111  (61.9998 bits)
    ---- x = 45: ----
    4580336742896238593 = 0x3f90a00000000001 = 1 + 2^45 * 29 * 67^2  (61.9902 bits)
    4581533011547258881 = 0x3f94e00000000001 = 1 + 2^45 * 3 * 5 * 8681  (61.9905 bits)
    4584347761314365441 = 0x3f9ee00000000001 = 1 + 2^45 * 5 * 11 * 23 * 103  (61.9914 bits)
    4587655092290715649 = 0x3faaa00000000001 = 1 + 2^45 * 3 * 7^2 * 887  (61.9925 bits)
    [--snip--]
    ---- x = 48: ----
    4585508845593296897 = 0x3fa3000000000001 = 1 + 2^48 * 11 * 1481  (61.9918 bits)
    ---- x = 49: ----
    4582975570802900993 = 0x3f9a000000000001 = 1 + 2^49 * 7 * 1163  (61.991 bits)
    4595360469778169857 = 0x3fc6000000000001 = 1 + 2^49 * 3^2 * 907  (61.9949 bits)
    ---- x = 50: ----
    4601552919265804289 = 0x3fdc000000000001 = 1 + 2^50 * 61 * 67  (61.9968 bits)

Figure 26.1-A: Primes suitable for NTTs of lengths dividing 2^44.

    modulus (hex)         == factorization + 1           log(m-1)/log(2)
    0x3f40f80000000001    == 2^43.3^2.5^2.7^2.47+1       61.9831
    0x3c0eb50000000001    == 2^40.3^3.5^2.7^3.17+1       61.9083
    0x3d673d0000000001    == 2^40.3^2.5^3.7^2.73+1       61.9402
    0x3fc22b0000000001    == 2^40.3^2.5^2.7^2.379+1      61.9945
    0x3bf6190000000001    == 2^40.3^2.5^3.7.499+1        61.906
    0x3d1d690000000001    == 2^40.3^2.5^2.7.2543+1       61.9335
    0x3d8c270000000001    == 2^40.3^2.5^2.7.13.197+1     61.9436
    0x3e8e8d0000000001    == 2^40.3^2.5^2.7.19.137+1     61.9671
    0x3ee4af0000000001    == 2^40.3^2.5^2.7.2617+1       61.9748
    0x3ed23a0000000001    == 2^41.3^2.5^2.7.1307+1       61.9732
    0x3fafb60000000001    == 2^41.3^2.5^4.7.53+1         61.9929
    0x3c46140000000001    == 2^42.3^3.5^2.7.11.19+1      61.9135
    0x3e32440000000001    == 2^42.3^2.5^2.7.647+1        61.9588
    0x3d23900000000001    == 2^44.3^3.5^2.7.53+1         61.934

Figure 26.1-B: Primes suitable for NTTs of lengths dividing 2^40 · 3^2 · 5^2 · 7.

Primes suitable for NTTs (sometimes called FFT-primes) can be generated with the program [FXT: mod/fftprimes-demo.cc]. A shortened sample output is shown in figure 26.1-A. A few moduli that allow for transforms of lengths dividing 2^40 · 3^2 · 5^2 · 7 are shown in figure 26.1-B. The data is taken from [FXT: mod/moduli.txt].

We note that primality of moduli suitable for NTTs can easily be tested using Proth's theorem, see section 39.11.3.1 on page 795.

26.2 Implementation of NTTs

To implement NTTs (modulo m, length n), we need to implement modular arithmetic and replace $e^{\pm 2 \pi i/n}$ by a primitive n-th root r of unity in Z/mZ in the code. A C++ class implementing modular arithmetic is [FXT: class mod in mod/mod.h].

For the inverse transform one uses the (mod m) inverse $r^{-1}$ of the element r that was used for the forward transform. The element $r^{-1}$ is also a primitive n-th root.
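The recipe can be illustrated with a transform computed by definition. The following is a minimal sketch, not FXT code: all names (`pow_mod`, `element_of_order`, `slow_ntt`, `slow_inverse_ntt`) are hypothetical, the modulus p is assumed to be a prime below 2^32 so that products fit into 64 bits, and the length n is assumed to be a power of 2. The element of order n is found by raising trial values g to the power (p−1)/n. For n a power of 2 the result has order n exactly if its (n/2)-th power differs from 1. The inverses of r and n are computed by powering with the exponent p−2.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

typedef std::uint64_t u64;

// b**e mod p; assumes p < 2**32 so that the products do not overflow.
u64 pow_mod(u64 b, u64 e, u64 p)
{
    u64 r = 1 % p;
    b %= p;
    while (e)
    {
        if (e & 1) r = r * b % p;
        b = b * b % p;
        e >>= 1;
    }
    return r;
}

// Element of order n modulo the prime p, for n a power of 2.
// Returns 0 if n does not divide p-1 (first condition violated).
u64 element_of_order(u64 n, u64 p)
{
    if ((p - 1) % n != 0) return 0;
    if (n == 1) return 1;
    for (u64 g = 2; g < p; ++g)
    {
        const u64 r = pow_mod(g, (p - 1) / n, p);  // r**n == 1
        if (pow_mod(r, n / 2, p) != 1) return r;   // order is exactly n
    }
    return 0;
}

// NTT by definition, O(n**2): c[k] = sum_j f[j] * r**(j*k) mod p.
std::vector<u64> slow_ntt(const std::vector<u64> &f, u64 r, u64 p)
{
    const std::size_t n = f.size();
    std::vector<u64> c(n, 0);
    for (std::size_t k = 0; k < n; ++k)
    {
        const u64 rk = pow_mod(r, k, p);
        u64 w = 1;  // w == r**(j*k)
        for (std::size_t j = 0; j < n; ++j)
        {
            c[k] = (c[k] + f[j] % p * w) % p;
            w = w * rk % p;
        }
    }
    return c;
}

// Inverse transform: use r**(-1), then multiply by n**(-1).
std::vector<u64> slow_inverse_ntt(const std::vector<u64> &c, u64 r, u64 p)
{
    const u64 ri = pow_mod(r, p - 2, p);
    const u64 ni = pow_mod(c.size() % p, p - 2, p);
    std::vector<u64> f = slow_ntt(c, ri, p);
    for (u64 &x : f) x = x * ni % p;
    return f;
}
```

With p = 97 we have p − 1 = 2^5 · 3, so lengths up to 32 are possible, while a length-64 transform is not.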
Methods for the computation of the modular inverse are described in section 39.1.4 on page 767 (GCD algorithm) and in section 39.7.4 on page 781 (powering algorithm).

While the notion of the Fourier transform as a ‘decomposition into frequencies’ appears to be meaningless for NTTs, the algorithms are still named ‘decimation in time’ and ‘decimation in frequency’ in analogy to those in the complex domain.

The nice feature of NTTs is that, unlike floating-point FFTs, the transform incurs no loss of precision. Using the trigonometric recursion in its most naive form is mandatory, as the computation of roots of unity is expensive.

26.2.1 Radix-2 DIT NTT

Pseudocode for the radix-2 decimation in time (DIT) NTT (to be called with ldn=log2(n)):

    procedure mod_fft_dit2(f[], ldn, is)
    // mod_type f[0..2**ldn-1]
    {
        n := 2**ldn

        rn := element_of_order(n)  // (mod_type)
        if is<0  then  rn := rn**(-1)

        revbin_permute(f[], n)

        for ldm:=1 to ldn
        {
            m := 2**ldm
            mh := m/2

            dw := rn**(2**(ldn-ldm))  // (mod_type)
            w := 1                    // (mod_type)

            for j:=0 to mh-1
            {
                for r:=0 to n-m step m
                {
                    t1 := r + j
                    t2 := t1 + mh

                    v := f[t2] * w  // (mod_type)
                    u := f[t1]      // (mod_type)

                    f[t1] := u + v
                    f[t2] := u - v
                }

                w := w * dw  // trig recursion
            }
        }
    }

As shown in section 21.2.1 on page 412, it is a good idea to extract the ldm==1 stage of the outermost loop. Replace

        for ldm:=1 to ldn
        {

by

        for r:=0 to n-1 step 2
        {
            {f[r], f[r+1]} := {f[r]+f[r+1], f[r]-f[r+1]}  // parallel assignment
        }

        for ldm:=2 to ldn
        {

The C++ implementation is given in [FXT: ntt/nttdit2.cc]:

    void ntt_dit2_core(mod *f, ulong ldn, int is)
    // Auxiliary routine for ntt_dit2()
    // Decimation in time (DIT) radix-2 FFT
    // Input data must be in revbin_permuted order
    // ldn := base-2 logarithm of the array length
    // is := sign
    //       of the transform
    {
        const ulong n = 1UL< [...] >1);
        const mod dw = mod::root2pow( is>0 ? ldm : -ldm );
        mod w = (mod::one);
        for (ulong j=0; j [...]

    [...] 0 ? ldn : -ldn );
        for (ulong ldm=ldn; ldm>1; --ldm)
        {
            const ulong m = (1UL< [...] >1);
            mod w = mod::one;
            for (ulong j=0; j [...]

    [...] 0 ? 2 : -2 );
        ulong ldm = LX + (ldn&1);
        for ( ; ldm<=ldn ; ldm+=LX)
        {
            const ulong m = (1UL< [...] >LX);
            const mod dw = mod::root2pow( is>0 ? ldm : -ldm );
            mod w = (mod::one);
            mod w2 = w;
            mod w3 = w;
            for (ulong j=0; j [...]

    [...] 0 ? 2 : -2 );
        for (ulong ldm=ldn; ldm>=LX; ldm-=LX)
        {
            const ulong m = (1UL< [...] >LX);
            const mod dw = mod::root2pow( is>0 ? ldm : -ldm );
            mod w = (mod::one);
            mod w2 = w;
            mod w3 = w;
            for (ulong j=0; j [...]

    [...] eps )  return false;
        for (ulong i=1; i [...] eps )  return false;
        return true;
    }

where norm_sqr() computes the sums in the relations 27.1-6a and 27.1-6b:

    static double norm_sqr(const double *h, ulong n, ulong s=0)
    {
        s *= 2;  // Note!
        if ( s>=n )  return 0.0;
        double v = 0;
        for (ulong k=0,j=s; j [...]
        return v;

    [...] >1);
        const ulong m = n-1;  // mask to compute modulo n (n is a power of 2)
        for (ulong i=0,j=0; i [...] =minm; m>>=1)  wavelet_step(f, m, wf, t);
    }

The step for the inverse transform is [FXT: wavelet/invwavelet.cc]:

    void inverse_wavelet_step(double *f, ulong n, const wavelet_filter &wf, double *t)
    {
        const ulong nh = (n>>1);
        const ulong m = n-1;  // mask to compute modulo n (n is a power of 2)
        null(t, n);  // t[] := [0,0,...,0]
        for (ulong i=0, j=0; i