17db96d56Sopenharmony_ci 27db96d56Sopenharmony_ci 37db96d56Sopenharmony_ciBignum support (Fast Number Theoretic Transform or FNT): 47db96d56Sopenharmony_ci======================================================== 57db96d56Sopenharmony_ci 67db96d56Sopenharmony_ciBignum arithmetic in libmpdec uses the scheme for fast convolution 77db96d56Sopenharmony_ciof integer sequences from: 87db96d56Sopenharmony_ci 97db96d56Sopenharmony_ciJ. M. Pollard: The fast Fourier transform in a finite field 107db96d56Sopenharmony_cihttp://www.ams.org/journals/mcom/1971-25-114/S0025-5718-1971-0301966-0/home.html 117db96d56Sopenharmony_ci 127db96d56Sopenharmony_ci 137db96d56Sopenharmony_ciThe transform in a finite field can be used for convolution in the same 147db96d56Sopenharmony_ciway as the Fourier Transform. The main advantages of the Number Theoretic 157db96d56Sopenharmony_ciTransform are that it is both exact and very memory efficient. 167db96d56Sopenharmony_ci 177db96d56Sopenharmony_ci 187db96d56Sopenharmony_ciConvolution in pseudo-code: 197db96d56Sopenharmony_ci~~~~~~~~~~~~~~~~~~~~~~~~~~~ 207db96d56Sopenharmony_ci 217db96d56Sopenharmony_ci fnt_convolute(a, b): 227db96d56Sopenharmony_ci x = fnt(a) # forward transform of a 237db96d56Sopenharmony_ci y = fnt(b) # forward transform of b 247db96d56Sopenharmony_ci z = pairwise multiply x[i] and y[i] 257db96d56Sopenharmony_ci result = inv_fnt(z) # backward transform of z. 267db96d56Sopenharmony_ci 277db96d56Sopenharmony_ci 287db96d56Sopenharmony_ciExtending the maximum transform length (Chinese Remainder Theorem): 297db96d56Sopenharmony_ci------------------------------------------------------------------- 307db96d56Sopenharmony_ci 317db96d56Sopenharmony_ciThe maximum transform length is quite limited when using a single 327db96d56Sopenharmony_ciprime field. However, it is possible to use multiple primes and 337db96d56Sopenharmony_cirecover the result using the Chinese Remainder Theorem. 347db96d56Sopenharmony_ci 357db96d56Sopenharmony_ci 367db96d56Sopenharmony_ciMultiplication in pseudo-code: 377db96d56Sopenharmony_ci~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 387db96d56Sopenharmony_ci 397db96d56Sopenharmony_ci _mpd_fntmul(u, v): 407db96d56Sopenharmony_ci c1 = fnt_convolute(u, v, P1) # convolute modulo prime1 417db96d56Sopenharmony_ci c2 = fnt_convolute(u, v, P2) # convolute modulo prime2 427db96d56Sopenharmony_ci c3 = fnt_convolute(u, v, P3) # convolute modulo prime3 437db96d56Sopenharmony_ci result = crt3(c1, c2, c3) # Chinese Remainder Theorem 447db96d56Sopenharmony_ci 457db96d56Sopenharmony_ci 467db96d56Sopenharmony_ciOptimized transform functions: 477db96d56Sopenharmony_ci------------------------------ 487db96d56Sopenharmony_ci 497db96d56Sopenharmony_ciThere are three different fnt() functions: 507db96d56Sopenharmony_ci 517db96d56Sopenharmony_ci std_fnt: "standard" decimation in frequency transform for array lengths 527db96d56Sopenharmony_ci of 2**n. Performs well up to 1024 words. 537db96d56Sopenharmony_ci 547db96d56Sopenharmony_ci sixstep: Cache-friendly algorithm for array lengths of 2**n. Outperforms 557db96d56Sopenharmony_ci std_fnt for large arrays. 567db96d56Sopenharmony_ci 577db96d56Sopenharmony_ci fourstep: Algorithm for array lengths of 3 * 2**n. Also cache friendly 587db96d56Sopenharmony_ci in large parts. 597db96d56Sopenharmony_ci 607db96d56Sopenharmony_ci 617db96d56Sopenharmony_ciList of bignum-only files: 627db96d56Sopenharmony_ci-------------------------- 637db96d56Sopenharmony_ci 647db96d56Sopenharmony_ciFunctions from these files are only used in _mpd_fntmul(). 657db96d56Sopenharmony_ci 667db96d56Sopenharmony_ci umodarith.h -> fast low level routines for unsigned modular arithmetic 677db96d56Sopenharmony_ci numbertheory.c -> routines for setting up the FNT 687db96d56Sopenharmony_ci difradix2.c -> decimation in frequency transform, used as the 697db96d56Sopenharmony_ci "base case" by the following three files: 707db96d56Sopenharmony_ci 717db96d56Sopenharmony_ci fnt.c -> standard transform for smaller arrays 727db96d56Sopenharmony_ci sixstep.c -> transform large arrays of length 2**n 737db96d56Sopenharmony_ci fourstep.c -> transform arrays of length 3 * 2**n 747db96d56Sopenharmony_ci 757db96d56Sopenharmony_ci convolute.c -> do the actual fast convolution, using one of 767db96d56Sopenharmony_ci the three transform functions. 777db96d56Sopenharmony_ci transpose.c -> transpositions needed for the sixstep algorithm. 787db96d56Sopenharmony_ci crt.c -> Chinese Remainder Theorem: use information from three 797db96d56Sopenharmony_ci transforms modulo three different primes to get the 807db96d56Sopenharmony_ci final result. 817db96d56Sopenharmony_ci 827db96d56Sopenharmony_ci 837db96d56Sopenharmony_ci 84