17db96d56Sopenharmony_ci 27db96d56Sopenharmony_ci 37db96d56Sopenharmony_ci(* Copyright (c) 2011-2020 Stefan Krah. All rights reserved. *) 47db96d56Sopenharmony_ci 57db96d56Sopenharmony_ci 67db96d56Sopenharmony_ciThe Matrix Fourier Transform: 77db96d56Sopenharmony_ci============================= 87db96d56Sopenharmony_ci 97db96d56Sopenharmony_ciIn libmpdec, the Matrix Fourier Transform [1] is called four-step transform 107db96d56Sopenharmony_ciafter a variant that appears in [2]. The algorithm requires that the input 117db96d56Sopenharmony_ciarray can be viewed as an R*C matrix. 127db96d56Sopenharmony_ci 137db96d56Sopenharmony_ciAll operations are done modulo p. For readability, the proofs drop all 147db96d56Sopenharmony_ciinstances of (mod p). 157db96d56Sopenharmony_ci 167db96d56Sopenharmony_ci 177db96d56Sopenharmony_ciAlgorithm four-step (forward transform): 187db96d56Sopenharmony_ci---------------------------------------- 197db96d56Sopenharmony_ci 207db96d56Sopenharmony_ci a := input array 217db96d56Sopenharmony_ci d := len(a) = R * C 227db96d56Sopenharmony_ci p := prime 237db96d56Sopenharmony_ci w := primitive root of unity of the prime field 247db96d56Sopenharmony_ci r := w**((p-1)/d) 257db96d56Sopenharmony_ci A := output array 267db96d56Sopenharmony_ci 277db96d56Sopenharmony_ci 1) Apply a length R FNT to each column. 287db96d56Sopenharmony_ci 297db96d56Sopenharmony_ci 2) Multiply each matrix element (addressed by j*C+m) by r**(j*m). 307db96d56Sopenharmony_ci 317db96d56Sopenharmony_ci 3) Apply a length C FNT to each row. 327db96d56Sopenharmony_ci 337db96d56Sopenharmony_ci 4) Transpose the matrix. 347db96d56Sopenharmony_ci 357db96d56Sopenharmony_ci 367db96d56Sopenharmony_ciProof (forward transform): 377db96d56Sopenharmony_ci-------------------------- 387db96d56Sopenharmony_ci 397db96d56Sopenharmony_ci The algorithm can be derived starting from the regular definition of 407db96d56Sopenharmony_ci the finite-field transform of length d: 417db96d56Sopenharmony_ci 427db96d56Sopenharmony_ci d-1 437db96d56Sopenharmony_ci ,---- 447db96d56Sopenharmony_ci \ 457db96d56Sopenharmony_ci A[k] = | a[l] * r**(k * l) 467db96d56Sopenharmony_ci / 477db96d56Sopenharmony_ci `---- 487db96d56Sopenharmony_ci l = 0 497db96d56Sopenharmony_ci 507db96d56Sopenharmony_ci 517db96d56Sopenharmony_ci The sum can be rearranged into the sum of the sums of columns: 527db96d56Sopenharmony_ci 537db96d56Sopenharmony_ci C-1 R-1 547db96d56Sopenharmony_ci ,---- ,---- 557db96d56Sopenharmony_ci \ \ 567db96d56Sopenharmony_ci = | | a[i * C + j] * r**(k * (i * C + j)) 577db96d56Sopenharmony_ci / / 587db96d56Sopenharmony_ci `---- `---- 597db96d56Sopenharmony_ci j = 0 i = 0 607db96d56Sopenharmony_ci 617db96d56Sopenharmony_ci 627db96d56Sopenharmony_ci Extracting a constant from the inner sum: 637db96d56Sopenharmony_ci 647db96d56Sopenharmony_ci C-1 R-1 657db96d56Sopenharmony_ci ,---- ,---- 667db96d56Sopenharmony_ci \ \ 677db96d56Sopenharmony_ci = | r**k*j * | a[i * C + j] * r**(k * i * C) 687db96d56Sopenharmony_ci / / 697db96d56Sopenharmony_ci `---- `---- 707db96d56Sopenharmony_ci j = 0 i = 0 717db96d56Sopenharmony_ci 727db96d56Sopenharmony_ci 737db96d56Sopenharmony_ci Without any loss of generality, let k = n * R + m, 747db96d56Sopenharmony_ci where n < C and m < R: 757db96d56Sopenharmony_ci 767db96d56Sopenharmony_ci C-1 R-1 777db96d56Sopenharmony_ci ,---- ,---- 787db96d56Sopenharmony_ci \ \ 797db96d56Sopenharmony_ci A[n*R+m] = | r**(R*n*j) * r**(m*j) * | a[i*C+j] * r**(R*C*n*i) * r**(C*m*i) 807db96d56Sopenharmony_ci / / 817db96d56Sopenharmony_ci `---- `---- 827db96d56Sopenharmony_ci j = 0 i = 0 837db96d56Sopenharmony_ci 847db96d56Sopenharmony_ci 857db96d56Sopenharmony_ci Since r = w ** ((p-1) / (R*C)): 867db96d56Sopenharmony_ci 877db96d56Sopenharmony_ci a) r**(R*C*n*i) = w**((p-1)*n*i) = 1 887db96d56Sopenharmony_ci 897db96d56Sopenharmony_ci b) r**(C*m*i) = w**((p-1) / R) ** (m*i) = r_R ** (m*i) 907db96d56Sopenharmony_ci 917db96d56Sopenharmony_ci c) r**(R*n*j) = w**((p-1) / C) ** (n*j) = r_C ** (n*j) 927db96d56Sopenharmony_ci 937db96d56Sopenharmony_ci r_R := root of the subfield of length R. 947db96d56Sopenharmony_ci r_C := root of the subfield of length C. 957db96d56Sopenharmony_ci 967db96d56Sopenharmony_ci 977db96d56Sopenharmony_ci C-1 R-1 987db96d56Sopenharmony_ci ,---- ,---- 997db96d56Sopenharmony_ci \ \ 1007db96d56Sopenharmony_ci A[n*R+m] = | r_C**(n*j) * [ r**(m*j) * | a[i*C+j] * r_R**(m*i) ] 1017db96d56Sopenharmony_ci / ^ / 1027db96d56Sopenharmony_ci `---- | `---- 1) transform the columns 1037db96d56Sopenharmony_ci j = 0 | i = 0 1047db96d56Sopenharmony_ci ^ | 1057db96d56Sopenharmony_ci | `-- 2) multiply 1067db96d56Sopenharmony_ci | 1077db96d56Sopenharmony_ci `-- 3) transform the rows 1087db96d56Sopenharmony_ci 1097db96d56Sopenharmony_ci 1107db96d56Sopenharmony_ci Note that the entire RHS is a function of n and m and that the results 1117db96d56Sopenharmony_ci for each pair (n, m) are stored in Fortran order. 1127db96d56Sopenharmony_ci 1137db96d56Sopenharmony_ci Let the term in square brackets be f(m, j). Step 1) and 2) precalculate 1147db96d56Sopenharmony_ci the term for all (m, j). After that, the original matrix is now a lookup 1157db96d56Sopenharmony_ci table with the mth element in the jth column at location m * C + j. 1167db96d56Sopenharmony_ci 1177db96d56Sopenharmony_ci Let the complete RHS be g(m, n). Step 3) does an in-place transform of 1187db96d56Sopenharmony_ci length n on all rows. After that, the original matrix is now a lookup 1197db96d56Sopenharmony_ci table with the mth element in the nth column at location m * C + n. 1207db96d56Sopenharmony_ci 1217db96d56Sopenharmony_ci But each (m, n) pair should be written to location n * R + m. Therefore, 1227db96d56Sopenharmony_ci step 4) transposes the result of step 3). 1237db96d56Sopenharmony_ci 1247db96d56Sopenharmony_ci 1257db96d56Sopenharmony_ci 1267db96d56Sopenharmony_ciAlgorithm four-step (inverse transform): 1277db96d56Sopenharmony_ci---------------------------------------- 1287db96d56Sopenharmony_ci 1297db96d56Sopenharmony_ci A := input array 1307db96d56Sopenharmony_ci d := len(A) = R * C 1317db96d56Sopenharmony_ci p := prime 1327db96d56Sopenharmony_ci d' := d**(p-2) # inverse of d 1337db96d56Sopenharmony_ci w := primitive root of unity of the prime field 1347db96d56Sopenharmony_ci r := w**((p-1)/d) # root of the subfield 1357db96d56Sopenharmony_ci r' := w**((p-1) - (p-1)/d) # inverse of r 1367db96d56Sopenharmony_ci a := output array 1377db96d56Sopenharmony_ci 1387db96d56Sopenharmony_ci 0) View the matrix as a C*R matrix. 1397db96d56Sopenharmony_ci 1407db96d56Sopenharmony_ci 1) Transpose the matrix, producing an R*C matrix. 1417db96d56Sopenharmony_ci 1427db96d56Sopenharmony_ci 2) Apply a length C FNT to each row. 1437db96d56Sopenharmony_ci 1447db96d56Sopenharmony_ci 3) Multiply each matrix element (addressed by i*C+n) by r**(i*n). 1457db96d56Sopenharmony_ci 1467db96d56Sopenharmony_ci 4) Apply a length R FNT to each column. 1477db96d56Sopenharmony_ci 1487db96d56Sopenharmony_ci 1497db96d56Sopenharmony_ciProof (inverse transform): 1507db96d56Sopenharmony_ci-------------------------- 1517db96d56Sopenharmony_ci 1527db96d56Sopenharmony_ci The algorithm can be derived starting from the regular definition of 1537db96d56Sopenharmony_ci the finite-field inverse transform of length d: 1547db96d56Sopenharmony_ci 1557db96d56Sopenharmony_ci d-1 1567db96d56Sopenharmony_ci ,---- 1577db96d56Sopenharmony_ci \ 1587db96d56Sopenharmony_ci a[k] = d' * | A[l] * r' ** (k * l) 1597db96d56Sopenharmony_ci / 1607db96d56Sopenharmony_ci `---- 1617db96d56Sopenharmony_ci l = 0 1627db96d56Sopenharmony_ci 1637db96d56Sopenharmony_ci 1647db96d56Sopenharmony_ci The sum can be rearranged into the sum of the sums of columns. Note 1657db96d56Sopenharmony_ci that at this stage we still have a C*R matrix, so C denotes the number 1667db96d56Sopenharmony_ci of rows: 1677db96d56Sopenharmony_ci 1687db96d56Sopenharmony_ci R-1 C-1 1697db96d56Sopenharmony_ci ,---- ,---- 1707db96d56Sopenharmony_ci \ \ 1717db96d56Sopenharmony_ci = d' * | | a[j * R + i] * r' ** (k * (j * R + i)) 1727db96d56Sopenharmony_ci / / 1737db96d56Sopenharmony_ci `---- `---- 1747db96d56Sopenharmony_ci i = 0 j = 0 1757db96d56Sopenharmony_ci 1767db96d56Sopenharmony_ci 1777db96d56Sopenharmony_ci Extracting a constant from the inner sum: 1787db96d56Sopenharmony_ci 1797db96d56Sopenharmony_ci R-1 C-1 1807db96d56Sopenharmony_ci ,---- ,---- 1817db96d56Sopenharmony_ci \ \ 1827db96d56Sopenharmony_ci = d' * | r' ** (k*i) * | a[j * R + i] * r' ** (k * j * R) 1837db96d56Sopenharmony_ci / / 1847db96d56Sopenharmony_ci `---- `---- 1857db96d56Sopenharmony_ci i = 0 j = 0 1867db96d56Sopenharmony_ci 1877db96d56Sopenharmony_ci 1887db96d56Sopenharmony_ci Without any loss of generality, let k = m * C + n, 1897db96d56Sopenharmony_ci where m < R and n < C: 1907db96d56Sopenharmony_ci 1917db96d56Sopenharmony_ci R-1 C-1 1927db96d56Sopenharmony_ci ,---- ,---- 1937db96d56Sopenharmony_ci \ \ 1947db96d56Sopenharmony_ci A[m*C+n] = d' * | r' ** (C*m*i) * r' ** (n*i) * | a[j*R+i] * r' ** (R*C*m*j) * r' ** (R*n*j) 1957db96d56Sopenharmony_ci / / 1967db96d56Sopenharmony_ci `---- `---- 1977db96d56Sopenharmony_ci i = 0 j = 0 1987db96d56Sopenharmony_ci 1997db96d56Sopenharmony_ci 2007db96d56Sopenharmony_ci Since r' = w**((p-1) - (p-1)/d) and d = R*C: 2017db96d56Sopenharmony_ci 2027db96d56Sopenharmony_ci a) r' ** (R*C*m*j) = w**((p-1)*R*C*m*j - (p-1)*m*j) = 1 2037db96d56Sopenharmony_ci 2047db96d56Sopenharmony_ci b) r' ** (C*m*i) = w**((p-1)*C - (p-1)/R) ** (m*i) = r_R' ** (m*i) 2057db96d56Sopenharmony_ci 2067db96d56Sopenharmony_ci c) r' ** (R*n*j) = r_C' ** (n*j) 2077db96d56Sopenharmony_ci 2087db96d56Sopenharmony_ci d) d' = d**(p-2) = (R*C) ** (p-2) = R**(p-2) * C**(p-2) = R' * C' 2097db96d56Sopenharmony_ci 2107db96d56Sopenharmony_ci r_R' := inverse of the root of the subfield of length R. 2117db96d56Sopenharmony_ci r_C' := inverse of the root of the subfield of length C. 2127db96d56Sopenharmony_ci R' := inverse of R 2137db96d56Sopenharmony_ci C' := inverse of C 2147db96d56Sopenharmony_ci 2157db96d56Sopenharmony_ci 2167db96d56Sopenharmony_ci R-1 C-1 2177db96d56Sopenharmony_ci ,---- ,---- 2) transform the rows of a^T 2187db96d56Sopenharmony_ci \ \ 2197db96d56Sopenharmony_ci A[m*C+n] = R' * | r_R' ** (m*i) * [ r' ** (n*i) * C' * | a[j*R+i] * r_C' ** (n*j) ] 2207db96d56Sopenharmony_ci / ^ / ^ 2217db96d56Sopenharmony_ci `---- | `---- | 2227db96d56Sopenharmony_ci i = 0 | j = 0 | 2237db96d56Sopenharmony_ci ^ | `-- 1) Transpose input matrix 2247db96d56Sopenharmony_ci | `-- 3) multiply to address elements by 2257db96d56Sopenharmony_ci | i * C + j 2267db96d56Sopenharmony_ci `-- 3) transform the columns 2277db96d56Sopenharmony_ci 2287db96d56Sopenharmony_ci 2297db96d56Sopenharmony_ci 2307db96d56Sopenharmony_ci Note that the entire RHS is a function of m and n and that the results 2317db96d56Sopenharmony_ci for each pair (m, n) are stored in C order. 2327db96d56Sopenharmony_ci 2337db96d56Sopenharmony_ci Let the term in square brackets be f(n, i). Without step 1), the sum 2347db96d56Sopenharmony_ci would perform a length C transform on the columns of the input matrix. 2357db96d56Sopenharmony_ci This is a) inefficient and b) the results are needed in C order, so 2367db96d56Sopenharmony_ci step 1) exchanges rows and columns. 2377db96d56Sopenharmony_ci 2387db96d56Sopenharmony_ci Step 2) and 3) precalculate f(n, i) for all (n, i). After that, the 2397db96d56Sopenharmony_ci original matrix is now a lookup table with the ith element in the nth 2407db96d56Sopenharmony_ci column at location i * C + n. 2417db96d56Sopenharmony_ci 2427db96d56Sopenharmony_ci Let the complete RHS be g(m, n). Step 4) does an in-place transform of 2437db96d56Sopenharmony_ci length m on all columns. After that, the original matrix is now a lookup 2447db96d56Sopenharmony_ci table with the mth element in the nth column at location m * C + n, 2457db96d56Sopenharmony_ci which means that all A[k] = A[m * C + n] are in the correct order. 2467db96d56Sopenharmony_ci 2477db96d56Sopenharmony_ci 2487db96d56Sopenharmony_ci-- 2497db96d56Sopenharmony_ci 2507db96d56Sopenharmony_ci [1] Joerg Arndt: "Matters Computational" 2517db96d56Sopenharmony_ci http://www.jjj.de/fxt/ 2527db96d56Sopenharmony_ci [2] David H. Bailey: FFTs in External or Hierarchical Memory 2537db96d56Sopenharmony_ci http://crd.lbl.gov/~dhbailey/dhbpapers/ 2547db96d56Sopenharmony_ci 2557db96d56Sopenharmony_ci 2567db96d56Sopenharmony_ci 257