Nicholas Bell, mentored by Vincent Neiger and Xavier Caruso
Main PR: sagemath/sage#40508 (ready for review)
Additional PRs and issues:
- sagemath/sage#40291 (PR, merged)
- sagemath/sage#40423 (PR, merged)
- sagemath/sage#40425 (PR, abandoned, obsolete with sagemath/sage#40435)
- sagemath/sage#40435 (PR, merged)
- malb/m4rie#8 (PR, merged)
- sagemath/sage#40572 (issue by Vincent Neiger, resolved)
- sagemath/sage#40579 (PR by Vincent Neiger, merged)
- sagemath/sage#40652 (issue by Vincent Neiger, open)
- sagemath/sage#40735 (issue by Vincent Neiger, open)
To implement various Wiedemann-inspired algorithms in SageMath for computing a particular Krylov matrix, calculating its row rank profile without having to compute the matrix, and calculating a particular basis for the kernel of the matrix from the row rank profile.
The algorithms implemented in SageMath during this project have been based off Chapter 7 of https://arxiv.org/pdf/1512.03503. A summary of the key definitions and algorithms is given here.
For a field
.
The shifted Krylov matrix
When
. When
.
The first problem is to identify the Krylov profile of such a matrix given
The second problem is to compute a left kernel for
The permutation applied to the rows of
first by
It is also necessary to be able to identify the original
For these two purposes the approach that achieves both simply is to compute the original pairs and assign each an index before sorting:
.
I have added a helper function _krylov_row_coordinates that returns this list sorted.
The algorithm to compute the unshifted Krylov matrix
Given the matrix
For a sufficiently large power, i.e. matrix_from_rows) of
The permutation constructed via _krylov_row_coordinates is then applied to
The naive Krylov basis algorithm uses the built-in pivot_rows to compute the indices of the first linearly independent rows of the computed Krylov basis matrix_from_rows to select these rows from
The call to pivot_rows generally involves computing the transpose of _krylov_row_coordinates then provides the coordinates of each row in the basis.
I have implemented this in the function _krylov_basis_naive.
The elimination-based method is the core solution to the first problem: calculating the Krylov basis without explicitly computing
The algorithm starts with
The inductive step multiplies our best guess by
Once the degree bound is reached, the "best guess" will be the actual Krylov basis, and the list of coordinates having been tracked, may be returned to the user (with a small computation converting the coordinates to indices in
I have implemented this algorithm as _krylov_basis_elimination. For certain parameters, the naive version is faster, so the public facing function krylov_basis allows the user to select an implementation, or in the default case selects the algorithm it believes will be faster.
This algorithm starts by computing the Krylov basis
Then the following remarks can be made. Each row
The algorithm constructs a submatrix
Using
The left kernel is represented as a column-permuted
This kernel is returned, along with the indices indicating the row indices in
Optionally, the kernel may be returned in the form of a polynomial matrix, which corresponds to the linear interpolation basis described in the source paper.
I created a personal test suite early on for both verifying correctness of algorithms and comparing performance between different implementations.
Test cases are instantiated by providing a field and dimensions of the matrices generator which returns a string that will reconstruct the test instance when run in Sage. This makes failed randomly generated test cases easily rerunnable.
The test code (available here: https://github.com/Biffo89/GSoC-2025/blob/main/test_suite.sage) is not polished as it is not intended for integration into Sage, and I have made many, many changes to the test file over the course of the project, to test specific bugs, compare certain performance changes, and profile running times.
Current functioning procedures are:
sage: run_test_set(generate_test_set()) # checks correctness
sage: benchmark() # evaluates performance
In the original specification of the elimination-based algorithm, the entire permutation of
For this reason, _krylov_row_coordinates is equipped with an optional parameter row_pairs that specifies the rows in the matrix to be permuted.
The implementation of pivot_rows in Sage calculates the transpose of the given matrix by default. Unfortunately, the performance of calculating the transpose of a matrix over
The generic transpose of a matrix in Sage involves copying each element
from matrix_from_rows,
matrix_from_columns, and matrix_from_rows_and_columns as they all
involve data copies. I have added a function copy_from_unsafe for this
purpose that skips these conversions, and replaced default transpose
and matrix_from_\* implementations with versions that make use of this
more optimised function.
The PR for this change is sagemath/sage#40435.
The transpose of matrices over
Matrices over
is represented in memory as
and its transpose as
. By using reading and writing (__mzd_read_bits and
__mzd_write_bits) one
The PR for this change in M4RIE is malb/m4rie#8. As of writing this, the change has been accepted into M4RIE but not yet into Sage.
A large part of the time taken to compute the Krylov kernel basis is in the instantiation of the final polynomial matrix. Given a 2D array of lists of coefficients, the conversion to a polynomial matrix involves the conversion of each list of coefficients to a polynomial, which is a currently very costly in Sage. This is especially slow in the case of an extension field, as individual finite field elements are represented in Sage as FiniteField_givaroElement while the type stored in Polynomial_ZZ_pEX is ntl_ZZ_pE, and so each coefficient is converted from one type to another.
I had originally included a strategy to minimise this performance loss by caching the converted coefficients. While converting a list of coefficients to a polynomial is slow as it involves Givaro-NTL conversions, operations acting on already converted polynomials, such as __add__ and shift, are fast as they do not contain any such conversion. In the case of a small field relative to the number of coefficients in the matrix, the coefficients are converted degree-
A second strategy is to avoid polynomial construction entirely by returning a matrix of coefficients. To represent a polynomial matrix
is represented as
. The downside however is that these matrices are can become increasingly sparse, as the storage space for parameters
The first strategy was overly complicated, very parameter-dependent, and essentially a local fix for a quite global performance fix within the library. The second solution was less complicated but still quite costly in terms of storage. For these reasons I have provided a modified version of the second solution which returns the relevant (non-zero) columns of the coefficient matrix, sorted by the priority of their corresponding rows. This stores
This reduced matrix is returned, along with the row coordinate triplets corresponding to the preserved columns.
In certain cases when calculating _krylov_basis_elimination we may be able to eliminate rows much sooner than the base algorithm. To consider an extreme example, take j > 0 to form part of the basis, as its position in the matrix would be strictly after these initial rows.
The more general shortcut here is that if a full set of
The original algorithm for computing the Krylov basis comes with an optional parameter that allows the user to provide an upper bound on the power of
This is intended purely as a performance-improving argument given the user has additional information about the type of input. If the user provides incorrect bounds, the naive implementation should (but is not specified to) return the correct basis, while the elimination-based method may return an incorrect result. There is a warning in the documentation to provide only correct upper bounds. As a default value,
While investigating the transpose function of matrices over
sage: M = matrix.random(GF(2),3,3)
sage: N = matrix.block([[M,M,M]])
sage: M
[1 0 0]
[0 0 1]
[0 1 0]
sage: N
[1 0 0|1 0 0|1 0 0]
[0 0 1|0 0 1|0 0 1]
[0 1 0|0 1 0|0 1 0]
sage: N.transpose()
<repr(<sage.matrix.matrix_mod2_dense.Matrix_mod2_dense at 0x7f1eb7f0c1c0>) failed: IndexError: list index out of range>
The PR fixing this bug is sagemath/sage#40423.
There was also an issue with performing a transpose of empty matrices over
However, the issue was instead a problem with Sage. To construct empty matrices, the M4RIE library is not called, as there is no need to store anything more than the dimensions of the matrix. This leaves the pointer to the mzed_t null. When a non-trivial permutation is made, this triggers at least one call to the M4RIE library to swap two rows or columns. While an empty matrix would have been handled fine, Sage was passing a null pointer to the function call, causing a segmentation fault the first time it was dereferenced.
A fix for this was merged here: sagemath/sage#40291.
The problem of finding the first linearly independent rows/columns of a matrix uses Gaussian elimination. The default method calculates an echelon form for the independent columns and an echelon form of the transpose for the rows. For prime fields, LinBox contains a implementation of echelonisation that reveals both the independent rows and columns. Vincent Neiger replaced the algorithm in the case of prime fields, allowing both independent rows and columns to be calculated with one echelonisation.
- sagemath/sage#40572 (issue)
- sagemath/sage#40579 (fix)
The remaining work left to do for this includes:
- Merging the PR
- Integrating the transpose for
$\mathbb{F}_{2^e}$ matrices into Sage
Extensions to this project could include:
During the inductive step of the elimination based algorithm, the first half of the rows are linearly independent. However, this information is not used when calculating the independent rows of the entire matrix via Gaussian elimination. It could be useful to store the echelon form and use this in calculating the echelon form of the extended matrix.
Currently, the option to return a polynomial matrix instead of a coefficient matrix is very slow. This is due to slow conversion of lists of elements to polynomials, and particularly in the case of extension fields, the conversion between Givaro and NTL elements.
Angles of attack for improving performance include:
- Optimising conversion between Givaro and NTL field elements
- Optimising polynomial conversion by reducing the number of coefficient conversions by caching converted elements
- Optimising matrix conversion by caching the converted polynomials
I have raised an issue for the performance of this here: <github.com/sagemath/sage/issues/40667>.
Using the permutation methods on matrices currently requires constructing a Permutation object from a list of indices. This has the additional disadvantage that Permutation uses
The issue for this is raised here: sagemath/sage#40652
The code for this project is a generalisation of cyclic_subspace, which handles the case where cyclic_subspace. It would be worth investigating whether this is true in general and which
The issue for this is raised here: sagemath/sage#40735
The current state of the performance over a variety of fields is provided below. The results calculated have been for small values of
GF(2)
+------+------------+---------+
| n | multiply | gauss |
+======+============+=========+
| 64 | 8.55 μs | 35.5 μs |
+------+------------+---------+
| 128 | 13.8 μs | 96.2 μs |
+------+------------+---------+
| 256 | 46.4 μs | 265 μs |
+------+------------+---------+
| 512 | 210 μs | 507 μs |
+------+------------+---------+
| 1024 | 2.4 ms | 3.47 ms |
+------+------------+---------+
| 2048 | 12.2 ms | 10.4 ms |
+------+------------+---------+
uniform basis
+-------------+---------+---------+--------+---------+--------+--------+
| * n | 64 | 128 | 256 | 512 | 1024 | 2048 |
| * | | | | | | |
| m * | | | | | | |
+=============+=========+=========+========+=========+========+========+
| 1 | | | | 37.2 ms | 119 ms | 442 ms |
+-------------+---------+---------+--------+---------+--------+--------+
| 4 | | | | 57.6 ms | 179 ms | 670 ms |
+-------------+---------+---------+--------+---------+--------+--------+
| 16 | | | | 124 ms | 314 ms | 945 ms |
+-------------+---------+---------+--------+---------+--------+--------+
| 64 | 54.9 ms | 92.7 ms | 174 ms | | | |
+-------------+---------+---------+--------+---------+--------+--------+
| 128 | 133 ms | 209 ms | 351 ms | | | |
+-------------+---------+---------+--------+---------+--------+--------+
| 256 | 374 ms | 512 ms | 808 ms | | | |
+-------------+---------+---------+--------+---------+--------+--------+
hermite basis
+-------------+---------+--------+--------+---------+--------+--------+
| * n | 64 | 128 | 256 | 512 | 1024 | 2048 |
| * | | | | | | |
| m * | | | | | | |
+=============+=========+========+========+=========+========+========+
| 1 | | | | 34.8 ms | 109 ms | 471 ms |
+-------------+---------+--------+--------+---------+--------+--------+
| 4 | | | | 63.1 ms | 191 ms | 566 ms |
+-------------+---------+--------+--------+---------+--------+--------+
| 16 | | | | 169 ms | 460 ms | 1.47 s |
+-------------+---------+--------+--------+---------+--------+--------+
| 64 | 65.8 ms | 111 ms | 215 ms | | | |
+-------------+---------+--------+--------+---------+--------+--------+
| 128 | 155 ms | 241 ms | 424 ms | | | |
+-------------+---------+--------+--------+---------+--------+--------+
| 256 | 430 ms | 598 ms | 953 ms | | | |
+-------------+---------+--------+--------+---------+--------+--------+
uniform profile
+-------------+---------+---------+---------+---------+--------+--------+
| * n | 64 | 128 | 256 | 512 | 1024 | 2048 |
| * | | | | | | |
| m * | | | | | | |
+=============+=========+=========+=========+=========+========+========+
| 1 | | | | 30.9 ms | 135 ms | 511 ms |
+-------------+---------+---------+---------+---------+--------+--------+
| 4 | | | | 51.5 ms | 163 ms | 603 ms |
+-------------+---------+---------+---------+---------+--------+--------+
| 16 | | | | 77 ms | 200 ms | 650 ms |
+-------------+---------+---------+---------+---------+--------+--------+
| 64 | 18.1 ms | 35.2 ms | 78.9 ms | | | |
+-------------+---------+---------+---------+---------+--------+--------+
| 128 | 30.8 ms | 66.4 ms | 130 ms | | | |
+-------------+---------+---------+---------+---------+--------+--------+
| 256 | 58.5 ms | 119 ms | 249 ms | | | |
+-------------+---------+---------+---------+---------+--------+--------+
hermite profile
+-------------+---------+---------+--------+---------+--------+--------+
| * n | 64 | 128 | 256 | 512 | 1024 | 2048 |
| * | | | | | | |
| m * | | | | | | |
+=============+=========+=========+========+=========+========+========+
| 1 | | | | 31 ms | 134 ms | 509 ms |
+-------------+---------+---------+--------+---------+--------+--------+
| 4 | | | | 56.1 ms | 187 ms | 724 ms |
+-------------+---------+---------+--------+---------+--------+--------+
| 16 | | | | 130 ms | 332 ms | 1.27 s |
+-------------+---------+---------+--------+---------+--------+--------+
| 64 | 24.7 ms | 47.5 ms | 101 ms | | | |
+-------------+---------+---------+--------+---------+--------+--------+
| 128 | 41.3 ms | 85.8 ms | 173 ms | | | |
+-------------+---------+---------+--------+---------+--------+--------+
| 256 | 73.7 ms | 148 ms | 304 ms | | | |
+-------------+---------+---------+--------+---------+--------+--------+
GF(243)
+-----+------------+---------+
| n | multiply | gauss |
+=====+============+=========+
| 16 | 1.28 ms | 437 μs |
+-----+------------+---------+
| 32 | 9.73 ms | 2.88 ms |
+-----+------------+---------+
| 64 | 78.5 ms | 21 ms |
+-----+------------+---------+
uniform basis
+-------------+---------+---------+--------+
| * n | 16 | 32 | 64 |
| * | | | |
| m * | | | |
+=============+=========+=========+========+
| 1 | 9.63 ms | 75.8 ms | 652 ms |
+-------------+---------+---------+--------+
| 4 | 12.8 ms | 78.6 ms | 634 ms |
+-------------+---------+---------+--------+
| 16 | 13.1 ms | 66.8 ms | 514 ms |
+-------------+---------+---------+--------+
| 32 | 19.9 ms | 66.1 ms | 457 ms |
+-------------+---------+---------+--------+
| 64 | 39.7 ms | 112 ms | 440 ms |
+-------------+---------+---------+--------+
hermite basis
+-------------+---------+---------+--------+
| * n | 16 | 32 | 64 |
| * | | | |
| m * | | | |
+=============+=========+=========+========+
| 1 | 11.1 ms | 75.6 ms | 680 ms |
+-------------+---------+---------+--------+
| 4 | 20.7 ms | 103 ms | 857 ms |
+-------------+---------+---------+--------+
| 16 | 28.4 ms | 152 ms | 1.3 s |
+-------------+---------+---------+--------+
| 32 | 35 ms | 194 ms | 1.43 s |
+-------------+---------+---------+--------+
| 64 | 57.4 ms | 204 ms | 1.46 s |
+-------------+---------+---------+--------+
uniform profile
+-------------+---------+---------+--------+
| * n | 16 | 32 | 64 |
| * | | | |
| m * | | | |
+=============+=========+=========+========+
| 1 | 8.61 ms | 80.3 ms | 677 ms |
+-------------+---------+---------+--------+
| 4 | 8.66 ms | 59.9 ms | 539 ms |
+-------------+---------+---------+--------+
| 16 | 5.3 ms | 40 ms | 381 ms |
+-------------+---------+---------+--------+
| 32 | 6.69 ms | 26.3 ms | 292 ms |
+-------------+---------+---------+--------+
| 64 | 10 ms | 34.4 ms | 172 ms |
+-------------+---------+---------+--------+
hermite profile
+-------------+---------+---------+--------+
| * n | 16 | 32 | 64 |
| * | | | |
| m * | | | |
+=============+=========+=========+========+
| 1 | 12.1 ms | 79.1 ms | 496 ms |
+-------------+---------+---------+--------+
| 4 | 15.5 ms | 109 ms | 956 ms |
+-------------+---------+---------+--------+
| 16 | 17.4 ms | 140 ms | 1.02 s |
+-------------+---------+---------+--------+
| 32 | 18.7 ms | 161 ms | 1.26 s |
+-------------+---------+---------+--------+
| 64 | 26 ms | 165 ms | 1.18 s |
+-------------+---------+---------+--------+
GF(256)
+-----+------------+---------+
| n | multiply | gauss |
+=====+============+=========+
| 64 | 545 μs | 480 μs |
+-----+------------+---------+
| 128 | 1.54 ms | 1.32 ms |
+-----+------------+---------+
| 256 | 5.51 ms | 4.34 ms |
+-----+------------+---------+
| 512 | 17.9 ms | 18.1 ms |
+-----+------------+---------+
uniform basis
+-------------+---------+---------+--------+--------+
| * n | 64 | 128 | 256 | 512 |
| * | | | | |
| m * | | | | |
+=============+=========+=========+========+========+
| 1 | | 36.8 ms | 127 ms | 426 ms |
+-------------+---------+---------+--------+--------+
| 4 | | 50.1 ms | 154 ms | 625 ms |
+-------------+---------+---------+--------+--------+
| 16 | | 55.3 ms | 156 ms | 579 ms |
+-------------+---------+---------+--------+--------+
| 64 | 61.3 ms | 115 ms | 291 ms | |
+-------------+---------+---------+--------+--------+
| 128 | 139 ms | 227 ms | 424 ms | |
+-------------+---------+---------+--------+--------+
| 256 | 385 ms | 556 ms | 869 ms | |
+-------------+---------+---------+--------+--------+
hermite basis
+-------------+---------+---------+--------+--------+
| * n | 64 | 128 | 256 | 512 |
| * | | | | |
| m * | | | | |
+=============+=========+=========+========+========+
| 1 | | 37.3 ms | 115 ms | 424 ms |
+-------------+---------+---------+--------+--------+
| 4 | | 62.1 ms | 180 ms | 852 ms |
+-------------+---------+---------+--------+--------+
| 16 | | 95.3 ms | 241 ms | 1.15 s |
+-------------+---------+---------+--------+--------+
| 64 | 87.6 ms | 170 ms | 403 ms | |
+-------------+---------+---------+--------+--------+
| 128 | 177 ms | 335 ms | 627 ms | |
+-------------+---------+---------+--------+--------+
| 256 | 469 ms | 655 ms | 1.17 s | |
+-------------+---------+---------+--------+--------+
uniform profile
+-------------+---------+---------+---------+--------+
| * n | 64 | 128 | 256 | 512 |
| * | | | | |
| m * | | | | |
+=============+=========+=========+=========+========+
| 1 | | 37.6 ms | 124 ms | 439 ms |
+-------------+---------+---------+---------+--------+
| 4 | | 35.4 ms | 109 ms | 439 ms |
+-------------+---------+---------+---------+--------+
| 16 | | 35.8 ms | 99.8 ms | 406 ms |
+-------------+---------+---------+---------+--------+
| 64 | 20.6 ms | 46.6 ms | 120 ms | |
+-------------+---------+---------+---------+--------+
| 128 | 34.6 ms | 71.6 ms | 172 ms | |
+-------------+---------+---------+---------+--------+
| 256 | 61.1 ms | 127 ms | 273 ms | |
+-------------+---------+---------+---------+--------+
hermite profile
+-------------+---------+---------+--------+--------+
| * n | 64 | 128 | 256 | 512 |
| * | | | | |
| m * | | | | |
+=============+=========+=========+========+========+
| 1 | | 38.2 ms | 118 ms | 341 ms |
+-------------+---------+---------+--------+--------+
| 4 | | 49.8 ms | 156 ms | 608 ms |
+-------------+---------+---------+--------+--------+
| 16 | | 64.5 ms | 189 ms | 985 ms |
+-------------+---------+---------+--------+--------+
| 64 | 40 ms | 91.5 ms | 289 ms | |
+-------------+---------+---------+--------+--------+
| 128 | 56.9 ms | 137 ms | 357 ms | |
+-------------+---------+---------+--------+--------+
| 256 | 93.4 ms | 201 ms | 507 ms | |
+-------------+---------+---------+--------+--------+
GF(257)
+-----+------------+---------+
| n | multiply | gauss |
+=====+============+=========+
| 64 | 146 μs | 619 μs |
+-----+------------+---------+
| 128 | 571 μs | 2.43 ms |
+-----+------------+---------+
| 256 | 2.55 ms | 6.93 ms |
+-----+------------+---------+
| 512 | 13.1 ms | 25.5 ms |
+-----+------------+---------+
uniform basis
+-------------+---------+---------+---------+--------+
| * n | 64 | 128 | 256 | 512 |
| * | | | | |
| m * | | | | |
+=============+=========+=========+=========+========+
| 1 | | 20.9 ms | 72.1 ms | 317 ms |
+-------------+---------+---------+---------+--------+
| 4 | | 29.7 ms | 100 ms | 378 ms |
+-------------+---------+---------+---------+--------+
| 16 | | 47.7 ms | 126 ms | 425 ms |
+-------------+---------+---------+---------+--------+
| 64 | 66.6 ms | 117 ms | 242 ms | |
+-------------+---------+---------+---------+--------+
| 128 | 140 ms | 230 ms | 429 ms | |
+-------------+---------+---------+---------+--------+
| 256 | 419 ms | 545 ms | 892 ms | |
+-------------+---------+---------+---------+--------+
hermite basis
+-------------+---------+---------+---------+--------+
| * n | 64 | 128 | 256 | 512 |
| * | | | | |
| m * | | | | |
+=============+=========+=========+=========+========+
| 1 | | 21.8 ms | 72.1 ms | 317 ms |
+-------------+---------+---------+---------+--------+
| 4 | | 29.4 ms | 106 ms | 367 ms |
+-------------+---------+---------+---------+--------+
| 16 | | 78.1 ms | 188 ms | 780 ms |
+-------------+---------+---------+---------+--------+
| 64 | 81.2 ms | 165 ms | 367 ms | |
+-------------+---------+---------+---------+--------+
| 128 | 166 ms | 300 ms | 605 ms | |
+-------------+---------+---------+---------+--------+
| 256 | 457 ms | 659 ms | 1.17 s | |
+-------------+---------+---------+---------+--------+
uniform profile
+-------------+---------+---------+---------+--------+
| * n | 64 | 128 | 256 | 512 |
| * | | | | |
| m * | | | | |
+=============+=========+=========+=========+========+
| 1 | | 14.3 ms | 46.6 ms | 254 ms |
+-------------+---------+---------+---------+--------+
| 4 | | 20.2 ms | 64 ms | 262 ms |
+-------------+---------+---------+---------+--------+
| 16 | | 24.6 ms | 70.2 ms | 253 ms |
+-------------+---------+---------+---------+--------+
| 64 | 19 ms | 43.8 ms | 105 ms | |
+-------------+---------+---------+---------+--------+
| 128 | 33 ms | 71 ms | 158 ms | |
+-------------+---------+---------+---------+--------+
| 256 | 59.7 ms | 126 ms | 267 ms | |
+-------------+---------+---------+---------+--------+
hermite profile
+-------------+---------+---------+---------+--------+
| * n | 64 | 128 | 256 | 512 |
| * | | | | |
| m * | | | | |
+=============+=========+=========+=========+========+
| 1 | | 19 ms | 62.3 ms | 197 ms |
+-------------+---------+---------+---------+--------+
| 4 | | 33.4 ms | 99.8 ms | 338 ms |
+-------------+---------+---------+---------+--------+
| 16 | | 47.6 ms | 148 ms | 589 ms |
+-------------+---------+---------+---------+--------+
| 64 | 32.7 ms | 87.2 ms | 238 ms | |
+-------------+---------+---------+---------+--------+
| 128 | 49.1 ms | 127 ms | 301 ms | |
+-------------+---------+---------+---------+--------+
| 256 | 80.8 ms | 190 ms | 486 ms | |
+-------------+---------+---------+---------+--------+
GF(59049)
+-----+------------+---------+
| n | multiply | gauss |
+=====+============+=========+
| 8 | 339 μs | 195 μs |
+-----+------------+---------+
| 16 | 2.97 ms | 1.25 ms |
+-----+------------+---------+
| 32 | 23.1 ms | 8.95 ms |
+-----+------------+---------+
uniform basis
+-------------+---------+---------+--------+
| * n | 8 | 16 | 32 |
| * | | | |
| m * | | | |
+=============+=========+=========+========+
| 1 | 4.08 ms | 23.6 ms | 190 ms |
+-------------+---------+---------+--------+
| 4 | 5.2 ms | 25.9 ms | 197 ms |
+-------------+---------+---------+--------+
| 8 | 5.16 ms | 24.3 ms | 179 ms |
+-------------+---------+---------+--------+
| 16 | 7.21 ms | 23.9 ms | 164 ms |
+-------------+---------+---------+--------+
| 32 | 12.8 ms | 34.1 ms | 182 ms |
+-------------+---------+---------+--------+
hermite basis
+-------------+---------+---------+--------+
| * n | 8 | 16 | 32 |
| * | | | |
| m * | | | |
+=============+=========+=========+========+
| 1 | 4.03 ms | 23.7 ms | 191 ms |
+-------------+---------+---------+--------+
| 4 | 7.96 ms | 42.7 ms | 291 ms |
+-------------+---------+---------+--------+
| 8 | 8.16 ms | 39.4 ms | 345 ms |
+-------------+---------+---------+--------+
| 16 | 13 ms | 49.9 ms | 413 ms |
+-------------+---------+---------+--------+
| 32 | 17.9 ms | 58.8 ms | 490 ms |
+-------------+---------+---------+--------+
uniform profile
+-------------+---------+---------+---------+
| * n | 8 | 16 | 32 |
| * | | | |
| m * | | | |
+=============+=========+=========+=========+
| 1 | 4.43 ms | 16 ms | 188 ms |
+-------------+---------+---------+---------+
| 4 | 3.13 ms | 18.9 ms | 152 ms |
+-------------+---------+---------+---------+
| 8 | 2.5 ms | 14.2 ms | 132 ms |
+-------------+---------+---------+---------+
| 16 | 2.78 ms | 9.87 ms | 102 ms |
+-------------+---------+---------+---------+
| 32 | 3.73 ms | 12.5 ms | 63.4 ms |
+-------------+---------+---------+---------+
hermite profile
+-------------+---------+---------+--------+
| * n | 8 | 16 | 32 |
| * | | | |
| m * | | | |
+=============+=========+=========+========+
| 1 | 4.48 ms | 24.1 ms | 132 ms |
+-------------+---------+---------+--------+
| 4 | 6.67 ms | 27.8 ms | 288 ms |
+-------------+---------+---------+--------+
| 8 | 7.38 ms | 40.5 ms | 305 ms |
+-------------+---------+---------+--------+
| 16 | 7.59 ms | 49.6 ms | 377 ms |
+-------------+---------+---------+--------+
| 32 | 8.82 ms | 47.5 ms | 406 ms |
+-------------+---------+---------+--------+
GF(65536)
+-----+------------+---------+
| n | multiply | gauss |
+=====+============+=========+
| 8 | 152 μs | 138 μs |
+-----+------------+---------+
| 16 | 12.2 ms | 10.6 ms |
+-----+------------+---------+
| 32 | 35.6 ms | 27.1 ms |
+-----+------------+---------+
uniform basis
+-------------+---------+--------+--------+
| * n | 8 | 16 | 32 |
| * | | | |
| m * | | | |
+=============+=========+========+========+
| 1 | 13.1 ms | 121 ms | 386 ms |
+-------------+---------+--------+--------+
| 4 | 14 ms | 138 ms | 409 ms |
+-------------+---------+--------+--------+
| 8 | 12.9 ms | 105 ms | 387 ms |
+-------------+---------+--------+--------+
| 16 | 20.1 ms | 127 ms | 427 ms |
+-------------+---------+--------+--------+
| 32 | 24.8 ms | 134 ms | 336 ms |
+-------------+---------+--------+--------+
hermite basis
+-------------+---------+--------+--------+
| * n | 8 | 16 | 32 |
| * | | | |
| m * | | | |
+=============+=========+========+========+
| 1 | 13.4 ms | 123 ms | 376 ms |
+-------------+---------+--------+--------+
| 4 | 17.8 ms | 210 ms | 476 ms |
+-------------+---------+--------+--------+
| 8 | 16.2 ms | 245 ms | 606 ms |
+-------------+---------+--------+--------+
| 16 | 23.3 ms | 292 ms | 877 ms |
+-------------+---------+--------+--------+
| 32 | 28.5 ms | 258 ms | 928 ms |
+-------------+---------+--------+--------+
uniform profile
+-------------+---------+---------+--------+
| * n | 8 | 16 | 32 |
| * | | | |
| m * | | | |
+=============+=========+=========+========+
| 1 | 6.86 ms | 69.9 ms | 378 ms |
+-------------+---------+---------+--------+
| 4 | 4.19 ms | 90.4 ms | 302 ms |
+-------------+---------+---------+--------+
| 8 | 3.07 ms | 76.7 ms | 245 ms |
+-------------+---------+---------+--------+
| 16 | 3.27 ms | 52.9 ms | 217 ms |
+-------------+---------+---------+--------+
| 32 | 3.98 ms | 51.8 ms | 132 ms |
+-------------+---------+---------+--------+
hermite profile
+-------------+---------+--------+--------+
| * n | 8 | 16 | 32 |
| * | | | |
| m * | | | |
+=============+=========+========+========+
| 1 | 7.12 ms | 77 ms | 367 ms |
+-------------+---------+--------+--------+
| 4 | 7.55 ms | 171 ms | 363 ms |
+-------------+---------+--------+--------+
| 8 | 6.09 ms | 173 ms | 492 ms |
+-------------+---------+--------+--------+
| 16 | 6.4 ms | 180 ms | 724 ms |
+-------------+---------+--------+--------+
| 32 | 7.13 ms | 219 ms | 642 ms |
+-------------+---------+--------+--------+
GF(65537)
+-----+------------+---------+
| n | multiply | gauss |
+=====+============+=========+
| 64 | 159 μs | 181 μs |
+-----+------------+---------+
| 128 | 682 μs | 586 μs |
+-----+------------+---------+
| 256 | 3.43 ms | 3.36 ms |
+-----+------------+---------+
| 512 | 19.7 ms | 18.9 ms |
+-----+------------+---------+
uniform basis
+-------------+---------+---------+---------+--------+
| * n | 64 | 128 | 256 | 512 |
| * | | | | |
| m * | | | | |
+=============+=========+=========+=========+========+
| 1 | | 19.8 ms | 80.1 ms | 405 ms |
+-------------+---------+---------+---------+--------+
| 4 | | 25.3 ms | 98.8 ms | 436 ms |
+-------------+---------+---------+---------+--------+
| 16 | | 42.2 ms | 130 ms | 513 ms |
+-------------+---------+---------+---------+--------+
| 64 | 62.6 ms | 114 ms | 257 ms | |
+-------------+---------+---------+---------+--------+
| 128 | 145 ms | 236 ms | 451 ms | |
+-------------+---------+---------+---------+--------+
| 256 | 411 ms | 566 ms | 952 ms | |
+-------------+---------+---------+---------+--------+
hermite basis
+-------------+---------+---------+---------+--------+
| * n | 64 | 128 | 256 | 512 |
| * | | | | |
| m * | | | | |
+=============+=========+=========+=========+========+
| 1 | | 18.6 ms | 80.1 ms | 400 ms |
+-------------+---------+---------+---------+--------+
| 4 | | 24.3 ms | 98.7 ms | 536 ms |
+-------------+---------+---------+---------+--------+
| 16 | | 55.5 ms | 184 ms | 873 ms |
+-------------+---------+---------+---------+--------+
| 64 | 75.8 ms | 147 ms | 374 ms | |
+-------------+---------+---------+---------+--------+
| 128 | 172 ms | 283 ms | 619 ms | |
+-------------+---------+---------+---------+--------+
| 256 | 471 ms | 661 ms | 1.17 s | |
+-------------+---------+---------+---------+--------+
uniform profile
+-------------+---------+---------+---------+--------+
| * n | 64 | 128 | 256 | 512 |
| * | | | | |
| m * | | | | |
+=============+=========+=========+=========+========+
| 1 | | 14.9 ms | 43.6 ms | 235 ms |
+-------------+---------+---------+---------+--------+
| 4 | | 15 ms | 57.6 ms | 291 ms |
+-------------+---------+---------+---------+--------+
| 16 | | 18.8 ms | 60.5 ms | 276 ms |
+-------------+---------+---------+---------+--------+
| 64 | 17.5 ms | 37.1 ms | 93.8 ms | |
+-------------+---------+---------+---------+--------+
| 128 | 30.8 ms | 65.4 ms | 148 ms | |
+-------------+---------+---------+---------+--------+
| 256 | 58.3 ms | 122 ms | 261 ms | |
+-------------+---------+---------+---------+--------+
hermite profile
+-------------+---------+---------+---------+--------+
| * n | 64 | 128 | 256 | 512 |
| * | | | | |
| m * | | | | |
+=============+=========+=========+=========+========+
| 1 | | 11.7 ms | 44 ms | 235 ms |
+-------------+---------+---------+---------+--------+
| 4 | | 21.5 ms | 84.3 ms | 404 ms |
+-------------+---------+---------+---------+--------+
| 16 | | 33.1 ms | 127 ms | 585 ms |
+-------------+---------+---------+---------+--------+
| 64 | 27.3 ms | 61.9 ms | 203 ms | |
+-------------+---------+---------+---------+--------+
| 128 | 42.1 ms | 97.4 ms | 283 ms | |
+-------------+---------+---------+---------+--------+
| 256 | 79.1 ms | 165 ms | 424 ms | |
+-------------+---------+---------+---------+--------+
GF(67108879)
+-----+------------+---------+
| n | multiply | gauss |
+=====+============+=========+
| 32 | 80.9 μs | 210 μs |
+-----+------------+---------+
| 64 | 450 μs | 1 ms |
+-----+------------+---------+
| 128 | 3.35 ms | 6.72 ms |
+-----+------------+---------+
uniform basis
+-------------+---------+---------+---------+
| * n | 32 | 64 | 128 |
| * | | | |
| m * | | | |
+=============+=========+=========+=========+
| 1 | 3.69 ms | 11 ms | 56.6 ms |
+-------------+---------+---------+---------+
| 4 | 4.79 ms | 16.3 ms | 116 ms |
+-------------+---------+---------+---------+
| 16 | 10.2 ms | 24 ms | 95.2 ms |
+-------------+---------+---------+---------+
| 32 | 8.11 ms | 16.5 ms | 114 ms |
+-------------+---------+---------+---------+
| 64 | 38.2 ms | 64.7 ms | 159 ms |
+-------------+---------+---------+---------+
| 128 | 101 ms | 148 ms | 277 ms |
+-------------+---------+---------+---------+
hermite basis
+-------------+---------+---------+---------+
| * n | 32 | 64 | 128 |
| * | | | |
| m * | | | |
+=============+=========+=========+=========+
| 1 | 3.68 ms | 11.1 ms | 60.3 ms |
+-------------+---------+---------+---------+
| 4 | 5.04 ms | 16 ms | 81.1 ms |
+-------------+---------+---------+---------+
| 16 | 15 ms | 34.5 ms | 179 ms |
+-------------+---------+---------+---------+
| 32 | 10.9 ms | 56.1 ms | 239 ms |
+-------------+---------+---------+---------+
| 64 | 47.1 ms | 88.3 ms | 308 ms |
+-------------+---------+---------+---------+
| 128 | 120 ms | 182 ms | 459 ms |
+-------------+---------+---------+---------+
uniform profile
+-------------+---------+---------+---------+
| * n | 32 | 64 | 128 |
| * | | | |
| m * | | | |
+=============+=========+=========+=========+
| 1 | 3.32 ms | 4.42 ms | 49.3 ms |
+-------------+---------+---------+---------+
| 4 | 3.74 ms | 10.4 ms | 56 ms |
+-------------+---------+---------+---------+
| 16 | 4.27 ms | 11.4 ms | 55.8 ms |
+-------------+---------+---------+---------+
| 32 | 5.65 ms | 14.1 ms | 26.3 ms |
+-------------+---------+---------+---------+
| 64 | 9.02 ms | 19.7 ms | 67.5 ms |
+-------------+---------+---------+---------+
| 128 | 15.8 ms | 34.1 ms | 90 ms |
+-------------+---------+---------+---------+
hermite profile
+-------------+---------+---------+---------+
| * n | 32 | 64 | 128 |
| * | | | |
| m * | | | |
+=============+=========+=========+=========+
| 1 | 1.31 ms | 3.72 ms | 49.2 ms |
+-------------+---------+---------+---------+
| 4 | 5.99 ms | 16 ms | 80.2 ms |
+-------------+---------+---------+---------+
| 16 | 8.46 ms | 24.8 ms | 147 ms |
+-------------+---------+---------+---------+
| 32 | 10.9 ms | 22.4 ms | 77.4 ms |
+-------------+---------+---------+---------+
| 64 | 14.5 ms | 42.7 ms | 223 ms |
+-------------+---------+---------+---------+
| 128 | 22.6 ms | 57 ms | 256 ms |
+-------------+---------+---------+---------+