Skip to content

Instantly share code, notes, and snippets.

@Biffo89
Last active September 1, 2025 14:49
Show Gist options
  • Select an option

  • Save Biffo89/af96850071fd0212ac64b6ba7b03698e to your computer and use it in GitHub Desktop.

Select an option

Save Biffo89/af96850071fd0212ac64b6ba7b03698e to your computer and use it in GitHub Desktop.
GSoC Final Product Submission

GSoC 2025 Product Submission

Nicholas Bell, mentored by Vincent Neiger and Xavier Caruso

Main PR: sagemath/sage#40508 (ready for review)

Additional PRs and issues:

Aim of the project

To implement various Wiedemann-inspired algorithms in SageMath for computing a particular Krylov matrix, calculating its row rank profile without having to compute the matrix, and calculating a particular basis for the kernel of the matrix from the row rank profile.

Problem

The algorithms implemented in SageMath during this project have been based off Chapter 7 of https://arxiv.org/pdf/1512.03503. A summary of the key definitions and algorithms is given here.

Krylov matrix

For a field $\mathbb{K}$, let $E \in \mathbb{K}^{m\times n}$ and $M \in \mathbb{K}^{n\times n}$, and $s \in \mathbb{Z}^{m}$, and some maximum degree $d \in \mathbb{Z}$. We define the unshifted matrix

$$K^* := \begin{bmatrix} E \\\ EM \\\ EM^2 \\\ EM^3 \\\ \vdots \\\ EM^d \end{bmatrix} = \begin{bmatrix} E_0 \\\ \vdots \\\ E_{m-1} \\\ E_0 M \\\ \vdots \\\ E_{m-1} M \\\ \vdots \\\ E_0 M^d \\\ \vdots \\\ E_{m-1} M^d \end{bmatrix}$$

.

The shifted Krylov matrix $K$ is defined as the $K^*$ with the rows $E_i M^j$ reordered first by $s_i + j$, then by $s_i$, both in ascending order. An example used in the source paper is the following over $\mathbb{F}_{97}$

$$E = \begin{bmatrix} 27 & 49 & 29 \\\ 50 & 58 & 0 \\\ 77 & 10 & 29 \end{bmatrix} , J = \begin{bmatrix} 0 & 1 & 0 \\\ 0 & 0 & 1 \\\ 0 & 0 & 0 \end{bmatrix} \\\ , d = 3$$ $$K^* = \begin{bmatrix} 27 & 49 & 29 \\\ 50 & 58 & 0 \\\ 77 & 10 & 29 \\\ 0 & 27 & 49 \\\ 0 & 50 & 58 \\\ 0 & 77 & 10 \\\ 0 & 0 & 27 \\\ 0 & 0 & 50 \\\ 0 & 0 & 77 \\\ 0 & 0 & 0 \\\ 0 & 0 & 0 \\\ 0 & 0 & 0 \end{bmatrix}$$

When $s = (0,0,0)$ (the uniform shift), $K = K^*$. When $s=(0,d,2d)=(0,3,6)$ (the hermite shift), $K$ is the following matrix:

$$\begin{bmatrix} 27 & 49 & 29 \\\ 0 & 27 & 49 \\\ 0 & 0 & 27 \\\ 0 & 0 & 0 \\\ 50 & 58 & 0 \\\ 0 & 50 & 58 \\\ 0 & 0 & 50 \\\ 0 & 0 & 0 \\\ 77 & 10 & 29 \\\ 0 & 77 & 10 \\\ 0 & 0 & 77 \\\ 0 & 0 & 0 \end{bmatrix}$$

. When $s = (3,0,2)$, $K$ is the following matrix:

$$\begin{bmatrix} 50 & 58 & 0 \\\ 0 & 50 & 58 \\\ 0 & 0 & 50 \\\ 77 & 10 & 29 \\\ 27 & 49 & 29 \\\ 0 & 0 & 0 \\\ 0 & 77 & 10 \\\ 0 & 27 & 49 \\\ 0 & 0 & 77 \\\ 0 & 0 & 27 \\\ 0 & 0 & 0 \\\ 0 & 0 & 0 \end{bmatrix}$$

.

The first problem is to identify the Krylov profile of such a matrix given $E,M,s,d$, i.e. the first rows of $K$ that form a basis spanning the subspace spanned by all the rows. As construction of such a matrix may be very costly for large parameters, the aim is to compute this without having to construct the full matrix.

The second problem is to compute a left kernel for $K$ in row-reduced echelon form. In the original paper, this is characterised as an $s$-minimal linear interpolation basis, considering the rows of $E$ as a set of elements of the $\mathbb{K}[x]$-module defined with multiplication matrix $M$.

Basic algorithm description

Computing the permutation

The permutation applied to the rows of $K^*$ to obtain $K$ is characterised by a sorting of the coordinates

$$\left((0,0),\dots,(m-1,0),(0,1),\dots,(m-1,1),\dots,(0,d),\dots,(m-1,d)\right)$$

first by $(i,j) \to s_i + j$, then by $(i,j) \to i$, both in ascending order.

It is also necessary to be able to identify the original $(i,j)$ coordinate given the row index in $K$.

For these two purposes the approach that achieves both simply is to compute the original pairs and assign each an index before sorting:

$$\left((0,0,0),\dots,(m-1,0,m-1),(0,1,m),\dots,(m-1,1,2m-1),\dots,(0,d,md),\dots\right)$$

. I have added a helper function _krylov_row_coordinates that returns this list sorted.

Computing the Krylov matrix

The algorithm to compute the unshifted Krylov matrix $K^*$ is composed of two steps: computing the powers of $2$ of $M$ by repeated squaring, and computing the products $\begin{bmatrix}E\\EM\\EM^2\\\vdots\end{bmatrix}$ by multiplication by these powers of $M$.

Given the matrix $\begin{bmatrix}E\\\vdots\\EM^{2^j-1}\end{bmatrix}$ and $M^{2^{j-1}}$, the next step is computed inductively.

$$M^{2^j} = \left(M^{2^{j-1}}\right)^2$$ $$\begin{bmatrix}E\\\vdots\\EM^{2^{j+1}-1}\end{bmatrix} = \begin{bmatrix}\begin{bmatrix}E\\\vdots\\EM^{2^j-1}\end{bmatrix}\\\begin{bmatrix}E\\\vdots\\EM^{2^j-1}\end{bmatrix} M^{2^j}\end{bmatrix}$$

For a sufficiently large power, i.e. $2^j \geq \sigma$, $K^*$ is computed as a submatrix (using matrix_from_rows) of $\begin{bmatrix}E\\\vdots\\EM^{2^{j+1}-1}\end{bmatrix}$.

The permutation constructed via _krylov_row_coordinates is then applied to $K^*$ to give $K$.

Computing the naive Krylov basis (or row rank profile)

The naive Krylov basis algorithm uses the built-in pivot_rows to compute the indices of the first linearly independent rows of the computed Krylov basis $K$, then matrix_from_rows to select these rows from $K$ for the output.

The call to pivot_rows generally involves computing the transpose of $K$, then computing the RREF of this transpose via Linbox. This reveals the row rank profile of $K$. The output from _krylov_row_coordinates then provides the coordinates of each row in the basis.

I have implemented this in the function _krylov_basis_naive.

Computing the elimination Krylov basis

The elimination-based method is the core solution to the first problem: calculating the Krylov basis without explicitly computing $K$. It has the same idea of repeated multiplication by powers of $2$ of $M$. However, while the naive method performs each multiplication between the submatrices of $K^*$ and $M$ repeatedly, only sorting rows and finding the independent rows at the end, this optimised method sorts rows and eliminates dependent ones after each multiplication by a power of $M$, thereby reducing the size of the submatrix of $K$ stored in memory, and reducing the cost of performing multiplications and eliminations.

The algorithm starts with $E$ sorted according to our shift, and selects its first independent rows. These are denoted by coordinates $(i_1,0),\dots,(i_r,0)$ where $r$ is the rank of $E$. This is our initial "best guess" for the basis of $K$. It is also the Krylov basis for the same parameters, but setting $d=0$.

The inductive step multiplies our best guess by $M^{2^j}$, which has the effect of appending $(i,j+M^{2^j})$ to the list of rows for every $(i,j)$. Then the rows must be sorted again according to the shift. A further Gaussian elimination is performed to find the independent rows, and the rest are discarded.

Once the degree bound is reached, the "best guess" will be the actual Krylov basis, and the list of coordinates having been tracked, may be returned to the user (with a small computation converting the coordinates to indices in $K$).

I have implemented this algorithm as _krylov_basis_elimination. For certain parameters, the naive version is faster, so the public facing function krylov_basis allows the user to select an implementation, or in the default case selects the algorithm it believes will be faster.

Krylov kernel basis

This algorithm starts by computing the Krylov basis $B$ for the same parameters.

Then the following remarks can be made. Each row $E_i$ either appears in $B$ as some $E_i M^j$ or is excluded. The returned row coordinates indicate this. For every row $E_i$ appearing in $B$ as some $E_i M^j$, there must be a maximum such $j$ for this row. Let $\delta_i := j+1$ for this maximum $j$. For all rows $E_i$ not appearing in $B$, $\delta_i = 0$.

The algorithm constructs a submatrix $D$ of $K$ formed from the rows $`E_i M^{\delta_i}$ (the actual algorithm also selects certain columns, but this is a detail).

Using $D$ and the basis $B$, a sort of linear recurrence relation $R$ can be computed which transforms the basis into $D$.

The left kernel is represented as a column-permuted $\begin{bmatrix} -R \vert I_m \end{bmatrix}$, noting that $\begin{bmatrix} -R \vert I_m \end{bmatrix} \begin{bmatrix} B \\ D \end{bmatrix} = 0$.

This kernel is returned, along with the indices indicating the row indices in $B$ and $D$ after sorting, which indicate which rows to match the columns of the kernel to.

Optionally, the kernel may be returned in the form of a polynomial matrix, which corresponds to the linear interpolation basis described in the source paper.

Test suite

I created a personal test suite early on for both verifying correctness of algorithms and comparing performance between different implementations.

Test case generation

Test cases are instantiated by providing a field and dimensions of the matrices $E$ and $M$. Cases can either be generated randomly according to certain constraints chosen for $E$ and $M$, or the matrices, shifts and degree bounds can be set explicitly. Each test case has a method generator which returns a string that will reconstruct the test instance when run in Sage. This makes failed randomly generated test cases easily rerunnable.

The test code (available here: https://github.com/Biffo89/GSoC-2025/blob/main/test_suite.sage) is not polished as it is not intended for integration into Sage, and I have made many, many changes to the test file over the course of the project, to test specific bugs, compare certain performance changes, and profile running times.

Current functioning procedures are:

sage: run_test_set(generate_test_set()) # checks correctness
sage: benchmark() # evaluates performance

Optimisations and extensions to the algorithms

Calculation of subpermutations

In the original specification of the elimination-based algorithm, the entire permutation of $K$ is calculated explicitly beforehand. However, noticing that this permutation is just a sorting of the rows according to the original row $c$ in $E$ and the power $d$ of $M$, and that each application is on a much smaller instance (the first being only on $E$, the rest being on $R_{2^n}^*$), it is only necessary to calculate a permutation sorting the relevant submatrix each time.

For this reason, _krylov_row_coordinates is equipped with an optional parameter row_pairs that specifies the rows in the matrix to be permuted.

Optimisation of transpose of matrices over $\mathbb{F}_{2^n}$

The implementation of pivot_rows in Sage calculates the transpose of the given matrix by default. Unfortunately, the performance of calculating the transpose of a matrix over $\mathbb{F}_{2^n}$ was very slow, and for many other fields was also suboptimal.

Default transpose (and matrix_from_*)

The generic transpose of a matrix in Sage involves copying each element from $A_{i,j}$ to $B_{j,i}$. This by default involves two conversions: matrix data to Sage object for the read, and Sage object to matrix data for the write. A similar problem exists with matrix_from_rows, matrix_from_columns, and matrix_from_rows_and_columns as they all involve data copies. I have added a function copy_from_unsafe for this purpose that skips these conversions, and replaced default transpose and matrix_from_\* implementations with versions that make use of this more optimised function.

The PR for this change is sagemath/sage#40435.

Optimised transpose over $\mathbb{F}_{2^n}$

The transpose of matrices over $\mathbb{F}_2$ is much faster than what is possible copying each element one-by-one. I decided to analyse the implementation of matrices over $\mathbb{F}_{2^n}$ to see if a similar improvement could be made.

Matrices over $\mathbb{F}_{2^n}$ are represented in memory by the M4RIE library as matrices over $\mathbb{F}_2$, with each entry being stored horizontally, and padded to take up a number of bits dividing the word length $64$, with the each row padded to a multiple of $64$ bits. For example, the following matrix (over $\mathbb{F}_{32}$)

$$\begin{bmatrix} x^4 + x^2 & 1 & x^2 + x + 1 & x^3 + x \\\ x^3 + x & x^2 + x & x^2 + 1 & x^4 + 1 \\\ x^2 & x^3 + x^2 + x + 1 & x^2 + 1 & x^3 + x \\\ x^4 + x^3 + x^2 + x + 1 & 0 & x^3 + x + 1 & x^2 + 1 \end{bmatrix}$$

is represented in memory as

$$\begin{bmatrix} \texttt{\textcolor{red}{00101}000 \textcolor{red}{10000}000 \textcolor{red}{11100}000 \textcolor{red}{01010}000 00000000 00000000 00000000 00000000} \\\ \texttt{\textcolor{red}{01010}000 \textcolor{red}{01100}000 \textcolor{red}{10100}000 \textcolor{red}{10001}000 00000000 00000000 00000000 00000000} \\\ \texttt{\textcolor{red}{00100}000 \textcolor{red}{11110}000 \textcolor{red}{10100}000 \textcolor{red}{01010}000 00000000 00000000 00000000 00000000} \\\ \texttt{\textcolor{red}{11111}000 \textcolor{red}{00000}000 \textcolor{red}{11010}000 \textcolor{red}{10100}000 00000000 00000000 00000000 00000000} \end{bmatrix}$$

and its transpose as

$$\begin{bmatrix} \texttt{\textcolor{red}{00101}000 \textcolor{red}{01010}000 \textcolor{red}{00100}000 \textcolor{red}{11111}000 00000000 00000000 00000000 00000000} \\\ \texttt{\textcolor{red}{10000}000 \textcolor{red}{01100}000 \textcolor{red}{11110}000 \textcolor{red}{00000}000 00000000 00000000 00000000 00000000} \\\ \texttt{\textcolor{red}{11100}000 \textcolor{red}{10100}000 \textcolor{red}{10100}000 \textcolor{red}{11010}000 00000000 00000000 00000000 00000000} \\\ \texttt{\textcolor{red}{01010}000 \textcolor{red}{10001}000 \textcolor{red}{01010}000 \textcolor{red}{10100}000 00000000 00000000 00000000 00000000} \end{bmatrix}$$

. By using reading and writing (__mzd_read_bits and __mzd_write_bits) one $64$-bit word at a time, and using bit manipulation to move the data from each word read into the appropriate place in the write buffer, I was able to get the performance of the $\mathbb{F}_{2^n}$ transpose down to ~2x the CPU cycles per bit as $\mathbb{F}_2$ of similar size in memory, compared to ~9-23x for the naive transpose.

The PR for this change in M4RIE is malb/m4rie#8. As of writing this, the change has been accepted into M4RIE but not yet into Sage.

Performance issues with polynomial matrix construction

A large part of the time taken to compute the Krylov kernel basis is in the instantiation of the final polynomial matrix. Given a 2D array of lists of coefficients, the conversion to a polynomial matrix involves the conversion of each list of coefficients to a polynomial, which is a currently very costly in Sage. This is especially slow in the case of an extension field, as individual finite field elements are represented in Sage as FiniteField_givaroElement while the type stored in Polynomial_ZZ_pEX is ntl_ZZ_pE, and so each coefficient is converted from one type to another.

I had originally included a strategy to minimise this performance loss by caching the converted coefficients. While converting a list of coefficients to a polynomial is slow as it involves Givaro-NTL conversions, operations acting on already converted polynomials, such as __add__ and shift, are fast as they do not contain any such conversion. In the case of a small field relative to the number of coefficients in the matrix, the coefficients are converted degree-$0$ monomials and cached. The cached values are then shifted by the corresponding degree and summed.

A second strategy is to avoid polynomial construction entirely by returning a matrix of coefficients. To represent a polynomial matrix $P$ of dimension $m \times m$ and degree $d$, we return a coefficient matrix $C$ of dimension Using this approach, the following matrix over $\mathbb{F}_5$

$$\begin{bmatrix}x^2 + 4x + 2 & x & 4 & 2 \\\ 3x & x^2 + 3x + 3 & 4 & 0 \\\ 2x + 1 & 4 & x + 2 & 4 \\\ 2x & x + 2 & 2 & x + 2\end{bmatrix}$$

is represented as

$$\begin{bmatrix} \texttt{2 0 4 2 4 1 0 0 1 0 0 0} \\\ \texttt{0 3 4 0 3 3 0 0 0 1 0 0} \\\ \texttt{1 4 2 4 2 0 1 0 0 0 0 0} \\\ \texttt{0 2 2 2 2 1 0 1 0 0 0 0} \end{bmatrix}$$

. The downside however is that these matrices are can become increasingly sparse, as the storage space for parameters $m$ and $n$ can be up to $m^2(n+1)$ coefficients in $\mathbb{K}$ while only at most $m(n+2)$ non-zero coefficients are stored.

The first strategy was overly complicated, very parameter-dependent, and essentially a local fix for a quite global performance fix within the library. The second solution was less complicated but still quite costly in terms of storage. For these reasons I have provided a modified version of the second solution which returns the relevant (non-zero) columns of the coefficient matrix, sorted by the priority of their corresponding rows. This stores $m(n+2)$ coefficients in a matrix of size $m(m+n)$, which is a major improvement on the original.

This reduced matrix is returned, along with the row coordinate triplets corresponding to the preserved columns.

Early row exclusion

In certain cases when calculating _krylov_basis_elimination we may be able to eliminate rows much sooner than the base algorithm. To consider an extreme example, take $E$ to be a $1000\times1000$ matrix of full rank, and $M$ to be an arbitrary $1000\times1000$ matrix, with a uniform shift (i.e. $[0,\dots,0]$). After the first $1000$ rows $\left\{E_{i}M^0\right\}_{i=0}^1000$ of $K$ are identified as being linearly independent, it is not possible for any row $E_i M^j$ where j > 0 to form part of the basis, as its position in the matrix would be strictly after these initial rows.

The more general shortcut here is that if a full set of $n$ linearly independent rows has been identified in the Krylov matrix, all following rows in the Krylov matrix $K$ need not be examined. Given that the sequence of degrees is always increasing, it is possible to store certain basis candidate rows separately in the induction step, to minimise the cost and number of each multiplication and row reduction.

Differing degree bounds

The original algorithm for computing the Krylov basis comes with an optional parameter that allows the user to provide an upper bound on the power of $M$, given the user is aware beforehand of such a bound (on the infinite Krylov matrix). I have modified this to accept a separate degree bound for each row of $E$ in the case the user has even more specific information about the relations between the rows of the Krylov matrix.

This is intended purely as a performance-improving argument given the user has additional information about the type of input. If the user provides incorrect bounds, the naive implementation should (but is not specified to) return the correct basis, while the elimination-based method may return an incorrect result. There is a warning in the documentation to provide only correct upper bounds. As a default value, $n$ is chosen as it is known to be the an upper bound in general.

Bonus bugs found

Transpose of GF(2) matrix with subdivisions

While investigating the transpose function of matrices over $\mathbb{F}_{2^e}$, I noticed an inconsistency in the handling of subdivisions for $\mathbb{F}_{2}$. Subdivisions were not being transposed, so the following would fail as the function would attempt to set out-of-bounds subdivisions:

sage: M = matrix.random(GF(2),3,3)
sage: N = matrix.block([[M,M,M]])
sage: M
[1 0 0]
[0 0 1]
[0 1 0]
sage: N
[1 0 0|1 0 0|1 0 0]
[0 0 1|0 0 1|0 0 1]
[0 1 0|0 1 0|0 1 0]
sage: N.transpose()
<repr(<sage.matrix.matrix_mod2_dense.Matrix_mod2_dense at 0x7f1eb7f0c1c0>) failed: IndexError: list index out of range>

The PR fixing this bug is sagemath/sage#40423.

Permutation of empty GF(2^e) matrix

There was also an issue with performing a transpose of empty matrices over $\mathbb{F}_{2^e}$. When calculating a non-trivial row permutation of a matrix with $0$ columns or a non-trivial column permutation of a matrix with $0$ rows, this would cause a segmentation fault in M4RIE. This initially seemed like a problem with the M4RIE library, as the fault was being encountered a few function calls deep.

However, the issue was instead a problem with Sage. To construct empty matrices, the M4RIE library is not called, as there is no need to store anything more than the dimensions of the matrix. This leaves the pointer to the mzed_t null. When a non-trivial permutation is made, this triggers at least one call to the M4RIE library to swap two rows or columns. While an empty matrix would have been handled fine, Sage was passing a null pointer to the function call, causing a segmentation fault the first time it was dereferenced.

A fix for this was merged here: sagemath/sage#40291.

Issues completed by others as part of this project

Calculation of row and column pivots using one Gaussian elimination

The problem of finding the first linearly independent rows/columns of a matrix uses Gaussian elimination. The default method calculates an echelon form for the independent columns and an echelon form of the transpose for the rows. For prime fields, LinBox contains a implementation of echelonisation that reveals both the independent rows and columns. Vincent Neiger replaced the algorithm in the case of prime fields, allowing both independent rows and columns to be calculated with one echelonisation.

Work left

The remaining work left to do for this includes:

  • Merging the PR
  • Integrating the transpose for $\mathbb{F}_{2^e}$ matrices into Sage

Extensions to this project could include:

Smart calculation of echelon form

During the inductive step of the elimination based algorithm, the first half of the rows are linearly independent. However, this information is not used when calculating the independent rows of the entire matrix via Gaussian elimination. It could be useful to store the echelon form and use this in calculating the echelon form of the extended matrix.

Optimising polynomial matrix construction

Currently, the option to return a polynomial matrix instead of a coefficient matrix is very slow. This is due to slow conversion of lists of elements to polynomials, and particularly in the case of extension fields, the conversion between Givaro and NTL elements.

Angles of attack for improving performance include:

  • Optimising conversion between Givaro and NTL field elements
  • Optimising polynomial conversion by reducing the number of coefficient conversions by caching converted elements
  • Optimising matrix conversion by caching the converted polynomials

I have raised an issue for the performance of this here: <github.com/sagemath/sage/issues/40667>.

Modifying permute_rows and permute_columns to accept a list of indices

Using the permutation methods on matrices currently requires constructing a Permutation object from a list of indices. This has the additional disadvantage that Permutation uses $1$-based indexing. An issue has been raised for accepting a list of indices as the permutation.

The issue for this is raised here: sagemath/sage#40652

Updating implementation of cyclic_subspace

The code for this project is a generalisation of cyclic_subspace, which handles the case where $m=1$. The PR for this project is better performing than the code for cyclic_subspace. It would be worth investigating whether this is true in general and which

The issue for this is raised here: sagemath/sage#40735

Performance

The current state of the performance over a variety of fields is provided below. The results calculated have been for small values of $m$ ($1$, $4$, $16$) and large values of $m$ ($\frac{n}{4}$, $\frac{n}{2}$, $n$). All matrices are generated uniformly at random, and the two shifts examined are the uniform shift $[0,\dots,0]$ and the hermite shift $[0,n,2n,\dots,(m-1)n]$, which correspond more or less to the best and worst case. The uniform shift case should find the basis quickly, while the hermite shift likely requires performing the maximum number of inductive steps to find the basis and must therefore perform the maximum number of multiplications and Gaussian eliminations.

GF(2)

+------+------------+---------+
|    n | multiply   | gauss   |
+======+============+=========+
|   64 | 8.55 μs    | 35.5 μs |
+------+------------+---------+
|  128 | 13.8 μs    | 96.2 μs |
+------+------------+---------+
|  256 | 46.4 μs    | 265 μs  |
+------+------------+---------+
|  512 | 210 μs     | 507 μs  |
+------+------------+---------+
| 1024 | 2.4 ms     | 3.47 ms |
+------+------------+---------+
| 2048 | 12.2 ms    | 10.4 ms |
+------+------------+---------+

uniform basis
+-------------+---------+---------+--------+---------+--------+--------+
|   *       n | 64      | 128     | 256    | 512     | 1024   | 2048   |
|       *     |         |         |        |         |        |        |
|   m       * |         |         |        |         |        |        |
+=============+=========+=========+========+=========+========+========+
|           1 |         |         |        | 37.2 ms | 119 ms | 442 ms |
+-------------+---------+---------+--------+---------+--------+--------+
|           4 |         |         |        | 57.6 ms | 179 ms | 670 ms |
+-------------+---------+---------+--------+---------+--------+--------+
|          16 |         |         |        | 124 ms  | 314 ms | 945 ms |
+-------------+---------+---------+--------+---------+--------+--------+
|          64 | 54.9 ms | 92.7 ms | 174 ms |         |        |        |
+-------------+---------+---------+--------+---------+--------+--------+
|         128 | 133 ms  | 209 ms  | 351 ms |         |        |        |
+-------------+---------+---------+--------+---------+--------+--------+
|         256 | 374 ms  | 512 ms  | 808 ms |         |        |        |
+-------------+---------+---------+--------+---------+--------+--------+

hermite basis
+-------------+---------+--------+--------+---------+--------+--------+
|   *       n | 64      | 128    | 256    | 512     | 1024   | 2048   |
|       *     |         |        |        |         |        |        |
|   m       * |         |        |        |         |        |        |
+=============+=========+========+========+=========+========+========+
|           1 |         |        |        | 34.8 ms | 109 ms | 471 ms |
+-------------+---------+--------+--------+---------+--------+--------+
|           4 |         |        |        | 63.1 ms | 191 ms | 566 ms |
+-------------+---------+--------+--------+---------+--------+--------+
|          16 |         |        |        | 169 ms  | 460 ms | 1.47 s |
+-------------+---------+--------+--------+---------+--------+--------+
|          64 | 65.8 ms | 111 ms | 215 ms |         |        |        |
+-------------+---------+--------+--------+---------+--------+--------+
|         128 | 155 ms  | 241 ms | 424 ms |         |        |        |
+-------------+---------+--------+--------+---------+--------+--------+
|         256 | 430 ms  | 598 ms | 953 ms |         |        |        |
+-------------+---------+--------+--------+---------+--------+--------+

uniform profile
+-------------+---------+---------+---------+---------+--------+--------+
|   *       n | 64      | 128     | 256     | 512     | 1024   | 2048   |
|       *     |         |         |         |         |        |        |
|   m       * |         |         |         |         |        |        |
+=============+=========+=========+=========+=========+========+========+
|           1 |         |         |         | 30.9 ms | 135 ms | 511 ms |
+-------------+---------+---------+---------+---------+--------+--------+
|           4 |         |         |         | 51.5 ms | 163 ms | 603 ms |
+-------------+---------+---------+---------+---------+--------+--------+
|          16 |         |         |         | 77 ms   | 200 ms | 650 ms |
+-------------+---------+---------+---------+---------+--------+--------+
|          64 | 18.1 ms | 35.2 ms | 78.9 ms |         |        |        |
+-------------+---------+---------+---------+---------+--------+--------+
|         128 | 30.8 ms | 66.4 ms | 130 ms  |         |        |        |
+-------------+---------+---------+---------+---------+--------+--------+
|         256 | 58.5 ms | 119 ms  | 249 ms  |         |        |        |
+-------------+---------+---------+---------+---------+--------+--------+

hermite profile
+-------------+---------+---------+--------+---------+--------+--------+
|   *       n | 64      | 128     | 256    | 512     | 1024   | 2048   |
|       *     |         |         |        |         |        |        |
|   m       * |         |         |        |         |        |        |
+=============+=========+=========+========+=========+========+========+
|           1 |         |         |        | 31 ms   | 134 ms | 509 ms |
+-------------+---------+---------+--------+---------+--------+--------+
|           4 |         |         |        | 56.1 ms | 187 ms | 724 ms |
+-------------+---------+---------+--------+---------+--------+--------+
|          16 |         |         |        | 130 ms  | 332 ms | 1.27 s |
+-------------+---------+---------+--------+---------+--------+--------+
|          64 | 24.7 ms | 47.5 ms | 101 ms |         |        |        |
+-------------+---------+---------+--------+---------+--------+--------+
|         128 | 41.3 ms | 85.8 ms | 173 ms |         |        |        |
+-------------+---------+---------+--------+---------+--------+--------+
|         256 | 73.7 ms | 148 ms  | 304 ms |         |        |        |
+-------------+---------+---------+--------+---------+--------+--------+

GF(243)

+-----+------------+---------+
|   n | multiply   | gauss   |
+=====+============+=========+
|  16 | 1.28 ms    | 437 μs  |
+-----+------------+---------+
|  32 | 9.73 ms    | 2.88 ms |
+-----+------------+---------+
|  64 | 78.5 ms    | 21 ms   |
+-----+------------+---------+

uniform basis
+-------------+---------+---------+--------+
|   *       n | 16      | 32      | 64     |
|       *     |         |         |        |
|   m       * |         |         |        |
+=============+=========+=========+========+
|           1 | 9.63 ms | 75.8 ms | 652 ms |
+-------------+---------+---------+--------+
|           4 | 12.8 ms | 78.6 ms | 634 ms |
+-------------+---------+---------+--------+
|          16 | 13.1 ms | 66.8 ms | 514 ms |
+-------------+---------+---------+--------+
|          32 | 19.9 ms | 66.1 ms | 457 ms |
+-------------+---------+---------+--------+
|          64 | 39.7 ms | 112 ms  | 440 ms |
+-------------+---------+---------+--------+

hermite basis
+-------------+---------+---------+--------+
|   *       n | 16      | 32      | 64     |
|       *     |         |         |        |
|   m       * |         |         |        |
+=============+=========+=========+========+
|           1 | 11.1 ms | 75.6 ms | 680 ms |
+-------------+---------+---------+--------+
|           4 | 20.7 ms | 103 ms  | 857 ms |
+-------------+---------+---------+--------+
|          16 | 28.4 ms | 152 ms  | 1.3 s  |
+-------------+---------+---------+--------+
|          32 | 35 ms   | 194 ms  | 1.43 s |
+-------------+---------+---------+--------+
|          64 | 57.4 ms | 204 ms  | 1.46 s |
+-------------+---------+---------+--------+

uniform profile
+-------------+---------+---------+--------+
|   *       n | 16      | 32      | 64     |
|       *     |         |         |        |
|   m       * |         |         |        |
+=============+=========+=========+========+
|           1 | 8.61 ms | 80.3 ms | 677 ms |
+-------------+---------+---------+--------+
|           4 | 8.66 ms | 59.9 ms | 539 ms |
+-------------+---------+---------+--------+
|          16 | 5.3 ms  | 40 ms   | 381 ms |
+-------------+---------+---------+--------+
|          32 | 6.69 ms | 26.3 ms | 292 ms |
+-------------+---------+---------+--------+
|          64 | 10 ms   | 34.4 ms | 172 ms |
+-------------+---------+---------+--------+

hermite profile
+-------------+---------+---------+--------+
|   *       n | 16      | 32      | 64     |
|       *     |         |         |        |
|   m       * |         |         |        |
+=============+=========+=========+========+
|           1 | 12.1 ms | 79.1 ms | 496 ms |
+-------------+---------+---------+--------+
|           4 | 15.5 ms | 109 ms  | 956 ms |
+-------------+---------+---------+--------+
|          16 | 17.4 ms | 140 ms  | 1.02 s |
+-------------+---------+---------+--------+
|          32 | 18.7 ms | 161 ms  | 1.26 s |
+-------------+---------+---------+--------+
|          64 | 26 ms   | 165 ms  | 1.18 s |
+-------------+---------+---------+--------+

GF(256)

+-----+------------+---------+
|   n | multiply   | gauss   |
+=====+============+=========+
|  64 | 545 μs     | 480 μs  |
+-----+------------+---------+
| 128 | 1.54 ms    | 1.32 ms |
+-----+------------+---------+
| 256 | 5.51 ms    | 4.34 ms |
+-----+------------+---------+
| 512 | 17.9 ms    | 18.1 ms |
+-----+------------+---------+

uniform basis
+-------------+---------+---------+--------+--------+
|   *       n | 64      | 128     | 256    | 512    |
|       *     |         |         |        |        |
|   m       * |         |         |        |        |
+=============+=========+=========+========+========+
|           1 |         | 36.8 ms | 127 ms | 426 ms |
+-------------+---------+---------+--------+--------+
|           4 |         | 50.1 ms | 154 ms | 625 ms |
+-------------+---------+---------+--------+--------+
|          16 |         | 55.3 ms | 156 ms | 579 ms |
+-------------+---------+---------+--------+--------+
|          64 | 61.3 ms | 115 ms  | 291 ms |        |
+-------------+---------+---------+--------+--------+
|         128 | 139 ms  | 227 ms  | 424 ms |        |
+-------------+---------+---------+--------+--------+
|         256 | 385 ms  | 556 ms  | 869 ms |        |
+-------------+---------+---------+--------+--------+

hermite basis
+-------------+---------+---------+--------+--------+
|   *       n | 64      | 128     | 256    | 512    |
|       *     |         |         |        |        |
|   m       * |         |         |        |        |
+=============+=========+=========+========+========+
|           1 |         | 37.3 ms | 115 ms | 424 ms |
+-------------+---------+---------+--------+--------+
|           4 |         | 62.1 ms | 180 ms | 852 ms |
+-------------+---------+---------+--------+--------+
|          16 |         | 95.3 ms | 241 ms | 1.15 s |
+-------------+---------+---------+--------+--------+
|          64 | 87.6 ms | 170 ms  | 403 ms |        |
+-------------+---------+---------+--------+--------+
|         128 | 177 ms  | 335 ms  | 627 ms |        |
+-------------+---------+---------+--------+--------+
|         256 | 469 ms  | 655 ms  | 1.17 s |        |
+-------------+---------+---------+--------+--------+

uniform profile
+-------------+---------+---------+---------+--------+
|   *       n | 64      | 128     | 256     | 512    |
|       *     |         |         |         |        |
|   m       * |         |         |         |        |
+=============+=========+=========+=========+========+
|           1 |         | 37.6 ms | 124 ms  | 439 ms |
+-------------+---------+---------+---------+--------+
|           4 |         | 35.4 ms | 109 ms  | 439 ms |
+-------------+---------+---------+---------+--------+
|          16 |         | 35.8 ms | 99.8 ms | 406 ms |
+-------------+---------+---------+---------+--------+
|          64 | 20.6 ms | 46.6 ms | 120 ms  |        |
+-------------+---------+---------+---------+--------+
|         128 | 34.6 ms | 71.6 ms | 172 ms  |        |
+-------------+---------+---------+---------+--------+
|         256 | 61.1 ms | 127 ms  | 273 ms  |        |
+-------------+---------+---------+---------+--------+

hermite profile
+-------------+---------+---------+--------+--------+
|   *       n | 64      | 128     | 256    | 512    |
|       *     |         |         |        |        |
|   m       * |         |         |        |        |
+=============+=========+=========+========+========+
|           1 |         | 38.2 ms | 118 ms | 341 ms |
+-------------+---------+---------+--------+--------+
|           4 |         | 49.8 ms | 156 ms | 608 ms |
+-------------+---------+---------+--------+--------+
|          16 |         | 64.5 ms | 189 ms | 985 ms |
+-------------+---------+---------+--------+--------+
|          64 | 40 ms   | 91.5 ms | 289 ms |        |
+-------------+---------+---------+--------+--------+
|         128 | 56.9 ms | 137 ms  | 357 ms |        |
+-------------+---------+---------+--------+--------+
|         256 | 93.4 ms | 201 ms  | 507 ms |        |
+-------------+---------+---------+--------+--------+

GF(257)

+-----+------------+---------+
|   n | multiply   | gauss   |
+=====+============+=========+
|  64 | 146 μs     | 619 μs  |
+-----+------------+---------+
| 128 | 571 μs     | 2.43 ms |
+-----+------------+---------+
| 256 | 2.55 ms    | 6.93 ms |
+-----+------------+---------+
| 512 | 13.1 ms    | 25.5 ms |
+-----+------------+---------+

uniform basis
+-------------+---------+---------+---------+--------+
|   *       n | 64      | 128     | 256     | 512    |
|       *     |         |         |         |        |
|   m       * |         |         |         |        |
+=============+=========+=========+=========+========+
|           1 |         | 20.9 ms | 72.1 ms | 317 ms |
+-------------+---------+---------+---------+--------+
|           4 |         | 29.7 ms | 100 ms  | 378 ms |
+-------------+---------+---------+---------+--------+
|          16 |         | 47.7 ms | 126 ms  | 425 ms |
+-------------+---------+---------+---------+--------+
|          64 | 66.6 ms | 117 ms  | 242 ms  |        |
+-------------+---------+---------+---------+--------+
|         128 | 140 ms  | 230 ms  | 429 ms  |        |
+-------------+---------+---------+---------+--------+
|         256 | 419 ms  | 545 ms  | 892 ms  |        |
+-------------+---------+---------+---------+--------+

hermite basis
+-------------+---------+---------+---------+--------+
|   *       n | 64      | 128     | 256     | 512    |
|       *     |         |         |         |        |
|   m       * |         |         |         |        |
+=============+=========+=========+=========+========+
|           1 |         | 21.8 ms | 72.1 ms | 317 ms |
+-------------+---------+---------+---------+--------+
|           4 |         | 29.4 ms | 106 ms  | 367 ms |
+-------------+---------+---------+---------+--------+
|          16 |         | 78.1 ms | 188 ms  | 780 ms |
+-------------+---------+---------+---------+--------+
|          64 | 81.2 ms | 165 ms  | 367 ms  |        |
+-------------+---------+---------+---------+--------+
|         128 | 166 ms  | 300 ms  | 605 ms  |        |
+-------------+---------+---------+---------+--------+
|         256 | 457 ms  | 659 ms  | 1.17 s  |        |
+-------------+---------+---------+---------+--------+

uniform profile
+-------------+---------+---------+---------+--------+
|   *       n | 64      | 128     | 256     | 512    |
|       *     |         |         |         |        |
|   m       * |         |         |         |        |
+=============+=========+=========+=========+========+
|           1 |         | 14.3 ms | 46.6 ms | 254 ms |
+-------------+---------+---------+---------+--------+
|           4 |         | 20.2 ms | 64 ms   | 262 ms |
+-------------+---------+---------+---------+--------+
|          16 |         | 24.6 ms | 70.2 ms | 253 ms |
+-------------+---------+---------+---------+--------+
|          64 | 19 ms   | 43.8 ms | 105 ms  |        |
+-------------+---------+---------+---------+--------+
|         128 | 33 ms   | 71 ms   | 158 ms  |        |
+-------------+---------+---------+---------+--------+
|         256 | 59.7 ms | 126 ms  | 267 ms  |        |
+-------------+---------+---------+---------+--------+

hermite profile
+-------------+---------+---------+---------+--------+
|   *       n | 64      | 128     | 256     | 512    |
|       *     |         |         |         |        |
|   m       * |         |         |         |        |
+=============+=========+=========+=========+========+
|           1 |         | 19 ms   | 62.3 ms | 197 ms |
+-------------+---------+---------+---------+--------+
|           4 |         | 33.4 ms | 99.8 ms | 338 ms |
+-------------+---------+---------+---------+--------+
|          16 |         | 47.6 ms | 148 ms  | 589 ms |
+-------------+---------+---------+---------+--------+
|          64 | 32.7 ms | 87.2 ms | 238 ms  |        |
+-------------+---------+---------+---------+--------+
|         128 | 49.1 ms | 127 ms  | 301 ms  |        |
+-------------+---------+---------+---------+--------+
|         256 | 80.8 ms | 190 ms  | 486 ms  |        |
+-------------+---------+---------+---------+--------+

GF(59049)

+-----+------------+---------+
|   n | multiply   | gauss   |
+=====+============+=========+
|   8 | 339 μs     | 195 μs  |
+-----+------------+---------+
|  16 | 2.97 ms    | 1.25 ms |
+-----+------------+---------+
|  32 | 23.1 ms    | 8.95 ms |
+-----+------------+---------+

uniform basis
+-------------+---------+---------+--------+
|   *       n | 8       | 16      | 32     |
|       *     |         |         |        |
|   m       * |         |         |        |
+=============+=========+=========+========+
|           1 | 4.08 ms | 23.6 ms | 190 ms |
+-------------+---------+---------+--------+
|           4 | 5.2 ms  | 25.9 ms | 197 ms |
+-------------+---------+---------+--------+
|           8 | 5.16 ms | 24.3 ms | 179 ms |
+-------------+---------+---------+--------+
|          16 | 7.21 ms | 23.9 ms | 164 ms |
+-------------+---------+---------+--------+
|          32 | 12.8 ms | 34.1 ms | 182 ms |
+-------------+---------+---------+--------+

hermite basis
+-------------+---------+---------+--------+
|   *       n | 8       | 16      | 32     |
|       *     |         |         |        |
|   m       * |         |         |        |
+=============+=========+=========+========+
|           1 | 4.03 ms | 23.7 ms | 191 ms |
+-------------+---------+---------+--------+
|           4 | 7.96 ms | 42.7 ms | 291 ms |
+-------------+---------+---------+--------+
|           8 | 8.16 ms | 39.4 ms | 345 ms |
+-------------+---------+---------+--------+
|          16 | 13 ms   | 49.9 ms | 413 ms |
+-------------+---------+---------+--------+
|          32 | 17.9 ms | 58.8 ms | 490 ms |
+-------------+---------+---------+--------+

uniform profile
+-------------+---------+---------+---------+
|   *       n | 8       | 16      | 32      |
|       *     |         |         |         |
|   m       * |         |         |         |
+=============+=========+=========+=========+
|           1 | 4.43 ms | 16 ms   | 188 ms  |
+-------------+---------+---------+---------+
|           4 | 3.13 ms | 18.9 ms | 152 ms  |
+-------------+---------+---------+---------+
|           8 | 2.5 ms  | 14.2 ms | 132 ms  |
+-------------+---------+---------+---------+
|          16 | 2.78 ms | 9.87 ms | 102 ms  |
+-------------+---------+---------+---------+
|          32 | 3.73 ms | 12.5 ms | 63.4 ms |
+-------------+---------+---------+---------+

hermite profile
+-------------+---------+---------+--------+
|   *       n | 8       | 16      | 32     |
|       *     |         |         |        |
|   m       * |         |         |        |
+=============+=========+=========+========+
|           1 | 4.48 ms | 24.1 ms | 132 ms |
+-------------+---------+---------+--------+
|           4 | 6.67 ms | 27.8 ms | 288 ms |
+-------------+---------+---------+--------+
|           8 | 7.38 ms | 40.5 ms | 305 ms |
+-------------+---------+---------+--------+
|          16 | 7.59 ms | 49.6 ms | 377 ms |
+-------------+---------+---------+--------+
|          32 | 8.82 ms | 47.5 ms | 406 ms |
+-------------+---------+---------+--------+

GF(65536)

+-----+------------+---------+
|   n | multiply   | gauss   |
+=====+============+=========+
|   8 | 152 μs     | 138 μs  |
+-----+------------+---------+
|  16 | 12.2 ms    | 10.6 ms |
+-----+------------+---------+
|  32 | 35.6 ms    | 27.1 ms |
+-----+------------+---------+

uniform basis
+-------------+---------+--------+--------+
|   *       n | 8       | 16     | 32     |
|       *     |         |        |        |
|   m       * |         |        |        |
+=============+=========+========+========+
|           1 | 13.1 ms | 121 ms | 386 ms |
+-------------+---------+--------+--------+
|           4 | 14 ms   | 138 ms | 409 ms |
+-------------+---------+--------+--------+
|           8 | 12.9 ms | 105 ms | 387 ms |
+-------------+---------+--------+--------+
|          16 | 20.1 ms | 127 ms | 427 ms |
+-------------+---------+--------+--------+
|          32 | 24.8 ms | 134 ms | 336 ms |
+-------------+---------+--------+--------+

hermite basis
+-------------+---------+--------+--------+
|   *       n | 8       | 16     | 32     |
|       *     |         |        |        |
|   m       * |         |        |        |
+=============+=========+========+========+
|           1 | 13.4 ms | 123 ms | 376 ms |
+-------------+---------+--------+--------+
|           4 | 17.8 ms | 210 ms | 476 ms |
+-------------+---------+--------+--------+
|           8 | 16.2 ms | 245 ms | 606 ms |
+-------------+---------+--------+--------+
|          16 | 23.3 ms | 292 ms | 877 ms |
+-------------+---------+--------+--------+
|          32 | 28.5 ms | 258 ms | 928 ms |
+-------------+---------+--------+--------+

uniform profile
+-------------+---------+---------+--------+
|   *       n | 8       | 16      | 32     |
|       *     |         |         |        |
|   m       * |         |         |        |
+=============+=========+=========+========+
|           1 | 6.86 ms | 69.9 ms | 378 ms |
+-------------+---------+---------+--------+
|           4 | 4.19 ms | 90.4 ms | 302 ms |
+-------------+---------+---------+--------+
|           8 | 3.07 ms | 76.7 ms | 245 ms |
+-------------+---------+---------+--------+
|          16 | 3.27 ms | 52.9 ms | 217 ms |
+-------------+---------+---------+--------+
|          32 | 3.98 ms | 51.8 ms | 132 ms |
+-------------+---------+---------+--------+

hermite profile
+-------------+---------+--------+--------+
|   *       n | 8       | 16     | 32     |
|       *     |         |        |        |
|   m       * |         |        |        |
+=============+=========+========+========+
|           1 | 7.12 ms | 77 ms  | 367 ms |
+-------------+---------+--------+--------+
|           4 | 7.55 ms | 171 ms | 363 ms |
+-------------+---------+--------+--------+
|           8 | 6.09 ms | 173 ms | 492 ms |
+-------------+---------+--------+--------+
|          16 | 6.4 ms  | 180 ms | 724 ms |
+-------------+---------+--------+--------+
|          32 | 7.13 ms | 219 ms | 642 ms |
+-------------+---------+--------+--------+

GF(65537)

+-----+------------+---------+
|   n | multiply   | gauss   |
+=====+============+=========+
|  64 | 159 μs     | 181 μs  |
+-----+------------+---------+
| 128 | 682 μs     | 586 μs  |
+-----+------------+---------+
| 256 | 3.43 ms    | 3.36 ms |
+-----+------------+---------+
| 512 | 19.7 ms    | 18.9 ms |
+-----+------------+---------+

uniform basis
+-------------+---------+---------+---------+--------+
|   *       n | 64      | 128     | 256     | 512    |
|       *     |         |         |         |        |
|   m       * |         |         |         |        |
+=============+=========+=========+=========+========+
|           1 |         | 19.8 ms | 80.1 ms | 405 ms |
+-------------+---------+---------+---------+--------+
|           4 |         | 25.3 ms | 98.8 ms | 436 ms |
+-------------+---------+---------+---------+--------+
|          16 |         | 42.2 ms | 130 ms  | 513 ms |
+-------------+---------+---------+---------+--------+
|          64 | 62.6 ms | 114 ms  | 257 ms  |        |
+-------------+---------+---------+---------+--------+
|         128 | 145 ms  | 236 ms  | 451 ms  |        |
+-------------+---------+---------+---------+--------+
|         256 | 411 ms  | 566 ms  | 952 ms  |        |
+-------------+---------+---------+---------+--------+

hermite basis
+-------------+---------+---------+---------+--------+
|   *       n | 64      | 128     | 256     | 512    |
|       *     |         |         |         |        |
|   m       * |         |         |         |        |
+=============+=========+=========+=========+========+
|           1 |         | 18.6 ms | 80.1 ms | 400 ms |
+-------------+---------+---------+---------+--------+
|           4 |         | 24.3 ms | 98.7 ms | 536 ms |
+-------------+---------+---------+---------+--------+
|          16 |         | 55.5 ms | 184 ms  | 873 ms |
+-------------+---------+---------+---------+--------+
|          64 | 75.8 ms | 147 ms  | 374 ms  |        |
+-------------+---------+---------+---------+--------+
|         128 | 172 ms  | 283 ms  | 619 ms  |        |
+-------------+---------+---------+---------+--------+
|         256 | 471 ms  | 661 ms  | 1.17 s  |        |
+-------------+---------+---------+---------+--------+

uniform profile
+-------------+---------+---------+---------+--------+
|   *       n | 64      | 128     | 256     | 512    |
|       *     |         |         |         |        |
|   m       * |         |         |         |        |
+=============+=========+=========+=========+========+
|           1 |         | 14.9 ms | 43.6 ms | 235 ms |
+-------------+---------+---------+---------+--------+
|           4 |         | 15 ms   | 57.6 ms | 291 ms |
+-------------+---------+---------+---------+--------+
|          16 |         | 18.8 ms | 60.5 ms | 276 ms |
+-------------+---------+---------+---------+--------+
|          64 | 17.5 ms | 37.1 ms | 93.8 ms |        |
+-------------+---------+---------+---------+--------+
|         128 | 30.8 ms | 65.4 ms | 148 ms  |        |
+-------------+---------+---------+---------+--------+
|         256 | 58.3 ms | 122 ms  | 261 ms  |        |
+-------------+---------+---------+---------+--------+

hermite profile
+-------------+---------+---------+---------+--------+
|   *       n | 64      | 128     | 256     | 512    |
|       *     |         |         |         |        |
|   m       * |         |         |         |        |
+=============+=========+=========+=========+========+
|           1 |         | 11.7 ms | 44 ms   | 235 ms |
+-------------+---------+---------+---------+--------+
|           4 |         | 21.5 ms | 84.3 ms | 404 ms |
+-------------+---------+---------+---------+--------+
|          16 |         | 33.1 ms | 127 ms  | 585 ms |
+-------------+---------+---------+---------+--------+
|          64 | 27.3 ms | 61.9 ms | 203 ms  |        |
+-------------+---------+---------+---------+--------+
|         128 | 42.1 ms | 97.4 ms | 283 ms  |        |
+-------------+---------+---------+---------+--------+
|         256 | 79.1 ms | 165 ms  | 424 ms  |        |
+-------------+---------+---------+---------+--------+

GF(67108879)

+-----+------------+---------+
|   n | multiply   | gauss   |
+=====+============+=========+
|  32 | 80.9 μs    | 210 μs  |
+-----+------------+---------+
|  64 | 450 μs     | 1 ms    |
+-----+------------+---------+
| 128 | 3.35 ms    | 6.72 ms |
+-----+------------+---------+

uniform basis
+-------------+---------+---------+---------+
|   *       n | 32      | 64      | 128     |
|       *     |         |         |         |
|   m       * |         |         |         |
+=============+=========+=========+=========+
|           1 | 3.69 ms | 11 ms   | 56.6 ms |
+-------------+---------+---------+---------+
|           4 | 4.79 ms | 16.3 ms | 116 ms  |
+-------------+---------+---------+---------+
|          16 | 10.2 ms | 24 ms   | 95.2 ms |
+-------------+---------+---------+---------+
|          32 | 8.11 ms | 16.5 ms | 114 ms  |
+-------------+---------+---------+---------+
|          64 | 38.2 ms | 64.7 ms | 159 ms  |
+-------------+---------+---------+---------+
|         128 | 101 ms  | 148 ms  | 277 ms  |
+-------------+---------+---------+---------+

hermite basis
+-------------+---------+---------+---------+
|   *       n | 32      | 64      | 128     |
|       *     |         |         |         |
|   m       * |         |         |         |
+=============+=========+=========+=========+
|           1 | 3.68 ms | 11.1 ms | 60.3 ms |
+-------------+---------+---------+---------+
|           4 | 5.04 ms | 16 ms   | 81.1 ms |
+-------------+---------+---------+---------+
|          16 | 15 ms   | 34.5 ms | 179 ms  |
+-------------+---------+---------+---------+
|          32 | 10.9 ms | 56.1 ms | 239 ms  |
+-------------+---------+---------+---------+
|          64 | 47.1 ms | 88.3 ms | 308 ms  |
+-------------+---------+---------+---------+
|         128 | 120 ms  | 182 ms  | 459 ms  |
+-------------+---------+---------+---------+

uniform profile
+-------------+---------+---------+---------+
|   *       n | 32      | 64      | 128     |
|       *     |         |         |         |
|   m       * |         |         |         |
+=============+=========+=========+=========+
|           1 | 3.32 ms | 4.42 ms | 49.3 ms |
+-------------+---------+---------+---------+
|           4 | 3.74 ms | 10.4 ms | 56 ms   |
+-------------+---------+---------+---------+
|          16 | 4.27 ms | 11.4 ms | 55.8 ms |
+-------------+---------+---------+---------+
|          32 | 5.65 ms | 14.1 ms | 26.3 ms |
+-------------+---------+---------+---------+
|          64 | 9.02 ms | 19.7 ms | 67.5 ms |
+-------------+---------+---------+---------+
|         128 | 15.8 ms | 34.1 ms | 90 ms   |
+-------------+---------+---------+---------+

hermite profile
+-------------+---------+---------+---------+
|   *       n | 32      | 64      | 128     |
|       *     |         |         |         |
|   m       * |         |         |         |
+=============+=========+=========+=========+
|           1 | 1.31 ms | 3.72 ms | 49.2 ms |
+-------------+---------+---------+---------+
|           4 | 5.99 ms | 16 ms   | 80.2 ms |
+-------------+---------+---------+---------+
|          16 | 8.46 ms | 24.8 ms | 147 ms  |
+-------------+---------+---------+---------+
|          32 | 10.9 ms | 22.4 ms | 77.4 ms |
+-------------+---------+---------+---------+
|          64 | 14.5 ms | 42.7 ms | 223 ms  |
+-------------+---------+---------+---------+
|         128 | 22.6 ms | 57 ms   | 256 ms  |
+-------------+---------+---------+---------+
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment