The image shows call graph output of valgrind --callgrind tool for 100 digits prime:
So, we can see that to improve the speed of algorithm we must reduce the number of multiplications (which are well optimized in flint).
Currently I have a very naive implementation of selecting $s$ and $R$ values (just trying increase $R = 2, 4, 6...$ until the proper $s$ is found). It can takes a lot of time and choose non-optimal parameters. For example, if we set $R = 98280$ it proves 200 digits number primality for ~8 seconds instead of 30. By $R = 166320$ we can prove 300 digits number in ~30 seconds. So I need to rewrite aprcl_config_init() function.
The next thing to do is replace the condition $s^2 > n$ to $s^3 > n$. Using "Divisors in Residue Classes" we can determine n divisors. It was implemented in flint before, so I need only add this step in Final division step.
Also I want to replace my basic powering algorithm in $\mathbb{Z}[\zeta] / (n)$ by $2^k$-ary method.
And finally we can precompute and store Jacobi sum in configuration structure (for 300 digits number computation of Jacobi sums takes ~40% of time) for testing numbers with a similar length.
This week I want to add this modifications and look in more detail the other opensource implementations.
No comments:
Post a Comment