Monday, October 4, 2010

Standalone C/C++ libraries that are included in Sage

Every copy of Sage includes many free open source C/C++ libraries, which can also be used standalone without Sage. They got into Sage because enough people really cared about using them, and felt it was worth the effort to convince other people that these should be included in Sage. Thus every single library also provides some key functionality to Sage. Thus understanding the main points about these libraries should be of interest to anybody who knows C/C++ who wants to do mathematical computation. Moreover, most of these libraries were written in C/C++ because the authors had already mastered the underlying algorithms, and really wanted an implementation that was extremely fast, often potentially orders of magnitude faster than what they might implement using only an interpreter, such as Python, Maple or Magma.

It's also possible using Cython (or even ctypes) to directly call any of the functionality of the libraries below from Python (hence Sage). Thus it is vital that you know about these libraries if you want to write fast code using the Sage framework.

When you want to look at the source code for any of these libraries, you can download the Sage source code from here or here, unpack it, look in the directory sage-x.y.z/spkg/standard/, and for any "spkg" (=tar, bzip2) file there, unpack it, e.g.,

wstein@sage:$ tar jxvf mpir-1.2.2.p1.spkg
wstein@sage:$ cd mpir-1.2.2.p1/src/
wstein@sage:$ ls
acinclude.m4     doc               missing        
...

Note that you can't unpack the spkg files included in a binary of Sage, since they are actually empty placeholders.

You can also find direct links to all of the standard spkg by going to this page, along with short descriptions of each.


Arithmetic

  • mpir (written in C and assembler; has a C++ interface): this is a library for doing fast arithmetic with arbitrary precision integers and rationals. It's very fast and cross-platform, and it's used by many other libraries. (NOTE: MPIR was started as an angry fork" of GMP=Gnu Multiprecision Library. GMP is used by Magma, Maple, and Mathematica; I've been told Mathematica doesn't use MPIR because of the overlap of Sage and MPIR developers, and their general suspicion toward the Sage project.)
  • mpfr (C library with C++ interface): this is a library for doing arithmetic with fixed precision floating point numbers, i.e., decimals with a given fixed choice of precision. "Floating-point numbers are a lot like sandpiles: Every time you move one you lose a little sand and pick up a little dirt." [Kernighan and Plauger]. MPFR is used by Magma (for details, google for "mpfr magma"), and several other libraries and systems out there. The number theory system and library PARI does *NOT* use MPFR; since PARI also does arbitrary precision floating point arithmetic, you see that PARI includes alternative implementations of some of the same algorithms (note: PARI also has its own alternative to MPIR). MPFR is unusual in that the main author -- Paul Zimmerman -- is unusually careful for somebody who works with floating point numbers, and my understanding is that all the algorithms in MPFR are proven to give the claimed precision. Basically, MPFR generalizes IEEE standards (that define what floating point arithmetic means) to arbitrary precision. MPFR also supports numerous "rounding modes". The mpmath Python library is in some cases faster than MPFR, but it has very different standards of rigor.
  • mpfi (C library): mpfi is built on top of MPFR, and implements arbitrary precision floating point "interval arithmetic". Interval arithmetic is an idea where you do arithmetic on intervals [a,b],
    and apply functions to them. Given two intervals, their sum, product, quotient, etc., is another interval. The actual "feel" of using this is that you're working with floating point numbers, and as you lose precision due to rounding errors, the computer keeps track of the loss every step of the way... and in some cases you can quickly discover you have no bits left at all! MPFI is nicely integrated into Sage (by Carl Witty), and is a pretty sophisticated and robust implementation of interval arithmetic, on which some other things in Sage are built, such as the field QQbar of all algebraic numbers.
  • flint (C library): "Flint" is an abbreviation for "Fast Library for Number Theory". However, I've listed it here under basic arithmetic, since so far in Sage at least we only use FLINT for arithmetic with polynomials over Z[x] and (Z/nZ)[x]. (Sage should use it for Q[x], but still doesn't; see the horendously painful trac ticket, number 4000!) The main initial challenge and reason for existence of flint was to multiply polynomials in Z[x] more quickly than Magma (hence any other software on the planet), and FLINT succeeded. Now FLINT also does many other relevant polynomial algorithms, e.g., GCD, addition, etc. Most new development is going into "FLINT2", which is a total rewrite of FLINT, with an incompatible C interface. Bill Hart's vision is that eventually FLINT2 have capabilities sort of like PARI. Sage users have found at least one serious bug involving f,g in Z[x] where f*g was computed incorrectly.
  • znpoly (C libary): (by David Harvey) one version of znpoly is included in FLINT, and another is a standalone package in Sage. The point of this library is to do fast arithmetic in (Z/nZ)[x]. ("Monsky-Washnitzer!" we once found a very subtle bug in znpoly-in-FLINT that only appeared when multiplying polynomials of huge degree.)
  • givaro (C++ library): in Sage we use it entirely for fast arithmetic over small-cardinality finite fields, up to cardinality "2 to the power 16". Givaro makes a log table and does all multiplications as additions of small ints. Givaro actually does a lot more, e.g., arithmetic with algebraic numbers, and a bunch of matrix algorithms, including Block Wiedemann, which we don't even try to make available in Sage.

NOTE: The C library API conventions of mpir (gmp), mpfr, mpfi, flint and zn_poly are all pretty similar. Once you learn one, the others are comfortable to use. Givaro is totally different than the others because it is a C++ library, and hence has operator overloading available. The upshot: in Givaro, you type "c=a+b" to add two numbers, but in the other libaries (when used from C) you type, maybe "mpz_add(c,a,b)".


Number Theory


  • pari (C library; interpreter): a hugely powerful C library (and interactive interpreter) for number theory. This library is very, very good and fast for doing computations of many functions relevant to number theory, of "class groups of number fields", and for certain computations with elliptic curves. It also has functions for univariate polynomial factorization and arithmetic over number fields and finite fields. PARI has a barbaric stack-based memory scheme, which can be quite annoying. Key Point: PARI is easy to build on my iPhone, which says something. That said, we have Neil Sloane, 2007: ``I would like to thank everyone who responded to my question about installing PARI on an iMAC. The consensus was that it would be simplest to install Sage, which includes PARI and many other things. I tried this and it worked! Thanks! Neil (It is such a shock when things actually work!!)'' PARI is very tightly integrated into Sage.
  • ntl (C++ library): a C++ library aimed at providing foundations for implementing number theory algorithms. It provides C++ classes for polynomials, vectors, and matrices over ZZ, Z/pZ, GF(q), (Z/pZ)[x], and RR, and algorithms for each. Some algorithms, e.g., for polynomial factorization in ZZ[x], are very sophisticated. Others, e.g., Hermite form of matrices over ZZ, are naive. In fact, most of the matrix algebra in NTL is naive, at least compared to what is in IML, Linbox, Sage and Magma. NTL is very stable and considered "done" by its author.
  • eclib (C++, standalone program): This is John Cremona's prized collection of C++ code that he wrote for computing elliptic curves, both finding them (using "modular symbols") and computing tricky things about them (via the "mwrank" program). It's a large amount of C++ code that Cremona developed over nearly 2 decades. There is still much functionality in there that could be made more readily available in Sage via Cython, but for which nobody has put in the effort to do so (I'm sure Cremona would be very helpful in providing guidance).
  • lcalc (C++ library, standalone program): This is Mike Rubinstein's C++ library for computing with "motivic L-functions", which are generalizations of the Riemann zeta function, of great importance in number theory. It's particularly good at computing tons of zeros of such functions in their critical strip, hence getting information about generalizations of the Riemann Hypothesis. It's the best general purpose L-functions program for computing zeros.
  • sympow (standalone C program): Though technically not a library it could be adapted into one. Sympow is a C program written by Mark Watkins for quickly computing special values of symmetric power L-functions (so related to what lcalc does). It has no dependencies (instead of PARI), because Mark didn't want to have to license sympow under the GPL.
  • ratpoints (C library): Michael Stoll's highly optimized C program for searching for certain rational points on hyperelliptic curves (i.e. of the form y^2 = f(x)). This library goes back to the early 1990s and work of Elkies. I believe this is in Sage for one reason only: Robert Miller wanted to implement 2-descent, and he needed this. (Unfortunately, Miller's implementation is only done in the case when there is a rational 2-torsion point.) This library could be made more useful in Sage, via making it so functionality in sage.libs.ratpoints is easily available for hyperelliptic curves...
  • genus2reduction (standalone C program that uses PARI): Not technically a library. This a C program that depends on PARI, and quickly computes "stuff" about genus 2 curves over the rational numbers... which as far as I know is fairly unique in what it does. A. Brumer used it recently for a large scale computation, and pointed out that it has stability issues.
  • flintqs (C++ program): This is a standalone C++ program that implements the quadratic sieve algorithm for integer factorization (which is good at factoring p*q with p and q of similar size and p*q up to maybe 100 digits). I think that this algorithm is also implemented in PARI, though in some cases flintqs is faster. A silly fact is that a better version of this program is evidently in FLINT itself, but for some reason Sage doesn't use it.
  • ecm (C library and C program; uses MPFR, MPIR): This implements the elliptic curve integer factorization algorithm, which Hendrik Lenstra invented, and is good at factoring p*n with p a prime with up to 30 (?) or so digits. There is another highly optimized implementation of the same algorithm as part of PARI, but ECM is in some (all?) cases better. It also has a huge number of tunable parameters. The ECM program is also often run standalone by people, sometimes on clusters, to attempt to factor integers. NOTE: Sage has a "factor" command, which just calls PARI. One could imagine it calling ECM and flintqs, etc., but nobody has made it do so... though many have tried.

Linear Algebra


  • IML (C library): "Integer matrix library". This library solves A*X = B, with A and B matrices over the integers. It does this by solving "A*X = B (mod p^n)" for several prime powers p^n, which involves computing the reduced row echelon form of A mod p (for various p) quickly... which is in turn done via matrix multiplication over matrices with double precision entries. It is the fastest software in the world for solving AX=B in many cases, especially as the entries of A and B get huge.
  • Linbox (C++ library): asymptotically fast black box linear algebra algorithms. It was just a "research sandbox", but "we" (mainly Clement Pernet) got Linbox whipped into shape enough to be of use in Sage for certain things. It is used at least for: matrix multiplication and computation of characteristic polynomials of matrices over the integers and finite fields, and asymptotically fast echelon form of matrices over finite field. I don't think it is used for much else yet. Implementing fast versions of the above in general is a gargantuan task, and we had done much of it (but no good charpoly!) natively in Sage before using Linbox, so now multiple implementations are available -- Linbox's are usually faster. Until Linbox/Sage, there were no fast open source programs for asymptoically fast exact linear algebra -- Magma was the only game in town (neither Maple or Mathematica are good at this either).
  • m4ri (C library): "Mary" -- scary fast arithmetic with matrices over the finite field GF(2) with 2 elements. This library uses various tricks (involving so-called "Gray codes") and asymptotically fast divide-and-conquer algorithms to compute reduced row echelon form and do matrix multiplication with matrices over GF(2). It is memory efficient and is the world's fastest at many benchmarks. There is related work (on "m4ri(e)") to do arithmetic over GF(2^n) and also some work by Boothby and Bradshaw on GF(p^n) for small p and n.
  • libfplll (C++ library): floating point LLL. The whole point of fpLLL is to implement very fast algorithms for computing LLL reduced basis for lattices, possibly with rounding error if you are not careful. LLL gets used as a key algorithm in many other algorithms (e.g., polynomial factorization). Magma also uses some variant of libfplll, and an older rewritten version of fpLLL is in PARI. If you look, e.g., at Mathematica, it does have some sort of limited LLL implementation, but it is nowhere near as sophisticated as what is available overall in Sage... NTL and PARI also have their own interfaces to LLL, which can have various pros and cons over using fpLLL directly. (In Sage, you can make a matrix over the integers and call the LLL or LLL_gram methods on it.) Magma also uses fpLLL.
  • polybori (C++ library): commutative algebra with boolean rings, which are characteristic 2 rings such that x^2 = x for every element of the ring. Evidently, polybori is important in cryptography and circuit design (?). I have personally never succeeded in really understanding how or seeing for myself polybori do well on benchmarks. Polybori is why the boost-cropped package is in Sage, which provides some C++ datastructure and algorithms.

Combinatorics


  • symmetrica (C++ library): a C++ library for doing very fast arithmetic with symmetric functions. The combinatorics code in Sage builds on this.
  • cliquer (C library): for finding cliques (=maximal complete subgraphs) in a graph.


Geometry


  • gfan (C++ program): "groebner fans" -- for computing a geometric object that parametrizes all Groebner basis for an ideal.
  • cddlib (C library): for computing with convex polytopes that gfan requires, using the "double description" method (whatever that is). There is code in sage.geometry.polyhedra that also uses cddlib.
  • glpk (C library): the GNU linear programming toolkit. This is a good free open source program for "linear programming". There are also commercial programs (e.g., CPLEX) for linear programming, and whenever people talk about linear programming software they always point out that some commercial programs are way, way faster than glpk. But glpk is in Sage.
  • palp (C program): computes stuff about lattice polytopes, like all the integers points in one. There is a big Sage package that builds on top of this. Evidently, this is of substantial interest to some physicists.

Numerical applied math

  • gsl (C library): implements core algorithms of scientific computing, is very stable and considered "done" by its authors. New relevant code goes into other libraries that use GSL.

Non mathematics


  • readline: provides command line editing, which is used by IPython, PARI, Singular, etc. Often misbuilt or not available on various platforms, so included in Sage. Readline is perhaps famous for being a key reason that both PARI and clisp are licenced under the GPL.
  • bzip2: bzip2 compression; easily usable from Sage via "import bz2". NOTE: this is used by the Sage build system so it is in SAGE_ROOT/spkg/base instead of SAGE_ROOT/spkg/standard/.
  • zlib: gzip-compatible compression; makes it so you can easily "gzip" files, and even work with gzip'd files in memory from Python.