Sunday, October 31, 2010

How to Referee Sage Trac Tickets

The current workflow and tools for peer review in Sage are not perfect, and it's clear they can be improved in many ways. However, the current system is also stable and works well, and though it will surely evolve over time, it is well worth learning. This post is about the current Sage peer review workflow. I will restrict my discussion to tickets that involve the Sage library, not standard or optional packages or other scripts not in the Sage library.

Why Referee a Ticket?

There are a couple of reasons that you might referee a ticket:

  1. You want to contribute to Sage development, but don't know where to start. There's a list right here of about 200 tickets (sorted by component) that need review, and probably some require only background you know something about.

  2. You want a specific chunk of code to get into Sage that somebody else wrote. If you care about this code, you are probably qualified to review it.

  3. Somebody asked you to review a certain ticket, perhaps in exchange for them reviewing one of your tickets.
Refereeing trac tickets is different than refereeing papers for inclusion in a journal in several ways. There is no central editor that assigns tickets to reviewers -- anybody (you, right now!) can just jump in and start refereeing tickets. Also, everything about refereeing is completely open and archived (hopefully) forever.

What To Expect

The code attached to a trac ticket can do all kinds of things, including:
  1. Fix a little (or huge) bug.

  2. Add documentation to a function.

  3. Introduce a new feature to Sage (and possibly new bugs!).

  4. Change the name of an existing function.

  5. Raise (or lower) the "doctest coverage" of Sage.

How to Referee a Ticket

Refereeing a ticket involves several steps. The steps below are mainly aimed at tickets that add new capabilities to Sage, but I've included some remarks about other types of tickets in another section below.
  1. Use Mercurial queues to apply all the patches on the trac ticket to your own (very recent, usually alpha) version of Sage, then type "sage -br" to rebuild the Sage library. You download
    each foo.patch, then do "hg qimport foo.patch" followed by "hg qpush" (I personally use this little script). Note: this can be automated; type "sage -merge" to learn how. If you can't apply the patch, then the ticket may "need work", i.e., the author has to rebase the patches; note this and change the Action at the bottom of the ticket page to "needs work".

  2. Test out the code to make sure the doctests work, using "sage -t affected_files" and "make ptest" in the SAGE_ROOT directory. Doing "make ptest" is a good idea, since often a change in one part of Sage breaks something far, far away that was written by somebody else long, long ago. (Again, see "sage -merge" for automating this.) If any tests fail, the ticket probably "needs work"; note this and change the action to "needs work". I often do this testing on sage.math.washington.edu, since it has 24 processors.

  3. Make sure that every function introduced in the code, even ones that start with underscores (!), have examples that fully illustrate how they work. Make sure that all input parameters to these functions are tested in the examples, and ensure that the function is actually being used by the examples. If the patch is large you can do "sage --coverageall" both before and after applying the patches, and make sure the coverage score doesn't go down. Also, make sure the docstrings are laid out according to our standard template, e.g.,
    """
    Short description... (one sentence usually)
    
    Long description... (optional but sometimes good)
    
    INPUT:
        - param1 -- description of
          the first param
        - param2 -- description of
          the second param
    OUTPUT:
        - description of output
    
    AUTHORS: 
        - Sage Developer1
        - Sage Developer2
    
    EXAMPLES::
    
        sage: 2+2
        4
    
    TESTS::
    
        sage: further examples nobody should look at.
    """
    
    It is essential that there be two colons after "EXAMPLES" and a blank line!! Also, do not omit the INPUT/OUTPUT blocks, since I often get complaints from end users that these are missing.

  4. If you've got this far without marking the ticket "needs work", then every function in the new code should work and have solid docstrings, including INPUT/OUTPUT blocks and examples that illustrate how the function works. Much of what you did in 1-3 is pretty routine (and yes, there are people working on trying to automate some of this). In this step, you finally get to have more fun and be creative: think seriously about whether the code is "probably correct" and sufficiently well documented to read. If not, do not give the code a positive review. The Sage library is already huge, and we don't want new code that is just going to cause a lot of pain to maintain later (there are other places for such code, such as Purple Sage). Basically, at this point, pretend you are a playtester or hacker and try to break the functionality introduced by the trac ticket. Some techniques include:

    • Throw random input at the functions and see what happens.
    • If you know any theorems or conjectures that leads to consistency checks on the functions, try them out. For example, if a function factors something as a product, try factoring products, then multiplying the factors back together. If the function computes the Ramanujan tau function, try it on large inputs and test the standard congruences.
    • If the functions are implemented in other software (e.g., Magma or Mathematica) write a little program to systematically compare the output of Sage and the other system for some range of values. Also, compare timings; if code is 100 times slower in Sage than Magma, there better be a good reason for this.
    • Very often when reading code, your "devious mind" will think of ways to potentially break it. Make sure that you have a sage prompt up, so you can try out your ideas for breaking the code. If anything succeeds at breaking the code, put that in your report and set the trac Action to "Needs work". Also, if you think of good doctests to add, note these.

Special Cases

As mentioned above, most of the steps above assume that the code attached to the trac ticket is implementing new features.
  • If the code fixes a bug, then if at all possible, there should be a new doctest that illustrates that the bug is fixed. This doctest should mention the relevant trac ticket.

  • If a trac ticket mainly adds new documentation, pay special attention to the English grammar, writing, etc. In my experience, almost nobody (definitely not me!) ever writes a few paragraphs of English without making several mistakes, typos, etc.

  • If the code changes the name of an existing function, then this must happen in accordance with our deprecation policy. You have to leave the old function name in place and add a deprecation warning. This can be removed after ONE YEAR. The proper way to do this is to put the following at the top of the function:
    from sage.misc.misc import deprecation
    deprecation("function_name is deprecated...")
    
    Also, you may want to put in the Sage version number to give people a clue about when to remove the function (see the second argument to the deprecation function).
The core Sage library is meant to be mature, stable and highly polished, and function names, class names, etc., are not allowed to just be changed left and right. There are other places to post code that uses Sage, but which is not yet stable.

Posting Patches as the Referee

Often when going through the above steps, it is easiest to just make changes directly to the source code. E.g., if you see documentation misspelled, or think of examples to add. Just make a new patch that has all these changes and post a "referee patch". If you do this, then you should notify the original patch author(s), so they can referee your referee patch. This can get a little confusing, but usually sorts itself out.

Summary


I hope you can see from the above that refereeing tickets can be really fun. Think of it as a game to break whatever is posted on a trac ticket. It's much better if you find issues now when the author is still interested in the code, rather than some poor user finding bugs a few years later when the author of the code has moved on to other things and forgotten everything about the code. The core Sage library can only someday become high quality with your help as referees. And getting substantial code into Sage itself can only have any hope of attaining a similar status to publishing in a peer reviewed journal if this process of code refereeing itself gets refined, documented, and results in good quality mathematical software.

Remember, the goal is that only high quality code goes into Sage that satisfies the requirements outlined above; this is much more important than worrying about following "bureaucratic protocols".

Tuesday, October 26, 2010

Sage as a Python Library?

When I started Sage I viewed it as a distribution of a bunch of math software, and Python as just the interpreter language I happen to use at the time. I didn't even know if using Python as the language would last. However, it's also possible to think of Sage as a Python library.

Anyway, it has occurred to me (a few times, and again recently) that it would be possible to make much of the Sage distribution, without Python of course, into a Python library. What I mean is the following. You would have a big Python library called "sagemath", say, and inside of that would be a huge HG repository. In that repository, one would check in the source code for many of the standard Sage spkg's... e.g., GAP, Pari, etc. When you type

python setup.py install

then GAP, Pari, etc., would all get built, controlled by some Python scripts, then installed as package_data in the sagemath directory of /site-packages/.

From a technical perspective, I don't see any reason why this couldn't be made to work. HG can handle this much data, and "python setup.py install" can do anything. It does lead to a very different way of looking at Sage though, and it could help untangle things in interesting ways.

(1) Have a Python library called "sagecore", which is just the most important standard spkg's (e.g., Singular, PARI, etc.), perhaps eventually built *only* as shared object libraries (no standalone interpreters).

(2) Have a Python library which is the current Sage library (we already have this), and which can be installed assuming sagecore is installed.

(3) Have other Python libraries (like psage: http://code.google.com/p/purplesage/source/browse/), which depend on (2). Maybe a lot of the "sage-combinat" code could also be moved to such a library, so they can escape the "combinat patch queue" madness. Maybe many other research groups in algebraic topology, differential geometry, special functions, etc., will start developing such libraries... on their own, and share them with the community (but without having to deal directly with the sage project until they want to).

To emphasize (3), when people want to write a lot of mathematics code in some area, e.g., differential geometry, they would just make a new library that depends on Sage (the library in (2)). We do the work needed to make it easy for people to write code outside of the Sage library, which depends on Sage. Especially writing Cython code like this can be difficult and confusing, and we don't explain it all in any Sage documentation. It actually took me quite a while to figure out how to do it today (with psage).

The core Sage library (2) above would continue to have a higher and higher level of code review, tough referee process etc. However, the development models for (3) would be up to the authors of those libraries.

The above is already how the ecosystem with Python (http://pypi.python.org/pypi), Perl (http://www.cpan.org/), R, etc., work. Fortunately, Python has reasonably good support already for this.

I think without a shift in this direction, Sage is going to be very frustrating for people writing research oriented code.

Fortunately, it's possible to do everything I'm describing above without disturbing the mainline Sage project itself, at least for now.

Monday, October 25, 2010

Revision Control and Sage

 The main target audience of this blog post is mathematics researchers who are relatively new to revision control in the context of Sage.

Background


In the 1990s when I first started using revision control, it was a major, major pain in the arse. Making a tarball snapshot of the project directory with a date-stamp was better! In 2005 when I started Sage, I didn't know any better, and didn't both with revision control for the first year. In 2006, Gonzalo Tornaria told me that things had improved, and got me to use Darcs. That lasted just over a year, when I switched Sage to use Mercurial. The switch was due to major headaches installing Darcs (a Haskell program), and Darcs hanging when our repo got too big and complicated.

We are fortunate that today many difficult problems and design decisions for revision control have been addressed through lots of work. Revision control is amazingly good now compared to what it was 15 years ago!

Some milestones: RCS, CVS, Subversion, Bitkeeper & Monotone, Darcs, Git/Mercurial/Bazaar

A reference: see the Mercurial guide.

Getting started with Mercurial

In sage we use "hg" = (Mercurial). Why?

- Because I chose it in early 2007, when Darcs started sucking too much
- Distributed is the way to go these days. (Modern)
- Bazaar far too unstable in 2007, to put it mildly. Bazaar is by the people who made Ubuntu, and is much more stable today.
- GIT badly documented, too hard to setup, etc... back in 2007; again, git is much better today.

---> so Mercurial was the only reasonable choice then. It was just "good enough" at the time that we could survive. Hence Mercurial.

Mercurial comes with Sage, and you can use it like so:
  - "sage -hg"
  - sage: install_scripts command
  - sage: hg_sage.[tab]

- Or Install: easy to install standalone on OS X, Linux, and Windows.

- You can do "$ sudo easy_install mercurial" to install systemwide the latest version into your default system wide Python.   It will probably appear in /usr/local/bin/hg

Documentation for hg:
- http://mercurial.selenic.com/
- http://hgbook.red-bean.com/
- http://mercurial.selenic.com/learn/
- http://hginit.com/

Project Hosting


If you create your own HG projects, you can easily host them on Google Code (and also http://bitbucket.org/).

I'm doing this a lot lately with smaller projects, including:
- Purple Sage
- Sage Notebook

- Other systems. Google code only does HG and Subversion.
But there are also similar hosting services for git and bazaar:
- github -- free hosting for git projects
- Bazaar's Launchpad

- Sourceforge: The first major free hosting site I heard of is sourceforge.net, which offers hosting for all sorts of repos, including mercurial. It is more general, but not as "clean and focused" in abilities as the above, e.g., their mailing lists are often plagued by spam, and I seem to recall a lot of banner ads.

These hosting services are a very popular and fairly new thing (but not too new); all but sourceforge became popular well after I started Sage. They provide much, much more than just what the revision control systems do; they are more than just a place to host your code repository. They all provide:

- code hosting (of course)
- bug tracking
- code review
- mailing lists, wiki's, etc.

Sage isn't hosted on one of these probably because they are so new, and Sage is older. This may change.  Also, Sage is really, really big (Google code has a 2GB limit).

Main point: if you want to start a small project with a small number of other people -- example, writing a research paper and some corresponding code -- you can in seconds easily get the full suite of hosting, bug tracking, code review, etc., totally for free, backed up, high quality, etc. Obviously, you are trusting your data to either Google, Github, or Canonical, but you can easily backup most of it (e.g., the repos, mailing lists).

They want you to use them: "431,000 people hosting over 1,330,000 git repositories" (from github)

Story: In 2006, when I came to UW, Randy Leveque (a professor in applied math) found out about me running the Sage project, and wanted to know how to get Mercurial, Wiki's, trac, etc., setup, so that he could organize his software development project(s). I explained some basics, and I think he and his students setup various project infrastructure, and ran into assorted troubles. Today, I would just say "http://code.google.com" (or github, etc.) and be done with it! And Randy would be up and running with more tools than before in a matter of minutes.

Command line mercurial


The main point of the rest of this lecture will be about how to use Mercurial. It's very hard to beat the official Mercurial Quickstart, which we'll just go over together in class:

See the Mercurial Quickstart.

- Make a directory in your computer be under revision control
- Setup your .hgrc file, if you haven't already.
- Try cloning a remote repository, e.g., purple sage
- Try committing a change that spans multiple files, so you can see that all the changes are together. Then view the commit via "hg serve -p 8200".

WARNING: on OS X with the sage-included hg, this "hg serve" crashed for me with "Trace/BPT trap" when I visited http://localhost:8200. I opened trac ticket 10171.


So I did "sudo easy_install mercurial" to get a new version systemwide, and now this works for me: "/usr/local/bin/hg serve -p 8200"

A future blog post may cover more advanced HG topics, including branching, cloning, queues, and bundles.

Monday, October 4, 2010

Standalone C/C++ libraries that are included in Sage

Every copy of Sage includes many free open source C/C++ libraries, which can also be used standalone without Sage. They got into Sage because enough people really cared about using them, and felt it was worth the effort to convince other people that these should be included in Sage. Thus every single library also provides some key functionality to Sage. Thus understanding the main points about these libraries should be of interest to anybody who knows C/C++ who wants to do mathematical computation. Moreover, most of these libraries were written in C/C++ because the authors had already mastered the underlying algorithms, and really wanted an implementation that was extremely fast, often potentially orders of magnitude faster than what they might implement using only an interpreter, such as Python, Maple or Magma.

It's also possible using Cython (or even ctypes) to directly call any of the functionality of the libraries below from Python (hence Sage). Thus it is vital that you know about these libraries if you want to write fast code using the Sage framework.

When you want to look at the source code for any of these libraries, you can download the Sage source code from here or here, unpack it, look in the directory sage-x.y.z/spkg/standard/, and for any "spkg" (=tar, bzip2) file there, unpack it, e.g.,

wstein@sage:$ tar jxvf mpir-1.2.2.p1.spkg
wstein@sage:$ cd mpir-1.2.2.p1/src/
wstein@sage:$ ls
acinclude.m4     doc               missing        
...

Note that you can't unpack the spkg files included in a binary of Sage, since they are actually empty placeholders.

You can also find direct links to all of the standard spkg by going to this page, along with short descriptions of each.


Arithmetic

  • mpir (written in C and assembler; has a C++ interface): this is a library for doing fast arithmetic with arbitrary precision integers and rationals. It's very fast and cross-platform, and it's used by many other libraries. (NOTE: MPIR was started as an angry fork" of GMP=Gnu Multiprecision Library. GMP is used by Magma, Maple, and Mathematica; I've been told Mathematica doesn't use MPIR because of the overlap of Sage and MPIR developers, and their general suspicion toward the Sage project.)
  • mpfr (C library with C++ interface): this is a library for doing arithmetic with fixed precision floating point numbers, i.e., decimals with a given fixed choice of precision. "Floating-point numbers are a lot like sandpiles: Every time you move one you lose a little sand and pick up a little dirt." [Kernighan and Plauger]. MPFR is used by Magma (for details, google for "mpfr magma"), and several other libraries and systems out there. The number theory system and library PARI does *NOT* use MPFR; since PARI also does arbitrary precision floating point arithmetic, you see that PARI includes alternative implementations of some of the same algorithms (note: PARI also has its own alternative to MPIR). MPFR is unusual in that the main author -- Paul Zimmerman -- is unusually careful for somebody who works with floating point numbers, and my understanding is that all the algorithms in MPFR are proven to give the claimed precision. Basically, MPFR generalizes IEEE standards (that define what floating point arithmetic means) to arbitrary precision. MPFR also supports numerous "rounding modes". The mpmath Python library is in some cases faster than MPFR, but it has very different standards of rigor.
  • mpfi (C library): mpfi is built on top of MPFR, and implements arbitrary precision floating point "interval arithmetic". Interval arithmetic is an idea where you do arithmetic on intervals [a,b],
    and apply functions to them. Given two intervals, their sum, product, quotient, etc., is another interval. The actual "feel" of using this is that you're working with floating point numbers, and as you lose precision due to rounding errors, the computer keeps track of the loss every step of the way... and in some cases you can quickly discover you have no bits left at all! MPFI is nicely integrated into Sage (by Carl Witty), and is a pretty sophisticated and robust implementation of interval arithmetic, on which some other things in Sage are built, such as the field QQbar of all algebraic numbers.
  • flint (C library): "Flint" is an abbreviation for "Fast Library for Number Theory". However, I've listed it here under basic arithmetic, since so far in Sage at least we only use FLINT for arithmetic with polynomials over Z[x] and (Z/nZ)[x]. (Sage should use it for Q[x], but still doesn't; see the horendously painful trac ticket, number 4000!) The main initial challenge and reason for existence of flint was to multiply polynomials in Z[x] more quickly than Magma (hence any other software on the planet), and FLINT succeeded. Now FLINT also does many other relevant polynomial algorithms, e.g., GCD, addition, etc. Most new development is going into "FLINT2", which is a total rewrite of FLINT, with an incompatible C interface. Bill Hart's vision is that eventually FLINT2 have capabilities sort of like PARI. Sage users have found at least one serious bug involving f,g in Z[x] where f*g was computed incorrectly.
  • znpoly (C libary): (by David Harvey) one version of znpoly is included in FLINT, and another is a standalone package in Sage. The point of this library is to do fast arithmetic in (Z/nZ)[x]. ("Monsky-Washnitzer!" we once found a very subtle bug in znpoly-in-FLINT that only appeared when multiplying polynomials of huge degree.)
  • givaro (C++ library): in Sage we use it entirely for fast arithmetic over small-cardinality finite fields, up to cardinality "2 to the power 16". Givaro makes a log table and does all multiplications as additions of small ints. Givaro actually does a lot more, e.g., arithmetic with algebraic numbers, and a bunch of matrix algorithms, including Block Wiedemann, which we don't even try to make available in Sage.

NOTE: The C library API conventions of mpir (gmp), mpfr, mpfi, flint and zn_poly are all pretty similar. Once you learn one, the others are comfortable to use. Givaro is totally different than the others because it is a C++ library, and hence has operator overloading available. The upshot: in Givaro, you type "c=a+b" to add two numbers, but in the other libaries (when used from C) you type, maybe "mpz_add(c,a,b)".


Number Theory


  • pari (C library; interpreter): a hugely powerful C library (and interactive interpreter) for number theory. This library is very, very good and fast for doing computations of many functions relevant to number theory, of "class groups of number fields", and for certain computations with elliptic curves. It also has functions for univariate polynomial factorization and arithmetic over number fields and finite fields. PARI has a barbaric stack-based memory scheme, which can be quite annoying. Key Point: PARI is easy to build on my iPhone, which says something. That said, we have Neil Sloane, 2007: ``I would like to thank everyone who responded to my question about installing PARI on an iMAC. The consensus was that it would be simplest to install Sage, which includes PARI and many other things. I tried this and it worked! Thanks! Neil (It is such a shock when things actually work!!)'' PARI is very tightly integrated into Sage.
  • ntl (C++ library): a C++ library aimed at providing foundations for implementing number theory algorithms. It provides C++ classes for polynomials, vectors, and matrices over ZZ, Z/pZ, GF(q), (Z/pZ)[x], and RR, and algorithms for each. Some algorithms, e.g., for polynomial factorization in ZZ[x], are very sophisticated. Others, e.g., Hermite form of matrices over ZZ, are naive. In fact, most of the matrix algebra in NTL is naive, at least compared to what is in IML, Linbox, Sage and Magma. NTL is very stable and considered "done" by its author.
  • eclib (C++, standalone program): This is John Cremona's prized collection of C++ code that he wrote for computing elliptic curves, both finding them (using "modular symbols") and computing tricky things about them (via the "mwrank" program). It's a large amount of C++ code that Cremona developed over nearly 2 decades. There is still much functionality in there that could be made more readily available in Sage via Cython, but for which nobody has put in the effort to do so (I'm sure Cremona would be very helpful in providing guidance).
  • lcalc (C++ library, standalone program): This is Mike Rubinstein's C++ library for computing with "motivic L-functions", which are generalizations of the Riemann zeta function, of great importance in number theory. It's particularly good at computing tons of zeros of such functions in their critical strip, hence getting information about generalizations of the Riemann Hypothesis. It's the best general purpose L-functions program for computing zeros.
  • sympow (standalone C program): Though technically not a library it could be adapted into one. Sympow is a C program written by Mark Watkins for quickly computing special values of symmetric power L-functions (so related to what lcalc does). It has no dependencies (instead of PARI), because Mark didn't want to have to license sympow under the GPL.
  • ratpoints (C library): Michael Stoll's highly optimized C program for searching for certain rational points on hyperelliptic curves (i.e. of the form y^2 = f(x)). This library goes back to the early 1990s and work of Elkies. I believe this is in Sage for one reason only: Robert Miller wanted to implement 2-descent, and he needed this. (Unfortunately, Miller's implementation is only done in the case when there is a rational 2-torsion point.) This library could be made more useful in Sage, via making it so functionality in sage.libs.ratpoints is easily available for hyperelliptic curves...
  • genus2reduction (standalone C program that uses PARI): Not technically a library. This a C program that depends on PARI, and quickly computes "stuff" about genus 2 curves over the rational numbers... which as far as I know is fairly unique in what it does. A. Brumer used it recently for a large scale computation, and pointed out that it has stability issues.
  • flintqs (C++ program): This is a standalone C++ program that implements the quadratic sieve algorithm for integer factorization (which is good at factoring p*q with p and q of similar size and p*q up to maybe 100 digits). I think that this algorithm is also implemented in PARI, though in some cases flintqs is faster. A silly fact is that a better version of this program is evidently in FLINT itself, but for some reason Sage doesn't use it.
  • ecm (C library and C program; uses MPFR, MPIR): This implements the elliptic curve integer factorization algorithm, which Hendrik Lenstra invented, and is good at factoring p*n with p a prime with up to 30 (?) or so digits. There is another highly optimized implementation of the same algorithm as part of PARI, but ECM is in some (all?) cases better. It also has a huge number of tunable parameters. The ECM program is also often run standalone by people, sometimes on clusters, to attempt to factor integers. NOTE: Sage has a "factor" command, which just calls PARI. One could imagine it calling ECM and flintqs, etc., but nobody has made it do so... though many have tried.

Linear Algebra


  • IML (C library): "Integer matrix library". This library solves A*X = B, with A and B matrices over the integers. It does this by solving "A*X = B (mod p^n)" for several prime powers p^n, which involves computing the reduced row echelon form of A mod p (for various p) quickly... which is in turn done via matrix multiplication over matrices with double precision entries. It is the fastest software in the world for solving AX=B in many cases, especially as the entries of A and B get huge.
  • Linbox (C++ library): asymptotically fast black box linear algebra algorithms. It was just a "research sandbox", but "we" (mainly Clement Pernet) got Linbox whipped into shape enough to be of use in Sage for certain things. It is used at least for: matrix multiplication and computation of characteristic polynomials of matrices over the integers and finite fields, and asymptotically fast echelon form of matrices over finite field. I don't think it is used for much else yet. Implementing fast versions of the above in general is a gargantuan task, and we had done much of it (but no good charpoly!) natively in Sage before using Linbox, so now multiple implementations are available -- Linbox's are usually faster. Until Linbox/Sage, there were no fast open source programs for asymptoically fast exact linear algebra -- Magma was the only game in town (neither Maple or Mathematica are good at this either).
  • m4ri (C library): "Mary" -- scary fast arithmetic with matrices over the finite field GF(2) with 2 elements. This library uses various tricks (involving so-called "Gray codes") and asymptotically fast divide-and-conquer algorithms to compute reduced row echelon form and do matrix multiplication with matrices over GF(2). It is memory efficient and is the world's fastest at many benchmarks. There is related work (on "m4ri(e)") to do arithmetic over GF(2^n) and also some work by Boothby and Bradshaw on GF(p^n) for small p and n.
  • libfplll (C++ library): floating point LLL. The whole point of fpLLL is to implement very fast algorithms for computing LLL reduced basis for lattices, possibly with rounding error if you are not careful. LLL gets used as a key algorithm in many other algorithms (e.g., polynomial factorization). Magma also uses some variant of libfplll, and an older rewritten version of fpLLL is in PARI. If you look, e.g., at Mathematica, it does have some sort of limited LLL implementation, but it is nowhere near as sophisticated as what is available overall in Sage... NTL and PARI also have their own interfaces to LLL, which can have various pros and cons over using fpLLL directly. (In Sage, you can make a matrix over the integers and call the LLL or LLL_gram methods on it.) Magma also uses fpLLL.
  • polybori (C++ library): commutative algebra with boolean rings, which are characteristic 2 rings such that x^2 = x for every element of the ring. Evidently, polybori is important in cryptography and circuit design (?). I have personally never succeeded in really understanding how or seeing for myself polybori do well on benchmarks. Polybori is why the boost-cropped package is in Sage, which provides some C++ datastructure and algorithms.

Combinatorics


  • symmetrica (C++ library): a C++ library for doing very fast arithmetic with symmetric functions. The combinatorics code in Sage builds on this.
  • cliquer (C library): for finding cliques (=maximal complete subgraphs) in a graph.


Geometry


  • gfan (C++ program): "groebner fans" -- for computing a geometric object that parametrizes all Groebner basis for an ideal.
  • cddlib (C library): for computing with convex polytopes that gfan requires, using the "double description" method (whatever that is). There is code in sage.geometry.polyhedra that also uses cddlib.
  • glpk (C library): the GNU linear programming toolkit. This is a good free open source program for "linear programming". There are also commercial programs (e.g., CPLEX) for linear programming, and whenever people talk about linear programming software they always point out that some commercial programs are way, way faster than glpk. But glpk is in Sage.
  • palp (C program): computes stuff about lattice polytopes, like all the integers points in one. There is a big Sage package that builds on top of this. Evidently, this is of substantial interest to some physicists.

Numerical applied math

  • gsl (C library): implements core algorithms of scientific computing, is very stable and considered "done" by its authors. New relevant code goes into other libraries that use GSL.

Non mathematics


  • readline: provides command line editing, which is used by IPython, PARI, Singular, etc. Often misbuilt or not available on various platforms, so included in Sage. Readline is perhaps famous for being a key reason that both PARI and clisp are licenced under the GPL.
  • bzip2: bzip2 compression; easily usable from Sage via "import bz2". NOTE: this is used by the Sage build system so it is in SAGE_ROOT/spkg/base instead of SAGE_ROOT/spkg/standard/.
  • zlib: gzip-compatible compression; makes it so you can easily "gzip" files, and even work with gzip'd files in memory from Python.