M4RI 1.0.1
Todo List

Global _mzd_addmul_even (mzd_t *C, mzd_t *A, mzd_t *B, int cutoff)
Make sure not to overwrite the leftover bits after ncols and before width*RADIX.
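One common way to avoid touching bits beyond ncols is to write the last word of a row through a mask. The following is only a minimal sketch under stated assumptions: 64-bit words, column j stored in bit (j % 64) of word j / 64, and helper names (sketch_last_word_mask, sketch_masked_store) that are hypothetical and not part of M4RI.

```c
#include <stdint.h>

/* Assumption: 64 bits per word, column j lives in bit (j % 64) of word j / 64. */
#define SKETCH_RADIX 64

/* Mask selecting the valid bits in the last word of a row with ncols > 0 columns. */
static uint64_t sketch_last_word_mask(int ncols) {
    int used = ncols % SKETCH_RADIX;            /* valid bits in the last word */
    return used == 0 ? ~UINT64_C(0)             /* last word is fully used */
                     : (UINT64_C(1) << used) - 1;
}

/* Store value into *dst, changing only the bits selected by mask. */
static void sketch_masked_store(uint64_t *dst, uint64_t value, uint64_t mask) {
    *dst = (*dst & ~mask) | (value & mask);
}
```

With this, the final word of each result row would be written with sketch_masked_store(&row[width - 1], value, sketch_last_word_mask(ncols)), while all earlier words can be stored directly.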

Global _mzd_combine2 (c, t1, t2, wide)
The non-SSE2 version of this code is slow; replace it with the code from mzd_process_rows2.

Global _mzd_combine4 (c, t1, t2, t3, t4, wide)
The non-SSE2 version of this code is slow; replace it with the code from mzd_process_rows4.

Global _mzd_combine8 (c, t1, t2, t3, t4, t5, t6, t7, t8, wide)
The non-SSE2 version of this code is slow; replace it with the code from mzd_process_rows8.
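For orientation, a portable (non-SSE2) combine of two source rows could look roughly like the sketch below. It assumes that the combine computes c[i] ^= t1[i] ^ t2[i] for 0 <= i < wide on 64-bit words. It is not the code from mzd_process_rows2, and the name sketch_combine2 is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable variant: c[i] ^= t1[i] ^ t2[i] for 0 <= i < wide. */
static void sketch_combine2(uint64_t *c, const uint64_t *t1,
                            const uint64_t *t2, size_t wide) {
    size_t i = 0;
    /* Main loop: four words per iteration to reduce loop overhead. */
    for (; i + 4 <= wide; i += 4) {
        c[i]     ^= t1[i]     ^ t2[i];
        c[i + 1] ^= t1[i + 1] ^ t2[i + 1];
        c[i + 2] ^= t1[i + 2] ^ t2[i + 2];
        c[i + 3] ^= t1[i + 3] ^ t2[i + 3];
    }
    /* Tail loop: the remaining 0 to 3 words. */
    for (; i < wide; ++i) {
        c[i] ^= t1[i] ^ t2[i];
    }
}
```

The four- and eight-row variants would follow the same pattern with more XOR terms per word; whether manual unrolling actually helps depends on the compiler and should be measured.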

Global _mzd_mul_even (mzd_t *C, mzd_t *A, mzd_t *B, int cutoff)
Ideally, we would use the same Wmk throughout the function, but some called function does not handle that, and we end up with a wrong result if we use virtual Wmk matrices. Ideally, this should be fixed, not worked around. To check whether the bug has been fixed, use only one Wmk and check whether mzd_mul(4096, 3528, 4096, 2124) still returns the correct answer.

Global m4ri_coin_flip ()
Allow the user to provide her own random() function.

Global m4ri_die (const char *errormessage,...)
Allow the user to register a callback that is called on m4ri_die().
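A callback of this kind could be wired in roughly as sketched below. All names here (sketch_die, sketch_set_die_handler, sketch_example_handler) are hypothetical and do not exist in M4RI. The sketch also assumes that the default behaviour is to print the message and abort, and that a handler may leave via longjmp instead of returning.

```c
#include <setjmp.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handler type: receives the already formatted error message. */
typedef void (*sketch_die_fn)(const char *message);
static sketch_die_fn sketch_die_handler = NULL;

/* Register a handler (NULL restores the default); returns the previous one. */
static sketch_die_fn sketch_set_die_handler(sketch_die_fn fn) {
    sketch_die_fn old = sketch_die_handler;
    sketch_die_handler = fn;
    return old;
}

/* Format the message, give the handler a chance to take over, then abort. */
static void sketch_die(const char *fmt, ...) {
    char buf[256];
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(buf, sizeof buf, fmt, ap);
    va_end(ap);
    if (sketch_die_handler) {
        sketch_die_handler(buf);   /* may not return, e.g. via longjmp */
    }
    fputs(buf, stderr);
    abort();
}

/* Example handler: remember the message and jump back to a recovery point. */
static jmp_buf sketch_recover;
static char sketch_last_message[256];

static void sketch_example_handler(const char *message) {
    strncpy(sketch_last_message, message, sizeof sketch_last_message - 1);
    sketch_last_message[sizeof sketch_last_message - 1] = '\0';
    longjmp(sketch_recover, 1);
}
```

A host application, for example a language binding, would call setjmp(sketch_recover) before entering library code and register sketch_example_handler, so that a fatal error turns into a recoverable condition instead of terminating the process.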

Global m4ri_mm_calloc (int count, int size)
Allow the user to register a calloc function.

Global m4ri_mm_free (void *condemned,...)
Allow the user to register a free function.

Global m4ri_mm_malloc (int size)
Allow the user to register a malloc function.
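The three allocator entries above (calloc, free, malloc) could share a single registration mechanism. The sketch below shows one possible shape using function pointers that default to the C library allocator. Every identifier in it is hypothetical, and size_t is used for sizes as an assumption.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook types; none of these names exist in M4RI. */
typedef void *(*sketch_malloc_fn)(size_t size);
typedef void *(*sketch_calloc_fn)(size_t count, size_t size);
typedef void  (*sketch_free_fn)(void *ptr);

/* Default to the C library allocator until replacements are registered. */
static sketch_malloc_fn sketch_malloc_hook = malloc;
static sketch_calloc_fn sketch_calloc_hook = calloc;
static sketch_free_fn   sketch_free_hook   = free;

/* Register user allocators; a NULL argument keeps the current hook. */
static void sketch_set_allocators(sketch_malloc_fn m, sketch_calloc_fn c,
                                  sketch_free_fn f) {
    if (m) sketch_malloc_hook = m;
    if (c) sketch_calloc_hook = c;
    if (f) sketch_free_hook = f;
}

/* Library-internal wrappers that route every request through the hooks. */
static void *sketch_mm_malloc(size_t size) { return sketch_malloc_hook(size); }
static void *sketch_mm_calloc(size_t count, size_t size) {
    return sketch_calloc_hook(count, size);
}
static void sketch_mm_free(void *ptr) { sketch_free_hook(ptr); }

/* Example user allocator: counts calls, then defers to malloc. */
static int sketch_malloc_calls = 0;
static void *sketch_counting_malloc(size_t size) {
    ++sketch_malloc_calls;
    return malloc(size);
}
```

Hooks should be registered before the first allocation, since memory obtained from one allocator must be released by the matching free function.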

Global m4ri_random_word ()
Allow the user to provide her own random() function.

Global mzd_combine (mzd_t *DST, const size_t row3, const size_t startblock3, const mzd_t *SC1, const size_t row1, const size_t startblock1, const mzd_t *SC2, const size_t row2, const size_t startblock2)
This code is slow if offset != 0.

Global mzd_randomize (mzd_t *M)
Allow the user to provide a RNG callback.
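The RNG entries in this list (m4ri_coin_flip, m4ri_random_word, mzd_randomize) could all draw from one user-replaceable word generator. The following is a minimal sketch of that idea, assuming 64-bit words. The names sketch_set_rng, sketch_random_word and sketch_coin_flip are hypothetical, and the rand()-based default is for illustration only.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical callback type: returns one random word, given user state. */
typedef uint64_t (*sketch_rng_fn)(void *state);

/* Default generator: glue together several 15-bit chunks from rand(). */
static uint64_t sketch_default_rng(void *state) {
    (void)state;
    uint64_t w = 0;
    for (int i = 0; i < 5; ++i) {
        w = (w << 15) ^ (uint64_t)(rand() & 0x7FFF);
    }
    return w;
}

static sketch_rng_fn sketch_rng = sketch_default_rng;
static void *sketch_rng_state = NULL;

/* Register a generator and its state; a NULL fn restores the default. */
static void sketch_set_rng(sketch_rng_fn fn, void *state) {
    sketch_rng = fn ? fn : sketch_default_rng;
    sketch_rng_state = state;
}

/* Everything random in the library would go through these two functions. */
static uint64_t sketch_random_word(void) { return sketch_rng(sketch_rng_state); }
static int sketch_coin_flip(void) { return (int)(sketch_random_word() & 1); }

/* Example user generator: a plain counter, handy for reproducible tests. */
static uint64_t sketch_counter_rng(void *state) {
    uint64_t *counter = state;
    return (*counter)++;
}
```

Passing an opaque state pointer alongside the callback lets callers keep several independent generators without global variables; a matrix randomizer would then fill each word of each row from sketch_random_word() and mask the last word to ncols.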

File xor.h
start counting at 0!