
GSoC 2026: Bringing Numba-Optimized Performance to scikit-bio

I was accepted into Google Summer of Code 2026 with NumFOCUS/scikit-bio for a 350-hour project replacing performance-critical Cython paths with Numba CPU and GPU implementations.

350h · Large project
2.4x · Prototype speedup
<1e-15 · Numerical agreement
May 25 · Coding starts

Accepted into GSoC 2026

On May 1, 2026, Google notified me that my proposal with NumFOCUS had been accepted for Google Summer of Code 2026. The accepted project title is Numba Optimized Implementations for scikit-bio's Performance-Critical Functions.

The project is under scikit-bio, a BSD-licensed Python library for biological omic data analysis. It sits inside the NumFOCUS ecosystem and is used in workflows around distance matrices, ordination, compositional data analysis, diversity metrics, and biological sequence analysis.

This is not a generic performance project. The work targets the computational core behind PCoA, Mantel tests, PERMANOVA, and related bioinformatics workflows where large biological datasets make performance and memory behavior matter.

The Problem: Fast CPU Code, GPU-Blind Internals

scikit-bio currently has two important execution paths for numerical code:

Cython
  Strength: Fast on CPU with single-pass memory patterns
  Limitation: Cannot directly operate on GPU-backed arrays, because typed memoryviews require host memory

Array API / xp
  Strength: Works across NumPy, CuPy, JAX, PyTorch, and Dask-style backends
  Limitation: Often slower on CPU, because vectorized operations make multiple passes through the matrix

That creates a divergence problem: the fastest CPU code paths are locked behind Cython, while the GPU-capable paths live in separate pure Array API implementations. Over time, two parallel implementations become harder to maintain and more likely to drift apart in subtle ways.

The Proposed Solution: One Source, Three Execution Paths

The accepted proposal replaces selected Cython functions with Numba implementations. Numba lets us write Python code that compiles to optimized native CPU code via LLVM, while also giving an explicit CUDA kernel path for GPU execution.

1. Numba CPU

@njit(parallel=True, cache=True) for NumPy arrays on CPU. This is the default fast path and should match or exceed the existing Cython implementation.

2. Numba CUDA

@cuda.jit for memory-bandwidth-bound operations where a fused kernel beats multiple Array API kernels on GPU-backed arrays.

3. Array API fallback

The xp path remains the safety net for backends like JAX, PyTorch, Dask, or environments where Numba is not installed.
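To make the CUDA path concrete, the centering-application step could be fused into a single kernel. This is a sketch under assumptions, not project code: the kernel name and launch configuration are hypothetical, and the import is guarded because Numba is an optional dependency.

```python
import math

try:
    from numba import cuda
    HAVE_NUMBA = True
except ImportError:  # Numba is optional; the Array API fallback still works
    HAVE_NUMBA = False

if HAVE_NUMBA:
    @cuda.jit
    def center_apply_kernel(E, row_means, global_mean):
        # One thread per matrix element: subtract the row, column, and
        # global means in a single pass over GPU memory, instead of
        # launching several separate vectorized kernels.
        i, j = cuda.grid(2)
        n = E.shape[0]
        if i < n and j < n:
            E[i, j] = E[i, j] - row_means[i] - row_means[j] + global_mean

    def center_apply_gpu(E, row_means, global_mean):
        # Hypothetical launcher: 16x16 thread blocks tiling the matrix.
        tpb = (16, 16)
        blocks = (math.ceil(E.shape[0] / 16), math.ceil(E.shape[1] / 16))
        center_apply_kernel[blocks, tpb](E, row_means, global_mean)
```

The point of the fusion is bandwidth: the matrix is read and written once, which is exactly the case where a hand-written kernel can beat a chain of Array API operations.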

Numba will be an optional dependency. If it is unavailable, users still get correct results through the existing Array API-compatible fallback path.
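A minimal sketch of how that optional-dependency dispatch might look (hypothetical names; the real scikit-bio wiring will differ, and the production path would use parallel prange rather than this serial loop):

```python
import numpy as np

def _center_impl(D):
    # Gower centering written as explicit loops: valid pure Python and
    # also valid Numba nopython code, so one source serves both paths.
    n = D.shape[0]
    E = np.empty_like(D)
    row_means = np.empty(n, dtype=np.float64)
    total = 0.0
    for i in range(n):
        row_sum = 0.0
        for j in range(n):
            val = -0.5 * D[i, j] * D[i, j]
            E[i, j] = val
            row_sum += val
        row_means[i] = row_sum / n
        total += row_sum
    global_mean = total / (n * n)
    for i in range(n):
        for j in range(n):
            E[i, j] = E[i, j] - row_means[i] - row_means[j] + global_mean
    return E

try:
    from numba import njit
    _center_fast = njit(_center_impl)   # compiled path when Numba is present
except ImportError:
    _center_fast = _center_impl        # correct pure-Python fallback

def center_distance_matrix(D):
    return _center_fast(np.ascontiguousarray(D, dtype=np.float64))
```

Either branch returns the same values; only the execution speed changes.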

The Prototype That Made the Proposal Concrete

The core prototype was for center_distance_matrix, which is used in Principal Coordinates Analysis (PCoA). The algorithm performs Gower centering: scale the squared distance matrix by -0.5, accumulate row means, compute a global mean, and apply the centering.

center_distance_matrix, Numba prototype:

from numba import njit, prange
import numpy as np

@njit(parallel=True)
def center_distance_matrix_numba(D):
    n = D.shape[0]
    E = np.empty((n, n), dtype=D.dtype)
    row_means = np.empty(n, dtype=D.dtype)
    global_sum = 0.0

    # Pass 1: scale the squared distances and accumulate row sums as we go.
    # The scalar += on global_sum is recognized by Numba as a parallel reduction.
    for i in prange(n):
        row_sum = 0.0
        for j in range(n):
            val = -0.5 * D[i, j] * D[i, j]
            E[i, j] = val
            row_sum += val
        row_means[i] = row_sum / n
        global_sum += row_sum

    global_mean = global_sum / (n * n)

    # Pass 2: apply the row, column, and global mean corrections in place.
    for i in prange(n):
        for j in range(n):
            E[i, j] = E[i, j] - row_means[i] - row_means[j] + global_mean
    return E

The important part is not the decorator alone. It is the memory-access pattern: one pass for squaring and row accumulation, then one pass for centering. That mirrors the Cython advantage while keeping the implementation in Python.
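For contrast, here is what the multi-pass Array API style looks like in plain NumPy (a sketch, not scikit-bio's actual xp implementation); each commented pass is a separate traversal of the n-by-n matrix:

```python
import numpy as np

def center_distance_matrix_xp(D):
    E = -0.5 * D * D                    # pass 1: square and scale
    row_means = E.mean(axis=1)          # pass 2: row reductions
    global_mean = E.mean()              # pass 3: global reduction
    # pass 4: broadcasted centering materializes the result
    return E - row_means[:, None] - row_means[None, :] + global_mean
```

A Gower-centered matrix has zero row sums, which makes a convenient correctness check when comparing this against the fused two-pass version.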

Matrix size   Cython     Numba @njit(parallel=True)
n = 500       0.39 ms    0.34 ms
n = 1000      1.75 ms    1.01 ms
n = 3000      19.3 ms    7.91 ms

At n=3000, the Numba prototype was 2.4x faster than Cython in the same benchmark session, with numerical agreement below 1e-15.

What I Will Build

The accepted scope focuses on the distance and ordination modules: the places where performance-critical matrix operations appear repeatedly.

Tier 1: Ordination Cython replacements

Replace e_matrix_means_cy, f_matrix_inplace_cy, and center_distance_matrix_cy. These underpin PCoA and are the direct continuation of the working prototype.

Tier 2: Distance module Cython replacements

Implement Numba versions of Mantel permutation correlation, PERMANOVA pseudo-F statistics, and geometric median routines where the loop and reduction patterns fit Numba.
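For orientation, the Mantel statistic and its permutation test can be sketched in plain NumPy (illustrative only; the Numba version would move the permutation loop into compiled code, and scikit-bio's actual API and defaults differ):

```python
import numpy as np

def mantel_r(A, B):
    # Pearson correlation of the upper triangles of two distance matrices.
    iu = np.triu_indices_from(A, k=1)
    return np.corrcoef(A[iu], B[iu])[0, 1]

def mantel_permutation_pvalue(A, B, n_perm=999, seed=0):
    # Permute rows and columns of B jointly and count correlations at
    # least as extreme as the observed one (one-sided, "greater").
    rng = np.random.default_rng(seed)
    observed = mantel_r(A, B)
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(A.shape[0])
        if mantel_r(A, B[np.ix_(p, p)]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

The permutation loop is exactly the shape Numba handles well: a hot loop of reductions over contiguous arrays.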

Tier 3: Distance utilities

Port validation and matrix reordering utilities such as is_symmetric_and_hollow and distmat_reorder where straightforward Numba implementations can remove more Cython dependency surface.
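As a sketch of why these utilities suit Numba: an explicit-loop check can return on the first violation, while a vectorized check always touches the whole matrix (illustrative code, not the scikit-bio implementation):

```python
import numpy as np

def is_symmetric_and_hollow_vectorized(D):
    # Always reads all n*n elements, even if D[0, 1] != D[1, 0].
    return bool(np.array_equal(D, D.T) and not np.any(np.diag(D)))

def is_symmetric_and_hollow_loops(D):
    # Early exit on the first asymmetric or nonzero-diagonal element;
    # the same loop compiles cleanly under @njit.
    n = D.shape[0]
    for i in range(n):
        if D[i, i] != 0.0:
            return False
        for j in range(i + 1, n):
            if D[i, j] != D[j, i]:
                return False
    return True
```

On valid inputs both cost a full scan, but on malformed inputs the loop version can bail out almost immediately, which matters for large matrices in validation hot paths.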

Tier 4: Ordination utilities

Add Array API and Numba-aware paths for utilities like mean_and_std, scale, corr, e_matrix, and f_matrix.

Not every Cython file is a good Numba target. Alignment traceback, tree traversal, phylogenetic dictionary lookups, and treap-like metadata structures are explicitly out of scope because they do not match Numba's strengths.

Mentorship

Igor Sfiligoi

Primary mentor from UC San Diego / San Diego Supercomputer Center. His work focuses on HPC, GPU acceleration, cache-aware restructuring, and bioinformatics performance.

Qiyun Zhu

Co-mentor from Arizona State University and a key architect of scikit-bio's Array API migration, including the ingest_array / xp pattern.

Matthew Aton

Co-mentor from Arizona State University working on Array API infrastructure such as decorators for backend-compatible functions and tests.

The technical bar is clear: benchmarks must be honest, numerical claims must be verified, and performance work must respect memory access patterns rather than chasing decorator-based speedups blindly.

Timeline

1. Community Bonding: May 1 - May 24

Set up the development environment, confirm scope, audit Cython files, reproduce prototype benchmarks, study mentor papers, and prepare a warm-up PR.

2. Weeks 1-6: May 25 - July 5

Build Numba integration infrastructure, replace ordination Cython functions, implement Mantel and PERMANOVA Numba paths, and prepare for midterm evaluation.

3. Midterm: July 6 - July 10

Target state: Tier 1 complete and merged, with Mantel and PERMANOVA implementations ready for review.

4. Weeks 7-12: July 11 - August 17

Finish utilities, GPU kernels, benchmark tables, documentation, review cleanup, final report, and final blog post.

Why This Matters

scikit-bio is not an isolated library. It supports downstream scientific workflows in ecosystems like QIIME 2, Emperor, and Qiita. When performance-critical matrix operations become faster and more GPU-capable, the benefit can propagate into real biological data analysis pipelines.

The engineering lesson is bigger than one library: modern scientific Python needs graceful heterogeneity. CPU should be fast. GPU should be possible. Fallback paths should be correct. Maintainers should not have to carry multiple drifting implementations forever.

My goal for the summer is simple: make scikit-bio's fastest paths easier to maintain, easier to benchmark, and ready for CPU and GPU execution without compromising numerical correctness.