Windows Support#
Since v0.12.0, PyHMMER is provided for Windows on an experimental basis.
Caution
Note that PyHMMER on Windows has only been tested against the internal PyHMMER test suite, and not extensively for result consistency across a wide range of platforms or build combinations. PyHMMER on Windows is provided for convenience and is not officially supported by HMMER and its developers.
By default, HMMER is not compatible with Windows because it relies extensively on a POSIX environment: POSIX threads for parallelism, GNU autotools for building from source, and POSIX-specific system headers.
To build PyHMMER on Windows, we use MinGW-w64,
which provides a minimal GNU environment to build HMMER with gcc and
sufficient compatibility to handle most of the core features.
MinGW Build#
PyHMMER was rewritten in v0.11.0 to use scikit-build-core and CMake
to handle the build of the native extension. As CMake supports Windows
natively, all that is needed for cross-compilation is a working MinGW-w64
install and a CMake configuration that points to gcc.exe. This allows us to release
wheels for Windows AMD64 using cibuildwheel and MinGW-w64 on GitHub
Actions.
Easel and HMMER patches#
Some core areas that required patching for Windows/MinGW support include:
esl_buffer
The implementation for buffered readers in Easel, which internally supports data read from an in-memory location, from a FILE*, or from a path using memory-mapping with mmap. Some include guards were adjusted to ensure mmap is not used on a target that lacks it.
esl_exception
On failure, Easel usually attempts to write to the system logger (using vsyslog). Since MinGW does not implement that interface, we simply silence errors reaching esl_exception. In practice, PyHMMER should catch them at a higher level and translate them to a Python exception with a custom exception handler, although some errors may be uncaught and cause a system exit.
ctime_r
The creation_time code formats the HMM creation time with ctime_r, a POSIX function that is not part of standard C. Windows provides similar functionality in the ctime_s function, for which we added include guards and conditional compilation.
impl_sse
Several parts of the HMMER SSE implementation require manual memory alignment for the dynamic programming matrices, and incidentally used unsigned long to cast pointers for unsigned arithmetic. Unfortunately, on 64-bit Windows, long is 32-bit and only long long is 64-bit, which caused pointers to be truncated. A patch was added to use the uintptr_t type for casting pointers, which is exposed in stdint.h for that exact purpose.
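The pointer truncation described for impl_sse can be reproduced with plain integer arithmetic. The following is a minimal Python sketch, not HMMER code: the align_up helper, the two masks, and the example address are all hypothetical, and the 32-bit mask only simulates a cast to unsigned long on 64-bit Windows.

```python
import ctypes

# Value ranges of the two C integer types, simulated with bit masks.
MASK_ULONG_WIN64 = 0xFFFFFFFF        # `unsigned long` is 32-bit on 64-bit Windows
MASK_UINTPTR = 0xFFFFFFFFFFFFFFFF    # `uintptr_t` is pointer-sized (64-bit here)


def align_up(address, alignment, mask):
    """Round `address` up to a multiple of `alignment` (a power of two),
    after converting it to an integer type whose range is given by `mask`."""
    as_integer = address & mask  # simulates the C cast of the pointer
    return ((as_integer + alignment - 1) & ~(alignment - 1)) & mask


# Hypothetical 64-bit address, chosen only for illustration.
address = 0x00007FF612345671

truncated = align_up(address, 16, MASK_ULONG_WIN64)
correct = align_up(address, 16, MASK_UINTPTR)

print(hex(truncated))  # the upper 32 bits of the address are lost
print(hex(correct))    # full address, rounded up to a 16-byte boundary

# Actual sizes on the current platform; these differ between Windows and Linux.
print(ctypes.sizeof(ctypes.c_ulong), ctypes.sizeof(ctypes.c_void_p))
```

The truncated value no longer points into the original allocation, which is why any later access through it is invalid.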
Memory Management#
Windows complicates memory management because each DLL handles its own
memory management (see documentation).
Memory allocated within a DLL must be deallocated within that same DLL
because of virtual address space management. This required some refactoring
since PyHMMER was reallocating routinely in different setters of classes
wrapping a corresponding Easel or HMMER class (such as Sequence
or HMM).
Where applicable, we now use esl_strdup from Easel to allocate string data
for Easel data structures, and the corresponding esl_free when data originating
from Easel must be manually deallocated in PyHMMER. There may still be
locations where “illegal” allocation/deallocation happens, although most of
them should have been tracked down and eliminated with the help of the test suite.
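The rule that memory must be released by the module that allocated it can be illustrated from Python with ctypes. This is a minimal sketch and not PyHMMER code: the load_c_runtime helper and the OwnedString class are hypothetical, and the C runtime's malloc and free stand in for esl_strdup and esl_free.

```python
import ctypes
import sys


def load_c_runtime():
    """Load a C runtime exposing malloc/free (an assumption of this sketch)."""
    if sys.platform == "win32":
        runtime = ctypes.CDLL("msvcrt")
    else:
        runtime = ctypes.CDLL(None)  # symbols already loaded in the process
    runtime.malloc.argtypes = [ctypes.c_size_t]
    runtime.malloc.restype = ctypes.c_void_p
    runtime.free.argtypes = [ctypes.c_void_p]
    runtime.free.restype = None
    return runtime


class OwnedString:
    """A C string that remembers which library allocated it."""

    def __init__(self, runtime, text):
        self._runtime = runtime
        data = text + b"\0"  # explicit NUL terminator
        self._ptr = runtime.malloc(len(data))
        if not self._ptr:
            raise MemoryError("malloc failed")
        ctypes.memmove(self._ptr, data, len(data))

    @property
    def value(self):
        if self._ptr is None:
            raise ValueError("string was already released")
        return ctypes.string_at(self._ptr)

    def release(self):
        # Free with the same library that allocated, never with another one.
        if self._ptr is not None:
            self._runtime.free(self._ptr)
            self._ptr = None


runtime = load_c_runtime()
name = OwnedString(runtime, b"sequence1")
copy = name.value  # copy the bytes out before releasing the C buffer
name.release()
```

Keeping a reference to the allocating library next to the pointer makes it hard to pair an allocation with the wrong deallocator by accident.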
File-like object handling#
Several I/O classes of PyHMMER, such as SequenceFile or MSAFile,
support reading data from any Python file-like object,
and not just from a file located on the filesystem. This is handled through
OS-specific code, where fopencookie is used on Linux and funopen is
used on BSD and derivatives (macOS). Unfortunately, no such API is exposed
by MinGW.
To emulate file-like object support, we use Windows’ anonymous pipes, and then
use _open_osfhandle to get a C file descriptor, and from it a FILE*, for one
end of the pipe, while wrapping the other end inside a background Python thread
which manages the piping accordingly. This carries a high risk of deadlock if an
I/O operation happens in the main thread without having released the GIL.
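The pipe-and-thread arrangement can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the PyHMMER implementation: the open_through_pipe helper is hypothetical, and os.pipe with os.fdopen stands in for the Windows pipe and handle calls used on the C side.

```python
import io
import os
import threading


def open_through_pipe(fileobj, chunk_size=4096):
    """Expose a binary file-like object as the read end of an OS pipe."""
    read_fd, write_fd = os.pipe()

    def pump():
        # Runs in the background: copy the file-like object into the pipe.
        try:
            with os.fdopen(write_fd, "wb") as writer:
                while True:
                    chunk = fileobj.read(chunk_size)
                    if not chunk:
                        break
                    writer.write(chunk)
        except OSError:
            pass  # the reader closed its end early

    thread = threading.Thread(target=pump, daemon=True)
    thread.start()
    return os.fdopen(read_fd, "rb"), thread


# Usage: any object with a binary read() method can be the source.
source = io.BytesIO(b">seq1\nMKVLA\n")
reader, thread = open_through_pipe(source)
with reader:
    data = reader.read()
thread.join()
```

In this sketch reader.read() releases the GIL while it blocks, so the pump thread can run. A C reader that blocks on the FILE* while still holding the GIL would stall the pump thread's calls to fileobj.read, which is the deadlock risk mentioned above.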