Changelog#

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

Unreleased#

v0.10.15 - 2024-10-08#

Added#

  • query property to TopHits referencing the original object used to create the TopHits (#76).

Changed#

  • Require the query object to create a TopHits object.

  • Make TopHits generic over its query property.

  • Deprecate old query properties of TopHits (query_name, query_length, query_accession).
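The combination of a container that is generic over its query and deprecated per-field accessors can be sketched in plain Python. The `Query` and `Hits` classes below are hypothetical stand-ins written for illustration, not the actual pyhmmer classes, and the warning message is an assumption.

```python
import warnings
from typing import Generic, TypeVar

Q = TypeVar("Q")


class Query:
    """Hypothetical query object with a name and a length."""

    def __init__(self, name: bytes, length: int) -> None:
        self.name = name
        self.length = length


class Hits(Generic[Q]):
    """Hypothetical result container that keeps a reference to its query."""

    def __init__(self, query: Q) -> None:
        # The query is required at construction time.
        self._query = query

    @property
    def query(self) -> Q:
        # Preferred accessor: returns the original query object.
        return self._query

    @property
    def query_name(self):
        # Deprecated accessor kept for backwards compatibility.
        warnings.warn(
            "query_name is deprecated, use query.name instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._query.name
```

With this layout, a type checker can infer the type of `hits.query` from the object passed to the constructor, while older code that reads `hits.query_name` keeps working and receives a `DeprecationWarning`.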

Removed#

  • Detection of SSE flush from setup.py (#71).

v0.10.14 - 2024-07-16#

Added#

  • Detection of SSE flush modes to setup.py for possible performance gains on x86 platforms.

Changed#

  • Migrate documentation to pydata-sphinx-theme.

Fixed#

  • Documentation examples not using permanent resource links.

v0.10.13 - 2024-06-19#

Changed#

  • Allow the AlphabetMismatch error to accommodate an unknown actual alphabet.

  • Make HMMFile and HMMPressedFile raise AlphabetMismatch on files with mixed alphabets.

Fixed#

  • Avoid calling fclose with null pointers in Sequence.write and MSA.write.

v0.10.12 - 2024-04-25#

Fixed#

  • HMM.__setstate__ not properly extracting the cutoff from pickle state for some HMMs (#67).

Changed#

  • Update and remove some test files to reduce size of distributed package data.

v0.10.11 - 2024-03-27#

Fixed#

  • Compilation of Easel and HMMER code not using SSE4.1 extensions.

v0.10.10 - 2024-03-18 - YANKED#

Fixed#

  • Implement write function for fopencookie with off_t instead of off64_t for compatibility.

  • Fix handling of NULL buffers passed to read and write methods of fopencookie.

v0.10.9 - 2024-03-12 - YANKED#

Fixed#

  • Reallocation issue causing segmentation faults in nhmmer with more than 64 sequences (#62).

v0.10.8 - 2024-03-06 - YANKED#

Added#

  • Getter to access the strand of a Domain produced by a LongTargetsPipeline.

Changed#

  • Display model and cutoff names in MissingCutoffs error message, if any.

  • Allow LongTargetsPipeline to be configured with window length and beta parameters.

  • Make nhmmer use the window length and beta from the options when creating a Builder.

Fixed#

  • nhmmer not computing E-values for non-default window lengths (moshi4/pybarrnap#2).

  • SequenceFile and MSAFile crashing with a segmentation fault when given the path to a folder rather than a file.

v0.10.7 - 2024-03-04 - YANKED#

Added#

  • Pre-compiled wheels for PyPy 3.10.

Fixed#

  • Invalid pointer cast in __getbuffer__ method of Matrix and Vector objects.

  • Remaining tests failing to run on missing importlib-resources.

  • pyhmmer.hmmer dispatchers possibly dead-locking on background thread errors (#60).

v0.10.6 - 2024-02-20 - YANKED#

Added#

  • armv7 and aarch64 to the PKGBUILD architectures.

Changed#

  • SSIReader and SSIWriter constructors now accept path-like objects.

  • Skip tests depending on importlib.resources.files when it is not available on the host machine.

Fixed#

  • Memory leak caused by alphabet allocation in Pipeline._scan_loop_file.

v0.10.5 - 2024-02-16 - YANKED#

Added#

  • Alignment properties to get the original lengths of the sequence and HMM being stored.

  • Hit.length property storing the length of the hit sequence (or HMM).

  • TopHits.query_length storing the length of the hit HMM (or query).

  • Alignment.posterior_probabilities property showing an encoded representation of posteriors (#59, by @arajkovic).

  • Trace.score method to compute a trace score from a given profile and sequence.

  • Alignment.__sizeof__ implementation leveraging p7_alidisplay_SizeOf.

Fixed#

  • Cutoffs proxy objects not recording their owner to prevent deallocation.

  • Avoid GIL re-acquisition in GeneticCode.translate.

  • Query metadata not being recorded in Hits obtained from daemon.Client.

  • Empty MatrixU8 creation attempting zero-allocation.

  • VectorU8.zeros allocating 4x more memory than required.

  • Memory leak caused by string duplication in __getbuffer__ methods of Matrix and Vector types.

v0.10.4 - 2023-10-29 - YANKED#

Added#

  • residue_markups argument to TextSequence and DigitalSequence constructors.

  • __reduce__ implementation to TextSequence, DigitalSequence, TextSequenceBlock and DigitalSequenceBlock.

Changed#

  • Handling of easel I/O methods to avoid implicit GIL acquisition for error checking.

Fixed#

  • Syntax errors in type annotation files.

v0.10.3 - 2023-10-22 - YANKED#

Added#

  • Out-of-band pickle serialization of Bitfield objects.

  • Getters for float attributes and forward/backward parameters of OptimizedProfile.

  • InvalidHMM error raised by HMM.validate.

Changed#

  • Mark HMM.zero method as noexcept.

  • Increase size of buffer for the query queue in the hmmer dispatcher.

Fixed#

  • Unneeded semaphore in pyhmmer.hmmer message passing implementation.

  • Broken assertion in Bitfield._from_raw_bytes.

  • Relax tolerance of HMM validation in TraceAligner.align_traces.

v0.10.2 - 2023-08-20 - YANKED#

Fixed#

  • Invalid buffer write in DigitalSequenceBlock.translate (#50).

v0.10.1 - 2023-08-17 - YANKED#

Added#

  • HMM.set_consensus method to set the consensus for an HMM or compute it from the emission probabilities.

Fixed#

  • Platform detection for MacOS and Armv7 platforms in setup.py.

  • pyhmmer.plan7.HMM constructor setting a consensus string forcefully.

v0.10.0 - 2023-08-16 - YANKED#

Added#

  • Support for compiling wheels for Aarch64 and NEON-enabled Arm platforms.

Changed#

Fixed#

  • Patch missing PyInterpreterState_GetID preventing the package from working on PyPy 3.9.

v0.9.0 - 2023-08-03#

Added#

  • TopHits.mode property showing from which pipeline mode (search or scan) the hits were obtained.

Changed#

  • Updated the code for Cython v3.0.

Fixed#

  • TopHits.merge not properly handling inclusion and reporting for domains (#46, #47, by @zdk123).

v0.8.2 - 2023-06-07#

Added#

  • Bracket-style repr implementation to HMM, Profile and OptimizedProfile showing model alphabet, length and name.

  • MissingCutoffs and InvalidParameter exceptions inheriting ValueError.

Changed#

  • Replace pthread locks with PyThread API for synchronizing models in OptimizedProfileBlock.

Fixed#

  • Sequence length extraction in LongTargetsPipeline.search_hmm (#42).

  • LongTargetsPipeline.search_msa not building a HMM with Builder.build_msa.

v0.8.1 - 2023-05-19#

Added#

  • HMM.validate method to ensure a HMM holds HMMER structural constraints.

  • plan7.Transitions enum with transition names for indexing HMM.transition_probabilities.

v0.8.0 - 2023-05-01#

PyHMMER has been accepted for publication in Bioinformatics. Paper can be reached at doi:10.1093/bioinformatics/btad214.

Added#

  • pyhmmer.hmmer.jackhmmer function to run several JackHMMER iterative searches in parallel using multithreading (#35, by @zdk123).

  • HMM.to_profile shortcut method to allocate and configure a new Profile object.

Fixed#

  • Type annotations of Pipeline.iterate_seq and Pipeline.iterate_hmm.

  • Potential memory leak on exceptions raised by HMMPressedFile.read.

  • Offsets.profile not recording offsets properly, causing pyhmmer.hmmer.hmmpress to produce invalid pressed files (#37).

Changed#

  • HMM.__init__ and HMM.sample now take the Alphabet as the first argument, for consistency with the rest of the API.

  • HMM now requires a name argument.

Removed#

  • Deprecated ignore_gaps argument in SequenceFile.__init__.

  • Deprecated Sequence.taxonomy_id property.

v0.7.4 - 2023-04-14#

Added#

  • Recipes page to the documentation with code example for loading multiple HMM files (#24, by @zdk123).

Fixed#

  • TraceAligner methods causing a segfault when passed an uninitialized HMM (#36).

Changed#

  • HMM default constructor now always creates a valid HMM (with respect to probability arrays).

  • TraceAligner now validates the input HMM before calling the HMMER code.

  • Use stack allocation for all error buffers instead of creating empty bytearray objects where applicable.

v0.7.3 - 2023-03-24#

Fixed#

  • Wrong argument type in IterativeSearch.iterate_hmm method (#34, by @zdk123).

v0.7.2 - 2023-02-17#

Added#

  • easel.GeneticCode class wrapping an ESL_GENCODE struct for configuring translation.

  • DigitalSequence.translate method to translate a nucleotide sequence to a protein sequence. Metadata is copied from the source sequence to its translation (#31, by @valentynbez).

Deprecated#

  • Sequence.taxonomy_id property, as it is not used by Easel and implementation is not consistent (see EddyRivasLab/easel#68).

v0.7.1 - 2022-12-15#

Added#

  • Missing __reduce__ method to TopHits.

Fixed#

  • Build detection of available platform functions in setup.py.

v0.7.0 - 2022-12-04#

Added#

  • Bitfield.zeros and Bitfield.ones classmethods for constructing an empty bitfield of known size.

  • Bitfield.copy method to copy a bitfield object.

  • SequenceBlock and OptimizedProfileBlock classes to store Python objects next to a contiguous array of pointers for iterating with the GIL released.

  • SequenceFile.read_block method to read a whole sequence block from a file.

  • HMM.sample class method to generate a HMM at random given a Randomness source.

  • hmmscan function to scan a profile database with sequence queries.

  • deepcopy implementations to HMM, Profile and OptimizedProfile classes of plan7.

  • rewind method to HMMFile, HMMPressedFile and SequenceFile to reset a file back to its initial position.

  • name attribute to HMMFile, HMMPressedFile, MSAFile and SequenceFile to expose the path of a file (when it was created from path).

  • local property to Profile and OptimizedProfile, indicating whether a profile is in local or global mode.

  • multihit property to Profile and OptimizedProfile, indicating whether a profile is in unihit or multihit mode, with a setter taking care of the reconfiguration.

  • Domain.included and Domain.reported settable properties to report the inclusion and reporting status of a single domain.

  • TopHits.included and TopHits.reported sized iterators to iterate only on included and reported hits.

  • Domains.included and Domains.reported sized iterators to iterate only on included and reported domains.

Changed#

  • Bitfield, Vector and Matrix can now be created from an iterable.

  • Pipeline search methods now expect a DigitalSequenceBlock or a SequenceFile for the target sequence database.

  • Pipeline scan methods now expect an OptimizedProfileBlock or a HMMPressedFile for the target profile database.

  • TraceAligner now expects a DigitalSequenceBlock for the sequences to align to the HMM.

  • Profile.configure now uses a default value of 400 for the L argument.

  • hmmsearch, nhmmer and phmmer support being given a single query instead of requiring an iterable.

  • HMMPressedFile can now be created, closed and used as a context manager directly without having to manage the source HMMFile.

  • Renamed Profile.optimized method to Profile.to_optimized.

  • Replaced Randomness.is_fast method with the Randomness.fast property.

  • Rewrite handling of Hit flags using settable properties (Hit.included, Hit.reported, Hit.new, Hit.dropped, Hit.duplicate) instead of methods.

Fixed#

  • Memory leak in the LongTargetsPipeline search loop.

  • PyPy behaviour change of readinto methods now expecting unsigned char* instead of char* memoryview.

  • NULL-pointer dereference in Pipeline.search_hmm when given a query without name.

  • LongTargetsPipeline not recording the query name and accession.

  • Memory leak caused by using a non-default prior scheme when constructing a Builder.

Removed#

  • PipelineSearchTargets, replaced in functionality with easel.DigitalSequenceBlock.

  • is_local and is_multihit methods of Profile and OptimizedProfile, replaced with equivalent properties.

  • Hit.manually_drop and Hit.manually_include methods, replaced with the different Hit properties.

v0.6.3 - 2022-09-09#

Fixed#

  • Error not being raised on alphabet detection failure in SequenceFile or MSAFile.

  • Add check in DigitalSequence constructor to make sure encoded characters are in valid range (#25).

Added#

  • SequenceFile.guess_alphabet and MSAFile.guess_alphabet to guess the alphabet from an open file.

  • Alphabet.encode and Alphabet.decode to convert raw sequences between digital and text format.
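The conversion between text and digital format, together with a range check on encoded characters, can be illustrated with a minimal sketch. The four-letter `SYMBOLS` table and both helper functions are hypothetical and written for this example only; they are not the Easel alphabet tables or the pyhmmer methods.

```python
# Hypothetical four-letter alphabet; real alphabets contain more symbols
# (gaps, degenerate residues, and so on).
SYMBOLS = "ACGT"
_INDEX = {letter: code for code, letter in enumerate(SYMBOLS)}


def encode(text: str) -> bytes:
    """Convert a text sequence to digital codes, one byte per residue."""
    codes = []
    for letter in text:
        if letter not in _INDEX:
            raise ValueError(f"invalid symbol for alphabet: {letter!r}")
        codes.append(_INDEX[letter])
    return bytes(codes)


def decode(digital: bytes) -> str:
    """Convert digital codes back to text, rejecting out-of-range codes."""
    letters = []
    for code in bytes(digital):
        if code >= len(SYMBOLS):
            raise ValueError(f"code out of range for alphabet: {code}")
        letters.append(SYMBOLS[code])
    return "".join(letters)
```

Validating codes on the way in means that later code can index into per-symbol tables without bounds checks of its own.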

v0.6.2 - 2022-08-12#

Changed#

  • hmmsearch, phmmer and nhmmer functions will reduce the requested number of threads to the number of queries, if it can be detected using operator.length_hint.
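A minimal sketch of this thread-count reduction, assuming a helper named `effective_threads` (a hypothetical name, not part of the pyhmmer API). `operator.length_hint` returns the length when it can be estimated and a default value otherwise, so queries given as a generator leave the requested count unchanged.

```python
import operator


def effective_threads(queries, requested: int) -> int:
    """Cap the requested thread count at the number of queries, if known."""
    # length_hint falls back to the default (0 here) when the object
    # exposes neither __len__ nor __length_hint__.
    hint = operator.length_hint(queries, 0)
    if hint > 0:
        return min(requested, hint)
    # Unknown (or zero) length: keep what the caller asked for.
    return requested
```

For example, three queries in a list with eight requested threads would use three threads, while the same queries wrapped in a generator expression would keep all eight.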

Added#

  • Documentation for loading all HMMs from an HMMFile object at once (#23).

  • List of projects depending on PyHMMER to the Examples page of the documentation.

v0.6.1 - 2022-06-28#

Added#

  • pickle protocol support for TopHits objects, using the HMMER network serialization.

  • TopHits.write method to write hits to a file in tabular format.

  • query_name and query_accession properties to TopHits objects to access the name and accession of the query that produced the hits.

Fixed#

  • Extraction of filename from file-like objects in the HMMFile constructor.

  • Use os.cpu_count instead of multiprocessing.cpu_count where applicable to preserve OS scheduling.

  • Wrong return type in docstring of HMM.insert_emissions.

  • TopHits.searched_nodes returning the searched number of residues instead of the searched number of model nodes.

  • Unsound decoding of pickled MatrixF or VectorF when data comes from a source of different endianness.
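The endianness issue in the last item can be illustrated with the standard `struct` module. This is a sketch of the general technique (recording the byte order next to the raw data and honouring it when decoding), not the actual pickling code; the function names are hypothetical.

```python
import struct
import sys


def dump_floats(values):
    """Return (byteorder, raw bytes) for 32-bit floats in native byte order."""
    order = "<" if sys.byteorder == "little" else ">"
    return sys.byteorder, struct.pack(f"{order}{len(values)}f", *values)


def load_floats(byteorder, raw):
    """Decode 32-bit floats using the byte order recorded by the writer."""
    order = "<" if byteorder == "little" else ">"
    count = len(raw) // 4  # each float occupies four bytes
    return list(struct.unpack(f"{order}{count}f", raw))
```

Decoding with the reader's native order instead of the recorded one silently yields wrong numbers rather than an error, which is why the byte order has to travel with the data.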

Changed#

  • Rewrite pyhmmer.hmmer threading code using Deque instead of queue.Queue to store the queries and results.

  • Reduce memory consumption of pyhmmer.hmmer by reducing the number of semaphores and event flags used concurrently.

  • Make pyhmmer.hmmer main threads block on query insertion rather than result retrieval to make sure worker threads are never idling.

v0.6.0 - 2022-05-01#

Added#

  • pyhmmer.daemon module with a client implementation to communicate with a hmmpgmd server.

  • Pipeline.arguments method to get a list of CLI arguments from the parameters used to initialize the Pipeline.

  • Setters for name, accession and description properties of plan7.Hit.

  • Constructor for individual plan7.Trace objects outside a plan7.Traces list.

  • plan7.Trace.from_sequence constructor to create a faux trace from a single sequence.

  • manually_include and manually_drop methods to plan7.Hit for manually selecting the inclusion status of a Hit in a TopHits instance.

  • compare_ranking method to plan7.TopHits for comparing the order of the hits compared to a previous run on the same targets stored in an easel.KeyHash object.

  • Pipeline.iterate_seq and Pipeline.iterate_hmm to run iterative queries like JackHMMER.

  • repr implementations for easel.MSAFile, easel.SequenceFile and easel.HMMFile showing the path or file object they were created from.

  • repr implementation for easel.Randomness showing the seed and the RNG algorithm in use.

  • str implementation for plan7.Alignment using HMMER original code to display a domain alignment like in search/scan results.

Changed#

  • plan7.Trace.posterior_probabilities property may now be None in case no memory is allocated for the posteriors in the P7_TRACE struct.

  • TopHits.to_msa can now add additional sequences passed as arguments to the alignment.

  • plan7.HMMPressedFile now raises an exception on attempts to create a new instance manually.

  • ignore_gaps argument of easel.SequenceFile is now deprecated.

  • repr implementations for easel types now use the fully qualified class name.

Fixed#

  • easel.SequenceFile.readinto docstring not rendering properly in documentation.

  • Type annotations of hits_included and hits_reported of plan7.TopHits marking these properties as bool instead of int.

  • Setters of name, accession, description and author properties of easel.MSA crashing when given None values.

  • Exception value raised from Easel code not being properly extracted.

  • Plain strings being used in example for easel.TextSequence and easel.TextMSA constructors where byte strings are expected (#20).

v0.5.0 - 2022-03-14#

Added#

  • plan7.PipelineSearchTargets to reduce the overhead when searching the same sequences several times with different query profiles.

  • TopHits.copy method to duplicate a TopHits instance.

  • TopHits.merge method to merge hits obtained with the same query on different targets.

  • Buffer protocol implementation for pyhmmer.easel.Bitfield.

Changed#

  • Renamed TopHits.included and TopHits.reported properties to TopHits.hits_included and TopHits.hits_reported.

  • MSAFile and SequenceFile are now directly in digital mode if they are instantiated with digital=True.

  • SequenceFile.parse can now return a sequence in digital mode.

  • Reorganized tests to make them runnable from a site install.

Fixed#

  • Usage of memcpy in contexts where it may have had undefined behaviour.

  • VectorF.__eq__ crashing when comparing two empty objects.

  • SequenceFile and MSAFile not closing file handles when raising an error in __init__.

v0.4.11 - 2021-12-15#

Added#

  • plan7.HMMFile.read method to read a single plan7.HMM from a plan7.HMMFile (instead of using next).

  • closed property on easel.SequenceFile, easel.MSAFile and plan7.HMMFile to mark whether a file object is closed.

  • plan7.HMMFile.is_pressed method to check whether a HMM file has associated pressed data.

  • plan7.HMMFile.optimized_profiles method to read the plan7.OptimizedProfile entries in a plan7.HMMFile if there is associated pressed data available.

  • Getters for the name, accession, description, consensus, consensus_structure, evalue_parameters and cutoffs properties of a plan7.OptimizedProfile.

  • plan7.OptimizedProfile.__eq__ implementation to compare two optimized profiles.

  • __sizeof__ implementations for plan7.OptimizedProfile and plan7.Profile to get the allocated size of a profile.

Fixed#

  • Double-free caused by the Cython cycle breaking feature on several view types (easel.Randomness, easel.Vector, easel.Matrix, plan7.Cutoffs, plan7.EvalueParameters, plan7.Offsets, plan7.Trace).

  • plan7.Hit.description using the pointer to the accession string erroneously, causing occasional NULL dereference.

  • plan7.OptimizedProfile.copy performing a shallow copy instead of a deep copy as expected.

Changed#

  • pyhmmer.hmmer type annotations now explicitly support plan7.Profile or plan7.OptimizedProfile inputs where applicable.

v0.4.10 - 2021-12-06#

Added#

  • entropy and relative_entropy methods to easel.VectorF to compute the Shannon entropy of a vector and the Kullback-Leibler divergence of two vectors.

  • mean_match_entropy, mean_match_information and mean_match_relative_entropy methods to plan7.HMM to get information statistics of an HMM model.

  • match_occupancy method to plan7.HMM to compute the occupancy for each match state as an easel.VectorF.
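For reference, the two quantities named in the first item can be written out in plain Python. This sketch assumes probability vectors that sum to one and base-2 logarithms (results in bits); the base and the edge-case handling used by the library are not stated here, so treat both as assumptions.

```python
import math


def entropy(p):
    """Shannon entropy H(p) = -sum(p_i * log2(p_i)), in bits."""
    # Terms with zero probability contribute nothing to the sum.
    return -sum(x * math.log2(x) for x in p if x > 0.0)


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q) = sum(p_i * log2(p_i / q_i))."""
    total = 0.0
    for x, y in zip(p, q):
        if x > 0.0:
            if y == 0.0:
                # p puts mass where q has none: the divergence is infinite.
                return math.inf
            total += x * math.log2(x / y)
    return total
```

A uniform distribution over four symbols has an entropy of two bits, and the divergence of any distribution from itself is zero.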

Fixed#

  • plan7.Builder.build_msa using the gap-open and gap-extend probabilities instead of the MSA itself to compute the transition probabilities for the new HMM.

Changed#

  • plan7.Builder.build will now only load the score system once and reuse it unless a different score system is requested between calls.

v0.4.9 - 2021-11-11#

Added#

  • plan7.ScoreData class to store the substitution scores and maximal extensions for a long target search.

  • plan7.LongTargetsPipeline to run searches on targets longer than 100,000 residues.

  • Alphabet methods to check whether an Alphabet object is a DNA, RNA, nucleotide or protein alphabet.

  • window_length and window_beta arguments to plan7.Builder to set the max length of nucleotide HMMs created by builder objects.

Changed#

  • pyhmmer.hmmer.nhmmer now uses a LongTargetsPipeline instead of a Pipeline to search the target sequences.

  • pyhmmer.hmmer.nhmmer now supports HMM queries in addition to DigitalSequence and DigitalMSA queries.

  • pyhmmer.hmmer.phmmer now always assumes protein queries.

  • Z and domZ attributes of plan7.TopHits objects are now read-only.

Fixed#

  • nhmmer now uses DNA as the default alphabet instead of amino acid alphabet like it did before (#12).

v0.4.8 - 2021-10-27#

Added#

  • Constructor arguments and properties to plan7.Pipeline to support filtering top hits with bit score thresholds instead of E-value thresholds.

  • Support for creating a SequenceFile and an MSAFile using a Python file-like object instead of only supporting filenames.

  • Support for reading individual sequences from an MSA file with SequenceFile.

  • TextMSA.alignment to access the actual alignment as a tuple of strings.

  • Subtraction and division support for easel.Vector subclasses.

Changed#

  • plan7.Cutoffs now supports setting the bit score cutoffs, but requires both to be set or cleared at the same time.

  • easel.Vector will always allocate some memory when created manually to avoid having a special empty case in every vector method.

  • pyhmmer.easel.AllocationError now stores the size it failed to allocate, and the number of elements when allocating an array.

Fixed#

  • TextSequence.digitize will now raise a ValueError when the sequence contains invalid characters for the alphabet (previously was an UnexpectedError).

v0.4.7 - 2021-09-28#

Added#

  • TraceAligner, Trace and Traces classes to pyhmmer.plan7 to get tracebacks after aligning several sequences against an HMM.

  • pyhmmer.hmmalign function with the same features as the hmmalign binary from HMMER3.

  • Support for out-of-band pickling in easel.Vector and easel.Matrix.
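Out-of-band pickling (pickle protocol 5, Python 3.8+) lets large buffers travel separately from the pickle stream instead of being copied into it. The sketch below uses a plain `bytearray` as a stand-in for the memory of a vector or matrix; it shows the standard-library mechanism only and makes no claim about how the easel types implement it.

```python
import pickle

# Stand-in for the numerical data of a large vector (4096 bytes).
data = bytearray(range(256)) * 16

# In-band: the buffer contents are copied into the pickle stream.
in_band = pickle.dumps(pickle.PickleBuffer(data), protocol=5)

# Out-of-band: the stream only records that a buffer is expected, and
# the buffer itself is handed to the callback without being copied.
buffers = []
out_of_band = pickle.dumps(
    pickle.PickleBuffer(data), protocol=5, buffer_callback=buffers.append
)

# The receiver must supply the same buffers, in the same order.
restored = pickle.loads(out_of_band, buffers=buffers)
```

The out-of-band stream stays small regardless of the buffer size, which matters when the buffers can be transferred through shared memory or another zero-copy channel.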

Changed#

  • Allow creating an empty Vector or Matrix by calling their constructor without arguments.

Fixed#

  • Potential unreported exceptions in plan7.OptimizedProfile.write and several plan7.SSIWriter methods.

v0.4.6 - 2021-09-10#

Added#

  • pickle protocol for easel.Alphabet, easel.Bitfield, easel.KeyHash, easel.Vector, easel.Matrix and plan7.HMM.

  • taxonomy_id and residue_markups properties to easel.Sequence.

  • sum_score property to plan7.Hit.

  • plan7.EvalueParameters class to expose the e-value parameters of a plan7.HMM or a plan7.Profile.

  • Equality checks and slicing for easel.Matrix and easel.Vector.

  • Support for creating and manipulating zero-sized easel matrices and vectors.

  • plan7.Cutoffs class to expose the Pfam score cutoffs of a plan7.HMM or a plan7.Profile.

  • Keyword arguments to configure E-value thresholds when creating a plan7.Pipeline object.

  • Support for using model-specific thresholding options in plan7.Pipeline.

Changed#

  • Use the replace error handler when decoding error messages to skip potential decoding issues when already building an exception.

  • Improve pyhmmer.hmmer to ensure background threads exit on a KeyboardInterrupt.

  • easel.VectorU8.__eq__ accepts any object implementing the buffer protocol.

  • plan7.HMM.creation_time now takes and returns a datetime.datetime object, assuming the field is only ever set with asctime.

  • Refactor easel.Vector and easel.Matrix and mark exposed memory as C-contiguous.
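The `asctime` assumption behind the creation_time change can be sketched with the standard library. The helper names are hypothetical, and the format string assumes the default C locale, since `%a` and `%b` are locale-dependent.

```python
import time
from datetime import datetime

# Layout produced by asctime, e.g. "Mon Sep  6 12:30:00 2021".
ASCTIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_creation_time(text: str) -> datetime:
    """Parse an asctime-formatted string into a datetime object."""
    return datetime.strptime(text, ASCTIME_FORMAT)


def format_creation_time(moment: datetime) -> str:
    """Format a datetime the way asctime does (space-padded day)."""
    return time.asctime(moment.timetuple())
```

Formatting goes through `time.asctime` rather than `strftime` because `asctime` pads single-digit days with a space, whereas `%d` pads with a zero.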

Fixed#

  • easel.Alphabet not reporting potential allocation errors.

  • Potential buffer overflow in easel.Matrix and easel.Vector when calling __init__ more than once.

v0.4.5 - 2021-07-19#

Added#

  • OptimizedProfile.convert method to configure an optimized profile from a Profile without reallocating a new P7_OPROFILE struct.

Changed#

  • Rewrite the plan7.Pipeline search loop to avoid reacquiring the GIL between reference sequences.

  • Require the reference sequences to be stored in a collection (instead of an iterable) when passing them to the search_hmm, search_msa and search_seq methods of plan7.Pipeline.

  • Avoid reallocating a new OptimizedProfile every time a new HMM is passed to Pipeline.search_hmm.

  • Relax the GIL while sorting and thresholding TopHits in Pipeline search methods.

v0.4.4 - 2021-07-07#

Added#

  • ignore_gaps parameter to pyhmmer.plan7.SequenceFile, allowing to skip the gap characters when reading a sequence from an ungapped format.

  • __sizeof__ implementation for some

  • Dedicated check for sequence length before running the platform-specific code in pyhmmer.plan7.Pipeline.

Fixed#

  • Score system not being set in pyhmmer.plan7.Builder.build_msa.

  • Alphabet not being checked after the first sequence in Pipeline search and scan methods.

v0.4.3 - 2021-07-03#

Fixed#

  • File object wrappers not reporting exceptions raised when seeking on OSX/BSD platforms.

v0.4.2 - 2021-06-20#

Added#

  • pyhmmer.easel.Randomness class exposing a deterministic random number generator.

  • pyhmmer.plan7.Builder.randomness and pyhmmer.plan7.Pipeline.randomness attributes exposing the internal random number generator used by each object.

  • pyhmmer.plan7.Hit.best_domain property mapping to the highest scoring domain of a hit.

  • pyhmmer.plan7.OptimizedProfile.rbv property exposing match scores.

  • pyhmmer.plan7.Domain.pvalue and pyhmmer.plan7.Hit.pvalue reporting the p-value for a domain or hit bitscore.

Fixed#

  • Dimensions of the pyhmmer.plan7.OptimizedProfile.sbv matrix not being properly set.

v0.4.1 - 2021-06-06#

Fixed#

  • Main buffer not being freed in MatrixF.__dealloc__ and MatrixU8.__dealloc__ when created without owner.

Added#

  • Additional configuration values for pyhmmer.plan7.Pipeline as both constructor arguments and mutable properties.

  • consensus, consensus_structure and offsets properties to pyhmmer.plan7.Profile objects.

Changed#

  • Make OptimizedProfile.ssv_filter check the alphabet of the given sequence.

v0.4.0 - 2021-06-05 - YANKED#

Added#

  • Linear algebra primitives to expose 1D (Vector) and 2D (Matrix) contiguous buffers containing numerical values to pyhmmer.easel.

  • Documentation for the Z and domZ parameters of the pyhmmer.plan7.Pipeline constructor.

  • pyhmmer.errors.AlphabetMismatch exception deriving from ValueError to specifically report mismatching Easel alphabets where applicable.

  • scale and normalize methods to pyhmmer.plan7.HMM objects.

  • Property to access pyhmmer.plan7.Background residue frequencies as a VectorF object.

  • Property to access pyhmmer.plan7.HMM mean residue composition as a VectorF object.

  • Property to access pyhmmer.plan7.HMM probabilities and emissions as MatrixF objects.

  • ssv_filter methods to pyhmmer.plan7.OptimizedProfile to get the SSV filter score of the profile for a given sequence.

  • Several additional properties to access the pyhmmer.plan7.OptimizedProfile internals.

Removed#

  • Unused report_e parameter of pyhmmer.plan7.Pipeline constructor.

  • pyhmmer.plan7.TopHits.clear method which could lead to segfault if it was called while a Hit is being held.

Changed#

  • Multithreaded loop in pyhmmer.hmmer to reduce memory consumption while still yielding hits in order.

  • pyhmmer.easel.DigitalSequence.sequence property is now a VectorU8.

Fixed#

  • Type annotations in pyhmmer.hmmer.

  • Potential double free in pyhmmer.plan7.HMM.command_line property setter.

  • Minor floating-point precision issues in pyhmmer.plan7.Builder constructor.

  • Segfault in TextMSA.digitize caused by esl_msa_Copy not digitizing on-the-fly like esl_sq_Copy.

  • Exceptions not being raised in some methods of pyhmmer.plan7.Profile and pyhmmer.plan7.TopHits.

v0.3.1 - 2021-05-08#

Added#

  • Pipeline.scan_seq method to query a database of profiles with one or more sequences.

  • transition_probabilities, match_emissions, insert_emissions properties to the HMM class, providing access to the numerical parameters of the HMM.

  • consensus_structure and consensus_accessibility properties to the HMM class to get consensus lines from the source alignment if the HMM was created from a MSA.

  • nseq and nseq_effective properties to the HMM class to get the number of training sequences and effective sequences used to build the HMM.

Changed#

  • HMM.checksum is now None if the p7H_CHKSUM flag is not set.

  • Builder methods will now record sys.argv when creating a HMM.

Fixed#

  • HMM.write(..., binary=False) crashing on HMMs without a consensus line (#5). Fixed upstream in EddyRivasLab/HMMER#236.

  • Pipeline.reset mishandling the Z and domZ values if those were detected from the number of targets.

  • pyhmmer.hmmer functions no longer block until all results have been collected when run in multithreaded mode.

v0.3.0 - 2021-03-11#

Added#

  • easel.MSAFile to read from a file containing

  • accession, author, name and description properties to easel.MSA objects.

  • plan7.Builder.build_msa to build a pHMM from a sequence alignment.

  • Additional methods to easel.KeyHash, allowing to use it as a dict/set hybrid.

  • Sequence.write and MSA.write methods to format a sequence or an alignment to a file handle.

  • plan7.TopHits.to_msa method to convert all the top hits of a query against a database into a multiple sequence alignment.

  • easel.MSA.sequences attribute to access individual sequences of an alignment using the collections.abc.Sequence interface.

  • easel.DigitalMSA.textize method to convert a multiple sequence alignment in digital mode to its text-mode counterpart.

  • Read-only name, accession and description properties to plan7.Profile showing attributes inherited from the HMM it was configured with.

  • plan7.HMM.consensus property, allowing to access the consensus sequence of a pHMM.

  • plan7.HMM equality implementation, using zero tolerance.

  • plan7.Pipeline.search_msa to query a MSA against a sequence database.

  • easel.Sequence.reverse_complement method allowing to reverse-complement inplace or to build a copy.

  • errors.AlphabetMismatch exception for use in cases where an alphabet is expected but not matched by the input.

  • hmmer.nhmmer function with the same behaviour as hmmer.phmmer, except it expects inputs with a DNA alphabet.

Fixed#

  • plan7.Builder.copy not copying some parameters correctly, causing pyhmmer.hmmer.phmmer to give inconsistent results in multithreaded mode.

  • easel.Bitfield not properly handling index overflows.

  • Documentation not rendering for the __init__ method of all classes.

Changed#

  • plan7.Builder gap-open and gap-extend probabilities are now set on instantiation and depend on the alphabet type.

  • Constructors for easel.TextMSA and easel.DigitalMSA, which can now be given an iterable of easel.Sequence objects to store in the alignment.

Removed#

  • Unimplemented easel.SequenceFile.fetch and easel.SequenceFile.fetchinto methods.

v0.2.2 - 2021-03-04#

Fixed#

  • Linking issues on OSX caused by aggressive stripping of intermediate libraries.

  • plan7.Builder RNG not reseeding between different HMMs.

v0.2.1 - 2021-01-29#

Added#

  • pyhmmer.plan7.HMM.checksum property to get the 32-bit checksum of an HMM.

v0.2.0 - 2021-01-21#

Added#

  • pyhmmer.plan7.Builder class to handle building a HMM from a sequence.

  • Pipeline.search_seq to query a sequence against a sequence database.

  • psutil dependency to detect the most efficient thread count for hmmsearch based on the number of physical CPUs.

  • pyhmmer.hmmer.phmmer function to run a search of query sequences against a sequence database.

Changed#

  • Pipeline.search was renamed to Pipeline.search_hmm for disambiguation.

  • libeasel.random sequences do not require the GIL anymore.

  • Public API now has proper signature annotations.

Fixed#

  • Inaccurate exception messages in Pipeline.search_hmm.

  • Unneeded RNG reallocation, replaced with re-initialisation where possible.

  • SequenceFile.__next__ not working after being set in digital mode.

  • sequences argument of hmmsearch now only requires a typing.Collection[DigitalSequence] instead of a typing.Collection[Sequence] (no more __getitem__ needed).

Removed#

  • hits argument to Pipeline.search_hmm to reduce risk of issues with TopHits reuse.

  • Broken alignment coordinates on Domain classes.

v0.1.4 - 2021-01-15#

Added#

  • DigitalSequence.textize to convert a digital sequence to a text sequence.

  • DigitalSequence.__init__ method allowing a digital sequence to be created from any object implementing the buffer protocol.

  • Alignment.hmm_accession property to retrieve the accession of the HMM in an alignment.

v0.1.3 - 2021-01-08#

Fixed#

  • Compilation issues in OSX-specific Cython code.

v0.1.2 - 2021-01-07#

Fixed#

  • Required Cython files not being included in source distribution.

v0.1.1 - 2020-12-02#

Fixed#

  • HMMFile calling file.peek without arguments, causing it to crash when passed some types, e.g. gzip.GzipFile.

  • HMMFile failing to work with PyPy file objects because of a bug with their implementation of readinto.

  • C/Python file object implementation using strcpy instead of memcpy, causing issues when null bytes were read.
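
The `peek` issue above can be reproduced with the standard library alone: `io.BufferedReader.peek` has a default size, whereas `gzip.GzipFile.peek` requires an explicit argument. The sketch below is illustrative only; the helper `peek_prefix` and the payload bytes are made up for the example and do not reflect how HMMFile is implemented.

```python
import gzip
import io


def peek_prefix(fileobj, size=8):
    """Return up to `size` leading bytes without consuming them (hypothetical helper)."""
    # Always pass an explicit size so this works with both
    # io.BufferedReader and gzip.GzipFile objects.
    return fileobj.peek(size)[:size]


payload = gzip.compress(b"example payload\n")  # made-up content

with gzip.GzipFile(fileobj=io.BytesIO(payload)) as gz:
    try:
        gz.peek()  # no argument: GzipFile.peek requires one
        failed = False
    except TypeError:
        failed = True

    prefix = peek_prefix(gz, 7)
    first_line = gz.readline()  # peeking did not consume any data
```

Slicing the result is needed because `peek` may return more bytes than requested.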

v0.1.0 - 2020-12-01#

Initial beta release.

Fixed#

  • TextSequence uses the sequence argument it’s given on instantiation.

  • Segmentation fault in Sequence.__eq__ caused by implicit type conversion.

  • Segmentation fault on SequenceFile.read failure.

  • Missing type annotations for the pyhmmer.easel module.

v0.1.0-a5 - 2020-11-28#

Added#

  • Sequence.__len__ magic method so that len(seq) returns the number of letters in seq.

  • Python file-handle support when opening a pyhmmer.plan7.HMMFile.

  • Context manager protocol to pyhmmer.easel.SSIWriter.

  • Type annotations for pyhmmer.easel.SSIWriter.

  • add_alias to pyhmmer.easel.SSIWriter.

  • write method to pyhmmer.plan7.OptimizedProfile to write an optimized profile in binary format.

  • offsets property to interact with the disk offsets of a pyhmmer.plan7.OptimizedProfile instance.

  • pyhmmer.hmmer.hmmpress emulating the hmmpress binary from HMMER.

  • M property to pyhmmer.plan7.HMM exposing the number of nodes in the model.

Changed#

  • Bumped vendored Easel to v0.48.

  • Bumped vendored HMMER to v3.3.2.

  • pyhmmer.plan7.HMMFile will raise an EOFError when given an empty file.

  • Renamed length property to L in pyhmmer.plan7.Background.

Fixed#

  • Segmentation fault when close method of pyhmmer.easel.SSIWriter was called more than once.

  • close method of pyhmmer.easel.SSIWriter not writing the index contents.

v0.1.0-a4 - 2020-11-24#

Added#

  • MSA, TextMSA and DigitalMSA classes representing a multiple sequence alignment to pyhmmer.easel.

  • Methods and protocol to copy a Sequence and a MSA.

  • pyhmmer.plan7.OptimizedProfile wrapping a platform-specific optimized profile.

  • SSIReader and SSIWriter classes interacting with sequence/subsequence indices to pyhmmer.easel.

  • Exception handler using Python exceptions to report Easel errors.

Changed#

  • pyhmmer.hmmsearch returns an iterator of TopHits, with one instance per HMM in the input.

  • pyhmmer.hmmsearch properly raises errors happening in the background threads without deadlock.

  • pyhmmer.plan7.Pipeline recycles memory between Pipeline.search calls.

Fixed#

  • Missing type annotations for the pyhmmer.errors module.

Removed#

  • Unneeded or private methods from pyhmmer.plan7.

v0.1.0-a3 - 2020-11-19#

Added#

  • TextSequence and DigitalSequence representing a Sequence in a given mode.

  • E-value properties to Hit and Domain.

  • TopHits now stores a reference to the pipeline it was obtained from.

  • Pipeline.Z and Pipeline.domZ properties.

  • Experimental pickling support to Alphabet.

  • Experimental freelist to Sequence class to avoid allocation bottlenecks when iterating on a SequenceFile without recycling sequence buffers.

Changed#

  • Made Sequence an abstract base class.

  • Additional Pipeline parameters can be passed as keyword arguments to pyhmmer.hmmsearch.

  • SequenceFile.read can now be configured to skip reading the metadata or the content of a sequence.

Removed#

  • Redundant SequenceFile methods.

Fixed#

  • doctest loader crashing on Python 3.5.

  • TopHits.threshold segfaulting when being called without prior TopHits.sort call.

  • Unknown format argument to SequenceFile constructor not raising the right error.

v0.1.0-a2 - 2020-11-12#

Added#

  • Support for compilation on PowerPC big-endian platforms.

  • Type annotations and stub files for Cython modules.

Changed#

  • distutils is now used to compile the package, instead of calling autotools and letting HMMER configure itself.

  • Bitfield.count now allows passing an argument (for compatibility with collections.abc.Sequence).
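
As a rough illustration of the `collections.abc.Sequence` compatibility mentioned above: the abstract base class defines `count(value)`, so a compatible container needs a `count` method that accepts a value. The class `TinyBitfield` below is a hypothetical pure-Python stand-in, not the actual easel.Bitfield, and the default of `True` is an assumption made for this sketch.

```python
from collections.abc import Sequence


class TinyBitfield(Sequence):
    """Hypothetical list-backed bitfield used only for illustration."""

    def __init__(self, bits):
        self._bits = [bool(b) for b in bits]

    def __len__(self):
        return len(self._bits)

    def __getitem__(self, index):
        return self._bits[index]

    def count(self, value=True):
        # Accepting `value` matches the Sequence.count(value) signature;
        # the default (assumed here) counts the bits that are set.
        return sum(1 for bit in self._bits if bit == bool(value))


bits = TinyBitfield([1, 0, 1, 1])
print(bits.count(), bits.count(False))
```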

v0.1.0-a1 - 2020-11-10#

Initial alpha release (test deployment to PyPI).