Plan7

High-level interface to the Plan7 data model.

Plan7 is the model architecture used by HMMER since HMMER2.

See also

Details about the Plan 7 architecture in the HMMER documentation.

Alignment

class pyhmmer.plan7.Alignment

A single alignment of a sequence to a profile.

hmm_accession

The accession of the query, or its name if it has none.

Type

bytes

hmm_from

The start coordinate of the alignment in the query HMM.

Type

int

hmm_name

The name of the query HMM.

Type

bytes

hmm_sequence

The sequence of the query HMM in the alignment.

Type

str

hmm_to

The end coordinate of the alignment in the query HMM.

Type

int

identity_sequence

The identity sequence between the query and the target.

Type

str

target_from

The start coordinate of the alignment in the target sequence.

Type

int

target_name

The name of the target sequence.

Type

bytes

target_sequence

The sequence of the target sequence in the alignment.

Type

str

target_to

The end coordinate of the alignment in the target sequence.

Type

int

Background Model

class pyhmmer.plan7.Background

The null background model of HMMER.

copy()

Create a copy of the null model with the same parameters.

L

The mean of the null model length distribution, in residues.

Type

int

Builder

class pyhmmer.plan7.Builder

A factory for constructing new HMMs from raw sequences.

build(sequence, background, popen=0.02, pextend=0.4)

Build a new HMM from sequence using the builder configuration.

Parameters
  • sequence (DigitalSequence) – A single biological sequence in digital mode to build a HMM with.

  • background (pyhmmer.plan7.background) – The background model to use to create the HMM.

  • popen (float) – The gap open probability to use with the score system. Defaults to 0.02.

  • pextend (float) – The gap extend probability to use with the score system. Defaults to 0.4.

Raises

ValueError – When either sequence or background have the wrong alphabet for this builder.

copy()

Create a duplicate Builder instance with the same arguments.

seed

The seed used by the internal random number generator.

Setting the seed will effectively reinitialize the internal RNG. In the special case the seed is 0, a one-time arbitrary seed will be chosen and the RNG will no be reseeded for reproducibility.

Type

int

Domains

class pyhmmer.plan7.Domain

A single domain in a query Hit.

c_evalue

The independent e-value for the domain.

Type

float

i_evalue

The independent e-value for the domain.

Type

float

score

The overall score in bits, null-corrected.

Type

float

class pyhmmer.plan7.Domains

A sequence of domains corresponding to a single Hit.

Hits

class pyhmmer.plan7.Hit

A high-scoring database hit found by the comparison pipeline.

accession

The accession of the database hit, if any.

Type

bytes or None

description

The description of the database hit, if any.

Type

bytes or None

domains

The list of domains aligned to this hit.

Type

Domains

evalue

The e-value of the hit.

Type

float

name

The name of the database hit.

Type

bytes

pre_score

Bit score of the sequence before null2 correction.

Type

float

score

Bit score of the sequence with all domains after correction.

Type

float

class pyhmmer.plan7.TopHits

A ranked list of top-scoring hits.

TopHits are thresholded using the parameters from the pipeline, and are sorted by key when you obtain them from a Pipeline instance:

>>> abc = thioesterase.alphabet
>>> hits = Pipeline(abc).search_hmm(thioesterase, proteins)
>>> hits.is_sorted()
True

Use len to query the number of top hits, and the usual indexing notation to extract a particular Hit:

>>> len(hits)
1
>>> hits[0].name
b'938293.PRJEB85.HG003687_113'
clear()

Free internals to allow reusing for a new pipeline run.

is_sorted(by='key')

Check whether or not the hits are sorted with the given method.

See sort for a list of allowed values for the by argument.

sort(by='key')

Sort hits in the current instance using the given method.

Parameters

by (str) – The comparison method to use to compare hits. Allowed values are: key (the default) to sort by key, or seqidx to sort by sequence index and alignment position.

HMM

class pyhmmer.plan7.HMM

A data structure storing the Plan7 Hidden Markov Model.

write(fh, binary=False)

Write the HMM to a file handle.

Parameters
  • fh (io.IOBase) – A Python file handle, opened in binary mode (this must be the case even with binary=False, since the C code will emit bytes in either case).

  • binary (bool) – Pass False to emit the file in ASCII mode using the latest supported HMMER format, or True to use the binary HMMER3 format.

zero()

Set all parameters to zero, including model composition.

M

The length of the model (i.e. the number of nodes).

Type

int

accession

The accession of the HMM, if any.

Type

bytes or None

description

The description of the HMM, if any.

Type

bytes or None

name

The name of the HMM, if any.

Type

bytes or None

HMM File

class pyhmmer.plan7.HMMFile

A wrapper around a file (or database), storing serialized HMMs.

close()

Close the HMM file and free resources.

This method has no effect if the file is already closed. It called automatically if the HMMFile was used in a context:

>>> with HMMFile("tests/data/hmms/bin/PKSI-AT.h3m") as hmm_file:
...     hmm = next(hmm_file)

Pipeline

class pyhmmer.plan7.Pipeline

An HMMER3 accelerated sequence/profile comparison pipeline.

clear()

Reset the pipeline configuration to its default state.

search_hmm(query, sequences)

Run the pipeline using a query HMM against a sequence database.

Parameters
  • query (HMM) – The HMM object to use to query the sequence database.

  • sequences (iterable of DigitalSequence) – The sequences to query with the HMM. For instance, pass a SequenceFile in digital mode to read from disk iteratively.

Returns

TopHits – the hits found in the sequence database.

Raises

ValueError – When the alphabet of the current pipeline does not match the alphabet of the given HMM.

search_seq(query, sequences, builder=None)

Run the pipeline using a query sequence against a sequence database.

Parameters
  • query (DigitalSequence) – The sequence object to use to query the sequence database.

  • sequences (iterable of DigitalSequence) – The sequences to query. Pass a SequenceFile instance in digital mode to read from disk iteratively.

  • builder (Builder, optional) – A HMM builder to use to convert the query to a HMM. If None is given, it will use a default one.

Returns

TopHits – the hits found in the sequence database.

Raises

ValueError – When the alphabet of the current pipeline does not match the alphabet of the given query.

Z

The number of effective targets searched.

It is used to compute the independent e-value for each domain, and for an entire hit. If None, the parameter number will be set automatically after all the comparisons have been done. Otherwise, it can be set to an arbitrary number.

Type

float or None

domZ

The number of significant targets.

It is used to compute the conditional e-value for each domain. If None, the parameter number will be set automatically after all the comparisons have been done, and all hits have been thresholded. Otherwise, it can be set to an arbitrary number.

Type

float or None

seed

The seed used by the internal random number generator.

Type

int

Profile

class pyhmmer.plan7.Profile

A Plan7 search profile.

clear()

Clear internal buffers to reuse the profile without reallocation.

configure(hmm, background, L, multihit=True, local=True)

Configure a search profile using the given models.

Parameters
  • hmm (pyhmmer.plan7.HMM) – The model HMM with core probabilities.

  • bg (pyhmmer.plan7.Background) – The null background model.

  • L (int) – The expected target sequence length.

  • multihit (bool) – Whether or not to use multihit modes.

  • local (bool) – Whether or not to use non-local modes.

copy()

Return a copy of the profile with the exact same configuration.

is_local()

Return whether or not the profile is in a local alignment mode.

is_multihit()

Returns whether or not the profile is in a multihit alignment mode.

optimized()

Convert the profile to a platform-specific optimized profile.

Returns

OptimizedProfile – The platform-specific optimized profile built using the configuration of this profile.

class pyhmmer.plan7.OptimizedProfile

An optimized profile that uses platform-specific instructions.

copy()

Create an exact copy of the optimized profile.

is_local()

Return whether or not the profile is in a local alignment mode.

write(fh_filter, fh_profile)

Write an optimized profile to two separate files.

HMMER implements an acceleration pipeline using several scoring algorithms. Parameters for MSV (the Multi ungapped Segment Viterbi) are saved independently to the fh_filter handle, while the rest of the profile is saved to fh_profile.