Plan7¶
High-level interface to the Plan7 data model.
Plan7 is the model architecture used by HMMER since HMMER2.
See also
Details about the Plan 7 architecture in the HMMER documentation.
Alignment¶
Background Model¶
-
class
pyhmmer.plan7.
Background
¶ The null background model of HMMER.
-
__init__
(alphabet, uniform=False)¶ Create a new background model for the given
alphabet
.- Parameters
alphabet (
pyhmmer.easel.Alphabet
) – The alphabet to create the background model with.uniform (
bool
) – Whether or not to create the null model with uniform frequencies. Defaults toFalse
.
-
copy
()¶ Create a copy of the null model with the same parameters.
-
Builder¶
-
class
pyhmmer.plan7.
Builder
¶ A factory for constructing new HMMs from raw sequences.
New in version 0.2.0.
-
__init__
(alphabet, *, architecture='fast', weighting='pb', effective_number='entropy', prior_scheme='alpha', symfrac=0.5, fragthresh=0.5, wid=0.62, esigma=45.0, eid=0.62, EmL=200, EmN=200, EvL=200, EvN=200, EfL=100, EfN=200, Eft=0.04, seed=42, ere=None, popen=None, pextend=None)¶ Create a new sequence builder with the given configuration.
- Parameters
alphabet (
Alphabet
) – The alphabet the builder expects the sequences to be in.- Keyword Arguments
-
build
(sequence, background)¶ Build a new HMM from
sequence
using the builder configuration.- Parameters
sequence (
DigitalSequence
) – A single biological sequence in digital mode to build a HMM with.background (
pyhmmer.plan7.background
) – The background model to use to create the HMM.
- Raises
ValueError – When either
sequence
orbackground
have the wrong alphabet for this builder.
-
build_msa
(msa, background)¶ Build a new HMM from
msa
using the builder configuration.- Parameters
msa (
DigitalMSA
) – A multiple sequence alignment in digital mode to build a HMM with.background (
pyhmmer.plan7.background
) – The background model to use to create the HMM.
- Raises
ValueError – When either
msa
orbackground
have the wrong alphabet for this builder.
New in version 0.3.0.
-
Hits¶
-
class
pyhmmer.plan7.
Hit
¶ A high-scoring database hit found by the comparison pipeline.
-
class
pyhmmer.plan7.
TopHits
¶ A ranked list of top-scoring hits.
TopHits
are thresholded using the parameters from the pipeline, and are sorted by key when you obtain them from aPipeline
instance:>>> abc = thioesterase.alphabet >>> hits = Pipeline(abc).search_hmm(thioesterase, proteins) >>> hits.is_sorted() True
Use
len
to query the number of top hits, and the usual indexing notation to extract a particularHit
:>>> len(hits) 1 >>> hits[0].name b'938293.PRJEB85.HG003687_113'
-
clear
()¶ Free internals to allow reusing for a new pipeline run.
-
is_sorted
(by='key')¶ Check whether or not the hits are sorted with the given method.
See
sort
for a list of allowed values for theby
argument.
-
sort
(by='key')¶ Sort hits in the current instance using the given method.
- Parameters
by (
str
) – The comparison method to use to compare hits. Allowed values are:key
(the default) to sort by key, orseqidx
to sort by sequence index and alignment position.
-
to_msa
(alphabet, trim=False, digitize=False, all_consensus_cols=False)¶ Create multiple alignment of all included domains.
- Parameters
alphabet (
Alphabet
) – The alphabet of the HMM thisTopHits
was obtained from. It is required to convert back hits to single sequences.trim (
bool
) – Trim off any residues that get assigned to flanking \(N\) and \(C\) states (in profile traces) or \(I_0\) and \(I_m\) (in core traces).digitize (
bool
) – If set toTrue
, returns aDigitalMSA
instead of aTextMSA
.all_consensus_cols (
bool
) – Force a column to be created for every consensus column in the model, even if it means having all gap character in a column.
- Returns
MSA
– A multiple sequence alignment containing the reported hits, either aTextMSA
or aDigitalMSA
depending on the value of thedigitize
argument.
New in version 0.3.0.
-
HMM¶
-
class
pyhmmer.plan7.
HMM
¶ A data structure storing the Plan7 Hidden Markov Model.
-
__init__
(M, alphabet)¶ Create a new HMM from scratch.
-
copy
()¶ Return a copy of the HMM with the exact same configuration.
New in version 0.3.0.
-
write
(fh, binary=False)¶ Write the HMM to a file handle.
- Parameters
-
zero
()¶ Set all parameters to zero, including model composition.
-
checksum
¶ The 32-bit checksum of the HMM, if any.
The checksum if calculated from the alignment the HMM was created from, and was introduced in more recent HMM formats. This means some
HMM
objects may have a non-None
checksum.New in version 0.2.1.
Changed in version 0.3.1: Returns
None
if the HMM flag for the checksum is not set.
-
command_line
¶ The command line that built the model.
For HMMs created with
Builder
, this defaults tosys.argv
. It can however be set to any string, including multiline to show successive commands.Example
>>> print(thioesterase.command_line) hmmbuild Thioesterase.hmm Thioesterase.fa hmmcalibrate Thioesterase.hmm
New in version 0.3.1.
-
consensus_accessibility
¶ The consensus accessibility of the HMM, if any.
New in version 0.3.1.
-
consensus_structure
¶ The consensus structure of the HMM, if any.
New in version 0.3.1.
-
insert_emissions
¶ The insert emissions of the model.
The returned memoryview exposes a matrix of dimensions \((M, K)\), with one row per node and one column per alphabet symbol.
Hint
Use
numpy.asarray
to convert the memoryview to a 2D aray:>>> i = thioesterase.insert_emissions >>> numpy.asarray(i).reshape((thioesterase.M, thioesterase.alphabet.K)) array([[...]], dtype=float32)
New in version 0.3.1.
- Type
memoryview
offloat
-
match_emissions
¶ The match emissions of the model.
The returned memory view exposes a matrix of dimensions \((M, K)\), with one row per node, and one column per alphabet symbol.
Hint
Use
numpy.asarray
to convert the memory view to a 2D array:>>> m = thioesterase.match_emissions >>> numpy.asarray(m).reshape((thioesterase.M, thioesterase.alphabet.K)) array([[...]], dtype=float32)
New in version 0.3.1.
- Type
memoryview
offloat
-
nseq
¶ The number of training sequences used, if any.
If the HMM was created from a multiple sequence alignment, this corresponds to the number of sequences in the MSA.
Example
>>> thioesterase.nseq 278
New in version 0.3.1.
-
nseq_effective
¶ The number of effective sequences used, if any.
New in version 0.3.1.
-
reference
¶ The reference line from the alignment, if any.
This is relevant if the HMM was built from a multiple sequence alignment (e.g. by
Builder.build_msa
, or by an externalhmmbuild
pipeline run).New in version 0.3.1.
-
transition_probabilities
¶ The transition probabilities of the model.
The returned memory view exposes a matrix of dimensions \((M+1, 7)\), with one row per node (plus one extra row for the entry probabilities), and one column per transition.
Hint
Use
numpy.asarray
to convert the memory view to a 2D array:>>> t = thioesterase.transition_probabilities >>> numpy.asarray(t).reshape((thioesterase.M+1, 7)) array([[...]], dtype=float32)
New in version 0.3.1.
- Type
memoryview
offloat
-
HMM File¶
Pipeline¶
-
class
pyhmmer.plan7.
Pipeline
¶ An HMMER3 accelerated sequence/profile comparison pipeline.
-
__init__
(alphabet, background=None, *, bias_filter=True, report_e=10.0, null2=True, seed=42, Z=None, domZ=None)¶ Instantiate and configure a new accelerated comparison pipeline.
- Parameters
alphabet (
Alphabet
) – The biological alphabet the of the HMMs and sequences that are going to be compared. Used to build the background model.background (
Background
, optional) – The background model to use with the pipeline, orNone
to create and use a default one. The pipeline needs ownership of the background model, so any background model passed there will be copied.
- Keyword Arguments
bias_filter (
bool
) – Whether or not to enable composition bias filter. Defaults toTrue
.null2 (
bool
) – Whether or not to compute biased composition score corrections. Defaults toTrue
.report_e (
float
) – Report hits with e-value lower than or equal to this threshold in output. Defaults to10.0
.seed (
int
, optional) – The seed to use with the random number generator. Pass 0 to use a one-time arbitrary seed, orNone
to keep the default seed from HMMER.
-
clear
()¶ Reset the pipeline configuration to its default state.
-
scan_seq
(query, hmms)¶ Run the pipeline using a query sequence against a profile database.
- Parameters
query (
DigitalSequence
) – The sequence object to use to query the profile database.hmms (iterable of
DigitalSequence
) – The HMM profiles to query. Pass aHMMFile
instance to read from disk iteratively.
- Returns
TopHits
– the hits found in the profile database.- Raises
AlphabetMismatch – When the alphabet of the current pipeline does not match the alphabet of the given query or profile.
Caution
In the current version, this method is not optimized to use the pressed database, even if it exists. This will cause the MSV and SSV filters to be rebuilt at each iteration, which could be slow. Consider at least pre-fetching the HMM database if calling this method several times in a row.
New in version v0.4.0.
-
search_hmm
(query, sequences)¶ Run the pipeline using a query HMM against a sequence database.
- Parameters
query (
HMM
) – The HMM object to use to query the sequence database.sequences (iterable of
DigitalSequence
) – The sequences to query with the HMM. For instance, pass aSequenceFile
in digital mode to read from disk iteratively.
- Returns
TopHits
– the hits found in the sequence database.- Raises
AlphabetMismatch – When the alphabet of the current pipeline does not match the alphabet of the given HMM.
New in version 0.2.0.
-
search_msa
(query, sequences, builder=None)¶ Run the pipeline using a query alignment against a sequence database.
- Parameters
query (
DigitalMSA
) – The multiple sequence alignment to use to query the sequence database.sequences (iterable of
DigitalSequence
) – The sequences to query. Pass aSequencesFile
instance in digital mode to read from disk iteratively.builder (
Builder
, optional) – A HMM builder to use to convert the query to aHMM
. IfNone
is given, it will use a default one.
- Returns
TopHits
– the hits found in the sequence database.- Raises
AlphabetMismatch – When the alphabet of the current pipeline does not match the alphabet of the given query.
New in version 0.3.0.
-
search_seq
(query, sequences, builder=None)¶ Run the pipeline using a query sequence against a sequence database.
- Parameters
query (
DigitalSequence
) – The sequence object to use to query the sequence database.sequences (iterable of
DigitalSequence
) – The sequences to query. Pass aSequenceFile
instance in digital mode to read from disk iteratively.builder (
Builder
, optional) – A HMM builder to use to convert the query to aHMM
. IfNone
is given, it will use a default one.
- Returns
TopHits
– the hits found in the sequence database.- Raises
AlphabetMismatch – When the alphabet of the current pipeline does not match the alphabet of the given query.
New in version 0.2.0.
-
Z
¶ The number of effective targets searched.
It is used to compute the independent e-value for each domain, and for an entire hit. If
None
, the parameter number will be set automatically after all the comparisons have been done. Otherwise, it can be set to an arbitrary number.
-
Profile¶
-
class
pyhmmer.plan7.
Profile
¶ A Plan7 search profile.
-
__init__
(M, alphabet)¶ Create a new profile for the given
alphabet
.
-
clear
()¶ Clear internal buffers to reuse the profile without reallocation.
-
configure
(hmm, background, L, multihit=True, local=True)¶ Configure a search profile using the given models.
- Parameters
hmm (
pyhmmer.plan7.HMM
) – The model HMM with core probabilities.bg (
pyhmmer.plan7.Background
) – The null background model.L (
int
) – The expected target sequence length.multihit (
bool
) – Whether or not to use multihit modes.local (
bool
) – Whether or not to use non-local modes.
-
copy
()¶ Return a copy of the profile with the exact same configuration.
-
is_local
()¶ Return whether or not the profile is in a local alignment mode.
-
is_multihit
()¶ Returns whether or not the profile is in a multihit alignment mode.
-
optimized
()¶ Convert the profile to a platform-specific optimized profile.
- Returns
OptimizedProfile
– The platform-specific optimized profile built using the configuration of this profile.
-
-
class
pyhmmer.plan7.
OptimizedProfile
¶ An optimized profile that uses platform-specific instructions.
-
__init__
(M, alphabet)¶ Create a new optimized profile from scratch.
Optimized profiles use platform-specific code to accelerate the various algorithms. Although you can allocate an optimized profile yourself, the only way to obtain a fully configured profile is to create it with the
Profile.optimized
method, after having configured the profile for a given HMM withProfile.configure
.
-
copy
()¶ Create an exact copy of the optimized profile.
-
is_local
()¶ Return whether or not the profile is in a local alignment mode.
-
write
(fh_filter, fh_profile)¶ Write an optimized profile to two separate files.
HMMER implements an acceleration pipeline using several scoring algorithms. Parameters for MSV (the Multi ungapped Segment Viterbi) are saved independently to the
fh_filter
handle, while the rest of the profile is saved tofh_profile
.
-