Results
- class pyhmmer.plan7.TopHits
An immutable ranked list of top-scoring hits.
TopHits are thresholded using the parameters from the pipeline, and are sorted by key when you obtain them from a Pipeline instance:
>>> abc = thioesterase.alphabet
>>> hits = Pipeline(abc).search_hmm(thioesterase, proteins)
>>> hits.is_sorted(by="key")
True
Use len to query the number of top hits, and the usual indexing notation to extract a particular Hit:
>>> len(hits)
1
>>> hits[0].name
b'938293.PRJEB85.HG003687_113'
Added in version 0.6.1: pickle protocol support.
- compare_ranking(ranking)
Compare the current top hits to the ranking of top hits from a previous run.
This method is used by jackhmmer to record the hits obtained during each iteration, so that the inner loop can converge.
- Parameters:
ranking (KeyHash) – A keyhash containing the ranks of the top hits from a previous run.
- Returns:
int – The number of new hits found in this iteration.
Added in version 0.6.0.
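To make the return value concrete, here is a minimal pure-Python sketch of such a comparison; it is not pyhmmer's implementation. It assumes hits are identified by name, uses a plain dict as a hypothetical stand-in for KeyHash, and counts as "new" any current hit whose name is absent from the previous ranking. The helper names build_ranking and count_new_hits are illustrative only.

```python
from typing import Dict, Iterable


def build_ranking(names: Iterable[bytes]) -> Dict[bytes, int]:
    """Map each hit name to its rank (0 = best), keeping the first occurrence."""
    ranking: Dict[bytes, int] = {}
    for rank, name in enumerate(names):
        ranking.setdefault(name, rank)
    return ranking


def count_new_hits(previous: Dict[bytes, int], current: Iterable[bytes]) -> int:
    """Count names in the current iteration that are absent from the previous ranking."""
    return sum(1 for name in current if name not in previous)


# Hypothetical hit names from two successive iterations of an iterative search.
iteration1 = [b"seqA", b"seqB"]
iteration2 = [b"seqA", b"seqC", b"seqB", b"seqD"]
ranking = build_ranking(iteration1)
new_hits = count_new_hits(ranking, iteration2)  # seqC and seqD are new
```

In this model, an iteration that reports no new hits is the natural signal that the iterative loop has converged.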
- is_sorted(by='key')
Check whether or not the hits are sorted with the given method.
See sort for a list of allowed values for the by argument.
- merge(*others)
Concatenate the hits from this instance and others.
If the Z and domZ values used to compute E-values were computed by the Pipeline from the number of targets, the returned object will update them by summing the values of self and of each object in others. If they were set manually, the manual value will be kept, provided all values are equal.
- Returns:
TopHits – A new collection of hits containing a copy of all the hits from self and others, sorted by key.
- Raises:
ValueError – When trying to merge together hits obtained from different Pipeline instances with incompatible parameters.
Caution
This should only be done for hits obtained for the same domain on similarly configured pipelines. Some internal checks are performed to ensure this is the case, but if it is not, the merged results may not be consistent at all.
Example
>>> pli = Pipeline(thioesterase.alphabet)
>>> hits1 = pli.search_hmm(thioesterase, proteins[:1000])
>>> hits2 = pli.search_hmm(thioesterase, proteins[1000:2000])
>>> hits3 = pli.search_hmm(thioesterase, proteins[2000:])
>>> merged = hits1.merge(hits2, hits3)
Added in version 0.5.0.
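The Z bookkeeping rule described for merge can be sketched in plain Python as below. ZSetting and merge_z are hypothetical names, not pyhmmer API. The sketch assumes each Z value carries a flag saying whether it was set manually, and rejecting a mix of manual and pipeline-computed values is an assumption, since the text above does not cover that case. The same rule would apply to domZ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZSetting:
    """Hypothetical record of a Z value and whether it was set manually."""
    value: float
    manual: bool


def merge_z(*settings: ZSetting) -> ZSetting:
    """Combine Z settings following the rule described for TopHits.merge."""
    if not settings:
        raise ValueError("nothing to merge")
    first = settings[0]
    # Assumption: manual and pipeline-computed values cannot be mixed.
    if any(s.manual != first.manual for s in settings):
        raise ValueError("cannot mix manual and pipeline-computed Z values")
    if first.manual:
        # Manual values are kept as-is, but only if every input agrees.
        if any(s.value != first.value for s in settings):
            raise ValueError("manually set Z values differ")
        return first
    # Pipeline-computed values count targets, so disjoint chunks add up.
    return ZSetting(sum(s.value for s in settings), manual=False)


# Hypothetical Z values for three chunks of a target database.
merged = merge_z(ZSetting(1000.0, False), ZSetting(1000.0, False), ZSetting(500.0, False))
```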
- sort(by='key')
Sort hits in the current instance using the given method.
- Parameters:
by (str) – The comparison method to use to compare hits. Allowed values are: key (the default) to sort by key, or seqidx to sort by sequence index and alignment position.
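A small model can illustrate the two sort orders and how is_sorted relates to sort. It uses a hypothetical ToyHit tuple instead of pyhmmer.plan7.Hit, and it assumes that a larger key ranks first and that the seqidx order breaks ties on the alignment start position. Both are assumptions about the ordering, not statements about pyhmmer's internals.

```python
from typing import List, NamedTuple


class ToyHit(NamedTuple):
    """Hypothetical stand-in for a hit, not pyhmmer.plan7.Hit."""
    name: str
    key: float     # sort key, assumed "higher is better"
    seqidx: int    # index of the target sequence in the database
    position: int  # assumed start of the alignment on the target


def sort_hits(hits: List[ToyHit], by: str = "key") -> List[ToyHit]:
    """Return a sorted copy of the hits using one of the two methods."""
    if by == "key":
        # sorted() stays stable with reverse=True, so ties keep their order.
        return sorted(hits, key=lambda h: h.key, reverse=True)
    if by == "seqidx":
        return sorted(hits, key=lambda h: (h.seqidx, h.position))
    raise ValueError(f"invalid sort method: {by!r}")


def is_sorted(hits: List[ToyHit], by: str = "key") -> bool:
    """A list is sorted if sorting it again leaves it unchanged."""
    return hits == sort_hits(hits, by)


hits = [ToyHit("a", 10.0, 2, 5), ToyHit("b", 30.0, 0, 40), ToyHit("c", 20.0, 0, 3)]
by_key = sort_hits(hits, "key")
by_seqidx = sort_hits(hits, "seqidx")
```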
- to_msa(alphabet, sequences=None, traces=None, trim=False, digitize=False, all_consensus_cols=False)
Create a multiple alignment of all included domains.
- Parameters:
alphabet (Alphabet) – The alphabet of the HMM this TopHits was obtained from. It is required to convert the hits back to single sequences.
sequences (list of Sequence, optional) – A list of additional sequences to include in the alignment.
traces (list of Trace, optional) – A list of additional traces to include in the alignment.
- Keyword Arguments:
trim (bool) – Trim off any residues that get assigned to flanking \(N\) and \(C\) states (in profile traces) or \(I_0\) and \(I_m\) (in core traces).
digitize (bool) – If set to True, returns a DigitalMSA instead of a TextMSA.
all_consensus_cols (bool) – Force a column to be created for every consensus column in the model, even if it means having a column made entirely of gap characters.
- Returns:
MSA – A multiple sequence alignment containing the reported hits, either a TextMSA or a DigitalMSA depending on the value of the digitize argument.
Added in version 0.3.0.
Changed in version 0.6.0: Added the sequences and traces arguments.
- write(fh, format='targets', header=True)
Write the hits in tabular format to a file-like object.
- Parameters:
Hint
The hits can be written in one of the following formats:
targets
A tabular output format of per-target hits, as obtained with the --tblout output flag of hmmsearch or hmmscan.
domains
A tabular output format of per-domain hits, as obtained with the --domtblout output flag of hmmsearch or hmmscan.
pfam
A tabular output format suitable for Pfam, merging per-sequence and per-domain hits in a single file, with fewer fields and sorted by score.
Added in version 0.6.1.
- bit_cutoffs
The model-specific thresholding option, if any.
Added in version 0.5.0.
- block_length
The block length these hits were obtained with.
Always None when the hits were not obtained from a long targets pipeline.
Added in version 0.5.0.
- domT
The per-domain score threshold for reporting a hit.
Added in version 0.5.0.
- incT
The per-target score threshold for including a hit.
Added in version 0.4.8.
- incdomT
The per-domain score threshold for including a hit.
Added in version 0.5.0.
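The sketch below shows how a reporting cutoff such as domT and an inclusion cutoff such as incdomT could classify a single bit score. The classify function is hypothetical; it assumes bit score cutoffs are in use (rather than E-value cutoffs) and that a score equal to a cutoff passes it. It also enforces the rule, documented for Hit, that included hits are always reported.

```python
from typing import Tuple


def classify(score: float, report_t: float, include_t: float) -> Tuple[bool, bool]:
    """Return (reported, included) for a bit score under score-based cutoffs.

    Hypothetical helper: assumes bit score thresholds are active and that
    a score equal to a threshold passes it.
    """
    included = score >= include_t
    # Included hits are necessarily reported, even if the reporting cutoff
    # happens to be set higher than the inclusion cutoff.
    reported = included or score >= report_t
    return reported, included


# Hypothetical cutoffs: report at 20 bits, include at 30 bits.
weak = classify(25.0, report_t=20.0, include_t=30.0)
strong = classify(35.0, report_t=20.0, include_t=30.0)
```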
- included
An iterator over the hits marked as included.
Added in version 0.7.0.
- Type:
iterator of Hit
- long_targets
Whether these hits were produced by a long targets pipeline.
Added in version 0.5.0.
- Type:
bool
- query
The query object these hits were obtained for.
The actual type of TopHits.query depends on the query that was given to the Pipeline, or to the pyhmmer.hmmer function, that created the object:
>>> hits = next(pyhmmer.hmmsearch(thioesterase, proteins))
>>> hits.query is thioesterase
True
- Type:
- query_accession
The accession of the query, if any.
Added in version 0.6.1.
Deprecated since version 0.10.10: Use TopHits.query to access the original query directly.
- query_length
The length of the query.
Added in version 0.10.5.
Deprecated since version 0.10.10: Use TopHits.query to access the original query directly.
- Type:
int
- query_name
The name of the query, if any.
Added in version 0.6.1.
Deprecated since version 0.10.10: Use TopHits.query to access the original query directly.
- class pyhmmer.plan7.Hit
A high-scoring database hit found by the comparison pipeline.
A hit is obtained in HMMER for every target where one or more significant domain alignments were found by a Pipeline. A Hit comes with a score, which is obtained after correcting the individual bit scores of all its domains; a P-value, which is computed by testing the likelihood of obtaining the same alignment using a random background model; and an E-value, which is obtained after Bonferroni correction of the P-value, taking into account the total number of targets in the target database.
Hits also store several pieces of information as flags.
Hit.included and Hit.reported show whether a Hit is considered for inclusion (resp. reporting) with respect to the thresholds defined on the original Pipeline. These flags can be modified manually to force inclusion or exclusion of certain hits independently of their score or E-value. The write method of TopHits objects will only write a line for hits marked as reported. Included hits are necessarily reported:
\[\text{included} \implies \text{reported}\]
When used during an iterative search, hits can also be marked as dropped by setting the Hit.dropped flag to True. Dropped hits will not be used for building HMMs during the next iteration. Hits newly found in an iteration will be marked as new with the Hit.new flag. Hit.dropped and Hit.included are mutually exclusive, and setting one will unset the other. Dropped hits can be reported, but are not included:
\[\text{dropped} \implies \neg \text{included}\]
When running a long targets pipeline, some hits may appear as duplicates if they were found across multiple windows. These hits will be marked as duplicates with the Hit.duplicate flag. Duplicate hits are neither reported nor included:
\[\text{duplicate} \implies \neg \text{reported}\]
Added in version 0.6.1: pickle protocol support.
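The E-value relation and the flag rules above can be summarized in a small pure-Python model. It does not use pyhmmer: evalue and ToyHitFlags are hypothetical names, the E-value is modelled as the P-value multiplied by Z (the number of targets), and clearing duplicate when a hit is marked as reported is an assumption the text above does not state. Each setter enforces one of the three implications shown above.

```python
def evalue(pvalue: float, z: float) -> float:
    """Bonferroni-style correction: the P-value scaled by the number of targets."""
    return pvalue * z


class ToyHitFlags:
    """Hypothetical model of the documented flag rules, not pyhmmer.plan7.Hit."""

    def __init__(self) -> None:
        self._reported = False
        self._included = False
        self._dropped = False
        self._duplicate = False

    @property
    def reported(self) -> bool:
        return self._reported

    @reported.setter
    def reported(self, value: bool) -> None:
        self._reported = value
        if value:
            self._duplicate = False  # assumption: reporting clears duplicate
        else:
            self._included = False   # included implies reported

    @property
    def included(self) -> bool:
        return self._included

    @included.setter
    def included(self, value: bool) -> None:
        self._included = value
        if value:
            self._reported = True    # included implies reported
            self._dropped = False    # included and dropped are exclusive
            self._duplicate = False  # duplicates are never reported

    @property
    def dropped(self) -> bool:
        return self._dropped

    @dropped.setter
    def dropped(self, value: bool) -> None:
        self._dropped = value
        if value:
            self._included = False   # dropped implies not included

    @property
    def duplicate(self) -> bool:
        return self._duplicate

    @duplicate.setter
    def duplicate(self, value: bool) -> None:
        self._duplicate = value
        if value:
            self._reported = False   # duplicate implies not reported
            self._included = False   # and therefore not included

    def invariants_hold(self) -> bool:
        """Check the three implications stated in the documentation."""
        return ((not self._included or self._reported)
                and not (self._dropped and self._included)
                and not (self._duplicate and self._reported))
```

For instance, marking an included hit as dropped keeps it reported but removes it from the included set, matching the statement that dropped hits can be reported but not included.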
- class pyhmmer.plan7.Domain
A single domain in a query Hit.
Added in version 0.6.1: pickle protocol support.
- strand
The strand where the domain is located.
When running a search with the LongTargetsPipeline, both strands of each target sequence are processed (unless disabled), so the domain may be located on either strand, + or -. For default Pipeline searches, this is always None.
Added in version 0.10.8.
- class pyhmmer.plan7.Alignment
An alignment of a sequence to a profile.
Added in version 0.6.1: pickle protocol support.
- hmm_accession
The accession of the query, or its name if it has none.
Added in version 0.1.4.
- Type:
- posterior_probabilities
Posterior probability annotation of the alignment.
Added in version 0.10.5.
- Type:
- target_length
The length of the target sequence in the alignment.
Added in version 0.10.5.
- Type:
int