Sequence Searches
- pyhmmer.hmmer.phmmer(queries, sequences, cpus=0, callback=None, builder=None, **options)
Search protein sequences against a sequence database.
- Parameters:
queries (iterable of DigitalSequence or DigitalMSA) – The query sequences to search for in the sequence database. Passing a single object is supported.
sequences (iterable of DigitalSequence) – A database of sequences to query. If you plan on using the same sequences several times, consider storing them in a DigitalSequenceBlock directly. If a SequenceFile is given, sequences will be loaded iteratively from disk rather than prefetched.
cpus (int) – The number of threads to run in parallel. Pass 1 to run everything in the main thread, 0 to automatically select a suitable number (using psutil.cpu_count), or any positive number otherwise.
callback (callable) – A callback that is called every time a query is processed, with two arguments: the query, and the total number of queries. This can be used to display progress in a UI.
builder (Builder, optional) – A builder to configure how the queries are converted to HMMs. Passing None will create a default instance.
backend (str) – The parallel backend used to run the workers. Supports threading for thread-based parallelism, or multiprocessing for process-based parallelism.
- Yields:
TopHits – A top hits instance for each query, in the same order the queries were passed in the input.
- Raises:
AlphabetMismatch – When any of the query sequences, the profile, or the optional builder do not share the same alphabet.
Note
Any additional keyword arguments passed to the phmmer function will be passed transparently to the Pipeline created in each worker thread.
Added in version 0.2.0.
Changed in version 0.3.0: Allow using DigitalMSA queries.
Changed in version 0.7.0: Queries may now be an iterable of different types, or a single object.
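The parameters above can be combined as in the following minimal sketch. It reads queries and targets from two FASTA files, reports progress through the callback parameter, and collects the hits for each query. The helper names make_progress_callback and search_proteins and the file paths are hypothetical. The sketch assumes that a SequenceFile opened in digital mode provides read_block(), and that hits expose name and evalue attributes; check both against your installed pyhmmer version.

```python
import sys
import threading


def make_progress_callback(stream=sys.stderr):
    """Return a function usable as the `callback` argument.

    The callback receives the processed query and the total number of
    queries, and keeps its own count of finished queries.
    """
    done = 0
    lock = threading.Lock()  # guards the counter in case workers call back concurrently

    def callback(query, total):
        nonlocal done
        with lock:
            done += 1
            stream.write(f"processed {done}/{total} queries\n")

    return callback


def search_proteins(query_path, target_path, cpus=0):
    """Run phmmer and return one list of (hit name, E-value) pairs per query."""
    # Imported lazily so the callback helper can be reused without pyhmmer.
    import pyhmmer

    alphabet = pyhmmer.easel.Alphabet.amino()

    # Load both files fully into memory as digital sequence blocks.
    with pyhmmer.easel.SequenceFile(query_path, digital=True, alphabet=alphabet) as f:
        queries = f.read_block()
    with pyhmmer.easel.SequenceFile(target_path, digital=True, alphabet=alphabet) as f:
        targets = f.read_block()

    results = []
    # One TopHits object is yielded per query, in input order.
    for top_hits in pyhmmer.hmmer.phmmer(
        queries, targets, cpus=cpus, callback=make_progress_callback()
    ):
        results.append([(hit.name, hit.evalue) for hit in top_hits])
    return results
```

A call such as search_proteins("queries.faa", "targets.faa", cpus=4) would then return the results in query order. The in-memory counter is only meaningful with the threading backend, since a closure is not shared across processes.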
- pyhmmer.hmmer.nhmmer(queries, sequences, cpus=0, callback=None, builder=None, **options)
Search nucleotide sequences against a sequence database.
- Parameters:
queries (iterable of DigitalSequence, DigitalMSA, HMM) – The query sequences or profiles to search for in the sequence database. Passing a single object is supported.
sequences (iterable of DigitalSequence) – A database of sequences to query. If you plan on using the same sequences several times, consider storing them in a DigitalSequenceBlock directly. If a SequenceFile is given, sequences will be loaded iteratively from disk rather than prefetched.
cpus (int) – The number of threads to run in parallel. Pass 1 to run everything in the main thread, 0 to automatically select a suitable number (using psutil.cpu_count), or any positive number otherwise.
callback (callable) – A callback that is called every time a query is processed, with two arguments: the query, and the total number of queries. This can be used to display progress in a UI.
builder (Builder, optional) – A builder to configure how the queries are converted to HMMs. Passing None will create a default instance.
backend (str) – The parallel backend used to run the workers. Supports threading for thread-based parallelism, or multiprocessing for process-based parallelism.
- Yields:
TopHits – A top hits instance for each query, in the same order the queries were passed in the input.
Note
Any additional keyword arguments passed to the nhmmer function will be passed to the LongTargetsPipeline created in each worker thread. The strand argument can be used to restrict the search to the direct or reverse strand.
Caution
This function is not just phmmer for nucleotide sequences; it uses a LongTargetsPipeline internally instead of processing each target sequence in its entirety when searching for hits. This avoids hitting the maximum target size that can be used (100,000 residues), which may be a problem for some larger genomes.
Added in version 0.3.0.
Changed in version 0.4.9: Allow using Profile and OptimizedProfile queries.
Changed in version 0.7.0: Queries may now be an iterable of different types, or a single object.
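As noted above, the strand option is forwarded to the pipeline as a keyword argument, which the following sketch illustrates. It assumes that the LongTargetsPipeline accepts "watson" for the direct strand, "crick" for the reverse strand, and None for both; these values are an assumption and should be checked against the pipeline documentation for your installed version. The helper names strand_options and search_nucleotides and the file paths are hypothetical.

```python
def strand_options(strand=None):
    """Build the extra keyword arguments to forward to nhmmer.

    Assumed values: "watson" (direct strand), "crick" (reverse strand),
    or None to search both strands.
    """
    if strand not in (None, "watson", "crick"):
        raise ValueError(f"unsupported strand: {strand!r}")
    return {} if strand is None else {"strand": strand}


def search_nucleotides(query_path, target_path, strand=None, cpus=0):
    """Run nhmmer and return one list of (hit name, E-value) pairs per query."""
    # Imported lazily so strand_options can be used without pyhmmer.
    import pyhmmer

    alphabet = pyhmmer.easel.Alphabet.dna()

    # Load queries and targets into memory as digital sequence blocks.
    with pyhmmer.easel.SequenceFile(query_path, digital=True, alphabet=alphabet) as f:
        queries = f.read_block()
    with pyhmmer.easel.SequenceFile(target_path, digital=True, alphabet=alphabet) as f:
        targets = f.read_block()

    hits_per_query = []
    # Extra keyword arguments are passed on to the pipeline of each worker.
    for top_hits in pyhmmer.hmmer.nhmmer(
        queries, targets, cpus=cpus, **strand_options(strand)
    ):
        hits_per_query.append([(hit.name, hit.evalue) for hit in top_hits])
    return hits_per_query
```

Validating the option in a small helper keeps a typo in the strand name from being discovered only once the workers have started.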