HMMER

Reimplementations of the HMMER binaries using the pyHMMER API.

hmmsearch

pyhmmer.hmmer.hmmsearch(queries, sequences, cpus=0, callback=None, **options)

Search HMM profiles against a sequence database.

Parameters

queries (iterable of HMM, Profile or OptimizedProfile) – The query HMMs or profiles to search for in the database.
sequences (collection of DigitalSequence) – A database of sequences to query.
cpus (int) – The number of threads to run in parallel. Pass 1 to run everything in the main thread, 0 to automatically select a suitable number (using psutil.cpu_count), or any positive number otherwise.
callback (callable) – A callback that is called every time a query is processed, with two arguments: the query, and the total number of queries. This can be used to display progress in a UI.

Yields

TopHits – An object reporting top hits for each query, in the same order the queries were passed in the input.

Raises

AlphabetMismatch – When any of the query HMMs and the sequences do not share the same alphabet.

Note

Any additional keyword arguments passed to the hmmsearch function are passed transparently to the Pipeline that is created.

New in version 0.1.0.

Changed in version 0.4.9: Allow using Profile and OptimizedProfile queries.
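The callback receives the query and the total number of queries, but not the position of the query, so a progress display has to keep its own counter. The following is a minimal sketch of such a callback. `make_progress_callback` is a hypothetical helper, not part of pyHMMER; the lock and the defensive lookup of a `name` attribute are assumptions, since this page does not say which thread invokes the callback or what attributes a query exposes.

```python
import sys
import threading


def make_progress_callback(stream=sys.stderr):
    """Return a callable usable as the `callback` argument (hypothetical helper)."""
    done = 0
    lock = threading.Lock()  # assumption: the callback may run outside the main thread

    def report(query, total):
        nonlocal done
        with lock:
            done += 1
            current = done
        # Look up a `name` attribute defensively and fall back to repr().
        name = getattr(query, "name", None)
        if isinstance(name, bytes):
            name = name.decode("utf-8", "replace")
        label = name if name is not None else repr(query)
        print(f"[{current}/{total}] {label}", file=stream)

    return report
```

Under these assumptions, the result would be passed as `callback=make_progress_callback()` in a call to hmmsearch, phmmer or nhmmer.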

phmmer

pyhmmer.hmmer.phmmer(queries, sequences, cpus=0, callback=None, builder=None, **options)

Search protein sequences against a sequence database.

Parameters

queries (iterable of DigitalSequence or DigitalMSA) – The query sequences to search for in the sequence database.
sequences (collection of DigitalSequence) – A database of sequences to query.
cpus (int) – The number of threads to run in parallel. Pass 1 to run everything in the main thread, 0 to automatically select a suitable number (using psutil.cpu_count), or any positive number otherwise.
callback (callable) – A callback that is called every time a query is processed, with two arguments: the query, and the total number of queries. This can be used to display progress in a UI.
builder (Builder, optional) – A builder to configure how the queries are converted to HMMs. Passing None will create a default instance.

Yields

TopHits – A top hits instance for each query, in the same order the queries were passed in the input.

Note

Any additional keyword arguments passed to the phmmer function will be passed transparently to the Pipeline to be created in each worker thread.

New in version 0.2.0.

Changed in version 0.3.0: Allow using DigitalMSA queries.
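Because results are yielded in the same order as the queries, each result can be matched back to its query by position. The sketch below illustrates this with a hypothetical helper, `pair_queries_with_hits`, which accepts any search function with the `(queries, sequences, **options)` calling convention documented here. It is not part of pyHMMER, and it materializes the queries in a list so they can be iterated twice.

```python
def pair_queries_with_hits(search, queries, sequences, **options):
    """Yield (query, result) pairs from an order-preserving search function.

    `search` is expected to follow the documented convention: it takes the
    queries, the sequences and extra keyword options, and yields one result
    per query in input order. This helper is hypothetical.
    """
    queries = list(queries)  # keep the queries so they can be paired afterwards
    results = search(queries, sequences, **options)
    # zip() relies on the documented guarantee that the order is preserved.
    yield from zip(queries, results)
```

Passing `pyhmmer.hmmer.phmmer` as `search` would, under these assumptions, give one `(query, TopHits)` pair per query; extra keyword arguments are forwarded unchanged.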

nhmmer

pyhmmer.hmmer.nhmmer(queries, sequences, cpus=0, callback=None, builder=None, **options)

Search nucleotide sequences against a sequence database.

Parameters

queries (iterable of DigitalSequence, DigitalMSA or HMM) – The query sequences or profiles to search for in the sequence database.
sequences (collection of DigitalSequence) – A database of sequences to query.
cpus (int) – The number of threads to run in parallel. Pass 1 to run everything in the main thread, 0 to automatically select a suitable number (using psutil.cpu_count), or any positive number otherwise.
callback (callable) – A callback that is called every time a query is processed, with two arguments: the query, and the total number of queries. This can be used to display progress in a UI.
builder (Builder, optional) – A builder to configure how the queries are converted to HMMs. Passing None will create a default instance.

Yields

TopHits – A top hits instance for each query, in the same order the queries were passed in the input.

Note

Any additional keyword arguments passed to the nhmmer function will be passed to the LongTargetsPipeline created in each worker thread. The strand argument can be used to restrict the search to the direct or reverse strand.

Hint

This function is not just phmmer for nucleotide sequences; it actually uses a LongTargetsPipeline internally instead of processing each target sequence in its entirety when searching for hits. This avoids hitting the maximum target size that can be used (100,000 residues), which may be a problem for some larger genomes.

New in version 0.3.0.

Changed in version 0.4.9: Allow using Profile and OptimizedProfile queries.
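The cpus convention shared by hmmsearch, phmmer and nhmmer (0 selects a thread count automatically, 1 stays in the main thread, any other positive number is used as given) can be sketched as a small function. This is an illustration of the documented rule only, not the library's code: `resolve_cpus` is a hypothetical name, `os.cpu_count` stands in for `psutil.cpu_count` to keep the example within the standard library, and rejecting negative values is an assumption.

```python
import os


def resolve_cpus(cpus=0):
    """Return the number of threads implied by a `cpus` argument (hypothetical)."""
    if cpus < 0:
        # Assumption: this page does not say how negative values are handled.
        raise ValueError("cpus must be zero or a positive integer")
    if cpus == 0:
        # Stand-in for psutil.cpu_count; fall back to 1 if the count is unknown.
        return os.cpu_count() or 1
    return cpus  # 1 means the main thread only; larger values mean worker threads
```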

hmmpress

pyhmmer.hmmer.hmmpress(hmms, output)

Press several HMMs into a database.

Calling this function will create 4 files at the given location: {output}.h3p (containing the optimized profiles), {output}.h3m (containing the binary HMMs), {output}.h3f (containing the MSV parameters), and {output}.h3i (the SSI index mapping the previous files).

Parameters

hmms (iterable of HMM) – The HMMs to be pressed together in the file.
output (str or os.PathLike) – The path to an output location where to write the different files.
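Since the four files are named by appending a suffix to the output path, their locations can be derived from that path alone, for example to check whether a database has already been pressed. The helpers below are a minimal sketch using only the standard library; `pressed_files` and `is_pressed` are hypothetical names, not part of pyHMMER, and they assume the path is a `str` or a `str`-based `os.PathLike`.

```python
import os
from pathlib import Path

# Suffixes listed above: optimized profiles, binary HMMs, MSV parameters, SSI index.
PRESSED_SUFFIXES = (".h3p", ".h3m", ".h3f", ".h3i")


def pressed_files(output):
    """Return the four paths expected for a given `output` location."""
    base = os.fspath(output)
    # The suffix is appended to the full path; an existing extension is kept.
    return [Path(base + suffix) for suffix in PRESSED_SUFFIXES]


def is_pressed(output):
    """Return True if all four expected files exist."""
    return all(path.is_file() for path in pressed_files(output))
```

For an output path of `db/pfam.hmm`, the first expected file would be `db/pfam.hmm.h3p`.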

hmmalign

pyhmmer.hmmer.hmmalign(hmm, sequences, trim=False, digitize=False, all_consensus_cols=True)

Align several sequences to a reference HMM, and return the MSA.

Parameters

hmm (HMM) – The reference HMM to use for the alignment.
sequences (collection of DigitalSequence) – The sequences to align to the HMM.
trim (bool) – Trim off any residues that get assigned to flanking \(N\) and \(C\) states (in profile traces) or \(I_0\) and \(I_m\) (in core traces).
digitize (bool) – If set to True, returns a DigitalMSA instead of a TextMSA.
all_consensus_cols (bool) – Force a column to be created for every consensus column in the model, even if that column then consists entirely of gap characters.

Returns

MSA – A multiple sequence alignment containing the aligned sequences, either a TextMSA or a DigitalMSA depending on the value of the digitize argument.