HMMER
Reimplementation of HMMER binaries with the pyHMMER API.
hmmsearch
pyhmmer.hmmer.hmmsearch(queries, sequences, cpus=0, callback=None, **options)
Search HMM profiles against a sequence database.
- Parameters
queries (iterable of HMM) – The query HMMs to search in the database.
sequences (collection of DigitalSequence) – A database of sequences to query.
cpus (int) – The number of threads to run in parallel. Pass 1 to run everything in the main thread, 0 to automatically select a suitable number (using psutil.cpu_count), or any positive number otherwise.
callback (callable) – A callback that is called every time a query is processed, with two arguments: the query, and the total number of queries. This can be used to display progress in a UI.
- Yields
TopHits – An object reporting top hits for each query, in the same order the queries were passed in the input.
- Raises
AlphabetMismatch – When any of the query HMMs and the sequences do not share the same alphabet.
Note
Any additional arguments passed to the hmmsearch function will be passed transparently to the Pipeline to be created.
New in version 0.1.0.
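The callback parameter described above can be illustrated with a minimal sketch. The helper names make_progress_callback and search_with_progress are hypothetical; only the hmmsearch call and its cpus and callback arguments are taken from the signature documented here. The import is done inside the function so that the callback helper can be tried on its own.

```python
def make_progress_callback(log):
    """Return a callback matching the documented (query, total) signature.

    `log` is any list; one (done, total) tuple is appended per processed query.
    """
    def callback(query, total):
        log.append((len(log) + 1, total))
    return callback


def search_with_progress(queries, sequences, cpus=1):
    """Run hmmsearch and collect the yielded TopHits objects into a list.

    Assumes `queries` is an iterable of HMM and `sequences` a collection of
    DigitalSequence sharing the same alphabet, as documented above.
    """
    # Imported lazily: pyhmmer is only needed when a search is actually run.
    from pyhmmer.hmmer import hmmsearch

    progress = []
    callback = make_progress_callback(progress)
    hits = list(hmmsearch(queries, sequences, cpus=cpus, callback=callback))
    return hits, progress
```

Since hmmsearch yields one TopHits per query, wrapping the call in list() is what actually drives the search to completion; the progress list then holds one entry per processed query.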
phmmer
pyhmmer.hmmer.phmmer(queries: Iterable[pyhmmer.easel.DigitalMSA], sequences: Collection[pyhmmer.easel.DigitalSequence], cpus: int = 0, callback: Optional[Callable[[pyhmmer.easel.DigitalMSA, int], None]] = None, builder: Optional[pyhmmer.plan7.Builder] = None, **options: Any) → Iterator[pyhmmer.plan7.TopHits]
pyhmmer.hmmer.phmmer(queries: Iterable[pyhmmer.easel.DigitalSequence], sequences: Collection[pyhmmer.easel.DigitalSequence], cpus: int = 0, callback: Optional[Callable[[pyhmmer.easel.DigitalSequence, int], None]] = None, builder: Optional[pyhmmer.plan7.Builder] = None, **options: Any) → Iterator[pyhmmer.plan7.TopHits]
Search protein sequences against a sequence database.
- Parameters
queries (iterable of DigitalSequence, DigitalMSA or HMM) – The query sequences or profiles to search in the database.
sequences (collection of DigitalSequence) – A database of sequences to query.
cpus (int) – The number of threads to run in parallel. Pass 1 to run everything in the main thread, 0 to automatically select a suitable number (using psutil.cpu_count), or any positive number otherwise.
callback (callable) – A callback that is called every time a query is processed, with two arguments: the query, and the total number of queries. This can be used to display progress in a UI.
builder (Builder, optional) – A builder to configure how the queries are converted to HMMs. Passing None will create a default instance.
- Yields
TopHits – A top hits instance for each query, in the same order the queries were passed in the input.
Note
Any additional keyword arguments passed to the phmmer function will be passed transparently to the Pipeline to be created in each worker thread.
New in version 0.2.0.
Changed in version 0.3.0: Allow using DigitalMSA queries.
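Because results are yielded in the same order as the queries, each query can be paired with its TopHits by position. The following is a small sketch of that idea; pair_queries_with_hits is a hypothetical helper, and the search argument is assumed to be a function with the calling convention documented on this page, such as pyhmmer.hmmer.phmmer.

```python
def pair_queries_with_hits(search, queries, sequences, **options):
    """Pair each query with the result yielded for it.

    `search` is any callable following the documented convention
    search(queries, sequences, **options) and yielding one result per
    query, in input order (for instance pyhmmer.hmmer.phmmer).
    """
    # Materialize the queries so they can be iterated a second time by zip().
    queries = list(queries)
    results = search(queries, sequences, **options)
    return list(zip(queries, results))


# Usage sketch (requires pyhmmer; `proteins` is a hypothetical list of
# DigitalSequence objects loaded elsewhere):
#
#     from pyhmmer.hmmer import phmmer
#     pairs = pair_queries_with_hits(phmmer, proteins, proteins, cpus=1)
```

Passing the search function as an argument keeps the helper independent of which search is run.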
nhmmer
pyhmmer.hmmer.nhmmer(queries: Iterable[pyhmmer.easel.DigitalMSA], sequences: Collection[pyhmmer.easel.DigitalSequence], cpus: int = 0, callback: Optional[Callable[[pyhmmer.easel.DigitalMSA, int], None]] = None, builder: Optional[pyhmmer.plan7.Builder] = None, **options: Any) → Iterator[pyhmmer.plan7.TopHits]
pyhmmer.hmmer.nhmmer(queries: Iterable[pyhmmer.easel.DigitalSequence], sequences: Collection[pyhmmer.easel.DigitalSequence], cpus: int = 0, callback: Optional[Callable[[pyhmmer.easel.DigitalSequence, int], None]] = None, builder: Optional[pyhmmer.plan7.Builder] = None, **options: Any) → Iterator[pyhmmer.plan7.TopHits]
pyhmmer.hmmer.nhmmer(queries: Iterable[pyhmmer.plan7.HMM], sequences: Collection[pyhmmer.easel.DigitalSequence], cpus: int = 0, callback: Optional[Callable[[pyhmmer.plan7.HMM, int], None]] = None, builder: Optional[pyhmmer.plan7.Builder] = None, **options: Any) → Iterator[pyhmmer.plan7.TopHits]
Search nucleotide sequences against a sequence database.
Note
Any additional keyword arguments passed to the nhmmer function will be passed to the LongTargetsPipeline created in each worker thread. The strand argument can be used to restrict the search to the direct or reverse strand.
See also
The equivalent function for proteins, phmmer.
New in version 0.3.0.
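The strand option mentioned in the note could be used along the following lines. This is a sketch under stated assumptions: the accepted values ("watson" for the direct strand, "crick" for the reverse strand, None for both) are an assumption not stated on this page and should be checked against the LongTargetsPipeline documentation. The helper name nhmmer_on_strand is hypothetical.

```python
# Assumed strand values; verify against the LongTargetsPipeline documentation.
ASSUMED_STRANDS = (None, "watson", "crick")


def nhmmer_on_strand(queries, sequences, strand=None, cpus=1):
    """Run nhmmer, forwarding `strand` as an extra keyword option.

    Rejects unexpected strand values early, before any search is started.
    """
    if strand not in ASSUMED_STRANDS:
        raise ValueError(f"unexpected strand value: {strand!r}")

    # Imported lazily: pyhmmer is only needed when a search is actually run.
    from pyhmmer.hmmer import nhmmer

    # `strand` is not a named parameter of nhmmer; it travels through
    # **options to the pipeline created in each worker thread.
    return list(nhmmer(queries, sequences, cpus=cpus, strand=strand))
```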
hmmpress
pyhmmer.hmmer.hmmpress(hmms, output)
Press several HMMs into a database.
Calling this function will create 4 files at the given location: {output}.h3p (containing the optimized profiles), {output}.h3m (containing the binary HMMs), {output}.h3f (containing the MSV parameters), and {output}.h3i (the SSI index mapping the previous files).
- Parameters
hmms (iterable of HMM) – The HMMs to be pressed together in the file.
output (str or os.PathLike) – The path to an output location where to write the different files.
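The four output files listed above follow directly from the output argument, which the sketch below makes explicit. The helper names pressed_files and press_and_verify are hypothetical, and the existence check is an illustration rather than something hmmpress requires.

```python
import os

# Extensions of the four files described above, in the order listed.
PRESSED_EXTENSIONS = ("h3p", "h3m", "h3f", "h3i")


def pressed_files(output):
    """Return the four paths expected for a given `output` location."""
    base = os.fspath(output)  # accepts str or os.PathLike, as documented
    return [f"{base}.{ext}" for ext in PRESSED_EXTENSIONS]


def press_and_verify(hmms, output):
    """Press `hmms` to `output`, then check that all four files exist."""
    # Imported lazily: pyhmmer is only needed when pressing is actually run.
    from pyhmmer.hmmer import hmmpress

    hmmpress(hmms, output)
    paths = pressed_files(output)
    missing = [path for path in paths if not os.path.exists(path)]
    if missing:
        raise FileNotFoundError(f"missing pressed files: {missing}")
    return paths
```

For example, pressed_files("pfam") returns ["pfam.h3p", "pfam.h3m", "pfam.h3f", "pfam.h3i"].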
hmmalign
pyhmmer.hmmer.hmmalign(hmm, sequences, trim=False, digitize=False, all_consensus_cols=True)
Align several sequences to a reference HMM, and return the MSA.
- Parameters
hmm (HMM) – The reference HMM to use for the alignment.
sequences (collection of DigitalSequence) – The sequences to align to the HMM.
trim (bool) – Trim off any residues that get assigned to flanking \(N\) and \(C\) states (in profile traces) or \(I_0\) and \(I_m\) (in core traces).
digitize (bool) – If set to True, returns a DigitalMSA instead of a TextMSA.
all_consensus_cols (bool) – Force a column to be created for every consensus column in the model, even if this produces a column containing only gap characters.
- Returns
MSA – A multiple sequence alignment containing the aligned sequences, either a TextMSA or a DigitalMSA depending on the value of the digitize argument.
See also
The TraceAligner class, which lets you inspect the intermediate tracebacks obtained for each alignment before building an MSA.
New in version 0.4.7.