Iterative Searches

pyhmmer.hmmer.jackhmmer(queries, sequences, *, max_iterations=5, select_hits=None, checkpoints=False, cpus=0, callback=None, builder=None, **options)

Search protein sequences against a sequence database.

Parameters:
  • queries (iterable of DigitalSequence) – The query sequences to search for in the sequence database. Passing a single sequence object is supported.

  • sequences (iterable of DigitalSequence) – A database of sequences to query. If you plan on using the same sequences several times, consider storing them into a DigitalSequenceBlock directly. jackhmmer does not support passing a SequenceFile at the moment.

  • max_iterations (int) – The maximum number of iterations for the search. Hits will be returned early if the search converges.

  • select_hits (callable, optional) – A function or callable object for manually selecting hits during each iteration. It should take a single TopHits argument and change the inclusion of individual hits with the include and drop methods of Hit objects.

  • checkpoints (bool) – A logical flag to return the results at each iteration ‘checkpoint’. If True, then an iterable of up to max_iterations IterationResult objects will be returned, rather than just the final iteration. This is similar to the --chkhmm and --chkali flags from HMMER3’s jackhmmer interface.

  • cpus (int) – The number of threads to run in parallel. Pass 1 to run everything in the main thread, 0 to automatically select a suitable number (using psutil.cpu_count), or any positive number otherwise.

  • callback (callable) – A callback that is called every time a query is processed, with two arguments: the query, and the total number of queries. This can be used to display progress in a UI.

  • builder (Builder, optional) – A builder to configure how the queries are converted to HMMs. Passing None will create a default instance.

  • backend (str) – The parallel backend to use for workers to be executed. Supports threading to use thread-based parallelism, or multiprocessing to use process-based parallelism.
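
As a rough illustration of the select_hits and callback parameters described above, the sketch below defines one candidate for each. FakeHit is a hypothetical stand-in for a Hit object (it only mimics the score attribute and the include and drop methods mentioned above) so that the example runs without a sequence database; the 50.0 bit score cutoff and the names keep_high_scoring and ProgressCounter are illustrative assumptions, not part of the library.

```python
from dataclasses import dataclass


@dataclass
class FakeHit:
    """Hypothetical stand-in for a Hit, only so this sketch runs on its own."""
    name: bytes
    score: float
    included: bool = True

    def include(self):
        self.included = True

    def drop(self):
        self.included = False


def keep_high_scoring(top_hits, min_score=50.0):
    """Candidate `select_hits` callable: drop hits scoring below `min_score`.

    Hits at or above the cutoff are left untouched, so the inclusion
    decision already made for them is preserved.
    """
    for hit in top_hits:
        if hit.score < min_score:
            hit.drop()


class ProgressCounter:
    """Candidate `callback`: counts processed queries and prints progress."""

    def __init__(self):
        self.done = 0

    def __call__(self, query, total):
        self.done += 1
        print(f"processed {self.done}/{total} queries")


# Exercise both callables with stand-in data.
hits = [FakeHit(b"seq1", 120.5), FakeHit(b"seq2", 12.0)]
keep_high_scoring(hits)
progress = ProgressCounter()
progress("query-1", 2)
```

With real data, these would presumably be passed as jackhmmer(queries, sequences, select_hits=keep_high_scoring, callback=progress).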

Yields:

IterationResult – An iteration result instance for each query, in the same order the queries were passed in the input. If the checkpoints option is True, all iterations will be returned instead of only the last one.
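
The following sketch shows one way the yielded results might be consumed, with and without checkpoints. It assumes each result exposes iteration, converged and hits attributes, and that with checkpoints=True each yielded item is itself an iterable of results for one query. The helper names (summarize, summarize_all, run_search) are hypothetical, and the demo uses SimpleNamespace stand-ins rather than real search output.

```python
from types import SimpleNamespace


def summarize(result):
    """Condense one IterationResult-like object into a plain dict."""
    return {
        "iteration": result.iteration,
        "converged": result.converged,
        "n_hits": len(result.hits),
    }


def summarize_all(results, checkpoints=False):
    """Summarize what jackhmmer yields.

    Without checkpoints, each item is a single result per query; with
    checkpoints, each item is assumed to be an iterable of results.
    """
    if checkpoints:
        return [[summarize(r) for r in per_query] for per_query in results]
    return [summarize(r) for r in results]


def run_search(queries, sequences, checkpoints=False, **options):
    """Run jackhmmer and summarize its output (requires pyhmmer)."""
    # Imported lazily so the helpers above work without pyhmmer installed.
    import pyhmmer.hmmer

    results = pyhmmer.hmmer.jackhmmer(
        queries, sequences, checkpoints=checkpoints, **options
    )
    return summarize_all(results, checkpoints=checkpoints)


# Stand-in data in place of real search results.
final_only = [SimpleNamespace(iteration=3, converged=True, hits=[1, 2])]
per_iteration = [[
    SimpleNamespace(iteration=1, converged=False, hits=[1]),
    SimpleNamespace(iteration=2, converged=True, hits=[1, 2]),
]]
final_summary = summarize_all(final_only)
checkpoint_summary = summarize_all(per_iteration, checkpoints=True)
```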

Raises:

AlphabetMismatch – When any of the query sequences, the profile, or the optional builder do not share the same alphabet.

Note

Any additional keyword arguments passed to the jackhmmer function will be passed transparently to the Pipeline to be created in each worker thread.

Caution

Default values used for jackhmmer do not correspond to the default parameters used for creating a pipeline in the other cases. If no parameter value is given as a keyword argument, jackhmmer will create the pipeline with incE=0.001 and incdomE=0.001, where a default Pipeline would use incE=0.01 and incdomE=0.01.
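
To make the difference in inclusion thresholds concrete, here is a minimal sketch of how the default Pipeline values quoted above could be restored by passing them explicitly as keyword arguments. The helper names with_pipeline_defaults and jackhmmer_like_pipeline are hypothetical; only the incE and incdomE values come from the text above.

```python
# Inclusion thresholds a default Pipeline would use, per the caution above.
PIPELINE_INCLUSION_DEFAULTS = {"incE": 0.01, "incdomE": 0.01}


def with_pipeline_defaults(options=None):
    """Return `options` with Pipeline-style inclusion thresholds filled in.

    Values given explicitly by the caller take precedence.
    """
    merged = dict(PIPELINE_INCLUSION_DEFAULTS)
    merged.update(options or {})
    return merged


def jackhmmer_like_pipeline(queries, sequences, **options):
    """Call jackhmmer with Pipeline-style thresholds (requires pyhmmer)."""
    # Imported lazily so the helper above works without pyhmmer installed.
    import pyhmmer.hmmer

    return pyhmmer.hmmer.jackhmmer(
        queries, sequences, **with_pipeline_defaults(options)
    )


# Only incE is overridden; incdomE falls back to the Pipeline value.
opts = with_pipeline_defaults({"incE": 1e-5})
```

Looser inclusion thresholds let more hits into each round's model, so the stricter jackhmmer defaults are probably the safer choice unless there is a specific reason to match other pipelines.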

Added in version 0.8.0.