Blocks#

class pyhmmer.easel.SequenceBlock#

An abstract container for storing Sequence objects.

To pass the target sequences efficiently in Pipeline.search_hmm, an array is allocated so that the inner loop can iterate over the target sequences without acquiring the GIL for each new sequence (a change that gave a huge performance boost in v0.4.5). However, that array could not be reused across queries: some memory was recycled, but the target sequences had to be indexed again for every query. This class keeps a Python list of Sequence objects synchronized with an internal C-contiguous buffer of pointers to ESL_SQ structs, which the HMMER search loop can use directly.
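The synchronization idea can be sketched in pure Python. This is only an illustration, not pyhmmer's implementation: `SyncedBlock` is a hypothetical class, and the `array` of `id()` values merely stands in for the C buffer of `ESL_SQ` pointers (that `id()` is a memory address is a CPython detail).

```python
from array import array


class SyncedBlock:
    """Toy container keeping a Python list and a flat buffer in sync."""

    def __init__(self, items=()):
        self._items = []         # owning Python references
        self._refs = array("Q")  # contiguous buffer, stand-in for ESL_SQ pointers
        for item in items:
            self.append(item)

    def append(self, item):
        # Every mutation updates both views together.
        self._items.append(item)
        self._refs.append(id(item))

    def pop(self, index=-1):
        item = self._items.pop(index)
        self._refs.pop(index)
        return item

    def clear(self):
        self._items.clear()
        del self._refs[:]

    def __len__(self):
        return len(self._items)

    def in_sync(self):
        # The buffer must describe exactly the objects held in the list.
        return list(self._refs) == [id(item) for item in self._items]
```

Because the flat buffer is updated on every mutation, a consumer that only needs the buffer never has to walk the Python list again between queries.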

Added in version 0.7.0.

clear()#: Remove all sequences from the block.

copy()#: Return a copy of the sequence block.

Note

The sequences internally referred to by this collection are not copied. Use copy.deepcopy if you also want to duplicate the internal storage of each sequence.
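The difference between the two kinds of copy can be illustrated with a plain Python list standing in for the block. `Record` is a hypothetical placeholder class, not part of pyhmmer; the point is only the shallow versus deep semantics described in the note.

```python
import copy


class Record:
    """Hypothetical stand-in for a sequence object."""

    def __init__(self, name, residues):
        self.name = name
        self.residues = residues


block = [Record("seq1", "ATGC"), Record("seq2", "ATTA")]

shallow = copy.copy(block)   # new container, same Record objects
deep = copy.deepcopy(block)  # new container and new Record objects

# Renaming through the shallow copy is visible from the original container.
shallow[0].name = "renamed"
```

After the rename, `block[0].name` is also `"renamed"`, while `deep[0].name` is still `"seq1"`.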

extend(iterable)#: Extend block by appending sequences from the iterable.

largest()#: Return the largest sequence in the block.

total_length()#: Compute the total length of the sequence block.
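As a rough mental model, and assuming that "largest" means the sequence with the most residues, these two methods correspond to the following operations on a plain list of strings (a stand-in for the block, not pyhmmer code):

```python
sequences = ["ATGC", "ATTACCA", "GG"]

# Longest sequence in the collection.
largest = max(sequences, key=len)

# Sum of the lengths of all sequences.
total_length = sum(len(sequence) for sequence in sequences)
```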

write(fh)#

Write all sequences to a file handle, in FASTA format.

Parameters:: fh (io.IOBase) – A Python file handle, opened in binary mode.

Added in version 0.12.0.
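The requirement for a binary-mode handle can be illustrated with a small standard-library sketch. `write_fasta` is a hypothetical helper, and the line width of 60 is an assumption; the exact bytes produced by pyhmmer may differ.

```python
import io


def write_fasta(records, fh, width=60):
    """Write (name, residues) pairs to a binary file handle as FASTA."""
    for name, residues in records:
        fh.write(b">" + name.encode("ascii") + b"\n")
        # Wrap residues over several lines of at most `width` characters.
        for start in range(0, len(residues), width):
            fh.write(residues[start:start + width].encode("ascii") + b"\n")


buffer = io.BytesIO()
write_fasta([("seq1", "ATGC"), ("seq2", "ATTA")], buffer)
fasta_bytes = buffer.getvalue()
```

Any object accepting `bytes` in its `write` method works here, such as a file opened with mode `"wb"` or an `io.BytesIO` buffer.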

indexed#

A mapping of names to sequences.

This property can be used to access the sequences of a block by name. An index is created the first time this property is accessed. An error is raised if the block contains duplicate sequence names.

Raises:: KeyError – When attempting to create an index for a block containing duplicate sequence names.

Example

>>> s1 = TextSequence(name="seq1", sequence="ATGC")
>>> s2 = TextSequence(name="seq2", sequence="ATTA")
>>> block = TextSequenceBlock([s1, s2])
>>> block.indexed['seq1'].sequence
'ATGC'
>>> block.indexed['seq3']
Traceback (most recent call last):
...
KeyError: 'seq3'

Added in version 0.11.1.

Type:: Mapping
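The lazy indexing behaviour described for this property can be sketched as follows. `NamedBlock` is a hypothetical class written for illustration; it is not how pyhmmer builds its index, and it ignores the question of invalidating the index when the block changes.

```python
class NamedBlock:
    """Toy block exposing a lazily built name-to-sequence mapping."""

    def __init__(self, records):
        self._records = list(records)  # (name, residues) pairs
        self._index = None             # built on first access only

    @property
    def indexed(self):
        if self._index is None:
            index = {}
            for name, residues in self._records:
                if name in index:
                    # Duplicate names make a name lookup ambiguous.
                    raise KeyError(f"duplicate name: {name!r}")
                index[name] = residues
            self._index = index
        return self._index


block = NamedBlock([("seq1", "ATGC"), ("seq2", "ATTA")])
first_lookup = block.indexed["seq1"]
```

The cost of building the mapping is paid once; later accesses return the cached dictionary.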

class pyhmmer.easel.TextSequenceBlock(SequenceBlock)#

A container for storing TextSequence objects.

Added in version 0.7.0.

Added in version 0.10.4: pickle protocol support.

__init__(iterable=())#: Create a new block from an iterable of text sequences.

append(sequence)#: Append sequence at the end of the block.

clear()#: Remove all sequences from the block.

copy()#: Return a copy of the text sequence block.

Note

The sequences internally referred to by this collection are not copied. Use copy.deepcopy if you also want to duplicate the internal storage of each sequence.

digitize(alphabet)#: Create a block containing sequences from this block in digital mode.
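Conceptually, digitizing replaces each residue letter with a small integer code defined by the alphabet. The sketch below assumes a simplified four-letter DNA alphabet, `"ACGT"`, and uses hypothetical helper functions; real alphabets also define gap and degenerate symbols, which are left out here.

```python
def digitize(text, symbols="ACGT"):
    """Encode a text sequence as integer codes (index in `symbols`)."""
    codes = {symbol: code for code, symbol in enumerate(symbols)}
    try:
        return bytes(codes[residue] for residue in text.upper())
    except KeyError as err:
        raise ValueError(f"symbol not in alphabet: {err.args[0]!r}") from None


def textize(digits, symbols="ACGT"):
    """Decode integer codes back to text, the inverse of digitize."""
    return "".join(symbols[code] for code in digits)


encoded = digitize("ATGC")
decoded = textize(encoded)
```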

extend(iterable)#: Extend block by appending sequences from the iterable.

index(sequence, start=0, stop=9223372036854775807)#

Return the index of the first occurrence of sequence.

Raises:: ValueError – When the block does not contain sequence.
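The default `stop` value is `2**63 - 1`, the largest signed 64-bit integer, which in practice means "search to the end of the block". The signature mirrors `list.index`, so the behaviour can be pictured with a plain list of names (a stand-in, not pyhmmer code):

```python
names = ["seq1", "seq2", "seq1"]

first = names.index("seq1")     # search from the beginning
later = names.index("seq1", 1)  # skip position 0, search from index 1

try:
    names.index("seq3")
except ValueError:
    missing = True              # an absent item raises ValueError
else:
    missing = False
```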

insert(index, sequence)#: Insert a new sequence in the block before index.

largest()#: Return the largest sequence in the block.

pop(index=-1)#: Remove and return a sequence from the block (the last one by default).

remove(sequence)#: Remove the first occurrence of the given sequence.

total_length()#: Compute the total length of the sequence block.

write(fh)#

Write all sequences to a file handle, in FASTA format.

Parameters:: fh (io.IOBase) – A Python file handle, opened in binary mode.

Added in version 0.12.0.

indexed#

A mapping of names to sequences.

This property can be used to access the sequences of a block by name. An index is created the first time this property is accessed. An error is raised if the block contains duplicate sequence names.

Raises:: KeyError – When attempting to create an index for a block containing duplicate sequence names.

Example

>>> s1 = TextSequence(name="seq1", sequence="ATGC")
>>> s2 = TextSequence(name="seq2", sequence="ATTA")
>>> block = TextSequenceBlock([s1, s2])
>>> block.indexed['seq1'].sequence
'ATGC'
>>> block.indexed['seq3']
Traceback (most recent call last):
...
KeyError: 'seq3'

Added in version 0.11.1.

Type:: Mapping

class pyhmmer.easel.DigitalSequenceBlock(SequenceBlock)#

A container for storing DigitalSequence objects.

alphabet#

The biological alphabet shared by all sequences in the collection.

Type:: Alphabet, readonly

Added in version 0.7.0.

Added in version 0.10.4: pickle protocol support.

__init__(alphabet, iterable=())#

Create a new digital sequence block with the given alphabet.

Parameters:

alphabet (Alphabet) – The alphabet to use for all the sequences in the block.
iterable (iterable of DigitalSequence) – An initial collection of digital sequences to add to the block.

Raises:

AlphabetMismatch – When the alphabet of one of the sequences does not match alphabet.

append(sequence)#: Append sequence at the end of the block.

clear()#: Remove all sequences from the block.

copy()#: Return a copy of the digital sequence block.

Note

The sequences internally referred to by this collection are not copied. Use copy.deepcopy if you also want to duplicate the internal storage of each sequence.

extend(iterable)#: Extend block by appending sequences from the iterable.

index(sequence, start=0, stop=9223372036854775807)#

Return the index of the first occurrence of sequence.

Raises:: ValueError – When the block does not contain sequence.

insert(index, sequence)#: Insert a new sequence in the block before index.

largest()#: Return the largest sequence in the block.

pop(index=-1)#: Remove and return a sequence from the block (the last one by default).

remove(sequence)#: Remove the first occurrence of the given sequence.

textize()#: Create a block containing sequences from this block in text mode.

total_length()#: Compute the total length of the sequence block.

translate(genetic_code=GeneticCode(1))#

Translate the sequence block using the given genetic code.

Parameters:

genetic_code (GeneticCode) – The genetic code to use for translating the sequences. If none is provided, the standard translation table (1) is used, which expects DNA sequences.

Returns:

DigitalSequenceBlock – The translation of each sequence from the block, in digital mode.

Raises:

AlphabetMismatch – When the genetic_code expects a different nucleotide alphabet than the one currently used by the sequences in the block.
ValueError – When a sequence from the block could not be translated properly, because a codon could not be recognized, or because the sequence has an invalid length.
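To make the two ValueError conditions concrete, here is a minimal pure-Python sketch of translation with the standard table (1). It works on text rather than digital sequences, the function names are hypothetical, and two choices are assumptions of the sketch rather than documented pyhmmer behaviour: "invalid length" is taken to mean a length that is not a multiple of 3, and stop codons are rendered as `*`.

```python
from itertools import product

# Standard genetic code (translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate one DNA string, reading codons from the first position."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    protein = []
    for start in range(0, len(dna), 3):
        codon = dna[start:start + 3]
        if codon not in CODON_TABLE:
            raise ValueError(f"unrecognized codon: {codon!r}")
        protein.append(CODON_TABLE[codon])
    return "".join(protein)


def translate_block(sequences):
    """Translate every sequence of a collection, preserving order."""
    return [translate(sequence) for sequence in sequences]


proteins = translate_block(["ATGGCC", "TTTTAA"])
```

Here `proteins` is `["MA", "F*"]`; a sequence such as `"ATGG"` or `"ATGNNN"` raises ValueError in this sketch.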