Blocks#
- class pyhmmer.easel.SequenceBlock#
An abstract container for storing Sequence objects.
To pass the target sequences efficiently in Pipeline.search_hmm, an array is allocated so that the inner loop can iterate over the target sequences without having to acquire the GIL for each new sequence (this gave a huge performance boost in v0.4.5). However, there was no way to reuse this array between different queries; some memory recycling was done, but the target sequences had to be indexed for every query. This class allows synchronizing a Python list of Sequence objects with an internal C-contiguous buffer of pointers to ESL_SQ structs that can be used in the HMMER search loop.
Added in version 0.7.0.
- clear()#
Remove all sequences from the block.
- copy()#
Return a copy of the sequence block.
Note
The sequences internally referred to by this collection are not copied. Use copy.deepcopy if you also want to duplicate the internal storage of each sequence.
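The difference between the two kinds of copy can be illustrated with the standard copy module on a plain Python list, used here as a stand-in for a block. This is only a sketch of the semantics described in the note: the Record class is hypothetical and is not part of pyhmmer.

```python
import copy


class Record:
    """Hypothetical stand-in for a sequence object (not a pyhmmer class)."""

    def __init__(self, name, residues):
        self.name = name
        self.residues = list(residues)  # mutable internal storage


block = [Record("seq1", "ATGC"), Record("seq2", "ATTA")]

shallow = copy.copy(block)     # new container, same Record objects
deep = copy.deepcopy(block)    # new container and new Record objects

# Mutating a record through the original container is visible through
# the shallow copy, but not through the deep copy.
block[0].residues.append("N")

shared = shallow[0] is block[0]
independent = deep[0] is not block[0]
```

A shallow copy is cheap and is usually enough when the sequences themselves are not going to be modified.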
- extend(iterable)#
Extend block by appending sequences from the iterable.
- largest()#
Return the largest sequence in the block.
- total_length()#
Compute the total length of the sequence block.
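Assuming "largest" refers to sequence length (the description above does not say so explicitly), these two methods correspond to the following plain-Python computations. The list of strings is a stand-in for a block, not pyhmmer code.

```python
# Stand-in for a block: a plain list of sequence strings.
sequences = ["ATGC", "ATTAGGC", "AT"]

# The longest sequence in the collection.
largest = max(sequences, key=len)

# The sum of the lengths of all sequences.
total_length = sum(len(s) for s in sequences)
```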
- write(fh)#
Write all sequences to a file handle, in FASTA format.
- Parameters:
fh (io.IOBase) – A Python file handle, opened in binary mode.
Added in version 0.12.0.
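A handle opened in binary mode accepts bytes rather than str, which is why a text-mode handle is not what this method expects. The sketch below shows the idea with io.BytesIO. Note that write_fasta is a hypothetical helper written for illustration, not pyhmmer's implementation, and that real FASTA writers typically wrap long sequences over several lines.

```python
import io


def write_fasta(records, fh):
    """Write (name, sequence) pairs to a binary handle in FASTA format."""
    for name, sequence in records:
        # Binary handles require bytes, so each line is encoded first.
        fh.write(f">{name}\n".encode("ascii"))
        fh.write(f"{sequence}\n".encode("ascii"))


buffer = io.BytesIO()
write_fasta([("seq1", "ATGC"), ("seq2", "ATTA")], buffer)
output = buffer.getvalue()
```

The same call would work with a file opened with mode "wb".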
- indexed#
A mapping of names to sequences.
This property can be used to access the sequence of a sequence block by name. An index is created the first time this property is accessed. An error is raised if the block contains duplicate sequence names.
- Raises:
KeyError – When attempting to create an index for a block containing duplicate sequence names.
Example
>>> s1 = TextSequence(name="seq1", sequence="ATGC")
>>> s2 = TextSequence(name="seq2", sequence="ATTA")
>>> block = TextSequenceBlock([s1, s2])
>>> block.indexed['seq1'].sequence
'ATGC'
>>> block.indexed['seq3']
Traceback (most recent call last):
...
KeyError: 'seq3'
Added in version 0.11.1.
- Type:
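The behavior described for indexed (an index built on first access, with an error on duplicate names) can be sketched in pure Python as follows. NamedBlock is a hypothetical class used only to illustrate the pattern. It does not reproduce pyhmmer's implementation and, unlike a real block, it does not handle sequences being added or removed after the index is built.

```python
class NamedBlock:
    """Hypothetical list-backed container with a lazily built name index."""

    def __init__(self, records=()):
        self._records = list(records)  # (name, sequence) pairs
        self._index = None             # built on first access

    @property
    def indexed(self):
        if self._index is None:
            index = {}
            for name, sequence in self._records:
                if name in index:
                    raise KeyError(f"duplicate sequence name: {name!r}")
                index[name] = sequence
            self._index = index
        return self._index


block = NamedBlock([("seq1", "ATGC"), ("seq2", "ATTA")])
first = block.indexed["seq1"]
same_index = block.indexed is block.indexed  # built once, then reused

try:
    NamedBlock([("seq1", "ATGC"), ("seq1", "ATTA")]).indexed
    duplicate_error = False
except KeyError:
    duplicate_error = True
```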
- class pyhmmer.easel.TextSequenceBlock(SequenceBlock)#
A container for storing TextSequence objects.
Added in version 0.7.0.
Added in version 0.10.4: pickle protocol support.
- __init__(iterable=())#
Create a new block from an iterable of text sequences.
- append(sequence)#
Append sequence at the end of the block.
- clear()#
Remove all sequences from the block.
- copy()#
Return a copy of the text sequence block.
Note
The sequences internally referred to by this collection are not copied. Use copy.deepcopy if you also want to duplicate the internal storage of each sequence.
- digitize(alphabet)#
Create a block containing sequences from this block in digital mode.
- extend(iterable)#
Extend block by appending sequences from the iterable.
- index(sequence, start=0, stop=9223372036854775807)#
Return the index of the first occurrence of sequence.
- Raises:
ValueError – When the block does not contain sequence.
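The signature mirrors list.index, and the default stop value, 9223372036854775807, equals 2**63 - 1 (sys.maxsize on typical 64-bit builds), which in effect means "search to the end". Assuming the block follows the same conventions as list.index, which the description above does not state explicitly, the behavior can be sketched on a plain list:

```python
# Stand-in for a block: a plain list, with one repeated item.
block = ["seq_a", "seq_b", "seq_a", "seq_c"]

first = block.index("seq_a")      # search the whole list
second = block.index("seq_a", 1)  # start searching at position 1

try:
    # The stop bound is exclusive, so position 3 is not searched.
    block.index("seq_c", 0, 3)
    found = True
except ValueError:
    found = False

default_stop = 2**63 - 1
```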
- insert(index, sequence)#
Insert a new sequence in the block before index.
- largest()#
Return the largest sequence in the block.
- pop(index=-1)#
Remove and return a sequence from the block (the last one by default).
- remove(sequence)#
Remove the first occurrence of the given sequence.
- total_length()#
Compute the total length of the sequence block.
- write(fh)#
Write all sequences to a file handle, in FASTA format.
- Parameters:
fh (io.IOBase) – A Python file handle, opened in binary mode.
Added in version 0.12.0.
- indexed#
A mapping of names to sequences.
This property can be used to access the sequence of a sequence block by name. An index is created the first time this property is accessed. An error is raised if the block contains duplicate sequence names.
- Raises:
KeyError – When attempting to create an index for a block containing duplicate sequence names.
Example
>>> s1 = TextSequence(name="seq1", sequence="ATGC")
>>> s2 = TextSequence(name="seq2", sequence="ATTA")
>>> block = TextSequenceBlock([s1, s2])
>>> block.indexed['seq1'].sequence
'ATGC'
>>> block.indexed['seq3']
Traceback (most recent call last):
...
KeyError: 'seq3'
Added in version 0.11.1.
- Type:
- class pyhmmer.easel.DigitalSequenceBlock(SequenceBlock)#
A container for storing DigitalSequence objects.
- alphabet#
The biological alphabet shared by all sequences in the collection.
- Type:
Alphabet, readonly
Added in version 0.7.0.
Added in version 0.10.4: pickle protocol support.
- __init__(alphabet, iterable=())#
Create a new digital sequence block with the given alphabet.
- Parameters:
alphabet (Alphabet) – The alphabet to use for all the sequences in the block.
iterable (iterable of DigitalSequence) – An initial collection of digital sequences to add to the block.
- Raises:
AlphabetMismatch – When the alphabet of one of the sequences does not match alphabet.
- append(sequence)#
Append sequence at the end of the block.
- clear()#
Remove all sequences from the block.
- copy()#
Return a copy of the digital sequence block.
Note
The sequences internally referred to by this collection are not copied. Use copy.deepcopy if you also want to duplicate the internal storage of each sequence.
- extend(iterable)#
Extend block by appending sequences from the iterable.
- index(sequence, start=0, stop=9223372036854775807)#
Return the index of the first occurrence of sequence.
- Raises:
ValueError – When the block does not contain sequence.
- insert(index, sequence)#
Insert a new sequence in the block before index.
- largest()#
Return the largest sequence in the block.
- pop(index=-1)#
Remove and return a sequence from the block (the last one by default).
- remove(sequence)#
Remove the first occurrence of the given sequence.
- textize()#
Create a block containing sequences from this block in text mode.
- total_length()#
Compute the total length of the sequence block.
- translate(genetic_code=GeneticCode(1))#
Translate the sequence block using the given genetic code.
- Parameters:
genetic_code (GeneticCode) – The genetic code to use for translating the sequences. If none is provided, the default uses the standard translation table (1) and expects DNA sequences.
- Returns:
DigitalSequenceBlock – The translation of each sequence from the block, in digital mode.
- Raises:
AlphabetMismatch – When the genetic_code expects a different nucleotide alphabet than the one currently used for the sequences in the block.
ValueError – When a sequence from the block could not be translated properly, because a codon could not be recognized, or because the sequence has an invalid length.
See also
DigitalSequence.translate for more information on how ambiguous nucleotides are handled.
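To make the two ValueError cases concrete, here is a minimal pure-Python sketch of codon-by-codon translation with the standard table (1). It works on plain text strings rather than digital sequences, does not handle ambiguous nucleotides, and the translate and translate_block functions are hypothetical helpers, not the pyhmmer implementation.

```python
from itertools import product

# Standard genetic code (translation table 1), codons ordered with
# bases T, C, A, G and the third position varying fastest.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna):
    """Translate one DNA string, raising ValueError on invalid input."""
    if len(dna) % 3 != 0:
        raise ValueError(f"invalid sequence length: {len(dna)}")
    try:
        return "".join(
            CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3)
        )
    except KeyError as err:
        raise ValueError(f"unrecognized codon: {err.args[0]!r}") from None


def translate_block(sequences):
    """Translate every sequence of a list, preserving order."""
    return [translate(dna) for dna in sequences]


proteins = translate_block(["ATGGCC", "ATGTAA"])
```

In this sketch stop codons are rendered as "*"; how they are represented in a translated DigitalSequenceBlock is not covered here.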
- write(fh)#
Write all sequences to a file handle, in FASTA format.
- Parameters:
fh (io.IOBase) – A Python file handle, opened in binary mode.
Added in version 0.12.0.
- indexed#
A mapping of names to sequences.
This property can be used to access the sequence of a sequence block by name. An index is created the first time this property is accessed. An error is raised if the block contains duplicate sequence names.
- Raises:
KeyError – When attempting to create an index for a block containing duplicate sequence names.
Example
>>> s1 = TextSequence(name="seq1", sequence="ATGC")
>>> s2 = TextSequence(name="seq2", sequence="ATTA")
>>> block = TextSequenceBlock([s1, s2])
>>> block.indexed['seq1'].sequence
'ATGC'
>>> block.indexed['seq3']
Traceback (most recent call last):
...
KeyError: 'seq3'
Added in version 0.11.1.
- Type: