Miscellaneous
- class pyhmmer.easel.Alphabet
A biological alphabet, including additional marker symbols.
This type is used to share an alphabet between several objects in the easel and plan7 modules. Reference counting lets the same instance be shared everywhere, instead of reallocating memory every time an alphabet is needed.
Use the factory class methods to obtain a default Alphabet for one of the three standard biological alphabets:
>>> dna = Alphabet.dna()
>>> rna = Alphabet.rna()
>>> aa = Alphabet.amino()
- classmethod amino()
Create a default amino-acid alphabet.
- decode(sequence)
Decode a raw digital sequence into its textual representation.
- Parameters:
sequence (object, buffer-like) – A raw sequence in digital format. Any object implementing the buffer protocol (like bytearray, VectorU8, etc.) may be given.
- Returns:
str – A raw sequence in textual format.
Example
>>> alphabet = easel.Alphabet.amino()
>>> dseq = easel.VectorU8([0, 4, 2, 17, 3, 13, 0, 0, 5])
>>> alphabet.decode(dseq)
'AFDVEQAAG'
Added in version 0.6.3.
- classmethod dna()
Create a default DNA alphabet.
- encode(sequence)
Encode a raw text sequence into its digital representation.
- Parameters:
sequence (str) – A raw sequence in text format.
- Returns:
VectorU8 – A raw sequence in digital format.
Example
>>> alphabet = easel.Alphabet.dna()
>>> alphabet.encode("ACGT")
VectorU8([0, 1, 2, 3])
Added in version 0.6.3.
- classmethod rna()
Create a default RNA alphabet.
- K
The alphabet size, counting only actual alphabet symbols.
Example
>>> Alphabet.dna().K
4
>>> Alphabet.amino().K
20
- Type:
int
- Kp
The complete alphabet size, including marker symbols.
Example
>>> Alphabet.dna().Kp
18
>>> Alphabet.amino().Kp
29
- Type:
int
- symbols
The symbols composing the alphabet.
Example
>>> Alphabet.dna().symbols
'ACGT-RYMKSWHBVDN*~'
>>> Alphabet.rna().symbols
'ACGU-RYMKSWHBVDN*~'
- Type:
str
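The relationship between symbols, K, Kp and the digital codes can be sketched in plain Python. This is an illustration only, not pyhmmer's implementation: it assumes that the digital code of a residue is its index in the symbols string, which is consistent with the encode example for the DNA alphabet. The helper names encode_sketch and decode_sketch are hypothetical.

```python
# Symbols of the default DNA alphabet, as reported by Alphabet.dna().symbols.
DNA_SYMBOLS = "ACGT-RYMKSWHBVDN*~"

# K counts only the canonical residues; Kp counts every symbol,
# including the gap, degenerate and marker symbols.
K = 4
Kp = len(DNA_SYMBOLS)


def encode_sketch(text, symbols=DNA_SYMBOLS):
    """Map each character to its index in the symbols string (assumed encoding)."""
    return [symbols.index(char) for char in text]


def decode_sketch(codes, symbols=DNA_SYMBOLS):
    """Map each digital code back to its textual symbol."""
    return "".join(symbols[code] for code in codes)


digital = encode_sketch("ACGT")
print(digital)                 # [0, 1, 2, 3]
print(decode_sketch(digital))  # ACGT
print(K, Kp)                   # 4 18
```

Under this assumption, codes below K are canonical residues, while codes from K up to Kp - 1 are gap, degenerate or marker symbols.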
- class pyhmmer.easel.GeneticCode
A genetic code table for translation.
Added in version 0.7.2.
- __init__(translation_table=1, *, nucleotide_alphabet=None, amino_alphabet=None)
Create a new genetic code for translating nucleotide sequences.
- Parameters:
translation_table (int) – The translation table to use. Check the Wikipedia page listing all genetic codes for the available values.
nucleotide_alphabet (Alphabet) – The nucleotide alphabet from which to translate the sequence.
amino_alphabet (Alphabet) – The target alphabet into which to translate the sequence.
- translate(sequence)
Translate a raw nucleotide sequence into a protein.
- Parameters:
sequence (object, buffer-like) – A raw sequence in digital format. Any object implementing the buffer protocol (like bytearray, VectorU8, etc.) may be given.
- Returns:
VectorU8 – The translation of the input sequence, as a raw digital sequence.
- Raises:
ValueError – When sequence could not be translated properly, because a codon could not be recognized, or because the sequence has an invalid length.
Note
The translation of a DNA/RNA codon supports ambiguous codons. If the amino acid is unambiguous despite codon ambiguity, the correct amino acid is still determined: GGR translates as Gly, UUY as Phe, etc. If there is no single unambiguous amino acid translation, the codon is translated as X. Ambiguous amino acids (such as J or B) are never produced.
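The rule described in the note can be sketched in plain Python. This is a minimal illustration, not the code pyhmmer or Easel actually runs. It assumes the standard genetic code (translation table 1) and the usual IUPAC degenerate nucleotide codes; the function name translate_codon_sketch is hypothetical. It expands an ambiguous codon into every concrete codon it may stand for, and keeps the amino acid only when all of them agree.

```python
from itertools import product

# Standard genetic code (translation table 1), codons enumerated over T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): AMINO[i]
    for i, codon in enumerate(product(BASES, repeat=3))
}

# IUPAC nucleotide codes and the concrete bases each one may stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}


def translate_codon_sketch(codon):
    """Translate one codon, returning 'X' when the ambiguity cannot be resolved."""
    codon = codon.upper().replace("U", "T")  # treat RNA codons like DNA codons
    candidates = {
        CODON_TABLE["".join(bases)]
        for bases in product(*(IUPAC[symbol] for symbol in codon))
    }
    return candidates.pop() if len(candidates) == 1 else "X"


print(translate_codon_sketch("GGR"))  # G (Gly): GGA and GGG both encode glycine
print(translate_codon_sketch("UUY"))  # F (Phe): UUU and UUC both encode phenylalanine
print(translate_codon_sketch("AAN"))  # X: AAU/AAC encode Asn, AAA/AAG encode Lys
```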
- translation_table
The translation table in use.
Can be set manually to a different number to change the translation table for the current GeneticCode object.
- Type:
int
- class pyhmmer.easel.Randomness
A portable, thread-safe random number generator.
Methods with an implementation in Easel are named after the equivalent methods of random.Random.
Added in version 0.4.2.
- __init__(seed=None, fast=False)
Create a new random number generator with the given seed.
- Parameters:
seed (int) – The seed to initialize the generator with. If 0 or None is given, an arbitrary seed will be chosen using the system clock.
fast (bool) – If True, use a linear congruential generator (LCG), which is low quality and should only be used for integration with legacy code. With False, use the Mersenne Twister MT19937 algorithm instead.
- copy()
Return a copy of the random number generator in the same exact state.
- getstate()
Get a tuple containing the current state.
- normalvariate(mu, sigma)
Generate a Gaussian-distributed sample.
- random()
Generate a uniform random deviate on \(\left[ 0, 1 \right)\).
- seed(n=None)
Reinitialize the random number generator with the given seed.
- setstate(state)
Restore the state of the random number generator.
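Since these methods are named after those of random.Random, the save-and-restore pattern can be sketched with the standard library class. The snippet below uses random.Random only; it assumes that the Randomness methods of the same name behave analogously, and the contents of the state tuple may well differ between the two classes.

```python
import random

# Seed the generator explicitly so the run is reproducible.
rng = random.Random(42)

# Save the state, draw a few samples, then rewind and draw again.
state = rng.getstate()
first = [rng.random() for _ in range(3)]
rng.setstate(state)
second = [rng.random() for _ in range(3)]
print(first == second)  # True: restoring the state replays the same samples

# Reseeding with the same value also restarts the same sequence.
rng.seed(42)
third = [rng.random() for _ in range(3)]
print(first == third)   # True

# Draw a Gaussian-distributed sample with mean 0 and standard deviation 1.
sample = rng.normalvariate(0.0, 1.0)
```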
- class pyhmmer.easel.SSIReader
A read-only handler for a sequence/subsequence index file.
- class Entry(fd, record_offset, data_offset, record_length)
- data_offset
Alias for field number 2
- fd
Alias for field number 0
- record_length
Alias for field number 3
- record_offset
Alias for field number 1
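The "Alias for field number" descriptions suggest that Entry is a standard named tuple. Assuming so, its layout can be sketched with collections.namedtuple. The tuple below is a stand-in built for illustration, not the class exported by pyhmmer, and the offsets are made-up values.

```python
from collections import namedtuple

# Stand-in with the same field order as the documented Entry signature.
Entry = namedtuple("Entry", ["fd", "record_offset", "data_offset", "record_length"])

# Illustrative values only: a record in file 0 starting at byte 128.
entry = Entry(fd=0, record_offset=128, data_offset=140, record_length=60)

# Each attribute is an alias for a positional field.
print(entry.fd == entry[0])             # True
print(entry.record_offset == entry[1])  # True
print(entry.data_offset == entry[2])    # True
print(entry.record_length == entry[3])  # True

# A named tuple can also be unpacked like a plain tuple.
fd, record_offset, data_offset, record_length = entry
```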
- __init__(file)
Create a new SSI file reader for the file at the given location.
- Parameters:
file (str, bytes or os.PathLike) – The path to a sequence/subsequence index file to read.
- close()
Close the SSI file reader.
- class pyhmmer.easel.SSIWriter
A writer for sequence/subsequence index files.
- __init__(file, exclusive=False)
Create a new SSI file writer for the file at the given location.
- Parameters:
file (str, bytes or os.PathLike) – The path to a sequence/subsequence index file to write.
exclusive (bool) – If True, fail rather than write to a file that already exists.
- Raises:
FileNotFoundError – When the path to the file cannot be resolved.
FileExistsError – When the file exists and exclusive is True.
- add_alias(alias, key)
Make alias an alias of key in the index.
- add_file(filename, format=0)
Add a new file to the index.
- Parameters:
filename (str, bytes or os.PathLike) – The name of the file to register.
format (int) – A format code to associate with the file, or 0 by default.
- Returns:
int – The filehandle associated with the new indexed file.
- add_key(key, fd, record_offset, data_offset=0, record_length=0)
Add a new entry to the index with the given key.
- close()
Close the SSI file writer.