Alignments#
- class pyhmmer.easel.MSA#
An abstract alignment of multiple sequences.
Hint
Use len(msa) to get the number of columns in the alignment, and len(msa.sequences) to get the number of sequences (i.e. the number of rows).
- checksum()#
Calculate a 32-bit checksum for the multiple sequence alignment.
- write(fh, format)#
Write the multiple sequence alignment to a file handle.
- Parameters:
Added in version 0.3.0.
- names#
The name of each sequence in the alignment.
Every sequence in the alignment is required to have a name, so no member of the tuple will ever be None.
Example
>>> s1 = TextSequence(name=b"seq1", sequence="ATGC")
>>> s2 = TextSequence(name=b"seq2", sequence="ATGC")
>>> msa = TextMSA(name=b"msa", sequences=[s1, s2])
>>> msa.names
(b'seq1', b'seq2')
Added in version 0.4.8.
- class pyhmmer.easel.TextMSA(MSA)#
A multiple sequence alignment stored in text mode.
- __init__(name=None, description=None, accession=None, sequences=None, author=None)#
Create a new text-mode alignment with the given sequences.
- Parameters:
name (bytes, optional) – The name of the alignment, if any.
description (bytes, optional) – The description of the alignment, if any.
accession (bytes, optional) – The accession of the alignment, if any.
sequences (collection of TextSequence) – The sequences to store in the multiple sequence alignment. All sequences must have the same length. They also need to have distinct names.
author (bytes, optional) – The author of the alignment, often used to record the aligner it was created with.
- Raises:
ValueError – When the alignment cannot be created from the given sequences.
TypeError – When sequences is not an iterable of TextSequence objects.
Example
>>> s1 = TextSequence(name=b"seq1", sequence="ATGC")
>>> s2 = TextSequence(name=b"seq2", sequence="ATGC")
>>> msa = TextMSA(name=b"msa", sequences=[s1, s2])
>>> len(msa)
4
Changed in version 0.3.0: Allow creating an alignment from an iterable of TextSequence.
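The two documented constraints on sequences (equal lengths, distinct names) can be checked before building an alignment. The sketch below is only an illustration: it uses plain (name, sequence) tuples as stand-ins for TextSequence objects, and check_rows is a hypothetical helper that is not part of pyhmmer.

```python
def check_rows(rows):
    """Validate (name, sequence) pairs against the documented constraints.

    Hypothetical helper, not a pyhmmer function: raises ValueError when
    the sequences differ in length or when two rows share a name.
    """
    rows = list(rows)
    lengths = {len(seq) for _, seq in rows}
    if len(lengths) > 1:
        raise ValueError(f"sequences have different lengths: {sorted(lengths)}")
    names = [name for name, _ in rows]
    if len(set(names)) != len(names):
        raise ValueError("sequence names are not distinct")
    return rows


# Two rows of equal length with distinct names pass the check.
rows = check_rows([(b"seq1", "ATGC"), (b"seq2", "ATGC")])
print(len(rows), len(rows[0][1]))
```

Running such a check first gives a more specific error message than a generic ValueError raised at construction time, at the cost of iterating over the rows once more.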
- copy()#
Duplicate the text sequence alignment, and return the copy.
- digitize(alphabet)#
Convert the text alignment to a digital alignment using alphabet.
- Returns:
DigitalMSA – An alignment in digital mode containing the same sequences digitized with alphabet.
- Raises:
ValueError – When the text sequence contains invalid characters that cannot be converted according to alphabet.symbols.
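To make the failure mode concrete, here is a toy sketch of what digitizing against a symbol string can look like. It is a simplification under stated assumptions, not the library's implementation: the digitize function and the SYMBOLS string below are hypothetical, and a real alphabet typically carries more symbols than this five-letter one.

```python
def digitize(text, symbols):
    """Map each character of `text` to its index in `symbols`.

    Toy stand-in for digitization: raises ValueError when a character
    is not present in `symbols`.
    """
    codes = {sym: idx for idx, sym in enumerate(symbols)}
    try:
        return [codes[char] for char in text.upper()]
    except KeyError as err:
        raise ValueError(f"invalid character: {err.args[0]!r}") from None


SYMBOLS = "ACGT-"  # assumed toy alphabet, with "-" as the gap symbol
print(digitize("AT-GC", SYMBOLS))  # prints [0, 3, 4, 2, 1]
```

A character outside the symbol string, such as "X" here, triggers the ValueError path, which mirrors the documented behaviour for text that cannot be converted.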
- alignment#
A view of the aligned sequences as strings.
This property gives access to the aligned sequences, including gap characters, so that they can be displayed or processed column by column.
Examples
Use TextMSA.alignment to display an alignment in text format:
>>> for name, aligned in zip(luxc.names, luxc.alignment):
...     print(name, " ", aligned[:40], "...")
b'Q9KV99.1'   LANQPLEAILGLINEARKSWSST------------PELDP ...
b'Q2WLE3.1'   IYSYPSEAMIEIINEYSKILCSD------------RKFLS ...
b'Q97GS8.1'   VHDIKTEETIDLLDRCAKLWLDDNYSKK--HIETLAQITN ...
b'Q3WCI9.1'   LLNVPLKEIIDFLVETGERIRDPRNTFMQDCIDRMAGTHV ...
b'P08639.1'   LNDLNINNIINFLYTTGQRWKSEEYSRRRAYIRSLITYLG ...
...
Use the splat operator (*) in combination with the zip builtin to iterate over the columns of an alignment:
>>> for idx, col in enumerate(zip(*luxc.alignment)):
...     print(idx+1, col)
1 ('L', 'I', 'V', 'L', 'L', ...)
2 ('A', 'Y', 'H', 'L', 'N', ...)
...
Added in version 0.4.8.
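Column-wise iteration is mostly useful for per-column statistics. The sketch below assumes a list of equal-length strings with "-" as the gap character, standing in for the value of TextMSA.alignment; the rows and the column_summary helper are hypothetical and only illustrate the zip-based idiom.

```python
from collections import Counter

# Assumed stand-in for `TextMSA.alignment`: equal-length rows with "-" gaps.
rows = ["LANQ-", "IYSQ-", "VHDQK"]


def column_summary(rows, gap="-"):
    """Return a (most common residue, gap count) pair for each column."""
    summary = []
    for column in zip(*rows):  # each `column` is a tuple of characters
        residues = [char for char in column if char != gap]
        # Fall back to the gap character when the column is all gaps.
        top = Counter(residues).most_common(1)[0][0] if residues else gap
        summary.append((top, column.count(gap)))
    return summary


for idx, (residue, gaps) in enumerate(column_summary(rows), start=1):
    print(idx, residue, gaps)
```

Because zip stops at the shortest input, this idiom relies on all rows having the same length, which an alignment guarantees.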
- sequences#
A view of the sequences in the alignment.
This property lets you access the individual sequences in the multiple sequence alignment as TextSequence instances.
Example
Query the number of sequences in the alignment with len, or access individual members via indexing notation:
>>> s1 = TextSequence(name=b"seq1", sequence="ATGC")
>>> s2 = TextSequence(name=b"seq2", sequence="ATGC")
>>> msa = TextMSA(name=b"msa", sequences=[s1, s2])
>>> len(msa.sequences)
2
>>> msa.sequences[0].name
b'seq1'
Caution
Sequences in the list are copies, so editing their attributes will have no effect on the alignment:
>>> msa.sequences[0].name
b'seq1'
>>> msa.sequences[0].name = b"seq1bis"
>>> msa.sequences[0].name
b'seq1'
Support for this feature may be added in a future version; for now, it can be worked around by explicitly assigning the updated object back to its position:
>>> seq = msa.sequences[0]
>>> seq.name = b"seq1bis"
>>> msa.sequences[0] = seq
>>> msa.sequences[0].name
b'seq1bis'
Added in version 0.3.0.
- Type:
_TextMSASequences
- class pyhmmer.easel.DigitalMSA(MSA)#
A multiple sequence alignment stored in digital mode.
- __init__(alphabet, name=None, description=None, accession=None, sequences=None, author=None)#
Create a new digital-mode alignment with the given sequences.
- Parameters:
alphabet (Alphabet) – The alphabet of the aligned sequences.
name (bytes, optional) – The name of the alignment, if any.
description (bytes, optional) – The description of the alignment, if any.
accession (bytes, optional) – The accession of the alignment, if any.
sequences (iterable of DigitalSequence) – The sequences to store in the multiple sequence alignment. All sequences must have the same length and alphabet. They also need to have distinct names set.
author (bytes, optional) – The author of the alignment, often used to record the aligner it was created with.
Changed in version 0.3.0: Allow creating an alignment from an iterable of DigitalSequence.
- copy()#
Duplicate the digital sequence alignment, and return the copy.
- textize()#
Convert the digital alignment to a text alignment.
- Returns:
TextMSA
– A copy of the alignment in text-mode.
Added in version 0.3.0.
- sequences#
A view of the sequences in the alignment.
This property lets you access the individual sequences in the multiple sequence alignment as DigitalSequence instances.
See also
The documentation for the TextMSA.sequences property, which contains some additional information.
Added in version 0.3.0.
- Type:
_DigitalMSASequences