Alignments

class pyhmmer.easel.MSA

An abstract alignment of multiple sequences.

Hint

Use len(msa) to get the number of columns in the alignment, and len(msa.sequences) to get the number of sequences (i.e. the number of rows).

checksum()

Calculate a 32-bit checksum for the multiple sequence alignment.

write(fh, format)

Write the multiple sequence alignment to a file handle.

Parameters:
  • fh (io.IOBase) – A Python file handle, opened in binary mode.

  • format (str) – The name of the multiple sequence alignment file format to use.

Added in version 0.3.0.

accession

The accession of the alignment, if any.

Type:

bytes or None

author

The author of the alignment, if any.

Type:

bytes or None

description

The description of the alignment, if any.

Type:

bytes or None

name

The name of the alignment, if any.

Type:

bytes or None

names

The name of each sequence in the alignment.

Every sequence in the alignment is required to have a name, so no member of the tuple will ever be None.

Example

>>> s1 = TextSequence(name=b"seq1", sequence="ATGC")
>>> s2 = TextSequence(name=b"seq2", sequence="ATGC")
>>> msa = TextMSA(name=b"msa", sequences=[s1, s2])
>>> msa.names
(b'seq1', b'seq2')

Added in version 0.4.8.

Type:

tuple of bytes

class pyhmmer.easel.TextMSA(MSA)

A multiple sequence alignment stored in text mode.

__init__(name=None, description=None, accession=None, sequences=None, author=None)

Create a new text-mode alignment with the given sequences.

Parameters:
  • name (bytes, optional) – The name of the alignment, if any.

  • description (bytes, optional) – The description of the alignment, if any.

  • accession (bytes, optional) – The accession of the alignment, if any.

  • sequences (collection of TextSequence) – The sequences to store in the multiple sequence alignment. All sequences must have the same length. They also need to have distinct names.

  • author (bytes, optional) – The author of the alignment, often used to record the aligner it was created with.

Raises:

Example

>>> s1 = TextSequence(name=b"seq1", sequence="ATGC")
>>> s2 = TextSequence(name=b"seq2", sequence="ATGC")
>>> msa = TextMSA(name=b"msa", sequences=[s1, s2])
>>> len(msa)
4

Changed in version 0.3.0: Allow creating an alignment from an iterable of TextSequence.

copy()

Duplicate the text sequence alignment, and return the copy.

digitize(alphabet)

Convert the text alignment to a digital alignment using alphabet.

Returns:

DigitalMSA – An alignment in digital mode containing the same sequences digitized with alphabet.

Raises:

ValueError – When the text sequence contains invalid characters that cannot be converted according to alphabet.symbols.

alignment

A view of the aligned sequences as strings.

This property gives access to the aligned sequences, including gap characters, so that they can be displayed or processed column by column.

Examples

Use TextMSA.alignment to display an alignment in text format:

>>> for name, aligned in zip(luxc.names, luxc.alignment):
...     print(name, " ", aligned[:40], "...")
b'Q9KV99.1'   LANQPLEAILGLINEARKSWSST------------PELDP ...
b'Q2WLE3.1'   IYSYPSEAMIEIINEYSKILCSD------------RKFLS ...
b'Q97GS8.1'   VHDIKTEETIDLLDRCAKLWLDDNYSKK--HIETLAQITN ...
b'Q3WCI9.1'   LLNVPLKEIIDFLVETGERIRDPRNTFMQDCIDRMAGTHV ...
b'P08639.1'   LNDLNINNIINFLYTTGQRWKSEEYSRRRAYIRSLITYLG ...
...

Use the splat operator (*) in combination with the zip builtin to iterate over the columns of an alignment:

>>> for idx, col in enumerate(zip(*luxc.alignment)):
...     print(idx+1, col)
1 ('L', 'I', 'V', 'L', 'L', ...)
2 ('A', 'Y', 'H', 'L', 'N', ...)
...

Added in version 0.4.8.

Type:

tuple of str
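As a further sketch of column-wise processing, the snippet below computes the fraction of gap characters in each column. It uses a plain tuple of strings as a stand-in for TextMSA.alignment so that it runs without an alignment file; the helper name gap_fractions, the gap characters and the row contents are illustrative and not part of pyhmmer.

```python
def gap_fractions(rows, gap_chars="-."):
    """Return the fraction of gap characters in each alignment column."""
    fractions = []
    for column in zip(*rows):  # one tuple of residues per column
        gaps = sum(1 for residue in column if residue in gap_chars)
        fractions.append(gaps / len(column))
    return fractions

# Stand-in for `msa.alignment`: one aligned string per sequence.
rows = ("AT-C", "A--C", "ATGC", "AT-C")
fractions = gap_fractions(rows)
```

With a real alignment, the same helper could be called as gap_fractions(msa.alignment).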

sequences

A view of the sequences in the alignment.

This property lets you access the individual sequences in the multiple sequence alignment as TextSequence instances.

Example

Query the number of sequences in the alignment with len, or access individual members via indexing notation:

>>> s1 = TextSequence(name=b"seq1", sequence="ATGC")
>>> s2 = TextSequence(name=b"seq2", sequence="ATGC")
>>> msa = TextMSA(name=b"msa", sequences=[s1, s2])
>>> len(msa.sequences)
2
>>> msa.sequences[0].name
b'seq1'

Caution

Sequences in the list are copies, so editing their attributes will have no effect on the alignment:

>>> msa.sequences[0].name
b'seq1'
>>> msa.sequences[0].name = b"seq1bis"
>>> msa.sequences[0].name
b'seq1'

Support for this feature may be added in a future version. For now, it can be worked around by explicitly assigning the updated object back to the alignment:

>>> seq = msa.sequences[0]
>>> seq.name = b"seq1bis"
>>> msa.sequences[0] = seq
>>> msa.sequences[0].name
b'seq1bis'

Added in version 0.3.0.

Type:

_TextMSASequences

class pyhmmer.easel.DigitalMSA(MSA)

A multiple sequence alignment stored in digital mode.

alphabet

The biological alphabet used to encode this sequence alignment to digits.

Type:

Alphabet

__init__(alphabet, name=None, description=None, accession=None, sequences=None, author=None)

Create a new digital-mode alignment with the given sequences.

Parameters:
  • alphabet (Alphabet) – The alphabet of the aligned sequences.

  • name (bytes, optional) – The name of the alignment, if any.

  • description (bytes, optional) – The description of the alignment, if any.

  • accession (bytes, optional) – The accession of the alignment, if any.

  • sequences (iterable of DigitalSequence) – The sequences to store in the multiple sequence alignment. All sequences must have the same length and alphabet. They also need to have distinct names set.

  • author (bytes, optional) – The author of the alignment, often used to record the aligner it was created with.

Changed in version 0.3.0: Allow creating an alignment from an iterable of DigitalSequence.

copy()

Duplicate the digital sequence alignment, and return the copy.

textize()

Convert the digital alignment to a text alignment.

Returns:

TextMSA – A copy of the alignment in text mode.

Added in version 0.3.0.

sequences

A view of the sequences in the alignment.

This property lets you access the individual sequences in the multiple sequence alignment as DigitalSequence instances.

See also

The documentation for the TextMSA.sequences property, which contains some additional information.

Added in version 0.3.0.

Type:

_DigitalMSASequences