Re: Sequence handling in Biopython

Nucleotide sequences and (reverse) complements:
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import IUPAC
>>> my_seq = Seq("GATCGATGGGCCTATATAGGATCGAAAATCGC", IUPAC.unambiguous_dna)
>>> my_seq
Seq('GATCGATGGGCCTATATAGGATCGAAAATCGC', IUPACUnambiguousDNA())
>>> my_seq.complement()
Seq('CTAGCTACCCGGATATATCCTAGCTTTTAGCG', IUPACUnambiguousDNA())
>>> my_seq.reverse_complement()
Seq('GCGATTTTCGATCCTATATAGGCCCATCGATC', IUPACUnambiguousDNA())
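
As a rough illustration of what complement() and reverse_complement() compute, here is a minimal pure-Python sketch for unambiguous DNA. It is not Biopython's implementation; the two functions below are plain helpers written for this example and only understand the letters A, C, G and T.

```python
# Pure-Python sketch; handles only the four unambiguous DNA letters.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(dna: str) -> str:
    """Swap each base for its pairing partner (A<->T, C<->G)."""
    return dna.translate(_COMPLEMENT)


def reverse_complement(dna: str) -> str:
    """Complement the sequence, then read it in the opposite direction."""
    return complement(dna)[::-1]


my_seq = "GATCGATGGGCCTATATAGGATCGAAAATCGC"
print(complement(my_seq))
print(reverse_complement(my_seq))
```

The printed strings should match the sequences in the Seq output above.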


IP location: Guangdong · Post #16 · 2016-12-29 20:23
    Transcription:
    >>> from Bio.Seq import Seq
    >>> from Bio.Alphabet import IUPAC
    >>> coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", IUPAC.unambiguous_dna)
    >>> coding_dna
    Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG', IUPACUnambiguousDNA())
    >>> template_dna = coding_dna.reverse_complement()
    >>> template_dna
    Seq('CTATCGGGCACCCTTTCAGCGGCCCATTACAATGGCCAT', IUPACUnambiguousDNA())
    >>> coding_dna
    Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG', IUPACUnambiguousDNA())
    >>> messenger_rna = coding_dna.transcribe()
    >>> messenger_rna
    Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG', IUPACUnambiguousRNA())


    IP location: Guangdong · Post #17 · 2016-12-29 20:38
      Back-transcription:
      >>> from Bio.Seq import Seq
      >>> from Bio.Alphabet import IUPAC
      >>> messenger_rna = Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG", IUPAC.unambiguous_rna)
      >>> messenger_rna
      Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG', IUPACUnambiguousRNA())
      >>> messenger_rna.back_transcribe()
      Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG', IUPACUnambiguousDNA())
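
For the coding strand, transcription and back-transcription as shown in these posts amount to swapping T and U. The sketch below, in plain Python rather than Biopython, also shows the biologically more literal route from the template strand. The function names are chosen for this example only.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcribe(coding_dna: str) -> str:
    """Coding strand -> mRNA: same letters, with T replaced by U."""
    return coding_dna.replace("T", "U")


def back_transcribe(rna: str) -> str:
    """mRNA -> coding-strand DNA: replace U with T."""
    return rna.replace("U", "T")


def transcribe_template(template_dna: str) -> str:
    """Template strand -> mRNA: reverse-complement first, then T -> U."""
    return transcribe(template_dna.translate(_COMPLEMENT)[::-1])


coding_dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
template_dna = "CTATCGGGCACCCTTTCAGCGGCCCATTACAATGGCCAT"
messenger_rna = transcribe(coding_dna)
print(messenger_rna)
print(transcribe_template(template_dna) == messenger_rna)
print(back_transcribe(messenger_rna) == coding_dna)
```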


      IP location: Guangdong · Post #18 · 2016-12-29 20:40
        Translation:
        >>> from Bio.Seq import Seq
        >>> from Bio.Alphabet import IUPAC
        >>> messenger_rna = Seq("AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG", IUPAC.unambiguous_rna)
        >>> messenger_rna
        Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG', IUPACUnambiguousRNA())
        >>> messenger_rna.translate()
        Seq('MAIVMGR*KGAR*', HasStopCodon(IUPACProtein(), '*'))
        >>> coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG", IUPAC.unambiguous_dna)
        >>> coding_dna
        Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG', IUPACUnambiguousDNA())
        >>> coding_dna.translate()
        Seq('MAIVMGR*KGAR*', HasStopCodon(IUPACProtein(), '*'))


        IP location: Guangdong · Post #19 · 2016-12-29 20:42
          Using a different genetic code, and translating only up to the first in-frame stop codon:
          >>> coding_dna.translate(table="Vertebrate Mitochondrial")
          Seq('MAIVMGRWKGAR*', HasStopCodon(IUPACProtein(), '*'))
          >>> coding_dna.translate(table=2)
          Seq('MAIVMGRWKGAR*', HasStopCodon(IUPACProtein(), '*'))
          >>> coding_dna.translate()
          Seq('MAIVMGR*KGAR*', HasStopCodon(IUPACProtein(), '*'))
          >>> coding_dna.translate(to_stop=True)
          Seq('MAIVMGR', IUPACProtein())
          >>> coding_dna.translate(table=2)
          Seq('MAIVMGRWKGAR*', HasStopCodon(IUPACProtein(), '*'))
          >>> coding_dna.translate(table=2, to_stop=True)
          Seq('MAIVMGRWKGAR', IUPACProtein())
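
To make the effect of table and to_stop concrete, here is a small pure-Python sketch of codon-by-codon translation. It is an illustration under simplifying assumptions (unambiguous DNA only, trailing partial codons ignored), not Biopython's implementation. The names STANDARD, VERTEBRATE_MITO and translate are defined here for the example.

```python
BASES = "TCAG"
# Amino acids of the standard code (NCBI table 1), codons ordered TTT, TTC, TTA, ...
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD = dict(zip(CODONS, STANDARD_AAS))
# Vertebrate mitochondrial code (table 2): four codons differ from table 1.
VERTEBRATE_MITO = {**STANDARD, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}


def translate(dna, table=STANDARD, to_stop=False):
    """Translate complete codons; with to_stop, halt before the first stop."""
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        amino_acid = table[dna[i:i + 3]]
        if amino_acid == "*" and to_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


coding_dna = "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"
print(translate(coding_dna))
print(translate(coding_dna, to_stop=True))
print(translate(coding_dna, table=VERTEBRATE_MITO))
print(translate(coding_dna, table=VERTEBRATE_MITO, to_stop=True))
```

The only difference between the two tables for this sequence is the TGA codon, which is a stop in the standard code and tryptophan (W) in the vertebrate mitochondrial code.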


          IP location: Guangdong · Post #20 · 2016-12-29 20:46
            Specifying your own stop symbol:
            >>> coding_dna.translate(table=2, stop_symbol="@")
            Seq('MAIVMGRWKGAR@', HasStopCodon(IUPACProtein(), '@'))


            IP location: Guangdong · Post #21 · 2016-12-29 20:49
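              In the bacterial genetic code, GTG is a valid start codon. It normally encodes valine, but when it acts as a start codon it is translated as methionine. This is what happens when you tell Biopython that your sequence is a complete CDS: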
              >>> from Bio.Seq import Seq
              >>> from Bio.Alphabet import generic_dna
              >>> gene = Seq("GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCA" + \
              ... "GCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGAT" + \
              ... "AATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCGACCACGGCTGGTGGAAACAACAT" + \
              ... "TATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCAT" + \
              ... "AAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA",
              ... generic_dna)
              >>> gene.translate(table="Bacterial")
              Seq('VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HHR*', HasStopCodon(ExtendedIUPACProtein(), '*'))
              >>> gene.translate(table="Bacterial", to_stop=True)
              Seq('VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HHR', ExtendedIUPACProtein())
              >>> gene.translate(table="Bacterial", cds=True)
              Seq('MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HHR', ExtendedIUPACProtein())
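
The checks implied by treating a sequence as a complete CDS can be sketched in plain Python as follows. This is an assumption-laden illustration, not Biopython's code: translate_cds is a name made up for this example, the start codon set is the one NCBI lists for table 11, and the error messages are invented.

```python
BASES = "TCAG"
# Table 11 assigns the same amino acids as the standard code.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE_11 = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AAS))
# Start codons listed by NCBI for table 11 (bacterial, archaeal, plant plastid).
START_CODONS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def translate_cds(dna):
    """Translate a complete CDS: start codon -> M, trailing stop dropped."""
    if not dna or len(dna) % 3 != 0:
        raise ValueError("length is not a positive multiple of three")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    if codons[0] not in START_CODONS:
        raise ValueError("first codon is not a start codon")
    if TABLE_11[codons[-1]] != "*":
        raise ValueError("final codon is not a stop codon")
    body = [TABLE_11[codon] for codon in codons[1:-1]]
    if "*" in body:
        raise ValueError("extra in-frame stop codon")
    # Whatever the start codon normally encodes, it is read as methionine.
    return "M" + "".join(body)


# First four codons of the gene above plus a stop codon (shortened for the example).
short_cds = "GTGAAAAAGATGTAA"
print(translate_cds(short_cds))
```

The leading GTG is read as M instead of V, and the final stop codon does not appear in the result, mirroring the difference between the first and last outputs above.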


              IP location: Guangdong · Post #22 · 2016-12-29 20:52
                Translation tables:
                >>> from Bio.Data import CodonTable
                >>> standard_table = CodonTable.unambiguous_dna_by_name["Standard"]
                >>> mito_table = CodonTable.unambiguous_dna_by_name["Vertebrate Mitochondrial"]
                >>> standard_table = CodonTable.unambiguous_dna_by_id[1]
                >>> mito_table = CodonTable.unambiguous_dna_by_id[2]
                >>> mito_table.stop_codons
                ['TAA', 'TAG', 'AGA', 'AGG']
                >>> mito_table.start_codons
                ['ATT', 'ATC', 'ATA', 'ATG', 'GTG']
                >>> mito_table.forward_table["ACG"]
                'T'
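
To see where these values come from, the sketch below rebuilds a forward table and a stop codon list from the 64-letter amino acid strings that NCBI publishes for tables 1 and 2. It uses plain Python only. The helper build_table is a name made up for this example, and the result is a simplified stand-in for Biopython's CodonTable objects (start codons are left out).

```python
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
# NCBI amino acid strings, codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
MITO_AAS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"


def build_table(aas):
    """Return (forward_table, stop_codons); stops are kept out of the dict."""
    forward = {codon: aa for codon, aa in zip(CODONS, aas) if aa != "*"}
    stops = [codon for codon, aa in zip(CODONS, aas) if aa == "*"]
    return forward, stops


standard_forward, standard_stops = build_table(STANDARD_AAS)
mito_forward, mito_stops = build_table(MITO_AAS)
# Codons whose meaning differs between the two genetic codes.
differing = [c for c, s, m in zip(CODONS, STANDARD_AAS, MITO_AAS) if s != m]

print(mito_stops)
print(mito_forward["ACG"])
print(differing)
```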


                IP location: Guangdong · Post #23 · 2016-12-29 20:56
                  1 The Standard Code (transl_table=1)
                  2 The Vertebrate Mitochondrial Code (transl_table=2)
                  3 The Yeast Mitochondrial Code (transl_table=3)
                  4 The Mold, Protozoan, and Coelenterate Mitochondrial Code and the Mycoplasma/Spiroplasma Code (transl_table=4)
                  5 The Invertebrate Mitochondrial Code (transl_table=5)
                  6 The Ciliate, Dasycladacean and Hexamita Nuclear Code (transl_table=6)
                  9 The Echinoderm and Flatworm Mitochondrial Code (transl_table=9)
                  10 The Euplotid Nuclear Code (transl_table=10)
                  11 The Bacterial, Archaeal and Plant Plastid Code (transl_table=11)
                  12 The Alternative Yeast Nuclear Code (transl_table=12)
                  13 The Ascidian Mitochondrial Code (transl_table=13)
                  14 The Alternative Flatworm Mitochondrial Code (transl_table=14)
                  16 Chlorophycean Mitochondrial Code (transl_table=16)
                  21 Trematode Mitochondrial Code (transl_table=21)
                  22 Scenedesmus obliquus Mitochondrial Code (transl_table=22)
                  23 Thraustochytrium Mitochondrial Code (transl_table=23)
                  24 Pterobranchia Mitochondrial Code (transl_table=24)
                  25 Candidate Division SR1 and Gracilibacteria Code (transl_table=25)
                  26 Pachysolen tannophilus Nuclear Code (transl_table=26)
                  27 Karyorelict Nuclear (transl_table=27)
                  28 Condylostoma Nuclear (transl_table=28)
                  29 Mesodinium Nuclear (transl_table=29)
                  30 Peritrich Nuclear (transl_table=30)
                  31 Blastocrithidia Nuclear (transl_table=31)


                  IP location: Guangdong · Post #24 · 2016-12-29 20:58
                    Like a normal Python string, a Seq object is "read-only", but you can convert it into a mutable sequence by using a MutableSeq object:
                    >>> from Bio.Seq import Seq
                    >>> from Bio.Alphabet import IUPAC
                    >>> my_seq = Seq("GCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA", IUPAC.unambiguous_dna)
                    >>> my_seq[5] = "G"
                    Traceback (most recent call last):
                    ...
                    TypeError: 'Seq' object does not support item assignment
                    >>> mutable_seq = my_seq.tomutable()
                    >>> mutable_seq
                    MutableSeq('GCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA', IUPACUnambiguousDNA())
                    >>> mutable_seq[5] = "C"
                    >>> mutable_seq
                    MutableSeq('GCCATCGTAATGGGCCGCTGAAAGGGTGCCCGA', IUPACUnambiguousDNA())
                    >>> mutable_seq.remove("T")
                    >>> mutable_seq
                    MutableSeq('GCCACGTAATGGGCCGCTGAAAGGGTGCCCGA', IUPACUnambiguousDNA())
                    >>> mutable_seq.reverse()
                    >>> mutable_seq
                    MutableSeq('AGCCCGTGGGAAAGTCGCCGGGTAATGCACCG', IUPACUnambiguousDNA())
                    >>> new_seq = mutable_seq.toseq()
                    >>> new_seq
                    Seq('AGCCCGTGGGAAAGTCGCCGGGTAATGCACCG', IUPACUnambiguousDNA())
                    Unlike a Seq object, a MutableSeq object's methods (such as remove() and reverse() above) modify the sequence in place.
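
The same split between read-only and in-place behaviour exists in the standard library, which may help as an analogy: str is immutable like Seq, while bytearray is changed in place like MutableSeq. The sketch below replays the edits from this post on a bytearray. It is only an analogy and does not use Biopython.

```python
immutable = "GCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA"
try:
    immutable[5] = "C"  # str, like Seq, rejects item assignment
except TypeError as exc:
    print("str:", exc)

mutable = bytearray(immutable, "ascii")
mutable[5] = ord("C")     # replace the base at index 5
mutable.remove(ord("T"))  # delete the first "T", in place
mutable.reverse()         # reverse in place; the call itself returns None
result = mutable.decode("ascii")
print(result)
```

Because these calls return None and change the object itself, the usual mistake is writing something like x = x.reverse(), which throws the data away.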


                    IP location: Guangdong · Post #25 · 2016-12-30 17:51
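                      UnknownSeq objects:
                      An UnknownSeq represents a sequence whose length is known but whose actual letters are not. You could of course use a normal Seq object for this, but storing a string of a million "N" characters wastes a considerable amount of memory when you could store just a single "N" and the required length as an integer.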
                      >>> from Bio.Seq import UnknownSeq
                      >>> unk = UnknownSeq(20)
                      >>> unk
                      UnknownSeq(20, alphabet = Alphabet(), character = '?')
                      >>> print(unk)
                      ????????????????????
                      >>> len(unk)
                      20
                      >>> from Bio.Alphabet import IUPAC
                      >>> unk_dna = UnknownSeq(20, alphabet=IUPAC.ambiguous_dna)
                      >>> unk_dna
                      UnknownSeq(20, alphabet = IUPACAmbiguousDNA(), character = 'N')
                      >>> print(unk_dna)
                      NNNNNNNNNNNNNNNNNNNN
                      >>> unk_dna.complement()
                      UnknownSeq(20, alphabet = IUPACAmbiguousDNA(), character = 'N')
                      >>> unk_dna.reverse_complement()
                      UnknownSeq(20, alphabet = IUPACAmbiguousDNA(), character = 'N')
                      >>> unk_dna.transcribe()
                      UnknownSeq(20, alphabet = IUPACAmbiguousRNA(), character = 'N')
                      >>> unk_protein = unk_dna.translate()
                      >>> unk_protein
                      UnknownSeq(6, alphabet = ProteinAlphabet(), character = 'X')
                      >>> print(unk_protein)
                      XXXXXX
                      >>> len(unk_protein)
                      6
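
The memory-saving idea described at the top of this post can be sketched with a tiny class that stores only a length and a single character. CompactUnknown is a hypothetical name for this illustration and is not Biopython's UnknownSeq; it implements only the few operations needed to mirror the session above.

```python
class CompactUnknown:
    """Hypothetical stand-in: a sequence of `length` copies of one letter."""

    def __init__(self, length, character="?"):
        self.length = length
        self.character = character

    def __len__(self):
        return self.length

    def __str__(self):
        # The full string is only built on demand, never stored.
        return self.character * self.length

    def translate(self):
        # Each complete codon of unknown bases gives one unknown residue "X".
        return CompactUnknown(self.length // 3, "X")


unk_dna = CompactUnknown(20, "N")
unk_protein = unk_dna.translate()
print(unk_dna)
print(unk_protein, len(unk_protein))
```

A million-letter unknown sequence costs this object the same two attributes as a 20-letter one.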


                      IP location: Guangdong · Post #26 · 2016-12-30 18:04
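                        For those who don't want to use sequence objects, or who prefer a functional programming style to an object-oriented one, the module-level functions in Bio.Seq accept plain Python strings: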
                        >>> from Bio.Seq import reverse_complement, transcribe, back_transcribe, translate
                        >>> my_string = "GCTGTTATGGGTCGTTGGAAGGGTGGTCGTGCTGCTGGTTAG"
                        >>> reverse_complement(my_string)
                        'CTAACCAGCAGCACGACCACCCTTCCAACGACCCATAACAGC'
                        >>> transcribe(my_string)
                        'GCUGUUAUGGGUCGUUGGAAGGGUGGUCGUGCUGCUGGUUAG'
                        >>> back_transcribe(my_string)
                        'GCTGTTATGGGTCGTTGGAAGGGTGGTCGTGCTGCTGGTTAG'
                        >>> translate(my_string)
                        'AVMGRWKGGRAAG*'


                        IP location: Guangdong · Post #27 · 2016-12-30 18:08
                          Good stuff, but it isn't getting much attention.


                          IP location: Shanghai · Post #28 · 2018-05-09 21:23