biopython吧 关注:59贴子:174
  • 9回复贴,共1

Biopython解析BLAST 输出

只看楼主收藏回复

>>> result_handle = open("my_blast.xml")
>>> from Bio.Blast import NCBIXML
>>> blast_records = NCBIXML.parse(result_handle)
for - 循环
>>> for blast_record in blast_records:
... # Do something with blast_record
把迭代转换成列表
>>> blast_records = list(blast_records)


IP属地:广东1楼2017-02-17 20:09回复
    >>> for alignment in blast_record.alignments:
    ... for hsp in alignment.hsps:
    ... if hsp.expect < E_VALUE_THRESH:
    ... print '****Alignment****'
    ... print 'sequence:', alignment.title
    ... print 'length:', alignment.length
    ... print 'e value:', hsp.expect
    ... print hsp.query[0:75] + '...'
    ... print hsp.match[0:75] + '...'
    ... print hsp.sbjct[0:75] + '...'


    IP属地:广东2楼2017-02-17 20:11
    回复
      >>> from Bio import SearchIO
      >>> blast_qresult = SearchIO.read('my_blast.xml', 'blast-xml')
      >>> blast_hsp = blast_qresult[0][0] # first hit, first hsp
      >>> print blast_hsp
      Query: 42291 mystery_seq
      Hit: gi|262205317|ref|NR_030195.1| Homo sapiens microRNA 520b (MIR520...
      Query range: [0:61] (1)
      Hit range: [0:61] (1)
      Quick stats: evalue 4.9e-23; bitscore 111.29
      Fragments: 1 (61 columns)
      Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
      |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
      Hit - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
      >>> blast_hsp.query_range
      (0, 61)
      >>> blast_hsp.evalue
      4.91307e-23
      >>> blast_hsp.hit_start # start coordinate of the hit sequence
      0
      >>> blast_hsp.query_span # how many residues in the query sequence
      61
      >>> blast_hsp.aln_span # how long the alignment is
      61
      >>> blast_hsp.gap_num # number of gaps
      0
      >>> blast_hsp.ident_num # number of identical residues
      61
      >>> blast_hsp.query
      SeqRecord(seq=Seq('CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTT...GGG', DNAAlphabet()), id='42291', name='aligned query sequence', description='mystery_seq', dbxrefs=[])
      >>> blast_hsp.hit
      SeqRecord(seq=Seq('CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTT...GGG', DNAAlphabet()), id='gi|262205317|ref|NR_030195.1|', name='aligned hit sequence', description='Homo sapiens microRNA 520b (MIR520B), microRNA', dbxrefs=[])
      HSP 对象有个 alignment 属性
      >>> print blast_hsp.aln
      DNAAlphabet() alignment with 2 rows and 61 columns
      CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAG...GGG 42291
      CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAG...GGG gi|262205317|ref|NR_030195.1|


      IP属地:广东3楼2017-02-19 23:34
      回复
        >>> from Bio import SearchIO
        >>> blast_qresult = SearchIO.read('my_blast.xml', 'blast-xml')
        >>> blast_frag = blast_qresult[0][0][0] # first hit, first hsp, first fragment
        >>> print blast_frag
        Query: 42291 mystery_seq
        Hit: gi|262205317|ref|NR_030195.1| Homo sapiens microRNA 520b (MIR520...
        Query range: [0:61] (1)
        Hit range: [0:61] (1)
        Fragments: 1 (61 columns)
        Query - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG
        |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
        Hit - CCCTCTACAGGGAAGCGCTTTCTGTTGTCTGAAAGAAAAGAAAGTGCTTCCTTTTAGAGGG


        IP属地:广东4楼2017-02-19 23:38
        回复
          有时,你得到了一个包含成百上千个query的搜索输出文件要分析,你当然可以使用 Bio.SearchIO.parse 来处理,但是如果你仅仅需要访问少数query的话,效率 是及其低下的。这是因为 parse 会分析所有的query,直到找到你感兴趣。
          在这种情况下,理想的选择是用 Bio.SearchIO.index 或 Bio.SearchIO.index_db 来索引文件。


          IP属地:广东5楼2017-02-20 00:24
          回复
            >>> from Bio import SearchIO
            >>> qresults = SearchIO.parse('mirna.xml', 'blast-xml') # read XML file
            >>> SearchIO.write(qresults, 'results.tab', 'blast-tab') # write to tabular file
            (3, 239, 277, 277)


            IP属地:广东6楼2017-02-20 00:48
            回复
              >>> from Bio import SearchIO
              >>> SearchIO.convert('mirna.xml', 'blast-xml', 'results.tab', 'blast-tab')
              (3, 239, 277, 277)


              IP属地:广东7楼2017-02-20 00:49
              回复
                刚开始接触,愿越来越好


                来自iPhone客户端8楼2017-11-26 22:42
                回复
                  不错


                  IP属地:福建来自iPhone客户端9楼2017-11-27 12:23
                  回复


                    IP属地:福建来自iPhone客户端10楼2017-12-06 19:16
                    回复