biopython Tieba

Sequence files as dictionaries - database-indexed files

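Bio.SeqIO.index_db() keeps its index (where each record sits in which file) in an SQLite database file on disk rather than in memory, so it can handle very large files, or many files at once. The records themselves stay in the original sequence files and are parsed only when you look one up.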
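To make the idea concrete, here is a toy, standard-library-only sketch of an on-disk offset index. It is an illustration under stated assumptions, not Biopython's implementation: it handles FASTA only, stores each record's file name, byte offset and length in an SQLite table, and on lookup seeks straight to that record and parses just that one. The names `build_offset_index`, `count_records`, `fetch_record` and the `records` table are hypothetical.

```python
import os
import sqlite3
import tempfile


def build_offset_index(index_path, fasta_paths):
    """Store (file, start, size) for each FASTA record in an SQLite database.

    Toy illustration only: sequences stay in the FASTA files on disk; the
    database holds just enough information to seek back to each record.
    """
    con = sqlite3.connect(index_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "record_id TEXT PRIMARY KEY, file_name TEXT, "
        "start_pos INTEGER, n_bytes INTEGER)"
    )

    def add(record_id, path, start, end):
        # A duplicate record_id raises sqlite3.IntegrityError
        con.execute(
            "INSERT INTO records VALUES (?, ?, ?, ?)",
            (record_id, path, start, end - start),
        )

    for path in fasta_paths:
        with open(path, "rb") as handle:  # binary mode keeps byte offsets exact
            record_id, start, pos = None, 0, 0
            for line in handle:
                if line.startswith(b">"):
                    if record_id is not None:
                        add(record_id, path, start, pos)
                    # Key on the first word of the header, like a FASTA record ID
                    record_id = line[1:].split()[0].decode()
                    start = pos
                pos += len(line)
            if record_id is not None:
                add(record_id, path, start, pos)
    con.commit()
    return con


def count_records(con):
    return con.execute("SELECT COUNT(*) FROM records").fetchone()[0]


def fetch_record(con, record_id):
    """Seek to one record and parse only it: returns (description, sequence)."""
    row = con.execute(
        "SELECT file_name, start_pos, n_bytes FROM records WHERE record_id = ?",
        (record_id,),
    ).fetchone()
    if row is None:
        raise KeyError(record_id)
    path, start, n_bytes = row
    with open(path, "rb") as handle:
        handle.seek(start)
        raw = handle.read(n_bytes).decode()
    header, *seq_lines = raw.splitlines()
    return header[1:], "".join(seq_lines)


# Usage on a tiny throwaway FASTA file
tmp_dir = tempfile.mkdtemp()
fasta_path = os.path.join(tmp_dir, "demo.fasta")
with open(fasta_path, "w") as out:
    out.write(">seqA first toy record\nACGT\nTTGA\n>seqB second toy record\nGGCC\n")

con = build_offset_index(os.path.join(tmp_dir, "demo.idx"), [fasta_path])
print(count_records(con))         # 2
print(fetch_record(con, "seqB"))  # ('seqB second toy record', 'GGCC')
```

Memory use depends only on the record being fetched, not on the total file size, and because the index is an ordinary file a later session could reconnect with `sqlite3.connect()` instead of rescanning the FASTA files.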


Floor 1 · 2017-02-11 18:54
    >>> from Bio import SeqIO
    >>> # GenBank viral (VRL) division files gbvrl1.seq to gbvrl16.seq
    >>> files = ["gbvrl%i.seq" % (i + 1) for i in range(16)]
    >>> gb_vrl = SeqIO.index_db("gbvrl.idx", files, "genbank")
    >>> print("%i sequences indexed" % len(gb_vrl))
    958086 sequences indexed


Floor 2 · 2017-02-11 19:26
    >>> print(gb_vrl["GQ333173.1"].description)
    HIV-1 isolate F12279A1 from Uganda gag protein (gag) gene, partial cds.


Floor 4 · 2017-02-11 19:26
    >>> # get_raw() returns the unparsed record as bytes in Python 3
    >>> print(gb_vrl.get_raw("GQ333173.1").decode())
    LOCUS       GQ333173                 459 bp    DNA     linear   VRL 21-OCT-2009
    DEFINITION  HIV-1 isolate F12279A1 from Uganda gag protein (gag) gene, partial
                cds.
    ACCESSION   GQ333173
    ...
    //
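The raw record and the `.description` printed earlier come from the same bytes on disk: for GenBank input, the description is the DEFINITION field with its continuation lines joined. As a small illustration, here is a hypothetical helper, `definition_from_raw` (not part of Biopython), applied to an abridged excerpt of the record above. Biopython's own GenBank parser does this, and much more, when it builds the SeqRecord.

```python
def definition_from_raw(raw):
    """Join a GenBank DEFINITION line with its indented continuation lines.

    `raw` is a bytes object such as the one get_raw() returns under Python 3.
    Hypothetical helper for illustration only.
    """
    parts = []
    in_definition = False
    for line in raw.decode().splitlines():
        if line.startswith("DEFINITION"):
            in_definition = True
            parts.append(line[len("DEFINITION"):].strip())
        elif in_definition and line.startswith(" "):
            # Continuation lines are indented into the 12-column value field
            parts.append(line.strip())
        elif in_definition:
            break  # the next keyword (here ACCESSION) ends the field
    return " ".join(parts)


# Abridged excerpt of the raw GQ333173.1 record shown above
raw = (
    b"LOCUS       GQ333173                 459 bp    DNA     linear   VRL 21-OCT-2009\n"
    b"DEFINITION  HIV-1 isolate F12279A1 from Uganda gag protein (gag) gene, partial\n"
    b"            cds.\n"
    b"ACCESSION   GQ333173\n"
)
print(definition_from_raw(raw))
# HIV-1 isolate F12279A1 from Uganda gag protein (gag) gene, partial cds.
```

get_raw() is handy when you want to copy selected records to a new file unchanged, since no parsing and re-writing is involved.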


Floor 5 · 2017-02-11 19:27