
Converting a sequence file into a dictionary held in memory


Bio.SeqIO.to_dict()


Post 1 · 2017-01-13 20:54 · IP location: Guangdong
    >>> from Bio import SeqIO
    >>> orchid_dict = SeqIO.to_dict(SeqIO.parse("ls_orchid.gbk", "genbank"))
    >>> len(orchid_dict)
    94
    >>> print(list(orchid_dict.keys()))
    ['Z78484.1', 'Z78464.1', 'Z78455.1', 'Z78442.1', 'Z78532.1', 'Z78453.1', ..., 'Z78471.1']
    To view all the sequence records at once:
    >>> list(orchid_dict.values())  # lots of output!
    ...
    To fetch a single SeqRecord object and work with it:
    >>> seq_record = orchid_dict["Z78475.1"]
    >>> print(seq_record.description)
    P.supardii 5.8S rRNA gene and ITS1 and ITS2 DNA.
    >>> print(repr(seq_record.seq))
    Seq('CGTAACAAGGTTTCCGTAGGTGAACCTGCGGAAGGATCATTGTTGAGATCACAT...GGT', IUPACAmbiguousDNA())
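
To make clearer what the resulting dictionary is, here is a rough pure-Python sketch of the idea behind SeqIO.to_dict(): every record is read into memory and stored under a key, by default its id, and a repeated key is rejected with a ValueError. This is a simplified illustration, not Biopython's actual source. `Record` and `to_dict_sketch` are hypothetical names, and the sequences are toy placeholders rather than data from ls_orchid.gbk.

```python
from collections import namedtuple

# Hypothetical stand-in for a SeqRecord; only the fields used in this sketch.
Record = namedtuple("Record", ["id", "seq"])


def to_dict_sketch(records, key_function=None):
    """Build {key: record} from an iterable of records, refusing duplicate keys."""
    if key_function is None:
        # Default behaviour: use the record's id as the dictionary key.
        key_function = lambda record: record.id
    result = {}
    for record in records:
        key = key_function(record)
        if key in result:
            raise ValueError("Duplicate key %r" % key)
        result[key] = record
    return result


# Toy records: the ids come from the thread, the sequences are placeholders.
records = [Record("Z78484.1", "ACGT"), Record("Z78464.1", "TTGA")]
by_id = to_dict_sketch(iter(records))

print(len(by_id))
print(by_id["Z78464.1"].seq)
```

Because every record stays in the dictionary, memory use grows with the size of the file, so this pattern is probably best suited to small and medium-sized files.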


    Post 2 · 2017-01-13 20:57 · IP location: Guangdong
      FASTA file:
      >>> from Bio import SeqIO
      >>> orchid_dict = SeqIO.to_dict(SeqIO.parse("ls_orchid.fasta", "fasta"))
      >>> print(list(orchid_dict.keys()))
      ['gi|2765596|emb|Z78471.1|PDZ78471', 'gi|2765646|emb|Z78521.1|CCZ78521', ...
       ..., 'gi|2765613|emb|Z78488.1|PTZ78488', 'gi|2765583|emb|Z78458.1|PHZ78458']
      By default the keys are the full FASTA identifiers. To key by accession number instead, pass a key_function:
      >>> def get_accession(record):
      ...     """Given a SeqRecord, return the accession number as a string.
      ...
      ...     e.g. "gi|2765613|emb|Z78488.1|PTZ78488" -> "Z78488.1"
      ...     """
      ...     parts = record.id.split("|")
      ...     assert len(parts) == 5 and parts[0] == "gi" and parts[2] == "emb"
      ...     return parts[3]
      ...
      >>> orchid_dict = SeqIO.to_dict(SeqIO.parse("ls_orchid.fasta", "fasta"), key_function=get_accession)
      >>> print(list(orchid_dict.keys()))
      ['Z78484.1', 'Z78464.1', 'Z78455.1', 'Z78442.1', 'Z78532.1', 'Z78453.1', ..., 'Z78471.1']
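
One caveat about get_accession: assert statements are skipped when Python runs with the -O flag, so a malformed identifier could slip through unnoticed. Below is a possible variant that raises ValueError instead. It is only a sketch: `get_accession_checked` and the minimal `Record` stand-in are hypothetical names used here so the example runs without Biopython installed.

```python
from collections import namedtuple

# Hypothetical stand-in for a SeqRecord; only the id attribute is needed here.
Record = namedtuple("Record", ["id"])


def get_accession_checked(record):
    """Return the accession from an id like 'gi|2765613|emb|Z78488.1|PTZ78488'."""
    parts = record.id.split("|")
    # An explicit check instead of assert, so it still runs under "python -O".
    if len(parts) != 5 or parts[0] != "gi" or parts[2] != "emb":
        raise ValueError("Unexpected identifier format: %r" % record.id)
    return parts[3]


print(get_accession_checked(Record("gi|2765613|emb|Z78488.1|PTZ78488")))  # Z78488.1
```

Since it only reads record.id, the same function should be usable as key_function=get_accession_checked in the SeqIO.to_dict() call above.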


      Post 3 · 2017-01-13 22:52 · IP location: Guangdong