
Accessing NCBI databases with Biopython


>>> from Bio import Entrez


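IP location: Guangdong | #1 | 2017-02-20 19:02
    Use the email parameter so that NCBI can contact you by email if a problem comes up. You can set it explicitly on every Entrez request by including email="A.N.Other@example.com" in the argument list, or you can set a global email address: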
    >>> from Bio import Entrez
    >>> Entrez.email = "A.N.Other@example.com"


    IP location: Guangdong | #2 | 2017-02-20 19:04
      >>> from Bio import Entrez
      >>> Entrez.email = "chendasong@gmail.com"
      >>> handle = Entrez.einfo()
      >>> result = handle.read()
      >>> print(result)
      <?xml version="1.0" encoding="UTF-8" ?>
      <!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD einfo 20130322//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20130322/einfo.dtd">
      <eInfoResult>
      <DbList>
      <DbName>pubmed</DbName>
      <DbName>protein</DbName>
      <DbName>nuccore</DbName>
      <DbName>nucleotide</DbName>
      <DbName>nucgss</DbName>
      <DbName>nucest</DbName>
      <DbName>structure</DbName>
      <DbName>sparcle</DbName>
      <DbName>genome</DbName>
      <DbName>annotinfo</DbName>
      <DbName>assembly</DbName>
      <DbName>bioproject</DbName>
      <DbName>biosample</DbName>
      <DbName>blastdbinfo</DbName>
      <DbName>books</DbName>
      <DbName>cdd</DbName>
      <DbName>clinvar</DbName>
      <DbName>clone</DbName>
      <DbName>gap</DbName>
      <DbName>gapplus</DbName>
      <DbName>grasp</DbName>
      <DbName>dbvar</DbName>
      <DbName>gene</DbName>
      <DbName>gds</DbName>
      <DbName>geoprofiles</DbName>
      <DbName>homologene</DbName>
      <DbName>medgen</DbName>
      <DbName>mesh</DbName>
      <DbName>ncbisearch</DbName>
      <DbName>nlmcatalog</DbName>
      <DbName>omim</DbName>
      <DbName>orgtrack</DbName>
      <DbName>pmc</DbName>
      <DbName>popset</DbName>
      <DbName>probe</DbName>
      <DbName>proteinclusters</DbName>
      <DbName>pcassay</DbName>
      <DbName>biosystems</DbName>
      <DbName>pccompound</DbName>
      <DbName>pcsubstance</DbName>
      <DbName>pubmedhealth</DbName>
      <DbName>seqannot</DbName>
      <DbName>snp</DbName>
      <DbName>sra</DbName>
      <DbName>taxonomy</DbName>
      <DbName>unigene</DbName>
      <DbName>gencoll</DbName>
      <DbName>gtr</DbName>
      </DbList>
      </eInfoResult>


      IP location: Guangdong | #3 | 2017-02-20 19:09
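The raw EInfo reply printed above is plain XML, so the database names can also be pulled out with the standard library alone. A minimal sketch, assuming a shortened copy of that reply; `sample_xml` and `list_databases` are hypothetical names used only for illustration:

```python
import xml.etree.ElementTree as ET

# Shortened sample of the EInfo reply (XML declaration and DOCTYPE omitted).
sample_xml = """<eInfoResult>
<DbList>
<DbName>pubmed</DbName>
<DbName>protein</DbName>
<DbName>nuccore</DbName>
</DbList>
</eInfoResult>"""


def list_databases(xml_text):
    """Return the text of every <DbName> element in an EInfo reply."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.iter("DbName")]


print(list_databases(sample_xml))
```

`ET.fromstring` accepts both `str` and `bytes`, so the result of `handle.read()` can be passed in directly.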
        >>> handle = Entrez.einfo()
        >>> record = Entrez.read(handle)
        >>> record.keys()
        dict_keys(['DbList'])
        >>> record["DbList"]
        ['pubmed', 'protein', 'nuccore', 'nucleotide', 'nucgss', 'nucest', 'structure', 'sparcle', 'genome', 'annotinfo', 'assembly', 'bioproject', 'biosample', 'blastdbinfo', 'books', 'cdd', 'clinvar', 'clone', 'gap', 'gapplus', 'grasp', 'dbvar', 'gene', 'gds', 'geoprofiles', 'homologene', 'medgen', 'mesh', 'ncbisearch', 'nlmcatalog', 'omim', 'orgtrack', 'pmc', 'popset', 'probe', 'proteinclusters', 'pcassay', 'biosystems', 'pccompound', 'pcsubstance', 'pubmedhealth', 'seqannot', 'snp', 'sra', 'taxonomy', 'unigene', 'gencoll', 'gtr']


        IP location: Guangdong | #4 | 2017-02-20 19:12
          >>> handle = Entrez.einfo(db="pubmed")
          >>> record = Entrez.read(handle)
          >>> record["DbInfo"]["Description"]
          'PubMed bibliographic record'
          >>> record["DbInfo"]["Count"]
          '26935290'
          >>> record["DbInfo"]["LastUpdate"]
          '2017/02/20 02:08'


          IP location: Guangdong | #5 | 2017-02-20 19:14
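The record returned by `Entrez.einfo(db=...)` also describes the database's searchable fields; bracketed tags in a search term, such as `[Orgn]` or `[Gene]` for the nucleotide database, are field names of this kind. A minimal sketch, assuming the parsed record lists them under `record["DbInfo"]["FieldList"]` with `Name`, `FullName` and `Description` keys, as the Biopython tutorial shows. `describe_fields` is a hypothetical helper, and `sample_record` is a hand-written stand-in so the snippet runs without a network call:

```python
def describe_fields(record):
    """Return one 'Name, FullName, Description' line per search field."""
    return [
        "%(Name)s, %(FullName)s, %(Description)s" % field
        for field in record["DbInfo"]["FieldList"]
    ]


# Hand-written stand-in shaped like Entrez.read(Entrez.einfo(db="pubmed"));
# a real record carries many more fields and keys per field.
sample_record = {
    "DbInfo": {
        "FieldList": [
            {"Name": "ALL", "FullName": "All Fields",
             "Description": "All terms from all searchable fields"},
            {"Name": "UID", "FullName": "UID",
             "Description": "Unique number assigned to publication"},
        ]
    }
}

for line in describe_fields(sample_record):
    print(line)
```

With a live connection, pass `Entrez.read(Entrez.einfo(db="pubmed"))` in place of `sample_record`.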
            >>> handle = Entrez.esearch(db="nucleotide",term="Cypripedioideae[Orgn] AND matK[Gene]")
            >>> record = Entrez.read(handle)
            >>> record["Count"]
            '348'
            >>> record["IdList"]
            ['402502985', '402502983', '402502981', '402502979', '402502977', '402502975', '402502973', '402502971', '402502969', '402502967', '402502965', '402502963', '402502961', '402502959', '402502957', '402502955', '402502953', '402502951', '402502949', '402502947']


            IP location: Guangdong | #6 | 2017-02-20 19:25
              >>> from Bio import Entrez, SeqIO
              >>> handle = Entrez.efetch(db="nucleotide", id="186972394",rettype="gb", retmode="text")
              >>> record = SeqIO.read(handle, "genbank")
              >>> handle.close()
              >>> print(record)
              ID: EU490707.1
              Name: EU490707
              Description: Selenipedium aequinoctiale maturase K (matK) gene, partial cds; chloroplast.
              Number of features: 3
              /topology=linear
              /references=[Reference(title='Phylogenetic utility of ycf1 in orchids: a plastid gene more variable than matK', ...), Reference(title='Direct Submission', ...)]
              /sequence_version=1
              /keywords=['']
              /accessions=['EU490707']
              /data_file_division=PLN
              /organism=Selenipedium aequinoctiale
              /date=26-JUL-2016
              /taxonomy=['Eukaryota', 'Viridiplantae', 'Streptophyta', 'Embryophyta', 'Tracheophyta', 'Spermatophyta', 'Magnoliophyta', 'Liliopsida', 'Asparagales', 'Orchidaceae', 'Cypripedioideae', 'Selenipedium']
              /source=chloroplast Selenipedium aequinoctiale
              Seq('ATTTTTTACGAACCTGTGGAAATTTTTGGTTATGACAATAAATCTAGTTTAGTA...GAA', IUPACAmbiguousDNA())


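              IP location: Guangdong | #8 | 2017-02-20 19:50
                Note that a more typical approach is to save the sequence data to a local file first and then parse it with Bio.SeqIO. This avoids downloading the same file again every time the script runs, and it reduces the load on NCBI's servers.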
                import os
                from Bio import SeqIO
                from Bio import Entrez

                Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
                filename = "gi_186972394.gbk"
                if not os.path.isfile(filename):
                    # Downloading... (only when no local copy exists yet)
                    net_handle = Entrez.efetch(db="nucleotide", id="186972394", rettype="gb", retmode="text")
                    with open(filename, "w") as out_handle:
                        out_handle.write(net_handle.read())
                    net_handle.close()
                    print("Saved")
                print("Parsing...")
                record = SeqIO.read(filename, "genbank")
                print(record)


                IP location: Guangdong | #9 | 2017-02-21 21:05
                  Thanks, this is really useful.


                  IP location: Fujian | #10 | 2018-03-08 17:22
                    Hello, I'd like to learn Biopython and have a few questions I'd like to ask you.


                    #11 | 2018-09-27 17:12
                      How can I use Biopython to look up information about a gene, starting from a sequence?


                      IP location: Germany | #12 | 2020-03-12 11:57
                        Does record["IdList"] only ever contain 20 entries?


                        IP location: Guangdong | #14 | 2020-05-24 11:52
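On the question of the 20 entries: ESearch returns at most 20 IDs per request by default, while record["Count"] reports the total number of hits ('348' in the search earlier in the thread). The E-utilities parameters `retmax` and `retstart` set the page size and the offset. A minimal sketch of paging through all hits, assuming hypothetical helper names `batch_starts` and `fetch_all_ids`, an arbitrary batch size of 100, and the placeholder email address used earlier in the thread:

```python
def batch_starts(total, batch_size):
    """Yield (retstart, retmax) pairs that together cover `total` hits."""
    for start in range(0, total, batch_size):
        yield start, min(batch_size, total - start)


def fetch_all_ids(db, term, batch_size=100):
    """Collect every ID matching `term`, one ESearch page at a time."""
    # Imported here so that batch_starts can be used without Biopython.
    from Bio import Entrez

    Entrez.email = "A.N.Other@example.com"  # placeholder; use your own address

    # The first request is only used to learn the total hit count.
    handle = Entrez.esearch(db=db, term=term)
    total = int(Entrez.read(handle)["Count"])
    handle.close()

    ids = []
    for retstart, retmax in batch_starts(total, batch_size):
        handle = Entrez.esearch(db=db, term=term,
                                retstart=retstart, retmax=retmax)
        ids.extend(Entrez.read(handle)["IdList"])
        handle.close()
    return ids


# Offline check of the paging arithmetic for the 348 hits reported above.
print(list(batch_starts(348, 100)))
```

Calling `fetch_all_ids("nucleotide", "Cypripedioideae[Orgn] AND matK[Gene]")` needs network access. For a one-off query, passing a larger `retmax` directly to `Entrez.esearch` is simpler than paging.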