Accessing NCBI databases with Biopython [biopython forum, Tieba]


>>> from Bio import Entrez

Post #1, 2017-02-20 19:02 (IP location: Guangdong)

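Set the email parameter so that NCBI can contact you by email if a problem comes up. You can pass it explicitly on every Entrez request by including email="A.N.Other@example.com" in the argument list, or you can set a global email address once: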
>>> from Bio import Entrez
>>> Entrez.email = "A.N.Other@example.com"

Post #2, 2017-02-20 19:04 (IP location: Guangdong)

>>> from Bio import Entrez
>>> Entrez.email = "chendasong@gmail.com"
>>> handle = Entrez.einfo()
>>> result = handle.read()
>>> print(result)
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD einfo 20130322//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20130322/einfo.dtd">
<eInfoResult>
<DbList>
<DbName>pubmed</DbName>
<DbName>protein</DbName>
<DbName>nuccore</DbName>
<DbName>nucleotide</DbName>
<DbName>nucgss</DbName>
<DbName>nucest</DbName>
<DbName>structure</DbName>
<DbName>sparcle</DbName>
<DbName>genome</DbName>
<DbName>annotinfo</DbName>
<DbName>assembly</DbName>
<DbName>bioproject</DbName>
<DbName>biosample</DbName>
<DbName>blastdbinfo</DbName>
<DbName>books</DbName>
<DbName>cdd</DbName>
<DbName>clinvar</DbName>
<DbName>clone</DbName>
<DbName>gap</DbName>
<DbName>gapplus</DbName>
<DbName>grasp</DbName>
<DbName>dbvar</DbName>
<DbName>gene</DbName>
<DbName>gds</DbName>
<DbName>geoprofiles</DbName>
<DbName>homologene</DbName>
<DbName>medgen</DbName>
<DbName>mesh</DbName>
<DbName>ncbisearch</DbName>
<DbName>nlmcatalog</DbName>
<DbName>omim</DbName>
<DbName>orgtrack</DbName>
<DbName>pmc</DbName>
<DbName>popset</DbName>
<DbName>probe</DbName>
<DbName>proteinclusters</DbName>
<DbName>pcassay</DbName>
<DbName>biosystems</DbName>
<DbName>pccompound</DbName>
<DbName>pcsubstance</DbName>
<DbName>pubmedhealth</DbName>
<DbName>seqannot</DbName>
<DbName>snp</DbName>
<DbName>sra</DbName>
<DbName>taxonomy</DbName>
<DbName>unigene</DbName>
<DbName>gencoll</DbName>
<DbName>gtr</DbName>
</DbList>
</eInfoResult>

Post #3, 2017-02-20 19:09 (IP location: Guangdong)

>>> handle = Entrez.einfo()
>>> record = Entrez.read(handle)
>>> record.keys()
dict_keys(['DbList'])
>>> record["DbList"]
['pubmed', 'protein', 'nuccore', 'nucleotide', 'nucgss', 'nucest', 'structure', 'sparcle', 'genome', 'annotinfo', 'assembly', 'bioproject', 'biosample', 'blastdbinfo', 'books', 'cdd', 'clinvar', 'clone', 'gap', 'gapplus', 'grasp', 'dbvar', 'gene', 'gds', 'geoprofiles', 'homologene', 'medgen', 'mesh', 'ncbisearch', 'nlmcatalog', 'omim', 'orgtrack', 'pmc', 'popset', 'probe', 'proteinclusters', 'pcassay', 'biosystems', 'pccompound', 'pcsubstance', 'pubmedhealth', 'seqannot', 'snp', 'sra', 'taxonomy', 'unigene', 'gencoll', 'gtr']
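For comparison, here is a rough sketch of what Entrez.read does for this particular response, using only the standard library. The XML string is a shortened copy of the einfo output from post #3 (DOCTYPE line dropped, only the first three databases kept), not a live response, and db_names is a hypothetical helper name. Entrez.read itself handles far more than this one case.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the einfo response shown in post #3
# (DOCTYPE line dropped, only three databases kept for brevity).
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8" ?>
<eInfoResult>
<DbList>
<DbName>pubmed</DbName>
<DbName>protein</DbName>
<DbName>nuccore</DbName>
</DbList>
</eInfoResult>"""


def db_names(xml_text):
    """Return the text of every <DbName> element inside <DbList>."""
    # Encode to bytes so the declared UTF-8 encoding is honoured by the parser.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [node.text for node in root.findall("./DbList/DbName")]


names = db_names(SAMPLE_XML)
print(names)
```

In practice Entrez.read is the more convenient route, since it returns ready-made Python lists and dictionaries for every E-utilities response type.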

Post #4, 2017-02-20 19:12 (IP location: Guangdong)

>>> handle = Entrez.einfo(db="pubmed")
>>> record = Entrez.read(handle)
>>> record["DbInfo"]["Description"]
'PubMed bibliographic record'
>>> record["DbInfo"]["Count"]
'26935290'
>>> record["DbInfo"]["LastUpdate"]
'2017/02/20 02:08'

Post #5, 2017-02-20 19:14 (IP location: Guangdong)

>>> handle = Entrez.esearch(db="nucleotide",term="Cypripedioideae[Orgn] AND matK[Gene]")
>>> record = Entrez.read(handle)
>>> record["Count"]
'348'
>>> record["IdList"]
['402502985', '402502983', '402502981', '402502979', '402502977', '402502975', '402502973', '402502971', '402502969', '402502967', '402502965', '402502963', '402502961', '402502959', '402502957', '402502955', '402502953', '402502951', '402502949', '402502947']
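Note that Count is 348 while IdList holds only 20 IDs: esearch returns at most retmax IDs per request, and the E-utilities default for retmax is 20. Below is a minimal sketch of collecting all IDs in pages. page_starts and fetch_all_ids are hypothetical helper names and the batch size of 100 is an arbitrary choice. fetch_all_ids needs Biopython and network access, so only the paging arithmetic is actually run here.

```python
def page_starts(count, batch_size):
    """Return the retstart offsets needed to page through `count` hits."""
    return list(range(0, count, batch_size))


def fetch_all_ids(db, term, email, batch_size=100):
    """Collect every ID matching `term` by paging with retstart/retmax.

    Hypothetical helper: it performs live requests against NCBI.
    """
    # Imported lazily so that page_starts can be used without Biopython.
    from Bio import Entrez

    Entrez.email = email
    # First request: only used to learn the total number of hits.
    handle = Entrez.esearch(db=db, term=term)
    count = int(Entrez.read(handle)["Count"])
    handle.close()

    ids = []
    for start in page_starts(count, batch_size):
        handle = Entrez.esearch(db=db, term=term, retstart=start, retmax=batch_size)
        ids.extend(Entrez.read(handle)["IdList"])
        handle.close()
    return ids


# 348 hits in pages of 100 need four requests.
starts = page_starts(348, 100)
print(starts)
```

For a result set this small, a single call such as Entrez.esearch(db="nucleotide", term="Cypripedioideae[Orgn] AND matK[Gene]", retmax=348) should also return every ID at once.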

Post #6, 2017-02-20 19:25 (IP location: Guangdong)

>>> from Bio import Entrez, SeqIO
>>> handle = Entrez.efetch(db="nucleotide", id="186972394",rettype="gb", retmode="text")
>>> record = SeqIO.read(handle, "genbank")
>>> handle.close()
>>> print(record)
ID: EU490707.1
Name: EU490707
Description: Selenipedium aequinoctiale maturase K (matK) gene, partial cds; chloroplast.
Number of features: 3
/topology=linear
/references=[Reference(title='Phylogenetic utility of ycf1 in orchids: a plastid gene more variable than matK', ...), Reference(title='Direct Submission', ...)]
/sequence_version=1
/keywords=['']
/accessions=['EU490707']
/data_file_division=PLN
/organism=Selenipedium aequinoctiale
/date=26-JUL-2016
/taxonomy=['Eukaryota', 'Viridiplantae', 'Streptophyta', 'Embryophyta', 'Tracheophyta', 'Spermatophyta', 'Magnoliophyta', 'Liliopsida', 'Asparagales', 'Orchidaceae', 'Cypripedioideae', 'Selenipedium']
/source=chloroplast Selenipedium aequinoctiale
Seq('ATTTTTTACGAACCTGTGGAAATTTTTGGTTATGACAATAAATCTAGTTTAGTA...GAA', IUPACAmbiguousDNA())

Post #8, 2017-02-20 19:50 (IP location: Guangdong)

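Note that a more typical approach is to save the sequence data to a local file first and then parse it with Bio.SeqIO. This avoids downloading the same file again every time the script runs, and it reduces the load on the NCBI servers.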
import os
from Bio import SeqIO
from Bio import Entrez
Entrez.email = "A.N.Other@example.com"  # Always tell NCBI who you are
filename = "gi_186972394.gbk"
if not os.path.isfile(filename):
    # Download the record only if there is no local copy yet
    net_handle = Entrez.efetch(db="nucleotide", id="186972394", rettype="gb", retmode="text")
    with open(filename, "w") as out_handle:
        out_handle.write(net_handle.read())
    net_handle.close()
    print("Saved")
print("Parsing...")
record = SeqIO.read(filename, "genbank")
print(record)
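The same download-once pattern can be wrapped in a small helper. This is only a sketch: cached_text and its download argument are hypothetical names, and the download callable stands in for the Entrez.efetch call so that the caching logic can be tried without network access.

```python
import os
import tempfile


def cached_text(filename, download):
    """Return the contents of `filename`, calling `download()` only if it is missing.

    `download` is any zero-argument callable returning the text to cache,
    for example a function that wraps Entrez.efetch(...).read().
    """
    if not os.path.isfile(filename):
        text = download()
        with open(filename, "w") as out_handle:
            out_handle.write(text)
    with open(filename) as in_handle:
        return in_handle.read()


# Demonstration with a stand-in download function that records each call.
calls = []


def fake_download():
    calls.append(1)
    return "LOCUS       EXAMPLE\n"


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.gbk")
    first = cached_text(path, fake_download)   # triggers the download
    second = cached_text(path, fake_download)  # served from the local file

print(len(calls))
```

With a real record, the cached file can then be passed to SeqIO.read(filename, "genbank") exactly as in the script above.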

Post #9, 2017-02-21 21:05 (IP location: Guangdong)

Thanks, this is quite useful.

Post #10, 2018-03-08 17:22 (IP location: Fujian)

Hello, I would like to learn Biopython and have a few questions I would like to ask you.

Post #11, 2018-09-27 17:12

How can I use Biopython to look up information about a gene, starting from a sequence?

Post #12, 2020-03-12 11:57 (IP location: Germany)

Does record["IdList"] only contain 20 entries?

Post #14, 2020-05-24 11:52 (IP location: Guangdong)
