{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

Accessing UniProt Web Services from BioServices

\n", "\n", "

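This notebook illustrates some of the UniProt web services using the BioServices uniprot module. We show how to search the database, retrieve a FASTA sequence, map identifiers between databases, and load results into a pandas DataFrame.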

\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] } ], "source": [ "from bioservices import UniProt\n", "%pylab inline --no-import-all" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The UniProt service can help us get information about a given protein" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "u = UniProt()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you already know the entry name, search for it directly:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Entry\tEntry name\tStatus\tProtein names\tGene names\tOrganism\tLength\n", "P43403\tZAP70_HUMAN\treviewed\tTyrosine-protein kinase ZAP-70 (EC 2.7.10.2) (70 kDa zeta-chain associated protein) (Syk-related tyrosine kinase)\tZAP70 SRK\tHomo sapiens (Human)\t619\n", "\n" ] } ], "source": [ "res = u.search(\"ZAP70_HUMAN\")\n", "print(res)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Otherwise, let us search the entire database. 
We can restrict the search to human (organism 9606), limit the results to the 3 best matches, and select a subset of the columns." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Entry name\tLength\tGene names\n", "ZAP70_HUMAN\t619\tZAP70 SRK\n", "CBL_HUMAN\t906\tCBL CBL2 RNF55\n", "LCK_HUMAN\t509\tLCK\n", "\n" ] } ], "source": [ "print(u.search('zap70+AND+organism:9606', frmt='tab', limit=3,\n", " columns=\"entry name, length, genes\"))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Entry\tEntry name\tStatus\tProtein names\tGene names\tOrganism\tLength\n", "P43403\tZAP70_HUMAN\treviewed\tTyrosine-protein kinase ZAP-70 (EC 2.7.10.2) (70 kDa zeta-chain associated protein) (Syk-related tyrosine kinase)\tZAP70 SRK\tHomo sapiens (Human)\t619\n", "\n" ] } ], "source": [ "print(res)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Experimental: using pandas to scan the output of the search function" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "u.debugLevel = \"INFO\"\n", "u.timeout = 10 # timeout in seconds; some queries are long and require much more time (default is 1000 seconds)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Access by entry name (e.g., ZAP70_HUMAN) is faster than by accession number (Entry, e.g., P43403)." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table>\n",
" <thead>\n",
" <tr><th></th><th>Entry</th><th>Entry name</th><th>Gene names</th><th>Gene names (primary )</th><th>Gene names (synonym )</th><th>Gene names (ordered locus )</th><th>Gene names (ORF )</th><th>Organism</th><th>Organism ID</th><th>Protein names</th><th>...</th><th>Taxonomic lineage IDs (GENUS)</th><th>Taxonomic lineage IDs (SUBGENUS)</th><th>Taxonomic lineage IDs (SPECIES GROUP)</th><th>Taxonomic lineage IDs (SPECIES SUBGROUP)</th><th>Taxonomic lineage IDs (SPECIES)</th><th>Taxonomic lineage IDs (SUBSPECIES)</th><th>Taxonomic lineage IDs (VARIETAS)</th><th>Taxonomic lineage IDs (FORMA)</th><th>Cross-reference (db_abbrev)</th><th>Cross-reference (EMBL)</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>P43403</td><td>ZAP70_HUMAN</td><td>[ZAP70 SRK]</td><td>ZAP70</td><td>SRK</td><td>NaN</td><td>NaN</td><td>Homo sapiens (Human)</td><td>9606</td><td>Tyrosine-protein kinase ZAP-70 (EC 2.7.10.2) (...</td><td>...</td><td>9605</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>AB083211;AC016699;BC039039;BC053878;</td></tr>\n",
" <tr><th>1</th><td>Q8TD08</td><td>MK15_HUMAN</td><td>[MAPK15 ERK7 ERK8]</td><td>MAPK15</td><td>ERK7 ERK8</td><td>NaN</td><td>NaN</td><td>Homo sapiens (Human)</td><td>9606</td><td>Mitogen-activated protein kinase 15 (MAP kinas...</td><td>...</td><td>9605</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>AY065978;AY994058;BC028034;</td></tr>\n",
" <tr><th>2</th><td>P10144</td><td>GRAB_HUMAN</td><td>[GZMB CGL1 CSPB CTLA1 GRB]</td><td>GZMB</td><td>CGL1 CSPB CTLA1 GRB</td><td>NaN</td><td>NaN</td><td>Homo sapiens (Human)</td><td>9606</td><td>Granzyme B (EC 3.4.21.79) (C11) (CTLA-1) (Cath...</td><td>...</td><td>9605</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>M17016;J03189;J04071;J03072;M38193;M28879;BC03...</td></tr>\n",
" <tr><th>3</th><td>P05412</td><td>JUN_HUMAN</td><td>[JUN]</td><td>JUN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>Homo sapiens (Human)</td><td>9606</td><td>Transcription factor AP-1 (Activator protein 1...</td><td>...</td><td>9605</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>NaN</td><td>J04111;CR541724;BT019759;AY217548;BC006175;BC0...</td></tr>\n",
" </tbody>\n",
"</table>\n",
"<p>4 rows × 179 columns</p>\n",
"</div>
" ], "text/plain": [ " Entry Entry name Gene names Gene names (primary ) \\\n", "0 P43403 ZAP70_HUMAN [ZAP70 SRK] ZAP70 \n", "1 Q8TD08 MK15_HUMAN [MAPK15 ERK7 ERK8] MAPK15 \n", "2 P10144 GRAB_HUMAN [GZMB CGL1 CSPB CTLA1 GRB] GZMB \n", "3 P05412 JUN_HUMAN [JUN] JUN \n", "\n", " Gene names (synonym ) Gene names (ordered locus ) Gene names (ORF ) \\\n", "0 SRK NaN NaN \n", "1 ERK7 ERK8 NaN NaN \n", "2 CGL1 CSPB CTLA1 GRB NaN NaN \n", "3 NaN NaN NaN \n", "\n", " Organism Organism ID \\\n", "0 Homo sapiens (Human) 9606 \n", "1 Homo sapiens (Human) 9606 \n", "2 Homo sapiens (Human) 9606 \n", "3 Homo sapiens (Human) 9606 \n", "\n", " Protein names \\\n", "0 Tyrosine-protein kinase ZAP-70 (EC 2.7.10.2) (... \n", "1 Mitogen-activated protein kinase 15 (MAP kinas... \n", "2 Granzyme B (EC 3.4.21.79) (C11) (CTLA-1) (Cath... \n", "3 Transcription factor AP-1 (Activator protein 1... \n", "\n", " ... \\\n", "0 ... \n", "1 ... \n", "2 ... \n", "3 ... \n", "\n", " Taxonomic lineage IDs (GENUS) Taxonomic lineage IDs (SUBGENUS) \\\n", "0 9605 NaN \n", "1 9605 NaN \n", "2 9605 NaN \n", "3 9605 NaN \n", "\n", " Taxonomic lineage IDs (SPECIES GROUP) \\\n", "0 NaN \n", "1 NaN \n", "2 NaN \n", "3 NaN \n", "\n", " Taxonomic lineage IDs (SPECIES SUBGROUP) Taxonomic lineage IDs (SPECIES) \\\n", "0 NaN NaN \n", "1 NaN NaN \n", "2 NaN NaN \n", "3 NaN NaN \n", "\n", " Taxonomic lineage IDs (SUBSPECIES) Taxonomic lineage IDs (VARIETAS) \\\n", "0 NaN NaN \n", "1 NaN NaN \n", "2 NaN NaN \n", "3 NaN NaN \n", "\n", " Taxonomic lineage IDs (FORMA) Cross-reference (db_abbrev) \\\n", "0 NaN NaN \n", "1 NaN NaN \n", "2 NaN NaN \n", "3 NaN NaN \n", "\n", " Cross-reference (EMBL) \n", "0 AB083211;AC016699;BC039039;BC053878; \n", "1 AY065978;AY994058;BC028034; \n", "2 M17016;J03189;J04071;J03072;M38193;M28879;BC03... \n", "3 J04111;CR541724;BT019759;AY217548;BC006175;BC0... 
\n", "\n", "[4 rows x 179 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = u.get_df([\"ZAP70_HUMAN\", \"GRAB_HUMAN\", \"JUN_HUMAN\", \"MK15_HUMAN\"])\n", "df" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD8CAYAAACMwORRAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4xLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvAOZPmwAAErhJREFUeJzt3XuMXGd5x/Hvg52bsiEBDNvIdrFpTYsVoyTemqBUsAtp46RV3EoJchRCUxEstSS0Sih1RJXStFUFVUqFmoa63KGwhLSlVuIqIOJVL2pC4ubiOMFlCVazSYi5ut0QCC5P/5hjMxrvzs7OnJkdv/p+pJHPOfPOzPM+PufnmbOzx5GZSJLK8oKlLkCSVD/DXZIKZLhLUoEMd0kqkOEuSQUy3CWpQIa7JBXIcJekAhnuklSg5Uv1witWrMg1a9b0/DzPPvssp556au8F9ZE19m7Y6wNrrIs1trdnz55vZeZLFxyYmUty27hxY9Zh9+7dtTxPP1lj74a9vkxrrIs1tgfcnx1krKdlJKlAhrskFchwl6QCGe6SVCDDXZIKtGC4R8RHIuJgRDwyz/0RER+IiOmIeDgizq2/TEnSYnTyzv1jwOY2918ErKtu24Bbey9LktSLBcM9M/8F+E6bIVuAT1RfwbwHOCMizqyrQEnS4tVxzn0l8ETT+ky1TZK0RCI7+A+yI2INcEdmnjXHfXcCf5aZ/1atfwl4V2bumWPsNhqnbhgdHd04OTnZVdF7nzx0dHn0FHjmua6episbVp6+6MfMzs4yMjLS0+s2z7kf2vWxmznXrY4e9tvxXGO/9692WvevQfWxlzn3mju9HFMTExN7MnNsoXF1XFtmBljdtL4KeGqugZm5A9gBMDY2luPj41294FXb7zy6fP2Gw9y8d3CXyDlwxfiiHzM1NUW3cz2iec790K6P3cy5bnX0sN+O5xr7vX+107p/DaqPvcy519wZxDFVx2mZncBbqm/NnAccysyna3heSVKXFvynJyI+A4wDKyJiBvhD4ASAzPwgsAu4GJgGvg/8Zr+KlSR1ZsFwz8zLF7g/gbfXVpEkqWf+hqokFchwl6QCGe6SVCDDXZIKZLhLUoEMd0kqkOEuSQUy3CWpQIa7JBXIcJekAhnuklQgw12SCmS4S1KBDHdJKpDhLkkFMtwlqUCGuyQVyHCXpAIZ7pJUIMNdkgpkuEtSgQx3SSqQ4S5JBTLcJalAhrskFchwl6QCGe6SVCDDXZIKZLhLUoEMd0kqkOEuSQUy3CWpQB2Fe0Rsjoj9ETEdEdvnuP+nI2J3RDwQEQ9HxMX1lypJ6tSC4R4Ry4BbgIuA9cDlEbG+ZdgfALdl5jnAVuCv6y5UktS5Tt65bwKmM/PxzHwemAS2tIxJ4IXV8unAU/WVKElarOUdjFkJPNG0PgO8pmXMe4AvRMS1wKnABbVUJ0nqSmRm+wERlwEXZubV1fqVwKbMvLZpzHXVc90cEa8FPgyclZk/bnmubcA2gNHR0Y2Tk5NdFb33yUNHl0dPgWee6+ppurJh5emLfszs7CwjIyM9
vW7znPuhXR+7mXPd6uhhvx3PNfZ7/2qndf8aVB97mXOvudPLMTUxMbEnM8cWGtfJO/cZYHXT+iqOPe3yVmAzQGb+R0ScDKwADjYPyswdwA6AsbGxHB8f7+Dlj3XV9juPLl+/4TA37+1kGvU4cMX4oh8zNTVFt3M9onnO/dCuj93MuW519LDfjuca+71/tdO6fw2qj73MudfcGcQx1ck59/uAdRGxNiJOpPED050tY/4beCNARLwKOBn4Zp2FSpI6t2C4Z+Zh4BrgLuAxGt+K2RcRN0XEJdWw64G3RcRDwGeAq3Kh8z2SpL7p6HNFZu4CdrVsu7Fp+VHg/HpLkyR1y99QlaQCGe6SVCDDXZIKZLhLUoEMd0kqkOEuSQUy3CWpQIa7JBXIcJekAhnuklQgw12SCmS4S1KBDHdJKpDhLkkFMtwlqUCGuyQVyHCXpAIZ7pJUIMNdkgpkuEtSgQx3SSqQ4S5JBTLcJalAhrskFchwl6QCGe6SVCDDXZIKZLhLUoEMd0kqkOEuSQUy3CWpQIa7JBXIcJekAnUU7hGxOSL2R8R0RGyfZ8ybIuLRiNgXEZ+ut0xJ0mIsX2hARCwDbgF+CZgB7ouInZn5aNOYdcANwPmZ+d2IeFm/CpYkLayTd+6bgOnMfDwznwcmgS0tY94G3JKZ3wXIzIP1lilJWoxOwn0l8ETT+ky1rdkrgVdGxL9HxD0RsbmuAiVJixeZ2X5AxGXAhZl5dbV+JbApM69tGnMH8CPgTcAq4F+BszLzey3PtQ3YBjA6OrpxcnKyq6L3Pnno6PLoKfDMc109TVc2rDx90Y+ZnZ1lZGSkp9dtnnM/tOtjN3OuWx097LfjucZ+71/ttO5fg+pjL3PuNXd6OaYmJib2ZObYQuMWPOdO45366qb1VcBTc4y5JzN/BHw9IvYD64D7mgdl5g5gB8DY2FiOj4938PLHumr7nUeXr99wmJv3djKNehy4YnzRj5mamqLbuR7RPOd+aNfHbuZctzp62G/Hc4393r/aad2/BtXHXubca+4M4pjq5LTMfcC6iFgbEScCW4GdLWM+D0wARMQKGqdpHq+zUElS5xYM98w8DFwD3AU8BtyWmfsi4qaIuKQadhfw7Yh4FNgN/F5mfrtfRUuS2uvoc0Vm7gJ2tWy7sWk5geuqmyRpifkbqpJUIMNdkgpkuEtSgQx3SSqQ4S5JBTLcJalAhrskFchwl6QCGe6SVCDDXZIKZLhLUoEMd0kqkOEuSQUy3CWpQIa7JBXIcJekAhnuklQgw12SCmS4S1KBDHdJKpDhLkkFMtwlqUCGuyQVyHCXpAIZ7pJUIMNdkgpkuEtSgQx3SSqQ4S5JBTLcJalAhrskFchwl6QCdRTuEbE5IvZHxHREbG8z7tKIyIgYq69ESdJiLRjuEbEMuAW4CFgPXB4R6+cYdxrwDuDeuouUJC1OJ+/cNwHTmfl4Zj4PTAJb5hj3x8D7gB/UWJ8kqQudhPtK4Imm9Zlq21ERcQ6wOjPvqLE2SVKXIjPbD4i4DLgwM6+u1q8ENmXmtdX6C4C7gasy80BETAHvzMz753iubcA2gNHR0Y2Tk5NdFb33yUNHl0dPgWee6+ppurJh5emLfszs7CwjIyM9vW7znPuhXR+7mXPd6uhhvx3PNfZ7/2qndf8aVB97mXOvudPLMTUxMbEnMxf8uebyDp5rBljdtL4KeKpp/TTgLGAqIgB+CtgZEZe0Bnxm7gB2AIyNjeX4+HgHL3+sq7bfeXT5+g2HuXlvJ9Oox4Erxhf9mKmpKbqd6xHNc+6Hdn3sZs51q6OH/XY819jv/aud1v1rUH3sZc695s4gjqlOTsvcB6yLiLURcSKwFdh55M7MPJSZKzJzTWauAe4Bjgl2SdLgLBjumXkYuAa4C3gMuC0z90XETRFxSb8LlCQtXkefKzJzF7CrZduN84wd770sSVIv/A1VSSqQ4S5JBTLcJalAhrskFchwl6QCGe6SVCDDXZIKZLhLUoEM
d0kqkOEuSQUy3CWpQIa7JBXIcJekAhnuklQgw12SCmS4S1KBDHdJKpDhLkkFMtwlqUCGuyQVyHCXpAIZ7pJUIMNdkgpkuEtSgQx3SSqQ4S5JBTLcJalAhrskFchwl6QCGe6SVCDDXZIKZLhLUoEMd0kqUEfhHhGbI2J/RExHxPY57r8uIh6NiIcj4ksR8fL6S5UkdWrBcI+IZcAtwEXAeuDyiFjfMuwBYCwzXw3cDryv7kIlSZ3r5J37JmA6Mx/PzOeBSWBL84DM3J2Z369W7wFW1VumJGkxIjPbD4i4FNicmVdX61cCr8nMa+YZ/1fANzLzT+a4bxuwDWB0dHTj5ORkV0XvffLQ0eXRU+CZ57p6mq5sWHn6oh8zOzvLyMhIT6/bPOd+aNfHbuZctzp62G/Hc4393r/aad2/BtXHXubca+70ckxNTEzsycyxhcYt7+C5Yo5tc/6LEBFvBsaA1891f2buAHYAjI2N5fj4eAcvf6yrtt95dPn6DYe5eW8n06jHgSvGF/2Yqakpup3rEc1z7od2fexmznWro4f9djzX2O/9q53W/WtQfexlzr3mziCOqU6qmwFWN62vAp5qHRQRFwDvBl6fmT+spzxJUjc6Oed+H7AuItZGxInAVmBn84CIOAf4G+CSzDxYf5mSpMVYMNwz8zBwDXAX8BhwW2bui4ibIuKSatifAyPA5yLiwYjYOc/TSZIGoKOTRpm5C9jVsu3GpuULaq5LktQDf0NVkgpkuEtSgQx3SSqQ4S5JBTLcJalAhrskFchwl6QCGe6SVCDDXZIKZLhLUoEMd0kqkOEuSQUy3CWpQIa7JBXIcJekAhnuklQgw12SCmS4S1KBDHdJKpDhLkkFMtwlqUCGuyQVyHCXpAIZ7pJUIMNdkgpkuEtSgQx3SSqQ4S5JBTLcJalAhrskFchwl6QCGe6SVKCOwj0iNkfE/oiYjojtc9x/UkR8trr/3ohYU3ehkqTOLRjuEbEMuAW4CFgPXB4R61uGvRX4bmb+LPB+4L11FypJ6lwn79w3AdOZ+XhmPg9MAltaxmwBPl4t3w68MSKivjIlSYvRSbivBJ5oWp+pts05JjMPA4eAl9RRoCRp8ZZ3MGaud+DZxRgiYhuwrVqdjYj9Hbx+W++AFcC3en2eTkV3J5wGWmM32vWxyznXbeh7iDV2ZY79a+hqbNVr7vR4TL28k0GdhPsMsLppfRXw1DxjZiJiOXA68J3WJ8rMHcCOTgrrVETcn5ljdT5n3ayxd8NeH1hjXayxHp2clrkPWBcRayPiRGArsLNlzE7gN6rlS4G7M/OYd+6SpMFY8J17Zh6OiGuAu4BlwEcyc19E3ATcn5k7gQ8Dn4yIaRrv2Lf2s2hJUnudnJYhM3cBu1q23di0/APgsnpL61itp3n6xBp7N+z1gTXWxRprEJ49kaTyePkBSSrQUId7RKyOiN0R8VhE7IuI36m2vycinoyIB6vbxU2PuaG6DML+iLhwADWeHBFfjoiHqhr/qNq+troUw1erSzOcWG0f+KUa2tT4sYj4elMfz662R0R8oKrx4Yg4t981NtW6LCIeiIg7qvWh6eM89Q1jDw9ExN6qnvurbS+OiC9WffxiRLxoKeucp8ZhOq7PiIjbI+IrVf68dth6uKDMHNobcCZwbrV8GvBfNC6B8B7gnXOMXw88BJwErAW+Bizrc40BjFTLJwD3AucBtwFbq+0fBH6rWv5t4IPV8lbgswPo43w1fgy4dI7xFwP/XD3uPODeAf6dXwd8GrijWh+aPs5T3zD28ACwomXb+4Dt1fJ24L1LWec8NQ7Tcf1x4Opq+UTgjGHr4UK3oX7nnplPZ+Z/Vsv/CzzGsb8d22wLMJmZP8zMrwPTNC6f0M8aMzNnq9UTqlsCb6BxKQZo7Ci/1lTjQC/V0KbG+WwBPlE97h7gjIg4s581AkTEKuBXgA9V68EQ9bG1vgUsSQ8XqOdIv1r7OEx1zmWgx3VEvBB4HY1vAZKZz2fm
9zjOejjU4d6s+th9Do13nQDXVB+BPnLk4xGdXSqhH7Uti4gHgYPAF2m8s/heNi7F0FrHklyqobXGzDzSxz+t+vj+iDiptcY56u+nvwTeBfy4Wn8Jw9XH1vqOGKYeQuMf7i9ExJ5o/FY4wGhmPg2NN03Ay5a4zrlqhOE4rl8BfBP4aHUK7kMRcSrD18O2jotwj4gR4O+B383M/wFuBX4GOBt4Grj5yNA5Ht73rwNl5v9l5tk0fnt3E/CqNnUMRY0RcRZwA/DzwC8ALwZ+f6lqjIhfBQ5m5p7mzW3qGGiN89QHQ9TDJudn5rk0ruT69oh4XZuxS1XnXDUOy3G9HDgXuDUzzwGepXEaZj5L+Xc9r6EP94g4gUaw/11m/gNAZj5ThdWPgb/lJx/ROrlUQt9UH92maJx3OyMal2JoreNojdHmUg0DqHFzddorM/OHwEdZ2j6eD1wSEQdoXHn0DTTeKQ9LH4+pLyI+NWQ9BCAzn6r+PAj8Y1XTM0dOFVR/HlzKOueqcYiO6xlgpunT7e00wn6oeriQoQ736hzqh4HHMvMvmrY3n8/6deCRanknsLX6JsVaYB3w5T7X+NKIOKNaPgW4gMbPBnbTuBQDNC7N8E9NNQ70Ug3z1PiVph01aJw/bO7jW6pvAZwHHDrycbRfMvOGzFyVmWto/ID07sy8giHp4zz1vXmYeljVcWpEnHZkGfjlqqbmfrX2caB1zlfjsBzXmfkN4ImI+Llq0xuBRxmiHnZkUD+57eYG/CKNjzcPAw9Wt4uBTwJ7q+07gTObHvNuGue89wMXDaDGVwMPVLU8AtxYbX8FjR1wGvgccFK1/eRqfbq6/xVLWOPdVR8fAT7FT75REzT+g5avVfePDfjvfZyffBtlaPo4T31D1cOqXw9Vt33Au6vtLwG+BHy1+vPFS1VnmxqH6bg+G7i/quXzwIuGqYed3PwNVUkq0FCflpEkdcdwl6QCGe6SVCDDXZIKZLhLUoEMd0kqkOEuSQUy3CWpQP8P38eHtjZ84oUAAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df['Length'].hist()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### UniProt service can help us getting the FASTA sequence and more generally information about a given protein" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "sequence = u.retrieve(\"P43403\", \"fasta\")" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ">sp|P43403|ZAP70_HUMAN Tyrosine-protein kinase ZAP-70 OS=Homo sapiens OX=9606 GN=ZAP70 PE=1 SV=1\n", "MPDPAAHLPFFYGSISRAEAEEHLKLAGMADGLFLLRQCLRSLGGYVLSLVHDVRFHHFP\n", "IERQLNGTYAIAGGKAHCGPAELCEFYSRDPDGLPCNLRKPCNRPSGLEPQPGVFDCLRD\n", "AMVRDYVRQTWKLEGEALEQAIISQAPQVEKLIATTAHERMPWYHSSLTREEAERKLYSG\n", "AQTDGKFLLRPRKEQGTYALSLIYGKTVYHYLISQDKAGKYCIPEGTKFDTLWQLVEYLK\n", 
"LKADGLIYCLKEACPNSSASNASGAAAPTLPAHPSTLTHPQRRIDTLNSDGYTPEPARIT\n", "SPDKPRPMPMDTSVYESPYSDPEELKDKKLFLKRDNLLIADIELGCGNFGSVRQGVYRMR\n", "KKQIDVAIKVLKQGTEKADTEEMMREAQIMHQLDNPYIVRLIGVCQAEALMLVMEMAGGG\n", "PLHKFLVGKREEIPVSNVAELLHQVSMGMKYLEEKNFVHRDLAARNVLLVNRHYAKISDF\n", "GLSKALGADDSYYTARSAGKWPLKWYAPECINFRKFSSRSDVWSYGVTMWEALSYGQKPY\n", "KKMKGPEVMAFIEQGKRMECPPECPPELYALMSDCWIYKWEDRPDFLTVEQRMRACYYSL\n", "ASKVEGPPGSTQKAEAACA\n", "\n" ] } ], "source": [ "print(sequence)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, you can use the following function to get only the FASTA sequence:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'MPDPAAHLPFFYGSISRAEAEEHLKLAGMADGLFLLRQCLRSLGGYVLSLVHDVRFHHFPIERQLNGTYAIAGGKAHCGPAELCEFYSRDPDGLPCNLRKPCNRPSGLEPQPGVFDCLRDAMVRDYVRQTWKLEGEALEQAIISQAPQVEKLIATTAHERMPWYHSSLTREEAERKLYSGAQTDGKFLLRPRKEQGTYALSLIYGKTVYHYLISQDKAGKYCIPEGTKFDTLWQLVEYLKLKADGLIYCLKEACPNSSASNASGAAAPTLPAHPSTLTHPQRRIDTLNSDGYTPEPARITSPDKPRPMPMDTSVYESPYSDPEELKDKKLFLKRDNLLIADIELGCGNFGSVRQGVYRMRKKQIDVAIKVLKQGTEKADTEEMMREAQIMHQLDNPYIVRLIGVCQAEALMLVMEMAGGGPLHKFLVGKREEIPVSNVAELLHQVSMGMKYLEEKNFVHRDLAARNVLLVNRHYAKISDFGLSKALGADDSYYTARSAGKWPLKWYAPECINFRKFSSRSDVWSYGVTMWEALSYGQKPYKKMKGPEVMAFIEQGKRMECPPECPPELYALMSDCWIYKWEDRPDFLTVEQRMRACYYSLASKVEGPPGSTQKAEAACA'" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "u.get_fasta_sequence(\"P43403\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The UniProt service also has a mapping utility that can be called via BioServices" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are 3 required parameters. 
The input database code, the output database code and query as a list of valid identifiers" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defaultdict(list, {'P43403': ['hsa:7535']})" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "u.mapping(\"ACC\", \"KEGG_ID\", 'P43403')" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "defaultdict(list,\n", " {'P43403': ['1FBV',\n", " '1M61',\n", " '1U59',\n", " '2CBL',\n", " '2OQ1',\n", " '2OZO',\n", " '2Y1N',\n", " '3ZNI',\n", " '4A4B',\n", " '4A4C',\n", " '4K2R',\n", " '4XZ0',\n", " '4XZ1',\n", " '5O76']})" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "u.mapping(\"ID\", \"PDB_ID\", \"P43403\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are the databases available for mapping" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'AGD': 'AGD_ID',\n", " 'Aarhus/Ghent-2DPAGE': 'AARHUS_GHENT_2DPAGE_ID',\n", " 'Allergome': 'ALLERGOME_ID',\n", " 'ArachnoServer': 'ARACHNOSERVER_ID',\n", " 'BioCyc': 'BIOCYC_ID',\n", " 'CGD': 'CGD',\n", " 'CYGD': 'CYGD_ID',\n", " 'ChEMBL': 'CHEMBL_ID',\n", " 'ChiTaRS': 'CHITARS_ID',\n", " 'CleanEx': 'CLEANEX_ID',\n", " 'ConoServer': 'CONOSERVER_ID',\n", " 'DIP': 'DIP_ID',\n", " 'DMDM': 'DMDM_ID',\n", " 'DNASU': 'DNASU_ID',\n", " 'DisProt': 'DISPROT_ID',\n", " 'DrugBank': 'DRUGBANK_ID',\n", " 'EMBL/GenBank/DDBJ': 'EMBL_ID',\n", " 'EMBL/GenBank/DDBJ CDS': 'EMBL',\n", " 'EchoBASE': 'ECHOBASE_ID',\n", " 'EcoGene': 'ECOGENE_ID',\n", " 'Ensembl': 'ENSEMBL_ID',\n", " 'Ensembl Genomes': 'ENSEMBLGENOME_ID',\n", " 'Ensembl Genomes Protein': 'ENSEMBLGENOME_PRO_ID',\n", " 'Ensembl Genomes Transcript': 'ENSEMBLGENOME_TRS_ID',\n", " 'Ensembl Protein': 'ENSEMBL_PRO_ID',\n", " 'Ensembl Transcript': 'ENSEMBL_TRS_ID',\n", " 
'Entrez Gene (GeneID)': 'P_ENTREZGENEID',\n", " 'EuPathDB': 'EUPATHDB_ID',\n", " 'FlyBase': 'FLYBASE_ID',\n", " 'GI number*': 'P_GI',\n", " 'GeneCards': 'GENECARDS_ID',\n", " 'GeneFarm': 'GENEFARM_ID',\n", " 'GeneID': 'P_ENTREZGENEID',\n", " 'GeneTree': 'GENETREE_ID',\n", " 'GenoList': 'GENOLIST_ID',\n", " 'GenomeRNAi': 'GENOMERNAI_ID',\n", " 'GenomeReviews': 'GENOMEREVIEWS_ID',\n", " 'GermOnline': 'GERMONLINE_ID',\n", " 'H-InvDB': 'H_INVDB_ID',\n", " 'HGNC': 'HGNC_ID',\n", " 'HOGENOM': 'HOGENOM_ID',\n", " 'HOVERGEN': 'HOVERGEN_ID',\n", " 'HPA': 'HPA_ID',\n", " 'HSSP': 'HSSP_ID',\n", " 'IPI': 'P_IPI',\n", " 'KEGG': 'KEGG_ID',\n", " 'KO': 'KO_ID',\n", " 'LegioList': 'LEGIOLIST_ID',\n", " 'Leproma': 'LEPROMA_ID',\n", " 'MEROPS': 'MEROPS_ID',\n", " 'MGI': 'MGI_ID',\n", " 'MIM': 'MIM_ID',\n", " 'MINT': 'MINT_ID',\n", " 'MaizeGDB': 'MAIZEGDB_ID',\n", " 'NextBio': 'NEXTBIO_ID',\n", " 'OMA': 'OMA_ID',\n", " 'Orphanet': 'ORPHANET_ID',\n", " 'OrthoDB': 'ORTHODB_ID',\n", " 'PATRIC': 'PATRIC_ID',\n", " 'PDB': 'PDB_ID',\n", " 'PIR': 'PIR',\n", " 'PeroxiBase': 'PEROXIBASE_ID',\n", " 'PharmGKB': 'PHARMGKB_ID',\n", " 'PhosSite': 'PHOSSITE_ID',\n", " 'PomBase': 'POMBASE_ID',\n", " 'PptaseDB': 'PPTASEDB_ID',\n", " 'ProtClustDB': 'PROTCLUSTDB_ID',\n", " 'PseudoCAP': 'PSEUDOCAP_ID',\n", " 'REBASE': 'REBASE_ID',\n", " 'RGD': 'RGD_ID',\n", " 'Reactome': 'REACTOME_ID',\n", " 'RefSeq Nucleotide': 'REFSEQ_NT_ID',\n", " 'RefSeq Protein': 'P_REFSEQ_AC',\n", " 'SGD': 'SGD_ID',\n", " 'TAIR': 'TAIR_ID',\n", " 'TCDB': 'TCDB_ID',\n", " 'TubercuList': 'TUBERCULIST_ID',\n", " 'UCSC': 'UCSC_ID',\n", " 'UniGene': 'UNIGENE_ID',\n", " 'UniParc': 'UPARC',\n", " 'UniPathWay': 'UNIPATHWAY_ID',\n", " 'UniProtKB': 'ID',\n", " 'UniProtKB AC/ID': 'ACC+ID',\n", " 'UniRef100': 'NF100',\n", " 'UniRef50': 'NF50',\n", " 'UniRef90': 'NF90',\n", " 'VectorBase': 'VECTORBASE_ID',\n", " 'World-2DPAGE': 'WORLD_2DPAGE_ID',\n", " 'WormBase': 'WORMBASE_ID',\n", " 'WormBase Protein': 'WORMBASE_PRO_ID',\n", " 'WormBase 
Transcript': 'WORMBASE_TRS_ID',\n", " 'Xenbase': 'XENBASE_ID',\n", " 'ZFIN': 'ZFIN_ID',\n", " 'dictyBase': 'DICTYBASE_ID',\n", " 'eggNOG': 'EGGNOG_ID',\n", " 'euHCVdb': 'EUHCVDB_ID',\n", " 'mycoCLAP': 'MYCOCLAP_ID',\n", " 'neXtProt': 'NEXTPROT_ID'}" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "u._mapping" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Using get_df method to get exhaustive information" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "--2020-03-10 18:05:59-- ftp://ftp.ebi.ac.uk/pub/databases/uniprot/knowledgebase/uniprot_sprot.fasta.gz\n", " => ‘uniprot_sprot.fasta.gz’\n", "Resolving ftp.ebi.ac.uk (ftp.ebi.ac.uk)... 193.62.197.74\n", "Connecting to ftp.ebi.ac.uk (ftp.ebi.ac.uk)|193.62.197.74|:21... connected.\n", "Logging in as anonymous ... Logged in!\n", "==> SYST ... done. ==> PWD ... done.\n", "==> TYPE I ... done. ==> CWD (1) /pub/databases/uniprot/knowledgebase ... done.\n", "==> SIZE uniprot_sprot.fasta.gz ... 89133570\n", "==> PASV ... done. ==> RETR uniprot_sprot.fasta.gz ... done.\n", "Length: 89133570 (85M) (unauthoritative)\n", "\n", "uniprot_sprot.fasta 100%[===================>] 85.00M 4.58MB/s in 19s \n", "\n", "2020-03-10 18:06:18 (4.41 MB/s) - ‘uniprot_sprot.fasta.gz’ saved [89133570]\n", "\n" ] } ], "source": [ "! 
wget ftp://ftp.ebi.ac.uk/pub/databases/uniprot/knowledgebase/uniprot_sprot.fasta.gz" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "!gunzip -c uniprot_sprot.fasta.gz | grep sp - | grep HUMAN | awk '{print substr($1, 12, length($1))}' > list.txt\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "with open(\"list.txt\", \"r\") as fh:\n", " identifiers = fh.read().split() # split on whitespace so no empty trailing entry is kept" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO [bioservices:UniProt]: fetching information from uniprot for 1000 entries\n", "INFO [bioservices:UniProt]: uniprot.get_df 1/10\n", "INFO [bioservices:UniProt]: uniprot.get_df 2/10\n", "INFO [bioservices:UniProt]: uniprot.get_df 3/10\n", "INFO [bioservices:UniProt]: uniprot.get_df 4/10\n", "INFO [bioservices:UniProt]: uniprot.get_df 5/10\n", "INFO [bioservices:UniProt]: uniprot.get_df 6/10\n", "INFO [bioservices:UniProt]: uniprot.get_df 7/10\n", "INFO [bioservices:UniProt]: uniprot.get_df 8/10\n", "INFO [bioservices:UniProt]: uniprot.get_df 9/10\n", "INFO [bioservices:UniProt]: uniprot.get_df 10/10\n" ] } ], "source": [ "# This is slow. 
You may want to increase the attribute TIMEOUT or call this \n", "# command several times and aggregate the results\n", "u.TIMEOUT = 300\n", "####### limit set to 1 is important, otherwise this will take forever\n", "u.logging.level = \"INFO\"\n", "df = u.get_df(identifiers[0:1000], limit=1)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" }, { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXwAAAD8CAYAAAB0IB+mAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4xLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvAOZPmwAAEp1JREFUeJzt3X+M5HV9x/Hnu3cHUhYP9Ox6Oc4uRNKUeK3ABjE0ZhZ/ARr5B5MjRMFqLrVabYsxhyYYTYzaRK0GI14rFY1lsaDtlbuGEmT98YfoLh7cHSdy2GtYoJ6CHi6i9uq7f8yXdnbYY2Znv7O7M5/nI5ns9/v5fuY77/fN7Gu/+92Z70VmIkkafr+z0gVIkpaHgS9JhTDwJakQBr4kFcLAl6RCGPiSVAgDX5IKYeBLUiEMfEkqxNqVeuANGzbk2NjYvLEnn3ySE088cWUK6iP7GjzD2tuw9gXD21t7XzMzMz/NzBf0sq8VC/yxsTGmp6fnjU1NTdFoNFamoD6yr8EzrL0Na18wvL219xUR/9nrvjylI0mFMPAlqRAGviQVwsCXpEIY+JJUiI6BHxHPiYjvRsQ9EbE/Ij64wJzjI+KmiDgYEXdFxFg/ipUk9a6bI/xfAxdk5h8DLwUujIjz2ua8FfhZZr4Y+CTwsXrLlCQtVcfAz6a5anVddWv/fxEvAW6olm8GXhkRUVuVkqQl6+ocfkSsiYg9wGHg9sy8q23KJuAhgMw8ChwBnl9noZKkpYnF/CfmEXEy8DXgLzJzX8v4fuC1mTlbrT8InJuZj7XdfxuwDWB0dPScycnJefufm5tjZGSkx1a6t/fhIz3fd8um9Yu+z3L1tdyGtS8Y3t6GtS8Y3t7a+5qYmJjJzPFe9rWoSytk5s8jYgq4ENjXsmkW2AzMRsRaYD3w+AL33wHsABgfH8/2j0Ev10ejr9y+q+f7Hrq8sej7lPKR72EyrL0Na18wvL3V2Vc379J5QXVkT0ScALwK+EHbtJ3AFdXypcDXczG/OkiS+q6bI/yNwA0RsYbmD4ivZOatEfEhYDozdwKfB74UEQdpHtlv7VvFkqSedAz8zLwXOGuB8Wtaln8FvLHe0iRJdfKTtpJUCANfkgph4EtSIQx8SSqEgS9JhTDwJakQBr4kFcLAl6RCGPiSVAgDX5IKYeBLUiEMfEkqhIEvSYUw8CWpEAa+JBXCwJekQhj4klQIA1+SCmHgS1IhDHxJKoSBL0mFMPAlqRAGviQVwsCXpEIY+JJUCANfkgrRMfAjYnNE3BkRByJif0S8e4E5jYg4EhF7qts1/SlXktSrtV3MOQpclZl3R8RJwExE3J6Z97XN+1Zmvr7+EiVJdeh4hJ+Zj2bm3dXyL4ADwKZ+FyZJqldkZveTI8aAbwIvycwnWsYbwC3ALPAI8J7M3L/A/bcB2wBGR0fPmZycnLd9bm6OkZGRxfawaHsfPtLzfbdsWr/o+yxXX8
ttWPuC4e1tWPuC4e2tva+JiYmZzBzvZV9dB35EjADfAD6cmV9t2/Zc4LeZORcRFwOfyswznm1/4+PjOT09PW9samqKRqOxiPJ7M7Z9V8/3PfTR1y36PsvV13Ib1r5geHsb1r5geHtr7ysieg78rt6lExHraB7Bf7k97AEy84nMnKuWdwPrImJDLwVJkvqjm3fpBPB54EBmfuIYc15YzSMizq32+1idhUqSlqabd+mcD7wJ2BsRe6qx9wEvAsjM64BLgbdHxFHgKWBrLuaPA5KkvusY+Jn5bSA6zLkWuLauoiRJ9fOTtpJUCANfkgph4EtSIQx8SSqEgS9JhTDwJakQBr4kFcLAl6RCGPiSVAgDX5IKYeBLUiEMfEkqhIEvSYUw8CWpEAa+JBXCwJekQhj4klQIA1+SCmHgS1IhDHxJKoSBL0mFMPAlqRAGviQVwsCXpEIY+JJUiI6BHxGbI+LOiDgQEfsj4t0LzImI+HREHIyIeyPi7P6UK0nq1dou5hwFrsrMuyPiJGAmIm7PzPta5lwEnFHdXgZ8tvoqSVolOh7hZ+ajmXl3tfwL4ACwqW3aJcAXs+k7wMkRsbH2aiVJPVvUOfyIGAPOAu5q27QJeKhlfZZn/lCQJK2gyMzuJkaMAN8APpyZX23btgv4SGZ+u1q/A3hvZs60zdsGbAMYHR09Z3Jyct5jzM3NMTIy0rGWvQ8f6armftiyaf2i7/N0X0upu5fH7bdun69BNKy9DWtfMLy9tfc1MTExk5njveyrm3P4RMQ64Bbgy+1hX5kFNresnwo80j4pM3cAOwDGx8ez0WjM2z41NUX72EKu3L6rm7L74tDljUXf5+m+llJ3L4/bb90+X4NoWHsb1r5geHurs69u3qUTwOeBA5n5iWNM2wm8uXq3znnAkcx8tJYKJUm16OYI/3zgTcDeiNhTjb0PeBFAZl4H7AYuBg4CvwTeUn+pkqSl6Bj41Xn56DAngXfUVZQkqX5+0laSCmHgS1IhDHxJKoSBL0mFMPAlqRAGviQVwsCXpEIY+JJUCANfkgph4EtSIQx8SSqEgS9JhTDwJakQBr4kFcLAl6RCGPiSVAgDX5IKYeBLUiEMfEkqhIEvSYUw8CWpEAa+JBXCwJekQhj4klQIA1+SCmHgS1IhOgZ+RFwfEYcjYt8xtjci4khE7Klu19RfpiRpqdZ2MecLwLXAF59lzrcy8/W1VCRJ6ouOR/iZ+U3g8WWoRZLUR5GZnSdFjAG3ZuZLFtjWAG4BZoFHgPdk5v5j7GcbsA1gdHT0nMnJyXnb5+bmGBkZ6VjP3oePdJzTL1s2rV/0fZ7uayl19/K4/dbt8zWIhrW3Ye0Lhre39r4mJiZmMnO8l33VEfjPBX6bmXMRcTHwqcw8o9M+x8fHc3p6et7Y1NQUjUajYz1j23d1nNMvhz76ukXf5+m+llJ3L4/bb90+X4NoWHsb1r5geHtr7ysieg78Jb9LJzOfyMy5ank3sC4iNix1v5Kkei058CPihRER1fK51T4fW+p+JUn16vgunYi4EWgAGyJiFvgAsA4gM68DLgXeHhFHgaeArdnNeSJJ0rLqGPiZeVmH7dfSfNumJGkV85O2klQIA1+SCmHgS1IhDHxJKoSBL0mFMPAlqRAGviQVwsCXpEIY+JJUCANfkgph4EtSIQx8SSqEgS9JhTDwJakQBr4kFcLAl6RCGPiSVAgDX5IKYeBLUiEMfEkqhIEvSYUw8CWpEAa+JBXCwJekQhj4klSIjoEfEddHxOGI2HeM7RERn46IgxFxb0ScXX+ZkqSl6uYI/wvAhc+y/SLgjOq2Dfjs0suSJNWtY+Bn5jeBx59lyiXAF7PpO8DJEbGxrgIlSfWo4xz+JuChlvXZakyStIpEZnaeFDEG3JqZL1lg2y7gI5n57Wr9DuC9mTmzwNxtNE/7MDo6es7k5OS87XNzc4yMjHSsZ+/DRzrOWU1GT4AfP7WyNWzZtL72fXb7fA2iYe1tWPuC5e
9tKTm0mO/H9r4mJiZmMnO8l8dd28ud2swCm1vWTwUeWWhiZu4AdgCMj49no9GYt31qaor2sYVcuX1Xb5WukKu2HOXje+v4p+7docsbte+z2+drEA1rb8PaFyx/b0vJocV8P9bZVx2ndHYCb67erXMecCQzH61hv5KkGnU87IyIG4EGsCEiZoEPAOsAMvM6YDdwMXAQ+CXwln4VK0nqXcfAz8zLOmxP4B21VSRJ6gs/aStJhTDwJakQBr4kFcLAl6RCGPiSVAgDX5IKYeBLUiEMfEkqhIEvSYUw8CWpEAa+JBXCwJekQhj4klQIA1+SCmHgS1IhDHxJKoSBL0mFMPAlqRAGviQVwsCXpEIY+JJUCANfkgph4EtSIQx8SSqEgS9JhTDwJakQXQV+RFwYEfdHxMGI2L7A9isj4icRsae6va3+UiVJS7G204SIWAN8Bng1MAt8LyJ2ZuZ9bVNvysx39qFGSVINujnCPxc4mJk/yszfAJPAJf0tS5JUt8jMZ58QcSlwYWa+rVp/E/Cy1qP5iLgS+AjwE+CHwF9l5kML7GsbsA1gdHT0nMnJyXnb5+bmGBkZ6Vj03oePdJyzmoyeAD9+amVr2LJpfe377Pb5GkTD2tuw9gXL39tScmgx34/tfU1MTMxk5ngvj9vxlA4QC4y1/5T4V+DGzPx1RPwZcANwwTPulLkD2AEwPj6ejUZj3vapqSnaxxZy5fZdXZS9ely15Sgf39vNP3X/HLq8Ufs+u32+BtGw9jasfcHy97aUHFrM92OdfXVzSmcW2NyyfirwSOuEzHwsM39drf4dcE4t1UmSatNN4H8POCMiTouI44CtwM7WCRGxsWX1DcCB+kqUJNWh43mGzDwaEe8EbgPWANdn5v6I+BAwnZk7gXdFxBuAo8DjwJV9rFmS1IOuTixn5m5gd9vYNS3LVwNX11uaJKlOftJWkgph4EtSIQx8SSqEgS9JhTDwJakQBr4kFcLAl6RCGPiSVAgDX5IKYeBLUiEMfEkqhIEvSYUw8CWpEAa+JBXCwJekQhj4klQIA1+SCmHgS1IhDHxJKoSBL0mFMPAlqRAGviQVwsCXpEIY+JJUCANfkgrRVeBHxIURcX9EHIyI7QtsPz4ibqq23xURY3UXKklamo6BHxFrgM8AFwFnApdFxJlt094K/CwzXwx8EvhY3YVKkpammyP8c4GDmfmjzPwNMAlc0jbnEuCGavlm4JUREfWVKUlaqm4CfxPwUMv6bDW24JzMPAocAZ5fR4GSpHqs7WLOQkfq2cMcImIbsK1anYuI+9umbAB+2kVNA+Vdq6Cv6M9JthXvq4+Gtbdh7QsGqLdFfj+29/X7vT5uN4E/C2xuWT8VeOQYc2YjYi2wHni8fUeZuQPYcawHiojpzBzvoqaBYl+DZ1h7G9a+YHh7q7Ovbk7pfA84IyJOi4jjgK3AzrY5O4ErquVLga9n5jOO8CVJK6fjEX5mHo2IdwK3AWuA6zNzf0R8CJjOzJ3A54EvRcRBmkf2W/tZtCRp8bo5pUNm7gZ2t41d07L8K+CNNdRzzNM9A86+Bs+w9jasfcHw9lZbX+GZF0kqg5dWkKRCrIrA73TphtUoIq6PiMMRsa9l7HkRcXtEPFB9PaUaj4j4dNXfvRFxdst9rqjmPxARVyz0WMspIjZHxJ0RcSAi9kfEu6vxge4tIp4TEd+NiHuqvj5YjZ9WXQ7kgeryIMdV48e8XEhEXF2N3x8Rr12ZjuaLiDUR8f2IuLVaH5a+DkXE3ojYExHT1dhAvxarek6OiJsj4gfV99rLl6WvzFzRG80/BD8InA4cB9wDnLnSdXVR9yuAs4F9LWN/A2yvlrcDH6uWLwb+jebnFc4D7qrGnwf8qPp6SrV8ygr3tRE4u1o+CfghzUtqDHRvVX0j1fI64K6q3q8AW6vx64C3V8t/DlxXLW8FbqqWz6xeo8cDp1Wv3TWr4PX418A/ArdW68PS1yFgQ9vYQL8Wq5puAN5WLR8HnL
wcfa3ok1kV/XLgtpb1q4GrV7quLmsfY37g3w9srJY3AvdXy58DLmufB1wGfK5lfN681XAD/gV49TD1BvwucDfwMpofaFnb/lqk+a60l1fLa6t50f76bJ23gv2cCtwBXADcWtU58H1VdRzimYE/0K9F4LnAf1D9DXU5+1oNp3S6uXTDoBjNzEcBqq+/V40fq8dV3Xv16/5ZNI+GB7636rTHHuAwcDvNo9ifZ/NyIDC/xmNdLmTV9QX8LfBe4LfV+vMZjr6g+Yn9f4+ImWh+Uh8G/7V4OvAT4B+q03B/HxEnsgx9rYbA7+qyDAPuWD2u2t4jYgS4BfjLzHzi2aYuMLYqe8vM/8nMl9I8Ij4X+MOFplVfB6KviHg9cDgzZ1qHF5g6UH21OD8zz6Z5td53RMQrnmXuoPS2lubp4M9m5lnAkzRP4RxLbX2thsDv5tINg+LHEbERoPp6uBo/Vo+rsveIWEcz7L+cmV+thoeiN4DM/DkwRfN86MnRvBwIzK/x/+qP+ZcLWW19nQ+8ISIO0byS7QU0j/gHvS8AMvOR6uth4Gs0f1AP+mtxFpjNzLuq9Ztp/gDoe1+rIfC7uXTDoGi9xMQVNM9/Pz3+5uqv7ecBR6pf2W4DXhMRp1R/kX9NNbZiIiJofnL6QGZ+omXTQPcWES+IiJOr5ROAVwEHgDtpXg4EntnXQpcL2Qlsrd7tchpwBvDd5enimTLz6sw8NTPHaH7vfD0zL2fA+wKIiBMj4qSnl2m+hvYx4K/FzPwv4KGI+INq6JXAfSxHXyv9R5nqjw0X03w3yIPA+1e6ni5rvhF4FPhvmj9p30rzXOgdwAPV1+dVc4PmfyLzILAXGG/Zz58CB6vbW1ZBX39C89fCe4E91e3iQe8N+CPg+1Vf+4BrqvHTaQbbQeCfgOOr8edU6wer7ae37Ov9Vb/3Axet9HPWUleD/3+XzsD3VfVwT3Xb/3Q2DPprsarnpcB09Xr8Z5rvsul7X37SVpIKsRpO6UiSloGBL0mFMPAlqRAGviQVwsCXpEIY+JJUCANfkgph4EtSIf4XltToLfX3WI0AAAAASUVORK5CYII=\n", "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df.Length.hist(bins=20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "For more information, please see the bioservices.uniprot module documentation." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.5" } }, "nbformat": 4, "nbformat_minor": 1 }