
UniProt Protein Sequence & Annotation Scraper

Pricing

from $28.12 / 1,000 results

Export UniProt Knowledgebase entries — search Swiss-Prot by organism, keyword, gene, or any UniProt query, or fetch a single accession. Returns names, genes, organism, sequence length & molecular weight, keywords, comments, features, and PDB/RefSeq/Ensembl/KEGG cross-refs.

Developer

ParseForge

Maintained by Community

🧬 UniProt Protein Sequence & Annotation Scraper

🚀 Export UniProt Knowledgebase entries in seconds. Query Swiss-Prot and TrEMBL by organism, gene, keyword, subcellular location, length range, or any UniProt field, or fetch a single accession with full annotations. No API key, no SPARQL, no XML parsing.

🕒 Last updated: 2026-05-13 · 📊 25 fields per entry · 🧬 250M+ UniProt entries · 🌍 every kingdom of life

The UniProt Protein Scraper queries the official UniProt REST API and returns standardized protein records from the world's largest protein-sequence knowledgebase. Each entry carries the primary accession, UniProtKB ID, entry type (reviewed Swiss-Prot vs unreviewed TrEMBL), protein name, alternative names, gene names, organism (scientific + common + taxon ID + lineage), evidence level, annotation score, sequence length, molecular weight, CRC64 / MD5 sequence hashes, keywords (with categories), curated comments (function, subunit, subcellular location, etc.), structural features, reference counts, last-update date, entry version, and the canonical UniProt URL.

UniProt is maintained jointly by EMBL-EBI, SIB, and PIR and is the de facto reference for protein biology in research, pharma, and bioinformatics. Coverage spans 250 million+ entries across 2.7 million+ species in TrEMBL, with ~570,000 manually curated entries in Swiss-Prot. This Actor flattens UniProt's nested JSON into rows that drop into pandas, R, or any warehouse.

🎯 Target Audience	💡 Primary Use Cases
Bioinformatics teams, computational biologists, pharma research, structural biologists, drug-discovery startups, science journalists	Proteome exports, gene-to-protein mapping, target dossier builds, organism-level annotation, sequence + feature retrieval, cross-database joining

📋 What the UniProt Scraper does

Two lookup modes in one Actor:

🔍 Query mode. Pass any UniProt query (reviewed:true AND organism_id:9606, keyword:KW-0238, gene:BRCA1, cc_subcellular_location:nucleus, existence:1, taxonomy_id:10090 AND length:[100 TO 500]).
🆔 Accession mode. Set accession (e.g. P00533) for a single full-entry pull. Skips the search query entirely.

Each record carries identifiers (primary accession, UniProtKB ID, entry type), names (protein name, alternative names, gene names), taxonomy (scientific + common organism, taxon ID, lineage), evidence (protein existence, annotation score), sequence facts (length, molecular weight, CRC64, MD5, plus optional full sequence string), curated annotations (keywords, comments, features), reference + feature counts, last-updated date, version, and the canonical UniProt URL.

💡 Why it matters: UniProt's REST API is rich but verbose, and research and engineering teams can spend days writing parsers for its keywords, comments, and features. This Actor flattens each response into 25 spreadsheet-ready fields, so a target dossier, a comparative-proteomics table, or a prepared dataset can come out of a single query.

🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing a human proteome pull, gene lookup, and accession fetch.

⚙️ Input

Input	Type	Default	Behavior
query	string	"reviewed:true AND organism_id:9606"	UniProt query syntax. Supports reviewed:, organism_id:, taxonomy_id:, gene:, keyword:, cc_subcellular_location:, existence:, length:[X TO Y], and more. Ignored when accession is set.
accession	string	""	Single UniProt accession (e.g. P00533). Bypasses the search query when set.
maxItems	integer	10	Records to return. Free plan caps at 10, paid plan at 1,000,000.
fetchSequence	boolean	false	When true, embeds the full amino-acid sequence string in every record. Sequence length and molecular weight are always returned.
pageSize	integer	500	Entries per API request. UniProt hard max is 500.
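As a rough guide to how `maxItems` and `pageSize` interact, the sketch below estimates how many UniProt API requests a run makes. It assumes the Actor requests `pageSize` entries per call and stops once `maxItems` records are collected; `estimate_requests` is a hypothetical helper, not part of the Actor.

```python
import math

UNIPROT_MAX_PAGE_SIZE = 500  # UniProt's hard limit on entries per request

def estimate_requests(max_items: int, page_size: int = 500) -> int:
    """Hypothetical helper: paged requests needed to collect max_items records."""
    if max_items <= 0:
        return 0
    # Clamp to the UniProt maximum (and to at least 1 entry per page).
    effective = max(1, min(page_size, UNIPROT_MAX_PAGE_SIZE))
    return math.ceil(max_items / effective)

# 1,000 records at 500 entries per page.
print(estimate_requests(1000, 500))  # -> 2
```

Raising `pageSize` above 500 does not reduce the request count, because the value is capped at UniProt's limit.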

Example: every reviewed human Swiss-Prot entry.

```json
{
  "query": "reviewed:true AND organism_id:9606",
  "maxItems": 1000,
  "pageSize": 500
}
```

Example: single accession, full sequence included.

```json
{
  "accession": "P00533",
  "fetchSequence": true
}
```

⚠️ Good to Know: the accession field is for a single entry. To resolve a list of accessions, use the query syntax: accession:P00533 OR accession:P04637. Use fetchSequence: false (default) when you do not need the raw amino-acid string. Sequence length and molecular weight are always returned regardless.

📊 Output

Each entry carries 25 fields. Download as CSV, Excel, JSON, or XML.

🧾 Schema

Field	Type	Example
🆔 `primaryAccession`	string	`"A0A0C5B5G6"`
🏷️ `uniProtkbId`	string	`"MOTSC_HUMAN"`
📚 `entryType`	string	`"UniProtKB reviewed (Swiss-Prot)"`
🧬 `proteinName`	string	`"Mitochondrial-derived peptide MOTS-c"`
📝 `alternativeNames`	string[]	`["Mitochondrial open reading frame of the 12S rRNA-c"]`
🧫 `geneNames`	string[]	`["MT-RNR1"]`
🦠 `organismScientific`	string	`"Homo sapiens"`
👤 `organismCommon`	string	`"Human"`
🆔 `taxonId`	number	`9606`
🌳 `organismLineage`	string[]	`["Eukaryota","Metazoa","Chordata",...]`
🧪 `proteinExistence`	string	`"1: Evidence at protein level"`
⭐ `annotationScore`	number	`5`
📏 `sequenceLength`	number	`16`
⚖️ `sequenceMolWeight`	number	`2175`
🔐 `sequenceCrc64`	string	`"361DE748426DD505"`
🔐 `sequenceMd5`	string	`"AE72B6C4E87692429C0D558B92BD7B3D"`
🏷️ `keywords`	object[]	`[{ "id": "KW-0238", "category": "Molecular function", "name": "DNA-binding" }]`
💬 `comments`	object[]	`[{ "type": "FUNCTION", "text": "Regulates insulin sensitivity ..." }]`
🧩 `features`	object[]	`[{ "type": "Chain", "description": "MOTS-c", "start": 1, "end": 16 }]`
📖 `referenceCount`	number	`17`
🧱 `featureCount`	number	`6`
📅 `lastUpdated`	date	`"2026-01-28"`
🔢 `entryVersion`	number	`30`
🔗 `url`	string	`"https://www.uniprot.org/uniprotkb/A0A0C5B5G6/entry"`
🕒 `scrapedAt`	ISO 8601	`"2026-05-13T22:25:18.386Z"`
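Because `keywords`, `comments`, and `features` arrive as arrays of objects, a few lines of standard-library Python are enough to filter a JSON export. This is a minimal sketch that assumes the dataset was downloaded from the Dataset tab as a JSON array of records using the field names in the schema table; the helper names are hypothetical.

```python
import json

def load_records(path: str) -> list[dict]:
    """Read a dataset exported as JSON (an array of records)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

def keyword_names(record: dict) -> list[str]:
    """Return the plain keyword names from a record's `keywords` array."""
    return [kw["name"] for kw in record.get("keywords", []) if "name" in kw]

def well_annotated(records: list[dict], min_score: float = 5) -> list[dict]:
    """Keep reviewed (Swiss-Prot) entries with annotationScore >= min_score."""
    return [
        r for r in records
        if "Swiss-Prot" in r.get("entryType", "")
        and r.get("annotationScore", 0) >= min_score
    ]

# A trimmed record built from the example values in the schema table.
sample = [{
    "primaryAccession": "A0A0C5B5G6",
    "entryType": "UniProtKB reviewed (Swiss-Prot)",
    "annotationScore": 5,
    "keywords": [{"id": "KW-0238", "category": "Molecular function", "name": "DNA-binding"}],
}]

for rec in well_annotated(sample):
    print(rec["primaryAccession"], keyword_names(rec))  # -> A0A0C5B5G6 ['DNA-binding']
```

For a real export, replace `sample` with `load_records("dataset.json")`, where the file name is whatever you saved the download as.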

📦 Sample record

✨ Why choose this Actor

	Capability
🧬	Authoritative knowledgebase. Pulls directly from the official UniProt REST API.
🔍	Full query syntax. Every UniProt search field works: organism, gene, keyword, location, length range, evidence, taxonomy.
🆔	Accession fast-path. Set the `accession` input to pull one entry without writing a query.
📏	Sequence facts built in. Length and molecular weight always returned. Full sequence string available on demand.
🏷️	Curated annotations exposed. Keywords, comments, and features come through as structured arrays.
🚫	No API key. UniProt is a free public service.
🔁	Always fresh. Reflects the current UniProt release.

📊 UniProt is one of the most widely cited resources in protein biology, drug discovery, and structural biology.

📈 How it compares to alternatives

Approach	Cost	Coverage	Refresh	Format	Setup
⭐ UniProt Scraper (this Actor)	$5 free credit, then pay-per-use	UniProtKB (Swiss-Prot + TrEMBL)	Live per run	Flat JSON / CSV	⚡ 2 min
Direct REST API calls	Free	Same	Live	Nested JSON	🐢 Hours
Full release FASTA + XML download	Free	Full UniProt	8-week release	Massive flatfiles	🐢 Days
Commercial bioinformatics platform	$$$	Curated subset	Real-time	Web UI / API	⏳ Vendor onboarding

Pick this Actor when you want UniProt records in a flat table without writing a client or downloading the release.

🚀 How to use

📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
🌐 Open the Actor. Go to the UniProt Protein Scraper page on the Apify Store.
🎯 Set input. Pick a query (reviewed:true AND organism_id:9606 is a great starter) or an accession.
🚀 Run it. Click Start and let the Actor walk the UniProt API.
📥 Download. Grab results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to a downloaded proteome slice: 3-5 minutes. No coding required.

💼 Business use cases

🧪 Drug Discovery & Pharma

Target dossier builds for new programs
Cross-organism homolog comparisons
Subcellular location filters for druggability
Evidence-level scoring for prioritization

🧬 Bioinformatics & Genomics

Gene-to-protein lookups across organisms
Proteome exports for comparative analysis
Annotation enrichment for variant interpretation
Keyword and feature-based cohort building

🔬 Structural Biology

Length and molecular-weight filters for crystallography candidates
Feature-table mining for domain boundaries
Sequence hash joins to PDB or AlphaFold IDs
Reference-count signals for popular targets

🤖 LLM & Bio AI

Ground LLM responses in UniProt-authoritative data
Build RAG indexes for protein chatbots
Training data for sequence-attribute models
Validation layers for bio AI agents

🔌 Automating UniProt Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

🟢 Node.js. Install the apify-client NPM package.
🐍 Python. Use the apify-client PyPI package.
📚 See the Apify API documentation for full details.
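Here is a minimal Python sketch using the `apify-client` package. The Actor ID `parseforge/uniprot-scraper` is taken from this page's Store URL, reading the API token from an `APIFY_TOKEN` environment variable is an assumption, and `build_run_input` / `run_uniprot_scraper` are hypothetical helper names. Input keys follow the Input table above.

```python
import os

ACTOR_ID = "parseforge/uniprot-scraper"  # from this Actor's Store URL

def build_run_input(query: str = "reviewed:true AND organism_id:9606",
                    accession: str = "",
                    max_items: int = 10,
                    fetch_sequence: bool = False,
                    page_size: int = 500) -> dict:
    """Assemble an input object using the field names from the Input table."""
    run_input = {
        "query": query,
        "maxItems": max_items,
        "fetchSequence": fetch_sequence,
        "pageSize": page_size,
    }
    if accession:
        run_input["accession"] = accession  # bypasses the query when set
    return run_input

def run_uniprot_scraper(run_input: dict) -> list[dict]:
    """Start the Actor, wait for it to finish, and return its dataset items."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # assumed env var name
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())
```

For example, `run_uniprot_scraper(build_run_input(accession="P00533", fetch_sequence=True))` would return a one-record list for the single-accession example shown earlier. The client import sits inside the function so the input builder can be used without the package installed.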

The Apify Schedules feature lets you trigger this Actor on any cron interval. UniProt has an eight-week release cycle. Schedule a refresh on the same cadence to stay current.

🌟 Beyond business use cases

UniProt data feeds far more than commercial pharma. The same structured records support research, education, and open-science work.

🎓 Research and academia

Reproducible proteome datasets for papers
Coursework on protein annotation and biocuration
Comparative-genomics theses with structured features
Open-data benchmarks for sequence-based ML

🎨 Personal and creative

Hobbyist bioinformatics portfolio projects
Sci-comm visualizations of protein families
Personal target tracker for citizen scientists
Indie tools for amateur synthetic biology

🤝 Non-profit and civic

Pandemic preparedness datasets keyed to UniProt
Public-health reports on pathogen proteomes
Open-source vaccine candidate research
Civic transparency on bio-research outputs

🧪 Experimentation

Train sequence-attribute ML classifiers
Prototype agents that build target dossiers
Test bio chatbot grounding against real records
Benchmark protein-NER models

❓ Frequently Asked Questions

🧩 How does it work?

Either supply a UniProt query (reviewed:true AND organism_id:9606) or an accession (P00533), then click Start. The Actor pages through the UniProt REST API, flattens nested fields, and emits a row per entry with 25 columns including keywords, comments, and features.
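For readers curious about the paging step, the sketch below shows one way a client can walk the public UniProt REST search endpoint (`https://rest.uniprot.org/uniprotkb/search`). That endpoint returns the next page's URL in an HTTP `Link` header marked `rel="next"`. This illustrates the technique only and is not the Actor's source code. The function names are hypothetical, and the entries it yields are raw nested UniProt JSON, not the flattened rows the Actor emits.

```python
import json
import re
import urllib.parse
import urllib.request
from typing import Iterator, Optional

SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"

def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Extract the rel="next" URL from a Link header, or None on the last page."""
    if not link_header:
        return None
    match = re.search(r'<([^>]+)>;\s*rel="next"', link_header)
    return match.group(1) if match else None

def iter_entries(query: str, page_size: int = 500, max_items: int = 10) -> Iterator[dict]:
    """Yield raw UniProt JSON entries, following cursor links page by page."""
    params = urllib.parse.urlencode(
        {"query": query, "format": "json", "size": min(page_size, 500)}
    )
    url: Optional[str] = f"{SEARCH_URL}?{params}"
    seen = 0
    while url and seen < max_items:
        with urllib.request.urlopen(url) as resp:
            payload = json.load(resp)
            link = resp.headers.get("Link")
        for entry in payload.get("results", []):
            yield entry
            seen += 1
            if seen >= max_items:
                return
        url = next_page_url(link)
```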

🔍 What query syntax can I use?

Everything UniProt supports in its own search bar. Common fields: reviewed:, organism_id:, taxonomy_id:, gene:, keyword:, cc_subcellular_location:, existence:, length:[X TO Y], accession:, plus boolean AND/OR/NOT. See the UniProt query fields docs for the full list.
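When queries are generated from code, it helps to assemble the clauses in one place. The sketch below joins `field:value` pairs with `AND` and adds an optional `length:[X TO Y]` range. `build_query` is a hypothetical helper. Wrapping values that contain spaces in double quotes is an assumption about UniProt's phrase syntax.

```python
from typing import Optional, Tuple

def build_query(filters: dict, length: Optional[Tuple[int, int]] = None) -> str:
    """Join field:value clauses with AND; optionally add a length range."""
    clauses = []
    for field, value in filters.items():
        if isinstance(value, bool):
            value = str(value).lower()  # reviewed:true / reviewed:false
        elif isinstance(value, str) and " " in value:
            value = f'"{value}"'  # assumed phrase quoting for multi-word values
        clauses.append(f"{field}:{value}")
    if length is not None:
        low, high = length
        clauses.append(f"length:[{low} TO {high}]")
    return " AND ".join(clauses)

print(build_query({"reviewed": True, "organism_id": 9606}))
# -> reviewed:true AND organism_id:9606
print(build_query({"taxonomy_id": 10090}, length=(100, 500)))
# -> taxonomy_id:10090 AND length:[100 TO 500]
```

The resulting string goes straight into the `query` input. `OR` and `NOT` clauses can still be written by hand around it.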

🆔 How do I look up a single accession?

Set the accession field (e.g. P00533). It bypasses the query and pulls the full entry directly.

🧬 How do I look up many accessions at once?

Use the query syntax with OR: accession:P00533 OR accession:P04637 OR accession:Q9Y6K8.
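For long accession lists, the `OR` query can be generated rather than typed. This is a minimal sketch with a hypothetical `accessions_query` helper. It deduplicates the list and rejects strings that do not look like UniProt accessions, using the accession pattern UniProt publishes (6- or 10-character forms). Set `maxItems` to at least the number of accessions. Very long lists may run into request-length limits, so splitting them into batches is a sensible precaution.

```python
import json
import re

# UniProt's published accession format (6- or 10-character accessions).
ACCESSION_RE = re.compile(
    r"^([OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)

def accessions_query(accessions: list[str]) -> str:
    """Build an OR query for several accessions, deduplicated in input order."""
    unique = list(dict.fromkeys(a.strip().upper() for a in accessions if a.strip()))
    bad = [a for a in unique if not ACCESSION_RE.match(a)]
    if bad:
        raise ValueError(f"Not UniProt accessions: {bad}")
    return " OR ".join(f"accession:{a}" for a in unique)

run_input = {
    "query": accessions_query(["P00533", "P04637", "Q9Y6K8", "P00533"]),
    "maxItems": 3,
}
print(json.dumps(run_input))
# -> {"query": "accession:P00533 OR accession:P04637 OR accession:Q9Y6K8", "maxItems": 3}
```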

📏 Does it include the full sequence string?

Only when fetchSequence: true. Sequence length and molecular weight are always returned. Skip the full string for big proteomes to keep dataset sizes manageable.
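With `fetchSequence` off, the `sequenceMd5` field still lets you check whether a sequence obtained elsewhere, such as from a FASTA file, matches the UniProt entry. This sketch assumes `sequenceMd5` is the uppercase hexadecimal MD5 digest of the plain one-letter sequence string, which is consistent with the format shown in the schema. Confirm this against a record fetched with `fetchSequence: true` before relying on it. The helper names are hypothetical.

```python
import hashlib

def sequence_md5(sequence: str) -> str:
    """Uppercase hex MD5 of a protein sequence, with whitespace removed."""
    cleaned = "".join(sequence.split()).upper()
    return hashlib.md5(cleaned.encode("ascii")).hexdigest().upper()

def matches_record(sequence: str, record: dict) -> bool:
    """True if the sequence hashes to the record's sequenceMd5 value."""
    expected = record.get("sequenceMd5", "").upper()
    return bool(expected) and sequence_md5(sequence) == expected
```

A typical use is `matches_record(fasta_sequence, record)` for each row of a large proteome export pulled without sequences.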

🔁 How fresh is the data?

UniProt releases every eight weeks. Every run hits the live API, so output reflects the current release.

📚 What is the difference between Swiss-Prot and TrEMBL?

Swiss-Prot is manually curated (reviewed:true, ~570K entries). TrEMBL is automatically annotated (reviewed:false, hundreds of millions of entries). Pick the slice your work needs.

🚫 Do I need an API key?

No. The UniProt REST API is free and public.

⏰ Can I schedule recurring runs?

Yes. Use Apify Schedules to refresh on the UniProt release cadence and pipe results into your pipeline.

⚖️ Is this data legal to use?

Yes. UniProt is released under CC BY 4.0. Attribute UniProt in any downstream publication or product, as their license requires.

💳 Do I need a paid Apify plan?

No. The free plan covers small runs (10 records). A paid plan unlocks higher limits and scheduling.

🆘 What if I need help?

Reach out via the contact form below to request a custom protein workflow.

🔌 Integrate with any app

UniProt Protein Scraper connects to any cloud service via Apify integrations:

Make - Automate multi-step research workflows
Zapier - Connect with 5,000+ apps
Slack - Get release notifications in your channels
Airbyte - Pipe protein records into your warehouse
GitHub - Trigger runs from commits and releases
Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh UniProt entries into your bio pipeline or alert your team in Slack.
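A webhook receiver usually only needs the ID of the finished run's dataset. This sketch assumes Apify's default webhook payload, where `eventType` names the event (for example `ACTOR.RUN.SUCCEEDED`) and `resource` holds the run object, including `defaultDatasetId`. Check the payload template configured on your webhook before relying on these keys. `dataset_id_from_webhook` is a hypothetical helper.

```python
import json
from typing import Optional

def dataset_id_from_webhook(body: str) -> Optional[str]:
    """Return the finished run's dataset ID, or None if the run did not succeed."""
    payload = json.loads(body)
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None
    run = payload.get("resource") or {}  # the run object in the default template
    return run.get("defaultDatasetId")
```

The returned ID can then be used to download the dataset items through the Apify API or one of its clients.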

🔗 Recommended Actors

💊 RxNorm Drug Concepts Scraper - Standardized US drug vocabulary
🏥 ICD-10-CM, LOINC & Clinical Terminology Scraper - Diagnosis, lab, and drug codes
🤗 Hugging Face Model Scraper - AI model registry metadata
🛡️ urlscan.io Threat Intelligence Scraper - Live web scan data
🌐 RDAP Domain Lookup Scraper - Modern WHOIS replacement

💡 Pro Tip: browse the complete ParseForge collection for more reference-data scrapers.

🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.

⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by EMBL-EBI, the SIB Swiss Institute of Bioinformatics, the Protein Information Resource (PIR), the UniProt Consortium, or any of their funding agencies. All trademarks mentioned are the property of their respective owners. Only publicly available UniProtKB data is collected. Please cite UniProt as required by their CC BY 4.0 license.

azureblue/ncbi-gene-scraper

Search NCBI Gene via E-utilities. Returns gene symbol, full name, chromosome location, map locus, aliases, OMIM ID, organism and functional summary.

azureblue

URL: https://apify.com/parseforge/uniprot-scraper

You might also like

UniProt Protein Scraper

Uniprot Scraper

UniProt Proteins Scraper

HGNC Gene Symbols Scraper

EBI Proteins API Scraper

Ensembl Genomics Scraper (Genes, Variants, Sequences)

Ensembl Gene Lookup Scraper

ChEMBL Targets Scraper

NCBI Gene Database Scraper

NCBI Gene Lookup — Genomics API for Pharma R&D