HGNC Gene Symbols Scraper

Pricing

from $15.00 / 1,000 result items

Query the HUGO Gene Nomenclature Committee database for approved human gene symbols, names, aliases, chromosomal location, gene family, RefSeq, Ensembl, OMIM, UniProt, and external links. Export to JSON, CSV, or Excel for bioinformatics, genomics research, and pharmaceutical pipelines.

Developer: ParseForge (maintained by Community)

Last modified: a month ago

🧬 HGNC Gene Symbols Scraper

🚀 Export approved human gene symbols in seconds. Pull 43,000+ HGNC-approved gene records with cross-references to Ensembl, Entrez, UniProt, OMIM, and PubMed. No API key, no registration, no manual nomenclature lookups.

🕒 Last updated: 2026-05-23 · 📊 27 fields per record · 🧬 43,000+ genes · 🔗 11 cross-references · 🌍 HUGO canonical

The HGNC Gene Symbols Scraper exports records from the HUGO Gene Nomenclature Committee, the official authority for assigning unique human gene symbols and names. Each record carries 27 fields including approved symbol, full name, chromosomal location, aliases, previous symbols, gene group, status, and cross-references to Ensembl, Entrez, UCSC, RefSeq, UniProt, OMIM, PubMed, MGD, RGD, CCDS, and Vega. HGNC nomenclature underpins virtually every modern human-genetics database and clinical-genomics pipeline.

Coverage spans 43,000+ approved gene symbols plus thousands of pseudogenes, withdrawn symbols, and reserved names. This Actor turns lookup-by-symbol, lookup-by-ID, and search-by-keyword into one-step exports as CSV, Excel, JSON, or XML.

| 🎯 Target Audience | 💡 Primary Use Cases |
|---|---|
| Bioinformatics teams, clinical-genomics labs, pharma R&D, computational biologists, science writers, EHR vendors | Variant interpretation, gene-panel design, cross-DB joins, symbol normalization, literature mining, omics pipeline annotation |

📋 What the HGNC Scraper does

Five lookup modes are available. Each run uses the one selected in the mode input:

🔤 Symbol lookup. Resolve approved symbols like BRCA1, TP53, EGFR, MYC, AKT1.
🆔 HGNC ID lookup. Resolve canonical HGNC IDs like 1100 or HGNC:1100.
🔗 Entrez Gene ID lookup. Cross-reference NCBI Entrez IDs back to HGNC records.
🧪 UniProt accession lookup. Map protein accessions like P38398 to gene records.
🔍 Free-text search. Query across symbols, names, aliases, and previous names.

Each record includes chromosomal location, locus type and group, alias and previous symbols, gene-family group, status, approval date, last-modified timestamp, and the complete cross-reference panel.
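
Since HGNC IDs can be written as 1100 or HGNC:1100, it can help to bring your own list into one form before submitting it, for example to remove duplicates. The helper below is a minimal sketch; `normalize_hgnc_id` is a hypothetical name and not part of the Actor, and accepting a lowercase prefix is an assumption made here for convenience.

```python
import re

# Optional "HGNC:" prefix followed by digits only.
_HGNC_ID = re.compile(r"^(?:HGNC:)?(\d+)$", re.IGNORECASE)


def normalize_hgnc_id(value: str) -> str:
    """Return the prefixed form 'HGNC:<number>' for inputs like '1100' or 'HGNC:1100'."""
    match = _HGNC_ID.match(value.strip())
    if match is None:
        raise ValueError(f"not an HGNC ID: {value!r}")
    # int() drops any leading zeros so equivalent IDs compare equal.
    return f"HGNC:{int(match.group(1))}"


# Deduplicate a mixed list while keeping the original order.
ids = ["1100", "HGNC:1100", " hgnc:1100 "]
unique_ids = list(dict.fromkeys(normalize_hgnc_id(v) for v in ids))
```

After this step `unique_ids` holds a single prefixed entry, which can be passed as the values list in fetchByHgncId mode.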

💡 Why it matters: symbol nomenclature drifts. A gene whose approved symbol was MLL in 2010 now carries the symbol KMT2A. Pipelines and clinical reports that miss such a change silently lose joins. This Actor returns the current HGNC record on every lookup, so annotations can be kept in step with the approved nomenclature.
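
One way to use the returned records against symbol drift is to build a lookup from every approved, previous, and alias symbol to the current approved symbol. The sketch below assumes records shaped like the output schema (`symbol`, `prevSymbol`, `aliasSymbol`). The two records are trimmed, hand-written illustrations, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Mapping, Optional

# Trimmed illustrative records; real output carries many more fields.
RECORDS = [
    {"symbol": "KMT2A", "prevSymbol": ["MLL"], "aliasSymbol": []},
    {"symbol": "BRCA1", "prevSymbol": ["BRCAI"], "aliasSymbol": ["BRCC1", "PNCA4"]},
]


def build_symbol_index(records: Iterable[Mapping]) -> Dict[str, str]:
    """Map approved, previous, and alias symbols (upper-cased) to the approved symbol."""
    records = list(records)
    index: Dict[str, str] = {}
    # Approved symbols go in first so they always win over aliases.
    for rec in records:
        index[rec["symbol"].upper()] = rec["symbol"]
    # Previous symbols take precedence over aliases; the first mapping seen is kept.
    for field in ("prevSymbol", "aliasSymbol"):
        for rec in records:
            for old in rec.get(field) or []:
                index.setdefault(old.upper(), rec["symbol"])
    return index


def to_current(symbol: str, index: Mapping[str, str]) -> Optional[str]:
    """Return the current approved symbol, or None when the symbol is unknown."""
    return index.get(symbol.strip().upper())


index = build_symbol_index(RECORDS)
```

Aliases are not guaranteed to be unique across genes, so a production pipeline would probably want to flag collisions instead of keeping the first match as this sketch does.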

🎬 Full Demo

🚧 Coming soon: a 3-minute walkthrough showing how to resolve a panel of symbols into a downloadable cross-reference table.

⚙️ Input

| Input | Type | Default | Behavior |
|---|---|---|---|
| `maxItems` | integer | `10` | Records to return. Free plan caps at 10, paid plan at 1,000,000. |
| `mode` | string | `"fetchBySymbol"` | One of `searchQuery`, `fetchBySymbol`, `fetchByHgncId`, `fetchByEntrezId`, `fetchByUniprot`. |
| `values` | array | `["BRCA1", "TP53", "EGFR", "MYC", "AKT1"]` | Symbols, IDs, or search terms. One lookup per entry. |

Example: resolve a panel of cancer genes by approved symbol.

```json
{
  "maxItems": 25,
  "mode": "fetchBySymbol",
  "values": ["BRCA1", "BRCA2", "TP53", "EGFR", "KRAS", "MYC", "PTEN", "APC", "RB1", "NF1"]
}
```

Example: map UniProt accessions back to HGNC records.

```json
{
  "maxItems": 5,
  "mode": "fetchByUniprot",
  "values": ["P38398", "P04637", "P01133"]
}
```
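
If inputs like these are generated by a script, a small builder can catch typos in the mode name or an empty values list before a run is started. This is a minimal sketch based only on the input table above. `build_input` is a hypothetical helper, and the Actor may apply its own validation as well.

```python
from typing import Iterable, List

# Mode names as listed in the input table.
ALLOWED_MODES = {
    "searchQuery",
    "fetchBySymbol",
    "fetchByHgncId",
    "fetchByEntrezId",
    "fetchByUniprot",
}


def build_input(mode: str, values: Iterable[str], max_items: int = 10) -> dict:
    """Return an input object with maxItems, mode, and a cleaned values list."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if isinstance(max_items, bool) or not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    # Trim whitespace, drop empty entries, and remove duplicates in order.
    cleaned: List[str] = [v.strip() for v in values if isinstance(v, str) and v.strip()]
    cleaned = list(dict.fromkeys(cleaned))
    if not cleaned:
        raise ValueError("values must contain at least one non-empty string")
    return {"maxItems": max_items, "mode": mode, "values": cleaned}


run_input = build_input("fetchByUniprot", ["P38398", " P04637", "P38398", ""], 5)
```

The resulting `run_input` contains two accessions, because the blank entry and the repeated P38398 are dropped.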

⚠️ Good to Know: HGNC assigns symbols for human genes only. Mouse and rat orthologs are linked via MGD and RGD cross-references inside each record, but rodent-only symbols are not in scope.

📊 Output

Each gene record contains 27 fields. Download the dataset as CSV, Excel, JSON, or XML.

🧾 Schema

| Field | Type | Example |
|---|---|---|
| 🆔 `hgncId` | string | `"HGNC:1100"` |
| 🔤 `symbol` | string | `"BRCA1"` |
| 📛 `name` | string | `"BRCA1 DNA repair associated"` |
| 🧬 `locusType` | string | `"gene with protein product"` |
| 🧪 `locusGroup` | string | `"protein-coding gene"` |
| 📍 `location` | string | `"17q21.31"` |
| 🔁 `aliasSymbol` | array | `["BRCC1", "PNCA4"]` |
| 🏷️ `aliasName` | array | `["Breast cancer 1, early onset"]` |
| ⏪ `prevSymbol` | array | `["BRCAI"]` |
| ⏪ `prevName` | array | `[]` |
| 👥 `geneGroup` | array | `["Ring finger proteins", "BRCT domain containing"]` |
| 🔗 `entrezId` | string | `"672"` |
| 🔗 `ensemblGeneId` | string | `"ENSG00000012048"` |
| 🔗 `ucscId` | string | `"uc002ict.5"` |
| 🔗 `refseqAccession` | array | `["NM_007294"]` |
| 🧪 `uniprotIds` | array | `["P38398"]` |
| 📚 `omimId` | array | `["113705"]` |
| 📚 `pubmedId` | array | `["2270482", "8554067"]` |
| 🐭 `mgdId` | array | `["MGI:104537"]` |
| 🐀 `rgdId` | array | `["RGD:2218"]` |
| 🧬 `ccdsId` | array | `["CCDS11456"]` |
| 🔗 `vegaId` | string \| null | `"OTTHUMG00000157426"` |
| ✅ `status` | string | `"Approved"` |
| 📅 `dateApprovedReserved` | string | `"1989-06-30"` |
| 📅 `dateModified` | string | `"2024-09-12"` |
| 🗃️ `raw` | object | Full HGNC payload for that record |
| 🕒 `scrapedAt` | ISO 8601 | `"2026-05-23T00:00:00.000Z"` |
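
Several fields in this schema are arrays, which do not fit a one-column-per-field spreadsheet directly. When working from the JSON export, one option is to join each array into a single delimited cell. The sketch below is an assumption about how you might post-process records locally. `flatten_record` and `to_csv` are hypothetical helpers, and the platform's own CSV export may lay arrays out differently.

```python
import csv
import io
from typing import Iterable, Mapping


def flatten_record(record: Mapping, sep: str = "|") -> dict:
    """Join list fields with `sep`, turn None into '', and drop the nested `raw` payload."""
    flat = {}
    for key, value in record.items():
        if key == "raw":
            continue  # nested object, not useful as a single cell
        if isinstance(value, list):
            flat[key] = sep.join(str(item) for item in value)
        elif value is None:
            flat[key] = ""
        else:
            flat[key] = value
    return flat


def to_csv(records: Iterable[Mapping]) -> str:
    """Serialize flattened records to CSV text with a header row."""
    rows = [flatten_record(rec) for rec in records]
    # Column order follows first appearance across all rows.
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = {"symbol": "BRCA1", "aliasSymbol": ["BRCC1", "PNCA4"], "vegaId": None, "raw": {}}
csv_text = to_csv([sample])
```

The pipe separator is chosen here because it is unlikely to occur inside symbols or accessions. Pick a different one if your downstream tool treats it specially.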

✨ Why choose this Actor

|  | Capability |
|---|---|
| 🧬 | Canonical nomenclature. HUGO-approved symbols and names backed by 30+ years of curation. |
| 🔗 | Eleven cross-references per record. Entrez, Ensembl, UCSC, RefSeq, UniProt, OMIM, PubMed, MGD, RGD, CCDS, Vega. |
| 🔤 | Five lookup modes. Symbol, HGNC ID, Entrez ID, UniProt accession, free-text search. |
| ⏪ | Aliases and previous symbols. Resolve historical names like `MLL` to current `KMT2A` automatically. |
| ⚡ | Fast. 10 lookups in seconds, hundreds in under a minute. |
| 🔁 | Always fresh. Pulls live HGNC records so updates appear on the next run. |
| 🚫 | No authentication. Works against the public HGNC data feed. No login or key needed. |

📊 Gene symbol consistency is one of the most under-appreciated quality signals in modern genomics. This Actor makes it trivial to enforce.

📈 How it compares to alternatives

| Approach | Cost | Coverage | Refresh | Lookups | Setup |
|---|---|---|---|---|---|
| ⭐ HGNC Scraper (this Actor) | $5 free credit, then pay-per-use | 43,000+ human genes | Live per run | symbol, ID, Entrez, UniProt, search | ⚡ 2 min |
| Manual HGNC web search | Free | Full | Live | One at a time | 🐢 Per-row |
| Bulk file downloads | Free | Full snapshot | Quarterly | Local parsing | ⏳ Hours |
| Generic biomedical APIs | Varies | Mixed | Mixed | Often paid | 🕒 Variable |

Pick this Actor when you want HGNC records on demand without bulk downloads or per-row clicks.

🚀 How to use

📝 Sign up. Create a free account with $5 credit (takes 2 minutes).
🌐 Open the Actor. Go to the HGNC Gene Symbols Scraper page on the Apify Store.
🎯 Set input. Choose a mode, paste your symbols or IDs into the values list, and set maxItems.
🚀 Run it. Click Start and let the Actor resolve every lookup.
📥 Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

⏱️ Total time from signup to downloaded dataset: 3-5 minutes. No coding required.

💼 Business use cases

🧪 Clinical & Translational Genomics

Variant interpretation pipelines with canonical symbols
Gene-panel design and QA for diagnostics
EHR and lab-report normalization
Cross-DB joins from Entrez or UniProt back to HGNC

💊 Pharma & Biotech R&D

Target-list curation against canonical nomenclature
Drug-target literature mining via PubMed IDs
Multi-source omics annotation with stable IDs
Patent and FDA filing nomenclature checks

🧮 Bioinformatics Pipelines

Symbol-history normalization for legacy datasets
RNA-seq and microarray probe-to-gene mapping
Cross-species ortholog joins via MGD and RGD
Pre-flight QA on submitted FASTA/GFF annotations

📰 Science Communication & EdTech

Up-to-date gene cards for popular-science articles
Interactive teaching tools with live HGNC data
Database front-ends for medical education
Symbol lookup widgets for science journalism

🔌 Automating HGNC Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

🟢 Node.js. Install the apify-client NPM package.
🐍 Python. Use the apify-client PyPI package.
📚 See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Weekly refreshes keep clinical and research databases aligned with HGNC updates automatically.
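
For the Python route, a run can be started and its dataset read back with the apify-client package. The following is a minimal sketch, not an official snippet. The Actor ID is taken from the Store URL, while `fetch_hgnc_records` and the `APIFY_TOKEN` environment variable name are assumptions for illustration.

```python
from typing import Any, Dict, List

# Actor ID as it appears in the Store URL.
ACTOR_ID = "parseforge/hgnc-gene-symbols-scraper"


def fetch_hgnc_records(client: Any, run_input: Dict[str, Any]) -> List[dict]:
    """Start the Actor, wait for it to finish, and return all dataset items.

    `client` is expected to be an apify_client.ApifyClient instance.
    """
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if run is None:
        raise RuntimeError("the Actor run did not return a run object")
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())


# Usage (requires `pip install apify-client` and a valid API token):
#
#   import os
#   from apify_client import ApifyClient
#
#   client = ApifyClient(os.environ["APIFY_TOKEN"])
#   records = fetch_hgnc_records(
#       client,
#       {"maxItems": 5, "mode": "fetchBySymbol", "values": ["BRCA1", "TP53"]},
#   )
#   for record in records:
#       print(record.get("hgncId"), record.get("symbol"))
```

Passing the client in as an argument keeps the function easy to exercise with a stub in tests, without touching the network.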

🌟 Beyond business use cases

Authoritative gene nomenclature has reach well beyond commercial pipelines. The same records support research, education, civic projects, and personal initiatives.

🎓 Research and academia

Reproducible variant calls for peer-reviewed studies
Class assignments on gene-naming conventions
Cross-DB join exercises for bioinformatics courses
Citation-friendly snapshots of canonical records

🎨 Personal and creative

23andMe and consumer-genomics result decoding
Custom Anki decks for med-school revision
Hobbyist family-history disease research
Indie biotech newsletter content automation

🤝 Non-profit and civic

Rare-disease patient-advocacy gene factsheets
Public health surveillance with canonical symbols
Open-data biology curriculum for high schools
Grant-proposal supporting evidence with stable IDs

🧪 Experimentation

Train LLMs on canonical biomedical vocabulary
Build agentic tools that resolve symbol drift live
Prototype knowledge graphs with HGNC as the spine
Validate gene-prediction models against ground truth

❓ Frequently Asked Questions

🧩 How does it work?

Pick a lookup mode, paste your symbols or IDs into the values list, and the Actor resolves each one against HGNC and emits a clean structured record. No browser automation, no captchas, no setup.

📏 How accurate are the symbols?

HGNC is the canonical authority for human gene nomenclature. Every approved symbol is reviewed and assigned by HGNC curators. The status value (Approved, Entry Withdrawn, or Symbol Withdrawn) is included on every record, so you can tell current entries from withdrawn ones.

🔁 How often is the dataset refreshed?

HGNC updates its records continuously as new symbols are approved and existing ones are reviewed. Every run of this Actor fetches live data.

🔗 Which cross-references are included?

Entrez, Ensembl, UCSC, RefSeq, UniProt, OMIM, PubMed, MGD, RGD, CCDS, and Vega. Not every cross-reference is populated for every gene. When HGNC has no mapping, array-typed fields are empty arrays and vegaId is null.
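
Because cross-references can be missing, it is worth checking coverage before joining against another database. The sketch below lists which cross-reference fields are unpopulated on a record. The field names come from the output schema, `missing_xrefs` is a hypothetical helper, and treating None, an empty string, and an empty array all as "missing" is an assumption.

```python
from typing import List, Mapping

# Cross-reference fields as named in the output schema.
XREF_FIELDS = (
    "entrezId", "ensemblGeneId", "ucscId", "refseqAccession", "uniprotIds",
    "omimId", "pubmedId", "mgdId", "rgdId", "ccdsId", "vegaId",
)


def missing_xrefs(record: Mapping) -> List[str]:
    """Return the cross-reference fields that are absent, None, '' or an empty list."""
    return [name for name in XREF_FIELDS if record.get(name) in (None, "", [])]


# Illustrative record with two unpopulated cross-references.
example = {
    "symbol": "BRCA1",
    "entrezId": "672",
    "ensemblGeneId": "ENSG00000012048",
    "ucscId": "uc002ict.5",
    "refseqAccession": ["NM_007294"],
    "uniprotIds": ["P38398"],
    "omimId": ["113705"],
    "pubmedId": [],
    "mgdId": ["MGI:104537"],
    "rgdId": ["RGD:2218"],
    "ccdsId": ["CCDS11456"],
    "vegaId": None,
}
gaps = missing_xrefs(example)
```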

⏪ Can it resolve old symbols?

Yes. Run free-text search with the obsolete symbol (for example MLL) and the Actor returns the current approved record (KMT2A) along with the alias and previous-symbol arrays.

⏰ Can I schedule regular runs?

Yes. Use Apify Schedules to run this Actor on any cron interval (hourly, daily, weekly) and keep your annotation database in sync with HGNC releases.

⚖️ Is this data legal to use?

HGNC data is publicly available and widely cited. Standard scholarly attribution applies; commercial pipelines and clinical tools have been using HGNC nomenclature for decades.

💳 Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small panels (10 records per run). A paid plan lifts the limit for full panel resolution and scheduling.

🔁 What happens if a run fails or gets interrupted?

Apify automatically retries transient errors. If a run still fails, inspect the log, fix the input, and re-run. Partial datasets from failed runs are preserved.
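
Since partial datasets survive a failed run, a re-run only needs the values that did not come back. A minimal sketch of that comparison follows. `unresolved` is a hypothetical helper, and it assumes symbol lookups return records whose `symbol` matches the requested string, which may not hold for withdrawn or renamed symbols.

```python
from typing import Iterable, List, Mapping


def unresolved(requested: Iterable[str], records: Iterable[Mapping], field: str = "symbol") -> List[str]:
    """Return requested values with no matching record, compared case-insensitively."""
    seen = {str(rec[field]).upper() for rec in records if rec.get(field)}
    return [value for value in requested if value.upper() not in seen]


# Records recovered from a partial dataset (illustrative).
partial = [{"symbol": "BRCA1"}, {"symbol": "EGFR"}]
retry_values = unresolved(["BRCA1", "TP53", "EGFR"], partial)
```

Here `retry_values` contains only TP53, which can be used as the values list for the next run.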

🐭 Does it return mouse or rat genes?

No, HGNC covers human genes only. The MGD and RGD ID fields cross-reference the rodent equivalents so you can follow up in those databases.

🆘 What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.

🔌 Integrate with any app

HGNC Scraper connects to any cloud service via Apify integrations:

Make - Automate multi-step workflows
Zapier - Connect with 5,000+ apps
Slack - Get run notifications in your channels
Airbyte - Pipe gene records into your warehouse
GitHub - Trigger runs from commits and releases
Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh HGNC records into your annotation database or alert your team in Slack on symbol updates.
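
On the receiving side, a webhook handler typically needs to find the dataset that the finished run produced. The sketch below assumes the default Apify webhook payload, where `eventType` names the event and `resource` holds the run object with its `defaultDatasetId`. If you configure a custom payload template, the keys will differ. `dataset_id_from_webhook` is a hypothetical helper.

```python
from typing import Mapping, Optional


def dataset_id_from_webhook(payload: Mapping) -> Optional[str]:
    """Return the run's default dataset ID for successful runs, otherwise None."""
    if payload.get("eventType") != "ACTOR.RUN.SUCCEEDED":
        return None  # ignore failed, aborted, or timed-out runs
    resource = payload.get("resource") or {}
    return resource.get("defaultDatasetId")


# Illustrative payload with a made-up dataset ID.
payload = {
    "eventType": "ACTOR.RUN.SUCCEEDED",
    "resource": {"defaultDatasetId": "example-dataset-id"},
}
dataset_id = dataset_id_from_webhook(payload)
```

The returned ID can then be used to download the records and load them into your annotation database.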

🔗 Recommended Actors

🩺 ClinicalTrials.gov Scraper - Registered clinical trials worldwide
📖 arXiv Scraper - Open-access preprints across science
🧪 OSF Scraper - Open Science Framework projects and registrations
📊 Figshare Scraper - Research data and figures with DOIs
🌍 GBIF Biodiversity Scraper - Global biodiversity occurrence records

💡 Pro Tip: browse the complete ParseForge collection for more biomedical and reference-data scrapers.

🆘 Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.

⚠️ Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by HGNC, HUGO, or EMBL-EBI. All trademarks mentioned are the property of their respective owners. Only publicly available gene nomenclature data is collected.

URL: https://apify.com/parseforge/hgnc-gene-symbols-scraper