
URL: https://apify.com/parseforge/hgnc-gene-symbols-scraper

HGNC Gene Symbols Scraper · Apify


Pricing

from $15.00 / 1,000 result items


HGNC Gene Symbols Scraper

Query the HUGO Gene Nomenclature Committee database for approved human gene symbols, names, aliases, chromosomal location, gene family, RefSeq, Ensembl, OMIM, UniProt, and external links. Export to JSON, CSV, or Excel for bioinformatics, genomics research, and pharmaceutical pipelines.


Rating: 0.0 (0)

Developer: ParseForge (Maintained by Community)

Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified a month ago

HGNC Gene Symbols Scraper

Export approved human gene symbols in seconds. Pull 43,000+ HGNC-approved gene records with cross-references to Ensembl, Entrez, UniProt, OMIM, and PubMed. No API key, no registration, no manual nomenclature lookups.

Last updated: 2026-05-23 · 27 fields per record · 43,000+ genes · 11 cross-references · HUGO canonical

The HGNC Gene Symbols Scraper exports records from the HUGO Gene Nomenclature Committee, the official authority for assigning unique human gene symbols and names. Each record carries 27 fields including approved symbol, full name, chromosomal location, aliases, previous symbols, gene group, status, and cross-references to Ensembl, Entrez, UCSC, RefSeq, UniProt, OMIM, PubMed, MGD, RGD, CCDS, and Vega. HGNC nomenclature underpins virtually every modern human-genetics database and clinical-genomics pipeline.

Coverage spans 43,000+ approved gene symbols plus thousands of pseudogenes, withdrawn symbols, and reserved names. This Actor turns lookup-by-symbol, lookup-by-ID, and search-by-keyword into one-step exports as CSV, Excel, JSON, or XML.

Target Audience | Primary Use Cases
Bioinformatics teams, clinical-genomics labs, pharma R&D, computational biologists, science writers, EHR vendors | Variant interpretation, gene-panel design, cross-DB joins, symbol normalization, literature mining, omics pipeline annotation

What the HGNC Scraper does

Five lookup modes in a single run:

  • Symbol lookup. Resolve approved symbols like BRCA1, TP53, EGFR, MYC, AKT1.
  • HGNC ID lookup. Resolve canonical HGNC IDs like 1100 or HGNC:1100.
  • Entrez Gene ID lookup. Cross-reference NCBI Entrez IDs back to HGNC records.
  • UniProt accession lookup. Map protein accessions like P38398 to gene records.
  • Free-text search. Query across symbols, names, aliases, and previous names.

Each record includes chromosomal location, locus type and group, alias and previous symbols, gene-family group, status, approval date, last-modified timestamp, and the complete cross-reference panel.

Why it matters: symbol nomenclature drifts. A gene known as MLL in 2010 is now KMT2A. Pipelines and clinical reports that miss the update silently lose joins. This Actor returns the canonical, current HGNC record on every lookup so your annotations stay correct.
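To make the symbol-drift point concrete, here is a minimal sketch of normalizing legacy symbols against exported records. It assumes records shaped like this Actor's output (`symbol`, `prevSymbol`, `aliasSymbol`); the helper names `build_symbol_index` and `normalize` are hypothetical, and the two sample records are illustrative rather than real Actor output.

```python
from typing import Dict, Iterable, List, Optional


def build_symbol_index(records: Iterable[dict]) -> Dict[str, str]:
    """Map approved, previous, and alias symbols to the approved symbol."""
    records = list(records)
    index: Dict[str, str] = {}
    # Approved symbols are registered first so they always win over aliases.
    for rec in records:
        index[rec["symbol"].upper()] = rec["symbol"]
    for rec in records:
        for field in ("prevSymbol", "aliasSymbol"):
            for old in rec.get(field) or []:
                # setdefault keeps the first mapping if two genes share an alias.
                index.setdefault(old.upper(), rec["symbol"])
    return index


def normalize(symbols: List[str], index: Dict[str, str]) -> List[Optional[str]]:
    """Return the approved symbol for each input, or None when unknown."""
    return [index.get(s.strip().upper()) for s in symbols]


# Illustrative records only (assumed shape, not real Actor output).
records = [
    {"symbol": "KMT2A", "prevSymbol": ["MLL"], "aliasSymbol": []},
    {"symbol": "BRCA1", "prevSymbol": [], "aliasSymbol": ["BRCC1", "PNCA4"]},
]
index = build_symbol_index(records)
print(normalize(["MLL", "brca1", "PNCA4", "NOPE"], index))
```

Aliases are not guaranteed to be unique across genes, so a production pipeline would likely want to flag ambiguous aliases instead of silently keeping the first match.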


Full Demo

Coming soon: a 3-minute walkthrough showing how to resolve a panel of symbols into a downloadable cross-reference table.


Input

Input | Type | Default | Behavior
maxItems | integer | 10 | Records to return. Free plan caps at 10, paid plan at 1,000,000.
mode | string | "fetchBySymbol" | One of searchQuery, fetchBySymbol, fetchByHgncId, fetchByEntrezId, fetchByUniprot.
values | array | ["BRCA1", "TP53", "EGFR", "MYC", "AKT1"] | Symbols, IDs, or search terms. One lookup per entry.

Example: resolve a panel of cancer genes by approved symbol.

{
  "maxItems": 25,
  "mode": "fetchBySymbol",
  "values": ["BRCA1", "BRCA2", "TP53", "EGFR", "KRAS", "MYC", "PTEN", "APC", "RB1", "NF1"]
}

Example: map UniProt accessions back to HGNC records.

{
  "maxItems": 5,
  "mode": "fetchByUniprot",
  "values": ["P38398", "P04637", "P01133"]
}
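When inputs like the ones above are generated by a script rather than typed by hand, it can help to validate them before starting a run. The sketch below is a hypothetical helper (`build_run_input` is not part of the Actor); it only checks the three documented fields and the five documented mode names, and the Actor itself may apply additional rules.

```python
import json
from typing import Any, Dict, Iterable

# The five mode names documented in the input table.
ALLOWED_MODES = {
    "searchQuery",
    "fetchBySymbol",
    "fetchByHgncId",
    "fetchByEntrezId",
    "fetchByUniprot",
}


def build_run_input(mode: str, values: Iterable[Any], max_items: int = 10) -> Dict[str, Any]:
    """Build an input object with the documented keys, rejecting obvious mistakes."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"Unknown mode {mode!r}; expected one of {sorted(ALLOWED_MODES)}")
    # Drop blank entries and surrounding whitespace; one lookup per remaining value.
    cleaned = [str(v).strip() for v in values if str(v).strip()]
    if not cleaned:
        raise ValueError("values must contain at least one non-empty entry")
    if not isinstance(max_items, int) or max_items < 1:
        raise ValueError("maxItems must be a positive integer")
    return {"maxItems": max_items, "mode": mode, "values": cleaned}


run_input = build_run_input("fetchByUniprot", ["P38398", " P04637 ", ""], max_items=5)
print(json.dumps(run_input, indent=2))
```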

Good to Know: HGNC assigns symbols for human genes only. Mouse and rat orthologs are linked via MGD and RGD cross-references inside each record, but rodent-only symbols are not in scope.


Output

Each gene record contains 27 fields. Download the dataset as CSV, Excel, JSON, or XML.

Schema

Field | Type | Example
hgncId | string | "HGNC:1100"
symbol | string | "BRCA1"
name | string | "BRCA1 DNA repair associated"
locusType | string | "gene with protein product"
locusGroup | string | "protein-coding gene"
location | string | "17q21.31"
aliasSymbol | array | ["BRCC1", "PNCA4"]
aliasName | array | ["Breast cancer 1, early onset"]
prevSymbol | array | ["BRCAI"]
prevName | array | []
geneGroup | array | ["Ring finger proteins", "BRCT domain containing"]
entrezId | string | "672"
ensemblGeneId | string | "ENSG00000012048"
ucscId | string | "uc002ict.5"
refseqAccession | array | ["NM_007294"]
uniprotIds | array | ["P38398"]
omimId | array | ["113705"]
pubmedId | array | ["2270482", "8554067"]
mgdId | array | ["MGI:104537"]
rgdId | array | ["RGD:2218"]
ccdsId | array | ["CCDS11456"]
vegaId | string | null | "OTTHUMG00000157426"
status | string | "Approved"
dateApprovedReserved | string | "1989-06-30"
dateModified | string | "2024-09-12"
raw | object | Full HGNC payload for that record
scrapedAt | ISO 8601 | "2026-05-23T00:00:00.000Z"
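Because several schema fields are arrays, a JSON export usually needs flattening before it fits a spreadsheet-style table. The following is a minimal local sketch, not the platform's own CSV export: the `|` separator, the helper names, and the decision to skip the nested `raw` object are all assumptions made for illustration.

```python
import csv
import io
from typing import Dict, Iterable, List

ARRAY_SEP = "|"  # assumed separator for array-valued fields


def flatten_record(record: dict, skip: Iterable[str] = ("raw",)) -> Dict[str, str]:
    """Turn one record into a flat dict of strings suitable for a CSV row."""
    row: Dict[str, str] = {}
    for key, value in record.items():
        if key in skip:
            continue  # nested payloads do not map cleanly onto one cell
        if isinstance(value, list):
            row[key] = ARRAY_SEP.join(str(v) for v in value)
        elif value is None:
            row[key] = ""
        else:
            row[key] = str(value)
    return row


def records_to_csv(records: Iterable[dict]) -> str:
    """Serialize records to CSV text, using the union of keys as the header."""
    rows = [flatten_record(r) for r in records]
    fieldnames: List[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row are written as empty cells
    return buf.getvalue()


# Abbreviated sample using example values from the schema table.
sample = {
    "hgncId": "HGNC:1100",
    "symbol": "BRCA1",
    "aliasSymbol": ["BRCC1", "PNCA4"],
    "vegaId": None,
    "raw": {"symbol": "BRCA1"},
}
print(records_to_csv([sample]))
```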



Why choose this Actor

  • Canonical nomenclature. HUGO-approved symbols and names backed by 30+ years of curation.
  • Eleven cross-references per record. Entrez, Ensembl, UCSC, RefSeq, UniProt, OMIM, PubMed, MGD, RGD, CCDS, Vega.
  • Five lookup modes. Symbol, HGNC ID, Entrez ID, UniProt accession, free-text search.
  • Aliases and previous symbols. Resolve historical names like MLL to current KMT2A automatically.
  • Fast. 10 lookups in seconds, hundreds in under a minute.
  • Always fresh. Pulls live HGNC records so updates appear on the next run.
  • No authentication. Works against the public HGNC data feed. No login or key needed.

Gene symbol consistency is one of the most under-appreciated quality signals in modern genomics. This Actor makes it trivial to enforce.


How it compares to alternatives

Approach | Cost | Coverage | Refresh | Lookups | Setup
HGNC Scraper (this Actor) | $5 free credit, then pay-per-use | 43,000+ human genes | Live per run | symbol, ID, Entrez, UniProt, search | 2 min
Manual HGNC web search | Free | Full | Live | One at a time | Per-row
Bulk file downloads | Free | Full snapshot | Quarterly | Local parsing | Hours
Generic biomedical APIs | Varies | Mixed | Mixed | Often paid | Variable

Pick this Actor when you want HGNC records on demand without bulk downloads or per-row clicks.


How to use

  1. Sign up. Create a free account with $5 credit (takes 2 minutes).
  2. Open the Actor. Go to the HGNC Gene Symbols Scraper page on the Apify Store.
  3. Set input. Choose a mode, paste your symbols or IDs into the values list, and set maxItems.
  4. Run it. Click Start and let the Actor resolve every lookup.
  5. Download. Grab your results in the Dataset tab as CSV, Excel, JSON, or XML.

Total time from signup to downloaded dataset: 3-5 minutes. No coding required.


Business use cases

Clinical & Translational Genomics

  • Variant interpretation pipelines with canonical symbols
  • Gene-panel design and QA for diagnostics
  • EHR and lab-report normalization
  • Cross-DB joins from Entrez or UniProt back to HGNC

Pharma & Biotech R&D

  • Target-list curation against canonical nomenclature
  • Drug-target literature mining via PubMed IDs
  • Multi-source omics annotation with stable IDs
  • Patent and FDA filing nomenclature checks

Bioinformatics Pipelines

  • Symbol-history normalization for legacy datasets
  • RNA-seq and microarray probe-to-gene mapping
  • Cross-species ortholog joins via MGD and RGD
  • Pre-flight QA on submitted FASTA/GFF annotations

Science Communication & EdTech

  • Up-to-date gene cards for popular-science articles
  • Interactive teaching tools with live HGNC data
  • Database front-ends for medical education
  • Symbol lookup widgets for science journalism

Automating HGNC Scraper

Control the scraper programmatically for scheduled runs and pipeline integrations:

  • Node.js. Install the apify-client NPM package.
  • Python. Use the apify-client PyPI package.
  • See the Apify API documentation for full details.

The Apify Schedules feature lets you trigger this Actor on any cron interval. Weekly refreshes keep clinical and research databases aligned with HGNC updates automatically.
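As a rough sketch of the Python route, the function below starts a run and collects the resulting dataset items. It assumes the `apify-client` package's `ApifyClient`, `actor(...).call(run_input=...)`, and `dataset(...).iterate_items()` methods, and that `call` returns run metadata containing `defaultDatasetId`; the Actor ID is taken from the store URL, while the `APIFY_TOKEN` environment variable and the helper names are assumptions. The client is passed in as a parameter so the logic can be exercised with a stub, without network access.

```python
import os
from typing import Any, Dict, List

# Actor ID as it appears in the store URL (assumed to be the callable ID).
ACTOR_ID = "parseforge/hgnc-gene-symbols-scraper"


def fetch_hgnc_records(client: Any, run_input: Dict[str, Any]) -> List[dict]:
    """Start a run, wait for it to finish, and return the dataset items as a list."""
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    if not run:
        raise RuntimeError("Actor run returned no metadata")
    dataset_id = run["defaultDatasetId"]
    return list(client.dataset(dataset_id).iterate_items())


def main() -> None:
    # Imported lazily so the helper above stays usable without the package installed.
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(os.environ["APIFY_TOKEN"])  # assumed env var name
    records = fetch_hgnc_records(
        client,
        {"maxItems": 5, "mode": "fetchBySymbol", "values": ["BRCA1", "TP53"]},
    )
    for rec in records:
        print(rec.get("symbol"), rec.get("hgncId"))
```

Calling `main()` with a valid token in the environment would run the Actor once; the same `fetch_hgnc_records` helper could be reused from a scheduled job or a pipeline step.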


Beyond business use cases

Authoritative gene nomenclature has reach well beyond commercial pipelines. The same records support research, education, civic projects, and personal initiatives.

Research and academia

  • Reproducible variant calls for peer-reviewed studies
  • Class assignments on gene-naming conventions
  • Cross-DB join exercises for bioinformatics courses
  • Citation-friendly snapshots of canonical records

Personal and creative

  • 23andMe and consumer-genomics result decoding
  • Custom Anki decks for med-school revision
  • Hobbyist family-history disease research
  • Indie biotech newsletter content automation

Non-profit and civic

  • Rare-disease patient-advocacy gene factsheets
  • Public health surveillance with canonical symbols
  • Open-data biology curriculum for high schools
  • Grant-proposal supporting evidence with stable IDs

Experimentation

  • Train LLMs on canonical biomedical vocabulary
  • Build agentic tools that resolve symbol drift live
  • Prototype knowledge graphs with HGNC as the spine
  • Validate gene-prediction models against ground truth


Frequently Asked Questions

How does it work?

Pick a lookup mode, paste your symbols or IDs into the values list, and the Actor resolves each one against HGNC and emits a clean structured record. No browser automation, no captchas, no setup.

How accurate are the symbols?

HGNC is the canonical authority for human gene nomenclature. Every approved symbol is reviewed and assigned by HUGO curators. Status flags (Approved, Entry Withdrawn, Symbol Withdrawn) are surfaced on every record so you always know what you have.

How often is the dataset refreshed?

HGNC updates its records continuously as new symbols are approved and existing ones are reviewed. Every run of this Actor fetches live data.

Which cross-references are included?

Entrez, Ensembl, UCSC, RefSeq, UniProt, OMIM, PubMed, MGD, RGD, CCDS, and Vega. Not every cross-reference is populated for every gene; the field is an empty array when HGNC has no mapping.
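Since some cross-references can be empty, downstream joins may want to check which ones are populated before relying on them. This is a minimal sketch using the eleven cross-reference field names from the schema; the helper names are hypothetical, and the sample record is abbreviated for illustration.

```python
from typing import List

# The eleven cross-reference fields listed in the output schema.
XREF_FIELDS = (
    "entrezId", "ensemblGeneId", "ucscId", "refseqAccession", "uniprotIds",
    "omimId", "pubmedId", "mgdId", "rgdId", "ccdsId", "vegaId",
)


def populated_xrefs(record: dict) -> List[str]:
    """Fields with a usable value; [], "", None, and missing keys count as empty."""
    return [field for field in XREF_FIELDS if record.get(field)]


def missing_xrefs(record: dict) -> List[str]:
    """Fields for which the record carries no mapping."""
    return [field for field in XREF_FIELDS if not record.get(field)]


# Abbreviated record using example values from the schema table.
record = {
    "symbol": "BRCA1",
    "entrezId": "672",
    "ensemblGeneId": "ENSG00000012048",
    "uniprotIds": ["P38398"],
    "omimId": [],
    "vegaId": None,
}
print(populated_xrefs(record))
print(len(missing_xrefs(record)))
```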

Can it resolve old symbols?

Yes. Run free-text search with the obsolete symbol (for example MLL) and the Actor returns the current approved record (KMT2A) along with the alias and previous-symbol arrays.

Can I schedule regular runs?

Yes. Use Apify Schedules to run this Actor on any cron interval (hourly, daily, weekly) and keep your annotation database in sync with HGNC releases.

Is this data legal to use?

HGNC data is publicly available and widely cited. Standard scholarly attribution applies; commercial pipelines and clinical tools have been using HGNC nomenclature for decades.

Do I need a paid Apify plan to use this Actor?

No. The free Apify plan is enough for testing and small panels (10 records per run). A paid plan lifts the limit for full panel resolution and scheduling.

What happens if a run fails or gets interrupted?

Apify automatically retries transient errors. If a run still fails, inspect the log, fix the input, and re-run. Partial datasets from failed runs are preserved.

Does it return mouse or rat genes?

No, HGNC covers human genes only. The MGD and RGD ID fields cross-reference the rodent equivalents so you can follow up in those databases.

What if I need help?

Our support team is here to help. Contact us through the Apify platform or use the Tally form linked below.


Integrate with any app

HGNC Scraper connects to any cloud service via Apify integrations:

  • Make - Automate multi-step workflows
  • Zapier - Connect with 5,000+ apps
  • Slack - Get run notifications in your channels
  • Airbyte - Pipe gene records into your warehouse
  • GitHub - Trigger runs from commits and releases
  • Google Drive - Export datasets straight to Sheets

You can also use webhooks to trigger downstream actions when a run finishes. Push fresh HGNC records into your annotation database or alert your team in Slack on symbol updates.


Recommended Actors

Pro Tip: browse the complete ParseForge collection for more biomedical and reference-data scrapers.


Need Help? Open our contact form to request a new scraper, propose a custom data project, or report an issue.


Disclaimer: this Actor is an independent tool and is not affiliated with, endorsed by, or sponsored by HGNC, HUGO, or EMBL-EBI. All trademarks mentioned are the property of their respective owners. Only publicly available gene nomenclature data is collected.
