The Current Release page is a web interface allowing easy access to the main directories and the individual bulk data files available at the current FlyBase FTP repository. Files can be downloaded directly through the web interface.
Users can also browse files on our FTP site, either for the current release or for a limited selection of past releases. It is also possible to browse the FTP files for Dmel genomes.
The command-line download tool wget can be used to access precomputed files. Files for the latest release can be found using the "_current" suffix.
e.g.
wget https://s3ftp.flybase.org/releases/current/precomputed_files/genes/fbgn_annotation_ID_current.tsv.gz
For accessing archived releases, the release suffix should be used. e.g.
wget https://s3ftp.flybase.org/releases/FB2023_05/precomputed_files/genes/dmel_unique_protein_isoforms_fb_2023_05.tsv.gz
Most of the files are compressed with the GNU gzip program and have the suffix '.gz'. Most modern computers will unpack and open these files automatically after download. Alternatively, the gunzip command may be used on machines running Apple OS X or Unix. On a Windows machine we suggest you use the program 7-zip to open these files, as several people have reported problems using WinZip. The resulting file should open with any standard text editor.
Data files from previous releases, as well as links to servers hosting older releases of FlyBase, can be accessed via the Archived Data webpage.
Using an FTP client, data files from previous releases can be obtained by including the FlyBase release in the path /releases/<RELEASE_NUMBER>/. For example, to retrieve the 'fbgn_annotation_ID' file for the FB2014_03 release, type:
wget https://s3ftp.flybase.org/releases/FB2014_03/precomputed_files/genes/fbgn_annotation_ID_fb_2014_03.tsv.gz
The /releases/current/ path will always point to the latest FlyBase release, and this directory will have only one copy of the file.
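The naming pattern in the wget examples above can be captured in a small helper. This is only a sketch: `precomputed_url` is a hypothetical name, and it assumes that the `<name>_current` and `<name>_fb_YYYY_NN` patterns seen in the examples hold for the file you want, which may not be true for every file in the repository.

```python
BASE_URL = "https://s3ftp.flybase.org/releases"


def precomputed_url(subdir: str, stem: str, release: str = "current",
                    ext: str = "tsv.gz") -> str:
    """Build a download URL for a precomputed file (hypothetical helper).

    release is either "current" or a release name such as "FB2023_05".
    """
    if release == "current":
        suffix = "current"
    elif release.startswith("FB"):
        # "FB2023_05" -> "fb_2023_05", as in the archived file names above.
        suffix = "fb_" + release[2:]
    else:
        raise ValueError(f"unrecognised release name: {release!r}")
    return f"{BASE_URL}/{release}/precomputed_files/{subdir}/{stem}_{suffix}.{ext}"


print(precomputed_url("genes", "fbgn_annotation_ID"))
print(precomputed_url("genes", "fbgn_annotation_ID", release="FB2014_03"))
```

The resulting string can be passed to wget or to any HTTP client.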
This section contains links to top-level directories of the FlyBase S3 FTP repository.
The Chado database link leads to the psql directory of the current S3 FTP repository, where you can obtain a dump of the PostgreSQL Chado database. If you have a PostgreSQL client application installed and would like to access the latest FlyBase release without installing the database, you can connect to the FlyBase public read-only Chado database as:
psql -h chado.flybase.org -U flybase flybase
The version running on this service is identical to the current web site release.
This section contains links to:
The remaining sections of the Current Release page are organized by data class/type and provide direct downloads of the current bulk data files from the FTP site. Most files are from the current precomputed files directory of the FTP site and contain useful data for the specified data type (described in detail below). The Genomes files are from the current D. melanogaster FTP genomes directory or the current files for selected other Drosophila species.
The first part of a filename always describes the content of the file, and the second part may contain a FlyBase or genome annotation version number. For example, the file "fbgn_annotation_ID_fb_2018_06.tsv.gz" maps the primary FlyBase gene identifiers (FBgn) to their annotation IDs for the FB2018_06 release of FlyBase. The "dmel-all-CDS-r6.25.fasta.gz" file contains the coding sequences for all D. melanogaster genes from release 6 of the sequence assembly, annotation release 25.
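As an illustration of this naming convention, the two example filenames can be split into their content and version parts with regular expressions. The function name and the two patterns below are assumptions derived only from these two examples; other files may be named differently.

```python
import re

# "<content>_fb_YYYY_NN.<ext>" carries a FlyBase release number.
RELEASE_RE = re.compile(r"^(?P<content>.+)_fb_(?P<year>\d{4})_(?P<number>\d{2})\.")
# "<content>-r<assembly>.<annotation>.<ext>" carries a genome annotation version.
GENOME_RE = re.compile(r"^(?P<content>.+)-r(?P<assembly>\d+)\.(?P<annotation>\d+)\.")


def describe_filename(name: str) -> dict:
    """Split a filename into its content part and version part (sketch)."""
    m = RELEASE_RE.match(name)
    if m:
        return {"content": m["content"],
                "flybase_release": f"FB{m['year']}_{m['number']}"}
    m = GENOME_RE.match(name)
    if m:
        return {"content": m["content"],
                "assembly_release": int(m["assembly"]),
                "annotation_release": int(m["annotation"])}
    # No recognised version part, e.g. a "_current" file.
    return {"content": name}


print(describe_filename("fbgn_annotation_ID_fb_2018_06.tsv.gz"))
print(describe_filename("dmel-all-CDS-r6.25.fasta.gz"))
```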
At the top and bottom of each tab separated text file there are a few lines that describe the file. These lines start with a '#' symbol. The line immediately before the start of the data contains headings for each of the tab separated columns in the file. The file can also include some blank lines to separate information about the version of the file from the description of data in the file.
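A minimal reader for this layout might look like the sketch below. It treats the last '#' line seen before the first data row as the column headings, skips blank lines, and ignores the '#' lines at the bottom of the file. The function names are hypothetical, and the sample text in the usage lines is made up for illustration.

```python
import gzip
import io


def read_flybase_tsv(stream):
    """Return (header, rows) from an open text stream of a precomputed TSV file."""
    header = None
    last_comment = None
    rows = []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # blank separator line
        if line.startswith("#"):
            if not rows:
                last_comment = line  # candidate heading line
            continue  # comment lines after the data are ignored
        if header is None and last_comment is not None:
            header = last_comment.lstrip("#").strip().split("\t")
        rows.append(line.split("\t"))
    return header, rows


def open_flybase_tsv(path):
    """Open a downloaded .gz file as text, without unpacking it first."""
    return gzip.open(path, "rt", encoding="utf-8")


# Made-up sample content, for illustration only.
sample = "## Example file\n#\n#col_a\tcol_b\nx\t1\ny\t2\n## Finished\n"
header, rows = read_flybase_tsv(io.StringIO(sample))
print(header, rows)
```

With a real download, `read_flybase_tsv(open_flybase_tsv(path))` reads the compressed file directly.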
Superscripts and subscripts are represented in the precomputed data files in the ASCII text format used by FlyBase, which is described in section 10.3 of the Nomenclature document.
Each precomputed data file contains the complete data set for the FlyBase release. If you are looking for information on a defined subset of genes or other FlyBase data type, you can use the Batch Download tool to query the precomputed data files and thus obtain only the data you require. This approach is described in more detail here.
Files described in this section are in the "synonyms" subdirectory of the FTP site.
Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/synonyms/fb_synonym_current.tsv.gz
The file reports current symbols and synonyms for the following objects in FlyBase: genes (FBgn), alleles (FBal), balancers (FBba), aberrations (FBab), transgenic constructs (FBtp), insertions (FBti), transcripts (FBtr), and proteins (FBpp).
The file includes:
File format:
| Column heading | Content Description |
|---|---|
| primary_FBid | Primary FlyBase identifier for the object. |
| organism_abbreviation | Abbreviation (from the Species Abbreviations list) indicating the species of origin. |
| current_symbol | Current symbol used in FlyBase for the object. |
| current_fullname | Current full name used in FlyBase for the object. |
| fullname_synonym(s) | Non-current full name(s) associated with the object (pipe separated values). |
| symbol_synonym(s) | Non-current symbol(s) associated with the object (pipe separated values). |
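One common use of this file is to resolve an old or alternative symbol to a current FlyBase identifier. The sketch below builds such a lookup from rows that have already been split on tabs. The function name is hypothetical, the column order is assumed to match the table above, and the synonyms in the usage lines are invented placeholders rather than real FlyBase data.

```python
from collections import defaultdict

COLUMNS = [
    "primary_FBid", "organism_abbreviation", "current_symbol",
    "current_fullname", "fullname_synonym(s)", "symbol_synonym(s)",
]


def index_symbols(rows):
    """Map each current symbol and symbol synonym to the set of primary IDs using it."""
    index = defaultdict(set)
    for row in rows:
        record = dict(zip(COLUMNS, row))
        fbid = record["primary_FBid"]
        index[record["current_symbol"]].add(fbid)
        # Synonyms are pipe-separated; the field may be empty or missing.
        for synonym in record.get("symbol_synonym(s)", "").split("|"):
            if synonym:
                index[synonym].add(fbid)
    return index


# Placeholder rows: "synA" and "synB" are invented for illustration.
rows = [
    ["FBgn0026379", "Dmel", "Pten", "", "", "synA|synB"],
    ["FBgn0010379", "Dmel", "Akt1", "", "", "synB"],
]
print(sorted(index_symbols(rows)["synB"]))
```

A set is used for the values because the same synonym can be attached to more than one object.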
Files described in this section are in the "genes" subdirectory of the FTP site. Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/genes/fbgn_annotation_ID_current.tsv.gz
This is the Chado XML file generated from the FlyBase PostgreSQL database for the 'genes' data class.
The file reports the summary of gene-level genetic interactions in FlyBase. This data is computed from the allele-level genetic interaction data captured by FlyBase curators.
The file includes information for Dmel genes only.
Interactions involving any of the following kinds of allele are considered when the gene-level genetic interaction data is computed:
File format:
| Column heading | Content Description |
|---|---|
| Starting_gene(s)_symbol | Current FlyBase symbol of gene(s) involved in the starting genotype. |
| Starting_gene(s)_FBgn | Current FlyBase identifier (FBgn#) of gene(s) involved in the starting genotype. |
| Interacting_gene(s)_symbol | Current FlyBase symbol of gene(s) involved in the interacting genotype. |
| Interacting_gene(s)_FBgn | Current FlyBase identifier (FBgn#) of gene(s) involved in the interacting genotype. |
| Interaction_type | Type of interaction observed, either 'suppressible' or 'enhanceable'. |
| Publication_FBrf | Current FlyBase identifier (FBrf#) of publication from which the data came. |
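A row of this file can be turned into a small record as sketched below. The sketch assumes the six columns are tab-separated in the order of the table above, and that where several genes are listed with a "|" separator, symbols and FBgn identifiers appear in the same order. Both points are assumptions, and `parse_interaction` is a hypothetical name.

```python
def parse_interaction(line: str) -> dict:
    """Split one data row of the gene-level genetic interaction file."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) != 6:
        raise ValueError(f"expected 6 columns, found {len(cols)}")
    start_sym, start_id, inter_sym, inter_id, interaction_type, fbrf = cols
    return {
        # Pair each symbol with its identifier (same-order assumption).
        "starting": list(zip(start_sym.split("|"), start_id.split("|"))),
        "interacting": list(zip(inter_sym.split("|"), inter_id.split("|"))),
        "type": interaction_type,
        "publication": fbrf,
    }


row = "robo1|sli\tFBgn0005631|FBgn0264089\tRhoGAP93B\tFBgn0038853\tenhanceable\tFBrf0191476"
print(parse_interaction(row))
```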
Notes:
e.g.
Pten FBgn0026379 Akt1 FBgn0010379 suppressible FBrf0127089
indicates that phenotype(s) caused by a mutation of Pten are suppressed by a mutation of Akt1.
e.g.
robo1|sli FBgn0005631|FBgn0264089 RhoGAP93B FBgn0038853 enhanceable FBrf0191476
indicates that:
This file reports curated spatiotemporal expression patterns for genes, reporters and transgenic constructs, including split system hemidriver combinations. Where possible, controlled vocabulary terms are reported (with the FlyBase or GO ID in parentheses after the name); multiple terms within a field are pipe-separated. File format:
| Column heading | Content Description |
|---|---|
| feature_id | The FlyBase identifier for the subject of the expression annotation, which may represent a gene, split system combination feature, or transgenic/reporter allele. |
| feature_symbol | The FlyBase symbol for the subject of the expression annotation. |
| reference_id | The FlyBase FBrf ID for the reference to which the annotation is attributed. |
| expression_type | The gene/reporter product type observed in the annotation: RNA or polypeptide. |
| assay_term | The assay used to observe the expression pattern. |
| stage_start | The developmental stage term for the start of the temporal window in which the expression pattern was observed. |
| stage_end | The developmental stage term for the end of the temporal window in which the expression pattern was observed. No "end" stage is reported for annotations of expression at a single stage. |
| stage_qualifiers | Qualifiers for the observed stage range of the expression pattern. |
| stage_slim_terms | High level developmental stage terms representing the temporal window of expression observed. |
| anatomical_structure_term | The anatomical term for the primary site described in the annotation. |
| anatomical_structure_qualifiers | Qualifiers for the anatomical term for the primary site described in the annotation. |
| anatomical_structure_slim_terms | High level anatomy terms representing the primary site of expression. |
| anatomical_substructure_term | The anatomical term for sub-region within the primary site (see "anatomical_structure_term" above) in which expression is observed. |
| anatomical_substructure_qualifiers | Qualifiers for the anatomical substructure terms reported. |
| anatomical_substructure_slim_terms | High level anatomy terms representing the sub-region in which expression is observed. |
| cellular_component_term | Subcellular regions in which expression is observed for the annotation. |
| cellular_component_qualifiers | Qualifiers for the cellular component terms of the annotation. |
| notes | Free-text curator notes about the annotation. |
This file reports gene expression values based on RNA-Seq experiments, calculated as reads per kilobase per million reads (RPKM). RPKM values are calculated only for the unique exonic regions of the gene (excluding segments that overlap other genes), except for genes derived from dicistronic/polycistronic transcripts, in which case all exon regions are used in the RPKM expression calculation.
File format:
| Column heading | Content Description |
|---|---|
| Release_ID | The D. melanogaster annotation set version from which the gene model used in the analysis derives. |
| FBgn# | The unique FlyBase gene ID for this gene. |
| GeneSymbol | The official FlyBase symbol for this gene. |
| Parent_library_FBlc# | The unique FlyBase ID for the dataset project to which the RNA-Seq experiment belongs. |
| Parent_library_name | The official FlyBase symbol for the dataset project to which the RNA-Seq experiment belongs. |
| RNASource_FBlc# | The unique FlyBase ID for the RNA-Seq experiment used for RPKM expression calculation. |
| RNASource_name | The official FlyBase symbol for the RNA-Seq experiment used for RPKM expression calculation. |
| RPKM_value | The RPKM expression value for the gene in the specified RNA-Seq experiment. |
| Bin_value | The expression bin classification of this gene in this RNA-Seq experiment, based on RPKM value. Bins range from 1 (no/extremely low expression) to 8 (extremely high expression). |
| Unique_exon_base_count | The number of exonic bases unique to the gene (not overlapping exons of other genes). Field will be blank for genes derived from dicistronic/polycistronic transcripts. |
| Total_exon_base_count | The number of bases in all exons of this gene. |
| Count_used | Indicates if the RPKM expression value was calculated using only the exonic regions unique to the gene and not overlapping exons of other genes (Unique), or, if the RPKM expression value was calculated based on all exons of the gene regardless of overlap with other genes (Total). RPKM expression values are typically reported for the "Unique" count, except for genes on dicistronic/polycistronic transcripts, in which case the "Total" count is reported. |
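For orientation, the standard RPKM definition (reads per kilobase of exon per million mapped reads) can be written as a short function. This is a sketch of the general formula, not of the FlyBase pipeline: how reads are counted and mapped is not described here, and the function and parameter names are assumptions. The second function mirrors the "Count_used" column by choosing which base count enters the calculation.

```python
def rpkm(read_count: int, exon_bases: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon per million mapped reads."""
    if exon_bases <= 0 or total_mapped_reads <= 0:
        raise ValueError("exon_bases and total_mapped_reads must be positive")
    # reads / (bases / 1e3) / (total reads / 1e6) == reads * 1e9 / (bases * total)
    return read_count * 1e9 / (exon_bases * total_mapped_reads)


def bases_for_rpkm(unique_exon_base_count: int, total_exon_base_count: int,
                   count_used: str) -> int:
    """Pick the base count named by the Count_used column."""
    if count_used == "Unique":
        return unique_exon_base_count
    if count_used == "Total":
        return total_exon_base_count
    raise ValueError(f"unexpected Count_used value: {count_used!r}")


# 500 reads over 2,000 exonic bases in a library of 10 million mapped reads.
print(rpkm(500, bases_for_rpkm(2000, 3500, "Unique"), 10_000_000))  # 25.0
```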
A simpler, spreadsheet-friendly version of the "gene_rpkm_report_fb_*.tsv.gz" file. This file provides a gene by expression value matrix based on RNA-Seq experiments. Expression is calculated as reads per kilobase per million reads (RPKM). RPKM values are calculated only for the unique exonic regions of the gene (excluding segments that overlap other genes), except for genes derived from dicistronic/polycistronic transcripts, in which case all exon regions are used in the RPKM expression calculation. This RPKM matrix lacks the details of how RPKM was calculated for each gene.
Note - In addition to FlyBase calculated RPKM RNA-Seq expression values, FlyAtlas2 data have been incorporated into this file. These data are in FPKM units, calculated by the FlyAtlas group (Gillen, 2023).
File format:
| Column heading | Content Description |
|---|---|
| gene_primary_id | The unique FlyBase gene ID for this gene. |
| gene_symbol | The official FlyBase symbol for this gene. |
| gene_fullname | The official full name for this gene. |
| gene_type | The type of gene: e.g., protein_coding_gene, non_protein_coding_gene. |
| DATASAMPLE_NAME_(DATASET_ID) | Each subsequent column reports the RNA-Seq gene expression value for the sample listed in the header. The dataset "FBlc" ID is listed in parentheses, and can be pasted into FlyBase search to access more information on the sample from the "dataset" report. Expression in most cases was calculated by FlyBase in RPKM units, with the exception of FlyAtlas2 data, which was calculated by the FlyAtlas group and is expressed in FPKM units. |
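Because each sample column heading carries the dataset ID in parentheses, the ID can be recovered from the heading text. The sketch below assumes headings have the form `NAME_(FBlcNNNNNNN)` as described in the table; the function name and the sample heading in the usage line are hypothetical.

```python
import re

SAMPLE_HEADER_RE = re.compile(r"^(?P<name>.+)_\((?P<dataset_id>FBlc\d+)\)$")


def split_sample_header(heading: str):
    """Return (sample_name, dataset_id), or None for a non-sample column."""
    m = SAMPLE_HEADER_RE.match(heading)
    if m is None:
        return None
    return m["name"], m["dataset_id"]


# "example_sample" and its ID are placeholders, not real FlyBase entries.
print(split_sample_header("example_sample_(FBlc0000001)"))
print(split_sample_header("gene_symbol"))
```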
This file reports summarized gene expression levels from cell clusters observed in single cell RNA-Seq experiments; these data are processed from data at the EBI Single Cell Expression Atlas. The "Mean_Expression" is the average level of expression of the gene across all cells of the cluster in which the gene is detected at all; the "Spread" is the proportion of cells in the cluster in which the gene is detected. Please see the dataset reports for more experimental details and for links to other data repositories for raw and alternatively processed data.
File format:
| Column heading | Content Description |
|---|---|
| Pub_ID | The FlyBase FBrf ID for the reference in which the expression was reported. |
| Pub_miniref | The FlyBase citation for the publication in which the expression was reported. |
| Clustering_Analysis_ID | The FlyBase FBlc ID for the dataset representing the clustering analysis. |
| Clustering_Analysis_Name | The FlyBase name for the dataset representing the clustering analysis. |
| Source_Tissue_Sex | The sex of the source tissue used for the experiment: male, female or mixed. |
| Source_Tissue_Stage | The life stage of the source tissue used for the experiment, using only high-level terms: embryonic stage, larval stage, pupal stage, adult stage or mixed. |
| Source_Tissue_Anatomy | The anatomical region of the source tissue used for the experiment; only "mixed" is shown if many |
| Cluster_ID | The FlyBase FBlc ID for the dataset representing the cell cluster. |
| Cluster_Name | The FlyBase name for the dataset representing the cell cluster. |
| Cluster_Cell_Type_ID | The FlyBase FBbt ID for the cell type represented by the cell cluster. |
| Cluster_Cell_Type_Name | The FlyBase name for the cell type represented by the cell cluster. |
| Gene_ID | The FlyBase FBgn ID for the expressed gene. |
| Gene_Symbol | The FlyBase symbol for the expressed gene (ASCII-format). |
| Mean_Expression | The average level of expression of the gene across all cells of the cluster in which the gene is detected at all. |
| Spread | The proportion of cells in the cluster in which the gene is detected. |
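The two summary statistics can be illustrated on a list of per-cell expression values for one gene in one cluster. This sketch assumes that "detected" means a value greater than zero, which is an interpretation rather than something stated by the source pipeline; the function name is hypothetical.

```python
def summarize_cluster(values):
    """Return (mean_expression, spread) for one gene in one cell cluster.

    mean_expression averages only the cells in which the gene is detected;
    spread is the fraction of all cells in which it is detected.
    """
    if not values:
        raise ValueError("the cluster contains no cells")
    detected = [v for v in values if v > 0]
    spread = len(detected) / len(values)
    mean_expression = sum(detected) / len(detected) if detected else 0.0
    return mean_expression, spread


# Four cells, two of which express the gene.
print(summarize_cluster([0.0, 0.0, 2.0, 4.0]))  # (3.0, 0.5)
```

Note that the mean is not diluted by non-expressing cells, so a gene can have a high "Mean_Expression" and a low "Spread" in the same cluster.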
This file contains the metadata describing the single-cell RNA sequencing datasets. It is associated with the scRNA-Seq_gene_expression_fb_*.tsv.gz file, which contains the actual expression data from those datasets.
This is a JSON file whose structure follows a specific LinkML schema, which is provided in the accompanying scRNAseq_metadata_schema.yaml file. Please refer to that schema for the complete details.
Briefly, a dataset is represented by a Dataset object which contains one or several Sample object(s) and optionally a dataset-wide Analysis object representing the clustering analysis performed on all the samples. Each Sample object may in turn contain sub-Sample objects and a sample-wide Analysis object. Each Analysis object contains one Cluster object for each of the identified clusters found by the clustering analysis.
The id and symbol fields in Analysis objects correspond to the Clustering_Analysis_ID and Clustering_Analysis_Name columns, respectively, in the scRNA-Seq_gene_expression_fb_*.tsv.gz file. Likewise, the id and symbol fields in Cluster objects correspond to the Cluster_ID and Cluster_Name columns in that same file.
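Since the nested structure is defined by the schema file, a cautious way to extract identifiers is to walk the JSON without relying on the names of the container fields. The sketch below collects every object that has both an `id` and a `symbol` field, which are the only field names given above. It will therefore also pick up any Dataset or Sample objects that carry those fields. The keys `samples`, `analysis` and `clusters` in the usage example, and all IDs and symbols in it, are invented placeholders; consult the schema for the real names.

```python
import json


def collect_id_symbol(node, found=None):
    """Recursively map id -> symbol for every object that has both fields."""
    if found is None:
        found = {}
    if isinstance(node, dict):
        if "id" in node and "symbol" in node:
            found[node["id"]] = node["symbol"]
        for value in node.values():
            collect_id_symbol(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_id_symbol(item, found)
    return found


# Hypothetical document; the container key names are NOT taken from the schema.
example = json.loads("""
{"id": "FBlc0000001", "symbol": "dataset_x",
 "samples": [{"id": "FBlc0000002", "symbol": "sample_x",
              "analysis": {"id": "FBlc0000003", "symbol": "clustering_x",
                           "clusters": [{"id": "FBlc0000004", "symbol": "cluster_x"}]}}]}
""")
print(collect_id_symbol(example))
```

The resulting dictionary can be joined to the expression file through its Clustering_Analysis_ID and Cluster_ID columns.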
This file provides the data used to generate the “Fly Cell Atlas Cell Type Expression Data” bar chart displayed on our Gene Report pages. For each gene that was found expressed in the Fly Cell Atlas dataset, it provides the mean expression level and the proportion of positive cells in the same 22 high level cell types displayed in the aforementioned bar chart. These data are calculated from FlyCellAtlas scRNA-Seq data for higher resolution cell clusters (having more detailed cell type classifications). For more detailed FlyCellAtlas data, and other scRNA-Seq data, please see the "Single Cell RNA-Seq Gene Expression" file.
NOTE: Not yet available; coming in the FB2023_06 release.
File format:
| Column heading | Content Description |
|---|---|
| gene_id | The unique FlyBase gene ID for this gene. |
| gene_Symbol | The official FlyBase symbol for this gene. |
| <cell_type> | Two colon-separated values: the mean expression level of the gene in <cell_type>, and the proportion of <cell_type> expressing the gene (percent). |
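Each cell-type field therefore needs to be split before use. A minimal sketch, assuming both parts are plain decimal numbers and that an empty field means no value was reported (the function name is hypothetical, and the numbers in the usage line are made up):

```python
def parse_cell_type_value(field: str):
    """Split "mean:percent" into (mean_expression, percent_expressing)."""
    if not field.strip():
        return None  # assumption: empty field means no data
    mean_text, percent_text = field.split(":", 1)
    return float(mean_text), float(percent_text)


print(parse_cell_type_value("1.5:20"))
```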
This file reports most high-throughput gene expression data that is featured in the High-Throughput Expression Data section of the FlyBase gene report. Data is sorted first by the expression section in which the dataset is displayed, then by sample ID, then by gene ID. Additional information about the dataset or the sample can be obtained by searching FlyBase with the appropriate FBlc dataset/sample ID (columns 2 and 4). Note that scRNA-Seq data is not included in this file, as it is structured differently; scRNA-Seq data is available in other download files. This file includes the testis specificity index score, as calculated by Vedelek et al. (2018).
File format:
| Column heading | Content Description |
|---|---|
| <High_Throughput_Expression_Section> | The name of the Gene report High-Throughput Expression Data section in which the data is reported. |
| <Dataset_ID> | The FBlc ID of the dataset. |
| <Dataset_Name> | The name of the dataset. |
| <Sample_ID> | The FBlc ID of the sample. |
| <Sample_Name> | The name of the sample. |
| <Gene_ID> | The FBgn ID of the gene. |
| <Gene_Symbol> | The gene symbol. |
| <Expression_Unit> | The unit of expression: e.g., RPKM, RPMM, TPM, LFQ_geom_mean_intensity, testis_specificity_index_score. |
| <Expression_Value> | The gene expression value. |
This file reports each individual experiment curated by FlyBase that supports a physical interaction between two gene products. There can be multiple experiments (multiple rows in the file) between products of the same gene pair. Interaction molecule types currently curated are protein-protein, protein-RNA or RNA-RNA.
This file is in PSI-MI TAB format, a tab-delimited format developed by the HUPO Proteomics Standards Initiative (PSI) Molecular Interactions (MI) working group to facilitate interactomics data comparison and exchange. Details on the general MITAB format can be found here. The file makes use of the Molecular Interactions ontology which can be searched or browsed here. Fields are filled with “-” if values are missing or not relevant.
File format:
| Column number | Column heading | General format | FlyBase example | Content description |
|---|---|---|---|---|
| 1 | ID(s) Interactor A | database:identifier | flybase:FBgn0002121 | The unique FlyBase identifier for the first gene of the interacting pair. |
| 2 | ID(s) Interactor B | ” | ” | The unique FlyBase identifier for the second gene of the interacting pair. |
| 3 | Alt ID(s) Interactor A | database:identifier | flybase:CG2671|entrez gene/locuslink:33156 | The alternative gene identifiers currently provided are FlyBase annotation IDs (CG#) and NCBI’s Entrez Gene ID separated by “|”. |
| 4 | Alt ID(s) Interactor B | ” | ” | ” |
| 5 | Alias(es) Interactor A | database:name(alias type) | flybase:l(2)gl(gene name) | The official FlyBase gene symbol. It is referred to as “gene name” to adhere to the psi-mi ontology. |
| 6 | Alias(es) Interactor B | ” | ” | ” |
| 7 | Interaction Detection Method(s) | ontology:identifier(method name) | psi-mi:"MI:0006"(anti bait coimmunoprecipitation) | The assay used to detect the interaction, taken from the psi-mi ontology. |
| 8 | Publication 1st Author(s) | surname initial(s) (publication year) | Betschinger K. (2003) | The first author and year of the publication where the interaction is described. |
| 9 | Publication ID(s) | database:identifier | flybase:FBrf0157155|pubmed:12629552 | The unique FlyBase identifier for the publication followed by the unique PubMed identifier (if there is one) separated by “|”. |
| 10 | Taxid Interactor A | taxid:identifier | taxid:7227("Drosophila melanogaster") | The NCBI taxonomy identifier for the source organism of the interactor. The vast majority of interactors in FlyBase come from D. melanogaster. There are, however, a few interspecies interactions consisting of a D. melanogaster interactor and an interactor of a different species. |
| 11 | Taxid Interactor B | ” | ” | ” |
| 12 | Interaction Type(s) | ontology:identifier(interaction type) | psi-mi:"MI:0915"(physical association) | Taken from the psi-mi ontology. Most often “physical association” for FlyBase. |
| 13 | Source Database(s) | ontology:identifier(database name) | psi-mi:"MI:0478"(flybase) | All interactions are curated by FlyBase. |
| 14 | Interaction Identifier(s) | database:identifier | flybase:FBrf0157155-13.coIP.WB | The unique FlyBase identifier for this interaction. |
| 15 | Confidence Value(s) | Not applicable | ||
| 16 | Expansion Method(s) | Not applicable | ||
| 17 | Biological Role(s) Interactor A | Not applicable | ||
| 18 | Biological Role(s) Interactor B | Not applicable | ||
| 19 | Experimental Role(s) Interactor A | ontology:identifier(experimental role name) | psi-mi:"MI:0496"(bait) | The role played by the interactor in the experiment. Taken from the psi-mi ontology. |
| 20 | Experimental Role(s) Interactor B | ” | ” | ” |
| 21 | Type(s) Interactor A | ontology:identifier(interactor type name) | psi-mi:"MI:0326"(protein) | The molecule type. For FlyBase, these are limited to protein or ribonucleic acid. Taken from the psi-mi ontology. |
| 22 | Type(s) Interactor B | ” | ” | ” |
| 23 | Xref(s) Interactor A | Not applicable | ||
| 24 | Xref(s) Interactor B | Not applicable | ||
| 25 | Interaction Xref(s) | database:identifier | flybase:FBig0000000103 | Cross references for the interactions. For FlyBase, these include an interaction group identifier (FBig) and possibly a collection identifier (FBlc) separated by “|”. All experiments that show an interaction between the products of gene A and gene B are compiled into an A-B interaction group, such that all interactions are associated with an interaction group identified by an FBig number. Interactions identified as part of a large scale study are also associated with the collection identifier, or FBlc number. |
| 26 | Annotation(s) Interactor A | topic:text | isoform-comment:a isoform | Information on whether the interaction is specific to a particular interactor isoform. |
| 27 | Annotation(s) Interactor B | ” | ” | ” |
| 28 | Interaction Annotation(s) | topic:text | comment:Phosphorylated isoforms of @l(2)gl@ are absent when @aPKC@ is knocked down by RNAi. | Describes the source(s) of the interaction participants and includes free text comments about the interaction. |
| 29 | Host Organism(s) | Not applicable | ||
| 30 | Interaction Parameters | Not applicable | ||
| 31 | Creation Date | Not applicable | ||
| 32 | Update Date | Not applicable | ||
| 33 | Checksum Interactor A | Not applicable | ||
| 34 | Checksum Interactor B | Not applicable | ||
| 35 | Interaction Checksum | Not applicable | ||
| 36 | Negative | FALSE | All interactions in FlyBase are positive. | |
| 37 | Feature(s) Interactor A | feature_type:range(text) | sufficient binding region:aa 1-58(N-terminal region) | Describes features of Interactor A such as binding sites, mutations that disrupt the interaction, epitope tags, etc. |
| 38 | Feature(s) Interactor B | ” | ” | ” |
| 39 | Stoichiometry Interactor A | Not applicable | ||
| 40 | Stoichiometry Interactor B | Not applicable | ||
| 41 | Identification Method(s) Participant A | Not applicable | ||
| 42 | Identification Method(s) Participant B | Not applicable | ||
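The field formats in the table above ("database:identifier", optionally followed by a label in parentheses, with multiple values separated by "|") can be parsed with one regular expression. This is a sketch based only on the FlyBase examples shown in the table, not a full PSI-MI TAB parser: it does not handle "|" characters inside quoted or free-text values, and `parse_mitab_field` is a hypothetical name.

```python
import re

# db : identifier (optionally quoted) [ (label) ]
ENTRY_RE = re.compile(
    r'^(?P<db>[^:]+):(?P<id>"[^"]*"|.+?)(?:\((?P<label>[^()]*)\))?$'
)


def parse_mitab_field(field: str):
    """Parse one MITAB column value into a list of {db, id, label} dicts."""
    if field in ("", "-"):
        return []  # "-" marks a missing or irrelevant value
    entries = []
    for part in field.split("|"):
        m = ENTRY_RE.match(part)
        if m is None:
            entries.append({"db": None, "id": part, "label": None})
            continue
        label = m["label"]
        entries.append({
            "db": m["db"],
            "id": m["id"].strip('"'),
            "label": label.strip('"') if label else None,
        })
    return entries


print(parse_mitab_field("flybase:CG2671|entrez gene/locuslink:33156"))
print(parse_mitab_field('psi-mi:"MI:0006"(anti bait coimmunoprecipitation)'))
print(parse_mitab_field("flybase:l(2)gl(gene name)"))
```

Only the final parenthesised group is treated as the label, so gene symbols that themselves contain parentheses, such as l(2)gl, are kept intact.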
This file reports when functional complementation of Dmel genes by non-Dmel orthologs has been observed. This data is computed by FlyBase using a combination of the orthology data obtained from DIOPT and OrthoDB and the allele-level genetic interaction data curated from the literature. The file contains a list of gene pairs, each consisting of a Dmel gene and a non-Dmel ortholog, where a transgenic construct/mutant allele of the non-Dmel ortholog has been shown to at least partially suppress mutant phenotype(s) of an allele of the Dmel gene.
File format:
| Column number | Column heading | Content Description |
|---|---|---|
| 1 | Dmel gene (symbol) | Current FlyBase symbol of Dmel gene. |
| 2 | Dmel gene (FBgn) | Current FlyBase identifier (FBgn#) of Dmel gene in column 1. |
| 3 | Functionally complementing ortholog (symbol) | Current FlyBase symbol of a non-Dmel ortholog of the Dmel gene in column 1 where this non-Dmel gene has been shown to functionally complement the Dmel gene. |
| 4 | Functionally complementing ortholog (FBgn#) | Current FlyBase identifier (FBgn#) of a non-Dmel ortholog of the Dmel gene in column 1 where this non-Dmel gene has been shown to functionally complement the Dmel gene. |
| 5 | Supporting_FBrf | Current FlyBase identifier (FBrf#) of the publication that provides support for the functional complementation statement (the publication that reported the suppression of a mutant phenotype of the Dmel gene by a transgenic construct/mutant allele of the non-Dmel ortholog). |
Notes:
The file reports EMBL/GenBank/DDBJ nucleotide and protein accessions, UniProtKB/SwissProt/TrEMBL protein accessions, NCBI Entrez gene IDs and NCBI RefSeq transcript and protein accessions associated with FlyBase genes.
The file includes:
it excludes:
File format:
| Column number | Column heading | Content Description |
|---|---|---|
| 1 | gene_symbol | Current symbol of gene. |
| 2 | organism_abbreviation | Abbreviation (from the Species Abbreviations list) indicating the species of origin of the gene. |
| 3 | primary_FBgn# | Current FlyBase identifier (FBgn#) of gene. |
| 4 | nucleotide_accession | EMBL/GenBank/DDBJ nucleotide accession associated with the gene. |
| 5 | na_based_protein_accession | EMBL/GenBank/DDBJ protein accession associated with the gene and the nucleotide accession in the preceding 'nucleotide_accession' column. |
| 6 | UniprotKB/Swiss-Prot/TrEMBL_accession | UniProtKB/SwissProt/TrEMBL protein accession associated with the gene. |
| 7 | EntrezGene_ID | NCBI Entrez ID associated with the gene. |
| 8 | RefSeq_transcripts | NCBI RefSeq transcript accession associated with the gene. |
| 9 | RefSeq_proteins | NCBI RefSeq protein accession associated with the gene and the transcript accession in the preceding 'RefSeq_transcripts' column. |
Notes:
The file reports current and secondary FlyBase identifiers associated with D. melanogaster genes, including current and secondary gene identifiers (FBgn#), and current and secondary annotation identifiers (CG#).
The file includes:
it excludes:
File format:
| Column heading | Content Description |
|---|---|
| gene_symbol | Current symbol of gene. |
| organism_abbreviation | Abbreviation (from the Species Abbreviations list) indicating the species of origin of the gene. |
| primary_FBgn# | Current FlyBase identifier (FBgn#) of gene. |
| secondary_FBgn#(s) | Secondary FlyBase identifier(s) (FBgn#) associated with the gene (comma separated values). |
| annotation_ID | Current annotation identifier associated with the gene. |
| secondary_annotation_ID(s) | Secondary annotation identifier(s) associated with the gene (comma separated values). |
Notes:
This file reports the relationship between the symbols and gene identifiers used by FlyBase for non-melanogaster genes identified by the AAA consortium, and the GLEANR identifier assigned to the gene during the initial annotation of the genome sequence.
The file includes:
it excludes:
File format:
| Column heading | Content Description |
|---|---|
| organism_abbreviation | Abbreviation (from the Species Abbreviations list) indicating the species of origin of the gene. |
| gene_symbol | Current FlyBase gene symbol. |
| primary_FBgn# | Current FlyBase identifier (FBgn#) of the gene. |
| GLEANR_ID | GLEANR identifier assigned by the AAA Consortium. |
This file reports the relationship of gene identifiers used by FlyBase for sequence localized D. melanogaster genes, and the identifiers used for the transcript and polypeptide products of these genes.
The file includes:
it excludes:
File format:
| Column heading | Content Description |
|---|---|
| FlyBase_FBgn | Current FlyBase identifier (FBgn#) of the gene. |
| FlyBase_FBtr | Current FlyBase identifier (FBtr#) of a transcript encoded by the gene listed in the preceding 'FlyBase_FBgn' column. |
| FlyBase_FBpp | Current FlyBase identifier (FBpp#) of a polypeptide encoded by the transcript listed in the preceding 'FlyBase_FBtr' column, where this is relevant. |
Notes:
This expanded version of the "FBgn <=> FBtr <=> FBpp IDs" file adds organism, symbol and type information to the identifiers for sequence localized D. melanogaster genes and their related transcript and protein products.
The file includes:
it excludes:
File format:
| Column number | Column heading | Content Description |
|---|---|---|
| 1 | organism | Abbreviation (from the Species Abbreviations list) indicating the species of origin of the gene. |
| 2 | gene_type | The type of gene, represented by a Sequence Ontology term. |
| 3 | gene_ID | Current "FBgn" identifier of gene. |
| 4 | gene_symbol | Current symbol of the gene. |
| 5 | gene_fullname | Current full name of the gene. |
| 6 | annotation_ID | Current FlyBase annotation identifier of the gene. |
| 7 | transcript_type | The type of transcript, represented by a Sequence Ontology term. |
| 8 | transcript_ID | Current FlyBase annotation identifier of the transcript. |
| 9 | transcript_symbol | Current symbol of the transcript. |
| 10 | polypeptide_ID | Current FlyBase annotation identifier of the polypeptide. |
| 11 | polypeptide_symbol | Current symbol of the polypeptide. |
Notes:
The file is generated by testing for overlaps, no matter how small, of the locations of Affy1 oligos in the genome with the locations of gene exons, as defined by the Dmel gene models for the current release of FlyBase. If the location of an Affy1 oligo shows any kind of overlap with an exon of a gene, a Gene=>Affy reference is recorded in this file.
The extent of the overlap has no influence on the inclusion of a crossreference in this file. The overlap might be just one nucleotide, or it could be an exact match to the exon. For interpretation of the significance of a partial overlap please contact Affymetrix.
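The "any overlap, however small" rule corresponds to a plain interval intersection test. The sketch below assumes 1-based, inclusive coordinates and ignores strand; neither detail is stated above, and the tuple layouts, function names and gene IDs are hypothetical placeholders.

```python
def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if two inclusive intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def genes_hit_by_oligo(oligo, exons):
    """Return the sorted gene IDs whose exons overlap the oligo at all.

    oligo: (chromosome, start, end)
    exons: iterable of (gene_id, chromosome, start, end)
    """
    chrom, start, end = oligo
    return sorted({
        gene_id
        for gene_id, exon_chrom, exon_start, exon_end in exons
        if exon_chrom == chrom and overlaps(start, end, exon_start, exon_end)
    })


exons = [
    ("gene_A", "2L", 124, 300),  # shares exactly one nucleotide with the oligo
    ("gene_B", "2L", 125, 300),  # starts just past the oligo
    ("gene_C", "3R", 100, 124),  # same coordinates, different chromosome
]
print(genes_hit_by_oligo(("2L", 100, 124), exons))  # ['gene_A']
```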
The file includes the following Dmel genes:
it excludes:
Notes:
The file is generated from the location of Affy2 oligos exactly as described for Affy1 oligos above.
This file provides SO term annotations for D. melanogaster genes that have been mapped to the current genome assembly. It will be available beginning with the FB2021_02 release.
File format:
| Column heading | Content Description |
|---|---|
| gene_primary_id | The unique FlyBase gene ID for this gene. |
| gene_symbol | The official FlyBase symbol for this gene. |
| so_term_name | The SO term name. |
| so_term_id | The SO term primary identifier. |
The file reports available localization information for FlyBase genes.
It includes:
File format:
| Column heading | Content Description |
|---|---|
| organism_abbreviation | Abbreviation (from the Species Abbreviations list) indicating the species of origin of the gene. |
| current_symbol | Current FlyBase gene symbol. |
| primary_FBid | Current FlyBase identifier (FBgn#) of gene. |
| recombination_loc | Recombination map location. |
| cytogenetic_loc | Cytogenetic location. |
| sequence_loc | Genomic location. |
The single best available gene summary is reported for each D. melanogaster gene (available in the FB2022_05 release).
Gene summaries are taken from the following sources, in order of decreasing rank:
For other non-D. melanogaster genes, please see FlyBase's "automated_gene_summaries.tsv.gz" file.
File format:
| Column heading | Content Description |
|---|---|
| FBgn_ID | Current FlyBase identifier number for the gene. |
| Gene_Symbol | Current FlyBase symbol of the gene. |
| Summary_Source | The source of the gene summary. |
| Summary | The gene summary text. |
The file contains the summaries found on gene report pages and the pop-ups in JBrowse and Interactions Browser in plain text.
It includes:
File format:
| Column heading | Content Description |
|---|---|
| - | FlyBase ID. The valid FlyBase identifier number for the gene. |
| - | The gene summary as a string of plain text. |
The file contains in plain text the gene snapshot information visible on gene report pages.
It includes only Dmel protein coding genes.
File format:
| Column heading | Content Description |
|---|---|
| FBgn_ID | Current FlyBase identifier number for the gene. |
| GeneSymbol | Current FlyBase symbol of the gene. |
| GeneName | Current FlyBase name of the gene. |
| datestamp | Date on which the information was last reviewed. |
| gene_snapshot_text | Gene snapshot information for the gene. Cases that are in progress or are deemed to have insufficient data to summarize are stated as such. |
The file reports D. melanogaster genes and their unique protein isoforms.
The file includes:
it excludes:
File format:
| Column heading | Content Description |
|---|---|
| FBgn | Current FlyBase identifier (FBgn#) of the gene. |
| FB_gene_symbol | Current FlyBase gene symbol of the gene. |
| representative_protein | Current FlyBase protein symbol of the representative protein isoform. |
| identical_protein(s) | Current FlyBase protein symbol(s) of identical protein isoforms. |
This file reports all ncRNAs with gene models supported by FlyBase in JSON format, as submitted to RNAcentral. Pseudogenes are excluded. In addition to the symbols and IDs for ncRNAs, this file also includes their associated gene, genomic location, sequence, Sequence Ontology classification, etc. The full schema for this file is available here.
Note - from release FB2020_03 onward, this file reports only ncRNAs for D. melanogaster; earlier files include ncRNAs for D. ananassae, D. pseudoobscura pseudoobscura, D. simulans and D. virilis.
This file reports nomenclature and functional data (GO annotations, EC annotations, gene group membership) for D. melanogaster genes encoding enzymes, as defined by membership of the ENZYMES (FBgg0001715) gene group. If a gene is a member of multiple enzyme gene groups, then that gene has separate entries for each group of which it is a member.
The file includes:
it excludes:
File format:
| Column heading | Content Description |
|---|---|
| group_id | FlyBase gene group (FBgg) ID of the relevant terminal group within the ENZYMES (FBgg0001715) hierarchy (only terminal groups contain members). |
| group_name | FlyBase gene group (FBgg) name of relevant terminal group within the ENZYMES (FBgg0001715) hierarchy (only terminal groups contain members). |
| group_GO_ID | The GO molecular function term ID on the given gene group. Multiple entries are separated with a pipe. |
| group_GO_name | The GO molecular function term name on the given gene group. Multiple entries are separated with a pipe. |
| group_EC_number | The EC number on the given gene group, if present. (This is computed, corresponding to the EC cross-reference on the GO molecular function term.) |
| group_EC_name | The EC name on the given gene group, if present. (This is computed, corresponding to the EC cross-reference on the GO molecular function term.) |
| gene_id | The current FlyBase gene ID (FBgn) of the gene. |
| gene_symbol | The current FlyBase symbol of the gene. |
| gene_name | The current FlyBase name of the gene. |
| gene_EC_number | The EC number(s) associated with the gene, if present. Multiple entries are separated with a pipe. (This is computed, corresponding to the EC cross-reference(s) on any positive GO molecular function term(s) annotated to the gene.) |
| gene_EC_name | The EC name(s) associated with the gene, if present. Multiple entries are separated with a pipe. (This is computed, corresponding to the EC cross-reference(s) on any positive GO molecular function term(s) annotated to the gene.) |
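Several columns in this file hold multiple pipe-separated entries. The sketch below splits such cells and pairs IDs with names by position. The positional pairing is an assumption (the table does not say that the nth ID corresponds to the nth name), and the function names and the sample GO values are for illustration only.

```python
def split_multi(cell):
    """Split a pipe-separated cell into a list; an empty cell gives an empty list."""
    return [part.strip() for part in cell.split("|") if part.strip()]

def pair_ids_with_names(id_cell, name_cell):
    """Pair the nth ID with the nth name (assumed ordering); refuse mismatched lengths."""
    ids, names = split_multi(id_cell), split_multi(name_cell)
    if len(ids) != len(names):
        raise ValueError(f"{len(ids)} IDs but {len(names)} names")
    return list(zip(ids, names))

if __name__ == "__main__":
    # Illustrative values, not copied from the file.
    print(pair_ids_with_names("GO:0004672|GO:0016301",
                              "protein kinase activity|kinase activity"))
```

Raising on a length mismatch, rather than silently truncating, makes it obvious when the positional assumption does not hold for a given row.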
This file reports D. melanogaster gene model status and any comments on the gene model, as reported in FlyBase Gene Reports.
File format:
| Column heading | Content Description |
|---|---|
| FB_id | FlyBase gene (FBgn) ID. |
| Symbol | FlyBase gene symbol. |
| Annotation status | One of: Current, Uncertain, Incomplete, Withdrawn, Unannotated, Not Applicable (see documentation here for more details). |
| Annotation_Comment | Textual comments about features or changes to the gene model, usually accompanied by the relevant annotation release number. |
This file reports certain curated relationships between D. melanogaster genes, as shown in FlyBase Gene Reports. Reciprocal relationships are included.
File format:
| Column heading | Content Description |
|---|---|
| Subject_FBgn_ID | FlyBase ID of the subject gene. |
| Relationship | Relationship between the given subject and object gene. One of: 'member_gene_of', 'has_component_gene', 'encoded_by'. See documentation here for more details. |
| Object_FB_ID | FlyBase ID of the object gene. |
| Object_Symbol | FlyBase symbol for the object gene. |
This file reports InterPro signatures found in the encoded protein(s) of D. melanogaster genes. Each row contains a single InterPro signature, so a gene whose product contains multiple signatures will be represented on multiple rows.
File format:
| Column heading | Content Description |
|---|---|
| FBgn_ID | FlyBase ID of the gene. |
| FBgn_Symbol | FlyBase symbol of the gene. |
| InterPro_Signature | ID and name of the InterPro signature (domain, superfamily, site, family), separated by a pipe. |
This file reports the 'representative publications' for a given D. melanogaster gene, as shown in the References section of a Gene Report. 'Representative publications' are those papers (up to 100) that are most likely to contain the most information on the gene, identified and scored using an algorithm that assesses the amount and type of data within FlyBase attached to each gene from each publication. See documentation here for more details.
File format:
| Column heading | Content Description |
|---|---|
| FBgn_ID | FlyBase ID of the gene. |
| Symbol | FlyBase symbol of the gene. |
| References | The FBrf and PMID of each representative publication, separated by a pipe, as a comma-separated list. A dash is used if a PMID is unavailable. |
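The References column nests two separators: commas between publications and a pipe between each FBrf and its PMID, with a dash standing in for a missing PMID. A minimal parsing sketch follows; the function name and the sample IDs are hypothetical, and tolerance of spaces around the separators is an assumption.

```python
def parse_references(cell):
    """Return a list of (FBrf, PMID) tuples; PMID is None where the file has a dash."""
    publications = []
    for entry in cell.split(","):
        entry = entry.strip()
        if not entry:
            continue
        fbrf, _, pmid = entry.partition("|")
        pmid = pmid.strip()
        publications.append((fbrf.strip(), None if pmid in ("", "-") else pmid))
    return publications

if __name__ == "__main__":
    # Made-up identifiers for illustration.
    print(parse_references("FBrf0000001|12345678,FBrf0000002|-"))
```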
Files described in this section are in the "go" subdirectory of the FTP site. Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/go/gene_association.fb.gz
The file contains the Gene Ontology (GO) controlled vocabulary (CV) terms assigned to FlyBase genes.
The file includes the following Dmel genes:
The columns of the file are described in section G.3.1. of the Reference manual.
This file maps FlyBase D. melanogaster protein-coding genes to UniProtKB IDs, as specified by the GO Consortium.
Files described in this section are in the "genes" subdirectory of the FTP site. Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/genes/gene_group_data_fb_current.tsv.gz
This file reports Gene Groups in FlyBase, together with their hierarchical relationships (where relevant) and member genes. Note that as of FB2022_06, this file no longer contains Pathway groups, which can be found in a separate file (pathway_group_data_fb_*.tsv).
File format:
| Column heading | Content Description |
|---|---|
| FB_group_id | Current FlyBase identifier (FBgg##) of Gene Group. |
| FB_group_symbol | Current FlyBase symbol of Gene Group. |
| FB_group_name | Current FlyBase full name of Gene Group. |
| Parent_FB_group_id | Current FlyBase identifier (FBgg##) of parent of given Gene Group (if relevant). |
| Parent_FB_group_symbol | Current FlyBase symbol of parent of given Gene Group (if relevant). |
| Group_member_FB_gene_id | Current FlyBase identifier (FBgn##) of member gene (if terminal group). |
| Group_member_FB_gene_symbol | Current FlyBase symbol of member gene (if terminal group). |
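Because only terminal groups list member genes, collecting every gene under a higher-level group means walking down the parent/child links. The sketch below shows one way to do that. It assumes each row carries one group, at most one parent and at most one member gene, with empty strings where a column is not relevant; the function names and all IDs are made up.

```python
from collections import defaultdict

def build_hierarchy(rows):
    """Index rows (7-column tuples, in table order) into child-group and member-gene maps."""
    children = defaultdict(set)   # parent group ID -> child group IDs
    members = defaultdict(set)    # group ID -> member gene IDs
    for row in rows:
        group_id, parent_id, gene_id = row[0], row[3], row[5]
        if parent_id:
            children[parent_id].add(group_id)
        if gene_id:
            members[group_id].add(gene_id)
    return children, members

def all_member_genes(group_id, children, members):
    """Collect genes of a group and, recursively, of all its descendant groups."""
    genes = set(members.get(group_id, ()))
    for child in children.get(group_id, ()):
        genes |= all_member_genes(child, children, members)
    return genes

# Made-up rows for illustration.
ROWS = [
    ("FBgg0000001", "TOP", "top group", "", "", "", ""),
    ("FBgg0000002", "SUB-A", "sub group a", "FBgg0000001", "TOP", "FBgn0000001", "geneA"),
    ("FBgg0000002", "SUB-A", "sub group a", "FBgg0000001", "TOP", "FBgn0000002", "geneB"),
    ("FBgg0000003", "SUB-B", "sub group b", "FBgg0000001", "TOP", "FBgn0000003", "geneC"),
]

if __name__ == "__main__":
    children, members = build_hierarchy(ROWS)
    print(sorted(all_member_genes("FBgg0000001", children, members)))
```

The same approach should apply to the pathway group files, which share this column layout.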
Notes:
This file reports all Gene Groups in FlyBase, together with the corresponding HGNC 'gene family' ID (where relevant).
File format:
| Column heading | Content Description |
|---|---|
| FB_group_id | Current FlyBase identifier (FBgg##) of Gene Group. |
| FB_group_symbol | Current FlyBase symbol of Gene Group. |
| FB_group_name | Current FlyBase full name of Gene Group. |
| HGNC_family_ID | HGNC ID of equivalent human 'gene family'. |
Notes:
Pathway group data (pathway_group_data_fb_*.tsv)
This file reports all Signaling Pathway Gene Groups in FlyBase, together with their hierarchical relationships (where relevant) and member genes.
File format:
| Column heading | Content Description |
|---|---|
| FB_group_id | Current FlyBase identifier (FBgg##) of Signaling Pathway. |
| FB_group_symbol | Current FlyBase symbol of Signaling Pathway. |
| FB_group_name | Current FlyBase full name of Signaling Pathway. |
| Parent_FB_group_id | Current FlyBase identifier (FBgg##) of parent of given Signaling Pathway (if relevant). |
| Parent_FB_group_symbol | Current FlyBase symbol of parent of given Signaling Pathway (if relevant). |
| Group_member_FB_gene_id | Current FlyBase identifier (FBgn##) of member gene (if terminal group). |
| Group_member_FB_gene_symbol | Current FlyBase symbol of member gene (if terminal group). |
Notes:
This file reports all Metabolic Pathway Gene Groups in FlyBase, together with their hierarchical relationships (where relevant) and member genes.
File format:
| Column heading | Content Description |
|---|---|
| FB_group_id | Current FlyBase identifier (FBgg##) of Metabolic Pathway. |
| FB_group_symbol | Current FlyBase symbol of Metabolic Pathway. |
| FB_group_name | Current FlyBase full name of Metabolic Pathway. |
| Parent_FB_group_id | Current FlyBase identifier (FBgg##) of parent of given Metabolic Pathway (if relevant). |
| Parent_FB_group_symbol | Current FlyBase symbol of parent of given Metabolic Pathway (if relevant). |
| Group_member_FB_gene_id | Current FlyBase identifier (FBgn##) of member gene (if terminal group). |
| Group_member_FB_gene_symbol | Current FlyBase symbol of member gene (if terminal group). |
Notes:
Files described in this section are in the "alleles" or "stocks" subdirectory of the FTP site. Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/alleles/allele_genetic_interactions_current.tsv.gz
wget https://s3ftp.flybase.org/releases/current/precomputed_files/stocks/stocks_current.tsv.gz
The chado XML file generated from the FlyBase PostgreSQL database for the 'alleles' data class.
The chado XML file generated from the FlyBase PostgreSQL database for the 'stocks' data class.
This file reports genetic components and related information about Stocks in FlyBase.
Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/stocks/stocks_current.tsv.gz
File format:
| Column heading | Content Description | Example |
|---|---|---|
| FBst | The unique identifier assigned to this stock by FlyBase. | FBst0000002 |
| collection_short_name | A short name for the stock collection that holds the stock. | Bloomington |
| stock_type_cv | The controlled vocabulary term and unique identifier that describe the state of the stock. | living stock ; FBsv:0000002 |
| species | Abbreviation (from the Species Abbreviations list) indicating the species of the stock. | Dmel |
| FB_genotype | Genetic components of the stock corresponding to alleles, aberrations, balancers, or insertions in FlyBase. May be empty. | w[*]; betaTub60D[2] Kr[If-1]/CyO |
| description | Genetic components of the stock as provided to FlyBase by the collection that holds the stock. | FlyTrap: ZCL1796 III |
| stock_number | The stock identifier provided to FlyBase by the collection that holds the stock. May be empty. | 110818 |
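The stock_type_cv column packs a term and its identifier into one cell, as in the example 'living stock ; FBsv:0000002'. A small sketch for separating the two follows. It assumes the last semicolon is always the delimiter; the function name is hypothetical.

```python
def parse_stock_type(cell):
    """Split 'term ; ID' into (term, ID); ID is None if no semicolon is present."""
    if ";" not in cell:
        return cell.strip(), None
    term, _, cv_id = cell.rpartition(";")
    return term.strip(), cv_id.strip()

if __name__ == "__main__":
    print(parse_stock_type("living stock ; FBsv:0000002"))
```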
The file reports controlled vocabulary (i.e. not free text) genetic interaction data associated with alleles. This is the data reported in the "Phenotypic Class" and "Phenotype Manifest in" subsections of the "Interactions" section of each Allele Report.
Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/alleles/allele_genetic_interactions_current.tsv.gz
File format:
| Column heading | Content Description |
|---|---|
| allele_symbol | Current FlyBase allele symbol. |
| allele_FBal# | Current FlyBase identifier (FBal#) of allele. |
| interaction | Interaction information associated with allele. |
| FBrf# | Current FlyBase identifier (FBrf#) of publication from which data came. |
Notes:
The file reports controlled vocabulary (i.e. not free text) phenotypic data associated with genotypes. This is the data reported in the "Phenotypic Class" and "Phenotype Manifest in" subsections of the "Phenotypic Data" section of each Allele Report.
Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/alleles/genotype_phenotype_data_current.tsv.gz
File format:
| Column heading | Content Description |
|---|---|
| genotype_symbols | Current FlyBase symbol(s) of the components that make up the genotype. |
| genotype_FBids | Current FlyBase identifier(s) of the components that make up the genotype. |
| phenotype_name | Phenotypic name associated with the genotype. |
| phenotype_id | Phenotypic identifier associated with the genotype. |
| qualifier_names | Qualifier name(s) associated with phenotypic data for genotype. |
| qualifier_ids | Qualifier identifier(s) associated with phenotypic data for genotype. |
| reference | Current FlyBase identifier (FBrf#) of publication from which data came. |
Notes:
* Homozygous or transheterozygous combinations of classical/insertional alleles at a single locus are separated by a '/'.
* Hemizygous combinations affecting a single locus (classical/insertional allele over a deficiency for that locus) are separated by a '/'.
* Heterozygosity for a classical/insertional allele or aberration is represented by '/+'.
* In all other cases, other genotype components (e.g. drivers, transgenic alleles) are separated by a space.
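The separator rules above can be sketched as a two-level split: spaces between genotype components, then '/' between the alleles at a single locus. This assumes that symbols themselves contain no spaces or slashes, which may not hold for every entry; the function names and the genotype string are made up.

```python
def parse_genotype(genotype_symbols):
    """Return one list per component; each list holds the alleles given for that locus."""
    return [component.split("/") for component in genotype_symbols.split()]

def is_heterozygous_over_wild_type(component):
    """True for a component written as 'allele/+'."""
    return len(component) == 2 and component[1] == "+"

if __name__ == "__main__":
    # Hypothetical genotype: a transheterozygote, a heterozygote and a lone transgene.
    for component in parse_genotype("a[1]/a[2] b[3]/+ c[4]"):
        print(component, is_heterozygous_over_wild_type(component))
```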
This file reports the relationship between gene identifiers and the identifiers used for alleles of these genes.
Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/alleles/fbal_to_fbgn_current.tsv.gz
File format:
| Column heading | Content Description |
|---|---|
| AlleleID | Current FlyBase identifier (FBal#) of the allele. |
| AlleleSymbol | Current symbol of the allele. |
| GeneID | Current FlyBase identifier (FBgn#) of the gene. |
| GeneSymbol | Current symbol of the gene. |
This file includes information for classical alleles, alleles associated with insertion(s), and alleles generated via CRISPR/Cas9 mutagenesis (which may include inserted sequence), for D. melanogaster genes only.
Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/alleles/dmel_classical_and_insertion_allele_descriptions_current.tsv.gz
File format:
| Column heading | Content Description |
|---|---|
| Allele (symbol) | Current FlyBase symbol of the allele. |
| Allele (id) | Current FlyBase identifier (FBal#) of the allele. |
| Gene (symbol) | Current FlyBase symbol of the parent gene of the allele. |
| Gene (id) | Current FlyBase identifier (FBgn#) of the parent gene of the allele. |
| Allele Class (term) | Allele class term(s) associated with the allele. |
| Allele Class (id) | Identifier(s) (FBcv#) corresponding to the terms in the Allele Class (term) column. |
| Insertion (symbol) | Current FlyBase symbol of any insertion(s) associated with the allele. The inserted element may be a natural transposable element, a transgenic construct or sequence inserted via CRISPR/Cas9 mutagenesis. |
| Insertion (id) | FlyBase identifier(s) (FBti#) corresponding to the symbols in the Insertion (symbol) column. |
| Inserted element type (term) | Term(s) that describe the type (e.g. enhancer trap, mis-expression element) of the insertion(s) associated with the allele. |
| Inserted element type (id) | Identifier(s) (FBcv#) corresponding to the terms in the Inserted element type (term) column. |
| Regulatory region (symbol) | Current FlyBase symbol of any regulatory region(s) present in the inserted element. |
| Regulatory region (id) | FlyBase identifier(s) corresponding to the symbols in the Regulatory region (symbol) column. |
| Encoded product/tool (symbol) | Current FlyBase symbol of any experimental tools encoded by the inserted element, where the tool is expected to be expressed as a separate product from the endogenous D. melanogaster gene affected by the insertion. |
| Encoded product/tool (id) | FlyBase identifier(s) (FBto#) corresponding to the symbols in the Encoded product/tool (symbol) column. |
| Tagged with (symbol) | Current FlyBase symbol of any experimental tools encoded by the inserted element, where the tool sequence is expected to "tag" the product of the endogenous D. melanogaster gene affected by the insertion. |
| Tagged with (id) | FlyBase identifier(s) (FBto#) corresponding to the symbols in the Tagged with (symbol) column. |
| Also carries (symbol) | Current FlyBase symbol of any experimental tools that are carried within the inserted element, where the function of the tool sequence does not depend on it being expressed (e.g. FRT site, attP site). |
| Also carries (id) | FlyBase identifier(s) (FBto#) corresponding to the symbols in the Also carries (symbol) column. |
| Description (text) | Free text description of the allele. |
| Description (supporting reference) | FlyBase identifier (FBrf#) of the source reference for the free text description. |
| Stocks (number) | Number of stocks that contain the allele in the Allele (symbol) column. |
Notes:
e.g.
* for the Allele Class (term) and Allele Class (id) columns.
* for the Tagged with (symbol) and Tagged with (id) columns.
* for the Description (text) and Description (supporting reference) columns.
Example:
Mps1[ald-1] A missense mutation.|Amino acid replacement: R7H. FBrf0182837|FBrf0187308+FBrf0200438
indicates that:
* Inserted element type (term)
* Inserted element type (id)
* Regulatory region (symbol)
* Regulatory region (id)
* Encoded product/tool (symbol)
* Encoded product/tool (id)
* Tagged with (symbol)
* Tagged with (id)
* Also carries (symbol)
* Also carries (id)
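One reading of the Mps1[ald-1] example is that '|' separates entries within a column, that the nth description is supported by the nth reference entry, and that '+' joins several references supporting the same description. The sketch below implements that reading. It is an interpretation of the example, not a documented rule, and the function name is hypothetical.

```python
def pair_descriptions(text_cell, reference_cell):
    """Pair each description with its list of supporting FBrf IDs (positional pairing assumed)."""
    texts = text_cell.split("|")
    reference_groups = [group.split("+") for group in reference_cell.split("|")]
    if len(texts) != len(reference_groups):
        raise ValueError("description and reference columns have different entry counts")
    return list(zip(texts, reference_groups))

if __name__ == "__main__":
    pairs = pair_descriptions(
        "A missense mutation.|Amino acid replacement: R7H.",
        "FBrf0182837|FBrf0187308+FBrf0200438",
    )
    for text, references in pairs:
        print(text, "<-", ", ".join(references))
```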
This file includes information for split system combinations. It is located in the "alleles" subdirectory of the FTP site.
Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/alleles/split_system_combinations_current.tsv.gz
File format:
| Column heading | Content Description |
|---|---|
| FB_id | The Primary FlyBase identifier (FBco#), used to uniquely identify the split system combination in the database. |
| Symbol | The valid symbol that is used in FlyBase for the combination. |
| Component_Alleles | The FlyBase alleles comprising the given combination. |
| Stocks | A list of experimental lines that include this combination and which are available for order from public stock centers. |
| Synonyms | A list of symbols that have been used in the literature, or by FlyBase, to describe the combination. |
| References | A list of publications from which expression pattern data for the combination has been curated. |
Notes:
Multiple entries in a column are separated by a pipe '|'.
For the Component_Alleles column:
For the Stocks column:
Files described in this section are in the "orthologs" subdirectory of the FTP site. Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/orthologs/dmel_paralogs_fb_current.tsv.gz
The file reports D. melanogaster genes and their paralogs, as provided by DIOPT. (The version of DIOPT currently being used is shown in the 'Paralogs' -> 'Paralogs (via DIOPT)' section of a Gene Report.)
File format:
| Column heading | Content Description |
|---|---|
| FBgn_ID | Current FlyBase identifier (FBgn#) of the D. melanogaster gene. |
| GeneSymbol | Current FlyBase gene symbol of the D. melanogaster gene. |
| Arm/Scaffold | Arm upon which the D. melanogaster gene is localized. |
| Location | Location of D. melanogaster gene on the arm. |
| Strand | Strand of D. melanogaster gene ('1' indicates the positive strand, '-1' indicates the negative strand). |
| Paralog_FBgn_ID | Current FlyBase identifier (FBgn#) of the paralogous gene. |
| Paralog_GeneSymbol | Current FlyBase gene symbol of the paralogous gene. |
| Paralog_Arm/Scaffold | Arm upon which the paralogous gene is localized. |
| Paralog_Location | Location of paralogous gene on the arm. |
| Paralog_Strand | Strand of paralogous gene ('1' indicates the positive strand, '-1' indicates the negative strand). |
| DIOPT_score | DIOPT 'score' for the paralog call (i.e. the number of individual algorithms that support the call). |
Notes:
This file reports the human orthologs of D. melanogaster genes using the DIOPT dataset. Each line reports a single orthologous pair, which means that each human and D. melanogaster gene can appear in multiple lines. Note that ortholog calls supported by only 1 or 2 algorithms (DIOPT score <3) have been removed. Human genes are also associated with diseases (OMIM phenotypes) using the OMIM dataset.
File format:
| Column heading | Content Description |
|---|---|
| Dmel_gene_ID | Current FlyBase identifier (FBgn#) of the D. melanogaster gene. |
| Dmel_gene_symbol | Current FlyBase gene symbol of the D. melanogaster gene. |
| Human_gene_HGNC_ID | HGNC ID of orthologous human gene. |
| Human_gene_OMIM_ID | OMIM ID of orthologous human gene. |
| Human_gene_symbol | HGNC gene symbol of orthologous human gene. |
| DIOPT_score | DIOPT 'score' for orthology call (i.e. the number of individual algorithms that support the call). |
| OMIM_Phenotype_IDs | OMIM Phenotype ID of orthologous human gene (comma separated values). |
| OMIM_Phenotype_IDs[name] | OMIM Phenotype ID of orthologous human gene (with the corresponding OMIM name in square brackets). Multiple phenotype[name] entries are separated by a comma. |
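Since phenotype names can themselves contain commas, splitting the OMIM_Phenotype_IDs[name] column on commas alone may break entries apart. The sketch below uses a regular expression that treats each `ID[name]` unit as a whole. It assumes that IDs contain no commas, brackets or whitespace and that names contain no ']'; the function name and sample values are made up.

```python
import re

# One entry looks like ID[name]; the name may contain commas but not ']'.
PHENOTYPE = re.compile(r"([^,\[\]\s]+)\[([^\]]*)\]")

def parse_omim_phenotypes(cell):
    """Return a list of (phenotype ID, phenotype name) tuples from one cell."""
    return [(phenotype_id, name.strip()) for phenotype_id, name in PHENOTYPE.findall(cell)]

if __name__ == "__main__":
    # Made-up IDs and names for illustration.
    print(parse_omim_phenotypes("100001[example disorder, type 1],100002[another example disorder]"))
```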
Files described in this section are in the "human_disease" or "orthologs" subdirectory of the FTP site. Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/human_disease/disease_model_annotations_fb_current.tsv.gz
wget https://s3ftp.flybase.org/releases/current/precomputed_files/human_disease/human_disease_models_fb_current.tsv.gz
wget https://s3ftp.flybase.org/releases/current/precomputed_files/orthologs/dmel_human_orthologs_disease_fb_current.tsv.gz
wget https://s3ftp.flybase.org/releases/current/precomputed_files/human_disease/disease_implicated_variants_current.tsv
This file reports (i) all experimental-based disease model annotations (using DO IDs, from Disease Ontology), associated with alleles; and (ii) all 'potential' disease models based on orthology to human disease genes in OMIM (see FBrf0241599 for more information on this pipeline) for D. melanogaster. 'Alleles' encompass both classical alleles and transgenic alleles; the latter may relate to transgenic constructs of D. melanogaster genes or non-D. melanogaster genes (often human genes) inserted into the D. melanogaster genome. These disease model annotations are reported in the "Human Disease Model Data" -> "Disease Ontology (DO) Annotations" section of the Gene and Allele Reports.
File format:
| Column heading | Content Description |
|---|---|
| FBgn ID | Current FlyBase identifier (FBgn#) of the gene associated with the allele of an experimental annotation, or the D. melanogaster ortholog of a human gene associated with a disease in OMIM. |
| Gene symbol | Current FlyBase symbol of the gene in column 1. |
| HGNC ID | HGNC ID of the gene identified in column 1 where it is a human gene (experimental-based annotations only). |
| DO qualifier | Type of association between the object of annotation and the disease - one of 'model of', 'ameliorates', 'exacerbates', 'DOES NOT model', 'DOES NOT ameliorate' or 'DOES NOT exacerbate'. |
| DO ID | Disease Ontology (DO) ID. |
| DO term | Disease Ontology (DO) term. |
| Allele used in model (FBal ID) | Current FlyBase identifier (FBal#) of allele (experimental-based annotations only). |
| Allele used in model (symbol) | Current FlyBase symbol of allele (experimental-based annotations only). |
| Based on orthology with (HGNC ID) | HGNC ID of the human ortholog used for annotations based on orthology to human disease genes. |
| Based on orthology with (symbol) | HGNC gene symbol of the human ortholog used for annotations based on orthology to human disease genes. |
| Evidence/interacting alleles | Evidence code, with interacting allele(s) where appropriate. For experimental-based annotations, the evidence code is one of: 'inferred from mutant phenotype', 'in combination with', 'modeled by', 'is ameliorated by', 'is exacerbated by', 'is NOT ameliorated by' or 'is NOT exacerbated by'. Interacting alleles are given as 'FLYBASE:<allele_symbol>; FB:<FBal_ID>', with multiple alleles separated by a comma. For orthology-based annotations, the evidence code is 'inferred from electronic annotation'. |
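The 'Evidence/interacting alleles' column combines an evidence code with zero or more alleles in the 'FLYBASE:<allele_symbol>; FB:<FBal_ID>' form. A hedged sketch for pulling the two apart is shown below. It assumes the evidence code appears at the start of the cell, which the table does not state; the function name and the allele symbols and IDs in the example are made up.

```python
import re

# Evidence codes as listed in the table above.
EVIDENCE_CODES = [
    "inferred from mutant phenotype", "in combination with", "modeled by",
    "is ameliorated by", "is exacerbated by", "is NOT ameliorated by",
    "is NOT exacerbated by", "inferred from electronic annotation",
]

# Matches one 'FLYBASE:<allele_symbol>; FB:<FBal_ID>' unit.
ALLELE = re.compile(r"FLYBASE:(.+?); FB:(FBal\d+)")

def parse_evidence(cell):
    """Return (evidence code or None, list of (allele symbol, FBal ID))."""
    text = cell.strip()
    code = next((c for c in EVIDENCE_CODES if text.startswith(c)), None)
    return code, ALLELE.findall(text)

if __name__ == "__main__":
    print(parse_evidence(
        "is exacerbated by FLYBASE:abc[1]; FB:FBal0000001, FLYBASE:def[2]; FB:FBal0000002"
    ))
```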
| Reference (FBrf ID) | Current FlyBase identifier (FBrf#) of the source publication. |
This file includes data from the Human Disease Model Report pages. Any cell that contains multiple entries will have a pipe character separating them (e.g. FBhh0001234|FBhh0001230). Some fields will only have content in specific cases, as described below.
File format:
| Column heading | Content Description |
|---|---|
| Fb_id | The Primary FlyBase identifier number of the Human Disease Model (FBhh#######), used to uniquely identify the model in the database. |
| name | The valid full name that is used in FlyBase for the Human Disease Model. |
| name synonyms | A listing of other names, abbreviations, acronyms and terms that have been used to refer to the disease. |
| sub-datatype | This column will always contain "disease". |
| category | This column contains one of four terms which describe the disease's relationship to other diseases. 'parent entity' means that the term is a parent term of a phenotypic series in OMIM, for example Parkinson disease. 'sub-entity' means that the term is a child of a parent entity, for example Parkinson disease 6, early-onset; sub-entity terms will always have a parent term in the column parent_disease_FBhh. 'group entity' means that the term contains sub-terms that are not part of a phenotypic series at OMIM, but were deemed by FlyBase to be related; for example, the group entity neurodevelopmental disorders, MECP2-related contains diseases that are related, in this case, by association with variants in the human gene MECP2. The column related_disease_FBhh will contain all members of the group. 'specific entity' means that the term is not a child term of a parent entity (i.e. not part of a phenotypic series), for example chronic inflammatory lung disease; it may be a member of a group entity, such as Rett syndrome being a member of neurodevelopmental disorders, MECP2-related. |
| parent_disease_FBhh | For a sub-entity term, this is the ID of its parent term. It and the next column will be empty for any term that is not a sub-entity. |
| parent_disease_name | For a sub-entity term, the name of the parent entity term. |
| related_disease_FBhh | These are other FBhh IDs that are related to the current FB_id. This may be through being part of a group entity, or associated for another reason by a curator. |
| related_disease_name | The full name of each of the related_disease_FBhh entries. |
| child_disease_FBhh | For a parent entity term, this contains the IDs of all its child terms, as associated in a phenotypic series by OMIM. |
| child_disease_name | For a parent entity term, this contains the names of all its child terms, as associated in a phenotypic series by OMIM. |
| OMIM_disease_ID | The MIM term or terms (MIM:######) of disease(s) at OMIM associated with this term, if an association exists. |
| OMIM_disease_name | The name of the OMIM disease term, if one is present in the OMIM_disease_ID column. |
| OMIM_gene_ID | The MIM term or terms (MIM:######) of gene(s) at OMIM associated with this term, if an association exists. |
| OMIM_gene_name | The name of the OMIM gene, if one is present in the OMIM_gene_ID column. |
| HGNC_gene_ID | The ID of the gene or genes at https://www.genenames.org/ (known as HGNC) associated with this term, if an association exists. |
| HGNC_gene_name | The name of the HGNC gene(s), if present in the HGNC_gene_ID column. |
| DO_ID | The ID or IDs associated with this term at disease-ontology.org, if an association exists. |
| DO_name | The name of the DOID(s), if present in the DO_ID column. |
| external_links | Links to other websites with information related to the term, separated by pipes. A list of the websites linked can be found here. |
| related_specific_diseases | For a sub-entity term, links to other diseases at OMIM that are part of the same phenotypic series. |
| implicated_human_gene | The FBgn and name, separated by a semicolon, for any human genes in FlyBase associated with the term. This will only be filled in if the human gene has been expressed in flies, so it will not necessarily include any/all of the genes mentioned in the OMIM_gene_name and HGNC_gene_name columns. |
| implicated_Dmel_gene | The FBgn and name, separated by a semicolon, for any Drosophila melanogaster genes in FlyBase associated with the term. |
| implicated_other_gene | The FBgn and name, separated by a semicolon, for any non-human, non-fly genes in FlyBase associated with the term. This will only be filled in if the gene has been expressed in flies. |
| description_overview | A curator-written summary of the disease model. |
| description_symptoms | A curator-written description of symptoms and phenotypes shared by related diseases. |
| description_genetics | A curator-written description of the genetics of the disease, including causative human gene, and pattern of inheritance. This field includes links to outside source(s), and is date-stamped as to when a FlyBase curator last consulted the source. |
| description_cellular | A curator-written description of the cellular phenotype and pathology characteristic of the disease. Entry includes links to outside source, and is date-stamped as to when a FlyBase curator last consulted the source. |
| description_molecular | A curator-written description of molecular information relevant to the disease, including information about the function of the causative gene, molecular information about mutations in the causative gene, and/or molecular information about mutant isoforms of the causative protein. Entry includes links to outside source, and is date-stamped as to when a FlyBase curator last consulted the source. |
| BDSC_link | A link to a collection of stocks at the Bloomington Drosophila Stock Center related to this term. |
This file reports the human orthologs of D. melanogaster genes using the DIOPT dataset. Each line reports a single orthologous pair, which means that each human and D. melanogaster gene can appear in multiple lines. Note that ortholog calls supported by only 1 or 2 algorithms (DIOPT score <3) have been removed. Human genes are also associated with diseases (OMIM phenotypes) using the OMIM dataset.
This is identical to the file of the same name listed under the 'Orthologs' section above.
This file lists human variant designations that have associated alleles in FlyBase.
File format:
| Column heading | Content Description |
|---|---|
| FB_name | The human variant designation, consisting of the human gene name, and the identity and location of the amino acid(s) changed. |
| FBrf | The reference that this variant was curated from. |
| Clinvar_id | The Variation ID of the human variant, which can be found at https://www.ncbi.nlm.nih.gov/clinvar/variation/[characters after the colon]/. |
| UP_SP_variant_id | The UniProtKB/SwissProt Variant ID of the human variant, which can be found at https://web.expasy.org/variant_pages/[characters after the colon]. |
| FB_synonyms | Synonyms of the human variant designation. |
| HumanHealth_id | The Human Disease Model (FBhh) ID(s) associated with this variant at FlyBase. |
| Allele_id | The Allele (FBal) ID(s) associated with this variant at FlyBase. |
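The two identifier columns above carry the lookup key after a colon. The following is a minimal sketch of how the URLs described in the table could be assembled. The function names and the sample values are hypothetical, and the exact prefix that precedes the colon in the file is an assumption.

```python
def clinvar_url(clinvar_id: str) -> str:
    """Build the ClinVar URL from the characters after the colon."""
    key = clinvar_id.split(":", 1)[1]
    return f"https://www.ncbi.nlm.nih.gov/clinvar/variation/{key}/"


def swissprot_variant_url(variant_id: str) -> str:
    """Build the variant page URL from the characters after the colon."""
    key = variant_id.split(":", 1)[1]
    return f"https://web.expasy.org/variant_pages/{key}"


# Hypothetical values, for illustration only.
print(clinvar_url("ClinVar:12345"))
print(swissprot_variant_url("UP_SP:VAR_000001"))
```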
Files described in this section are in the "species" subdirectory of the FTP site.
This file lists all the species for which FlyBase has some information.
Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/species/organism_list_fb_current.tsv.gz
FlyBase includes gene reports for genes derived from species within the family Drosophilidae, as well as gene reports for non-drosophilid genes that have been introduced into a Drosophila genome via either transposable-element based transgenic constructs or via targeted insertion of DNA by a technique such as homologous recombination or CRISPR/Cas9. In this case, there will be a species 'Abbreviation' in the table, a standard prefix that is used in FlyBase as the first part of the symbol (before the '\') of any object, e.g. a gene or allele, that originates from this species.
In addition, information about non-Drosophilid species is also included in orthology data that is displayed on gene reports and on G/JBrowse. In this case, a species 'Abbreviation' is not automatically generated in the database for the species, and thus the column in the table may be blank.
The file thus includes information for both Drosophilid and non-Drosophilid species.
File format:
| Column heading | Content Description |
|---|---|
| Genus | The genus designation of the organism. |
| Species name | The species designation of the organism. |
| Abbreviation | The standard FlyBase prefix for the species. This abbreviation is used in FlyBase as the first part of the symbol (before the '\') of any object, e.g. a gene or allele, that originates from this species. This column may be blank, if no individual report page exists for that species in FlyBase. |
| Common name | The NCBI Taxonomy Database common name of the organism. This column may be blank. |
| Ncbi-taxon-id | The NCBI Taxonomy Database Taxon ID for the organism. This column may be blank. |
| drosophilid | If the species is from the family Drosophilidae, this column is filled in with 'y'. |
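Because the abbreviation is the part of a symbol before the '\', it can be recovered from any prefixed symbol. The sketch below assumes that a symbol without a backslash carries no explicit prefix; the function name is hypothetical.

```python
def split_species_prefix(symbol: str):
    """Split 'Abbreviation\\symbol' into (abbreviation, symbol).

    Returns (None, symbol) when no backslash is present (assumption:
    such symbols carry no explicit species prefix).
    """
    if "\\" in symbol:
        abbreviation, rest = symbol.split("\\", 1)
        return abbreviation, rest
    return None, symbol


print(split_species_prefix("Scer\\GAL4[sim.PS]"))
```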
The ontology files used by FlyBase are in the OBO format used by the Open Biomedical Ontology group (OBO Library).
Ontologies undergo continual development. Links are provided to the 'frozen versions' used for the current release of FlyBase, together with links to the current 'live' versions at external sites.
List of ontologies available for download:
Note: link points to the ontology version fbbt-simple.obo, which lacks a few minor FlyBase specific changes that are present in the 'fly_anatomy.obo' version
Note: link points to the ontology version fbbt-simple.obo, which lacks a few minor FlyBase specific changes that are present in the 'fly_development.obo' version
Note: link points to the ontology version fbcv-simple.obo, which lacks a few minor FlyBase specific changes that are present in the 'flybase_controlled_vocabulary.obo' version
Links are available to the following FTP repositories:
From release FB2020_03 onward, the above links are available for downloading only D. melanogaster data.
For releases FB2018_06 to FB2020_02, the above links are available for the following sequenced Drosophila species:
| Species name | Abbreviation |
|---|---|
| Drosophila melanogaster | Dmel |
| Drosophila ananassae | Dana |
| Drosophila pseudoobscura pseudoobscura | Dpse |
| Drosophila simulans | Dsim |
| Drosophila virilis | Dvir |
For earlier archived releases, the above links are also available for these additional species (other members of the original 12 sequenced Drosophila species):
| Species name | Abbreviation |
|---|---|
| Drosophila erecta | Dere |
| Drosophila grimshawi | Dgri |
| Drosophila mojavensis | Dmoj |
| Drosophila persimilis | Dper |
| Drosophila sechellia | Dsec |
| Drosophila willistoni | Dwil |
| Drosophila yakuba | Dyak |
The FlyBase FASTA files generally follow the FASTA format guidelines, with one exception: our header lines sometimes exceed the 80-character limit. The FASTA filenames follow these formats:
dmel-all-
or
dmel-<chromosome_arm>-<data_type>-r<release-number>.fasta.gz
Here, data_type is one of the entries in the table below. The 'all' files contain sequences of that data type for all chromosome arms, whereas the chromosome-arm-specific files contain only the features on that particular arm.
| Data Type | Content Description |
|---|---|
| aligned | The region of genomic sequence that analysis features align to. |
| CDS | The contiguous protein coding sequence which begins with, and includes, a start codon and ends with, and includes, a stop codon. |
| chromosome | The sequence of each chromosome arm. |
| clones | The sequence of full length cDNA, 3' and 5' ESTs, and partial length clones. |
| exon | The sequence of each exon split up into individual FASTA records. |
| five_prime_UTR | The sequence of 5' untranslated regions. |
| gene | The sequence of the gene span. |
| gene_extended2000 | The sequence of the gene span with 2000 base pairs added upstream and downstream. |
| intergenic | The sequence of chromosomal regions between genes that do not contain known gene models. |
| intron | The sequence of each intron split up into individual FASTA records. |
| miRNA | The sequence of transcripts that are typed as micro RNAs. |
| miscRNA | The sequence of transcripts that are typed as small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), or ribosomal RNA (rRNA). May also contain other transcript types that do not exist in their own individual files. |
| ncRNA | The sequence of transcripts that are typed as non coding RNAs (ncRNA). |
| predicted | The sequence of various features that are derived from a variety of prediction algorithms. These can encompass analyses conducted by FlyBase or by 3rd party groups. |
| pseudogene | The sequence of transcripts that are typed as pseudogenes. |
| sequence_features | The sequence of sequence features, which currently describe data about RNAi reagents. In the future, it will also contain natural genomic features (aside from transcribed regions), such as replication origins, transcription factor binding sites and boundary elements, and other experimental reagents that map to the genome, such as microarray oligonucleotides and rescue fragments. |
| synteny | The sequence of syntenic regions between two species. |
| three_prime_UTR | The sequence of 3' untranslated regions. |
| transcript | The sequence of transcripts that are typed as messenger RNAs (mRNA). |
| translation | The resulting protein sequence from protein coding transcripts. |
| transposon | The sequence of transposable elements inserted into the reference genome assembly. See TE insertion section below for more details. |
| tRNA | The sequence of transcripts that are typed as transfer RNAs (tRNA). |
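As an illustration of the filename convention, the sketch below splits a FASTA filename into its parts. It assumes that neither the chromosome arm nor the data type contains a hyphen, and it also accepts 'current' in place of a release number, as seen in the 'dmel-all-transposon-current.fasta.gz' download link. The pattern and function names are hypothetical, and 'dmel-2L-CDS-r6.58.fasta.gz' is a made-up filename.

```python
import re

# Pattern based on dmel-<chromosome_arm>-<data_type>-r<release-number>.fasta.gz
FASTA_NAME = re.compile(
    r"^dmel-(?P<scope>[^-]+)-(?P<data_type>[^-]+)"
    r"-(?P<release>r\d+\.\d+|current)\.fasta\.gz$"
)


def parse_fasta_filename(name: str):
    """Return the filename parts as a dict, or None if the name does not match."""
    match = FASTA_NAME.match(name)
    return match.groupdict() if match else None


print(parse_fasta_filename("dmel-all-transposon-current.fasta.gz"))
print(parse_fasta_filename("dmel-2L-CDS-r6.58.fasta.gz"))  # hypothetical name
```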
The typical format of our FASTA header begins with an ID followed by any number of fields that follow this format:
field_name=value;
Multiple field values are separated by commas:
field_name=value1,value2;
This table describes some of the field names found in our FASTA headers:
| Field Name | Description |
|---|---|
| type | The feature type of the FASTA sequence record. |
| loc | The genomic location given in the NCBI's feature location format. Please see the NCBI's site for more information. |
| ID | A unique ID. IDs in the form of FBxx[0-9]+ are a unique FlyBase object identifier. |
| name | The name or symbol of the feature. |
| dbxref | Database cross references relating to the FASTA record. The dbxref values use a 'dbname:dbid' format. |
| MD5 | An MD5 checksum calculated from the sequence that can be used to identify identical sequences. |
| length | The length of the sequence found in the FASTA record. |
| release | The release number denotes the annotation release which this FASTA record corresponds to. |
| species | The species abbreviation that this FASTA record corresponds to. |
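The header layout described above (an ID followed by `field_name=value;` pairs) can be read with a few lines of code. This is a minimal sketch, not FlyBase's own parser. The sample header is invented, and values are kept as raw strings because a `loc` value in NCBI feature location format can itself contain commas, so only fields known to be lists (such as `dbxref`) are split afterwards.

```python
def parse_fasta_header(line: str):
    """Return (record_id, fields) for one FASTA header line."""
    record_id, _, rest = line.lstrip(">").strip().partition(" ")
    fields = {}
    for chunk in rest.split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece after the final ';'
        name, _, value = chunk.partition("=")
        fields[name.strip()] = value.strip()
    return record_id, fields


# Invented header, for illustration only.
header = ">FBgn0000001 type=gene; name=example; dbxref=DB1:abc,DB2:def; length=120; species=Dmel;"
record_id, fields = parse_fasta_header(header)

# dbxref values use the 'dbname:dbid' format and are comma-separated.
dbxrefs = [tuple(item.split(":", 1)) for item in fields["dbxref"].split(",")]
print(record_id, fields["type"], dbxrefs)
```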
The FlyBase GFF files follow the GFF v3 specification. The GFF files contain feature line definitions for gene models, predicted features, alignments, and many other features.
For melanogaster, there are 4 GFF files distributed:
The current D. melanogaster GFF files can be downloaded from our FTP site using this URL form:
wget https://s3ftp.flybase.org/genomes/dmel/current/gff/dmel-all-current.gff.gz
or for selected releases by using the release number e.g.:
wget https://s3ftp.flybase.org/genomes/Drosophila_melanogaster/dmel_r5.57_FB2014_03/dmel-all-r5.57.gff.gz
wget https://s3ftp.flybase.org/genomes/Drosophila_melanogaster/dmel_r5.57_FB2014_03/dmel-dmel_mitochondrion_genome-r5.57.gff.gz
The FlyBase GTF files follow the GTF v2.2 specification. The GTF files contain feature line definitions for gene models.
The GTF files are produced for each species and can be downloaded from our FTP site using this URL form:
https://s3ftp.flybase.org/genomes/<species abbreviation>/current/gtf/
e.g. https://s3ftp.flybase.org/genomes/dmel/current/gtf/
The chado XML file generated from the FlyBase PostgreSQL database for the 'transcripts' data class.
The chado XML file generated from the FlyBase PostgreSQL database for the 'polypeptide' data class.
This file reports all ncRNAs with gene models supported by FlyBase in JSON format, as submitted to RNAcentral. Pseudogenes are excluded. In addition to the symbols and IDs for ncRNAs, this file also includes their associated gene, genomic location, sequence, Sequence Ontology classification, etc. The full schema for this file is available here.
Note - from release FB2020_03 onward, this file reports only ncRNAs for D. melanogaster; earlier files include ncRNAs for D. ananassae, D. pseudoobscura pseudoobscura, D. simulans and D. virilis.
Files described in this section are in the "insertions" subdirectory of the FTP site (unless otherwise noted). Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/insertions/insertion_mapping_fb_current.tsv.gz
The chado XML file generated from the FlyBase PostgreSQL database for the 'insertions' data class.
The chado XML file generated from the FlyBase PostgreSQL database for the 'transgenic constructs' data class.
The construct_maps.zip file unpacks as a directory containing maps of recombinant constructs and transgenic transposons generated by FlyBase, that are based on the compiled sequence data curated by FlyBase. The name of each PNG image in the directory corresponds to the FlyBase identifier of the respective recombinant construct or transgenic transposon.
Please note: For transgenic transposons, the image may be a map of the corresponding plasmid form.
The insertion mapping table reports available localization information for Dmel insertions.
File format:
| Column heading | Content Description |
|---|---|
| insertion_symbol | Current symbol of insertion. |
| FBti# | Current FlyBase identifier (FBti#) of insertion. |
| genomic_location | Genomic location of insertion. |
| range | Range (t/f) indicates whether genomic location is range or single base. |
| orientation | Orientation indicates orientation of insertion on chromosome (see note below). |
| estimated_cytogenetic_location | Estimated cytogenetic location based on correlation of genomic location and estimated genomic location of cytological bands. |
| observed_cytogenetic_location | Observed cytogenetic location reported in the literature. |
From FB2025_04 onwards, orientation can be either 0, 1, or -1:
(Prior to FB2025_04, '0' was not used, and '1' indicated either that the element was inserted in the plus orientation or that no orientation information was available).
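A minimal sketch of reading the insertion mapping table follows. It assumes the columns appear in the order listed in the table above, that fields are tab-separated, and that any header or comment lines start with '#'; none of these is confirmed here. The sample row is invented. For the real download, the same function could be given a handle from `gzip.open(path, "rt")`.

```python
import io

# Column order assumed to match the table above.
COLUMNS = [
    "insertion_symbol", "FBti#", "genomic_location", "range",
    "orientation", "estimated_cytogenetic_location",
    "observed_cytogenetic_location",
]


def read_insertion_rows(handle):
    """Yield one dict per data line, skipping blank and '#' lines."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        values = line.rstrip("\n").split("\t")
        row = dict(zip(COLUMNS, values))
        # 't' means the genomic location is a range rather than a single base.
        row["range"] = row.get("range") == "t"
        yield row


# Invented sample content, for illustration only.
sample = io.StringIO(
    "# comment line\n"
    "P{example}1\tFBti0000001\t2L:1000..1200\tt\t1\t21A1-21A2\t\n"
)
rows = list(read_insertion_rows(sample))
print(rows[0]["FBti#"], rows[0]["range"])
```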
These files, in FASTA or GFF format, represent 'canonical' sequences of transposable elements of Drosophila species (primarily but not exclusively of D. melanogaster), including the protein sequences of encoded genes. Based on a file originally compiled by Michael Ashburner; currently maintained by Casey Bergman.
To download the latest files:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/transposons/transposon_sequence_set.fa.gz
wget https://s3ftp.flybase.org/releases/current/precomputed_files/transposons/transposon_sequence_set.gff.gz
Information for transposable element insertions in the D. melanogaster reference genome assembly (Release 6) is available in a FASTA file, which offers not only the insertion sequences, but also the genomic coordinates (in the FASTA headers). See the FASTA_files section for file format details.
To download the latest version of this file, use the code below:
wget https://s3ftp.flybase.org/genomes/Drosophila_melanogaster/current/fasta/dmel-all-transposon-current.fasta.gz
This set is limited to insertions of transposons retaining both intact ends. Over 90% of these annotated TE insertions were identified by Quesneville et al., 2005, which involved analysis of the Release 4 reference genome assembly. In 2008, an analysis of the Release 5 reference genome assembly by DeBaryshe and Pardue (FBrf0205582, FBrf0206616) identified 44 new HeT-A, HeT-Tag, TART-A and TART-C elements (corrections were made to another 11 annotations). A more recent analysis of the Release 6 genome assembly by Govindarajan et al., 2021, using a similar algorithm to the Quesneville publication, identified 319 insertions. All TE insertion annotations identified in the Release 4 and Release 5 assemblies have been re-mapped to the current Release 6 genome assembly where possible; see the FBrf0224938 analysis for a list of 18 TE insertions that could not be mapped to the Release 6 assembly. For additional details on TE insertion annotation, see FBrf0261225.
These transposable element insertions are displayed in the "Natural TE" JBrowse track (under the "Reference Genome" tracks section).
Note - repeat regions on the D. melanogaster Release 6 reference genome assembly can be viewed in JBrowse (see the "Repeat region" track in the "Reference Genome" tracks section). These repeat regions were identified in RepeatRunner and RepeatMasker analyses of the Release 5 genome (Smith et al., 2007), and subsequently lifted over to the current Release 6 assembly. As such, these analyses may be out of date. These regions are not currently available for download. We recommend going to RepeatMasker directly for the most current repeat region analysis.
This file reports, in JSON format, a list of all GAL4 drivers that have been curated to at least 21 references and/or are among the 150 most frequently requested GAL4 stocks from the Bloomington Drosophila Stock Center. In addition to the symbols and IDs for Scer\GAL4 alleles, this file also includes their associated transposon or insertion, associated gene, expression pattern in controlled vocabulary stage and anatomy terms, stocks, and publications, all with IDs, as well as free text expression pattern descriptions. This file, except for publications and stocks, is also available in TSV format here.
This file includes information for transgenic constructs. It is located in the "transposons" subdirectory of the FTP site. Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/transposons/transgenic_construct_descriptions_current.tsv.gz
File format:
| Column heading | Content Description |
|---|---|
| Component Allele (symbol) | Current FlyBase symbol of the component allele of a transgenic construct. |
| Component Allele (id) | Current FlyBase identifier (FBal#) of the component allele in the Component Allele (symbol) column. |
| Transgenic Construct (symbol) | Current FlyBase symbol of transgenic construct(s) that carry the component allele. |
| Transgenic Construct (id) | FlyBase identifier(s) (FBtp#) corresponding to the symbols in the Transgenic Construct (symbol) column. |
| Transgenic Product class (term) | Sequence Ontology (SO) term(s) that describe the nature of the product encoded by the component allele of the transgenic construct. |
| Transgenic Product class (id) | SO id(s) corresponding to the terms in the Transgenic Product class (term) column. |
| Regulatory region (symbol) | Current FlyBase symbol of the regulatory region(s) that drive expression of the product encoded by the component allele of the transgenic construct. |
| Regulatory region (id) | FlyBase identifier(s) (FBgn#, FBsf# or FBto#) corresponding to the symbols in the Regulatory region (symbol) column. |
| Encoded product/tool (symbol) | Current FlyBase symbol of the product encoded by the component allele of the transgenic construct. |
| Encoded product/tool (id) | FlyBase identifier(s) (FBgn#, FBsf# or FBto#) corresponding to the symbols in the Encoded product/tool (symbol) column. |
| Tagged with (symbol) | Current FlyBase symbol of any experimental tools that "tag" the product encoded by the component allele of the transgenic construct. |
| Tagged with (id) | FlyBase identifier(s) (FBto#) corresponding to the symbols in the Tagged with (symbol) column. |
| Also carries (symbol) | Current FlyBase symbol of any experimental tools that are carried within the transgenic construct, but do not form part of the gene product encoded by the component allele. |
| Also carries (id) | FlyBase identifier(s) (FBto#) corresponding to the symbols in the Also carries (symbol) column. |
| Description (text) | Free text description of the component allele. |
| Description (supporting reference) | FlyBase identifier (FBrf#) of the source reference for the free text description. |
| Stocks (number) | Number of stocks that contain an insertion of the Transgenic Construct in the Transgenic Construct (symbol) column. |
Notes:
e.g.
* for the Transgenic Product class (term) and Transgenic Product class (id) columns.
* for the Tagged with (symbol) and Tagged with (id) columns.
* for the Description (text) and Description (supporting reference) columns.
Example:
Scer\GAL4[sim.PS] P{GAL4-sim.S} 3.5kb EcoRV sim promoter fragment regulates expression of a GAL4 driver. FBrf0093209+FBrf0096245
indicates that:
The chado XML file generated from the FlyBase PostgreSQL database for the 'aberrations' data class.
The chado XML file generated from the FlyBase PostgreSQL database for the 'balancers' data class.
This file reports gene deletion and duplication data for chromosomal aberrations, as reported in FlyBase Aberration Reports.
Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/aberrations/aberration_experimental_gene_del_dup_data_current.tsv.gz
File format:
| Column heading | Content Description |
|---|---|
| gene_id | Current FlyBase identifier (FBgn#) of the gene. |
| gene_symbol | Current symbol of the gene. |
| type | Description of how the gene is affected by the aberration listed in the aberration_id/aberration_symbol column pair, determined experimentally (either by genetic complementation analysis or molecular mapping). |
| aberration_id | Current FlyBase identifier (FBab#) of the aberration. |
| references | FlyBase identifier(s) (FBrf#) of the source reference(s). |
Note:
KEY to type column:
| Value in type column | Description |
|---|---|
| completely deleted/disrupted (complementation) | the gene is reported to be completely deleted/disrupted by the aberration, as determined by genetic complementation analysis |
| partially deleted/disrupted (complementation) | the gene is reported to be partially deleted/disrupted by the aberration, as determined by genetic complementation analysis |
| not deleted/disrupted (complementation) | the gene is reported not to be removed or broken by the aberration, as determined by genetic complementation analysis |
| completely deleted (molecular) | the gene is reported to be completely deleted by the aberration, as determined by molecular mapping |
| partially deleted (molecular) | the gene is reported to be partially deleted by the aberration, as determined by molecular mapping |
| not deleted/disrupted (molecular) | the gene is reported not to be removed or broken by the aberration, as determined by molecular mapping |
| completely duplicated (complementation) | the gene is reported to be fully duplicated within the aberration, as determined by genetic complementation analysis |
| partially duplicated (complementation) | the gene is reported to be partially duplicated within the aberration, as determined by genetic complementation analysis |
| not duplicated (complementation) | the gene is reported not to be duplicated within the aberration, as determined by genetic complementation analysis |
| completely duplicated (molecular) | the gene is reported to be fully duplicated within the aberration, as determined by molecular mapping |
| partially duplicated (molecular) | the gene is reported to be partially duplicated within the aberration, as determined by molecular mapping |
| not duplicated (molecular) | the gene is reported not to be duplicated within the aberration, as determined by molecular mapping |
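Every value in the key above has the same three-part shape: an extent ('completely', 'partially' or 'not'), an effect, and the method in parentheses. The sketch below splits a value into those parts; the pattern and function names are hypothetical.

```python
import re

TYPE_PATTERN = re.compile(
    r"^(?P<extent>completely|partially|not) "
    r"(?P<effect>[^()]+?) "
    r"\((?P<method>complementation|molecular)\)$"
)


def parse_aberration_type(value: str):
    """Split a 'type' value into (extent, effect, method)."""
    match = TYPE_PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised type value: {value!r}")
    return match.group("extent"), match.group("effect"), match.group("method")


print(parse_aberration_type("completely deleted/disrupted (complementation)"))
print(parse_aberration_type("not duplicated (molecular)"))
```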
Files described in this section are in the "metadata" subdirectory of the FTP site. Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/metadata/dataset_metadata_current.tsv.gz
This file lists all features that are associated with a dataset/collection (e.g., genes, cDNA clones, TF_binding_sites, Affymetrix probes).
File format:
| Column heading | Content Description |
|---|---|
| Dataset_Metadata_ID | The unique FlyBase ID for the dataset. |
| Dataset_Metadata_Name | The official FlyBase symbol for the dataset. |
| Item_ID | The unique FlyBase ID for the feature associated with this dataset. |
| Item_Name | The official FlyBase symbol for the feature associated with this dataset. |
The chado XML file generated from the FlyBase PostgreSQL database for the 'clones' data class.
The file reports basic cDNA clone data in FlyBase.
This file is in the "reagents" subdirectory of the FTP site.
Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/reagents/cDNA_clone_data_fb_current.tsv.gz
Note: Prior to FB2025_04, this file was located in the "clones" subdirectory of the FTP site.
File format:
| Column heading | Content Description |
|---|---|
| FBcl# | Current FlyBase identifier (FBcl#) of cDNA clone. |
| organism_abbreviation | Abbreviation (from the Species Abbreviations list) indicating the species of origin of the clone. |
| clone_name | Clone name. |
| dataset_metadata_name | Name of dataset associated with clone. |
| cDNA_accession(s) | EMBL/GenBank/DDBJ cDNA accession number. |
| EST_accession(s) | EMBL/GenBank/DDBJ EST accession number. |
The file reports basic genomic clone data in FlyBase.
This file is in the "reagents" subdirectory of the FTP site.
Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/reagents/genomic_clone_data_fb_current.tsv.gz
Note: Prior to FB2025_04, this file was located in the "clones" subdirectory of the FTP site.
File format:
| Column heading | Content Description |
|---|---|
| FBcl# | Current FlyBase identifier (FBcl#) of genomic clone. |
| organism_abbreviation | Abbreviation (from the Species Abbreviations list) indicating the species of origin of the clone. |
| clone_name | Clone name. |
| accession | EMBL/GenBank/DDBJ accession number. |
This file reports antibody data for genes, as reported in FlyBase Gene Reports.
This file is in the "reagents" subdirectory of the FTP site.
Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/reagents/antibody_information_current.tsv.gz
File format:
| Column heading | Content Description |
|---|---|
| gene_id | Current FlyBase identifier (FBgn#) of the gene that encodes the target of the antibody. |
| gene_symbol | Current symbol of the gene that encodes the target of the antibody. |
| antibody_source | Source of the antibody; either "lab generated" or "commercial". |
| antibody_clonality | Clonality of the antibody; either "polyclonal" or "monoclonal". |
| pub_id | FlyBase identifier (FBrf#) of the publication that describes the generation of the antibody (for lab generated antibodies only). |
| citation | FlyBase citation of publication that describes the generation of the antibody (for lab generated antibodies only). |
| supplier | Name of antibody supplier (commercial antibodies only); currently limited to DSHB (Developmental Studies Hybridoma Bank in Iowa) and Cell Signaling Technology. |
| product_number | Product number of the antibody for the commercial entity in the supplier column (commercial antibodies only). |
Notes:
This file reports genetic components and related information about Stocks in FlyBase.
This file is in the "stocks" subdirectory of the FTP site.
Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/stocks/stocks_current.tsv.gz
Files described in this section are in the "references" subdirectory of the FTP site. Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/references/fbrf_pmid_pmcid_doi_current.tsv.gz
The chado XML file generated from the FlyBase PostgreSQL database for the 'references' data class.
This file lists all publications in the FlyBase bibliography that have a PubMed ID. Additional identifiers are listed as applicable.
File format:
| Column heading | Content Description |
|---|---|
| FBrf | The unique FlyBase ID for this publication. |
| PMID | The unique PubMed ID for this publication. |
| PMCID | The unique PubMed Central ID for this publication, if applicable. |
| DOI | The digital object identifier assigned to the publication. |
| pub_type | The publication type (for example, paper, review, erratum, abstract, book, etc.) |
| miniref | A short citation listing the first author, year of publication, journal, volume, issue and page numbers. |
| pmid_added | The FlyBase release in which the publication was first incorporated into the FlyBase bibliography. Note: as this report was first generated for the fb_2012_01 release, all publications associated with a PubMed ID prior to this release have pmid_added = fb_2011_10. |
This file reports the 'representative publications' for a given D. melanogaster gene, as shown in the References section of a Gene Report. 'Representative publications' are those papers (up to 100) that are most likely to contain the most information on the gene, identified and scored using an algorithm that assesses the amount and type of data within FlyBase attached to each gene from each publication. See documentation here for more details.
File format:
| Column heading | Content Description |
|---|---|
| FBgn_ID | FlyBase ID of the gene. |
| Symbol | FlyBase symbol of the gene. |
| References | The FBrf and PMID of each representative publication, separated by a pipe, as a comma-separated list. A dash is used if a PMID is unavailable. |
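The References column packs several publications into one cell. As a minimal sketch based on the description above (comma-separated entries, each an FBrf and a PMID joined by a pipe, with a dash for a missing PMID), a cell could be unpacked as follows. The function name and the sample identifiers are invented.

```python
def parse_references(cell: str):
    """Return a list of (FBrf, PMID or None) pairs from one References cell."""
    pairs = []
    for item in cell.split(","):
        item = item.strip()
        if not item:
            continue
        fbrf, _, pmid = item.partition("|")
        # A dash marks an unavailable PMID.
        pairs.append((fbrf, None if pmid in ("", "-") else pmid))
    return pairs


# Invented identifiers, for illustration only.
print(parse_references("FBrf0000001|1234567,FBrf0000002|-"))
```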
This is a tab delimited file that FlyBase uses to relate sequence coordinates from release 5 of the Drosophila melanogaster sequence assembly to published cytogenetic map positions. A description of how this is calculated is provided in section G.5.1. of the Reference manual.
The data for each chromosome arm is separated by a line starting with a '#' that lists the name of the chromosome arm and corresponding sequence scaffold.
The columns in the file are:
| Column heading | Content Description |
|---|---|
| - | Cytogenetic map position as described by Bridges. |
| - | First sequence coordinate for this map position in the sequence scaffold corresponding to this chromosome arm. |
| - | Last sequence coordinate for this map position in the sequence scaffold corresponding to this chromosome arm. |
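The layout described above ('#' lines introducing each chromosome arm, followed by three tab-delimited columns) can be read into one list per arm. This is a sketch under those stated assumptions; the function name and the sample lines are invented, and the exact text of the '#' lines in the real file may differ.

```python
def parse_cyto_seq(lines):
    """Group (band, first, last) tuples under the preceding '#' line."""
    sections = {}
    current = None
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if line.startswith("#"):
            current = line.lstrip("#").strip()
            sections[current] = []
            continue
        band, first, last = line.split("\t")[:3]
        sections.setdefault(current, []).append((band, int(first), int(last)))
    return sections


# Invented sample content, for illustration only.
sample = [
    "# 2L example_scaffold\n",
    "21A1\t1\t1000\n",
    "21A2\t1001\t2500\n",
]
print(parse_cyto_seq(sample))
```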
This is the table that FlyBase uses to infer a genetic map position from a published cytogenetic map position for Drosophila melanogaster.
The first six lines of the file describe the contents of the file or are blank. The data in the file is organized with the cytological position in the first four characters of a line, followed by a run of spaces and then the genetic map position.
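The fixed-width layout just described can be read as in the following minimal sketch: skip the first six lines, take the first four characters as the cytological position, and treat the remainder as the genetic map position. The function name and the sample lines are invented, and the genetic position is kept as a string because its exact notation is not specified here.

```python
def parse_cytotable(lines):
    """Map cytological position to genetic map position."""
    table = {}
    for line in lines[6:]:  # the first six lines are descriptive or blank
        if not line.strip():
            continue
        cyto = line[:4].strip()
        genetic = line[4:].strip()
        table[cyto] = genetic
    return table


# Invented sample content, for illustration only.
sample = [
    "header 1", "header 2", "", "header 4", "header 5", "",
    "1A      1-0",
    "100F    3-106",
]
print(parse_cytotable(sample))
```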
This is a tab separated file generated from the cytotable.txt and genome-cyto-seq.txt files that infers the relationship between published cytogenetic map positions, genetic map positions and release 6 sequence assembly coordinates for Drosophila melanogaster. Please note that band numbers are not given in this file because they are absent in cytotable.txt.
File format:
| Column heading | Content Description |
|---|---|
| Cytogenetic map position | Cytogenetic map position. |
| Genetic map position | Genetic map position. |
| Sequence coordinates (release 6) | Sequence coordinates (release 6) for the interval. |
| R6 conversion notes | |
An html version of this file is also available - see the Map Conversion Table page.
This is identical to the file listed under the genes section above.
Files described in this section are in the “chemicals” subdirectory of the FTP site. Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/chemicals/chemicals_current.tsv.gz
wget https://s3ftp.flybase.org/releases/current/precomputed_files/chemicals/chem_synonyms_fb_*.tsv.gz
This file contains all chemicals annotated to a publication, as seen on the Chemical reports (data class FBch#). This file is available as of the FB2025_01 FlyBase release.
File format:
| Column heading | Content Description |
|---|---|
| FB_id | Current FlyBase identifier (FBch#) of the chemical. |
| FB_name | Current FlyBase name of the chemical. |
| FB_synonyms | Synonyms for the chemical uniquely used in FlyBase publications that are not included in another synonym column (see below), if present. |
| InChIKey | A standardized string that describes the chemical’s structure, as developed by https://www.inchi-trust.org/. |
| PubChem_id | PubChem CID (Compound ID) of the chemical, from https://pubchem.ncbi.nlm.nih.gov/, if available. |
| PubChem_synonyms | PubChem synonyms associated with the CID. |
| ChEBI_id | ChEBI ID of the chemical, from https://www.ebi.ac.uk/chebi/init.do, if available. |
| ChEBI_name | ChEBI name associated with the ChEBI ID, if available. |
| ChEBI_synonyms | ChEBI synonyms associated with the ChEBI ID. |
| ChEBI_definition | ChEBI definition associated with the ChEBI ID, if available. See ChEBI’s documentation for details on definitions. |
| ChEBI_roles | ChEBI Roles Classification associated with the ChEBI ID, if available. FlyBase captures the Biological Role and Application sub-classifications associated with the chemical. See ChEBI’s documentation for details on the Roles Classification. |
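A possible way to load this file is sketched below. It assumes the columns appear in the order of the table above and that any header lines begin with `#`; both are assumptions to verify against the downloaded file. The function names are illustrative.

```python
import csv
import gzip
from typing import Dict, Iterable, Iterator

CHEMICAL_COLUMNS = [
    "FB_id", "FB_name", "FB_synonyms", "InChIKey", "PubChem_id",
    "PubChem_synonyms", "ChEBI_id", "ChEBI_name", "ChEBI_synonyms",
    "ChEBI_definition", "ChEBI_roles",
]


def read_chemicals(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per chemical from tab-separated text lines."""
    # Assumption: header/comment lines start with '#'.
    data_lines = (
        line for line in lines
        if line.strip() and not line.startswith("#")
    )
    # QUOTE_NONE keeps quote characters in chemical names as literal text.
    reader = csv.reader(data_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        yield dict(zip(CHEMICAL_COLUMNS, fields))


def read_chemicals_file(path: str) -> Iterator[Dict[str, str]]:
    """Read the gzip-compressed file directly, without unpacking it first."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        yield from read_chemicals(handle)
```

A lookup keyed by identifier could then be built with `{row["FB_id"]: row for row in read_chemicals_file("chemicals_current.tsv.gz")}`.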
This file contains all chemical name synonyms captured from a publication, as seen on the Chemical reports (data class FBch#). This file is available as of the FB2025_01 FlyBase release.
File format:
| Column heading | Content Description |
|---|---|
| Publication_ID | Current FlyBase identifier (FBrf#) of the publication. |
| FB_Chemical_ID | Current FlyBase ID of a chemical attached to the FBrf in the first column. |
| FB_Chemical_Name | Current FlyBase name of the chemical. |
| Author Synonym | Synonym for the chemical, as used in the text of the publication. |
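Because this file has one row per publication and synonym, a common first step is to group the synonyms by chemical. The sketch below does that under two assumptions: the columns follow the order of the table above, and header lines start with `#`. The function name `synonyms_by_chemical` is hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set


def synonyms_by_chemical(lines: Iterable[str]) -> DefaultDict[str, Set[str]]:
    """Map each FB_Chemical_ID to the set of author synonyms used for it."""
    grouped: DefaultDict[str, Set[str]] = defaultdict(set)
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Assumption: '#' marks header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip rows that lack one of the four columns
        _publication_id, chemical_id, _chemical_name, synonym = fields[:4]
        if synonym:
            grouped[chemical_id].add(synonym)
    return grouped
```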
Files described in this section are in the "experimental_tools" subdirectory of the FTP site. Download the latest file using a query of this form:
wget https://s3ftp.flybase.org/releases/current/precomputed_files/experimental_tools/experimental_tool_data_current.tsv.gz
This file includes information for experimental tools.
File format:
| Column heading | Content Description |
|---|---|
| Symbol | Current FlyBase symbol of the experimental tool. |
| FlyBase ID | Current FlyBase identifier (FBto#) of the experimental tool. |
| Name | Current FlyBase full name of the experimental tool. |
| Uses (term) | Term(s) that describe how the experimental tool is used. |
| Uses (id) | Identifier(s) corresponding to the terms in the Uses (term) column. |
| Description | A short textual description of the experimental tool. |
| Compatible tools (symbol) | Current FlyBase symbol(s) of experimental tools that are compatible with the tool in the Symbol column. |
| Compatible tools (id) | FlyBase identifier(s) (FBto#) corresponding to the symbols in the Compatible tools (symbol) column. |
Notes:
e.g.
* for the Uses (term) and Uses (id) columns.
* for the Compatible tools (symbol) and Compatible tools (id) columns.
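The paired columns (Uses (term) with Uses (id), and Compatible tools (symbol) with Compatible tools (id)) can be matched up once a row has been split into fields. The sketch below is hedged on two points that are not confirmed here: that multiple values within a cell are separated by `|`, and that paired cells list their values in the same order. The `separator` parameter and the function name `pair_columns` are assumptions for illustration; adjust the separator after inspecting the file.

```python
from typing import Dict, List, Tuple


def pair_columns(
    row: Dict[str, str],
    left: str,
    right: str,
    separator: str = "|",  # assumption: not confirmed by the file description
) -> List[Tuple[str, str]]:
    """Pair the values of two multi-valued columns position by position."""
    left_values = [v.strip() for v in row.get(left, "").split(separator) if v.strip()]
    right_values = [v.strip() for v in row.get(right, "").split(separator) if v.strip()]
    if len(left_values) != len(right_values):
        # A mismatch suggests the separator or ordering assumption is wrong.
        raise ValueError(f"{left!r} and {right!r} hold different numbers of values")
    return list(zip(left_values, right_values))
```

Raising on a length mismatch, rather than silently truncating, makes a wrong separator show up early.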