| sequence (string, length 199–600) | name (string, length 5–30) | label (int32, 0–2) | task (string, 18 classes) |
|---|---|---|---|
TCACTTCGATTATTGAGGCAGTCTTCATTAAAGTTTATTACAATGGATATGGTATCACCAGTCTTGAACCTACAATCATCTATTTTAGGTGAGCTCGTAGGCATTATTGGAAAAGTGTTCTTTCTCTTAATAGAAGAGATTAAATACCCGATAATCACACCCAAAATTATTGTGGATGCCCAGATATCTTCTTGGTCATTGTTTTTTTTCGCTTCAATCTGTAATCTCTCTGCAAAATTTCGGGAGCCAATAGTGACAACATCGTCAATAATAAGTTTGATGGAATCGGAAAAAGATCTTAAAAATGTAAATGAGTATTT... | YBR063C_YBR063C_367930|0 | 0 | H3 |
CAGTAGTGGCATAAACCCAAGGAACAGAGCCAGTGGTACTCCATCCAATGAGCGGGCTCGTCCGGCGTCGGGTATCAGTTCGTTTTTGAATACCTTCGGAATTAGGCAAAATAGCCAGACAGCTTCTTCTTCTGCGGCTCCTGATCAGCGTCTATTCGGCACAACCCCATCTAACTCACATATGAGTGTGGCCATGGAAAGTATCGATACCGCTCCGCAACAGCAGGAACCACGTCTGCATCATCCTATACAAATGCCTCTGTCGGCCCAGTTCCACGTTCATCGCAACTATCAACTCCCCATCTCCATATCTCTCACTG... | YNL116W_YNL116W_408685|1 | 1 | H3 |
TTTCCGATAAGCTTCAGCCCCGGCAACGCTAAAAATAGTATCATTCGCACCCCATGCAACTAGCACAGGAATCTTTGAATCTCTCAAGAACTTTTGAAAGGCTGGGTACAACTTTATATTATTCTGATAATCGAAAAATAATCTCAATTGAATATCGGTCTGGCCGGTACGCTGAATTAGCGCAATATCAAGCGTATAAGCGGCGGGATCAACAGATTCGATAGCTGGTACTCCATCATGGTACTGGCATATGACATTAGCTGGATCTTCAAGGTACGGGATAAGGGATTTAACAAATACCGGATCACTTTGATATGACT... | YNR064C_YNR064C_749488|1 | 1 | H3 |
CCGTTTGGAGTAATGAGCGGTTAAACTTGTTCTTGGTGAAGTGTGCAGAGTTTCTAAACTTTAAGTAAATGACACTAAGCCTATATTTTTCGCATTGCTAAAGAAACACTGATACATAATGTGTCTACATTTTATATAGCTGATAGCACGTTCTTTTTAAGTTGAATTATCTTTTTTTTCTCTTGTTAAATAAGATGGACAACGCGCTTTCGCTGGCGAGATGAAGCGCCCCTGATTGACAAGCAACTCGTGGAGGCAAGTAATTAGCGACGCTTTACTTTGTCTGATAGCAGTAAATATGTGACTGCATTTAAGAAGTG... | iYDR005C_458375|0 | 0 | H3 |
TTTTTATTTAGTCGACTATAAAGGTGGAAGTCCATACTTAAGAGATATTAAGGGTATTTTGATCAACAAGTAAGTAACAATCGTTATAAAAATACAATAGCAAAAGTATGAGCGGAGAAAATCGTGCTGTGGTGCCGATTGAATCAAACCCTGAAGTTTTTACAAATTTTGCACATAAATTAGGTTTAAAAAATGAATGGGCGTATTTCGATATCTATAGCTTAACAGAGCCAGAGTTACTAGCATTCTTACCAAGGCCAGTGAAGGCCATTGTGCTGCTATTTCCGATAAACGAGGATAGAAAATCGAGTACCAGTCAA... | iYJR098C_615412|1 | 1 | H3 |
CCAAAGTCTTTATACTCGCTCGTTGATGTACTGGGCGTTATCAAAACTGTTTGCCACTGGGTAGTTGCCTTATGGTCACTATCTGCAAGTTAAATTCAATCGCAAGAGGAAGAGAAAATAAACCGTCGCATCACTTTTCTACACTTACCCGGTTATTAATTTTATAACGTTTATATGATATATCTTTTCTTATTTATTATATTATGAATCGTGAAAACGGATTAAGCTATGCTTACAATTTGGTTTCCTCTAGTTTCGTGGCAATCTCGTTGGTCTTAGTAGCGCCTGGGTACTTAGTACCCTTATTCTTTGGTTTCTTA... | YGR185C_TYS1_866355|1 | 1 | H3 |
CCAAGTTTTGGACCACCAAAGAGCCATCCCATTCAGAAGATTTAACTCCTCTATTGGTAGAACTGCCCAAGGTAAGGAATTTGGTGTCACCAAGGCTAGATGGCCAGCTAAATCTGTTAAGTTCGTTCAAGGTTTGTTGCAAAACGCCGCTGCCAATGCTGAAGTATGTTTAAAATGAAAATATGAAAAATAAGATTGAAAAATAATAATGATGTGTTGACGGGAGATGATAAAGTATTAGAATATGTTCATATGTGTTGCATCCTATTTTCTGCATGAATGCACGATTCGAAAGAGCTAAAATTAACAGTTTTCCAAAT... | YJL177W_RPL17B_91178|0 | 0 | H3 |
CTAATTTCCGGGACTCTTTGTTGAGCGATGATTGCGACTGTTGCACAAATGCATTTAATTGTTTAGATTGTTTTGCGTAATCATTGGCTATCACTTCAGACAGCTTGTCTATTTGAGATAGTGTCGATAATTTGTCTAATAACAAGGTGTACCTTGCTAGATTTTTAGACAATTGGCCCTCGATGATCGAGCTCTCCCATGATGCATTCGATTTTTGCCCTTCCATCAAAACTTCACCCTTTTAGAGCCTGACAATACAGTATCGTTAGTTTGCGCTACTTCGCTGTCCCAACGAATTTGGTTTGTTTTCTATGTTCTAT... | iYPR105C_739589|1 | 1 | H3 |
ACTATTGTAGCAAATAACTTGTTTCAGAGGGGAGAAAAAGTAGCCGTGGGGGCCTCTGGTGGTAAAGACTCCACGGTGTTAGCGCATATGCTGAAGCTTCTGAATGACCGTTACGATTACGGCATTGAGATTGTACTGTTGAGTATTGACGAAGGAATCATCGGATACAGAGATGATTCCTTGGCCACGGTGAAAAGGAATCAGCAGCAGTATGGCCTTCCTTTGGAGATTTTTTCCTTTAAGGACCTCTATGATTGGACAATGGATGAGATTGTATCCGTTGCTGGTATAAGGAACAGTTGCACGTATTGTGGTGTTTT... | YGL211W_NCS6_92932|1 | 1 | H3 |
TCCTTTCCACTAGCATTTGGTCTCAAGACCTCATTTGGGTTTATGCACTATGCCAAGGCCCCTGCCATTAATTTACGCCCCAAGGAATCCTTGCTGCCGGAAATGAGTGATGGTGTGCTGGCCTTGGTTGCGCCGGTTGTTGCCTACTGGGCGTTGTCTGGTATATTCCATGTAATAGACACTTTCCATCTGGCTGAGAAGTACAGAATTCATCCGAGCGAAGAGGTTGCCAAGAGGAACAAGGCGTCGAGAATGCATGTTTTCCTTGAAGTGATTCTACAACATATCATACAGACCATTGTTGGCCTTATCTTTATGCA... | YDR297W_SUR2_1056824|1 | 1 | H3 |
ATTTGTTTTCAATAGGATCTGTTGATTCATATAACTGTTCTAGAAGGTTCAGTGGGAGAGTCATATTATCAATACCGGCTAACGCCTTCAATTCATCTAAATTCCTGAAAGAAGCCGCCATTACCTCAGTTGCATAACCATGCCTCTTATAGTAACTGTATATCTTCTTAACAGAAAGAACACCGGGATCAGTTTCTGCAGTATAGTCTTTGCCTGAAAGGGCCTTGTAAAAGTCCATAATCCTTCCAACAAATGGGGAGATCAATGTGACATTTGCCTCCGCACAGGCTACTGCTTGCGTAAAGGAAAACAGTAATGTC... | YGR043C_YGR043C_580864|1 | 1 | H3 |
TTCTGAACCTGCTGCAACAACCACAGATGGATCATCTTCTATGTCAGAGGAGCGTGTAGGTACTGAGCAAACTGCAGCTTCCGTTCAGGACAACGGAACTGCAAATAACATTCAAAATGGTCTTGGTGAAGAAGGAAACGCAACACGATCAAAAACTTCAAACGAACACAATGAAAACCAACAACCATCTCAACCATCAGAACGTGTTATACTTCCTGAAAGAAGTGATGAGAAAAAAAAATACACTCTACTTGCAAAAGTAACAGGATTGGAACGATTTGGATCTGCAACCGGTAAGAAAGAGAATCCGACTATTATAT... | YOR132W_VPS17_573498|1 | 1 | H3 |
AGCAGTTGTTTACTCGAATTTTGCAATCAACCCTAATTTTTGAAGCCTGGTTTTAGATTTATTCTCTTCTTTTTCTTCTTGTGAACTTCAATTACTAATGTAACTTAATTTTTAATATAACTTTTACAGTTTAATAATATTGATTTTTTTCGGTCTGGACCAATCGCGCCGCATTTCTCACTAATATTACTAACATACCCTCTTCTCATTGGCTCGGTACCCCTTTCGTGACCCGCATTTTTTGTTTTCTTTGTTAGCCCGAATGTCTCACAATGAAGATGTAAAATTAAGATTATATATGAAAAATTGATACAAAACAA... | iYLR219W_576585|0 | 0 | H3 |
TTCATTTTTACAACAGTATTCCAAACGAGCCGTGTATGCAAAAAGAATGAGGTCAAATCAAAAGATCAAGTACCAAGCCAGCTGATTCTCTTCTTACTAAGTTTGACTATCGTTTACATTTCTTGCTTGTTGTTTCATCAAACAATGTACTCTTTTCTTGTTTTAAATGATTTTTTAGCGGCGAAGGTAACAGCAGCAAAAAAAAAAAAGAAAAAACATTGCAAAATAGCTAATGGAAATGGATTTATTACTGAAATATGTAGAAATATTTAACTCGATAGATTTGATCATACACATATAATGCAAAATATATATATATA... | iYBR057C_353497|0 | 0 | H3 |
AAATGCGTTGATATTGTGAAACAATCCTTACCAAGACAGCGCCAAAAGCTATCCTCTTTAAAGAATTAAACTTGATTTGTTCCAGGAACTGGCTCATTAAAATCAATGCTGGAGTTATTACTAAATGGTACTGGTCCGAGGTGGAGAAAAGAATACCAATAATGGAAAAAAATACCAGATCACCATTTGATAGAGCATCAAAATGGTTCTTTTTATAGCGTGCTTGCATTTCATTGATATAATCTCTACACTCCTCGGAAAGCTCTCTATTATATTTCTCTGAAAGGGATTTTAGAATTGAGATCAATGCATTTTGTGTA... | YDL148C_NOP14_189168|0 | 0 | H3 |
CGCTATCAAGCCCAACTTGATGCAAACTTTAGAAGGTACTCCTGTCTTGGTCCATGCCGGCCCATTTGCCAACATCTCTATCGGTGCCTCTTCTGTTATTGCTGATCGCGTGGCTTTGAAATTGGTTGGTACCGAGCCAGAGGCAAAAACAGAAGCTGGTTATGTGGTTACTGAAGCAGGGTTCGATTTCACTATGGGTGGTGAAAGATTCTTCAACATCAAGTGCCGTTCCTCTGGATTGACACCTAATGCTGTGGTCTTGGTTGCTACTGTTAGGGCATTGAAGTCACACGGTGGTGCTCCAGATGTCAAACCTGGCC... | YGR204W_ADE3_908050|1 | 1 | H3 |
AAAGGACACTAAAATAAAACAGAACCGGAATGTGTTGAATAAAACTGAAGTGAGCTACTATATCTTGAATCACAATATCTTAGCTCCGTGATAGAGCATAGGGCACCTCTATTGATCTCTTCGTGATGCCCGGTTATATACACACGTAACCCCGTAAAAGCCGTTACCCGGTTTATGTCTTTCTGTGAGCACAGTAAGCTAGGAAGGTCGATGGAGAAGAAGACTCCTGTCTCATATGGAACAACCTTGCAAATTGAGTGCAATTATCCCCTTTGGAGTTTGAATTTTGTGTTAGTACATTCTCCTTTTTTAACTTGTCT... | iYIR002C_360653|1 | 1 | H3 |
ACCCAGAAGAAGTGAAAAGAAAAAGGAAATTAATAAATGATGTCGATGCAGCCCAAAAAAAACTAAGTGAAAGAAAAAAGCATAACAGTTGGGTGCCGAAGTGGTTGAAACCGAAGAAATCCAAATGGAAGGTCATGGTCGAAGAAGCTGTCGAAGAAGGAAGAGATATGCAAGACCTACCAGAAAATGACGTCAATAACAATGAAAACGAAAATCCAGATGAACATGAAGGGATAGCAAGGCAAAAACGCAGAGATGCTGCTCTTGTCGATCATGGGGCATTAATGCATGAGTTACAACTTATAAAACAAGCGATGCAC... | YFL034W_YFL034W_68389|1 | 1 | H3 |
GGGGTTACGACAATGCTCCAGGTATCTGGTCCGAAGAACAAATTAAAGAATGGACCAAGATTTTCAAGGCTATTCATGAGAATAAATCGTTCGCATGGGTCCAATTATGGGTTCTAGGTTGGGCTGCTTTCCCAGACACCCTTGCTAGGGATGGTTTGCGTTACGACTCCGCTTCTGACAACGTGTATATGAATGCAGAACAAGAAGAAAAGGCTAAGAAGGCTAACAACCCACAACACAGTATAACAAAGGATGAAATTAAGCAATACGTCAAAGAATACGTCCAAGCTGCCAAAAACTCCATTGCTGCTGGTGCCGAT... | YHR179W_OYE2_462988|1 | 1 | H3 |
TAGCAAATTAGGAAAACAACTTTAGCACGCCCAACTGCGTACTGAAACGCAGAAAACTATAGAGTTTCCCGAAACGGTGACAGCTGAGTTCGCTCAAATAAGAACAACACCATGTCTTGAAGCTTCTTCTATAGCGTAGAGTAAGTAGTGCCTTGTAGCCCTACCGTTTTACAATGCATGAGGTTACCCGCACTTATTATTTTTTTCTCTTTTTTTTTCTTAGCTATAAAAGGCAGATCAATGCAGCATTCATTGCTTTATTTGACTTTCCTTTACTATTTATTTACTTTCCATTTCTTATTTTAGTGTTATTTTATAAC... | iYGL089C_345913|0 | 0 | H3 |
TCATGACTCAAGCTTGCGATATGTGTTGGTGTCATGCACAAGACGCCACAAATGATAGAACAGAAAAGAAAGTGAACTAATCTTCCAAGACGAAGAAAACCAAAATCCGGGATGAGTTGAAAGTCAAAAAGACTGTATATATAAATTTCAACTTTTGTAGAAGATGCAGAAAAAGAAAATGATATGGTATGCAGAAAAAGAAATAAACCGCTATTATCCTCGCGGTTTGTCATTATAACAGGCAATTACACTAGAGAAAGCCGCACACCTCCCTCCGTTTCTTTTGCCCTGCGAGTTTTTCCGGAAAAGAAAAAAAAAAC... | iYBR105C_452171|0 | 0 | H3 |
ATTAAACGCACCACACTGTTGCCGTTTACTTTAAAGAGAAGCGCAAAGAGCGGCAATGGTGATAGCGAGTGCGCATACGCTCGCTTTCCAGGTCAAACTCATATTTTTGATTGGTAATAGCTGCGAAGTTGTTAGTTTATTAGAAATTGCAGCATTCTTACTCTTCCTGGAAGTCGTAGTGGAGGACAAAACCGATGACGTTGTTCTTGGAAATGCCTGTGCTGATAAAGAGGATGAGAATTTTTCTTCGATAGATCCATCTGCTTTGAAGTACGTGTAGTTGTCGAAAGGGGAAGGGAAAAGGTTAGCTTCTTTTAGCT... | YDR261C_EXG2_977731|1 | 1 | H3 |
ACGGTGCTGTTCCCTGGTTTTGTCTTGCTTGTGCCTGGTACTCATCGTCAAATTTTGGCATGTCCGCACTACCTGCAGTTTGTCTTTTAAGTTGTGTCATTATAGGTTTCAGGTCTGGCTCCACAGGTAAGCTTTCTCTTTCTAGTGGCCATTTCAATTCTGTGGGCAATTTAAAATCGCTGTAACCCACCACACATGCCATGGTGATCGACGTACCATGACGAATAACAACAGTATACTTGTCATTGTCATCCAGAGTCTGAATCACCACTAGCTCAGAGAGAAACGATGTTGAACCCTTCTCACTGGGAGAATCTTTA... | YCR076C_YCR076C_249852|1 | 1 | H3 |
CTAATAAAGGAAATATCAGAAAAGATGAAGCTACTGGTTTCTAATTCCAATGGAAGTAGTAGTAGTGGTAATTCAAGTTCAATTTACAACTCTCATTTAATGAATGACAAGAAGAAAAACAACAATGCCGGCTTAAACAAAAATATGTTGAAGAAAATTATAAAATAATTGATAGAGGGAAGAACTCCCTTTAACATTCTTATGACCATGATACATCGCACTTTTTTTTACTTGGATTGAGGAAATATGTATATATAGGCACGTATGTATATATACAAATGTTCTTTCTGAAACCATGTAAACAGAAAAGGAAAAAAAAT... | iYPR084W_708420|0 | 0 | H3 |
CTGGAAAAACAATGTATGCTGCCGACGGTGACTATTTAGAAACTTACAAGCAATTGGAAAAAATTTACCTTGATCCTAACGATCATCGTGTGAGAGCCATTGGTGTCTCAAATTTTTCCATTGAGTATTTGGAACGTCTCATTAAGGAATGCAGAGTTAAGCCAACGGTGAACCAAGTGGAAACTCACCCTCACTTACCACAAATGGAACTAAGAAAGTTCTGCTTTATGCACGACATTCTGTTAACAGCATACTCACCATTAGGTTCCCATGGCGCACCAAACTTGAAAATCCCACTAGTGAAAAAGCTTGCCGAAAAG... | YBR149W_ARA1_540657|1 | 1 | H3 |
GCGCAAGCCTAGGAATTGCTCTCGATGGCTTGGACAAGCCTCCGACTTCTTATCTGCTTGTTCACAATGAAGATTTTGAAAAAAAGTGGGACGTGTTAATGACTTCTACTTTTAGAAATAGAACTGTGCCATTAAATATAATACAGTATTTGATCTCTCATACTGATAGTAATACTGAGTTTAATCGAATGTTACGCTCCAATTTTGATGATTCATTACTACTTATTGAGAAATGTAAAAAGTTTATTAAGACCTTCGTGGATGTTTCCTGTTCTGTTAAAGATGTAGATTTCGGAAACGGTTTCAATTTACGTCATCTA... | YMR176W_ECM5_614461|0 | 0 | H3 |
TATCAGAGCTCACTCATGTAACAACAGCTCATACGTTGACAGTAACAAGATTACTCAAGGTTCCGGTACCAGATGTAGACAAGCTCAAGCCGCTGTTGCATTCCTCTACTTCTCTTGTGCCATCTTTTTGGCTAAGACCCTGATGTCTGTTTTCAACATGATCTCCAATGGTGCCTTTGGTTCTGGTTCTTTCTCCAAGAGAAGAAGAACTGGCCAAGTCGGTGTTCCAACCATTTCCCAAGTCTAATTGAAGCGCACCAACTTAAATTTTACGCCACTTTCAATTAAGAATATAATAAATGGACACCGTGAATAAATTA... | iYPR149W_830436|0 | 0 | H3 |
TGACTGTAGGTGGTACTTTTACATTTTTATGTGCGCTGGTATTTTTCTTCAACTTCTTAATGTTTATTCCGATGAAATATGGTATGAAGTGGAGGGAGGATAGATTATTGAAACAACAAAGACAGTCTTGGTTAAACACCTTGGCAGTCAAAGCCAAAAAGGGAACAAAAAGAGACCAGAACGATAATCATAATTAATTGGCATTCTTCAATTTGATAGACACTTATCCTGCATATTTTTTTTATAAACAGCTTATAGACTTTCATGTAAATTTTTCCTAATTAATGTATTATTTACTTCGTTAATTTTCCGTTGAATTA... | iYNL065W_505534|0 | 0 | H3 |
ATACAATTCGGCTTCAAATATGCGGGTGCACAATCTATCTTCAAATTTTCTGTTATGGTCTCAGAAGTTTGCAACTCAATCCCGTACAATTCTTGAATAGAATGCACGAACTTGATAACCTGAGTTAAAGTGAGACCTTCCTTAATACAGGTCAATTTAACACGTGGTTGTTCTTCACCAGCGTATAAGTAAACGGTTTCTAACTTGGACATGGTTTCCTTTCTTTGTACTTGGTATTTCTTCTTCTTTTTCTAGCTGTCTGTATGAGTACTTAACTGTAACAGAATAGTAATGGGTTCAGGCTAAATCTCTTTAATTAT... | iYKR011C_462372|0 | 0 | H3 |
TCAGCGGCAACAGATGTTTCGACAGTAGCAACAGTTGAGGCAGGAGCACGTTCACCCTCTCCAGTCCATACGAAAGTAGCATCCTGGGTGACGGTCTTTGTGTAGACATGTCCATTTTTGGTGGCTGTAATAGTGGTGGTTATACTCGAACCAGACCCTTCATCAGCAGAGGTGCCCTGAGCAGAAGTGGAAGGCTCAACAATAGATGTTTCAGCGGCAACAGATGTTTCAGCGGCAGCAGACGTTGCAGCGACAGTAGACGCTTCAGTTACTGCGGAGGTTACCGCAGCGCCTTCTCCTGCCCAGACAAAAGTAGCATC... | YDR534C_FIT1_1504697|1 | 1 | H3 |
TACCACCATTATGTCTTGTTGGAGAGTTAGCATCCCCGACTCTTGCCCTACCAGAACCCTTTTGGGGCATCAATTTACGTCTGGAAAACCCGTTTTCACTTCTCCCGGGAGGATTTGAGGCGCCTACTCTTCTATTATCGTTTTCATAAACTACGGCTCTCCATAGTATATCACGCCTTAATGGAGCAGCTACAGTTGATGTGGGGACCGGCACAAATGTCAATGGTTCTAATGATGGAAAGGAGCGAACGGTAACTAAGGCGTATTTGGGAGGAATGGCTGCGTTTGGCAGAGGGTTTAACGTGGATTCTGCATGTGCG... | YML025C_YML6_225131|1 | 1 | H3 |
ACAGTAAATCCACCTCACGGAGAACGATAAAAGTCTTCATTTCTCCATGAGGTGGAAACGTTGTTTGAATATTAAACATAATACTTGAACTTCTTGTGTAATTTTTATTTTTTAATCGAAGTCTACTCCGAAGTCTCCTATAAAGCGTAAGGCGCGTCTCCAAAGTGCTATTTTTCAAGATCAATTCAGCTTCAAAGAAGCTTTGATACCTACGGGGCAATGCAAAGGCTGCCAGTCGAAACAAACAGCGAAAGCTAGCCCATAATTTCTTGAGCTGTTGATTTTGATTGTGATATTTTTCATACTTTAGAAGAAGATAT... | iYDR135C_728183|1 | 1 | H3 |
GGTAAATCATTGAAGGAGTCAAGAGCTTTATTACTTGGAATGATGTAATAAGAATGATCGTAAATGCCGCTCAAGTAAGACGTCTCAATTTTACCCCCATTTTGAACAACAATTTTGGTTAGTATTGAACGGTGATTGCCAGGAAAAATATGATGAATGATAAACGCACAATTTTTGAAAATCAAAGTACTATTATCACTTAAATCTTCTTGTTTGTTATTTATCGATAATGGTGACTGCCCTAGTACTGAAAAATTTGTCTTACTATGCTGCTGTAGAGACATTGCCTTGTCCCATATTTTTTCACCCTTTGGTTTAAA... | YJL090C_DPB11_263840|0 | 0 | H3 |
GTTCTGCCCAGCAGGTGTCTACGAATTCGTCAAAGATGAAAAATCGCCTGTGGGTACGAGACTGCAAATCAATTCACAAAATTGTATCCATTGCAAGACCTGTGATATTAAGGCGCCCAGACAAGATATTACTTGGAAAGTACCCGAAGGTGGAGATGGACCAAAGTACACGCTAACTTAATCACAGGATTCATTATTTATTTAATTATTTTATTCATTATTTATTTAAATTTTTCTATTCATTCTTTTATATAATCTATATTATTTATTCACGTAAAAGAGTTCTTTTCAGCCGACAAACTTTTCAGCTTCAATGAACC... | iYOR356W_1009179|1 | 1 | H3 |
TCAATACTTAGACTTGTTCACCTTGAGTGATCCTGAAAAAAAAAAATGATGGCCGCGTTTCGAAGAAAATAAAGAAAAAATAAAGCAAACCTCAAACAAGAAAACAAAGCGGAAAATGGTAAACAAGAGAGATAGAACAGAGTAGTAGGCTATTAGCAAAAGCGCGAGAATTACTACATTATAAAGGATCTGTCAAGAATGACATTGAATAGGAAGTGCGTAGTGATACATAACGGATCGCACAGAACGGTGGCTGGATTTAGTAACGTTGAATTGCCACAATGCATTATACCCTCTAGCTATATTAAAAGAACAGATGA... | YPR034W_ARP7_639571|1 | 1 | H3 |
CACTAGCGTATCGGTATTCAAAAAAAAGGAATGGAGAAGAAAGTGACCAGGATGATGAAGAGTGTGATGAAAAGTCTAGGGGCGAAGGCCATTCAGATCAACCACAAAACCCAAAACCAGAGAGCTTTACAGCCACAAAGGAAGAAAAGGCTCTGGTAAATCAATTAGTTGCGTTGCATAATTCACTACATGTGAAAGGCGTATCTTTGTATGGGATTGGCTACGGTAAAGTGCATCCTGAAAATGCTAACGGAAGCCACGGCGAACCGGGACTCTCAAACTGGGCCAATACATGGTGTGGATTATTAGACTACATATTT... | YML118W_NGL3_33556|1 | 1 | H3 |
TGGAAACGGGGATGGCCATCATGGCTCTTACAATATGAGTTCGTCAGATAGAAAGAGACTAATGGAGGAAACTGGAAACAATGGAAACTTCTCCAATAAGAAATTCAAAAGAGACTCAGAGCTTCCAACAGAGGTTCTTGATTTATTGAGCGTTATACCAAAACGTCAATATTTTAATACAAATTTACTCGATGCGCAGAAATTGGTGAATTTTTTAAATGATCAAGTAGAGATTCCAACAGTTGAGAGCACCAAGTCAGGTTAACATTACGTTAATAAATAGGTATATATGAATATTTATACCAACACATCTATTATAA... | iYMR061W_394772|0 | 0 | H3 |
TCCGTTTGTCAGGATCTTTTTGTAACAGCTTCTTCAATAAATCTTTGGCATCAGTGTATTCTTCCATTGTGATTCCCGAAGTGGCACCATTCAACATTTCCTCATAAGATGGAAATTCTAAGGGTTTATTGATAATACTGTCGAAAAGTTCTAATCCTGAATTGGCGTTGAAAGGCAACTTTCCAAACAACAAGCAGTATATAGTGACCCCCAATGACCAGATATCGATGGCAGATGAGCAGGAATACTCTTTTTCAGTGGAGCAAAGCTCAGGTGCAAAAAAAGCAGGGGTCCCTAAGGCTCTAGATTTTAAAAGTTGT... | YGL179C_TOS3_164348|1 | 1 | H3 |
AGGTCCAAGTACCGAAAGTTCTTGCACTCAAATGGGTTGTTTCAGTGGGTTTTCTTTCGTAGACTTTACGTGTCAATTCTAAACCAGAAACGTAAGTCTGGATAGAATTGAAGACTGATACAATGGAAATGAAAAGTAACCATTTTGGTAAGTAACCTTTTGGCATTGCTGCCAAGGTGGTCTTGGTTGTAGTTATTACGTCTTGTAGGCTGAACATTATCTGATACTTTAGTGTATTTGAATCCTTCGATACTATGGTCTAGCTAGTGTCAAGAATATAAAAACCTGAAATTAGTCAATGCGTTTACGTCAAAAATAGT... | iYER044C_238048|0 | 0 | H3 |
GAGAATTGGCGGATGATGGGTAACTGTGAGAGGAAGTGTCTGGATTTGATCGTATTTGTTAAATTCATCAACACCTTTTATGCAGAGGTGTAGTAGTGTAGGTAGCCCTTAGTAATTGGTGTTTATTATTGTCTATATAGAAATTTCTTTTTAGTGTAGCTTGTACTCTTTCCCTGCATTATTTTTGTTTTATTGACTTTTGTCTTGGTGAATCAAAAAATTAACGAAACGAACAAATTTAAAATGGCAAGATACGGTGCTACTTCAACCAACCCAGCTAAATCTGCGTCAGCTCGTGGTTCTTATTTGCGTGTTTCTTT... | YJL177W_RPL17B_90789|0 | 0 | H3 |
TACTTTAAACAGCTTTCTTGCTTTTAATTTTCTTTCTCTATTAATAAGTTTAGAAGCCATATCTACAGGGTGCAACTCCCCTTTTGCTATCATTTCATGAACTTTGTGATCAATATTTTTAGTATTAATGACATCTTTTCGTACCCCTGATCCCCTTTCAACAACGGAACCAATACATTTCATAATTTTAATAATTTCAGCGTGGGAATTTGTGACTGCTTTAGGAATATTATTTAAAGTTGTAATCTGATTTGATAGAGGACAAAACACTCTTTGGTATTGAAACGCATAATTGGCGAACTCAACCTGCTGCTTGAACG... | YDR263C_DIN7_994653|1 | 1 | H3 |
AATGCATTTGGAGGAAGCGCCACCACCGGAGGGGGCCTTTTCGGTAACAAACCTAACAATACGGCGAACACTGGGGGCGGGTTATTTGGCGCTAATTCGAACAGTAATTCTGGCAGTTTGTTTGGTTCCAACAATGCACAGACGAGTCGTGGTTTGTTTGGTAATAATAACACTAATAATATCAATAATAGTAGTAGTGGCATGAATAATGCAAGCGCTGGACTATTTGGCTCTAAACCTGCAGGAGGCACTTCTTTGTTCGGTAATACAAGCACCTCTTCGGCCCCTGCGCAGAACCAGGGCATGTTTGGTGCAAAACC... | YGL172W_NUP49_181175|1 | 1 | H3 |
AATCAGAGGAAAAGGCCTTCTTGTTTGGAGTAGCAATGGAAATACCATTTTCGACAAACTTAGTGTAAAAACCAGCAATGTAAGCGCTGGAAGTGTTATCAACCAAAATGACTGGCTTAGGTGAAGTCTTCAAATGAGCAATTAAATCATCCAAAGGCAACGTTTTAGTAGTGGAGGCTGCTAAAGCAGCCTTCCAATCAGAACCAACATTTAATGGAGAAAAGTCCTTGGAGATTAAAGAACGCTCAGCTTCAGCCAAAAGAACTAGATTGTAAGTAATGGTAGACTTCATGGCTAACAATTGATCCAAGAAAGCTGAA... | YJR139C_HOM6_690099|1 | 1 | H3 |
AAAAAGTGGTCGCTTTTTGCAACCTCAGAATCACACACTTCACTCATTATTCGTTTGTAATCATCTATTCTTTGATTTCTATTGTTGACACCTTCATTTGCATTTAAATGAGCACAAATATAGCTAAATCTTTCCCAGTTCTCCTCGCCATTGCGTGTCATCTGAAAACTTATCAAAGTTCCACCTTTCAAGTGTGTACCAAACCAACCGCATTTCCCATTCCGTTTCAAAATATCATCTTTTACCTTTAGCGCATTATTATTGTATAATACTATTATGGTAATCGCCCCAAGACTATTCACCCCTAAACATGAATACTG... | YOL065C_INP54_205535|1 | 1 | H3 |
TACATTTGGCCTTATAGAGTGTGGTCGTGGCGGAGGTTGTTTATCTTTCGAGTACTGAATGTTGTCAGTATAGCTATCCTATTTGAAACTCCCCATCGTCTTGCTCTTGTTCCCAATGTTTGTTTATACACTCATATGGCTATACCCTTATCTACTTGCCTCTTTTGTTTATGTCTATGTATTTGTATAAAATATGATATTACTCAGACTCAAGCAAACAATCAAAGAAATCTTTCACTGCTCTTTTCTGTGTTCCATTTAGTTTTTAGTACGATTGCATTGTCTATATACTGTATTTACCAAATCTTAATTTTAGTCAA... | iYCL066W_14042|0 | 0 | H3 |
CATTTTTACTTAAAAATTGATGGTCCATCGTAAACAGCGTTAATAATTTCTTCGCCCGATGCATAGACAATGAGAAGCCAAAATGCAGAAACTTGATACTTTCAAAGAAAGAGGTTCAACTATGAAAAGAAAGAAGATTTAAGTGCCGATAGAGTATACTAGCATGGCACTTGTCAAATACAGCACAGTTTTTTTTCCACTCCGATCATTGCGACTGTTTGTGTCCATCAAGAAAGCATATTACCACAGCGAGCCGCATAGCATCGATCTATTTCATGATAAGGATTGGATTGTGAAAAGACCTAAATTCCTAAATTTAC... | YPL029W_SUV3_495588|1 | 1 | H3 |
AGCGTTTTCTTCCAATTTACTATCCAAGTCAATTACCTTAAGCCCTACAGTGAATTACAATGATAAGACCATCAGCGGAAACTAAGTAGTTCATATAAATGCATTTTTTCATAAATGTGATAAAGGGAAAATTTCTTTTTCCACTTGAAAGATATGTACGTATTTAAGAACTGATCTTCTTGACCACTTTTTTTATGCGTGCAACTTTAAATGCTTTCATACATTTCCTTGGTCTTTCATCAGATGATATGCCAGATATCAACAATTATAGACCTAATATAACATTGAGTTATTTTTTACACGTATAATAATAACTCCTT... | iYPL120W_323908|0 | 0 | H3 |
TGACTCCGGTTTCCAAATTTTGAAGCCGATGGACGGATGCATCAATCCGGGGGAACTACTTGTTGTGCTTGGACGACCCGGTGCAGGATGTACTACGCTGCTGAAATCTATATCTGTAAATACACACGGATTCAAGATTTCTCCGGACACAATCATCACGTACAATGGATTCTCCAACAAAGAGATCAAAAACCATTACCGTGGTGAAGTGGTCTACAATGCAGAATCAGACATTCACATCCCGCACTTGACAGTATTCCAAACTTTATACACAGTGGCAAGACTGAAGACACCAAGGAACCGAATCAAGGGTGTCGATA... | YOR328W_PDR10_932611|1 | 1 | H3 |
TTTTCCCCCTGCTTGTTCCATTAAGAAAGCCATTGGGAAGGCCTCATAAAGCAACCTCAGTTTTCCGTTGGGGCTCTTCTTGTCGCAAGGGTATGCGAAAAGGCCACCGTAAAGAAACGTCCTGTGAACATCAGCAACCATGGATCCAACATACCTAGCCGAGAAAGGCTTGTTGTTGTTGTCTGCTTGGGGTTGTTTGACTTTCTCAATAAATGTTCTTATAGTCTCGTTCCAGTAGAGGGTGTTACCTTCATTAATTGAGTAGATGGCCTTTTGAGGCGGAATTCTTAAGTTAGGATGAGTCAAGATGAATTCGCCCA... | YLR377C_FBP1_874123|1 | 1 | H3 |
GTTATCGCAGATAGTTGTTGCAAAGATAGCGGCGTAGGTGGCCGCGAAATGGGGAATTCCAAAACAAACGGTTTTTTTACTCCTGAGAAATACTTGTACGGGATAATCCAGGGCCTACCACCCACGCTTCGAGGATTGGCTTTTATTTTTTTTTTTTTGGTGGCGTTTTATTTCTTTCCCGCTTTCTGGGACTTGTGCGGAGTTTTGAGAGGGGCGCGCGGCAAAGGATTCCCAAAACGGAAATCAGACGCCAATAGCCAGCACTCAAAGCAGTTCTGGACCCATTCCGATTTTCCCATTTGGTTCTTGCGCGTGCTGAT... | iYLR110C_370593|0 | 0 | H3 |
TCCTAAAAAAGAGTTAAAACTACTTTCTCTCATCGTGCCCAGCGCACTGATCTCCTTATTACATCGGCGTAGCTCTTCCGCTCTTGCATCTGAAGTCATAGGGCTGCCTGTCTCGGGTAATAAAGCGCTACCATCTTGTTTTTTTGAGTTTACCAGAAATTCCTTGCGTTGTACATTTATAGTCTGTGGGCTTGCCATAGACTTGCTCGTTATAGCTGAGGATGCCATAGAATTCCCTTTCTGAACCAAATAATCCGCCAATAAGCCATTATGTGGCAATATTCCCTCTTCTCCCAAAGATGAAACTGAACTGTACTTTT... | YLR014C_PPR1_174562|1 | 1 | H3 |
TATCGATGCGTCACAGTAGCAGATCATCTCTGACACTTGTTTCCCCATTTTTTTTTTTCATTTTTTAAAGGGTTTCTCTACAGCCTACAGGCCTCCCCTAATAAGTCAGCCCCTCCCTTTGGAGTGCGCTGTTGACCTGCGTATATAAGAGGTATATCAGTGCCAGTAGGTAAACCCATCTTGCGGGGATTGTACCAGGAACATAGTAGAAAGACAAAAACAACCACCGTACTTGCCATTCGTATAGATGCTGCCCAGACTTGGTTTTGCGAGGACTGCTAGGTCCATACACCGTTTCAAGATGACCCAGATCTCTAAAC... | YDL085W_NDE2_303212|1 | 1 | H3 |
TTTTCAGCGTTTTGCTCCATCAATAATGTTTAGAAGTTGCCTTTTTTTCCTATTCTGAGTTCATTAATGTTTCATTACTGCTCATCTCATCACTACACATTCGAACTAAATTCTTCATAACTTTAACTAGATCTTATGAAACTTTTCAAAAATGGTGCACGGGCTTTCTGTATATTGAACTTTCCGTCGCTTTTCGGAGCTACAATTCCCCTACCCGGCTTGTCGGCATTAAGTCTCTTCATCCCGTGTACTATTACCCGACAGTTTCTGCGGCATGTTCTGTGGAACGACGGTACGGGCGGTGACTGGGTGCCAAACAG... | iYLR401C_924676|0 | 0 | H3 |
GGTCATGAGGAAAGAAAAATATGCAGAGGGGTGTAAAAGTAGGATGTAATCCAACTATAGTTTGCTTTCAATGTTTTTGACCAATTCCTTGTATTTCTCAGTAGAATAGGACTTTGGCCTTTCAATGGAAGCACCGATGGCCCTATCAGTGATCAATTGAGCAAGAATACCAAATGCCCTTGAAACGCCAAATAAAACGGTATAGAAAGAAGATTCTTTTAGTCCATAATATTGTAATAAGACACCAGAGTGAGCATCTACATTTGGCCATGGATTTTTGGTTTTACCATGTTCAGTCAATACGCCAGGTGCTACCTCGT... | YCR005C_CIT2_121135|0 | 0 | H3 |
ATTCATTCCCTTAAAATCACGAGTTCTAATATCAAACATTTAATTGTCTCGCAAAATGGTGAAAGATTAGCTATTAACTGCTCCGATAGAACAATAAGACAATACGAAATAAGTATTGATGATGAAAACTCTGCGGTTGAGTTGACCTTAGAGCATAAGTACCAGGATGTGATTAATAAATTACAGTGGAACTGTATCCTCTTTAGTAATAATACTGCCGAATACTTAGTCGCTTCTACACATGGTTCTTCTGCACATGAACTATACATCTGGGAAACGACTAGTGGAACGTTGGTGAGAGTCCTGGAAGGGGCTGAAGA... | YAR003W_SWD1_155873|0 | 0 | H3 |
AAAAGGGCTCCAGGGATGCCCTTCTACAAAGCATTTCACATGCTAATAAAAGCCCCGTTACATAATAACATCTCCCGCCCACAAAAGCGTTTGATCCTGTTTTTCTATTTGCTTCCATAAAAAATCGGATCACATTTCCTTTTCTTCCGCCTTAGATCTGCACCGAGATCTTTTGCGCTCTATTTTACCCCTAGAAACTGGACCCGGCCTAGAATCTTTCACTGCCACCAGAGAGTTTTAGCCTCGCTCTCTTTTAATTTGCTGAAAAAGGGTTTCCTGATAGAAGCCGCGGTTAATAGAAAAACGGCAAAAAAGTTATG... | iYBR085C-A_419811|0 | 0 | H3 |
TTCCTCTTACGGCACTGCACGCATGCAGGCGGTTTTTTCATTTTTCTACCACGAATATCCATCGTTGTTTTTTGGTTCTGAAATGCTATTTTTGTAGATAAGAGCTTAAGAAAATATCACAGAAACCTCGATTTTAAATATCTGCGTAGTATGGGGCAATTGCAGCAGAGCACCAATGACTTTGGTTTGAAAGCCGTGAGTTTTGATGAAATAATCTAATGTAAGCAGCTAATATATCTTCTCGGGCGAAAACGGGAAAGTGGTCTTACAAGCGTCTAACGGGCCGAATGAATATTGGTGAACTATGGCTATTTGCGAGA... | iYDR303C_1071565|1 | 1 | H3 |
CCTATCCACGCCAATTTACTGTGGCAGAAGTTACTGCAGTGGAAGAACTACCTTTTCCAGCATAGCCTCCGCCACCAAAAAATGCAGGTGCTGCGTTTTCTGGATACGTGGCTGGCCTTTAAAACGCATGGGAGGGACAGTATCTCACCGATTTGGTAATGCCGTGATGCAACCTCGAGGATCCGGTTTTCAAGAGCGGAAGTAAAAAACCGGGAAGATCTAATATTTCCTTATGCAGGTATTCTGTGACTAAAAAGTATAGAAGACTATTTTGCACCGCTCCCTGATTAGACTTCGTGCGATATTATTGATGGATATCG... | iYKL045W_355360|0 | 0 | H3 |
AGACGATAGTGGATTTTTATTCCAACACATATAAAGTTTAGAAAGATGACATAAAGATTAACTTAATACATGACTGGGTATACTTGAATTCTAAAATTTTCAACAAATAAGTGGTTGTTTGGCCGAGCGGTCTAAGGCGCCTGATTCAAGAAATATCTTGACCGCAGTTAACTGTGGGAATACTCAGGTATCGTAAGATGCAAGAGTTCGAATCTCTTAGCAACCATATTTTATATTTTTTTATATTGTGTTCAACAATGAACAATAATGACGAATAATCCAGAATGATATAAACGTCCATGAATAAAAGTCCGTTATAA... | itL(CAA)K_458332|0 | 0 | H3 |
TTTCACCCATGATACCCAATTTCAAGAGAAGCAATTGCTACATATAATTATTTAGGCTTTACTATCTACTACTCATTGACTGTGCCCTTTTACACAATTATAACAAATATGTCAAAGCAGATGCCATGAACTTTGTATCTGAATTTTTGATTTCCTTTTAATTCTAATTGCAGACGACGTAAATATAGTTCTGAATTTCAAAGTCACTGTTAATTAATTGTTCTAATTGTTTGGTTTTTTTAATATAAATCACTAGTGCTTAAGTTCTGTTGACGCACACAGTACCTATCTTTGATTCCTTCGTGCAAACAGTATTCCGG... | iYCL055W_30197|0 | 0 | H3 |
TTACACCCAGAGAAACTCTGAAGGCTCTAGCCCATGTTATAATGACGACCAAATACGACAACAACAGCAACCTTTACAAATGCAACCTTTATCAAGAACTTCAAGTTCAAGTGTTAATGTCACAGCGATGAGAAGTACATCTGCAGGTAATTCAATTACAGCGAACGCTCCCGTTGTACCTAAGGTGATGGTCAATAACCAAAATGTTAAAACTGTTGCTGCCGATCAGTCTGCTACAGCACCTTCTTCACCTACCATGAATTCGTCCGTCACAACGATCAACCGCGAATCACCATACCAAACCTTAAAGAAAACGAACA... | YGR097W_ASK10_681758|1 | 1 | H3 |
AAACTAGTATTGTATGGTAGATCTCTAGGTGGTGCTAATGCTCTTTACATTGCTTCAAAATTTCGGGATCTATGCGATGGCGTTATACTGGAGAATACATTTTTGAGTATTCGAAAAGTTATCCCATATATTTTCCCCCTTTTGAAACGCTTTACGCTTTTATGTCACGAAATCTGGAATTCAGAAGGCCTTATGGGAAGCTGTAGTTCAGAGACGCCATTCTTGTTTTTAAGTGGATTAAAAGATGAAATCGTTCCACCCTTCCACATGAGGAAACTGTACGAAACATGTCCTAGTTCGAACAAAAAGATTTTCGAATT... | YNL320W_YNL320W_38395|1 | 1 | H3 |
ATATCTTGGTAAACTACAATTAGATCTTAAAACGGTTCCTGAAAAAGTGCTTTCATCTACCATTGAATTTAATTCACTTCCCTTTATGGAATTTGCACTGGTTTTGTGCTGTGGGATGTGGGTGTTTTGAAATATTTTCAGCTTGCAAGTCGTGCATATTGCACTTGTTTCTTATTTTTGGCGAGTGTAAACGTAATAGGTAATCTTAGTATTGGGCCATCGTCGCCCAGATAGCCATTGATAAAGCAGCTAGGTTCGTCGTCCGGTTGTTGGTATACACCTGTGCGTGCGTAAGTGTAGAAAAAAATAGAAATGACTAA... | iYOR268C_826319|1 | 1 | H3 |
AAACCAGCAGTATCCGCCGAGAAGGTTTCGCAGCAAGGACAAGTGCCCACTAGGAGGACTCGTTCTCACTCAGTATCGTACGGCCTACTCCAGAAGAAGAATAATAACGATGACACCACAGATTCTCCTAAAATATCTCGAATTAGAACGGCACAGGATCAACCTGTTAAGGAAACAAAAAGCAGCACCCTTGCGGAACCCATCGTTTCAAAGAAGGGGAGAAGTCGTTCTTCTTCTATATCAACTTCTTTGAATGAACGATCTAAGAAATCTCTGTTCGGTTCTTTGTTCGGGAGAAGGCCTTCCACCACTCCCTCGCA... | YOR227W_YOR227W_763308|1 | 1 | H3 |
ATTCACTTGAAGCCTTCTTAAAACATCATATATATCATTTGTGGGAGAATTGGTGCTTCCAGGGTAAAGCTCTGTTCTTCCATCGTTCAAATTTTGCAAGTTCATTAGGTATAAAAGGTAATCCGCTTGAAATATTGGATCTACAAGGTTATTTTTAACACTACCACACAACGCCAGTCCTAGCATTCTATTATGATCTGAAACGCGACCGGGAACACCGATTCCTGGAGTATCTATCAAGTATATTTCATTTCTTGATTCCGTATTCCGTGAAGTTACTCTAATAACCTCGCTTGTGGCTCTAGTAACACCAGCTTCTG... | YMR097C_MTG1_459882|1 | 1 | H3 |
CCTGCCGCTGATGTGGCACAATATCGTTCGAGGAGTGTTCGGGAACCCAACCTTTTCGCTCAAAACACGCTCTTTCCGCTTTTGAGACTTAGTCAAAGGCGTAAAGCCGTCTTCTATGGGAGGATTGAATTTTATTGTGGGAGTAGCGCCCTCAGACGACTTTTTTTTGTTTGGAGGAAACAGTATTGTTGGGTAATTGAAGAAATCGCGCTTTTGTTCTAATCGATGATCTATCTGATTTTTCTTCTTGATCCATAGTTGTCTCCAATTGGCATACGCCCTACTAATCACAAGAGCGTCATTGATATCGGTGGGTGTAG... | YDR475C_JIP4_1409489|1 | 1 | H3 |
TACGCCTCTTGTTATTATTTTTACTCTTTTAGCTTATGGCACAAAAGTACACAAATAGATACTAAAAGGCGAAATTTATTTCACTTGATTTACCCGTAACTGAATACTTGAGTTTGTCTATATATATATATGTATACCTTTTATTGTTAAAAAAAGAATATGCCAACAAAATCTATAATAAATAGTCCATGTACTCCACTATAAATTAATTTGCTATTGTCAGTAATGGCGCGATTAAAAATAAATAGGAAAAGAAATCTCTCCTCAAAAAATCTTATGTCCCCTAATTTCCGTATGTTGTGATTTTGGTATTCAGAGAT... | iYIL107C_166055|0 | 0 | H3 |
CGTGATTCCCCTAAACTCAGGCCCTTCTCCAAGGAGAATGCAACGGTAGGCAGTTCGTTTGAAATTGAAACTTTTCTCAACGACAATGATGTCCCGCAGCATAAGAAAACATGAGATCAGAAAAGCGTTGCAAAACGCAGAAGCCACTTATCAAGAGCTACGGCTGCGCGTTTTTCTGGGACGTGCTGACAAACGAGAAAAAAGATGGCTTTGTATTCAGTTCAAGGCCCTGTGTATTTTTTTCCTTCAAGAAATTTTTATTTTGGCCTGGGGTCTTTAAACGGACGGGTGCCCTGAGAGTACTCTCGCATAATCCAGGG... | iYBR066C_371133|0 | 0 | H3 |
CGCTCAAGATTACTTTTTTAATTTTCATACTTTTTCTCCTAATATCTTGCAAATCATGACTGAATTGACATTAACTGATGCAAATGGGTGGGAAAAATGCTTATCAACCCCCTAAAAAGTCAACTTCTTCGTTAGTGATTGCTATATGTGGCGTCAAGAATTACAAGAAGCTGGCGAGCTTTCCAAGCCCCTTCAGGATTAACCGTTTTTTACCCTTTTTTTACAAATTTACGGCGGGGAATAATTCTGATACTTTCTCTTTATGAGCATTCCTCAACAGGGATCGGCACTCAATTGCTTTAGTGAATAGTTTTTTTTTA... | itG(UCC)G_779933|0 | 0 | H3 |
ATGATACTTGCCTCCCATAATTTGACTTAAGCCAACCTGGAAAACCAGATTCCAGAATGAAAATTTTGGTAAAGTCATCGGATATTGGTTTTTCAAAGGAATGATTCACCAGAATGTCCAACAGGACAGACTGCTGTTTAACATTGTATTCGTTTGCGTCTGTATAGAGAATGATAAACTTGAAAAGATTTCTACTTTGAAACATTTTAATCTCACTATTAGGAGAAGTAATTAATGATTTTTTCTCCAAATCATGATCTGAATATGACATTTTAAAAGAAATAGGCTCCAGGCATATAATATTTTTTGTATCAATATGT... | YDR069C_DOA4_586976|0 | 0 | H3 |
TAAAGAATTCTAGATGGAGCTCTGAAATGGAATGGACCACGGGTCTTGTTGAAAGCAGTAGCCTTTCTCAAAAAGTCGTGGTACTTCAACTTGTTTCTGAAGAATTCACCAGAAATGTTCAAAGCTTCAGCTCTGACAACGACAATCTTTTGACCGTTCAATACTTGCTTGGCAATAGTGGAGGCCAAACGACCCAACAAATGATCCTTAGCTATAATAAAAGGAGAAAAATAAAGTTAGTAAATGCTAATCTTGGTATTTTTTACTGAGTGGATTGAATATAGTTGGAGTCAAAATATTGATTAGAAGTTACCATTACT... | YNL069C_RPL16B_494560|0 | 0 | H3 |
GATAATGACTTTATTATCAATGATTTCAAGAAAATTGGAAGGGAAAACGAAAACGAAAACGAGGACGATTAGAATATATTAATATATAGATGTACACGTATATGCAGTAGTTTTATTTTTTTATCTATAATACAACTCAAGCACAAGAATGCTTTGTTTTCCTAGTGCTCATCCTGGGCCTAGGCGCCATAGTTATCCGATTTATCATCGGATTCAGCTTTAGTAAACTGAATGGGGCCGTGAGAACCACTGGCACCTTCACTCTTAACATTGACCGCTTCGTCCAGCTTTTCGTAGTTGGTCTTGTATATGCTTTCAAT... | YAL032C_PRP45_83407|1 | 1 | H3 |
TTTACAGCATTTGTTTGTATCTTCATTTCTCAGAATTCTAAGATCTTCTTCTTGAATGGTGCGCAATGTCCTTCTTCCTTTTCCAAGCTTCTAATGAGGTTATTTTCTTAGACTTTTGTGCAGTATCTAAATATTATTGGTTTTATTGCTTGTAACTTGCACTTTTCTGTAATTGCCGAAGTATCAAAGAAATTTCAACTCTCAAAGAAGATGCGCAAGAATGAAGACCAAATGAAAAATGTCATAAAAAAGCAGTATATTAGCTGTCTTGGGTTAGGGTCCTCGTTCAACAGATTCTTTTTGCTTAATGGTGCAAATGA... | iYIL093C_189005|1 | 1 | H3 |
TCGTACAGGCTTTACCGTCGATGTACTCACGGTATTCACCGAGGAAAATGTTTCTGTTATCAAAGAACGTATGGAATCCTTGATTAATGAAAAAATGTCACAACTGAATAAAATATCCAACATATTTAATGTCCATTTTATTGATGTTAACGAATTTTTCAACAATGCATCGGAAGTATCGACTTTTATCATTGACAATGAAAATTTTGAGATTTTCAGCAAGTCAAAGTCAGTAGATGATAGCAATATATTAACATTAAAAGAAATATTAGGCAAATACTGTCTCAATAATTCATCCAGGTCTGATTTAATATCGATTA... | YNL119W_NCS2_401573|0 | 0 | H3 |
CTTTTATCTATTCAAATTGTTTCCCTTAGGTATATATATATATATATATATATATATATATTTTCCCTGTATATATCTATGTAAATGACGAAAACGCATGACATTTTAAAACCTACCCCGGGTTTGACCACAACCCACCGTTCATCTAATATTAACCCGCACCCACCAAGGAAAGAGGCAATAATTCGGGAATACTGCTCTAAGACCTTGTTTTTTCTTCTGTCAGGTGAAAGTATATGACCAAAATCTACATCCCTGCCCTAACAAACGTCATAGAAGTAGGAAATAAATGATGAGCCCCGCAAAATGACGCCGCACGG... | iYDR040C_538947|0 | 0 | H3 |
AACTACAGGCAAGTACAATAGCCTCCTCCAGCTTTTGACTCTGAAAAAAAGGTAACGAAAGAAACTTTTCACACAGATAGGAACTATTCTAATATTACAGTAAAGCCAGCTATGGAATAACAGGTGGCATGTGCCCACTGTATTCATGTATGAGCGGCCTTTTTTTTCCACGAAAACAGATATGATGATGTTTTAGGCCAAAATGGTGAGATCCTTTTACAGGAAGATTTTACGCCAACTCTTTCCATTTAGTGACACAGGTGAGGGTCTTTTATCGGCGCAAGGTGTTTCAGATGAGCCATTTTGCCCGTAGTCCGGTT... | iYBR082C_408580|0 | 0 | H3 |
TGGACGTGTCAATGCTCTTTCTAGGTCGTGTTCATCTTTCAAAGTCGCTTCAAAGTACAATTGAGTTCTTAGGATCAAATTATAGATTGATTGTTCATCCCTTAAACGGATCAAATAATCACTGGAATGAGGATCGATGTTTAACAGGGATTTCATGAATTCGTCATCTAATCTTTCAACAAATGAGAAAATGGAACCCAGAATCCTCTTGACACCATCAGAATCTTCTTTAGGTTCATCTTCAATGAAATCGATTGGATCAGCAAATTCATTAACTTGGTAGGTGTCAATTGTCTGGTCTAAAATAGACAATAATTTAC... | YMR309C_NIP1_894380|0 | 0 | H3 |
TTAGTTCCTATTCTAAACAGATCTTTAAACCTATCGTCATCATCTCAAGAGCAATTAAGACAAACCGTTATTGTCGTGGAGAATTTGACAAGATTGGTGAATAATCGTAATGAAATTGAAAGCTTCATCCCTCTACTACTACCCGGTATCCAAAAGGTTGTTGATACTGCGTCCTTACCTGAAGTTCGTGAATTGGCTGAAAAGGCCCTTAACGTTCTAAAAGAAGATGACGAAGCTGATAAGGAAAACAAATTCTCAGGCAGACTGACTTTAGAAGAAGGTAGGGATTTCTTACTTGATCACCTCAAGGACATTAAAGC... | YPL226W_NEW1_123171|0 | 0 | H3 |
AACACACACTATTTAAAAATCCTTGTCCTGTATAAACTACCGTACCGAGCCATTGGTCATCTTTAGACCACAAGCGAACTGTTCCATCCCTCGAAACACTAGCAACCTTTGAATCATCCACAGCTACCACATCCCTGACGTCCTGATCGTGCCCTTTAAGTGTTGCACTCAATTGATATCCCATACTCCAAATCTGCTGCTCTACACCTTACTATCACACATGAATATATATATATAAAATAAGCCAAGACAGTGGCCTTCCCTTATTATCAGCGTACTAAAATCTCATATGATTTATTTTTCGTGGTCCTGAACGAGTG... | iYKL213C_34174|1 | 1 | H3 |
CGTATAGTCTCAAGAGGAATAGGTGTCATTTTTGGTGCTTCTCCATACTTTATGTCAAGGATAAAGACATACTTGGGTTGTGCCTCAGCCTCACAAAGTGAAGTAGCTACAGATGAACCTGGTTGTAATACATCAAAATTTTTAATTGGATTGTGTACGAGATTCGGAATACACTCATGTTCATGACCCCATATCACCATATCCAGGAAATCTGGCAAGAACTGTTCAGGTAAAAATGCAGTATTCGTGTGACCTGTATGATTTTGATGGACGCACATTAAATTAAACCATTCACCTTCTCGCATAGTCGGTACTTCAAA... | YMR224C_MRE11_719997|1 | 1 | H3 |
TAAAAACACACGTGTATGTTTTTTTCCGCACGCATGAGCAGAATTAGACGGTTTTATAAACGTTCGCTCTGTGTATTTCACGTACCTTCCCGGATACAAATTTCGTCATGGTTTATATTGGAAGTTTTTCCGAAATTGCGGGTTTTGAAGGTTTATCCGAGACGACGTGAGGGGTAGGCAGGAGAAGGAAGGCTGTTTACGTACGCAAGGATGGGGTACGACAATGATTCAATTTCTGTTGCTAAGTTTTTTGTTATTTTTTCATATTGGCCTACTCATCTTTGATATCATTATTTTTTCGAATCTACATTAGAGCGTGT... | iYGL210W_94976|0 | 0 | H3 |
CAGAAAGAGAGCTGTCGACAACATTCCAGTTGGTCCAAACTTCGACGACGAAGAAGACGAAGGTGAAGACTTATGTAACTGTGAATTCTCTTTGGCTTATGGTGCTAAAATCTTGTTGAACAAGACCCAATTAAGATTGAAGAGAGCCAGAAGATATGGTATCTGTGGTCCAAACGGTTGTGGTAAGTCCACTTTAATGAGAGCTATTGCCAACGGTCAAGTTGATGGTTTCCCAACCCAAGAAGAATGTAGAACCGTCTACGTCGAACACGACATTGATGGTACTCACTCTGACACTTCCGTCTTGGATTTCGTTTTCG... | YLR249W_YEF3_638249|0 | 0 | H3 |
ATCTTTTGCAGTAAATAAAGTGTCCAATGGCAAGCCTTCAACAGGTTGGCAACGATCTAGAAAATCTGGTCTTAGTCTTCCAATCCAATTCTTGATGAAGTTTGTAAAGAAACTCGTACTGAACCAAGCGAGTGATAAACCAAGGAGAGATGTGTACAAAATAAAAATCAAATGTCTTCTATCGGCCAAAATGGAACCAATTATCAATATGGTTAAAGATGGCACGACAAAACTATAAACAAACAACATGTTGTTATTTACACGTTCAGTTGTCGCATAAGGATGCGATATAGTGAGATCGTTAATGTAAAACTGACGTT... | YDR284C_DPP1_1031216|1 | 1 | H3 |
TAAGTACTGAAAAGATAGCGTCGTTTTATACCCAGTATTTCAGGAACTCTAATTCGGTAGTCGTGAATCTTTGCTCACCAACTACAGCAGCAGTAGCAACAAAGAAGGCCGCAATTGATTTGTATATACGAAACAATACAATACTACTACAGAAATTCGTTGGACAGTACTTGCAGATGGGCAAAAAGATAAAAACATCTTTAACACAGGCACAAACCGATACAATCCAATCACTGCCCCAGTTTTGTAATTCGAATGTCCTCAGTGGTGAGCCCTTGGTACAGTACCAGGCATTCAACGATCTGTTGGCACTCTTTAAG... | YAL037W_YAL037W_74419|1 | 1 | H3 |
GTTCGATATGGTTACAAATATGGTGAAATTGAGGAATTTAAGGAGATTGTATTGTTCATCCCGTCTCTTAAGAACTATACAGAACGGGAGAATTTCTTCTGTTTCTTCAATATCGTTGAGTAAGAAATATACTACAAAATCTGCCAAAGAAGGTGAAGAAAACGTAGAAAGGAAGCATGAGGAGGAAAAAAAGGATACATTAAAAAGTTCCAGTGTACCAACTTCACGAATATCGAGATTGTTCCATTATGGCTCACTGGCAGCAGGGGTGGGCATGAATGCCGCAGCAAAAGGCATATCGGAAGTTGCAAAGGGCAATT... | YGL119W_ABC1_284688|1 | 1 | H3 |
GCTAAGCCTTCTAGAAGGCCGGAAAAGGCTCAGAAATGCCTGCAGTGCAAAGCTTTTCAGGAACGTTACTAGCGCAAGTCTTCTGGCTGCTGTTTAGCAACCTCTTTGCAGGTATTTCGCAGCGGGATTTCGCAGGTGTCTTGCCTAGCGTATCAACACTTAGTAGCAGTTAGGTTCTAAAGTTAAACTCCACTTCCGTTCTTCTTCGAGTTCCTTTTTGAGCGGTTTGGCCTATCTTGTCTCATCATCTTGTTAGTGTGCGACGTTATCTGTGGGTGAGAGAGGCTATTTTTGGTCCGGTGAAAAAAAAATTTTTTCTC... | iYBL055C_117203|0 | 0 | H3 |
CGATGGCTACAACCAGGGCTACTCAGTATCCCCCACAAATAAACAGCAATAATTTTAATACTAATCAAGCATCTGTACCTCCACAAATGAGATCTAATCCACAACAGCCGCCTCAAGATAAACCAGCTGGCCAGTCAATTTGGTTGTAAGCAACATATATTGCTCAAAACGCACAAAAATAAACATATGTATATATAGACATACACACACACATATATATATATATATTATTATTATTATTTACATATACGTACACACAATTCCATATCGAGTTAATATATACAATTCTGGCCTTCTTACCTAAAAAGATGATAGCTAAA... | iYPL204W_165860|1 | 1 | H3 |
TGGGAATCCGCAATTTGGATAAACTCCTGCACACAAATTTAGAAGCTTTATGTACTTATAGTTTGTCGAAGAATTATTTTTAGTAAAACTGCTTGCCAAACTCATGGACTTATAGATTGCGGACACAATATCAAAGTAATATGTGCCTACTATATCCCTTTGTTGCGGTTCTAGATGTAAAAGGGTATATAGCAGATCTAATTTCGCATTTTTTAACTCTACACCGTCACTTTGCTCAATGGACAACAAACATTGAAAAATCTGGCACTTTGGAGCCAAAAAGTAATCTACCTTAGCCAACTGGTCTCCCACCAGGGAAT... | YIL017C_VID28_320850|1 | 1 | H3 |
CATATTTATAGAATATATAGGATAATTATCATCCCCTCAATCAAGTTGATATTCCGTTTTGACAACAGGTCACTTCTGCAAGGTTGCTTATATTAAAATTGTGAATAAGCATGATATTAGACGGACTCATAATTGAATGGTTATCAGTTAATTGACTCTCGGTAGCCAAGTTGGTTTAAGGCGCAAGACTGTAATTTACCACTACGAAATCTTGAGATCGGGCGTTCGACTCGCCCCCGGGAGATATTTTTTTACTTTTGATTACCTTAATTTTTGAAGATTCTCAATTTCATGTATTAGGGATTGCTTTAAGGTAGCTA... | itY(GUA)M2_838022|0 | 0 | H3 |
TGAATTTATTCCGGGAATATTCAAGTTATGTATATCTCTTTTCATATTCTTAAATACACATACTCATAATATCTTGTCGAAAATACGCGGTGTAGGGAGTTATGGTGGATAACTTTTTCACGATTAGAAGAAAAGGAAAATTTCATTATTCGTAGCTTAACATGGCAAAAACGAGAAAGACATATAATCAAAACGTGAGTTTCCTGTGGAAAAAAAAAAAAGGGAACCTCTGGTTACGATGATATACCTGCGTGAAAAAGGACAGTTATTACCAATACATACAAAGGCTTAATAAGTGTAAAATATATATCTGCCGAGAC... | iYAL003W_143618|0 | 0 | H3 |
TAGTGAGTCTTGGAGGCAATAAGGGGAATGCTCGGTTCTGGAATCCTAAAAATGTCCCTTTTCCTTTTGATGGAGATGATGACAAAGCCATCGTGGAGCATTACATTAGAGACAAGTATATTTTGGGTAAATTCAGGTATGATGAAATAAAGCCTGAAGACTTTGGATCCAGAATGGATGATTTTGATGGGGAATCGGACAGGTTTGATGAAAGAAATAGAAGTAGGAGCAGGAGCAGATCTCATTCTTTCTATAAAGGGGGCCATAATAGGTCTGACTACGGCGGTTCCAGGGACTCATTCCAAAGCAGTGGAAGCAGA... | YGL181W_GTS1_158421|1 | 1 | H3 |
CTCTAACAGAAAAATCAGCCTCATTAAGCCATAGTGATTTGGGCGGCGAAATTTTAAATGGTACAGGAAAGAACCGCACCCCCAATGATGGCCAAGAATCAAATGAAAGTGATGGGAGTCCCGAAAGTGATGAGAGTCCCGAAAGTGAAGAAAGTAGCGACAACAGTGATTCGAGCGATAGTGACGATATGAGACCTTTACCGAGGCCATTATTTATGAAGAAAAAGGCCAATAATTTGCAGAAAGCTACCAAGATAGATCAACCCTGGAATGCCCAAGATGACGCACGAGTTCTGCAAACAAAGAAGGAAAATATGATA... | YBR152W_SPP381_546671|1 | 1 | H3 |
TGATAGCGGATGAGAACGGAACAAACAGCGCTATAGCTAATGAGCAAGAGGAAAAATCCGAAGAAGTAAAAGCTGAAGATGATACTGGTGAAGAAGAAGAGGATGACCCAGTGATCGAAGAGTTTCCATTGAAGATCTCCGGAGAAGAGGAGTCACTGCACGTGTTTCAGTATGCTAATAGACCAAGGCTAGTAGGACGCAAACCTGCTGAGCATCCGTTTATCTCTGCAGCAAGATATAAACCCAAGTCGCACCTATGGGAAATAGATATTCCTTTGGATGAGCAGGCCTTCTATAACAAGGATAAGGCTGAGAGCGAA... | YKR025W_RPC37_487754|1 | 1 | H3 |
TCCTATTGATTATGGGTTCGAATAGTACCAGATGTTTTGCCAATCCTAAATCGGTAGGAAAGTGGCTTGTCGTCGTCAGGCTTATTATCAACTCTTATGCACAAGAAAGGTACTCATCTTCTATAAACTACATAAGACCTGAATCTAATCAAAGGGAGAAAGCGCAGAACATCAGATTTAAAGCGGTTTTGCTTGATACACTCAGCCTTGTCTCTTTGTAAGGATTTTGGGGTACCTATGAATAATACATCTAGTAGTGTTAGTAAACCAACGTATGGGATTTTGGGATACATAGTTTTCCAGTGTTTCTTATCCGTGAT... | iYLR142W_427070|0 | 0 | H3 |
GAAAGGTCTTTTGAGCCTTTGTCGGCAAATTCCTCGGAAGAGAAGTCCAACCTTATTGAATTTGAAGCATGAAATCGTGCTTATCAATTTTATGTCACCCTAAAACATCTGTACGTGTTTATATAGATATTTAAAGCAATATTTGCCAGGATTTGGTGAAGATCCCTCATATAACTCTCATAAATGCGGATTTTCGGAGCGAAAAAAGCCTAAATTCTTGTCTGGAAGTATAATTGGCGGTGAAATAGAAAAGGTGGCAATCACGACTGAAAAGGGTACAGCTTTCGCAACTGACATATACAGACAGTGAAAAGTAATAA... | iYGR100W_693276|1 | 1 | H3 |
TTCATCCCTACCACTATATAATGGGAAAGAATTTCTTGCGAACATTCTCCTGATATGATTATTTCCTGGTACAGCCTTGTTAGCACTTCGTTCCAGCATTTCTATCTGGCTGTAAATATTCTCATGCGGCCCCATTGAAGGCATTGTAGTTCTAGGGACACCACTATGGCTCCCTCCGGGAGAGGCTGCAGGCGCACCTCCGCTATCTATGGTAATATGCTGAGCTGTAGTTGCATTTCCTGGTTCATAAGCAGTAGATGGATGGCTACCGCTCATTCTCCTTCCAGAAAATGTACCCGTGGTATTGACAGTACTCCCAT... | YBR280C_YBR280C_763833|1 | 1 | H3 |
TGTCGTTTCATTATCTGTATGACTGTCGTAACTTTGAATCGATCTAATGTGTTGACCCTGTCTCAGGCTCACCCATGGCGGCGCCTGCACCTGTGGGTGAAGGAAGAAAGACGATGTTTGTGAGGGAACTGAATTGGGTTGAAGTTCATATCCTAAACAAACACTTCACCAGCCATGGATGCATGCCTTGTCTTTTCGCAGTTGGTGGCATGAAAATATATATCACCCACCAAACCCTCTTACTCTTTTCTTACCAAGTAACTCCAGTAAGTGCTCGTTTTTTTCTTCTTCCATTCAAACCTGCTTAAAAACCTCGACAA... | iYAL034C_82459|1 | 1 | H3 |
TTTGTTGAAAATTTAGGCTAACATTTTCTAATAGGTGACATTACAAAATCCAGTCTTACAGTTATGCTTAATATTGACCAGTAGTCATATTACTGGCATATTATCTGTTAGAATAGTGACATACTATTCTTTATTACTGATGAATTTGTAATTGTGAAGCAATTAGACGCAAACGTCATTGATGTGTCAGGATGGAGAACGATATAGACTATGTGATAATGTAAAGTTAAATATCTGTTACTTGAACAATTAACTTGACGTTTGTATATGTAAGGATATGGGTCATTACATAGAAACTATAGTAAATAGTCTCAGTATTG... | itW(CCA)J_415984|0 | 0 | H3 |
GTGTCCTCTTGGAGAAGCAAGATTGATATTCATGCAGGCTAGATTCAAGTATGGTATGACGAATGGCGTCACCATACCCTCTCAATTAAGATACTTAAGGTACCATGAGTTTTTTATTACCCATGAAAAAGCAGCCCAAGAAGGAATTTCTAATGAGGCGGTAAAATTCAAATTTAAATTCAGGCTTGCTAAAATGACGTTCCTCCGCCCATCAAGCTTAATCACTTCCGAGTCTGCTATTGTAACTACTAAGATTCAACACTACAACGATGATAGAAATGCCCTCCTAACCCGAAAAGTAGTATATTCAGATATCATGG... | YNL128W_TEP1_383243|1 | 1 | H3 |
AAGTGCAGGAATGAGTCACTGTGCTCCATATATAGTAGGCTGTTCAAACTGGGTCTGTTTTTCGCCCAACTGTGTGTGAAAAGTGTGGTTTCTAGCGCAGAATTGCAAGACTGTATCAGCACCTCTCATTATGCGACTAAGCTGACCCGTTATTTCAATGATAATGGTAGTACACACGATGGTGCAGATGCAGGTGCTACCGTGCTGCCCACTGGCGACGATTTCCAGTACCTGTTTGAAAGGGACTATGTTACTTTTCTTCCGACGGGCGTGTTGACCATCTTCCCCTGTGCCAAGGCTATAAGGTACAAGCCATCGAC... | YJR046W_TAH11_522303|1 | 1 | H3 |
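Each preview row carries four fields: `sequence`, `name`, `label` and `task`. The snippet below is a minimal sketch of a per-row sanity check, not part of the official tooling. It assumes that sequences use only the characters `A`, `C`, `G`, `T` and `N`, and that `name` ends with `|<label>`, a pattern seen in the preview rows but not documented here. The helper `check_row` is hypothetical, the sequences are shortened prefixes of preview rows, and the last row is made up to trigger the checks.

```python
from collections import Counter

# Assumption: sequences contain only these characters.
VALID_BASES = set("ACGTN")


def check_row(row):
    """Return a list of problems found in one row (empty list means OK)."""
    problems = []
    if not set(row["sequence"]) <= VALID_BASES:
        problems.append("non-nucleotide character in sequence")
    # Observed in the preview only: `name` ends with "|<label>".
    suffix = row["name"].rsplit("|", 1)[-1]
    if suffix != str(row["label"]):
        problems.append("label suffix in name disagrees with label")
    return problems


rows = [
    # Sequences are truncated prefixes of preview rows, kept short for illustration.
    {"sequence": "TCACTTCGATTATTGAGG", "name": "YBR063C_YBR063C_367930|0", "label": 0, "task": "H3"},
    {"sequence": "CAGTAGTGGCATAAACCC", "name": "YNL116W_YNL116W_408685|1", "label": 1, "task": "H3"},
    # Hypothetical broken row: invalid character and mismatched label.
    {"sequence": "ACGTXACGT", "name": "made_up_row|1", "label": 0, "task": "H3"},
]

for row in rows:
    print(row["name"], check_row(row))

# Label distribution among the rows that pass both checks.
clean_labels = Counter(row["label"] for row in rows if not check_row(row))
print(dict(clean_labels))
```

Running the same check over a full split would show whether the `|<label>` naming pattern holds beyond the preview.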
Dataset Card for nucleotide_transformer_downstream_tasks
The nucleotide_transformer_downstream_tasks dataset features the 18 downstream tasks presented in the Nucleotide Transformer paper. They consist of both binary and multi-class classification tasks that aim at providing a consistent genomics benchmark.
⚠️ We revised and improved our benchmark during the peer-review process. The datasets featured in this repository are the ones used before that revision. We strongly encourage users to move to the new version available here, which we believe to be much more robust. ⚠️
Dataset Summary
The datasets are collected from five genomics papers:
- DeePromoter: Robust Promoter Predictor Using Deep Learning: The dataset features 3,065 TATA promoters and 26,532 non-TATA promoters, with each promoter yielding a negative sequence by randomly sampling parts of the sequence. The `promoter_all` dataset features all the promoters and their negative counterparts, while `promoter_tata` and `promoter_no_tata` provide the TATA and non-TATA parts of the dataset, respectively.
- A deep learning framework for enhancer prediction using word embedding and sequence generation: To build the training dataset, the authors collect 742 strong enhancers, 742 weak enhancers and 1,484 non-enhancers, and augment the dataset with 6,000 synthetic enhancers and 6,000 synthetic non-enhancers produced with a generative model. The test dataset comprises 100 strong enhancers, 100 weak enhancers and 200 non-enhancers. The original paper uses this dataset for both binary classification (i.e., a sample is classified as non-enhancer or enhancer) and 3-class classification (i.e., a sample is classified as non-enhancer, weak enhancer or strong enhancer). These two tasks are covered by the `enhancers` and `enhancers_types` datasets, respectively.
- SpliceFinder: ab initio prediction of splice sites using convolutional neural network: The authors introduce a dataset containing 10,000 samples each of donor site, acceptor site, and non-splice-site, resulting in 30,000 total samples that are featured in the `splice_sites_all` dataset.
- Spliceator: multi-species splice site prediction using convolutional neural networks: This paper introduces two datasets, each containing splice sites and their corresponding negative examples. The `splice_sites_acceptor` dataset features acceptor splice sites and the other, `splice_sites_donor`, donor splice sites.
- Qualitatively predicting acetylation and methylation areas in DNA sequences: The paper introduces a set of datasets featuring epigenetic marks identified in the yeast genome, namely acetylation and methylation nucleosome occupancies. Nucleosome occupancy values in these ten datasets were obtained with ChIP-chip experiments and further processed into positive and negative observations to provide the datasets corresponding to the following histone marks: `H3`, `H4`, `H3K9ac`, `H3K14ac`, `H4ac`, `H3K4me1`, `H3K4me2`, `H3K4me3`, `H3K36me3` and `H3K79me3`.
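Some of the counts quoted above can be cross-checked against the train and test sizes listed in the Dataset Structure table. The following is a small arithmetic sketch, not an official validation script. It assumes that each TATA promoter is paired with exactly one negative sequence, and the variable names are illustrative.

```python
# Totals implied by the counts quoted in the summary.
expected_totals = {
    # 742 strong + 742 weak + 1,484 non-enhancers + 6,000 + 6,000 synthetic (train)
    # plus 100 strong + 100 weak + 200 non-enhancers (test)
    "enhancers": (742 + 742 + 1484 + 6000 + 6000) + (100 + 100 + 200),
    # 10,000 samples for each of donor, acceptor and non-splice-site
    "splice_sites_all": 3 * 10_000,
    # Assumption: one negative sequence per TATA promoter
    "promoter_tata": 2 * 3065,
}

# (train, test) sizes copied from the Dataset Structure table.
table_sizes = {
    "enhancers": (14_968, 400),
    "splice_sites_all": (27_000, 3_000),
    "promoter_tata": (5_509, 621),
}

matches = {
    task: sum(table_sizes[task]) == expected_totals[task]
    for task in expected_totals
}

for task, ok in matches.items():
    print(f"{task}: summary total {expected_totals[task]}, "
          f"table total {sum(table_sizes[task])}, match={ok}")
```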
Dataset Structure
| Task | Number of train sequences | Number of test sequences | Number of labels | Sequence length |
| --------------------- | ------------------------- | ------------------------ | ---------------- | --------------- |
| promoter_all | 53,276 | 5,920 | 2 | 300 |
| promoter_tata | 5,509 | 621 | 2 | 300 |
| promoter_no_tata | 47,767 | 5,299 | 2 | 300 |
| enhancers | 14,968 | 400 | 2 | 200 |
| enhancers_types | 14,968 | 400 | 3 | 200 |
| splice_sites_all | 27,000 | 3,000 | 3 | 400 |
| splice_sites_acceptor | 19,961 | 2,218 | 2 | 600 |
| splice_sites_donor | 19,775 | 2,198 | 2 | 600 |
| H3 | 13,468 | 1,497 | 2 | 500 |
| H4 | 13,140 | 1,461 | 2 | 500 |
| H3K9ac | 25,003 | 2,779 | 2 | 500 |
| H3K14ac | 29,743 | 3,305 | 2 | 500 |
| H4ac | 30,685 | 3,410 | 2 | 500 |
| H3K4me1 | 28,509 | 3,168 | 2 | 500 |
| H3K4me2 | 27,614 | 3,069 | 2 | 500 |
| H3K4me3 | 33,119 | 3,680 | 2 | 500 |
| H3K36me3 | 31,392 | 3,488 | 2 | 500 |
| H3K79me3 | 25,953 | 2,884 | 2 | 500 |
Repository: Nucleotide Transformer
Paper: The Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics
