english (string, length 3 to 23) | non_english (string, length 1 to 29) |
|---|---|
browsers | blaaiers |
directs | насочва |
quintus | квинт |
counts | dihitung |
downloaded | downloadet |
bariloche | باریلوچه |
consistency | pastovumas |
extreme | крайни |
thunder | tro |
movements | förflyttningar |
lysenko | лисенко |
alp | planina |
imagine | immagina |
deadline | termiņš |
acquaintance | známý |
assumptions | antaganden |
herpetologist | erpetologo |
salaries | mzdy |
screenplay | guião |
attracts | atraem |
clothes | vestiti |
tenements | kamienicach |
propagation | 繁殖 |
lighthouse | fari |
kabaddi | கபடி |
supposedly | konon |
bombing | 爆撃 |
earns | reason |
itunes | آیتونز |
warsaw | warskou |
kidnapping | викрадення |
panels | панели |
starfish | meritähti |
inhabitant | locuitor |
environmental | περιβαλλοντικές |
weimer | ויימר |
flabbergasted | atònit |
attracted | مجذوب |
entwined | põimunud |
reconfiguration | preustroj |
persuasion | presvedčenie |
diethyl | dietil |
lhasa | λάσα |
warnock | وارناک |
articulation | artikulasjon |
visualize | mailarawan |
exports | експортує |
together | együtt |
pulled | ditarik |
consist | consistono |
sweepers | penyapu |
remained | blieben |
filibuster | פיליבסטר |
complaints | pritužbe |
live | mamuhay |
dyskinesia | دیسکینزی |
excision | excizie |
demoralized | demoralize |
rock | 岩 |
hartman | хартман |
superficial | yüzeysel |
template | 模板 |
ruslana | руслана |
ripples | wellen |
harassed | lastiggevallen |
resignation | istifa |
hush | hysj |
menstruation | ciklin |
sleepover | ööbimine |
yard | avlu |
removes | entfernt |
convection | संवहन |
combat | zwalczanie |
#in | #v |
legitimate | perusteltu |
bike | biçikleta |
officially | oficiāli |
method | 法 |
coinage | acuñación |
demonstrate | demonstrar |
giulio | džūlio |
deplete | tanjšajo |
preserve | bevare |
psychologically | psicològicament |
allies | zavezniki |
pathogenicity | patogenitātes |
factorial | παραγοντικό |
resurrection | resurrección |
sediments | sedimenty |
rehabilitation | rehabilitering |
qin | цінь |
mittens | handschuhe |
horseman | הפרש |
potential | potensial |
stairs | scări |
gracefully | gracieus |
rental | thuê |
feedforward | फीडफॉरवर्ड |
jyoti | ஜோதி |
expressionism | експресіонізм |
Dataset Card for Parallel Sentences - MUSE
This dataset contains parallel sentences (i.e. an English sentence paired with the same sentence in another language) for numerous languages. Most of the sentences originate from the OPUS website. In particular, this dataset contains the MUSE dataset.
Related Datasets
The Parallel Sentences collection consists of the following datasets:
- parallel-sentences-europarl
- parallel-sentences-global-voices
- parallel-sentences-muse
- parallel-sentences-jw300
- parallel-sentences-news-commentary
- parallel-sentences-opensubtitles
- parallel-sentences-talks
- parallel-sentences-tatoeba
- parallel-sentences-wikimatrix
- parallel-sentences-wikititles
- parallel-sentences-ccmatrix
These datasets can be used to train multilingual sentence embedding models. For more information, see sbert.net - Multilingual Models.
Dataset Stats
- Columns: "english", "non_english"
- Column types: str, str
- Examples: { "english": "consistency", "non_english": "pastovumas" }
- Collection strategy: Processing the raw data from parallel-sentences and formatting it in Parquet.
- Deduplicated: No
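Because the data is not deduplicated, it may be worth dropping repeated pairs before training. The following is a minimal sketch in plain Python, assuming the rows have already been loaded as dictionaries with the two columns listed above. The function name `deduplicate_pairs` and the sample rows are illustrative only (the repeated row is added here for demonstration and is not a claim about the actual data).

```python
from typing import Iterable

Row = dict[str, str]


def deduplicate_pairs(rows: Iterable[Row]) -> list[Row]:
    """Drop exact repeats of an (english, non_english) pair.

    The first occurrence of each pair is kept and the input order is preserved.
    """
    seen: set[tuple[str, str]] = set()
    unique: list[Row] = []
    for row in rows:
        key = (row["english"], row["non_english"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Two rows taken from the preview, plus one repeat added for illustration.
rows = [
    {"english": "consistency", "non_english": "pastovumas"},
    {"english": "thunder", "non_english": "tro"},
    {"english": "consistency", "non_english": "pastovumas"},
]

unique_rows = deduplicate_pairs(rows)
print(len(unique_rows))
```

Only exact matches on both columns are treated as duplicates here, so one English word with several different translations keeps all of its rows.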
