ftfy 5.6
pip install ftfy==5.6
Released:
Fixes some problems with Unicode text after the fact
Verified details
These details have been verified by PyPI
Maintainers
- jbalonso
- lumitim
- rspeer
Unverified details
These details have not been verified by PyPI
Project links
Meta
- License: MIT License (MIT)
- Maintainer: Luminoso Technologies, Inc.
- Requires: Python >=3.4
Classifiers
- Development Status
- License
- Operating System
- Programming Language
- Topic
Project description
ftfy: fixes text for you
>>> print(fix_encoding("(à¸‡'âŒ£')à¸‡"))
(ง'⌣')ง
Full documentation: https://ftfy.readthedocs.org
Testimonials
- “My life is livable again!” — @planarrowspace
- “A handy piece of magic” — @simonw
- “Saved me a large amount of frustrating dev work” — @iancal
- “ftfy did the right thing right away, with no faffing about. Excellent work, solving a very tricky real-world (whole-world!) problem.” — Brennan Young
- “Hat mir die Tage geholfen. Im Übrigen bin ich der Meinung, dass wir keine komplexen Maschinen mit Computern bauen sollten solange wir nicht einmal Umlaute sicher verarbeiten können. :D” (“Helped me out the other day. Incidentally, I am of the opinion that we should not build complex machines with computers as long as we cannot even process umlauts reliably. :D”) — Bruno Ranieri
- “I have no idea when I’m gonna need this, but I’m definitely bookmarking it.” — /u/ocrow
- “9.2/10” — pylint
Developed at Luminoso
Luminoso makes groundbreaking software for text analytics that really understands what words mean, in many languages. Our software is used by enterprise customers such as Sony, Intel, Mars, and Scotts, and it's built on Python and open-source technologies.
We use ftfy every day at Luminoso, because the first step in understanding text is making sure it has the correct characters in it!
Luminoso is growing fast and hiring. If you're interested in joining us, take a look at our careers page.
What it does
ftfy fixes Unicode that's broken in various ways.
The goal of ftfy is to take in bad Unicode and output good Unicode, for use
in your Unicode-aware code. This is different from taking in non-Unicode and
outputting Unicode, which is not a goal of ftfy. It also isn't designed to
protect you from having to write Unicode-aware code. ftfy helps those who help
themselves.
Of course you're better off if your input is decoded properly and has no glitches. But you often don't have any control over your input; it's someone else's mistake, but it's your problem now.
ftfy will do everything it can to fix the problem.
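As a rough illustration of the distinction drawn above (a sketch, not part of ftfy): raw bytes are not Unicode text, and decoding them is the job of your own code. Text that has already been decoded, possibly wrongly, is what ftfy is meant to repair. The helper name `decode_input` and the UTF-8 default are assumptions made for this example.

```python
def decode_input(data, encoding="utf-8"):
    """Turn raw bytes into str; pass str through unchanged.

    Hypothetical helper for illustration only. The resulting str is
    what you would then hand to ftfy.fix_text().
    """
    if isinstance(data, bytes):
        # errors="replace" marks undecodable bytes with U+FFFD
        # instead of raising UnicodeDecodeError.
        return data.decode(encoding, errors="replace")
    return data


raw = "schön".encode("utf-8")   # non-Unicode input: bytes
text = decode_input(raw)        # Unicode input: str
print(type(raw).__name__, type(text).__name__)
```

Whether to replace or reject undecodable bytes is a choice for your application; the sketch picks replacement only to keep the example short.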
Mojibake
The most interesting kind of brokenness that ftfy will fix is when someone has encoded Unicode with one standard and decoded it with a different one. This often shows up as characters that turn into nonsense sequences (called "mojibake"):
- The word schön might appear as schÃ¶n.
- An em dash (—) might appear as â€”.
- Text that was meant to be enclosed in quotation marks might end up instead enclosed in â€œ and â€<9d>, where <9d> represents an unprintable character.
ftfy uses heuristics to detect and undo this kind of mojibake, with a very low rate of false positives.
This part of ftfy now has an unofficial Web implementation by simonw: https://ftfy.now.sh/
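The mechanism behind the mojibake described above can be reproduced with the standard library alone. The sketch below is not how ftfy works internally: it assumes the wrong encoding is already known to be Windows-1252, whereas ftfy has to detect the mistake heuristically. The function names are made up for this example.

```python
def make_mojibake(text, wrong_encoding="windows-1252"):
    """Encode as UTF-8, then decode with the wrong single-byte encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


def undo_mojibake(garbled, wrong_encoding="windows-1252"):
    """Reverse the mistake: re-encode with the wrong encoding, decode as UTF-8."""
    return garbled.encode(wrong_encoding).decode("utf-8")


for original in ["schön", "\u2014"]:  # "\u2014" is the em dash
    garbled = make_mojibake(original)
    restored = undo_mojibake(garbled)
    print(repr(original), "->", repr(garbled), "->", repr(restored))
```

This naive round trip raises an error on the closing quotation mark from the list above, because Python's strict windows-1252 codec has no mapping for byte 0x9D. Coping with cases like that, and deciding when text is mojibake at all, is the part that needs a tool such as ftfy.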
Examples
fix_text is the main function of ftfy. This section is meant to give you a
taste of the things it can do. fix_encoding is the more specific function
that only fixes mojibake.
Please read the documentation for more information on what ftfy does, and how to configure it for your needs.
>>> print(fix_text('This text should be in â€œquotesâ€\x9d.'))
This text should be in "quotes".

>>> print(fix_text('uÌˆnicode'))
ünicode

>>> print(fix_text('Broken text&hellip; it&#x2019;s ﬂubberiﬁc!',
...                normalization='NFKC'))
Broken text... it's flubberific!

>>> print(fix_text('HTML entities &lt;3'))
HTML entities <3

>>> print(fix_text('<em>HTML entities in HTML &lt;3</em>'))
<em>HTML entities in HTML &lt;3</em>

>>> print(fix_text('\001\033[36;44mI&#x92;m blue, da ba dee da ba '
...                'doo&#133;\033[0m', normalization='NFKC'))
I'm blue, da ba dee da ba doo...

>>> print(fix_text('ＬＯＵＤ　ＮＯＩＳＥＳ'))
LOUD NOISES

>>> print(fix_text('ＬＯＵＤ　ＮＯＩＳＥＳ', fix_character_width=False))
ＬＯＵＤ　ＮＯＩＳＥＳ
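Some of the individual fixes in the examples above have close analogues in Python's standard library, which may help in understanding what each step does. These are stand-ins for illustration, not ftfy's implementation: ftfy applies its own versions of these steps together, with its own safeguards. The variable names are arbitrary.

```python
import html
import unicodedata

# Decoding an HTML entity in text that is not itself HTML.
unescaped = html.unescape("HTML entities &lt;3")
print(unescaped)

# NFKC normalization expands the "fl" and "fi" ligatures and the
# single-character ellipsis into plain ASCII sequences.
expanded = unicodedata.normalize("NFKC", "\ufb02ubberi\ufb01c\u2026")
print(expanded)

# NFKC also maps full-width letters and the ideographic space
# (U+3000) to their ordinary ASCII counterparts.
loud = "\uff2c\uff2f\uff35\uff24\u3000\uff2e\uff2f\uff29\uff33\uff25\uff33"
narrowed = unicodedata.normalize("NFKC", loud)
print(narrowed)
```

NFKC changes many other characters as well, so it is used here only as a compact way to show the effect on these particular inputs.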
Installing
ftfy is a Python 3 package that can be installed using pip:
pip install ftfy
(Or use pip3 install ftfy on systems where Python 2 and 3 are both globally
installed and pip refers to Python 2.)
If you're on Python 2.7, you can install an older version:
pip install 'ftfy<5'
You can also clone this Git repository and install it with
python setup.py install.
Who maintains ftfy?
I'm Robyn Speer (rspeer@luminoso.com). I develop this tool as part of my text-understanding company, Luminoso, where it has proven essential.
Luminoso provides ftfy as free, open source software under the extremely permissive MIT license.
You can report bugs regarding ftfy on GitHub and we'll handle them.
Citing ftfy
ftfy has been used as a crucial data processing step in major NLP research.
It's important to give credit appropriately to everyone whose work you build on in research. This includes software, not just high-status contributions such as mathematical models. All I ask when you use ftfy for research is that you cite it.
ftfy has a citable record on Zenodo. A citation of ftfy may look like this:
Robyn Speer. (2019). ftfy (Version 5.5). Zenodo.
http://doi.org/10.5281/zenodo.2591652
In BibTeX format, the citation is:
@misc{speer-2019-ftfy,
  author       = {Robyn Speer},
  title        = {ftfy},
  note         = {Version 5.5},
  year         = 2019,
  howpublished = {Zenodo},
  doi          = {10.5281/zenodo.2591652},
  url          = {https://doi.org/10.5281/zenodo.2591652}
}
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file ftfy-5.6.tar.gz.
File metadata
- Download URL: ftfy-5.6.tar.gz
- Upload date:
- Size: 58.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/39.2.0 requests-toolbelt/0.8.0 tqdm/4.23.4 CPython/3.6.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6d7509c45e602dec890f0f6ee0623a8b5f50ec1188ac7ab9535e18e572c99bcc |
| MD5 | 3a045f4ee8c190c0adfc20c22a21f94a |
| BLAKE2b-256 | 75ca2d9a5030eaf1bcd925dab392762b9709a7ad4bd486a90599d93cd79cb188 |
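A downloaded archive can be checked against the SHA256 digest listed above with Python's `hashlib`. This is a minimal sketch: the function names are made up for this example, and the file path passed in is assumed to be wherever you saved the download.

```python
import hashlib

# Digest copied from the SHA256 row of the table above.
EXPECTED_SHA256 = "6d7509c45e602dec890f0f6ee0623a8b5f50ec1188ac7ab9535e18e572c99bcc"


def sha256_hex(path, chunk_size=65536):
    """Return the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file at `path` has the expected SHA256 digest."""
    return sha256_hex(path) == expected
```

For example, `matches_published_hash("ftfy-5.6.tar.gz")` would check an archive saved in the current directory.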
