Unicode is a universal character encoding standard designed to represent text and symbols from all writing systems around the world.
- Unicode assigns every character a unique number, called a code point, written as U+ followed by 4 to 6 hexadecimal digits (for example, U+0041).
- Unicode is standardized among all global computing platforms, devices and programs, enabling consistent representation and manipulation of text across different systems and applications.
- Unicode supports multiple languages, mathematical symbols, emojis and specialized symbols.
- Unicode is extensible. New characters can be added in later versions to support evolving communication and language needs.
How is Unicode Compatible with ASCII?
- The first 128 Unicode code points (U+0000 to U+007F) match the ASCII characters exactly, so ASCII is a subset of Unicode.
- But wait! For the character 'A', the ASCII value is 65 and the Unicode code point is U+0041. How is Unicode backward compatible with ASCII?
- The two numbers are the same value written in different bases: U+0041 is in hexadecimal, and (41)16 = 4 × 16 + 1 = (65)10.
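The hexadecimal and decimal relationship can be checked directly. The following is a minimal Python sketch that uses only the built-in `ord`, `chr`, and `hex` functions; the variable names are illustrative.

```python
# ord() returns the Unicode code point of a character as a decimal integer.
code_point = ord("A")

print(code_point)       # 65
print(hex(code_point))  # 0x41
print(chr(0x41))        # A

# For each value 0-127, the ASCII byte of the character equals its
# Unicode code point, which is what "ASCII is a subset" means in practice.
ascii_matches = all(
    chr(n).encode("ascii")[0] == ord(chr(n)) for n in range(128)
)
print(ascii_matches)    # True
```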
Size and Growth
Unicode is an extensive and continually evolving character-encoding standard. As of version 15.1, it includes more than 149,000 characters, spanning global writing systems, symbols, and technical notations. As new characters are introduced, the Unicode set continues to expand. Below are a few examples of common characters along with their corresponding Unicode code points:
| Character | Unicode |
|---|---|
| 1 | U+0031 |
| + | U+002B |
| A | U+0041 |
| $ | U+0024 |
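The values in the table can be reproduced programmatically. This is a small sketch in Python; `to_unicode_notation` is a hypothetical helper name chosen for illustration, not part of any library.

```python
def to_unicode_notation(char: str) -> str:
    """Return the U+XXXX notation for a single character."""
    # Pad to at least four upper-case hex digits; code points above
    # U+FFFF naturally use five or six digits.
    return f"U+{ord(char):04X}"


for ch in ["1", "+", "A", "$"]:
    print(ch, to_unicode_notation(ch))
```

Characters outside the first 65,536 code points work the same way and simply produce more digits, as in `to_unicode_notation("\U0001F600")`.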
How To Type in Unicode Characters?
- Log in to your computer and click in the text field where you want to insert the character.
- Open the Unicode character panel. On Windows, press Windows Key + . (period). On macOS, press Control + Command + Space.
- This opens a small window with Unicode characters.
- Search for the character you want and click on it. The character is inserted at the cursor position.
Unicode Transformation Format (UTF)
Unicode Transformation Format is a method of encoding Unicode characters for storage and communication. A format specifies how each Unicode code point is converted into a sequence of bytes. The most common UTF forms are UTF-8, UTF-16, and UTF-32.
UTF-8
- UTF-8 is a variable-width encoding in which each Unicode code point is encoded as 1 to 4 bytes.
- UTF-8 is backward compatible with ASCII. All ASCII characters, (0-127)10 or (00-7F)16, are represented in UTF-8 using one byte with the same value.
- Other Unicode characters in UTF-8 are represented using multiple bytes.
- UTF-8 is widely used on the internet and in UNIX-like operating systems.
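The variable width of UTF-8 is easy to observe. The sketch below, using Python's built-in `str.encode`, encodes four sample characters (chosen here purely for illustration) and records how many bytes each one needs.

```python
# Sample characters written as escapes so the file itself stays ASCII.
samples = [
    "A",           # U+0041, ASCII range
    "\u00e9",      # U+00E9, LATIN SMALL LETTER E WITH ACUTE
    "\u20ac",      # U+20AC, EURO SIGN
    "\U0001F600",  # U+1F600, GRINNING FACE
]

utf8_lengths = {}
for ch in samples:
    encoded = ch.encode("utf-8")
    utf8_lengths[f"U+{ord(ch):04X}"] = len(encoded)
    print(f"U+{ord(ch):04X} -> {encoded.hex()} ({len(encoded)} bytes)")

# The ASCII character is a single byte identical to its ASCII encoding.
assert "A".encode("utf-8") == "A".encode("ascii")
```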
UTF-16
- UTF-16 is also a variable-width encoding in which each Unicode code point is encoded as either 2 or 4 bytes (one or two 16-bit code units).
- UTF-16 is used in the Microsoft Windows OS and in programming languages like Java.
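The 2-byte and 4-byte cases can be illustrated with a short Python sketch. Code points above U+FFFF are split into two 16-bit units known as a surrogate pair; `surrogate_pair` is a hypothetical helper written here to show the arithmetic, and the `utf-16-be` codec (big-endian, no byte order mark) is used so the byte counts are exact.

```python
def surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into its two UTF-16 code units."""
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


bmp_char = "\u20ac"         # EURO SIGN, fits in one 16-bit unit
astral_char = "\U0001F600"  # GRINNING FACE, needs two 16-bit units

bmp_bytes = bmp_char.encode("utf-16-be")
astral_bytes = astral_char.encode("utf-16-be")

print(bmp_bytes.hex(), len(bmp_bytes))        # 20ac 2
print(astral_bytes.hex(), len(astral_bytes))  # d83dde00 4
print([hex(u) for u in surrogate_pair(0x1F600)])
```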
UTF-32
- UTF-32 is a fixed-width encoding in which every Unicode code point is encoded as exactly 4 bytes.
- This format provides a simple one-to-one correspondence between code points and encoded units, but it is less space-efficient: a value that needs only 1 byte of data (Example: 01) takes up 4 bytes (Example: 00000001).
- UTF-32 is less commonly used in mainstream applications and systems because of its space inefficiency and compatibility considerations.
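To make the space trade-off concrete, the following Python sketch compares the encoded size of the same text in all three forms. `encoded_sizes` is a hypothetical helper, and the big-endian codec variants are used so that no byte order mark is added to the counts.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Return the number of bytes needed to encode text in each UTF form."""
    codecs = {"UTF-8": "utf-8", "UTF-16": "utf-16-be", "UTF-32": "utf-32-be"}
    return {name: len(text.encode(codec)) for name, codec in codecs.items()}


# U+0001 takes one byte in UTF-8 but four bytes in UTF-32.
print("\u0001".encode("utf-8").hex())      # 01
print("\u0001".encode("utf-32-be").hex())  # 00000001

sizes = encoded_sizes("Hello")
print(sizes)
```

For ASCII-only text such as "Hello", UTF-32 needs four times the space of UTF-8, which is the main reason it is rarely chosen for storage or transmission.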
History of Unicode
The table below lists the Unicode releases up to version 15.1.0, newest first:
| Unicode Version | Year of Release | Month (Day) |
|---|---|---|
| 15.1.0 | 2023 | September 12 |
| 15.0.0 | 2022 | September 13 |
| 14.0.0 | 2021 | September 14 |
| 13.0.0 | 2020 | March 10 |
| 12.1.0 | 2019 | May 7 |
| 12.0.0 | 2019 | March 5 |
| 11.0.0 | 2018 | June 5 |
| 10.0.0 | 2017 | June 20 |
| 9.0.0 | 2016 | June 21 |
| 8.0.0 | 2015 | June 17 |
| 7.0.0 | 2014 | June 16 |
| 6.3.0 | 2013 | September 30 |
| 6.2.0 | 2012 | September 26 |
| 6.1.0 | 2012 | January 31 |
| 6.0.0 | 2010 | October 11 |
| 5.2.0 | 2009 | October 1 |
| 5.1.0 | 2008 | April 4 |
| 5.0.0 | 2006 | July 14 |
| 4.1.0 | 2005 | March 31 |
| 4.0.1 | 2004 | March |
| 4.0.0 | 2003 | April |
| 3.2.0 | 2002 | March |
| 3.1.1 | 2001 | August |
| 3.1.0 | 2001 | March |
| 3.0.1 | 2000 | August |
| 3.0.0 | 1999 | September |
| 2.1.9 | 1999 | April |
| 2.1.8 | 1998 | December |
| 2.1.5 | 1998 | August |
| 2.1.2 | 1998 | May |
| 2.0.0 | 1996 | July |
| 1.1.5 | 1995 | July |
| 1.1.0 | 1993 | June |
| 1.0.1 | 1992 | June |
| 1.0.0 | 1991 | October |