Character Set in Java


A character set helps Java understand the letters you type, the numbers you enter and even the emojis you sneak into your code.

Character Set in Java – Explained for Students

Let’s Begin with a Question

Have you ever thought about how a computer understands what you type?
When you press a key — say, the letter A — how does your laptop or mobile phone understand it?
That’s where character sets, binary code, and encoding systems come in.

Let’s break it all down together — in the simplest way possible.

Communicating with Devices

    Just like we use language to talk to each other, we need a special language to communicate with computers. But computers don’t understand letters or symbols like we do. They only understand binary numbers (0s and 1s).

    So, if you write “Hello”, the computer sees something like:

01001000 01100101 01101100 01101100 01101111

    So, every single letter, digit or symbol we use in Java (or any language) must be converted to binary.
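As a rough sketch of that conversion, the small program below turns each character of a word into an 8-bit group. The class name TextToBinary and the method toBinary are made up for this illustration, and the sketch assumes plain English text, where every character code fits in 8 bits.

```java
class TextToBinary {
    // Returns each character's code as an 8-digit binary group, separated by spaces.
    // Assumption: every character code is below 256 (true for plain English text).
    static String toBinary(String text) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                result.append(' ');
            }
            String bits = Integer.toBinaryString(text.charAt(i));
            // Pad on the left with zeros so every group has 8 digits.
            for (int p = bits.length(); p < 8; p++) {
                result.append('0');
            }
            result.append(bits);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Prints: 01001000 01100101 01101100 01101100 01101111
        System.out.println(toBinary("Hello"));
    }
}
```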

What is Binary Code?

Binary code is a number system that uses only two digits: 0 and 1. Here’s how numbers are converted to binary:

  • 0 → 0000
  • 1 → 0001
  • 2 → 0010
  • 5 → 0101
  • 10 → 1010

    Computers process everything in binary, whether it’s a number, a letter or even an emoji!
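One common way to do this conversion by hand is to keep dividing the number by 2 and write down the remainders. The sketch below follows that idea for the numbers listed above. The names DecimalToBinary and toBinary are hypothetical, and the width parameter is only there to pad the result with leading zeros.

```java
class DecimalToBinary {
    // Converts a non-negative number to binary by repeated division by 2.
    // Each remainder (0 or 1) becomes the next digit, added on the left.
    static String toBinary(int number, int width) {
        StringBuilder bits = new StringBuilder();
        int n = number;
        do {
            bits.insert(0, n % 2);
            n /= 2;
        } while (n > 0);
        // Pad with leading zeros up to the requested width.
        while (bits.length() < width) {
            bits.insert(0, '0');
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 2, 5, 10};
        for (int value : samples) {
            System.out.println(value + " -> " + toBinary(value, 4));
        }
    }
}
```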

Letters to Binary

    But how do we convert characters like A, B, or $ into binary?

    We use an encoding system. An encoding system is like a dictionary that matches each letter or symbol to a unique number. For example:

  • A = 65
  • B = 66
  • C = 67
  • a = 97
  • $ = 36

    Once we have the number, we can easily convert it to binary. For example, A = 65 → 01000001
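In Java you can look up these numbers yourself, because a char can be treated as a number. The sketch below is one simple way to do it. The class CharCodes and its two helper methods are invented for this example.

```java
class CharCodes {
    // A char widens to an int automatically, which gives its numeric code.
    static int codeOf(char c) {
        return c;
    }

    // Going the other way needs an explicit cast from int to char.
    static char charOf(int code) {
        return (char) code;
    }

    public static void main(String[] args) {
        char[] samples = {'A', 'B', 'C', 'a', '$'};
        for (char c : samples) {
            System.out.println(c + " = " + codeOf(c));
        }
        System.out.println("66 = " + charOf(66)); // Prints: 66 = B
    }
}
```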

What is a Character Set?

The Character Set of Java is the collection of all letters, numbers, symbols and special characters that Java understands and uses when writing a program.

    Just like the English alphabet has 26 letters, the Java character set includes everything you can use in Java code — like alphabets, digits, operators, punctuation, spaces and more.

    Java Character Set Includes:

1. Letters / Alphabets

  • Uppercase letters: A to Z
  • Lowercase letters: a to z

Java is case-sensitive, so A and a are not the same.

2. Digits / Numbers

  • 0 to 9

3. Special Symbols / Punctuation

    These are used for various tasks in Java, such as:

  • +, -, *, /, % → Arithmetic operators
  • =, ==, !=, >, <, >=, <= → Assignment (=) and relational operators
  • ;, ,, ., : → Punctuation marks
  • {, }, (, ), [, ] → Braces and brackets
  • @, #, $, &, _, ~, !, ^ → Miscellaneous symbols

4. White Spaces

Space, tab (\t), new line (\n), carriage return (\r) and form feed (\f).

A character set is a list of characters and their assigned codes (usually in binary). This helps the computer know what each symbol means. Java uses a character set to store and process text data.
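The four groups above can be checked in code with the built-in Character class. The sketch below is only an illustration: the class CharClassifier, the method classify and the group labels are made up here, and anything that is not a letter, digit or white space is simply labelled a special symbol.

```java
class CharClassifier {
    // Sorts a character into one of the groups described above,
    // using standard methods of java.lang.Character.
    static String classify(char c) {
        if (Character.isUpperCase(c)) return "uppercase letter";
        if (Character.isLowerCase(c)) return "lowercase letter";
        if (Character.isLetter(c)) return "other letter"; // letters without case, e.g. Hindi
        if (Character.isDigit(c)) return "digit";
        if (Character.isWhitespace(c)) return "white space";
        return "special symbol";
    }

    public static void main(String[] args) {
        char[] samples = {'A', 'a', '7', '\t', '+', ';'};
        for (char c : samples) {
            // Print the numeric code too, because a tab is invisible on screen.
            System.out.println((int) c + " : " + classify(c));
        }
    }
}
```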

What is ASCII?

    ASCII stands for American Standard Code for Information Interchange. It is an encoding system that gives a number (code) to every character like letters, digits and symbols so the computer can store and understand them.

    Even though Java primarily uses Unicode, it is fully compatible with ASCII. This means all ASCII characters are part of Java’s character set.

ASCII Character Set Includes:

  1. Uppercase Letters (A–Z)
    • Range: 65 to 90
    • Example: A → 65
  2. Lowercase Letters (a–z)
    • Range: 97 to 122
    • Example: a → 97
  3. Digits (0–9)
    • Range: 48 to 57
    • Example: 0 → 48
  4. Special Symbols
    • Ranges: 32 to 47, 58 to 64, 91 to 96 and 123 to 126
    • Examples: Space → 32, $ → 36, @ → 64
  5. Control Characters (Non-printable)
    • Range: 0 to 31, plus 127 (Delete)
    • Examples: 10 → Line Feed (New Line)
  6. Extended ASCII (Optional)
    • Range: 128 to 255 (not part of standard 7-bit ASCII; used for graphical symbols and extra characters in some systems)
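The ranges above can be turned into a small lookup. This is a minimal sketch with a hypothetical class AsciiRanges and method describe. It treats only codes 0 to 127 as ASCII, counts 127 (Delete) as a control character, and labels every remaining printable code as a special symbol.

```java
class AsciiRanges {
    // Describes a numeric code using the standard 7-bit ASCII ranges.
    static String describe(int code) {
        if (code < 0 || code > 127) return "not ASCII";
        if (code <= 31 || code == 127) return "control character";
        if (code >= 65 && code <= 90) return "uppercase letter";
        if (code >= 97 && code <= 122) return "lowercase letter";
        if (code >= 48 && code <= 57) return "digit";
        return "special symbol"; // space, punctuation and operators
    }

    public static void main(String[] args) {
        int[] samples = {65, 97, 48, 32, 10, 200};
        for (int code : samples) {
            System.out.println(code + " -> " + describe(code));
        }
    }
}
```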

Limitation of ASCII:

    ASCII supports only English characters and some symbols. It cannot represent Hindi, Chinese, Arabic or emojis. That’s why Java uses Unicode for full support.
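You can see this limitation from Java itself. The sketch below asks the standard US-ASCII charset whether it can encode a piece of text. The class AsciiCheck and the method isAscii are names chosen for this example, and the Hindi letter is written as a \u escape so the file compiles the same way under any source encoding.

```java
import java.nio.charset.StandardCharsets;

class AsciiCheck {
    // Returns true only if every character in the text exists in ASCII.
    static boolean isAscii(String text) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(isAscii("Hello"));        // Prints: true
        System.out.println(isAscii("\u0905"));       // Hindi letter, prints: false
        System.out.println(isAscii("Hello \u0905")); // Mixed text, prints: false
    }
}
```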

Unicode – A Better and Bigger System

    To support all world languages, emojis, and special characters, we needed a better system and that’s where Unicode came in.

    Unicode has room for over 1 million characters (1,114,112 code points)! It supports languages such as Hindi, Chinese, Arabic, Tamil and Japanese, and even emojis! Java uses Unicode as its standard character set.

Working with Unicode in Java

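    Every char (character data type) uses 2 bytes (16 bits), so a single char can hold 65,536 different values. That covers the most commonly used Unicode characters directly. Characters with larger codes, such as most emojis, are stored as a pair of char values called a surrogate pair. Here’s how you can use Unicode in Java: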

public class UnicodeExample {
    public static void main(String[] args) {
        char ch1 = 'A';              // Direct character
        char ch2 = '\u0905';         // Unicode for Hindi letter अ
        System.out.println(ch1);     // Output: A
        System.out.println(ch2);     // Output: अ
    }
}

Note: \uXXXX is used to represent a Unicode character in Java. Replace XXXX with the character’s Unicode value written as exactly four hexadecimal digits.
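Four hexadecimal digits only reach as far as FFFF, so characters with larger codes need two char values. The sketch below uses the grinning face emoji (U+1F600) as an example. The class CodePointDemo and its helper methods are made up for this illustration.

```java
class CodePointDemo {
    // Builds a String from a Unicode code point, even one above FFFF.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Counts real characters (code points) rather than 16-bit char values.
    static int characterCount(String text) {
        return text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        String smiley = fromCodePoint(0x1F600);
        System.out.println(smiley.length());          // Prints: 2 (two char values)
        System.out.println(characterCount(smiley));   // Prints: 1 (one character)
        // The same emoji written as two \u escapes (a surrogate pair).
        System.out.println(smiley.equals("\uD83D\uDE00")); // Prints: true
    }
}
```

This is why length() can report a bigger number than the count of characters you see on screen.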

ASCII vs Unicode – What’s the Difference?

| Parameter | ASCII | Unicode |
|---|---|---|
| Full form | ASCII stands for American Standard Code for Information Interchange. | Unicode is a name, not an acronym. Its characters match the Universal Character Set (ISO/IEC 10646). |
| Mutual relationship | ASCII is a subset of Unicode. | Unicode is a superset of ASCII. |
| Supporting characters | ASCII supports only 128 characters using a 7-bit encoding scheme. It contains codes representing English characters, digits and standard special symbols. | Unicode supports a wide range of characters, covering more than 150 written scripts. |
| Bits per character | ASCII uses 7 bits, or 8 bits in Extended ASCII, to represent each character. | Unicode has three main encoding forms: UTF-8 (8-bit units), UTF-16 (16-bit units) and UTF-32 (32-bit units). UTF-7 is an older 7-bit scheme that is not part of the Unicode Standard. |
| Memory consumption | ASCII consumes less memory. | Unicode text usually consumes more memory than ASCII, although in UTF-8 the ASCII characters still take one byte each. |
| Characters represented | ASCII can represent only English letters, digits, certain mathematical symbols and some punctuation marks. | Unicode can represent a large range of characters, special symbols and mathematical notation from many languages, such as English, Greek, Hindi and Chinese. |
| First edition release | The first edition of ASCII was released in 1963. | The first edition of Unicode was released in 1991. |
| Applications | ASCII is used in computers and other electronic devices for the exchange of data. It is also used in markup languages like HTML. | Unicode is used across the IT industry for encoding and representing characters in computers. |
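The memory difference can be measured by counting the bytes each encoding needs for the same text. The sketch below is one way to try it, with a hypothetical class EncodingSizes and method byteCount. It uses UTF_16BE rather than UTF_16 so that no extra byte-order mark is added to the count.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingSizes {
    // Number of bytes needed to store the text in the given encoding.
    static int byteCount(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String english = "A";
        String hindi = "\u0905";        // Hindi letter (U+0905)
        String emoji = "\uD83D\uDE00";  // Grinning face (U+1F600)

        System.out.println(byteCount(english, StandardCharsets.US_ASCII)); // Prints: 1
        System.out.println(byteCount(english, StandardCharsets.UTF_8));    // Prints: 1
        System.out.println(byteCount(english, StandardCharsets.UTF_16BE)); // Prints: 2
        System.out.println(byteCount(hindi, StandardCharsets.UTF_8));      // Prints: 3
        System.out.println(byteCount(hindi, StandardCharsets.UTF_16BE));   // Prints: 2
        System.out.println(byteCount(emoji, StandardCharsets.UTF_8));      // Prints: 4
        System.out.println(byteCount(emoji, StandardCharsets.UTF_16BE));   // Prints: 4
    }
}
```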

Conclusion:

    The character set in Java includes all letters, digits, symbols and special characters used in coding.

    Java uses the Unicode system, which supports characters from all languages around the world. Each character is stored as a numeric code which the computer reads in binary form.

    Understanding character sets helps us handle text, symbols and multilingual data easily in Java. It is the key to writing programs that communicate correctly with users and devices.

