Who is this guide for?

This guide is designed for beginner-level users and takes about 1 minute to read.

Solving Text Encoding Issues Across Platforms

Fix character corruption (mojibake) when text appears garbled due to encoding mismatches.

Key Takeaways

Mojibake is garbled text that appears when text is decoded with the wrong character encoding.
ASCII covers 128 characters (English letters, digits, basic symbols) using 7 bits.
Windows applications often default to Windows-1252 (similar to Latin-1).
Look at the corruption pattern: the garbled sequence shows which encodings were mixed up.
Set UTF-8 everywhere: database charset (utf8mb4 in MySQL, UTF8 in PostgreSQL), file encoding in your editor, HTTP Content-Type headers (charset=utf-8), HTML meta charset tag, and programming language string handling.

What Is Mojibake

Mojibake is garbled text that appears when text is decoded with the wrong character encoding. For example, the word "café" stored as UTF-8 but read as Latin-1 becomes "cafÃ©". This happens because UTF-8 stores "é" as two bytes (0xC3 0xA9), and Latin-1 treats each of those bytes as a separate character ("Ã" and "©").
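The "café" example can be reproduced in a few lines of Python. This is a minimal sketch using only the built-in `str.encode` and `bytes.decode` methods; the variable names are illustrative.

```python
# Reproduce the "café" example: encode as UTF-8, then decode with the wrong codec.
original = "café"
utf8_bytes = original.encode("utf-8")    # "é" becomes the two bytes 0xC3 0xA9
garbled = utf8_bytes.decode("latin-1")   # Latin-1 maps every byte to one character

print(utf8_bytes)  # b'caf\xc3\xa9'
print(garbled)     # cafÃ©
```

No bytes are lost in this step, which is why this kind of mojibake is usually reversible.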

Understanding Character Encodings

ASCII covers 128 characters (English letters, digits, basic symbols) using 7 bits. Latin-1 (ISO 8859-1) extends ASCII to 256 characters using one full byte per character, covering Western European languages. UTF-8 uses 1 to 4 bytes per character and covers all Unicode characters (150,000+). UTF-8 is backward-compatible with ASCII: plain English text produces identical bytes in both.
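To see the variable width in practice, the sketch below encodes four sample characters (chosen here for illustration) and records how many bytes each takes in UTF-8. It also checks the ASCII compatibility claim on a short English string.

```python
# Four sample characters, one for each UTF-8 length (1 to 4 bytes).
samples = {
    "LATIN CAPITAL LETTER A": "\u0041",
    "LATIN SMALL LETTER E WITH ACUTE": "\u00e9",
    "EURO SIGN": "\u20ac",
    "GRINNING FACE": "\U0001F600",
}

utf8_lengths = {name: len(ch.encode("utf-8")) for name, ch in samples.items()}
for name, length in utf8_lengths.items():
    print(f"{name}: {length} byte(s)")

# Pure ASCII text encodes to exactly the same bytes in ASCII and UTF-8.
ascii_text = "plain English text"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)  # True
```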

Common Encoding Mismatches

Windows applications often default to Windows-1252 (similar to Latin-1). macOS and Linux generally default to UTF-8. Excel opens CSV files using the system's default encoding. MySQL databases may use latin1 when they should use utf8mb4. Copy-pasting between applications can introduce encoding changes.
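One common workaround for the Excel case is to write the CSV with a UTF-8 byte order mark (BOM), which Excel generally uses as a hint to read the file as UTF-8. The sketch below relies on Python's `utf-8-sig` codec; the file name and sample rows are hypothetical, and Excel's behavior may vary by version, so treat this as an assumption to verify on your own setup.

```python
import csv
import tempfile
from pathlib import Path

rows = [["name", "city"], ["Zo\u00eb", "Krak\u00f3w"]]  # sample data with non-ASCII letters

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "export.csv"  # hypothetical file name

    # "utf-8-sig" writes the BOM (EF BB BF) before the first character.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)

    raw = path.read_bytes()

    # Reading with "utf-8-sig" strips the BOM again.
    with open(path, "r", encoding="utf-8-sig", newline="") as f:
        round_trip = list(csv.reader(f))

has_bom = raw.startswith(b"\xef\xbb\xbf")
print(has_bom)             # True
print(round_trip == rows)  # True
```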

Diagnosis Steps

Look at the corruption pattern. "Ã©" replacing "é" means UTF-8 data read as Latin-1. "?" or "□" replacing characters means the font or encoding lacks those characters. "ÃƒÂ©" replacing "é" (double mojibake) means UTF-8 data was read as Windows-1252, saved again as UTF-8, and then read as Windows-1252 a second time.
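Once the pattern is identified, the damage can often be reversed by running the mistake backwards: encode the garbled text with the wrong codec, then decode the bytes as UTF-8. The helper below is a minimal sketch, not a general-purpose fixer; `repair_mojibake` and its parameters are hypothetical names, and it assumes the wrong codec was Windows-1252.

```python
def repair_mojibake(text, wrong_codec="cp1252", max_rounds=3):
    """Undo 'UTF-8 read as wrong_codec' damage, one layer per round."""
    for _ in range(max_rounds):
        try:
            candidate = text.encode(wrong_codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # text no longer looks like mis-decoded UTF-8
        if candidate == text:
            break  # pure ASCII or already clean: nothing changed
        text = candidate
    return text


single = repair_mojibake("caf\u00c3\u00a9")              # "cafÃ©"
double = repair_mojibake("caf\u00c3\u0192\u00c2\u00a9")  # "cafÃƒÂ©"
print(single, double)  # café café
```

Text that legitimately contains a sequence such as "Ã©" would be altered by this helper, so it is safer to apply it only to data already known to be corrupted.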

Prevention Strategy

Set UTF-8 everywhere: database charset (utf8mb4 in MySQL, UTF8 in PostgreSQL), file encoding in your editor, HTTP Content-Type headers (charset=utf-8), HTML meta charset tag, and programming language string handling. When receiving external data, detect the encoding using libraries like chardet (Python) or jschardet (JavaScript) before processing. Always explicitly specify encoding when reading files rather than relying on defaults.
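As an illustration of specifying the encoding explicitly, the sketch below reads a file as UTF-8 first and falls back to Windows-1252 only if the bytes are not valid UTF-8. It uses the standard library only; a detection library such as chardet could replace the fixed fallback list. The `read_text` helper and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def read_text(path, encodings=("utf-8", "cp1252")):
    """Return (text, encoding_used), trying each encoding in order."""
    raw = Path(path).read_bytes()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, but mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


with tempfile.TemporaryDirectory() as tmp:
    utf8_file = Path(tmp) / "utf8.txt"      # hypothetical file names
    legacy_file = Path(tmp) / "legacy.txt"
    utf8_file.write_text("caf\u00e9", encoding="utf-8")
    legacy_file.write_text("caf\u00e9", encoding="cp1252")

    utf8_result = read_text(utf8_file)
    legacy_result = read_text(legacy_file)

print(utf8_result)    # ('café', 'utf-8')
print(legacy_result)  # ('café', 'cp1252')
```

Trying UTF-8 first works well because bytes written in a legacy single-byte encoding rarely form valid UTF-8, while the reverse check almost always succeeds and hides errors.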
