Encoding and Binary Data
Serial communication deals with bytes. How those bytes map to characters — or whether they represent characters at all — depends on your encoding choice. This guide covers when to use each encoding and how to handle binary protocols that have no character representation.
The Encoding Parameter
Most mcserial read and write tools accept an encoding parameter that controls how strings convert to/from bytes:
// Text with UTF-8 (default)
// write_serial(port="/dev/ttyUSB0", data="Hello, 世界\r\n", encoding="utf-8")

// Raw bytes as Latin-1
// write_serial(port="/dev/ttyUSB0", data="\x01\x03\x00\x00\x00\x01", encoding="latin-1")
When reading, mcserial decodes incoming bytes using the specified encoding. Invalid byte sequences are replaced with the Unicode replacement character (�) rather than throwing an error; this is the errors="replace" behavior in Python.
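The replacement behavior can be reproduced in plain Python, independent of mcserial. This is a minimal sketch that assumes the tool decodes roughly like `bytes.decode(..., errors="replace")`; the sample bytes are illustrative only.

```python
# Sample bytes; 0xB8 is not valid as a standalone byte in UTF-8.
raw = bytes([0x01, 0x03, 0x02, 0x00, 0x64, 0xB8, 0x44])

# errors="replace" substitutes U+FFFD for the undecodable byte instead of raising.
as_utf8 = raw.decode("utf-8", errors="replace")

# Latin-1 assigns a code point to every byte, so nothing is replaced.
as_latin1 = raw.decode("latin-1")

print(repr(as_utf8))
print(repr(as_latin1))
```

Once a byte has been replaced with U+FFFD, its original value cannot be recovered from the decoded string, which is why the encoding choice matters before the read.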
UTF-8: The Default
UTF-8 is the default encoding for all string operations. It handles ASCII (bytes 0x00–0x7F) directly and encodes non-ASCII characters as multi-byte sequences.
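To make the single-byte versus multi-byte distinction concrete, here is a small standalone Python sketch. It does not call mcserial; it only shows how character counts and byte counts diverge once non-ASCII characters appear.

```python
text = "Temperature: 23.5°C"
encoded = text.encode("utf-8")

# ASCII characters take one byte each; the degree sign takes two.
print(len(text), len(encoded))

degree_bytes = "°".encode("utf-8")    # two-byte sequence
cjk_bytes = "世界".encode("utf-8")     # three bytes per character
ascii_bytes = "AT\r\n".encode("utf-8")  # identical to the ASCII bytes
```

Keep this in mind when sizing reads: a byte count requested from the port is not the same as the number of characters you get back.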
When to use UTF-8:
- ASCII text protocols (AT commands, console output)
- Devices that explicitly use UTF-8 (modern systems, JSON/XML output)
- Human-readable text where you want proper Unicode support
// Reading UTF-8 text
// read_serial_line(port="/dev/ttyUSB0", encoding="utf-8")
{
  "line": "Temperature: 23.5°C",
  "bytes_read": 21
}
Latin-1: The Raw Byte Passthrough
Latin-1 (ISO-8859-1) maps bytes 0x00–0xFF directly to Unicode code points U+0000–U+00FF. This makes it a perfect passthrough for raw binary data: every possible byte value round-trips through encoding and decoding unchanged.
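The round-trip property is easy to check in plain Python. The sketch below is independent of mcserial and simply exercises all 256 byte values.

```python
all_bytes = bytes(range(256))

# Decode every possible byte value, then encode it back.
as_text = all_bytes.decode("latin-1")
round_tripped = as_text.encode("latin-1")

# Each character's code point equals the original byte value.
code_points_match = all(ord(ch) == b for ch, b in zip(as_text, all_bytes))

# For contrast: a UTF-8 round trip with replacement does not preserve the bytes.
lossy = all_bytes.decode("utf-8", errors="replace").encode("utf-8")

print(round_tripped == all_bytes, code_points_match, lossy == all_bytes)
```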
When to use Latin-1:
- Binary protocols (Modbus RTU, proprietary framing)
- Data with arbitrary byte values (firmware blobs, encrypted payloads)
- Protocol analysis where you need to see exact bytes
// Writing a Modbus RTU request (address 1, read holding registers)
// write_serial(
//   port="/dev/ttyUSB0",
//   data="\x01\x03\x00\x00\x00\x01\x84\x0A",
//   encoding="latin-1"
// )
{
  "bytes_written": 8
}
mcserial defaults to Latin-1 for binary-oriented operations like rs485_scan_addresses precisely because it ensures byte-for-byte fidelity.
ASCII: Strict 7-Bit
ASCII only covers bytes 0x00–0x7F. Bytes outside this range cause encoding errors.
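For reference, this is what a strict ASCII failure looks like in plain Python. The helper `try_ascii` is hypothetical and only illustrates Python's default strict decoding; how mcserial surfaces such an error to the caller is not shown here and may differ.

```python
def try_ascii(raw: bytes) -> str:
    """Decode strictly as ASCII and report the first offending byte."""
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte outside 0x00-0x7F.
        return f"decode failed at byte offset {exc.start}: 0x{raw[exc.start]:02X}"

clean = try_ascii(b"OK\r\n")
dirty = try_ascii(b"23.5\xb0C")
print(repr(clean))
print(dirty)
```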
When to use ASCII:
- Strict validation of 7-bit protocols
- Legacy systems that only accept ASCII
- When you want encoding to fail loudly on invalid data
// This would fail if the device sends bytes > 0x7F
// read_serial(port="/dev/ttyUSB0", encoding="ascii")
Binary Data: write_serial_bytes
For precise binary control, use write_serial_bytes instead of write_serial. It accepts a list of integer byte values (0–255) and writes them directly:
// Write exact bytes without encoding conversion
// write_serial_bytes(port="/dev/ttyUSB0", data=[0x01, 0x03, 0x00, 0x00, 0x00, 0x01, 0x84, 0x0A])
{
  "bytes_written": 8
}
This is clearer than escaping bytes in a string and avoids any encoding ambiguity.
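If you already have a frame as an escaped string or as a hex string from a datasheet, converting it to the integer list is a one-liner in Python. This is a client-side sketch; the variable names are illustrative and nothing here calls mcserial.

```python
# The escaped-string form used with encoding="latin-1".
escaped = "\x01\x03\x00\x00\x00\x01\x84\x0A"

# Latin-1 maps each character back to its byte value.
as_list = list(escaped.encode("latin-1"))

# A hex string, as often printed in protocol documentation.
from_hex = list(bytes.fromhex("01 03 00 00 00 01 84 0A"))

# And back again, if a tool only accepts the string form.
back_to_string = bytes(as_list).decode("latin-1")

print(as_list)
```

Both forms describe the same eight bytes; the list form just removes the need to reason about escaping.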
Reading Binary: The Hex Dump Resource
For binary analysis, use the serial://{port}/raw resource. It returns data as a hex dump with printable ASCII annotations:
Read resource: serial:///dev/ttyUSB0/raw
Output format:
00000000 01 03 02 00 64 B8 44 |....d.D|
This shows:
- Offset (00000000)
- Hex bytes (01 03 02 00 64 B8 44)
- ASCII representation (. for non-printable, actual character otherwise)
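To read such dumps confidently, it can help to see how one is produced. The function below is a hypothetical client-side formatter, not mcserial's implementation; the single-space column separators and the 16-byte row width are assumptions.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex bytes, and a printable-ASCII column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Printable ASCII is 0x20-0x7E; everything else is shown as a dot.
        ascii_part = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
        lines.append(f"{offset:08X} {hex_part} |{ascii_part}|")
    return "\n".join(lines)

dump = hex_dump(bytes([0x01, 0x03, 0x02, 0x00, 0x64, 0xB8, 0x44]))
print(dump)
```

Only 0x64 (`d`) and 0x44 (`D`) fall in the printable range here, which is why the ASCII column is mostly dots.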
Common Protocol Patterns
Modbus RTU (Binary)
Modbus RTU is a binary protocol with CRC-16 error checking. Always use Latin-1 or write_serial_bytes:
// Request: Read holding register 0 from device 1
// write_serial(
//   port="/dev/ttyUSB0",
//   data="\x01\x03\x00\x00\x00\x01\x84\x0A",
//   encoding="latin-1"
// )

// Read response
// read_serial(port="/dev/ttyUSB0", size=7, encoding="latin-1")

// Same request using byte array
// write_serial_bytes(
//   port="/dev/ttyUSB0",
//   data=[0x01, 0x03, 0x00, 0x00, 0x00, 0x01, 0x84, 0x0A]
// )
NMEA (GPS): ASCII Text
NMEA sentences are pure ASCII with a simple checksum:
// NMEA sentences are safe with UTF-8 or ASCII
// read_serial_line(port="/dev/ttyUSB0", encoding="utf-8")
{
  "line": "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47"
}
Mixed Text/Binary Protocols
Some protocols embed binary data within text framing. Handle these by:
- Reading with Latin-1 to preserve all bytes
- Parsing the text portions as needed
- Extracting binary payloads by position
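The three steps above can be sketched in Python for a hypothetical frame layout: a `DATA:` text prefix, a 4-byte length, the binary payload, and a `\r\n` terminator. The big-endian length field and the `parse_frame` helper are assumptions for illustration; check your device's documentation for the real layout.

```python
import struct

PREFIX = b"DATA:"

def parse_frame(text: str) -> bytes:
    """Extract the binary payload from a Latin-1 decoded frame."""
    raw = text.encode("latin-1")  # recover the exact bytes that were read
    if not raw.startswith(PREFIX):
        raise ValueError("missing DATA: prefix")
    header_end = len(PREFIX) + 4
    if len(raw) < header_end:
        raise ValueError("frame too short for length field")
    # Assumed: unsigned 32-bit big-endian payload length.
    (length,) = struct.unpack(">I", raw[len(PREFIX):header_end])
    payload = raw[header_end:header_end + length]
    terminator = raw[header_end + length:header_end + length + 2]
    if len(payload) != length or terminator != b"\r\n":
        raise ValueError("truncated frame or missing terminator")
    return payload

# The payload deliberately contains 0x0D to show why position beats searching.
sample = (PREFIX + struct.pack(">I", 3) + b"\xff\x00\x0d" + b"\r\n").decode("latin-1")
payload = parse_frame(sample)
print(payload)
```

Extracting by position rather than searching for `\r\n` matters because the payload itself may contain 0x0D or 0x0A.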
// Read with Latin-1 to preserve everything
// read_serial(port="/dev/ttyUSB0", size=100, encoding="latin-1")

// Then parse: "DATA:" prefix + 4-byte length + binary payload + "\r\n"
Invalid Byte Handling
When decoding with UTF-8 (or any multi-byte encoding), invalid sequences are replaced with � (U+FFFD). This is intentional: it prevents crashes and makes problems visible.
Symptoms of encoding mismatch:
| What You See | Likely Cause |
|---|---|
| Scattered � in output | Binary data decoded as UTF-8 |
| Truncated strings | Multi-byte sequence split across reads |
| Missing bytes | XON/XOFF stripping 0x11/0x13 |
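The split-sequence case in the table can be reproduced and worked around on the client side. The sketch below assumes you read chunks with Latin-1 (so no bytes are lost) and then decode them yourself with Python's incremental UTF-8 decoder; the chunk contents are made up for the example.

```python
import codecs

# Two Latin-1 reads that happen to split the two-byte "°" sequence (0xC2 0xB0).
chunks_latin1 = ["23.5\xc2", "\xb0C\n"]

# Decoding each chunk on its own corrupts the split character.
naive = [c.encode("latin-1").decode("utf-8", errors="replace") for c in chunks_latin1]

# An incremental decoder buffers the partial sequence until the rest arrives.
decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
joined = "".join(decoder.decode(c.encode("latin-1")) for c in chunks_latin1)
joined += decoder.decode(b"", final=True)  # flush any trailing partial sequence

print(naive)
print(repr(joined))
```

Reading line by line avoids most of these splits for text protocols; the incremental decoder is useful when you must read fixed-size chunks.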
Diagnosis:
// Switch to Latin-1 to see raw bytes
// read_serial(port="/dev/ttyUSB0", encoding="latin-1")

// Or use the hex dump resource for full visibility
// Read resource: serial:///dev/ttyUSB0/raw
Encoding Quick Reference
| Encoding | Byte Range | Use Case |
|---|---|---|
| utf-8 | Multi-byte | Text protocols, console I/O, JSON (default) |
| latin-1 | 0x00–0xFF → U+0000–U+00FF | Binary protocols, raw byte passthrough |
| ascii | 0x00–0x7F | Strict 7-bit validation |
| (bytes) | 0–255 | write_serial_bytes for explicit binary |
Rules of thumb:
- Text you can read? Use UTF-8 (default)
- Binary protocol? Use Latin-1 or write_serial_bytes
- Seeing � characters? You’re decoding binary as UTF-8; switch to Latin-1
- Need to analyze raw bytes? Use the serial://{port}/raw resource
CRC and Checksum Calculation
Many binary protocols include error-checking bytes (CRC, checksum). When constructing frames:
- Build the data portion as a byte array
- Calculate the check value
- Append the check bytes
- Send via write_serial_bytes or Latin-1
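For Modbus RTU, the steps above might look like the following Python sketch. It uses the standard CRC-16/MODBUS parameters (initial value 0xFFFF, reflected polynomial 0xA001, low byte transmitted first); the helper names `crc16_modbus` and `build_frame` are illustrative and not part of mcserial.

```python
def crc16_modbus(data: bytes) -> int:
    """Bitwise CRC-16/MODBUS: init 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc

def build_frame(payload: list[int]) -> list[int]:
    """Append the CRC to the payload, low byte first as Modbus RTU requires."""
    crc = crc16_modbus(bytes(payload))
    return payload + [crc & 0xFF, crc >> 8]

# Device 1, function 0x03 (read holding registers), start 0, count 1.
frame = build_frame([0x01, 0x03, 0x00, 0x00, 0x00, 0x01])
print([f"0x{b:02X}" for b in frame])
```

The resulting list can be passed as the data argument of write_serial_bytes. Running the same CRC over a complete received frame, check bytes included, yields 0 when the frame is intact, which gives a quick validity test for responses.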
Example: Simple XOR checksum
# In your MCP client or preprocessing
data = [0x01, 0x03, 0x00, 0x00, 0x00, 0x01]
checksum = 0
for b in data:
    checksum ^= b
frame = data + [checksum]
# frame = [0x01, 0x03, 0x00, 0x00, 0x00, 0x01, 0x03]
Then send:
// write_serial_bytes(port="/dev/ttyUSB0", data=[1, 3, 0, 0, 0, 1, 3])
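The same XOR idea works in the receive direction. NMEA 0183 sentences carry an XOR of every character between `$` and `*`, written as two hex digits after the `*`. The sketch below validates a received line on the client side; the function names are hypothetical and the sample is a commonly cited GGA sentence.

```python
def nmea_checksum(sentence: str) -> int:
    """XOR of all characters between the leading '$' and the '*'."""
    body = sentence.strip().lstrip("$").split("*", 1)[0]
    checksum = 0
    for ch in body:
        checksum ^= ord(ch)
    return checksum

def nmea_is_valid(sentence: str) -> bool:
    """Compare the computed checksum with the two hex digits after '*'."""
    _, sep, tail = sentence.strip().partition("*")
    if not sep or len(tail) < 2:
        return False
    try:
        expected = int(tail[:2], 16)
    except ValueError:
        return False
    return nmea_checksum(sentence) == expected

line = "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47"
print(f"{nmea_checksum(line):02X}", nmea_is_valid(line))
```

A single changed character in the body changes the computed value, so a mismatch is a cheap signal that a line was corrupted or truncated in transit.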