Encoding and Binary Data

Serial communication deals with bytes. How those bytes map to characters — or whether they represent characters at all — depends on your encoding choice. This guide covers when to use each encoding and how to handle binary protocols that have no character representation.

The Encoding Parameter

Most mcserial read and write tools accept an encoding parameter that controls how strings are converted to and from bytes:

// Text with UTF-8 (default)
// write_serial(port="/dev/ttyUSB0", data="Hello, 世界\r\n", encoding="utf-8")

// Raw bytes as Latin-1
// write_serial(port="/dev/ttyUSB0", data="\x01\x03\x00\x00\x00\x01", encoding="latin-1")

When reading, mcserial decodes incoming bytes using the specified encoding. Invalid byte sequences are replaced with the Unicode replacement character � (U+FFFD) instead of raising an error, matching Python's errors="replace" decoding mode.
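To see what replacement decoding does to binary data, here is a small standard-library Python sketch. It decodes the Modbus request bytes used later in this guide three ways: with errors="replace", with strict decoding, and by re-encoding the replaced result. It illustrates Python's codec behavior only and does not call mcserial.

```python
# Sketch of Python's errors="replace" decoding applied to a binary Modbus frame.
frame = b"\x01\x03\x00\x00\x00\x01\x84\x0a"

# errors="replace": the invalid byte 0x84 becomes U+FFFD instead of raising.
replaced = frame.decode("utf-8", errors="replace")
print(ascii(replaced))  # '\x01\x03\x00\x00\x00\x01\ufffd\n'

# errors="strict" (Python's default) raises UnicodeDecodeError instead.
try:
    frame.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decode failed at byte", exc.start)  # strict decode failed at byte 6

# The replacement is lossy: re-encoding does not give the original bytes back.
print(replaced.encode("utf-8") == frame)  # False
```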

UTF-8: The Default

UTF-8 is the default encoding for all string operations. ASCII characters (0x00–0x7F) encode to the same single bytes they have in ASCII. Every other character becomes a sequence of two to four bytes.

When to use UTF-8:

ASCII text protocols (AT commands, console output)
Devices that explicitly use UTF-8 (modern systems, JSON/XML output)
Human-readable text where you want proper Unicode support

// Reading UTF-8 text
// read_serial_line(port="/dev/ttyUSB0", encoding="utf-8")
{
  "line": "Temperature: 23.5°C",
  "bytes_read": 21
}
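The character count and byte count of a UTF-8 line diverge as soon as a non-ASCII character appears. Here is a minimal Python check using the reading above (standard library only):

```python
# "°" (U+00B0) is outside ASCII, so UTF-8 needs two bytes for it.
text = "Temperature: 23.5°C"
encoded = text.encode("utf-8")

print(len(text))     # 19
print(len(encoded))  # 20
print(encoded[-3:])  # b'\xc2\xb0C'

# Decoding the same bytes with the wrong encoding gives mojibake, not an error.
print(encoded.decode("latin-1"))  # Temperature: 23.5Â°C
```

The last line shows the typical symptom of decoding UTF-8 bytes as Latin-1: Â° appears where ° was expected.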

Latin-1: The Raw Byte Passthrough

Latin-1 (ISO-8859-1) maps bytes 0x00–0xFF directly to Unicode code points U+0000–U+00FF. This makes it a perfect passthrough for raw binary data — every possible byte value round-trips through encoding and decoding unchanged.
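A short Python sketch of that round-trip property, using only the built-in codecs:

```python
all_bytes = bytes(range(256))
text = all_bytes.decode("latin-1")

# Each byte value maps to the code point with the same number...
assert all(ord(ch) == b for ch, b in zip(text, all_bytes))
# ...so encoding back to Latin-1 returns the original bytes exactly.
assert text.encode("latin-1") == all_bytes

# UTF-8 with errors="replace" cannot do this: in this sequence, every byte
# from 0x80 to 0xFF is invalid on its own and becomes U+FFFD.
lossy = all_bytes.decode("utf-8", errors="replace")
print(lossy.count("\ufffd"))  # 128
```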

When to use Latin-1:

Binary protocols (Modbus RTU, proprietary framing)
Data with arbitrary byte values (firmware blobs, encrypted payloads)
Protocol analysis where you need to see exact bytes

// Writing a Modbus RTU request (address 1, read holding registers)
// write_serial(
//   port="/dev/ttyUSB0",
//   data="\x01\x03\x00\x00\x00\x01\x84\x0A",
//   encoding="latin-1"
// )
{
  "bytes_written": 8
}
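Latin-1 matters for this particular frame because the CRC byte 0x84 is above 0x7F, so UTF-8 would encode it as two bytes. The Python sketch below compares both encodings of the same string. It shows Python's codecs and assumes mcserial encodes strings the same way.

```python
frame_text = "\x01\x03\x00\x00\x00\x01\x84\x0A"

as_latin1 = frame_text.encode("latin-1")  # one byte per character
as_utf8 = frame_text.encode("utf-8")      # U+0084 becomes C2 84

print(len(as_latin1), as_latin1.hex(" "))  # 8 01 03 00 00 00 01 84 0a
print(len(as_utf8), as_utf8.hex(" "))      # 9 01 03 00 00 00 01 c2 84 0a
```

With UTF-8, the device would receive nine bytes and a corrupted CRC.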

mcserial defaults to Latin-1 for binary-oriented operations like rs485_scan_addresses precisely because it ensures byte-for-byte fidelity.

ASCII: Strict 7-Bit

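ASCII only covers byte values 0x00–0x7F. Nothing outside this range is valid ASCII: a string containing such characters cannot be encoded, and bytes above 0x7F cannot be decoded.

When to use ASCII:

Strict validation of 7-bit protocols
Legacy systems that only accept ASCII
When you want non-ASCII data flagged rather than passed through silently

// Bytes > 0x7F from the device are not valid ASCII; with the replacement behavior described above, they show up as �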
// read_serial(port="/dev/ttyUSB0", encoding="ascii")
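In plain Python, strict ASCII behaves as shown below in both directions. This sketch does not establish which error mode mcserial applies to ASCII writes. Treat that as an assumption to verify against your setup.

```python
text = "23.5°C"
raw = b"23.5\xb0C"  # the same reading, with the degree sign as one Latin-1 byte

# Strict encoding refuses characters above U+007F.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("encode failed at index", exc.start)  # encode failed at index 4

# Strict decoding refuses bytes above 0x7F.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    print("decode failed at index", exc.start)  # decode failed at index 4

# With errors="replace", the bad byte becomes U+FFFD instead of raising.
replaced = raw.decode("ascii", errors="replace")
print(ascii(replaced))  # '23.5\ufffdC'
```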

Binary Data: write_serial_bytes

For precise binary control, use write_serial_bytes instead of write_serial. It accepts a list of integer byte values (0–255) and writes them directly:

// Write exact bytes without encoding conversion
// write_serial_bytes(port="/dev/ttyUSB0", data=[0x01, 0x03, 0x00, 0x00, 0x00, 0x01, 0x84, 0x0A])
{
  "bytes_written": 8
}

This is clearer than escaping bytes in a string and avoids any encoding ambiguity.
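If your protocol documentation lists frames as hex strings, a small client-side helper can turn them into the integer list that write_serial_bytes takes. The name hex_to_byte_list is hypothetical and not part of mcserial:

```python
def hex_to_byte_list(spec: str) -> list[int]:
    """Turn a hex string such as '01 03 00 00' into [1, 3, 0, 0]."""
    # bytes.fromhex ignores the spaces between byte pairs.
    return list(bytes.fromhex(spec))


data = hex_to_byte_list("01 03 00 00 00 01 84 0A")
print(data)  # [1, 3, 0, 0, 0, 1, 132, 10]

# bytes() enforces the same 0-255 range, which makes a cheap sanity check.
try:
    bytes([0x01, 0x100])
except ValueError as exc:
    print("rejected:", exc)
```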

Reading Binary: The Hex Dump Resource

For binary analysis, use the serial://{port}/raw resource. It returns data as a hex dump with printable ASCII annotations:

Read resource: serial:///dev/ttyUSB0/raw

Output format:

00000000  01 03 02 00 64 B9 AF                              |....d..|

This shows:

Offset (00000000)
Hex bytes (01 03 02 00 64 B9 AF; for this Modbus response, B9 AF is the CRC-16)
ASCII representation (. for non-printable, actual character otherwise)
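For comparison, here is a sketch of a formatter that produces the same kind of layout from a bytes object, for example after converting a Latin-1 read back to bytes. It is not mcserial's implementation, and the column spacing may differ slightly:

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as an offset, a hex column, and an |ASCII| column."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Printable ASCII (0x20-0x7E) is shown as-is; everything else becomes "."
        text_part = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in chunk)
        # Pad the hex column so the ASCII column lines up on short last lines.
        lines.append(f"{offset:08X}  {hex_part:<{width * 3}} |{text_part}|")
    return "\n".join(lines)


print(hex_dump(b"\x01\x03\x02\x00\x64"))
```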

Common Protocol Patterns

Modbus RTU (Binary)

Modbus RTU is a binary protocol with CRC-16 error checking. Always use Latin-1 or write_serial_bytes:

// Request: Read holding register 0 from device 1
// write_serial(
//   port="/dev/ttyUSB0",
//   data="\x01\x03\x00\x00\x00\x01\x84\x0A",
//   encoding="latin-1"
// )

// Read response
// read_serial(port="/dev/ttyUSB0", size=7, encoding="latin-1")

// Same request using byte array
// write_serial_bytes(
//   port="/dev/ttyUSB0",
//   data=[0x01, 0x03, 0x00, 0x00, 0x00, 0x01, 0x84, 0x0A]
// )
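The last two bytes of the request (84 0A) are the Modbus CRC-16, sent low byte first. The sketch below implements a bitwise CRC-16/MODBUS (initial value 0xFFFF, reflected polynomial 0xA001). It also includes two hypothetical helpers: one builds a function 0x03 request, the other checks the 7-byte response. Exception responses and timeouts are not handled.

```python
def crc16_modbus(data: bytes) -> int:
    """Bitwise CRC-16/MODBUS: init 0xFFFF, reflected polynomial 0xA001."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def build_read_holding(slave: int, start: int, count: int) -> bytes:
    """Read Holding Registers (0x03) request; the CRC is appended low byte first."""
    body = bytes([slave, 0x03]) + start.to_bytes(2, "big") + count.to_bytes(2, "big")
    return body + crc16_modbus(body).to_bytes(2, "little")


def parse_read_holding(response: bytes) -> list[int]:
    """Verify the CRC of a 0x03 response and return its register values."""
    if len(response) < 5:
        raise ValueError("response too short")
    body = response[:-2]
    received_crc = int.from_bytes(response[-2:], "little")
    if crc16_modbus(body) != received_crc:
        raise ValueError("CRC mismatch")
    if body[1] != 0x03:
        raise ValueError(f"unexpected function code 0x{body[1]:02X}")
    byte_count = body[2]
    if len(body) != 3 + byte_count:
        raise ValueError("byte count does not match frame length")
    registers = body[3:]
    # Register values are big-endian 16-bit words.
    return [int.from_bytes(registers[i:i + 2], "big") for i in range(0, byte_count, 2)]


request = build_read_holding(slave=1, start=0, count=1)
print(request.hex(" "))  # 01 03 00 00 00 01 84 0a
```

To check a response read with encoding="latin-1", turn the returned string back into bytes with .encode("latin-1") before passing it to parse_read_holding.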

NMEA (GPS) — ASCII Text

NMEA sentences are pure ASCII with a simple checksum: the two hex digits after * are the XOR of every character between $ and *.

// NMEA sentences are safe with UTF-8 or ASCII
// read_serial_line(port="/dev/ttyUSB0", encoding="utf-8")
{
  "line": "$GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,*47"
}
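Here is a sketch of checksum validation for sentences read this way. The helper nmea_checksum_ok is hypothetical, not an mcserial tool:

```python
from functools import reduce


def nmea_checksum_ok(sentence: str) -> bool:
    """Validate '$...*HH', where HH is the XOR of every character between '$' and '*'."""
    sentence = sentence.strip()  # drop a trailing \r\n if present
    if not sentence.startswith("$") or "*" not in sentence:
        return False
    body, _, given = sentence[1:].partition("*")
    calculated = reduce(lambda acc, ch: acc ^ ord(ch), body, 0)
    try:
        return calculated == int(given[:2], 16)
    except ValueError:  # missing or non-hex checksum digits
        return False
```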

Mixed Text/Binary Protocols

Some protocols embed binary data within text framing. Handle these by:

Reading with Latin-1 to preserve all bytes
Parsing the text portions as needed
Extracting binary payloads by position

// Read with Latin-1 to preserve everything
// read_serial(port="/dev/ttyUSB0", size=100, encoding="latin-1")

// Then parse: "DATA:" prefix + 4-byte length + binary payload + "\r\n"
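Here is a sketch of that parsing step in Python. The frame layout comes from the comment above. The comment does not specify the byte order of the 4-byte length, so this sketch assumes big-endian. The name parse_data_frame is hypothetical. Because the payload may itself contain 0x0D or 0x0A, the code relies on the length field rather than splitting on \r\n.

```python
import struct

PREFIX = b"DATA:"
TERMINATOR = b"\r\n"


def parse_data_frame(text: str) -> bytes:
    """Extract the payload from a 'DATA:' + length + payload + CRLF frame.

    Assumes a big-endian 4-byte length; check your device's documentation.
    """
    raw = text.encode("latin-1")  # undo the Latin-1 decode to recover the exact bytes
    if not raw.startswith(PREFIX):
        raise ValueError("missing DATA: prefix")
    header_end = len(PREFIX) + 4
    if len(raw) < header_end:
        raise ValueError("frame too short for length field")
    (length,) = struct.unpack(">I", raw[len(PREFIX):header_end])
    payload = raw[header_end:header_end + length]
    trailer = raw[header_end + length:header_end + length + len(TERMINATOR)]
    if len(payload) != length or trailer != TERMINATOR:
        raise ValueError("incomplete frame; read more data")
    return payload
```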

Invalid Byte Handling

When decoding with UTF-8 (or any encoding that cannot represent every byte value, such as ASCII), invalid sequences are replaced with � (U+FFFD). This is intentional: it prevents crashes and makes problems visible.

Symptoms of encoding mismatch:

What You See	Likely Cause
Scattered � in output	Binary data decoded as UTF-8
� where one read ends and the next begins	Multi-byte character split across two reads
Missing bytes	XON/XOFF stripping 0x11/0x13
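If you reassemble text on the client side, you can handle a split multi-byte character with an incremental decoder, which holds an incomplete sequence until the next chunk arrives. Here is a standard-library Python sketch. Whether mcserial buffers partial sequences itself is not covered here.

```python
import codecs

chunks = [b"Temp: 23.5\xc2", b"\xb0C\r\n"]  # "°" (C2 B0) split across two reads

# Decoding each chunk separately corrupts the split character.
naive = "".join(c.decode("utf-8", errors="replace") for c in chunks)
print(ascii(naive))  # 'Temp: 23.5\ufffd\ufffdC\r\n'

# An incremental decoder buffers the incomplete sequence until the rest arrives.
decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
joined = "".join(decoder.decode(c) for c in chunks) + decoder.decode(b"", final=True)
print(ascii(joined))  # 'Temp: 23.5\xb0C\r\n'
```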

Diagnosis:

// Switch to Latin-1 to see raw bytes
// read_serial(port="/dev/ttyUSB0", encoding="latin-1")

// Or use the hex dump resource for full visibility
// Read resource: serial:///dev/ttyUSB0/raw

Encoding Quick Reference

Encoding	Byte Range	Use Case
`utf-8`	1–4 bytes per character	Text protocols, console I/O, JSON (default)
`latin-1`	0x00–0xFF → U+0000–U+00FF	Binary protocols, raw byte passthrough
`ascii`	0x00–0x7F	Strict 7-bit validation
(bytes)	0–255	`write_serial_bytes` for explicit binary

Rules of thumb:

Text you can read? Use UTF-8 (default)
Binary protocol? Use Latin-1 or write_serial_bytes
Seeing � characters? You are probably decoding binary data as UTF-8; switch to Latin-1
Need to analyze raw bytes? Use the serial://{port}/raw resource

CRC and Checksum Calculation

Many binary protocols include error-checking bytes (CRC, checksum). When constructing frames:

Build the data portion as a byte array
Calculate the check value
Append the check bytes
Send via write_serial_bytes or Latin-1

Example: Simple XOR checksum

# In your MCP client or preprocessing
# (illustrative XOR checksum only; Modbus RTU itself uses CRC-16)
data = [0x01, 0x03, 0x00, 0x00, 0x00, 0x01]
checksum = 0
for b in data:
    checksum ^= b
frame = data + [checksum]
# frame = [0x01, 0x03, 0x00, 0x00, 0x00, 0x01, 0x03]

Then send:

// write_serial_bytes(port="/dev/ttyUSB0", data=[1, 3, 0, 0, 0, 1, 3])