How do I find invalid UTF-8 byte sequences in a file in Vim?
Answer
8g8
Explanation
The 8g8 command searches from the cursor position to the end of the file for the first byte that belongs to an invalid UTF-8 sequence; the search does not wrap around to the top. This is useful when a file appears corrupt or displays garbled characters, because it lets you pinpoint exactly where the encoding problem begins.
How it works
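- g8 shows the UTF-8 byte encoding of the character under the cursor
- Prefixing with 8 changes the behaviour: 8g8 finds the next invalid UTF-8 byte at or after the cursor and moves the cursor there
- If no invalid byte is found, the cursor stays put (Vim signals this with a beep rather than a jump)
- Works when encoding is an 8-bit encoding, or when encoding is utf-8 and fileencoding is an 8-bit encoding, which is the usual state after Vim fails to read a file as UTF-8 and falls back to latin1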
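To make "first invalid byte" concrete, the same check can be sketched outside Vim. The following is a minimal Python illustration, not Vim's implementation; the function name first_invalid_utf8 is hypothetical, and it relies only on the standard behaviour of bytes.decode, which raises UnicodeDecodeError with a start attribute marking the offending offset.

```python
from typing import Optional


def first_invalid_utf8(data: bytes, start: int = 0) -> Optional[int]:
    """Return the offset of the first byte at or after `start` that begins
    an invalid UTF-8 sequence, or None if the remainder decodes cleanly.

    Hypothetical helper for illustration; this is not how Vim does it.
    """
    try:
        data[start:].decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is relative to the slice, so shift it back.
        return start + err.start
    return None
```

As with 8g8, the search runs forward only and never wraps. If start falls in the middle of a multibyte character, the continuation byte at that position is itself reported as invalid.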
Example
Suppose you open a log file that contains garbled bytes (shown as question marks or boxes in your terminal). Pressing 8g8 from the top of the file jumps the cursor directly to the first invalid byte, which you can then inspect with g8 to see its raw byte values.
2024-01-15 request from 192.168.1.1
2024-01-15 user [garbled bytes here]
2024-01-15 response 200 OK
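To try this yourself, you need a file that really contains invalid bytes. The sketch below builds one in Python; the function names, the default file name sample.log, and the choice of 0xFF 0xFE as the garbled bytes are all assumptions made for illustration (neither byte can appear anywhere in valid UTF-8).

```python
from pathlib import Path


def build_sample_log() -> bytes:
    """Return three log lines; the second contains two bytes (0xFF, 0xFE)
    that are never valid in UTF-8."""
    lines = [
        b"2024-01-15 request from 192.168.1.1\n",
        b"2024-01-15 user \xff\xfe\n",
        b"2024-01-15 response 200 OK\n",
    ]
    return b"".join(lines)


def write_sample_log(path: str = "sample.log") -> None:
    """Write the sample bytes to `path` in binary mode, so nothing is
    re-encoded on the way to disk."""
    Path(path).write_bytes(build_sample_log())
```

After calling write_sample_log(), open the file in Vim, press gg to go to the top, then 8g8. The cursor should land on the first garbled byte in the second line.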
Tips
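- 8g8 does not integrate with n. To find the next invalid byte, move the cursor off the current one (for example with l) and press 8g8 again, since the command does not move when the cursor is already on an invalid byte
- Combine with :set fileencoding? to check the encoding Vim chose
- After jumping to the offending byte, use r to replace it or x to delete it
- Use :set bomb / :set nobomb to control the BOM (byte order mark) separately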
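When a file has many bad bytes, repeating the search by hand gets tedious. As a rough complement to the first tip, the Python sketch below lists the offset of every invalid sequence in one pass. The function name invalid_utf8_offsets is hypothetical, and the result is Python's view of where each invalid sequence starts, which may not match Vim's cursor positions in every edge case.

```python
from typing import List


def invalid_utf8_offsets(data: bytes) -> List[int]:
    """Return the starting offset of every invalid UTF-8 sequence in `data`.

    Illustrative helper: decodes repeatedly, resuming just past each
    sequence that the decoder rejects.
    """
    offsets: List[int] = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data is valid
        except UnicodeDecodeError as err:
            offsets.append(pos + err.start)
            # err.end is the slice-relative index just past the rejected
            # bytes, so each bad sequence is reported once.
            pos += err.end
    return offsets
```

Each offset counts bytes from the start of the file. In Vim, :goto N moves the cursor to byte N counting from 1, so add one to an offset before using it there.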