vimtricks.wiki Concise Vim tricks, one at a time.

How do I find invalid UTF-8 byte sequences in a file in Vim?

Answer

8g8

Explanation

The 8g8 command searches forward, starting at the cursor, for the first byte that belongs to an invalid UTF-8 sequence. This is handy when a file appears corrupt or displays garbled characters, because it shows you exactly where the encoding problem begins.

How it works

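  • g8 prints the hex values of the bytes that make up the character under the cursor, assuming it is UTF-8
  • With the 8 prefix the command does something different: 8g8 searches for an illegal UTF-8 byte sequence at or after the cursor and moves the cursor to it
  • If nothing is found, Vim beeps and the cursor stays put; the search does not wrap around the end of the file
  • According to :help 8g8, it works when 'encoding' is an 8-bit encoding, or when 'encoding' is utf-8 and 'fileencoding' is an 8-bit encoding such as latin1. The second case is what typically happens when a file meant to be UTF-8 is read as latin1 because it contains illegal bytes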
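To make the two commands concrete outside Vim, here is a minimal Python sketch. It assumes Python 3 and uses Python's strict UTF-8 decoder as a stand-in for Vim's own validity check; the two may disagree on edge cases. The helper names g8 and find_invalid_utf8 are hypothetical:

  • g8 formats a character's UTF-8 bytes as space-separated hex
  • find_invalid_utf8 returns the offset of the first invalid sequence at or after start, or None where Vim would beep

```python
from typing import Optional

# Hypothetical helpers: Python's strict UTF-8 decoder stands in for Vim's check.


def g8(ch: str) -> str:
    """Return ch's UTF-8 bytes as space-separated lowercase hex (roughly what g8 prints)."""
    return " ".join(f"{b:02x}" for b in ch.encode("utf-8"))


def find_invalid_utf8(data: bytes, start: int = 0) -> Optional[int]:
    """Return the offset of the first invalid UTF-8 sequence at or after start.

    Like 8g8, the scan only moves forward and never wraps around;
    returning None plays the role of Vim's beep.
    """
    try:
        data[start:].decode("utf-8")  # strict decoding raises on the first bad sequence
    except UnicodeDecodeError as err:
        return start + err.start  # err.start is relative to the slice
    return None


print(g8("\u20ac"))                            # euro sign: e2 82 ac
print(find_invalid_utf8(b"price: 5\x80 EUR"))  # 8
print(find_invalid_utf8(b"plain ASCII"))       # None
```

The start parameter mirrors the cursor position. Text before it is never examined, just as 8g8 ignores everything above and to the left of the cursor.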

Example

Suppose you open a log file that contains garbled bytes (shown as question marks or boxes in your terminal). Pressing gg8g8 (gg to go to the top of the file, then 8g8) jumps the cursor directly to the first invalid byte, which you can then inspect with g8 to see its byte values.

2024-01-15 request from 192.168.1.1
2024-01-15 user [garbled bytes here]
2024-01-15 response 200 OK
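The example log can be reproduced as bytes to see where the jump should land. This sketch makes two hypothetical assumptions:

  • the garbled text is the name José saved as Latin-1, so é is the single byte 0xe9, which is not valid UTF-8 on its own
  • the locate_first_invalid helper uses Python's strict decoder as a stand-in for Vim's check

The helper reports the 1-based line and byte column of the first invalid byte. That is roughly where gg8g8 would leave the cursor.

```python
from typing import Optional, Tuple

# Hypothetical sample: line 2 holds "José" encoded as Latin-1 (é is the lone byte 0xe9).
LOG = (
    b"2024-01-15 request from 192.168.1.1\n"
    b"2024-01-15 user Jos\xe9\n"
    b"2024-01-15 response 200 OK\n"
)


def locate_first_invalid(data: bytes) -> Optional[Tuple[int, int]]:
    """Return the 1-based (line, byte column) of the first invalid UTF-8 byte, or None."""
    # Splitting on b"\n" is safe: newline never occurs inside a UTF-8 multibyte sequence.
    for lnum, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as err:
            return lnum, err.start + 1  # 1-based, like Vim's line and column numbers
    return None


print(locate_first_invalid(LOG))  # (2, 20)
```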

Tips

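  • 8g8 is not a pattern search, so n won't repeat it. It also won't move while the cursor is already on an illegal byte, so step past the byte (for example with l) before pressing 8g8 again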
  • Use :set fileencoding? to check which encoding Vim chose when it read the file
  • After jumping to the offending byte, use r to replace it or x to delete it
  • Use :set bomb / :set nobomb to control the BOM (byte order mark) separately
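Because 8g8 does not repeat with n and will not move while the cursor sits on an illegal byte (see :help 8g8), finding every bad byte is a loop: jump, step one byte past, jump again. The hypothetical all_invalid_offsets helper below sketches that loop in Python, again using the strict UTF-8 decoder as a stand-in for Vim's check. Each step skips only a single byte, so a truncated multibyte sequence may be reported once per byte.

```python
from typing import List


def all_invalid_offsets(data: bytes) -> List[int]:
    """Collect the offset of every invalid UTF-8 sequence, scanning forward only.

    Hypothetical helper mirroring the manual loop: land on a bad byte,
    step one byte past it, then search again.
    """
    hits: List[int] = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data is valid: no further hits
        except UnicodeDecodeError as err:
            hit = pos + err.start
            hits.append(hit)
            pos = hit + 1  # step past the offending byte and search again
    return hits


print(all_invalid_offsets(b"a\xffb\xfec"))  # [1, 3]
```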
