How do I collapse consecutive duplicate words across a file?
Answer
:%s/\v<(\w+)\s+\1>/\1/g<CR>
Explanation
OCR cleanup, copy-paste artifacts, and rushed note-taking often produce repeated words such as "the the" or "is is". Instead of fixing them manually, you can remove adjacent duplicates across the whole buffer with one regex substitution. Only the second copy of the word and the whitespace before it are removed, so punctuation and the rest of each line stay untouched.
How it works
- :%s/.../.../g runs substitution on all lines in the file
- \v enables very-magic mode so the pattern stays readable
- <(\w+) anchors at the start of a word and captures the whole word into group 1
- \s+ matches one or more whitespace characters (spaces or tabs) between repeated words
- \1> requires the exact same captured word again, ending at a word boundary
- Replacement \1 keeps only one copy
This pattern is especially useful before proofreading passes, search indexing, or generating diff-friendly output from noisy source text.
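To see what \v saves you, here is a sketch of the same substitution written in Vim's default magic mode, where the word anchors, the group parentheses, and the + quantifier each need a backslash. Both commands should behave identically; the second is shown only for comparison.

```vim
" Very-magic form used in this tip
:%s/\v<(\w+)\s+\1>/\1/g

" Same pattern in default magic mode: \< \( \) \+ \> must all be escaped
:%s/\<\(\w\+\)\s\+\1\>/\1/g
```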
Example
Before:
This is is a test.
We we should should fix this.
Run:
:%s/\v<(\w+)\s+\1>/\1/g
After:
This is a test.
We should fix this.
Tips
- Add c (.../gc) when you want confirmation per match.
- For case-insensitive cleanup, prepend \c inside the pattern.
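Written out in full, the two tips might look like the following. This is a sketch: \c is placed right after \v here, though it can appear anywhere in the pattern.

```vim
" Ask before each replacement (answer y, n, a, q or l at the prompt)
:%s/\v<(\w+)\s+\1>/\1/gc

" Ignore case, so "The the" is also collapsed; the first spelling is kept
:%s/\v\c<(\w+)\s+\1>/\1/g
```

Confirmation is worth using on prose where a doubled word can be intentional, such as "had had" or "that that".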