When dealing with legacy data, it's pretty common to run into malformed or illegal byte sequences in files. Figuring out which row is causing the problem is often really difficult, especially when the file has thousands of rows.
Here's a trick I pretty much stumbled upon:
nl file.txt | sort
sort: string comparison failed: Illegal byte sequence
sort: Set LC_ALL='C' to work around the problem.
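
To narrow things down to specific rows, something like the following might help. This is only a rough sketch, not necessarily how the trick above is meant to be used. It assumes the file is supposed to be UTF-8 (which may not be true for your legacy data) and that iconv is installed. The function name find_bad_lines is made up for illustration.

```shell
# Hypothetical helper: print the line number of every line in a file
# that is not valid UTF-8. Assumes the file is meant to be UTF-8.
find_bad_lines() {
  n=0
  # Read raw lines; the "|| [ -n ... ]" part also handles a last line
  # that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    # iconv exits non-zero when it hits a byte sequence it cannot decode.
    if ! printf '%s\n' "$line" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
      printf '%s\n' "$n"
    fi
  done < "$1"
}

# Usage: find_bad_lines file.txt
```

Running iconv once per line is slow on very large files, but it keeps the check independent of the current locale, since the encoding is passed explicitly with -f.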