Created April 2, 2014 16:44
finding non-ascii characters
#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Here's a program that will find non-ASCII characters that might
# break your program, especially if the non-ASCII characters are
# in a JSON config file.

# The offending characters below should be smart quotes
# from M$ Word on the first line of the cfg.

cfg = u"""
Some Offending “ascii text”.
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed
do eiusmod tempor incididunt ut labore et dolore magna aliqua.
Ut enim ad minim veniam, quis nostrud exercitation ullamco
laboris nisi ut aliquip ex ea commodo consequat. Duis aute
irure dolor in reprehenderit in voluptate velit esse cillum
dolore eu fugiat nulla pariatur. Excepteur sint occaecat
cupidatat non proident, sunt in culpa qui officia deserunt
mollit anim id est laborum.
"""

LINE_BREAK = 10
CARRIAGE_RETURN = 13

lines = 1
prev = None
for c in cfg:
    code = ord(c)
    # Count a CR LF pair as a single line break, not two.
    if code == CARRIAGE_RETURN or (code == LINE_BREAK and prev != CARRIAGE_RETURN):
        lines += 1
    # Code points above 127 are outside the ASCII range.
    if code > 127:
        print(u"line # %d, code point: %d, actual character: %s" % (lines, code, c))
    prev = code
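
The same check can be wrapped in a reusable function. The following is only a sketch, not part of the original gist: the name `find_non_ascii`, the `sample` string, and the choice to also report the column and the Unicode character name are assumptions made for illustration.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, code point, name) for each non-ASCII character.

    Hypothetical helper, not from the original script.
    """
    hits = []
    # splitlines() treats \n, \r and \r\n each as one line break
    # (it also splits on a few other Unicode line separators).
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                # unicodedata.name() falls back to "UNKNOWN" for unnamed code points.
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col_no, ord(ch), name))
    return hits


# Illustrative JSON-like snippet with smart quotes around the value.
sample = u'{\r\n  "name": \u201cdemo\u201d\r\n}'
for hit in find_non_ascii(sample):
    print("line %d, column %d, U+%04X, %s" % hit)
```

Returning a list instead of printing directly makes it easy to fail a build or a config loader when the list is non-empty.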