@mattcarp
Created July 24, 2012 22:09
Generates a random-length string of non-control Unicode characters
#!/usr/bin/python
import random
import unicodedata


def unicode_fuzz(lower_limit=0, upper_limit=60):
    unicode_glyphs = ''.join(
        unichr(char)
        for char in xrange(65533)
        # use the unicode categories that don't include control codes
        if unicodedata.category(unichr(char))[0] in ('LMNPSZ')
    )
    rand_length = random.randint(lower_limit, upper_limit)
    # generate it
    utf_string = ''.join([random.choice(unicode_glyphs).encode('utf-8')
                          for i in xrange(rand_length)])
    return utf_string

# call it with some upper and lower string lengths, or use the defaults
# print unicode_fuzz(0, 250)
@Nasko-5

Nasko-5 commented Apr 18, 2021

what is xrange?

@FooqX

FooqX commented Sep 27, 2021

This doesn't work! I tried fixing it but it didn't output anything. Here is my fixed version:

from random import randint, choice
from unicodedata import category

import pyparsing
from cffi.backend_ctypes import xrange


def gen_unicode(lower_limit, upper_limit):
    unicode_glyphs = ''.join(
        pyparsing.unichr(char) for char in xrange(65533) if category(pyparsing.unichr(char))[0] in 'LMNPSZ')

    rand_length = randint(lower_limit, upper_limit)
    utf_string = ''.join([choice(unicode_glyphs).encode('utf-8') for _ in xrange(rand_length)])

    return utf_string


print(gen_unicode(0, 30))

@VBPROGER

VBPROGER commented Apr 7, 2022

This doesn't work! I tried fixing it but it didn't output anything. Here is my fixed version:

from random import randint, choice
from unicodedata import category

import pyparsing
from cffi.backend_ctypes import xrange


def gen_unicode(lower_limit, upper_limit):
    unicode_glyphs = ''.join(
        pyparsing.unichr(char) for char in xrange(65533) if category(pyparsing.unichr(char))[0] in 'LMNPSZ')

    rand_length = randint(lower_limit, upper_limit)
    utf_string = ''.join([choice(unicode_glyphs).encode('utf-8') for _ in xrange(rand_length)])

    return utf_string


print(gen_unicode(0, 30))

Error: TypeError: sequence item 0: expected str instance, bytes found
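
For context on that error: the gist was written for Python 2, where xrange and unichr are builtins. In Python 3 they are replaced by range and chr, and .encode('utf-8') returns bytes, which ''.join() on a str refuses to join. A minimal Python 3 sketch of the same idea follows, assuming the goal is a str result; it reuses the unicode_fuzz name from the gist, and the variable names glyphs and length are illustrative rather than the author's. It is one possible port, not the author's own fix.

```python
import random
import unicodedata


def unicode_fuzz(lower_limit=0, upper_limit=60):
    """Return a random-length str of non-control Unicode characters."""
    # Keep code points whose general category starts with L, M, N, P, S or Z.
    # This skips every C* category (control, format, surrogate, unassigned).
    glyphs = [
        chr(cp)                      # chr() replaces Python 2's unichr()
        for cp in range(65533)       # range() replaces Python 2's xrange()
        if unicodedata.category(chr(cp))[0] in "LMNPSZ"
    ]
    length = random.randint(lower_limit, upper_limit)
    # Join str objects directly; joining encoded bytes is what raised TypeError.
    return "".join(random.choice(glyphs) for _ in range(length))


text = unicode_fuzz(0, 30)
print(repr(text))

# Encode once at the end, and only if bytes are actually needed.
data = text.encode("utf-8")
```

Because surrogate code points fall in category Cs and are filtered out, the resulting string should always be encodable as UTF-8.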

@mattcarp
Author

mattcarp commented Oct 11, 2022 via email
