Caveats: These are all guesswork; they may be incorrect, or may block more than intended, but they work for me.
NB: I have a robots.txt file specifying a crawl rate of one request every 5 seconds, which the crawlers below appear to ignore. Generally I turn a blind eye to anything that isn't hitting server-generated URLs multiple times a second; the ones listed here are being excessive and causing undue load on relatively modest servers.
User-agent: *
Allow: /
Crawl-delay: 5
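The "multiple requests a second" test above can be checked against an access log. The sketch below is illustrative only: it assumes a Common/Combined Log Format log and an arbitrary threshold, and the function name and sample data are my own, not from the original note.

```python
# Hypothetical sketch: flag IPs making more than N requests in any
# single second, from access-log lines in Common Log Format.
import re
from collections import Counter

# Capture the client IP and the bracketed timestamp.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\]')

def flag_excessive(lines, per_second_limit=2):
    """Return IPs that exceeded per_second_limit requests in one second."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m:
            ip, timestamp = m.groups()
            # Timestamp like "10/Oct/2024:13:55:36 +0000" — the part
            # before the space already has one-second resolution.
            hits[(ip, timestamp.split()[0])] += 1
    return sorted({ip for (ip, _), n in hits.items() if n > per_second_limit})

# Illustrative log lines using documentation-reserved IP ranges.
sample = [
    '203.0.113.9 - - [10/Oct/2024:13:55:36 +0000] "GET /search?q=a HTTP/1.1" 200 512',
    '203.0.113.9 - - [10/Oct/2024:13:55:36 +0000] "GET /search?q=b HTTP/1.1" 200 512',
    '203.0.113.9 - - [10/Oct/2024:13:55:36 +0000] "GET /search?q=c HTTP/1.1" 200 512',
    '198.51.100.4 - - [10/Oct/2024:13:55:40 +0000] "GET / HTTP/1.1" 200 512',
]
print(flag_excessive(sample))  # → ['203.0.113.9'] (3 requests in one second)
```

Anything this flags repeatedly is a candidate for blocking; a single IP at one request every few seconds, per the robots.txt above, would pass unflagged.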