@selfboot
Last active December 15, 2015 17:39
Python encoding detection example.
>>> import requests
>>> r = requests.get('http://www.luoo.net/radio/radio2/mp3player.xml')
>>> r.status_code
200
>>> print r.content
锘??xml version="1.0" encoding="UTF-8"?>
<player showDisplay="yes" showPlaylist="yes" autoStart="yes">
<song path="http://ftp.luoo.net/radio/radio2/1.mp3" title="鏃呰€? />
<song path="http://ftp.luoo.net/radio/radio2/2.mp3" title="绱㈤潪浜? />
<song path="http://ftp.luoo.net/radio/radio2/3.mp3" title="涓夊嘲" />
<song path="http://ftp.luoo.net/radio/radio2/4.mp3" title="閬ヤ笉鍙強" />
...
>>> import chardet
>>> chardet.detect(r.content)
{'confidence': 1.0, 'encoding': 'UTF-8'}
>>> print r.content.decode('utf8').encode('gbk')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'gbk' codec can't encode character u'\ufeff' in position 0: illegal multibyte sequence
>>> print r.content.decode('utf-8-sig').encode('gbk')
<?xml version="1.0" encoding="UTF-8"?>
<player showDisplay="yes" showPlaylist="yes" autoStart="yes">
<song path="http://ftp.luoo.net/radio/radio2/1.mp3" title="旅者" />
<song path="http://ftp.luoo.net/radio/radio2/2.mp3" title="索非亚" />
<song path="http://ftp.luoo.net/radio/radio2/3.mp3" title="三峰" />
<song path="http://ftp.luoo.net/radio/radio2/4.mp3" title="遥不可及" />
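
The session above fails because the response starts with a UTF-8 byte order mark (BOM), which decodes to U+FEFF and has no GBK equivalent. The following is a minimal offline sketch of the same fix, written for Python 3 (an assumption, since the session above is Python 2). The `sample` bytes are a made-up stand-in for `r.content`, and `decode_utf8_bom_aware` is a hypothetical helper name, not part of `requests` or `chardet`.

```python
import codecs


def decode_utf8_bom_aware(raw: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    # 'utf-8-sig' skips a leading EF BB BF sequence; when there is no BOM
    # it behaves like plain 'utf-8'.
    return raw.decode("utf-8-sig")


# Hypothetical stand-in for r.content: a UTF-8 BOM followed by UTF-8 XML.
sample = codecs.BOM_UTF8 + '<song title="旅者" />'.encode("utf-8")

plain = sample.decode("utf-8")         # the BOM survives as U+FEFF
fixed = decode_utf8_bom_aware(sample)  # the BOM is stripped

print(repr(plain[0]))
print(fixed)

# Re-encoding to GBK works only for the BOM-free string.
gbk_bytes = fixed.encode("gbk")
try:
    plain.encode("gbk")
except UnicodeEncodeError as exc:
    print("GBK cannot encode the BOM:", exc.reason)
```

Decoding with `utf-8-sig` is safe for input both with and without a BOM, so it can be used as the default when the source of the bytes is unknown.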