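Unicode
Unicode is a character encoding standard that assigns a code point to every character. UTF-8 is one way of implementing it: it defines how those code points are stored as bytes.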
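To make the distinction concrete, here is a minimal sketch that takes one character and encodes it two different ways. The variable names are illustrative only, and the snippet is written so that it should run on both Python 2 and Python 3. The code point stays the same; only the byte representation changes with the encoding.

```python
# -*- coding: utf-8 -*-
ch = u'\u554a'                      # the character 啊, code point U+554A

code_point = ord(ch)                # the Unicode code point as an integer
as_utf8 = ch.encode('utf-8')        # the same character stored as UTF-8 bytes
as_utf16 = ch.encode('utf-16-be')   # the same character stored as big-endian UTF-16 bytes

print(hex(code_point))
print(len(as_utf8), len(as_utf16))  # byte counts differ between the two encodings
```

UTF-16 is used here only as a second encoding for comparison; it is not discussed elsewhere in this post.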
Encoding in Python 2
In [1]: a = '啊哈哈'
In [2]: a
Out[2]: '\xe5\x95\x8a\xe5\x93\x88\xe5\x93\x88'
In [4]: type(a)
Out[4]: str
In [5]: len(a)
Out[5]: 9
In [6]: b = u'姚赫赫'
In [7]: type(b)
Out[7]: unicode
In [8]: len(b)
Out[8]: 3
In [9]: a.decode('utf-8')
Out[9]: u'\u554a\u54c8\u54c8'
In [10]: b
Out[10]: u'\u59da\u8d6b\u8d6b'
In [11]: b.encode('utf-8')
Out[11]: '\xe5\xa7\x9a\xe8\xb5\xab\xe8\xb5\xab'
In [12]: c = '姚赫赫'
In [13]: c
Out[13]: '\xe5\xa7\x9a\xe8\xb5\xab\xe8\xb5\xab'
In [14]: import sys
In [15]: sys.getdefaultencoding()
Out[15]: 'ascii'
In [16]: b + c
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-16-c6b7c7e5694f> in <module>()
----> 1 b + c
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)
In [17]: import sys
In [19]: reload(sys)
Out[19]: <module 'sys' (built-in)>
In [20]: sys.setdefaultencoding('utf-8')
In [21]: b + c
Out[21]: u'\u59da\u8d6b\u8d6b\u59da\u8d6b\u8d6b'
In [22]: type(b + c)
Out[22]: unicode
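Changing the default encoding with sys.setdefaultencoding affects the whole process. A narrower alternative, sketched here as an assumption rather than something the session above uses, is to decode the byte string explicitly before concatenating. The names b and c mirror the session; the literals are written with escapes so the snippet should run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
b = u'\u59da\u8d6b\u8d6b'                       # text: unicode in Python 2, str in Python 3
c = b'\xe5\xa7\x9a\xe8\xb5\xab\xe8\xb5\xab'     # the same text as UTF-8 bytes

# Decode the bytes explicitly instead of relying on the default encoding.
combined = b + c.decode('utf-8')

print(len(combined))
```

With the explicit decode, the result does not depend on what sys.getdefaultencoding() returns.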
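In Python 2, a = '啊哈哈' gives a the type str, which is an encoded byte sequence, so len(a) is the number of bytes. b, by contrast, has type unicode (it stores a text string), so len(b) is the number of characters. When the two types are mixed, as in b + c, Python 2 implicitly decodes the str with the default encoding. That default is ascii, which is why b + c fails until the default encoding is changed to utf-8.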
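The byte count versus character count difference can be checked directly. The following is a small sketch with illustrative variable names, written to run on both Python 2 and Python 3 (where the byte sequence has type bytes rather than str).

```python
# -*- coding: utf-8 -*-
text = u'\u554a\u54c8\u54c8'        # the text 啊哈哈 as a unicode string
raw = text.encode('utf-8')          # the encoded byte sequence

char_count = len(text)              # counts characters
byte_count = len(raw)               # counts bytes
round_trip = raw.decode('utf-8')    # decoding the bytes gives the text back

print(char_count, byte_count)
```

Each of these three characters takes three bytes in UTF-8, which matches the lengths shown in the session.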
Converting between the two
str --> decode('utf-8') --> unicode
unicode --> encode('utf-8') --> str
When writing to a file, a str can be written directly, but a unicode object must be encoded before it is written.
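The file-writing rule can be sketched in two ways: encode the text yourself and write the bytes, or let io.open do the encoding. The temporary file path and variable names below are hypothetical, chosen only for the demonstration, and the snippet is written to run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

text = u'\u59da\u8d6b\u8d6b'                  # unicode text to be written

# Hypothetical scratch file used only for this demonstration.
fd, path = tempfile.mkstemp(suffix='.txt')
os.close(fd)

# Option 1: encode explicitly, then write the bytes in binary mode.
with open(path, 'wb') as f:
    f.write(text.encode('utf-8'))
with open(path, 'rb') as f:
    raw_explicit = f.read()

# Option 2: io.open encodes the text on write when given an encoding.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(text)
with open(path, 'rb') as f:
    raw_io = f.read()

os.remove(path)
print(raw_explicit == raw_io)
```

Both options put the same UTF-8 bytes on disk; the second keeps the encoding choice in one place, at the point where the file is opened.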
When reposting, please cite the original source: https://ju.6miu.com/read-662487.html