Python 2: unicode and str

    xiaoxiao  2021-03-26  22

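    Unicode

    Unicode is a character encoding standard that assigns a number (a code point) to every character. UTF-8 is one way of implementing Unicode, that is, one scheme for turning those code points into bytes.

    Python 2 encoding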
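To make the distinction concrete, here is a minimal sketch that takes one character and shows its code point next to two different byte encodings. The variable names are illustrative only, and the snippet is written so that it should run unchanged on Python 2.7 and Python 3.

```python
# One character, written as an escape so the source file stays pure ASCII.
ch = u'\u554a'

# Unicode itself only assigns a number (code point) to the character.
code_point = ord(ch)

# An encoding decides how that number is laid out as bytes.
utf8_bytes = ch.encode('utf-8')        # 3 bytes in UTF-8
utf16_bytes = ch.encode('utf-16-be')   # 2 bytes in UTF-16 (big-endian, no BOM)

print(hex(code_point))
print(len(utf8_bytes))
print(len(utf16_bytes))
```

The same code point yields different byte sequences depending on the encoding, which is why bytes can only be interpreted once the encoding is known.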

    In [1]: a = '啊哈哈'
    In [2]: a
    Out[2]: '\xe5\x95\x8a\xe5\x93\x88\xe5\x93\x88'
    In [4]: type(a)
    Out[4]: str
    In [5]: len(a)
    Out[5]: 9
    In [6]: b = u'姚赫赫'
    In [7]: type(b)
    Out[7]: unicode
    In [8]: len(b)
    Out[8]: 3
    In [9]: a.decode('utf-8')
    Out[9]: u'\u554a\u54c8\u54c8'
    In [10]: b
    Out[10]: u'\u59da\u8d6b\u8d6b'
    In [11]: b.encode('utf-8')
    Out[11]: '\xe5\xa7\x9a\xe8\xb5\xab\xe8\xb5\xab'
    In [12]: c = '姚赫赫'
    In [13]: c
    Out[13]: '\xe5\xa7\x9a\xe8\xb5\xab\xe8\xb5\xab'
    In [14]: import sys
    In [15]: sys.getdefaultencoding()
    Out[15]: 'ascii'
    In [16]: b + c
    ---------------------------------------------------------------------------
    UnicodeDecodeError                        Traceback (most recent call last)
    <ipython-input-16-c6b7c7e5694f> in <module>()
    ----> 1 b + c

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)

    In [17]: import sys
    In [18]: relaod(sys)
    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    <ipython-input-18-f73449e725b6> in <module>()
    ----> 1 relaod(sys)

    NameError: name 'relaod' is not defined

    In [19]: reload(sys)
    <module 'sys' (built-in)>
    In [20]: sys.setdefaultencoding('utf-8')
    In [21]: b + c
    Out[21]: u'\u59da\u8d6b\u8d6b\u59da\u8d6b\u8d6b'
    In [22]: type(b + c)
    Out[22]: unicode

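    In Python 2, after a = '啊哈哈', a is of type str: an already-encoded byte sequence, so len(a) is the number of bytes (9 here, because each of these characters takes 3 bytes in UTF-8). b, by contrast, is of type unicode (which stores a text string), so len(b) is the number of characters (3). Mixing the two, as in b + c, makes Python 2 decode the str implicitly with the default encoding ('ascii' here), which is why In [16] fails until the default encoding is switched to 'utf-8'.

    Converting between the two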
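The session gets b + c to work by changing the interpreter-wide default encoding. A common alternative is to decode the byte string explicitly before combining it with text. The sketch below assumes the bytes are UTF-8, and to_text is a hypothetical helper introduced only for illustration; it should run on both Python 2.7 and Python 3.

```python
def to_text(value, encoding='utf-8'):
    """Hypothetical helper: return text, decoding byte strings explicitly."""
    if isinstance(value, bytes):  # in Python 2, bytes is an alias of str
        return value.decode(encoding)
    return value

b = u'\u59da\u8d6b\u8d6b'                      # text: 3 characters
c = b'\xe5\xa7\x9a\xe8\xb5\xab\xe8\xb5\xab'   # the same text as 9 UTF-8 bytes

# Length of text counts characters, length of bytes counts bytes.
lengths = (len(b), len(c))

# Decode first, then concatenate: no implicit 'ascii' decoding is involved.
combined = to_text(b) + to_text(c)

print(lengths)
print(len(combined))
```

Decoding at the boundary keeps the choice of encoding visible in the code instead of relying on a global setting.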

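    str --> decode('utf-8') --> unicode
    unicode --> encode('utf-8') --> str

    When writing to a file, a str can be written as is, whereas a unicode object has to be encoded first; otherwise Python 2 tries to encode it implicitly with the 'ascii' codec, which fails for non-ASCII text.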
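The two directions and the file-writing rule can be sketched as follows. The temporary file name demo.txt is made up for the example, and io.open is shown as one possible way to let the file object do the encoding; this is a sketch under those assumptions and should run on both Python 2.7 and Python 3.

```python
import io
import os
import tempfile

text = u'\u59da\u8d6b\u8d6b'          # unicode in Python 2

# unicode --> encode('utf-8') --> str, and back again
encoded = text.encode('utf-8')
decoded = encoded.decode('utf-8')

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Option 1: encode by hand and write the resulting bytes.
with open(path, 'wb') as f:
    f.write(encoded)

# Option 2: let io.open encode on write; it accepts text directly.
with io.open(path, 'a', encoding='utf-8') as f:
    f.write(text)

# Read the raw bytes back to see what ended up on disk.
with open(path, 'rb') as f:
    raw = f.read()

print(len(raw))
```

Both options put the same UTF-8 bytes on disk; the difference is only whether the encoding step is spelled out by the caller or handled by the file object.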

    When reposting, please cite the original URL: https://ju.6miu.com/read-662487.html
