xiaoxiao 2021-03-25

Reading Chinese-named directories in Python

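Common encodings: ASCII, GBK and UTF-8, plus Unicode, the character set that these encodings represent. Python 2.x handles text internally as unicode objects, while a plain '...' literal is a byte string (str).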

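Source files are usually saved as UTF-8, with #coding=utf-8 at the top of the file so the interpreter knows how to read them. Byte-string literals defined in such a file are then UTF-8 encoded. For example:

Code: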

    #coding=utf-8
    import chardet

    a = 'hello'        # ASCII-only byte string
    b = 'hello你好'    # byte string containing UTF-8 encoded Chinese
    print chardet.detect(a)
    print chardet.detect(b)

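Result:

    {'confidence': 1.0, 'encoding': 'ascii'}
    {'confidence': 0.7525, 'encoding': 'utf-8'}

When a string contains only ASCII characters, its UTF-8 and ASCII encodings are byte-for-byte identical (UTF-8 is a variable-length encoding that keeps ASCII characters as single bytes), so chardet reports ascii. Such a string can simply be treated as UTF-8, and decoding it as UTF-8 gives the correct result. Once the string contains characters outside ASCII, chardet can detect that it is UTF-8.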
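The overlap between ASCII and UTF-8 can be checked directly. The following is a minimal sketch, assuming Python 3 (the article's own snippets target Python 2), where str is text and bytes is encoded data. The variable names are illustrative only.

```python
# Python 3 sketch: ASCII-only text encodes to identical bytes in ASCII and UTF-8.
ascii_text = 'hello'
mixed_text = 'hello你好'

ascii_as_ascii = ascii_text.encode('ascii')
ascii_as_utf8 = ascii_text.encode('utf-8')
same_bytes = ascii_as_ascii == ascii_as_utf8

# Each of the two Chinese characters takes 3 bytes in UTF-8.
mixed_utf8 = mixed_text.encode('utf-8')

print(same_bytes)       # True
print(len(mixed_utf8))  # 11 (5 ASCII bytes + 2 * 3 bytes)
```

This is why a detector cannot tell ASCII-only UTF-8 apart from ASCII: the bytes are the same.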

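On a Windows console that uses GBK, Python 2 must output Chinese as GBK-encoded bytes. Printing UTF-8 encoded Chinese produces garbled text. For example:

Code: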

    #coding=utf-8
    a = 'hello你好'
    print a.decode('utf-8').encode('gbk')    # UTF-8 bytes -> unicode -> GBK bytes
    print a                                  # raw UTF-8 bytes, garbled on a GBK console

Result:

    hello你好
    hello浣犲ソ
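The garbled second line is what results when UTF-8 bytes are interpreted as GBK. The following minimal sketch, assuming Python 3, reproduces that mismatch in memory without needing a GBK console. The variable names are illustrative only.

```python
# Python 3 sketch: reproduce the garbled output by decoding UTF-8 bytes as GBK.
text = 'hello你好'
utf8_bytes = text.encode('utf-8')

# What a GBK console effectively does when handed UTF-8 bytes.
garbled = utf8_bytes.decode('gbk')

# Decoding with the encoding that actually produced the bytes restores the text.
recovered = utf8_bytes.decode('utf-8')

print(garbled)
print(recovered)
```

The bytes themselves are never damaged; only the choice of decoder differs between the two outputs.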

Listing a Chinese-named directory through a unicode path returns unicode objects (u'xxx'), which must be encoded as GBK before printing.

Code:

    #coding=gbk
    import os

    path = 'h:\影音'                              # GBK bytes, since the file is saved as GBK
    files = os.listdir(unicode(path, 'gbk'))      # a unicode path yields unicode names
    print files
    for item in files:
        print item.encode('gbk')                  # encode for the GBK console

Result:

    [u'\u6b4c\u66f2', u'\u7535\u5f71', u'\u7535\u89c6\u5267', u'\u7efc\u827a']
    歌曲
    电影
    电视剧
    综艺
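For comparison, here is a minimal sketch of the same listing in Python 3, which is an assumption on my part since the article targets Python 2. There, a str path is already text and os.listdir returns str names, so the unicode()/encode() round trip is not needed. The temporary directory and the two folder names are stand-ins for the h: drive used above.

```python
import os
import tempfile

# Python 3 sketch: os.listdir() given a str path returns str names.
with tempfile.TemporaryDirectory() as root:
    # Create two Chinese-named subdirectories as stand-in data.
    for name in ('歌曲', '电影'):
        os.mkdir(os.path.join(root, name))
    names = sorted(os.listdir(root))

print(names)
```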

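A Chinese-named directory must be accessed through a unicode path. As noted above, Python 2 handles text internally as unicode, so the byte-string path has to be converted first (unicode and UTF-8 are not the same thing: UTF-8 is one byte encoding of Unicode text). Passing the UTF-8 bytes directly fails.

Code: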

    #coding=utf-8
    import os

    path = 'h:\影音'           # UTF-8 bytes, not a unicode path
    files = os.listdir(path)   # fails on Windows

Result:

    Traceback (most recent call last):
      File "H:/Program/Projects/Itchat/test1/WindowsPathTest.py", line 6, in <module>
        files = os.listdir(path)
    WindowsError: [Error 3] : 'h:\\\xe5\xbd\xb1\xe9\x9f\xb3/*.*'

Fix:

    #coding=utf-8
    import os

    path = 'h:\影音'
    files = os.listdir(unicode(path, 'utf-8'))   # decode the UTF-8 bytes to unicode first
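The bytes in the error message are the UTF-8 form of the folder name, which differs from its GBK form. The following minimal sketch, assuming Python 3, makes the difference visible. The variable names are illustrative only.

```python
# Python 3 sketch: the same text has different byte forms in UTF-8 and GBK.
name = '影音'
utf8_bytes = name.encode('utf-8')
gbk_bytes = name.encode('gbk')

print(utf8_bytes)       # b'\xe5\xbd\xb1\xe9\x9f\xb3', the bytes seen in the traceback
print(len(gbk_bytes))   # 4: GBK uses two bytes for each of these characters

# Reading the UTF-8 bytes as if they were GBK does not give the name back.
misread = utf8_bytes.decode('gbk', errors='replace')
print(misread == name)  # False
```

Decoding to unicode before calling os.listdir removes the ambiguity, because the path no longer depends on which byte encoding the source file used.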

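Chinese-language Windows uses GBK (code page 936) by default, so the source file can simply be saved and declared as GBK. Chinese string literals can then be printed directly.

Code: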

    #coding=gbk
    import os

    path = 'h:\影音'
    files = os.listdir(unicode(path, 'gbk'))
    for item in files:
        print item.encode('gbk')
    print '你好'    # already GBK bytes, no conversion needed

Result:

    歌曲
    电影
    电视剧
    综艺
    你好

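The chardet module detects the encoding of byte strings. It cannot be used on unicode objects (written u'xxx'), because those are already decoded text rather than bytes.

Code: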

    #coding=utf-8
    import chardet

    a = 'hello你好'                          # UTF-8 bytes
    b = a.decode('utf-8').encode('gbk')      # GBK bytes
    c = a.decode('utf-8')                    # unicode object
    print chardet.detect(a)["encoding"]
    print chardet.detect(b)["encoding"]
    print chardet.detect(c)["encoding"]      # raises ValueError

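Result:

    utf-8
    ISO-8859-2
    Traceback (most recent call last):
      File "H:/Program/Projects/Itchat/test1/WindowsPathTest.py", line 18, in <module>
        print chardet.detect(c)["encoding"]
      File "D:\PythonAll\Python27\lib\site-packages\chardet\__init__.py", line 25, in detect
        raise ValueError('Expected a bytes object, not a unicode object')
    ValueError: Expected a bytes object, not a unicode object

chardet misidentified the GBK bytes as ISO-8859-2 (Latin-2, a Latin variant extended for Eastern European languages), probably because the sample is so short. Decoding those bytes as GBK still gives the correct result. Detection on the unicode object raised an error because chardet.detect accepts only byte strings: a unicode object (u'xxx') is already decoded, so there is no byte encoding left to detect.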
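One way to avoid handing already-decoded text to a byte-oriented detector is to check the type first. The sketch below assumes Python 3 and uses a hypothetical helper, to_text, that is not part of chardet or the article. It does not call chardet at all: instead of statistical detection it tries a fixed list of candidate encodings in order, which is a simpler technique swapped in for illustration.

```python
def to_text(value, encodings=('utf-8', 'gbk')):
    """Return text for either str or bytes input (hypothetical helper)."""
    if isinstance(value, str):
        # Already decoded text: there is no byte encoding left to detect.
        return value
    for encoding in encodings:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; try the next one.
            continue
    raise ValueError('could not decode bytes with any of: %r' % (encodings,))


from_text = to_text('hello你好')
from_utf8 = to_text('hello你好'.encode('utf-8'))
from_gbk = to_text('hello你好'.encode('gbk'))
```

UTF-8 is tried first because it rejects most byte sequences that were not produced by UTF-8, whereas GBK accepts many arbitrary byte pairs. Trying encodings in order can still pick a wrong one for unusual input, so this is only suitable when the set of possible encodings is known in advance.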

When reposting, please cite the original address: https://ju.6miu.com/read-16948.html
