coursera-使用python访问网络数据-密歇根大学

    xiaoxiao2021-12-14  22

    课程总体比较简单,是python for everyone的系列课程之一,零基础的朋友也好上手。 课程简单随记笔记在此记录一下。 week2 正则表达式 ‘^From’ 以’From’为开头 ‘.’匹配任意字符 ‘.*’任意字符任意次数 \S 非空格字符 + 表示一次或者多次 eg.\S+ 至少一个非空格字符 re.search()返回是否匹配正则表达式 re.findall()提取所有符合的字符串,返回一个列表 [0-9] 匹配一个数字 []一个字符 默认贪婪匹配,即返回最长的符合的字符串 ?代表不进行贪婪匹配 ()提取的起始和结束 [^ ]一个非空格字符,这里^表示非

    Quick Guide ^ Matches the beginning of a line $ Matches the end of the line . Matches any character \s Matches whitespace \S Matches any non-whitespace character * Repeats a character zero or more times *? Repeats a character zero or more times (non-greedy) + Repeats a character one or more times +? Repeats a character one or more times (non-greedy) [aeiou] Matches a single character in the listed set [^XYZ] Matches a single character not in the listed set [a-z0-9] The set of characters can include a range ( Indicates where string extraction is to start ) Indicates where string extraction is to end

    week3 Socket使客户端和服务器端能够进行通信 TCP端口号 python 建立socket连接:

    import socket mysock=socket.socket(socket.AF_INET,socket.SOCK_STREAM) mysock.connect(('www.py4inf.com',80))

    HTTP协议 URL :协议名、主机名、文件名 请求-响应周期: 请求新页面GET请求、连接服务器、响应返回页面 收到、取回、显示

    mysock.send('GET http://...// HTTP/1.0\n\n') while True: data=mysock.recv(512) if(len(data)<-1) break print data mysock.close() fhand=urlib.urlopen() for line in fhand: print line.strip()

    week4 网络爬虫 beautiful soup

    from BeautifulSoup imort * html=urllib.urlopen(url).read() soup=BeautifulSoup(html) tags=soup('a') for tag in tags: print tag.get('href',None)

    week5 webservice python和java通过线路转化协议互相通信 中间格式为xml json 序列化、反序列化 : 应用程序《-》xml xml可以想象成一个树

    XML schema 用以验证XML格式的规范 XSD模式 xs:element xs:sequence xs:complexType

    Python解析xml 用xml.etree.ElementTree

    data = ''' <person> <name>Chuck</name> <phone type="intl"> +1 734 303 4456 </phone> <email hide="yes"/> </person>''' tree = ET.fromstring(data) print 'Name:',tree.find('name').text print 'Attr:',tree.find('email').get('hide')

    本周测验代码如下

    address = raw_input('Enter location: ') count=0 url = address print 'Retrieving', url uh = urllib.urlopen(url) data = uh.read() #print 'Retrieved',len(data),'characters' #print data tree = ET.fromstring(data) results = tree.findall('comments/comment') #lat = results[0].find('geometry').find('location').find('lat').text for result in results: count+=int(result.find('count').text) print count

    实例代码详见code文件夹xml1.py xml2.py geoxml.py

    week6 第六周 json 实例代码json1.py json2.py

    面向对象服务架构 使用API

    测验一:

    import json import urllib url = raw_input('Enter location: ') print 'Retrieving', url uh = urllib.urlopen(url) data = uh.read() print 'Retrieved',len(data),'characters' info = json.loads(data) print 'User count:', len(info) count=0 #print info for item in info['comments']: count+=int(item['count']) #print item['count'] print count

    测验二:

    import urllib import json # serviceurl = 'http://maps.googleapis.com/maps/api/geocode/json?' serviceurl = 'http://python-data.dr-chuck.net/geojson?' while True: address = raw_input('Enter location: ') if len(address) < 1 : break url = serviceurl + urllib.urlencode({'sensor':'false', 'address': address}) print 'Retrieving', url uh = urllib.urlopen(url) data = uh.read() print 'Retrieved',len(data),'characters' try: js = json.loads(str(data)) except: js = None if 'status' not in js or js['status'] != 'OK': print '==== Failure To Retrieve ====' print data continue print json.dumps(js, indent=4) lat = js["results"][0]["geometry"]["location"]["lat"] lng = js["results"][0]["geometry"]["location"]["lng"] print 'lat',lat,'lng',lng location = js['results'][0]['formatted_address'] print location print js['results'][0]['place_id']

    课程的code下载地址http://www.pythonlearn.com/code.zip

    转载请注明原文地址: https://ju.6miu.com/read-963262.html

    最新回复(0)