Coursera: Using Python to Access Web Data (University of Michigan)

xiaoxiao2021-12-14 22

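The course is fairly easy overall. It is part of the Python for Everybody series, so complete beginners can pick it up without trouble. These are my brief notes.

week2 Regular expressions

'^From'       matches a line that starts with 'From'
'.'           matches any single character
'.*'          matches any character, any number of times
\S            matches a non-whitespace character
+             means one or more times, e.g. \S+ is at least one non-whitespace character
re.search()   tells you whether a string matches the regular expression (it returns a match object, or None if nothing matches)
re.findall()  extracts every matching substring and returns them as a list
[0-9]         matches a single digit; a [] set always matches exactly one character
Matching is greedy by default, i.e. it returns the longest string that fits.
?             placed after * or +, turns off greedy matching
( )           mark where extraction starts and ends
[^ ]          matches a single non-space character; inside brackets, ^ means "not"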

Quick Guide

^         Matches the beginning of a line
$         Matches the end of the line
.         Matches any character
\s        Matches whitespace
\S        Matches any non-whitespace character
*         Repeats a character zero or more times
*?        Repeats a character zero or more times (non-greedy)
+         Repeats a character one or more times
+?        Repeats a character one or more times (non-greedy)
[aeiou]   Matches a single character in the listed set
[^XYZ]    Matches a single character not in the listed set
[a-z0-9]  The set of characters can include a range
(         Indicates where string extraction is to start
)         Indicates where string extraction is to end
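To make the rules above concrete, here is a small sketch in Python 3 (the course notes themselves use Python 2). The sample lines are made up for illustration in the style of the mail data used in the course; only the `re` calls come from the notes.

```python
import re

# Made-up sample line, in the style of the course's mailbox data.
line = "From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008"

# re.search() returns a match object (truthy) or None.
starts_with_from = re.search(r"^From", line) is not None

# re.findall() returns every non-overlapping match as a list of strings.
numbers = re.findall(r"[0-9]+", line)

# Parentheses select what is extracted: only the part after '@' is returned.
hosts = re.findall(r"^From \S+@(\S+)", line)

# Greedy vs. non-greedy matching on the same text.
text = "From: Using the : character"
greedy = re.findall(r"^F.+:", text)       # runs to the last ':'
non_greedy = re.findall(r"^F.+?:", text)  # stops at the first ':'

print(starts_with_from, numbers, hosts)
print(greedy, non_greedy)
```

The last two patterns differ only by the `?`, which is enough to change the result from the longest possible match to the shortest one.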

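week3 Sockets

A socket lets the client and the server communicate with each other. A TCP port number identifies which application on the host the connection is for (port 80 is the usual port for a web server). Opening a socket connection in Python: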

    import socket

    # Create a TCP (stream) socket over IPv4 and connect to port 80 (HTTP)
    mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    mysock.connect(('www.py4inf.com', 80))

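The HTTP protocol

A URL has three parts: the protocol, the host name, and the file name.
The request-response cycle: asking for a new page sends a GET request. The client connects to the server, the server returns the page in its response, and the client receives, retrieves, and displays it.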
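As an illustration of how a URL breaks into those parts and how they end up in a GET request, here is a minimal Python 3 sketch. The helper name `build_get_request` and the example.com address are assumptions made for this example and are not from the course material.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Split a URL into protocol, host and file, and build a GET request.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    path = parts.path or "/"  # an empty path means the site's root document
    request = "GET {} HTTP/1.0\r\nHost: {}\r\n\r\n".format(path, parts.hostname)
    return parts.scheme, parts.hostname, path, request


scheme, host, path, request = build_get_request("http://www.example.com/page.html")
print(scheme, host, path)
print(repr(request))
```

The blank line at the end of the request (the final `\r\n\r\n`) is what tells the server that the request headers are finished.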

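    # Send an HTTP GET request over the socket opened above
    mysock.send('GET http://...// HTTP/1.0\n\n')

    while True:
        data = mysock.recv(512)    # read up to 512 bytes at a time
        if len(data) < 1:          # an empty read means the server closed the connection
            break
        print data

    mysock.close()

    # The same retrieval with urllib, which hides the socket and HTTP details
    import urllib

    fhand = urllib.urlopen(url)    # url: address of the page (not given in these notes)
    for line in fhand:
        print line.strip()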
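The receive loop above can be tried without Internet access. The following Python 3 sketch is my own arrangement and not course code: it starts a throwaway HTTP server on the local machine (the `HelloHandler` class and its reply text are made up), then fetches from it with the same connect, send, recv-until-empty pattern. In Python 3 the socket sends and receives bytes, so the request has to be encoded.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Made-up handler that answers every GET with a short text body."""

    def do_GET(self):
        body = b"Hello from the test server\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example's output quiet


def fetch(host, port, path="/"):
    """Send a GET request over a raw socket and return the full response."""
    mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    mysock.connect((host, port))
    mysock.sendall("GET {} HTTP/1.0\r\n\r\n".format(path).encode())
    chunks = []
    while True:
        data = mysock.recv(512)
        if len(data) < 1:  # empty read: the server closed the connection
            break
        chunks.append(data)
    mysock.close()
    return b"".join(chunks)


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

response = fetch("127.0.0.1", port)
server.shutdown()
server.server_close()

# Headers and body are separated by a blank line.
headers, _, body = response.partition(b"\r\n\r\n")
print(headers.decode().splitlines()[0])
print(body.decode())
```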

week4 Web scraping with Beautiful Soup

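    import urllib
    from BeautifulSoup import *

    # url is the address of the page to scan
    html = urllib.urlopen(url).read()
    soup = BeautifulSoup(html)

    # Retrieve all of the anchor tags and print each link target
    tags = soup('a')
    for tag in tags:
        print tag.get('href', None)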
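For readers without Beautiful Soup installed, the same idea (collect the href of every anchor tag) can be sketched with Python 3's standard-library `html.parser`. This swaps in a different parser from the one the course uses, and the `LinkCollector` class and the HTML snippet are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag (None if it has no href)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs
            self.links.append(dict(attrs).get("href"))


# Made-up page content standing in for a downloaded document.
html = """<html><body>
<a href="http://www.example.com/page1.html">First</a>
<a name="no-href">No link</a>
<a href="/page2.html">Second</a>
</body></html>"""

collector = LinkCollector()
collector.feed(html)
print(collector.links)
```

Like `tag.get('href', None)` in the Beautiful Soup version, an anchor without an href yields None instead of raising an error.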

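week5 Web services

Programs written in different languages, such as Python and Java, talk to each other over an agreed wire format. The common intermediate formats are XML and JSON.
Serialization and deserialization convert between an application's internal data and the wire format: application <-> XML.
An XML document can be pictured as a tree.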
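A minimal Python 3 sketch of that round trip, assuming a simple flat record (the `person` dictionary below is made up): the same data is serialized to JSON and to XML, then read back into a dictionary.

```python
import json
import xml.etree.ElementTree as ET

# Internal application data (made-up example record).
person = {"name": "Chuck", "phone": "+1 734 303 4456"}

# Serialize to JSON text and deserialize again.
json_text = json.dumps(person)
from_json = json.loads(json_text)

# Serialize to XML: one child element per key under a <person> root.
root = ET.Element("person")
for key, value in person.items():
    ET.SubElement(root, key).text = value
xml_text = ET.tostring(root, encoding="unicode")

# Deserialize: walk the children of the root element back into a dict.
from_xml = {child.tag: child.text for child in ET.fromstring(xml_text)}

print(json_text)
print(xml_text)
```

The XML version also shows the tree view: `person` is the root node and `name` and `phone` are its children.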

XML Schema is a specification used to validate the format of an XML document. XSD (XML Schema Definition) building blocks: xs:element, xs:sequence, xs:complexType

Parsing XML in Python with xml.etree.ElementTree:

    import xml.etree.ElementTree as ET

    data = '''
    <person>
      <name>Chuck</name>
      <phone type="intl">
        +1 734 303 4456
      </phone>
      <email hide="yes"/>
    </person>'''

    tree = ET.fromstring(data)
    print 'Name:', tree.find('name').text          # text inside <name>
    print 'Attr:', tree.find('email').get('hide')  # value of the hide attribute

This week's quiz code:

    import urllib
    import xml.etree.ElementTree as ET

    address = raw_input('Enter location: ')
    count = 0
    url = address
    print 'Retrieving', url
    uh = urllib.urlopen(url)
    data = uh.read()
    #print 'Retrieved',len(data),'characters'
    #print data

    tree = ET.fromstring(data)
    # Every <comment> element under <comments>
    results = tree.findall('comments/comment')
    #lat = results[0].find('geometry').find('location').find('lat').text

    # Add up the <count> value of each comment
    for result in results:
        count += int(result.find('count').text)
    print count
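The counting step can be checked offline. Below is a Python 3 sketch of the same logic run on a small inline document; the names and counts in `sample` are made up, and only the `comments/comment` path and the `count` element come from the quiz code.

```python
import xml.etree.ElementTree as ET

# Made-up data with the same shape the quiz code expects.
sample = """<commentinfo>
  <comments>
    <comment><name>Romina</name><count>97</count></comment>
    <comment><name>Laurie</name><count>90</count></comment>
    <comment><name>Bayli</name><count>3</count></comment>
  </comments>
</commentinfo>"""


def sum_counts(xml_text):
    """Return the sum of every comments/comment/count value."""
    tree = ET.fromstring(xml_text)
    comments = tree.findall("comments/comment")
    return sum(int(comment.find("count").text) for comment in comments)


total = sum_counts(sample)
print(total)
```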

For the sample code, see xml1.py, xml2.py, and geoxml.py in the code folder.

week6 JSON

Sample code: json1.py, json2.py

Service-oriented architecture: applications use each other's services through APIs.

Quiz 1:

    import json
    import urllib

    url = raw_input('Enter location: ')
    print 'Retrieving', url
    uh = urllib.urlopen(url)
    data = uh.read()
    print 'Retrieved', len(data), 'characters'

    info = json.loads(data)
    print 'User count:', len(info)

    count = 0
    #print info
    # Add up the 'count' field of every entry in the 'comments' list
    for item in info['comments']:
        count += int(item['count'])
        #print item['count']
    print count

Quiz 2:

    import urllib
    import json

    # serviceurl = 'http://maps.googleapis.com/maps/api/geocode/json?'
    serviceurl = 'http://python-data.dr-chuck.net/geojson?'

    while True:
        address = raw_input('Enter location: ')
        if len(address) < 1:
            break

        # urlencode escapes the address so it is safe to put in a URL
        url = serviceurl + urllib.urlencode({'sensor': 'false', 'address': address})
        print 'Retrieving', url
        uh = urllib.urlopen(url)
        data = uh.read()
        print 'Retrieved', len(data), 'characters'

        try:
            js = json.loads(str(data))
        except ValueError:
            js = None    # the response was not valid JSON

        # Check js first: testing 'status' in None would raise a TypeError
        if js is None or 'status' not in js or js['status'] != 'OK':
            print '==== Failure To Retrieve ===='
            print data
            continue

        print json.dumps(js, indent=4)

        lat = js["results"][0]["geometry"]["location"]["lat"]
        lng = js["results"][0]["geometry"]["location"]["lng"]
        print 'lat', lat, 'lng', lng
        location = js['results'][0]['formatted_address']
        print location
        print js['results'][0]['place_id']
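The URL building and response handling from quiz 2 can be sketched in Python 3 without contacting the service. The helper names `build_url` and `extract_location` and every value in `sample` are assumptions made for this example; the service URL, the query parameters, and the JSON keys are the ones used in the quiz code.

```python
import json
from urllib.parse import urlencode

serviceurl = "http://python-data.dr-chuck.net/geojson?"


def build_url(address):
    """Append the URL-encoded query parameters to the service URL."""
    return serviceurl + urlencode({"sensor": "false", "address": address})


def extract_location(text):
    """Return (lat, lng, formatted_address, place_id), or None on failure."""
    try:
        js = json.loads(text)
    except ValueError:
        return None  # not valid JSON
    if not isinstance(js, dict) or js.get("status") != "OK" or not js.get("results"):
        return None  # the service reported a failure or returned no results
    first = js["results"][0]
    location = first["geometry"]["location"]
    return location["lat"], location["lng"], first["formatted_address"], first["place_id"]


# Made-up response with the same shape the quiz code reads.
sample = json.dumps({
    "status": "OK",
    "results": [{
        "geometry": {"location": {"lat": 42.28, "lng": -83.74}},
        "formatted_address": "Ann Arbor, MI, USA",
        "place_id": "hypothetical-place-id",
    }],
})

print(build_url("Ann Arbor, MI"))
print(extract_location(sample))
print(extract_location("not json"))
```

Returning None for every failure case keeps the caller's check to a single test, which avoids looking up 'status' on a value that failed to parse.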

The course code can be downloaded from http://www.pythonlearn.com/code.zip

