0x00 First look
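Fetch the HTML of the Baidu home page:

```python
import urllib2

if __name__ == '__main__':
    url = 'http://www.baidu.com'
    # urlopen returns a file-like response; read() gives the raw HTML
    print urllib2.urlopen(url).read()
```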
0x01 cookie&timeout
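Fetching the Sina Weibo page "weibo.com/u/1000000002" failed. After analyzing the returned HTML and asking a classmate, I guessed that the cause was the missing cookie. I intercepted the request with burpsuite and extracted the cookie: SUBP=0033W……. Alternatively, visit the page in a browser, save the cookie, and extract it from there.

```python
#coding=utf8
import urllib2

if __name__ == '__main__':
    url = 'http://weibo.com/u/1000000002'
    rep = urllib2.Request(url)
    # Send the captured cookie along with the request
    rep.add_header('cookie', 'your cookie')
    print urllib2.urlopen(rep).read()
```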
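For readers on Python 3, where `urllib2` was merged into `urllib.request`, the same idea might look roughly like the sketch below. The helper name `build_request` is made up for illustration and is not part of any library; the cookie value is whatever you captured yourself.

```python
import urllib.request


def build_request(url, cookie):
    """Return a Request for `url` that carries the given Cookie header.

    `build_request` is a hypothetical helper used only in this sketch.
    """
    req = urllib.request.Request(url)
    # add_header stores the header on the Request; nothing is sent yet
    req.add_header('Cookie', cookie)
    return req
```

The request is only sent once it is passed to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_request(url, 'your cookie')).read()`. Keeping the construction separate makes it easy to check the headers before touching the network.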
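When iterating over Weibo pages to collect information, the script would sometimes hang inside urlopen. Passing a timeout solves this: after 10 seconds the call raises an exception instead of blocking forever.

```python
print urllib2.urlopen(rep,timeout=10).read()
```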
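Because a timeout surfaces as an exception, a crawl loop usually wants to catch it and move on to the next page. Below is a minimal Python 3 sketch of that pattern. The function name `fetch` and its `opener` parameter are assumptions made for this example (the parameter exists only so the function can be exercised without network access); they are not taken from the original script.

```python
import socket
import urllib.error
import urllib.request


def fetch(req, timeout=10, opener=urllib.request.urlopen):
    """Return the response body as bytes, or None if the request fails.

    `fetch` and `opener` are hypothetical names used only in this sketch.
    """
    try:
        # timeout is in seconds and bounds blocking socket operations
        with opener(req, timeout=timeout) as resp:
            return resp.read()
    except (socket.timeout, urllib.error.URLError) as exc:
        # Report the failure and let the caller continue with the next URL
        print('skipped %s: %s' % (getattr(req, 'full_url', req), exc))
        return None
```

In a loop over many profile URLs, a `None` result can simply be skipped or queued for a later retry, so one slow page no longer stalls the whole run.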