
Simulated Login to cnblogs (SimulatedLogin)

Published: 2016-08-22 21:24:40
Tags: blog
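  • 1. Inspect the data the local browser sends to the cnblogs server during a normal login
  • 2. Simulate the login from the request data captured in step 1
  • 3. Simulated login to cnblogs with Scrapy
  • 4. Reference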

    1. Inspect the data the local browser sends to the cnblogs server during a normal login

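    First open the cnblogs login page and fill in your username and password. Press Ctrl+Shift+I (or F12) to open Chrome's developer tools, then click the login button; the request the browser sent is then shown in the developer tools.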
    [Figure: developer tools open after clicking the login button]

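    Where to find the request details is shown in the figure below.
    [Figure: location of the request details in developer tools]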

    The contents of that request are listed below.

    1.  General
        1.  Remote Address:121.199.251.55:80
        2.  Request URL:http://passport.cnblogs.com/login.aspx?ReturnUrl=http%3a%2f%2fwww.cnblogs.com%2f
        3.  Request Method:POST
        4.  Status Code:302 Found
    2.  Response Headers
        1.  Cache-Control:private
        2.  Connection:keep-alive
        3.  Content-Length:140
        4.  Content-Type:text/html; charset=utf-8
        5.  Date:Sat, 28 Mar 2015 11:14:18 GMT
        6.  Location:http://www.cnblogs.com/
        7.  Set-Cookie:.DottextCookie=8D07D4D6449D629F475F84028369F871661B6C9E8F77305038D6236B5A4E3F33E1803C65D52DAD18CEDE4F4DB0B530179489D11B1F92DA7D78506AAF3570BEC0DA8C283662326F44679A88D01E09F53AA243908301C66E1617CE5B183682D93B5F7B9843AF0945B4CC825AE1A989A536F79D6C434111BF40ADE21D90A2918901BE2AC17F688B210A274DAE79; domain=.cnblogs.com; path=/; HttpOnly
        8.  Set-Cookie:SERVERID=9b2e527de1fc6430919cfb3051ec3e6c|1427541258|1427541244;Path=/
        9.  X-AspNet-Version:4.0.30319
        10. X-Powered-By:ASP.NET
        11. X-UA-Compatible:IE=10
    3.  Request Headers
        1.  Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
        2.  Accept-Encoding:gzip, deflate
        3.  Accept-Language:en-US,en;q=0.8
        4.  Cache-Control:max-age=0
        5.  Connection:keep-alive
        6.  Content-Length:503
        7.  Content-Type:application/x-www-form-urlencoded
        8.  Cookie:__gads=ID=5f799eb5ff8a0d1c:T=1426060996:S=ALNI_MY3SIyB9wH3MOArdyDiV2aA15B-5w; _gat=1; _ga=GA1.2.327332698.1426074473; SERVERID=9b2e527de1fc6430919cfb3051ec3e6c|1427541248|1427541244
        9.  Host:passport.cnblogs.com
        10. Origin:http://passport.cnblogs.com
        11. Referer:http://passport.cnblogs.com/login.aspx?ReturnUrl=http%3A%2F%2Fwww.cnblogs.com%2F
        12. User-Agent:Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36
    4.  Query String Parameters
        1.  ReturnUrl:http://www.cnblogs.com/
    5.  Form Data
        1.  __EVENTTARGET:
        2.  __EVENTARGUMENT:
        3.  __VIEWSTATE:/wEPDwUKLTM1MjEzOTU2MGQYAQUeX19Db250cm9sc1JlcXVpcmVQb3N0QmFja0tleV9fFgEFC2Noa1JlbWVtYmVy4b/ZXiH+8FthXlmKpjSEgi7XBNU=
        4.  __VIEWSTATEGENERATOR:C2EE9ABB
        5.  __EVENTVALIDATION:/wEdAAUIqCk3Gcmu25zI9fQWqoC7hI6Xi65hwcQ8/QoQCF8JIahXufbhIqPmwKf992GTkd0wq1PKp6+/1yNGng6H71Uxop4oRunf14dz2Zt2+QKDEM3sbzJLySdZoy08+/dzW8VF2on0
        6.  tbUserName:golden1314521@gmail.com
        7.  tbPassword:(the real password)
        8.  btnLogin:登 录
        9.  txtReturnUrl:http://www.cnblogs.com/
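    Because Content-Type is application/x-www-form-urlencoded, the Form Data above travels in the POST body as percent-encoded key=value pairs joined by "&", and "Query String Parameters" is simply the decoded query part of the Request URL. The following is a minimal Python 3 sketch (standard library only) that rebuilds a request of this shape without sending it; the form values are placeholders rather than the captured ones, and only a few of the captured headers are copied.

```python
import urllib.parse
import urllib.request

LOGIN_URL = ('http://passport.cnblogs.com/login.aspx'
             '?ReturnUrl=http%3A%2F%2Fwww.cnblogs.com%2F')

# "Query String Parameters" is just the decoded query part of the Request URL
query = urllib.parse.parse_qs(urllib.parse.urlsplit(LOGIN_URL).query)

# Placeholder values; the real __VIEWSTATE / __EVENTVALIDATION come from the login page
form = {
    '__EVENTTARGET': '',
    '__EVENTARGUMENT': '',
    '__VIEWSTATE': 'VIEWSTATE-PLACEHOLDER',
    '__VIEWSTATEGENERATOR': 'C2EE9ABB',
    '__EVENTVALIDATION': 'EVENTVALIDATION-PLACEHOLDER',
    'tbUserName': 'user@example.com',
    'tbPassword': 'password',
    'btnLogin': '登 录',
    'txtReturnUrl': 'http://www.cnblogs.com/',
}

# x-www-form-urlencoded body: percent-encoded key=value pairs joined by '&'
body = urllib.parse.urlencode(form).encode('utf-8')

# Build (but do not send) the request; supplying data makes urllib use POST
request = urllib.request.Request(
    LOGIN_URL,
    data=body,
    headers={
        'Content-Type': 'application/x-www-form-urlencoded',
        'Referer': LOGIN_URL,
        'User-Agent': ('Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
                       '(KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36'),
    },
)
```

    Note that urllib stores header names with only the first letter capitalized, so the header is looked up as request.get_header('Content-type').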
    

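    2. Simulate the login from the request data captured in step 1

    In this step, we write our own program that imitates the local browser and logs in to the server.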

    The Python program is as follows:

    import http.cookiejar
    import urllib.parse
    import urllib.request

    followeespagecontent = ''

    try:
        # Set up cookie handling so the session cookie from the login response is kept
        cookies = urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar())
        opener = urllib.request.build_opener(cookies)
        urllib.request.install_opener(opener)

        # All of the values below were taken from the request captured in the previous step
        parms = {'tbUserName': 'golden1314521@gmail.com', 'tbPassword': 'the-real-password',
                 '__EVENTTARGET': 'btnLogin', '__EVENTARGUMENT': '',
                 '__VIEWSTATE': '/wEPDwUKLTM1MjEzOTU2MGQYAQUeX19Db250cm9sc1JlcXVpcmVQb3N0QmFja0tleV9fFgEFC2Noa1JlbWVtYmVy4b/ZXiH+8FthXlmKpjSEgi7XBNU=',
                 '__EVENTVALIDATION': '/wEdAAUIqCk3Gcmu25zI9fQWqoC7hI6Xi65hwcQ8/QoQCF8JIahXufbhIqPmwKf992GTkd0wq1PKp6+/1yNGng6H71Uxop4oRunf14dz2Zt2+QKDEM3sbzJLySdZoy08+/dzW8VF2on0',
                 'txtReturnUrl': 'http://www.cnblogs.com/'}
        loginUrl = 'http://passport.cnblogs.com/login.aspx?ReturnUrl=http%3A%2F%2Fwww.cnblogs.com%2F'
        # A bytes body makes urlopen send a POST; the 302 redirect is followed automatically
        login = urllib.request.urlopen(loginUrl, urllib.parse.urlencode(parms).encode('utf-8'))

        # Verify the login: if it succeeded, the page listing the user's followees can be read
        followeespage = urllib.request.urlopen('http://home.cnblogs.com/followees/')
        followeespagecontent = followeespage.read().decode('utf-8')

    except Exception as e:
        print('Login failed:', e)

    print(followeespagecontent)

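    In the program above, to check whether the login to cnblogs succeeded, we request the page listing the current user's followees, http://home.cnblogs.com/followees/, which contains the list of users that the logged-in user follows.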

    After a successful login, the program prints the followees page, which looks like this:

    ....
    <div id='main'>
        <div class='avatar_list'>
            <ul>
    
                    <li>
                        <div class='avatar_pic'>
                            <a href='/u/heaad/'><img src='http://pic.cnitblog.com/face/u63234.png' alt='' title='苍梧'></a>
                        </div>
                        <div class='avatar_name'>
                            <a href='/u/heaad/' title='苍梧'>苍梧</a>
                        </div>
    .....
    
          <a href='http://www.cnblogs.com/AboutUS.aspx'>关于博客园</a><a href='http://www.cnblogs.com/SiteMap.aspx'>站点地图</a><a href='http://www.cnblogs.com/ContactUs.aspx'>联系我们</a><a href='http://www.cnblogs.com/ad.aspx'>广告服务</a>&copy; 2004-2015 <a href='http://www.cnblogs.com'>博客园</a>
            </div>
        </div>
    </body>
    </html>
    .......
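    Reading the followees page confirms the login but downloads a whole page. In the capture from step 1, the 302 Found response to the login POST set an HttpOnly cookie named .DottextCookie. Assuming that cookie marks a logged-in session (an inference from the capture, not something the site documents), a lighter check is to look for it in the cookie jar after the POST. The sketch keeps the jar in a variable so it can be inspected; has_login_cookie is a hypothetical helper name.

```python
import http.cookiejar
import urllib.request

# Keep a reference to the jar so it can be inspected after the login request
cookie_jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookie_jar))


def has_login_cookie(jar, name='.DottextCookie', domain_suffix='cnblogs.com'):
    """Return True if the jar holds a cookie with this name for a cnblogs.com domain.

    Assumption: .DottextCookie (seen in the captured 302 response) is the
    authentication cookie; the site may use a different name today.
    """
    return any(c.name == name and c.domain.endswith(domain_suffix) for c in jar)


# Usage, with parms and loginUrl from the program above (needs urllib.parse):
#   opener.open(loginUrl, urllib.parse.urlencode(parms).encode('utf-8'))
#   print('logged in' if has_login_cookie(cookie_jar) else 'login failed')
```

    HTTPCookieProcessor runs before the redirect is followed, so cookies set on the 302 response itself end up in the jar.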

    3. Simulated login to cnblogs with Scrapy

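    How the code runs:
    After the spider starts, Scrapy first calls start_requests, which returns a request for http://passport.cnblogs.com/login.aspx?ReturnUrl=http%3A%2F%2Fwww.cnblogs.com%2F, the cnblogs login page, with post_login as the callback that parses it. post_login builds a new request from the response it receives, passing the browser headers (headers) and the form fields, including the username and password (formdata).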

    from scrapy import Spider
    from scrapy.http import FormRequest, Request


    class CnblogsLoginSpider(Spider):
        # Spider name chosen for this example
        name = 'cnblogs_login'

        # Browser-like headers. 'Content-Type' is left out on purpose: FormRequest only sets
        # application/x-www-form-urlencoded when no Content-Type is supplied, so a 'text/html'
        # value here would be sent with the POST and the server would likely ignore the form.
        headers = {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
            'Accept-Encoding': 'gzip',
            'Accept-Language': 'en-US,en;q=0.8',
            'Connection': 'keep-alive',
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36',
            'Referer': 'http://www.cnblogs.com/'
        }

        def start_requests(self):
            # Fetch the login page first; the 'cookiejar' meta key keeps this session's cookies together
            return [Request('http://passport.cnblogs.com/login.aspx?ReturnUrl=http%3A%2F%2Fwww.cnblogs.com%2F',
                            meta={'cookiejar': 1}, callback=self.post_login)]

        def post_login(self, response):
            self.logger.info('Preparing login')

            # FormRequest.from_response is a Scrapy helper for submitting the form in a response.
            # It copies the form's hidden fields (__VIEWSTATE, __EVENTVALIDATION, ...) from the
            # freshly loaded page and by default simulates a click on the first submit button,
            # so only the credentials are supplied. after_login receives the response to the POST.
            return [FormRequest.from_response(response,
                                              meta={'cookiejar': response.meta['cookiejar']},
                                              headers=self.headers,
                                              formdata={
                                                  'tbUserName': 'golden1314521@gmail.com',
                                                  'tbPassword': '***',
                                              },
                                              callback=self.after_login,
                                              dont_filter=True)]

        def after_login(self, response):
            self.logger.info('Login request finished: %s', response.url)
            # After logging in, visit http://home.cnblogs.com/u/jinliangjiuzhuang/.
            # Reusing the same cookiejar sends the session cookies with this request.
            yield Request('http://home.cnblogs.com/u/jinliangjiuzhuang/',
                          meta={'cookiejar': response.meta['cookiejar']},
                          callback=self.parse)

        def parse(self, response):
            self.logger.info('Fetched %s', response.url)
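    FormRequest.from_response reads the hidden ASP.NET fields (__VIEWSTATE, __EVENTVALIDATION and so on) from the login page it is given, whereas the urllib program in section 2 hard-codes values copied from a single captured request, which the server may stop accepting once the page is regenerated. A rough standard-library counterpart for the urllib version is to download the login page first and collect every hidden input. The sketch below does that with html.parser; HiddenInputParser and hidden_fields are hypothetical names.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect the name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != 'input':
            return
        attributes = dict(attrs)
        if (attributes.get('type') or '').lower() == 'hidden' and attributes.get('name'):
            # A missing or empty value attribute is stored as an empty string
            self.fields[attributes['name']] = attributes.get('value') or ''

    # Self-closing <input ... /> tags go through handle_startendtag,
    # which by default calls handle_starttag, so they are covered too.


def hidden_fields(html):
    """Return a dict of all hidden input fields found in an HTML page."""
    parser = HiddenInputParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

    With the opener from section 2 installed, one would fetch loginUrl, pass the decoded HTML to hidden_fields, add tbUserName and tbPassword to the resulting dict, and post it as before.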

    4. Reference

    1.  Sending HTTP requests from C# to simulate a login (using cnblogs as an example): http://www.cnblogs.com/ListenCode/p/4211776.html
    2.  Python web scraping and simulated login: http://www.cnblogs.com/bboy/archive/2010/10/29/1864537.html
    3.  Python crawler practice: simulated login: http://blog.csdn.net/figo829/article/details/18728381
    4.  Python crawler (part 7): simulated login with Scrapy: http://www.jianshu.com/p/b7f41df6202d