Source Code for Automatically Retrieving whois Information with a Python Crawler

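This article shows how to retrieve domain lookup (whois) information automatically with Python and provides sample source code for a crawler that does so. We previously looked at querying whois from Python over a socket; this time the code uses the subprocess module together with a Spider class. Readers may want to study the details of whois queries in Python on their own first.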
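
As a rough illustration of the subprocess side, the sketch below runs the system `whois` command with an argument list instead of a shell string, so the domain is never interpreted by a shell. It is written for Python 3 and is not the article's code: the names `is_plausible_domain` and `run_whois`, the validation pattern, and the 30-second timeout are assumptions made for illustration, and it presumes a `whois` executable is available on the PATH.

```python
import re
import subprocess

# Hypothetical, deliberately conservative check (not a full RFC validator):
# dot-separated labels of letters, digits and hyphens, ending in an alphabetic TLD.
_LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?"
_DOMAIN_RE = re.compile(r"(?:%s\.)+[A-Za-z]{2,63}" % _LABEL)


def is_plausible_domain(domain):
    """Return True if `domain` looks like a dotted host name."""
    return len(domain) <= 253 and _DOMAIN_RE.fullmatch(domain) is not None


def run_whois(domain, timeout=30):
    """Run the system `whois` command; return its output as text, or None on failure."""
    if not is_plausible_domain(domain):
        return None
    try:
        # An argument list with no shell means the domain is passed as one literal argument.
        result = subprocess.run(
            ["whois", domain],
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers a missing `whois` executable.
        return None
    return result.stdout.decode("utf-8", errors="replace")
```

Rejecting odd input before the call is a design choice rather than a requirement: the argument list already prevents shell injection, and the check simply avoids sending obviously invalid queries.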

The source code for automatically retrieving whois information with a Python crawler follows:

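#!/usr/bin/python
# -*- coding: utf-8 -*-
# Line 1: path to the Python interpreter. Line 2: source encoding declaration,
# which Python only honors on the first or second line of the file.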


import subprocess
from webauth.common.spider.base import Spider
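# Import the modules used to fetch and parse whois information.
# Spider comes from webauth.common.spider.base, which is not part of the standard library.

# The code follows: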
def get_whois(domain):
    try:
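        # Pass the command as an argument list and skip the shell so that a
        # crafted domain string cannot inject extra shell commands.
        p = subprocess.Popen(["whois", domain], stdout=subprocess.PIPE)
        out = p.communicate()[0]
        return get_info(out)
    except Exception:
        return None, None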
        
def get_info(content):
    """Extract registrant details from raw whois output."""
    url = ''
    templates = [
        {
          'email': ur'''(?P<email>[0-9a-zA-Z_-]*?@[0-9a-zA-Z\.]*)''',
         }
    ]
    spider = Spider(url, templates)
    
    msg = 'success'
    cmp, name, addr, email, phone = None, None, None, None, None

    lines = content.split('\n')

    if content.find('Registrant Contact:') != -1:
        add_start, start = False, False
        index = 0

        for line in lines:
            if not email and start:
                email_info = spider.get_info(line, spider.templates[0]['email'])
                if email_info:
                    email = email_info['email']

            if line.find("Registrant Contact:") != -1:
                start = True
                continue

            if start and index <= 6:
                if index == 0:
                    cmp = line.strip()
                elif index == 1:
                    name = line.strip()
                elif not line.strip():
                    pass
                elif line.strip().startswith('Fax:'):
                    add_start = True
                    addr = ''
                elif add_start:
                    addr += ' %s' % line.strip()
                index += 1

    elif content.find('Registrars.') != -1:

        for line in lines:
            line = line.strip()
            if line.startswith('Organisation Name....'):
                name = line[len('Organisation Name....'):].strip()
                cmp = name

            elif line.startswith("Organisation Address."):
                addr = addr if addr else ''
                addr += " %s" % line[len("Organisation Address."):].strip()

            elif line.startswith("Admin Email.........."):

                email = line[len('Admin Email..........'):].strip()
            elif line.startswith("Admin Phone.........."):
                phone = line[len('Admin Phone..........'):].strip()

    elif content.startswith('Domain Name'):

        for line in lines:
            line = line.strip()

            if line.startswith('Registrant Organization:'):
                cmp = line[len('Registrant Organization:'):].strip()          

            elif line.startswith('Registrant Name:'):
                name = line[len('Registrant Name:'):].strip().decode('utf8')

            elif line.startswith('Administrative Email:'):
                email = line[len('Administrative Email:'):].strip()

    
    elif content.find("Registrant Name") != -1 and content.find("Registrant Organization") != -1 \
         and content.find("Registrant Country") != -1:
        street, city, state, post, country = '', '', '', '', ''

        for line in lines:

            if line.startswith('Registrant Name:'):
                name = line[len('Registrant Name:'):].strip()          

            elif line.startswith('Registrant Organization:'):
                cmp = line[len('Registrant Organization:'):].strip().decode('utf8')

            elif line.startswith('Registrant Phone:'):
                phone = line[len('Registrant Phone:'):].strip()

            elif line.startswith('Registrant Email:'):
                email = line[len('Registrant Email:'):].strip() 

            elif line.startswith('Registrant Street1:'):
                street = line[len('Registrant Street1:'):].strip()

            elif line.startswith('Registrant City:'):
                city = line[len('Registrant City:'):].strip()

            elif line.startswith('Registrant State/Province:'):
                state = line[len('Registrant State/Province:'):].strip()
 
            elif line.startswith('Registrant Postal Code:'):
                post = line[len('Registrant Postal Code:'):].strip()

            elif line.startswith('Registrant Country:'):
                country = line[len('Registrant Country:'):].strip()
        addr = '%s %s %s %s %s' % (street, city, state, post, country)         

    elif content.find('NOT FOUND') != -1 or content.find('no matching record') != -1:
        msg = 'error'     

    else:
        msg = 'exception'
        
    if not email:
        email_info = spider.get_info(content, spider.templates[0]['email'])

        if email_info:
            email = email_info['email']

    if name:
        name = name.replace("()", "")
              
    return msg, {'addr': addr, 'name': name, 'email': email, 'phone': phone, 'raw_data': content}

if __name__ == '__main__':
    import sys

    if len(sys.argv) == 2:
        print(get_whois(sys.argv[1]))
    else:
        print("please pass a domain")
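
Several branches of `get_info` above repeat the same `startswith` and slice pattern for "Key: value" lines. As one possible simplification, the sketch below drives that parsing from a lookup table and uses the standard `re` module for the e-mail fallback in place of the `Spider` helper. It is written for Python 3; `FIELD_LABELS`, `parse_registrant_fields`, `find_first_email`, and the sample text are all hypothetical, and only the label strings are taken from the code above.

```python
import re

# Maps each whois label (as used in the listing above) to a result key.
FIELD_LABELS = {
    "Registrant Name:": "name",
    "Registrant Organization:": "cmp",
    "Registrant Phone:": "phone",
    "Registrant Email:": "email",
}

# Same character classes as the article's pattern, but requiring at least
# one character on each side of "@".
EMAIL_RE = re.compile(r"[0-9a-zA-Z_-]+@[0-9a-zA-Z.]+")


def parse_registrant_fields(content):
    """Collect the first value seen for each known label."""
    result = {key: None for key in FIELD_LABELS.values()}
    for raw_line in content.splitlines():
        line = raw_line.strip()
        for label, key in FIELD_LABELS.items():
            if line.startswith(label) and result[key] is None:
                result[key] = line[len(label):].strip()
                break
    return result


def find_first_email(content):
    """Return the first e-mail-like string in `content`, or None."""
    match = EMAIL_RE.search(content)
    return match.group(0) if match else None


# Made-up sample text for illustration; not real whois output.
sample = """Domain Name: example.com
Registrant Name: Jane Doe
Registrant Organization: Example Ltd
Registrant Phone: +1.5555550100
Admin Email: admin@example.com
"""

fields = parse_registrant_fields(sample)
if not fields["email"]:
    # No "Registrant Email:" line, so fall back to scanning the whole text.
    fields["email"] = find_first_email(sample)
print(fields)
```

Note that this sketch keeps the first value found for each label, whereas the listing above overwrites a field each time its label reappears; which behavior is preferable depends on the registry's output format.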

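That concludes this introduction to the source code for looking up whois information with Python. Readers who enjoyed this article may also like the Python multithreaded scanner tutorial.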

This article is from 玩蛇网. When reposting, please credit the source and include the original URL: http://www.iplaypython.com/crawler/2625.html



Published: 2016-01-31 16:13, 玩蛇网 www.iplaypython.com
