Python Web Scraping: Downloading Wallpapers from Netbian (彼岸壁纸)

Author: sadness安全宇航员

Published: 2023-01-14 22:16:37

Tags: webkit, python, web scraping

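Wallpaper fans often struggle to find high-resolution wallpapers. Searching for them by hand is slow and tedious, and an image whose format or size does not match your screen looks distorted once it is scaled. So how can a Python scraper fetch high-resolution images quickly and efficiently? The script below is worth a try.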

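```python
# coding=utf-8
# Download the wallpapers linked from the netbian.com front page into ./photo/
import os
import re
from urllib.parse import urljoin

import requests

SAVE_DIR = 'photo'
os.makedirs(SAVE_DIR, exist_ok=True)  # create the output folder if it does not exist yet


def safe_name(name):
    """Replace characters that are illegal in Windows/Unix file names."""
    return re.sub(r'[\\/:*?"<>|]', '_', name).strip() or 'untitled'


url = 'http://www.netbian.com'
# Later list pages: http://www.netbian.com/index_2.htm

# Detail pages look like:
# http://www.netbian.com/desk/26344-1920x1080.htm
# http://www.netbian.com/desk/26345-1920x1080.htm
headers = {
    # No 'Host' header: requests sets it from each URL, whereas a fixed
    # 'www.netbian.com' would be wrong for images served from img.netbian.com.
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36',
    'Upgrade-Insecure-Requests': '1',
    # Cookie copied from the author's browser session; it may have expired,
    # so replace it with your own if the site starts rejecting requests.
    'Cookie': '__yjs_duid=1_4535c561a20964f1ade88776981a0f411648389371877; Hm_lvt_0f461eb489c245a31c209d36e41fcc0f=1648389374,1648986956; Hm_lpvt_0f461eb489c245a31c209d36e41fcc0f=1648986956'
}
rsp = requests.get(url, headers=headers, timeout=10)
rsp.raise_for_status()
rsp.encoding = rsp.apparent_encoding  # detect the page's charset instead of assuming UTF-8
# print(rsp.text)

# A thumbnail on the list page looks like:
# <img src="http://img.netbian.com/file/2022/0402/small004425v1bwe1648831465.jpg" alt="lol英雄联盟九尾妖狐 命运之子 阿狸壁纸"/>
# Capture each thumbnail link's href (the detail page) and title;
# \s* also matches when the page omits the space before title=.
url_list = re.findall(r'<a href="(.*?)"\s*title="(.*?)" target="_blank"><img src=".*?" alt=".*?" />', rsp.text)
# print(url_list)

for href, title in url_list:
    new_url = urljoin(url, href)  # works for both relative and absolute hrefs
    print(f'*************** Scraping {title} ***************')

    rsp1 = requests.get(new_url, headers=headers, timeout=10)
    rsp1.encoding = rsp1.apparent_encoding
    # On the detail page, the wallpaper is an <img> wrapped in a link
    img_list = re.findall(r'<a href=".*?" target="_blank"><img src="(.*?)" alt="(.*?)" title=".*?"></a>', rsp1.text)
    # print(img_list)

    for img_url, img_title in img_list:
        content_data = requests.get(urljoin(new_url, img_url), headers=headers, timeout=10).content
        # Sanitize the title so characters such as "/" cannot break the file path
        with open(os.path.join(SAVE_DIR, safe_name(img_title) + '.jpg'), 'wb') as f:
            f.write(content_data)
```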
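The regular expressions in the script depend on the exact spacing and attribute order of the site's HTML, so a small markup change can make them silently match nothing. As an alternative (a swapped-in technique, not the author's method), the sketch below uses Python's standard-library `html.parser` to collect `(href, title)` pairs from `<a>` tags that wrap an `<img>`, and builds list-page URLs following the `index_2.htm` pattern noted in the script's comments. The names `list_page_url`, `ThumbLinkParser` and `BASE_URL` are hypothetical, treating page 1 as the site root is an assumption, and the sample HTML is made up to mirror the snippet in the comments rather than captured from the live site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = 'http://www.netbian.com'


def list_page_url(page):
    """Hypothetical helper: page 1 is assumed to be the site root,
    later pages are assumed to follow the index_N.htm pattern."""
    if page < 1:
        raise ValueError('page numbers start at 1')
    return BASE_URL if page == 1 else f'{BASE_URL}/index_{page}.htm'


class ThumbLinkParser(HTMLParser):
    """Collect (absolute href, title) for every <a> that contains an <img>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._open_link = None  # (href, title) of the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a' and 'href' in attrs:
            self._open_link = (urljoin(self.base_url, attrs['href']), attrs.get('title', ''))
        elif tag == 'img' and self._open_link is not None:
            self.links.append(self._open_link)
            self._open_link = None  # record each link once, even if it holds several <img>

    def handle_endtag(self, tag):
        if tag == 'a':
            self._open_link = None


# Made-up sample markup; note the missing space before title=, which the parser tolerates
sample = (
    '<ul><li><a href="/desk/26344.htm"title="Ahri wallpaper" target="_blank">'
    '<img src="http://img.netbian.com/file/2022/0402/small004425v1bwe1648831465.jpg" alt="Ahri" /></a></li>'
    '<li><a href="/about.htm">About</a></li></ul>'
)
parser = ThumbLinkParser(BASE_URL)
parser.feed(sample)
print(parser.links)      # [('http://www.netbian.com/desk/26344.htm', 'Ahri wallpaper')]
print(list_page_url(2))  # http://www.netbian.com/index_2.htm
```

To use it with the script, you would feed `rsp.text` into a `ThumbLinkParser` instead of calling `re.findall`, and loop over `list_page_url(n)` for as many pages as you want to fetch. The "About" link is skipped because it contains no `<img>`.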

Source: https://blog.51cto.com/u_13488918/5992170
