Python Web Scraping, Chapter 3, Part 5: Using XPath to Scrape Company Names and Prices for a Search Keyword on zbj.com

网络安全芝麻鱼

Published: 2023-04-16 00:04:48

Tags: #html #sass #python #data

# Demo description: data parsing with XPath

import requests
from lxml import etree
from fake_useragent import UserAgent

'''
Fields extracted for each search result:
company
title
price
'''

# picType = input('Enter the keyword to scrape: ')
# domain = f'https://www.zbj.com/search/f/?kw={picType}'
domain = 'https://www.zbj.com/search/f/?kw=saas'
ua = UserAgent()
user_agent = ua.random
headers = {
    'user-agent': user_agent
}
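# A timeout keeps the script from hanging if the server does not respond
resp = requests.get(domain, headers=headers, timeout=10)
# Parse the returned HTML into an element tree
html = etree.HTML(resp.text)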
divs = html.xpath('/html/body/div[6]/div/div/div[2]/div[5]/div')
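for element in divs:
    # All paths below are relative to the current result card
    company = element.xpath('./div/div/div/a[1]/div[1]/p/text()')
    # The title text comes back in pieces split around the search keyword
    # (presumably because the page highlights it), so join them with the keyword
    title = 'saas'.join(element.xpath('./div/div/div/a[2]/div[2]/div[2]/p/text()'))
    price = element.xpath('./div/div/div/a[2]/div[2]/div[1]/span[1]/text()')
    print(company, title, price)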
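
The loop above relies on two ideas: relative XPath expressions evaluated once per result card, and joining title fragments with the search keyword. A minimal offline sketch of both follows. It runs against hypothetical, simplified markup: the `SAMPLE` string, the `hl` tag, and the helper names `text_fragments` and `parse_cards` are assumptions for illustration, not zbj.com's real structure. It also swaps `lxml` for the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath, so that it runs without network access or extra packages.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup for two result cards. The real page is
# nested far more deeply and its layout may change at any time.
SAMPLE = """
<div id="results">
  <div class="card">
    <a><p>Acme Software</p></a>
    <a><span>5000</span><p>Custom <hl>saas</hl> platform development</p></a>
  </div>
  <div class="card">
    <a><p>Beta Studio</p></a>
    <a><span>1200</span><p><hl>saas</hl> landing page design</p></a>
  </div>
</div>
"""


def text_fragments(node):
    """Return the direct text pieces around the node's child elements.

    Empty strings are kept for missing pieces, so joining the result with
    the keyword restores every highlighted occurrence.
    """
    parts = [node.text or ""]
    parts.extend(child.tail or "" for child in node)
    return parts


def parse_cards(markup, keyword):
    """Extract company, title and price from each result card."""
    root = ET.fromstring(markup)
    rows = []
    # Each direct child div of the root is one result card
    for card in root.findall("./div"):
        # Paths are relative to the current card, as in the loop above
        company = card.findtext("./a[1]/p", default="").strip()
        price = card.findtext("./a[2]/span", default="").strip()
        title_node = card.find("./a[2]/p")
        if title_node is None:
            title = ""
        else:
            title = keyword.join(text_fragments(title_node)).strip()
        rows.append({"company": company, "title": title, "price": price})
    return rows


rows = parse_cards(SAMPLE, "saas")
for row in rows:
    print(row)
```

Keeping the empty leading fragment matters when a title starts with the keyword: a plain `p/text()` query returns only the text nodes that exist, so joining its result would drop a keyword that sits at the very start or end of the title.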

Source: https://blog.51cto.com/mooreyxia/6002883
