2.2 Using requests - Python3网络爬虫开发实战(上)

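### 2.2 Using Requests

In Python web-crawler development, `requests` is one of the most widely used HTTP client libraries. Its concise API makes it straightforward to send requests and handle responses. This section covers the basics of `requests`: installation, sending HTTP requests, handling responses and exceptions, and further features such as session (Session) management, cookie handling, and file upload and download.

#### 2.2.1 Installing Requests

Before using `requests`, make sure it is installed in your Python environment. It can be installed with pip: open a command-line tool (such as CMD, Terminal, or PowerShell) and run: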

```bash
pip install requests
```

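Once the installation finishes, you can import and use `requests` in your Python code.

#### 2.2.2 Sending a GET Request

GET is one of the most common HTTP methods and asks the server to return a resource. To send a GET request with `requests`, call `requests.get()` with the target URL as its argument.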

```python
import requests

url = 'http://httpbin.org/get'
response = requests.get(url)

print(response.text)  # print the response body
print(response.status_code)  # print the HTTP status code
```

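In the code above, `response.text` holds the response body decoded to a `str` (using the encoding that `requests` infers from the response headers), and `response.status_code` is the HTTP status code, which tells you whether the request succeeded.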
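Query parameters often accompany GET requests. As a minimal sketch, the snippet below builds a request without sending it, so the URL encoding that `requests` applies to the `params` argument can be inspected offline. The parameter names and values are made up for illustration.

```python
import requests

# Build (but do not send) a GET request to inspect how `params` is encoded.
# The parameter names and values here are illustrative only.
request = requests.Request(
    'GET',
    'http://httpbin.org/get',
    params={'name': 'germey', 'age': 25},
)
prepared = request.prepare()

print(prepared.method)  # GET
print(prepared.url)     # http://httpbin.org/get?name=germey&age=25
```

Passing the same dictionary as `requests.get(url, params=...)` sends the request to that encoded URL.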

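#### 2.2.3 Sending a POST Request

POST requests are typically used to submit data to the server, for example when submitting a form. With `requests.post()`, pass the data through the `data` parameter; a dictionary passed this way is sent as form data.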

```python
import requests

url = 'http://httpbin.org/post'
data = {'key': 'value'}
response = requests.post(url, data=data)

print(response.text)
```

To send JSON instead, use the `json` parameter in place of `data`. `requests` serializes the dictionary to a JSON string and sets the `Content-Type` header to `application/json`.

```python
import requests

url = 'http://httpbin.org/post'
json_data = {'key': 'value'}
response = requests.post(url, json=json_data)

print(response.text)
```
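The difference between `data` and `json` can be checked without any network access. The following sketch prepares both variants of the request above and prints the headers and body that would be sent; it relies only on `requests.Request(...).prepare()`.

```python
import json

import requests

url = 'http://httpbin.org/post'
payload = {'key': 'value'}

# Prepare both variants without sending them, to compare what would go on the wire.
form_request = requests.Request('POST', url, data=payload).prepare()
json_request = requests.Request('POST', url, json=payload).prepare()

print(form_request.headers['Content-Type'])  # application/x-www-form-urlencoded
print(form_request.body)                     # key=value

print(json_request.headers['Content-Type'])  # application/json
print(json.loads(json_request.body))         # {'key': 'value'}
```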

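#### 2.2.4 Handling Response Content

Besides reading the body text from `response.text`, `requests` offers several other ways to work with a response:

- `response.content`: the response body as bytes, typically used for binary data such as images or video.
- `response.json()`: when the body is JSON, parses it into Python objects (usually a dictionary or a list).
- `response.headers`: the response headers as a dictionary-like object with case-insensitive keys.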

```python
import requests

url = 'https://api.github.com/users/github'
response = requests.get(url)

# Parse the JSON response
user_info = response.json()
print(user_info['name'])

# Read and print a response header
print(response.headers['Content-Type'])
```
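To see how `content`, `text`, `json()`, and `headers` relate without touching the network, the sketch below fills in a `requests.Response` by hand. Assigning the private `_content` attribute is a testing shortcut assumed here purely for illustration, not something to do in production code, and the JSON payload is invented.

```python
import requests

# Hand-built response for illustration; real responses come from requests.get().
response = requests.Response()
response.status_code = 200
response.encoding = 'utf-8'
response.headers['Content-Type'] = 'application/json; charset=utf-8'
# Private attribute, used only as a test shortcut; the payload is made up.
response._content = b'{"name": "GitHub", "public_repos": 3}'

print(type(response.content))            # raw body as bytes
print(type(response.text))               # body decoded to str
print(response.json()['name'])           # parsed JSON value
print(response.headers['content-type'])  # header lookup ignores case
```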

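#### 2.2.5 Exception Handling

Network requests inevitably run into problems such as connectivity failures and server errors. `requests` reports these by raising exceptions. The most common ones are:

- `requests.exceptions.RequestException`: the base class of all `requests` exceptions.
- `requests.exceptions.ConnectionError`: a network connection problem.
- `requests.exceptions.HTTPError`: raised by `response.raise_for_status()` when the status code is 4xx or 5xx. `requests` does not raise it on its own for an unsuccessful response.
- `requests.exceptions.Timeout`: the request timed out.
- `requests.exceptions.TooManyRedirects`: the request exceeded the configured maximum number of redirects.

Use a `try-except` statement to catch and handle these exceptions: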

```python
import requests
from requests.exceptions import RequestException

try:
    response = requests.get('http://httpbin.org/status/500', timeout=10)
    response.raise_for_status()  # raises HTTPError for 4xx and 5xx status codes
except RequestException as e:
    print(f"Request failed: {e}")
```
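Which status codes actually trigger `HTTPError` is easy to verify offline. The sketch below assumes a hand-built `requests.Response` with only `status_code` set; the helper name `is_error_status` is hypothetical.

```python
import requests
from requests.exceptions import HTTPError, RequestException


def is_error_status(status_code):
    """Return True if raise_for_status() raises HTTPError for this status code."""
    response = requests.Response()  # hand-built response, no network involved
    response.status_code = status_code
    try:
        response.raise_for_status()
    except HTTPError:
        return True
    return False


for code in (200, 304, 404, 500):
    print(code, is_error_status(code))

# HTTPError derives from RequestException, so a broad handler catches it as well.
print(issubclass(HTTPError, RequestException))  # True
```

Only the 4xx and 5xx codes raise; a 3xx status such as 304 passes through `raise_for_status()` silently.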

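#### 2.2.6 Session Objects

A `Session` object in `requests` persists certain parameters across requests, such as cookies, headers, and authentication credentials. With a `Session` you can simulate one user's session spanning multiple requests.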

```python
import requests

session = requests.Session()

# The first request makes the server set a cookie
session.get('http://httpbin.org/cookies/set/sessioncookie/123456789')

# The second request automatically carries the cookie set by the first
response = session.get('http://httpbin.org/cookies')
print(response.text)
```
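How session state ends up on an outgoing request can also be inspected without sending anything. This sketch uses `Session.prepare_request()` to merge session-level headers and cookies into a request; the `User-Agent` string is a hypothetical value.

```python
import requests

session = requests.Session()
# Defaults set on the session apply to every request made through it.
session.headers.update({'User-Agent': 'my-crawler/0.1'})  # hypothetical UA string
session.cookies.set('sessioncookie', '123456789')

# prepare_request() merges session state into the request without sending it.
prepared = session.prepare_request(
    requests.Request('GET', 'http://httpbin.org/cookies')
)

print(prepared.headers['User-Agent'])  # my-crawler/0.1
print(prepared.headers['Cookie'])      # sessioncookie=123456789
session.close()
```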

#### 2.2.7 Handling Cookies

Besides letting a `Session` object manage cookies automatically, you can set and read cookies manually.

```python
import requests

url = 'http://httpbin.org/cookies'
cookies = {'user': 'foo', 'password': 'bar'}

response = requests.get(url, cookies=cookies)
print(response.text)

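# Iterate over cookies set by the response (this endpoint only echoes the cookies it received, so the loop may print nothing)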
for cookie in response.cookies:
    print(f"{cookie.name}: {cookie.value}")
```
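When a plain dictionary is not precise enough, a `RequestsCookieJar` lets a cookie be scoped to a domain and path. The sketch below, with illustrative values, prepares two requests offline to show that the cookie is attached only where it matches.

```python
import requests

jar = requests.cookies.RequestsCookieJar()
# Restrict the cookie to one host and path (values are illustrative).
jar.set('user', 'foo', domain='httpbin.org', path='/cookies')

matching = requests.Request('GET', 'http://httpbin.org/cookies', cookies=jar).prepare()
other = requests.Request('GET', 'http://httpbin.org/get', cookies=jar).prepare()

print(matching.headers.get('Cookie'))  # user=foo
print(other.headers.get('Cookie'))     # None
```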

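#### 2.2.8 Uploading and Downloading Files

Uploading a file with `requests` is also simple: pass an open file object as a value in the `files` dictionary. To download a file, read the binary body from `response.content` and write it to disk. This works well for files small enough to hold in memory.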

```python
import requests

# Upload a file; the with block closes the file handle afterwards
with open('example.txt', 'rb') as f:
    response = requests.post('http://httpbin.org/post', files={'file': f})

# Download a file
url = 'https://example.com/somefile.zip'
response = requests.get(url)
with open('downloaded_file.zip', 'wb') as f:
    f.write(response.content)
```
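Because `response.content` loads the entire body into memory, large downloads are usually streamed instead. The following is a minimal sketch using `stream=True` and `iter_content()`; the function names `save_stream` and `download`, the chunk size, and the timeout are assumptions chosen for illustration.

```python
import requests


def save_stream(response, path, chunk_size=8192):
    """Write a streamed response body to `path` chunk by chunk; return bytes written."""
    written = 0
    with open(path, 'wb') as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            f.write(chunk)
            written += len(chunk)
    return written


def download(url, path, timeout=10):
    """Download `url` to `path` without holding the whole body in memory."""
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        return save_stream(response, path)
```

A call such as `download('https://example.com/somefile.zip', 'downloaded_file.zip')` would then write the file in 8 KB chunks.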

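#### Conclusion

This section covered the basic use of `requests` in Python web-crawler development: installation, sending GET and POST requests, handling response content, exception handling, session management, cookie handling, and file upload and download. Its feature set and ease of use make `requests` a staple of Python network programming, and these basics are the foundation for building crawlers that fetch data from the web.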
