<p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">每一个幻灯片页面都必须要指定一个幻灯片布局,那么什么是布局呢?幻灯片布局就是指幻灯片的模板,PPT里面把它叫作母版,一般来说一个母版上面会存在一些文本框、形状、图表等空白控件。当使用这个母版添加一张幻灯片的时候,这张幻灯片上面就会出现母版里已有的控件,这样做的好处是节省了添加控件和排版的步骤。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">在python-pptx中,母版就是布局(layout),我们可以访问Presentation对象的slide_layouts属性获取该文档全部布局,它是一个可迭代对象,我们可以通过遍历它取得所有的单个布局,每一个布局都是一个SlideLayout对象,代码如下:</p><pre class="prettyprint linenums prettyprinted" style="box-sizing: border-box; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-alternates: normal; font-kerning: auto; font-optical-sizing: auto; font-feature-settings: normal; font-variation-settings: normal; font-variant-position: normal; font-stretch: normal; font-size: 13.6px; line-height: 1.6; font-family: "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin-top: 0px; margin-bottom: 16px; overflow: auto; color: rgb(47, 111, 159); background-color: rgb(246, 246, 246); border: 1px solid rgb(238, 238, 238); padding: 10px; border-radius: 3px; overflow-wrap: break-word; text-wrap: wrap;">from pptx import Presentationppt = Presentation() print(len(ppt.slide_layouts))# 输出: 11 for layout in ppt.slide_layouts: print(type(layout))# 输出: < class 'pptx.slide.SlideLayout' ></pre><p style="box-sizing: border-box; margin-top: 0px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap; margin-bottom: 0px !important;">我偷偷打印了slide_layouts的长度,发现是11,也就是说默认的PPT文档一共有11种布局,至于这11种布局到底长什么样,我们可以打开PPT软件看看。单击PPT软件的“视图”选项卡,单击“幻灯片母版”按钮即可看到默认的页面母版,如图8-1所示。<br/><img src="https://www.maxiaoke.com/uploads/images/20231205/995a4c5c96fdd30523d888bf2946f590.png" alt=""/><br/>我们晚点再学习关于SlideLayout对象的其他知识点,这里主要是先提前跟布局打个招呼,因为等会儿新建幻灯片的时候需要用到布局。</p><p><br/></p>
文章列表
<p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">每一个PDF页面都有自己的尺寸,在pypdf2中,PageObject对象的mediabox属性存储的是就是该页面的尺寸信息,对应的是一个RectangleObject对象。RectangleObject对象的left、bottom、right、top四个属性分别代表左、下、右、上四边的距离,我们可以把一个PDF页面的左下角当成是坐标原点,而RectangleObject的四个属性分别是四条边到达坐标轴的距离。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">我还是举一个例子看一下吧:</p><pre class="prettyprint linenums prettyprinted" style="box-sizing: border-box; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-alternates: normal; font-kerning: auto; font-optical-sizing: auto; font-feature-settings: normal; font-variation-settings: normal; font-variant-position: normal; font-stretch: normal; font-size: 13.6px; line-height: 1.6; font-family: "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin-top: 0px; margin-bottom: 16px; overflow: auto; color: rgb(47, 111, 159); background-color: rgb(246, 246, 246); border: 1px solid rgb(238, 238, 238); padding: 10px; border-radius: 3px; overflow-wrap: break-word; text-wrap: wrap;">from PyPDF2 import PdfReaderreader = PdfReader( "./pdf_ files/练习文档.pdf") page = reader.getPage(0) print(page.mediabox)# 输出: RectangleObject([0, 0, 612, 792]) print(page.mediabox.left)# 输出: 0 print(page.mediabox.bottom)# 输出: 0 print(page.mediabox.right)# 输出: 612 print(page.mediabox.top)# 输出: 792</pre><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">代码中读取了“练习文档.pdf”的第一页的mediabox属性,该文档是信纸的尺寸,约等于612磅×792磅,我们把它放到坐标图里看一下,如图9-2所示。<br/><img src="https://www.maxiaoke.com/uploads/images/20231206/b47abf662849357b1e2d8c7c7db9286e.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">现在通过坐标图我们可以更容易理解RectangleObject左、下、右、上四个属性所表示的含义了,所以修改这几个属性实际上就是修改页面的尺寸。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">比如说我们要把信纸的尺寸修改为A4纸的尺寸,只要把PageObject对象的mediabox 重新赋值为一个指定四边距离的RectangleObject对象即可,注意这四边距离是一个元组,代码如下:</p><pre class="prettyprint linenums prettyprinted" style="box-sizing: border-box; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-alternates: normal; font-kerning: auto; font-optical-sizing: auto; font-feature-settings: normal; font-variation-settings: normal; font-variant-position: normal; font-stretch: normal; font-size: 13.6px; line-height: 1.6; font-family: "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin-top: 0px; margin-bottom: 16px; overflow: auto; color: rgb(47, 111, 159); background-color: rgb(246, 246, 246); border: 1px solid rgb(238, 238, 238); padding: 10px; border-radius: 3px; overflow-wrap: break-word; text-wrap: wrap;">from PyPDF2 import PdfReader, PdfWriterfrom PyPDF2.generic import RectangleObjectreader = PdfReader( "./pdf_ files/练习文档.pdf") writer = PdfWriter() for page in reader.pages: page.mediabox = RectangleObject((0, 0, 595.28, 841.89)) writer.addPage(page) with open("./pdf_ files/test.pdf", "wb") as f: writer.write(f)</pre><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">如果不想自己实例化RectangleObject对象也是可以的,PageObject对象有提供一些缩放方法,比如说scaleTo()方法可以直接指定宽高,代码如下:</p><pre class="prettyprint linenums prettyprinted" style="box-sizing: border-box; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-alternates: normal; font-kerning: auto; font-optical-sizing: auto; font-feature-settings: normal; font-variation-settings: normal; font-variant-position: normal; font-stretch: normal; font-size: 13.6px; line-height: 1.6; font-family: "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin-top: 0px; margin-bottom: 16px; overflow: auto; color: rgb(47, 111, 159); background-color: rgb(246, 246, 246); border: 1px solid rgb(238, 238, 238); padding: 10px; border-radius: 3px; overflow-wrap: break-word; text-wrap: wrap;">from PyPDF2 import PdfReader, PdfWriterreader = PdfReader( "./pdf_ files/练习文档.pdf") writer = PdfWriter() for page in reader.pages: page.scaleTo(595.28, 841.89) writer.addPage(page) with open("./pdf_ files/test.pdf", "wb") as f: writer.write(f)</pre><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">PageObject对象的scaleBy()方法可以按某个比例因子对页面进行等比缩放,比例因子就是指缩放的倍数,比如说把所有页面的宽高都缩小为原来的50%,代码如下:</p><pre class="prettyprint linenums prettyprinted" style="box-sizing: border-box; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-alternates: normal; font-kerning: auto; font-optical-sizing: auto; font-feature-settings: normal; font-variation-settings: normal; font-variant-position: normal; font-stretch: normal; font-size: 13.6px; line-height: 1.6; font-family: "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin-top: 0px; overflow: auto; color: rgb(47, 111, 159); background-color: rgb(246, 246, 246); border: 1px solid rgb(238, 238, 238); padding: 10px; border-radius: 3px; overflow-wrap: break-word; text-wrap: wrap; margin-bottom: 0px !important;">from PyPDF2 import PdfReader, PdfWriterreader = PdfReader( "./pdf_ files/练习文档.pdf") writer = PdfWriter() for page in reader.pages: page.scaleBy(0.5) writer.addPage(page) with open("./pdf_ files/test.pdf", "wb") as f: writer.write(f)</pre><p><br/></p>
<p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">PageObject对象有一个rotate()方法可以用于对自身进行旋转,旋转角度取值是0、90、270、180等,即90的倍数,正数是顺时针,负数是逆时针。现在有一个需求是要把一个PDF文件的奇数页逆时针旋转90度,偶数页顺时针旋转90度。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">我们可以使用rotate()方法旋转页面,所以这个问题的关键是怎么判断当前页面对应页数的奇偶呢?肯定是需要获取文档的总页数,之后遍历所有页面,把当前页数对2取余,结果为0则为偶数,否则是奇数。不过要注意,如果是使用range()方法生成下标,计算奇偶之前记得加上1,因为range()默认是从0开始的,代码如下:</p><pre class="prettyprint linenums prettyprinted" style="box-sizing: border-box; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-alternates: normal; font-kerning: auto; font-optical-sizing: auto; font-feature-settings: normal; font-variation-settings: normal; font-variant-position: normal; font-stretch: normal; font-size: 13.6px; line-height: 1.6; font-family: "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin-top: 0px; margin-bottom: 16px; overflow: auto; color: rgb(47, 111, 159); background-color: rgb(246, 246, 246); border: 1px solid rgb(238, 238, 238); padding: 10px; border-radius: 3px; overflow-wrap: break-word; text-wrap: wrap;">from PyPDF2 import PdfReader, PdfWriterreader = PdfReader( "./pdf_ files/练习文档.pdf") writer = PdfWriter() for i in range(reader.getNumPages()): if(i + 1) % 2 == 0: page = reader.getPage(i).rotate(90) else :page = reader.getPage(i).rotate(-90) writer.addPage(page) with open("./pdf_ files/test.pdf", "wb") as f: writer.write(f)</pre><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">既然讲到这里了,那就趁机再讲一下怎么知道某个PageObject对象对应的下标。PdfReader对象有一个get_page_number()方法,调用时只要把一个PageObject对象传进去就会返回该PageObject是第几页了,所以上面的旋转需求我们也可以这样写:</p><pre class="prettyprint linenums prettyprinted" style="box-sizing: border-box; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-alternates: normal; font-kerning: auto; font-optical-sizing: auto; font-feature-settings: normal; font-variation-settings: normal; font-variant-position: normal; font-stretch: normal; font-size: 13.6px; line-height: 1.6; font-family: "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin-top: 0px; overflow: auto; color: rgb(47, 111, 159); background-color: rgb(246, 246, 246); border: 1px solid rgb(238, 238, 238); padding: 10px; border-radius: 3px; overflow-wrap: break-word; text-wrap: wrap; margin-bottom: 0px !important;">from PyPDF2 import PdfReader, PdfWriterreader = PdfReader( "./pdf_ files/练习文档.pdf") writer = PdfWriter() for page in reader.pages: page_num = reader.get_page_number(page) page = page.rotate(90) if(page_num + 1) % 2 == 0 else page.rotate(-90) writer.addPage(page) with open("./pdf_ files/test.pdf", "wb") as f: writer.write(f)</pre><p><br/></p>
<p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">严格来说,在Python中,我们并不需要专门去实现线程同步,因为在Python中编写多线程程序时,无论是否手动给Python多线程程序加锁,在多线程程序运行时,都会被GIL所影响,使原本的线程不同步转变为安全的线程同步,不需要开发者额外手动进行处理。在本节中,笔者不会对Python中具体的线程同步实现原理进行介绍,而是对CPython官方实现Python线程同步以保证Python线程安全的整体概念和实现思路进行补充介绍。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">Python线程同步指的是在多线程环境中执行任意一段Python代码(这段代码可以是一个方法,也可以是一个类,还可以是一个接口等),均能返回期望的结果。对于Python多线程环境下的资源竞争过程,笔者画了一张示意图,如图3-1所示。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231206/41deefca3ed0bafd794b841163a30b19.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">假设在Python环境中同时存在线程A、线程B、线程C,且都需要执行同一个任务,假设这个任务的名称为T,这三个线程需要同时执行任务T,以达到多线程执行任务的目的。对于CPU来说,任一线程想要执行具体的任务,都需要CPU进行统一调度;对于线程来说,需要首先获取执行任务所需要的资源,才能开始执行。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">在没有线程同步概念下,线程A、线程B、线程C在获取临界区中的资源时就会随机获取,甚至会出现都获取不到资源的情况,所以在图3-1的输出部分,笔者使用“线程?”的形式来描述在没有线程同步概念下,任务能不能被执行以及是否按照我们所设计的时序来执行都是不确定的。那么,在线程同步概念下,上述任务执行过程将会是什么样的呢?笔者也画了一张图,如图3-2所示。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231206/6255422a576b86881bd67931cb5b0305.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">在线程同步概念下,线程A、线程B、线程C访问临界区中资源的方式不变,只不过在任意时刻只能有一个线程可以获取到临界区中的资源。这里以线程A为例,线程A在获取临界区中的资源之后,就会以标记位的方式将临界区中的相关资源进行占有性标记。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">在标记成功后,线程B、线程C都不能获取该资源。接着,线程A使用该资源继续执行任务,直到任务执行完毕,线程A才会释放资源。在线程A释放资源后,线程B或者线程C才能获取该资源,并执行后续任务。如此反复执行上述过程,直到所有的线程都获取该资源,并且任务均被执行完毕为止。</p><p style="box-sizing: border-box; margin-top: 0px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap; margin-bottom: 0px !important;">这是CPython官方在实现线程同步时使用的主要思路,并且将线程同步和线程安全放在一起实现,所以,读者在阅读本小节内容时,可以结合3.3.1节一起看。</p><p><br/></p>
<p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">Python线程安全性指的是在Python环境中,多个线程可以并发或并行地执行Python开发者分配的任务,并且任务执行完毕后的结果总是正确的。在多线程执行Python任务时,但凡有一次任务没有得到预期的结果,就不能说明我们所设计的Python多线程程序是安全的。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">CPython官方在实现Python线程安全时,将GIL进行了封装,并且将其实现过程封装到Python的内核实现层面,具体的实现源代码文件名称为pycore_gil.h,即CPython官方将GIL的实现过程封装成C语言的头文件。pycore_gil.h文件中的源代码如下。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231206/f0f565bbcbb6ab12b45187290c7b3a54.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">首先来看pycore_gil.h头文件中的定义,Py_INTERNAL_GIL_H是GIL的内核实现定义。在本文件中通过取反的方式进行引入,即先判断Py_INTERNAL_GIL_H没有引入该文件,如果这一判断结果为真,则引入Py_INTERNAL_GIL_H;如果这一判断结果为假,则不再重复引入Py_INTERNAL_GIL_H。__cplusplus是在C语言文件中定义C++拓展程序的语法形式,由以下代码段组成:<br/><img src="https://www.maxiaoke.com/uploads/images/20231206/3a802cb54fad6c50150c55a80f8d6a6a.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">pycore_gil.h头文件中还有几个地方需要注意。首先来看Py_BUILD_CORE,Py_BUILD_CORE是Python解释器识别Python内核文件的标识。如果Python内核文件中没有该标识,编译Python程序时就会提示“this header requires Py_BUILD_CORE define”错误,即只有文件中添加了Py_BUILD_CORE标识,才能被Python解释器识别为内核文件,并在内核编译期进行编译。pycore_atomic.h是Python实现原子性的内核头文件。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">该文件可以实现Python程序全局的原子性,也可以实现具体Python操作的原子性。FORCE_SWITCHING表示线程及线程上下文的切换方式。该切换方式有两种:一种是操作系统CPU内核自动管理和切换线程及线程上下文,该方式也是CPython官方实现线程及线程上下文切换的默认方式;另一种是手动强制切换,我们可以在运行Python程序时,通过向Python解释器配置FORCE_SWITCHING参数来改变Python线程及线程上下文的切换方式为手动强制切换。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">接着来看名为_gil_runtime_state的结构体定义,它是GIL的核心实现和调用的封装。_gil_runtime_state结构体分别由无符号long类型的变量interval、_Py_atomic_address类型的变量last_holder、_Py_atomic_int类型的变量locked、无符号long类型的变量switch_number、PyCOND_T类型的变量cond、PyMUTEX_T类型的变量mutex,以及对FORCE_SWITCHING状态的判断组成。该结构体规定了GIL的运行流程和管理措施。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">变量interval表示GIL的执行时间,单位为μs,且不能是负数。在GIL被触发时,变量interval随即开始工作,直到GIL执行完毕。这个时间间隔就是GIL的执行时间,开发者可以通过该变量获取GIL执行程序所消耗的时间。变量last_holder表示最后持有GIL的标记地址,开发者可以通过该变量来查看最后持有GIL的Python程序。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">变量locked表示Python程序是否被GIL修饰,如果没有被修饰,说明Python程序没有获取到GIL, locked变量值为-1,否则不为-1,而是一个大于0的值。同样,该变量是原子性变量,开发者可以通过该变量的值来判断Python程序有没有获取GIL。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">变量switch_number表示GIL的切换次数,即锁被获取和锁被释放之间的切换次数。该变量不能是负数,只能是大于0的正整数。开发者可以通过该变量的值来查看Python程序中GIL的切换次数,从而判断Python线程是否正常运转。变量cond表示条件的意思,可以理解为影响线程安全性的条件。该变量支持设置允许一个或多个线程等待,直到获取到GIL的线程释放锁了之后,逐个解除等待。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">开发者可以通过查看该变量是否有数据来判断线程是否处于等待状态,如果有数据,表明线程仍然处于等待状态。变量mutex表示Python中互斥锁的实现标记位。Python通过引入该变量来保证在同一时刻只能允许一个线程获取GIL,并且对获取GIL的线程进行互斥保护。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">最后来看_gil_runtime_state结构体的最后部分,代码如下所示。<br/><img src="https://www.maxiaoke.com/uploads/images/20231206/1ad2870bf25e85293f60f6311b5b2984.png" alt=""/><br/>上述代码段表示如果开发者启用了线程和线程上下文手动强制切换方式,那么GIL在运行时就会同步开启手动Python线程的切换和手动Python互斥锁的切换,Python解释器不会再自动管理Python线程的切换和Python互斥锁的切换;如果开发者没有启用手动强制切换方式,GIL在运行时并不会将手动Python线程的切换和手动Python互斥锁的切换进行应用。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">通过上述_gil_runtime_state结构体的定义,我们实现了GIL的整体管理。在编写Python程序时,开发者并不需要了解Python GIL的内部运转过程,只需要清楚在默认情况下Python线程被GIL保护着。</p><p style="box-sizing: border-box; margin-top: 0px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap; margin-bottom: 0px !important;">通过分析上述GIL实现的结构体_gil_runtime_state可知,Python在实现线程安全时,首先为每个访问临界区的Python线程统一添加GIL,然后通过变量interval和变量last_holder的相互配合,来确定每一个线程的执行时间和最大执行时间边界值,如果发现某个线程的执行时间太长,则会自动终止该线程的运行,并通过变量locked来改变多线程的状态。在多线程执行过程中,Python通过变量cond和变量mutex来规定每个线程的执行时机和线程状态切换时机,最后通过变量switch_number在后台静默记录每个线程对应的切换次数,以备不时之需。</p><p><br/></p>
<p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">通过上述对Profile的介绍,我们知道了Profile在大多数情况下被称为Python中的代码分析工具,可统计分析给定Python方法在CPython解释器或虚拟机中占用的内存,以及其他基本参数。那么,到底如何使用Profile来对Python代码进行分析呢?下面笔者会通过一个分析示例来总结使用Profile分析Python代码的步骤。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">为了方便演示Profile对Python代码的分析结果,笔者这里定义了一个简单的Python方法,代码如下:</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231206/e58d50b0aed6c5fdc5c00dad4a097d46.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">上述代码非常简单,笔者在这里就不再介绍了。使用Profile来对上述helloProfile()方法进行分析的过程如下:</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231206/c656f66e3410518acbeac4da2ff3ace2.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">在上述代码中,首先将Profile的一种实现cProfile通过import导入包引入,然后调用cProfile中的run()方法来对helloProfile()方法进行代码分析。cProfile中的run()方法支持传递Python中的方法名称、Python中对应方法的参数,还支持将Python方法的分析结果写入对应的文件。执行上述代码便开启对helloProfile()方法的分析,并直接将分析结果进行打印,如图14-2所示。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231206/ed8122c239356aef6d038f5970237be7.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">上述代码中,第一行“hello profile”是helloProfile()方法的输出内容,和Profile分析内容无关。第二行“2 function calls in 0.000 seconds”是使用Profile的概括性分析内容,表示Profile监听到2个方法调用,并且这一监听过程在0.000秒内完成,这意味着调用不是通过递归引发的。第三行“Ordered by: standard name”表示监听结果内容的输出顺序是按照最右边列中的文本字符串的顺序依次进行输出。第四行开始直到最后一行是Profile对helloProfile()方法的详细分析内容,共包含6列,每一列所代表的的含义如下。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">● ncalls列表示该Python方法被调用的次数。如果该列中有两个数字同时出现,例如3/1,那么这两个数字就表示该Python方法发生了递归调用,具体为这两个数字中的第二个数字是Python方法的原始调用次数,第一个数字是调用的总次数。注意,当Python方法不发生递归调用时,这两个数字总是相同的,并且只打印这两个数字中的一个数字。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">● tottime列表示指定的Python方法执行消耗的总时间,不包括调用子方法的时间,单位为s。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">● percall列表示tottime列除以ncalls列的值。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">● cumtime列表示指定的Python方法及其所有子方法(从调用到退出)消耗的累积时间。这个数字对于递归函数来说是准确的,单位为s。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">● percall列表示cumtime列除以原始调用次数的值,即函数运行一次的平均时间,单位为s。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">● filename:lineno(function)列表示提供相应数据的每个Python方法。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">如果需要将测试结果输出到文件中,那么我们可以像下面这样编写代码:</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231206/a1aad48e4f3f88ccaacbbcb9c2423ec2.png" alt=""/><br/>执行上述代码之后,demo.txt文件就会在当前Python文件所在目录中生成,如图14-3所示。<br/><img src="https://www.maxiaoke.com/uploads/images/20231206/977318aa18c7136e224516a577fa29b6.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">双击打开该demo.txt文件即可查看性能检测数据。这种将性能测试报告输出到文件中查看的形式非常实用,例如,在多部门对代码进行评审或研讨时,可以直接将该性能测试报告进行投屏,还是很方便的。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">在理解了上述分析内容中每一列所代表的含义后,让我们来看一下Profile中的run()方法的定义:</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231206/4dc58f2ad88a5bf2cb71849a3c3caf8c.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">profile中的run()方法接收3个参数,第一个参数command表示传入的Python方法的名称,也可以传入固定的Python代码,这些都是允许的。第二个参数是fileName,即文件名称,如果传递了fileName,那么第一个参数的分析结果就会直接写入该fileName中,开发者需要通过对应的fileName来查看分析结果,且不会在控制台将分析结果进行打印。第三个参数是sort,表示排序方式,默认值为-1,表示采用默认方式进行排序,即采用上述示例代码中的Ordered by:standard name方式对代码的分析结果列进行输出排序。其实,调用上述run()方法之后,本质上是执行的下述exec()方法。exec()方法的官方定义如下:</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231206/7d221dd67b96c4edeceb339ccaccc745.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">可以看出,exec()方法的官方定义与run()方法的官方定义几乎相同,这里不再赘述。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">通过对run()方法打印结果的输出内容进行统计与分析,我们可以判断出开发者所开发的Python程序是否正常运行,比如一个递归方法应该执行多少次递归调用,通过ncalls列即可查看,再比如,开发者在处理业务代码时,如果该业务代码对执行的时间有严格要求,那么就可以通过Tottime列查看该代码所执行的真正时间,从而可以判断出开发者开发的代码有没有按照预期的执行时间执行,如果执行时间超过这一范围,就需要开发者对代码进行一定的优化,之后再次调用run()方法来查看代码的执行时间。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">诸如此类的代码测试还有很多,开发者需要根据实际业务场景和性能指标,结合Profile的run()方法进行监控,以此达到优化Python程序的目的。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">综上所述,使用Profile对Python代码的分析步骤总结如下。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">第一步,定义进行分析的Python方法或代码行,可以是已经开发好的Python方法,也可以是开发好的Python代码,只要有Python方法或代码行即可。理论上来说,该Python方法或代码可以无限大。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">第二步,根据当下的Python运行环境,决定是采用cProfile还是Profile实现第一步中定义的Python方法或代码的性能分析。选用哪种实现的方法很简单,我们可以直接通过import来对cProfile引入,引入完成再调用该代码时,如果控制台报错,即没有cProfile模块或找不到cProfile模块,说明当下的Python环境不支持cProfile,此时就只能使用Profile进行性能优化。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">如果在引入Profile之后,控制台还是报错Profile模块找不到或不存在,说明当下的Python环境不支持Profile,需要升级到较新版本或最新的Python版本。</p><p style="box-sizing: border-box; margin-top: 0px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap; margin-bottom: 0px !important;">第三步,根据第二步选择的Profile,决定调用性能分析的Profile方法,最后根据实际的业务需要结合性能分析结果,对开发者开发的Python方法或代码进行深入分析,并结合分析结果对开发者所开发的Python方法或代码进行整改和优化,以满足业务需要。</p><p><br/></p>
<p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">Profile在Python中被称为监视统计数据组,用来监视Python代码的执行过程。Python中的Profile分为两种类型:一种是CProfile,即基于C语言来操作的Profile;另一种是Profile,即使用Python语言来操作的Profile。CProfile和Profile提供了对Python程序的性能确定性分析。Profile是一组统计数据,描述Python程序各个部分执行的频率和时间。这些统计数据可以通过pstats模块格式化为报表进行查看,如图14-1所示。我们可以通过Profile来分析Python代码,从而了解Python代码中可优化的地方。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231206/21319d6411aab7c2dff2834e603880be.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">性能确定性分析旨在反映这样一个事实:所有函数调用、函数返回和异常事件都被监控,并且对这些事件发生的间隔(在此期间用户编写的代码正在执行)进行精确计时。统计分析(不是由该模块完成)随机采样有效指令指针,并推断时间耗费在哪里。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">统计分析传统上涉及较少的开销(因为代码不需要检测),但只提供了时间花在哪里的相对指示。在Python中,由于在执行程序中总有一个活动的解释器,因此执行确定性分析不需要插入指令的代码。Python自动为每个事件提供一个:dfn:钩子(可选回调)。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">此外,Python的解释特性往往会给执行增加太多开销,以至于在典型的应用程序中,确定性分析往往只会增加很小的处理开销。结果是,确定性分析代价并没有那么高昂,但是提供了有关Python程序执行的大量运行时统计信息。调用计数统计信息可用于识别代码中的错误(意外计数),并识别可能的内联扩展点(高频调用)。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">内部时间统计可用于识别应仔细优化的“热循环”。累积时间统计可用于识别算法选择上的高级别错误。注意,确定性性能分析中对累积时间的异常处理,允许直接比较算法的递归实现与迭代实现的统计信息。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">确定性分析存在一个涉及精度的基本问题。其最明显的限制是,底层时钟周期大约为0.001s(通常)。因此,没有什么测量会比底层时钟更精确。如果进行足够的测量,那么误差将趋于平均。不幸的是,删除第一个错误会导致第二个错误发生。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">第二个问题是,从调度事件到分析器调用获取时间函数的过程实际上是获取时钟状态,这需要一段时间。类似地,从获取时钟值开始,到再次执行用户代码为止,退出分析器事件句柄也存在一定的延时。因此,多次调用单个函数或多个函数通常会累积延时。尽管这种方式的误差通常小于时钟的精度(小于一个时钟周期),但它可以累积并变得非常可观。</p><p style="box-sizing: border-box; margin-top: 0px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap; margin-bottom: 0px !important;">与开销较低的CProfile相比,Profile的问题更为严重。出于这个原因,Profile提供了一种针对指定平台的自我校准方法,以便最大限度(平均地)消除此误差。校准后,结果将更准确(在最小二乘意义上),但它有时会产生负数。但不要对产生的负数惊慌,该情况应该只会在手工校准分析器的情况下出现,实际上结果比没有校准的情况要好。</p><p><br/></p>
<p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px;">要实现Charles对Android抓包,其过程比iOS稍微复杂一点。这是因为不同的Andorid设备,安装证书的入口可能不一样,这就需要根据自己手机的实际情况来寻找。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px;">首先在Charles中选择“Help”-“SSL Proxying”-“Save Charles Root Certificate”命令,将Charles的证书保存到计算机桌面,如图9-35所示。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px;"><img src="https://www.maxiaoke.com/uploads/images/20231210/ceb54ac00cde3a13bb8d548caee8c812.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px;">为了在手机上安装证书,需要先发送证书到手机里面。如果计算机系统为Windows,那么直接插上USB线就可以传送。如果计算机系统是Mac OS,那么可以使用QQ传文件或者微信的文件传输助手把证书文件发送到手机里面。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px;">虽然Android里面安装证书的位置可能不同,但一般都在“系统设置”-“WLAN”里面。打开Wi-Fi功能,界面如图9-36所示。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px;"><img src="https://www.maxiaoke.com/uploads/images/20231210/0ab1367e5d548c62c292381bda10eb9a.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px;">有的Android系统,这个界面的左下角或者右下角可能有3个点,点开以后是一个高级设置的菜单。而另一些Android手机的系统,例如小米手机的MIUI系统,则是直接在最下面就可以找到高级设置,如图9-37所示。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px;"><img src="https://www.maxiaoke.com/uploads/images/20231210/19fda7fb44e7da0adc89ce62378e1883.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px;">选择“高级设置”,打开高级设置界面,继续选择“安装证书”,如图9-38所示。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px;"><img src="https://www.maxiaoke.com/uploads/images/20231210/4b9ab41561bfefcbd933ba24bac82dd5.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px;">系统会打开文件浏览器,在里面找到刚才发送到手机上面的证书文件。找到证书并单击“确定”按钮,此时会弹出一个窗口,提示给证书设定一个名字,如图9-39所示,这个名字可以任意设定。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px;"><img src="https://www.maxiaoke.com/uploads/images/20231210/fac975d321c48c166b8fa6500dcfe986.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px;">单击“确定”按钮,证书就安装好了。接下来和iOS一样,需要为手机设置代理,让手机的流量经过Charles。在系统设置中打开当前连接的Wi-Fi设置界面,并将代理设置为Charles对应的IP和端口号,如图9-40所示。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px;"><img src="https://www.maxiaoke.com/uploads/images/20231210/1d2b5ddd7b838bfd7ab5dd59988a89ce.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px;">和iOS不同的是,Android设置了代理以后需要单击“确定”按钮。代理设置好以后,Android的环境就搭建好了。在手机上任意打开一个App,就可以看到Charles上面有数据在流动,如图9-41所示。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px;"><img src="https://www.maxiaoke.com/uploads/images/20231210/8376059650be21f3956207adfb1e89f6.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px;">这个数据包对应了TapTap这个App的首页数据,TapTap首页如图9-42所示。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 0px !important;"><img src="https://www.maxiaoke.com/uploads/images/20231210/8e6028caba855d6ef30af56efb861722.png" alt=""/></p><p><br/></p>
<p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">对于苹果设备,首先要保证计算机和苹果设备联在同一个Wi-Fi上。选择Charles菜单栏中的“Help”-“Local IP Address”命令,此时弹出一个对话框,显示当前计算机的内网IP地址,如图9-24所示。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231210/4893b545f17a7dd1c17b54078a6e5733.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">接下来设置手机。进入系统设置,选择“无线局域网”,然后单击已经连接的这个Wi-Fi热点右侧的圆圈包围的字母i的图标,如图9-25所示。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231210/c12157c87d619c6eb8df256c8a4d617f.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">选择“HTTP代理”下面的“手动”选项卡,在“服务器”处输入计算机的IP地址,在“端口”处输入8888,如图9-26所示。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231210/b0f71ebf73fbbf2a498640e0a1b2631a.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">输入完成代理以后按下苹果设备的Home键,设置就会自动保存。注意,计算机上立刻就会弹出一个对话框,询问是否允许一台设备通过计算机代理上网,如图9-27所示。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231210/3f6565094de981f22eb6ad42b3a80d26.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">单击“Allow”按钮,允许以后,只能使用iOS系统自带的Safari浏览器访问<a href="https://chls.pro/ssl%E3%80%82" style="box-sizing: border-box; color: rgb(51, 202, 187); text-decoration-line: none; background-image: initial; background-position: initial; background-size: initial; background-repeat: initial; background-attachment: initial; background-origin: initial; background-clip: initial; transition: all 0.3s linear 0s; outline: none !important;">https://chls.pro/ssl。</a> 此时会弹出一个对话框,询问是否显示配置描述文件,如图9-28所示,单击“允许”按钮,打开图9-29所示的界面。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231210/af385a04e317fd2626242b6ab9d6d0f1.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231210/0026dcd879480a7432090372c977bd3e.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">单击右上角的“安装”按钮,弹出另一个对话框,显示描述文件信息,如图9-30所示。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231210/8c4e7e5268bdfdb93671f9d5f03138c3.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">单击“安装”按钮并输入屏锁密码,进行安装即可。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">安装完成证书以后,在设置中打开“关于本机”,找到最下面的“证书信任设置”,并在里面启动对Charles证书的完全信任,如图9-31所示。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231210/7a1e6d946c703cb97d509fcbe9576591.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">这样,一个证书就在iOS设备上安装好了。安装好证书以后,打开iOS设备上的任何一个App,可以看到Charles中有数据包在流动,如图9-32所示。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231210/aaf7dd8060c3b8e57897310adda57a56.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">图9-32中,这个数据包对应的是iOS设备上面的“掘金”App这个技术类App查询文章的请求,界面如图9-33所示。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231210/f2b84a4b7856ebf3e9d8358a76e1953d.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">在Charles中,在这个数据包的“Contents”选项卡下面的请求的“Raw”选项卡中可以看到这个请求的如下相关信息:</p><pre class="prettyprint linenums prettyprinted" style="box-sizing: border-box; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-alternates: normal; font-kerning: auto; font-optical-sizing: auto; font-feature-settings: normal; font-variation-settings: normal; font-variant-position: normal; font-stretch: normal; font-size: 13.6px; line-height: 1.6; font-family: "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin-top: 0px; margin-bottom: 16px; overflow: auto; color: rgb(47, 111, 159); background-color: rgb(246, 246, 246); border: 1px solid rgb(238, 238, 238); padding: 10px; border-radius: 3px; overflow-wrap: break-word; text-wrap: wrap;"> GET /v1/get_entry_by_timeline? before=&category=57be7c18128fe1005fa902de&limit=20&src=ios&type= HTTP/1.1 Host: timeline-merger-ms.juejin.im Accept: */* Cookie: QINGCLOUDELB=47f7a729e0fcb7fdf0b3143c89790b65ab7e48fb3972913efd95d72fe838c4fb|W0Nyw|W0Nyw User-Agent: Xitu/5.3.0 (iPad; iOS 11.4; Scale/2.00) Accept-Language: zh-Hans-CN; q=1 Accept-Encoding: br, gzip, deflate Connection: keep-alive</pre><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">说明:这是一个GET方式的请求,从第3行开始是请求的头。在Python中可以模拟这个请求,如图9-34所示。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231210/99eead457a56c7d95bab574d8c1adc1d.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap; margin-bottom: 0px !important;">通过以上内容可知,要抓取文章信息,根本不需要先在计算机网页上打开掘金网站,再写XPath。在Charles的帮助下开发一个App后台的爬虫,就像开发一个异步加载的爬虫一样简单,而且结果直接就是JSON,转换成字典以后直接就能存到MongoDB里面,极其方便。</p><p><br/></p>
<p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">读者会不会觉得一元线性回归模型比较简单呢?它反映的是单个自变量对因变量的影响,然而实际情况中,影响因变量的自变量往往不止一个,从而需要将一元线性回归模型扩展到多元线性回归模型。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">如果构建多元线性回归模型的数据集包含n个观测、p+1个变量(其中p个自变量和1个因变量),则这些数据可以写成下方的矩阵形式:</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231211/7d650f913c675fe56746e4e8d271b2f7.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">其中,xij代表第个i行的第j个变量值。如果按照一元线性回归模型的逻辑,那么多元线性回归模型应该就是因变量y与自变量X的线性组合,即可以将多元线性回归模型表示成:</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231211/30a52f52bdd790e8b32050ae2b2f991e.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">根据线性代数的知识,可以将上式表示成y=Xβ+ε。其中,β为p×1的一维向量,代表了多元线性回归模型的偏回归系数;ε为n×1的一维向量,代表了模型拟合后每一个样本的误差项。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">7.2.1 回归模型的参数求解</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">在多元线性回归模型所涉及的数据中,因变量y是一维向量,而自变量X为二维矩阵,所以对于参数的求解不像一元线性回归模型那样简单,但求解的思路是完全一致的。为了使读者掌握多元线性回归模型参数的求解过程,这里把详细的推导步骤罗列到下方:</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">第一步:构建目标函数</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231211/d3ef6992d0c1a6c72b497836c4301489.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">根据线性代数的知识,可以将向量的平方和公式转换为向量的内积,接下来需要对该式进行平方项的展现。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">第二步:展开平方项</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231211/22ad32c1e73bd6d74ade8d3c888b4af0.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">由于上式中的y’Xβ和β’X’y均为常数,并且常数的转置就是其本身,因此y’Xβ和β’X’y是相等的。接下来,对目标函数求参数β的偏导数。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">第三步:求偏导</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231211/183ca82318f96c86eb97c1aae2d6812f.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">如上式所示,根据高等数学的知识可知,欲求目标函数的极值,一般都需要对目标函数求导数,再令导函数为0,进而根据等式求得导函数中的参数值。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">第四步:计算偏回归系数的值</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231211/24da6d296a492468b8c44ad582b056a9.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">经过如上四步的推导,最终可以得到偏回归系数β与自变量X、因变量y的数学关系。这个求解过程也被成为“最小二乘法”。基于已知的偏回归系数β就可以构造多元线性回归模型。前文也提到,构建模型的最终目的是为了预测,即根据其他已知的自变量X的值预测未知的因变量y的值。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">7.2.2 回归模型的预测</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">如果已经得知某个多元线性回归模型y=β0+β1x1+β2x2+…+βpxn,当有其他新的自变量值时,就可以将这些值带入如上的公式中,最终得到未知的y值。在Python中,实现线性回归模型的预测可以使用predict“方法”,关于该“方法”的参数含义如下:</p><pre class="prettyprint linenums prettyprinted" style="box-sizing: border-box; font-variant-numeric: normal; font-variant-east-asian: normal; font-variant-alternates: normal; font-kerning: auto; font-optical-sizing: auto; font-feature-settings: normal; font-variation-settings: normal; font-variant-position: normal; font-stretch: normal; font-size: 13.6px; line-height: 1.6; font-family: "YaHei Consolas Hybrid", Consolas, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Helvetica, monospace, monospace; margin-top: 0px; margin-bottom: 16px; overflow: auto; color: rgb(47, 111, 159); background-color: rgb(246, 246, 246); border: 1px solid rgb(238, 238, 238); padding: 10px; border-radius: 3px; overflow-wrap: break-word; text-wrap: wrap;"> predict(exog=None, transform=True)</pre><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">exog:指定用于预测的其他自变量的值。<br/>transform:bool类型参数,预测时是否将原始数据按照模型表达式进行转换,默认为True。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">接下来将基于statsmodels模块对多元线性回归模型的参数进行求解,进而依据其他新的自变量值实现模型的预测功能。这里不妨以某产品的利润数据集为例,该数据集包含5个变量,分别是产品的研发成本、管理成本、市场营销成本、销售市场和销售利润,数据集的部分截图如表7-1所示。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231211/c5f846ac98cfaeb917251abdad6bb648.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">表7-1中数据集中的Profit变量为因变量,其他变量将作为模型的自变量。需要注意的是,数据集中的State变量为字符型的离散变量,是无法直接带入模型进行计算的,所以建模时需要对该变量进行特殊处理。有关产品利润的建模和预测过程如下代码所示:</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231211/deb3cf794defd1b30f07bc149a30784c.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">如上结果所示,得到多元线性回归模型的回归系数及测试集上的预测值,为了比较,将预测值和测试集中的真实Profit值罗列在一起。针对如上代码需要说明三点:</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">为了建模和预测,将数据集拆分为两部分,分别是训练集(占80%)和测试集(占20%),训练集用于建模,测试集用于模型的预测。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">由于数据集中的State变量为非数值的离散变量,故建模时必须将其设置为哑变量的效果,实现方式很简单,将该变量套在C()中,表示将其当作分类(Category)变量处理。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">对于predict“方法”来说,输入的自变量X与建模时的自变量X必须保持结构一致,即变量名和变量类型必须都相同,这就是为什么代码中需要将test数据集的Profit变量删除的原因。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">对于输出的回归系数结果,读者可能会感到疑惑,为什么字符型变量State对应两个回归系数,而且标注了Florida和New York。那是因为字符型变量State含有三种不同的值,分别是California、Florida和New York,在建模时将该变量当作哑变量处理,所以三种不同的值就会衍生出两个变量,分别是State[Florida]和State[New York],而另一个变量State[California]就成了对照组。</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">正如建模中的代码所示,将State变量套在C()中,就表示State变量需要进行哑变量处理。但是这样做会存在一个缺陷,那就是无法指定变量中的某个值作为对照组,正如模型结果中默认将State变量的California值作为对照组(因为该值在三个值中的字母顺序是第一个)。如需解决这个缺陷,就要通过pandas模块中的get_dummies函数生成哑变量,然后将所需的对照组对应的哑变量删除即可。为了使读者明白该解决方案,这里不妨重新建模,并以State变量中的New York值作为对照组,代码如下:</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;"><img src="https://www.maxiaoke.com/uploads/images/20231211/56bb8a9f2dc4f7fb895f7c78462feb35.png" alt=""/></p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">如上结果所示,从离散变量State中衍生出来的哑变量在回归系数的结果里只保留了Florida和California,而New York变量则作为了参照组。以该模型结果为例,得到的模型公式可以表达为:</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">Profit=58068.05+0.80RD_Spend-0.06Administation+0.01Marketing_Spend+1440.86Florida+513.47California</p><p style="box-sizing: border-box; margin-top: 0px; margin-bottom: 16px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap;">虽然模型的回归系数求解出来了,但从统计学的角度该如何解释模型中的每个回归系数呢?下面分别以研发成本RD_Spend变量和哑变量Florida为例,解释这两个变量对模型的作用:在其他变量不变的情况下,研发成本每增加1美元,利润会增加0.80美元;在其他变量不变的情况下,以New York为基准线,如果在Florida销售产品,利润会增加1440.86美元。</p><p style="box-sizing: border-box; margin-top: 0px; color: rgb(77, 82, 89); font-family: "Microsoft YaHei", Helvetica, "Meiryo UI", "Malgun Gothic", "Segoe UI", "Trebuchet MS", Monaco, monospace, Tahoma, STXihei, 华文细黑, STHeiti, "Helvetica Neue", "Droid Sans", "wenquanyi micro hei", FreeSans, Arimo, Arial, SimSun, 宋体, Heiti, 黑体, sans-serif; text-wrap: wrap; margin-bottom: 0px !important;">关于产品利润的多元线性回归模型已经构建完成,但是该模型的好与坏并没有相应的结论,还需要进行模型的显著性检验和回归系数的显著性检验。在下一节,将重点介绍有关线性回归模型中的几点重要假设检验。</p><p><br/></p>