Thoroughly Understanding the Python Standard Library Source (Part 3): The pprint Module

Author: 杰克小麻雀
Original link: https://blog.csdn.net/yushuaigee/article/details/109100139

 

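The pprint module formats data structures attractively, in a form that the interpreter can usually parse back and that is also easy for people to read. It keeps the output on a single line when possible and uses indentation when the output has to be split across several lines.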

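Imagine you have a chunk of JSON like the one below (I generated it randomly) that you need to print, or you are debugging a program and want to check the value of one field in the JSON.

[Image: the sample JSON document]

Printed with print(), it comes out as one big blob with no readability at all: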

{'_id': '5f8808d57ac946ae591e8929', 'index': 0, 'guid': 'b41b3b14-1ae2-4cc4-b443-105bda03f4f0', 'isActive': True, 'balance': '$1,985.20', 'picture': 'http://placehold.it/32x32', 'age': 37, 'eyeColor': 'brown', 'name': 'Alexandra Atkins', 'gender': 'female', 'company': 'COSMETEX', 'email': 'alexandraatkins@cosmetex.com', 'phone': '+1 (999) 588-3661', 'address': '779 Beayer Place, Belvoir, Washington, 5395', 'about': 'Laborum Lorem labore sint excepteur ad do esse veniam sunt cillum. Magna ipsum id aliqua consequat. Commodo enim occaecat pariatur ullamco irure incididunt et incididunt. Dolor aliqua eiusmod id laboris non laborum aliqua sunt occaecat eu commodo elit consequat. In mollit aute ullamco officia exercitation eiusmod ea labore id magna adipisicing.\r\n', 'registered': '2018-12-29T09:52:41 -08:00', 'latitude': 66.079339, 'longitude': 68.156168, 'tags': ['mollit', 'velit', 'do', 'velit', 'Lorem', 'qui', 'irure'], 'friends': [{'id': 0, 'name': 'Latonya Pena'}, {'id': 1, 'name': 'Marion Ayers'}, {'id': 2, 'name': 'Bishop Day'}], 'greeting': 'Hello, Alexandra Atkins! You have 3 unread messages.', 'favoriteFruit': 'banana'}

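When debugging in an IDE, the output may turn into a single line with no end in sight, and you then have to drag the scrollbar along and hunt for the field you want one by one.

[Image: the single-line print() output in the IDE console]

With pprint(), the same data is automatically laid out in a neat, readable format. That is what the pprint library is for:

[Image: the same data printed with pprint()]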
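Since the screenshots are not reproduced here, the contrast can be tried with a minimal sketch. The `record` dict below is a made-up, shortened stand-in for the sample data above, not the original JSON.

```python
from pprint import pformat, pprint

# Hypothetical record, loosely modeled on a few fields of the sample above.
record = {
    "name": "Alexandra Atkins",
    "tags": ["mollit", "velit", "do", "velit", "Lorem", "qui", "irure"],
    "friends": [
        {"id": 0, "name": "Latonya Pena"},
        {"id": 1, "name": "Marion Ayers"},
    ],
}

print(record)   # everything on one long line
pprint(record)  # wrapped so that lines stay within the default width of 80
```

pformat(record) returns the same wrapped text as a string, which is handy when the result should go to a log rather than to the console.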

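The pprint module mainly consists of the functions pprint, pformat, isreadable, isrecursive, saferepr and pp, plus one class, PrettyPrinter. The class is the core: the module-level pprint, pformat and pp functions are thin wrappers around it, while saferepr, isreadable and isrecursive call the module-level helper _safe_repr directly. The line numbers quoted in this article correspond to the real line numbers of pprint.py in Python 3.8.4, the version I am using.

Module-level comments and docstring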

#  Author:      Fred L. Drake, Jr.
# fdrake@acm.org
#
# This is a simple little module I wrote to make life easier. I didn't
# see anything quite like it in the library, though I may have overlooked
# something. I wrote this when I was trying to read some heavily nested
# tuples with fairly non-descriptive content. This is modeled very much
# after Lisp/Scheme - style pretty-printing of lists. If you find it
# useful, thank small children who sleep at night.

"""Support to pretty-print lists, tuples, & dictionaries recursively.

Very simple, but useful, especially in debugging data structures.

Classes
-------

PrettyPrinter()
    Handle pretty-printing operations onto a stream using a configured
    set of formatting parameters.

Functions
---------

pformat()
    Format a Python object into a pretty-printed representation.

pprint()
    Pretty-print a Python object to a stream [default is sys.stdout].

saferepr()
    Generate a 'standard' repr()-like value, but protect against recursive
    data structures.

"""

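The module's author, Fred L. Drake, Jr., is a member of the PythonLabs team. He has been contributing to Python since 1995, and much of the original Python documentation is his work. Here is what he says in the header comment: this is a simple little module I wrote to make life easier. I didn't see anything quite like it in the library, though I may have overlooked something. I wrote this when I was trying to read some heavily nested tuples with fairly non-descriptive content. It is modeled on the way Lisp/Scheme (a family of programming languages) pretty-prints lists. If you find it useful, thank small children who sleep at night. (My guess is that his kids slept soundly and did not interrupt the evenings he spent writing this module, haha.)

The docstring says the module supports pretty-printing lists, tuples and dictionaries recursively; it is very simple but useful, especially when debugging data structures.

  • PrettyPrinter: a class that pretty-prints objects onto a stream according to the formatting parameters it was configured with.
  • pformat(): formats a Python object into a pretty-printed string representation.
  • pprint(): pretty-prints a Python object to a stream. (This is the most commonly used one; in fact, it was the only function from this library I had used before writing this post.)
  • saferepr(): returns the string representation of an object and protects against recursive data structures. What that means is covered in the saferepr analysis below.

Imports and the public interface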

import collections as _collections
import re
import sys as _sys
import types as _types
from io import StringIO as _StringIO

__all__ = ["pprint","pformat","isreadable","isrecursive","saferepr",
"PrettyPrinter", "pp"]

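collections is Python's built-in container module; it provides many useful container classes such as deque and Counter. re is the standard-library module for regular expressions. sys is a built-in module implemented in C that exposes interpreter- and system-related functionality. types defines utility functions that help with dynamically creating new types, along with names for some built-in types; pprint uses types.MappingProxyType from it. io provides Python's main facilities for dealing with various kinds of I/O. There are three main kinds: text I/O, binary I/O and raw I/O; the StringIO imported here is text I/O. The only names pprint exposes publicly through __all__ are pprint, pformat, isreadable, isrecursive, saferepr, PrettyPrinter and pp.

saferepr function: returns the string representation of an object, with protection against infinitely recursive data structures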

def saferepr(object):
    """Version of repr() which can handle recursive data structures."""
    return _safe_repr(object, {}, None, 0, True)[0]

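Lines 65-67 of pprint.py: the saferepr function calls the recursive helper _safe_repr and returns the string representation of the object passed in, much like repr() and str(). The difference is that this function protects against infinitely recursive data structures: if the object contains a recursive reference, that reference is rendered as <Recursion on typename with id=number>.

What is an infinitely recursive data structure? Something like the example below: a list that contains itself. Trying to print it out in full would never terminate. The result in the figure below shows that the built-in repr() and str() also guard against this and render the inner reference as [...], whereas saferepr renders it as the string <Recursion on typename with id=number>, which includes the object's type and id.

[Image: repr(), str() and saferepr() applied to a list that contains itself]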
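The comparison in the figure can be reproduced with a few lines. This is only a small sketch; the variable name `data` is arbitrary, and the id in the marker differs from run to run.

```python
from pprint import saferepr

data = [1, 2]
data.append(data)      # the list now contains itself

print(repr(data))      # [1, 2, [...]]
print(saferepr(data))  # [1, 2, <Recursion on list with id=...>]
```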

isreadable function: reports whether an object is "readable"

def isreadable(object):
    """Determine if saferepr(object) is readable by eval()."""
    return _safe_repr(object, {}, None, 0, True)[1]

Lines 69-71. "Readable" here means that the string produced by saferepr() can be passed to eval() to reconstruct the object's value. This function always returns False for recursive objects.
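A short sketch of what "readable" means in practice. The sample values are arbitrary, and eval() is used here only on text that saferepr() has just produced.

```python
from pprint import isreadable, saferepr

plain = [1, "a", {"k": (2.5, None)}]
print(isreadable(plain))               # True
print(eval(saferepr(plain)) == plain)  # True: the text round-trips

opaque = [object()]                    # repr looks like <object object at 0x...>
print(isreadable(opaque))              # False

loop = []
loop.append(loop)
print(isreadable(loop))                # False for recursive objects
```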

isrecursive function: reports whether an object is an infinitely recursive structure

def isrecursive(object):
    """Determine if object requires a recursive representation."""
    return _safe_repr(object, {}, None, 0, True)[2]

Lines 73-75: determines whether the object passed in is an infinitely recursive structure.
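A minimal sketch with made-up data. It also shows that an object which is merely referenced twice, without forming a cycle, does not count as recursive.

```python
from pprint import isrecursive

tree = {"name": "root"}
tree["self"] = tree                  # the dict refers back to itself

shared = [1]
twice = [shared, shared]             # the same list appears twice, but no cycle

print(isrecursive(tree))             # True
print(isrecursive(twice))            # False
print(isrecursive({"a": [1, 2]}))    # False
```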

PrettyPrinter: the class that pretty-formats Python objects

class PrettyPrinter:
    def __init__(self, indent=1, width=80, depth=None, stream=None, *,
                 compact=False, sort_dicts=True):
        """Handle pretty printing operations onto a stream using a set of
        configured parameters.

        indent
            Number of spaces to indent for each level of nesting.
        width
            Attempted maximum number of columns in the output.
        depth
            The maximum depth to print out nested structures.
        stream
            The desired output stream.  If omitted (or false), the standard
            output stream available at construction will be used.
        compact
            If true, several items will be combined in one line.
        sort_dicts
            If true, dict keys are sorted.
        """
        indent = int(indent)
        width = int(width)
        if indent < 0:
            raise ValueError('indent must be >= 0')
        if depth is not None and depth <= 0:
            raise ValueError('depth must be > 0')
        if not width:
            raise ValueError('width must be != 0')
        self._depth = depth
        self._indent_per_level = indent
        self._width = width
        if stream is not None:
            self._stream = stream
        else:
            self._stream = _sys.stdout
        self._compact = bool(compact)
        self._sort_dicts = sort_dicts

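Lines 104-145: the initializer, which validates and stores the arguments.

  • indent: the number of spaces to indent for each level of nesting, so level 1 is indented by indent characters and level 2 by 2 * indent characters. The default is 1.
  • width: the maximum width the output tries to stay within, in characters. The default is 80.
  • depth: the maximum depth to which nested structures are printed; anything nested deeper is replaced by "...". The default, None, means no limit.
  • stream: the output stream to write to. If omitted (None), the standard output stream sys.stdout is used, the same as print().
  • *: the keyword-only marker; it does not occupy a parameter position. See section 5 of my other post, "Thoroughly Understanding the * in Python 3 Parameter Lists".
  • compact: if True, as many items as fit within width are placed on each line of a long sequence instead of one item per line. The width parameter still applies. The default is False.
  • sort_dicts: whether dictionaries are sorted before printing. If True (the default), entries are sorted by key before printing.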
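The effect of these parameters can be seen in a small sketch. It uses pformat, which accepts the same keyword arguments, so the results can be inspected as strings; the data is made up, and sort_dicts needs Python 3.8 or later.

```python
from pprint import pformat

nested = {"name": "demo", "nested": {"level2": {"level3": [1, 2, 3]}}}
print(pformat(nested, depth=2))
# {'name': 'demo', 'nested': {'level2': {...}}}

numbers = list(range(10))
one_per_line = pformat(numbers, width=20)          # one item per line
packed = pformat(numbers, width=20, compact=True)  # several items per line
print(one_per_line)
print(packed)

print(pformat({"b": 1, "a": 2}))                    # {'a': 2, 'b': 1}
print(pformat({"b": 1, "a": 2}, sort_dicts=False))  # {'b': 1, 'a': 2}
```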

    def pprint(self, object):
        self._format(object, self._stream, 0, 0, {}, 0)
        self._stream.write("\n")

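Lines 147-149: the pprint(self, object) method. The function we use most often, from pprint import pprint, ends up calling this method. It first calls self._format() and then writes a newline to the printer's output stream. There are quite a few layers of function calls here, so we need to go through them one step at a time.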

    def _format(self, object, stream, indent, allowance, context, level):
        objid = id(object)
        if objid in context:
            stream.write(_recursion(object))
            self._recursive = True
            self._readable = False
            return
        rep = self._repr(object, context, level)
        max_width = self._width - indent - allowance
        if len(rep) > max_width:
            p = self._dispatch.get(type(object).__repr__, None)
            if p is not None:
                context[objid] = 1
                p(self, object, stream, indent, allowance, context, level + 1)
                del context[objid]
                return
            elif isinstance(object, dict):
                context[objid] = 1
                self._pprint_dict(object, stream, indent, allowance,
                                  context, level + 1)
                del context[objid]
                return
        stream.write(rep)

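Lines 163-185: the self._format() method, the core recursive method of the PrettyPrinter class.

Its parameters:

  • object: the object to be formatted, that is, the object being printed.
  • stream: the output stream to write to.
  • indent: the number of columns already used for indentation at the current position (not the per-level indent, which is stored in self._indent_per_level).
  • allowance: the number of extra columns to reserve, for example for the closing bracket or comma that follows the object. Together with width and indent it decides when to wrap, because max_width = self._width - indent - allowance.
  • context: a dict keyed by object id, holding the objects that are part of the current presentation context, that is, the containers currently being formatted.
  • level: the current nesting level of the object (the maximum depth is kept in self._depth).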
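The wrapping rule len(rep) > max_width can be checked at the top level, where indent and allowance are both 0 and max_width therefore equals width. A small sketch with an arbitrary two-item list:

```python
from pprint import pformat

items = ["aaa", "bbb"]
single_line = repr(items)          # "['aaa', 'bbb']", 14 characters

# At the top level indent and allowance are both 0, so max_width == width.
print(pformat(items, width=14))    # fits: ['aaa', 'bbb']
print(pformat(items, width=13))    # one column too long, so it wraps:
# ['aaa',
#  'bbb']
```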

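Lines 164-169: get the object's id. If that id is already in context, the object is currently being formatted further up the call chain, so the method writes _recursion(object) and returns. _recursion() is a module-level function (outside the class) at lines 575-577; it returns a marker string containing the object's type name and id: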

def _recursion(object):
    return ("<Recursion on %s with id=%s>"
            % (type(object).__name__, id(object)))

Line 170: self._repr() is defined inside the PrettyPrinter class at lines 403-410, and it in turn calls self.format():

    def _repr(self, object, context, level):
        repr, readable, recursive = self.format(object, context.copy(),
                                                self._depth, level)
        if not readable:
            self._readable = False
        if recursive:
            self._recursive = True
        return repr

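self.format() is at lines 412-417 of the class, and it in turn calls _safe_repr(). According to its docstring, this method formats the object for a specific context and returns a string plus two flags: whether the representation is "readable" and whether the object is a recursive structure. This is the same triple that the saferepr function works with, except that saferepr keeps only the first element.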

    def format(self, object, context, maxlevels, level):
        """Format object for a specific context, returning a string
        and flags indicating whether the representation is 'readable'
        and whether the object represents a recursive construct.
        """
        return _safe_repr(object, context, maxlevels, level, self._sort_dicts)
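To see the triple directly, format() can be called by hand. This is just an exploratory sketch: the empty dict, None and 0 passed below mirror the starting values that saferepr uses, and the sample objects are made up.

```python
from pprint import PrettyPrinter

printer = PrettyPrinter()

# Arguments: object, context (ids being formatted), maxlevels, level.
text, readable, recursive = printer.format({"b": 1, "a": [2]}, {}, None, 0)
print(text, readable, recursive)   # {'a': [2], 'b': 1} True False

loop = []
loop.append(loop)
loop_text, loop_readable, loop_recursive = printer.format(loop, {}, None, 0)
print(loop_readable, loop_recursive)   # False True
```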

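So after the chain of calls behind rep = self._repr(object, context, level), we already have the object's single-line string form. If it fits within max_width it is written as is; otherwise the next step is to lay it out neatly across several lines. self._dispatch is a dictionary of handler functions: each key is the __repr__ method of a supported type, and each value is the method that handles that type. p = self._dispatch.get(type(object).__repr__, None) therefore picks a different handler depending on the object's type.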
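Because the lookup key is type(object).__repr__ rather than the type itself, a subclass that inherits __repr__ is still wrapped, while a subclass that defines its own __repr__ is written out as a single line. The two classes below are hypothetical and exist only to illustrate this.

```python
from pprint import pformat

class PlainList(list):
    """Inherits list.__repr__, so the list handler is still found."""

class TaggedList(list):
    """Defines its own __repr__, so no handler matches."""
    def __repr__(self):
        return "TaggedList(" + list.__repr__(self) + ")"

items = ["alpha", "beta", "gamma"]
print(pformat(PlainList(items), width=10))   # wrapped, one item per line
print(pformat(TaggedList(items), width=10))  # custom repr, left on one line
```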

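Below are the handlers for the individual types. You can see that the output is not produced in one go but written piece by piece through write. For a dict, for example, it first writes "{", then recursively calls _format for each key-value pair (the key is written first, then the value), and finally writes "}". Stepping through it in a debugger makes this easy to see.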
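One way to watch the piece-by-piece output without a debugger is to pass in a stream that records every write call. RecordingStream below is a hypothetical helper written for this sketch; PrettyPrinter only needs the stream to have a write method.

```python
from pprint import PrettyPrinter

class RecordingStream:
    """Minimal stand-in for a file object that remembers each write."""
    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)

stream = RecordingStream()
# A narrow width forces the dict handler to run instead of a single write.
PrettyPrinter(width=10, stream=stream).pprint({"a": 1, "b": 2})

print(stream.chunks)           # a list of small pieces, starting with '{'
print("".join(stream.chunks))  # the assembled output
```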

    _dispatch = {}

    def _pprint_dict(self, object, stream, indent, allowance, context, level):
        write = stream.write
        write('{')
        if self._indent_per_level > 1:
            write((self._indent_per_level - 1) * ' ')
        length = len(object)
        if length:
            if self._sort_dicts:
                items = sorted(object.items(), key=_safe_tuple)
            else:
                items = object.items()
            self._format_dict_items(items, stream, indent, allowance + 1,
                                    context, level)
        write('}')

    _dispatch[dict.__repr__] = _pprint_dict

    def _pprint_ordered_dict(self, object, stream, indent, allowance, context, level):
        if not len(object):
            stream.write(repr(object))
            return
        cls = object.__class__
        stream.write(cls.__name__ + '(')
        self._format(list(object.items()), stream,
                     indent + len(cls.__name__) + 1, allowance + 1,
                     context, level)
        stream.write(')')

    _dispatch[_collections.OrderedDict.__repr__] = _pprint_ordered_dict

    def _pprint_list(self, object, stream, indent, allowance, context, level):
        stream.write('[')
        self._format_items(object, stream, indent, allowance + 1,
                           context, level)
        stream.write(']')

    _dispatch[list.__repr__] = _pprint_list

    def _pprint_tuple(self, object, stream, indent, allowance, context, level):
        stream.write('(')
        endchar = ',)' if len(object) == 1 else ')'
        self._format_items(object, stream, indent, allowance + len(endchar),
                           context, level)
        stream.write(endchar)

    _dispatch[tuple.__repr__] = _pprint_tuple

    def _pprint_set(self, object, stream, indent, allowance, context, level):
        if not len(object):
            stream.write(repr(object))
            return
        typ = object.__class__
        if typ is set:
            stream.write('{')
            endchar = '}'
        else:
            stream.write(typ.__name__ + '({')
            endchar = '})'
            indent += len(typ.__name__) + 1
        object = sorted(object, key=_safe_key)
        self._format_items(object, stream, indent, allowance + len(endchar),
                           context, level)
        stream.write(endchar)

    _dispatch[set.__repr__] = _pprint_set
    _dispatch[frozenset.__repr__] = _pprint_set

    def _pprint_str(self, object, stream, indent, allowance, context, level):
        write = stream.write
        if not len(object):
            write(repr(object))
            return
        chunks = []
        lines = object.splitlines(True)
        if level == 1:
            indent += 1
            allowance += 1
        max_width1 = max_width = self._width - indent
        for i, line in enumerate(lines):
            rep = repr(line)
            if i == len(lines) - 1:
                max_width1 -= allowance
            if len(rep) <= max_width1:
                chunks.append(rep)
            else:
                # A list of alternating (non-space, space) strings
                parts = re.findall(r'\S*\s*', line)
                assert parts
                assert not parts[-1]
                parts.pop()  # drop empty last part
                max_width2 = max_width
                current = ''
                for j, part in enumerate(parts):
                    candidate = current + part
                    if j == len(parts) - 1 and i == len(lines) - 1:
                        max_width2 -= allowance
                    if len(repr(candidate)) > max_width2:
                        if current:
                            chunks.append(repr(current))
                        current = part
                    else:
                        current = candidate
                if current:
                    chunks.append(repr(current))
        if len(chunks) == 1:
            write(rep)
            return
        if level == 1:
            write('(')
        for i, rep in enumerate(chunks):
            if i > 0:
                write('\n' + ' '*indent)
            write(rep)
        if level == 1:
            write(')')

    _dispatch[str.__repr__] = _pprint_str

    def _pprint_bytes(self, object, stream, indent, allowance, context, level):
        write = stream.write
        if len(object) <= 4:
            write(repr(object))
            return
        parens = level == 1
        if parens:
            indent += 1
            allowance += 1
            write('(')
        delim = ''
        for rep in _wrap_bytes_repr(object, self._width - indent, allowance):
            write(delim)
            write(rep)
            if not delim:
                delim = '\n' + ' '*indent
        if parens:
            write(')')

    _dispatch[bytes.__repr__] = _pprint_bytes

    def _pprint_bytearray(self, object, stream, indent, allowance, context, level):
        write = stream.write
        write('bytearray(')
        self._pprint_bytes(bytes(object), stream, indent + 10,
                           allowance + 1, context, level + 1)
        write(')')

    _dispatch[bytearray.__repr__] = _pprint_bytearray

    def _pprint_mappingproxy(self, object, stream, indent, allowance, context, level):
        stream.write('mappingproxy(')
        self._format(object.copy(), stream, indent + 13, allowance + 1,
                     context, level)
        stream.write(')')

    _dispatch[_types.MappingProxyType.__repr__] = _pprint_mappingproxy

#......


    def _pprint_default_dict(self, object, stream, indent, allowance, context, level):
        if not len(object):
            stream.write(repr(object))
            return
        rdf = self._repr(object.default_factory, context, level)
        cls = object.__class__
        indent += len(cls.__name__) + 1
        stream.write('%s(%s,\n%s' % (cls.__name__, rdf, ' ' * indent))
        self._pprint_dict(object, stream, indent, allowance + 1, context, level)
        stream.write(')')

    _dispatch[_collections.defaultdict.__repr__] = _pprint_default_dict

    def _pprint_counter(self, object, stream, indent, allowance, context, level):
        if not len(object):
            stream.write(repr(object))
            return
        cls = object.__class__
        stream.write(cls.__name__ + '({')
        if self._indent_per_level > 1:
            stream.write((self._indent_per_level - 1) * ' ')
        items = object.most_common()
        self._format_dict_items(items, stream,
                                indent + len(cls.__name__) + 1, allowance + 2,
                                context, level)
        stream.write('})')

    _dispatch[_collections.Counter.__repr__] = _pprint_counter

    def _pprint_chain_map(self, object, stream, indent, allowance, context, level):
        if not len(object.maps):
            stream.write(repr(object))
            return
        cls = object.__class__
        stream.write(cls.__name__ + '(')
        indent += len(cls.__name__) + 1
        for i, m in enumerate(object.maps):
            if i == len(object.maps) - 1:
                self._format(m, stream, indent, allowance + 1, context, level)
                stream.write(')')
            else:
                self._format(m, stream, indent, 1, context, level)
                stream.write(',\n' + ' ' * indent)

    _dispatch[_collections.ChainMap.__repr__] = _pprint_chain_map

    def _pprint_deque(self, object, stream, indent, allowance, context, level):
        if not len(object):
            stream.write(repr(object))
            return
        cls = object.__class__
        stream.write(cls.__name__ + '(')
        indent += len(cls.__name__) + 1
        stream.write('[')
        if object.maxlen is None:
            self._format_items(object, stream, indent, allowance + 2,
                               context, level)
            stream.write('])')
        else:
            self._format_items(object, stream, indent, 2,
                               context, level)
            rml = self._repr(object.maxlen, context, level)
            stream.write('],\n%smaxlen=%s)' % (' ' * indent, rml))

    _dispatch[_collections.deque.__repr__] = _pprint_deque

    def _pprint_user_dict(self, object, stream, indent, allowance, context, level):
        self._format(object.data, stream, indent, allowance, context, level - 1)

    _dispatch[_collections.UserDict.__repr__] = _pprint_user_dict

    def _pprint_user_list(self, object, stream, indent, allowance, context, level):
        self._format(object.data, stream, indent, allowance, context, level - 1)

    _dispatch[_collections.UserList.__repr__] = _pprint_user_list

    def _pprint_user_string(self, object, stream, indent, allowance, context, level):
        self._format(object.data, stream, indent, allowance, context, level - 1)

    _dispatch[_collections.UserString.__repr__] = _pprint_user_string
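Two of these handlers are easy to try out from the outside. The sketch below uses made-up inputs and a deliberately narrow width: a long top-level string is split into chunks wrapped in parentheses (the level == 1 branch of _pprint_str), and a bounded deque gets its maxlen on a separate last line (the else branch of _pprint_deque).

```python
from collections import deque
from pprint import pformat

sentence = "the quick brown fox jumps over the lazy dog " * 3
wrapped = pformat(sentence, width=40)
print(wrapped)
# Adjacent string literals inside parentheses concatenate again,
# so evaluating the wrapped text gives back the original string.
print(eval(wrapped) == sentence)   # True

queue = deque(["alpha", "beta", "gamma"], maxlen=5)
queue_text = pformat(queue, width=20)
print(queue_text)                  # ends with a separate "maxlen=5)" line
```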

pprint: printing more conveniently and more attractively

def pprint(object, stream=None, indent=1, width=80, depth=None, *,
           compact=False, sort_dicts=True):
    """Pretty-print a Python object to a stream [default is sys.stdout]."""
    printer = PrettyPrinter(
        stream=stream, indent=indent, width=width, depth=depth,
        compact=compact, sort_dicts=sort_dicts)
    printer.pprint(object)

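Once the PrettyPrinter class and its pprint method are clear, the rest is easy to follow. This module-level pprint function instantiates a PrettyPrinter object and calls that object's pprint method. So in everyday use we only need from pprint import pprint and then pprint(xxx), without having to create an object every time before calling its pprint method.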

pformat: the formatted-string builder

def pformat(object, indent=1, width=80, depth=None, *,
            compact=False, sort_dicts=True):
    """Format a Python object into a pretty-printed representation."""
    return PrettyPrinter(indent=indent, width=width, depth=depth,
                         compact=compact, sort_dicts=sort_dicts).pformat(object)

The PrettyPrinter.pformat method that it calls:

    def pformat(self, object):
        sio = _StringIO()
        self._format(object, sio, 0, 0, {}, 0)
        return sio.getvalue()

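This one also exists for convenience: it instantiates a PrettyPrinter object and returns the result of calling that object's pformat method. pprint writes the formatted string out directly, whereas pformat builds the string without printing it. In other words, pprint(test_data) and print(pformat(test_data)) are equivalent.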
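The equivalence can be checked by sending both outputs to in-memory streams instead of the console. A minimal sketch; test_data here is an arbitrary stand-in.

```python
import io
from pprint import pformat, pprint

test_data = {"numbers": list(range(30)), "label": "x" * 40}

via_pprint = io.StringIO()
pprint(test_data, stream=via_pprint)         # pprint writes to the given stream

via_pformat = io.StringIO()
print(pformat(test_data), file=via_pformat)  # print adds the trailing newline

print(via_pprint.getvalue() == via_pformat.getvalue())   # True
```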

pp: a shorthand for pprint

def pp(object, *args, sort_dicts=False, **kwargs):
    """Pretty-print a Python object"""
    pprint(object, *args, sort_dicts=sort_dicts, **kwargs)

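So it turns out to be almost the same as pprint. The one difference is that sort_dicts defaults to False here, so dict keys keep their insertion order instead of being sorted. Looks like from now on I only need from pprint import pp. Yay, reading the source was worth it.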
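The difference in the sort_dicts default is easy to observe. A small sketch with a made-up dict, writing to in-memory streams so the two results can be compared (pp requires Python 3.8 or later):

```python
import io
from pprint import pp, pprint

data = {"b": 1, "a": 2}

sorted_out = io.StringIO()
pprint(data, stream=sorted_out)   # sort_dicts defaults to True
insertion_out = io.StringIO()
pp(data, stream=insertion_out)    # sort_dicts defaults to False

print(sorted_out.getvalue())      # {'a': 2, 'b': 1}
print(insertion_out.getvalue())   # {'b': 1, 'a': 2}
```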

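  • Copyright notice: unless otherwise stated, the copyright of all articles on this blog belongs to the author. Please credit the source when reposting!
  • Copyrights © 2020-2021 杰克小麻雀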