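Python Language Essentials

1. Basics
2. Control flow
3. Functions
4. Classes
5. Modules
- 5.1. Defining a module
- 5.2. Package layout
6. Built-in objects
7. Common tools
8. Community standards: PEPs
9. Reference links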

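1 Basics

1.1 Help and introspection

Start the interactive help system: help()
Get help on a function: help(str.replace)
Get help on a module (after importing it): help(re)
List the names defined in a module: dir(re)
Check the type of a value: type(var)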

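1.2 Environment variables for the Python interpreter

1.2.1 PYTHONDONTWRITEBYTECODE: whether to write bytecode files

Stops the interpreter from writing compiled bytecode files such as *.pyc (and *.pyo before Python 3.5), which Python 3 stores in __pycache__ directories.

export PYTHONDONTWRITEBYTECODE=1

A one-off command to delete bytecode that already exists:

find . -type f -name "*.py[co]" -delete -or -type d -name "__pycache__" -delete

1.2.2 PYTHONPATH: the module search path

PYTHONPATH adds directories to the module search path. The resulting path can be inspected through sys.path: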

>>> import sys
>>> sys.path
['d:\\Python36\\DLLs', 'd:\\Python36\\lib', 'd:\\Python36', ...]
>>>

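1.3 Built-in basic types

Python is dynamically typed (and strongly typed: values are not converted implicitly between unrelated types). The basic types you meet most often are these six:

int/long : in Python 2, an int that grows too large is promoted automatically to long; Python 3 has a single arbitrary-precision int
float : 64-bit; Python has no separate double type
bool
str : a byte string (ASCII by default) in Python 2, a Unicode string in Python 3
- String literals can be written with single, double, or triple quotes
- A string is a sequence of characters, so it supports the usual sequence operations
- Special characters are written with \ escapes; the r prefix turns escapes off, str1 = r'this\f?ff'
NoneType (None) : Python's "null" value (there is exactly one None object)
- None is the sole instance of NoneType; in Python 3 it is also a keyword and cannot be reassigned
- None is the usual default for optional function parameters, def func1(a, b, c=None)
- Test for None with is, if variable is None :

datetime : the standard-library datetime module provides the datetime, date, and time types

from datetime import datetime

# Build a datetime from a string
dt1 = datetime.strptime('20091031', '%Y%m%d')
# Get the date part
dt1.date()
# Get the time part
dt1.time()
# Format a datetime as a string
dt1.strftime('%m/%d/%Y %H:%M')
# Replace individual fields
dt2 = dt1.replace(minute=0, second=30)
# Subtract; diff is a datetime.timedelta object
diff = dt1 - dt2

Note:

str, bool, int, and float also serve as explicit type-conversion functions
Numbers, strings, tuples, and frozensets are immutable; most other objects, such as lists, dicts, sets, and instances of user-defined classes, are mutable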

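1.4 Comprehensions

Comprehensions are syntactic sugar for building a new list, set, or dict concisely.

1.4.1 List comprehension

[expr for val in collection if condition]

1.4.2 Dict comprehension

{key_expr: value_expr for value in collection if condition}

1.4.3 Set comprehension

{expr for val in collection if condition}

The same expression in parentheses, (expr for val in collection if condition), is a generator expression, not a set.

1.4.4 Nested list comprehension

[expr for val in collection for innerVal in val if condition]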
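
The forms above can be tried on a small sample. The following is a minimal sketch; the data and variable names are made up for illustration. Note that braces build a set, while parentheses build a lazy generator expression.

```python
# Hypothetical sample data for illustration
words = ['apple', 'kiwi', 'banana', 'fig', 'kiwi']

# List comprehension: lengths of the words longer than three letters
lengths = [len(w) for w in words if len(w) > 3]

# Dict comprehension: map each word to its length
length_by_word = {w: len(w) for w in words}

# Set comprehension: braces, duplicates are dropped
unique_words = {w for w in words}

# Generator expression: parentheses, values are produced lazily
total_letters = sum(len(w) for w in words)

# Nested comprehension: flatten a list of lists
matrix = [[1, 2], [3, 4], [5, 6]]
flat = [n for row in matrix for n in row]
```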

2 Control flow

2.1 if statements

num = 0
if num > 0:
  print('num is positive')
elif num < 0:
  print('num is negative')
else:
  print('num is zero')

2.2 while loops

x = 1
while x <= 100:
  x += 1

2.3 for loops

# for loop
words = ['this', 'is', 'an', 'ex', 'parrot']
for w in words:
  pass

names = ['anne', 'beth', 'google']
ages = [12, 33, 81]
list(zip(names, ages)) #>>> [('anne', 12), ('beth', 33), ('google', 81)]
for name, age in zip(names, ages):
  pass

# Number the items, starting from 1
for i, v in enumerate(names, start=1):
  pass

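2.4 Exceptions: `try/except`

Basic form

try:
    pass
except ValueError as e:
    print(e)
except (TypeError, KeyError):
    pass
except Exception:
    pass  # catch-all; a bare except: would also swallow SystemExit and KeyboardInterrupt
finally:
    pass  # cleanup, e.g. closing a database connection

Raising exceptions manually

raise AssertionError  # an assertion failed
raise SystemExit  # ask the program to exit
raise RuntimeError('error message: ...')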

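2.5 Evaluating strings: eval and exec

eval evaluates a single expression and returns its value; exec runs statements. Both execute arbitrary code, so never pass them untrusted input.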

>>> nums = range(10)
>>> expr = '+'.join([str(n) for n in nums])
>>> expr
'0+1+2+3+4+5+6+7+8+9'
>>> eval(expr)
45
>>> exec("print('hello world')")
hello world
>>>

3 Functions

3.1 Defining functions

Functions are defined with the def keyword, for example:

def myfunc(arg):
  print(arg)

def fib_lessthan(n):
  ans = []
  a, b = 0, 1
  while a < n:
    ans.append(a)
    a, b = b, a+b
  return ans

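3.2 Passing arguments

3.2.1 Default arguments

Writing name=value in the parameter list gives that parameter a default value. Parameters with defaults must come after those without.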

>>> def increase(n, step=1):
...   return n + step
...
>>> increase(2)
3
>>> increase(1, 5)
6
>>> increase(1, step=10)
11
>>>

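Note: a default value is evaluated only once, when the function is defined. In the example below, L is created a single time, and every call that omits L appends to that same list.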

>>> def f(a, L=[]):
...   L.append(a)
...   return L
...
>>> f(1)
[1]
>>> f(2)
[1, 2]
>>> f(3)
[1, 2, 3]
>>>
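
A common way around the shared default is to use None as the default and create the list inside the function. A minimal sketch (append_item is a hypothetical name chosen for this example):

```python
def append_item(a, items=None):
    # Create a fresh list on each call when the caller does not pass one
    if items is None:
        items = []
    items.append(a)
    return items

first = append_item(1)
second = append_item(2)   # does not see the 1 from the previous call
```

Each call without an explicit list now starts from an empty one, while a caller can still pass a list to be extended in place.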

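3.2.2 Variable-length arguments

Python collects extra arguments into a tuple or a dict. A parameter prefixed with * receives the surplus positional arguments as a tuple; a parameter prefixed with ** receives the surplus keyword arguments as a dict.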

>>> def print_param(*params):
...   print(params)
...
>>> print_param('aa')
('aa',)
>>> print_param('aa', 'bb')
('aa', 'bb')
>>>
>>> def print_param2(**params):
...   print(params)
...
>>> print_param2(x=1, y=2)
{'x': 1, 'y': 2}
>>>
>>> def print_param3(x, y, *args, **kargs):
...   print(x)
...   print(y)
...   print(args)
...   print(kargs)
...
>>> print_param3(1, 2, 3, 4, 5, p='3', k='d')
1
2
(3, 4, 5)
{'p': '3', 'k': 'd'}
>>>

Besides declaring variable-length parameters, * and ** can also unpack a list (or any iterable) and a dict at the call site, as in this example.

list(range(3, 6))            # normal call with separate arguments
args = [3, 6]
list(range(*args))           # call with arguments unpacked from a list

def parrot(voltage, state='a stiff', action='voom'):
  print("-- This parrot wouldn't", action, end=' ')
  print("if you put", voltage, "volts through it.", end=' ')
  print("E's", state, "!")
d = {"voltage": "four million", "state": "bleedin' demised", "action": "VOOM"}
parrot(**d)

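3.2.3 lambda expressions

A lambda expression is an anonymous function. Like any nested function, it can capture variables from the enclosing scope and so form a closure. Below, make_incrementor builds the add5 and add10 functions at run time.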

>>> def make_incrementor(n):
...   return lambda x: x + n
...
>>> add5 = make_incrementor(5)
>>> add10 = make_incrementor(10)
>>> add5(4)
9
>>> add10(4)
14
>>>

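3.2.4 Decorators

A decorator is a higher-order function that takes a function and returns a function, usually the original wrapped with extra behavior or with some attributes set on it. The returned value is bound back to the original name: writing @decorator above a def is shorthand for foo = decorator(foo). An example: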

>>> def foo():
...   print('foo called')
...
>>> def decorator(func):
...   return func
...
>>> foo = decorator(foo)
>>>
>>> @decorator
... def bar():
...   print('bar called')
...
>>> bar()
bar called
>>>
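
The decorator in the transcript returns the function unchanged. A slightly more realistic sketch wraps the function to add behavior; log_calls and its calls attribute are hypothetical names invented for this example, and functools.wraps is used so that the wrapper keeps the original function's name and docstring.

```python
import functools

def log_calls(func):
    """Hypothetical decorator that records the arguments of every call."""
    @functools.wraps(func)  # copy func.__name__ and the docstring to the wrapper
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)
    wrapper.calls = []
    return wrapper

@log_calls
def add(a, b):
    return a + b

result = add(1, b=2)
```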

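3.3 Anonymous functions with filter, map, and reduce

filter(func, iter) takes a single iterable and keeps only the items for which func returns a true value.
map(func, iter1, iter2, ...) accepts one or more iterables and applies func to their items in parallel.
reduce(func, iter, init) takes a single iterable; init is the initial value. It first applies func to init and the first item, then applies func to that result and the second item, and so on to the end. In Python 3, reduce lives in functools, and map and filter return lazy iterators.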

>>> numseq = map(str, range(10))
>>> list(numseq)
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
>>> filnum = filter(lambda x: x > 5, range(10))
>>> list(filnum)
[6, 7, 8, 9]
>>> from functools import reduce
>>> reduce(lambda x, y: x+y, range(100), 0)
4950
>>>

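3.4 Global variables

A name assigned inside a function is local by default. To rebind a module-level variable from inside a function, declare it with the global keyword; merely reading a global needs no declaration.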

>>> g_x = 0
>>> def change_x():
...   global g_x
...   g_x += 1
...
>>> g_x
0
>>> change_x()
>>> g_x
1
>>>

4 Classes

4.1 Defining a class

class Vector:
  # constructor
  def __init__(self, a, b):
    self.a = a
    self.b = b

  # finalizer: called when the object is about to be destroyed
  def __del__(self):
    pass

  # string representation used by str() and print()
  def __str__(self):
    return 'Vector (%d, %d)' % (self.a, self.b)

  # overload operator '+'
  def __add__(self, other):
    return Vector(self.a + other.a, self.b + other.b)

v1 = Vector(2, 10)
v2 = Vector(5, -2)
v3 = v1 + v2

4.2 Inheritance

Inheritance in Python looks like this:

class Parent:
  def __init__(self):
    self.name = 'parent'

  def myMethod(self):
    print(self.name)

class Child(Parent):
  def __init__(self):
    self.name = 'child'

  def myMethod(self):
    # call the parent implementation
    super().myMethod()

c = Child()
c.myMethod()

4.3 Access control

Python has no private, protected, or public keywords; the access level of a member is signalled by its name.

class Visibility:
  # a name starting with __ is name-mangled, which makes it effectively private
  def __inaccessible(self):
    print('you can not see me')

  # public method
  def accessible(self):
    print('this secret message is:', end=' ')
    self.__inaccessible()


secr = Visibility()
# secr.__inaccessible()
'''
Traceback (most recent call last):
File "***.py", line 13, in <module>
  secr.__inaccessible()
AttributeError: 'Visibility' object has no attribute '__inaccessible'
'''
secr.accessible() #>>> this secret message is: you can not see me
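
The double-underscore prefix does not hide the method completely: Python renames it to _ClassName__name (name mangling). A small sketch of what that means in practice, with the method changed to return its text so the result is easy to inspect:

```python
class Visibility:
    def __inaccessible(self):
        return 'you can not see me'

secr = Visibility()

# The plain name is not an attribute of the instance ...
has_plain_name = hasattr(secr, '__inaccessible')

# ... but the mangled name is, so the method is still reachable
mangled = secr._Visibility__inaccessible()
```

This is why the convention is best read as "please do not touch" rather than as enforced privacy.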

4.4 Properties

class Rect:
  def __init__(self, width=0, height=0):
    self.w = width
    self.h = height

  def getSize(self):
    return self.w, self.h

  def setSize(self, size):
    self.w, self.h = size

  # expose the getter/setter pair as the attribute "size"
  size = property(getSize, setSize)

r = Rect(2, 5)
r.size #=> (2, 5)
r.size = 4, 4
r.size #=> (4, 4)
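
The same attribute can also be written with the @property decorator, which is the more common spelling today. A sketch that should be equivalent to the Rect class above:

```python
class Rect:
    def __init__(self, width=0, height=0):
        self.w = width
        self.h = height

    @property
    def size(self):
        # getter: called when r.size is read
        return self.w, self.h

    @size.setter
    def size(self, value):
        # setter: called on r.size = (w, h)
        self.w, self.h = value

r = Rect(2, 5)
initial = r.size
r.size = 4, 4
```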

5 Modules

5.1 Defining a module

Writing a module is no different from writing ordinary Python code: define some functions in a file.

# fibo.py
# Fibonacci numbers module
def fib(n):    # write Fibonacci series up to n
  a, b = 0, 1
  while a < n:
    print(a, end=' ')
    a, b = b, a+b
  print()

def fib2(n):   # return Fibonacci series up to n
  result = []
  a, b = 0, 1
  while a < n:
    result.append(a)
    a, b = b, a+b
  return result

Then load the module with the import statement

import fibo
fibo.fib(1000)
fibo.fib2(100)

# or
from fibo import fib, fib2
import fibo as fib
from fibo import fib as fibonacci

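5.2 Package layout

A package is also a kind of module. Each directory of a regular package contains an __init__.py file that initializes the package at that level (since Python 3.3, a directory without one is treated as a namespace package). An example package layout: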

sound/                          Top-level package
      __init__.py               Initialize the sound package
      formats/                  Subpackage for file format conversions
              __init__.py
              wavread.py
              wavwrite.py
              aiffread.py
              aiffwrite.py
              auread.py
              auwrite.py
              ...
      effects/                  Subpackage for sound effects
              __init__.py
              echo.py
              surround.py
              reverse.py
              ...
      filters/                  Subpackage for filters
              __init__.py
              equalizer.py
              vocoder.py
              karaoke.py
              ...

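Once the package exists and its parent directory is on the module search path (for example through PYTHONPATH), it can be imported like this: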

import sound.effects.echo
from sound.effects import echo
from sound.effects.echo import echofilter

6 Built-in objects

6.1 Lists

6.1.1 Creating lists

>>> 3 * [4]
[4, 4, 4]
>>> 5 * ['']
['', '', '', '', '']
>>> 3 * [ 2 * [0]]
[[0, 0], [0, 0], [0, 0]]
>>>
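
One caveat about the last form: multiplying a list of lists repeats references to the same inner list, so the rows are not independent. A short sketch of the pitfall and of the usual comprehension-based alternative:

```python
shared = 3 * [2 * [0]]
shared[0][0] = 1          # all three rows are the same list, so every row changes

independent = [2 * [0] for _ in range(3)]
independent[0][0] = 1     # each row is its own list, so only the first row changes
```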

6.1.2 Indexing and slicing

The common forms are plain indexing, slicing, and negative (from-the-end) indexing.

>>> nums = [1, 2, 3, 4, 5, 6, 7]
>>> nums[1:3]
[2, 3]
>>> nums[-3:]
[5, 6, 7]
>>> nums[-2]
6
>>>

Slicing with a step

>>> start = 1; end = 7; step  = 2
>>> nums[start:end:step]
[2, 4, 6]
>>>

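6.1.3 Modifying a list: append, extend, reverse, sort

append adds an element to the end of the list. Note that assignment (y = x) only binds a second name to the same list, so a change made through one name is visible through the other. To get an independent list, copy it: copy.copy makes a shallow copy, and copy.deepcopy also copies nested objects.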

>>> x = [1, 2, 3]
>>> y = x
>>> x.append(4)
>>> x
[1, 2, 3, 4]
>>> y
[1, 2, 3, 4]
>>>
>>> from copy import copy
>>> y = copy(x)
>>> x.append(5)
>>> x
[1, 2, 3, 4, 5]
>>> y
[1, 2, 3, 4]
>>>

insert inserts an element at a position, pop removes and returns the last element, remove finds and deletes the first occurrence of a value, and clear empties the list.

>>> fruits = ['apple', 'banana', 'orange']
>>> fruits.insert(1, 'pear')
>>> fruits
['apple', 'pear', 'banana', 'orange']
>>> fruits.pop()
'orange'
>>> fruits
['apple', 'pear', 'banana']
>>> fruits.remove('apple')
>>> fruits
['pear', 'banana']
>>> fruits.clear()
>>> fruits
[]
>>>

extend appends all items of another list, which amounts to concatenating the two lists in place

>>> x = [1, 2, 3]; y = [5, 7]
>>> x.extend(y)
>>> x
[1, 2, 3, 5, 7]
>>>

reverse reverses the list in place. The sort method sorts the list in place, while the built-in sorted returns a sorted copy.

>>> x = [4, 6, 2, 1, 0, 6]
>>> x.reverse()
>>> x
[6, 0, 1, 2, 6, 4]
>>> y = sorted(x)
>>> x
[6, 0, 1, 2, 6, 4]
>>> y
[0, 1, 2, 4, 6, 6]
>>> x.sort()
>>> x
[0, 1, 2, 4, 6, 6]
>>> fruits = ['apple', 'pear', 'banana', 'orange']
>>> fruits.sort(key=len) # sort by word length
>>> fruits
['pear', 'apple', 'banana', 'orange']
>>> fruits.sort() # sort lexicographically
>>> fruits
['apple', 'banana', 'orange', 'pear']
>>>

A very handy technique: insert and delete elements by assigning to a slice.

>>> numbers = [1, 5]
>>> numbers[1:1] = [2, 3, 4] # add elements by assign
>>> numbers
[1, 2, 3, 4, 5]
>>> numbers[-3:] = [] # delete elements by assign empty list
>>> numbers
[1, 2]
>>>

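6.1.4 Inspecting a list: membership, length, minimum and maximum

in tests whether a value is contained in a sequence, len returns its length, and min and max return its smallest and largest items. The example uses a string, but lists behave the same way.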

>>> greeting = 'Hello'
>>> 'x' in greeting
False
>>> 'l' in greeting
True
>>> len(greeting)
5
>>> min(greeting)
'H'
>>>

6.1.5 Searching a list

count counts the occurrences of a value in the list

>>> numbers = [1, 2, 1, 3, 4, 2, 1]
>>> numbers.count(1)
3
>>>

index finds a value and returns its position; it raises ValueError if the value is absent

>>> fruits = ['apple', 'banana', 'orange']
>>> fruits.index("apple")
0
>>> fruits.index("foo")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: 'foo' is not in list
>>>

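6.2 Strings

6.2.1 Basic operations, formatted output, template strings

Strings support the same indexing and slicing as lists, and the % operator combined with a tuple produces formatted strings.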

>>> url = 'http://jeanhwea.github.io'
>>> url[-2:]
'io'
>>> fmt = 'first: %s, second: %s'
>>> val = ('hello', 'Jeanhwea')
>>> fmt % val
'first: hello, second: Jeanhwea'
>>>

Python also supports template strings, though they are less convenient than Ruby's string interpolation. Typical usage:

>>> from string import Template
>>> s = Template('$fruit is $color') # use $$ for a literal $
>>> data = {'fruit': 'apple', 'color': 'red'}
>>> s.substitute(fruit='banana', color='yellow')
'banana is yellow'
>>> s.substitute(data)
'apple is red'
>>>

Another commonly used way to format strings is shown below. It is clear and easy to read, so it is the recommended way to build strings:

>>> foo = 'foo'
>>> bar = 'bar'
>>> '%s%s' % (foo, bar)
'foobar'
>>> '{0}{1}'.format(foo, bar)
'foobar'
>>> '{foo}{bar}'.format(foo=foo, bar=bar)
'foobar'
>>> '{{foo}}{bar}'.format(foo=foo, bar=bar)
'{foo}bar'
>>>
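
Since Python 3.6 the same results can be written with f-strings (formatted string literals), which evaluate the expressions inside the braces in place. A brief sketch; the format specifications in the last line are only examples:

```python
foo = 'foo'
bar = 'bar'

joined = f'{foo}{bar}'
escaped = f'{{foo}}{bar}'            # doubled braces produce literal braces
padded = f'{3.14159:.2f}|{42:>5}'    # two decimals; right-align in a width of 5
```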

6.2.2 String indexing

#  +---+---+---+---+---+---+
#  | P | y | t | h | o | n |
#  +---+---+---+---+---+---+
#  0   1   2   3   4   5   6
# -6  -5  -4  -3  -2  -1
>>> python = 'Python'
>>> python[0]
'P'
>>> python[-1]
'n'
>>> python[-3]
'h'
>>>

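6.2.3 Searching strings

find returns the index of the first occurrence of a substring, or -1 if there is none; rfind searches from the right, and index and rindex do the same but raise ValueError on failure. startswith and endswith test for a prefix and a suffix.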

>>> url = 'http://jeanhwea.github.io'
>>> url.find('jeanhwea')
7
>>> url.find('nothing')
-1
>>> start = 10
>>> url.find('e', start)
13
>>> 'hello, man'.startswith('hi')
False
>>> 'hello, man'.startswith('hello')
True
>>> 'hello, man'.endswith('man')
True
>>>

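6.2.4 Modifying strings: replacing, stripping whitespace

Strings are immutable, so these methods return a new string. replace substitutes one substring for another

>>> s = "Hello, world"
>>> s.replace("world", "Jinghui")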
'Hello, Jinghui'
>>> "aaba".replace("a", "$")
'$$b$'
>>> "aaba".replace("a", "$", 1)
'$aba'
>>>

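strip removes whitespace from both ends of a string; lstrip and rstrip work on one end only. Some case-conversion functions are demonstrated in the code as well.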

>>> foo = '   internal whitespace is kept    '
>>> foo.strip()
'internal whitespace is kept'
>>> foo.lstrip()
'internal whitespace is kept    '
>>> foo.rstrip()
'   internal whitespace is kept'
>>> foo.upper()
'   INTERNAL WHITESPACE IS KEPT    '
>>> foo.lower()
'   internal whitespace is kept    '
>>> foo.strip().capitalize()
'Internal whitespace is kept'
>>> from string import capwords
>>> capwords(foo)
'Internal Whitespace Is Kept'
>>>

6.2.5 Converting between strings and lists: split and join

join concatenates strings, split breaks a string apart

>>> dirs = 'home' , 'hujh', 'Projects' # tuple
>>> dirs
('home', 'hujh', 'Projects')
>>> '/'.join(dirs)
'home/hujh/Projects'
>>> seq = [1, 2, 4]
>>> '+'.join([str(n) for n in seq])
'1+2+4'
>>> '1+2+3+4'.split('+')
['1', '2', '3', '4']
>>>

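6.2.6 Regular expressions

Regular expressions are an essential tool for text processing. The most used functions are search, match, findall, and finditer.

search scans the string for the pattern and returns a Match object for the first location that matches, or None if nothing matches. When the pattern occurs several times, only the first occurrence is returned.

match matches only at the beginning of the string: it returns a Match object if the pattern matches there and None otherwise. The match need not extend to the end of the string; for a whole-string match, end the pattern with $ or use re.fullmatch.

findall returns all non-overlapping matches of the pattern in the string as a list.

>>> import re
>>> re.search(r'(abc)', 'hello abc.')
<re.Match object; span=(6, 9), match='abc'>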
>>> m = re.search(r'(abc)', 'hello abc.')
>>> m.group(0)
'abc'
>>> m = re.match(r'(abc)', 'hello abc.')
>>> m.group(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
>>> m = re.match(r'(\w+)', 'hello abc.')
>>> m.group(0)
'hello'
>>> re.findall(r'\w+', 'hello abc.')
['hello', 'abc']
>>>
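
finditer is listed above but not shown in the transcript, and re.fullmatch is the direct way to require a whole-string match. A minimal sketch of both, reusing the sample text from the transcript:

```python
import re

text = 'hello abc.'

# fullmatch succeeds only if the entire string matches the pattern
whole = re.fullmatch(r'\w+', 'hello')
partial = re.fullmatch(r'\w+', text)     # the space and '.' are not word characters

# finditer yields Match objects lazily, so positions are available too
spans = [(m.group(0), m.span()) for m in re.finditer(r'\w+', text)]
```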

Substitution: re.sub(pattern, repl, string)

>>> import re
>>> re.sub(r'[0-9]+', '_', 'Hello2World422')
'Hello_World_'

6.3 Dictionaries

6.3.1 Basic operations: adding, deleting, and modifying entries

>>> items = [('name', 'Jeanhwea'), ('age', '24')]
>>> d = dict(items)
>>> d['name']
'Jeanhwea'
>>> d['gender'] = 'male'
>>> d
{'name': 'Jeanhwea', 'age': '24', 'gender': 'male'}
>>> len(d)
3
>>> del d['age']
>>> d
{'name': 'Jeanhwea', 'gender': 'male'}
>>> 'name' in d
True
>>> d
{'name': 'Jeanhwea', 'gender': 'male'}
>>> d.clear()
>>> d
{}
>>>

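Dictionaries and references: clear() empties the dictionary for every name that refers to it, whereas rebinding a name to {} leaves the other references untouched.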

>>> x = {}
>>> x['key1'] = 'val1'
>>> x
{'key1': 'val1'}
>>> y = x
>>> y
{'key1': 'val1'}
>>> x.clear() # clear x as well as y
>>> y
{}

>>> x['key2'] = 'val2'
>>> x
{'key2': 'val2'}
>>> y
{'key2': 'val2'}
>>> x = {} # bind x to {}, while y stay it old state
>>> y
{'key2': 'val2'}
>>>

6.3.2 Shallow and deep copies

Dictionaries, too, distinguish shallow copies from deep copies; see the code below.

>>> # shallow copy
>>> x = { 'name': 'Jeanhwea', 'friends': ['Jack', 'Alice'] }
>>> y = x.copy()
>>> y['name'] = 'Wang'
>>> x
{'name': 'Jeanhwea', 'friends': ['Jack', 'Alice']}
>>> y
{'name': 'Wang', 'friends': ['Jack', 'Alice']}
>>> y['friends'].remove('Jack')
>>> x
{'name': 'Jeanhwea', 'friends': ['Alice']}
>>> y
{'name': 'Wang', 'friends': ['Alice']}
>>>
>>> # deep copy
>>> x = { 'name': 'Jeanhwea', 'friends': ['Jack', 'Alice'] }
>>> from copy import deepcopy
>>> y = deepcopy(x)
>>> y['name'] = 'Wang'
>>> x
{'name': 'Jeanhwea', 'friends': ['Jack', 'Alice']}
>>> y
{'name': 'Wang', 'friends': ['Jack', 'Alice']}
>>> y['friends'].remove('Jack')
>>> x
{'name': 'Jeanhwea', 'friends': ['Jack', 'Alice']}
>>> y
{'name': 'Wang', 'friends': ['Alice']}
>>>

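6.3.3 Building dictionaries

fromkeys builds a dictionary from a list of keys. When looking up a key, get returns None (or a supplied default) for a missing key instead of raising, while indexing raises KeyError.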

>>> keys = ['a', 'b', 'c']
>>> {}.fromkeys(keys)
{'a': None, 'b': None, 'c': None}
>>> {}.fromkeys(keys, '(none)')
{'a': '(none)', 'b': '(none)', 'c': '(none)'}
>>> x = {'k1': 'val1', 'k2': 'val2'}
>>> x['c']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'c'
>>> x.get('c')
>>> x.get('c') is None
True
>>>

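The following ways of creating a dictionary all produce equal dictionaries (only the insertion order of the keys differs)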

>>> dict(one=1, two=2, three=3)
{'one': 1, 'two': 2, 'three': 3}
>>> {'one': 1, 'two': 2, 'three': 3}
{'one': 1, 'two': 2, 'three': 3}
>>> dict(zip(['one', 'two', 'three'], [1, 2, 3]))
{'one': 1, 'two': 2, 'three': 3}
>>> dict([('two', 2), ('one', 1), ('three', 3)])
{'two': 2, 'one': 1, 'three': 3}
>>> dict({'three': 3, 'one': 1, 'two': 2})
{'three': 3, 'one': 1, 'two': 2}
>>>

6.3.4 Key membership and iteration

has_key was removed in Python 3; use the in operator to test whether a key is in a dictionary.

>>> x = {'k1': 'val1', 'k2': 'val2'}
>>> 'k1' in x
True
>>> 'c' in x
False

In Python 3, items(), values(), and keys() return view objects; the Python 2 methods iteritems(), itervalues(), and iterkeys() no longer exist.

>>> x.items()
dict_items([('k1', 'val1'), ('k2', 'val2')])
>>> x.values()
dict_values(['val1', 'val2'])
>>> x.keys()
dict_keys(['k1', 'k2'])

for k, v in x.items():
    pass
for v in x.values():
    pass
for k in x:
    pass

6.3.5 Updating a dictionary

update merges another dictionary into this one, overwriting the values of keys that already exist.

>>> p1 = dict(x=0,y=0)
>>> p1
{'x': 0, 'y': 0}
>>> p2 = dict(x=1, y=2)
>>> p2
{'x': 1, 'y': 2}
>>> p1.update(p2)
>>> p1
{'x': 1, 'y': 2}
>>> p2
{'x': 1, 'y': 2}
>>>

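6.3.6 Getting values

setdefault returns the value of a key if it exists; otherwise it inserts the key with the given default and returns that default. get reads values the same way but never writes a missing key into the dictionary.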

>>> person = dict(name='Jinghui', age=18)
>>> person
{'name': 'Jinghui', 'age': 18}
>>> person.setdefault('name', 'anonymous')
'Jinghui'
>>> person.setdefault('birthday', 'unknown')
'unknown'
>>> person
{'name': 'Jinghui', 'age': 18, 'birthday': 'unknown'}
>>> person['height']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'height'
>>> person.setdefault('height', 120)
120
>>> person['height']
120
>>>
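
A typical use of setdefault is grouping items under a key without checking first whether the key exists. A small sketch with made-up data, contrasted with get, which leaves the dictionary unchanged:

```python
words = ['apple', 'avocado', 'banana', 'blueberry', 'cherry']

by_letter = {}
for w in words:
    # Insert an empty list the first time a letter is seen, then append to it
    by_letter.setdefault(w[0], []).append(w)

# get returns the fallback but does not add 'z' to the dictionary
missing = by_letter.get('z', [])
```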

6.4 Dates and times

6.4.1 Basic operations

The time module provides time-related functions; the datetime module provides date and time types.

>>> import time
>>> time.time()
1562066055.218775
>>> int(time.time())
1562066055
>>>
>>> from datetime import datetime, timedelta
>>> datetime.today()
datetime.datetime(2019, 7, 2, 19, 14, 15, 427266)
>>>
>>> year = timedelta(days=365)
>>> year
datetime.timedelta(days=365)
>>> year.total_seconds()
31536000.0
>>> datetime.today() + year
datetime.datetime(2020, 7, 1, 19, 14, 15, 692306)
>>>

6.4.2 The time module

The time module represents broken-down times as time.struct_time, which usually serves as the intermediate value when converting between time formats.

>>> time.localtime() # local time
time.struct_time(tm_year=2019, tm_mon=7, tm_mday=2, tm_hour=19, tm_min=15, tm_sec=46, tm_wday=1, tm_yday=183, tm_isdst=0)
>>> time.gmtime()    # UTC time
time.struct_time(tm_year=2019, tm_mon=7, tm_mday=2, tm_hour=11, tm_min=15, tm_sec=46, tm_wday=1, tm_yday=183, tm_isdst=0)
>>>

6.4.3 Converting between timestamps and `struct_time`

>>> now = time.time()
>>> time.localtime(now) # timestamp -> struct_time
time.struct_time(tm_year=2019, tm_mon=7, tm_mday=2, tm_hour=19, tm_min=18, tm_sec=38, tm_wday=1, tm_yday=183, tm_isdst=0)
>>> local_time = time.localtime()
>>> time.mktime(local_time) # the inverse function of localtime(), struct_time -> timestamp
1562066319.0
>>>

6.4.4 Formatting: converting between strings and `struct_time`

>>> fmt = '%Y-%m-%d %H:%M:%S'
>>> time.strftime(fmt, time.localtime())
'2019-07-02 19:19:38'
>>> time.strftime(fmt, time.gmtime())
'2019-07-02 11:19:38'
>>> time.strptime('2018-10-24 14:51:03', fmt)
time.struct_time(tm_year=2018, tm_mon=10, tm_mday=24, tm_hour=14, tm_min=51, tm_sec=3, tm_wday=2, tm_yday=297, tm_isdst=-1)
>>>

6.4.5 Converting between timestamps and strings

With the time module, struct_time serves as the intermediate data structure for the conversion.

>>> fmt = '%Y-%m-%d %H:%M:%S'
>>> now = time.time()
>>> time.strftime(fmt, time.localtime(now))
'2019-07-02 19:20:42'
>>> time.mktime(time.strptime('2018-10-24 15:03:46', fmt))
1540364626.0
>>>
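
The datetime module can do the same conversion without struct_time. The sketch below assumes the text should be interpreted as UTC, so that the result does not depend on the machine's local time zone; that choice is an assumption of this example, not something the transcript above does (it uses local time).

```python
from datetime import datetime, timezone

fmt = '%Y-%m-%d %H:%M:%S'

# string -> timezone-aware datetime (treated as UTC here) -> timestamp
dt = datetime.strptime('2018-10-24 15:03:46', fmt).replace(tzinfo=timezone.utc)
ts = dt.timestamp()

# timestamp -> timezone-aware datetime -> string
text = datetime.fromtimestamp(ts, tz=timezone.utc).strftime(fmt)
```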

6.4.6 Format directives for dates and times

The directives are defined in the table below:

Directive	Meaning
%a	Locale’s abbreviated weekday name.
%A	Locale’s full weekday name.
%b	Locale’s abbreviated month name.
%B	Locale’s full month name.
%c	Locale’s appropriate date and time representation.
%d	Day of the month as a decimal number [01,31].
%H	Hour (24-hour clock) as a decimal number [00,23].
%I	Hour (12-hour clock) as a decimal number [01,12].
%j	Day of the year as a decimal number [001,366].
%m	Month as a decimal number [01,12].
%M	Minute as a decimal number [00,59].
%p	Locale’s equivalent of either AM or PM.
%S	Second as a decimal number [00,61].
%U	Week number of the year (Sunday as the first day of the week) as a decimal number [00,53]. All days in a new year preceding the first Sunday are considered to be in week 0.
%w	Weekday as a decimal number [0(Sunday),6].
%W	Week number of the year (Monday as the first day of the week) as a decimal number [00,53]. All days in a new year preceding the first Monday are considered to be in week 0.
%x	Locale’s appropriate date representation.
%X	Locale’s appropriate time representation.
%y	Year without century as a decimal number [00,99].
%Y	Year with century as a decimal number.
%z	Time zone offset indicating a positive or negative time difference from UTC/GMT of the form +HHMM or -HHMM, where H represents decimal hour digits and M represents decimal minute digits [-23:59, +23:59].
%Z	Time zone name (no characters if no time zone exists).
%%	A literal '%' character.

6.4.7 Sleeping

sleep suspends the calling thread for the given number of seconds.

import time
time.sleep(5) # sleep for 5 seconds

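6.5 User input

6.5.1 `raw_input` and `input`

Python 2 has two functions. raw_input() reads one line from standard input (sys.stdin) and returns it as a string with the trailing newline removed; input() reads a line and evaluates it as a Python expression.

Python 3 removed raw_input(), or more precisely renamed it to input(), so input() now always returns a string and no import is needed to read from standard input. To reproduce the Python 2 input() behavior you can write eval(input()), but that executes whatever the user types; for numbers, prefer an explicit conversion such as int(input()). A Python 3 example of reading user input:

if __name__ == '__main__':
  # read a raw string
  name = input('name = ')
  print('your name is ' + name)

  # read an integer
  age = int(input('age = '))
  print(age + 1)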

6.5.2 sys.argv

sys.argv is simply a list of the command-line arguments (sys.argv[0] is the script name) and can be read directly:

import sys
if __name__ == '__main__':
  print(sys.argv)

6.6 Command-line arguments

argparse is a standard-library module for parsing command-line arguments and is very convenient; see the official argparse documentation for details.

import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="short description for this command.")
    parser.add_argument("-v", "--verbose", action="store_true", help="boolean for verbose")
    parser.add_argument("-a", "--paraA", type=float)
    parser.add_argument("-b", "--paraB", type=float)
    parser.add_argument("folders", nargs='+', help="list of folders")
    args = parser.parse_args()

    # yourfunc stands for your own function
    yourfunc(args.paraA, args.paraB)
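
To experiment with a parser without a real command line, parse_args also accepts an explicit list of strings. A minimal sketch; build_parser is a hypothetical helper, and the options mirror a subset of those above:

```python
import argparse

def build_parser():
    # Hypothetical helper that builds a parser similar to the one above
    parser = argparse.ArgumentParser(description='demo parser')
    parser.add_argument('-v', '--verbose', action='store_true')
    parser.add_argument('-a', '--paraA', type=float, default=0.0)
    parser.add_argument('folders', nargs='+')
    return parser

# Equivalent to running: script.py -v -a 1.5 src docs
args = build_parser().parse_args(['-v', '-a', '1.5', 'src', 'docs'])
```

Keeping parser construction in a function also makes the argument handling easy to unit test.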

7 Common tools

7.1 Unit testing

Python ships with a unit-testing module, unittest

import unittest

Write a subclass of unittest.TestCase and put the individual tests in its methods. The name of a test method must start with test (by convention test_); otherwise it is not recognized as a test and is not run.

class TestSubclass(unittest.TestCase):

  def test_func(self):
    self.assertEqual(0, 0)
    # the msg keyword argument supplies the message shown when the assertion fails
    self.assertEqual(0, 0, msg='modified message')
    self.assertGreater(1, 0)
    self.assertIn(0, [0])
    self.assertTrue(True)
    # check that an exception is raised
    with self.assertRaises(KeyError):
      _ = dict()[1]

  # a test class or test method decorated with @unittest.skip is skipped
  @unittest.skip(reason='just skip')
  def test_skip(self):
    raise Exception('I shall never be tested')

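unittest.TestCase also has two special methods, setUp and tearDown, which run before and after every test method. Connecting to a database before a test and disconnecting afterwards is a typical use. The methods below belong inside a TestCase subclass: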

def setUp(self):
  # To do: connect to the database
  pass

def tearDown(self):
  # To do: release the connection
  pass

def test_database(self):
  # To do: test the database
  pass

Once the test class is written, add the following so that running the file as a script executes the tests:

if __name__ == '__main__':
  unittest.main()
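
Putting the pieces together, here is a small self-contained sketch. TestStack and its contents are invented for illustration, and the suite is run programmatically so the outcome can be inspected instead of going through unittest.main().

```python
import unittest

class TestStack(unittest.TestCase):
    def setUp(self):
        # Runs before every test method
        self.stack = [1, 2]

    def tearDown(self):
        # Runs after every test method
        self.stack.clear()

    def test_pop(self):
        self.assertEqual(self.stack.pop(), 2)

    def test_pop_empty(self):
        with self.assertRaises(IndexError):
            [].pop()

# Load and run the tests of this one class; result records what happened
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestStack)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```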

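7.2 Logging

Larger projects usually need logging to be configured.

7.2.1 Configuration example

My Python template project on GitHub, Python Project Template, contains a sample logging setup. The package's __init__.py file attaches an empty (null) handler: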

# sample/__init__.py
# -*- coding: utf-8 -*-
__all__ = ()
# https://docs.python.org/3/howto/logging.html#configuring-logging-for-a-library
import logging
logging.getLogger(__name__).addHandler(logging.NullHandler())

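The test package's __init__.py file loads the configuration file from the project root. The benefit is that log output is visible while running unit tests but stays silent once the project is packaged.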

# test/__init__.py
# -*- coding: utf-8 -*-
import os
import sys
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))

import logging.config
from sample.tool.config import load_logging_config

logging.config.dictConfig(load_logging_config())

7.2.2 Usage guide

import logging.config
logging.config.fileConfig('logging.conf')

After initialization you can start logging, either directly at module level or from inside a class. The usual patterns are:

import logging

# logging directly
logger = logging.getLogger(__name__)
logger.info("...")

# logging in a class
class Hello:

  def __init__(self):
    self.logger = logging.getLogger(__name__)

  def hello(self):
    self.logger.info("...")
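
For a quick experiment without a configuration file, a handler can be attached in code. In this sketch an in-memory stream stands in for the console or a log file so that the output can be examined; the logger name 'demo' and the format string are arbitrary choices for the example.

```python
import io
import logging

stream = io.StringIO()  # stands in for a log file or the console
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))

logger = logging.getLogger('demo')   # 'demo' is a hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.debug('hidden')    # below the INFO level, so it is dropped
logger.info('started')

output = stream.getvalue()
```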

7.3 Excel

Python has no built-in package for Excel files; a third-party package is needed. Common choices are pandas, openpyxl, xlrd, xlutils, and pyexcel.

7.3.1 pandas

Reading data from an Excel file

import pandas as pd
xl = pd.ExcelFile('example.xlsx')
print(xl.sheet_names)
df1 = xl.parse(xl.sheet_names[0])

Writing data to an Excel file

import pandas as pd
data = pd.Series([1, 3, 5, 6, 8])
# the context manager saves and closes the file; writer.save() was removed in pandas 2.0
with pd.ExcelWriter('example.xlsx', engine='xlsxwriter') as writer:
  data.to_excel(writer, sheet_name='Sheet1')

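7.3.2 xlrd

xlrd is a lightweight option for reading Excel files. It can only read, not write, and since version 2.0 it supports only the legacy .xls format. Older pandas versions used it under the hood to read Excel files; for .xlsx files pandas now relies on openpyxl.

import xlrd
workbook = xlrd.open_workbook('example.xls')
workbook = xlrd.open_workbook('example.xls', on_demand=True)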
workbook.sheet_names() #=> ['Sheet1']
workbook.sheets() #=> [<xlrd.sheet.Sheet object at 0x000000001407E208>]
sheet = workbook.sheet_by_name('Sheet1')
sheet = workbook.sheet_by_index(0)

row, col = 3, 0
# ctype : 0 empty,1 string, 2 number, 3 date, 4 boolean, 5 error
sheet.cell(row, col).ctype #=> 2
sheet.cell(row, col).value #=> 2.0
sheet.nrows #=> 6
sheet.ncols #=> 2
sheet.col_values(0) #=> ['', 0.0, 1.0, 2.0, 3.0, 4.0]
sheet.row_values(4) #=> [3.0, 6.0]

7.3.3 xlwt

xlwt writes Excel files in the .xls format; see the code for usage

import xlwt

book = xlwt.Workbook(encoding="utf-8")
sheet1 = book.add_sheet("Sheet1")
sheet1.write(0, 0, "Hello world")
book.save("sheet1.xls")

book = xlwt.Workbook()
sheet1 = book.add_sheet("Sheet1")
cols = ["A", "B", "C", "D", "E"]
txt = [0,1,2,3,4]
for num in range(5):
  row = sheet1.row(num)
  for index, col in enumerate(cols):
    value = txt[index] + num
    row.write(index, value)
book.save("test.xls")

7.4 csv

Python has a built-in module for reading and writing CSV files: import csv.

7.4.1 Reading CSV files

Reading with the function-style reader

import csv
with open('eggs.csv', newline='') as csvfile:
  spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
  for row in spamreader:
    print(', '.join(row))

Reading with the class-style DictReader

import csv
with open('names.csv', newline='') as csvfile:
  reader = csv.DictReader(csvfile)
  for row in reader:
    print(row['first_name'], row['last_name'])

7.4.2 Writing CSV files

Writing with the function-style writer

import csv
with open('eggs.csv', 'w', newline='') as csvfile:
  spamwriter = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
  spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
  spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])

Writing with the class-style DictWriter

import csv
with open('names.csv', 'w', newline='') as csvfile:
  fieldnames = ['first_name', 'last_name']
  writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
  writer.writeheader()
  writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'})
  writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'})
  writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})

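7.4.3 An example of reading and writing with csv

The classes below are Python 2 code (they rely on cStringIO, unicode, and the next method) written to handle non-ASCII data. Python 3 needs none of this: open the file in text mode with the desired encoding and newline=''.

import csv, codecs, cStringIO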

class UTF8Recoder:
  """
  Iterator that reads an encoded stream and reencodes the input to UTF-8
  """
  def __init__(self, f, encoding):
    self.reader = codecs.getreader(encoding)(f)

  def __iter__(self):
    return self

  def next(self):
    return self.reader.next().encode("utf-8")

class UnicodeReader:
  """
  A CSV reader which will iterate over lines in the CSV file "f",
  which is encoded in the given encoding.
  """

  def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
    f = UTF8Recoder(f, encoding)
    self.reader = csv.reader(f, dialect=dialect, **kwds)

  def next(self):
    row = self.reader.next()
    return [unicode(s, "utf-8") for s in row]

  def __iter__(self):
    return self

class UnicodeWriter:
  """
  A CSV writer which will write rows to CSV file "f",
  which is encoded in the given encoding.
  """

  def __init__(self, f, dialect=csv.excel, encoding="utf-8", **kwds):
    # Redirect output to a queue
    self.queue = cStringIO.StringIO()
    self.writer = csv.writer(self.queue, dialect=dialect, **kwds)
    self.stream = f
    self.encoder = codecs.getincrementalencoder(encoding)()

  def writerow(self, row):
    self.writer.writerow([s.encode("utf-8") for s in row])
    # Fetch UTF-8 output from the queue ...
    data = self.queue.getvalue()
    data = data.decode("utf-8")
    # ... and reencode it into the target encoding
    data = self.encoder.encode(data)
    # write to the target stream
    self.stream.write(data)
    # empty queue
    self.queue.truncate(0)

  def writerows(self, rows):
    for row in rows:
      self.writerow(row)
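
In Python 3 the csv module works on text directly, so non-ASCII data needs no recoding classes. A minimal round-trip sketch; an in-memory buffer stands in for a real file opened with open(path, 'w', newline='', encoding='utf-8'), and the sample rows are made up.

```python
import csv
import io

rows = [['name', 'city'], ['Zoë', '北京']]

# newline='' lets the csv module control line endings itself
buffer = io.StringIO(newline='')
csv.writer(buffer).writerows(rows)

# Rewind and read the same data back
buffer.seek(0)
read_back = list(csv.reader(buffer))
```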

7.5 json

7.5.1 Converting between JSON strings and dictionaries

>>> import json
>>> str1 = '{"a":1, "b":2}'
>>> json.loads(str1)
{'a': 1, 'b': 2}
>>> d1 = dict(a='apple', b='banana')
>>> json.dumps(d1)
'{"a": "apple", "b": "banana"}'
>>>

7.5.2 Converting between JSON files and dictionaries

>>> import json
>>> with open('/tmp/sample.json', 'r') as f:
...   json.load(f)
...
{'k1': 1, 'k2': 2}
>>>
>>> d2 = dict(name='Tom', age=18)
>>> with open('/tmp/d2.json', 'w') as f2:
...   json.dump(d2, f2)
...
>>>
# cat /tmp/d2.json
# {"name": "Tom", "age": 18}

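7.6 yaml

7.6.1 Installing the package

pip install pyyaml

7.6.2 Reading and writing YAML

import yaml

filename = 'config.yaml'                   # example path
nested_dict = {'server': {'port': 8080}}   # example data

# Write a YAML file
with open(filename, 'w') as yamlfile:
  yaml.dump(nested_dict, yamlfile)

# Read YAML; safe_load builds only plain Python objects
with open(filename, 'r') as yamlfile:
  data = yaml.safe_load(yamlfile)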

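8 Community standards: PEPs

PEP stands for Python Enhancement Proposal. The ones you meet most often are:

PEP 8 Style Guide for Python Code: the basic coding conventions for Python; follow them unless a project's own style says otherwise
PEP 20 The Zen of Python: 19 aphorisms on writing simple, clear Python code
PEP 257 Docstring Conventions: recommendations for writing Python docstrings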