Python Performance Comparison

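I have run into quite a few Python performance problems recently and started to doubt Python's speed, so I measured how different implementations of some common operations compare, as a rule-of-thumb guide for everyday development.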

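Timing is done with a simple hand-rolled function rather than timeit, so that the measurement does not include overhead such as extra function calls. The snippets target Python 2 (xrange, iteritems, time.clock, the print statement) and assume `import time` at the top of the file.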
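For readers on Python 3, where `time.clock` was removed in version 3.8, a roughly equivalent timer can be sketched with `time.perf_counter`. The helper name `measure` and the sample workload `iterate_items` are illustrative and not part of the original measurements.

```python
import time


def measure(func, num=1000):
    """Call func() num times and return the elapsed time in seconds."""
    start = time.perf_counter()
    for _ in range(num):
        func()
    end = time.perf_counter()
    return end - start


def iterate_items():
    # Sample workload: walk a 100-entry dict once.
    data = {x: x ** 2 for x in range(100)}
    for k, v in data.items():
        _, _ = k, v


if __name__ == '__main__':
    print('cost %.7f' % measure(iterate_items))
```

Unlike the inline loops used in this article, this helper pays one function call per iteration, so it is best suited to comparisons where both variants carry the same call cost.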

dict: iteritems vs. items

Code

def profile(num=1000):
	start = time.clock()
	data = {x: x ** 2 for x in xrange(100)}
	for x in xrange(num):
		for k, v in data.iteritems():
			_, _ = k, v
	end = time.clock()
	print 'cost %.7f' % (end - start)

def profile(num=1000):
	start = time.clock()
	data = {x: x ** 2 for x in xrange(100)}
	for x in xrange(num):
		for k, v in data.items():
			_, _ = k, v
	end = time.clock()
	print 'cost %.7f' % (end - start)

Results (seconds)

iteritems	items
0.0051353	0.0069680

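Conclusion

In Python 2, items() builds a new list of pairs on every call while iteritems() yields them lazily; here items() took roughly a third longer. By the same reasoning, iterkeys and itervalues should be slightly faster than keys and values. When you only need to iterate, prefer the iter* methods.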
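In Python 3 the `iter*` methods no longer exist, and `items()`, `keys()` and `values()` return lightweight views instead of lists, so the distinction measured above applies to Python 2 only. A small Python 3 sketch of the view behaviour:

```python
data = {x: x ** 2 for x in range(3)}

items_view = data.items()      # a view object, not a list copy
print(type(items_view).__name__)

data[3] = 9                    # the view reflects later changes to the dict
print(len(items_view))

snapshot = list(data.items())  # an explicit copy, like Python 2's items()
data[4] = 16                   # the snapshot stays at 4 entries, the view grows
print(len(snapshot), len(items_view))
```

Wrapping the view in `list(...)` is only needed when a stable copy is required, for example when the dict is modified while iterating.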

x not in y vs. not x in y

Code

def profile(num=100000):
	start = time.clock()
	data = {1 for x in xrange(10)}
	for x in xrange(num):
		50 not in data
	end = time.clock()
	print 'cost %.7f' % (end - start)

def profile(num=100000):
	start = time.clock()
	data = {1 for x in xrange(10)}
	for x in xrange(num):
		not 50 in data
	end = time.clock()
	print 'cost %.7f' % (end - start)

Results (seconds)

x not in y	not x in y
0.0043617	0.0044166

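Conclusion

There is no measurable performance difference between the two forms. The first one is in line with PEP 8, which explicitly prefers `is not` over `not ... is`, so follow that style.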
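The matching timings are expected, because `not x in y` parses as `not (x in y)` and is therefore the same membership test. A short Python 3 sketch that checks the two spellings agree and prints their bytecode for comparison by eye (the function names `first` and `second` are illustrative):

```python
import dis


def first(data):
    return 50 not in data


def second(data):
    return not 50 in data


containers = (set(), {50}, [1, 2, 3], (50, 51))

# Each pair holds the result of both spellings for one container.
results = [(first(c), second(c)) for c in containers]
print(results)

# Print the compiled instructions of each function for inspection.
dis.dis(first)
dis.dis(second)
```

Whether the two functions compile to identical instructions depends on the interpreter version, so the `dis` output is something to inspect rather than rely on.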

if xxx vs. if xxx is True/False/None

Code

def profile(num=100000):
	start = time.clock()
	v = True
	for x in xrange(num):
		if v is True:
			pass
	end = time.clock()
	print 'cost %.7f' % (end - start)

def profile(num=100000):
	start = time.clock()
	v = True
	for x in xrange(num):
		if v:
			pass
	end = time.clock()
	print 'cost %.7f' % (end - start)

Results (seconds)

if xxx is True/False/None	if xxx
0.0041141	0.0020919

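Conclusion

Only `is True` was actually measured; the other cases are presumably similar. A plain `if` took about half the time. Where it does not change the meaning, prefer a plain `if xxx`.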
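The caveat about meaning matters because the two checks are not interchangeable: `if v` tests truthiness, while `if v is True` tests identity with the `True` singleton. A minimal sketch (the helper names are illustrative):

```python
def by_truthiness(v):
    return 'yes' if v else 'no'


def by_identity(v):
    return 'yes' if v is True else 'no'


# Truthy values other than True are where the two checks disagree.
for value in (True, 1, 'text', [0], False, 0, '', None):
    print(repr(value), by_truthiness(value), by_identity(value))
```

If a value may legitimately be `1`, a non-empty string or a non-empty container, switching from `is True` to a plain `if` changes which branch runs.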

Performance after overriding `__setattr__`

Code

class Foo(object):
	def __init__(self):
		super(Foo, self).__init__()
		self.tag = 0


def profile(num=100000):
	start = time.clock()
	foo = Foo()
	for x in xrange(num):
		foo.tag = x
	end = time.clock()
	print 'cost %.7f' % (end - start)

class Foo(object):
	def __init__(self):
		super(Foo, self).__init__()
		self.tag = 0

	def __setattr__(self, key, value):
		super(Foo, self).__setattr__(key, value)


def profile(num=100000):
	start = time.clock()
	foo = Foo()
	for x in xrange(num):
		foo.tag = x
	end = time.clock()
	print 'cost %.7f' % (end - start)

Results (seconds)

Not overridden	Overridden
0.0069106	0.0677060

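Conclusion

With __setattr__ overridden, attribute assignment is nearly 10 times slower, and that is with no extra logic in __setattr__ beyond the call to the super implementation. Avoid overriding __setattr__ unless it is necessary or brings a substantial benefit.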
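To re-run this comparison on Python 3, a sketch along the following lines could be used. It relies on `timeit.timeit` with a callable; the class names `Plain` and `Overridden` and the helper `assign` are illustrative, and the absolute numbers will differ from the table above.

```python
import timeit


class Plain(object):
    def __init__(self):
        self.tag = 0


class Overridden(object):
    def __init__(self):
        self.tag = 0

    def __setattr__(self, key, value):
        # No extra logic: just delegate to the default implementation.
        super().__setattr__(key, value)


def assign(obj, num=100000):
    for x in range(num):
        obj.tag = x


plain_cost = timeit.timeit(lambda: assign(Plain()), number=1)
overridden_cost = timeit.timeit(lambda: assign(Overridden()), number=1)
print('plain %.7f, overridden %.7f' % (plain_cost, overridden_cost))
```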

Performance after overriding `__getattr__` and `__getattribute__`

Code

class Foo(object):
	def __init__(self):
		super(Foo, self).__init__()
		self.tag = 0

def profile(num=100000):
	start = time.clock()
	foo = Foo()
	for x in xrange(num):
		foo.tag
	end = time.clock()
	print 'cost %.7f' % (end - start)

class Foo(object):
	def __init__(self):
		super(Foo, self).__init__()
		self.tag = 0

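	def __getattr__(self, item):
		# Only reached when normal lookup fails; Foo is not subscriptable,
		# so signal the missing attribute instead of indexing self.
		raise AttributeError(item)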


def profile(num=100000):
	start = time.clock()
	foo = Foo()
	for x in xrange(num):
		foo.tag
	end = time.clock()
	print 'cost %.7f' % (end - start)

class Foo(object):
	def __init__(self):
		super(Foo, self).__init__()
		self.tag = 0

	def __getattribute__(self, item):
		return super(Foo, self).__getattribute__(item)


def profile(num=100000):
	start = time.clock()
	foo = Foo()
	for x in xrange(num):
		foo.tag
	end = time.clock()
	print 'cost %.7f' % (end - start)

Results (seconds)

Not overridden	`__getattr__` overridden	`__getattribute__` overridden
0.0049022	0.0060706	0.0584793

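Conclusion

Overriding __getattribute__ also makes attribute access roughly 12 times slower (0.0585 vs. 0.0049). __getattr__ is called only when normal lookup fails to find the attribute, so it has little effect on performance. Since attributes are read far more often than they are written, be even more careful with __getattribute__ than with __setattr__.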

Direct attribute access vs. getter/setter methods vs. property

Code

class Foo(object):
	def __init__(self):
		super(Foo, self).__init__()
		self.tag = 0


def profile(num=100000):
	start = time.clock()
	foo = Foo()
	for x in xrange(num):
		foo.tag
	end = time.clock()
	print 'cost %.7f' % (end - start)

class Foo(object):
	def __init__(self):
		super(Foo, self).__init__()
		self.tag = 0

	def get_tag(self):
		return self.tag


def profile(num=100000):
	start = time.clock()
	foo = Foo()
	for x in xrange(num):
		foo.get_tag()
	end = time.clock()
	print 'cost %.7f' % (end - start)

class Foo(object):
	def __init__(self):
		super(Foo, self).__init__()
		self._tag = 0

	@property
	def tag(self):
		return self._tag


def profile(num=100000):
	start = time.clock()
	foo = Foo()
	for x in xrange(num):
		foo.tag
	end = time.clock()
	print 'cost %.7f' % (end - start)

Results (seconds)

Direct attribute read	Getter method	property getter
0.0052441	0.0189515	0.0190965

Direct attribute write	Setter method	property setter
0.0070000	0.0190001	0.0249999

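Conclusion

Reading or writing the attribute directly is the fastest, about 3.6 times faster than either alternative for reads. property is on par with a getter method for reads and the slowest for writes, but it is more readable than getter/setter methods. When no special logic is involved, access the attribute directly; when reads or writes need some processing, consider property.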
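The write-side code behind the second results row is not shown in the article. A minimal Python 3 sketch of what the three write variants could look like follows; the class names are hypothetical and this is not necessarily the code that produced the numbers.

```python
class Direct(object):
    def __init__(self):
        self.tag = 0


class WithSetter(object):
    def __init__(self):
        self.tag = 0

    def set_tag(self, value):
        self.tag = value


class WithProperty(object):
    def __init__(self):
        self._tag = 0

    @property
    def tag(self):
        return self._tag

    @tag.setter
    def tag(self, value):
        self._tag = value


direct, with_setter, with_property = Direct(), WithSetter(), WithProperty()
direct.tag = 1            # plain instance attribute write
with_setter.set_tag(2)    # explicit method call
with_property.tag = 3     # routed through the property's setter
print(direct.tag, with_setter.tag, with_property.tag)
```

The property variant keeps the call site identical to a direct write, which is the readability advantage mentioned in the conclusion.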

Watching attribute changes: `__setattr__` vs. property

Code

class Foo(object):
	def __init__(self):
		super(Foo, self).__init__()
		self.tag = 0

	def __setattr__(self, key, value):
		if hasattr(self, key) and key == 'tag':
			self.on_tag_changed(getattr(self, key), value)
		super(Foo, self).__setattr__(key, value)

	def on_tag_changed(self, old, new):
		pass


def profile(num=10):
	start = time.clock()
	foo = Foo()
	for x in xrange(num):
		foo.tag = x
	end = time.clock()
	print 'cost %.7f' % (end - start)

class Foo(object):
	def __init__(self):
		super(Foo, self).__init__()
		self._tag = 0

	@property
	def tag(self):
		return self._tag

	@tag.setter
	def tag(self, val):
		self.on_tag_changed(self._tag, val)
		self._tag = val

	def on_tag_changed(self, old, new):
		pass


def profile(num=10):
	start = time.clock()
	foo = Foo()
	for x in xrange(num):
		foo.tag = x
	end = time.clock()
	print 'cost %.7f' % (end - start)

Results (seconds)

Using `__setattr__`	Using property
0.0000228	0.0000109

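Conclusion

When attribute changes need to be watched, the property approach is the more efficient one, taking about half the time here (measured over only 10 assignments). If many attributes need watching, __setattr__ is worth considering; alternatively, if the watching logic is very similar across attributes, the properties can be generated automatically to cut down on repetition.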
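One way to generate such properties is a small factory function that returns a `property` per attribute name. This is a Python 3 sketch under stated assumptions: the factory name `watched`, the `_name` storage convention and the `on_<name>_changed` hook naming are all illustrative choices, modelled on the `on_tag_changed` hook used above.

```python
def watched(name):
    """Build a property that reports changes to on_<name>_changed."""
    storage = '_' + name
    hook = 'on_%s_changed' % name

    def getter(self):
        return getattr(self, storage)

    def setter(self, value):
        old = getattr(self, storage, None)  # None before the first assignment
        getattr(self, hook)(old, value)
        setattr(self, storage, value)

    return property(getter, setter)


class Foo(object):
    tag = watched('tag')
    size = watched('size')

    def __init__(self):
        self.changes = []
        self.tag = 0    # goes through the generated setter as well
        self.size = 1

    def on_tag_changed(self, old, new):
        self.changes.append(('tag', old, new))

    def on_size_changed(self, old, new):
        self.changes.append(('size', old, new))


foo = Foo()
foo.tag = 5
print(foo.changes)
```

Unlike the hand-written property above, the initial assignments in `__init__` also fire the hook here (with `None` as the old value); writing to `self._tag` directly in `__init__` would avoid that.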

Performance impact of import statements inside functions

Code

def foo():
	pass


def profile(num=100000):
	start = time.clock()
	for x in xrange(num):
		foo()
	end = time.clock()
	print 'cost %.7f' % (end - start)

def foo():
	import os


def profile(num=100000):
	start = time.clock()
	for x in xrange(num):
		foo()
	end = time.clock()
	print 'cost %.7f' % (end - start)

Results (seconds)

No local import	With local import
0.0108807	0.0634075

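Conclusion

Putting an import inside a function helps avoid circular-import errors and reduces initial startup time, but the import statement is executed on every call. The module itself is loaded only once and then served from the sys.modules cache, yet the repeated lookup and local binding still made the call about 6 times slower here. Unless it is necessary, do not put imports inside functions.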
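The point that a local import is re-executed but not re-loaded can be checked against `sys.modules`. A minimal sketch (the function name `load_local` is illustrative):

```python
import sys


def load_local():
    import os  # executed on every call, but the module is loaded only once
    return os


first_call = load_local()
second_call = load_local()

# Both calls return the single cached module object.
print(first_call is second_call, first_call is sys.modules['os'])
```

The per-call cost measured above is therefore the cache lookup and name binding, not a repeated module load.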