Django QuerySets cache their results to some extent. Understanding how that cache actually works helps you write reasonably efficient code.

The following is based on Django 1.3.4. First, the symptom, excerpted from Stack Overflow:

recs = TrackingImport.objects.filter(...stuff...)

In [102]: time(recs[0])
Wall time: 1.84 s

In [103]: time(recs[0])
Wall time: 1.84 s

In [104]: len(recs)
Out[104]: 1823

In [105]: time(recs[0])
Wall time: 0.00 s

Now straight to the code, in django.db.models.query:

  def __getitem__(self, k):
      """
      Retrieves an item or slice from the set of results.
      """
      if not isinstance(k, (slice, int, long)):
          raise TypeError
      assert ((not isinstance(k, slice) and (k >= 0))
              or (isinstance(k, slice) and (k.start is None or k.start >= 0)
                  and (k.stop is None or k.stop >= 0))), \
              "Negative indexing is not supported."

      if self._result_cache is not None:
          if self._iter is not None:
              # The result cache has only been partially populated, so we may
              # need to fill it out a bit more.
              if isinstance(k, slice):
                  if k.stop is not None:
                      # Some people insist on passing in strings here.
                      bound = int(k.stop)
                  else:
                      bound = None
              else:
                  bound = k + 1
              if len(self._result_cache) < bound:
                  self._fill_cache(bound - len(self._result_cache))
          return self._result_cache[k]

      if isinstance(k, slice):
          qs = self._clone()
          if k.start is not None:
              start = int(k.start)
          else:
              start = None
          if k.stop is not None:
              stop = int(k.stop)
          else:
              stop = None
          qs.query.set_limits(start, stop)
          return k.step and list(qs)[::k.step] or qs
      try:
          qs = self._clone()
          qs.query.set_limits(k, k + 1)
          return list(qs)[0]
      except self.model.DoesNotExist, e:
          raise IndexError(e.args)

As the code shows, whether fetching a single element from a QuerySet uses the cache is conditional: when the cache is empty (`_result_cache is None`), execution falls straight through to this part:

  try:
     qs = self._clone()
     qs.query.set_limits(k, k + 1)
     return list(qs)[0]
  except self.model.DoesNotExist, e:
     raise IndexError(e.args)

The result returned by this code path is never stored in the cache, which is why two consecutive accesses both take the full query time. The cache here only applies to the complete QuerySet result, not to one or two individual rows.
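To make the behavior concrete, here is a minimal sketch (not Django code, and deliberately simplified: no `_iter` partial fill, no slices) that mimics the `__getitem__` logic above. Indexing with an empty cache issues a fresh "query" every time, while a fully evaluated result set is served from the cache:

```python
class FakeQuerySet:
    """Toy model of Django's QuerySet result caching."""

    def __init__(self, rows):
        self._rows = rows          # stands in for the database table
        self._result_cache = None  # same attribute name as Django's QuerySet
        self.query_count = 0       # number of simulated DB round trips

    def _run_query(self, start, stop):
        # Corresponds to cloning the queryset and calling set_limits().
        self.query_count += 1
        return self._rows[start:stop]

    def __getitem__(self, k):
        if self._result_cache is not None:
            return self._result_cache[k]     # served from cache
        # Fresh LIMIT 1 query; note the result is NOT stored in the cache.
        return self._run_query(k, k + 1)[0]

    def __len__(self):
        if self._result_cache is None:
            # Full evaluation fills the cache, just like Django's __len__.
            self._result_cache = self._run_query(0, None)
        return len(self._result_cache)


qs = FakeQuerySet(["a", "b", "c"])
qs[0]                            # one "query"
qs[0]                            # a second, identical "query"
assert qs.query_count == 2
len(qs)                          # one more query, but the cache is now filled
qs[0]                            # cache hit, no query
qs[1]                           # cache hit, no query
assert qs.query_count == 3
```

This reproduces the Stack Overflow timings: the first two `recs[0]` calls each cost a full query, while after `len(recs)` the subscript is a plain list lookup.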

For example, after calling len() the cache is populated:

  def __len__(self):
      # Since __len__ is called quite frequently (for example, as part of
      # list(qs), we make some effort here to be as efficient as possible
      # whilst not messing up any existing iterators against the QuerySet.
      if self._result_cache is None:
          if self._iter:
              self._result_cache = list(self._iter)
          else:
              self._result_cache = list(self.iterator())
      elif self._iter:
          self._result_cache.extend(self._iter)
      return len(self._result_cache)

So when writing real code, pay attention to the point at which the cache takes effect; otherwise you end up being wasteful (not very green) with database queries.
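The practical takeaway can be sketched like this (a hedged sketch, not runnable as-is: `TrackingImport` is the model from the Stack Overflow excerpt, and the filter arguments are placeholders):

```python
# Inefficient: each subscript on an unevaluated QuerySet clones it, applies
# set_limits(k, k + 1) and runs its own SQL query; nothing gets cached.
recs = TrackingImport.objects.filter(...)
first = recs[0]    # one query
second = recs[1]   # another query

# Efficient: force one full evaluation first, so _result_cache is filled
# and every later subscript is a plain in-memory lookup.
recs = TrackingImport.objects.filter(...)
rows = list(recs)  # a single query; len(recs) would also fill the cache
first = rows[0]    # no additional queries
second = rows[1]
```

After the `list()` call, indexing `recs` itself is also cheap, since `__getitem__` finds `_result_cache` populated and returns `self._result_cache[k]` directly.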