Skip to content

Instantly share code, notes, and snippets.

@cjolowicz
Last active July 2, 2021 14:40
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save cjolowicz/1c60b784dd010450fd2c2a478ea3bb49 to your computer and use it in GitHub Desktop.
Save cjolowicz/1c60b784dd010450fd2c2a478ea3bb49 to your computer and use it in GitHub Desktop.

This is now on PyPI and GitHub:

Lazy sequences

A lazy sequence makes an iterator look like an immutable sequence:

from lazysequence import lazysequence

def load_records():
    yield from [1, 2, 3, 4, 5, 6]  # pretend each iteration is expensive

records = lazysequence(load_records())
if not records:
    raise SystemExit("no records found")

first, second = records[:2]

print("The first record is", first)
print("The second record is", second)

for record in records.release():  # do not cache all records in memory
    print("record", record)

Sometimes you need to peek ahead at items returned by an iterator. But what if later code needs to see all the items from the iterator? Then you have some options:

  1. Pass any consumed items separately. This can get messy, though.
  2. Copy the iterator into a sequence beforehand, if that does not take a lot of space or time.
  3. Duplicate the iterator using itertools.tee, or write your own custom itertool that buffers consumed items internally. There are some good examples of this approach on SO, by Alex Martelli, Raymond Hettinger, and Ned Batchelder.

A lazy sequence combines advantages from option 2 and option 3. It is an immutable sequence that wraps the iterable and caches consumed items in an internal buffer. By implementing collections.abc.Sequence, lazy sequences provide the full set of sequence operations on the iterable. Unlike a copy (option 2), but like a duplicate (option 3), items are only consumed and stored in memory as far as required for any given operation.

There are some caveats:

  • The lazy sequence will eventually store all items in memory. If this is a problem, use s.release() to obtain an iterator over the sequence items without further caching. After calling this function, the sequence should no longer be used.
  • Explicit is better than implicit. Clients may be better off being passed an iterator and dealing with its limitations. For example, clients may not expect len(s) to incur the cost of consuming the iterator to its end.
"""Lazy sequences."""
from collections.abc import Sequence
class LazySequence(Sequence):
def __init__(self, iterable):
self._iter = iter(iterable)
self._cache = []
def __iter__(self):
yield from self._cache
for item in self._iter:
self._cache.append(item)
yield item
def __len__(self):
return sum(1 for _ in self)
def __getitem__(self, index):
if isinstance(index, slice):
return LazySequence(
self[position] for position in range(*index.indices(len(self)))
)
if index < 0:
index += len(self)
for position, item in enumerate(self):
if index == position:
return item
raise IndexError("LazySequence index out of range")
"""Unit tests for lazysequence."""
import pytest
from lazysequence import LazySequence
def test_init():
"""It is created from an iterable."""
LazySequence([])
def test_len():
"""It returns the number of items."""
s = LazySequence([])
assert 0 == len(s)
def test_getitem():
"""It returns the item at the given position."""
s = LazySequence([1])
assert 1 == s[0]
def test_getitem_second():
"""It returns the item at the given position."""
s = LazySequence([1, 2])
assert 2 == s[1]
def test_getitem_negative():
"""It returns the item at the given position."""
s = LazySequence([1, 2])
assert 2 == s[-1]
def test_getitem_past_cache():
"""It returns the item at the given position."""
s = LazySequence([1, 2])
assert (1, 2) == (s[0], s[1])
def test_getslice():
"""It returns the items at the given positions."""
s = LazySequence([1, 2])
[item] = s[1:]
assert 2 == item
def test_outofrange():
"""It raises IndexError."""
s = LazySequence([])
with pytest.raises(IndexError):
s[0]
def test_bool():
"""It is False for an empty sequence."""
s = LazySequence([])
assert not s
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment