- Feature Name: retain_iterator
- Start Date: 10/6/2016
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)
New API for std::collections::HashMap
and std::collections::HashSet
. Modification to the retain
API for std::vec::Vec
and std::collections::vec_deque::VecDeque
.
For HashMap
:
/// Retains only the elements specified by the predicate.
///
/// In other words, remove all key-value pairs `(k, v)` such that `predicate(&k, &v)` returns false.
/// Removed key-value pairs are yielded by the returned iterator.
fn retain<P>(&mut self, predicate: P) -> Retain<K, V, P>
where P: FnMut(&K, &V) -> bool;
fn retain_mut<P>(&mut self, predicate: P) -> RetainMut<K, V, P>
where P: FnMut(&K, &mut V) -> bool;
For HashSet
:
/// Retains only the elements specified by the predicate.
///
/// In other words, remove all elements `e` such that `predicate(&e)` returns false.
/// Removed elements yielded by the returned iterator.
fn retain<P>(&mut self, predicate: P) -> Retain<T, P>
where P: FnMut(&T) -> bool;
fn retain_mut<P>(&mut self, predicate: P) -> RetainMut<T, P>
where P: FnMut(&mut T) -> bool;
For Vec
and VecDeque
:
/// Retains only the elements specified by the predicate.
///
/// In other words, remove all elements `e` such that `predicate(&e)` returns false.
/// This method operates in place and preserves the order of the retained
/// elements.
/// Removed elements yielded by the returned iterator.
fn retain<P>(&mut self, predicate: P) -> Retain<T, P>
where P: FnMut(&T) -> bool;
fn retain_mut<P>(&mut self, predicate: P) -> RetainMut<T, P>
where P: FnMut(&mut T) -> bool;
Specifically I've run into the following scenario, I have two HashMap
, one of ready_items
and one of unready_items
. Every once in a while, I want to go through all the unready_items
, remove all the ones that are now ready, and add them to ready_items
. Right now the only ways of doing that involve a temporary collection of some sort.
Option 1: Drain and reinsert
let new_unready: Vec<_> = unready_items.drain().filter_map(
|(k, v)| {
if v.is_ready() {
ready_items.insert(k, v);
None
} else {
Some((k, v))
}
}
).collect();
for (k, v) in new_unready { unready_items.insert(k, v); }
Option 2: Filter and remove
let new_ready = Vec<_> unready_items.iter().filter_map(
|(k, v)| if v.is_ready() { Some(k.clone()) } else { None }
).collect();
for k in new_ready {
let v = unready_items.remove(k).unwrap();
ready_items.insert(k, v);
}
On top of the temporary collection, Option 1 requires reinserting all values which were supposed to be retained, and Option 2 requires copying keys, both logically unnecessary and potentially expensive operations.
The new API avoids all of these issues.
for (k, v) in unready_items.retain(|_, v| !v.is_ready()) {
ready_items.insert(k, v);
}
For retain_mut
, as we already have have mutable ownership of the buffer, there does not seem to be a reason not to pass &mut
. As for why not just a single function that always does &mut
, this is done to minimize breakage with the current Vec
API.
The motivation to return an iterator is similar to the above reasons for HashMap
and HashSet
. Also, to keep retain
functionality consistent across containers.
The first step is to duplicate most of the boilerplate for existing iterator functions (e.g. drain
). We then define our table::Retain
struct as follows with some simple trait implementations:
pub struct Retain<'a, K: 'a, V: 'a, P: FnMut(&'a K, &'a V) -> bool> {
table: Shared<RawTable<K, V>>,
predicate: P,
/// Something to iterate through the table
iter: ...,
marker: marker::PhantomData<&'a RawTable<K, V>>,
}
unsafe impl<'a, K: Sync, V: Sync, P: FnMut(&'a K, &'a V) -> bool> Sync for Retain<'a, K, V, P> {}
unsafe impl<'a, K: Send, V: Send, P: Send+FnMut(&'a K, &'a V) -> bool> Send for Retain<'a, K, V, P> {}
impl<'a, K: 'a, V: 'a, P: FnMut(&'a K, &'a V) -> bool> Drop for Retain<'a, K, V, P> {
fn drop(&mut self) {
for _ in self {}
}
}
For the implementation of Iterator
, the simplest option just uses shift
after every removal. However this option is not very performant. Worst case, requires O(n²)
shifts. But, by simply reversing our iteration order, and starting at an empty bucket, this problem can be eliminated.
To accomplish this, we'll make a new helper iterator type (drain
will also probably need to use this):
struct RevRawBuckets<'a, K: 'a, V: 'a> {
raw: RawBucket<K, V>,
hashes_start: *const HashUint,
end: RawBucket<K, V>,
elems_left: usize,
marker: marker::PhantomData<&'a (K, V)>,
}
impl<K, V> RawTable<K, V> {
fn rev_raw_buckets(&mut self) -> RevRawBuckets<K, V> {
let mut raw = self.first_bucket_raw();
let end = unsafe { raw.offset(self.capacity as isize); }
// Find the first empty bucket
if self.size > 0 {
while let Full(_) = raw.peek() {
debug_assert!(raw.hash != end.hash);
raw = unsafe { raw.offset(1) };
}
}
RevRawBuckets {
raw: raw,
hashes_start: self.hashes,
end: end,
elems_left: self.size,
marker: marker::PhantomData,
}
}
}
impl<'a, K, V> Iterator for RevRawBuckets<'a, K, V> {
type Item = RawBucket<K, V>;
fn next(&mut self) -> Option<RawBucket<K, V>> {
if self.elems_left == 0 {
return None;
}
loop {
if self.raw.hash == self.hashes_start {
self.raw = self.end;
}
self.raw = unsafe { self.raw.offset(-1) };
if *self.raw.hash != EMPTY_BUCKET {
self.elems_left -= 1;
return Some(self.raw);
}
}
}
}
impl<'a, K, V> ExactSizeIterator for RevRawBuckets<'a, K, V> {
fn len(&self) -> usize { self.elems_left }
}
We can now define the type of the field Retain.iter
to be RevRawBuckets<'a, K, V>
, which lets us define the constructor and Iterator
implementation:
impl<K, V> RawTable<K, V> {
pub fn retain<P: FnMut(&'a K, &'a V) -> bool>(&mut self, predicate: P) -> Retain<K, V, P> {
Retain {
table: unsafe { Shared::new(self) },
predicate: predicate,
iter: self.rev_raw_buckets(),
marker: marker::PhantomData,
}
}
}
impl<'a, K, V, P: FnMut(&'a K, &'a mut V) -> bool> Iterator for Retain<'a, K, V, P> {
type Item = (SafeHash, K, V);
fn next(&mut self) -> Option<(SafeHash, K, V)> {
while let Some(bucket) = self.iter.next() {
let pair_mut = raw.pair as *mut (K, V);
if !(self.predicate)(&(*pair_mut).0, &mut (*pair_mut).1) {
// Extract the values from the table
let (hash, (k, v)) = unsafe {
(**self.table).size -= 1;
(ptr::replace(bucket.hash, EMPTY_BUCKET), ptr::read(bucket.pair))
}
let hash = SafeHash { hash: hash };
// Shift to restore the invariant
let empty = EmptyBucket {
raw: bucket,
idx: bucket.hash as usize - table.hashes_start as usize,
table: &self.table,
};
let mut gap = match empty.gap_peek() {
Some(b) => b,
None => return Some((hash, k, v)),
};
while gap.full().displacement() != 0 {
gap = match gap.shift() {
Some(b) => b,
None => break,
};
}
return Some((hash, k, v));
}
}
None
}
fn size_hint(&self) -> (usize, Option<usize>) {
(0, Some(self.iter.len()))
}
}
Replace the existing retain
function with a version which sets the length to 0
to start with and returns an iterator object Retain
. This is done for much the same reason it's done in drain
, to ensure that removed items are not accessible if Drop
is not called.
The Retain
object implements Sync
, Send
, and Drop
in much the same ways that Drain
does.
pub struct Retain<'a, T: 'a, P: FnMut(&T) -> bool> {
/// Iterator to track current position in the
iter: Enumerate<slice::Iter<'a, T>>,
vec: Shared<Vec<T>>,
predicate: P,
}
#[stable(feature = "drain", since = "1.6.0")]
unsafe impl<'a, T: Sync, P: FnMut(&T) -> bool> Sync for Drain<'a, T> {}
#[stable(feature = "drain", since = "1.6.0")]
unsafe impl<'a, T: Send: P: Send + FnMut(&T) -> bool> Send for Drain<'a, T> {}
pub fn retain<P: FnMut(&T) -> bool>(&mut self, predicate: P) -> Retain<T> {
unsafe {
// Use the borrow in the IterMut to indicate borrowing behavior of the
// whole Retain iterator (like &mut T).
let range_slice = slice::from_raw_parts_mut(self.as_mut_ptr(), self.len());
// set self.vec length's to 0, to be safe in case Retain is leaked
self.set_len(0);
Retain {
iter: range_slice.iter().enumerate(),
vec: Shared::new(self as *mut _),
predicate: predicate,
}
}
}
Unlike the current retain
implementation, which uses swap
, we will be returning items each iteration, so our Iterator
implementation loops through checking if items satisfy the predicate
. Items which fail, are returned by value, items which succeed are written to the correct index in the buffer, and the vec size is updated. Like the current implementation of retain
, items which satisfy the predicate
before any deletions take place don't incur a write penalty, as they're already in the correct index in the buffer.
impl<'a, T> Iterator for Retain<'a, T> {
type Item = T;
#[inline]
fn next(&mut self) -> Option<T> {
while let Some((i, v)) = self.next() {
let len = self.vec.len();
if !self.predicate(v) {
let v = unsafe { ptr::read(v as *const _) };
return Some(v);
} else if i > len {
unsafe {
let dst = self.vec.get_unchecked_mut(len);
ptr::copy_nonoverlapping(dst as *mut T, v as *const T, 1);
}
}
unsafe { self.vec.set_len(len + 1); }
}
}
fn size_hint(&self) -> (usize, Option<usize>) {
(0, self.iter.len())
}
}
- More APIs, more complexity.
- Technically breaks
mem::forget(vec.retain(bar))
, as currentlyretain
returns()
. I seriously doubt anyone is doing this. - Like
iter
anddrain
, we expose the HashMap iteration order and thus enable O(n²) blowup.- A potential mitigation is to randomly alternate between 2+ disjoint iterators, though the performance cost might be too high.
- As always, do nothing.
- Merge
retain
andretain_mut
.
- Is it worth the duplication of
retain_mut
to avoid breakages? Would it break any existing code?