-
Notifications
You must be signed in to change notification settings - Fork 13k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The implementation of InPlaceIterable
for Flatten
&FlatMap
is unsound
#135103
Comments
Oof... looks like I didn't learn from #85322 :( Edit: PR at #135104 |
Right… either change something about the clone; or add some way to detect the after-clone state. I'm thinking the latter might be quite approachable. If there's a method added to … So that's one method like /// Conservaticely returns true whenever the iterator *may* contain extra items on the side
/// E.g. if the peeked element is `Some` in `Peekable`,
/// or either `frontiter` or `backiter` is `Some` for `Flatten`/`FlatMap`
fn has_extra_items(&self) -> bool E.g. as an easy first extra step, there can be a /// For optimization, `true`, if `has_extra_items` *never* returns `true`
const NEVER_HAS_EXTRA_ITEMS: bool; One can potentially get successively even more precise later… e.g. we consider if it's worth adding an additional field to Footnotes
|
Hrrm, you mean with fallback to the default collect implementation? Yeah, that would work but I'm concerned that it'd lead to code bloat. Currently the path is chosen at compile time, a runtime fallback would require both paths to exist. I'm thinking about specializing clone calls to pass down an upper bound of how many empty slots it'd have to preserve (if any), similar to the
That could work for |
I guess that's a valid concern.
Something like a |
Not certain, but it might work. |
Looks like |
…flatten, r=Mark-Simulacrum do not in-place-iterate over flatmap/flatten The implementation is unsound when a partially consumed iterator has some elements buffered in the front/back parts and cloning the Iterator removes the capacity from the backing vec::IntoIter. This is a fix for rust-lang#135103 that removes the specialization trait impls without removing some supporting parts. I've kept it small so it can be easily backported. I'll either remove the remaining parts or think of a way to recover the optimization in a separate PR.
And doing it through a helper trait runs into the issue that it'll require a blanket impl and then specializing that, which requires full specialization for const items, not min_specialization. |
Rollup merge of rust-lang#135104 - the8472:disable-in-place-iter-for-flatten, r=Mark-Simulacrum do not in-place-iterate over flatmap/flatten The implementation is unsound when a partially consumed iterator has some elements buffered in the front/back parts and cloning the Iterator removes the capacity from the backing vec::IntoIter. This is a fix for rust-lang#135103 that removes the specialization trait impls without removing some supporting parts. I've kept it small so it can be easily backported. I'll either remove the remaining parts or think of a way to recover the optimization in a separate PR.
@the8472 I came up with a possible way to attach a [notably this should work for supporting OR-like logic for This approach is a bit verbose, but I found a way to get by with minimal usage of a macro, to make it less ugly. It works without even requiring any nightly features at all. It does restrict the setting to only use std::{mem, num::NonZeroUsize, ptr::NonNull};
trait Iterator {
// (implementation elided here for brevity)
inplace_iterable_defaults!();
type Item;
fn next(&mut self) -> Option<Self::Item>;
}
// ####################################################
// # normal iterators still work #
// ####################################################
impl<I: ?Sized + Iterator> Iterator for Box<I> {
type Item = I::Item;
fn next(&mut self) -> Option<Self::Item> {
(**self).next()
}
}
fn objects_work(mut x: Box<dyn Iterator<Item = i32>>) {
while let Some(_i) = x.next() {
// TODO
}
}
// ####################################################
// # relatively small relevant API #
// ####################################################
// type-erased struct containing pointers from a `vec::IntoIter`
// only for the non-ZST case. Values like `cap` are NOT byte-counts,
// but understood in terms of an item-size that is communicated *separately*
// (in `struct InplaceIterable`)
struct Source {
buf: NonNull<()>,
cap: usize,
// TODO: add more fields as needed
}
// automatically implemented for all iterators
trait GetInplaceIterable<const ALIGN: usize>: Sized {
const INPLACE_ITERABLE: Option<InplaceIterable<ALIGN, Self>>;
}
struct InplaceIterable<const ALIGN: usize, This> {
get_source: fn(&This) -> Source,
size: NonZeroUsize,
expand: NonZeroUsize,
merge: NonZeroUsize,
}
// and a macro
// inplace_iterable!(Type[Params][Constraints] => ident);
// ####################################################
// # EXAMPLE 1: Map<F, I> #
// ####################################################
struct Map<F, I>(F, I);
impl<F: Fn(I::Item) -> R, R, I: Iterator> Iterator for Map<F, I> {
inplace_iterable!(Map[F I][I: Iterator] => inplace_iterable_map);
type Item = R;
fn next(&mut self) -> Option<Self::Item> {
todo!()
}
}
const fn inplace_iterable_map<const ALIGN: usize, F, I>(
) -> Option<InplaceIterable<ALIGN, Map<F, I>>>
where
I: GetInplaceIterable<ALIGN>,
{
let Some(i) = I::INPLACE_ITERABLE else {
return None;
};
Some(InplaceIterable {
get_source: |this| (I::INPLACE_ITERABLE.unwrap().get_source)(&this.1),
size: i.size,
expand: i.expand,
merge: i.merge,
})
}
// ####################################################
// # EXAMPLE 2: vec::IntoIter #
// ####################################################
trait Allocator {}
struct VecIntoIter<T, A: Allocator>(T, A);
impl<T, A: Allocator> Iterator for VecIntoIter<T, A> {
inplace_iterable!(VecIntoIter[T A][A: Allocator] => inplace_iterable_vec);
type Item = T;
fn next(&mut self) -> Option<Self::Item> {
todo!()
}
}
const fn inplace_iterable_vec<const ALIGN: usize, T, A: Allocator>(
) -> Option<InplaceIterable<ALIGN, VecIntoIter<T, A>>> {
if mem::align_of::<T>() != ALIGN {
return None;
}
let Some(size) = NonZeroUsize::new(mem::size_of::<T>()) else {
return None;
};
Some(InplaceIterable {
get_source: |_this| Source {
buf: NonNull::dangling(), // TODO
cap: 0, // TODO
},
size,
expand: NonZeroUsize::MIN,
merge: NonZeroUsize::MIN,
})
}
// ####################################################
// # EXAMPLE 3: Zip #
// ####################################################
struct Zip<A, B>(A, B);
impl<A: Iterator, B: Iterator> Iterator for Zip<A, B> {
inplace_iterable!(Zip[A B][A: Iterator, B: Iterator] => inplace_iterable_zip);
type Item = (A::Item, B::Item);
fn next(&mut self) -> Option<Self::Item> {
todo!()
}
}
const fn inplace_iterable_zip<const ALIGN: usize, A, B>(
) -> Option<InplaceIterable<ALIGN, Zip<A, B>>>
where
A: GetInplaceIterable<ALIGN>,
B: GetInplaceIterable<ALIGN>,
{
let ab = (A::INPLACE_ITERABLE, B::INPLACE_ITERABLE);
let mut choose_left = true;
'choice: {
// tie-breaker when both iterators support in-place iteration
// with an appropriate alignment
//
// the value of `choose_left` will be ignored in the `match` below
// if `ab` isn't `Some(_), Some(_)`,
// but this way of structuring the code avoids repeating `match` arms
let (Some(a), Some(b)) = ab else {
break 'choice;
};
// when the multiplications fail, values have gotten very large
// we fall back to arbitrarily choosing the left side, though
// in-place iteration might be unusable down the line anyway,
// and this case is uncommon to begin with
let Some(pro_a) = a.size.checked_mul(a.merge) else {
break 'choice;
};
let Some(pro_b) = b.size.checked_mul(b.merge) else {
break 'choice;
};
let Some(a_advantage) = pro_a.checked_mul(b.expand) else {
break 'choice;
};
let Some(b_advantage) = pro_b.checked_mul(a.expand) else {
break 'choice;
};
// choose whichever side promises more free space per item consumed
if b_advantage.get() > b_advantage.get() {
choose_left = false;
} else if a_advantage.get() == b_advantage.get() {
// on a tie:
// larger item size makes it more likely the written-back
// values divide the existing capacity without needing re-alloc
choose_left = a.size.get() >= b.size.get();
}
// else we default to the left side
};
Some(match (ab, choose_left) {
((None, None), _) => return None,
((Some(a), None), _) | ((Some(a), _), true) => InplaceIterable {
get_source: |this| (A::INPLACE_ITERABLE.unwrap().get_source)(&this.0),
size: a.size,
expand: a.expand,
merge: a.merge,
},
((_, Some(b)), _) => InplaceIterable {
get_source: |this| (B::INPLACE_ITERABLE.unwrap().get_source)(&this.1),
size: b.size,
expand: b.expand,
merge: b.merge,
},
})
} (playground see the full implementation here) Rough idea how it works: Since neither So the method effectively does define an associated type with a default: its return-type! And that return type can then carry the |
Awesome. That's even on the more readable side of type system crime spectrum, with the comments at least 😆 As-is
Hrrm, with This would be a fairly big refactoring and I'm not even sure yet it'd work (but dealing with Zip is a good sign, that's one of the more complicated adapters). I'll work a narrower fix for the clone problem before trying this. |
(playground)
The above code is analogous to #85322 but applied to
Flatten
instead ofPeekable
: cloning the whole iterator doesn't preserve capacity in the innervec::IntoIter
. (This also applies toFlatMap
.)Introduced in 1.76
cc @the8472, #110353
@rustbot label T-libs, I-unsound, A-iterators
The text was updated successfully, but these errors were encountered: