This article addresses an issue alluded
to
in my previous one. Namely, we sometimes want to break out of the iteration
within fold
, but it is not easy to do that in many languages.
fold
has a problem with short-circuiting.
In defining any
and all
by using fold
, as we did in the previous
article,
we did not short-circuit. But we’d get a better constant factor for our time
complexity if we can break
from those methods once we know the final
outcome.1 Let’s see if we can do this with Ruby (using
the built-in reduce
method):
# `any` using `reduce`; we're going to skip the actual function definition as I
# don't want to distract with Ruby's weird blocks and lambdas.
!![false, true, false, false, false].reduce(false) do |a, e|
puts e ? "Breaking" : "Continuing"
break true if e
a # We'd actually use `each_with_object`, but that's not the point here.
end
# Continuing
# Continuing
# Continuing
# Continuing
# Continuing
# => false
!![false, true, false, false, false].reduce(false) do |a, e|
puts e ? "Breaking" : "Continuing"
break true if e
a
end
# Continuing
# Breaking
# => true
# "Breaking" + no more output means short circuiting. Voila!
In other words, Ruby gives us a version of fold
that allows us to
short-circuit the looping. This isn’t possible in all languages, however.
Consider this JavaScript:
const fold = (array, fn, initialAccumulator) => {
let accumulator =
typeof initialAccumulator === "undefined" ? [] : initialAccumulator;
for (const element of array) {
accumulator = fn(accumulator, element);
}
return accumulator;
};
const any = (array, fn) => {
return fold(
array,
(truthFlag, element) => {
if (fn(element)) {
break true; // This isn't valid syntax.
} else {
return truthFlag;
}
},
false,
);
};
In Ruby, putting break
in a block is pretty powerful. It causes a return
from the function that yield
ed to that block. In other words, it reaches
outside of its own scope and causes the function that called it to return.
What’s more, break
is an expression that can make that outer function
return the value you give it: break true
, for example. If this seems subtle
and in the weeds, the takeaway is that Ruby lets you hit the eject button
from way down inside the scope of the thing2 passed to reduce
during one particular iteration of that function, reach up to the reduce
itself, and make the whole reduce
bail out.
That simply isn’t possible in JavaScript as a break
doesn’t have the power
to cause an outer function to return. The JavaScript interpreter looks at the
break
and thinks we must be talking about the anonymous function it’s
inside of, the one that begins (accumulator, element) =>
. And it says,
“What do you mean, break
? I don’t see any loop inside my scope!” The
interpreter doesn’t take any notice of the fact that the anonymous function
is called from inside a loop. That’s a different scope, and as I said,
JavaScript doesn’t let something inside the anonymous function pop its head
out of its own scope and then tinker with the control flow of the loop it’s
situated inside of, namely the fold
itself.
Think of it this way: imagine we wrote the function separately, like this:
// Assume `fold`, as defined above, is in scope.
const anyHelper = fn => {
// Return a function appropriate for passing to `fold`. We will need to curry
// `anyHelper`, closing over `fn`, so that it matches the function signature
// expected by `fold`'s second argument.
return (truthFlag, element) => {
if (fn(element)) {
break true; // Invalid syntax. But imagine this worked for now.
} else {
return truthFlag;
}
};
};
const any = (array, fn) => {
const curriedAnyHelper = anyHelper(fn);
return fold(array, curriedAnyHelper, false);
};
console.log(any([true, false, false], x => x));
But now somebody comes along and tries to call anyHelper
directly.
anyHelper(x => x)(false, true);
That’s a weird thing to do, to be sure; it’s a pretty useless function
outside of fold
. But let’s go with it. Now break
makes absolutely no
sense. And we don’t really want to have function syntax be valid or invalid
based on its enclosing scope.3 There are several
other considerations weighing against this.4
There are ways to work around that problem.
There are at least two ways to have the effect of breaking out of fold
in
languages where that’s not directly possible. The itertools
library teaches one way, the
fold_while
method. Then the Rust standard library provided a second
approach: try_fold
(which actually led to itertools
’s deprecating
fold_while
, though I think fold_while
is still the best approach in some
circumstances).
fold_while
wraps the accumulator in a Continue
or Done
to interact with looping.
So that we can stick with JavaScript, I will define fold_while
, stolen from
Rust’s itertools
concept, in JS. The upshot is that we will wrap the
accumulator in Continue
or Done
. Before the loop each iterates each time,
it checks to see which wrapper the previous iteration returned. If it’s a
Done
, the loop breaks.
// Our interface for `Continue` and `Done` will wrap a value that can be
// unwrapped with `intoInner`, and also report whether we should continue or
// not.
interface IFoldWhile<T> {
intoInner: () => T;
shouldContinue: () => boolean;
}
// Implement the shared part of our functionality.
abstract class FoldWhileBaseContainer<T> implements Partial<IFoldWhile<T>> {
private value: T;
constructor(value: T) {
this.value = value;
}
intoInner() {
return this.value;
}
}
class FoldWhileContinue<T> extends FoldWhileBaseContainer<T>
implements IFoldWhile<T> {
shouldContinue() {
return true;
}
}
class FoldWhileDone<T> extends FoldWhileBaseContainer<T>
implements IFoldWhile<T> {
shouldContinue() {
return false;
}
}
// Convenience factory helper thingies
const Continue: <T>(value: T) => IFoldWhile<T> = <T>(value: T) =>
new FoldWhileContinue(value);
const Done: <T>(value: T) => IFoldWhile<T> = <T>(value: T) =>
new FoldWhileDone(value);
// Here's the actual `fold` function we're defining. The preceding has been
// setup.
const foldWhile = <T, U>(
array: T[],
fn: (acc: U, el: T) => IFoldWhile<U>,
initialAccumulator: U,
) => {
let accumulator = Continue(
typeof initialAccumulator === "undefined" ? [] : initialAccumulator,
);
for (const element of array) {
// Are we supposed to `Continue`? If we get a `FoldWhileContinue`-wrapped
// value from the previous iteration, we continue with the normal `fold`.
if (accumulator.shouldContinue()) {
const continueValue = accumulator.intoInner() as U;
accumulator = fn(continueValue, element);
// Or are we supposed to be `Done`? This is the key to short-circuiting.
// If we get a `FoldWhileDone`-wrapped value, we `break`.
} else {
break;
}
}
return accumulator;
};
// Now we can define a short-circuiting `any` based on `foldWhile`.
const any = <T>(array: T[], fn: <T>(value: T) => boolean) => {
return foldWhile(
array,
(truthFlag, element) => {
console.log("Running");
return fn(element) ? Done(true) : Continue(false);
},
false,
).intoInner();
};
any([false, false, false, false, true], x => !!x);
// Running
// Running
// Running
// Running
// Running
// => true
any([true, false, false, false, false], x => !!x);
// Running
// => true
The function passed to fold
is no longer trying to alter control flow
directly. Instead, it is passing a message up to foldWhile
, and
foldWhile
is the thing actually calling break
. It’s no longer something
like a goto
; it’s now a message being passed.
We can do the same kind of thing for our “game” from the previous
post,
in which you must collect one tinyBoomer
or two megaBoomers
to win, a
nothing
does nothing and a whammy
makes you lose, assuming you haven’t
yet won.
const tinyBoomer = Symbol("tinyBoomer");
const megaBoomer = Symbol("megaBoomer");
const nothing = Symbol("nothing");
const whammy = Symbol("whammy");
const isWinningGame = gameTape => {
return foldWhile(
gameTape,
(status, event) => {
if (!(status.hasWon || status.hasLost)) {
switch (event) {
case whammy:
status.hasLost = true;
break;
case megaBoomer:
status.hasWon = true;
break;
case tinyBoomer:
status.tinyBoomerCount += 1;
if (status.tinyBoomerCount >= 2) status.hasWon = true;
}
}
return gameOver(status) ? Done(status) : Continue(status);
},
{
hasWon: false,
hasLost: false,
tinyBoomerCount: 0,
},
).intoInner().hasWon;
};
const winner = [tinyBoomer, nothing, nothing, tinyBoomer, whammy];
const loser = [tinyBoomer, nothing, whammy, tinyBoomer, megaBoomer];
isWinningGame(winner);
isWinningGame(loser);
// => true
// => false
This kind of approach is probably the best we can do in JavaScript. But
Rust’s standard library now has a built-in method for breaking out of fold
.
Rust’s try_fold
gives us a way to break folding using the standard library.
Rust’s standard
library
now exposes a try_fold
method on iterators. It automatically breaks when
the closure passed to it returns a None
value. The only problem that raises
is that None
can’t wrap a return value like the itertools
library’s
Done
variant, and so we can’t do the equivalent of Ruby’s break foo
, in
which the syntax we use to cause the looping to quit is capable of returning
a value. That’s why I think itertools
shouldn’t deprecate fold_while
.
Anyway, we can use try_fold
to define any
, as shown below, though it’s
a kind of ugly, awkward solution. As mentioned, try_fold
keeps looping as
long as the accumulator is a Some
type. It breaks the first time it gets a
None
. So we could probably skip the true
s and false
s altogether, and
the fact that None
means true
is confusing. We really are abusing the
method at this point.
// We would actually make this generic over any iterable, but for simplicity's
// sake...
fn any<T>(arr: &Vec<T>, func: fn(&T) -> bool) -> bool {
arr.iter().try_fold(false, |truth_flag, element| {
println!("Running");
if func(element) {
// We need `None` to break iteration, but it actually represents a
// `true` result, which is confusing.
None
} else {
// We don't really need the accumulator at all.
Some(truth_flag)
}
}).is_none()
}
fn main() {
let not_short_circuitable = vec![false, false, false, false, true];
let short_circuitable = vec![true, false, false, false, false];
assert!(any(¬_short_circuitable, |x| *x));
assert!(any(&short_circuitable, |x| *x));
}
This any
stuff was all to prove a point; don’t actually define any
with fold
.
The best way to define any
in a language that uses iteration is something
like this:
const any = (array, fn) => {
for (element of array) {
if (fn(element)) return true;
}
return false;
};
The best way to define any
in a language that uses recursion is something
like this:
const any = ([head, ...tail], fn) =>
head === undefined ? false : fn(head) || any(tail, fn);
❧ ❧ ❧
So what was the point of all that? Well, it took me awhile to work out why
you couldn’t use break
in Rust’s fold
method and JavaScript’s reduce
method. It helps me to consolidate my understanding of things to explain them
to others, as I have tried to do here.
Also, I do reach for fold
a lot, but the inability to break
tends
to be a sign that a different abstraction might be preferable.
That said, I think something like my implementation of isWinningGame
is the
best way to solve that kind of problem, and there break
ing is an important
part of the implementation. I would simply extract a struct or class to
represent the complex “flag” used as the accumulator. People can certainly
disagree with that approach, but I think there’s a lot of value in keeping
the instance of the flag integral to the looping construct outside of which
the flag makes no sense.
For
any
that means once we encounter the first value that causes the anonymous function to return a truthy value, and forall
that means the first value that causes the anonymous function to return a falsey value. ↩︎I use thing advisedy: it’s important that it’s a block rather than a function in Ruby. ↩︎
That’s different than simply introducing a bug based on something going on in the enclosing scope. We can easily do that with a function that closes over an undefined variable or a variable of the wrong type. But that’s a runtime issue. Again, it’s not that the syntax became invalid based on a runtime binding. ↩︎
First, how far up the call stack should
break
be allowed to traverse? Sure, we might say “only one level,” but now we can’t extract a sub-helper out of the function passed tofold
. Second, we are fundamentally messing with agoto
-like construct which is Something to Worry About. (I am aware of JavaScriptbreak
statements withlabel
s, and see how that could help address some of these issues I’m raising. The main reason for not allowingbreak
is, I think, stated in the main text.) ↩︎