Hacker Remix

Why Rust nextest is process-per-test

114 points by jicea 1 week ago | 35 comments

OptionOfT 1 week ago

I prefer per process over the alternatives.

When you write code you have the choice to do per process, per thread, or sequential.

The problem is that doing multiple tests in a shared space doesn't necessarily match the world in which this code is run.

Per process testing allows you to design a test that matches the usage of your codebase. Per thread already constrains you.

For example: we might elect to write a job as a process that runs on demand, and the library we use has a memory leak that can't be fixed in reasonable time. Since we write the job as a process that gets restarted, we manage to constrain the behavior.

Doing multiple tests in multiple threads might not work here, as there is a shared space that is retained and isn't representative of real-world usage.

Concurrency is a feature of your software that you need to code for. So if you have multiple things happening, then that should be part of your test harness.

The test harness forcing you to think of it isn't always a desirable trait.

That said, I have worked on a codebase where we discovered bugs because the tests were run in parallel, in a shared space.
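The kind of bug that surfaces this way can be sketched like this (an illustrative example, not from any real codebase): a test that resets and increments one shared counter passes when run sequentially or in its own process, but can fail when several copies interleave on threads in a shared address space.

```rust
// Two runs of this "test" share one process-wide counter. Run one at a
// time (or one per process) it always observes 1; run on concurrent
// threads, another test's increment can sneak in between the store and
// the load, and the assertion becomes flaky.
use std::sync::atomic::{AtomicU32, Ordering};

static COUNTER: AtomicU32 = AtomicU32::new(0);

fn test_increment_once() -> u32 {
    COUNTER.store(0, Ordering::SeqCst);
    COUNTER.fetch_add(1, Ordering::SeqCst);
    // expected to be 1 -- unless a concurrent test also touched COUNTER
    COUNTER.load(Ordering::SeqCst)
}

fn main() {
    // sequential runs always see 1
    assert_eq!(test_increment_once(), 1);
    assert_eq!(test_increment_once(), 1);
}
```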

sunshowers 7 days ago

There is definitely some value in shaking out bugs by running code in parallel in the same process space — someone on Lobsters brought this up too. I've been wondering if there's some kind of optional feature that can be enabled here.

NewJazz 7 days ago

Can you do it deliberately in the test?

sunshowers 7 days ago

Yes, you can write your own meta-test that runs all of the actual tests in separate threads. But it's a bit inconvenient and you won't necessarily get separate reporting.
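The meta-test idea can be sketched as below; the test functions are made-up stand-ins, and a real suite would list its own. Note how all results collapse into one pass/fail plus whatever the meta-test prints, which is the reporting downside mentioned above.

```rust
// A single "meta-test" entry point that runs each real test on its own
// thread and collects the names of the ones that panicked.
use std::thread;

fn check_addition() { assert_eq!(2 + 2, 4); }
fn check_strings() { assert!("nextest".contains("test")); }

fn run_all_on_threads() -> Vec<&'static str> {
    let tests: Vec<(&'static str, fn())> = vec![
        ("check_addition", check_addition),
        ("check_strings", check_strings),
    ];
    // spawn everything first so the tests actually overlap in time
    let handles: Vec<_> = tests
        .into_iter()
        .map(|(name, f)| (name, thread::spawn(f)))
        .collect();
    let mut failures = Vec::new();
    for (name, handle) in handles {
        // join() returns Err if the test thread panicked
        if handle.join().is_err() {
            failures.push(name);
        }
    }
    failures
}

fn main() {
    // both stand-in tests pass, so no failures are reported
    assert!(run_all_on_threads().is_empty());
}
```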

rad_gruchalski 6 days ago

> The problem is that doing multiple tests in a shared space doesn't necessarily match the world in which this code is run.

For me that's a positive bonus. If it runs multiple times in parallel and works, it will work as a single instance deployed in a pod somewhere.

OptionOfT 6 days ago

I get that, but if it doesn't work, do you spend the time on a use-case that doesn't exist?

sunshowers 5 days ago

Totally depends on if it's a use case you care about. Things like libraries vs leaf-node applications also play a factor.

o11c 6 days ago

A much better model still is a mixture.

* Use multiple processes, but multiple tests per process as well.

* Randomly split and order the tests on every run, to encourage catching flakiness. Print the seed for this as part of the test results for reproducibility.

* Tag your tests a lot (this is one place where, as many languages provide, "test classes" or other grouping is very useful). Smoke tests should run before all other tests, and all run in one process (though still in random order). Known long-running tests should be tagged to use a dedicated process and mostly start early (longest first), except that a few cores should be reserved to work through the fast tests so they can fail early.

* If you need to kill a timed-out test even though other tests are still running in the same process - just kill the process anyway, and automatically run the other tests again.

* Have the harness provide fixtures like "this is a temporary directory, you don't have to worry about clearing it on failure", so tests don't have to worry about cleaning up if killed. Actually, why not just randomly kill a few tests regardless?
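The "random order with a printed seed" point above can be sketched as follows. The xorshift generator here is only illustrative (a real harness would use a proper RNG crate); the point is that printing the seed makes any order-dependent failure reproducible.

```rust
// Deterministic Fisher-Yates shuffle driven by a seed: the same seed
// always produces the same test order, so a flaky ordering can be
// replayed by re-running with the seed printed in the results.
fn shuffle_with_seed<T>(items: &mut [T], seed: u64) {
    let mut state = seed.max(1); // xorshift state must be non-zero
    for i in (1..items.len()).rev() {
        // xorshift64 step
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        let j = (state % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }
}

fn main() {
    let seed = 42u64; // a real run would derive this from time/entropy
    println!("test order seed: {seed}");

    let mut order = vec!["test_a", "test_b", "test_c", "test_d"];
    shuffle_with_seed(&mut order, seed);

    // replaying with the printed seed reproduces the exact order
    let mut replay = vec!["test_a", "test_b", "test_c", "test_d"];
    shuffle_with_seed(&mut replay, seed);
    assert_eq!(order, replay);
}
```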

I wrote some more about tests here [1], but I'm not sure I'll update it any more because of GitHub's shitty 2FA-but-only-the-inconvenience-not-the-security requirement.

[1]: https://gist.github.com/o11c/ef8f0886d5967dfebc3d

cortesi 7 days ago

Nextest is one of the very small handful of tools I use dozens or hundreds of times a day. Parallelism can reduce test suite execution time significantly, depending on your project, and has saved me cumulative days of my life. The output is nicer, test filtering is nicer, leak detection is great, and the developer is friendly and responsive. Thanks sunshowers!

The one thing we've had to be aware of is that the execution model means there can sometimes be differences in behaviour between nextest and cargo test. Very occasionally there are tests that fail in cargo test but succeed in nextest due to better isolation. In practice this just means that we run cargo test in CI.

sunshowers 7 days ago

Thank you for the kind words!

The behavior differences mean some projects (like wgpu, and nextest itself) only support nextest these days. There's also support for setup scripts which can be used to pre-seed databases and stuff.
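For reference, a setup-script configuration lives in `.config/nextest.toml` and can look roughly like this. Setup scripts are an experimental nextest feature, so the exact keys may vary between versions (check the nextest docs); the script name, command, and filter below are made up for illustration.

```toml
# opt in to the experimental feature
experimental = ["setup-scripts"]

# hypothetical script that seeds a test database before matching tests run
[script.db-seed]
command = "cargo run -p db-seeder"

# attach the script to tests selected by a filter expression
[[profile.default.scripts]]
filter = "test(db)"
setup = "db-seed"
```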

marky1991 1 week ago

I don't understand why he jumps straight from 'one test per process' to 'one test per thread' as the alternative.

I'm not actually clear what he means by 'test' to be honest, but I assume he means 'a single test function that can either pass or fail'

Eg in python (nose)

    class TestSomething:
        def test_A(self): ...
        def test_B(self): ...

I'm assuming he means test_A. But why not run all of TestSomething in a process?

Honestly, I think the idea of having tests share state is bad to begin with (for things that truly matter, eg if the outcome of your test depends on the state of sys.modules, something else is horribly wrong), so I would never make this tradeoff to benefit a scenario that I think should never exist in the first place.

Even if we were being absolute purists, this still hasn't solved the problem: the second your process communicates with any other process (or server), the isolation breaks down. And that problem seems largely unsolvable, short of mocking.

Basically, I'm not convinced this is a good tradeoff, because the idea of creating thousands and thousands of processes to run a test suite, even on linux, sounds like a bad idea. (And at work, would definitely be a bad idea, for performance reasons)

saghm 1 week ago

I think most of the context that might explain your confusion is the way that tests work out of the box in Rust. The default test harness when invoking `cargo test` runs one test per thread (and by default parallelizes based on the number of cores available, although this is configurable with a flag). In Rust, there isn't any equivalent to the `TestSomething` class you give; each test is always a top-level function. Since `cargo nextest` is a mostly drop-in replacement for `cargo test`, I imagine the author is using one test per thread as an alternative because that's the paradigm that users will be switching from if they start using cargo nextest.
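To make that concrete, here is a runnable mimic of the layout (hypothetical names). With the built-in harness each of these would be a `#[test] fn` inside a `#[cfg(test)] mod`; the module contributes only a `something::` prefix to the test's name -- there is no class instance or shared `self` in sight.

```rust
// In Rust, every test is a free function; modules merely namespace them.
mod something {
    pub fn test_a() { assert_eq!(2 + 2, 4); }
    pub fn test_b() { assert!("abc".starts_with('a')); }
}

fn main() {
    // The harness invokes each function independently -- on its own
    // thread under `cargo test`, in its own process under cargo nextest.
    something::test_a();
    something::test_b();
}
```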

While enforcing no shared state in tests might be useful, that wouldn't be feasible in Rust without adding quite a lot of constraints that would be tough if not impossible to enforce in a drop-in replacement for cargo test. There's certainly room for alternatives in the testing ecosystem in Rust that don't try to maintain compatibility with the built-in test harness, but I don't think the intention of cargo nextest is to try to do that.

One other point that might not be obvious is that right now, there's no stable way to hook into Rust's libtest. The only options to provide an alternative testing harness in Rust are to either only support nightly rather than stable, break compatibility with tests written for the built-in test harness, or provide a separate harness that still supports existing tests. I'm sure there are arguments to be made for each of the other alternatives, but personally, I don't think there's any super realistic chance for adoption of anything that picks the first two options, so the approach cargo nextest is taking is the most viable one available (at least until libtest stabilizes, but it's not obvious when that will happen).

cbarrick 1 week ago

> I'm not actually clear what he means by 'test' to be honest, but I assume he means 'a single test function that can either pass or fail'

I assume so as well.

Unit testing in Rust is based around functions annotated with #[test], so it's safe to assume that when the author says "test" they are referring to one such function.

It's up to the user to decide what they do in each function. For example, you could do a Go-style table-driven test, but the entire function would be a single "test", _not_ one "test" per table entry.
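A table-driven test in that style can be sketched like this (with plain functions instead of `#[test]` so the sketch runs standalone; under the built-in harness `run_table` would be the body of a single `#[test] fn`):

```rust
// Go-style table-driven test: many table rows, but the harness sees the
// whole function as exactly one "test".
fn double(x: i32) -> i32 { x * 2 }

fn run_table() -> usize {
    let cases = [(0, 0), (1, 2), (-3, -6), (21, 42)];
    for (input, expected) in cases {
        // a failing row fails the whole function, not just that row
        assert_eq!(double(input), expected, "double({input})");
    }
    cases.len() // rows checked -- still just one test to the harness
}

fn main() {
    assert_eq!(run_table(), 4);
}
```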

sunshowers 7 days ago

FYI I use they/she pronouns (thanks jclulow!)

> Honestly, I think the idea of having tests have shared state is bad to begin with (for things that truly matter, eg if the outcome of your test depends on the state of sys.modules, something else is horribly wrong), so I would never make this tradeoff to benefit a scenario that I never think should be done.

I don't disagree as a matter of principle, but the reality really is different. Some of the first nextest users outside of myself and my workplace were graphical libraries and engines.

> Basically, I'm not convinced this is a good tradeoff, because the idea of creating thousands and thousands of processes to run a test suite, even on linux, sounds like a bad idea. (And at work, would definitely be a bad idea, for performance reasons)

With Rust or other native languages it really is quite fast. With Python I agree it's not as fast, so this tradeoff wouldn't make as much sense there, yes.

But note that things like test cancellation are a little easier to do in an interpreted model.

vlovich123 7 days ago

Have you considered keeping pre-warmed zygote processes hot that you then fork for each test, rather than launching each process from scratch? That might mitigate the performance issue even more, since less has to be initialized.

sunshowers 7 days ago

Zygote processes for the test binary itself, you mean? That will probably require some coordination from the test because fork is not compatible with most Rust code. Nextest is designed to work with arbitrary Rust test binaries, though it imposes some conditions on them.
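This is not how nextest is implemented internally -- just a sketch of the spawn-per-test shape it uses instead of fork(): each test gets a freshly spawned child process, and the runner only inspects the exit status. The `true`/`false` binaries stand in for per-test child invocations (Unix-only).

```rust
// Spawn a fresh process and report whether it exited successfully,
// the way a process-per-test runner treats each test invocation.
use std::process::Command;

fn run_in_fresh_process(program: &str) -> bool {
    Command::new(program)
        .status()
        .map(|status| status.success())
        .unwrap_or(false) // treat "couldn't even spawn" as a failure
}

fn main() {
    assert!(run_in_fresh_process("true"));   // exit code 0 => pass
    assert!(!run_in_fresh_process("false")); // exit code 1 => fail
}
```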

jclulow 1 week ago

NB: the author's pronouns are they/she, not he.