114 points by jicea 1 week ago | 35 comments
OptionOfT 1 week ago
When you write code you have the choice of a per-process, per-thread, or sequential design.
The problem is that doing multiple tests in a shared space doesn't necessarily match the world in which this code is run.
Per-process testing allows you to design a test that matches the usage of your codebase. Per-thread already constrains you.
For example: we might elect to write a job as a process that runs on demand, where a library we use has a memory leak that can't be fixed in a reasonable time. Since the job runs as a process that gets restarted, we manage to contain the leak.
Running multiple tests in multiple threads might not work here, as there is a shared space that is retained across tests and isn't representative of real-world usage.
Concurrency is a feature of your software that you need to code for. So if you have multiple things happening, then that should be part of your test harness.
The test harness forcing you to think of it isn't always a desirable trait.
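To make that concrete, here is a minimal sketch of a test that makes concurrency part of the test itself, instead of relying on the harness happening to run unrelated tests in parallel. The Counter type and the numbers are invented for illustration:

    use std::sync::{Arc, Mutex};
    use std::thread;

    // Invented stand-in for a shared component under test.
    #[derive(Default)]
    struct Counter(Mutex<u64>);

    impl Counter {
        fn bump(&self) {
            *self.0.lock().unwrap() += 1;
        }
        fn get(&self) -> u64 {
            *self.0.lock().unwrap()
        }
    }

    // The concurrency is exercised explicitly inside this one test.
    #[test]
    fn counter_survives_contention() {
        let counter = Arc::new(Counter::default());
        let handles: Vec<_> = (0..8)
            .map(|_| {
                let c = Arc::clone(&counter);
                thread::spawn(move || {
                    for _ in 0..1_000 {
                        c.bump();
                    }
                })
            })
            .collect();
        for h in handles {
            h.join().unwrap();
        }
        assert_eq!(counter.get(), 8 * 1_000);
    }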
That said, I have worked on a codebase where we discovered bugs because the tests were run in parallel, in a shared space.
rad_gruchalski 6 days ago
For me that's a positive bonus. If it runs multiple times in parallel and works, it will work as a single instance deployed in a pod somewhere.
o11c 6 days ago
* Use multiple processes, but multiple tests per process as well.
* Randomly split and order the tests on every run, to encourage catching flakiness. Print the seed for this as part of the test results for reproducibility (see the sketch after this list).
* Tag your tests a lot (this is one place where "test classes" or other grouping, which many languages provide, is very useful). Smoke tests should run before all other tests, and all run in one process (though still in random order). Known long-running tests should be tagged to use a dedicated process and mostly start early (longest first), except that a few cores should be reserved to work through the fast tests so they can fail early.
* If you need to kill a timed-out test even though other tests are still running in the same process - just kill the process anyway, and automatically run the other tests again.
* Have the harness provide fixtures like "this is a temporary directory, you don't have to worry about clearing it on failure", so tests don't have to worry about cleaning up if killed. Actually, why not just randomly kill a few tests regardless?
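The sketch referenced above: a minimal illustration of seeded test ordering, assuming the rand crate. The TEST_SEED environment variable and the run_shuffled function are invented for illustration, not part of any real harness:

    use rand::{rngs::StdRng, seq::SliceRandom, SeedableRng};

    // Shuffle test names with a reproducible, printed seed.
    fn run_shuffled(tests: &mut Vec<&str>) {
        // Re-running with TEST_SEED=<seed> reproduces a flaky order
        // (env var name is made up for this sketch).
        let seed: u64 = std::env::var("TEST_SEED")
            .ok()
            .and_then(|s| s.parse().ok())
            .unwrap_or_else(|| rand::random());
        println!("test order seed: {seed}");
        let mut rng = StdRng::seed_from_u64(seed);
        tests.shuffle(&mut rng);
        for name in tests.iter() {
            println!("running {name}");
            // dispatch to the actual test here
        }
    }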
I wrote some more about tests here [1], but I'm not sure I'll update it any more because of GitHub's shitty 2FA-but-only-the-inconvenience-not-the-security requirement.
cortesi 7 days ago
The one thing we've had to be aware of is that the execution model means there can sometimes be differences in behaviour between nextest and cargo test. Very occasionally there are tests that fail in cargo test but succeed in nextest due to better isolation. In practice this just means that we run cargo test in CI.
sunshowers 7 days ago
The behavior differences mean some projects (like wgpu, and nextest itself) only support nextest these days. There's also support for setup scripts which can be used to pre-seed databases and stuff.
marky1991 1 week ago
I'm not actually clear what he means by 'test' to be honest, but I assume he means 'a single test function that can either pass or fail'.
E.g. in Python (nose):
    class TestSomething:
        def test_A(self): ...
        def test_B(self): ...
I'm assuming he means test_A. But why not run all of TestSomething in a process?
Honestly, I think the idea of having tests have shared state is bad to begin with (for things that truly matter, eg if the outcome of your test depends on the state of sys.modules, something else is horribly wrong), so I would never make this tradeoff to benefit a scenario that I never think should be done.
Even if we were being absolute purists, this still hasn't solved the problem the second your process communicates with any other process (or server). And that problem seems largely unsolvable, short of mocking.
Basically, I'm not convinced this is a good tradeoff, because the idea of creating thousands and thousands of processes to run a test suite, even on linux, sounds like a bad idea. (And at work, would definitely be a bad idea, for performance reasons)
saghm 1 week ago
While enforcing no shared state in tests might be useful, that wouldn't be feasible in Rust without adding quite a lot of constraints that would be tough if not impossible to enforce in a drop-in replacement for cargo test. There's certainly room for alternatives in the testing ecosystem in Rust that don't try to maintain compatibility with the built-in test harness, but I don't think the intention of cargo nextest is to try to do that.
One other point that might not be obvious is that right now, there's no stable way to hook into Rust's libtest. The only options for providing an alternative testing harness in Rust are to only support nightly rather than stable, break compatibility with tests written for the built-in test harness, or provide a separate harness that still supports existing tests. I'm sure there are arguments to be made for each of the other alternatives, but personally, I don't think there's any super realistic chance of adoption for anything that picks the first two options, so the approach cargo nextest is taking is the most viable one available (at least until libtest stabilizes, but it's not obvious when that will happen).
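For context on the last of those options: Cargo does let a test target opt out of libtest via harness = false in Cargo.toml, in which case the target's own main() runs instead. A minimal sketch of what such a main could look like; the filtering and output format here are invented, not any real harness's interface:

    // Built with `harness = false` for this target in Cargo.toml,
    // so Cargo runs this main() directly instead of libtest.
    fn check_addition() -> Result<(), String> {
        if 1 + 1 == 2 { Ok(()) } else { Err("math is broken".into()) }
    }

    fn main() {
        // Toy substring filter taken from the first CLI argument (invented).
        let filter = std::env::args().nth(1);
        let tests: &[(&str, fn() -> Result<(), String>)] =
            &[("check_addition", check_addition)];

        let mut failed = 0;
        for (name, test) in tests {
            if filter.as_deref().map_or(true, |f| name.contains(f)) {
                match test() {
                    Ok(()) => println!("ok   {name}"),
                    Err(e) => {
                        failed += 1;
                        println!("FAIL {name}: {e}");
                    }
                }
            }
        }
        std::process::exit(if failed > 0 { 1 } else { 0 });
    }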
cbarrick 1 week ago
I assume so as well.
Unit testing in Rust is based around functions annotated with #[test], so it's safe to assume that when the author says "test" they are referring to one such function.
It's up to the user to decide what they do in each function. For example, you could do a Go-style table-driven test, but the entire function would be a single "test", _not_ one "test" per table entry.
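As a concrete illustration of that last point, a minimal sketch where the whole table runs inside one #[test] (the add function is just a placeholder):

    fn add(a: i32, b: i32) -> i32 {
        a + b
    }

    // One #[test] function: every table entry runs inside this single "test",
    // so the harness reports (and isolates) it as one unit.
    #[test]
    fn add_table() {
        let cases = [
            (1, 2, 3),
            (0, 0, 0),
            (-1, 1, 0),
        ];
        for (a, b, want) in cases {
            assert_eq!(add(a, b), want, "add({a}, {b})");
        }
    }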
sunshowers 7 days ago
> Honestly, I think the idea of having tests have shared state is bad to begin with (for things that truly matter, eg if the outcome of your test depends on the state of sys.modules, something else is horribly wrong), so I would never make this tradeoff to benefit a scenario that I never think should be done.
I don't disagree as a matter of principle, but the reality really is different. Some of the first nextest users outside of myself and my workplace were graphical libraries and engines.
> Basically, I'm not convinced this is a good tradeoff, because the idea of creating thousands and thousands of processes to run a test suite, even on linux, sounds like a bad idea. (And at work, would definitely be a bad idea, for performance reasons)
With Rust or other native languages it really is quite fast. With Python, I agree, not as much, so this tradeoff wouldn't make as much sense there.
But note that things like test cancellation are a little easier to do in an interpreted model.