flush-cache can cause files affected by redo-stamp to get rebuilt unnecessarily, and the absence of such unnecessary rebuilds is exactly what the test is trying to validate. Since other tests run flush-cache at random times when using -j, this would cause random test failures.
It was getting way too ad-hoc in there. Let's reorganize the tests so that there's a good, obvious, suggested sequence to run them in.