flush-cache can cause files affected by redo-stamp to be rebuilt unnecessarily, and avoiding that unnecessary rebuild is exactly what this test checks. Since other tests run flush-cache at random times when using -j, this would cause random test failures.
If must_build was already nonempty when a recursive call to isdirty() returned a list, we'd overwrite it with that list and lose the original contents of must_build.
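A toy sketch of the bug pattern, not redo's actual code. It assumes, as the message implies, that isdirty() returns either a dirty/clean verdict or a list of targets that must be built first before the answer is known. The dependency graph (DEPS), the CHANGED and STAMPED sets, the DIRTY/CLEAN sentinels, and the buggy flag are all hypothetical, invented only to show why assigning the recursive result loses entries while extending keeps them.

```python
# Hypothetical sentinels; redo's real values may differ.
DIRTY = "dirty"
CLEAN = "clean"

# Hypothetical graph: target -> list of dependencies.
DEPS = {
    "all": ["x", "y"],
    "x": ["gen1"],
    "y": ["gen2"],
    "gen1": ["src1"],
    "gen2": ["src2"],
}
CHANGED = {"src1", "src2"}   # source files that have changed
STAMPED = {"gen1", "gen2"}   # targets using redo-stamp: rebuilding may not change them


def isdirty(target, buggy=False):
    """Return DIRTY, CLEAN, or a list of targets that must be built first."""
    if target in CHANGED:
        return DIRTY
    must_build = []
    for dep in DEPS.get(target, []):
        sub = isdirty(dep, buggy)
        if sub == DIRTY:
            if dep in STAMPED:
                # A stamped dep might come out unchanged; build it, then decide.
                must_build.append(dep)
            else:
                return DIRTY
        elif isinstance(sub, list):
            if buggy:
                must_build = sub          # the bug: drops entries from earlier deps
            else:
                must_build.extend(sub)    # the fix: merge with what we already have
    return must_build if must_build else CLEAN


print(isdirty("all"))              # ['gen1', 'gen2']
print(isdirty("all", buggy=True))  # ['gen2'] (gen1 was lost)
```

In the buggy variant, must_build already holds ["gen1"] from dependency x when the recursive call for y returns ["gen2"], so the assignment silently discards gen1.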