Created: 8 years, 4 months ago by csharp
Modified: 8 years, 4 months ago
CC: chromium-reviews, pam+watch_chromium.org, M-A Ruel
Base URL: http://git.chromium.org/chromium/src.git@master
Visibility: Public.
Description: Repeat Failed Tests in Serial
Before considering a test failed in run_test_cases.py, rerun the failed tests serially, since they may have failed only because they conflicted with other tests running at the same time.
BUG=
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=151893
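A minimal sketch of the retry idea described above, assuming a hypothetical run_parallel() helper that returns the names of the failed tests (the real run_test_cases.py logic differs in detail):

import subprocess

def run_single_test(binary, test_name):
  """Runs one gtest case by itself and returns True if it passed."""
  return subprocess.call(
      [binary, '--gtest_filter=%s' % test_name]) == 0

def run_with_serial_retry(binary, test_names, run_parallel):
  """Runs the tests in parallel, then reruns failures one at a time.

  run_parallel is assumed to return the list of failed test names.
  Only tests that also fail in isolation are reported as failures.
  """
  failures = run_parallel(binary, test_names)
  still_failing = []
  for test_name in failures:
    # A test that only failed because it conflicted with another test
    # running at the same time should pass now that it runs alone.
    if not run_single_test(binary, test_name):
      still_failing.append(test_name)
  return still_failing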
Patch Set 1: #
Messages
Total messages: 16 (0 generated)
This should fix the flakiness that is currently visible on the swarm bots when running net_unittests (the DiskCacheBackendTests don't like running at the same time).
lgtm
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/csharp@chromium.org/10831330/2001
Change committed as 151893
hmm, this seems like it's masking over the problem. all of our tests (with the exception of interactive_ui_tests) should be written so that they can run in parallel. what are DiskCacheBackendTests doing that isn't parallelizable?
Sharding supervisor does the same thing, right? It retries failed tests at the end serially.

+rvargas, who probably knows about the cache tests. But there was a thread earlier, I believe, and the consensus was that it was hard to have those tests use a temporary folder for the cache data, or something like that.
On 2012/08/16 18:21:25, nsylvain wrote:
> Sharding supervisor does the same thing, right? It retries failed tests at the
> end serially.
>
> +rvargas, who probably knows about the cache tests. But there was a thread
> earlier, I believe, and the consensus was that it was hard to have those tests
> use a temporary folder for the cache data, or something like that.

In the little digging I did, the tests were usually failing when they attempted to set up their cache.

The output was usually:
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from DiskCacheBackendTest
[ RUN ] DiskCacheBackendTest.RecoverRemove
[28352:28360:0816/105211:80282438598:WARNING:backend_impl.cc(1916)] Destroying invalid entry.
[28352:28360:0816/105211:80282443094:WARNING:backend_impl.cc(1916)] Destroying invalid entry.
[28352:28352:0816/105211:80282447120:ERROR:backend_impl.cc(221)] Unable to create cache
../../net/disk_cache/disk_cache_test_base.cc:273: Failure
Value of: cb.GetResult(rv)
  Actual: -2
Expected: net::OK
Which is: 0
../../net/disk_cache/disk_cache_test_base.cc:74: Failure
Value of: NULL != cache_
  Actual: false
Expected: true
base::debug::StackTrace::StackTrace() [0x12614ce]
base::(anonymous namespace)::StackDumpSignalHandler() [0x1283a09]
0x7fde3b6e8af0
DiskCacheBackendTest::BackendTransaction() [0x6555aa]
DiskCacheBackendTest::BackendRecoverRemove() [0x655c7f]
testing::Test::Run() [0xd0a9b1]
testing::TestInfo::Run() [0xd0aa7a]
testing::TestCase::Run() [0xd0abc7]
testing::internal::UnitTestImpl::RunAllTests() [0xd0ddad]
testing::UnitTest::Run() [0xd079d3]
base::TestSuite::Run() [0xd2cf58]
main [0x537a0e]
0x7fde3b6d3c4d
0x4160a9
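(For illustration only, and not the actual C++ fix referenced later in this thread: the "Unable to create cache" failures above are what you get when concurrent test processes collide on a shared on-disk cache location. The generic way to make such tests parallel-safe is to give each run its own temporary directory; a minimal Python sketch of that pattern, with a made-up class name:)

import os
import shutil
import tempfile

class IsolatedScratchDir(object):
  """Context manager giving each test run a private scratch directory,
  so concurrent runs cannot trample each other's on-disk state."""

  def __enter__(self):
    self._path = tempfile.mkdtemp(prefix='disk_cache_test_')
    return self._path

  def __exit__(self, exc_type, exc_value, traceback):
    shutil.rmtree(self._path, ignore_errors=True)
    return False

if __name__ == '__main__':
  # Every invocation gets a distinct path, so creating the cache cannot
  # fail because another process already owns the same files.
  with IsolatedScratchDir() as cache_dir:
    assert os.path.isdir(cache_dir)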
On 2012/08/16 18:21:25, nsylvain wrote:
> Sharding supervisor does the same thing, right? It retries failed tests at the
> end serially.

I think that just ends up dealing with general flakiness, since flaky tests will usually pass a second time. As opposed to this cache case, though, I don't know of any tests in browser_tests or content_browsertests that would fail when run in parallel with other tests.

> +rvargas, who probably knows about the cache tests. But there was a thread
> earlier, I believe, and the consensus was that it was hard to have those tests
> use a temporary folder for the cache data, or something like that.

Can you please point me to the other thread? I'm curious to learn more about this.
On 2012/08/16 18:23:18, csharp wrote:
> On 2012/08/16 18:21:25, nsylvain wrote:
> > Sharding supervisor does the same thing, right? It retries failed tests at
> > the end serially.
> >
> > +rvargas, who probably knows about the cache tests. But there was a thread
> > earlier, I believe, and the consensus was that it was hard to have those
> > tests use a temporary folder for the cache data, or something like that.
>
> In the little digging I did, the tests were usually failing when they
> attempted to set up their cache.
>
> The output was usually:
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from DiskCacheBackendTest
> [ RUN ] DiskCacheBackendTest.RecoverRemove
> [28352:28360:0816/105211:80282438598:WARNING:backend_impl.cc(1916)] Destroying
> invalid entry.
> [28352:28360:0816/105211:80282443094:WARNING:backend_impl.cc(1916)] Destroying
> invalid entry.
> [28352:28352:0816/105211:80282447120:ERROR:backend_impl.cc(221)] Unable to
> create cache
> ../../net/disk_cache/disk_cache_test_base.cc:273: Failure
> Value of: cb.GetResult(rv)
>   Actual: -2
> Expected: net::OK
> Which is: 0
> ../../net/disk_cache/disk_cache_test_base.cc:74: Failure
> Value of: NULL != cache_
>   Actual: false
> Expected: true
> base::debug::StackTrace::StackTrace() [0x12614ce]
> base::(anonymous namespace)::StackDumpSignalHandler() [0x1283a09]
> 0x7fde3b6e8af0
> DiskCacheBackendTest::BackendTransaction() [0x6555aa]
> DiskCacheBackendTest::BackendRecoverRemove() [0x655c7f]
> testing::Test::Run() [0xd0a9b1]
> testing::TestInfo::Run() [0xd0aa7a]
> testing::TestCase::Run() [0xd0abc7]
> testing::internal::UnitTestImpl::RunAllTests() [0xd0ddad]
> testing::UnitTest::Run() [0xd079d3]
> base::TestSuite::Run() [0xd2cf58]
> main [0x537a0e]
> 0x7fde3b6d3c4d
> 0x4160a9

I see. looks like 139272 is filed for the disk cache tests, which need to be fixed. can we just not run these specific tests in this new system until they're fixed? i'm worried that by doing this workaround in the higher-level scripts, more tests that don't run in parallel will sneak through.
It should be possible to modify the new system to run those tests separately, but I'm not sure how well that would work. maruel@ probably has better thoughts about this, so I'll check with him next week and see what he thinks.
On 2012/08/16 19:13:08, csharp wrote:
> It should be possible to modify the new system to run those tests separately,
> but I'm not sure how well that would work.
>
> maruel@ probably has better thoughts about this, so I'll check with him next
> week and see what he thinks.

fix on the way for the tests here: http://codereview.chromium.org/10824336
On 2012/08/16 20:12:17, John Abd-El-Malek wrote:
> On 2012/08/16 19:13:08, csharp wrote:
> > It should be possible to modify the new system to run those tests
> > separately, but I'm not sure how well that would work.
> >
> > maruel@ probably has better thoughts about this, so I'll check with him next
> > week and see what he thinks.
>
> fix on the way for the tests here: http://codereview.chromium.org/10824336

since the fix landed, do we still want to do this change? again, my worry is that we'll miss bugs with tests like this.
On 2012/08/17 15:58:40, John Abd-El-Malek wrote:
> On 2012/08/16 20:12:17, John Abd-El-Malek wrote:
> > On 2012/08/16 19:13:08, csharp wrote:
> > > It should be possible to modify the new system to run those tests
> > > separately, but I'm not sure how well that would work.
> > >
> > > maruel@ probably has better thoughts about this, so I'll check with him
> > > next week and see what he thinks.
> >
> > fix on the way for the tests here: http://codereview.chromium.org/10824336
>
> since the fix landed, do we still want to do this change? again, my worry is
> that we'll miss bugs with tests like this.

If we revert this change, should we also modify the sharding_supervisor to drop this behavior? Ideally, run_test_cases and sharding_supervisor should have the same conditions for the tests they run.
Thanks Chris for trying to work around the issue, but the previous behavior was intended: I wanted the flaky tests to show up as very flaky and to get the maximum performance possible. Technically, the failed tests should be added to the queue, but that's a separate issue.

Thanks John for fixing the tests, that is really appreciated.

Also, this adds yet another retry, raising the number of retries to 4, which is too high at that point IMHO.

I don't think we should spend time modifying sharding_supervisor.py, since the way shards are done is significantly different. run_test_cases.py hides all inter-test issues, so I think it's preferable to use only it long-term.
On 2012/08/18 01:30:31, Marc-Antoine Ruel wrote:
> Thanks Chris for trying to work around the issue, but the previous behavior
> was intended: I wanted the flaky tests to show up as very flaky and to get the
> maximum performance possible. Technically, the failed tests should be added to
> the queue, but that's a separate issue.
>
> Thanks John for fixing the tests, that is really appreciated.
>
> Also, this adds yet another retry, raising the number of retries to 4, which
> is too high at that point IMHO.
>
> I don't think we should spend time modifying sharding_supervisor.py, since the
> way shards are done is significantly different. run_test_cases.py hides all
> inter-test issues, so I think it's preferable to use only it long-term.

+1 to everything you said.

(in reply to csharp's email): rerunning tests just masks flakiness, which we should fix. i don't think sharding_supervisor should do the current behavior, and I'd be happy to see it stop. note that sharding_supervisor is only used on browser_tests and content_browsertests; all the unit tests, interactive tests, sync, nacl, etc. don't rerun failed tests.