| 
 | 
 | 
 Chromium Code Reviews
 Chromium Code Reviews Issue 
            11572039:
    Make HttpNetworkLayer subscribe to power events on Windows.  (Closed)
    
  
    Issue 
            11572039:
    Make HttpNetworkLayer subscribe to power events on Windows.  (Closed) 
  | Created: 8 years ago by Philippe Modified: 6 years, 7 months ago CC: chromium-reviews, cbentzel+watch_chromium.org, darin-cc_chromium.org, pasko-google - do not use, felipeg, digit1, ppi Base URL: svn://svn.chromium.org/chrome/trunk/src Visibility: Public. | DescriptionMake HttpNetworkLayer subscribe to power events on Windows.                                                                                                                                                                                                                                                                               
HttpNetworkLayer unintentionally stopped subscribing for power events 1 year
and 6 months ago (see r86241).
This is not a big problem in practice though since URLRequestJob already
subscribes to power events. The socket pool is also invalidated when the 
network change notifier fires a 'no connection' event after a suspend on
desktop. Another CL will follow to align Android with desktop on this.
This makes HttpNetworkLayer subscribe to power events on Windows which is the 
platform that initially needed it.
   Patch Set 1 #
      Total comments: 4
      
     Patch Set 2 : Address Paul's comments #Patch Set 3 : Update CL according to Ricardo's offline remark #Patch Set 4 : Handle case where SystemMonitor::Get() returns NULL #Messages
    Total messages: 31 (0 generated)
     
 Please see the discussion with willchan on https://chromiumcodereview.appspot.com/11538008/ for more context. I will send the follow up CL to pauljensen and szym. 
 https://codereview.chromium.org/11572039/diff/1/net/http/http_network_layer.h File net/http/http_network_layer.h (right): https://codereview.chromium.org/11572039/diff/1/net/http/http_network_layer.h... net/http/http_network_layer.h:10: #include "base/memory/ref_counted.h" While you're in this file, you could add: #include "base/basictypes.h" #include "base/compiler_specific.h" https://codereview.chromium.org/11572039/diff/1/net/http/http_network_layer.h... net/http/http_network_layer.h:48: const scoped_refptr<HttpNetworkSession> session_; While you're in this file, you could add: DISALLOW_COPY_AND_ASSIGN(HttpNetworkLayer); 
 Thanks Paul. https://chromiumcodereview.appspot.com/11572039/diff/1/net/http/http_network_... File net/http/http_network_layer.h (right): https://chromiumcodereview.appspot.com/11572039/diff/1/net/http/http_network_... net/http/http_network_layer.h:10: #include "base/memory/ref_counted.h" On 2012/12/14 14:19:31, pauljensen wrote: > While you're in this file, you could add: > #include "base/basictypes.h" > #include "base/compiler_specific.h" Sure, done. https://chromiumcodereview.appspot.com/11572039/diff/1/net/http/http_network_... net/http/http_network_layer.h:48: const scoped_refptr<HttpNetworkSession> session_; On 2012/12/14 14:19:31, pauljensen wrote: > While you're in this file, you could add: > DISALLOW_COPY_AND_ASSIGN(HttpNetworkLayer); Done. 
 I don't think Android should fire a network changed event on suspend. I think that the socket pools should be flushed on a suspend event, but don't conflate the two different events. I think that long-term it might be nice to coalesce the suspend flush and the network change flush. And I defer the review to Paul/others. On Fri, Dec 14, 2012 at 6:34 AM, <pliard@chromium.org> wrote: > Thanks Paul. > > > https://chromiumcodereview.appspot.com/11572039/diff/1/net/http/http_network_... > File net/http/http_network_layer.h (right): > > https://chromiumcodereview.appspot.com/11572039/diff/1/net/http/http_network_... > net/http/http_network_layer.h:10: #include "base/memory/ref_counted.h" > > On 2012/12/14 14:19:31, pauljensen wrote: >> >> While you're in this file, you could add: >> #include "base/basictypes.h" >> #include "base/compiler_specific.h" > > > Sure, done. > > https://chromiumcodereview.appspot.com/11572039/diff/1/net/http/http_network_... > > net/http/http_network_layer.h:48: const > scoped_refptr<HttpNetworkSession> session_; > On 2012/12/14 14:19:31, pauljensen wrote: >> >> While you're in this file, you could add: >> DISALLOW_COPY_AND_ASSIGN(HttpNetworkLayer); > > > Done. > > https://chromiumcodereview.appspot.com/11572039/ 
 This change lgtm; it's simply removing some dead code. Could we continue the discussion of NCN signals on suspend elsewhere as it's orthogonal to this CL? Likewise, can we remove the last two sentences of the description on this CL as they're also somewhat unrelated? 
 Yes, we can move the discussion elsewhere. But the socket pool statement should stay because it helps explain why this change is safe, since the flushing is happening at a different layer. The android sentence can be removed. On Fri, Dec 14, 2012 at 2:13 PM, <pauljensen@chromium.org> wrote: > This change lgtm; it's simply removing some dead code. > Could we continue the discussion of NCN signals on suspend elsewhere as it's > orthogonal to this CL? Likewise, can we remove the last two sentences of > the > description on this CL as they're also somewhat unrelated? > > https://codereview.chromium.org/11572039/ 
 Actually, shouldn't we register for notifications instead? One thing that this code was supposed to do was to prevent issuing new requests that race with the suspend notification. That is not covered anywhere else. On 2012/12/14 22:16:44, willchan wrote: > Yes, we can move the discussion elsewhere. But the socket pool > statement should stay because it helps explain why this change is > safe, since the flushing is happening at a different layer. The > android sentence can be removed. > > On Fri, Dec 14, 2012 at 2:13 PM, <mailto:pauljensen@chromium.org> wrote: > > This change lgtm; it's simply removing some dead code. > > Could we continue the discussion of NCN signals on suspend elsewhere as it's > > orthogonal to this CL? Likewise, can we remove the last two sentences of > > the > > description on this CL as they're also somewhat unrelated? > > > > https://codereview.chromium.org/11572039/ 
 So the code has regressed for well over a year, do you think that we need to worry about that case still Ricardo? On Fri, Dec 14, 2012 at 2:18 PM, <rvargas@chromium.org> wrote: > Actually, shouldn't we register for notifications instead? One thing that > this > code was supposed to do was to prevent issuing new requests that race with > the > suspend notification. That is not covered anywhere else. > > > > On 2012/12/14 22:16:44, willchan wrote: >> >> Yes, we can move the discussion elsewhere. But the socket pool >> statement should stay because it helps explain why this change is >> safe, since the flushing is happening at a different layer. The >> android sentence can be removed. > > >> On Fri, Dec 14, 2012 at 2:13 PM, <mailto:pauljensen@chromium.org> wrote: >> > This change lgtm; it's simply removing some dead code. >> > Could we continue the discussion of NCN signals on suspend elsewhere as >> > it's >> > orthogonal to this CL? Likewise, can we remove the last two sentences >> > of >> > the >> > description on this CL as they're also somewhat unrelated? >> > >> > https://codereview.chromium.org/11572039/ > > > > > https://codereview.chromium.org/11572039/ 
 I don't think we have visibility into all of our problems... barely the big ones. I'm sure this is not one of our big problems, but I also don't see the cost of avoiding the race. On 2012/12/14 23:34:17, willchan wrote: > So the code has regressed for well over a year, do you think that we > need to worry about that case still Ricardo? > > On Fri, Dec 14, 2012 at 2:18 PM, <mailto:rvargas@chromium.org> wrote: > > Actually, shouldn't we register for notifications instead? One thing that > > this > > code was supposed to do was to prevent issuing new requests that race with > > the > > suspend notification. That is not covered anywhere else. > > > > > > > > On 2012/12/14 22:16:44, willchan wrote: > >> > >> Yes, we can move the discussion elsewhere. But the socket pool > >> statement should stay because it helps explain why this change is > >> safe, since the flushing is happening at a different layer. The > >> android sentence can be removed. > > > > > >> On Fri, Dec 14, 2012 at 2:13 PM, <mailto:pauljensen@chromium.org> wrote: > >> > This change lgtm; it's simply removing some dead code. > >> > Could we continue the discussion of NCN signals on suspend elsewhere as > >> > it's > >> > orthogonal to this CL? Likewise, can we remove the last two sentences > >> > of > >> > the > >> > description on this CL as they're also somewhat unrelated? > >> > > >> > https://codereview.chromium.org/11572039/ > > > > > > > > > > https://codereview.chromium.org/11572039/ 
 That's a reasonable point. OK. I don't care too much. I defer. On Fri, Dec 14, 2012 at 3:41 PM, <rvargas@chromium.org> wrote: > I don't think we have visibility into all of our problems... barely the big > ones. I'm sure this is not one of our big problems, but I also don't see the > cost of avoiding the race. > > > > On 2012/12/14 23:34:17, willchan wrote: >> >> So the code has regressed for well over a year, do you think that we >> need to worry about that case still Ricardo? > > >> On Fri, Dec 14, 2012 at 2:18 PM, <mailto:rvargas@chromium.org> wrote: >> > Actually, shouldn't we register for notifications instead? One thing >> > that >> > this >> > code was supposed to do was to prevent issuing new requests that race >> > with >> > the >> > suspend notification. That is not covered anywhere else. >> > >> > >> > >> > On 2012/12/14 22:16:44, willchan wrote: >> >> >> >> Yes, we can move the discussion elsewhere. But the socket pool >> >> statement should stay because it helps explain why this change is >> >> safe, since the flushing is happening at a different layer. The >> >> android sentence can be removed. >> > >> > >> >> On Fri, Dec 14, 2012 at 2:13 PM, <mailto:pauljensen@chromium.org> >> >> wrote: >> >> > This change lgtm; it's simply removing some dead code. >> >> > Could we continue the discussion of NCN signals on suspend elsewhere >> >> > as >> >> > it's >> >> > orthogonal to this CL? Likewise, can we remove the last two >> >> > sentences >> >> > of >> >> > the >> >> > description on this CL as they're also somewhat unrelated? >> >> > >> >> > https://codereview.chromium.org/11572039/ >> > >> > >> > >> > >> > https://codereview.chromium.org/11572039/ > > > > > https://codereview.chromium.org/11572039/ 
 I wonder what problems could result from requests proceeding during a suspend-and-resume event. Unfortunately this code predates subversion so I have no log to consult. I do worry that there are ordering problems with the OnResume event. For example when CrOS's NetworkChangeNotifier receives OnResume it fires NetworkChangeNotifier::NotifyObserversOfIPAddressChange. The Windows and Linux NetworkChangeNotifiers also send a flurry of changes during suspend and resume. My concern is that some part of Chrome will attempt contact with a server and get ERR_NETWORK_IO_SUSPENDED if it happens to act before HttpNetworkLayer processes OnResume. It's not the end of the world but its wasteful. 
 On 2012/12/15 00:25:18, pauljensen wrote: > I wonder what problems could result from requests proceeding during a > suspend-and-resume event. Unfortunately this code predates subversion so I have > no log to consult. > > I do worry that there are ordering problems with the OnResume event. For > example when CrOS's NetworkChangeNotifier receives OnResume it fires > NetworkChangeNotifier::NotifyObserversOfIPAddressChange. The Windows and Linux > NetworkChangeNotifiers also send a flurry of changes during suspend and resume. > My concern is that some part of Chrome will attempt contact with a server and > get ERR_NETWORK_IO_SUSPENDED if it happens to act before HttpNetworkLayer > processes OnResume. It's not the end of the world but its wasteful. I removed the last sentence of the CL's description. I believe Paul Jensen is not a net/ owner (yet). So I need Will's or Ricardo's LGTM. 
 lgtm 
 On 2012/12/18 21:22:42, rvargas wrote: > lgtm Thanks guys, I'm submitting. 
 CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/pliard@chromium.org/11572039/15001 
 Retried try job too often on win_aura for step(s) unit_tests 
 On 2012/12/19 10:24:29, I haz the power (commit-bot) wrote: > Retried try job too often on win_aura for step(s) unit_tests It seems that a large number of tests are failing with this change. I'm not really confident with enabling this functionality. What do you think? 
 My like very good 
 On 2012/12/26 09:34:34, Philippe wrote: > On 2012/12/19 10:24:29, I haz the power (commit-bot) wrote: > > Retried try job too often on win_aura for step(s) unit_tests > > It seems that a large number of tests are failing with this change. I'm not > really confident with enabling this functionality. What do you think? Have you debugged the failures? Do they reproduce? What's causing the failures? 
 On 2012/12/26 09:34:34, Philippe wrote: > On 2012/12/19 10:24:29, I haz the power (commit-bot) wrote: > > Retried try job too often on win_aura for step(s) unit_tests > > It seems that a large number of tests are failing with this change. I'm not > really confident with enabling this functionality. What do you think? At this moment there seems to be a (Windows-only) crash that will be caused by this. Because the stack trace has unresolved symbols, I recommend you simply reproduce this crash on your own. Probably the smallest repro case is: net_unittests --gtest_filter=HttpNetworkLayerTest.CreateAndDestroy I'm guessing the crash is because |base::SystemMonitor::Get()| returns NULL. You either need to perform the actions conditionally, or ensure that in test setup the SystemMonitor is created. I tried to find where SystemMonitor is normally created, and other than Android (base/android/.../SystemMonitor.java) and one Mac test (RemovableDeviceNotificationsMacTest), I couldn't find any. 
 On 2012/12/26 14:46:44, szym wrote: > On 2012/12/26 09:34:34, Philippe wrote: > > On 2012/12/19 10:24:29, I haz the power (commit-bot) wrote: > > > Retried try job too often on win_aura for step(s) unit_tests > > > > It seems that a large number of tests are failing with this change. I'm not > > really confident with enabling this functionality. What do you think? > > At this moment there seems to be a (Windows-only) crash that will be caused by > this. Because the stack trace has unresolved symbols, I recommend you simply > reproduce this crash on your own. Probably the smallest repro case is: > net_unittests --gtest_filter=HttpNetworkLayerTest.CreateAndDestroy > > I'm guessing the crash is because |base::SystemMonitor::Get()| returns NULL. You > either need to perform the actions conditionally, or ensure that in test setup > the SystemMonitor is created. > > I tried to find where SystemMonitor is normally created, and other than Android > (base/android/.../SystemMonitor.java) and one Mac test > (RemovableDeviceNotificationsMacTest), I couldn't find any. Indeed SystemMonitor::Get() returning NULL is likely the cause. Unfortunately I didn't find a common testing harness for these failing tests. I think I will conditionally dereference SystemMonitor::Get(). 
 On 2012/12/26 15:05:09, Philippe wrote: > On 2012/12/26 14:46:44, szym wrote: > > On 2012/12/26 09:34:34, Philippe wrote: > > > On 2012/12/19 10:24:29, I haz the power (commit-bot) wrote: > > > > Retried try job too often on win_aura for step(s) unit_tests > > > > > > It seems that a large number of tests are failing with this change. I'm not > > > really confident with enabling this functionality. What do you think? > > > > At this moment there seems to be a (Windows-only) crash that will be caused by > > this. Because the stack trace has unresolved symbols, I recommend you simply > > reproduce this crash on your own. Probably the smallest repro case is: > > net_unittests --gtest_filter=HttpNetworkLayerTest.CreateAndDestroy > > > > I'm guessing the crash is because |base::SystemMonitor::Get()| returns NULL. > You > > either need to perform the actions conditionally, or ensure that in test setup > > the SystemMonitor is created. > > > > I tried to find where SystemMonitor is normally created, and other than > Android > > (base/android/.../SystemMonitor.java) and one Mac test > > (RemovableDeviceNotificationsMacTest), I couldn't find any. I'm really bad at grepping. It turns out that all processes in content/ create it at some point. This makes me concerned that the browser_tests failures are actually caused by ordering in which case adding a condition on |SystemMonitor::Get() != NULL| would be harmful. > Indeed SystemMonitor::Get() returning NULL is likely the cause. Unfortunately I > didn't find a common testing harness for these failing tests. I think I will > conditionally dereference SystemMonitor::Get(). 
 On 2012/12/26 15:17:48, szym wrote: > On 2012/12/26 15:05:09, Philippe wrote: > > On 2012/12/26 14:46:44, szym wrote: > > > On 2012/12/26 09:34:34, Philippe wrote: > > > > On 2012/12/19 10:24:29, I haz the power (commit-bot) wrote: > > > > > Retried try job too often on win_aura for step(s) unit_tests > > > > > > > > It seems that a large number of tests are failing with this change. I'm > not > > > > really confident with enabling this functionality. What do you think? > > > > > > At this moment there seems to be a (Windows-only) crash that will be caused > by > > > this. Because the stack trace has unresolved symbols, I recommend you simply > > > reproduce this crash on your own. Probably the smallest repro case is: > > > net_unittests --gtest_filter=HttpNetworkLayerTest.CreateAndDestroy > > > > > > I'm guessing the crash is because |base::SystemMonitor::Get()| returns NULL. > > You > > > either need to perform the actions conditionally, or ensure that in test > setup > > > the SystemMonitor is created. > > > > > > I tried to find where SystemMonitor is normally created, and other than > > Android > > > (base/android/.../SystemMonitor.java) and one Mac test > > > (RemovableDeviceNotificationsMacTest), I couldn't find any. > > I'm really bad at grepping. It turns out that all processes in content/ create > it at some point. This makes me concerned that the browser_tests failures are > actually caused by ordering in which case adding a condition on > |SystemMonitor::Get() != NULL| would be harmful. > > > Indeed SystemMonitor::Get() returning NULL is likely the cause. Unfortunately > I > > didn't find a common testing harness for these failing tests. I think I will > > conditionally dereference SystemMonitor::Get(). I will see if I can reproduce these crashes manually on Android then. I don't have any Windows environment. 
 This is a #if defined(OS_WIN) change, so not sure how you'd catch that on Android. On Wed, Dec 26, 2012 at 10:29 AM, <pliard@chromium.org> wrote: > On 2012/12/26 15:17:48, szym wrote: > >> On 2012/12/26 15:05:09, Philippe wrote: >> > On 2012/12/26 14:46:44, szym wrote: >> > > On 2012/12/26 09:34:34, Philippe wrote: >> > > > On 2012/12/19 10:24:29, I haz the power (commit-bot) wrote: >> > > > > Retried try job too often on win_aura for step(s) unit_tests >> > > > >> > > > It seems that a large number of tests are failing with this change. >> I'm >> not >> > > > really confident with enabling this functionality. What do you >> think? >> > > >> > > At this moment there seems to be a (Windows-only) crash that will be >> > caused > >> by >> > > this. Because the stack trace has unresolved symbols, I recommend you >> > simply > >> > > reproduce this crash on your own. Probably the smallest repro case is: >> > > net_unittests --gtest_filter=**HttpNetworkLayerTest.** >> CreateAndDestroy >> > > >> > > I'm guessing the crash is because |base::SystemMonitor::Get()| returns >> > NULL. > >> > You >> > > either need to perform the actions conditionally, or ensure that in >> test >> setup >> > > the SystemMonitor is created. >> > > >> > > I tried to find where SystemMonitor is normally created, and other >> than >> > Android >> > > (base/android/.../**SystemMonitor.java) and one Mac test >> > > (**RemovableDeviceNotificationsMa**cTest), I couldn't find any. >> > > I'm really bad at grepping. It turns out that all processes in content/ >> create >> it at some point. This makes me concerned that the browser_tests failures >> are >> actually caused by ordering in which case adding a condition on >> |SystemMonitor::Get() != NULL| would be harmful. >> > > > Indeed SystemMonitor::Get() returning NULL is likely the cause. >> > Unfortunately > >> I >> > didn't find a common testing harness for these failing tests. I think I >> will >> > conditionally dereference SystemMonitor::Get(). >> > > I will see if I can reproduce these crashes manually on Android then. I > don't > have any Windows environment. > > https://chromiumcodereview.**appspot.com/11572039/<https://chromiumcodereview... > 
 
 
 > I'm really bad at grepping. It turns out that all processes in content/ create > it at some point. This makes me concerned that the browser_tests failures are > actually caused by ordering in which case adding a condition on > |SystemMonitor::Get() != NULL| would be harmful. Even though SystemMonitor lives in base, it is up to the application layer to use one or not, so the right thing to do for code at the net layer is indeed to check for NULL. 
 On 2012/12/26 19:30:27, rvargas wrote: > > I'm really bad at grepping. It turns out that all processes in content/ create > > it at some point. This makes me concerned that the browser_tests failures are > > actually caused by ordering in which case adding a condition on > > |SystemMonitor::Get() != NULL| would be harmful. > > Even though SystemMonitor lives in base, it is up to the application layer to > use one or not, so the right thing to do for code at the net layer is indeed to > check for NULL. That's reasonable, but SystemMonitor's lifetime is associated with a different thread, so just adding the condition could really be a data race. 
 On 2012/12/26 19:33:23, szym wrote: > On 2012/12/26 19:30:27, rvargas wrote: > > > I'm really bad at grepping. It turns out that all processes in content/ > create > > > it at some point. This makes me concerned that the browser_tests failures > are > > > actually caused by ordering in which case adding a condition on > > > |SystemMonitor::Get() != NULL| would be harmful. > > > > Even though SystemMonitor lives in base, it is up to the application layer to > > use one or not, so the right thing to do for code at the net layer is indeed > to > > check for NULL. > > That's reasonable, but SystemMonitor's lifetime is associated with a different > thread, so just adding the condition could really be a data race. That sounds like a bug on the handling of the global system monitor. On the one hand, this is what we currently do for the UrlRequestJob notification; on the other hand, browser_main_loop.cc is deleting the monitor before stopping all the named threads so it seems like currently we have a race at shutdown here. 
 Has someone filed a bug about the race? I just noticed that FtpTransactionFactory (which is parent to FtpNetworkLayer) has a similar problem to HttpNetworkLayer. It has a Suspend() function but no callers :( 
 On 2013/01/21 04:07:12, pauljensen wrote: > Has someone filed a bug about the race? > I just noticed that FtpTransactionFactory (which is parent to FtpNetworkLayer) > has a similar problem to HttpNetworkLayer. It has a Suspend() function but no > callers :( I didn't get to come back to this recently. Also, the failures on the bots (when subscribing to power events) still need to be investigated. Could someone take this over? | 
