* bootchooser: constant decrement of remaining_attempts
From: Patrick Huesmann @ 2018-10-09 16:37 UTC
To: barebox

Hi,

I'm building a RAUC- & bootchooser-based firmware update solution.
The scenario is symmetric rootfs slots, manual update, userspace
(RAUC) marks as good.
It seems to work well; however, I noticed that it's always decrementing
and resetting the remaining_attempts of the booted system, not only
when there's an update, but also during regular boot-ups.

I thought that the RAUC/bootchooser combo was mainly about providing a
safeguard against accidentally "bricking" the system with corrupt or
incomplete firmware updates.
However, the logic of decrementing and later resetting the
remaining_attempts is apparently not limited to the period between
performing the update and the validation (mark as good) of that
update, but also running all the other times the system is booted.

This can have some undesirable side effects:

1) When the boot process is interrupted for any reason (power issues,
brown-out resets, users unplugging the gadget while it boots, etc.)
more than three times in a row (assuming a remaining_attempts reset
value of 3), then bootchooser will happily switch to the fall-back
target, even though there's nothing wrong with the actual target at
all.
I guess this can be worked around by syncing the fall-back target to
the last updated one, after the last update has been verified as good.
However this brings additional cost & complexity, and feels more like
a hack than a proper solution.

2) In every complete boot cycle, there are two writes to the
barebox-state partition (bootchooser decrementing the
remaining_attempts, then userspace resetting the remaining_attempts
when it marks the target as good). For systems that boot up & power
down a lot, this will generate lots of unnecessary flash writes over
time. Probably it won't be enough to actually wear out the flash, but
still it doesn't "feel" quite right. (I jumped through hoops to have a
proper read-only root and would like to limit the overall number of
flash writes when possible).

I'm thinking of an option that limits the remaining_attempts logic to
the phase when barebox attempts to boot a newly flashed update, until
that update is marked as good later in userspace. There could be an
extra (optional) variable in the barebox-state, that allows the
userspace to deliberately enable/disable the remaining_attempts logic
in barebox.

Is such an option already available? Or if it's not, would patches
introducing that option be accepted upstream? Or am I thinking totally
wrong here and this goes completely against the whole bootchooser
design?

Best regards,
Patrick

_______________________________________________
barebox mailing list
barebox@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/barebox
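
For illustration, here is a toy POSIX sh model of the counting described in points 1) and 2) above. It is not barebox code (bootchooser itself is written in C); the names attempts, writes, boot_attempt and mark_good are hypothetical, and the model assumes a default of 3 attempts and one barebox-state write per decrement or reset.

```shell
# Toy model (not barebox code) of default bootchooser counting for one
# target. All names are hypothetical.
DEFAULT_ATTEMPTS=3

attempts=$DEFAULT_ATTEMPTS   # remaining_attempts of the primary target
writes=0                     # barebox-state writes so far

# Barebox is about to start the primary target. If attempts are left,
# decrement and persist (one write) and return 0; otherwise return 1,
# meaning bootchooser would pick the fall-back target instead.
boot_attempt() {
    if [ "$attempts" -le 0 ]; then
        return 1
    fi
    attempts=$((attempts - 1))
    writes=$((writes + 1))
    return 0
}

# Userspace (e.g. RAUC) marks the booted target good: restore the
# default counter and persist it (a second write per boot cycle).
mark_good() {
    attempts=$DEFAULT_ATTEMPTS
    writes=$((writes + 1))
}
```

In this model a completed boot followed by mark_good costs two state writes, and after three consecutive interrupted boots the counter is 0, so the next boot_attempt refuses the primary target.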

* Re: bootchooser: constant decrement of remaining_attempts
From: Sascha Hauer @ 2018-10-10 7:14 UTC
To: Patrick Huesmann; +Cc: barebox, Enrico Joerns

Hi Patrick,

+Cc Jan and Enrico

On Tue, Oct 09, 2018 at 06:37:25PM +0200, Patrick Huesmann wrote:
> Hi,
>
> I'm building a RAUC- & bootchooser-based firmware update solution.
> The scenario is symmetric rootfs slots, manual update, userspace
> (RAUC) marks as good.
> It seems to work well; however, I noticed that it's always decrementing
> and resetting the remaining_attempts of the booted system, not only
> when there's an update, but also during regular boot-ups.
>
> I thought that the RAUC/bootchooser combo was mainly about providing a
> safeguard against accidentally "bricking" the system with corrupt or
> incomplete firmware updates.
> However, the logic of decrementing and later resetting the
> remaining_attempts is apparently not limited to the period between
> performing the update and the validation (mark as good) of that
> update, but also running all the other times the system is booted.
>
> This can have some undesirable side effects:
>
> 1) When the boot process is interrupted for any reason (power issues,
> brown-out resets, users unplugging the gadget while it boots, etc.)
> more than three times in a row (assuming a remaining_attempts reset
> value of 3), then bootchooser will happily switch to the fall-back
> target, even though there's nothing wrong with the actual target at
> all.

I think what you want here is the global.bootchooser.reset_attempts=power-on
option. With this option bootchooser will reset the remaining attempts
to the default value with each power on reset, meaning that the primary
target will only become invalid when the watchdog bites you three times
in a row, but not when the device is turned off in between.

> I guess this can be worked around by syncing the fall-back target to
> the last updated one, after the last update has been verified as good.
> However this brings additional cost & complexity, and feels more like
> a hack than a proper solution.
>
> 2) In every complete boot cycle, there are two writes to the
> barebox-state partition (bootchooser decrementing the
> remaining_attempts, then userspace resetting the remaining_attempts
> when it marks the target as good). For systems that boot up & power
> down a lot, this will generate lots of unnecessary flash writes over
> time. Probably it won't be enough to actually wear out the flash, but
> still it doesn't "feel" quite right. (I jumped through hoops to have a
> proper read-only root and would like to limit the overall number of
> flash writes when possible).
>
> I'm thinking of an option that limits the remaining_attempts logic to
> the phase when barebox attempts to boot a newly flashed update, until
> that update is marked as good later in userspace. There could be an
> extra (optional) variable in the barebox-state, that allows the
> userspace to deliberately enable/disable the remaining_attempts logic
> in barebox.

I don't think such an option is available at the moment. Maybe we could
declare remaining_attempts=INT_MAX as infinite attempts. Whenever that
value is found the remaining_attempts counter wouldn't be decreased.

After an update userspace could then set the remaining_attempts counter
of the new system to three and the new system would set it to INT_MAX
when successfully booted.

What do you think?

Sascha

--
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |
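
A toy POSIX sh model of the reset_attempts=power-on behaviour Sascha describes, assuming the bootloader can tell a power-on reset from a watchdog reset. The function boot_after_reset and the reason strings "power-on" and "watchdog" are hypothetical; this only mirrors the described effect, not barebox internals.

```shell
# Toy model (not barebox code) of global.bootchooser.reset_attempts=power-on.
# The reset-reason strings "power-on" and "watchdog" are illustrative.
DEFAULT_ATTEMPTS=3
attempts=$DEFAULT_ATTEMPTS

# $1: why the board came out of reset. On a power-on reset the counter is
# first restored to its default; any other reset leaves it as it is.
# Returns 0 if the primary target is booted (counter decremented) and 1 if
# bootchooser would fall back.
boot_after_reset() {
    if [ "$1" = "power-on" ]; then
        attempts=$DEFAULT_ATTEMPTS
    fi
    if [ "$attempts" -le 0 ]; then
        return 1
    fi
    attempts=$((attempts - 1))
    return 0
}
```

Here any number of interrupted power-on boots leaves the target bootable, whereas a power-on boot followed by three consecutive watchdog resets exhausts the counter.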

* Re: bootchooser: constant decrement of remaining_attempts
From: Jan Lübbe @ 2018-10-10 8:06 UTC
To: Sascha Hauer, Patrick Huesmann; +Cc: barebox, Enrico Joerns

On Wed, 2018-10-10 at 09:14 +0200, Sascha Hauer wrote:
> Hi Patrick,
>
> +Cc Jan and Enrico
>
> On Tue, Oct 09, 2018 at 06:37:25PM +0200, Patrick Huesmann wrote:
> > Hi,
> >
> > I'm building a RAUC- & bootchooser-based firmware update solution.
> > The scenario is symmetric rootfs slots, manual update, userspace
> > (RAUC) marks as good.
> > It seems to work well; however, I noticed that it's always decrementing
> > and resetting the remaining_attempts of the booted system, not only
> > when there's an update, but also during regular boot-ups.
> >
> > I thought that the RAUC/bootchooser combo was mainly about providing a
> > safeguard against accidentally "bricking" the system with corrupt or
> > incomplete firmware updates.
> > However, the logic of decrementing and later resetting the
> > remaining_attempts is apparently not limited to the period between
> > performing the update and the validation (mark as good) of that
> > update, but also running all the other times the system is booted.
> >
> > This can have some undesirable side effects:
> >
> > 1) When the boot process is interrupted for any reason (power issues,
> > brown-out resets, users unplugging the gadget while it boots, etc.)
> > more than three times in a row (assuming a remaining_attempts reset
> > value of 3), then bootchooser will happily switch to the fall-back
> > target, even though there's nothing wrong with the actual target at
> > all.
>
> I think what you want here is the global.bootchooser.reset_attempts=power-on
> option. With this option bootchooser will reset the remaining attempts
> to the default value with each power on reset, meaning that the primary
> target will only become invalid when the watchdog bites you three times
> in a row, but not when the device is turned off in between.

If you want to ensure that the old system is not booted anymore, you
should set its priority to zero.

Leaving the old image enabled is useful in cases where you want to
protect against problems that occur some time after the update.

> > I guess this can be worked around by syncing the fall-back target to
> > the last updated one, after the last update has been verified as good.
> > However this brings additional cost & complexity, and feels more like
> > a hack than a proper solution.
> >
> > 2) In every complete boot cycle, there are two writes to the
> > barebox-state partition (bootchooser decrementing the
> > remaining_attempts, then userspace resetting the remaining_attempts
> > when it marks the target as good). For systems that boot up & power
> > down a lot, this will generate lots of unnecessary flash writes over
> > time. Probably it won't be enough to actually wear out the flash, but
> > still it doesn't "feel" quite right. (I jumped through hoops to have a
> > proper read-only root and would like to limit the overall number of
> > flash writes when possible).
> >
> > I'm thinking of an option that limits the remaining_attempts logic to
> > the phase when barebox attempts to boot a newly flashed update, until
> > that update is marked as good later in userspace. There could be an
> > extra (optional) variable in the barebox-state, that allows the
> > userspace to deliberately enable/disable the remaining_attempts logic
> > in barebox.
>
> I don't think such an option is available at the moment. Maybe we could
> declare remaining_attempts=INT_MAX as infinite attempts. Whenever that
> value is found the remaining_attempts counter wouldn't be decreased.
>
> After an update userspace could then set the remaining_attempts counter
> of the new system to three and the new system would set it to INT_MAX
> when successfully booted.

INT_MAX would need to be relative to the actual type defined in the
state variable (u32 vs. u8). An alternative would be to have a global
flag to en-/disable counting.

Currently there is one way to avoid writes to flash in the successful
case: Use a watchdog and call bootchooser -s, which will cause it to
skip decrementing if it boots the same target as previously. It seems
that's not documented, though. :/

Regards,
Jan
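
Two toy POSIX sh sketches of the points Jan raises, not barebox code. choose_target assumes bootchooser picks the highest-priority target that still has attempts left and never picks a priority-0 target; next_attempts reads the INT_MAX remark as "use the maximum value of the state variable's own width". The name:priority:attempts encoding, the target names and all function names are hypothetical.

```shell
# Toy sketches (not barebox code). Target names and the
# "name:priority:attempts" encoding are hypothetical.

# Pick the highest-priority target that has a non-zero priority and
# remaining attempts left; priority 0 means the target is never chosen.
# Prints the target name, or returns 1 if no target qualifies.
choose_target() {
    best=""
    best_prio=0
    for t in "$@"; do
        name=${t%%:*}
        rest=${t#*:}
        prio=${rest%%:*}
        left=${rest#*:}
        if [ "$prio" -gt "$best_prio" ] && [ "$left" -gt 0 ]; then
            best=$name
            best_prio=$prio
        fi
    done
    [ -n "$best" ] || return 1
    echo "$best"
}

# "Infinite attempts" sentinel as the all-ones value of the variable's
# width: 255 for u8, 4294967295 for u32 (needs 64-bit shell arithmetic).
sentinel_for_width() {
    echo $(( (1 << $1) - 1 ))
}

# $1: current counter, $2: width in bits. Prints the counter to store
# after a boot attempt; the sentinel is left unchanged (no write needed).
next_attempts() {
    if [ "$1" -eq "$(sentinel_for_width "$2")" ]; then
        echo "$1"
    elif [ "$1" -gt 0 ]; then
        echo $(( $1 - 1 ))
    else
        echo 0
    fi
}
```

With system1 at priority 0, nothing is left to fall back to once system0 runs out of attempts, which is the trade-off Jan points out. A literal INT_MAX (2147483647) would not even fit into a u8 variable, hence the width parameter.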

* Re: bootchooser: constant decrement of remaining_attempts
From: Patrick Huesmann @ 2018-10-10 11:40 UTC
To: barebox; +Cc: ejo

Hi,

On Wed, 10 Oct 2018 at 10:06, Jan Lübbe <jlu@pengutronix.de> wrote:
>
> INT_MAX would need to be relative to the actual type defined in the
> state variable (u32 vs. u8). An alternative would be to have a global
> flag to en-/disable counting.
>
> Currently there is one way to avoid writes to flash in the successful
> case: Use a watchdog and call bootchooser -s, which will cause it to
> skip decrementing if it boots the same target as previously. It seems
> that's not documented, though. :/

"bootchooser -s" seems to do the trick, so I will just use that.
This is pretty nice, as I don't have to change vanilla barebox but can
achieve the desired behavior with a custom bootstate variable
("disable_counting") and three lines of barebox scripting:

#!/bin/sh

# With the custom disable_counting flag set, pass -s so that booting the
# same target as last time does not decrement remaining_attempts.
if [ "${state.bootstate.disable_counting}" = "1" ]; then
    bootchooser -s
fi

boot bootchooser

Now I just have to patch RAUC to reset bootstate.disable_counting after a
flash upgrade and set it again after verification, which will be
straightforward.

Thanks!
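
Patrick's plan, as a toy POSIX sh model: after flashing, RAUC clears disable_counting so the new slot is counted normally; once the slot is verified, RAUC sets the flag and later boots of the same slot no longer touch the state. The function boot_step and the slot names system0/system1 are hypothetical, and the -s semantics are modelled only on Jan's description in this thread, not on barebox source.

```shell
# Toy model (not barebox code): -s, as described in this thread, skips
# the decrement when the chosen target is the one booted last time; the
# custom disable_counting flag decides whether the script passes -s.
# $1: disable_counting (0 or 1)   $2: target booted last time
# $3: target chosen now           $4: its remaining attempts (assumed > 0)
# Prints the new counter and whether barebox-state must be written.
boot_step() {
    if [ "$1" = "1" ] && [ "$2" = "$3" ]; then
        echo "$4 no-write"
    else
        echo "$(( $4 - 1 )) write"
    fi
}
```

In steady state (flag set, same slot) a boot costs no barebox-state write; right after an update (flag cleared) or on a slot switch, each boot is counted as before.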

* Re: bootchooser: constant decrement of remaining_attempts
From: Patrick Huesmann @ 2018-10-11 8:25 UTC
To: barebox; +Cc: ejo

I've created a pull request for RAUC, if anyone needs this functionality:
https://github.com/rauc/rauc/pull/346