* [PATCH v3] Documentation: add watchdog documentation
@ 2019-09-24 7:54 Oleksij Rempel
2019-09-25 10:09 ` Sascha Hauer
0 siblings, 1 reply; 7+ messages in thread
From: Oleksij Rempel @ 2019-09-24 7:54 UTC (permalink / raw)
To: barebox; +Cc: Oleksij Rempel
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
---
Documentation/user/user-manual.rst | 1 +
Documentation/user/watchdog.rst | 116 +++++++++++++++++++++++++++++
2 files changed, 117 insertions(+)
create mode 100644 Documentation/user/watchdog.rst
diff --git a/Documentation/user/user-manual.rst b/Documentation/user/user-manual.rst
index f04981c3f0..41fdb8805c 100644
--- a/Documentation/user/user-manual.rst
+++ b/Documentation/user/user-manual.rst
@@ -34,6 +34,7 @@ Contents:
state
random
debugging
+ watchdog
* :ref:`search`
* :ref:`genindex`
diff --git a/Documentation/user/watchdog.rst b/Documentation/user/watchdog.rst
new file mode 100644
index 0000000000..87c63aa078
--- /dev/null
+++ b/Documentation/user/watchdog.rst
@@ -0,0 +1,116 @@
+Watchdog Support
+================
+
+Warnings and Design Considerations
+----------------------------------
+
+A watchdog is the last line of defense on misbehaving systems. Thus, the
+hardware and the watchdog strategy should be designed carefully to reduce the
+impact of systems failing in the field. Ideally, the bootloader should not
+touch the watchdog at all: no watchdog feeding should be done until
+application-critical software (or a userspace service manager such as
+'systemd') has started.
+
+If the bootloader is responsible for activating the watchdog, the system can
+be considered as failed by design. The following threats can affect a system;
+most of them can be addressed by a properly designed watchdog and watchdog
+strategy:
+
+- software misconfigurations or bugs that prevent the system from starting.
+- glitches caused by under-voltage, an inappropriate power-on sequence or a
+  noisy power supply.
+- physical damage caused by humidity, vibration or temperature.
+- temperature-related misbehavior of the system, e.g. a clock that stops
+  running or runs at the wrong frequency.
+- chemical reactions, e.g. some clock crystals stop working in contact with
+  helium, see for example:
+  https://ifixit.org/blog/11986/iphones-are-allergic-to-helium/
+- failed storage that prevents booting: NAND, SD, SSD, HDD and SPI flash all
+  eventually stop working because their read/write cycles are exhausted.
+
+In all these cases, the bootloader may not be able to start, and a properly
+designed watchdog can take action, for example recover the system by
+resetting it, or power it off to limit the damage.
+
+Barebox Watchdog Functionality
+------------------------------
+
+Nevertheless, in some cases the hardware design can no longer be influenced,
+or during development the watchdog needs to be fed from within the bootloader
+to keep it from resetting the system. For these scenarios barebox provides a
+watchdog framework with the following functionality; at least
+``CONFIG_WATCHDOG`` should be enabled:
+
+Polling
+~~~~~~~
+
+Watchdog polling (feeding) keeps the watchdog running while preventing it from
+resetting the system. It is needed on hardware with short watchdog timeouts.
+For example, the Atheros ar9331 watchdog has a maximal timeout of 7 seconds,
+so it may reset the system even during a network boot.
+Polling can also be used on systems where the watchdog is already running and
+cannot be disabled, for example the watchdog of the i.MX2 series.
+This functionality can be seen as a threat, since in error cases barebox will
+continue to feed the watchdog even if that is not desired. So, depending on
+your needs, ``CONFIG_WATCHDOG_POLLER`` can be enabled or disabled at compile
+time. Even if barebox is built with watchdog polling support, polling is not
+enabled by default. To start polling from the command line, run:
+
+.. code-block:: console
+
+ wdog0.autoping=1
+
+The poller interval is not configurable; it is fixed at 500 ms. By default, the
+watchdog timeout is set to the maximum value supported by the hardware. To
+change the timeout used by the poller, run:
+
+.. code-block:: console
+
+ wdog0.timeout_cur=7
+
+To read the current watchdog's configuration, run:
+
+.. code-block:: console
+
+ devinfo wdog0
+
+The output may look as follows, where ``timeout_cur`` and ``timeout_max`` are
+measured in seconds:
+
+.. code-block:: console
+
+ barebox@DPTechnics DPT-Module:/ devinfo wdog0
+ Parameters:
+ autoping: 1 (type: bool)
+ timeout_cur: 7 (type: uint32)
+ timeout_max: 10 (type: uint32)
+
+Use the barebox environment to persist these changes across reboots:
+
+.. code-block:: console
+
+ nv dev.wdog0.autoping=1
+ nv dev.wdog0.timeout_cur=7
+
+Boot Watchdog Timeout
+~~~~~~~~~~~~~~~~~~~~~
+
+With this functionality, barebox can start a watchdog, or update the timeout of
+an already running one, just before starting the boot image. It can be
+configured temporarily via
+
+.. code-block:: console
+
+ global boot.watchdog_timeout=10
+
+or persistently by
+
+.. code-block:: console
+
+ nv boot.watchdog_timeout=10
+
+where the value is again given in seconds.
+
+On a system with multiple watchdogs, only the first one (``wdog0``) is
+affected by the ``boot.watchdog_timeout`` parameter.
+
--
2.23.0
_______________________________________________
barebox mailing list
barebox@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/barebox
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v3] Documentation: add watchdog documentation
2019-09-24 7:54 [PATCH v3] Documentation: add watchdog documentation Oleksij Rempel
@ 2019-09-25 10:09 ` Sascha Hauer
0 siblings, 0 replies; 7+ messages in thread
From: Sascha Hauer @ 2019-09-25 10:09 UTC (permalink / raw)
To: Oleksij Rempel; +Cc: barebox
On Tue, Sep 24, 2019 at 09:54:41AM +0200, Oleksij Rempel wrote:
> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
> ---
> Documentation/user/user-manual.rst | 1 +
> Documentation/user/watchdog.rst | 116 +++++++++++++++++++++++++++++
> 2 files changed, 117 insertions(+)
> create mode 100644 Documentation/user/watchdog.rst
>
> diff --git a/Documentation/user/user-manual.rst b/Documentation/user/user-manual.rst
> index f04981c3f0..41fdb8805c 100644
> --- a/Documentation/user/user-manual.rst
> +++ b/Documentation/user/user-manual.rst
> @@ -34,6 +34,7 @@ Contents:
> state
> random
> debugging
> + watchdog
>
> * :ref:`search`
> * :ref:`genindex`
> diff --git a/Documentation/user/watchdog.rst b/Documentation/user/watchdog.rst
> new file mode 100644
> index 0000000000..87c63aa078
> --- /dev/null
> +++ b/Documentation/user/watchdog.rst
> @@ -0,0 +1,116 @@
> +Watchdog Support
> +================
> +
> +Warnings and Design Consideration
> +---------------------------------
> +
> +A watchdog is the last line of defense on misbehaving systems. Thus, proper
> +hardware and watchdog design considerations should be made to be able to reduce
> +the impact of failing systems in the field. In the best case, the bootloader
> +should not touch it at all. No watchdog feeding should be done until
> +application-critical software (or a userspace service manager such as
> +'systemd') was started.
> +
> +In case the bootloader is responsible for watchdog activation, the system can
> +be considered as failed by design. The following threats can affect the system
> +which are mostly addressable by properly designed watchdog and watchdog
> +strategy:
> +
> +- software-based misconfigurations or bugs prevent the system from starting.
> +- glitches caused by under-voltage, inappropriate power-on sequence or noisy
> + power supply.
> +- physical damages caused by humidity, vibration or temperature.
> +- temperature-based misbehavior of the system, e.g. clock is not running or
> + running with wrong frequency.
> +- chemical reactions, e.g. some clock crystals will stop to work in contact
> + with Helium, see for example:
> + https://ifixit.org/blog/11986/iphones-are-allergic-to-helium/
> +- failed storage prevents booting. NAND, SD, SSD, HDD, SPI-flash all of this
> + some day stop to work because their read/write cycles are exceeded.
> +
> +In all these cases, the bootloader won't be able to start and a properly
> +designed watchdog may take some action. For example: recover the system by
> +resetting it, or power it off to reduce the damage.
I haven't seen any watchdogs powering off the system.
In the list above only in the case of glitches caused by under-voltage a
watchdog makes a difference. In all the other cases a watchdog won't
help either.
Given that, I don't agree with the claim that systems where the bootloader
has to enable the watchdog are a design failure. Also I bet there are
SoCs on which the watchdog can't be enabled by default before the
bootloader. I wouldn't call boards designed around such a SoC a failure
by design.
Sascha
--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
* Re: [PATCH v3] Documentation: add watchdog documentation
2019-02-18 8:56 ` Tomaž Šolc
@ 2019-02-18 9:23 ` Oleksij Rempel
0 siblings, 0 replies; 7+ messages in thread
From: Oleksij Rempel @ 2019-02-18 9:23 UTC (permalink / raw)
To: Tomaž Šolc, barebox
On 18.02.19 09:56, Tomaž Šolc wrote:
> On 18. 02. 19 09:06, Oleksij Rempel wrote:
>> On 18.02.19 08:56, Tomaž Šolc wrote:
>>> On 18. 02. 19 08:12, Oleksij Rempel wrote:
>>>> +In case the bootloader is responsible for watchdog activation, the system can
>>>> +be considered as failed by design.
>>>
>>> I think this is too strongly worded and I would leave out this last sentence. It seems
>>> arrogant for documentation to judge what is "failed by design" like this, without
>>> considering any other requirements for a system.
>>
>> Can you please provide an example of a requirement, which can't be considered as bad
>> design.
>
> Not everything is an avionics system that needs to address cosmic particles or whatever.
> That doesn't make it a bad design and it's not realistic to expect everything to be made
> up to such standards.
:) sure
> Documentation calling 90% of systems out there "failed by design" is just driving
> potential users away in my opinion. It's ok to make people aware of the limitations though
> (and I think the rest of your text does that just fine).
>
> You list an example yourself below in the text: things like netboot can make boot time
> unpredictable enough that watchdog must be feed during boot. Are all netboot systems
> "failed by design"?
Yes, they are :) at least as production systems. Making something bad for development does
not justify making a production system bad in the same way.
> Some systems don't allow the watchdog to be enabled permanently, but need software to
> enable it (example: bcm2835). Bootloader is the earliest point where this can be done.
> This solves a bad kernel update (might be a requirement for a consumer device), but
> doesn't address power supply glitches during bootloader operation (might not be a
> requirement).
>
> Anyway, just an opinion from someone new to Barebox.
The point of this sentence is to make everybody feel bad if a system is designed in a bad way.
Just one example: almost all consumer WiFi routers based on Qualcomm Atheros have a
broken watchdog. It is not avionics, but sometimes users have to reset them manually,
and that is bad.
A bcm2835 with a failed SD card has to be power cycled. Most probably nobody will die if
it fails, but it is just bad.
In most cases, if you ask: "It is 2019!!! Why does this system have no proper watchdog?! Why
should I manually power cycle my router, TV, laptop or PC or even a RPi?", the answer will
be: "Well, it is not so important, ... we do not even feel bad about it."
If something is bad, it should be called bad. No political correctness. I hope you
understand my point ;D
Kind regards,
Oleksij Rempel
* Re: [PATCH v3] Documentation: add watchdog documentation
2019-02-18 8:06 ` Oleksij Rempel
@ 2019-02-18 8:56 ` Tomaž Šolc
2019-02-18 9:23 ` Oleksij Rempel
0 siblings, 1 reply; 7+ messages in thread
From: Tomaž Šolc @ 2019-02-18 8:56 UTC (permalink / raw)
To: Oleksij Rempel, barebox
On 18. 02. 19 09:06, Oleksij Rempel wrote:
> On 18.02.19 08:56, Tomaž Šolc wrote:
>> On 18. 02. 19 08:12, Oleksij Rempel wrote:
>>> +In case the bootloader is responsible for watchdog activation, the
>>> system can
>>> +be considered as failed by design.
>>
>> I think this is too strongly worded and I would leave out this last
>> sentence. It seems arrogant for documentation to judge what is "failed
>> by design" like this, without considering any other requirements for a
>> system.
>
> Can you please provide an example of a requirement, which can't be
> considered as bad design.
Not everything is an avionics system that needs to address cosmic
particles or whatever. That doesn't make it a bad design and it's not
realistic to expect everything to be made up to such standards.
Documentation calling 90% of systems out there "failed by design" is
just driving potential users away in my opinion. It's ok to make people
aware of the limitations though (and I think the rest of your text does
that just fine).
You list an example yourself below in the text: things like netboot can
make boot time unpredictable enough that the watchdog must be fed during
boot. Are all netboot systems "failed by design"?
Some systems don't allow the watchdog to be enabled permanently, but
need software to enable it (example: bcm2835). Bootloader is the
earliest point where this can be done. This solves a bad kernel update
(might be a requirement for a consumer device), but doesn't address
power supply glitches during bootloader operation (might not be a
requirement).
Anyway, just an opinion from someone new to Barebox.
Best regards
Tomaž
* Re: [PATCH v3] Documentation: add watchdog documentation
2019-02-18 7:56 ` Tomaž Šolc
@ 2019-02-18 8:06 ` Oleksij Rempel
2019-02-18 8:56 ` Tomaž Šolc
0 siblings, 1 reply; 7+ messages in thread
From: Oleksij Rempel @ 2019-02-18 8:06 UTC (permalink / raw)
To: Tomaž Šolc, barebox
On 18.02.19 08:56, Tomaž Šolc wrote:
> On 18. 02. 19 08:12, Oleksij Rempel wrote:
>> +A watchdog is the last line of defense on misbehaving systems. Thus, proper
>> +hardware and watchdog design considerations should be made to be able to reduce
>> +the impact of failing systems in the field. In the best case, the bootloader
>> +should not touch it at all. No watchdog feeding should be done until
>> +application-critical software (or a userspace service manager such as
>> +'systemd') was started.
>> +
>> +In case the bootloader is responsible for watchdog activation, the system can
>> +be considered as failed by design.
>
> I think this is too strongly worded and I would leave out this last sentence. It seems
> arrogant for documentation to judge what is "failed by design" like this, without
> considering any other requirements for a system.
Can you please provide an example of a requirement that can't be considered bad design?
> Such a "failed" watchdog is still better than no watchdog in many cases and sometimes it's
> the only option, as the text in later paragraphs explains. The paragraph above already
> recommends that in the ideal case the bootloader shouldn't touch the watchdog. I think
> that is enough.
>
> Also, as far as I know, the Linux kernel will feed the watchdog on a kernel timer during
> boot and until a userspace process grabs /dev/watchdog. So based on this basically all
> systems based on Linux are already a failed design.
Correct. The fact that it is enabled by default in the kernel does not mean it was a good decision.
Kind regards,
Oleksij Rempel
* Re: [PATCH v3] Documentation: add watchdog documentation
2019-02-18 7:12 Oleksij Rempel
@ 2019-02-18 7:56 ` Tomaž Šolc
2019-02-18 8:06 ` Oleksij Rempel
0 siblings, 1 reply; 7+ messages in thread
From: Tomaž Šolc @ 2019-02-18 7:56 UTC (permalink / raw)
To: barebox
On 18. 02. 19 08:12, Oleksij Rempel wrote:
> +A watchdog is the last line of defense on misbehaving systems. Thus, proper
> +hardware and watchdog design considerations should be made to be able to reduce
> +the impact of failing systems in the field. In the best case, the bootloader
> +should not touch it at all. No watchdog feeding should be done until
> +application-critical software (or a userspace service manager such as
> +'systemd') was started.
> +
> +In case the bootloader is responsible for watchdog activation, the system can
> +be considered as failed by design.
I think this is too strongly worded and I would leave out this last
sentence. It seems arrogant for documentation to judge what is "failed
by design" like this, without considering any other requirements for a
system.
Such a "failed" watchdog is still better than no watchdog in many cases
and sometimes it's the only option, as the text in later paragraphs
explains. The paragraph above already recommends that in the ideal case
the bootloader shouldn't touch the watchdog. I think that is enough.
Also, as far as I know, the Linux kernel will feed the watchdog on a
kernel timer during boot and until a userspace process grabs
/dev/watchdog. So based on this basically all systems based on Linux are
already a failed design.
Best regards
Tomaž
* [PATCH v3] Documentation: add watchdog documentation
@ 2019-02-18 7:12 Oleksij Rempel
2019-02-18 7:56 ` Tomaž Šolc
0 siblings, 1 reply; 7+ messages in thread
From: Oleksij Rempel @ 2019-02-18 7:12 UTC (permalink / raw)
To: barebox; +Cc: Oleksij Rempel
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
---
Documentation/user/user-manual.rst | 1 +
Documentation/user/watchdog.rst | 116 +++++++++++++++++++++++++++++
2 files changed, 117 insertions(+)
create mode 100644 Documentation/user/watchdog.rst
diff --git a/Documentation/user/user-manual.rst b/Documentation/user/user-manual.rst
index 516b760b1b..d5526de285 100644
--- a/Documentation/user/user-manual.rst
+++ b/Documentation/user/user-manual.rst
@@ -33,6 +33,7 @@ Contents:
system-reset
state
random
+ watchdog
* :ref:`search`
* :ref:`genindex`
--
2.20.1
Thread overview: 7+ messages
2019-09-24 7:54 [PATCH v3] Documentation: add watchdog documentation Oleksij Rempel
2019-09-25 10:09 ` Sascha Hauer
-- strict thread matches above, loose matches on Subject: below --
2019-02-18 7:12 Oleksij Rempel
2019-02-18 7:56 ` Tomaž Šolc
2019-02-18 8:06 ` Oleksij Rempel
2019-02-18 8:56 ` Tomaž Šolc
2019-02-18 9:23 ` Oleksij Rempel