Environment changes lead to weird boot behaviour
From: Christian Kapeller @ 2013-02-14 18:29 UTC
To: barebox
Hi,
I am trying to investigate a situation where barebox (v2013.02.0 + board patches)
fails to boot the Linux kernel on my karo-tx53 based board. The problem may
well be introduced by myself, but after a few days of investigation I still
fail to grasp its root cause.
Depending on which files are present in the boot environment, the kernel
starts in some cases and not in others.
The file contents seem not to be relevant, since I've managed to produce a
broken boot simply by adding an ash script that does a single 'echo blah'.
In all cases barebox shuts down in an orderly fashion and jumps to the kernel
image. The kernel in question is a zImage (3.4) + initramfs + concatenated
devicetree. Another zImage + concatenated devicetree is also affected.
Background: I am implementing a 'foolproof' field update scheme. The
control flow looks like:
(Good Case) boot0 -(A)-> bootA/bootB -(B)-> kernel
(Bad Case 1) boot0 -(A)-> bootA/bootB -(C)-> rescue-kernel
(Bad Case 2) boot0 -(D)-> rescue-kernel
boot0 .. 1st stage barebox in a 256k NAND partition
bootA/B .. 2nd stage barebox in a 256k NAND partition
kernel .. production kernel + UBI rootfs in NAND
rescue-kernel .. self-contained rescue kernel + initramfs in NAND
bootenv .. stores just state variables (256k NAND partition)
scriptenv .. stores just scripts and static config (bundled with the 2nd stage)
(A) boot0 checks the two partitions holding a 2nd stage barebox uImage
and boots the newer one.
(B) The 2nd stage barebox starts the production system.
(C) The 2nd stage barebox starts the rescue kernel because a button or
the bootenv says so.
(D) The 1st stage barebox starts the rescue system because no 2nd stage
is valid.
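To make (A) and (D) concrete, boot0's selection looks roughly like this.
This is a simplified sketch, not my actual script: the bootA/bootB device
names are made up, and slot_a_time/slot_b_time stand in for whatever
extracts the uImage header timestamps (omitted here).

  #!/bin/sh
  # boot0 sketch: try the newer of the two 2nd stage slots first
  if [ ${slot_a_time} -gt ${slot_b_time} ]; then
          bootm /dev/nand0.bootA
          bootm /dev/nand0.bootB
  else
          bootm /dev/nand0.bootB
          bootm /dev/nand0.bootA
  fi
  # bootm only returns if an image is rejected, so reaching this
  # point means no 2nd stage is valid -> case (D)
  bootm /dev/rescue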
I want to be able to exchange the 2nd stage without hassle. To do this,
I've split the boot environment: boot scripts stay with the barebox
image, while non-volatile data is saved in a barebox environment
partition. The following patch accomplishes this:
diff --git a/common/startup.c b/common/startup.c
index 14409a2..59e76ac 100644
--- a/common/startup.c
+++ b/common/startup.c
@@ -108,15 +108,17 @@ void start_barebox (void)
debug("initcalls done\n");
#ifdef CONFIG_ENV_HANDLING
- if (envfs_load(default_environment_path, "/env", 0)) {
+ envfs_load("/dev/defaultenv", "/env", 0);
#ifdef CONFIG_DEFAULT_ENVIRONMENT
+ mkdir("/var", 0);
+ if (envfs_load(default_environment_path, "/var", 0)) {
printf("no valid environment found on %s. "
"Using default environment\n",
default_environment_path);
- envfs_load("/dev/defaultenv", "/env", 0);
-#endif
+ envfs_save("/dev/env0", "/var");
}
#endif
+#endif
#ifdef CONFIG_COMMAND_SUPPORT
printf("running /env/bin/init...\n");
Everything looks peachy until I add a file to the boot environment
using the bareboxenv tool. Say I add an 'update-in-progress' flag: if
the 2nd stage loader sees this, it knows that something went wrong
and can act accordingly.
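Conceptually, the check in the 2nd stage is as simple as this
(schematic only; the flag file is the one I use, but the surrounding
script is invented for illustration):

  # fragment of the 2nd stage boot logic
  if [ -f /var/update-in-progress ]; then
          # the last update never finished: don't trust the
          # production system, boot the rescue kernel instead (C)
          bootm /dev/rescue
  fi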
The problem is that although I can read the state variable from the
environment, the kernel fails to boot without any output at all.
No earlyprintk output, nothing.
That is where the search started:
Removing the new file with a plain 'rm /var/update-in-progress'
made the kernel boot again ... most of the time.
Removing some scripts (not relevant to this boot path) from the
image-bundled scriptenv helped ... sometimes.
I removed the 'common/bareboxenv' file before every recompile.
I've investigated size issues: defaultenv-2 plus my custom scripts
together make up ~225k worth of ash scripts, giving a 15k
common/barebox_default_env. I found no correlation between size
and failure.
I've tried to boil the scripting down to a clean failure case, but
without success so far, hence I don't post the code in this mail.
I can compile barebox images that render the kernel unbootable even
before the environment is ever written from Linux, so I've ruled out
issues with the Linux-side environment writing.
The rescue kernel is bootable without any additional kernel
parameters, so I should get at least something from it;
a plain 'bootm /dev/rescue' works right away.
I've ruled out partition overlaps. The partitions (8 of them)
are registered with mtdparts-add by means of a quite bulky
environment variable.
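For reference, the variable follows the usual mtdparts syntax, along
these lines (the names and sizes here are illustrative, not my real
layout, and the -d/-p invocation assumes the defaultenv-2 mtdparts-add
helper):

  # eight partitions as in my setup, but this layout is made up
  mtdparts="256k(boot0),256k(bootA),256k(bootB),256k(bootenv),4M(kernel),8M(rescue),16M(data),-(root)"
  mtdparts-add -d nand0 -p "${mtdparts}"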
I've tried adding a big binary blob to the scriptenv, making the
barebox image nearly 256k. No reproducible failure.
I've tried adding 30 shell scripts, each echoing some line, and
sourcing them from /env/bin/init to see whether ash chokes on
them; again, no reproducible failure.
So my questions are:
Do you know of any side effects the above patch may introduce?
Do you know of a way to cause a kernel to fail to boot by just
adding an irrelevant shell script to the boot environment?
What else can I look for?
Best regards,
Christian
--
Christian Kapeller
cmotion GmbH
Kriehubergasse 16
1050 Wien / Austria
http://www.cmotion.eu
christian.kapeller@cmotion.eu
Phone: +43 1 789 1096 38
Re: Environment changes lead to weird boot behaviour
From: Sascha Hauer @ 2013-02-15 18:57 UTC
To: Christian Kapeller; Cc: barebox
On Thu, Feb 14, 2013 at 07:29:15PM +0100, Christian Kapeller wrote:
> Hi,
>
> I am trying to investigate a situation where barebox (v2013.02.0 + board patches)
> fails to boot the Linux kernel on my karo-tx53 based board. The problem may
> well be introduced by myself, but after a few days of investigation I still
> fail to grasp its root cause.
>
> Depending on which files are present in the boot environment, the kernel
> starts in some cases and not in others.
>
> The file contents seem not to be relevant, since I've managed to produce a
> broken boot simply by adding an ash script that does a single 'echo blah'.
>
> In all cases barebox shuts down in an orderly fashion and jumps to the kernel
> image. The kernel in question is a zImage (3.4) + initramfs + concatenated
> devicetree. Another zImage + concatenated devicetree is also affected.
>
>
> Background: I am implementing a 'foolproof' field update scheme. The
> control flow looks like:
>
> (Good Case) boot0 -(A)-> bootA/bootB -(B)-> kernel
> (Bad Case 1) boot0 -(A)-> bootA/bootB -(C)-> rescue-kernel
> (Bad Case 2) boot0 -(D)-> rescue-kernel
>
> boot0 .. 1st stage barebox in a 256k NAND partition
> bootA/B .. 2nd stage barebox in a 256k NAND partition
> kernel .. production kernel + UBI rootfs in NAND
> rescue-kernel .. self-contained rescue kernel + initramfs in NAND
> bootenv .. stores just state variables (256k NAND partition)
> scriptenv .. stores just scripts and static config (bundled with the 2nd stage)
>
>
> (A) boot0 checks the two partitions holding a 2nd stage barebox uImage
> and boots the newer one.
> (B) The 2nd stage barebox starts the production system.
> (C) The 2nd stage barebox starts the rescue kernel because a button or
> the bootenv says so.
> (D) The 1st stage barebox starts the rescue system because no 2nd stage
> is valid.
>
> I want to be able to exchange the 2nd stage without hassle. To do this,
> I've split the boot environment: boot scripts stay with the barebox
> image, while non-volatile data is saved in a barebox environment
> partition.
>
> The following patch accomplishes this:
>
> diff --git a/common/startup.c b/common/startup.c
> index 14409a2..59e76ac 100644
> --- a/common/startup.c
> +++ b/common/startup.c
> @@ -108,15 +108,17 @@ void start_barebox (void)
>  	debug("initcalls done\n");
> 
>  #ifdef CONFIG_ENV_HANDLING
> -	if (envfs_load(default_environment_path, "/env", 0)) {
> +	envfs_load("/dev/defaultenv", "/env", 0);
>  #ifdef CONFIG_DEFAULT_ENVIRONMENT
> +	mkdir("/var", 0);
> +	if (envfs_load(default_environment_path, "/var", 0)) {
>  		printf("no valid environment found on %s. "
>  			"Using default environment\n",
>  			default_environment_path);
> -		envfs_load("/dev/defaultenv", "/env", 0);
> -#endif
> +		envfs_save("/dev/env0", "/var");
>  	}
>  #endif
> +#endif
> 
>  #ifdef CONFIG_COMMAND_SUPPORT
>  	printf("running /env/bin/init...\n");
>
>
> Everything looks peachy until I add a file to the boot environment
> using the bareboxenv tool. Say I add an 'update-in-progress' flag: if
> the 2nd stage loader sees this, it knows that something went wrong
> and can act accordingly.
>
> The problem is that although I can read the state variable from the
> environment, the kernel fails to boot without any output at all.
> No earlyprintk output, nothing.
>
>
> That is where the search started:
>
> Removing the new file with a plain 'rm /var/update-in-progress'
> made the kernel boot again ... most of the time.
>
> Removing some scripts (not relevant to this boot path) from the
> image-bundled scriptenv helped ... sometimes.
>
> I removed the 'common/bareboxenv' file before every recompile.
>
> I've investigated size issues: defaultenv-2 plus my custom scripts
> together make up ~225k worth of ash scripts, giving a 15k
> common/barebox_default_env. I found no correlation between size
> and failure.
>
> I've tried to boil the scripting down to a clean failure case, but
> without success so far, hence I don't post the code in this mail.
>
> I can compile barebox images that render the kernel unbootable even
> before the environment is ever written from Linux, so I've ruled out
> issues with the Linux-side environment writing.
>
> The rescue kernel is bootable without any additional kernel
> parameters, so I should get at least something from it;
> a plain 'bootm /dev/rescue' works right away.
>
> I've ruled out partition overlaps. The partitions (8 of them)
> are registered with mtdparts-add by means of a quite bulky
> environment variable.
>
> I've tried adding a big binary blob to the scriptenv, making the
> barebox image nearly 256k. No reproducible failure.
>
> I've tried adding 30 shell scripts, each echoing some line, and
> sourcing them from /env/bin/init to see whether ash chokes on
> them; again, no reproducible failure.
>
>
> So my questions are:
>
> Do you know of any side effects the above patch may introduce?
>
> Do you know of a way to cause a kernel to fail to boot by just
> adding an irrelevant shell script to the boot environment?
>
> What else can I look for?
I have no real idea. Some suggestions/questions:
- Could it be that your kernel image overlaps the malloc space? Normally
this shouldn't happen as barebox has protection against this, but who
knows...
- Do you boot your kernel with devicetree?
- You could calculate and dump a CRC right before shutdown_barebox in
arch/arm/lib/armlinux.c to see whether your kernel image is
sometimes corrupted.
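Something along these lines (untested; kernel_start and kernel_size
are placeholders for wherever the image address and length are
available in your code path):

  /* in start_linux(), just before shutdown_barebox() */
  printf("kernel crc32: 0x%08lx\n",
         (unsigned long)crc32(0, kernel_start, kernel_size));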
Sascha
--
Pengutronix e.K.                           |                             |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0    |
Amtsgericht Hildesheim, HRA 2686           | Fax:   +49-5121-206917-5555 |