Date: Thu, 14 Feb 2013 19:29:15 +0100
From: Christian Kapeller
Message-ID: <511D2CFB.9010602@cmotion.eu>
To: barebox@lists.infradead.org
Subject: Environment changes lead to weird boot behaviour

Hi,

I am trying to investigate a situation where barebox (v2013.02.0 + board
patches) fails to boot the Linux kernel on my karo-tx53 based board. The
problem may well be introduced by myself, but after a few days of
investigation I still fail to grasp its root cause.

Depending on whether certain files are present in the boot environment,
the kernel starts in some cases and not in others. The file contents do
not seem to be relevant: I have managed to produce a broken boot simply
by adding an ash script that does a single 'echo blah'. In all cases
barebox shuts down in an orderly fashion and jumps to the kernel image.
The kernel in question is a zImage (3.4) + initramfs + concatenated
devicetree. Another zImage + concatenated devicetree is affected as well.

Background: I am implementing a 'foolproof' field update scheme. The
control flow looks like this:

(Good Case)  boot0 -(A)-> bootA/bootB -(B)-> kernel
(Bad Case 1) boot0 -(A)-> bootA/bootB -(C)-> rescue-kernel
(Bad Case 2) boot0 -(D)-> rescue-kernel

boot0 ..
    1st stage barebox in a 256k NAND partition
bootA/B ..
    2nd stage barebox in a 256k NAND partition
kernel ..
    production kernel + ubi root in NAND
rescue-kernel ..
    self-contained rescue kernel + initramfs in NAND
bootenv ..
    stores just state variables (256k NAND partition)
scriptenv ..
    stores just scripts and static config (bundled with the 2nd stage)

(A) boot0 checks the two partitions holding a 2nd stage barebox in a
    uImage and boots the newer one.
(B) The 2nd stage barebox starts the production system.
(C) The 2nd stage barebox starts the rescue kernel because a button or
    the bootenv says so.
(D) The 1st stage barebox starts the rescue system because no valid
    2nd stage exists.

I want to be able to exchange the 2nd stage without hassle. To do this,
I have split the boot environment: boot scripts stay with the barebox
image, while non-volatile data is saved in a separate barebox
environment. The following patch accomplishes this:

diff --git a/common/startup.c b/common/startup.c
index 14409a2..59e76ac 100644
--- a/common/startup.c
+++ b/common/startup.c
@@ -108,15 +108,17 @@ void start_barebox (void)
 	debug("initcalls done\n");
 
 #ifdef CONFIG_ENV_HANDLING
-	if (envfs_load(default_environment_path, "/env", 0)) {
+	envfs_load("/dev/defaultenv", "/env", 0);
 #ifdef CONFIG_DEFAULT_ENVIRONMENT
+	mkdir("/var", 0);
+	if (envfs_load(default_environment_path, "/var", 0)) {
 		printf("no valid environment found on %s. "
 			"Using default environment\n",
 			default_environment_path);
-		envfs_load("/dev/defaultenv", "/env", 0);
-#endif
+		envfs_save("/dev/env0", "/var");
 	}
 #endif
+#endif
 
 #ifdef CONFIG_COMMAND_SUPPORT
 	printf("running /env/bin/init...\n");

Everything looks peachy until I add a file to the boot environment using
the bareboxenv tool. Say I add an 'update-in-progress' flag: if the 2nd
stage loader sees it, it knows that something went wrong and can act
accordingly. The problem is that although I can read the state variable
out of the environment, the kernel boot fails with no messages from the
kernel. No earlyprintk output, nothing.

That is where the search started: removing the new file with a plain
'rm /var/update-in-progress' made the kernel boot again ... most of the
time. Removing some scripts (not relevant to this boot path) from the
scriptenv bundled with the image also helped, but only sometimes.
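[Editor's note: the flag check described above is not shown in the mail. The following is a minimal sketch of the 2nd-stage decision in flows (B)/(C), simulated with plain files so it runs on any POSIX shell; the temp directory stands in for the bootenv mount at /var, and the echoed messages are illustrative, not actual barebox output.]

```shell
#!/bin/sh
# Simulate the 2nd-stage loader's check for the update-in-progress flag.
# A temp dir stands in for /var (the mounted bootenv partition).
VAR=$(mktemp -d)

# What the update tooling would create before rewriting the system:
touch "$VAR/update-in-progress"

if [ -e "$VAR/update-in-progress" ]; then
    DECISION="rescue"
    echo "update interrupted, booting rescue kernel"
    # in barebox: bootm /dev/rescue
else
    DECISION="production"
    echo "booting production kernel"
    # in barebox: bootm /dev/kernel
fi

rm -rf "$VAR"
```

After a successful update, the tooling would remove the flag again so the next boot takes the production path.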
I removed the 'common/bareboxenv' file before every recompile.

I have investigated size issues: I use defaultenv-2 plus custom scripts,
together ~225k worth of ash scripts, giving a 15k
common/barebox_default_env. I found no correlation between size and
failure.

I have tried to boil the scripting down to get a clean failure case, but
with no success here, hence I do not post the code in this mail.

I can compile barebox images that render the kernel unbootable, so I
have ruled out issues with writing the environment from Linux.

The rescue kernel is bootable without any additional kernel parameters,
so I should get at least something from there. A plain
'bootm /dev/rescue' works right away.

I have ruled out partition overlaps. The partitions (8 of them) are
registered with mtdparts-add by means of a rather bulky environment
variable.

I have tried to add a big binary blob to the scriptenv, making the
barebox image nearly 256k big: no reproducible failure. I have tried to
add 30 shell scripts echoing some line, sourced from /env/bin/init, to
see whether ash coughs on them: also no reproducible failure.

So my questions are:

- Do you know of any side effects the above patch may introduce?
- Do you know of a way to cause a kernel to fail to boot by just adding
  an irrelevant shell script to the boot environment?
- What else can I look for?

Best regards,
Christian

-- 
Christian Kapeller
cmotion GmbH
Kriehubergasse 16
1050 Wien / Austria
http://www.cmotion.eu
christian.kapeller@cmotion.eu
Phone: +43 1 789 1096 38
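[Editor's note: several of the experiments above revolve around the 256k partition limit. A quick pre-flash size check can rule out a silent overflow before an image is written to NAND. This is a generic sketch, not barebox tooling; the dummy image is generated here only so the example is self-contained.]

```shell
#!/bin/sh
# Hypothetical pre-flash sanity check: verify that a 2nd-stage image
# still fits its 256k NAND partition before flashing it.
PART_SIZE=262144                    # 256k, as in the partition layout above
IMG=$(mktemp)

# Generate a 200k dummy image so the sketch runs anywhere.
dd if=/dev/zero of="$IMG" bs=1024 count=200 2>/dev/null

SIZE=$(wc -c < "$IMG")
if [ "$SIZE" -gt "$PART_SIZE" ]; then
    echo "ERROR: image ($SIZE bytes) exceeds partition ($PART_SIZE bytes)"
    STATUS=1
else
    echo "OK: $SIZE of $PART_SIZE bytes used"
    STATUS=0
fi
rm -f "$IMG"
```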