* [PATCH master v2 1/3] include: asm-generic: reloc: implement runtime_address()
2022-10-20 13:15 [PATCH master v2 0/3] Fix GCC 11 THUMB2 relocate_to_current_adr miscompile Ahmad Fatoum
@ 2022-10-20 13:15 ` Ahmad Fatoum
2022-10-20 13:15 ` [PATCH master v2 2/3] ARM: cpu: add compiler barrier around unrelocated access Ahmad Fatoum
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Ahmad Fatoum @ 2022-10-20 13:15 UTC (permalink / raw)
To: barebox; +Cc: Ahmad Fatoum
This introduces runtime_address(__linker_defined_symbol) as an
alternative to error-prone __linker_defined_symbol +
get_runtime_offset()/global_variable_offset().
While most code is better served by doing a
relocate_to_current_adr(); setup_c();
and jumping to a noinline function for anything remotely complicated,
we can't do that always:
- In relocation code, PBL uncompressing preparatory code, we _must_
access linker defined symbols before relocation unless we
reimplement them in assembly.
- I believe GCC doesn't guarantee that an external object referenced
in a noinline function has its address computed in the same
function. Compiler may see occasion to pc-relative read e.g. two
addresses located after function return into registers, believing
that relocation must have happened before C code first runs.
We then do the relocation, but the addresses are never touched
again, so we dereference an unrelocated address later on.
For these situations we introduce a new runtime_address() macro that
hides the origin of the address it returns behind inline assembly, so
the compiler cannot assume that it may move the address computation
across calls to functions like relocate_to_current_adr() or across the
relocation loop in relocate_to_current_adr() itself.
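To illustrate the idea, here is a minimal sketch of hiding an address
behind an empty inline assembly statement. It is modelled on the
RELOC_HIDE() implementation in compiler-gcc.h, but hidden_address() is a
hypothetical helper written for this example and not the macro added by
this patch. It assumes GCC or Clang:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: launder addr through an
 * empty asm statement. The "0" constraint ties the input to the output
 * register, so no instruction is emitted, but the optimizer has to
 * treat val as an opaque value and should no longer be able to
 * rematerialize it from the original symbol.
 */
void *hidden_address(void *addr, uintptr_t off)
{
	uintptr_t val;

	__asm__ ("" : "=r" (val) : "0" (addr));

	return (void *)(val + off);
}
```

The returned value is simply addr + off; only the compiler's knowledge
of where it came from is lost.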
This has one major shortcoming that exists with the opencoded
addition as well: Compiler will generate PC-relative access to data
defined in the same translation unit, so we end up adding the offset
twice. We employ some GCC builtin magic to catch most of this at
compile-time. If we just did RELOC_HIDE() with a cast, we might lull
board code authors into a false sense of security when they use it for
symbols that are not linker defined.
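As a rough sketch of the type-checking half of that compile-time check,
the macros below tell byte arrays apart from pointers and from arrays of
wider elements. The names is_array(), is_byte_array() and
example_linker_symbol are hypothetical and chosen for this example.
Unlike __must_be_array(), this version evaluates to 0 instead of
breaking the build, and the __builtin_object_size() test for incomplete
types is deliberately left out. It assumes GCC or Clang:

```c
#include <assert.h>

/*
 * Nonzero when x is an array rather than a pointer: an array type is
 * never compatible with the pointer type of &x[0].
 */
#define is_array(x) \
	(!__builtin_types_compatible_p(__typeof__(x), __typeof__(&(x)[0])))

/* Nonzero when x is an array whose elements are one byte wide. */
#define is_byte_array(x) (is_array(x) && sizeof(*(x)) == 1)

/*
 * Hypothetical linker defined symbol. It is only used inside
 * __typeof__ and sizeof here, so it is never evaluated and does not
 * need a definition for this sketch to link.
 */
extern unsigned char example_linker_symbol[];
```

A pointer such as unsigned char * is rejected by is_array(), and an
int array is rejected by the element size test.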
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
include/asm-generic/reloc.h | 69 +++++++++++++++++++++++++++++++++++++
1 file changed, 69 insertions(+)
diff --git a/include/asm-generic/reloc.h b/include/asm-generic/reloc.h
index 90459371ebe8..06fccbd6f367 100644
--- a/include/asm-generic/reloc.h
+++ b/include/asm-generic/reloc.h
@@ -3,8 +3,77 @@
#ifndef _ASM_GENERIC_RELOC_H_
#define _ASM_GENERIC_RELOC_H_
+#include <linux/build_bug.h>
+#include <linux/compiler.h>
+
#ifndef global_variable_offset
#define global_variable_offset() get_runtime_offset()
#endif
+/*
+ * Using sizeof() on incomplete types always fails, so we use GCC's
+ * __builtin_object_size() instead. This is the mechanism underlying
+ * FORTIFY_SOURCE. &symbol should always be something GCC can compute
+ * a size for, even without annotations, unless it's incomplete.
+ * The second argument ensures we get 0 for failure.
+ */
+#define __has_type_complete(sym) __builtin_object_size(&(sym), 2)
+
+#define __has_type_byte_array(sym) (sizeof(*sym) == 1 + __must_be_array(sym))
+
+/*
+ * runtime_address() defined below is supposed to be used exclusively
+ * with linker defined symbols, e.g. unsigned char input_end[].
+ *
+ * We can't completely ensure that, but this gets us close enough
+ * to avoid most abuse of runtime_address().
+ */
+#define __is_incomplete_byte_array(sym) \
+ (!__has_type_complete(sym) && __has_type_byte_array(sym))
+
+/*
+ * While accessing global variables before C environment is setup is
+ * questionable, we can't avoid it when we decide to write our
+ * relocation routines in C. This invites a tricky problem with
+ * this naive code:
+ *
+ * var = &variable + global_variable_offset(); relocate_to_current_adr();
+ *
+ * Compiler is within rights to rematerialize &variable after
+ * relocate_to_current_adr(), which is unfortunate because we
+ * then end up adding a relocated &variable with the relocation
+ * offset once more. We avoid this here by hiding address with
+ * RELOC_HIDE. This is required as a simple compiler barrier()
+ * with "memory" clobber is not immune to compiler proving that
+ * &sym fits in a register and as such is unaffected by the memory
+ * clobber. barrier_data(&sym) would work too, but that comes with
+ * aforementioned compiler "memory" barrier, that we don't care for.
+ *
+ * We don't necessarily need the volatile variable assignment when
+ * using the compiler-gcc.h RELOC_HIDE implementation as __asm__
+ * __volatile__ takes care of it, but the generic RELOC_HIDE
+ * implementation has GCC miscompile runtime_address() when not passing
+ * in a volatile object. Volatile casts instead of variable assignments
+ * also led to miscompilations with GCC v11.1.1 for THUMB2.
+ */
+
+#define runtime_address(sym) ({ \
+ void *volatile __addrof_sym = (sym); \
+ if (!__is_incomplete_byte_array(sym)) \
+ __unsafe_runtime_address(); \
+ RELOC_HIDE(__addrof_sym, global_variable_offset()); \
+})
+
+/*
+ * Above will fail for "near" objects, e.g. data in the same
+ * translation unit or with LTO, as the compiler can be smart
+ * enough to omit relocation entry and just generate PC relative
+ * accesses leading to base address being added twice. We try to
+ * catch most of these here by triggering an error when runtime_address()
+ * is used with anything that is not a byte array of unknown size.
+ */
+extern void *__compiletime_error(
+ "runtime_address() may only be called on linker defined symbols."
+) __unsafe_runtime_address(void);
+
#endif
--
2.30.2
* [PATCH master v2 2/3] ARM: cpu: add compiler barrier around unrelocated access
2022-10-20 13:15 [PATCH master v2 0/3] Fix GCC 11 THUMB2 relocate_to_current_adr miscompile Ahmad Fatoum
2022-10-20 13:15 ` [PATCH master v2 1/3] include: asm-generic: reloc: implement runtime_address() Ahmad Fatoum
@ 2022-10-20 13:15 ` Ahmad Fatoum
2022-10-20 13:15 ` [PATCH v2 3/3] RISC-V: add compiler barriers around unrelocated accesses Ahmad Fatoum
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Ahmad Fatoum @ 2022-10-20 13:15 UTC (permalink / raw)
To: barebox; +Cc: Ahmad Fatoum
GCC v11.1.1 was observed miscompiling relocate_to_current_adr() while
generating THUMB2 code:
dynsym = (void *)__dynsym_start + offset_var;
178c: 4b48 ldr r3, [pc, #288] ; (18b0 <relocate_to_current_adr+0x13c>)
178e: 5869 ldr r1, [r5, r1]
dend = (void *)__rel_dyn_end + offset_var;
1790: 4407 add r7, r0
dynsym = (void *)__dynsym_start + offset_var;
1792: 58e8 ldr r0, [r5, r3]
1794: f102 0308 add.w r3, r2, #8
1798: 440b add r3, r1
179a: 4410 add r0, r2
dynend = (void *)__dynsym_end + offset_var;
while (dstart < dend) {
179c: f1a3 0608 sub.w r6, r3, #8
17a0: 42b7 cmp r7, r6
17a2: d80a bhi.n 17ba <relocate_to_current_adr+0x46>
dynend = (void *)__dynsym_end + offset_var;
17a4: 4b43 ldr r3, [pc, #268] ; (18b4 <relocate_to_current_adr+0x140>)
}
dstart += sizeof(*rel);
}
__memset(dynsym, 0, (unsigned long)dynend - (unsigned long)dynsym);
17a6: 2100 movs r1, #0
dynend = (void *)__dynsym_end + offset_var;
17a8: 58eb ldr r3, [r5, r3]
17aa: 441a add r2, r3
__memset(dynsym, 0, (unsigned long)dynend - (unsigned long)dynsym);
17ac: 1a12 subs r2, r2, r0
17ae: f000 fda5 bl 22fc <__memset>
Both &__dynsym_start and &__dynsym_end will change value after relocation,
so we absolutely want address calculation and addition of offset_var to
happen before relocation. The compiler is within its rights, though, to
assume that variables are already relocated. It can thus prove that
&__dynsym_end does not change in the loop and move the dynend calculation
below the relocation loop, so dynend ends up being incremented by
offset_var once more. The resulting out-of-bounds memset() will overwrite
parts of barebox and break its startup.
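The effect can be modelled with plain integer arithmetic. In this
sketch, which is not barebox code, struct image and its got_dynsym_end
field are hypothetical stand-ins for the slot from which the compiled
code loads &__dynsym_end, and relocate() stands in for the relocation
loop:

```c
#include <assert.h>

/* Hypothetical model of the slot holding the address of __dynsym_end. */
struct image {
	unsigned long got_dynsym_end;
};

/* Stand-in for the relocation loop: the slot now holds the runtime address. */
static void relocate(struct image *img, unsigned long offset)
{
	img->got_dynsym_end += offset;
}

/* Intended order: read the slot and add the offset, then relocate. */
unsigned long dynend_computed_early(struct image img, unsigned long offset)
{
	unsigned long dynend = img.got_dynsym_end + offset;

	relocate(&img, offset);

	return dynend;
}

/* Miscompiled order: the slot is read after relocation, offset is added again. */
unsigned long dynend_computed_late(struct image img, unsigned long offset)
{
	relocate(&img, offset);

	return img.got_dynsym_end + offset;
}
```

With a link address of 0x1000 and an offset of 0x100000, the early
variant yields 0x101000, while the late variant yields 0x201000, which
is one offset too far.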
The naive solution of moving dynsym/dynend calculation beyond the
relocation loop is insufficient as the compiler may decide to move
it back. Instead the only solution short of rewriting this all in
assembly seems to be hiding the origin of dynsym's value, so the
optimizer may not prove the assumption that relocation would not affect
its value. This is done using runtime_address, which was introduced in
a previous commit. With this, the __memset call now uses precomputed
values as expected: no last minute ldr, everything tidily placed into
registers prior to the relocation loop:
17be: 2100 movs r1, #0
17c0: 1b52 subs r2, r2, r5
17c2: 4628 mov r0, r5
17c4: f000 fdaa bl 231c <__memset>
Fixes: a8b788ba61eb ("relocate_to_current_adr: hang directly on error instead of panic()")
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
arch/arm/cpu/common.c | 15 +++++++++------
arch/arm/cpu/uncompress.c | 4 ++--
2 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/arch/arm/cpu/common.c b/arch/arm/cpu/common.c
index 5ccacf204751..62781b76ce68 100644
--- a/arch/arm/cpu/common.c
+++ b/arch/arm/cpu/common.c
@@ -61,16 +61,19 @@ void pbl_barebox_break(void)
*/
void relocate_to_current_adr(void)
{
- unsigned long offset, offset_var;
+ unsigned long offset;
unsigned long __maybe_unused *dynsym, *dynend;
void *dstart, *dend;
/* Get offset between linked address and runtime address */
offset = get_runtime_offset();
- offset_var = global_variable_offset();
- dstart = (void *)__rel_dyn_start + offset_var;
- dend = (void *)__rel_dyn_end + offset_var;
+ /*
+ * We have yet to relocate, so using runtime_address
+ * to compute the relocated address
+ */
+ dstart = runtime_address(__rel_dyn_start);
+ dend = runtime_address(__rel_dyn_end);
#if defined(CONFIG_CPU_64)
while (dstart < dend) {
@@ -96,8 +99,8 @@ void relocate_to_current_adr(void)
dstart += sizeof(*rel);
}
#elif defined(CONFIG_CPU_32)
- dynsym = (void *)__dynsym_start + offset_var;
- dynend = (void *)__dynsym_end + offset_var;
+ dynsym = runtime_address(__dynsym_start);
+ dynend = runtime_address(__dynsym_end);
while (dstart < dend) {
struct elf32_rel *rel = dstart;
diff --git a/arch/arm/cpu/uncompress.c b/arch/arm/cpu/uncompress.c
index 537ee63229d7..65de87f10923 100644
--- a/arch/arm/cpu/uncompress.c
+++ b/arch/arm/cpu/uncompress.c
@@ -53,8 +53,8 @@ void __noreturn barebox_pbl_start(unsigned long membase, unsigned long memsize,
unsigned long pc = get_pc();
/* piggy data is not relocated, so determine the bounds now */
- pg_start = input_data + global_variable_offset();
- pg_end = input_data_end + global_variable_offset();
+ pg_start = runtime_address(input_data);
+ pg_end = runtime_address(input_data_end);
if (IS_ENABLED(CONFIG_PBL_RELOCATABLE)) {
/*
--
2.30.2
* [PATCH v2 3/3] RISC-V: add compiler barriers around unrelocated accesses
2022-10-20 13:15 [PATCH master v2 0/3] Fix GCC 11 THUMB2 relocate_to_current_adr miscompile Ahmad Fatoum
2022-10-20 13:15 ` [PATCH master v2 1/3] include: asm-generic: reloc: implement runtime_address() Ahmad Fatoum
2022-10-20 13:15 ` [PATCH master v2 2/3] ARM: cpu: add compiler barrier around unrelocated access Ahmad Fatoum
@ 2022-10-20 13:15 ` Ahmad Fatoum
2022-10-21 5:34 ` [PATCH master v2 0/3] Fix GCC 11 THUMB2 relocate_to_current_adr miscompile Ahmad Fatoum
2022-10-24 9:03 ` Sascha Hauer
4 siblings, 0 replies; 6+ messages in thread
From: Ahmad Fatoum @ 2022-10-20 13:15 UTC (permalink / raw)
To: barebox; +Cc: Ahmad Fatoum
On ARM, we observed a miscompilation because get_runtime_offset() was
cached before relocation, while the address computation of the symbol
happened afterwards, effectively adding the base address twice to the
symbol offset. The new runtime_address() hides the origin of the symbol
going into the address calculation and thereby thwarts this optimization.
Employ it in RISC-V code as well to avoid the issues experienced on ARM.
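One detail of the sections.h hunk in this patch: input_data_len() now
subtracts 1 from a u32 pointer instead of subtracting 4 from a byte
pointer. Both name the same four trailing bytes. A minimal sketch of
that read, with memcpy() standing in for get_unaligned() and
trailing_u32() being a hypothetical name used only here:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read the 32-bit value stored in the last four
 * bytes before end, without assuming that end is 4-byte aligned.
 * end - sizeof(uint32_t) is the same address as (const uint32_t *)end - 1.
 */
uint32_t trailing_u32(const unsigned char *end)
{
	uint32_t val;

	memcpy(&val, end - sizeof(val), sizeof(val));

	return val;
}
```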
Signed-off-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
---
arch/riscv/boot/uncompress.c | 4 ++--
arch/riscv/include/asm/sections.h | 3 ++-
arch/riscv/lib/reloc.c | 10 +++++++---
3 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/arch/riscv/boot/uncompress.c b/arch/riscv/boot/uncompress.c
index 4ed9b4d37192..4addd49e7389 100644
--- a/arch/riscv/boot/uncompress.c
+++ b/arch/riscv/boot/uncompress.c
@@ -36,8 +36,8 @@ void __noreturn barebox_pbl_start(unsigned long membase, unsigned long memsize,
irq_init_vector(__riscv_mode(flags));
/* piggy data is not relocated, so determine the bounds now */
- pg_start = input_data + get_runtime_offset();
- pg_end = input_data_end + get_runtime_offset();
+ pg_start = runtime_address(input_data);
+ pg_end = runtime_address(input_data_end);
pg_len = pg_end - pg_start;
uncompressed_len = input_data_len();
diff --git a/arch/riscv/include/asm/sections.h b/arch/riscv/include/asm/sections.h
index 6673648bcd58..cea039cc5e14 100644
--- a/arch/riscv/include/asm/sections.h
+++ b/arch/riscv/include/asm/sections.h
@@ -6,6 +6,7 @@
#include <asm-generic/sections.h>
#include <linux/types.h>
#include <asm/unaligned.h>
+#include <asm/reloc.h>
extern char __rel_dyn_start[];
extern char __rel_dyn_end[];
@@ -19,7 +20,7 @@ unsigned long get_runtime_offset(void);
static inline unsigned int input_data_len(void)
{
- return get_unaligned((const u32 *)(input_data_end + get_runtime_offset() - 4));
+ return get_unaligned((const u32 *)runtime_address(input_data_end) - 1);
}
#endif
diff --git a/arch/riscv/lib/reloc.c b/arch/riscv/lib/reloc.c
index da53c50448d7..86a4b3719d5f 100644
--- a/arch/riscv/lib/reloc.c
+++ b/arch/riscv/lib/reloc.c
@@ -42,9 +42,13 @@ void relocate_to_current_adr(void)
if (!offset)
return;
- dstart = __rel_dyn_start + offset;
- dend = __rel_dyn_end + offset;
- dynsym = (void *)__dynsym_start + offset;
+ /*
+ * We have yet to relocate, so using runtime_address
+ * to compute the relocated address
+ */
+ dstart = runtime_address(__rel_dyn_start);
+ dend = runtime_address(__rel_dyn_end);
+ dynsym = runtime_address(__dynsym_start);
for (rela = dstart; (void *)rela < dend; rela++) {
unsigned long *fixup;
--
2.30.2
* Re: [PATCH master v2 0/3] Fix GCC 11 THUMB2 relocate_to_current_adr miscompile
2022-10-20 13:15 [PATCH master v2 0/3] Fix GCC 11 THUMB2 relocate_to_current_adr miscompile Ahmad Fatoum
` (2 preceding siblings ...)
2022-10-20 13:15 ` [PATCH v2 3/3] RISC-V: add compiler barriers around unrelocated accesses Ahmad Fatoum
@ 2022-10-21 5:34 ` Ahmad Fatoum
2022-10-24 9:03 ` Sascha Hauer
4 siblings, 0 replies; 6+ messages in thread
From: Ahmad Fatoum @ 2022-10-21 5:34 UTC (permalink / raw)
To: barebox
On 20.10.22 15:15, Ahmad Fatoum wrote:
> Since a8b788ba61eb ("relocate_to_current_adr: hang directly on error instead
> of panic()"), GCC can prove that variables aren't supposed to overlap and as
> such it generated code that re-added get_runtime_offset() on top of an
> already relocated linker-defined variable's address.
> See PATCH 2/3 for a disassembly of the affected code. Board code can
> be similarly miscompiled, but that's a fix for another day.
v1 -> v2:
- rename get_unrelocated() to runtime_address()
- add Fixes: line after identifying why this issue only popped up now:
removing panic() in relocate_to_current_adr() shifted code around
enough for compiler to consider optimization worthwhile
- fixed left-over + offset in RISC-V patch
>
> Ahmad Fatoum (3):
> include: asm-generic: reloc: implement runtime_address()
> ARM: cpu: add compiler barrier around unrelocated access
> RISC-V: add compiler barriers around unrelocated accesses
>
> arch/arm/cpu/common.c | 15 ++++---
> arch/arm/cpu/uncompress.c | 4 +-
> arch/riscv/boot/uncompress.c | 4 +-
> arch/riscv/include/asm/sections.h | 3 +-
> arch/riscv/lib/reloc.c | 10 +++--
> include/asm-generic/reloc.h | 69 +++++++++++++++++++++++++++++++
> 6 files changed, 91 insertions(+), 14 deletions(-)
>
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
* Re: [PATCH master v2 0/3] Fix GCC 11 THUMB2 relocate_to_current_adr miscompile
2022-10-20 13:15 [PATCH master v2 0/3] Fix GCC 11 THUMB2 relocate_to_current_adr miscompile Ahmad Fatoum
` (3 preceding siblings ...)
2022-10-21 5:34 ` [PATCH master v2 0/3] Fix GCC 11 THUMB2 relocate_to_current_adr miscompile Ahmad Fatoum
@ 2022-10-24 9:03 ` Sascha Hauer
4 siblings, 0 replies; 6+ messages in thread
From: Sascha Hauer @ 2022-10-24 9:03 UTC (permalink / raw)
To: Ahmad Fatoum; +Cc: barebox
On Thu, Oct 20, 2022 at 03:15:07PM +0200, Ahmad Fatoum wrote:
> Since a8b788ba61eb ("relocate_to_current_adr: hang directly on error instead
> of panic()"), GCC can prove that variables aren't supposed to overlap and as
> such it generated code that re-added get_runtime_offset() on top of an
> already relocated linker-defined variable's address.
> See PATCH 2/3 for a disassembly of the affected code. Board code can
> be similarly miscompiled, but that's a fix for another day.
>
> Ahmad Fatoum (3):
> include: asm-generic: reloc: implement runtime_address()
> ARM: cpu: add compiler barrier around unrelocated access
> RISC-V: add compiler barriers around unrelocated accesses
Applied, thanks
Sascha
>
> arch/arm/cpu/common.c | 15 ++++---
> arch/arm/cpu/uncompress.c | 4 +-
> arch/riscv/boot/uncompress.c | 4 +-
> arch/riscv/include/asm/sections.h | 3 +-
> arch/riscv/lib/reloc.c | 10 +++--
> include/asm-generic/reloc.h | 69 +++++++++++++++++++++++++++++++
> 6 files changed, 91 insertions(+), 14 deletions(-)
>
> --
> 2.30.2
>
>
>
--
Pengutronix e.K. | |
Steuerwalder Str. 21 | http://www.pengutronix.de/ |
31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |