From: Andrey Smirnov <andrew.smirnov@gmail.com>
To: barebox@lists.infradead.org
Cc: Andrey Smirnov <andrew.smirnov@gmail.com>
Subject: [PATCH v4 27/29] ARM: mmu: Implement on-demand PTE allocation
Date: Mon, 21 May 2018 20:15:08 -0700
Message-ID: <20180522031510.25505-28-andrew.smirnov@gmail.com>
In-Reply-To: <20180522031510.25505-1-andrew.smirnov@gmail.com>

Allocating PTEs upfront for every 4K page of SDRAM costs us quite a bit
of memory: 1KB per 1MB of RAM. This is far from a deal-breaker for the
majority of use cases, but for builds where the amount of free memory
is in the hundreds of KBs* it becomes a real hurdle to using the MMU at
all (which also means no L1 cache).
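
For reference, the 1KB-per-1MB figure follows directly from the ARM
short-descriptor layout used here: each 1MiB section covers 256 small
pages and each second-level entry is 4 bytes. A purely illustrative
back-of-the-envelope calculation (not part of the patch; the constants
simply mirror PGDIR_SIZE, PAGE_SIZE and the u32 descriptors used in
arch/arm/cpu/mmu.c):

  #include <stdio.h>

  int main(void)
  {
          unsigned long pgdir_size = 1024 * 1024;                    /* PGDIR_SIZE */
          unsigned long page_size = 4096;                            /* PAGE_SIZE  */
          unsigned long ptes_per_section = pgdir_size / page_size;   /* 256        */
          unsigned long bytes_per_section = ptes_per_section * 4;    /* 1KB        */

          printf("PTE overhead: %lu bytes per 1MiB section\n",
                 bytes_per_section);
          /* a hypothetical 256MiB SDRAM bank pays 256KiB of PTEs upfront */
          printf("256MiB bank: %lu KiB of PTEs\n",
                 256 * bytes_per_section / 1024);

          return 0;
  }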

Given that we really only need PTEs for a few regions of memory
dedicated to DMA buffers (Ethernet, USB, etc.), changing the MMU code
to do on-demand section splitting allows us to save a significant
amount of memory without any loss of functionality.
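
As a minimal illustration of the call path that now triggers a lazy
split (hypothetical driver-side code, not part of the patch; it only
relies on the dma_alloc_coherent()/dma_free_coherent() prototypes
touched below and assumes barebox's <dma.h> and <errno.h> headers):

  #include <common.h>
  #include <dma.h>
  #include <errno.h>

  /*
   * Allocating an uncached DMA buffer ends up in
   * arch_remap_range(buf, size, MAP_UNCACHED), which now splits only
   * the 1MiB section(s) covering the buffer into small pages instead
   * of relying on all of SDRAM having been page-mapped upfront.
   */
  static int example_alloc_rx_ring(void **ring, dma_addr_t *ring_dma)
  {
          *ring = dma_alloc_coherent(16 * 4096, ring_dma);
          if (!*ring)                 /* defensive; dma_alloc_coherent  */
                  return -ENOMEM;     /* is not expected to fail here   */

          /* ... hand *ring_dma to the DMA controller ... */
          return 0;
  }

  /* Tear-down remaps the region back as MAP_CACHED before freeing it. */
  static void example_free_rx_ring(void *ring, dma_addr_t ring_dma)
  {
          dma_free_coherent(ring, ring_dma, 16 * 4096);
  }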

Below is a simple comparison of memory usage at startup before and
after this patch is applied.

Before:
barebox@ZII VF610 Development Board, Rev B:/ meminfo
used: 1271584
free: 265553032
After:
barebox@ZII VF610 Development Board, Rev B:/ meminfo
used: 795276
free: 266024448
Tested on:
- VF610 Tower Board
- VF610 ZII Development Board (Rev. C)
- i.MX51 Babbage Board
- i.MX7 SabreSD Board
- i.MX6 ZII RDU2 Board
- AT91SAM9X5-EK Board

* One example of such a use case is memory testing while running
purely out of SRAM
Signed-off-by: Andrey Smirnov <andrew.smirnov@gmail.com>
---
arch/arm/cpu/mmu.c | 209 ++++++++++++++++++++++++---------------------
1 file changed, 110 insertions(+), 99 deletions(-)
diff --git a/arch/arm/cpu/mmu.c b/arch/arm/cpu/mmu.c
index c3c23c639..5f82b63ca 100644
--- a/arch/arm/cpu/mmu.c
+++ b/arch/arm/cpu/mmu.c
@@ -35,6 +35,7 @@
#define PMD_SECT_DEF_CACHED (PMD_SECT_WB | PMD_SECT_DEF_UNCACHED)
#define PTRS_PER_PTE (PGDIR_SIZE / PAGE_SIZE)
+#define ARCH_MAP_WRITECOMBINE ((unsigned)-1)
static uint32_t *ttb;
@@ -60,6 +61,7 @@ static inline void tlb_invalidate(void)
#define PTE_FLAGS_UNCACHED_V7 (0)
#define PTE_FLAGS_CACHED_V4 (PTE_SMALL_AP_UNO_SRW | PTE_BUFFERABLE | PTE_CACHEABLE)
#define PTE_FLAGS_UNCACHED_V4 PTE_SMALL_AP_UNO_SRW
+#define PGD_FLAGS_WC_V7 PMD_SECT_TEX(1)
/*
* PTE flags to set cached and uncached areas.
@@ -68,6 +70,7 @@ static inline void tlb_invalidate(void)
static uint32_t pte_flags_cached;
static uint32_t pte_flags_wc;
static uint32_t pte_flags_uncached;
+static uint32_t pgd_flags_wc;
#define PTE_MASK ((1 << 12) - 1)
@@ -110,6 +113,11 @@ static u32 *arm_create_pte(unsigned long virt, uint32_t flags)
return table;
}
+static bool pgd_type_table(u32 pgd)
+{
+ return (pgd & PMD_TYPE_MASK) == PMD_TYPE_TABLE;
+}
+
static u32 *find_pte(unsigned long adr)
{
u32 *table;
@@ -117,23 +125,8 @@ static u32 *find_pte(unsigned long adr)
if (!ttb)
arm_mmu_not_initialized_error();
- if ((ttb[pgd_index(adr)] & PMD_TYPE_MASK) != PMD_TYPE_TABLE) {
- struct memory_bank *bank;
- int i = 0;
-
- /*
- * This should only be called for page mapped memory inside our
- * memory banks. It's a bug to call it with section mapped memory
- * locations.
- */
- pr_crit("%s: TTB for address 0x%08lx is not of type table\n",
- __func__, adr);
- pr_crit("Memory banks:\n");
- for_each_memory_bank(bank)
- pr_crit("#%d 0x%08lx - 0x%08lx\n", i, bank->start,
- bank->start + bank->size - 1);
- BUG();
- }
+ if (!pgd_type_table(ttb[pgd_index(adr)]))
+ return NULL;
/* find the coarse page table base address */
table = (u32 *)(ttb[pgd_index(adr)] & ~0x3ff);
@@ -159,42 +152,114 @@ static void dma_inv_range(unsigned long start, unsigned long end)
__dma_inv_range(start, end);
}
-static int __remap_range(void *_start, size_t size, u32 pte_flags)
-{
- unsigned long start = (unsigned long)_start;
- u32 *p;
- int numentries, i;
-
- numentries = size >> PAGE_SHIFT;
- p = find_pte(start);
-
- for (i = 0; i < numentries; i++) {
- p[i] &= ~PTE_MASK;
- p[i] |= pte_flags | PTE_TYPE_SMALL;
- }
-
- dma_flush_range(p, numentries * sizeof(u32));
- tlb_invalidate();
-
- return 0;
-}
-
int arch_remap_range(void *start, size_t size, unsigned flags)
{
+ u32 addr = (u32)start;
u32 pte_flags;
+ u32 pgd_flags;
+
+ BUG_ON(!IS_ALIGNED(addr, PAGE_SIZE));
switch (flags) {
case MAP_CACHED:
pte_flags = pte_flags_cached;
+ pgd_flags = PMD_SECT_DEF_CACHED;
break;
case MAP_UNCACHED:
pte_flags = pte_flags_uncached;
+ pgd_flags = PMD_SECT_DEF_UNCACHED;
+ break;
+ case ARCH_MAP_WRITECOMBINE:
+ pte_flags = pte_flags_wc;
+ pgd_flags = pgd_flags_wc;
break;
default:
return -EINVAL;
}
- return __remap_range(start, size, pte_flags);
+ while (size) {
+ const bool pgdir_size_aligned = IS_ALIGNED(addr, PGDIR_SIZE);
+ u32 *pgd = (u32 *)&ttb[pgd_index(addr)];
+ size_t chunk;
+
+ if (size >= PGDIR_SIZE && pgdir_size_aligned &&
+ !pgd_type_table(*pgd)) {
+ /*
+ * TODO: Add code to discard a page table and
+ * replace it with a section
+ */
+ chunk = PGDIR_SIZE;
+ *pgd = addr | pgd_flags;
+ dma_flush_range(pgd, sizeof(*pgd));
+ } else {
+ unsigned int num_ptes;
+ u32 *table = NULL;
+ unsigned int i;
+ u32 *pte;
+ /*
+ * We only want to cover pages up to the
+ * next section boundary, in case we then
+ * get a chance to re-map the whole section
+ * (say, if we got here because the address
+ * was not aligned on a PGDIR_SIZE boundary)
+ */
+ chunk = pgdir_size_aligned ?
+ PGDIR_SIZE : ALIGN(addr, PGDIR_SIZE) - addr;
+ /*
+ * At the same time we want to make sure that
+ * we don't go on remapping past the requested
+ * size, in case that is less than the distance
+ * to the next PGDIR_SIZE boundary.
+ */
+ chunk = min(chunk, size);
+ num_ptes = chunk / PAGE_SIZE;
+
+ pte = find_pte(addr);
+ if (!pte) {
+ /*
+ * If no PTE is found, it means that we
+ * need to split this section and create
+ * a new page table for it
+ *
+ * NOTE: Here we assume that the section
+ * being split was mapped as cached
+ */
+ table = arm_create_pte(addr, pte_flags_cached);
+ pte = find_pte(addr);
+ BUG_ON(!pte);
+ /*
+ * We just split this section and
+ * modified its Level 1 descriptor,
+ * so it needs to be flushed.
+ */
+ dma_flush_range(pgd, sizeof(*pgd));
+ }
+
+ for (i = 0; i < num_ptes; i++) {
+ pte[i] &= ~PTE_MASK;
+ pte[i] |= pte_flags | PTE_TYPE_SMALL;
+ }
+
+ if (table) {
+ /*
+ * If we just created a new page
+ * table, the whole table has to be
+ * flushed, not just the PTEs that
+ * we touched when re-mapping.
+ */
+ pte = table;
+ num_ptes = PTRS_PER_PTE;
+ }
+
+ dma_flush_range(pte, num_ptes * sizeof(u32));
+ }
+
+ addr += chunk;
+ size -= chunk;
+ }
+
+ tlb_invalidate();
+ return 0;
}
void *map_io_sections(unsigned long phys, void *_start, size_t size)
@@ -209,55 +274,6 @@ void *map_io_sections(unsigned long phys, void *_start, size_t size)
return _start;
}
-/*
- * remap the memory bank described by mem cachable and
- * bufferable
- */
-static int arm_mmu_remap_sdram(struct memory_bank *bank)
-{
- unsigned long phys = (unsigned long)bank->start;
- unsigned long ttb_start = pgd_index(phys);
- unsigned long ttb_end = ttb_start + pgd_index(bank->size);
- unsigned long num_ptes = bank->size / PAGE_SIZE;
- int i, pte;
- u32 *ptes;
-
- pr_debug("remapping SDRAM from 0x%08lx (size 0x%08lx)\n",
- phys, bank->size);
-
- /*
- * We replace each 1MiB section in this range with second level page
- * tables, therefore we must have 1Mib aligment here.
- */
- if (!IS_ALIGNED(phys, PGDIR_SIZE) || !IS_ALIGNED(bank->size, PGDIR_SIZE))
- return -EINVAL;
-
- ptes = xmemalign(PAGE_SIZE, num_ptes * sizeof(u32));
-
- pr_debug("ptes: 0x%p ttb_start: 0x%08lx ttb_end: 0x%08lx\n",
- ptes, ttb_start, ttb_end);
-
- for (i = 0; i < num_ptes; i++) {
- ptes[i] = (phys + i * PAGE_SIZE) | PTE_TYPE_SMALL |
- pte_flags_cached;
- }
-
- pte = 0;
-
- for (i = ttb_start; i < ttb_end; i++) {
- ttb[i] = (unsigned long)(&ptes[pte]) | PMD_TYPE_TABLE |
- (0 << 4);
- pte += PTRS_PER_PTE;
- }
-
- dma_flush_range(ttb, 0x4000);
- dma_flush_range(ptes, num_ptes * sizeof(u32));
-
- tlb_invalidate();
-
- return 0;
-}
-
#define ARM_HIGH_VECTORS 0xffff0000
#define ARM_LOW_VECTORS 0x0
@@ -423,10 +439,12 @@ static int mmu_init(void)
if (cpu_architecture() >= CPU_ARCH_ARMv7) {
pte_flags_cached = PTE_FLAGS_CACHED_V7;
pte_flags_wc = PTE_FLAGS_WC_V7;
+ pgd_flags_wc = PGD_FLAGS_WC_V7;
pte_flags_uncached = PTE_FLAGS_UNCACHED_V7;
} else {
pte_flags_cached = PTE_FLAGS_CACHED_V4;
pte_flags_wc = PTE_FLAGS_UNCACHED_V4;
+ pgd_flags_wc = PMD_SECT_DEF_UNCACHED;
pte_flags_uncached = PTE_FLAGS_UNCACHED_V4;
}
@@ -477,13 +495,6 @@ static int mmu_init(void)
__mmu_cache_on();
- /*
- * Now that we have the MMU and caches on remap sdram again using
- * page tables
- */
- for_each_memory_bank(bank)
- arm_mmu_remap_sdram(bank);
-
return 0;
}
mmu_initcall(mmu_init);
@@ -501,7 +512,7 @@ void mmu_disable(void)
__mmu_cache_off();
}
-static void *dma_alloc(size_t size, dma_addr_t *dma_handle, uint32_t pte_flags)
+static void *dma_alloc(size_t size, dma_addr_t *dma_handle, unsigned flags)
{
void *ret;
@@ -512,19 +523,19 @@ static void *dma_alloc(size_t size, dma_addr_t *dma_handle, uint32_t pte_flags)
dma_inv_range((unsigned long)ret, (unsigned long)ret + size);
- __remap_range(ret, size, pte_flags);
+ arch_remap_range(ret, size, flags);
return ret;
}
void *dma_alloc_coherent(size_t size, dma_addr_t *dma_handle)
{
- return dma_alloc(size, dma_handle, pte_flags_uncached);
+ return dma_alloc(size, dma_handle, MAP_UNCACHED);
}
void *dma_alloc_writecombine(size_t size, dma_addr_t *dma_handle)
{
- return dma_alloc(size, dma_handle, pte_flags_wc);
+ return dma_alloc(size, dma_handle, ARCH_MAP_WRITECOMBINE);
}
unsigned long virt_to_phys(volatile void *virt)
@@ -540,7 +551,7 @@ void *phys_to_virt(unsigned long phys)
void dma_free_coherent(void *mem, dma_addr_t dma_handle, size_t size)
{
size = PAGE_ALIGN(size);
- __remap_range(mem, size, pte_flags_cached);
+ arch_remap_range(mem, size, MAP_CACHED);
free(mem);
}
--
2.17.0
_______________________________________________
barebox mailing list
barebox@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/barebox