From: Linus Torvalds
Date: Sun, 19 May 2024 16:21:03 +0000 (-0700)
Subject: Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel...
X-Git-Tag: v6.10-rc5-pxa1908~287
X-Git-Url: https://git.dujemihanovic.xyz/?a=commitdiff_plain;h=61307b7be41a1f1039d1d1368810a1d92cb97b44;p=linux.git

Merge tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull mm updates from Andrew Morton:
 "The usual shower of singleton fixes and minor series all over MM,
  documented (hopefully adequately) in the respective changelogs.
  Notable series include:

 - Lucas Stach has provided some page-mapping cleanup/consolidation/maintainability work in the series "mm/treewide: Remove pXd_huge() API".

 - In the series "Allow migrate on protnone reference with MPOL_PREFERRED_MANY policy", Donet Tom has optimized mempolicy's MPOL_PREFERRED_MANY mode, yielding almost doubled performance in one test.

 - In their series "Memory allocation profiling" Kent Overstreet and Suren Baghdasaryan have contributed a means of determining (via /proc/allocinfo) whereabouts in the kernel memory is being allocated: number of calls and amount of memory.

 - Matthew Wilcox has provided the series "Various significant MM patches" which does a number of rather unrelated things, but in largely similar code sites.

 - In his series "mm: page_alloc: freelist migratetype hygiene" Johannes Weiner has fixed the page allocator's handling of migratetype requests, with resulting improvements in compaction efficiency.

 - In the series "make the hugetlb migration strategy consistent" Baolin Wang has fixed a hugetlb migration issue, which should improve hugetlb allocation reliability.

 - Liu Shixin has hit an I/O meltdown caused by readahead in a memory-tight memcg. Addressed in the series "Fix I/O high when memory almost met memcg limit".

 - In the series "mm/filemap: optimize folio adding and splitting" Kairui Song has optimized pagecache insertion, yielding ~10% performance improvement in one test.

 - Baoquan He has cleaned up and consolidated the early zone initialization code in the series "mm/mm_init.c: refactor free_area_init_core()".

 - Baoquan has also redone some MM initialization code in the series "mm/init: minor clean up and improvement".

 - MM helper cleanups from Christoph Hellwig in his series "remove follow_pfn".

 - More cleanups from Matthew Wilcox in the series "Various page->flags cleanups".

 - Vlastimil Babka has contributed maintainability improvements in the series "memcg_kmem hooks refactoring".

 - More folio conversions and cleanups in Matthew Wilcox's series:
	"Convert huge_zero_page to huge_zero_folio"
	"khugepaged folio conversions"
	"Remove page_idle and page_young wrappers"
	"Use folio APIs in procfs"
	"Clean up __folio_put()"
	"Some cleanups for memory-failure"
	"Remove page_mapping()"
	"More folio compat code removal"

 - David Hildenbrand chipped in with "fs/proc/task_mmu: convert hugetlb functions to work on folis".

 - Code consolidation and cleanup work related to GUP's handling of hugetlbs in Peter Xu's series "mm/gup: Unify hugetlb, part 2".

 - Rick Edgecombe has developed some fixes to stack guard gaps in the series "Cover a guard gap corner case".

 - Jinjiang Tu has fixed KSM's behaviour after a fork+exec in the series "mm/ksm: fix ksm exec support for prctl".

 - Baolin Wang has implemented NUMA balancing for multi-size THPs. This is a simple first-cut implementation for now. The series is "support multi-size THP numa balancing".
 - Cleanups to vma handling helper functions from Matthew Wilcox in the series "Unify vma_address and vma_pgoff_address".

 - Some selftests maintenance work from Dev Jain in the series "selftests/mm: mremap_test: Optimizations and style fixes".

 - Improvements to the swapping of multi-size THPs from Ryan Roberts in the series "Swap-out mTHP without splitting".

 - Kefeng Wang has significantly optimized the handling of arm64's permission page faults in the series
	"arch/mm/fault: accelerate pagefault when badaccess"
	"mm: remove arch's private VM_FAULT_BADMAP/BADACCESS"

 - GUP cleanups from David Hildenbrand in "mm/gup: consistently call it GUP-fast".

 - hugetlb fault code cleanups from Vishal Moola in "Hugetlb fault path to use struct vm_fault".

 - selftests build fixes from John Hubbard in the series "Fix selftests/mm build without requiring "make headers"".

 - Memory tiering fixes/improvements from Ho-Ren (Jack) Chuang in the series "Improved Memory Tier Creation for CPUless NUMA Nodes". Fixes the initialization code so that migration between different memory types works as intended.

 - David Hildenbrand has improved follow_pte() and fixed an errant driver in the series "mm: follow_pte() improvements and acrn follow_pte() fixes".

 - David also did some cleanup work on large folio mapcounts in his series "mm: mapcount for large folios + page_mapcount() cleanups".

 - Folio conversions in KSM in Alex Shi's series "transfer page to folio in KSM".

 - Barry Song has added some sysfs stats for monitoring multi-size THPs in the series "mm: add per-order mTHP alloc and swpout counters".

 - Some zswap cleanups from Yosry Ahmed in the series "zswap same-filled and limit checking cleanups".

 - Matthew Wilcox has been looking at buffer_head code and found the documentation to be lacking. The series is "Improve buffer head documentation".

 - Multi-size THPs get more work, this time from Lance Yang. His series "mm/madvise: enhance lazyfreeing with mTHP in madvise_free" optimizes the freeing of these things.

 - Kemeng Shi has added more userspace-visible writeback instrumentation in the series "Improve visibility of writeback".

 - Kemeng Shi then sent some maintenance work on top in the series "Fix and cleanups to page-writeback".

 - Matthew Wilcox reduces mmap_lock traffic in the anon vma code in the series "Improve anon_vma scalability for anon VMAs". Intel's test bot reported an improbable 3x improvement in one test.

 - SeongJae Park adds some DAMON feature work in the series
	"mm/damon: add a DAMOS filter type for page granularity access recheck"
	"selftests/damon: add DAMOS quota goal test"

 - Also some maintenance work in the series
	"mm/damon/paddr: simplify page level access re-check for pageout"
	"mm/damon: misc fixes and improvements"

 - David Hildenbrand has disabled some known-to-fail selftests in the series "selftests: mm: cow: flag vmsplice() hugetlb tests as XFAIL".

 - memcg metadata storage optimizations from Shakeel Butt in "memcg: reduce memory consumption by memcg stats".
 - DAX fixes and maintenance work from Vishal Verma in the series "dax/bus.c: Fixups for dax-bus locking""

* tag 'mm-stable-2024-05-17-19-19' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (426 commits)
  memcg, oom: cleanup unused memcg_oom_gfp_mask and memcg_oom_order
  selftests/mm: hugetlb_madv_vs_map: avoid test skipping by querying hugepage size at runtime
  mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_wp
  mm/hugetlb: add missing VM_FAULT_SET_HINDEX in hugetlb_fault
  selftests: cgroup: add tests to verify the zswap writeback path
  mm: memcg: make alloc_mem_cgroup_per_node_info() return bool
  mm/damon/core: fix return value from damos_wmark_metric_value
  mm: do not update memcg stats for NR_{FILE/SHMEM}_PMDMAPPED
  selftests: cgroup: remove redundant enabling of memory controller
  Docs/mm/damon/maintainer-profile: allow posting patches based on damon/next tree
  Docs/mm/damon/maintainer-profile: change the maintainer's timezone from PST to PT
  Docs/mm/damon/design: use a list for supported filters
  Docs/admin-guide/mm/damon/usage: fix wrong schemes effective quota update command
  Docs/admin-guide/mm/damon/usage: fix wrong example of DAMOS filter matching sysfs file
  selftests/damon: classify tests for functionalities and regressions
  selftests/damon/_damon_sysfs: use 'is' instead of '==' for 'None'
  selftests/damon/_damon_sysfs: find sysfs mount point from /proc/mounts
  selftests/damon/_damon_sysfs: check errors from nr_schemes file reads
  mm/damon/core: initialize ->esz_bp from damos_quota_init_priv()
  selftests/damon: add a test for DAMOS quota goal
  ...

---

61307b7be41a1f1039d1d1368810a1d92cb97b44
diff --cc arch/alpha/lib/checksum.c
index 17fa230baeef,c29b98ef9c82..27b2a9edf3cc
--- a/arch/alpha/lib/checksum.c
+++ b/arch/alpha/lib/checksum.c
@@@ -12,9 -12,9 +12,10 @@@
  #include
  #include
 +#include
  #include
+ #include
  static inline unsigned short from64to16(unsigned long x)
  {
diff --cc arch/alpha/lib/fpreg.c
index eee11fb4c7f1,3d32165043f8..9a238e7536ae
--- a/arch/alpha/lib/fpreg.c
+++ b/arch/alpha/lib/fpreg.c
@@@ -8,8 -8,8 +8,9 @@@
  #include
  #include
  #include
+ #include
  #include
 +#include
  #if defined(CONFIG_ALPHA_EV6) || defined(CONFIG_ALPHA_EV67)
  #define STT(reg,val) asm volatile ("ftoit $f"#reg",%0" : "=r"(val));
diff --cc arch/arm64/include/asm/pgtable.h
index bde9fd179388,1303d30287dc..f8efbc128446
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@@ -560,19 -502,15 +554,17 @@@ static inline int pmd_trans_huge(pmd_t
  #define pmd_mkclean(pmd)	pte_pmd(pte_mkclean(pmd_pte(pmd)))
  #define pmd_mkdirty(pmd)	pte_pmd(pte_mkdirty(pmd_pte(pmd)))
  #define pmd_mkyoung(pmd)	pte_pmd(pte_mkyoung(pmd_pte(pmd)))
-
 -static inline pmd_t pmd_mkinvalid(pmd_t pmd)
 -{
 -	pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID));
 -	pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID));
 -
 -	return pmd;
 -}
 +#define pmd_mkinvalid(pmd)	pte_pmd(pte_mkinvalid(pmd_pte(pmd)))
 +#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
 +#define pmd_uffd_wp(pmd)	pte_uffd_wp(pmd_pte(pmd))
 +#define pmd_mkuffd_wp(pmd)	pte_pmd(pte_mkuffd_wp(pmd_pte(pmd)))
 +#define pmd_clear_uffd_wp(pmd)	pte_pmd(pte_clear_uffd_wp(pmd_pte(pmd)))
 +#define pmd_swp_uffd_wp(pmd)	pte_swp_uffd_wp(pmd_pte(pmd))
 +#define pmd_swp_mkuffd_wp(pmd)	pte_pmd(pte_swp_mkuffd_wp(pmd_pte(pmd)))
 +#define pmd_swp_clear_uffd_wp(pmd) \
 +		pte_pmd(pte_swp_clear_uffd_wp(pmd_pte(pmd)))
 +#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_WP */
- #define pmd_thp_or_huge(pmd)	(pmd_huge(pmd) || pmd_trans_huge(pmd))
-
  #define pmd_write(pmd)		pte_write(pmd_pte(pmd))
  #define pmd_mkhuge(pmd)
(__pmd(pmd_val(pmd) & ~PMD_TABLE_BIT)) diff --cc arch/powerpc/mm/mem.c index 22cc978cdb47,a197d4c2244b..d325217ab201 --- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc/mm/mem.c @@@ -16,7 -16,7 +16,8 @@@ #include #include #include +#include + #include #include #include diff --cc include/linux/fortify-string.h index 85fc0e6f0f7f,b24d62bad0b3..d658ae729a02 --- a/include/linux/fortify-string.h +++ b/include/linux/fortify-string.h @@@ -738,10 -734,10 +738,11 @@@ __FORTIFY_INLINE void *kmemdup_noprof(c if (__compiletime_lessthan(p_size, size)) __read_overflow(); if (p_size < size) - fortify_panic(FORTIFY_FUNC_kmemdup, FORTIFY_READ, p_size, size, NULL); + fortify_panic(FORTIFY_FUNC_kmemdup, FORTIFY_READ, p_size, size, + __real_kmemdup(p, 0, gfp)); return __real_kmemdup(p, size, gfp); } + #define kmemdup(...) alloc_hooks(kmemdup_noprof(__VA_ARGS__)) /** * strcpy - Copy a string into another string buffer diff --cc include/linux/slab.h index ebc20173cd4e,4cc37ef22aae..7247e217e21b --- a/include/linux/slab.h +++ b/include/linux/slab.h @@@ -744,68 -773,42 +773,47 @@@ static inline __alloc_size(1, 2) void * * @size: how many bytes of memory are required. * @flags: the type of memory to allocate (see kmalloc). */ - static inline __alloc_size(1) void *kzalloc(size_t size, gfp_t flags) + static inline __alloc_size(1) void *kzalloc_noprof(size_t size, gfp_t flags) { - return kmalloc(size, flags | __GFP_ZERO); + return kmalloc_noprof(size, flags | __GFP_ZERO); } + #define kzalloc(...) alloc_hooks(kzalloc_noprof(__VA_ARGS__)) + #define kzalloc_node(_size, _flags, _node) kmalloc_node(_size, (_flags)|__GFP_ZERO, _node) - /** - * kzalloc_node - allocate zeroed memory from a particular memory node. - * @size: how many bytes of memory are required. - * @flags: the type of memory to allocate (see kmalloc). - * @node: memory node from which to allocate - */ - static inline __alloc_size(1) void *kzalloc_node(size_t size, gfp_t flags, int node) - { - return kmalloc_node(size, flags | __GFP_ZERO, node); - } + extern void *kvmalloc_node_noprof(size_t size, gfp_t flags, int node) __alloc_size(1); + #define kvmalloc_node(...) 
alloc_hooks(kvmalloc_node_noprof(__VA_ARGS__)) - extern void *kvmalloc_node(size_t size, gfp_t flags, int node) __alloc_size(1); - static inline __alloc_size(1) void *kvmalloc(size_t size, gfp_t flags) - { - return kvmalloc_node(size, flags, NUMA_NO_NODE); - } - static inline __alloc_size(1) void *kvzalloc_node(size_t size, gfp_t flags, int node) - { - return kvmalloc_node(size, flags | __GFP_ZERO, node); - } - static inline __alloc_size(1) void *kvzalloc(size_t size, gfp_t flags) - { - return kvmalloc(size, flags | __GFP_ZERO); - } + #define kvmalloc(_size, _flags) kvmalloc_node(_size, _flags, NUMA_NO_NODE) + #define kvmalloc_noprof(_size, _flags) kvmalloc_node_noprof(_size, _flags, NUMA_NO_NODE) -#define kvzalloc(_size, _flags) kvmalloc(_size, _flags|__GFP_ZERO) ++#define kvzalloc(_size, _flags) kvmalloc(_size, (_flags)|__GFP_ZERO) + -#define kvzalloc_node(_size, _flags, _node) kvmalloc_node(_size, _flags|__GFP_ZERO, _node) ++#define kvzalloc_node(_size, _flags, _node) kvmalloc_node(_size, (_flags)|__GFP_ZERO, _node) -static inline __alloc_size(1, 2) void *kvmalloc_array_noprof(size_t n, size_t size, gfp_t flags) +static inline __alloc_size(1, 2) void * - kvmalloc_array_node(size_t n, size_t size, gfp_t flags, int node) ++kvmalloc_array_node_noprof(size_t n, size_t size, gfp_t flags, int node) { size_t bytes; if (unlikely(check_mul_overflow(n, size, &bytes))) return NULL; - return kvmalloc_node(bytes, flags, node); - return kvmalloc_node_noprof(bytes, flags, NUMA_NO_NODE); ++ return kvmalloc_node_noprof(bytes, flags, node); } - static inline __alloc_size(1, 2) void * - kvmalloc_array(size_t n, size_t size, gfp_t flags) - { - return kvmalloc_array_node(n, size, flags, NUMA_NO_NODE); - } - - static inline __alloc_size(1, 2) void * - kvcalloc_node(size_t n, size_t size, gfp_t flags, int node) - { - return kvmalloc_array_node(n, size, flags | __GFP_ZERO, node); - } ++#define kvmalloc_array_noprof(...) kvmalloc_array_node_noprof(__VA_ARGS__, NUMA_NO_NODE) ++#define kvcalloc_node_noprof(_n,_s,_f,_node) kvmalloc_array_node_noprof(_n,_s,(_f)|__GFP_ZERO,_node) ++#define kvcalloc_noprof(...) kvcalloc_node_noprof(__VA_ARGS__, NUMA_NO_NODE) + - static inline __alloc_size(1, 2) void *kvcalloc(size_t n, size_t size, gfp_t flags) - { - return kvmalloc_array(n, size, flags | __GFP_ZERO); - } + #define kvmalloc_array(...) alloc_hooks(kvmalloc_array_noprof(__VA_ARGS__)) -#define kvcalloc(_n, _size, _flags) kvmalloc_array(_n, _size, _flags|__GFP_ZERO) -#define kvcalloc_noprof(_n, _size, _flags) kvmalloc_array_noprof(_n, _size, _flags|__GFP_ZERO) ++#define kvcalloc_node(...) alloc_hooks(kvcalloc_node_noprof(__VA_ARGS__)) ++#define kvcalloc(...) alloc_hooks(kvcalloc_noprof(__VA_ARGS__)) - extern void *kvrealloc(const void *p, size_t oldsize, size_t newsize, gfp_t flags) + extern void *kvrealloc_noprof(const void *p, size_t oldsize, size_t newsize, gfp_t flags) __realloc_size(3); + #define kvrealloc(...) 
alloc_hooks(kvrealloc_noprof(__VA_ARGS__)) + extern void kvfree(const void *addr); -DEFINE_FREE(kvfree, void *, if (_T) kvfree(_T)) +DEFINE_FREE(kvfree, void *, if (!IS_ERR_OR_NULL(_T)) kvfree(_T)) extern void kvfree_sensitive(const void *addr, size_t len); diff --cc include/linux/string.h index 10e5177bb49c,793c27ad7c0d..60168aa2af07 --- a/include/linux/string.h +++ b/include/linux/string.h @@@ -284,11 -282,12 +284,13 @@@ extern void kfree_const(const void *x) extern char *kstrdup(const char *s, gfp_t gfp) __malloc; extern const char *kstrdup_const(const char *s, gfp_t gfp); extern char *kstrndup(const char *s, size_t len, gfp_t gfp); - extern void *kmemdup(const void *src, size_t len, gfp_t gfp) __realloc_size(2); + extern void *kmemdup_noprof(const void *src, size_t len, gfp_t gfp) __realloc_size(2); + #define kmemdup(...) alloc_hooks(kmemdup_noprof(__VA_ARGS__)) + extern void *kvmemdup(const void *src, size_t len, gfp_t gfp) __realloc_size(2); extern char *kmemdup_nul(const char *s, size_t len, gfp_t gfp); -extern void *kmemdup_array(const void *src, size_t element_size, size_t count, gfp_t gfp); +extern void *kmemdup_array(const void *src, size_t element_size, size_t count, gfp_t gfp) + __realloc_size(2, 3); /* lib/argv_split.c */ extern char **argv_split(gfp_t gfp, const char *str, int *argcp); diff --cc io_uring/memmap.c index 523d982af2b0,000000000000..4785d6af5fee mode 100644,000000..100644 --- a/io_uring/memmap.c +++ b/io_uring/memmap.c @@@ -1,336 -1,0 +1,336 @@@ +// SPDX-License-Identifier: GPL-2.0 +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "memmap.h" +#include "kbuf.h" + +static void *io_mem_alloc_compound(struct page **pages, int nr_pages, + size_t size, gfp_t gfp) +{ + struct page *page; + int i, order; + + order = get_order(size); + if (order > MAX_PAGE_ORDER) + return ERR_PTR(-ENOMEM); + else if (order) + gfp |= __GFP_COMP; + + page = alloc_pages(gfp, order); + if (!page) + return ERR_PTR(-ENOMEM); + + for (i = 0; i < nr_pages; i++) + pages[i] = page + i; + + return page_address(page); +} + +static void *io_mem_alloc_single(struct page **pages, int nr_pages, size_t size, + gfp_t gfp) +{ + void *ret; + int i; + + for (i = 0; i < nr_pages; i++) { + pages[i] = alloc_page(gfp); + if (!pages[i]) + goto err; + } + + ret = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL); + if (ret) + return ret; +err: + while (i--) + put_page(pages[i]); + return ERR_PTR(-ENOMEM); +} + +void *io_pages_map(struct page ***out_pages, unsigned short *npages, + size_t size) +{ + gfp_t gfp = GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_NOWARN; + struct page **pages; + int nr_pages; + void *ret; + + nr_pages = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; + pages = kvmalloc_array(nr_pages, sizeof(struct page *), gfp); + if (!pages) + return ERR_PTR(-ENOMEM); + + ret = io_mem_alloc_compound(pages, nr_pages, size, gfp); + if (!IS_ERR(ret)) + goto done; + + ret = io_mem_alloc_single(pages, nr_pages, size, gfp); + if (!IS_ERR(ret)) { +done: + *out_pages = pages; + *npages = nr_pages; + return ret; + } + + kvfree(pages); + *out_pages = NULL; + *npages = 0; + return ret; +} + +void io_pages_unmap(void *ptr, struct page ***pages, unsigned short *npages, + bool put_pages) +{ + bool do_vunmap = false; + + if (!ptr) + return; + + if (put_pages && *npages) { + struct page **to_free = *pages; + int i; + + /* + * Only did vmap for the non-compound multiple page case. + * For the compound page, we just need to put the head. 
+ */ + if (PageCompound(to_free[0])) + *npages = 1; + else if (*npages > 1) + do_vunmap = true; + for (i = 0; i < *npages; i++) + put_page(to_free[i]); + } + if (do_vunmap) + vunmap(ptr); + kvfree(*pages); + *pages = NULL; + *npages = 0; +} + +void io_pages_free(struct page ***pages, int npages) +{ + struct page **page_array = *pages; + + if (!page_array) + return; + + unpin_user_pages(page_array, npages); + kvfree(page_array); + *pages = NULL; +} + +struct page **io_pin_pages(unsigned long uaddr, unsigned long len, int *npages) +{ + unsigned long start, end, nr_pages; + struct page **pages; + int ret; + + end = (uaddr + len + PAGE_SIZE - 1) >> PAGE_SHIFT; + start = uaddr >> PAGE_SHIFT; + nr_pages = end - start; + if (WARN_ON_ONCE(!nr_pages)) + return ERR_PTR(-EINVAL); + + pages = kvmalloc_array(nr_pages, sizeof(struct page *), GFP_KERNEL); + if (!pages) + return ERR_PTR(-ENOMEM); + + ret = pin_user_pages_fast(uaddr, nr_pages, FOLL_WRITE | FOLL_LONGTERM, + pages); + /* success, mapped all pages */ + if (ret == nr_pages) { + *npages = nr_pages; + return pages; + } + + /* partial map, or didn't map anything */ + if (ret >= 0) { + /* if we did partial map, release any pages we did get */ + if (ret) + unpin_user_pages(pages, ret); + ret = -EFAULT; + } + kvfree(pages); + return ERR_PTR(ret); +} + +void *__io_uaddr_map(struct page ***pages, unsigned short *npages, + unsigned long uaddr, size_t size) +{ + struct page **page_array; + unsigned int nr_pages; + void *page_addr; + + *npages = 0; + + if (uaddr & (PAGE_SIZE - 1) || !size) + return ERR_PTR(-EINVAL); + + nr_pages = 0; + page_array = io_pin_pages(uaddr, size, &nr_pages); + if (IS_ERR(page_array)) + return page_array; + + page_addr = vmap(page_array, nr_pages, VM_MAP, PAGE_KERNEL); + if (page_addr) { + *pages = page_array; + *npages = nr_pages; + return page_addr; + } + + io_pages_free(&page_array, nr_pages); + return ERR_PTR(-ENOMEM); +} + +static void *io_uring_validate_mmap_request(struct file *file, loff_t pgoff, + size_t sz) +{ + struct io_ring_ctx *ctx = file->private_data; + loff_t offset = pgoff << PAGE_SHIFT; + + switch ((pgoff << PAGE_SHIFT) & IORING_OFF_MMAP_MASK) { + case IORING_OFF_SQ_RING: + case IORING_OFF_CQ_RING: + /* Don't allow mmap if the ring was setup without it */ + if (ctx->flags & IORING_SETUP_NO_MMAP) + return ERR_PTR(-EINVAL); + return ctx->rings; + case IORING_OFF_SQES: + /* Don't allow mmap if the ring was setup without it */ + if (ctx->flags & IORING_SETUP_NO_MMAP) + return ERR_PTR(-EINVAL); + return ctx->sq_sqes; + case IORING_OFF_PBUF_RING: { + struct io_buffer_list *bl; + unsigned int bgid; + void *ptr; + + bgid = (offset & ~IORING_OFF_MMAP_MASK) >> IORING_OFF_PBUF_SHIFT; + bl = io_pbuf_get_bl(ctx, bgid); + if (IS_ERR(bl)) + return bl; + ptr = bl->buf_ring; + io_put_bl(ctx, bl); + return ptr; + } + } + + return ERR_PTR(-EINVAL); +} + +int io_uring_mmap_pages(struct io_ring_ctx *ctx, struct vm_area_struct *vma, + struct page **pages, int npages) +{ + unsigned long nr_pages = npages; + + vm_flags_set(vma, VM_DONTEXPAND); + return vm_insert_pages(vma, vma->vm_start, pages, &nr_pages); +} + +#ifdef CONFIG_MMU + +__cold int io_uring_mmap(struct file *file, struct vm_area_struct *vma) +{ + struct io_ring_ctx *ctx = file->private_data; + size_t sz = vma->vm_end - vma->vm_start; + long offset = vma->vm_pgoff << PAGE_SHIFT; + void *ptr; + + ptr = io_uring_validate_mmap_request(file, vma->vm_pgoff, sz); + if (IS_ERR(ptr)) + return PTR_ERR(ptr); + + switch (offset & IORING_OFF_MMAP_MASK) { + case IORING_OFF_SQ_RING: + 
case IORING_OFF_CQ_RING: + return io_uring_mmap_pages(ctx, vma, ctx->ring_pages, + ctx->n_ring_pages); + case IORING_OFF_SQES: + return io_uring_mmap_pages(ctx, vma, ctx->sqe_pages, + ctx->n_sqe_pages); + case IORING_OFF_PBUF_RING: + return io_pbuf_mmap(file, vma); + } + + return -EINVAL; +} + +unsigned long io_uring_get_unmapped_area(struct file *filp, unsigned long addr, + unsigned long len, unsigned long pgoff, + unsigned long flags) +{ + void *ptr; + + /* + * Do not allow to map to user-provided address to avoid breaking the + * aliasing rules. Userspace is not able to guess the offset address of + * kernel kmalloc()ed memory area. + */ + if (addr) + return -EINVAL; + + ptr = io_uring_validate_mmap_request(filp, pgoff, len); + if (IS_ERR(ptr)) + return -ENOMEM; + + /* + * Some architectures have strong cache aliasing requirements. + * For such architectures we need a coherent mapping which aliases + * kernel memory *and* userspace memory. To achieve that: + * - use a NULL file pointer to reference physical memory, and + * - use the kernel virtual address of the shared io_uring context + * (instead of the userspace-provided address, which has to be 0UL + * anyway). + * - use the same pgoff which the get_unmapped_area() uses to + * calculate the page colouring. + * For architectures without such aliasing requirements, the + * architecture will return any suitable mapping because addr is 0. + */ + filp = NULL; + flags |= MAP_SHARED; + pgoff = 0; /* has been translated to ptr above */ +#ifdef SHM_COLOUR + addr = (uintptr_t) ptr; + pgoff = addr >> PAGE_SHIFT; +#else + addr = 0UL; +#endif - return current->mm->get_unmapped_area(filp, addr, len, pgoff, flags); ++ return mm_get_unmapped_area(current->mm, filp, addr, len, pgoff, flags); +} + +#else /* !CONFIG_MMU */ + +int io_uring_mmap(struct file *file, struct vm_area_struct *vma) +{ + return is_nommu_shared_mapping(vma->vm_flags) ? 
0 : -EINVAL; +} + +unsigned int io_uring_nommu_mmap_capabilities(struct file *file) +{ + return NOMMU_MAP_DIRECT | NOMMU_MAP_READ | NOMMU_MAP_WRITE; +} + +unsigned long io_uring_get_unmapped_area(struct file *file, unsigned long addr, + unsigned long len, unsigned long pgoff, + unsigned long flags) +{ + void *ptr; + + ptr = io_uring_validate_mmap_request(file, pgoff, len); + if (IS_ERR(ptr)) + return PTR_ERR(ptr); + + return (unsigned long) ptr; +} + +#endif /* !CONFIG_MMU */ diff --cc kernel/module/main.c index 91e185607d4b,2d25eebc549d..d18a94b973e1 --- a/kernel/module/main.c +++ b/kernel/module/main.c @@@ -56,8 -56,8 +56,9 @@@ #include #include #include + #include #include +#include #include #include "internal.h" @@@ -1188,50 -1198,32 +1189,54 @@@ void __weak module_arch_freeing_init(st { } -static bool mod_mem_use_vmalloc(enum mod_mem_type type) +static int module_memory_alloc(struct module *mod, enum mod_mem_type type) { - return IS_ENABLED(CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC) && - mod_mem_type_is_core_data(type); -} + unsigned int size = PAGE_ALIGN(mod->mem[type].size); + enum execmem_type execmem_type; + void *ptr; -static void *module_memory_alloc(unsigned int size, enum mod_mem_type type) -{ - if (mod_mem_use_vmalloc(type)) - return vzalloc(size); - return module_alloc(size); + mod->mem[type].size = size; + + if (mod_mem_type_is_data(type)) + execmem_type = EXECMEM_MODULE_DATA; + else + execmem_type = EXECMEM_MODULE_TEXT; + + ptr = execmem_alloc(execmem_type, size); + if (!ptr) + return -ENOMEM; + + /* + * The pointer to these blocks of memory are stored on the module + * structure and we keep that around so long as the module is + * around. We only free that memory when we unload the module. + * Just mark them as not being a leak then. The .init* ELF + * sections *do* get freed after boot so we *could* treat them + * slightly differently with kmemleak_ignore() and only grey + * them out as they work as typical memory allocations which + * *do* eventually get freed, but let's just keep things simple + * and avoid *any* false positives. + */ + kmemleak_not_leak(ptr); + + memset(ptr, 0, size); + mod->mem[type].base = ptr; + + return 0; } - static void module_memory_free(struct module *mod, enum mod_mem_type type) -static void module_memory_free(void *ptr, enum mod_mem_type type, ++static void module_memory_free(struct module *mod, enum mod_mem_type type, + bool unload_codetags) { + void *ptr = mod->mem[type].base; + + if (!unload_codetags && mod_mem_type_is_core_data(type)) + return; + - if (mod_mem_use_vmalloc(type)) - vfree(ptr); - else - module_memfree(ptr); + execmem_free(ptr); } - static void free_mod_mem(struct module *mod) + static void free_mod_mem(struct module *mod, bool unload_codetags) { for_each_mod_mem_type(type) { struct module_memory *mod_mem = &mod->mem[type]; @@@ -1242,12 -1234,13 +1247,12 @@@ /* Free lock-classes; relies on the preceding sync_rcu(). */ lockdep_free_key_range(mod_mem->base, mod_mem->size); if (mod_mem->size) - module_memory_free(mod, type); - module_memory_free(mod_mem->base, type, - unload_codetags); ++ module_memory_free(mod, type, unload_codetags); } /* MOD_DATA hosts mod, so free it at last */ lockdep_free_key_range(mod->mem[MOD_DATA].base, mod->mem[MOD_DATA].size); - module_memory_free(mod, MOD_DATA); - module_memory_free(mod->mem[MOD_DATA].base, MOD_DATA, unload_codetags); ++ module_memory_free(mod, MOD_DATA, unload_codetags); } /* Free a module, remove from lists, etc. 
*/ @@@ -2287,7 -2309,7 +2299,7 @@@ static int move_module(struct module *m return 0; out_enomem: for (t--; t >= 0; t--) - module_memory_free(mod, t); - module_memory_free(mod->mem[t].base, t, true); ++ module_memory_free(mod, t, true); return ret; } diff --cc tools/testing/selftests/lib.mk index 33c24ceddfb7,1dae4a02957f..3023e0e2f58f --- a/tools/testing/selftests/lib.mk +++ b/tools/testing/selftests/lib.mk @@@ -52,24 -44,19 +52,33 @@@ endi selfdir = $(realpath $(dir $(filter %/lib.mk,$(MAKEFILE_LIST)))) top_srcdir = $(selfdir)/../../.. +# msg: emit succinct information message describing current building step +# $1 - generic step name (e.g., CC, LINK, etc); +# $2 - optional "flavor" specifier; if provided, will be emitted as [flavor]; +# $3 - target (assumed to be file); only file name will be emitted; +# $4 - optional extra arg, emitted as-is, if provided. +ifeq ($(V),1) +Q = +msg = +else +Q = @ +msg = @printf ' %-8s%s %s%s\n' "$(1)" "$(if $(2), [$(2)])" "$(notdir $(3))" "$(if $(4), $(4))"; +MAKEFLAGS += --no-print-directory +endif + ifeq ($(KHDR_INCLUDES),) -KHDR_INCLUDES := -isystem $(top_srcdir)/usr/include +KHDR_INCLUDES := -D_GNU_SOURCE -isystem $(top_srcdir)/usr/include endif + # In order to use newer items that haven't yet been added to the user's system + # header files, add $(TOOLS_INCLUDES) to the compiler invocation in each + # each selftest. + # You may need to add files to that location, or to refresh an existing file. In + # order to do that, run "make headers" from $(top_srcdir), then copy the + # header file that you want from $(top_srcdir)/usr/include/... , to the matching + # subdir in $(TOOLS_INCLUDE). + TOOLS_INCLUDES := -isystem $(top_srcdir)/tools/include/uapi + # The following are built by lib.mk common compile rules. # TEST_CUSTOM_PROGS should be used by tests that require # custom build rule and prevent common build rule use.
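
The include/linux/slab.h and include/linux/string.h hunks above show the pattern the "Memory allocation profiling" series applies across the tree: the real allocator body keeps its implementation under a *_noprof() name, and the familiar name becomes a macro that routes through alloc_hooks() so each call site can be charged for its allocations and reported via /proc/allocinfo. The sketch below is a minimal userspace model of that call-site accounting idea only; the names used here (callsite_stat, my_malloc*, dump_allocinfo) are invented for illustration and are not the kernel's implementation, which records per-callsite codetags and reports them through /proc/allocinfo.

/*
 * Minimal userspace sketch of call-site allocation accounting, modelled on
 * the alloc_hooks()/_noprof split visible in the slab.h hunk above.  All
 * names here are invented for illustration; this is not kernel code.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct callsite_stat {
	const char *file;
	int line;
	unsigned long calls;
	unsigned long bytes;
};

#define MAX_SITES 64
static struct callsite_stat sites[MAX_SITES];
static int nr_sites;

/* The "real" allocator, analogous to a *_noprof() variant. */
static void *my_malloc_noprof(size_t size)
{
	return malloc(size);
}

/* Wrapper that charges the allocation to the calling file:line. */
static void *my_malloc_profiled(size_t size, const char *file, int line)
{
	int i;

	for (i = 0; i < nr_sites; i++)
		if (sites[i].line == line && strcmp(sites[i].file, file) == 0)
			break;
	if (i == nr_sites && nr_sites < MAX_SITES) {
		sites[i].file = file;
		sites[i].line = line;
		nr_sites++;
	}
	if (i < MAX_SITES) {
		sites[i].calls++;
		sites[i].bytes += size;
	}
	return my_malloc_noprof(size);
}

/* Call sites keep using my_malloc(); the macro adds the attribution. */
#define my_malloc(size) my_malloc_profiled((size), __FILE__, __LINE__)

/* Rough analogue of reading /proc/allocinfo: bytes, calls, call site. */
static void dump_allocinfo(void)
{
	int i;

	for (i = 0; i < nr_sites; i++)
		printf("%10lu %8lu %s:%d\n", sites[i].bytes, sites[i].calls,
		       sites[i].file, sites[i].line);
}

int main(void)
{
	void *a = my_malloc(128);
	void *b = my_malloc(4096);

	dump_allocinfo();
	free(a);
	free(b);
	return 0;
}

The design point is the same as in the real series: the allocator body stays unchanged under the _noprof name, and the accounting cost is confined to a thin wrapper expanded at every call site, which is why the tree-wide change is mostly mechanical macro renames like the kzalloc and kmemdup hunks shown above.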