Replace P2ALIGN with P2ALIGN_TYPED and delete P2ALIGN.
With P2ALIGN, the result is incorrect when align is an unsigned integer and x is larger than the maximum value of align's type. In that case, -(align) is a positive integer, so its high bits are zero once it is converted to a larger integer type, and they stay zero after the '&'.
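The truncation can be reproduced with a small sketch. The macro bodies below are written as they commonly appear in illumos-derived sysmacros.h and should be treated as an assumption; the two wrapper functions are hypothetical and exist only to pin the argument types.

```c
#include <stdint.h>

/* Assumed macro definitions, as commonly found in sysmacros.h. */
#define	P2ALIGN(x, align)		((x) & -(align))
#define	P2ALIGN_TYPED(x, align, type)	((type)(x) & -(type)(align))

/*
 * Untyped: -(align) is evaluated in 32 bits and then zero-extended
 * to 64 bits, so the upper 32 bits of x are masked away.
 */
static inline uint64_t
align_untyped(uint64_t x, uint32_t align)
{
	return (P2ALIGN(x, align));
}

/* Typed: both operands are widened first, so the high bits survive. */
static inline uint64_t
align_typed(uint64_t x, uint32_t align)
{
	return (P2ALIGN_TYPED(x, align, uint64_t));
}
```

With x = 0x100001234 and align = 0x1000, the untyped form yields 0x1000 while the typed form yields 0x100001000.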
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Youzhong Yang <yyang@mathworks.com> Signed-off-by: Qiuhao Chen <chenqiuhao1997@gmail.com> Closes #15940
In dbuf_read_verify_dnode_crypt():
 - We don't need the original dbuf locked there. Instead take a lock on the dnode dbuf, which is the one actually manipulated.
 - Block decryption for a dnode dbuf if it is currently being written. The ARC hash lock does not protect anonymous buffers, so arc_untransform() is unsafe when used on buffers being written. That may happen with encrypted dnode buffers, since they are not copied by dbuf_dirty()/dbuf_hold_copy().
In dbuf_read():
 - If the buffer is in flight, recheck its compression/encryption status after it is cached, since it may need arc_untransform().
Tested-by: Rich Ercolani <rincebrain@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16104
When compressed ARC is disabled, we may have to re-compress when writing into L2ARC. If the re-compressed data does not fit into the original physical size, we should just fail immediately: even if it still fits into the allocation size, its checksum will never match.
While there, refactor the code similar to other compression places without using abd_return_buf_copy().
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16038
There is no reason for these module parameters to be read-only. When modified, they simply take effect on the next pool import or creation, which is useful for testing different values.
Reviewed-by: Rich Ercolani <rincebrain@gmail.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16118
As I understand it, the dnode hash includes 8 bits of the objset pointer, starting at bit 6, just to be less predictable. But since objset_t is more than 1KB in size, its allocations are likely aligned to 2KB, which means the 11 lower bits provide no entropy. Just take the 8 bits starting from bit 11.
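A minimal sketch of the bit selection, assuming 2KB-aligned allocations as described above. The helper ptr_bits() is hypothetical and is not the hash function itself; it only shows which pointer bits can carry entropy.

```c
#include <stdint.h>

/* Return the 8 bits of a pointer value that start at bit `shift`. */
static inline uint8_t
ptr_bits(uintptr_t p, unsigned shift)
{
	return ((uint8_t)((p >> shift) & 0xFF));
}
```

For any 2KB-aligned address, the low five of the eight bits taken at shift 6 (pointer bits 6 through 10) are always zero, so only three bits can vary. At shift 11 all eight bits can differ between allocations.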
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16131
Code for pools before version 11 uses dmu_objset_find_dp() to scan for child datasets/clones. It calls the enqueue_clones_cb() and enqueue_cb() callbacks in parallel from multiple taskq threads. This ends badly for scan_ds_queue_insert(), corrupting the scn_queue AVL tree. Fix it by introducing a mutex to protect those two scan_ds_queue_insert() calls. All other calls are made from the sync thread and so are serialized.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16162
Previous code overengineered the cloned range calculation by using BP_GET_LSIZE(). The problem is that legacy holes don't have a logical size, so the result will be wrong. But we also don't need to look at every block's size, since they all must be identical.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16165
ZAP: Fix leaf references on zap_expand_leaf() errors
Depending on the kind of error, zap_expand_leaf() may return with or without a valid leaf reference held. Make sure it returns NULL if, due to an error, it has no leaf to return. Make its callers check the returned leaf pointer, and release the leaf if it is not NULL.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #12366 Closes #16159
Originally Solaris didn't expect errors there, but they may happen if we fail to add an entry to the ZAP. Linux fixed it in #7421, but the fix was never fully ported to FreeBSD.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored-By: iXsystems, Inc. Closes #13215 Closes #16138
In the commit of the head_errlog feature we introduced a bug in dsl_dataset_promote_sync(): we may dereference origin_head and hds, both dereferencing ddpa after calling promote_sync() on ddpa.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Chunwei Chen <david.chen@nutanix.com> Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: George Amanakis <gamanakis@gmail.com> Closes #16272 Closes #16273
Linux 6.7 compat: detect if kernel defines intptr_t
Since Linux 6.7 the kernel has defined intptr_t. Clang has -Wtypedef-redefinition by default, which causes the build to fail because we also have a typedef for intptr_t.
Since it's better to use the kernel's if it exists, detect it and skip our own.
Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16201
Linux 6.9: Call add_disk() from workqueue to fix zfs_allow_010_pos (#16282)
The 6.9 kernel behaves differently in how it releases block devices. In the common case it will async release the device only after the return to userspace. This is different from the 6.8 and older kernels which release the block devices synchronously. To get around this, call add_disk() from a workqueue so that the kernel uses a different codepath to release our zvols in the way we expect. This stops zfs_allow_010_pos from hanging.
Fixes: #16089
Signed-off-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Linux has started moving to a model where instead of applying block queue limits through individual modification functions, a complete limits structure is built up and applied atomically, either when the block device is opened, or some time afterwards. As of 6.10 this transition appears only partly completed.
This commit matches that model within OpenZFS in a way that should work for past and future kernels. We set up a queue limits structure with any limits that have had their modification functions removed. For newer kernels that can have limits applied at block device open (HAVE_BLK_ALLOC_DISK_2ARG), we have a conversion function to turn the OpenZFS queue limits structure into Linux's queue_limits structure, which can then be passed in. For older kernels, we provide an application function that just calls the old functions for each limit in the structure.
Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Linux 6.10: work harder to avoid kmem_cache_alloc reuse
Linux 6.10 changed kmem_cache_alloc to be a macro, rather than a function, such that the old #undef for it in spl-kmem-cache.c would remove its definition completely, breaking the build.
This inverts the model used before. Rather than always defining the kmem_cache_* macros, then undefining them inside spl-kmem-cache.c, we instead make a special tag to indicate we're currently inside spl-kmem-cache.c, and don't define those macros in the first place, so we can use the kernel-supplied kmem_cache_* functions to implement spl_kmem_cache_*, as we expect.
For all other callers, we create the macros as normal and remove access to the kernel's own conflicting names.
Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Linux 5.16: use bdev_nr_bytes() to get device capacity
This helper was introduced long ago, in 5.16. Since 6.10, bd_inode no longer exists, but the helper has been updated, so detect it and use it in all versions where it is available.
Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
We're seeing failures for redacted_deleted and redacted_mount on FreeBSD 13-15:
09:58:34.74 diff: /dev/fd/3: No such file or directory
09:58:34.74 ERROR: diff /dev/fd/3 /dev/fd/4 exited 2
The test was trying to diff the file listings between two directories to see if they are the same. The workaround is to do a string comparison of the directory listings instead of using `diff`.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #16224
This commit fixes what is probably a copy-paste mistake. The `dracut.zfs` manpage claims that the `bootfs.rollback` option executes `zfs snapshot -Rf`. `zfs snapshot` does not have a `-R` option. `zfs rollback` does.
Signed-off-by: Alphan Yılmaz <alphanyilmaz@gmail.com> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Tony Hutter <hutter2@llnl.gov>
On fedora 40, on the 6.9.4 kernel (in updates-testing), assign_str expands to a "do {<stuff> } while(0)" loop. Without this semicolon, the while(0) is unterminated, causing a cascade of useless errors. With this semicolon, it compiles fine. It also compiles fine on 6.8.11 (the previous kernel). I have not tested earlier kernels than that, but at worst it should add a pointless semicolon.
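To illustrate why the semicolon matters, here is a small sketch with a hypothetical macro, ASSIGN_TWICE, standing in for the kernel's macro. Any macro that expands to a do { ... } while (0) body relies on the caller to supply the terminating semicolon.

```c
/* Hypothetical stand-in for a macro that expands to do { ... } while (0). */
#define	ASSIGN_TWICE(dst, src) \
	do { (dst) = (src); (dst) += (src); } while (0)

static inline int
use_macro(int v)
{
	int out = 0;

	/* Without the trailing semicolon, the while (0) is unterminated. */
	ASSIGN_TWICE(out, v);
	return (out);
}
```

If the macro instead expands to a plain expression or block on older kernels, the extra semicolon is just an empty statement.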
All other instances in the source tree are already terminated with semicolons.
Signed-off-by: Daniel Berlin <dberlin@dberlin.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Specifying a single test is kind of a hassle, because the full relative path under the test suite dir has to be included, but it's not always clear what that path even is.
This change allows `-t` to take the name of a single test instead of a full path. If the value has no `/` characters, we search for a file of that name under the test root, and if found, use that as the full test path instead.
Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Akash B <akash-b@hpe.com> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16088
The test runner accumulates output from individual tests, then writes it to the log at the end. If a test hangs or crashes the system half way through, we get no insight into how it got to where it did.
This adds a -D option for "debug". When set, all test output is written to stdout.
Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Akash B <akash-b@hpe.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16096
abd_iter_page: rework to handle multipage scatterlists
Previously, abd_iter_page() would assume that every scatterlist would contain a single page (compound or no), because that's all we ever create in abd_alloc_chunks(). However, scatterlists can contain multiple pages of arbitrary provenance, and if we get one of those, we'd get all the math wrong.
This reworks things to handle multiple pages in a scatterlist, by properly finding the right page within it for the given offset, and understanding better where the end of the page is and not crossing it.
Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reported-by: Brian Atkinson <batkinson@lanl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Brian Atkinson <batkinson@lanl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16108
find_system_library: fix var cleanup when library not found
The "not found" path is attempting to clear SOMELIB_CFLAGS and SOMELIB_LIBS by resetting them in AC_SUBST(). However, the second arg to AC_SUBST is expanded in autoconf with `m4_ifvaln([$2], [[$1]=$2])`, which is defined as "if the first arg is non-empty". The m4 "empty" construction is [], therefore, the existing AC_SUBST calls never modify the variables at all.
The effect of this is that leftovers from the library test can leak out. At least, if a library header is found in the first stage, but the library itself is not, -lsomelib is added to SOMELIB_LIBS and further tests done. If that library is not found, SOMELIB_LIBS will not be cleared.
For most of our library tests this hasn't been a problem, as they're either always found properly via pkg-config or set directly, or the calling test immediately aborts configure. For an optional dependency however, an apparent "partial" result where the header is found but no corresponding library causes link errors later.
I think a complete fix should probably not be setting SOMELIB_xxx until the final result is known, but for now, adjusting the AC_SUBST calls to explicitly set the empty shell string (which is not "empty" to m4) at least restores the intent.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16140
libspl/assert: show process/task details in assert output
Makes it much easier to see what thing complained.
Getting thread id, program name and thread name vary wildly between Linux and FreeBSD, so those are set up in macros. pthread_getname_np() did not appear in musl until very recently, but the same info has always been available via prctl(PR_GET_NAME), so we use that instead.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16140
If multiple threads trip an assertion at the same moment (quite common), they can be printing at the same time, and their output gets messy.
This adds a simple lock around the whole thing, to prevent a second task printing assert output before the first has finished.
Additionally, if libspl_assert_ok is not set, abort() is called without dropping the lock, so that any other asserting tasks will be killed before starting any output, rather than only getting part-way through. This is a tradeoff; it's assumed that multiple threads asserting at the same moment are likely the same fault in different instances of a thread, and so there won't be any more useful information from the other tasks anyway.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16140
libspl/assert: use libunwind for backtrace when available
libunwind seems to do a better job of resolving symbols than backtrace(), and is also useful on platforms that don't have backtrace() (eg musl). If it's available, use it.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16140
Unbreak FreeBSD cross-build on MacOS broken in 051460b8b
MacOS uses FreeBSD-compatible getprogname() and pthread_getname_np(). But pthread_getthreadid_np() does not exist on MacOS. This implements libspl_gettid() using pthread_threadid_np() to get the thread id of the current thread.
Tested with FreeBSD GitHub actions freebsd-src/.github/workflows/cross-bootstrap-tools.yml
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Signed-off-by: Martin Matuska <mm@FreeBSD.org> Closes #16167
The pthread_* functions are in -lpthread on FreeBSD. Some of them are implicitly linked through libc, but on FreeBSD 13 at least pthread_getname_np() is not. Just be explicit, since -lpthread is the documented interface anyway.
Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Closes #16168
ztest has a very nice ability to show a backtrace when there's an unexpected crash. zdb is used often enough on corrupted data and can blow up too, so nice output is useful there too.
Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16181
Use memset to zero stack allocations containing unions
C99 6.7.8.17 says that when an undesignated initialiser is used, only the first element of a union is initialised. If the first element is not the largest within the union, how the remaining space is initialised is up to the compiler.
GCC extends the initialiser to the entire union, while Clang treats the remainder as padding, and so initialises according to whatever automatic/implicit initialisation rules are currently active.
When Linux is compiled with CONFIG_INIT_STACK_ALL_PATTERN, -ftrivial-auto-var-init=pattern is added to the kernel CFLAGS. This flag sets the policy for automatic/implicit initialisation of variables on the stack.
Taken together, this means that when compiling under CONFIG_INIT_STACK_ALL_PATTERN on Clang, the "zero" initialiser will only zero the first element in a union, and the rest will be filled with a pattern. This is significant for aes_ctx_t, which in aes_encrypt_atomic() and aes_decrypt_atomic() is initialised to zero, but then used as a gcm_ctx_t, which is the fifth element in the union, and thus gets pattern initialisation. Later, it's assumed to be zero, resulting in a hang.
As confusing and undiscoverable as it is, by the spec, we are at fault when we initialise a structure containing a union with the zero initializer. As such, this commit replaces these uses with an explicit memset(0).
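The pattern can be sketched as follows. The types here (small_ctx_t, large_ctx_t, demo_ctx_t) are hypothetical stand-ins for a context structure whose union has a first member smaller than the largest one; they are not the real aes_ctx_t layout.

```c
#include <string.h>

typedef struct small_ctx { int a; } small_ctx_t;
typedef struct large_ctx { int a; unsigned char state[64]; } large_ctx_t;

/* Hypothetical context: the first union member is not the largest. */
typedef struct demo_ctx {
	union {
		small_ctx_t small;
		large_ctx_t large;
	} u;
} demo_ctx_t;

/* Zero every byte of the object, regardless of which member is used later. */
static inline void
demo_ctx_init(demo_ctx_t *ctx)
{
	memset(ctx, 0, sizeof (*ctx));
}

/* Return 1 if every byte of the context is zero, 0 otherwise. */
static inline int
demo_ctx_is_zero(const demo_ctx_t *ctx)
{
	const unsigned char *p = (const unsigned char *)ctx;

	for (size_t i = 0; i < sizeof (*ctx); i++) {
		if (p[i] != 0)
			return (0);
	}
	return (1);
}
```

Unlike an undesignated initialiser, memset() does not depend on how the compiler treats the bytes beyond the first union member.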
Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16135 Closes #16206
In case of error dmu_buf_fill_done() returns the buffer back into DB_UNCACHED state. Since during transition from DB_UNCACHED into DB_FILL state dbuf_noread() allocates an ARC buffer, we must free it here, otherwise it will be leaked.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #15665 Closes #15802 Closes #16216
- Add old eviction for special and dedup metaslab classes. Those vdevs may be big and fragmented with large metaslabs, while their asynchronous write pattern is not really different from the normal class. It seems an omission not to evict old metaslabs from them.
- If we have metaslab preload enabled, which means we are not too low on memory, do not evict active metaslabs even if they have not been used for some time. Evicting active metaslabs means we won't be able to write anything until we load them, which may take some time and is the exact opposite of metaslab preload's goals. For small systems the memory saving should be less important after the recent reduction in the number of allocators and thus open metaslabs.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Closes #16214
zts: test single-disk pool resumes properly after disk pull
A single disk pool should suspend when its disk fails and hold the IO. When the disk is returned, the pool should return and the IO be reissued, leaving everything in good shape.
Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Don Brady <don.brady@klarasystems.com>
After c3f2f1aa2, vdev_fault_wanted is set on a vdev after a probe fails. An end-of-txg async task is charged with actually faulting the vdev.
In a single-disk pool, the probe failure will degrade the last disk, and then suspend the pool. However, vdev_fault_wanted is not cleared. After the pool returns, the transaction finishes and the async task runs and faults the vdev, which suspends the pool again.
The fix is simple: when reopening a vdev, clear the async fault flag. If the vdev is still failed, the startup probe will quickly notice and degrade/suspend it again. If not, all is well!
Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Co-authored-by: Don Brady <don.brady@klarasystems.com> Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Jorgen Lundman <lundman@lundman.net> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Don Brady <don.brady@klarasystems.com>
These are used for DDT and BRT stores. There's limited information available to produce meaningful output, but at least we can put something on screen rather than crashing.
Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov>
FreeBSD: Use a statement expression to implement SET_ERROR() (#16284)
This way we can avoid making assumptions about the SDT probe implementation. No functional change intended.
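For readers unfamiliar with the construct, a GNU C statement expression lets a macro run side effects and still yield a value. The sketch below uses hypothetical names (fake_probe, SET_ERROR_SKETCH) and a counter in place of a real SDT probe; it is not the FreeBSD definition.

```c
/* Counts how often the hypothetical probe fired. */
static int probe_hits;

static inline void
fake_probe(int err)
{
	(void) err;
	probe_hits++;
}

/*
 * GNU C statement expression (GCC/Clang extension): evaluate the
 * argument once, fire the probe, then yield the error value.
 */
#define	SET_ERROR_SKETCH(err) \
	({ int _e = (err); fake_probe(_e); _e; })

static inline int
fail_with(int err)
{
	return (SET_ERROR_SKETCH(err));
}
```

Because the probe call is an ordinary statement inside the block, the macro does not care whether the probe expands to an expression or to something else.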
Signed-off-by: Mark Johnston <markj@FreeBSD.org> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Fix long_free_dirty accounting for small files (#16264)
For files smaller than recordsize, it's most likely that they don't have L1 blocks. However, the current calculation always returns at least 1 L1 block.
In this change, we check dnode level to figure out if it has L1 blocks or not, and return 0 if it doesn't. This will reduce the chance of unnecessary throttling when deleting a large number of small files.
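The idea can be sketched with a hypothetical helper. The function name, its parameters, and the range arithmetic are assumptions for illustration and do not mirror the actual DMU code; l1_span is the number of bytes covered by one L1 block and must be non-zero.

```c
#include <stdint.h>

/*
 * Hypothetical helper: count the L1 indirect blocks spanned by the
 * byte range [start, end). An object with a single level has no L1
 * blocks at all, so it contributes nothing to the accounting.
 */
static inline uint64_t
l1_blocks_in_range(int nlevels, uint64_t start, uint64_t end,
    uint64_t l1_span)
{
	if (nlevels <= 1 || end <= start)
		return (0);
	return ((end - 1) / l1_span - start / l1_span + 1);
}
```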
Signed-off-by: Chunwei Chen <david.chen@nutanix.com> Co-authored-by: Chunwei Chen <david.chen@nutanix.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
ZTS: Make do_vol_test() more deterministic (#16379)
- Explicitly disable compression since mkfile uses a zero buffer.
- Explicitly sync file systems instead of waiting for timeout.
Signed-off-by: Alexander Motin <mav@FreeBSD.org> Sponsored by: iXsystems, Inc. Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Tony Hutter <hutter2@llnl.gov>
[2.2.5-only] Make 'rmmod zfs' work after zfs-2.2.4 (#16406)
db65272ae was added to zfs-2.2.4 to stub in the VDEV_PROP_RAIDZ_EXPANDING enum without adding the RAIDz expansion feature. This was needed to provide the right enum count for when the VDEV_PROP_SLOW_IO properties got added. This had the unfortunate side effect of breaking module removal though.
Specifically, with the VDEV_PROP_RAIDZ_EXPANDING stub added, the module would correctly omit making kobjects for the RAIDz expansion vdev property, but then would try to blindly remove its non-existent kobjects during module unload.
This commit fixes the issue by checking for an uninitialized kobject.
Fixes: #16249
Signed-off-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ameer Hamza <ahamza@ixsystems.com> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
In pipe_build_write_buffer we incremented uio_iov but did not update uio_iovcnt. This would not cause an OOB read (thanks to uio_resid) but is inconsistent and could be an issue if other code changes are made in the future.
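A minimal sketch of keeping the two fields in step. The structures below are simplified, hypothetical stand-ins for struct iovec and struct uio, and uio_next_iov() is not a kernel function.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel's struct iovec and struct uio. */
struct iovec_sketch {
	char	*base;
	size_t	len;
};

struct uio_sketch {
	struct iovec_sketch	*iov;	/* current I/O vector */
	int			iovcnt;	/* vectors remaining, including iov */
	size_t			resid;	/* bytes remaining */
};

/* Advance to the next vector, updating the pointer and count together. */
static inline void
uio_next_iov(struct uio_sketch *uio)
{
	if (uio->iovcnt > 0) {
		uio->iov++;
		uio->iovcnt--;
	}
}
```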
Reported by: Synacktiv Reviewed by: jhb, markj, brooks Sponsored by: The Alpha-Omega Project Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45999
(cherry picked from commit d8ff42e816848a0d4a427755b46b8560cb86ebc8)
The `FBIN2FREQ()` and `FREQ2FBIN()` macros in `ar9300eep.h` are invoked in most places around the `ath_hal` code with a (effectively) boolean second argument, corresponding to "is this 2GHz?". But in the code that is warned about, the value `HAL_FREQ_BAND_2GHZ` is of a different non-boolean type, `HAL_FREQ_BAND`.
Update the `FBIN2FREQ()` and `FREQ2FBIN()` macros to interpret the second argument as boolean value, and rename the macro parameter names to better describe their meaning.
Merge commit d2353ae00c3b from llvm git (by Argyrios Kyrtzidis):
[utils/TableGen/X86CompressEVEXTablesEmitter.cpp] Make sure the tablegen output for the `checkPredicate` function is deterministic (#84533)
The output for the `checkPredicate` function was depending on a `std::map` iteration that was non-deterministic from run to run, because the keys were pointer values.
Make a change so that the keys are `StringRef`s so the ordering is stable.
This avoids non-determinism in llvm-tblgen output, which could cause differences in the generated X86GenCompressEVEXTables.inc file. Although these differences are not influencing the meaning of the generated code, they still change a few bytes in libllvm. This in turn influences all the binaries linked with libllvm, such as clang, ld.lld, etc.
Reported by: cperciva MFC after: 3 days
(cherry picked from commit 7a8d05ba19b7762596c0ff22e668e4d50bac81cf)
Have a separate PXEBOOTSIZE variable that acts much like LOADERSIZE variable to limit the size of the loader used for pxeldr. This allows people to override it independently of LOADERSIZE, which they may need to set larger for other reasons. Combined with PXEBOOT_DEFAULT_INTERP, you can build a larger lua loader, while still being able to build pxeldr with the 4th one, for example.
loader: Document that WITH_BEARSSL may need other tweaks
/boot/loader is right up against the 500k limit we have to make sure everything works in a wide variety of environments. However, adding WITH_BEARSSL can push it over the edge since we are so close to the limit with it enabled. One may also need to increase LOADERSIZE when enabling it. It's often safe to go much higher, especially when you don't plan on using pxeldr. Document this trade off here.
Make it possible to disable pxeboot. This loader will fail to build when it's too large. When /boot/loader needs to be larger like that, this option will disable a component whose build will fail. It is an explicit option rather than implicit when things are too large to force the user to make the explicit tradeoffs rather than wonder why they have a stale pxeboot or other odd failure mode.
MFC After: 3 days Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D46212
(cherry picked from commit 20d35d5817851df3a6d20e75df2e14a192b94940)
When the various loaders under stand/efi are built, the resulting binaries differ over multiple runs, even if WITH_REPRODUCIBLE_BUILD is used. This is caused by lld multithreading and the custom linker scripts for the loaders, and affects the following binaries:
intr: Remove dead code from intr_event_remove_handler()
We currently destroy the ithread in intr_event_destroy(). In preparation for fixing a bug there, remove this dead code and reorganize a bit to avoid some code duplication. No functional change intended.
Per https://sourceware.org/gdb/current/onlinedocs/gdb.html/Overview.html#Binary-Data certain bytes must be escaped. The XML register definitions we have so far do not run afoul of that rule, but the stub should handle them anyway.
msdosfs: fix cluster limit when mounting FAT-16 file systems
The maximum cluster number was calculated based on the number of data clusters that fit in the given partition size and the size of the FAT area. This limit did not take into account that the highest 10 cluster numbers are reserved and must not be used for files.
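As a rough sketch of the extra bound, assuming the ten reserved FAT-16 values are 0xFFF6 through 0xFFFF. The constant and function names below are hypothetical and are not taken from the msdosfs source.

```c
#include <stdint.h>

/* Assumed first reserved FAT-16 cluster value (0xFFF6..0xFFFF reserved). */
#define	FAT16_RSRVD_SKETCH	0xFFF6u

/* Clamp a size-derived maximum cluster number below the reserved range. */
static inline uint32_t
fat16_clamp_maxcluster(uint32_t computed_max)
{
	if (computed_max >= FAT16_RSRVD_SKETCH)
		return (FAT16_RSRVD_SKETCH - 1);
	return (computed_max);
}
```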
PR: 280347 Reported by: pho@FreeBSD.org
(cherry picked from commit 45d4e82bf61f91792142a2b9e2af657dab8500fd)
MFC jail: only chdir to user's home directory when user is specified
jail(8) with the "exec.clean" parameter not only cleans the environment variables before running commands, but also changes to the user's home directory. While this makes sense when a user is specified (via one of the exec.*_user parameters), it leads to all commands being run in the jail's /root directory even in the absence of an explicitly specified user. This can lead to problems when e.g. rc scripts are run from that non-world-readable directory, and runs counter to expectations that jail startup is analogous to system startup.
Restrict this behaviour to only users explicitly specified, either via the command line or jail parameters, but not the implicit root user. While this changes long-standing practice, it's the more intuitive action.
jexec(8) has the same problem, and the same fix.
PR: 277210 Reported by: johannes.kunde at gmail Differential Revision: https://reviews.freebsd.org/D46226
(cherry picked from commit 5cf705491727dd963485f9911ee3d52c3bf148db)
Introduce a new function, add_timeout_timespec(), to use timespec structs to handle timeouts. Make add_timeout() into a wrapper for the latter function to retain compatibility with the rest of the codebase. No functional change intended.
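The wrapper relationship might look like the sketch below. The `_sketch` names, the argument lists, and the static variables that record the last request are all assumptions for illustration; the real functions queue the timeout rather than storing it.

```c
#include <time.h>

/* Record of the most recent request, kept only to make the sketch observable. */
static struct timespec last_deadline;
static void (*last_cb)(void *);
static void *last_arg;

/* New entry point: the deadline carries nanosecond resolution. */
static inline void
add_timeout_timespec_sketch(struct timespec when, void (*where)(void *),
    void *what)
{
	last_deadline = when;
	last_cb = where;
	last_arg = what;
}

/* Old entry point, kept as a thin wrapper for existing callers. */
static inline void
add_timeout_sketch(time_t when_s, void (*where)(void *), void *what)
{
	struct timespec when = { .tv_sec = when_s, .tv_nsec = 0 };

	add_timeout_timespec_sketch(when, where, what);
}
```

Existing callers keep passing whole seconds, while new callers can request sub-second deadlines through the timespec variant.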
Sponsored by: Google LLC (GSoC 2024) Signed-off-by: Isaac Cilia Attard <icattard@FreeBSD.org> MFC after: 10 days Reviewed by: cperciva, brooks, Tom Hukins, Alexander Ziaee Pull Request: https://github.com/freebsd/freebsd-src/pull/1368
(cherry picked from commit 16a235f23c066d27b3a53c66cf6aa329be07cdb9)
Make arp_timeout available to dhclient.c, set the default timeout to 250 ms, and provide a new command-line argument, 'n' for setting the timeout to 0.
Sponsored by: Google LLC (GSoC 2024) Signed-off-by: Isaac Cilia Attard <icattard@FreeBSD.org> MFC after: 10 days Reviewed by: cperciva, brooks, Tom Hukins, Alexander Ziaee Pull Request: https://github.com/freebsd/freebsd-src/pull/1368
(cherry picked from commit b51569ad3c806688befc00dad51d15a7e61659fb)
dhclient: rc.conf option to disable ARP resolution
Introduce a new rc.conf option to not wait for ARP resolution within dhclient. This is plausible on many modern networks where it is possible to trust the DHCP server to know whether an IP address is available.
Sponsored by: Google LLC (GSoC 2024) Signed-off-by: Isaac Cilia Attard <icattard@FreeBSD.org> MFC after: 10 days Reviewed by: cperciva, brooks, Tom Hukins, Alexander Ziaee Pull Request: https://github.com/freebsd/freebsd-src/pull/1368
(cherry picked from commit 503adcdf1db35eab0f3d35392947a6da3bd19539)
The DHCP server in EC2 knows exactly which system should be using which IP address (and in fact EC2 has source IP filtering on by default) so there's no point ARPing an address before using it.
The preceding commits (changing the ARP wait time from 2 s to 250 ms) and this one (eliminating the wait entirely in EC2) reduce the time required for a newly launched FreeBSD/EC2 instance to launch by 2 seconds.
Discussed with: icattard MFC after: 10 days Sponsored by: Amazon
(cherry picked from commit 54a543d5ea3a58aee2f001498376127efea24bd2)
socket: Fix handling of listening sockets in sotoxsocket()
A lock needs to be held to ensure that the socket does not become a listening socket while sotoxsocket() is loading fields from the socket buffers, as the memory backing the socket buffers is repurposed when transitioning to a listening socket.
xen/netfront: Decouple XENNET tags from mbuf lifetimes
netmap's generic mode tries to improve performance by minimizing mbuf allocations. In service of this goal, it maintains an extra reference to the mbuf and polls the counter to see if the driver has released its reference by calling m_freem(). As a result, the extref destructor is not called when expected by the netfront driver, and mbufs tags are not freed.
Modify the tx path to release its mbuf tags promptly when reclaiming tx descriptors. They are drawn from a fixed-size pool, so otherwise are quickly exhausted when a netfront interface is in netmap generic mode.
Co-authored by: royger MFC after: 2 weeks Fixes: dabb3db7a817 ("xen/netfront: deal with mbuf data crossing a page boundary") Sponsored by: Cloud Software Group Sponsored by: Klara, Inc. Sponsored by: Zenarmor
(cherry picked from commit 2e4781cb12af2d13262ed5decf6fd95c8d58d9f5)
ithread: Improve synchronization in ithread_destroy()
Previously, to destroy an ithread we would set IT_DEAD in its flags, and then wake it up if it wasn't already running. After doing this, intr_event_destroy() would free the intr_event structure. However, it did not wait for the ithread to exit, so it was possible for the ithread to access the intr_event after it was freed.
This use-after-free happens readily when running the pf tests in parallel, since they frequently create and destroy VNET jails, and pf registers several VNET-local swi handlers.
Fix the race by modifying ithread_destroy() to wait until the ithread has signaled that it is about to exit by setting ie->ie_thread = NULL. Existing callers of intr_event_destroy() are allowed to sleep.
In 534ee17e6, pf state checking for ICMP(v6) was made stricter. This change failed to correctly set the pf_pdesc for ICMP-in-ICMP lookups, resulting in ICMP error packets potentially being dropped incorrectly. Specifically, it copied the ICMP header into a separate variable, not into the pf_pdesc.
Populate the required pf_pdesc fields for the embedded ICMP packet's state lookup.
When removing a user's home directory, if the directory is a ZFS dataset, it cannot be removed. If the directory has been emptied, use "zfs destroy" to destroy it. This complements the automatic dataset creation in adduser. Note that datasets within the directory and snapshots are not handled, as the complete path is not constructed.
While here, add waitpid() calls to rmat() and pw_user_del().
Reviewed by: des Differential Revision: https://reviews.freebsd.org/D45348
(cherry picked from commit d2f1f71ec8c62dd26d6169d0d671a5aa5a933c1a)
pf tests: ensure temporary files end up in the atf working directory
Many of the tests create temporary files: pid files, log files, tcpdump captures, and so on. We should take care to ensure they're stored in the temporary working directory Kyua creates rather than in the root directory.
This ensures there are no conflicts between simultaneously running tests, and also keeps the root directory clean.
vnet tests: verify that we can load if_epair and if_bridge
We're going to start running many of the vnet tests in nested jails (so they can run in parallel). That means the tests won't be able to load kernel modules, which we commonly do for if_epair and if_bridge.
Just assume that all vnet tests need this, because so many of them do that we don't want to manually annotate all of them. This is essentially a no-op on non-nested tests.
Do the same for the python test framework.
While here, also have pflog_init actually call pft_init. While having pflog loaded implies we have pf too, pft_init also checks for vimage support, and now for if_epair.
linux/zvol_os: fix SET_ERROR with negative return codes
SET_ERROR is our facility for tracking errors internally. The negation is to match what the kernel expects from us. Thus, the negation should happen outside of the SET_ERROR.
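A minimal sketch of the pattern, assuming a simplified stand-in for SET_ERROR that only records and returns its argument. The names `sketch_set_error`, `last_tracked_error`, and `sketch_zvol_op` are hypothetical and exist only for this illustration:

```c
#include <errno.h>

/* Hypothetical stand-in for the internal tracker: records and returns err. */
static int last_tracked_error;

static int
sketch_set_error(int err)
{
	last_tracked_error = err;	/* the tracker sees the positive errno */
	return (err);
}

/*
 * Hypothetical kernel-facing entry point. The kernel expects a negative
 * errno, so the value is negated after it has been tracked, not before.
 */
static int
sketch_zvol_op(int fail)
{
	if (fail)
		return (-sketch_set_error(EIO));	/* not sketch_set_error(-EIO) */
	return (0);
}
```

With the negation on the outside, the tracker always records a positive errno while the caller still receives the negative value it expects.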
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <rob.norris@klarasystems.com> Closes #16364
Make sure log records don't stray beyond valid memory region.
There is a lack of verification of the space occupied by fixed members of lr_t in zil_parse.
We can create a crafted image to trigger an out of bounds read by following these steps:
1) Do some file operations and reboot to simulate abnormal exit without umount
2) zil_chain.zc_nused: 0x1000
3) First lr_t
   lr_t.lrc_txtype: 0x0
   lr_t.lrc_reclen: 0x1000-0xb8-0x1
   lr_t.lrc_txg: 0x0
   lr_t.lrc_seq: 0x1
4) Update checksum in zil_chain.zc_eck
Fix: Add some checks to make sure the remaining bytes are large enough to hold a log record.
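The kind of check described can be sketched as follows. This is an illustration only: `sketch_lr_t` is a simplified stand-in for the fixed part of a log record, not the real lr_t layout, and `sketch_parse_records` is a hypothetical helper rather than the actual zil_parse code:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the fixed header of a log record (not the real lr_t). */
typedef struct sketch_lr {
	uint64_t lrc_txtype;
	uint64_t lrc_reclen;	/* total record length, header included */
	uint64_t lrc_txg;
	uint64_t lrc_seq;
} sketch_lr_t;

/*
 * Walk the records in buf[0..nused) and return how many were parsed,
 * or -1 if a record would stray beyond the valid region.
 */
static int
sketch_parse_records(const uint8_t *buf, size_t nused)
{
	size_t off = 0;
	int count = 0;

	while (off < nused) {
		sketch_lr_t lr;

		/* The fixed header itself must fit in the remaining bytes. */
		if (nused - off < sizeof (lr))
			return (-1);
		memcpy(&lr, buf + off, sizeof (lr));

		/* The record must cover its header and end inside the region. */
		if (lr.lrc_reclen < sizeof (lr) || lr.lrc_reclen > nused - off)
			return (-1);

		off += (size_t)lr.lrc_reclen;
		count++;
	}
	return (count);
}
```

A first record whose length leaves only one trailing byte, as in the crafted image above, is rejected on the next iteration because that byte cannot hold a record header.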
Signed-off-by: XDTG <click1799@163.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
FreeBSD: Fix RLIMIT_FSIZE handling for block cloning
ZFS implements copy_file_range(2) using block cloning when possible. This implementation must respect the RLIMIT_FSIZE limit.
zfs_clone_range() already checks the limit, so it is safe to remove this check in zfs_freebsd_copy_file_range(). Moreover, the removed check produces false positives: the length passed to copy_file_range(2) may be larger than the input file size; as the man page notes, "for best performance, call copy_file_range() with the largest len value possible." In particular, some existing code passes SSIZE_MAX there.
The check in zfs_clone_range() clamps the length to the input file's size before checking, but the removed check uses the caller supplied length, so something like
Linux 6.11: enable queue flush through queue limits
In 6.11 struct queue_limits gains a 'features' field, where, among other things, flush and write-cache are enabled. Detect it and use it.
Along the way, the blk_queue_set_write_cache() compat wrapper gets a little cleanup. Since both flags are always set together, it's now a single bool. Also the very very ancient version that sets q->flush_flags directly couldn't actually turn it off, so I've fixed that. Not that we use it, but still.
Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Rob Norris <robn@despairlabs.com> Sponsored-by: https://despairlabs.com/sponsor/ Closes #16400
Use /dev/urandom so we never have to wait on entropy.
Reviewed-by: George Melikov <mail@gmelikov.ru> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #16442
The 6.10 kernel broke our rpm-kmod builds. The 6.10 kernel really wants the source files in the same directory as the object files. This workaround makes rpm-kmod work again. It also updates the builtin kernel codepath to work correctly with 6.10.
See kernel commits:
b1992c3772e6 kbuild: use $(src) instead of $(srctree)/$(src) for source directory
9a0ebe5011f4 kbuild: use $(obj)/ instead of $(src)/ for common pattern rules
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #16439 Closes #16450
Fix null ptr deref when renaming a zvol with snaps and snapdev=visible (#16316)
If a zvol is renamed, and it has one or more snapshots, and snapdev=visible is true for the zvol, then the rename causes a kernel null pointer dereference error. This has the effect (on Linux, anyway) of killing the z_zvol taskq kthread, with locks still held; which in turn causes a variety of zvol-related operations afterward to hang indefinitely (such as udev workers, among other things).
The problem occurs because of an oversight in #15486 (e36ff84c338d2f7b15aef2538f6a9507115bbf4a). As documented in dataset_kstats_create, some datasets may not actually have kstats allocated for them; and at least at the present time, this is true for snapshots. In practical terms, this means that for snapshots, dk->dk_kstats will be NULL. The dataset_kstats_rename function introduced in the patch above does not first check whether dk->dk_kstats is NULL before proceeding, unlike e.g. the nearby dataset_kstats_update_* functions.
In the very particular circumstance in which a zvol is renamed, AND that zvol has one or more snapshots, AND that zvol also has snapdev=visible, zvol_rename_minors_impl will loop over not just the zvol dataset itself, but each of the zvol's snapshots as well, so that their device nodes will be renamed as well. This results in dataset_kstats_rename being called for snapshots, where, as we've established, dk->dk_kstats is NULL.
Fix this by simply adding a NULL check before doing anything in dataset_kstats_rename.
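A minimal sketch of the guard, assuming simplified stand-in types. `sketch_kstats_t`, `sketch_dataset_kstats_t`, and `sketch_kstats_rename` are hypothetical and only mirror the shape of the real structures:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; the real kstat structures carry far more state. */
typedef struct sketch_kstats {
	char name[64];
} sketch_kstats_t;

typedef struct sketch_dataset_kstats {
	sketch_kstats_t *dk_kstats;	/* NULL for datasets without kstats */
} sketch_dataset_kstats_t;

/* Returns 1 if the name was updated, 0 if there was nothing to rename. */
static int
sketch_kstats_rename(sketch_dataset_kstats_t *dk, const char *name)
{
	/* Snapshots may have no kstats allocated: bail out before touching them. */
	if (dk->dk_kstats == NULL)
		return (0);

	(void) snprintf(dk->dk_kstats->name, sizeof (dk->dk_kstats->name),
	    "%s", name);
	return (1);
}
```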
This still allows the dataset_name kstat value for the zvol to be updated (as was the intent of the original patch), and merely blocks attempts by the code to act upon the zvol's non-kstat-having snapshots. If at some future time, kstats are added for snapshots, then things should work as intended in that case as well.
Signed-off-by: Justin Gottula <justin@jgottula.com> Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Alan Somers <asomers@gmail.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov>
linux/zvol_os.c: Fix max_discard_sectors limit for 6.8+ kernel
In kernels 6.8 and later, the zvol block device is allocated with qlimits passed during initialization. However, the zvol driver does not set `max_hw_discard_sectors`, which is necessary to properly initialize `max_discard_sectors`. This causes the `zvol_misc_trim` test to fail on 6.8+ kernels when invoking the `blkdiscard` command. Setting `max_hw_discard_sectors` in the `HAVE_BLK_ALLOC_DISK_2ARG` case resolves the issue.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #16462
linux/zvol_os.c: cleanup limits for non-blk mq case
Rob Norris suggested that we could clean up redundant limits for the non-blk-mq case.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Rob Norris <robn@despairlabs.com> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #16462
Update the META file to reflect compatibility with the 6.10 kernel.
Reviewed-by: Rob Norris <robn@despairlabs.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Tony Hutter <hutter2@llnl.gov> Closes #16466
Upstream unbound includes a backup configure file which is distributed in the upstream tarball. It must be created by their release process and not deleted prior to packaging the tarball. I've received two emails so far asking about it. Let's remove it so nobody else asks about it.
(cherry picked from commit 51c8a9c1be57b6750e7c64e1379e8c33bd0f02c1)
libalias: fix subtle racy problem in outside-inside forwarding
sys/netinet/libalias/alias_db.c has an internal static function UseLink() that passes a link to CleanupLink() to verify whether the link has expired. If so, UseLink() may return NULL.
_FindLinkIn()'s usage of UseLink() is not quite correct.
Assume there is a "redirect_port udp" rule configured to forward incoming traffic for a specific port to some internal address. Such a rule creates a partially specified permanent link.
After the first such incoming packet, libalias creates a new fully specified temporary LINK_UDP with a default timeout of 60 seconds. Also, in case of low traffic, libalias may assign a "timestamp" for this new temporary link that is way in the past, because LibAliasTime is updated seldom and can keep an old value for tens of seconds, and that value will be used for the temporary link.
It may happen that the next incoming packet for the redirected port, when passed to _FindLinkIn(), results in a call to UseLink() that returns NULL due to detected expiration. An immediate return of NULL results in broken translation: either the packet is dropped (deny_incoming mode) or it is delivered to the original destination address instead of the internal one.
Fix it with an additional check for NULL in order to proceed with a search for the original partially specified link. In case of UDP, it also recreates the temporary fully specified link with a call to ReLink().
Practical examples are "redirect_port udp" rules for unidirectional SYSLOG protocol (port 514) or some low volume VPN encapsulated in UDP.
Thanks to Peter Much for initial analysis and first version of a patch.
Reported by: Peter Much <pmc@citylink.dinoex.sub.org> PR: 269770
(cherry picked from commit 8132e959099f0c533f698d8fbc17386f9144432f) (cherry picked from commit e5b85380836378c9e321a4e6d300591e6faf622a)
amdsmn(4), amdtemp(4): add support for AMD Ryzen 7 "Phoenix" processors
Adds support for AMD Ryzen 7 "Phoenix" processors (family 0x19, model 0x70-0x7f) to the amdsmn(4) and amdtemp(4) drivers. This enables temperature readings of these CPUs via sysctl.
The sensors function identically to those for the "Raphael" processors (model 0x60-0x6f); only the PCI device ID differs.
PR: kern/280942 Relnotes: yes MFC after: 3 days
(cherry picked from commit ef3f8aa0a0492487ac7db839de078b1913f61b4c)
Introduce a -V option, which can be used alongside -d (default unit change), in order to hot-swap devices (i.e., switch to them on the fly without needing to restart the track), in case virtual_oss(8) exists and is running.
Sponsored by: The FreeBSD Foundation MFC after: 2 days Reviewed by: dev_submerge.ch Differential Revision: https://reviews.freebsd.org/D46253
(cherry picked from commit 9aac27599acaffa21ff69c5be8a2d71d29cc3d6b)
- Use snd_afmt2str() to display format conversions in feeder_format, instead of the plain hex value.
- Simplify feeder_rate contents.
- Print "ch" (e.g., 2.1ch) after matrix values in feeder_matrix.
- Use snd_afmt2str() instead of plain hex for the rest of the feeder classes.
Sponsored by: The FreeBSD Foundation MFC after: 2 days Reviewed by: dev_submerge.ch Differential Revision: https://reviews.freebsd.org/D46309
(cherry picked from commit 0864dfe6299b75e424b845c0d0e1a593da905ae3)
This reverts commit fa03d37432caf17d56a931a9e6f5d9b06f102c5b.
This commit caused us to not send IGMP leave messages if the inpcb went away. In other words: we freed pending packets whenever the socket closed rather than when the interface (or address) goes away.
mcast: fix leaked igmp packets on multicast cleanup
When we release a multicast address (e.g. on interface shutdown) we may still have packets queued in inm_scq. We have to free those, or we'll leak memory.
nvme: Add SGL structure and constants for use in NVMe commands
Fabrics capsules use an SGL structure instead of prp1/2 addresses to describe the data buffer used for a command. The SGL structure is added to a union with the existing prp1/2 fields.
nvmecontrol: Display additional Fabrics-related fields for cdata
Some of these fields are specific to Fabrics controllers (such as the size of capsules) while other fields are shared with PCI-e controllers, but are more relevant for Fabrics controllers (such as KeepAlive timer properties).
nvmecontrol: Create letoh to generically convert to host order
Using _Generic, create letoh which will generically convert uintXX_t types from little endian to host, regardless of the size. This name has been floated as a possible addition to endian.h.
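The idea can be sketched as follows. This is a portable illustration, not the nvmecontrol implementation: the `sketch_le*toh` helpers are hypothetical and decode the bytes by hand so that the example does not depend on a platform endian header:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: interpret the in-memory bytes of v as little endian. */
static inline uint8_t
sketch_le8toh(uint8_t v)
{
	return (v);
}

static inline uint16_t
sketch_le16toh(uint16_t v)
{
	uint8_t b[sizeof (v)];

	memcpy(b, &v, sizeof (v));
	return ((uint16_t)(b[0] | (b[1] << 8)));
}

static inline uint32_t
sketch_le32toh(uint32_t v)
{
	uint8_t b[sizeof (v)];

	memcpy(b, &v, sizeof (v));
	return ((uint32_t)b[0] | (uint32_t)b[1] << 8 |
	    (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24);
}

static inline uint64_t
sketch_le64toh(uint64_t v)
{
	uint8_t b[sizeof (v)];
	uint64_t r = 0;

	memcpy(b, &v, sizeof (v));
	for (int i = 7; i >= 0; i--)
		r = (r << 8) | b[i];
	return (r);
}

/* Select the helper from the static type of x, then call it. */
#define	letoh(x) _Generic((x),		\
	uint8_t: sketch_le8toh,		\
	uint16_t: sketch_le16toh,	\
	uint32_t: sketch_le32toh,	\
	uint64_t: sketch_le64toh)(x)
```

A caller can then write `letoh(field)` for any fixed-width unsigned field without naming its width. Passing a type that is not listed fails at compile time.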
nvmecontrol: Make the error log page work on native format
As the number of page types proliferates, it becomes untenable to convert them in read_logpage (especially since new UUID page types will need to be supported). Convert the error page printing code to operate on little endian data.
nvmecontrol: Have to truncate on all 32-bit architectures
armv7, powerpc, powerpcspe and i386 all lack 128-bit integer types. Adjust the comment and #ifdef. I don't think we support nvme on any of these other architectures at the moment, but it won't hurt to be more precise.
We can't post an AER for this page, so there's no need to be able to swap it to host byte order. It's not one of the standard defined pages that can post via AER, and the vendor's public docs for this temperature page don't suggest it's possible to get over or under event changes. Since nvmecontrol no longer needs the swap routine, remove it since it's now unused.
This patch uses __ARM_ARCH set by compiler (both GCC and Clang have this) whenever possible instead of hardcoding it to 7. This change allows code to compile on earlier ARM architectures such as armv5te.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Shengqi Chen <harry-chen@outlook.com> Closes #15557
module/icp/asm-arm/sha2: enable non-SIMD asm kernels on armv5/6
My merged pull request #15557 fixes compilation of sha2 kernels on arm v5/6. However, the compiler guards only allow sha256/512_armv7_impl to be used when __ARM_ARCH > 6. This patch enables these ASM kernels on all arm architectures. Some compiler guards are adjusted accordingly to avoid the unnecessary compilation of SIMD (e.g., neon, armv8ce) kernels on old architectures.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Shengqi Chen <harry-chen@outlook.com> Closes #15623
contrib: link zpool to zfs in bash-completion (#16376)
Currently, users won't get completion for the `zpool` command until they trigger completion for `zfs` first. This patch adds a link to `zfs`, so that either command can be used to initialize the completion.
Fixes: #16320
Signed-off-by: Shengqi Chen <harry-chen@outlook.com> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Rob Norris <rob.norris@klarasystems.com> Reviewed-by: Tino Reichardt <milky-zfs@mcmilk.de>
ZTS: fix history_007_pos test on Ubuntu 24.04 (#16410)
The timezone "US/Mountain" isn't supported on newer Linux versions. Using the correct timezone "America/Denver", as is done on FreeBSD, will fix this. Older Linux distros should also behave correctly with this.
Signed-off-by: Tino Reichardt <milky-zfs@mcmilk.de> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: George Melikov <mail@gmelikov.ru>
zvol queue limits initialization depends on `zv_volblocksize`, but it is initialized later, leading to several limits being initialized with incorrect values, including `max_discard_*` limits. This also causes `blkdiscard` command to consistently fail, as `blk_ioctl_discard` reads `bdev_max_discard_sectors()` limits as 0, leading to failure. The fix is straightforward: initialize `zv->zv_volblocksize` early, before setting the queue limits. This PR should fix `zvol/zvol_misc/zvol_misc_trim` failure on recent PRs, as the test case issues `blkdiscard` for a zvol. Additionally, `zvol_misc_trim` was recently enabled in `6c7d41a`, which is why the issue wasn't identified earlier.
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Tony Hutter <hutter2@llnl.gov> Signed-off-by: Ameer Hamza <ahamza@ixsystems.com> Closes #16454
Some SCTP implementations will abort connections and then later re-use the same port numbers (i.e. both src and dst) for a new connection, before pf has fully purged the old connection.
Apply the same hack we already have for similarly misbehaving TCP implementations and forcibly remove the old state so we can create a new one.
nvmecontrol: Flesh out nvmecontrol format information
The format command takes a number of different parameters. Include a brief summary of what the values mean, though since the driver's support for metadata is at best weak, 0's are almost always used for values other than -f format. Add an example that ties it all together.
nvmecontrol: Allow optional /dev/ for device names
nvmecontrol operates on devices. Allow a user to specify the /dev/ if they want. Any device that starts with / will be treated as if it was a full path for maximum flexibility.
Sponsored by: Netflix
(cherry picked from commit b12cae88cfb6286bc85a47b36ddd84f52b5c38ca)
defaults/rc.conf: Remove /usr/lib32 from ldconfig32_paths
Commit 99132daf6f70cb0cc969c555d3612547fa3cf1db prepends /usr/lib32 to the list of paths in ldconfig32_paths since it is a standard library path in ld-elf32.so.1. Remove /usr/lib32 from the value in rc.conf so that it is not listed twice.
Reviewed by: olce, kib Sponsored by: University of Cambridge, Google, Inc. Differential Revision: https://reviews.freebsd.org/D44752
(cherry picked from commit 4bf5db113f760619bf754c22864b1d7e2acdeabd)
Enable L2 cache of all (MRU+MFU) metadata but MFU data only
`l2arc_mfuonly` was added to avoid wasting L2 ARC on read-once MRU data and metadata. However it can be useful to cache as much metadata as possible while, at the same time, restricting data cache to MFU buffers only.
This patch allows for such behavior by setting `l2arc_mfuonly` to 2 (or higher). The possible values are:
0: cache both MRU and MFU for both data and metadata;
1: cache only MFU for both data and metadata;
2: cache both MRU and MFU for metadata, but only MFU for data.
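The three modes can be sketched as a small predicate. `sketch_l2arc_eligible` is a hypothetical helper written for this illustration; it covers only the MRU/MFU and data/metadata decision described above and ignores every other L2ARC eligibility check:

```c
#include <stdbool.h>

/*
 * Hypothetical predicate: may a buffer be written to L2ARC under the
 * given l2arc_mfuonly setting?
 */
static bool
sketch_l2arc_eligible(int l2arc_mfuonly, bool is_mfu, bool is_metadata)
{
	if (is_mfu)
		return (true);		/* MFU buffers are cached in every mode */
	if (l2arc_mfuonly == 0)
		return (true);		/* 0: MRU data and metadata both allowed */
	if (l2arc_mfuonly == 1)
		return (false);		/* 1: no MRU buffers at all */
	return (is_metadata);		/* 2 or higher: MRU metadata only */
}
```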
Reviewed-by: Alexander Motin <mav@FreeBSD.org> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Gionatan Danti <g.danti@assyoma.it> Closes #16343 Closes #16402
tcp: initialize the LRO hash table with correct size
There will be at most lro_entries entries in the LRO hash table. So there is no need to take lro_mbufs into account, which only results in the LRO hash table being too large and therefore wasting memory.
Reviewed by: rrs Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D46378
(cherry picked from commit aa6c490bf80fcef15cfc0d3f562fae19ef2375aa)
Originally, a SYN-cache entry was always allocated and later freed, when not needed anymore. Then the allocation was avoided, when no SYN-cache entry was needed, and a copy on the stack was used. But the logic regarding freeing was not updated. This patch doesn't re-check conditions (which may have changed) when deciding to insert or free the entry, but uses the result of the earlier check. This simplifies the code and improves also consistency.
Reviewed by: glebius Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D46410
(cherry picked from commit e41364711ca3f7e214f9607ebedf62e03e51633d)
Stop shipping a log file for etcupdate. This is a source of non-reproducibility, as it uses mktemp, thereby guaranteeing the output is different on each run.
The tests correctly skip if neither snd_dummy nor a mixer is found, but the cleanup is still called with the skip condition, which fails if there is no mixer.
MFC after: 2 days Reviewed by: christos Differential Revision: https://reviews.freebsd.org/D46491
(cherry picked from commit 080c85127e3fba2c8cfb78cb75f7b306aee4028d)
When trying to use a VLAN device (e.g. "em0.123") with a dot the library fails to parse the interface correctly. The former pattern is much too restrictive given that almost all characters can be coerced into a device name via ifconfig.
Remove the particularly restrictive validation. Some characters still cannot be used in an interface name as they are used as delimiters in the syntax, but this allows most of them to be used without an issue.
Split out the common parts of building the uart devinfo from ACPI tables from the SPCR parser. This will be used when we support the DBG2 table to find the debug uart to be used by the kernel gdb stub.
Reviewed by: imp Sponsored by: Arm Ltd Differential Revision: https://reviews.freebsd.org/D44357
(cherry picked from commit 473c0b44ae8c51b2aebc51887714b2ed14de50bf)
To support recent extensions to the Arm architecture we may need to store more or larger registers when sending a signal.
To support this create a list of these extra registers. Userspace that needs to access a register in the signal handler can then walk the list to find the correct register struct and read/write its contents.
When we enable checking for BTI on arm64 we need to include an ELF note in all object files linked into a module.
As using objcopy from a binary to an ELF object file doesn't add the note, switch to using .incbin from an assembly file. This allows us to add the needed note without affecting the included object.
When returning from an exception to userspace clear the saved td_frame. On the next exception this should point to the frame, however this is not guaranteed.
To ensure the trap frame pointer is either valid or NULL clear it before returning to userspace in the EL0 synchronous exception handler.
When entering the kernel with the E2H field set the layout of the cnthctl_el2 register changes. Use the correct field locations to enable access to the counter and timer registers from EL1.
Sponsored by: Arm Ltd Differential Revision: https://reviews.freebsd.org/D45529
(cherry picked from commit 997511dffe651e1d2d708f37f2ced430a6ab3349)
In locore.S we need to configure access to the GICv3. To check if it's available we read the id_aa64pfr0_el1 register, however we then only check if a GICv3.0 or 4.0 is present. If the system has a GICv4.1 this check would fail.
Move to checking if the GICV3+ is not absent so this will still work if the field is updated again.
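The difference between the two checks can be sketched in C, although the real code is assembly in locore.S. The sketch assumes the GIC field occupies bits [27:24] of id_aa64pfr0_el1, with 0 meaning no system register interface, 1 meaning GICv3.0/4.0, and 3 meaning GICv4.1. The `SKETCH_` macros and both function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the GIC field in id_aa64pfr0_el1 (bits [27:24]). */
#define	SKETCH_PFR0_GIC_SHIFT	24
#define	SKETCH_PFR0_GIC_MASK	(UINT64_C(0xf) << SKETCH_PFR0_GIC_SHIFT)
#define	SKETCH_PFR0_GIC_NONE	(UINT64_C(0x0) << SKETCH_PFR0_GIC_SHIFT)
#define	SKETCH_PFR0_GIC_V3	(UINT64_C(0x1) << SKETCH_PFR0_GIC_SHIFT)
#define	SKETCH_PFR0_GIC_V4_1	(UINT64_C(0x3) << SKETCH_PFR0_GIC_SHIFT)

/* Old style: only an exact match on the GICv3.0/4.0 encoding passes. */
static bool
sketch_gic_check_exact(uint64_t pfr0)
{
	return ((pfr0 & SKETCH_PFR0_GIC_MASK) == SKETCH_PFR0_GIC_V3);
}

/* New style: anything other than "absent" passes, including future values. */
static bool
sketch_gic_check_not_absent(uint64_t pfr0)
{
	return ((pfr0 & SKETCH_PFR0_GIC_MASK) != SKETCH_PFR0_GIC_NONE);
}
```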
Sponsored by: Arm Ltd Differential Revision: https://reviews.freebsd.org/D45530
(cherry picked from commit 57ef7935eb114e98e7e554c5ffbded68fd038c04)
arm64: Ensure sctlr and pstate are in known states
Before entering the kernel exception level ensure sctlr_el2 and sctlr_el1 are in a known state. The EOS flag needs to be set to ensure an eret instruction is a context synchronization event.
Set spsr_el1 when entering the kernel from EL1 and use an eret instruction to return to the caller. This ensures the CPU pstate is consistent with the value in spsr_el1 as it is the only way to set it directly.
Sponsored by: Arm Ltd Differential Revision: https://reviews.freebsd.org/D45528
(cherry picked from commit 034c83fd7d85f57193850a73cc0ac957a211f725)
dev/uart: Add APMC0D08 as found in the Intel E2100
This uart has the requirement for 32-bit sized and aligned memory accesses. It is also described in the Serial Port Console Redirection Table (SPCR) with a different interface type value.
Reviewed by: imp Sponsored by: Arm Ltd Differential Revision: https://reviews.freebsd.org/D45834
(cherry picked from commit 9840598aa31f2a89272f5bef6545e316f254f0c6)
The only part of DEBUG_BUFRING we don't support in userspace is the mutex checks. Add _KERNEL checks around these so we can enable the extra debugging.
If a thread reads the head but then sleeps for long enough that another thread fills the ring and leaves the new head with the expected value then the cmpset can pass when it should have failed.
To work around this keep the full head and tail value and use the upper bits as a generation count.
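A minimal sketch of the generation-count idea using C11 atomics, assuming a power-of-two ring size. `struct sketch_ring`, `sketch_index`, and `sketch_claim` are hypothetical names and do not mirror the buf_ring layout:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define	SKETCH_RING_SIZE	4u			/* must be a power of two */
#define	SKETCH_RING_MASK	(SKETCH_RING_SIZE - 1)

struct sketch_ring {
	/* Full counter: low bits select the slot, upper bits act as a generation. */
	_Atomic uint32_t head;
};

/* Slot index for a given head value. */
static uint32_t
sketch_index(uint32_t head)
{
	return (head & SKETCH_RING_MASK);
}

/*
 * Try to advance head from the value the caller observed earlier. Because
 * the unmasked value is compared, this fails even when the ring has gone
 * through a whole lap and the slot index is the same again.
 */
static bool
sketch_claim(struct sketch_ring *r, uint32_t old_head)
{
	return (atomic_compare_exchange_strong(&r->head, &old_head,
	    old_head + 1));
}
```

If only the masked index were stored, a stale observer would see the same value after a full lap and its compare-and-set would wrongly succeed.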
Reviewed by: kib Sponsored by: Arm Ltd Differential Revision: https://reviews.freebsd.org/D46151
(cherry picked from commit 3cc603909e09c958e20dd5a8a341f62f29e33a07)
Use an atomic operation with a memory barrier loading br_cons_tail from the producer thread and storing to it in the consumer thread.
On dequeue we need to read the pointer value from the buf_ring before moving the consumer tail as that indicates the entry is available to be used. The store release atomic operation guarantees this.
In the enqueueing thread we then need to use a load acquire atomic operation to ensure writing to this entry can only happen after the tail has been read and checked.
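The pairing described above can be sketched with C11 atomics on a single-producer, single-consumer ring. This is an illustration under simplifying assumptions, not the buf_ring code: `struct sketch_spsc` and its two functions are hypothetical, and the multi-producer head handling is left out:

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_SLOTS	4u

struct sketch_spsc {
	void *slot[SKETCH_SLOTS];
	_Atomic uint32_t prod_tail;	/* written only by the producer */
	_Atomic uint32_t cons_tail;	/* written only by the consumer */
};

/* Producer side: returns 0 on success, -1 if the ring is full. */
static int
sketch_enqueue(struct sketch_spsc *r, void *p)
{
	uint32_t prod = atomic_load_explicit(&r->prod_tail,
	    memory_order_relaxed);
	/* Acquire: the slot is written only after the tail has been read. */
	uint32_t cons = atomic_load_explicit(&r->cons_tail,
	    memory_order_acquire);

	if (prod - cons == SKETCH_SLOTS)
		return (-1);
	r->slot[prod % SKETCH_SLOTS] = p;
	atomic_store_explicit(&r->prod_tail, prod + 1, memory_order_release);
	return (0);
}

/* Consumer side: returns the oldest entry, or NULL if the ring is empty. */
static void *
sketch_dequeue(struct sketch_spsc *r)
{
	uint32_t cons = atomic_load_explicit(&r->cons_tail,
	    memory_order_relaxed);
	uint32_t prod = atomic_load_explicit(&r->prod_tail,
	    memory_order_acquire);
	void *p;

	if (cons == prod)
		return (NULL);
	p = r->slot[cons % SKETCH_SLOTS];	/* read the pointer first */
	/* Release: the read above cannot be reordered after this store. */
	atomic_store_explicit(&r->cons_tail, cons + 1, memory_order_release);
	return (p);
}
```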
Reported by: Ali Saidi <alisaidi@amazon.com> Co-developed by: Ali Saidi <alisaidi@amazon.com> Reviewed by: markj Sponsored by: Arm Ltd Differential Revision: https://reviews.freebsd.org/D46152
(cherry picked from commit 44e1cfca417c5ef0db908f3836ec3ba704ef1de2)
As with br_cons_tail use an atomic load acquire to read br_prod_tail in buf_ring_dequeue_mc and buf_ring_peek*.
On dequeue we need to ensure we don't read the entry from the buf_ring until it is available and prod_tail has updated. There is already an appropriate store in the enqueue path and an appropriate load in the single consumer dequeue, we just need one in the other functions that read from the buf_ring.
When enqueueing on an architecture with a weak memory model ensure loading br->br_prod_head and br->br_cons_tail are ordered correctly.
If br_cons_tail is loaded first then other threads may perform a dequeue and enqueue before br_prod_head is loaded. This will mean the tail is one less than it should be and the code under the prod_next == cons_tail check could incorrectly be skipped.
buf_ring_dequeue_mc has the same issue with br->br_prod_tail and br->br_cons_head so needs the same fix.
Reported by: Ali Saidi <alisaidi@amazon.com> Co-developed by: Ali Saidi <alisaidi@amazon.com> Reviewed by: imp, kib, markj Sponsored by: Arm Ltd Differential Revision: https://reviews.freebsd.org/D46155
(cherry picked from commit fe2445f47d027c73aa7266669e7d94b70d3949a4)
When targeting Armv8.1 we can assume FEAT_LSE is available and can use the atomic instructions this provides without needing to check for support first.
Per pw(8), when -H is set, the password should be supplied already encrypted in a form suitable for writing directly to the password database (passwd in cloud-init terms); -h provides a special interface by which interactive scripts can set an account password using pw(8) in plain text (plain_text_passwd in cloud-init terms).
The default user (freebsd) is defined with a plain_text_passwd (freebsd), not with an encrypted one.
(cherry picked from commit 7b73ecfe648487c7706ac2b854dcf1435e60e4ca)
As stated in sshd(8), the recommended permissions for ~/.ssh are read/write/execute for the user, and not accessible by others; and the recommended permissions for ~/.ssh/authorized_keys are read/write for the user, and not accessible by others.
(cherry picked from commit 07d17ca189fcf3cc44b7706040b05ca8135c3b85)
Commit 07d17ca189fcf3cc44b7706040b05ca8135c3b85 set the recommended permissions for the SSH authorized keys file and directory. The tests, however, were failing on CI.
Use stat to check for the proper permissions.
Fixes: 07d17ca189f nuageinit: Set recommended SSH permissions Reported by: Jenkins
(cherry picked from commit 8edd6c07c8dafcc5828bceb5fea0684c7d0d0775)
libgeom: Avoid fixed remappings of the devstat device
libgeom maintains a quasi-private mapping of /dev/devstat, which might grow over time if new devices appear. When the mapping needs to be expanded, the old mapping is passed as a hint, but this appears to be unnecessary.
Simplify and improve things a bit:
- stop passing a hint when remapping,
- don't create a mapping in geom_stats_open(), as geom_stats_resync() will create it for us,
- check for errors from munmap().
Factor out the bits that run with the sock I/O lock held into a separate function. In this implementation, we are doing a bit more work under the I/O lock than before. However, lock contention is only a problem when multiple threads are transmitting on the same socket, which is an unusual case that is not expected to perform well in any case.
This is a production release to fix three bugs, none of which affects well formed scripts on FreeBSD:
The first bug is that bc/dc will exit on macOS when the terminal is resized.
The second bug is that an array, which should only be a function parameter, was accepted as part of larger expressions.
The third bug is that the value stack for dc was cleared on any error. However, this is not how other dc implementations behave. To bring dc more in line with other implementations, this behavior was changed. This change is why this version is a new major version.
(cherry picked from commit 54d20d67e2af28d948ce2df13feb039fa10900fc)
MFC after: 3 days
(cherry picked from commit 12e0d316644a4f80f5f1f78cf07bd93def43b1ca)
Building with GCC failed with the following error message:
error: to be safe all intermediate pointers in cast from 'char **' to 'const char **' must be 'const' qualified [-Werror=cast-qual]
This was caused by main() being declared with "char *argv[]" as the 3rd parameter, but argv later being passed cast to "const char**":
  113 |     if (BC_IS_BC) s = bc_main(argc, (const char**) argv);
      |                                     ^
This is fixed by declaring the 3rd parameter of main() as "const char *argv[]".
Reported by: CI MFC after: 3 days
(cherry picked from commit ef5752762ba9ec54d5c02023167d24bcdbb45fd7)
vendor/bc: upgrade to version 7.0.1
This update fixes building bc on FreeBSD with non-default compilers (GCC-12, GCC-13). GCC warned about casting argv from non-const to const and since warnings are treated as errors, the build failed.
(cherry picked from commit 1e19146fc7692f59e8dfc5da7957e938cd0b81b8) (cherry picked from commit 5b0dc991093c82824f6fe566af947f64f5072264)
openssl: Avoid type errors in EAI-related name check logic.
The incorrectly typed data is read only, used in a compare operation, so neither remote code execution, nor memory content disclosure were possible. However, applications performing certificate name checks were vulnerable to denial of service.
The GENERAL_NAME data type is a union, and we must take care to access the correct member, based on `gen->type`. Not all the member fields have the same structure, and a segfault is possible if the wrong member field is read.
The code in question was lightly refactored with the intent to make it more obviously correct.
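The rule of checking the type tag before touching a union member can be illustrated with a small sketch. The types and names below (name_type, general_name, name_matches) are hypothetical stand-ins and are not OpenSSL's definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not OpenSSL's real types. */
enum name_type { NAME_EMAIL, NAME_OTHER, NAME_UNKNOWN };

struct other_name {
	const char *oid;
	const char *value;
};

struct general_name {
	enum name_type type;
	union {
		const char *email;		/* valid only for NAME_EMAIL */
		struct other_name other;	/* valid only for NAME_OTHER */
	} d;
};

/*
 * Return 1 if the name carries a string equal to 'expected', else 0.
 * The tag is inspected first, so only the member that matches the tag
 * is ever read.
 */
int
name_matches(const struct general_name *gen, const char *expected)
{
	const char *s;

	assert(gen != NULL && expected != NULL);
	switch (gen->type) {
	case NAME_EMAIL:
		s = gen->d.email;
		break;
	case NAME_OTHER:
		s = gen->d.other.value;
		break;
	default:
		return (0);	/* unknown tag: read nothing */
	}
	return (s != NULL && strcmp(s, expected) == 0);
}
```

Reading d.email when the tag says NAME_OTHER would reinterpret the first pointer of struct other_name as an e-mail string, which is the kind of mismatch the commit describes.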
Following bluhm's advice, this changes the way we set up state keys and perform state lookups for ICMPv6 Neighbor Discovery packets: - replace the NS-dst with ND target address; - replace the NA-src with ND target address; - replace the NA-dst with unspecified address if it is a multicast.
This allows pf to match Address Resolution, Neighbor Unreachability Detection and Duplicate Address Detection packets to the corresponding states without the need to create new ones or match unrelated ones. As a side effect, we now do one state table lookup for ND packets instead of two.
Fixes a bug uncovered by one of the previous commits that virtually breaks IPv6 connectivity after a few minutes of use.
pf: be less strict about icmp state checking for sloppy state tracking
Sloppy state tracking renders the ICMP direction check useless and harmful, as we might see only half of the connection in asymmetric setups but ignore the state match. The bug was reported and the fix was verified by Insan Praja <insan () ims-solusi ! com>. Thanks! OK mcbride, henning
If pf_icmp_state_lookup() finds a state but rejects it for not matching the expected direction we should unlock the state (and NULL out *state). This simplifies life for callers, and also ensures there's no confusion about what a non-NULL returned state means.
Previously it could have been left in there by the caller, resulting in callers unlocking the same state twice.
During unpacking, we ensure that we do not read beyond the declared size. However, unpack uses a function that copies null-terminated strings. Prior to this commit, if the last string was not null-terminated, it could result in copying data into a buffer smaller than the allocated size.
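A minimal sketch of the missing check, assuming the unpacker knows how many bytes remain in the packed buffer. The helper name bounded_strlen is hypothetical and is not part of libnv:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the length of the string starting at 'buf' if a terminating NUL
 * is found within the first 'remaining' bytes, or -1 if the string would
 * run past the end of the data.
 */
long
bounded_strlen(const char *buf, size_t remaining)
{
	const char *nul;

	assert(buf != NULL);
	nul = memchr(buf, '\0', remaining);
	if (nul == NULL)
		return (-1);
	return ((long)(nul - buf));
}
```

Only when this returns a non-negative length is it safe to hand the pointer to a routine that copies NUL-terminated strings.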
Security: FreeBSD-24:09.libnv Security: CVE-2024-45288 Security: CAP-03 Reported by: Synacktiv Sponsored by: The Alpha-Omega Project Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D46138
(cherry picked from commit 3aaaca1b51ad844ef9e9b3d945217ab3dd189bae)
bhyve: fix Out-Of-Bounds read/write heap in tpm_ppi_mem_handler
The function tpm_ppi_mem_handler is vulnerable to buffer over-read and over-write. The MMIO handler serves the heap-allocated structure tpm_ppi_qemu, but the structure is smaller than 0x1000 bytes and the handler does not validate the offset and size (sizeof is 0x15A while the handler allows up to 0x1000 bytes).
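The kind of validation that was missing can be sketched as follows. The structure and function names (ppi_regs, ppi_read) are hypothetical, and this is not the bhyve code; only the 0x15A size is taken from the description above:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical backing object, smaller than the 0x1000-byte MMIO window. */
struct ppi_regs {
	uint8_t bytes[0x15A];
};

/*
 * Read 'size' bytes at 'offset' into *val. Returns 0 on success and -1 if
 * the access does not lie entirely inside the backing object. The range
 * check is written so that offset + size cannot overflow.
 */
int
ppi_read(const struct ppi_regs *r, uint64_t offset, uint64_t size,
    uint64_t *val)
{
	assert(r != NULL && val != NULL);
	if (size == 0 || size > sizeof(*val) || size > sizeof(r->bytes))
		return (-1);
	if (offset > sizeof(r->bytes) - size)
		return (-1);
	*val = 0;
	memcpy(val, &r->bytes[offset], (size_t)size);
	return (0);
}
```

A write path would need the same check before copying guest data into the structure.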
Reported by: Synacktiv Reviewed by: corvink Security: FreeBSD-SA-24:10.bhyve Security: CVE-2024-41928 Security: HYP-01 Sponsored by: The Alpha-Omega Project Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45980
(cherry picked from commit a06fc21e770a482c8915411ebc98c870e42dd29b)
The virtio_scsi device allows a guest VM to directly send SCSI commands to the kernel driver exposed on /dev/cam/ctl. This setup makes the vulnerability directly accessible from VMs through the pci_virtio_scsi bhyve device.
The function ctl_write_buffer sets the CTL_FLAG_ALLOCATED flag, causing the kern_data_ptr to be freed when the command finishes processing. However, the buffer is still stored in lun->write_buffer, leading to a Use-After-Free vulnerability.
Since the buffer needs to persist indefinitely, so it can be accessed by READ BUFFER, do not set CTL_FLAG_ALLOCATED.
Reported by: Synacktiv Reviewed by: Pierre Pronchery <pierre@freebsdfoundation.org> Reviewed by: jhb Security: FreeBSD-SA-24:11.ctl Security: CVE-2024-45063 Security: HYP-03 Sponsored by: Axcient Sponsored by: The Alpha-Omega Project Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D46424
(cherry picked from commit 670b582db6cb827a8760df942ed8af0020a0b4d0)
ctl: fix memory disclosure in read/write buffer commands
The functions ctl_write_buffer() and ctl_read_buffer() are vulnerable to a kernel memory disclosure caused by an uninitialized kernel allocation. If one of these functions is called for the first time for a given LUN, a kernel allocation is performed without the M_ZERO flag. Then a call to ctl_read_buffer() returns the content of this allocation, which may contain kernel data.
Reported by: Synacktiv Reviewed by: asomers Reviewed by: jhb Security: FreeBSD-SA-24:11.ctl Security: CVE-2024-8178 Security: HYP-05 Sponsored by: The Alpha-Omega Project Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45952
(cherry picked from commit ea44766b78d639d3a89afd5302ec6feffaade813)
ctl: fix Out-Of-Bounds access in ctl_report_supported_opcodes
This vulnerability is directly accessible to a guest VM through the pci_virtio_scsi bhyve device.
In the function ctl_report_supported_opcodes() accessible from the VM, the option RSO_OPTIONS_OC_ASA does not check the requested service_action value before accessing &ctl_cmd_table[].
Reported by: Synacktiv Reviewed by: asomers Security: FreeBSD-SA-24:11.ctl Security: CVE-2024-42416 Security: HYP-06 Sponsored by: The Alpha-Omega Project Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D46027
(cherry picked from commit af438acbfde3d25dbdc82b2b3d72380f0191e9d9)
umtx: shm: Collapse USHMF_REG_LINKED and USHMF_OBJ_LINKED flags
...into the only USHMF_LINKED, as they are always set or unset together.
This is both to stop giving the impression that they can be set/unset independently, which they can't with the current code, and to make it clearer that an upcoming reference counting fix is correct.
Reviewed by: kib Approved by: emaste (mentor) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D46126
(cherry picked from commit dd83da532c36830a0c0aac624903849262ec6f68)
umtx: shm: Fix use-after-free due to multiple drops of the registry reference
umtx_shm_unref_reg_locked() would unconditionally drop the "registry" reference, tied to USHMF_LINKED.
This is not a problem for caller umtx_shm_object_terminated(), which operates under the 'umtx_shm_lock' lock end-to-end, but it is for indirect caller umtx_shm(), which drops the lock between umtx_shm_find_reg() and the call to umtx_shm_unref_reg(true) that deregisters the umtx shared region (from 'umtx_shm_registry'; umtx_shm_find_reg() only finds registered shared mutexes).
Thus, two concurrent user-space callers of _umtx_op() with UMTX_OP_SHM and flags UMTX_SHM_DESTROY, both progressing past umtx_shm_find_reg() but before umtx_shm_unref_reg(true), would then decrease twice the reference count for the single reference standing for the shared mutex's registration.
Reported by: Synacktiv Reviewed by: kib Approved by: emaste (mentor) Security: FreeBSD-SA-24:14.umtx Security: CVE-2024-43102 Security: CAP-01 Sponsored by: The Alpha-Omega Project Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D46126
(cherry picked from commit 62f40433ab47ad4a9694a22a0313d57661502ca1)
This hardens against provoked use-after-free occurrences should there be reference counting leaks in the future (which is currently not the case).
At the deepest level, umtx_shm_find_reg_unlocked() now returns EOVERFLOW when it cannot grant an additional reference to the registry object, and so will umtx_shm_find_reg(). umtx_shm_create_reg() will fail if calling umtx_shm_find_reg() returns EOVERFLOW (meaning a SHM object for the passed key already exists, but we can't acquire another reference on it), avoiding the creation of a duplicate registry entry for a given key (this wouldn't pose a problem for the rest of the code in its current form, but is expressly avoided for intelligibility and hardening purposes).
Since umtx_shm_find_reg*(), and consequently the whole _umtx_op() system call, can only return EOVERFLOW on such a bug manifesting, we don't document that return value.
Reviewed by: kib, emaste Approved by: emaste (mentor) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D46126
(cherry picked from commit c3e6dfe55c0e81d0717b0458bc95128384c3ebe8)
sys: Mark ACL conversion routines as __result_use_check
Both acl_copy_oldacl_into_acl() and acl_copy_acl_into_oldacl() may fail in some circumstances (e.g., acl.acl_cnt exceeding the capacity of OLDACL_MAX_ENTRIES). This change marks both routines with __result_use_check, enforcing check for errors by the caller.
Suggested by: markj Reviewed by: markj, emaste Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D46254
(cherry picked from commit ef9fc9609a1ff53047577aa7cf51246fc04c954b)
For build reproducibility we set PE headers to an arbitrary timestamp. Nothing in FreeBSD uses this timestamp, but bump it from 2016 to 2024 so that the timestamp does not seem "too old" in case some third party tool is used to inspect EFI boot components.
Reviewed by: imp Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D46527
(cherry picked from commit 1b9cfd6a625dc82611846cb9a53c1886f7af3758)
intrng: Remove irq_root_ipicount and corresponding intr_pic_claim_root arg
The static irq_root_ipicount variable is only ever written to (with the value passed to intr_pic_claim_root), never read. Moreover, the bcm2836 driver, as used by the Raspberry Pi 2B and 3A/B (but not 4, which uses a GIC-400, though does have the legacy interrupt controller present too) passes 0 as ipicount, despite implementing IPIs. It's thus inaccurate and serves no purpose, so should be removed.
The arm and arm64 implementations of dispatching IPIs via PIC_IPI_SEND are almost identical, and entirely MI with the lone exception of a single store barrier on arm64 (that is likely either redundant or needed on arm too). Thus, de-duplicate this code by moving it to INTRNG as a generic IPI glue framework. The ipi_* functions remain declared in MD smp.h headers and implemented in MD code, but are trivial wrappers around intr_ipi_send that could be made MI, at least for INTRNG ports, at a later date.
Note that, whilst both arm and arm64 had an ii_send member in intr_ipi to abstract over how to send interrupts, they were always ultimately using PIC_IPI_SEND, and so this complexity has been removed. A follow-up commit will re-introduce the same flexibility by instead allowing a device other than the root PIC to be registered as the IPI sender.
As part of this, strengthen a MAXCPU assertion that was missed in commit 2f0b059eeafc ("intrng: switch from MAXCPU to mp_ncpus") (which itself is mis-titled).
intrng: Allow alternative IPI PICs to be registered and used
On RISC-V, the root PIC (whether the PLIC or, as will be the case in future, the local interrupt controller) cannot send IPIs, relying on another means to trigger the necessary software interrupts (firmware calls), but there are upcoming standard devices that will be able to inject them, so we can't just put the firmware calls in the root PIC driver.
Thus, split out a new intr_ipi_dev from intr_irq_root_dev to use for sending IPIs. New devices can be registered with a given priority up until the first IPI is set up, when the best device seen so far gets frozen as the IPI device to use.
This approach is based on the Arm PSCI driver, though that makes more extensive use of its softc than we do here. This will be used to extract the SBI IPI code as a real PIC.
riscv: Convert local interrupt controller to a newbus PIC
Currently the local interrupt controller implementation is based on pre-INTRNG arm/arm64 code, using hand-rolled event code rather than INTRNG. This then interacts weirdly with the PLIC, and other future interrupt controllers like the APLIC and IMSICs in the upcoming AIA specification, since they become the root PIC despite not being the logical root. Instead, use a real newbus device for it and register it as the root PIC.
This also adapts the IPI code to make use of the newly-added INTRNG generic IPI handling framework, adding a new sbi_ipi as the PIC. In future there will be alternative devices for sending IPIs that will register with higher priorities, such as the proposed AIA IMSIC and ACLINT SSWI.
This is a repeat of 63bf2d735ca3 ("Remove the unused arm64_cpu driver.") for RISC-V, which copied the defunct code from arm64 with no changes beyond substituting riscv64 for arm64, and made no use of it elsewhere. It has thus always been entirely superfluous.
bsdinstall: Fix netconfig script when no interfaces are present
The script uses [ -z "$INTERFACES" ] to check if the list of interfaces is empty and will exit early if so, but INTERFACES always contains at least a space due to the way it appends the list of wireless devices. Fix this by only adding the space when there are devices to append, mirroring the behaviour for non-wireless devices above (both will result in a redundant leading space when the list is non-empty, but that one is harmless).
Fixes: 159ca5c844cd ("Adapt to new wireless scheme where base wlan interfaces do not show up in ifconfig anymore.") MFC after: 1 week
(cherry picked from commit b809c7d6a26924ac351e49a15011da718cc3feec)
bsdinstall: Drop Error from title in netconfig no interfaces dialog
This isn't inherently an error. It is if you're attempting to download dist tarballs or later install packages, but a FreeBSD system with no NIC is a reasonable setup to have, especially in a throwaway VM setting, so we shouldn't say it is one.
Leaving the exit code as 1 is still fine, since auto will ignore it, and avoids breaking other uses.
MFC after: 1 week
(cherry picked from commit 7414d14bd51d8378057bbe952c2715b9f32d1d3e)
arm: Set NEW_PCIB in DEFAULTS rather than a subset of kernel configs
All other architectures set NEW_PCIB in DEFAULTS, with arm being the one remaining straggler that only sets it for GENERIC and TEGRA124. ARMADA38X and ARMADAXP contain device pci but don't set NEW_PCIB, however GENERIC claims to support them and as part of that NEW_PCIB support was added to mv_pci, so these configs are most likely just stale. Other than NOTES that just leaves ALPINE as the one kernel with PCI support not covered by GENERIC, but al_pci is supported by arm64 which enables NEW_PCIB, and it's just a generic_pcie_fdt_driver with some fixup code to deal with quirks so should support PCI_RES_BUS just fine. Therefore it is believed that all in-tree kernel configs support NEW_PCIB in reality, and so let's take a step towards removing all the non-NEW_PCIB code by having it always-on everywhere.
efibootmgr: Simplify make_next_boot_var_name and fix cnt == 0 case
If cnt == 0 we access element 0 unconditionally, which is out of bounds, and then if that doesn't crash and happens to be 0 we will access element - 1, also out of bounds, and then if that doesn't crash will add 1 to whatever junk is there and use that for the variable. On CHERI, though, this does crash. This code is also overly complicated, with unnecessary special cases and tracking more state than needed.
Rewrite it in a more general manner that doesn't need those special cases and naturally works for cnt == 0.
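One way to pick a number without special-casing an empty list is sketched below. It is an illustration only, not the code from the commit; the function name and the choice of "lowest unused number" are assumptions:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the lowest number that does not appear in used[0..cnt-1].
 * Among the cnt + 1 candidates 0..cnt at least one must be free, so the
 * search never looks beyond cnt, and cnt == 0 yields 0 without reading
 * the array at all.
 */
unsigned int
next_free_number(const unsigned int *used, size_t cnt)
{
	unsigned int candidate;
	size_t i;

	assert(used != NULL || cnt == 0);
	for (candidate = 0; candidate < cnt; candidate++) {
		for (i = 0; i < cnt; i++) {
			if (used[i] == candidate)
				break;
		}
		if (i == cnt)
			return (candidate);	/* not in use */
	}
	return ((unsigned int)cnt);
}
```

The quadratic search is acceptable for the handful of entries a boot menu normally has.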
Found by: CHERI Reviewed by: imp Fixes: 1285bcc833a3 ("Import Netflix's efibootmgr to help manage UEFI boot variables") MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D44029
(cherry picked from commit 09cb8031b43c8e98abb5ff9b43ff649031d1e808)
This used to be name = mktemp followed by fd = open downstream, replacing upstream's crude PID-based sprintf, but in 1.4.7 this was changed upstream to this buggy code, which we then picked up in the 1.5.0 import. Presumably nobody's actually used ee's ispell function in the past 15 years; that or it's just ended up using junk file names as temporary files if name's happened to be a valid address to something that can be interpreted as a string.
Reported by: Dapeng Gao <dapeng.gao@cl.cam.ac.uk> Fixes: 96b676e99984 ("Update ee(1) in the base system to version 1.5.0.") MFC after: 1 week
(cherry picked from commit 25a33bfe9ce2b55812201f475e9d3e64009b40dc)
This was true at time of commit, but the path was changed 2 weeks later to just be the /dev/flash/spiN name, without updating the manpage.
Reported by: David Gilbert <dgilbert@daveg.ca> Fixes: 68dd77957786 ("Give the mx25l device sole ownership of the name /dev/flash/spi* ...") MFC after: 1 week
(cherry picked from commit 703768a23590d8faf65b0f16dd395248ff7273f6)
Since commit 246364454fc1 ("etcupdate: Use new buildetc and installetc targets when available"), beinstall has been much slower for the etcupdate step, as it's been doing a kernel-toolchain (admittedly without LLVM itself being built). Given beinstall requires an object tree to already have been built and just installs it, we can pass -B to beinstall to reuse that tree rather than build kernel-toolchain in another one.
tools/build/make.py: Add missing comma to fix tinderbox and worlds
The missing comma meant this was interpreted as a single target called "tinderboxworlds", and so neither tinderbox nor worlds were recognised as being MI targets (i.e. still required TARGET(_ARCH) to be given).
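Python joins adjacent string literals into one, and C has the same rule, so the effect can be reproduced in C. In this sketch the "universe" entry and the helper is_listed are hypothetical; only the tinderbox and worlds names come from the description above:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Three entries intended, but the missing comma merges the last two. */
const char *const broken_targets[] = {
	"universe",
	"tinderbox"	/* <- missing comma */
	"worlds",
};

/* The same list with the comma restored. */
const char *const fixed_targets[] = {
	"universe",
	"tinderbox",
	"worlds",
};

/* Return 1 if 'name' is one of the first n entries of 'list', else 0. */
int
is_listed(const char *const *list, size_t n, const char *name)
{
	size_t i;

	assert(list != NULL && name != NULL);
	for (i = 0; i < n; i++) {
		if (strcmp(list[i], name) == 0)
			return (1);
	}
	return (0);
}
```

The broken array has two elements, the second being "tinderboxworlds", so a lookup for either "tinderbox" or "worlds" fails.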
Fixes: 5157b451c654 ("tools/build/make.py: Grow the list of MI targets")
(cherry picked from commit edec803c5b72681b39ce969cc16d634e08bb3ac2)
fu740_pci_dw: Fix PERST delay and keep asserted for rest of reset sequence
DELAY takes microseconds, not milliseconds, so 100 was too low. Moreover, when enabling hw.pci.clear_pcib, PCI enumeration would still stop at one of the first bridges, but by asserting PERST for the rest of the reset sequence that appears to be reliably addressed.
To comply with FIPS 140 guidance, you must be using a specifically validated and approved version of the fips module. Currently, only OpenSSL 3.0.8 and 3.0.9 have been approved by NIST for FIPS 140 validation. As such, we need to stop shipping later versions of the module in the base system.
This uses a better rational approximation to improve the accuracy of both functions. For exhaustive testing of asinf(3) in the interval, the current libm gives:
tcpdump: ppp: Use the buffer stack for the de-escaping buffer
This both saves the buffer for freeing later and saves the packet pointer and snapend to be restored when packet processing is complete, even if an exception is thrown with longjmp.
This means that the hex/ASCII printing in pretty_print_packet() processes the packet data as captured or read from the savefile, rather than as modified by the PPP printer, so that the bounds checking is correct.
That fixes CVE-2024-2397, which was caused by an exception being thrown by the hex/ASCII printer (which should only happen if those routines are called by a packet printer, not if they're called for the -X/-x/-A flag), which jumps back to the setjmp() that surrounds the packet printer. Hilarity^Winfinite looping ensues.
Also, restore ndo->ndo_packetp before calling the hex/ASCII printing routine, in case nd_pop_all_packet_info() didn't restore it.
Reviewed by: emaste
(cherry picked from commit f8860353d4f4c25bacdae5bc1cfb7a95edc9bfe0)
Commit d8a5961 made a change to nfsv4_sattr() that broke parsing of the settable attributes for a NFSv4 SETATTR. (It broke out of the code by setting "error" and returning right away, instead of noting the error in nd_repstat and allowing parsing of the attributes to continue.) By returning prematurely, it was possible for SETATTR to return the error, but with a bogus set of attribute bits set, since "retbits" had not yet been set to all zeros. (I am not sure if any client could be affected by this bug. The patch was done for a failure case detected by a pynfs test suite and not an actual client.)
While here, the patch also fixes a few cases where the value of attributes gets set for attributes after an error has been set in nd_repstat. This would not really break the protocol, since a SETATTR is allowed to set some attributes and still return a failure, but should not really be done.
(cherry picked from commit 5037c6398b2327366494a0434a894dc17ba8d023)
Allow the cloudware *_FLAVOURS and *_FSLIST values to be overridden at the command line, to assist users who want to e.g. build only one of the many EC2 AMIs available.
(cherry picked from commit 863975b6840b2833b0f772648ba2532806ffece8)
This makes it possible for a VM build configuration file to pass options to make installworld/installkernel/distribution, e.g. WITHOUT_DEBUG_FILES=YES in order to produce smaller images.
Note that these options are only applied at install time, not at build time (since the same build is installed into many different VM images), so not all src.conf options are usable here.
These are the same as the standard "base" images except:
* They don't have kernel or world debug symbols, * They don't have FreeBSD tests, * They don't have 32-bit libraries, * They don't have LLDB, * They don't have the Amazon SSM Agent pre-installed, * They don't default to installing the awscli at first boot.
This reduces the amount of disk space in use when the EC2 instance finishes booting from ~5 GB to ~1 GB.
MFC: ng_ipfw(4): add missing change after previous commit
The function ng_ipfw_input() used to enjoy implicit 32->16 bits truncation of ng_ipfw_findhook1's second argument. Make it explicit to recover from the breakage.
PR: 281082 Reported by: Ruben van Staveren <ruben@verweg.com> Tested by: Ruben van Staveren <ruben@verweg.com> Fixes: 20e1f207cc789a28783344614d6d1d1c639c5797
(cherry picked from commit becd0079c052cb87e7649b78733b99abae8861ee)
Several functions did not validate the slot index resulting in OOB read on the heap of the slot device structure which could lead to arbitrary reads/writes and potentially code execution.
Reported by: Synacktiv Reviewed by: markj (earlier), jhb Security: CVE-2024-41721 Security: HYP-02 Sponsored by: The Alpha-Omega Project Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45996
(cherry picked from commit e72d86ad9c62c8054d7977a71f08e68ef755c132)
ctladm.8: fix several errors in the "port" section
* Document the "-d" option. * Add the "-c" and "-r" options to the summary. * Correct the list of required options. * Clarify that the "-t" option is only for use with "-o", "-w", and "-W" * Replace references to the nonexistent "-n" with "-p".
Also, fix a few related error strings in the ctladm command.
ctladm: print port number with a successful "port -c" command
Make "ctladm port -c" print the port number of the newly created port. This way it won't have to be guessed by a subsequent "ctladm portlist" command. That means it's safe to use it concurrently with other ctladm processes. In particular, this allows the tests to be run in parallel.
ctladm: don't require the use of "-p" with "port -r"
When removing a port, the ioctl frontend requires the "-p" argument. But other frontends, like cfiscsi, do not. So don't require that argument in the ctladm command. The frontend driver will report an error if any required argument is missing.
ctladm: deprecate the undocumented "port -l" option
It was mostly removed from the man page in 9c887a4f86f5fd4f51c23443dc8435e52783a782, but left in the command. Fully remove it from the man page, and warn if anybody uses it. Remove it entirely for FreeBSD 16.
ctld: parse config file independently of getting kernel info
Separate the parsing of the config file from the reading of kernel port information. This has three benefits:
* Separation of concerns makes future changes easier. * Allows the config file to be read earlier, which is necessary for fixing PR 271460. * Reduces total line count, by eliminating duplication between parse.y (for traditional config file) and uclparse.c (for UCL config file).
The targ example program doesn't compile with current clang, and probably hasn't for multiple releases. Fix the build. I don't have the right hardware to test it, though.
Sponsored by: Axcient
(cherry picked from commit 873881b7dbb72077f3723f49a9f10a432231c532)
If a user does pathconf(_, _PC_MIN_HOLE_SIZE) on a fusefs file system, the kernel must actually issue a FUSE_LSEEK operation in order to determine whether the server supports it. We cache that result, so we only have to send FUSE_LSEEK the first time that _PC_MIN_HOLE_SIZE is requested on any given mountpoint.
Problem 1:
Unlike fpathconf, pathconf operates on files that may not be open. But FUSE_LSEEK requires the file to be open. As described in PR 278135, FUSE_LSEEK cannot be sent for unopened files, causing _PC_MIN_HOLE_SIZE to wrongly report EINVAL. We never noticed that before because the fusefs test suite only uses fpathconf, not pathconf. Fix this bug by opening the file if necessary.
Problem 2:
On a completely sparse file, with no data blocks at all, FUSE_LSEEK with SEEK_DATA would fail with ENXIO. That's correct behavior, but fuse_vnop_pathconf wrongly interpreted that as "FUSE_LSEEK not supported". Fix the interpretation.
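The corrected interpretation can be summarised as a small classification function. This is a sketch under the assumption that a FUSE server signals an unimplemented operation with ENOSYS; the enum and function names are hypothetical and do not appear in fusefs:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical result of probing a server with a SEEK_DATA request. */
enum lseek_support {
	LSEEK_SUPPORTED,
	LSEEK_UNSUPPORTED,
	LSEEK_UNKNOWN
};

/*
 * Classify the error number returned by the probe. ENXIO only means that
 * there is no data at or after the requested offset, so the operation
 * itself is supported. ENOSYS is assumed to mean "not implemented".
 */
enum lseek_support
classify_lseek_result(int err)
{
	assert(err >= 0);
	switch (err) {
	case 0:
	case ENXIO:
		return (LSEEK_SUPPORTED);
	case ENOSYS:
		return (LSEEK_UNSUPPORTED);
	default:
		return (LSEEK_UNKNOWN);	/* e.g. a transient I/O error */
	}
}
```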
kernel: Make some compile time constant variables const
Those variables are not going to be changed at runtime. Make them const to avoid potential overwriting. This will also help spotting accidental global variables shadowing, since the variable's name such as `version` is short and commonly used.
This change was inspired by reviewing khng's work D44760.
The test suite runs the same tests twice, as different users, and these can trample over each other when run in parallel, causing spurious test failures.
MFC after: 1 week
(cherry picked from commit 41ece3c036bda3d4da321989ee59d0555c10d603)
netinet: Explicitly disallow connections to the unspecified address
If the V_connect_ifaddr_wild sysctl says that we shouldn't infer a destination address, return an error. Otherwise it's possible for use of an unspecified foreign address to trigger a subsequent assertion failure, for example in in_pcblookup_hash_locked().
Similarly, if no interface addresses are assigned, fail quickly upon an attempt to connect to the unspecified address.
All uses of this function were incorrect. if_amcount is a reference count which tracks the number of times the network stack internally set IFF_ALLMULTI. (if_pcount is the corresponding counter for IFF_PROMISC.)
Remove if_getamcount() and fix up callers to get the number of assigned multicast addresses instead, since that's what they actually want.
pkgbase: Unify pkg ABI handling for pkgbase targets
Right now, to get the pkg ABI we either use PKG_ABI, derived from newvers.sh, or use an ABI file from the staged world. This inconsistency is confusing and can cause problems.
Switch to a single source of truth: use an ABI file from the worldstage dir to get the ABI of pkgbase packages. In particular, we do not need to know the ABI until staging is done. More specifically: - use a shell command to define PKG_ABI, - replace inline uses of ABI_FILE, - run sign-packages in a subshell (this was already done for the update-packages target) so that the staging targets are done before we try to evaluate the ABI.
Reviewed by: manu MFC after: 1 month Sponsored by: Innovate UK Differential Revision: https://reviews.freebsd.org/D46287
(cherry picked from commit b118b6eb4cb7520eb348a6ac965b077fc5179fde)
To build the packages target, we build src and src-sys packages containing the source code from which the repo was built. These packages take significantly longer than the others, presumably because they contain many more files. Because both source packages are built to satisfy the same target, they end up being built serially. Split them into separate subtargets so that they can run in parallel. This saves a couple of minutes on my build machine.
pkgbase: Make src package creation recipes more precise
Just remove the plist created by the respective rule. Otherwise the two recipes can race with each other.
Fixes: d7d5c9efef03 ("pkgbase: Let source packages be built in parallel") Reviewed by: bapt, emaste Reported by: Mark Millard <marklmi@yahoo.com> Differential Revision: https://reviews.freebsd.org/D46320
(cherry picked from commit d02dcf21eea3973a714294b011537c2af6c747fa)
The in-tree ZFS test suite is somewhat outdated and I see a number of failures there. I tend to think that we want to integrate the OpenZFS test suite somehow, replacing the legacy one, though it's also possible to run that as a separate test suite.
In any case, if one wants to run the OpenZFS test suite separately, it's useful to be able to disable installation of the legacy ZFS test suite, so let's provide a src.conf option to do that.
ifnet: Add handling for toggling IFF_ALLMULTI in ifhwioctl()
IFF_ALLMULTI has an associated activation counter and so needs special treatment, like IFF_PROMISC. Introduce IFF_PALLMULTI, akin to IFF_PPROMISC, which indicates that userspace requested allmulti mode, and handle it specially in ifhwioctl().
Similar to "promisc", this allows the IFF_ALLMULTI flag to be toggled from userspace if it happens to be useful to disable multicast packet filtering. One use-case is when implementing IPv6 neighbour discovery over netmap.
Revert "tzsetup: symlink /etc/localtime instead of copying"
This failed when used with tzsetup's -C option (for example, when using etcupdate -D to update a jail from the host). Revert the stable/14 MFC for now; will be reapplied after being fixed in main.
This reverts commit fc43a1b6842afa806dfd7ba48de5bece63d04456. This reverts commit 87f7f0389f8b7bf30ef12df5c0d337cb2789883e.
Closes: 280538 Fixes: cf8a18 (back out logging to /var/log/adduser) MFC after: 3 days Reported by: Herbert Baerschneider <herbert.baerschneider@protonmail.com>
kernel: Fix defining of .init_array and .fini_array sections
These input sections can have decimal numbers as the priority suffix. Clang emits the '%u' form, while SORT is an alias for SORT_BY_NAME, hence will result in wrong order of constructors / destructors in output sections. Fix by using the correct sorting command SORT_BY_INIT_PRIORITY instead [1].
The functions referenced by the .fini_array section are listed in the normal order but are executed in the reverse order. The listed order is the same as for the .init_array section.
Currently these sections are not used, so there should be no functional change.
Note: As for the .ctors and .dtors sections, both Clang and GCC emit the priority suffix in the form of '%05u', so there is no semantic difference between SORT_BY_NAME and SORT_BY_INIT_PRIORITY for those sections [2].
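The difference between the two orderings comes down to comparing the suffix as text versus as a number. The sketch below models that with two comparators; cmp_by_priority is a simplified stand-in that only compares the decimal suffix and does not reproduce everything the linker does:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Compare two section names as plain strings, as a sort by name would. */
int
cmp_by_name(const char *a, const char *b)
{
	return (strcmp(a, b));
}

/*
 * Compare two section names by the decimal number after the last '.'.
 * Returns a negative, zero or positive value like strcmp().
 */
int
cmp_by_priority(const char *a, const char *b)
{
	const char *da, *db;
	unsigned long pa, pb;

	da = strrchr(a, '.');
	db = strrchr(b, '.');
	assert(da != NULL && db != NULL);
	pa = strtoul(da + 1, NULL, 10);
	pb = strtoul(db + 1, NULL, 10);
	return ((pa > pb) - (pa < pb));
}
```

With unpadded suffixes, ".init_array.10" sorts before ".init_array.9" by name but after it by number. With zero-padded suffixes such as ".ctors.00009" and ".ctors.00010", both comparators agree, which is why the choice of sort command makes no difference there.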
This fix is extracted from a bigger patch [3] of hselasky, with additional fix for .fini_array section.
kernel: Add definition of .init_array and .fini_array for all other platforms
Currently these sections are not used but defined only for amd64 and i386. Added them for all other platforms to keep all platforms in sync. There should be no functional change.
This change is extracted from a bigger patch [1] of hselasky, with additional fix for the order of .fini_array section.
An interface's bpf could feasibly not exist, in which case bpf_peers_present() would panic from a NULL pointer dereference. Solve this by adding a new IfAPI that could deal with a NULL bpf, if such could occur in the network stack.
This comment was introduced by fix [1], later the fix was refined by change [2], and the context of the usage of `m_get2()` and `m_getjcl()` got lost, then the comment became obscure.
Update to reflect the current behavior.
1. f13da24715a7 net/bpf: Fix writing of buffer bigger than PAGESIZE 2. a051ca72e281 Introduce m_get3()
Fixes: a051ca72e281 Introduce m_get3() MFC after: 3 days
(cherry picked from commit 343bf78e487190557889c8ba53d8080b268867f7)
The change of its description from integer to boolean didn't actually change it to a boolean, but only made it impossible to set as either a boolean or an integer.
Rather than make it work as a boolean parameter should, just revert to the old (working) integer parameter, and change the documentation to match.
PR: 274263 Reported by: andrew.hotlab at hotmail
(cherry picked from commit ae1a0648b05acf798816e7b83b3c10856de5c8e5)
Fix "singleton" function used by regcomp() to turn character set matches into exact character matches if a character set has exactly one element.
The underlying cset representation is complex; most critically it records "small" characters (codepoint less than either 128 or 256 depending on locale) in a bit vector, and "wide" characters in a secondary array.
Unfortunately, the "singleton" function used to identify singleton sets treated a cset as a singleton if either the "small" or the "wide" set had exactly one element (it would then ignore the other set).
The easiest way to demonstrate this bug:
$ export LANG=C.UTF-8
$ echo 'a' | grep '[abà]'
It should match (and print "a") but instead it doesn't match because the single accented character in the set is misinterpreted as a singleton.
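The corrected rule can be sketched with a toy model of the two-part set. Everything here (struct toy_cset, the 128-codepoint threshold, the function names) is an assumption for illustration and not the actual regcomp() code; the point is only that members are counted across both parts before the set is declared a singleton.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	NSMALL	128	/* assumption: "small" range used by this sketch */

/* Simplified, hypothetical stand-in for the regex cset. */
struct toy_cset {
	uint8_t		small[NSMALL / 8];	/* bit vector of small chars */
	const uint32_t	*wides;			/* array of wide chars */
	size_t		nwides;
};

static void
toy_add_small(struct toy_cset *cs, unsigned ch)
{
	cs->small[ch / 8] |= (uint8_t)(1u << (ch % 8));
}

/*
 * Return the only member of the set, or -1 unless the set has exactly
 * one element counted across BOTH the small and the wide part.
 */
static long
toy_singleton(const struct toy_cset *cs)
{
	size_t n = 0;
	long only = -1;

	for (unsigned ch = 0; ch < NSMALL; ch++) {
		if (cs->small[ch / 8] & (1u << (ch % 8))) {
			n++;
			only = (long)ch;
		}
	}
	for (size_t i = 0; i < cs->nwides; i++) {
		n++;
		only = (long)cs->wides[i];
	}
	return (n == 1 ? only : -1);
}
```

For the set from the example above ('a' and 'b' in the small part, U+00E0 in the wide part) this returns -1, so the set is not collapsed into an exact match on the accented character.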
net/e1000/base: fix link power down
Current code is a result of work to reduce duplication between various device models. However, the logic that was replaced did not exactly match the new logic, and as a result the link power down was not working correctly for some NICs, and the link remained up even when the interface was down.
Fix it to correctly power down the link under all circumstances that were supported by old logic.
When VF issues a reset to PF there is a 50 msec wait plus an additional max of 1 msec (200 * 5us) for the PF to indicate the reset is complete before timeout.
In some cases, it is seen that the reset is timing out, in which case the reset does not complete and an error is returned.
In order to account for this, continue to wait an initial 50 msecs, but then allow a max of an additional 50 msecs (10,000 * 5us) for the command to complete.
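The arithmetic behind the old and new bounds can be sketched as follows. The macro and function names are hypothetical, and the actual delay call is left out so the sketch runs anywhere; only the numbers (50 msec, 5us, 200 and 10,000 polls) come from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

#define	RESET_INITIAL_WAIT_US	50000u	/* initial 50 msec wait */
#define	RESET_POLL_INTERVAL_US	5u	/* delay between polls */
#define	RESET_POLL_MAX_OLD	200u	/* old bound: 200 * 5us = 1 msec */
#define	RESET_POLL_MAX_NEW	10000u	/* new bound: 10,000 * 5us = 50 msec */

/* Worst-case time in microseconds before the reset is declared failed. */
static uint32_t
reset_worst_case_us(uint32_t max_polls)
{
	return (RESET_INITIAL_WAIT_US + max_polls * RESET_POLL_INTERVAL_US);
}

/* Example predicate: reports completion once the counter reaches zero. */
static bool
done_after(void *arg)
{
	uint32_t *remaining = arg;

	if (*remaining == 0)
		return (true);
	(*remaining)--;
	return (false);
}

/* Poll a caller-supplied "reset done" predicate up to max_polls times. */
static bool
wait_for_reset(bool (*done)(void *), void *arg, uint32_t max_polls)
{
	for (uint32_t i = 0; i < max_polls; i++) {
		if (done(arg))
			return (true);
		/* A real driver would delay RESET_POLL_INTERVAL_US here. */
	}
	return (false);
}
```

Under these assumptions the worst case grows from 51 msec to 100 msec, and a PF that needs, say, 5,000 polls to acknowledge the reset times out with the old bound but succeeds with the new one.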
Fixes: af75078 ("first public release") Cc: stable@dpdk.org
Signed-off-by: Kevin Traynor <ktraynor@redhat.com> Acked-by: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
Obtained from: DPDK (64e714f)
(cherry picked from commit 28fdb212adc0431fff683749a1307038e25ff58e)
net/ixgbe/base: fix 5G link speed reported on VF
When 5000 Base-T was set on the PF, the VF reported 100 Base-T. This patch fixes an incorrect conditional in the ixgbe_check_mac_link_vf function, which checked against PF mac types; it now correctly uses VF mac types.
Fixes: 12e2090 ("net/ixgbe/base: include new speeds in VFLINK interpretation") Cc: stable@dpdk.org
Signed-off-by: Piotr Skajewski <piotrx.skajewski@intel.com> Reviewed-by: Radoslaw Tyl <radoslawx.tyl@intel.com> Reviewed-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com> Reviewed-by: Alice Michael <alice.michael@intel.com>
Obtained from: DPDK (9eb7fdb)
(cherry picked from commit ab92cab02633580f763a38a329a5b25050bb4fbf)
net/ixgbe/base: fix PHY ID for X550
Function ixgbe_get_phy_type_from_id() always returns ixgbe_phy_unknown instead of ixgbe_phy_aq for X550_PHY_ID2 and X550_PHY_ID3, because the last 4 bits of the PHY ID are always masked and should not be taken into account when selecting the PHY type.
This patch adds default PHY ID for X550 devices with mask on last 4 bits (0xFFFFFFF0), and fixes the switch statement to use it.
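A minimal sketch of the masked lookup follows. The identifiers and the numeric ID are illustrative assumptions and are not copied from the driver; only the mask value 0xFFFFFFF0 comes from the description above.

```c
#include <stdint.h>

/* Values below are illustrative assumptions, not taken from the driver. */
#define	TOY_PHY_REVISION_MASK	0xFFFFFFF0u
#define	TOY_X550_PHY_ID_BASE	0x01540220u	/* low 4 bits clear */

enum toy_phy_type { TOY_PHY_UNKNOWN, TOY_PHY_AQ };

static enum toy_phy_type
toy_phy_type_from_id(uint32_t raw_id)
{
	/* The low nibble must not influence the lookup. */
	switch (raw_id & TOY_PHY_REVISION_MASK) {
	case TOY_X550_PHY_ID_BASE:
		return (TOY_PHY_AQ);
	default:
		return (TOY_PHY_UNKNOWN);
	}
}
```

Any raw ID that differs from the base value only in its last 4 bits now selects the same PHY type, instead of falling through to the unknown case.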
Fixes: 58ddc80 ("ixgbe/base: add new X550 PHY ids") Cc: stable@dpdk.org
Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com> Reviewed-by: Piotr Skajewski <piotrx.skajewski@intel.com> Reviewed-by: Alice Michael <alice.michael@intel.com>
Obtained from: DPDK (a9f5a3b)
(cherry picked from commit 9b56dfd27c64fcaf2dfbaa1eb3e2bd2b163fa56c)
There is a name similarity between IXGBE_VT_MSGTYPE_ACK and PFMAILBOX.ACK / VFMAILBOX.ACK which may cause confusion. Rename the MSGTYPE macros to SUCCESS and FAILURE, as they are not specified in the datasheet and will now be easily distinguishable.
Signed-off-by: Jakub Chylkowski <jakubx.chylkowski@intel.com> Reviewed-by: Marek Zalfresso-jundzillo <marekx.zalfresso-jundzillo@intel.com> Reviewed-by: Alice Michael <alice.michael@intel.com> Reviewed-by: Piotr Skajewski <piotrx.skajewski@intel.com> Reviewed-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com> Tested-by: Piotr Skajewski <piotrx.skajewski@intel.com> Tested-by: Alice Michael <alice.michael@intel.com>
Obtained from: DPDK (4f675c9)
(cherry picked from commit 10746040820ee5186caf4d4d61cf88196ec213ba)
net/ixgbe/base: correct registers names to match datasheet
Some mailbox-related registers have names that differ from those specified in the datasheet. Correct these names to correspond to their datasheet counterparts. Additionally, several calculations are changed to use dedicated macros instead of magic numbers.
Signed-off-by: Jakub Chylkowski <jakubx.chylkowski@intel.com> Reviewed-by: Marek Zalfresso-jundzillo <marekx.zalfresso-jundzillo@intel.com> Reviewed-by: Alice Michael <alice.michael@intel.com> Reviewed-by: Piotr Skajewski <piotrx.skajewski@intel.com> Reviewed-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com> Tested-by: Piotr Skajewski <piotrx.skajewski@intel.com> Tested-by: Alice Michael <alice.michael@intel.com>
Obtained from: DPDK (10fd55e)
(cherry picked from commit b3c7fde6fe9113f849232604523878b4b68df0cc)
Current mailbox API does not work as described in documentation and is prone to errors (for example, it is doing locks on read). Introduce new mailbox API and provide compatibility functions with old API.
New error codes have been introduced:
- IXGBE_ERR_CONFIG - ixgbe_mbx_operations is not correctly set
- IXGBE_ERR_TIMEOUT - mailbox operation, e.g. poll for message, timed out
- IXGBE_ERR_MBX_NOMSG - no message available on read
In addition, some refactoring has been done: mailbox structures were defined twice: in ixgbe_type.h and ixgbe_vf.h. Move them into ixgbe_mbx.h as this header is dedicated for mailbox.
Signed-off-by: Jakub Chylkowski <jakubx.chylkowski@intel.com> Reviewed-by: Alice Michael <alice.michael@intel.com> Reviewed-by: Piotr Pietruszewski <piotr.pietruszewski@intel.com> Tested-by: Alice Michael <alice.michael@intel.com> Tested-by: Piotr Skajewski <piotrx.skajewski@intel.com>
Obtained from: DPDK (6d243d2)
Reapply message
This reverts commit d80c12ba682a6f23791f3d6e657f9e603b152aa2.
(cherry picked from commit 7234c3099947d202702e98d844ecd2d649c834d2)
Change max credit and credit refill to the maximum possible value, 9128. Values that are too small cause incorrect calculation of the bandwidth limits for each traffic class for frames larger than 4088 bytes.
Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com> Reviewed-by: Piotr Skajewski <piotrx.skajewski@intel.com> Reviewed-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com> Reviewed-by: Alice Michael <alice.michael@intel.com> Tested-by: Piotr Skajewski <piotrx.skajewski@intel.com>
Obtained from: DPDK (440823f)
(cherry picked from commit 1b80ac6fa64eaa575b99521cbd71a3780bf5139b)
Some function comments have mismatches between actual function names and function name in comments, which causes warnings with kernel-doc. Fix comments to match function names.
Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com> Reviewed-by: Piotr Skajewski <piotrx.skajewski@intel.com> Reviewed-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com> Reviewed-by: Alice Michael <alice.michael@intel.com>
Obtained from: DPDK (7b5bc85)
(cherry picked from commit edef2769483b29457f028a508ea96fc1099a0a21)
net/ixgbe/base: add reset count field to HW struct
Add fw_rst_cnt to store the number of resets after fw update. This value is required to detect if the EICR.MNG event occurred after firmware update reset.
Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com> Reviewed-by: Piotr Skajewski <piotrx.skajewski@intel.com> Reviewed-by: Slawomir Mrozowicz <slawomirx.mrozowicz@intel.com> Reviewed-by: Alice Michael <alice.michael@intel.com>
Obtained from: DPDK (9ab0e9c)
(cherry picked from commit 224f7ab8b4706653c7d3f78e624bc36c97679f30)
net/ixgbe/base: replace HIC with direct register access
Unify FW access method to direct register read/writes across all Atom(R) C3000 products.
Atom(R) C3000 fiber exhibited an issue with the Host Interface Command execution being locked when another LAN function attempted to acquire the SWFW sync on Manageability Host. This resulted in HIC atomicity break and bogus data being read since the other LAN function cleared all semaphores on timeout whereas HIC execution continued after unlock. Direct register IOSF access showed higher stability and reliability.
Signed-off-by: Marek Mical <marekx.mical@intel.com> Reviewed-by: Krzysztof Galazka <krzysztof.galazka@intel.com> Reviewed-by: Eryk Rybak <eryk.roch.rybak@intel.com> Reviewed-by: Francis Racicot <Francis.Racicot@intel.com> Reviewed-by: Alice Michael <alice.michael@intel.com>
Obtained from: DPDK (e947f1e)
(cherry picked from commit f56311e37d4c39b1deab6aa8523f3332c29e1ad3)
net/ixgbe/base: remove circular header dependency
Including one header file in a second header file should be avoided, so fix it by forward declaring the struct instead.
Signed-off-by: Barbara Skobiej <barbara.skobiej@intel.com>
Obtained from: DPDK (0bc2af5)
(cherry picked from commit 3167854b9d2188c4039239f741870e044b7507ac)
net/ixgbe/base: improve SWFW semaphore acquisition
HWSW semaphore acquisition in the Atom C3000 NIC is a two-stage process: each time, two semaphore acquisitions are required, and each failure of the second semaphore requires re-acquisition of the first. This patch decouples the two acquisitions, preventing potentially hundreds of thousands of unnecessary loop iterations.
Signed-off-by: Barbara Skobiej <barbara.skobiej@intel.com>
Obtained from: DPDK (99f960c)
(cherry picked from commit cc9944183187308a71489651b11342d293aac7d1)
net/ixgbe/base: prevent untrusted loop bound
Added length check against EEPROM size in words to prevent untrusted loop bound reported by static code analysis.
net/ixgbe/base: remove unused function prototypes
There are some function prototypes that were introduced at some point but were never implemented, so remove them.
Signed-off-by: Chinh Cao <chinh.t.cao@intel.com>
Obtained from: DPDK (e9cc1b4)
(cherry picked from commit 420c984470270e0f7200124d8015236584aef243)
ixgbe: update if_sriov to use the new mailbox apis
This fixes a page fault when creating VFs and updates to the new mailbox API and naming conventions.
The functionality works to the same level that it did before my recent changes. In particular on my 82599 it creates both passthru and ixv interfaces. In either case, the PF seems to lose the ability to pass traffic. The ixv driver fails to attach. These issues are present with or without my updates.
If you use SR-IOV on ixgbe I would be interested in hearing what does or does not work for you.
(cherry picked from commit 36c516b31136f645472c12d8597534656272acd6)
mountd: Add check for "=" after exports(5) options
Some exports(5) options take a "=arg" component that provides an argument value for the option. Others do not. Without this patch, if "=arg" was provided for an option that did not take an argument value, the "=arg" was simply ignored. This could result in confusion w.r.t. what was being exported, as noted by the Problem Report.
This patch adds a check for "=arg" for the options that do not take an argument value and fails the exports line if one is found.
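The check can be sketched as follows. This is a simplified illustration and not the mountd parser: the table type and function name are hypothetical, and only a handful of exports(5) options are listed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical option table; the real exports(5) option set is larger. */
struct toy_opt {
	const char	*name;
	bool		takes_arg;
};

static const struct toy_opt toy_opts[] = {
	{ "ro", false },
	{ "alldirs", false },
	{ "maproot", true },
	{ "network", true },
};

/*
 * Return 0 if "opt" (for example "ro" or "maproot=0") is acceptable,
 * -1 if it is unknown or carries "=arg" where none is allowed.
 */
static int
toy_check_opt(const char *opt)
{
	const char *eq = strchr(opt, '=');
	size_t len = (eq != NULL) ? (size_t)(eq - opt) : strlen(opt);

	for (size_t i = 0; i < sizeof(toy_opts) / sizeof(toy_opts[0]); i++) {
		const struct toy_opt *o = &toy_opts[i];

		if (strlen(o->name) != len || strncmp(o->name, opt, len) != 0)
			continue;
		if (eq != NULL && !o->takes_arg)
			return (-1);	/* e.g. "ro=1": fail the line */
		return (0);
	}
	return (-1);
}
```

In this sketch "ro=1" and "alldirs=yes" are rejected rather than being silently treated as "ro" and "alldirs", while "maproot=0" is still accepted.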
PR: 281003 (cherry picked from commit 3df987c99d1194a0e43a84853e934aa0c0ab09db)
This release incorporates the following bug fixes and mitigations:
- Fixed possible denial of service in X.509 name checks ([CVE-2024-6119])
- Fixed possible buffer overread in SSL_select_next_proto() ([CVE-2024-5535])
Release notes can be found at: https://openssl-library.org/news/openssl-3.0-notes/index.html
Co-authored-by: gordon MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D46602
libc/getnameinfo: stop adding NI_NUMERICHOST where inappropriate
Checking the first nibble of the IPv6 address to be 0 and then excluding two well known cases (v4-mapped, loopback) leaves us with more cases where the first nibble could be 0, e.g., the RFC 6052, 2.1 Well-Known Prefix 64:ff9b::/96. It is not practical to track them all and it is not clear what led to this special casing originally, so remove them.
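The following sketch, using only standard socket headers, shows why the old heuristic was too broad. The two helper functions are hypothetical; they only reproduce the "first nibble is 0" test and the two exclusions described above.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>

/* True if the first nibble of the address is 0 (false on parse errors). */
static bool
first_nibble_is_zero(const char *text)
{
	struct in6_addr a;

	if (inet_pton(AF_INET6, text, &a) != 1)
		return (false);
	return ((a.s6_addr[0] & 0xf0) == 0);
}

/* True if the address is one of the two cases the old code excluded. */
static bool
is_old_exception(const char *text)
{
	struct in6_addr a;

	if (inet_pton(AF_INET6, text, &a) != 1)
		return (false);
	return (IN6_IS_ADDR_V4MAPPED(&a) || IN6_IS_ADDR_LOOPBACK(&a));
}
```

An address inside the Well-Known Prefix, such as 64:ff9b::192.0.2.1, has a zero first nibble but is neither v4-mapped nor loopback, so it would have been forced to a numeric result.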
While here also remove the IN6_IS_ADDR_LINKLOCAL() + NI_NUMERICHOST case as link-local address resolution does exist.
We do leave the IN6_IS_ADDR_MULTICAST() case for now as I could not find any references to any official reverse lookups for these.
Add comments for more cases (and some historic behaviour) in order to make it easier to follow the logic.
rc: network.subr update consistency with older change (v6/v4 order)
As of 1b5be7204eaeeaf58eefdebe5b308f90792c693b we set up parts of IPv6 before IPv4 if configured. For consistency, change a case in ifn_start() to call ipv6_up() before ipv4_up(), and reverse the order in ifn_stop().
For EXT_CSD_PART_CONFIG_ACC_BOOT<n> and possibly others with suffixes we fail to create proper disk aliases (symlinks), which shows up as:
g_dev_taste: make_dev_alias_p() failed (name=mmcsd0, error=17)
In this case we ended up with the following two:
/dev/mmcsd0 -> sdda0
/dev/mmcsd1 -> sdda0boot1
Note that (i) it should be mmcsd0boot1 and not mmcsd1 and that (ii) there is no mmcsd0boot0 (failed above as it tried to create a second mmcsd0).
Adjust the code (using a highly simplified version--compared to my original approach--suggested by imp) using an extended format string with (sdda/mmcsd) prefix as first argument to create proper names.
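As a rough illustration of the naming scheme, the sketch below builds both names from the same format string with the prefix passed as the first argument. The function name and the exact format are assumptions; the real code may differ.

```c
#include <stdio.h>

/*
 * Build "<prefix><unit><suffix>", e.g. "mmcsd" + 0 + "boot1".
 * Returns the snprintf() result.
 */
static int
toy_disk_name(char *buf, size_t len, const char *prefix, int unit,
    const char *suffix)
{
	return (snprintf(buf, len, "%s%d%s", prefix, unit, suffix));
}
```

Calling it with ("sdda", 0, "boot1") and ("mmcsd", 0, "boot1") yields "sdda0boot1" and "mmcsd0boot1", so the alias keeps both the unit number and the partition suffix of the device it points to.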
malloc(9): extend contigmalloc(9) by a "slab cookie"
Extend kern_malloc.c internals to also cover contigmalloc(9) by a "slab cookie" and not just malloc/malloc_large. This allows us to call free(9) even on contigmalloc(9) addresses and deprecate contigfree(9). Update the contigmalloc(9) man page accordingly.
The way this is done (free(9) working for contigmalloc) will hide the UMA/VM bits from a consumer which may otherwise need to know whether the original allocation was by malloc or contigmalloc by looking at the cookie (likely via an accessor function). This simplifies the implementation of consumers of mixed environments a lot.
This is preliminary work to allow LinuxKPI to be adjusted to better play by the rules Linux puts out for various allocations. Most of this was described/explained to me by jhb.
One may observe that realloc(9) is currently unchanged (and contrary to [contig]malloc/[contig]free an implementation may need to access the "slab cookie" information, given it will likely be implementation dependent which allocation type to use if size changes beyond the usable size of the initial allocation).
Described by: jhb Sponsored by: The FreeBSD Foundation Reviewed by: markj, kib Differential Revision: https://reviews.freebsd.org/D45812
(cherry picked from commit 9e6544dd6e02c46b805d11ab925c4f3b18ad7a4b)
kern_malloc: fold free and zfree together into one __always_inline func
free() and zfree() are essentially the same copy and pasted code with the extra explicit_bzero() (and formerly kasan) calls. Add a bool to add the extra functionality and make both functions a wrapper around the common code and let the compiler do the optimization based on the bool input when inlining.
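A userland analogy of the folding is sketched below. The names and the explicit size argument are assumptions made for the sketch (the kernel looks the size up itself), and a volatile store loop stands in for explicit_bzero().

```c
#include <stdbool.h>
#include <stdlib.h>

/* Common body; "zero" is a compile-time constant in both wrappers. */
static inline __attribute__((always_inline)) void
toy_free_impl(void *p, size_t size, bool zero)
{
	if (p == NULL)
		return;
	if (zero) {
		/* Stand-in for explicit_bzero(): volatile stores stay. */
		volatile unsigned char *vp = p;

		for (size_t i = 0; i < size; i++)
			vp[i] = 0;
	}
	free(p);
}

/* Plain free: the zeroing branch is dead code after inlining. */
static void
toy_free(void *p, size_t size)
{
	toy_free_impl(p, size, false);
}

/* Zeroing free: same code path with the extra step enabled. */
static void
toy_zfree(void *p, size_t size)
{
	toy_free_impl(p, size, true);
}
```

Because the bool is a constant at each call site and the common function is forcibly inlined, the compiler can specialise both wrappers, so the shared source does not need to cost a runtime branch.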
No functional changes intended.
Suggested by: kib (in D45812) Sponsored by: The FreeBSD Foundation Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D46101
And the fix from Olivier Certner (olce):
kern_malloc: Restore working KASAN runtime after free() and zfree() folding
In the zfree() case, the explicit_bzero() calls zero all the allocation, including the redzone which malloc() has marked as invalid. So calling kasan_mark() before those is in fact necessary.
This fixes a crash at boot when 'ldconfig' is run and tries to get random bytes through getrandom() (relevant part of the stack is read_random_uio() -> zfree() -> explicit_bzero()) for kernels with KASAN compiled in.
Approved by: markj (mentor) Fixes: 4fab5f005482 ("kern_malloc: fold free and zfree together into one __always_inline func") Sponsored by: The FreeBSD Foundation
(cherry picked from commit 4fab5f005482aa88bc0f7d7a0a5e81b436869112) (cherry picked from commit 28391f188ca18b6251ba46040adf81946b0ccb03)
In order to allow the allocator to change in the future move it into the implementation file from being an inline function in the header.
While here factor out the size calculation and add a comment as to why this is done. We will need the size (_s) in the future to make a decision on how to allocate.
Sponsored by: The FreeBSD Foundation Reviewed by: emaste Differential Revision: https://reviews.freebsd.org/D45815
(cherry picked from commit 1f7df757017404011732196e65981d9325f7a89f)
net80211: scan/internal: change boolean argument from int to bool
ieee80211_probe_curchan() passes a "force" argument which is bool. Make it such. Adjust the (*sc_scan_probe_curchan)() KPI to bool as well. This is all a big NOP as the only implementor of this function, ieee80211_swscan_probe_curchan(), does not use the argument at all.
I came across this when pondering a different scan implementation. Rather than dropping the change remove the argument from the function, and push the cleanup out given it is purely net80211 internal code (the argument may have reason for existence in the future).
Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45816
(cherry picked from commit 9776aba34576596cbe49084457ee40730fec55a2)
LinuxKPI/lindebugfs: stop panicking in lindebugfs, fix simple_read_from_buffer
Trying to use lindebugfs for debugging wireless drivers two issues became apparent:
(a) a panic in lindebugfs calling a hard coded release function if the caller had not provided one. This seems to be based on assumptions that no longer hold up. Remove the hard coded release function to prevent panics.
(b) In LinuxKPI simple_read_from_buffer() would call copy_to_user() but buffers weren't set up for this (lindebugfs copies data from its own buffer) and then pseudofs will do another copyout to the user on this; remove the copy_to_user() and simply copy the data over to the provided buffer; this works for as long as the only consumers remain debugfs callers (which currently seems to be the case). [the only out-of-tree consumers I am aware of are two drm-kmod drivers/gpu/drm/amd/pm/* debugfs functions I cannot test].
Sponsored by: The FreeBSD Foundation Tested by: jfree Differential Revision: https://reviews.freebsd.org/D45755
(cherry picked from commit 5668c22a13c6befa9b8486387d38457c40ce7af4)
Implement ieee80211_sn_*() using the equivalent net80211 macros. We need those implemented for at least 11n. While here also sort all the BA functions together next to the "sn" functions.
Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D45819
(cherry picked from commit db8b3578627b5be93eba019ab2bbe3c03f7366f4)
Allow a user to change the "ether" address by ifconfig while a VAP is not UP. Compared to net80211 (given we have no callback) we register an eventhandler per-vif (a global one would force us to use hacks to derive if a vap is indeed also a lkpi_80211 vif).
Sponsored by: The FreeBSD Foundation PR: 277356 Tested by: lwhsu Differential Revision: https://reviews.freebsd.org/D46121
(cherry picked from commit 4aff4048f5b1b6ab0b905726853ba6083e37cc37)
Add a get_random_u8() implementation following the u32 and u64 versions. We'll likely want to macro-ify them in the future and add all the types which makes sense just to be done.
Sponsored by: The FreeBSD Foundation Reviewed by: emaste Differential Revision: https://reviews.freebsd.org/D46464
(cherry picked from commit f29e915bc0d216a87f222a208caeb2172c93e4ea)
Add a version of no_printk(), which seems to be there to have format string checking while never calling the printk. It seems a very weird thing and it needs a return code and for some reason my initial while (0) { } version hadn't worked while porting over new code, but that could have been due to format string problems further downstream. if (0) seems to do the job though I would have expected that to more likely simply get optimised out without any further format checking.
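A userland sketch of the if (0) approach is shown below. The macro name is hypothetical, printf() stands in for printk(), and the sketch relies on GNU C extensions (statement expressions and ##__VA_ARGS__); it is not the LinuxKPI definition.

```c
#include <stdio.h>

/*
 * Sketch of a no_printk()-style macro. The printf() call remains visible
 * to the compiler for format checking, but sits in a dead branch, and the
 * whole expression evaluates to 0 so it can be used as a return code.
 */
#define	toy_no_printk(fmt, ...)				\
({							\
	if (0)						\
		printf(fmt, ##__VA_ARGS__);		\
	0;						\
})
```

Since the call is never executed, its arguments are not evaluated either: toy_no_printk("%d\n", ++calls) leaves calls unchanged and yields 0.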
Sponsored by: The FreeBSD Foundation Reviewed by: emaste Differential Revision: https://reviews.freebsd.org/D46463
(cherry picked from commit 1847e63d63f440cfcb2f4ee2c2ee8990f0272d88)
LinuxKPI: add general module_driver(), use it for module_pci_driver()
Factor out module_pci_driver() from 366d68f283793 into a general module_driver() so other bus attachments can also use the same kind of macro without duplicating all the lines.
Redefine module_pci_driver() using the new general macro.
No functional changes intended.
Sponsored by: The FreeBSD Foundation Reviewed by: manu Differential Revision: https://reviews.freebsd.org/D46467
(cherry picked from commit f5c7feee7129dc88a2e5dc3ce0a075cb5e4f534a)
Add new enums to netdevice.h (including one which is referenced but no value of it is used in a driver so we have to add a "dummy" value to avoid an empty enum).
Sponsored by: The FreeBSD Foundation
(cherry picked from commit 6ed447b51a9d6cf22aae2dfba6efce3922ae6d57)
LinuxKPI: 802.11: add further defines to ieee80211.h and nl80211.h
Upstream new defines, enum values, etc. for coming driver updates which are non-conflicting with the current state.
The only notable change is the rename of the enum ieee80211_ap_reg_power but the enum name had not been used so far by any driver in the tree (only in mac80211.h) but an updated version of ath11k does use it so we need to correct our initial naming.
Sponsored by: The FreeBSD Foundation
(cherry picked from commit c1c989588df67396392edceb0e7e7028abc06c49)
Move RANDOM_FORTUNA_{NPOOLS,DEFPOOLSIZE} from fortuna.c to fortuna.h and use RANDOM_FORTUNA_DEFPOOLSIZE in random_harvestq.c rather than having a magic (albeit explained in a comment) number. The NPOOLS value will be used in a later commit.
Add a new loader variable entropy_efi_seed_size which defaults to 2048; if not defined (e.g. if the /boot/lua/ is updated but /boot/defaults/ isn't) the same 2048 default will be used.
The EFI RNG on some platforms takes a long time if we request 2048 bytes of entropy, so we would like to request less; but our kernel Fortuna RNG needs to be fed 2048 bytes in order to consider itself "fully seeded". If we have between 64 bytes (the size of a single Fortuna pool and enough to guarantee cryptographic security) and 2048 bytes (what Fortuna wants) then the boot process will hang waiting for more entropy despite in fact having enough to operate securely.
Since 64 bytes of entropy is plenty to be cryptographically secure (an attack of cost ~ 2^128 is infeasible, which implies a mere 16 bytes of entropy), use PBKDF2 (aka pkcs5v2_genkey_raw) to spread the entropy across 2048 bytes. This is secure since PBKDF2 has the property that every subset of output bytes has within O(1) of the maximum possible amount of entropy.
In 5c73b3e0a3db calls to core.loadEntropy were added to core.boot and core.autoboot; but neither of those is invoked if we disable the "beastie" menu. Add a core.loadEntropy call to the no-menu path.
Reviewed by: imp MFC after: 1 week Sponsored by: Amazon Fixes: 5c73b3e0a3db ("Add support for getting early entropy from UEFI") Differential Revision: https://reviews.freebsd.org/D46637
(cherry picked from commit 74a28cf6e7f66c7c12fd25ee8231eeedf756bf08)
linuxulator: ignore AT_NO_AUTOMOUNT for all stat variants
Commit ff39d74aa99a ignored AT_NO_AUTOMOUNT for statx(), but did not change fstat64() or newfstatat(), which also take an equivalent flags argument. Add a linux_to_bsd_stat_flags() helper and use it in all three places.
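A helper of this kind might look roughly like the sketch below. The signature, the TOY_ names and the choice of which flags are translated are assumptions for illustration; the native-side values in particular are placeholders and should be read as such.

```c
#include <stdbool.h>

/* Linux-side flag values as found in the Linux UAPI headers. */
#define	TOY_LINUX_AT_SYMLINK_NOFOLLOW	0x100
#define	TOY_LINUX_AT_NO_AUTOMOUNT	0x800
#define	TOY_LINUX_AT_EMPTY_PATH		0x1000

/* Native-side values; placeholders chosen for this sketch. */
#define	TOY_BSD_AT_SYMLINK_NOFOLLOW	0x0200
#define	TOY_BSD_AT_EMPTY_PATH		0x4000

/*
 * Translate Linux stat flags. Returns false for unknown flags;
 * AT_NO_AUTOMOUNT is accepted and ignored.
 */
static bool
toy_linux_to_bsd_stat_flags(int linux_flags, int *bsd_flags)
{
	const int known = TOY_LINUX_AT_SYMLINK_NOFOLLOW |
	    TOY_LINUX_AT_NO_AUTOMOUNT | TOY_LINUX_AT_EMPTY_PATH;

	if ((linux_flags & ~known) != 0)
		return (false);
	*bsd_flags = 0;
	if (linux_flags & TOY_LINUX_AT_SYMLINK_NOFOLLOW)
		*bsd_flags |= TOY_BSD_AT_SYMLINK_NOFOLLOW;
	if (linux_flags & TOY_LINUX_AT_EMPTY_PATH)
		*bsd_flags |= TOY_BSD_AT_EMPTY_PATH;
	return (true);
}
```

Keeping the translation in one place means every stat variant treats AT_NO_AUTOMOUNT the same way, which is the point of sharing the helper.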
PR: 281526 Reviewed by: trasz Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D46711
(cherry picked from commit 3cf834d069d1dcdbe464ea74624930eaf916715d)
Following is a changelog of new features and fixes to wpa:
hostapd:
* Wi-Fi Easy Connect
  - add support for DPP release 3
  - allow Configurator parameters to be provided during config exchange
* HE/IEEE 802.11ax/Wi-Fi 6
  - various fixes
* EHT/IEEE 802.11be/Wi-Fi 7
  - add preliminary support
* SAE: add support for fetching the password from a RADIUS server
* support OpenSSL 3.0 API changes
* support background radar detection and CAC with some additional drivers
* support RADIUS ACL/PSK check during 4-way handshake (wpa_psk_radius=3)
* EAP-SIM/AKA: support IMSI privacy
* improve 4-way handshake operations
  - use Secure=1 in message 3 during PTK rekeying
* OCV: do not check Frequency Segment 1 Channel Number for 160 MHz cases to avoid interoperability issues
* support new SAE AKM suites with variable length keys
* support new AKM for 802.1X/EAP with SHA384
* extend PASN support for secure ranging
* FT: Use SHA256 to derive PMKID for AKM 00-0F-AC:3 (FT-EAP)
  - this is based on additional details being added in the IEEE 802.11 standard
  - the new implementation is not backwards compatible
* improved ACS to cover additional channel types/bandwidths
* extended Multiple BSSID support
* fix beacon protection with FT protocol (incorrect BIGTK was provided)
* support unsynchronized service discovery (USD)
* add preliminary support for RADIUS/TLS
* add support for explicit SSID protection in 4-way handshake (a mitigation for CVE-2023-52424; disabled by default for now, can be enabled with ssid_protection=1)
* fix SAE H2E rejected groups validation to avoid downgrade attacks
* use stricter validation for some RADIUS messages
* a large number of other fixes, cleanup, and extensions
wpa_supplicant:
* Wi-Fi Easy Connect
  - add support for DPP release 3
  - allow Configurator parameters to be provided during config exchange
* MACsec
  - add support for GCM-AES-256 cipher suite
  - remove incorrect EAP Session-Id length constraint
  - add hardware offload support for additional drivers
* HE/IEEE 802.11ax/Wi-Fi 6
  - support BSS color updates
  - various fixes
* EHT/IEEE 802.11be/Wi-Fi 7
  - add preliminary support
* support OpenSSL 3.0 API changes
* improve EAP-TLS support for TLSv1.3
* EAP-SIM/AKA: support IMSI privacy
* improve mitigation against DoS attacks when PMF is used
* improve 4-way handshake operations
  - discard unencrypted EAPOL frames in additional cases
  - use Secure=1 in message 2 during PTK rekeying
* OCV: do not check Frequency Segment 1 Channel Number for 160 MHz cases to avoid interoperability issues
* support new SAE AKM suites with variable length keys
* support new AKM for 802.1X/EAP with SHA384
* improve cross-AKM roaming with driver-based SME/BSS selection
* PASN
  - extend support for secure ranging
  - allow PASN implementation to be used with external programs for Wi-Fi Aware
* FT: Use SHA256 to derive PMKID for AKM 00-0F-AC:3 (FT-EAP)
  - this is based on additional details being added in the IEEE 802.11 standard
  - the new implementation is not backwards compatible, but PMKSA caching with FT-EAP was, and still is, disabled by default
* support a pregenerated MAC (mac_addr=3) as an alternative mechanism for using per-network random MAC addresses
* EAP-PEAP: require Phase 2 authentication by default (phase2_auth=1) to improve security for still unfortunately common invalid configurations that do not set ca_cert
* extend SCS support for QoS Characteristics
* extend MSCS support
* support unsynchronized service discovery (USD)
* add support for explicit SSID protection in 4-way handshake (a mitigation for CVE-2023-52424; disabled by default for now, can be enabled with ssid_protection=1)
  - in addition, verify SSID after key setup when beacon protection is used
* fix SAE H2E rejected groups validation to avoid downgrade attacks
* a large number of other fixes, cleanup, and extensions
+ ntpd added to ntp.conf(5) description (search keywords)
+ expand NTP so these pages are shown when `apropos time`
+ "standard" => "reference" for increased consistency
- removed redundant or duplicated search keywords
e1000: Delay safe_pause switch until SI_SUB_CLOCKS
Based on sysinit_sub_id, SI_SUB_CLOCKS is after SI_SUB_CONFIGURE.
SI_SUB_CONFIGURE = 0x3800000, /* Configure devices */
At this stage, the variable “cold” will be set to 0.
SI_SUB_CLOCKS = 0x4800000, /* real-time and stat clocks*/
At this stage, the clock configuration will be done, and the real-time clock can be used.
In the e1000 driver, if the safe_pause_* APIs are called between the SI_SUB_CONFIGURE and SI_SUB_CLOCKS stages, they will choose the wrong clock source. The safe_pause_* APIs use “cold”, the value of which is updated in SI_SUB_CONFIGURE, to decide if the real-time clock source is ready. However, the real-time clock is not ready until the SI_SUB_CLOCKS routines are done.
bitset: __BIT_FFS_AT(): Fix herald comment, take 2
Remove the reference to the nonexistent 'end' parameter. While here, rephrase a bit.
I did the initial comment fix (commit "bitset: Fix __BIT_FFS_AT()'s herald comment", f3ab0d86e8070c73) as part of an experiment introducing macros to operate on ranges of bits in a bitset and subject to a predicate (a generalization of some code used in some pending modifications of the ULE scheduler), which was finally ditched as being too verbose and impractical to use. I however then forgot to remove the reference to 'end'.
No functional change.
Noted by: emaste Approved by: emaste (mentor) MFC after: 3 days MFC with: f3ab0d86e807 Sponsored by: The FreeBSD Foundation
(cherry picked from commit ad4cf76ec4d4524381350e77b02b9abe24eb4b02)
The driver is giant-locked and thus already prints a deprecation warning when attaching. The device file interface was broken in 14.0 and 14.1, see commit 12500c14281d, but it took a very long time for anyone to notice, and in that case it was only because of some code which probes all device files.
dtrace_getarg() previously walked the call stack looking for a frame matching the dtrace_invop_callsite symbol, in order to look for a trapframe corresponding to an invop (i.e., FBT or kinst) probe. Commit 3ba8e9dc4a0e broke this in some cases by breaking the expected alignment of the dtrace_invop_callsite symbol.
Rather than groveling around the stack to find invop probe arguments, simply use the trapframe reference saved by dtrace_invop(). This is simpler and less fragile.
FBT refuses to create probes in modules which depend on dtrace(all), but dtrace_test is a convenient place to add functions specifically for testing dtrace.
The dependency on dtraceall is not needed, so just remove it. In fact, it can be useful to test SDT probe creation by loading dtrace_test with and without dtraceall loaded.
This chipset suffered an (un)usual number of bugs and iterations. Let's add our NVM/firmware code from e1000 and the similar igc_nvm function from DPDK to keep track of issues.
Sponsored by: BBOX.io
(cherry picked from commit 33ed9bdca307bedb3d66a50ed7d4d7b4bf4acf39)
When a signal is trapped, the script continues after the trap code has run, unless the trap code explicitly exits. In the particular case of locate.updatedb, this is mostly harmless, except that the trap code is executed twice (once for the signal and once when we reach the end of the script), but it's still worth fixing.
Furthermore, install the trap as soon as we've created the temporary directory, to minimize the window during which we can fail to clean up after ourselves if interrupted.
While here, simplify the empty check at the end and make some minor style tweaks.
(cherry picked from commit f62c1f3f8e91c78d402e1db4e518e4899a4ba2b9)
locate.updatedb: Revert to using cat to copy the db.
This script is usually run unprivileged, so install fails to create a temporary file while copying the finished database. Revert to using cat, which can overwrite the existing file as it is usually owned by the same user which is running the script.
The manual page says %m is replaced with “the string representation of the error code stored in the errno variable at the beginning of the call”. However, we don't actually save `errno` until fairly late in `__vfprintf()`. Make sure it is saved before we do anything that might perturb `errno`.
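The principle can be shown with a small userland sketch. Both functions are hypothetical and unrelated to the actual `__vfprintf()` code; the sketch only demonstrates saving `errno` on entry, before any step that might change it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper that clobbers errno as a side effect. */
static void
toy_noisy_setup(void)
{
	errno = ERANGE;
}

/*
 * Write "<prefix>: <error text>" to buf, using the errno value that was
 * current when the function was entered.
 */
static int
toy_format_m(char *buf, size_t len, const char *prefix)
{
	int saved_errno = errno;	/* save before anything else runs */

	toy_noisy_setup();		/* would otherwise change the result */
	return (snprintf(buf, len, "%s: %s", prefix, strerror(saved_errno)));
}
```

If the save were moved below toy_noisy_setup(), the output would describe ERANGE rather than the error the caller actually had in `errno` at the time of the call.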
vmm: Properly handle writes spanning across two pages in vm_handle_db
The vm_handle_db function is responsible for writing correct status register values into memory when a guest VM is being single-stepped using the RFLAGS.TF mechanism. However, it currently does not properly handle an edge case where the resulting write spans across two pages. This commit fixes this by making vm_handle_db use two vm_copyinfo structs.
Security: HYP-09 Reviewed by: markj
(cherry picked from commit 51fda658baa3f80c9778f3a9873fbf67df87119b)
vmm: avoid potential KASSERT kernel panic in vm_handle_db
If the guest VM emits the exit code VM_EXITCODE_DB the kernel will execute the function named vm_handle_db.
If the value of rsp is not page aligned and if rsp+sizeof(uint64_t) spans across two pages, the function vm_copy_setup will need two structs vm_copyinfo to prepare the copy operation.
For instance, if the rsp value is 0xFFC, two vm_copyinfo objects are needed:
* address=0xFFC, len=4
* address=0x1000, len=4
The vulnerability was addressed by commit 51fda658baa ("vmm: Properly handle writes spanning across two pages in vm_handle_db"). Still, replace the KASSERT with an error return as a more defensive approach.
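The page-spanning case can be illustrated with a small userland sketch. It assumes 4 KiB pages; `struct demo_chunk` and `split_by_page()` are hypothetical names that only mimic the role of vm_copyinfo and vm_copy_setup, not the actual vmm code. Running out of chunk slots is reported as an error return rather than an assertion, since the address comes from the guest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 0x1000u	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for struct vm_copyinfo: one contiguous piece. */
struct demo_chunk {
	uint64_t addr;
	size_t len;
};

/*
 * Split [addr, addr + len) at page boundaries.
 * Returns the number of chunks filled in, or -1 if more than max are needed.
 */
static int
split_by_page(uint64_t addr, size_t len, struct demo_chunk *out, int max)
{
	int n = 0;

	assert(out != NULL);	/* caller bug, not guest-controlled */
	while (len > 0) {
		/* Bytes left before the next page boundary. */
		size_t room = DEMO_PAGE_SIZE -
		    (size_t)(addr & (DEMO_PAGE_SIZE - 1));
		size_t take = len < room ? len : room;

		if (n == max)
			return (-1);	/* guest-controlled: fail, don't panic */
		out[n].addr = addr;
		out[n].len = take;
		n++;
		addr += take;
		len -= take;
	}
	return (n);
}
```

With addr=0xFFC and len=8 this yields the two pieces listed above; with room for only one chunk it returns -1 instead of tripping an assertion.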
Reported by: Synacktiv Reviewed by: markj, emaste Security: HYP-09 Sponsored by: The Alpha-Omega Project Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D46133
(cherry picked from commit d19fa9c1b72bc52e51524abcc59ad844012ec365)
The function hda_codec_command is vulnerable to a buffer over-read: the payload value is extracted from the command and used as an array index without any validation. Fortunately, the payload value is capped at 255, so the information disclosure is limited and only a small part of .rodata of the bhyve binary can be disclosed.
The risk is low because the leaked information is not sensitive. An attacker may be able to validate the version of the bhyve binary using this information disclosure (layout of .rodata information, ex: jmp_tables) before executing an exploit.
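The general shape of such a fix is to validate the index before using it. The sketch below is illustrative only: `demo_table` and `lookup_checked()` are hypothetical and do not reflect the actual tables or code in bhyve's hda_codec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup table; much smaller than the 0..255 payload range. */
static const uint8_t demo_table[4] = { 10, 20, 30, 40 };

/*
 * Look up an 8-bit payload in the table after validating it as an index.
 * Returns 0 and stores the entry in *value, or -1 if payload is out of
 * range (in which case *value is left untouched).
 */
static int
lookup_checked(uint8_t payload, uint8_t *value)
{
	assert(value != NULL);
	if (payload >= sizeof(demo_table) / sizeof(demo_table[0]))
		return (-1);
	*value = demo_table[payload];
	return (0);
}
```

Deriving the bound from the array itself keeps the check correct if the table is later resized.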
Reported by: Synacktiv Reviewed by: christos, emaste Security: HYP-13 Sponsored by: The Alpha-Omega Project Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D46098
(cherry picked from commit e94a1d6a7f2eb932850e1db418bf34d5c6991ce8)
Changes: https://git.tcpdump.org/libpcap/blob/bbcbc9174df3298a854daee2b3e666a4b6e5383a:/CHANGES Reviewed by: emaste Obtained from: https://www.tcpdump.org/release/libpcap-1.10.5.tar.gz Sponsored by: The FreeBSD Foundation
(cherry picked from commit afdbf109c6a661a729938f68211054a0a50d38ac) (cherry picked from commit ecb75be376a3e18d3e4836b6ee07015264784694) (cherry picked from commit f0bcebe67ef6cf9f104535d6cd9f151c1b61dd6a) (cherry picked from commit 34aa6f2c2db5cc9655f201a1ef01adbb9fb484d5)
Changes: https://git.tcpdump.org/tcpdump/blob/4a789712f187e3ac7b2c0044c3a3f8c71b83646e:/CHANGES Obtained from: https://www.tcpdump.org/release/tcpdump-4.99.5.tar.xz Sponsored by: The FreeBSD Foundation
(cherry picked from commit 0a7e5f1f02aad2ff5fff1c60f44c6975fd07e1d9)
The previous width of Netif (10 or 8) was too short for modern interface names; make it 12, which is long enough to display "epair0a.1000".
This came up in practice with genet(4) interfaces, since the base interface name is long enough that with the previous limit, VLAN identifiers would be truncated at 1 character in the IPv6 output: "genet0.100" becomes "genet0.1".
The width is now fixed, and doesn't depend on the address family, because there's no reason that length of the interface name would vary based on the AF.
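The truncation described above can be reproduced with plain printf-style width and precision. This is a sketch of the effect only, not netstat's actual formatting code; `format_netif()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Left-justify name in a column of exactly width characters, truncating
 * longer names (equivalent to "%-8.8s" or "%-12.12s" for a fixed width).
 */
static int
format_netif(char *buf, size_t size, const char *name, int width)
{
	assert(buf != NULL && name != NULL && width > 0);
	return (snprintf(buf, size, "%-*.*s", width, width, name));
}
```

With a width of 8, "genet0.100" is cut to "genet0.1"; with a width of 12 it is padded with two spaces, and "epair0a.1000" fits exactly.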
As for the consumer `enc_add_hhooks()`, `hhook_add_hook()` will never fail for the given parameters. Meanwhile, to build the module if_enc(4), at least option INET or INET6 is required, so no need for the error EPFNOSUPPORT.
Standardize the utilities from nuage.lua, to return nil on failure, plus an error message as a second result, and some value different from nil on success.
Make warnmsg() and errmsg() append "nuageinit: " by default. Pass an optional second parameter as false to avoid printing this tag.
Signed-off-by: Jose Luis Duran <jlduran@gmail.com> (cherry picked from commit 945632ca76117029e7bd1f46d17ccb378973daf7)
The hashed password usually contains a "$" sign, which, when used on a shell, must be escaped. Also, the plain text password may contain special characters that require escaping.
Add a quick fix by enclosing it in single quotes. Note that if the plain text password contains a "'", it will still fail. This will be properly fixed in later commits.
Some here documents require the document to be a string literal, especially when passing invalid characters. Enclose it in single quotes.
Signed-off-by: Jose Luis Duran <jlduran@gmail.com> (cherry picked from commit b9ce743c5447e90c2c97f4d49e048c301f708527)
- Export NUAGE_FAKE_ROOTDIR only once
- Use the header section of the test to require the root user
- Use the PWD environment variable
- Set the root/sys shell as /bin/sh
- Use RFC 5737 reserved IP addresses
Signed-off-by: Jose Luis Duran <jlduran@gmail.com> (cherry picked from commit e72457c4f5166eef2a27249e02f3c1e9a1cf852d)
- Add the firstboot-freebsd-update package; this is needed as long as we do not have pkgbase
- Support SLAAC by default to complement DHCPv4 (use SYNCDHCP instead)
Signed-off-by: Jose Luis Duran <jlduran@gmail.com> (cherry picked from commit 120740221fd4a4577e63e6c279f9873cabe449d0)
Merge commit b84d773fd004 from llvm git (by Fangrui Song):
[Parallel] Revert sequential task changes
https://reviews.llvm.org/D148728 introduced `bool Sequential` to unify `execute` and the old `spawn` without argument. However, sequential tasks might be executed by any worker thread (non-deterministic), leading to non-deterministic output for ld.lld -z nocombreloc (see https://reviews.llvm.org/D133003).
In addition, the extra member variables have overhead. This sequential task has only been used for lld parallel relocation scanning.
This patch restores the behavior before https://reviews.llvm.org/D148728 .
Bump lld LINKER_FREEBSD_VERSION for reproducibility fix
The upstream fix to make lld output for our EFI loaders reproducible again was committed in 54521a2ff93a. Bump lld's LINKER_FREEBSD_VERSION to be able to check this in the EFI loader Makefile.
MFC after: 3 days
(cherry picked from commit f97c7fdc59d252cc8611968ffac541d4b8342b8b)
The new sys/conf/std.debug contains the list of debugging options enabled by default in -CURRENT, so they don't need to be listed individually in every kernel config.
Introduce *-DEBUG variants of the major kernel configs.
(cherry picked and modified from commit 4f8f9d708e6a4143f3b178bfab10d0a9b75ba2fe)