udev: skip loop device setup for device-mapper devices#42473
udev: skip loop device setup for device-mapper devices#42473angelcerveraroldan wants to merge 1 commit into
Conversation
Device mapper devices do not have partscan enabled, so probe_gpt_boot_disk_needs_loop would identify them as needing a loop device when they contain the EFI partition from which the machine booted. This causes blkid probing to not take place. Fixes: systemd#42472 Co-developed-by: Claude Opus 4.6 <[email protected]>
Claude review of PR #42473 (68231b2)Small, focused fix. No correctness, memory-safety, lifetime, or security issues found. A few consistency/idiom points below. Suggestions
Nits
|
|
can you explain how this can happen for you? why do you have a GPT partition on top of DM and don't expect partitions to be created for it? |
|
(i am pretty sure we should not filter out DM via C code though. if this makes sense – and i really don't grok in which kind of setup you'd even end up with this – then this should be filtered out in the udev rule level) |
| if (r > 0) | ||
| return 0; | ||
|
|
||
| if (sd_device_get_sysattr_value(dev, "dm/name", NULL) >= 0) { |
There was a problem hiding this comment.
Claude: suggestion: Detecting a device-mapper device by probing the dm/name sysattr deviates from the established conventions in the tree. udev elsewhere identifies DM devices with device_sysname_startswith(dev, "dm-") (e.g. src/udev/udev-worker.c, src/udev/udev-watch.c), and shared block-device code keys off dm/uuid (blockdev_is_encrypted() in src/shared/blockdev-util.c). Both dm/name and dm/uuid live on the whole-disk DM node that is probed here, so the check is functionally correct, but one of the existing idioms would be more consistent and self-documenting. As a secondary point, the >= 0 test collapses every failure mode (-EIO, -ENOMEM, …) into "not a DM device", whereas the rest of probe_gpt_boot_disk_needs_loop() captures the return value in r and propagates genuine errors via log_device_debug_errno(). Switching to device_sysname_startswith() sidesteps that entirely; if you keep sd_device_get_sysattr_value(), consider distinguishing -ENOENT (absent) from real errors.
| if (r > 0) | ||
| return 0; | ||
|
|
||
| if (sd_device_get_sysattr_value(dev, "dm/name", NULL) >= 0) { |
There was a problem hiding this comment.
Claude: suggestion: The DM-device skip is placed inside the if (!sector_size_mismatch) block, so it only takes effect when partition scanning is disabled but sector sizes match. That correctly matches the reported bug (#42472). However, the sector-size-mismatch branch below is left unchanged: if a DM device's GPT genuinely uses a different sector size than the DM block device, the code will still set ID_PART_GPT_AUTO_ROOT_DISK_NEEDS_LOOP and attempt to stack a loop device on the DM node — the same ineffective action the fix avoids for the partscan case. The mismatch-on-DM scenario is rare (mostly El Torito CD-ROM boot), so this is not blocking, but it's worth confirming that behavior is intended rather than an oversight of the chosen placement.
| if (r > 0) | ||
| return 0; | ||
|
|
||
| if (sd_device_get_sysattr_value(dev, "dm/name", NULL) >= 0) { |
There was a problem hiding this comment.
Claude: nit: Per CODING_STYLE, a NULL literal passed as an argument should be annotated with the parameter name. The analogous call in src/udev/udev-builtin-input_id.c writes sd_device_get_sysattr_value(pdev, "capabilities/ev", /* ret= */ NULL); this call should likewise use /* ret= */ NULL.
Thanks for the quick reply. The issue was encountered in a test where the whole disk is a DM device, as it is setup as a multipath disk. Because of this, the GPT headers as well as the EFI partition are inside the DM device. Before the regression, since there was no sector size mismatch, The issue is not that the loop device is setup, rather its that If interested, this is a link to the exact test that is failing: |
but why wouldn't partition scanning be on in that case? you are booting from a multipath disk but are using th enon-multipath'ed partitions as backing source for your actual file systems? that sounds broken? are you sure your test is actually testing the right thing? |
|
Hey @poettering 👋. It's been a while 😄 Maybe it would help if we describe the problem more rather than focusing on the proposed solution here (as there may be a more appropriate way to fix the problem). The multipath disks are partitioned disks with an OS installed on them. The filesystems (boot, root, EFI) sit on the partitions. The problem we are hitting is that |
|
To further illustrate.. For a system setup for this test with Fedora 44 ( |
|
also: |
I understand all that. But I am pretty sure that testcase is borked, because you are setting up multipath, but do not enable partitions on the multipath DM device. This means for all the partitions on the multipath DM device the only available block devices are going to be the non-multi-pathed partition block devices, and that means you are actually bypassing multipath for your file system access. |
so these block devices suggest you do have partitions enabled after all? or are you using a userspace partition scanner? |
Yes, sorry for the confusion, when I said the DM device does not have partscan enabled, what I meant is that the kernel partscan is disabled on the whole disk DM multipath device. The multipath disk itself contains the GPT partition table, and a userspace partition scanner is used to set them up. I believe that it is handled by kpartx via dracuts multipath module and its associated rules. |
Device mapper devices do not have partscan enabled, so probe_gpt_boot_disk_needs_loop would identify them as needing a loop device when they contain the EFI partition from which the machine booted. This causes blkid probing to not take place.
Fixes: #42472
Co-developed-by: Claude Opus 4.6 [email protected]