Skip to content

udev: skip loop device setup for device-mapper devices#42473

Open
angelcerveraroldan wants to merge 1 commit into
systemd:mainfrom
angelcerveraroldan:dm-devices-loop
Open

udev: skip loop device setup for device-mapper devices#42473
angelcerveraroldan wants to merge 1 commit into
systemd:mainfrom
angelcerveraroldan:dm-devices-loop

Conversation

@angelcerveraroldan
Copy link
Copy Markdown

Device mapper devices do not have partscan enabled, so probe_gpt_boot_disk_needs_loop would identify them as needing a loop device when they contain the EFI partition from which the machine booted. This causes blkid probing to not take place.

Fixes: #42472

Co-developed-by: Claude Opus 4.6 [email protected]

Device mapper devices do not have partscan enabled, so
probe_gpt_boot_disk_needs_loop would identify them as needing a loop device
when they contain the EFI partition from which the machine booted. This causes
blkid probing to not take place.

Fixes: systemd#42472

Co-developed-by: Claude Opus 4.6 <[email protected]>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 4, 2026

Claude review of PR #42473 (68231b2)

Small, focused fix. No correctness, memory-safety, lifetime, or security issues found. A few consistency/idiom points below.

Suggestions

  • Use idiomatic DM detectionsrc/udev/udev-builtin-blkid.c:462 — prefer device_sysname_startswith(dev, "dm-") or the dm/uuid sysattr over a raw dm/name existence probe; the >= 0 test also folds real errors into "not a DM device".
  • DM skip not applied on sector-size mismatchsrc/udev/udev-builtin-blkid.c:462 — DM devices are only excused when sector sizes match; the mismatch branch can still flag them as needing a loop device.

Nits

  • Annotate NULL argumentsrc/udev/udev-builtin-blkid.c:462 — use /* ret= */ NULL per CODING_STYLE.

Workflow run

@github-actions github-actions Bot added udev please-review PR is ready for (re-)review by a maintainer labels Jun 4, 2026
@poettering
Copy link
Copy Markdown
Member

can you explain how this can happen for you? why do you have a GPT partition on top of DM and don't expect partitions to be created for it?

@poettering
Copy link
Copy Markdown
Member

(i am pretty sure we should not filter out DM via C code though. if this makes sense – and i really don't grok in which kind of setup you'd even end up with this – then this should be filtered out in the udev rule level)

@poettering poettering added needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer and removed please-review PR is ready for (re-)review by a maintainer labels Jun 4, 2026
if (r > 0)
return 0;

if (sd_device_get_sysattr_value(dev, "dm/name", NULL) >= 0) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: Detecting a device-mapper device by probing the dm/name sysattr deviates from the established conventions in the tree. udev elsewhere identifies DM devices with device_sysname_startswith(dev, "dm-") (e.g. src/udev/udev-worker.c, src/udev/udev-watch.c), and shared block-device code keys off dm/uuid (blockdev_is_encrypted() in src/shared/blockdev-util.c). Both dm/name and dm/uuid live on the whole-disk DM node that is probed here, so the check is functionally correct, but one of the existing idioms would be more consistent and self-documenting. As a secondary point, the >= 0 test collapses every failure mode (-EIO, -ENOMEM, …) into "not a DM device", whereas the rest of probe_gpt_boot_disk_needs_loop() captures the return value in r and propagates genuine errors via log_device_debug_errno(). Switching to device_sysname_startswith() sidesteps that entirely; if you keep sd_device_get_sysattr_value(), consider distinguishing -ENOENT (absent) from real errors.

if (r > 0)
return 0;

if (sd_device_get_sysattr_value(dev, "dm/name", NULL) >= 0) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: suggestion: The DM-device skip is placed inside the if (!sector_size_mismatch) block, so it only takes effect when partition scanning is disabled but sector sizes match. That correctly matches the reported bug (#42472). However, the sector-size-mismatch branch below is left unchanged: if a DM device's GPT genuinely uses a different sector size than the DM block device, the code will still set ID_PART_GPT_AUTO_ROOT_DISK_NEEDS_LOOP and attempt to stack a loop device on the DM node — the same ineffective action the fix avoids for the partscan case. The mismatch-on-DM scenario is rare (mostly El Torito CD-ROM boot), so this is not blocking, but it's worth confirming that behavior is intended rather than an oversight of the chosen placement.

if (r > 0)
return 0;

if (sd_device_get_sysattr_value(dev, "dm/name", NULL) >= 0) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude: nit: Per CODING_STYLE, a NULL literal passed as an argument should be annotated with the parameter name. The analogous call in src/udev/udev-builtin-input_id.c writes sd_device_get_sysattr_value(pdev, "capabilities/ev", /* ret= */ NULL); this call should likewise use /* ret= */ NULL.

@angelcerveraroldan
Copy link
Copy Markdown
Author

angelcerveraroldan commented Jun 4, 2026

can you explain how this can happen for you? why do you have a GPT partition on top of DM and don't expect partitions to be created for it?

Thanks for the quick reply. The issue was encountered in a test where the whole disk is a DM device, as it is setup as a multipath disk. Because of this, the GPT headers as well as the EFI partition are inside the DM device.

Before the regression, since there was no sector size mismatch, probe_gpt_boot_disk_needs_loop would return 0, this function now returns 1 because DM devices are marked as not having partscan enabled [1].

The issue is not that the loop device is setup, rather its that builtin_blkid now returns earlier (because probe_gpt_boot_disk_needs_loop returns 1) than it did before, and blkid probing is no longer taking place, so when the system boots, properties like ID_PART_TABLE_UUID are not set. Would a udev rule not lead to the same issue ?

If interested, this is a link to the exact test that is failing:
coreos/fedora-coreos-config#4196

[1] https://github.com/torvalds/linux/blob/ba3e43a9e601636f5edb54e259a74f96ca3b8fd8/drivers/md/dm.c#L2352

@poettering
Copy link
Copy Markdown
Member

Thanks for the quick reply. The issue was encountered in a test where the whole disk is a DM device, as it is setup as a multipath disk. Because of this, the GPT headers as well as the EFI partition are inside the DM device.

but why wouldn't partition scanning be on in that case? you are booting from a multipath disk but are using th enon-multipath'ed partitions as backing source for your actual file systems? that sounds broken? are you sure your test is actually testing the right thing?

@dustymabe
Copy link
Copy Markdown
Contributor

dustymabe commented Jun 4, 2026

Hey @poettering 👋. It's been a while 😄

Maybe it would help if we describe the problem more rather than focusing on the proposed solution here (as there may be a more appropriate way to fix the problem).

The multipath disks are partitioned disks with an OS installed on them. The filesystems (boot, root, EFI) sit on the partitions.

The problem we are hitting is that ID_PART_TABLE_UUID isn't set for the device (see the consuming code here) and we think that is because builtin_blkid is now returning here earlier than it was prior to 4e0eabd.

@dustymabe
Copy link
Copy Markdown
Contributor

dustymabe commented Jun 4, 2026

To further illustrate.. For a system setup for this test with Fedora 44 (systemd-259.5-1.fc44.x86_64) this is what the disks look like:

[core@qemu0 ~]$ sudo multipath -v 2 -ll
0x4b8a85d8f244deb7 dm-0 NVME,VirtualMultipath
size=15G features='0' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=1 status=active
  |- 6:0:0:0 sda 8:0  active ready running
  `- 7:0:0:0 sdb 8:16 active ready running

 
[core@qemu0 ~]$ lsblk
NAME                     MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINTS
sda                        8:0    0   15G  0 disk  
└─0x4b8a85d8f244deb7     252:0    0   15G  0 mpath 
  ├─0x4b8a85d8f244deb7p1 252:1    0    1M  0 part  
  ├─0x4b8a85d8f244deb7p2 252:2    0  127M  0 part  
  ├─0x4b8a85d8f244deb7p3 252:3    0  384M  0 part  /boot
  ├─0x4b8a85d8f244deb7p4 252:4    0   10G  0 part  /sysroot/ostree/deploy/fedora-coreos/var
  │                                                /sysroot
  │                                                /etc
  └─0x4b8a85d8f244deb7p5 252:5    0  4.5G  0 part  /var
sdb                        8:16   0   15G  0 disk  
└─0x4b8a85d8f244deb7     252:0    0   15G  0 mpath 
  ├─0x4b8a85d8f244deb7p1 252:1    0    1M  0 part  
  ├─0x4b8a85d8f244deb7p2 252:2    0  127M  0 part  
  ├─0x4b8a85d8f244deb7p3 252:3    0  384M  0 part  /boot
  ├─0x4b8a85d8f244deb7p4 252:4    0   10G  0 part  /sysroot/ostree/deploy/fedora-coreos/var
  │                                                /sysroot
  │                                                /etc
  └─0x4b8a85d8f244deb7p5 252:5    0  4.5G  0 part  /var

@dustymabe
Copy link
Copy Markdown
Contributor

also:

[core@qemu0 ~]$ cat /proc/mounts | sort | head -n 5
/dev/mapper/0x4b8a85d8f244deb7p3 /boot ext4 ro,seclabel,nosuid,nodev,relatime 0 0
/dev/mapper/0x4b8a85d8f244deb7p4 /etc xfs rw,seclabel,relatime,inode64,logbufs=8,logbsize=32k,prjquota 0 0
/dev/mapper/0x4b8a85d8f244deb7p4 /sysroot xfs ro,seclabel,relatime,inode64,logbufs=8,logbsize=32k,prjquota 0 0
/dev/mapper/0x4b8a85d8f244deb7p4 /sysroot/ostree/deploy/fedora-coreos/var xfs rw,seclabel,relatime,inode64,logbufs=8,logbsize=32k,prjquota 0 0
/dev/mapper/0x4b8a85d8f244deb7p5 /var ext4 rw,seclabel,relatime 0 0

@poettering
Copy link
Copy Markdown
Member

Maybe it would help if we describe the problem more rather than focusing on the proposed solution here (as there may be a more appropriate way to fix the problem).

The multipath disks are partitioned disks with an OS installed on them. The filesystems (boot, root, EFI) sit on the partitions.

The problem we are hitting is that ID_PART_TABLE_UUID isn't set for the device (see the consuming code here) and we think that is because builtin_blkid is now returning here earlier than it was prior to 4e0eabd.

I understand all that. But I am pretty sure that testcase is borked, because you are setting up multipath, but do not enable partitions on the multipath DM device. This means for all the partitions on the multipath DM device the only available block devices are going to be the non-multi-pathed partition block devices, and that means you are actually bypassing multipath for your file system access.

@poettering
Copy link
Copy Markdown
Member

/dev/mapper/0x4b8a85d8f244deb7p4 /etc xfs rw,seclabel,relatime,inode64,logbufs=8,logbsize=32k,prjquota 0 0
/dev/mapper/0x4b8a85d8f244deb7p4 /sysroot xfs ro,seclabel,relatime,inode64,logbufs=8,logbsize=32k,prjquota 0 0
/dev/mapper/0x4b8a85d8f244deb7p4 /sysroot/ostree/deploy/fedora-coreos/var xfs rw,seclabel,relatime,inode64,logbufs=8,logbsize=32k,prjquota 0 0
/dev/mapper/0x4b8a85d8f244deb7p5 /var ext4 rw,seclabel,relatime 0 0

so these block devices suggest you do have partitions enabled after all? or are you using a userspace partition scanner?

@angelcerveraroldan
Copy link
Copy Markdown
Author

so these block devices suggest you do have partitions enabled after all? or are you using a userspace partition scanner?

Yes, sorry for the confusion, when I said the DM device does not have partscan enabled, what I meant is that the kernel partscan is disabled on the whole disk DM multipath device.

The multipath disk itself contains the GPT partition table, and a userspace partition scanner is used to set them up. I believe that it is handled by kpartx via dracuts multipath module and its associated rules.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

claude-review needs-reporter-feedback ❓ There's an unanswered question, the reporter needs to answer udev

Development

Successfully merging this pull request may close these issues.

DM devices incorrectly identified as needing a loop device

4 participants