Implement cpuinfo_deinitialize() to free heap-allocated globals#387
Conversation
|
Hi @crvineeth97! Thank you for your pull request and welcome to our community. Action RequiredIn order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you. ProcessIn order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with If you have received this in error or have any questions, please contact us at [email protected]. Thanks! |
Bump pytorch/cpuinfo to crvineeth97/cpuinfo@df8c6a8 which implements cpuinfo_deinitialize() to properly free heap-allocated globals. This prevents memory leak reports from App Verifier, Valgrind, and sanitizers when ORT is dynamically loaded/unloaded. ORT integration: - Add CPUIDInfo::ShutDown() which calls cpuinfo_deinitialize() - Call ShutdownCpuInfo() from DllMain on DLL_PROCESS_DETACH - In memleak-check builds, also call shutdown during process termination The cpuinfo bump also includes upstream fixes that make three ORT patches redundant (removed): - patch_vcpkg_arm64ec_support.patch (pytorch/cpuinfo#324) - win_arm_fp16_detection_fallback.patch (pytorch/cpuinfo#348) - 0001-Add-implementation-for-cpuinfo_deinitialize.patch The patch_cpuinfo_h_for_arm64ec.patch is retained as it is not yet upstream in cpuinfo. Related: pytorch/cpuinfo#150, pytorch/cpuinfo#387 Co-authored-by: Copilot <[email protected]>
|
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks! |
831e1bb to
8ff94bd
Compare
|
Hi @fbarchard @malfet @GregoryComer |
malfet
left a comment
There was a problem hiding this comment.
In principle it sounds great, but I think it's better to move the portions for de-init next to respective allocation routines?
|
The init and deinit guards are reset at the end of the deinitialize function so that if a DLL loads, unloads and then loads again, it can initialize if required. The deinit tests I added in the previous commit were incorrect since the ABI doesn't allow checking the various variables when cpuinfo_is_initialized is false. The init.cc tests sufficiently test the guard reset since each test does init and deinit |
|
@malfet @fbarchard @GregoryComer May I please get a review for this? |
|
Hi @malfet @fbarchard @GregoryComer, I have a pending pull request in the onnxruntime repo (microsoft/onnxruntime#28245) that depends on this PR. I would greatly appreciate your reviews. |
GregoryComer
left a comment
There was a problem hiding this comment.
This overall looks reasonable. Given that the changes are mostly gated to the new deinit functions, it should be relatively safe. One request - can you add some tests that cover de-init and then re-init? It would be good to have regression tests to make sure that things function correctly across multiple re-init cycles.
Thank you! The existing init.cc tests already implicitly exercise deinit → re-init cycles and that's the reason why I had to add the reinit in the first place. Each of the ~30 test cases calls cpuinfo_initialize() at the start and cpuinfo_deinitialize() at the end, and since GTest runs them sequentially in a single process, every test after the first is effectively a re-init cycle. Is that sufficient? |
Ah, good point. That should be sufficient, I think. Since I haven't seen any objections from the other reviews and CI is green, I'll go ahead and merge. |
|
Thank you so much! I really appreciate it |
Bump pytorch/cpuinfo to crvineeth97/cpuinfo@df8c6a8 which implements cpuinfo_deinitialize() to properly free heap-allocated globals. This prevents memory leak reports from App Verifier, Valgrind, and sanitizers when ORT is dynamically loaded/unloaded. ORT integration: - Add CPUIDInfo::ShutDown() which calls cpuinfo_deinitialize() - Call ShutdownCpuInfo() from DllMain on DLL_PROCESS_DETACH - In memleak-check builds, also call shutdown during process termination The cpuinfo bump also includes upstream fixes that make three ORT patches redundant (removed): - patch_vcpkg_arm64ec_support.patch (pytorch/cpuinfo#324) - win_arm_fp16_detection_fallback.patch (pytorch/cpuinfo#348) - 0001-Add-implementation-for-cpuinfo_deinitialize.patch The patch_cpuinfo_h_for_arm64ec.patch is retained as it is not yet upstream in cpuinfo. Related: pytorch/cpuinfo#150, pytorch/cpuinfo#387 Co-authored-by: Copilot <[email protected]>
Bump pytorch/cpuinfo to crvineeth97/cpuinfo@df8c6a8 which implements cpuinfo_deinitialize() to properly free heap-allocated globals. This prevents memory leak reports from App Verifier, Valgrind, and sanitizers when ORT is dynamically loaded/unloaded. ORT integration: - Add CPUIDInfo::ShutDown() which calls cpuinfo_deinitialize() - Call ShutdownCpuInfo() from DllMain on DLL_PROCESS_DETACH - In memleak-check builds, also call shutdown during process termination The cpuinfo bump also includes upstream fixes that make three ORT patches redundant (removed): - patch_vcpkg_arm64ec_support.patch (pytorch/cpuinfo#324) - win_arm_fp16_detection_fallback.patch (pytorch/cpuinfo#348) - 0001-Add-implementation-for-cpuinfo_deinitialize.patch The patch_cpuinfo_h_for_arm64ec.patch is retained as it is not yet upstream in cpuinfo. Related: pytorch/cpuinfo#150, pytorch/cpuinfo#387 Co-authored-by: Copilot <[email protected]>
Currently
cpuinfo_deinitialize()is a no-op stub. Applications that dynamically load and unload libraries using cpuinfo (e.g., ONNX Runtime as a DLL on Windows, or as a shared library viadlopen/dlcloseon Linux) leave orphaned heap allocations when the library is unloaded mid-process. These are flagged as memory leaks by tools such as App Verifier and Valgrind.This PR implements cleanup for all heap-allocated globals: processors, cores, clusters, packages, caches, and uarch info. Platform-specific details:
HeapFreeto matchHeapAllocallocations. Cleanup is guarded byInitOnceExecuteOncefor thread safety.init-by-logical-sys-info.c.free()to matchcallocallocations. Cleanup is guarded bypthread_oncefor thread safety.cpuinfo_deinitialize()is a no-op on these platforms because they use static storage for some globals (packages, L3 cache) that cannot be safely freed.Cleanup is idempotent, meaning that calling
cpuinfo_deinitialize()multiple times is safe.Also fixes a minor existing leak of
processor_infosin the x86 Windows init cleanup path.Related issues