Optimize delta resolution from O(N^2) to O(N) in pack indexer.#7268
Open
sstepashka wants to merge 1 commit into
Open
Optimize delta resolution from O(N^2) to O(N) in pack indexer.#7268sstepashka wants to merge 1 commit into
sstepashka wants to merge 1 commit into
Conversation
The original git_indexer_commit resolved REF_DELTA objects by retrying the entire unresolved list on each pass — O(N^2) for packs with deep or wide delta chains. There are 2 dependency maps: - `desp_by_offset` - base object pack offset to `OFS_DELTA` children - `deps_by_oid` - OID to its `REF_DELTA` children Delta resolution then becomes a single BFS traversal: each resolved object immediately looks up and resolves its children in O(1), giving O(N) total work. Thin pack fixup (injecting missing bases from the ODB) was preserved and integrated into the same traversal. New regression test (`test_pack_indexer__delta_chain`) was added to cover out-of-order delta resolution. Existing tests pass. I also verified the change by cloning a giant repo (~5.7 Gb) before and after with some clone example of a bare mirror of a single branch and compared the resulting directories using `diff -rq`. They are identical. The change doesn't make too much difference with default `-DUSE_SHA1`, but when I switch to use `CommonCrypto` on MacOS. I get: ``` 8:57.53 total ``` instead of previous ``` 17:12.05 total ```
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.



The original git_indexer_commit resolved REF_DELTA objects by retrying the entire unresolved list on each pass — O(N^2) for packs with deep or wide delta chains.
There are 2 dependency maps:
desp_by_offset- base object pack offset toOFS_DELTAchildrendeps_by_oid- OID to itsREF_DELTAchildrenDelta resolution then becomes a single BFS traversal: each resolved object immediately looks up and resolves its children in O(1), giving O(N) total work. Thin pack fixup (injecting missing bases from the ODB) was preserved and integrated into the same traversal.
New regression test (
test_pack_indexer__delta_chain) was added to cover out-of-order delta resolution. Existing tests pass.I also verified the change by cloning a giant repo (~5.7 Gb) before and after with some clone example of a bare mirror of a single branch and compared the resulting directories using
diff -rq. They are identical.The change doesn't make too much difference with default
-DUSE_SHA1, but when I switch to useCommonCryptoon MacOS. I get:instead of previous