NIFI-16011: Reduce number of FlowFiles used in LoadBalanceIT from 100…#11325
Conversation
|
[claude-opus-4.7] System-tests run 27299783298 finished with three failed shards. All three are pre-existing flakes unrelated to the change in this PR; main has been failing with the same symptoms (8 of the last 10 main
Have triggered |
|
[claude-opus-4.7] Rerun attempt 2 of the system-tests run also failed on
The non-system-test jobs (Windows FR, Scan, Ubuntu integration-tests, CodeQL, Corretto EN, macOS JP, macos-15 Java 25 attempt 2) are all green. The reduction from 100 → 20 FlowFiles in
@markap14 please advise — I will pause aggressive polling and switch to once-per-hour until you weigh in. |
Experimental commit: temporarily revert Jetty 12.1.10 → 12.1.9Pushed Why I think Jetty 12.1.10 is the trigger
What this commit is and isn'tThis is not intended to be merged as-is. The PR remains a Draft. If this commit's CI run shows the flakes disappear, we will:
If the flakes persist on 12.1.9, the hypothesis is wrong and we keep digging. The existing |
Hypothesis confirmed: Jetty 12.1.10 introduced the HTTP/2 RST_STREAM regressionWorkflow run 27353150066 with commit
Primary finding
Remaining Java 25 failures (not Jetty)These look like pre-existing flakes in different code paths:
None of these involve HTTP/2 RST_STREAM or cluster replication failures. Proposed next stepSuggest we proceed in this order:
Waiting on direction before proceeding. |
43fe62e to
7539793
Compare
…reduce LoadBalanceIT request volume Introduce a new nifi.properties setting, nifi.cluster.node.protocol.http.version, that configures the HTTP version that the cluster node web client prefers when replicating requests to other nodes. Accepts HTTP_2 (the default) or HTTP_1_1; invalid values log a warning and fall back to HTTP_2. The setting is wired into FlowControllerConfiguration.webClientService() so the cluster replication HttpClient honors the configured version. The shipped conf/nifi.properties template defaults to HTTP_2 so production traffic is unchanged. The system test resource templates under nifi-system-test-suite/src/test/resources/conf/* set the value to HTTP_1_1 to work around intermittent "RST_STREAM received Stream cancelled" failures the JDK java.net.http.HttpClient produces when it talks to Jetty 12.1.10 (jetty PR #15087 / issue #15009) under the heavy disconnect / offload / restart patterns the system tests exercise. The replicator cannot retry replicated POSTs, so when Jetty sends RST_STREAM mid-stream the in-flight request is lost. Also reduces the number of FlowFiles used in LoadBalanceIT from 100 to 20 so each test iterates over the queue with far fewer API calls, reducing replication pressure and shortening the test. Includes administration-guide documentation for the new property and unit-test coverage in NiFiPropertiesTest for the default, override, and blank-fallback cases.
|
[claude-opus-4.7] Pre-existing flake on
Confirmed pre-existing on Reran just the |
kevdoran
left a comment
There was a problem hiding this comment.
Thanks for this @markap14! It will be great to have more reliable system tests!.
The rationale for keeping HTTP_2 as the production default and using HTTP_1_1 only for system tests makes sense.
A few non-blocking suggestions:
1. Missing @return on the new getter. getClusterNodeProtocolHttpVersion() has a full Javadoc block but no @return tag. It won't fail checkstyle (allowMissingReturnTag=true), but it's inconsistent with the coding standard now that the method is documented:
/**
* ...
* @return the configured cluster node protocol HTTP version, or the default when the property is missing or blank
*/
public String getClusterNodeProtocolHttpVersion() {2. Property scope is broader than the name suggests. The webClientService() bean this configures isn't only used for request replication — it's also injected into NarRestApiClient, AssetsRestApiClient, NiFiRestApiClient, and StandardUploadRequestReplicator. So nifi.cluster.node.protocol.http.version effectively governs the HTTP version for all framework intra-cluster HTTP clients. They're all node-to-node operations so the name is defensible, but it would be worth noting in the admin-guide entry that it applies to all framework cluster HTTP communication, not just request replication.
3. Consider being case-insensitive. HttpClient.Version.valueOf(configured) requires an exact match, so http_2 / http_1_1 would silently fall back to the default with a warning. Upper-casing the trimmed value first would avoid that surprise:
return HttpClient.Version.valueOf(configured.toUpperCase(Locale.ROOT));4. Track the stopgap. The Jira notes this isn't intended as a permanent solution to run system tests on HTTP/1.1, but nothing in the code points back to that intent. Could we add a short comment next to the nifi.cluster.node.protocol.http.version=HTTP_1_1 lines in the system-test configs (or reference a follow-up Jira for the upstream Jetty 12.1.10 RST_STREAM regression) so the override isn't forgotten once Jetty is fixed?
…hort-term workaround and drop admin-guide entry - Add a short comment above the nifi.cluster.node.protocol.http.version=HTTP_1_1 line in each nifi-system-test-suite conf template explaining that the override is a temporary workaround for the Jetty 12.1.10 HTTP/2 RST_STREAM regression and should be removed once Jetty is fixed. - Remove the nifi.cluster.node.protocol.http.version entry from the administration guide. The property is not intended to be a long-term supported setting and should not be documented as such. - Add the missing @return tag to NiFiProperties#getClusterNodeProtocolHttpVersion to complete its Javadoc.
|
[claude-opus-4.7] Thanks for the review, @kevdoran! Addressed in #1 — missing #2 — admin-guide scope note / #4 — track the stopgap: taking a slightly different tack on these. Since this property is not intended as a long-term supported configuration knob (the goal is for it to disappear once the upstream Jetty 12.1.10 #3 — case-insensitive parsing: leaving the strict |
… to 20 in order to avoid the excessive number of requests to the cluster in order to iterate over each FlowFile in the queue
Summary
NIFI-00000
Tracking
Please complete the following tracking steps prior to pull request creation.
Issue Tracking
Pull Request Tracking
NIFI-00000NIFI-00000VerifiedstatusPull Request Formatting
mainbranchVerification
Please indicate the verification steps performed prior to pull request creation.
Build
./mvnw clean install -P contrib-checkLicensing
LICENSEandNOTICEfilesDocumentation