CAMEL-21438: Fix six flaky tests and missing aarch64 image for TensorFlow Serving#23927
Conversation
…rrent puts callEvictionIfNeeded() runs after OperationContext releases its read lock, so the cache can still be oversized when the CountDownLatch fires. Replace immediate post-latch assertions with Awaitility.await().untilAsserted() to give in-flight evictions time to complete before checking size and counter. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
With continued(true) and bridgeErrorHandler=true, the bridge creates a synthetic exchange, the onException route executes, and the exchange then continues through the main route body (including mock:result). The test incorrectly expected mock:result to receive 0 messages. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
|
🌟 Thank you for your contribution to the Apache Camel project! 🌟 🐫 Apache Camel Committers, please review the following items:
|
|
🧪 CI tested the following changed modules:
All tested modules (13 modules)
|
davsclaus
left a comment
There was a problem hiding this comment.
Review: CAMEL-21438 — Fix flaky tests and missing aarch64 image
CI is green. The test reliability fixes are well-motivated and follow project conventions. A few notes below.
Findings
1. PR description is incomplete (Minor)
The description mentions 3 changes, but the diff modifies 7 files. Four additional fixes are not covered:
FileProducerCharsetUTFtoISOConvertBodyToTest— atomic file write +deleteIfExistscleanupFileSortByExpressionTest— move expectations before routes, remove hardcoded 1s timeoutFileSortByIgnoreCaseExpressionTest— remove redundantcontext.start(), fixinitialDelay/delay, typo fixFileSortByNestedExpressionTest— remove redundantcontext.start(), move expectations
Please update the PR description to cover all changes.
2. DefaultConsumerBridgeErrorHandlerContinuedTest — assertion reversal
This reverses the intent of the original CAMEL-22907 test (fe9abb35c3b7), which explicitly asserted mock:result should receive 0 messages. The change to expectedMinimumMessageCount(1) is consistent with how continued(true) should work — after onException processing, the exchange continues through the route to mock:result. The original expectedMessageCount(0) was a timing-dependent assertion that passed only when the mock check completed before the continued exchange arrived.
This looks correct to me, but since it reverses an intentional assertion from CAMEL-22907, it would be good to get confirmation from @gnodet that this interpretation is right.
3. bitnamilegacy image
The mirror.gcr.io/bitnamilegacy/tensorflow-serving:2.19.1 image uses a "legacy" namespace. Worth checking that it has ongoing maintenance for long-term reliability.
Verified (no issues)
SimpleLRUCacheTest— Awaitility usage is correct; eviction callbacks fire asynchronously. Follows the project's "noThread.sleep()" convention.- File sort tests — Moving mock expectations before
context.addRoutes()and removing redundantcontext.start()(context is auto-started byContextTestSupport.setUp()) is the right pattern. FileProducerCharsetUTFtoISOConvertBodyToTest— Atomic rename prevents file consumer from seeing a half-written file. Good defensive fix.- TensorFlow aarch64 image — Switching from Docker Hub to
mirror.gcr.iofollows project conventions.
Note: This review covers project conventions, test patterns, and behavioral correctness. It does not replace specialized tools such as CodeRabbit or SonarCloud.
This review was generated by an AI agent and may contain inaccuracies. Please verify all suggestions before applying.
Claude Code on behalf of Claus Ibsen
| getMockEndpoint("mock:result").expectedMessageCount(0); | ||
| // With continued(true), the exchange continues through the main route after error handling, | ||
| // so mock:result should receive messages. | ||
| getMockEndpoint("mock:result").expectedMinimumMessageCount(1); |
There was a problem hiding this comment.
This reverses the original CAMEL-22907 assertion which expected 0 messages here, with the rationale that "the consumer throws before creating a valid exchange". With continued(true), the bridge error handler creates an exchange, onException processes it, and continued(true) allows it to proceed through the route to mock:result. The original expectedMessageCount(0) was a race condition — it passed only when the mock assertion completed before the continued exchange arrived.
The new assertion looks correct, but since it contradicts the original test's explicit intent, confirmation from the CAMEL-22907 author would be valuable.
There was a problem hiding this comment.
@gnodet, you added this test in CAMEL-22907 with expectedMessageCount(0) and the comment "since the consumer throws before creating a valid exchange, mock:result won't receive messages".
We found it flaky: after the CAMEL-22907 fix, continued(true) causes the exchange to continue through the main route after onException handling, so mock:result does receive messages. The assertion was changed to expectedMinimumMessageCount(1).
Could you confirm whether this matches the intended post-fix behavior, or whether mock:result was expected to remain at zero for a reason we may have missed?
Claude Code on behalf of Adriano Machado
| ## --------------------------------------------------------------------------- | ||
| tensorflow.serving.container=mirror.gcr.io/tensorflow/serving:2.19.1 | ||
| tensorflow.serving.container.aarch64=bitnami/tensorflow-serving:2.18.0 | ||
| tensorflow.serving.container.aarch64=mirror.gcr.io/bitnamilegacy/tensorflow-serving:2.19.1 |
There was a problem hiding this comment.
Minor: the bitnamilegacy namespace suggests this image may be deprecated upstream. Worth verifying it has ongoing maintenance so this doesn't need another fix soon.
… addRoutes MockEndpoint.expectedBodiesReceived() must be called before routes start delivering messages. With initialDelay=0 the file consumer fires on a background thread as soon as addRoutes() returns; if messages arrive before expectedBodiesReceived() is called, expectedBodyValues is still null so performAssertions() never populates actualBodyValues, causing the order assertion to compare each expected body against null. Fix: move getMockEndpoint() + expectedBodiesReceived() before addRoutes() in all three FileSortBy test classes. Also remove the no-op context.start() calls that widened the race window. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Two remaining issues after two prior fixes: 1. Race between @beforeeach and the file consumer: writeTestData() runs after setUp() has already started the context and route. With initialDelay=0 the consumer polls every 10ms; a poll landing between Files.newOutputStream() (creates 0-byte file) and fos.close() picks up an empty file and produces 0-byte output, causing the size > 0 assertion to never be satisfied. Fix: write to a .tmp file first, then atomically rename (POSIX rename syscall). The consumer's fileName filter ignores the .tmp file and only ever sees the fully-written target. 2. @AfterEach used Files.delete() instead of Files.deleteIfExists(): if the test fails before the output file is created, cleanup also throws, masking the original failure with "cannot run due to an error cleaning up". Fix: switch to Files.deleteIfExists(). Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Thanks for flagging this. The However, there is currently no well-maintained
The remaining alternative is to apply the Claude Code on behalf of Adriano Machado |
…cit mock timeout Two issues caused non-deterministic failures: 1. The expiry route consumed from `sjms:topic:ExpiryQueue` but Artemis routes expired messages to an ANYCAST address. A JMS Topic consumer creates a MULTICAST subscription, which does not receive ANYCAST messages. Whether the test passed depended on which subscriber created the ExpiryQueue address first. Fixed by switching to `sjms:queue:ExpiryQueue`. 2. Both test methods called the static `MockEndpoint.assertIsSatisfied(context)` which delegates to each mock's own `resultWaitTime` (defaults to 0). With `requestTimeout=500ms` and `timeToLive=1000ms`, the expiry notification arrives after the assertion was already evaluated. Fixed by calling `mockExpiredAdvisory.assertIsSatisfied(10_000)` with an explicit 10-second timeout on the instance directly. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
Yeah though aarch64 is for mac developers to run on the laptop and not for production users. So it may be okay to allow them to test and validate tensorflow. Can we add a comment that about so we will not forget. |
Description
Fix six test reliability issues:
1.
DefaultConsumerBridgeErrorHandlerContinuedTest(flaky assertion)The test asserted
mock:resultwould receive zero messages whencontinued(true)is set on the error handler. In practice,continued(true)causes the exchange to continue through the main route after error handling, somock:resultdoes receive at least one message. Changed the expectation toexpectedMinimumMessageCount(1).2.
SimpleLRUCacheTest(race condition after concurrent puts)After the concurrent-put latch completed, the test immediately asserted cache size and eviction counter. The eviction listener callback can fire asynchronously, causing intermittent failures. Wrapped the post-latch assertions in
Awaitility.await().atMost(10, TimeUnit.SECONDS).untilAsserted(...)so the assertions wait for the eviction to settle.3. TensorFlow Serving test-infra: missing aarch64 image
The
bitnami/tensorflow-serving:2.18.0image on Docker Hub is not available foraarch64. Replaced it withmirror.gcr.io/bitnamilegacy/tensorflow-serving:2.19.1, which is available on the Google mirror registry for that architecture, consistent with the project's preference formirror.gcr.ioover Docker Hub.4.
FileSortByExpressionTest,FileSortByNestedExpressionTest,FileSortByIgnoreCaseExpressionTest(mock expectation race)MockEndpoint.expectedBodiesReceived()must be called before routes start delivering messages. WithinitialDelay=0, the file consumer fires on a background thread as soon asaddRoutes()returns. If messages arrive beforeexpectedBodiesReceived()is called,expectedBodyValuesis still null soperformAssertions()never populatesactualBodyValues, causing body order assertions to compare each expected value againstnull.Fix: move
getMockEndpoint()andexpectedBodiesReceived()beforecontext.addRoutes()in all three test classes. Also removed no-opcontext.start()calls that widened the race window. The@DisabledOnOs(ppc64le)guard onFileSortByIgnoreCaseExpressionTestis retained until ppc64le CI can confirm the fix is sufficient on that platform.5.
FileProducerCharsetUTFtoISOConvertBodyToTest(0-byte file race + cleanup masking)Two issues remained after two prior fixes (CAMEL-18608):
@BeforeEach writeTestData()runs after the superclasssetUp()starts the context and route. WithinitialDelay=0the consumer polls every 10 ms. A poll landing betweenFiles.newOutputStream()(creates a 0-byte file) andfos.close()picks up an empty file, producing 0-byte output and causing the size assertion to never be satisfied.Fix: write to a
.tmpfile first then atomically rename viaFiles.move(..., ATOMIC_MOVE). The consumer'sfileNamefilter ignores the.tmpfile and only ever sees the fully-written target.@AfterEachusedFiles.delete()instead ofFiles.deleteIfExists(): if the test fails before the output file is created, cleanup also throws, masking the original failure with a misleading "cannot run due to an error cleaning up" message.Fix: switch to
Files.deleteIfExists().6.
QueueProducerQoSTest(ANYCAST/MULTICAST mismatch + zero-timeout mock assertion)Two issues caused non-deterministic failures:
sjms:topic:ExpiryQueuebut Artemis routes expired messages to an ANYCAST address. A JMS Topic consumer creates a MULTICAST subscription, which does not receive ANYCAST messages. Whether the test passed depended on which subscriber created the ExpiryQueue address first. Fixed by switching tosjms:queue:ExpiryQueue.MockEndpoint.assertIsSatisfied(context)which delegates to each mock's ownresultWaitTime(defaults to 0). WithrequestTimeout=500msandtimeToLive=1000ms, the expiry notification arrives after the assertion was already evaluated. Fixed by callingmockExpiredAdvisory.assertIsSatisfied(10_000)with an explicit 10-second timeout.Target
mainbranch)Tracking
Apache Camel coding standards and style
mvn clean install -DskipTestslocally from root folder and I have committed all auto-generated changes.Claude Code on behalf of Adriano Machado