CAMEL-21438: Fix six flaky tests and missing aarch64 image for TensorFlow Serving by ammachado · Pull Request #23927 · apache/camel

ammachado · 2026-06-11T03:48:17Z

Description

Fix six test reliability issues:

1. DefaultConsumerBridgeErrorHandlerContinuedTest (flaky assertion)
The test asserted mock:result would receive zero messages when continued(true) is set on the error handler. In practice, continued(true) causes the exchange to continue through the main route after error handling, so mock:result does receive at least one message. Changed the expectation to expectedMinimumMessageCount(1).

2. SimpleLRUCacheTest (race condition after concurrent puts)
After the concurrent-put latch completed, the test immediately asserted cache size and eviction counter. The eviction listener callback can fire asynchronously, causing intermittent failures. Wrapped the post-latch assertions in Awaitility.await().atMost(10, TimeUnit.SECONDS).untilAsserted(...) so the assertions wait for the eviction to settle.

3. TensorFlow Serving test-infra: missing aarch64 image
The bitnami/tensorflow-serving:2.18.0 image on Docker Hub is not available for aarch64. Replaced it with mirror.gcr.io/bitnamilegacy/tensorflow-serving:2.19.1, which is available on the Google mirror registry for that architecture, consistent with the project's preference for mirror.gcr.io over Docker Hub.

4. FileSortByExpressionTest, FileSortByNestedExpressionTest, FileSortByIgnoreCaseExpressionTest (mock expectation race)
MockEndpoint.expectedBodiesReceived() must be called before routes start delivering messages. With initialDelay=0, the file consumer fires on a background thread as soon as addRoutes() returns. If messages arrive before expectedBodiesReceived() is called, expectedBodyValues is still null so performAssertions() never populates actualBodyValues, causing body order assertions to compare each expected value against null.

Fix: move getMockEndpoint() and expectedBodiesReceived() before context.addRoutes() in all three test classes. Also removed no-op context.start() calls that widened the race window. The @DisabledOnOs(ppc64le) guard on FileSortByIgnoreCaseExpressionTest is retained until ppc64le CI can confirm the fix is sufficient on that platform.

5. FileProducerCharsetUTFtoISOConvertBodyToTest (0-byte file race + cleanup masking)
Two issues remained after two prior fixes (CAMEL-18608):

@BeforeEach writeTestData() runs after the superclass setUp() starts the context and route. With initialDelay=0 the consumer polls every 10 ms. A poll landing between Files.newOutputStream() (creates a 0-byte file) and fos.close() picks up an empty file, producing 0-byte output and causing the size assertion to never be satisfied.
Fix: write to a .tmp file first then atomically rename via Files.move(..., ATOMIC_MOVE). The consumer's fileName filter ignores the .tmp file and only ever sees the fully-written target.
@AfterEach used Files.delete() instead of Files.deleteIfExists(): if the test fails before the output file is created, cleanup also throws, masking the original failure with a misleading "cannot run due to an error cleaning up" message.
Fix: switch to Files.deleteIfExists().

6. QueueProducerQoSTest (ANYCAST/MULTICAST mismatch + zero-timeout mock assertion)
Two issues caused non-deterministic failures:

The expiry route consumed from sjms:topic:ExpiryQueue but Artemis routes expired messages to an ANYCAST address. A JMS Topic consumer creates a MULTICAST subscription, which does not receive ANYCAST messages. Whether the test passed depended on which subscriber created the ExpiryQueue address first. Fixed by switching to sjms:queue:ExpiryQueue.
Both test methods called the static MockEndpoint.assertIsSatisfied(context) which delegates to each mock's own resultWaitTime (defaults to 0). With requestTimeout=500ms and timeToLive=1000ms, the expiry notification arrives after the assertion was already evaluated. Fixed by calling mockExpiredAdvisory.assertIsSatisfied(10_000) with an explicit 10-second timeout.

Target

I checked that the commit is targeting the correct branch (Camel 4 uses the main branch)

Tracking

If this is a large change, bug fix, or code improvement, I checked there is a JIRA issue filed for the change (usually before you start working on it).

Apache Camel coding standards and style

I checked that each commit in the pull request has a meaningful subject line and body.
I have run mvn clean install -DskipTests locally from root folder and I have committed all auto-generated changes.

Claude Code on behalf of Adriano Machado

…ainer on aarch64

…rrent puts callEvictionIfNeeded() runs after OperationContext releases its read lock, so the cache can still be oversized when the CountDownLatch fires. Replace immediate post-latch assertions with Awaitility.await().untilAsserted() to give in-flight evictions time to complete before checking size and counter. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

With continued(true) and bridgeErrorHandler=true, the bridge creates a synthetic exchange, the onException route executes, and the exchange then continues through the main route body (including mock:result). The test incorrectly expected mock:result to receive 0 messages. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

github-actions · 2026-06-11T05:58:07Z

🌟 Thank you for your contribution to the Apache Camel project! 🌟
🤖 CI automation will test this PR automatically.

🐫 Apache Camel Committers, please review the following items:

First-time contributors require MANUAL approval for the GitHub Actions to run
You can use the command /component-test (camel-)component-name1 (camel-)component-name2.. to request a test from the test bot although they are normally detected and executed by CI.
You can label PRs using skip-tests and test-dependents to fine-tune the checks executed by this PR.
Build and test logs are available in the summary page. Only Apache Camel committers have access to the summary.

⚠️ Be careful when sharing logs. Review their contents before sharing them publicly.

github-actions · 2026-06-11T06:15:08Z

🧪 CI tested the following changed modules:

components/camel-sjms
core/camel-core
test-infra/camel-test-infra-tensorflow-serving

⚠️ Some tests are disabled on GitHub Actions (@DisabledIfSystemProperty(named = "ci.env.name")) and require manual verification:

core/camel-core: 2 test(s) disabled on GitHub Actions

All tested modules (13 modules)

Camel :: AI :: TensorFlow Serving
Camel :: JBang :: MCP
Camel :: JBang :: Plugin :: Route Parser
Camel :: JBang :: Plugin :: TUI
Camel :: JBang :: Plugin :: Validate
Camel :: Launcher :: Container
Camel :: MLLP
Camel :: Simple JMS
Camel :: Simple JMS2
Camel :: Test Infra :: All test services
Camel :: Test Infra :: TensorFlow Serving
Camel :: YAML DSL :: Validator
Camel :: YAML DSL :: Validator Maven Plugin

⚙️ View full build and test results

davsclaus

Review: CAMEL-21438 — Fix flaky tests and missing aarch64 image

CI is green. The test reliability fixes are well-motivated and follow project conventions. A few notes below.

Findings

1. PR description is incomplete (Minor)

The description mentions 3 changes, but the diff modifies 7 files. Four additional fixes are not covered:

FileProducerCharsetUTFtoISOConvertBodyToTest — atomic file write + deleteIfExists cleanup
FileSortByExpressionTest — move expectations before routes, remove hardcoded 1s timeout
FileSortByIgnoreCaseExpressionTest — remove redundant context.start(), fix initialDelay/delay, typo fix
FileSortByNestedExpressionTest — remove redundant context.start(), move expectations

Please update the PR description to cover all changes.

2. DefaultConsumerBridgeErrorHandlerContinuedTest — assertion reversal

This reverses the intent of the original CAMEL-22907 test (fe9abb35c3b7), which explicitly asserted mock:result should receive 0 messages. The change to expectedMinimumMessageCount(1) is consistent with how continued(true) should work — after onException processing, the exchange continues through the route to mock:result. The original expectedMessageCount(0) was a timing-dependent assertion that passed only when the mock check completed before the continued exchange arrived.

This looks correct to me, but since it reverses an intentional assertion from CAMEL-22907, it would be good to get confirmation from @gnodet that this interpretation is right.

3. bitnamilegacy image

The mirror.gcr.io/bitnamilegacy/tensorflow-serving:2.19.1 image uses a "legacy" namespace. Worth checking that it has ongoing maintenance for long-term reliability.

Verified (no issues)

SimpleLRUCacheTest — Awaitility usage is correct; eviction callbacks fire asynchronously. Follows the project's "no Thread.sleep()" convention.
File sort tests — Moving mock expectations before context.addRoutes() and removing redundant context.start() (context is auto-started by ContextTestSupport.setUp()) is the right pattern.
FileProducerCharsetUTFtoISOConvertBodyToTest — Atomic rename prevents file consumer from seeing a half-written file. Good defensive fix.
TensorFlow aarch64 image — Switching from Docker Hub to mirror.gcr.io follows project conventions.

Note: This review covers project conventions, test patterns, and behavioral correctness. It does not replace specialized tools such as CodeRabbit or SonarCloud.

This review was generated by an AI agent and may contain inaccuracies. Please verify all suggestions before applying.

Claude Code on behalf of Claus Ibsen

davsclaus · 2026-06-11T06:22:00Z

-        getMockEndpoint("mock:result").expectedMessageCount(0);
+        // With continued(true), the exchange continues through the main route after error handling,
+        // so mock:result should receive messages.
+        getMockEndpoint("mock:result").expectedMinimumMessageCount(1);


This reverses the original CAMEL-22907 assertion which expected 0 messages here, with the rationale that "the consumer throws before creating a valid exchange". With continued(true), the bridge error handler creates an exchange, onException processes it, and continued(true) allows it to proceed through the route to mock:result. The original expectedMessageCount(0) was a race condition — it passed only when the mock assertion completed before the continued exchange arrived.

The new assertion looks correct, but since it contradicts the original test's explicit intent, confirmation from the CAMEL-22907 author would be valuable.

@gnodet, you added this test in CAMEL-22907 with expectedMessageCount(0) and the comment "since the consumer throws before creating a valid exchange, mock:result won't receive messages".

We found it flaky: after the CAMEL-22907 fix, continued(true) causes the exchange to continue through the main route after onException handling, so mock:result does receive messages. The assertion was changed to expectedMinimumMessageCount(1).

Could you confirm whether this matches the intended post-fix behavior, or whether mock:result was expected to remain at zero for a reason we may have missed?

Claude Code on behalf of Adriano Machado

davsclaus · 2026-06-11T06:22:00Z

 ## ---------------------------------------------------------------------------
 tensorflow.serving.container=mirror.gcr.io/tensorflow/serving:2.19.1
-tensorflow.serving.container.aarch64=bitnami/tensorflow-serving:2.18.0
+tensorflow.serving.container.aarch64=mirror.gcr.io/bitnamilegacy/tensorflow-serving:2.19.1


Minor: the bitnamilegacy namespace suggests this image may be deprecated upstream. Worth verifying it has ongoing maintenance so this doesn't need another fix soon.

… addRoutes MockEndpoint.expectedBodiesReceived() must be called before routes start delivering messages. With initialDelay=0 the file consumer fires on a background thread as soon as addRoutes() returns; if messages arrive before expectedBodiesReceived() is called, expectedBodyValues is still null so performAssertions() never populates actualBodyValues, causing the order assertion to compare each expected body against null. Fix: move getMockEndpoint() + expectedBodiesReceived() before addRoutes() in all three FileSortBy test classes. Also remove the no-op context.start() calls that widened the race window. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

Two remaining issues after two prior fixes: 1. Race between @beforeeach and the file consumer: writeTestData() runs after setUp() has already started the context and route. With initialDelay=0 the consumer polls every 10ms; a poll landing between Files.newOutputStream() (creates 0-byte file) and fos.close() picks up an empty file and produces 0-byte output, causing the size > 0 assertion to never be satisfied. Fix: write to a .tmp file first, then atomically rename (POSIX rename syscall). The consumer's fileName filter ignores the .tmp file and only ever sees the fully-written target. 2. @AfterEach used Files.delete() instead of Files.deleteIfExists(): if the test fails before the output file is created, cleanup also throws, masking the original failure with "cannot run due to an error cleaning up". Fix: switch to Files.deleteIfExists(). Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

ammachado · 2026-06-11T14:56:50Z

The mirror.gcr.io/bitnamilegacy/tensorflow-serving:2.19.1 image uses a "legacy" namespace. Worth checking that it has ongoing maintenance for long-term reliability.

Thanks for flagging this. The bitnamilegacy namespace is indeed deprecated: Bitnami restructured their container catalog in August 2025 and moved all versioned/pinned tags to bitnamilegacy with no further updates. The concern is valid.

However, there is currently no well-maintained aarch64 image for TensorFlow Serving from any trusted registry:

The official tensorflow/serving image is amd64-only (long-standing open issue, no arm64 support planned).
emacski/tensorflow-serving, the main community aarch64 build, is last published at 2.6.0 and is not actively maintained.
AWS Deep Learning Containers publish a 2.19 aarch64 image, but it is SageMaker-specific and not suitable here.

bitnamilegacy/tensorflow-serving:2.19.1 is frozen but functional. It is the only option that matches our required version on aarch64. We can revisit this when either the official TF Serving project adds aarch64 support or a community project catches up to 2.19+.

The remaining alternative is to apply the skipITs.aarch64 fix: add <skipITs.aarch64>true</skipITs.aarch64> to camel-tensorflow-serving/pom.xml and remove the bitnamilegacy line from container.properties. Please advice.

Claude Code on behalf of Adriano Machado

…cit mock timeout Two issues caused non-deterministic failures: 1. The expiry route consumed from `sjms:topic:ExpiryQueue` but Artemis routes expired messages to an ANYCAST address. A JMS Topic consumer creates a MULTICAST subscription, which does not receive ANYCAST messages. Whether the test passed depended on which subscriber created the ExpiryQueue address first. Fixed by switching to `sjms:queue:ExpiryQueue`. 2. Both test methods called the static `MockEndpoint.assertIsSatisfied(context)` which delegates to each mock's own `resultWaitTime` (defaults to 0). With `requestTimeout=500ms` and `timeToLive=1000ms`, the expiry notification arrives after the assertion was already evaluated. Fixed by calling `mockExpiredAdvisory.assertIsSatisfied(10_000)` with an explicit 10-second timeout on the instance directly. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

davsclaus · 2026-06-12T05:35:53Z

The mirror.gcr.io/bitnamilegacy/tensorflow-serving:2.19.1 image uses a "legacy" namespace. Worth checking that it has ongoing maintenance for long-term reliability.

Thanks for flagging this. The bitnamilegacy namespace is indeed deprecated: Bitnami restructured their container catalog in August 2025 and moved all versioned/pinned tags to bitnamilegacy with no further updates. The concern is valid.

However, there is currently no well-maintained aarch64 image for TensorFlow Serving from any trusted registry:

The official tensorflow/serving image is amd64-only (long-standing open issue, no arm64 support planned).

emacski/tensorflow-serving, the main community aarch64 build, is last published at 2.6.0 and is not actively maintained.

AWS Deep Learning Containers publish a 2.19 aarch64 image, but it is SageMaker-specific and not suitable here.

bitnamilegacy/tensorflow-serving:2.19.1 is frozen but functional. It is the only option that matches our required version on aarch64. We can revisit this when either the official TF Serving project adds aarch64 support or a community project catches up to 2.19+.

The remaining alternative is to apply the skipITs.aarch64 fix: add <skipITs.aarch64>true</skipITs.aarch64> to camel-tensorflow-serving/pom.xml and remove the bitnamilegacy line from container.properties. Please advice.

Claude Code on behalf of Adriano Machado

Yeah though aarch64 is for mac developers to run on the laptop and not for production users. So it may be okay to allow them to test and validate tensorflow. Can we add a comment that about so we will not forget.

ammachado and others added 3 commits June 10, 2026 23:46

CAMEL-21438: Fix missing image for Tensorflow Serving test-infra cont…

f2ba63a

…ainer on aarch64

ammachado marked this pull request as ready for review June 11, 2026 03:49

github-actions Bot added core test-infra labels Jun 11, 2026

davsclaus reviewed Jun 11, 2026

View reviewed changes

ammachado and others added 2 commits June 11, 2026 10:25

ammachado force-pushed the CAMEL-21438 branch from 9a4fce4 to b03df85 Compare June 11, 2026 14:25

ammachado changed the title ~~CAMEL-21438: Fix flaky tests and missing aarch64 image for TensorFlow Serving~~ CAMEL-21438: Fix five flaky tests and missing aarch64 image for TensorFlow Serving Jun 11, 2026

ammachado changed the title ~~CAMEL-21438: Fix five flaky tests and missing aarch64 image for TensorFlow Serving~~ CAMEL-21438: Fix six flaky tests and missing aarch64 image for TensorFlow Serving Jun 12, 2026

github-actions Bot added components components-jms labels Jun 12, 2026

davsclaus approved these changes Jun 12, 2026

View reviewed changes

davsclaus merged commit 859fcf1 into apache:main Jun 12, 2026
3 of 5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CAMEL-21438: Fix six flaky tests and missing aarch64 image for TensorFlow Serving#23927

CAMEL-21438: Fix six flaky tests and missing aarch64 image for TensorFlow Serving#23927
davsclaus merged 6 commits into
apache:mainfrom
ammachado:CAMEL-21438

ammachado commented Jun 11, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 11, 2026

Uh oh!

github-actions Bot commented Jun 11, 2026 •

edited

Loading

Uh oh!

davsclaus left a comment

Uh oh!

davsclaus Jun 11, 2026

Uh oh!

ammachado Jun 11, 2026

Uh oh!

davsclaus Jun 11, 2026

Uh oh!

ammachado commented Jun 11, 2026

Uh oh!

davsclaus commented Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ammachado commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Target

Tracking

Apache Camel coding standards and style

Uh oh!

github-actions Bot commented Jun 11, 2026

Uh oh!

github-actions Bot commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

davsclaus left a comment

Choose a reason for hiding this comment

Review: CAMEL-21438 — Fix flaky tests and missing aarch64 image

Findings

Verified (no issues)

Uh oh!

davsclaus Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

ammachado Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

davsclaus Jun 11, 2026

Choose a reason for hiding this comment

Uh oh!

ammachado commented Jun 11, 2026

Uh oh!

davsclaus commented Jun 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ammachado commented Jun 11, 2026 •

edited

Loading

github-actions Bot commented Jun 11, 2026 •

edited

Loading