Skip to content

WIP: Add Application Load Balancer Controller Manager#879

Open
kamilprzybyl wants to merge 74 commits into
mainfrom
feat/kp/add-alb-ingress-controller
Open

WIP: Add Application Load Balancer Controller Manager#879
kamilprzybyl wants to merge 74 commits into
mainfrom
feat/kp/add-alb-ingress-controller

Conversation

@kamilprzybyl
Copy link
Copy Markdown

How to categorize this PR?

What this PR does / why we need it:

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Breaking changes:

@ske-prow
Copy link
Copy Markdown

ske-prow Bot commented Mar 17, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign timebertt for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ske-prow ske-prow Bot added do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 17, 2026
@kamilprzybyl kamilprzybyl changed the title Add Application Load Balancer Controller Manager WIP: Add Application Load Balancer Controller Manager Mar 17, 2026
@ske-prow ske-prow Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 17, 2026
@kamilprzybyl kamilprzybyl force-pushed the feat/kp/add-alb-ingress-controller branch from 7ecc416 to 86bdf1d Compare March 17, 2026 09:37
// generate self-signed certificates for the metrics server. While convenient for development and testing,
// this setup is not recommended for production.
//
// TODO(user): If you enable certManager, uncomment the following lines:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be good to create separate examples for that

Comment thread cmd/application-load-balancer-controller-manager/main.go Outdated
Comment thread pkg/alb/ingress/alb_spec.go Outdated
Comment thread pkg/alb/ingress/alb_spec.go Outdated
Comment thread pkg/alb/ingress/alb_spec.go Outdated
sort.SliceStable(ruleMetadataList, func(i, j int) bool {
a, b := ruleMetadataList[i], ruleMetadataList[j]
// 1. Host name (lexicographically)
if a.host != b.host {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this work with * host matcher?

Comment thread pkg/alb/ingress/alb_spec.go Outdated
}

// cleanupCerts deletes the certificates from the Certificates API that are no longer associated with any Ingress in the IngressClass
func (r *IngressClassReconciler) cleanupCerts(ctx context.Context, ingressClass *networkingv1.IngressClass, ingresses []*networkingv1.Ingress) error {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really want to delete certificates? That could be very dangerous as users might just not associate the certificate anymore and want to keep it.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each Ingress TLS secret reference is mapped to a unique ALB Certificate resource. Even if multiple Ingresses reference the same k8s Secret, they will each trigger the creation of a separate, independent certificate on the ALB. That means, deleting one Ingress (and its associated ALB Certificate) does not impact the certificates still being used by other Ingresses.

Comment thread pkg/alb/ingress/alb_spec.go Outdated

// isCertValid checks if the certificate chain is complete. It is used for checking if
// the cert-manager's ACME challenge is completed, or if it's sill ongoing.
func isCertValid(secret *corev1.Secret) (bool, error) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't that already handled by the certificate api?

Copy link
Copy Markdown
Author

@kamilprzybyl kamilprzybyl Mar 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Cert API only checks if the public and private keys are matching + if the certificate is a valid X.509 format but it does not reject incomplete chains.

Renamed the function from isCertValid to isCertReady and clarified the function in the comment above.

Comment thread samples/ingress/issuer.yaml Outdated
@ske-prow ske-prow Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 19, 2026
Comment thread pkg/alb/ingress/ingressclass_controller.go Outdated
@kamilprzybyl kamilprzybyl force-pushed the feat/kp/add-alb-ingress-controller branch 2 times, most recently from d8de28a to f903b25 Compare March 20, 2026 19:05
@kamilprzybyl
Copy link
Copy Markdown
Author

We currently only delete certificates when the IngressClass is deleted or no Ingress references the IngressClass.
We need to add logic to clean up certificates that are no longer referenced by active Ingresses.

@kamilprzybyl
Copy link
Copy Markdown
Author

We currently only delete certificates when the IngressClass is deleted or no Ingress references the IngressClass. We need to add logic to clean up certificates that are no longer referenced by active Ingresses.

@kamilprzybyl kamilprzybyl reopened this Mar 25, 2026
@dergeberl dergeberl mentioned this pull request Mar 25, 2026
@dergeberl dergeberl force-pushed the feat/kp/add-alb-ingress-controller branch from 3ad18ef to 9d6f767 Compare March 26, 2026 10:04
@ske-prow ske-prow Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 26, 2026
Comment thread pkg/alb/ingress/alb_spec.go Outdated
// It uses the trusted CAs from the operating system for validation.
tlsBridgingTrustedCaAnnotation = "alb.stackit.cloud/tls-bridging-trusted-ca"
// If set, the application load balancer enables TLS bridging with a custom CA provided as value.
tlsBridgingCustomCaAnnotation = "alb.stackit.cloud/tls-bridging-custom-ca"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably use as value a reference to a TLS secret.

Comment thread pkg/alb/ingress/alb_spec.go Outdated
// It merges and sorts all routing rules across the ingresses based on host, priority, path specificity, path type, and ingress origin.
// The resulting ALB payload includes targets derived from cluster nodes, target pools per backend service, HTTP(S) listeners,
// and optional TLS certificate bindings. This spec is later used to create or update the actual ALB instance.
func (r *IngressClassReconciler) albSpecFromIngress( //nolint:funlen,gocyclo // We go through a lot of fields. Not much complexity.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is indeed complex, should be split up into multiple functions.

Comment thread pkg/alb/ingress/alb_spec.go Outdated
return 0, fmt.Errorf("service not found: %s", path.Backend.Service.Name)
}

if path.Backend.Service.Port.Name != "" {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be refactored to have less branching, 4 indents is too much imo.

Comment thread docs/albcm.md Outdated
Comment thread docs/albcm.md Outdated
Comment thread docs/deployment.md
## Overview

The STACKIT Cloud Provider includes both the Cloud Controller Manager (CCM) for managing cloud resources and the CSI driver for persistent storage. This deployment provides a unified solution for cloud integration and storage provisioning.
The STACKIT Cloud Provider includes the Cloud Controller Manager (CCM) for managing cloud resources, the CSI driver for persistent storage and the Application Load Balancer Controller Manager (ALBCM) for managing STACKIT Application Load Balancer (ALB) via Ingress Resources.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should have different deployment docs for each binary. CCM and ALBCM should be different md files. It's easier to comprehend.

Comment thread cmd/application-load-balancer-controller-manager/main.go Outdated
@ske-prow ske-prow Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 1, 2026
@kamilprzybyl kamilprzybyl force-pushed the feat/kp/add-alb-ingress-controller branch from ccab903 to cde054d Compare June 2, 2026 17:50
Comment thread pkg/alb/ingress/add.go
}

targets := []albsdk.Target{}
for _, node := range nodes.Items {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also filter out nodes that are to be removed, similar to the CCM. This also requires extending the watcher predicate.

Copy link
Copy Markdown
Member

@hown3d hown3d left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some more general code concerns.

var evtErr *errorEvent

if errors.As(errItem, &evtErr) {
log.Info(evtErr.description, "typ", evtErr.typ, "ingressRef", evtErr.ingressRef)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of filtering here for the error type, include the "typ" and "ingressRef" in the Error method of the errorEvent type. This way you can just always use errItem.Error().

ctx context.Context,
class *networkingv1.IngressClass,
ingresses []networkingv1.Ingress,
) (payload *albsdk.CreateLoadBalancerPayload, activeCertIDs map[string]string, validationErrors []error, err error) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bad practice to have separate error variables return and also having slices of errors. Please use one error type and use errors.Join to join your errors.
In the calling function you can use unwrap to get a list of all the errors.

Comment thread pkg/alb/ingress/events.go
return e.description
}

func (r *IngressClassReconciler) SendEvents(class *networkingv1.IngressClass, validationErrors []error) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be a method of the errorEvent type.

func (e *errorEvent) RecordEvent(class *networkingv1.IngressClass, recorder record.EventRecorder) {
	if evtErr.ingressRef.Name == "" {
		return
	}
	recorder.Eventf(class, corev1.EventTypeWarning, "ALB", "Error in %s %s in Namespace %s: %s", evtErr.ingressRef.Kind, evtErr.ingressRef.Name, evtErr.ingressRef.Namespace, evtErr.description)
	recorder.Event(&evtErr.ingressRef, corev1.EventTypeWarning, "ALB", evtErr.description)
}

You currently parse the errors into the errorEvent to later call this SendEvents function to parse the errors as types again. This is not neccessary.

Comment thread pkg/alb/ingress/events.go
type errorEvent struct {
ingressRef corev1.ObjectReference
description string
typ string
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is always set to "error". What purpose does it bring?
Maybe a field.Path could be useful here for the user to directly see from what field in his ingress manifest the error stems.

continue
}
if dstTargetPool.skipCertificateValidation != srcTargetPool.skipCertificateValidation {
mergeErrors = append(mergeErrors, &errorEvent{
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use errors.Join and don't return an error slice

Comment thread docs/albcm.md
| `alb.stackit.cloud/cookie-persistence-name` | IngressClass, Ingress | Optional | Sets the name for session cookie persistence. |
| `alb.stackit.cloud/cookie-persistence-ttl-seconds`| IngressClass, Ingress | Optional | Sets the TTL (in seconds) for cookie persistence. |
| `alb.stackit.cloud/priority` | Ingress | Optional | Defines the evaluation priority of the Ingress. |
| `alb.stackit.cloud/traget-pool-tls-enabled` | IngressClass, Ingress, Service | Optional | Enables TLS bridging using OS trusted CAs. |
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typos: target

Comment thread docs/albcm.md
Currently, the Ingress API is supported.
Support for Gateway API is planned.

##### Environment Variables
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a section for "Get Started" that also links to the example deployment.yaml that tells a user how to actually run this alb controller in their cluster (if not managed by SKE).
Having a headline and then 5th headline with environment variable is confusing.

Comment thread pkg/alb/ingress/add.go
For(&networkingv1.IngressClass{}, builder.WithPredicates(ingressClassPredicate())).
Watches(&corev1.Node{}, nodeEventHandler(r.Client), builder.WithPredicates(nodePredicate())).
Watches(&networkingv1.Ingress{}, ingressEventHandler(r.Client)).
Watches(&corev1.Secret{}, secretEventHandler(r.Client)).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please make sure to disable the cache for secrets, as this would otherwise get quite huge pretty fast if we watch all secrets.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, the secret contains a server side field selector for "type". Therefore we could just watch for secrets of type TLS to reduce API server churn

)

func init() {
utilruntime.Must(clientgoscheme.AddToScheme(scheme))
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to use a custom scheme if we just use kubernetes core resources

Comment thread pkg/alb/ingress/update.go
return len(c.CertificateConfig.CertificateIds) != len(d.CertificateConfig.CertificateIds)
}

func targetPoolsChanged(current, desired []albsdk.TargetPool) bool {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please implement these changed functions with slices.EqualFunc

fischerman and others added 8 commits June 8, 2026 16:01
Utilized getSortedIngressesForIngressClass for getAlbSpecForIngressClass
… alb specs via getAlbSpecForResources at last step
fixed unit tests and added more for ingress controller
unit tests for alb spec are added
ensured that ephemeral address option defaults to false when internal-alb annotation is defined.
@ske-prow
Copy link
Copy Markdown

ske-prow Bot commented Jun 8, 2026

@kamilprzybyl: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cloud-provider-stackit-verify b0bc504 link true /test pull-cloud-provider-stackit-verify

Full PR test history. Your PR dashboard. Command help for this repository.
Please help us cut down on flakes by linking this test failure to an open flake report or filing a new flake report if you can't find an existing one. Also see the gardener testing guideline for how to avoid and hunt flakes.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do-not-merge/needs-kind Indicates a PR lacks a `kind/foo` label and requires one. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants