pipeline-manager: pipeline tags and distinguish client metadata#6229
Conversation
|
so this is not per user, it's per pipeline manager. |
|
I'm a bit confused why we went with folders when users were asking for tags and in the issue everyone seemed to like tags? |
gz
left a comment
There was a problem hiding this comment.
lets just build the version people asked for (tags) from the beginning no need to confuse users with different models
| @@ -0,0 +1,6 @@ | |||
| -- Add metadata field to the pipeline table. | |||
| -- This is arbitrary, optional text provided by the user. | |||
| ALTER TABLE pipeline ADD COLUMN metadata TEXT NOT NULL DEFAULT ''; | |||
There was a problem hiding this comment.
I don't think it's good to have a catch-all JSON element this. We're already strictly structured, we dont have to switch to the dark side of using unstructured data on top of a relational database.
| pub struct PipelineInfo { | ||
| pub id: PipelineId, | ||
| pub name: String, | ||
| /// Deprecated: use `metadata` instead. |
There was a problem hiding this comment.
I don't think we need to mark it deprecated; nothing has changed we weren't using it before we dont need to use it now?
There was a problem hiding this comment.
I want to discuss this more with Simon once he's back; I think it's benefitial to fold the "description" column into the "metadata" column model I propose here
| && program_config.is_none()) | ||
| ); | ||
|
|
||
| // Metadata-only fast path. `metadata` is a free-form client-side annotation |
There was a problem hiding this comment.
I dont understand how it can be free form and at the same time serve as a web-ui folder structure.
If we use if for that IMO it deserves an well structured field that is reserved for use by webconsole so we dont have arbitrary clients overwriting/using it differently.
The model I am proposing is not "unstructured", it just doesn't bind relational structure to the shape structure for a family of pipeline information fields with similar semantics - they are not used by the backend, not relevant to backend or pipeline deployment lifecycle, only used by the clients - "description", "tags", "path". JSON operations on a small dataset (for the number of pipelines) are fine in Postgres. Are you saying you would prefer a e.g. "tags" column in PipelineDescriptor, with the same semantics as the "metadata" field I added? (can be changed without triggering pipeline version updates, when the pipeline is running)? |
Two points I want to make here:
|
|
The customer did not ask for tags specifically, they listed tags and folders as ways to solve their UX problem. I need to discuss tags with Anna because the feature is not trivial - good location to show and edit tags, and e.g. version can be displayed as another tag, which could improve the design and UX |
|
Whether tags or folders, the feature is likely to be used by tooling around Feldera some of our customers have built, not just web-console |
They see the change immediately - within the polling period of web-console, which is 2 seconds now |
|
What I would like to do is merge this PR with folders as a simpler solution, and then open a separate PR for user-defined tags and search-by-tag. Tags and folders are orthogonal, both are optional, both are useful. |
snkas
left a comment
There was a problem hiding this comment.
My design suggestion for the backend is:
- Adding a pipeline field
tags, of typeVec<String> - Adding a pipeline table field
tagswith typeVARCHAR[]orTEXT[] - The
tagsfield can be edited like any other field usingPATCH /v0/pipelines/<name> - Only tricky part is allowing the tags to be editable while the pipeline is running.
I can get to it sometime mid next week.
Reasoning for my design:
- In future, the backend can potentially filter by tag without having to do JSON parsing
- In future, the tags can still be migrated as the type is known to the database, without having to do JSON operations
- Having a separate
pipeline_tagtable is likely too much as they are always part of the pipeline - Deprecating
descriptionshould be a separate PR - It seems unnecessary to have
metadataas nested, as what is "metadata" is not as well defined, and at least for now besides tags I'm not sure how much more of these fields we would add
The above does assume that we don't support for instance defining colors of tags, as that would require them to become independently management objects with their own API endpoint.
|
Regarding tags vs. nested directory structure: it seems to be most simple to just have tags, as it allows a pipeline to belong to multiple groups, and it is a concept users are already familiar with (e.g., GitHub). Directory structure is also a big departure from the current list. |
|
NACK from me on folders too. Folders are used in environments where there is a natural filesystem hierarchy for projects and a mix of content types (e.g. workspace, catalogs, schemas and separate files for sql like in dbt and other warehouse cloud products). It's not a good fit here. Tags are the most appropriate here. There's many other things in the backlog we should get to first, but let's go through UI design here as well before implementing anything. |
|
I'll remove the folders and make this PR pipeline-manager only. Tags will be waiting for the UI design |
mythical-fred
left a comment
There was a problem hiding this comment.
The backend (pipeline-manager) changes are well-structured and thoroughly tested — the PipelineFieldUpdates struct with exhaustive destructuring is a nice pattern, and the metadata-only fast-path that skips version/refresh_version bumps is well-reasoned.
However, the web-console side adds ~1,060 lines of new behavioral code (DraggableTreeView, PipelineTreeView, SidebarPipelineTreeView, plus the Table.svelte rewrite) with zero tests. This is a hard-rule violation — behavior changes without tests get REQUEST_CHANGES, no exceptions. See inline comments for specifics.
Gerd's concerns about the metadata field design are still open; I won't duplicate them here.
| @@ -0,0 +1,588 @@ | |||
| <script lang="ts" generics="T extends { name: string }"> | |||
There was a problem hiding this comment.
This is 588 lines of new interactive behavior — drag-and-drop, folder creation, folder renaming, expand/collapse state, multi-selection propagation. None of it has tests.
Good first targets for Vitest + @testing-library/svelte:
- Tree-building logic (
ensureFolder,sortNode) — extract into a pure function and unit-test the tree shape for various folder-path inputs. folderCheckState— pure logic over a set; easy to test without DOM.is_metadata_only-style classification on the JS side:parseMetadata,buildMetadata,getFolderPath— these are already pure functions, just not tested.- Component tests: render the tree view with a few items, simulate a drag from one row onto another, assert the
onCreateFolderForcallback fires with the correct arguments.
The recommended stack is Vitest + @testing-library/svelte — runs in Node, no browser, no flake.
| if (!metadata) { | ||
| return {} | ||
| } | ||
| try { |
There was a problem hiding this comment.
DRY: parseMetadata is duplicated here and in Table.svelte (lines ~57-64). Extract it into a shared utility (e.g. $lib/functions/pipelines/metadata.ts) — along with getFolderPath and buildMetadata — so the parsing logic lives in one place and can be unit-tested once.
a489b95 to
5222729
Compare
|
I agree that "description" API field should not be deprecated in the PR. More so, "description" and "tags" fields should be top-level in the SelectedPipelineInfo API struct and others. I have updated the PR to reflect that. |
|
I argue against having separate "description" and "tags" columns in the Pipeline table.
The benefits of a single |
snkas
left a comment
There was a problem hiding this comment.
I left a few comments -- let's discuss on how best to get this PR in the shape to get merged.
| // "material change happened" counter; client-metadata edits are not | ||
| // material. | ||
| // - No `pipeline_monitor_event` is emitted. Such an event triggers a | ||
| // Postgres `NOTIFY` that wakes the per-pipeline runner. The runner has |
There was a problem hiding this comment.
There is no relation between pipeline_monitor_event and NOTIFY.
| // This will also return an error if the pipeline does not exist. | ||
| let current = get_pipeline(txn, tenant_id, original_name).await?; | ||
|
|
||
| // Apply the patch to a copy of the current client metadata so we can both |
There was a problem hiding this comment.
This is a departure from the current way that patches are handled, only at the top level.
| /// | ||
| /// Persisted as a single JSON object in the `client_metadata` text column | ||
| /// (renamed from `metadata` in V35). The schema lives in code, not the | ||
| /// database — adding a new annotation field only requires extending this |
There was a problem hiding this comment.
Why are there so many em-dashes in all comments (which is not ASCII)?
| /// not break when newer writers add fields. Empty / missing values | ||
| /// (description == "", empty tags) are normalized to `None` so that the | ||
| /// serialized form stays minimal (`{}` for fully empty metadata, which we | ||
| /// further collapse to `""` in storage). |
There was a problem hiding this comment.
Why? Can you add in the commit description how backward compatibility is done?
|
|
||
| /// Self-descriptive tags for grouping / filtering. | ||
| #[serde(default, skip_serializing_if = "Option::is_none")] | ||
| pub tags: Option<Vec<String>>, |
There was a problem hiding this comment.
Is there any enforcement on tags?
| if self.is_empty() { | ||
| String::new() | ||
| } else { | ||
| // Serializing a fixed-shape struct of strings cannot fail. |
There was a problem hiding this comment.
Fixed-shape struct? What is that?
| } | ||
| } | ||
|
|
||
| /// True when no annotations are set. |
| pub fn apply_patch(&mut self, patch: ClientMetadata) { | ||
| let ClientMetadata { description, tags } = patch; | ||
| if let Some(d) = description { | ||
| self.description = if d.is_empty() { None } else { Some(d) }; |
There was a problem hiding this comment.
Description is just an empty string, why is it interpreted as None now?
| self.description = if d.is_empty() { None } else { Some(d) }; | ||
| } | ||
| if let Some(t) = tags { | ||
| self.tags = if t.is_empty() { None } else { Some(t) }; |
There was a problem hiding this comment.
An empty array seems perfectly fine?
| } | ||
| }, | ||
| "PatchPipeline": { | ||
| "type": "object", |
There was a problem hiding this comment.
Why did the OpenAPI spec shift so much?
There was a problem hiding this comment.
It just did due to the new CLientMetadata
| /// Client-generated data stored alongside a pipeline. | ||
| /// | ||
| /// Persisted as a single JSON object in the `client_metadata` text column | ||
| /// (renamed from `metadata` in V35). The schema lives in code, not the |
|
Does it pass CI? |
I haven't run CI on it yet |
mythical-fred
left a comment
There was a problem hiding this comment.
Big improvement since my last pass — the untested ~1,060-line drag-and-drop tree view is gone, folders dropped, and the backend now ships a proper integration test (pipeline_client_metadata_update_while_running) that covers the metadata fast-path including the version/refresh_version/monitor-event invariants. The destructuring-based PipelineFieldUpdates pattern is nice. Downgrading my previous REQUEST_CHANGES to a COMMENT.
Not approving yet:
- @snkas has an active review pass with many open prose threads and a real
V35migration question; let that play out. - The metadata-shape design (
client_metadataJSON column vs flattags/descriptioncolumns) is still being debated between @gz, @snkas and the author. Not my call to settle.
One residual finding inline.
| @@ -273,6 +275,8 @@ const toPipeline = < | |||
| ) => ({ | |||
| name: pipeline.name, | |||
| description: pipeline.description ?? '', | |||
| tags: pipeline.tags ?? [], | |||
| path: pipeline.path ?? '', | |||
| runtimeConfig: pipeline.runtime_config, | |||
| programConfig: pipeline.program_config!, | |||
| programCode: pipeline.program_code ?? '', | |||
| @@ -293,7 +297,9 @@ const toExtendedPipeline = ({ | |||
| deploymentStatus: deployment_status, | |||
| deploymentStatusSince: pipeline.deployment_status_since, | |||
| programStatusSince: pipeline.program_status_since, | |||
| description: pipeline.description, | |||
| description: pipeline.description ?? '', | |||
| tags: pipeline.tags ?? [], | |||
| path: pipeline.path ?? '', | |||
| id: pipeline.id, | |||
| name: pipeline.name, | |||
| programCode: pipeline.program_code ?? '', | |||
| @@ -327,6 +333,8 @@ const toExtendedPipeline = ({ | |||
| const fromPipeline = <T extends Partial<Pipeline>>(pipeline: T) => ({ | |||
| name: pipeline?.name, | |||
| description: pipeline?.description, | |||
| tags: pipeline?.tags, | |||
| path: pipeline?.path, | |||
There was a problem hiding this comment.
Dead field: path is leftover from the removed tree-view feature. The API no longer carries it (no path field on PipelineInfo / PipelineSelectedInfo / PostPutPipeline / PatchPipeline after the refactor — only description and tags), so pipeline.path is always undefined here and in lines 279, 302, 337. Drop the four path: … lines. While here: tags is read into PipelineThumb/Pipeline and written back via fromPipeline, but nothing in this PR actually edits or displays tags — consider trimming the writeback path too until a follow-up PR adds the UI, otherwise it's dead code in waiting.
668c196 to
30739f6
Compare
mythical-fred
left a comment
There was a problem hiding this comment.
Blockers:
- Still no web-console tests for the new search/filter behavior in List.svelte (same as review 4300127304).
- Metadata storage shape still unsettled: tags are now present, but the client_metadata JSON column vs. first-class columns decision is still under debate (see review 4341939167 / gz comment). Please resolve before merge.
Also: commit message "Review fixes" is a dirty-history message; please squash/rename.
500ee3c to
7475e0e
Compare
04d805d to
26c7fc2
Compare
snkas
left a comment
There was a problem hiding this comment.
Looks good, just make sure downstream CI will also pass, and then ready to merge. Nice job!
mythical-fred
left a comment
There was a problem hiding this comment.
Rebase pickup re-APPROVE.
Force-push afcd9465 → 26c7fc28 is a clean rebase onto fresh main (7 ahead / 1 behind at the previous tip). The only PR-affected file that moved is docs.feldera.com/docs/changelog.md, where the new tags / description-no-version-bump entries get correctly threaded under Unreleased while the v0.311.0 / v0.309.0 / v0.307.0 headings that landed on main get inserted around them. Resolved cleanly. No behavior change to the PR itself.
Prior APPROVE @ afcd9465ca stands.
gz
left a comment
There was a problem hiding this comment.
it's a bit puzzling that this feature needs 2.2k lines of code.
I understand that much of it is tests, but one wonders if we can maybe fold some of these tests into existing ones (we dont always have to write new tests especially given that CI is already slow)
|
I'll see if some tests can be folded, but the new functionality is not trivial - skipping version bumps for certain fields - so the complexity is justified |
43b2b01 to
c4b67b8
Compare
c4b67b8 to
653dd2d
Compare
mythical-fred
left a comment
There was a problem hiding this comment.
Re-APPROVE on 653dd2d. The follow-up "Simplify pipeline tags tests" commit is a clean cleanup pass:
pipeline_parsing.rsgets rustdoc on the threePIPELINE_COLUMNS_*constants spelling out which descriptor flavor each one feeds (full / monitoring / event-info) and what gets dropped. Helpful for future readers.- The
split_columnstest now goes through a newcolumn_to_descriptor_fieldshelper that knowsclient_metadataexpands into{description, tags}at the descriptor layer. That's the right shape: the SQL column is one JSON blob, the wire descriptor is two fields, and the test was previously hand-waving around that. - Python:
test_tags_default_emptyfolded intotest_tags, andtest_pipeline_tags_invalid+test_pipeline_description_lengthmerged intotest_pipeline_metadata_validation. Less duplication, same coverage (invalid tag set on create, over-limit description on create, exactly-max description accepted, patch with invalid tag rejected).
Nothing else changed.
Add a `tags` field to pipelines: a list of free-form labels for organizing, grouping, and filtering pipelines. Together with the existing `description`, tags are now stored as client-side metadata in a single `client_metadata` JSON column (migration V34) rather than a dedicated column. Further annotation fields can be added by extending the Rust `ClientMetadata` struct alone, with no additional database migration. Editing `tags` or `description` no longer bumps the pipeline `version`, `program_version`, or `refresh_version`, and does not trigger recompilation: this metadata is client-generated and has no deployment semantics. As a result it can be patched at any deployment stage (running, paused, or stopped), whereas core fields remain editable only while stopped. Compatibility: - Migration V34 adds `client_metadata`, copies existing descriptions into it, and drops the old `description` column. Reads are lenient: an empty column, and even corrupted JSON, are read as empty metadata (and logged), so a single bad row never blocks reads. - Descriptions migrated from the old column are exempt from validation until next changed, so values predating the constraints stay readable and re-submittable. - PATCH patches each field independently (an omitted field is left untouched); an empty string or empty list is a real value that clears the field, not a request to leave it unchanged. POST/PUT replace resets omitted metadata to empty. - Validation: up to 50 tags, each up to 50 chars drawn from `[a-zA-Z0-9 ._/|\:=-]`; description up to 300 chars. Violations return 400. `tags` is included in POST/PUT/PATCH bodies and GET responses (both selectors); the OpenAPI spec is regenerated accordingly. User-facing: - Python: `PipelineBuilder(..., tags=[...])`, `pipeline.modify(tags=...)`, and a new `pipeline.tags()` getter; the REST client sends tags on create/replace/patch. - The fda CLI and the web-console pipeline list carry the new field through. Signed-off-by: Karakatiza666 <[email protected]> Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Signed-off-by: Karakatiza666 <[email protected]>
The manager reuses its own HTTPS certificate as the leaf certificate it hands to each pipeline, so marking it CA:TRUE made rustls reject the pipeline as a server (CaUsedAsEndEntity) on /pause. Exposed by the new test_pipeline_metadata_edits_at_deployment_stages, the first crud test to start and pause a pipeline. Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
653dd2d to
a7ca075
Compare
|
A new commit fixes HTTPS test failure caused by the new pipeline test. The new test The problem is in one of our CI jobs, "Make sure manager runs with HTTPS," which runs exactly this test file with the manager in HTTPS mode. That job generates a throwaway test certificate on the fly, and it was marking that certificate as an authority ( The fix is a one-line change in the CI job: generate the throwaway certificate as an ordinary server certificate (drop the |

pipeline-manager changes
This PR adds the
ClientMetadatastruct that is flattened on serialization and is not bound to the pipeline deployment state; it is to be used by clients to store certain metadata - likedescription, ortagsfor organizing the pipelines.client_metadatacan be updated when the pipeline is running, it does not triggerversionnorrefresh_versionincrements.web-console changes
Re-generate API bindings, forward the new
tagsfield to be accessed laterAdded search bar to the pipeline list in the sidebar when editing a pipeline
Python SDK
Added support for the new
tagsfieldTesting: manual