Skip to content

feat: Implement data quality field contraint protos#366

Open
omirandadev wants to merge 4 commits into
masterfrom
feat/field-constraints-data-quality
Open

feat: Implement data quality field contraint protos#366
omirandadev wants to merge 4 commits into
masterfrom
feat/field-constraints-data-quality

Conversation

@omirandadev

Copy link
Copy Markdown
Collaborator

What this PR does / why we need it:

Adds typed FieldConstraints and Imputation proto messages plus matching Python/Pydantic surfaces so customers can declare per-column data quality rules and null-imputation strategies on a SparkSource. Constraints are stored on the source's proto and round-trip through the registry. The runtime that enforces them lives downstream in implementing feature engineering jobs that handle them.

Which issue(s) this PR fixes:

Misc

if len(proto.allowed_values) > 0:
kwargs["allowed_values"] = list(proto.allowed_values)
# `regex` is a plain string in proto3; empty == unset.
if proto.regex != "":

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Saw the empty == unset note here, makes sense for proto3. Only nit: the validator still accepts regex="" and to_proto writes it, so you can create a value that silently reads back as None. Could reject empty regex in _regex_compiles to keep the two ends consistent.

allowed_values: Optional[List[str]] = None
regex: Optional[str] = None
unique: Optional[bool] = None
custom: Optional[Dict[str, str]] = None

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every other field validates at definition time (regex compiles, ranges in [0,1], etc.), but custom predicates are stored as-is, so a bad/empty predicate only fails at the next FE run. Could we reject empty predicate strings here? Maybe a non-empty check would catch the common typo early.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants