recipe-patterns

📁 jediv/dataiku-chat-control 📅 2 days ago
Install command
npx skills add https://github.com/jediv/dataiku-chat-control --skill recipe-patterns


Skill Documentation

Dataiku Recipe Patterns

Reference patterns for creating different recipe types via the Python API.

Recipe Type Decision Table

| Recipe Type | Use When | Key Method |
| --- | --- | --- |
| Prepare | Column transforms, filtering, formula columns, renaming, data cleaning | `project.new_recipe("prepare", ...)` |
| Join | Combining datasets on key columns (LEFT, INNER, RIGHT, OUTER) | `project.new_recipe("join", ...)` |
| Group | Aggregations: sum, count, avg, min, max, stddev, etc. | `project.new_recipe("grouping", ...)` |
| Sync | Copying data between connections (e.g., to a data warehouse) | `project.new_recipe("sync", ...)` |
| Python | Custom transformations not possible with visual recipes | `project.new_recipe("python", ...)` |

Universal Builder Pattern

Every recipe follows the same create-configure-run lifecycle:

# 1. Create via builder
builder = project.new_recipe("<type>", "<recipe_name>")
builder.with_input("<input_dataset>")
builder.with_output("<output_dataset>")
recipe = builder.create()

# 2. Configure settings
settings = recipe.get_settings()
# ... recipe-specific configuration ...
settings.save()

# 3. Apply schema updates (visual recipes only)
schema_updates = recipe.compute_schema_updates()
if schema_updates.any_action_required():
    schema_updates.apply()

# 4. Run and check
job = recipe.run(no_fail=True)
state = job.get_status()["baseStatus"]["state"]  # "DONE" or "FAILED"
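Since every run ends with the same status check, it can help to factor that check into a tiny helper. This is a local convenience, not part of the Dataiku API; it only encodes the `baseStatus.state` path shown above:

```python
def job_succeeded(status: dict) -> bool:
    """Return True when a job status dict (from job.get_status()) reports DONE."""
    return status.get("baseStatus", {}).get("state") == "DONE"

# Usage after job = recipe.run(no_fail=True):
#   if not job_succeeded(job.get_status()):
#       raise RuntimeError("recipe run failed")
```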

Prepare Recipe Quick Reference

Prepare recipes use add_processor_step() (preferred) or raw_steps.append() to add processors:

settings = recipe.get_settings()

# Preferred: add_processor_step(type, params)
settings.add_processor_step("CreateColumnWithGREL", {
    "column": "revenue",
    "expression": "price * quantity"
})

# Alternative: raw_steps.append() for direct dict manipulation
# settings.raw_steps.append({
#     "type": "CreateColumnWithGREL",
#     "params": {"column": "revenue", "expression": "price * quantity"}
# })

settings.save()

Common Processors

| Processor | Purpose |
| --- | --- |
| CreateColumnWithGREL | Add calculated / derived columns |
| ColumnTrimmer | Strip whitespace from text columns |
| ColumnLowercaser | Lowercase text for consistency |
| FillEmptyWithValue | Replace nulls with a default |
| FilterOnValue | Keep or remove rows by column value |
| FilterOnFormula | Keep or remove rows by GREL expression |
| ColumnRenamer | Rename columns |
| ColumnsSelector | Keep or remove a set of columns |
| ColumnSplitter | Split a column by delimiter |
| DateParser | Parse string to date |
| DateFormatter | Format date to string |
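When falling back to `raw_steps.append()`, each processor is just a dict with `type` and `params` keys. A small factory keeps that shape consistent. A sketch: the helper name is ours, and the `FillEmptyWithValue` param names are assumptions to verify against your DSS version's processor reference:

```python
def raw_step(step_type: str, **params) -> dict:
    """Build a raw prepare-recipe step dict: {"type": ..., "params": {...}}."""
    return {"type": step_type, "params": params}

steps = [
    raw_step("CreateColumnWithGREL", column="revenue", expression="price * quantity"),
    # Param names below are assumptions -- check the processor docs for your version
    raw_step("FillEmptyWithValue", columns=["status"], value="unknown"),
]
# Then: settings.raw_steps.extend(steps); settings.save()
```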

Top 5 GREL Patterns

| Pattern | Example | Notes |
| --- | --- | --- |
| Math | price * quantity | Standard operators +, -, *, / |
| Conditional | if(amount > 1000, 'large', 'small') | Nestable: if(..., ..., if(...)) |
| String ops | upper(name), trim(val), length(s) | Also lower(), toString() |
| Date extraction | datePart(order_date, 'month') | Parts: year, month, day, hour |
| Coalesce | coalesce(val, 'default') | Returns first non-null argument |
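These patterns combine: a nested conditional plus a coalesce, packaged as a prepare step. A sketch; the `tier_step` helper and the column names are illustrative, only the `CreateColumnWithGREL` shape comes from the API:

```python
def tier_step(amount_col: str = "amount", out_col: str = "tier"):
    """Return (processor_type, params) for a nested-conditional GREL tier column."""
    expr = (
        f"if(coalesce({amount_col}, 0) > 1000, 'large', "
        f"if(coalesce({amount_col}, 0) > 100, 'medium', 'small'))"
    )
    return "CreateColumnWithGREL", {"column": out_col, "expression": expr}

# Usage:
#   ptype, params = tier_step()
#   settings.add_processor_step(ptype, params)
#   settings.save()
```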

Always Remember

  1. Call settings.save() after configuration changes
  2. Call compute_schema_updates().apply() for visual recipes (join, grouping, etc.)
  3. Call recipe.run(no_fail=True) to execute (already waits for completion)
  4. Check job.get_status()["baseStatus"]["state"] for success ("DONE") or failure ("FAILED")
  5. Verify output dataset has expected data and schema
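Steps 2-4 of the checklist can be folded into one helper for visual recipes. A sketch built only from the calls documented here; `run_visual_recipe` is our name, not a Dataiku API:

```python
def run_visual_recipe(recipe) -> str:
    """Apply pending schema updates, run the recipe, and return the job state."""
    updates = recipe.compute_schema_updates()
    if updates.any_action_required():
        updates.apply()
    job = recipe.run(no_fail=True)  # blocks until the job finishes
    return job.get_status()["baseStatus"]["state"]  # "DONE" or "FAILED"
```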

Common Pitfalls

Schema Propagation

Visual recipes (join, grouping) need schema updates applied before running:

schema_updates = recipe.compute_schema_updates()
if schema_updates.any_action_required():
    schema_updates.apply()

Column Case for SQL Databases

Some SQL databases fold unquoted identifiers to uppercase, so mixed-case column names in dataset schemas can trigger "invalid identifier" errors. Uppercasing the schema's column names avoids this; one way to reach the raw schema (assuming the dataset settings API) is:

dataset = project.get_dataset("<dataset_name>")
settings = dataset.get_settings()
raw = settings.get_raw()
for col in raw["schema"]["columns"]:
    col["name"] = col["name"].upper()
settings.save()

Job Completion

recipe.run() already waits — do not look for wait_for_completion().

Full signature:

job = recipe.run(job_type='NON_RECURSIVE_FORCED_BUILD', partitions=None, wait=True, no_fail=False)
  • job_type — Controls build behavior. 'NON_RECURSIVE_FORCED_BUILD' (default) rebuilds only this recipe; use 'RECURSIVE_FORCED_BUILD' to rebuild upstream dependencies too.
  • partitions — Specify partition identifiers when running on partitioned datasets. Defaults to None (all/non-partitioned).
  • wait — When True (default), blocks until the job completes. Set to False for async execution, then poll job.get_status() yourself.
  • no_fail — When False (default), raises an exception if the job fails. Set to True to suppress exceptions and inspect the job status manually.

Typical usage:

job = recipe.run(no_fail=True)  # Returns after job completes
state = job.get_status()["baseStatus"]["state"]  # "DONE" or "FAILED"
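For `wait=False` runs, polling looks like this. A sketch: the terminal-state set is an assumption (check the job states your DSS version reports), and `wait_for_job` is a local helper, not a library call:

```python
import time

# Assumed terminal states -- verify against your DSS version
TERMINAL_STATES = {"DONE", "FAILED", "ABORTED"}

def wait_for_job(job, poll_seconds: float = 5.0) -> str:
    """Poll job.get_status() until the job reaches a terminal state."""
    while True:
        state = job.get_status()["baseStatus"]["state"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)

# job = recipe.run(wait=False)
# state = wait_for_job(job)
```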

Detailed References

Recipe types:

Data preparation:

Working Examples