recipe-patterns
Install command:
npx skills add https://github.com/jediv/dataiku-chat-control --skill recipe-patterns
Skill Documentation
Dataiku Recipe Patterns
Reference patterns for creating different recipe types via the Python API.
Recipe Type Decision Table
| Recipe Type | Use When | Key Method |
|---|---|---|
| Prepare | Column transforms, filtering, formula columns, renaming, data cleaning | project.new_recipe("prepare", ...) |
| Join | Combining datasets on key columns (LEFT, INNER, RIGHT, OUTER) | project.new_recipe("join", ...) |
| Group | Aggregations: sum, count, avg, min, max, stddev, etc. | project.new_recipe("grouping", ...) |
| Sync | Copying data between connections (e.g., to a data warehouse) | project.new_recipe("sync", ...) |
| Python | Custom transformations not possible with visual recipes | project.new_recipe("python", ...) |
Universal Builder Pattern
Every recipe follows the same create-configure-run lifecycle:
```python
# 1. Create via builder
builder = project.new_recipe("<type>", "<recipe_name>")
builder.with_input("<input_dataset>")
builder.with_output("<output_dataset>")
recipe = builder.create()

# 2. Configure settings
settings = recipe.get_settings()
# ... recipe-specific configuration ...
settings.save()

# 3. Apply schema updates (visual recipes only)
schema_updates = recipe.compute_schema_updates()
if schema_updates.any_action_required():
    schema_updates.apply()

# 4. Run and check
job = recipe.run(no_fail=True)
state = job.get_status()["baseStatus"]["state"]  # "DONE" or "FAILED"
```
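The four create-configure-run steps above wrap naturally into one helper. The sketch below assumes an already-connected dataikuapi DSSProject handle; the function name `build_and_run` is ours, not part of the API:

```python
def build_and_run(project, recipe_type, name, input_ds, output_ds):
    """Create a recipe, apply pending schema updates, run it, and
    return the final job state ("DONE" or "FAILED").

    `project` is a connected dataikuapi DSSProject; the other
    arguments are caller-supplied recipe/dataset identifiers.
    """
    builder = project.new_recipe(recipe_type, name)
    builder.with_input(input_ds)
    builder.with_output(output_ds)
    recipe = builder.create()

    # Visual recipes need schema propagation before running
    updates = recipe.compute_schema_updates()
    if updates.any_action_required():
        updates.apply()

    job = recipe.run(no_fail=True)  # waits for completion
    return job.get_status()["baseStatus"]["state"]
```

This keeps the lifecycle in one place when a script builds several recipes in a row; skip the schema-update step for code (Python) recipes.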
Prepare Recipe Quick Reference
Prepare recipes use add_processor_step() (preferred) or raw_steps.append() to add processors:
```python
settings = recipe.get_settings()

# Preferred: add_processor_step(type, params)
settings.add_processor_step("CreateColumnWithGREL", {
    "column": "revenue",
    "expression": "price * quantity"
})

# Alternative: raw_steps.append() for direct dict manipulation
# settings.raw_steps.append({
#     "type": "CreateColumnWithGREL",
#     "params": {"column": "revenue", "expression": "price * quantity"}
# })

settings.save()
```
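Adding more processors just repeats the same call. For intuition, here is a sketch of the plain-dict shape that accumulates in settings.raw_steps; the param keys per processor are illustrative only (see references/processors.md for the exact parameters):

```python
# Approximate shape of settings.raw_steps after three add_processor_step
# calls. Param keys shown here are illustrative, not authoritative.
raw_steps = [
    {"type": "ColumnTrimmer", "params": {"columns": ["name"]}},
    {"type": "ColumnLowercaser", "params": {"columns": ["email"]}},
    {"type": "CreateColumnWithGREL",
     "params": {"column": "revenue", "expression": "price * quantity"}},
]

# Steps run in list order, so ordering matters for dependent transforms
print([step["type"] for step in raw_steps])
```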
Common Processors
| Processor | Purpose |
|---|---|
| CreateColumnWithGREL | Add calculated / derived columns |
| ColumnTrimmer | Strip whitespace from text columns |
| ColumnLowercaser | Lowercase text for consistency |
| FillEmptyWithValue | Replace nulls with a default |
| FilterOnValue | Keep or remove rows by column value |
| FilterOnFormula | Keep or remove rows by GREL expression |
| ColumnRenamer | Rename columns |
| ColumnsSelector | Keep or remove a set of columns |
| ColumnSplitter | Split a column by delimiter |
| DateParser | Parse string to date |
| DateFormatter | Format date to string |
Top 5 GREL Patterns
| Pattern | Example | Notes |
|---|---|---|
| Math | price * quantity | Standard operators +, -, *, / |
| Conditional | if(amount > 1000, 'large', 'small') | Nestable: if(..., ..., if(...)) |
| String ops | upper(name), trim(val), length(s) | Also lower(), toString() |
| Date extraction | datePart(order_date, 'month') | Parts: year, month, day, hour |
| Coalesce | coalesce(val, 'default') | Returns first non-null argument |
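GREL evaluates inside Dataiku, but these patterns map closely onto plain Python. The side-by-side sketch below is for intuition only (it is Python, not GREL; the sample values are ours):

```python
from datetime import datetime

price, quantity, amount = 19.99, 3, 1500
name, val = "  Alice  ", None
order_date = datetime(2024, 5, 17)

# Math: price * quantity
revenue = price * quantity

# Conditional: if(amount > 1000, 'large', 'small')
size = "large" if amount > 1000 else "small"

# String ops: trim(name) then upper(name)
cleaned = name.strip().upper()

# Date extraction: datePart(order_date, 'month')
month = order_date.month

# Coalesce: coalesce(val, 'default') -> first non-null argument
coalesced = next(v for v in (val, "default") if v is not None)

print(size, cleaned, month, coalesced)  # large ALICE 5 default
```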
Always Remember
- Call settings.save() after configuration changes
- Call compute_schema_updates().apply() for visual recipes (join, grouping, etc.)
- Call recipe.run(no_fail=True) to execute (already waits for completion)
- Check job.get_status()["baseStatus"]["state"] for success ("DONE") or failure ("FAILED")
- Verify the output dataset has the expected data and schema
Common Pitfalls
Schema Propagation
Visual recipes (join, grouping) need schema updates applied before running:
```python
schema_updates = recipe.compute_schema_updates()
if schema_updates.any_action_required():
    schema_updates.apply()
```
Column Case for SQL Databases
Use UPPERCASE column names in dataset schemas to avoid "invalid identifier" errors:

```python
for col in raw["schema"]["columns"]:
    col["name"] = col["name"].upper()
```
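The loop above operates on a dataset-definition dict. As a self-contained sketch (the column names and types here are illustrative sample data, not from any real dataset):

```python
# Illustrative dataset definition in the shape used above
raw = {"schema": {"columns": [
    {"name": "order_id", "type": "string"},
    {"name": "amount", "type": "double"},
]}}

# Uppercase every column name in place before saving the definition
for col in raw["schema"]["columns"]:
    col["name"] = col["name"].upper()

print([c["name"] for c in raw["schema"]["columns"]])  # ['ORDER_ID', 'AMOUNT']
```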
Job Completion
recipe.run() already waits — do not look for wait_for_completion().
Full signature:
```python
job = recipe.run(job_type='NON_RECURSIVE_FORCED_BUILD', partitions=None, wait=True, no_fail=False)
```

- job_type: Controls build behavior. 'NON_RECURSIVE_FORCED_BUILD' (default) rebuilds only this recipe; use 'RECURSIVE_FORCED_BUILD' to rebuild upstream dependencies too.
- partitions: Specify partition identifiers when running on partitioned datasets. Defaults to None (all/non-partitioned).
- wait: When True (default), blocks until the job completes. Set to False for async execution, then poll job.get_status() yourself.
- no_fail: When False (default), raises an exception if the job fails. Set to True to suppress exceptions and inspect the job status manually.
Typical usage:
```python
job = recipe.run(no_fail=True)  # Returns after job completes
state = job.get_status()["baseStatus"]["state"]  # "DONE" or "FAILED"
```
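Scripts that branch on the outcome can hide the nested dict lookup behind a small predicate. The dict shape matches what job.get_status() returns above; the helper name `job_succeeded` is ours:

```python
def job_succeeded(status: dict) -> bool:
    """Return True when a Dataiku job status dict reports state DONE."""
    return status.get("baseStatus", {}).get("state") == "DONE"

# Example status dicts in the shape returned by job.get_status()
print(job_succeeded({"baseStatus": {"state": "DONE"}}))    # True
print(job_succeeded({"baseStatus": {"state": "FAILED"}}))  # False
```

Using .get() with defaults keeps the check safe even if a malformed or empty status dict comes back.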
Detailed References
Recipe types:
- references/prepare-recipe.md: Prepare recipe builder pattern, raw_steps API
- references/join-recipe.md: Join configuration, multi-table joins, column selection, prefix behavior
- references/group-recipe.md: Aggregation flags, output naming, type compatibility
- references/sync-recipe.md: Sync recipe pattern
- references/python-recipe.md: Python recipe with set_code

Data preparation:
- references/processors.md: All processor types with parameters and a complete example
- references/grel-functions.md: Full GREL function table and formula syntax
- references/date-operations.md: DateParser, DateFormatter, datePart examples
Working Examples
- scripts/run_recipe.py: Run any recipe by name and check job status