recipe-patterns

📁 jediv/dataiku-chat-control 📅 2 days ago
Install command
npx skills add https://github.com/jediv/dataiku-chat-control --skill recipe-patterns


Skill Documentation

Dataiku Recipe Patterns

Reference patterns for creating different recipe types via the Python API.

Recipe Type Decision Table

| Recipe Type | Use When | Key Method |
| --- | --- | --- |
| Prepare | Column transforms, filtering, formula columns, renaming, data cleaning | `project.new_recipe("prepare", ...)` |
| Join | Combining datasets on key columns (LEFT, INNER, RIGHT, OUTER) | `project.new_recipe("join", ...)` |
| Group | Aggregations: sum, count, avg, min, max, stddev, etc. | `project.new_recipe("grouping", ...)` |
| Sync | Copying data between connections (e.g., to a data warehouse) | `project.new_recipe("sync", ...)` |
| Python | Custom transformations not possible with visual recipes | `project.new_recipe("python", ...)` |

Universal Builder Pattern

Every recipe follows the same create-configure-run lifecycle:

# 1. Create via builder
builder = project.new_recipe("<type>", "<recipe_name>")
builder.with_input("<input_dataset>")
builder.with_output("<output_dataset>")
recipe = builder.create()

# 2. Configure settings
settings = recipe.get_settings()
# ... recipe-specific configuration ...
settings.save()

# 3. Apply schema updates (visual recipes only)
schema_updates = recipe.compute_schema_updates()
if schema_updates.any_action_required():
    schema_updates.apply()

# 4. Run and check
job = recipe.run(no_fail=True)
state = job.get_status()["baseStatus"]["state"]  # "DONE" or "FAILED"
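Since every run ends with the same status check, it can help to factor that check into a tiny helper. This is a local convenience, not part of the Dataiku API; it only encodes the `baseStatus.state` path shown above:

```python
def job_succeeded(status: dict) -> bool:
    """Return True when a job status dict (from job.get_status()) reports DONE."""
    return status.get("baseStatus", {}).get("state") == "DONE"

# Usage after job = recipe.run(no_fail=True):
#   if not job_succeeded(job.get_status()):
#       raise RuntimeError("recipe run failed")
```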

Prepare Recipe Quick Reference

Prepare recipes use add_processor_step() (preferred) or raw_steps.append() to add processors:

settings = recipe.get_settings()

# Preferred: add_processor_step(type, params)
settings.add_processor_step("CreateColumnWithGREL", {
    "column": "revenue",
    "expression": "price * quantity"
})

# Alternative: raw_steps.append() for direct dict manipulation
# settings.raw_steps.append({
#     "type": "CreateColumnWithGREL",
#     "params": {"column": "revenue", "expression": "price * quantity"}
# })

settings.save()

Common Processors

| Processor | Purpose |
| --- | --- |
| CreateColumnWithGREL | Add calculated / derived columns |
| ColumnTrimmer | Strip whitespace from text columns |
| ColumnLowercaser | Lowercase text for consistency |
| FillEmptyWithValue | Replace nulls with a default |
| FilterOnValue | Keep or remove rows by column value |
| FilterOnFormula | Keep or remove rows by GREL expression |
| ColumnRenamer | Rename columns |
| ColumnsSelector | Keep or remove a set of columns |
| ColumnSplitter | Split a column by delimiter |
| DateParser | Parse string to date |
| DateFormatter | Format date to string |
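When falling back to `raw_steps.append()`, each processor is just a dict with `type` and `params` keys. A small factory keeps that shape consistent. A sketch: the helper name is ours, and the `FillEmptyWithValue` param names are assumptions to verify against your DSS version's processor reference:

```python
def raw_step(step_type: str, **params) -> dict:
    """Build a raw prepare-recipe step dict: {"type": ..., "params": {...}}."""
    return {"type": step_type, "params": params}

steps = [
    raw_step("CreateColumnWithGREL", column="revenue", expression="price * quantity"),
    # Param names below are assumptions -- check the processor docs for your version
    raw_step("FillEmptyWithValue", columns=["status"], value="unknown"),
]
# Then: settings.raw_steps.extend(steps); settings.save()
```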

Top 5 GREL Patterns

| Pattern | Example | Notes |
| --- | --- | --- |
| Math | price * quantity | Standard operators +, -, *, / |
| Conditional | if(amount > 1000, 'large', 'small') | Nestable: if(..., ..., if(...)) |
| String ops | upper(name), trim(val), length(s) | Also lower(), toString() |
| Date extraction | datePart(order_date, 'month') | Parts: year, month, day, hour |
| Coalesce | coalesce(val, 'default') | Returns first non-null argument |
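These patterns combine: a nested conditional plus a coalesce, packaged as a prepare step. A sketch; the `tier_step` helper and the column names are illustrative, only the `CreateColumnWithGREL` shape comes from the API:

```python
def tier_step(amount_col: str = "amount", out_col: str = "tier"):
    """Return (processor_type, params) for a nested-conditional GREL tier column."""
    expr = (
        f"if(coalesce({amount_col}, 0) > 1000, 'large', "
        f"if(coalesce({amount_col}, 0) > 100, 'medium', 'small'))"
    )
    return "CreateColumnWithGREL", {"column": out_col, "expression": expr}

# Usage:
#   ptype, params = tier_step()
#   settings.add_processor_step(ptype, params)
#   settings.save()
```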

Always Remember

  1. Call settings.save() after configuration changes
  2. Call compute_schema_updates().apply() for visual recipes (join, grouping, etc.)
  3. Call recipe.run(no_fail=True) to execute (already waits for completion)
  4. Check job.get_status()["baseStatus"]["state"] for success ("DONE") or failure ("FAILED")
  5. Verify output dataset has expected data and schema
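Steps 2-4 of the checklist can be folded into one helper for visual recipes. A sketch built only from the calls documented here; `run_visual_recipe` is our name, not a Dataiku API:

```python
def run_visual_recipe(recipe) -> str:
    """Apply pending schema updates, run the recipe, and return the job state."""
    updates = recipe.compute_schema_updates()
    if updates.any_action_required():
        updates.apply()
    job = recipe.run(no_fail=True)  # blocks until the job finishes
    return job.get_status()["baseStatus"]["state"]  # "DONE" or "FAILED"
```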

Common Pitfalls

Schema Propagation

Visual recipes (join, grouping) need schema updates applied before running:

schema_updates = recipe.compute_schema_updates()
if schema_updates.any_action_required():
    schema_updates.apply()

Column Case for SQL Databases

Some SQL databases fold unquoted identifiers to uppercase, so mixed-case column names in dataset schemas can trigger "invalid identifier" errors. Uppercasing the schema's column names avoids this; one way to reach the raw schema (assuming the dataset settings API) is:

dataset = project.get_dataset("<dataset_name>")
settings = dataset.get_settings()
raw = settings.get_raw()
for col in raw["schema"]["columns"]:
    col["name"] = col["name"].upper()
settings.save()

Job Completion

recipe.run() already waits — do not look for wait_for_completion().

Full signature:

job = recipe.run(job_type='NON_RECURSIVE_FORCED_BUILD', partitions=None, wait=True, no_fail=False)
  • job_type — Controls build behavior. 'NON_RECURSIVE_FORCED_BUILD' (default) rebuilds only this recipe; use 'RECURSIVE_FORCED_BUILD' to rebuild upstream dependencies too.
  • partitions — Specify partition identifiers when running on partitioned datasets. Defaults to None (all/non-partitioned).
  • wait — When True (default), blocks until the job completes. Set to False for async execution, then poll job.get_status() yourself.
  • no_fail — When False (default), raises an exception if the job fails. Set to True to suppress exceptions and inspect the job status manually.

Typical usage:

job = recipe.run(no_fail=True)  # Returns after job completes
state = job.get_status()["baseStatus"]["state"]  # "DONE" or "FAILED"
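For `wait=False` runs, polling looks like this. A sketch: the terminal-state set is an assumption (check the job states your DSS version reports), and `wait_for_job` is a local helper, not a library call:

```python
import time

# Assumed terminal states -- verify against your DSS version
TERMINAL_STATES = {"DONE", "FAILED", "ABORTED"}

def wait_for_job(job, poll_seconds: float = 5.0) -> str:
    """Poll job.get_status() until the job reaches a terminal state."""
    while True:
        state = job.get_status()["baseStatus"]["state"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)

# job = recipe.run(wait=False)
# state = wait_for_job(job)
```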

Detailed References

Recipe types:

Data preparation:

Working Examples