dlt-skill
npx skills add https://github.com/untitled-data-company/dlt-skill --skill dlt-skill
Skill documentation
dlt Pipeline Creator
Choose pipeline type with the decision tree below; then follow the Core Workflow.
Quick start: 1) Use the decision tree. 2) Follow the Core Workflow. 3) Use patterns and references as needed.
Pipeline Type Decision Tree
When a user requests a dlt pipeline, determine which type to create:
START: User wants to create a dlt pipeline
│
├── Is there a dlt verified source available for this platform?
│   (Check: https://dlthub.com/docs/dlt-ecosystem/verified-sources)
│   │
│   YES → Use VERIFIED SOURCE approach
│   │     Examples: Salesforce, GitHub, Stripe, HubSpot, Slack
│   │     Action: Guide user through `dlt init <source> <destination>`
│   │
│   NO → Continue to next question
│
├── Is this a REST API with standard patterns?
│   (Standard auth, pagination, JSON responses)
│   │
│   YES → Use DECLARATIVE REST API approach
│   │     Examples: Pokemon API, simple REST APIs with clear endpoints
│   │     Action: Create config-based pipeline with rest_api_source
│   │
│   NO → Continue to next question
│
└── Does this require custom logic or Python packages?
    │
    YES → Use CUSTOM PYTHON approach
          Examples: Python packages (simple-salesforce), complex transformations,
                    non-standard APIs, custom data sources
          Action: Create custom source with @dlt.source and @dlt.resource decorators
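The tree reduces to two yes/no questions asked in order, with custom Python as the fallback. A minimal sketch of that ordering follows; the enum and function names are hypothetical and not part of dlt or this skill.

```python
from enum import Enum


class PipelineType(Enum):
    VERIFIED_SOURCE = "verified source"
    DECLARATIVE_REST_API = "declarative REST API"
    CUSTOM_PYTHON = "custom Python"


def choose_pipeline_type(has_verified_source: bool, is_standard_rest_api: bool) -> PipelineType:
    """Walk the decision tree top to bottom and return the first match."""
    if has_verified_source:
        return PipelineType.VERIFIED_SOURCE
    if is_standard_rest_api:
        return PipelineType.DECLARATIVE_REST_API
    # Everything else needs custom logic or a Python package.
    return PipelineType.CUSTOM_PYTHON


# A REST API with standard auth and pagination, but no verified source:
print(choose_pipeline_type(has_verified_source=False, is_standard_rest_api=True).value)
# prints "declarative REST API"
```

The order matters: a platform that has a verified source and also exposes a standard REST API still resolves to the verified source.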
Core Workflow
1. Understand Requirements
Ask clarifying questions:
- Source: What is the data source? (API URL, platform name, database, etc.)
- Source type: Does this match a verified source, REST API, or require custom code?
- Destination: Where should data be loaded? (DuckDB, BigQuery, Snowflake, etc.)
- Resources: What specific data/endpoints are needed?
- Incremental: Should the pipeline load incrementally or do full refreshes?
- Authentication: What credentials are required?
2. Choose Pipeline Approach
Based on the decision tree above, select:
- Verified source – Pre-built, tested connector
- Declarative REST API – Config-based REST API pipeline
- Custom Python – Full control with Python code
3. Initialize or Create Pipeline
Verified source
dlt init <source_name> <destination_name>
Examples:
dlt init salesforce bigquery
dlt init github duckdb
dlt init stripe snowflake
Declarative REST API or Custom Python
Use templates from this skill’s assets/templates/ (copy into the project if needed):
- declarative_rest_pipeline.py – For REST APIs
- custom_python_pipeline.py – For custom sources
4. Install Required Packages
Recommended: Use the helper script (detects pip/uv/poetry):
python scripts/install_packages.py --destination <destination_name>
Manual: pip install "dlt[<destination>,workspace]" (e.g. bigquery, snowflake). For DuckDB use dlt[workspace] only. The workspace extra is required for dlt pipeline <name> show and the dashboard.
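The manual rule above (destination extra plus workspace, and workspace alone for DuckDB) can be sketched as a small helper. The function name is hypothetical, and this is not the logic of scripts/install_packages.py, whose contents are not shown here.

```python
def dlt_install_spec(destination: str) -> str:
    """Build the pip requirement string, always including the workspace extra."""
    extras = ["workspace"]
    # Per the rule above, DuckDB needs no destination extra of its own.
    if destination != "duckdb":
        extras.insert(0, destination)
    return f"dlt[{','.join(extras)}]"


print(dlt_install_spec("bigquery"))  # prints "dlt[bigquery,workspace]"
print(dlt_install_spec("duckdb"))    # prints "dlt[workspace]"
```

Quote the result when passing it to a shell, for example pip install "dlt[bigquery,workspace]", so the brackets are not interpreted as a glob.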
5. Configure Credentials
Create or update .dlt/secrets.toml:
Structure:
[sources.<source_name>]
# Source credentials here
[destination.<destination_name>]
# Destination credentials here
Use the template: assets/templates/.dlt/secrets.toml
Important: Remind user to add .dlt/secrets.toml to .gitignore!
Note for DuckDB: DuckDB doesn’t require credentials in secrets.toml. Just specify the database file path in the pipeline or config.toml.
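The .gitignore reminder can be automated. The sketch below appends the secrets path when it is missing; the function name is hypothetical, and the check is an exact line match, so a broader existing pattern such as .dlt/ would not be recognized.

```python
from pathlib import Path

SECRETS_ENTRY = ".dlt/secrets.toml"


def ensure_secrets_ignored(project_dir: str = ".") -> bool:
    """Append the secrets path to .gitignore if absent. Returns True if the file changed."""
    gitignore = Path(project_dir) / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    if any(line.strip() == SECRETS_ENTRY for line in text.splitlines()):
        return False
    # Start on a fresh line if the existing file lacks a trailing newline.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    gitignore.write_text(text + separator + SECRETS_ENTRY + "\n")
    return True
```

Calling ensure_secrets_ignored() from the project root is idempotent: the second call finds the entry and leaves the file untouched.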
6. Configure Pipeline Settings
Create or update .dlt/config.toml for non-sensitive settings:
[sources.<source_name>]
base_url = "https://api.example.com"
timeout = 30
[destination.<destination_name>]
location = "US"
Use the template: assets/templates/.dlt/config.toml
7. Implement Pipeline Logic
Flesh out the pipeline code based on requirements:
For verified sources:
- Customize resource selection with .with_resources()
- Configure incremental loading with .apply_hints()
- See: references/verified-sources.md
For Declarative REST API:
- Define client configuration (base_url, auth)
- Configure resources and endpoints
- Set up pagination and incremental loading
- Resource-level options (e.g. max_table_nesting, table_name) are set in the resource dict in the config; see references/rest-api-source.md (Resource configuration).
- See: references/rest-api-source.md
For Custom Python:
- Implement @dlt.source and @dlt.resource functions
- Use generators and yield patterns
- Configure write dispositions and primary keys
- See: references/custom-sources.md
8. Configure Incremental Loading (If Needed)
For pipelines that should load only new/changed data:
- Identify cursor field (timestamp, ID)
- Set write disposition to merge
- Define primary key for deduplication
- Configure incremental parameters
See: references/incremental-loading.md
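To make the roles of the cursor field, the merge disposition, and the primary key concrete, here is a conceptual sketch in plain Python. It is a simplified illustration, not dlt's implementation: the function names are hypothetical, and the strict "greater than" comparison is an assumption made for brevity.

```python
def select_new_rows(rows: list[dict], cursor_field: str, last_value: str) -> list[dict]:
    """Keep only rows whose cursor is past the last value seen in the previous run."""
    return [row for row in rows if row[cursor_field] > last_value]


def merge_rows(table: dict, rows: list[dict], primary_key: str) -> dict:
    """Upsert: a row with an existing primary key replaces the old row."""
    for row in rows:
        table[row[primary_key]] = row
    return table


# State left by a previous run: one row, cursor at 2024-01-01.
destination = {1: {"id": 1, "updated_at": "2024-01-01T00:00:00Z", "title": "old"}}
incoming = [
    {"id": 1, "updated_at": "2024-02-01T00:00:00Z", "title": "edited"},
    {"id": 2, "updated_at": "2024-02-02T00:00:00Z", "title": "new"},
    {"id": 3, "updated_at": "2023-12-31T00:00:00Z", "title": "stale"},
]

# ISO-8601 UTC timestamps in one format compare correctly as strings.
new_rows = select_new_rows(incoming, "updated_at", "2024-01-01T00:00:00Z")
merge_rows(destination, new_rows, "id")
next_cursor = max(row["updated_at"] for row in new_rows)

print(sorted(destination))  # prints [1, 2]
print(next_cursor)          # prints "2024-02-02T00:00:00Z"
```

Row 1 is updated in place rather than duplicated, row 2 is added, and row 3 is skipped because its cursor predates the last run. Without a primary key, the edited row would have nothing to match against.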
9. Test and Run Pipeline
python <pipeline_file>.py
Check for errors and verify data is loaded correctly.
10. Inspect Results
Prerequisite: Ensure dlt[workspace] is installed (included by default when using install_packages.py).
Open the dlt dashboard to inspect loaded data:
dlt pipeline <pipeline_name> show
Or use the helper script:
python scripts/open_dashboard.py <pipeline_name>
Pipeline Patterns
Pattern 1: Verified source – Select specific resources
import dlt
from salesforce import salesforce_source

source = salesforce_source()
pipeline = dlt.pipeline(
    pipeline_name='salesforce_pipeline',
    destination='bigquery',
    dataset_name='salesforce_data'
)
# Load only specific Salesforce objects.
# The names must match the resource names the installed source defines.
pipeline.run(source.with_resources("Account", "Opportunity", "Contact"))
Pattern 2: Declarative REST API – Simple Endpoints
import dlt
from dlt.sources.rest_api import rest_api_source

config = {
    "client": {
        "base_url": "https://pokeapi.co/api/v2/",
    },
    "resources": [
        "pokemon",
        {
            "name": "pokemon_details",
            # Bind the path placeholder to the "name" field of each row
            # from the parent "pokemon" resource.
            "endpoint": "pokemon/{resources.pokemon.name}",
            "write_disposition": "merge",
            "primary_key": "id"
        }
    ]
}
pipeline = dlt.pipeline(
    pipeline_name="pokemon",
    destination="duckdb",
    dataset_name="pokemon_data"
)
pipeline.run(rest_api_source(config))
Pattern 3: Custom Python – Using Python Package
import dlt
from simple_salesforce import Salesforce

@dlt.source
def salesforce_custom(username=dlt.secrets.value, password=dlt.secrets.value):
    sf = Salesforce(username=username, password=password)

    @dlt.resource(write_disposition='merge', primary_key='Id')
    def accounts():
        records = sf.query_all("SELECT Id, Name FROM Account")
        yield records['records']

    return accounts

pipeline = dlt.pipeline(
    pipeline_name='salesforce_custom',
    destination='duckdb',
    dataset_name='salesforce'
)
pipeline.run(salesforce_custom())
Pattern 4: Incremental Loading with REST API
import dlt

config = {
    "client": {
        "base_url": "https://api.github.com/repos/dlt-hub/dlt/",
        "auth": {"token": dlt.secrets["github_token"]}
    },
    "resources": [
        {
            "name": "issues",
            "endpoint": {
                "path": "issues",
                "params": {
                    "state": "all",
                    # Filled with the cursor value stored by the previous run
                    "since": "{incremental.start_value}"
                }
            },
            "incremental": {
                "cursor_path": "updated_at",
                "initial_value": "2024-01-01T00:00:00Z"
            },
            "write_disposition": "merge",
            "primary_key": "id"
        }
    ]
}
Pattern 5: Non-endpoint resources for REST API sources (e.g. Database-Seeded or File-Seeded parameters)
Use non-endpoint resources (e.g. Database-Seeded or File-Seeded parameters) to drive REST API calls from a database, file, or other non-API source. Pre-fetch data outside the dlt pipeline context to avoid dlt.attach() / context conflicts. The seed resource must yield a list of dicts so each row drives one API request.
import duckdb
import dlt
from dlt.sources.rest_api import rest_api_source

# 1. Pre-fetch data from database (outside dlt context)
def get_locations():
    conn = duckdb.connect("locations.duckdb", read_only=True)
    result = conn.execute("SELECT id, lat, lng FROM locations").fetchall()
    conn.close()
    return [{"id": r[0], "lat": r[1], "lng": r[2]} for r in result]

# 2. Create seed resource
@dlt.resource(selected=False)
def locations():
    yield get_locations()  # Yield as LIST

# 3. Configure REST API with resolve
config = {
    "client": {"base_url": "https://api.weather.com/"},
    "resources": [
        locations(),
        {
            "name": "weather",
            "endpoint": {
                "path": "forecast",
                "params": {
                    "lat": "{resources.locations.lat}",
                    "lng": "{resources.locations.lng}"
                },
                "data_selector": "$",
                "paginator": "single_page"
            },
            "include_from_parent": ["id"],
            "primary_key": "_locations_id"
        }
    ]
}
source = rest_api_source(config)
pipeline = dlt.pipeline(
    pipeline_name="weather",
    destination="duckdb",
    dataset_name="weather_data"
)
pipeline.run(source)
See: references/rest-api-source.md (Non-REST Endpoint Resources, Query/Path Params, Single-Object Responses, include_from_parent).
Best Practices (Data Engineering)
- Secrets: Use .dlt/secrets.toml; never hardcode; add to .gitignore
- Primary keys: Set for merge operations and deduplication
- Write dispositions: append (events), merge (stateful), replace (snapshots)
- Performance: Yield pages not rows; use incremental loading when possible
See references/performance-tuning.md, references/incremental-loading.md, and references/troubleshooting.md for more.
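"Yield pages not rows" means a generator hands over lists of records instead of one record per yield. A minimal sketch in plain Python follows; the helper name paged is hypothetical, and inside a dlt resource the same yield page statement would apply.

```python
from typing import Iterable, Iterator


def paged(rows: Iterable[dict], page_size: int) -> Iterator[list[dict]]:
    """Group a stream of rows into lists of at most page_size items."""
    page: list[dict] = []
    for row in rows:
        page.append(row)
        if len(page) == page_size:
            yield page
            page = []
    if page:
        # Emit the final, possibly shorter page.
        yield page


pages = list(paged(({"id": i} for i in range(5)), page_size=2))
print([len(p) for p in pages])  # prints [2, 2, 1]
```

When an API already returns a page of results per request, yielding the whole response list achieves the same effect without any regrouping.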
Common Challenges and Solutions
Auth (OAuth2): In REST config use "auth": {"type": "oauth2_client_credentials", ...}. For custom Python use dlt.sources.helpers.rest_client.auth.OAuth2ClientCredentials with paginate(). See references/rest-api-source.md.
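As a rough sketch of the REST config variant, the auth dict might be built as below. Apart from "type", the key names are an assumption (they mirror what I understand to be the parameters of OAuth2ClientCredentials) and should be checked against references/rest-api-source.md. The function name and environment variable names are hypothetical; in a dlt project the secrets would normally come from .dlt/secrets.toml instead.

```python
import os


def oauth2_auth_config(token_url: str) -> dict:
    """Build the "auth" entry for the "client" section of a REST API config."""
    return {
        "type": "oauth2_client_credentials",
        # Assumed key names; verify against the reference before use.
        "access_token_url": token_url,
        # Read from the environment so nothing is hardcoded.
        "client_id": os.environ["MY_API_CLIENT_ID"],
        "client_secret": os.environ["MY_API_CLIENT_SECRET"],
    }
```

The returned dict would be placed under "client" next to "base_url", in the same position as the "auth" entry in Pattern 4.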
Custom pagination / nested data / performance: See references/rest-api-source.md, references/custom-sources.md, references/performance-tuning.md.
Reference Documentation – When to Read What
- Full workflow / step-by-step example → references/examples.md
- Verified source → references/verified-sources.md
- Declarative REST API → references/rest-api-source.md
- Custom Python source → references/custom-sources.md
- Incremental loading → references/incremental-loading.md
- Performance → references/performance-tuning.md
- Errors / debugging → references/troubleshooting.md
- dlt basics → references/core-concepts.md
Templates and Scripts
Templates (assets/templates/)
- custom_python_pipeline.py – Custom Python pipeline skeleton
- verified_source_pipeline.py – Verified source pipeline skeleton
- declarative_rest_pipeline.py – Declarative REST API pipeline skeleton
- .dlt/config.toml – Configuration file template
- .dlt/secrets.toml – Secrets file template
- .gitignore – Git ignore template for dlt projects
Scripts (scripts/)
- install_packages.py – Install dlt + destination extras (includes workspace). Run when setting up a new project or adding a destination.
- open_dashboard.py – Open pipeline dashboard (dlt pipeline <name> show). Run after a pipeline run to inspect loaded data.
Key Reminders
- Always ask about destination – Don’t assume
- Security first – Never commit secrets; use .dlt/secrets.toml and provide .gitignore
- Start simple – Use verified sources when available; test incrementally
- Read references – Load detailed docs only when needed