lakebase-setup

📁 databricks/app-templates 📅 14 days ago
11
总安装量
11
周安装量
#27840
全站排名
安装命令
npx skills add https://github.com/databricks/app-templates --skill lakebase-setup

Agent 安装分布

opencode 11
gemini-cli 11
github-copilot 11
cursor 11
claude-code 10
codex 10

Skill 文档

Lakebase Setup for Agent Memory

Note: This template does not include memory by default. Use this skill if you want to add memory capabilities to your agent. For pre-configured memory templates, see:

  • agent-langgraph-short-term-memory – Conversation history within a session
  • agent-langgraph-long-term-memory – User facts that persist across sessions

Overview

Lakebase provides persistent storage for agent memory:

  • Short-term memory: Conversation history within a thread (AsyncCheckpointSaver)
  • Long-term memory: User facts across sessions (AsyncDatabricksStore)

Complete Setup Workflow

┌─────────────────────────────────────────────────────────────────────────────┐
│  1. Add dependency  →  2. Get instance  →  3. Configure DAB + app.yaml     │
│  4. Configure .env  →  5. Initialize tables  →  6. Deploy + Run      │
└─────────────────────────────────────────────────────────────────────────────┘

Step 1: Add Memory Dependency

Add the memory extra to your pyproject.toml:

dependencies = [
    "databricks-langchain[memory]",
    # ... other dependencies
]

Then sync dependencies:

uv sync

Step 2: Create or Get Lakebase Instance

Option A: Create New Instance (via Databricks UI)

  1. Go to your Databricks workspace
  2. Navigate to Compute → Lakebase
  3. Click Create Instance
  4. Note the instance name

Option B: Use Existing Instance

If you have an existing instance, note its name for the next step.


Step 3: Configure databricks.yml (Lakebase Resource)

Add the Lakebase database resource to your app in databricks.yml:

resources:
  apps:
    agent_langgraph:
      name: "your-app-name"
      source_code_path: ./

      resources:
        # ... other resources (experiment, UC functions, etc.) ...

        # Lakebase instance for long-term memory
        - name: 'database'
          database:
            instance_name: '<your-lakebase-instance-name>'
            database_name: 'postgres'
            permission: 'CAN_CONNECT_AND_CREATE'

Important:

  • The instance_name: '<your-lakebase-instance-name>' must match the value reference in app.yaml
  • Using the database resource type automatically grants the app’s service principal access to Lakebase

Update app.yaml (Environment Variables)

Update app.yaml with the Lakebase instance name:

env:
  # ... other env vars ...

  # Lakebase instance name - must match instance_name in databricks.yml database resource
  # Note: Use 'value' (not 'valueFrom') because AsyncDatabricksStore needs the instance name,
  # not the full connection string that valueFrom would provide
  - name: LAKEBASE_INSTANCE_NAME
    value: "<your-lakebase-instance-name>"

  # Static values for embedding configuration
  - name: EMBEDDING_ENDPOINT
    value: "databricks-gte-large-en"
  - name: EMBEDDING_DIMS
    value: "1024"

Important:

  • The LAKEBASE_INSTANCE_NAME value must match the instance_name in your databricks.yml database resource
  • The database resource handles permissions; app.yaml provides the instance name to your code
  • Don’t use valueFrom for Lakebase – it provides the connection string, not the instance name

Step 4: Configure .env (Local Development)

For local development, add to .env:

# Lakebase configuration for long-term memory
LAKEBASE_INSTANCE_NAME=<your-instance-name>
EMBEDDING_ENDPOINT=databricks-gte-large-en
EMBEDDING_DIMS=1024

Important: embedding_dims must match the embedding endpoint:

Endpoint Dimensions
databricks-gte-large-en 1024
databricks-bge-large-en 1024

Note: .env is only for local development. When deployed, the app gets LAKEBASE_INSTANCE_NAME from the valueFrom reference in app.yaml.


Step 5: Initialize Store Tables (CRITICAL – First Time Only)

Before deploying, you must initialize the Lakebase tables. The AsyncDatabricksStore creates tables on first use, but you need to do this locally first:

# Run this script locally BEFORE first deployment
import asyncio
from databricks_langchain import AsyncDatabricksStore

async def setup_store():
    async with AsyncDatabricksStore(
        instance_name="<your-instance-name>",
        embedding_endpoint="databricks-gte-large-en",
        embedding_dims=1024,
    ) as store:
        print("Setting up store tables...")
        await store.setup()  # Creates required tables
        print("Store tables created!")

        # Verify with a test write/read
        await store.aput(("test", "init"), "test_key", {"value": "test_value"})
        results = await store.asearch(("test", "init"), query="test", limit=1)
        print(f"Test successful: {results}")

asyncio.run(setup_store())

Run with:

uv run python -c "$(cat <<'EOF'
import asyncio
from databricks_langchain import AsyncDatabricksStore

async def setup():
    async with AsyncDatabricksStore(
        instance_name="<your-instance-name>",
        embedding_endpoint="databricks-gte-large-en",
        embedding_dims=1024,
    ) as store:
        await store.setup()
        print("Tables created!")

asyncio.run(setup())
EOF
)"

This creates these tables in the public schema:

  • store – Key-value storage for memories
  • store_vectors – Vector embeddings for semantic search
  • store_migrations – Schema migration tracking
  • vector_migrations – Vector schema migration tracking

Step 6: Deploy and Run Your App

IMPORTANT: Always run both deploy AND run commands:

# Deploy resources and upload files
databricks bundle deploy

# Start/restart the app with new code (REQUIRED!)
databricks bundle run agent_langgraph

Note: bundle deploy only uploads files and configures resources. bundle run is required to actually start the app with the new code.


Complete Example: databricks.yml with Lakebase

bundle:
  name: agent_langgraph

resources:
  experiments:
    agent_langgraph_experiment:
      name: /Users/${workspace.current_user.userName}/${bundle.name}-${bundle.target}

  apps:
    agent_langgraph:
      name: "my-agent-app"
      description: "Agent with long-term memory"
      source_code_path: ./

      resources:
        - name: 'experiment'
          experiment:
            experiment_id: "${resources.experiments.agent_langgraph_experiment.id}"
            permission: 'CAN_MANAGE'

        # Lakebase instance for long-term memory
        - name: 'database'
          database:
            instance_name: '<your-lakebase-instance-name>'
            database_name: 'postgres'
            permission: 'CAN_CONNECT_AND_CREATE'

targets:
  dev:
    mode: development
    default: true

Complete Example: app.yaml

command: ["uv", "run", "start-app"]

env:
  - name: MLFLOW_TRACKING_URI
    value: "databricks"
  - name: MLFLOW_REGISTRY_URI
    value: "databricks-uc"
  - name: API_PROXY
    value: "http://localhost:8000/invocations"
  - name: CHAT_APP_PORT
    value: "3000"
  - name: CHAT_PROXY_TIMEOUT_SECONDS
    value: "300"
  # Reference experiment resource from databricks.yml
  - name: MLFLOW_EXPERIMENT_ID
    valueFrom: "experiment"
  # Lakebase instance name (must match instance_name in databricks.yml)
  - name: LAKEBASE_INSTANCE_NAME
    value: "<your-lakebase-instance-name>"
  # Embedding configuration
  - name: EMBEDDING_ENDPOINT
    value: "databricks-gte-large-en"
  - name: EMBEDDING_DIMS
    value: "1024"

Troubleshooting

Issue Cause Solution
“embedding_dims is required when embedding_endpoint is specified” Missing embedding_dims parameter Add embedding_dims=1024 to AsyncDatabricksStore
“relation ‘store’ does not exist” Tables not initialized Run await store.setup() locally first (Step 5)
“Unable to resolve Lakebase instance ‘None'” Missing env var in deployed app Add LAKEBASE_INSTANCE_NAME value to app.yaml
“Unable to resolve Lakebase instance ‘…database.cloud.databricks.com'” Used valueFrom instead of value Use value: "<instance-name>" not valueFrom for Lakebase
“permission denied for table store” Missing grants The database resource in DAB should handle this; verify the resource is configured
“Failed to connect to Lakebase” Wrong instance name Verify instance name in databricks.yml and .env
Connection pool errors on exit Python cleanup race Ignore PythonFinalizationError – it’s harmless
App not updated after deploy Forgot to run bundle Run databricks bundle run agent_langgraph after deploy
valueFrom not resolving Resource name mismatch Ensure valueFrom value matches name in databricks.yml resources

Quick Reference: LakebaseClient API

For manual permission management (usually not needed with DAB database resource):

from databricks_ai_bridge.lakebase import LakebaseClient, SchemaPrivilege, TablePrivilege

client = LakebaseClient(instance_name="...")

# Create role (must do first)
client.create_role(identity_name, "SERVICE_PRINCIPAL")

# Grant schema (note: schemas is a list, grantee not role)
client.grant_schema(
    grantee="...",
    schemas=["public"],
    privileges=[SchemaPrivilege.USAGE, SchemaPrivilege.CREATE],
)

# Grant tables (note: tables includes schema prefix)
client.grant_table(
    grantee="...",
    tables=["public.store"],
    privileges=[TablePrivilege.SELECT, TablePrivilege.INSERT, ...],
)

# Execute raw SQL
client.execute("SELECT * FROM pg_tables WHERE schemaname = 'public'")

Service Principal Identifiers

When granting permissions manually, note that Databricks apps have multiple identifiers:

Field Format Example
service_principal_id Numeric ID 1234567890123456
service_principal_client_id UUID a1b2c3d4-e5f6-7890-abcd-ef1234567890
service_principal_name String name my-app-service-principal

Get all identifiers:

databricks apps get <app-name> --output json | jq '{
  id: .service_principal_id,
  client_id: .service_principal_client_id,
  name: .service_principal_name
}'

Which to use:

  • LakebaseClient.create_role() – Use service_principal_client_id (UUID) or service_principal_name
  • Raw SQL grants – Use service_principal_client_id (UUID)

Next Steps

  • Add memory to agent code: see agent-memory skill
  • Test locally: see run-locally skill
  • Deploy: see deploy skill