hex-gemini-bq-patterns

📁 4l3k51/hex-gemini-bq-patterns 📅 10 days ago
1
总安装量
1
周安装量
#49049
全站排名
安装命令
npx skills add https://github.com/4l3k51/hex-gemini-bq-patterns --skill hex-gemini-bq-patterns

Agent 安装分布

trae 1
cursor 1
claude-code 1

Skill 文档

Hex.tech Patterns

Project Secrets and Variables

Hex injects project secrets directly into the notebook kernel namespace as bare Python variables. They are set in the Hex UI sidebar under “Secrets” (not “Environment variables”) and appear in the Variable explorer at runtime. They are NOT environment variables.

# Correct — bare variable, set in Hex Variables panel
response = call_api(gemini_vertex_api_key, prompt)
conversation = get_data(thread_url)

NEVER do any of these:

api_key = os.environ["GEMINI_API_KEY"]       # WRONG
api_key = os.getenv("GEMINI_API_KEY")        # WRONG
api_key = "AIzaSy..."                         # WRONG
from dotenv import load_dotenv; load_dotenv() # WRONG
api_key = globals()["gemini_api_key"]         # WRONG

Data Connections

import hextoolkit as htk
connection = htk.get_data_connection("your-connection-name")
df = connection.query("SELECT * FROM table LIMIT 10")

Do not use SQLAlchemy engines, BigQuery client libraries, or raw connection strings.

LLM API Calls — Gemini / Vertex AI

Use raw REST via requests. Do NOT use provider SDKs (google.cloud.aiplatform, vertexai, etc.).

Generation

import requests

url = (
    f"https://{VERTEX_CONFIG['location']}-aiplatform.googleapis.com/v1/"
    f"projects/{VERTEX_CONFIG['project_id']}/locations/{VERTEX_CONFIG['location']}/"
    f"publishers/google/models/{VERTEX_CONFIG['generate_model']}:generateContent"
)
payload = {
    "contents": [{"role": "user", "parts": [{"text": prompt}]}]
}
response = requests.post(
    url, params={"key": gemini_vertex_api_key}, json=payload,
    headers={"Content-Type": "application/json"}
)
response.raise_for_status()
text = response.json()["candidates"][0]["content"]["parts"][0]["text"]

Embeddings

url = (
    f"https://{VERTEX_CONFIG['location']}-aiplatform.googleapis.com/v1/"
    f"projects/{VERTEX_CONFIG['project_id']}/locations/{VERTEX_CONFIG['location']}/"
    f"publishers/google/models/{VERTEX_CONFIG['embed_model']}:predict"
)
payload = {"instances": [{"content": t} for t in texts]}
response = requests.post(
    url, params={"key": gemini_vertex_api_key}, json=payload,
    headers={"Content-Type": "application/json"}
)
response.raise_for_status()
embeddings = [p["embeddings"]["values"] for p in response.json()["predictions"]]

SQL Queries in Python

Execute SQL via hextoolkit connection, not via SQLAlchemy, pandas.read_sql(), or BigQuery client libraries.

Pure SQL filters (no Python variables)

connection = htk.get_data_connection("supabase-etl-prod-eu")

query = """
SELECT 
    story_id, title, url, hn_url, comments_text, comment_count
FROM `supabase-etl-prod-eu.raw_aleksi.supabase_hn_threads`
WHERE week_start = DATE_SUB(DATE_TRUNC(CURRENT_DATE(), WEEK(MONDAY)), INTERVAL 1 WEEK)
"""

df = connection.query(query)

Dynamic filters via f-string

connection = htk.get_data_connection("supabase-etl-prod-eu")

query = f"""
WITH target_thread AS (
  SELECT DISTINCT title
  FROM `supabase-etl-prod-eu.dbt.commonroom_webhook`
  WHERE {activity_filter}
    AND external_activity_url LIKE '{safe_url}%'
)
SELECT
  m.title,
  m.contact_full_name,
  m.timestamp,
  m.content
FROM `supabase-etl-prod-eu.dbt.commonroom_webhook` m
INNER JOIN target_thread t ON m.title = t.title
WHERE {activity_filter}
ORDER BY m.timestamp
"""

df = connection.query(query)

Retry Pattern

def call_gemini_with_retries(gemini_vertex_api_key, prompt, max_retries=3):
    for attempt in range(max_retries):
        try:
            url = f"https://{VERTEX_CONFIG['location']}-aiplatform.googleapis.com/v1/projects/{VERTEX_CONFIG['project_id']}/locations/{VERTEX_CONFIG['location']}/publishers/google/models/{VERTEX_CONFIG['generate_model']}:generateContent"
            payload = {"contents": [{"role": "user", "parts": [{"text": prompt}]}]}
            params = {"key": gemini_vertex_api_key}
            headers = {"Content-Type": "application/json"}

            response = requests.post(url, params=params, json=payload, headers=headers)
            response.raise_for_status()
            return response.json()["candidates"][0]["content"]["parts"][0]["text"]

        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt == max_retries - 1:
                raise e
            time.sleep(2 ** attempt)
    return None

Cell Timing

Add time.sleep(2) at the top of cells that depend on prior cell outputs to ensure previous cells have finished executing:

import time
time.sleep(2)

# rest of cell logic that depends on variables from prior cells

Cross-Cell Variable References

Variables defined in any cell (including Hex query cells like dataframe_52) are available in all subsequent cells. Reference them directly:

if dataframe_52 is None or dataframe_52.empty:
    print("ERROR: No data")
elif not gemini_vertex_api_key:
    print("ERROR: gemini_vertex_api_key not set")
else:
    result = analyze(gemini_vertex_api_key, dataframe_52)