azure-blob-storage

📁 fusionet24/aiskills 📅 Today
Install command:
npx skills add https://github.com/fusionet24/aiskills --skill azure-blob-storage


Skill Documentation

Azure Blob Storage Skill

Overview

This skill enables interaction with Azure Blob Storage and Azure Data Lake Storage Gen2 (ADLS Gen2). It covers authenticating, listing containers and blobs, downloading and uploading blob content, and retrieving blob metadata.

When to Use

  • The user mentions Azure Blob Storage, Azure Storage, or ADLS
  • You need to list containers or blobs in a storage account
  • You need to read data files (CSV, JSON, Parquet) from blob storage
  • You need to upload or download files to/from blob storage
  • You need to get blob metadata (size, last modified, content type)
  • You need to work with hierarchical namespace (ADLS Gen2) directories

Prerequisites

Configure the environment variables for your chosen authentication method:

  • AZURE_STORAGE_ACCOUNT_NAME: Storage account name (used with an account key or managed identity)
  • AZURE_STORAGE_ACCOUNT_KEY: Storage account key, OR
  • AZURE_STORAGE_CONNECTION_STRING: Complete connection string
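
A minimal startup check, as a sketch (the check itself is not part of the skill; the variable names match the list above), can fail fast when credentials are missing:

import os

# Sketch: verify that at least one supported credential configuration is present
has_key_auth = os.getenv("AZURE_STORAGE_ACCOUNT_NAME") and os.getenv("AZURE_STORAGE_ACCOUNT_KEY")
has_connection_string = os.getenv("AZURE_STORAGE_CONNECTION_STRING")

if not (has_key_auth or has_connection_string):
    raise RuntimeError(
        "Set AZURE_STORAGE_ACCOUNT_NAME + AZURE_STORAGE_ACCOUNT_KEY "
        "or AZURE_STORAGE_CONNECTION_STRING before using this skill."
    )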

Authentication

Option 1: Account Name + Key

import os
from azure.storage.blob import BlobServiceClient

account_name = os.getenv("AZURE_STORAGE_ACCOUNT_NAME")
account_key = os.getenv("AZURE_STORAGE_ACCOUNT_KEY")
account_url = f"https://{account_name}.blob.core.windows.net"

blob_service_client = BlobServiceClient(account_url, credential=account_key)

Option 2: Connection String

import os
from azure.storage.blob import BlobServiceClient

connection_string = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
blob_service_client = BlobServiceClient.from_connection_string(connection_string)

Option 3: Managed Identity (for Azure-hosted apps)

import os
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

account_name = os.getenv("AZURE_STORAGE_ACCOUNT_NAME")
account_url = f"https://{account_name}.blob.core.windows.net"
credential = DefaultAzureCredential()

blob_service_client = BlobServiceClient(account_url, credential=credential)

Common Operations

List Containers

from azure.storage.blob import BlobServiceClient

# Reuse the blob_service_client created in the Authentication section;
# it is recreated here only to keep this example self-contained
blob_service_client = BlobServiceClient(account_url, credential=account_key)

print("Containers in storage account:")
containers = blob_service_client.list_containers()
for container in containers:
    print(f"  - {container.name}")

List Blobs in a Container

container_client = blob_service_client.get_container_client("my-container")

print("Blobs in container 'my-container':")
blobs = container_client.list_blobs()
for blob in blobs:
    print(f"  - {blob.name} ({blob.size} bytes, modified: {blob.last_modified})")

Filter Blobs by Prefix (Directory-like)

# List all blobs in "2024/sales/" path
blobs = container_client.list_blobs(name_starts_with="2024/sales/")
for blob in blobs:
    print(f"  - {blob.name}")

Download Blob Content

blob_client = blob_service_client.get_blob_client(
    container="my-container",
    blob="data/file.csv"
)

# Download to stream
download_stream = blob_client.download_blob()
content = download_stream.readall()
print(f"Downloaded {len(content)} bytes")

# Download to file
with open("local_file.csv", "wb") as file:
    download_stream = blob_client.download_blob()
    file.write(download_stream.readall())

Read CSV/JSON/Parquet from Blob

import pandas as pd
from io import BytesIO

blob_client = blob_service_client.get_blob_client(
    container="data-container",
    blob="sales/2024/sales.csv"
)

# Download blob content
download_stream = blob_client.download_blob()
content = download_stream.readall()

# Read with pandas based on the file extension
# (pd.read_parquet requires pyarrow or fastparquet)
blob_name = blob_client.blob_name
if blob_name.endswith('.csv'):
    df = pd.read_csv(BytesIO(content))
elif blob_name.endswith('.json'):
    df = pd.read_json(BytesIO(content))
elif blob_name.endswith('.parquet'):
    df = pd.read_parquet(BytesIO(content))

print(f"Loaded DataFrame with {len(df)} rows and {len(df.columns)} columns")

Get Blob Metadata

blob_client = blob_service_client.get_blob_client(
    container="my-container",
    blob="data/file.parquet"
)

properties = blob_client.get_blob_properties()
print(f"Blob: {properties.name}")
print(f"Size: {properties.size} bytes")
print(f"Content Type: {properties.content_settings.content_type}")
print(f"Last Modified: {properties.last_modified}")
print(f"ETag: {properties.etag}")

Upload Blob

blob_client = blob_service_client.get_blob_client(
    container="my-container",
    blob="output/result.csv"
)

# Upload from local file
with open("local_result.csv", "rb") as data:
    blob_client.upload_blob(data, overwrite=True)

# Upload from string/bytes
content = "col1,col2\nvalue1,value2"
blob_client.upload_blob(content.encode('utf-8'), overwrite=True)

Check if Blob Exists

blob_client = blob_service_client.get_blob_client(
    container="my-container",
    blob="data/file.csv"
)

if blob_client.exists():
    print("Blob exists")
else:
    print("Blob does not exist")

Error Handling

from azure.core.exceptions import ResourceNotFoundError, AzureError

try:
    blob_client = blob_service_client.get_blob_client(
        container="my-container",
        blob="data/file.csv"
    )
    content = blob_client.download_blob().readall()
except ResourceNotFoundError:
    print("Blob not found - check container and blob name")
except AzureError as e:
    print(f"Azure error: {e.message}")
except Exception as e:
    print(f"Unexpected error: {str(e)}")

Best Practices

  1. Use Specific Paths: Always specify full blob paths including container
  2. Handle Large Files: For large files (>100MB), use streaming or chunked downloads (see the sketch after this list)
  3. Batch Operations: When listing many blobs, use pagination (also shown below)
  4. Connection Reuse: Reuse BlobServiceClient instance for multiple operations
  5. Error Handling: Always wrap blob operations in try/except blocks
  6. Authentication: Prefer managed identity in production over account keys
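
A minimal sketch of practices 2 and 3, assuming the placeholder container and blob names used earlier in this document:

# Paginated listing: process blobs one page at a time instead of materializing the full list
container_client = blob_service_client.get_container_client("my-container")
for page in container_client.list_blobs(results_per_page=1000).by_page():
    for blob in page:
        print(blob.name)

# Streaming download: write a large blob to disk chunk by chunk
blob_client = blob_service_client.get_blob_client(
    container="my-container",
    blob="data/large_file.parquet"  # placeholder path
)
with open("large_file.parquet", "wb") as file:
    for chunk in blob_client.download_blob().chunks():
        file.write(chunk)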

Security Notes

  • Never commit account keys or connection strings to version control
  • Use environment variables or Azure Key Vault for credentials
  • In production, use managed identities instead of account keys
  • Apply least-privilege access using SAS tokens or Azure RBAC (a SAS example follows)
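
As an illustration of least-privilege access, a sketch that generates a read-only, time-limited SAS token for a single blob (the container and blob names are the same placeholders used above):

import os
from datetime import datetime, timedelta, timezone

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

account_name = os.getenv("AZURE_STORAGE_ACCOUNT_NAME")

# Read-only token for one blob, valid for one hour
sas_token = generate_blob_sas(
    account_name=account_name,
    container_name="my-container",
    blob_name="data/file.csv",
    account_key=os.getenv("AZURE_STORAGE_ACCOUNT_KEY"),
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)

# Append the token to the blob URL to grant scoped, expiring access
blob_url = f"https://{account_name}.blob.core.windows.net/my-container/data/file.csv?{sas_token}"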

Extensibility

This skill is designed for Azure Blob Storage. For other cloud providers:

  • AWS S3: Create a separate aws-s3 skill following a similar pattern
  • Google Cloud Storage: Create a gcs-storage skill
  • MinIO: Create a minio skill for S3-compatible storage

Common Use Cases

  1. Data Discovery: List containers and blobs to find datasets
  2. Data Ingestion: Download files for processing or profiling
  3. Data Export: Upload transformed data back to blob storage
  4. Metadata Collection: Get file sizes, types, and modification dates
  5. Directory Traversal: Use prefixes and delimiters to navigate ADLS Gen2 directories (see the sketch after this list)
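
For directory-style traversal, a sketch using walk_blobs with a delimiter (the container name and the 2024/ prefix are placeholders; virtual directory entries are returned with a trailing slash):

# Walk one "directory" level at a time instead of listing every blob recursively
container_client = blob_service_client.get_container_client("my-container")
for item in container_client.walk_blobs(name_starts_with="2024/", delimiter="/"):
    # item is either a blob or a virtual directory (name ends with "/")
    print(item.name)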

Troubleshooting

  • Authentication Errors: Verify environment variables are set correctly
  • Permission Denied: Check storage account access policies and RBAC roles
  • Blob Not Found: Confirm container and blob names (case-sensitive)
  • Network Issues: Check firewall rules and network connectivity
  • Large File Timeouts: Increase the operation timeout or use chunked downloads (see the sketch below)
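
For large-file timeouts, a hedged sketch reusing a blob_client from the examples above: download_blob accepts a max_concurrency keyword for parallel range downloads and a per-operation timeout in seconds (the values here are illustrative only):

# Illustrative values: raise the server-side timeout and download ranges in parallel
download_stream = blob_client.download_blob(max_concurrency=4, timeout=600)
with open("large_file.bin", "wb") as file:
    download_stream.readinto(file)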