azure-blob-storage
Total installs: 0
Weekly installs: 1
Install command:
npx skills add https://github.com/fusionet24/aiskills --skill azure-blob-storage
Agent install distribution:
- amp: 1
- cline: 1
- opencode: 1
- cursor: 1
- continue: 1
- kimi-cli: 1
Skill Documentation
Azure Blob Storage Skill
Overview
This skill enables interaction with Azure Blob Storage and Azure Data Lake Storage Gen2 (ADLS Gen2). It provides capabilities for authenticating, listing containers/blobs, reading blob content, and managing blob metadata.
When to Use
- User mentions Azure Blob Storage, Azure Storage, or ADLS
- Need to list containers or blobs in a storage account
- Read data files from blob storage (CSV, JSON, Parquet)
- Upload or download files to/from blob storage
- Get blob metadata (size, last modified, content type)
- Work with hierarchical namespace (ADLS Gen2) directories
Prerequisites
Environment variables must be configured:
- AZURE_STORAGE_ACCOUNT_NAME: Storage account name
- AZURE_STORAGE_ACCOUNT_KEY: Storage account key, OR
- AZURE_STORAGE_CONNECTION_STRING: Complete connection string
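As a quick sanity check before creating any client, a minimal sketch like the following (using only the standard os module) can confirm that at least one of the authentication options above is configured:
import os
# Verify that either connection-string or account name + key auth is available
account_name = os.getenv("AZURE_STORAGE_ACCOUNT_NAME")
account_key = os.getenv("AZURE_STORAGE_ACCOUNT_KEY")
connection_string = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
if connection_string or (account_name and account_key):
    print("Azure Storage credentials found in environment")
else:
    raise EnvironmentError("Set AZURE_STORAGE_CONNECTION_STRING or AZURE_STORAGE_ACCOUNT_NAME + AZURE_STORAGE_ACCOUNT_KEY")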
Authentication
Option 1: Account Name + Key
import os
from azure.storage.blob import BlobServiceClient
account_name = os.getenv("AZURE_STORAGE_ACCOUNT_NAME")
account_key = os.getenv("AZURE_STORAGE_ACCOUNT_KEY")
account_url = f"https://{account_name}.blob.core.windows.net"
blob_service_client = BlobServiceClient(account_url, credential=account_key)
Option 2: Connection String
import os
from azure.storage.blob import BlobServiceClient
connection_string = os.getenv("AZURE_STORAGE_CONNECTION_STRING")
blob_service_client = BlobServiceClient.from_connection_string(connection_string)
Option 3: Managed Identity (for Azure-hosted apps)
import os
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient
account_name = os.getenv("AZURE_STORAGE_ACCOUNT_NAME")
account_url = f"https://{account_name}.blob.core.windows.net"
credential = DefaultAzureCredential()
blob_service_client = BlobServiceClient(account_url, credential=credential)
Common Operations
List Containers
from azure.storage.blob import BlobServiceClient
blob_service_client = BlobServiceClient(account_url, credential=account_key)
print("Containers in storage account:")
containers = blob_service_client.list_containers()
for container in containers:
    print(f" - {container.name}")
List Blobs in a Container
container_client = blob_service_client.get_container_client("my-container")
print("Blobs in container 'my-container':")
blobs = container_client.list_blobs()
for blob in blobs:
    print(f" - {blob.name} ({blob.size} bytes, modified: {blob.last_modified})")
Filter Blobs by Prefix (Directory-like)
# List all blobs in "2024/sales/" path
blobs = container_client.list_blobs(name_starts_with="2024/sales/")
for blob in blobs:
    print(f" - {blob.name}")
Download Blob Content
blob_client = blob_service_client.get_blob_client(
    container="my-container",
    blob="data/file.csv"
)
# Download to stream
download_stream = blob_client.download_blob()
content = download_stream.readall()
print(f"Downloaded {len(content)} bytes")
# Download to file
with open("local_file.csv", "wb") as file:
download_stream = blob_client.download_blob()
file.write(download_stream.readall())
Read CSV/JSON/Parquet from Blob
import pandas as pd
from io import BytesIO
blob_name = "sales/2024/sales.csv"
blob_client = blob_service_client.get_blob_client(
    container="data-container",
    blob=blob_name
)
# Download blob content
download_stream = blob_client.download_blob()
content = download_stream.readall()
# Read with pandas based on the file extension
if blob_name.endswith('.csv'):
    df = pd.read_csv(BytesIO(content))
elif blob_name.endswith('.json'):
    df = pd.read_json(BytesIO(content))
elif blob_name.endswith('.parquet'):
    df = pd.read_parquet(BytesIO(content))
print(f"Loaded DataFrame with {len(df)} rows and {len(df.columns)} columns")
Get Blob Metadata
blob_client = blob_service_client.get_blob_client(
    container="my-container",
    blob="data/file.parquet"
)
properties = blob_client.get_blob_properties()
print(f"Blob: {properties.name}")
print(f"Size: {properties.size} bytes")
print(f"Content Type: {properties.content_settings.content_type}")
print(f"Last Modified: {properties.last_modified}")
print(f"ETag: {properties.etag}")
Upload Blob
blob_client = blob_service_client.get_blob_client(
    container="my-container",
    blob="output/result.csv"
)
# Upload from local file
with open("local_result.csv", "rb") as data:
blob_client.upload_blob(data, overwrite=True)
# Upload from string/bytes
content = "col1,col2\nvalue1,value2"
blob_client.upload_blob(content.encode('utf-8'), overwrite=True)
Check if Blob Exists
blob_client = blob_service_client.get_blob_client(
    container="my-container",
    blob="data/file.csv"
)
if blob_client.exists():
    print("Blob exists")
else:
    print("Blob does not exist")
Error Handling
from azure.core.exceptions import ResourceNotFoundError, AzureError
try:
    blob_client = blob_service_client.get_blob_client(
        container="my-container",
        blob="data/file.csv"
    )
    content = blob_client.download_blob().readall()
except ResourceNotFoundError:
    print("Blob not found - check container and blob name")
except AzureError as e:
    print(f"Azure error: {e.message}")
except Exception as e:
    print(f"Unexpected error: {str(e)}")
Best Practices
- Use Specific Paths: Always specify full blob paths including container
- Handle Large Files: For large files (>100MB), use streaming downloads
- Batch Operations: When listing many blobs, use pagination (see the sketch below)
- Connection Reuse: Reuse a single BlobServiceClient instance for multiple operations
- Error Handling: Always wrap blob operations in try/except blocks
- Authentication: Prefer managed identity in production over account keys
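A minimal sketch of page-by-page listing for the Batch Operations point above, assuming an existing container_client (the page size is illustrative):
# Iterate over blobs one page at a time instead of materialising the full listing
pages = container_client.list_blobs(results_per_page=100).by_page()
for page_number, page in enumerate(pages, start=1):
    blobs_in_page = list(page)
    print(f"Page {page_number}: {len(blobs_in_page)} blobs")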
Security Notes
- Never commit account keys or connection strings to version control
- Use environment variables or Azure Key Vault for credentials
- In production, use managed identities instead of account keys
- Apply least-privilege access using SAS tokens or Azure RBAC (see the SAS sketch below)
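As one way to apply least-privilege access, a short-lived, read-only SAS can be generated for a single blob. A minimal sketch using generate_blob_sas, reusing the account_name and account_key variables from the Authentication section (container and blob names are illustrative):
from datetime import datetime, timedelta, timezone
from azure.storage.blob import generate_blob_sas, BlobSasPermissions
# Read-only SAS for one blob, valid for one hour
sas_token = generate_blob_sas(
    account_name=account_name,
    container_name="my-container",
    blob_name="data/file.csv",
    account_key=account_key,
    permission=BlobSasPermissions(read=True),
    expiry=datetime.now(timezone.utc) + timedelta(hours=1),
)
blob_url_with_sas = f"https://{account_name}.blob.core.windows.net/my-container/data/file.csv?{sas_token}"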
Extensibility
This skill is designed for Azure Blob Storage. For other cloud providers:
- AWS S3: Create a separate aws-s3 skill following a similar pattern
- Google Cloud Storage: Create a gcs-storage skill
- MinIO: Create a minio skill for S3-compatible storage
Common Use Cases
- Data Discovery: List containers and blobs to find datasets
- Data Ingestion: Download files for processing or profiling
- Data Export: Upload transformed data back to blob storage
- Metadata Collection: Get file sizes, types, and modification dates
- Directory Traversal: Use prefixes to navigate ADLS Gen2 directories
Troubleshooting
- Authentication Errors: Verify env variables are set correctly
- Permission Denied: Check storage account access policies and RBAC roles
- Blob Not Found: Confirm container and blob names (case-sensitive)
- Network Issues: Check firewall rules and network connectivity
- Large File Timeouts: Increase the timeout or use chunked downloads (see the sketch below)
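For the last point, a minimal sketch of a chunked download that avoids holding the whole blob in memory, assuming an existing blob_client (the max_concurrency value and local filename are illustrative):
# Stream the blob to disk chunk by chunk instead of calling readall()
downloader = blob_client.download_blob(max_concurrency=4)
with open("large_local_file.parquet", "wb") as file:
    for chunk in downloader.chunks():
        file.write(chunk)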