Arc 2026.01.1 Release Notes
New Features
Official Python SDK
The official Python SDK for Arc is now available on PyPI as arc-tsdb-client.
Installation:
```bash
pip install arc-tsdb-client

# With DataFrame support
pip install arc-tsdb-client[pandas]   # pandas
pip install arc-tsdb-client[polars]   # polars
pip install arc-tsdb-client[all]      # all optional dependencies
```

Key features:
- High-performance MessagePack columnar ingestion
- Query support with JSON, Arrow IPC, pandas, polars, and PyArrow responses
- Full async API with httpx
- Buffered writes with automatic batching (size and time thresholds)
- Complete management API (retention policies, continuous queries, delete operations, authentication)
- DataFrame integration for pandas, polars, and PyArrow
Documentation: https://docs.basekick.net/arc/sdks/python
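To give a feel for the client, here is a hypothetical usage sketch based on the feature list above. The import path, class, and method names (`ArcClient`, `write`, `query`) and the record format are illustrative only, not the published API; consult the documentation linked above for the actual interface.

```python
# Hypothetical sketch only: names and signatures below are illustrative,
# not the published arc-tsdb-client API.
from arc_tsdb_client import ArcClient  # assumed import path

client = ArcClient("http://localhost:8000", token="my-api-token")

# Buffered write: records are batched and flushed on size/time thresholds
client.write(
    database="mydb",
    measurement="cpu",
    records=[{"host": "web-01", "usage": 42.5, "timestamp": 1735689600000}],
)

# Query returning a pandas DataFrame (requires the [pandas] extra)
df = client.query("SELECT * FROM mydb.cpu LIMIT 10", format="pandas")
print(df.head())
```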
Azure Blob Storage Backend
Arc now supports Azure Blob Storage as a storage backend, enabling deployment on Microsoft Azure infrastructure.
Configuration options:
- `storage_backend = "azure"` or `"azblob"`
- Connection string authentication
- Account key authentication
- SAS token authentication
- Managed Identity support (recommended for Azure deployments)
Example configuration:
```toml
[storage]
backend = "azure"
azure_container = "arc-data"
azure_account_name = "mystorageaccount"
# Use one of: connection_string, account_key, sas_token, or managed identity
azure_use_managed_identity = true
```

Native TLS/SSL Support
Arc now supports native HTTPS/TLS without requiring a reverse proxy, ideal for users running Arc from native packages (deb/rpm) on bare metal or VMs.
Configuration options:
- `server.tls_enabled` - Enable/disable native TLS
- `server.tls_cert_file` - Path to certificate PEM file
- `server.tls_key_file` - Path to private key PEM file
Environment variables:
- `ARC_SERVER_TLS_ENABLED`
- `ARC_SERVER_TLS_CERT_FILE`
- `ARC_SERVER_TLS_KEY_FILE`
Example configuration:
```toml
[server]
port = 443
tls_enabled = true
tls_cert_file = "/etc/letsencrypt/live/example.com/fullchain.pem"
tls_key_file = "/etc/letsencrypt/live/example.com/privkey.pem"
```

Key features:
- Uses Fiber's built-in `ListenTLS()` for direct HTTPS support
- Automatic HSTS header (`Strict-Transport-Security`) when TLS is enabled
- Certificate and key file validation on startup
- Backward compatible - TLS disabled by default
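Once TLS is enabled, clients can reach Arc over HTTPS without a reverse proxy in front. A quick sanity check, assuming the hostname and token below are placeholders (the `/api/v1/databases` endpoint is described in the Database Management API section):

```python
import requests

TOKEN = "my-api-token"            # placeholder API token
ARC_URL = "https://example.com"   # Arc served directly over native TLS

# List databases over HTTPS; no reverse proxy in between
resp = requests.get(
    f"{ARC_URL}/api/v1/databases",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```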
Configurable Ingestion Concurrency
Ingestion concurrency settings are now configurable to support high-concurrency deployments with many simultaneous clients.
Configuration options:
- `ingest.flush_workers` - Async flush worker pool size (default: 2x CPU cores, min 8, max 64)
- `ingest.flush_queue_size` - Pending flush queue capacity (default: 4x workers, min 100)
- `ingest.shard_count` - Buffer shards for lock distribution (default: 32)
Environment variables:
- `ARC_INGEST_FLUSH_WORKERS`
- `ARC_INGEST_FLUSH_QUEUE_SIZE`
- `ARC_INGEST_SHARD_COUNT`
Example configuration for high concurrency:
```toml
[ingest]
flush_workers = 32      # More workers for parallel I/O
flush_queue_size = 200  # Larger queue for burst handling
shard_count = 64        # More shards to reduce lock contention
```

Key features:
- Defaults scale dynamically with CPU cores (similar to QuestDB and InfluxDB)
- Previously hardcoded values now tunable for specific workloads
- Helps prevent flush queue overflow under high concurrent load
Data-Time Partitioning
Parquet files are now organized by the data's timestamp instead of ingestion time, enabling proper backfill of historical data.
Key features:
- Historical data lands in correct time-based partitions (e.g., December 2024 data goes to `2024/12/` folders, not today's folder)
- Batches spanning multiple hours are automatically split into separate files per hour
- Data is sorted by timestamp within each Parquet file for optimal query performance
- Enables accurate partition pruning for time-range queries
How it works:
- Single-hour batches: sorted and written to one file
- Multi-hour batches: split by hour boundary, each hour sorted independently
Example: Backfilling data from December 1st, 2024:
```text
# Before: All data went to ingestion date
data/mydb/cpu/2025/01/04/...      (wrong - today's partition)

# After: Data goes to correct historical partition
data/mydb/cpu/2024/12/01/14/...   (correct - data's timestamp)
data/mydb/cpu/2024/12/01/15/...
```
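A minimal sketch of the partition layout implied by these paths. The real writer is part of Arc's ingestion pipeline (in Go); this only shows how the hourly partition is derived from the data's UTC timestamp:

```python
from datetime import datetime, timezone

def partition_prefix(db: str, measurement: str, ts: datetime) -> str:
    """Derive the hourly partition prefix from the data's timestamp, normalized to UTC."""
    ts = ts.astimezone(timezone.utc)
    return f"data/{db}/{measurement}/{ts:%Y/%m/%d/%H}/"

# Backfilled December 2024 data lands in its historical partition, not today's
print(partition_prefix("mydb", "cpu", datetime(2024, 12, 1, 14, 30, tzinfo=timezone.utc)))
# -> data/mydb/cpu/2024/12/01/14/
```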
Contributed by @schotime
Compaction API Triggers
Hourly and daily compaction now have separate schedules and can be triggered manually via API.
API Endpoints:
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/v1/compaction/hourly` | Trigger hourly compaction |
| POST | `/api/v1/compaction/daily` | Trigger daily compaction |
Configuration:
```toml
[compaction]
hourly_schedule = "0 * * * *"  # Every hour
daily_schedule = "0 2 * * *"   # Daily at 2 AM
```

Contributed by @schotime
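For example, a maintenance script can trigger compaction on demand instead of waiting for the cron schedule (a sketch using Python's `requests`; host and token are placeholders):

```python
import requests

TOKEN = "my-api-token"             # placeholder API token
ARC_URL = "http://localhost:8000"

# Manually kick off daily compaction via the new API trigger
resp = requests.post(
    f"{ARC_URL}/api/v1/compaction/daily",
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
```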
Configurable Max Payload Size
The maximum request payload size for write endpoints is now configurable, with the default increased from 100MB to 1GB.
Configuration options:
- `server.max_payload_size` - Maximum payload size (e.g., "1GB", "500MB")
- Environment variable: `ARC_SERVER_MAX_PAYLOAD_SIZE`
Example configuration:
```toml
[server]
max_payload_size = "2GB"
```

Key features:
- Applies to both compressed and decompressed payloads
- Supports human-readable units: B, KB, MB, GB
- Improved error messages suggest batching when limit is exceeded
- Default increased 10x from 100MB to 1GB to support larger bulk imports
Database Management API
New REST API endpoints for managing databases programmatically, enabling pre-creation of databases before agents send data.
Endpoints:
| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/v1/databases` | List all databases with measurement counts |
| POST | `/api/v1/databases` | Create a new database |
| GET | `/api/v1/databases/:name` | Get database info |
| GET | `/api/v1/databases/:name/measurements` | List measurements in a database |
| DELETE | `/api/v1/databases/:name` | Delete a database (requires `delete.enabled=true`) |
Example usage:
```bash
# List databases
curl -H "Authorization: Bearer $TOKEN" http://localhost:8000/api/v1/databases

# Create a database
curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "production"}' \
  http://localhost:8000/api/v1/databases

# Delete a database (requires confirmation)
curl -X DELETE -H "Authorization: Bearer $TOKEN" \
  "http://localhost:8000/api/v1/databases/old_data?confirm=true"
```

Key features:
- Database name validation (alphanumeric, underscore, hyphen; must start with letter; max 64 characters)
- Reserved names protected (`system`, `internal`, `_internal`)
- DELETE respects the `delete.enabled` configuration for safety
- DELETE requires the `?confirm=true` query parameter
- Works with all storage backends (local, S3, Azure)
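The same endpoints make it easy to pre-create databases from a provisioning script before agents start writing (a Python sketch equivalent to the curl examples above; host and token are placeholders):

```python
import requests

TOKEN = "my-api-token"             # placeholder API token
ARC_URL = "http://localhost:8000"

# Pre-create databases so agents can write to them immediately
for name in ("production", "staging"):
    resp = requests.post(
        f"{ARC_URL}/api/v1/databases",
        headers={"Authorization": f"Bearer {TOKEN}"},
        # names must be alphanumeric/underscore/hyphen and start with a letter
        json={"name": name},
    )
    resp.raise_for_status()
```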
DuckDB S3 Query Support (httpfs)
Arc now configures the DuckDB httpfs extension automatically, enabling direct queries against Parquet files stored in S3.
Key improvements:
- Automatic httpfs extension installation and configuration
- S3 credentials passed to DuckDB for authenticated access
- `SET GLOBAL` used to persist credentials across the connection pool
- Works with standard S3 buckets (note: S3 Express One Zone uses different auth and is not supported by httpfs)
Configuration:
```toml
[storage]
backend = "s3"
s3_bucket = "my-bucket"
s3_region = "us-east-2"
# Credentials via environment variables recommended:
# AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY
```

Improvements
Storage Backend Interface Enhancements
- Added `ListDirectories()` method for efficient partition discovery
- Added `ListObjects()` method for listing files within partitions
- Both local and S3 backends implement the enhanced interface
Compaction Subprocess Improvements
- Fixed "argument list too long" error when compacting partitions with many files
- Job configuration now passed via stdin instead of command-line arguments
- Supports compaction of partitions with 15,000+ files
Arrow Writer Enhancements
- Added row-to-columnar conversion for efficient data ingestion
- Improved buffer management for high-throughput scenarios
Ingestion Pipeline Optimizations
- Zstd compression support: Added Zstd decompression for MessagePack payloads. Zstd achieves 9.57M rec/sec with only 5% overhead vs uncompressed (compared to 12% overhead with GZIP at 8.85M rec/sec). Auto-detected via magic bytes - no client configuration required (see the detection sketch after this list).
- Consolidated type conversion helpers: Extracted common `toInt64()`, `toFloat64()`, and `firstNonNil()` functions, eliminating ~100 lines of duplicate code across the ingestion pipeline.
- O(n log n) column sorting: Replaced O(n²) bubble sort with `sort.Slice()` for column ordering in schema inference.
- Single-pass timestamp normalization: Reduced from 2-3 passes to a single pass for timestamp type conversion and unit normalization.
- Result: 7% throughput improvement (9.47M → 10.1M rec/s), 63% p50 latency reduction (8.40ms → 3.09ms), 84% p99 latency reduction (42.29ms → 6.73ms).
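To illustrate why no client configuration is needed for compressed payloads: the codec is recognized from the payload's leading magic bytes. A minimal sketch of that detection (Arc's server-side implementation is in Go; the Python below only shows the idea):

```python
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # zstd frame magic number (0xFD2FB528, little-endian)
GZIP_MAGIC = b"\x1f\x8b"          # gzip member header

def detect_compression(payload: bytes) -> str:
    """Identify the compression codec of an incoming payload by its magic bytes."""
    if payload.startswith(ZSTD_MAGIC):
        return "zstd"
    if payload.startswith(GZIP_MAGIC):
        return "gzip"
    return "none"
```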
Authentication Performance Optimizations
- Token lookup index: Added a `token_prefix` column with a database index for O(1) token lookup instead of an O(n) full table scan. Reduces bcrypt comparisons from O(n/2) on average to O(1-2) per cache miss (a conceptual sketch follows this list).
- Atomic cache counters: Replaced mutex-protected counters with `atomic.Int64` operations, eliminating lock contention on cache hit/miss tracking.
- Auth metrics integration: Added Prometheus metrics for authentication requests, cache hits/misses, and auth failures for better observability.
- Consolidated token extraction: Extracted a common `ExtractTokenFromRequest()` helper, eliminating duplicate token header parsing between the middleware and the auth handler.
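A conceptual sketch of the token-prefix lookup (the real index lives in Arc's auth database and is implemented in Go; the names below are illustrative): only the stored hashes sharing the token's prefix are bcrypt-compared, instead of scanning every row.

```python
import bcrypt  # pip install bcrypt

PREFIX_LEN = 8
# token_prefix -> bcrypt hashes; stands in for the indexed token_prefix column
token_index: dict[str, list[bytes]] = {}

def register_token(token: str) -> None:
    hashed = bcrypt.hashpw(token.encode(), bcrypt.gensalt())
    token_index.setdefault(token[:PREFIX_LEN], []).append(hashed)

def authenticate(token: str) -> bool:
    # Only candidates sharing the prefix are bcrypt-compared: O(1-2) instead of O(n/2)
    candidates = token_index.get(token[:PREFIX_LEN], [])
    return any(bcrypt.checkpw(token.encode(), h) for h in candidates)
```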
Query Performance Optimizations
- Arrow IPC throughput boost: Arrow IPC query responses now deliver 5.2M rows/sec (80% improvement from 2.88M rows/sec). Full table scans achieve 927M rows/sec (596M records in 685ms).
- SQL transform caching: Added a 60-second TTL cache for SQL-to-storage-path transformations. This caches the result of converting table references (e.g., `FROM mydb.cpu`) to DuckDB `read_parquet()` calls (e.g., `FROM read_parquet('./data/mydb/cpu/**/*.parquet')`). Benchmarks show a 49-104x speedup on cache hits (~300ns vs 13-37μs per transformation). Particularly beneficial for dashboard refresh scenarios where the same queries are executed repeatedly (a minimal sketch of the TTL-cache pattern follows this list).
- Partition path caching: Added a 60-second TTL cache for `OptimizeTablePath()` results. Saves 50-100ms per recurring query pattern (significant for dashboard refresh scenarios).
- Glob result caching: Added a 30-second TTL cache for `filepath.Glob()` results. Saves 5-10ms per query for large partition sets by avoiding repeated filesystem operations.
- Cache statistics available via `pruner.GetAllCacheStats()` for monitoring hit rates.
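A minimal sketch of the TTL-caching pattern behind the SQL transform, partition path, and glob caches (Arc implements this in Go; the names below are illustrative):

```python
import time

class TTLCache:
    """Tiny TTL cache illustrating the transform/path/glob caching pattern."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._entries: dict[str, tuple[float, object]] = {}

    def get_or_compute(self, key: str, compute):
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry and now - entry[0] < self.ttl:
            return entry[1]                 # cache hit: skip the expensive transform
        value = compute(key)                # cache miss: e.g. rewrite FROM mydb.cpu into a
        self._entries[key] = (now, value)   # read_parquet('./data/mydb/cpu/**/*.parquet') call
        return value

# 60-second TTL, matching the SQL transform cache described above
sql_transform_cache = TTLCache(ttl_seconds=60)
```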
Storage Roundtrip Optimizations
- Fixed N+1 query pattern in database listing: Listing databases with measurement counts now uses 2 storage calls instead of N+1 (90% reduction for 20 databases).
- Optimized database existence checks: Direct marker file lookup via `storage.Exists()` instead of listing all databases (O(1) vs O(n)).
- Removed redundant existence checks: `handleListMeasurements` now combines the marker file check with measurement listing in a single flow.
- Batch row counting in delete handler: Replaced N individual COUNT queries with a single batch query using `read_parquet()` with a file list.
- Combined before/after row counts: A single query with `COUNT(*) FILTER` replaces two separate COUNT queries during delete operations.
- Extracted partition pruning helper: Reduced ~190 lines of duplicated code to ~90 lines with a `buildReadParquetExpr()` helper.
Bug Fixes
- Fixed DuckDB S3 credentials not persisting across the connection pool (changed `SET` to `SET GLOBAL`)
- Fixed compaction subprocess failing with large file counts
- Fixed CTE (Common Table Expression) support - CTEs now work correctly in queries. Previously, CTE names like `WITH campaign AS (...)` were incorrectly converted to physical storage paths, causing "No files found" errors. CTE names are now properly recognized and preserved as virtual table references.
- Fixed JOIN clause table resolution - `JOIN database.table` syntax now correctly converts to `read_parquet()` paths. Previously only `FROM` clauses were handled.
- Fixed string literal corruption in queries - String literals containing SQL keywords (e.g., `WHERE msg = 'SELECT * FROM mydb.cpu'`) are no longer incorrectly rewritten. String content is now protected during SQL-to-storage-path conversion.
- Fixed SQL comment handling - Comments containing table references (e.g., `-- FROM mydb.cpu`) are no longer incorrectly converted to storage paths. Both single-line (`--`) and multi-line (`/* */`) comments are now properly stripped before processing.
- Added LATERAL JOIN support - `LATERAL JOIN`, `CROSS JOIN LATERAL`, and other LATERAL join variants now correctly convert table references to storage paths.
- Fixed UTC consistency in path generation - Storage paths now consistently use UTC time instead of the local timezone, preventing partition misalignment across different server timezones.
Performance
Tested at 10.1M records/second with:
- p50 latency: 3.09ms
- p95 latency: 5.16ms
- p99 latency: 6.73ms
- p999 latency: 9.29ms
Breaking Changes
None
Upgrade Notes
- S3 credentials: For the S3 storage backend, credentials are now also passed to DuckDB for httpfs queries. Ensure the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables are set, or configure `s3_access_key` and `s3_secret_key` in the config file.
- Azure backend: New storage backend option. No changes required for existing S3 or local deployments.
- Token prefix migration: Existing API tokens will be automatically migrated on startup. Legacy tokens are marked with a special prefix and continue to work normally. New tokens and rotated tokens benefit from O(1) lookup performance. No action required.
Contributors
Thanks to the following contributors for this release:
- @schotime (Adam Schroder) - Data-time partitioning, compaction API triggers, UTC fixes
Dependencies
- Added `github.com/Azure/azure-sdk-for-go/sdk/storage/azblob` for Azure Blob Storage support
- Added `github.com/Azure/azure-sdk-for-go/sdk/azidentity` for Azure authentication