
DataStore Logging

DataStore uses Python's standard logging module. This guide shows how to configure logging for debugging.

Quick Start

from chdb import datastore as pd
from chdb.datastore.config import config

# Enable debug logging
config.enable_debug()

# Now all operations will log details
ds = pd.read_csv("data.csv")
result = ds.filter(ds['age'] > 25).to_df()

Log Levels

Level      Value   Description
DEBUG      10      Detailed information for debugging
INFO       20      General operational information
WARNING    30      Warning messages (default)
ERROR      40      Error messages
CRITICAL   50      Critical failures
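Since DataStore uses Python's standard logging module, these levels are the plain integers defined by logging, and you can query them with the stdlib API:

import logging

# The table above matches the stdlib constants
assert logging.DEBUG == 10 and logging.WARNING == 30

# Check whether DataStore would emit DEBUG records right now
print(logging.getLogger('chdb.datastore').isEnabledFor(logging.DEBUG))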

Setting Log Level

import logging
from chdb.datastore.config import config

# Using standard logging levels
config.set_log_level(logging.DEBUG)
config.set_log_level(logging.INFO)
config.set_log_level(logging.WARNING)  # Default
config.set_log_level(logging.ERROR)

# Using quick preset
config.enable_debug()  # Sets DEBUG level + verbose format

Log Format

Simple Format (Default)

config.set_log_format("simple")

Output:

DEBUG - Executing SQL query
DEBUG - Cache miss for key abc123

Verbose Format

config.set_log_format("verbose")

Output:

2024-01-15 10:30:45.123 DEBUG datastore.core - Executing SQL query
2024-01-15 10:30:45.456 DEBUG datastore.cache - Cache miss for key abc123

What Gets Logged

DEBUG Level

  • SQL queries generated
  • Execution engine selection
  • Cache operations (hits/misses)
  • Operation timings
  • Data source information

Example:

DEBUG - Creating DataStore from file 'data.csv'
DEBUG - SQL: SELECT * FROM file('data.csv', 'CSVWithNames') WHERE age > 25
DEBUG - Using engine: chdb
DEBUG - Execution time: 0.089s
DEBUG - Cache: Storing result (key: abc123)

INFO Level

  • Major operation completions
  • Configuration changes
  • Data source connections

Example:

INFO - Loaded 1,000,000 rows from data.csv
INFO - Execution engine set to: chdb
INFO - Connected to MySQL: localhost:3306/mydb

WARNING Level

  • Deprecated feature usage
  • Performance warnings
  • Non-critical issues

Example:

WARNING - Large result set (>1M rows) may cause memory issues
WARNING - Cache TTL exceeded, re-executing query
WARNING - Column 'date' has mixed types, using string

ERROR Level

  • Query execution failures
  • Connection errors
  • Data conversion errors

Example:

ERROR - Failed to execute SQL: syntax error near 'FORM'
ERROR - Connection to MySQL failed: timeout
ERROR - Cannot convert column 'price' to float
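Because these are standard logging records, you can split them by level with ordinary handler thresholds, e.g. full DEBUG detail on the console but only WARNING and above persisted to disk. A minimal stdlib-only sketch (the file name is illustrative):

import logging

ds_logger = logging.getLogger('chdb.datastore')
ds_logger.setLevel(logging.DEBUG)  # let all records through the logger itself

console = logging.StreamHandler()
console.setLevel(logging.DEBUG)        # console: everything

warn_file = logging.FileHandler('datastore_warnings.log')
warn_file.setLevel(logging.WARNING)    # file: warnings and errors only

ds_logger.addHandler(console)
ds_logger.addHandler(warn_file)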

Custom Logging Configuration

Using Python Logging

import logging

# Configure root logger
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('datastore.log'),
        logging.StreamHandler()
    ]
)

# Get DataStore logger
ds_logger = logging.getLogger('chdb.datastore')
ds_logger.setLevel(logging.DEBUG)
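By default, records from 'chdb.datastore' also propagate up to the root logger, so a record can be emitted twice when both loggers have handlers attached. If you see duplicate lines, disable propagation:

# Keep records on the DataStore logger's own handlers only
ds_logger = logging.getLogger('chdb.datastore')
ds_logger.propagate = False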

Log to File

import logging

# Create file handler
file_handler = logging.FileHandler('datastore_debug.log')
file_handler.setLevel(logging.DEBUG)
file_handler.setFormatter(logging.Formatter(
    '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
))

# Add to DataStore logger; lower the logger's level too, or its default
# (WARNING) will filter DEBUG records before the handler ever sees them
ds_logger = logging.getLogger('chdb.datastore')
ds_logger.setLevel(logging.DEBUG)
ds_logger.addHandler(file_handler)

Suppress Logging

import logging
from chdb.datastore.config import config

# Suppress all DataStore logs
logging.getLogger('chdb.datastore').setLevel(logging.CRITICAL)

# Or using config
config.set_log_level(logging.CRITICAL)
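If you only want to quiet one noisy subsystem rather than everything, the standard logger hierarchy lets you raise the level of a child logger. A sketch; the child logger name below is inferred from the verbose output shown earlier in this guide, so adjust it to whatever logger names appear in your own logs:

import logging

# Silence only the cache subsystem; everything else keeps its level
logging.getLogger('chdb.datastore.cache').setLevel(logging.CRITICAL)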

Debugging Scenarios

Debug SQL Generation

config.enable_debug()

ds = pd.read_csv("data.csv")
result = ds.filter(ds['age'] > 25).groupby('city').sum()

Log output:

DEBUG - Creating DataStore from file 'data.csv'
DEBUG - Building filter: age > 25
DEBUG - Building groupby: city
DEBUG - Building aggregation: sum
DEBUG - Generated SQL:
        SELECT city, SUM(age)
        FROM file('data.csv', 'CSVWithNames')
        WHERE age > 25
        GROUP BY city

Debug Engine Selection

config.enable_debug()

result = ds.filter(ds['x'] > 10).apply(custom_func)  # custom_func: any user-defined Python function

Log output:

DEBUG - filter: selecting engine (eligible: chdb, pandas)
DEBUG - filter: using chdb (SQL-compatible)
DEBUG - apply: selecting engine (eligible: pandas)
DEBUG - apply: using pandas (custom function)

Debug Cache Operations

config.enable_debug()

# First execution
result1 = ds.filter(ds['age'] > 25).to_df()
# DEBUG - Cache miss for query hash abc123
# DEBUG - Executing query...
# DEBUG - Caching result (key: abc123, size: 1.2MB)

# Second execution (same query)
result2 = ds.filter(ds['age'] > 25).to_df()
# DEBUG - Cache hit for query hash abc123
# DEBUG - Returning cached result

Debug Performance Issues

config.enable_debug()
config.enable_profiling()

# Logs will show timing for each operation
result = (ds
    .filter(ds['amount'] > 100)
    .groupby('region')
    .agg({'amount': 'sum'})
    .to_df()
)

Log output:

DEBUG - filter: 0.002ms
DEBUG - groupby: 0.001ms
DEBUG - agg: 0.003ms
DEBUG - SQL generation: 0.012ms
DEBUG - SQL execution: 89.456ms  <- Main time spent here
DEBUG - Result conversion: 2.345ms
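To analyze these timings programmatically instead of reading console output, you can collect the records with a small custom handler. A stdlib-only sketch; the 'execution' substring filter assumes timing messages like those shown above:

import logging

records = []

class ListHandler(logging.Handler):
    """Append each formatted record to a list for later inspection."""
    def emit(self, record):
        records.append(self.format(record))

handler = ListHandler(level=logging.DEBUG)
handler.setFormatter(logging.Formatter('%(name)s - %(message)s'))
logging.getLogger('chdb.datastore').addHandler(handler)

# ... run the pipeline above, then look for the slow steps:
slow_steps = [r for r in records if 'execution' in r.lower()]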

Production Configuration

import logging
from chdb.datastore.config import config

# Production: minimal logging
config.set_log_level(logging.WARNING)
config.set_log_format("simple")
config.set_profiling_enabled(False)

Log Rotation

import logging
from logging.handlers import RotatingFileHandler

# Create rotating file handler
handler = RotatingFileHandler(
    'datastore.log',
    maxBytes=10*1024*1024,  # 10MB
    backupCount=5
)
handler.setLevel(logging.WARNING)

# Add to DataStore logger
logging.getLogger('chdb.datastore').addHandler(handler)
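If you prefer time-based rotation over size-based, the standard library's TimedRotatingFileHandler works the same way; for example, rotating at midnight and keeping a week of files:

import logging
from logging.handlers import TimedRotatingFileHandler

handler = TimedRotatingFileHandler(
    'datastore.log',
    when='midnight',   # rotate once per day
    backupCount=7      # keep the last 7 days
)
handler.setLevel(logging.WARNING)
logging.getLogger('chdb.datastore').addHandler(handler)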

Environment Variables

You can also configure logging via environment variables:

# Set log level
export CHDB_LOG_LEVEL=DEBUG

# Set log format
export CHDB_LOG_FORMAT=verbose

Then read them in Python at startup:

import os
import logging
from chdb.datastore.config import config

# Read from environment, falling back to WARNING for unset or unknown names
log_level = os.environ.get('CHDB_LOG_LEVEL', 'WARNING').upper()
config.set_log_level(getattr(logging, log_level, logging.WARNING))
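The format preset can be driven the same way; "simple" and "verbose" are the two values this guide defines:

# Apply the format preset from the environment as well
config.set_log_format(os.environ.get('CHDB_LOG_FORMAT', 'simple'))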

Summary

Task            Command
Enable debug    config.enable_debug()
Set level       config.set_log_level(logging.DEBUG)
Set format      config.set_log_format("verbose")
Log to file     Use Python logging handlers
Suppress logs   config.set_log_level(logging.CRITICAL)