Function-Level Configuration
DataStore allows fine-grained control over execution at the function level, including engine selection and dtype correction.
Function Engine Configuration
Override the execution engine for specific functions.
Setting Function Engines
When to Use
Force chdb for:
- Functions with better ClickHouse performance
- Functions that benefit from SQL optimization
- Large-scale string/datetime operations
Force pandas for:
- Functions with pandas-specific behavior
- When exact pandas compatibility is required
- Custom string operations
Example
Overlapping Functions
159+ functions are available in both chdb and pandas engines:
| Category | Functions |
|---|---|
| String | length, upper, lower, trim, ltrim, rtrim, concat, substring, replace, reverse, contains, startswith, endswith |
| Math | abs, round, floor, ceil, exp, log, log10, sqrt, pow, sin, cos, tan |
| DateTime | year, month, day, hour, minute, second, dayofweek, dayofyear, quarter |
| Aggregation | sum, avg, min, max, count, std, var, median |
For overlapping functions, the engine is selected based on:
- Explicit function configuration (if set)
- Global execution_engine setting
- Auto-selection based on context
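The selection criteria above can be sketched as a small resolver. This is only an illustration, assuming the three criteria are checked in the order listed; `resolve_engine`, `function_overrides`, `auto_choice`, and the `"auto"` sentinel are hypothetical names chosen for this sketch and are not DataStore's API.

```python
from typing import Dict, Optional


def resolve_engine(
    func_name: str,
    function_overrides: Dict[str, str],
    global_engine: str = "auto",
    auto_choice: str = "chdb",
) -> str:
    """Pick an engine for an overlapping function (hypothetical helper)."""
    # 1. An explicit per-function setting wins when one is present.
    override: Optional[str] = function_overrides.get(func_name)
    if override is not None:
        return override
    # 2. Otherwise a concrete global execution_engine setting applies.
    if global_engine != "auto":
        return global_engine
    # 3. Otherwise use whatever auto-selection chose for this context
    #    (passed in here, since the real heuristics are not modeled).
    return auto_choice


overrides = {"upper": "pandas"}
print(resolve_engine("upper", overrides, global_engine="chdb"))   # pandas
print(resolve_engine("length", overrides, global_engine="chdb"))  # chdb
print(resolve_engine("length", overrides, auto_choice="pandas"))  # pandas
```

Under this assumption, an override for `upper` takes effect even when the global setting names a different engine, while unconfigured functions follow the global setting or the auto-selected engine.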
chdb-Only Functions
Some functions are only available through ClickHouse:
| Category | Functions |
|---|---|
| Array | arraySum, arrayAvg, arraySort, arrayDistinct, groupArray, arrayElement |
| JSON | JSONExtractString, JSONExtractInt, JSONExtractFloat, JSONHas |
| URL | domain, path, protocol, extractURLParameter |
| IP | IPv4StringToNum, IPv4NumToString, isIPv4String |
| Geo | greatCircleDistance, geoDistance, geoToH3 |
| Hash | cityHash64, xxHash64, sipHash64, MD5, SHA256 |
| Conditional | sumIf, countIf, avgIf, minIf, maxIf |
These functions automatically use the chdb engine regardless of configuration.
pandas-Only Functions
Some functions are only available through pandas:
| Category | Functions |
|---|---|
| Apply | Custom lambda functions, user-defined functions |
| Complex Pivot | Pivot tables with custom aggregations |
| Stack/Unstack | Complex reshaping operations |
| Interpolate | Time series interpolation methods |
These functions automatically use the pandas engine regardless of configuration.
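The effect of engine-exclusive functions on configuration can be modeled as a check that runs before any configured engine is considered. The sketch below is hypothetical: `effective_engine` and the two sets are illustrative names, the chdb-only entries are a small sample from the table above, and the pandas-only entries (`apply`, `interpolate`, `stack`, `unstack`) are assumed method names for the listed categories.

```python
# Sample of engine-exclusive functions (illustrative, not exhaustive).
CHDB_ONLY = frozenset({"arraySum", "JSONExtractString", "cityHash64", "sumIf"})
PANDAS_ONLY = frozenset({"apply", "interpolate", "stack", "unstack"})


def effective_engine(func_name: str, configured_engine: str) -> str:
    """Return the engine that runs func_name (hypothetical helper)."""
    # Exclusive functions ignore the configured engine entirely.
    if func_name in CHDB_ONLY:
        return "chdb"
    if func_name in PANDAS_ONLY:
        return "pandas"
    # Overlapping functions keep whatever was configured.
    return configured_engine


print(effective_engine("cityHash64", "pandas"))  # chdb
print(effective_engine("interpolate", "chdb"))   # pandas
print(effective_engine("upper", "pandas"))       # pandas
```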
Dtype Correction
Configure how DataStore corrects data types between engines.
Correction Levels
Correction Level Details
| Level | Description | Types Corrected |
|---|---|---|
| NONE | No automatic correction | None |
| CRITICAL | Essential corrections | NULL handling, boolean conversion |
| HIGH (default) | Common corrections | Integer/float precision, datetime, string encoding |
| MEDIUM | More corrections | Decimal precision, timezone handling |
| ALL | Maximum correction | All type differences |
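One way to read the table is as a set of cumulative tiers, where each level also applies the corrections of the levels listed before it. The sketch below models that reading with a plain `IntEnum`. The cumulative behavior is an assumption, and `CorrectionLevel`, `TYPES_BY_LEVEL`, and `corrections_for` are hypothetical names for illustration, not DataStore's own definitions.

```python
from enum import IntEnum
from typing import Dict, List


class CorrectionLevel(IntEnum):
    """Levels in table order (hypothetical model of the table above)."""
    NONE = 0
    CRITICAL = 1
    HIGH = 2
    MEDIUM = 3
    ALL = 4


# Type categories introduced at each level, copied from the table.
TYPES_BY_LEVEL: Dict[CorrectionLevel, List[str]] = {
    CorrectionLevel.CRITICAL: ["NULL handling", "boolean conversion"],
    CorrectionLevel.HIGH: ["integer/float precision", "datetime", "string encoding"],
    CorrectionLevel.MEDIUM: ["decimal precision", "timezone handling"],
    CorrectionLevel.ALL: ["all remaining type differences"],
}


def corrections_for(level: CorrectionLevel) -> List[str]:
    """List every category corrected at `level`, assuming cumulative tiers."""
    applied: List[str] = []
    for tier, types in TYPES_BY_LEVEL.items():
        if tier <= level:
            applied.extend(types)
    return applied


print(corrections_for(CorrectionLevel.NONE))      # []
print(corrections_for(CorrectionLevel.CRITICAL))  # ['NULL handling', 'boolean conversion']
print(len(corrections_for(CorrectionLevel.HIGH)))  # 5
```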
When Types Need Correction
Type differences can occur when:
- ClickHouse → pandas: Different integer type representations (Int64 vs int64)
- pandas → ClickHouse: Python objects to SQL types
- NULL handling: pandas NA vs ClickHouse NULL
- Boolean: Different boolean representations
- DateTime: Timezone differences
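The integer and NULL items in the list above can be seen with pandas alone. The following sketch uses only standard pandas calls and does not show what DataStore does internally; it simply illustrates why an integer column containing missing values can end up with different dtypes depending on how it is constructed.

```python
import pandas as pd

# Nullable extension dtype: the missing value is stored as pd.NA and the
# column stays integer-typed.
nullable = pd.Series([1, 2, None], dtype="Int64")

# Default inference: the same values fall back to float64 with NaN,
# because NumPy's int64 cannot represent a missing value.
inferred = pd.Series([1, 2, None])

print(nullable.dtype, inferred.dtype)  # Int64 float64

# Converting to plain int64 requires deciding what a missing value becomes.
filled = nullable.fillna(0).astype("int64")
print(filled.dtype, filled.tolist())   # int64 [1, 2, 0]
```

A correction step between engines has to make the same kind of decision: keep a nullable dtype, widen to float, or substitute a value for NULL.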
Example
Function Configuration API
function_config Object
Per-Call Override
Some methods support per-call engine override: