Feature serving patterns
ML systems need features in two contexts: batch training on historical data and real-time inference on live requests. Traditional systems force you to write feature logic twice: once for training in Python or Spark, and once for serving in a REST API. The training version and the production version drift apart. This causes training-serving skew, where models train on different features than they see in production.
Xorq solves this by letting you write feature logic once as an expression that works in both contexts. A feature serving pattern defines where and when this computation happens. Your choice determines latency, freshness, and infrastructure complexity.
What is a feature serving pattern?
A feature serving pattern determines when feature computation happens relative to prediction requests. This timing decision controls three critical properties: response latency, feature freshness, and computational cost.
Batch precomputation runs feature computation in scheduled jobs before requests arrive. At request time, you look up precomputed values from storage. This gives microsecond lookups, but features might be hours or days old.
On-demand computation runs feature computation when the request arrives. This gives up-to-date features but adds computation time to every request latency.
Hybrid computation precomputes expensive features in batch and computes cheap features on-demand, balancing speed and freshness by splitting the workload between the two approaches.
The fundamental trade-off
Feature serving requires choosing between computation speed and data freshness, and you cannot optimize both simultaneously. If you precompute features, then you get fast lookups but stale data. If you compute on every request, then you get up-to-date data but slower responses. Patterns represent different positions on this spectrum.
Why the same expression prevents drift
Traditional ML systems separate training and serving code. Training uses SQL or Spark queries for historical data. Serving uses Python or Java functions for real-time requests. Someone updates the training logic but forgets the serving logic. Aggregation formulas drift, causing models to train on one feature definition but see different features in production.
Xorq solves this through deferred execution, where the expression defines computation without executing it immediately. Training executes the expression on historical data while serving executes it on current data. Same logic everywhere produces zero drift.
This is why patterns matter. If the same expression works in both contexts, then you can choose when to execute it. Batch execution happens before requests, while on-demand execution happens during requests, but the computation logic stays identical regardless of timing.
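The write-once idea can be sketched in plain Python, with an ordinary function standing in for a deferred expression. Everything here is illustrative, not Xorq's API: the `spend_features` function, the row format, and the total-spend feature are assumptions made for the example.

```python
from collections import defaultdict

def spend_features(orders):
    """One feature definition shared by training and serving.

    `orders` is a list of (customer_id, amount) rows; total spend per
    customer is an illustrative feature, not a Xorq API.
    """
    totals = defaultdict(float)
    for customer_id, amount in orders:
        totals[customer_id] += amount
    return dict(totals)

# Training: execute the definition on historical data.
historical = [(1, 10.0), (1, 20.0), (2, 5.0)]
training_features = spend_features(historical)   # {1: 30.0, 2: 5.0}

# Serving: execute the exact same definition on current data.
current = [(1, 3.0), (2, 7.0)]
serving_features = spend_features(current)       # {1: 3.0, 2: 7.0}
```

Because both contexts call the same definition, the aggregation formula cannot drift; only the input data differs.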
Pattern 1: Batch precomputation
Batch precomputation separates computation from serving. Scheduled jobs compute features periodically for all entities. Results are written to fast-access storage so requests perform lookups instead of computation.
The mental model
Think of batch precomputation like a phone book where the phone company compiles all numbers once. When you need a number, you look it up instantly, accepting that new numbers take time to appear in the book.
How timing works
Understanding when computation and serving happen clarifies why batch precomputation delivers fast responses.
Before requests: This is the batch phase. Scheduled jobs run on a regular schedule. The expression executes on the entire dataset. Results write to storage indexed by entity ID. You can now query storage for fast lookups.
During requests: This is the serving phase. Requests arrive with entity IDs. The system looks up precomputed features by ID and returns cached results immediately. No computation happens during the request.
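The two phases can be sketched as follows. The in-memory dictionary, the `compute_features` aggregation, and the `serve` function are illustrative stand-ins for a real batch job and feature storage.

```python
import time

def compute_features(rows):
    """Stand-in for expensive feature logic (a simple aggregation here)."""
    totals = {}
    for entity_id, value in rows:
        totals[entity_id] = totals.get(entity_id, 0.0) + value
    return totals

# Batch phase: a scheduled job runs before any request arrives.
dataset = [(1, 10.0), (2, 4.0), (1, 5.0)]
feature_store = {
    entity_id: {"total": total, "computed_at": time.time()}
    for entity_id, total in compute_features(dataset).items()
}

# Serving phase: a request is just a lookup by entity ID.
def serve(entity_id):
    return feature_store.get(entity_id)  # no computation at request time
```

Note that `serve(entity_id)` returns `None` for an entity the batch job has not seen yet, which is exactly the new-entity limitation described below.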
When to use batch precomputation
Use this pattern when features need expensive computation such as multi-table joins or long time windows, staleness of hours to days is acceptable, you need very low response latency, and you serve millions of entities so the batch job spreads the cost.
Why this pattern works
Batch precomputation spreads expensive computation across all entities at once, reducing per-entity cost. Computing features for one million customers in a single batch job costs far less per customer than one million individual computations, so the per-request cost drops to simple lookup overhead.
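The amortization claim can be made concrete with a back-of-envelope sketch; every number here is an illustrative assumption.

```python
# Back-of-envelope amortization (all numbers are illustrative assumptions).
entities = 1_000_000
batch_job_seconds = 300.0       # one scheduled job covers every entity
on_demand_seconds = 0.050       # 50 ms of computation per individual request

batch_per_entity = batch_job_seconds / entities   # 0.0003 s per entity
savings = on_demand_seconds / batch_per_entity    # ~167x cheaper per entity
```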
Storage becomes the serving layer. Fast storage, such as in-memory cache or local SSD, provides consistent microsecond latency, making complex computation irrelevant to serving performance.
Trade-offs
You gain: Ultra-fast lookups, predictable latency, efficient batching. You accept: Stale features that are hours to days old, storage and batch job management, and new entities only after the next run.
Pattern 2: On-demand computation
On-demand computation merges computation and serving. No precomputation occurs, so every request triggers computation using the current data, and results are never cached between requests.
The mental model
Think of on-demand like a restaurant cooking to order, where you wait longer for your meal, but the food is prepared exactly when you need it, and nothing sits waiting to be served.
How timing works
On-demand computation has only one phase, during which everything happens in real time.
During requests: This is the only phase. Requests arrive with entity IDs. The expression executes immediately with current data. Computation runs with filters, joins, and aggregations. Results return directly to the requester. Nothing persists for the next request.
No batch phase exists, and no precomputation happens, so every request pays the full computation cost.
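A minimal sketch of the single-phase model, using a plain Python list as a stand-in for the database; the `events` data and `serve` function are illustrative.

```python
# Mutable "database" of raw events; grows as new data arrives.
events = [(1, 10.0), (2, 4.0)]

def serve(entity_id):
    """Every request recomputes the feature from the current data."""
    return {"total": sum(v for eid, v in events if eid == entity_id)}

before = serve(1)["total"]   # 10.0
events.append((1, 2.5))      # new data lands...
after = serve(1)["total"]    # 12.5 -- reflected immediately, no batch job
```

Nothing is cached between the two calls: each request pays the full computation cost in exchange for seeing the newest data.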
When to use on-demand computation
Use this pattern when features must be up-to-date within seconds, computation is cheap (simple filters on recent data), you can tolerate higher per-request latency, and you serve thousands of entities rather than millions.
Why this pattern works
On-demand computation eliminates infrastructure complexity. No batch jobs to schedule, no storage to manage, and no cache invalidation logic means the database is the only dependency.
Features are guaranteed to be up-to-date. If data changed one second ago, features reflect that change immediately, with no waiting for batch jobs to catch up.
Trade-offs
You gain: Always up-to-date features, simplified infrastructure with no batch jobs or storage to manage, and immediate updates. You accept: Higher and less predictable latency, database load at request time, and expensive features that slow every request.
Pattern 3: Hybrid computation
Hybrid computation splits features by computational cost. Expensive features use batch precomputation while cheap features use on-demand computation, and serving combines both at request time.
The mental model
Think of a hybrid like a restaurant with prep work where the kitchen prepares expensive components ahead, like stocks, sauces, and slow-cooked items, while quick components cook to order, like searing and garnishing, with assembly happening when you order.
How timing works
Hybrid computation operates in two distinct phases that work together to balance speed and freshness.
Before requests: This is the batch phase for expensive features. Scheduled jobs compute only those features, and results are stored for fast lookup.
During requests: This is the combined phase. The system quickly looks up precomputed, expensive features. It computes cheap features from current data. It joins both feature sets and returns the combined results.
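The combined phase can be sketched as a lookup plus a cheap computation, joined at request time. The feature names (`lifetime_spend`, `spend_last_hour`), the dictionaries, and the `serve` function are illustrative assumptions.

```python
# Batch phase (ran earlier): expensive features, keyed by entity ID.
precomputed = {1: {"lifetime_spend": 1240.0}, 2: {"lifetime_spend": 88.0}}

# Current raw data, available at request time.
recent_events = [(1, 3.0), (1, 2.0), (2, 1.0)]

def serve(entity_id):
    # 1) Fast lookup of the expensive, precomputed feature.
    expensive = precomputed.get(entity_id, {"lifetime_spend": 0.0})
    # 2) Cheap feature computed on-demand from current data.
    cheap = {"spend_last_hour": sum(v for eid, v in recent_events
                                    if eid == entity_id)}
    # 3) Join both feature sets into one response.
    return {**expensive, **cheap}
```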
When to use hybrid computation
Use this pattern when you need both speed and freshness, your features split naturally into expensive ones for batch and cheap ones for on-demand, your latency budget allows a lookup plus computation, and you run production ML at scale.
Why this pattern works
Hybrid patterns optimize the trade-off curve. You get fast lookups for expensive computation and up-to-date data for cheap computation, achieving better latency than pure on-demand and better freshness than pure batch.
The pattern adapts to your specific features. If 90% of the computation cost comes from three features, then batch those features while computing the remaining features on demand, optimizing where it matters most.
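One way to perform that split is by per-feature cost against a latency budget; the feature names and millisecond costs below are hypothetical, as is the `split_features` helper.

```python
def split_features(costs_ms, budget_ms):
    """Send features whose per-request cost exceeds the budget to batch;
    keep the rest on-demand. Costs would be measured offline."""
    batch = {name for name, cost in costs_ms.items() if cost > budget_ms}
    return batch, set(costs_ms) - batch

# Hypothetical per-feature costs: three features dominate.
costs = {"90d_joins": 400.0, "graph_embed": 250.0, "txn_history": 120.0,
         "account_age": 0.1, "login_gap": 0.5}
batch, on_demand = split_features(costs, budget_ms=50.0)
# batch == {"90d_joins", "graph_embed", "txn_history"}
# on_demand == {"account_age", "login_gap"}
```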
Trade-offs
You gain: Balanced speed and freshness, flexibility, and optimal performance for many production systems. You accept: Two computation paths, analysis to split batch versus on-demand, dual infrastructure, and more failure points.
Choosing the correct pattern
Your choice depends on three constraints: latency requirements, feature characteristics, and organizational capabilities. Each pattern section above includes a “When to use” subsection that explains the specific scenarios where that pattern is optimal. Review those sections to understand which pattern fits your constraints.
Common misunderstandings
These misconceptions can lead to poor architectural decisions when choosing feature serving patterns.
“Online” does not mean “fast”
Online serving means synchronous request-response, not necessarily low latency. Computing expensive features on demand is still online, even if requests take seconds. If you need sub-100ms latency, you need either batch precomputation or on-demand features that are trivially cheap to compute.
Batch can be frequent
Running batch jobs every minute provides minute-level freshness. The batch-versus-on-demand distinction concerns where computation occurs (in scheduled jobs versus at request time), not how often. A batch job running every minute is still a batch job, not an on-demand one.
You do not always need feature stores
Traditional feature stores solve training-serving skew by storing precomputed features. If your expression works in both contexts, then you already solved skew, and storage becomes just caching for batch patterns. Many systems need only expressions and caching, not separate feature store infrastructure.
Preventing training-serving skew
The same expression must define features for both training and serving. Do not reimplement logic in different languages or frameworks, because if training uses one expression and serving reimplements that logic, then drift will occur over time.
The expression is the contract. Training executes it on historical data while serving executes it on current data, guaranteeing identical computation logic.
When patterns do not apply
If your features never repeat across requests, then serving patterns add unnecessary complexity, and you should compute features directly in application code instead.
If each prediction requires unique custom logic, then patterns offer no reuse benefit, and the overhead of expressions and serving infrastructure outweighs the gains.
Patterns work when computation logic repeats. Repetition enables optimization through timing choices, so without repetition, simpler approaches work better.
Learning more
Serving expressions as endpoints explains how expressions become serving endpoints and the architecture that enables this.
Intelligent caching system covers caching strategies that power batch patterns, including time-to-live and invalidation logic.
Point-in-time correctness discusses temporal correctness for features that change over time, preventing data leakage in training.