Stream Window Aggregate Feature View
StreamWindowAggregateFeatureView is used for streaming time-window aggregation features, such as a 10-minute rolling count of per-user transactions. It processes raw data from a streaming source (e.g. Kafka or Kinesis) and can be backfilled from any BatchDataSource (e.g. S3, Hive tables, Redshift) that contains a historical log of events.
Use a StreamWindowAggregateFeatureView if:
- your use case requires very fresh features (<1 second) that update whenever a new raw event is available on the stream
- you need tumbling, hopping or rolling time window aggregations of type
- you have your raw events available on a stream
Common examples:
- 30 minute rolling click count of a user
- 5 minute rolling transaction sum
- Last 10 transactions of a user
- Max transaction amount of a user
StreamWindowAggregateFeatureView is a specialized implementation for time-window aggregations that is more efficient and performant than what a normal StreamFeatureView could accomplish. Tecton achieves higher efficiency and feature freshness because it stores partial feature values in tiles that are rolled up at feature request time (for more details, see How it works below).
For more examples, see Examples.
See the API reference for the full list of parameters.
In the body of your Python function, you'll define row-level transformations that will then be aggregated according to the
Your transformation must output a column for each entity and a timestamp column. Each additional column must be aggregated by at least one FeatureAggregation. The final number of features will be based on the number of time windows you configure.
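To illustrate how the feature count grows with the number of time windows, here is a minimal sketch in plain Python. The `(column, function, time_windows)` tuples and the `column_function_window` naming scheme are assumptions made for this example; they are not Tecton's API or its actual feature naming.

```python
# Hypothetical aggregation spec: (column, function, time_windows).
# These tuples are illustrative only and do not use Tecton's API.
aggregations = [
    ("amount", "sum", ["10m", "1h", "24h"]),
    ("amount", "max", ["1h"]),
]


def expand_features(aggregations):
    """Return one feature name per (column, function, time window) combination."""
    return [
        f"{column}_{function}_{window}"
        for column, function, windows in aggregations
        for window in windows
    ]


feature_names = expand_features(aggregations)
print(len(feature_names))  # 4: three windows for sum, one for max
print(feature_names)
```

Adding a time window to an aggregation adds one feature. Removing one removes a feature.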
See how to use a Stream Window Aggregate Feature View in a notebook here.
How it works
StreamWindowAggregateFeatureViews use Spark Structured Streaming jobs under the hood. They operate either on a sliding time window or in continuous mode. With a sliding time window, they update at a fixed frequency (the slide period) and aggregate over a period that is often longer (the time window). After each slide period has elapsed, Tecton updates the value in the online store.
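The relationship between the slide period and the time window can be sketched in plain Python. The function `window_updates` and its boundary conventions (updates aligned to `start`, each window ending at the update time) are assumptions for illustration and do not come from Tecton.

```python
from datetime import datetime, timedelta


def window_updates(start, end, slide_period, time_window):
    """Yield (update_time, window_start) for each slide boundary in (start, end].

    Each update covers the time window that ends at update_time.
    """
    update_time = start + slide_period
    while update_time <= end:
        yield update_time, update_time - time_window
        update_time += slide_period


# A 10-minute window that slides every 5 minutes.
updates = list(
    window_updates(
        start=datetime(2024, 1, 1, 12, 0),
        end=datetime(2024, 1, 1, 12, 15),
        slide_period=timedelta(minutes=5),
        time_window=timedelta(minutes=10),
    )
)
for update_time, window_start in updates:
    print(f"update at {update_time:%H:%M} covers {window_start:%H:%M} to {update_time:%H:%M}")
```

In this sketch consecutive windows overlap by five minutes, because the time window is longer than the slide period.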
Behind the scenes, Tecton stores partial aggregations in the form of tiles. The tile size is defined by the aggregation_slide_period parameter. At feature request time, Tecton's online and offline feature serving capabilities automatically roll up the persisted tiles (as well as persisted event projections in the case of continuous streaming features). This has several key benefits:
- Significantly reduced storage requirements if you define several time windows
- Reduced stream job memory requirements
- Streaming features with a time window that exceeds the streaming source's retention period can be backfilled from the batch source. Tecton will backfill historical tiles from the batch source and combine tiles that were written by the streaming source transparently at request time.
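The tile idea described above can be sketched in plain Python. This is a conceptual illustration, not Tecton's implementation: the names `SLIDE`, `tile_start`, `build_tiles` and `roll_up`, the in-memory dictionary standing in for the store, and the choice to count only completed tiles are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

SLIDE = timedelta(minutes=5)  # hypothetical tile size (the slide period)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tile_start(ts):
    """Floor a timestamp to the start of the tile that contains it."""
    return EPOCH + ((ts - EPOCH) // SLIDE) * SLIDE


def build_tiles(events):
    """Fold (user_id, timestamp, amount) events into per-tile partial counts and sums."""
    tiles = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for user_id, ts, amount in events:
        tile = tiles[(user_id, tile_start(ts))]
        tile["count"] += 1
        tile["sum"] += amount
    return tiles


def roll_up(tiles, user_id, request_time, window):
    """Combine the completed tiles that fall inside the window at request time."""
    window_end = tile_start(request_time)  # assumption: only completed tiles count
    window_start = window_end - window
    count, total = 0, 0.0
    for (uid, start), tile in tiles.items():
        if uid == user_id and window_start <= start < window_end:
            count += tile["count"]
            total += tile["sum"]
    return {"count": count, "sum": total}


t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
events = [
    ("u1", t0 + timedelta(minutes=1), 10.0),
    ("u1", t0 + timedelta(minutes=7), 5.0),
    ("u1", t0 + timedelta(minutes=12), 2.5),
    ("u2", t0 + timedelta(minutes=3), 99.0),
]
tiles = build_tiles(events)
request_time = t0 + timedelta(minutes=15)

# The same stored tiles serve both time windows.
ten_min = roll_up(tiles, "u1", request_time, timedelta(minutes=10))
thirty_min = roll_up(tiles, "u1", request_time, timedelta(minutes=30))
print(ten_min)     # {'count': 2, 'sum': 7.5}
print(thirty_min)  # {'count': 3, 'sum': 17.5}
```

Because every time window is computed from the same tiles, adding a second or third window in this sketch requires no extra stored data, which is the storage benefit listed above.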
See this blog for more technical details.
Optimizing feature data freshness with continuous processing (Beta)
For applications that require the most up-to-date feature data, continuous processing for Stream Window Aggregate Feature Views can update feature values in less than a second after the event is available in the stream data source. With continuous mode, Tecton will process each event as it arrives, rather than waiting for the slide period to complete.
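The freshness difference between the two modes can be illustrated with a small plain-Python sketch. Both functions and their window boundary conventions are assumptions made for this example; they do not describe Tecton's internals.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def continuous_count(event_times, request_time, window):
    """Count every event in the window that ends exactly at request time."""
    return sum(1 for ts in event_times if request_time - window < ts <= request_time)


def sliding_count(event_times, request_time, window, slide_period):
    """Count events in the window that ends at the last completed slide boundary."""
    window_end = EPOCH + ((request_time - EPOCH) // slide_period) * slide_period
    return sum(1 for ts in event_times if window_end - window <= ts < window_end)


event_times = [
    datetime(2024, 1, 1, 12, 6),
    datetime(2024, 1, 1, 12, 8),
    datetime(2024, 1, 1, 12, 14, 30),  # arrives one second before the request
]
request_time = datetime(2024, 1, 1, 12, 14, 31)
window = timedelta(minutes=10)

fresh = continuous_count(event_times, request_time, window)
stale = sliding_count(event_times, request_time, window, timedelta(minutes=5))
print(fresh)  # 3: includes the event from one second ago
print(stale)  # 2: the 12:10 slide period has not completed yet
```

In this sketch, the sliding-window result lags by up to one slide period, while the continuous result reflects the most recent event.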
See this example for how to configure a Stream Window Aggregate Feature View to use continuous processing.
Continuous processing is currently available in beta.