Batch Window Aggregate Feature View

A BatchWindowAggregateFeatureView is used for batch time-window aggregation features, such as a 1 hour rolling count of per-user transactions. It processes raw data from any BatchDataSource (e.g. S3, Hive Tables, Redshift) that contains a historical log of events.

Use a BatchWindowAggregateFeatureView if:

  • you have your raw events available in a Batch Data Source
  • you need tumbling, hopping or rolling time window aggregations of type count, sum, mean, max, min or last-n
  • your use case can tolerate a feature freshness of more than 1 hour

Common Examples:

  • 1 hour rolling click count of a user
  • Last 10 transactions of a user
  • Max transaction amount of a user

BatchWindowAggregateFeatureView is a specialized implementation for time-window aggregations that is more efficient than computing the same aggregations in a normal BatchFeatureView. Tecton achieves this efficiency, along with better feature freshness, by storing partial feature values in tiles that are rolled up at feature request time (for more details, see "How it works" below).

Example

Row-level Transformation

Parameters

See the API reference for the full list of parameters.

Transformation

In the body of your Python function, you'll define row-level transformations that will then be aggregated according to the FeatureAggregation parameter.

Your transformation must output a column for each entity and a timestamp column. Each additional column must be aggregated by at least one FeatureAggregation. The final number of features will be based on the number of time windows you configure.
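The two rules above (every non-key column must be aggregated, and the feature count grows with the number of time windows) can be illustrated with a minimal plain-Python sketch. It does not use the Tecton SDK: the dictionary layout, the helper names `unaggregated_columns` and `expand_features`, and the feature naming scheme are all hypothetical and chosen only for illustration.

```python
# Illustrative only: plain dictionaries stand in for FeatureAggregation
# definitions. The keys and the feature naming scheme are assumptions,
# not Tecton's actual API or naming convention.

def unaggregated_columns(output_columns, entity_columns, timestamp_column, aggregations):
    """Return output columns that are neither keys nor covered by an aggregation."""
    reserved = set(entity_columns) | {timestamp_column}
    aggregated = {agg["column"] for agg in aggregations}
    return [c for c in output_columns if c not in reserved and c not in aggregated]


def expand_features(aggregations):
    """Produce one feature name per (column, function, time window) combination."""
    return [
        f'{agg["column"]}_{agg["function"]}_{window}'
        for agg in aggregations
        for window in agg["time_windows"]
    ]


# Columns emitted by a hypothetical row-level transformation.
output_columns = ["user_id", "timestamp", "transaction", "amount"]

aggregations = [
    {"column": "transaction", "function": "count", "time_windows": ["1h", "12h", "24h"]},
    {"column": "amount", "function": "max", "time_windows": ["24h"]},
]

missing = unaggregated_columns(output_columns, ["user_id"], "timestamp", aggregations)
feature_names = expand_features(aggregations)

print(missing)        # columns that would violate the rule
print(feature_names)  # three count features plus one max feature
```

In this sketch, three time windows on one aggregation and one window on another yield four features in total, and an empty `missing` list means every additional column is covered by at least one aggregation.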

Usage Example

See how to use a Batch Window Aggregate Feature View in a notebook here.

How it works

A BatchWindowAggregateFeatureView runs as Spark jobs. It updates at a fixed frequency (the slide period) and aggregates over a period of time that is often longer (the time window). After each slide period has elapsed, Tecton updates the value in the online store.
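To make the relationship between the slide period and the time window concrete, here is a small sketch in plain Python that lists the interval covered by each successive update. The helper `window_bounds` and the example values are hypothetical, and the half-open interval convention is an assumption rather than documented Tecton behavior.

```python
from datetime import datetime, timedelta


def window_bounds(first_update, slide_period, time_window, n_updates):
    """Return the (start, end) interval aggregated at each successive update.

    Assumes each update fires exactly one slide period after the previous
    one and aggregates the half-open interval [end - time_window, end).
    """
    bounds = []
    for i in range(n_updates):
        end = first_update + i * slide_period
        bounds.append((end - time_window, end))
    return bounds


# A 3 hour window that is refreshed every hour.
bounds = window_bounds(
    first_update=datetime(2023, 1, 1, 0, 0),
    slide_period=timedelta(hours=1),
    time_window=timedelta(hours=3),
    n_updates=3,
)

for start, end in bounds:
    print(f"{start:%Y-%m-%d %H:%M} -> {end:%Y-%m-%d %H:%M}")
```

Consecutive intervals overlap by the time window minus the slide period (two hours here), so most of the data in each update was already covered by the previous one.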

Behind the scenes, Tecton stores partial aggregations in the form of tiles. The tile size is defined by the aggregation_slide_period parameter. At feature request time, Tecton's online and offline feature serving capabilities automatically roll up the persisted tiles (as well as persisted event projections in the case of continuous streaming features). This has several key benefits:

  • Significantly reduced storage requirements if you define several time windows
  • Reduced precompute resource requirements, given that Tecton needs to only compute incremental tiles and not the entire time window
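The tiling idea can be sketched in a few lines of plain Python. This is a simplified illustration of the general technique, not Tecton's implementation: the helper names (`tile_start`, `build_tiles`, `roll_up`), the in-memory dictionary used as a store, and the choice to roll up only complete tiles are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Plays the role of aggregation_slide_period; the value is an arbitrary example.
SLIDE = timedelta(hours=1)
EPOCH = datetime(1970, 1, 1)


def tile_start(ts, slide=SLIDE):
    """Floor a timestamp to the start of the tile that contains it."""
    return EPOCH + ((ts - EPOCH) // slide) * slide


def build_tiles(events, slide=SLIDE):
    """Pre-aggregate (user, timestamp, amount) events into per-tile partial sums and counts."""
    tiles = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for user, ts, amount in events:
        tile = tiles[(user, tile_start(ts, slide))]
        tile["count"] += 1
        tile["sum"] += amount
    return dict(tiles)


def roll_up(tiles, user, as_of, window, slide=SLIDE):
    """Combine the complete tiles that fall inside the window ending at the
    last tile boundary at or before as_of."""
    end = tile_start(as_of, slide)
    start = end - window
    count, total = 0, 0.0
    for (tile_user, t0), partial in tiles.items():
        if tile_user == user and start <= t0 < end:
            count += partial["count"]
            total += partial["sum"]
    return {"count": count, "sum": total, "mean": total / count if count else None}


events = [
    ("u1", datetime(2023, 1, 1, 9, 15), 10.0),
    ("u1", datetime(2023, 1, 1, 9, 45), 5.0),
    ("u1", datetime(2023, 1, 1, 10, 30), 20.0),
    ("u1", datetime(2023, 1, 1, 11, 5), 7.0),
    ("u2", datetime(2023, 1, 1, 10, 10), 1.0),
]

tiles = build_tiles(events)  # stored once, reused by every time window
now = datetime(2023, 1, 1, 12, 0)

one_hour = roll_up(tiles, "u1", now, timedelta(hours=1))
three_hour = roll_up(tiles, "u1", now, timedelta(hours=3))
print(one_hour)
print(three_hour)
```

Both the 1 hour and the 3 hour features are served from the same stored tiles, which is where the storage saving comes from, and a new slide period only requires computing one new tile per key rather than recomputing the whole window. Count, sum and mean combine cleanly across tiles; an aggregation such as last-n would need each tile to keep more than a single number.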