
Creating & Managing Features

Defining Feature Pipelines

What programming languages do you support for defining features?

For each feature in Tecton, you create a Python-based feature definition file that includes all of the metadata and logic you want Tecton to manage for you.

Tecton's transformation logic is managed in Spark, using PySpark or SQL transformations. If your model requires request-time transformations, those are written in Python.

See Feature Views for more details.

What data types are supported for feature values?

Tecton supports the following Spark data types:

  • LongType
  • DoubleType
  • StringType
  • BooleanType
  • ArrayType with LongType, DoubleType, FloatType, and StringType elements.

Feature materialization and lineage

What happens when the definition of a feature changes?

If a feature's definition changes, Tecton automatically detects all dependencies on that feature, surfaces them, and asks you to confirm before applying the change. Because feature definitions are backed by Git, you can roll back changes, view feature lineage, and track the state of your feature store at any point in time.

How far back does Tecton support time-travel?

You set your features' backfill start date in Tecton. Time-travel can be performed as far back as feature data exists.

What support do you provide for time travel?

Tecton performs time travel on a row-level basis, so the granularity of time travel can be quite precise. For event-driven ML models, where you regularly make predictions and need to reconstruct the feature values as of each specific prediction time, Tecton handles the full time-travel query, rather than only returning all feature values at a single point in time.
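Row-level time travel is, in effect, a point-in-time ("as-of") join between prediction events and feature history. The sketch below is a minimal pure-Python illustration of the idea, not Tecton's implementation; all names and data shapes are hypothetical:

```python
from bisect import bisect_right

def as_of_join(events, feature_rows):
    """For each (entity, event_time) prediction event, look up the most
    recent feature value whose effective timestamp is <= the event time."""
    # Index feature history per entity, sorted by timestamp.
    history = {}
    for entity, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        history.setdefault(entity, []).append((ts, value))

    results = []
    for entity, event_ts in events:
        rows = history.get(entity, [])
        times = [ts for ts, _ in rows]
        i = bisect_right(times, event_ts)  # rows[:i] are at or before event_ts
        results.append(rows[i - 1][1] if i else None)
    return results

# Two feature snapshots for user "u1"; each prediction event sees the
# value that was current at its own timestamp.
features = [("u1", 100, 0.2), ("u1", 200, 0.9)]
events = [("u1", 150), ("u1", 250), ("u2", 150)]
print(as_of_join(events, features))  # [0.2, 0.9, None]
```

The event at time 150 sees the snapshot from time 100, not the later one, which is exactly the per-row behavior the answer above describes.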

Does Tecton provide the functionality to replay and fix a backfill if the underlying data source is updated?

Yes, it is possible to kick off an "overwrite backfill" for a particular time range. Tecton will replay all transformations. This functionality is currently in private preview. To run an overwrite backfill, contact support@tecton.ai.

When scheduling materializations, does Tecton only materialize new data? Or does Tecton re-materialize all data?

Generally speaking, Tecton only reads and computes new data. There may be instances in which more historical data is required (e.g., computing a one-month average at materialization time requires knowing the full window of data).
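To see why a windowed aggregation needs more than the new data, consider this toy recomputation of a 30-day average (illustrative pure Python, not Tecton code): even when only the newest event has arrived since the last run, every event inside the window must still be read.

```python
from datetime import datetime, timedelta

def window_average(events, as_of, window=timedelta(days=30)):
    """Recompute a rolling average as of `as_of`. Only events inside
    (as_of - window, as_of] contribute, but all of them are needed."""
    start = as_of - window
    in_window = [v for ts, v in events if start < ts <= as_of]
    return sum(in_window) / len(in_window) if in_window else None

events = [
    (datetime(2023, 1, 1), 10.0),   # outside the window, ignored
    (datetime(2023, 1, 20), 20.0),  # older, but still inside the window
    (datetime(2023, 2, 1), 40.0),   # newly arrived data
]
print(window_average(events, as_of=datetime(2023, 2, 2)))  # 30.0
```

Dropping the January 20 event just because it is "old" would silently change the result, which is why the full window must be available at materialization time.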

What does Tecton do for data lineage? Does it support the entire data flow?

For data lineage, we consider both how features are created and how they are consumed. For feature creation, we show you the entire data flow: from your raw data sources, through the different transformations being run, to where the data is being stored. For feature consumption, we have the concept of a FeatureService, which maps to the features for a running model. For any feature, you can see which services are using it; likewise, for any service, you can see all the features inside it. The tracking is bidirectional.
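The bidirectional feature-to-service tracking can be pictured as a two-way index. The following is an illustrative pure-Python sketch under that analogy, not Tecton's API; the class and names are hypothetical:

```python
from collections import defaultdict

class LineageIndex:
    """Toy bidirectional index: which services use a feature, and
    which features make up a service."""

    def __init__(self):
        self.features_of = defaultdict(set)  # service -> features
        self.services_of = defaultdict(set)  # feature -> services

    def register(self, service, features):
        # One registration updates both directions of the index.
        for f in features:
            self.features_of[service].add(f)
            self.services_of[f].add(service)

idx = LineageIndex()
idx.register("fraud_model_v1", ["txn_count_7d", "avg_txn_amount"])
idx.register("churn_model", ["avg_txn_amount", "days_since_login"])

print(sorted(idx.services_of["avg_txn_amount"]))  # ['churn_model', 'fraud_model_v1']
print(sorted(idx.features_of["churn_model"]))     # ['avg_txn_amount', 'days_since_login']
```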

Does Tecton have an Airflow or Prefect integration?

Tecton does not currently have a first-class Airflow or Prefect integration. However, since Tecton's SDK is a plain Python package, it can be run from any environment with access to Python.

Currently, reading features from Tecton requires access to a Spark cluster. Ensure your Airflow environment has access to a Spark cluster when using Airflow with Tecton.

Sharing Features

Can users inspect features?

Tecton provides a variety of ways for your data scientists to inspect features. They can review the actual code of the feature, see summary statistics for all features, and query the feature's data using the Tecton SDK.

Can users register and discover features in Tecton?

Yes. With Tecton, you register the entire transformation logic, plus metadata such as families, owners, and custom tags. The Tecton Web UI then lets users access, browse, and discover features, for example by breaking them down into families by use case or by filtering on the metadata tags you attach to them.

How can users ensure there are no duplicate features ingested?

The Tecton Feature Store manages feature dependencies via the names of the objects configured in Tecton (e.g., data sources, feature views, and services). It is possible for users to submit similar features under different names; we recommend that users first look to reuse features that already exist in the feature store.

Handling Nulls

Does Tecton support null feature values?

Yes. Tecton supports nulls for feature values. Null values may be returned when data is missing (e.g. for a brand new user) or when the feature view column computes null (e.g. SELECT NULLIF(a, b)).
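For reference, the SQL NULLIF(a, b) semantics mentioned above can be expressed in plain Python (illustrative only):

```python
def nullif(a, b):
    """Mimics SQL NULLIF(a, b): returns None (null) when a == b, else a."""
    return None if a == b else a

print(nullif(5, 5))  # None
print(nullif(5, 3))  # 5
```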

Does Tecton support null elements in arrays?

Yes. Tecton supports null elements in arrays, e.g. ["foo", null]. However, null elements in arrays in on-demand feature views have some special handling.

Arrays with null elements in On-Demand Feature Views

In On-Demand Feature Views, arrays are represented as numpy arrays. Numeric numpy arrays do not directly support null. Tecton works around this using the same semantics as Spark for Pandas UDFs:

  • Floating Point Arrays
    • Input: Null values are cast to NaN.
      • E.g. [1.0, NaN, null] → np.array([1.0, 'nan', 'nan'], dtype=np.float32)
    • Output: NaN values are cast to null.
      • E.g. np.array([1.0, 'nan', 'nan'], dtype=np.float32) → [1.0, null, null]
  • Integer Arrays
    • Input: If null is present in an integer array, the entire array is cast to a numpy float64 array with NaNs for the null values. Warning: this is a lossy conversion for integer values larger than 2^53 - 1. Do not use nulls in your integer arrays if this will be a problem.
      • E.g. [1, 2] → np.array([1, 2], dtype=np.int64)
      • E.g. [1, null] → np.array([1, 'nan'], dtype=np.float64)
    • Output: ODFVs with integer array outputs may output Float64s, which will be cast to Int64s. Tecton will fail if converting a Float64 to an Int64 would cause an Int64 overflow, i.e., if the value is larger than 2^63 - 1.
      • E.g. np.array([1, 2], dtype=np.int64) → [1, 2]
      • E.g. np.array([1, 'nan'], dtype=np.float64) → [1, None]
  • String Arrays
    • String arrays are represented with numpy object arrays, which support null, so no special handling is needed. E.g. np.array(['foo', None], dtype=object).