Transformations

Transformations are Tecton objects that describe a set of operations on data. The operations are expressed through standard frameworks such as Spark SQL, PySpark, and Pandas.

Transformations are required to create Feature Views. Once defined, a Transformation can be reused within multiple Feature Views, or multiple Transformations can be composed within a single Feature View. Using these Transformations with your feature store provides several benefits:

  • Reusability: You can define a common Transformation — to clean up data, for example — that can be shared across all Features.
  • Feature versioning: If you change a Feature Transformation, the Feature Store increments the version of that feature and ensures that you don't accidentally mix features that were computed using two different implementations.
  • End-to-end lineage tracking and reproducibility: Since Tecton manages Transformations, it can tie feature definitions all the way through a training data set and a model that's used in production.
  • Visibility: Enabling data scientists to examine the code and see how the feature is calculated will help them understand if it's appropriate to re-use for their model.

Transformation Types

Register a Python function as a Transformation in Tecton by annotating it with @transformation, and set the mode parameter according to the framework used for the transformation. The current options are spark_sql, pyspark, and pandas.

Spark SQL

SQL transformations are configured with mode=spark_sql, and return a Spark SQL query.

Function inputs must be a Spark dataframe or a Tecton constant. The tables in the FROM clause must be parameterized via the inputs.

Example

from tecton import transformation

@transformation(mode="spark_sql")
def user_has_good_credit_transformation(credit_scores):
    return f"""
        SELECT
            user_id,
            IF (credit_score > 670, 1, 0) as user_has_good_credit,
            date as timestamp
        FROM
            {credit_scores}
        """

Note that Spark SQL transformations cannot be used within an OnDemandFeatureView.
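
To make the parameterized FROM clause concrete, here is a minimal sketch in plain Python, without the @transformation decorator, that shows the query string such a function returns. It assumes the input is rendered as a table or view name; the name credit_scores_view is hypothetical, and how Tecton itself substitutes the input is not covered here.

```python
def user_has_good_credit_query(credit_scores):
    # Same f-string body as the example above; the FROM clause is filled
    # in from the function argument rather than hard-coded.
    return f"""
        SELECT
            user_id,
            IF (credit_score > 670, 1, 0) AS user_has_good_credit,
            date AS timestamp
        FROM
            {credit_scores}
        """


# Hypothetical view name, used only to show the rendered query.
query = user_has_good_credit_query("credit_scores_view")
print(query)
```

Because the table name comes from the argument, the same function can be pointed at any input that exposes the user_id, credit_score, and date columns.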

PySpark

PySpark transformations are configured with mode=pyspark, and contain Python code that will be executed within a Spark context. They can additionally include third party libraries as user-defined PySpark functions if your cluster allows third party libraries.

Function inputs must be a Spark dataframe or a Tecton constant.

Example

from tecton import transformation

@transformation(mode="pyspark")
def user_has_good_credit_transformation(credit_scores):
    from pyspark.sql import functions as F

    # Flag users whose credit score is above 670.
    df = credit_scores.withColumn(
        "user_has_good_credit",
        F.when(credit_scores["credit_score"] > 670, 1).otherwise(0),
    )
    # Keep only the output columns, renaming date to timestamp.
    return df.select("user_id", df["date"].alias("timestamp"), "user_has_good_credit")

Note that PySpark transformations, like Spark SQL transformations, cannot be used within an OnDemandFeatureView.

Pandas

Pandas transformations are configured with mode=pandas. They can only be used by an OnDemandFeatureView.

Function inputs must be a Pandas dataframe or a Tecton constant.

Example

from tecton import transformation

@transformation(mode="pandas")
def transaction_amount_is_high_transformation(transaction_request):
    import pandas

    df = pandas.DataFrame()
    df['transaction_amount_is_high'] = (transaction_request['amount'] >= 10000).astype('int64')
    return df
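
To see what this logic computes, the function body can be run as a plain Python function on a small request DataFrame. The sketch below drops the @transformation decorator so that it runs without Tecton; the function name and the sample amounts are made up for illustration.

```python
import pandas


def transaction_amount_is_high(transaction_request):
    # Same logic as the transformation body, without the Tecton decorator.
    df = pandas.DataFrame()
    df["transaction_amount_is_high"] = (
        transaction_request["amount"] >= 10000
    ).astype("int64")
    return df


# Hypothetical request data for illustration only.
sample_request = pandas.DataFrame({"amount": [500, 10000, 25000]})
result = transaction_amount_is_high(sample_request)
print(result)
```

The comparison is inclusive (>=), so an amount of exactly 10000 is flagged as high.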

Using external functions or libraries

When you apply Transformations to the Tecton feature repository, only the body of each Transformation function is registered. As a result, imports and other references defined outside the function body are not available when the Transformation runs.
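
The effect can be imitated in plain Python. The sketch below is only an analogy and is not how Tecton is implemented: it loads a function's source into a fresh, empty namespace, so any name the function expects from the surrounding module is missing. The function amount_is_high and the helpers load_function and try_call are hypothetical.

```python
import textwrap

# Relies on `math` having been imported somewhere outside the function.
outer_import_source = textwrap.dedent("""
    def amount_is_high(amount):
        return math.floor(amount) >= 10000
""")

# Imports `math` inside the function body, so it is self-contained.
inner_import_source = textwrap.dedent("""
    def amount_is_high(amount):
        import math
        return math.floor(amount) >= 10000
""")


def load_function(source, name):
    # Execute the source in an empty namespace, imitating a setting where
    # only the function itself is carried over, not its surrounding module.
    namespace = {}
    exec(source, namespace)
    return namespace[name]


def try_call(source):
    fn = load_function(source, "amount_is_high")
    try:
        return fn(12000.5)
    except NameError as exc:
        return f"NameError: {exc}"


outer_result = try_call(outer_import_source)
inner_result = try_call(inner_import_source)
print(outer_result)
print(inner_result)
```

The first call fails with a NameError because math is never defined in the fresh namespace, while the version that imports inside the function body succeeds.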

Importing Libraries

To use a Python library in a Transformation, import it inside the Transformation function, not at the top level of the file as you normally would. Avoid using aliases for imports (e.g. use import pandas instead of import pandas as pd).

### Valid
from tecton import transformation

@transformation(mode="pandas")
def my_transformation(request):
    import pandas

    df = pandas.DataFrame()
    df['amount_is_high'] = (request['amount'] >= 10000).astype('int64')
    return df

### Invalid - pandas is imported outside my_transformation!
from tecton import transformation
import pandas

@transformation(mode="pandas")
def my_transformation(request):
    df = pandas.DataFrame()
    df['amount_is_high'] = (request['amount'] >= 10000).astype('int64')
    return df

Any library referenced in the function signature, for example in type hints, must also be imported at the top level, outside the function.

from tecton import transformation
import pandas # required for type hints on my_transformation.

@transformation(mode="pandas")
def my_transformation(request: pandas.DataFrame) -> pandas.DataFrame:
    import pandas # required for pandas.DataFrame() below.

    df = pandas.DataFrame()
    df['amount_is_high'] = (request['amount'] >= 10000).astype('int64')
    return df

Using Transformations in Feature Views

Once you've created a Transformation, the next step is to call it from a Feature View. See the Feature View Overview for more details.