
What's New?

November 15, 2021

Default Stream Cluster Configuration now includes On-demand instances for Driver nodes

Using an on-demand instance for the Spark driver node can make stream processing more reliable by preventing the loss of the entire cluster to spot termination. By using a mix of instance types, you can ensure reliability while taking advantage of cheaper spot instances for additional processing power.

The new first_on_demand parameter for DatabricksClusterConfig and EMRClusterConfig enables configuring a mix of on-demand and spot instances in a single cluster. When configured, the first first_on_demand nodes of the cluster use on-demand instances. The remaining nodes use the type specified by instance_availability.

If not specified, then Tecton will default to first_on_demand=1 for StreamFeatureView and StreamWindowAggregateFeatureView.
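
To make the split concrete, the sketch below models how a cluster's nodes would be assigned under the rule described above. It is plain Python for illustration only, not Tecton code: the node_availabilities helper is hypothetical, and it assumes the driver counts as the first node of the cluster.

```python
from typing import List


def node_availabilities(total_nodes: int, first_on_demand: int,
                        instance_availability: str) -> List[str]:
    """Hypothetical helper: availability type of each node, driver first."""
    if total_nodes < 0 or first_on_demand < 0:
        raise ValueError("node counts must be non-negative")
    # The first `first_on_demand` nodes are on-demand, capped at the cluster size.
    on_demand_count = min(first_on_demand, total_nodes)
    # Every remaining node uses the configured instance_availability.
    return (["on_demand"] * on_demand_count
            + [instance_availability] * (total_nodes - on_demand_count))


# One driver plus three workers, using the stream default of first_on_demand=1.
print(node_availabilities(4, 1, "spot"))
```

With first_on_demand=1, only the first node is protected from spot termination, which matches the reliability goal described for the driver.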

Spot with fallback instance availability for Databricks

Materialization jobs on Databricks can now be configured to use the spot with fallback availability option.

DatabricksClusterConfig.instance_availability now supports the spot_with_fallback option. See the Databricks documentation for more details.

Raw Data Source Preview

You can now use the Tecton SDK to view Data Source inputs before the translator function is applied. Viewing a sample of this raw data can help debug translator and data source issues.

To do so, set apply_translator=False when using the StreamDataSource.start_stream_preview() or BatchDataSource.get_dataframe() methods.

November 8, 2021

Jobs Tab

The new Jobs tab in the Web UI makes it easy to keep track of materialization jobs for all of your features.

You can navigate to the Jobs tab by clicking 'Jobs' in the sidebar, under the Resources section. You can then filter by date or by any of the columns in the table.

jobs page details

Batch & Stream Translator Definitions in Web UI

Data Source translator definitions can now be viewed in the Data Source Overview in the Web UI, similar to Transformations. This code preview makes it easier to understand the expected output of the Data Source.

Data Source Screenshot

October 26, 2021

Tecton SDK 0.1.0 Release

Back in May, we announced the release of the Framework v2 API. After helping customers transition to the new Framework over the last 6 months, we're finally removing outdated methods from the SDK with release 0.1.0.

With 0.1.0, you will no longer be able to run tecton apply for Framework v1 Objects, such as Feature Packages.

Additionally, the following interactive SDK methods have been renamed:

  • FeatureService.feature_packages() is replaced with FeatureService.feature_views()
  • FeatureView.is_online is replaced with FeatureView.is_on_demand
  • Workspace.get_feature_package() is replaced with Workspace.get_feature_view()
  • Workspace.list_feature_packages() is replaced with Workspace.list_feature_views() / Workspace.list_feature_tables()
  • Workspace.get_virtual_data_source() is replaced with Workspace.get_data_source()
  • Workspace.list_virtual_data_sources() is replaced with Workspace.list_data_sources()

October 21, 2021

Array Features

Tecton now natively supports Array type features with Float32, Float64, Int64, and String type elements.

This allows Tecton users to compactly store and serve features like dense embeddings or lists of categorical values.

Here's an example On-Demand Feature View Overview that uses array features to compute the cosine similarity between a precomputed user embedding and a query embedding:
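
The computation at the core of such a feature can be sketched in plain Python as follows. This is a minimal illustration of cosine similarity over two array-valued inputs, not a Tecton On-Demand Feature View definition; the function name and the zero-vector convention are assumptions made for the example.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: similarity with a zero vector is 0.
        return 0.0
    return dot / (norm_a * norm_b)


user_embedding = [0.1, 0.3, 0.5]   # stands in for a precomputed array feature
query_embedding = [0.2, 0.6, 1.0]  # stands in for request-time data
print(cosine_similarity(user_embedding, query_embedding))
```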

October 15, 2021

New FeatureView Run SDK Method for testing feature transformations

Tecton is releasing a new Run API for dry-running a FeatureView's transformation the same way Tecton executes it during materialization or at feature retrieval time. The run method can be called on all Feature Views and supports mock input data. The run_stream method is available for streamable Feature Views.

Here's an example of how this looks in practice.

import tecton
from datetime import datetime, timedelta
from pyspark.sql import SparkSession

# In a notebook, `spark` is usually predefined; getOrCreate() reuses that session.
spark = SparkSession.builder.getOrCreate()

ws = tecton.get_workspace('...')
fv = ws.get_feature_view('...') # BatchFeatureView or StreamFeatureView
mock_df_1 = spark.createDataFrame([ # Spark and Pandas DataFrames are supported
    {'field_1': 'value_1', ...}, # row 1
    {'field_1': 'value_1', ...}, # row 2
])
mock_df_2 = spark.createDataFrame([
    {'field_1': 'value_1', ...}, # row 1
])

# Replace <input_1> and <input_2> with the FeatureView's input names.
fv.run(<input_1>=mock_df_1,
       <input_2>=mock_df_2,
       feature_start_time=(datetime.now() - timedelta(days=7)))

The API details can be found here under each FeatureView type.

Databricks Runtime 9.1 and EMR Runtime 6.4 Support

We're excited to announce that Tecton is launching support for Spark 3 with the latest Databricks and EMR runtimes! These new versions deliver faster query performance, security updates, and better support for Delta tables.

The latest Tecton SDK can now be used with Databricks Runtime 9.1 LTS or EMR 6.4.0. See the documentation for instructions on how to update your notebook cluster.

Tecton will continue to support compatibility with the current runtimes (DBR 6.4 LTS, EMR 5.30) through Dec. 31, 2021. Of course, we recommend you update sooner to take advantage of the newest versions.

Support for Python 3.8

Previously, the Tecton SDK required a Python 3.7 environment due to an incompatibility between PySpark 2.4.5 and newer Python versions. Starting with v0.0.58, you can install the Tecton CLI in a Python 3.8 environment. Just run pip install tecton to get the latest version.

October 8, 2021

New Feature Retrieval SDK Methods

Tecton is releasing a new API to be used for feature retrieval. For fetching features from the offline store, we've introduced the method get_historical_features. For fetching features from the online store, we've introduced the method get_online_features. Both these methods can be called on Feature Services, Feature Views, and Feature Tables.

These methods replace the existing methods get_feature_dataframe, get_feature_vector, get_features, and preview, as well as the module-level functions tecton.get_historical_features and tecton.get_online_features. The older functions will be deprecated in the future, so please transition to the new API.

There is also a new method for data sources. On a BatchDataSource or a StreamDataSource, the dataframe and preview methods will be deprecated in favor of the new get_dataframe method, which returns the data from the data source as a TectonDataFrame. You can filter the data fetched using the start_time and end_time parameters.

The API details can be found here under each FeatureView type.

October 1, 2021

batch_cluster_config now available for Feature Table declarations

You can now include a batch_cluster_config in Feature Table declarations. By specifying an EMRClusterConfig or DatabricksClusterConfig, or by referencing an ExistingClusterConfig, you can run the ingest materialization jobs on a cluster of workers with the specified options, such as instance type, instance size, and Spark configuration.

FeatureTable(
    name="user_page_click_feature_table",
    entities=[content],
    schema=schema,
    online=True,
    offline=True,
    owner="example@tecton.ai",
    batch_cluster_config=DatabricksClusterConfig(
        instance_type="m5.2xlarge",
        number_of_workers=2,
        spark_config={
            "spark.executor.memory": "7000m",
        },
    ),
)

Resolved Plan Hooks issue for Windows environments

Previously, including a plan.py would cause problems for users in a Windows environment. This issue is resolved in SDK version 0.0.54.

Adds .tectonignore to ignore paths and files

You can now add .tectonignore to your feature repository to ignore specified paths and files, similar to .gitignore. See the usage guide for an example.
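
As a rough illustration of how .gitignore-style patterns select files, the sketch below filters a list of repository paths with Python's standard fnmatch module. It is not Tecton's implementation: the filter_ignored helper and the sample patterns are hypothetical, and fnmatch is simpler than real .gitignore matching (for example, it has no negation rules), so the usage guide remains the reference for the exact syntax.

```python
from fnmatch import fnmatch
from typing import Iterable, List


def filter_ignored(paths: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Return the paths that match none of the ignore patterns."""
    # Skip blank lines and comment lines, as an ignore file typically allows.
    active = [p.strip() for p in patterns
              if p.strip() and not p.strip().startswith("#")]
    return [path for path in paths
            if not any(fnmatch(path, pattern) for pattern in active)]


ignore_lines = ["# scratch work", "notebooks/*", "*_test.py"]
repo_files = ["features/user.py", "notebooks/explore.py", "features/user_test.py"]
print(filter_ignored(repo_files, ignore_lines))
```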

August 27, 2021

Grant Admin privileges to a user

Existing admin users can now directly grant admin privileges to other users through the Admin console, rather than having to make a request to Tecton support. See the Admin console documentation for more detail.

Directly run Tecton plan hooks

Plan hooks provide a mechanism for running unit tests during tecton plan or apply. You can now directly run the plan hooks with the new tecton test command, rather than having to run the entire tecton plan.

tecton plan status updates and performance optimizations

Running tecton plan can take a few minutes, especially when creating a new workspace. The command line interface will now print out status updates as it progresses through the validation steps. We have additionally added a few performance optimizations to shorten the validation time for large feature repositories.

tecton plan status update

Feature Table ingestion improvements

Running FeatureTable.ingest() will now create a new Spark job cluster to process the data input. This change adds several product improvements:

  • Automatic retries in case of ephemeral errors.
  • Ability to ingest larger datasets.
  • Ability to specify a DatabricksClusterConfig or EMRClusterConfig in case more resources are necessary to complete the ingestion.
  • View ingestion status and history in the Materialization tab.

feature table materialization status

August 13, 2021

Creating Live Workspaces (Beta)

Previously, features would only be materialized to the online or offline store in the prod workspace.

You can now create additional workspaces with automatic materialization by using the flag --live. Please see the documentation for more detail.

This new capability allows different groups within the same organization to operate independent feature stores, or test new features that rely on materialization without affecting production.

Remember that materialization can incur additional Tecton and infrastructure costs. Tecton recommends only applying features with online=True when you plan to access the feature online.

--json-out flag for tecton plan (beta)

Tecton recommends integrating with your CI/CD processes for deploying changes to your production feature store.

With tecton plan --json-out, Tecton returns the expected diff in JSON format so that you can more easily parse the output in your automation.

See the documentation for more detail.

August 2, 2021

New API Key management commands

The CLI "API key" commands have been refactored as sub-commands for consistency and clearer help docs.

$ tecton -h
...
commands:
  api-key create         create a new API key
  api-key delete         deactivate an API key by its ID
  api-key list           list active API keys
...

The previous CLI commands (e.g. tecton create-api-key) have been deprecated but are still supported.

tecton.get_feature_vector() metadata

The get_feature_vector() method now returns more metadata, specifically effective feature times and SLO information.

To access SLO information:

fs = tecton.get_feature_service('...name...')
keys = {'user_id': '10'}
slo_info = fs.get_feature_vector(keys).slo_info

To access effective feature times use the return_effective_times parameter:

fs = tecton.get_feature_service('...name...')
keys = {'user_id': '10'}
feature_vector = fs.get_feature_vector(keys)

response_dict = feature_vector.to_dict(return_effective_times=True)
response_pandas = feature_vector.to_pandas(return_effective_times=True)
response_numpy = feature_vector.to_numpy(return_effective_times=True)

New tecton.get_feature_freshness() method for viewing freshness status in a Notebook

There is now a tecton.get_feature_freshness() method that returns the freshness status of all features in a given workspace. This returns the same information as the CLI command tecton freshness:

import tecton
tecton.get_feature_freshness('workspace_name')

To get the data returned as a Python dictionary, set the to_dict parameter:

freshness_dict = tecton.get_feature_freshness('workspace_name', to_dict=True)

July 9, 2021

Feature Monitoring Summary Dashboard

In addition to monitoring the materialization status for individual Feature Views, we've now added a summary dashboard to easily see if any of the Feature Views in a workspace are stale or have failing materialization jobs.

To view this dashboard, just click on Features in the left-hand navigation, then select the Monitoring Summary tab.

Feature Monitoring Summary

Databricks Runtime 6.4 Extended Support

Tecton Notebook clusters are now configured by default to run on DBR 6.4 Extended Support in order to stay on an officially supported runtime.

The option to use Databricks Runtimes 8+ is coming soon.

June 25, 2021

Unlock users for incorrect password attempts

Admins can now Unlock users who have been locked out for too many incorrect password attempts using the Admin Console.

See the User Management instructions for details on how to unlock a user.

Tecton SDK on PyPI

The Tecton SDK is now being published to PyPI at https://pypi.org/project/tecton/ for easier installation.

See the CLI Setup Instructions for details on how to install the Tecton package.

June 7, 2021

Continuous mode for StreamWindowAggregateFeatureView

Continuous mode for Stream Window Aggregate Feature Views now enables new event data to be included in feature values in less than a second. This low ingestion latency can dramatically improve model performance for many use-cases, such as fraud detection or product recommendations.

To take advantage of continuous processing mode, set aggregation_slide_period="continuous" in your feature view definition.

May 24, 2021

Framework v2

We're excited to introduce you to Tecton's Framework v2! While the core concepts remain the same, we've improved the API and added popular new features based on user feedback.

  1. Transformation definitions now have
    • Streamlined authoring for single-transformation features; and
    • Flexible pipelines for composition and re-use.
  2. On Demand Feature Views (formerly Online Feature Packages) can now combine current request data with materialized batch or stream features.
  3. BatchFeatureViews (formerly TemporalFeaturePackage) can join multiple batch sources in their transformations.
  4. Backfilling for Batch Feature Views (formerly TemporalFeaturePackage) with data look back will be much more efficient.
  5. New object names make it easier to differentiate batch and stream processing.

May 3, 2021

Spark Configuration Options

Spark configuration options can now be added to EMRClusterConfig or DatabricksClusterConfig. This can be helpful if you're looking to materialize a particularly large dataset, and running into limitations on memory in Spark. The following options are currently supported:

  • spark.driver.memory
  • spark.driver.memoryOverhead
  • spark.executor.memory
  • spark.executor.memoryOverhead

MaterializationConfig(
    online_enabled=True,
    offline_enabled=True,
    feature_start_time=datetime(2021, 1, 1),
    batch_materialization=EMRClusterConfig(
        instance_type="m4.xlarge",
        spark_config={
            "spark.executor.memory": "2g",
            "spark.driver.memory": "2g",
        },
    ),
)

April 26, 2021

  • Schema Override for FileDSConfig: FileDSConfig now supports an optional schema_override parameter, which can be used to specify a schema with a pyspark.sql.types.StructType object. If the parameter is set, the schema is used explicitly whenever Tecton reads from the file instead of being inferred automatically. This can be helpful if your file contains a column type that Spark doesn't support, for example INT64 (TIMESTAMP_MICROS).

FileDSConfig(
    uri='s3://ad-impressions-data/ctr_events.pq',
    file_format="parquet",
    schema_override=(
        pyspark.sql.types.StructType()
        .add("ad_id", pyspark.sql.types.LongType(), True)
        .add("user_uuid", pyspark.sql.types.StringType(), True)
        .add("timestamp", pyspark.sql.types.TimestampType(), True)
        .add("clicked", pyspark.sql.types.LongType(), True)
    ),
)

April 6, 2021

  • Online Feature Logging: Feature Services now have the ability to continuously log online requests and feature vector responses as Tecton Datasets. These logged feature datasets can be used for auditing, analysis, training dataset generation, and spine creation.

    Feature Logging Diagram

    To enable feature logging on a FeatureService, add a LoggingConfig as in the example below and optionally specify a sample rate. You can also set log_effective_times=True to log the feature timestamps from the Feature Store. As a reminder, Tecton will always serve the latest stored feature values as of the time of the request.

    Run tecton apply to apply your changes.

    from tecton import LoggingConfig
    
    ctr_prediction_service = FeatureService(
        name='ctr_prediction_service',
        features=[
            ad_ground_truth_ctr_performance_7_days,
            user_total_ad_frequency_counts
        ],
        logging=LoggingConfig(
            sample_rate=0.5,
            log_effective_times=False
        )
    )
    

    This will create a new Tecton Dataset under the Datasets tab in the Web UI. New feature logs are appended to this dataset every 30 minutes. If the features in the Feature Service change, a new dataset version will be created.

    Logged Features

    This dataset can be fetched in a notebook using the code snippet below.

    import tecton
    dataset = tecton.get_dataset('ctr_prediction_service.logged_requests.4')
    display(dataset.to_spark())
    

    Logged Features Dataset

  • Easier Dataset Retrieval: All Datasets can now be retrieved by name using the method below:

    import tecton
    dataset = tecton.get_dataset('my_dataset')
    display(dataset.to_spark())
    

March 5, 2021

  • Self-Serve User Management: Tecton now offers a self-service user management portal through the Web UI. Navigate to the Admin Console by clicking on your avatar at the top right of the screen as pictured below: Admin Panel

    From the Admin Console, cluster administrators can add new users or remove existing users from their Tecton instance. Self Serve User Management

February 17, 2021

  • Faster FileDataSource Validation: FileDSConfig now supports an optional new schema_uri parameter, which significantly decreases tecton plan and tecton apply latency. This parameter allows users to specify a specific subpath within the data source URI that will be used as a de facto example of the schema. Example:
      clicks_sleepnumber_file = FileDSConfig(
        uri="s3://acme-my-data/batch_events/",
        file_format="parquet",
        schema_uri="s3://acme-my-data/batch_events/date=2020-06-09/part-00000-tid-644275213368719145-d040e31e-d1c6-42f1-8677-4e97f09df7bc-1358-1.c000.parquet",
      )
    
    Normally, FileDSConfig schema inference recurses through all partitions within a path (e.g. "s3://acme-my-data/batch_events/" above), which can be very expensive for large datasets with fine-grained partitioning. This process is not required if a Hive metastore such as Glue is used, since all partitions are already known.

January 22, 2021

New Features

  • Workspace Home Page: A workspace dashboard is now available through our web UI. Navigate to your cluster.tecton.ai URL or click on the Tecton logo in the top left-hand corner of the screen to go to the new dashboard. This dashboard provides a high-level overview of the Tecton Objects in your workspace, changes to your workspace, and quick links to helpful resources. Workspace Dash

January 15, 2021

New Features

  • Limited Destructive Updates: Changing feature_start_time, online_enabled, or offline_enabled in MaterializationConfig for TemporalFeaturePackages (TFPs) and TemporalAggregateFeaturePackages (TAFPs) is no longer a destructive update. Changing these parameters schedules the additional jobs necessary to fill in the gaps, without destroying existing materialized data or causing serving downtime.
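
To illustrate what "filling in the gaps" can mean when feature_start_time is moved earlier, the sketch below computes the time windows that would still need to be materialized. It is a conceptual model only, not Tecton's scheduler: the gap_windows helper is hypothetical, and it assumes one job per schedule interval with a shorter final window if the gap does not divide evenly.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def gap_windows(old_start: datetime, new_start: datetime,
                interval: timedelta) -> List[Tuple[datetime, datetime]]:
    """Hypothetical helper: windows covering [new_start, old_start)."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    windows = []
    cursor = new_start
    # Walk forward from the new start until the already-materialized range begins.
    while cursor < old_start:
        end = min(cursor + interval, old_start)
        windows.append((cursor, end))
        cursor = end
    return windows


# feature_start_time moved from January 8 back to January 1, with 3-day jobs.
for start, end in gap_windows(datetime(2021, 1, 8), datetime(2021, 1, 1),
                              timedelta(days=3)):
    print(start.date(), "->", end.date())
```

Data from the old start time onward is left untouched in this model, which is the property that makes the update non-destructive.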

Breaking Changes

  • @online_transformation arguments must now be named identically to RequestContext schema fields to prevent accidentally swapping arguments.
      rc = RequestContext(
        schema={
          "field_A": StringType()
        }
      )
    
      # OK
      @online_transformation(request_context=rc, output_schema=output_schema)
      def ad_is_displayed_as_banner_transformer(field_A: pandas.Series):
        pass
    
      # Error: RequestContext schema fields ['field_A'] do not
      # match transformation function arguments ['field_X'].
      @online_transformation(request_context=rc, output_schema=output_schema)
      def ad_is_displayed_as_banner_transformer(field_X: pandas.Series):
        pass
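
The kind of check described above can be approximated in a few lines of plain Python. This is a sketch of the idea only, not Tecton's validation code: check_argument_names is a hypothetical helper, and the schema is represented as a plain list of field names rather than a RequestContext.

```python
import inspect
from typing import Callable, Iterable


def check_argument_names(fn: Callable, schema_fields: Iterable[str]) -> None:
    """Raise ValueError unless fn's argument names equal the schema field names."""
    expected = sorted(schema_fields)
    actual = sorted(inspect.signature(fn).parameters)
    if expected != actual:
        raise ValueError(
            f"RequestContext schema fields {expected} do not match "
            f"transformation function arguments {actual}."
        )


def matching_transformer(field_A):
    pass


def mismatched_transformer(field_X):
    pass


check_argument_names(matching_transformer, ["field_A"])  # passes silently
try:
    check_argument_names(mismatched_transformer, ["field_A"])
except ValueError as error:
    print(error)
```

Matching by name rather than by position is what prevents two fields of the same type from being swapped silently.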
    

January 8, 2021

Breaking Changes

  • The Tecton Object .get() accessor methods are now deprecated. To fetch a Tecton Object, please use the newer workspace methods such as workspace.get_entity("entity_name"):
      from tecton import *
      workspace = get_workspace("prod")
      fp = workspace.get_feature_package("my_fp")
      fs = workspace.get_feature_service("my_fs")
      e = workspace.get_entity("my_entity")
      t = workspace.get_transformation("my_transform")
      vds = workspace.get_virtual_data_source("my_vds")
    

December 23, 2020

New Features

  • New Monitoring, Alerting, and Debugging Tools: Monitoring, alerting, and debugging tools are now available to ensure that production FeaturePackages remain in a healthy state. A brief list of the new tools is below; more information can be found in the documentation.

    • Add alert_email to the MonitoringConfig of a FeaturePackage to enable email alerts. Alert Email
    • Navigate to the "Materialization" tab in the Web UI to see new information on the status of processing jobs and other helpful information.
    • Use the tecton materialization-status [FP_NAME] command in the CLI to retrieve more detailed materialization processing job information for a specific FeaturePackage.
    • Use the tecton freshness command in the CLI to retrieve cluster-level freshness information for all production FeaturePackages.
  • Feature Repo File Paths: All Tecton objects now show their Feature Repo file path in the UI to make them easier to discover and edit. Repo Link

  • FeatureService Metadata API: FeatureServices have a new metadata API for fetching information about expected input parameters and returned features.
      curl -X POST https://staging.tecton.ai/v1/feature-service/metadata -H "Authorization: Tecton-key $API_KEY" -d\
      '{ "params": { "feature_service_name": "yolo2" } }'
    
      {
        "featureServiceType": "DEFAULT",
        "inputRequestContextKeys": [{"name": "device_type", "type": "string"}, {"name": "a", "type": "string"}],
        "featureValues": [{"name": "oofp.is_mobile_device", "type": "boolean"}]
      }
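
For callers who prefer Python to curl, the same request can be assembled with the standard library. The sketch below only builds the request object, mirroring the URL, header, and body of the curl example above; the build_metadata_request helper and the placeholder API key are assumptions for illustration, and sending the request is left as a commented-out step.

```python
import json
import urllib.request


def build_metadata_request(base_url: str, api_key: str,
                           feature_service_name: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the FeatureService metadata endpoint."""
    body = json.dumps({"params": {"feature_service_name": feature_service_name}})
    return urllib.request.Request(
        url=f"{base_url}/v1/feature-service/metadata",
        data=body.encode("utf-8"),
        headers={"Authorization": f"Tecton-key {api_key}"},
        method="POST",
    )


# "my-api-key" is a placeholder; use a real key from your cluster.
request = build_metadata_request("https://staging.tecton.ai", "my-api-key", "yolo2")

# To send it and parse the JSON response:
# with urllib.request.urlopen(request) as response:
#     metadata = json.load(response)
```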
    

December 7, 2020

New Features

  • PushFeaturePackages: Users can now create PushFeaturePackages to ingest features generated outside of Tecton and load them into the offline and online Feature Stores for training or prediction. For a detailed example, check out Pushing Feature Values into Feature Stores.

      import tecton
      import pandas
    
      fp = tecton.get_feature_package('user_purchases_push_fp')
    
      pandas_df = pandas.DataFrame([{
        "timestamp": pandas.Timestamp("2020-09-18 12:00:06", tz="UTC"),
        "userid": "u123",
        "num_purchases": 91
      }])
    
      fp.ingest(pandas_df)
    

  • Improved Documentation and Usability: The Tecton documentation has been updated with easier navigation and a focus on practical examples. Recently, the Tecton team has also shipped a large number of usability improvements throughout the product including bug fixes, better error messages, better validations, and more.

November 6, 2020

New Features

  • FeaturePackage Freshness Monitoring: FeaturePackages now show their "actual freshness" value under the "Materialization" tab. Actual Freshness

October 30, 2020

New Features

  • Tecton Feature Freshness CLI Overview: Users can now view the freshness of all features in the CLI by running tecton freshness.
    $ tecton freshness
               Feature Package               Stale?   Freshness   Expected Freshness     Created At
    =================================================================================================
    ad_ground_truth_ctr_performance_7_days   N        14h 40m     2d                   10/01/20 2:25
    user_ad_impression_counts                N        40m 24s     2h                   10/01/20 2:16
    content_keyword_ctr_performance:v2       N        40m 25s     2h                   09/04/20 22:22
    ad_group_ctr_performance                 N        40m 26s     2h                   08/26/20 12:52
    ad_is_displayed_as_banner                -        -           -                    07/24/20 13:51
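
The relationship between the Freshness, Expected Freshness, and Stale? columns can be sketched in plain Python. This is an illustration under an assumed rule (a feature is stale when its freshness exceeds the expected freshness); the parse_duration and is_stale helpers are hypothetical and are not part of the Tecton CLI or SDK.

```python
import re
from datetime import timedelta

_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def parse_duration(text: str) -> timedelta:
    """Parse strings such as '14h 40m' or '2d' into a timedelta."""
    parts = re.findall(r"(\d+)([dhms])", text)
    if not parts:
        raise ValueError(f"unrecognized duration: {text!r}")
    return timedelta(seconds=sum(int(n) * _UNIT_SECONDS[u] for n, u in parts))


def is_stale(freshness: str, expected: str) -> bool:
    """Assumed rule: stale when actual freshness exceeds the expected value."""
    return parse_duration(freshness) > parse_duration(expected)


# First row of the table above: 14h 40m of freshness against an expected 2d.
print(is_stale("14h 40m", "2d"))
```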
    

October 19, 2020

New Features

  • Snowflake Data Sources: Tecton now supports Snowflake as a data source!
      click_stream_snowflake_ds = SnowflakeDSConfig(
        url="https://[your-cluster].eu-west-1.snowflakecomputing.com/",
        database="YOUR_DB",
        schema="CLICK_STREAM_SCHEMA",
        warehouse="COMPUTE_WH",
        table="CLICK_STREAM",
      )
    
      transaction_snowflake_vds = VirtualDataSource(
        name="click_stream_snowflake_vds",
        batch_ds_config=click_stream_snowflake_ds,
      )
    

October 9, 2020

New Features

  • Web UI Materialization Job Monitoring: Materialization jobs are now displayed to help monitor FeaturePackages.

    The easiest way to check the health of a materialized FeaturePackage is now through the Web UI. Navigate to the FeaturePackage in question and switch to the "Materialization" tab to see FeaturePackage materialization diagnostics at a glance.

    The new "Materialization Jobs" table displays the most relevant information about a FeaturePackage's materialization jobs. Retried jobs are grouped into rows, and the most recent job's status is displayed. Visit the "Run Page" for a row to view more specific job information or use the SDK to dive deeper into Materialization Jobs.

    Materialization Status UI

  • Easier Tecton CLI Login: Users can now log into the Tecton CLI simply by running tecton login [cluster URL]. This will automatically open a browser tab to authenticate. tecton configure will now be deprecated, as users no longer need to set keys manually. Tecton Login

September 25, 2020

New Features

  • Improvements to SDK Materialization Status Monitoring: When running fp.materialization_status(verbose=True), users will now also see two additional columns for each run: "TERMINATION_REASON" and "STATE_MESSAGE". These columns should provide more information for failed materialization runs.
  • Simpler Feature Service Definitions: online_serving_enabled is now set to True by default in FeatureServices, making the typical FeatureService definition simpler. Set online_serving_enabled=False if you want to create a batch-only FeatureService.
  • Bug fixes in Saved Datasets.

September 18, 2020

New Features

  • Materialization Status Improvements: The Materialization Status graph in a FeaturePackage's Materialization tab now shows better descriptors that make it clear which bar is related to Streaming vs Batch data. Hover over a bar to view its descriptor. Materialization Status
  • In the interactive SDK, users can now pass a flag to only show materialization errors by calling my_feature_package.materialization_status(only_errors=True)

September 11, 2020

New Features

  • Feature Package entities are now hyperlinks to specific Entity pages. Entity Links

September 4, 2020

New Features and Breaking Changes

  • Quicker Workspace Iteration: Users no longer have to confirm destructive changes when running tecton apply in non-prod workspaces. These safety checks are unnecessary because non-prod workspaces do not contain materialized data and can be easily restored to a prior state.
  • The default_join_keys parameter in the Entity class has been renamed to join_keys. default_join_keys will be deprecated.
      partner_entity = Entity(name="PartnerWebsite", join_keys=["partner_id"], description="The partner website participating in the ad network.")
    

August 21, 2020

New Features

  • Feature Summary Statistics: Tecton now computes and displays data summary statistics in the Web UI for features that have offline materialization enabled. Summary Stats

August 14, 2020

New Features

  • Feature Freshness Custom Monitoring: Users now can customize the freshness monitoring of their Feature Packages using MonitoringConfig. The Web UI will also reflect these configurations.

    In your Tecton declarative API configuration file, import MonitoringConfig to specify how your materialized Feature Package should be monitored. If you don't provide this config, we will compute a default threshold.

      from tecton import MaterializationConfig, MonitoringConfig, TemporalFeaturePackage
    
      ...
    
      my_feature_package = TemporalFeaturePackage(
        name="my_feature_package",
        ...
        materialization=MaterializationConfig(
            schedule_interval="3d",
            ...
        ),
        monitoring_config=MonitoringConfig(
            monitor_freshness=True,
            expected_feature_freshness="2w"
        )
      )
    
    You can then find this in the Materialization tab on a Feature Package page in the Web UI.

    Transform FCOs

August 6, 2020

New Features

  • Kafka and Redshift Data Sources: Tecton now supports Kafka and Redshift as data sources!

July 31, 2020

New Features

  • First Class Transformations: Transformations are now first-class objects in Tecton. They can be cataloged with metadata, viewed in the UI, and fetched in a notebook. Transform FCOs
      import tecton
    
      # Prod workspace
      tecton.get_transformation('my_transformation')
    
      # Specified workspace
      ws = tecton.get_workspace('my_ws')
      ws.get_transformation('my_transformation')
    
      Property                                   Value
      ================================================================================
      name          my_transformation
      description   None
      created_at    2020-07-28 20:15:14
      defined_in    my/transformation.py
      owner         ravi
      type          SQL
      inputs        Transformations: ['transformation1']
                    Virtual Data Sources: None
      use_context   True
      transformer   def my_transformation(context, transformation1_view):
                        return f"""
                            SELECT
                                content_id,
                                SUM(clicked) as actual2,
                                to_timestamp('{context.feature_data_end_time}') as
                    timestamp
                            FROM
                                {transformation1_view}
                            GROUP BY
                                content_id
                        """
    

July 27, 2020

New Features and Breaking Changes

  • Simpler FeatureService Definitions: Specifying the features that are used in a FeatureService is now done in the constructor. Along with this change, the FeatureService.add() method has been deprecated. This change ensures that a FeatureService definition has a single source of truth, and makes the FeatureService class consistent with other Tecton classes.
    from tecton import FeatureService
    from feature_repo.features import my_package1, my_package2
    
    my_service = FeatureService(
        name='example_feature_service',
        features=[
            my_package1,
            my_package2
        ]
    )
    
  • Materialization parameter changes: When defining a FeaturePackage, all materialization parameters are now specified in a configuration class, MaterializationConfig. This change is expected to improve organization and reuse, since a MaterializationConfig can be reused across many FeaturePackages. Some parameters have been renamed for clarity and brevity.

    from datetime import datetime

    from tecton import TemporalFeaturePackage, MaterializationConfig
    
    ad_ground_truth_ctr_performance_7_days = TemporalFeaturePackage(
        name="ad_ground_truth_ctr_performance_7_days",
        transformation=ad_ground_truth_ctr_performance_7_days_transformer,
        entities=[e.ad_entity],
        data_source_configs=[data_sources.ad_impressions_batch_config],
        materialization=MaterializationConfig(
            online_enabled=True,
            feature_start_time=datetime(2020, 6, 19),
            schedule_interval='1day',
            serving_ttl='1day',
            data_lookback_period='7days'
        ),
    )
    
    The table below lists the full changes:

    Old                                New
    online_materialization_enabled     online_enabled
    offline_materialization_enabled    offline_enabled
    feature_store_start_time           feature_start_time
    batch_materialization_schedule     schedule_interval
    data_lookback                      data_lookback_period
    serving_tll                        serving_ttl
  • DataSource class names: DataSource class names have been shortened for brevity. The table below lists the full changes:

    Old                        New
    HiveDataSourceConfig       HiveDSConfig
    KinesisDataSourceConfig    KinesisDSConfig
    FileDataSourceConfig       FileDSConfig
  • Interactive and Declarative class split: Tecton has fully split its Interactive and Declarative Python classes. The Reference API now lists separate pages for Interactive classes (which are used in notebooks and returned from functions such as tecton.get_feature_package()) and Declarative classes (which are used to declare Tecton objects in a Feature Repository).

  • timestamp_key is now optional: When declaring a FeaturePackage, Tecton will now infer the timestamp_key argument when possible.
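
    The renames listed in the tables above can be applied to an existing feature repository mechanically. The sketch below is a hypothetical helper, not part of the Tecton SDK or CLI: it only rewrites identifiers in source text, using the old and new names exactly as they appear in the tables. Moving the renamed parameters into a MaterializationConfig still has to be done by hand.

```python
import re

# Old -> new names, copied from the tables in this release note.
PARAMETER_RENAMES = {
    "online_materialization_enabled": "online_enabled",
    "offline_materialization_enabled": "offline_enabled",
    "feature_store_start_time": "feature_start_time",
    "batch_materialization_schedule": "schedule_interval",
    "data_lookback": "data_lookback_period",
    "serving_tll": "serving_ttl",
}
CLASS_RENAMES = {
    "HiveDataSourceConfig": "HiveDSConfig",
    "KinesisDataSourceConfig": "KinesisDSConfig",
    "FileDataSourceConfig": "FileDSConfig",
}


def rename_identifiers(source: str) -> str:
    """Replace whole-word occurrences of old names with their new names.

    Hypothetical migration helper: it performs a textual rename only.
    """
    renames = {**PARAMETER_RENAMES, **CLASS_RENAMES}
    # \b stops "data_lookback" from matching inside "data_lookback_period",
    # so running the helper twice leaves already-migrated code unchanged.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, renames)) + r")\b")
    return pattern.sub(lambda match: renames[match.group(1)], source)
```

    Because only whole identifiers are matched, the helper can be run repeatedly over the same file without changing names that have already been migrated.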

July 6, 2020

New Features

  • Metadata tagging: Tecton users now have the ability to add metadata tags to VirtualDataSources, Entities, Feature Packages, and Feature Services. Simply pass a Python dictionary containing all tags via the tags parameter to their constructors. These tags will show up in the Tecton Web UI.
    my_feature_package = TemporalFeaturePackage(
        name="my_example_temporal_feature_package",
        ...
        tags={
            'tag_key': 'tag_value',
            'experimental': 'true',
        }
    )
    
  • Plan Hooks for OnlineTransformation testing: Tecton supports Plan Hooks that run automatically every time tecton plan or tecton apply is run. This lets you trigger custom behavior at key points in the Tecton workflow. Plan Hooks are well suited to unit testing OnlineTransformations, where errors would otherwise only be caught at runtime.
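
    As a rough illustration of the kind of check a Plan Hook could run, the sketch below unit tests a small transformation function before a plan is applied. The hook interface is not described in this note, so everything here is an assumption: the run() entry point, the convention that 0 means success, and the is_high_value_transaction function are all hypothetical and not taken from the Tecton SDK.

```python
# Hypothetical online transformation logic, written as a plain Python
# function so it can be tested without a Tecton cluster.
def is_high_value_transaction(amount: float, threshold: float = 1000.0) -> int:
    """Return 1 when the amount meets or exceeds the threshold, else 0."""
    return int(amount >= threshold)


def run() -> int:
    """Hypothetical hook entry point.

    Assumed convention: return 0 when every check passes and a non-zero
    value when any check fails, so the calling workflow can stop early.
    """
    cases = [
        (1500.0, 1),  # well above the threshold
        (1000.0, 1),  # exactly at the threshold
        (10.0, 0),    # well below the threshold
    ]
    for amount, expected in cases:
        actual = is_high_value_transaction(amount)
        if actual != expected:
            print(f"FAILED: amount={amount} expected={expected} got={actual}")
            return 1
    return 0
```

    Keeping the transformation logic in a plain function is what makes this kind of pre-apply test cheap: the check needs no Spark session or live data.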

June 25, 2020

New Features and Breaking Changes

  • Configurable offline and online materialization: Users can now independently configure offline and online materialization for a FeaturePackage in order to optimize costs and store exactly the data that is needed.

    This is enabled via the new parameters online_materialization_enabled and offline_materialization_enabled. The materialization_enabled parameter has been removed.

    # Example: online materialization is required for serving,
    # but offline materialization for historical look-up is not required.
    
    my_feature_package = TemporalFeaturePackage(
        name='my_feature',
        ...
        online_materialization_enabled=True,
        offline_materialization_enabled=False,
    )
    
  • Easier experimentation with Feature Services: Users can now create FeatureServices that depend on FeaturePackages which are not materializing, allowing for richer experimentation before materialization is enabled.

    This is enabled via the new online_serving_enabled parameter on FeatureService, which configures whether a FeatureService can serve feature values online. By setting online_serving_enabled to False, users can now create FeatureServices with non-materializing FeaturePackages.

    online_serving_enabled defaults to False, meaning that the default behavior of FeatureServices has changed.

    # to use a FeatureService for online queries, online_serving_enabled
    # must be explicitly set to True.
    
    my_feature_service = FeatureService(
        name='feature_service_for_online_use',
        ...,
        online_serving_enabled=True
    )
    

June 11, 2020

New Features

  • Faster Tecton CLI: The Spark driver initialization has been removed from the CLI, making it much quicker to run tecton plan and tecton apply. Try it out in the latest SDK!
  • Workspace names are now included in the Web UI URL to enable direct linking to objects in a Workspace.
    Workspaces URL

New Features and Breaking Changes

  • The command for creating a Workspace has been changed from tecton workspace new [workspace] to tecton workspace create [workspace]. For a complete list of Workspace commands, check out Using Workspaces.

May 11, 2020

New Features

  • Workspaces: Users can now define separate Tecton Workspaces, each of which offers an isolated environment for experimental iteration. Workspaces are designed to work well with code branches. To get started, enter the CLI commands below, then navigate to your new workspace in the Tecton Web UI. You can find detailed documentation on Workspaces here.
    $ git checkout -b [name]
    $ tecton workspace create [name]
    $ tecton apply
    
  • File Data Sources now support Parquet and CSV file formats.
  • The Tecton CLI now provides more helpful error messages that indicate where offending objects are defined.

Bug Fixes

  • http://[domain].tecton.ai now correctly redirects to https instead of hanging.