Accelerate ML feature pipelines with new capabilities in Amazon SageMaker Feature Store
Amazon SageMaker Feature Store is a fully managed, purpose-built repository to store, share, and manage features for machine learning (ML) models. It now supports Apache Iceberg table format, streaming ingestion, scalable batch ingestion, and fine-grained access control through AWS Lake Formation.

As organizations scale their machine learning platforms from experimentation to production, two operational challenges consistently surface. The first is securing access to sensitive feature data without introducing manual overhead for every new feature group. The second is keeping storage costs predictable when high-frequency streaming workloads generate ever-growing volumes of Apache Iceberg metadata.

For example, one retail analytics team discovered that their Apache Iceberg-based offline store had accumulated over 50 TB of metadata files in under a year, driving substantial and unexpected Amazon Simple Storage Service (Amazon S3) charges. Meanwhile, infrastructure teams across industries told us they need Lake Formation-enforced access control on feature data that works automatically at the point of feature group creation. They don’t want it as an afterthought requiring repetitive manual configuration.

Today, we’re announcing three new capabilities available in SageMaker Python SDK v3.8.0 that address these challenges:

- Native AWS Lake Formation integration – Register your offline store with Lake Formation during feature group creation, or for existing feature groups, to enforce column-level, row-level, and cell-level access control. No manual Lake Formation setup required.
- Additional Apache Iceberg table properties – Control metadata retention and snapshot lifecycle policies at feature group creation or on existing feature groups to prevent metadata accumulation and reduce storage costs.
- Feature Store support in SageMaker Python SDK v3 – The modernized SDK v3.8.0 brings the full set of Feature Store capabilities, including these new features, into a modular, faster, lighter-weight package.

In this post, we walk through each capability with code examples you can use to get started. For complete end-to-end walkthroughs, see the accompanying notebooks for Lake Formation governance and Iceberg table properties in the SageMaker Python SDK repository.

Prerequisites

To follow along with the examples in this post, you need:

- An AWS account with permissions to create Amazon SageMaker AI resources.
- An Amazon SageMaker AI execution role with access to Amazon S3, AWS Glue, and AWS Lake Formation.
- SageMaker Python SDK v3.8.0 or later. You can use the following command to install SageMaker:

    pip install --upgrade "sagemaker>=3.8.0"

- For Lake Formation integration: at least one Data Lake Administrator configured in your account. Feature Store validates this before activating access control.
- An existing Amazon S3 bucket for offline store data.

Solution overview

These capabilities are delivered through new parameters in the SDK v3 FeatureGroupManager.create() and FeatureGroupManager.update() calls. The LakeFormationConfig triggers automatic access control setup, and the IcebergProperties configures metadata lifecycle. Both can be set at feature group creation time or applied to existing feature groups.

Feature Store in SageMaker Python SDK v3

SageMaker Python SDK v3.8.0, released April 16, 2026, is the foundation for the capabilities described in this post.…
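
Because every capability in this post depends on SageMaker Python SDK v3.8.0 or later, it can help to confirm the installed version before running any example. The following is a minimal sketch that uses only the Python standard library. The helper names parse_version and sdk_meets_minimum are hypothetical and are not part of the SDK, and the parser assumes a plain numeric major.minor.patch version string.

```python
from importlib import metadata

# Minimum SDK version named in the prerequisites of this post.
MIN_SDK_VERSION = (3, 8, 0)


def parse_version(text):
    """Turn a version string such as '3.8.0' into a tuple of three ints.

    Hypothetical helper: only the leading digits of each of the first
    three dot-separated parts are read, so '3.8.0rc1' becomes (3, 8, 0).
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '3.10' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def sdk_meets_minimum(package="sagemaker", minimum=MIN_SDK_VERSION):
    """Return True if `package` is installed at `minimum` or later."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is not installed in this environment.
        return False
    return parse_version(installed) >= minimum


if __name__ == "__main__":
    if sdk_meets_minimum():
        print("SageMaker Python SDK meets the minimum version.")
    else:
        print('Install or upgrade with: pip install --upgrade "sagemaker>=3.8.0"')
```

Comparing versions as integer tuples avoids the string-comparison pitfall where "3.10.0" would sort before "3.8.0". For pre-release or otherwise unusual version strings, a dedicated version-parsing library would be a more robust choice than this simplified parser.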

