
Manage Validations

To explore your data and fine-tune your Expectations, run an ad hoc Validation as described on this page. To run recurring Validations, use a schedule or an orchestrator.

Run an ad hoc Validation

Workflows for ad hoc Validations vary based on the following aspects of what you're validating:

  • Whether the Expectations are GX-managed or API-managed
  • Whether the scope is the entire Data Asset or a time-based subset of it
  • The Data Source type

Find your workflow

To help you find the right workflow for your particular combination of Expectation type, data scope, and Data Source, the summaries below cover each possible combination of factors, with a link to detailed instructions.

GX-managed Expectations, entire asset

  • Data Sources: AlloyDB, Amazon S3, Aurora, Citus, Databricks SQL, Neon, PostgreSQL, Redshift, Snowflake
    Workflow: Use the UI.
      • Click the Validate button.
    Jump to full instructions
  • Data Sources: Azure Blob Storage, BigQuery, Google Cloud Storage
    Workflow: Use the API.
      • Retrieve your Data Asset's GX-managed Checkpoint.
      • Run the Checkpoint.
    Jump to full instructions

GX-managed Expectations, time interval

  • Data Sources: AlloyDB, Aurora, Citus, Databricks SQL, Neon, PostgreSQL, Redshift, Snowflake
    Workflow: Use the UI.
      • Choose a Batch interval.
      • Click the Validate button and select a Batch to validate.
    Jump to full instructions
  • Data Sources: Amazon S3, Azure Blob Storage, Google Cloud Storage
    Workflow: Use the API.
      • Update your Data Asset's GX-managed Batch Definition to partition your data based on regex filename matching.
      • Retrieve your Data Asset's GX-managed Checkpoint.
      • Run the Checkpoint with Batch Parameters passed as strings.
    Jump to full instructions
  • Data Source: BigQuery
    Workflow: Use the API.
      • Update your Data Asset's GX-managed Batch Definition to partition your data based on values in a DATE or DATETIME column.
      • Retrieve your Data Asset's GX-managed Checkpoint.
      • Run the Checkpoint with Batch Parameters passed as integers.
    Jump to full instructions

API-managed Expectations, entire asset

  • Data Sources: All sources
    Workflow: Use the API.
      • Retrieve your Data Asset's GX-managed Batch Definition.
      • Create a Validation Definition to associate your API-managed Expectations with your Data Asset via its GX-managed Batch Definition.
      • Run the Validation Definition.
    Jump to full instructions

API-managed Expectations, time interval

  • Data Sources: AlloyDB, Aurora, BigQuery, Citus, Databricks SQL, Neon, PostgreSQL, Redshift, Snowflake
    Workflow: Use the API.
      • Create a Batch Definition to partition your data based on values in a DATE or DATETIME column.
      • Create a Validation Definition to associate your API-managed Expectations with your Data Asset via your Batch Definition.
      • Run the Validation Definition with Batch Parameters passed as integers.
    Jump to full instructions
  • Data Sources: Amazon S3, Azure Blob Storage, Google Cloud Storage
    Workflow: Use the API.
      • Create a Batch Definition to partition your data based on regex filename matching.
      • Create a Validation Definition to associate your API-managed Expectations with your Data Asset via your Batch Definition.
      • Run the Validation Definition with Batch Parameters passed as strings.
    Jump to full instructions

No matter how you run your Validations, you can view historical Validation Results in the GX Cloud UI.

GX-managed Expectations, entire asset

If your Data Source is one of the following, you can use the GX Cloud UI to validate GX-managed Expectations for your entire Data Asset:

  • AlloyDB
  • Amazon S3
  • Aurora
  • Citus
  • Databricks SQL
  • Neon
  • PostgreSQL
  • Redshift
  • Snowflake

For all Data Sources, you can use the GX Cloud API to validate GX-managed Expectations for your entire Data Asset; a minimal API sketch follows the procedure below.

Prerequisites

Procedure

  1. In the GX Cloud UI, select the relevant Workspace and then click Data Assets.

  2. In the Data Assets list, click the Data Asset name.

  3. Click Validate.

When the Validation is complete, you can view the results in the GX Cloud UI.
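
If your Data Source isn't listed above (Azure Blob Storage, BigQuery, or Google Cloud Storage), use the API instead: retrieve your Data Asset's GX-managed Checkpoint and run it. The following is a minimal sketch rather than the full instructions; the Checkpoint name shown is a hypothetical placeholder, so confirm the actual name of your GX-managed Checkpoint before running it.

Python
import great_expectations as gx

context = gx.get_context(mode="cloud")

# The Checkpoint name below is a hypothetical placeholder. List your
# Checkpoints with context.checkpoints.all() to find the actual name of
# your Data Asset's GX-managed Checkpoint.
checkpoint = context.checkpoints.get("my_data_asset Checkpoint")

# Run the Checkpoint to validate the entire Data Asset.
result = checkpoint.run()
print(result.success)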

GX-managed Expectations, time interval

If your Data Source is one of the following, you can use the GX Cloud UI to validate GX-managed Expectations for a time-based subset of your Data Asset:

  • AlloyDB
  • Aurora
  • Citus
  • Databricks SQL
  • Neon
  • PostgreSQL
  • Redshift
  • Snowflake

For all Data Sources, you can use the GX Cloud API to validate GX-managed Expectations for a time-based subset of your Data Asset. Note that the code is different for SQL Data Sources vs. filesystem Data Sources; a minimal sketch follows the procedure below.

Prerequisites

Procedure

To validate your data incrementally, you will first define how to partition your data into Batches and then select a specific time-based Batch to validate.

First, partition your data.

  1. In the GX Cloud UI, select the relevant Workspace and then click Data Assets.

  2. In the Data Assets list, click the Data Asset name.

  3. Next to the current Batch configuration, click Edit Batch (the pencil icon).

  4. Choose a Batch interval.

    • Year partitions Data Asset records by year.
    • Month partitions Data Asset records by year and month.
    • Day partitions Data Asset records by year, month, and day.
  5. Under Validate by, select the column that contains the DATE or DATETIME data to partition on.

  6. Click Save.

Then, you can validate a Batch of data.

  1. Click Validate.

  2. Select one of the following options to Specify a single Batch to validate:

    • Latest Batch. Note that the latest Batch may still be receiving new data. For example, if you are batching by day and have new data arriving every hour, the latest Batch will be any data that has arrived in the current day. The latest daily Batch is not necessarily a full 24 hours' worth of data.

    • Custom Batch, which lets you enter a specific period of time to validate based on how you've batched your data. For example, if you've batched your data by month, you'll be prompted to enter a Year-Month to identify the records to validate.

  3. Click Run.

When the Validation is complete, you can view the results in the GX Cloud UI.
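
For Amazon S3, Azure Blob Storage, Google Cloud Storage, and BigQuery, use the API instead: update your Data Asset's GX-managed Batch Definition as summarized in the workflow list at the top of this page, then retrieve and run the GX-managed Checkpoint with Batch Parameters. The following is a minimal sketch; the Checkpoint name is a hypothetical placeholder, and the Batch Parameter types depend on your Data Source type.

Python
import great_expectations as gx

context = gx.get_context(mode="cloud")

# Hypothetical Checkpoint name; list your Checkpoints with
# context.checkpoints.all() to find your GX-managed Checkpoint.
checkpoint = context.checkpoints.get("my_data_asset Checkpoint")

# SQL Data Sources such as BigQuery: pass Batch Parameters as integers.
checkpoint.run(batch_parameters={"year": 2024, "month": 1, "day": 30})

# Filesystem Data Sources such as Amazon S3 take Batch Parameters as strings:
# checkpoint.run(batch_parameters={"year": "2024", "month": "01", "day": "30"})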

API-managed Expectations, entire asset

To validate API-managed Expectations for your entire Data Asset, use the GX Cloud API. The process is the same regardless of your Data Source. You will first create a Validation Definition that links your data to your Expectations. Then you can run the Validation Definition to validate the referenced data against the associated Expectations for testing or data exploration. If you want to trigger Actions based on the Validation Results, you will add your Validation Definition to a Checkpoint that associates your tests with conditional logic for responding to results.

Prerequisites

Procedure

To help you validate API-managed Expectations on an entire Data Asset with the Cloud API, GX Cloud provides a GX-managed Batch Definition you can use to identify your data.

  1. Retrieve your Data Asset’s GX-managed Batch Definition.

    Python
    import great_expectations as gx

    context = gx.get_context(mode="cloud")

    data_source_name = "my_data_source"
    data_asset_name = "my_data_asset"
    batch_definition_name = f"{data_asset_name} - GX-Managed Batch Definition"

    batch_definition = (
        context.data_sources.get(data_source_name)
        .get_asset(data_asset_name)
        .get_batch_definition(batch_definition_name)
    )
  2. Retrieve your API-managed Expectation Suite.

    Python
    suite_name = "my_expectation_suite"
    suite = context.suites.get(name=suite_name)
  3. Create a Validation Definition that associates the Batch Definition with the Expectation Suite.

    Python
    definition_name = "my_validation_definition"
    validation_definition = gx.ValidationDefinition(
        data=batch_definition, suite=suite, name=definition_name
    )

    # Save the Validation Definition to your Cloud Data Context so it can be
    # retrieved by name later (as in step 5).
    validation_definition = context.validation_definitions.add(validation_definition)
  4. Run the Validation Definition.

    Python
    # The following is an example of running a Validation Definition for an
    # in-memory dataframe Data Asset; test_df is assumed to be a pandas
    # DataFrame you have already created.
    # If you are working with a SQL or filesystem Data Asset, omit the batch_parameters.
    batch_parameters = {"dataframe": test_df}
    validation_definition.run(batch_parameters=batch_parameters)
  5. Optional. Create a Checkpoint so you can trigger Actions based on the Validation Results of your API-managed Expectations.

    Python
    # Retrieve the Validation Definition
    validation_definition = context.validation_definitions.get("my_validation_definition")

    # Create a Checkpoint
    checkpoint_name = "my_checkpoint"
    checkpoint_config = gx.Checkpoint(
        name=checkpoint_name, validation_definitions=[validation_definition]
    )

    # Save the Checkpoint to the data context
    checkpoint = context.checkpoints.add(checkpoint_config)

    # Run the Checkpoint
    # The following is an example of running a Checkpoint for an in-memory dataframe Data Asset.
    # If you are working with a SQL or filesystem Data Asset, omit the batch_parameters.
    checkpoint.run(batch_parameters=batch_parameters)

When the Validation is complete, you can view the results in the GX Cloud UI.

API-managed Expectations, time interval

To validate API-managed Expectations for a time-based subset of a Data Asset, use the GX Cloud API. Note that the code is different for SQL Data Sources vs. filesystem Data Sources. You will first partition your data and create a Validation Definition that links your partitioned data to your Expectations. Then you can run the Validation Definition to validate the referenced data against the associated Expectations for testing or data exploration. If you want to trigger Actions based on the Validation Results, you will add your Validation Definition to a Checkpoint that associates your tests with conditional logic for responding to results.

Prerequisites

Procedure

The code for validating API-managed Expectations on a time-based subset of a Data Asset depends on your Data Source type. For SQL Data Sources, you partition your data based on values in a DATE or DATETIME column, as the procedure below demonstrates. For filesystem Data Sources, you partition your data based on regex filename matching; see the sketch after the procedure.

To validate your data incrementally, you will first define how to partition your data into Batches and then select a specific time-based Batch to validate.

  1. Retrieve your Data Asset.

    Python
    import great_expectations as gx

    context = gx.get_context(mode="cloud")

    data_source_name = "my_data_source"
    data_asset_name = "my_data_asset"

    ds = context.data_sources.get(data_source_name)
    data_asset = ds.get_asset(data_asset_name)
  2. Decide how you want to batch your data. Reference the table below to determine the method to use to achieve your goal.

    Goal                                         Method
    Partition records by year                    add_batch_definition_yearly
    Partition records by year and month          add_batch_definition_monthly
    Partition records by year, month, and day    add_batch_definition_daily
  3. Partition your data. This example demonstrates daily Batches with the add_batch_definition_daily method. Refer to the above table for methods for other types of Batches.

    Python
    batch_definition_name = "my_daily_batch_definition"
    date_column = "my_date_or_datetime_column"
    daily_batch_definition = data_asset.add_batch_definition_daily(
        name=batch_definition_name, column=date_column
    )
  4. Retrieve your API-managed Expectation Suite.

    Python
    suite_name = "my_expectation_suite"
    suite = context.suites.get(name=suite_name)
  5. Create a Validation Definition that associates your time-based Batch Definition with your API-managed Expectation Suite.

    Python
    definition_name = "my_validation_definition"
    validation_definition = gx.ValidationDefinition(
        data=daily_batch_definition, suite=suite, name=definition_name
    )

    validation_definition = context.validation_definitions.add(validation_definition)
  6. Run the Validation Definition with Batch Parameters passed as integers.

    Python
    batch_parameters_daily = {"year": 2019, "month": 1, "day": 30}

    validation_definition.run(batch_parameters=batch_parameters_daily)
  7. Optional. Create a Checkpoint so you can trigger Actions based on the Validation Results of your API-managed Expectations.

    Python
    # Retrieve the Validation Definition
    validation_definition = context.validation_definitions.get("my_validation_definition")

    # Create a Checkpoint
    checkpoint_name = "my_checkpoint"
    checkpoint_config = gx.Checkpoint(
        name=checkpoint_name, validation_definitions=[validation_definition]
    )

    # Save the Checkpoint to the data context
    checkpoint = context.checkpoints.add(checkpoint_config)

    # When you run the Checkpoint, pass Batch Parameters as integers
    batch_parameters_daily = {"year": 2019, "month": 1, "day": 30}

    checkpoint.run(batch_parameters=batch_parameters_daily)
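
The procedure above shows the SQL path. For filesystem Data Sources (Amazon S3, Azure Blob Storage, and Google Cloud Storage), steps 3 and 6 change: you create the Batch Definition from a regex that captures year, month, and day groups in your file names, and you pass Batch Parameters as strings. Steps 4 and 5 are unchanged, except that the Validation Definition is created from this regex-based Batch Definition. A minimal sketch, assuming hypothetical file names like data_2019-01-30.csv:

Python
# Step 3 variant: partition files by dates captured from the file name.
# The regex and file naming convention here are assumptions for illustration.
batch_definition_name = "my_daily_batch_definition"
daily_batch_definition = data_asset.add_batch_definition_daily(
    name=batch_definition_name,
    regex=r"data_(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})\.csv",
)

# Step 6 variant: after repeating steps 4 and 5 with this Batch Definition,
# run the Validation Definition with Batch Parameters passed as strings.
batch_parameters_daily = {"year": "2019", "month": "01", "day": "30"}
validation_definition.run(batch_parameters=batch_parameters_daily)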

When the Validation is complete, you can view the results in the GX Cloud UI.

View Validation run history

  1. In GX Cloud, select the relevant Workspace and then click Data Assets.

  2. In the Data Assets list, click the Data Asset name.

  3. Click the Validations tab.

  4. If you have multiple Expectation Suites, select the suite of interest.

  5. Do one or more of the following:

    • To view results for a specific Validation run, select an entry in the Batches & run history pane.

      • To view only Expectations that failed in the selected run, click Failures only.
    • To view the run history of all Validations, select All Runs. A graph shows the Validation run history for all columns.

      • To view details about a specific Validation run in the Validation timeline, including the observed values, hover over a success or failure severity icon.

      The details provided are: success status, severity, run time, Batch interval, Batch column, Batch name, and observed value.

    Run history details

    Depending on how your Data Assets are validated, you may find the following information on entries in the Batches & run history pane.

    • A calendar icon indicates a Validation run by a GX-managed schedule.
    • Batch information is included for any Validation run on a subset of a Data Asset.
  6. Optional. Click Share to copy the URL for the Validation Results and share them with other users in your workspace.