test_dataset.py

```python
def test_dataset(df):
    """Test dataset quality and integrity."""
    # `df` is assumed to be a Great Expectations (legacy API) PandasDataset,
    # e.g. supplied by a pytest fixture; each expect_* call records one expectation.
    column_list = ["id", "created_on", "title", "description", "tag"]
    df.expect_table_columns_to_match_ordered_list(column_list=column_list)  # schema adherence
    tags = ["computer-vision", "natural-language-processing", "mlops", "other"]
    df.expect_column_values_to_be_in_set(column="tag", value_set=tags)  # expected labels
    df.expect_compound_columns_to_be_unique(column_list=["title", "description"])  # data leaks
    df.expect_column_values_to_not_be_null(column="tag")  # missing values
    df.expect_column_values_to_be_unique(column="id")  # unique values
    df.expect_column_values_to_be_of_type(column="title", type_="str")  # type adherence

    # Expectation suite: keep failed expectations too, then validate them all together
    expectation_suite = df.get_expectation_suite(discard_failed_expectations=False)
    results = df.validate(expectation_suite=expectation_suite, only_return_failures=True).to_json_dict()
    assert results["success"]
```
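For readers without Great Expectations installed, the six checks in `test_dataset` can be approximated in plain Python. This is a minimal, dependency-free sketch, not the library's behavior: it assumes the data is a list of dicts (one per row) plus a separate list of column names, and the helper `check_dataset` and the constants below are hypothetical names introduced only for illustration. It does not try to reproduce every detail of Great Expectations' semantics (for example, how nulls are treated inside each check).

```python
from typing import Any, Dict, List

# Hypothetical stand-ins for the values hard-coded in test_dataset.
EXPECTED_COLUMNS = ["id", "created_on", "title", "description", "tag"]
EXPECTED_TAGS = {"computer-vision", "natural-language-processing", "mlops", "other"}


def check_dataset(rows: List[Dict[str, Any]], columns: List[str]) -> Dict[str, bool]:
    """Return a pass/fail flag per check (hypothetical helper, not a GE API)."""
    results: Dict[str, bool] = {}
    # Schema adherence: column names must match the expected list, in order.
    results["schema"] = columns == EXPECTED_COLUMNS
    # Expected labels: every tag must come from the allowed set (None fails here).
    results["labels"] = all(row.get("tag") in EXPECTED_TAGS for row in rows)
    # Data leaks: no two rows may share the same (title, description) pair.
    pairs = [(row.get("title"), row.get("description")) for row in rows]
    results["no_leaks"] = len(pairs) == len(set(pairs))
    # Missing values: tag must never be None.
    results["tag_not_null"] = all(row.get("tag") is not None for row in rows)
    # Unique values: ids must not repeat.
    ids = [row.get("id") for row in rows]
    results["unique_id"] = len(ids) == len(set(ids))
    # Type adherence: every title must be a str.
    results["title_is_str"] = all(isinstance(row.get("title"), str) for row in rows)
    return results


# Usage with two made-up rows; all checks should pass.
rows = [
    {"id": 1, "created_on": "2020-02-20", "title": "A", "description": "x", "tag": "mlops"},
    {"id": 2, "created_on": "2020-02-21", "title": "B", "description": "y", "tag": "other"},
]
report = check_dataset(rows, EXPECTED_COLUMNS)
assert all(report.values())  # mirrors `assert results["success"]`
```

Returning one flag per check, rather than a single boolean, keeps the failure report readable in the same spirit as `only_return_failures=True`.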