API
checks
This file contains the functions that perform the actual assertions.
You can also use these functions during interactive sessions,
typically via the DataFrame.pipe method.
checks.py
Each function in here should
- Take a DataFrame as its first argument, optionally followed by other arguments
- Make its assertion on the result
- Return the original DataFrame
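The three rules above can be sketched with a custom check. The function name `has_no_negative_values` and its `columns` parameter are hypothetical and not part of engarde. The example only illustrates the contract: take a DataFrame, assert, and return it unchanged so it can be chained with `DataFrame.pipe`.

```python
import pandas as pd


def has_no_negative_values(df, columns=None):
    """Hypothetical check: assert that the selected columns hold no negative values."""
    subset = df if columns is None else df[columns]
    # Make the assertion on the data...
    assert (subset >= 0).all().all(), "negative values found"
    # ...and return the original DataFrame so the method chain can continue.
    return df


df = pd.DataFrame({"A": [1, 2, 3], "B": [0.5, 1.5, 2.5]})

# The check sits in the middle of a method chain without changing the data.
result = df.pipe(has_no_negative_values, columns=["A"]).assign(C=lambda d: d["A"] * 2)
```

Because the check returns its input, it can be added to or removed from a pipeline without touching the surrounding steps.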
- engarde.checks.is_monotonic(df, items=None, increasing=None, strict=False)
  Asserts that the DataFrame is monotonic.
  Parameters:
    df : Series or DataFrame
    items : dict
      mapping of columns to conditions (increasing, strict)
    increasing : None or bool
      None accepts either increasing or decreasing.
    strict : bool
      whether the comparison should be strict
  Returns:
    df : DataFrame
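To make the `items`, `increasing`, and `strict` semantics concrete, here is a minimal stand-in written with plain pandas. `is_monotonic_sketch` is a hypothetical name, and the implementation is an assumption based on the parameter descriptions above, not engarde's own source.

```python
import pandas as pd


def is_monotonic_sketch(df, items=None, increasing=None, strict=False):
    """Assumed semantics: each column must be monotonic per its (increasing, strict) pair."""
    if items is None:
        # Apply the same condition to every column.
        items = {col: (increasing, strict) for col in df.columns}
    for col, (inc, strict_) in items.items():
        diffs = df[col].diff().dropna()
        up = (diffs > 0).all() if strict_ else (diffs >= 0).all()
        down = (diffs < 0).all() if strict_ else (diffs <= 0).all()
        # increasing=None accepts either direction.
        ok = (up or down) if inc is None else (up if inc else down)
        assert ok, f"column {col!r} is not monotonic"
    return df


readings = pd.DataFrame({"t": [1, 2, 3], "level": [5, 5, 4]})

is_monotonic_sketch(readings)                            # either direction, ties allowed
is_monotonic_sketch(readings, items={"t": (True, True)})  # strictly increasing
```

In this sketch the `level` column passes the non-strict check but would fail a strict one, because its first two values are equal.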
- engarde.checks.is_same_as(df, df_to_compare, **kwargs)
  Assert that two pandas DataFrames are equal.
  Parameters:
    df : pandas DataFrame
    df_to_compare : pandas DataFrame
    **kwargs : dict
      keyword arguments passed through to pandas' assert_frame_equal
  Returns:
    df : DataFrame
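Since the keyword arguments are forwarded to `assert_frame_equal`, it helps to see what such an argument does. The snippet below calls pandas' `assert_frame_equal` directly rather than engarde. `check_dtype` is an existing pandas option; the frame names are illustrative.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

left = pd.DataFrame({"A": [1, 2, 3]})         # integer column
right = pd.DataFrame({"A": [1.0, 2.0, 3.0]})  # same values stored as floats

# The default comparison treats the dtype difference as a mismatch.
try:
    assert_frame_equal(left, right)
    strict_equal = True
except AssertionError:
    strict_equal = False

# Relaxing the dtype comparison lets the value-wise comparison pass.
assert_frame_equal(left, right, check_dtype=False)
```

Passing `check_dtype=False` through `is_same_as` would presumably relax the comparison in the same way.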
- engarde.checks.is_shape(df, shape)
  Asserts that the DataFrame is of a known shape.
  Parameters:
    df : DataFrame
    shape : tuple
      (n_rows, n_columns). Use None or -1 if you don’t care about a dimension.
  Returns:
    df : DataFrame
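The wildcard behaviour of `shape` can be sketched as follows. `is_shape_sketch` is a hypothetical stand-in that follows the description above; it is not engarde's implementation.

```python
import pandas as pd


def is_shape_sketch(df, shape):
    """Assumed semantics: compare each dimension unless it is None or -1."""
    for axis, (actual, expected) in enumerate(zip(df.shape, shape)):
        if expected is None or expected == -1:
            continue  # wildcard: this dimension is not checked
        assert actual == expected, f"axis {axis}: expected {expected}, got {actual}"
    return df


df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

is_shape_sketch(df, (3, 2))     # both dimensions checked
is_shape_sketch(df, (None, 2))  # only the column count is checked
is_shape_sketch(df, (-1, 2))    # same, using -1 as the wildcard
```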
- engarde.checks.none_missing(df, columns=None)
  Asserts that there are no missing values (NaNs) in the DataFrame.
  Parameters:
    df : DataFrame
    columns : list
      list of columns to restrict the check to
  Returns:
    df : DataFrame
      same as the original
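A minimal sketch of the `columns` restriction, assuming the check considers every column when `columns` is None. `none_missing_sketch` is a hypothetical name used for illustration only.

```python
import numpy as np
import pandas as pd


def none_missing_sketch(df, columns=None):
    """Assumed semantics: no NaN in the selected columns (all columns by default)."""
    cols = list(df.columns) if columns is None else columns
    assert not df[cols].isnull().any().any(), "missing values found"
    return df


df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [1.0, np.nan, 3.0]})

# Restricting the check to column A ignores the NaN in column B.
none_missing_sketch(df, columns=["A"])
```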
- engarde.checks.unique_index(df)
  Assert that the index is unique.
  Parameters:
    df : DataFrame
  Returns:
    df : DataFrame
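In plain pandas, index uniqueness is exposed through `Index.is_unique`. The sketch below uses that property. `unique_index_sketch` is a hypothetical stand-in and may differ from engarde's implementation.

```python
import pandas as pd


def unique_index_sketch(df):
    """Assumed semantics: fail when any index label occurs more than once."""
    assert df.index.is_unique, "index contains duplicates"
    return df


good = pd.DataFrame({"A": [1, 2, 3]}, index=["a", "b", "c"])
bad = pd.DataFrame({"A": [1, 2, 3]}, index=["a", "b", "a"])

unique_index_sketch(good)

# Index.duplicated() helps locate the offending labels when a check fails.
duplicated_labels = bad.index[bad.index.duplicated()].tolist()
```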
- engarde.checks.within_n_std(df, n=3)
  Assert that every value is within n standard deviations of its column’s mean.
  Parameters:
    df : DataFrame
    n : int
      number of standard deviations from the mean
  Returns:
    df : DataFrame
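The check can be sketched with column-wise means and standard deviations. `within_n_std_sketch` is a hypothetical stand-in. It assumes numeric columns and pandas' default sample standard deviation (`ddof=1`), which may not match engarde exactly.

```python
import pandas as pd


def within_n_std_sketch(df, n=3):
    """Assumed semantics: |value - column mean| <= n * column std for every value."""
    deviations = (df - df.mean()).abs()
    within = deviations <= n * df.std()  # std() uses ddof=1 by default
    assert within.all().all(), f"values further than {n} standard deviations from the mean"
    return df


# Twenty zeros and one large value: the large value is an outlier for n=3.
data = pd.DataFrame({"x": [0.0] * 20 + [100.0]})

within_n_std_sketch(data, n=5)  # passes with a looser bound

try:
    within_n_std_sketch(data, n=3)
    outlier_detected = False
except AssertionError:
    outlier_detected = True
```

Small samples rarely trip this kind of check, since a single extreme value also inflates the standard deviation it is measured against.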
- engarde.checks.within_range(df, items=None)
  Assert that a DataFrame is within a range.
  Parameters:
    df : DataFrame
    items : dict
      mapping of columns (k) to a (low, high) tuple (v) that df[k] is expected to be between.
  Returns:
    df : DataFrame
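A sketch of the `items` mapping in use. `within_range_sketch` is a hypothetical stand-in built on `Series.between`. Treating both bounds as inclusive is an assumption.

```python
import pandas as pd


def within_range_sketch(df, items):
    """Assumed semantics: low <= df[k] <= high for every (k, (low, high)) in items."""
    for col, (low, high) in items.items():
        assert df[col].between(low, high).all(), f"{col!r} outside [{low}, {high}]"
    return df


df = pd.DataFrame({"age": [21, 35, 64], "score": [0.1, 0.5, 0.9]})

within_range_sketch(df, {"age": (0, 120), "score": (0.0, 1.0)})
```

Columns that are not listed in `items` are not checked.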
- engarde.checks.within_set(df, items=None)
  Assert that df is a subset of items.
  Parameters:
    df : DataFrame
    items : dict
      mapping of columns (k) to array-like of values (v) that df[k] is expected to be a subset of
  Returns:
    df : DataFrame
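The subset check maps naturally onto `Series.isin`. `within_set_sketch` is a hypothetical stand-in written under that assumption, not engarde's source.

```python
import pandas as pd


def within_set_sketch(df, items):
    """Assumed semantics: every value of df[k] appears in items[k]."""
    for col, allowed in items.items():
        assert df[col].isin(allowed).all(), f"unexpected values in {col!r}"
    return df


df = pd.DataFrame({"grade": ["a", "b", "a"], "level": [1, 3, 3]})

within_set_sketch(df, {"grade": ["a", "b", "c"], "level": {1, 3}})
```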
- engarde.checks.has_dtypes(df, items)
  Assert that a DataFrame has the expected dtypes.
  Parameters:
    df : DataFrame
    items : dict
      mapping of columns to dtype.
  Returns:
    df : DataFrame
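A minimal sketch of a column-to-dtype mapping. `has_dtypes_sketch` is a hypothetical stand-in. Comparing against dtype names such as `"int64"` relies on NumPy's dtype equality, and the dtypes are set explicitly so the example does not depend on platform defaults.

```python
import pandas as pd


def has_dtypes_sketch(df, items):
    """Assumed semantics: df.dtypes[k] must equal items[k] for every listed column."""
    for col, expected in items.items():
        actual = df.dtypes[col]
        assert actual == expected, f"{col!r} has dtype {actual}, expected {expected}"
    return df


df = pd.DataFrame({
    "count": pd.Series([1, 2, 3], dtype="int64"),
    "ratio": pd.Series([0.1, 0.2, 0.3], dtype="float64"),
})

has_dtypes_sketch(df, {"count": "int64", "ratio": "float64"})
```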
- engarde.checks.verify(df, check, *args, **kwargs)
  Generic verify. Assert that check(df, *args, **kwargs) is true.
  Parameters:
    df : DataFrame
    check : function
      Should take a DataFrame plus any *args and **kwargs, and return a bool.
  Returns:
    df : DataFrame
      same as the input.
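The generic form can be sketched in a few lines. `verify_sketch` and the predicate `has_at_least` are hypothetical names. The point is that any function returning a single boolean can serve as `check`.

```python
import pandas as pd


def verify_sketch(df, check, *args, **kwargs):
    """Assumed semantics: assert that check(df, *args, **kwargs) is truthy."""
    assert check(df, *args, **kwargs), "verification failed"
    return df


def has_at_least(df, n_rows=1):
    # Hypothetical predicate: returns one bool for the whole DataFrame.
    return len(df) >= n_rows


df = pd.DataFrame({"A": [1, 2, 3]})

verify_sketch(df, has_at_least, n_rows=2)
```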
- engarde.checks.verify_all(df, check, *args, **kwargs)
  Verify that all the entries in check(df, *args, **kwargs) are true.
- engarde.checks.verify_any(df, check, *args, **kwargs)
  Verify that any of the entries in check(df, *args, **kwargs) is true.
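Unlike `verify`, these two variants expect `check` to return a collection of booleans, such as a boolean Series. The stand-ins below are hypothetical and reduce the result with NumPy, which is an implementation assumption.

```python
import numpy as np
import pandas as pd


def verify_all_sketch(df, check, *args, **kwargs):
    """Assumed semantics: every entry of check(df, ...) must be true."""
    result = check(df, *args, **kwargs)
    assert np.all(np.asarray(result)), "not all entries are true"
    return df


def verify_any_sketch(df, check, *args, **kwargs):
    """Assumed semantics: at least one entry of check(df, ...) must be true."""
    result = check(df, *args, **kwargs)
    assert np.any(np.asarray(result)), "no entry is true"
    return df


df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

verify_all_sketch(df, lambda d: d["B"] > d["A"])  # row-wise: B exceeds A everywhere
verify_any_sketch(df, lambda d: d["A"] > 2)       # at least one A above 2
```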
decorators
- engarde.decorators.none_missing(columns=None)
  Asserts that no missing values (NaN) are found.
- engarde.decorators.within_range(items)
  Check that a DataFrame’s values are within a range.
  Parameters:
    items : dict or array-like
      A dict maps columns to (lower, upper). An array-like checks the same (lower, upper) for each column.
- engarde.decorators.within_set(items)
  Check that DataFrame values are within set.
  >>> @within_set({'A': {1, 3}})
  ... def f(df): return df
- engarde.decorators.has_dtypes(items)
  Tests that the dtypes are as specified in items.
- engarde.decorators.verify(func, *args, **kwargs)
  Assert that func(df, *args, **kwargs) is true.
- engarde.decorators.verify_all(func, *args, **kwargs)
  Assert that all of func(*args, **kwargs) are true.
- engarde.decorators.verify_any(func, *args, **kwargs)
  Assert that any of func(*args, **kwargs) are true.
- engarde.decorators.within_n_std(n=3)
  Tests that all values are within n standard deviations of their mean (3 by default).
This file provides a nice API for each of the checks, designed to fit seamlessly into an ETL pipeline. Each of the functions defined here can be applied to a function that returns a DataFrame.
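The decorator pattern described here can be sketched as a factory that wraps a DataFrame-returning function and validates its result. `none_missing_decorator` and `load_orders` are hypothetical names. This is an illustration of the idea under those assumptions, not engarde's implementation.

```python
from functools import wraps

import numpy as np
import pandas as pd


def none_missing_decorator(columns=None):
    """Hypothetical decorator factory: validate the DataFrame a function returns."""
    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            subset = result if columns is None else result[columns]
            # Fail fast if the step produced missing values.
            assert not subset.isnull().any().any(), "missing values found"
            return result
        return wrapper
    return decorate


@none_missing_decorator(columns=["order_id"])
def load_orders():
    # Stand-in for an ETL step; the NaN in "discount" is outside the checked columns.
    return pd.DataFrame({"order_id": [1, 2, 3], "discount": [0.1, np.nan, 0.0]})


orders = load_orders()
```

Keeping the assertion in a decorator leaves the body of each pipeline step free of validation code, while the expectation stays visible next to the function definition.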