API

checks

This module contains the functions that perform the actual assertions. You can also use them directly in interactive sessions, most conveniently through the DataFrame pipe method.

checks.py

Each function in here should

  • Take a DataFrame as its first argument, optionally followed by other arguments
  • Make its assertion on the DataFrame
  • Return the original DataFrame
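
The contract above can be illustrated with a minimal sketch written in plain pandas. The function name no_negative_values and the sample data are hypothetical, chosen only to show the shape of a check; this is not part of engarde.

```python
import pandas as pd


def no_negative_values(df, columns=None):
    """Hypothetical check: assert that the selected columns hold no negative values."""
    subset = df if columns is None else df[columns]
    # Make the assertion on the data...
    assert (subset >= 0).all().all(), "negative values found"
    # ...and return the original DataFrame so the call can be chained.
    return df


df = pd.DataFrame({"A": [1, 2, 3], "B": [0.5, 1.5, 2.5]})

# Because the check returns its input, it slots into a method chain via pipe.
result = df.pipe(no_negative_values, columns=["A"]).assign(C=lambda d: d["A"] * 2)
```

Returning the input unchanged is what lets a check sit between two transformation steps without altering the data that flows through.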
engarde.checks.is_monotonic(df, items=None, increasing=None, strict=False)

Asserts that the DataFrame is monotonic.

Parameters:

df : Series or DataFrame

items : dict

mapping of columns to (increasing, strict) conditions

increasing : None or bool

None accepts either an increasing or a decreasing sequence.

strict : bool

whether the comparison should be strict

Returns:

df : DataFrame
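
How increasing and strict might combine can be sketched for a single Series. This is an illustration in plain pandas, assuming that strict means consecutive values may not repeat; series_is_monotonic is a hypothetical helper and not engarde's implementation.

```python
import pandas as pd


def series_is_monotonic(s, increasing=None, strict=False):
    """Hypothetical helper mirroring the documented parameters for one Series."""
    diffs = s.diff().dropna()
    if strict:
        up, down = (diffs > 0).all(), (diffs < 0).all()
    else:
        up, down = (diffs >= 0).all(), (diffs <= 0).all()
    if increasing is None:
        # Either direction is acceptable.
        return bool(up or down)
    return bool(up if increasing else down)


s = pd.Series([1, 2, 2, 3])
non_strict = series_is_monotonic(s, increasing=True)            # repeats allowed
strict = series_is_monotonic(s, increasing=True, strict=True)   # repeats rejected
```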

engarde.checks.is_same_as(df, df_to_compare, **kwargs)

Assert that two pandas DataFrames are equal.

Parameters:

df : pandas DataFrame

df_to_compare : pandas DataFrame

**kwargs : dict

keyword arguments passed through to pandas’ assert_frame_equal

Returns:

df : DataFrame

engarde.checks.is_shape(df, shape)

Asserts that the DataFrame is of a known shape.

Parameters:

df : DataFrame

shape : tuple

(n_rows, n_columns). Use None or -1 if you don’t care about a dimension.

Returns:

df : DataFrame
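
The wildcard behaviour described for shape can be sketched as below. The helper shape_matches is hypothetical and written in plain pandas; it only illustrates how None or -1 could be skipped during the comparison.

```python
import pandas as pd


def shape_matches(df, shape):
    """Hypothetical helper: compare df.shape to `shape`, skipping None or -1."""
    for actual, expected in zip(df.shape, shape):
        if expected is None or expected == -1:
            continue  # this dimension is unconstrained
        assert actual == expected, f"expected {expected}, got {actual}"
    return df


df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
shape_matches(df, (3, 2))     # both dimensions checked
shape_matches(df, (None, 2))  # any number of rows, exactly two columns
shape_matches(df, (-1, 2))    # same, using -1 as the wildcard
```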

engarde.checks.none_missing(df, columns=None)

Asserts that there are no missing values (NaNs) in the DataFrame.

Parameters:

df : DataFrame

columns : list

list of columns to restrict the check to

Returns:

df : DataFrame

same as the original
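
The effect of restricting the check with columns can be sketched in plain pandas. The function check_none_missing is a hypothetical stand-in with the same parameter names, not engarde's own code.

```python
import numpy as np
import pandas as pd


def check_none_missing(df, columns=None):
    """Hypothetical stand-in for a missing-value check."""
    subset = df if columns is None else df[columns]
    assert not subset.isnull().any().any(), "missing values found"
    return df


df = pd.DataFrame({"A": [1.0, 2.0, np.nan], "B": ["x", "y", "z"]})

# Column A contains a NaN, but restricting the check to B lets it pass.
checked = check_none_missing(df, columns=["B"])
```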

engarde.checks.unique_index(df)

Assert that the index is unique

Parameters:

df : DataFrame

Returns:

df : DataFrame

engarde.checks.within_n_std(df, n=3)

Assert that every value is within n standard deviations of its column’s mean.

Parameters:

df : DataFrame

n : int

number of standard deviations from the mean

Returns:

df : DataFrame
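
The per-column comparison can be sketched as follows. This is an assumption-laden illustration in plain pandas: it uses the sample standard deviation that DataFrame.std returns by default and treats the boundary as inclusive, neither of which the text above specifies. The name check_within_n_std is hypothetical.

```python
import pandas as pd


def check_within_n_std(df, n=3):
    """Hypothetical sketch: every value lies within n std devs of its column mean."""
    deviations = (df - df.mean()).abs()
    # Comparing a DataFrame with a Series aligns the Series on the column labels.
    assert (deviations <= n * df.std()).all().all(), "outlier found"
    return df


ok = pd.DataFrame({"A": [1.0, 2.0, 3.0, 4.0, 5.0]})
check_within_n_std(ok)

# Twenty zeros and a single 100: the last value is far from the column mean.
outlier = pd.DataFrame({"A": [0.0] * 20 + [100.0]})
```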

engarde.checks.within_range(df, items=None)

Assert that a DataFrame is within a range.

Parameters:

df : DataFrame

items : dict

mapping of columns (k) to a (low, high) tuple (v) that df[k] is expected to be between.

Returns:

df : DataFrame
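
The items mapping can be read as one (low, high) test per column, sketched here in plain pandas. Whether the bounds are inclusive is not stated above; this sketch assumes they are, which is the default of Series.between. The name check_within_range is hypothetical.

```python
import pandas as pd


def check_within_range(df, items):
    """Hypothetical sketch: each column k must lie between items[k] = (low, high)."""
    for column, (low, high) in items.items():
        assert df[column].between(low, high).all(), f"{column} outside [{low}, {high}]"
    return df


df = pd.DataFrame({"A": [1, 5, 9], "B": [0.1, 0.2, 0.3]})
checked = check_within_range(df, {"A": (0, 10), "B": (0.0, 1.0)})
```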

engarde.checks.within_set(df, items=None)

Assert that df is a subset of items

Parameters:

df : DataFrame

items : dict

mapping of columns (k) to array-like of values (v) that df[k] is expected to be a subset of

Returns:

df : DataFrame

engarde.checks.has_dtypes(df, items)

Assert that a DataFrame has the expected dtypes.

Parameters:

df : DataFrame

items : dict

mapping of columns to dtype.

Returns:

df : DataFrame

engarde.checks.verify(df, check, *args, **kwargs)

Generic verify. Assert that check(df, *args, **kwargs) is true.

Parameters:

df : DataFrame

check : function

Should take a DataFrame plus *args and **kwargs, and return a bool.

Returns:

df : DataFrame

same as the input.

engarde.checks.verify_all(df, check, *args, **kwargs)

Verify that all the entries in check(df, *args, **kwargs) are true.

engarde.checks.verify_any(df, check, *args, **kwargs)

Verify that any of the entries in check(df, *args, **kwargs) is true
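
The difference between the three generic verifiers can be sketched as below. These are hypothetical re-implementations in plain pandas and NumPy, named with a _sketch suffix to make clear they are not engarde's functions; the lambdas are illustrative checks.

```python
import numpy as np
import pandas as pd


def verify_sketch(df, check, *args, **kwargs):
    """check(...) returns a single bool."""
    assert check(df, *args, **kwargs)
    return df


def verify_all_sketch(df, check, *args, **kwargs):
    """check(...) returns many booleans; every one must be true."""
    assert np.asarray(check(df, *args, **kwargs)).all()
    return df


def verify_any_sketch(df, check, *args, **kwargs):
    """check(...) returns many booleans; at least one must be true."""
    assert np.asarray(check(df, *args, **kwargs)).any()
    return df


df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

verify_sketch(df, lambda d, n: len(d) >= n, 3)      # one bool for the whole frame
verify_all_sketch(df, lambda d: d["B"] > d["A"])    # one bool per row, all true
verify_any_sketch(df, lambda d: d["A"] == 2)        # one bool per row, one true
```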

decorators

engarde.decorators.none_missing(columns=None)

Asserts that no missing values (NaN) are found

engarde.decorators.within_range(items)

Check that a DataFrame’s values are within a range.

Parameters:

items : dict or array-like

A dict maps columns to (lower, upper). An array-like checks the same (lower, upper) for each column.

engarde.decorators.within_set(items)

Check that DataFrame values are within set.

>>> @within_set({'A': {1, 3}})
... def f(df):
...     return df

engarde.decorators.has_dtypes(items)

Tests that the dtypes are as specified in items.

engarde.decorators.verify(func, *args, **kwargs)

Assert that func(df, *args, **kwargs) is true.

engarde.decorators.verify_all(func, *args, **kwargs)

Assert that all of func(*args, **kwargs) are true.

engarde.decorators.verify_any(func, *args, **kwargs)

Assert that any of func(*args, **kwargs) are true.

engarde.decorators.within_n_std(n=3)

Tests that all values are within n standard deviations (3 by default) of their column’s mean.

The decorators module provides a decorator for each of the checks, designed to fit into an ETL pipeline. Each of the decorators defined here can be applied to a function that returns a DataFrame.
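
How a decorator can wrap a check around a function that returns a DataFrame is sketched below. The factory within_set_sketch and the function load are hypothetical, and the sketch assumes the check runs on the function's return value; it is an illustration of the pattern, not engarde's source.

```python
from functools import wraps

import pandas as pd


def within_set_sketch(items):
    """Hypothetical decorator factory: validate the wrapped function's result."""
    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            for column, allowed in items.items():
                assert result[column].isin(list(allowed)).all(), (
                    f"unexpected values in {column}"
                )
            return result
        return wrapper
    return decorate


@within_set_sketch({"A": {1, 3}})
def load():
    # Stand-in for an extract step in an ETL pipeline.
    return pd.DataFrame({"A": [1, 3, 1]})


df = load()
```

Keeping the assertion in a decorator leaves the body of each pipeline step free of validation code, while the expectations stay visible next to the function definition.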