API

checks

This file contains the functions that perform the actual assertions. They can also be used during interactive sessions, for example through the pipe method (`DataFrame.pipe`).

checks.py

Each function in this file should:

Take a DataFrame as its first argument, followed by any optional arguments
Make its assertion on that DataFrame
Return the original DataFrame
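
To make the contract concrete, here is a minimal sketch of a custom check written in the same style. `all_positive` is a hypothetical function, not part of engarde, and the sketch assumes only pandas.

```python
import pandas as pd


def all_positive(df, columns=None):
    """Hypothetical check (not part of engarde) following the checks.py contract."""
    # 1. The DataFrame comes first; optional arguments follow it.
    subset = df if columns is None else df[columns]
    # 2. Make the assertion on the DataFrame.
    assert (subset > 0).all().all(), "found values that are not positive"
    # 3. Return the original DataFrame so calls can be chained.
    return df


df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Because each check returns its input, checks chain naturally with DataFrame.pipe.
result = df.pipe(all_positive).pipe(all_positive, columns=["a"])
```

Returning the input unchanged is what lets a check sit in the middle of a method chain without altering the data.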

engarde.checks.is_monotonic(df, items=None, increasing=None, strict=False)

Asserts that the DataFrame is monotonic.

Parameters:
    df : Series or DataFrame
    items : dict
        mapping of columns to conditions (increasing, strict)
    increasing : None or bool
        None means the values may be either increasing or decreasing.
    strict : bool
        whether the comparison should be strict
Returns:
    df : DataFrame
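
The per-column condition can be sketched with pandas' `is_monotonic_increasing`, `is_monotonic_decreasing` and `is_unique` attributes. This is an assumed re-creation of the logic for illustration only: the helper `monotonic_ok` is hypothetical, engarde's own implementation may differ, and NaN handling is ignored.

```python
import pandas as pd


def monotonic_ok(s, increasing=None, strict=False):
    """Hypothetical helper: does Series s satisfy the monotonicity condition?"""
    up, down = s.is_monotonic_increasing, s.is_monotonic_decreasing
    if strict:
        # A monotonic series is strictly monotonic exactly when it has no repeated values.
        up, down = up and s.is_unique, down and s.is_unique
    if increasing is None:
        # None accepts either direction.
        return bool(up or down)
    return bool(up if increasing else down)


df = pd.DataFrame({"a": [1, 2, 2, 3], "b": [5, 4, 3, 1]})

print(monotonic_ok(df["a"]))                                 # True
print(monotonic_ok(df["a"], strict=True))                    # False: 2 repeats
print(monotonic_ok(df["b"], increasing=False, strict=True))  # True

# The `items` form maps each column to its own (increasing, strict) pair.
items = {"a": (True, False), "b": (False, True)}
for col, (inc, strict) in items.items():
    assert monotonic_ok(df[col], increasing=inc, strict=strict), col
```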

engarde.checks.is_same_as(df, df_to_compare, **kwargs)

Assert that two pandas DataFrames are equal.

Parameters:
    df : pandas DataFrame
    df_to_compare : pandas DataFrame
    **kwargs : dict
        keyword arguments passed through to pandas' `assert_frame_equal`
Returns:
    df : DataFrame
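
Since the keyword arguments go straight to pandas' `assert_frame_equal`, its options control how strict the comparison is. The sketch below mirrors that delegation with a hypothetical `same_as` helper (not engarde's code); `check_dtype` is a real `assert_frame_equal` parameter.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def same_as(df, df_to_compare, **kwargs):
    """Hypothetical helper mirroring is_same_as: delegate to assert_frame_equal."""
    assert_frame_equal(df, df_to_compare, **kwargs)
    return df


ints = pd.DataFrame({"x": [1, 2, 3]})
floats = pd.DataFrame({"x": [1.0, 2.0, 3.0]})

# Passes: the values match and check_dtype=False ignores int64 vs float64.
same_as(ints, floats, check_dtype=False)

# With the default check_dtype=True the same comparison raises AssertionError.
try:
    same_as(ints, floats)
except AssertionError:
    print("dtypes differ")
```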

engarde.checks.is_shape(df, shape)

Asserts that the DataFrame is of a known shape.

Parameters:
    df : DataFrame
    shape : tuple
        (n_rows, n_columns). Use None or -1 if you don’t care about a dimension.
Returns:
    df : DataFrame
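
The wildcard rule can be sketched as a dimension-by-dimension comparison against `df.shape`. `shape_matches` is a hypothetical helper, assumed to capture the described behavior rather than reproduce engarde's code.

```python
import pandas as pd


def shape_matches(df, shape):
    """Hypothetical helper: compare df.shape to shape, treating None or -1 as "any"."""
    return all(
        want in (None, -1) or want == got
        for want, got in zip(shape, df.shape)
    )


df = pd.DataFrame({"a": range(5), "b": range(5)})

print(shape_matches(df, (5, 2)))     # True
print(shape_matches(df, (-1, 2)))    # True: the row count is not checked
print(shape_matches(df, (None, 3)))  # False: df has 2 columns
```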

engarde.checks.none_missing(df, columns=None)

Asserts that there are no missing values (NaNs) in the DataFrame.

Parameters:
    df : DataFrame
    columns : list
        list of columns to restrict the check to
Returns:
    df : DataFrame
        same as the original
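
A minimal sketch of the underlying test, assuming pandas' `isnull` is an adequate notion of "missing"; `no_missing` is a hypothetical helper, not part of engarde.

```python
import numpy as np
import pandas as pd


def no_missing(df, columns=None):
    """Hypothetical helper: True when df (or df[columns]) contains no NaN."""
    subset = df if columns is None else df[columns]
    return not subset.isnull().values.any()


df = pd.DataFrame({"a": [1.0, 2.0], "b": [np.nan, 4.0]})

print(no_missing(df))                 # False: column b has a NaN
print(no_missing(df, columns=["a"]))  # True
```

Restricting the check with `columns` is useful when some columns are legitimately sparse.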

engarde.checks.unique_index(df)

Assert that the index is unique.

Parameters:
    df : DataFrame
Returns:
    df : DataFrame
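
The condition behind unique_index corresponds to pandas' `Index.is_unique`; whether engarde uses exactly this attribute is an assumption. When the check fails, `Index.duplicated` is a convenient way to find the offending labels.

```python
import pandas as pd

df = pd.DataFrame({"v": [1, 2, 3]}, index=["x", "y", "x"])

print(df.index.is_unique)                        # False
# Index.duplicated marks every repeat after the first occurrence.
print(df.index[df.index.duplicated()].tolist())  # ['x']
```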

engarde.checks.within_n_std(df, n=3)

Assert that every value is within n standard deviations of its column’s mean.

Parameters:
    df : DataFrame
    n : int
        number of standard deviations from the mean
Returns:
    df : DataFrame
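
A minimal sketch of the per-value test, assuming the sample standard deviation (pandas' default `ddof=1`) and an inclusive bound; engarde's exact choices may differ, and `within_n_std_mask` is hypothetical. The example uses 20 rows because with 10 or fewer rows no value can lie more than 3 sample standard deviations from its column mean.

```python
import pandas as pd


def within_n_std_mask(df, n=3):
    """Hypothetical helper: True where a value is within n standard deviations of its column mean."""
    # df.mean() and df.std() are per-column Series; arithmetic aligns them on column labels.
    # pandas' std() is the sample standard deviation (ddof=1) by default.
    return (df - df.mean()).abs() <= n * df.std()


df = pd.DataFrame({"a": [0] * 19 + [100], "b": [1, 2] * 10})
mask = within_n_std_mask(df)

print(int((~mask).sum().sum()))  # 1: only the 100 in column a is flagged
```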

engarde.checks.within_range(df, items=None)

Assert that a DataFrame’s values are within a range.

Parameters:
    df : DataFrame
    items : dict
        mapping of columns (k) to a (low, high) tuple (v) that `df[k]` is expected to be between
Returns:
    df : DataFrame
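
The range test can be sketched with `Series.between`, which includes both endpoints by default. Treating the bounds as inclusive is an assumption here, and `in_range` is a hypothetical helper rather than engarde's code.

```python
import pandas as pd


def in_range(df, items):
    """Hypothetical helper: every df[k] lies in [low, high] for items[k] = (low, high)."""
    return all(df[k].between(low, high).all() for k, (low, high) in items.items())


df = pd.DataFrame({"age": [23, 41, 67], "score": [0.2, 0.9, 1.4]})

print(in_range(df, {"age": (0, 120)}))                       # True
print(in_range(df, {"age": (0, 120), "score": (0.0, 1.0)}))  # False: 1.4 > 1.0
```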

engarde.checks.within_set(df, items=None)

Assert that the values in df are a subset of items.

Parameters:
    df : DataFrame
    items : dict
        mapping of columns (k) to array-like of values (v) that `df[k]` is expected to be a subset of
Returns:
    df : DataFrame
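
A minimal sketch of the subset test using `Series.isin`, which accepts a set or any list-like of allowed values. `in_set` is a hypothetical helper, assumed to match the described behavior.

```python
import pandas as pd


def in_set(df, items):
    """Hypothetical helper: every df[k] value is a member of items[k]."""
    return all(df[k].isin(allowed).all() for k, allowed in items.items())


df = pd.DataFrame({"grade": ["A", "B", "A"], "flag": [0, 1, 2]})

print(in_set(df, {"grade": ["A", "B", "C"]}))  # True
print(in_set(df, {"flag": {0, 1}}))            # False: 2 is not allowed
```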

engarde.checks.has_dtypes(df, items)

Assert that a DataFrame has the dtypes given in items.

Parameters:
    df : DataFrame
    items : dict
        mapping of columns to dtype
Returns:
    df : DataFrame
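
A minimal sketch of the comparison, assuming a column matches when `df[col].dtype == expected`; NumPy dtypes compare equal to their string names, so either form works. `dtypes_match` is a hypothetical helper, not engarde's implementation.

```python
import pandas as pd


def dtypes_match(df, items):
    """Hypothetical helper: df[col].dtype equals the expected dtype for every entry."""
    return all(df[col].dtype == dtype for col, dtype in items.items())


df = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})

print(dtypes_match(df, {"a": "int64", "b": "float64"}))  # True
print(dtypes_match(df, {"a": "float64"}))                # False
```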

engarde.checks.verify(df, check, *args, **kwargs)

Generic verify. Assert that check(df, *args, **kwargs) is true.

Parameters:
    df : DataFrame
    check : function
        Should take a DataFrame (plus *args and **kwargs) and return a bool.
Returns:
    df : DataFrame
        same as the input
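
The generic pattern can be sketched in a few lines: call the check with the DataFrame and any extra arguments, assert on the result, and return the input. `verify_sketch` and `has_at_least` are hypothetical names used only for illustration.

```python
import pandas as pd


def verify_sketch(df, check, *args, **kwargs):
    """Hypothetical re-creation of the generic verify pattern."""
    assert check(df, *args, **kwargs), f"{check.__name__} returned a false value"
    return df


def has_at_least(df, n):
    """Hypothetical check: the DataFrame has at least n rows."""
    return len(df) >= n


df = pd.DataFrame({"a": [1, 2, 3]})
out = verify_sketch(df, has_at_least, 2)  # passes; the extra argument reaches the check
```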

engarde.checks.verify_all(df, check, *args, **kwargs)

Verify that all the entries in check(df, *args, **kwargs) are true.

engarde.checks.verify_any(df, check, *args, **kwargs)

Verify that any of the entries in check(df, *args, **kwargs) is true.
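
These variants suit checks that return an array of booleans rather than a single bool. A minimal sketch, assuming the result is reduced with NumPy's `all`/`any` after conversion to an array; the `*_sketch` functions and `positive` are hypothetical.

```python
import numpy as np
import pandas as pd


def verify_all_sketch(df, check, *args, **kwargs):
    """Hypothetical: every entry returned by check must be true."""
    assert np.asarray(check(df, *args, **kwargs)).all(), "not all entries are true"
    return df


def verify_any_sketch(df, check, *args, **kwargs):
    """Hypothetical: at least one entry returned by check must be true."""
    assert np.asarray(check(df, *args, **kwargs)).any(), "no entry is true"
    return df


def positive(df):
    """Hypothetical element-wise check returning a boolean DataFrame."""
    return df > 0


df = pd.DataFrame({"a": [1, -2, 3]})

verify_any_sketch(df, positive)  # passes: some entries are positive
try:
    verify_all_sketch(df, positive)
except AssertionError:
    print("-2 is not positive")
```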

engarde.checks.one_to_many(df, unitcol, manycol)

Assert that a many-to-one relationship is preserved between two columns. For example, a retail store will have distinct departments, each with several employees. If each employee may only work in a single department, then the relationship of the department to the employees is one to many.

Parameters:
    df : DataFrame
    unitcol : str
        The column that encapsulates the groups in `manycol`.
    manycol : str
        The column that must remain unique in the distinct pairs between `manycol` and `unitcol`.
Returns:
    df : DataFrame
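
Following the parameter description, the test can be sketched by taking the distinct (`unitcol`, `manycol`) pairs and requiring `manycol` to be unique among them. `is_one_to_many` is a hypothetical helper, and the store data is invented for illustration.

```python
import pandas as pd


def is_one_to_many(df, unitcol, manycol):
    """Hypothetical helper: each manycol value pairs with exactly one unitcol value."""
    pairs = df[[unitcol, manycol]].drop_duplicates()
    return not pairs[manycol].duplicated().any()


staff = pd.DataFrame({
    "department": ["toys", "toys", "garden", "garden"],
    "employee": ["ann", "bob", "cid", "dee"],
})
print(is_one_to_many(staff, "department", "employee"))  # True

# Adding ann to a second department breaks the relationship.
extra = pd.DataFrame({"department": ["garden"], "employee": ["ann"]})
moved = pd.concat([staff, extra], ignore_index=True)
print(is_one_to_many(moved, "department", "employee"))  # False
```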

decorators

engarde.decorators.none_missing(columns=None)

Asserts that no missing values (NaN) are found.

engarde.decorators.within_range(items)

Check that a DataFrame’s values are within a range.

Parameters:
    items : dict or array-like
        A dict maps columns to (lower, upper); an array-like checks the same (lower, upper) for each column.

engarde.decorators.within_set(items)

Check that DataFrame values are within a set.

>>> @within_set({'A': {1, 3}})
... def f(df):
...     return df

engarde.decorators.has_dtypes(items)

Tests that the dtypes are as specified in items.

engarde.decorators.verify(func, *args, **kwargs)

Assert that func(df, *args, **kwargs) is true, where df is the DataFrame returned by the decorated function.

engarde.decorators.verify_all(func, *args, **kwargs)

Assert that all of func(df, *args, **kwargs) are true.

engarde.decorators.verify_any(func, *args, **kwargs)

Assert that any of func(df, *args, **kwargs) is true.

engarde.decorators.within_n_std(n=3)

Tests that all values are within n standard deviations (3 by default) of their column’s mean.

engarde.decorators.one_to_many(unitcol, manycol)

Tests that each value in manycol is associated with only a single value in unitcol.

The decorators file provides a decorator-based API for each of the checks, designed to fit into an ETL pipeline. Each decorator defined here can be applied to a function that returns a DataFrame; the check then runs on the DataFrame that function returns.
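
The decorator pattern can be sketched as a factory that wraps a DataFrame-returning function. `none_missing_decorator` and `load` are hypothetical names, and the sketch assumes the decorator runs the wrapped function, checks the returned DataFrame, and passes it on unchanged; engarde's own decorators may differ in detail.

```python
import functools

import pandas as pd


def none_missing_decorator(columns=None):
    """Hypothetical decorator factory in the style of engarde.decorators (not its real code)."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)  # run the wrapped pipeline step
            subset = result if columns is None else result[columns]
            assert not subset.isnull().values.any(), "missing values found"
            return result  # pass the DataFrame on unchanged
        return wrapper
    return decorate


@none_missing_decorator()
def load():
    """Hypothetical ETL step that returns a DataFrame."""
    return pd.DataFrame({"a": [1, 2], "b": [3, 4]})


df = load()  # the check runs on the returned DataFrame before it is handed back
```

Using `functools.wraps` keeps the decorated step's name and docstring, which helps when a pipeline logs or inspects its stages.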