tab_err.error_mechanism

Classes

EAR

ErrorMechanism subclass implementing the Erroneous At Random error mechanism.

ECAR

ErrorMechanism subclass implementing the Erroneous Completely At Random error mechanism.

ENAR

ErrorMechanism subclass implementing the Erroneous Not At Random error mechanism.

Package Contents

class tab_err.error_mechanism.EAR

Bases: tab_err.error_mechanism._error_mechanism.ErrorMechanism

ErrorMechanism subclass implementing the Erroneous At Random error mechanism.

Description:

Errors are assumed to depend on other variables in the data, but not on the erroneous values themselves.

class tab_err.error_mechanism.ECAR

Bases: tab_err.error_mechanism._error_mechanism.ErrorMechanism

ErrorMechanism subclass implementing the Erroneous Completely At Random error mechanism.

Description:

Errors are assumed to be completely independent of the data distribution: neither the affected column's values nor any other column influences where errors occur.
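The idea of a completely-at-random mask can be sketched in a few lines. This is a minimal, dependency-free illustration and not the library's implementation: the helper name `ecar_mask` is hypothetical, a plain list of booleans stands in for one column of the error mask, and rounding the number of affected rows is an assumption.

```python
import random
from typing import List, Optional


def ecar_mask(n_rows: int, error_rate: float, seed: Optional[int] = None) -> List[bool]:
    """Hypothetical helper: mark rows uniformly at random, ignoring all data values."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    rng = random.Random(seed)
    # Assumption: the number of affected rows is error_rate * n_rows, rounded.
    n_errors = round(error_rate * n_rows)
    # Every row is equally likely to be chosen; no column value is consulted.
    chosen = set(rng.sample(range(n_rows), n_errors))
    return [i in chosen for i in range(n_rows)]


mask = ecar_mask(n_rows=10, error_rate=0.3, seed=0)
print(sum(mask), "of", len(mask), "rows marked")
```

Because the function never looks at the data, the same seed yields the same mask for any table with the same number of rows.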

class tab_err.error_mechanism.ENAR(condition_to_column: int | str | None = None, seed: int | None = None)

Bases: tab_err.error_mechanism._error_mechanism.ErrorMechanism

ErrorMechanism subclass implementing the Erroneous Not At Random error mechanism.

Description:

Errors are assumed to depend on either other variables, the incorrect data itself, or both.
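To make "depends on the incorrect data itself" concrete, here is a deliberately simple sketch in which the rows with the largest values are the ones marked as erroneous. It is only one of many possible not-at-random rules and is not taken from the library: the helper `enar_mask_top_values` is hypothetical, and a plain list stands in for the column.

```python
from typing import List, Sequence


def enar_mask_top_values(values: Sequence[float], error_rate: float) -> List[bool]:
    """Hypothetical helper: mark the rows holding the largest values."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    # Assumption: the number of affected rows is error_rate * len(values), rounded.
    n_errors = round(error_rate * len(values))
    # Row indices ordered by the column's own values, largest first.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    chosen = set(order[:n_errors])
    return [i in chosen for i in range(len(values))]


incomes = [20, 95, 40, 80, 10]
mask = enar_mask_top_values(incomes, error_rate=0.4)
print(mask)  # [False, True, False, True, False]
```

Here the mask is a function of the very values that will be corrupted, which is what separates a not-at-random mechanism from one where row selection ignores those values.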

sample(data: pandas.DataFrame, column: str | int, error_rate: float, error_mask: pandas.DataFrame | None = None) → pandas.DataFrame

Returns an error mask marking the locations in a pandas DataFrame where errors are to be introduced.

Description:

Validates the arguments on behalf of the abstract method ‘_sample’, assigns the _random_generator attribute, and then calls the subclass's _sample method.

Parameters:
  • data (pd.DataFrame) – DataFrame containing the column to add errors to

  • column (str | int) – The column of ‘data’ to create an error mask for

  • error_rate (float) – Fraction of rows to be affected by errors, in the range [0, 1].

  • error_mask (pd.DataFrame | None, optional) – An existing error mask to which more errors are added, as used by the mid- and high-level APIs. Defaults to None.

Raises:
  • ValueError – If error_rate is outside the interval [0, 1].

  • TypeError – If the ‘data’ argument is not a pandas DataFrame or is empty.

  • ValueError – If the mechanism requires it and the ‘data’ argument does not have at least 2 columns.

Returns:

The error mask as a DataFrame, updated with the newly generated error locations.

Return type:

pd.DataFrame
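How an existing error mask might be extended can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the helper `extend_mask` is hypothetical, a list of booleans stands in for one mask column, and both the rounding and the choice to draw new errors only from rows that are not yet marked are assumptions.

```python
import random
from typing import List, Optional


def extend_mask(existing: List[bool], error_rate: float, seed: Optional[int] = None) -> List[bool]:
    """Hypothetical helper: add new error locations to an existing mask column."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    rng = random.Random(seed)
    # Assumption: rows that are already marked are not selected a second time.
    free_rows = [i for i, marked in enumerate(existing) if not marked]
    n_new = min(round(error_rate * len(existing)), len(free_rows))
    newly_marked = set(rng.sample(free_rows, n_new))
    # Previously marked rows stay marked; the new locations are added on top.
    return [marked or (i in newly_marked) for i, marked in enumerate(existing)]


existing = [True, False, False, False, True, False, False, False, False, False]
updated = extend_mask(existing, error_rate=0.2, seed=1)
print(sum(existing), "->", sum(updated), "rows marked")
```

Returning a new list rather than mutating `existing` keeps the earlier mask available for comparison.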

condition_to_column = None