tab_err

Submodules

Classes

ErrorMechanism

Error Mechanism Abstract Base Class.

ErrorModel

Combines an error mechanism and an error type, and defines what fraction of the column should be perturbed.

ErrorType

Error Type Abstract Base Class.

Package Contents

class tab_err.ErrorMechanism(condition_to_column: int | str | None = None, seed: int | None = None)

Bases: abc.ABC

Error Mechanism Abstract Base Class.

sample(data: pandas.DataFrame, column: str | int, error_rate: float, error_mask: pandas.DataFrame | None = None) → pandas.DataFrame

Returns an error mask marking the locations in a pandas DataFrame where errors should be introduced.

Description:

Validates the arguments for the abstract method ‘_sample’, assigns the _random_generator attribute, and then calls the subclass's _sample implementation.

Parameters:
  • data (pd.DataFrame) – DataFrame containing the column to add errors to

  • column (str | int) – The column of ‘data’ to create an error mask for

  • error_rate (float) – Fraction of rows to be affected by errors, in the range [0, 1].

  • error_mask (pd.DataFrame | None, optional) – An existing error mask to add further errors to, as used by the mid- and high-level APIs. Defaults to None.

Raises:
  • ValueError – If error_rate is outside the interval [0, 1].

  • TypeError – If ‘data’ is not a pandas DataFrame or is empty.

  • ValueError – If required by the mechanism and ‘data’ does not have two columns.

Returns:

The error mask DataFrame, updated with the newly generated error locations.

Return type:

pd.DataFrame
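The sequence described above (validate the arguments, assign a random generator, then delegate to the subclass's _sample) can be sketched as follows. This is a minimal, self-contained illustration and does not reproduce tab_err's code. It assumes a boolean error mask with the same shape as ‘data’ and a mechanism that picks rows completely at random. The class ToyCompletelyAtRandom and its internals are hypothetical.

```python
from __future__ import annotations

import numpy as np
import pandas as pd


class ToyCompletelyAtRandom:
    """Hypothetical stand-in for an ErrorMechanism subclass; not part of tab_err."""

    def __init__(self, seed: int | None = None):
        self.seed = seed

    def sample(self, data, column, error_rate, error_mask=None):
        # Mirror the documented checks: a rate in [0, 1] and a non-empty DataFrame.
        if not 0 <= error_rate <= 1:
            raise ValueError("error_rate must be in [0, 1]")
        if not isinstance(data, pd.DataFrame) or data.empty:
            raise TypeError("data must be a non-empty pandas DataFrame")
        # Assign the random generator, then delegate to the subclass hook.
        self._random_generator = np.random.default_rng(self.seed)
        return self._sample(data, column, error_rate, error_mask)

    def _sample(self, data, column, error_rate, error_mask):
        # Start from an all-False mask, or from a copy of an existing one.
        if error_mask is None:
            error_mask = pd.DataFrame(False, index=data.index, columns=data.columns)
        else:
            error_mask = error_mask.copy()
        # Pick distinct rows uniformly at random (rows already marked may be picked again).
        n_errors = int(round(error_rate * len(data)))
        rows = self._random_generator.choice(len(data), size=n_errors, replace=False)
        # Assumes `column` is a column label of `data`.
        error_mask.iloc[rows, error_mask.columns.get_loc(column)] = True
        return error_mask
```

Take a four-row frame and call `ToyCompletelyAtRandom(seed=0).sample(df, "age", error_rate=0.5)`. Exactly two cells of the ‘age’ column are marked True, and every other column stays False.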

condition_to_column = None

class tab_err.ErrorModel

Combines an error mechanism and an error type, and defines what fraction of the column should be perturbed.

error_mechanism

Instance of an ErrorMechanism that will be applied.

Type:

ErrorMechanism

error_type

Instance of an ErrorType that will be applied.

Type:

ErrorType

error_rate

Fraction of the column to perturb.

Type:

float

apply(data: pandas.DataFrame, column: str | int) → tuple[pandas.DataFrame, pandas.DataFrame]

Applies the defined ErrorModel to the given column of a pandas DataFrame.

Parameters:
  • data (pd.DataFrame) – The pandas DataFrame to create errors in.

  • column (str | int) – The column to create errors in.

Returns:

  • The first element is a copy of ‘data’ with errors.

  • The second element is the associated error mask.

Return type:

tuple[pd.DataFrame, pd.DataFrame]
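The sketch below shows one way an ErrorModel could combine its two parts: the mechanism decides where errors go, and the error type decides what they look like. It is an illustration, not tab_err's implementation. FirstRowsMechanism, MissingValueType and SketchErrorModel are hypothetical names, and modelling the class as a dataclass is an assumption suggested by the attribute listing.

```python
from __future__ import annotations

from dataclasses import dataclass

import pandas as pd


class FirstRowsMechanism:
    """Hypothetical mechanism: marks the first error_rate share of rows in `column`."""

    def sample(self, data, column, error_rate, error_mask=None):
        if error_mask is None:
            mask = pd.DataFrame(False, index=data.index, columns=data.columns)
        else:
            mask = error_mask.copy()
        n_errors = int(round(error_rate * len(data)))
        mask.iloc[:n_errors, mask.columns.get_loc(column)] = True
        return mask


class MissingValueType:
    """Hypothetical error type: replaces masked cells of `column` with NaN."""

    def apply(self, data, error_mask, column):
        # Series.mask puts NaN wherever the condition is True.
        return data[column].mask(error_mask[column])


@dataclass
class SketchErrorModel:
    """Hypothetical re-creation of how an ErrorModel combines its parts."""

    error_mechanism: FirstRowsMechanism
    error_type: MissingValueType
    error_rate: float

    def apply(self, data, column):
        # 1. The mechanism decides where errors go.
        error_mask = self.error_mechanism.sample(data, column, self.error_rate)
        # 2. The error type decides what the errors look like.
        perturbed = data.copy()
        perturbed[column] = self.error_type.apply(data, error_mask, column)
        # 3. Return the perturbed copy together with its error mask.
        return perturbed, error_mask
```

Because apply works on a copy, the input DataFrame is left unchanged. This matches the documented return value, whose first element is a copy of ‘data’ with errors.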

error_mechanism: tab_err.ErrorMechanism
error_rate: float
error_type: tab_err.ErrorType

class tab_err.ErrorType(config: tab_err.error_type._config.ErrorTypeConfig | dict | None = None, seed: int | None = None)

Bases: abc.ABC

Error Type Abstract Base Class.

apply(data: pandas.DataFrame, error_mask: pandas.DataFrame, column: str | int) → pandas.Series

Applies the ErrorType to a column of ‘data’. Validates argument types and shapes, and creates a random number generator.

Parameters:
  • data (pd.DataFrame) – The pandas DataFrame containing the column where errors are to be introduced.

  • error_mask (pd.DataFrame) – The pandas DataFrame containing the error mask for ‘column’.

  • column (str | int) – The column of the ‘data’ and ‘error_mask’ DataFrames in which errors are to be introduced.

Returns:

The column ‘column’ of ‘data’, with errors of this ErrorType introduced at the locations specified by ‘error_mask’.

Return type:

pd.Series
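Below is a minimal sketch of the steps ErrorType.apply is described as performing: type and shape checks, creation of a random number generator, and perturbation of only the masked cells of one column. It assumes pandas and NumPy, and assumes the error is Gaussian noise on a numeric column. ToyGaussianNoise and its `scale` parameter are hypothetical and not part of tab_err.

```python
from __future__ import annotations

import numpy as np
import pandas as pd


class ToyGaussianNoise:
    """Hypothetical ErrorType-like class: adds Gaussian noise to masked cells."""

    def __init__(self, scale: float = 1.0, seed: int | None = None):
        self.scale = scale
        self.seed = seed

    def apply(self, data, error_mask, column):
        # Type and shape checks, as described for ErrorType.apply.
        if not isinstance(data, pd.DataFrame) or not isinstance(error_mask, pd.DataFrame):
            raise TypeError("data and error_mask must be pandas DataFrames")
        if data.shape != error_mask.shape:
            raise ValueError("data and error_mask must have the same shape")
        # Create a fresh random number generator for this call.
        rng = np.random.default_rng(self.seed)
        series = data[column].astype(float)  # astype returns a copy by default
        rows = error_mask[column].to_numpy(dtype=bool)
        noise = rng.normal(loc=0.0, scale=self.scale, size=int(rows.sum()))
        series.loc[rows] = series.loc[rows] + noise
        return series
```

Cells outside the mask keep their original values, and the same seed reproduces the same perturbation.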

classmethod from_dict(data: dict[str, Any]) → ErrorType

Deserialize an ErrorType object from a dictionary.

Parameters:

data (dict[str, Any]) – A dictionary representation of the ErrorType object.

Returns:

An ErrorType object deserialized from the dictionary.

Return type:

ErrorType

get_valid_columns(data: pandas.DataFrame) → list[str | int]

Returns the columns of ‘data’ to which this error type can be applied. This is a wrapper around _get_valid_columns.
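The public-wrapper-around-a-private-hook pattern described above can be sketched as follows. Shared checks live in get_valid_columns, and each subclass only decides which columns qualify. ToyErrorTypeBase, ToyNumericNoise and the "numeric columns only" rule are hypothetical illustrations, not tab_err's classes.

```python
from __future__ import annotations

import pandas as pd


class ToyErrorTypeBase:
    """Hypothetical base class showing a public wrapper around a private hook."""

    def get_valid_columns(self, data: pd.DataFrame) -> list[str | int]:
        # Shared validation lives in the public wrapper...
        if not isinstance(data, pd.DataFrame):
            raise TypeError("data must be a pandas DataFrame")
        # ...while each subclass only decides which columns qualify.
        return self._get_valid_columns(data)

    def _get_valid_columns(self, data: pd.DataFrame) -> list[str | int]:
        raise NotImplementedError


class ToyNumericNoise(ToyErrorTypeBase):
    """Hypothetical error type that can only perturb numeric columns."""

    def _get_valid_columns(self, data: pd.DataFrame) -> list[str | int]:
        return data.select_dtypes(include="number").columns.tolist()
```

For a frame with integer ‘age’, float ‘income’ and string ‘city’ columns, this sketch returns ["age", "income"].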

to_dict() → dict[str, Any]

Serializes the ErrorType object into a dictionary.

Returns:

A dictionary representation of the ErrorType object.

Return type:

dict[str, Any]
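A to_dict/from_dict pair is expected to round-trip: deserializing the output of to_dict should rebuild an equivalent object. The sketch below shows one way to achieve this. It records the concrete class name alongside the configuration and looks the class up in a registry on the way back. The dictionary layout (‘error_type’ and ‘config’ keys), the registry, and the class names are assumptions for illustration, not tab_err's actual serialization format.

```python
from __future__ import annotations

from typing import Any


class ToySerializable:
    """Hypothetical base class showing a to_dict/from_dict round trip."""

    _registry: dict[str, type[ToySerializable]] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Register each subclass under its class name for deserialization.
        ToySerializable._registry[cls.__name__] = cls

    def __init__(self, config: dict[str, Any] | None = None):
        self.config = dict(config or {})

    def to_dict(self) -> dict[str, Any]:
        # Record which class to rebuild plus a copy of its configuration.
        return {"error_type": type(self).__name__, "config": dict(self.config)}

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> ToySerializable:
        # Look up the concrete class by name and rebuild it from its config.
        subclass = ToySerializable._registry[data["error_type"]]
        return subclass(config=data.get("config"))


class ToyMissingValue(ToySerializable):
    """Hypothetical concrete error type."""
```

Storing the class name lets a single from_dict entry point on the base class rebuild any registered subclass.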