research_client.datavalidator.validation

Validation tools for the datavalidator package.

Classes

ValidationResult

Data validation result.

Validator

Data validation interface.

class ValidationResult

Bases: object

Data validation result.

Each ValidationResult represents the result of a data validation attempt in a Validator. The ValidationResult can be queried for details on the data that was validated, whether it passed/failed validation, what the requirements for validation were, etc. A ValidationResult will evaluate to True if the validation has succeeded and to False if it has failed.
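
The truthiness behaviour described above can be illustrated with a minimal sketch. ResultSketch is a hypothetical stand-in written for this example, not the library's class; it assumes only that truthiness mirrors the success attribute.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ResultSketch:
    """Hypothetical stand-in for ValidationResult; not the library's class."""
    success: bool
    data: Any
    rawdata: Any

    def __bool__(self) -> bool:
        # Truthiness mirrors the success flag, as the description states.
        return self.success


passed = ResultSketch(success=True, data=42, rawdata="42")
failed = ResultSketch(success=False, data=None, rawdata="forty-two")

# A result can be used directly in a boolean context.
summary = ["ok" if r else "failed" for r in (passed, failed)]
```

With the real class, the same pattern would presumably apply to the return value of a validation method such as vint().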

__init__(success, type_, typedesc, constraint, data, rawdata, casting)

Constructs a new ValidationResult.

Parameters:
  • success (bool) – Whether the validation has succeeded or failed.

  • type_ (Any) – The built-in data type against which validation was carried out.

  • typedesc (str) – A short user-intelligible description of the data’s type.

  • constraint (Union[str, tuple[int, int], tuple[float, float], list[int], list[float], Iterable[Any], tuple[Iterable[Any], Iterable[Any]], list[Iterable[Any]], dict[Any, Any], bool, None]) – A constraint data type appropriate to the type of the data, see also the types submodule.

  • data (Any) – The data that was evaluated (after casting, if the forcecasting and/or softcasting options were active).

  • rawdata (Any) – The raw, uncast data as it was passed to the validation method.

  • casting (bool) – Whether casting was applied to rawdata to yield data.

tohtml()

Returns HTML formatted explanation of the data validation result.

Return type:

str

tojson()

Returns a JSON representation of the data validation result.

Return type:

str

tostring()

Returns string explanation of the data validation result.

Return type:

str

casting

Type:    bool

Whether the data in data was cast or not.

Note that a value of True does not imply that casting was necessary. For example, an integer passed to a Validator’s vint() method is still cast with int(), which sets the casting attribute to True even though the data was already of type int.

constraint

Type:    Union[str, tuple[int, int], tuple[float, float], list[int], list[float], Iterable[Any], tuple[Iterable[Any], Iterable[Any]], list[Iterable[Any]], dict[Any, Any], bool, None]

The constraint, if any, which was used to validate the data.

data

Type:    Any

The data itself, possibly cast.

This is the data post-casting if forcecasting or softcasting was used, and can be used to automatically ensure typecasting or type-narrowing for data storage. For the raw data pre-casting use the rawdata attribute.

rawdata

Type:    Any

The raw data, as it was passed to the validator.

This is always the data as it was passed to the Validator, irrespective of whether forcecasting or softcasting were applied.

success

Type:    bool

Whether the data has been successfully validated or not.

type_

Type:    Any

The internal data type indicated for validation.

typedesc

Type:    str

A user-directed description of the type of data being validated.

class Validator

Bases: object

Data validation interface.

A Validator offers a convenient interface for validating a set of data points, of the same or different types. The Validator will store any failed validation results, can optionally force casting of the data to a specific type, can be evaluated for success in a boolean expression, and allows for the conditional raising of a DataValidationError exception if any validation attempts have failed.

A single Validator should only be used once for a closed set of data: reuse adds the new results to those already stored, so the Validator will always evaluate to False if it has previously had unsuccessful validation attempts (though under some circumstances, e.g. the successive building of datasets with late repairs, this may be desirable).
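
The accumulation behaviour described above can be sketched as follows. ValidatorSketch, its record() method and DataValidationErrorSketch are hypothetical stand-ins written for this example; they model only the bookkeeping (stored results, truthiness, conditional raising) and none of the actual validation logic.

```python
class DataValidationErrorSketch(Exception):
    """Hypothetical stand-in for DataValidationError."""


class ValidatorSketch:
    """Hypothetical stand-in that only models result bookkeeping."""

    def __init__(self):
        self.results = []

    def record(self, success):
        # The real validation methods store a ValidationResult; a bare
        # bool is enough to show the accumulation behaviour.
        self.results.append(success)
        return success

    def __bool__(self):
        # True only while no recorded validation has failed.
        return all(self.results)

    def raiseif(self):
        if not self:
            raise DataValidationErrorSketch("at least one validation failed")


v = ValidatorSketch()
v.record(True)
v.record(False)  # one failure ...
v.record(True)   # ... is not cleared by a later success

try:
    v.raiseif()
    raised = False
except DataValidationErrorSketch:
    raised = True
```

This is why a fresh Validator per closed set of data is the safer default: an earlier failure keeps the whole object falsy.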

__casefolddict(dict_)
Return type:

dict[TypeVar(XT), TypeVar(YT)]

Parameters:

dict_ (dict[~XT, ~YT])

__casefoldifstr(x)
Return type:

TypeVar(XT)

Parameters:

x (XT)

__condcast(type_, data, overwrite=None)

Conditionally casts data to a type.

Attempts to cast data to type_ if, taking into account overwrite, forcecasting applies. Returns data itself if forcecasting doesn’t apply, and None if forcecasting applies but data cannot be cast to type_.

Return type:

Union[TypeVar(XT), TypeVar(YT), None]

Parameters:
  • type_ (Callable[[...], XT])

  • data (YT)

  • overwrite (Optional[bool])
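
The three outcomes described for __condcast can be sketched with plain functions. The names trycall and condcast below are hypothetical module-level stand-ins for the private methods, and the single forcecast argument is an assumption that replaces the resolution of overwrite against the validator's default.

```python
from typing import Any, Callable, Optional


def trycall(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Optional[Any]:
    """Return func(*args, **kwargs), or None if it raises an Exception."""
    try:
        return func(*args, **kwargs)
    except Exception:
        return None


def condcast(type_: Callable[..., Any], data: Any, forcecast: bool) -> Any:
    """Cast data with type_ only when forcecasting applies."""
    if not forcecast:
        return data              # forcecasting off: data passes through
    return trycall(type_, data)  # cast result, or None if the cast fails


unchanged = condcast(int, "7", forcecast=False)
cast_ok = condcast(int, "7", forcecast=True)
cast_bad = condcast(int, "seven", forcecast=True)
```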

__forcecast(overwrite=None)
Return type:

bool

Parameters:

overwrite (Optional[bool])

__ignorecase(overwrite=None)
Parameters:

overwrite (Optional[bool])

__init__(forcecast=False, ignorecase=False)

Constructs a new Validator.

Parameters:
  • forcecast (bool, default: False) – Whether to force casting of the data arguments to the validation methods to the indicated type (e.g. str for vstr()). This sets the default behaviour for validation calls, but can be overwritten by passing the named argument forcecast=True or forcecast=False on individual method calls. For some methods, e.g. vpolar() and venum(), casting is done by passing the matched value rather than typecasting.

  • ignorecase (bool, default: False) – Whether to ignore case in string comparisons. If True, strings will be compared in all uppercase, and regular expression matches will be passed the IGNORECASE flag. Can be overwritten on each validation call by passing ignorecase=True or ignorecase=False.
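
A minimal sketch of how a per-call argument might take precedence over the constructor default, assuming that None means "use the default". The function name resolve is hypothetical; the actual private helpers may differ.

```python
from typing import Optional


def resolve(default: bool, overwrite: Optional[bool] = None) -> bool:
    """Return the per-call setting if one was given, else the default."""
    return default if overwrite is None else overwrite


# Constructor default False, no per-call argument: default applies.
use_default = resolve(False)
# Constructor default False, per-call True: the call wins.
call_wins = resolve(False, True)
# Constructor default True, per-call False: the call still wins.
call_disables = resolve(True, False)
```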

__storeresult(result)
Parameters:

result (ValidationResult)

__trycall(func, *args, **kwargs)

Returns result of func() if possible, None if an Exception is raised.

Return type:

Optional[TypeVar(XT)]

Parameters:
  • func (Callable[[...], XT])

  • args (Any)

  • kwargs (dict[Any, Any])

raiseif()

Raises a DataValidationError if and only if at least one validation has failed.

tohtml(errorsonly=False)

Returns a paragraph-by-paragraph HTML representation of validation attempts.

Parameters:

errorsonly (bool, default: False) – Whether to include only the errors or all validation attempts.

tostring(errorsonly=False)

Returns a line-by-line string representation of validation attempts.

Parameters:

errorsonly (bool, default: False) – Whether to include only the errors or all validation attempts.

vbool(typedesc, constraint, data, forcecast=None)

Validates a bool.

Parameters:
  • typedesc (str) – An end-user intelligible description of the desired data type, e.g. “User ID” or “postcode”.

  • constraint (bool) – A boolean that must be matched, or None to just validate any bool.

  • data (Any) – The data to be validated.

  • forcecast (Optional[bool], default: None) – Optional argument to overwrite the validator’s default setting for forced casting of data arguments.

venum(typedesc, constraint, data, forcecast=None, ignorecase=None)

Validates a string against a key:value enumerable.

Checks whether data is either contained in the keys or the values of the enumerable. If forcecasting is used, it casts to the value that was matched (not the key), or to None if no match was found.

Return type:

ValidationResult

Parameters:
  • typedesc (str)

  • constraint (dict[Any, Any])

  • data (Any)

  • forcecast (Optional[bool])

  • ignorecase (Optional[bool])
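
The key-or-value matching described for venum can be sketched as below. The function match_enum is a hypothetical stand-in; checking keys before values is an assumption, and case handling is omitted.

```python
from typing import Any, Optional


def match_enum(constraint: dict, data: Any) -> Optional[Any]:
    """Return the matched value, or None if data matches no key or value."""
    if data in constraint:           # matched a key: cast to its value
        return constraint[data]
    if data in constraint.values():  # matched a value: keep it
        return data
    return None                      # no match: forcecasting yields None


colours = {"r": "red", "g": "green"}
from_key = match_enum(colours, "g")
from_value = match_enum(colours, "red")
no_match = match_enum(colours, "blue")
```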

vfloat(typedesc, constraint, data, forcecast=None)

Validates a float against an inclusive range of integers or floats.

Parameters:
  • typedesc (str) – An end-user intelligible description of the desired data type, e.g. “User ID” or “postcode”.

  • constraint (Union[tuple[int, int], tuple[float, float], list[int], list[float]]) – A two-member tuple or list of integers or floats where the first element represents the inclusive lower bound and the second member the inclusive upper bound of the permissible range of values. For example, (3, 5) would successfully validate the data inputs 3, 4.5 and 5, but fail validation for 2.9 or 5.1.

  • data (Any) – The data to be validated.

  • forcecast (Optional[bool], default: None) – Optional argument to overwrite the validator’s default setting for forced casting of data arguments.

vint(typedesc, constraint, data, forcecast=None)

Validates an integer against an inclusive range of integers or floats.

Parameters:
  • typedesc (str) – An end-user intelligible description of the desired data type, e.g. “User ID” or “postcode”.

  • constraint (Union[tuple[int, int], tuple[float, float], list[int], list[float]]) – A two-member tuple or list of numbers where the first element represents the inclusive lower bound and the second member the inclusive upper bound of the permissible range of integer values. For example, (3, 5) would successfully validate the data inputs 3, 4 and 5, but fail validation for 2 or 6.

  • data (Any) – The data to be validated.

  • forcecast (Optional[bool], default: None) – Optional argument to overwrite the validator’s default setting for forced casting of data arguments.

Return type:

ValidationResult
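
The inclusive range check described for vint and vfloat can be sketched with a chained comparison. The function in_inclusive_range is a hypothetical helper for illustration and does not perform any casting.

```python
from typing import Sequence, Union

Number = Union[int, float]


def in_inclusive_range(value: Number, bounds: Sequence[Number]) -> bool:
    """Check lower <= value <= upper for a two-member tuple or list."""
    lower, upper = bounds
    return lower <= value <= upper


# With the constraint (3, 5), only 3, 4 and 5 are accepted.
accepted = [n for n in range(1, 8) if in_inclusive_range(n, (3, 5))]

# Both bounds are inclusive for floats as well.
upper_bound_ok = in_inclusive_range(5.0, [3, 5])
just_above = in_inclusive_range(5.1, [3, 5])
```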

vpolar(typedesc, constraint, data, forcecast=None, ignorecase=None)

Validates a string against two sets of polar terms.

Checks whether data is in either of two sets of polar opposition terms. If the forcecast option is active, membership in the first of the two sets results in casting to True, membership in the second set to False, and membership in neither set to None. Validation is successful if data is contained in either of the two sets, and unsuccessful otherwise.

Parameters:
  • typedesc (str) – An end-user intelligible description of the desired data type, e.g. “User ID” or “postcode”.

  • constraint (Union[tuple[Iterable[Any], Iterable[Any]], list[Iterable[Any]]]) – A two-member tuple or list of containers of any type (must support membership testing with in). The first member is a container of acceptable truthy values, the second is a container of acceptable falsy values. Note that the Python built-in boolean values True and False are always validated as correct.

  • data (Any) – The data to be validated.

  • forcecast (Optional[bool], default: None) – Optional argument to overwrite the validator’s default setting for forced casting of data arguments.

  • ignorecase (Optional[bool], default: None) – Optional argument to overwrite the validator’s default setting for case sensitivity.

Return type:

ValidationResult
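
The casting rule described for vpolar can be sketched as a three-way lookup. The function polar_cast is a hypothetical stand-in; it ignores case handling and assumes the built-in True and False are accepted before the containers are consulted.

```python
from typing import Any, Iterable, Optional, Sequence


def polar_cast(constraint: Sequence[Iterable[Any]], data: Any) -> Optional[bool]:
    """Map data to True, False, or None according to two term sets."""
    truthy, falsy = constraint
    if data is True or data in truthy:
        return True
    if data is False or data in falsy:
        return False
    return None  # member of neither set: validation would fail


terms = (("yes", "y"), ("no", "n"))
cast_yes = polar_cast(terms, "y")
cast_no = polar_cast(terms, "no")
cast_unknown = polar_cast(terms, "maybe")
cast_builtin = polar_cast(terms, False)
```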

vstr(typedesc, constraint, data, forcecast=None, ignorecase=None, flags=0)

Validates a string against a regular expression pattern.

Parameters:
  • typedesc (str) – An end-user intelligible description of the desired data type, e.g. “User ID” or “postcode”.

  • constraint (str) – A regular expression to match the string against. Important: Note that the regular expression will implicitly be enclosed by \A and \Z to match the beginning and end of the string. These thus need not be specified in the pattern provided.

  • data (Any) – The data to be validated.

  • forcecast (Optional[bool], default: None) – Optional argument to overwrite the validator’s default setting for forced casting of data arguments.

  • ignorecase (Optional[bool], default: None) – Optional argument to overwrite the validator’s default setting for case sensitivity.

  • flags (Union[RegexFlag, int], default: 0) – Additional regex flags to be passed to re.match().

Return type:

ValidationResult
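
The implicit anchoring described for the constraint parameter can be reproduced with the standard re module. The function matches_whole_string is a hypothetical helper; the non-capturing group around the pattern is an assumption added here so that alternations stay inside the anchors, and may not reflect how the library builds its pattern.

```python
import re


def matches_whole_string(pattern: str, data: str,
                         ignorecase: bool = False, flags: int = 0) -> bool:
    """Match data against pattern anchored at both ends of the string."""
    if ignorecase:
        flags |= re.IGNORECASE
    # \A and \Z anchor the match to the start and end of the string.
    anchored = r"\A(?:" + pattern + r")\Z"
    return re.match(anchored, data, flags) is not None


code = r"[A-Z]{2}\d{3}"
exact = matches_whole_string(code, "AB123")
trailing = matches_whole_string(code, "AB1234")
leading = matches_whole_string(code, "xAB123")
lower_strict = matches_whole_string(code, "ab123")
lower_relaxed = matches_whole_string(code, "ab123", ignorecase=True)
```

Because of the anchors, a pattern that matches only part of the input fails validation, so callers do not need to add ^ or $ themselves.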

failed

Type:    list[ValidationResult]

forcecast

Type:    bool

ignorecase

Type:    bool

results

Type:    list[ValidationResult]

successful

Type:    list[ValidationResult]