research_client.datavalidator.schemas

Prototypes to conveniently define data classes with built-in validation.

Classes

CField

Defines a manually validated data field for DataSchema classes.

CFieldList

Defines a field containing an arbitrary number of CField data.

DataField

Base class for data schema fields.

DataFieldList

Defines a field containing an arbitrary number of DataField data.

DataGroup

Defines a data group for DataSchema classes.

DataSchema

Abstract base class to define auto-validating data classes.

VField

Defines an auto-validated data field for DataSchema classes.

VFieldList

Defines a field containing an arbitrary number of VField data.

class CField

Bases: DataField

Defines a manually validated data field for DataSchema classes.

__init__(name, type_, typedesc, vmethod, forcecast=None, required=True)

Instantiates a new CField.

Parameters:
  • name (str)

  • type_ (str)

  • typedesc (str)

  • vmethod (Callable[[Any, Any], Any])

  • forcecast (Optional[bool])

  • required (bool)

_fieldparams = [('name', <class 'str'>, 'The name of the DataField'), ('type_', typing.Any, 'The data type for the DataField'), ('typedesc', <class 'str'>, 'A user-intelligible description of the data type'), ('vmethod', typing.Callable[[typing.Any, typing.Any], typing.Any], 'A callable accepting two arguments: the first is the type_ of the DataField and the second the value. The callable should raise either a TypeError or a DataValidationError if the value is invalid, and must return a (possibly processed) version of the value it was passed which fits the type_ it was passed.'), ('forcecast', typing.Optional[bool], 'Whether to force casting of data during validation', None), ('required', <class 'bool'>, 'Whether the field is required', True)]

Type:    list[Union[tuple[str, Any, str], tuple[str, Any, str, Any]]]

forcecast

Type:    Optional[bool]

vmethod

Type:    Union[str, Callable[[Any, Any], Any]]
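The `vmethod` contract described in `_fieldparams` (two positional arguments, the field's `type_` and the value; raise `TypeError` or `DataValidationError` on invalid input; return a possibly processed value that fits `type_`) can be illustrated with a plain function. This is a minimal sketch: the name `positive_int_vmethod` is hypothetical, and because the import path of `DataValidationError` is not shown on this page, the sketch raises only `TypeError`.

```python
from typing import Any


def positive_int_vmethod(type_: Any, value: Any) -> Any:
    """Hypothetical vmethod: accept positive integers, coercing numeric strings.

    Follows the documented contract: takes (type_, value), raises TypeError
    for invalid input, and returns a value that fits type_.
    """
    # Coerce decimal strings such as "42" before checking
    if isinstance(value, str) and value.strip().isdecimal():
        value = int(value.strip())
    # bool is a subclass of int, so it is rejected explicitly
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise TypeError(f"expected a positive integer, got {value!r}")
    # Cast with type_ only if it is callable (e.g. int); a string such as "int" is ignored
    return type_(value) if callable(type_) else value


print(positive_int_vmethod(int, "42"))  # 42
```

Such a function would presumably be passed as the `vmethod` argument, e.g. `CField("count", "int", "a positive integer", positive_int_vmethod)`. Whether `type_` arrives as a type object or as a string is not settled here (the `__init__` signature lists `str`, `_fieldparams` lists `Any`), so the sketch tolerates both.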

class CFieldList

Bases: CField, DataFieldList

Defines a field containing an arbitrary number of CField data.
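Because a list field holds an arbitrary number of values, one plausible reading is that the field's `vmethod` is applied to each element in turn. The sketch below assumes exactly that; `validate_each` and `strip_vmethod` are hypothetical helpers, not part of this module.

```python
from typing import Any, Callable

VMethod = Callable[[Any, Any], Any]


def strip_vmethod(type_: Any, value: Any) -> Any:
    """Hypothetical vmethod: accept non-empty strings and strip whitespace."""
    if not isinstance(value, str) or not value.strip():
        raise TypeError(f"expected a non-empty string, got {value!r}")
    return value.strip()


def validate_each(vmethod: VMethod, type_: Any, values: list[Any]) -> list[Any]:
    """Apply a CField-style vmethod to every element of a list value."""
    if not isinstance(values, list):
        raise TypeError(f"expected a list, got {type(values).__name__}")
    # Each element is validated independently; the first failure propagates
    return [vmethod(type_, item) for item in values]


print(validate_each(strip_vmethod, str, ["  alpha", "beta  "]))  # ['alpha', 'beta']
```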

class DataField

Bases: object

Base class for data schema fields.

To instantiate, use VField, CField, or shorthand notation in a DataSchema declaration.

__init__(name, type_, typedesc, required=True)

Instantiates a new DataField.

Parameters:
  • name (str)

  • type_ (Any)

  • typedesc (str)

  • required (bool)

classmethod fieldparams()

Returns the parameter list for a DataField of this type.

Return type:

list[Union[tuple[str, Any, str], tuple[str, Any, str, Any]]]

fieldspecs()

Returns the values for each parameter of the DataField.

Return type:

dict[str, Any]

_fieldparams = [('name', <class 'str'>, 'The name of the DataField'), ('type_', typing.Any, 'The data type for the DataField'), ('typedesc', <class 'str'>, 'A user-intelligible description of the data type'), ('required', <class 'bool'>, 'Whether the field is required', True)]

Type:    list[Union[tuple[str, Any, str], tuple[str, Any, str, Any]]]
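Each `_fieldparams` entry is either a 3-tuple `(name, type, description)` or a 4-tuple that adds a default value, and those defaults match the ones in `__init__` (e.g. `required=True`). A sketch of how `fieldparams()` and `fieldspecs()` could fit together, assuming `fieldspecs()` simply reads each listed parameter back from the instance; `SketchField` and `split_params` are hypothetical stand-ins, not the real `DataField`.

```python
from typing import Any, Union

ParamEntry = Union[tuple[str, Any, str], tuple[str, Any, str, Any]]


class SketchField:
    """Hypothetical stand-in mirroring DataField's _fieldparams layout."""

    _fieldparams: list[ParamEntry] = [
        ("name", str, "The name of the DataField"),
        ("type_", Any, "The data type for the DataField"),
        ("typedesc", str, "A user-intelligible description of the data type"),
        ("required", bool, "Whether the field is required", True),
    ]

    def __init__(self, name: str, type_: Any, typedesc: str, required: bool = True):
        self.name = name
        self.type_ = type_
        self.typedesc = typedesc
        self.required = required

    @classmethod
    def fieldparams(cls) -> list[ParamEntry]:
        return list(cls._fieldparams)

    def fieldspecs(self) -> dict[str, Any]:
        # One entry per declared parameter, read back from the instance
        return {entry[0]: getattr(self, entry[0]) for entry in self.fieldparams()}


def split_params(params: list[ParamEntry]) -> tuple[list[str], dict[str, Any]]:
    """Separate mandatory parameters (3-tuples) from optional ones (4-tuples)."""
    mandatory = [p[0] for p in params if len(p) == 3]
    optional = {p[0]: p[3] for p in params if len(p) == 4}
    return mandatory, optional


print(split_params(SketchField.fieldparams()))
# (['name', 'type_', 'typedesc'], {'required': True})
print(SketchField("age", int, "a whole number").fieldspecs())
```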

name

Type:    str

required

Type:    bool

type_

Type:    Any

typedesc

Type:    str

class DataFieldList

Bases: DataField

Defines a field containing an arbitrary number of DataField data.

class DataGroup

Bases: object

Defines a data group for DataSchema classes.

__init__(name, fields)

Instantiates a new DataGroup.

Parameters:
  • name (str)

  • fields (dict[str, Union[DataField, dict[str, Any]]])

getfield(key)

Gets the data field with the name indicated by key.

Return type:

Union[dict[str, Any], DataField]

Parameters:

key (str)

class DataSchema

Bases: object

Abstract base class to define auto-validating data classes.

@TODO:
  • __getattr__(self, name) - to retrieve data

  • __setattr__(self, name, value) - to set data (with validation)

  • __delattr__(self, name) - to remove/clear a datapoint

  • Move some __new__ stuff to __init_subclass__(cls)?

  • JSON import/export

__delfieldfactory(fieldname, fieldspecs)

Return type:

Callable[[DataSchema], None]

Parameters:
  • fieldname (str)

  • fieldspecs (dict[str, Any])

__delfieldlistfactory(fieldname, fieldspec)

Return type:

Callable[[DataSchema], None]

Parameters:
  • fieldname (str)

  • fieldspec (dict[str, Any])

__delgroupfactory(gname, fieldspecs)

Return type:

Callable[[DataSchema], None]

Parameters:
  • gname (str)

  • fieldspecs (dict[str, dict[str, Any]])

classmethod __functionalize()

Dynamically creates and attaches methods to get/set values.

Return type:

None
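The return types of the get/set/delete factories (`Callable[[DataSchema], Any]`, `Callable[[DataSchema, Any], None]`, `Callable[[DataSchema], None]`) line up with the `fget`, `fset` and `fdel` arguments of Python's built-in `property`. The sketch below assumes `__functionalize` combines them that way; the class `Record`, the `make_*` helpers and the `_data` store are hypothetical, and validation is reduced to a single `isinstance` check.

```python
from typing import Any, Callable


def make_getter(fieldname: str) -> Callable[[Any], Any]:
    def getter(self: Any) -> Any:
        return self._data.get(fieldname)
    return getter


def make_setter(fieldname: str, type_: type) -> Callable[[Any, Any], None]:
    def setter(self: Any, value: Any) -> None:
        # Stand-in for real validation: reject values of the wrong type
        if not isinstance(value, type_):
            raise TypeError(f"{fieldname} expects {type_.__name__}")
        self._data[fieldname] = value
    return setter


def make_deleter(fieldname: str) -> Callable[[Any], None]:
    def deleter(self: Any) -> None:
        # Clear the datapoint rather than removing the attribute itself
        self._data[fieldname] = None
    return deleter


class Record:
    """Hypothetical schema with fields declared as name -> type."""

    _fields = {"title": str, "year": int}

    def __init__(self) -> None:
        self._data: dict[str, Any] = {name: None for name in self._fields}


# Attach one property per field in a single class-level pass
for _name, _type in Record._fields.items():
    setattr(
        Record,
        _name,
        property(make_getter(_name), make_setter(_name, _type), make_deleter(_name)),
    )

rec = Record()
rec.year = 1999
del rec.year
print(rec.year)  # None
```

Passing the field name into each factory (rather than closing over the loop variable) keeps every generated accessor bound to its own field.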

__getfieldfactory(fieldname, fieldspecs)

Return type:

Callable[[DataSchema], Any]

Parameters:
  • fieldname (str)

  • fieldspecs (dict[str, Any])

__getfieldlistfactory(fieldname, fieldspec)

Return type:

Callable[[DataSchema], list[Any]]

Parameters:
  • fieldname (str)

  • fieldspec (dict[str, Any])

classmethod __getfieldspecs(key)

Get the specifications for a single DataField or fields in a DataGroup.

Return type:

dict[str, dict[str, Any]]

Parameters:

key (str)

__getgroupfactory(gname, fieldspecs)

Return type:

Callable[[DataSchema], dict[str, Any]]

Parameters:
  • gname (str)

  • fieldspecs (dict[str, dict[str, Any]])

classmethod __index()

Creates a flat list of index keys, separating groups and fields with “/”.

Return type:

list[str]
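Under that description, a schema with a top-level field `name` and a group `address` holding `street` and `city` would presumably be indexed as `name`, `address/street`, `address/city`. A sketch of the flattening, assuming for illustration that the schema is a plain nested dictionary in which a dict value marks a group; `flat_index` and the example schema are hypothetical.

```python
from typing import Any


def flat_index(schema: dict[str, Any]) -> list[str]:
    """Flatten one level of grouping into "group/field" keys."""
    keys: list[str] = []
    for name, spec in schema.items():
        if isinstance(spec, dict):
            # A group: emit one key per member field
            keys.extend(f"{name}/{field}" for field in spec)
        else:
            keys.append(name)
    return keys


schema = {"name": str, "address": {"street": str, "city": str}}
print(flat_index(schema))  # ['name', 'address/street', 'address/city']
```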

__init__(forcecast=True, ignorecase=True)

Initialises a new DataSchema object.

Parameters:
  • forcecast (bool)

  • ignorecase (bool)

classmethod __materialize()

Creates an empty __data store pre-populated with DataGroups/DataFieldLists.

Return type:

dict[str, Union[dict[str, Any], Any]]
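Read literally, only groups and list fields are pre-populated: groups as nested dictionaries and list fields as empty lists, while plain fields stay absent until a value is set. The sketch follows that reading, which is an assumption; the `FIELD`/`FIELDLIST` markers and `materialize` are hypothetical conventions for this example only.

```python
from typing import Any

# Hypothetical field-kind markers used only in this sketch
FIELD, FIELDLIST = "field", "fieldlist"


def materialize(schema: dict[str, Any]) -> dict[str, Any]:
    """Build an empty data store holding groups and list fields only."""
    store: dict[str, Any] = {}
    for name, spec in schema.items():
        if isinstance(spec, dict):
            # A group becomes a nested dict; its list fields start as []
            store[name] = {f: [] for f, kind in spec.items() if kind == FIELDLIST}
        elif spec == FIELDLIST:
            store[name] = []
        # Plain fields are left out until a value is set
    return store


schema = {"title": FIELD, "tags": FIELDLIST, "author": {"first": FIELD, "aliases": FIELDLIST}}
print(materialize(schema))  # {'tags': [], 'author': {'aliases': []}}
```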

static __new__(cls, *args, **kwargs)

Constructs a new DataSchema instance.

Parameters:
  • args (Any)

  • kwargs (Any)

classmethod __schematize(schema, schemaname)

Return type:

Union[DataGroup, DataField]

Parameters:
  • schema

  • schemaname

__setfieldfactory(fieldname, fieldspec)

Return type:

Callable[[DataSchema, Any], None]

Parameters:
  • fieldname (str)

  • fieldspec (dict[str, Any])

__setfieldlistfactory(fieldname, fieldspec)

Return type:

Callable[[DataSchema, list[Any]], None]

Parameters:
  • fieldname (str)

  • fieldspec (dict[str, Any])

__setgroupfactory(gname, fieldspecs)

Return type:

Callable[[DataSchema, dict[str, Any]], None]

Parameters:
  • gname (str)

  • fieldspecs (dict[str, dict[str, Any]])

_autovalidate(vr, fieldspec, value)

Calls the appropriate validation method for fieldspec and value.

Return type:

ValidationResult

Parameters:
  • vr (Validator)

  • fieldspec (dict[str, Any])

  • value (Any)

_customvalidate(vr, fieldspec, value)

Calls a custom validation method on value and appends the result to vr.

Return type:

ValidationResult

Parameters:
  • vr (Validator)

  • fieldspec (dict[str, Any])

  • value (Any)

_getfield(key)

Gets the field or group addressed by key.

Return type:

Union[DataField, DataGroup]

Parameters:

key (str)

_getvalue(key)

Gets the value from the data addressed by key.

Return type:

Any

Parameters:

key (str)

static _isna(value)

Returns True if value is either None or [], False otherwise.

Return type:

bool

Parameters:

value (Any)

_setvalue(key, value)

Sets the value for the data addressed by key (without validation).

Return type:

None

Parameters:
  • key (str)

  • value (Any)
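Keys of the form `group/field` (the format `__index` describes) address either a top-level field or a field inside a group. A sketch of unvalidated get/set on a nested dictionary store, assuming at most one level of grouping; `get_value`, `set_value` and the store layout are hypothetical, not the module's internals.

```python
from typing import Any


def _split(key: str) -> list[str]:
    parts = key.split("/")
    if len(parts) > 2:
        raise KeyError(f"expected 'field' or 'group/field', got {key!r}")
    return parts


def get_value(store: dict[str, Any], key: str) -> Any:
    parts = _split(key)
    if len(parts) == 1:
        return store.get(parts[0])
    group, field = parts
    return store.get(group, {}).get(field)


def set_value(store: dict[str, Any], key: str, value: Any) -> None:
    # No validation here, in the spirit of _setvalue
    parts = _split(key)
    if len(parts) == 1:
        store[parts[0]] = value
    else:
        group, field = parts
        store.setdefault(group, {})[field] = value


store: dict[str, Any] = {}
set_value(store, "title", "Dune")
set_value(store, "author/last", "Herbert")
print(get_value(store, "author/last"))  # Herbert
print(store)  # {'title': 'Dune', 'author': {'last': 'Herbert'}}
```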

data(includemissing=False, onlyrequired=False)

Returns the data of the DataSchema as a schematic dictionary.

Return type:

dict[str, Union[dict[str, Any], Any]]

Parameters:
  • includemissing (bool)

  • onlyrequired (bool)

iscomplete(onlyrequired=True)

Checks whether the dataset is complete.

Return type:

bool

Parameters:

onlyrequired (bool)

items(includemissing=False, onlyrequired=False)

Returns a list of key-value pairs for data in the DataSchema.

Return type:

list[tuple[str, Any]]

Parameters:
  • includemissing (bool)

  • onlyrequired (bool)

keys(includemissing=False, onlyrequired=False)

Returns a list of keys for the DataSchema.

Return type:

list[str]

Parameters:
  • includemissing (bool)

  • onlyrequired (bool)

missing(onlyrequired=True)

Returns a list of keys for missing fields.

Return type:

list[str]

Parameters:

onlyrequired (bool)
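Going by `_isna`, a field counts as missing when its value is `None` or `[]`. A sketch of `missing()` and `iscomplete()` over a flat mapping of keys to `(required, value)` pairs; that representation and the function names are hypothetical, and the sketch assumes `onlyrequired=True` restricts the check to required fields.

```python
from typing import Any

Fields = dict[str, tuple[bool, Any]]


def isna(value: Any) -> bool:
    # Mirrors the documented _isna rule: None or an empty list
    return value is None or (isinstance(value, list) and not value)


def missing(fields: Fields, onlyrequired: bool = True) -> list[str]:
    return [
        key
        for key, (required, value) in fields.items()
        if isna(value) and (required or not onlyrequired)
    ]


def iscomplete(fields: Fields, onlyrequired: bool = True) -> bool:
    return not missing(fields, onlyrequired)


fields: Fields = {
    "title": (True, "Dune"),
    "author/last": (True, None),
    "tags": (False, []),
}
print(missing(fields))                      # ['author/last']
print(missing(fields, onlyrequired=False))  # ['author/last', 'tags']
print(iscomplete(fields))                   # False
```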

values(includemissing=False, onlyrequired=False)

Returns a list of values for the data in the DataSchema.

Return type:

list[Any]

Parameters:
  • includemissing (bool)

  • onlyrequired (bool)
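`keys()`, `values()`, `items()` and `data()` share the `includemissing` and `onlyrequired` switches. One plausible reading, assumed here rather than taken from the source, is that `includemissing=False` drops fields whose value is missing and `onlyrequired=True` keeps only required fields. The sketch reuses a hypothetical flat `(required, value)` mapping and derives `keys()` and `values()` from `items()`.

```python
from typing import Any

Fields = dict[str, tuple[bool, Any]]


def _isna(value: Any) -> bool:
    return value is None or (isinstance(value, list) and not value)


def items(fields: Fields, includemissing: bool = False,
          onlyrequired: bool = False) -> list[tuple[str, Any]]:
    result = []
    for key, (required, value) in fields.items():
        if onlyrequired and not required:
            continue  # skip optional fields
        if not includemissing and _isna(value):
            continue  # skip fields without data
        result.append((key, value))
    return result


def keys(fields: Fields, includemissing: bool = False, onlyrequired: bool = False) -> list[str]:
    return [k for k, _ in items(fields, includemissing, onlyrequired)]


def values(fields: Fields, includemissing: bool = False, onlyrequired: bool = False) -> list[Any]:
    return [v for _, v in items(fields, includemissing, onlyrequired)]


fields: Fields = {"title": (True, "Dune"), "year": (True, None), "tags": (False, ["sf"])}
print(keys(fields))                                             # ['title', 'tags']
print(keys(fields, includemissing=True))                        # ['title', 'year', 'tags']
print(values(fields, includemissing=True, onlyrequired=True))   # ['Dune', None]
```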

__data

Type:    dict[str, Union[dict[str, Any], Any]]

__keys

Type:    list[str]

__schema

Type:    dict[str, Union[DataGroup, dict[str, Union[DataField, dict[str, Any]]], DataField, dict[str, Any]]]

__schematized

Type:    bool

forcecast

Type:    bool

ignorecase

Type:    bool

class VField

Bases: DataField

Defines an auto-validated data field for DataSchema classes.

__init__(name, type_, typedesc, constraint, forcecast=None, ignorecase=None, flags=0, required=True)

Instantiates a new VField.

Parameters:
  • name (str)

  • type_ (str)

  • typedesc (str)

  • constraint (Any)

  • forcecast (Optional[bool])

  • ignorecase (Optional[bool])

  • flags (Union[RegexFlag, int])

  • required (bool)
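`flags` is passed to the regular expression engine when `type_` is `str`, which suggests that for string fields the `constraint` is a pattern. A sketch of such a check with the standard `re` module, assuming full-string matching and that `ignorecase=True` adds `re.IGNORECASE` to the given flags; `matches` is hypothetical, and whether the library uses `fullmatch`, `match` or `search` is not stated on this page.

```python
import re
from typing import Optional, Union


def matches(value: str, constraint: str, flags: Union[re.RegexFlag, int] = 0,
            ignorecase: Optional[bool] = None) -> bool:
    """Hypothetical check of a string against a regex constraint."""
    if ignorecase:
        flags |= re.IGNORECASE
    # The whole string must match, not just a prefix
    return re.fullmatch(constraint, value, flags) is not None


print(matches("AB-123", r"[a-z]{2}-\d{3}"))                   # False
print(matches("AB-123", r"[a-z]{2}-\d{3}", ignorecase=True))  # True
print(matches("ab-123", r"[A-Z]{2}-\d{3}", flags=re.I))       # True
```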

_fieldparams = [('name', <class 'str'>, 'The name of the DataField'), ('type_', typing.Any, 'The data type for the DataField'), ('typedesc', <class 'str'>, 'A user-intelligible description of the data type'), ('constraint', typing.Any, 'A DataValidator constraint appropriate for type_'), ('forcecast', typing.Optional[bool], 'Whether to force casting of data during validation', None), ('ignorecase', typing.Optional[bool], 'Whether to ignore case for string-type data validation', None), ('flags', typing.Union[re.RegexFlag, int], 'Flags to pass to the regular expression engine if type_ is `str`.', 0), ('required', <class 'bool'>, 'Whether the field is required', True)]

Type:    list[Union[tuple[str, Any, str], tuple[str, Any, str, Any]]]

constraint

Type:    Any

flags

Type:    Union[RegexFlag, int]

forcecast

Type:    Optional[bool]

ignorecase

Type:    Optional[bool]
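`VField.forcecast` and `VField.ignorecase` default to `None`, while `DataSchema.__init__` takes plain booleans (`forcecast=True`, `ignorecase=True`). A natural reading, assumed here and not confirmed by this page, is that `None` means "inherit the schema-level setting". The sketch shows that resolution rule; `effective` is a hypothetical helper.

```python
from typing import Optional


def effective(field_setting: Optional[bool], schema_setting: bool) -> bool:
    """Field-level True/False wins; None falls back to the schema setting."""
    return schema_setting if field_setting is None else field_setting


# Schema created with the DataSchema defaults: forcecast=True, ignorecase=True
print(effective(None, True))   # True  (inherited from the schema)
print(effective(False, True))  # False (field-level override)
```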

class VFieldList

Bases: VField, DataFieldList

Defines a field containing an arbitrary number of VField data.