Using Schemas in Kosh
This notebook shows how to use schemas in Kosh to validate your metadata.
import kosh
import os
kosh_example_sql_file = "kosh_schemas_example.sql"
# Create and open a new store (erase if exists)
store = kosh.connect(kosh_example_sql_file, delete_all_contents=True)
# create a dataset
dataset = store.create()
Let's create a schema to validate our metadata. A schema object takes two dictionaries as input: one for the required attributes and one for the optional attributes.
For each attribute we need to provide validation functions or valid values:
- If the "validation" is a callable, it will be applied to the value of the attribute and must return True.
- If the "validation" is an instance of 'type', the attribute must be an instance of that type.
- Otherwise the value must match "validation".
It is also possible to have multiple validations for a single attribute: simply define them in the dictionary as a list. If any validation passes, the attribute is considered valid.
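The rules above, including the "any validation in a list passes" behaviour, can be sketched in plain Python. This is only an illustration of the logic as described here, not Kosh's actual implementation: check_value is a hypothetical helper, exact matching is assumed to mean ==, and treating None as "any value is accepted" is an assumption based on how None is used for the "must" attribute in this notebook.

```python
def check_value(value, validation):
    """Return True if `value` satisfies `validation` (hypothetical helper)."""
    # A list means "any one of these validations may pass".
    if isinstance(validation, (list, tuple)):
        return any(check_value(value, v) for v in validation)
    # Assumption: None means "any value is accepted".
    if validation is None:
        return True
    # Types are callable too, so test for a type before testing for a callable.
    if isinstance(validation, type):
        return isinstance(value, validation)
    # A callable must explicitly return True.
    if callable(validation):
        return validation(value) is True
    # Otherwise the value must match (assumed here to mean equality).
    return value == validation


print(check_value(5, int))                    # type check
print(check_value("yes", [1, "yes"]))         # list: second entry matches
print(check_value("b", [1, "yes"]))           # list: no entry matches
print(check_value(4, lambda v: v % 2 == 0))   # callable
```

Note the ordering inside the helper: because every Python type is itself callable, the type test has to come before the callable test, otherwise int would be called as int(value) instead of being used for an isinstance check.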
Let's create a validation schema that requires our datasets to have the attribute "must" with any value, and allows an optional attribute "maybe" that must be one of 1, "yes" or True.
required = {"must": None}
optional = {"maybe": [1, "yes"]}
schema = kosh.KoshSchema(required, optional)
Our current (blank) dataset will not validate. We can check this as follows:
try:
    schema.validate(dataset)
except ValueError as err:
    print("As expected, we failed to validate with error:", err)
As expected, we failed to validate with error: Could not validate 9e0152167054428692bb87486de0f891
1 required attribute errors: {'must': AttributeError('Object 9e0152167054428692bb87486de0f891 does not have must attribute')}
0 optional attributes errors: {}
# Let's add the attribute
dataset.must = "I have must"
# Validation now passes
schema.validate(dataset)
True
Now let's require "must" to be an integer.
required = {"must": int}
optional = {"maybe": [1, "yes"]}
schema = kosh.KoshSchema(required, optional)
# The dataset no longer validates: "must" is currently a string
try:
    schema.validate(dataset)
except ValueError as err:
    print("As expected, it now fails to validate with error:", err)
As expected, it now fails to validate with error: Could not validate 9e0152167054428692bb87486de0f891
1 required attribute errors: {'must': ValueError('value I have must failed validation')}
0 optional attributes errors: {}
dataset
KOSH DATASET
    id: 9e0152167054428692bb87486de0f891
    name: Unnamed Dataset
    creator: cdoutrix
--- Attributes ---
    creator: cdoutrix
    must: I have must
    name: Unnamed Dataset
--- Associated Data (0)---
--- Ensembles (0)---
    []
--- Ensemble Attributes ---
# Let's fix this
dataset.must = 5
# It now validates
schema.validate(dataset)
True
# Note that any extra attribute is ok but will not be checked for validation
dataset.any = "hi"
schema.validate(dataset)
True
# We can now attach the schema to the dataset so that it is enforced from now on
dataset.schema = schema
# Now we cannot set `must` to a bad value
try:
    dataset.must = 7.6
except ValueError as err:
    print("Failed to set attribute as it did not validate (must be int). Error:", err)
Failed to set attribute as it did not validate (must be int). Error: value 7.6 failed validation
# Still at 5
dataset.must
5
Note that when the schema attribute is set, all existing attributes of the dataset are checked against it.
dataset2 = store.create()
dataset2.must = 7.6
try:
    dataset2.schema = schema
except ValueError:
    # Expected: dataset2.must is 7.6, which is not an int
    pass
# Similarly, optional attributes must validate
try:
    dataset.maybe = "b"
except ValueError as err:
    print("Optional attributes must validate as well. Error:", err)
Optional attributes must validate as well. Error: Could not validate value 'b'
dataset.maybe = "yes"
dataset.maybe = 1
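The enforcement behaviour shown above (a bad assignment is rejected, the previous value is kept, and existing attributes are checked when a schema is attached) can be mimicked in plain Python to make the mechanics concrete. The sketch below is an assumption-laden illustration, not how Kosh implements it: CheckedRecord and attach_rules are hypothetical names, and only type rules are supported.

```python
class CheckedRecord:
    """Hypothetical stand-in for a dataset; not part of Kosh."""

    def __init__(self):
        # Bypass our own __setattr__ while setting up internal state.
        object.__setattr__(self, "_rules", {})

    def attach_rules(self, rules):
        # Check every attribute that is already set before accepting the rules.
        for name, expected_type in rules.items():
            if name in self.__dict__ and not isinstance(self.__dict__[name], expected_type):
                raise ValueError(f"value {self.__dict__[name]} failed validation")
        object.__setattr__(self, "_rules", dict(rules))

    def __setattr__(self, name, value):
        expected_type = self._rules.get(name)
        if expected_type is not None and not isinstance(value, expected_type):
            # Raise before storing, so the previous value is left untouched.
            raise ValueError(f"value {value} failed validation")
        object.__setattr__(self, name, value)


record = CheckedRecord()
record.must = 5
record.attach_rules({"must": int})
try:
    record.must = 7.6
except ValueError as err:
    print("rejected:", err)
print(record.must)  # still 5
```

Raising before the value is stored is what keeps the old value intact, which matches the "Still at 5" result seen with the real dataset.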
Sometimes we need more complex validation. Let's create a simple validation function.
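def isYes(value):
    # Strings are accepted when they start with "y" or "Y"
    if isinstance(value, str):
        return value.lower().startswith("y")
    # Integers are accepted only when equal to 1
    elif isinstance(value, int):
        return value == 1
    # Any other type fails validation
    return False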
required = {"must": int}
optional = {"maybe": isYes}
schema = kosh.KoshSchema(required, optional)
dataset.schema = schema
dataset.maybe = "y"
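Because a validation function is plain Python, it can be tried out on its own before being attached to a schema. The sketch below uses a standalone function, is_yes (a hypothetical name that follows the same idea as isYes), written so that it returns an explicit False for an empty string or an unsupported type.

```python
def is_yes(value):
    """Hypothetical standalone validator following the same idea as isYes."""
    if isinstance(value, str):
        # startswith avoids an IndexError on the empty string
        return value.lower().startswith("y")
    if isinstance(value, int):
        # bool is a subclass of int, so True is accepted here as well
        return value == 1
    # Any other type is rejected explicitly
    return False


for candidate in ["y", "Yes", "", "no", 1, 0, True, 2.5]:
    print(repr(candidate), "->", is_yes(candidate))
```

One detail worth noticing is that True passes the integer branch, since bool is a subclass of int in Python and True == 1.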
We can also pass a list of possible validations.
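def isNo(value):
    # Strings are accepted when they start with "n" or "N"
    if isinstance(value, str):
        return value.lower().startswith("n")
    # Integers are accepted only when equal to 0
    elif isinstance(value, int):
        return value == 0
    # Any other type fails validation
    return False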
required = {"must": int}
optional = {"maybe": [isYes, isNo, "oui"]}
schema = kosh.KoshSchema(required, optional)
dataset.schema = schema
dataset.maybe = "N"
dataset.maybe = 'No'
dataset.maybe = 'oui'
dataset.maybe = 'Yes'
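To see why each of these four assignments is accepted, it can help to trace which entry of the list accepts which value. The following plain-Python sketch does that under stated assumptions: is_yes and is_no are simplified, string-only stand-ins for the functions above, accepted_by is a hypothetical helper, and literal entries such as "oui" are assumed to be compared with ==.

```python
def is_yes(value):
    # Simplified stand-in: strings starting with "y" or "Y"
    return isinstance(value, str) and value.lower().startswith("y")


def is_no(value):
    # Simplified stand-in: strings starting with "n" or "N"
    return isinstance(value, str) and value.lower().startswith("n")


validations = [is_yes, is_no, "oui"]


def accepted_by(value):
    """Return the names of the entries that accept `value` (hypothetical helper)."""
    hits = []
    for v in validations:
        if callable(v):
            if v(value):
                hits.append(v.__name__)
        elif value == v:  # assumption: literals are compared with ==
            hits.append(repr(v))
    return hits


for value in ["N", "No", "oui", "Yes", "Oui"]:
    print(value, "->", accepted_by(value))
```

Under these assumptions "Oui" is accepted by no entry, because the functions lower-case their input but the literal "oui" is matched exactly.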