Creating a loader with Kosh¶
In this example we will create a custom loader for some ASCII representation.

The file structure is:

- Headers at the beginning of the file starting with #
- "# varname vs axis" indicates a new variable with name varname
- "var_value axis_value", repeated n times
- "end" marks the end of the current variable

We will assume the user already has functions to read the data in. These can be found in the some_user_io_functions.py file. The function to read data in is called load_variable_from_file, and the one to list the features in the file is called get_variable_names.
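To make the format concrete, the sketch below writes a small made-up file following these rules. The variable names and values here are invented for illustration; they are not the contents of the actual example.ultra shipped with the Kosh examples.

```python
# Write a tiny made-up file in the ASCII format described above.
# All names/values are illustrative, not the real example.ultra content.
sample_content = """# energy vs time
0.6 0.0
0.7 0.5
0.8 1.0
end
# var2 vs time
1.0 0.0
2.0 0.5
end
"""

with open("format_sketch.ultra", "w") as f:
    f.write(sample_content)

print(open("format_sketch.ultra").read().splitlines()[0])  # -> # energy vs time
```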
import os
import kosh
# Name of the local SQL file backing the store
kosh_example_sql_file = "my_store.sql"
# Create db on file
store = kosh.connect(kosh_example_sql_file, delete_all_contents=True)
# Add dataset to the store
sample = store.create(name="example", metadata={'project':"example"})
# Associate a file with the dataset
sample.associate("example.ultra", mime_type="custom")
'3d460527af804fc2bad73bd7d89ca5fc'
Let's create our CustomLoader, inheriting from kosh.KoshLoader. For this we will need to:

- register the types we can read at the class level (not in __init__) and the formats each type can be exported as: types = { "custom" : ["numpy",] }
  - IMPORTANT: the keys in this dictionary are what Kosh uses to tie the loader to a mime_type
- implement the extract function to read data in
  - the desired feature is in self.feature
  - potential keywords are in self._user_passed_parameters[1]
  - the Kosh object describing the source is in self.obj (you can query its attributes if desired)
  - the source uri is at self.obj.uri
  - the function to read a variable from a file is load_variable_from_file
- implement the list_features(self) function, using the get_variable_names helper function
- optionally implement describe_feature(self, feature)
import sys, os
sys.path.append(".")
from some_user_io_functions import load_variable_from_file, get_variable_names
Let's query the function documentation
load_variable_from_file?
Signature: load_variable_from_file(filepath, variable_names)
Docstring:
Load the variable 'variable_name' for a file at filepath
:param filepath: path to the file to read
:type filepath: str
:param variable_names: Name of the variable(s) to read in file
:type variable_names: str or list
:return: A numpy array containing the variable(s) values
:rtype: numpy.ndarray
File: ~/git/kosh/examples/some_user_io_functions.py
Type: function
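For reference, here is a hedged sketch of how these two helpers could be implemented for the format described above. It parses from a string rather than a file path for brevity, so the real some_user_io_functions.py will differ in its I/O handling:

```python
import numpy

def get_variable_names(text):
    # A '# varname vs axis' header declares a new variable; collect varname.
    return [line.split()[1] for line in text.splitlines() if line.startswith("#")]

def load_variable_from_file(text, variable_name):
    # Gather the var_value column between a variable's header and its 'end'.
    values, current = [], None
    for line in text.splitlines():
        if line.startswith("#"):
            current = line.split()[1]
        elif line.strip() == "end":
            current = None
        elif current == variable_name:
            values.append(float(line.split()[0]))
    return numpy.array(values)

sample = "# energy vs time\n0.6 0.0\n0.7 0.5\nend\n"
print(get_variable_names(sample))                 # -> ['energy']
print(load_variable_from_file(sample, "energy"))  # -> [0.6 0.7]
```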
from kosh import KoshLoader
import numpy
class CustomLoader(KoshLoader):
    types = {"custom": ["numpy", ]}  # keys tie the loader back to the mime_type used in associate

    def extract(self, *args, **kwargs):
        return load_variable_from_file(self.obj.uri, self.feature)

    def list_features(self):
        return get_variable_names(self.obj.uri)

    def describe_feature(self, feature):
        var = load_variable_from_file(self.obj.uri, feature)
        info = {"name": feature, "size": var.shape}
        return info
At this point we need to register our loader with the store (and let's save it in the store as well).
store.add_loader(CustomLoader)
print(sample.list_features())
['time', 'energy', 'var2']
sample.describe_feature("energy")
{'name': 'energy', 'size': (8,)}
Or extract its features
print(sample.get("energy"))
[0.6 0.7 0.8 0.6 0.5 0.2 0.1 0.6]
Some advanced tips¶
Slicing¶
Please see the advanced slicing notebook for tips on how to make your loader more efficient with regard to slicing.
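The core idea, sketched here independently of the Kosh hooks (see that notebook for the actual API): read only the rows the caller asked for, rather than loading the whole file and slicing afterwards. The helper below is a generic illustration, not part of Kosh:

```python
import itertools
import os
import tempfile

def load_rows(path, row_slice):
    # Stream the file and keep only the requested rows; itertools.islice
    # avoids materializing lines outside the slice.
    with open(path) as f:
        rows = itertools.islice(f, row_slice.start, row_slice.stop, row_slice.step)
        return [float(line.split()[0]) for line in rows]

# Tiny demo on a throwaway file with rows "0.0 0", "1.0 1", ...
path = tempfile.mktemp(suffix=".txt")
with open(path, "w") as f:
    for i in range(10):
        f.write(f"{i}.0 {i}\n")
print(load_rows(path, slice(2, 8, 2)))  # -> [2.0, 4.0, 6.0]
os.remove(path)
```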
Who asked for this data?¶
In some cases the loader might want to know who requested the data (for example, to access some attributes on that dataset). You can retrieve the requestor via the get_requestor()
function on any loader.
In the example below, we show an enhanced HDF5 loader that returns a slice of the data if certain attributes are present on the requesting dataset.
class SlicerLoader(kosh.loaders.HDF5Loader):
    def extract(self):
        # let's get the requestor
        req = self.get_requestor()
        # let's see if we can get the slicing info from it
        rng_min = getattr(req, "range_min", 0)
        rng_max = getattr(req, "range_max", None)
        rng_step = getattr(req, "range_step", 1)
        h5 = super(SlicerLoader, self).extract()
        # let's slice the second dimension
        return h5[:, rng_min:rng_max:rng_step]
ds1 = store.create(metadata={"range_min":4, "range_max":8, "range_step":2})
ds1.associate("sample_files/run_000.hdf5", "hdf5")
ds2 = store.create()
ds2.associate("sample_files/run_000.hdf5", "hdf5")
# remove the hdf5 loader to make sure it picks up ours
del store.loaders["hdf5"]
store.add_loader(SlicerLoader)
# The full dataset (since the attributes are not present)
print(ds2["node/metrics_9"][:].shape)
# Now if we access the dataset with the attributes
print(ds1["node/metrics_9"][:].shape)
# Changing the attributes will change the output
ds1.range_max += 4
ds1.range_step = 1
print(ds1["node/metrics_9"][:].shape)
(2, 18)
(2, 2)
(2, 8)