Creating a loader with Kosh¶
In this example we will create a custom loader for some ASCII representation.

The file structure is:

- Headers at the beginning of the file starting with #
- "# varname vs axis" indicates a new variable with name varname
- "var_value axis_value", repeated n times
- "end" marks the end of the current variable

We will assume the user already has functions to read the data in. These can be found in the some_user_io_functions.py file. The function to read data in is called load_variable_from_file, and the one to list the features in the file is called get_variable_names.
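To make the format concrete, the sketch below writes a small made-up file following these rules. The variable names and values here are invented for illustration; they are not the contents of the actual example.ultra shipped with the Kosh examples.

```python
# Write a tiny made-up file in the ASCII format described above.
# All names/values are illustrative, not the real example.ultra content.
sample_content = """# energy vs time
0.6 0.0
0.7 0.5
0.8 1.0
end
# var2 vs time
1.0 0.0
2.0 0.5
end
"""

with open("format_sketch.ultra", "w") as f:
    f.write(sample_content)

print(open("format_sketch.ultra").read().splitlines()[0])  # -> # energy vs time
```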
import os
import kosh
# Name of the local SQL file backing the store
kosh_example_sql_file = "my_store.sql"
# Create db on file
store = kosh.connect(kosh_example_sql_file, delete_all_contents=True)
# Add dataset to the store
sample = store.create(name="example", metadata={'project':"example"})
# Associate a file with the dataset
sample.associate("example.ultra", mime_type="custom")
'3d460527af804fc2bad73bd7d89ca5fc'
Let's create our CustomLoader, inheriting from kosh.KoshLoader. For this we will need to:

- register the types we can read at the class level (not in __init__) and the formats each type can be exported as: types = { "custom" : ["numpy",] }
  - IMPORTANT: the keys in this dictionary are what Kosh uses to tie the loader to a mime_type
- implement the extract function to read data in
  - the desired feature is in self.feature
  - potential keywords are in self._user_passed_parameters[1]
  - the Kosh object describing the source is in self.obj (you can query its attributes if desired)
  - the source uri is at self.obj.uri
  - the function to read a variable from a file is load_variable_from_file
- implement the list_features(self) function, using the get_variable_names helper function
- optionally implement describe_feature(self, feature)
import sys, os
sys.path.append(".")
from some_user_io_functions import load_variable_from_file, get_variable_names
Let's query the function documentation
load_variable_from_file?
Signature: load_variable_from_file(filepath, variable_names)
Docstring:
Load the variable 'variable_name' for a file at filepath
:param filepath: path to the file to read
:type filepath: str
:param variable_names: Name of the variable(s) to read in file
:type variable_names: str or list
:return: A numpy array containing the variable(s) values
:rtype: numpy.ndarray
File: ~/git/kosh/examples/some_user_io_functions.py
Type: function
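For reference, here is a hedged sketch of how these two helpers could be implemented for the format described above. It parses from a string rather than a file path for brevity, so the real some_user_io_functions.py will differ in its I/O handling:

```python
import numpy

def get_variable_names(text):
    # A '# varname vs axis' header declares a new variable; collect varname.
    return [line.split()[1] for line in text.splitlines() if line.startswith("#")]

def load_variable_from_file(text, variable_name):
    # Gather the var_value column between a variable's header and its 'end'.
    values, current = [], None
    for line in text.splitlines():
        if line.startswith("#"):
            current = line.split()[1]
        elif line.strip() == "end":
            current = None
        elif current == variable_name:
            values.append(float(line.split()[0]))
    return numpy.array(values)

sample = "# energy vs time\n0.6 0.0\n0.7 0.5\nend\n"
print(get_variable_names(sample))                 # -> ['energy']
print(load_variable_from_file(sample, "energy"))  # -> [0.6 0.7]
```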
from kosh import KoshLoader
import numpy
class CustomLoader(KoshLoader):
    types = {"custom": ["numpy", ]}  # keys tie the loader back to the mime_type used in associate

    def extract(self, *args, **kwargs):
        return load_variable_from_file(self.obj.uri, self.feature)

    def list_features(self):
        return get_variable_names(self.obj.uri)

    def describe_feature(self, feature):
        var = load_variable_from_file(self.obj.uri, feature)
        info = {"name": feature, "size": var.shape}
        return info
At this point we need to register our loader with the store (and let's save it in the store as well).
store.add_loader(CustomLoader)
print(sample.list_features())
['time', 'energy', 'var2']
sample.describe_feature("energy")
{'name': 'energy', 'size': (8,)}
Or extract its features
print(sample.get("energy"))
[0.6 0.7 0.8 0.6 0.5 0.2 0.1 0.6]
Some advanced tips¶
Slicing¶
Please see the advanced slicing notebook for tips on how to make your loader more efficient with regard to slicing.
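The core idea, sketched here independently of the Kosh hooks (see that notebook for the actual API): read only the rows the caller asked for, rather than loading the whole file and slicing afterwards. The helper below is a generic illustration, not part of Kosh:

```python
import itertools
import os
import tempfile

def load_rows(path, row_slice):
    # Stream the file and keep only the requested rows; itertools.islice
    # avoids materializing lines outside the slice.
    with open(path) as f:
        rows = itertools.islice(f, row_slice.start, row_slice.stop, row_slice.step)
        return [float(line.split()[0]) for line in rows]

# Tiny demo on a throwaway file with rows "0.0 0", "1.0 1", ...
path = tempfile.mktemp(suffix=".txt")
with open(path, "w") as f:
    for i in range(10):
        f.write(f"{i}.0 {i}\n")
print(load_rows(path, slice(2, 8, 2)))  # -> [2.0, 4.0, 6.0]
os.remove(path)
```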
Who asked for this data?¶
In some cases the loader might want to know who requested the data (for example, to access some attributes on that dataset). You can retrieve the requestor via the get_requestor()
function on any loader.
In the example below, we show an enhanced HDF5 loader that returns a slice of the data if certain attributes are present on the requesting dataset.
class SlicerLoader(kosh.loaders.HDF5Loader):
    def extract(self):
        # let's get the requestor
        req = self.get_requestor()
        # let's see if we can get the slicing info from it
        rng_min = getattr(req, "range_min", 0)
        rng_max = getattr(req, "range_max", None)
        rng_step = getattr(req, "range_step", 1)
        h5 = super(SlicerLoader, self).extract()
        # let's slice the second dimension
        return h5[:, rng_min:rng_max:rng_step]
ds1 = store.create(metadata={"range_min":4, "range_max":8, "range_step":2})
ds1.associate("sample_files/run_000.hdf5", "hdf5")
ds2 = store.create()
ds2.associate("sample_files/run_000.hdf5", "hdf5")
# remove the hdf5 loader to make sure it picks up ours
del store.loaders["hdf5"]
store.add_loader(SlicerLoader)
# The full dataset (since the attributes are not present)
print(ds2["node/metrics_9"][:].shape)
# Now if we access the dataset with the attributes
print(ds1["node/metrics_9"][:].shape)
# Changing the attributes will change the output
ds1.range_max += 4
ds1.range_step = 1
print(ds1["node/metrics_9"][:].shape)
(2, 18)
(2, 2)
(2, 8)