Kosh and Sina Interoperability¶
Table of Contents
- Introduction
- Opening/Creating a New Store
- Adding Entries
- Accessing a Record/Dataset with known id
- Getting Everything In the Store
- Deleting Entries
- Updating Entries
- Searching the Store
- Data
Introduction¶
In this notebook we show how Kosh and Sina are related, how to do the things both tools can do, and which tasks are better suited to each.
We also show how to make them work together.
Kosh uses Sina under the hood. For the purposes of this notebook, Sina and Kosh both work off the same store.
Opening/Creating a New Store¶
SQLite¶
Both Sina and Kosh will create a store for you if it does not exist.
# Clean up first in case we ran this before
import os
for path in ("my_sina_store.sql", "my_kosh_store.sql"):
    if os.path.exists(path):
        os.remove(path)
# Sina
import sina
# New or existing store
store_sina = sina.connect("my_sina_store.sql")
# If you want to clear the data in the store
store_sina.delete_all_contents(force="SKIP PROMPT")
# Kosh
import kosh
# New or existing store
store_kosh = kosh.connect("my_kosh_store.sql")
# You can also delete its content
store_kosh.delete_all_contents(force="SKIP PROMPT")
# Kosh also lets you wipe the data on loading
store_kosh = kosh.connect("my_kosh_store.sql", delete_all_contents=True)
# Kosh can open a Sina store, we will use it for the rest of this notebook
# so that both Sina and Kosh operate on the same store
# You will get a warning because this store does not have some of Kosh's reserved features
store_kosh = kosh.connect("my_sina_store.sql")
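Both handles now point at the same SQLite file, so you can also inspect that file with Python's built-in sqlite3 module. The list_tables helper below is our own and is not part of Sina or Kosh. It makes no assumption about the names Sina gives its tables and simply lists whatever tables the file contains.

```python
import sqlite3


def list_tables(db_path):
    """Return the sorted names of the tables in an SQLite database file."""
    # mode=ro opens the file read-only, so inspecting it cannot modify the store
    conn = sqlite3.connect("file:{}?mode=ro".format(db_path), uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, list_tables("my_sina_store.sql") after running the cell above.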
MySQL¶
# Sina
# mysql_store_sina = sina.connect("mysql://<your_username>@:/>read_default_file=<path_to_cnf>")
# Kosh
# mysql_store_kosh = kosh.connect("mysql://<your_username>@:/>read_default_file=<path_to_cnf>")
Cassandra¶
???
NOTE
Kosh and Sina stores are mostly interchangeable: you can access the Sina store and its records directly from a Kosh store.
# You can access the Sina store from a Kosh store
the_sina_store = store_kosh.get_sina_store()
# Or the records
records = store_kosh.get_sina_records()
# Or directly from the Sina store
records = the_sina_store.records
Adding Entries¶
# Sina
from sina.model import Record
sina_record = Record(id="my_id", type="my_chosen_type")
store_sina.records.insert(sina_record)
# Kosh
# The type will be 'dataset' and a random unique id will be generated
kosh_dataset_record = store_kosh.create()
# Picking id and type
kosh_dataset_record_2 = store_kosh.create(id="some_id", sina_type="some_type")
import sina
sina_records = sina.utils.convert_json_to_records_and_relationships("sina_curve_rec.json")
for sina_record in sina_records:
    store_sina.records.insert(sina_record)
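The notebook does not show sina_curve_rec.json. The outputs later in this notebook show record obj1 of type some_type, with param1 to param3, a foo.png file, and the timeplot_1 curve set. A file with roughly the content below would produce them. This is a hypothetical reconstruction, not the original file. The top-level "records"/"relationships" layout and the "files" section written as a dict keyed by URI are our assumptions about Sina's JSON schema, so check the Sina documentation for the version you use.

```python
import json

# Hypothetical reconstruction of sina_curve_rec.json, inferred from outputs
# shown later in this notebook; the "files" layout is an assumption.
record = {
    "id": "obj1",
    "type": "some_type",
    "data": {
        "param1": {"value": 1},
        "param2": {"value": 2},
        "param3": {"value": 3.3},
    },
    "files": {"foo.png": {"mimetype": "image/png"}},
    "curve_sets": {
        "timeplot_1": {
            "independent": {"time": {"value": [0, 1, 2]}},
            "dependent": {
                "feature_a": {"value": [15, 25, 35], "tags": ["tag1"]},
                "feature_b": {"value": [10.1, 25.2, 40.3], "units": "m"},
            },
        }
    },
}
document = {"records": [record], "relationships": []}

# Written under a different name so the real input file is not overwritten
with open("sina_curve_rec_sketch.json", "w") as handle:
    json.dump(document, handle, indent=2)
```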
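You can also ingest data outside of Python with Sina's command-line tool. The sina command must be on your PATH. In the run captured here it was not, which caused this error:
!sina ingest --database my_sina_store.sql sina_curve_rec_2.json
/usr/bin/sh: sina: command not found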
rec = sina_records[0][0]
Kosh¶
Similarly, Kosh has its own export/import functions, which use Sina's JSON format under the hood.
Kosh can import Sina JSON files directly as well.
The match_attributes argument helps resolve conflicts with datasets already in the store.
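The idea behind match_attributes can be sketched in plain Python: an imported dataset counts as "the same" as a stored one when every listed attribute agrees. The find_matches helper and the dict-based datasets are hypothetical stand-ins, not Kosh's code. Kosh's real behaviour is more involved; per its docstring, for example, the import aborts when too many datasets match.

```python
def find_matches(stored, imported, match_attributes):
    """Toy stand-in: stored datasets that agree with `imported` on every
    attribute listed in `match_attributes`."""
    matches = []
    for candidate in stored:
        if all(
            attr in candidate and attr in imported and candidate[attr] == imported[attr]
            for attr in match_attributes
        ):
            matches.append(candidate)
    return matches


stored = [
    {"id": "a1", "name": "run_1", "code": "hydro"},
    {"id": "b2", "name": "run_2", "code": "hydro"},
]
# The same run created in another store: same name, different random id
imported = {"id": "zz", "name": "run_1", "code": "hydro"}

print(find_matches(stored, imported, ["name"]))        # one match (a1)
print(find_matches(stored, imported, ["name", "id"]))  # no match: ids differ
print(find_matches(stored, imported, ["code"]))        # too loose: two matches
```

Matching on "id" only works when re-importing datasets that came from the same store, which is why the cells below add it to match_attributes.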
store_kosh.import_dataset?
datasets = store_kosh.import_dataset("sina_curve_rec.json", match_attributes=["name", "id"])
datasets = store_kosh.import_dataset("kosh_dataset.json", match_attributes=["name", "id"])
datasets = store_kosh.import_dataset(kosh_dataset_record.export(), match_attributes=["name", "id"])
list(datasets)
[<kosh.core_sina.KoshSinaObject at 0x2aabed376210>]
Signature: store_kosh.import_dataset(datasets, match_attributes=['name'], merge_handler=None, merge_handler_kargs={})
Docstring:
import datasets and ensembles that were exported from another store, or load them from a json file
:param datasets: Dataset/Ensemble object exported by another store, a dataset/ensemble or a json file containing these.
:type datasets: json file, json loaded object, KoshDataset or KoshEnsemble
:param match_attributes: parameters on a dataset to use if this it is already in the store in general we can't use 'id' since it is randomly generated at creation If the "same" dataset was created in two different stores (e.g running the same code twice but with different Kosh store) the dataset would be identical in both store but with different ids. This helps you make sure you do not end up with duplicate entries. Warning, if this parameter is too lose too many datasets will match and the import will abort, if it's too tight duplicates will not be identified.
:type match_attributes: list of str
:param merge_handler: If found dataset has attributes with different values from imported dataset how do we handle this? Accept values are: None, "conservative", "overwrite", "preserve", or a function.
The function decalartion should be:
foo(store_dataset, imported_dataset_attributes_dict, section, **merge_handler_kargs)
Where `store_dataset` is the destination kosh dataset or its non-data dictionary section
`imported_dataset_attributes_dict` is a dictionary of attributes/values of the dataset being imported
`section` is the section of the record being updated
`merge_handler_kargs` is a dict of passed for this function
And return a dictionary of attributes/values the target_dataset should have.
:type merge_handler: None, str, func
:param merge_handler_kargs: If a function is passed to merge_handler these keywords arguments will be passed in addition to this store dataset and the imported dataset.
:type merge_handler_kargs: dict
:return: list of datasets
:rtype: list of KoshSinaDataset
File: ~/miniconda3/envs/kosh/lib/python3.7/site-packages/kosh/store.py
Type: method
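The docstring above gives the signature that a custom merge_handler must follow. The handler below is a sketch. keep_selected is a hypothetical name, and we assume attributes arrive as plain name/value pairs. It takes the imported values, except for the attributes named in a keep keyword argument, where the value already in the store wins. Following the docstring, store_dataset may be either a Kosh dataset or its non-data dictionary section, so both cases are handled.

```python
def keep_selected(store_dataset, imported_dataset_attributes_dict, section, keep=()):
    """Hypothetical merge handler: prefer imported values, except for the
    attribute names in `keep`, where the value already in the store wins."""
    merged = dict(imported_dataset_attributes_dict)
    for name in keep:
        if isinstance(store_dataset, dict):
            # Non-data dictionary section of the destination record
            if name in store_dataset:
                merged[name] = store_dataset[name]
        elif hasattr(store_dataset, name):
            # Destination Kosh dataset object
            merged[name] = getattr(store_dataset, name)
    return merged


# Possible usage (not run here):
# store_kosh.import_dataset("kosh_dataset.json", match_attributes=["name", "id"],
#                           merge_handler=keep_selected,
#                           merge_handler_kargs={"keep": ["creator"]})
```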
Accessing a Record/Dataset with known id¶
# Sina
my_rec = store_sina.records.get("obj1")
print(my_rec)
# Kosh
dataset = store_kosh.open("an_id")
print(dataset)
Model Record <id=obj1, type=some_type>
KOSH DATASET
    id: an_id
    name: Unnamed Dataset
    creator: anonymous
--- Attributes ---
    creator: anonymous
    name: Unnamed Dataset
--- Associated Data (0)---
--- Ensembles (0)---
    []
--- Ensemble Attributes ---
Getting Everything In the Store¶
# Sina
sina_all = store_sina.records.get_all()
# sina_all = store_sina.records.find()
# Kosh
# Only returns datasets (not associated sources, see below)
kosh_all = store_kosh.find()
Deleting Entries¶
# Sina
store_sina.records.delete(sina_record)
# or id
store_sina.records.delete("obj2")
# Kosh
# Using the dataset itself
store_kosh.delete(kosh_dataset_record)
# Or the id
store_kosh.delete("an_id")
Updating Entries¶
# Sina
rec = store_sina.records.get("my_id")
rec.add_data("pi", 3.14159)
# or
rec["data"]["pi_over_2"] = {"value": 1.57}
print(rec["data"])
# Note that the record is NOT updated in the database yet
print(store_sina.records.get("my_id")["data"])
kosh_rec = store_kosh.open("my_id")
print(kosh_rec)  # Not updated yet
# Let's update
store_sina.records.delete("my_id")
store_sina.records.insert(rec)
print(kosh_rec)  # Updated live: no need to fetch again
{'pi': {'value': 3.14159}, 'pi_over_2': {'value': 1.57}}
{}
KOSH DATASET
    id: my_id
    name: ???
    creator: ???
--- Associated Data (0)---
--- Ensembles (0)---
    []
--- Ensemble Attributes ---
KOSH DATASET
    id: my_id
    name: ???
    creator: ???
--- Attributes ---
    pi: 3.14159
    pi_over_2: 1.57
--- Associated Data (0)---
--- Ensembles (0)---
    []
--- Ensemble Attributes ---
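Why is the change invisible until the record is written back? A record returned by store_sina.records.get() lives in memory, so editing it does not touch the database. The cell above writes it back by deleting and re-inserting it. The toy store below reproduces that behaviour with Python's sqlite3 and json modules. ToyStore is hypothetical and does not reflect how Sina lays out its tables.

```python
import json
import sqlite3


class ToyStore:
    """Toy record store (NOT Sina's schema): one JSON blob per record id."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE records (id TEXT PRIMARY KEY, raw TEXT)")

    def insert(self, record):
        self.conn.execute("INSERT INTO records VALUES (?, ?)",
                          (record["id"], json.dumps(record)))

    def get(self, record_id):
        row = self.conn.execute("SELECT raw FROM records WHERE id = ?",
                                (record_id,)).fetchone()
        # Deserializing builds a brand-new object, detached from the database
        return json.loads(row[0])

    def delete(self, record_id):
        self.conn.execute("DELETE FROM records WHERE id = ?", (record_id,))


toy_store = ToyStore()
toy_store.insert({"id": "my_id", "type": "my_chosen_type", "data": {}})

toy_rec = toy_store.get("my_id")
toy_rec["data"]["pi"] = {"value": 3.14159}
print(toy_store.get("my_id")["data"])  # {} : only the in-memory copy changed

# Write back the same way the cell above does: delete, then insert
toy_store.delete("my_id")
toy_store.insert(toy_rec)
print(toy_store.get("my_id")["data"])  # {'pi': {'value': 3.14159}}
```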
# Kosh
ds = store_kosh.open("some_id")
ds.pi = 3.14159
ds.pi_over_2 = 1.57
# The store is updated immediately
# Kosh way
ds2 = store_kosh.open("some_id")
print(ds2)
# Sina way
print(store_sina.records.get("some_id")["data"])
KOSH DATASET
    id: some_id
    name: Unnamed Dataset
    creator: cdoutrix
--- Attributes ---
    creator: cdoutrix
    name: Unnamed Dataset
    pi: 3.14159
    pi_over_2: 1.57
--- Associated Data (0)---
--- Ensembles (0)---
    []
--- Ensemble Attributes ---
{'creator': {'value': '29227d615664b750489776379f5cd287'}, 'name': {'value': 'Unnamed Dataset'}, '_associated_data_': {'value': None}, 'pi': {'value': 3.14159}, 'pi_over_2': {'value': 1.57}}
Searching the Store¶
Sina is designed to help you query your store in many different ways. Kosh is designed to help you get to your external data quickly and easily.
You can use Sina's query capabilities to pinpoint your Kosh datasets.
Reminder: you can access the Sina store and Sina records directly from an opened Kosh store.
At its most basic, think of Kosh's find function as an analog of Sina's find function.
Sina lets you query the store in many ways and offers much more advanced and efficient queries than Kosh.
Kosh can do similar things, usually less efficiently, but within a single function call.
Search records by type¶
# Sina
list(store_sina.records.find(types=["some_type",]))
list(store_sina.records.find_with_type("some_type"))
[Model Record <id=1b1c6f1b37044542b8b57c69df8b5a87, type=some_type>, Model Record <id=obj1, type=some_type>, Model Record <id=some_id, type=some_type>]
# Kosh
list(store_kosh.find(types=["some_type",]))
[KOSH DATASET
    id: 1b1c6f1b37044542b8b57c69df8b5a87
    name: ???
    creator: ???
--- Attributes ---
    param1: 1
    param2: 2
    param3: 3.3
--- Associated Data (2)---
    Mime_type: image/png
        foo.png ( 1b1c6f1b37044542b8b57c69df8b5a87 )
    Mime_type: sina/curve
        internal ( timeplot_1 )
--- Ensembles (0)---
    []
--- Ensemble Attributes ---
, KOSH DATASET
    id: some_id
    name: Unnamed Dataset
    creator: cdoutrix
--- Attributes ---
    creator: cdoutrix
    name: Unnamed Dataset
    pi: 3.14159
    pi_over_2: 1.57
--- Associated Data (0)---
--- Ensembles (0)---
    []
--- Ensemble Attributes ---
, KOSH DATASET
    id: obj1
    name: ???
    creator: ???
--- Attributes ---
    param1: 1
    param2: 2
    param3: 3.3
--- Associated Data (2)---
    Mime_type: image/png
        foo.png ( obj1 )
    Mime_type: sina/curve
        internal ( timeplot_1 )
--- Ensembles (0)---
    []
--- Ensemble Attributes ---
]
list(store_sina.records.find(data= {"pi_over_2":sina.utils.DataRange(1.3, 1.6), "pi":3.14159, "creator":sina.utils.exists()}))
# or via the data dedicated function:
list(store_sina.records.find_with_data(pi_over_2=sina.utils.DataRange(1.3, 1.6), pi=3.14159, creator=sina.utils.exists()))
['some_id']
list(store_kosh.find('creator', pi_over_2=sina.utils.DataRange(1.3, 1.6), pi=3.14159))
[KOSH DATASET
    id: some_id
    name: Unnamed Dataset
    creator: cdoutrix
--- Attributes ---
    creator: cdoutrix
    name: Unnamed Dataset
    pi: 3.14159
    pi_over_2: 1.57
--- Associated Data (0)---
--- Ensembles (0)---
    []
--- Ensemble Attributes ---
]
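Both queries above treat each keyword as one criterion and return only the entries that satisfy all of them. The plain-Python filter below mimics that behaviour on records stored in Sina's {"name": {"value": ...}} data layout. Range, EXISTS and toy_find are hypothetical stand-ins for sina.utils.DataRange and sina.utils.exists(), not their implementations. Like DataRange's default, as we understand it, Range includes its lower bound and excludes its upper bound.

```python
class Range:
    """Toy half-open interval [low, high); either bound may be None."""

    def __init__(self, low=None, high=None):
        self.low, self.high = low, high

    def __contains__(self, value):
        return ((self.low is None or value >= self.low)
                and (self.high is None or value < self.high))


EXISTS = object()  # sentinel meaning "key must be present, any value"


def _matches(data, key, wanted):
    """Check one criterion against a record's data section."""
    if key not in data:
        return False
    if wanted is EXISTS:
        return True
    value = data[key]["value"]
    if isinstance(wanted, Range):
        return value in wanted
    return value == wanted


def toy_find(records, **criteria):
    """Yield ids of records whose data satisfy every criterion (logical AND)."""
    for rec_id, data in records.items():
        if all(_matches(data, key, wanted) for key, wanted in criteria.items()):
            yield rec_id


toy_records = {
    "some_id": {"creator": {"value": "x"}, "pi": {"value": 3.14159}, "pi_over_2": {"value": 1.57}},
    "other": {"pi": {"value": 3.14159}, "pi_over_2": {"value": 1.57}},  # no creator
    "edge": {"creator": {"value": "y"}, "pi": {"value": 3.14159}, "pi_over_2": {"value": 1.6}},  # 1.6 excluded
}
print(list(toy_find(toy_records, pi_over_2=Range(1.3, 1.6), pi=3.14159, creator=EXISTS)))
# ['some_id']
```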
Search records with file uri¶
Sina records can contain a special field to store files related to the record. You can search Sina for all records linked to a specific file.
list(store_sina.records.find(file_uri="foo.png"))
# or via its dedicated function
list(store_sina.records.find_with_file_uri("foo.png"))
[Model Record <id=1b1c6f1b37044542b8b57c69df8b5a87, type=some_type>, Model Record <id=obj1, type=some_type>]
Kosh can accomplish the same search via its dedicated file_uri key when searching:
list(store_kosh.find(file_uri='foo.png'))
type(store_sina.records)
sina.datastore.DataStore.RecordOperations
At this point it is worth noting that, in Kosh, it is recommended to associate files with a dataset rather than using the file section.
Associating a file (source) with a Kosh dataset creates a new record in the database with a Kosh-reserved record type. There are several reasons why Kosh does this:
- If a file is associated with many Kosh datasets, this saves on the number of entries in the database.
- Since files are represented by their own records, we can attach many queryable metadata to them.
- As your problem's complexity grows, many files/sources can be associated with a dataset. Having these files represented as records in Sina allows Kosh to use Sina's query capabilities to quickly pinpoint the desired file(s)/source(s).
Let's demonstrate this:
my_kosh_dataset = store_kosh.open("my_id")
for i in range(100):
    my_kosh_dataset.associate("some_file_{:04d}.png".format(i), mime_type="png", metadata={"some_param": i})
# Now search all sources of this dataset with some_param between 73 (inclusive) and 90 (exclusive)
list(my_kosh_dataset.search(some_param=sina.utils.DataRange(73, 90)))
[<kosh.core_sina.KoshSinaFile at 0x2aabed3882d0>, <kosh.core_sina.KoshSinaFile at 0x2aabed37eb90>, <kosh.core_sina.KoshSinaFile at 0x2aabed37e390>, <kosh.core_sina.KoshSinaFile at 0x2aabed306c90>, <kosh.core_sina.KoshSinaFile at 0x2aabed3067d0>, <kosh.core_sina.KoshSinaFile at 0x2aabed319690>, <kosh.core_sina.KoshSinaFile at 0x2aabed388e50>, <kosh.core_sina.KoshSinaFile at 0x2aabed388510>, <kosh.core_sina.KoshSinaFile at 0x2aabed310dd0>, <kosh.core_sina.KoshSinaFile at 0x2aabed3764d0>, <kosh.core_sina.KoshSinaFile at 0x2aabed376a50>, <kosh.core_sina.KoshSinaFile at 0x2aabed349d50>, <kosh.core_sina.KoshSinaFile at 0x2aabed349dd0>, <kosh.core_sina.KoshSinaFile at 0x2aabed2ea190>, <kosh.core_sina.KoshSinaFile at 0x2aabed2ea6d0>, <kosh.core_sina.KoshSinaFile at 0x2aabed2f3fd0>, <kosh.core_sina.KoshSinaFile at 0x2aabed2f3750>]
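The count is easy to verify: the 100 files carry some_param values 0 to 99, and a range that includes 73 but excludes 90 keeps 73 through 89. That gives 17 values, matching the 17 KoshSinaFile objects listed above. The snippet below only repeats that arithmetic in plain Python and does not call Kosh.

```python
# some_param values given to the 100 associated files above
params = range(100)

# Half-open interval [73, 90): lower bound included, upper bound excluded
selected = [p for p in params if 73 <= p < 90]
print(len(selected), selected[0], selected[-1])  # 17 73 89
```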
Data¶
Curves¶
Sina¶
Sina allows you to query the "data" section of its records, but you can also access and search curve sets, which are essentially time series associated with a record.
A curve set consists of an independent variable and one or more dependent variables.
You can ask Sina for all records with a volume curve that has at least one value greater than or equal to 15:
list(store_sina.records.find(data={"volume":sina.utils.any_in(sina.utils.DataRange(min=15.))}))
[]
You can then get the curves from the record.
rec = store_sina.records.get("obj1")
rec["curve_sets"]
{'timeplot_1': {'independent': {'time': {'value': [0, 1, 2]}}, 'dependent': {'feature_a': {'value': [15, 25, 35], 'tags': ['tag1']}, 'feature_b': {'value': [10.1, 25.2, 40.3], 'units': 'm'}}}}
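The any_in query above can be read as "keep a record if at least one value of the named curve lies in the range." The plain-Python helpers below apply that test to the timeplot_1 curve set printed above. curve_values and any_value_in_range are our own hypothetical helpers, not Sina's any_in, and the lower bound is inclusive as we understand DataRange's default.

```python
curve_sets = {
    "timeplot_1": {
        "independent": {"time": {"value": [0, 1, 2]}},
        "dependent": {
            "feature_a": {"value": [15, 25, 35], "tags": ["tag1"]},
            "feature_b": {"value": [10.1, 25.2, 40.3], "units": "m"},
        },
    }
}


def curve_values(curve_sets, curve_name):
    """Collect the values of every curve called `curve_name`, in any curve set."""
    values = []
    for curve_set in curve_sets.values():
        for role in ("independent", "dependent"):
            curve = curve_set.get(role, {}).get(curve_name)
            if curve is not None:
                values.extend(curve["value"])
    return values


def any_value_in_range(values, low=None, high=None):
    """True if at least one value lies in [low, high) (None means unbounded)."""
    return any((low is None or v >= low) and (high is None or v < high) for v in values)


print(any_value_in_range(curve_values(curve_sets, "feature_a"), low=15.0))  # True
print(any_value_in_range(curve_values(curve_sets, "time"), low=15.0))       # False
print(curve_values(curve_sets, "volume"))  # []
```

Because this record has no volume curve at all, the empty result of the Sina query above is what you would expect.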
Kosh¶
Kosh uses Sina's search capabilities under the hood, so you would similarly do:
vol_ids = list(store_kosh.find(volume=sina.utils.any_in(sina.utils.DataRange(min=15.))))
# And to get the curves list:
dataset = store_kosh.open("obj1")
print(dataset.list_features())
['timeplot_1', 'timeplot_1/feature_a', 'timeplot_1/feature_b', 'timeplot_1/time']
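The feature names follow a curve_set/curve pattern, with the curve set itself listed first. The feature_names helper below is hypothetical. It only imitates the naming seen in the output above, not Kosh's implementation, and rebuilds that list from the curve set dictionary shown earlier.

```python
def feature_names(curve_sets):
    """Imitate the list_features() naming: each curve set, then its curves sorted."""
    names = []
    for set_name in sorted(curve_sets):
        curve_set = curve_sets[set_name]
        curves = list(curve_set.get("independent", {})) + list(curve_set.get("dependent", {}))
        names.append(set_name)
        names.extend(sorted("{}/{}".format(set_name, curve) for curve in curves))
    return names


timeplot = {
    "timeplot_1": {
        "independent": {"time": {"value": [0, 1, 2]}},
        "dependent": {
            "feature_a": {"value": [15, 25, 35]},
            "feature_b": {"value": [10.1, 25.2, 40.3]},
        },
    }
}
print(feature_names(timeplot))
# ['timeplot_1', 'timeplot_1/feature_a', 'timeplot_1/feature_b', 'timeplot_1/time']
```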
Let's access the time values:
print(dataset.get("timeplot_1/time"))
[0 1 2]
External Data (large files)¶
Sina provides a mechanism to link files to records via the add_file function.
If you also give the added file a mimetype, Kosh will treat it as an associated file and will be able to extract its data via its loaders (although it will not be able to find it via an attribute search).
rec.add_file("sample_files/run_000.hdf5", mimetype="hdf5")
store_sina.records.delete(rec.id)
store_sina.records.insert(rec)
dataset.list_features(use_cache=False)  # The feature list was cached, and Kosh cannot know the record changed on the Sina side
['timeplot_1', 'timeplot_1/feature_a', 'timeplot_1/feature_b', 'timeplot_1/time', 'cycles', 'direction', 'elements', 'node', 'node/metrics_0', 'node/metrics_1', 'node/metrics_10', 'node/metrics_11', 'node/metrics_12', 'node/metrics_2', 'node/metrics_3', 'node/metrics_4', 'node/metrics_5', 'node/metrics_6', 'node/metrics_7', 'node/metrics_8', 'node/metrics_9', 'zone', 'zone/metrics_0', 'zone/metrics_1', 'zone/metrics_2', 'zone/metrics_3', 'zone/metrics_4']
dataset.get("zone/metrics_0"), dataset.get("timeplot_1/feature_a")
(<HDF5 dataset "metrics_0": shape (2, 4), type "<f4">, array([15, 25, 35]))