Working With Datasets¶
This tutorial is a sequel to Tutorial 01, which should have been run successfully before starting this one.
In this tutorial we will open a store, look for some datasets of interest, do some work, and augment the metadata.
Connect to store (using sina local file)¶
In [1]:
from kosh import connect
import os
# local tutorial sql file
kosh_example_sql_file = "kosh_example.sql"
# connect to store
store = connect(kosh_example_sql_file)
Looping through datasets¶
Let's look for datasets related to our "Kosh Tutorial" project.
In [2]:
datasets = list(store.find(project="Kosh Tutorial"))
print("We identified {} possible datasets".format(len(datasets)))
We identified 125 possible datasets
Working with datasets and files¶
Now we are going to identify failed nodes and their failure cycles.
In [3]:
import numpy
import h5py
try:
    from tqdm.autonotebook import tqdm
except ImportError:
    tqdm = list
import glob
/g/g19/cdoutrix/miniconda3/envs/kosh/lib/python3.7/site-packages/ipykernel_launcher.py:4: TqdmExperimentalWarning: Using `tqdm.autonotebook.tqdm` in notebook mode. Use `tqdm.tqdm` instead to force console mode (e.g. in jupyter console) after removing the cwd from sys.path.
In [4]:
import random

def some_long_operation():
    return bool(random.randint(0, 1))

pbar = tqdm(datasets)
for dataset in pbar:
    try:  # set_postfix_str only exists on tqdm progress bars
        pbar.set_postfix_str(dataset.name)
    except Exception:
        pass
    # Get associated file ids whose mime_type is hdf5
    hdf5s = list(dataset.find(mime_type="hdf5", ids_only=True))
    if len(hdf5s) > 0:
        h5 = store._load(hdf5s[0])  # load the hdf5 file Kosh object (because we used ids_only=True)
        # Simulate some long operation whose result we want to store in Kosh
        # rather than recomputing it every time
        dataset.failed = some_long_operation()
        h5file = h5.open(mode="r")
        # Store the array dimensions so we can search for them in Kosh
        dataset.cycles = h5file["node"]["metrics_0"].shape[0]
        dataset.nodes = h5file["node"]["metrics_0"].shape[1]
0%| | 0/125 [00:00<?, ?it/s]
In [5]:
print(len(list(store.find(project="Kosh Tutorial", failed=True))))
3