Adding Data To Datasets¶
This tutorial is a sequel to Tutorial 00, which should have been run successfully before this tutorial.
Connect to store (using sina local file and asynchronous mode)¶
from kosh import connect
import os
# local tutorial sql file
kosh_example_sql_file = "kosh_example.sql"
# connect to store in asynchronous mode
store = connect(kosh_example_sql_file)
Adding Files to Datasets¶
Let's find datasets containing param1
from sina.utils import DataRange
# We're setting a min value lower than the known min, to ensure all datasets come back
datasets = list(store.find(param1=DataRange(-1.e20)))
print(len(datasets))
125
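DataRange(-1.e20) sets only a lower bound, so the query keeps every dataset whose param1 is at least -1e20. The pure-Python sketch below mimics that filter on a hypothetical in-memory list of attribute dicts. It assumes Sina's defaults of an inclusive minimum and an exclusive maximum. The in_range helper is illustrative only and is not how Sina runs the query against the database.

```python
def in_range(value, min_=None, max_=None, min_inclusive=True, max_inclusive=False):
    """Return True if value lies in the (possibly open-ended) range."""
    if min_ is not None:
        if value < min_ or (value == min_ and not min_inclusive):
            return False
    if max_ is not None:
        if value > max_ or (value == max_ and not max_inclusive):
            return False
    return True


# Hypothetical attribute records standing in for datasets in the store.
records = [
    {"name": "run_000", "param1": 0.42},
    {"name": "run_001", "param1": -3.5},
    {"name": "run_002", "param2": 1.0},  # no param1, so this sketch skips it
]

# Only a minimum is given, so every record that has param1 matches.
matches = [r for r in records if "param1" in r and in_range(r["param1"], min_=-1.e20)]
print([r["name"] for r in matches])  # ['run_000', 'run_001']
```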
Let's scan the sample_files directory and associate each of the first 10 datasets with its matching HDF5 file
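import os
try:
    # tqdm.auto picks a notebook or console progress bar and silences the experimental-mode warning
    from tqdm.auto import tqdm
except ImportError:
    tqdm = list  # fall back to a plain list (no progress bar) if tqdm is missing
pth = "sample_files"
pbar = tqdm(datasets[:10])  # only the first 10 datasets, to keep the example quick
for dataset in pbar:
    # each run's file is expected to be named after its dataset, e.g. run_062.hdf5
    hdf5 = dataset.name + ".hdf5"
    try:
        dataset.associate(os.path.join(pth, hdf5), mime_type="hdf5")
    except Exception:  # e.g. the file is already associated
        pass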
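The loop above relies on a naming convention: each dataset's HDF5 file is expected at sample_files/<dataset name>.hdf5, and any error raised by associate is silently ignored. If you would rather skip datasets whose file is missing before calling associate, a small helper like the hypothetical expected_hdf5_paths below (plain Python, not part of Kosh) can build and check the paths first.

```python
import os
import tempfile


def expected_hdf5_paths(names, directory):
    """Map each dataset name to <directory>/<name>.hdf5, keeping only files that exist."""
    paths = {}
    for name in names:
        candidate = os.path.join(directory, name + ".hdf5")
        if os.path.isfile(candidate):
            paths[name] = candidate
    return paths


# Demo with a temporary directory that holds only one of the two expected files.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "run_062.hdf5"), "w").close()
    found = expected_hdf5_paths(["run_062", "run_063"], tmp)
    print(sorted(found))  # ['run_062']
```

In the tutorial loop you could then call dataset.associate(found[dataset.name], mime_type="hdf5") only for datasets whose name appears in found.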
List the ids of the data sources associated with this dataset (the last one processed by the loop above)
dataset._associated_data_
['71e0d881b0b744dcaf31915e2c71d968']
Let's search this dataset's associated data for mime type hdf5 (find returns a generator, so wrap it in list() to see the matches)
dataset.find(mime_type="hdf5")
<generator object KoshDataset.find at 0x2aaade1d34a0>
file = store._load(dataset._associated_data_[0])
file.uri
'/g/g19/cdoutrix/git/kosh/examples/sample_files/run_062.hdf5'
h5 = dataset.open(dataset._associated_data_[0])
h5
<HDF5 file "run_062.hdf5" (mode r)>
h5 = store.open(dataset._associated_data_[0])
h5
<HDF5 file "run_062.hdf5" (mode r)>
# You can associate many sources with a dataset
dataset.associate("some_other_file", mime_type="netcdf")
dataset._associated_data_
['71e0d881b0b744dcaf31915e2c71d968', '2008fdac8cdb4a37976f65c3d6f34b15']
# Or several sources at once
dataset.associate(["file2", "file3"], mime_type="png")
dataset._associated_data_
['71e0d881b0b744dcaf31915e2c71d968', '2008fdac8cdb4a37976f65c3d6f34b15', 'e466dd75d4e949c6b088c1f0f0e04449', '82225d89d283448183215aa8d742dd20']
# They do NOT have to share the same mime type and/or metadata
dataset.associate(["file5", "file6"], mime_type=["tiff", "jpg"], metadata=[{"name":"some"}, {"age":21}])
dataset._associated_data_
['71e0d881b0b744dcaf31915e2c71d968', '2008fdac8cdb4a37976f65c3d6f34b15', 'e466dd75d4e949c6b088c1f0f0e04449', '82225d89d283448183215aa8d742dd20', 'faa1d71f61644cc9835daf3f7927209f', '31fa0c4a5da04f6ba096f34ec86a93ab']
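The associate calls above show how the arguments line up: a single mime_type is applied to every file in the list, while a list of mime types (or of metadata dicts) is paired element by element with the files. The sketch below reproduces that pairing rule with a hypothetical pair_sources function. It is our reading of the examples, not Kosh's implementation, and the length check is an assumption.

```python
def pair_sources(uris, mime_types, metadata=None):
    """Pair each URI with its mime type and metadata, broadcasting single values."""
    if isinstance(uris, str):
        uris = [uris]
    n = len(uris)
    if isinstance(mime_types, str):
        mime_types = [mime_types] * n
    if metadata is None or isinstance(metadata, dict):
        # one (copied) metadata dict per URI, so entries don't share state
        metadata = [dict(metadata or {}) for _ in range(n)]
    if len(mime_types) != n or len(metadata) != n:
        raise ValueError("mime_types and metadata must match the number of uris")
    return list(zip(uris, mime_types, metadata))


print(pair_sources(["file2", "file3"], "png"))
# [('file2', 'png', {}), ('file3', 'png', {})]
print(pair_sources(["file5", "file6"], ["tiff", "jpg"], [{"name": "some"}, {"age": 21}]))
# [('file5', 'tiff', {'name': 'some'}), ('file6', 'jpg', {'age': 21})]
```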
Removing associated files¶
Sometimes you may need to remove an association; this can be done with the dissociate method.
dataset.dissociate("file5")
dataset._associated_data_
['71e0d881b0b744dcaf31915e2c71d968', '2008fdac8cdb4a37976f65c3d6f34b15', 'e466dd75d4e949c6b088c1f0f0e04449', '82225d89d283448183215aa8d742dd20', '31fa0c4a5da04f6ba096f34ec86a93ab']
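Notice that dissociate takes the file's URI ("file5"), while _associated_data_ lists ids: only the id that pointed at file5 disappeared. A toy in-memory registry makes that id/URI relationship explicit. The ToyRegistry class, its methods, and the use of uuid4 hex strings as ids are assumptions for illustration (the ids printed above merely look like 32-character hex strings); this is not Kosh's internal storage.

```python
import uuid


class ToyRegistry:
    """Hypothetical stand-in for a dataset's list of associated data."""

    def __init__(self):
        self.sources = {}  # id -> (uri, mime_type), in insertion order

    def associate(self, uri, mime_type):
        source_id = uuid.uuid4().hex  # 32-character hex id
        self.sources[source_id] = (uri, mime_type)
        return source_id

    def dissociate(self, uri):
        # Drop every id whose source points at this URI.
        for source_id in [k for k, (u, _) in self.sources.items() if u == uri]:
            del self.sources[source_id]

    @property
    def associated_ids(self):
        return list(self.sources)


reg = ToyRegistry()
id5 = reg.associate("file5", "tiff")
id6 = reg.associate("file6", "jpg")
reg.dissociate("file5")
print(reg.associated_ids == [id6])  # True
```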
Adding curves to a dataset¶
Sometimes you don't need or want a file hanging around; you just want to save a curve (think 1D data). You can easily do so.
You can organize your curves into different curve_sets and give them names. If you don't, Kosh will name them automatically for you.
dataset.add_curve([1,2,3,4], "time", "my_curves")
dataset.add_curve([2.3, 3.4, 5.6, 7.8], "some_variable", "my_curves")
dataset.add_curve([3, 4,5], "time", "my_other_curves")
dataset
KOSH DATASET
    id: 209c7382c1334ef4afe8fa95ef0cb58b
    name: run_062
    creator: cdoutrix
--- Attributes ---
    creator: cdoutrix
    name: run_062
    param1: 0.3299019516056123
    param2: 0.24940142061599885
    param3: 4.635686431066943
    param4: 2.4118405159503844
    param5: 2.21532924044391
    param6: J
    project: Kosh Tutorial
--- Associated Data (6)---
    Mime_type: hdf5
        /g/g19/cdoutrix/git/kosh/examples/sample_files/run_062.hdf5 ( 71e0d881b0b744dcaf31915e2c71d968 )
    Mime_type: jpg
        file6 ( 31fa0c4a5da04f6ba096f34ec86a93ab )
    Mime_type: netcdf
        some_other_file ( 2008fdac8cdb4a37976f65c3d6f34b15 )
    Mime_type: png
        file2 ( e466dd75d4e949c6b088c1f0f0e04449 )
        file3 ( 82225d89d283448183215aa8d742dd20 )
    Mime_type: sina/curve
        internal ( my_curves, my_other_curves )
--- Ensembles (0)---
    []
--- Ensemble Attributes ---
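Judging by the output, curves are grouped under named curve_sets (my_curves, my_other_curves), and the same curve name (time) can live in two curve_sets without clashing. A nested dictionary is a convenient mental model for that layout. The sketch below is only that model: add_curve and get_curve here are hypothetical helpers whose argument order mirrors the calls above, and the "<curve_set>/<curve_name>" path form is taken from the remove_curve("my_curves/time") call later in this tutorial. It does not reflect how Kosh stores curves.

```python
curve_sets = {}


def add_curve(curve, curve_name, curve_set):
    """Store a 1D curve under curve_sets[curve_set][curve_name]."""
    curve_sets.setdefault(curve_set, {})[curve_name] = list(curve)


def get_curve(path):
    """Look up a curve by its "<curve_set>/<curve_name>" path."""
    curve_set, curve_name = path.split("/", 1)
    return curve_sets[curve_set][curve_name]


add_curve([1, 2, 3, 4], "time", "my_curves")
add_curve([2.3, 3.4, 5.6, 7.8], "some_variable", "my_curves")
add_curve([3, 4, 5], "time", "my_other_curves")

print(sorted(curve_sets))           # ['my_curves', 'my_other_curves']
print(get_curve("my_curves/time"))  # [1, 2, 3, 4]
```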
Removing curves and curve_sets¶
Similarly, you can remove individual curves or whole curve_sets (a curve_set that becomes empty is removed automatically).
dataset.remove_curve("some_variable", "my_curves")
# or
dataset.remove_curve("my_curves/time")
# notice that "my_curves" is gone
dataset
KOSH DATASET
    id: 209c7382c1334ef4afe8fa95ef0cb58b
    name: run_062
    creator: cdoutrix
--- Attributes ---
    creator: cdoutrix
    name: run_062
    param1: 0.3299019516056123
    param2: 0.24940142061599885
    param3: 4.635686431066943
    param4: 2.4118405159503844
    param5: 2.21532924044391
    param6: J
    project: Kosh Tutorial
--- Associated Data (6)---
    Mime_type: hdf5
        /g/g19/cdoutrix/git/kosh/examples/sample_files/run_062.hdf5 ( 71e0d881b0b744dcaf31915e2c71d968 )
    Mime_type: jpg
        file6 ( 31fa0c4a5da04f6ba096f34ec86a93ab )
    Mime_type: netcdf
        some_other_file ( 2008fdac8cdb4a37976f65c3d6f34b15 )
    Mime_type: png
        file2 ( e466dd75d4e949c6b088c1f0f0e04449 )
        file3 ( 82225d89d283448183215aa8d742dd20 )
    Mime_type: sina/curve
        internal ( my_other_curves )
--- Ensembles (0)---
    []
--- Ensemble Attributes ---
dataset.remove_curve("my_other_curves")
# all gone
dataset
KOSH DATASET
    id: 209c7382c1334ef4afe8fa95ef0cb58b
    name: run_062
    creator: cdoutrix
--- Attributes ---
    creator: cdoutrix
    name: run_062
    param1: 0.3299019516056123
    param2: 0.24940142061599885
    param3: 4.635686431066943
    param4: 2.4118405159503844
    param5: 2.21532924044391
    param6: J
    project: Kosh Tutorial
--- Associated Data (5)---
    Mime_type: hdf5
        /g/g19/cdoutrix/git/kosh/examples/sample_files/run_062.hdf5 ( 71e0d881b0b744dcaf31915e2c71d968 )
    Mime_type: jpg
        file6 ( 31fa0c4a5da04f6ba096f34ec86a93ab )
    Mime_type: netcdf
        some_other_file ( 2008fdac8cdb4a37976f65c3d6f34b15 )
    Mime_type: png
        file2 ( e466dd75d4e949c6b088c1f0f0e04449 )
        file3 ( 82225d89d283448183215aa8d742dd20 )
--- Ensembles (0)---
    []
--- Ensemble Attributes ---
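The removal rules shown above can be summarized in a toy nested-dictionary model: remove one curve by name and curve_set or by "<curve_set>/<curve_name>" path, remove a whole curve_set by its bare name, and drop any curve_set left empty. The remove_curve function here is hypothetical and only mirrors the behavior visible in the outputs. In particular, treating a bare name without "/" as a curve_set is our interpretation of the last call, not Kosh's code.

```python
def remove_curve(curve_sets, curve_name, curve_set=None):
    """Remove a curve (or a whole curve_set) and drop curve_sets left empty."""
    if curve_set is None:
        if "/" in curve_name:
            curve_set, curve_name = curve_name.split("/", 1)
        else:
            # A bare name with no curve_set is treated as a whole curve_set.
            curve_sets.pop(curve_name, None)
            return
    curves = curve_sets[curve_set]
    del curves[curve_name]
    if not curves:
        del curve_sets[curve_set]  # an empty curve_set disappears automatically


curve_sets = {
    "my_curves": {"time": [1, 2, 3, 4], "some_variable": [2.3, 3.4, 5.6, 7.8]},
    "my_other_curves": {"time": [3, 4, 5]},
}
remove_curve(curve_sets, "some_variable", "my_curves")
remove_curve(curve_sets, "my_curves/time")
print(sorted(curve_sets))  # ['my_other_curves']
remove_curve(curve_sets, "my_other_curves")
print(curve_sets)  # {}
```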