Curate & link multi-modal data
!lamin init --storage ./test-multimodal --schema bionty
import lamindb as ln
import lnschema_bionty as lb
lb.settings.species = "human"
ln.settings.verbosity = 3
ln.track()
MuData object
Let’s use a MuData object:
mdata = ln.dev.datasets.mudata_papalexi21_subset()
mdata
First, we register the papalexi21_subset.h5mu file:
file = ln.File(
"papalexi21_subset.h5mu", description="Sub-sampled MuData from Papalexi21"
)
file.save()
Register features
Now let’s register the three feature sets this data contains: rna, adt, and obs (metadata).
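To make the three feature sets concrete, here is a plain-Python sketch that groups feature names from a toy, dict-based stand-in for a MuData object. The dictionary layout, the function name collect_feature_sets, and all feature names are hypothetical; this is not the MuData or lamindb API.

```python
# Toy stand-in for a MuData object (hypothetical names, not the real data):
# each modality has variable names, and the rna modality carries obs columns.
toy_mdata = {
    "rna": {
        "var_names": ["GENE_A", "GENE_B", "GENE_C"],
        "obs_columns": ["gene_target", "replicate", "Phase"],
    },
    "adt": {"var_names": ["MARKER_1", "MARKER_2"]},
}


def collect_feature_sets(mdata: dict) -> dict:
    """Group feature names into three sets: rna vars, adt vars, obs columns."""
    return {
        "rna": list(mdata["rna"]["var_names"]),
        "adt": list(mdata["adt"]["var_names"]),
        # The metadata columns come from the rna modality, as with mdata["rna"].obs.
        "obs": list(mdata["rna"]["obs_columns"]),
    }


feature_sets = collect_feature_sets(toy_mdata)
for slot, names in feature_sets.items():
    print(f"{slot}: {len(names)} features")
```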
modalities
For the two modalities, rna and adt, we use Bionty registries as the reference: gene symbols (lb.Gene.symbol) for rna and cell marker names (lb.CellMarker.name) for adt.
mdata["rna"].var_names[:5]
feature_set_rna = ln.FeatureSet.from_values(
mdata["rna"].var_names, field=lb.Gene.symbol
)
mdata["adt"].var_names
feature_set_adt = ln.FeatureSet.from_values(
mdata["adt"].var_names, field=lb.CellMarker.name
)
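Validating against a registry field such as lb.Gene.symbol can be pictured as checking each value for membership in a reference vocabulary. The sketch below assumes exact string matching and uses a hypothetical helper, split_by_reference, with made-up inputs; it is not how lamindb or Bionty implement validation, which may apply additional logic.

```python
def split_by_reference(values, reference):
    """Split values into (validated, not_validated) by exact membership in a reference vocabulary."""
    reference_set = set(reference)  # set gives constant-time lookups
    validated = [v for v in values if v in reference_set]
    not_validated = [v for v in values if v not in reference_set]
    return validated, not_validated


# Hypothetical inputs: a tiny "reference" of gene symbols and some variable names.
reference_symbols = ["CD4", "CD8A", "MS4A1"]
var_names = ["CD4", "MS4A1", "NOT_A_GENE"]

validated, not_validated = split_by_reference(var_names, reference_symbols)
print(validated)      # ['CD4', 'MS4A1']
print(not_validated)  # ['NOT_A_GENE']
```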
Link them to the file:
file.features.add_feature_set(feature_set_rna, slot="rna")
file.features.add_feature_set(feature_set_adt, slot="adt")
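The slot argument records where in the data object a feature set lives. A toy sketch, using a hypothetical ToyFile class rather than lamindb's File, shows why that matters: the same name (here CD4) can appear in both the rna and adt feature sets and still stay distinguishable.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFile:
    """Hypothetical stand-in for a registered file; not lamindb's File class."""

    description: str
    feature_sets: dict = field(default_factory=dict)

    def add_feature_set(self, features, slot):
        # Key each feature set by its slot so that features from different
        # modalities are kept apart, even when their names overlap.
        self.feature_sets[slot] = list(features)


toy_file = ToyFile(description="Sub-sampled MuData from Papalexi21")
toy_file.add_feature_set(["CD4", "MS4A1"], slot="rna")
toy_file.add_feature_set(["CD4", "CD19"], slot="adt")
print(sorted(toy_file.feature_sets))  # ['adt', 'rna']
```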
metadata
The third feature set comes from the obs metadata of the rna modality:
obs = mdata["rna"].obs
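We’re only interested in a single metadata column, gene_target, so we register only that column as a feature before building the feature set from obs: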
ln.Feature(name="gene_target", type="category").save()
feature_set_obs = ln.FeatureSet.from_df(obs, "metadata")
file.features.add_feature_set(feature_set_obs, slot="obs")
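A sketch of restricting the obs feature set to registered columns, assuming that only columns registered as Feature records end up in the feature set built from the DataFrame. The registry dictionary and the helper features_from_columns are hypothetical; the column names are the ones that appear in this notebook, not necessarily the full obs table.

```python
# Hypothetical registry of features: only "gene_target" has been registered.
registered_features = {"gene_target": "category"}

# Column names mentioned in this notebook (the real obs table may have more).
obs_columns = ["orig.ident", "perturbation", "replicate", "Phase", "guide_ID", "gene_target"]


def features_from_columns(columns, registry):
    """Keep only columns that are registered features, paired with their type."""
    return {col: registry[col] for col in columns if col in registry}


obs_feature_set = features_from_columns(obs_columns, registered_features)
print(obs_feature_set)  # {'gene_target': 'category'}
```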
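The gene_target values are gene symbols, so we map them to records of the Bionty Gene registry, save them, and link them to the file as labels:
gene_targets = lb.Gene.from_values(obs["gene_target"], "symbol")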
ln.save(gene_targets)
file.add_labels(gene_targets)
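An obs column holds one value per cell, so the same gene symbol repeats many times, while a registry should hold one record per distinct symbol. The sketch below shows order-preserving deduplication in plain Python, assuming that from_values creates one record per unique value; the symbols are illustrative, not taken from the dataset.

```python
# Hypothetical per-cell values: one entry per cell, so targets repeat.
gene_target_column = ["STAT1", "IRF1", "STAT1", "JAK2", "IRF1"]

# dict.fromkeys keeps the first occurrence of each value and preserves order.
unique_targets = list(dict.fromkeys(gene_target_column))
print(unique_targets)  # ['STAT1', 'IRF1', 'JAK2']
```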
labels = []
for col in ["orig.ident", "perturbation", "replicate", "Phase", "guide_ID"]:
labels += ln.Label.from_values(obs[col])
Because none of these labels seem worth tracking in the registry or validating, we neither save them nor link them to the file.
file.features
file.describe()
file.view_lineage()
!lamin delete test-multimodal