Curate & link flow cytometry data
!lamin init --storage ./test-flow --schema bionty
import lamindb as ln
import lnschema_bionty as lb
import readfcs
lb.settings.species = "human" # globally set species
ln.track()
We start with a flow cytometry file from Alpert19:
filepath = ln.dev.datasets.file_fcs_alpert19()
filepath
Use readfcs to read the fcs file into memory:
adata = readfcs.read(filepath)  # reuse the path returned by the dataset helper above
adata
Track data with cell markers
We’ll use the CellMarker reference to link features:
file = ln.File.from_anndata(adata, description="Alpert19", var_ref=lb.CellMarker.name)
We see that many features aren’t validated. Let’s standardize the identifiers:
adata.var.index = lb.CellMarker.bionty().map_synonyms(adata.var.index)
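To make the idea of synonym standardization concrete, here is a minimal sketch in plain Python. It is not how `map_synonyms` is implemented: the `TOY_SYNONYMS` table and the `map_synonyms_sketch` helper are hypothetical names invented for illustration, and the sketch assumes that names without a known synonym pass through unchanged.

```python
# Toy synonym table (hypothetical, NOT the contents of the CellMarker reference):
# maps an alternative spelling to the name we treat as standardized.
TOY_SYNONYMS = {"CD197": "CCR7", "IL-7Ra": "CD127"}


def map_synonyms_sketch(names, synonyms):
    """Replace known synonyms; leave every other name untouched."""
    return [synonyms.get(name, name) for name in names]


raw_names = ["CD197", "CD3", "IL-7Ra", "Time"]
standardized = map_synonyms_sketch(raw_names, TOY_SYNONYMS)
print(standardized)
```

Names such as `Time` that have no entry in the table stay as they are, which is why some non-marker columns can remain after standardization.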
Now things look much better, but we still have 5 CellMarker records that seem more like metadata.
file = ln.File.from_anndata(adata, description="Alpert19", var_ref=lb.CellMarker.name)
Hence, let’s curate the AnnData a bit more:
validated = lb.CellMarker.bionty().validate(adata.var.index, "name")
Let’s move the metadata (non-validated cell markers) into adata.obs:
adata.obs = adata[:, ~validated].to_df()
adata = adata[:, validated].copy()
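The split above relies on a boolean mask with one entry per column. As a rough illustration of the same logic without AnnData, the following sketch uses plain dictionaries; the `events` data and the `known_markers` set are made up for the example and stand in for the measurement matrix and the reference registry.

```python
# Toy measurements: one dict per cell (event), keyed by channel name (made-up values).
events = [
    {"CD3": 1.2, "CD14": 0.1, "Time": 0.0},
    {"CD3": 0.3, "CD14": 2.4, "Time": 1.0},
]
known_markers = {"CD3", "CD14"}  # hypothetical stand-in for the reference registry

columns = list(events[0])
# One boolean per column: True if the name is a known marker.
validated = [name in known_markers for name in columns]

marker_cols = [c for c, ok in zip(columns, validated) if ok]
meta_cols = [c for c, ok in zip(columns, validated) if not ok]

# Non-validated columns become per-cell metadata; validated ones stay as features.
obs = [{c: row[c] for c in meta_cols} for row in events]
features = [{c: row[c] for c in marker_cols} for row in events]
print(marker_cols, meta_cols)
```

The per-cell values of the non-validated columns are kept, just in a different place, so nothing is lost by narrowing the feature panel.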
Now we have a clean panel of 45 CellMarkers, plus metadata in adata.obs that we don’t want to register:
file = ln.File.from_anndata(adata, description="Alpert19", var_ref=lb.CellMarker.name)
file.save()
file.features
file.features["var"].df().head(10)
Let’s register another flow file:
adata2 = readfcs.read(ln.dev.datasets.file_fcs())
file2 = ln.File.from_anndata(
    adata2, description="My fcs file", var_ref=lb.CellMarker.name
)
adata2.var.index = lb.CellMarker.bionty().map_synonyms(adata2.var.index)
file2 = ln.File.from_anndata(
    adata2, description="My fcs file", var_ref=lb.CellMarker.name
)
file2.save()
file2.view_lineage()
Query by cell markers
Which datasets have CD14 in their flow panel?
cell_markers = lb.CellMarker.lookup()
cell_markers.cd14
panels_with_cd14 = ln.FeatureSet.filter(cell_markers=cell_markers.cd14).all()
ln.File.filter(feature_sets__in=panels_with_cd14).df()
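Conceptually, the query above first finds the panels that contain a marker and then finds the files linked to those panels. A minimal in-memory sketch of that lookup, assuming a hypothetical `panels` dictionary with made-up marker sets (the third entry and the helper name `datasets_with_marker` are invented for illustration), could look like this:

```python
# Hypothetical mapping from dataset description to its marker panel (made-up contents).
panels = {
    "Alpert19": {"CD3", "CD14", "CD57"},
    "My fcs file": {"CD3", "CD14", "CD8"},
    "Other panel": {"CD3"},
}


def datasets_with_marker(panels, marker):
    """Return the sorted names of all datasets whose panel contains the marker."""
    return sorted(name for name, markers in panels.items() if marker in markers)


hits = datasets_with_marker(panels, "CD14")
print(hits)
```

In the registry, the same relationship is stored as links between records, so the lookup does not need to load any data files.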
Shared cell markers between two files:
files = ln.File.filter(feature_sets__in=panels_with_cd14).list()
file1, file2 = files[0], files[1]
file1_markers = file1.features["var"]
file2_markers = file2.features["var"]
shared_markers = file1_markers & file2_markers
shared_markers.list("name")
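The `&` above is used as an intersection of the two marker sets. Python's built-in sets behave the same way, which the following sketch shows with two made-up panels (`panel_a` and `panel_b` are hypothetical and do not reflect the actual files):

```python
# Two made-up marker panels; the real panels come from the registered files.
panel_a = {"CD3", "CD14", "CD57", "CD27"}
panel_b = {"CD3", "CD14", "CD8"}

# Set intersection keeps only markers present in both panels.
shared = panel_a & panel_b
shared_sorted = sorted(shared)
print(shared_sorted)
```

Sorting the result gives a stable order for display, since sets themselves are unordered.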
Flow marker registry
Check out your CellMarker registry:
lb.CellMarker.filter().df()
# a few tests
assert set(shared_markers.list("name")) == set(
    [
        "Ccr7",
        "CD3",
        "Cd14",
        "Cd19",
        "CD127",
        "CD27",
        "CD28",
        "CD8",
        "Cd4",
        "CD57",
    ]
)
ln.File.filter(feature_sets__in=panels_with_cd14).exists()
# clean up test instance
!lamin delete test-flow
!rm -r test-flow