.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/convert_lp_to_benchmark.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end `
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_convert_lp_to_benchmark.py:


Convert Lightning Pose project to benchmark dataset
======================================================

Create a ``poseinterface`` benchmark dataset from a Lightning Pose (LP) project.

.. GENERATED FROM PYTHON SOURCE LINES 9-11

Imports
-------

.. GENERATED FROM PYTHON SOURCE LINES 11-28

.. code-block:: Python


    import json
    import shutil
    import tempfile
    from datetime import datetime, timezone
    from pathlib import Path

    import poseinterface
    from poseinterface.clips import extract_clip
    from poseinterface.io import (
        annotations_to_poseinterface,
        frames_to_poseinterface,
        predictions_to_poseinterface,
        split_lp_collected_data,
        video_to_poseinterface,
    )
    from poseinterface.utils import tree




.. GENERATED FROM PYTHON SOURCE LINES 29-43

Overview
--------

We'll handle the conversion in two steps:

1. **Convert:** LP project files (videos, frame annotations, and keypoint
   predictions) are restructured into the
   :ref:`poseinterface benchmark layout `.
2. **Extract clips:** Short video clips and their labels are extracted from
   the converted videos and their corresponding keypoint predictions, ready
   for expert review.

The workflow is similar to the one followed in
:ref:`sphx_glr_auto_examples_convert_dlc_to_benchmark.py`, with a few
differences explained below.

.. GENERATED FROM PYTHON SOURCE LINES 45-69

Source Lightning Pose project
-----------------------------

We work with a dataset from the
`International Brain Laboratory (IBL) `_,
containing videos of mouse paw movements analysed using
`Lightning Pose `_.

.. note::
    This example runs against a lightweight fixture shipped with the
    repository (under ``tests/data/``). Replace ``source_project_dir`` and
    ``benchmark_base_dir`` with the paths to your LP project and benchmark
    dataset directories, respectively. Keep in mind that your project will
    contain more files than are shown here.

.. warning::
    Lightning Pose saves prediction files to the model output directory,
    **not** to the project's ``videos/`` directory. Before running this
    script, move (or copy) each session's prediction CSV, including any
    manual corrections you have made, into the project's ``videos/``
    directory, giving it the same stem as the corresponding video (e.g.
    ``6c6983ef73834989918332b1a300d17a_left.csv`` for
    ``6c6983ef73834989918332b1a300d17a_left.mp4``).

.. GENERATED FROM PYTHON SOURCE LINES 69-79

.. code-block:: Python


    source_project_dir = (
        Path(".").resolve().parent / "tests" / "data" / "lightningpose" / "ibl-paw"
    )
    print(tree(source_project_dir, level=1, exclude_hidden=True))

    # For this example we use a temporary directory, cleaned up at the end.
    benchmark_base_dir = Path(tempfile.mkdtemp(prefix="poseinterface-benchmark-"))
    print(f"\nBenchmark dataset will be saved to: {benchmark_base_dir}")




.. rst-class:: sphx-glr-script-out

.. code-block:: none

    ibl-paw/
    ├── CollectedData.csv
    ├── labeled-data/
    └── videos/

    2 directories, 1 files

    Benchmark dataset will be saved to: /tmp/poseinterface-benchmark-hz4d6lsf




.. GENERATED FROM PYTHON SOURCE LINES 80-88

The LP project differs from a DLC project in one key respect: all session
annotations live in a **single project-level** ``CollectedData.csv`` rather
than in per-session files inside ``labeled-data/``. Otherwise, the two
sub-directories we care about mirror the DLC layout:

- ``videos/``: session videos and (after the move described above) their
  corresponding prediction CSVs.
- ``labeled-data/``: sampled frames, in one sub-directory per session.

.. GENERATED FROM PYTHON SOURCE LINES 88-91

.. code-block:: Python


    print(tree(source_project_dir / "videos", level=1, exclude_hidden=True))




.. rst-class:: sphx-glr-script-out

.. code-block:: none

    videos/
    ├── 6c6983ef73834989918332b1a300d17a_left.csv
    ├── 6c6983ef73834989918332b1a300d17a_left.mp4
    ├── a92c4b1d46bd457ea1f4414265f0e2d4_left.csv
    └── a92c4b1d46bd457ea1f4414265f0e2d4_left.mp4

    0 directories, 4 files




.. GENERATED FROM PYTHON SOURCE LINES 92-95

.. code-block:: Python


    print(tree(source_project_dir / "labeled-data", level=2, exclude_hidden=True))




.. rst-class:: sphx-glr-script-out

.. code-block:: none

    labeled-data/
    ├── 6c6983ef73834989918332b1a300d17a_left/
    │   ├── img00000006.png
    │   ├── img00000036.png
    │   └── img00000136.png
    └── a92c4b1d46bd457ea1f4414265f0e2d4_left/
        ├── img00014918.png
        ├── img00074921.png
        └── img00074925.png

    2 directories, 6 files




.. GENERATED FROM PYTHON SOURCE LINES 96-104

Define sessions to convert
---------------------------

We select two sessions from the LP project and assign each to either the
``Train`` or ``Test`` split of the :ref:`benchmark dataset `.
You may add more sessions to this list, but make sure that each session
belongs to exactly one split and that no subject appears in both splits
(to avoid data leakage).

.. GENERATED FROM PYTHON SOURCE LINES 104-124

.. code-block:: Python


    sessions = [
        {
            "split": "Train",
            "source_video": "6c6983ef73834989918332b1a300d17a_left.mp4",
            "sub_id": "SWC054",
            "ses_id": "6c6983ef73834989918332b1a300d17a",
            "cam_id": "left",
        },
        {
            "split": "Test",
            "source_video": "a92c4b1d46bd457ea1f4414265f0e2d4_left.mp4",
            "sub_id": "KS023",
            "ses_id": "a92c4b1d46bd457ea1f4414265f0e2d4",
            "cam_id": "left",
        },
    ]

    project_name = "IBL-paw"




.. GENERATED FROM PYTHON SOURCE LINES 125-134

Split the project-level annotation file
----------------------------------------

Unlike DLC, Lightning Pose stores all session annotations in a single
project-level ``CollectedData.csv``. We split it into one CSV file per
session (each named ``CollectedData_mattw.csv`` in this example) and, in a
hidden working directory (``.lp_sessions/`` inside ``benchmark_base_dir``),
mirror the ``labeled-data/`` structure with symlinks to the original frames.
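
As an illustration of what the splitting step involves, the sketch below groups the data rows of a project-level annotation CSV by session and writes one CSV per session, using only the Python standard library. It is not the implementation of ``split_lp_collected_data``. It assumes a DLC-style layout with three header rows (scorer, bodyparts, coords) and a first column holding forward-slash image paths such as ``labeled-data/<session>/img00000006.png``; the helper names and the output filename ``CollectedData.csv`` are hypothetical.

.. code-block:: Python

    import csv
    import io
    from collections import defaultdict
    from pathlib import Path, PurePosixPath


    def group_rows_by_session(csv_text, n_header_rows=3):
        """Split a DLC-style annotation CSV into header rows and per-session rows.

        Assumes (hypothetically) that the first ``n_header_rows`` rows are
        headers and that the first column of every data row is an image path
        like ``labeled-data/<session>/img00000006.png``.
        """
        rows = list(csv.reader(io.StringIO(csv_text)))
        header, data = rows[:n_header_rows], rows[n_header_rows:]
        sessions = defaultdict(list)
        for row in data:
            if not row:  # skip blank lines
                continue
            # The session name is the directory that contains the image.
            session = PurePosixPath(row[0]).parent.name
            sessions[session].append(row)
        return header, dict(sessions)


    def write_session_csvs(header, sessions, out_dir):
        """Write one CSV per session to ``out_dir/<session>/CollectedData.csv``."""
        written = {}
        for session, rows in sessions.items():
            session_dir = Path(out_dir) / session
            session_dir.mkdir(parents=True, exist_ok=True)
            csv_path = session_dir / "CollectedData.csv"  # hypothetical name
            with csv_path.open("w", newline="") as f:
                # Repeat the header rows so each file is a valid CSV on its own.
                csv.writer(f).writerows(header + rows)
            written[session] = csv_path
        return written


    # Example usage (paths are placeholders):
    # text = (source_project_dir / "CollectedData.csv").read_text()
    # header, sessions = group_rows_by_session(text)
    # write_session_csvs(header, sessions, "split-csvs")

Grouping on the parent directory of each image path keeps each session key identical to its ``labeled-data/`` sub-directory name, which is what makes it possible to pair every split CSV with its frames.
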
The symlinks are needed because the underlying loader (sleap-io) resolves
image paths relative to the CSV location, so each split CSV must live
alongside the frame images it references.

.. GENERATED FROM PYTHON SOURCE LINES 134-149

.. code-block:: Python


    lp_session_base = benchmark_base_dir / ".lp_sessions"
    split_results = split_lp_collected_data(
        input_path=source_project_dir / "CollectedData.csv",
        output_dir=lp_session_base,
    )
    for ses_name, split_csv in split_results.items():
        orig_frames_dir = source_project_dir / "labeled-data" / ses_name
        ses_dir = split_csv.parent
        for img in sorted(orig_frames_dir.glob("*.png")):
            (ses_dir / img.name).symlink_to(img)

    print("Split annotation files:")
    for ses_name, csv_path in split_results.items():
        print(f"  {ses_name}: {csv_path.name}")




.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Split annotation files:
      6c6983ef73834989918332b1a300d17a_left: CollectedData_mattw.csv
      a92c4b1d46bd457ea1f4414265f0e2d4_left: CollectedData_mattw.csv




.. GENERATED FROM PYTHON SOURCE LINES 150-158

Convert to benchmark format
----------------------------

For each session we:

1. copy the session video (re-encoding it, if necessary);
2. convert the LP keypoint annotations to COCO JSON, and copy the
   corresponding frame images under standardised filenames;
3. convert the LP keypoint predictions to COCO JSON.

.. GENERATED FROM PYTHON SOURCE LINES 158-242

.. code-block:: Python


    for session in sessions:
        split = session["split"]
        ids = {k: session[k] for k in ["sub_id", "ses_id", "cam_id"]}
        sub_ses_prefix = f"sub-{ids['sub_id']}_ses-{ids['ses_id']}"
        sub_ses_cam_prefix = f"{sub_ses_prefix}_cam-{ids['cam_id']}"

        source_video_path = source_project_dir / "videos" / session["source_video"]
        target_session_dir = (
            benchmark_base_dir / split / project_name / sub_ses_prefix
        )
        target_frames_dir = target_session_dir / "Frames"
        target_frames_dir.mkdir(parents=True, exist_ok=True)

        # The LP session name (i.e. the labeled-data/ sub-directory name) is
        # ses_id + "_" + cam_id.
        video_stem = source_video_path.stem
        _lp_key = f"{session['ses_id']}_{session['cam_id']}"
        lp_session_name: str | None = _lp_key if _lp_key in split_results else None

        print(f"Converting session: {split}/{project_name}/{sub_ses_prefix}")

        # Copy the session video, re-encoding to H.264/yuv420p if necessary.
        video_to_poseinterface(
            input_video=source_video_path,
            output_video_dir=target_session_dir,
            **ids,
        )
        print(f"\tvideo: {source_video_path.name} -> {sub_ses_cam_prefix}.mp4")

        # Convert LP annotations to COCO frame labels JSON, then copy the
        # corresponding frame images with standardised poseinterface filenames.
        if lp_session_name is None:
            print(
                f"\tNo matching LP session found for {video_stem!r}."
                " Skipping annotations-to-poseinterface conversion."
            )
        else:
            # The split CSV lives in the temp dir alongside frame symlinks so
            # that sleap-io can resolve image paths relative to the CSV location.
            source_annotations_path = split_results[lp_session_name]
            # Use the original frames dir (not the symlinks) for the copy step.
            source_frames_dir = (
                source_project_dir / "labeled-data" / lp_session_name
            )
            framelabels_path = annotations_to_poseinterface(
                input_path=source_annotations_path,
                output_dir=target_frames_dir,
                format="frame",
                **ids,
            )
            frames_to_poseinterface(
                input_dir=source_frames_dir,
                output_dir=target_frames_dir,
                framelabels_path=framelabels_path,
            )
            print(
                f"\tannotations (+ frame images): "
                f"{source_annotations_path.name} -> {framelabels_path.name}"
            )

        # Convert LP predictions to COCO video labels JSON for clip extraction.
        # Prediction CSVs must be present in videos/ before running this script;
        # see the warning in the "Source Lightning Pose project" section above.
        source_predictions_path = next(
            (source_project_dir / "videos").glob(f"{video_stem}*.csv"),
            None,
        )
        if source_predictions_path is None:
            print(
                f"\tNo prediction CSV found for {video_stem!r} in "
                f"{source_project_dir / 'videos'}. Skipping predictions-to-"
                "poseinterface conversion."
            )
        else:
            predictions_to_poseinterface(
                input_path=source_predictions_path,
                video_path=source_video_path,
                output_dir=target_session_dir,
                **ids,
            )
            print(
                f"\tpredictions: {source_predictions_path.name} -> "
                f"{sub_ses_cam_prefix}_videolabels.json"
            )

        print("Done.\n")




.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Converting session: Train/IBL-paw/sub-SWC054_ses-6c6983ef73834989918332b1a300d17a
        video: 6c6983ef73834989918332b1a300d17a_left.mp4 -> sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left.mp4
        annotations (+ frame images): CollectedData_mattw.csv -> sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left_framelabels.json
        predictions: 6c6983ef73834989918332b1a300d17a_left.csv -> sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left_videolabels.json
    Done.

    Converting session: Test/IBL-paw/sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4
        video: a92c4b1d46bd457ea1f4414265f0e2d4_left.mp4 -> sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left.mp4
        annotations (+ frame images): CollectedData_mattw.csv -> sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left_framelabels.json
        predictions: a92c4b1d46bd457ea1f4414265f0e2d4_left.csv -> sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left_videolabels.json
    Done.




.. GENERATED FROM PYTHON SOURCE LINES 243-244

The resulting benchmark dataset:

.. GENERATED FROM PYTHON SOURCE LINES 244-247

.. code-block:: Python


    print(tree(benchmark_base_dir, level=5, exclude_hidden=True))




.. rst-class:: sphx-glr-script-out

.. code-block:: none

    poseinterface-benchmark-hz4d6lsf/
    ├── Test/
    │   └── IBL-paw/
    │       └── sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4/
    │           ├── Frames/
    │           │   ├── sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left_frame-14918.png
    │           │   ├── sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left_frame-74921.png
    │           │   ├── sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left_frame-74925.png
    │           │   └── sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left_framelabels.json
    │           ├── sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left.mp4
    │           └── sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left_videolabels.json
    └── Train/
        └── IBL-paw/
            └── sub-SWC054_ses-6c6983ef73834989918332b1a300d17a/
                ├── Frames/
                │   ├── sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left_frame-006.png
                │   ├── sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left_frame-036.png
                │   ├── sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left_frame-136.png
                │   └── sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left_framelabels.json
                ├── sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left.mp4
                └── sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left_videolabels.json

    8 directories, 12 files




.. GENERATED FROM PYTHON SOURCE LINES 248-259

.. note::
    Frame labels (``framelabels.json``) are generated for both splits, but in
    the **published** dataset the ``Test`` split intentionally omits them,
    because they are withheld for evaluation.
    See the :ref:`folder structure specification` for details.

    The ``videolabels.json`` files generated alongside each session video are
    intermediate artifacts used for clip extraction in the next section, and
    will not be included in the published dataset.

.. GENERATED FROM PYTHON SOURCE LINES 262-273

Extract clips
-------------

Clips (short video segments) can be extracted from the converted session
videos. When the ``videolabels.json`` files are present, the corresponding
clip label files (``cliplabels.json``) are generated automatically during
clip extraction.
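
To make the relationship between a session's video labels and a clip's labels concrete, here is a minimal sketch under two assumptions that we have not checked against ``extract_clip``: a clip covers the half-open frame range ``[start_frame, start_frame + duration)``, and its labels are re-indexed so that the clip's first frame becomes frame 0. The helpers ``clip_stem`` and ``slice_clip_labels`` are hypothetical; only the ``_start-<S>_dur-<D>`` suffix is taken from the clip filenames printed in this example.

.. code-block:: Python

    from typing import Dict, TypeVar

    Label = TypeVar("Label")


    def clip_stem(video_stem: str, start_frame: int, duration: int) -> str:
        """Build a clip name with the ``_start-<S>_dur-<D>`` suffix used here."""
        return f"{video_stem}_start-{start_frame}_dur-{duration}"


    def slice_clip_labels(
        frame_labels: Dict[int, Label], start_frame: int, duration: int
    ) -> Dict[int, Label]:
        """Keep labels for frames in [start_frame, start_frame + duration).

        Assumption: frame indices are shifted so the clip's first frame is 0.
        Frames missing from ``frame_labels`` (e.g. past the end of the video)
        are simply absent from the result.
        """
        if start_frame < 0 or duration <= 0:
            raise ValueError("start_frame must be >= 0 and duration must be > 0")
        end_frame = start_frame + duration
        return {
            frame - start_frame: label
            for frame, label in frame_labels.items()
            if start_frame <= frame < end_frame
        }


    # Hypothetical per-frame labels: frame index -> (x, y) for one keypoint.
    labels = {i: (float(i), 2.0 * i) for i in range(100)}
    clip = slice_clip_labels(labels, start_frame=25, duration=5)
    print(clip_stem("sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left", 25, 5))
    print(sorted(clip))  # clip-relative frame indices: [0, 1, 2, 3, 4]

Under these assumptions, ``clip[0]`` holds the labels of session frame ``start_frame``.
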
The generated clip label files should then be proofread and corrected by
experts before being included in the benchmark dataset.

First, we specify the clip-extraction parameters. This step can be repeated
with different parameters to incrementally expand the clip set.

.. GENERATED FROM PYTHON SOURCE LINES 273-278

.. code-block:: Python


    duration = 5  # in frames
    start_frames = [25, 50, 75]
    print(f"Extracting {duration}-frame clips starting at frames: {start_frames}")




.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Extracting 5-frame clips starting at frames: [25, 50, 75]




.. GENERATED FROM PYTHON SOURCE LINES 279-282

We loop over all sessions and extract clips at each start frame.
The resulting video clips and their ``cliplabels.json`` files are saved in a
``Clips/`` subdirectory within each session folder.

.. GENERATED FROM PYTHON SOURCE LINES 282-299

.. code-block:: Python


    for session in sessions:
        sub_ses_prefix = f"sub-{session['sub_id']}_ses-{session['ses_id']}"
        sub_ses_cam_prefix = f"{sub_ses_prefix}_cam-{session['cam_id']}"
        session_dir = (
            benchmark_base_dir / session["split"] / project_name / sub_ses_prefix
        )

        for start_frame in start_frames:
            clip_path, _ = extract_clip(
                video_path=session_dir / f"{sub_ses_cam_prefix}.mp4",
                start_frame=start_frame,
                duration=duration,
            )
            print(f"Extracted clip: {clip_path.stem}")




.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Extracted clip: sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left_start-25_dur-5
    Extracted clip: sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left_start-50_dur-5
    Extracted clip: sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left_start-75_dur-5
    Extracted clip: sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left_start-25_dur-5
    Extracted clip: sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left_start-50_dur-5
    Extracted clip: sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left_start-75_dur-5




.. GENERATED FROM PYTHON SOURCE LINES 300-302

The resulting benchmark dataset, including the extracted clips and their
corresponding labels:

.. GENERATED FROM PYTHON SOURCE LINES 302-306

.. code-block:: Python


    print(tree(benchmark_base_dir, level=5, exclude_hidden=True))




.. rst-class:: sphx-glr-script-out

.. code-block:: none

    poseinterface-benchmark-hz4d6lsf/
    ├── Test/
    │   └── IBL-paw/
    │       └── sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4/
    │           ├── Clips/
    │           │   ├── sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left_start-25_dur-5.mp4
    │           │   ├── sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left_start-25_dur-5_cliplabels.json
    │           │   ├── sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left_start-50_dur-5.mp4
    │           │   ├── sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left_start-50_dur-5_cliplabels.json
    │           │   ├── sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left_start-75_dur-5.mp4
    │           │   └── sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left_start-75_dur-5_cliplabels.json
    │           ├── Frames/
    │           │   ├── sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left_frame-14918.png
    │           │   ├── sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left_frame-74921.png
    │           │   ├── sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left_frame-74925.png
    │           │   └── sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left_framelabels.json
    │           ├── sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left.mp4
    │           └── sub-KS023_ses-a92c4b1d46bd457ea1f4414265f0e2d4_cam-left_videolabels.json
    └── Train/
        └── IBL-paw/
            └── sub-SWC054_ses-6c6983ef73834989918332b1a300d17a/
                ├── Clips/
                │   ├── sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left_start-25_dur-5.mp4
                │   ├── sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left_start-25_dur-5_cliplabels.json
                │   ├── sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left_start-50_dur-5.mp4
                │   ├── sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left_start-50_dur-5_cliplabels.json
                │   ├── sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left_start-75_dur-5.mp4
                │   └── sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left_start-75_dur-5_cliplabels.json
                ├── Frames/
                │   ├── sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left_frame-006.png
                │   ├── sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left_frame-036.png
                │   ├── sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left_frame-136.png
                │   └── sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left_framelabels.json
                ├── sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left.mp4
                └── sub-SWC054_ses-6c6983ef73834989918332b1a300d17a_cam-left_videolabels.json

    10 directories, 24 files




.. GENERATED FROM PYTHON SOURCE LINES 307-318

.. note::
    In the published dataset, the ``Train`` split includes all
    ``cliplabels.json`` files. The ``Test`` split omits all
    ``cliplabels.json`` files and instead provides only clip start labels
    (``startlabels.json``), derived from each clip's first frame, to support
    point-tracker evaluation.

    The ``videolabels.json`` files generated in the previous section are
    intermediate artifacts used for clip extraction, and are never shared.
    See the :ref:`folder structure specification` for details.

.. GENERATED FROM PYTHON SOURCE LINES 321-332

Record provenance (optional)
----------------------------

This step is optional and can be safely skipped, but it is highly
recommended when converting real data, for bookkeeping and reproducibility.
We save a copy of this script alongside a JSON sidecar that records the
``poseinterface`` version (including the git commit, via ``setuptools_scm``),
a UTC timestamp, and the source project path.
Both files are written to a top-level ``.provenance/`` folder and named after
the project, so that multiple projects under the same ``benchmark_base_dir``
stay distinct.

.. GENERATED FROM PYTHON SOURCE LINES 332-354

.. code-block:: Python


    # sphinx_gallery_capture_repr = ()

    provenance_dir = benchmark_base_dir / ".provenance"
    provenance_dir.mkdir(parents=True, exist_ok=True)

    # ``__file__`` is set when running this script directly with Python, but
    # not when sphinx-gallery executes it during the docs build.
    script_path_str = globals().get("__file__")
    if script_path_str:
        shutil.copy(Path(script_path_str), provenance_dir / f"{project_name}.py")

    (provenance_dir / f"{project_name}.json").write_text(
        json.dumps(
            {
                "poseinterface_version": poseinterface.__version__,
                "converted_at": datetime.now(timezone.utc).isoformat(),
                "source_project_dir": str(source_project_dir),
            },
            indent=2,
        )
    )




.. GENERATED FROM PYTHON SOURCE LINES 355-366

Clean up
--------

Since this example writes to a temporary directory, we remove it at the end.

.. warning::
    This cell is only meant to remove a temporary ``benchmark_base_dir``.
    The guard below refuses to delete anything outside the system temp
    directory, so it is safe to leave in place when you adapt this example
    to a real benchmark dataset path.

.. GENERATED FROM PYTHON SOURCE LINES 366-377

.. code-block:: Python


    system_tempdir = Path(tempfile.gettempdir()).resolve()
    target = benchmark_base_dir.resolve()
    if target.is_relative_to(system_tempdir) and target != system_tempdir:
        shutil.rmtree(target)
        print(f"Removed temporary benchmark directory: {target}")
    else:
        print(
            f"Refusing to remove {target}: not inside system temp dir "
            f"({system_tempdir}). Delete manually if you really want to."
        )




.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Removed temporary benchmark directory: /tmp/poseinterface-benchmark-hz4d6lsf





.. rst-class:: sphx-glr-timing

**Total running time of the script:** (0 minutes 4.572 seconds)


.. _sphx_glr_download_auto_examples_convert_lp_to_benchmark.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: convert_lp_to_benchmark.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: convert_lp_to_benchmark.py `

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: convert_lp_to_benchmark.zip `


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_