Lesson Flow

Learn

Goals and Concepts

Start with the capability target and concept set for this module.

Practice

Studio Activity

Apply the ideas in a guided activity tied to realistic outputs.

Check

Assessment Rubric

Use the rubric to verify competency and identify improvement targets.

Interactive Lab

Practice in short loops: checkpoint quiz, microtask decision, and competency progress tracking.

Checkpoint Quiz

Q1. Which output most clearly demonstrates module competency?

Competency is shown through measurable, method-linked evidence.

Q2. What should always accompany a technical claim in this curriculum?

Every claim should include boundaries and uncertainty.

Q3. What is the best next step after identifying a gap in understanding?

Progress improves when gaps become explicit practice targets.

Microtask Decision

Choose the action that best improves scientific reliability.



Capability target

Produce a scalable, reproducible query-and-analysis plan for a large connectomics dataset, including storage assumptions, indexing strategy, and provenance capture.

Why this module matters

Connectomics is now data-system-limited as much as algorithm-limited. If learners cannot reason about throughput, storage, and indexing, they cannot execute reliable analyses on real datasets.

Concept set

1) Data architecture is scientific method infrastructure

2) Query cost is a research variable

3) Provenance must be first-class

Hidden curriculum scaffold

Core workflow: scalable query planning

  1. Define analysis question and required data granularity.
  2. Select storage/index strategy aligned to access pattern.
  3. Prototype baseline query and profile bottlenecks.
  4. Add provenance logging and version controls.
  5. Validate reproducibility and publish query package.
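You can rehearse steps 2 and 3 of the workflow at toy scale before touching a real store. The sketch below uses Python's built-in `sqlite3` as a stand-in for a SQL engine such as DuckDB or BigQuery. The table name `synapses` and the columns `pre_pt_root_id`, `post_pt_root_id` and `size` are assumptions loosely modeled on CAVE-style synapse tables, and the data are synthetic. The sketch records the query plan and timing for a per-neuron lookup, adds an index that matches that access pattern, and measures again.

```python
import random
import sqlite3
import time

def build_toy_synapse_table(n_rows=200_000, n_neurons=5_000, seed=0):
    """In-memory synthetic synapse table (hypothetical schema, random data)."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE synapses (id INTEGER PRIMARY KEY, pre_pt_root_id INTEGER, "
        "post_pt_root_id INTEGER, size INTEGER)"
    )
    rows = (
        (i, rng.randrange(n_neurons), rng.randrange(n_neurons), rng.randrange(1, 5000))
        for i in range(n_rows)
    )
    conn.executemany("INSERT INTO synapses VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn

# Access pattern under study: "all outgoing synapses of one neuron".
QUERY = "SELECT COUNT(*), SUM(size) FROM synapses WHERE pre_pt_root_id = ?"

def plan(conn, query, params):
    # Each EXPLAIN QUERY PLAN row has 4 columns; the last one is the readable detail.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, params)]

def profile(conn, query, params, repeats=20):
    # Mean wall-clock time per execution, plus the result for correctness checks.
    start = time.perf_counter()
    for _ in range(repeats):
        result = conn.execute(query, params).fetchone()
    return result, (time.perf_counter() - start) / repeats

conn = build_toy_synapse_table()
params = (42,)

# Step 3: baseline query, plan, and profile with no index.
baseline_plan = plan(conn, QUERY, params)
baseline_result, baseline_s = profile(conn, QUERY, params)

# Step 2: index aligned to the access pattern (filter on presynaptic ID).
conn.execute("CREATE INDEX idx_pre ON synapses (pre_pt_root_id)")
indexed_plan = plan(conn, QUERY, params)
indexed_result, indexed_s = profile(conn, QUERY, params)

assert baseline_result == indexed_result  # an index must never change the answer
print("baseline:", baseline_plan, f"{baseline_s * 1e3:.2f} ms")
print("indexed: ", indexed_plan, f"{indexed_s * 1e3:.2f} ms")
```

Absolute timings from a 200,000-row in-memory table say little about a petascale store. What transfers is the method: state the access pattern, record the plan, measure, then change one thing. DuckDB and BigQuery report plans and profiles in their own formats.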

60-minute tutorial run-of-show

  1. **00:00-08:00 Architecture framing and failure examples**
  2. **08:00-20:00 Access-pattern to index mapping exercise**
  3. **20:00-34:00 Query profiling and bottleneck diagnosis**
  4. **34:00-46:00 Provenance logging implementation**
  5. **46:00-56:00 Team review of reproducibility gaps**
  6. **56:00-60:00 Competency check and next-step assignment**

Studio activity: petascale query design lab

Scenario: Your team must deliver a weekly motif-analysis report from a multi-terabyte connectomics store.

Tasks

  1. Propose storage/index layout for expected query patterns.
  2. Write or outline two critical queries and estimate performance risk.
  3. Define minimum provenance fields for outputs.
  4. Produce one optimization proposal and one reproducibility safeguard.
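Task 3 becomes concrete if a small provenance record is written next to every output. The field names below are a hypothetical minimum chosen for illustration. They are not a CAVE or community standard: `dataset`, `materialization_version` (a pinned snapshot of the annotation tables), `query_text`, `code_commit`, `parameters`, `output_sha256` and `created_utc`. The sketch hashes the output bytes, which also gives a simple reproducibility safeguard for task 4: a rerun must produce byte-identical output.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class ProvenanceRecord:
    # Hypothetical minimum field set; rename to match your own store.
    dataset: str                  # datastack / bucket identifier
    materialization_version: int  # pinned snapshot of the annotation tables
    query_text: str               # exact query or API call that produced the output
    code_commit: str              # VCS revision of the analysis code
    parameters: dict = field(default_factory=dict)
    output_sha256: str = ""
    created_utc: str = ""

def sha256_bytes(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def make_record(output: bytes, **fields) -> ProvenanceRecord:
    """Attach an output hash and UTC timestamp to the caller-supplied fields."""
    return ProvenanceRecord(
        output_sha256=sha256_bytes(output),
        created_utc=datetime.now(timezone.utc).isoformat(),
        **fields,
    )

def is_reproduced(record: ProvenanceRecord, rerun_output: bytes) -> bool:
    """Safeguard: a rerun counts as reproduced only if the bytes are identical."""
    return sha256_bytes(rerun_output) == record.output_sha256

# Usage with placeholder values (not a real dataset, commit, or result).
report = b"motif,count\nexample_motif,0\n"
rec = make_record(
    report,
    dataset="example_datastack",
    materialization_version=1,
    query_text="SELECT COUNT(*) FROM synapses WHERE pre_pt_root_id = ?",
    code_commit="0000000",
    parameters={"min_synapse_size": 100},
)
print(json.dumps(asdict(rec), indent=2))
print(is_reproduced(rec, report))          # True
print(is_reproduced(rec, report + b"\n"))  # False
```

Byte-identical hashing is strict by design. If outputs legitimately vary, for example through row order, normalize them (sort, fixed float formatting) before hashing.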

Expected outputs

Assessment rubric

Scale context: real-world numbers

To ground the abstract concepts, here are the data scales learners will encounter:

| Dataset | Tissue volume | Neurons | Synapses | Storage |
|---|---|---|---|---|
| MICrONS (minnie65) | 1 mm³ mouse V1 | ~80,000 | ~500M | ~2 PB |
| H01 | ~1 mm³ human temporal cortex | ~57,000 cells | ~150M | ~1.4 PB |
| FlyWire | Whole adult Drosophila brain | ~139,255 | ~54.5M | ~100 TB |
| MouseConnects (planned) | ~10 mm³ mouse hippocampus | TBD | TBD | >10 PB |

Teaching point: “When your synapse table has 500 million rows, a poorly written query doesn’t just run slowly — it may not finish at all. Architecture decisions determine whether your science is feasible.”
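Back-of-envelope arithmetic makes the teaching point quantitative. Only the ~500M row count comes from the table above. The sketch assumes about 100 bytes per synapse row and a sustained scan rate of 200 MB/s per worker. Both figures are illustrative assumptions, not measured properties of MICrONS or of any particular store.

```python
def table_size_gb(n_rows: int, bytes_per_row: int) -> float:
    return n_rows * bytes_per_row / 1e9

def full_scan_minutes(n_rows, bytes_per_row, scan_mb_per_s, workers=1):
    """Time to read every row once, assuming I/O-bound, perfectly parallel scanning."""
    total_mb = n_rows * bytes_per_row / 1e6
    return total_mb / (scan_mb_per_s * workers) / 60

N_SYNAPSES = 500_000_000  # ~500M rows (MICrONS scale, from the table above)
BYTES_PER_ROW = 100       # assumption: IDs, positions, size, bookkeeping
SCAN_MB_PER_S = 200       # assumption: sustained per-worker read rate

print(f"table size: {table_size_gb(N_SYNAPSES, BYTES_PER_ROW):.0f} GB")
for workers in (1, 16):
    t = full_scan_minutes(N_SYNAPSES, BYTES_PER_ROW, SCAN_MB_PER_S, workers)
    print(f"one full scan, {workers:>2} worker(s): {t:.1f} min")

# Anti-pattern: a loop over ~80,000 neurons where each iteration issues an
# unindexed filter, i.e. one full table scan per neuron.
n_neurons = 80_000
one_scan = full_scan_minutes(N_SYNAPSES, BYTES_PER_ROW, SCAN_MB_PER_S)
print(f"{n_neurons:,} unindexed scans: {one_scan * n_neurons / 60 / 24:.0f} days")
```

Under these assumptions the table is about 50 GB, and one full scan takes roughly 4 minutes on one worker. Repeating that scan once per neuron takes about 231 days. An index or a single grouped query replaces all of those scans with one pass.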

Key tools and formats

| Tool/Format | Purpose | When to use |
|---|---|---|
| Zarr/N5 | Chunked array storage | Volumetric data, cloud-friendly |
| Neuroglancer precomputed | Multiscale image pyramids | Web browsing of EM/segmentation |
| CAVEclient | Python API for CAVE tables | Synapse queries, annotation access |
| CloudVolume | Python API for volumetric data | Image/segmentation chunk access |
| pandas/Dask | Tabular data manipulation | Synapse tables, annotation analysis |
| BigQuery/DuckDB | SQL on large tables | Complex joins on synapse/annotation tables |
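Chunking is what makes Zarr/N5 and precomputed volumes "cloud-friendly". A read fetches every chunk its bounding box overlaps, so the cost depends on how the request lines up with the chunk grid. The sketch below counts the chunks a box request touches and the resulting read amplification. It assumes a 128×128×64-voxel chunk shape and 1 byte per voxel, neither taken from a specific dataset, and it ignores compression.

```python
from itertools import product
from math import prod

def chunks_touched(bbox_start, bbox_stop, chunk_shape):
    """Chunk-grid indices overlapped by the half-open box [start, stop)."""
    ranges = [
        range(lo // c, (hi - 1) // c + 1)
        for lo, hi, c in zip(bbox_start, bbox_stop, chunk_shape)
    ]
    return list(product(*ranges))

def read_amplification(bbox_start, bbox_stop, chunk_shape, bytes_per_voxel=1):
    """Return (bytes requested, bytes fetched, fetched / requested)."""
    requested = prod(hi - lo for lo, hi in zip(bbox_start, bbox_stop)) * bytes_per_voxel
    n_chunks = len(chunks_touched(bbox_start, bbox_stop, chunk_shape))
    fetched = n_chunks * prod(chunk_shape) * bytes_per_voxel
    return requested, fetched, fetched / requested

CHUNK = (128, 128, 64)  # assumption: x, y, z voxels per chunk

# Same-sized request, grid-aligned vs. offset by half a chunk on every axis.
aligned = ((0, 0, 0), (128, 128, 64))
offset = ((64, 64, 32), (192, 192, 96))
for name, (start, stop) in (("aligned", aligned), ("offset", offset)):
    n = len(chunks_touched(start, stop, CHUNK))
    _, _, amp = read_amplification(start, stop, CHUNK)
    print(f"{name}: {n} chunk(s), {amp:.0f}x bytes fetched vs requested")
```

The aligned request touches 1 chunk. The offset request touches 8 chunks and fetches 8× the requested bytes. Choosing chunk shapes and request boxes to match the expected access pattern is the volumetric counterpart of choosing a table index.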


Quick practice prompt

Document one query you use with:

  1. data source/version,
  2. expected runtime class,
  3. one provenance field you currently miss.

Teaching Materials

Activity Worksheet

Learner worksheet aligned to the studio activity and rubric.


Slide Source

Marp source file for editing and rendering.

course/decks/marp/modules/module12.marp.md
