Architecture
This page explains the main code paths in terms of responsibilities rather than individual source files.
High-level flow
The active project flow looks like this:
nextflow working directory
-> run discovery
-> run selection
-> analysis/log staging
-> desktop-style metadata synthesis
-> manifest creation
-> .2me tarball
-> import
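The flow above can be sketched as an ordered set of stages. This is only an illustration: the `Stage` enum, its variant names, and `render_flow` are hypothetical and do not come from the project source. The starting point, the Nextflow working directory, is treated as the input rather than as a stage.

```rust
/// Stages of the flow, in the order listed above.
/// Hypothetical type: the names are illustrative, not taken from the source tree.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    RunDiscovery,
    RunSelection,
    AnalysisLogStaging,
    MetadataSynthesis,
    ManifestCreation,
    Tarball,
    Import,
}

impl Stage {
    /// Every stage, in execution order.
    pub const ALL: [Stage; 7] = [
        Stage::RunDiscovery,
        Stage::RunSelection,
        Stage::AnalysisLogStaging,
        Stage::MetadataSynthesis,
        Stage::ManifestCreation,
        Stage::Tarball,
        Stage::Import,
    ];

    /// Human-readable label matching the flow diagram.
    pub fn label(self) -> &'static str {
        match self {
            Stage::RunDiscovery => "run discovery",
            Stage::RunSelection => "run selection",
            Stage::AnalysisLogStaging => "analysis/log staging",
            Stage::MetadataSynthesis => "desktop-style metadata synthesis",
            Stage::ManifestCreation => "manifest creation",
            Stage::Tarball => ".2me tarball",
            Stage::Import => "import",
        }
    }

    /// The stage that follows this one, or `None` after the last stage.
    pub fn next(self) -> Option<Stage> {
        let idx = Stage::ALL.iter().position(|s| *s == self)?;
        Stage::ALL.get(idx + 1).copied()
    }
}

/// Renders the stages as a single arrow-separated line.
pub fn render_flow() -> String {
    Stage::ALL
        .iter()
        .map(|s| s.label())
        .collect::<Vec<_>>()
        .join(" -> ")
}
```

Modelling the stages as a closed enum keeps the ordering in one place, so a new step would only need to be added to `ALL` and `label`.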
Main areas of the codebase
- CLI entry points
  src/create_2me/create_from_cli_run.rs and src/importer/import_from_2me.rs define the active command-line flows.
- Nextflow capture
  src/nextflow/nextflow_toolkit.rs indexes historical runs with nextflow log and orchestrates CLI-run packaging.
- Analysis staging
  src/nextflow/nextflow_analysis.rs resolves output directories, finds matching logs, distills nextflow.stdout, and synthesizes progress.json.
- Metadata extraction
  src/nextflow_log_parser.rs parses the reduced Nextflow transcript into workflow identity fields such as project, repository, revision, and version.
- Desktop analysis model
  src/epi2me_desktop_analysis.rs defines the EPI2ME-style analysis record that is serialized into the archive payload.
- Workflow payload model
  src/epi2me_workflow.rs inventories installed workflow files for packaging and import.
- Manifest and archive semantics
  src/xmanifest.rs defines the portable archive structure, provenance, and manifest verification logic.
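As an illustration of the metadata extraction step, the sketch below pulls a few identity fields out of a Nextflow transcript. It rests on two assumptions about the transcript: a banner line starting with `N E X T F L O W` that contains `version <x>`, and a line beginning with "Launching", with the launch target in backticks followed by `revision: <hash>`. The `WorkflowIdentity` struct, its field names, and `parse_identity` are hypothetical; the real parser in src/nextflow_log_parser.rs may work differently, and no attempt is made here to split the launch target into project and repository.

```rust
/// Identity fields recovered from a transcript.
/// Hypothetical struct: field names are illustrative only.
#[derive(Debug, Default, PartialEq, Eq)]
pub struct WorkflowIdentity {
    /// Text between the backticks on the "Launching" line.
    pub launch_target: Option<String>,
    /// First token after "revision: " on the "Launching" line.
    pub revision: Option<String>,
    /// First token after "version " on the banner line.
    pub nextflow_version: Option<String>,
}

/// Scans the transcript line by line and fills in whatever it recognises.
/// Lines that match neither assumed shape are ignored.
pub fn parse_identity(transcript: &str) -> WorkflowIdentity {
    let mut id = WorkflowIdentity::default();
    for line in transcript.lines() {
        let line = line.trim();
        if let Some(rest) = line.strip_prefix("Launching `") {
            // Assumed shape: Launching `<target>` [...] - revision: <hash> [...]
            if let Some((target, tail)) = rest.split_once('`') {
                id.launch_target = Some(target.to_string());
                if let Some((_, rev)) = tail.split_once("revision: ") {
                    id.revision = rev.split_whitespace().next().map(String::from);
                }
            }
        } else if line.starts_with("N E X T F L O W") {
            // Assumed shape: N E X T F L O W  ~  version <x>
            if let Some((_, version)) = line.split_once("version ") {
                id.nextflow_version = version.split_whitespace().next().map(String::from);
            }
        }
    }
    id
}
```

Every field is an `Option`, so a transcript that lacks one of the assumed lines yields a partially filled record rather than an error.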
Design intent
The recurring design theme is translation.
epi4you is not trying to replace Nextflow or EPI2ME Desktop. Instead, it translates between:
- raw CLI-oriented run artifacts,
- EPI2ME-style metadata expectations, and
- a portable archive form suitable for transfer.
This translation is why the codebase contains both low-level filesystem work and higher-level domain models such as manifests and desktop analyses.
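One way to picture the translation theme is as a chain of type conversions. The sketch below uses Rust's standard `From` trait to move from a CLI-oriented record to a desktop-style record and then to an archive entry. All three structs, their fields, the id format, and the `analyses` directory name are assumptions made for illustration; they are not the project's actual models.

```rust
use std::path::{Path, PathBuf};

/// Raw, CLI-oriented view of a run (hypothetical fields).
#[derive(Debug, Clone)]
pub struct CliRun {
    pub run_name: String,
    pub workflow: String,
    pub output_dir: PathBuf,
}

/// Desktop-style analysis record (hypothetical fields).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DesktopAnalysis {
    pub id: String,
    pub workflow: String,
    pub path: PathBuf,
}

/// Entry in a portable archive (hypothetical fields).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ArchiveEntry {
    pub relative_path: PathBuf,
}

impl From<&CliRun> for DesktopAnalysis {
    /// Builds a desktop-style record from what the CLI run already knows.
    fn from(run: &CliRun) -> Self {
        DesktopAnalysis {
            // Assumed id scheme: "<workflow>_<run name>".
            id: format!("{}_{}", run.workflow, run.run_name),
            workflow: run.workflow.clone(),
            path: run.output_dir.clone(),
        }
    }
}

impl From<&DesktopAnalysis> for ArchiveEntry {
    /// Places the analysis under an assumed "analyses" directory in the archive.
    fn from(analysis: &DesktopAnalysis) -> Self {
        ArchiveEntry {
            relative_path: Path::new("analyses").join(&analysis.id),
        }
    }
}
```

Keeping each step as a separate conversion mirrors the split between low-level filesystem details (`output_dir`) and the higher-level models built on top of them.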
Relationship to the broader repository
The repository contains additional code for workflows, containers, and database operations that reflects the wider original ambition of the project.
Even where those paths are not the current main CLI entry points, they still matter architecturally because they explain why the manifest supports multiple payload types and why the project thinks in terms of “bioinformatics assets” rather than only “analysis result folders”.
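A manifest that supports multiple payload types could be modelled roughly as follows. The `Payload` variants, the `Manifest` struct, and the helper methods are hypothetical and chosen only to mirror the asset kinds mentioned above; the actual definitions in src/xmanifest.rs may differ.

```rust
/// Kinds of asset a manifest entry may describe.
/// Hypothetical variants, named after the areas mentioned in the text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Payload {
    Analysis { id: String },
    Workflow { name: String },
    Container { image: String },
}

impl Payload {
    /// Short tag identifying the payload type.
    pub fn kind(&self) -> &'static str {
        match self {
            Payload::Analysis { .. } => "analysis",
            Payload::Workflow { .. } => "workflow",
            Payload::Container { .. } => "container",
        }
    }
}

/// A manifest is a list of payloads of mixed kinds (hypothetical shape).
#[derive(Debug, Default)]
pub struct Manifest {
    pub payloads: Vec<Payload>,
}

impl Manifest {
    /// Counts the entries whose kind tag equals `kind`.
    pub fn count(&self, kind: &str) -> usize {
        self.payloads.iter().filter(|p| p.kind() == kind).count()
    }

    /// True when the manifest carries anything other than analysis results.
    pub fn has_non_analysis_assets(&self) -> bool {
        self.payloads
            .iter()
            .any(|p| !matches!(p, Payload::Analysis { .. }))
    }
}
```

An enum of payload kinds keeps the archive format open to assets beyond analysis result folders, which is the point the paragraph above makes.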