Platform

Platform module for training and evaluation infrastructure.

myriad.platform.train_and_evaluate(config, agent=None)[source]

Main entry point for a training run. Initializes everything and runs the outer training loop.

Output directory is automatically managed:

  • Under Hydra: uses current directory (Hydra-managed)

  • Otherwise: creates timestamped directory in outputs/

Parameters:
  • config (Config) – Training configuration specifying environment, agent, and run parameters.

  • agent (Agent | None) – Optional pre-built Agent instance. If provided, config.agent is used only for logging/metadata and the supplied agent runs instead.

Returns:

TrainingResults containing:

  • agent_state: Trained agent (ready for inference)

  • training_metrics: Training history (loss, reward, etc.)

  • eval_metrics: Evaluation history (episode returns, lengths)

  • config: Configuration used (for reproducibility)

  • final_env_state: Final environment states (can be used to resume training)

Return type:

TrainingResults
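The output-directory rule described for train_and_evaluate() can be illustrated with a small sketch. The helper name resolve_run_dir and its under_hydra flag are hypothetical and not part of myriad. The outputs/YYYY-MM-DD/HH-MM-SS layout is an assumption inferred from the example paths used elsewhere on this page. The sketch only computes the path and does not create the directory.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def resolve_run_dir(under_hydra: bool, now: Optional[datetime] = None) -> Path:
    """Hypothetical helper (not part of myriad): choose a run's output directory."""
    if under_hydra:
        # Per the description above, the current directory is used as-is under Hydra.
        return Path.cwd()
    now = now or datetime.now()
    # Assumed layout, inferred from example paths such as outputs/2026-02-12/14-30-52.
    return Path("outputs") / now.strftime("%Y-%m-%d") / now.strftime("%H-%M-%S")


run_dir = resolve_run_dir(under_hydra=False, now=datetime(2026, 2, 12, 14, 30, 52))
print(run_dir.as_posix())  # outputs/2026-02-12/14-30-52
```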

myriad.platform.evaluate(config, agent_state=None, agent=None, return_episodes=False, save_episodes_to_disk_flag=None)[source]

Evaluation-only entry point (no training).

Useful for:

  • Non-learning controllers (random, bang-bang, PID)

  • Pre-trained models

  • Baseline comparisons

  • Benchmarking and validation

Output directory is automatically managed:

  • Under Hydra: uses current directory (Hydra-managed)

  • Otherwise: creates timestamped directory in outputs/

Parameters:
  • config (EvalConfig) – EvalConfig specifying environment, agent, and evaluation parameters. Use config_to_eval_config() to convert a training Config if needed.

  • agent_state (AgentState | None) – Optional pre-initialized agent state. If None, agent will be initialized with random weights using config.run.seed.

  • agent (Agent | None) – Optional pre-built Agent instance. If provided, config.agent is used only for logging/metadata and the supplied agent runs instead. Use this for agents whose constructor requires non-serializable arguments (e.g. a JAX array schedule for myriad.agents.classical.open_loop).

  • return_episodes (bool) – If True, return full episode trajectories in EvaluationResults.episodes. This includes observations, actions, rewards, and dones for each step.

  • save_episodes_to_disk_flag (bool | None) – If True, save episodes to disk (respects config settings). If None, infers from config.run.eval_episode_save_frequency. Episodes can be saved to disk without keeping them in memory (return_episodes=False).

Returns:

EvaluationResults containing:

  • Summary statistics (mean_return, std_return, min, max)

  • Raw episode data (episode_returns, episode_lengths)

  • Optional trajectory data (if return_episodes=True)

  • Metadata (num_episodes, seed)

Return type:

EvaluationResults
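To make the relationship between the raw episode data and the summary statistics concrete, here is a minimal sketch that derives the summary fields from a list of episode returns. The function summarize_returns is hypothetical, and the use of the population standard deviation is an assumption; the library may compute std_return differently.

```python
import statistics
from typing import Dict, Sequence


def summarize_returns(episode_returns: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: summary statistics over raw episode returns."""
    return {
        "mean_return": statistics.fmean(episode_returns),
        # Assumption: population standard deviation (divides by N, not N - 1).
        "std_return": statistics.pstdev(episode_returns),
        "min_return": min(episode_returns),
        "max_return": max(episode_returns),
        "num_episodes": len(episode_returns),
    }


summary = summarize_returns([1.0, 2.0, 3.0])
print(summary["mean_return"], summary["min_return"], summary["max_return"])
```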

class myriad.platform.TrainingResults(agent_state, training_metrics, eval_metrics, config, run_dir, final_env_state=None)[source]

Bases: object

Complete results from a training run.

Returned by train_and_evaluate() and contains everything needed to:

  • Use the trained agent for inference

  • Analyze training progress

  • Reproduce the run

  • Resume training (optional)

agent_state: Any

Trained agent state (can be used for inference with agent.select_action()).

training_metrics: TrainingMetrics

Training metrics history (loss, reward, etc.).

eval_metrics: EvaluationMetrics

Evaluation metrics history (episode returns, lengths).

config: Config

Configuration used for this training run (for reproducibility).

run_dir: Path

Directory where training outputs were saved.

final_env_state: Any | None = None

Final state of training environments (can be used to resume training).

summary()[source]

Get summary statistics for quick inspection.

Returns:

Dictionary with key metrics:

  • final_eval_return_mean: Mean return from last evaluation checkpoint

  • final_eval_return_std: Std deviation from last evaluation checkpoint

  • training_steps_per_env: Environment steps per individual environment

  • training_global_steps: Total global environment steps across all envs

  • num_eval_checkpoints: Number of evaluations performed

Return type:

dict

__repr__()[source]

Human-readable summary of training results.

save(directory, save_checkpoint=False)[source]

Save results and optionally agent checkpoint to directory.

Saves:

  • .hydra/config.yaml: Configuration used for the run

  • results.pkl: TrainingResults without agent_state

  • checkpoints/final.msgpack: Agent state (if save_checkpoint=True)

Note: The agent_state is excluded from results.pkl and saved separately using Flax msgpack serialization for reliability with JAX/Flax objects.
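The split described in the note can be sketched as follows. ResultsSketch and split_for_saving are hypothetical stand-ins, not the real TrainingResults or its save() method. The sketch only shows the idea of pickling every field except agent_state and handing the agent state back for separate serialization.

```python
import dataclasses
import pickle
from typing import Any, Dict, Tuple


@dataclasses.dataclass
class ResultsSketch:
    """Hypothetical stand-in for a results object; not the real TrainingResults."""

    agent_state: Any
    mean_return: float
    seed: int


def split_for_saving(results: ResultsSketch) -> Tuple[bytes, Any]:
    # Every field except agent_state goes into the pickle payload.
    payload: Dict[str, Any] = {
        f.name: getattr(results, f.name)
        for f in dataclasses.fields(results)
        if f.name != "agent_state"
    }
    # The agent state is returned untouched so it can be serialized separately.
    return pickle.dumps(payload), results.agent_state


blob, state = split_for_saving(ResultsSketch({"w": [1, 2]}, mean_return=1.5, seed=42))
print(pickle.loads(blob))  # {'mean_return': 1.5, 'seed': 42}
```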

Parameters:
  • directory (Path | str) – Directory to save results to (typically Hydra output directory)

  • save_checkpoint (bool) – Whether to save agent checkpoint

Raises:

RuntimeError – If agent checkpoint serialization fails

Example

>>> results = train_and_evaluate(config)
>>> results.save(Path.cwd(), save_checkpoint=True)

static load(directory)[source]

Load results from directory.

Parameters:

directory (Path | str) – Directory containing results.pkl

Returns:

Loaded TrainingResults object

Return type:

TrainingResults

Example

>>> results = TrainingResults.load("outputs/2026-02-12/14-30-52")
>>> print(results.summary())

save_agent(path)[source]

Save trained agent state to file using Flax msgpack serialization.

Parameters:

path (str | Path) – Path to save the agent state (typically with .msgpack extension)

Raises:

RuntimeError – If serialization fails

Example

>>> results = train_and_evaluate(config)
>>> results.save_agent("trained_agent.msgpack")

static load_agent(path)[source]

Load agent state from file.

Parameters:

path (str | Path) – Path to the saved agent state file

Returns:

The loaded agent state (can be passed to evaluate())

Return type:

Any

Example

>>> agent_state = TrainingResults.load_agent("trained_agent.msgpack")
>>> results = evaluate(config, agent_state=agent_state)

__init__(agent_state, training_metrics, eval_metrics, config, run_dir, final_env_state=None)

class myriad.platform.TrainingMetrics(global_steps, steps_per_env, loss=None, reward=None, agent_metrics=None)[source]

Bases: object

Training metrics collected at each logging checkpoint.

Metrics are captured at intervals defined by eval_frequency in the run config. Each list contains one entry per logging checkpoint.
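As a rough illustration of how the per-checkpoint lists line up, the sketch below computes the expected steps_per_env and global_steps values. It rests on two assumptions that this page does not confirm: a checkpoint occurs at every multiple of eval_frequency, and global steps equal steps per environment times the number of parallel environments. The function checkpoint_steps is hypothetical.

```python
from typing import List, Tuple


def checkpoint_steps(
    steps_per_env: int, eval_frequency: int, num_envs: int
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: (global_steps, steps_per_env) at each checkpoint."""
    if eval_frequency <= 0:
        # eval_frequency=0 is documented as "disabled": no checkpoints.
        return [], []
    per_env = list(range(eval_frequency, steps_per_env + 1, eval_frequency))
    # Assumption: global steps are the per-env steps summed over all envs.
    global_steps = [s * num_envs for s in per_env]
    return global_steps, per_env


global_steps, per_env = checkpoint_steps(steps_per_env=1000, eval_frequency=100, num_envs=4)
print(len(per_env), per_env[0], global_steps[0])  # 10 100 400
```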

global_steps: list[int]

Global environment steps at each checkpoint (total across all envs).

steps_per_env: list[int]

Steps per individual environment at each checkpoint.

loss: list[float] | None = None

Training loss values (if available from agent).

reward: list[float] | None = None

Mean reward per step (if available).

agent_metrics: dict[str, list[float]] | None = None

Agent-specific metrics (e.g., q_value, td_error for DQN).

__init__(global_steps, steps_per_env, loss=None, reward=None, agent_metrics=None)

class myriad.platform.EvaluationMetrics(global_steps, steps_per_env, episode_returns, episode_lengths, mean_return, std_return, mean_length)[source]

Bases: object

Evaluation metrics collected at each evaluation checkpoint.

Metrics are captured at intervals defined by eval_frequency in the run config. Each list contains one entry per evaluation checkpoint.

global_steps: list[int]

Global environment steps at each evaluation (total across all envs).

steps_per_env: list[int]

Steps per individual environment at each evaluation.

episode_returns: list[ndarray]

Raw episode returns from each evaluation. Each array contains returns from eval_rollouts episodes.

episode_lengths: list[ndarray]

Raw episode lengths from each evaluation. Each array contains lengths from eval_rollouts episodes.

mean_return: list[float]

Mean episode return at each evaluation.

std_return: list[float]

Standard deviation of episode returns at each evaluation.

mean_length: list[float]

Mean episode length at each evaluation.

__init__(global_steps, steps_per_env, episode_returns, episode_lengths, mean_return, std_return, mean_length)

class myriad.platform.EvaluationResults(mean_return, std_return, min_return, max_return, mean_length, std_length, min_length, max_length, episode_returns, episode_lengths, num_episodes, seed, config, run_dir, episodes=None, agent_state=None)[source]

Bases: object

Results from an evaluation-only run.

Returned by evaluate() and contains:

  • Summary statistics (mean, std, min, max)

  • Raw episode data (for custom analysis)

  • Optional trajectory data (if return_episodes=True)

  • Metadata (seed, num_episodes, config)

mean_return: float

Mean episode return across all episodes.

__init__(mean_return, std_return, min_return, max_return, mean_length, std_length, min_length, max_length, episode_returns, episode_lengths, num_episodes, seed, config, run_dir, episodes=None, agent_state=None)

std_return: float

Standard deviation of episode returns.

min_return: float

Minimum episode return.

max_return: float

Maximum episode return.

mean_length: float

Mean episode length (number of steps).

std_length: float

Standard deviation of episode lengths.

min_length: int

Minimum episode length.

max_length: int

Maximum episode length.

episode_returns: ndarray

Raw episode returns. Shape: (num_episodes,)

episode_lengths: ndarray

Raw episode lengths. Shape: (num_episodes,)

num_episodes: int

Number of episodes evaluated.

seed: int

Random seed used for evaluation.

config: EvalConfig

Evaluation configuration used (for reproducibility).

run_dir: Path

Directory where evaluation outputs were saved.

episodes: dict[str, ndarray] | None = None

Full episode trajectories (if return_episodes=True). Contains:

  • observations: Shape (num_episodes, max_steps, obs_dim)

  • actions: Shape (num_episodes, max_steps, ...)

  • rewards: Shape (num_episodes, max_steps)

  • dones: Shape (num_episodes, max_steps)

agent_state: Any | None = None

Agent state used for evaluation (if provided).

save(directory, save_checkpoint=False)[source]

Save results and optionally agent checkpoint to directory.

Saves:

  • .hydra/config.yaml: Configuration used for the run (if config is present)

  • results.pkl: EvaluationResults without agent_state

  • checkpoints/final.msgpack: Agent state (if save_checkpoint=True and agent_state exists)

Note: The agent_state is excluded from results.pkl and saved separately using Flax msgpack serialization for reliability with JAX/Flax objects.

Parameters:
  • directory (Path | str) – Directory to save results to (typically Hydra output directory)

  • save_checkpoint (bool) – Whether to save agent checkpoint

Raises:

RuntimeError – If agent checkpoint serialization fails

Example

>>> results = evaluate(config, agent_state=agent_state)
>>> results.save(Path.cwd(), save_checkpoint=True)

static load(directory)[source]

Load results from directory.

Parameters:

directory (Path | str) – Directory containing results.pkl

Returns:

Loaded EvaluationResults object

Return type:

EvaluationResults

Example

>>> results = EvaluationResults.load("outputs/2026-02-12/14-30-52")
>>> print(results.summary())

summary()[source]

Get summary statistics for quick inspection.

Returns:

Dictionary with key metrics:

  • mean_return: Mean episode return

  • std_return: Standard deviation of returns

  • min_return: Minimum return

  • max_return: Maximum return

  • mean_length: Mean episode length

  • num_episodes: Number of episodes evaluated

Return type:

dict

__repr__()[source]

Human-readable summary of evaluation results.

class myriad.platform.SessionLogger(wandb_run, run_dir, seed=0)[source]

Bases: object

Unified logger for training and evaluation sessions.

Focuses on logging metrics and episodes during runs. Artifact persistence (saving results, checkpoints) is handled by the result objects themselves.

Handles three destinations automatically:

  1. Memory - Captures metrics for return values

  2. Disk - Saves episode trajectories

  3. Remote - Logs to W&B (metrics + artifacts)

Example

>>> logger = SessionLogger.for_training(config)
>>> logger.log_training_step(...)
>>> logger.log_evaluation(..., save_episodes=True)
>>> training_metrics, eval_metrics = logger.get_results()
>>> logger.finalize()

__init__(wandb_run, run_dir, seed=0)[source]

Initialize the session logger.

Parameters:
  • wandb_run (Any | None) – W&B run instance (None to disable remote logging)

  • run_dir (Path) – Base directory for outputs (episode files, etc.)

  • seed (int) – Random seed for metadata

classmethod for_training(config, run_dir=None)[source]

Create a logger for training sessions.

Parameters:
  • config (Config) – Training configuration

  • run_dir (Path | None) – Output directory for artifacts (default: current directory)

Returns:

Configured SessionLogger instance

Return type:

SessionLogger

classmethod for_evaluation(config, run_dir=None)[source]

Create a logger for evaluation-only sessions.

Parameters:
  • config (EvalConfig) – Evaluation configuration

  • run_dir (Path | None) – Output directory for artifacts (default: current directory)

Returns:

Configured SessionLogger instance

Return type:

SessionLogger

log_training_step(global_step, steps_per_env, metrics_history, steps_this_chunk)[source]

Log training metrics.

Handles memory capture + W&B logging.

Parameters:
  • global_step (int) – Global environment steps

  • steps_per_env (int) – Steps per individual environment

  • metrics_history (dict[str, Any]) – Raw metrics from the training loop

  • steps_this_chunk (int) – Number of steps in this chunk

log_evaluation(global_step, steps_per_env, eval_results, save_episodes=False, episode_save_count=None)[source]

Log evaluation results.

A single call:

  • Captures metrics to memory

  • Saves episodes to disk (if save_episodes=True)

  • Logs metrics to W&B

  • Uploads episode artifacts to W&B

Parameters:
  • global_step (int) – Global environment steps

  • steps_per_env (int) – Steps per individual environment

  • eval_results (dict[str, Any]) – Dictionary with 'episode_return', 'episode_length', 'dones', and optionally 'episodes' (trajectory data)

  • save_episodes (bool) – If True, save episodes to disk and log to W&B

  • episode_save_count (int | None) – Number of episodes to save (None = all available)

Returns:

Path to saved episodes directory (if saved), else None

Return type:

Path | None

get_results()[source]

Return captured metrics without closing the session.

finalize(exit_code=0)[source]

Close the W&B run.

Parameters:

exit_code (int) – 0 for clean/intentional exit (finished, killed by sweep agent, user-stopped), 1 for unexpected failure (OOM, crash).
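One way to apply the exit-code convention above is to wrap the session in try/finally so the logger is always closed. This is a sketch under stated assumptions: StubLogger is a hypothetical stand-in with the same finalize(exit_code) signature as SessionLogger, and run_session is not a myriad function. Treating KeyboardInterrupt as an intentional stop follows the "user-stopped" wording above.

```python
from typing import Any, Callable, Optional


class StubLogger:
    """Hypothetical stand-in exposing only finalize(exit_code), like SessionLogger."""

    def __init__(self) -> None:
        self.exit_code: Optional[int] = None

    def finalize(self, exit_code: int = 0) -> None:
        self.exit_code = exit_code


def run_session(logger: StubLogger, work: Callable[[], Any]) -> Any:
    exit_code = 0  # clean or intentional exit unless an unexpected error occurs
    try:
        return work()
    except KeyboardInterrupt:
        raise  # user-stopped: keep exit_code 0, as described above
    except Exception:
        exit_code = 1  # unexpected failure
        raise
    finally:
        # Runs on every path, so the session is always closed exactly once.
        logger.finalize(exit_code=exit_code)


logger = StubLogger()
run_session(logger, lambda: "done")
print(logger.exit_code)  # 0
```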

log_videos(episode_dir, render_frame_fn, global_step, fps=50, max_episodes=None, video_dir=None)[source]

Render saved episodes to videos and log to W&B.

Parameters:
  • episode_dir (Path) – Path to directory containing .npz episode files

  • render_frame_fn (Callable[[ndarray], ndarray]) – Function that takes observation array and returns RGB frame

  • global_step (int) – Global environment steps (for W&B logging step)

  • fps (int) – Frames per second for rendered videos

  • max_episodes (int | None) – Maximum number of episodes to render (None = all)

  • video_dir (Path | None) – Optional output directory for videos (if None, creates temporary videos)

property wandb_run: Any | None

Get the underlying W&B run instance.

property episode_base_dir: Path

Get the base directory for episode storage.

myriad.platform.load_run(run_path)[source]

Load all artifacts from a run directory.

This is the main entry point for loading runs. It loads config, results, and metadata in one call. Agent checkpoints can be loaded on demand.

Parameters:

run_path (str | Path) – Path to run directory

Returns:

RunArtifacts container with all run data

Return type:

RunArtifacts

Example

>>> run = load_run("outputs/2026-02-12/14-30-52")
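>>> # 'mean_return' is the EvaluationResults.summary() key; for a training run use 'final_eval_return_mean'
>>> print(f"Final return: {run.results.summary()['mean_return']}")
>>> agent_state = run.load_checkpoint()  # Loaded from disk on demand

myriad.platform.load_run_config(run_path)[source]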

Load config from run directory.

Loads from .hydra/config.yaml and validates with Pydantic. Requires run_metadata.yaml to determine config type.

Parameters:

run_path (str | Path) – Path to run directory

Returns:

Config or EvalConfig depending on run type

Return type:

Config | EvalConfig

Example

>>> config = load_run_config("outputs/2026-02-12/14-30-52")
>>> print(config.run.seed)

myriad.platform.load_run_results(run_path)[source]

Load results from run directory.

Parameters:

run_path (str | Path) – Path to run directory

Returns:

TrainingResults or EvaluationResults

Return type:

TrainingResults | EvaluationResults

Example

>>> results = load_run_results("outputs/2026-02-12/14-30-52")
>>> print(results.summary())

myriad.platform.load_run_checkpoint(run_path, checkpoint='final')[source]

Load agent checkpoint from run directory.

Parameters:
  • run_path (str | Path) – Path to run directory

  • checkpoint (str) – Checkpoint name (default: "final")

Returns:

Agent state from checkpoint

Return type:

Any

Example

>>> agent_state = load_run_checkpoint("outputs/2026-02-12/14-30-52")
>>> # Use with evaluate()
>>> results = evaluate(config, agent_state=agent_state)

myriad.platform.load_run_metadata(run_path)[source]

Load run metadata from run directory.

Parameters:

run_path (str | Path) – Path to run directory

Returns:

Dictionary with metadata (run_type, timestamp, git_hash, versions)

Raises:

FileNotFoundError – If metadata file not found

Return type:

dict

Example

>>> metadata = load_run_metadata("outputs/2026-02-12/14-30-52")
>>> print(metadata["git_hash"])

class myriad.platform.RunArtifacts(config, results, metadata, run_path)[source]

Bases: Generic[ConfigT, ResultsT]

Container for all artifacts from a run.

Provides a unified interface to access configs, results, metadata, and optionally load checkpoints.

Type parameters:

  • ConfigT: Config or EvalConfig

  • ResultsT: TrainingResults or EvaluationResults

config: ConfigT

Configuration used for this run.

results: ResultsT

Results from the run.

metadata: dict

Run metadata (timestamp, git hash, versions).

__init__(config, results, metadata, run_path)

run_path: Path

Path to the run directory.

load_checkpoint(checkpoint='final')[source]

Load agent checkpoint from disk.

Always loads fresh from disk (no caching).

Parameters:

checkpoint (str) – Checkpoint name (default: "final")

Returns:

Agent state from checkpoint

Return type:

Any

myriad.platform.fetch_run(run_id)[source]

Fetch a single W&B run by its fully-qualified ID.

Parameters:

run_id (str) – Fully-qualified run ID (entity/project/run_id).

Returns:

A wandb.Run object.

Return type:

Any

myriad.platform.fetch_sweep_runs(sweep_id, *, state=None)[source]

Fetch runs from a W&B sweep, optionally filtered by state.

Parameters:
  • sweep_id (str) – Fully-qualified sweep ID (entity/project/sweep_id).

  • state (str | None) – If provided, only return runs with this state (e.g. "finished", "running", "crashed"). If None, return all runs.

Returns:

List of wandb.Run objects.

Return type:

list[Any]

myriad.platform.fetch_top_k_runs(sweep_id, metric, top_k, *, maximize)[source]

Return the top-K finished runs from a W&B sweep, sorted by metric.

Parameters:
  • sweep_id (str) – Fully-qualified sweep ID (entity/project/sweep_id).

  • metric (str) – W&B summary metric name to rank by (e.g. eval/return/best).

  • top_k (int) – Number of top runs to return.

  • maximize (bool) – If True, sort descending (higher is better). If False, ascending.

Returns:

List of wandb.Run objects, length ≤ top_k.

Return type:

list[Any]
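The ranking described for fetch_top_k_runs() can be approximated offline. In this sketch, SimpleNamespace objects stand in for wandb.Run and carry only state and summary attributes. The function top_k_finished is hypothetical, and skipping runs that lack the metric is an assumption about how missing values are handled.

```python
from types import SimpleNamespace
from typing import Any, List


def top_k_finished(runs: List[Any], metric: str, top_k: int, *, maximize: bool) -> List[Any]:
    """Hypothetical helper: rank finished runs by a summary metric."""
    # Assumption: runs that never logged the metric are skipped rather than ranked last.
    candidates = [r for r in runs if r.state == "finished" and metric in r.summary]
    candidates.sort(key=lambda r: r.summary[metric], reverse=maximize)
    return candidates[:top_k]  # may be shorter than top_k


runs = [
    SimpleNamespace(name="a", state="finished", summary={"eval/return/best": 10.0}),
    SimpleNamespace(name="b", state="crashed", summary={"eval/return/best": 99.0}),
    SimpleNamespace(name="c", state="finished", summary={"eval/return/best": 25.0}),
    SimpleNamespace(name="d", state="finished", summary={}),
]
best = top_k_finished(runs, "eval/return/best", top_k=2, maximize=True)
print([r.name for r in best])  # ['c', 'a']
```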

myriad.platform.config_from_wandb_run(run)[source]

Reconstruct a Config from a W&B run object.

W&B stores the full model_dump() nested dict in run.config. Filters W&B-internal metadata and unwraps sweep param wrappers before passing to Config.model_validate.

Parameters:

run (Any) – A wandb.Run object (from e.g. wandb.Api().run(...)).

Returns:

A validated Config instance.

Return type:

Config
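The filtering and unwrapping step can be pictured with the sketch below. Both rules are assumptions for illustration: that W&B-internal keys are the ones starting with an underscore (such as "_wandb"), and that a sweep wrapper is a dict whose only key is "value". The function clean_wandb_config is hypothetical, and the real implementation may use different criteria.

```python
from typing import Any, Dict


def _unwrap(value: Any) -> Any:
    if isinstance(value, dict):
        # Assumed wrapper shape: {"value": x}. A genuine config dict whose
        # only key is "value" would also be unwrapped by this rule.
        if set(value) == {"value"}:
            return _unwrap(value["value"])
        return {k: _unwrap(v) for k, v in value.items()}
    return value


def clean_wandb_config(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: prepare run.config for Config.model_validate."""
    # Assumption: top-level keys starting with "_" are W&B-internal metadata.
    return {k: _unwrap(v) for k, v in raw.items() if not k.startswith("_")}


raw = {"_wandb": {"cli_version": "x"}, "run": {"seed": {"value": 42}}, "agent": {"name": "dqn"}}
print(clean_wandb_config(raw))  # {'run': {'seed': 42}, 'agent': {'name': 'dqn'}}
```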

myriad.platform.runs_to_dataframe(runs, metrics=None)[source]

Convert a list of W&B runs to a Polars DataFrame.

Each row corresponds to one run. Config fields are flattened with dot-separated keys (e.g. agent.lr). Summary metrics are included as-is.

Parameters:
  • runs (list[Any]) – List of wandb.Run objects.

  • metrics (list[str] | None) – If provided, include only these summary metric keys. If None, include all summary keys that don’t start with _.

Returns:

A polars.DataFrame with one row per run.

Return type:

DataFrame
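The dot-separated flattening described above can be sketched in plain Python. The helpers flatten and run_to_row are hypothetical and build one dictionary per run. Constructing the Polars DataFrame from those rows is left out so the sketch has no third-party dependencies.

```python
from typing import Any, Dict, List, Optional


def flatten(nested: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dot-separated keys, e.g. {"agent": {"lr": 1}} -> {"agent.lr": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat


def run_to_row(config: Dict[str, Any], summary: Dict[str, Any],
               metrics: Optional[List[str]] = None) -> Dict[str, Any]:
    row = flatten(config)
    if metrics is None:
        # Default described above: every summary key that does not start with "_".
        row.update({k: v for k, v in summary.items() if not k.startswith("_")})
    else:
        row.update({k: summary[k] for k in metrics if k in summary})
    return row


row = run_to_row({"agent": {"lr": 0.001}, "run": {"seed": 42}},
                 {"eval/return/best": 25.0, "_runtime": 12})
print(row)  # {'agent.lr': 0.001, 'run.seed': 42, 'eval/return/best': 25.0}
```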

myriad.platform.save_agent_state(agent_state, path)[source]

Serialize and save agent state to file.

Parameters:
  • agent_state (Any) – Agent state to save

  • path (str | Path) – File path (typically with .msgpack extension)

Raises:

RuntimeError – If serialization or file writing fails

myriad.platform.load_agent_state(path)[source]

Load and deserialize agent state from file.

Parameters:

path (str | Path) – File path to load from

Returns:

Deserialized agent state

Return type:

Any

myriad.platform.serialize_agent_state(agent_state)[source]

Serialize agent state to msgpack bytes.

Parameters:

agent_state (Any) – Agent state to serialize (typically Flax TrainState or similar)

Returns:

Serialized bytes

Raises:

RuntimeError – If serialization fails

Return type:

bytes

myriad.platform.deserialize_agent_state(data)[source]

Deserialize agent state from msgpack bytes.

Parameters:

data (bytes) – Msgpack-serialized bytes

Returns:

Deserialized agent state

Raises:

RuntimeError – If deserialization fails

Return type:

Any

Config builder utilities for programmatic use.

This module provides high-level functions to create training and evaluation configs without requiring detailed knowledge of Pydantic models.

myriad.configs.builder.create_config(env, agent, num_envs=1, steps_per_env=1000, rollout_steps=None, eval_max_steps=None, eval_frequency=100, eval_rollouts=10, seed=42, wandb_enabled=False, **kwargs)[source]

Create a training config with sensible defaults.

This is the recommended way to create configs programmatically. It provides a simpler interface than constructing nested Pydantic models.

Parameters:
  • env (str) – Environment name (e.g., "cartpole-control", "ccas-ccar-control")

  • agent (str) – Agent name (e.g., "dqn", "pqn", "random")

  • num_envs (int) – Number of parallel environments to run

  • steps_per_env (int) – Number of steps to run per environment

  • rollout_steps (int | None) – Number of steps to collect per environment before updating (for on-policy agents only). If None, defaults to 2 for on-policy agents.

  • eval_max_steps (int | None) – Maximum steps per evaluation episode. If None, uses environment-specific default from registry or Config models.

  • eval_frequency (int) – Log and evaluate every N steps-per-env (0 to disable)

  • eval_rollouts (int) – Number of episodes to run during evaluation

  • seed (int) – Random seed for reproducibility

  • wandb_enabled (bool) – Enable Weights & Biases logging

  • **kwargs (Any) – Additional config overrides. Can specify nested parameters using dot notation (e.g., agent.learning_rate=1e-3) or pass dicts for nested configs (e.g., wandb={"project": "my-project"}).

Returns:

Fully configured Config object ready for train_and_evaluate()

Return type:

Config
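A dotted name such as agent.learning_rate is not a valid Python keyword, so an override of that form has to be passed by unpacking a dictionary, for example **{"agent.learning_rate": 1e-3}. The sketch below shows one plausible way such overrides could be merged into a nested config dictionary. The function apply_overrides is hypothetical, and how create_config() resolves overrides internally is not stated on this page.

```python
import copy
from typing import Any, Dict


def apply_overrides(base: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: merge dotted-key and dict-valued overrides into a nested dict."""
    result = copy.deepcopy(base)  # leave the caller's dict untouched
    for key, value in overrides.items():
        *parents, leaf = key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
        if isinstance(value, dict) and isinstance(node.get(leaf), dict):
            node[leaf].update(value)  # dict override: shallow-merge into the existing section
        else:
            node[leaf] = value
    return result


base = {"agent": {"name": "dqn", "learning_rate": 1e-4}, "wandb": {"enabled": False}}
merged = apply_overrides(base, {"agent.learning_rate": 1e-3, "wandb": {"project": "my-project"}})
print(merged["agent"]["learning_rate"], merged["wandb"])
```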

myriad.configs.builder.create_eval_config(env, agent, eval_rollouts=10, eval_max_steps=None, seed=42, wandb_enabled=False, **kwargs)[source]

Create an evaluation-only config with sensible defaults.

Use this for evaluating non-learning controllers (random, PID, bang-bang) or pre-trained models without any training.

Parameters:
  • env (str) – Environment name (e.g., "cartpole-control")

  • agent (str) – Agent name (e.g., "random", "dqn")

  • eval_rollouts (int) – Number of episodes to evaluate

  • eval_max_steps (int | None) – Maximum steps per episode. If None, uses environment-specific default from registry or Config models.

  • seed (int) – Random seed for reproducibility

  • wandb_enabled (bool) – Enable Weights & Biases logging

  • **kwargs (Any) – Additional config overrides (same as create_config)

Returns:

Fully configured EvalConfig object ready for evaluate()

Return type:

EvalConfig