Platform¶

Platform module for training and evaluation infrastructure.

myriad.platform.train_and_evaluate(config, agent=None)[source]¶

Main entry point for a training run. Initializes everything and runs the outer training loop.

Output directory is automatically managed:

- Under Hydra: uses the current directory (Hydra-managed)
- Otherwise: creates a timestamped directory in outputs/

Parameters:

config (Config) – Training configuration specifying environment, agent, and run parameters.
agent (Agent | None) – Optional pre-built Agent instance. If provided, config.agent is used only for logging/metadata and the supplied agent runs instead.

Returns:

TrainingResults containing:

agent_state: Trained agent (ready for inference)
training_metrics: Training history (loss, reward, etc.)
eval_metrics: Evaluation history (episode returns, lengths)
config: Configuration used (for reproducibility)
final_env_state: Final environment states (can be used to resume training)

Return type:

TrainingResults
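The output-directory rule described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `resolve_run_dir` and its `under_hydra` flag are hypothetical names, and the `outputs/<date>/<time>` layout is an assumption modeled on the example paths used elsewhere on this page.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def resolve_run_dir(
    under_hydra: bool,
    now: Optional[datetime] = None,
    root: Path = Path("outputs"),
) -> Path:
    """Hypothetical helper mirroring the documented output-directory rule."""
    if under_hydra:
        # Documented behavior: use the current (Hydra-managed) directory.
        return Path.cwd()
    # Assumed layout: outputs/YYYY-MM-DD/HH-MM-SS (not confirmed by the docs).
    now = now or datetime.now()
    return root / now.strftime("%Y-%m-%d") / now.strftime("%H-%M-%S")
```

Passing an explicit `now` keeps the helper deterministic, which is convenient when the path is compared in tests.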

myriad.platform.evaluate(config, agent_state=None, agent=None, return_episodes=False, save_episodes_to_disk_flag=None)[source]¶

Evaluation-only entry point (no training).

Useful for:

- Non-learning controllers (random, bang-bang, PID)
- Pre-trained models
- Baseline comparisons
- Benchmarking and validation

Output directory is automatically managed:

- Under Hydra: uses the current directory (Hydra-managed)
- Otherwise: creates a timestamped directory in outputs/

Parameters:

config (EvalConfig) – EvalConfig specifying environment, agent, and evaluation parameters. Use config_to_eval_config() to convert a training Config if needed.
agent_state (AgentState | None) – Optional pre-initialized agent state. If None, agent will be initialized with random weights using config.run.seed.
agent (Agent | None) – Optional pre-built Agent instance. If provided, config.agent is used only for logging/metadata and the supplied agent runs instead. Use this for agents whose constructor requires non-serializable arguments (e.g. a JAX array schedule for myriad.agents.classical.open_loop).
return_episodes (bool) – If True, return full episode trajectories in EvaluationResults.episodes. This includes observations, actions, rewards, and dones for each step.
save_episodes_to_disk_flag (bool | None) – If True, save episodes to disk (respects config settings). If None, infers from config.run.eval_episode_save_frequency. Episodes can be saved to disk without keeping them in memory (return_episodes=False).

Returns:

EvaluationResults containing:

Summary statistics (mean_return, std_return, min, max)
Raw episode data (episode_returns, episode_lengths)
Optional trajectory data (if return_episodes=True)
Metadata (num_episodes, seed)

Return type:

EvaluationResults
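The summary statistics listed above are plain aggregates over the per-episode returns. A minimal sketch using only the standard library; `summarize_returns` is a hypothetical helper, and the use of the population standard deviation (NumPy's default) is an assumption, since this page does not say which estimator the library uses.

```python
import statistics
from typing import Sequence


def summarize_returns(episode_returns: Sequence[float]) -> dict:
    """Hypothetical stand-in for the summary fields of EvaluationResults."""
    return {
        "mean_return": statistics.fmean(episode_returns),
        # Population standard deviation; whether the library uses the
        # population or the sample estimator is an assumption here.
        "std_return": statistics.pstdev(episode_returns),
        "min_return": min(episode_returns),
        "max_return": max(episode_returns),
        "num_episodes": len(episode_returns),
    }
```

The same aggregates can be recomputed from the raw `episode_returns` field whenever a different statistic (median, percentiles) is needed.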

class myriad.platform.TrainingResults(agent_state, training_metrics, eval_metrics, config, run_dir, final_env_state=None)[source]¶

Bases: object

Complete results from a training run.

Returned by train_and_evaluate() and contains everything needed to:

Use the trained agent for inference
Analyze training progress
Reproduce the run
Resume training (optional)

agent_state: Any¶: Trained agent state (can be used for inference with agent.select_action()).

training_metrics: TrainingMetrics¶: Training metrics history (loss, reward, etc.).

eval_metrics: EvaluationMetrics¶: Evaluation metrics history (episode returns, lengths).

config: Config¶: Configuration used for this training run (for reproducibility).

run_dir: Path¶: Directory where training outputs were saved.

final_env_state: Any | None = None¶: Final state of training environments (can be used to resume training).

summary()[source]¶

Get summary statistics for quick inspection.

Returns:

Dictionary with key metrics:

final_eval_return_mean: Mean return from last evaluation checkpoint
final_eval_return_std: Std deviation from last evaluation checkpoint
training_steps_per_env: Environment steps per individual environment
training_global_steps: Total global environment steps across all envs
num_eval_checkpoints: Number of evaluations performed

Return type:

dict

__repr__()[source]¶

Human-readable summary of training results.

save(directory, save_checkpoint=False)[source]¶

Save results and optionally agent checkpoint to directory.

Saves:

- .hydra/config.yaml: Configuration used for the run
- results.pkl: TrainingResults without agent_state
- checkpoints/final.msgpack: Agent state (if save_checkpoint=True)

Note: The agent_state is excluded from results.pkl and saved separately using Flax msgpack serialization for reliability with JAX/Flax objects.

Parameters:

directory (Path | str) – Directory to save results to (typically Hydra output directory)
save_checkpoint (bool) – Whether to save agent checkpoint

Raises:

RuntimeError – If agent checkpoint serialization fails

Example

>>> results = train_and_evaluate(config)
>>> results.save(Path.cwd(), save_checkpoint=True)
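The file layout that save() is documented to produce can be written down as a small checklist. The helper below is hypothetical (it is not part of myriad) and only encodes the three paths named above.

```python
from pathlib import Path
from typing import List, Union


def expected_artifacts(directory: Union[str, Path], save_checkpoint: bool = False) -> List[Path]:
    """List the files the docs say save() writes (hypothetical helper)."""
    root = Path(directory)
    files = [root / ".hydra" / "config.yaml", root / "results.pkl"]
    if save_checkpoint:
        # The agent state is stored separately from results.pkl.
        files.append(root / "checkpoints" / "final.msgpack")
    return files
```

A check such as `all(p.exists() for p in expected_artifacts(run_dir, save_checkpoint=True))` could then confirm that a run directory is complete before it is archived.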

static load(directory)[source]¶

Load results from directory.

Parameters: directory (Path | str) – Directory containing results.pkl
Returns: Loaded TrainingResults object
Return type: TrainingResults

Example

>>> results = TrainingResults.load("outputs/2026-02-12/14-30-52")
>>> print(results.summary())

save_agent(path)[source]¶

Save trained agent state to file using Flax msgpack serialization.

Parameters: path (str | Path) – Path to save the agent state (typically with .msgpack extension)
Raises: RuntimeError – If serialization fails

Example

>>> results = train_and_evaluate(config)
>>> results.save_agent("trained_agent.msgpack")

static load_agent(path)[source]¶

Load agent state from file.

Parameters:

path (str | Path) – Path to the saved agent state file

Returns:

The loaded agent state (can be passed to evaluate())

Raises:

FileNotFoundError – If file doesn’t exist
RuntimeError – If deserialization fails

Return type:

Any

Example

>>> agent_state = TrainingResults.load_agent("trained_agent.msgpack")
>>> results = evaluate(config, agent_state=agent_state)

__init__(agent_state, training_metrics, eval_metrics, config, run_dir, final_env_state=None)¶

class myriad.platform.TrainingMetrics(global_steps, steps_per_env, loss=None, reward=None, agent_metrics=None)[source]¶

Bases: object

Training metrics collected at each logging checkpoint.

Metrics are captured at intervals defined by eval_frequency in the run config. Each list contains one entry per logging checkpoint.

global_steps: list[int]¶: Global environment steps at each checkpoint (total across all envs).

steps_per_env: list[int]¶: Steps per individual environment at each checkpoint.

loss: list[float] | None = None¶: Training loss values (if available from agent).

reward: list[float] | None = None¶: Mean reward per step (if available).

agent_metrics: dict[str, list[float]] | None = None¶: Agent-specific metrics (e.g., q_value, td_error for DQN).

__init__(global_steps, steps_per_env, loss=None, reward=None, agent_metrics=None)¶

class myriad.platform.EvaluationMetrics(global_steps, steps_per_env, episode_returns, episode_lengths, mean_return, std_return, mean_length)[source]¶

Bases: object

Evaluation metrics collected at each evaluation checkpoint.

Metrics are captured at intervals defined by eval_frequency in the run config. Each list contains one entry per evaluation checkpoint.

global_steps: list[int]¶: Global environment steps at each evaluation (total across all envs).

steps_per_env: list[int]¶: Steps per individual environment at each evaluation.

episode_returns: list[ndarray]¶: Raw episode returns from each evaluation. Each array contains returns from eval_rollouts episodes.

episode_lengths: list[ndarray]¶: Raw episode lengths from each evaluation. Each array contains lengths from eval_rollouts episodes.

mean_return: list[float]¶: Mean episode return at each evaluation.

std_return: list[float]¶: Standard deviation of episode returns at each evaluation.

mean_length: list[float]¶: Mean episode length at each evaluation.

__init__(global_steps, steps_per_env, episode_returns, episode_lengths, mean_return, std_return, mean_length)¶
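Because every list in EvaluationMetrics holds one entry per evaluation checkpoint, the lists can be indexed together. A minimal sketch of picking the best checkpoint; `EvalHistory` is a hypothetical stand-in that carries only two of the documented fields, and `best_checkpoint` is not a library function.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class EvalHistory:
    """Hypothetical stand-in with two of the documented list fields."""

    global_steps: List[int]
    mean_return: List[float]


def best_checkpoint(history: EvalHistory) -> Tuple[int, float]:
    """Return (global_step, mean_return) of the best evaluation checkpoint."""
    if not history.mean_return:
        raise ValueError("no evaluation checkpoints recorded")
    # Lists are index-aligned: entry i describes evaluation checkpoint i.
    idx = max(range(len(history.mean_return)), key=history.mean_return.__getitem__)
    return history.global_steps[idx], history.mean_return[idx]
```

The same indexing works on a real EvaluationMetrics object, since it exposes `global_steps` and `mean_return` under the same names.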

class myriad.platform.EvaluationResults(mean_return, std_return, min_return, max_return, mean_length, std_length, min_length, max_length, episode_returns, episode_lengths, num_episodes, seed, config, run_dir, episodes=None, agent_state=None)[source]¶

Bases: object

Results from an evaluation-only run.

Returned by evaluate() and contains:

Summary statistics (mean, std, min, max)
Raw episode data (for custom analysis)
Optional trajectory data (if return_episodes=True)
Metadata (seed, num_episodes, config)

mean_return: float¶: Mean episode return across all episodes.

__init__(mean_return, std_return, min_return, max_return, mean_length, std_length, min_length, max_length, episode_returns, episode_lengths, num_episodes, seed, config, run_dir, episodes=None, agent_state=None)¶

std_return: float¶: Standard deviation of episode returns.

min_return: float¶: Minimum episode return.

max_return: float¶: Maximum episode return.

mean_length: float¶: Mean episode length (number of steps).

std_length: float¶: Standard deviation of episode lengths.

min_length: int¶: Minimum episode length.

max_length: int¶: Maximum episode length.

episode_returns: ndarray¶: Raw episode returns. Shape: (num_episodes,)

episode_lengths: ndarray¶: Raw episode lengths. Shape: (num_episodes,)

num_episodes: int¶: Number of episodes evaluated.

seed: int¶: Random seed used for evaluation.

config: EvalConfig¶: Evaluation configuration used (for reproducibility).

run_dir: Path¶: Directory where evaluation outputs were saved.

episodes: dict[str, ndarray] | None = None¶: Full episode trajectories (if return_episodes=True). Contains:

- observations: Shape (num_episodes, max_steps, obs_dim)
- actions: Shape (num_episodes, max_steps, ...)
- rewards: Shape (num_episodes, max_steps)
- dones: Shape (num_episodes, max_steps)
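The trajectory arrays share a fixed max_steps axis, so episodes that end early need a length or a mask before per-step analysis. A minimal sketch, assuming that every step after the first done flag is padding (this page does not state that) and using plain nested lists in place of arrays; `valid_lengths` is a hypothetical helper.

```python
from typing import List, Sequence


def valid_lengths(dones: Sequence[Sequence[bool]]) -> List[int]:
    """Steps up to and including the first done flag, per episode."""
    lengths = []
    for episode in dones:
        length = len(episode)  # no done flag seen: episode used all max_steps
        for t, done in enumerate(episode):
            if done:
                length = t + 1
                break
        lengths.append(length)
    return lengths
```

Where the assumption holds, these values should agree with the `episode_lengths` field, which is the safer source when it is available.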

agent_state: Any | None = None¶: Agent state used for evaluation (if provided).

save(directory, save_checkpoint=False)[source]¶

Save results and optionally agent checkpoint to directory.

Saves:

- .hydra/config.yaml: Configuration used for the run (if config is present)
- results.pkl: EvaluationResults without agent_state
- checkpoints/final.msgpack: Agent state (if save_checkpoint=True and agent_state exists)

Note: The agent_state is excluded from results.pkl and saved separately using Flax msgpack serialization for reliability with JAX/Flax objects.

Parameters:

directory (Path | str) – Directory to save results to (typically Hydra output directory)
save_checkpoint (bool) – Whether to save agent checkpoint

Raises:

RuntimeError – If agent checkpoint serialization fails

Example

>>> results = evaluate(config, agent_state=agent_state)
>>> results.save(Path.cwd(), save_checkpoint=True)

static load(directory)[source]¶

Load results from directory.

Parameters: directory (Path | str) – Directory containing results.pkl
Returns: Loaded EvaluationResults object
Return type: EvaluationResults

Example

>>> results = EvaluationResults.load("outputs/2026-02-12/14-30-52")
>>> print(results.summary())

summary()[source]¶

Get summary statistics for quick inspection.

Returns:

Dictionary with key metrics:

mean_return: Mean episode return
std_return: Standard deviation of returns
min_return: Minimum return
max_return: Maximum return
mean_length: Mean episode length
num_episodes: Number of episodes evaluated

Return type:

dict

__repr__()[source]¶

Human-readable summary of evaluation results.

class myriad.platform.SessionLogger(wandb_run, run_dir, seed=0)[source]¶

Bases: object

Unified logger for training and evaluation sessions.

Focuses on logging metrics and episodes during runs. Artifact persistence (saving results, checkpoints) is handled by the result objects themselves.

Handles three destinations automatically:

1. Memory - Captures metrics for return values
2. Disk - Saves episode trajectories
3. Remote - Logs to W&B (metrics + artifacts)

Example

>>> logger = SessionLogger.for_training(config)
>>> logger.log_training_step(...)
>>> logger.log_evaluation(..., save_episodes=True)
>>> training_metrics, eval_metrics = logger.get_results()
>>> logger.finalize()

__init__(wandb_run, run_dir, seed=0)[source]¶

Initialize the session logger.

Parameters:

wandb_run (Any | None) – W&B run instance (None to disable remote logging)
run_dir (Path) – Base directory for outputs (episode files, etc.)
seed (int) – Random seed for metadata

classmethod for_training(config, run_dir=None)[source]¶

Create a logger for training sessions.

Parameters:

config (Config) – Training configuration
run_dir (Path | None) – Output directory for artifacts (default: current directory)

Returns:

Configured SessionLogger instance

Return type:

SessionLogger

classmethod for_evaluation(config, run_dir=None)[source]¶

Create a logger for evaluation-only sessions.

Parameters:

config (EvalConfig) – Evaluation configuration
run_dir (Path | None) – Output directory for artifacts (default: current directory)

Returns:

Configured SessionLogger instance

Return type:

SessionLogger

log_training_step(global_step, steps_per_env, metrics_history, steps_this_chunk)[source]¶

Log training metrics.

Handles memory capture + W&B logging.

Parameters:

global_step (int) – Global environment steps
steps_per_env (int) – Steps per individual environment
metrics_history (dict[str, Any]) – Raw metrics from the training loop
steps_this_chunk (int) – Number of steps in this chunk

log_evaluation(global_step, steps_per_env, eval_results, save_episodes=False, episode_save_count=None)[source]¶

Log evaluation results.

A single call:

- Captures metrics to memory
- Saves episodes to disk (if save_episodes=True)
- Logs metrics to W&B
- Uploads episode artifacts to W&B

Parameters:

global_step (int) – Global environment steps
steps_per_env (int) – Steps per individual environment
eval_results (dict[str, Any]) – Dictionary with ‘episode_return’, ‘episode_length’, ‘dones’, and optionally ‘episodes’ (trajectory data)
save_episodes (bool) – If True, save episodes to disk and log to W&B
episode_save_count (int | None) – Number of episodes to save (None = all available)

Returns:

Path to saved episodes directory (if saved), else None

Return type:

Path | None

get_results()[source]¶

Return captured metrics without closing the session.

finalize(exit_code=0)[source]¶

Close the W&B run.

Parameters: exit_code (int) – 0 for clean/intentional exit (finished, killed by sweep agent, user-stopped), 1 for unexpected failure (OOM, crash).
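One way to honor the exit-code convention is to wrap the session body so that finalize() is always called exactly once. This is a sketch under stated assumptions: `run_session` is a hypothetical wrapper, and `logger` is any object with a `finalize(exit_code=...)` method, such as a SessionLogger.

```python
from typing import Any, Callable


def run_session(logger: Any, body: Callable[[], None]) -> None:
    """Run body() and always close the logger with a matching exit code."""
    try:
        body()
    except KeyboardInterrupt:
        # User-stopped runs count as clean exits per the documented convention.
        logger.finalize(exit_code=0)
        raise
    except Exception:
        logger.finalize(exit_code=1)  # unexpected failure
        raise
    else:
        logger.finalize(exit_code=0)  # finished normally
```

Re-raising after finalize() keeps the original traceback visible while still marking the run as failed.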

log_videos(episode_dir, render_frame_fn, global_step, fps=50, max_episodes=None, video_dir=None)[source]¶

Render saved episodes to videos and log to W&B.

Parameters:

episode_dir (Path) – Path to directory containing .npz episode files
render_frame_fn (Callable[[ndarray], ndarray]) – Function that takes observation array and returns RGB frame
global_step (int) – Global environment steps (for W&B logging step)
fps (int) – Frames per second for rendered videos
max_episodes (int | None) – Maximum number of episodes to render (None = all)
video_dir (Path | None) – Optional output directory for videos (if None, creates temporary videos)

property wandb_run: Any | None¶: Get the underlying W&B run instance.

property episode_base_dir: Path¶: Get the base directory for episode storage.

myriad.platform.load_run(run_path)[source]¶

Load all artifacts from a run directory.

This is the main entry point for loading runs. It loads config, results, and metadata in one call. Agent checkpoints can be loaded on demand.

Parameters: run_path (str | Path) – Path to run directory
Returns: RunArtifacts container with all run data
Return type: RunArtifacts

Example

>>> run = load_run("outputs/2026-02-12/14-30-52")
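>>> # 'mean_return' is the EvaluationResults.summary() key; TrainingResults.summary() uses 'final_eval_return_mean'
>>> print(f"Final return: {run.results.summary()['mean_return']}")
>>> agent_state = run.load_checkpoint()  # Loaded from disk on demand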
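Since load_run() can return either result type, and the two summary() dictionaries documented on this page use different keys for the headline return, code that handles both kinds of run may want a small adapter. `headline_return` below is a hypothetical helper; only the two key names come from this page.

```python
from typing import Mapping


def headline_return(summary: Mapping[str, float]) -> float:
    """Pick the headline return from either documented summary() layout."""
    if "mean_return" in summary:  # EvaluationResults.summary()
        return summary["mean_return"]
    if "final_eval_return_mean" in summary:  # TrainingResults.summary()
        return summary["final_eval_return_mean"]
    raise KeyError("summary has neither 'mean_return' nor 'final_eval_return_mean'")
```

It would be called as `headline_return(run.results.summary())` on the object returned by load_run().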

myriad.platform.load_run_config(run_path)[source]¶

Load config from run directory.

Loads from .hydra/config.yaml and validates with Pydantic. Requires run_metadata.yaml to determine config type.

Parameters:

run_path (str | Path) – Path to run directory

Returns:

Config or EvalConfig depending on run type

Raises:

FileNotFoundError – If config.yaml or run_metadata.yaml not found
RuntimeError – If run_type field missing from metadata

Return type:

Config | EvalConfig

Example

>>> config = load_run_config("outputs/2026-02-12/14-30-52")
>>> print(config.run.seed)

myriad.platform.load_run_results(run_path)[source]¶

Load results from run directory.

Parameters: run_path (str | Path) – Path to run directory
Returns: TrainingResults or EvaluationResults
Return type: TrainingResults | EvaluationResults

Example

>>> results = load_run_results("outputs/2026-02-12/14-30-52")
>>> print(results.summary())

myriad.platform.load_run_checkpoint(run_path, checkpoint='final')[source]¶

Load agent checkpoint from run directory.

Parameters:

run_path (str | Path) – Path to run directory
checkpoint (str) – Checkpoint name (default: “final”)

Returns:

Agent state from checkpoint

Raises:

FileNotFoundError – If checkpoint file not found
RuntimeError – If deserialization fails

Return type:

Any

Example

>>> agent_state = load_run_checkpoint("outputs/2026-02-12/14-30-52")
>>> # Use with evaluate()
>>> results = evaluate(config, agent_state=agent_state)
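The checkpoint name presumably maps to a file inside the run directory. The sketch below assumes the layout `checkpoints/<name>.msgpack`, inferred from the `checkpoints/final.msgpack` path that save() is documented to write; `checkpoint_path` is a hypothetical helper, and the library's actual lookup may differ.

```python
from pathlib import Path
from typing import Union


def checkpoint_path(run_path: Union[str, Path], checkpoint: str = "final") -> Path:
    """Resolve a checkpoint name to a file (assumed layout, see lead-in)."""
    path = Path(run_path) / "checkpoints" / f"{checkpoint}.msgpack"
    if not path.is_file():
        raise FileNotFoundError(f"Checkpoint not found: {path}")
    return path
```

Resolving the path first makes it easy to check whether a run was saved with save_checkpoint=True before attempting to load the agent state.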

myriad.platform.load_run_metadata(run_path)[source]¶

Load run metadata from run directory.

Parameters: run_path (str | Path) – Path to run directory
Returns: Dictionary with metadata (run_type, timestamp, git_hash, versions)
Raises: FileNotFoundError – If metadata file not found
Return type: dict

Example

>>> metadata = load_run_metadata("outputs/2026-02-12/14-30-52")
>>> print(metadata["git_hash"])

class myriad.platform.RunArtifacts(config, results, metadata, run_path)[source]¶

Bases: Generic[ConfigT, ResultsT]

Container for all artifacts from a run.

Provides a unified interface to access configs, results, metadata, and optionally load checkpoints.

Type parameters:

ConfigT: Config or EvalConfig
ResultsT: TrainingResults or EvaluationResults

config: ConfigT¶: Configuration used for this run.

results: ResultsT¶: Results from the run.

metadata: dict¶: Run metadata (timestamp, git hash, versions).

__init__(config, results, metadata, run_path)¶

run_path: Path¶: Path to the run directory.

load_checkpoint(checkpoint='final')[source]¶

Load agent checkpoint from disk.

Always loads fresh from disk (no caching).

Parameters:

checkpoint (str) – Checkpoint name (default: “final”)

Returns:

Agent state from checkpoint

Raises:

FileNotFoundError – If checkpoint file not found
RuntimeError – If deserialization fails

Return type:

Any

myriad.platform.fetch_run(run_id)[source]¶

Fetch a single W&B run by its fully-qualified ID.

Parameters: run_id (str) – Fully-qualified run ID (entity/project/run_id).
Returns: A wandb.Run object.
Return type: Any

myriad.platform.fetch_sweep_runs(sweep_id, *, state=None)[source]¶

Fetch runs from a W&B sweep, optionally filtered by state.

Parameters:

sweep_id (str) – Fully-qualified sweep ID (entity/project/sweep_id).
state (str | None) – If provided, only return runs with this state (e.g. "finished", "running", "crashed"). If None, return all runs.

Returns:

List of wandb.Run objects.

Return type:

list[Any]

myriad.platform.fetch_top_k_runs(sweep_id, metric, top_k, *, maximize)[source]¶

Return the top-K finished runs from a W&B sweep, sorted by metric.

Parameters:

sweep_id (str) – Fully-qualified sweep ID (entity/project/sweep_id).
metric (str) – W&B summary metric name to rank by (e.g. eval/return/best).
top_k (int) – Number of top runs to return.
maximize (bool) – If True, sort descending (higher is better). If False, ascending.

Returns:

List of wandb.Run objects, length ≤ top_k.

Return type:

list[Any]
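The ranking step can be illustrated offline, without contacting W&B. The sketch below is not the library's implementation: `top_k_by_metric` is a hypothetical function, the stand-in objects expose only `state` and `summary`, and dropping runs that lack the metric is an assumption about how missing values are handled.

```python
from types import SimpleNamespace
from typing import Any, List


def top_k_by_metric(runs: List[Any], metric: str, top_k: int, *, maximize: bool) -> List[Any]:
    """Rank finished runs by a summary metric (illustrative, offline)."""
    # Keep only finished runs that actually report the metric.
    eligible = [r for r in runs if r.state == "finished" and metric in r.summary]
    eligible.sort(key=lambda r: r.summary[metric], reverse=maximize)
    return eligible[:top_k]


# Stand-in objects exposing the two attributes used above.
runs = [
    SimpleNamespace(name="a", state="finished", summary={"eval/return/best": 120.0}),
    SimpleNamespace(name="b", state="crashed", summary={"eval/return/best": 300.0}),
    SimpleNamespace(name="c", state="finished", summary={"eval/return/best": 180.0}),
    SimpleNamespace(name="d", state="finished", summary={}),
]
best = top_k_by_metric(runs, "eval/return/best", top_k=2, maximize=True)
print([r.name for r in best])  # ['c', 'a']
```

The crashed run is excluded even though it has the highest value, which is one reason the returned list can be shorter than top_k.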

myriad.platform.config_from_wandb_run(run)[source]¶

Reconstruct a Config from a W&B run object.

W&B stores the full model_dump() nested dict in run.config. Filters W&B-internal metadata and unwraps sweep param wrappers before passing to Config.model_validate.

Parameters: run (Any) – A wandb.Run object (from e.g. wandb.Api().run(...)).
Returns: A validated Config instance.
Return type: Config

myriad.platform.runs_to_dataframe(runs, metrics=None)[source]¶

Convert a list of W&B runs to a Polars DataFrame.

Each row corresponds to one run. Config fields are flattened with dot-separated keys (e.g. agent.lr). Summary metrics are included as-is.

Parameters:

runs (list[Any]) – List of wandb.Run objects.
metrics (list[str] | None) – If provided, include only these summary metric keys. If None, include all summary keys that don’t start with _.

Returns:

A polars.DataFrame with one row per run.

Return type:

DataFrame
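The dot-separated flattening of config fields can be sketched with a short recursive function. `flatten` is a hypothetical helper written for illustration; how the library treats lists or other non-dict values is not stated on this page, so they are kept as-is here.

```python
from typing import Any, Dict


def flatten(config: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dot-separated keys."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections such as "agent" or "run".
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat
```

Each flattened dictionary then corresponds to one row, with keys such as `agent.lr` becoming column names.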

myriad.platform.save_agent_state(agent_state, path)[source]¶

Serialize and save agent state to file.

Parameters:

agent_state (Any) – Agent state to save
path (str | Path) – File path (typically with .msgpack extension)

Raises:

RuntimeError – If serialization or file writing fails

myriad.platform.load_agent_state(path)[source]¶

Load and deserialize agent state from file.

Parameters:

path (str | Path) – File path to load from

Returns:

Deserialized agent state

Raises:

FileNotFoundError – If file doesn’t exist
RuntimeError – If deserialization fails

Return type:

Any

myriad.platform.serialize_agent_state(agent_state)[source]¶

Serialize agent state to msgpack bytes.

Parameters: agent_state (Any) – Agent state to serialize (typically Flax TrainState or similar)
Returns: Serialized bytes
Raises: RuntimeError – If serialization fails
Return type: bytes

myriad.platform.deserialize_agent_state(data)[source]¶

Deserialize agent state from msgpack bytes.

Parameters: data (bytes) – Msgpack-serialized bytes
Returns: Deserialized agent state
Raises: RuntimeError – If deserialization fails
Return type: Any

Config builder utilities for programmatic use.

This module provides high-level functions to create training and evaluation configs without requiring detailed knowledge of Pydantic models.

myriad.configs.builder.create_config(env, agent, num_envs=1, steps_per_env=1000, rollout_steps=None, eval_max_steps=None, eval_frequency=100, eval_rollouts=10, seed=42, wandb_enabled=False, **kwargs)[source]¶

Create a training config with sensible defaults.

This is the recommended way to create configs programmatically. It provides a simpler interface than constructing nested Pydantic models.

Parameters:

env (str) – Environment name (e.g., “cartpole-control”, “ccas-ccar-control”)
agent (str) – Agent name (e.g., “dqn”, “pqn”, “random”)
num_envs (int) – Number of parallel environments to run
steps_per_env (int) – Number of steps to run per environment
rollout_steps (int | None) – Number of steps to collect per environment before updating (for on-policy agents only). If None, defaults to 2 for on-policy agents.
eval_max_steps (int | None) – Maximum steps per evaluation episode. If None, uses environment-specific default from registry or Config models.
eval_frequency (int) – Log and evaluate every N steps-per-env (0 to disable)
eval_rollouts (int) – Number of episodes to run during evaluation
seed (int) – Random seed for reproducibility
wandb_enabled (bool) – Enable Weights & Biases logging
**kwargs (Any) – Additional config overrides. Can specify nested parameters using dot notation (e.g., agent.learning_rate=1e-3) or pass dicts for nested configs (e.g., wandb={"project": "my-project"}).

Returns:

Fully configured Config object ready for train_and_evaluate()

Return type:

Config
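Dot-notation overrides amount to nested dictionary updates. The sketch below shows one way such keys could be expanded; `expand_dotted` is a hypothetical helper and does not claim to match how create_config processes **kwargs internally.

```python
from typing import Any, Dict


def expand_dotted(overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Turn {'agent.learning_rate': 1e-3} into {'agent': {'learning_rate': 1e-3}}."""
    nested: Dict[str, Any] = {}
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested
```

Because a dot is not valid inside a Python keyword argument name, a dotted override has to be passed by unpacking a dictionary, for example `create_config(env="cartpole-control", agent="dqn", **{"agent.learning_rate": 1e-3})`.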

myriad.configs.builder.create_eval_config(env, agent, eval_rollouts=10, eval_max_steps=None, seed=42, wandb_enabled=False, **kwargs)[source]¶

Create an evaluation-only config with sensible defaults.

Use this for evaluating non-learning controllers (random, PID, bang-bang) or pre-trained models without any training.

Parameters:

env (str) – Environment name (e.g., “cartpole-control”)
agent (str) – Agent name (e.g., “random”, “dqn”)
eval_rollouts (int) – Number of episodes to evaluate
eval_max_steps (int | None) – Maximum steps per episode. If None, uses environment-specific default from registry or Config models.
seed (int) – Random seed for reproducibility
wandb_enabled (bool) – Enable Weights & Biases logging
**kwargs (Any) – Additional config overrides (same as create_config)

Returns:

Fully configured EvalConfig object ready for evaluate()

Return type:

EvalConfig