Core concepts

Three-layer design

Every experiment has three layers:

┌─────────────────────────────────────┐
│  Learner  (DQN, PQN, PID, Random)  │  Picks actions from observations
└──────────────┬──────────────────────┘
               │ obs, reward, done
┌──────────────▼──────────────────────┐
│  Task  (Control)                    │  Defines obs, reward, termination
└──────────────┬──────────────────────┘
               │ state, action → next_state
┌──────────────▼──────────────────────┐
│  Physics  (step_physics)            │  Pure dynamics, no RL concepts
└─────────────────────────────────────┘

Physics is a stateless pure function: (state, action, params, config) → next_state. No rewards, no termination. You can call it directly for MPC planning or test it against analytical solutions.
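As a rough illustration of what "pure dynamics" means here, the sketch below steps a hypothetical 1D point mass and compares it against the closed-form velocity. Only the argument order (state, action, params, config) comes from the text above. The containers `PointMassState`, `PointMassParams`, and `PointMassConfig`, and the explicit-Euler integrator, are assumptions made for the example and are not taken from the library.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class PointMassState(NamedTuple):  # hypothetical state container
    x: jax.Array  # position
    v: jax.Array  # velocity


class PointMassParams(NamedTuple):  # hypothetical physical parameters
    mass: float


class PointMassConfig(NamedTuple):  # hypothetical integrator settings
    dt: float


def step_physics(state, action, params, config):
    """One explicit-Euler step of a force-driven point mass. No reward, no termination."""
    accel = action / params.mass
    new_x = state.x + config.dt * state.v
    new_v = state.v + config.dt * accel
    return PointMassState(x=new_x, v=new_v)


# Roll forward 100 steps under a constant unit force, starting at rest.
params = PointMassParams(mass=2.0)
config = PointMassConfig(dt=0.01)
state = PointMassState(x=jnp.float32(0.0), v=jnp.float32(0.0))
force = jnp.float32(1.0)
for _ in range(100):
    state = step_physics(state, force, params, config)

# Under a constant force, Euler velocity matches the analytical v(t) = (F / m) * t.
analytic_v = (1.0 / 2.0) * (100 * 0.01)
```

Because the function takes everything it needs as arguments and returns a new state, the same call can sit inside a planner's inner loop or a unit test without any environment setup.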

Task wraps physics with what the agent observes (get_obs), what it optimizes (reward), and when episodes end. Multiple tasks can share the same physics — you write the dynamics once.
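The sketch below shows one way two tasks might share a single physics function. Everything in it is illustrative: `ReachTask`, `BrakeTask`, the method name `is_done`, and the dict-based `params` and `config` are assumptions for the example, and only `get_obs` and `reward` are names taken from the text.

```python
from dataclasses import dataclass
from typing import NamedTuple

import jax.numpy as jnp


class State(NamedTuple):  # hypothetical physics state
    x: jnp.ndarray
    v: jnp.ndarray


def step_physics(state, action, params, config):
    """Shared dynamics: a point mass, written once and reused by both tasks."""
    mass, dt = params["mass"], config["dt"]
    v = state.v + dt * action / mass
    return State(x=state.x + dt * v, v=v)


@dataclass(frozen=True)
class ReachTask:
    """Hypothetical task: drive the mass to a target position."""
    target: float

    def get_obs(self, state):
        return jnp.stack([state.x - self.target, state.v])

    def reward(self, state):
        return -jnp.abs(state.x - self.target)

    def is_done(self, state):
        return jnp.abs(state.x) > 10.0


@dataclass(frozen=True)
class BrakeTask:
    """Hypothetical task on the same physics: bring the mass to rest."""

    def get_obs(self, state):
        return jnp.stack([state.v])

    def reward(self, state):
        return -jnp.square(state.v)

    def is_done(self, state):
        return jnp.abs(state.v) < 1e-3


# One physics step, scored by two different tasks.
state = State(x=jnp.float32(0.0), v=jnp.float32(1.0))
next_state = step_physics(state, jnp.float32(0.0), {"mass": 1.0}, {"dt": 0.1})
reach, brake = ReachTask(target=1.0), BrakeTask()
reach_reward = reach.reward(next_state)
brake_reward = brake.reward(next_state)
```

The two tasks disagree on what to observe and what counts as success, yet neither contains any dynamics of its own.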

Learner only sees (obs, reward, done). It doesn’t know or care what system it’s controlling.

Layers talk to each other through Python Protocols (structural typing), not inheritance.
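To make the structural-typing point concrete, here is a small sketch using `typing.Protocol` from the standard library. The `Learner` protocol and its `select_action` method are hypothetical stand-ins, not the library's actual definitions.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Learner(Protocol):  # hypothetical protocol, for illustration only
    def select_action(self, obs): ...


class ConstantAgent:
    """Satisfies Learner by having the right method; it never subclasses it."""

    def __init__(self, action):
        self.action = action

    def select_action(self, obs):
        return self.action


def run_one_step(learner: Learner, obs):
    """Accepts anything with a matching select_action method."""
    return learner.select_action(obs)


agent = ConstantAgent(action=1)
is_learner = isinstance(agent, Learner)  # checks method presence, not ancestry
chosen = run_one_step(agent, obs=None)
```

The practical upshot is that a new agent or task only has to provide the expected methods; it does not need to import or extend a base class from another layer.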

JAX-native

Environment and agent functions are pure. They work with jax.jit, jax.vmap, and jax.lax.scan.

State is data. Environment state is a NamedTuple PyTree, not a mutable object.
No Python control flow on traced values in JIT paths. Use jax.lax.cond / jax.lax.select rather than if/else on array values.
Observations are NamedTuples. Classical controllers access fields by name (obs.theta). Neural networks call obs.to_array(). Both work with vmap.

A single vmap call can batch on the order of 100,000 parallel environments on one GPU.
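The points above can be sketched with a toy environment. The `EnvState` tuple, the `step` rule, and the batch size of 1,000 are all assumptions chosen to keep the example small; only `jax.jit`, `jax.vmap`, `jax.lax.scan`, and `jax.lax.select` are real JAX calls.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class EnvState(NamedTuple):  # hypothetical PyTree state
    x: jax.Array  # position
    t: jax.Array  # step counter


def step(state, action):
    """Pure step: move by `action`, resetting to the origin when |x| exceeds 1."""
    x = state.x + action
    # Select on a traced value instead of a Python if/else.
    x = jax.lax.select(jnp.abs(x) > 1.0, jnp.zeros_like(x), x)
    return EnvState(x=x, t=state.t + 1)


def rollout(x0, actions):
    """Scan `step` over a sequence of actions for one environment."""
    def body(state, action):
        new_state = step(state, action)
        return new_state, new_state.x

    init = EnvState(x=x0, t=jnp.int32(0))
    return jax.lax.scan(body, init, actions)


# Batch the whole rollout over 1,000 environments and compile it once.
batched_rollout = jax.jit(jax.vmap(rollout))
x0 = jnp.zeros(1000)
actions = jnp.full((1000, 5), 0.3)
final, xs = batched_rollout(x0, actions)
```

Scaling the batch up is a matter of changing the leading dimension of `x0` and `actions`; the step and rollout functions stay the same.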

Evaluation vs. training

There are two execution paths.

Evaluation runs a fixed agent (no learning):

config = create_eval_config("cartpole-control", "pid", kp=1.0)
results = evaluate(config)

Training fits an RL agent, then evaluates it:

config = create_config("cartpole-control", "dqn", num_envs=64, steps_per_env=2000)
results = train_and_evaluate(config)

Both return a Results object with metrics. Pass return_episodes=True to evaluate() to get per-step trajectory data.

What’s in the box

Environments

Environments are registered as {system}-{task}:

Environment	System	Task
`cartpole-control`	Classic cart-pole	Balance the pole
`pendulum-control`	Classic pendulum	Swing up and balance
`ccas-ccar-control`	Gene circuit (CcaS/CcaR)	Track target protein expression

from myriad.envs import make_env, list_envs

list_envs()           # all registered names
env = make_env("cartpole-control")

Agents

Agent	Type	Actions
`random`	Classical	Any
`bangbang`	Classical	Discrete
`pid`	Classical	Continuous
`dqn`	RL	Discrete
`pqn`	RL	Continuous

Classical agents don’t learn, so run them with create_eval_config and evaluate. RL agents need create_config and train_and_evaluate.