Core concepts¶
Three-layer design¶
Every experiment has three layers:
┌─────────────────────────────────────┐
│  Learner (DQN, PQN, PID, Random)    │  Picks actions from observations
└──────────────┬──────────────────────┘
               │ obs, reward, done
┌──────────────▼──────────────────────┐
│  Task (Control)                     │  Defines obs, reward, termination
└──────────────┬──────────────────────┘
               │ state, action → next_state
┌──────────────▼──────────────────────┐
│  Physics (step_physics)             │  Pure dynamics, no RL concepts
└─────────────────────────────────────┘
Physics is a stateless pure function: (state, action, params, config) → next_state. No rewards, no termination. You can call it directly for MPC planning or test it against analytical solutions.
Task wraps physics with what the agent observes (get_obs), what it optimizes (reward), and when episodes end. Multiple tasks can share the same physics — you write the dynamics once.
Learner only sees (obs, reward, done). It doesn’t know or care what system it’s controlling.
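The split between physics and task can be sketched as follows. This is a minimal illustration and not the library's actual code: the point-mass system, the `State`, `Params`, and `Config` types, the reward, and the `task_step` helper are all assumptions made for the example. Only the `(state, action, params, config) → next_state` shape of `step_physics` comes from the description above.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class State(NamedTuple):
    """Hypothetical point-mass state: position and velocity."""
    x: jax.Array
    v: jax.Array


class Params(NamedTuple):
    mass: float = 1.0


class Config(NamedTuple):
    dt: float = 0.01
    x_limit: float = 5.0  # used by the task layer only, never by physics


def step_physics(state: State, action: jax.Array, params: Params, config: Config) -> State:
    """Pure dynamics: one semi-implicit Euler step of a force-driven point mass."""
    accel = action / params.mass
    v_next = state.v + accel * config.dt
    x_next = state.x + v_next * config.dt
    return State(x=x_next, v=v_next)


def task_step(state: State, action: jax.Array, params: Params, config: Config):
    """Task layer (hypothetical): adds observation, reward, and termination on top of physics."""
    next_state = step_physics(state, action, params, config)
    obs = jnp.stack([next_state.x, next_state.v])
    reward = -(next_state.x ** 2)  # assumed objective: stay near the origin
    done = jnp.abs(next_state.x) > config.x_limit
    return next_state, obs, reward, done


# Physics can be checked on its own against the closed form x(t) = 0.5 * (F / m) * t**2.
state = State(x=jnp.array(0.0), v=jnp.array(0.0))
for _ in range(100):
    state = step_physics(state, jnp.array(2.0), Params(), Config())
analytic_x = 0.5 * 2.0 * (100 * 0.01) ** 2
```

Because `step_physics` returns nothing but the next state, the same function could sit under a second task with a different reward without any change to the dynamics.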
Layers talk to each other through Python Protocols (structural typing), not inheritance.
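For readers unfamiliar with structural typing, here is a small sketch of what a Protocol-based interface looks like. The `Learner` protocol, its `select_action` method, and the `AlwaysLeft` class are hypothetical names chosen for illustration; the library's real protocols may differ.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class Learner(Protocol):
    """Hypothetical interface: any object with a matching select_action method."""

    def select_action(self, obs: list[float]) -> int: ...


class AlwaysLeft:
    """Satisfies Learner structurally. Note that it does not inherit from Learner."""

    def select_action(self, obs: list[float]) -> int:
        return 0


def run_one_step(learner: Learner, obs: list[float]) -> int:
    # A static type checker accepts any object whose methods match the Protocol.
    return learner.select_action(obs)


action = run_one_step(AlwaysLeft(), [0.0, 0.1])

# runtime_checkable enables isinstance checks; they only verify that the method exists,
# not that its signature matches.
is_learner = isinstance(AlwaysLeft(), Learner)
```

The practical benefit is that a new learner or task needs no base class import; it only has to provide the expected methods.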
JAX-native¶
Environment and agent functions are pure. They work with jax.jit, jax.vmap, and jax.lax.scan.
State is data. Environment state is a NamedTuple PyTree, not a mutable object.
No Python control flow in JIT paths. Use jax.lax.cond/jax.lax.select, not if/else.
Observations are NamedTuples. Classical controllers access fields by name (obs.theta). Neural networks call obs.to_array(). Both work with vmap.
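The points above can be illustrated with a short sketch. The `Obs` type, its fields, and the three controllers are assumptions made for the example, not the library's definitions; only the `obs.theta` field access and the `to_array()` method name come from the text.

```python
from typing import NamedTuple

import jax
import jax.numpy as jnp


class Obs(NamedTuple):
    """Hypothetical pendulum-style observation. NamedTuples are PyTrees in JAX."""
    theta: jax.Array
    theta_dot: jax.Array

    def to_array(self) -> jax.Array:
        return jnp.stack([self.theta, self.theta_dot], axis=-1)


@jax.jit
def bang_bang(obs: Obs) -> jax.Array:
    """Classical controller: reads a field by name and branches with lax.select."""
    # A Python `if obs.theta > 0:` would fail here, because obs.theta is a tracer under jit.
    ones = jnp.ones_like(obs.theta)
    return jax.lax.select(obs.theta > 0.0, -ones, ones)


@jax.jit
def damp_or_center(obs: Obs) -> jax.Array:
    """lax.cond picks one of two branch functions based on a scalar predicate."""
    return jax.lax.cond(
        jnp.abs(obs.theta_dot) > 1.0,
        lambda o: -0.5 * o.theta_dot,  # damp fast motion
        lambda o: -2.0 * o.theta,      # otherwise push theta toward zero
        obs,
    )


@jax.jit
def linear_policy(obs: Obs, weights: jax.Array) -> jax.Array:
    """Stand-in for a neural network: consumes the flat array view of the observation."""
    return obs.to_array() @ weights


single = Obs(theta=jnp.array(0.3), theta_dot=jnp.array(0.0))
batch = Obs(theta=jnp.array([0.3, -0.2]), theta_dot=jnp.zeros(2))

single_action = bang_bang(single)
batch_actions = jax.vmap(bang_bang)(batch)  # same function, mapped over the leading axis
```

Both access styles go through `jax.vmap` unchanged because the observation is plain data: `vmap` maps over each field of the NamedTuple.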
One vmap call gets you 100,000 parallel environments on a single GPU.
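As a rough sketch of the pattern behind that claim, the example below rolls out many copies of a toy environment with `jax.lax.scan` for the time loop and `jax.vmap` for the batch. The pendulum dynamics, the damping controller, and the batch size of 1,000 are assumptions chosen to keep the example small; they are not taken from the library.

```python
from functools import partial
from typing import NamedTuple

import jax
import jax.numpy as jnp


class EnvState(NamedTuple):
    theta: jax.Array
    theta_dot: jax.Array


def env_step(state: EnvState, action: jax.Array, dt: float = 0.02) -> EnvState:
    """Hypothetical frictionless pendulum; theta is measured from the hanging position."""
    theta_ddot = -9.81 * jnp.sin(state.theta) + action
    theta_dot = state.theta_dot + dt * theta_ddot
    theta = state.theta + dt * theta_dot
    return EnvState(theta, theta_dot)


def rollout(initial_state: EnvState, num_steps: int):
    """Single-environment rollout written with scan instead of a Python loop."""

    def body(state, _):
        action = -1.0 * state.theta_dot  # simple damping controller
        next_state = env_step(state, action)
        return next_state, next_state.theta

    return jax.lax.scan(body, initial_state, None, length=num_steps)


# vmap lifts the single-environment rollout to a batch; jit compiles the whole thing once.
batched_rollout = jax.jit(jax.vmap(partial(rollout, num_steps=200)))

num_envs = 1_000
initial = EnvState(
    theta=jnp.linspace(-1.0, 1.0, num_envs),
    theta_dot=jnp.zeros(num_envs),
)
final_state, thetas = batched_rollout(initial)  # thetas has shape (num_envs, num_steps)
```

Scaling to a larger batch only changes `num_envs`; how far that goes depends on the environment and on available device memory.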
Evaluation vs. training¶
There are two execution paths.
Evaluation runs a fixed agent (no learning):
config = create_eval_config("cartpole-control", "pid", kp=1.0)
results = evaluate(config)
Training fits an RL agent, then evaluates it:
config = create_config("cartpole-control", "dqn", num_envs=64, steps_per_env=2000)
results = train_and_evaluate(config)
Both return a Results object with metrics. Pass return_episodes=True to evaluate() to get per-step trajectory data.
What’s in the box¶
Environments¶
Environments are registered as {system}-{task}:
| Environment | System | Task |
|---|---|---|
| | Classic cart-pole | Balance the pole |
| | Classic pendulum | Swing up and balance |
| | Gene circuit (CcaS/CcaR) | Track target protein expression |
from myriad.envs import make_env, list_envs
list_envs() # all registered names
env = make_env("cartpole-control")
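To make the `{system}-{task}` naming concrete, here is a toy registry in plain Python. It is a sketch of the general pattern only, not how `make_env` or `list_envs` are implemented: `register`, `make`, `list_registered`, and the placeholder factory are hypothetical names.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping "{system}-{task}" names to zero-argument factories.
_REGISTRY: Dict[str, Callable[[], dict]] = {}


def register(system: str, task: str, factory: Callable[[], dict]) -> str:
    """Store a factory under the composed name and return that name."""
    name = f"{system}-{task}"
    if name in _REGISTRY:
        raise ValueError(f"Environment {name!r} is already registered")
    _REGISTRY[name] = factory
    return name


def make(name: str) -> dict:
    """Look up a factory by name and build the environment."""
    try:
        factory = _REGISTRY[name]
    except KeyError:
        raise KeyError(f"Unknown environment {name!r}; known: {sorted(_REGISTRY)}") from None
    return factory()


def list_registered() -> List[str]:
    return sorted(_REGISTRY)


# Placeholder factory: a real one would return an environment object, not a dict.
register("cartpole", "control", lambda: {"system": "cartpole", "task": "control"})

env = make("cartpole-control")
names = list_registered()
```

Keeping system and task as separate parts of the name mirrors the layering: several tasks can be registered against one system without duplicating its physics.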
Agents¶
| Agent | Type | Actions |
|---|---|---|
| | Classical | Any |
| | Classical | Discrete |
| | Classical | Continuous |
| | RL | Discrete |
| | RL | Continuous |
Classical agents don’t learn — use create_eval_config + evaluate. RL agents need create_config + train_and_evaluate.