Flow Diagrams
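Last updated: 2026-07-05
This page uses Mermaid diagrams to show the main processing flows in call order. Read it before the code to get a picture of which modules are called, and in what sequence.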
Self-Play Collection Flow
flowchart TD
CLI["selfplay cli.py / cli_args.py"] --> PAR["parallel.py<br/>worker split / local batcher"]
PAR --> IMPL["impl.py<br/>collect_selfplay_games"]
IMPL --> BATTLE["battle.py<br/>run_selfplay_battle"]
BATTLE --> POLICY["policy_factory.py / policies.py<br/>PolicyDecision"]
POLICY --> SEARCH["search.ismcts<br/>optional search policy"]
BATTLE --> PENDING["records.py<br/>make_pending_record"]
PENDING --> TARGETS["record_targets.py<br/>belief / aux labels"]
PENDING --> META["record_metadata.py<br/>card / attack metadata"]
BATTLE --> FINAL["records.py<br/>finalize_records"]
FINAL --> IO["record_io.py<br/>append/write JSONL"]
FINAL --> SUM["summary_stats/display/csv.py"]
Key boundaries:
- battle.py is responsible for CABT game progression.
- records.py converts decision points into training records.
- record_io.py is responsible for atomic-ish appends of incremental output.
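The "atomic-ish" append handled by record_io.py can be illustrated with a minimal sketch. The helper names `append_jsonl_record` and `read_jsonl_records` and the record fields are hypothetical and are not taken from the repository. The sketch assumes one record per line, written with a single write call and flushed to disk, and a reader that tolerates a final line cut off by an interrupted run.

```python
import json
import os
from pathlib import Path
from typing import Any, Mapping


def append_jsonl_record(path: Path, record: Mapping[str, Any]) -> None:
    """Append one record as a single JSON line (hypothetical helper)."""
    # Serialize first, so a serialization error never leaves a partial line.
    line = json.dumps(record, ensure_ascii=False) + "\n"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode plus one write call keeps each record on its own line.
    with open(path, "a", encoding="utf-8") as f:
        f.write(line)
        f.flush()
        os.fsync(f.fileno())  # push the line to disk before returning


def read_jsonl_records(path: Path) -> list[dict[str, Any]]:
    """Read records back, ignoring a final line that was cut off mid-write."""
    records: list[dict[str, Any]] = []
    with open(path, encoding="utf-8") as f:
        for raw in f:
            if not raw.endswith("\n"):
                break  # incomplete last line from an interrupted append
            if raw.strip():
                records.append(json.loads(raw))
    return records
```

This is only "atomic-ish": a crash can still leave a truncated last line, which is why the reader in this sketch drops it instead of failing.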
ISMCTS Simulation Flow
flowchart TD
ROOT["impl.ismcts_policy"] --> ENC["features.encoder.encode_observation"]
ROOT --> BASE["base policy_fn<br/>root prior/value"]
ROOT --> TREE["tree.ISMCTSTree<br/>root node"]
ROOT --> BELIEF["belief.sample_hidden_state"]
BELIEF --> BEGIN["SearchApi.search_begin"]
BEGIN --> SIM["simulation.run_simulation"]
SIM --> KEY["public_state.information_set_key"]
SIM --> CAND["actions.candidate_action_indices"]
CAND --> PUCT["actions.select_puct_action"]
PUCT --> STEP["SearchApi.search_step"]
STEP --> LEAF["simulation.leaf_value_from_decision"]
LEAF --> VALUE["mcts progress / aux value"]
VALUE --> BACK["actions.backpropagate"]
BACK --> VISITS["actions.visit_distribution"]
VISITS --> DECISION["PolicyDecision"]
Key boundaries:
- impl.py handles the root decision and orchestrates hidden-state sampling.
- simulation.py handles tree traversal and backup.
- actions.py handles action selection and the visit distribution.
- public_state.py defines information-set identity.
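The selection, backup, and visit-distribution steps in the diagram can be sketched as below. The function names mirror the diagram nodes, but the `NodeStats` structure, the signatures, and the `c_puct` default are assumptions made for illustration. The exploration term uses `sqrt(total + 1)`, a common PUCT variant that keeps the prior active on the first visit. Values are treated as coming from a single fixed player's perspective, so per-player sign handling is omitted.

```python
import math
from dataclasses import dataclass, field


@dataclass
class NodeStats:
    """Statistics for one information set (hypothetical structure)."""
    priors: dict[int, float]
    visits: dict[int, int] = field(default_factory=dict)
    value_sums: dict[int, float] = field(default_factory=dict)


def select_puct_action(node: NodeStats, candidates: list[int], c_puct: float = 1.5) -> int:
    """Return the candidate action with the highest Q + U score."""
    total = sum(node.visits.get(a, 0) for a in candidates)

    def score(action: int) -> float:
        n = node.visits.get(action, 0)
        q = node.value_sums.get(action, 0.0) / n if n else 0.0
        u = c_puct * node.priors.get(action, 0.0) * math.sqrt(total + 1) / (1 + n)
        return q + u

    return max(candidates, key=score)


def backpropagate(path: list[tuple[NodeStats, int]], value: float) -> None:
    """Add one simulation result to every (node, action) pair on the path."""
    for node, action in path:
        node.visits[action] = node.visits.get(action, 0) + 1
        node.value_sums[action] = node.value_sums.get(action, 0.0) + value


def visit_distribution(node: NodeStats, candidates: list[int]) -> dict[int, float]:
    """Normalize visit counts into a distribution over the candidates."""
    counts = [node.visits.get(a, 0) for a in candidates]
    total = sum(counts)
    if total == 0:
        # No simulations yet: fall back to a uniform distribution.
        return {a: 1.0 / len(candidates) for a in candidates}
    return {a: c / total for a, c in zip(candidates, counts)}
```

In this sketch the tree would be a mapping from an information-set key (the role of public_state.information_set_key) to `NodeStats`, so simulations that sample different hidden states still share statistics whenever the public state matches.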
Training Data Flow
flowchart LR
JSONL["self-play JSONL<br/>SelfPlayRecord"] --> LOAD["training.data.records<br/>record_from_dict"]
LOAD --> FILTER["usable_search_records<br/>filter_records_by_player"]
FILTER --> COLLATE["search_collate.collate_search_batch"]
COLLATE --> MODEL["Policy/Value model"]
MODEL --> LOSS["policy_value.losses"]
LOSS --> CKPT["policy_value.checkpointing<br/>checkpoint .pt"]
JSONL --> BFILTER["usable_belief_records"]
BFILTER --> BCOLLATE["belief_collate.collate_belief_batch"]
BCOLLATE --> BMODEL["BeliefNet"]
BMODEL --> BLOSS["belief.losses"]
BLOSS --> BCKPT["belief.checkpointing<br/>belief .pt"]
Key boundaries:
- training.targets is the source of truth for the JSONL schema.
- The loader preserves compatibility with older JSONL files.
- collate is the contract point between model input tensors and target tensors.
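The loader and collate boundaries above can be sketched in plain Python. The function names follow the diagram, but the `SelfPlayRecordSketch` fields, the defaults used for older files, and the batch keys are all assumptions. The real code presumably produces tensors, whereas this sketch uses lists so that it runs without extra dependencies.

```python
from dataclasses import dataclass, field
from typing import Any, Mapping, Optional


@dataclass
class SelfPlayRecordSketch:
    """Illustrative record; every field name here is hypothetical."""
    player: int
    observation: list[float]
    visit_policy: Optional[list[float]] = None  # missing in search-free records
    outcome: float = 0.0
    aux_targets: dict[str, float] = field(default_factory=dict)


def record_from_dict(raw: Mapping[str, Any]) -> SelfPlayRecordSketch:
    """Build a record, filling defaults for keys that older JSONL lacks."""
    return SelfPlayRecordSketch(
        player=int(raw["player"]),
        observation=list(raw["observation"]),
        visit_policy=raw.get("visit_policy"),
        outcome=float(raw.get("outcome", 0.0)),
        aux_targets=dict(raw.get("aux_targets") or {}),
    )


def usable_search_records(records: list[SelfPlayRecordSketch]) -> list[SelfPlayRecordSketch]:
    """Keep only records that carry a search policy target."""
    return [r for r in records if r.visit_policy is not None]


def filter_records_by_player(records: list[SelfPlayRecordSketch], player: int) -> list[SelfPlayRecordSketch]:
    """Keep only records whose decision was made by the given player."""
    return [r for r in records if r.player == player]


def collate_search_batch(records: list[SelfPlayRecordSketch]) -> dict[str, list]:
    """Group inputs and targets; the returned keys are the model-facing contract."""
    widths = {len(r.observation) for r in records}
    if len(widths) > 1:
        raise ValueError(f"inconsistent observation widths: {sorted(widths)}")
    return {
        "observations": [r.observation for r in records],
        "policy_targets": [r.visit_policy for r in records],
        "value_targets": [r.outcome for r in records],
    }
```

Keeping the defaults inside `record_from_dict` means the filter and collate steps never need to know which schema version a file was written with.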
Checkpoint Load / Inference Flow
flowchart TD
CKPT["checkpoint .pt"] --> LOAD["load_policy_value_checkpoint<br/>model metadata"]
LOAD --> FACTORY["create_policy_value_model"]
FACTORY --> MODEL["legacy or unified model"]
MODEL --> POLICY["decision.policy.neural_policy"]
POLICY --> SEARCH{"search enabled?"}
SEARCH -- no --> OUT["PolicyDecision"]
SEARCH -- yes --> ISMCTS["search.ismcts.ismcts_policy"]
ISMCTS --> OUT
SERVER["serving.policy_server"] --> POLICY
SUB["submission.main"] --> LOAD
Key boundaries:
- The checkpoint's model_class and model_config determine how the model is restored.
- PolicyDecision is the common return value for the NN-only, search, and remote paths.
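One way to picture how model_class and model_config drive restoration is a small registry-based factory. This is a sketch under stated assumptions: the registry, the two placeholder model classes, their config fields, and the fallback to "legacy" when model_class is absent are all hypothetical. Loading the weights themselves is left out.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping


@dataclass
class LegacyModelSketch:
    """Placeholder for a legacy model; the field is hypothetical."""
    hidden_size: int = 128


@dataclass
class UnifiedModelSketch:
    """Placeholder for a unified model; the fields are hypothetical."""
    hidden_size: int = 256
    num_layers: int = 4


# Hypothetical registry mapping the stored class name to a constructor.
MODEL_REGISTRY: dict[str, Callable[..., Any]] = {
    "legacy": LegacyModelSketch,
    "unified": UnifiedModelSketch,
}


def create_policy_value_model(checkpoint: Mapping[str, Any]) -> Any:
    """Rebuild a model object from checkpoint metadata only."""
    # Assumption: checkpoints without model_class predate the unified model.
    model_class = checkpoint.get("model_class", "legacy")
    model_config = dict(checkpoint.get("model_config") or {})
    try:
        constructor = MODEL_REGISTRY[model_class]
    except KeyError:
        raise ValueError(f"unknown model_class: {model_class!r}") from None
    # A real loader would restore the saved weights into this object next.
    return constructor(**model_config)
```

Because the metadata travels inside the checkpoint, callers such as a server or a submission entry point can share one load path without knowing which architecture a given file holds.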
Evaluation Flow
flowchart TD
CLI["evaluation.tournament.cli"] --> IMPL["evaluation.tournament.impl"]
IMPL --> DECKS["deck pool loading"]
IMPL --> POLICY["policy construction"]
POLICY --> MATCH["run_head_to_head / match loop"]
MATCH --> SUMMARY["summaries.summarize_matches"]
SUMMARY --> JSON["evaluation JSON"]
SUMMARY --> CSV["summary CSV"]
Key boundaries:
- Evaluation measures agent behavior, including the online policy and search, not the checkpoint in isolation.
- The summary reports more than win rate: it also covers result reason, attacks, prizes, and unfinished games.
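A summary that looks beyond win rate can be sketched as follows. The `MatchResultSketch` fields, the output keys, and the choice to compute win rate over finished games only are assumptions for illustration. Attack statistics are omitted to keep the example short.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class MatchResultSketch:
    """Illustrative match result; every field name is hypothetical."""
    winner: Optional[int]  # None when the game did not finish
    result_reason: str     # e.g. how the game ended
    prizes_taken: int      # prizes taken by the evaluated player


def summarize_matches(matches: list[MatchResultSketch], player: int = 0) -> dict[str, Any]:
    """Aggregate win rate together with the reasons behind the results."""
    finished = [m for m in matches if m.winner is not None]
    wins = sum(1 for m in finished if m.winner == player)
    total_prizes = sum(m.prizes_taken for m in matches)
    return {
        "games": len(matches),
        "unfinished": len(matches) - len(finished),
        # Assumption: unfinished games are excluded from the denominator.
        "win_rate": wins / len(finished) if finished else 0.0,
        "result_reasons": dict(Counter(m.result_reason for m in matches)),
        "mean_prizes_taken": total_prizes / len(matches) if matches else 0.0,
    }
```

Reporting the unfinished count next to the win rate matters because a high win rate over only a few finished games can hide an agent that frequently stalls.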