Belief Training Modules

Scope: `pca.training.belief_train`, `pca.training.belief`

Purpose

Trains BeliefNet from the full-observation teacher labels contained in self-play JSONL. The trained BeliefNet serves as the hidden-state sampling prior for ISMCTS.
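As a rough illustration of what "sampling prior" can mean here, the sketch below draws a hidden hand from per-card probabilities without replacement. The function name `sample_hidden_hand`, the card identifiers, and the dictionary layout are hypothetical and not taken from the `pca` codebase; the actual ISMCTS integration may work differently.

```python
import random


def sample_hidden_hand(card_probs, hand_size, rng):
    """Draw `hand_size` distinct cards, weighted by belief probability.

    `card_probs` maps a (hypothetical) card id to a non-negative weight.
    """
    # Copy and drop zero-weight cards so the caller's prior is not mutated.
    pool = {card: p for card, p in card_probs.items() if p > 0}
    if hand_size > len(pool):
        raise ValueError("not enough candidate cards with positive weight")

    hand = []
    for _ in range(hand_size):
        cards = list(pool)
        weights = [pool[card] for card in cards]
        pick = rng.choices(cards, weights=weights, k=1)[0]
        hand.append(pick)
        del pool[pick]  # sample without replacement
    return hand


# Illustrative prior only; real values would come from the trained model.
prior = {"card_a": 0.6, "card_b": 0.3, "card_c": 0.1, "card_d": 0.0}
hand = sample_hidden_hand(prior, hand_size=2, rng=random.Random(0))
```

Passing an explicit `random.Random` instance keeps the draw reproducible, which tends to help when debugging search rollouts.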

Modules

Module	Role	Implementation Details
`pca.training.belief_train`	CLI / facade	Entrypoint for `python -m pca.training.belief_train`; also keeps the old import path working.
`pca.training.belief.config`	config	`BeliefTrainConfig`.
`pca.training.belief.data`	dataloader	Counts, filters, and streaming-loads the records that have a belief target.
`pca.training.belief.losses`	loss	Supervised losses for hand/deck/prize/threat/KO.
`pca.training.belief.runner`	training loop	Model/optimizer loop, metrics, best checkpoint.
`pca.training.belief.checkpointing`	checkpoint payload	BeliefNet checkpoint metadata.
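The filter-and-stream behaviour attributed to the data module can be pictured with a small generator. This is only a sketch: the key name `belief_target` and the record layout are assumptions made for illustration, and the real `pca.training.belief.data` code may use a different schema.

```python
import io
import json


def iter_belief_records(lines, target_key="belief_target"):
    """Yield parsed JSONL records that carry a belief target.

    `target_key` is a hypothetical field name, not confirmed by the source.
    Records are processed one line at a time, so the file is never fully
    loaded into memory.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if record.get(target_key) is not None:
            yield record


# In-memory stand-in for a self-play JSONL file.
sample = io.StringIO(
    '{"step": 0, "belief_target": {"hand": [1, 0]}}\n'
    '{"step": 1}\n'
    '\n'
    '{"step": 2, "belief_target": null}\n'
    '{"step": 3, "belief_target": {"hand": [0, 1]}}\n'
)
kept = [record["step"] for record in iter_belief_records(sample)]
```

Counting usable records is then a matter of `sum(1 for _ in iter_belief_records(f))` over an open file handle.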

Public API

API	Usage
`BeliefTrainConfig`	Belief training configuration.
`train_belief(input_path, output_path, config, best_output_path)`	Trains BeliefNet.
`belief_dataloader(paths, config)`	Iterator over belief batches.
`belief_loss(output, batch, ...)`	Supervised belief loss.
`belief_checkpoint_payload(...)`	Payload to save in a checkpoint.
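One common way to build a multi-head supervised loss is a weighted sum of per-head terms. The sketch below shows that pattern in plain Python; the head names, the choice of binary cross-entropy, and the weights are all assumptions for illustration, and the actual `belief_loss` signature and loss types are not specified in this document.

```python
import math


def binary_cross_entropy(prob, label, eps=1e-12):
    """BCE for one predicted probability and a 0/1 label."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def mean_bce(probs, labels):
    """Average BCE over a list of independent binary targets."""
    return sum(binary_cross_entropy(p, y) for p, y in zip(probs, labels)) / len(probs)


def combined_belief_loss(head_losses, weights):
    """Weighted sum of per-head losses; heads without a weight default to 1.0."""
    return sum(weights.get(name, 1.0) * value for name, value in head_losses.items())


# Hypothetical heads and values, chosen only to make the arithmetic visible.
head_losses = {
    "hand": mean_bce([0.5, 0.5], [1, 0]),
    "ko": binary_cross_entropy(0.5, 1),
}
total = combined_belief_loss(head_losses, weights={"hand": 1.0, "ko": 0.5})
```

Keeping the per-head values in a dictionary also makes it straightforward to log each term as a separate metric.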

CLI Usage

PYTHONPATH=src uv run python -m pca.training.belief_train \
  --input data/selfplay/example.jsonl \
  --output checkpoints/belief.pt \
  --best-output checkpoints/belief_best.pt \
  --epochs 3 \
  --batch-size 64
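The command takes both `--output` and `--best-output`. A typical policy for such a pair, sketched below, is to write the latest weights every epoch and the best weights only when a tracked metric improves. This is an assumption about the behaviour, not a description of the actual runner: the metric, its direction, and the write schedule are all hypothetical here.

```python
def plan_checkpoint_writes(epoch_losses):
    """Return, per epoch, which targets a lower-is-better policy would write.

    "output" and "best_output" are labels standing in for the two CLI paths.
    """
    best = float("inf")
    plan = []
    for epoch, loss in enumerate(epoch_losses, start=1):
        targets = ["output"]  # assumed: latest weights written every epoch
        if loss < best:
            best = loss
            targets.append("best_output")  # only on improvement
        plan.append((epoch, targets))
    return plan


# Three epochs, matching `--epochs 3`; the loss values are made up.
plan = plan_checkpoint_writes([0.9, 0.7, 0.8])
```

With these made-up losses, the third epoch is worse than the second, so only the latest checkpoint would be refreshed.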

Notes

A belief target exists only on records for which the full observation was available during self-play.
At inference time, hidden-state consistency does not rely on BeliefNet alone; its output is combined with the hard constraints from PublicKnowledgeTracker.
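Combining a learned prior with hard constraints usually amounts to zeroing out impossible candidates and renormalizing what remains. The sketch below shows that step in isolation; `apply_hard_constraints` and the `impossible` set are hypothetical stand-ins, since the real PublicKnowledgeTracker interface is not described here.

```python
def apply_hard_constraints(probs, impossible):
    """Zero out candidates ruled out by public knowledge, then renormalize.

    `probs` maps a candidate id to a soft belief probability.
    `impossible` is a set of ids that hard constraints exclude.
    """
    masked = {c: (0.0 if c in impossible else p) for c, p in probs.items()}
    total = sum(masked.values())
    if total <= 0.0:
        # Every candidate was excluded; the caller must pick a fallback.
        raise ValueError("constraints rule out every candidate")
    return {c: p / total for c, p in masked.items()}


# Illustrative numbers only.
belief = {"card_a": 0.5, "card_b": 0.3, "card_c": 0.2}
constrained = apply_hard_constraints(belief, impossible={"card_b"})
```

Applying the mask before sampling guarantees that a sampled hidden state never contradicts public information, however confident the network is.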