Self-Play Modules

Target: pca.training.selfplay

Purpose

Runs CABT self-play and saves one SelfPlayRecord per decision point to JSONL. The package handles checkpoint policies, search policies, rule agents, remote policies, parallel workers, and summary CSVs in one place.

Modules

Module	Role	Implementation Details
`__main__.py`	module entrypoint	Calls `cli.main()` when run as `python -m pca.training.selfplay`.
`__init__.py`	public facade	Re-exports the package's public API.
`cli_args.py`	CLI args	All self-play options and YAML config parsing.
`cli.py`	CLI orchestration	Defaults, attack metadata, output/summary writing.
`impl.py`	collection flow	Single-process collection and worker execution.
`battle.py`	one-game loop	Advances the CABT battle API and collects pending records and game diagnostics.
`parallel.py`	multiprocessing	Worker partitioning, local policy batcher, incremental output.
`policy_factory.py`	policy construction	Builds the player0/player1 policies and the oracle policy from CLI args.
`policies.py`	policy implementations	Checkpoint/search/remote/local batch policies; belief prior source resolution.
`records.py`	record assembly	Facade that converts a `PendingRecord` into a final `SelfPlayRecord`.
`record_targets.py`	training labels	Extracts belief labels, oracle hidden state, and aux prize targets.
`record_metadata.py`	embedded metadata	Embeds visible card / legal attack metadata in the JSONL.
`record_io.py`	JSONL I/O	Write/truncate/append and lock cleanup.
`decks.py`	deck loading	Deck CSV / directory / `DeckSpec`.
`state.py`	observation state	Current player/result helpers.
`agents.py`	rule-agent metadata	Rule-agent assignment summary.
`diagnostic_outcomes.py`	outcome helpers	Result reason, policy target normalization, softmax scores.
`diagnostic_stats.py`	stats collection	Reflects policy/search/CABT runtime stats in the diagnostics.
`diagnostics.py`	log rendering	Assembles the board/perf/search sections.
`summary_stats.py`	summary stats	Aggregates the deck/agent/matchup tables.
`summary_display.py`	console output	Displays the summary tables.
`summary_csv.py`	CSV output	Deck/agent/matchup summary CSV.
`summary.py`	summary facade	Re-exports the summary APIs.
`runner.py`	runner facade	Re-exports the battle runner.
`formatting.py`	log formatting	ANSI color and section helpers.
`types.py`	shared types	`SelfPlayConfig`, `SelfPlaySummary`, `PendingRecord`, `DeckSpec`, worker args。
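
The worker partitioning listed for `parallel.py` can be pictured with a small sketch. This is not the package's implementation: `split_games` is a hypothetical helper, and the even-split policy (no worker gets more than one game above any other, and no idle workers are started) is an assumption made for illustration.

```python
def split_games(total_games: int, workers: int) -> list[int]:
    """Return per-worker game counts that sum to total_games (hypothetical helper)."""
    if total_games < 0 or workers <= 0:
        raise ValueError("total_games must be >= 0 and workers must be > 0")
    # Assumption: never start more workers than there are games.
    active = min(workers, total_games)
    if active == 0:
        return []
    base, extra = divmod(total_games, active)
    # The first `extra` workers take one additional game each.
    return [base + 1 if i < extra else base for i in range(active)]


print(split_games(100, 8))  # [13, 13, 13, 13, 12, 12, 12, 12]
```

With `--games 100 --workers 8`, a split like this would hand 13 games to four workers and 12 to the other four; how the real package assigns games may differ.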

Public API

API	Usage
`run_selfplay_battle(...)`	Returns the records and summary for one game.
`collect_selfplay_games(...)`	Single-process collection.
`collect_selfplay_games_parallel(...)`	Parallel collection across workers.
`make_pending_record(...)`	Builds the intermediate representation of a decision-point record.
`finalize_records(...)`	Finalizes the result/value/aux targets after the game ends.
`write_records_jsonl(...)` / `append_records_jsonl(...)`	JSONL output.
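
The pending-then-finalize pattern behind `make_pending_record(...)` and `finalize_records(...)` can be sketched as follows. The real signatures are elided in the table above, so this sketch does not call them: `PendingSketch`, `FinalSketch`, and `finalize_sketch` are hypothetical stand-ins, and the value scheme (+1 for the winner's decisions, -1 for the loser's, 0 for a draw) is an assumption rather than the package's actual target definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PendingSketch:
    """Hypothetical stand-in for a decision-point record before the game ends."""
    player: int
    action_index: int


@dataclass(frozen=True)
class FinalSketch:
    """Hypothetical stand-in for a finalized record with a value target."""
    player: int
    action_index: int
    value: float


def finalize_sketch(pending: list[PendingSketch], winner: Optional[int]) -> list[FinalSketch]:
    """Attach a value target to each pending record once the result is known."""
    final = []
    for rec in pending:
        if winner is None:
            value = 0.0  # assumed convention for a draw
        else:
            value = 1.0 if rec.player == winner else -1.0
        final.append(FinalSketch(rec.player, rec.action_index, value))
    return final


pending = [PendingSketch(0, 3), PendingSketch(1, 0), PendingSketch(0, 1)]
records = finalize_sketch(pending, winner=0)
print([r.value for r in records])  # [1.0, -1.0, 1.0]
```

The point of the two-stage design is that the value target is unknown while the game is being played, so each decision point is first stored without it and completed only after the result is available.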

CLI Usage

PYTHONPATH=src uv run python -m pca.training.selfplay \
  --config configs/v13/selfplay-v13-ismcts-selfplay.yaml \
  --output data/selfplay/example.jsonl \
  --games 100 \
  --workers 8

Notes

For long runs, use incremental JSONL append. Even if the run is interrupted, the records of completed games are preserved.
records.py is a facade. The actual label extraction and I/O are split out into record_targets.py and record_io.py.
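
The incremental append described in the notes above can be sketched with the standard library alone. This is a minimal illustration, not the code in `record_io.py`: `append_jsonl_sketch` and `read_jsonl_sketch` are hypothetical helpers, the record fields are made up, and the sketch assumes one append call per completed game with no lock handling.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Iterable


def append_jsonl_sketch(path: Path, records: Iterable[dict]) -> int:
    """Append one JSON object per line and force the data to disk."""
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with path.open("a", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")
            count += 1
        f.flush()
        os.fsync(f.fileno())  # completed games survive an interruption
    return count


def read_jsonl_sketch(path: Path) -> list[dict]:
    """Read every non-empty line back as a JSON object."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "example.jsonl"
    append_jsonl_sketch(out, [{"game": 0, "step": 0}, {"game": 0, "step": 1}])
    append_jsonl_sketch(out, [{"game": 1, "step": 0}])
    print(len(read_jsonl_sketch(out)))  # 3
```

Because each call only appends whole lines for a finished game, a reader that stops at the last complete line sees every game that finished before the interruption.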