Source Design and API Guide¶

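Last updated: 2026-07-05

This document is the entry point to the implementation under src/pca/, written so that a first-time reader can find where things live and which API to use. For background on the method itself, see current-method.md; for a short listing of the code layout, see directory-layout.md. Detailed per-module descriptions are split out into modules/index.md. For a first-read order see modules/reading-guide.md, for processing flow diagrams see modules/flows.md, and for change procedures see modules/recipes.md.

The CABT API module is the source of truth for the API specification of CABT itself. It uses Enums / Data Classes / Functions as its top-level sections and lists AreaType, OptionType, Observation, search_begin(), and so on. This document follows the same layout and catalogs the main dataclasses, functions, CLIs, and the JSONL schema.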

Overview¶

src/pca/ is the implementation of the agent for the Pokemon TCG AI Battle Challenge. Broadly, it consists of the CABT boundary, feature encoding, decision making, search, models, training, evaluation, and the submission bundle.

src/pca/
├─ cabt/          # CABT API adapter / card metadata
├─ features/      # token/object encoding of observations and legal options
├─ decision/      # policy functions, checkpoint inference, remote/batched policy
├─ search/        # determinized search, ISMCTS, belief sampling
├─ models/        # Policy/Value Net, BeliefNet
├─ training/      # self-play JSONL generation, dataset, training loop
├─ evaluation/    # tournament / deck-pool evaluation
├─ rule_agents/   # rule-based teacher / ported agents
├─ serving/       # policy server
└─ submission/    # Kaggle submission entrypoint / bundle builder

A typical single move flows as follows.

1. CABT returns the public observation and select.option.
2. features.encoder.encode_observation() builds the state/history/action tokens.
3. A policy function from decision.policy returns a PolicyDecision.
4. search.ismcts.ismcts_policy() samples hidden states and runs PUCT simulations.
5. The CABT option index is chosen from the visit distribution of the search.

Data Model¶

PolicyDecision¶

Implementation: pca.decision.policy.PolicyDecision

The policy output for one decision point. NN-only, rule agent, remote policy, and ISMCTS all ultimately return this shape.

Field	Type	Description
`selected`	`list[int]`	Indices into CABT `select.option`. Respects `min_count` / `max_count`.
`scores`	`list[float]`	Logit / score for each legal option. Length equals the action count.
`value`	`float | None`	Position value from the perspective of the player to move.
`target_policy`	`list[float] | None`	Explicit policy target for self-play training, such as the ISMCTS visit distribution.
`meta`	`dict[str, Any]`	Additional information such as auxiliary heads, teacher weight, and rule-agent details.
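The field table can be read as a small dataclass. The following is a minimal sketch, not the definition in pca.decision.policy: the class name `PolicyDecisionSketch`, the default values, and the helper `check_decision()` are hypothetical, and the assumption that `target_policy` has one entry per legal option is mine.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class PolicyDecisionSketch:
    """Illustrative stand-in mirroring the documented PolicyDecision fields."""

    selected: list[int]
    scores: list[float]
    value: float | None = None  # default is an assumption
    target_policy: list[float] | None = None  # default is an assumption
    meta: dict[str, Any] = field(default_factory=dict)


def check_decision(decision, action_count, min_count, max_count):
    """Return a list of consistency problems; an empty list means none found."""
    problems = []
    if len(decision.scores) != action_count:
        problems.append("scores length != action_count")
    if not (min_count <= len(decision.selected) <= max_count):
        problems.append("selected count outside [min_count, max_count]")
    if any(i < 0 or i >= action_count for i in decision.selected):
        problems.append("selected index out of range")
    # Assumption: an explicit policy target has one entry per legal option.
    if decision.target_policy is not None and len(decision.target_policy) != action_count:
        problems.append("target_policy length != action_count")
    return problems


decision = PolicyDecisionSketch(selected=[0], scores=[0.8, 0.2], value=0.1)
print(check_decision(decision, action_count=2, min_count=1, max_count=1))
```

A check like this is mainly useful in tests, where every policy source (NN-only, rule agent, remote, ISMCTS) can be run through the same assertions.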

EncodedObservation¶

Implementation: pca.features.encoder.EncodedObservation

A CABT observation converted into a token/object form that is easy to train on and run inference with. Keeping state, history, and action separate lets the dynamic set of legal actions be handled in an action-conditioned way.

Field	Description
`state_tokens`	Token sequence for the public board state.
`history_tokens`	Token sequence for the log history.
`action_tokens`	Token sequence for each legal option.
`state_card_ids` / `action_card_ids` / `history_card_ids`	Card IDs for the static card embedding.
`state_objects` / `action_objects` / `history_objects`	Object rows for the unified model.
`min_count` / `max_count`	Number of selections required by the CABT selection.
`action_count`	Number of legal options.
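To illustrate how `min_count` / `max_count` constrain a selection, here is a minimal sketch that picks the highest-scoring option indices within the allowed range. The function name `select_options_sketch` and the rule "take as few options as allowed, but at least one when permitted" are assumptions for illustration, not the repository's selection logic.

```python
def select_options_sketch(scores, min_count, max_count):
    """Pick the highest-scoring option indices within the allowed count range."""
    # Assumption: choose as few options as allowed, but at least one when permitted.
    count = min(max(min_count, 1), max_count, len(scores))
    # sorted() is stable, so ties keep their original option order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:count]


print(select_options_sketch([0.1, 0.9, 0.5], min_count=1, max_count=1))
print(select_options_sketch([0.1, 0.9, 0.5], min_count=2, max_count=3))
```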

SelfPlayRecord¶

Implementation: pca.training.targets.SelfPlayRecord

Corresponds to one line of the self-play JSONL. One record is one decision point, not one game.

Field	Type	Description
`search`	`SearchTrainingTarget`	Observation, action, and search target for policy/value training.
`belief`	`BeliefTrainingTarget | None`	Hidden-state teacher labels obtained from the full observation.
`aux`	`AuxPrizeTrainingTarget | None`	Auxiliary prize-taking target built from the completed trajectory.
`meta`	`dict[str, Any]`	Game id, result, deck name, teacher weight, diagnostic values, and so on.

Hidden State¶

Implementation: pca.search.belief.HiddenStateSample

A concrete assignment of the hidden zones, passed to the CABT search API during ISMCTS determinization.

Field	Description
`your_deck` / `your_prize`	Your own hidden deck / prize.
`opponent_deck` / `opponent_prize` / `opponent_hand`	Sample of the opponent's hidden zones.
`opponent_active`	Opponent active, used for full-observation oracle targets.

Package APIs¶

`pca.cabt`¶

The boundary between the official CABT API and this repository's internal representation.

API	Description
`CABTSearchApi.from_import()`	Imports the official `cg.api` and builds a search API adapter.
`CABTSearchApi.search_begin(obs, hidden)`	Starts a CABT search session with the hidden state injected.
`CABTSearchApi.search_step(state, selected)`	Advances one action within a search session.
`CABTSearchApi.search_release(state)`	Releases the search state.
`has_search_input(obs)`	Checks whether the observation carries the hidden input required by the search API.
`load_card_database(path)`	Reads static card metadata from `EN_Card_Data.csv`.

For the exact fields of CABT's Observation, SelectData, Option, and SearchState, see the official CABT API. Because this implementation handles both dicts and dataclasses, use features.encoder.get_value() at the boundary.
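A dict/dataclass-tolerant accessor can be sketched as below. This is only an illustration of the idea behind features.encoder.get_value(); the name `get_value_sketch`, its signature, and the `default` parameter are assumptions, and the real helper may behave differently.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Any


def get_value_sketch(obj: Any, name: str, default: Any = None) -> Any:
    """Read `name` from a mapping or an attribute-style object."""
    if obj is None:
        return default
    if isinstance(obj, Mapping):
        return obj.get(name, default)
    return getattr(obj, name, default)


@dataclass
class FakeOption:
    """Hypothetical option object used only for this demonstration."""

    type: int


print(get_value_sketch({"type": 3}, "type"))
print(get_value_sketch(FakeOption(type=3), "type"))
print(get_value_sketch(FakeOption(type=3), "missing", default=-1))
```

Funnelling every field access through one helper keeps the dict-versus-dataclass distinction out of the encoder and search code.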

`pca.features`¶

Feature encoding of observations, options, and history.

API	Description
`encode_observation(obs, card_db=None)`	Converts a CABT observation into an `EncodedObservation`.
`token(kind, value)`	Produces a stable token id.
`encode_static_card_features(card, vocab)`	Tokenizes static card metadata.
`encode_static_attack_features(attack, vocab)`	Tokenizes attack metadata.
`get_active_vocab()`	Returns the current vocabulary.

Design points:

Legal actions are not fixed classes; each option is represented as action_tokens[i].
The v13 unified model uses object rows in addition to tokens.
Static card / attack metadata can be embedded in the JSONL, so training is reproducible even without the CSV.

`pca.decision`¶

The layer that builds policy functions. The type of a policy function is roughly Callable[[Any], PolicyDecision].

API	Description
`deterministic_token_policy(obs)`	Deterministic fallback based on the encoder score.
`neural_policy(obs, model, torch, ...)`	Runs single-observation inference with a checkpoint model.
`neural_policy_batch(observations, model, torch, ...)`	Runs batched inference over multiple observations.
`remote_policy(url, timeout)`	Delegates inference to an HTTP policy server.
`batched_policy_client(...)`	Builds a policy function that goes through the multiprocessing local batcher.
`with_heuristic_reranker(policy, config)`	Combines a heuristic reranker with the scores of a base policy.

PolicyDecision.value is from the perspective of the player to move. The search side flips its sign where needed to obtain the root player's perspective.
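The perspective conversion can be sketched as follows, assuming a two-player zero-sum value where one player's value is the negation of the other's. The function name `value_for_root` and the integer player indices are hypothetical.

```python
def value_for_root(value, player_to_move, root_player):
    """Convert a player-to-move value into the root player's perspective."""
    if value is None:
        return None
    # Assumption: zero-sum game, so the opponent's value is the negation.
    return value if player_to_move == root_player else -value


print(value_for_root(0.3, player_to_move=0, root_player=0))
print(value_for_root(0.3, player_to_move=1, root_player=0))
```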

`pca.search`¶

Search and hidden-state sampling.

`pca.search.mcts`¶

Value shaping and legacy root rollout utilities.

API	Description
`SearchValueConfig`	Weight settings for search leaf / action delta / aux prize.
`search_value_config_for_profile(name)`	Presets such as `current`, `v12_prize_race`, and `v13_aux_prize_race`.
`terminal_value_for_player(result, player, ...)`	Converts a win/loss result into a value from the root player's perspective.
`progress_value_for_player(obs, player, config)`	Leaf value derived from prize and board progress.
`aux_prize_value_from_decision(obs, decision, player, config)`	Converts the aux prize heads into a search value.

`pca.search.belief`¶

Priors and hard constraints for hidden information.

API	Description
`BeliefPrior`	Priors for hand / deck / prize / next-threat / knockout.
`belief_prior_from_model(obs, model, torch, ...)`	Builds a prior from BeliefNet.
`belief_prior_from_output(output, ...)`	Builds a prior from a model output dict.
`PublicKnowledgeTracker.update(obs)`	Updates the confirmed hidden constraints from the public log.
`sample_hidden_state(obs, your_deck, opponent_deck, rng, ...)`	Builds a determinization for ISMCTS.

BeliefNet does not fix the contents of the hidden zones. Its output is passed to sample_hidden_state() as a soft prior, and information that the public log confirms is applied on top by PublicKnowledgeTracker as hard constraints.
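The interplay of a soft prior and hard constraints can be sketched for a single hidden zone. This is a simplified illustration, not sample_hidden_state(): the function `sample_opponent_hand`, its parameters, and the weighting scheme (per-card weights, default weight 1.0, sequential sampling without replacement) are all assumptions.

```python
import random
from collections import Counter


def sample_opponent_hand(unseen_pool, hand_size, prior_weights, known_in_hand, rng):
    """Sample a hand: confirmed cards first, the rest weighted by the soft prior."""
    if len(known_in_hand) > hand_size:
        raise ValueError("more confirmed cards than hand slots")
    pool = Counter(unseen_pool)
    hand = []
    # Hard constraints: cards the public log confirms are always placed.
    for card in known_in_hand:
        if pool[card] <= 0:
            raise ValueError(f"confirmed card {card} is not in the unseen pool")
        pool[card] -= 1
        hand.append(card)
    remaining = list(pool.elements())
    # Soft prior: fill the remaining slots by weighted draws without replacement.
    while len(hand) < hand_size and remaining:
        weights = [max(prior_weights.get(c, 1.0), 1e-6) for c in remaining]
        idx = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        hand.append(remaining.pop(idx))
    return hand, remaining


rng = random.Random(0)
hand, rest = sample_opponent_hand(
    unseen_pool=[1, 1, 2, 3, 4],
    hand_size=3,
    prior_weights={2: 5.0},  # hypothetical prior: card 2 is likely in hand
    known_in_hand=[1],       # hypothetical constraint from the public log
    rng=rng,
)
print(hand, rest)
```

The cards left in `rest` would then be distributed over the other hidden zones (deck, prize) in the same manner.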

`pca.search.ismcts`¶

The current search implementation. pca.search.ismcts is a package facade with compatibility re-exports; the actual code is split by responsibility.

Module	Role
`config.py`	`ISMCTSConfig`, `SearchApi` protocol, session cleanup.
`tree.py`	`ISMCTSNode`, `ISMCTSTree`, information-set node cache.
`runtime_stats.py`	ISMCTS diagnostics counters.
`simulation_types.py`	Internal dataclasses for deferred leaves / completed simulations.
`simulation.py`	Hidden state clone, leaf value, simulation loop, leaf batching.
`actions.py`	PUCT, candidate pruning, visit distribution, action category.
`public_state.py`	Public information key / public token helpers.
`impl.py`	Root policy orchestration.
`policy.py` / `types.py`	Compatibility facades.
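Session cleanup matters because every search_begin() needs a matching search_release(), even when a simulation fails midway. One way to guarantee that is a context manager, sketched below. `FakeSearchApi` is a hypothetical stand-in that only records calls; the return values of the real adapter and the helper name `search_session` are assumptions.

```python
from contextlib import contextmanager


class FakeSearchApi:
    """Hypothetical stand-in that only records calls; not the real adapter."""

    def __init__(self):
        self.calls = []

    def search_begin(self, obs, hidden):
        self.calls.append("begin")
        return {"obs": obs, "hidden": hidden, "steps": 0}

    def search_step(self, state, selected):
        self.calls.append("step")
        state["steps"] += 1
        return state

    def search_release(self, state):
        self.calls.append("release")


@contextmanager
def search_session(api, obs, hidden):
    """Open a search session and always release it on exit."""
    state = api.search_begin(obs, hidden)
    try:
        yield state
    finally:
        api.search_release(state)


api = FakeSearchApi()
try:
    with search_session(api, obs={"turn": 1}, hidden={"opponent_hand": [10]}) as state:
        api.search_step(state, [0])
        raise RuntimeError("simulated failure mid-simulation")
except RuntimeError:
    pass
print(api.calls)  # ['begin', 'step', 'release']
```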

Main APIs:

API	Description
`ISMCTSConfig`	Number of determinizations, number of simulations, candidate action limit, Dirichlet noise, leaf batching, and so on.
`ISMCTSRuntimeStats`	Runtime counters fed into the self-play diagnostics.
`ismcts_policy(obs, your_full_deck, opponent_prior_deck, policy_fn, search_api, config, ...)`	Decides one move with ISMCTS and belief sampling.
`ismcts_policy_with_hidden(obs, hidden, policy_fn, search_api, config, ...)`	For training-only oracle targets. Searches with the hidden state fixed.
`candidate_action_indices(obs, node, encoded, limit, ...)`	Candidate action pruning for root / non-root nodes.
`visit_distribution(node, action_count, temperature)`	Converts visit counts into a policy target.
`leaf_value_from_decision(obs, decision, root_player, value_config, stats)`	Leaf NN value plus progress / aux shaping.

ismcts_policy() runs in the following order.

1. Encode the root observation and obtain the root prior/value from the base policy_fn.
2. Sample a hidden state for each determinization using PublicKnowledgeTracker and BeliefPrior.
3. Pass the hidden state to CABT search_begin() and run the tree simulations.
4. At a leaf, do not roll out; combine the NN value with the progress value instead.
5. Return the visit distribution of the root node as target_policy.
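Two pieces of the steps above lend themselves to a short sketch: PUCT scoring during simulation and turning root visit counts into a policy target. Both functions below are illustrative assumptions in the common AlphaZero-style form; they take plain lists and numbers rather than an ISMCTSNode, and the constant `c_puct=1.5` is a placeholder, not the repository's setting.

```python
import math


def puct_score(q, prior, parent_visits, child_visits, c_puct=1.5):
    """AlphaZero-style PUCT: exploitation term plus prior-weighted exploration."""
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)


def visit_distribution_sketch(visit_counts, temperature=1.0):
    """Turn visit counts into a probability vector, sharpened by temperature."""
    n = len(visit_counts)
    if n == 0:
        return []
    if sum(visit_counts) <= 0:
        return [1.0 / n] * n  # no visits yet: fall back to uniform
    if temperature <= 1e-6:
        best = max(range(n), key=lambda i: visit_counts[i])
        return [1.0 if i == best else 0.0 for i in range(n)]
    # Normalize by the maximum first so large exponents cannot overflow.
    top = max(visit_counts)
    powered = [(c / top) ** (1.0 / temperature) for c in visit_counts]
    z = sum(powered)
    return [p / z for p in powered]


print(visit_distribution_sketch([30, 10], temperature=1.0))
print(visit_distribution_sketch([30, 10], temperature=0.0))
print(puct_score(q=0.0, prior=0.5, parent_visits=4, child_visits=1, c_puct=2.0))
```

With temperature 1.0 the target is proportional to the visit counts; as the temperature approaches 0 it collapses onto the most visited option.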

`pca.models`¶

Torch model definitions.

API	Description
`BeliefNet`	Supervised model that predicts opponent hand / deck / prize / threat / KO.
`ActionConditionedPolicyValueNet`	Legacy policy/value model.
`UnifiedTokenPolicyValueNet`	v13 unified model. Integrates dynamic state, history, action, and static card/attack features.
`create_policy_value_model(model_config, model_class)`	Factory that builds the matching model from checkpoint metadata.
`load_policy_value_checkpoint(path, torch, ...)`	Restores a checkpoint into a model and its metadata.

The v13 unified model can emit policy logits, win value, turn value, aux prize heads, and integrated belief heads from a shared transformer trunk. See unified-token-policy-value.md for details.

`pca.training`¶

Self-play collection, the JSONL dataset, policy/value training, and belief training.

`pca.training.selfplay`¶

Module	Role
`cli_args.py` / `cli.py`	CLI for `python -m pca.training.selfplay`.
`battle.py`	Runs one game of CABT self-play.
`parallel.py`	Workers, local policy batcher, incremental JSONL output.
`policy_factory.py`	Builds the player0/player1 policies and the oracle policy from the CLI args.
`policies.py`	Checkpoint policy, search policy, belief prior resolution.
`records.py`	Facade that assembles a `SelfPlayRecord` from a `PendingRecord`.
`record_targets.py`	Belief target / aux prize target / oracle hidden state.
`record_metadata.py`	Embeds card / attack metadata in the JSONL.
`record_io.py`	JSONL write / append / truncate / lock cleanup.
`summary_*`	Deck / agent / matchup summaries and CSV output.
`diagnostic_*`	Result, policy target, and runtime stats diagnostics.

Main APIs:

API	Description
`run_selfplay_battle(api, deck0, deck1, policy_factory, config, ...)`	Runs one game and returns the records and summary.
`collect_selfplay_games(...)`	Single-process self-play.
`collect_selfplay_games_parallel(...)`	Worker-parallel self-play.
`make_pending_record(obs, decision, ...)`	Builds the intermediate record for a decision point.
`finalize_records(pending, result, ...)`	Fills in final_result / aux target after the game ends.
`write_records_jsonl(records, path)`	Writes the self-play JSONL.

`pca.training.data`¶

Converts the JSONL into training batches.

API	Description
`load_records_jsonl(path)`	Loads a list of `SelfPlayRecord`.
`iter_records_jsonl(path)`	Streaming iterator.
`usable_search_records(records)`	Extracts records whose search target and action are valid.
`usable_belief_records(records)`	Extracts records whose belief target is valid.
`collate_search_batch(records, ...)`	Builds a `SearchBatch`.
`collate_belief_batch(records, ...)`	Builds a `BeliefBatch`.
`record_policy_weight(record, ...)`	Policy weight derived from teacher quality / low progress / deck-out.

`pca.training.policy_value`¶

Policy/Value training.

API	Description
`TrainConfig`	Epochs, batch size, loss weights, model class, metadata path, and so on.
`train_policy_value(input_path, output_path, config, best_output_path)`	Training loop. Also re-exported from `pca.training.train`.
`policy_value_loss(output, batch, ...)`	Integrated policy/value/aux/belief loss.
`resolve_model_config(...)`	Decides the model config from the input JSONL / checkpoint / CLI.
`load_unified_feature_tables_from_jsonl(...)`	Builds the static feature table from metadata embedded in the JSONL.
`policy_value_checkpoint_payload(...)`	Builds the checkpoint payload.

`pca.training.belief`¶

BeliefNet training.

API	Description
`BeliefTrainConfig`	Belief training config.
`train_belief(input_path, output_path, config, best_output_path)`	BeliefNet training loop.
`belief_loss(output, batch, ...)`	Hand / deck / prize / threat / KO loss.
`belief_dataloader(paths, config)`	Streaming dataloader for belief records.

`pca.evaluation`¶

Deck-versus-deck evaluation, policy comparison, and CSV summaries.

API	Description
`MatchConfig`	Settings for one matchup.
`run_head_to_head(...)`	Runs deck0 vs deck1 for the specified number of games.
`run_against_deck_pool(...)`	Evaluates the combinations of the own deck pool and the opponent deck pool.
`summarize_matches(matches)`	Aggregates win rate, unfinished games, attacks, prizes, and so on.
`write_csv_summary(...)`	Writes the evaluation summary CSV.

The CLI is python -m pca.evaluation.tournament.

`pca.rule_agents`¶

Rule-based teachers and ported agents.

API	Description
`RuleAgentRegistry`	Reads agent specs and deck compatibility from a YAML registry.
`rule_pool_policy_factory(...)`	Picks a rule agent that fits the deck and builds a policy function.
`GenericAdvancedHeuristicAgent`	Fallback / general-purpose rule agent.
`PortedRuleAgent`	Base class that wraps a notebook-derived agent as an in-repository agent.

In self-play these are used via --policy rule-pool, or asymmetrically via --policy0 rule-pool --policy1 search.

`pca.serving`¶

API	Description
`policy_server.py`	Serves checkpoint policy/value inference over HTTP. Use it to separate Mac MPS / CUDA inference from the self-play workers.

`pca.submission`¶

API	Description
`submission.main`	Kaggle submission entrypoint. Returns the selected option index from a CABT observation.
`submission.build_bundle`	Packs `src/pca`, the checkpoint, and related files into the submission bundle.

Module Reference¶

This section lists each module under src/pca/ as an entry point into the implementation. Where Package APIs explains what to use from the outside, this section shows what each file is responsible for internally.

CABT Boundary¶

Module	Implementation Notes
`pca.cabt.card_db`	Defines `CardStaticFeatures` / `AttackStaticFeatures` / `CardDatabase`, derived from `EN_Card_Data.csv`. Handles the source data for the static card embedding and for filling in JSONL metadata.
`pca.cabt.schema`	Boundary module for lightweight access to the official API's enum values, such as CABT's `AreaType` / `OptionType` / `SelectContext`.
`pca.cabt.search_api`	Wraps the official `cg.api.search_begin/search_step/search_release/search_end` to match the `SearchApi` protocol. Search stats are also measured here.
`pca.cli_config`	Shared parser that merges CLI arguments with a YAML config. The `--config` option of self-play, train, and evaluation goes through here.
`pca.data.deck_pool`	Handles loading and weighting of deck CSVs and deck pools. Used for deck sampling in evaluation / self-play.

Features¶

Module	Implementation Notes
`pca.features.encoder`	The central module that converts the CABT observation, logs, players, pokemon, and legal options into tokens / object rows. The dict/dataclass-tolerant `get_value()` also lives here.
`pca.features.vocab`	Manages the token vocabulary and the static feature vocabulary. Backs the stable ID generation of `token(kind, value)`.

Decision¶

Module	Implementation Notes
`pca.decision.policy`	Implements `PolicyDecision`, policy runtime stats, the deterministic fallback, NN inference, batch inference, and the policy adapter from a checkpoint model.
`pca.decision.batched_policy`	Batches local policy/value inference through a multiprocessing queue. Used to consolidate NN forward passes when there are many self-play workers.
`pca.decision.remote_policy`	Adapter that sends the observation to an HTTP policy server and restores a `PolicyDecision`. Intended for moving Mac MPS / GPU inference into a separate process.
`pca.decision.reranker`	Experimental reranker that mixes heuristic scores into the policy scores. Handles auxiliary scores such as attack-ready, bench safety, and prize race.

Search¶

Module	Implementation Notes
`pca.search.mcts`	Shared components for search value shaping. Implements terminal value, progress value, selected action delta, aux prize value, and the legacy root rollout.
`pca.search.belief`	Implements the priors for hidden hand/deck/prize, hard constraints from the public log, and hidden-state sampling. `PublicKnowledgeTracker` lives here.
`pca.search.determinized`	The old determinized search. A compatibility path that fixes one hidden state at a time and runs a root rollout.

ISMCTS¶

Module	Implementation Notes
`pca.search.ismcts.config`	Defines `ISMCTSConfig`, the `SearchApi` protocol, and CABT search session cleanup. This is the authoritative location of the search settings dataclass.
`pca.search.ismcts.tree`	Defines `ISMCTSNode` and `ISMCTSTree`. Responsible for the node cache keyed by public information, the encode cache, and node creation/reuse.
`pca.search.ismcts.runtime_stats`	A large stats dataclass that aggregates ISMCTS runtime diagnostics. Records depth, stop reason, candidate pruning, and aux prize usage.
`pca.search.ismcts.simulation_types`	Internal dataclasses for leaf batching. Holds `DeferredLeafEvaluation` and `CompletedSimulation`.
`pca.search.ismcts.simulation`	Implements tree traversal, leaf value, backpropagation, deferred leaf batching, and hidden state cloning for one simulation.
`pca.search.ismcts.actions`	Handles PUCT action selection, Dirichlet noise, candidate pruning, option equivalence, visit distribution, and the final selected action.
`pca.search.ismcts.public_state`	Builds the information-set key and public-state tokens from the public observation. Hidden zones are reduced to counts / placeholder tokens.
`pca.search.ismcts.impl`	Root orchestration for `ismcts_policy()` / `ismcts_policy_with_hidden()`. Responsible for hidden sampling, root node initialization, invoking the simulation loop, and building the final `PolicyDecision`.
`pca.search.ismcts.policy`	Thin facade for the policy entrypoint.
`pca.search.ismcts.types`	Facade for old import compatibility. Re-exports config / runtime_stats / tree / simulation_types.

Models¶

Module	Implementation Notes
`pca.models.belief`	`BeliefNet`. Produces supervised outputs for opponent hand/deck/prize/threat/KO from public state tokens.
`pca.models.policy_value`	Legacy `ActionConditionedPolicyValueNet`. Encodes state/history/action tokens separately and emits a score per legal option.
`pca.models.policy_value_unified.model`	The v13 `UnifiedTokenPolicyValueNet` itself. Ties together the unified object token trunk and the policy/value/aux/belief heads.
`pca.models.policy_value_unified.factory`	Factory that builds a legacy/unified model from the model config. The entry point for compatibility-aware checkpoint loading.
`pca.models.policy_value_unified.tokens`	Converts tokens / objects into tensors for the unified model.
`pca.models.policy_value_unified.card_static`	Converts the static card / attack feature table into embedding inputs.
`pca.models.policy_value_unified.heads`	Small building blocks for the policy/value/aux/belief heads.

Training Targets and Data¶

Module	Implementation Notes
`pca.training.targets`	Dataclasses for the JSONL schema. Defines `SearchTrainingTarget`, `BeliefTrainingTarget`, `AuxPrizeTrainingTarget`, and `SelfPlayRecord`.
`pca.training.dataset`	Compatibility facade for `pca.training.data`. Re-exports the main loaders/collate functions to keep old imports working.
`pca.training.data.records`	Restores `SelfPlayRecord` from the JSONL. Fills missing fields in old JSONL files with defaults.
`pca.training.data.search_collate`	Builds the `SearchBatch` for policy/value training. Combines action padding, policy target, and aux/belief integrated targets.
`pca.training.data.belief_collate`	Builds the `BeliefBatch` for belief training. Converts hidden card ids into multi-hot targets.
`pca.training.data.collate_utils`	Shared batching helpers such as multi-hot encoding, card id to index mapping, and object row padding.
`pca.training.data.weights`	Implements record weighting such as teacher policy weight, passive deck-out, and low-progress downweighting.
`pca.training.data.types`	`SearchBatch` / `BeliefBatch` dataclasses.

Self-Play¶

Module	Implementation Notes
`pca.training.selfplay.__main__`	Entrypoint for `python -m pca.training.selfplay`.
`pca.training.selfplay.__init__`	Re-exports the package public API. Preserves existing import compatibility.
`pca.training.selfplay.cli_args`	Self-play CLI argument definitions. The long option groups and YAML config parsing are gathered here.
`pca.training.selfplay.cli`	Thin orchestration of the CLI run. Handles defaults, attack metadata resolution, and writing the output summary.
`pca.training.selfplay.impl`	Self-play orchestration for single-process and worker runs. Responsible for the game loop, callbacks, and summary collection.
`pca.training.selfplay.battle`	The body of a single game. Returns actions to the CABT battle API and collects pending records and game diagnostics.
`pca.training.selfplay.parallel`	Manages multiprocessing workers, the local policy batcher process, and incremental JSONL append.
`pca.training.selfplay.policy_factory`	Builds the player0/player1 policy factories and the oracle policy factory from the CLI args.
`pca.training.selfplay.policies`	Implements the deterministic/checkpoint/search/remote policies, belief prior sources, the NN cache, and the local batch policy.
`pca.training.selfplay.records`	Record assembly facade that finalizes a `PendingRecord` into a `SelfPlayRecord`. The teacher weight and final result are also filled in here.
`pca.training.selfplay.record_targets`	Extracts belief labels, the oracle hidden state, and aux prize labels from the full observation.
`pca.training.selfplay.record_metadata`	Embeds visible card / legal attack metadata in the JSONL.
`pca.training.selfplay.record_io`	Write/truncate/append for the records JSONL and lock file cleanup. Responsible for the safety of parallel stream output.
`pca.training.selfplay.decks`	Loads deck CSVs / deck directories and builds deck specs.
`pca.training.selfplay.state`	Small helper that reads the current player/result from an observation.
`pca.training.selfplay.agents`	Rule-agent assignment summary and mixed-policy helpers.
`pca.training.selfplay.diagnostic_outcomes`	Outcome helpers such as result reason, policy target normalization, and softmax score.
`pca.training.selfplay.diagnostic_stats`	Merges policy/search/CABT stats into the self-play diagnostics dict.
`pca.training.selfplay.diagnostics`	Text rendering of diagnostics; assembles the board/perf/search sections.
`pca.training.selfplay.summary_stats`	Aggregation dataclasses and table builders for deck / agent / matchup summaries.
`pca.training.selfplay.summary_display`	Console summary rendering.
`pca.training.selfplay.summary_csv`	Summary CSV writers.
`pca.training.selfplay.summary`	Compatibility facade for the summary public API.
`pca.training.selfplay.runner`	Runner facade.
`pca.training.selfplay.formatting`	ANSI color, log block/section, color toggle.
`pca.training.selfplay.types`	Shared self-play types such as `SelfPlayConfig`, `SelfPlaySummary`, `PendingRecord`, `DeckSpec`, and worker args.

Policy/Value Training¶

Module	Implementation Notes
`pca.training.train`	Policy/Value training CLI and facade for `train_policy_value()`. Also re-exports the public helpers of the policy_value package for compatibility.
`pca.training.policy_value.config`	`TrainConfig` and `SearchDataSummary`. Training settings and the data count summary.
`pca.training.policy_value.data`	Implements the search record prescan, the streaming dataloader, and the data summary.
`pca.training.policy_value.losses`	Combines policy loss, value loss, turn value, aux prize, and integrated belief loss.
`pca.training.policy_value.metadata`	Reads the card/attack metadata embedded in the JSONL and builds the unified static feature table and cache.
`pca.training.policy_value.model_config`	Decides the model config from the checkpoint / input JSONL / CLI args and performs compatibility-aware state dict loading.
`pca.training.policy_value.checkpointing`	Packs the checkpoint payload, training config, and model metadata into the saved format.
`pca.training.policy_value.runtime`	Device selection, AMP, memory log, progress line, LR schedule helpers.

Belief Training¶

Module	Implementation Notes
`pca.training.belief_train`	BeliefNet training CLI and compatibility facade.
`pca.training.belief.config`	`BeliefTrainConfig`.
`pca.training.belief.data`	Filtering, counting, and streaming dataloader for belief records.
`pca.training.belief.losses`	Supervised loss for hand/deck/prize/threat/KO.
`pca.training.belief.runner`	BeliefNet training loop and best checkpoint saving.
`pca.training.belief.checkpointing`	Belief checkpoint payload.

Evaluation¶

Module	Implementation Notes
`pca.evaluation.tournament.__main__`	Entrypoint for `python -m pca.evaluation.tournament`.
`pca.evaluation.tournament.impl`	The body of the evaluation CLI, match orchestration, policy construction, and the deck-pool loop.
`pca.evaluation.tournament.types`	Evaluation result dataclasses such as `MatchConfig`, `MatchSummary`, and `DeckEvaluation`.
`pca.evaluation.tournament.summaries`	Aggregation of match summaries, CSV row construction, and win/loss/unfinished metrics.
`pca.evaluation.tournament.agents`	Agent/policy factory facade for evaluation.
`pca.evaluation.tournament.deck_pool`	Deck-pool evaluation facade.
`pca.evaluation.tournament.matches`	Match execution facade.
`pca.evaluation.tournament.cli`	CLI facade.
`pca.evaluation.bundle_loader`	Helper that loads a submission bundle for local evaluation.
`pca.evaluation.bundle_battle`	CLI that runs battles using a bundle agent.

Rule Agents¶

Module	Implementation Notes
`pca.rule_agents.base`	Base protocol / dataclasses for rule agents. Holds the shared types for agent assignment and score entries.
`pca.rule_agents.generic`	General-purpose heuristic rule agent. Used as the fallback teacher.
`pca.rule_agents.heuristics`	Fine-grained heuristics for board evaluation and candidate action evaluation used by rule agents.
`pca.rule_agents.policy`	Converts a rule agent into a policy function that returns a `PolicyDecision`. Also attaches teacher metadata.
`pca.rule_agents.ported`	Base class for porting notebook-derived agents into package agents.
`pca.rule_agents.registry`	Reads agent specs, deck compatibility, and deck CSVs from the YAML registry.
`pca.rule_agents.agents.*`	Agent implementations for individual decks / strategies. Each package exports `Agent`.

Serving and Submission¶

Module	Implementation Notes
`pca.serving.policy_server`	Server that provides Policy/Value inference at HTTP `/predict`. Used by self-play workers as a remote policy.
`pca.submission.main`	The submission agent called by the Kaggle runtime. Combines checkpoint loading, deck selection, and policy/search execution.
`pca.submission.build_bundle`	Packs `src/pca`, the checkpoint, and config files into the Kaggle submission bundle.
`pca.tools.export_attack_data`	Exports attack metadata from CABT and produces the JSON used for the static attack embedding.

CLI¶

Self-Play¶

PYTHONPATH=src uv run python -m pca.training.selfplay \
  --config configs/v13/selfplay-v13-ismcts-selfplay.yaml \
  --output data/selfplay/example.jsonl \
  --games 100 \
  --workers 8 \
  --policy search \
  --checkpoint checkpoints/policy_value.pt \
  --device cpu

Key options:

Option	Description
`--policy`, `--policy0`, `--policy1`	deterministic / checkpoint / search / rule-pool.
`--checkpoint*`	Policy/Value checkpoint.
`--belief-checkpoint*`	Belief checkpoint used for the hidden-state prior.
`--search-mode ismcts`	Use ISMCTS.
`--ismcts-determinizations`	Number of hidden state samples.
`--ismcts-simulations-per-determinization`	Number of simulations per sample.
`--ismcts-leaf-batching`	Enables batched leaf policy/value evaluation.
`--full-observation-targets`	Collects the full observation for belief targets.
`--no-incremental-output`	Disables the immediate per-game append.

Policy/Value Training¶

PYTHONPATH=src uv run python -m pca.training.train \
  --input data/selfplay/example.jsonl \
  --output checkpoints/policy_value.pt \
  --best-output checkpoints/policy_value_best.pt \
  --epochs 3 \
  --batch-size 64 \
  --model-class unified \
  --device mps

Key options:

Option	Description
`--streaming`	Reads the JSONL with the streaming dataloader.
`--policy-target-source`	`search` / `oracle` / fallback policy target.
`--aux-prize-target-profile completion_v1`	Trains the aux prize heads.
`--integrated-belief-profile`	Uses the belief heads of the unified model.
`--card-data` / `--attack-data`	Static metadata tables. Can be omitted when the metadata is embedded in the JSONL.

Belief Training¶

PYTHONPATH=src uv run python -m pca.training.belief_train \
  --input data/selfplay/example.jsonl \
  --output checkpoints/belief.pt \
  --best-output checkpoints/belief_best.pt \
  --epochs 3 \
  --batch-size 64

Evaluation¶

PYTHONPATH=src uv run python -m pca.evaluation.tournament \
  --own-deck-dir decks/own \
  --opponent-deck-dir decks/opponents/holdout \
  --games 20 \
  --policy search \
  --checkpoint checkpoints/policy_value_best.pt \
  --search-mode ismcts \
  --output data/eval/eval.json \
  --summary-output data/eval/eval.csv

Policy Server¶

PYTHONPATH=src uv run python -m pca.serving.policy_server \
  --checkpoint checkpoints/policy_value_best.pt \
  --device mps \
  --host 127.0.0.1 \
  --port 8765

On the self-play / evaluation side, specify --remote-policy-url http://127.0.0.1:8765/predict.

JSONL Schema¶

The self-play JSONL holds one SelfPlayRecord per line. Example:

{
  "search": {
    "observation_tokens": [101, 102],
    "history_tokens": [[201]],
    "action_tokens": [[301], [302]],
    "search_policy": [0.8, 0.2],
    "selected_action": [0],
    "final_result": 1.0,
    "card_metadata": [],
    "action_metadata": [],
    "oracle_policy": null,
    "turn_value": 0.15,
    "min_count": 1,
    "max_count": 1
  },
  "belief": {
    "observation_tokens": [101, 102],
    "opponent_hand_card_ids": [10],
    "opponent_deck_card_ids": [11, 12],
    "opponent_prize_card_ids": [13],
    "next_threat_card_ids": [],
    "knockout_threat": 0.0
  },
  "aux": {
    "this_turn_prize_gain": 1,
    "this_turn_prize_gain_mask": true,
    "own_prize_completion": 0.5,
    "opp_prize_completion": 0.0,
    "prize_completion_mask": true
  },
  "meta": {
    "game_id": 0,
    "step_index": 0,
    "your_index": 0,
    "result": 0
  }
}

Compatibility:

Older JSONL files may lack belief or aux.
training.data.records.record_from_dict() fills missing fields with defaults.
Static metadata can be used at training time without an external CSV when it is embedded in card_metadata / action_metadata.
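The default-filling behaviour can be sketched with plain dicts. This is an assumption-laden illustration rather than the code in training/data/records.py: the function names, the choice of which `search` fields receive defaults, and the default values themselves are hypothetical.

```python
import json


def record_from_dict_sketch(raw):
    """Normalize one decoded JSONL line, tolerating fields absent in old files."""
    search = dict(raw["search"])
    # Assumption: these newer search fields may be missing in older files.
    search.setdefault("oracle_policy", None)
    search.setdefault("card_metadata", [])
    search.setdefault("action_metadata", [])
    return {
        "search": search,
        "belief": raw.get("belief"),  # None when the old file has no belief target
        "aux": raw.get("aux"),        # None when the old file has no aux target
        "meta": raw.get("meta", {}),
    }


def iter_records_sketch(lines):
    """Yield one normalized record per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield record_from_dict_sketch(json.loads(line))


old_line = json.dumps({
    "search": {
        "observation_tokens": [101, 102],
        "action_tokens": [[301], [302]],
        "search_policy": [0.8, 0.2],
        "selected_action": [0],
        "final_result": 1.0,
    }
})
records = list(iter_records_sketch([old_line, ""]))
print(records[0]["belief"], records[0]["search"]["card_metadata"])
```

Iterating line by line, as here, also keeps memory flat for large self-play files.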

Module Boundary Rules¶

Differences between CABT dataclasses and dicts are absorbed in cabt/ and features.encoder.get_value().
Policy functions return a PolicyDecision. Both search and rule agents conform to this type.
Search lives in search/, training data generation in training/selfplay/, and the training loops in training/policy_value/ and training/belief/.
Large packages have a facade. pca.training.dataset, pca.training.train, pca.training.selfplay, and pca.search.ismcts.types re-export for existing import compatibility.
Nothing under src/pca/ depends on scripts/, notebooks/, or sample/.
When you add a new public API, add its re-export boundary to tests/test_module_boundaries.py.

Extension Guide¶

Adding a new search value profile¶

1. Add the required weights to pca.search.mcts.SearchValueConfig.
2. Add the profile name to search_value_config_for_profile().
3. Add the profile name to the choices of the self-play / evaluation CLI.
4. Verify the value sign and leaf shaping in tests/test_encoder.py or a dedicated test.

Adding new self-play metadata to the JSONL¶

1. Put it into PendingRecord / SearchTrainingTarget in training/selfplay/records.py or record_metadata.py.
2. Add a field with a default to the dataclass in training/targets.py.
3. Confirm that record_from_dict() in training/data/records.py can still read old JSONL files.
4. Add a round-trip test to tests/test_selfplay_summary.py or tests/test_training.py.

Adding a new model head¶

1. Add the head to models/policy_value_unified/heads.py.
2. Expose it in the output dict of models/policy_value_unified/model.py.
3. Batch the target in training/data/search_collate.py.
4. Add the loss and metrics to training/policy_value/losses.py.
5. Save the config metadata in training/policy_value/checkpointing.py.

Adding a new rule agent¶

1. Place the agent class under src/pca/rule_agents/agents/<agent_id>/.
2. Add agent_id, class_path, deck_csv, and the compatibility metadata to the RuleAgentRegistry YAML.
3. Collect self-play with --policy rule-pool --rule-agent-config ....
4. To use a teacher policy weight, put teacher_type=rule_agent and weight into decision.meta.

Testing¶

Frequently used check commands:

PYTHONPATH=src uv run python -m py_compile src/pca/search/ismcts/impl.py
PYTHONPATH=src uv run python -m pca.training.selfplay --help
PYTHONPATH=src uv run python -m unittest tests.test_module_boundaries
PYTHONPATH=src uv run python -m unittest discover -s tests

Minimum bar when refactoring:

If you touched a facade / re-export, run tests.test_module_boundaries.
If you touched the encoder, search, or ISMCTS, run tests.test_encoder.
If you touched the self-play JSONL or summaries, run tests.test_selfplay_summary.
If you touched training batches / losses, run tests.test_training.
Finally, run unittest discover -s tests.
