References and Prior Work¶

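Papers and implementations referenced in our work on the Pokemon TCG AI Battle Challenge. This page is maintained as source material for the references section of the Strategy Category submission report.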

Whenever you read a new paper, always add it here and leave a summary of it in docs/exploration/paper-{slug}.md.

Search in Imperfect-Information Games (mainly Phase 3)¶

Cowling, Powley, Whitehouse (2012) "Information Set Monte Carlo Tree Search" — PDF / IEEE
The original IS-MCTS paper. Defines three algorithms: SO-ISMCTS, SO-ISMCTS+POM, and MO-ISMCTS (Multi-Observer ISMCTS). The basis for the Phase 3-C roadmap.
Whitehouse, Powley, Cowling "Determinization and Information Set Monte Carlo Tree Search for Dou Di Zhu" — IEEE
A prior study that empirically compares determinized UCT against IS-MCTS on the Chinese card game Dou Di Zhu. One of the settings closest to the Pokemon TCG.
Cowling et al. (2015) "Information capture and reuse strategies in MCTS, with applications to games of hidden information" — ScienceDirect
Techniques for sharing and reusing information across information sets. A source of optimization ideas when building SO-ISMCTS in Phase 5.
Powley, Cowling, Whitehouse (2013) "Reducing the burden of knowledge" — AI Factory
A summary written for general readers. Easy to cite in the background section of the Strategy report.
Schmid et al. (2021) "Student of Games: A unified learning algorithm for both perfect and imperfect information games" — arXiv 2112.03178
Unifies the AlphaZero and DeepStack lines of work. Worth citing as a "novel method" in Phase 5.
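
The determinization step that the determinized UCT and IS-MCTS entries above rely on can be sketched as follows. This is a minimal illustration, not code from any of the cited papers: the function name `sample_determinization`, the card names, and the assumption that the full card pool is known to the observer are all hypothetical.

```python
import random
from collections import Counter


def sample_determinization(card_pool, seen_cards, opponent_hand_size, rng):
    """Sample one hidden state consistent with the observer's information.

    card_pool: every card that could be in play (assumed known here).
    seen_cards: cards the observer has already seen (own hand, discards, ...).
    Returns (opponent_hand, remaining_hidden_cards) as two lists.
    """
    # Multiset difference: cards whose location the observer does not know.
    unseen = list((Counter(card_pool) - Counter(seen_cards)).elements())
    if opponent_hand_size > len(unseen):
        raise ValueError("opponent hand is larger than the set of unseen cards")
    rng.shuffle(unseen)
    return unseen[:opponent_hand_size], unseen[opponent_hand_size:]


# Hypothetical card names, for illustration only.
pool = ["Pikachu"] * 4 + ["Potion"] * 3 + ["Energy"] * 3
seen = ["Pikachu", "Pikachu", "Potion"]
hand, rest = sample_determinization(pool, seen, 3, random.Random(0))
```

Determinized UCT would run a separate search on each sampled state and combine the results, whereas IS-MCTS draws a fresh determinization on every iteration and updates a single shared tree.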

Silver et al. (2018) "AlphaZero" — original
The loss design for the policy and value heads (cross-entropy against the visit distribution plus MSE on the value) is the basis for the Phase 3-B fix.
Schrittwieser et al. (2020) "MuZero"
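An extension that also learns the model. Because the CABT engine is provided, the benefit of moving to MuZero is limited, but there is room to let the NN infer effects such as Ability activations.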
MiniZero (2023) — arXiv 2310.11305
A fair comparison of AlphaZero and MuZero. Useful as a guide for hyperparameter tuning.
Survey on Self-play Methods (2024) — arXiv 2408.01072
Reference for the checkpoint rotation design in Phase 3-D.
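
The loss design mentioned under the AlphaZero entry above (cross-entropy against the visit distribution plus MSE on the value) can be sketched for a single training sample as follows. The function name and argument layout are assumptions for illustration, the L2 weight-regularization term of the published loss is omitted, and the visit counts are normalized directly, which corresponds to a temperature of 1.

```python
import math


def alphazero_loss(policy_logits, visit_counts, value_pred, outcome):
    """Return (total, policy_loss, value_loss) for one position.

    policy_logits: raw policy-head outputs, one per legal action.
    visit_counts:  MCTS visit counts for the same actions.
    value_pred:    value-head output v.
    outcome:       game result z from the current player's perspective.
    """
    # Target distribution pi: normalized visit counts.
    total_visits = sum(visit_counts)
    pi = [n / total_visits for n in visit_counts]

    # Numerically stable log-softmax of the policy logits.
    m = max(policy_logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in policy_logits))
    log_p = [x - log_z for x in policy_logits]

    policy_loss = -sum(t * lp for t, lp in zip(pi, log_p))  # cross-entropy
    value_loss = (outcome - value_pred) ** 2                # squared error
    return policy_loss + value_loss, policy_loss, value_loss


total, p_loss, v_loss = alphazero_loss([0.0, 0.0], [1, 1], 0.5, 1.0)
```

In a real training loop the same two terms would be averaged over a batch and computed with the framework's tensor operations; the scalar version here only makes the structure of the loss explicit.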

Moravčík et al. (2017) "DeepStack: Expert-Level AI in Heads-Up No-Limit Poker" — arXiv 1701.01724
Continual re-solving and Bayesian range updates. The source of the idea for updating beliefs about the opponent's hand in the Pokemon TCG.
DouZero+ (2022) "Improving DouDizhu AI by Opponent Modeling and Coach-guided Learning" — arXiv 2204.02558
Explicit opponent modeling in a card game. Reference for Phase 5-5 (deck-type classifier).
Heinrich, Silver (2016) "Deep RL from Self-Play in Imperfect-Information Games" (NFSP)
Average-strategy learning. Theoretical background for countering policy collapse.
He et al. (2016) "Opponent Modeling in Deep Reinforcement Learning" — arXiv 1609.05559
An overview of implicit and explicit opponent modeling.
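
The Bayesian belief update over the opponent's hand mentioned under the DeepStack entry above can be illustrated with a plain Bayes-rule step. This is only the update rule in isolation, not DeepStack's re-solving procedure; the function name, the hypothesis labels, and the likelihood values are hypothetical.

```python
def update_hand_belief(prior, likelihood):
    """Apply Bayes' rule to a belief over opponent-hand hypotheses.

    prior:      dict mapping hypothesis -> P(hypothesis).
    likelihood: dict mapping hypothesis -> P(observed action | hypothesis).
                Hypotheses missing from this dict are treated as impossible.
    """
    unnormalized = {h: p * likelihood.get(h, 0.0) for h, p in prior.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation is impossible under every hypothesis")
    return {h: w / total for h, w in unnormalized.items()}


# Hypothetical example: the opponent passed without playing card X.
prior = {"holds_X": 0.5, "no_X": 0.5}
# Assumed: an opponent holding X would still pass 20% of the time.
likelihood = {"holds_X": 0.2, "no_X": 1.0}
posterior = update_hand_belief(prior, likelihood)
```

In this example the belief that the opponent holds X falls from 0.5 to 1/6. In practice the likelihood term would have to come from an opponent policy model, which is where the opponent-modeling references in this group become relevant.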

Microsoft Suphx (2020) "Mastering Mahjong with Deep Reinforcement Learning" — arXiv 2003.13590
Global reward prediction, oracle guiding (perfect information during training only), and runtime policy adaptation. The basis for Phase 5-4.
Tjong (2024) "Transformer-based Mahjong AI" — Wiley
Hierarchical decision-making and fan backward. Reference for handling the Pokemon TCG's "main choice → detailed choice" hierarchy.

Świechowski et al. (2018) "Improving Hearthstone AI by Combining MCTS and Supervised Learning" — arXiv 1808.04794
Combines MCTS with supervised learning in Hearthstone.
Vieira, Tavares, Chaimowicz (2020–2023), Legends of Code and Magic series — arXiv 2407.05879 / arXiv 2009.00655
RL for deck-building and drafting in CCGs.
Hoover et al. (2024) "A Taxonomy of Collectible Card Games from a Game-Playing AI Perspective" — arXiv 2410.06299
A classification of CCG AI. Useful for the background chapter of the Strategy report.

taylorhansen/pokemonshowdown-ai — GitHub
An RL implementation for Pokemon battles on Showdown.
poke-env — Docs
A Python interface for Showdown.
Wang (2024) MIT MEng thesis "Winning at Pokémon Random Battles Using RL" — dspace
Improvement from 1677 with PPO self-play to 1756 with PPO + MCTS lookahead (Glicko-1). Numerical evidence that combining MCTS in Phase 4 is effective.
(2025) "Human-Level Competitive Pokémon via Scalable Offline RL with Transformers" — arXiv 2504.04395
Recent research on Transformers with offline RL. A benchmark for the "novel method" discussion in the Strategy report.
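
To give a rough sense of scale for the 1677 → 1756 Glicko-1 figures quoted under the Wang (2024) entry above, the sketch below applies the Glicko-1 expected-score formula to that 79-point gap. This is illustrative arithmetic only: the two ratings come from ladder play rather than a head-to-head match, no rating deviation (RD) is given on this page, and the function name and the RD values used are assumptions.

```python
import math

Q = math.log(10) / 400  # Glicko-1 scaling constant


def glicko1_expected_score(rating, opponent_rating, opponent_rd=0.0):
    """Expected score of `rating` against `opponent_rating` under Glicko-1.

    opponent_rd is the opponent's rating deviation; with opponent_rd = 0 the
    formula reduces to the familiar Elo logistic curve.
    """
    g = 1.0 / math.sqrt(1.0 + 3.0 * (Q ** 2) * (opponent_rd ** 2) / math.pi ** 2)
    return 1.0 / (1.0 + 10.0 ** (-g * (rating - opponent_rating) / 400.0))


# Hypothetical head-to-head between the two reported ratings, RD assumed 0.
expected = glicko1_expected_score(1756, 1677)
```

Under these assumptions the higher-rated agent's expected score is about 0.61. A nonzero opponent RD pulls the figure toward 0.5.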