跟AI學打遊戲

https://openai.com/index/openai-five-defeats-dota-2-world-champions/

OpenAI Five defeats Dota 2 world champions

OpenAI Five competitive event in a large, dim venue with bright spotlights and a large audience

OpenAI Five is the first AI to beat the world champions in an esports game, having won two back-to-back games versus the world champion Dota 2 team, OG(opens in a new window), at Finals this weekend. Both OpenAI Five and DeepMind’s AlphaStar had previously beaten good pros privately but lost their live pro matches, making this also the first time an AI has beaten esports pros on livestream.

Side view of OpenAI Five Dota team posing behind a panel

At OpenAI Five Finals, we also shared two surprises:

  1. OpenAI Five discovered a rudimentary ability to be a teammate with humans, even though our training process focuses exclusively on beating other bots. The ease with which we turned a competitive AI into a cooperative one makes us hopeful that future AI systems can be very beneficial for humans given active development effort.
  2. From April 18–21, we’re scaling up OpenAI Five to play the Internet(opens in a new window), whether as a competitor or teammate. This final test will let us answer an important research question—to what extent OpenAI Five is exploitable or can otherwise be reliably beaten—and be potentially the largest-ever deployment of a highly-competent deep reinforcement learning agent that people can knowingly interact with.

Why Dota?

We started OpenAI Five in order to work on a problem that felt outside of the reach of existing deep reinforcement learningA algorithms. We hoped that by working on a problem that was unsolvable by current methods, we’d need to make a big increase in the capability of our tools. We were expecting to need sophisticated algorithmic ideas, such as hierarchical reinforcement learning, but we were surprised by what we found: the fundamental improvement we needed for this problem was scale. Achieving and utilizing that scale wasn’t easy and was the bulk of our research effort!

Human view versus AI view of a Dota Matrix

Human view versus AI view of a robotics matrix

OpenAI Five sees the world as a bunch of numbers that it must decipher. It uses the same general-purpose learning code whether those numbers represent the state of a Dota game (about 20,000 numbers) or robotic hand (about 200).

To build OpenAI Five, we created a system called Rapid which let us run PPO at previously unprecedented scale. The results exceeded our wildest expectations, and we produced a world-class Dota bot without hitting any fundamental performance limits.

The surprising power of today’s RL algorithms comes at the cost of massive amounts of experience, which can be impractical outside of a game or simulated environment. This limitation may not be as bad as sounds—for example, we used Rapid to control a robotic hand to dexterously reorient a block, trained entirely in simulation and executed on a physical robot. But we think decreasing the amount of experience is a next challenge for RL.

We are retiring OpenAI Five as a competitor today, but progress made and technology developed will continue to drive our future work. This isn’t the end of our Dota work—we think that Dota is a much more intrinsically interesting and difficult (and now well-understood!) environment for RL development than the standard ones used today.

Compute

OpenAI Five’s victories on Saturday, as compared to its losses at The International 2018, are due to a major change: 8x more training compute. In many previous phases of the project, we’d drive further progress by increasing our training scale. But after The International, we’d already dedicated the vast majority of our project’s compute to training a single OpenAI Five model. So we increased the scale of compute in the only way available to us: training for longer.

Compute versus Ts Final Log Smoothed

In total, the current version of OpenAI Five has consumed 800 petaflop/s-days and experienced about 45,000 years of Dota self-play over 10 realtime months (up from about 10,000 years over 1.5 realtime months as of The International), for an average of 250 years of simulated experience per day. The Finals version of OpenAI Five has a 99.9% winrate versus the TI version.B

Transfer learning

The current version of OpenAI Five has been training continuously since June 2018, despite changes to the model size(opens in a new window) and the game rules (including some fairly large game patch updates and newly implemented features). In each case, we were able to transfer the model over and continue training—something that is an open challenge for RL in other domains. To the best of our knowledge, this is the first time an RL agent has been trained using such a long-lived training run.

To make this work, we’ve continued to flesh out our surgery tooling so that we can start from trained parameters even across substantial architecture changes.

More heroes

We saw very little slowdown in training going from 5 to 18 heroes. We hypothesized the same would be true going to even more heroes, and after The International, we put a lot of effort into integrating new ones.

We spent several weeks training with hero pools up to 25 heroes, bringing those heroes to approximately 5k MMR (about 95th percentile of Dota players). Although they were still improving, they weren’t learning fast enough to reach pro level before Finals. We haven’t yet had time to investigate why, but our hypotheses range from insufficient model capacity to needing better matchmaking for the expanded hero pool to requiring more training time for new heroes to catch up to old heroes. Imagine how hard it is for a human to learn a new hero when everyone else has mastered theirs!

We believe these issues are fundamentally solvable, and solving them could be interesting in its own right. The Finals version plays with 17 heroes—we removed Lich because his abilities were changed significantly in Dota version 7.20.

Cooperative mode

It actually felt nice; my Viper gave his life for me at some point. He tried to help me, thinking “I’m sure she knows what she’s doing” and then obviously I didn’t. But, you know, he believed in me. I don’t get that a lot with [human] teammates. — Sheever (opens in a new window)

Bracket of two competing Dota teams: Sheever and Blitz versus Capitalist and ODPixel

OpenAI Five’s ability to play with humans presents a compelling vision for the future of human-AI interaction, one where AI systems collaborate and enhance the human experience. Our testers reported feeling supported by their bot teammates, that they learned from playing alongside these advanced systems, and that it was generally a fun experience overall.

Note that OpenAI Five exhibits zero-shot transfer learning—it was trained to have all heroes controlled by copies of itself, but generalizes to controlling a subset of heroes, playing with or against humans. We were very surprised this worked as well as it did. In fact, we’d considered doing a cooperative match at The International but assumed it’d require dedicated training.

Arena

We’re launching OpenAI Five Arena, a public experiment where we’ll let anyone play OpenAI Five in both competitive and cooperative modes. We’d known that our 1v1 bot would be exploitable through clever strategies; we don’t know to what extent the same is true of OpenAI Five, but we’re excited to invite the community to help us find out!

Arena (opens in a new window) opens Thursday, April 18 at 6pm PST and will close 11:59pm PST on Sunday, April 21. Please register so we can ensure there’s enough server capacity in your region! Results of all games will be automatically reported to the Arena public leaderboard.

Five Arena matchup

We’re incredibly grateful for all the support the Dota community has shown us over the past two years, and we hope that Arena will also serve as one small way of giving back. Have fun with it!

What’s next

We will be releasing a more technical analysis of OpenAI Five once we’ve reviewed the outcomes of OpenAI Five Arena.

Afterwards, we’ll continue working with the Dota 2 environment within OpenAI. We’ve seen rapid progress in the past two years on RL capabilities, and we think that Dota 2 will continue to help us push forward what’s possible—whether with achieving competent performance from less data or true human-AI cooperation.

If you are interested in advancing AI capabilities and helping further our mission of ensuring they benefit humanity, we’re hiring!

Footnotes

  1. A

Deep reinforcement learning is the idea of training a deep neural network to achieve goals using rewards and punishments

  1. B

Winrate evaluated on the current game patch. This biases the winrate towards the Finals version as the TI version was trained on an older patch, but currently we don’t have another way to compare agents trained on different game versions.

Author

OpenAI

Game Replays & Planning Views

OG Game 1: Replay(opens in a new window), OpenAI Five planning view(opens in a new window)OG Game 2: Replay(opens in a new window), OpenAI Five planning view(opens in a new window)Coop Game: Replay(opens in a new window), OpenAI Five planning view(opens in a new window)Planning View Legend

Last edited by @suen 2024-09-18T12:21:34Z

也实在是没啥办法,星际争霸2有人开发的AI(好像是叫北极星还是啥)好几年前就能打赢顶级职业了,这种上帝视角操控的游戏AI的apm和反应速度甚至多线能力,细微操作都比人类强太多了,人类胜过AI的点可能仅限于更良好的大局观(存疑),非常规的打法选择(比如说SC2开局protess拖农民修地堡或者野3bg一波)ai显然没法以全防流开局所以还是存在被降服的可能,以及对总体战局的把控和诱导。但这些方式对ai来说也就多适应几次。moba和rts被ai征服了,下一个会不会是fps呢

Goal 4: Solve a wide variety of games using a single agent

We aim to train an agent capable enough to solve any game in our initial metric. Games are virtual mini-worlds that are very diverse, and learning to play games quickly and well will require significant advances in generative models and reinforcement learning(opens in a new window). (We are inspired by the pioneering work of DeepMind(opens in a new window), who have produced impressive(opens in a new window) results(opens in a new window) in this area in the past few years.)

Our projects and fundamental research all have shared cores, so progress on any is likely to benefit the others. Each captures a different aspect of goal-solving, and was chosen for its potential to significantly move our metric.

We’re just getting started on these projects, and the details may change as we gain additional data. We also expect to add new projects over time.

圍棋沒了,遊戲也會的。
在職業選手也打不過AI的時代,也許正好可以都一起想想,我們為什麼打遊戲。

1 Like

是不是能让ai打代肝啊)

但是我记得在限制apm的情况下星际的职业还是能跟ai抗衡一下的

反正时间还长,看看未来进步吧

AI在竞技游戏就是开挂一样吧,无缝的态势感知和队内交流,技能和操作可以零失误,不受自身状态影响,AI对对手的感知手段跟人类玩家一样吗,还是可以透视啥的?

比如第五人格那个噩梦人机()

1 Like

单挑就算了,多打多还是能挣扎一下的:)
毕竟游戏里最不缺的就是伪人

1 Like

诶,你猜猜有一种叫AI的外挂原理是什么😋