MetaAI_Official OP t1_izfmvc5 wrote
Reply to comment by mouldygoldie in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
Thanks! Glad you enjoyed it! -NB
MetaAI_Official OP t1_izfmiav wrote
Reply to comment by Specialist-Regret241 in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
As noted in an answer to a previous question: we were originally targeting 24hr-turn games, but ended up pivoting to 5min-turn games due to the inability to gather a sufficient number of samples in the 24hr-turn format (as playing a single game can sometimes take months)! Playing 24hr-turn games would indeed pose additional challenges from a language generation perspective — while human players tend to send a similar number of messages in each format, messages in 24hr turns tend to be significantly longer (and likely more complex). Moreover, human players would have more time to interrogate mistakes from the bot, which could potentially lead to the agent making further mistakes. -ED
MetaAI_Official OP t1_izfm7lf wrote
Reply to comment by MetaAI_Official in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
We did also get good human players to review the games and look for really good or bad moves, but that was very early in the development process: CICERO generated good moves, and it would have been counter-productive to stop it from making what it thought were the best moves. For example, at a tournament in Bangkok a few weeks ago I asked myself "what would CICERO do?" and then played a different set of moves - but what CICERO would have done was right! -AG
MetaAI_Official OP t1_izfm4rw wrote
Reply to comment by loranditsum in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
I don’t know about “biggest” :p but as someone without a graduate degree working in AI research, I’ve definitely felt imposter syndrome at times. One of the amazing things about working with large teams of research experts is that people bring extremely deep and diverse knowledge. Just on our team there are experts in NLP, reinforcement learning, game theory, systems engineering, and Diplomacy itself. When people are specialized in this way, the total knowledge on the team is much more than the knowledge of any individual, which is excellent for the team but was daunting for me at first! -JG
MetaAI_Official OP t1_izflw2v wrote
Reply to comment by TheBaxes in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
There are quite a few open-source reinforcement learning challenges that you can explore with modest amounts of compute to build some experience training RL models, for example the NetHack Learning Environment, Atari, and Minigrid. Personally, I had only worked in NLP / dialogue for years but got into RL by implementing Random Network Distillation models for NetHack. It's a fun area that definitely has its own unique challenges compared to other domains. -AM
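As an illustration of how little code it takes to get started, here is a minimal sketch (not from the original discussion) of a random-agent loop in the NetHack Learning Environment. It assumes `pip install nle` and the classic gym reset/step API that NLE originally shipped with; the random policy is just a placeholder for whatever RL method you want to try.

```python
# Minimal random-agent loop for the NetHack Learning Environment (NLE).
# Assumes the classic gym API (reset -> obs, step -> obs/reward/done/info);
# newer gymnasium-style releases return a 5-tuple instead.
import gym
import nle  # noqa: F401  # importing nle registers the NetHack environments with gym

env = gym.make("NetHackScore-v0")
obs = env.reset()

done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()  # replace with a learned policy
    obs, reward, done, info = env.step(action)
    total_reward += reward

print(f"Episode return: {total_reward}")
env.close()
```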
MetaAI_Official OP t1_izfll3c wrote
Reply to comment by Swolnerman in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
As someone without a PhD, I will say I definitely don't think it's necessary to have a graduate degree to work at the cutting edge of ML. Our team contains people with a mix of educational backgrounds working on all aspects of the project, and the majority of the team do not have PhDs. I don't think there's an optimal choice for everyone; it probably depends on how you learn best and what type of problem you want to work on, but there's certainly a lot of great research being done within industry by people without PhDs! -AL
MetaAI_Official OP t1_izfljno wrote
Reply to comment by RapidRewards in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
It takes significant effort, but yes, on the strategic planning side it is often possible to work out why CICERO came up with particular moves or intents. We often did this during development when debugging. You can look at the moves the search considered for CICERO and its opponents, see what values those moves achieved across the search iterations and how the equilibrium evolved in response to those values, inspect the initial policy prior probabilities, and so on. It's not entirely unlike walking through a debug log of how a chess engine explored a tree of possible moves and why it arrived at the value it did. In fact, with systems that do explicit planning rather than simply running a giant opaque model end-to-end, it's usually possible to reverse-engineer "why" the system is doing something, although it may take a lot of time and effort per position. We haven't tried a human in the loop for choosing moves though. -DW
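To make that kind of trace concrete, here is a toy sketch (not CICERO's actual code) that runs regret matching over a few made-up candidate moves and logs the per-move values and the evolving policy each iteration; the move names and payoff numbers are invented for illustration.

```python
# Toy equilibrium search trace: regret matching on a small zero-sum payoff
# matrix, with iteration-by-iteration logging of move values and policies.
import numpy as np

moves = ["hold", "support_ally", "attack"]
# payoff[i, j]: value to "us" when we play move i and the opponent plays move j
payoff = np.array([
    [0.3, 0.1, 0.0],
    [0.5, 0.4, 0.2],
    [0.6, 0.0, 0.1],
])

def regret_matching(payoff, iters=200):
    n_us, n_opp = payoff.shape
    regrets_us, regrets_opp = np.zeros(n_us), np.zeros(n_opp)
    avg_us = np.zeros(n_us)

    def to_policy(regrets):
        pos = np.maximum(regrets, 0.0)
        total = pos.sum()
        return pos / total if total > 0 else np.full(len(regrets), 1.0 / len(regrets))

    for t in range(iters):
        pi_us, pi_opp = to_policy(regrets_us), to_policy(regrets_opp)
        avg_us += pi_us

        values_us = payoff @ pi_opp      # expected value of each of our moves vs opponent policy
        values_opp = -(pi_us @ payoff)   # zero-sum: opponent values are the negation
        regrets_us += values_us - pi_us @ values_us
        regrets_opp += values_opp - pi_opp @ values_opp

        if t % 50 == 0:  # the kind of trace you can walk through when debugging
            print(f"iter {t}: values={np.round(values_us, 3)}, policy={np.round(pi_us, 3)}")

    return avg_us / iters

print("average policy over", moves, ":", np.round(regret_matching(payoff), 3))
```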
MetaAI_Official OP t1_izflfk2 wrote
Reply to comment by Takithereal in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
I think the speed with which it went from playing no-communication games to full-communication games was the biggest surprise - not just the natural language but the adaptation of strategy and tactics. I expected it to really struggle to climb out of what it had learned from that style of game, but it did so pretty quickly, which is probably down to the technical expertise of the team. Beyond that, the AI plays some approaches that upset the inherited wisdom of the Diplomacy-playing community. I'm totally revisiting some opening lines, for example. In terms of what we can learn: the strategic ideas that emerge seem to be very much aligned with high-level human players. Patience, collaboration, improving position rather than brute-force tactical tricks... at that level of abstraction it plays very similarly to a good human player. -AG
MetaAI_Official OP t1_izfldg5 wrote
Reply to comment by ditlevrisdahl in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
Early on, we primarily evaluated the model using self-play, having team members play against it, and by building small test sets to evaluate specific behaviors. In the last year, we started evaluating the model by putting it in live games against humans (with another human in the loop to review its outgoing messages and intervene if necessary). We quickly learned that the mistakes the model made in self-play weren't necessarily reflective of its behaviors in human play. Playing against humans became *super* important for shaping our research agenda! -ED
MetaAI_Official OP t1_izfl6g3 wrote
Reply to comment by levi97zzz in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
CICERO reasons about the beliefs, goals, and intentions of the other players. Whether that counts as "theory of mind" depends on the definition. This reasoning is partly implicit through the output of the policy network based on the conversations and sequence of actions, and part of it is explicit through the strategic reasoning algorithm. -NB
MetaAI_Official OP t1_izfkmc4 wrote
Reply to comment by TheFibo1123 in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
Re: Dialogue-related challenges: Moving from the "no press" setting (without negotiation) to the "full press" setting presented a host of challenges at the intersection of natural language processing and strategic reasoning. From a language perspective, playing Diplomacy requires engaging in lengthy and complex conversations with six different parties simultaneously. Messages the agent sends needed to be grounded in both the game state and the long dialogue histories. In order to actually win the game, the agent must not only mimic human-like conversation; it must also use language as an *intentional tool* to engage in negotiations and achieve goals. On the flip side, it also needs to *understand* these complex conversations in order to plan and take appropriate actions. Consider: if the agent's actions did not reflect its conversations/agreements, players may not want to cooperate with it, and at the same time it must take into account that other players might not be honest when coordinating/negotiating plans.
Re: AI technologies in the future: advancements in this space have many potential applications and will hopefully improve human-AI communication in general to get closer to the way people communicate with each other. -ED
MetaAI_Official OP t1_izfk9ug wrote
Reply to comment by Roger_M8 in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
In 2019 we had just finished up Pluribus and were discussing what to pursue next. We saw the incredible breakthroughs happening across the field, like GPT-2, AlphaStar, and OpenAI Five, and knew that we needed to be ambitious with our next goal because the field was advancing quickly. We were discussing what would be the hardest game to make an AI for and landed on Diplomacy due to its integration of natural language and strategy. We thought it could take 10 years to fully address, but we were okay with that because historically that kind of research timeframe had been the norm. Obviously things worked out better than we expected though.
Our long-term goal was always the full natural language game of Diplomacy but we tried to break the project down into smaller milestones that we could tackle along the way. That led to our papers on human-level no-press Diplomacy, no-press Diplomacy from scratch, better modeling of humans in no-press Diplomacy, and expert-level no-press Diplomacy. -NB
MetaAI_Official OP t1_izfjgik wrote
Reply to comment by pyepyepie in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
Figuring out how to get strong control over the language model by grounding it in "intents"/plans was one of the major challenges of this work. Fig. 4 in the paper shows we achieved relatively strong control in this sense: prior to any filters, ~93% of messages generated by CICERO were consistent with its intents and ~87% were consistent with the game state. As you note, however, the model is not perfect, and we relied on a suite of classifiers to help filter out additional mistakes. Many of the mistakes CICERO made involved information that was *not* directly represented in its input (and thus required additional reasoning steps), e.g., reasoning about further-in-the-future states or counterfactual past states, discussing plans for third parties, etc. We could have considered grounding CICERO in a richer representation of "intents" (e.g., including plans for third parties) or of the game state (e.g., explicitly representing past states), but in practice we found that (i) richer intents would be harder to annotate/select and often take the language model out of distribution, and (ii) we had to balance the trade-off between a richer game state representation and the dialogue history representation. Exploring ways to get stronger control over language models and to improve their reasoning capabilities is an interesting future direction. -ED
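As an illustration of the generate-then-filter pattern described above, here is a minimal, hypothetical sketch. The function names (`generate_candidates`, `intent_classifier`, `state_classifier`) and the threshold are placeholders, not CICERO's actual API; they just show how intent-conditioned candidates could be screened by consistency classifiers before a message is sent.

```python
# Hypothetical "generate then filter" sketch: candidate messages are generated
# conditioned on an intent (a planned set of moves) and dropped if classifiers
# judge them inconsistent with the intent or the game state.
from typing import Callable, List

def filter_messages(game_state, dialogue_history, intent,
                    generate_candidates: Callable,
                    intent_classifier: Callable,
                    state_classifier: Callable,
                    num_candidates: int = 16,
                    threshold: float = 0.5) -> List[str]:
    """Return the candidate messages that both classifiers judge consistent."""
    candidates = generate_candidates(game_state, dialogue_history, intent,
                                     n=num_candidates)
    kept = []
    for msg in candidates:
        consistent_with_intent = intent_classifier(msg, intent) > threshold
        consistent_with_state = state_classifier(msg, game_state) > threshold
        if consistent_with_intent and consistent_with_state:
            kept.append(msg)
    return kept
```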
MetaAI_Official OP t1_izfj3yl wrote
Reply to comment by xutw21 in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
We tried hard in the paper to articulate the important research challenges and how we solved them. At a high level, the big questions were:
- RL/planning: What even constitutes a good strategy in games with both competition and cooperation? The theory that undergirds prior successes in games no longer applies.
- NLP: How can we maintain dialogues that remain coherent and grounded over very long interactions?
- Joint: How do we make the agent speak and act in a “unified” way? I.e. how does dialogue inform actions and planning inform dialogue so we can use dialogue intentionally to achieve goals?
One practical challenge we faced was how to measure progress during CICERO’s development. At first we tried comparing different agents by playing them against each other, but we found that good performance against other agents didn’t correlate well with how well an agent would play with humans, especially when language is involved! We ended up developing a whole spectrum of evaluation approaches, including A/B testing specific components of the dialogue, collaborating with three top Diplomacy players (Andrew Goff, Markus Zijlstra, and Karthik Konath) to play with CICERO and annotate its messages and moves in self-play games, and looking at the performance of CICERO against diverse populations of agents. -AL
MetaAI_Official OP t1_izfi4f8 wrote
Reply to comment by MetaAI_Official in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
CICERO sent/received an average of 292 messages per game (the 5,277 figure is the number of messages it sent over the course of 40 games). This was comparable to its human counterparts. As Andrew points out, this was quite an interesting technical problem to tackle: there are real risks to sending too many messages (annoying your allies, plus the additional risk of degenerate text spirals), but missing opportunities to collaborate by not sending enough messages can also be devastating. -ED
MetaAI_Official OP t1_izfhy6x wrote
Reply to comment by This_Objective7808 in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
I loved this problem! The average human player sends far too few messages compared to the best human players, so the challenge was how far to push this before it became... weird. So it wasn't just infinite messaging either. I'll let others answer how that was technically achieved, but this was an underrated challenge in achieving great play. What a great question! -AG
MetaAI_Official OP t1_izfhq3k wrote
Reply to comment by Effective-Dig8734 in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
The next step is taking the lessons we've learned from CICERO and extending them more broadly to other research domains. We're also hoping that others are able to build on our open-sourced work and will continue to use Diplomacy as a benchmark for research. -NB
MetaAI_Official OP t1_izfhgd0 wrote
Reply to comment by MetaAI_Official in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
As Andrew has said, Diplomacy is less about lying and more about trust-building than beginners typically think. Of course, there are times when some amount of lying may be the best strategy. One reason that CICERO did not use deception effectively - and why we abandoned it - is that it wasn't very good at reasoning about the long-term cost of lying, i.e. knowing exactly how much a particular lie would hurt its ability to cooperate with the other player in the future. We're not really interested in building lying AIs, but being able to understand the long-term consequences of one's actions on other people's behavior is an interesting research direction! -AL
MetaAI_Official OP t1_izfheer wrote
Reply to comment by NeverStopWondering in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
From a strategic perspective, it attempts similar things but the results are a little different - which is understandable as it reacts differently. It tends to build more unorthodox alliances just because it doesn't know they're unorthodox. It actually made the self-play games quite fun to watch, although if the point is to compete against humans it is kind of tangential to the key challenges. -AG
MetaAI_Official OP t1_izfh1t6 wrote
Reply to comment by NeverStopWondering in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
We tested the model using self-play frequently before we ever put it in front of humans (outside of our team). One interesting lesson was that the mistakes the model makes in self-play games aren't reflective of the mistakes it makes when playing against humans. From a language perspective, in self-play the model is more prone to "spirals" of degenerate text (as one bad message begets the next, and the model continues to mimic its past language). Moreover, humans reacted differently to mistakes the model made: in human play, a human might question/interrogate the agent after receiving a bad message, while another model is unlikely to do so. This really underscored the importance of playing against humans during development for research progress. -ED
MetaAI_Official OP t1_izffy2d wrote
Reply to comment by xutw21 in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
From a non-technical point of view, the human Diplomacy players we worked with (Karthik and Markus) were really excellent players, so the model kept being evaluated against the best rather than against human players who are sometimes just average. Accounting for all levels of play was challenging! -AG
MetaAI_Official OP t1_izffqx4 wrote
Reply to comment by Rybolos in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
As we look at the incredible potential of what AI can unlock in the physical and virtual worlds, we need to balance that optimism with an appreciation for the risks. These risks can come in many forms, whether through unintended uses of new technologies or through bad actors looking to exploit areas of vulnerability. Being thoughtful about research release (through, e.g., special licenses, as you suggest) is one way to help this research move forward while limiting potential negative use cases. There are also many other research areas that I think are promising for bolstering positive use cases and limiting negative ones, to name just a few: improving control over language model outputs, investing in models that can adapt rapidly and flexibly, and discriminating between human- and model-generated text. -ED
MetaAI_Official OP t1_izffhli wrote
Reply to comment by thatguydr in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
Nah CICERO is still invited to the house games -NB
MetaAI_Official OP t1_izff2pu wrote
Reply to comment by [deleted] in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
I really strongly disagree that lying is a positive in Diplomacy. The best players do it as little as possible - it is a game about building trust in an environment where trust is hard to build. I think Diplomacy has a reputation for being about lying because new players think just because they can do it, they must. I am nearly certain that a "CICERO II" wouldn't lie more. -AG
MetaAI_Official OP t1_izfmvrj wrote
Reply to comment by Butanium_ in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
Actually, our agent can play longer games, and much of our earlier testing (where we had to manually approve all outgoing messages) was on 24-hour games rather than the 5-minute games we report on in the paper. The agent is overall a bit more effective at shorter time controls, but it was in fact scoring quite well in longer formats as well. However, those games take weeks to complete, and ultimately we decided it would take too long to play enough games for statistical significance, hence the focus on shorter games. -JG