MetaAI_Official OP t1_izfeehk wrote
Reply to comment by Liorogamer in [D] We're the Meta AI research team behind CICERO, the first AI agent to achieve human-level performance in the game Diplomacy. We’ll be answering your questions on December 8th starting at 10am PT. Ask us anything! by MetaAI_Official
One challenge was being able to hold 6 simultaneous conversations at human speed in the fast-moving "blitz" Diplomacy format, since CICERO has to do a lot of planning and NLP work for each message it sends (see Fig 1 in our paper). We ended up splitting CICERO into "sub-agents", each handling the conversation with one other player. CICERO actually ran on 56 GPUs in parallel for our human games (although it can also run on a single GPU in slower time formats). -AL
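A minimal sketch of that sub-agent structure, with hypothetical names (`plan_reply` and `sub_agent` stand in for the real planning/NLP pipeline; this is not the actual CICERO code): one concurrent conversation loop per opponent, so slow message generation for one dialogue doesn't block the other five.

```python
import asyncio

POWERS = ["ENGLAND", "FRANCE", "GERMANY", "ITALY", "RUSSIA", "TURKEY"]

async def plan_reply(power: str, incoming: str) -> str:
    # Stand-in for the expensive planning + language-model work
    # done for every outgoing message (Fig 1 of the paper).
    await asyncio.sleep(0.1)
    return f"reply to {power} re: {incoming!r}"

async def sub_agent(power: str, inbox: asyncio.Queue) -> None:
    # Handles the conversation with a single other player.
    while True:
        incoming = await inbox.get()
        print(await plan_reply(power, incoming))
        inbox.task_done()

async def demo() -> None:
    inboxes = {p: asyncio.Queue() for p in POWERS}
    agents = [asyncio.create_task(sub_agent(p, q)) for p, q in inboxes.items()]
    for q in inboxes.values():
        q.put_nowait("Want to work together this year?")
    # All six replies are produced concurrently rather than one at a time.
    await asyncio.gather(*(q.join() for q in inboxes.values()))
    for t in agents:
        t.cancel()
    await asyncio.gather(*agents, return_exceptions=True)

asyncio.run(demo())
```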
MetaAI_Official OP t1_izfe82v wrote
Reply to comment by [deleted]
The title of the paper doesn't necessarily refer to CICERO being "human-like" (though it does behave in a fairly human-like way). Rather, it refers to the agent achieving a score on the level of strong human players.
But also, CICERO is not just trying to be human-like: it’s also trying to model how *other* humans are likely to behave, which is necessary for cooperating with them. In one of our earlier papers we show that even in a dialogue-free version of Diplomacy, an AI trained purely with RL, without accounting for human behavior, fares quite poorly when playing with humans (Paper). The wider applications we see for this work are all about building smart agents that can cooperate with humans (self-driving cars, AI assistants, …), and for all these systems it’s important to understand how people think and to match their expectations (which often, though not always, means responding in a human-like way ourselves).
When language is involved, understanding human conventions is even more important. For example, saying “Want to support me into HOL from BEL? Then I’ll be able to help you into PIC in the fall” is likely more effective than the message “Support BEL-HOL”, even if both express the same intent. -AL
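As a toy illustration of that gap (the dict layout below is invented for the example, not CICERO's actual intent representation): the same structured intent can be rendered tersely or in the conversational register a human ally expects.

```python
# Invented, simplified intent structure: the planner proposes a joint plan,
# and the dialogue model's job is to phrase it the way a human would.
intent = {
    "request": ("BEL", "HOL"),   # please support my move BEL -> HOL
    "promise": ("PIC", "fall"),  # in exchange, I'll help you into PIC in the fall
}

terse = f"Support {intent['request'][0]}-{intent['request'][1]}"
friendly = (
    f"Want to support me into {intent['request'][1]} from {intent['request'][0]}? "
    f"Then I'll be able to help you into {intent['promise'][0]} in the {intent['promise'][1]}."
)

print(terse)     # "Support BEL-HOL" -- same intent, much less persuasive
print(friendly)
```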
MetaAI_Official OP t1_izfcvy5 wrote
Reply to comment by JimmyTheCrossEyedDog
We disentangle the complexity of the action space from the complexity of the planning algorithm by using a policy proposal network. For each game state we sample a few actions from the network (sets of unit-order pairs) and then do planning only among these actions. In the case of continuous actions we would have to modify the policy proposal network, but that has already been explored for other games with continuous action spaces, such as StarCraft. - AB
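A minimal sketch of that decoupling, with made-up sizes, a toy linear proposal network, and a dummy value function (none of these names or numbers come from the actual CICERO code):

```python
import torch
import torch.nn as nn

# Toy policy proposal network: maps a state encoding to logits over a huge
# discrete joint-action space (each "action" = a full set of unit-order pairs).
NUM_ACTIONS = 10_000
proposal_net = nn.Linear(64, NUM_ACTIONS)

def propose(state: torch.Tensor, k: int = 8) -> list:
    # Sample a handful of candidate actions from the proposal distribution.
    probs = torch.softmax(proposal_net(state), dim=-1)
    return torch.multinomial(probs, num_samples=k).tolist()

def plan(state: torch.Tensor, value_fn, k: int = 8) -> int:
    # Planning (equilibrium search in the real system) runs over only the
    # k sampled candidates, so its cost doesn't scale with NUM_ACTIONS.
    candidates = propose(state, k)
    return max(candidates, key=lambda a: value_fn(state, a))

# Dummy usage: a toy value function that prefers actions near index 5000.
state = torch.randn(64)
best = plan(state, value_fn=lambda s, a: -abs(a - 5000))
print(best)
```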
MetaAI_Official OP t1_izfcd0s wrote
Reply to comment by TissueReligion
I started grad school in 2012 and technically defended in 2020, but I actually left the PhD in 2018 and finished up my dissertation while working over the next two years. My grad school research was unusually focused for a PhD student. All my research, starting with my first paper, was focused on answering the question of how to develop an AI that could beat top humans in no-limit poker. After we succeeded in that in 2017, my research shifted more toward generality and scalability.
My original plan was to defend in summer 2019, do an industry research stint for a year, and then start a faculty position in 2020. (1-year deferrals are common in academia these days.) So I applied to universities and industry labs in fall 2018. FAIR gave me an offer and also said that I could start working immediately, even though I told them that I'd be doing faculty interviews for most of spring 2019. That seemed like a strictly better option than staying in grad school and making near-minimum wage, so after considering a few other options I chose to join FAIR immediately.
I ended up liking it so much that I turned down my faculty offers and stayed at FAIR. Once I knew I wasn't going the faculty route, there wasn't as much urgency to finish my PhD. I wanted to include one more major project in my thesis, ReBeL, so I held off on defending until that was done. -NB
MetaAI_Official OP t1_izfc2zh wrote
Reply to comment by MetaAI_Official
I love that Meta open-sourced this. I think that's an important point. I really saw this as a way Meta is giving back to the AI community and the scientific community in general, and that's one of the reasons I agreed to join this project. I think it is far better for advances like this to come from open academic research than from top-secret programs, so it is a major ethical tick for Meta that they invest in research like this. -AG
MetaAI_Official OP t1_izfc0o8 wrote
Reply to comment by addition
Meta has no plans to turn CICERO into a product, and that was never the goal. This is purely AI research that we have open-sourced for the wider research community. I think there are a lot of valuable lessons that the research community can learn from this project. -NB
MetaAI_Official OP t1_izfbfrz wrote
Reply to comment by pm_me_your_pay_slips
CICERO tends to devalue backstabbing. It has long been my view that backstabbing is a poor option in the game, and I always feel like I've failed when I have to do it; CICERO seems to agree. It gets clearly better results when it is honest and collaborates with allies over the long term. If you forced it to play a purely tactical game in an environment with communication, it would perform poorly. I think there's a lesson there for human players who want to improve, as well as some interesting AI-ethics questions that can be explored in future work. -AG
MetaAI_Official OP t1_izfawoe wrote
Reply to comment by ClayStep
Actually, the language model was capable of suggesting good moves to a human player *because* the planning side of CICERO had determined these to be good moves for that player and supplied them in an *intent* that conditioned the language model's output. CICERO uses the same planning engine to find moves for itself and to find mutually beneficial moves to suggest to other players. Within the planning side, as described in our paper, we *do* use a fine-tuned language model to propose possible actions for both CICERO and the other players; this model is trained to predict actions directly, rather than dialogue. That gives a good starting point, but the proposals also include many bad moves, which is why we run a planning/search algorithm on top. -DW
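Schematically, the control flow looks something like the sketch below. The class and method names are invented for illustration (not the open-sourced API); the stubs stand in for the planner and the intent-conditioned dialogue model.

```python
# Invented stand-ins for the two components described above.
class StubPlanner:
    def best_joint_plan(self, state, me, partner):
        # The real planner searches over proposal-network candidates and
        # returns moves that are good for us *and* for the other player.
        return ["A PAR - BUR"], ["F ENG S A PAR - BUR"]

class StubDialogueModel:
    def generate(self, state, intent, recipient):
        # The real model is a fine-tuned LM conditioned on the intent, so
        # any suggestion it makes has already been vetted by the planner.
        return (f"To {recipient}: how about {intent['partner'][0]}? "
                f"Then I'll play {intent['me'][0]}.")

def generate_message(planner, dialogue_model, state, me, partner):
    # The same planning engine produces our moves and the partner's,
    # packaged as an "intent" that conditions the outgoing message.
    my_moves, partner_moves = planner.best_joint_plan(state, me, partner)
    intent = {"me": my_moves, "partner": partner_moves}
    return dialogue_model.generate(state, intent, recipient=partner)

print(generate_message(StubPlanner(), StubDialogueModel(),
                       state={}, me="FRANCE", partner="ENGLAND"))
```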
MetaAI_Official OP t1_izfam7u wrote
Reply to comment by MetaAI_Official
There were also some places where it looked like it was heading down strategic blind alleys but kept getting strong results. For me that showed that humans, too, can get stuck in local optima, especially when groups and their collective "meta-strategies" get involved. -AG
MetaAI_Official OP t1_izfa9bo wrote
Reply to comment by pimmen89
I'm not entirely sure if this answers what you were asking, but on the strategic planning side of CICERO, in some sense the fundamental challenge of Diplomacy is that it has a large number of local optima, with no inherent notion of which optimum is better than any other. Because you sometimes need to cooperate with others to do well, the way you need to play depends heavily on the conventions and expectations of other players, and a strategy that is near-optimal in one population of players can be disastrous in another. This is precisely what we observed in earlier work on No-press Diplomacy (Paper). Central to many of our strategic planning techniques in CICERO is the idea of regularization toward human-like behavioral policies, to ensure CICERO's play remains roughly compatible with human play, rather than falling into any of the countless other equilibria that aren't. -DW
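As a toy numerical illustration of that regularization idea (a simplified, hypothetical version, not the exact algorithm from the paper): the planned policy is anchored to a human imitation policy, with a temperature-like parameter trading off value maximization against staying human-compatible.

```python
import numpy as np

def regularized_policy(values: np.ndarray, human_probs: np.ndarray, lam: float) -> np.ndarray:
    # Policy proportional to human_probs * exp(values / lam):
    # lam -> 0 recovers pure value maximization (argmax of values),
    # lam -> infinity recovers the human imitation policy.
    logits = np.log(human_probs) + values / lam
    probs = np.exp(logits - logits.max())  # subtract max for numerical stability
    return probs / probs.sum()

values = np.array([1.0, 0.9, 0.2])   # planner's estimated action values (toy numbers)
human = np.array([0.2, 0.7, 0.1])    # human imitation policy (toy numbers)

print(regularized_policy(values, human, lam=10.0))  # stays close to the human policy
print(regularized_policy(values, human, lam=0.05))  # nearly pure value maximization
```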
MetaAI_Official OP t1_izf9vhi wrote
Reply to comment by nraw
AI sits at the very heart of work across Meta. We are part of Meta AI's Fundamental AI Research team, known as FAIR. Exploratory research, open science, and cross-collaboration are foundational to FAIR's efforts. Researchers like us have the freedom to pursue pure, open-science work and to collaborate with industry and academia.
Research teams also work closely with product teams across Meta. This gives engineers an early view into where the latest in AI is heading, and gives researchers an up-close look at how AI works at scale. This internal collaboration has also helped us build a faster research-to-production pipeline. -AL
MetaAI_Official OP t1_izf9mic wrote
Reply to comment by hophophop1233
While CICERO is only capable of playing Diplomacy, the underlying technology is relevant to many real-world applications. We think others will be able to build on this research in ways that might lead to things like better AI personal assistants or NPCs in the metaverse. I think the way we integrated strategic reasoning with NLP was novel and has implications for future research.
We've open-sourced all the models and code. We're also making the training data available to researchers who apply through our RFP. Running the full CICERO agent, including the strategic reasoning component, is quite expensive. The raw models by themselves are more manageable, though. -NB
MetaAI_Official OP t1_izfet1n wrote
Reply to comment by MetaAI_Official
One of our models trained for several days, and at certain times of the day (but not every day) training speeds would drop dramatically and certain machines would become unstable. After a lot of investigation, it turned out that the datacenter cooling system was malfunctioning, and around mid-day on particularly hot days, GPU failure rates would skyrocket. For the rest of the training run, we kept a weather forecast bookmarked to look out for especially hot days! -JG