2021 Competition Results

This year we ran two tracks of the competition: the Standard Track, which is based on Angry Birds Chrome, and the new Novelty Track, which is based on Science Birds.

For the Standard Track, agents have 30 minutes per round to solve 8 new Angry Birds levels that we designed ourselves. Each level contains only game objects that are known to the agents. We did not run any of the agents on the new competition levels before the competition, and the levels were unknown to all participants. During a match, all agents can see the current high scores per level for all agents of the same match and can use this information to select which level to attempt next. Levels can be played again in any order until the time is up. Each agent runs on an individual laptop and can use its full computational power. We ran 4 agents per round, and every round was live streamed as part of IJCAI. The overall score per agent is the sum of their highest score for each of the 8 game levels. The two agents with the highest overall score progress to the next round, until we have a winner.
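To make the scoring rule concrete, here is a minimal sketch in Python of how an agent's overall score could be computed from its attempts; the per-level scores below are purely hypothetical and not actual competition data.

    # Minimal sketch of the Standard Track scoring rule: an agent's overall
    # score is the sum of its highest score achieved on each of the 8 levels.
    # The attempt data below is purely illustrative.
    attempts = {
        "level_1": [31200, 45800],      # scores from repeated attempts on level 1
        "level_2": [0, 62100, 58900],   # a score of 0 means the attempt failed
        "level_3": [],                  # level never solved: contributes 0
        # ... levels 4 to 8 ...
    }

    overall_score = sum(max(scores, default=0) for scores in attempts.values())
    print(overall_score)   # sum of the best attempt per level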

We had five participants this year: the unmodified defending AIBIRDS 2019 Champion BamBirds, which we call BamBirds 2019, and the improved version BamBirds 2021, both from Bamberg University in Germany; Agent X from ANU in Australia; Shiro from NIAD in Japan; and MEMI from GIST in South Korea. In order to determine the four agents for the Semi Final, we ran a qualification round using the 8 games from previous competitions. The four best agents qualified for the live Semi Final match, while the fifth agent was run individually on the 8 Semi Final games.

In the 2019 competition, there was a huge improvement in performance compared to previous years. The two best agents from 2019, BamBirds and SimbaDD, both smashed the best agents from 2018. Given that BamBirds 2019 was so good, and SimbaDD didn't participate, we were concerned that BamBirds would dominate this competition and that both versions would progress to the Grand Final. At the start of the Semi Final this seemed to be the case: both BamBirds versions started to solve game levels, with BamBirds 2021 taking the lead. Agent X was slower, but started to catch up with BamBirds 2019. After a while, Agent X finally moved ahead of BamBirds 2019 and started to close in on BamBirds 2021. In the end, BamBirds 2021 won the Semi Final with 312,910 points. Agent X came second with 270,200 points and also qualified for the Grand Final. BamBirds 2019 was third with 235,040 points and MEMI fourth with 142,400 points. Shiro was stuck at the first of the eight levels and ended up with 0 points. You can watch the recording of the Semi Final.

In the Grand Final between BamBirds 2021 and Agent X, Agent X solved the first level and took the lead. After a while, BamBirds 2021 caught up and went ahead. Both agents were solving more games and the score was very close, with BamBirds 2021 staying slightly ahead. Then Agent X overtook BamBirds 2021 again and slowly extended its lead. BamBirds couldn't solve the games that Agent X was solving and changed its strategy to replaying already solved games. But that strategy was not successful, and Agent X increased its lead further. In the end, Agent X was so far ahead that a victory seemed certain. Once the thirty minutes were up, Agent X won with 257,330 points and was the new AIBIRDS 2021 Champion! Congratulations to Daniel Lutalo from ANU! BamBirds 2021 ended up in second place with 168,290 points. Agent X solved all 8 games of the Grand Final and is the deserved winner. You can watch the recording of the Grand Final.

 

The Novelty Track was very different from the Standard Track, not only because it was based on Science Birds, which has slightly different physics from Angry Birds, but also because we introduced novelty that was unknown to the participants. This novelty was of three different types, which we call novelty levels. Novelty level 1 is new game entities that are visibly different from known game objects and have different properties. These could be new birds with new special powers, new pigs, or new blocks and other new game objects. Novelty level 2 is existing game entities, but with a changed parameter, which could be any of the parameters that define the game entities. The difficulty here is that agents can only determine what has changed by interacting with the modified game entity and observing the outcome. Novelty level 3 is a change in game representation. For example, the game could be upside down or black and white. Before the competition we released 12 test novelties, 5 each for novelty levels 1 and 2, and 2 for novelty level 3, for participants to experiment with and develop their agents. But the novelties we used in the competition (2 per novelty level) were unknown to participants. The Novelty Track is much more realistic, as novelty is introduced very often in the real Angry Birds game: new game entities, new capabilities, or new game versions. The capability to deal with novelty is of utmost importance for AI dealing with real-world situations, where novelty occurs very frequently. AI agents need to be able to detect what is novel and to adjust to it. An example is a future household robot. Whenever you purchase a new item, the household robot needs to understand what the new item is, what it does and how it can be used, just like other members of the household.

The novelties we used in the competition are the following:

Novelty Level 1 (Previously unseen objects or entities):

  • Novelty 1.1: New egg-shaped object which gives -10,000 points when hit
  • Novelty 1.2: New bird with low friction and low bounciness that can slide on the ground

Novelty Level 2 (Change in object features):

  • Novelty 2.1: Pig color changed to red
  • Novelty 2.2: Launch force of the red bird increased, so the bird shoots further

Novelty Level 3 (Change in representation):

  • Novelty 3.1: The game is flipped upside down, agents need to shoot downwards
  • Novelty 3.2: Changed color map from RGB to BGR, so all objects have a different color.

In order to evaluate agents on these novelties, we set up a competition with ten different trials for each of the six novelties. A trial is a fixed sequence of Angry Birds games; each game can be played only once and games must be played in the given order. Each trial has an unknown number of standard, non-novel games at the beginning, followed by a fixed number of novel games, i.e., at some point the games change from non-novel to novel. The task of the agents remains to solve each level, i.e., to kill all the pigs with as few birds as possible. How to solve this task can change quite a bit when novelty is introduced, and it is possible that agents unable to deal with novelty cannot solve any games anymore. Agents need to detect the novelty and adjust to it, a very difficult task. Each trial contained between 0 and 10 non-novel games, followed by 40 novel games, but these settings were unknown to participants. In addition to solving the games, agents also had to report when they believe novelty has been introduced. Therefore, agents are evaluated on two aspects: (1) their novelty detection performance, which is based on the percentage of trials where they correctly detect novelty (i.e., they report novelty after novelty occurs) and on the number of novel games they need before they can detect it; (2) their novelty reaction performance, which is the overall game score they received in the novel games. See here for a more detailed description of these measures. Given that we used 6 novelties, with ten trials per novelty, plus ten trials without novelty, and each trial consists of around 50 games, each agent had to play around 3,500 games. We were therefore not able to run the competition live, but ran it in advance on AWS. Agents had on average three minutes per game.
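To make these two measures more concrete, below is a minimal sketch in Python, using hypothetical trial data and simplified assumptions; the official measures are defined in the detailed description linked above and may differ in detail.

    # Sketch of the two evaluation aspects: novelty detection and novelty reaction.
    # The trial data and exact aggregation are simplified assumptions for illustration.
    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class Trial:
        novelty_starts_at: int         # index of the first novel game in the trial
        reported_at: Optional[int]     # game index at which the agent reported novelty (None = never)
        novel_game_scores: List[int]   # scores the agent achieved on the novel games

    def detection_summary(trials):
        """Fraction of trials with a correct (not premature) novelty report,
        and the average number of novel games played before the report."""
        correct = [t for t in trials
                   if t.reported_at is not None and t.reported_at >= t.novelty_starts_at]
        frac_detected = len(correct) / len(trials)
        avg_delay = (sum(t.reported_at - t.novelty_starts_at for t in correct) / len(correct)
                     if correct else float("inf"))
        return frac_detected, avg_delay

    def reaction_score(trials):
        """Overall game score accumulated over the novel games of all trials."""
        return sum(sum(t.novel_game_scores) for t in trials)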

We had six teams who were brave enough to participate in this extremely challenging competition: BamBirds from the University of Bamberg, the winner of the previous Standard Track; CIMARRON from the University of Massachusetts Amherst; Dongqing 1 from Bytedance and Monash University; HYDRA from the Palo Alto Research Center and the University of Pennsylvania; OpenMIND from Smart Information Flow Technologies; and Shiro from NIAD-QE. The overall winner of the competition is the agent with the best novelty reaction performance across all novelties. In addition, we have two other major awards: one for the agent with the best novelty detection performance across all novelties, and one for the agent with the best non-novelty performance, which corresponds to the Standard Track. We also have six subcategories: best novelty detection performance and best novelty reaction performance for each of the three novelty levels.

We presented the results live as part of the AIBIRDS Competition Session at IJCAI 2021. The presentation, which includes details about the six novelties we used, can be found here.

The overall winner and AIBIRDS 2021 Novelty Champion is CIMARRON from the University of Massachusetts Amherst! Congratulations to David Jensen and his team! Second place went to Dongqing 1, third place to HYDRA.

  • Subcategory winners for the best novelty reaction performance are: BamBirds for novelty level 1, and CIMARRON for novelty levels 2 and 3.

The winner of the Novelty Detection Award is OpenMIND from SIFT! Congratulations to David Musliner and his team! Second place went to HYDRA, third place to Dongqing 1.  

  • Subcategory winners for the best novelty detection performance are: CIMARRON for novelty level 1, Dongqing 1 for novelty level 2, and OpenMIND for novelty level 3.

The winner of the Non-Novelty Award is Dongqing 1 from Bytedance and Monash University! Congratulations to Dongqing Wen! Second place went to BamBirds, third place to CIMARRON. 

Detailed results, including results per novelty, can be found in the presentation slides. Results for the three award categories are in the table at the end of this page.

 

That was the end of a very exciting competition. We saw many improvements and amazing shots. In the Standard Track, two agents beat the previous winner, and the best agent was able to solve all 8 levels of the Grand Final. Unfortunately, we were unable to run our Man vs Machine Challenge this year to test whether the best agents are already better than human players, as our competition was virtual and not in person due to COVID-19. We hope to be able to test this again at next year's competition at IJCAI 2022. The Novelty Track was a real highlight this year, with excellent performance by different teams and three different winners of our three awards. We will definitely continue the Novelty Track next year. We will also consider porting our Standard Track to Science Birds, in order to make it easier for learning agents to use large amounts of training data and faster gameplay (up to 50 times faster than Angry Birds Chrome).

We hope to see many improved agents and many new agents at our next competition in 2022. Angry Birds remains a very challenging problem for AI and the new Novelty Track makes it even more challenging. We are still waiting to see an exceptionally good deep learning agent. We encourage and challenge all members of the AI community to take on this problem and to develop AI that can successfully deal with a physical environment. See you in 2022! 

Jochen, Katya, Vimu, Chathura, Cheng and Peng. 

 

The main results of the competition can be found in the following table:   

Standard Track

Semi Final
  1. BamBirds 2021   312,910
  2. Agent X   270,200
  3. BamBirds 2019   235,040
  4. MEMI   142,400
  5. Shiro   0

Grand Final
  1. Agent X   257,330
  2. BamBirds 2021   168,290

Novelty Track

Novelty Reaction (total score over 2,400 novel games)
  1. CIMARRON   49,653,730
  2. Dongqing 1   41,231,190
  3. HYDRA   37,766,360

Novelty Detection (max score 39)
  1. OpenMIND   26.32
  2. HYDRA   25.12
  3. Dongqing 1   24.59

Non-Novelty Performance (total score over 461 non-novel games)
  1. Dongqing 1   16,942,950
  2. BamBirds   16,616,040
  3. CIMARRON   15,561,600