2023 Novelty Track
This year we continue the successful AIBIRDS Novelty Track. In past AIBIRDS competitions, all game objects were known in advance and did not change over the years. However, in the real Angry Birds game new objects appear all the time: new birds with new capabilities, new types of blocks with different behaviour, new game features, new backgrounds, and so on. Humans can deal with such novelties very effectively; for AI this is much harder. Learning-based systems, such as Deep Learning systems, often need a lot of training data to perform well. If novelty is introduced, these systems typically need to retrain, again with lots of training data. One of the big challenges in AI is to develop systems that can deal with novelty as efficiently and effectively as humans and adjust to it quickly. Encouraging the development of such AI systems, and testing and comparing them, is the purpose of our novelty track.
There are many fundamentally different kinds of novelty, which we call novelty levels. Our aim is to introduce new novelty levels for each competition. For the second competition, we focus on the first six novelty levels (according to: Ted Senator. Open World Novelty Hierarchy, in: Science of Artificial Intelligence and Learning for Open-World Novelty, BAA, 2019):
- Novelty level 0 (Instances): Previously unseen objects or entities. This corresponds to new Angry Birds game levels and is what we already do as part of our standard AIBIRDS competition.
- Novelty level 1 (Class): Previously unseen classes of objects or entities. This corresponds to new game objects with new properties, such as a new type of block that behaves differently to previous block types. These new game objects can be visually distinguished from known objects, but at first sight it is unknown how they behave.
- Novelty level 2 (Attribute): Change in a feature of an object or entity, such as colour, shape, or orientation, that was not previously relevant to classification or action. In our competition this corresponds to modified object properties; for example, wood blocks now have twice the mass they had before. Many of these novelties cannot be seen, but lead to different gameplay behaviour. We will not introduce new capabilities and will not change the environment, only game objects.
- Novelty level 3 (Representations): Change in how entities or features are specified, corresponding to a transformation of dimensions or coordinate systems, not necessarily spatial or temporal. An example for our competition would be that screenshots will be in greyscale only rather than in colour.
- Novelty level 4 (Relations, static): Change in allowed (static) relationships between game entities. An example for our competition would be that pigs can now be underground.
- Novelty level 5 (Interactions, dynamic): Change in allowed interactions between game entities. For example, a pig could try to escape from an approaching bird by moving away from the bird.
As part of the novelty track, we will test agents' capabilities to deal with these novelty levels. We plan to run the novelty track as follows (this might change based on feedback we get from participants):
- For each novelty level 1-5, we introduce several novelties. For example, for novelty level 1, a single new game object, such as a new block material like clay, constitutes one novelty. For novelty level 2, a specific change to one existing parameter value of an existing game object, for example the mass or friction of a wood block, constitutes one novelty. The novelties we introduce for the competition are unknown to participants.
- For every novelty 1-5 we introduce, we generate game levels that each include this one particular novelty (in future competitions, game levels may include more than one novelty from different novelty levels). Game levels can include more than one of the same novel object.
- Participants submit their novelty agents to us seven days before the competition. Participants can train their agents as much as they like before they submit their agents, for example using the sample novelties we published, or novelties they create themselves, but not on the competition novelties.
- The competition consists of multiple trials. Each trial t_i is dedicated to one specific novelty and consists of a sequence of n different Angry Birds games. The first m_i games of a trial are standard games without novelty; the following n-m_i games are games with novelty. Neither n nor m_i is known to participants, and m_i can be between 0 and n; that is, a trial might consist of only standard games, only novel games, or a sequence of standard games followed by novel games. Each game in a trial can only be played once, and the games in a trial have to be played in the given order. There will be a time limit per trial.
- Agents are required to report, for every game in a trial, whether they believe the novelty switch has happened. The report is a value between 0 and 1, where any value above 0.5 is interpreted as meaning that the trial has switched to novel games. For each game we record the solved score achieved by the agent. If a game has been solved (= all pigs have been killed), the solved score is equal to the game score. If a game has not been solved, the solved score is 0.
- For each agent we measure the following:
- For each novelty level, we measure the aggregated solved score. This is the sum of the solved scores of each game that contains novelty (or of all games for novelty level 0).
- For each novelty level 1-5, we measure the percentage of correctly detected trials (%CDT). These are trials where the agent reports that novelty has been detected for the first time for a novel game (=no false positives and at least one true positive, if novel games were present in the trial).
- For each novelty level, we also measure the average number of novel games needed to detect novelty (#NGN). For a given trial, if game #5 is the first novel game, but novelty has only been detected in game #10, then the number of novel games needed for this trial is 6. This is only recorded for correctly detected trials with novel games.
- For each novelty level, we determine the agent with the highest aggregated solved score, as well as the agent with the highest novelty detection score = %CDT * (MAX-NGN - #NGN), where MAX-NGN is the #NGN value that would result if novelty were only detected at the last game of a trial, averaged over all trials.
- The winner of the competition is the agent with the highest aggregated solved score across all the novelty levels 1-5. There will be subcategories for the best-performing agent for each of the six novelty levels. There will also be a special award for the agent with the best novelty detection score across all novelty levels 1-5, as well as subcategories for each of the five novelty levels 1-5.
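To make the trial evaluation above concrete, here is a minimal sketch in Python of how the per-trial metrics could be computed. This is our own illustrative code, not the official evaluation software; the function name, arguments, and data layout are all assumptions.

```python
def score_trial(solved_scores, novelty_reports, first_novel_index):
    """Score a single trial.

    solved_scores[i]    -- solved score for game i (0 if the game was not solved)
    novelty_reports[i]  -- agent's reported value in [0, 1] for game i
    first_novel_index   -- index of the first novel game, or None if the
                           trial contains standard games only
    Returns (aggregated solved score, correctly detected, #NGN).
    """
    n = len(solved_scores)
    # Any report above 0.5 counts as "novelty detected".
    detections = [r > 0.5 for r in novelty_reports]

    # Aggregated solved score counts only games that contain novelty.
    novel_games = range(first_novel_index, n) if first_novel_index is not None else []
    aggregated = sum(solved_scores[i] for i in novel_games)

    # First game at which the agent reported novelty, if any.
    first_detection = next((i for i, d in enumerate(detections) if d), None)

    if first_novel_index is None:
        # Trial with standard games only: correct means no false positives.
        correct = first_detection is None
        ngn = None
    else:
        # Correct means no false positives and at least one true positive.
        correct = first_detection is not None and first_detection >= first_novel_index
        # Number of novel games needed, counted inclusively
        # (first novel game at #5, detected at #10 -> 6).
        ngn = first_detection - first_novel_index + 1 if correct else None

    return aggregated, correct, ngn
```

Aggregating over all trials of a novelty level, %CDT would be the fraction of trials with `correct == True`, and the average of the returned `ngn` values over the correctly detected trials with novel games gives #NGN for the detection score formula above.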
The competition will use the Science Birds software originally developed by Lucas Ferreira. We have extended the original framework to include novelty and to offer a speed-up of gameplay by up to 50 times. Our new Science Birds framework allows you to generate and load game levels and immediately play them; that is, you can easily use machine learning and deep learning approaches for developing and training your agents. We have also developed a new API which provides screenshots as well as "noisy ground truth", which resembles what you could obtain from the screenshots using state-of-the-art computer vision. Agents can use screenshots, noisy ground truth, or both.
The current version of our software framework is available as open source (please use the latest release, currently 0.5.12) and includes a number of sample novelties and corresponding games for each of the novelty levels we use for the competition. You can use these sample games for developing and testing your agent. The novelties used in the competition will be different from the sample novelties provided in advance, but of a similar type. Registered competition participants can obtain access to the source code of our modified version of Science Birds in order to introduce their own novelties for training and testing their agents.
Note that novelty level 0 is similar to the standard AIBIRDS competition, with the only exception that we use Science Birds instead of Angry Birds Chrome. Science Birds is more natural for Machine Learning and Deep Learning-based agents, as it provides a large amount of training data as well as up to 50 times faster gameplay, and we are considering moving our whole competition to Science Birds in future years. We therefore also encourage teams who are not interested in building novelty agents to build agents based on the Science Birds framework.
We hope you find this new track appealing and hope to advance the state of the art in this important field of AI.