If you don't wish to read the entire proposal, I have summarized it in the research poster below which will be presented at the OptaPro 2021 Forum:

Abstract:
This study will identify and evaluate the best attacking strategies to penetrate a high opposition press into an opposition’s half during short goal kicks by:
1) Quantifying a high opposition press
2) Building a tactical recommender system to analyse team profiles
3) Evaluating best attacking strategies used by teams that have similar tactical profiles
The study will initially define a high press by measuring the density of opposition players with a Kernel Density Estimator (KDE) at a given distance from the goal line (based on expert recommendations or an empirical bandwidth) when defending short goal kicks. It will use 9 numbered zones in one half of the pitch, formed by 3 equally split vertical zones and 3 unequally split horizontal zones (box width, left channel, right channel), giving a 3x3 zone matrix. The zones capture how teams move the ball between areas of the pitch, which yields a tactical profile like that of Atlanta United. A hybrid recommender system would combine zone interactions and the player positions involved during this build-up phase to recommend teams that follow a similar approach. The best attacking strategies are then identified by examining underloaded and overloaded zones in this phase.
Introduction
Short Goal Kick Rule: A goal-kick is no longer required to leave the penalty area before a teammate touches the ball, meaning once the goalkeeper touches the ball for the kick, it is immediately considered in play. Opponents must remain outside the penalty area when a goal-kick takes place.
Teams choose to press high to win the ball closer to the opponent's goal, leaving space behind the last line or between the lines. There is a risk-reward trade-off in adopting a high or a low press. A high press concedes space in behind (the risk) in exchange for squeezing the space and the players in front of the ball back towards the attacking team's own goal (the reward). This study limits its inference to opponents who press high during short goal kicks. It will analyse the best attacking strategies adopted by teams whose tactical profile is similar to Atlanta United's, using a combination of tracking and event data.
Research Question:
The research question is divided into four parts:
· How do we define a high opposition press?
· What metrics can be used to understand the tactical profile of a team?
· How can we identify teams with similar tactical styles?
· How can we define and evaluate strategies used by teams based on the tactical styles identified?
Phase I: Defining a high opposition press
A high press can be defined by the height of the defensive block or line a team employs to force a regain of possession closer to the opposition's goal. Low et al. (2018) and Frencken et al. (2012) characterized teams that press high by closer (tighter) team centroids or inter-team distances. High-pressing teams maintain a compact shape with numerical superiority (equal or greater numbers) in a zone closer to goal and apply collective pressure on their opponents as high up the pitch as possible (Bangsbo & Peitersen, 2002). Given this premise, this study would identify a high opposition press through a Kernel Density Estimator (KDE) of player positions close to the opposition's penalty box. A heatmap of opposition players' positions during goal kicks would help classify a high-block defence or a high press (as shown in Fig 1.1).

Since short goal kicks allow more than one player inside the 18-yard box, opposition teams commit more outfield players (starting from outside the box) to press the ball as soon as play restarts. Hence, this study would define a high press using:
1) Kernel Density Estimator (KDE): The density estimator (Fig 1.1) will provide a reasonable estimate of the number of defensive players committed to a press closer to goal. Three approaches to choosing the bandwidth on distance from goal can be used: Silverman's rule, cross-validation, and trial and error (refer to Appendix).
2) Height of the block: The centroid location of the furthest players within the high-pressure and medium-pressure zones, as shown in Fig 1.2, or based on expert opinion.
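The density step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the defender distances are made-up values, the 30 m cut-off for a "high press" band is a placeholder for the expert-defined threshold, and the bandwidth uses Silverman's rule of thumb (the other two options would simply replace `silverman_bandwidth`).

```python
import math
import statistics


def silverman_bandwidth(samples):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5)."""
    n = len(samples)
    sd = statistics.stdev(samples)
    q1, _, q3 = statistics.quantiles(samples, n=4)
    spread = min(sd, (q3 - q1) / 1.34)
    return 0.9 * spread * n ** (-1 / 5)


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density at a given distance."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Hypothetical distances (metres) of opposition outfield players from the
# goal line of the team taking the short goal kick.
distances = [18.0, 19.5, 21.0, 22.0, 24.5, 26.0, 33.0, 41.0, 47.5, 52.0]

h = silverman_bandwidth(distances)
density = gaussian_kde(distances, h)

# Share of the estimated density inside an assumed 30 m "high press" band,
# approximated with a simple Riemann sum.
HIGH_PRESS_LIMIT = 30.0  # assumption, not a value from the proposal
n_steps = 300
step = HIGH_PRESS_LIMIT / n_steps
mass_high = sum(density(i * step) * step for i in range(n_steps))

print(f"bandwidth = {h:.2f} m, density mass within 30 m = {mass_high:.2f}")
```

A larger share of density inside the band would indicate more defenders committed to the press; the threshold that separates a high press from a medium block would still need to come from expert input or the data.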
Phase II: Understand the tactical profile of a team
This study would create a hybrid recommender system to identify teams with a tactical profile similar to Atlanta United's. A hybrid recommender system is based on two methods:
1) Collaborative Filtering: What zones are the teams using when playing out from the back into the opposition half? (Cano & Morisio, 2019)
This filtering technique would focus on identifying the specific zones that teams use when playing out from the back into the opposition half.

Fig 2.1 (left) shows the numbered zones and Fig 2.2 (right) shows different possession chains leading to absorption (zone 10).
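As a rough sketch of how a ball location could be mapped to one of these zones, the snippet below assumes a 105 x 68 m pitch, x measured from the team's own goal line, y measured from one touchline, and a row-by-row numbering from 1 to 9. The numbering and orientation are assumptions for illustration and may not match Fig 2.1; only the penalty-area width (44 yards, about 40.32 m) is a fixed dimension.

```python
PITCH_LENGTH = 105.0  # metres (assumed pitch size)
PITCH_WIDTH = 68.0    # metres (assumed pitch size)
BOX_WIDTH = 40.32     # penalty-area width: 44 yards in metres
HALFWAY = PITCH_LENGTH / 2
ABSORPTION_ZONE = 10  # ball has reached the opposition half


def zone_of(x, y):
    """Map a ball location to zones 1-9 (own half) or 10 (absorption).

    Assumed numbering: three equal bands along the pitch (own goal first),
    each split into first channel / box width / second channel.
    """
    if x >= HALFWAY:
        return ABSORPTION_ZONE
    band = min(int(x // (HALFWAY / 3)), 2)
    channel_edge = (PITCH_WIDTH - BOX_WIDTH) / 2
    if y < channel_edge:
        lane = 0  # channel nearest y = 0
    elif y <= PITCH_WIDTH - channel_edge:
        lane = 1  # box width
    else:
        lane = 2  # channel nearest y = PITCH_WIDTH
    return band * 3 + lane + 1


print(zone_of(5.0, 34.0), zone_of(30.0, 34.0), zone_of(60.0, 34.0))
```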
Given a set number of actions in a sequence of play, the collaborative filtering technique would create a matrix of zone interactions made by teams using a Boolean indicator, i.e. interaction: 1, no interaction: 0.

The above image shows an example of unique interaction zones. Unique interaction zones are obtained by removing consecutive repeats of a zone across actions (i.e. only the first interaction is recorded when the same zone occurs in consecutive actions). The number of actions to consider is estimated empirically by finding the boundary beyond which the odds of reaching the absorption state drop (e.g. 8 actions). Teams with similar interaction zones can then be identified using K-Nearest Neighbours or a clustering (K-means) algorithm.
K-Nearest Neighbours is popular in collaborative filtering recommender systems, which are the most common family of recommenders. It analyses the neighbourhood of a team to find teams with similar profiles or tactical characteristics. The clustering (K-means) technique would instead group similar teams into a predefined number of clusters. The number of clusters is chosen at the "elbow" of the plot shown below, i.e. the point where adding another cluster no longer reduces the sum of squared errors by much.
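The steps above (collapsing consecutive zones, building the Boolean interaction matrix, and finding the nearest teams) could look roughly like this. The team names other than Atlanta United, the zone sequences, and the choice of Jaccard distance as the neighbour metric are all assumptions made for the example; the proposal does not fix a distance measure.

```python
from itertools import groupby

ZONES = list(range(1, 11))  # zones 1-9 plus absorption zone 10


def unique_interactions(zone_sequence, max_actions=8):
    """Keep the first max_actions actions and collapse consecutive repeats."""
    truncated = zone_sequence[:max_actions]
    return [zone for zone, _ in groupby(truncated)]


def boolean_row(sequences):
    """1 if the team interacted with a zone in any sequence, else 0."""
    visited = {z for seq in sequences for z in unique_interactions(seq)}
    return [1 if z in visited else 0 for z in ZONES]


def jaccard_distance(a, b):
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1 - both / either if either else 0.0


def nearest_teams(target, matrix, k=2):
    """Return the k teams whose interaction rows are closest to the target."""
    others = [
        (jaccard_distance(matrix[target], row), team)
        for team, row in matrix.items()
        if team != target
    ]
    return [team for _, team in sorted(others)[:k]]


# Hypothetical possession chains (zone per action) from short goal kicks.
sequences = {
    "Atlanta United": [[2, 2, 5, 4, 4, 7, 10], [2, 1, 4, 7, 10]],
    "Team A": [[2, 5, 5, 4, 7, 10], [2, 1, 1, 4, 10]],
    "Team B": [[2, 3, 6, 9, 10]],
    "Team C": [[2, 5, 8, 10]],
}

matrix = {team: boolean_row(seqs) for team, seqs in sequences.items()}
neighbours = nearest_teams("Atlanta United", matrix, k=2)
print(neighbours)
```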
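To make the elbow idea concrete, here is a small, dependency-free K-means sketch that reports the sum of squared errors (SSE) for several cluster counts. The two-dimensional team features are invented for the example, and the deterministic farthest-first initialisation is a simplification chosen to keep the output reproducible; a real analysis would more likely use a library implementation on the full zone-interaction matrix.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Deterministic initial centroids: start at the first point, then
    repeatedly add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans_sse(points, k, iterations=50):
    """Run Lloyd's algorithm and return the final sum of squared errors."""
    centroids = farthest_first(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[idx].append(p)
        new_centroids = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centroids[i]  # keep centroid if cluster is empty
            for i, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Hypothetical two-feature team profiles (e.g. share of chains through two
# different groups of zones); values are illustrative only.
teams = [
    (0.10, 0.20), (0.15, 0.25), (0.20, 0.20),
    (0.80, 0.20), (0.85, 0.25), (0.90, 0.20),
    (0.50, 0.90), (0.45, 0.85), (0.55, 0.95),
]

sse = {k: kmeans_sse(teams, k) for k in range(1, 6)}
for k, value in sse.items():
    print(k, round(value, 4))
```

With this toy data the SSE falls sharply up to three clusters and barely changes afterwards, which is the pattern the elbow plot is meant to reveal.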

2) Content-Based Filtering (CBF): What player positions are involved in these zones when moving the ball from a goal kick until absorption (opposition half)? (Cano & Morisio, 2019)
CBF assumes that teams that have played out from the back through specific players in the past will most likely involve the same players in the future. It considers the frequency of player interactions within teams and provides recommendations accordingly. For this study, a matrix of player interactions (i.e. the frequency of player-position interactions) would be considered: given a set number of actions, the top 'n' player-position interactions in a sequence of play leading up to the opposition half.
Methodology: Using a cosine similarity function, we can identify which teams are similar or closely related in their preferred player interactions when building out from the back. As shown below, cosine similarity is a metric used to measure similarity between documents (here, teams' sequences) irrespective of their size. Since this filtering method works directly on interaction frequencies, no further transformation of documents into numbers is needed to make them suitable for the model.
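Cosine similarity between two frequency vectors u and v is their dot product divided by the product of their lengths. A minimal sketch follows; the position pairs and counts are hypothetical and only illustrate why the measure ignores how many sequences a team has.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical counts of player-position interactions, in the order
# GK->CB, CB->FB, CB->DM, DM->CM.
atlanta = [12, 8, 6, 4]
team_a = [24, 16, 12, 8]  # same pattern, twice as many sequences
team_b = [2, 1, 10, 12]   # builds through midfield instead

sim_a = cosine_similarity(atlanta, team_a)
sim_b = cosine_similarity(atlanta, team_b)
print(round(sim_a, 3), round(sim_b, 3))
```

Team A scores as identical to Atlanta United despite having double the volume, while Team B scores lower because its interactions are distributed differently.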

Phase III: Hybrid Recommender System: Identify teams with similar tactical styles
Considering both filtering techniques, a weighted approach may be used to produce a unified recommendation. Whilst a hybrid recommender system is typically more accurate than its individual components, this study might only consider the collaborative filtering technique. This is due to the sparsity in the classification of positions in the f73 dataset, which may cause problems for the final recommendations (Cano & Morisio, 2019).
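One simple way to combine the two techniques is a weighted average of their similarity scores. The sketch below assumes both scores are already on a 0 to 1 scale; the scores and the 0.7 weight are placeholders, not values from the study. Setting the weight to 1.0 reduces it to the collaborative-filtering-only option discussed above.

```python
def hybrid_scores(cf_scores, cbf_scores, cf_weight=0.5):
    """Weighted average of collaborative and content-based similarity."""
    teams = cf_scores.keys() & cbf_scores.keys()
    return {
        team: cf_weight * cf_scores[team] + (1 - cf_weight) * cbf_scores[team]
        for team in teams
    }


# Hypothetical similarity of each team to Atlanta United.
cf = {"Team A": 1.00, "Team B": 0.22, "Team C": 0.43}   # zone interactions
cbf = {"Team A": 1.00, "Team B": 0.55, "Team C": 0.30}  # player positions

scores = hybrid_scores(cf, cbf, cf_weight=0.7)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```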
Phase IV: Define and Evaluate attacking strategies based on Similar team profiles
Consider a scenario in which we have identified at least 10 teams with a similar style to Atlanta United using the hybrid recommender system. These teams may or may not be in the same league, as speed of play has not been factored into the recommendation engine. The design could be extended to include speed, based on expert knowledge. This proposal won't narrow the sample space by adding another variable (speed, distance the ball is moved) as a filter, but this can be done at a later stage.
Practical Applications
This study would lead to answering key questions such as:
1) How often do teams find a player or players in overloaded/underloaded zones when playing out from the back?
2) Given the number of players interacting with the ball in underloaded zones, how often are teams successful?
3) Given the number of players interacting with the ball in overloaded zones, how often are teams successful?
4) How effective are the attacking strategies defined by these player-zone interactions?
Creating a zone matrix that records whether teams are underloaded or overloaded in each zone when the ball is in a specific zone would help us understand the success rate of penetration, as shown in the images below:
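A possible way to tabulate this is sketched below. It assumes a zone counts as overloaded when the attacking team has more players in it than the opposition and underloaded when it has fewer, and that success means the sequence reached the opposition half; both definitions and all records are illustrative assumptions.

```python
from collections import defaultdict


def load_state(attackers, defenders):
    """Classify a zone from the attacking team's point of view (assumed rule)."""
    if attackers > defenders:
        return "overloaded"
    if attackers < defenders:
        return "underloaded"
    return "even"


def success_rates(records):
    """Success rate of penetration per (ball zone, load state)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [successes, attempts]
    for zone, attackers, defenders, success in records:
        key = (zone, load_state(attackers, defenders))
        totals[key][0] += int(success)
        totals[key][1] += 1
    return {key: wins / n for key, (wins, n) in totals.items()}


# Hypothetical records: (ball zone, attackers in zone, defenders in zone,
# whether the sequence reached the opposition half).
records = [
    (4, 3, 2, True), (4, 3, 2, True), (4, 3, 2, False),
    (4, 1, 2, False),
    (5, 2, 3, False), (5, 2, 3, True),
    (5, 3, 3, True),
]

rates = success_rates(records)
for key in sorted(rates):
    print(key, round(rates[key], 2))
```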

References
Frencken, W., de Poel, H. J., Visscher, C., & Lemmink, K. (2012). Variability of inter-team distances associated with match events in elite-standard soccer. Journal of Sports Sciences, 30(12), 1207. https://doi.org/10.1080/02640414.2012.703783
Bangsbo, J., & Peitersen, B. (2002). Defensive Soccer Tactics. Human Kinetics.
Cano, E., & Morisio, M. (2019). Hybrid Recommender Systems: Systematic Literature Review.