Fall In Love With Network Science At First Sight!
Two and a half years ago, I registered for the Network Science Data class, my very first course in Network Science. Coming from an engineering background, I felt dumb during the course! I literally had no clue what was going on in class. However, the amazing Nicola Perra turned the class into one of my best academic experiences. He covered many aspects of Network Science, including the computational representation of networks, network metrics, community detection, temporal networks, and dynamics on networks. At the beginning I was slow and dull, but the class was engaging enough to make me fall in love with Network Science!
For the final project, we had to come up with a question and analyze real network data to answer it. As a soccer fanatic, I found it easy to decide what to work on! I decided to analyze Premier League data to see whether I could develop a predictive framework to forecast the final table standings. Before going further, a spoiler alert! As I said, this was a small course project: the analyses are rudimentary, somewhat naive, and certainly have several flaws. I make no claim to have gotten the best out of the data. The purpose of this post is to review how I started getting interested in Network Science!
Let me start with an introduction. The Premier League is an English professional league for men's association football clubs. At the top of the English football league system, it is the country's primary football competition. Contested by 20 clubs, it operates on a system of promotion and relegation with the Football League. Besides English clubs, the Welsh clubs that compete in the English football pyramid can also qualify to play. Seasons run from August to May, with teams playing 38 matches each (playing each team in the league twice, home and away) totalling 380 matches in the season [Wikipedia].
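The fixture arithmetic is easy to check: each of the 20 clubs hosts each of the other 19 once, so there are 20 × 19 = 380 ordered (home, away) pairs and 2 × 19 = 38 matches per club. A quick Python check with placeholder club names (hypothetical, not real teams):

```python
from itertools import permutations

# 20 placeholder clubs (hypothetical names, not the real league)
teams = [f"Team {i}" for i in range(1, 21)]

# Every ordered (home, away) pair of distinct clubs is exactly one fixture
fixtures = list(permutations(teams, 2))

# Count the fixtures one club takes part in, home or away
matches_per_team = sum(1 for home, away in fixtures if "Team 1" in (home, away))

print(len(fixtures), matches_per_team)  # 380 38
```

Each ordered pair is also one directed link in the season network built in the next section.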
Where I got the data from ...
Football-Data is a free football betting portal providing historical results and odds to help football betting enthusiasts analyze many years of data quickly and efficiently to gain an edge over the bookmaker. Whilst other football results and odds databases do exist, Football-Data is unique in making available computer-ready data in Excel and CSV format for quantitative analysis [football-data]. The datasets contain information about each match, including the home and away teams, full-time goals for each side, the full-time result, shots, shots on target, corners, and more. I converted this information into a network. Suppose Manchester United and Chelsea play each other twice and Manchester United wins both matches, 4-2 and 3-1.
Aggregating these two results, I created two directed weighted links between the teams. The link from Manchester United to Chelsea has a weight of 3, the total number of goals Chelsea scored against Manchester United. In other words, Manchester United sends 3 credits to Chelsea. Likewise, the link from Chelsea to Manchester United has a weight of 7. Doing this for every pair of teams gives a complete directed weighted network at the end of each season.
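The aggregation step can be sketched in Python. This is a minimal sketch, not the code used for the project: it assumes rows shaped like Football-Data's CSV files with the columns HomeTeam, AwayTeam, FTHG and FTAG (full-time home and away goals), so verify the column names against the file you download. The function name `build_credit_network` and its `home_col`/`away_col` parameters are hypothetical, and the two sample rows simply restate the Manchester United vs Chelsea example above.

```python
import csv
import io
from collections import defaultdict


def build_credit_network(rows, home_col="FTHG", away_col="FTAG"):
    """Return credits[u][v]: total of the chosen statistic v recorded against u.

    Each match adds the away team's value to the home->away link and the
    home team's value to the away->home link, so u "sends" v one credit per
    goal v scored against u. Column names are assumptions; verify them.
    """
    credits = defaultdict(lambda: defaultdict(float))
    for row in rows:
        home, away = row["HomeTeam"], row["AwayTeam"]
        credits[home][away] += float(row[away_col])  # away side's goals vs home
        credits[away][home] += float(row[home_col])  # home side's goals vs away
    return {u: dict(out) for u, out in credits.items()}


# The post's example: Manchester United win 4-2 at home and 3-1 away
sample = """HomeTeam,AwayTeam,FTHG,FTAG
Man United,Chelsea,4,2
Chelsea,Man United,1,3
"""
network = build_credit_network(csv.DictReader(io.StringIO(sample)))
print(network["Man United"]["Chelsea"])  # 3.0
print(network["Chelsea"]["Man United"])  # 7.0
```

Passing other column pairs, for example the home and away shots-on-target columns (assumed here to be HST and AST), builds the same kind of network from a different statistic.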
How to predict the final ranking ...
I wanted to find a way to use this network to develop a predictive model that forecasts the final ranking of all teams. Therefore, I used a PageRank-like algorithm in which the goals scored by team A against team B are the credits team B sends to team A, and vice versa. The PageRank values are iteratively distributed around the network according to these credits. Since every team plays every other team, all nodes have the same number of incoming and outgoing links, so standard PageRank, which splits a node's score evenly over its outgoing links, would give every team the same value. The algorithm therefore has to be redefined so that scores flow according to link weights rather than link counts. I applied this algorithm to the network of the 2013-14 season. The network is shown below; node size is proportional to PageRank and the thickness at the end of each link is proportional to its weight.
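The post does not give the exact update rule, so here is a hedged sketch of one way to do it: a weighted PageRank computed by power iteration, in which each team passes its score to opponents in proportion to the goals they scored against it. The teleportation parameter `alpha = 0.85` is the conventional PageRank default and is my assumption (it is not a recency weight). The function name `weighted_pagerank` and the three-team toy network are hypothetical.

```python
def weighted_pagerank(credits, alpha=0.85, tol=1e-12, max_iter=1000):
    """Weighted PageRank by power iteration.

    credits[u][v] = goals v scored against u; u passes its score to each v
    in proportion to that weight. alpha is the usual PageRank teleportation
    parameter (an assumption, not taken from the original project).
    """
    teams = sorted(set(credits) | {v for out in credits.values() for v in out})
    n = len(teams)
    rank = {t: 1.0 / n for t in teams}
    for _ in range(max_iter):
        new = {t: (1.0 - alpha) / n for t in teams}
        for u in teams:
            out = credits.get(u, {})
            total = sum(out.values())
            if total == 0:
                # u conceded nothing: spread its score evenly (dangling node)
                for v in teams:
                    new[v] += alpha * rank[u] / n
            else:
                for v, w in out.items():
                    new[v] += alpha * rank[u] * w / total
        diff = sum(abs(new[t] - rank[t]) for t in teams)
        rank = new
        if diff < tol:
            break
    return rank


# Hypothetical three-team season: A scores a lot and concedes little, C the opposite
toy = {
    "A": {"B": 1, "C": 1},
    "B": {"A": 4, "C": 1},
    "C": {"A": 4, "B": 2},
}
rank = weighted_pagerank(toy)
print(sorted(rank, key=rank.get, reverse=True))  # ['A', 'B', 'C']
```

The scores always sum to 1, so they can be compared directly across seasons with the same number of teams. If NetworkX is available, `networkx.pagerank` on a `DiGraph` with a `weight` edge attribute computes the same kind of weighted PageRank.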
To see whether this measure has predictive power, I used results from the first half of the season to predict the standings at the end of the season. As you can see below, the algorithm does quite well at the top of the table, correctly predicting the positions of the top five teams.
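The post does not say exactly how the half-season split or the accuracy was computed, so the helpers below are a hedged sketch under stated assumptions: "first half" means the first half of the season's matches in chronological order, and accuracy is measured either by exact matches in the top k positions or by the mean absolute difference in table position. All function names and the four-team data are hypothetical.

```python
def first_half(matches):
    """Keep the first half of a season's matches (assumes chronological order)."""
    return matches[: len(matches) // 2]


def ranking_from_scores(scores):
    """Order teams from highest to lowest score, e.g. PageRank values."""
    return sorted(scores, key=scores.get, reverse=True)


def exact_top_k(predicted, actual, k):
    """Count table positions 1..k where the prediction names the right team."""
    return sum(p == a for p, a in zip(predicted[:k], actual[:k]))


def mean_abs_position_error(predicted, actual):
    """Average number of places each team ends up from its predicted position."""
    final_pos = {team: i for i, team in enumerate(actual)}
    total = sum(abs(i - final_pos[t]) for i, t in enumerate(predicted))
    return total / len(predicted)


# Hypothetical first-half scores and a hypothetical final table
predicted = ranking_from_scores({"A": 0.40, "B": 0.35, "C": 0.15, "D": 0.10})
actual = ["A", "B", "D", "C"]
print(len(first_half(list(range(380)))))           # 190
print(exact_top_k(predicted, actual, 2))           # 2
print(mean_abs_position_error(predicted, actual))  # 0.5
```

Exact top-k matches capture the "accurate on the top five" style of result, while the mean position error penalizes misses across the whole table, including the bottom.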
However, it does poorly at predicting the bottom of the table. One reason could be the relegation battle there. Usually the bottom six teams are all trying to escape the relegation zone, and their performance is difficult to predict. These teams often go through changes in players and management, and they also feel pressure from the fans and the media. They may have very erratic results, such as beating the team at the top but losing to teams below them!
Applying the algorithm to previous seasons also shows that its predictions are strongest at the top of the table. Again, I used the results from the first half of each season.
I also checked whether other variables in the dataset could improve predictive power, based on the estimated error. It turned out that the best predictors, in that order, are total goals, total goals and shots on target together, and total shots on target.

I was also curious how the algorithm would do on other leagues, so I applied it to La Liga, the top professional division of the Spanish football league system, for the same 2013-14 season. I sorted the PageRank values for both the Premier League and La Liga and plotted them. At first sight the two curves look similar, but there is a subtle difference: the gap between adjacent red dots (Premier League) is smaller than the gap between adjacent blue dots (La Liga). One way I interpreted this is that the Premier League is more competitive than La Liga. In other words, Premier League teams are so close to each other that it is difficult for a team to improve its ranking. By the way, this is not something new: many soccer pundits argue that the Premier League is the most competitive professional soccer league.

Of course, there are many ways to improve this framework. One could give extra credit to a team that draws against a powerful rival. Another option is to add a damping factor to the model so that recent results are valued most.
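The league comparison above rests on the spacing between sorted PageRank values. A minimal sketch of that measurement, assuming "gap" means the difference between neighbouring values once sorted from high to low (the function names and the two four-team leagues are hypothetical):

```python
def adjacent_gaps(scores):
    """Gaps between neighbouring values after sorting scores high to low."""
    values = sorted(scores.values(), reverse=True)
    return [hi - lo for hi, lo in zip(values, values[1:])]


def mean_gap(scores):
    """Average gap; equals (max - min) / (n - 1) because the sum telescopes."""
    gaps = adjacent_gaps(scores)
    return sum(gaps) / len(gaps)


# Hypothetical PageRank values for two four-team leagues
league_x = {"A": 0.30, "B": 0.28, "C": 0.22, "D": 0.20}
league_y = {"A": 0.45, "B": 0.30, "C": 0.15, "D": 0.10}
print(mean_gap(league_x) < mean_gap(league_y))  # True: league_x is more compressed
```

Because the average gap telescopes to (max - min) / (n - 1), comparing two 20-team leagues this way amounts to comparing the spread between their top and bottom PageRank values; the full list of gaps shows where along the table the leagues differ.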
Final Words ...
Network Science is a beautiful field: recently emerging, thriving, and shedding light on new aspects of our lives. If you want to know more about it, the book Network Science by Albert-László Barabási is a great place to start!