
Uncovering the Hidden Insights: Going Beyond the Surface of Player Statistics
Sep 27, 2024
5 min read

Introduction to the Problem
Soccer has become one of the most expensive sports in the world, and money clearly plays a big role in it. As a soccer enthusiast, I want to explain how the sport operates. Unlike in many other sports, the dream for most players isn't to come to the United States and make it big; instead, they aim to move to Europe, where soccer is extremely popular and the best teams are located. The top five European leagues are the Spanish, English, German, Italian, and French leagues. They are far more competitive than the leagues around them, and they have produced some of the most iconic players in the game.
In 2022 alone, these leagues spent over 4.6 billion euros on players. One of the most expensive transfers ever was Neymar's move from Barcelona to PSG: the French club paid the Spanish club 222 million euros for the Brazilian forward, close to a quarter of a billion euros for a single player. The problem today is that this inflated market lets clubs pay as much as 125 million euros for players who are neither iconic nor well known, with little apparent justification. As a result, the sport increasingly rewards whichever team has the most money. Much of soccer's excitement comes from knowing that your team's opponent will make for a close match. Now, if a club has a lot of money, you can expect it to field many strong players, which makes its games predictable and less enjoyable to watch.
Another thing many people don't understand about soccer is that statistics don't define a player. For example, defenders barely register statistics: they rarely score or assist because their job is to protect the goal. Yet some defenders are worth more than attackers who score all the time, because their defensive skills are so outstanding that you don't need numbers to see how good they are.
Introduction to the Data
The dataset contains information such as each player's height, market value, nationality, club, goals, and assists. This data is central to my research: I aim to categorize players by their value and determine which factors drive their high prices. In particular, I want to understand whether statistics, consistency, or fame makes these players expensive. The dataset is available on Kaggle (see Sources). Based on my own soccer knowledge, most of this information is accurate. Although the data is a little outdated, it is still useful: at the time it was collected, the transfer market had not yet exploded but was already rising sharply.
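Because a classifier predicts categories rather than raw numbers, categorizing players by value means turning each market value into a tier first. The sketch below does that with Python's `bisect` module. The tier boundaries and tier names are hypothetical placeholders, not the cut-offs used in the original notebook.

```python
from bisect import bisect_right

# Hypothetical tier boundaries, in the same currency as the dataset's value column.
TIER_BOUNDS = [10_000_000, 50_000_000, 100_000_000]
TIER_NAMES = ["under 10M", "10M-50M", "50M-100M", "100M+"]


def value_tier(value):
    """Map a market value to a tier name; a value equal to a boundary goes to the higher tier."""
    return TIER_NAMES[bisect_right(TIER_BOUNDS, value)]


for v in [5_000_000, 50_000_000, 180_000_000]:
    print(v, "->", value_tier(v))
```

`bisect_right` returns how many boundaries are less than or equal to the value, which is exactly the index of the matching tier name.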
Pre-processing
I started by examining the data, but the sheer number of players made it overwhelming. I identified the most important fields as player name, nationality, position, club, age, value, height, and goals. However, this was still quite broad, since every player has these attributes. To narrow things down, I filtered out players valued below 50,000,000 to focus on those with the most impact. I then sorted the list from most to least expensive, and, unsurprisingly, Lionel Messi came out on top (in my opinion, he is the best player in the world). Next, I created a visualization of the top 10 players on the list, showing their goals and their value. To my surprise, some players had relatively few goals but still ranked very high.
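The column selection, the 50,000,000 cut-off, and the most-to-least-expensive ordering can be sketched in plain Python with the standard `csv` module. The column names and sample rows below are hypothetical stand-ins: the real Kaggle file may use different headers, and the original notebook may well have used pandas instead.

```python
import csv
import io

# Hypothetical CSV excerpt; the real Kaggle column names and values may differ.
SAMPLE_CSV = """name,nationality,position,club,age,value,height,goals,assists
Player A,Country X,Forward,Club 1,30,120000000,170,30,12
Player B,Country Y,Defender,Club 2,27,60000000,188,2,1
Player C,Country Z,Midfielder,Club 3,24,35000000,175,8,9
Player D,Country X,Forward,Club 4,26,90000000,182,10,4
"""

KEEP = ["name", "nationality", "position", "club", "age", "value", "height", "goals"]
NUMERIC = {"age", "value", "height", "goals"}
MIN_VALUE = 50_000_000  # players valued below this are dropped


def load_players(text):
    """Keep only the columns of interest and convert numeric fields to int."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        rows.append({k: int(raw[k]) if k in NUMERIC else raw[k] for k in KEEP})
    return rows


def top_players(rows, min_value=MIN_VALUE, n=10):
    """Filter out players below min_value, then sort from most to least expensive."""
    kept = [r for r in rows if r["value"] >= min_value]
    return sorted(kept, key=lambda r: r["value"], reverse=True)[:n]


for p in top_players(load_players(SAMPLE_CSV)):
    print(p["name"], p["value"], p["goals"])
```

Filtering before sorting keeps the sort small, and slicing with `[:n]` gives the top 10 used for the chart.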


In this picture, we can see the code I used to chart each player's value in millions alongside the goals they scored in a single season. It covers only the top 10 players of that season.
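The chart puts two very different scales side by side (values in the tens of millions, goals in single or double digits), so values are divided by one million first. The sketch below prepares that data and flags players whose goal tally is below half the group's median, one simple and admittedly arbitrary way to surface the "expensive but few goals" players. The sample rows and the half-median rule are hypothetical, and the plotting step itself is left out so the sketch runs with the standard library only.

```python
from statistics import median

# Hypothetical top-10 rows: (name, value in the dataset's currency, goals in one season).
top10 = [
    ("Player A", 180_000_000, 35), ("Player B", 150_000_000, 28),
    ("Player C", 140_000_000, 4), ("Player D", 120_000_000, 22),
    ("Player E", 110_000_000, 3), ("Player F", 100_000_000, 18),
    ("Player G", 95_000_000, 12), ("Player H", 90_000_000, 2),
    ("Player I", 85_000_000, 15), ("Player J", 80_000_000, 9),
]


def chart_rows(rows):
    """Convert values to millions and flag players scoring under half the median goals."""
    goal_median = median(g for _, _, g in rows)
    return [(name, value / 1_000_000, goals, goals < 0.5 * goal_median)
            for name, value, goals in rows]


for name, millions, goals, few_goals in chart_rows(top10):
    marker = "  <- few goals for the price" if few_goals else ""
    print(f"{name}: {millions:.0f}M, {goals} goals{marker}")
```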

I then used a scatterplot to explore the data further and see whether height matters in soccer as much as it does in other sports. I had always believed that height affects performance in every sport, but soccer turned out to be different. In this data, the tallest players were not among the most valuable, and their value did not rise with their height. Shorter players, on the other hand, stood out, with values 5 to 6 times higher than those of taller players.
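A scatterplot shows the height/value relationship visually; a Pearson correlation coefficient summarizes it in one number (near 1 or -1 for a strong linear relationship, near 0 for none, negative when taller players tend to be valued less). The sketch below computes it by hand and also compares the average value of shorter and taller players. The heights, values, and the 180 cm cut-off are hypothetical, so the printed numbers say nothing about the real dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_value_by_height(heights, values, cutoff_cm=180):
    """Average value of players below the cut-off versus at or above it."""
    short = [v for h, v in zip(heights, values) if h < cutoff_cm]
    tall = [v for h, v in zip(heights, values) if h >= cutoff_cm]
    return sum(short) / len(short), sum(tall) / len(tall)


# Hypothetical sample: heights in cm, values in millions.
heights = [169, 172, 175, 178, 183, 188, 192, 196]
values = [150, 120, 110, 90, 40, 30, 25, 20]

print("correlation:", round(pearson(heights, values), 2))
short_avg, tall_avg = mean_value_by_height(heights, values)
print("short/tall average value ratio:", round(short_avg / tall_avg, 1))
```

The ratio of group averages is the kind of figure behind a statement like "5 to 6 times higher", while the correlation shows whether the trend holds across the whole range rather than just at the extremes.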

Evaluation
To test how well player value can be predicted from the data, I used a Random Forest Classifier. Trained to classify players by value, it reached an accuracy of 92.5%, which is a very good result. One caveat: the original data contained an overwhelming number of players, so I condensed it to a smaller set that was easier for the Random Forest to handle. I chose a Random Forest because it performs well on large numerical datasets. Since the dataset covers a single season and is no longer being updated, the model's main weakness (see below) does not affect this project.
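To make the method concrete, here is a minimal from-scratch Random Forest in plain Python. Each tree is trained on a bootstrap sample (rows drawn with replacement), each split considers only a random subset of features (the square root of the feature count), and the forest predicts by majority vote; accuracy is the share of correct predictions. This is an illustrative re-implementation, not the code from the original notebook (a real project would more likely use a library such as scikit-learn's `RandomForestClassifier`). The features, the two value classes ("50M+" and "under 50M"), and all rows are made up.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity: 0 for a pure node, higher when classes are mixed."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return (weighted gini, feature index, threshold) of the best split, or None."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # midpoint between consecutive distinct values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, depth, max_depth, n_feats, rng):
    """Grow a tree; leaves are labels, inner nodes are (feature, threshold, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    features = rng.sample(range(len(X[0])), n_feats)  # random feature subset per split
    split = best_split(X, y, features)
    if split is None:
        return majority
    _, f, t = split
    go_left = [row[f] <= t for row in X]
    Xl = [r for r, g in zip(X, go_left) if g]
    yl = [lab for lab, g in zip(y, go_left) if g]
    Xr = [r for r, g in zip(X, go_left) if not g]
    yr = [lab for lab, g in zip(y, go_left) if not g]
    return (f, t, build_tree(Xl, yl, depth + 1, max_depth, n_feats, rng),
            build_tree(Xr, yr, depth + 1, max_depth, n_feats, rng))


def predict_tree(node, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=4, seed=0):
    rng = random.Random(seed)
    n_feats = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_feats, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


def accuracy(forest, X, y):
    """Share of rows whose predicted class matches the true class."""
    return sum(predict_forest(forest, r) == lab for r, lab in zip(X, y)) / len(y)


# Hypothetical rows: [goals, assists, age]; labels are made-up value classes.
X_train = [[30, 12, 27], [25, 10, 29], [2, 1, 26], [20, 15, 24], [1, 0, 33],
           [5, 3, 31], [28, 9, 25], [3, 2, 34], [22, 11, 23], [4, 1, 32]]
y_train = ["50M+", "50M+", "under 50M", "50M+", "under 50M",
           "under 50M", "50M+", "under 50M", "50M+", "under 50M"]
X_test = [[24, 8, 26], [2, 2, 30]]
y_test = ["50M+", "under 50M"]

forest = fit_forest(X_train, y_train)
print("test accuracy:", accuracy(forest, X_test, y_test))
```

Averaging many trees that each saw slightly different rows and features is what makes the forest more stable than a single decision tree.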
Advantages and Disadvantages of the Random Forest Model
PROS: Handles large quantities of data well and works with both categorical and numerical features.
CONS: It cannot be updated incrementally. If the data changes frequently, the model has to be retrained each time, which makes it a poor fit for constantly updated data.
What I learned
The project examines the factors that affect a player's value. I used to think height was a key factor in soccer, yet many top players, like Messi, who earn high wages, are not particularly tall. This suggests that soccer is open to players of all builds and that success is attainable for those who work hard. I also explored why some players are valued more than others despite having few goals or assists. In soccer, statistics matter less than consistent performance and overall contribution to the team. Players often make runs or moves without the ball that disrupt the defense, which requires excellent communication within the team. Although the player creating that distraction may not get statistical credit such as a goal or an assist, they play a crucial role in the team's success. Ultimately, players' pay is based on their performance and consistency, which explains why some defenders command such high values despite modest statistics.
Impact
This project highlights how machine learning can help estimate player value, which is critical for teams looking to acquire specific players. It is equally important for players seeking lucrative deals to understand what influences their earnings, such as scoring or assisting. The key factors, however, are consistency and media appeal: a player with a large fan base is likely to be kept by the club because of the entertainment and engagement they bring to fans.
Conclusion
Overall, my project applied machine learning, specifically a classification algorithm, to predict player values. While I had hoped for a perfect result, a 94% success rate is a solid outcome. Given how diverse soccer players' backgrounds are, I think this rate is quite impressive, and I am pleased with how well my classification algorithm performed.
Sources:
https://www.kaggle.com/datasets/kriegsmaschine/soccer-players-values-and-their-statistics
My Code: https://www.mediafire.com/file/4dn1n8jipvgpybp/Soccer_data.ipynb/file