This is my first post on Medium and I am very excited to share it with y’all.
I will explain, as simply as possible, how you can use unsupervised machine learning tools, namely community detection and network analysis algorithms such as Louvain and Girvan-Newman, in Python and R to decide which stocks to keep in your portfolio and possibly make money off the stock market (even in today’s fundamentally flawed rallies fuelled by retail investors).
The process is broken down into three parts to make it easier to follow and to keep the code segmented.
You can find the whole project and dataset here: https://www.kaggle.com/akiboy96/spotify-song-popularity-genre-prediction
The dataset we will explore, analyze and model is a Spotify dataset containing song information across the decades. For each song it includes the track name, artist name, danceability, key, acousticness, speech, tempo, liveness, valence, popularity and decade, along with other features that can help us determine whether a song can be classified as a hit.
Music is a very subjective area, and people have widely different preferences in what they listen to. However, if a large number of people like a song, it is generally considered a hit: it has mass appeal, gets played often, and is therefore popular. …
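To make "hit" concrete for modelling, one common approach is to turn popularity into a binary label. The sketch below assumes popularity is on Spotify's 0 to 100 scale and uses a cutoff of 70; both the cutoff and the sample rows are hypothetical and are not taken from the post or the dataset.

```python
# Hypothetical cutoff: an assumption for illustration, not from the dataset
HIT_THRESHOLD = 70

# Hypothetical rows; field names mirror columns described above, values are invented
songs = [
    {"track": "Song A", "popularity": 85, "danceability": 0.72},
    {"track": "Song B", "popularity": 40, "danceability": 0.55},
    {"track": "Song C", "popularity": 70, "danceability": 0.64},
]

def label_hits(rows, threshold=HIT_THRESHOLD):
    """Return a copy of each row with a binary 'hit' label (1 if popularity >= threshold)."""
    return [{**row, "hit": int(row["popularity"] >= threshold)} for row in rows]

labeled = label_hits(songs)
```

Moving the cutoff trades off how many songs count as hits against how "hit-like" each one is, so it is worth checking the class balance before training a classifier.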
We finally get to portfolio building, where we try to beat SPY’s average year-over-year (YoY) return. The whole code base can be found here:
To build the portfolio from the correlations between stock prices in the S&P 500, we will use network centrality to measure how well integrated each stock is with the others.
The diagram below illustrates the process.
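As a minimal sketch of the idea, the code below connects two stocks whenever their correlation is at or above a threshold and then scores each stock by degree centrality (the share of the other stocks it is connected to). The tickers, correlation values and the 0.6 threshold are hypothetical, and degree centrality is only one of several centrality measures; the post does not say which one it uses.

```python
from itertools import combinations

def build_graph(corr, threshold):
    """Adjacency sets: link two stocks when their correlation is >= threshold."""
    adj = {ticker: set() for ticker in corr}
    for a, b in combinations(corr, 2):
        if corr[a][b] >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}

# Hypothetical symmetric correlation matrix with made-up tickers
corr = {
    "AAA": {"AAA": 1.0, "BBB": 0.8, "CCC": 0.7, "DDD": 0.2},
    "BBB": {"AAA": 0.8, "BBB": 1.0, "CCC": 0.6, "DDD": 0.1},
    "CCC": {"AAA": 0.7, "BBB": 0.6, "CCC": 1.0, "DDD": 0.65},
    "DDD": {"AAA": 0.2, "BBB": 0.1, "CCC": 0.65, "DDD": 1.0},
}
graph = build_graph(corr, threshold=0.6)
centrality = degree_centrality(graph)
```

Here CCC is linked to every other stock, so it is the most central; DDD hangs off the network through a single edge.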
This is a continuation of the previous post, which covered data scraping and cleaning. We now move on to modelling the network and forming communities with clustering algorithms; these communities will eventually be used to build a portfolio that could beat SPY’s average YoY return of 8.93%.
This end-to-end solution architecture shows how stock information is transformed into a network in which communities form around stocks whose prices move together over time.
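Because the benchmark is an average YoY return, it helps to be explicit about the arithmetic. This sketch computes simple year-over-year returns from year-end closing prices and averages them arithmetically. The prices are invented, and it is an assumption that the 8.93% figure was computed the same way; a geometric average (CAGR) of the same returns would give a slightly different number.

```python
def yoy_returns(closes):
    """Simple year-over-year returns from a list of year-end closing prices."""
    return [(curr / prev) - 1.0 for prev, curr in zip(closes, closes[1:])]

def average_yoy_return(closes):
    """Arithmetic mean of the year-over-year returns."""
    returns = yoy_returns(closes)
    return sum(returns) / len(returns)

# Hypothetical year-end values (not real data)
portfolio_closes = [100.0, 110.0, 121.0, 127.05]  # +10%, +10%, +5%
spy_closes = [100.0, 108.0, 113.4, 124.74]        # +8%, +5%, +10%

beats_benchmark = average_yoy_return(portfolio_closes) > average_yoy_return(spy_closes)
```

With these made-up numbers the portfolio averages about 8.33% a year against about 7.67% for SPY.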
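Both Louvain and Girvan-Newman revolve around modularity: Louvain greedily moves nodes between communities to increase it, and Girvan-Newman repeatedly removes the edge with the highest betweenness, with modularity commonly used to pick the best split. The sketch below only scores a given partition of an unweighted graph. It is not an implementation of either algorithm, and the toy graph (two triangles joined by one edge) is hypothetical.

```python
def modularity(edges, communities):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = (1 / 2m) * sum_ij [A_ij - k_i * k_j / (2m)] * delta(c_i, c_j),
    evaluated per community as sum_c [L_c / m - (d_c / (2m)) ** 2],
    where L_c counts edges inside c and d_c is the total degree of c.
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Map each node to the index of its community
    label = {node: idx for idx, group in enumerate(communities) for node in group}
    q = 0.0
    for idx, group in enumerate(communities):
        internal = sum(1 for u, v in edges if label[u] == idx and label[v] == idx)
        total_degree = sum(degree.get(node, 0) for node in group)
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q

# Hypothetical graph: two triangles joined by the single edge C-D
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")]
q_split = modularity(edges, [{"A", "B", "C"}, {"D", "E", "F"}])
q_all = modularity(edges, [{"A", "B", "C", "D", "E", "F"}])
```

Splitting the two triangles apart scores 5/14 (about 0.357), while lumping every node into one community scores 0, which is why a modularity-maximising method prefers the split.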
The next step is to build the relationship between the stocks by calculating the correlation coefficient as shown below.
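For reference, the Pearson correlation coefficient between two series is their covariance divided by the product of their standard deviations. The sketch below computes it on daily percentage returns rather than raw prices; that choice is an assumption (raw price levels that share a trend tend to look highly correlated), and the helper names and prices are hypothetical. With pandas, `DataFrame.corr()` returns the same pairwise Pearson matrix for all columns at once.

```python
import math

def pct_returns(prices):
    """Simple period-over-period returns from a list of prices."""
    return [(curr / prev) - 1.0 for prev, curr in zip(prices, prices[1:])]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)

# Hypothetical closing prices; stock_b is exactly twice stock_a
stock_a = [10.0, 11.0, 12.1, 11.5, 12.0]
stock_b = [20.0, 22.0, 24.2, 23.0, 24.0]
rho = pearson(pct_returns(stock_a), pct_returns(stock_b))
```

Since stock_b moves in exact proportion to stock_a, their returns are identical and the correlation is 1 (up to floating-point rounding).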
Time Series Cross-Correlation Calculation:
We will create a function to calculate the cross-correlation of stocks over different time windows. …
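One way to read "over different time windows" is as a rolling correlation: for each window length, slide a trailing window along the two return series and compute the correlation inside it. The stdlib-only sketch below works under that assumption. The function names, window sizes and returns are hypothetical, and a lagged cross-correlation (shifting one series against the other) would be a different but equally valid interpretation. In pandas, `Series.rolling(window).corr(other)` does the same job.

```python
import math

def pearson(x, y):
    """Pearson correlation; returns NaN when either series has zero variance."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return float("nan")
    return cov / (sd_x * sd_y)

def rolling_correlation(x, y, window):
    """Correlation over each trailing window of `window` consecutive observations."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    if window < 2 or window > len(x):
        raise ValueError("window must be between 2 and the series length")
    return [pearson(x[end - window:end], y[end - window:end])
            for end in range(window, len(x) + 1)]

def correlations_by_window(x, y, windows):
    """Map each window length to its list of rolling correlations."""
    return {w: rolling_correlation(x, y, w) for w in windows}

# Hypothetical daily returns for two stocks
returns_a = [0.010, -0.020, 0.015, 0.005, -0.010, 0.020]
returns_b = [0.012, -0.018, 0.010, 0.007, -0.012, 0.018]
result = correlations_by_window(returns_a, returns_b, windows=[3, 5])
```

Shorter windows react faster to changing relationships but are noisier; comparing several window lengths shows how stable a pair's correlation really is.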