Hello Readers,

This is my first post on Medium and I am very excited to share it with y’all.

I will explain, in the simplest way I can, how you can use unsupervised machine learning tools in Python and R, namely community detection and network analysis algorithms such as Louvain and Girvan-Newman, to decide which stocks to keep in your portfolio and possibly make money off the stock market (even in today’s rallies fuelled by retail investors and weak fundamentals).
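To make the Girvan-Newman idea more concrete, here is a minimal, dependency-free Python sketch of a single Girvan-Newman split. It repeatedly removes the edge with the highest edge betweenness until the graph breaks into one more connected component. The helper names (`edge_betweenness`, `girvan_newman_split`) and the toy graph are hypothetical illustrations, not the code from this series. Libraries such as NetworkX ship ready-made implementations, and Louvain (which optimises modularity instead) is not shown here.

```python
from collections import deque

# Hypothetical representation: an undirected graph as {node: set(neighbours)}.
Graph = dict[str, set[str]]


def edge_betweenness(adj: Graph) -> dict[tuple[str, str], float]:
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    eb = {(u, v): 0.0 for u in adj for v in adj[u] if u < v}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, distributing path credit to edges.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                credit = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[(min(v, w), max(v, w))] += credit
                delta[v] += credit
    # Every unordered pair of nodes was counted once from each end.
    return {edge: score / 2 for edge, score in eb.items()}


def connected_components(adj: Graph) -> list[set[str]]:
    """Plain BFS over every not-yet-seen node."""
    seen: set[str] = set()
    components = []
    for start in sorted(adj):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            v = queue.popleft()
            component.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(component)
    return components


def girvan_newman_split(adj: Graph) -> list[set[str]]:
    """Remove highest-betweenness edges until one extra component appears."""
    g = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    target = len(connected_components(g)) + 1
    while len(connected_components(g)) < target:
        eb = edge_betweenness(g)
        if not eb:  # no edges left to remove
            break
        u, v = max(eb, key=eb.get)
        g[u].discard(v)
        g[v].discard(u)
    return connected_components(g)
```

On two triangles joined by a single bridge edge, the bridge lies on every shortest path between the triangles, so it has the highest betweenness and is removed first. That leaves the two triangles as separate communities. Calling `girvan_newman_split` again on each piece gives the deeper levels of the hierarchy.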

The whole process will be broken down into three different parts for easier understanding and code segmentation.

Part 1 — Data…

You can find the whole project and dataset here: https://www.kaggle.com/akiboy96/spotify-song-popularity-genre-prediction

correlation plot with distributions

The dataset we will explore, analyze and model is the Spotify dataset, which contains song information across the decades. For each song it includes attributes such as track name, artist name, danceability, key, acousticness, speech, tempo, liveness, valence, popularity and decade, along with other features that help us determine whether a song can be classified as a hit.

Music is a very subjective area, and people have very different preferences in the type…

We finally get to portfolio building, where we try to beat SPY’s average year-over-year (YoY) returns. The whole code base can be found here:

To build the portfolio from the correlations between stock prices in the S&P 500, we will use network centrality to measure how well integrated each stock is with the others.
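As a rough illustration of network centrality on a correlation network, the sketch below keeps an edge between two stocks when their price correlation is at or above a threshold, then ranks stocks by normalised degree centrality. The tickers, correlation values, the 0.5 threshold and the choice of degree centrality (rather than, say, eigenvector or betweenness centrality) are all assumptions for illustration. The excerpt does not say which centrality measure or cut-off the series uses.

```python
# Hypothetical pairwise correlations of price movements (not real market data).
CORR = {
    ("AAA", "BBB"): 0.80,
    ("AAA", "CCC"): 0.70,
    ("BBB", "CCC"): 0.60,
    ("CCC", "DDD"): 0.55,
    ("AAA", "DDD"): 0.10,
    ("BBB", "DDD"): 0.20,
}


def build_graph(corr, threshold=0.5):
    """Undirected graph {ticker: set(neighbours)}; edge iff correlation >= threshold."""
    adj = {}
    for (a, b), rho in corr.items():
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if rho >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degree_centrality(adj):
    """Fraction of the other stocks each stock is directly connected to."""
    n = len(adj)
    return {v: len(nbrs) / max(n - 1, 1) for v, nbrs in adj.items()}


def rank_by_centrality(corr, threshold=0.5):
    """Return tickers ordered from most to least central, plus the raw scores."""
    centrality = degree_centrality(build_graph(corr, threshold))
    # Ties are broken alphabetically so the order is stable.
    ranking = sorted(centrality, key=lambda v: (-centrality[v], v))
    return ranking, centrality


ranking, scores = rank_by_centrality(CORR)
print(ranking)  # → ['CCC', 'AAA', 'BBB', 'DDD']
```

Here `CCC` is linked to every other stock and `DDD` only to `CCC`. Whether a portfolio should favour the most central stocks or the peripheral ones is a modelling decision this excerpt does not settle.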

The diagram below illustrates the process.

This is a continuation of the previous blog, which showcased data scraping and cleaning. We now move on to modelling our network and forming communities with clustering algorithms; these communities will eventually be used to create a portfolio that could beat SPY’s average YoY return of 8.93%.
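For reference, an "average YoY return" like the 8.93% SPY figure can be computed as the arithmetic mean of year-over-year returns. The minimal sketch below assumes one price or portfolio value per year (for example, year-end closes). The portfolio values are hypothetical, and the arithmetic mean is itself an assumption: a compound annual growth rate (CAGR) would give a slightly different number.

```python
SPY_AVG_YOY = 0.0893  # benchmark figure quoted in this post


def yoy_returns(yearly_prices):
    """Year-over-year simple returns: p_t / p_{t-1} - 1."""
    return [curr / prev - 1.0 for prev, curr in zip(yearly_prices, yearly_prices[1:])]


def average_yoy_return(yearly_prices):
    """Arithmetic mean of the year-over-year returns."""
    returns = yoy_returns(yearly_prices)
    if not returns:
        raise ValueError("need at least two yearly prices")
    return sum(returns) / len(returns)


def beats_benchmark(yearly_prices, benchmark=SPY_AVG_YOY):
    """True if the average YoY return is strictly above the benchmark."""
    return average_yoy_return(yearly_prices) > benchmark


# Hypothetical year-end portfolio values, not real results.
portfolio = [100.0, 112.0, 118.0, 131.0]
print(f"{average_yoy_return(portfolio):.2%}", beats_benchmark(portfolio))
```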

Part 2: Modelling

This end-to-end solution architecture shows how stock information will be transformed into a network in which communities are built from stocks whose prices move together over time.

The next step is to build the relationship between the stocks by calculating the correlation coefficient as shown below.

Time Series Cross-Correlation Calculation:
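The calculation itself is not included in this excerpt. As a hedged sketch, assuming the correlation is computed on daily returns rather than raw prices (raw price series tend to trend, which can make unrelated stocks look correlated), a Pearson cross-correlation with an optional lag can be written in plain Python. The helper names and the lag convention (`x[t]` paired with `y[t + lag]`) are illustrative assumptions.

```python
from math import sqrt


def pct_returns(prices):
    """Simple period returns: r_t = p_t / p_{t-1} - 1."""
    return [b / a - 1.0 for a, b in zip(prices, prices[1:])]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / sqrt(vx * vy)


def cross_correlation(x, y, lag=0):
    """Correlate x[t] with y[t + lag]; lag=0 is the ordinary Pearson correlation."""
    if lag > 0:
        return pearson(x[:-lag], y[lag:])
    if lag < 0:
        return pearson(x[-lag:], y[:lag])
    return pearson(x, y)


# Hypothetical closing prices for two stocks.
stock_a = pct_returns([100.0, 101.0, 99.5, 102.0, 103.5, 103.0])
stock_b = pct_returns([50.0, 50.4, 49.9, 51.2, 51.9, 51.5])
print(round(cross_correlation(stock_a, stock_b), 3))
```

Running `cross_correlation` over every pair of tickers (and, if wanted, a small range of lags) gives the pairwise weights from which the stock network can be built.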


Aakash Kedia

CS, music and football in no particular order
