TL;DR: Know what problem you want to solve first. Then (O)btain the necessary data, (S)crub it into an analysable format, (E)xplore it to understand its behaviour, build a (M)odel from it, and i(N)terpret the result so that it answers the problem you defined in the first place.
Let's speedrun this!
OSEMN is a 5-stage, iterative framework for how data scientists work through a problem. The letters stand for (O)btain, (S)crub, (E)xplore, (M)odel, and i(N)terpret.
Clean the data of any typos and convert everything to the format that suits the model best, which is numerical.
Understand how the variables in the data correlate with one another, so that you can decide which features to feed into the model.
Feed the features you've chosen into the model, and tune it to find the best parameters. Careful not to overfit!
Voila, you've got your result from the model. Time to interpret it in non-technical terms so that you can present it to your boss.
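To make the Scrub and Explore stages a bit more concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the tiny inline CSV is made up (it is not the public dataset), and the column names `height_cm` and `weight_kg` and the helpers `scrub` and `pearson` are hypothetical names I picked.

```python
import csv
import io
import math

# Made-up sample standing in for a downloaded CSV (assumption, not real data).
RAW_CSV = """gender,height_cm,weight_kg
male,175,72
female,160,55
male,180,
female,abc,60
male,168,65
female,155,50
"""


def scrub(rows):
    """Scrub: keep only rows whose height and weight both parse as numbers."""
    clean_rows = []
    for row in rows:
        try:
            clean_rows.append((float(row["height_cm"]), float(row["weight_kg"])))
        except (TypeError, ValueError):
            continue  # empty or non-numeric value, so drop the row
    return clean_rows


def pearson(xs, ys):
    """Explore: Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


clean = scrub(csv.DictReader(io.StringIO(RAW_CSV)))
heights = [h for h, _ in clean]
weights = [w for _, w in clean]
r = pearson(heights, weights)

print(f"rows kept after scrubbing: {len(clean)}")
print(f"height/weight correlation: {r:.3f}")
```

A correlation close to 1 suggests height is a reasonable feature for predicting weight. In a real project you would probably reach for a data library instead of hand-rolling this, but the idea is the same.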
Since I love you all and you made it this far, I've created a Notebook that shows how we can implement the OSEMN framework in real life. You can see it Here
Problem : Given a person's height, what is that person's weight most likely to be?
Solution : Build a simple Machine Learning Linear Regression model that predicts a person's weight from their height
Obtain : Use public data on random male & female weights & heights, in csv (csv is good for u kids)
Scrub : Make sure there are no empty values & all weights and heights are numerical values
Explore : See what the correlation between weight & height is
Model : Use Linear Regression to train the model on the height data so that it learns to predict the weight
iNterpret : Get the linear regression's general solution (intercept & coefficient) for predicting a person's weight, given their height
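If you want a feel for the Model and iNterpret steps without opening the Notebook, here is a rough sketch. It is not the Notebook's code: it fits the line with the ordinary least squares formulas in plain Python instead of a Machine Learning library, the five height/weight pairs are made up, and `fit_line` and `predict_weight` are hypothetical helper names.

```python
def fit_line(heights, weights):
    """Ordinary least squares for one feature: returns (intercept, coefficient)."""
    n = len(heights)
    mean_h = sum(heights) / n
    mean_w = sum(weights) / n
    sxy = sum((h - mean_h) * (w - mean_w) for h, w in zip(heights, weights))
    sxx = sum((h - mean_h) ** 2 for h in heights)
    coefficient = sxy / sxx
    intercept = mean_w - coefficient * mean_h
    return intercept, coefficient


def predict_weight(height, intercept, coefficient):
    """Apply the fitted line: weight = intercept + coefficient * height."""
    return intercept + coefficient * height


# Made-up training data (assumption): heights in cm, weights in kg.
train_heights = [150.0, 160.0, 170.0, 180.0, 190.0]
train_weights = [52.0, 57.0, 66.0, 75.0, 80.0]

intercept, coefficient = fit_line(train_heights, train_weights)
predicted = predict_weight(175.0, intercept, coefficient)

print(f"weight = {intercept:.2f} + {coefficient:.2f} * height")
print(f"predicted weight at 175 cm: {predicted:.1f} kg")
```

The iNterpret part is reading the two numbers in non-tech terms: the coefficient says how many extra kg the model expects for each extra cm of height, and the intercept is just where the line crosses zero height, so it has no real-world meaning on its own.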
Thanks for reading, and don't message me if you're lost (lmao jk, you can add me on LinkedIn and ask me there!)