Sunday, May 10, 2020

Answer for Customer Analytics: Any rule of thumb to ignore some features based on correlation?

https://365datascience.com/dwqa-answer/answer-for-customer-analytics-any-rule-of-thumb-to-ignore-some-features-based-on-correlation/

Hi MinliYu, 

thanks for reaching out! 

You’re absolutely right: when two features are highly correlated, they are likely to contain very similar information. In such cases we might want to remove one of them, but the question then becomes which one to keep. If we have prior knowledge of the dataset, we can decide that one of the features makes more sense for our model and leave the other one out. Otherwise, to avoid the correlation issue, we can rely on the following method:

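Using dimensionality reduction such as PCA helps us avoid collinearity, as the principal components are orthogonal to each other and therefore uncorrelated. Note that PCA does not pick a subset of the original features: each component is a linear combination of all of them. Keeping the first few components retains the directions with the most variance, so we preserve most of the information in the data without having to decide by hand which of two correlated features to drop. Keep in mind that high variance does not guarantee relevance to what we are trying to predict, and that the features should usually be standardized first, since PCA is sensitive to scale.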
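
To make this concrete, here is a minimal sketch using NumPy and scikit-learn. Everything in it is an assumption made for illustration: the synthetic data, the feature names (income, spending, age), the helper `correlated_pairs`, and the 0.8 threshold, which is an arbitrary cut-off rather than an established rule. It first flags feature pairs whose absolute correlation exceeds the threshold, then standardizes the data and applies PCA, and finally checks that the resulting components are uncorrelated.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def correlated_pairs(X, names, threshold=0.8):
    """Return (name_a, name_b, r) for every feature pair with |r| > threshold.

    Hypothetical helper; the default threshold is an arbitrary choice.
    """
    corr = np.corrcoef(X, rowvar=False)  # columns are features
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) > threshold:
                pairs.append((names[i], names[j], float(corr[i, j])))
    return pairs


# Synthetic data: spending is built from income, so the two are highly correlated.
rng = np.random.default_rng(0)
n = 500
income = rng.normal(50_000, 12_000, n)
spending = 0.6 * income + rng.normal(0, 2_000, n)
age = rng.normal(40, 10, n)

names = ["income", "spending", "age"]
X = np.column_stack([income, spending, age])

# Step 1: inspect which pairs are highly correlated.
pairs = correlated_pairs(X, names, threshold=0.8)
print("Highly correlated pairs:", pairs)

# Step 2: standardize, then project onto the first two principal components.
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
components = pca.fit_transform(X_scaled)

# Step 3: the components are uncorrelated (off-diagonal entries are ~0).
component_corr = np.corrcoef(components, rowvar=False)
print("Explained variance ratio:", pca.explained_variance_ratio_)
print("Correlation between components:", component_corr[0, 1])
```

If prior knowledge tells us which feature of a flagged pair matters more, we can simply drop the other one and skip PCA. The trade-off with PCA is interpretability: the components no longer correspond to single original features.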

 

Best, 

Eli

 



