Kaggle — Personality Prediction (Rank 21)
ML competition predicting introvert vs extrovert from 7 behavioural features. Ranked 21st out of 4,329 teams (top 0.5%) with 97.27% cross-validation accuracy.
Rank 21
out of 4,329 Teams
Top 0.5%
Global Percentile
97.27%
CV Accuracy
7
Behavioural Features
The Problem
Kaggle Playground Series S5E7 challenge: predict personality type (Introvert/Extrovert) from anonymised behavioural survey features. The key challenges were handling missing values, extracting maximum signal from only 7 features, and avoiding leaderboard overfitting on a dataset with near-perfect class separability.
The Solution
Merged two external personality datasets with the competition training data to expand the training signal. Applied median imputation to the numeric features (Time_spent_Alone, Social_event_attendance, Going_outside, Friends_circle_size, Post_frequency) and most-frequent imputation to the categorical features (Stage_fear, Drained_after_socializing). One-hot encoded the categoricals and trained a GradientBoostingClassifier (sklearn), which reached 97.27% cross-validation accuracy. The external dataset merge, matched on all 7 feature columns, was the key technique that unlocked near-perfect accuracy.
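The preprocessing and model steps described above can be sketched roughly as follows. This is a minimal illustration, not the competition code: the toy data is synthetic, hyperparameters are scikit-learn defaults, and the 5-fold split is an assumption, since the write-up does not state the CV scheme.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

NUMERIC = ["Time_spent_Alone", "Social_event_attendance", "Going_outside",
           "Friends_circle_size", "Post_frequency"]
CATEGORICAL = ["Stage_fear", "Drained_after_socializing"]

# Median imputation for numerics; most-frequent imputation plus one-hot
# encoding for categoricals, as described in the text.
preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), NUMERIC),
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), CATEGORICAL),
])
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", GradientBoostingClassifier(random_state=0)),  # default settings (assumption)
])

# Synthetic stand-in for the competition data (purely illustrative).
rng = np.random.default_rng(0)
n = 200
introvert = rng.random(n) < 0.5
toy = pd.DataFrame({
    "Time_spent_Alone": np.where(introvert, rng.integers(6, 12, n), rng.integers(0, 5, n)).astype(float),
    "Social_event_attendance": np.where(introvert, rng.integers(0, 4, n), rng.integers(4, 11, n)).astype(float),
    "Going_outside": rng.integers(0, 8, n).astype(float),
    "Friends_circle_size": rng.integers(0, 16, n).astype(float),
    "Post_frequency": rng.integers(0, 11, n).astype(float),
    "Stage_fear": pd.Series(np.where(introvert, "Yes", "No"), dtype=object),
    "Drained_after_socializing": pd.Series(np.where(introvert, "Yes", "No"), dtype=object),
})
# Blank out roughly 10% of cells so the imputers have something to do.
toy = toy.mask(rng.random(toy.shape) < 0.1)
y = np.where(introvert, "Introvert", "Extrovert")

# 5-fold cross-validated accuracy (fold count is an assumption).
scores = cross_val_score(pipeline, toy, y, cv=5, scoring="accuracy")
print(f"mean CV accuracy on toy data: {scores.mean():.3f}")
```

Keeping the imputers and encoder inside the pipeline means they are refit on each training fold, so the cross-validation score is not inflated by statistics leaked from the held-out fold.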
Results & Metrics
- Ranked 21st globally out of 4,329 competing teams — top 0.5% worldwide
- 97.27% cross-validation accuracy on the training set
- Key technique: merged 2 external personality datasets matched on all 7 behavioural features
- 7 features used: Time_spent_Alone, Stage_fear, Social_event_attendance, Going_outside, Drained_after_socializing, Friends_circle_size, Post_frequency
- One of 10+ Kaggle competitions completed by Kumar Katariya
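One possible reading of the external-dataset merge listed above is a left join on the 7 feature columns, sketched here with pandas. The write-up does not show the actual merge, so treat this as an assumption: the helper `attach_external_label`, the column name `external_label`, the label column `Personality`, and the toy rows are all hypothetical.

```python
import pandas as pd

FEATURES = ["Time_spent_Alone", "Stage_fear", "Social_event_attendance",
            "Going_outside", "Drained_after_socializing",
            "Friends_circle_size", "Post_frequency"]


def attach_external_label(competition, external, label_col="Personality"):
    """Left-join the external label onto rows that match on all 7 features."""
    ext = (
        external[FEATURES + [label_col]]
        .drop_duplicates(subset=FEATURES)  # one label per feature combination
        .rename(columns={label_col: "external_label"})
    )
    # how="left" keeps every competition row; unmatched rows get NaN.
    return competition.merge(ext, on=FEATURES, how="left")


# Toy rows, invented for illustration only.
competition = pd.DataFrame(
    [(4.0, "No", 4.0, 6.0, "No", 13.0, 5.0),
     (9.0, "Yes", 0.0, 0.0, "Yes", 0.0, 3.0),
     (1.0, "No", 7.0, 5.0, "No", 10.0, 8.0)],
    columns=FEATURES,
)
external = pd.DataFrame(
    [(4.0, "No", 4.0, 6.0, "No", 13.0, 5.0, "Extrovert"),
     (9.0, "Yes", 0.0, 0.0, "Yes", 0.0, 3.0, "Introvert"),
     (2.0, "No", 9.0, 7.0, "No", 14.0, 9.0, "Extrovert")],
    columns=FEATURES + ["Personality"],
)

merged = attach_external_label(competition, external)
print(merged[["Time_spent_Alone", "external_label"]])
```

With two external sources, they could be concatenated with `pd.concat` before the join. The matched label can then serve as an extra feature or as additional training rows; which of these the solution used is not stated.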