ML

Kaggle — Personality Prediction (Rank 21)

ML competition predicting introvert vs extrovert from 7 behavioural features. Ranked 21st out of 4,329 teams (top 0.5%) with 97.27% cross-validation accuracy.


  • Rank 21 out of 4,329 teams
  • Top 0.5% global percentile
  • 97.27% CV accuracy
  • 7 behavioural features

The Problem

Kaggle Playground Series S5E7 challenge: predict personality type (Introvert/Extrovert) from anonymised behavioural survey features. The key challenges are handling missing values, extracting maximum signal from only 7 features, and avoiding leaderboard overfitting on a dataset with near-perfect class separability.

The Solution

Merged two external personality datasets with the competition training data to expand training signal. Applied median imputation for numeric features (Time_spent_Alone, Social_event_attendance, Going_outside, Friends_circle_size, Post_frequency) and most-frequent imputation for categoricals (Stage_fear, Drained_after_socializing). One-hot encoded the categoricals (OneHotEncoder) and trained a scikit-learn GradientBoostingClassifier, achieving 97.27% cross-validation accuracy. The external dataset merge, matching on all 7 feature columns, was the key technique that unlocked near-perfect accuracy.
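The preprocessing and model steps described above can be sketched roughly as follows. This is a minimal illustration, not the competition code: the column names come from the write-up, but the toy data generator (`make_toy_data`), the default hyperparameters, and the 5-fold split are assumptions made only so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

NUMERIC = ["Time_spent_Alone", "Social_event_attendance", "Going_outside",
           "Friends_circle_size", "Post_frequency"]
CATEGORICAL = ["Stage_fear", "Drained_after_socializing"]


def build_pipeline(random_state=0):
    """Median/most-frequent imputation, one-hot encoding, gradient boosting."""
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", SimpleImputer(strategy="median"), NUMERIC),
        ("cat", categorical, CATEGORICAL),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", GradientBoostingClassifier(random_state=random_state)),
    ])


def make_toy_data(n=300, seed=0):
    """Hypothetical stand-in for the competition data (assumption, not real data)."""
    rng = np.random.default_rng(seed)
    introvert = rng.random(n) < 0.3
    data = {
        "Time_spent_Alone": np.where(introvert, rng.integers(5, 12, n), rng.integers(0, 5, n)),
        "Social_event_attendance": np.where(introvert, rng.integers(0, 4, n), rng.integers(4, 11, n)),
        "Going_outside": np.where(introvert, rng.integers(0, 3, n), rng.integers(3, 8, n)),
        "Friends_circle_size": np.where(introvert, rng.integers(0, 5, n), rng.integers(5, 16, n)),
        "Post_frequency": np.where(introvert, rng.integers(0, 3, n), rng.integers(3, 11, n)),
    }
    X = pd.DataFrame({name: values.astype(float) for name, values in data.items()})
    for name in CATEGORICAL:
        X[name] = pd.Series(np.where(introvert, "Yes", "No"), dtype=object)
    # Knock out roughly 10% of each column so the imputers have work to do.
    for name in NUMERIC + CATEGORICAL:
        X.loc[rng.random(n) < 0.1, name] = np.nan
    y = pd.Series(np.where(introvert, "Introvert", "Extrovert"))
    return X, y


X, y = make_toy_data()
scores = cross_val_score(build_pipeline(), X, y, cv=5, scoring="accuracy")
print(f"mean CV accuracy on toy data: {scores.mean():.4f}")
```

Keeping the imputers and encoder inside the pipeline means they are refit on each training fold, so the cross-validation score is not inflated by statistics computed on held-out rows.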

Results & Metrics

  • Ranked 21st globally out of 4,329 competing teams — top 0.5% worldwide
  • 97.27% cross-validation accuracy on the training set
  • Key technique: merged 2 external personality datasets matched on all 7 behavioural features
  • 7 features used: Time_spent_Alone, Stage_fear, Social_event_attendance, Going_outside, Drained_after_socializing, Friends_circle_size, Post_frequency
  • One of 10+ Kaggle competitions completed by Kumar Katariya
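
The write-up does not spell out how the external datasets were joined, so the sketch below shows one plausible reading: a left merge on all seven feature columns that attaches the external dataset's label to each matching competition row. The helper `attach_external_label`, the `external_label` column name, and the toy frames are hypothetical and exist only for illustration.

```python
import pandas as pd

FEATURES = ["Time_spent_Alone", "Stage_fear", "Social_event_attendance",
            "Going_outside", "Drained_after_socializing",
            "Friends_circle_size", "Post_frequency"]


def attach_external_label(train, external, label_col="Personality"):
    """Left-join the external label onto train, matching on all 7 features."""
    ext = (
        external[FEATURES + [label_col]]
        # One row per feature combination, so the merge cannot multiply train rows.
        .drop_duplicates(subset=FEATURES)
        .rename(columns={label_col: "external_label"})
    )
    return train.merge(ext, on=FEATURES, how="left")


# Toy frames (assumed values, not competition data).
train = pd.DataFrame({
    "Time_spent_Alone": [9.0, 1.0, 4.0],
    "Stage_fear": ["Yes", "No", "No"],
    "Social_event_attendance": [1.0, 8.0, 5.0],
    "Going_outside": [1.0, 6.0, 4.0],
    "Drained_after_socializing": ["Yes", "No", "No"],
    "Friends_circle_size": [2.0, 12.0, 7.0],
    "Post_frequency": [1.0, 9.0, 5.0],
})
# The external frame repeats the first train row twice and omits the third.
external = train.iloc[[0, 1, 0]].reset_index(drop=True)
external["Personality"] = ["Introvert", "Extrovert", "Introvert"]

merged = attach_external_label(train, external)
print(merged[["Time_spent_Alone", "external_label"]])
```

Rows without an exact match keep a missing `external_label`, which the model or a downstream imputer then has to handle. Note that pandas treats missing values in merge keys as equal to each other, so real data with gaps in the feature columns may need those rows handled separately.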

Tech Stack

Python, GradientBoostingClassifier, scikit-learn, Pandas, OneHotEncoding, SimpleImputer