URL: https://www.geeksforgeeks.org/machine-learning/overview-of-personality-prediction-project-using-ml/

Personality Prediction Project using ML

Last Updated : 29 Aug, 2025

The Myers-Briggs Type Indicator (MBTI) framework classifies personalities into 16 distinct types based on four dimensions of how people perceive the world and make decisions. In this project we predict a person's MBTI type from a short self-description and their answers to an MBTI-style survey. Let's build a machine learning model that:

  • Learns from a dataset of social media posts labeled with MBTI types.
  • Converts the text into numerical features using TF-IDF vectorization, which captures the importance of words.
  • Combines the text features with simulated or collected questionnaire answers representing preferences in social behavior, information processing, decision making, work style and values.
  • Trains a Random Forest classifier on this hybrid data to predict the personality type.

Step-by-Step Implementation

Let's build our prediction model step by step and use it to predict our personality type:

Step 1: Install dependencies

We will install the required packages.
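The install command itself is not shown in this copy of the article. Judging from the later steps, the project likely needs pandas, scikit-learn, SciPy, joblib and chromadb (an inferred list, not one confirmed by the source), which could be installed with `pip install pandas scikit-learn scipy joblib chromadb`. A small sketch for checking which of these assumed packages are importable:

```python
from importlib.util import find_spec

# Assumed dependency list, inferred from the later steps of this article.
# Keys are pip package names, values are the module names they provide.
REQUIRED = {
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "scipy": "scipy",
    "joblib": "joblib",
    "chromadb": "chromadb",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All assumed dependencies are available.")
```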

Step 2: Import Libraries and Load Data

We will import the required libraries and load the MBTI dataset, which contains user posts and their MBTI labels:

  • pandas: Used for data manipulation and loading CSV files.
  • LabelEncoder: Converts MBTI personality type labels (strings) into numeric codes for classification.
  • train_test_split: Splits dataset into training and testing subsets.
  • TfidfVectorizer: Converts user text data (posts) into numerical vectors using TF-IDF vectorization.

The MBTI dataset can be downloaded from here.
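A minimal sketch of the loading step. The column names `type` and `posts` and the file name `mbti_1.csv` are assumptions about the dataset layout, and a tiny in-memory CSV stands in for the real download so the snippet runs on its own:

```python
import io

import pandas as pd

# Stand-in for the real CSV so the sketch runs on its own. The column names
# "type" and "posts" are an assumption about the dataset layout.
sample_csv = io.StringIO(
    "type,posts\n"
    "INFP,I enjoy quiet evenings with a good book|||Writing helps me think\n"
    "ENTJ,Let's set a clear goal and get it done|||I like leading projects\n"
    "ISTP,I fixed my bike today|||Tools and engines are fun\n"
)

# With the downloaded file this would be e.g. pd.read_csv("mbti_1.csv"),
# where the file name is hypothetical.
df = pd.read_csv(sample_csv)

print(df.shape)
print(df["type"].value_counts())
```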

Step 3: Encode Personality Labels and Split Dataset

We will encode the labels and split the dataset for training and testing:

  • LabelEncoder transforms MBTI labels into integers. Classes are numbered in alphabetical order, so with all 16 types present 'INFP' -> 9.
  • Posts (X_text) and label codes (y) are separated.
  • The data is split into 80% training and 20% testing to evaluate model generalization.
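The encoding and split can be sketched as follows. The toy posts are placeholders, and `random_state=42` for the split is an assumption made for reproducibility:

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

MBTI_TYPES = [
    "INFJ", "INFP", "INTJ", "INTP", "ISFJ", "ISFP", "ISTJ", "ISTP",
    "ENFJ", "ENFP", "ENTJ", "ENTP", "ESFJ", "ESFP", "ESTJ", "ESTP",
]

# Toy data: five placeholder posts per type (real posts come from the dataset).
labels = [t for t in MBTI_TYPES for _ in range(5)]
X_text = [f"placeholder post {i} written by an {t}" for i, t in enumerate(labels)]

label_encoder = LabelEncoder()
y = label_encoder.fit_transform(labels)

# Classes are sorted alphabetically, so a code depends on which types occur.
infp_code = int(label_encoder.transform(["INFP"])[0])
print("INFP ->", infp_code)

# 80% of the rows go to training, 20% to testing.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    X_text, y, test_size=0.2, random_state=42
)
print(len(X_train_text), "training posts,", len(X_test_text), "test posts")
```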

Step 4: TF-IDF Vectorization of Text Data

Now we:

  • Convert the raw text posts into sparse matrices of TF-IDF features.
  • Limit the vocabulary to the 3000 most frequent words for tractability.
  • Remove common English stop words to reduce noise.
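A short sketch of this step with scikit-learn's `TfidfVectorizer`. The example posts are made up, and fitting on the training posts only is the usual practice rather than something the source states explicitly:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_posts = [
    "I love reading books and thinking about ideas",
    "Parties and meeting new people give me energy",
    "I plan my week carefully and follow the schedule",
]
test_posts = ["I love meeting people and sharing ideas"]

# Keep at most 3000 terms and drop common English stop words.
vectorizer = TfidfVectorizer(max_features=3000, stop_words="english")

# Fit on the training posts only, then reuse the same vocabulary for test data.
X_train_tfidf = vectorizer.fit_transform(train_posts)
X_test_tfidf = vectorizer.transform(test_posts)

print(X_train_tfidf.shape, X_test_tfidf.shape)
print(sorted(vectorizer.vocabulary_)[:5])
```

Both matrices share the same columns, which is what lets a model trained on one be applied to the other.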

Step 5: Simulate Questionnaire Data for Training

The dataset contains posts and labels but no survey responses, so we simulate answers to the questionnaire for training the model.
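The source does not show how the answers are simulated. One possible sketch, assuming five binary questions answered uniformly at random (the function name `simulate_answers` and the seed are hypothetical):

```python
import numpy as np

# One column per question area: social behavior, information processing,
# decision making, work style, values.
N_QUESTIONS = 5


def simulate_answers(n_samples, seed=42):
    """Return an (n_samples, N_QUESTIONS) array of random 0/1 answers.

    Assumption: answers are drawn uniformly at random; the original
    notebook may simulate them differently.
    """
    rng = np.random.default_rng(seed)
    return rng.integers(0, 2, size=(n_samples, N_QUESTIONS))


q_train = simulate_answers(8)
print(q_train.shape)
print(q_train[:3])
```

Purely random answers carry no information about the label, so in this form they act as placeholder columns that keep the feature layout identical between training and interactive prediction.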

Step 6: Combine Text and Questionnaire Features

Now we:

  • Horizontally stack the TF-IDF vectors and the questionnaire answer vectors.
  • Combine text content and survey responses into one feature matrix.
  • Use hstack, which efficiently handles sparse text vectors combined with dense questionnaire data.
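A minimal sketch of the stacking step using `scipy.sparse.hstack`. The small matrices are stand-ins for the real TF-IDF output and answer arrays:

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack

# Stand-in for a TF-IDF matrix: 4 posts x 10 terms, mostly zeros.
dense_text = np.zeros((4, 10))
dense_text[0, 2] = 0.7
dense_text[1, 5] = 0.4
dense_text[2, 9] = 0.9
dense_text[3, 0] = 0.3
X_tfidf = csr_matrix(dense_text)

# Dense questionnaire answers: 4 users x 5 binary questions.
q_answers = np.array([[0, 1, 0, 1, 1],
                      [1, 0, 1, 0, 1],
                      [0, 0, 1, 1, 0],
                      [1, 1, 0, 0, 1]])

# Convert the dense block to sparse, then stack the columns side by side.
X_combined = hstack([X_tfidf, csr_matrix(q_answers)]).tocsr()

print(X_combined.shape)  # 4 rows, 10 text columns followed by 5 answer columns
```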

Step 7: Train Random Forest Model and Evaluate Performance

  • RandomForestClassifier: an ensemble tree-based model that combines many decision trees to improve accuracy and reduce overfitting.
  • n_estimators=100 specifies 100 trees in the forest.
  • random_state=42 ensures results can be reproduced.
  • After training on both text features and questionnaire answers, it predicts on the unseen test set.
  • accuracy_score: Shows overall proportion of correctly predicted instances.
  • classification_report: Provides detailed metrics per MBTI category for a nuanced evaluation.
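The training and evaluation step might look like the sketch below. The random feature matrix is a synthetic stand-in for the combined TF-IDF and questionnaire features, so the printed scores say nothing about the real dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the combined feature matrix: 200 rows, 20 columns,
# 4 classes. Real features come from TF-IDF plus questionnaire answers.
rng = np.random.default_rng(42)
X = rng.random((200, 20))
y = rng.integers(0, 4, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 100 trees, fixed seed for reproducible results.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate on the held-out test rows.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.3f}")
print(classification_report(y_test, y_pred, zero_division=0))
```

Because the stand-in labels are random, accuracy here should hover around chance level; the point is only the sequence of calls.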

Output:

๐Ÿ‘ Screenshot-2025-08-29-094217
Training and Testing

Step 8: Save Trained Model and Vectorizer for Use

Now we save the trained Random Forest model and all encoders/vectorizers to disk. These files are loaded later for interactive prediction after deployment.

To learn more about saving and reusing models, refer to: Save and Load Machine Learning Models.
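A sketch of saving and reloading the fitted objects. It assumes joblib as the serialization library, and the file names are hypothetical; tiny fitted objects stand in for the real trained ones:

```python
import os
import tempfile

import joblib
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder

# Tiny fitted objects standing in for the real trained ones.
posts = ["I love quiet evenings", "I love big parties", "I plan everything"]
types = ["INFP", "ENFP", "ISTJ"]

vectorizer = TfidfVectorizer().fit(posts)
label_encoder = LabelEncoder().fit(types)
model = RandomForestClassifier(n_estimators=10, random_state=42)
model.fit(vectorizer.transform(posts), label_encoder.transform(types))

# Hypothetical file names; a temporary directory keeps the sketch tidy.
out_dir = tempfile.mkdtemp()
artifacts = {
    "mbti_model.joblib": model,
    "tfidf_vectorizer.joblib": vectorizer,
    "label_encoder.joblib": label_encoder,
}
for file_name, obj in artifacts.items():
    joblib.dump(obj, os.path.join(out_dir, file_name))

# Loading a file back gives an object that behaves like the original.
loaded_model = joblib.load(os.path.join(out_dir, "mbti_model.joblib"))
print(sorted(os.listdir(out_dir)))
```

Saving the vectorizer and label encoder alongside the model matters because inference needs the same vocabulary and the same label-to-code mapping that were used in training.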

Step 9: Load Saved Models and Personality Description File

Here we:

  • Load the trained classifier, vectorizer and label encoder for inference.
  • Load a JSON file with textual personality descriptions for each MBTI type, which allows showing detailed feedback on predictions.

The JSON file with personality descriptions can be downloaded from here.
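A sketch of loading the description file and looking up a type. The file name `mbti_descriptions.json` and its layout (one description string per MBTI type) are assumptions; the real file may be structured differently:

```python
import json
import os
import tempfile

# Hypothetical layout: one short description string per MBTI type.
sample = {
    "INFP": "Placeholder description for INFP.",
    "ENTJ": "Placeholder description for ENTJ.",
}

# Write a stand-in file so the sketch runs without the real download.
path = os.path.join(tempfile.mkdtemp(), "mbti_descriptions.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(sample, f)

with open(path, encoding="utf-8") as f:
    descriptions = json.load(f)


def describe(mbti_type):
    """Look up a description, with a fallback for unknown types."""
    return descriptions.get(mbti_type, "No description available.")


print(describe("INFP"))
```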

Step 10: Questionnaire Setup and Interactive User Input

Now we:

  • Define the 5 MBTI survey questions with two answer options each.
  • Get a freeform self-description from the user.
  • Ask each MBTI question in sequence and collect the responses as binary 0/1 values.
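One way to sketch the questionnaire loop. The question wording is hypothetical (only the five topic areas come from the article), and the `ask` parameter is an illustrative addition so the function can be driven by scripted replies instead of the keyboard:

```python
# Hypothetical question wording; only the five topic areas come from the article.
QUESTIONS = [
    ("Social behavior: at a gathering you prefer to",
     "talk with a few close friends", "meet many new people"),
    ("Information processing: you trust", "concrete facts", "patterns and hunches"),
    ("Decision making: you decide mostly by", "logic", "personal values"),
    ("Work style: you prefer", "a fixed plan", "keeping options open"),
    ("Values: you care more about", "stability", "new experiences"),
]


def collect_profile(ask=input):
    """Collect a freeform self-description and five 0/1 answers."""
    user_text = ask("Describe yourself in a few sentences: ").strip()
    answers = []
    for prompt, option_0, option_1 in QUESTIONS:
        reply = ""
        # Repeat the question until the reply is exactly "0" or "1".
        while reply not in ("0", "1"):
            reply = ask(f"{prompt}\n  0) {option_0}\n  1) {option_1}\n> ").strip()
        answers.append(int(reply))
    return user_text, answers


# Scripted replies stand in for keyboard input; "maybe" is rejected and re-asked.
scripted = iter(["I am calm and curious", "0", "1", "maybe", "0", "1", "1"])
user_text, answers = collect_profile(ask=lambda prompt: next(scripted))
print(user_text, answers)
```

In interactive use, calling `collect_profile()` with no arguments falls back to the built-in `input`.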

Output:

๐Ÿ‘ questionnaire
Questions

Step 11: Vectorize Input and Combine Features

  • Convert the user's text into a TF-IDF vector (same feature space as training).
  • Format the questionnaire answers as a numeric feature vector.
  • Stack both into one hybrid vector for prediction.
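A minimal sketch of building the hybrid vector for one user. The small fitted vectorizer is a stand-in for the one saved after training, and the example text and answers are made up:

```python
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Stand-in for the vectorizer fitted during training.
vectorizer = TfidfVectorizer(stop_words="english")
vectorizer.fit([
    "I enjoy calm evenings and reading",
    "I enjoy exploring new places with friends",
])

user_text = "I am a calm person and I love exploring"
answers = [0, 1, 0, 1, 1]

# transform (not fit_transform) keeps the training vocabulary and column order.
text_vec = vectorizer.transform([user_text])
answer_vec = csr_matrix([answers])

# One row: text columns first, then the five answer columns.
user_features = hstack([text_vec, answer_vec]).tocsr()
print(user_features.shape)
```

The column order (text first, answers second) has to match the order used when the training matrix was built, otherwise the model would read the features in the wrong positions.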

Step 12: Make Personality Prediction and Output Description

Now we:

  • Pass the combined features through the trained model to predict the MBTI label code.
  • Convert the numeric MBTI code back to its string label.
  • Retrieve and print the detailed MBTI description for the user.
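The prediction step can be sketched as follows, with a toy model, encoder and description dictionary standing in for the saved ones (all data here is illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Toy stand-ins for the trained encoder, model and description file.
label_encoder = LabelEncoder().fit(["ENTJ", "INFP"])
X_train = np.array([[0, 0], [0, 0], [1, 1], [1, 1]])
y_train = label_encoder.transform(["INFP", "INFP", "ENTJ", "ENTJ"])
model = RandomForestClassifier(n_estimators=10, random_state=42).fit(X_train, y_train)
descriptions = {"INFP": "Placeholder INFP text.", "ENTJ": "Placeholder ENTJ text."}

# Stand-in for the hybrid feature vector of one user.
user_features = np.array([[0, 0]])

# predict returns numeric codes; inverse_transform maps them back to strings.
pred_code = model.predict(user_features)
mbti_type = label_encoder.inverse_transform(pred_code)[0]

print("Predicted type:", mbti_type)
print(descriptions.get(mbti_type, "No description available."))
```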

Output:

๐Ÿ‘ Screenshot-2025-08-29-102052
Personality Predicted by Model

As we can see, the model predicts a person's personality type from their self-description and questionnaire answers.

Step 13: Store the Profile in ChromaDB Vector Database

Now we:

  • Connect to ChromaDB (a local vector database) to store user profile embeddings.
  • Store the MBTI type, answers and user text as metadata for rich querying.
  • Use a unique UUID string as the identifier for each stored profile.
  • Persist the profile for future user comparisons, recommendations or analytics.
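A sketch of how a profile could be stored. The source does not say which vector is stored as the embedding, so using the questionnaire answers is an assumption here, as are the helper names, the database path and the collection name:

```python
import json
import uuid


def build_profile_record(mbti_type, answers, user_text):
    """Build the id, embedding and metadata for one user profile."""
    return {
        "id": str(uuid.uuid4()),
        # Assumption: the questionnaire answers double as the stored vector.
        "embedding": [float(a) for a in answers],
        # Metadata values are kept as scalars, so the answer list is serialized.
        "metadata": {
            "mbti_type": mbti_type,
            "answers": json.dumps(answers),
            "user_text": user_text,
        },
    }


def save_profile(record, path="./personality_db", collection_name="profiles"):
    """Persist one record in a local ChromaDB collection (names are hypothetical)."""
    import chromadb  # imported here so the helper above works without chromadb

    client = chromadb.PersistentClient(path=path)
    collection = client.get_or_create_collection(name=collection_name)
    collection.add(
        ids=[record["id"]],
        embeddings=[record["embedding"]],
        metadatas=[record["metadata"]],
    )
    return collection


record = build_profile_record("INFP", [0, 1, 0, 1, 1], "I am a calm person")
print(record["metadata"])
# save_profile(record)  # run this once chromadb is installed
```

Serializing the answers to a string matches the shape of the stored metadata shown in the article's output, where `answers` appears as `'[0, 1, 0, 1, 1]'`.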

Output:

Your profile has been saved to the personality database.

Step 14: Access the Database

We can query the ChromaDB database to retrieve the IDs and metadata of all saved vectors (user texts and MBTI types are stored in the metadata).
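A sketch of reading the stored profiles back. The helper names are hypothetical, and the usage comment assumes the same database path and collection name that were used when saving:

```python
def list_profiles(collection):
    """Return (ids, metadatas) for every profile in a ChromaDB collection."""
    result = collection.get(include=["metadatas"])  # ids are always returned
    return result["ids"], result["metadatas"]


def count_by_type(metadatas):
    """Count how many stored profiles carry each MBTI type."""
    counts = {}
    for meta in metadatas:
        counts[meta["mbti_type"]] = counts.get(meta["mbti_type"], 0) + 1
    return counts


# Usage with a real collection (path and name are hypothetical):
#   import chromadb
#   client = chromadb.PersistentClient(path="./personality_db")
#   collection = client.get_or_create_collection(name="profiles")
#   ids, metadatas = list_profiles(collection)

# Metadata shaped like the stored profiles, used here as sample input.
sample_metadatas = [
    {"mbti_type": "INFP", "answers": "[0, 1, 0, 1, 1]", "user_text": "I am a calm person"},
    {"mbti_type": "INFP", "answers": "[1, 0, 1, 0, 1]", "user_text": "I am a sad person"},
]
print(count_by_type(sample_metadatas))
```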

Output:

Stored profile IDs: ['ff6ea2d8-0b78-47ea-b125-0d9baec116a2', '3665925b-1b07-489b-9108-7f4ad3914618']
Stored metadata example: [{'user_text': 'I am a calm person and an extrovert. I love to to explore things', 'mbti_type': 'INFP', 'answers': '[0, 1, 0, 1, 1]'},
{'mbti_type': 'INFP', 'answers': '[1, 0, 1, 0, 1]', 'user_text': 'I am a sad person'}]

The complete notebook can be downloaded from here.
