Begin by importing train_test_split from sklearn.model_selection. NumPy or pandas is typically used alongside it to load and manipulate the dataset.
Organize your data into features (X) and labels (y). For supervised learning, X is your input data, and y is the target variable.
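As a minimal sketch of separating features from labels, assume a small NumPy table in which each row is one sample and the last column holds the label. The values and the column layout are made up for illustration.

```python
import numpy as np

# Hypothetical table: each row is one sample, the last column is the label
data = np.array([
    [5.1, 3.5, 0],
    [4.9, 3.0, 0],
    [6.2, 2.9, 1],
    [5.9, 3.2, 1],
])

X = data[:, :-1]             # every column except the last -> features
y = data[:, -1].astype(int)  # last column -> integer labels

print(X.shape, y.shape)  # (4, 2) (4,)
```

X and y must contain the same number of samples, since each row of X is paired with the corresponding entry of y.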
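Call train_test_split(X, y, test_size=0.2) to divide your dataset. It returns four objects in the order X_train, X_test, y_train, y_test. A float test_size is the fraction of samples held out for testing (here 20% test, 80% train). Pass random_state if you need the same split on every run.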
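The basic call might look like the sketch below. The toy arrays are hypothetical, and random_state=42 is an arbitrary choice used only to make the shuffle repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 features each
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the samples for testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```

Rows of X stay paired with their labels in y, because both arrays are split with the same indices.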
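For classification tasks, pass stratify=y so that the training and testing sets each keep roughly the same class proportions as the full dataset. Stratifying does not balance the classes; it preserves whatever distribution y already has, which matters most when some classes are rare.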
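To illustrate stratification, the sketch below uses a hypothetical imbalanced label array (80 samples of class 0, 20 of class 1) and counts the classes on each side of the split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset: 80 samples of class 0, 20 of class 1
X = np.arange(100).reshape(100, 1)
y = np.array([0] * 80 + [1] * 20)

# stratify=y keeps the 80/20 class ratio in both subsets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

train_counts = np.bincount(y_train)  # samples per class in the training set
test_counts = np.bincount(y_test)    # samples per class in the test set
print(train_counts, test_counts)     # [64 16] [16  4]
```

Without stratify, a random split of a small or skewed dataset could leave the test set with very few samples of the minority class.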
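By default, train_test_split shuffles the data before splitting. Set shuffle=False if you need to preserve the original order, such as in time series data, where the test set should come after the training set in time. Note that stratify cannot be used together with shuffle=False.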
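An order-preserving split could be sketched as follows, assuming a hypothetical series whose rows are already sorted from oldest to newest.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical time-ordered data: index 0 is the oldest observation
X = np.arange(10).reshape(10, 1)
y = np.arange(10)

# shuffle=False keeps the original order: the first 80% of rows become
# the training set and the last 20% become the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

print(y_train)  # [0 1 2 3 4 5 6 7]
print(y_test)   # [8 9]
```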
After splitting, fit the model on the training set only and keep the test set for evaluation. Because the model never sees the test samples during training, its score on them gives an estimate of performance on unseen data.
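Putting the steps together, a train-then-evaluate workflow might look like this sketch. The iris dataset and LogisticRegression are stand-ins chosen for illustration; the text above does not prescribe a particular dataset or model.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Example dataset bundled with scikit-learn (an assumption for this sketch)
X, y = load_iris(return_X_y=True)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Fit on the training set only
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on the held-out test set; score() returns mean accuracy
test_accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.2f}")
```

The exact accuracy depends on the split and the model, so treat a single test score as an estimate rather than a fixed property of the model.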