In this article, we look at how to convert a sklearn dataset to a pandas DataFrame in Python.
Sklearn (scikit-learn) and pandas are Python libraries that are widely used for data science and machine learning. Pandas focuses mainly on data processing, manipulation, cleaning, and visualization, whereas sklearn provides a wide range of tools and functions for training machine learning models.
Here we import the iris dataset loader from the sklearn library. We then load the data by calling the load_iris() function and store the result in a variable named iris_data. This variable is of type sklearn.utils._bunch.Bunch, a dictionary-like object whose attributes include data, target, frame, target_names, DESCR, feature_names, filename, and data_module. We use two of them: data, which holds the complete feature matrix of the iris dataset (one row per sample, one column per feature), and feature_names, which holds the list of column names for that matrix. Passing both to the pandas DataFrame constructor gives a DataFrame with labeled columns.
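The steps above can be sketched as follows. This is a minimal reconstruction, assuming a recent scikit-learn release in which load_iris() returns a Bunch exposing data and feature_names; it simply hands the feature matrix and the column names to the pandas DataFrame constructor.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the bundled iris dataset; the result is a dict-like Bunch object.
iris_data = load_iris()

# Build the DataFrame from the feature matrix (iris_data.data) and use
# iris_data.feature_names as the column labels.
df = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)

# Show the first five rows.
print(df.head())
```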
Output:
   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)
0                5.1               3.5                1.4               0.2
1                4.9               3.0                1.4               0.2
2                4.7               3.2                1.3               0.2
3                4.6               3.1                1.5               0.2
4                5.0               3.6                1.4               0.2
In this example, we create a function named convert_to_dataframe that converts a sklearn dataset to a pandas DataFrame. The function takes one parameter, sk_data, which is the sklearn dataset, and returns the data in pandas DataFrame format. We use sklearn's diabetes dataset in this example.
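A minimal sketch of convert_to_dataframe, assuming sk_data is a Bunch that exposes data and feature_names, as load_diabetes() returns with its default arguments. The variable name diabetes_data is illustrative. Only the feature columns are converted; the target array is left out, which is consistent with the output shown.

```python
import pandas as pd
from sklearn.datasets import load_diabetes


def convert_to_dataframe(sk_data):
    """Return the feature matrix of a sklearn Bunch as a pandas DataFrame.

    Assumes sk_data has a `data` array and a matching `feature_names` list,
    as the bundled toy datasets from sklearn's load_* functions do.
    """
    return pd.DataFrame(sk_data.data, columns=sk_data.feature_names)


# Load the bundled diabetes dataset (illustrative variable name).
diabetes_data = load_diabetes()
df = convert_to_dataframe(diabetes_data)

# Show the first five rows.
print(df.head())
```

Because the function relies only on data and feature_names, the same call should also work for other bundled datasets such as load_iris() or load_wine().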
Output:
age sex bmi bp s1 s2 s3 \
0 0.038076 0.050680 0.061696 0.021872 -0.044223 -0.034821 -0.043401
1 -0.001882 -0.044642 -0.051474 -0.026328 -0.008449 -0.019163 0.074412
2 0.085299 0.050680 0.044451 -0.005670 -0.045599 -0.034194 -0.032356
3 -0.089063 -0.044642 -0.011595 -0.036656 0.012191 0.024991 -0.036038
4 0.005383 -0.044642 -0.036385 0.021872 0.003935 0.015596 0.008142
s4 s5 s6
0 -0.002592 0.019907 -0.017646
1 -0.039493 -0.068332 -0.092204
2 -0.002592 0.002861 -0.025930
3 0.034309 0.022688 -0.009362
4 -0.002592 -0.031988 -0.046641