Python lists are one of the most versatile data structures, offering a range of built-in functions for efficient data manipulation. When working with Pandas, we often need to extract entire rows from a DataFrame and store them in a list for further processing. Unlike columns, which are easily accessible as Pandas Series, rows require different methods for extraction. Let's look at some of these methods:
This method extracts all rows as a list of lists by converting the DataFrame into a NumPy array and then transforming it into a list. It is one of the fastest ways to convert a DataFrame into a list when performance is a key concern.
[['10/2/2011', 'Music', 10000], ['11/2/2011', 'Poetry', 5000], ['12/2/2011', 'Theatre', 15000], ['13/2/2011', 'Comedy', 2000]]
Explanation: This method extracts the underlying NumPy array of the DataFrame using df.values. The result is a two-dimensional array holding all rows; because the columns here mix strings and integers, it has object dtype. The tolist() function then converts this array into a list of lists, where each row becomes a list.
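The approach described above can be sketched as follows. The DataFrame is reconstructed from the output shown, so the variable names `df` and `rows` are illustrative assumptions rather than the original code.

```python
import pandas as pd

# Sample data reconstructed from the output shown in this article (assumed)
df = pd.DataFrame({
    "Date": ["10/2/2011", "11/2/2011", "12/2/2011", "13/2/2011"],
    "Event": ["Music", "Poetry", "Theatre", "Comedy"],
    "Cost": [10000, 5000, 15000, 2000],
})

# df.values gives a 2-D NumPy array; tolist() turns it into a list of lists
rows = df.values.tolist()
print(rows)
```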
Similar to values.tolist(), this method explicitly converts the DataFrame into a NumPy array before converting it into a list. The Pandas documentation recommends to_numpy() over the values attribute, so this is the preferred form in recent Pandas versions.
[['10/2/2011', 'Music', 10000], ['11/2/2011', 'Poetry', 5000], ['12/2/2011', 'Theatre', 15000], ['13/2/2011', 'Comedy', 2000]]
Explanation: Similar to values.tolist(), this method explicitly converts the DataFrame into a NumPy array using to_numpy(). The tolist() function then converts that array into a list of lists, one per row.
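A minimal sketch of this variant, again assuming a DataFrame rebuilt from the output shown (the names `df` and `rows` are illustrative):

```python
import pandas as pd

# Sample data reconstructed from the output shown in this article (assumed)
df = pd.DataFrame({
    "Date": ["10/2/2011", "11/2/2011", "12/2/2011", "13/2/2011"],
    "Event": ["Music", "Poetry", "Theatre", "Comedy"],
    "Cost": [10000, 5000, 15000, 2000],
})

# to_numpy() returns the data as a NumPy array; tolist() converts it to nested lists
rows = df.to_numpy().tolist()
print(rows)
```

For this data the result is the same as with df.values.tolist().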
This method converts each row into a dictionary, where column names serve as keys. This is particularly useful when working with structured data that needs to retain column labels.
[{'Date': '10/2/2011', 'Event': 'Music', 'Cost': 10000}, {'Date': '11/2/2011', 'Event': 'Poetry', 'Cost': 5000}, {'Date': '12/2/2011', 'Event': 'Theatre', 'Cost': 15000}, {'Date': '13/2/2011', 'Event'...
Explanation: This method converts the DataFrame into a list of dictionaries, where each row is represented as a dictionary. The orient='records' argument ensures that each dictionary represents a row, with column names as keys and corresponding values as dictionary values.
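The dictionary-based conversion can be sketched like this. The DataFrame is an assumption rebuilt from the column names and values visible in the output; `records` is an illustrative name.

```python
import pandas as pd

# Sample data reconstructed from the output shown in this article (assumed)
df = pd.DataFrame({
    "Date": ["10/2/2011", "11/2/2011", "12/2/2011", "13/2/2011"],
    "Event": ["Music", "Poetry", "Theatre", "Comedy"],
    "Cost": [10000, 5000, 15000, 2000],
})

# orient="records" produces one dict per row, keyed by column name
records = df.to_dict(orient="records")
print(records)

# Column labels are retained, so fields can be looked up by name
print(records[0]["Event"])
```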
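This method iterates over the rows one at a time, yielding each as a named tuple. Because rows are produced lazily, they can be processed one by one, which is useful for large datasets. Converting each tuple to a list gives the list-of-lists output shown below.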
[['10/2/2011', 'Music', 10000], ['11/2/2011', 'Poetry', 5000], ['12/2/2011', 'Theatre', 15000], ['13/2/2011', 'Comedy', 2000]]
Explanation: The itertuples() method iterates over each row as a named tuple, providing efficient row-wise access. Passing index=False ensures that the index is not included in the tuples.
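One way to write this, as a sketch: the list comprehension below is an assumption about how the tuples were turned into lists to match the output shown, and the DataFrame is rebuilt from that output.

```python
import pandas as pd

# Sample data reconstructed from the output shown in this article (assumed)
df = pd.DataFrame({
    "Date": ["10/2/2011", "11/2/2011", "12/2/2011", "13/2/2011"],
    "Event": ["Music", "Poetry", "Theatre", "Comedy"],
    "Cost": [10000, 5000, 15000, 2000],
})

# itertuples(index=False) yields one named tuple per row, without the index;
# list(row) converts each tuple into a plain list
rows = [list(row) for row in df.itertuples(index=False)]
print(rows)

# Fields of a named tuple can also be read by column name
first = next(df.itertuples(index=False))
print(first.Event, first.Cost)
```

If the tuples themselves are sufficient, the list(row) conversion can be dropped and the named tuples used directly.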