URL: https://www.geeksforgeeks.org/python/eliminating-repeated-lines-from-a-file-using-python/

Eliminating repeated lines from a file using Python

Last Updated : 19 Dec, 2025

Given a text file that contains several duplicate lines, the task is to remove all repeated lines and produce an output file containing only unique lines, while keeping their original order.

Example: Input file (myfile.txt)

This is a sample line.
Python is a powerful language.
This is a sample line.

Output:
This is a sample line.
Python is a powerful language.

Below are several methods to eliminate repeated lines from a file:

Using a Set

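This method keeps a Python set of the lines already written and writes each line only the first time it appears, so the original order is preserved. Membership checks on a set take constant time on average, which makes this approach a good fit for large files.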
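A minimal sketch of the set-based approach, using the variable names from the explanation (seen, f_in, f_out, ln). The function name dedupe_with_set and the output file name are assumptions made for illustration, and the demo writes its sample input to a temporary directory so that no existing file is touched.

```python
import os
import tempfile


def dedupe_with_set(src_path, dst_path):
    """Copy src_path to dst_path, keeping only the first occurrence of each line."""
    seen = set()  # lines that have already been written
    with open(src_path, encoding="utf-8") as f_in, \
            open(dst_path, "w", encoding="utf-8") as f_out:
        for ln in f_in:           # read the file one line at a time
            if ln not in seen:    # first time this exact line appears
                f_out.write(ln)
                seen.add(ln)


if __name__ == "__main__":
    # Demo with the sample input from the example (file names are assumptions).
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "myfile.txt")
        dst = os.path.join(tmp, "output.txt")
        with open(src, "w", encoding="utf-8") as f:
            f.write("This is a sample line.\n"
                    "Python is a powerful language.\n"
                    "This is a sample line.\n")
        dedupe_with_set(src, dst)
        with open(dst, encoding="utf-8") as f:
            print(f.read(), end="")
```

Lines are compared exactly, including the trailing newline, so a final line without a newline would not match an earlier copy that has one.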

Output

This is a sample line.
Python is a powerful language.

Explanation:

  • seen = set(): Stores every unique line encountered so far
  • for ln in f_in: Reads the input file one line at a time
  • if ln not in seen: Checks whether the line has not been seen before
  • f_out.write(ln): Writes the unique line to the output file
  • seen.add(ln): Marks the line as seen

Using a List

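This method checks each line against a list of the lines already written and keeps it only if it is not in that list. It preserves the original order just like the set approach, but every membership check scans the whole list, so it slows down as the number of unique lines grows.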
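A minimal sketch of the list-based approach, reusing the names from the explanation (seen, ln, f_out). The function name dedupe_with_list and the output file name are assumptions made for illustration, and the demo again works in a temporary directory.

```python
import os
import tempfile


def dedupe_with_list(src_path, dst_path):
    """Copy src_path to dst_path, keeping only the first occurrence of each line."""
    seen = []  # unique lines in the order they were first read
    with open(src_path, encoding="utf-8") as f_in, \
            open(dst_path, "w", encoding="utf-8") as f_out:
        for ln in f_in:
            if ln not in seen:    # linear scan of the list
                f_out.write(ln)   # write only unique lines
                seen.append(ln)   # save the line for later comparisons


if __name__ == "__main__":
    # Demo with the sample input from the example (file names are assumptions).
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "myfile.txt")
        dst = os.path.join(tmp, "output.txt")
        with open(src, "w", encoding="utf-8") as f:
            f.write("This is a sample line.\n"
                    "Python is a powerful language.\n"
                    "This is a sample line.\n")
        dedupe_with_list(src, dst)
        with open(dst, encoding="utf-8") as f:
            print(f.read(), end="")
```

The only difference from a set is the container: the list keeps insertion order explicitly but pays for it with slower lookups.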

Output

This is a sample line.
Python is a powerful language.

Explanation:

  • f_out.write(ln): Writes only unique lines
  • seen.append(ln): Saves the line for comparison

Using Pandas

This method loads the file into a Pandas DataFrame, one line per row, and removes repeated rows with the built-in drop_duplicates() method, which keeps the first occurrence of each row by default.
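A minimal sketch of the Pandas approach, assuming pandas is installed. The function name dedupe_with_pandas, the column name "line", and the file names are assumptions made for illustration. The sketch also assumes that lines contain no tab or double-quote characters: a tab is used as a separator that never occurs, so each line lands in a single column. Lines that break this assumption may not round-trip unchanged.

```python
import csv
import os
import tempfile

import pandas as pd


def dedupe_with_pandas(src_path, dst_path):
    """Copy src_path to dst_path without repeated lines, using a DataFrame."""
    df = pd.read_csv(
        src_path,
        sep="\t",                # assumption: no tab characters inside lines
        header=None,             # the file has no header row
        names=["line"],          # hypothetical column name
        dtype=str,
        quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
        keep_default_na=False,   # keep text such as "NA" as a string
    )
    # Blank lines are skipped by read_csv's default settings.
    df = df.drop_duplicates()    # keeps the first occurrence, preserves order
    df.to_csv(dst_path, index=False, header=False,
              sep="\t", quoting=csv.QUOTE_NONE)


if __name__ == "__main__":
    # Demo with the sample input from the example (file names are assumptions).
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "myfile.txt")
        dst = os.path.join(tmp, "output.txt")
        with open(src, "w", encoding="utf-8") as f:
            f.write("This is a sample line.\n"
                    "Python is a powerful language.\n"
                    "This is a sample line.\n")
        dedupe_with_pandas(src, dst)
        with open(dst, encoding="utf-8") as f:
            print(f.read(), end="")
```

Unlike the set and list versions, this reads the whole file into memory at once, so it is mainly convenient when the data is already being processed with Pandas.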

Output

This is a sample line.
Python is a powerful language.

Explanation:

  • read_csv(...): Reads text lines into a DataFrame
  • drop_duplicates(): Removes duplicate rows
  • to_csv(...): Saves cleaned data back to a file
