We are given a string that may contain one or more URLs and our task is to extract them efficiently. This is useful for web scraping, text processing, and data validation.
For Example:
Input: s = "My Profile: https://www.geeksforgeeks.org/404.html/ in the portal of https://www.geeksforgeeks.org/"
Output: ['https://www.geeksforgeeks.org/404.html/', 'https://www.geeksforgeeks.org/']
The output is a list containing all the URLs found in the string.
Below are several methods to perform this task:
re.findall() function in Python is used to find all occurrences of a pattern in a given string and return them as a list.
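A minimal sketch of this approach is shown here. The pattern `https?://\S+` is an assumption chosen for illustration: it matches "http://" or "https://" followed by any run of non-whitespace characters, so it may also capture trailing punctuation such as a comma or full stop in other inputs.

```python
import re

s = "My Profile: https://www.geeksforgeeks.org/404.html/ in the portal of https://www.geeksforgeeks.org/"

# Illustrative pattern (an assumption, not a full URL grammar):
# the scheme, "://", then every character up to the next whitespace.
pattern = r"https?://\S+"

# findall returns every non-overlapping match as a list of strings.
urls = re.findall(pattern, s)
print("URLs:", urls)
```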
URLs: ['https://www.geeksforgeeks.org/404.html/', 'https://www.geeksforgeeks.org/']
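The urlparse() function from urllib.parse breaks a URL down into components such as scheme, network location (netloc), path, and query. To extract URLs, each whitespace-separated word of the text can be parsed and kept only if it has both a scheme and a network location.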
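One way this could look in practice is sketched below. Splitting on whitespace and accepting only the "http" and "https" schemes are assumptions made for this example; urlparse() itself does not search text, it only parses the string it is given.

```python
from urllib.parse import urlparse

s = "My Profile: https://www.geeksforgeeks.org/404.html/ in the portal of https://www.geeksforgeeks.org/"

urls = []
for word in s.split():
    parsed = urlparse(word)
    # Keep the word only if it has a web scheme and a network location.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        urls.append(word)

print("URLs:", urls)
```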
URLs: ['https://www.geeksforgeeks.org/404.html/', 'https://www.geeksforgeeks.org/']
urlextract is a third-party Python library used to easily extract URLs from text without writing complex regular expressions. Use the following pip command to install it:
pip install urlextract
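A short sketch using the library's URLExtract class and its find_urls() method follows. The guarded import is an addition for this example so that the snippet still runs when the package is not installed.

```python
try:
    from urlextract import URLExtract  # third-party package
except ImportError:
    URLExtract = None  # library is not installed

s = "My Profile: https://www.geeksforgeeks.org/404.html/ in the portal of https://www.geeksforgeeks.org/"

if URLExtract is not None:
    extractor = URLExtract()
    # find_urls returns a list of the URLs detected in the text.
    urls = extractor.find_urls(s)
    print(urls)
else:
    urls = None
    print("urlextract is not installed; run: pip install urlextract")
```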
Output
['https://www.geeksforgeeks.org/404.html/', 'https://www.geeksforgeeks.org/']
This approach splits the text into words and checks each word using the built-in startswith() method to see if it begins with "http://" or "https://". Matching words are treated as URLs and collected.
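The description above can be sketched in a single list comprehension. Passing a tuple of prefixes to startswith() checks both schemes in one call; note that this simple check does not strip punctuation attached to a URL.

```python
s = "My Profile: https://www.geeksforgeeks.org/404.html/ in the portal of https://www.geeksforgeeks.org/"

# split() breaks the text on whitespace; startswith() accepts a tuple of prefixes.
urls = [word for word in s.split() if word.startswith(("http://", "https://"))]

print("Urls:", urls)
```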
Urls: ['https://www.geeksforgeeks.org/404.html/', 'https://www.geeksforgeeks.org/']
find() is a built-in string method that returns the lowest index at which a substring occurs, or -1 if the substring is absent. We can use it to locate each "http" prefix in the text and slice out the URL up to the next space.
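A possible implementation is sketched below. Treating a single space as the end of a URL is an assumption that suits the sample input; text containing tabs or newlines would need a broader delimiter check.

```python
s = "My Profile: https://www.geeksforgeeks.org/404.html/ in the portal of https://www.geeksforgeeks.org/"

urls = []
start = 0
while True:
    # Locate the next "http" at or after the current position.
    i = s.find("http", start)
    if i == -1:
        break  # no more candidates
    # The candidate ends at the next space, or at the end of the string.
    j = s.find(" ", i)
    if j == -1:
        j = len(s)
    candidate = s[i:j]
    # Discard words that merely begin with "http" but are not URLs.
    if candidate.startswith(("http://", "https://")):
        urls.append(candidate)
    start = j

print("Urls:", urls)
```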
Urls: ['https://www.geeksforgeeks.org/404.html/', 'https://www.geeksforgeeks.org/']